Kotlin http4k (via GraalVM Native Image) and Golang
First things first, this article is not a scientific performance comparison between Kotlin and Go (aka Golang). I want to show that you can use lightweight frameworks with Kotlin to produce binaries that are small, execute fast, and are quite comparable to Go’s executables. One of my favourite deployment platforms is AWS Lambda, and it really makes a difference to have small artifacts to upload and very fast startup times; otherwise cold start invocations will push latencies through the roof.
As part of a new project, I am trying to decide what stack to use as the backend. Let’s see how Go and Kotlin compare based on my very subjective and opinionated experience.
Go (favourite backend choice)
- PRO: Great developer experience (i.e. tooling)
- PRO: Great standard library
- PRO: Small binary executables
- PRO: Amazing performance (especially the low memory usage!)
- CON: Some things are not available as quality libraries, hence they might need boilerplate or be developed from scratch (e.g. Machine Learning)
Kotlin (favourite language)
- PRO: Amazingly expressive language
- PRO: Great standard library & third-party libraries for anything
- PRO: Good developer experience (at least when Gradle et al. work as expected)
- PRO: Good overall performance
- CON: Very high memory usage due to Java Virtual Machine (JVM)
The rest of the article is a walkthrough of implementing a small API in Go and Kotlin, with a comparison of their performance under 100 simultaneous connections sending 10 seconds of continuous traffic.
API
A very simple API with two endpoints, and SQLite as the database.
/?name=<requested-name>
- Checks the database to see if <requested-name> exists in the users table and responds accordingly.
/add?name=<new-name>
- Tries to add a new entry in the users table with name=<new-name> (without checking if it exists) and responds accordingly.
The table schema:
sqlite> .schema
CREATE TABLE users (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT,
UNIQUE(name)
);
Go
I will not paste here the full code for Go since it’s quite a few lines, and it’s not the main purpose of this article.
Check out the full source code at https://github.com/lambrospetrou/code-playground/blob/master/go-vs-kotlin-datastores/golang/main.go.
The only third-party dependency I use is the crawshaw.io/sqlite library by David Crawshaw as the SQLite driver; he also wrote an amazing article about it.
Stats
Some interesting facts about the Go implementation.
- Binary size: 9.7MB
$ ls -hl v1
-rwxr-xr-x 1 lambros lambros 9.7M 26 Sep 20:10 v1
- RSS memory at startup: 6.6MB
$ ps -eo pid,rss,%mem,%cpu,command | grep -e "./v1"
90176 6660 0.0 0.0 ./v1
- Replay GET /add?name=i_am_ironman for 10s with 2 threads and 100 connections.
  - Average requests per second: 15.11k (total: 303834)
  - RSS memory after replay: 17.9MB (!)
$ wrk -t2 -c100 -d10s --latency --timeout 1s http://localhost:8080/add\?name\=i_am_ironman
Running 10s test @ http://localhost:8080/add?name=i_am_ironman
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 4.89ms 15.52ms 675.26ms 97.49%
Req/Sec 15.11k 2.10k 18.45k 58.91%
Latency Distribution
50% 3.00ms
75% 3.62ms
90% 4.51ms
99% 51.22ms
303834 requests in 10.10s, 52.45MB read
Non-2xx or 3xx responses: 303833
Requests/sec: 30075.57
Transfer/sec: 5.19MB
- Replay GET /?name=i_am_ironman for 10s with 2 threads and 100 connections.
  - Average requests per second: 18.58k (total: 373386)
  - RSS memory after replay: 18.2MB (!)
$ wrk -t2 -c100 -d10s --latency --timeout 1s http://localhost:8080/\?name\=i_am_ironman
Running 10s test @ http://localhost:8080/?name=i_am_ironman
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.65ms 702.95us 9.48ms 73.62%
Req/Sec 18.58k 2.45k 26.61k 74.26%
Latency Distribution
50% 2.60ms
75% 2.99ms
90% 3.50ms
99% 4.75ms
373386 requests in 10.10s, 53.06MB read
Requests/sec: 36960.38
Transfer/sec: 5.25MB
Small binary, fast startup, and extremely low memory usage 🚀
Kotlin
I am a huge fan of the http4k framework for building APIs with Kotlin. It is very small, modular, extremely versatile, and thanks to its “Server as a Function” philosophy it’s an absolute joy to work with. It works both locally and in AWS Lambda, and it runs inside the JVM but also supports compiling down to a native binary through GraalVM Native Image.
For SQLite access I used SQLDelight, which generates type-safe Kotlin APIs from plain SQL. This is the first time I used it, and I am very pleasantly surprised by how nice it is. I will definitely be using it for all my SQL needs on the JVM. I tried JetBrains Exposed before SQLDelight but faced issues during the native image compilation, and its custom modelling is a no-go for me.
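For context, the Database class and the usersQueries accessor used in the server code below are generated by SQLDelight from a plain SQL file. Here is a minimal sketch of what that file could look like, assuming SQLDelight 1.x conventions (the query labels are my own choice, picked to match the addUser/selectByName calls used later; a file named Users.sq is what produces a usersQueries accessor):

-- src/main/sqldelight/com/example/Users.sq (hypothetical sketch)
CREATE TABLE users (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT,
  UNIQUE(name)
);

-- Each labelled statement becomes a generated function on UsersQueries.
addUser:
INSERT INTO users(name) VALUES (:name);

selectByName:
SELECT * FROM users WHERE name = :name;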
Check out the full source code at https://github.com/lambrospetrou/code-playground/tree/master/go-vs-kotlin-datastores/kotlin-sqlite.
As I mentioned above, I really love Kotlin “the language”, and in combination with http4k it enables very lean servers. This is the entire server code.
package com.example
import com.squareup.sqldelight.db.SqlDriver
import com.squareup.sqldelight.sqlite.driver.JdbcSqliteDriver
import org.http4k.core.HttpHandler
import org.http4k.core.Method.GET
import org.http4k.core.Response
import org.http4k.core.Status
import org.http4k.core.Status.Companion.BAD_REQUEST
import org.http4k.core.Status.Companion.NOT_FOUND
import org.http4k.core.Status.Companion.OK
import org.http4k.routing.bind
import org.http4k.routing.routes
import org.http4k.server.ApacheServer
import org.http4k.server.asServer
import org.sqlite.SQLiteConfig
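// makeApp wires up three routes: /ping (health check), /add?name= (insert a user), and /?name= (look a user up).
// main() opens the SQLite database (shared cache + WAL) through SQLDelight's JDBC driver and starts an Apache-backed http4k server.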
fun makeApp(db: Database): HttpHandler = routes(
    "/ping" bind GET to {
        Response(OK).body("pong")
    },
    "/add" bind GET to {
        val name = (it.query("name") ?: "").trim()
        when {
            name.isEmpty() ->
                Response(BAD_REQUEST).body("Invalid 'name' given: $name")
            else -> {
                val userQueries = db.usersQueries
                try {
                    userQueries.transaction {
                        userQueries.addUser(name = name)
                    }
                    Response(OK).body("Welcome to our community $name!")
                } catch (e: Exception) {
                    Response(Status.INTERNAL_SERVER_ERROR).body("Sadly we failed to register you: $name")
                }
            }
        }
    },
    "/" bind GET to {
        val name = (it.query("name") ?: "").trim()
        when {
            name.isEmpty() ->
                Response(BAD_REQUEST).body("Invalid 'name' given: $name")
            else -> {
                val userQueries = db.usersQueries
                userQueries.selectByName(name = name).executeAsOneOrNull()?.let {
                    Response(OK).body("Boom! We found you: $name")
                } ?: Response(NOT_FOUND).body("Sadly we could not find you: $name")
            }
        }
    }
)

fun main() {
    val port = (System.getenv("PORT") ?: "9000").toInt()
    val config = SQLiteConfig().apply {
        setSharedCache(true)
        setJournalMode(SQLiteConfig.JournalMode.WAL)
    }
    val driver: SqlDriver = JdbcSqliteDriver("jdbc:sqlite:/tmp/users.sqlite3", properties = config.toProperties())
    Database.Schema.create(driver)
    val app: HttpHandler = makeApp(db = Database(driver))
    val server = app.asServer(ApacheServer(port)).start()
    println(Runtime.getRuntime().availableProcessors())
    println("Server started on " + server.port())
}
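For completeness, the Database class and usersQueries above are generated at build time by the SQLDelight Gradle plugin from the .sq files. A rough sketch of the relevant build configuration, assuming SQLDelight 1.x and the Gradle Kotlin DSL (the versions and module coordinates below are my assumptions, not taken from the repository):

// build.gradle.kts (sketch, versions are assumptions)
plugins {
    kotlin("jvm") version "1.5.31"
    id("com.squareup.sqldelight") version "1.5.2"
}

repositories {
    mavenCentral()
}

dependencies {
    implementation("org.http4k:http4k-core:4.12.0.0")
    implementation("org.http4k:http4k-server-apache:4.12.0.0")
    // SQLDelight's JDBC SQLite driver, which pulls in the xerial sqlite-jdbc driver used by SQLiteConfig.
    implementation("com.squareup.sqldelight:sqlite-driver:1.5.2")
}

sqldelight {
    // Generates com.example.Database and the usersQueries accessor from the .sq files.
    database("Database") {
        packageName = "com.example"
    }
}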
Stats - JVM
In this section I am going to show the stats for running the above Kotlin code inside the JVM, the standard way.
The following stats are without any custom JVM arguments, just java -jar build/libs/HelloWorld.jar.
For context, I use the JVM provided by GraalVM:
$ java -version
openjdk version "16.0.2" 2021-07-20
OpenJDK Runtime Environment GraalVM CE 21.2.0 (build 16.0.2+7-jvmci-21.2-b08)
OpenJDK 64-Bit Server VM GraalVM CE 21.2.0 (build 16.0.2+7-jvmci-21.2-b08, mixed mode, sharing)
- Fat-JAR size: 15MB
$ ls -hl build/libs/HelloWorld.jar
-rw-r--r-- 1 lambros lambros 15M 26 Sep 21:42 build/libs/HelloWorld.jar
- RSS memory at startup: 143.9MB
$ ps -eo pid,rss,%mem,%cpu,command | grep -e "HelloWorld.jar"
3460 143924 0.4 0.0 java -jar build/libs/HelloWorld.jar
- Replay GET /add?name=i_am_ironman for 10s with 2 threads and 100 connections.
  - Average requests per second: 7.17k (total: 142653)
  - RSS memory after replay: 1400MB (1.4GB)
$ wrk -t2 -c100 -d10s --latency --timeout 1s http://localhost:9000/add\?name\=i_am_ironman
Running 10s test @ http://localhost:9000/add?name=i_am_ironman
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 68.57ms 154.96ms 950.92ms 88.32%
Req/Sec 7.17k 2.18k 12.00k 65.50%
Latency Distribution
50% 138.00us
75% 39.81ms
90% 269.81ms
99% 722.75ms
142653 requests in 10.01s, 27.75MB read
Socket errors: connect 0, read 0, write 0, timeout 278
Non-2xx or 3xx responses: 142653
Requests/sec: 14253.41
Transfer/sec: 2.77MB
- Replay GET /?name=i_am_ironman for 10s with 2 threads and 100 connections.
  - Average requests per second: 17.37k (total: 345639)
  - RSS memory after replay: 2500MB (2.5GB)
$ wrk -t2 -c100 -d10s --latency --timeout 1s http://localhost:9000/\?name\=i_am_ironman
Running 10s test @ http://localhost:9000/?name=i_am_ironman
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.41ms 4.36ms 139.93ms 93.48%
Req/Sec 17.37k 3.04k 20.22k 92.00%
Latency Distribution
50% 2.68ms
75% 4.68ms
90% 6.92ms
99% 15.54ms
345639 requests in 10.01s, 1.01GB read
Non-2xx or 3xx responses: 340797
Requests/sec: 34526.79
Transfer/sec: 103.35MB
So much memory used…
I also tried restricting the heap memory to 256MB (with java -Xmx256M -jar build/libs/HelloWorld.jar), but the results were roughly the same, even though the heap portion of the memory was capped.
The ApacheServer backend used here seems to use a lot of memory due to the many open connections.
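If the server backend is the suspect, http4k makes it cheap to test that hypothesis because backends are swappable without touching the handler code. As a hypothetical variation (not benchmarked here, and requiring the http4k-server-undertow module on the classpath), only the server construction line in main() would change:

// import org.http4k.server.Undertow
val server = app.asServer(Undertow(port)).start()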
Stats - GraalVM Native Image
This section shows the boost we can get by compiling our code down to a native binary using GraalVM Native Image. This is a great piece of technology and hopefully it will keep evolving and improving.
This is the command I use to compile the fat-JAR down to a binary:
$ native-image --no-fallback -H:+ReportExceptionStackTraces --enable-url-protocols=https -jar build/libs/HelloWorld.jar build/libs/HelloWorld-native
- Binary size: 46MB
- RSS memory at startup: 19.7MB (!)
- Replay GET /add?name=i_am_ironman for 10s with 2 threads and 100 connections.
  - Average requests per second: 6.50k (total: 129385)
  - RSS memory after replay: 721.1MB
- Replay GET /?name=i_am_ironman for 10s with 2 threads and 100 connections.
  - Average requests per second: 17.17k (total: 341736)
  - RSS memory after replay: 1700MB (1.7GB)
As we can see, the memory usage is reduced significantly compared to the JVM runs, especially at startup, without introducing a significant performance regression. Also, even though the binary is larger than the fat-JAR, it is a standalone executable we can copy to any system and just run, whereas the fat-JAR still requires a JVM installed on the system running the code. Although, to be fair, with the fat-JAR being less than 20MB, cold starts in AWS Lambda should be a non-issue.
Binary compression with UPX
UPX is a free, portable, extendable, high-performance executable packer for several executable formats.
Running upx -7 -k build/libs/HelloWorld-native takes the GraalVM binary and compresses it down significantly. When the packed executable runs, it decompresses itself in memory and then proceeds with execution.
The size of the binary went from 46MB down to 18.9MB, without affecting the startup time in any meaningful way.
Stats - Summary
| | Artifact Size | Initial RSS memory | Final RSS memory | Replay 1 (GET /add) Total Requests | Replay 2 (GET /) Total Requests |
| --- | --- | --- | --- | --- | --- |
| Go | 9.7MB | 6.6MB | 18.2MB | 303834 | 373386 |
| JVM Fat-JAR | 15MB | 143.9MB | 2500MB | 142653 | 345639 |
| GraalVM Native Image | 46MB | 19.7MB | 1700MB | 129385 | 341736 |
| UPX | 18.9MB | - | - | - | - |
Conclusion
Please don’t get too scientific on me about the results. I know (and acknowledge) that there are many nuances affecting the differences in performance, especially the SQLite parameters used, connection pooling in Go but not in Kotlin, and so on.
The throughput and latency of all the server versions examined above are satisfactory to me, so that’s not a concern.
The huge difference in memory usage, though, is what still concerns me when using the JVM, even when the code gets compiled down to an executable binary. I expected the memory use to drop a lot more with GraalVM Native Image, but I guess there is still lots of room for improvement.
I really like Kotlin the language though, so I am looking forward to the moment when memory won’t be an issue anymore. For the time being, I think I will stick with Go for long-running servers that will have thousands of parallel open connections. On the other hand, for AWS Lambda deployments Kotlin has become a very viable option! Each Lambda invocation serves exactly one connection at a time, which means we only care about the binary size and overall performance, and both of these are in OK territory (despite the extra memory usage).