Faster GraalVM Clojure Compilation Times

Slash those GraalVM native-image compilation times with this one wierd trick!

Crispin Wellington
May 26, 2020 · 7 min read
unsplash-logoMuhammad Haikal Sjukri

Slash GraalVM Clojure Compilation Time With This One Weird Trick

Are you building a clojure project and compiling it to a native binary with GraalVM? As the project has grown, has the GraalVM compilation time gotten longer and longer? Here is the story of how I discovered @borkdude's trick to getting GraalVM native image times in under 5 minutes.

The Story

As I've been building spire, compilation times have gradually been increasing. Then in order to bring a bunch of increased functionality I added a large number of Java class bindings to my sci environment. I check the code in and go to bed. When I wake I discover my Circle CI builds have been killed in the night for taking too long. The build was underway for an hour when the CI system killed the build!

So I try compiling on my local linux workstation to see what is going on. After an eternity the build crashes with:

[build/spire:2202]     analysis: 12,784,610.52 ms,  6.22 GB
Fatal error:java.lang.OutOfMemoryError: GC overhead limit exceeded
Error: Image build request failed with exit status 1
com.oracle.svm.driver.NativeImage$NativeImageError: Image build request failed with exit status 1
    at com.oracle.svm.driver.NativeImage.showError(NativeImage.java:1527)
    at com.oracle.svm.driver.NativeImage.build(NativeImage.java:1289)
    at com.oracle.svm.driver.NativeImage.performBuild(NativeImage.java:1250)
    at com.oracle.svm.driver.NativeImage.main(NativeImage.java:1209)
    at com.oracle.svm.driver.NativeImage$JDK9Plus.main(NativeImage.java:1707)

12,784,610ms spent in analysis. That is over three and a half hours!

As I just added a bunch of new class bindings and reflection config for them all, I assume that must be the culprit. I remove 75% of them. And build again. I time this run. It succeeds, but it isn't pretty.

[build/spire:11341]    classlist:   5,953.42 ms,  0.94 GB
[build/spire:11341]        (cap):     800.78 ms,  0.94 GB
[build/spire:11341]        setup:   2,536.63 ms,  0.94 GB
[build/spire:11341]   (typeflow): 1,544,374.55 ms,  6.51 GB
[build/spire:11341]    (objects): 5,164,930.94 ms,  6.51 GB
[build/spire:11341]   (features):   4,320.62 ms,  6.51 GB
[build/spire:11341]     analysis: 6,724,448.94 ms,  6.51 GB
[build/spire:11341]     (clinit):   1,898.20 ms,  6.51 GB
[build/spire:11341]     universe:  47,567.23 ms,  6.51 GB
[build/spire:11341]      (parse):  12,451.68 ms,  4.17 GB
[build/spire:11341]     (inline):   9,098.24 ms,  3.94 GB
[build/spire:11341]    (compile):  75,592.43 ms,  3.59 GB
[build/spire:11341]      compile: 102,870.54 ms,  3.59 GB
[build/spire:11341]        image:  14,383.80 ms,  3.61 GB
[build/spire:11341]        write:   1,725.39 ms,  3.61 GB
[build/spire:11341]      [total]: 6,899,951.34 ms,  3.61 GB
cp build/spire spire

real	115m59.386s
user	839m24.587s
sys	1m56.493s

Wow! Analysis is now down to only 1 hour and 50 minutes! Good times!

So this leads me to wonder how Michiel (@borkdude) is building his babashka images on Circle CI without hitting the wall. I go over to have a look at his Circle CI builds. They are building his images in less that 5 minutes.... whaaaaaaat?

That One Weird Trick

This leads me to a bunch of comparisons between the compilation and building of babashka with that of spire. I rule out reflection as the source of the problem. I rule out GraalVM options. And then I discover the magical incantation.

I had this setting on my GraalVM build, but I did not have it on when clojure was doing AOT compilation for the uberjar. Before we turn it on lets look into what it does.

Direct Linking

Clojure's direct linking can be activated by passing -Dclojure.compiler.direct-linking=true to the compiler. This feature is documented here. From this discussion we read:

"Normally, invoking a function will cause a var to be dereferenced to find the function instance implementing it, then invoking that function... Direct linking can be used to replace this indirection with a direct static invocation of the function instead. This will result in faster var invocation. Additionally, the compiler can remove unused vars from class initialization and direct linking will make many more vars unused. Typically this results in smaller class sizes and faster startup times."

And faster Graal compilation times to boot! This option will produce JVM byte code that will be much more like what a standard Java program will produce. Java is a statically typed language after all, and Java programs are not dereferencing vars every time they are invoked. The Graal compiler, being built primarily to compile Java programmes, is obviously having a very hard time with the dynamic nature of clojure's compiled byte code.

But do we lose anything if we compile our code with direct linking? According to the docs:

"One consequence of direct linking is that var redefinitions will not be seen by code that has been compiled with direct linking (because direct linking avoids dereferencing the var). Vars marked as ^:dynamic will never be direct linked. If you wish to mark a var as supporting redefinition (but not dynamic), mark it with ^:redef to avoid direct linking."

As Michiel pointed out to me, things like a general use of with-redefs won't work with direct linking. But if we do want to do something dynamic like with-redefs in our code, we can individually mark those vars with ^:redef meta data to allow them to work. Also, things like with-redefs is more commonly used in writing tests, so we can keep the option off in our test code and save direct linking for our uberjar builds.

The Fixed Build

Now taking the original problematic build that crashed after three and a half hours, I switch that setting on in my uberjar compilation and rebuild. Here's the result:

[build/spire:22871]    classlist:   4,379.73 ms,  0.96 GB
[build/spire:22871]        (cap):     739.03 ms,  0.96 GB
[build/spire:22871]        setup:   2,268.45 ms,  0.96 GB
[build/spire:22871]   (typeflow):  50,731.79 ms,  5.92 GB
[build/spire:22871]    (objects): 116,840.36 ms,  5.92 GB
[build/spire:22871]   (features):   2,682.61 ms,  5.92 GB
[build/spire:22871]     analysis: 173,423.16 ms,  5.92 GB
[build/spire:22871]     (clinit):   1,193.08 ms,  5.92 GB
[build/spire:22871]     universe:   3,049.85 ms,  5.92 GB
[build/spire:22871]      (parse):   7,224.44 ms,  5.79 GB
[build/spire:22871]     (inline):  13,408.30 ms,  4.19 GB
[build/spire:22871]    (compile):  77,773.85 ms,  4.16 GB
[build/spire:22871]      compile: 104,068.58 ms,  4.16 GB
[build/spire:22871]        image:  13,542.17 ms,  4.16 GB
[build/spire:22871]        write:   1,792.26 ms,  4.22 GB
[build/spire:22871]      [total]: 302,882.67 ms,  4.22 GB
cp build/spire spire

real	5m32.050s
user	33m7.865s
sys	0m12.756s

Analysis is down from 3½ hours... to a little under 3 minutes. And amazing improvement!

Summary

Add -Dclojure.compiler.direct-linking=true to your clojure compilation JVM options when building your uberjar and when compiling with GraalVM.

In lein:

(defproject foo "0.1.0-SNAPSHOT"

  ;; missing lines

  :profiles {:uberjar {:aot :all
                       :jvm-opts ["-Dclojure.compiler.direct-linking=true"]}
)

And in GraalVM native-image:

graalvm-ce-java11-20.1.0-dev/bin/native-image \
    ...
    -J-Dclojure.compiler.direct-linking=true \
    ...

About Crispin Wellington

Crispin Wellington is a Software Developer and System Operations expert with over 25 years of professional experience. As well as being the founder of Epic Castle he is the founder and developer of backgammonbuddy.com. He is based in Perth, Australia and is a husband and father. As Epic Castle's principal consultant he is available to assist you in solving your software, operations and service problems.