Namespace Load Optimization
Arcadia's version of the ClojureCLR compiler incorporates important optimizations for loading namespaces fast. This article describes the rationale for and implementation of these optimizations.
Clojure is both a live, dynamic language and a compiled one, which lets it run much faster than an interpreted language. Compiling from source can be relatively slow, however: as much as 30 seconds for the entire Clojure and Arcadia codebase. Clojure solves this by supporting ahead-of-time (AOT) compilation, which pre-compiles Clojure source into CIL bytecode stored in a DLL file on disk.
AOT greatly reduces load times by eliminating the compilation step. Even with AOT, however, users have reported load times of as much as 9 seconds due to certain inefficiencies in the namespace initialization procedure. The problem is exacerbated in Unity, where the virtual machine is reset every time the user plays their game or edits a C# file, triggering the startup procedure multiple times during a single development session.
The optimizations described here eliminate these inefficiencies. They reduce our total startup time, including the overhead of starting Unity, by about a factor of three, going from 9s to 3s on a 2017 Razer Blade laptop with a 2.8 GHz Core i7 CPU.
The primary culprit is JIT time. Clojure is a completely dynamic language, and namespaces are not sets of static definitions that can be loaded directly by the VM like C# namespaces. Rather, they are processes that need to run to completion. These processes manifest in the bytecode primarily as an `Initialize` method for each namespace, with each expression in the namespace compiled into the `Initialize` method's body. For namespaces with many expressions, this method can become very large. The Mono VM must JIT compile every method to native code before it can execute it, and we've observed the time to JIT compile Clojure namespace initialization methods dominating our startups.
In many environments this JIT step can be skipped by compiling to native code ahead of time, but this is not an option in Unity3D, so this optimization is designed to minimize the cost of JIT compiling the bodies of namespaces. Three techniques work together to achieve this.
By default, ClojureCLR will move every constant expression into a field that is statically initialized. This avoids interning vars or keywords every time a function is invoked. However, this optimization does not make sense for the code at the top level of a namespace, as it will only ever be run once, and the resulting static initializer method can be large enough to slow down the JIT. We introduce the dynamic var `clojure.lang.Compiler.RegisterConstants`, which is `true` by default but can be bound to `false` to prevent statically initializing constants. We bind it to `false` when compiling the namespace initialization code.
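To make the trade-off concrete, here is a rough C# sketch of the two code shapes. It does not use the real ClojureCLR compiler or `RegisterConstants`; `Interner`, `HoistedConstants`, and `InlineConstants` are hypothetical names standing in for keyword interning, a compiled function with hoisted constants, and top-level namespace code respectively.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for keyword/var interning; not the ClojureCLR API.
public static class Interner
{
    static readonly Dictionary<string, object> table = new Dictionary<string, object>();

    // Counts how many times Intern has been called.
    public static int Lookups;

    public static object Intern(string name)
    {
        Lookups++;
        object value;
        if (!table.TryGetValue(name, out value))
        {
            value = new object();
            table[name] = value;
        }
        return value;
    }
}

// Shape that pays off for function bodies: the constant is interned once,
// in the type's static initializer, and every later call reads the field.
public static class HoistedConstants
{
    static readonly object Doc = Interner.Intern(":doc");

    public static object Invoke()
    {
        return Doc;
    }
}

// Shape suited to top-level namespace code: the constant is interned inline.
// The code runs only once, so a static field would add initializer code
// for the JIT to compile without saving any lookups.
public static class InlineConstants
{
    public static object RunOnce()
    {
        return Interner.Intern(":doc");
    }
}
```

In this sketch, calling `HoistedConstants.Invoke()` repeatedly performs a single lookup, while `InlineConstants.RunOnce()` performs one lookup per call, which costs nothing extra when the code only ever runs once.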
A significant portion of namespace initialization code is the construction of Clojure metadata hashmaps (with documentation strings, argument lists, etc.) and their assignment to vars. Almost none of this information is used to boot the language, so this code and the associated JIT cost is wasteful. In the common case where the metadata hashmap is constant data, we can move its construction and assignment bytecode out of the namespace initialization logic and into its own type, to be loaded lazily when needed. We store the bytecode in the type initializer of a generated container type, and store a reference to that type in the var being compiled. This happens in two places to deal with optimized `defn`s (described below) and all other expressions.
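The lazy-loading idea can be sketched in C# as follows. This is a simplified illustration under stated assumptions, not the actual Arcadia code: the `Var` class here is a minimal stand-in for `clojure.lang.Var`, and `Vars`, `MetaBuilds`, `SetMeta`, and `Meta_inc` are hypothetical names. Only `RuntimeHelpers.RunClassConstructor` is a real .NET API.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical registry so a container type can find the var it belongs to.
public static class Vars
{
    public static readonly Dictionary<string, Var> Registry = new Dictionary<string, Var>();

    // Counts how many times metadata construction code has run.
    public static int MetaBuilds;
}

// Minimal stand-in for a var that holds a reference to its metadata container.
public sealed class Var
{
    readonly Type metaContainer;
    Dictionary<string, object> meta;

    public Var(Type metaContainer)
    {
        this.metaContainer = metaContainer;
    }

    // Called from the container's type initializer.
    public void SetMeta(Dictionary<string, object> m)
    {
        meta = m;
    }

    public Dictionary<string, object> Meta
    {
        get
        {
            // Run the container's type initializer on first access only.
            if (meta == null && metaContainer != null)
                RuntimeHelpers.RunClassConstructor(metaContainer.TypeHandle);
            return meta;
        }
    }
}

// Stand-in for a generated container type: the metadata construction and
// assignment live in its type initializer, outside namespace initialization.
public static class Meta_inc
{
    static Meta_inc()
    {
        Vars.MetaBuilds++;
        var m = new Dictionary<string, object>();
        m[":doc"] = "Returns a number one greater than x.";
        m[":arglists"] = "([x])";
        Vars.Registry["inc"].SetMeta(m);
    }
}
```

Creating the var with `new Var(typeof(Meta_inc))` and registering it does not run the initializer; the metadata map is only built the first time `Meta` is read.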
Three elements of Clojure metadata are actually used by the runtime during initialization, and receive special treatment: `:dynamic`, `:macro`, and `:private`. Our optimization implements these "static metadata flags" as instance fields on the Var class, so accessing them does not need to construct and load the whole Clojure metadata hashmap.
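A minimal sketch of how such flags might look in C#. The enum name `StaticMetadataFlags` and its `None` member come from this article; the other member names, their numeric values, the field names on `Var`, and the `ApplyFlags` helper are assumptions made for illustration.

```csharp
using System;

// Member names other than None, and all numeric values, are assumptions.
[Flags]
public enum StaticMetadataFlags
{
    None = 0,
    Dynamic = 1,
    Macro = 2,
    Private = 4
}

// Minimal stand-in for clojure.lang.Var with the flags as instance fields.
public sealed class Var
{
    public bool IsDynamic;
    public bool IsMacro;
    public bool IsPrivate;

    // Hypothetical helper: unpack a bitmask into the instance fields.
    public void ApplyFlags(StaticMetadataFlags flags)
    {
        IsDynamic = (flags & StaticMetadataFlags.Dynamic) != 0;
        IsMacro = (flags & StaticMetadataFlags.Macro) != 0;
        IsPrivate = (flags & StaticMetadataFlags.Private) != 0;
    }
}
```

With this shape, a check such as "is this var a macro?" is a single field read and never forces the metadata hashmap to be built.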
With constants and metadata out of the namespace initialization method, what remains is the compilation of every top-level expression in the namespace. This optimization considers them in two categories: `defn`s with constant metadata, and everything else. `defn`s with constant metadata receive special treatment because their initialization bytecode is identical and does not require any per-expression JIT compilation. All other expressions are treated as arbitrary code that needs to be compiled normally and run. Most well-written namespaces will consist primarily of expressions in the former category, so this technique is quite effective.
Every namespace's initialization type gets a `NamespaceBodyAttribute` CLR metadata attribute. This data is effectively baked into the compiled binary and loaded very efficiently by the VM. Namespace loading at runtime is reduced to feeding data from this attribute into the `Compiler.InitializeNamespace` method.
`NamespaceBodyAttribute` contains four fields, all of which are arrays. The lengths of the arrays are guaranteed to be equal, and each index into the arrays corresponds to an expression at the top level of the namespace. The fields are:
- `string[] names`: if expression `i` is a `defn` with constant metadata, `names[i]` is the name of the var. Otherwise `names[i]` is `null`.
- `Type[] types`: if expression `i` is a `defn` with constant metadata, `types[i]` is the `IFn` subclass to associate with the var. Otherwise `types[i]` is a container type that stores the compiled expression in its static initializer.
- `StaticMetadataFlags[] metadataFlags`: if expression `i` is a `defn` with constant metadata, `metadataFlags[i]` is a bitmask of which static metadata flags to set when constructing the var, as described by the `StaticMetadataFlags` enum type. Otherwise, `metadataFlags[i]` is `StaticMetadataFlags.None` and ignored.
- `Type[] metadataTypes`: if expression `i` is a `defn` with constant metadata, `metadataTypes[i]` is the container type storing the metadata construction and assignment bytecode in its static initializer, to be loaded lazily by the var. Otherwise, `metadataTypes[i]` is `null`.
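The four arrays and the loading loop they drive can be sketched roughly as below. This is an illustration under stated assumptions rather than the actual implementation: the attribute and field names follow the description above, but the `Var` and `Loader` classes, the body of `InitializeNamespace`, the enum members other than `None`, and the example types (`Fn_inc`, `Meta_inc`, `Expr_1`, `Demo_Init`) are all hypothetical. A real `IFn` subclass is replaced by a plain class for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

[Flags]
public enum StaticMetadataFlags { None = 0, Dynamic = 1, Macro = 2, Private = 4 }

[AttributeUsage(AttributeTargets.Class)]
public sealed class NamespaceBodyAttribute : Attribute
{
    public string[] names;
    public Type[] types;
    public StaticMetadataFlags[] metadataFlags;
    public Type[] metadataTypes;
}

// Minimal stand-in for a var.
public sealed class Var
{
    public object Root;
    public StaticMetadataFlags Flags;
    public Type MetaType; // container whose initializer would build the metadata
}

public static class Loader
{
    public static readonly Dictionary<string, Var> Vars = new Dictionary<string, Var>();
    public static readonly List<string> Log = new List<string>();

    // Hypothetical analogue of Compiler.InitializeNamespace.
    public static void InitializeNamespace(Type initType)
    {
        var body = (NamespaceBodyAttribute)Attribute.GetCustomAttribute(
            initType, typeof(NamespaceBodyAttribute));
        for (int i = 0; i < body.names.Length; i++)
        {
            if (body.names[i] != null)
            {
                // defn with constant metadata: the same generic code handles
                // every entry, so there is no per-expression method to JIT.
                var v = new Var();
                v.Root = Activator.CreateInstance(body.types[i]);
                v.Flags = body.metadataFlags[i];
                v.MetaType = body.metadataTypes[i];
                Vars[body.names[i]] = v;
            }
            else
            {
                // Arbitrary expression: run the container's static initializer.
                RuntimeHelpers.RunClassConstructor(body.types[i].TypeHandle);
            }
        }
    }
}

// Stand-ins for compiler-generated types.
public sealed class Fn_inc { public long Invoke(long x) { return x + 1; } }
public static class Meta_inc { }
public static class Expr_1
{
    static Expr_1() { Loader.Log.Add("ran top-level expression"); }
}

[NamespaceBody(
    names = new string[] { "inc", null },
    types = new Type[] { typeof(Fn_inc), typeof(Expr_1) },
    metadataFlags = new StaticMetadataFlags[] { StaticMetadataFlags.Private, StaticMetadataFlags.None },
    metadataTypes = new Type[] { typeof(Meta_inc), null })]
public static class Demo_Init { }
```

Calling `Loader.InitializeNamespace(typeof(Demo_Init))` in this sketch binds one var from attribute data alone and runs one container initializer for the remaining expression.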
We store arbitrary code in container types' static initializers for lazy metadata loading and top-level expressions. This is a bit of a hack: it avoids looking up a method by name reflectively at runtime, using the .NET framework's `RuntimeHelpers.RunClassConstructor` method instead, as we do here and here.
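For comparison, the two ways of running code stored in a generated type can be sketched like this. The type and method names (`Counter`, `MethodContainer`, `InitializerContainer`, `Runner`) are hypothetical; `Type.GetMethod`, `MethodInfo.Invoke`, and `RuntimeHelpers.RunClassConstructor` are standard .NET APIs.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class Counter
{
    public static int Runs;
}

// Alternative being avoided: the code lives in a named static method
// that has to be found by name through reflection.
public static class MethodContainer
{
    public static void Run() { Counter.Runs += 1; }
}

// Approach described in the text: the code lives in the type initializer.
public static class InitializerContainer
{
    static InitializerContainer() { Counter.Runs += 10; }
}

public static class Runner
{
    public static void RunByName(Type t)
    {
        // String-based lookup followed by a reflective call.
        MethodInfo m = t.GetMethod("Run", BindingFlags.Public | BindingFlags.Static);
        m.Invoke(null, null);
    }

    public static void RunInitializer(Type t)
    {
        // No name lookup; the runtime runs the initializer at most once.
        RuntimeHelpers.RunClassConstructor(t.TypeHandle);
    }
}
```

One consequence of the initializer approach, visible in the sketch, is that a second call to `RunInitializer` for the same type does nothing, which suits code that must run exactly once.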