Clojure on Fennel part one: Persistent Data Structures
Somewhere in 2019 I started a project that aimed to bring some of Clojure's features to the Lua runtime - fennel-cljlib.
It was a library for Fennel that implemented a basic subset of the clojure.core namespace functions and macros.
My goal was simple - I enjoy working with Clojure, but I don't use it for hobby projects, so I wanted Fennel to feel more Clojure-like, beyond what it already provides in that direction.
This library grew over the years: I implemented lazy sequences, added immutability, made a testing library inspired by clojure.test and kaocha, and even made a port of clojure.core.async.
It was a passion project - I almost never used it to write actual software.
One notable exception is fenneldoc - a tool for documentation generation for Fennel libraries.
And I haven’t seen anyone else use it for a serious project.
The reason for that is simple - it was an experiment. Corners were cut, and Fennel, being a Clojure-inspired lisp, is not associated with functional programming the same way Clojure is. As a matter of fact, I wouldn't recommend using this library for anything serious… yet.
Recently, however, I started a new project: ClojureFnl.
This is a Clojure-to-Fennel compiler that uses fennel-cljlib as a foundation.
It's still early in development - I worked on it in private for a few months until I found a suitable way to make things work in March.
As of this moment, it is capable of compiling most of the .cljc files I've thrown at it, but running the compiled code is a different matter.
I mean, it works to some degree, but support for the standard library is far from done.
```clojure
;; Welcome to ClojureFnl REPL
;; ClojureFnl v0.0.1
;; Fennel 1.6.1 on PUC Lua 5.5
user=> (defn prime? [n]
         (not (some zero? (map #(rem n %) (range 2 n)))))
#<function: 0x89ba7c550>
user=> (for [x (range 3 33 2)
             :when (prime? x)]
         x)
(3 5 7 11 13 17 19 23 29 31)
user=>
```
However, there was a problem.
My initial implementation of immutable data structures in the itable library had a serious flaw. The whole library was a simple hack based on a copy-on-write approach and a bunch of Lua metatables to enforce immutability. As a result, all operations were extremely slow. That was fine for an experiment, but if I wanted to go further with ClojureFnl, I had to replace it. The same problem plagued immutableredblacktree.lua, an implementation of a copy-on-write red-black tree I made for sorted maps - it did a full copy of the tree each time it was modified.
For associative tables it wasn't that big of a deal - maps usually contain a small number of keys, and itable only copied the levels that needed to change.
So, if you had a map with, say, ten keys, and each of those keys contained another map with ten keys, adding, removing, or updating a key in the outer map meant copying only those ten keys - not the whole nested map.
I could do that reliably, because inner maps were immutable too.
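This path-copying idea is compact enough to sketch in a few lines; here it is in Python for illustration (`assoc_in` is a hypothetical name, not itable's API):

```python
def assoc_in(m, path, value):
    """Copy-on-write update: only the dicts along `path` are copied;
    every untouched subtree is shared with the original."""
    if not path:
        return value
    copy = dict(m)  # shallow copy of this level only
    copy[path[0]] = assoc_in(m.get(path[0], {}), path[1:], value)
    return copy
```

Updating a key two levels deep copies two small dicts; sibling subtrees are reused as-is, which is only safe because they are never mutated.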
But for arrays the story is usually quite different. Arrays often store a lot of indices and are rarely nested (or at least not as often as maps), and copying an array on every change quickly becomes expensive. I mitigated some of the performance problems by implementing my own version of transients; however, the beauty of Clojure's data structures is that they're quite fast even without this optimization.
Proper persistent data structures
Clojure uses a persistent HAMT as the base for its hash maps and sets, and a bit-partitioned trie for vectors. For sorted maps and sets, Clojure uses an immutable red-black tree implementation, but as far as I know it doesn't do a full copy of the tree - it has structural sharing properties as well.
I started looking into existing implementations of HAMT for Lua:
- hamt.lua (based on mattbierner/hamt)
- seemed incomplete
- ltrie
- no transients
- no hashset
- no ordered map (expected, it's a different algorithm)
- no compound vector/hash
- Michael-Keith Bernard’s gist
- no custom hashing
I could use one of those - notably, ltrie seemed the most appropriate - but given that I'm working on a Fennel library that I later want to embed into my Clojure compiler, I needed a library implemented in Fennel.
So I made my own library: immutable.fnl. This library features HAMT-hash maps, hash-sets, and vectors, as well as a better implementation of a persistent red-black tree, and lazy linked lists.
Persistent Hash Map
I started the implementation with a persistent HAMT using native Lua hashing. The data structure itself is a Hash Array Mapped Trie (HAMT) with a branching factor of 16. Thus all operations are O(log16 N), which is effectively O(1) for a practical number of keys.
As far as I know, Clojure uses a branching factor of 32, but on a Lua runtime that would make the popcount more expensive, and despite a shallower tree, each mutation would need to copy a larger sparse array. With a branching factor of 16, a map with 50K entries is ~4 levels deep, versus ~3 with a branching factor of 32. So my reasoning was that it's a reasonable compromise, especially since Lua is not the JVM when it comes to performance.
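For reference, the index math behind a 16-way HAMT boils down to two small functions, sketched here in Python (names are illustrative, not the library's):

```python
BITS = 4                      # 16-way branching: 4 bits of the hash per level
MASK = (1 << BITS) - 1

def child_slot(hash_value, level):
    """Which of the 16 logical slots the hash selects at a given trie level."""
    return (hash_value >> (level * BITS)) & MASK

def sparse_index(bitmap, slot):
    """Position in the node's packed child array: count the occupied slots
    below `slot` in the node's occupancy bitmap (the popcount step)."""
    return bin(bitmap & ((1 << slot) - 1)).count("1")
```

With a branching factor of 32 you'd take 5 bits per level and count bits in a wider bitmap - which is exactly the extra popcount and copying cost mentioned above.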
Of course, it's not as fast as a plain Lua table, which is to be expected. Lua tables are implemented in C, use efficient hashing, and are dynamically re-allocated based on key count. So most operations in my implementation are a lot slower, but the per-operation time is still usable.
Here are some benchmarks:
Median time over 7 rounds (1 warmup discarded), N = 50000 elements. GC stopped during measurement. Clock: os.clock (CPU). Runtime: Fennel 1.7.0-dev on PUC Lua 5.5
Regular operations are notably slower when compared to Lua:
| Operation | Lua table | Persistent HashMap | Ratio | per op |
|---|---|---|---|---|
| insert 50000 random keys | 2.05 ms | 164.80 ms | 80.3x slower | 3.3 us |
| lookup 50000 random keys | 0.83 ms | 92.51 ms | 110.8x slower | 1.9 us |
| delete all | 0.78 ms | 170.78 ms | 219.8x slower | 3.4 us |
| delete 10% | 0.14 ms | 19.50 ms | 136.4x slower | 3.9 us |
| iterate 50000 entries | 1.74 ms | 6.64 ms | 3.8x slower | 0.133 us |
For transients the situation is a bit better, but not by much:
| Operation | Lua table | Transient HashMap | Ratio | per op |
|---|---|---|---|---|
| insert 50000 random keys | 2.05 ms | 89.17 ms | 43.5x slower | 1.8 us |
| delete all | 0.76 ms | 104.31 ms | 138.0x slower | 2.1 us |
| delete 10% | 0.16 ms | 12.71 ms | 82.0x slower | 2.5 us |
On LuaJIT the ratios may seem worse, but the per-operation cost is much lower - it's just that native table operations are so much faster:
Median time over 7 rounds (1 warmup discarded), N = 50000 elements. GC stopped during measurement. Clock: os.clock (CPU). Runtime: Fennel 1.7.0-dev on LuaJIT 2.1.1774896198 macOS/arm64
| Operation | Lua table | Persistent HashMap | Ratio | per op |
|---|---|---|---|---|
| insert 50000 random keys | 0.86 ms | 49.05 ms | 56.8x slower | 0.981 us |
| lookup 50000 random keys | 0.27 ms | 14.21 ms | 53.4x slower | 0.284 us |
| delete all | 0.13 ms | 48.63 ms | 374.1x slower | 0.973 us |
| delete 10% | 0.05 ms | 6.49 ms | 138.1x slower | 1.3 us |
| iterate 50000 entries | 0.07 ms | 1.80 ms | 27.7x slower | 0.036 us |
| Operation | Lua table | Transient HashMap | Ratio | per op |
|---|---|---|---|---|
| insert 50000 random keys | 0.76 ms | 22.43 ms | 29.6x slower | 0.449 us |
| delete all | 0.15 ms | 34.16 ms | 232.4x slower | 0.683 us |
| delete 10% | 0.04 ms | 5.02 ms | 132.1x slower | 1.0 us |
With a branching factor of 32 the situation gets worse on PUC Lua, but is slightly better on LuaJIT. So there’s still space for fine-tuning.
For hashing strings and objects I decided to use the djb2 algorithm.
I am almost as old as this hash function, so it seemed like a good fit.
JK.
The main reason to use it is that it can be implemented even without bit-wise operators, which not all Lua versions have.
It only uses the +, *, and % arithmetic operators, so it works on any Lua version.
It's prone to collisions, and I try to mitigate that by randomizing it when the library is loaded.
Still, collisions do happen, but the HAMT core ensures they still resolve correctly, thanks to a deep equality function implemented for most objects.
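The arithmetic-only core of djb2 can be sketched like this (Python for illustration; treating the randomization as a configurable seed is my assumption, not necessarily how the library does it):

```python
def djb2(s, seed=5381):
    """djb2: h = h * 33 + byte, truncated to 32 bits.
    Only +, *, and % are needed - no bit-wise operators."""
    h = seed
    for ch in s:
        h = (h * 33 + ord(ch)) % 2**32
    return h
```

Randomizing the seed at load time changes where strings land without changing the algorithm itself.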
However, when first working on this, I noticed this:
```fennel
>> (local hash-map (require :io.gitlab.andreyorst.immutable.PersistentHashMap))
nil
>> (local {: hash} (require :io.gitlab.andreyorst.immutable.impl.hash))
nil
>> (hash (hash-map :foo 1 :bar 2))
161272824
>> (hash {:foo 1 :bar 2})
161272824
>> (hash-map (hash-map :foo 1 :bar 2) 1 {:foo 1 :bar 2} 2)
{{:foo 1 :bar 2} 2}
```
This is an interesting loophole. Which object ended up in our hash map as a key - our persistent map or the plain Lua table? Well, that depends on insertion order:
```fennel
>> (each [_ k (pairs (hash-map (hash-map :foo 1 :bar 2) 1 {:foo 1 :bar 2} 2))]
     (print (getmetatable k)))
IPersistentHashMap: 0x824d9b570
nil
>> (each [_ k (pairs (hash-map {:foo 1 :bar 2} 2 (hash-map :foo 1 :bar 2) 1))]
     (print (getmetatable k)))
nil
nil
```
To reiterate: I'm creating a hash map with a key set to another persistent hash map, and then inserting a plain Lua table with the same contents. The Lua table hashes to exactly the same value and goes into the same bucket, but there's no collision entry, because the objects are equal by value. But equality of mutable collections is very loosely defined - they may be equal right now, and different the next time you look. So a different hashing scheme was needed for persistent collections, to avoid these kinds of collisions. I ended up salting persistent collections' hashes with their prototype's address in memory.
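The salting step can be illustrated like this (a hypothetical sketch - the multiplier and exact mixing are assumptions, not the library's actual formula):

```python
def salted_hash(content_hash, proto_addr):
    """Mix the collection's prototype address into its content hash, so a
    persistent map and a plain table with equal contents hash differently."""
    return (content_hash * 6271 + proto_addr) % 2**32
```

Since every persistent collection shares one prototype table, equal persistent maps still hash alike, while a plain Lua table with the same contents no longer does.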
Other than that, the HAMT implementation is by the book, and the rest is the interface for interacting with maps.
Main operations:
- `new` - construct a new map from key-value pairs
- `assoc` - associate a key with a value
- `dissoc` - remove a key from the map
- `conj` - universal method for association, much like in Clojure
- `contains` - check if a key is in the map
- `count` - map size, constant time
- `get` - get a key's value from the map
- `keys` - get a lazy list of keys
- `vals` - get a lazy list of values
- `transient` - convert the map to a transient
Coercion/conversion:
- `from` - create a map from another object
- `to-table` - convert a map to a Lua table
- `iterator` - get an iterator to use in Lua loops
Transient operations:
- `assoc!` - mutable `assoc`
- `dissoc!` - mutable `dissoc`
- `persistent` - convert back to the persistent variant, and mark the transient as completed
This covers most of the needs of my fennel-cljlib library; anything beyond that I can implement myself, or adapt from existing implementations.
A Persistent Hash Set is also available as a thin wrapper around PersistentHashMap with a few method changes.
A note on `PersistentArrayMap`. In Clojure there is a second kind of map that is ordered, not sorted, called a persistent array map. It is used by default when defining a map with eight keys or fewer, like `{:foo 1 :bar 2}`. The idea is simple - for such a small map, a linear search through all keys is faster than a HAMT lookup. However, in my testing on the Lua runtime, there's no benefit to this kind of data structure, apart from it being an ordered variant. Lookup is slower because of the custom equality function, which does a deep comparison.
Persistent Vector
Persistent Vectors came next. While the trie structure is similar to hash maps, vectors use direct index-based navigation instead of hashing, with a branching factor of 32. Unlike in maps, the arrays in a vector's trie are densely packed, so a higher branching factor is better for performance. Lookup, update, and pop are O(log32 N); append can be considered O(1) amortized.
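The navigation itself is just slicing the index into 5-bit digits, one per trie level - a Python sketch of the idea (not the library's code):

```python
BITS = 5                     # 32-way branching: 5 bits of the index per level
MASK = (1 << BITS) - 1

def trie_path(index, depth):
    """The child slot chosen at each trie level for a given index,
    from the root down to the leaf."""
    return [(index >> (level * BITS)) & MASK
            for level in range(depth - 1, -1, -1)]
```

For example, in a 3-level trie, index 1234 decomposes as 1*1024 + 6*32 + 18, so the lookup visits slot 1 at the root, slot 6 below it, and slot 18 in the leaf.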
Still, compared to plain Lua sequential tables the performance is not as good:
Median time over 7 rounds (1 warmup discarded), N = 50000 elements. GC stopped during measurement. Clock: os.clock (CPU). Runtime: Fennel 1.7.0-dev on PUC Lua 5.5
| Operation | Lua table | Persistent Vector | Ratio | per op |
|---|---|---|---|---|
| insert 50000 elements | 0.19 ms | 21.07 ms | 109.7x slower | 0.421 us |
| lookup 50000 random indices | 0.47 ms | 14.05 ms | 29.7x slower | 0.281 us |
| update 50000 random indices | 0.32 ms | 70.04 ms | 221.6x slower | 1.4 us |
| pop all 50000 elements | 0.25 ms | 24.34 ms | 96.2x slower | 0.487 us |
| iterate 50000 elements | 0.63 ms | 10.16 ms | 16.2x slower | 0.203 us |
| Operation | Lua table | Transient Vector | Ratio | per op |
|---|---|---|---|---|
| insert 50000 elements | 0.19 ms | 7.81 ms | 40.3x slower | 0.156 us |
| update 50000 random indices | 0.33 ms | 20.76 ms | 62.4x slower | 0.415 us |
| pop all 50000 elements | 0.25 ms | 11.14 ms | 44.4x slower | 0.223 us |
On LuaJIT:
Median time over 7 rounds (1 warmup discarded), N = 50000 elements. GC stopped during measurement. Clock: os.clock (CPU). Runtime: Fennel 1.7.0-dev on LuaJIT 2.1.1774896198 macOS/arm64
| Operation | Lua table | Persistent Vector | Ratio | per op |
|---|---|---|---|---|
| insert 50000 elements | 0.10 ms | 7.62 ms | 74.0x slower | 0.152 us |
| lookup 50000 random indices | 0.06 ms | 0.67 ms | 11.8x slower | 0.013 us |
| update 50000 random indices | 0.04 ms | 29.13 ms | 710.4x slower | 0.583 us |
| pop all 50000 elements | 0.02 ms | 8.62 ms | 410.4x slower | 0.172 us |
| iterate 50000 elements | 0.02 ms | 0.57 ms | 28.7x slower | 0.011 us |
| Operation | Lua table | Transient Vector | Ratio | per op |
|---|---|---|---|---|
| insert 50000 elements | 0.05 ms | 0.59 ms | 11.6x slower | 0.012 us |
| update 50000 random indices | 0.04 ms | 2.06 ms | 51.6x slower | 0.041 us |
| pop all 50000 elements | 0.02 ms | 0.84 ms | 46.7x slower | 0.017 us |
I think this is still OK performance. Vectors don't use hashing - lookup is a direct traversal of the trie via bit-shifting, so it's just index math.
Operations on vectors include:
- `new` - constructor
- `conj` - append to the tail
- `assoc` - change the value at a given index
- `count` - element count (constant time)
- `get` - get the value at a given index
- `pop` - remove the last element
- `transient` - convert to a transient
- `subvec` - create a slice of the vector in constant time
Transient operations:
- `assoc!` - mutable `assoc`
- `conj!` - mutable `conj`
- `pop!` - mutable `pop`
- `persistent` - convert back to persistent and finalize
Interop:
- `from` - creates a vector from any other collection
- `iterator` - returns an iterator for use in Lua loops
- `to-table` - converts to a sequential Lua table
One notable difference in both vectors and hash maps is that they allow nil to be used as a value (and as a key, in the case of hash maps).
Vectors don't have the problem Lua sequential tables have, where length is not well-defined if the table has holes in it.
Whether allowing nil as a value (and especially as a key) is a good decision is a debate for another time, but Clojure already made it for me.
So for this project I decided to support it.
Persistent Red-Black Tree
For sorted maps and sorted sets I chose Okasaki’s insertion and Germane & Might’s deletion algorithms. Most of the knowledge I got from this amazing blog post by Matt Might.
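Okasaki's insertion rebalance is famously compact; here's a sketch of it in Python, with nodes as `(color, left, key, right)` tuples (an illustrative representation, not the library's):

```python
R, B = "red", "black"
# A node is a (color, left, key, right) tuple; None is the empty tree.

def balance(color, left, key, right):
    """Okasaki's rebalance: a black node with two consecutive red nodes
    below it is rewritten into a red node with two black children."""
    if color == B:
        if left and left[0] == R:
            if left[1] and left[1][0] == R:                    # left-left
                _, (_, a, x, b), y, c = left
                return (R, (B, a, x, b), y, (B, c, key, right))
            if left[3] and left[3][0] == R:                    # left-right
                _, a, x, (_, b, y, c) = left
                return (R, (B, a, x, b), y, (B, c, key, right))
        if right and right[0] == R:
            if right[1] and right[1][0] == R:                  # right-left
                _, (_, b, y, c), z, d = right
                return (R, (B, left, key, b), y, (B, c, z, d))
            if right[3] and right[3][0] == R:                  # right-right
                _, b, y, (_, c, z, d) = right
                return (R, (B, left, key, b), y, (B, c, z, d))
    return (color, left, key, right)

def insert(tree, key):
    """Persistent insertion: rebuilds only the path from root to leaf."""
    def ins(t):
        if t is None:
            return (R, None, key, None)
        c, l, k, r = t
        if key < k: return balance(c, ins(l), k, r)
        if key > k: return balance(c, l, k, ins(r))
        return t
    c, l, k, r = ins(tree)
    return (B, l, k, r)  # the root is always blackened
```

Everything off the insertion path is shared with the old tree, which is where the structural sharing comes from. Deletion (the Germane & Might part) is considerably hairier and is omitted here.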
I believe the operations are O(log N), as for any balanced binary tree, but given that the deletion algorithm is tricky, I'm not exactly sure:
Median time over 7 rounds (1 warmup discarded), N = 50000 elements. GC stopped during measurement. Clock: os.clock (CPU). Runtime: Fennel 1.7.0-dev on PUC Lua 5.5
| Operation | Lua table | PersistentTreeMap | Ratio | per op |
|---|---|---|---|---|
| insert 50000 random keys | 2.10 ms | 209.23 ms | 99.8x slower | 4.2 us |
| lookup 50000 random keys | 0.88 ms | 82.97 ms | 94.2x slower | 1.7 us |
| delete all | 0.74 ms | 173.76 ms | 234.8x slower | 3.5 us |
On LuaJIT:
Median time over 7 rounds (1 warmup discarded), N = 50000 elements. GC stopped during measurement. Clock: os.clock (CPU). Runtime: Fennel 1.7.0-dev on LuaJIT 2.1.1774896198 macOS/arm64
| Operation | Lua table | PersistentTreeMap | Ratio | per op |
|---|---|---|---|---|
| insert 50000 random keys | 0.72 ms | 101.08 ms | 140.4x slower | 2.0 us |
| lookup 50000 random keys | 0.25 ms | 12.67 ms | 49.9x slower | 0.253 us |
| delete all | 0.14 ms | 56.14 ms | 403.9x slower | 1.1 us |
The API for sorted maps and sets is the same as for their hash counterparts, with one small difference - no transients. Clojure doesn't provide them, so I'm not providing them either.
That’s all for benchmarks.
I know that there are many problems with this kind of benchmarking, so take it with a grain of salt.
Still, the results are far, far better than what I had with itable.
But there are two more data structures to talk about.
Persistent List
As I mentioned, I made a lazy persistent list implementation a while ago, but it had its problems, and I couldn't integrate that library with the current one well enough.
The main problem was that this library uses a single shared metatable per data structure, and the old implementation of lazy lists didn't. That difference makes it hard to check whether an object is a table, hash-map, list, vector, set, etc. So I reimplemented them.
The reason the old implementation used different metatables was that I had decided to try the approach described in the Reversing the technical interview post by Kyle Kingsbury (Aphyr). I know this post is more of a fun joke, but it actually makes sense to define linked lists like that in Lua.
See, tables are mutable, and you can't do much about it.
Closures, on the other hand, are much harder to mutate - you can still do it via the debug module, but it's hard, and the module is not always available.
So storing head and tail in function closures was a deliberate choice.
However, it meant that I needed to somehow attach metadata to the function to make it act like a data structure, and you can't just use setmetatable on a function.
Again, you can use debug.setmetatable, but then all function objects share the same metatable.
So, while you can do fancy things like this:
```fennel
>> (fn comp [f g] (fn [...] (f (g ...))))
#<function: 0x7bdb320a0>
>> (debug.setmetatable (fn []) {:__add comp})
#<function: 0x7bd17f040>
>> ((+ string.reverse string.upper) "foo")
"OOF"
```
You can also notice that our + overload applied even to functions from the string module - every function in the runtime shares that one metatable.
So instead, we used a table, and wrapped it with a metatable that has a __call metamethod, essentially making the table act like a function.
This, in turn, means we had to create two tables per list node - one to give to the user, the other to hold our __call and serve as the metatable.
Convoluted, I know.
It's all in the past now - the current implementation is a simple {:head 42 :tail {...}} table.
Not sure which is worse.
But that meant I had to rework how lazy lists worked, because previously laziness was just a metatable swap.
Now a lazy list stores a "thunk" that, when called, replaces itself in the node with the :head and :tail keys.
Unless it's an empty list, of course - in that case we swap the metatable to the empty-list one.
So lists have three metatables now:
- IPersistentList
- IPersistentList$Empty
- IPersistentList$Lazy

Instead of god knows how many in the old implementation.
The list interface is also better now. Previously, the way a list was constructed from a data structure was hardcoded. The current implementation still hardcodes it, but additionally allows building a list lazily from an iterator.
This is better because even for a custom data structure with a weird iteration scheme (like the maps and sets in this library), we can still convert it to a list. The general case is just:
```fennel
(PersistentList.from-iterator #(pairs data) (fn [_ v] v))
```
Meaning we pass a function that produces the iterator, and a function that captures values from that iterator. It reminds me of Clojure transducers in some way.
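The same iterator-to-lazy-list trick is easy to sketch in Python with memoized thunks (hypothetical names - the real from-iterator takes an iterator factory and a value-extraction function, just as described above):

```python
def cached(thunk):
    """Memoize a thunk so each node of the list is computed at most once."""
    cell, done = None, False
    def force():
        nonlocal cell, done
        if not done:
            cell, done = thunk(), True
        return cell
    return force

def lazy_list(make_iter, extract):
    """Build a lazy linked list from an iterator factory. Forcing a node
    yields (head, tail-thunk), or None once the iterator is exhausted."""
    it = make_iter()
    def node():
        try:
            item = next(it)
        except StopIteration:
            return None
        return (extract(item), cached(node))
    return cached(node)
```

Memoization is what makes this safe over a stateful iterator: forcing the same node twice returns the cached pair instead of advancing the iterator again.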
Persistent Queue
And the final data structure - a persistent queue: fast append at the end, and fast removal from the front.
It's done by holding two collections - a linked list at the front, and a persistent vector at the rear. So removing from the list is O(1), and appending to the vector is also pretty much O(1).
Interesting things start to happen when we exhaust the list part - we need to move the vector's contents into the list.
This is done by calling PersistentList.from on the rear.
And building a list out of a persistent vector is an O(1) operation as well!
Well, because nothing happens eagerly - we simply create an iterator and build the list lazily.
And since indexing the vector is essentially O(1), we can say we still retain this property.
Or at least that's how I reasoned about it - I'm not that good with time-complexity stuff.
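The two-part queue can be sketched like this (Python, with illustrative names; note this sketch moves the rear eagerly, whereas the real implementation defers the move by building the front as a lazy list):

```python
class PQueue:
    """Persistent FIFO queue: pop from `front`, push onto `rear`."""
    def __init__(self, front=(), rear=()):
        self.front, self.rear = tuple(front), tuple(rear)

    def push(self, x):
        # Appending to the rear leaves the front untouched.
        return PQueue(self.front, self.rear + (x,))

    def pop(self):
        if self.front:
            return self.front[0], PQueue(self.front[1:], self.rear)
        # Front exhausted: the rear becomes the new front.
        return self.rear[0], PQueue(self.rear[1:], ())
```

In this eager sketch the front-exhausted move costs O(n); with a lazy front list the move is deferred node by node, which is what the ~O(1) claim rests on.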
ClojureFnl
That concludes part one about ClojureFnl.
I know this post was not about ClojureFnl at all, but I had to fix my underlying implementation first. Now that I have better data structures to build on, I can get back to working on the compiler. So the next post will hopefully be about the compiler itself.
Unless I get distracted again.




