This post walks through a small web development project using Clojure, covering everything from building the app to packaging and deploying it. It’s a collection of insights and tips I’ve learned from building my Clojure side projects, but presented in a more structured format.
As the title suggests, we’ll be deploying the app to Fly.io. It’s a service that allows you to deploy apps packaged as Docker images on lightweight virtual machines.[1] My experience with it has been good; it’s easy to use and quick to set up. One downside of Fly is that it doesn’t have a free tier, but if you don’t plan on leaving the app deployed, it barely costs anything.
This isn’t a tutorial on Clojure, so I’ll assume you already have some familiarity with the language as well as some of its libraries.[2]
In this post, we’ll be building a barebones bookmarks manager for the demo app. Users can log in using basic authentication, view all bookmarks, and create a new bookmark. It’ll be a traditional multi-page web app and the data will be stored in a SQLite database.
Here’s an overview of the project’s starting directory structure:
And here are the libraries we’re going to use. If you have some Clojure experience or have used Kit, you’re probably already familiar with all the libraries listed below.[3]
I use Aero and Integrant for my system configuration (more on this in the next section), Ring with the Jetty adaptor for the web server, Reitit for routing, next.jdbc for database interaction, and Hiccup for rendering HTML. From what I’ve seen, this is a popular “library combination” for building web apps in Clojure.[4]
The user namespace in dev/user.clj contains helper functions from Integrant-repl to start, stop, and restart the Integrant system.
dev/user.clj
(ns user
  (:require
   [acme.main :as main]
   [clojure.tools.namespace.repl :as repl]
   [integrant.core :as ig]
   [integrant.repl :refer [set-prep! go halt reset reset-all]]))

(set-prep!
 (fn [] (ig/expand (main/read-config)))) ;; we'll implement this soon

(repl/set-refresh-dirs "src" "resources")

(comment
  (go)
  (halt)
  (reset)
  (reset-all))
If you’re new to Integrant or other dependency injection libraries like Component, I’d suggest reading “How to Structure a Clojure Web”. It’s a great explanation of the reasoning behind these libraries. Like most Clojure apps that use Aero and Integrant, my system configuration lives in a .edn file; I usually name mine resources/config.edn. Here’s what it looks like:
In production, most of these values will be set using environment variables. During local development, the app will use the hard-coded default values. We don’t have any sensitive values in our config (e.g., API keys), so it’s fine to commit this file to version control. If there are such values, I usually put them in another file that’s not tracked by version control and include them in the config file using Aero’s #include reader tag.
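For illustration, such an include might look like this (the file and key names here are hypothetical, not from the actual project):

```clojure
;; resources/config.edn (hypothetical sketch)
{:secrets #include "secrets.edn"} ; secrets.edn is kept out of version control
```

Aero resolves the `#include` at read time, so the untracked file’s contents are merged into the config as if they were written inline.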
This config file is then “expanded” into the Integrant system map using the expand-key method:
The system map is created in code instead of being in the configuration file. This makes refactoring your system simpler as you only need to change this method while leaving the config file (mostly) untouched.[5]
My current approach to Integrant + Aero config files is mostly inspired by the blog post “Rethinking Config with Aero & Integrant” and Laravel’s configuration. The config file follows a similar structure to Laravel’s config files and contains the app configurations without describing the structure of the system. Previously, I had a key for each Integrant component, which led to the config file being littered with #ig/ref and more difficult to refactor.
Also, if you haven’t already, start a REPL and connect to it from your editor. Run clj -M:dev if your editor doesn’t automatically start a REPL. Next, we’ll implement the init-key and halt-key! methods for each of the components:
src/acme/main.clj
(ns acme.main
  (:require
   ;; ...
   [acme.handler :as handler]
   [acme.util :as util]
   [next.jdbc :as jdbc]
   [ring.adapter.jetty :as jetty]))

;; ...

(defmethod ig/init-key :server/jetty
  [_ opts]
  (let [{:keys [handler port]} opts
        jetty-opts (-> opts
                       (dissoc :handler :auth)
                       (assoc :join? false))
        server (jetty/run-jetty handler jetty-opts)]
    (println "Server started on port" port)
    server))

(defmethod ig/halt-key! :server/jetty
  [_ server]
  (.stop server))

(defmethod ig/init-key :handler/ring
  [_ opts]
  (handler/handler opts))

(defmethod ig/init-key :database/sql
  [_ opts]
  (let [datasource (jdbc/get-datasource opts)]
    (util/setup-db datasource)
    datasource))
The setup-db function creates the required tables in the database if they don’t exist yet. This works fine for database migrations in small projects like this demo app, but for larger projects, consider using libraries such as Migratus (my preferred library) or Ragtime.
src/acme/util.clj
(ns acme.util
  (:require
   [next.jdbc :as jdbc]))

(defn setup-db [db]
  (jdbc/execute-one!
   db
   ["create table if not exists bookmarks (
       bookmark_id text primary key not null,
       url text not null,
       created_at datetime default (unixepoch()) not null
     )"]))
For the server handler, let’s start with a simple function that returns a “hi world” string.
Now all the components are implemented. We can check if the system is working properly by evaluating (reset) in the user namespace. This will reload your files and restart the system. You should see this message printed in your REPL:
:reloading (acme.util acme.handler acme.main)
Server started on port 8080
:resumed
If we send a request to http://localhost:8080/, we should get “hi world” as the response:
$ curl localhost:8080/
# hi world
Nice! The system is working correctly. In the next section, we’ll implement routing and our business logic handlers.
If you remember the :handler/ring from earlier, you’ll notice that it has two dependencies, database and auth. Currently, they’re inaccessible to our route handlers. To fix this, we can inject these components into the Ring request map using a middleware function.
The components-middleware function takes in a map of components and creates a middleware function that “assocs” each component into the request map.[6] If you have more components such as a Redis cache or a mail service, you can add them here.
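As a rough illustration of the idea (sketched in Python rather than Clojure, with made-up component names):

```python
# Sketch of what a components middleware does: close over a map of
# components and merge them into every incoming request map.
def components_middleware(components):
    def wrap(handler):
        def wrapped(request):
            # each component (db, auth, ...) becomes a key in the request
            return handler({**request, **components})
        return wrapped
    return wrap

# hypothetical components; in the real app these come from the system config
middleware = components_middleware({"db": "datasource", "auth": {"user": "hi"}})
handler = middleware(lambda req: req)
print(handler({"uri": "/"})["db"])  # prints "datasource"
```

The route handlers can then read the database or auth config straight off the request map, without any global state.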
We’ll also need a middleware to handle HTTP basic authentication.[7] This middleware will check if the username and password from the request map match the values in the auth map injected by components-middleware. If they match, then the request is authenticated and the user can view the site.
A nice feature of Clojure is that interop with the host language is easy. The base64-encode function is just a thin wrapper over Java’s Base64.Encoder:
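To make the comparison concrete, here’s the basic-auth check sketched in Python (the function names are illustrative, not from the actual app):

```python
import base64

def basic_auth_header(username, password):
    # the client sends: Authorization: Basic base64("user:pass")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"

def authenticated(request_header, auth):
    # compare against the credentials injected by the components middleware
    expected = basic_auth_header(auth["user"], auth["password"])
    return request_header == expected

print(basic_auth_header("hi", "pw"))  # Basic aGk6cHc=
```

If the header doesn’t match, the middleware responds with a 401 and a `WWW-Authenticate: Basic` header so the browser prompts for credentials.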
We now have everything we need to implement the route handlers or the business logic of the app. First, we’ll implement the index-page function, which renders a page that:
Shows all of the user’s bookmarks in the database, and
Shows a form that allows the user to insert new bookmarks into the database
src/acme/handler.clj
(ns acme.handler
  (:require
   ;; ...
   [next.jdbc :as jdbc]
   [next.jdbc.sql :as sql]))

;; ...

(defn template [bookmarks]
  [:html
   [:head
    [:meta {:charset "utf-8"
            :name "viewport"
            :content "width=device-width, initial-scale=1.0"}]]
   [:body
    [:h1 "bookmarks"]
    [:form {:method "POST"}
     [:div
      [:label {:for "url"} "url "]
      [:input#url {:name "url"
                   :type "url"
                   :required true
                   :placeholder "https://en.wikipedia.org/"}]]
     [:button "submit"]]
    [:p "your bookmarks:"]
    [:ul
     (if (empty? bookmarks)
       [:li "you don't have any bookmarks"]
       (map (fn [{:keys [url]}]
              [:li [:a {:href url} url]])
            bookmarks))]]])

(defn index-page [req]
  (try
    (let [bookmarks (sql/query (:db req)
                               ["select * from bookmarks"]
                               jdbc/unqualified-snake-kebab-opts)]
      (util/render (template bookmarks)))
    (catch Exception e
      (util/server-error e))))

;; ...
Database queries can sometimes throw exceptions, so it’s good to wrap them in a try-catch block. I’ll also introduce some helper functions:
render takes a hiccup form and turns it into a ring response, while server-error takes an exception, logs it, and returns a 500 response.
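The shape of these two helpers, sketched in Python for illustration (the real versions live in acme.util and work on Ring response maps):

```python
def render(html, status=200):
    # wrap rendered HTML in a Ring-style response map
    return {"status": status,
            "headers": {"Content-Type": "text/html"},
            "body": html}

def server_error(exc):
    print(f"error: {exc}")  # stand-in for real logging
    return render("internal server error", status=500)

response = server_error(ValueError("boom"))
print(response["status"])  # 500
```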
Next, we’ll implement the index-action function:
src/acme/handler.clj
;; ...

(defn index-action [req]
  (try
    (let [{:keys [db form-params]} req
          value (get form-params "url")]
      (sql/insert! db :bookmarks {:bookmark_id (random-uuid)
                                  :url value})
      (res/redirect "/" 303))
    (catch Exception e
      (util/server-error e))))

;; ...
This is an implementation of a typical post/redirect/get pattern. We get the value from the URL form field, insert a new row in the database with that value, and redirect back to the index page. Again, we’re using a try-catch block to handle possible exceptions from the database query.
That should be all of the code for the controllers. If you reload your REPL and go to http://localhost:8080, you should see something that looks like this after logging in:
The last thing we need to do is to update the main function to start the system:
Now, you should be able to run the app using clj -M -m acme.main. That’s all the code needed for the app. In the next section, we’ll package the app into a Docker image to deploy to Fly.
While there are many ways to package a Clojure app, Fly.io specifically requires a Docker image. There are two approaches to doing this:
Build an uberjar and run it using Java in the container, or
Load the source code and run it using Clojure in the container
Both are valid approaches. I prefer the first since its only dependency is the JVM. We’ll use the tools.build library to build the uberjar. Check out the official guide for more information on building Clojure programs. Since it’s a library, to use it, we can add it to our deps.edn file with an alias:
Tools.build expects a build.clj file in the root of the project directory, so we’ll need to create that file. This file contains the instructions to build artefacts, which in our case is a single uberjar. There are many great examples of build.clj files on the web, including from the official documentation. For now, you can copy+paste this file into your project.
To build the project, run clj -T:build uber. This will create the uberjar standalone.jar in the target directory. The uber in clj -T:build uber refers to the uber function from build.clj. Since the build system is a Clojure program, you can customise it however you like. If we try to run the uberjar now, we’ll get an error:
# build the uberjar
$ clj -T:build uber
# Cleaning build directory...
# Copying files...
# Compiling Clojure...
# Building Uberjar...

# run the uberjar
$ java -jar target/standalone.jar
# Error: Could not find or load main class acme.main
# Caused by: java.lang.ClassNotFoundException: acme.main
This error occurred because the class that Java looks for wasn’t built. To fix this, we need to add the :gen-class directive to our main namespace. This instructs Clojure to compile the namespace into a named Java class whose static main method is generated from the -main function.
src/acme/main.clj
(ns acme.main
  ;; ...
  (:gen-class))

;; ...
If you rebuild the project and run java -jar target/standalone.jar again, it should work perfectly. Now that we have a working build script, we can write the Dockerfile:
Dockerfile
# install additional dependencies here in the base layer
# separate base from build layer so any additional deps installed are cached
FROM clojure:temurin-21-tools-deps-bookworm-slim AS base

FROM base AS build
WORKDIR /opt
COPY . .
RUN clj -T:build uber

FROM eclipse-temurin:21-alpine AS prod
COPY --from=build /opt/target/standalone.jar /
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "standalone.jar"]
It’s a multi-stage Dockerfile. We use the official Clojure Docker image as the layer to build the uberjar. Once it’s built, we copy it to a smaller Docker image that only contains the Java runtime.[8] By doing this, we get a smaller container image as well as a faster Docker build time because the layers are better cached.
That should be all for packaging the app. We can move on to the deployment now.
First things first, you’ll need to install flyctl, Fly’s CLI tool for interacting with their platform. Create a Fly.io account if you haven’t already. Then run fly auth login to authenticate flyctl with your account.
$ fly app create
# ? Choose an app name (leave blank to generate one):
# automatically selected personal organization: Ryan Martin
# New app created: blue-water-6489
Another way to do this is with the fly launch command, which automates a lot of the app configuration for you. We have some steps to do that are not done by fly launch, so we’ll be configuring the app manually. I also already have a fly.toml file ready that you can straight away copy to your project.
fly.toml
# replace these with your app and region name
# run `fly platform regions` to get a list of regions
app = 'blue-water-6489'
primary_region = 'sin'

[env]
DB_DATABASE = "/data/database.db"

[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = "stop"
auto_start_machines = true
min_machines_running = 0

[mounts]
source = "data"
destination = "/data"
initial_size = 1

[[vm]]
size = "shared-cpu-1x"
memory = "512mb"
cpus = 1
cpu_kind = "shared"
These are mostly the default configuration values with some additions. Under the [env] section, we’re setting the SQLite database location to /data/database.db. The database.db file itself will be stored in a persistent Fly Volume mounted on the /data directory. This is specified under the [mounts] section. Fly Volumes are similar to regular Docker volumes but are designed for Fly’s micro VMs.
We’ll need to set the AUTH_USER and AUTH_PASSWORD environment variables too, but not through the fly.toml file as these are sensitive values. To securely set these credentials with Fly, we can set them as app secrets. They’re stored encrypted and will be automatically injected into the app at boot time.
$ fly secrets set AUTH_USER=hi@ryanmartin.me AUTH_PASSWORD=not-so-secure-password
# Secrets are staged for the first deployment
With this, the configuration is done and we can deploy the app using fly deploy:
$ fly deploy
# ...
# Checking DNS configuration for blue-water-6489.fly.dev
# Visit your newly deployed app at https://blue-water-6489.fly.dev/
The first deployment will take longer since it’s building the Docker image for the first time. Subsequent deployments should be faster due to the cached image layers. You can click on the link to view the deployed app, or you can also run fly open, which will do the same thing. Here’s the app in action:
If you made additional changes to the app or fly.toml, you can redeploy the app using the same command, fly deploy. The app is configured to auto stop/start, which helps to cut costs when there’s not a lot of traffic to the site. If you want to take down the deployment, you’ll need to delete the app itself using fly app destroy <your app name>.
This is an interesting topic in the Clojure community, with varying opinions on whether or not it’s a good idea. Personally, I find having a REPL connected to the live app helpful, and I often use it for debugging and running queries on the live database.[9] Since we’re using SQLite, we don’t have a database server we can directly connect to, unlike Postgres or MySQL.
If you’re brave, you can even restart the app directly from the REPL without redeploying. It’s also easy to get wrong, which is why some prefer not to use it.
For this project, we’re gonna add a socket REPL. It’s very simple to add (you just need to add a JVM option) and it doesn’t require additional dependencies like nREPL. Let’s update the Dockerfile:
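One way to do this is with Clojure’s built-in clojure.server system property, passed as a JVM option in the ENTRYPOINT (a sketch; the exact Dockerfile line may differ from the original):

```dockerfile
ENTRYPOINT ["java", \
    "-Dclojure.server.repl={:port 7888 :accept clojure.core.server/repl}", \
    "-jar", "standalone.jar"]
```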
The socket REPL will be listening on port 7888. If we redeploy the app now, the REPL will be started, but we won’t be able to connect to it. That’s because we haven’t exposed the service through Fly proxy. We can do this by adding the socket REPL as a service in the [services] section in fly.toml.
However, doing this will also expose the REPL port to the public. This means that anyone can connect to your REPL and possibly mess with your app. Instead, what we want to do is to configure the socket REPL as a private service.
By default, all Fly apps in your organisation live in the same private network. This private network, called 6PN, connects the apps in your organisation through WireGuard tunnels (a VPN) using IPv6. Fly private services aren’t exposed to the public internet but can be reached from this private network. We can then use Wireguard to connect to this private network to reach our socket REPL.
Fly VMs are also configured with the hostname fly-local-6pn, which maps to its 6PN address. This is analogous to localhost, which points to your loopback address 127.0.0.1. To expose a service to 6PN, all we have to do is bind or serve it to fly-local-6pn instead of the usual 0.0.0.0. We have to update the socket REPL options to:
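Concretely, the JVM option becomes something like this, adding the :address key to the standard clojure.server property map (a sketch of the change):

```dockerfile
ENTRYPOINT ["java", \
    "-Dclojure.server.repl={:address \"fly-local-6pn\" :port 7888 :accept clojure.core.server/repl}", \
    "-jar", "standalone.jar"]
```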
After redeploying, we can use the fly proxy command to forward the port from the remote server to our local machine.[10]
$ fly proxy 7888:7888
# Proxying local port 7888 to remote [blue-water-6489.internal]:7888
In another shell, run:
$ rlwrap nc localhost 7888
# user=>
Now we have a REPL connected to the production app! rlwrap is used for readline functionality, e.g. up/down arrow keys, vi bindings. Of course, you can also connect to it from your editor.
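To automate deployments, you can add a GitHub Actions workflow that runs fly deploy on every push. A rough sketch, assuming the superfly/flyctl-actions helper (the file name and exact steps are illustrative):

```yaml
# .github/workflows/deploy.yml (illustrative)
name: Deploy
on:
  push:
    branches: [main]
  workflow_dispatch:
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - run: flyctl deploy --remote-only
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
```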
To get this to work, you’ll need to create a deploy token from your app’s dashboard. Then, in your GitHub repo, create a new repository secret called FLY_API_TOKEN with the value of your deploy token. Now, whenever you push to the main branch, this workflow will automatically run and deploy your app. You can also manually run the workflow from GitHub because of the workflow_dispatch option.
As always, all the code is available on GitHub. Originally, this post was just about deploying to Fly.io, but along the way, I kept adding on more stuff until it essentially became my version of the user manager example app. Anyway, hope this post provided a good view into web development with Clojure. As a bonus, here are some additional resources on deploying Clojure apps:
The way Fly.io works under the hood is pretty clever. Instead of running the container image with a runtime like Docker, the image is unpacked and “loaded” into a VM. See this video explanation for more details. ↩︎
Kit was a big influence on me when I first started learning web development in Clojure. I never used it directly, but I did use their library choices and project structure as a base for my own projects. ↩︎
There’s no “Rails” for the Clojure ecosystem (yet?). The prevailing opinion is to build your own “framework” by composing different libraries together. Most of these libraries are stable and are already used in production by big companies, so don’t let this discourage you from doing web development in Clojure! ↩︎
There might be some keys that you add or remove, but the structure of the config file stays the same. ↩︎
“assoc” (associate) is a Clojure slang that means to add or update a key-value pair in a map. ↩︎
For more details on how basic authentication works, check out the specification. ↩︎
Here’s a cool resource I found when researching Java Dockerfiles: WhichJDK. It provides a comprehensive comparison of the different JDKs available and recommendations on which one you should use. ↩︎
If you encounter errors related to WireGuard when running fly proxy, you can run fly doctor, which will hopefully detect issues with your local setup and also suggest fixes for them. ↩︎
This post is about six or seven months late, but here are my takeaways from Advent of Code 2024. It was my second time participating, and this time I actually managed to complete it.[1] My goal was to learn a new language, Zig, and to improve my DSA and problem-solving skills.
If you’re not familiar, Advent of Code is an annual programming challenge that runs every December. A new puzzle is released each day from December 1st to the 25th. There’s also a global leaderboard where people (and AI) race to get the fastest solves, but I personally don’t compete in it, mostly because I want to do it at my own pace.
I went with Zig because I’d been curious about it for a while, mainly because of its promise of being a better C and because TigerBeetle (one of the coolest databases around) is written in it. Learning Zig felt like a good way to get back into systems programming, something I’ve been wanting to do after a couple of chaotic years of web development.
This post is mostly about my setup, results, and the things I learned from solving the puzzles. If you’re more interested in my solutions, I’ve also uploaded my code and solution write-ups to my GitHub repository.
There were several Advent of Code templates in Zig that I looked at as a reference for my development setup, but none of them really clicked with me. I ended up just running my solutions directly using zig run for the whole event. It wasn’t until after the event ended that I properly learned Zig’s build system and reorganised my project.
The project is powered by build.zig, which defines several commands:
Build
zig build - Builds all of the binaries for all optimisation modes.
Run
zig build run - Runs all solutions sequentially.
zig build run -Day=XX - Runs the solution of the specified day only.
Benchmark
zig build bench - Runs all benchmarks sequentially.
zig build bench -Day=XX - Runs the benchmark of the specified day only.
Test
zig build test - Runs all tests sequentially.
zig build test -Day=XX - Runs the tests of the specified day only.
You can also pass the optimisation mode that you want to any of the commands above with the -Doptimize flag.
Under the hood, build.zig compiles src/run.zig when you call zig build run, and src/bench.zig when you call zig build bench. These files are templates that import the solution for a specific day from src/days/dayXX.zig. For example, here’s what src/run.zig looks like:
The day module imported is an anonymous import dynamically injected by build.zig during compilation. This allows a single run.zig or bench.zig to be reused for all solutions. This avoids repeating boilerplate code in the solution files. Here’s a simplified version of my build.zig file that shows how this works:
build.zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});
    const run_all = b.step("run", "Run all days");
    const day_option = b.option(usize, "ay", ""); // The `-Day` option

    // Generate build targets for all 25 days.
    for (1..26) |day| {
        const day_zig_file = b.path(b.fmt("src/days/day{d:0>2}.zig", .{day}));

        // Create an executable for running this specific day.
        const run_exe = b.addExecutable(.{
            .name = b.fmt("run-day{d:0>2}", .{day}),
            .root_source_file = b.path("src/run.zig"),
            .target = target,
            .optimize = optimize,
        });

        // Inject the day-specific solution file as the anonymous module `day`.
        run_exe.root_module.addAnonymousImport("day", .{ .root_source_file = day_zig_file });

        // Install the executable so it can be run.
        b.installArtifact(run_exe);

        // ...
    }
}
My actual build.zig has some extra code that builds the binaries for all optimisation modes.
This setup is pretty barebones. I’ve seen other templates do cool things like scaffold files, download puzzle inputs, and even submit answers automatically. Since I wrote my build.zig after the event ended, I didn’t get to use it while solving the puzzles. I might add these features to it if I decide to do Advent of Code again this year with Zig.
While there are no rules to Advent of Code itself, to make things a little more interesting, I set a few constraints and rules for myself:
The code must be readable.
By “readable”, I mean the code should be straightforward and easy to follow. No unnecessary abstractions. I should be able to come back to the code months later and still understand (most of) it.
Solutions must be a single file.
No external dependencies. No shared utilities module. Everything needed to solve the puzzle should be visible in that one solution file.
The total runtime must be under one second.[2]
All solutions, when run sequentially, should finish in under one second. I want to improve my performance engineering skills.
Parts should be solved separately.
This means: (1) no solving both parts simultaneously, and (2) no doing extra work in part one that makes part two faster. The aim of this is to get a clear idea of how long each part takes on its own.
No concurrency or parallelism.
Solutions must run sequentially on a single thread. This keeps the focus on the efficiency of the algorithm. I can’t speed up slow solutions by using multiple CPU cores.
No ChatGPT. No Claude. No AI help.
I want to train myself, not the LLM. I can look at other people’s solutions, but only after I have given my best effort at solving the problem.
Follow the constraints of the input file.
The solution doesn’t have to work for all possible scenarios, but it should work for all valid inputs. If the input file only contains 8-bit unsigned integers, the solution doesn’t have to handle larger integer types.
Hardcoding is allowed.
For example: size of the input, number of rows and columns, etc. Since the input is known at compile-time, we can skip runtime parsing and just embed it into the program using Zig’s @embedFile.
Most of these constraints are designed to push me to write clearer, more performant code. I also wanted my code to look like it was taken straight from TigerBeetle’s codebase (minus the assertions).[3] Lastly, I just thought it would make the experience more fun.
From all of the puzzles, here are my top 3 favourites:
Day 6: Guard Gallivant - This is my slowest day (in benchmarks), but also the one I learned the most from. Some of these learnings include: using vectors to represent directions, padding 2D grids, metadata packing, system endianness, etc.
Day 17: Chronospatial Computer - I love reverse engineering puzzles. I used to do a lot of these in CTFs during my university days. The best thing I learned from this day is the realisation that we can use different integer bases to optimise data representation. This helped improve my runtimes in the later days 22 and 23.
Day 21: Keypad Conundrum - This one was fun. My gut told me it could be solved greedily by always choosing the best move, and it was right. Though I did have to scroll Reddit for a bit to figure out the step I was missing: you have to visit the farthest keypads first. This is also my longest solution file (almost 400 lines) because I hardcoded the best-moves table.
Honourable mention:
Day 24: Crossed Wires - Another reverse engineering puzzle. Confession: I didn’t solve this myself during the event. After 23 brutal days, my brain was too tired, so I copied a random Python solution from Reddit. When I retried it later, it turned out to be pretty fun. I still couldn’t find a solution I was satisfied with though.
During the event, I learned a lot about Zig and performance, and also developed some personal coding conventions. Some of these are Zig-specific, but most are universal and can be applied across languages. This section covers general programming and Zig patterns I found useful. The next section will focus on performance-related tips.
Zig’s flagship feature, comptime, is surprisingly useful. I knew Zig uses it for generics and that people do clever metaprogramming with it, but I didn’t expect to be using it so often myself.
My main use for comptime was to generate puzzle-specific types. All my solution files follow the same structure, with a DayXX function that takes some parameters (usually the input length) and returns a puzzle-specific type, e.g.:
This lets me instantiate the type with a size that matches my input:
src/days/day01.zig
// Here, `Day01` is called with the size of my actual input.
pub fn run(_: std.mem.Allocator, is_run: bool) ![3]u64 {
    // ...
    const input = @embedFile("./data/day01.txt");
    var puzzle = try Day01(1000).init(input);
    // ...
}

// Here, `Day01` is called with the size of my test input.
test "day 01 part 1 sample 1" {
    var puzzle = try Day01(6).init(sample_input);
    // ...
}
This allows me to reuse logic across different inputs while still hardcoding the array sizes. Without comptime, I’d have to either create a separate function for each of my different inputs or dynamically allocate memory, since I couldn’t hardcode the array size.
I also used comptime to shift some computation to compile-time to reduce runtime overhead. For example, on day 4, I needed a function to check whether a string matches either "XMAS" or its reverse, "SAMX". A pretty simple function that you can write as a one-liner in Python:
example.py
def matches(pattern, target):
    return target == pattern or target == pattern[::-1]
Typically, a function like this requires some dynamic allocation to create the reversed string, since the length of the string is only known at runtime.[4] For this puzzle, since the words to reverse are known at compile-time, we can do something like this:
This creates a separate function for each word I want to reverse.[5] Each function has an array with the same size as the word to reverse. This removes the need for dynamic allocation and makes the code run faster. As a bonus, Zig also warns you when this word isn’t compile-time known, so you get an immediate error if you pass in a runtime value.
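A rough Python analogue of the same idea: build a matcher per word, computing the reversed pattern once up front instead of on every call (no compile-time guarantees here, just the same precomputation):

```python
def make_matcher(pattern):
    reversed_pattern = pattern[::-1]  # precomputed once, not per call
    def matches(target):
        return target == pattern or target == reversed_pattern
    return matches

matches_xmas = make_matcher("XMAS")
print(matches_xmas("SAMX"))  # True
```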
A common pattern in C is to return special sentinel values to denote missing values or errors, e.g. -1, 0, or NULL. In fact, I did this on day 13 of the challenge:
src/days/day13.zig
// We won't ever get 0 as a result, so we use it as a sentinel error value.
fn count_tokens(a: [2]u8, b: [2]u8, p: [2]i64) u64 {
    const numerator = @abs(p[0] * b[1] - p[1] * b[0]);
    const denominator = @abs(@as(i32, a[0]) * b[1] - @as(i32, a[1]) * b[0]);
    return if (numerator % denominator != 0) 0 else numerator / denominator;
}

// Then in the caller, skip if the return value is 0.
if (count_tokens(a, b, p) == 0) continue;
This works, but it’s easy to forget to check for those values, or worse, to accidentally treat them as valid results. Zig improves on this with optional types. If a function might not return a value, you can return ?T instead of T. This also forces the caller to handle the null case. Unlike C, null isn’t a pointer but a more general concept. Zig treats null as the absence of a value for any type, just like Rust’s Option<T>.
The count_tokens function can be refactored to:
src/days/day13.zig
// Return null instead if there's no valid result.
fn count_tokens(a: [2]u8, b: [2]u8, p: [2]i64) ?u64 {
    const numerator = @abs(p[0] * b[1] - p[1] * b[0]);
    const denominator = @abs(@as(i32, a[0]) * b[1] - @as(i32, a[1]) * b[0]);
    return if (numerator % denominator != 0) null else numerator / denominator;
}

// The caller is now forced to handle the null case.
if (count_tokens(a, b, p)) |n_tokens| {
    // logic only runs when n_tokens is not null.
}
Zig also has a concept of error unions, where a function can return either a value or an error; in Rust, this is Result<T, E>. You could also use error unions instead of optionals for count_tokens; Zig doesn’t force a single approach. I come from Clojure, where returning nil for an error or missing value is common.
This year had a lot of 2D grid puzzles (arguably too many). A common feature of grid-based algorithms is the out-of-bounds check. Here’s what it usually looks like:
example.zig
fn dfs(map: [][]u8, position: [2]i8) u32 {
    const x, const y = position;

    // Bounds check here.
    if (x < 0 or y < 0 or x >= map.len or y >= map[0].len) return 0;
    if (map[x][y] == .visited) return 0;

    map[x][y] = .visited;
    var result: u32 = 1;
    for (directions) |direction| {
        result += dfs(map, position + direction);
    }
    return result;
}
This is a typical recursive DFS function. After doing a lot of this, I discovered a nice trick that not only improves code readability, but also its performance. The trick here is to pad the grid with sentinel characters that mark out-of-bounds areas, i.e. add a border to the grid.
You can use any value for the border, as long as it doesn’t conflict with valid values in the grid. With the border in place, the bounds check becomes a simple equality comparison:
example.zig
const border = '*';

fn dfs(map: [][]u8, position: [2]i8) u32 {
    const x, const y = position;
    if (map[x][y] == border) {
        // We are out of bounds
        return 0;
    }
    // ...
}
This is much more readable than the previous code. Plus, it’s also faster since we’re only doing one equality check instead of four range checks.
That said, this isn’t a one-size-fits-all solution. It only works for algorithms that traverse the grid one step at a time. If your logic jumps multiple tiles, it can still go out of bounds (unless you widen the border to account for this). This approach also uses a bit more memory than the regular approach, as you have to store more characters.
This could also go in the performance section, but I’m including it here because the biggest benefit I get from using SIMD in Zig is the improved code readability. Because Zig has first-class support for vector types, you can write elegant and readable code that also happens to be faster.
If you’re not familiar with vectors, they are a special collection type used for single instruction, multiple data (SIMD) operations. SIMD allows you to perform computation on multiple values in parallel using a single CPU instruction, which often leads to a performance boost.[6][6]
I mostly use vectors to represent positions and directions, e.g. for traversing a grid. Instead of writing code like this:
You can represent position and direction as 2-element vectors and write code like this:
example.zig
next_position = position + direction;
This is much nicer than the previous version!
Day 25 is another good example of a problem that can be solved elegantly using vectors:
src/days/day25.zig
var result: u64 = 0;
for (self.locks.items) |lock| { // lock is a vector
    for (self.keys.items) |key| { // key is also a vector
        const fitted = lock + key > @as(@Vector(5, u8), @splat(5));
        const is_overlap = @reduce(.Or, fitted);
        result += @intFromBool(!is_overlap);
    }
}
Expressing the logic as vector operations makes the code cleaner since you don’t have to write loops and conditionals as you typically would in a traditional approach.
The tips below are general performance techniques that often help, but like most things in software engineering, “it depends”. These might work 80% of the time, but performance is often highly context-specific. You should benchmark your code instead of blindly following what other people say.
This section would’ve been more fun with concrete examples, step-by-step optimisations, and benchmarks, but that would’ve made the post way too long. Hopefully, I’ll get to write something like that in the future.[7][7]
Whenever possible, prefer static allocation. Static allocation is cheaper since it just involves moving the stack pointer vs dynamic allocation which has more overhead from the allocator machinery. That said, it’s not always the right choice since it has some limitations, e.g. stack size is limited, memory size must be compile-time known, its lifetime is tied to the current stack frame, etc.
If you need to do dynamic allocations, try to reduce the number of times you call the allocator. The number of allocations you do matters more than the amount of memory you allocate. More allocations mean more bookkeeping, synchronisation, and sometimes syscalls.
A simple but effective way to reduce allocations is to reuse buffers, whether they’re statically or dynamically allocated. Here’s an example from day 10. For each trail head, we want to create a set of trail ends reachable from it. The naive approach is to allocate a new set every iteration:
src/days/day10.zig
for (self.trail_heads.items) |trail_head| {
    var trail_ends = std.AutoHashMap([2]u8, void).init(self.allocator);
    defer trail_ends.deinit();
    // Set building logic...
}
What you can do instead is to allocate the set once before the loop. Then, each iteration, you reuse the set by emptying it without freeing the memory. For Zig’s std.AutoHashMap, this can be done using the clearRetainingCapacity method:
src/days/day10.zig
var trail_ends = std.AutoHashMap([2]u8, void).init(self.allocator);
defer trail_ends.deinit();
for (self.trail_heads.items) |trail_head| {
    trail_ends.clearRetainingCapacity();
    // Set building logic...
}
If you use static arrays, you can also just overwrite existing data instead of clearing it.
A step up from this is to reuse multiple buffers. The simplest form of this is to reuse two buffers, i.e. double buffering. Here’s an example from day 11:
src/days/day11.zig
// Initialise two hash maps that we'll alternate between.
var frequencies: [2]std.AutoHashMap(u64, u64) = undefined;
for (0..2) |i| frequencies[i] = std.AutoHashMap(u64, u64).init(self.allocator);
defer for (0..2) |i| frequencies[i].deinit();

var id: usize = 0;
for (self.stones) |stone| try frequencies[id].put(stone, 1);

for (0..n_blinks) |_| {
    var old_frequencies = &frequencies[id % 2];
    var new_frequencies = &frequencies[(id + 1) % 2];
    id += 1;
    defer old_frequencies.clearRetainingCapacity();
    // Do stuff with both maps...
}
Here we have two maps to count the frequencies of stones across iterations. Each iteration will build up new_frequencies with the values from old_frequencies. Doing this reduces the number of allocations to just 2 (the number of buffers). The tradeoff here is that it makes the code slightly more complex.
A common performance tip is to develop “mechanical sympathy”: understand how your code is processed by your computer. One example is structuring your data so that it works well with your CPU, e.g. keeping related data close together in memory to take advantage of cache locality.
Reducing the size of your data helps with this. Smaller data means more of it can fit in cache. One way to shrink your data is through bit packing. This depends heavily on your specific data, so you’ll need to use your judgement to tell whether this would work for you. I’ll just share some examples that worked for me.
The first example is in day 6 part two, where you have to detect a loop, which happens when you revisit a tile from the same direction as before. To track this, you could use a map or a set to store the tiles and visited directions. A more efficient option is to store this direction metadata in the tile itself.
There are only four tile types, which means you only need two bits to represent the tile types as an enum. If the enum size is one byte, here’s what the tiles look like in memory:
As you can see, the upper six bits are unused. We can store the direction metadata in the upper four bits. One bit for each direction. If a bit is set, it means that we’ve already visited the tile in this direction. Here’s an illustration of the memory layout:
direction metadata tile type
┌─────┴─────┐ ┌─────┴─────┐
┌────────┬─┴─┬───┬───┬─┴─┬─┴─┬───┬───┬─┴─┐
│ Tile: │ 1 │ 0 │ 0 │ 0 │ 0 │ 0 │ 1 │ 0 │
└────────┴─┬─┴─┬─┴─┬─┴─┬─┴───┴───┴───┴───┘
up bit ─┘ │ │ └─ left bit
right bit ─┘ down bit
If your language supports struct packing, you can express this layout directly:[8][8]
Doing this avoids extra allocations and improves cache locality. Since the direction metadata is colocated with the tile type, all of them can fit together in cache. Accessing the directions just requires some bitwise operations instead of having to fetch them from another region of memory.
Another way to do this is to represent your data using alternate number bases. Here’s an example from day 23. Computers are represented as two-character strings made up of only lowercase letters, e.g. "bc", "xy", etc. Instead of storing this as a [2]u8 array, you can convert it into a base-26 number and store it as a u16.[9][9]
Here’s the idea: map 'a' to 0, 'b' to 1, and so on up to 'z' as 25. Each character in the string becomes a digit in a base-26 number. For example, "bc" ([2]u8{ 'b', 'c' }) has the digits 1 ('b') and 2 ('c'), so it becomes the base-10 number 28 (1 × 26 + 2 = 28).
While they take the same amount of space (2 bytes), a u16 has some benefits over a [2]u8:
It fits in a single register, whereas you need two for the array.
Comparison is faster as there is only a single value to compare.
I won’t explain branchless programming here; Algorithmica explains it way better than I can. While modern compilers are often smart enough to compile away branches, they don’t catch everything. I still recommend writing branchless code whenever it makes sense. It also has the added benefit of reducing the number of codepaths in your program.
Again, since performance is very context-dependent, I’ll just show you some patterns I use. Here’s one that comes up often:
src/days/day02.zig
if(is_valid_report(report)){
result +=1;}
Instead of the branch, cast the bool into an integer directly:
src/days/day02.zig
result += @intFromBool(is_valid_report(report));
Another example is from day 6 (again!). Recall that to know if a tile has been visited from a certain direction, we have to check its direction bit. Here’s one way to do it:
The final performance tip is to prefer iterative code over recursion. Recursive functions bring the overhead of allocating stack frames. While recursive code is more elegant, it’s also often slower unless your language’s compiler can optimise it away, e.g. via tail-call optimisation. As far as I know, Zig doesn’t have this, though I might be wrong.
Recursion also has the risk of causing a stack overflow if the execution isn’t bounded. This is why code that is mission- or safety-critical avoids recursion entirely. It’s in TigerBeetle’s TIGERSTYLE and also NASA’s Power of Ten.
Iterative code can be harder to write in some cases, e.g. DFS maps naturally to recursion, but most of the time it is significantly faster, more predictable, and safer than the recursive alternative.
I ran benchmarks for all 25 solutions in each of Zig’s optimisation modes. You can find the full results and the benchmark script in my GitHub repository. All benchmarks were done on an Apple M3 Pro.
As expected, ReleaseFast produced the best result with a total runtime of 85.1 ms. I’m quite happy with this, considering the two constraints that limited the number of optimisations I could do to the code:
Parts should be solved separately - Some days can be solved in a single go, e.g. day 10 and day 13, which could’ve saved a few milliseconds.
No concurrency or parallelism - My slowest days are the compute-heavy days that are very easily parallelisable, e.g. day 6, day 19, and day 22. Without this constraint, I could probably reach sub-20 milliseconds total(?), but that’s for another time.
You can see the full benchmarks for ReleaseFast in the table below:
| Day | Title | Parsing (µs) | Part 1 (µs) | Part 2 (µs) | Total (µs) |
|----:|-------|-------------:|------------:|------------:|-----------:|
| 1 | Historian Hysteria | 23.5 | 15.5 | 2.8 | 41.8 |
| 2 | Red-Nosed Reports | 42.9 | 0.0 | 11.5 | 54.4 |
| 3 | Mull it Over | 0.0 | 7.2 | 16.0 | 23.2 |
| 4 | Ceres Search | 5.9 | 0.0 | 0.0 | 5.9 |
| 5 | Print Queue | 22.3 | 0.0 | 4.6 | 26.9 |
| 6 | Guard Gallivant | 14.0 | 25.2 | 24,331.5 | 24,370.7 |
| 7 | Bridge Repair | 72.6 | 321.4 | 9,620.7 | 10,014.7 |
| 8 | Resonant Collinearity | 2.7 | 3.3 | 13.4 | 19.4 |
| 9 | Disk Fragmenter | 0.8 | 12.9 | 137.9 | 151.7 |
| 10 | Hoof It | 2.2 | 29.9 | 27.8 | 59.9 |
| 11 | Plutonian Pebbles | 0.1 | 43.8 | 2,115.2 | 2,159.1 |
| 12 | Garden Groups | 6.8 | 164.4 | 249.0 | 420.3 |
| 13 | Claw Contraption | 14.7 | 0.0 | 0.0 | 14.7 |
| 14 | Restroom Redoubt | 13.7 | 0.0 | 0.0 | 13.7 |
| 15 | Warehouse Woes | 14.6 | 228.5 | 458.3 | 701.5 |
| 16 | Reindeer Maze | 12.6 | 2,480.8 | 9,010.7 | 11,504.1 |
| 17 | Chronospatial Computer | 0.1 | 0.2 | 44.5 | 44.8 |
| 18 | RAM Run | 35.6 | 15.8 | 33.8 | 85.2 |
| 19 | Linen Layout | 10.7 | 11,890.8 | 11,908.7 | 23,810.2 |
| 20 | Race Condition | 48.7 | 54.5 | 54.2 | 157.4 |
| 21 | Keypad Conundrum | 0.0 | 1.7 | 22.4 | 24.2 |
| 22 | Monkey Market | 20.7 | 0.0 | 11,227.7 | 11,248.4 |
| 23 | LAN Party | 13.6 | 22.0 | 2.5 | 38.2 |
| 24 | Crossed Wires | 5.0 | 41.3 | 14.3 | 60.7 |
| 25 | Code Chronicle | 24.9 | 0.0 | 0.0 | 24.9 |
A weird thing I found when benchmarking is that for day 6 part two, ReleaseSafe actually ran faster than ReleaseFast (13,189.0 µs vs 24,370.7 µs). Their outputs are the same, but for some reason, ReleaseSafe is faster even with the safety checks still intact.
The Zig compiler is still very much a moving target, so I don’t want to dig too deep into this, as I’m guessing this might be a bug in the compiler. This weird behaviour might just disappear after a few compiler version updates.
Looking back, I’m really glad I decided to do Advent of Code and followed through to the end. I learned a lot of things. Some are useful in my professional work, some are more like random bits of trivia. Going with Zig was a good choice too. The language is small, simple, and gets out of your way. I learned more about algorithms and concepts than the language itself.
Besides what I’ve already mentioned earlier, here are some examples of the things I learned:
Some of my self-imposed constraints and rules ended up being helpful. I can still (mostly) understand the code I wrote a few months ago. Putting all of the code in a single file made it easier to read since I don’t have to context switch to other files all the time.
However, some of them did backfire a bit, e.g. the two constraints that limit how I can optimise my code. Another one is the “hardcoding allowed” rule. I used a lot of magic numbers, which helped improve performance, but I didn’t document them, so after a while I couldn’t even remember how I got them. I’ve since gone back and added explanations in my write-ups, but next time I’ll remember to at least leave comments.
One constraint I’ll probably remove next time is the no concurrency rule. It’s the biggest contributor to the total runtime of my solutions. I don’t do a lot of concurrent programming, even though my main language at work is Go, so next time it might be a good idea to use Advent of Code to level up my concurrency skills.
I also spent way more time on these puzzles than I originally expected. I optimised and rewrote my code multiple times. I also rewrote my write-ups a few times to make them easier to read. This is by far my longest side project yet. It’s a lot of fun, but it also takes a lot of time and effort. I almost gave up on the write-ups (and this blog post) because I didn’t want to explain my awful day 15 and day 16 code. I ended up taking a break for a few months before finishing it, which is why this post is published in August lol.
Just for fun, here’s a photo of some of my notebook sketches that helped me visualise my solutions. See if you can guess which days these are from:
So… would I do it again? Probably, though I’m not making any promises. If I do join this year, I’ll probably stick with Zig. I had my eyes on Zig since the start of 2024, so Advent of Code was the perfect excuse to learn it. This year, there aren’t any languages in particular that caught my eye, so I’ll just keep using Zig, especially since I have a proper setup ready.
If you haven’t tried Advent of Code, I highly recommend checking it out this year. It’s a great excuse to learn a new language, improve your problem-solving skills, or just learn something new. If you’re eager, you can also do the previous years’ puzzles as they’re still available.
One of the best aspects of Advent of Code is the community. The Advent of Code subreddit is a great place for discussion. You can ask questions and also see other people’s solutions. Some people also post really cool visualisations like this one. They also have memes!
I failed my first attempt horribly with Clojure during Advent of Code 2023. Once I reached the later half of the event, I just couldn’t solve the problems with a purely functional style. I could’ve pushed through using imperative code, but I stubbornly chose not to and gave up… ↩︎
The original constraint was that each solution must run in under one second. As it turned out, the code was faster than I expected, so I increased the difficulty. ↩︎
You can implement this function without any allocation by mutating the string in place or by iterating over it twice, which is probably faster than my current implementation. I kept it as-is as a reminder of what comptime can do. ↩︎
As a bonus, I was curious as to what this looks like compiled, so I listed all the functions in this binary in GDB and found:
Well, not always. The number of SIMD instructions depends on the machine’s native SIMD size. If the length of the vector exceeds it, Zig will compile it into multiple SIMD instructions. ↩︎
One thing about packed structs is that their layout is dependent on the system endianness. Most modern systems are little-endian, so the memory layout I showed is actually reversed. Thankfully, Zig has some useful functions to convert between endianness like std.mem.nativeToBig, which makes working with packed structs easier. ↩︎
Technically, you can store 2-digit base-26 numbers in a u10, as there are only 26² = 676 possible values. Most systems pad values to byte sizes, so a u10 would still be stored as a u16, which is why I just went straight for u16. ↩︎
Financial regulation — Basel III, MiFID II, Solvency II, SOX — requires that risk calculations, credit decisions, and compliance reports be reproducible. Not just the code, but the exact data state that produced them. When an auditor asks “show me the data behind this risk number from six months ago,” the answer can’t be “we’ll try to reconstruct it.”
Version control solved this problem for source code decades ago. But analytical data infrastructure never caught up. Data warehouses don’t version tables. Temporal tables track row-level changes but don’t compose across tables or systems. Manual snapshots are expensive, fragile, and don’t support branching for scenario analysis.
Stratum brings the git model to analytical data: every write creates an immutable, content-addressed snapshot. Old states remain accessible by commit UUID. Branches are O(1). And via Yggdrasil, you can tie entity databases, analytical datasets, and search indices into a single consistent, auditable snapshot.
The problem
A typical analytical pipeline at a regulated institution:
Transactional data flows into a warehouse (nightly ETL or streaming)
Analysts run GROUP BY / SUM / STDDEV queries for risk models and reports
Months later, an auditor asks: “What data produced risk report X on date Y?”
Step 4 is where things break. The warehouse has been mutated since then. Maybe there’s a backup, maybe not. Reconstructing the exact state requires replaying ETL from source systems — if those logs still exist.
Even if you can reconstruct the data, you can’t prove it’s the same data. There’s no cryptographic link between the report and the state that produced it. The best you can offer is procedural trust: “our backup process is reliable, and we believe this is what the data looked like.” That’s a weak foundation for regulatory compliance.
Immutable snapshots as audit anchors
With Stratum, every table is a copy-on-write value. Writes create new snapshots; old snapshots remain addressable by commit UUID or branch name. The underlying storage is a content-addressed Merkle tree — each snapshot’s identity is derived from a hash of its data, providing a cryptographic chain of custody from report to source.
(require '[stratum.api :as st])
;; Load the current production state
(def trades (st/load store "trades" {:branch "production"}))
;; Run today's risk calculation
(def risk-report
(st/q {:from trades
:group [:desk :currency]
:agg [[:sum :notional] [:stddev :pnl] [:count]]}))
;; The commit UUID is your audit anchor — store it alongside the report
;; Six months later, reproduce exactly:
(def historical-trades
(st/load store "trades" {:as-of #uuid "a1b2c3d4-..."}))
(def historical-report
(st/q {:from historical-trades
:group [:desk :currency]
:agg [[:sum :notional] [:stddev :pnl] [:count]]}))
;; Identical results, guaranteed by content addressing
Or via SQL — connect any PostgreSQL client:
-- Today's report
SELECT desk, currency, SUM(notional), STDDEV(pnl), COUNT(*)
FROM trades GROUP BY desk, currency;

-- Historical report: same query, different snapshot
-- resolved server-side via branch/commit configuration
Once committed, data cannot be modified — every state is a value, addressable by its content hash. Historical snapshots load lazily from storage on demand, so keeping years of history doesn’t mean paying for it in memory. And because snapshots are immutable values, multiple analysts can query the same or different points in time concurrently without coordination or locks.
Scenario analysis with branching
Beyond audit compliance, regulated institutions need scenario analysis. Basel III stress testing requires banks to evaluate capital adequacy under hypothetical adverse conditions — equity drawdowns, interest rate shocks, credit spread widening. Traditional approaches involve copying production data into staging environments, running scenarios, comparing results, and cleaning up. That process is slow, expensive, and error-prone.
With copy-on-write branching, forking a dataset is O(1) regardless of size. A 100-million-row table branches in microseconds because the fork is just a new root pointer into the shared tree. Only chunks that are actually modified get copied.
;; Fork production data for stress testing — O(1) regardless of table size
(def stress-scenario (st/fork trades))
;; Apply adverse conditions — only modified chunks are copied
;; e.g. via SQL: UPDATE trades SET price = price * 0.7
;; WHERE asset_class = 'equity'
;; Compare risk metrics: production vs stressed
(def baseline-risk
(st/q {:from trades
:group [:desk]
:agg [[:stddev :pnl] [:sum :notional]]}))
(def stressed-risk
(st/q {:from stress-scenario
:group [:desk]
:agg [[:stddev :pnl] [:sum :notional]]}))
;; Run as many scenarios as needed — each is an independent branch
;; Baseline, adverse, severely adverse, custom scenarios
;; all sharing unmodified data via structural sharing
Each branch is fully isolated: modifications to the stress scenario can’t touch production data. You can maintain dozens of concurrent scenarios without multiplying storage costs — they share all unmodified data. When you stop referencing a branch, mark-and-sweep GC reclaims the storage. No staging environments, no cleanup scripts.
This also applies to model validation. When a risk model is updated, you can run the new model against historical snapshots and compare its outputs to the original model’s results — same data, different code, verifiable divergence.
Cross-system consistency
A real regulatory pipeline isn’t just one analytical table. Entity data (customers, counterparties, legal entities) lives in a transactional database. Analytical views (positions, P&L, exposures) live in a columnar engine. Compliance documents and communications live in a search index. For an audit to be meaningful, all of these need to be at the same point in time.
Yggdrasil provides a shared branching protocol across these heterogeneous systems. You can compose a Datahike entity database, a Stratum analytical dataset, and a Scriptum search index into a single composite system — branching, snapshotting, and time-traveling all of them together.
(require '[yggdrasil.core :as ygg])
;; Compose entity database + analytics + search into one system
(def system
(ygg/composite-system
{:entities datahike-conn ;; customer records, counterparties
:analytics stratum-store ;; trade data, positions, P&L
:search scriptum-index})) ;; compliance documents, communications
;; Branch the entire system for an investigation
(ygg/branch! system "investigation-2026-Q1")
;; Every component is now at the same logical point in time
;; Query across all three with a single consistent snapshot
When an auditor needs the full picture — the trade data, the customer entity that placed the trade, and the compliance documents reviewed at the time — they get a single consistent view across all systems, tied to one branch identifier. No manual coordination, no hoping the timestamps line up.
Compliance lifecycle
Immutable systems raise an obvious question: what about GDPR right-to-erasure, or data retention policies that require deletion?
Immutability doesn’t mean data can never be removed — it means deletion is explicit and verifiable rather than implicit and unauditable. The Datahike ecosystem supports purge operations that remove specific data from all indices and all historical snapshots. Mark-and-sweep garbage collection, coordinated across systems via Yggdrasil, reclaims storage from unreachable snapshots.
This is actually a stronger compliance story than mutable databases offer. In a mutable system, you DELETE a row and trust that the storage layer eventually overwrites it — but you can’t prove it’s gone from backups, replicas, or caches. With explicit purge on content-addressed storage, you can verify that the data no longer exists in any reachable snapshot.
Production-ready performance
Versioning and immutability don’t come at the cost of query speed. Stratum uses SIMD-accelerated execution via the Java Vector API, fused filter-aggregate pipelines, and zone-map pruning to skip entire data chunks. It runs standard OLAP benchmarks competitively with engines like DuckDB — while also providing branching, time travel, and content addressing that pure analytical engines don’t.
Full SQL is supported via the PostgreSQL wire protocol: aggregates, window functions, joins, CTEs, subqueries. Connect with psql, JDBC, DBeaver, or any PostgreSQL-compatible client. See the Stratum technical deep-dive for architecture details and benchmark methodology.
Getting started
Stratum runs as an in-process Clojure library or a standalone SQL server. Requires JDK 21+.
If you’re building analytical infrastructure in a regulated environment — or exploring how versioned data can simplify your compliance story — get in touch. We work with teams in finance, insurance, and healthcare to design data architectures where auditability is built in, not bolted on.
(ns little-ring-things.handler
  (:require [compojure.core :refer :all]
            [compojure.route :as route]
            [ring.middleware.defaults :refer [wrap-defaults site-defaults]]
            [little-ring-things.template :as t])
  (:import [java.time LocalDateTime]))

(defroutes app-routes
  (GET "/" [] (t/template "Home" "<p>Hello World</p>"))
  (GET "/about" [] (t/template "About" "<p>This is Clojure ring tutorial</p>"))
  (GET "/hello/:name" [name] (t/template "Hello" (str "<p>Hello " name "</p>")))
  (GET "/time" [] (t/template "Time" (str "<p>The current time is " (LocalDateTime/now) "</p>")))
  (route/not-found "Not Found"))

(def app (wrap-defaults app-routes site-defaults))
Meta-programming = the broad idea of “programs that manipulate or generate programs”. It can happen at runtime (reflection) or compile-time (macros).
Macros = one specific style of meta-programming, usually tied to transforming syntax at compile time (in a pre-processor or AST-transformer). It takes a piece of code as input and replaces it with another piece of code as output, often based on patterns or parameters.
Rule‑based transformation: A macro is specified as a pattern (e.g., a template, an AST pattern, or token pattern) plus a replacement that is generated when that pattern is matched.
Expansion, not function call: Macro use is not a runtime call; the macro is expanded before execution, so the final code is the result of replacing the macro invocation with its generated code.
Here are some programming languages and their meta-programming and macro capabilities.
NB! Take with a grain of salt. The result comes from working with perplexity.ai, and I have not had a chance to personally verify all of the cells. They do look generally correct to me overall, though. Corrections are welcome!
Metaprogramming + macro features
Here are the programming languages with their scores (out of 18) and links to their repos or homepages:
The score counts one point per row where the language can reasonably do what the feature describes (DSL‑building is counted as a full feature, even if “limited” in some languages).
The feature score is not an ultimate measure of meta-programming power: a language (like C++) may have a higher score than another (like Ruby) yet generally be considered less tailored for meta-programming (Ruby is widely revered for its powerful meta-programming abilities).
Macro features are many and varied, so they gain undue weight in the total score, even though runtime meta-programming may be just as powerful, or more so.
After years of watching talented developers bounce off Clojure’s prefix notation, we at Flexiana decided it was time to act. Today we’re open-sourcing **Infix** — a library that brings natural, readable mathematical and data-processing syntax to Clojure, while compiling to standard Clojure forms with zero runtime overhead.
This is not a toy. This is the future.
The Problem Nobody Was Brave Enough to Solve
Let’s be honest. We’ve all been in that meeting where a data scientist looks at
(+ (* a b) (/ c d))
and quietly opens a Python tab. We’ve all watched a business analyst try to read a pricing rule written as `(<= (count (:items order)) (* 2 (get-in config [:limits :base])))` and slowly lose the will to live.
Prefix notation is elegant. It is consistent. It is *theoretically* superior. But so is Esperanto, and we all know how that worked out.
The Solution
Infix lets you write this:
(infix a * b + c / d)
and it compiles to `(+ (* a b) (/ c d))`. Operator precedence works exactly as you’d expect from every other language you’ve ever used. Because we studied those languages. Carefully.
Clojure’s threading macros become first-class infix operators with the lowest precedence, because data flows left to right. Like water. Like time. Like *progress*.
Arrow Lambdas
(map (infix x => x * x + 1) [1 2 3])
;; => [2 5 10]
Clean, readable anonymous functions. No `#(…)` gymnastics. No counting percent signs.
Familiar `fn(args)` notation, because sometimes you just want to feel at home.
How It Works
Everything is a macro. The `infix` macro uses a Shunting Yard parser to transform your expressions into standard Clojure forms at compile time. There is no interpreter. There is no string parsing. There is no runtime cost. Your production Clojure is exactly the same Clojure it always was — we just let you *write* it differently.
The entire library is roughly 300 lines of code across four namespaces: a parser, a precedence table, a compiler, and the public API. We encourage you to read it. It’s well-commented and, dare we say, rather elegant.
We’ve heard the arguments. S-expressions are homoiconic. Prefix notation eliminates ambiguity. Operator precedence is a source of bugs. Rich Hickey didn’t design Clojure so you could write `a + b` like some *Java developer*.
To which we say: you’re absolutely right. And yet here we are, shipping it anyway. On April 1st, no less — the only day of the year when the Clojure community might forgive us.
The library is real. The tests pass. The macros expand. Whether you *should* use it is a question we leave to your conscience, your team lead, and your local REPL priest.
What’s Next
Infix 1.0.0 is feature-complete and ready for production use. Future enhancements may include:
I use Debian Linux as it provides a stable and low-maintenance operating system, enabling me to focus on getting valuable tasks done. Package updates are well tested, and the deb package management system avoids incompatible versions between dependent packages.
The small amount of maintenance is done via the Advanced Package Tool (APT), which has a very simple to understand command line, e.g. `apt install`, `apt update`, `apt upgrade` and `apt purge`.
The constraint of a stable operating system is that some of the latest versions of development tools and programming languages may not be available from the distribution's package manager.
Simple bash scripts were created to install the latest development tools, effectively one-liners where the Download Release Artifacts (DRA) project could be used. Even when falling back to curl and a few basic Linux commands, the scripts remained very simple.
Each shell script installs an editor, a programming language (e.g. Clojure, Rust, Node.js), a Terminal UI (TUI) for development or system administration, or one of a few desktop apps.
debian-linux-post-instal.sh updates all the Debian packages, reading packages to add and remove from a plain text file.
dev-tools-install.sh calls each script to install all the development tools, programming languages, TUIs and desktop apps.
Check out the Clojure: The Documentary trailer! We’re so lucky to have a documentary made about how Clojure came about and what people love about it.
Let me tell you a story about the beautiful possibilities and the mundane realities of code. I lived this story. And it taught me that a good library doesn’t have to make code sing. It just needs to get the job done without making a mess.
I was working as a contractor at a company that was using Om Next, the sequel to the original Om. Both libraries were wrappers around React. While Om was a minimalist take on how to structure a UI, it revealed a problem: Often your UI’s hierarchy is very different from how your data is structured. Om Next solved this problem by introducing the idea of parsers, which let you convert a declarative expression of what a component needed into a function of the data. This indirection allowed you to shape your data and UI in different ways.
I had used the original Om, but I had not used Om Next in anger. At this job, I needed to create a few new components in the UI, so I got to writing a parser. I made a big mess. In order to express the information I needed (the query), I had to invent a new language. That’s not easy. Then you have to write code to interpret that new language, dig into the central data store for what the query needs, and return data in a new format that your component will consume. The code looked terrible (many nested ifs) and I just couldn’t see how to improve it. The existing components’ parsers looked just as bad. It looked like all of the complexity of the data model was condensed into these parsers so that the data store could remain normalized, the components could be simple, and the query could be concise.
I couldn’t help comparing this to Reagent and Reframe, other wrappers around React, both of which I was more familiar with. Reframe also had a centralized data store, but it didn’t have parsers and queries. Instead, you built subscriptions, which were “reactive” functions of the data store. If you wanted the data for a component, you would write a subscription which gave it to you. It was more direct than writing a query (new language) and a parser (new interpreter). But it was still a layer of indirection.
I asked one of the more senior engineers at the company whether I was crazy. I had never used Om Next before, so maybe I was missing something. Was all of that gnarly code worth it? Why is this better than Reframe? His answer inspired this post. He said, “It can get gnarly, but it gives you the tools such that when it’s done right, it’s really sublime.”
Being a fan of sublime code, I wondered why his answer troubled me. And then I realized the reason: if the choice is between sublime when everything goes right and horrible if anything goes wrong, that's not a great choice. I would much rather have mostly readable code, even when I'm not making perfect decisions, with fewer extremes on the good and the bad. Put another way, mundane but workable beats sublime with herculean effort. Reframe is mostly workable. Om Next is mostly gnarly.
Since that conversation I’ve been pondering this as a principle. It’s a tradeoff between tradeoffs. When it comes to code beauty, I would rather have a fat average of the bell curve than a lot of outliers. I’d rather have base hits than home runs, because the players swinging for home runs strike out just as often. And, listen, I’m just a working programmer trying to get the cart icon to show the count of items inside. I’m not building a Rolex.
I believe my experience also inspired me to create my Reframe course. I learned to appreciate the framework. Reframe gives you the tools to build understandable UIs of medium to high complexity. And after my experience with Om Next, I saw how Reframe’s tools made decisions straightforward. They never reach the heights of sublimity that legends tell of Om Next. But the everyday code of Reframe is fine. Messes are contained and can be split. There aren’t any difficult design decisions to make, like designing a language and interpreter.
So that’s my story. I became a fan of Reframe because of how bad the default code in Om Next was. And I decided that the promise of beautiful code if you do it right is not enough. What does the code look like when I’m in a rush? What does it look like when I don’t really care that day? What does it look like when a junior programmer writes it? What does it look like when you’ve made the wrong decision but you don’t have time to rework it? On most days, things are not at their best. Making it easy to write decent code is way more important than making it possible to write beautiful code. It’s a principle I keep in my back pocket for when I or a colleague wants to make a tool to make beautiful things at work. I pull it out to dissuade them. It’s the everyday code that matters.
This started as a simple question: what is an agent framework actually doing?
A few months ago I built a minimal agent engine from scratch in Clojure to find out. That engine used a graph model: nodes were pure functions, edges defined routing, a loop drove execution. The short answer was that it is a recursive loop, some JSON parsing, and a state machine. That article is here if you want the details.
But once you strip the agent down to its core, a different question surfaces. If agents can call other agents, how does authority move between them? That led me back to Spritely Goblins, which I already knew. OCapN came from there. When I tried to interpret those ideas in a cloud-native context, I ran into Biscuit tokens. I wrote about reconstructing Biscuit in Clojure. The result was kex, a minimal proof of concept.
Ayatori is where those two threads converge: the graph model from the first experiment, combined with a capability-based design for agent composition and authority.
What Ayatori is
Ayatori is an experimental graph-based AI agent orchestration engine built in Clojure. Nodes are mostly pure functions. Edges define routing. The executor handles the rest.
The name comes from ayatori, the Japanese string figure game known in English as cat’s cradle. Agents passing capabilities to each other, each move shaping what comes next. The attenuation and delegation that constrains those moves will come with kex integration.
It is currently a proof of concept, under active development. Not production-ready.
The core model
An agent is a graph. You define nodes and edges, declare what capabilities it exposes and what dependencies it needs, and the system wires everything together at start time.
deps -> agent(nodes, edges) -> caps
Nodes can be pure functions, stateful handlers, LLM nodes, fan-out nodes, or nested agent nodes. The simplest case is a pure function: takes input, returns output. Routing is data: edges are keywords for unconditional routing, or maps for conditional routing based on a :route key the node returns. The executor handles dispatch, state, and middleware.
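As an illustration of that routing model, here is a hypothetical sketch. The graph shape, node names, and the `run` function are my inventions for this post, not Ayatori's actual API:

```clojure
;; Hypothetical graph: keyword edges route unconditionally; map edges
;; route on the :route key the node returns.
(def graph
  {:nodes {:check (fn [state]
                    (assoc state :route (if (even? (:n state)) :even :odd)))
           :even  (fn [state] (assoc state :result :even-number))
           :odd   (fn [state] (assoc state :result :odd-number))}
   :edges {:check {:even :even, :odd :odd} ; conditional: map keyed by :route
           :even  :done                    ; unconditional: plain keyword
           :odd   :done}})

;; Minimal executor loop for this shape: call the node, look up the
;; edge, dispatch on :route when the edge is conditional.
(defn run [graph start state]
  (loop [node start, state state]
    (if (= node :done)
      state
      (let [out   ((get-in graph [:nodes node]) state)
            edge  (get-in graph [:edges node])
            node' (if (map? edge) (get edge (:route out)) edge)]
        (recur node' (dissoc out :route))))))

(run graph :check {:n 4}) ;=> {:n 4, :result :even-number}
```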
A simple example is an LLM agent with tool calling and structured output.
Tool calls route through graph nodes. Middleware observes every step. :response-format enforces output schema with self-healing retries via :max-retries if the LLM returns invalid output.
Agents expose caps and declare deps. The system wires deps to caps at start time. An agent can use another agent as a node.
The inner agent runs with its own execution scope but shares the same store and trace-id. Caps can carry Malli schemas for input and output validation. rewire! changes dep targets at runtime without restarting the system.
Where kex comes in
Right now, capability security is structural. Possessing a CapHandle is sufficient to invoke it. There is no cryptographic verification.
When kex is integrated, cross-node calls will carry token chains. Each delegation can only narrow the original capability, never expand it. An agent receiving a delegated capability cannot do more than what it was given. This is still on the roadmap.
Ayatori is a local prototype. No distributed execution yet. CapHandles cannot be serialized or sent across the wire. The middleware dispatch is synchronous. Observability covers individual runs but not topology visualization. Time-based timeout is not implemented, though :max-steps (default 100) provides step-based protection against infinite loops.
These are the next problems to solve.
Multi-node execution is a harder problem. When an agent on one node calls an agent on another and expects a result back, the continuation has to survive the network boundary. Supervision, timeout handling, and meaningful error propagation across agent boundaries are in the same category: things I want to tackle, but that will take time.
Why I built this
My day-to-day work has not involved writing production code for a while. I try to stay hands-on anyway. Clojure is what I reach for when I want to think through a problem with code.
When I started exploring agentic systems, I kept building things to understand them. The minimal agent engine was one experiment. Kex was another. Ayatori is where the two ideas merged into something more structured.
It is an experiment, not a product. The code is available on GitHub if you want to read it, run it, or point out what is wrong with it.
I will keep working on it when I have time. Distributed execution and kex integration are the next things I want to tackle. No timeline.
As part of the Clojure team’s efforts to improve the onboarding experience for new users, we have recorded a step-by-step tutorial taking you from zero to a running REPL.
The video begins with installing the JVM and Clojure CLI, then walks through installing Calva, opening an example project, and connecting it to a REPL to evaluate code.
There is a modeling technique I’ve used in the past, and the more I use it, the more it feels like the right default in a certain class of problems.
The situation is simple enough: you model with data, happily, until one day plain values are no longer enough.
Not because you need more structure.
Because you need more distinctions, more equivalence classes.
You have values representable by the same collection type but they should not be confused. At this point we usually reach for one of three things:
- maps with a `:type`-like key (or worse: keyset-sniffing!),
- metadata,
- `defrecord` or `deftype`.
They all work... to some extent.
They all fail in the same way: code that looks sensible does the wrong thing, because the nuances and invariants of our fifty shades of maps get ignored.
Let's review them!
Maps with a :type
The classic: just add a `:type` key. One can't go wrong with classics, right? Right?
```clojure
{:type ::user-id :value 42}
```
Good enough for a while, but the cost is that you are still working with a map.
Sooner or later, someone writes or runs map code over it as if it were a plain map.
It's not that one shouldn't be able to use generic functions on them; it's that one shouldn't be able to do so without being reminded that they are not plain maps.
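A quick REPL illustration of that failure mode (the `uid` value is my own example):

```clojure
;; Any generic map operation accepts the value and quietly discards
;; the :type marker.
(def uid {:type :user-id, :value 42})

(select-keys uid [:value]) ;=> {:value 42}, the tag is gone, no warning
```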
Metadata
Metadata is attractive because it does not pollute the value itself. Unfortunately that is also why it is such a poor fit for modeling: metadata is not part of equality.
Plus, it's not printed by default, and preserving metadata across transformations is a constant cognitive overhead.
defrecord and deftype
Okay, deftype can do the job, at the cost of a lot of boilerplate to give it value semantics.
Wait! Isn't defrecord essentially deftype with value semantics? Yes, it ticks all the boxes: value semantics with its own equivalence class and prints clearly. The catch is that map? returns true on records.
Is that really a problem? Yes, because one can't guard every `map?` with a `record?` (especially when using third-party code).
Imagine the mess if (every? fn? [:a 'a [] {} #{} #'*out*]) was true. That's why we have fn? and ifn?.
Plus you have to go through protocols or instance? checks to tell them apart. Nothing as easy (or simple? 🤔) as :type. (Yes, there's type, but then you can't have types in a case...)
Lastly, you have the hysteresis issue caused by records silently downgrading to maps when a field key is dissoc-ed.
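Both problems are easy to reproduce at the REPL (`UserId` is an illustrative record, not from the post):

```clojure
(defrecord UserId [n])

(map? (->UserId 42))                ;=> true, records answer map?
(dissoc (->UserId 42) :n)           ;=> {}, silently a plain map now
(record? (dissoc (->UserId 42) :n)) ;=> false
```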
The silver bullet: Tagged Values
All hope is not lost. I've been increasingly treading a fourth path: tagged values.
The idea is to (ab)use the tagged-literal function to create values which can't be mistaken for others.
```clojure
user=> (tagged-literal `customer {:name "Wile E. Coyote"})
;; prints clearly by default
#user/customer{:name "Wile E. Coyote"}

user=> (= (tagged-literal `supplier {:name "Wile E. Coyote"})
          (tagged-literal `customer {:name "Wile E. Coyote"}))
;; each tag is in its own equivalence class
false

user=> (= {:name "Wile E. Coyote"}
          (tagged-literal `customer {:name "Wile E. Coyote"}))
;; since they have their own equivalence class, they are not equal to maps
false

user=> (map? (tagged-literal `customer {:name "Wile E. Coyote"}))
;; they are not maps
false

user=> (coll? (tagged-literal `customer {:name "Wile E. Coyote"}))
;; not even collections
false

user=> (:tag (tagged-literal `customer {:name "Wile E. Coyote"}))
;; still, accessing the tag is easy
user/customer

user=> (:form (tagged-literal `customer {:name "Wile E. Coyote"}))
;; as well as accessing the payload
{:name "Wile E. Coyote"}
```
It is a wrapper with meaning, with no ceremony.
The important part is not the printed literal syntax. In fact the reader is beside the point here. The important part is that you can create a distinct semantic value for free!
So tagged values buy you something very simple and valuable: safe modeling space! (Fresh equivalence classes.)
If a plain 42 and a "user id 42" should not be interchangeable, then they should not be equal, should not be confused, and should not accidentally flow through the same code paths. This is what tagged values give you: not more structure, but stronger distinctions that prevent unknowingly sending specific data through generic paths and, conversely, prevent specific pipelines from accidentally becoming generic.
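One way to put this to work is with small constructor and predicate helpers. A hedged sketch (the helper names and the `acme` tag namespace are mine, not from the post):

```clojure
;; A constructor and a predicate give the tag a stable home.
(def user-id-tag 'acme/user-id)

(defn user-id [n] (tagged-literal user-id-tag n))

(defn user-id? [x]
  (and (tagged-literal? x) (= user-id-tag (:tag x))))

(user-id? (user-id 42))       ;=> true
(= (user-id 42) 42)           ;=> false, its own equivalence class
(= (user-id 42) (user-id 42)) ;=> true, value semantics preserved
```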
Closing
Clojure makes it blissfully easy to model with plain data. That is one of its strengths.
When you run out of types, you don't need more shapes; you need more separation, and that's what tagged values bring to the table at almost no cost.
Once you start seeing some modeling problems in terms of equivalence classes rather than representation, they make more and more sense.
Clojure: The Official Documentary premieres April 16th!
From a two-year sabbatical and a stubborn idea to powering the engineering stack of one of the world’s largest fintech companies — this is the story of Clojure.
Featuring Rich Hickey, Alex Miller, Stuart Halloway, and many more, this full-length documentary traces Clojure’s unconventional origins, its values-driven community, and the language’s quiet but profound impact on how we think about software.
Documentary made possible with the support of Nubank!