Software Engineer at Scarlet

£60,000 - £110,000

Our mission is to hasten the transition to universally accessible healthcare. We deliver on this mission by enabling innovators to bring cutting-edge software and AI to the healthcare market safely and quickly. We're regulated by the UK Government and European Commission to do so.

Our certification process is optimised for software and AI, facilitating a more efficient time to market, and the frequent releases needed to build great software. This ensures patients safely get the most up-to-date versions of life-changing technology.

Come help us bring the next generation of healthcare to the people who need it.

Our challenges

Product and engineering challenges go hand in hand at Scarlet. We know our mission can only be accomplished if we:

  • Build products and services that our customers love.
  • Streamline and accelerate complex regulatory processes without sacrificing quality.
  • Ensure that we always conduct excellent safety assessments of our customers’ products.
  • Continuously ship great functionality at a high cadence - and have fun while doing it.
  • Build and maintain a tech stack that embraces the complex domain in which we work.

We have plenty of engineering problems, and we have chosen Clojure as the tool to solve them.

The team

The team is everything at Scarlet and we aspire to shape and nurture a team where every team member:

  • Really cares about our customers.
  • Works cross-functionally with engineers, product managers, designers, regulatory experts, and other stakeholders.
  • Collaborates on solving hard, important, real-world problems.
  • Helps bring out the best in each other and supports each other’s growth.
  • Brings a diverse set of experiences, backgrounds, and perspectives.
  • Feels that what we do day-to-day is deeply meaningful.

We all have our fair share of experience working with startups, open source, and various problem spaces. We wish to expand the team with ~2 more team members who can balance our strengths and weaknesses and help Scarlet build fantastic products.

We’re looking for ambitious teammates who have at least a few years of experience, have an insatiable hunger to learn, and want to do the most important work of their career!

How we work

Our ways of working are guided by a desire to perform at the highest level and do great work.

  • Flexible working: Remote-first with no fixed hours or vacation tracking.
  • Low/no scheduled meetings: Keep meetings to a minimum—no daily stand-ups or agile ceremonies.
  • Asynchronous collaboration: Have rich async discussions and flexible 1:1s as needed.
  • High trust and autonomy: Everyone solves problems; we are responsible for our choices and for communicating them to our teammates.
  • Getting together: We meet a minimum of twice a year for a week at our offices in London.
  • Pick your tools: We believe in engineering excellence and trust you to use the toolset you feel most productive with.

About you

If this sounds exciting to you, we believe Scarlet may be a great fit and would love to hear from you!

We believe that the potential for a great fit is even higher if you have one or more of the following:

  • Professional Clojure experience.
  • Professional full-stack web development experience.
  • Previous experience in the health tech / regulatory space.
  • Endless curiosity and a drive to understand why things are the way they are.
  • Superb written and verbal communication.
  • Live within ±2 hours of the UK time zone.

The interview process

Though the order may change, the interview steps are:

  1. Intro call with Niclas - 45 mins
  2. Technical knowledge call with an engineer - 60 mins
  3. Culture fit call with a product manager - 60 mins
  4. Technical workshop with Niclas - 90 mins
    • A pair programming session or a presentation of something you’ve built, the choice is yours
  5. Culture fit calls with our co-founders Jamie and James - 30 mins each
  6. Referencing & offer

We want your experience with Scarlet to be a good one and we do our utmost to ensure that you feel welcomed throughout the interview process.

Permalink

nREPL 1.4

nREPL 1.4.0 is out! This month we celebrate 15 years since nREPL’s development started, so you can consider this release part of the upcoming birthday celebrations.

So, what’s new?

Probably the highlight is the ability to pre-configure default values for dynamic variables in all nREPL servers that are launched locally (either per project or system-wide). The most useful application for this would be to enable *warn-on-reflection* in all REPLs. To achieve this, create ~/.nrepl/nrepl.edn with this content:

{:dynamic-vars {clojure.core/*warn-on-reflection* true}}

Now, any nREPL server started from any IDE will have *warn-on-reflection* enabled.

$ clojure -Sdeps "{:deps {nrepl/nrepl {:mvn/version \"1.4.0\"}}}" -m nrepl.cmdline -i
user=> #(.a %)
Reflection warning, NO_SOURCE_PATH:1:2 - reference to field a can't be resolved.

Note: nREPL doesn’t directly support XDG_CONFIG_HOME yet, but you can easily override the default global config directory (~/.nrepl) with NREPL_CONFIG_DIR.

Another new feature is the ability to specify :client-name and :client-version when creating a new nREPL session with the clone op. This allows collecting information about the clients in use, which some organizations might find useful. (I know Nubank is making use of this already.)
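
For illustration, here is roughly how a client could identify itself using the nrepl.core client API. This is a sketch: the client name/version values and the port are hypothetical, and the message keys are assumed to mirror the option names.

(require '[nrepl.core :as nrepl])

;; Hypothetical client identifying itself when cloning a session.
;; Assumes an nREPL server is listening on port 7888.
(with-open [conn (nrepl/connect :port 7888)]
  (-> (nrepl/client conn 1000)
      (nrepl/message {:op "clone"
                      :client-name "my-editor"
                      :client-version "0.1.0"})
      doall))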

One notable change in nREPL 1.4 is the removal of support for Clojure 1.7. Clojure 1.7 was released way back in 2015 and I doubt anyone is using it these days, but we try to be extra conservative with the supported Clojure versions, and this is only the second time nREPL’s runtime requirements have been bumped in the seven and a half years I’ve been the maintainer of the project. (Early on, I bumped the required Clojure version from 1.2 to 1.7.)

As usual, the release also features small bug fixes and internal improvements. One such improvement was the introduction of matcher-combinators in our test suite (which was the main motivation to bid farewell to Clojure 1.7). You can check out the release notes for a complete list of changes.

That’s all I have for you today. I hope we’ll have more exciting nREPL news to share by nREPL’s “official” birthday, October 8th.1 Big thanks to everyone who has contributed to this release and to all the people supporting my Clojure OSS work! In the REPL we trust! Keep hacking!

  1. nREPL 0.1 was released on Oct 11, 2015. 

Permalink

How Nubank Built its in-house log platform

This work was done collaboratively by several incredible people (alphabetical order): Amarilis Campos, Beca Maia, Caio Sousa, Daniel Cristian, Jade Costa, Maria Duarte, Otavio Valadares, Robert Cristiam and many teams inside Nu.

Why did we do it?

We believe that behind all technical solutions there must be a clear business problem to be solved. In this case, we were facing challenges related to the stability and efficiency of our monitoring ecosystem. As Nubank scaled rapidly, our existing log infrastructure began to show signs of pressure, especially in terms of cost predictability and scalability.

Considering the logging platform is key to supporting all engineering teams during troubleshooting and incident mitigation, not having full control of and visibility into your monitoring data is a serious problem. There is nothing worse than trying to debug a production problem and discovering you can’t see your application’s logs. In our case, we used to rely on an external solution to ingest and store our logs, and we had poor observability around it (ironic). Once we created metrics to understand the real situation, our analysis showed that a significant portion of logs weren’t being retained end-to-end, which limited our ability to act quickly in incident response scenarios.

On top of that, our contract was getting expensive (really expensive). The only way to mitigate our problems was buying more licenses (paying more), and there was no clear pricing model for us to plan our spending. If we had problems, we’d have to add more money; no predictability was possible. It got to the point where the team calculated that we could hire Lionel Messi as a software engineer for the same amount we were paying for the external solution.

With this complex and exciting problem at hand, we decided to explore alternatives, and the most efficient one seemed to be creating our own platform. This way we would have total control over our data, ingestion pipeline, storage strategy and querying runtime.

Nubank’s Previous Log Infrastructure

Before moving to an in-house solution, Nubank’s log infrastructure was very simple and tightly coupled to the vendor’s solution.

In short, every application log was sent directly to the vendor’s platform by its own forwarder. We also had many unknown internal sources sending data directly to the vendor’s API.

This architecture served Nubank well for many years, but with our massive hyper-scale growth we started to face its limitations some years ago, and the future with it became a concern.

The primary concerns and problems with this architecture and approach, as identified by the team, were:

  • Lack of observability: We didn’t have any observability over the ingestion and storage flow; if something went wrong, we had no trustworthy metrics about it.
  • High coupling: Many of our alerts and dashboards were defined directly in the vendor’s interface, all our data was stored within it, and we couldn’t easily change solutions or migrate away.
  • Lack of control: We didn’t have any way to filter, aggregate, route or apply logic over incoming data.
  • High costs: The increasing cost of the logging stack was a constant concern for stakeholders, and the trend was that it would keep growing if we didn’t take action.
  • Coupled ingestion and querying processes: High load on ingestion directly impacted querying performance, and vice versa.

Divide and conquer

Developing an entire log platform from scratch is hard, and at the time we didn’t have anything built!

To be able to solve this problem, we divided the entire project into two major steps:

  • The Observability Stream: A complete streaming platform capable of ingesting and processing observability signals in a reliable and efficient manner, decoupling us from the vendor’s solution and giving us full control over our data.
  • Query & Storage Platform: The platform that would store and make logs searchable, so that engineers could use it on daily troubleshooting tasks. 

For both projects we had a different set of requirements and features that we needed to build, but there were three common requirements:

  • Reliable: The platform needed to be reliable even under high load or unexpected scenarios to support Nubank operations.
  • Scalable: Able to scale quickly when facing spikes in ingestion and usage, and over the long term as Nubank’s hyper-growth continues.
  • Cost Efficient: Being cost efficient is always important at Nubank, and we needed a platform with long-term cost efficiency, able to ingest and store all the data we generate more cheaply than any vendor.

With a clear list of requirements and expectations we started the project, first focusing on the ingestion and processing, and then the querying and storage platform.

The Observability Stream

We decided to build the ingestion platform first: it allowed us to start the migration process without any major disruption to the developer experience while already decoupling the transactional environment from the observability environment. It also allowed us to gather metrics about our data to support better decision-making, especially during the storage platform development.

The observability stream was built with simplicity in mind, with a mix of open source projects and in-house developed systems.

To summarize, the ingestion architecture is composed of three distinct systems:

  • Fluent Bit: We opted for a lightweight, easily configurable, and efficient data collector and forwarder. This open-source, CNCF-backed project is a reliable industry standard for the task.
  • Data Buffer Service: The service responsible for handling all incoming data from forwarders and accumulating it into large chunks that move through the pipeline in a micro-batching fashion (see the sketch after this list).
  • Filter & Process Service: An in-house, highly scalable system that filters and processes incoming data efficiently. This system is the core of our ingestion platform; it is easily extensible with new filter/process logic and is also responsible for collecting metrics from incoming data.
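
To make the micro-batching idea concrete, here is an illustrative Clojure sketch using core.async (this is not Nubank’s actual code, and the names and thresholds are made up): a buffer loop that flushes a chunk downstream when it reaches a size limit or when a time window elapses.

(require '[clojure.core.async :as a])

(defn micro-batch!
  "Reads items from the `in` channel and calls `flush!` (which should be
  fast/non-blocking) with chunks of at most `max-items` items, or with
  whatever has accumulated after `max-ms` milliseconds."
  [in flush! {:keys [max-items max-ms]}]
  (a/go-loop [chunk [] deadline (a/timeout max-ms)]
    (let [[v port] (a/alts! [in deadline])]
      (cond
        ;; Time window elapsed: flush whatever we have and start over.
        (= port deadline) (do (when (seq chunk) (flush! chunk))
                              (recur [] (a/timeout max-ms)))
        ;; Input channel closed: flush the remainder and stop.
        (nil? v) (when (seq chunk) (flush! chunk))
        ;; Otherwise accumulate, flushing when the chunk is full.
        :else (let [chunk (conj chunk v)]
                (if (>= (count chunk) max-items)
                  (do (flush! chunk) (recur [] (a/timeout max-ms)))
                  (recur chunk deadline)))))))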

With the observability stream fully operational, we established a foundation of reliability and scalability for our log ingestion processes. This comprehensive system not only resolved our immediate needs for quality data intake but also provided us with invaluable insights into our logging activities. Furthermore, it decoupled our ingestion processes from the querying process, allowing for greater flexibility and the ability to easily change components when needed, a capability we lacked previously due to tight coupling.

Query & Log Platform

With a robust ingestion platform ensuring reliability and scalability, our next challenge was to develop a query and storage solution capable of effectively handling and retrieving this massive volume of log data.

With all this, we needed to choose a query engine to search all this data, and Trino was the choice for several reasons:

  • Partitioning: Trino’s partitioning was a crucial feature and a key factor in our decision to adopt it. By segmenting data into manageable chunks, queries can target only the relevant data subsets, improving response times and reducing resource usage.
  • AWS S3 as storage: Storing all our data on AWS S3 guarantees the high reliability of our data in a cost-effective way. S3 is well suited to receiving this massive amount of data and can keep scaling over the long term as Nubank grows.

To store the logs, we chose the Parquet format. With it, we achieve the best search performance thanks to its columnar storage, while also achieving an average compaction rate of 95%. This helps us reach the goal of storing all our data in the most effective way.

To generate all these Parquet files, we built a highly scalable and extensible Parquet generator app capable of transforming the massive volume of data coming from the ingestion platform into Parquet. The choice to build our own internal infrastructure for this also reflects our goal of having a cost-effective alternative that we can extend and adapt to Nubank’s needs.

With our query and log platform fully integrated and operational, we have successfully redefined how Nubank manages its log data. The strategic choice of Trino for querying, S3 for storage, and Parquet for data format ensures that our logs are not only efficiently stored but also readily accessible for analysis and troubleshooting. These innovations have not only resolved initial challenges but have also equipped Nubank with a powerful tool for future growth.

Final Thoughts

Since mid-2024, Nubank’s in-house logging platform has been the default for log storage and querying. It currently ingests 1 trillion logs daily, totaling 1 PB of data. With a 45-day retention period, it stores 45 PB of searchable data. The platform handles almost 15,000 queries daily, scanning 150 PB of data each day.

Nubank developed this in-house logging platform to achieve significant cost savings and operational efficiency, moving away from reliance on external vendors. This platform is designed to support all current and future operations, scaling efficiently while costing 50% less than market solutions, according to our benchmarks.

This approach also provides Nubank with unparalleled control and flexibility. It enables rapid iteration, custom feature development, and a deeper understanding of data flows, leading to improved analytics, troubleshooting, and security.

Challenging the status quo is a core Nubank value, and this ambition drove the creation of an entire log platform from scratch, leveraging a combination of open-source projects and in-house software development.

The post How Nubank Built its in-house log platform appeared first on Building Nubank.

Permalink

🌌 A Tutorial on P-adic Structures with Clojure

This tutorial explores how to construct and analyze p-adic structures using prefix trees (tries) in Clojure. We will generalize binary Morton codes to other prime bases (like p=3, p=5) to understand p-adic norms and their applications in data analysis.

👈 This article is a supplement to part 1.

1. 🔗 The Core Idea: Prefix Chains

Any sequence of data, such as a Morton code or spatial coordinates, can be broken down into a chain of its prefixes. This forms a natural hierarchy, where each step in the chain adds more specific information.

For a sequence [a, b, c, d], the prefix chain is:
[[a], [a, b], [a, b, c], [a, b, c, d]]

This structure is essentially a linked list or a simple trie, which is the foundation for our analysis. 🧱

(defn build-prefix-chain
  "Builds a list of all prefixes for a given sequence."
  [sequence]
  (map #(vec (take % sequence))
       (range 1 (inc (count sequence)))))

;; Example 💡
(let [morton-code [1 0 2 1]]
  (println "Sequence:" morton-code)
  (println "Prefix Chain:" (build-prefix-chain morton-code)))
;; Output:
;; Sequence: [1 0 2 1]
;; Prefix Chain: ([1] [1 0] [1 0 2] [1 0 2 1])

2. 🔀 Decomposing Data: Two Perspectives

We can analyze the hierarchical data in our prefix chains in two ways, analogous to Jordan and Cartan decompositions in algebra.

📊 A. Jordan-like Decomposition (Breadth-First)

This approach processes prefixes level by level, from shortest to longest. It's useful for analyzing data at progressive scales of detail.

(defn jordan-decomposition
  "Sorts prefixes by their length (breadth-first)."
  [prefix-chains]
  (sort-by count (distinct (apply concat prefix-chains))))

;; Example: Analyze all prefixes by depth level 📏
(let [sequences [[1 0 2] [1 0 1] [2 0]]
      all-chains (map build-prefix-chain sequences)
      decomposed (jordan-decomposition all-chains)]
  (clojure.pprint/pprint (group-by count decomposed)))
;; Output:
;; {1 [[1] [2]],
;;  2 [[1 0] [2 0]],
;;  3 [[1 0 2] [1 0 1]]}

🎯 B. Cartan-like Decomposition (Depth-First)

This approach processes the deepest (most specific) prefixes first. It's useful for focusing on fine-grained local details before considering the broader structure.

(defn cartan-decomposition
  "Sorts prefixes by length in descending order (depth-first)."
  [prefix-chains]
  (reverse (jordan-decomposition prefix-chains)))

;; Example: Focus on the most detailed prefixes first 🔍
(let [sequences [[1 0 2] [1 0 1] [2 0]]
      all-chains (map build-prefix-chain sequences)
      decomposed (cartan-decomposition all-chains)]
  (println decomposed))
;; Output:
;; ([1 0 1] [1 0 2] [2 0] [1 0] [2] [1])

3. 📐 P-adic Norms and Ultrametric Distance

The prefix structure directly leads to the concept of p-adic norms and ultrametric distance. The distance between two sequences is determined by the length of their longest common prefix.

If two sequences A and B share a prefix of length k, their ultrametric distance is p^(-k), where p is the base of the digits (e.g., p=2 for binary, p=3 for ternary). The longer the shared prefix, the closer they are. 🎯

(defn get-common-prefix-length
  "Finds the length of the common prefix between two sequences."
  [seq-a seq-b]
  (count (take-while true? (map = seq-a seq-b))))

(defn p-adic-distance
  "Calculates the p-adic distance between two sequences for a given base p."
  [p seq-a seq-b]
  (let [k (get-common-prefix-length seq-a seq-b)]
    (Math/pow p (- k))))

;; Example with p=3 ⚡
(let [p 3
      a [1 2 0 1]
      b [1 2 0 2]
      c [1 2 1 0]]
  (println (str "Common prefix length (a, b): " (get-common-prefix-length a b)))
  (println (str "Distance(a, b): " (p-adic-distance p a b))) ; should be 3^-3 = 0.037
  (println (str "Common prefix length (a, c): " (get-common-prefix-length a c)))
  (println (str "Distance(a, c): " (p-adic-distance p a c)))) ; should be 3^-2 = 0.111

This distance function satisfies the strong triangle inequality, d(x, z) <= max(d(x, y), d(y, z)), which is the defining property of an ultrametric space. ✨
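
We can sanity-check this property on the sequences from the example above:

;; d(a, c) must not exceed the larger of d(a, b) and d(b, c).
(let [p 3
      a [1 2 0 1]
      b [1 2 0 2]
      c [1 2 1 0]
      d (partial p-adic-distance p)]
  (<= (d a c) (max (d a b) (d b c))))
;; => true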

4. 🌍 Case Study: Clustering with Ternary (p=3) Morton Codes

Let's apply these ideas to cluster spatial data. Instead of using standard binary (p=2) Morton codes, we can use a ternary (p=3) system. This creates a different hierarchical grouping of the data.

The goal is to convert 3D coordinates into a 1D ternary Morton code, which preserves spatial locality. Then, we can use our p-adic distance to find clusters. 🗺️

;; A simplified function to interleave digits for a p-adic Morton code 🔢
(defn to-base-p [p precision n]
  (loop [num n
         result ()]
    (if (or (zero? num) (= (count result) precision))
      (take precision (concat (repeat (- precision (count result)) 0) result))
      (recur (quot num p) (cons (rem num p) result)))))

(defn p-ary-morton-3d [x y z p precision]
  (let [x' (to-base-p p precision x)
        y' (to-base-p p precision y)
        z' (to-base-p p precision z)]
    (vec (interleave x' y' z'))))

;; Example: Use p=3 for clustering earthquake data 🌋
(let [p 3
      precision 4
      ;; Mock earthquake data (normalized coordinates)
      earthquakes [[10 12 5] [11 13 5] [25 26 20]]

      ;; Generate ternary Morton codes
      morton-codes (map #(p-ary-morton-3d (nth % 0) (nth % 1) (nth % 2) p precision) earthquakes)
      [morton-a morton-b morton-c] morton-codes]

  (println "Earthquake A Morton:" morton-a)
  (println "Earthquake B Morton:" morton-b)
  (println "Earthquake C Morton:" morton-c)

  ;; The first two earthquakes are spatially close 📍
  ;; Their Morton codes will share a longer prefix.
  (println "\nDistance A-B:" (p-adic-distance p morton-a morton-b))
  (println "Distance A-C:" (p-adic-distance p morton-a morton-c)))

;; By sorting data based on these Morton codes, we achieve
;; a spatially coherent ordering that can be used for efficient
;; clustering, neighbor searches, and indexing. 🚀
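
As a small sanity check, sorting the mock earthquakes by their Morton codes (vectors of equal length compare element-wise in Clojure) places the two nearby points next to each other:

(let [earthquakes [[10 12 5] [25 26 20] [11 13 5]]]
  (sort-by #(p-ary-morton-3d (nth % 0) (nth % 1) (nth % 2) 3 4)
           earthquakes))
;; => ([10 12 5] [11 13 5] [25 26 20])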

🎉 Conclusion

By viewing data sequences as prefix trees, we have built a practical foundation for understanding p-adic numbers and ultrametric spaces. This tutorial shows that we can go beyond binary systems and construct p-adic structures for any prime p, using them to decompose and analyze data hierarchically.

This approach connects computational geometry with number theory, offering a powerful framework for spatial analysis in Clojure. 💎

Buy me a coffee if this helped! ☕

Permalink

Clojure Deref (Sep 2, 2025)

Welcome to the Clojure Deref! This is a weekly link/news roundup for the Clojure ecosystem (feed: RSS).

Upcoming Events

Podcasts, videos, and media

Libraries and Tools

New releases and tools this week:

Permalink

Tripping around REPL

What does it mean, tripping around? Is it about round-tripping values between the REPL and the editor? Or about tripping over obstacles? In this post, I talk about both!

Round-tripping

In the context of REPL use, round-tripping means a particular property of printed data: the printed string representation of data, if evaluated, produces an equivalent data structure. For example, this map is round-trippable:

{
    :a 1 
    :b true 
    "str" 0
}

If you print it, you’ll get the same thing back. This is very useful because it speeds up development at the REPL: maps can be copied, saved, loaded, programmatically re-read, you get the point.

Some things cannot be round-tripped. Take this function, for example:

;; normal Clojure REPL
user=> assoc
#object[clojure.core$assoc__5416 0x4a9486c0 "clojure.core$assoc__5416@4a9486c0"]

A function is not data; it cannot always be round-tripped exactly, so you get this. It makes sense to a degree, though I don’t like it: the utility of round-tripping during development far outweighs the drawback of some inaccuracy. For this reason, Reveal — Read Eval Visualize Loop for Clojure — has always used this representation instead:

;; Reveal REPL output
assoc
=> clojure.core/assoc

Why did I use this representation? Two reasons:

  1. It is round-trippable. I can copy a data structure with a function from the Reveal output pane into REPL, and it will evaluate to the same function without problem.
  2. Due to syntax highlighting, it is visually distinct from the symbol clojure.core/assoc.

Did you notice #_0x4a9486c0 after the function name? This is a new addition to the Reveal function printer, available in Reveal 1.3.296. It fixes a problem I was tripping over from time to time.

Tripping over identity

Default Clojure representation of a function includes an important bit of information: the object’s identity hash code. Identity matters, and hiding it makes it harder to discover identity-related issues; for example:

  • When comparing objects for equality to determine if some computation has to be repeated, using a function as a part of a “cache entry” requires care. Yes, I have a custom partial implementation with equality semantics in production (a sketch of the idea follows this list).
  • Using objects with unique identity as keys requires care.
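
Here is a minimal sketch of a partial with value equality; it is not my actual production implementation, and it only supports a couple of arities:

;; Two instances compare equal when built from the same function and
;; arguments, unlike clojure.core/partial which creates a fresh closure.
(defrecord EqPartial [f args]
  clojure.lang.IFn
  (invoke [_] (apply f args))
  (invoke [_ x] (apply f (concat args [x])))
  (applyTo [_ more] (apply f (concat args more))))

(defn eq-partial [f & args]
  (->EqPartial f (vec args)))

(= (partial + 1) (partial + 1))       ;; => false
(= (eq-partial + 1) (eq-partial + 1)) ;; => true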

One particular gotcha is regexes: instances of java.util.regex.Pattern do NOT define value equality and hash code. This means using them as keys is dangerous. This is why Reveal also shows regexes with their identity:

Yes, this code does not even produce a duplicate key error:

{#"a|b" :a-or-b
 #"a|b" :a-or-b}

You might ask, why use #_0xcafebabe to show identity? Well, that’s because it does not sacrifice round-tripping! #_ is a reader macro that ignores the next form, and 0xcafebabe is a valid, complete Clojure form (see the demonstration after this list). With this approach, you can both:

  • see identities of objects
  • copy them from the output pane to the editor, evaluate, and get (more or less) equivalent objects, whose identities, again, you can see.
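
A quick demonstration, with a made-up hash value:

;; #_ makes the reader discard the identity annotation entirely:
(read-string "[#\"a|b\" #_0x1b2c3d4e]")
;; => [#"a|b"]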

More round-tripping with syntax highlighting

Syntax highlighting adds color — an extra dimension to printed data that allows for differentiating related things when the text is the same. Earlier, I showed how the symbol clojure.core/assoc and the function clojure.core/assoc use the same text, but different colors. But there is more! If we can use colors to differentiate symbols and functions, we can use them to differentiate objects and Clojure forms that produce such objects when evaluated. What kinds of objects? Refs! Futures! Files! Other stuff!

When dogfooding this feature, I found it important to use a separate color for parens, making them grey so they are not mistaken for collections (which also use parens, but yellow ones). I think it’s very useful!

Tripping over namespaces (in Cursive)

In the final part of the post, I want to talk about using a socket REPL in Cursive. I’ve been using it with Reveal for ages. One aspect in which a socket REPL is inferior to nREPL is automatic switching of the namespace to the current file. Cursive — a Clojure plugin for IntelliJ IDEA — only sends evaluated forms verbatim when using a socket REPL. This means that every time you switch between Clojure files in IDEA, you need to trigger a shortcut that explicitly switches the ns so that sent forms evaluate without errors. It’s annoying that I have to have this habit.

Had to have this habit. It turns out that, since Cursive also sends the file and line as form metadata, Reveal (or any other REPL implementation, really) can infer the right namespace for evaluation by inspecting the file content. The newest version of Reveal now supports this (behind a flag, but enabled by default)! This means that Reveal, when used as a socket REPL in IDEA, will now automatically evaluate forms in the right namespace, which greatly improves the experience!
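
The underlying idea can be sketched in a few lines (this is not Reveal’s actual implementation): given the :file metadata of an incoming form, read the file’s leading ns form to pick the evaluation namespace.

(require '[clojure.java.io :as io])

(defn infer-ns
  "Returns the namespace symbol declared by the first ns form in file,
  or nil if the file doesn't start with one."
  [file]
  (with-open [r (java.io.PushbackReader. (io/reader file))]
    (let [form (read {:eof nil} r)]
      (when (and (seq? form) (= 'ns (first form)))
        (second form)))))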

Permalink

Build and Deploy Web Apps With Clojure and Fly.io

This post walks through a small web development project using Clojure, covering everything from building the app to packaging and deploying it. It’s a collection of insights and tips I’ve learned from building my Clojure side projects but presented in a more structured format.

As the title suggests, we’ll be deploying the app to Fly.io. It’s a service that allows you to deploy apps packaged as Docker images on lightweight virtual machines.[1] My experience with it has been good: it’s easy to use and quick to set up. One downside of Fly is that it doesn’t have a free tier, but if you don’t plan on leaving the app deployed, it barely costs anything.

This isn’t a tutorial on Clojure, so I’ll assume you already have some familiarity with the language as well as some of its libraries.[2]

Project Setup

In this post, we’ll be building a barebones bookmarks manager for the demo app. Users can log in using basic authentication, view all bookmarks, and create a new bookmark. It’ll be a traditional multi-page web app and the data will be stored in a SQLite database.

Here’s an overview of the project’s starting directory structure:

.
├── dev
│   └── user.clj
├── resources
│   └── config.edn
├── src
│   └── acme
│       └── main.clj
└── deps.edn

And the libraries we’re going to use. If you have some Clojure experience or have used Kit, you’re probably already familiar with all the libraries listed below.[3]

deps.edn
{:paths ["src" "resources"]
 :deps {org.clojure/clojure               {:mvn/version "1.12.0"}
        aero/aero                         {:mvn/version "1.1.6"}
        integrant/integrant               {:mvn/version "0.11.0"}
        ring/ring-jetty-adapter           {:mvn/version "1.12.2"}
        metosin/reitit-ring               {:mvn/version "0.7.2"}
        com.github.seancorfield/next.jdbc {:mvn/version "1.3.939"}
        org.xerial/sqlite-jdbc            {:mvn/version "3.46.1.0"}
        hiccup/hiccup                     {:mvn/version "2.0.0-RC3"}}
 :aliases
 {:dev {:extra-paths ["dev"]
        :extra-deps  {nrepl/nrepl    {:mvn/version "1.3.0"}
                      integrant/repl {:mvn/version "0.3.3"}}
        :main-opts   ["-m" "nrepl.cmdline" "--interactive" "--color"]}}}

I use Aero and Integrant for my system configuration (more on this in the next section), Ring with the Jetty adapter for the web server, Reitit for routing, next.jdbc for database interaction, and Hiccup for rendering HTML. From what I’ve seen, this is a popular “library combination” for building web apps in Clojure.[4]

The user namespace in dev/user.clj contains helper functions from Integrant-repl to start, stop, and restart the Integrant system.

dev/user.clj
(ns user
  (:require
   [acme.main :as main]
   [clojure.tools.namespace.repl :as repl]
   [integrant.core :as ig]
   [integrant.repl :refer [set-prep! go halt reset reset-all]]))

(set-prep!
 (fn []
   (ig/expand (main/read-config)))) ;; we'll implement this soon

(repl/set-refresh-dirs "src" "resources")

(comment
  (go)
  (halt)
  (reset)
  (reset-all))

Systems and Configuration

If you’re new to Integrant or other dependency injection libraries like Component, I’d suggest reading “How to Structure a Clojure Web”. It’s a great explanation about the reasoning behind these libraries. Like most Clojure apps that use Aero and Integrant, my system configuration lives in a .edn file. I usually name mine as resources/config.edn. Here’s what it looks like:

resources/config.edn
{:server
 {:port #long #or [#env PORT 8080]
  :host #or [#env HOST "0.0.0.0"]
  :auth {:username #or [#env AUTH_USER "john.doe@email.com"]
         :password #or [#env AUTH_PASSWORD "password"]}}

 :database
 {:dbtype "sqlite"
  :dbname #or [#env DB_DATABASE "database.db"]}}

In production, most of these values will be set using environment variables. During local development, the app will use the hard-coded default values. We don’t have any sensitive values in our config (e.g., API keys), so it’s fine to commit this file to version control. If there are such values, I usually put them in another file that’s not tracked by version control and include them in the config file using Aero’s #include reader tag.
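
For example, with a hypothetical resources/secrets.edn kept out of version control, the config could pull it in like this (Aero resolves the path relative to the including file):

;; resources/config.edn
{:secrets #include "secrets.edn"}

;; resources/secrets.edn (untracked)
{:api-key "super-secret-key"}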

This config file is then “expanded” into the Integrant system map using the expand-key method:

src/acme/main.clj
(ns acme.main
  (:require
   [aero.core :as aero]
   [clojure.java.io :as io]
   [integrant.core :as ig]))

(defn read-config
  []
  {:system/config (aero/read-config (io/resource "config.edn"))})

(defmethod ig/expand-key :system/config
  [_ opts]
  (let [{:keys [server database]} opts]
    {:server/jetty (assoc server :handler (ig/ref :handler/ring))
     :handler/ring {:database (ig/ref :database/sql)
                    :auth     (:auth server)}
     :database/sql database}))

The system map is created in code instead of being in the configuration file. This makes refactoring your system simpler as you only need to change this method while leaving the config file (mostly) untouched.[5]

My current approach to Integrant + Aero config files is mostly inspired by the blog post “Rethinking Config with Aero & Integrant” and Laravel’s configuration. The config file follows a similar structure to Laravel’s config files and contains the app configuration without describing the structure of the system. Previously, I had a key for each Integrant component, which led to the config file being littered with #ig/ref and made it more difficult to refactor.
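
For contrast, a config in that older style might look like this hypothetical snippet, with a top-level key per component and refs embedded directly in the EDN:

;; Hypothetical component-per-key config, refs and all:
{:server/jetty {:port    #long #or [#env PORT 8080]
                :handler #ig/ref :handler/ring}
 :handler/ring {:database #ig/ref :database/sql}
 :database/sql {:dbtype "sqlite"
                :dbname #or [#env DB_DATABASE "database.db"]}}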

Also, if you haven’t already, start a REPL and connect to it from your editor. Run clj -M:dev if your editor doesn’t automatically start a REPL. Next, we’ll implement the init-key and halt-key! methods for each of the components:

src/acme/main.clj
(ns acme.main
  (:require
   ;; ...
   [acme.handler :as handler]
   [acme.util :as util]
   [next.jdbc :as jdbc]
   [ring.adapter.jetty :as jetty]))
;; ...

(defmethod ig/init-key :server/jetty
  [_ opts]
  (let [{:keys [handler port]} opts
        jetty-opts (-> opts (dissoc :handler :auth) (assoc :join? false))
        server     (jetty/run-jetty handler jetty-opts)]
    (println "Server started on port " port)
    server))

(defmethod ig/halt-key! :server/jetty
  [_ server]
  (.stop server))

(defmethod ig/init-key :handler/ring
  [_ opts]
  (handler/handler opts))

(defmethod ig/init-key :database/sql
  [_ opts]
  (let [datasource (jdbc/get-datasource opts)]
    (util/setup-db datasource)
    datasource))

The setup-db function creates the required tables in the database if they don’t exist yet. This works fine for database migrations in small projects like this demo app, but for larger projects, consider using libraries such as Migratus (my preferred library) or Ragtime.

src/acme/util.clj
(ns acme.util 
  (:require
   [next.jdbc :as jdbc]))

(defn setup-db
  [db]
  (jdbc/execute-one!
   db
   ["create table if not exists bookmarks (
       bookmark_id text primary key not null,
       url text not null,
       created_at datetime default (unixepoch()) not null
     )"]))

For the server handler, let’s start with a simple function that returns a “hi world” string.

src/acme/handler.clj
(ns acme.handler
  (:require
   [ring.util.response :as res]))

(defn handler
  [_opts]
  (fn [req]
    (res/response "hi world")))

Now all the components are implemented. We can check if the system is working properly by evaluating (reset) in the user namespace. This will reload your files and restart the system. You should see this message printed in your REPL:

:reloading (acme.util acme.handler acme.main)
Server started on port  8080
:resumed

If we send a request to http://localhost:8080/, we should get “hi world” as the response:

$ curl localhost:8080/
# hi world

Nice! The system is working correctly. In the next section, we’ll implement routing and our business logic handlers.

Routing, Middleware, and Route Handlers

First, let’s set up a ring handler and router using Reitit. We only have one route, the index / route that’ll handle both GET and POST requests.

src/acme/handler.clj
(ns acme.handler
  (:require
   [reitit.ring :as ring]))

(def routes
  [["/" {:get  index-page
         :post index-action}]])

(defn handler
  [opts]
  (ring/ring-handler
   (ring/router routes)
   (ring/routes
    (ring/redirect-trailing-slash-handler)
    (ring/create-resource-handler {:path "/"})
    (ring/create-default-handler))))

We’re including some useful middleware:

  • redirect-trailing-slash-handler to resolve routes with trailing slashes,
  • create-resource-handler to serve static files, and
  • create-default-handler to handle common 40x responses.

Implementing the Middlewares

If you remember the :handler/ring from earlier, you’ll notice that it has two dependencies, database and auth. Currently, they’re inaccessible to our route handlers. To fix this, we can inject these components into the Ring request map using a middleware function.

src/acme/handler.clj
;; ...

(defn components-middleware
  [components]
  (let [{:keys [database auth]} components]
    (fn [handler]
      (fn [req]
        (handler (assoc req
                        :db database
                        :auth auth))))))
;; ...

The components-middleware function takes in a map of components and creates a middleware function that “assocs” each component into the request map.[6] If you have more components such as a Redis cache or a mail service, you can add them here.

We’ll also need a middleware to handle HTTP basic authentication.[7] This middleware will check whether the username and password from the request map match the values in the auth map injected by components-middleware. If they match, the request is authenticated and the user can view the site.

src/acme/handler.clj
(ns acme.handler
  (:require
   ;; ...
   [acme.util :as util]
   [ring.util.response :as res]))
;; ...

(defn wrap-basic-auth
  [handler]
  (fn [req]
    (let [{:keys [headers auth]} req
          {:keys [username password]} auth
          authorization (get headers "authorization")
          correct-creds (str "Basic " (util/base64-encode
                                       (format "%s:%s" username password)))]
      (if (and authorization (= correct-creds authorization))
        (handler req)
        (-> (res/response "Access Denied")
            (res/status 401)
            (res/header "WWW-Authenticate" "Basic realm=protected"))))))
;; ...

A nice feature of Clojure is that interop with the host language is easy. The base64-encode function is just a thin wrapper over Java’s Base64.Encoder:

src/acme/util.clj
(ns acme.util
   ;; ...
  (:import java.util.Base64))

(defn base64-encode
  [s]
  (.encodeToString (Base64/getEncoder) (.getBytes s)))

Finally, we need to add them to the router. Since we’ll be handling form requests later, we’ll also bring in Ring’s wrap-params middleware.

src/acme/handler.clj
(ns acme.handler
  (:require
   ;; ...
   [ring.middleware.params :refer [wrap-params]]))
;; ...

(defn handler
  [opts]
  (ring/ring-handler
   ;; ...
   {:middleware [(components-middleware opts)
                 wrap-basic-auth
                 wrap-params]}))

Implementing the Route Handlers

We now have everything we need to implement the route handlers or the business logic of the app. First, we’ll implement the index-page function which renders a page that:

  1. Shows all of the user’s bookmarks in the database, and
  2. Shows a form that allows the user to insert new bookmarks into the database
src/acme/handler.clj
(ns acme.handler
  (:require
   ;; ...
   [next.jdbc :as jdbc]
   [next.jdbc.sql :as sql]))
;; ...

(defn template
  [bookmarks]
  [:html
   [:head
    [:meta {:charset "utf-8"
            :name    "viewport"
            :content "width=device-width, initial-scale=1.0"}]]
   [:body
    [:h1 "bookmarks"]
    [:form {:method "POST"}
     [:div
      [:label {:for "url"} "url "]
      [:input#url {:name "url"
                   :type "url"
                   :required true
                   :placeholder "https://en.wikipedia.org/"}]]
     [:button "submit"]]
    [:p "your bookmarks:"]
    [:ul
     (if (empty? bookmarks)
       [:li "you don't have any bookmarks"]
       (map
        (fn [{:keys [url]}]
          [:li
           [:a {:href url} url]])
        bookmarks))]]])

(defn index-page
  [req]
  (try
    (let [bookmarks (sql/query (:db req)
                               ["select * from bookmarks"]
                               jdbc/unqualified-snake-kebab-opts)]
      (util/render (template bookmarks)))
    (catch Exception e
      (util/server-error e))))
;; ...

Database queries can sometimes throw exceptions, so it’s good to wrap them in a try-catch block. I’ll also introduce some helper functions:

src/acme/util.clj
(ns acme.util
  (:require
   ;; ...
   [hiccup2.core :as h]
   [ring.util.response :as res])
  (:import java.util.Base64))
;; ...

(defn prepend-doctype
  [s]
  (str "<!doctype html>" s))

(defn render
  [hiccup]
  (-> hiccup h/html str prepend-doctype res/response (res/content-type "text/html")))

(defn server-error
  [e]
  (println "Caught exception: " e)
  (-> (res/response "Internal server error")
      (res/status 500)))

render takes a hiccup form and turns it into a ring response, while server-error takes an exception, logs it, and returns a 500 response.

Next, we’ll implement the index-action function:

src/acme/handler.clj
;; ...

(defn index-action
  [req]
  (try
    (let [{:keys [db form-params]} req
          value (get form-params "url")]
      (sql/insert! db :bookmarks {:bookmark_id (random-uuid) :url value})
      (res/redirect "/" 303))
    (catch Exception e
      (util/server-error e))))
;; ...

This is an implementation of a typical post/redirect/get pattern. We get the value from the URL form field, insert a new row in the database with that value, and redirect back to the index page. Again, we’re using a try-catch block to handle possible exceptions from the database query.

That should be all of the code for the controllers. If you reload your REPL and go to http://localhost:8080, you should see something that looks like this after logging in:

Screenshot of the app

The last thing we need to do is to update the main function to start the system:

src/acme/main.clj
;; ...

(defn -main [& _]
  (-> (read-config) ig/expand ig/init))

Now, you should be able to run the app using clj -M -m acme.main. That’s all the code needed for the app. In the next section, we’ll package the app into a Docker image to deploy to Fly.

Packaging the App

While there are many ways to package a Clojure app, Fly.io specifically requires a Docker image. There are two approaches to doing this:

  1. Build an uberjar and run it using Java in the container, or
  2. Load the source code and run it using Clojure in the container

Both are valid approaches. I prefer the first since its only dependency is the JVM. We’ll use the tools.build library to build the uberjar. Check out the official guide for more information on building Clojure programs. Since it’s a library, to use it we can add it to our deps.edn file with an alias:

deps.edn
{;; ...
 :aliases
 {;; ...
  :build {:extra-deps {io.github.clojure/tools.build 
                       {:git/tag "v0.10.5" :git/sha "2a21b7a"}}
          :ns-default build}}}

Tools.build expects a build.clj file in the root of the project directory, so we’ll need to create that file. This file contains the instructions to build artefacts, which in our case is a single uberjar. There are many great examples of build.clj files on the web, including from the official documentation. For now, you can copy+paste this file into your project.

build.clj
(ns build
  (:require
   [clojure.tools.build.api :as b]))

(def basis (delay (b/create-basis {:project "deps.edn"})))
(def src-dirs ["src" "resources"])
(def class-dir "target/classes")

(defn uber
  [_]
  (println "Cleaning build directory...")
  (b/delete {:path "target"})

  (println "Copying files...")
  (b/copy-dir {:src-dirs   src-dirs
               :target-dir class-dir})

  (println "Compiling Clojure...")
  (b/compile-clj {:basis      @basis
                  :ns-compile '[acme.main]
                  :class-dir  class-dir})

  (println "Building Uberjar...")
  (b/uber {:basis     @basis
           :class-dir class-dir
           :uber-file "target/standalone.jar"
           :main      'acme.main}))

To build the project, run clj -T:build uber. This will create the uberjar standalone.jar in the target directory. The uber in clj -T:build uber refers to the uber function from build.clj. Since the build system is a Clojure program, you can customise it however you like. If we try to run the uberjar now, we’ll get an error:

# build the uberjar
$ clj -T:build uber
# Cleaning build directory...
# Copying files...
# Compiling Clojure...
# Building Uberjar...

# run the uberjar
$ java -jar target/standalone.jar
# Error: Could not find or load main class acme.main
# Caused by: java.lang.ClassNotFoundException: acme.main

This error occurs because the main class required by Java wasn’t built. To fix it, we need to add the :gen-class directive to our main namespace. This instructs Clojure to generate the acme.main class, whose main method delegates to our -main function.

src/acme/main.clj
(ns acme.main
  ;; ...
  (:gen-class))
;; ...

If you rebuild the project and run java -jar target/standalone.jar again, it should work perfectly. Now that we have a working build script, we can write the Dockerfile:

Dockerfile
# install additional dependencies here in the base layer
# separate base from build layer so any additional deps installed are cached
FROM clojure:temurin-21-tools-deps-bookworm-slim AS base

FROM base AS build
WORKDIR /opt
COPY . .
RUN clj -T:build uber

FROM eclipse-temurin:21-alpine AS prod
COPY --from=build /opt/target/standalone.jar /
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "standalone.jar"]

It’s a multi-stage Dockerfile. We use the official Clojure Docker image as the layer to build the uberjar. Once it’s built, we copy it to a smaller Docker image that only contains the Java runtime.[8] By doing this, we get a smaller container image as well as a faster Docker build time because the layers are better cached.

That should be all for packaging the app. We can move on to the deployment now.

Deploying with Fly.io

First things first, you’ll need to install flyctl, Fly’s CLI tool for interacting with their platform. Create a Fly.io account if you haven’t already. Then run fly auth login to authenticate flyctl with your account.

Next, we’ll need to create a new Fly App:

$ fly app create
# ? Choose an app name (leave blank to generate one): 
# automatically selected personal organization: Ryan Martin
# New app created: blue-water-6489

Another way to do this is with the fly launch command, which automates a lot of the app configuration for you. We have some steps to do that aren’t handled by fly launch, so we’ll be configuring the app manually. I also have a fly.toml file ready that you can copy straight into your project.

fly.toml
# replace these with your app and region name
# run `fly platform regions` to get a list of regions
app = 'blue-water-6489' 
primary_region = 'sin'

[env]
  DB_DATABASE = "/data/database.db"

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = "stop"
  auto_start_machines = true
  min_machines_running = 0

[mounts]
  source = "data"
  destination = "/data"
  initial_size = 1

[[vm]]
  size = "shared-cpu-1x"
  memory = "512mb"
  cpus = 1
  cpu_kind = "shared"

These are mostly the default configuration values with some additions. Under the [env] section, we’re setting the SQLite database location to /data/database.db. The database.db file itself will be stored in a persistent Fly Volume mounted on the /data directory. This is specified under the [mounts] section. Fly Volumes are similar to regular Docker volumes but are designed for Fly’s micro VMs.

We’ll need to set the AUTH_USER and AUTH_PASSWORD environment variables too, but not through the fly.toml file as these are sensitive values. To securely set these credentials with Fly, we can set them as app secrets. They’re stored encrypted and will be automatically injected into the app at boot time.

$ fly secrets set AUTH_USER=hi@ryanmartin.me AUTH_PASSWORD=not-so-secure-password
# Secrets are staged for the first deployment

With this, the configuration is done and we can deploy the app using fly deploy:

$ fly deploy
# ...
# Checking DNS configuration for blue-water-6489.fly.dev
# Visit your newly deployed app at https://blue-water-6489.fly.dev/

The first deployment will take longer since it’s building the Docker image for the first time. Subsequent deployments should be faster due to the cached image layers. You can click on the link to view the deployed app, or you can also run fly open which will do the same thing. Here’s the app in action:

The app in action

If you made additional changes to the app or fly.toml, you can redeploy the app using the same command, fly deploy. The app is configured to auto stop/start, which helps to cut costs when there’s not a lot of traffic to the site. If you want to take down the deployment, you’ll need to delete the app itself using fly app destroy <your app name>.

Adding a Production REPL

This is an interesting topic in the Clojure community, with varying opinions on whether or not it’s a good idea. Personally I find having a REPL connected to the live app helpful, and I often use it for debugging and running queries on the live database.[9] Since we’re using SQLite, we don’t have a database server we can directly connect to, unlike Postgres or MySQL.

If you’re brave, you can even restart the app directly from the REPL without redeploying. It’s easy to get things wrong this way, which is why some people prefer not to use it.

For this project, we’re gonna add a socket REPL. It’s very simple to add (you just need to add a JVM option) and it doesn’t require additional dependencies like nREPL. Let’s update the Dockerfile:

Dockerfile
# ...
EXPOSE 7888
ENTRYPOINT ["java", "-Dclojure.server.repl={:port 7888 :accept clojure.core.server/repl}", "-jar", "standalone.jar"]

The socket REPL will be listening on port 7888. If we redeploy the app now, the REPL will be started but we won’t be able to connect to it. That’s because we haven’t exposed the service through Fly proxy. We can do this by adding the socket REPL as a service in the [services] section in fly.toml.

However, doing this will also expose the REPL port to the public. This means that anyone can connect to your REPL and possibly mess with your app. Instead, what we want to do is to configure the socket REPL as a private service.

By default, all Fly apps in your organisation live in the same private network. This private network, called 6PN, connects the apps in your organisation through Wireguard tunnels (a VPN) using IPv6. Fly private services aren’t exposed to the public internet but can be reached from this private network. We can then use Wireguard to connect to this private network to reach our socket REPL.

Fly VMs are also configured with the hostname fly-local-6pn, which maps to their 6PN address. This is analogous to localhost, which points to your loopback address 127.0.0.1. To expose a service to 6PN, all we have to do is bind or serve it on fly-local-6pn instead of the usual 0.0.0.0. We have to update the socket REPL options accordingly:

Dockerfile
# ...
ENTRYPOINT ["java", "-Dclojure.server.repl={:port 7888,:address \"fly-local-6pn\",:accept clojure.core.server/repl}", "-jar", "standalone.jar"]

After redeploying, we can use the fly proxy command to forward the port from the remote server to our local machine.[10]

$ fly proxy 7888:7888
# Proxying local port 7888 to remote [blue-water-6489.internal]:7888

In another shell, run:

$ rlwrap nc localhost 7888
# user=>

Now we have a REPL connected to the production app! rlwrap is used for readline functionality (e.g., up/down arrow keys, vi bindings). Of course, you can also connect to it from your editor.

Deploy with GitHub Actions

If you’re using GitHub, we can also set up automatic deployments on pushes/PRs with GitHub Actions. All you need is to create the workflow file:

.github/workflows/fly.yaml
name: Fly Deploy
on:
  push:
    branches:
      - main
  workflow_dispatch:

jobs:
  deploy:
    name: Deploy app
    runs-on: ubuntu-latest
    concurrency: deploy-group
    steps:
      - uses: actions/checkout@v4
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - run: flyctl deploy --remote-only
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}

To get this to work, you’ll need to create a deploy token from your app’s dashboard. Then, in your GitHub repo, create a new repository secret called FLY_API_TOKEN with the value of your deploy token. Now, whenever you push to the main branch, this workflow will automatically run and deploy your app. You can also manually run the workflow from GitHub because of the workflow_dispatch option.

End

As always, all the code is available on GitHub. Originally, this post was just about deploying to Fly.io, but along the way I kept adding on more stuff until it essentially became my version of the user manager example app. Anyway, hope this post provided a good view into web development with Clojure. As a bonus, here are some additional resources on deploying Clojure apps:


  1. The way Fly.io works under the hood is pretty clever. Instead of running the container image with a runtime like Docker, the image is unpacked and “loaded” into a VM. See this video explanation for more details. ↩︎

  2. If you’re interested in learning Clojure, my recommendation is to follow the official getting started guide and join the Clojurians Slack. Also, read through this list of introductory resources. ↩︎

  3. Kit was a big influence on me when I first started learning web development in Clojure. I never used it directly, but I did use their library choices and project structure as a base for my own projects. ↩︎

  4. There’s no “Rails” for the Clojure ecosystem (yet?). The prevailing opinion is to build your own “framework” by composing different libraries together. Most of these libraries are stable and are already used in production by big companies, so don’t let this discourage you from doing web development in Clojure! ↩︎

  5. There might be some keys that you add or remove, but the structure of the config file stays the same. ↩︎

  6. “assoc” (associate) is a Clojure slang that means to add or update a key-value pair in a map. ↩︎

  7. For more details on how basic authentication works, check out the specification. ↩︎

  8. Here’s a cool resource I found when researching Java Dockerfiles: WhichJDK. It provides a comprehensive comparison on the different JDKs available and recommendations on which one you should use. ↩︎

  9. Another (non-technically important) argument for live/production REPLs is just because it’s cool. Ever since I read the story about NASA’s programmers debugging a spacecraft through a live REPL, I’ve always wanted to try it at least once. ↩︎

  10. If you encounter errors related to Wireguard when running fly proxy, you can run fly doctor which will hopefully detect issues with your local setup and also suggest fixes for them. ↩︎

Permalink

Advent of Code 2024 in Zig

This post is about seven months late, but here are my takeaways from Advent of Code 2024. It was my second time participating, and this time I actually managed to complete it.[1] My goal was to learn a new language, Zig, and to improve my DSA and problem-solving skills.

If you’re not familiar, Advent of Code is an annual programming challenge that runs every December. A new puzzle is released each day from December 1st to the 25th. There’s also a global leaderboard where people (and AI) race to get the fastest solves, but I personally don’t compete in it, mostly because I want to do it at my own pace.

I went with Zig because I have been curious about it for a while, mainly because of its promise of being a better C and because TigerBeetle (one of the coolest databases now) is written in it. Learning Zig felt like a good way to get back into systems programming, something I’ve been wanting to do after a couple of chaotic years of web development.

This post is mostly about my setup, results, and the things I learned from solving the puzzles. If you’re more interested in my solutions, I’ve also uploaded my code and solution write-ups to my GitHub repository.

My Advent of Code results page

Project Setup

There were several Advent of Code templates in Zig that I looked at as a reference for my development setup, but none of them really clicked with me. I ended up just running my solutions directly using zig run for the whole event. It wasn’t until after the event ended that I properly learned Zig’s build system and reorganised my project.

Here’s what the project structure looks like now:

.
├── src
│   ├── days
│   │   ├── data
│   │   │   ├── day01.txt
│   │   │   ├── day02.txt
│   │   │   └── ...
│   │   ├── day01.zig
│   │   ├── day02.zig
│   │   └── ...
│   ├── bench.zig
│   └── run.zig
└── build.zig

The project is powered by build.zig, which defines several commands:

  1. Build
    • zig build - Builds all of the binaries for all optimisation modes.
  2. Run
    • zig build run - Runs all solutions sequentially.
    • zig build run -Day=XX - Runs the solution of the specified day only.
  3. Benchmark
    • zig build bench - Runs all benchmarks sequentially.
    • zig build bench -Day=XX - Runs the benchmark of the specified day only.
  4. Test
    • zig build test - Runs all tests sequentially.
    • zig build test -Day=XX - Runs the tests of the specified day only.

You can also pass the optimisation mode that you want to any of the commands above with the -Doptimize flag.

Under the hood, build.zig compiles src/run.zig when you call zig build run, and src/bench.zig when you call zig build bench. These files are templates that import the solution for a specific day from src/days/dayXX.zig. For example, here’s what src/run.zig looks like:

src/run.zig
const std = @import("std");
const puzzle = @import("day"); // Injected by build.zig

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const allocator = arena.allocator();

    std.debug.print("{s}\n", .{puzzle.title});
    _ = try puzzle.run(allocator, true);
    std.debug.print("\n", .{});
}

The day module is an anonymous import dynamically injected by build.zig during compilation. This allows a single run.zig or bench.zig to be reused for all solutions, which avoids repeating boilerplate code in the solution files. Here’s a simplified version of my build.zig file that shows how this works:

build.zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const run_all = b.step("run", "Run all days");
    const day_option = b.option(usize, "ay", ""); // Named "ay" so the flag reads "-Day"

    // Generate build targets for all 25 days.
    for (1..26) |day| {
        const day_zig_file = b.path(b.fmt("src/days/day{d:0>2}.zig", .{day}));

        // Create an executable for running this specific day.
        const run_exe = b.addExecutable(.{
            .name = b.fmt("run-day{d:0>2}", .{day}),
            .root_source_file = b.path("src/run.zig"),
            .target = target,
            .optimize = optimize,
        });

        // Inject the day-specific solution file as the anonymous module `day`.
        run_exe.root_module.addAnonymousImport("day", .{ .root_source_file = day_zig_file });

        // Install the executable so it can be run.
        b.installArtifact(run_exe);

        // ...
    }
}

My actual build.zig has some extra code that builds the binaries for all optimisation modes.
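To give an idea of what the elided part does, here’s a minimal sketch of how the run step could be wired up inside the loop. This is a simplification, not the exact code:

build.zig
// Inside the for loop, after installArtifact:
const run_cmd = b.addRunArtifact(run_exe);

// Run every day by default, or only the requested day when `-Day=XX` is given.
if (day_option == null or day_option.? == day) {
    run_all.dependOn(&run_cmd.step);
}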

This setup is pretty barebones. I’ve seen other templates do cool things like scaffold files, download puzzle inputs, and even submit answers automatically. Since I wrote my build.zig after the event ended, I didn’t get to use it while solving the puzzles. I might add these features to it if I decide to do Advent of Code again this year with Zig.

Self-Imposed Constraints

While there are no rules to Advent of Code itself, to make things a little more interesting, I set a few constraints and rules for myself:

  1. The code must be readable. By “readable”, I mean the code should be straightforward and easy to follow. No unnecessary abstractions. I should be able to come back to the code months later and still understand (most of) it.
  2. Solutions must be a single file. No external dependencies. No shared utilities module. Everything needed to solve the puzzle should be visible in that one solution file.
  3. The total runtime must be under one second.[2] All solutions, when run sequentially, should finish in under one second. I want to improve my performance engineering skills.
  4. Parts should be solved separately. This means: (1) no solving both parts simultaneously, and (2) no doing extra work in part one that makes part two faster. The aim of this is to get a clear idea of how long each part takes on its own.
  5. No concurrency or parallelism. Solutions must run sequentially on a single thread. This keeps the focus on the efficiency of the algorithm. I can’t speed up slow solutions by using multiple CPU cores.
  6. No ChatGPT. No Claude. No AI help. I want to train myself, not the LLM. I can look at other people’s solutions, but only after I have given my best effort at solving the problem.
  7. Follow the constraints of the input file. The solution doesn’t have to work for all possible scenarios, but it should work for all valid inputs. If the input file only contains 8-bit unsigned integers, the solution doesn’t have to handle larger integer types.
  8. Hardcoding is allowed. For example: size of the input, number of rows and columns, etc. Since the input is known at compile-time, we can skip runtime parsing and just embed it into the program using Zig’s @embedFile.

Most of these constraints are designed to push me to write clearer, more performant code. I also wanted my code to look like it was taken straight from TigerBeetle’s codebase (minus the assertions).[3] Lastly, I just thought it would make the experience more fun.

Favourite Puzzles

From all of the puzzles, here are my top 3 favourites:

  1. Day 6: Guard Gallivant - This is my slowest day (in benchmarks), but also the one I learned the most from. Some of these learnings include: using vectors to represent directions, padding 2D grids, metadata packing, system endianness, etc.
  2. Day 17: Chronospatial Computer - I love reverse engineering puzzles. I used to do a lot of these in CTFs during my university days. The best thing I learned from this day is that you can use different integer bases to optimise data representation, which helped improve my runtimes on days 22 and 23.
  3. Day 21: Keypad Conundrum - This one was fun. My gut told me it could be solved greedily by always choosing the best move, and it was right. Though I did have to scroll Reddit for a bit to figure out the step I was missing: you have to visit the farthest keypads first. This is also my longest solution file (almost 400 lines) because I hardcoded the best-moves table.

Honourable mention:

  1. Day 24: Crossed Wires - Another reverse engineering puzzle. Confession: I didn’t solve this myself during the event. After 23 brutal days, my brain was too tired, so I copied a random Python solution from Reddit. When I retried it later, it turned out to be pretty fun. I still couldn’t find a solution I was satisfied with though.

Programming Patterns and Zig Tricks

During the event, I learned a lot about Zig and performance, and also developed some personal coding conventions. Some of these are Zig-specific, but most are universal and can be applied across languages. This section covers general programming and Zig patterns I found useful. The next section will focus on performance-related tips.

Comptime

Zig’s flagship feature, comptime, is surprisingly useful. I knew Zig uses it for generics and that people do clever metaprogramming with it, but I didn’t expect to be using it so often myself.

My main use for comptime was to generate puzzle-specific types. All my solution files follow the same structure, with a DayXX function that takes some parameters (usually the input length) and returns a puzzle-specific type, e.g.:

src/days/day01.zig
fn Day01(comptime length: usize) type {
    return struct {
        const Self = @This();
        
        left: [length]u32 = undefined,
        right: [length]u32 = undefined,

        fn init(input: []const u8) !Self {}

        // ...
    };
}

This lets me instantiate the type with a size that matches my input:

src/days/day01.zig
// Here, `Day01` is called with the size of my actual input.
pub fn run(_: std.mem.Allocator, is_run: bool) ![3]u64 {
    // ...
    const input = @embedFile("./data/day01.txt");
    var puzzle = try Day01(1000).init(input);
    // ...
}

// Here, `Day01` is called with the size of my test input.
test "day 01 part 1 sample 1" {
    var puzzle = try Day01(6).init(sample_input);
    // ...
}

This allows me to reuse logic across different inputs while still hardcoding the array sizes. Without comptime, I would have to either write a separate function for each input or dynamically allocate memory, since I couldn’t hardcode the array size.

I also used comptime to shift some computation to compile-time to reduce runtime overhead. For example, on day 4, I needed a function to check whether a string matches either "XMAS" or its reverse, "SAMX". A pretty simple function that you can write as a one-liner in Python:

example.py
def matches(pattern, target):
    return target == pattern or target == pattern[::-1]

Typically a function like this requires some dynamic allocation to create the reversed string, since the length of the string is only known at runtime.[4] For this puzzle, since the words to reverse are known at compile-time, we can do something like this:

src/days/day04.zig
fn matches(comptime word: []const u8, slice: []const u8) bool {
    var reversed: [word.len]u8 = undefined;
    @memcpy(&reversed, word);
    std.mem.reverse(u8, &reversed);
    return std.mem.eql(u8, word, slice) or std.mem.eql(u8, &reversed, slice);
}

This creates a separate function for each word I want to reverse.[5] Each function has an array with the same size as the word to reverse. This removes the need for dynamic allocation and makes the code run faster. As a bonus, Zig also requires the word to be compile-time known, so you get an immediate compile error if you pass in a runtime value.

Optional Types

A common pattern in C is to return special sentinel values to denote missing values or errors, e.g. -1, 0, or NULL. In fact, I did this on day 13 of the challenge:

src/days/day13.zig
// We won't ever get 0 as a result, so we use it as a sentinel error value.
fn count_tokens(a: [2]u8, b: [2]u8, p: [2]i64) u64 {
    const numerator = @abs(p[0] * b[1] - p[1] * b[0]);
    const denominator = @abs(@as(i32, a[0]) * b[1] - @as(i32, a[1]) * b[0]);
    return if (numerator % denominator != 0) 0 else numerator / denominator;
}

// Then in the caller, skip if the return value is 0.
if (count_tokens(a, b, p) == 0) continue;

This works, but it’s easy to forget to check for those values, or worse, to accidentally treat them as valid results. Zig improves on this with optional types. If a function might not return a value, you can return ?T instead of T. This also forces the caller to handle the null case. Unlike C, null isn’t a pointer but a more general concept. Zig treats null as the absence of a value for any type, just like Rust’s Option<T>.

The count_tokens function can be refactored to:

src/days/day13.zig
// Return null instead if there's no valid result.
fn count_tokens(a: [2]u8, b: [2]u8, p: [2]i64) ?u64 {
    const numerator = @abs(p[0] * b[1] - p[1] * b[0]);
    const denominator = @abs(@as(i32, a[0]) * b[1] - @as(i32, a[1]) * b[0]);
    return if (numerator % denominator != 0) null else numerator / denominator;
}

// The caller is now forced to handle the null case.
if (count_tokens(a, b, p)) |n_tokens| {
    // logic only runs when n_tokens is not null.
}

Zig also has a concept of error unions, where a function can return either a value or an error. In Rust, this is Result<T, E>. You could also use error unions instead of optionals for count_tokens; Zig doesn’t force a single approach. Coming from Clojure, where returning nil for an error or missing value is common, optionals feel natural to me.
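To illustrate, here’s a sketch of count_tokens rewritten with an error union. The NoSolution error is hypothetical, not what my day 13 code uses:

example.zig
fn count_tokens(a: [2]u8, b: [2]u8, p: [2]i64) error{NoSolution}!u64 {
    const numerator = @abs(p[0] * b[1] - p[1] * b[0]);
    const denominator = @abs(@as(i32, a[0]) * b[1] - @as(i32, a[1]) * b[0]);
    if (numerator % denominator != 0) return error.NoSolution;
    return numerator / denominator;
}

// The caller can handle or propagate the error, e.g. inside a loop:
const n_tokens = count_tokens(a, b, p) catch continue;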

Grid Padding

This year had a lot of 2D grid puzzles (arguably too many). A common feature of grid-based algorithms is the out-of-bounds check. Here’s what it usually looks like:

example.zig
fn dfs(map: [][]u8, position: @Vector(2, i8)) u32 {
    const x, const y = position;

    // Bounds check here.
    if (x < 0 or y < 0) return 0;
    const ux: usize = @intCast(x);
    const uy: usize = @intCast(y);
    if (ux >= map.len or uy >= map[0].len) return 0;

    if (map[ux][uy] == 'X') return 0; // 'X' marks a visited tile
    map[ux][uy] = 'X';

    var result: u32 = 1;
    for (directions) |direction| {
        result += dfs(map, position + direction);
    }
    return result;
}

This is a typical recursive DFS function. After doing a lot of this, I discovered a nice trick that not only improves code readability, but also its performance. The trick here is to pad the grid with sentinel characters that mark out-of-bounds areas, i.e. add a border to the grid.

Here’s an example from day 6:

Original map:               With borders added:
                            ************
....#.....                  *....#.....*
.........#                  *.........#*
..........                  *..........*
..#.......                  *..#.......*
.......#..        ->        *.......#..*
..........                  *..........*
.#..^.....                  *.#..^.....*
........#.                  *........#.*
#.........                  *#.........*
......#...                  *......#...*
                            ************

You can use any value for the border, as long as it doesn’t conflict with valid values in the grid. With the border in place, the bounds check becomes a simple equality comparison:

example.zig
const border = '*';

fn dfs(map: [][]u8, position: @Vector(2, i8)) u32 {
    const x: usize = @intCast(position[0]);
    const y: usize = @intCast(position[1]);
    if (map[x][y] == border) { // We are out of bounds
        return 0;
    }
    // ...
}

This is much more readable than the previous code. Plus, it’s also faster since we’re only doing one equality check instead of four range checks.

That said, this isn’t a one-size-fits-all solution. It only works for algorithms that traverse the grid one step at a time. If your logic jumps multiple tiles, it can still go out of bounds (unless you widen the border to account for this). This approach also uses a bit more memory than the regular approach, as you have to store the extra border characters.
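For reference, here’s a sketch of how the padded grid can be built, assuming a square size×size input where every line ends with a newline:

example.zig
const border = '*';

fn padGrid(comptime size: usize, input: []const u8) [size + 2][size + 2]u8 {
    var grid: [size + 2][size + 2]u8 = undefined;

    // Fill everything with the border, then copy the input inside it.
    for (&grid) |*row| @memset(row, border);
    for (0..size) |row| {
        for (0..size) |col| {
            // `size + 1` accounts for the trailing newline on each line.
            grid[row + 1][col + 1] = input[row * (size + 1) + col];
        }
    }
    return grid;
}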

SIMD Vectors

This could also go in the performance section, but I’m including it here because the biggest benefit I get from using SIMD in Zig is the improved code readability. Because Zig has first-class support for vector types, you can write elegant and readable code that also happens to be faster.

If you’re not familiar with vectors, they are a special collection type used for single instruction, multiple data (SIMD) operations. SIMD allows you to perform a computation on multiple values in parallel using only a single CPU instruction, which often leads to a nice performance boost.[6]

I mostly use vectors to represent positions and directions, e.g. for traversing a grid. Instead of writing code like this:

example.zig
next_position = .{ position[0] + direction[0], position[1] + direction[1] };

You can represent position and direction as 2-element vectors and write code like this:

example.zig
next_position = position + direction;

This is much nicer than the previous version!
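For context, here’s roughly how these vectors are declared and used (a minimal sketch):

example.zig
const std = @import("std");

const Vec2 = @Vector(2, i8);
const up: Vec2 = .{ -1, 0 };

test "vector addition" {
    var position: Vec2 = .{ 5, 4 };
    position += up; // element-wise addition, no explicit loop needed
    try std.testing.expect(@reduce(.And, position == Vec2{ 4, 4 }));
}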

Day 25 is another good example of a problem that can be solved elegantly using vectors:

src/days/day25.zig
var result: u64 = 0;
for (self.locks.items) |lock| { // lock is a vector
    for (self.keys.items) |key| { // key is also a vector
        const fitted = lock + key > @as(@Vector(5, u8), @splat(5));
        const is_overlap = @reduce(.Or, fitted);
        result += @intFromBool(!is_overlap);
    }
}

Expressing the logic as vector operations makes the code cleaner since you don’t have to write loops and conditionals as you typically would in a traditional approach.

Performance Tips

The tips below are general performance techniques that often help, but like most things in software engineering, “it depends”. These might work 80% of the time, but performance is often highly context-specific. You should benchmark your code instead of blindly following what other people say.

This section would’ve been more fun with concrete examples, step-by-step optimisations, and benchmarks, but that would’ve made the post way too long. Hopefully I’ll get to write something like that in the future.[7]

Minimise Allocations

Whenever possible, prefer static allocation. Static allocation is cheaper since it just involves moving the stack pointer, whereas dynamic allocation carries the overhead of the allocator machinery. That said, it’s not always the right choice, since it has some limitations: stack size is limited, the memory size must be compile-time known, its lifetime is tied to the current stack frame, and so on.

If you need to do dynamic allocations, try to reduce the number of times you call the allocator. The number of allocations you do matters more than the amount of memory you allocate. More allocations mean more bookkeeping, synchronisation, and sometimes syscalls.

A simple but effective way to reduce allocations is to reuse buffers, whether they’re statically or dynamically allocated. Here’s an example from day 10. For each trail head, we want to create a set of trail ends reachable from it. The naive approach is to allocate a new set every iteration:

src/days/day10.zig
for (self.trail_heads.items) |trail_head| {
    var trail_ends = std.AutoHashMap([2]u8, void).init(self.allocator);
    defer trail_ends.deinit();
    
    // Set building logic...
}

What you can do instead is allocate the set once before the loop. Then, on each iteration, you reuse the set by emptying it without freeing the memory. For Zig’s std.AutoHashMap, this can be done using the clearRetainingCapacity method:

src/days/day10.zig
var trail_ends = std.AutoHashMap([2]u8, void).init(self.allocator);
defer trail_ends.deinit();

for (self.trail_heads.items) |trail_head| {
    trail_ends.clearRetainingCapacity();
    
    // Set building logic...
}

If you use static arrays, you can also just overwrite existing data instead of clearing it.

A step up from this is to reuse multiple buffers. The simplest form of this is to reuse two buffers, i.e. double buffering. Here’s an example from day 11:

src/days/day11.zig
// Initialize two hash maps that we'll alternate between.
var frequencies: [2]std.AutoHashMap(u64, u64) = undefined;
for (0..2) |i| frequencies[i] = std.AutoHashMap(u64, u64).init(self.allocator);
defer for (0..2) |i| frequencies[i].deinit();

var id: usize = 0;
for (self.stones) |stone| try frequencies[id].put(stone, 1);

for (0..n_blinks) |_| {
    const old_frequencies = &frequencies[id % 2];
    const new_frequencies = &frequencies[(id + 1) % 2];
    id += 1;

    defer old_frequencies.clearRetainingCapacity();

    // Do stuff with both maps...
}

Here we have two maps to count the frequencies of stones across iterations. Each iteration will build up new_frequencies with the values from old_frequencies. Doing this reduces the number of allocations to just 2 (the number of buffers). The tradeoff here is that it makes the code slightly more complex.

Make Your Data Smaller

A common performance tip is to have “mechanical sympathy”: understand how your code is processed by your computer. An example of this is structuring your data so it works better with your CPU, e.g. keeping related data close in memory to take advantage of cache locality.

Reducing the size of your data helps with this. Smaller data means more of it can fit in cache. One way to shrink your data is through bit packing. This depends heavily on your specific data, so you’ll need to use your judgement to tell whether this would work for you. I’ll just share some examples that worked for me.

The first example is in day 6 part two, where you have to detect a loop, which happens when you revisit a tile from the same direction as before. To track this, you could use a map or a set to store the tiles and visited directions. A more efficient option is to store this direction metadata in the tile itself.

There are only four tile types, which means you only need two bits to represent the tile types as an enum. If the enum size is one byte, here’s what the tiles look like in memory:

.obstacle -> 00000000
.path     -> 00000001
.visited  -> 00000010
.exit     -> 00000011

As you can see, the upper six bits are unused. We can store the direction metadata in the upper four bits. One bit for each direction. If a bit is set, it means that we’ve already visited the tile in this direction. Here’s an illustration of the memory layout:

        direction metadata   tile type
           ┌─────┴─────┐   ┌─────┴─────┐
┌────────┬─┴─┬───┬───┬─┴─┬─┴─┬───┬───┬─┴─┐
│ Tile:  │ 1 │ 0 │ 0 │ 0 │ 0 │ 0 │ 1 │ 0 │
└────────┴─┬─┴─┬─┴─┬─┴─┬─┴───┴───┴───┴───┘
   up bit ─┘   │   │   └─ left bit
    right bit ─┘ down bit

If your language supports struct packing, you can express this layout directly:[8]

src/days/day06.zig
const Tile = packed struct(u8) {
    const TileType = enum(u4) { obstacle, path, visited, exit };

    up: u1 = 0,
    right: u1 = 0,
    down: u1 = 0,
    left: u1 = 0,
    tile: TileType,

    // ...
};

Doing this avoids extra allocations and improves cache locality. Since the direction metadata is colocated with the tile type, they fit together in cache, and accessing the directions just requires some bitwise operations instead of fetching them from another region of memory.

Another way to do this is to represent your data using alternate number bases. Here’s an example from day 23. Computers are represented as two-character strings made up of only lowercase letters, e.g. “bc”, “xy”, etc. Instead of storing this as a [2]u8 array, you can convert it into a base-26 number and store it as a u16.[9]

Here’s the idea: map 'a' to 0, 'b' to 1, up to 'z' as 25. Each character in the string becomes a digit in the base-26 number. For example, “bc” ([2]u8{ 'b', 'c' }) becomes the base-10 number 28, since its base-26 digits are 'b' = 1 and 'c' = 2, and 1×26 + 2 = 28.

While they take the same amount of space (2 bytes), a u16 has some benefits over a [2]u8:

  1. It fits in a single register, whereas you need two for the array.
  2. Comparison is faster as there is only a single value to compare.
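Here’s a sketch of that encoding, assuming the name is always two lowercase letters:

example.zig
const std = @import("std");

fn encode(name: [2]u8) u16 {
    // 'a' maps to 0, 'b' to 1, ..., 'z' to 25.
    const hi: u16 = name[0] - 'a';
    const lo: u16 = name[1] - 'a';
    return hi * 26 + lo;
}

test "encode bc" {
    try std.testing.expectEqual(@as(u16, 28), encode(.{ 'b', 'c' }));
}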

Reduce Branching

I won’t explain branchless programming here; Algorithmica explains it far better than I can. While modern compilers are often smart enough to compile branches away, they don’t catch everything. I still recommend writing branchless code whenever it makes sense. It also has the added benefit of reducing the number of codepaths in your program.

Again, since performance is very context-dependent, I’ll just show you some patterns I use. Here’s one that comes up often:

src/days/day02.zig
if (is_valid_report(report)) {
    result += 1;
}

Instead of the branch, cast the bool into an integer directly:

src/days/day02.zig
result += @intFromBool(is_valid_report(report));

Another example is from day 6 (again!). Recall that to know if a tile has been visited from a certain direction, we have to check its direction bit. Here’s one way to do it:

src/days/day06.zig
fn has_visited(tile: Tile, direction: Direction) bool {
    switch (direction) {
        .up => return tile.up == 1,
        .right => return tile.right == 1,
        .down => return tile.down == 1,
        .left => return tile.left == 1,
    }
}

This works, but it introduces a few branches. We can make it branchless using bitwise operations:

src/days/day06.zig
fn has_visited(tile: Tile, direction: Direction) bool {
    const int_tile: u8 = @bitCast(tile);
    const mask = direction.mask();
    const bits = int_tile & 0x0f; // Get only the direction bits
    return bits & mask == mask;
}

While this is arguably cryptic and less readable, it does perform better than the switch version.
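If you’re wondering about direction.mask(), here’s a sketch of what it could look like, assuming one bit per direction in the low four bits of the packed Tile:

example.zig
const Direction = enum(u2) {
    up,
    right,
    down,
    left,

    // Each direction maps to one of the four direction bits.
    fn mask(self: Direction) u8 {
        return @as(u8, 1) << @intFromEnum(self);
    }
};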

Avoid Recursion

The final performance tip is to prefer iterative code over recursion. Recursive functions bring the overhead of setting up and tearing down stack frames. While recursive code is often more elegant, it’s also often slower, unless the compiler can optimise it away, e.g. via tail-call optimisation. As far as I know, Zig doesn’t do this automatically, though you can request a guaranteed tail call with @call and its .always_tail modifier.

Recursion also has the risk of causing a stack overflow if the execution isn’t bounded. This is why code that is mission- or safety-critical avoids recursion entirely. It’s in TigerBeetle’s TIGERSTYLE and also NASA’s Power of Ten.

Iterative code can be harder to write in some cases, e.g. DFS maps naturally to recursion, but most of the time it is significantly faster, more predictable, and safer than the recursive alternative.
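As an illustration, here’s the earlier DFS sketched iteratively with an explicit, fixed-size stack (the capacity is an arbitrary bound chosen for these inputs):

example.zig
const directions = [_]@Vector(2, i8){ .{ -1, 0 }, .{ 1, 0 }, .{ 0, -1 }, .{ 0, 1 } };

fn dfs(map: [][]u8, start: @Vector(2, i8)) u32 {
    // The explicit stack replaces the call stack: no per-call frame
    // overhead, and the memory usage is bounded.
    var stack: [4096]@Vector(2, i8) = undefined;
    var top: usize = 0;
    stack[top] = start;
    top += 1;

    var result: u32 = 0;
    while (top > 0) {
        top -= 1;
        const position = stack[top];
        const x: usize = @intCast(position[0]);
        const y: usize = @intCast(position[1]);

        if (map[x][y] != '.') continue; // border, obstacle, or visited
        map[x][y] = 'X'; // mark as visited

        result += 1;
        for (directions) |direction| {
            stack[top] = position + direction;
            top += 1;
        }
    }
    return result;
}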

Benchmarks

I ran benchmarks for all 25 solutions in each of Zig’s optimisation modes. You can find the full results and the benchmark script in my GitHub repository. All benchmarks were done on an Apple M3 Pro.

As expected, ReleaseFast produced the best result with a total runtime of 85.1 ms. I’m quite happy with this, considering the two constraints that limited the optimisations I could apply to the code:

  • Parts should be solved separately - Some days can be solved in a single go, e.g. day 10 and day 13, which could’ve saved a few milliseconds.
  • No concurrency or parallelism - My slowest days are the compute-heavy ones that are easily parallelisable, e.g. day 6, day 19, and day 22. Without this constraint, I could probably reach a sub-20 millisecond total, but that’s for another time.

You can see the full benchmarks for ReleaseFast in the table below:

Day | Title                  | Parsing (µs) | Part 1 (µs) | Part 2 (µs) | Total (µs)
----|------------------------|--------------|-------------|-------------|-----------
1   | Historian Hysteria     | 23.5         | 15.5        | 2.8         | 41.8
2   | Red-Nosed Reports      | 42.9         | 0.0         | 11.5        | 54.4
3   | Mull it Over           | 0.0          | 7.2         | 16.0        | 23.2
4   | Ceres Search           | 5.9          | 0.0         | 0.0         | 5.9
5   | Print Queue            | 22.3         | 0.0         | 4.6         | 26.9
6   | Guard Gallivant        | 14.0         | 25.2        | 24,331.5    | 24,370.7
7   | Bridge Repair          | 72.6         | 321.4       | 9,620.7     | 10,014.7
8   | Resonant Collinearity  | 2.7          | 3.3         | 13.4        | 19.4
9   | Disk Fragmenter        | 0.8          | 12.9        | 137.9       | 151.7
10  | Hoof It                | 2.2          | 29.9        | 27.8        | 59.9
11  | Plutonian Pebbles      | 0.1          | 43.8        | 2,115.2     | 2,159.1
12  | Garden Groups          | 6.8          | 164.4       | 249.0       | 420.3
13  | Claw Contraption       | 14.7         | 0.0         | 0.0         | 14.7
14  | Restroom Redoubt       | 13.7         | 0.0         | 0.0         | 13.7
15  | Warehouse Woes         | 14.6         | 228.5       | 458.3       | 701.5
16  | Reindeer Maze          | 12.6         | 2,480.8     | 9,010.7     | 11,504.1
17  | Chronospatial Computer | 0.1          | 0.2         | 44.5        | 44.8
18  | RAM Run                | 35.6         | 15.8        | 33.8        | 85.2
19  | Linen Layout           | 10.7         | 11,890.8    | 11,908.7    | 23,810.2
20  | Race Condition         | 48.7         | 54.5        | 54.2        | 157.4
21  | Keypad Conundrum       | 0.0          | 1.7         | 22.4        | 24.2
22  | Monkey Market          | 20.7         | 0.0         | 11,227.7    | 11,248.4
23  | LAN Party              | 13.6         | 22.0        | 2.5         | 38.2
24  | Crossed Wires          | 5.0          | 41.3        | 14.3        | 60.7
25  | Code Chronicle         | 24.9         | 0.0         | 0.0         | 24.9

A weird thing I found when benchmarking is that for day 6 part two, ReleaseSafe actually ran faster than ReleaseFast (13,189.0 µs vs 24,370.7 µs). Their outputs are the same, but for some reason ReleaseSafe is faster even with the safety checks still intact.

The Zig compiler is still very much a moving target, so I don’t want to dig too deep into this, as I’m guessing this might be a bug in the compiler. This weird behaviour might just disappear after a few compiler version updates.

Reflections

Looking back, I’m really glad I decided to do Advent of Code and followed through to the end. I learned a lot of things. Some are useful in my professional work, some are more like random bits of trivia. Going with Zig was a good choice too. The language is small, simple, and gets out of your way. I learned more about algorithms and concepts than the language itself.

Beyond what I’ve already mentioned, here are a few more reflections.

Some of my self-imposed constraints and rules ended up being helpful. I can still (mostly) understand the code I wrote a few months ago. Putting all of the code in a single file made it easier to read since I don’t have to context switch to other files all the time.

However, some of them did backfire a bit, e.g. the two constraints that limited how I could optimise my code. Another one is the “hardcoding allowed” rule. I used a lot of magic numbers, which helped performance, but I didn’t document them, so after a while I couldn’t even remember how I got them. I’ve since gone back and added explanations in my write-ups, but next time I’ll remember to at least leave comments.

One constraint I’ll probably remove next time is the no concurrency rule. It’s the biggest contributor to the total runtime of my solutions. I don’t do a lot of concurrent programming, even though my main language at work is Go, so next time it might be a good idea to use Advent of Code to level up my concurrency skills.

I also spent way more time on these puzzles than I originally expected. I optimised and rewrote my code multiple times, and I rewrote my write-ups a few times to make them easier to read. This is by far my longest side project yet. It was a lot of fun, but it also took a lot of time and effort. I almost gave up on the write-ups (and this blog post) because I didn’t want to explain my awful day 15 and day 16 code. I ended up taking a break for a few months before finishing it, which is why this post is being published in August lol.

Just for fun, here’s a photo of some of my notebook sketches that helped me visualise my solutions. See if you can guess which days these are from:

Photos of my notebook sketches

What’s Next?

So… would I do it again? Probably, though I’m not making any promises. If I do join this year, I’ll probably stick with Zig. I’d had my eye on Zig since the start of 2024, so Advent of Code was the perfect excuse to learn it. This year, no language in particular has caught my eye, so I’ll just keep using Zig, especially since I have a proper setup ready.

If you haven’t tried Advent of Code, I highly recommend checking it out this year. It’s a great excuse to learn a new language, improve your problem-solving skills, or just to learn something new. If you’re eager, you can also do the previous years’ puzzles as they’re still available.

One of the best aspects of Advent of Code is the community. The Advent of Code subreddit is a great place for discussion. You can ask questions and also see other people’s solutions. Some people also post really cool visualisations like this one. They also have memes!


  1. I failed my first attempt horribly with Clojure during Advent of Code 2023. Once I reached the latter half of the event, I just couldn’t solve the problems with a purely functional style. I could’ve pushed through using imperative code, but I stubbornly chose not to and gave up… ↩︎

  2. The original constraint was that each solution must run in under one second. As it turned out, the code was faster than I expected, so I increased the difficulty. ↩︎

  3. TigerBeetle’s code quality and engineering principles are just wonderful. ↩︎

  4. You can implement this function without any allocation by mutating the string in place or by iterating over it twice, which is probably faster than my current implementation. I kept it as-is as a reminder of what comptime can do. ↩︎

  5. As a bonus, I was curious what this looks like compiled, so I listed all the functions in the binary in GDB and found:

    72:     static bool day04.Day04(140).matches__anon_19741;
    72:     static bool day04.Day04(140).matches__anon_19750;

    It does generate separate functions! ↩︎

  6. Well, not always. The number of SIMD instructions depends on the machine’s native SIMD size. If the length of the vector exceeds it, Zig will compile it into multiple SIMD instructions. ↩︎

  7. Here’s a nice post on optimising day 9’s solution with Rust. It’s a good read if you’re into performance engineering or Rust techniques. ↩︎

  8. One thing about packed structs is that their layout is dependent on the system endianness. Most modern systems are little-endian, so the memory layout I showed is actually reversed. Thankfully, Zig has some useful functions to convert between endianness like std.mem.nativeToBig, which makes working with packed structs easier. ↩︎

  9. Technically, you can store 2-digit base-26 numbers in a u10, as there are only 26² = 676 possible numbers. Most systems pad values to whole bytes though, so a u10 would still be stored as a u16, which is why I just went straight for it. ↩︎

Permalink

Java’s not dead, but it’s definitely been zombified

The language everyone loves to hate is still powering billions of devices, but that doesn’t mean it’s thriving. Let’s talk about why it refuses to die.

If I had a dollar for every time someone announced Java’s death, I could probably retire and spend my days writing Rust that never gets deployed.
This month’s “Java funeral” post was just the latest in a long line of dramatic obituaries. Dev Twitter threw flowers, LinkedIn had eulogies, and somewhere in an Oracle boardroom… nothing happened.

Java is still here. Still running banks, insurance giants, government systems, and yes, your old Minecraft server.
But here’s the thing: surviving isn’t the same as thriving. Java feels less like a rockstar and more like that MMO boss you’ve been fighting for three expansions. It won’t go down, but it’s not exactly exciting anyone either.

TLDR:
Java isn’t actually dead (again). It’s still the backbone of a ton of critical infrastructure, thanks to the JVM, corporate inertia, and its stability. But for many devs, the hype has moved on to shinier, newer tools. We’ll dig into why Java keeps surviving its own funerals, where it still shines, and what the next few years might look like.

Table of contents

  • The recurring death of Java
  • What keeps Java alive (and kicking, sort of)
  • Why devs keep “leaving” anyway
  • Where Java actually shines in 2025
  • The zombie factor
  • What’s next for Java?
  • Conclusion + resources

The recurring death of Java

Java’s death is like the Duke Nukem Forever release date: it just keeps coming back every few years.
We’ve been through this cycle since the mid-2000s:

  • 2007–2010: “Java applets are dead, Flash is the future!” (Flash is now in the graveyard too.)
  • 2015: “Android’s moving to Kotlin, Java’s done.” (Spoiler: Kotlin is winning mobile, but Java is still in the room.)
  • Every year since: Some tech blogger declares Java obsolete, pointing to low Stack Overflow survey rankings and dropping “enterprise” like it’s a slur.

And yet… Java refuses to die.

Why? Two big reasons: legacy code and corporate inertia.
When you have a 20-year-old banking system running on Java 6, you don’t “rewrite it in Go for fun.” You keep paying developers who know the language and the JVM. The risk of touching certain codebases is like opening a cursed tomb: better to leave it undisturbed unless you really want to awaken something.

And it’s not just dusty old systems. The JVM ecosystem is a massive reason Java’s still relevant. Even if you’ve switched to Kotlin, Scala, or Clojure, you’re still benefiting from decades of JVM tooling and optimizations. It’s like Java’s ghost is still paying your rent.

If a language is boring but keeps the lights on for millions of people… is it really dead?

What keeps Java alive (and kicking, sort of)

If Java were a video game character, it’d be that tanky support class nobody mains but everyone relies on to survive the raid. You might not brag about playing it, but without it, the whole squad wipes.

The JVM is the real MVP

Java’s not just a language; it’s the gateway to the Java Virtual Machine, which is still one of the most battle-tested, optimized runtimes in existence. The JVM lets you run Java, Kotlin, Scala, Clojure, Groovy, JRuby, and more, all on the same underlying tech. You can think of it like a universal controller adapter for programming languages.

The performance gains from decades of JVM optimization are insane. Garbage collection? Mature. Just-In-Time compilation? Rock solid. Cross-platform consistency? Still unmatched in many cases. Even if you’ve left Java for Kotlin, you’re still on its turf.

Enterprise adoption is sticky

Banks, governments, insurance companies: these places love Java like sysadmins love Bash scripts. Not because it’s “sexy,” but because it’s predictable, stable, and has an army of devs who know it. Once Java is embedded in an org’s infrastructure, ripping it out isn’t just costly; it’s a potential compliance nightmare.

The tooling is ridiculously good

Want a rock-solid IDE? IntelliJ IDEA has been spoiling Java devs for years. Want a mature web framework? Spring Boot basically invented “Java, but not painful for web dev.” Want battle-tested libraries? Maven Central is overflowing with them. Sure, dependency hell is still a thing, but it’s a familiar hell.

Modern Java isn’t the Java you remember

Lambdas, var, switch expressions, records: the language has evolved. It’s not Haskell-level elegant, but it’s a far cry from the Java 1.4 you learned in school.

Java’s secret survival trick isn’t that it’s exciting; it’s that it’s safe. It’s the Toyota Corolla of programming languages. No one writes a love song about it, but it’ll still be running when your trendy electric scooter startup shuts down.

Why devs keep “leaving” anyway

If Java is so stable, reliable, and backed by billions in enterprise infrastructure… why do so many devs ditch it the moment they get the chance?
Because stability isn’t the same as fun.

The boredom factor

Java is like that MMO you’ve been playing for 15 years: you know every mechanic, every quest, every bug exploit. Sure, it works, but the thrill’s gone. Compare that to picking up Rust and wrestling with its borrow checker, or hacking together a Go project in a weekend. Newer languages feel like new worlds to explore; Java feels like your hometown: safe, familiar, and kinda dull.

Syntax fatigue

Even with modern updates, Java can still feel verbose. Writing a simple DTO can feel like you’re transcribing legal documents. You see Kotlin’s data class syntax or Go’s minimal boilerplate and suddenly Java’s ceremony starts to feel exhausting.

Better options for certain jobs

Want to build a quick serverless API? Node.js, Go, or Python will get you there faster. Doing data science? Python owns that space. Writing mobile apps? Kotlin has the native advantage on Android and Swift rules iOS. Java is still capable, but for many domains, it’s not the first tool people reach for anymore.

Oracle drama

Licensing changes, legal disputes, and the general perception of Oracle as “the Disney of enterprise software” have left a sour taste in the dev community. OpenJDK is the safe escape hatch, but the PR damage lingers.

Generational shift

Ask a junior dev to start in Java and you might as well ask them to code in COBOL. Many CS grads today cut their teeth on Python or JavaScript, so by the time they hit the job market, Java feels like a step backward in “cool factor.”

Where Java actually shines in 2025

For all the jokes, Java still has domains where it’s basically the undisputed boss battle. You might not choose it for a weekend side project, but if you’re building certain kinds of systems, Java’s still the right call.

High-performance enterprise backends

Massive transaction volumes? Mission-critical reliability? Decades of domain-specific libraries? Java’s got you. Banks, airlines, logistics giants: they’re not rewriting their core systems in TypeScript just because it’s trending on GitHub.

Big data and analytics

Hadoop, Apache Spark, Flink: a lot of big data tooling either runs on the JVM or was written in Java/Scala. Even if your data scientists are in Python, the heavy lifting often happens in the JVM world. It’s the invisible layer doing the grunt work while the shiny Python notebooks get the credit.

Android: still in the picture

Yes, Kotlin is the cool kid now, but Java’s still the underlying reality for much of Android’s ecosystem. Tons of legacy apps and libraries are still Java-based, and Android Studio is perfectly happy compiling both.

Regulated industries

Healthcare, finance, government: these sectors care about proven, tested, and auditable technology. Java’s long history and huge developer pool make it a low-risk choice when the stakes are high (and the lawyers are watching).

Long-term maintainability

If you expect a codebase to last 10+ years and survive multiple developer turnovers, Java is a safe bet. The language changes slowly, the ecosystem is stable, and the risk of “we can’t find anyone to maintain this” is way lower than with niche languages.

The zombie factor

Java in 2025 feels less like a vibrant, bustling city and more like a well-maintained fortress with a skeleton crew. It’s not expanding like crazy, but it’s not crumbling either; it just keeps going, slowly, steadily, and stubbornly.

The best analogy? A zombie. Not the fast, rage-virus kind. The slow, shambling, “still dangerous if you get too close” kind.
It’s not winning marathons, but it’s impossible to kill because:

  • It’s everywhere. From backend services to industrial control systems, there’s Java code humming away that hasn’t been touched in years and probably shouldn’t be.
  • It has a massive support network. Even if the community isn’t screaming with excitement, the libraries, tooling, and dev talent are still there.
  • It adapts just enough. Features from Project Amber and Project Loom trickle in to keep it relevant, even if the updates feel like a Netflix show on its 15th season.

The cultural perception vs reality gap is huge. Dev Twitter loves dunking on Java for being “outdated,” while Stack Overflow data shows it’s still one of the most-used languages globally. This isn’t a language on life support; it’s a boss battle with infinite respawns.

Java the unkillable boss battle

What’s next for Java?

If you’re expecting Java to suddenly pull a No Man’s Sky redemption arc and become the hottest new language overnight… yeah, that’s not happening. But it is quietly evolving in ways that make it more pleasant to work with, especially if you haven’t touched it since the “public static void main” dark ages.

Project Loom: better concurrency without the headache

Java’s concurrency story has always been solid, but Loom takes it up a notch with virtual threads. It’s not as flashy as async/await in newer languages, but it’s a massive quality-of-life boost for writing scalable services without juggling callbacks like a circus act.

📄 Project Loom overview

Project Amber: smaller syntax wins

Amber is Java’s slow but steady makeover. Pattern matching for switch statements, records, sealed classes: all aimed at reducing boilerplate and making the language feel less like paperwork.

📄 Project Amber page

Project Panama: better native integration

Want to tap into native C/C++ libraries without JNI pain? Panama is working on it. This could make Java more attractive for performance-heavy workloads that need native interop without going full masochist mode.

📄 Project Panama page

Release cadence

Since Java moved to a 6-month release cycle, updates are more incremental. No more “massive version jumps every 5 years” panic; just steady, bite-sized improvements.

In reality, Java’s future isn’t about winning the cool-kid contest. It’s about being good enough to keep. Enterprises love predictable tech stacks, and the JVM is still one of the most reliable platforms ever built. Java will probably still be around when half of today’s hyped languages have gone the way of Perl.

Conclusion

Java isn’t dead. It isn’t even dying.
It’s just… undead.
The kind of undead that doesn’t care if you mock it, because it’s busy running the systems that keep your paycheck coming.

If you’re a startup founder chasing developer hype, you probably won’t touch Java. If you’re a bank processing billions in transactions a day, you’ll hire another Java dev tomorrow. It’s not a language you brag about using; it’s one you trust to not burn down the data center at 2 AM.

Do I reach for Java when I start a personal project? Almost never. Do I respect it? Absolutely.
Because here’s the truth: flashy languages rise and fall, but boring tech often wins in the long game. And Java, for better or worse, plays the long game better than almost anyone.

So the next time you see a “Java is dead” headline, remember we’ve been to its funeral a dozen times already. The coffin’s still empty.

Helpful resources

Blending my thoughts with the brilliance of modern tools. ✨
Thanks to ChatGPT, Midjourney, Envato, Grammarly, and friends for the assist. Together, these tools help us turn imagination into a fantastic world of ideas and stories.

Permalink

Clojure Support for Popular Data Tools: A Data Engineer's Perspective, and a New Clojure API for Snowflake

In this article I look at the extent of Clojure support for some popular on-cluster data processing tools that Clojure users might need for their data engineering or data science tasks. Then for Snowflake in particular I go further and present a new Clojure API.

Why is the level of Clojure support important? As an example, consider that Scicloj is mostly focused on in-memory processing. As such, if you need to work with a large dataset it will be necessary to compute on-cluster and extract a smaller result before continuing your data science task locally.

However, without sufficient Clojure support for on-cluster processing, anyone needing that facility for their data science or data engineering task would be forced to reach outside the Clojure ecosystem. That adds complexity in terms of interop, compatibility and overall stack requirements.

With that in mind, let's examine the level of Clojure support for some popular on-cluster data processing tools. For each tool I selected its official Clojure library if one exists, or if not the most popular and well-known community-supported alternative with at least 100 stars and 10 contributors on GitHub. I then used the following criteria against the library to classify it as "supported" or "support unknown":

  1. CI/CD build passing
  2. Most recent commit less than 12 months ago
  3. Most recent release less than 12 months ago
  4. Maintainers responded to any issue or question less than 12 months ago
  5. Maintainers either accepted or rejected any PR less than 12 months ago

If I couldn't find any such library at all, I classified it as having "no support".

Tool Category                          | Supported | Support Unknown                                          | No Support
On-cluster batch processing            | -         | 1. Spark (see Spark Interop with Geni below)             | -
On-cluster stream processing           | -         | 2. Kafka Streams (see Kafka Interop with Jackdaw below)  | 3. Spark Structured Streaming, 4. Flink
On-cluster batch and stream processing | -         | -                                                        | 5. Databricks (see Spark Interop with Geni below), 6. Snowflake (see Snowflake Interop below)

Please note, I don't wish to make any critical judgments based on either the summary analysis above or the more detailed analysis below. The goal is to understand the situation with respect to Clojure support and highlight any gaps, although I suppose I am also inadvertently highlighting the difficulties of maintaining open source software!

Spark Interop with Geni

Geni is the go-to library for Spark interop. Some months back, I was motivated to evaluate its coverage of Spark features. In particular, I wanted to understand what would be involved in supporting Spark Connect, as it would reduce the complexity of computing on-cluster directly from the Clojure REPL.

However, I found a number of issues that would need to be addressed in order to support Spark Connect and Databricks:

  1. Problems with the default session.
  2. Problems with support for Databricks, although I suspect this is related to point 1.

Also, in general by my criteria the support classification is "support unknown":

  1. CI/CD build failing.
  2. Version 0.0.42 API docs are broken; this also affects version 0.0.41.
  3. No commits since November 2023.
  4. No releases since November 2023.
  5. No PRs accepted or rejected since November 2023.
  6. No response when attempting to contact the author or maintainers.

Kafka Interop with Jackdaw

Jackdaw is the go-to library for Kafka interop. However, by my criteria the support classification is also "support unknown":

  1. No commits since August 2024.
  2. No releases since December 2023.
  3. No PRs accepted or rejected since August 2024. As a further example, here's a PR raised in May 2024 but not yet commented on either way by the maintainers.

Snowflake Interop with a New Clojure API!

Although the Snowpark library has Java and Scala bindings, it doesn't provide anything for Clojure. As such, it's currently not possible to interact with Snowflake in an idiomatic Clojure way.

To address this gap, I decided to try my hand at creating a Clojure API for Snowflake as part of a broader effort to improve the overall situation regarding Clojure support for popular data tools.

The aim is to validate this approach as a foundation for enabling a wide range of data science or data engineering use cases from the Clojure REPL, in situations where Snowflake is the data warehouse of choice.

The README provides usage examples for all the current features, but I've copied the essential ones here to illustrate the API:

Load Clojure data from local and save to a Snowflake table

(require '[snowpark-clj.core :as sp])

;; Sample data
(def employee-data
  [{:id 1 :name "Alice" :age 25 :department "Engineering" :salary 75000}
   {:id 2 :name "Bob" :age 30 :department "Marketing" :salary 65000}
   {:id 3 :name "Charlie" :age 35 :department "Engineering" :salary 80000}])

;; Create session and save data
(with-open [session (sp/create-session "snowflake.edn")]
  (-> employee-data
      (sp/create-dataframe session)
      (sp/save-as-table "employees" :overwrite)))

Compute over Snowflake table(s) on-cluster and extract results locally

(with-open [session (sp/create-session "snowflake.edn")]
  (let [table-df (sp/table session "employees")]
    (-> table-df
        (sp/filter (sp/gt (sp/col table-df :salary) (sp/lit 70000)))
        (sp/select [:name :salary])
        (sp/collect))))
;; => [{:name "Alice" :salary 75000} {:name "Charlie" :salary 80000}]

As an early-stage proof-of-concept, it only covers the essential parts of the underlying API without being too concerned with performance or completeness. Other more advanced features are noted and planned, pending further elaboration.

I hope you find it useful and I welcome any feedback or contributions!

Permalink

Scicloj on EdTech Platforms: Enabling Clojure-based Data Science in the Browser

You may or may not be aware that the Clojure data science stack, a.k.a. Scicloj, has been gaining momentum in recent years. To give a few highlights, dtype-next / fastmath are comparable with scipy / numpy for numerical work, and tech.ml.dataset / tablecloth are comparable with Pandas for tabular data. Kira McLean's 2023 Conj presentation explains that feature parity is almost upon us, and also offers some reasoning on why data scientists are now considering Clojure as an alternative to Python or R.

However, the leading EdTech platforms don't have much support for Clojure, so all the potential benefits of both the language and Scicloj are not currently accessible to those communities.

The good news is that I have created a proof-of-concept that uses the browser to write, load, and evaluate Clojure code running on a remote server over websockets, with the results processed for display using the Scicloj notebook library clay.

I don't believe this combination has been achieved before. It is a significant step when you consider that Scicloj is Java/JVM-based on account of the underlying math support, there is no Javascript or ClojureScript implementation and that is likely to remain the case.

This work opens up the possibility for Clojure-based data science on e-learning platforms so that anyone anywhere can learn and experiment with the Scicloj stack.

Here are some examples of new e-learning content that could be unlocked:

Theory:

  • Value vs state & functional programming
  • Concurrent programming.

Clojure hands-on:

  • Interactive programming and structural editing with the REPL
  • Data processing with lazy sequences and transducers
  • Data science notebooks covering stats, ML or LLM with rendered tabular data and charts.

More recently I gave an update on my progress at the Scicloj Visual Tools #34 meetup, including a live demo:

Watch the video

I hope you can appreciate the opportunity here. I'm happy to give a live demonstration to anyone who's interested!

Permalink

A/B Testing for the Decision of Scaling or Decommissioning a Human Resources Product

Introduction

Today, creating a culture where employees genuinely feel valued is more important than ever. But how do we move beyond standard recognition practices to truly understand what matters most to our team members?

At Nubank, innovation and rethinking what’s possible are at the heart of everything we do—and our People team embodies this philosophy too. We’re continuously pushing past traditional approaches, using advanced technology to enhance the way our employees experience work.

This mindset led to the birth of our “Noodles” project, an initiative aimed at revolutionizing how we recognize and appreciate one another.

The Technical Bet: Integrating Recognition into Everyday Tools

The core idea behind “Noodles” was simple: recognition shouldn’t be something extra or forced; it should seamlessly fit into the everyday tools our teams already use. Given how integral Slack is for our daily communication, we decided it was the perfect place to integrate regular peer-to-peer appreciation. Our hypothesis was straightforward: embedding recognition into Slack would naturally lead to more frequent appreciation, and ultimately, employees would feel more recognized.

To validate this, we applied a rigorous A/B testing framework, commonly used in product development, to assess Noodles’ true impact. We split users into two groups (treatment and control), carefully calculated the necessary sample size to ensure our findings were statistically sound, and conducted surveys before and after implementation to track changes in key perception metrics. We closely monitored structured metrics such as our internal Recognition Index, Product-Market Fit (PMF) Score, and user behavior patterns.

Instead of launching Noodles to everyone and holding back only a small control group, we practiced what we call Smart Efficiency—one of Nu’s core values—by identifying the minimum sample size needed to detect meaningful change within our test window. This helped us limit operational overhead like training and surveys, while still collecting reliable, statistically significant insights.

By treating this internal HR product with the same experimental discipline we apply to customer-facing features, we ensured that our decisions wouldn’t rely on anecdotal feedback or early excitement, but on evidence.

Unpacking the Results: Learning from Data

While the “Noodles” experiment showed promising results in some areas (like tool adoption and active users), the data also surfaced unexpected challenges. Despite achieving a 43% Product-Market Fit, our core Recognition Index actually declined among specific groups.

But thanks to our structured A/B testing, we could isolate the effects of the new tool from external influences and dive deeper into understanding these unexpected results. One insight was that “Noodles” resonated more positively with people giving recognition than with those receiving it. In fact, increased usage didn’t always translate to a stronger feeling of being recognized—a critical insight that might have been overlooked without an experimental design.

This reinforced a fundamental principle we believe at Nubank: data, when approached with honesty and scientific rigor, leads to better decisions—even when the outcome challenges our initial assumptions.

Our culture of experimentation and collaboration

Although we ultimately decided to pause “Noodles” in its current form, the experience was invaluable. It illustrates not only our commitment to continuous improvement, but also how experimentation is central to our People strategy. Rather than scale a solution based on incomplete signals, we chose a disciplined route—testing, measuring, and interpreting results with precision.

At Nubank, we approach People & Culture as a technology-driven domain. The Noodles experiment showcased how cross-functional teams—HR, Product, Data Science—can come together to test hypotheses, build MVPs, and make decisions based on real-world impact. This collaborative model reinforces our belief that HR should be as analytical, experimental, and responsive as any other function of our business.

Looking Forward: The Path to Meaningful Recognition

The learnings from Noodles have clarified the path ahead. Rather than scaling a solution that doesn’t fully solve the core issue, we’re shifting towards qualitative exploration to deeply understand the motivations behind the numbers. Combining quantitative experimentation with human insights is essential to creating meaningful employee experiences.

Moving forward, we want to apply this same philosophy to as many People & Culture initiatives as possible: test rigorously, measure results, learn continuously, and adapt accordingly. Whether a tool is built in-house or externally sourced, our approach remains the same: only scale initiatives that demonstrate clear, positive impacts, letting data guide our actions.

Conclusion

Even though “Noodles” didn’t turn out the way we expected, it illustrates Nubank’s dedication to pushing boundaries within our People team. By integrating data, product development strategies, and an iterative approach, we are revolutionizing how we engage and appreciate our people. 

Our goal remains clear: leverage technology thoughtfully to build a meaningful, human-centered employee experience. At Nubank, we are committed to ensuring that every Nubanker feels genuinely seen, recognized, and ready to build the Purple Future.

The post A/B Testing for the Decision of Scaling or Decommissioning a Human Resources Product appeared first on Building Nubank.

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.