Learn Ring - 3. Prerequisite
Notes
- Prerequisite
- git
- IDE
- VSCode
- Antigravity
- Clojure
- Web Tech
Notes
Clojure: The Official Documentary premieres April 16th!
From a two-year sabbatical and a stubborn idea to powering the engineering stack of one of the world’s largest fintech companies — this is the story of Clojure.
Featuring Rich Hickey, Alex Miller, Stuart Halloway, and many more, this full-length documentary traces Clojure’s unconventional origins, its values-driven community, and the language’s quiet but profound impact on how we think about software.
Documentary made possible with the support of Nubank!
March 2026
When two teams need to combine data, the usual answer is infrastructure: an ETL pipeline, an API, a message bus. Each adds latency, maintenance burden, and a new failure mode. The data moves because the systems can’t share it in place.
There’s a simpler model. If your database is an immutable value in storage, then anyone who can read the storage can query it. No server to run, no API to negotiate, no data to copy. And if your query language supports multiple inputs, you can join databases from different teams in a single expression.
This is how Datahike works. It isn't a feature we bolted on; it falls out naturally from two properties fundamental to the architecture.
In a traditional database, you query through a connection to a running server. The data may change between queries. The database is a service, not something you hold.
Datahike inverts this. Dereference a connection (@conn) and you get an immutable database value - a snapshot frozen at a specific transaction. It won’t change. Pass it to a function, hold it in a variable, hand it to another thread. Two concurrent readers holding the same snapshot always agree, without locks or coordination.
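A sketch of that guarantee, using the datahike.api alias d that appears later in this post (the transacted entity is illustrative):
(def before @conn)                        ;; immutable snapshot at transaction T
(d/transact conn [{:product/sku "W004"}]) ;; a write produces a new snapshot
(def after @conn)                         ;; snapshot at T+1
;; `before` still sees the pre-write state; both values stay queryable,
;; and any number of threads holding `before` agree without locks.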
This is an idea Rich Hickey introduced with Datomic in 2012: separate process (writes, managed by a single writer) from perception (reads, which are just values). The insight was that a correct implementation of perception does not require coordination.
Datomic’s indices live in storage, but its transactor holds an in-memory overlay of recent index segments that haven’t been flushed yet. Readers typically need to coordinate with the transactor to get a complete, current view. The storage alone isn’t enough.
Datahike removes that dependency. The writer flushes to storage on every transaction, so storage is always authoritative. Any process that can read the store sees the full, current database - no overlay, no transactor connection needed. To understand why this works, you need to see how the data is structured.
Datahike keeps its indices in a persistent sorted set - a B-tree variant where nodes are immutable. Every node is stored as a key-value pair in konserve, which abstracts over storage backends: S3, filesystem, JDBC, IndexedDB.
When a transaction adds data, Datahike doesn’t modify existing nodes. It creates new nodes for the changed path from leaf to root, while the unchanged subtrees are shared with the previous version. This is structural sharing - the same technique behind Clojure’s persistent vectors and Git’s object store.
A concrete example: a database with a million datoms might have a B-tree with thousands of nodes. A transaction that adds ten datoms rewrites perhaps a dozen nodes along the affected paths. The new tree root points to these new nodes and to the thousands of unchanged nodes from before. Both the old and new snapshots are valid, complete trees. They just share most of their structure.
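The same effect is visible with Clojure's own persistent vectors:
(def v1 (vec (range 1000000))) ;; a large persistent vector
(def v2 (conj v1 :extra))      ;; allocates only a few new tree nodes
;; v1 is untouched; v1 and v2 share almost all of their structure.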
The crucial property: every node is written once and never modified. Its key can be content-addressed. This means nodes can be cached aggressively, replicated independently, and read by any process that has access to the storage - without coordinating with the process that wrote them. (For more on how structural sharing, branching, and the tradeoffs work, see The Git Model for Databases.)
This is where it comes together.
When you call @conn, Datahike fetches one key from the konserve store: the branch head (e.g. :db). This returns a small map containing root pointers for each index, schema metadata, and the current transaction ID. Nothing else is loaded - the database value you receive is a lazy handle into the tree.
When a query traverses the index, each node is fetched on demand from storage and cached in a local LRU. Subsequent queries hitting the same nodes pay no I/O.
That’s the entire read path. No server process mediating access, no connection protocol, no port to expose. The indices live in storage, and any process that can read the storage can load the branch head, traverse the tree, and run queries. We call this the distributed index space.
Two processes reading the same database fetch the same immutable nodes independently. They don’t know about each other. A writer publishes new snapshots by writing new tree nodes, then atomically updating the branch head. Readers that dereference afterward see the new snapshot. Readers holding an earlier snapshot continue undisturbed - their nodes are immutable and won’t be garbage collected while reachable.
Because databases are values and Datalog natively supports multiple input sources, the next step is natural: join databases from different teams, different storage backends, or different points in time - in a single query.
Team A maintains a product catalog on S3. Team B maintains inventory on a separate bucket. A third team joins them without either team doing anything:
(def catalog (d/connect {:store {:backend :s3 :bucket "team-a"}}))
(def inventory (d/connect {:store {:backend :s3 :bucket "team-b"}}))
(d/q '[:find ?name ?price ?stock
:in $cat $inv
:where [$cat ?p :product/sku ?sku]
[$cat ?p :product/name ?name]
[$cat ?p :product/price ?price]
[$inv ?i :stock/sku ?sku]
[$inv ?i :stock/count ?stock]
[(> ?stock 0)]]
@catalog @inventory)
Each @ dereference fetches a branch head from its respective S3 bucket and returns an immutable database value. The query engine joins them locally. There is no server coordinating between the two, no data copied.
And because both are values, you can mix snapshots from different points in time:
;; Last quarter's catalog crossed with current inventory
def old-catalog := d/as-of(@catalog, #inst "2025-11-01")
d/q('[:find ?name ?stock
:in $cat $inv
:where [$cat ?p :product/sku ?sku]
[$cat ?p :product/name ?name]
[$inv ?i :stock/sku ?sku]
[$inv ?i :stock/count ?stock]], old-catalog, @inventory)
;; Last quarter's catalog crossed with current inventory
(def old-catalog (d/as-of @catalog #inst "2025-11-01"))
(d/q '[:find ?name ?stock
:in $cat $inv
:where [$cat ?p :product/sku ?sku]
[$cat ?p :product/name ?name]
[$inv ?i :stock/sku ?sku]
[$inv ?i :stock/count ?stock]]
old-catalog @inventory)
The old snapshot and the current one are both just values. The query engine doesn’t care when they’re from. This is useful for audits, regulatory reproducibility, and debugging: “what would this report have shown against last quarter’s data?”
So far, “storage” has meant S3 or a filesystem. But konserve also has an IndexedDB backend, which means the same model works in a browser. Using Kabel WebSocket sync and konserve-sync, a browser client replicates a database locally into IndexedDB. Queries run against the local replica with zero network round-trips. Updates sync differentially: only changed tree nodes are transmitted. The same structural sharing that makes snapshots cheap on the server makes sync cheap over the wire.
A complete cross-database join, runnable in a Clojure REPL:
(require '[datahike.api :as d])
;; Two independent databases
(def catalog-cfg {:store {:backend :memory
:id (java.util.UUID/randomUUID)}
:schema-flexibility :read})
(def inventory-cfg {:store {:backend :memory
:id (java.util.UUID/randomUUID)}
:schema-flexibility :read})
(d/create-database catalog-cfg)
(d/create-database inventory-cfg)
(def catalog (d/connect catalog-cfg))
(def inventory (d/connect inventory-cfg))
;; Team A: products
(d/transact catalog
[{:product/sku "W001" :product/name "Widget" :product/price 9.99}
{:product/sku "G002" :product/name "Gadget" :product/price 24.50}
{:product/sku "T003" :product/name "Thingamajig" :product/price 3.75}])
;; Team B: stock levels
(d/transact inventory
[{:stock/sku "W001" :stock/count 140}
{:stock/sku "G002" :stock/count 0}
{:stock/sku "T003" :stock/count 58}])
;; Join: in-stock products with price
(d/q '[:find ?name ?price ?stock
:in $cat $inv
:where [$cat ?p :product/sku ?sku]
[$cat ?p :product/name ?name]
[$cat ?p :product/price ?price]
[$inv ?i :stock/sku ?sku]
[$inv ?i :stock/count ?stock]
[(> ?stock 0)]]
@catalog @inventory)
;; => #{["Widget" 9.99 140] ["Thingamajig" 3.75 58]}
Replace :memory with :s3, :file, or :jdbc and the same code works across storage backends. The databases don’t need to share a backend - join an S3 database against a local file store in the same query.
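For instance (a sketch; the bucket and path are placeholders), the two configs could point at different backends and the join query above would run unchanged:
(def catalog-cfg   {:store {:backend :s3 :bucket "team-a"}})          ;; S3
(def inventory-cfg {:store {:backend :file :path "/data/inventory"}}) ;; local file store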
By 2026, AI software development with a native LLM layer is no longer an extra feature; it is a baseline requirement. LLM integration for SaaS has become the norm for modern platforms. If business software cannot learn, adapt, or automate on its own, it is already outdated. Whether a team is automating tedious internal tasks or turning SaaS into something that thinks for itself, success depends on how closely the AI is linked to the data and to the way the team works.
Honestly, the pace of AI software development has been unpredictable. What was experimental just a few years back is now completely normal. All organizations, from scrappy startups to large enterprises, are integrating LLMs right into their SaaS application development pipelines. And it is not just about adding a chatbot on top. The real shift? AI is becoming embedded in the core of products, reshaping how work gets done.
What’s pushing this change? Three big things:
By 2026, skipping LLM integration is a sure way to fall behind. Competitors are already building with AI in mind from the very beginning. The strategy playbook has matured, too: businesses now have everything from machine learning pipelines to proven ways of keeping SaaS data separate for different customers. It is not guesswork anymore; it is a repeatable, scalable framework. A business that does not adapt risks being left behind.

By 2026, companies won’t be debating whether to use AI anymore. The real question is how much of their systems should rely on it.
It is a big shift, and it highlights that building internal AI tools is a totally different game from SaaS application development.
| Feature | Internal AI Tools | AI‑Powered SaaS Products |
|---|---|---|
| Primary Goal | Engineering productivity & operational ROI | User retention & market differentiation |
| Data Source | Private knowledge bases (Slack, Jira, Wikis) | User‑generated data & behavioral logs |
| Compliance Focus | SOC2, internal privacy, data leaks | GDPR‑compliant AI, multi‑tenancy isolation |
| Interface | Slackbots, internal dashboards, CLI | Conversational UI, embedded copilots |
| Integration Style | Point solutions for specific workflows | Deep LLM integration for SaaS across product layers |
| Scalability | Limited to team or department use | Designed as scalable software solutions for thousands of users |
| AI Software Development Approach | Focused on automating repetitive internal tasks | Built for AI‑powered business intelligence (BI) and personalization |
| Privacy Strategy | Controlled access within the company | Privacy‑first AI software development with anonymization and tenant isolation |
| Maintenance | Managed by internal IT or engineering teams | Continuous updates through SaaS release cycles |
| User Experience | Functional, task‑driven | Adaptive, proactive, and customer‑centric |
Internal tools are all about making work smoother and faster. With AI, that usually means assistants that summarize meetings, draft documents, or help engineers find information without having to look all over. The goal is to focus on ROI and efficiency, not market dominance.
SaaS platforms have a different mission. They need to build scalable software solutions and keep users coming back. Here, AI gets right into the workflow: LLMs offer smart suggestions, guide new users, and power business intelligence (BI) features that actually make sense of data. This is where SaaS application development stops merely integrating chatbots and starts to feel truly AI-native.
Compliance matters everywhere. Internal teams worry about leaks and passing SOC2 audits. SaaS providers deal with even tougher requirements: GDPR, privacy across lots of customers, the works. The answer? Develop privacy‑first AI software: anonymize sensitive data before it reaches an external model. That builds trust and keeps everything on the right side of the rules.
Keyword search is on its way out; retrieval is taking over. Instead of forcing employees to scroll through endless wikis, Slack threads, or Jira tickets, AI steps in with Retrieval‑Augmented Generation (RAG). Employees simply ask a question, and the AI finds the relevant information and returns a concise answer.
✅️ Example: A developer asks, “What’s the latest update on the payment API?” No digging through Jira. The AI finds the right entries and gives a clear update. It seems small, but over time it saves hours.
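A minimal sketch of that flow in Clojure, with hypothetical retrieve and complete helpers standing in for a vector store and an LLM client:
;; retrieve and complete are hypothetical stand-ins; any concrete
;; vector-store search and LLM client would slot in here.
(defn answer-question [index llm question]
  (let [docs    (retrieve index question 5)                       ;; top-5 relevant chunks
        context (clojure.string/join "\n---\n" (map :text docs))  ;; stitch context together
        prompt  (str "Answer using only this context:\n" context
                     "\n\nQuestion: " question)]
    (complete llm prompt)))                                       ;; grounded answer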

AI agents shine when it comes to routine tasks. They can:
✅️ Example: The AI generates Jira tickets, assigns tasks, and gives a summary after planning the sprint. Engineers skip the admin work and get back to actual engineering.
Teams are not just guessing about the impact of AI; they track it:
✅️ Example: After rolling out RAG-based tools, a company saw developers spend 40% less time searching for documentation.
Most SaaS platforms started with simple chatbots or basic support features. But AI‑native SaaS changes the approach. Instead of adding AI later, it is built into the product’s core. Workflows shift in real time. Insights emerge before even asking. Personalization just happens, without the user having to do a thing.
Forget sitting around waiting for users to type into a help chat. Now, AI takes the lead. In a project management tool, it might spot a stuck task and remind the user of the next steps. A CRM identifies leads that are being overlooked.
BI dashboards are not just about flashy graphs anymore. AI steps in and explains what those trends actually mean, points out unusual spikes, and even recommends the next move, all in plain terms.
Personalization used to mean just showing the right product. Now, AI-native SaaS is shaped by what each user really wants, all while keeping privacy front and center.
AI-native SaaS is not about eye-catching new features. It is about building real intelligence right into the product, so people waste less time clicking around and get more value from the start. When it is done right, it scales up, protects privacy, and turns software into something that feels less like a tool and more like a true partner.
At Flexiana, Clojure is the backbone of our AI systems. Its functional style and immutability keep code stable and predictable, even as systems grow. That is a big deal when companies are trying to keep orchestration layers simple to maintain and scalable.
(ns orchestration
(:require [clj-http.client :as http]
[cheshire.core :as json]
[clojure.spec.alpha :as s]))
(def openai-api-key "YOUR_API_KEY")
(s/def ::summary (s/and vector? #(= 3 (count %)) (s/every string?)))
(s/def ::response (s/keys :req-un [::summary]))
(defn call-llm [prompt]
(let [response (http/post "https://api.openai.com/v1/chat/completions"
{:headers {"Authorization" (str "Bearer " openai-api-key)
"Content-Type" "application/json"}
:body (json/generate-string
{:model "gpt-4o-mini"
:temperature 0
:messages [{:role "user"
:content prompt}]})})]
(-> response :body (json/parse-string true))))
(defn build-prompt [user-input]
(str
"Summarize the following text into exactly 3 bullet points.\n\n"
"Return ONLY valid JSON:\n"
"{\"summary\": [\"point1\", \"point2\", \"point3\"]}\n\n"
user-input))
(defn safe-parse-json [s]
(try
(json/parse-string s true)
(catch Exception _
nil)))
(defn validate-response [parsed]
(if (s/valid? ::response parsed)
parsed
(throw (ex-info "Invalid LLM structure"
{:errors (s/explain-data ::response parsed)
:data parsed}))))
(defn extract-structured-answer [llm-response]
(let [content (get-in llm-response [:choices 0 :message :content])
parsed (safe-parse-json content)]
(if parsed
(validate-response parsed)
(throw (ex-info "Failed to parse JSON"
{:raw content})))))
(defn orchestrate-llm-workflow [user-input]
(let [prompt (build-prompt user-input)
raw-response (call-llm prompt)
validated (extract-structured-answer raw-response)]
(:summary validated)))
(orchestrate-llm-workflow
"Large language models are transforming how businesses automate workflows.")
Flexiana’s model selection is not about chasing the latest and greatest. We keep it practical, balancing expenses, efficiency, and the requirements of the specific job.
| Approach | Cost Level | Performance Level | Best Use Cases | Trade‑offs |
|---|---|---|---|---|
| Frontier Models | High | Very High | Complex analysis, deep reasoning, nuanced BI | Expensive, slower response times |
| Small Models | Low | Moderate | Routine queries, dashboards, lightweight reports | Less accurate on complex tasks |
| Hybrid Strategy | Balanced | Adaptive | Mix of high‑value analysis + everyday reporting | Requires orchestration, but is cost‑efficient |
Flexiana actually cares about building systems that work- real solutions for real problems. We use Clojure and smart model selection to build BI tools that not only work on day one but also keep up as the business grows. Companies get valuable insights, efficient use of their resources, and a configuration that works well.
Let’s be real: integrating AI with a SaaS platform is no simple task. Multi-tenant systems need to balance many customers at once, all while maintaining high performance, strong privacy, and unbreakable security. Flexiana focuses on what truly matters.
When teams have numerous tenants, they cannot mess around with data separation. Every customer’s info has to stay private: no exceptions, no accidental crossovers.
Flexiana draws clear lines from the database all the way up to the AI layer. Strong tenant boundaries, workflows that keep data in place, and pipelines that scale without losing trust. Customers are assured that their data remains secure even as the system expands.
Large language models are powerful, but not flawless. Malicious users sometimes trick models into breaking rules or revealing hidden info.
Flexiana blocks them at the checkpoint, with built-in filters that detect suspicious input, validation layers that enforce safe responses, and monitoring that detects emerging tactics. With these protections, users do not have to worry about AI misuse.
Flexiana does not add privacy as an afterthought; we integrate it from the very beginning. Every feature, every layer, follows strict privacy standards and keeps tenant data confidential. We stick to EU GDPR guidelines and give customers real control over their info, keeping everything transparent. This way, the AI is not just smart; it is responsible.
Trust is everything in multi-tenant SaaS. Flexiana’s focus on tight data isolation, strong defenses, and a privacy-first mindset means our AI systems stay secure, scale up easily, and stay compliant. That is how we build something customers can actually rely on.
Bringing large language models (LLMs) into business software is neither inexpensive nor fast. Businesses want to know if it is actually worth the effort. ROI is not just about saving money. It is about moving faster, getting people on board, and making things run smoother. At Flexiana, we break it down into three main areas.
LLMs can take a lot of the pain out of daily work. Companies see the benefits when teams solve problems faster and feel like they actually have the right tools.
These figures demonstrate whether AI is indeed simplifying tasks rather than adding more processes.
For customer‑facing platforms, ROI comes from how much people use the new features and how much less support they need.
This helps companies see whether AI is actually improving their products and removing obstacles to progress.
Behind the scenes, businesses have to make smart choices, since running LLMs is not free. There is a clear difference between using external APIs and running smaller models in‑house.
An API may seem cheap at a few cents per request, but the bill climbs quickly with volume. Once demand is high enough, switching to a locally hosted quantized model saves money over time. It is all about balancing flexibility against long-run savings, and an ROI calculator helps with that.
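A toy break-even comparison makes the tradeoff concrete (all numbers are illustrative assumptions):
(defn api-cost [requests-per-month cost-per-request]
  (* requests-per-month cost-per-request))

(api-cost 2000000 0.002) ;; => 4000.0  ($/month via the API at $0.002/request)
(/ 1500 0.002)           ;; => 750000.0 requests/month: the point where a
                         ;;    flat ~$1500/month self-hosted box breaks even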

ROI isn’t just a box to tick to prove AI is worth it. It is about making better decisions as your business grows. When companies track things like internal efficiency, how customers are using the product, and what it costs to keep everything running, they actually see where LLMs make a difference, and where they need to make changes.
Not always. APIs are easy to set up, but costs rise as usage grows. Running smaller models yourself takes more work at first, but you end up saving money in the long run.
It limits how much data the AI sees and puts safety checks in place. That reduces mistakes, keeps data safe, and supports compliance. Plus, it builds trust.
If you want to move fast and handle growth, a team helps a lot. When you are just getting started, you can stick with APIs or managed services; they get the job done. Once your SaaS starts to expand quickly, having real experts on board makes everything run more smoothly and improves what you deliver.
It analyzes consumer data, identifies patterns, and then gives guidance on shaping your product. BI takes all that raw information and turns it into something you can actually use, making your platform smarter and more useful.
They let you handle more users and data without slowing down. When you add more AI features, your system stays fast, and costs remain controlled.
Definitely. Clojure’s concurrency capabilities and design make it a good option for machine learning pipelines. It helps you add AI features that are reliable and easy to maintain.
If companies are building SaaS applications, LLM integration is not just a nice-to-have anymore; it is expected. Teams have two main paths: they can plug in external APIs for a faster launch, or they can run smaller models in-house for more control. It really depends on what they want to invest in, how big they want to grow, and how closely they need to monitor things.
Sticking to privacy‑first design and building software that scales: this is what keeps a business platform solid. When teams follow smart AI development practices, customers can actually trust what they see. AI-powered business intelligence is not just a set of buzzwords, either. It gives teams a clear view of customer behavior, helps them spot trends before everyone else, and guides product decisions with real data. And if companies are working on something more advanced, tools like machine learning with Clojure make it possible to build pipelines that don’t break down and are straightforward to maintain.
At the end of the day, integrating AI is not about chasing trends. It is about making SaaS tools that actually work and scale with business goals.
The post LLM Integration for Internal Tools & SaaS Products (2026 Strategy Guide) appeared first on Flexiana.
We built our pull-pattern API on lasagna-pull, a library designed by Robert Luo that lets clients send EDN patterns to describe what data they want. The core pattern-matching engine is solid. But as we added more resources, roles, and mutation types, we wanted a different model for how patterns interact with data sources and side effects. This article is about the design decisions behind lasagna-pattern, the successor stack that replaces the function-calling handler layer while building on the same pull-pattern ideas.
For context on what the new architecture looks like, see Building a Pure Data API with Lasagna Pattern. For the monorepo structure that hosts the libraries, see Clojure Monorepo with Babashka.
In lasagna-pull, the core mechanism was :with. Patterns contained function calls: (list :key :with [args]) told the engine to look up :key in a data map, call the function stored there, and pass it args. Functions returned {:response ... :effects ...}.
Here is what a few common operations looked like.
List all dashboards (read):
{:dashboards
{(list :role/user :with [])
{(list :self :with []) [{:title '? :id '?}]}}}
The outer :with checked authorization. The inner :with called a function to list all entries. The vector with map shape [{:title '? :id '?}] described which fields to return.
Read by ID:
{:dashboards
{(list :role/user :with [])
{(list :dashboard :with [{:id 123} :read])
{:title '? :content '?}}}}
The :read action dispatched inside the function via case.
Create:
{:dashboards
{(list :role/user :with [])
{(list :dashboard :with [{:title "New" :content "..."} :save])
{:id '? :title '?}}}}
Same function, different action. The function returned {:response data :effects {:rama {...}}}, and a separate executor ran the side effects after the pattern resolved.
On the server, the data map was a nested structure of functions:
(defn pullable-data [session]
{:dashboards
{:role/user (with-role session :user
(fn []
{:dashboard (fn [data action]
(case action
:read {:response (get-dash (:id data))}
:save {:response data
:effects {:rama {...}}}
:delete {:response true
:effects {:rama {...}}}))}))}})
Authorization was a function wrapper: with-role took a session, a role keyword, and a thunk that returned the data map. If the role was missing, the thunk never ran.
This architecture had a name: the "saturn handler" pattern, designed by Robert Luo. The idea was to split request handling into three stages:
- a pure saturn handler that computes {:response, :effects-desc, :session} with zero side effects,
- an executor, the only impure component, that runs the accumulated effect descriptions,
- a response step that sends the result back to the client.
The context-of mechanism coordinated accumulation during pattern resolution. A modifier function extracted :response, :effects, and :session from each operation result. A finalizer attached the accumulated effects and session updates to the final result. The handler itself never touched the database for writes.
;; Saturn handler: purely functional, no side effects
(defn saturn-handler [{:keys [db session] :as req}]
(let [pattern (extract-pattern req)
data (pullable-data db session)
result (pull/with-data-schema schema (mk-query pattern data))]
{:response ('&? result)
:effects-desc (:context/effects result)
:session (merge session (:context/sessions result))}))
This was a clean separation. The saturn handler was fully testable with no mocks. Effects were pure data descriptions. The executor was the only impure component, and it was small. The original implementation is documented in the archived flybot.sg repository.
The saturn handler separation was elegant, but as the system grew, specific limitations emerged.
Response before effects. The saturn handler computed :response before the executor ran :effects. This worked when the response data was already known (e.g., returning the input entity on create). But when you needed something produced by the side effect itself (a DB-generated ID, a timestamp set by the storage layer, a merged entity after a partial update), you were stuck. The f-merge escape hatch existed: a closing function in the effects description that could amend the response after execution. But using f-merge essentially reintroduced in-place mutation, defeating the purpose of the pure/impure split.
Verb-oriented patterns. Every pattern was a set of function calls. Reading all items called a function. Reading one item called a different function with a :read action. Creating called the same function with a :save action. The case dispatch inside each :with function grew as operations multiplied. The pattern language was supposed to describe data, but it was describing procedure calls.
Authorization at two granularities. with-role gated access to the entire data map (coarse). But ownership enforcement (can this user edit this specific item?) had to live inside the :with function's case dispatch (fine). These were two different authorization mechanisms in two different places, with no intermediate layer for "can mutate, but only own entities."
Indirection through context-of. The modifier/finalizer mechanism in context-of was well-designed for what it did: accumulate effects and session updates during pattern resolution without side effects. But it was a layer you had to understand to trace a request end-to-end. Each operation returned {:response :effects :session :error}, the modifier unpacked those, and the finalizer attached the accumulations. The mechanics were sound, but the indirection meant debugging required following the data through several stages.
The saturn handler pattern achieved something valuable: a fully testable, purely functional request handler. The redesign was not about fixing a broken system. It was about recognizing that once collections replaced functions as the API's building blocks, the pure/impure split could happen at a different boundary (inside DataSource methods), and the accumulation machinery was no longer needed.
The rewrite inverted the relationship. Instead of patterns calling functions, patterns match against data structures. Collections implement ILookup (Clojure's get protocol) for reads and a Mutable protocol for writes. The pattern engine does not know about functions. It just walks a data structure.
Here are the same operations in the new model.
List all dashboards:
'{:user {:dashboards ?all}}
:user is a top-level key in the API map. If the session has the user role, it resolves to a map containing :dashboards. If not, it resolves to nil. ?all is a variable that binds to (seq dashboards), triggering list-all on the DataSource.
Read by ID:
'{:user {:dashboards {{:id $id} ?dash}}}
;; client sends: {:pattern ... :params {:id 123}}
{:id $id} is a lookup key. $id gets replaced with 123 before the pattern compiles. The collection's ILookup implementation receives {:id 123} and delegates to the DataSource's fetch method.
Create:
{:user {:dashboards {nil {:title "New" :content "..."}}}}
nil as a key means "create". The collection's Mutable implementation calls create! on the DataSource. The response is the full created entity.
No :with, no action keywords, no case dispatch. The pattern syntax itself encodes the operation: ?var means read, nil key means create, nil value means delete, key + value means update.
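Update, for completeness, follows the remaining convention (a sketch; the :title value is illustrative):
;; query key + value map = update
{:user {:dashboards {{:id 123} {:title "Renamed"}}}}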
On the server, the data map is a structure of collections, not functions:
(defn make-api [{:keys [storage cache]}]
(let [dashboards (coll/collection (->DashboardSource storage cache)
{:id-key :id
:indexes #{#{:id}}})]
(fn [ring-request]
(let [session (:session ring-request)]
{:data {:user (when (:user session)
{:dashboards dashboards})
:owner (when (:owner session)
{:users users-collection
:roles roles-collection})}
:schema {:user {:dashboards [:vector Dashboard]}
:owner {:users [:vector User]}}
:errors {:detect :error
:codes {:forbidden 403 :not-found 404}}}))))
The contrast is clearest when you see old and new patterns next to each other.
;; OLD: two nested function calls
{:dashboards
{(list :role/user :with [])
{(list :self :with []) [{:title '? :id '?}]}}}
;; NEW: structural traversal
'{:user {:dashboards ?all}}
The old pattern needed two :with calls just to list everything: one for role checking, one for the listing function. The new pattern walks a data structure. If :user exists in the API map, :dashboards is a collection, and ?all binds to its contents.
;; OLD: function call with arguments
{:dashboards
{(list :role/user :with [])
{(list :dashboard :with [{:id 123} :read])
{:title '? :content '?}}}}
;; NEW: indexed lookup with $params
'{:user {:dashboards {{:id $id} ?dash}}}
;; params: {:id 123}
:with [{:id 123} :read] called a function and passed it two arguments. {:id $id} is text substitution: $id becomes 123, then {:id 123} is used as a lookup key on the collection. The difference is that $params happens before pattern compilation. There is no function call in the pattern at all.
;; OLD: function call with :save action
{:dashboards
{(list :role/user :with [])
{(list :dashboard :with [{:title "New" :content "..."} :save])
{:id '? :title '?}}}}
;; NEW: nil key = create
{:user {:dashboards {nil {:title "New" :content "..."}}}}
The old model used the same function for reads and writes, distinguished by an action keyword (:read, :save, :delete). The new model uses structural conventions: nil as the key means create. The collection's Mutable protocol handles it.
;; OLD: function call with :delete action
{:dashboards
{(list :role/user :with [])
{(list :dashboard :with [{:id 123} :delete])
{:id '?}}}}
;; NEW: query key + nil value = delete
{:user {:dashboards {{:id 123} nil}}}
nil as the value means delete. No action keywords, no function dispatch.
;; OLD: arbitrary query object via :with
{:analytics
{(list :raw :with [{:data-source [:module-1 :stats]
:select :col-name
:time-range {:from "2026-01-01" :to "2026-02-01"}}])
'?}}
;; NEW: query object as lookup key via $params
'{:user {:analytics {$query ?result}}}
;; params: {:query {:data-source [:module-1 :stats]
;; :select :col-name
;; :time-range {:from "2026-01-01" :to "2026-02-01"}}}
The query object is the same in both cases. The difference is where it lives: inside a function call (old) versus as a lookup key (new). The DataSource's fetch method receives the full query map and routes internally.
Old: (with-role session :user (fn [] ...)) wraps a thunk. Authorization is a function that gates access to other functions.
New: top-level keys in the API map are nil when the session lacks the role. The pattern simply gets nil for unauthorized paths. No function call, no wrapper.
;; Session has :user but not :owner
{:data {:user {:dashboards dashboards} ;; present
:owner nil}} ;; nil: patterns against :owner return nothing
For finer-grained checks (ownership enforcement on mutations), wrap-mutable intercepts write operations:
(coll/wrap-mutable dashboards
(fn [inner query value]
(if (owns? session query)
(coll/mutate! inner query value)
{:error {:type :forbidden}})))
This is still structural: a decorator around a collection, not conditional logic inside a handler.
:with called a function with arguments at pattern-resolution time. $params does text substitution before the pattern is even compiled.
;; $params: symbol replacement before compilation
'{:users {{:id $uid} ?user}}
;; + params {:uid 123}
;; becomes: {:users {{:id 123} ?user}}
The pattern engine never sees $uid. By the time it runs, the pattern is pure data. This means patterns are always static structures from the engine's perspective, which simplifies the implementation and makes patterns easier to reason about.
The old context-of mechanism was well-engineered: modifier functions extracted :response/:effects/:session from each operation, accumulated them in transient collections, and the finalizer attached them to the result. The saturn handler stayed pure throughout. It was a clean solution to the problem of accumulating side-effect descriptions during pattern resolution.
The new system does not need any of it:
The tradeoff: the saturn handler's strict pure/impure boundary is gone. DataSource methods perform side effects directly, which means the handler is no longer purely functional. In practice, this turned out to be acceptable because DataSource implementations are small, focused, and testable in isolation. The purity moved from the handler level to the collection wrapper level (decorators like wrap-mutable and read-only are pure transformations).
Old: functions returned {:response ... :effects {:rama {...}}}. The saturn handler accumulated these descriptions. A separate executor ran them afterward. The handler was purely functional.
New: create!, update!, and delete! in DataSource perform the side effects directly. The return value is the entity itself, not a description of work to be done.
(defrecord DashboardSource [storage cache]
coll/DataSource
(create! [_ data]
(storage-append! storage [data :save])
(assoc data :id (generate-id) :created-at (now)))
(delete! [_ query]
(storage-append! storage [query :delete])
true))
This solves the "response before effects" problem directly: create! performs the write and returns the full entity with DB-generated fields. No f-merge, no two-phase response construction.
The tradeoff is that the handler is no longer purely functional. If you need the old effects-description pattern for testing, you can wrap the DataSource to capture effects without executing them. But the default path is direct execution, which is simpler to trace.
Collections return errors as plain maps:
{:error {:type :forbidden :message "You don't own this resource"}}
{:error {:type :not-found}}
The remote layer maps error types to HTTP status codes via a declarative config:
{:detect :error
:codes {:forbidden 403 :not-found 404 :invalid-mutation 422}}
This keeps collections pure (they return data describing what happened) while the transport layer decides how to represent it. The design is heading toward GraphQL-style partial responses, where one branch failing does not fail the whole pattern. A request for {:user ?data :admin ?admin-stuff} should return :user data even if :admin is forbidden, with errors collected in a top-level array alongside the data.
The old saturn handler architecture was a genuinely clean design: a purely functional handler, effects as data descriptions, executors as the only impure component. It achieved testability and separation of concerns that many web frameworks do not even attempt.
The redesign was not about fixing something broken. It was about moving the purity boundary. The saturn handler kept the entire request pipeline pure by deferring effects. The new model keeps collections and their wrappers pure by pushing side effects into DataSource methods. The accumulation machinery (context-of, modifier, finalizer) disappears because there is nothing to accumulate. The response-before-effects limitation disappears because create! returns the entity directly.
The deeper lesson is about API identity. When your API is a set of handler functions, cross-cutting concerns (authorization, transport, error handling) become imperative code woven through those handlers. When your API is a data structure, those same concerns become structural: the shape of the map enforces authorization, the protocols enforce CRUD semantics, and the transport layer works generically over any ILookup-compatible structure.
Verbs become nouns, and the nouns compose.
Clojure's built-in functions work on built-in types because those types implement specific Java interfaces. get works on maps because maps implement ILookup. seq works on vectors because vectors implement Seqable. count works on both because they implement Counted.
The interesting part: your custom types can implement the same interfaces. Once they do, Clojure's standard library treats them as first-class citizens. get, seq, map, filter, count all work transparently, no special dispatch, no wrapper functions.
The lasagna-pattern collection library (Clojars) does exactly this. It defines a Collection type backed by a database, then implements ILookup and Seqable so that (get coll {:post/id 3}) triggers a database query while looking like a plain map lookup to the caller. The companion article, Building a Pure Data API with Lasagna Pattern, covers the full architecture. This article focuses on the Clojure constructs that make it work.
Clojure provides four ways to define types that implement protocols and interfaces. Each serves a different purpose.
Defines method signatures with no implementation. Conceptually similar to a Java interface.
(defprotocol DataSource
(fetch [this query])
(list-all [this])
(create! [this data])
(update! [this query data])
(delete! [this query]))
This says: "any storage backend must support these 5 operations." It does not say how. The implementation is left to the types that satisfy the protocol.
A concrete implementation of a protocol. Has named fields and behaves like a Clojure map (you can assoc, dissoc, and destructure it).
(defrecord PostsDataSource [conn]
DataSource
(fetch [_ query] (d/q ... @conn))
(list-all [_] (d/q ... @conn))
(create! [_ data] (d/transact conn [data]))
(update! [_ q data] (d/transact conn [(merge ...)]))
(delete! [_ query] (d/transact conn [[:db/retractEntity ...]])))
Use defrecord for persistent, reusable implementations with named fields: storage backends, services, configuration holders.
Like defrecord but without map behavior. Used for structural wrappers that implement platform interfaces rather than domain protocols.
(deftype Collection [data-source id-key indexes]
clojure.lang.ILookup
(valAt [this q] (.valAt this q nil))
(valAt [_ q nf] (or (fetch data-source q) nf))
clojure.lang.Seqable
(seq [_] (seq (list-all data-source))))
Use deftype when you need to override built-in Clojure verbs (get, seq, count). The type itself is opaque. Callers interact with it through standard Clojure functions, not through field access.
Same capability as deftype but anonymous and created inline. Closes over local variables.
(defn profile-lookup [session]
(reify clojure.lang.ILookup
(valAt [this k] (.valAt this k nil))
(valAt [_ k nf]
(case k
:name (:user-name session)
:email (:user-email session)
nf))))
Use for one-off objects, per-request wrappers, or cases where a named type would be overkill. The session value is captured from the enclosing scope.
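Usage is then just standard get (the session keys match the example above):
(def profile (profile-lookup {:user-name "Ada" :user-email "ada@example.com"}))
(get profile :name)          ;; => "Ada"
(get profile :missing :none) ;; => :none (falls through to not-found)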
| Construct | What it is | When to use |
|---|---|---|
| defprotocol | Contract (method signatures) | Define a role: "what must a DataSource do?" |
| defrecord | Named type, map-like | Concrete implementations: PostsDataSource |
| deftype | Named type, not map-like | Structural wrappers: Collection |
| reify | Anonymous inline type | One-off objects: per-request lookups |
Each Clojure interface corresponds to a built-in verb. Implementing an interface teaches Clojure how your custom type responds to that verb.
get
When you call (get thing key), Clojure calls (.valAt thing key nil) under the hood. Maps implement this by default. Custom types do not.
;; Without ILookup
(deftype Box [x])
(get (->Box 42) :x) ;; => nil (Box doesn't implement ILookup)
;; With ILookup
(deftype SmartBox [x y]
clojure.lang.ILookup
(valAt [this k] (.valAt this k nil))
(valAt [_ k nf]
(case k :x x :y y nf)))
(get (->SmartBox 1 2) :x) ;; => 1
In the collection library, ILookup is what makes (get coll {:post/id 3}) trigger a database query. The caller writes standard Clojure. The collection translates the get call into a fetch on the underlying DataSource.
seq (and map, filter, etc.)
clojure.lang.Seqable
(seq [_] (seq (list-all data-source)))
Once a type implements Seqable, all sequence functions work: (seq coll), (map f coll), (filter pred coll). The collection becomes iterable by delegating to its DataSource's list-all.
count directly
clojure.lang.Counted
(count [_] (count (list-all data-source)))
Without Counted, calling count on a custom Seqable type throws UnsupportedOperationException. Clojure's RT.count() does not fall back to seq. It only works on types that implement Counted, IPersistentCollection, java.util.Collection, or a few other JDK interfaces. If your custom type needs to support count, implement Counted explicitly. This also lets you provide an optimized path (e.g., a SELECT COUNT(*) instead of fetching all rows).
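A sketch of such an optimized path, where count-all is a hypothetical DataSource extension backed by SELECT COUNT(*) rather than fetching rows:
(deftype CountedCollection [data-source]
  clojure.lang.Seqable
  (seq [_] (seq (list-all data-source)))   ;; enumeration still delegates
  clojure.lang.Counted
  (count [_] (count-all data-source)))     ;; count-all: hypothetical COUNT(*) query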
The interfaces above override Clojure's built-in verbs. But some operations have no built-in verb. The collection library defines two custom protocols for these cases.
| Protocol | Verb | Purpose |
|---|---|---|
| Mutable | mutate! | Unified CRUD: (nil, data) = create, (query, data) = update, (query, nil) = delete |
| Wireable | ->wire | Serialize for HTTP transport: collections become vectors, lookups become maps or nil |
mutate! unifies create, update, and delete into a single function. The operation is determined by the combination of arguments: nil query means create, nil value means delete, both present means update.
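In use, the argument combination selects the operation (a sketch; posts is a collection as in the examples that follow):
(coll/mutate! posts nil {:title "New"})        ;; nil query         => create
(coll/mutate! posts {:id 3} {:title "Edited"}) ;; query + value     => update
(coll/mutate! posts {:id 3} nil)               ;; query + nil value => delete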
Wireable is conceptually similar to clojure.core.protocols/Datafiable (datafy). Both turn opaque types into plain Clojure data. The difference is intent: datafy is for introspection and navigation, ->wire is specifically for HTTP serialization.
Here is the key design insight: one DataSource, one Collection, multiple wrappers per role.
;; 3 records duplicating the same Datahike queries
(defrecord GuestPostsDataSource [conn] ...)
(defrecord MemberPostsDataSource [conn] ...)
(defrecord AdminPostsDataSource [conn] ...)
Each record contains a full copy of the same fetch, list-all, create!, update!, and delete! logic. Domain logic changes must be applied to all three.
(def posts (db/posts conn)) ;; one DataSource, one Collection
(public-posts posts) ;; reify: override get/seq to strip email
(member-posts posts user-id email) ;; wrap-mutable: override mutate! for ownership
posts ;; admin: no wrapper needed
The DataSource is created once. Each role gets a thin wrapper that overrides only the behavior it needs. Reads, storage queries, and domain logic live in one place.
| Wrapper | What it overrides | Use case |
|---|---|---|
| coll/read-only | Removes Mutable entirely | Guest access (no writes) |
| coll/wrap-mutable | Overrides mutate!, delegates reads | Ownership enforcement |
| reify (manual) | Override any interface | Transform read results, composite routing |
| coll/lookup | Provides ILookup from a keyword-value map | Non-enumerable resources (profile, session data) |
(let [posts (db/posts conn)] ;; one record, created once
{:guest {:posts (public-posts posts)} ;; reify over read-only, strips :user/email
:member {:posts (member-posts posts uid email)} ;; wrap-mutable, ownership checks
:admin {:posts posts} ;; raw collection, full access
:owner {:users (coll/read-only (db/users conn))}})
Guests see a read-only view with PII stripped. Members see a mutable view that enforces ownership. Admins see the raw collection. Each wrapper does one thing.
The public-posts wrapper demonstrates how reify serves as the escape hatch when the built-in wrappers are not enough:
(defn- public-posts [posts]
(let [inner (coll/read-only posts)]
(reify
clojure.lang.ILookup
(valAt [this query] (.valAt this query nil))
(valAt [_ query not-found]
  (if-let [post (.valAt inner query)]
    (strip-author-email post)
    not-found))
clojure.lang.Seqable
(seq [_]
(map strip-author-email (seq inner))))))
The library provides read-only (restricts writes) and wrap-mutable (intercepts writes), but no built-in way to transform read results. For that, you implement ILookup and Seqable directly via reify.
Authorization in this pattern is distributed structurally rather than imperatively. Instead of a single middleware that checks permissions, three layers each handle a different granularity.
with-role (API map structure)Binary gate: you have the role or you don't. The entire subtree of collections is present or replaced with an error map.
(defn- with-role [session role data]
(if (contains? (:roles session) role)
data
{:error {:type :forbidden :message (str "Role " role " required")}}))
;; In make-api:
:owner (with-role session :owner
{:users users, :users/roles roles})
A non-owner sending '{:owner {:users ?all}} hits the error map, not the collection. The remote/ layer detects errors along variable paths, so the error flows through as inline data and prevents any mutation from being attempted.
A planned improvement is an error-gate function that replaces the plain map with a reify implementing ILookup (returns self for any key, so deeply nested pattern traversal keeps working), Mutable (returns the error for mutations), and Wireable (serializes as the error map). This would be a good example of composing three protocols into a single anonymous sentinel object.
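A sketch of that sentinel (hypothetical: this is the planned improvement described above, not shipped code):
(defn error-gate [error]
  (reify
    clojure.lang.ILookup
    (valAt [this _] this)             ;; any key returns the gate itself,
    (valAt [this _ _] this)           ;; so nested traversal keeps working
    coll/Mutable
    (mutate! [_ _ _] {:error error})  ;; mutations surface the error
    coll/Wireable
    (->wire [_] {:error error})))     ;; serializes as the error map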
wrap-mutable (per-entity mutation rules)Controls who can create, update, or delete specific entities:
(coll/wrap-mutable posts
(fn [posts query value]
(if (owns-post? posts user-email query)
(coll/mutate! posts query value)
{:error {:type :forbidden}})))
Reads pass through untouched. Only mutations are intercepted. The check is per-entity: does this user own this specific post?
reify decorator (field-level read transformation)Controls which fields are visible:
(public-posts posts) ;; strips :user/email from author on every read
Every get and seq call on this wrapper runs through a transformation function that removes sensitive fields before the data reaches the caller.
| Layer | Tool | What it guards | Example |
|---|---|---|---|
| Coarse | with-role | "Can you access :owner at all?" | Non-owners get error map |
| Medium | wrap-mutable | "Can you mutate this entity?" | Members can only edit own posts |
| Fine | reify decorator | "What fields can you see?" | Guests don't see author email |
The DataSource stays "dumb" about authorization. It only knows about storage. This keeps it reusable across all roles without conditional logic.
Not everything needs defrecord + DataSource + Collection. If a resource is read-only, non-enumerable, and has a single query shape, a raw reify implementing ILookup + Wireable is enough.
Example: a post history lookup that takes a post ID and returns the revision history:
(defn post-history-lookup [conn]
(reify
clojure.lang.ILookup
(valAt [_ query]
(when-let [post-id (:post/id query)]
(post-history @conn post-id)))
(valAt [this query not-found]
(or (.valAt this query) not-found))
coll/Wireable
(->wire [_] nil))) ;; can't enumerate all history
The pattern engine still calls get on it, so it works identically from the caller's perspective. The full DataSource/Collection stack would add index validation, Seqable, Mutable, none of which history needs.
| Need | Tool |
|---|---|
| Full CRUD + enumeration + index validation | defrecord + coll/collection |
| Read-only, keyword keys, flat values | coll/lookup |
| Read-only, map keys, single query shape | Raw reify with ILookup + Wireable |
Note: coll/lookup only supports keyword keys (:email, :name). For map keys like {:post/id 3}, use a raw reify.
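For example (a sketch, assuming coll/lookup takes the keyword-value map directly):
(def profile (coll/lookup {:email "ada@example.com" :name "Ada"}))
(get profile :email) ;; => "ada@example.com" (keyword key: supported)
;; a map key like {:post/id 3} needs the raw reify approach instead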
This post walks through a small web development project using Clojure, covering everything from building the app to packaging and deploying it. It’s a collection of insights and tips I’ve learned from building my Clojure side projects, but presented in a more structured format.
As the title suggests, we’ll be deploying the app to Fly.io. It’s a service that allows you to deploy apps packaged as Docker images on lightweight virtual machines. [1] My experience with it has been good; it’s easy to use and quick to set up. One downside of Fly is that it doesn’t have a free tier, but if you don’t plan on leaving the app deployed, it barely costs anything.
This isn’t a tutorial on Clojure, so I’ll assume you already have some familiarity with the language as well as some of its libraries. [2]
In this post, we’ll be building a barebones bookmarks manager for the demo app. Users can log in using basic authentication, view all bookmarks, and create a new bookmark. It’ll be a traditional multi-page web app and the data will be stored in a SQLite database.
Here’s an overview of the project’s starting directory structure:
.
├── dev
│ └── user.clj
├── resources
│ └── config.edn
├── src
│ └── acme
│ └── main.clj
└── deps.edn
And here are the libraries we’re going to use. If you have some Clojure experience or have used Kit, you’re probably already familiar with all the libraries listed below. [3]
{:paths ["src" "resources"]
:deps {org.clojure/clojure {:mvn/version "1.12.0"}
aero/aero {:mvn/version "1.1.6"}
integrant/integrant {:mvn/version "0.11.0"}
ring/ring-jetty-adapter {:mvn/version "1.12.2"}
metosin/reitit-ring {:mvn/version "0.7.2"}
com.github.seancorfield/next.jdbc {:mvn/version "1.3.939"}
org.xerial/sqlite-jdbc {:mvn/version "3.46.1.0"}
hiccup/hiccup {:mvn/version "2.0.0-RC3"}}
:aliases
{:dev {:extra-paths ["dev"]
:extra-deps {nrepl/nrepl {:mvn/version "1.3.0"}
integrant/repl {:mvn/version "0.3.3"}}
:main-opts ["-m" "nrepl.cmdline" "--interactive" "--color"]}}}
I use Aero and Integrant for my system configuration (more on this in the next section), Ring with the Jetty adaptor for the web server, Reitit for routing, next.jdbc for database interaction, and Hiccup for rendering HTML. From what I’ve seen, this is a popular “library combination” for building web apps in Clojure. [4]
The user namespace in dev/user.clj contains helper functions from Integrant-repl to start, stop, and restart the Integrant system.
(ns user
(:require
[acme.main :as main]
[clojure.tools.namespace.repl :as repl]
[integrant.core :as ig]
[integrant.repl :refer [set-prep! go halt reset reset-all]]))
(set-prep!
(fn []
(ig/expand (main/read-config)))) ;; we'll implement this soon
(repl/set-refresh-dirs "src" "resources")
(comment
(go)
(halt)
(reset)
(reset-all))
If you’re new to Integrant or other dependency injection libraries like Component, I’d suggest reading “How to Structure a Clojure Web”. It’s a great explanation of the reasoning behind these libraries. Like most Clojure apps that use Aero and Integrant, my system configuration lives in a .edn file. I usually name mine as resources/config.edn. Here’s what it looks like:
{:server
{:port #long #or [#env PORT 8080]
:host #or [#env HOST "0.0.0.0"]
:auth {:username #or [#env AUTH_USER "john.doe@email.com"]
:password #or [#env AUTH_PASSWORD "password"]}}
:database
{:dbtype "sqlite"
:dbname #or [#env DB_DATABASE "database.db"]}}
In production, most of these values will be set using environment variables. During local development, the app will use the hard-coded default values. We don’t have any sensitive values in our config (e.g., API keys), so it’s fine to commit this file to version control. If there are such values, I usually put them in another file that’s not tracked by version control and include them in the config file using Aero’s #include reader tag.
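For example, Aero’s #include tag splices another EDN file in at read time (a sketch; secrets.edn is kept out of version control):
;; resources/config.edn
{:database {:dbname #or [#env DB_DATABASE "database.db"]}
 :secrets #include "secrets.edn"}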
This config file is then “expanded” into the Integrant system map using the expand-key method:
(ns acme.main
(:require
[aero.core :as aero]
[clojure.java.io :as io]
[integrant.core :as ig]))
(defn read-config
[]
{:system/config (aero/read-config (io/resource "config.edn"))})
(defmethod ig/expand-key :system/config
[_ opts]
(let [{:keys [server database]} opts]
{:server/jetty (assoc server :handler (ig/ref :handler/ring))
:handler/ring {:database (ig/ref :database/sql)
:auth (:auth server)}
:database/sql database}))
The system map is created in code instead of being in the configuration file. This makes refactoring your system simpler as you only need to change this method while leaving the config file (mostly) untouched. [5]
My current approach to Integrant + Aero config files is mostly inspired by the blog post “Rethinking Config with Aero & Integrant” and Laravel’s configuration. The config file follows a similar structure to Laravel’s config files and contains the app configurations without describing the structure of the system. Previously, I had a key for each Integrant component, which led to the config file being littered with #ig/ref and more difficult to refactor.
Also, if you haven’t already, start a REPL and connect to it from your editor. Run clj -M:dev if your editor doesn’t automatically start a REPL. Next, we’ll implement the init-key and halt-key! methods for each of the components:
;; src/acme/main.clj
(ns acme.main
  (:require
   ;; ...
   [acme.handler :as handler]
   [acme.util :as util]
   [next.jdbc :as jdbc]
   [ring.adapter.jetty :as jetty]))
;; ...
(defmethod ig/init-key :server/jetty
[_ opts]
(let [{:keys [handler port]} opts
jetty-opts (-> opts (dissoc :handler :auth) (assoc :join? false))
server (jetty/run-jetty handler jetty-opts)]
(println "Server started on port " port)
server))
(defmethod ig/halt-key! :server/jetty
[_ server]
(.stop server))
(defmethod ig/init-key :handler/ring
[_ opts]
(handler/handler opts))
(defmethod ig/init-key :database/sql
[_ opts]
(let [datasource (jdbc/get-datasource opts)]
(util/setup-db datasource)
datasource))
The setup-db function creates the required tables in the database if they don't exist yet. This works fine for database migrations in small projects like this demo app, but for larger projects, consider using libraries such as Migratus (my preferred library) or Ragtime.
(ns acme.util
(:require
[next.jdbc :as jdbc]))
(defn setup-db
[db]
(jdbc/execute-one!
db
["create table if not exists bookmarks (
bookmark_id text primary key not null,
url text not null,
created_at datetime default (unixepoch()) not null
)"]))
For the server handler, let’s start with a simple function that returns a “hi world” string.
(ns acme.handler
(:require
[ring.util.response :as res]))
(defn handler
[_opts]
(fn [req]
(res/response "hi world")))
Now all the components are implemented. We can check if the system is working properly by evaluating (reset) in the user namespace. This will reload your files and restart the system. You should see this message printed in your REPL:
:reloading (acme.util acme.handler acme.main)
Server started on port 8080
:resumed
If we send a request to http://localhost:8080/, we should get “hi world” as the response:
$ curl localhost:8080/
# hi world
Nice! The system is working correctly. In the next section, we’ll implement routing and our business logic handlers.
First, let’s set up a ring handler and router using Reitit. We only have one route, the index / route that’ll handle both GET and POST requests.
(ns acme.handler
(:require
[reitit.ring :as ring]))
(def routes
[["/" {:get index-page
:post index-action}]])
(defn handler
[opts]
(ring/ring-handler
(ring/router routes)
(ring/routes
(ring/redirect-trailing-slash-handler)
(ring/create-resource-handler {:path "/"})
(ring/create-default-handler))))
We're also composing a few useful default handlers:
- redirect-trailing-slash-handler to resolve routes with trailing slashes,
- create-resource-handler to serve static files, and
- create-default-handler to handle common 40x responses.
If you remember the :handler/ring from earlier, you'll notice that it has two dependencies, database and auth. Currently, they're inaccessible to our route handlers. To fix this, we can inject these components into the Ring request map using a middleware function.
;; ...
(defn components-middleware
[components]
(let [{:keys [database auth]} components]
(fn [handler]
(fn [req]
(handler (assoc req
:db database
:auth auth))))))
;; ...
The components-middleware function takes in a map of components and creates a middleware function that "assocs" each component into the request map. [6] If you have more components such as a Redis cache or a mail service, you can add them here.
We'll also need a middleware to handle HTTP basic authentication. [7] This middleware will check if the username and password from the request map match the values in the auth map injected by components-middleware. If they match, then the request is authenticated and the user can view the site.
(ns acme.handler
(:require
;; ...
[acme.util :as util]
[ring.util.response :as res]))
;; ...
(defn wrap-basic-auth
[handler]
(fn [req]
(let [{:keys [headers auth]} req
{:keys [username password]} auth
authorization (get headers "authorization")
correct-creds (str "Basic " (util/base64-encode
(format "%s:%s" username password)))]
(if (and authorization (= correct-creds authorization))
(handler req)
(-> (res/response "Access Denied")
(res/status 401)
(res/header "WWW-Authenticate" "Basic realm=protected"))))))
;; ...
A nice feature of Clojure is that interop with the host language is easy. The base64-encode function is just a thin wrapper over Java’s Base64.Encoder:
(ns acme.util
;; ...
(:import java.util.Base64))
(defn base64-encode
[s]
(.encodeToString (Base64/getEncoder) (.getBytes s)))
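A quick REPL check (the output is the standard Base64 encoding of "user:pass"):
user=> (require '[acme.util :as util])
user=> (util/base64-encode "user:pass")
;; => "dXNlcjpwYXNz"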
Finally, we need to add them to the router. Since we’ll be handling form requests later, we’ll also bring in Ring’s wrap-params middleware.
(ns acme.handler
(:require
;; ...
[ring.middleware.params :refer [wrap-params]]))
;; ...
(defn handler
[opts]
(ring/ring-handler
;; ...
{:middleware [(components-middleware opts)
wrap-basic-auth
wrap-params]}))
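Once this is wired up, unauthenticated requests should be rejected. A quick check with curl (response headers abbreviated):
$ curl -i localhost:8080/
# HTTP/1.1 401 Unauthorized
# WWW-Authenticate: Basic realm=protected
#
# Access Denied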
We now have everything we need to implement the route handlers, i.e. the business logic of the app. First, we'll implement the index-page function, which renders a page with a form for adding a bookmark and a list of the bookmarks saved so far:
(ns acme.handler
(:require
;; ...
[next.jdbc :as jdbc]
[next.jdbc.sql :as sql]))
;; ...
(defn template
[bookmarks]
[:html
[:head
[:meta {:charset "utf-8"}]
[:meta {:name "viewport"
:content "width=device-width, initial-scale=1.0"}]]
[:body
[:h1 "bookmarks"]
[:form {:method "POST"}
[:div
[:label {:for "url"} "url "]
[:input#url {:name "url"
:type "url"
:required true
:placeholder "https://en.wikipedia.org/"}]]
[:button "submit"]]
[:p "your bookmarks:"]
[:ul
(if (empty? bookmarks)
[:li "you don't have any bookmarks"]
(map
(fn [{:keys [url]}]
[:li
[:a {:href url} url]])
bookmarks))]]])
(defn index-page
[req]
(try
(let [bookmarks (sql/query (:db req)
["select * from bookmarks"]
jdbc/unqualified-snake-kebab-opts)]
(util/render (template bookmarks)))
(catch Exception e
(util/server-error e))))
;; ...
Database queries can sometimes throw exceptions, so it’s good to wrap them in a try-catch block. I’ll also introduce some helper functions:
(ns acme.util
(:require
;; ...
[hiccup2.core :as h]
[ring.util.response :as res])
(:import java.util.Base64))
;; ...
(defn prepend-doctype
  [s]
  (str "<!doctype html>" s))
(defn render
[hiccup]
(-> hiccup h/html str prepend-doctype res/response (res/content-type "text/html")))
(defn server-error
[e]
(println "Caught exception: " e)
(-> (res/response "Internal server error")
(res/status 500)))
render takes a hiccup form and turns it into a ring response, while server-error takes an exception, logs it, and returns a 500 response.
Next, we’ll implement the index-action function:
;; ...
(defn index-action
[req]
(try
(let [{:keys [db form-params]} req
value (get form-params "url")]
(sql/insert! db :bookmarks {:bookmark_id (random-uuid) :url value})
(res/redirect "/" 303))
(catch Exception e
(util/server-error e))))
;; ...
This is an implementation of a typical post/redirect/get pattern. We get the value from the URL form field, insert a new row in the database with that value, and redirect back to the index page. Again, we’re using a try-catch block to handle possible exceptions from the database query.
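You can exercise the whole flow from the terminal too (assuming the default dev credentials from config.edn; curl's -d flag sends a form-encoded POST, which wrap-params parses into form-params):
$ curl -u john.doe@email.com:password -d url=https://example.com -i localhost:8080/
# HTTP/1.1 303 See Other
# Location: /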
That should be all of the code for the controllers. If you reload your REPL and go to http://localhost:8080, you should see something that looks like this after logging in:

The last thing we need to do is to update the main function to start the system:
;; ...
(defn -main [& _]
(-> (read-config) ig/expand ig/init))
Now, you should be able to run the app using clj -M -m acme.main. That’s all the code needed for the app. In the next section, we’ll package the app into a Docker image to deploy to Fly.
While there are many ways to package a Clojure app, Fly.io specifically requires a Docker image. Broadly, you can either build an uberjar and run it on a JVM base image, or copy the source into the image and run it with the Clojure CLI. Both are valid approaches. I prefer the first since its only dependency is the JVM. We'll use the tools.build library to build the uberjar. Check out the official guide for more information on building Clojure programs. Since it's a library, we can add it to our deps.edn file with an alias:
{;; ...
:aliases
{;; ...
:build {:extra-deps {io.github.clojure/tools.build
{:git/tag "v0.10.5" :git/sha "2a21b7a"}}
:ns-default build}}}
Tools.build expects a build.clj file in the root of the project directory, so we’ll need to create that file. This file contains the instructions to build artefacts, which in our case is a single uberjar. There are many great examples of build.clj files on the web, including from the official documentation. For now, you can copy+paste this file into your project.
(ns build
(:require
[clojure.tools.build.api :as b]))
(def basis (delay (b/create-basis {:project "deps.edn"})))
(def src-dirs ["src" "resources"])
(def class-dir "target/classes")
(defn uber
[_]
(println "Cleaning build directory...")
(b/delete {:path "target"})
(println "Copying files...")
(b/copy-dir {:src-dirs src-dirs
:target-dir class-dir})
(println "Compiling Clojure...")
(b/compile-clj {:basis @basis
:ns-compile '[acme.main]
:class-dir class-dir})
(println "Building Uberjar...")
(b/uber {:basis @basis
:class-dir class-dir
:uber-file "target/standalone.jar"
:main 'acme.main}))
To build the project, run clj -T:build uber. This will create the uberjar standalone.jar in the target directory. The uber in clj -T:build uber refers to the uber function from build.clj. Since the build system is a Clojure program, you can customise it however you like.
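For example, here's a small sketch of that flexibility (the clean task is my own addition, not part of the build script above): any function in build.clj can be invoked as a task with -T.
;; in build.clj
(defn clean
  "Delete the build directory: clj -T:build clean"
  [_]
  (b/delete {:path "target"}))
If we try to run the uberjar now, we'll get an error: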
# build the uberjar
$ clj -T:build uber
# Cleaning build directory...
# Copying files...
# Compiling Clojure...
# Building Uberjar...
# run the uberjar
$ java -jar target/standalone.jar
# Error: Could not find or load main class acme.main
# Caused by: java.lang.ClassNotFoundException: acme.main
This error occurred because the Java class required to launch the app was never generated. To fix this, we need to add the :gen-class directive to our main namespace, which instructs Clojure to generate a Java class (with a static main method) from the -main function.
(ns acme.main
;; ...
(:gen-class))
;; ...
If you rebuild the project and run java -jar target/standalone.jar again, it should work perfectly. Now that we have a working build script, we can write the Dockerfile:
# install additional dependencies here in the base layer
# separate base from build layer so any additional deps installed are cached
FROM clojure:temurin-21-tools-deps-bookworm-slim AS base
FROM base AS build
WORKDIR /opt
COPY . .
RUN clj -T:build uber
FROM eclipse-temurin:21-alpine AS prod
COPY --from=build /opt/target/standalone.jar /
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "standalone.jar"]
It's a multi-stage Dockerfile. We use the official Clojure Docker image as the layer to build the uberjar. Once it's built, we copy it to a smaller Docker image that only contains the Java runtime. [8] By doing this, we get a smaller container image as well as a faster Docker build time because the layers are better cached.
That should be all for packaging the app. We can move on to the deployment now.
First things first, you’ll need to install flyctl, Fly’s CLI tool for interacting with their platform. Create a Fly.io account if you haven’t already. Then run fly auth login to authenticate flyctl with your account.
Next, we’ll need to create a new Fly App:
$ fly app create
# ? Choose an app name (leave blank to generate one):
# automatically selected personal organization: Ryan Martin
# New app created: blue-water-6489
Another way to do this is with the fly launch command, which automates a lot of the app configuration for you. Since we have some steps to do that fly launch doesn't cover, we'll configure the app manually. I also have a fly.toml file ready that you can copy straight into your project.
# replace these with your app and region name
# run `fly platform regions` to get a list of regions
app = 'blue-water-6489'
primary_region = 'sin'
[env]
DB_DATABASE = "/data/database.db"
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = "stop"
auto_start_machines = true
min_machines_running = 0
[mounts]
source = "data"
destination = "/data"
initial_size = 1
[[vm]]
size = "shared-cpu-1x"
memory = "512mb"
cpus = 1
cpu_kind = "shared"
These are mostly the default configuration values with some additions. Under the [env] section, we’re setting the SQLite database location to /data/database.db. The database.db file itself will be stored in a persistent Fly Volume mounted on the /data directory. This is specified under the [mounts] section. Fly Volumes are similar to regular Docker volumes but are designed for Fly’s micro VMs.
We’ll need to set the AUTH_USER and AUTH_PASSWORD environment variables too, but not through the fly.toml file as these are sensitive values. To securely set these credentials with Fly, we can set them as app secrets. They’re stored encrypted and will be automatically injected into the app at boot time.
$ fly secrets set AUTH_USER=hi@ryanmartin.me AUTH_PASSWORD=not-so-secure-password
# Secrets are staged for the first deployment
With this, the configuration is done and we can deploy the app using fly deploy:
$ fly deploy
# ...
# Checking DNS configuration for blue-water-6489.fly.dev
# Visit your newly deployed app at https://blue-water-6489.fly.dev/
The first deployment will take longer since it’s building the Docker image for the first time. Subsequent deployments should be faster due to the cached image layers. You can click on the link to view the deployed app, or you can also run fly open, which will do the same thing. Here’s the app in action:

If you made additional changes to the app or fly.toml, you can redeploy the app using the same command, fly deploy. The app is configured to auto stop/start, which helps to cut costs when there’s not a lot of traffic to the site. If you want to take down the deployment, you’ll need to delete the app itself using fly app destroy <your app name>.
Running a REPL connected to a live app is a debated topic in the Clojure community, with varying opinions on whether or not it's a good idea. Personally, I find having a REPL connected to the live app helpful, and I often use it for debugging and running queries on the live database. [9] Since we're using SQLite, we don't have a database server we can directly connect to, unlike Postgres or MySQL.
If you're brave, you can even restart the app directly from the REPL without redeploying. You can easily go wrong with it, which is why some prefer not to use it.
For this project, we're going to add a socket REPL. It's very simple to add (you just need to set a JVM option), and it doesn't require additional dependencies like nREPL does. Let's update the Dockerfile:
# ...
EXPOSE 7888
ENTRYPOINT ["java", "-Dclojure.server.repl={:port 7888 :accept clojure.core.server/repl}", "-jar", "standalone.jar"]
The socket REPL will be listening on port 7888. If we redeploy the app now, the REPL will be started, but we won’t be able to connect to it. That’s because we haven’t exposed the service through Fly proxy. We can do this by adding the socket REPL as a service in the [services] section in fly.toml.
However, doing this will also expose the REPL port to the public. This means that anyone can connect to your REPL and possibly mess with your app. Instead, what we want to do is to configure the socket REPL as a private service.
By default, all Fly apps in your organisation live in the same private network. This private network, called 6PN, connects the apps in your organisation through WireGuard tunnels (a VPN) using IPv6. Fly private services aren't exposed to the public internet but can be reached from this private network. We can then use WireGuard to connect to this private network to reach our socket REPL.
Fly VMs are also configured with the hostname fly-local-6pn, which maps to its 6PN address. This is analogous to localhost, which points to your loopback address 127.0.0.1. To expose a service to 6PN, all we have to do is bind or serve it to fly-local-6pn instead of the usual 0.0.0.0. We have to update the socket REPL options to:
# ...
ENTRYPOINT ["java", "-Dclojure.server.repl={:port 7888,:address \"fly-local-6pn\",:accept clojure.core.server/repl}", "-jar", "standalone.jar"]
After redeploying, we can use the fly proxy command to forward the port from the remote server to our local machine. [10]
$ fly proxy 7888:7888
# Proxying local port 7888 to remote [blue-water-6489.internal]:7888
In another shell, run:
$ rlwrap nc localhost 7888
# user=>
Now we have a REPL connected to the production app! rlwrap is used for readline functionality, e.g. up/down arrow keys, vi bindings. Of course, you can also connect to it from your editor.
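As a small example of that kind of live debugging (a sketch: the datasource options mirror the :database config and the DB_DATABASE path from fly.toml; the count is illustrative):
user=> (require '[next.jdbc :as jdbc])
user=> (jdbc/execute-one!
         (jdbc/get-datasource {:dbtype "sqlite" :dbname "/data/database.db"})
         ["select count(*) as n from bookmarks"])
;; => {:n 1}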
If you're using GitHub, you can also set up automatic deployments on pushes/PRs with GitHub Actions. All you need to do is create the workflow file:
name: Fly Deploy
on:
  push:
    branches:
      - main
  workflow_dispatch:
jobs:
  deploy:
    name: Deploy app
    runs-on: ubuntu-latest
    concurrency: deploy-group
    steps:
      - uses: actions/checkout@v4
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - run: flyctl deploy --remote-only
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
To get this to work, you’ll need to create a deploy token from your app’s dashboard. Then, in your GitHub repo, create a new repository secret called FLY_API_TOKEN with the value of your deploy token. Now, whenever you push to the main branch, this workflow will automatically run and deploy your app. You can also manually run the workflow from GitHub because of the workflow_dispatch option.
As always, all the code is available on GitHub. Originally, this post was just about deploying to Fly.io, but along the way, I kept adding on more stuff until it essentially became my version of the user manager example app. Anyway, hope this post provided a good view into web development with Clojure. As a bonus, here are some additional resources on deploying Clojure apps:
The way Fly.io works under the hood is pretty clever. Instead of running the container image with a runtime like Docker, the image is unpacked and “loaded” into a VM. See this video explanation for more details. ↩︎
If you’re interested in learning Clojure, my recommendation is to follow the official getting started guide and join the Clojurians Slack. Also, read through this list of introductory resources. ↩︎
Kit was a big influence on me when I first started learning web development in Clojure. I never used it directly, but I did use their library choices and project structure as a base for my own projects. ↩︎
There’s no “Rails” for the Clojure ecosystem (yet?). The prevailing opinion is to build your own “framework” by composing different libraries together. Most of these libraries are stable and are already used in production by big companies, so don’t let this discourage you from doing web development in Clojure! ↩︎
There might be some keys that you add or remove, but the structure of the config file stays the same. ↩︎
“assoc” (associate) is a Clojure slang that means to add or update a key-value pair in a map. ↩︎
For more details on how basic authentication works, check out the specification. ↩︎
Here’s a cool resource I found when researching Java Dockerfiles: WhichJDK. It provides a comprehensive comparison of the different JDKs available and recommendations on which one you should use. ↩︎
Another (non-technically important) argument for live/production REPLs is just because it’s cool. Ever since I read the story about NASA’s programmers debugging a spacecraft through a live REPL, I’ve always wanted to try it at least once. ↩︎
If you encounter errors related to WireGuard when running fly proxy, you can run fly doctor, which will hopefully detect issues with your local setup and also suggest fixes for them. ↩︎
This post is about seven months late, but here are my takeaways from Advent of Code 2024. It was my second time participating, and this time I actually managed to complete it. [1] My goal was to learn a new language, Zig, and to improve my DSA and problem-solving skills.
If you’re not familiar, Advent of Code is an annual programming challenge that runs every December. A new puzzle is released each day from December 1st to the 25th. There’s also a global leaderboard where people (and AI) race to get the fastest solves, but I personally don’t compete in it, mostly because I want to do it at my own pace.
I went with Zig because I had been curious about it for a while, mainly because of its promise of being a better C and because TigerBeetle (one of the coolest databases around right now) is written in it. Learning Zig felt like a good way to get back into systems programming, something I'd been wanting to do after a couple of chaotic years of web development.
This post is mostly about my setup, results, and the things I learned from solving the puzzles. If you’re more interested in my solutions, I’ve also uploaded my code and solution write-ups to my GitHub repository.

There were several Advent of Code templates in Zig that I looked at as a reference for my development setup, but none of them really clicked with me. I ended up just running my solutions directly using zig run for the whole event. It wasn’t until after the event ended that I properly learned Zig’s build system and reorganised my project.
Here’s what the project structure looks like now:
.
├── src
│ ├── days
│ │ ├── data
│ │ │ ├── day01.txt
│ │ │ ├── day02.txt
│ │ │ └── ...
│ │ ├── day01.zig
│ │ ├── day02.zig
│ │ └── ...
│ ├── bench.zig
│ └── run.zig
└── build.zig
The project is powered by build.zig, which defines several commands:
- zig build - Builds all of the binaries for all optimisation modes.
- zig build run - Runs all solutions sequentially.
- zig build run -Day=XX - Runs the solution of the specified day only.
- zig build bench - Runs all benchmarks sequentially.
- zig build bench -Day=XX - Runs the benchmark of the specified day only.
- zig build test - Runs all tests sequentially.
- zig build test -Day=XX - Runs the tests of the specified day only.
You can also pass the optimisation mode that you want to any of the commands above with the -Doptimize flag.
Under the hood, build.zig compiles src/run.zig when you call zig build run, and src/bench.zig when you call zig build bench. These files are templates that import the solution for a specific day from src/days/dayXX.zig. For example, here’s what src/run.zig looks like:
const std = @import("std");
const puzzle = @import("day"); // Injected by build.zig
pub fn main() !void {
var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
defer arena.deinit();
const allocator = arena.allocator();
std.debug.print("{s}\n", .{puzzle.title});
_ = try puzzle.run(allocator, true);
std.debug.print("\n", .{});
}
The day module imported is an anonymous import dynamically injected by build.zig during compilation. This allows a single run.zig or bench.zig to be reused for all solutions. This avoids repeating boilerplate code in the solution files. Here’s a simplified version of my build.zig file that shows how this works:
const std = @import("std");
pub fn build(b: *std.Build) void {
const target = b.standardTargetOptions(.{});
const optimize = b.standardOptimizeOption(.{});
const run_all = b.step("run", "Run all days");
const day_option = b.option(usize, "ay", ""); // The `-Day` option
// Generate build targets for all 25 days.
for (1..26) |day| {
const day_zig_file = b.path(b.fmt("src/days/day{d:0>2}.zig", .{day}));
// Create an executable for running this specific day.
const run_exe = b.addExecutable(.{
.name = b.fmt("run-day{d:0>2}", .{day}),
.root_source_file = b.path("src/run.zig"),
.target = target,
.optimize = optimize,
});
// Inject the day-specific solution file as the anonymous module `day`.
run_exe.root_module.addAnonymousImport("day", .{ .root_source_file = day_zig_file });
// Install the executable so it can be run.
b.installArtifact(run_exe);
// ...
}
}
My actual build.zig has some extra code that builds the binaries for all optimisation modes.
This setup is pretty barebones. I've seen other templates do cool things like scaffold files, download puzzle inputs, and even submit answers automatically. Since I wrote my build.zig after the event ended, I didn't get to use it while solving the puzzles. I might add these features if I decide to do Advent of Code again this year with Zig.
While there are no rules to Advent of Code itself, to make things a little more interesting, I set a few constraints and rules for myself, such as embedding all puzzle inputs at compile time with @embedFile. Most of these constraints are designed to push me to write clearer, more performant code. I also wanted my code to look like it was taken straight from TigerBeetle's codebase (minus the assertions). [3] Lastly, I just thought it would make the experience more fun.
From all of the puzzles, here are my top 3 favourites:
Honourable mention:
During the event, I learned a lot about Zig and performance, and also developed some personal coding conventions. Some of these are Zig-specific, but most are universal and can be applied across languages. This section covers general programming and Zig patterns I found useful. The next section will focus on performance-related tips.
Zig’s flagship feature, comptime, is surprisingly useful. I knew Zig uses it for generics and that people do clever metaprogramming with it, but I didn’t expect to be using it so often myself.
My main use for comptime was to generate puzzle-specific types. All my solution files follow the same structure, with a DayXX function that takes some parameters (usually the input length) and returns a puzzle-specific type, e.g.:
fn Day01(comptime length: usize) type {
return struct {
const Self = @This();
left: [length]u32 = undefined,
right: [length]u32 = undefined,
fn init(input: []const u8) !Self {}
// ...
};
}
This lets me instantiate the type with a size that matches my input:
// Here, `Day01` is called with the size of my actual input.
pub fn run(_: std.mem.Allocator, is_run: bool) ![3]u64 {
// ...
const input = @embedFile("./data/day01.txt");
var puzzle = try Day01(1000).init(input);
// ...
}
// Here, `Day01` is called with the size of my test input.
test "day 01 part 1 sample 1" {
var puzzle = try Day01(6).init(sample_input);
// ...
}
This allows me to reuse logic across different inputs while still hardcoding the array sizes. Without comptime, I have to either create a separate function for all my different inputs or dynamically allocate memory because I can’t hardcode the array size.
I also used comptime to shift some computation to compile-time to reduce runtime overhead. For example, on day 4, I needed a function to check whether a string matches either "XMAS" or its reverse, "SAMX". A pretty simple function that you can write as a one-liner in Python:
def matches(pattern, target):
return target == pattern or target == pattern[::-1]
Typically, a function like this requires some dynamic allocation to create the reversed string, since the length of the string is only known at runtime. [4] For this puzzle, since the words to reverse are known at compile-time, we can do something like this:
fn matches(comptime word: []const u8, slice: []const u8) bool {
var reversed: [word.len]u8 = undefined;
@memcpy(&reversed, word);
std.mem.reverse(u8, &reversed);
return std.mem.eql(u8, word, slice) or std.mem.eql(u8, &reversed, slice);
}
This creates a separate function for each word I want to reverse. [5] Each function has an array with the same size as the word to reverse. This removes the need for dynamic allocation and makes the code run faster. As a bonus, Zig also warns you when this word isn't compile-time known, so you get an immediate error if you pass in a runtime value.
A common pattern in C is to return special sentinel values to denote missing values or errors, e.g. -1, 0, or NULL. In fact, I did this on day 13 of the challenge:
// We won't ever get 0 as a result, so we use it as a sentinel error value.
fn count_tokens(a: [2]u8, b: [2]u8, p: [2]i64) u64 {
const numerator = @abs(p[0] * b[1] - p[1] * b[0]);
const denominator = @abs(@as(i32, a[0]) * b[1] - @as(i32, a[1]) * b[0]);
return if (numerator % denominator != 0) 0 else numerator / denominator;
}
// Then in the caller, skip if the return value is 0.
if (count_tokens(a, b, p) == 0) continue;
This works, but it’s easy to forget to check for those values, or worse, to accidentally treat them as valid results. Zig improves on this with optional types. If a function might not return a value, you can return ?T instead of T. This also forces the caller to handle the null case. Unlike C, null isn’t a pointer but a more general concept. Zig treats null as the absence of a value for any type, just like Rust’s Option<T>.
The count_tokens function can be refactored to:
// Return null instead if there's no valid result.
fn count_tokens(a: [2]u8, b: [2]u8, p: [2]i64) ?u64 {
const numerator = @abs(p[0] * b[1] - p[1] * b[0]);
const denominator = @abs(@as(i32, a[0]) * b[1] - @as(i32, a[1]) * b[0]);
return if (numerator % denominator != 0) null else numerator / denominator;
}
// The caller is now forced to handle the null case.
if (count_tokens(a, b, p)) |n_tokens| {
// logic only runs when n_tokens is not null.
}
Zig also has a concept of error unions, where a function can return either a value or an error. In Rust, this is Result<T, E>. You could also use error unions instead of optionals for count_tokens; Zig doesn't force a single approach. I come from Clojure, where returning nil for an error or missing value is common.
This year had a lot of 2D grid puzzles (arguably too many). A common feature of grid-based algorithms is the out-of-bounds check. Here's what it usually looks like:
fn dfs(map: [][]u8, position: [2]i8) u32 {
const x, const y = position;
// Bounds check here.
if (x < 0 or y < 0 or x >= map.len or y >= map[0].len) return 0;
if (map[x][y] == .visited) return 0;
map[x][y] = .visited;
var result: u32 = 1;
for (directions) |direction| {
result += dfs(map, position + direction);
}
return result;
}
This is a typical recursive DFS function. After doing a lot of this, I discovered a nice trick that not only improves code readability, but also its performance. The trick here is to pad the grid with sentinel characters that mark out-of-bounds areas, i.e. add a border to the grid.
Here’s an example from day 6:
Original map: With borders added:
************
....#..... *....#.....*
.........# *.........#*
.......... *..........*
..#....... *..#.......*
.......#.. -> *.......#..*
.......... *..........*
.#..^..... *.#..^.....*
........#. *........#.*
#......... *#.........*
......#... *......#...*
************
You can use any value for the border, as long as it doesn’t conflict with valid values in the grid. With the border in place, the bounds check becomes a simple equality comparison:
const border = '*';
fn dfs(map: [][]u8, position: [2]i8) u32 {
const x, const y = position;
if (map[x][y] == border) { // We are out of bounds
return 0;
}
// ...
}
This is much more readable than the previous code. Plus, it’s also faster since we’re only doing one equality check instead of four range checks.
That said, this isn't a one-size-fits-all solution. It only works for algorithms that traverse the grid one step at a time. If your logic jumps multiple tiles, it can still go out of bounds (unless you widen the border to account for this). This approach also uses a bit more memory than the regular approach, as you have to store more characters.
This could also go in the performance section, but I’m including it here because the biggest benefit I get from using SIMD in Zig is the improved code readability. Because Zig has first-class support for vector types, you can write elegant and readable code that also happens to be faster.
If you're not familiar with vectors, they are a special collection type used for Single Instruction, Multiple Data (SIMD) operations. SIMD allows you to perform computation on multiple values in parallel using only a single CPU instruction, which often leads to some performance boosts. [6]
I mostly use vectors to represent positions and directions, e.g. for traversing a grid. Instead of writing code like this:
next_position = .{ position[0] + direction[0], position[1] + direction[1] };
You can represent position and direction as 2-element vectors and write code like this:
next_position = position + direction;
This is much nicer than the previous version!
Day 25 is another good example of a problem that can be solved elegantly using vectors:
var result: u64 = 0;
for (self.locks.items) |lock| { // lock is a vector
for (self.keys.items) |key| { // key is also a vector
const fitted = lock + key > @as(@Vector(5, u8), @splat(5));
const is_overlap = @reduce(.Or, fitted);
result += @intFromBool(!is_overlap);
}
}
Expressing the logic as vector operations makes the code cleaner since you don’t have to write loops and conditionals as you typically would in a traditional approach.
The tips below are general performance techniques that often help, but like most things in software engineering, “it depends”. These might work 80% of the time, but performance is often highly context-specific. You should benchmark your code instead of blindly following what other people say.
This section would've been more fun with concrete examples, step-by-step optimisations, and benchmarks, but that would've made the post way too long. Hopefully, I'll get to write something like that in the future. [7]
Whenever possible, prefer static allocation. Static allocation is cheaper since it just involves moving the stack pointer vs dynamic allocation which has more overhead from the allocator machinery. That said, it’s not always the right choice since it has some limitations, e.g. stack size is limited, memory size must be compile-time known, its lifetime is tied to the current stack frame, etc.
If you need to do dynamic allocations, try to reduce the number of times you call the allocator. The number of allocations you do matters more than the amount of memory you allocate. More allocations mean more bookkeeping, synchronisation, and sometimes syscalls.
A simple but effective way to reduce allocations is to reuse buffers, whether they’re statically or dynamically allocated. Here’s an example from day 10. For each trail head, we want to create a set of trail ends reachable from it. The naive approach is to allocate a new set every iteration:
for (self.trail_heads.items) |trail_head| {
var trail_ends = std.AutoHashMap([2]u8, void).init(self.allocator);
defer trail_ends.deinit();
// Set building logic...
}
What you can do instead is to allocate the set once before the loop. Then, each iteration, you reuse the set by emptying it without freeing the memory. For Zig’s std.AutoHashMap, this can be done using the clearRetainingCapacity method:
var trail_ends = std.AutoHashMap([2]u8, void).init(self.allocator);
defer trail_ends.deinit();
for (self.trail_heads.items) |trail_head| {
trail_ends.clearRetainingCapacity();
// Set building logic...
}
If you use static arrays, you can also just overwrite existing data instead of clearing it.
A step up from this is to reuse multiple buffers. The simplest form of this is to reuse two buffers, i.e. double buffering. Here’s an example from day 11:
// Initialise two hash maps that we'll alternate between.
var frequencies: [2]std.AutoHashMap(u64, u64) = undefined;
for (0..2) |i| frequencies[i] = std.AutoHashMap(u64, u64).init(self.allocator);
defer for (0..2) |i| frequencies[i].deinit();
var id: usize = 0;
for (self.stones) |stone| try frequencies[id].put(stone, 1);
for (0..n_blinks) |_| {
var old_frequencies = &frequencies[id % 2];
var new_frequencies = &frequencies[(id + 1) % 2];
id += 1;
defer old_frequencies.clearRetainingCapacity();
// Do stuff with both maps...
}
Here we have two maps to count the frequencies of stones across iterations. Each iteration will build up new_frequencies with the values from old_frequencies. Doing this reduces the number of allocations to just 2 (the number of buffers). The tradeoff here is that it makes the code slightly more complex.
A common performance tip is to have "mechanical sympathy": understand how your code is processed by your computer. An example of this is structuring your data so it works better with your CPU, e.g. keeping related data close in memory to take advantage of cache locality.
Reducing the size of your data helps with this. Smaller data means more of it can fit in cache. One way to shrink your data is through bit packing. This depends heavily on your specific data, so you’ll need to use your judgement to tell whether this would work for you. I’ll just share some examples that worked for me.
The first example is in day 6 part two, where you have to detect a loop, which happens when you revisit a tile from the same direction as before. To track this, you could use a map or a set to store the tiles and visited directions. A more efficient option is to store this direction metadata in the tile itself.
There are only four tile types, which means you only need two bits to represent the tile types as an enum. If the enum size is one byte, here’s what the tiles look like in memory:
.obstacle -> 00000000
.path -> 00000001
.visited -> 00000010
.exit -> 00000011
As you can see, the upper six bits are unused. We can store the direction metadata in the upper four bits. One bit for each direction. If a bit is set, it means that we’ve already visited the tile in this direction. Here’s an illustration of the memory layout:
direction metadata tile type
┌─────┴─────┐ ┌─────┴─────┐
┌────────┬─┴─┬───┬───┬─┴─┬─┴─┬───┬───┬─┴─┐
│ Tile: │ 1 │ 0 │ 0 │ 0 │ 0 │ 0 │ 1 │ 0 │
└────────┴─┬─┴─┬─┴─┬─┴─┬─┴───┴───┴───┴───┘
up bit ─┘ │ │ └─ left bit
right bit ─┘ down bit
If your language supports struct packing, you can express this layout directly: [8]
const Tile = packed struct(u8) {
const TileType = enum(u4) { obstacle, path, visited, exit };
up: u1 = 0,
right: u1 = 0,
down: u1 = 0,
left: u1 = 0,
tile: TileType,
// ...
}
Doing this avoids extra allocations and improves cache locality. Since the directions metadata is colocated with the tile type, all of them can fit together in cache. Accessing the directions just requires some bitwise operations instead of having to fetch them from another region of memory.
Another way to do this is to represent your data using alternate number bases. Here's an example from day 23. Computers are represented as two-character strings made up of only lowercase letters, e.g. "bc", "xy", etc. Instead of storing this as a [2]u8 array, you can convert it into a base-26 number and store it as a u16. [9]
Here's the idea: map 'a' to 0, 'b' to 1, up to 'z' as 25. Each character in the string becomes a digit in the base-26 number. For example, "bc" ([2]u8{ 'b', 'c' }) becomes the base-10 number 28 (1 × 26 + 2 = 28), i.e. the base-26 digits 1 and 2 ('b' = 1, 'c' = 2).
While they take the same amount of space (2 bytes), a u16 has some benefits over a [2]u8:
I won’t explain branchless programming here; Algorithmica explains it way better than I can. While modern compilers are often smart enough to compile away branches, they don’t catch everything. I still recommend writing branchless code whenever it makes sense. It also has the added benefit of reducing the number of codepaths in your program.
Again, since performance is very context-dependent, I’ll just show you some patterns I use. Here’s one that comes up often:
if (is_valid_report(report)) {
result += 1;
}
Instead of the branch, cast the bool into an integer directly:
result += @intFromBool(is_valid_report(report))
Another example is from day 6 (again!). Recall that to know if a tile has been visited from a certain direction, we have to check its direction bit. Here's one way to do it:
fn has_visited(tile: Tile, direction: Direction) bool {
    switch (direction) {
        .up => return tile.up == 1,
        .right => return tile.right == 1,
        .down => return tile.down == 1,
        .left => return tile.left == 1,
    }
}
This works, but it introduces a few branches. We can make it branchless using bitwise operations:
fn has_visited(tile: Tile, direction: Direction) bool {
const int_tile = std.mem.nativeToBig(u8, @bitCast(tile));
const mask = direction.mask();
const bits = int_tile & 0xff; // Get only the direction bits
return bits & mask == mask;
}
While this is arguably cryptic and less readable, it does perform better than the switch version.
The final performance tip is to prefer iterative code over recursion. Recursive functions bring the overhead of allocating stack frames. While recursive code is more elegant, it’s also often slower unless your language’s compiler can optimise it away, e.g. via tail-call optimisation. As far as I know, Zig doesn’t have this, though I might be wrong.
Recursion also has the risk of causing a stack overflow if the execution isn’t bounded. This is why code that is mission- or safety-critical avoids recursion entirely. It’s in TigerBeetle’s TIGERSTYLE and also NASA’s Power of Ten.
Iterative code can be harder to write in some cases, e.g. DFS maps naturally to recursion, but most of the time it is significantly faster, more predictable, and safer than the recursive alternative.
I ran benchmarks for all 25 solutions in each of Zig’s optimisation modes. You can find the full results and the benchmark script in my GitHub repository. All benchmarks were done on an Apple M3 Pro.
As expected, ReleaseFast produced the best result with a total runtime of 85.1 ms. I’m quite happy with this, considering the two constraints that limited the number of optimisations I can do to the code:
You can see the full benchmarks for ReleaseFast in the table below:
| Day | Title | Parsing (µs) | Part 1 (µs) | Part 2 (µs) | Total (µs) |
|---|---|---|---|---|---|
| 1 | Historian Hysteria | 23.5 | 15.5 | 2.8 | 41.8 |
| 2 | Red-Nosed Reports | 42.9 | 0.0 | 11.5 | 54.4 |
| 3 | Mull it Over | 0.0 | 7.2 | 16.0 | 23.2 |
| 4 | Ceres Search | 5.9 | 0.0 | 0.0 | 5.9 |
| 5 | Print Queue | 22.3 | 0.0 | 4.6 | 26.9 |
| 6 | Guard Gallivant | 14.0 | 25.2 | 24,331.5 | 24,370.7 |
| 7 | Bridge Repair | 72.6 | 321.4 | 9,620.7 | 10,014.7 |
| 8 | Resonant Collinearity | 2.7 | 3.3 | 13.4 | 19.4 |
| 9 | Disk Fragmenter | 0.8 | 12.9 | 137.9 | 151.7 |
| 10 | Hoof It | 2.2 | 29.9 | 27.8 | 59.9 |
| 11 | Plutonian Pebbles | 0.1 | 43.8 | 2,115.2 | 2,159.1 |
| 12 | Garden Groups | 6.8 | 164.4 | 249.0 | 420.3 |
| 13 | Claw Contraption | 14.7 | 0.0 | 0.0 | 14.7 |
| 14 | Restroom Redoubt | 13.7 | 0.0 | 0.0 | 13.7 |
| 15 | Warehouse Woes | 14.6 | 228.5 | 458.3 | 701.5 |
| 16 | Reindeer Maze | 12.6 | 2,480.8 | 9,010.7 | 11,504.1 |
| 17 | Chronospatial Computer | 0.1 | 0.2 | 44.5 | 44.8 |
| 18 | RAM Run | 35.6 | 15.8 | 33.8 | 85.2 |
| 19 | Linen Layout | 10.7 | 11,890.8 | 11,908.7 | 23,810.2 |
| 20 | Race Condition | 48.7 | 54.5 | 54.2 | 157.4 |
| 21 | Keypad Conundrum | 0.0 | 1.7 | 22.4 | 24.2 |
| 22 | Monkey Market | 20.7 | 0.0 | 11,227.7 | 11,248.4 |
| 23 | LAN Party | 13.6 | 22.0 | 2.5 | 38.2 |
| 24 | Crossed Wires | 5.0 | 41.3 | 14.3 | 60.7 |
| 25 | Code Chronicle | 24.9 | 0.0 | 0.0 | 24.9 |
A weird thing I found when benchmarking is that for day 6 part two, ReleaseSafe actually ran faster than ReleaseFast (13,189.0 µs vs 24,370.7 µs). Their outputs are the same, but for some reason, ReleaseSafe is faster even with the safety checks still intact.
The Zig compiler is still very much a moving target, so I don’t want to dig too deep into this, as I’m guessing this might be a bug in the compiler. This weird behaviour might just disappear after a few compiler version updates.
Looking back, I’m really glad I decided to do Advent of Code and followed through to the end. I learned a lot of things. Some are useful in my professional work, some are more like random bits of trivia. Going with Zig was a good choice too. The language is small, simple, and gets out of your way. I learned more about algorithms and concepts than the language itself.
Besides what I’ve already mentioned earlier, here are some examples of the things I learned:
Some of my self-imposed constraints and rules ended up being helpful. I can still (mostly) understand the code I wrote a few months ago. Putting all of the code in a single file made it easier to read since I don’t have to context switch to other files all the time.
However, some of them did backfire a bit, e.g. the two constraints that limit how I can optimise my code. Another one is the "hardcoding allowed" rule. I used a lot of magic numbers, which helped improve performance, but I didn't document them, so after a while, I couldn't even remember how I got them. I've since gone back and added explanations in my write-ups, but next time I'll remember to at least leave comments.
One constraint I’ll probably remove next time is the no concurrency rule. It’s the biggest contributor to the total runtime of my solutions. I don’t do a lot of concurrent programming, even though my main language at work is Go, so next time it might be a good idea to use Advent of Code to level up my concurrency skills.
I also spent way more time on these puzzles than I originally expected. I optimised and rewrote my code multiple times. I also rewrote my write-ups a few times to make them easier to read. This is by far my longest side project yet. It’s a lot of fun, but it also takes a lot of time and effort. I almost gave up on the write-ups (and this blog post) because I don’t want to explain my awful day 15 and day 16 code. I ended up taking a break for a few months before finishing it, which is why this post is published in August lol.
Just for fun, here’s a photo of some of my notebook sketches that helped me visualise my solutions. See if you can guess which days these are from:

So… would I do it again? Probably, though I’m not making any promises. If I do join this year, I’ll probably stick with Zig. I had my eyes on Zig since the start of 2024, so Advent of Code was the perfect excuse to learn it. This year, there aren’t any languages in particular that caught my eye, so I’ll just keep using Zig, especially since I have a proper setup ready.
If you haven’t tried Advent of Code, I highly recommend checking it out this year. It’s a great excuse to learn a new language, improve your problem-solving skills, or just learn something new. If you’re eager, you can also do the previous years’ puzzles as they’re still available.
One of the best aspects of Advent of Code is the community. The Advent of Code subreddit is a great place for discussion. You can ask questions and also see other people’s solutions. Some people also post really cool visualisations like this one. They also have memes!
I failed my first attempt horribly with Clojure during Advent of Code 2023. Once I reached the later half of the event, I just couldn’t solve the problems with a purely functional style. I could’ve pushed through using imperative code, but I stubbornly chose not to and gave up… ↩︎
The original constraint was that each solution must run in under one second. As it turned out, the code was faster than I expected, so I increased the difficulty. ↩︎
TigerBeetle’s code quality and engineering principles are just wonderful. ↩︎
You can implement this function without any allocation by mutating the string in place or by iterating over it twice, which is probably faster than my current implementation. I kept it as-is as a reminder of what comptime can do. ↩︎
As a bonus, I was curious as to what this looks like compiled, so I listed all the functions in this binary in GDB and found:
72: static bool day04.Day04(140).matches__anon_19741;
72: static bool day04.Day04(140).matches__anon_19750;
It does generate separate functions! ↩︎
Well, not always. The number of SIMD instructions depends on the machine’s native SIMD size. If the length of the vector exceeds it, Zig will compile it into multiple SIMD instructions. ↩︎
Here’s a nice post on optimising day 9’s solution with Rust. It’s a good read if you’re into performance engineering or Rust techniques. ↩︎
One thing about packed structs is that their layout is dependent on the system endianness. Most modern systems are little-endian, so the memory layout I showed is actually reversed. Thankfully, Zig has some useful functions to convert between endianness like std.mem.nativeToBig, which makes working with packed structs easier. ↩︎
Technically, you can store 2-digit base-26 numbers in a u10, as there are only 26² = 676 possible values. Most systems pad values to byte boundaries, so a u10 would still be stored as a u16, which is why I just went straight for it. ↩︎
Another Ultra Process gig for The Printer Jam.
Venue: Folklore, Hoxton.
Tickets here.
Image credit: Evan Raskob.
Notes
A lot of great things have their origins in the 1970s: Hip Hop was redefining music and street culture, Bruce Lee was taking martial arts to the next level, and the initial development of something called editor macros (also known as Emacs) was underway. I was born in that decade, but that's purely coincidence.
My primary development tool for the past couple of years has been that editor from the seventies. It is my tool of choice for Python, JavaScript, TypeScript, and Lisp dialects such as Clojure and Elisp. And today, as an agentic engineer, it has turned out to be a great choice for this kind of software development too. With the rise of various CLI, TUI, and desktop tools for AI development, it would be reasonable to think that this ancient code editor would become obsolete - right?
Not if you know about the innovative Emacs community. It is driven by passion, support from the community itself, and Open Source. These ingredients are usually more resilient and reliable in the long term than VC-driven startup culture. Emacs is part of the greater Lisp community, where a lot of innovation takes place. The Clojure community is cutting edge in many aspects of software development, including AI.
One thing I have noticed lately is that the more I get into Agentic Engineering, the more I use Emacs. As the focus has shifted from typing code to instructing and reviewing, I have found uses for Emacs powers I haven't really needed until now: tools like Magit (for git), and I'm also learning more about the powerful Org Mode. I didn't care that much about Markdown before, but now it is an important part of the development itself, so I configured my Emacs to have a nice-looking, simple, and readable Markdown experience.
"More Agentic Engineering, More Emacs"
With Emacs, I use a great AI tool called Eca, and with it I am not limited to any specific vendor for agentic development. Vendor lock-in is something I really want to avoid. The combination of Eca and the power tools mentioned before makes a very nice Agentic Engineering toolset. Eca is actively developed and has a lot of useful features and a very nice developer experience. It supports standards like AGENTS.md, commands, skills, hooks, and sub-agents, and uses a client-server setup in the same way as the Language Server Protocol. It is Open Source and not only for Emacs. Have a look at the website for support for your favorite editor or IDE. By the way, Eca is developed in a Lisp (Clojure).
I have shared my Eca setup on GitHub, and I have also made some contributions to the Eca plugins repository.
With this setup, human review can happen in real time instead of waiting until the end, when the amount of code is too often quite overwhelming. The human developer (that's me) can act quickly upon noticing that things are taking a different route than expected, similar to the stop-the-line principle from the Toyota Way. This is a lean way to reach the end goal quickly: deploying code that is good enough for production and adds value.
I have found that many Agile practices, in combination with developer-friendly tools, fit well with the ideas of Agentic Engineering, even though I've seen worrying signs of a return of the Waterfall movement.
To summarize: the result of my new Agentic Engineering development style is that I haven't put my IDE to the side - it's at the very center of the agentic workflow.
The Role
We're looking for a senior engineer with deep ClojureScript expertise to work directly with our CTO and leadership team on high-impact technical initiatives.
This role spans cross-team work that pushes the boundaries of what's possible: accelerating product innovation through AI-assisted development, shaping our product's future through rapid experimentation, and shipping delightful, performant software at scale.
What You’ll Do
Requirements
Nice to Haves
Cofinite sets with one sentinel key: all set operations reduce to three map primitives.
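A minimal Clojure sketch of the idea (my own reading of the teaser, not necessarily the post's encoding): represent a set as a map from element to membership, with a ::default sentinel key covering the cofinite remainder. Lookup, value-mapping, and key-merging are then enough for the usual set operations.
(defn member? [s x]
  (get s x (::default s)))
(defn complement* [s]
  (update-vals s not))
(defn union [a b]
  (into {}
        (map (fn [k] [k (or (member? a k) (member? b k))]))
        (into (set (keys a)) (keys b))))
;; everything except 1 and 2, i.e. a cofinite set:
(def not-one-or-two (complement* {::default false, 1 true, 2 true}))
(member? not-one-or-two 3) ;; => true
(member? not-one-or-two 1) ;; => false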
AI coding agents are powerful — but they're also blind. Every time Claude Code, Codex, or Gemini CLI needs to understand your codebase, they explore it file by file. Grep here, read there, grep again. For a simple question like "what calls ProcessOrder?", an agent might burn through 45,000 tokens just opening files and scanning for matches.
I built codebase-memory-mcp to fix this. It parses your codebase into a persistent knowledge graph — functions, classes, call chains, imports, HTTP routes — and exposes it through 14 MCP tools. The same question now costs ~200 tokens and answers in under 1ms.
Here's what actually happens when you ask an AI agent "trace the callers of ProcessOrder":
It greps for ProcessOrder across all files (~15,000 tokens), reads the matching files, then greps again to follow each caller up the chain. Multiply this by every question in a coding session and you're burning hundreds of thousands of tokens per hour — most of it reading files that aren't relevant.
codebase-memory-mcp runs a one-time indexing pass using tree-sitter AST parsing. It extracts every function, class, method, import, call relationship, and HTTP route into a SQLite-backed graph. After that, the graph stays fresh automatically via a background watcher that detects file changes.
You: "what calls ProcessOrder?"
Agent calls: trace_call_path(function_name="ProcessOrder", direction="inbound")
→ Returns structured call chain in ~200 tokens, <1ms
No LLM is embedded in the server. Your agent is the intelligence layer — it just gets precise structural answers instead of raw file contents.
I ran agent-vs-agent testing across 31 languages (372 questions). Five representative structural queries on a real multi-service project:
| Query Type | Knowledge Graph | File-by-File Search | Savings |
|---|---|---|---|
| Find function by pattern | ~200 tokens | ~45,000 tokens | 225x |
| Trace call chain (depth 3) | ~800 tokens | ~120,000 tokens | 150x |
| Dead code detection | ~500 tokens | ~85,000 tokens | 170x |
| List all HTTP routes | ~400 tokens | ~62,000 tokens | 155x |
| Architecture overview | ~1,500 tokens | ~100,000 tokens | 67x |
| Total | ~3,400 | ~412,000 | 121x |
That's a 99.2% reduction. The cost difference between graph queries and file exploration adds up fast over a full development session.
The stress test I'm most proud of: indexing the entire Linux kernel.
The pipeline is RAM-first: LZ4-compressed bulk read, in-memory SQLite, fused Aho-Corasick pattern matching, single dump at the end. Memory is released back to the OS after indexing completes. Average-sized repos index in milliseconds.
All 64 language grammars are vendored as C source and compiled into a single static binary. Nothing to install, nothing that breaks when tree-sitter updates a grammar upstream.
Programming languages (39): Python, Go, JavaScript, TypeScript, TSX, Rust, Java, C++, C#, C, PHP, Ruby, Kotlin, Scala, Swift, Dart, Zig, Elixir, Haskell, OCaml, Objective-C, Lua, Bash, Perl, Groovy, Erlang, R, Clojure, F#, Julia, Vim Script, Nix, Common Lisp, Elm, Fortran, CUDA, COBOL, Verilog, Emacs Lisp
Scientific (5): MATLAB, Lean 4, FORM, Magma, Wolfram
Config/markup (20): HTML, CSS, SCSS, YAML, TOML, HCL, SQL, Dockerfile, JSON, XML, Markdown, Makefile, CMake, Protobuf, GraphQL, Vue, Svelte, Meson, GLSL, INI
This matters because real-world codebases aren't monolingual. A typical project has Go backends, TypeScript frontends, SQL migrations, Dockerfiles, YAML configs, and shell scripts. One indexing pass captures all of it. We've also introduced more advanced indexing using LSP-like techniques, essentially creating an "LSP + tree-sitter" hybrid approach. It's currently only supported for Go, C, and C++, with more languages coming soon.
The full tool surface:
| Tool | What it does |
|---|---|
| search_graph | Find functions/classes by name pattern, label, degree |
| trace_call_path | Follow callers/callees at configurable depth |
| get_architecture | Languages, packages, entry points, routes, hotspots, clusters |
| detect_changes | Map git diff to affected symbols with risk classification |
| query_graph | Raw Cypher queries (MATCH (f:Function)-[:CALLS]->(g)...) |
| search_code | Full-text search across indexed source |
| get_code_snippet | Read a specific function/class by qualified name |
| get_graph_schema | Inspect available node/edge types |
| manage_adr | Architecture Decision Records that persist across sessions |
| index_repository | Trigger initial index (auto-sync handles the rest) |
| list_projects | Show all indexed repos with stats |
| delete_project | Clean up a project's graph data |
| index_status | Check indexing progress |
| ingest_traces | Import OpenTelemetry traces into the graph |
One install command auto-detects and configures all of these agents, along with advisory hooks — the hooks remind agents to check the graph before reaching for grep/glob/read, without blocking anything.
# Download (or use the one-liner: curl -fsSL https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/scripts/setup.sh | bash)
tar xzf codebase-memory-mcp-*.tar.gz
mv codebase-memory-mcp ~/.local/bin/
# Auto-configure all detected agents
codebase-memory-mcp install
# Restart your agent, then:
"Index this project"
That's it. No Docker, no API keys, no npm install, no runtime dependencies. A ~15MB static binary for macOS (arm64/amd64), Linux (arm64/amd64), or Windows (amd64).
If you download the UI variant, you get a 3D interactive graph explorer at localhost:9749:
It runs as a background thread alongside the MCP server — available whenever your agent is connected.
A lot of code intelligence tools embed an LLM for natural language → graph query translation. This means extra API keys, extra cost, and another model to configure and keep updated.
With MCP, the agent you're already talking to is the query translator. It reads tool descriptions, understands your question, and makes the right tool call. No intermediate LLM needed.
Similarly, the tool focuses on structural precision over semantic fuzziness. When an agent asks "what calls X?", it needs an exact answer — not a ranked list of "probably related" functions. The graph gives exact call chains with import-aware, type-inferred resolution.
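For instance, answering "what calls ProcessOrder?" through the query_graph tool might look like this (a sketch: the Function label and CALLS edge appear in the tool table above, but the name property is an assumption):
MATCH (caller:Function)-[:CALLS]->(f:Function {name: "ProcessOrder"})
RETURN caller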
If you're burning tokens on file-by-file exploration, give it a shot. Index your project and ask your agent a structural question — the difference is immediate.
Built with pure C, tree-sitter, and SQLite. No runtime dependencies. 780+ stars and growing. We built it for developers using coding agents, and we want it to be the most performant solution in this space. We believe it will enable more efficient coding for everyone, which in turn will translate into more good solutions, shipped faster and cheaper in token burn.
Having played both parts in the kabuki play that is employee-employer matchmaking, I feel the way we play it is a zero-sum game. I wish it were not so. When this post started life in 2024, as a wall-of-text chat message, it was brutal out there, on both sides of the software industry interview table. The ZIRP era had ended. As of 2026, post-ZIRP reality has properly set in and remains bad ("AI" is a Fig Leaf (Enterprise Edition) for structural damage the industry self-inflicted, and if you look at Hyperscaler GPU depreciation schedules, they are making it an order of magnitude worse). Set against that backdrop, here is a hopefully hopeful hiring anecdote where I think we avoided the so-called "Secretary Problem", framed within Optimal Stopping Theory. It can be done. Non-zero-sum hiring ought to be the default mode for any industry, AI or no AI.