When you let an AI agent write Clojure code, you expect it to leverage the language's superpowers—the REPL's interactivity, structural editing, format-preserving code manipulation, and the rich ecosystem of wrapper libraries. Instead, what you typically see is mediocre code written slowly, as the agent makes the same mistakes every developer learns to avoid.
I discovered this the hard way.
The Setup: Vibe Coding with Observations
While building lite-crm with Claude Code, I deliberately avoided the --dangerously-skip-permissions flag. Instead, I sat beside the agent and watched it work—observing its patterns, frustrations, and failures. What I saw was an agent trained on millions of codebases but ignorant of how Clojure practitioners actually think.
Three concrete problems emerged:
Problem 1: The Wrapper Library Blind Spot
When encountering Java interop, the agent jumps straight into direct interoperability without ever asking: "Is there a Clojure wrapper library for this?"
The result: Uglier code, harder to maintain, and a missed opportunity for idiomatic Clojure.
Problem 2: Formatting Brittleness
Code formatters like cljfmt are essential—but they create a sneaky problem. When the agent modifies source and the formatter shifts indentation by a single space, the agent's subsequent str_replace operations fail due to whitespace mismatch.
The result: I watched it fail, retry, fail again, then give up and rewrite entire files. Enormous token waste.
Problem 3: Primitive Debugging
When a test failed, the agent fell back on the crudest debugging technique: add println statements, run the test, inspect output, delete the logs, restore the code. Repeat.
This is especially wasteful in a Clojure project where I've provided direct access to the REPL via the brepl CLI. The agent could inspect values interactively, test hypotheses instantly, and trace execution without touching source code. But it never did.
The Recognition
These weren't knowledge gaps. They were behavioral gaps—places where the agent's default approach conflicted with Clojure expertise.
In the context of Clojure Stack Lite (which includes proper testing harness and real database, not mocks), the agent wasn't just writing suboptimal code—it was making design decisions based on unfamiliar tools.
I decided to address this not by teaching the agent more facts, but by redirecting its behavior.
Four Skills to Close the Gap
The result is four skills, each targeting a specific behavioral pattern that distinguishes novice agents from expert Clojure practitioners:
1. clj-debug: From Logging to REPL Inspection
The Problem: Agents default to adding println, tap>, or logging statements, then running tests to inspect output.
The Pattern: In Clojure, this is backwards. The REPL lets you pin a value with def, explore its structure instantly, test hypotheses interactively—all without modifying code.
What the skill does: When you're about to debug, clj-debug redirects from logging patterns to REPL-based inline inspection. It teaches the agent to use def, keys, keyword access, and structural exploration—the actual workflow expert Clojure developers follow.
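For instance, instead of sprinkling println calls, the session looks like this (a minimal sketch; fetch-user and its data are stand-ins for whatever you're debugging):

;; Illustrative stand-in for the code under debug:
(defn fetch-user [id]
  {:status 200 :body {:id id :name "Ada" :roles #{:admin}}})

;; Pin the value once with def, then explore it interactively.
;; No println, no test rerun, no source edits:
(def resp (fetch-user 42))

(keys resp)            ;; => (:status :body)
(:status resp)         ;; => 200
(-> resp :body keys)   ;; => (:id :name :roles)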
Behavioral change: From edit-test-inspect cycle to interactive REPL inspection. This is faster, non-invasive, and gives immediate feedback.
2. clj-discover: Systematic API Exploration
The Problem: When encountering unfamiliar Java classes or macros, agents jump to direct integration without exploring whether an idiomatic Clojure wrapper already exists.
The Pattern: Expert Clojure developers follow a deliberate workflow:
Search for a Clojure wrapper library first (usually there is one)
If not, inspect the Java class via reflection
For macros, expand them to understand what code they generate
What the skill does: clj-discover codifies this workflow, ensuring the agent prioritizes idiomatic libraries and systematic exploration before writing integration code.
Behavioral change: From direct interop to research-first integration. The result is cleaner, more maintainable code.
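Steps 2 and 3 of that workflow are ordinary REPL one-liners; a quick sketch using only core facilities:

(require '[clojure.reflect :as reflect])

;; Step 2: inspect a Java class via reflection
(->> (reflect/reflect java.time.Instant)
     :members
     (map :name)
     distinct
     sort)

;; Step 3: expand a macro to see the code it generates
(macroexpand-1 '(when (pos? 1) :yes))
;; => (if (pos? 1) (do :yes))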
3. clj-replace: From Text Matching to Structural Editing
The Problem: Code formatters shift indentation by spaces, breaking text-based str_replace. The agent then wastes tokens failing repeatedly or rewriting entire files.
The Pattern: Clojure is homoiconic—code is data. Two S-expressions are semantically equivalent even if formatted differently. Expert editors handle this automatically via structural editing.
What the skill does: clj-replace compares code by structure (S-expression equivalence) rather than text, ignoring whitespace while preserving the original file's formatting style. It uses the rewrite-clj library to parse, match, and replace nodes safely.
Behavioral change: From brittle text matching to robust structural matching. Formatting variations become irrelevant.
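Here is the idea in miniature, using rewrite-clj (the library the skill builds on):

(require '[rewrite-clj.zip :as z])

;; Two differently formatted snippets are the same S-expression:
(= (z/sexpr (z/of-string "(defn f [x]\n  (inc x))"))
   (z/sexpr (z/of-string "(defn f [x] (inc x))")))
;; => true

;; Replace a node while leaving the file's formatting intact:
(-> (z/of-string "(defn f [x]\n  (inc x))")
    (z/find-value z/next 'inc)
    (z/replace 'dec)
    z/root-string)
;; => "(defn f [x]\n  (dec x))"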
4. clj-refactor: Mechanism/Policy Separation
The Problem: Without guidance, agents write tangled code where reusable mechanisms are mixed with business policy, creating inflexible designs that accumulate technical debt.
The Pattern: Arne Brasseur's mechanism/policy separation principle is core to building maintainable Clojure systems. Mechanism is context-free, stable, and reusable. Policy is opinionated, domain-specific, and volatile. Expert developers keep these separate.
What the skill does: clj-refactor scans code for opportunities to extract mechanisms from policy—functions where hard-coded values or implicit context can be made explicit, dependencies can be pushed to parameters, and reusable logic can be isolated.
Behavioral change: From monolithic functions to extracted, composable mechanisms. Code becomes easier to test, reuse, and reason about.
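In code, the extraction looks roughly like this (a hand-written illustration, not the skill's actual output):

;; Before: policy baked into the mechanism (the VAT rate is hard-coded)
(defn price-with-tax [amount]
  (Math/round (* amount 1.21)))

;; After: a context-free, reusable mechanism...
(defn apply-rate [amount rate]
  (* amount rate))

;; ...with the policy stated explicitly at the edge
(def vat-rate 1.21)
(Math/round (apply-rate 100 vat-rate))  ;; => 121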
Note: Unlike clj-debug, clj-discover, and clj-replace—which activate automatically when the agent encounters problems—clj-refactor is user-initiated. You invoke it when you want the agent to analyze code for refactoring opportunities, not in response to a failure.
Why This Matters
These aren't reference manuals or API documentation. They're workflow redirects—rules that teach AI agents to think like expert Clojure developers instead of generic code writers.
The underlying philosophy is simple: A skill's value is measured by behavioral change, not knowledge transfer.
When an agent uses clj-debug, it stops adding logging. When it uses clj-discover, it checks for idiomatic wrappers before raw interop. When it uses clj-replace, formatting becomes irrelevant. When you invoke clj-refactor, the agent identifies tangled mechanisms and suggests extraction. Each skill shifts the agent's default patterns closer to expert practice.
This matters because Clojure is a language of leverage. The REPL, immutability, homoiconicity, and the functional approach all reward practitioners who use them correctly. An agent that doesn't leverage these features isn't just writing slow code—it's missing the point of the language.
The goal is simple: your AI agent shouldn't just write Clojure code—it should think like a Clojure developer. These four skills make that possible.
I keep seeing people share vibe-coded apps built on TypeScript/React + Supabase — seemingly the default recommendation from Lovable or Cursor. As a Clojure programmer, I can't stay quiet about this. In an era where AI agents are deeply embedded in the development workflow, that choice carries structural hidden costs that almost nobody is talking about.
Context Window Is the Bottleneck, and Framework Design Determines Burn Rate
LongCodeBench research shows that Claude 3.5 Sonnet's accuracy on bug-fixing tasks drops from 29% to 3% as context grows from 32K to 256K tokens. Chroma tested 18 frontier models and found the same pattern across all of them.
Coding agents accelerate this degradation: every tool call, every file read, every error message accumulates in the context. A 30-step agent session can consume more than ten times the context of a single conversation turn.
Countless efforts are already underway to manage context from the harness-design side — but the tech stack itself has an enormous impact on context efficiency that rarely gets discussed.
Task-Relevant Subgraph
An AI agent completing a task doesn't need to read the entire codebase — only the files relevant to that task. Call this set the task-relevant subgraph. The size of the subgraph is determined by the architectural design of the framework, not by the model.
The problem with TypeScript + React + Supabase is that a single feature naturally spans multiple layers — component, hook, state, API client, type definition — each living in a different file. The subgraph starts large and only grows as shared dependencies accumulate.
AI tends to recommend the stack it was trained on the most, but "easy to generate" is not the same as "efficient for long-term AI-assisted development." These are two different things.
What Makes a Stack More Agent-Ready
My current go-to is Clojure Stack Lite, and several of its design choices structurally shrink the task-relevant subgraph.
HTMX eliminates implicit client state. React state is scattered across multiple interdependent files; to verify behavior, an agent has to simulate browser interactions. HTMX is driven by server responses, so an agent can verify with a plain curl — the response is an HTML fragment, right or wrong, no ambiguity.
HoneySQL eliminates implicit lazy loading. When an ORM produces an N+1 problem, the debug subgraph includes model definitions, association configs, and migration files, because the issue is buried in implicit behavior. HoneySQL expresses queries as SQL-as-data — no lazy loading, no association magic. N+1 can't happen silently, because the syntax simply doesn't allow it to sneak in. The debug subgraph shrinks from five files to one.
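To make "SQL-as-data" concrete, here is honey.sql on the widget example (a minimal sketch):

(require '[honey.sql :as sql])

(sql/format {:select [:sku :weight]
             :from   [:widget]
             :where  [:> :weight 10]})
;; => ["SELECT sku, weight FROM widget WHERE weight > ?" 10]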
Blocking IO eliminates implicit error paths. The fundamental problem with async isn't the syntax — it's that error paths are implicit. Every async call site is a potential break point where an exception can detach from the main flow. To locate a root cause, an agent must trace the entire call chain, and context width grows linearly with chain length. Clojure's blocking IO has no async boundaries; exceptions follow a single path — propagate upward, handled uniformly in middleware. When debugging, an agent only needs two places: the middleware log and the call site the log points to. Context scope stays fixed regardless of system size.
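In Ring terms, that "single path" is one middleware; a minimal sketch:

(defn wrap-errors [handler]
  (fn [request]
    (try
      (handler request)
      (catch Exception e
        ;; every exception from every handler surfaces here, exactly once
        (println "unhandled:" (ex-message e))
        {:status 500 :body "internal error"}))))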
Explicit Over Implicit Is Not Just a Clojure Virtue
All three points share a common structure: the less implicit behavior, the smaller the context an agent needs to bring in.
The point here isn't a framework or language comparison — it's an observation about design philosophy. Explicit over implicit is a virtue for human developers; for AI agents, it's a structural guarantee that they won't go dumb prematurely.
Design principles the Clojure community has championed for years happen to be a competitive advantage in the AI agent era. I've chosen to frame this in terms of context efficiency, hoping it helps more people appreciate what the Clojure community figured out a long time ago.
One of the first programs I ever ran on a computer was a text adventure game, also known as Interactive Fiction. I think the first one I played was Adventureland by Scott Adams, which was based on the first ever text adventure, Adventure, by Crowther and Woods. Adventureland was the first text adventure available for personal computers.
Not long after that I discovered Zork I, the first game by Infocom. I loved the Infocom games; I played most of them, spending many hours solving them.
Open psql. Connect. Run a query. Switch branches. Run it again — same connection, same wire protocol, different version of the database.
$ psql postgresql://localhost:5432/inventory
inventory=> SELECT count(*) FROM widget;
 count
-------
  4218
inventory=> SET datahike.branch = 'pricing-experiment';
SET
inventory=> SELECT count(*) FROM widget;
 count
-------
  4221
inventory=> RESET datahike.branch;
SET
That’s not a feature toggle on a Postgres replica. It’s the same database — addressed through standard pgwire — viewed through two different commits. The implementation is pg-datahike, a beta we’re shipping today.
What it is
pg-datahike embeds a PostgreSQL-compatible adapter inside a Datahike process: wire protocol, SQL translator, virtual pg_* and information_schema catalogs, constraint enforcement, schema hints. Clients that speak Postgres talk to Datahike without a Postgres install — pgjdbc, Hibernate, SQLAlchemy, Odoo 19, and Metabase bootstrap unmodified against it. The migration path is round-trippable: pg_dump output replays into pg-datahike via psql, and the standalone jar dumps Datahike databases back out as portable PG SQL. Detailed test results at the end of this post.
A 60-second tour
The operator runs one jar. Everything else is psql.
$ java -jar pg-datahike-VERSION-standalone.jar
pg-datahike VERSION ready on 127.0.0.1:5432
  backend: file (~/.local/share/pg-datahike)
  history: off
  CREATE DATABASE: enabled
  databases: ["datahike"]
Connect with: psql -h 127.0.0.1 -p 5432 -U datahike datahike
Press Ctrl+C to stop.
JDK 17+ is the only prerequisite; the jar is on GitHub releases. --memory for an ephemeral run; --help covers the rest.
The rest is psql — provision a fresh database, populate it, pin a session to a historical commit, drop it.
$ psql postgresql://localhost:5432/datahike
datahike=> CREATE DATABASE inventory;
CREATE DATABASE
datahike=> \c inventory
You are now connected to database "inventory".
inventory=> CREATE TABLE widget (sku TEXT PRIMARY KEY, weight INT);
CREATE TABLE
inventory=> INSERT INTO widget VALUES ('A', 10), ('B', 20);
INSERT 0 2
inventory=> SELECT datahike.commit_id();
              commit_id
---------------------------------------
 b4f2e1c0-2feb-5b61-be14-5590b9e01e48   ← copy this
inventory=> INSERT INTO widget VALUES ('C', 30);
INSERT 0 1
inventory=> SELECT count(*) FROM widget;
 count
-------
     3
inventory=> SET datahike.commit_id = 'b4f2e1c0-2feb-5b61-be14-5590b9e01e48';
SET
inventory=> SELECT count(*) FROM widget;  -- the database before the third insert
 count
-------
     2
inventory=> RESET datahike.commit_id;
SET
inventory=> \c datahike
datahike=> DROP DATABASE inventory;
DROP DATABASE
SET datahike.commit_id pins the session to a historical commit; everything else is plain Postgres. Sixty seconds, one jar, no Postgres install, no Clojure.
Architecture in one minute
What happens when you SET datahike.branch = 'feature'?
Datahike stores its database as a tree of immutable nodes in konserve, a key-value abstraction over filesystems, S3, JDBC, IndexedDB, and others. Every transaction writes new nodes for changed paths and shares unchanged subtrees with the previous version — the trick behind Clojure’s persistent vectors and Git’s object store. A commit is a small map listing the root pointers for each index; a branch is a named pointer at a commit.
So on SET datahike.branch = 'feature', the handler updates a session variable, and the next query loads that branch’s commit pointer from konserve, walks the tree, returns rows. No coordination with a transactor; storage is the source of truth. SET datahike.commit_id = '<uuid>' works the same way one level deeper — the session points at a specific commit instead of a branch head.
Two consequences worth flagging:
Branching is one konserve write. Creating a branch from any commit is constant time, regardless of database size, because structural sharing means the new branch points at existing nodes.
Reads don’t go through a transactor. Every node is content-addressable; any process that can read the storage can run queries against it. In principle, read fanout is bounded by storage bandwidth, not replica capacity — we’ll publish numbers in a follow-up. See Memory That Collaborates for more.
Integration patterns
1. Multi-database server
A single start-server call serves many Datahike connections. Clients route on the JDBC URL’s database name:
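A rough sketch of the embedded entry point (start-server is the call named above; the option keys shown here are assumptions, not the documented API, so check the repo for the real signature):

(require '[datahike.pg.server :as pg])

;; Assumed shape: one server, several named Datahike stores
(def server
  (pg/start-server {:port 5432
                    :databases {"inventory" inventory-config
                                "analytics" analytics-config}}))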
SELECT current_database() returns the connected name; pg_database enumerates the registry. Useful for multi-tenant deployments, or when ops wants one pgwire endpoint serving many independent stores.
2. Schema hints
Existing Datahike schemas don’t always look the way you’d want them to over SQL. :datahike.pg/* meta-attributes customize the SQL view without touching the underlying schema:
(pg/set-hint! conn :person/full_name {:column "name"}) ; rename the column
(pg/set-hint! conn :person/ssn {:hidden true}) ; exclude from SQL
(pg/set-hint! conn :person/company {:references :company/id}) ; FK target
After set-hint!, SELECT name FROM person works, ssn is invisible to SELECT * and information_schema.columns, and JOIN company c ON p.company = c.id resolves on Datahike’s native ref semantics.
3. Time-travel via SET
Datahike’s temporal primitives are exposed as session variables. The client doesn’t need to know what as-of means — it just sets a variable:
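For example, mirroring the branch and commit examples elsewhere in this post:

SET datahike.as_of = '2024-01-15T00:00:00Z';
SELECT count(*) FROM widget;  -- evaluated against the database as of that instant
RESET datahike.as_of;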
Every subsequent query in the session sees the chosen view. A reporting tool that doesn’t know about Datahike can produce point-in-time reports by setting one variable.
4. Git-like branching
Branching is cheap in Datahike: every transaction produces a new immutable commit, so a branch is just a named pointer at a commit UUID. Creation is O(1) — one konserve write, no data copy, no WAL replay. pgwire exposes the read side and the admin operations through standard PG mechanisms:
-- Introspect
SELECT datahike.branches();
SELECT datahike.current_branch();
SELECT datahike.commit_id();

-- Admin (konserve-level writes — they don't go through the tx writer)
SELECT datahike.create_branch('preview', 'db');  -- 'db' is Datahike's default branch name
SELECT datahike.create_branch('from-cid', '69ea6ee1-…');
SELECT datahike.delete_branch('preview');

-- Session view: three cuts on the same immutable log.
-- They compose — a feature branch's state as of yesterday is two SETs.
SET datahike.branch = 'feature';
SET datahike.commit_id = '69ea6ee1-2feb-5b61-be14-5590b9e01e48';
SET datahike.as_of = '2024-01-15T00:00:00Z';
Or pin a branch at connect time via the JDBC URL:
jdbc:postgresql://localhost:5432/prod:feature → prod-conn, pinned to :feature
jdbc:postgresql://localhost:5432/prod → prod-conn, default branch
SET datahike.commit_id = '<uuid>' is Datahike-unique: no other PG-compatible database lets a session pin to an exact commit identifier.
We’ll cover the structural-sharing model that makes branching this cheap in a follow-up post — including how it works across all the Datahike bindings, not just pgwire.
5. SQL-driven database provisioning
Set a :database-template on the server and pgwire clients self-provision and tear down databases over plain SQL. The template is a partial Datahike config; each CREATE DATABASE produces a fresh store with a generated UUID:
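A sketch of the embedded side (:database-template is the real option per the text; the store keys are illustrative):

(pg/start-server {:port 5432
                  :database-template {:store {:backend :file
                                              :path "/var/lib/pg-datahike"}}})

After that, plain SQL provisions and tears down stores:

CREATE DATABASE inventory;  -- fresh store, generated UUID
DROP DATABASE inventory;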
Postgres-only clauses are silently accepted with a NOTICE so pg_dump round-trips work.
The standalone jar enables this by default (use --no-create-database to disable). Embedded servers opt in via :database-template (or explicit :on-create-database / :on-delete-database hooks). Without one, CREATE / DROP DATABASE return SQLSTATE 0A000 feature_not_supported; mismatched preconditions return the standard PG SQLSTATEs.
Migrating from PostgreSQL
Wire compatibility extends to pg_dump SQL on both sides. Three workflows.
Real PostgreSQL → pg-datahike
pg_dump output replays straight into pg-datahike via psql or any JDBC client. Schema-side coverage: CREATE TABLE with FK constraints, CREATE SEQUENCE, DEFAULT nextval(…), CREATE TYPE … AS ENUM, CREATE DOMAIN, partitioned tables. Data-side: INSERT (single + multi-VALUES) and COPY … FROM stdin (text and CSV).
Run with the :pg-dump compat preset to silently accept constructs pg-datahike doesn’t model — triggers, functions, materialized views, ALTER OWNER:
Validated end-to-end against Chinook (15.6k rows, 11 tables, FKs, NUMERIC, TIMESTAMP) — full byte-identical bidirectional roundtrip — and Pagila (50k rows, 22 tables, ENUM, DOMAIN, partitioning, triggers, functions) — schema parses end-to-end, data loads.
pg-datahike → portable PG SQL
The standalone jar’s dump subcommand walks a Datahike database and emits pg_dump-shaped SQL. The output replays into either pg-datahike or real PostgreSQL via psql:
java -jar pg-datahike.jar dump --data-dir DIR --db NAME --out out.sql
java -jar pg-datahike.jar dump --config datahike-config.edn --copy
Flags cover INSERT-vs-COPY output, schema-only / data-only, and table exclusion. --config accepts a full Datahike config EDN, so any konserve backend works; store-id is auto-discovered.
What the resulting Datahike schema looks like
A native Datahike database — created with d/transact, never touched by SQL — also dumps as clean PG SQL. The inverse mapping is well-defined:
:db.unique/identity → PRIMARY KEY NOT NULL
:db.unique/value → UNIQUE
:db.cardinality/many T → T[] with PG array literals
:db.type/ref → bigint (the entity id; opt in to FK constraints with set-hint! :references)
So whether you start from a real PostgreSQL dump or from native Datahike, both sides translate cleanly through the same shape. The resulting schema is correct and queryable as both SQL relations and Datalog datoms. It isn’t always what you’d hand-design for entity-shaped Datalog queries — many apps stay with the relational shape, others evolve incrementally as they reach for Datalog’s strengths (pull patterns, rules, multi-source joins).
What it isn’t
This is a 0.1 beta and we want to be specific about the gaps:
PL/pgSQL, stored functions, triggers, rules, and materialized views are accepted under the :pg-dump compat preset (loaded but not executed); strict mode rejects them
No LISTEN / NOTIFY
No COPY … TO STDOUT (COPY … FROM stdin is supported in text and CSV formats)
FK ON DELETE enforced for NO ACTION / RESTRICT / CASCADE; SET NULL / SET DEFAULT and any ON UPDATE action are rejected at DDL
Single public schema — CREATE SCHEMA is silently accepted but a no-op
Cursor materialization is eager (entire result set held in memory)
No deferrable constraints
Generated columns parse but aren’t enforced
Writes always land on the connection’s default branch in 0.1, even when SET datahike.branch is active. Reads respect the pinned branch; writes don’t yet. Use datahike.versioning/branch! and merge! from Clojure for branch-targeted writes, or open a second connection on /<db>:<branch>.
Constraint enforcement is one-directional. SQL constraints declared via DDL (NOT NULL, CHECK, UNIQUE, FK RESTRICT) are enforced by the pgwire handler; direct (d/transact) writes from Clojure bypass them because Datahike’s schema doesn’t yet carry the constraint vocabulary. A future release will lift enforcement into the tx layer so both paths are gated.
Bulk-insert throughput is ~5,000 rows/sec on JDBC batch (Pagila replays in ~12s, Chinook in ~3s) — Datahike maintains EAVT/AEVT/AVET live, so a 10-column row costs ~10× a single index write. Tuned bulk paths in vanilla PG (COPY, pg_restore -j) are an order of magnitude faster, partly via deferred index construction; an analogous bulk-load fast path is a future item. Large migrations are overnight-cutover territory today.
The conformance posture is: pass for the workloads we’ve measured against, fail fast and loud everywhere else. We’d rather reject a stored procedure than execute it incorrectly.
Where this fits
If you’ve used Neon or Xata, the goal will look familiar — branchable Postgres. The mechanism is different. Their branches are control-plane operations: call the API, get a new compute instance over copy-on-write storage. pg-datahike’s branches are session-level — SET datahike.branch = 'feature' inside an open psql connection switches what you’re reading. No provisioning, no compute. An agent or a query planner can switch branches mid-session.
Commit pinning — SET datahike.commit_id = '<uuid>' — is the part where we don’t know of a peer. Neon’s time-travel is bounded by a 6h–1d restore window; pg-datahike pins to any historical commit, indefinitely. We have not seen another PG-compatible database expose this directly through the wire protocol.
Dolt is the closest in spirit — git-like semantics, commit pinning, time-travel — but Dolt is MySQL with a custom storage engine. pg-datahike rides on the standard Postgres wire protocol; every PG client works without modification.
The honest tradeoff: we are a compatibility layer over Datahike’s storage, not a fork of Postgres. Some features tied to the Postgres codebase — PL/pgSQL, the extension ecosystem, procedural languages — aren’t on our roadmap today. If you need those, use Postgres. If your bottleneck is versioning, branching, or reproducibility, this gets you there without leaving the wire protocol your tools already speak.
Datahike has been a Datalog database with a Clojure API and growing language bindings; pg-datahike isn’t a separate database, just another front end on the same store. There’s a sibling: Stratum, a SIMD-accelerated columnar engine that speaks the same wire protocol over an analytical column store with the same fork-as-pointer semantics. Both fit into a shared branching model — see Yggdrasil: Branching Protocols for how a Datahike database, a Stratum dataset, and a vector index can fork together at a single snapshot.
The rest of this post is for callers who do speak Clojure — the same data accessible as relations and as datoms, in-process queries that skip the wire, embedded mode without TCP, and configuration knobs that aren’t exposed over SQL.
Bidirectional view
The pgwire layer is a view onto Datahike’s datom store, not a separate representation. Tables you create over SQL show up as normal Datahike schemas, queryable from Clojure with (d/q …). Existing Datahike schemas show up as SQL tables with no setup.
-- Same database, over psql:
SELECT * FROM person;
-- id | name
-- ----+------
--   1 | Alice
The reverse holds too — CREATE TABLE over pgwire transacts a normal Datahike schema, and the next (d/q …) from Clojure sees the rows you just inserted. There is no shadow representation, no separate metadata. One datom store, two query languages.
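For instance (a sketch; the exact attribute names depend on your schema and hints):

(require '[datahike.api :as d])

;; the rows inserted over SQL, read back as datoms:
(d/q '[:find ?name
       :where [?e :person/name ?name]]
     @conn)
;; => #{["Alice"]}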
Using the library directly
Two ways to skip the standalone jar — start a server from your own JVM application, or bypass the wire layer entirely.
Same pgwire surface, in-process. The integration patterns earlier in this post are the embedded-library API; the standalone jar wraps the same calls behind CLI flags.
Bypass the wire entirely
Tests and in-process applications don’t need the wire layer at all:
(def h (pg/make-query-handler conn))
(.execute h "CREATE TABLE person (id INT PRIMARY KEY, name TEXT)")
(.execute h "INSERT INTO person VALUES (1, 'Alice')")
(.execute h "SELECT * FROM person")
Same SQL surface, no socket. Useful for property-based testing of SQL workloads, or for embedding the SQL interface inside a Clojure or ClojureScript application without exposing a port.
Permissive vs. strict compat
By default the handler rejects unsupported DDL — GRANT, REVOKE, CREATE POLICY, ROW LEVEL SECURITY, CREATE EXTENSION, COPY — with SQLSTATE 0A000 feature_not_supported. Most ORMs emit some of these unconditionally. Two ways to relax:
;; silently accept every auth/RLS/extension no-op (Hibernate, Odoo)
(pg/make-query-handler conn {:compat :permissive})
;; accept specific kinds only
(pg/make-query-handler conn {:silently-accept #{:grant :policy}})
The named presets in datahike.pg.server/compat-presets cover the common ORM patterns.
SQL or Datalog?
Both interfaces see the same datoms, the same indexes, the same history. The choice is about how the query reaches the engine.
Reach for SQL when callers don’t share a runtime with the database — services over the wire, analysts in Metabase, tools that only speak the wire protocol — or when you want existing tooling: ORMs, migration runners, BI dashboards.
Reach for Datalog when the query runs in the same process as the database. Datahike’s Datalog API is a Clojure function: pass values in, get values out, no parsing, no serialization, no socket. Even pg-datahike’s embedded mode (the make-query-handler path shown above) still goes through the SQL parser and the translator; Datalog skips both. You can invoke arbitrary Clojure functions inside predicates, return live data structures without copying, and join across multiple databases on different storage backends in a single query.
The two paths compose. DDL via Flyway over SQL, then reads in Datalog from your Clojure backend. Or: Datahike schema in Clojure, ORM-driven CRUD over SQL. Both stay coherent because they’re views of the same datom store.
Compatibility evidence
We test pg-datahike against the same suites the Postgres ecosystem uses on itself. If a suite passes here, the apps that depend on it generally work here.
| Layer | Test suite | Result | What this proves |
| --- | --- | --- | --- |
| JDBC driver | pgjdbc 42.7.5 — ResultSetTest | 80 / 80 | Cursors, type decoding, and metadata behave the way every JVM Postgres client expects. |
| Java ORM | Hibernate 6 — DatahikeHibernateTest | 13 / 13 | JPA stacks — Spring, Quarkus, Jakarta — talk to pg-datahike the same way they talk to Postgres. |
| Python ORM | SQLAlchemy 2.0 dialect | 16 / 16 across 7 phases | The Python data ecosystem — Django, Flask, FastAPI, Airflow, dbt — connects via the standard dialect path. |
| SQL semantics | sqllogictest | 779 assertions, 61 files | Cases derived from PostgreSQL's regression suite, expressed in the sqllogictest format SQLite, CockroachDB, and DuckDB use for their own correctness work. |
| Real application | Odoo 19 — --init=base --test-tags=:TestORM | 11 / 11 cases, ~38k queries, zero translator errors | A 200-table ERP with one of the most demanding open-source ORM layers boots and passes its own test suite. |
| BI tool | Metabase native SQL | 20-probe MBQL sweep | Schema introspection, prepared statements, and result handling work for the paths real BI tools depend on. |
| Migration roundtrip | Chinook + Pagila pg_dump fixtures | Chinook: byte-equal roundtrip. Pagila: schema parses, data loads. | A real Postgres database can be exported, replayed in pg-datahike, and dumped back — schema and data preserved through the round-trip. |
| Internal | Unit suite | 544 tests, 1603 assertions | Standard regression coverage. |
Per-commit suites run on CircleCI. Odoo, Metabase, and psql / libpq (\d, \dt, \df family) are run on a manual harness before each release. A dedicated compatibility page with linked test artifacts and a published gaps registry is in flight.
Try it
Download the jar from GitHub releases, java -jar pg-datahike-VERSION-standalone.jar, point psql at it. To embed in a JVM app, the coordinate is org.replikativ/pg-datahike on Clojars. Repo, docs, and issues at github.com/replikativ/pg-datahike; feedback to contact@datahike.io.
A follow-up post will cover the structural-sharing model that makes branching O(1), what merge! does, and the same workflow across every Datahike binding (Clojure, Java, JavaScript, Python, the C library, the CLI, and SQL). Subscribe to the RSS feed.
We’re happy to announce a new release of ClojureScript. If you’re an
existing user of ClojureScript please read over the following release
notes carefully.
Async Functions
Now that ClojureScript targets ECMAScript 2016 we can carefully choose new areas of enhanced interop. Starting with this release, hinting a function as ^:async will make the ClojureScript compiler emit a JavaScript async function:
(refer-global :only '[Promise])
(defn ^:async foo [n]
(let [x (await (Promise/resolve 10))
y (let [y (await (Promise/resolve 20))]
(inc y))
;; not async
f (fn [] 20)]
(+ n x y (f))))
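The resulting function returns a native Promise, so calling it is plain Promise interop:

(.then (foo 5) (fn [v] (println v)))
;; prints 56, i.e. (+ 5 10 21 20)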
In the last Clojure survey, support for async functions dominated the list of desired ClojureScript enhancements for JavaScript interop. This enhancement eliminates the need to take on additional dependencies for the common cases of interacting with modern Browser APIs and popular libraries.
For a complete list of fixes, changes, and enhancements to ClojureScript, see here.
Contributors
Thanks to all of the community members who contributed to ClojureScript 1.12.145
The Moment Everything Breaks (And Why It Always Happens)
It often begins the same way. The system performs well, traffic increases, data volumes grow, and new features accumulate. Then, gradually, performance degrades. Deployments slow down, bugs become harder to trace, and engineers spend more time debugging than building. What once felt scalable begins to feel fragile.
This is the underlying challenge of data-intensive systems: as data grows, complexity tends to grow with it.
Most teams respond predictably—by adding more tools, more layers, and more abstractions. But this often compounds the problem rather than solving it.
What if the solution to scale isn’t added complexity, but reduced complexity? This is the core philosophy behind Clojure, created by Rich Hickey.
This guide explores how to build scalable data architectures using simple, data-centric approaches—without compromising performance, reliability, or developer productivity.
The Problem: Why Data-Heavy Systems Become Unmanageable
🔹 The “Box Problem” in Traditional (Object-Oriented) Systems
In Java, data is wrapped inside objects.
This works fine at the beginning of an application's life. But over time, complexity accumulates and becomes harder to manage. Why?
Because objects hide data:
Teams lack visibility into the system’s contents without performing analysis.
Logic and data are tightly coupled.
Changes ripple unpredictably.
This reflects a core limitation of Object-Oriented Programming: teams gradually shift from working with data to contending with the systems that encapsulate it.
Complexity outpaces data growth, and that’s where things get messy.
As systems grow, teams often introduce:
Caching layers.
Queue systems.
Synchronization mechanisms.
Each “solution” adds more complexity.
📌Google’s SRE guidelines are pretty clear: if you make things complicated, you’re asking for trouble. Reliability drops, so keeping things simple really matters.
The Clojure Philosophy: Simple Data Over Complex Abstractions
Clojure takes a totally different approach. Forget all those complicated wrappers and abstractions. It just treats data as data — plain and straightforward. No stacked wrappers, no unnecessary layers.
🔹 Plain Maps and Vectors
In Clojure, data is represented as plain maps, vectors, and sets. No classes. No hidden behavior. Just data.
🔹 Why This Matters for Data-Heavy Systems
Easy to inspect.
Easy to serialize (JSON, EDN).
Easy to transform.
You can pass data across:
Services.
Pipelines.
Systems.
Without rewriting everything.
🔹 Structural Sharing (Scale Without Memory Explosion)
Clojure uses persistent data structures. There are no full dataset copies — each new version reuses what's the same and stores only what's new. Teams end up with millions of records but almost no additional memory overhead.
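A quick REPL demonstration of the idea:

(def v1 (vec (range 1000000)))
(def v2 (assoc v1 0 :changed))  ; a "new" million-element vector

(= (count v1) (count v2))  ;; => true
(nth v1 0)                 ;; => 0, v1 is untouched
;; v2 shares almost all of v1's internal tree; only a few nodes are new.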
🔹 Immutability: The Foundation of Simplicity
Immutability is the core idea. Once the team creates data, it stays exactly as it is — no messing around, no changes. That’s where the simplicity comes from. Instead:
New versions are created.
Old versions remain intact.
This eliminates:
Side effects.
Unexpected state changes.
And enables safe concurrency.
Keeping Data Correct with Malli (Schema Without Pain)
The bigger a system gets, the trickier it is to keep data in line. Everyone is worried about data going off track—so how does a team maintain strict control? That’s where Malli steps in.
🔹 So, What is Malli Clojure?
It’s a lightweight schema library that validates data and ensures teams aren’t sending anything unusual. Simple as that.
Example:
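A minimal illustration of Malli's map schemas:

(require '[malli.core :as m])

(def User
  [:map
   [:id :int]
   [:email :string]])

(m/validate User {:id 1 :email "ada@example.com"})  ;; => true
(m/validate User {:id "1"})                          ;; => false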
Clear Errors Instead of Chaos
Whenever the app breaks down and produces unclear errors, Malli tells teams straight-up what’s wrong, so they can fix errors fast:
Instant Output:
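Continuing the User schema above, malli.error turns failures into readable messages:

(require '[malli.error :as me])

(-> (m/explain User {:id "1"})
    (me/humanize))
;; => {:id ["should be an integer"], :email ["missing required key"]}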
🔹 Why Malli Fits Data-Heavy Systems
Teams benefit from flexible schemas that adapt to changing data as conditions evolve, avoiding rigid constraints that disrupt the flow and enabling seamless, continuous adaptation.
Malli integrates seamlessly into environments where teams are managing growing datasets and evolving requirements.
It is designed to scale, maintaining stability even when data becomes unpredictable.
🔹 Better Errors = Faster Debugging
Validation messages are precise and actionable, enabling teams to quickly identify both the location and the cause of issues.
Identify issues early, avoid costly downtime, and maintain uninterrupted system operations.
Because errors are clearly surfaced, teams spend less time diagnosing issues and more time resolving them.
Concurrency Without Chaos
Concurrency is where most systems break.
Locks. Deadlocks. Race conditions. Clojure avoids all of this.
🔹 Why Immutability Removes the Need for Locks
Because data is immutable, multiple threads can read it safely without requiring synchronization.
This is a direct benefit of:
Functional Programming.
🔹 core.async and Event Streams
core.async makes handling streams simple.
Example:
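A small, self-contained sketch of a channel-based event stream:

(require '[clojure.core.async :as a])

(def events (a/chan 100))  ; buffered channel of incoming events

;; consumer: a lightweight process that handles events as they arrive
(a/go-loop []
  (when-let [event (a/<! events)]
    (println "processing" event)
    (recur)))

;; producer
(a/>!! events {:type :score-update :match 42})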
🔹 Scaling with Simplicity
Fewer race conditions:
No more problems from shared state.
Data flows in a way that teams can actually follow.
Parallel code runs safely.
Teams sidestep those troublesome timing errors.
Debugging doesn’t have to be difficult:
Data is not buried within opaque structures—it remains explicit and directly accessible in maps.
The absence of side effects makes issues easier to trace and resolve.
Teams get the same results wherever they run their code.
Clear error signals (especially with Malli).
📌 See how this worked for a real-world high-traffic site in our Livesport Case Study.
The REPL Advantage: Building Systems Live
🔹 Instant Feedback Loop
Teams move beyond the traditional cycle of writing, building, deploying, and waiting. With a REPL, they can execute code immediately and receive instant feedback.
🔹 Test with Real Data
Need to understand how changes behave with real data? Simply load production data, experiment with live transformations, and debug in real time—while the system continues to run.
🔹 Continuous System Evolution
The overhead of long build cycles is eliminated. Teams can shape and refine their systems in real time, without delays or uncertainty.
🎬 Clojure exemplifies this approach—teams aren't just writing code; they are interactively evolving their systems.
Designing a Scalable Data Architecture in Clojure
As systems begin to handle larger volumes of data, complexity can escalate quickly. Some systems continue to perform reliably, while others struggle under the load. The difference is rarely accidental—it is largely determined by the underlying architecture.
Clojure takes a different path. It keeps things simple from the start—and that’s what makes it scale.
🔹 Core Principles of Simple Data Systems
➜ Data-First Design
In many systems, logic comes first. Data is secondary. In Clojure, it’s the opposite. Data comes first.
Teams use maps, vectors, and sets.
Data is easy to read and inspect.
Nothing is obscured behind layers of objects; data remains transparent and directly accessible.
And that changes how you build systems. Instead of designing classes, teams work with data flows.
Why this helps:
Debugging is easier.
Data moves cleanly between services.
Teams don’t break things when requirements change.
➜ Stateless Services
Each part of the system does one simple thing:
Takes data in.
Changes it.
Returns new data.
That’s it. No hidden state. No surprises. This works because of:
Immutability.
Functional Programming.
What teams get:
Teams can scale services easily.
Running things in parallel is safe.
Testing becomes straightforward.
➜ Clear Boundaries
As systems grow, boundaries tend to blur. One service starts doing too much. Data shapes drift.
Clojure pushes you to keep things clear.
Define what data should look like.
Validate it using tools like Malli.
Keep the functions concise and to the point.
When teams do this:
Each service runs on its own—so if something crashes, it doesn’t drag everything down with it.
Teams don’t end up with problems bouncing around the whole system.
The system remains transparent and simple.
🔹 Recommended Stack
➜ Clojure Backend
Clojure keeps backend logic simple. Teams use small functions to shape data, so they stay close to the actual information instead of layers of indirection.
Fewer lines of code mean teams reduce risks and clarify their goals.
➜ Event-Driven Architecture
Instead of calling each other up, services just broadcast events to the world, allowing the appropriate recipients to receive them.
So, when something happens, teams create an event. The rest of the system listens and responds as needed. It’s a cleaner way to connect everything without binding them too firmly. Everything runs independently.
As Martin Fowler explains, event sourcing lets teams rebuild system state by replaying events. That makes systems easier to scale and debug.
What this gives teams:
Loose coupling.
A clear history of what happened.
The ability to replay and fix issues.
🔹 Patterns to Follow
➜ Data Pipelines
Think of the system as a pipeline.
Each step is simple:
Take data.
Return new data.
Why this works:
Easy to follow.
Easy to test.
Easy to scale.
➜ Event Sourcing
Save the full history, not just the current version. For example:
UserCreated
OrderPlaced
PaymentProcessed
The current state is simply the result of replaying these events.
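In Clojure, that replay is literally a reduce (a toy sketch using the events above):

(def events
  [{:type :user-created      :user 1 :name "Ada"}
   {:type :order-placed      :user 1 :order 100}
   {:type :payment-processed :order 100}])

(defn apply-event [state {:keys [type] :as e}]
  (case type
    :user-created      (assoc-in state [:users (:user e)] {:name (:name e)})
    :order-placed      (update state :orders (fnil conj []) (:order e))
    :payment-processed (update state :paid (fnil conj #{}) (:order e))))

(reduce apply-event {} events)
;; => {:users {1 {:name "Ada"}}, :orders [100], :paid #{100}}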
As Martin Fowler points out, this lets you:
Rebuild the state anytime.
Debug past issues.
Keep systems resilient.
➜ Functional Transformations
In Clojure, most work is done through small functions.
Simple. Predictable. Testable.
Why it matters:
No side effects.
Same input → same result.
Easy to test.
🔹 Example: A Transformation Pipeline
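A minimal, self-contained sketch (the step names and business rules are illustrative):

(defn validate-order [order]
  (if (pos? (:amount order))
    order
    (throw (ex-info "invalid order" {:order order}))))

(defn enrich-order [order]
  (assoc order :vat (* 0.21 (:amount order))))

(defn apply-pricing [order]
  (assoc order :total (+ (:amount order) (:vat order))))

(defn process-order [order]
  (-> order
      validate-order
      enrich-order
      apply-pricing))

(process-order {:id 1 :amount 100})
;; => {:id 1, :amount 100, :vat 21.0, :total 121.0}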
What this pipeline does:
Check if the input is good.
Add extra data.
Apply business logic.
Return a new version.
No mutation. No hidden steps.
📌Martin Fowler puts it well: event sourcing lets a system rebuild everything it needs just by replaying a series of events. This keeps systems solid and ready to scale.
📌With Apache Kafka, data doesn’t sit stuck in batches—you receive it in real time.
Why Simplicity Wins (Business Perspective)
It’s not only about clean code. Simple systems save money, let teams move faster, and prevent failures. They’re easier to handle and easier to expand.
🔹 Lower Infrastructure Costs
Complex systems grow in layers.
More services, more duplication, more overhead. Simple systems stay lean.
Data is stored and passed efficiently.
Fewer components doing the same work.
Less need for constant scaling.
What this means in practice:
Lower cloud bills.
Better performance with the same hardware.
Fewer surprises as data grows.
Teams are not paying extra just to manage complexity.
🔹 Faster Developer Onboarding
When a system is hard to read, new developers slow down. They depend on others to understand how things work. Simple systems remove that friction.
Code is clear and direct.
Data flows are easy to follow.
Logic is not hidden behind layers.
The impact:
New developers get productive fast.
Less reliance on “tribal knowledge”.
Teams spend more time building, less time explaining.
And when someone leaves, the system doesn’t become a mystery.
🔹 Reduced Failure Rates
When things get complicated, failures follow. Hidden states, complex dependencies, and surprise side effects make bugs a pain to find. But simple systems just work better—they’re easier to predict.
Give them the same input, and they’ll spit out the same output every time.
Teams don’t see weird interactions between parts, so problems don’t hide as easily.
Tracking down issues gets a whole lot simpler.
What does this mean?
Teams deal with reduced production problems.
If something does go wrong, teams fix it faster.
There’s less downtime, and any issues that pop up don’t cause as much damage.
Real-World Use Cases
Simple data systems aren't just a buzzword—they're what keeps high-volume, real-time systems running smoothly, even as things shift and scale up.
🔹 High-Throughput Systems
Let’s begin with systems that receive events continuously, leaving little opportunity for delays or unforeseen errors.
➜ Financial systems
Payment processing or trading platforms are pretty unforgiving. They process thousands of transactions each second, and teams can’t have mistakes or inconsistent data.
Simple, data-focused architecture really shines here: each transaction is just data, tracked as an event that teams can review or replay. When things go wrong, it’s a lot easier to pinpoint exactly what failed and roll everything back to a safe place.
What does this fix?
Teams get clear audit trails.
It’s safer when many transactions occur at once.
Tracking down the root of any problem is much faster.
➜ Sports/live data platforms
These systems face heavy loads with real-time updates—millions of people are refreshing nonstop. No matter how wild the traffic gets, the data everywhere needs to stay perfectly in sync.
That’s where tools like Apache Kafka help out. Score changes and other updates stream through constantly, and every part of the platform reacts right away, without missing a beat.
Why keep it simple?
Real‑time means updates don’t fail.
The system stays steady, even during huge spikes.
Finding and fixing live bugs isn’t a big deal.
🔹 AI/ML Data Pipelines
AI systems are addicted to clean, reliable data. That’s usually where messy architectures fail.
➜ Feature pipelines
Before models do anything, teams need to turn raw data into usable features. Data arrives continuously from everywhere, and each transformation has to stay consistent.
With a basic pipeline, teams always know what’s happening at each step. It’s way easier to test things, catch mistakes, and keep everything running as expected.
What teams get:
Fewer strange data mix-ups.
Faster fixes when the models lose alignment.
Better performance as the models learn.
➜ Data preprocessing
Data preprocessing is just as important. This is all the data cleaning, normalization, and blank-filling before training or inference.
If teams build things in a clear, functional way—no hidden side effects—each step is independent, and they can replay everything if needed.
That makes a difference:
The results can be reproduced, not simply assumed.
Testing and experimenting are smoother.
It’s less likely that teams will miss hidden data errors.
📌 If you’re fed up with struggling against chaotic, hard-to-understand systems and want to build something solid, we can help. Check out our AI and Machine Learning services.
🔹 Rich Hickey explains more about building data-heavy systems in his talks — explore below:
The ideas explored in this post are deeply rooted in the work of Rich Hickey. His talks have shaped how the Clojure community thinks about simplicity and data
The following talks by Rich Hickey form the intellectual foundation of this post and are recommended for further viewing.
Summary: Hickey’s most influential talk. Reframes how developers think about complexity and simplicity — the philosophical backbone of the blog’s entire approach
Summary: A deep dive into why values win over variables. Hickey demonstrates that immutability eliminates entire categories of bugs around shared state, making it the natural foundation for concurrent, data-heavy systems.
Summary: Explores how software should explicitly model time. Argues that values should be immutable by default and that mutable state is a source of accidental complexity. This talk is the intellectual foundation for Clojure’s design around immutable data structures.
Summary: Focuses specifically on Clojure’s two defining traits: data orientation and simplicity. Covers how these characteristics lead to faster time to market, smaller codebases, and better quality — exactly what the blog promises for data-heavy systems.
Summary: Hickey argues that traditional OOP and relational databases entangle value, identity, and state in ways that make reasoning about data evolution difficult. Directly relevant to the blog’s argument about avoiding hidden state in data-heavy systems.
Summary: Examines how the architecture of distributed systems (multiple communicating programs) compares to single-program architecture. Explores tradeoffs in data formats and what characteristics well-designed system components should have.
❓ FAQs
Q1: Why is Clojure good for data-heavy systems?
Because it keeps data simple. You work with maps and vectors. No hidden state. No complex object layers. So it’s easier to track data, change it, and debug issues—even when the system grows.
Q2: What makes Clojure simpler than Java?
It avoids a lot of moving parts.
No heavy object-oriented structure.
No mutable state by default.
Fewer abstractions.
You write less code. And it’s easier to see what’s going on.
Q3: Is functional programming better for big data?
Often, yes. Functional Programming removes side effects. That makes systems more predictable. When things are predictable, parallel execution is simple and stable.
Q4: What is Malli in Clojure used for?
It checks your data. You define what valid data looks like. Malli checks the data and makes it reliable across services.
Q5: How does immutability improve scalability?
If data doesn’t change, no race conditions or state‑sharing bugs.
If you want to update something, just make a new version instead of changing the old one. That means different parts of the system run in parallel without conflict.
Scaling up gets a lot easier and less risky.
Q6: Can Clojure handle real-time data streams?
Absolutely. Clojure comes with tools like core.async, so you can process streams of data in real time. It lets you build systems that
Keep up with data as it comes in.
Handle events right away.
Scale out without getting blocked.
That’s why it’s such a good fit for streaming or event-driven applications.
Conclusion: Clojure Simplicity as a Competitive Advantage
Most teams believe: “Complex systems require complex solutions.”
Clojure proves the opposite.
By embracing:
Simple data.
Immutability.
Functional design.
You get:
Faster systems.
Lower costs.
Happier developers.
And most importantly: Teams build systems that don’t collapse under their own weight.
📞 Book a Scalable System Audit
Connect with Flexiana’s experts to get a clear view of your system.
We’ll dig into your architecture, pinpoint what’s slowing you down, and work with you to map out a plan that simplifies your setup and makes it ready to grow.
Browser                            Server
   |                                 |
   |--- Request (with session ID) -->|
   |                                 |--- Looks up session ID in store
   |                                 |--- Retrieves session data
   |<-- Response --------------------|
A common pattern is: the server creates a session, stores the session ID in a cookie, and on each request the browser sends that cookie so the server can find the right session data.
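In Clojure, Ring's session middleware implements exactly this pattern; a minimal sketch using the default in-memory store:

(require '[ring.middleware.session :refer [wrap-session]])

(defn handler [request]
  (let [n (get-in request [:session :count] 0)]
    {:status  200
     :body    (str "you have visited " (inc n) " times")
     ;; stored server-side; only the session ID travels in the cookie
     :session {:count (inc n)}}))

(def app (wrap-session handler))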
Quick comparison
| Feature | Cookie | Session |
| --- | --- | --- |
| Storage location | Client (browser) | Server |
| Size limit | ~4KB | No strict limit |
| Lifetime | Configurable | Usually browser session |
| Security | Less secure | More secure |
| Performance | Faster (no server lookup) | Slightly slower |
| Use case | Preferences, tracking | Auth, sensitive data |
Rule of thumb: Use cookies for small, non-sensitive data you’re okay storing client-side. Use sessions for sensitive data or anything that shouldn’t be exposed to the client.
This is a collection of various Babashka scripts that I use in my back-up process. The back-ups and back-up locations are all custom and offline so I had to have a custom solution.
To illustrate the usage with an example, I use repository-state script to ‘monitor’ the files and folders in a root folder that I declared as a ‘repository’. The script output functions like a report of what is present in the repository and the state of the files at the time of the report.
When I want to create a backup of the repository, I create a snapshot using directory-snapshot. The script zips the whole directory and appends the current date to the archive name. Since the repo contains the output of repository-state it is included into the archive, so the archive now has info about the repository.
The directory-sync compares two directories and, if needed, synchronizes them. I use it when I want to perform back-ups (one or more directory snapshots, for example). The comparison produces an EDN output which I manually inspect and modify according to what I am doing at the moment. The EDN is used later for synchronization. I repeat the process for all my back-ups and redundant back-up locations.
The hex-dump was just fun to write, but it can be used to compare binary files.
The theme will be the design, long-term development, and operation of reliable
systems. Enjoy three days of workshops, talks, and conversation relevant for
industry veterans, from senior developers and tech leads to VPs of Engineering and CTOs.
The conference will showcase engineering approaches within the Clojure
ecosystem, but with a strong emphasis on broader applicability. It will explore
the real-world portability of these concepts to other tech stacks, mutual
inspiration across different communities, and adapting to AI-assisted
development.
Join the mailing list for early bird tickets and announcements.
Share your ideas and content suggestions when you sign up.
Clojure/Conj 2026 CFP
We’re looking for 40-minute talks that go beyond the basics: hard-won lessons,
production stories, trade-offs, deep dives into language features, libraries,
or tools, and ideas that change how people build things. Tracks include:
Language, Experience Report, Library, Tools, AI, Ideas, and Fun.
Join us for the largest gathering of Clojure developers in the world! Meet new
people and reconnect with old friends. Enjoy two full days of talks, a day of
workshops, social events, and more.
September 30 – October 2, 2026
Charlotte Convention Center, Charlotte, NC
joyride 0.0.74 - Making VS Code Hackable like Emacs since 2022
baredom 2.7.0 - BareDOM: Lightweight CLJS UI components built on web standards (Custom Elements, Shadow DOM, ES modules). No framework, just the DOM
r11y 1.0.6 - CLI tool for extracting URLs as Markdown
phel-lang 0.35.0 - A functional, Lisp-inspired language that compiles to PHP. Inspired by Clojure, Phel brings macros, persistent data structures, and expressive functional idioms to the PHP ecosystem.
metamorph.ml 1.5.1 - Machine learning functions based on metamorph and machine learning pipelines
epupp 0.0.19 - A web browser extension that lets you tamper with web pages, live and/or with userscripts.
kairos 0.2.62 - Crontab parser for Clojure with human-readable cron explanations