Triple Store, Triple Progress: Datalevin Poised for the Future

Interactive Short Query Performance

In my previous post, Competing for the JOB with a Triplestore, I showed that a triple store, such as Datalevin, can compete with the best row stores on complex relational workloads. Since then, I have rewritten Datalevin's rule engine and improved its storage and query engine. This post focuses on why these matter for the broader goal of using a triple store as a single, flexible data substrate.

One store, many workloads

Our goal is to simplify data storage and access by supporting diverse database use cases and paradigms, because maximal flexibility is the core strength of a triple store. Using one data store for different use cases simplifies and reduces the cost of software development, deployment, and maintenance.

Since 2020, we have been working hard toward this goal of building an easy-to-use and versatile database. I am happy to report that, today, in addition to key-value and relational database features, Datalevin also handles graph queries and deductive logic reasoning tasks, with built-in support for full-text search and vector similarity search as well. All these are seamlessly integrated in a compact package that works in both embedded and server modes.

This post is a guided tour of the progress made so far in the storage engine, the query engine, and the new rule engine.

Triples as the substrate

Datalevin stores data as entity-attribute-value (EAV) triples, the smallest atomic unit of a data item. This uniform representation is the key that lets one engine handle many shapes of data and many ways of asking questions. A triple store can behave like a relational system when your data is tabular, like a graph system when your data is connected, and like a logic system when you define recursive rules.
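
For example, a single logical record decomposes into EAV triples along these lines (the entity ids and attribute names are illustrative):

[[1 :person/name    "Ada Lovelace"]
 [1 :person/born    1815]
 [1 :person/advisor 2]                ; a value may reference another entity
 [2 :person/name    "Augustus De Morgan"]]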

The challenge has always been performance. Triple stores have historically been slower than row/column stores. The rest of this post explains how Datalevin addresses this challenge.

Storage

Datalevin uses a fast key-value database library as the storage layer. Specifically, the exceptional read performance of LMDB is the foundation of Datalevin's query performance. Datalevin stores triples with nested indices by leveraging LMDB's DUPSORT capability: the head element of a triple is stored once as the key, and the tail elements are stored as a sorted list of values. This reduces storage overhead and alleviates the data redundancy problem inherent in triple stores.
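
Conceptually, the nested index stores something like the following sketch, with the head element as the key and the sorted attribute-value tails as LMDB duplicate values (the shape is illustrative, not the actual on-disk encoding):

{1 [[:person/advisor 2]
    [:person/born    1815]
    [:person/name    "Ada Lovelace"]]}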

To further reduce data redundancy, we built our own LMDB fork, DLMDB, which adds page-level prefix compression to the storage. For DUPSORT, prefix compression is applied to both keys and values, resulting in significant storage savings. We also removed the lesser-used VAE index. For a typical reference-heavy (foreign key) Datalog database, the footprint reduction can be over 40%.

Through relentless code optimization, we achieved these savings without incurring excessive read/write overhead. In fact, for the common Datalevin use case of seeking to a key and reading its list of values in full, we obtained a 40% speedup in most cases.

The Datalevin query planner relies heavily on online counting and sampling. To facilitate these operations, we added subtree node count maintenance in DLMDB. These order statistics turn counting and sampling operations from O(n) to O(log n), cutting Datalog query planning time in half. This feature was introduced with minimal write overhead.

Query engine

Datalevin's query planner performs extensive query rewrite passes to optimize performance. For example, predicates are pushed down into index scans so filters execute early rather than after joins; inequality predicates are rewritten into index range scans, and so on.
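
For instance, in a Datalog query like the sketch below (attribute names are illustrative), the planner can turn the inequality predicate into an index range scan instead of applying it after the joins:

'[:find ?name
  :where
  [?p :person/born ?year]
  [(< ?year 1900)]                    ; rewritten into an index range scan
  [?p :person/name ?name]]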

The planner simplifies the query graph by treating stars as meta-nodes, then applies a Selinger-style dynamic programming algorithm with accurate cardinality estimates. Merge scans collapse star-shaped entity access into a single scan, avoiding redundant joins for attributes belonging to the same entity group. More details on these optimizations can be found in the query engine documentation.

Even with relatively accurate cardinality estimation, occasional bad plans are unavoidable, particularly for tricky joins that link different entities. To reduce the impact of such cases, the cost-based optimizer now considers hash joins as an alternative when input size is large. We also extended the optimizer's coverage to more complex query clauses, such as or-join.

We made the multi-threaded query execution pipeline more robust by handling edge cases in concurrency and adding backpressure. The pipeline now uses its own thread pool to avoid contention with worker thread pools.

Other improvements have focused on usability: "creature-comfort" features like a :having clause, allowing math expressions in :find, specifying sort variables by indices in :order-by, full boolean search expressions for full-text, and so on. These features reduce the amount of custom code needed for post-processing results, and in-database operations are usually more efficient than equivalent work in application code.

Rule engine

Datalevin uses rules to bundle a set of query clauses into named, reusable invocations. As rules can call themselves and other rules, this feature enables recursive logic computation and graph navigation. We wrote a new rule engine that leverages the same cost-based optimizer, allowing Datalevin to serve as an efficient graph database and logic reasoner.
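
As a sketch, a recursive advisor-ancestry rule in the Datomic-style rule syntax that Datalevin uses could look like this (the :advised-by attribute and the d/q alias for datalevin.core/q are illustrative, not the actual benchmark code):

(def ancestry-rules
  '[[(ancestor ?a ?s)
     [?s :advised-by ?a]]
    [(ancestor ?a ?s)
     [?s :advised-by ?x]
     (ancestor ?a ?x)]])

;; invoked from a query, with the student bound so recursion starts
;; from a known entity rather than the whole database:
;; (d/q '[:find ?anc
;;        :in $ % ?student
;;        :where (ancestor ?anc ?student)]
;;      db ancestry-rules student-eid)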

The rule engine uses semi-naive fixpoint evaluation, applies magic-set rewrites where beneficial, and seeds evaluation from outer query bindings so it starts with relevant candidates instead of a blank slate.

In addition, non-recursive rule clauses are inlined into the main query to let the optimizer plan them with index scans and joins. For T-stratified rules, temporal elimination avoids storing unnecessary intermediate results. Detailed information is available in the rule engine documentation.

Benchmarks

We added two new benchmarks to showcase the progress we have made.

Logic workloads: Math Genealogy benchmark

The Math Genealogy benchmark focuses entirely on rule resolution. It is a good stress test for recursive Datalog rules. The dataset contains roughly 256,769 dissertations, 256,767 students, and 276,635 advisor-student relationships. There are four queries in this benchmark. Datalevin's rule engine is very fast on these queries: Q1 (14.4 ms), Q2 (330.9 ms), Q3 (269.6 ms), and Q4 (recursive academic ancestry, 2.9 ms).

By comparison, Datomic takes over 40 seconds on Q4, and Datascript runs out of memory. This query is difficult because recursive ancestry computes a transitive closure: each new level of ancestors can join with every previously found level, which can quickly create a combinatorial explosion of intermediate tuples. Even if the average branching factor is modest (say b=3–5 advisors per student), intermediate results can grow on the order of b^k at depth k. If those tuples are generated repeatedly across branches, we end up materializing large intermediate relations just to discard most of them later.

Why is Q4 so fast in Datalevin? It exploits bound starting points. The semi-naive fixpoint evaluation works off delta relations only, i.e., each iteration only joins newly produced tuples. When a query binds a head argument, the engine also seeds recursion (and, when effective, applies magic-set rewrites) so it only explores the reachable slice of the graph rather than materializing the full closure.

Graph workloads: LDBC SNB benchmark

Graph workloads are where triple stores should shine, but performance is where dedicated graph databases usually try to defend their turf. The LDBC Social Network Benchmark (SNB) is an industry-standard benchmark for interactive graph queries. We implemented the full workload and included a Neo4j implementation for comparison here.

On the SF1 dataset (about 3.2M entities and 17.3M edges), Datalevin is 27x to 620x faster on short interactive queries (pictured above), averaging 48x faster than Neo4j. These short queries are the most commonly encountered workloads in an operational graph database, so performing well on them has significant practical implications.

Some of these short queries (IS2 and IS6) involve unbounded graph traversal, such as finding the root post of a comment. Datalevin handles these with a recursive rule. Thanks to the efficiency of our rule engine, graph navigation performance is stellar.

On complex queries, Datalevin is about 12% faster overall, with some queries (IC6, IC8, IC11) orders of magnitude faster and a few (IC3, IC5, IC9) slower than Neo4j. I am sure Neo4j is extensively tuned for these queries, as its developers are among the authors of this industry-standard benchmark. It is remarkable that Datalevin performs so well on these complex graph queries without any specific tuning.

The important observation here is that the same triple store and query engine handle both relational-style joins and graph traversals without needing special cases for either.

Towards the future

A triple store is a flexible substrate. When paired with a cost-based query optimizer and a modern rule engine, it can span relational, graph, and logical reasoning workloads. It can also expand toward richer document workloads without changing the underlying model.

With the current focus on AI systems, a triple store like Datalevin can serve several critical purposes.

An AI agent needs a context graph: entities, facts, relations, constraints, and memories that evolve over time. Keeping that context in one integrated store reduces the impedance mismatch between relational tables, graph edges, embeddings, and documents. Datalevin makes retrieval and grounding first-class: full-text and vector search can pull candidate facts, while Datalog queries and rules can verify, connect, and constrain them. It can also support tool use, where the "tool outputs" are simply more facts to join and reason over. For example, in a RAG pipeline, vector search retrieves candidate snippets, graph relations link them to entities and events, and rules enforce constraints such as provenance or recency.

Datalevin can serve as an agent's memory model, where episodic facts, long-term knowledge, and computed embeddings all live in one place and can be queried together. In that sense, a unified store is to an AI agent what memory is to human cognition: the common ground where different kinds of signals meet to be reasoned over.

Even as AI writes more code, a simple and versatile database remains highly AI-friendly. A context-limited LLM benefits from a coherent data model and a single query language that covers many use cases. Datalog is truly declarative, meaning there are fewer procedural or implementation details for a model to trip over. That translates to less boilerplate, fewer dialects to remember, and fewer quirks in query semantics, making it more likely that the system handles data correctly. In fact, AI wrote all the Datalevin queries used in the LDBC SNB benchmark mentioned above. Although Datalevin is still a relatively niche database, the AI composed queries for it with ease because the query language is inherently simpler.

Next steps

The recent rule engine rewrite brings us much closer to the 1.0 release milestone. With the addition of high availability, a JSON API, and libraries for other languages, we expect to reach this milestone this year.

After six years of continuous research and development, Datalevin would not be in its current hardened state without the experimentation and production deployment efforts of the Clojure community. I truly appreciate everyone who has used Datalevin and made contributions. The future of simplified data storage and access is near, for human and AI developers alike.

Permalink

Designing real systems with immutable data in Clojure

During his talk at Clojure South, Alex Miller, Principal Software Engineer at Nubank, addressed a problem that lies at the heart of every developer’s daily work: the painful ways we often handle data. Our industry is rife with fragile formats, objects laden with boilerplate code like getters and setters, and the ever-present complexities of mutable state that demand layer upon layer of protection. This friction isn’t just a technical debt; it’s a tax on our creativity and our joy.

Clojure offers a powerful alternative to this paradigm. It places simple, immutable data and pure functions at the very core of its philosophy. It’s a fundamentally different way of thinking about software rather than just a different syntax. By embracing this model, we can build cleaner, more mathematical, and incredibly powerful mental models. This approach, as Alex demonstrated, leads to systems that are not only more robust but also more joyful to create.

The journey to this simpler, more powerful way of programming begins by rethinking a word we use every day: “value.”

Back to basics: what truly is a “value”?

While programming involves manipulating various entities, Clojure’s power stems from a rigorous definition of ‘values.’ This distinction is far from a semantic triviality; it is the core design principle that gives the language its characteristic elegance. In this context, a value represents a specific category of data, defined by immutable properties that change how we reason about state.

According to Alex, for something to be considered a true value, it must possess four key characteristics:

  • Immutable: A value cannot be changed. The number 100 doesn’t magically become 50. Once created, it is constant for all time.
  • Comparable: You can take two values and determine if they are the same. This property allows for sorting, indexing, and reasoning about equality in a straightforward way.
  • Sharable: A value can be passed to another thread, sent over a network, or written to disk and read back without fear of it being altered. Its integrity is guaranteed.
  • Precise: To be sharable and comparable, a value must have a precise, concrete representation—ultimately, as a specific sequence of bits.

This definition clarifies what is not a value. Things like network sockets, file handles, streams, and mutable objects fail this test. 

“Those things are not values… You cannot compare them or write them down precisely.”

Alex Miller, Principal Software Engineer at Nubank

They are processes or resources, not facts. A subtle but important distinction exists here: a file path can be treated as a value, as it is an immutable, precise description. A file handle, however, representing a live connection to a changing resource, cannot.

Embracing immutability for all values, including collections, has profound consequences that simplify system design immensely:

  • No locks needed for reading: Since a value can never change, you can read it from any number of threads without locks or synchronization.
  • Fearless sharing: You can freely pass data structures between concurrent processes or distributed systems, confident that you are sharing information, not a potential source of bugs.
  • Effortless caching and history: Caching becomes trivial when the data never changes. It also enables powerful features like undo stacks or “time-travel” debugging, as you can simply hold onto references to previous states.
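
A tiny example of what this buys in practice: “updating” an immutable map returns a new value, while the original remains intact and safe to share.

(def v1 {:name "Ada" :age 36})
(def v2 (assoc v1 :age 37))   ; returns a new map; v1 is untouched

v1 ;=> {:name "Ada", :age 36}
v2 ;=> {:name "Ada", :age 37}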

This design is not without its costs; it is a conscious engineering choice made for immense benefits. 

“These are awesome properties. They’re not free… but immutability is definitely worth the price that you pay for it.”

Alex Miller, Principal Software Engineer at Nubank

This powerful philosophy isn’t just for single numbers or strings; in Clojure, it extends to the very collections that structure our data.

The bedrock: simplicity in four core collections

If Clojure South had a silent protagonist, it was the humble map. This is no accident. Clojure deliberately rejects a sprawling ecosystem of collection types in favor of a small, powerful bedrock of four core structures. This strategic choice to favor a minimal set of data structures fosters immense reusability and clarity.

Clojure is built on four primary collection types, each serving a distinct structural purpose:

  • Lists: Sequential collections where values are efficiently added to the front.
  • Vectors: Indexed, sequential collections where you can quickly access any element by its position.
  • Maps: Unordered collections of key-value associations.
  • Sets: Unordered collections of unique values.
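
In literal syntax, the four look like this:

'(1 2 3)        ; list
[1 2 3]         ; vector
{:a 1 :b 2}     ; map
#{1 2 3}        ; set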

What makes these collections so powerful is not just that they are immutable, but that they are all accessible through a single, universal vocabulary of functions. In many object-oriented languages, “every class invents a new language.” To get a user’s name, you call user.getFirstName(); for an order’s total, order.getTotal(). This is a major source of friction; developers must constantly context-switch, learning a new mini-protocol for every object they encounter.

Clojure rejects this bespoke complexity. Instead, it provides a unified set of functions—get, keys, vals, seq, count—that work on all data. This common vocabulary dramatically reduces cognitive overhead. As Alex Miller noted, once you 

“Learn how to use map, filter, remove and you know how to navigate your entire data set.” 

Alex Miller, Principal Software Engineer at Nubank

The same tools work everywhere, making data feel close, transparent, and easy to reshape.
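
A quick illustration of that shared vocabulary across different collections:

(count [1 2 3])            ;=> 3
(count {:a 1 :b 2})        ;=> 2
(get {:a 1 :b 2} :a)       ;=> 1
(get [10 20 30] 1)         ;=> 20
(keys {:a 1 :b 2})         ;=> (:a :b)
(filter odd? #{1 2 3 4})   ;=> a seq of the odd members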

This common language for manipulating data is the key that unlocks Clojure’s central, and remarkably simple, algorithm for solving problems.

The core algorithm for programming

At the very heart of the talk, Alex presented the key slide—a simple, two-step algorithm that encapsulates the entire Clojure development philosophy. It is an idea so powerful that it can fundamentally reshape how one approaches building software.

“There is a very simple algorithm for programming in Clojure: represent your domain as data and write functions that transform that data. That’s it.” Alex Miller, Principal Software Engineer

This two-step model of “data in, data out” is profoundly effective for several reasons:

  • Predictable & Testable: Pure functions that operate on immutable values are the bedrock of predictability. Given the same input, they will always produce the same output, making them trivial to reason about, test in isolation, and debug.
  • Composable: Small, pure functions are like Lego bricks. They can be combined and composed into larger, more complex functions that retain the same desirable properties of predictability and testability.
  • Safe for Concurrency: By eliminating shared, mutable state, this model sidesteps an entire class of notoriously difficult concurrency bugs. There are no race conditions or deadlocks when functions are simply transforming immutable data.
  • Stable & Evolvable: Systems built this way are remarkably stable and easy to evolve. Adding a new feature often just means adding a new key to a map. Existing functions that don’t know about the new key will continue to work, completely unaffected. This allows systems to grow by addition rather than by modification, reducing the risk of breaking existing code.
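
As a minimal illustration of “data in, data out” (the domain keys here are made up):

(defn add-total [order]
  (assoc order :total (reduce + (map :price (:items order)))))

(add-total {:id 1 :items [{:price 10} {:price 5}]})
;=> {:id 1, :items [{:price 10} {:price 5}], :total 15}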

    This might sound elegant in theory, but its true power becomes clear when applied to real-world problems.

    From to-do lists to music: data in action

    To demonstrate the core algorithm in practice, Alex walked through a series of diverse examples, showing how any domain can be modeled as data and manipulated with functions.

    A simple to-do list

    A to-do item is not an object with methods; it is simply a map containing keys like :task, :errand, and :duration. Miller’s advice here is a cornerstone of pragmatic Clojure design: 

    “Always have a bottom data layer that bottoms out in verbose maps. You’ll thank me later.”

    Alex Miller, Principal Software Engineer

    From this foundation, a common pattern emerges: write a small function to handle a single item, then another to handle the collection. The task of transforming to-do items into HTML follows this perfectly, with one function converting a single map into a Hiccup data structure, and a second mapping that function over a collection. This emphasis on small, single-purpose functions is pervasive. As Miller observed, “it’s not uncommon to look at a Clojure codebase and see that… the average size of a function is five lines.”
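
    A minimal sketch of that pattern, with made-up keys and plain Hiccup-style vectors:

    (defn todo->hiccup [{:keys [task duration]}]
      [:li task " (" duration " min)"])

    (defn todos->hiccup [todos]
      [:ul (map todo->hiccup todos)])

    (todos->hiccup [{:task "Buy milk"   :duration 15}
                    {:task "Write post" :duration 90}])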

    Conway’s game of life

    The infinite grid of Conway’s Game of Life is elegantly represented not with a massive, two-dimensional array of zeros and ones, but with a sparse representation: a simple set containing the [x, y] coordinates of “live” cells. This data modeling choice is incredibly powerful. The entire game logic becomes a pipeline of pure functions that takes one set of coordinates as input and produces the next set as output, implementing the game’s rules with mathematical clarity.
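
    Here is a compact sketch of that idea, a variant of a well-known Clojure formulation of the Life step function:

    (defn neighbors [[x y]]
      (for [dx [-1 0 1] dy [-1 0 1] :when (not= 0 dx dy)]
        [(+ x dx) (+ y dy)]))

    (defn step [live]
      (set (for [[cell n] (frequencies (mapcat neighbors live))
                 :when (or (= n 3) (and (= n 2) (live cell)))]
             cell)))

    ;; the "blinker" oscillates between a vertical and a horizontal bar
    (step #{[1 0] [1 1] [1 2]}) ;=> #{[0 1] [1 1] [2 1]}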

    Graphics and music as data

    The pattern extends even to creative domains. A circle on a screen can be represented as a map with :x, :y, :radius, and :color keys. Similarly, a musical note can be a map with :pitch, :octave, and :duration. By modeling these entities as data, Alex Miller was able to build a mini-synth on stage, using a sequence of maps to play the iconic five-note theme from “Close Encounters of the Third Kind.”
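
    For example, the melody can be nothing more than a vector of maps (the pitches here are illustrative):

    (def theme
      [{:pitch :d :octave 4 :duration 1}
       {:pitch :e :octave 4 :duration 1}
       {:pitch :c :octave 4 :duration 1}
       {:pitch :c :octave 3 :duration 1}
       {:pitch :g :octave 3 :duration 2}])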

    In every case, the solution was found not by designing complex class hierarchies, but by first asking,  “What is the simplest data structure that can represent this problem?”, and then applying simple, composable functions to transform it.

    While this model covers the vast majority of programming tasks, Clojure also provides a principled approach for the rare moments when values truly need to change over time.

    Managing change: a principled approach to state

    Inevitably, some parts of an application must change. A user’s session, a database connection, a UI component’s state—these all require managing change over time. For these scenarios, Clojure provides a clear answer that avoids the chaos of unmanaged mutability. The goal is to create a “stable logical identity that’s going to have different immutable values over time.”

    Clojure offers a small set of stateful constructs for this purpose, primarily atoms, refs, and agents. Instead of directly mutating the value they hold, you provide them with a pure function that computes the next state from the current one. This design means developers don’t perform manual locking; reads are always available without coordination, and writes are managed safely by the Clojure runtime.
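
    A minimal sketch with an atom, the most common of these constructs:

    (def session (atom {:clicks 0}))

    ;; swap! applies a pure function to the current value to compute the next one
    (swap! session update :clicks inc)

    @session ;=> {:clicks 1}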

    What’s most telling is how this plays out in practice. When Clojure was created, its Software Transactional Memory (STM) system, built around refs, was seen as a flagship feature. Yet, over years of production use, a surprising truth emerged. As Miller noted, developers discovered that they “can get by with an amazingly little amount of state in your system.” The disciplined, data-first approach naturally leads to architectures where the vast majority of the code is pure and stateless, reinforcing the core philosophy of the language.

    Conclusion: building on rock, not on sand

    The core message of Alex’s talk is a call for a return to simplicity. The philosophy is clear: build systems on a foundation of immutable values and pure functions, and manage state only when absolutely necessary. This approach yields programs that are easier to understand, grow, and maintain.

    Miller concluded with a powerful metaphor that perfectly captures the difference between this approach and more conventional, mutable-state-heavy paradigms. Following this philosophy is like “building on bedrock instead of building on sand.”

    The stability and clarity that come from this foundation are not just technical achievements; they are the source of genuine professional joy. It allows developers to focus on solving problems rather than fighting with the accidental complexity of their tools. The standing ovation Alex received was not just for a well-delivered talk, but for articulating a philosophy that brings the joy back into the craft of programming.

    The post Designing real systems with immutable data in Clojure appeared first on Building Nubank.

    Permalink

    I am sorry, but everyone is getting syntax highlighting wrong

    Translations: Russian

    Syntax highlighting is a tool. It can help you read code faster. Find things quicker. Orient yourself in a large file.

    Like any tool, it can be used correctly or incorrectly. Let’s see how to use syntax highlighting to help you work.

    Christmas Lights Diarrhea

    Most color themes have a unique bright color for literally everything: one for variables, another for language keywords, constants, punctuation, functions, classes, calls, comments, etc.

    Sometimes it gets so bad one can’t see the base text color: everything is highlighted. What’s the base text color here?

    The problem with that is, if everything is highlighted, nothing stands out. Your eye adapts and considers it a new norm: everything is bright and shiny, and instead of getting separated, it all blends together.

    Here’s a quick test. Try to find the function definition here:

    and here:

    See what I mean?

    So yeah, unfortunately, you can’t just highlight everything. You have to make decisions: what is more important, what is less. What should stand out, what shouldn’t.

    Highlighting everything is like assigning “top priority” to every task in Linear. It only works if most of the tasks have lesser priorities.

    If everything is highlighted, nothing is highlighted.

    Enough colors to remember

    There are two main use-cases you want your color theme to address:

    1. Look at something and tell what it is by its color (you can tell by reading text, yes, but why do you need syntax highlighting then?)
    2. Search for something. You want to know what to look for (which color).

    1 is a direct index lookup: color → type of thing.

    2 is a reverse lookup: type of thing → color.

    Truth is, most people don’t do these lookups at all. They might think they do, but in reality, they don’t.

    Let me illustrate. Before:

    After:

    Can you see it? I misspelled return for retunr and its color switched from red to purple.

    I can’t.

    Here’s another test. Close your eyes (not yet! Finish this sentence first) and try to remember what color your color theme uses for class names?

    Can you?

    If the answer for both questions is “no”, then your color theme is not functional. It might give you comfort (as in—I feel safe. If it’s highlighted, it’s probably code) but you can’t use it as a tool. It doesn’t help you.

    What’s the solution? Have an absolute minimum of colors. So little that they all fit in your head at once. For example, my color theme, Alabaster, only uses four:

    • Green for strings
    • Purple for constants
    • Yellow for comments
    • Light blue for top-level definitions

    That’s it! And I was able to type it all from memory, too. This minimalism allows me to actually do lookups: if I’m looking for a string, I know it will be green. If I’m looking at something yellow, I know it’s a comment.

    Limit the number of different colors to what you can remember.

    If you swap green and purple in my editor, it’ll be a catastrophe. If somebody swapped colors in yours, would you even notice?

    What should you highlight?

    Something there isn’t a lot of. Remember—we want highlights to stand out. That’s why I don’t highlight variables or function calls—they are everywhere, your code is probably 75% variable names and function calls.

    I do highlight constants (numbers, strings). These are usually used more sparingly and often are reference points—a lot of logic paths start from constants.

    Top-level definitions are another good idea. They give you an idea of a structure quickly.

    Punctuation: it helps to separate names from syntax a little bit, and you care about names first, especially when quickly scanning code.

    Please, please don’t highlight language keywords. class, function, if, else, stuff like this. You rarely look for them: “where’s that if” is a valid question, but you will be looking not at the if keyword, but at the condition after it. The condition is the important, distinguishing part. The keyword is not.

    Highlight names and constants. Grey out punctuation. Don’t highlight language keywords.

    Comments are important

    The tradition of using grey for comments comes from the times when people were paid by line. If you have something like

    of course you would want to grey it out! This is bullshit text that doesn’t add anything and was written to be ignored.

    But for good comments, the situation is opposite. Good comments ADD to the code. They explain something that couldn’t be expressed directly. They are important.

    So here’s another controversial idea:

    Comments should be highlighted, not hidden away.

    Use bold colors, draw attention to them. Don’t shy away. If somebody took the time to tell you something, then you want to read it.

    Two types of comments

    Another secret nobody is talking about is that there are two types of comments:

    1. Explanations
    2. Disabled code

    Most languages don’t distinguish between those, so there’s not much you can do syntax-wise. Sometimes there’s a convention (e.g. -- vs /* */ in SQL), then use it!

    Here’s a real example from Clojure codebase that makes perfect use of two types of comments:

    Disabled code is gray, explanation is bright yellow

    Light or dark?

    Per statistics, 70% of developers prefer dark themes. Being in the other 30%, that question always puzzled me. Why?

    And I think I have an answer. Here’s a typical dark theme:

    and here’s a light one:

    On the latter one, colors are way less vibrant. Here, I picked them out for you:

    Notice how many colors there are. No one can remember that many.

    This is because dark colors are in general less distinguishable and more muddy. Look at the hue scale as we move brightness down:

    Basically, in the dark part of the spectrum, you just get fewer colors to play with. There’s no “dark yellow” or good-looking “dark teal”.

    Nothing can be done here. There are no magic colors hiding somewhere that have both good contrast on a white background and look good at the same time. By choosing a light theme, you are dooming yourself to a very limited, bad-looking, barely distinguishable set of dark colors.

    So it makes sense. Dark themes do look better. Or rather: light ones can’t look good. Science ¯\_(ツ)_/¯

    But!

    But.

    There is one trick you can do, that I don’t see a lot of. Use background colors! Compare:

    The first one has nice colors, but the contrast is too low: letters become hard to read.

    The second one has good contrast, but you can barely see colors.

    The last one has both: high contrast and clean, vibrant colors. Lighter colors are readable even on a white background since they fill a lot more area. Text is the same brightness as in the second example, yet it gives the impression of clearer color. It’s all upside, really.

    UI designers have known about this trick for a while, but I rarely see it applied in code editors:

    If your editor supports choosing background color, give it a try. It might open light themes for you.

    Bold and italics

    Don’t use. This goes into the same category as too many colors. It’s just another way to highlight something, and you don’t need too many, because you can’t highlight everything.

    In theory, you might try to replace colors with typography. Would that work? I don’t know. I haven’t seen any examples.

    Using italics and bold instead of colors

    Myth of number-based perfection

    Some themes pay too much attention to being scientifically uniform. Like, all colors have the same exact lightness, and hues are distributed evenly on a circle.

    This could be nice (to know if you have OCD), but in practice, it doesn’t work as well as it sounds:

    OkLab l=0.7473 c=0.1253 h=0, 45, 90, 135, 180, 225, 270, 315

    The idea of highlighting is to make things stand out. If you make all colors the same lightness and chroma, they will look very similar to each other, and it’ll be hard to tell them apart.

    Our eyes are way more sensitive to differences in lightness than in color, and we should use it, not try to negate it.

    Let’s design a color theme together

    Let’s apply these principles step by step and see where it leads us. We start with the theme from the start of this post:

    First, let’s remove highlighting from language keywords and re-introduce base text color:

    Next, we remove color from variable usage:

    and from function/method invocation:

    The thinking is that your code is mostly references to variables and method invocation. If we highlight those, we’ll have to highlight more than 75% of your code.

    Notice that we’ve kept variable declarations. These are not as ubiquitous and help you quickly answer a common question: where does this thing come from?

    Next, let’s tone down punctuation:

    I prefer to dim it a little bit because it helps names stand out more. Names alone can give you the general idea of what’s going on, and the exact configuration of brackets is rarely equally important.

    But you might roll with base color punctuation, too:

    Okay, getting close. Let’s highlight comments:

    We don’t use red here because you usually need it for squiggly lines and errors.

    This is still one color too many, so I unify numbers and strings to both use green:

    Finally, let’s rotate colors a bit. We want to respect nesting logic, so function declarations should be brighter (yellow) than variable declarations (blue).

    Compare with what we started:

    In my opinion, we got a much more workable color theme: it’s easier on the eyes and helps you find stuff faster.

    Shameless plug time

    I’ve been applying these principles for about 8 years now.

    I call this theme Alabaster and I’ve built it a couple of times for the editors I used:

    It’s also been ported to many other editors and terminals; the most complete list is probably here. If your editor is not on the list, try searching for it by name—it might be built-in already! I always wondered where these color themes come from, and now I became an author of one (and I still don’t know).

    Feel free to use Alabaster as is or build your own theme using the principles outlined in the article—either is fine by me.

    As for the principles themselves, they worked out fantastically for me. I’ve never wanted to go back, and just one look at any “traditional” color theme gives me a scare now.

    I suspect that the only reason we don’t see more restrained color themes is that people never really thought about it. Well, this is your wake-up call. I hope this will inspire people to use color more deliberately and to change the default way we build and use color themes.

    Permalink

    Statistics made simple

    I have a weird relationship with statistics: on one hand, I try not to look at it too often. Maybe once or twice a year. It’s because analytics is not actionable: what difference does it make if a thousand people saw my article or ten thousand?

    I mean, sure, you might try to guess people’s tastes and only write about what’s popular, but that will destroy your soul pretty quickly.

    On the other hand, I feel nervous when something is not accounted for, recorded, or saved for future reference. I might not need it now, but what if ten years later I change my mind?

    Seeing your readers also helps to know you are not writing into the void. So I really don’t need much, something very basic: the number of readers per day/per article, maybe, would be enough.

    Final piece of the puzzle: I self-host my web projects, and I use an old-fashioned web server instead of delegating that task to Nginx.

    Static sites are popular and for a good reason: they are fast, lightweight, and fulfil their function. I, on the other hand, might have an unfinished gestalt or two: I want to feel the full power of the computer when serving my web pages, to be able to do fun stuff that is beyond static pages. I need that freedom that comes with a full programming language at your disposal. I want to program my own web server (in Clojure, sorry everybody else).

    Existing options

    All this led me on a quest for a statistics solution that would uniquely fit my needs. Google Analytics was out: bloated, not privacy-friendly, terrible UX, Google is evil, etc.

    What is going on?

    Some other JS solution might’ve been possible, but still questionable: SaaS? Paid? Will they be around in 10 years? Self-host? Are their cookies GDPR-compliant? How to count RSS feeds?

    Nginx has access logs, so I tried server-side statistics that feed off those (namely, Goatcounter). Easy to set up, but then I needed to create domains for them, manage accounts, monitor the process, and it wasn’t even performant enough on my server/request volume!

    My solution

    So I ended up building my own. You are welcome to join, if your constraints are similar to mine. This is how it looks:

    It’s pretty basic, but does a few things that were important to me.

    Setup

    Extremely easy to set up. And I mean it as a feature.

    Just add our middleware to your Ring stack and get everything automatically: collecting and reporting.

    (def app
      (-> routes
        ...
        (ring.middleware.params/wrap-params)
        (ring.middleware.cookies/wrap-cookies)
        ...
        (clj-simple-stats.core/wrap-stats))) ;; <-- just add this

    It’s zero setup in the best sense: nothing to configure, nothing to monitor, minimal dependency. It starts to work immediately and doesn’t ask anything from you, ever.

    See, you already have your web server, why not reuse all the setup you did for it anyway?

    Request types

    We distinguish between request types. In my case, I am only interested in live people, so I count them separately from RSS feed requests, favicon requests, redirects, wrong URLs, and bots. Bots are particularly active these days. Gotta get that AI training data from somewhere.

    RSS feeds are live people in a sense, so extra work was done to count them properly. The same reader requesting feed.xml 100 times in a day will only count as one request.

    Hosted RSS readers often report user count in User-Agent, like this:

    Feedly/1.0 (+http://www.feedly.com/fetcher.html; 457 subscribers; like FeedFetcher-Google)
    
    Mozilla/5.0 (compatible; BazQux/2.4; +https://bazqux.com/fetcher; 6 subscribers)
    
    Feedbin feed-id:1373711 - 142 subscribers

    My personal respect and thank you to everybody on this list. I see you.

    Graphs

    Visualization is important, and so is choosing the correct graph type. This is wrong:

    A continuous line suggests interpolation: it reads as if, between 1 visit at 5am and 11 visits at 6am, there were points with 2, 3, 5, 9 visits in between. Maybe 5.5 visits, even! That is not the case.

    This is how a semantically correct version of that graph should look:

    Some attention was also paid to having reasonable labels on axes. You won’t see something like 117, 234, 10875. We always choose round numbers appropriate to the scale: 100, 200, 500, 1K etc.

    It goes without saying that all graphs have the same vertical scale and synchronized horizontal scroll.

    Insights

    We don’t offer much (as I don’t need much), but you can narrow reports down by page, query, referrer, user agent, and any date slice.

    Not implemented (yet)

    It would be nice to have some insights into “What was this spike caused by?”

    Some basic breakdown by country would be nice. I do have IP addresses (for what they are worth), but I need a way to package GeoIP into some reasonable size (under 1 Mb, preferably; some loss of resolution is okay).

    Finally, one thing I am really interested in is “Who wrote about me?” I do have referrers, only question is how to separate signal from noise.

    Performance. DuckDB is a sport: it compresses data and runs column queries, so storing extra columns per row doesn’t affect query performance. Still, each dashboard hit is a query across the entire database, which at this moment (~3 years of data) sits around 600 MiB. I definitely need to look into building some pre-calculated aggregates.

    One day.

    How to get

    Head to github.com/tonsky/clj-simple-stats and follow the instructions:

    Let me know what you think! Is it usable to you? What could be improved?

    P.S. You can try the live example at tonsky.me/stats. The data was imported from Nginx access logs, which I turned on and off on a few occasions, so it’s a bit spotty. Still, it should give you a general idea.

    Permalink

    Build a ClojureScript Application Running on Node Using Nix

    To build a ClojureScript application, we need to pull dependencies from two package managers: JavaScript packages from npm and Clojure packages from deps.edn. After pulling the dependencies, we can let shadow-cljs compile the ClojureScript files into a single JavaScript file and package the result as an npm package. In this post, we rely on nixpkgs's builtin fetchNpmDeps and buildNpmPackage for the npm side, and clj-nix for the Clojure side.

    • changelog Jan 17, 2026: use a new method to build the nix package, supports git repos, reduces output size.

    Prerequisites

    I will skip the details that are not unique to Nix. The following assumes we already have a working shadow-cljs project that can produce a Node.js target when running npm run release, which basically means (a rough sketch follows the list):

    • a :node-script target in shadow-cljs.edn, with :main function specified
    • a release script in package.json, most probably shadow-cljs release script
    • an entry script specified in bin in package.json.
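
    As a rough sketch (the build id, namespace, and paths are illustrative), the shadow-cljs.edn side might look like this, with the release script in package.json simply running shadow-cljs release app:

    ;; shadow-cljs.edn
    {:deps true                       ; resolve Clojure deps via deps.edn
     :builds
     {:app {:target    :node-script
            :main      my.app.core/-main
            :output-to "out/app.js"}}}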

    Prefetch and generate lock file for dependencies

    During the build phase of a Nix package, the process cannot perform any network requests, as that would defeat the whole point of determinism. The common practice is to use a lock file, which is essentially a recipe of all the necessary resources (and their checksums) for the build process. We can then prefetch all the resources into the Nix store and tell the build process where to find them during the actual build.

    On the npm side, we do it by running the following command. It will output a sha256 hash; save it for later.

    nix run nixpkgs#prefetch-npm-deps package-lock.json
    

    On the Clojure side, we execute the following command. It will generate a deps-lock.json file, which we will refer to in flake.nix.

    nix run github:jlesquembre/clj-nix#deps-lock
    

    Build with shadow-cljs and mkCljLib

    We need to add clj-nix's overlays to our flake.nix to be able to use mkCljLib.

    pkgs = import nixpkgs {
      inherit system;
      overlays = [
        clj-nix.overlays.default
      ];
    };
    

    First, we create a node_modules derivation which we will use later.

    deps-hash = "deps-hash"; # placeholder: replace with the sha256 from prefetch-npm-deps
    
    npm-deps = pkgs.buildNpmPackage(finalAttrs: {
      src = self;
      pname = "package-npm-deps";
      version = "0.0";
      dontNpmBuild = true;
      npmDepsHash = deps-hash;
      installPhase = ''
      mkdir $out
      cp -r ./node_modules $out/node_modules
      '';
    });
    

    We will use mkCljLib to set up Clojure dependencies for us, while we copy JavaScript dependencies from the earlier step manually.

    build = pkgs.mkCljLib {
      projectSrc = self;
      name = "espoir";
      buildCommand = "
      # copy node_modules from earlier step to build directory
      cp -r ${npm-deps}/node_modules ./node_modules
      ${pkgs.nodejs}/bin/npm run release
      ";
      installPhase = ''
      mkdir -p $out
      cp -R * $out/
      '';
    };
    

    In the buildCommand, we set up the JavaScript dependencies and call npm run release, which in turn calls shadow-cljs to build the project.

    In installPhase, we override the install process provided by mkCljLib. Instead, we simply use all the files in the build directory as output.

    Package results as a npm package

    We can directly feed the output from the earlier step to buildNpmPackage. Since we already built the package, we will set dontNpmBuild = true;.

    The install process provided by buildNpmPackage will package necessary dependencies and generate an executable calling the entry script.

    packages.default = pkgs.buildNpmPackage {
      pname = "espoir";
      src = build;
      dontNpmBuild = true;
      version = "0.0.1";
      npmDepsHash = deps-hash;
    };
    

    Tips

    Permalink

    Senior Backend Engineer (f/m/d) at HolidayPirates GmbH


    €64,000 - €75,000

    AHOY MATE!

    Are you a backend developer passionate about Clojure and functional programming?

    Do you enjoy solving complex problems and building clean, maintainable systems?

    Yo-ho-ho! Then you are the pirate we are looking for!

    As a Senior Backend Developer, you’ll be a key part of our backend team, designing, building, and maintaining some of the core applications of our platform. You’ll collaborate closely with engineers, product managers, and stakeholders to develop scalable, high-quality solutions using Clojure and other modern technologies.

    DUTIES ON DECK
    • Architect, design, and implement backend services in Clojure.
    • Build scalable APIs and services to support high-throughput, low-latency applications.
    • Improve system performance, reliability, and observability.
    • Collaborate with product and engineering teams to deliver well-designed solutions.
    • Maintain and improve existing codebase with a focus on quality and long-term maintainability.
    • Lead technical discussions and contribute to best practices, tooling, and architecture.
    YOUR TREASURE OF EXPERIENCE
    • 5+ years of backend development experience.
    • Working experience in Clojure or already learning it because of interest.
    • Good understanding of functional programming principles and best practices.
    • Experience with relational databases (e.g., PostgreSQL) and caching/key-value stores (e.g., Redis).
    • Experience with event-driven systems.
    • Familiarity with performance monitoring tools and automated testing tools.
    • Knowledge of CI/CD pipelines and DevOps practices.
    • Experience with AWS and IaC tools such as Terraform is a plus.
    • Strong communication skills and the ability to work effectively in a collaborative environment.
    THE PIRATE SHIP OFFERS YOU
    • Inclusive & Diverse: We welcome all pirates from every port; whoever you are, you belong here.
    • Transparent Pay: Salary bands are clearly communicated from the start; no guesswork, just fairness.
    • Work-Life Adventure: Enjoy workations and exclusive travel perks to keep your explorer’s spirit alive.
    • Home Office Allowance: Get €50/month to keep your remote setup smooth sailing.
    • A Ticket Edenred Card: €50/month to keep you fueled, whether you’re snacking at sea or lunching in port.
    • Tools: A MacBook for work and personal use, plus any extra equipment you need (and a very lovely remote IT support crew).
    • Wellbeing Support: Access free life coaches, psychologists, and nutritionists to support your mental and physical health.
    • Learning & Growth: Use your personal training budget to level up your skills at your own pace.
    • Extra Perks: Private travel insurance, and corporate discounts with brands like Adidas, Apple, LG, and more.
    • A Truly Global Crew: Work alongside inspiring crewmates from across the globe and drop anchor in our Berlin Mitte office whenever you like (bonus: it’s dog friendly, so feel free to bring your mate!).
    • Legendary Events: From our annual summit to team get-togethers, we celebrate well and often.
    • Visa Assistance: We support relocation and visa processes for international candidates.

    Permalink

    Stop Round-Tripping Your Codebase: How to Cut LLM Token Usage by 80% Using Recursive Document Analysis

    When you employ AI agents such as Claude, document analysis runs into a significant volume problem. Reading one file of 1,000 lines consumes about 10,000 tokens, and token consumption incurs both cost and time penalties. Codebases with dozens or hundreds of files, a common case for real-world projects, can easily exceed 100,000 tokens when the whole thing must be considered. The agent must read and comprehend these files and determine how they relate to one another. And when the task requires multiple passes over the same documents, perhaps one pass to map the structure and another to mine the details, costs multiply rapidly.

    Matryoshka is a document analysis tool that achieves over 80% token savings while enabling interactive, exploratory analysis. Its key insight is to save tokens by caching and reusing past analysis results, so the same document lines never have to be processed again. These ideas draw on recent research and on retrieval-augmented generation, with a focus on efficiency. We'll see how Matryoshka unifies them into one system that maintains a persistent analytical state. Finally, we'll look at some real-world results from analyzing the anki-connect codebase.

    The Problem: Context Rot and Token Costs

    A common task is to analyze a codebase to answer a question such as “What is the API surface of this project?” Such work includes identifying and cataloguing all the entry points exposed by the codebase.

    Traditional approach:

    1. Read all source files into context (~95,000 tokens for a medium project)
    2. The LLM analyzes the entire codebase’s structure and component relationships
    3. For follow-up questions, the full context is round-tripped every turn

    This creates two problems:

    Token Costs Compound

    Every turn, the entire context has to go to the API. In a 10-turn conversation about a 7,000-line codebase, almost a million tokens might be processed by the system, and most of them are the same document contents being dutifully resent over and over with every new question. This redundancy is a massive waste: it forces the model to reprocess the same blocks of text repeatedly rather than concentrate on what is actually novel.

    Context Rot Degrades Quality

    As described in the Recursive Language Models paper, even the most capable models exhibit context degradation: their performance declines as input length grows. The deterioration is task-dependent and tied to task complexity. In information-dense contexts, where the correct output requires synthesizing facts scattered across widely dispersed locations in the prompt, the decline can be especially steep. It can set in at relatively modest context lengths, long before the model reaches its maximum token capacity, and is understood to reflect a failure to maintain the connections between large numbers of informational fragments.

    The authors argue that we should not be stuffing entire documents into the prompt, since this clutters the model's context and compromises its performance. Instead, documents should be treated as external environments with which the LLM can interact: querying them, navigating structured sections, and retrieving specific information on an as-needed basis. This approach treats the document as a separate knowledge base and frees the model from having to hold everything in its head.

    Prior Work: Two Key Insights

    Matryoshka builds on two research directions:

    Recursive Language Models (RLM)

    The RLM paper introduces a methodology that treats documents as external state that can be queried step by step, without ever loading them in their entirety. Symbolic operations (search, filter, aggregate) are issued against this state, and only the specific, relevant results are returned, keeping the context window small while permitting analysis of arbitrarily large documents.

    The key point is that the documents stay outside the model, and only search results enter the context. This separation of concerns means the model never sees complete files; instead, it issues searches to retrieve the information it needs.

    Barliman: Synthesis from Examples

    Barliman, a tool developed by William Byrd and Greg Rosenblatt, shows that program synthesis is possible without asking for precise code specifications. Instead, input/output examples are given, and a solver built on a relational programming system in the spirit of miniKanren is used to synthesize functions that satisfy them. The system interprets the examples as constraints, and the synthesis engine searches for a function that meets them. This makes it possible to describe what is desired through concrete test cases.

    The approach is simply to show examples of the behavior one wishes the system to exhibit and let it derive the implementation on its own. The emphasis thus shifts from writing long, detailed step-by-step recipes to declaratively stating the desired goal.

    Matryoshka: Combining the Insights

    Matryoshka incorporates these insights into a working system for LLM agents: a practical tool that lets agents decompose challenging tasks into a sequence of smaller, more manageable objectives.

    1. Nucleus: A Declarative Query Language

    Instead of issuing commands, the LLM describes what it wants, using Nucleus, a simple S-expression query language. This changes the focus from describing each step to specifying the desired outcome.

    (grep "class ")           ; Find all class definitions
    (count RESULTS)           ; Count them
    (map RESULTS (lambda x    ; Extract class names
      (match x "class (\\w+)" 1)))
    

    The declarative interface stays robust even when the LLM phrases its requests with different vocabulary or sentence structures, because the system focuses on the underlying intent of a request rather than its surface wording.

    2. Pointer-Based State

    The key new insight is that we can separate the results from the context. Results are now stored in the REPL state, rather than in the context.

    When Claude runs (grep "def ") and gets 150 matches:

    • Traditional tools: All 150 lines are fed into context, and round-tripped every turn
    • Matryoshka: Binds matches to RESULTS in the REPL, returning only "Found 150 results"

    The variable RESULTS is bound to the actual value in the REPL. This binding acts as a pointer to the data in the server's memory. Subsequent operations, whether further queries or transformations, use this reference to access the data, but the data itself never enters the conversation:

    Turn 1: (grep "def ")         → Server stores 150 matches as RESULTS
                                  → Context gets: "Found 150 results"
    
    Turn 2: (count RESULTS)       → Server counts its local RESULTS
                                  → Context gets: "150"
    
    Turn 3: (filter RESULTS ...)  → Server filters locally
                                  → Context gets: "Filtered to 42 results"
    

    The LLM never sees the 150 function definitions, just the aggregated answers derived from them.

    3. Synthesis from Examples

    When queries need custom parsing, Matryoshka synthesizes functions from examples:

    (synthesize_extractor
      "$1,250.00" 1250.00
      "€500" 500
      "$89.99" 89.99)
    

    The synthesizer learns the pattern directly from the examples, extracting numerical values straight from the currency strings and circumventing the need to write a regex by hand.

    The Lifecycle

    A typical Matryoshka session:

    1. Load Document

    (load "./plugin/__init__.py")
    → "Loaded: 2,244 lines, 71.5 KB"
    

    The document is parsed and stored server-side. Only metadata enters the context.

    2. Query Incrementally

    (grep "@util.api")
    → "Found 122 results, bound to RESULTS"
       [402] @util.api()
       [407] @util.api()
       ... (showing first 20)
    

    Each query returns a preview plus the count. Full data stays on server.

    3. Chain Operations

    (count RESULTS)           → 122
    (filter RESULTS ...)      → "Filtered to 45 results"
    (map RESULTS ...)         → Transforms bound to RESULTS
    

    Operations chain through the RESULTS binding. Each step refines without re-querying.

    4. Close Session

    (close)
    → "Session closed, memory freed"
    

    Sessions auto-expire after 10 minutes of inactivity.

    How Agents Discover and Use Matryoshka

    Matryoshka integrates with LLM agents via the Model Context Protocol (MCP).

    Tool Discovery

    When Claude Code starts, it launches Matryoshka as an MCP server and receives a tool manifest:

    {
      "tools": [
        {
          "name": "lattice_load",
          "description": "Load a document for analysis..."
        },
        {
          "name": "lattice_query",
          "description": "Execute a Nucleus query..."
        },
        {
          "name": "lattice_help",
          "description": "Get Nucleus command reference..."
        }
      ]
    }
    

    Claude sees the available tools and their descriptions. When a user asks to analyze a file, Claude decides which tools to use based on the task.

    Guided Discovery

    The lattice_help tool returns a command reference, teaching the LLM the query language on-demand:

    ; Search commands
    (grep "pattern")              ; Regex search
    (fuzzy_search "query" 10)     ; Fuzzy match, top N
    (lines 10 20)                 ; Get line range
    
    ; Aggregation
    (count RESULTS)               ; Count items
    (sum RESULTS)                 ; Sum numeric values
    
    ; Transformation
    (map RESULTS fn)              ; Transform each item
    (filter RESULTS pred)         ; Keep matching items
    

    The agent learns capabilities incrementally rather than needing upfront training.

    Session Flow

    User: "How many API endpoints does anki-connect have?"
    
    Claude: [Calls lattice_load("plugin/__init__.py")]
            → "Loaded: 2,244 lines"
    
    Claude: [Calls lattice_query('(grep "@util.api")')]
            → "Found 122 results"
    
    Claude: [Calls lattice_query('(count RESULTS)')]
            → "122"
    
    Claude: "The anki-connect plugin exposes 122 API endpoints,
             decorated with @util.api()."
    

    State persists across tool invocations within the conversation. So, for example, when a document is loaded, its content stays in server memory; similarly, the results of any executed query remain bound and available for later calls.

    Real-World Example: Analyzing anki-connect

    Let's walk through a complete analysis of the anki-connect Anki plugin. Here we have a real-world codebase with 7,770 lines across 17 files.

    The Task

    "Analyze the anki-connect codebase: find all classes, count API endpoints, extract configuration defaults, and document the architecture."

    The Workflow

    The agent uses Matryoshka's prompt hints to accomplish the following workflow:

    1. Discover files with Glob
    2. Read small files directly (<300 lines)
    3. Use Matryoshka for large files (>500 lines)
    4. Aggregate across all files

    Step 1: File Discovery

    Glob **/*.py → 15 Python files
    Glob **/*.md → 2 markdown files
    
    File sizes:
      plugin/__init__.py    2,244 lines  → Matryoshka
      plugin/edit.py          458 lines  → Read directly
      plugin/web.py           301 lines  → Read directly
      plugin/util.py          107 lines  → Read directly
      README.md             4,660 lines  → Matryoshka
      tests/*.py           11 files      → Skip (tests)
    

    Step 2: Read Small Files

    Reading util.py (107 lines) reveals configuration defaults:

    DEFAULT_CONFIG = {
        'apiKey': None,
        'apiLogPath': None,
        'apiPollInterval': 25,
        'apiVersion': 6,
        'webBacklog': 5,
        'webBindAddress': '127.0.0.1',
        'webBindPort': 8765,
        'webCorsOrigin': None,
        'webCorsOriginList': ['http://localhost'],
        'ignoreOriginList': [],
        'webTimeout': 10000,
    }
    

    Reading web.py (301 lines) reveals the server architecture:

    • Classes: WebRequest, WebClient, WebServer
    • JSON-RPC style API with jsonschema validation
    • CORS support with configurable origins

    Step 3: Query Large Files with Matryoshka

    ; Load the main plugin file
    (load "plugin/__init__.py")
    → "Loaded: 2,244 lines, 71.5 KB"
    
    ; Find all classes
    (grep "^class ")
    → "Found 1 result: [65] class AnkiConnect:"
    
    ; Count methods
    (grep "def \\w+\\(self")
    → "Found 148 results"
    
    ; Count API endpoints
    (grep "@util.api")
    → "Found 122 results"
    
    ; Load README for documentation
    (load "README.md")
    → "Loaded: 4,660 lines, 107.2 KB"
    
    ; Find documented action categories
    (grep "^### ")
    → "Found 13 sections"
       [176] ### Card Actions
       [784] ### Deck Actions
       [1231] ### Graphical Actions
       ...
    

    Complete Findings

    Metric                   Value
    Total files              17 (15 .py + 2 .md)
    Total lines              7,770
    Classes                  8 (1 main + 3 web + 4 edit)
    Instance methods         148
    API endpoints            122
    Config settings          11
    Imports                  48
    Documentation sections   8 categories, 120 endpoints

    Token Usage Comparison

    Approach          Lines Processed   Tokens Used   Coverage
    Read everything   7,770             ~95,000       100%
    Matryoshka only   6,904             ~6,500        65%
    Hybrid            7,770             ~17,000       100%

    The hybrid method achieves an 82% savings in tokens (~17,000 versus ~95,000) while retaining 100% of the original coverage. It combines two strategies: Matryoshka queries that compress what the model has to see from large files, and direct reads that preserve the unique details in small ones.

    The pure Matryoshka approach misses details from the small files (configuration defaults, web server classes), because Claude only uses the tool to query the large ones. The hybrid workflow reads small files directly and in full, while leveraging Matryoshka to analyze the bigger files, in a kind of divide-and-conquer strategy. All that's needed is an explicit hint to the agent about which strategy to use.

    Why Hybrid Works

    Small files (<300 lines) contain critical details:

    • util.py: All configuration defaults, the API decorator implementation
    • web.py: Server architecture, CORS handling, request schema

    These fit comfortably in context, and there's no need to do anything different. Matryoshka adds value for:

    • __init__.py (2,244 lines): Query specific patterns without loading everything
    • README.md (4,660 lines): Search documentation sections on demand

    Architecture

    ┌─────────────────────────────────────────────────────────┐
    │                     Adapters                            │
    │  ┌──────────┐  ┌──────────┐  ┌───────────────────────┐  │
    │  │   Pipe   │  │   HTTP   │  │   MCP Server          │  │
    │  └────┬─────┘  └────┬─────┘  └───────────┬───────────┘  │
    │       │             │                    │              │
    │       └─────────────┴────┬───────────────┘              │
    │                          │                              │
    │                ┌─────────┴─────────┐                    │
    │                │   LatticeTool     │                    │
    │                │   (Stateful)      │                    │
    │                │   • Document      │                    │
    │                │   • Bindings      │                    │
    │                │   • Session       │                    │
    │                └─────────┬─────────┘                    │
    │                          │                              │
    │                ┌─────────┴─────────┐                    │
    │                │  NucleusEngine    │                    │
    │                │  • Parser         │                    │
    │                │  • Type Checker   │                    │
    │                │  • Evaluator      │                    │
    │                └─────────┬─────────┘                    │
    │                          │                              │
    │                ┌─────────┴─────────┐                    │
    │                │    Synthesis      │                    │
    │                │  • Regex          │                    │
    │                │  • Extractors     │                    │
    │                │  • miniKanren     │                    │
    │                └───────────────────┘                    │
    └─────────────────────────────────────────────────────────┘
    

    Getting Started

    Install from npm:

    npm install matryoshka-rlm
    

    As MCP Server (Claude Code / Claude Desktop)

    Add to your Claude configuration:

    {
      "mcpServers": {
        "lattice": {
          "command": "npx",
          "args": ["lattice-mcp"]
        }
      }
    }
    

    Programmatic Use

    import { NucleusEngine } from "matryoshka-rlm";
    
    const engine = new NucleusEngine();
    await engine.loadFile("./document.txt");
    
    const result = engine.execute('(grep "pattern")');
    console.log(result.value); // Array of matches
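
    If the engine keeps REPL state between execute() calls, as the REPL transcript below suggests, follow-up queries can chain off the RESULTS binding. This is a hedged sketch rather than documented API behavior:

    // Assumes `engine` retains the RESULTS binding across execute() calls
    const matches = engine.execute('(grep "TODO")');
    console.log(matches.value.length);   // number of matching lines

    const total = engine.execute('(count RESULTS)');
    console.log(total.value);            // same count, computed server-side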
    

    Interactive REPL

    npx lattice-repl
    lattice> :load ./data.txt
    lattice> (grep "ERROR")
    lattice> (count RESULTS)
    

    Conclusion

    Matryoshka embodies the principle, emerging from RLM research, that documents should be treated as external environments rather than as contexts to be parsed. This changes the model's engagement fundamentally: no longer a passive reader, it becomes an active agent that navigates and interrogates a document to extract specific information, much as a programmer browses code. Combined with Barliman-style synthesis from examples and pointer-based state management, it achieves:

    • 82% token savings on real-world codebase analysis
    • 100% coverage when combined with direct reads for small files
    • Incremental exploration where each query builds on previous results
    • No context rot because documents stay outside the model

    Variable bindings such as RESULTS refer to REPL state rather than holding data directly in the model's context. The queries we submit carry only these references, placeholders indicating where the actual computation should occur; the server performs the substantive work and returns only the distilled results.

    The tool is open source: https://github.com/yogthos/Matryoshka

    Permalink

    Java’s Plans for 2026 and new curiosities from JDK Mailing Lists - JVM Weekly vol. 159

    New year, new plans, new promises we’ll look back on in twelve months with a mixture of nostalgia and disappointment.

    But rather than dwelling too much on those New Year’s resolutions - there’ll be plenty of opportunity for that - let’s take a look at what OpenJDK teams themselves are saying about their plans for 2026.


    OpenJDK Plans for 2026 - Valhalla Getting Closer, Amber Not Slowing Down, Loom Nearly Complete

    I have to admit: Nicolai Parlog from Oracle gave me a pleasant surprise. His New Year’s episode of Inside Java Newscast extracted concrete details from people working on individual projects. An “anonymous source” (whose identity anyone who’s ever heard Brian Goetz will immediately guess) dropped some bombs worth discussing.

    Valhalla is targeting JDK 28. This is probably the most important piece of news. Our Anonymous Source revealed that value types won’t make it into JDK 27 - but not because they aren’t ready.

    “We’re bringing an elephant onto a train and want to make sure we get into an empty car.”

    JDK 27 forks in June, so mainline will switch to 28 then - with room for JEP 401. After value types, the queue holds nullness markers, array improvements, and primitive-wrapper unification - but that’s a perspective for later releases.

    Vector API is waiting for Valhalla. JDK 26 will see its eleventh (!) incubation - and it’ll stay that way until value types land in mainline. When that happens, the implementation will be rewritten and the API moved from jdk.incubator.vector to the proper java package.

    Leyden - AOT code compilation. The AOT cache will contain not just loaded classes and method profiles, but compiled machine code as well. The runtime will be able to pull optimized code straight from the cache, dramatically reducing warmup time.

    Structured Concurrency nearing finalization. After the revamp in JDK 25, the API will go through preview with minimal changes - Nicolai rates the chances of finalization this year as good. This is the last piece of the big Project Loom picture (though some would like to see a bit of “scope creep” here - in the positive way - which we’ll get to shortly).

    Amber - constant patterns and pattern assignment. The team is “knee-deep in the second phase of pattern matching.” Two features are mature enough that JEPs may appear this year. We’ll examine all of this ourselves in a moment since the details from the mailing lists are very interesting, but apparently ideas for generalizing records and pattern matching to classes and interfaces are popping up on amber-spec-experts. So “things are happening.”

    Babylon preparing code reflection incubation. The technology allowing frameworks to reflect over code in methods and lambdas is developing well—we should hear more this year. I’m personally eager for any announcements related to GPU support.


    Those are the official plans. But as usual, the most interesting things happen at the margins—in experimental branches and mailing list discussions. And that’s exactly where Valhalla is showing that value types are just the beginning of a much more ambitious story.

    Type Classes - Valhalla experiments with the next step

    Maurizio Cimadamore announced on the valhalla-dev list the publication of an experimental type classes prototype - a mechanism Brian Goetz presented at JVMLS 2025 in his talk Growing the Java Language. The code landed in a new type-classes branch in the Valhalla repository.

    What problem do type classes solve? Today in Java, you can’t write generic mathematical code that works on int, BigDecimal, and a hypothetical Float16 alike. Interfaces require types to explicitly implement them - you can’t say Integer implements Addable without modifying the Integer class. Type classes (known from Haskell, appearing as traits in Rust) invert this relationship: instead of requiring implementation in the class, you provide an external “witness” that says “here’s how to add values of type X.” Witnesses can be defined for any type, even someone else’s.

    record MyInt(int x) { }
    
    interface Sum<X> {
        X zero();
        X add(X a, X b);
    }
    
    __witness Sum<MyInt> SUM_MYINT = new Sum<>() {
        public MyInt zero() { return new MyInt(0); }
        public MyInt add(MyInt a, MyInt b) { return new MyInt(a.x + b.x); }
    };
    
    // Usage:
    Sum<MyInt> sum = Sum<MyInt>.__witness;
    MyInt zero = sum.zero();
    MyInt one = new MyInt(1);
    assert sum.add(zero, one).equals(one);
    

    In this prototype, you can define a type class and a witness for a specific type. The snippet above shows how an addition type class (Sum) and its witness for a value class MyInt are defined.

    • Sum<X> is a generic interface representing a type class for addition.

    • __witness defines a witness for how MyInt implements that type class externally.

    • The witness can then be looked up and used at runtime.

    This prototype enables external definitions of operations for types without modifying the types themselves, addressing a long-standing limitation in Java: because interfaces require types to implement them explicitly, it is impossible today to write truly generic mathematical code that works uniformly across primitives like int, library types such as BigDecimal, and custom numeric or value types. Type classes invert this relationship by allowing behavior to be attached externally via so-called “witnesses,” making it possible to state “here’s how type X does addition” without editing X at all.

    Goetz explained the broader vision at JVMLS: type classes are meant to enable operator overloading for value types, collection literals, or new numeric types with full support for +, -, * - but without the operator hell known from C++, since they’d be limited to value classes only.

    [Meme: r/ProgrammerHumor - The one true evil]
    C++ macros are even more… interesting.

    Maurizio is very clear on one point, though: this is purely a space for experimentation, not a proposal for inclusion in the Java platform, and any JEP is still a long way off.

    With Valhalla, as ever, patience is part of the lesson. However, they have even more up their sleeve now!


    Null Checks get concrete - The “Bang World” Prototype

    Daniel Smith announced on valhalla-spec-experts another prototype branch worth watching: bworld (short for “bang world”). This one tackles the long-awaited nullness markers - specifically, the runtime enforcement side of making ! actually mean something.

    The idea is straightforward: mark types with ! to indicate a non-null barrier, and have the JVM enforce it. You’ll be able to use ! on field types, local variables, method parameters, return types, casts, instanceof, and array element types. And crucially - this isn’t limited to value classes. Any reference type can be marked.

    What happens at runtime? The compiler generates calls to a new java.lang.runtime.Checks API (deliberately not Objects.requireNonNull - they want the JVM to have the freedom to treat these checks specially). What does this mean in practice?

    • A cast to String! will throw if you pass null

    • A field declared as String! name must be initialized before the super() call.

    • Arrays created with new Foo![]{a, b, c} will reject null writes dynamically.

    Daniel notes that the current implementation only fully supports runtime checks for value-class-typed fields and arrays - other reference types will get the metadata in the class file, but enforcement is coming later.

    The prototype also includes optional lint warnings for suspicious patterns - like assigning null literals to ! targets or removing ! markers when overriding methods. But the most interesting bit is “use-site checks”: the compiler can insert null checks when calling methods from untrusted binaries! This neatly exposes the real problem: not greenfield code, but millions of libraries that were written when null markers weren’t even imaginable.

    That’s the real challenge: how to introduce null-safety into a 30-year-old ecosystem gradually, without breaking the world? But as Daniel puts it:

    This is not the final version of the feature... it’s a snapshot. But we’ve been wanting something concrete that we could play with.

    The Kotlin crowd will feel right at home, except for one crucial difference: Kotlin’s null-safety lives purely in compiler-land and is erased at runtime, while Java is building actual JVM enforcement. Slower to arrive, but when a String! field says “never null,” the runtime will back that promise up.


    Ephemeral Threads - Clojure knocks on Project Loom’s Door “With a Request”

    Mama, take this badge off of me
    I can’t use it anymore
    It’s gettin’ dark, too dark for me to see
    I feel like I’m knockin’ on heaven’s door

    Since Structured Concurrency is approaching finalization, can Project Loom be considered “finished”? The community has a different opinion. A heated discussion (30+ emails in a week) erupted on the loom-dev list, initiated by Alex Miller from the Clojure team.

    The topic? So-called “ephemeral threads” - threads that can be garbage collected before they finish their work. Sounds like heresy in the Java world, where for 30 years an iron rule has applied: threads are GC roots and live until they complete their work. But for the Clojure community, this has been daily bread for over a decade.

    What’s the deal? The core.async library lets you create lightweight “go blocks” that wait for data from channels. If all channels become unreachable, the block can also be collected by GC - since it would never wake up anyway. Elegant and practical when building pub/sub or pipelines.

    The problem: after migrating to virtual threads, the pattern stopped working. Alan Bateman, Tech Lead of Loom, is skeptical about the ephemeral thread concept, however, pointing to deeper complications: interactions with finalizers and cleaners can lead to scenarios ranging from mildly creepy to truly terrifying:

    The possibility of GC’ing a started thread before it terminates is a scary topic. It interacts with many areas and gets really scary once you bring phantom refs, cleaners, and finalizers into the discussion.

    For these so-called “forgotten sender” and “abandoned receiver” cases then it might be more interesting to see how they could be work with structured concurrency. Right now, the first API is focused on fan-out scenarios but in time we would like to have it work with channel like constructs too. I suspect this will be closer to what you are interested in.

    Since JDK 21, VTs are tracked by default for diagnostic tools, so “abandoned” threads hang in memory indefinitely. The flag -Djdk.trackAllThreads=false helps, but Miller rightly asks about its future, and hears from Alan Bateman:

    It's clearly an attractive nuisance right now and setting it to false is specific to the root "thread grouping". There is some performance work required in that area but otherwise I think it needs to be removed.

    The argument for ephemeral threads is simple: virtual threads open the door to Erlang-style architectures where lightweight processes can be abandoned when they become redundant. Miller writes directly:

    Most of these constructs work as infinite loops without persistent references. You simply can’t build such libraries with the traditional approach to thread termination.

    Similar voices came from other users experimenting with their own schedulers - they want to use VTs as an invisible implementation detail where the end programmer doesn’t think about threads at all.

    Oracle has serious concerns, however. The main one: debugging. Viktor Klang illustrates with an example - code acquires a file descriptor, parks the thread, then releases it. If the thread gets collected while parked, the descriptor leaks without a trace.

    This easily leads to problems where resources leak without any trace of who lost them - which can be nightmarish in production,

    argues Klang.

    Andrew Haley from Red Hat offered an interesting counterargument: if a thread is waiting on an unreachable semaphore, it will never release resources anyway - whether it takes up memory or not, the problem is the same.

    There is a light at the end of the tunnel, however. Bateman suggests that cases of “abandoned” threads might play better with structured concurrency, which over time is meant to handle channel constructs as well. For Clojure, though, this means waiting - an official API for ephemeral threads probably won’t materialize, and the unofficial flag will likely disappear.

    The discussion shows a broader trend: virtual threads opened Pandora’s box of new patterns that the JVM ecosystem is only beginning to explore.

    Project Amber in 2026 - Pattern Assignment and Constant Patterns on the Horizon

    And finally, Gavin Bierman from the Amber team shared details of plans for 2026 on the amber-spec-experts list. Beyond continuing work on Primitive Patterns (currently in preview), two new features are in the pipeline - draft JEPs should appear soon.

    Pattern Assignment solves an irritating problem: sometimes we use pattern matching not because something might match, but because we want to conveniently decompose a value into parts. Today we have to write:

    void process(ColorPoint cp) {
        if (cp instanceof ColorPoint(var x, var y, var c)) {
            // actual code, unnecessarily nested
        }
    }
    

    The compiler and programmer both know the pattern will always work—but the syntax forces us to pretend it’s a conditional operation. The new proposal will let you simply write:

    void process(ColorPoint cp) {
        ColorPoint(var x, var y, var c) = cp;  // Pattern Assignment!
        // actual code, no nesting
    }

    Constant Patterns is the second proposal, simplifying a common case—matching against a specific value. Instead of:

    case Point(var x, var y) when x == 0 && y == 0 -> { /* origin */ }

    You’ll be able to write:

    case Point(0, 0) -> { /* origin */ }

    Constants (including null) will be able to appear directly as nested patterns - which will somewhat unify the awkward division between “case constants” and “case patterns.”


    Finally, a small observation from my functional heart (working at the company behind Scala obliges 😉).

    Looking at all these discussions, it’s hard not to notice a common denominator: functional languages and their concepts remain a key reference point for JVM evolution. Type classes are a mechanism straight from Haskell. Pattern assignment is essentially the equivalent of let with deconstruction known from ML-family languages. And ephemeral threads? It’s a request for semantics that Erlang and its descendants have treated as obvious for years.

    But here’s where it gets interesting: Java isn’t so much “adopting” these concepts as conducting a sort of dialogue with them - and often says “no” or “yes, but.” Type classes will be limited to value classes to avoid “operator overloading hell.” Ephemeral threads? Bateman politely suggests that maybe structured concurrency will someday handle these cases - which in practice means “we’ll do it our way or not at all.” Pattern matching is evolving so cautiously that Scala had time to have it, stop having it (in the sense of: stop being fashionable), and have it again before Java got to constant patterns.

    And this is precisely the paradox that fascinates me: Java has probably become the most important testing ground for functional ideas in the mainstream—not despite its conservatism, but because of it. Every feature goes through such a brutal mill of backward compatibility, edge case analysis, and years of preview that what comes out the other side is... surprisingly solid. Haskell’s type classes are elegant but also notoriously difficult for the average programmer to understand. Java will probably produce something less elegant, more “corporate” - and paradoxically more useful for most developers.

    [Meme: r/ProgrammerHumor - The memeification of Monads was a disaster to the FP world.]
    Because there are some genuinely good ideas there beyond the memes and cheap laughs.

    There’s a certain irony in all of this: Clojure, the language that since 2007 has been proving that functional programming on the JVM is possible and practical, is now asking for functionality it implemented itself for a decade—but can’t port to virtual threads without platform support. Alex Miller knocks on the door with a proposal inspired by Erlang, and Java responds in the style of: “interesting, but have you thought about interaction with finalizers? Because we have, and we have nightmares.”

    Maybe that’s exactly why this relationship works. Functional languages explore, Java stabilizes. Erlang shows that ephemeral processes are possible, Clojure proves they work in practice on the JVM, and ten years from now Java will introduce something called “Scoped Ephemeral Task Contexts” that will work with every version back to JDK 8. And mass-market enterprise will finally get a feature that functional programmers were talking about at conferences in 2015. That’s the deal—and honestly, I’m not sure there’s a better model for evolving a programming language that has to support billions of lines of production code.

    PS: If you want to go deeper - see you in person 👋

    If this edition resonated and you’d like to continue the conversation offline, I’ll be talking about JVM in the Age of AI at a few upcoming events:

    🇸🇪 JVM in the Age of AI: 2026 Edition @ JFokus 2026

    I’ll be running a 90-minute Deep Dive session focused on what really needs to happen inside the JVM for it to remain a serious platform for AI and ML - hardware, Valhalla, Babylon, GPU offloading, TornadoVM, Llama3.java, and the 2026 perspective.


    I’ll also be doing two Polish JUGs during one week, talking about “Agentic Systems beyond localhost” with more room for questions (and for me to fill 😁)

    I love meetups, as they give a bit more freedom to the speaker - and I always like to adapt to what you want to dig into.

    If you’re around: come say hi, argue, disagree, or just nerd out about the JVM.


    Before we close this edition, one sad piece of news.

    Scott Adams has passed away.

    I know - he was a controversial figure, especially in his later years, and not everyone agreed with his views. But there’s no denying one thing: his humor was singular, sharp in a way that only engineers and office survivors truly appreciate.

    For years, Dilbert perfectly captured the absurdities of corporate life, technical organizations, and management theater — and it did so with a precision that made it a recurring guest in this newsletter. Many of us laughed because it was funny; many of us winced because it was accurate.

    Whatever one thinks about Scott Adams the person, Scott Adams the cartoonist shaped a generation of engineers and gave us a shared language to talk about dysfunction, nonsense, and the quiet heroism of surviving another meeting that should have been an email.

    [Comic: All time favorite Dilbert cartoon, via r/dilbert]

    Rest in peace, Scott.


    Permalink

    Clojure Deref (Jan 13, 2026)

    Welcome to the Clojure Deref! This is a weekly link/news roundup for the Clojure ecosystem (feed: RSS).

    Upcoming Events

    Libraries and Tools

    Debut release

    Updates

    • clj-async-profiler 2.0.0-beta1 - Embedded high-precision Clojure profiler

    • reitit 0.10.0 - A fast data-driven routing library for Clojure/Script

    • qclojure 0.24.0 - A functional quantum computer programming library for Clojure with backend protocols, simulation backends and visualizations.

    • clj-threats 1.0.0 - Clojure implementation of Threagile

    • csvx 973ab7f - A zero dependencies tool that enables you to control how to tokenize, transform and handle files with char(s) separated values in Clojure, ClojureScript and Babashka.

    • dompa 1.2.2 - A zero-dependency, runtime-agnostic HTML parser and builder.

    • clay 2.0.5 - A REPL-friendly Clojure tool for notebooks and datavis

    • dataspex 2026.01.1 - See the shape of your data: point-and-click Clojure(Script) data browser

    • clj-kondo 2026.01.12 - Static analyzer and linter for Clojure code that sparks joy

    • quiescent 0.1.10 - A Clojure library for composable async tasks with automatic parallelization, structured concurrency, and parent-child and chain cancellation

    • eca 0.91.1 - Editor Code Assistant (ECA) - AI pair programming capabilities agnostic of editor

    • babashka 1.12.214 - Native, fast starting Clojure interpreter for scripting

    • editscript 0.7.0 - A library to diff and patch Clojure/ClojureScript data structures

    Permalink

    Grounding LLMs with Recursive Code Execution

    Despite context windows expanding to millions of tokens, LLMs still struggle with the fundamental task of precision. When you ask an LLM to "analyze this report," it often glances at the text and simply hallucinates a plausible-sounding answer based on probability.

    A good example of the problem can be seen when asking a model to sum sales figures from a financial report. Left to its own devices, it will likely not bother reading the whole document and simply give you a made-up answer. This is especially a problem with smaller models that you can run locally.

    The standard approach to dealing with this problem is to use Retrieval Augmented Generation (RAG), which relies on semantic similarity (embeddings). If you ask for "sales figures," a Vector DB retrieves chunks of text that sound like sales figures. However, semantic similarity is fuzzy and limited in functionality. Embeddings can't count, so you can't ask questions like "count the number of times X happens." They also can't handle information scattered across a bunch of unrelated lines in a document. Furthermore, they don't distinguish between concepts like "Projected Sales" and "Actual Sales" when they appear in similar contexts.

    It would be nice to have a system that treats text as a dataset to be queried rather than a prompt to be completed. This is where the Recursive Language Model paper comes in. The core idea here is that instead of having the model operate directly on the document, it uses a programmatic interface to interact with it via a REPL. The model acts as a programmer writing code to explore the document, interpreting execution results, and only then formulating an answer based on them.

    The core insight is that code execution provides grounding for the model. When an LLM guesses a number by trying to understand the document, it might be right, or it might be wrong. It has no way to know. When it writes regex.match() and the computer returns ['$2,340,000'], that result is a hard fact. What the model needs to understand is how to form a query—a general task it's likely good at—instead of trying to solve a domain-specific problem it has no direct training on.

    Allowing an LLM to write and run code directly on your system would obviously be a security nightmare, so the implementation uses isolated-vm to create a secure sandbox for it to play in. The model cannot hallucinate rm -rf / or curl a random URL. Having a sandbox also prevents infinite loops or memory leaks. And since the document is immutable, the model can read it but cannot alter the source truth.
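
    As a rough sketch of that setup (illustrative values and variable names, not the project's actual code), an isolated-vm sandbox with a read-only document might be created like this:

    import ivm from "isolated-vm";

    // Run model-generated code against a snapshot of the document.
    async function runInSandbox(documentText: string, code: string) {
      const isolate = new ivm.Isolate({ memoryLimit: 128 }); // cap memory use
      const context = await isolate.createContext();

      // Strings are copied by value, so the sandbox only sees a snapshot
      // and cannot mutate the source document.
      await context.global.set("DOCUMENT", documentText);

      const script = await isolate.compileScript(code);
      // The timeout guards against infinite loops; generated code should
      // return a primitive or a JSON string so the result can transfer back.
      return script.run(context, { timeout: 2000 });
    }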

    The process works as follows:

    1. The document is loaded into a secure, isolated Node.js environment as a read-only context variable.
    2. The model is given exploration tools: text_stats(), fuzzy_search(), and slice().
    3. The Loop:
      • The model writes TypeScript to probe the text.
      • The Sandbox executes it and returns the output.
      • The model reads the result and refines its next step.
    4. The loop iterates until the model has enough proven data to answer FINAL("...") - a minimal driver for this loop is sketched below.
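
    A minimal sketch of that driver, with the model call abstracted behind a hypothetical generateCode helper and runInSandbox reused from the sketch above (the real implementation differs in its prompting and tooling):

    // Placeholder for the actual LLM call: takes the transcript so far and
    // returns the next TypeScript snippet to run.
    declare function generateCode(transcript: string[]): Promise<string>;

    async function explore(documentText: string, question: string) {
      const transcript: string[] = [`QUESTION: ${question}`];

      for (let turn = 0; turn < 8; turn++) {
        const code = await generateCode(transcript);

        // Stop once the model commits to a grounded answer.
        const done = /FINAL\("([\s\S]*?)"\)/.exec(code);
        if (done) return done[1];

        const output = await runInSandbox(documentText, code);
        transcript.push(`CODE:\n${code}`, `OUTPUT:\n${String(output)}`);
      }
      throw new Error("No FINAL answer within the turn budget");
    }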

    [Diagram: RLM execution model]

    The system can work entirely locally using something like Ollama with Qwen-Coder, or with hosted models like DeepSeek, which are much smarter by default. It also works as an MCP server that you can plug in and let your agent use to solve problems.

    Finally, I used Universal Tool Calling Protocol (UTCP) patterns from code-mode to generate strict TypeScript interfaces. This provides the LLM with a strict contract such as:

    // The LLM sees exactly this signature in its system prompt
    declare function fuzzy_search(query: string, limit?: number): Array<{
      line: string;
      lineNum: number;
      score: number; // 0 to 1 confidence
    }>;
    
    

    One problem is that LLMs tend to be messy coders; they forget semicolons, use hallucinated imports, etc. The way around that is to add a self-healing layer. If the sandbox throws a syntax error, a lightweight intermediate step attempts to fix imports and syntax before re-running. This keeps the reasoning chain alive and minimizes round trips to the model.
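
    One way to sketch that repair step, reusing the helpers from the sketches above (the single-retry budget and the repair prompt are invented for illustration):

    // If execution fails, ask the model to repair its own snippet once, then retry.
    async function runWithRepair(documentText: string, code: string) {
      try {
        return await runInSandbox(documentText, code);
      } catch (err) {
        const fixed = await generateCode([
          `The previous snippet failed with: ${String(err)}`,
          `Return a corrected version of:\n${code}`,
        ]);
        return runInSandbox(documentText, fixed);
      }
    }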

    As a demo to test out the concept, I made a document containing a bunch of scattered data, with 5 distinct sales figures hidden inside 4,700 characters of Lorem Ipsum filler and unrelated business jargon.

    Predictably, feeding the text into a standard context window and asking for the total promptly resulted in a hallucinated total of $480,490. It just grabbed numbers that looked like currency from unrelated sections, mashed them together, and called it a day.

    Running the same query through RLM was a completely different story. The model took 4 iterations to converge on the actual solution. Instead of trying to guess, it started writing code to explore the document. It first checked the file size:

    const stats = text_stats();
    console.log(`Document length: ${stats.length}, Lines: ${stats.lineCount}`);
    
    

    Next, it used fuzzy search to locate relevant lines, ignoring the noise:

    const matches = fuzzy_search("SALES_DATA");
    console.log(matches);
    // Output: [
    //   { line: "SALES_DATA_NORTH: $2,340,000", ... },
    //   { line: "SALES_DATA_SOUTH: $3,120,000", ... }
    // ]
    
    

    And finally, it wrote a regex to parse the strings into integers and summed them programmatically to get the correct result:

    // ...regex parsing logic...
    console.log("Calculated Total:", total); // Output: 13000000
    
    

    Only after the code output confirmed the math did the model commit to an answer.

    The key difference is that the traditional approach asks the model "what does this document say," while the recursive coding approach asks it to "write a program to find out what this document says." The logic is now expressed using actual code, and the role of the LLM is to write the code and read the results as opposed to working with the document directly.

    As with all things, there is a trade-off here: the RLM approach is slower since it takes multiple turns and can generate more tokens as a result. However, if the document you're working on is itself large, then you will actually save context tokens by not loading it directly.

    MCP Integration

    The project also includes an MCP (Model Context Protocol) server, making it available as a tool for coding agents like Crush. Once configured, you can ask the agent to analyze documents that would otherwise exceed its context window or require precise data extraction.

    The server exposes an analyze_document tool that takes a query and file path. The tool can then use the RLM approach to explore documents by writing code, executing it in the sandbox, and iterating until it finds the answer.

    This creates an interesting dynamic where your agent writes the high-level query, the RLM's backing model (which can be a local Ollama instance) does the iterative code exploration, and the verified results come back to your agent. The grounding problem is solved at the tool level, so the agent can trust the results it receives.

    The implementation is available at https://github.com/yogthos/Matryoshka.

    Permalink

    Copyright © 2009, Planet Clojure. No rights reserved.
    Planet Clojure is maintained by Baishampayan Ghose.
    Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
    Theme by Brajeshwar.