How G+D Netcetera used Rama to 100x the performance of a product used by millions of people

“Rama enabled us to improve the performance of two critical metrics for Forward Publishing by over 100x, and it reduced our AWS hosting costs by 55%. It sounds like a cliché, but learning Rama mostly involved us unlearning a lot of the conventional wisdom accepted by the software industry. It was a strange experience realising how much simpler software can be.”

Ognen Ivanovski, Principal Architect at G+D Netcetera

Forward Publishing from G+D Netcetera powers a large percentage of digital newspapers in Switzerland and Germany. Forward Publishing was launched in 2019 and handles millions of pageviews per day. The backend of Forward Publishing was rewritten on top of Rama in the past year to greatly improve its performance and expand its capabilities.

Forward Publishing determines what content to include on each page of a newspaper website with a complex set of rules considering recency, categories, and relations between content. Home pages link to dozens of stories, while story pages contain the content and links to other stories. Some portions, especially for showing lists of related content, are specified with dynamic queries. Rendering any of these pages requires determining what content to display at each position and then fetching the content for each particular item (e.g. summary, article text, images, videos). Determining what content to render on a page is a compute-intensive process. The original implementation of Forward Publishing had some serious performance issues:

  • Since computing a page from scratch on each pageview is too expensive, the architecture was reliant on a Varnish cache and only recomputing pages on an interval. This caused the system to take multiple minutes before new content would be displayed on a page. This was especially bad for breaking news.
  • Computing what to render on a page put a large amount of load on the underlying content-management system (CMS), such as Livingdocs. This load frequently negatively affected the experience of using the CMS by worsening performance.
  • The large amount of load sometimes caused outages due to the system becoming overwhelmed, especially if the cache had any sort of issue.

With Rama, they were able to improve the latency for new content becoming available on pages from a few minutes to less than a second, and they reduced the load on the CMS from Forward Publishing to almost nothing. Both of these are over 100x improvements compared to their previous implementation.

As a bonus, their Rama-based implementation requires much less infrastructure. They went from running 18 nodes per customer for Forward Publishing for various pieces of infrastructure to just 9 nodes per customer for their Rama implementation. In total their Rama-based implementation reduced their AWS hosting costs by 55%.

Because the new implementation reduces load on the CMS so much, it has also enabled G+D Netcetera to improve how the business is structured. Before, the CMS and Forward Publishing for each customer had to be operated as one integrated unit, which required a lot of DevOps work to handle the load. Now, the CMS can be operated independently of Forward Publishing, and customers can customize it however they wish. It’s a better experience for G+D Netcetera’s customers, and it’s much less DevOps work for G+D Netcetera.

Before, Forward Publishing only worked with the Livingdocs CMS because supporting a new CMS would require a large engineering effort to get it to handle the necessary load. Now, Forward Publishing is not tied to a single CMS anymore. To support a new CMS, they just need to write a little bit of code to ingest new data from the CMS into Rama. This greatly expands the market for Forward Publishing as they can interoperate with any CMS.

Shortly before Rama was announced, the engineers at G+D Netcetera realized the backend architecture of Forward Publishing was wasteful. The number of updates to the CMS is only hundreds per day, so the vast majority of the work to compute pages was repeated each time the TTL of the cache expired. Pages usually only have a few changes when they’re recomputed. Flipping the architecture by maintaining a denormalized view of each page that’s incrementally updated as new content arrives seemed like it would be a big improvement.

They initially built a few prototypes with core.async and Missionary to explore this idea. However, these did not address how to store and query the denormalized views in a data-parallel and fault-tolerant manner. As soon as Rama was announced, they realized it provided everything needed to make their idea a reality.

Traditional databases, especially RDBMS’s, serve as both a source of truth and an indexed store to serve queries. There’s a major tension from serving both these purposes, as you want data in a source of truth to be fully normalized to ensure data integrity and consistency. What G+D Netcetera found out the hard way is that only storing data fully normalized caused significant performance issues for their application.

Rama explicitly separates the source of truth from the indexed datastores that serve queries. It provides a coherent and general model for incrementally materializing indexed datastores from the source of truth in a scalable, high-performance, and fault-tolerant way. You get the data integrity benefits of full normalization and the freedom to fully optimize indexed datastores for queries in the same system. That tension between data integrity and performance that traditionally exists just does not exist in Rama.

Backend dataflow

At the core of G+D Netcetera’s Rama-based implementation is a microbatch topology that maintains a denormalized view of each page as new content comes in. The denormalized views are PStates (“partitioned state”) that reduce the work to render a page to just a single lookup by key.

New content could be a new article, edits to an existing article, or edits to the layout of a page. The microbatch topology determines all denormalized entities affected by new content, which involves updating and traversing the graph of entities. This whole process takes less than a second and results in content always being fresh for all pages.

G+D Netcetera built a small internal library similar to Pregel on top of Rama’s dataflow abstractions. This allows them to easily express the code performing graph operations like the aforementioned traversals.

The core microbatch topology relies heavily on Rama’s batch blocks, a computation abstraction that has the same capabilities as relational languages (inner joins, outer joins, aggregation, subqueries). Batch blocks are the core abstraction that enables G+D Netcetera’s graph computations.

PState examples

Unlike databases, which have fixed data models (e.g. relational, key/value, graph, column-oriented), PStates can have any data model conceivable. PStates are based on the simpler primitive of data structures, and each data model is just a particular combination of data structures. This flexibility allows each PState to be tuned to exactly match the use cases they support. G+D Netcetera materializes many PStates of many different shapes in their Forward Publishing implementation.

Let’s take a look at some of their PState schemas to see how they use the flexibility of PStates. These examples are in Clojure because G+D Netcetera uses Rama’s Clojure API, but Rama also has a Java API.

All the PState definitions below make use of this data type to identify entities:

(defrecord EntityId [tag id])

An entity ID consists of a “tag” (e.g. page, article, image) and an ID in the scope of that tag. You can store any types inside PStates, so this defrecord definition is used directly.
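
For example, hypothetical entity IDs might look like this (the tag and ID values here are made up for illustration):

(->EntityId :article "article-4711") ; a story
(->EntityId :page "home")            ; a page layout

Because defrecords have value-based equality and hashing, these values work directly as keys in the PStates below.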

Two of their PStates, called $$eavt and $$avet, are similar to how Datomic indexes data (PState names always begin with $$). The $$eavt definition is this:

(declare-pstate
  topology
  $$eavt
  {EntityId
   (map-schema
     Keyword ; attribute
     #{Object})}) ; values

This PState is a map from entity ID to a map from attribute to a set of values. The operations that are efficient on a PState are the same kinds of operations that are efficient on the corresponding in-memory data structures. This PState efficiently supports queries like: “What are all the attributes for a particular entity?”, “What are all the values for an entity/attribute pair?”, “How many attributes does an entity have?”, and so on.
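
As a rough sketch of what such queries could look like from a client, here is one way to express them, assuming a PState client obtained via foreign-pstate (eavt and entity-id below are placeholders) and Rama’s Specter-style path API; treat the exact names and calls as assumptions rather than the project’s actual code:

(require '[com.rpl.rama :as r]
         '[com.rpl.rama.path :refer [keypath view MAP-KEYS]])

;; all attributes of an entity (the keys of its nested map)
(r/foreign-select [(keypath entity-id) MAP-KEYS] eavt)

;; the set of values for an entity/attribute pair
(r/foreign-select-one [(keypath entity-id) (keypath :category)] eavt)

;; how many attributes an entity has
(r/foreign-select-one [(keypath entity-id) (view count)] eavt)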

The $$avet PState is defined like this:

(declare-pstate
  topology
  $$avet
  {Keyword ; attribute
   (map-schema
     Object ; value
     (set-schema EntityId {:subindex? true})
     {:subindex? true})})

This PState is a map from attribute to a map from value to a set of entities. This allows different kinds of questions to be answered efficiently, like: “What are all the values associated with an attribute?”, “What are all the entities associated with a particular attribute/value pair?”, “How many entities have a particular attribute value?”, and many others.

This PState uses subindexing, which causes those nested data structures to index their elements individually on disk rather than serialize/deserialize the entire data structure as one value on every read and write. Subindexing enables reads and writes to nested data structures to be extremely efficient even if they’re huge, like containing billions of elements. As a rule of thumb, a nested data structure should be subindexed if it will ever have more than a few hundred elements. Since the number of values for an attribute and the number of entities for a particular attribute/value is unbounded, these nested data structures are subindexed.

While these PStates are similar to the data model of an existing database, G+D Netcetera has many PStates with data models that no database has. Here’s one example:

(declare-pstate
  topology
  $$dynamic-lists
  {EntityId
   (fixed-keys-schema
     {:spec  [[Object]]
      :max   Long
      :items (set-schema Object {:subindex? true}) ; set of [timestamp, EntityId]
      :count Long
      })})

This PState supports an incremental Datalog engine they wrote for finding the top matching entities for a given criteria. Dynamic lists are specified by users of Forward Publishing and can be part of the layout of pages. The idea behind “lists” on a page is to show related content a reader is likely to click on. Manually curating lists is time-intensive, so dynamic lists allow users to instead specify a list as a query on the attributes of all stories in the system, such as story category, location, and author.

A dynamic list has its own ID represented as an EntityId, and the fields of each dynamic list are:

  • :spec : the Datalog query which specifies which content to find for the list. A query could say something like “Find all stories that either a) are from the region ‘Switzerland’ and tagged with ‘technology’, or b) are tagged with ‘database’”.
  • :max : the maximum number of items to keep from entities that match
  • :items : the results of the Datalog query, tuples of timestamp and EntityId
  • :count : the total number of entities found of which at most :max were selected for :items

Dynamic lists can be dynamically added and removed, and they’re incrementally updated as new content arrives. The code implementing this incremental Datalog engine is only 700 lines of code in the Forward Publishing codebase. By storing the results in a PState, the dynamic lists are durable and replicated, making them highly available against faults like process death or node death.

Module overview

A Rama module contains all the storage and computation for a backend. Forward Publishing is implemented with two modules. The first module manages all content and contains:

The second module is smaller and manages user preferences. It consists of:

Topologies

Stream topologies are used for interactive features that require single-digit millisecond update latency. Microbatch topologies have update latency on the order of a few hundred milliseconds, but they have expanded computation capabilities (the “batch blocks” mentioned before) and higher throughput. So microbatch topologies are used for all features that don’t need single-digit millisecond update latency.

Two of the microbatch topologies implement the core of Forward Publishing, while the other ones support various miscellaneous needs of the product. The two core microbatch topologies are:

  • “ingress”: Ingests new content from the CMS and appends it to a Rama depot called *incoming-entities
  • “main”: Consumes *incoming-entities to integrate new content into a bidirectional graph. This topology then implements the denormalization algorithm that traverses the graph.

The miscellaneous topologies include:

  • “aliases”: A stream topology that maps entity IDs to other entity IDs. Some entities can be referenced by an “alias” entity ID, and this tracks the mapping from an alias to the primary entity ID.
  • “bookmarks”: A stream topology that tracks pages bookmarked by users.
  • “recently-read”: A stream topology that tracks articles recently read by users.
  • “redirects”: A stream topology that maps URLs to alternate URLs to support browser redirects.
  • “selected-geo”: A stream topology that maps users to geographical regions of interest.
  • “indexing”: A microbatch topology tracking metadata about regions and lists.
  • “routing”: A microbatch topology tracking metadata for each page of the website.
  • “sitemap”: A microbatch topology that incrementally updates a sitemap for the website. Sitemaps allow search engines to index the full website. Before Rama, Forward Publishing could only update the sitemap once per month in an expensive batch process. Now, the sitemap is always up to date.
  • “dynamic-lists”: A microbatch topology that implements the incremental Datalog engine described above.

Summary

Rama has been a game changer for G+D Netcetera, improving the performance of multiple key metrics by over 100x while simultaneously reducing the cost of operation by 55%. Additionally, their system is much more stable and fault tolerant than it was before.

G+D Netcetera found it takes about two weeks for a new engineer to learn Rama and become productive, with “batch blocks” being the biggest learning hurdle. The benefits they’ve achieved with Rama have made that modest investment in learning well worth it.

You can get in touch with us at consult@redplanetlabs.com to schedule a free consultation to talk about your application and/or pair program on it. Rama is free for production clusters for up to two nodes and can be downloaded at this page.

Permalink

Small modular parts

Our last episode was with David Nolen. We talk about his development process, his origin, and his philosophy. The next episode is on Tuesday, April 22 with special guest Fogus. Please watch us live so you can ask questions.

I have finally released the new version of Introduction to Clojure, my flagship module in Beginner Clojure, my signature video course. This update is long overdue, but it makes up for its tardiness with fully updated content, modernized for current idioms and better teaching.

If you have the previous edition, you can already find the new edition in your dashboard. You get the upgrade for free as my thanks for being a part of this crazy journey.

If you buy Beginner Clojure now, you’ll also get the new version. Because it’s such a major upgrade, I’m going to raise the prices soon. If you want it, now is the time to buy. It will never be this cheap again.


Small modular parts

I’ve been steeping in the rich conceptual stews of Patterns of Software. In it, Richard Gabriel explores how the ideas of Christopher Alexander apply to software engineering (long before the GoF Design Patterns book). One of the early ideas in the book is that of habitability, the characteristic of a building to support human life. Architecture needs to provide the kinds of spaces humans need, and also be adaptable to changing life circumstances. A house must support time together, time alone, and time alone together (sharing a space but doing different things). But it also must allow adding an extra bedroom as your family grows.

Habitable software is analogous. Programmers live in the code. They must feel comfortable navigating around, finding what they need, and making changes as requirements change. Christopher Alexander says that it is impossible to create truly living structures out of modular parts. They simply don’t adapt enough to the circumstances.

However, we know that’s not entirely true. Bricks are modular parts, and many of the living examples he gives are buildings made of bricks. It must be that the modules need to be small enough to permit habitability. You can’t adjust the size of a wall if the wall is prefabricated. But you can adjust the size of the wall if the bricks are prefabricated to a resolution that is just right.

This is true in software as well. Large modules are not as reusable as small ones. Take classical Java. I think the language gets the size of the abstractions wrong. The affordances of the language are the for/if/… statements, arithmetic expressions, and method calls, plus a way to compose those up into a new class. It goes from the lowest level (basically C) to a very high level, with very little in-between.

Contrast that with Clojure, which gives you many general-purpose abstractions at a higher level than Java (map/filter/reduce, first-class functions, generic data structures), and then almost nothing above it. Just the humble function to parameterize and name a thing. Lambda calculus (basically first-class functions) goes a long way. Java’s methods and classes give you a way to build procedural abstractions over data storage and algorithms, but the language offers torturous facilities for control flow abstractions. First-class functions can abstract control flow. I think Clojure got the level right.

Except maybe Clojure’s standard library overdoes it. I’m a big fan of map/filter/reduce. You can do a lot with them. But then there are others. For instance, there’s keep, which is like `map` but it rejects `nil`s. And there’s `remove`, which is the opposite of `filter`. Any call to keep could be rewritten:

(keep {:a 1 :b 2 :c 3} [:a :b :c :d])

(filter some? (map {:a 1 :b 2 :c 3} [:a :b :c :d]))

Those two are equivalent. `remove` can also be rewritten:

(remove #{:a :b :c} [:a :b :c :d])

(filter (complement #{:a :b :c}) [:a :b :c :d])

I do use keep and remove sometimes, when I think about them. But how much do they really add? Is the cost of learning these worth it? How often do you have to switch it back to map and filter anyway to make the change you want?

Here’s what I think: keep is just slightly too big. It’s a modular part that does just a tad too much. map is like a standard brick. keep is like an L-shaped brick that’s only useful at the end of a wall or on a corner. Useful but not that useful, and certainly not necessary. The same is true of remove. It’s not useful enough.

I think Clojure did a remarkably good job of finding the right size of module. They feel human-scale, ready for composition in an understandable way. It makes programs of medium to large size feel more habitable. I see this in what little Smalltalk code I’ve read: Smalltalk’s classes are small, highly general modular units, like Point and Rectangle, not UserPictureDrawingManager.

One aspect of habitability is maintainability—the holy grail du jour of software design. Back in 1996, when Patterns of Software was published, Gabriel felt the need to argue against efficiency as the driving goal of software design. Somewhere between efficiency’s reign and today’s maintainability, code size and then complexity ruled.

Long-time readers may guess where I’m going: These characteristics all focus on the code. Abstraction gets talked about in terms of its (excuse me) abstract qualities. An abstraction is too big or small, too high- or low-level, too shallow or deep. At best, we’re talking about something measurable in the code, at worst, some mental structures only in the mind of the guru designer who talks about it.

I want to posit domain fit as a better measure—one that leads to habitability—and that is also objective. Domain fit asks: “How good is the mapping between what your code represents and the meanings available in the domain?” That mapping goes both ways. We ask both “How easily can I express a domain situation in the code?” and “How easily does the code express the domain situation it represents?” Fit covers both directions of expressivity.

I believe that domain misfit causes the most difficulties for code habitability. If your code doesn’t fit well with the domain, you’ll need many workarounds. Using reusable modules is only a problem because they don’t adapt well to the needs of your domain—not because they’re too big. It just so happens that bigger modules are harder to adapt. It’s not that a wall module is bad, per se, just that it’s almost never exactly the right size, and so you make compromises.

It’s not that the components in Clojure are the right size. It’s that Clojure’s domain—data-oriented programming—is the right size for many problems. It allows you, the programmer, to compose a solution out of parts—like bricks in a wall. And Clojure’s code fits the domain very well. Tangentially: It makes me wonder what the domain of Java is. I guess what I’m saying is that using a vector graphics API to do raster graphics is going to feel uninhabitable. But you can’t say it’s because vector graphics is a bigger abstraction than raster. It’s more about having the right model.

Now, Alexander might disagree that a pre-fab wall of exactly the right size is okay. He believes that there’s something in the handmadeness of things, too. It’s not just that the wall is the wrong or right size. Even if it were perfect, the perfection itself doesn’t lend itself to beauty. Geometrically precisely laid tiles cross some threshold where you don’t feel comfortable anymore. Ragged symmetry is better. We want bricks but they shouldn’t be platonic prisms.

So this is where I conclude and tease the next issue. I started this essay thinking size was important. I thought that Clojure got it right by finding a size of composable module that was a sweet spot. But now, I think it’s not about size. I don’t even know what size means anymore. It’s more about domain fit than ever. Perhaps I’m digging in deeper to my own biases (and please, I’m relying on you the reader to help me realize if I am). But this is what my reading is leading me to—the importance of building a domain model. When we talk about domain models, we often think of these jewel-like abstractions with perfect geometry. But this is a pipe dream. Our domains are too messy for that. In the next issue, I want to explore the dichotomy of geometric and organic adaptation.

Permalink

The Duality of Transducers

I finally got around to re-recording and posting this talk on Clojure’s transducers that I gave last year to the Austin Clojure Meetup:

The talk walks through what transducers are, their benefits, where they make sense to use (and where they don’t), and how to implement them from scratch.

The title refers to an idea that really helped make transducers click for me: namely that there are two different conceptual models of transducers that I needed to apply (to different contexts) to really get it. (This reminded me a lot of the wave-particle duality of light in physics, which describes a single underlying reality in two different ways, with each way tending to prove more practical in analyzing particular scenarios).

The two models, then, are:

  1. The encapsulated transformation model, where the transducer is an opaque representation of a (likely-compound) transformation of a collection, which merges with other transformations via the mechanical application of comp. And…

  2. The constructive model, where we’re dealing in the underlying machinery of transducers (say, implementing a transducible process), and it’s helpful to conceptualize a transducer simply as a function from reducing-function to reducing-function (rf -> rf); a from-scratch sketch follows below.
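
To make that second model concrete, here is a from-scratch map transducer (illustrative only; clojure.core/map already returns a transducer when called with a single argument):

(defn map-xf [f]
  (fn [rf]                     ; take a reducing function...
    (fn                        ; ...and return a new reducing function
      ([] (rf))                ; init arity
      ([result] (rf result))   ; completion arity
      ([result input]          ; step arity: transform the input, then delegate
       (rf result (f input))))))

(transduce (map-xf inc) conj [] [1 2 3])
;; => [2 3 4]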

I hope that comes through clearly in the presentation.

If you don’t use transducers in your Clojure code today, I highly suggest you do—you will see benefits. And I’m convinced that the best way to get them to really click is to implement them from scratch, which the video will walk you through.

For your convenience, here are the slides for the presentation. The links are not clickable there (sorry), but are all included in the video description on YouTube.

If you have any questions, feedback, or need mentorship, feel free to reach out on Bluesky, Mastodon, or X and I’d be happy to help.

Permalink

No, really, you can’t branch Datomic from the past (and what you can do instead)

I have a love-hate relationship with Datomic. Datomic is a Clojure-based database based on a record of immutable facts; this post assumes a passing familiarity with it – if you haven’t yet, I highly recommend checking it out, it’s enlightening even if you end up not using it.

I’ll leave ranting on the “hate” part for some other time; here, I’d like to focus on some of the love – and its limits.

Datomic has this feature called “speculative writes”. It allows you to take an immutable database value, apply some new facts to it (speculatively, i.e., without sending them over to the transactor – this is self-contained within the JVM), and query the resulting database value as if those facts had been transacted for real.

This is incredibly powerful. It lets you “fork” a Datomic connection (with the help of an ingenious library called Datomock), so that you can see all of the data in the source database up to the point of forking, but any new writes happen only in memory. You can develop on top of production data, but without any risk of damaging them! I remember how aghast I was upon first hearing about the concept, but now can’t imagine my life without it. Datomock’s author offers an analogy to Git: it’s like database values being commits, and connections being branches.

Another awesome feature of Datomic is that it lets you travel back in time. You can call as-of on a database value, passing a timestamp, and you get back a db as it was at that point in time – which you can query to your heart’s content. This aids immensely in forensic debugging, and helps answer questions which would have been outright impossible to answer with classical DBMSs.

Now, we’re getting to the crux of this post: as-of and speculative writes don’t compose together. If you try to create a Datomocked connection off of a database value obtained from as-of, you’ll get back a connection to which you can transact new facts, but you’ll never be able to see them. The analogy to Git falls down here: it’s as if Git only let you branch HEAD.
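
In code, the failing combination looks roughly like this (a sketch only: mock-conn is Datomock’s connection-from-db function, conn is some existing Datomic connection, and the timestamp is made up):

(require '[datomic.api :as d]
         '[datomock.core :as dm])

(def past-db (d/as-of (d/db conn) #inst "2024-01-01"))
(def forked (dm/mock-conn past-db)) ; a "branch" starting from the past
@(d/transact forked [[:db/add "tmp" :db/doc "new fact"]]) ; transacts fine...
(d/q '[:find ?e :where [?e :db/doc "new fact"]] (d/db forked))
;; => #{} -- the new datoms are hidden behind the as-of boundary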

This is a well-known gotcha among Datomic users. From Datomic’s documentation:

as-of Is Not a Branch

Filters are applied to an unfiltered database value obtained from db or with. In particular, the combination of with and as-of means "with followed by as-of", regardless of which API call you make first. with plus as-of lets you see a speculative db with recent datoms filtered out, but it does not let you branch the past.

So it appears that this is an insurmountable obstacle: you can’t fork the past with Datomic.

Or can you?

Reddit user NamelessMason has tried to reimplement as-of on top of d/filter, yielding what seems to be a working approach to “datofork”! Quoting his post:

Datomic supports 4 kinds of filters: as-of, since, history and custom d/filter, where you can filter by arbitrary datom predicate. […]

d/as-of sets a effective upper limit on the T values visible through the Database object. This applies both to existing datoms as well as any datoms you try to add later. But since the tx value for the next transaction is predictable, and custom filters compose just fine, perhaps we could just white-list future transactions?

(defn as-of'' [db t]
  (let [tx-limit (d/t->tx t)
        tx-allow (d/t->tx (d/basis-t db))]
    (d/filter db (fn [_ [e a v tx]] (or (<= tx tx-limit) (> tx tx-allow))))))

[…] Seems to work fine!

Sadly, it doesn’t actually work fine. Here’s a counterexample:

(def conn (let [u "datomic:mem:test"] (d/create-database u) (d/connect u)))

;; Let's add some basic schema
@(d/transact conn [{:db/ident :test/id :db/valueType :db.type/string
                    :db/cardinality :db.cardinality/one :db/unique :db.unique/identity}])
(d/basis-t (d/db conn)) ;=> 1000

;; Now let's transact an entity
@(d/transact conn [{:test/id "test", :db/ident ::the-entity}])
(d/basis-t (d/db conn)) ;=> 1001

;; And in another transaction let's change the :test/id of that entity
@(d/transact conn [[:db/add ::the-entity :test/id "test2"]])
(d/basis-t (d/db conn)) ;=> 1003

;; Trying a speculative write, forking from 1001
(def db' (-> (d/db conn)
             (as-of'' 1001)
             (d/with [[:db/add ::the-entity :test/id "test3"]])
             :db-after))
(:test/id (d/entity db' ::the-entity)) ;=> "test" (WRONG! it should be "test3")

To recap what we just did: we transacted version A of an entity, then an updated version B, then tried to fork C off of A, but we’re still seeing A’s version of the data. Can we somehow save the day?

To see what d/filter is doing, we can add a debug println to the filtering function, following NamelessMason’s example (I’m translating tx values to t for easier understanding):

(defn as-of'' [db t]
  (let [tx-limit (d/t->tx t)
        tx-allow (d/t->tx (d/basis-t db))]
    (d/filter db (fn [_ [e a v tx :as datom]]
                   (let [result (or (<= tx tx-limit) (> tx tx-allow))]
                     (printf "%s -> %s\n" (pr-str [e a v (d/tx->t tx)]) result)
                     result)))))

Re-running the above speculative write snippet now yields:

[17592186045418 72 "test" 1003] -> false
[17592186045418 72 "test" 1001] -> true

So d/filter saw that tx 1003 retracts the "test" value for our datom, but it’s rejected because it doesn’t meet the condition (or (<= tx tx-limit) (> tx tx-allow)). And at this point, it never even looks at datoms in the speculative transaction 1004, the one that asserted our "test3". It looks like Datomic’s d/filter does some optimizations where it skips datoms if it determines they cannot apply based on previous ones.

But even if it did do what we want (i.e., include datoms from tx 1001 and 1004 but not 1003), it would have been impossible. Let’s see what datoms our speculative transaction introduces:

(-> (d/db conn)
    (as-of'' 1001)
    (d/with [[:db/add ::the-entity :test/id "test3"]])
    :tx-data
    (->> (mapv (juxt :e :a :v (comp d/tx->t :tx) :added))))
;=> [[13194139534316 50 #inst "2025-04-22T12:48:40.875-00:00" 1004 true]
;=>  [17592186045418 72 "test3" 1004 true]
;=>  [17592186045418 72 "test2" 1004 false]]

It adds the value of "test3" but retracts "test2"! Not "test"! It appears that d/with looks at the unfiltered database value to produce new datoms for the speculative db value (corroborated by the fact that we don’t get any output from the filtering fn at this point; we only do when we actually query db'). Our filter cannot work: transactions 1001 plus 1004 would be “add "test", retract "test2", add "test3"”, which is not internally consistent.

So, no, really, you can’t branch Datomic from the past.

Which brings us back to square one: what can we do? What is our use case for branching the past, anyway?

Dunno about you, but to me the allure is integration testing. Rather than having to maintain an elaborate set of fixtures, with artificial entity names peppered with the word “example”, I want to test on data that’s close to production; data that feels like production. Ideally, it is production data, isolated and made invincible by forking. At the same time, tests have to behave predictably: I don’t want a test to fail just because someone deleted an entity from production yesterday that the test depends on. Being able to fork the past would have been a wonderful solution if it worked, but… it is what it is.

So now I’m experimenting with a different approach. My observation here is that my app’s Datomic database is (and I’d wager a guess that most real-world DBs are as well) “mostly hierarchical”. That is, while its graph of entities might be a giant strongly-connected blob, it can be subdivided into many small subgraphs by judiciously removing edges.

This makes sense for testing. A test typically focuses on a handful of “top-level entities” that I need to be present in my testing database like they are in production, along with all their dependencies – sub-entities that they point to. Say, if I were developing a UI for the MusicBrainz database and testing the release page, I’d need a release entity, along with its tracks, label, medium, artist, country etc to be present in my testing DB. But just one release is enough; I don’t need all 10K of them.

My workflow is thus:

  • create an empty in-memory DB
  • feed it with the same schema that production has
  • get hold of a production db with a fixed as-of
  • given a “seed entity”, perform a graph traversal (via EAVT and VAET indexes) starting from that entity to determine reachable entities, judiciously blacklisting attributes (and whitelisting “backward-pointing” ones) to avoid importing too much
  • copy those entities to my fresh DB
  • run the test!

This can be done generically. I’ve written some proof-of-concept code that wraps a Datomic db to implement the Loom graph protocol, so that one can use Loom’s graph algorithms to perform a breadth-first entity scan, and a function to walk over those entities and convert them to a transaction applicable on top of a pristine DB. So far I’ve been able to extract meaningful small sub-dbs (on the order of ~10K datoms) from my huge production DB of 17+ billion datoms.
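
The gist itself isn’t reproduced here, but the heart of the traversal is simple enough to sketch (this is not the author’s code; blacklist is assumed to be a set of attribute entity ids, and the whitelisting of backward-pointing attributes is omitted for brevity):

(defn ref-attr?
  "Is attribute `a` (an attribute entity id) of type :db.type/ref?"
  [db a]
  (= :db.type/ref (:db/valueType (d/entity db a))))

(defn neighbors
  "Entities reachable from eid in one hop, skipping blacklisted attributes."
  [db blacklist eid]
  (concat
   ;; outgoing refs: datoms where eid is the entity (EAVT index)
   (for [[_ a v] (d/datoms db :eavt eid)
         :when (and (ref-attr? db a) (not (blacklist a)))]
     v)
   ;; incoming refs: datoms where eid is the value (VAET index)
   (for [[e a _] (d/datoms db :vaet eid)
         :when (not (blacklist a))]
     e)))

A breadth-first walk over neighbors, accumulating the visited entities, yields the set of entities to copy into the fresh in-memory DB.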

This is a gist for now, but let me know if there’s interest and I can convert it into a proper library.

Permalink

Clojure Submaps

The gentleman behind Clojure Diary was kind enough to recently post a video based on a small comment I made on his YouTube channel: Clojure Diary - Elegant way of filtering maps based on key value pairs.

(Please check out his channel as he’s doing a great job regularly posting videos about his journey through Clojure!)

In my comment I used a little submap? function from my personal utilities library that I’ve gotten a lot of mileage out of the last year or two.

I thought I’d mention it here. It isn’t anything magical, but I think it’s part of my personal standard library from here on out:

(defn submap?
 "Are all of the key-value pairs in `m1` also in `m2`?"
 [m1 m2]
 (= m1 (select-keys m2 (keys m1))))

(submap? {:a 1, :b 2} {:a 1, :b 2, :c 3}) ; => true
(submap? {:a 1, :b 2} {:a 1, :b 4, :c 3}) ; => false

One of the things I use it for constantly is unit tests where I want to assert that a map contains multiple key-value pairs, but I don’t want to assume that a map contains only the specified key-value pairs (in the spirit of the open-world assumption)1.
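
A typical (hypothetical) assertion looks like this, using the same utils and sut aliases that appear in the failure output below:

(require '[clojure.test :refer [deftest is]])

(deftest handler-returns-ok
  ;; only :status matters here; extra keys in the response are fine
  (is (utils/submap? {:status 200}
                     (sut/handler {:method :get, :path "/"}))))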

It also plays nicely with clojure.test, yielding useful test output which shows the full value tested against:

Fail in handler

expected: (utils/submap? {:status 200} (sut/handler {:method :get, :path "/"}))
  actual: (not
           (utils/submap?
            {:status 200}
            {:status 404,
             :body "Not Found",
             :headers {"Content-type" "text/html"}}))

  1. I hear https://github.com/nubank/matcher-combinators is another great option here if you’re ready to invite a new dependency into your project. ↩︎

Permalink

Local S3 storage with MinIO for your Clojure dev environment

Simple Storage Service or S3 Cloud Object Storage is a versatile and cheap storage. I’ve used it for invoices, contracts, media, and configuration snapshots, among other things. It’s a perfect fit when the object key (or what many might think of as a filename or path) can be used as a unique key to retrieve what you need — and doing it from Clojure is no exception.

💡 This post assumes you have the following tools installed: Docker, Docker Compose, the aws CLI tool, and Clojure, of course.

S3 is best for write-once-read-many use cases, and in a well-designed application architecture, you can often derive an object key (path) from already available data, like a customer number.

Object key (path) examples:

customer-58461/invoice-1467.pdf
marketing-campaign-14/revision-3.template

Though S3 was originally an Amazon product introduced in 2006, many cloud providers now offer fully compatible services. S3 is cheaper than traditional block storage (also known as volume storage or disk storage), and while it is also slower, it scales extremely well, even across regions (like EU and US).

I probably wouldn’t be so enthusiastic about S3 if it weren’t so easy to use in a local development environment. That’s where MinIO, running in a Docker container, comes in.

I usually commit a docker-compose.yml file to the repo alongside any code that requires S3-compatible object storage:

services:
  s3:
    image: minio/minio
    ports:
      - "9000:9000"     # S3 API service endpoint
      - "9001:9001"     # Web interface
    volumes:
      - './data/minio:/data' # Persist data; path must match command 👇
    command: server /data --console-address ":9001"
    environment:
      MINIO_DOMAIN: localhost:9000 # Required for virtual-host bucket lookups
      MINIO_ROOT_USER: AKIAIOSFODNN7EXAMPLE
      MINIO_ROOT_PASSWORD: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Amazon S3 typically runs on the default HTTPS port (443), while MinIO defaults to port 9000. The web interface on port 9001 is excellent for browsing bucket content where using code or CLI is “too much”. Storing data on a volume outside the container makes it easy to update MinIO without losing data since “updating” usually means stopping and deleting the old container and then starting a new container.

A new version of the Docker image can be pulled with:

docker pull minio/minio

Setting the MINIO_DOMAIN environment variable is required to support “virtual host”-style presigned URLs (more on that later). Root user credentials are configured using MINIO_ROOT_* environment variables, and these credentials work for both logging into the web interface and generating presigned URLs.

Before starting the service, it’s possible to pre-create buckets by creating local folders inside the data volume:

mkdir -p data/minio/mybucket1

⚠️ Warning: Buckets created this way will be missing metadata, which can cause issues when listing ALL buckets.

Start the S3-compatible storage (MinIO) with:

docker compose up s3  # (Press CTRL+C to stop)

With older versions of Docker Compose, the command looks like: docker-compose up s3 (notice the dash).

An AWS profile is handy when using the AWS CLI, among other things. The following will create a profile named minio:

Add these lines to $HOME/.aws/credentials:

[minio]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

And these to $HOME/.aws/config:

[profile minio]
region = us-east-1
endpoint_url = http://localhost:9000

Notice: MinIO’s default region is us-east-1, and the endpoint URL matches the configuration in docker-compose.yml.

Make sure that everything is configured correctly by checking read access:

aws s3 ls mybucket1 --profile minio

This should return a non-error but empty result.

Now, add a file to the bucket:

echo "Hello, World!" > hello.txt
aws s3 cp hello.txt s3://mybucket1/ --profile minio

… then verify the new content:

aws s3 ls mybucket1 --profile minio

Lo and behold — the bucket now has content.

Alternatively, set the environment variable AWS_PROFILE (using export AWS_PROFILE=minio) to avoid repeating --profile minio in each command.

With a local S3 bucket in place, let’s start interacting with it from Clojure.

The S3 clients from Cognitect AWS API and Aw Yeah (AWS API for Babashka) implicitly resolve credentials the same way the Amazon Java SDK and the aws CLI tool do.

By leveraging the AWS_PROFILE environment variable, both the Clojure code and the aws CLI tool will use the exact same configuration. This makes any credential-related issues easier to reproduce outside of your code. In theory, this also helps avoid the need for code along the lines of (if production? ... and (if dev? ..., which is usually a sign of poor software design.

There are many ways to manage environment variables, and each IDE handles them differently. To check if the REPL has picked up the expected environment configuration, use the following:

(System/getenv "AWS_PROFILE") ; Should return "minio"

In other contexts, I’d recommend avoiding implicit configuration, which using AWS_PROFILE is. However, in this case, it aligns with how AWS tooling is typically set up, making it intuitive for anyone already familiar with AWS. Plus, a wealth of resources (official docs, StackOverflow, etc.) rely on this convention.

To try out some code with the MinIO setup, start a REPL with AWS_PROFILE set and add the following dependencies to your deps.edn:

{:deps {com.cognitect.aws/api       {:mvn/version "0.8.741"}
        com.cognitect.aws/endpoints {:mvn/version "871.2.30.22"}
        com.cognitect.aws/s3        {:mvn/version "871.2.30.22"}
        dk.emcken/aws-simple-sign   {:mvn/version "2.1.0"}}}

To set up an S3 client:

(require '[cognitect.aws.client.api :as aws])

(def s3
  (aws/client {:api :s3 :endpoint-override {:protocol :http :hostname "localhost" :port 9000}}))

… and interact with the S3 bucket:

(aws/invoke s3 {:op :ListBuckets})
(aws/invoke s3 {:op :ListObjectsV2 :request {:Bucket "mybucket1"}})
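
To mirror the earlier CLI steps from the REPL, uploading and reading back an object looks like this (the request shapes follow the S3 PutObject/GetObject operations; double-check the keys against your aws-api version):

(aws/invoke s3 {:op      :PutObject
                :request {:Bucket "mybucket1"
                          :Key    "hello.txt"
                          :Body   (.getBytes "Hello, World!")}})

(-> (aws/invoke s3 {:op      :GetObject
                    :request {:Bucket "mybucket1" :Key "hello.txt"}})
    :Body ; an InputStream
    slurp)
;; => "Hello, World!"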

Unfortunately, neither Cognitect’s aws-api nor awyeah-api respects profiles or environment variables specifying an alternative endpoint, which is required when working with MinIO.

It forces the application to conditionally override the endpoint, which kinda ends up looking like (if dev? ... code 😭

I’ve reported the issue on GitHub. But until it’s fixed, I found the following generic code to be an acceptable compromise for working with MinIO locally:

(import '[java.net URL])

(defn url->map
  "Convenience function for providing a single connection URL over multiple
   individual properties."
  [^URL url]
  (let [port (.getPort url)
        path (not-empty (.getPath url))]
    (cond-> {:protocol (keyword (.getProtocol url))
             :hostname (.getHost url)}
      (not= -1 port)
      (assoc :port port)

      path
      (assoc :path path))))

(defn create-s3-client
  "Takes an endpoint-url (or nil if using default) and returns a S3 client."
  [endpoint-url]
  (let [url (some-> endpoint-url not-empty (URL.))]
    (cond-> {:api :s3}
      url (assoc :endpoint-override (url->map url))
      :always (aws/client))))

(def s3 ; local environment (using MinIO)
  (create-s3-client "http://localhost:9000"))

(def s3 ; production environment (using Amazon S3)
  (create-s3-client nil))

Now, replace the hardcoded local URL with an environment variable that is only set in the local development environment. The URL in the environment variable must match the endpoint_url configured in the AWS profile.

; export MY_APP_S3_URL=http://locahost:9000
; only set MY_APP_S3_URL locally
(def s3
  (create-s3-client (System/getenv "MY_APP_S3_URL")))

The post could have ended here, but it is often beneficial to provide presigned URLs for the content in an S3 bucket. Luckily, aws-simple-sign can presign URLs and works seamlessly with both of the previously mentioned S3 clients:

(require '[aws-simple-sign.core :as aws-sign])

; Using the already configured client from above:
(aws-sign/generate-presigned-url s3 "mybucket1" "hello.txt" {})
; Open URL in your browser and see "Hello, World!"

“Virtual hosted-style” presigned URLs only work because the MINIO_DOMAIN environment variable is configured in docker-compose.yml. Alternatively, use path-style URLs. Check out Amazon’s official documentation on the different styles if you need a refresher.

This illustrates just how simple working with S3 from Clojure is. All the way from code iterations in the local development environment to transitioning the application into production.

Best of all, no AWS Java SDK is required. 🚀

Permalink

Setup Emacs to autoformat your Clojure code with Apheleia and zprint

Keeping code consistently formatted is important for readability and maintainability. Once you get used to having a computer format your code for you, manually formatting code can feel tedious.

For the last few years, my team has been using zprint to keep our Clojure codebase formatted to our specifications. zprint is great because it runs fast and is extremely customizable. This flexibility is powerful since it lets you format your code exactly how you want.

I've recently migrated from my own custom before-save-hook that triggered zprint whenever I saved a buffer to using Apheleia. Apheleia is an Emacs package that applies code formatters automatically on save. I won't quote the whole introduction in Apheleia's readme but it is designed to keep Emacs feeling responsive.

Here's the configuration I use in my Emacs setup:

(use-package apheleia
  :straight (apheleia :host github :repo "radian-software/apheleia")
  :config
  (setf (alist-get 'zprint apheleia-formatters)
        '("zprint" "{:style [:community] :map {:comma? false}}"))
  (setf (alist-get 'clojure-mode apheleia-mode-alist) 'zprint
        (alist-get 'clojure-ts-mode apheleia-mode-alist) 'zprint)
  (apheleia-global-mode t))

This snippet shows how to install and configure Apheleia using straight.el and use-package. The :config section tells Apheleia which modes should run zprint and how to run it.1 I found the docstring for apheleia-formatters to be crucial for figuring out how to hook zprint into Apheleia.

With this setup, your Clojure code will be automatically formatted using zprint every time you save. No more manual formatting needed. I've been running with this for a little while now and am enjoying it.

  1. I don't actually use :community and have my own custom formatting configuration but am using :community in this post so the snippet is immediately useful to readers.

Permalink

Optimizing syntax-quote

Syntax-quote in Clojure is an expressive reader macro for constructing syntax. The library backtick demonstrates how to write syntax-quote as a macro. Both approaches are ripe for optimization—take these programs that both evaluate to []:

'`[]
;=> (clojure.core/apply
;     clojure.core/vector
;     (clojure.core/seq (clojure.core/concat)))

(macroexpand-1 '(backtick/syntax-quote []))
;=> (clojure.core/vec (clojure.core/concat))

The reason syntax-quote’s expansion is so elaborate comes down to unquote-splicing (~@) support. When the program passed to ~@ is completely dynamic, the extra computation is essential.

(macroexpand-1 '(backtick/syntax-quote [1 ~@a 4]))
;=> (clojure.core/vec (clojure.core/concat [(quote 1)] a [(quote 4)]))

Since a cannot be evaluated at compile-time, we can’t do much better than this in terms of code size. The problem is that we’re stuck with this extra scaffolding even when ~@ is never used. We don’t even need it for unquote (~):

(macroexpand-1 '(backtick/syntax-quote [1 ~two ~three 4]))
;=> (clojure.core/vec
;     (clojure.core/concat
;       [(quote 1)] [two] [three] [(quote 4)]))

A more direct expansion would be ['1 two three '4].

I have implemented a branch of backtick that optimizes syntax-quote to only pay the penalty of ~@ if it is used.

You can see the progression of the generated program becoming more dynamic as less static information can be inferred.

(macroexpand-1 '(backtick/syntax-quote []))
;=> []
(macroexpand-1 '(backtick/syntax-quote [~local-variable]))
;=> [local-variable]
(macroexpand-1 '(backtick/syntax-quote [~@local-variable]))
;=> (clojure.core/vec local-variable)

Future work includes flattening spliced collections, such as:

(macroexpand-1 '(backtick/syntax-quote [1 ~@[two three] 4]))
;=> (clojure.core/vec
;     (clojure.core/concat [(quote 1)] [two three] [(quote 4)]))

This should be simply ['1 two three '4].

PRs to my branch are welcome for further performance enhancements.

Also, if you are interested in implementing these changes in a real Clojure implementation, jank is accepting contributions. It will directly help with ongoing efforts towards AOT compilation:

Permalink

Q2 2025 Funding Announcement

Clojurists Together is excited to announce that we will be funding 6 projects in Q2 2025 for a total of $33K USD (3 for $9K and 3 shorter or more experimental projects for $2K). Thanks to all our members for making this happen! Congratulations to the 6 developers below:

$9K Projects
Bozhidar Batsov: CIDER
Brandon Ringe: CALVA
Jeaye Wilkerson: Jank

$2K Projects
Jeremiah Coyle: Bling
Karl Pietrzak: CodeCombat
Siyoung Byun: Scicloj - Building Bridges to New Clojure Users

Bozhidar Batsov: CIDER

Provide continued support for CIDER, nREPL and the related libraries (e.g. Orchard, cider-nrepl, etc.) and improve them in various ways.

Some ideas that I have in my mind:

  • Improve support for alternative Clojure runtimes
  • Simplify some of CIDER’s internals (e.g. jack-in, session management)
  • Improve CIDER’s documentation (potentially record a few up-to-date video tutorials as well)
  • Improve clojure-ts-mode and continue the work towards it replacing clojure-mode
  • Add support for clojure-ts-mode in inf-clojure
  • Continue to move logic outside of cider-nrepl
  • Improvements to the nREPL specification and documentation; potentially build a test suite for nREPL specification compatibility
  • Various improvements to the nREPL protocol
  • Stabilize Orchard and cider-nrepl enough to do a 1.0 release for both projects
  • Build a roadmap for CIDER 2.0
  • Write up an analysis of the State of Clojure 2024 survey results (connected to the roadmap item)

Brandon Ringe: CALVA

I’ll be working on a new REPL output view for Calva, which is a webview in VS Code. The current default REPL output view utilizes an editor and somewhat emulates a terminal prompt. The performance of the editor view degrades when there’s a high volume of output and/or when there are large data structures printed in it. The webview will allow us to add richer features to the output view, while also providing better performance.

I’ve started this work, and I’ll use the funding from Clojurists Together to get it over the finish line and release an initial, opt-in version of the REPL output webview. I’ll also be adding tests, responding to user feedback about the feature, fixing bugs, and adding features to it.

This is the first feature of Calva that integrates with VS Code’s API directly from ClojureScript. This is partly an experiment to see if writing more of Calva in ClojureScript is a good idea; I suspect that it is.

Jeaye Wilkerson: Jank

In Q1 2025, I built out jank’s error reporting to stand completely in a category of its own, within the lisp world. We have macro expansion stack tracing, source info preserved across expansions so we can point at specific forms in a syntax quote, and even clever solutions for deducing source info for non-meta objects like numbers and keywords. All of this is coupled with gorgeous terminal reporting with syntax highlighting, underlining, and box formatting.

In Q2, I plan to aim even higher. I’m going to build jank’s seamless C++ interop system. We had native/raw, previously, for embedding C++ strings right inside of jank code. This worked alright, but it was tied to jank having C++ codegen. Now that we have LLVM IR codegen, embedding C++ is less practical. Beyond that, though, we want to do better. Here’s a snippet of what I have designed for jank this quarter.
; Feed some C++ into Clang so we can start working on it.
; Including files can also be done in a similar way.
; This is very similar to native/raw, but is only used for declarations.
; It cannot run code.
(c++/declare "struct person{ std::string name; };")

; let is a Clojure construct, but c++/person. creates a value of the
; person struct we just defined above, in automatic memory
; (i.e. no heap allocation).
(let [s (c++/person. "sally siu")
      ; We can then access structs using Clojure's normal interop syntax.
      n (.-name s)
      ; We can call member functions on native values, too.
      ; Here we call std::string::size on the name member.
      l (.size n)]
  ; When we try to give these native values to println, jank will
  ; detect that they need boxing and will automatically find a
  ; conversion function from their native type to jank's boxed
  ; object_ptr type. If such a function doesn't exist, the
  ; jank compiler fails with a type error.
  (println n l))


In truth, this is basically the same exact syntax that Clojure has for Java interop, except for the c++ namespace to disambiguate. Since I want jank to work with other langs in the future, I think it makes sense to spell out the lang. Later, we may have a swift or rust namespace which works similarly. But let’s talk about this code.

This interop would be unprecedented. Sure, Clojure JVM does it, but we’re talking about the native world. We’re talking about C++. Ruby, Python, Lua, etc. can all reach into C. The C ABI is the lingua franca of the native world. But here, we’re reaching into C++ from a dynamic lang. We’ll call constructors, pull out members, call member functions, and jank will automatically ensure that destructors are called for any locals. Furthermore, jank already has full JIT compilation abilities for C++ code, so that means we can use our seamless interop to instantiate templates, define new structs which never existed before, etc.

Jeremiah Coyle: Bling

Bling is a library for rich text formatting in the console: https://github.com/paintparty/bling. Work on Bling in Q2 of 2025 will focus on the following 3 goals:

  • Add support for using hiccup to style and format messages
  • Add support for a template string syntax to style and format messages
  • Create 1-3 additional formatting templates for callouts, headers, and points-of-interest.

The following 4 features are stretch goals for Q2. They will be pursued in the following order when the initial 3 goals are completed.

  • Add support for automatic detection of the 3 levels of color support (16-color, 256-color, or Truecolor), using an approach similar to https://github.com/chalk/supports-color
  • Add documentation about how to leverage Bling to create great-looking warnings and errors in your own projects. An example of using Bling’s templates to create nice warnings can be found here: https://github.com/paintparty/fireworks?tab=readme-ov-file#helpful-warnings-forbad-option-values
  • Add documentation about using Bling in conjunction with existing libraries which format Spec and Malli messages into human readable form.
  • Support arbitrary hex colors, and their conversion, if necessary, to x256

Karl Pietrzak: Code Combat

My project will focus on adding Clojure(Script) to CodeCombat.
See Wiki page at https://github.com/codecombat/codecombat/wiki/Aether

Siyoung Byun: Scicloj - Building Bridges to New Clojure Users

In 2025, Scicloj aims to improve the accessibility of Clojure for individuals working with data, regardless of their programming backgrounds. The project will initially focus on reviewing existing Scicloj libraries, analyzing their codebases, and actively using them to better understand their documentation structure. Specifically, the initial effort will concentrate on clearly organizing and distinguishing between tutorials and API documentation. From these insights, the project aims to develop standardized templates to encourage greater consistency across the documentation of existing Scicloj ecosystem libraries, making those libraries more robust and user-friendly.

Permalink

Clojure Power Tools Part 3

[Image: Clojure REPL]


Introduction

I have already covered Clojure Power Tools some 5 years ago in a couple of blog posts:

In this new blog post, I will briefly summarize the most important tools discussed in those two blog posts and then introduce some new power tools that I have found useful recently. I thought it might be a good idea to list all the most important Clojure power tools in one blog post so that I don’t forget them in the future. I also add to this list the Clojure libraries that I use in my Clojure/script fullstack applications.

I use this Clojure fullstack application to introduce those power tools: replicant-webstore.

VSCode, Calva and REPL Editor Integration

Your editor is of course one of your most important tools, whatever programming language you use. My current choice is Visual Studio Code. It is rather light but also provides a rich set of extensions for various programming purposes. Nowadays, it also provides good generative AI integration to help you with your programming tasks; I have written a couple of blog posts about this: Using Copilot in Programming and my Copilot Keybindings.

If you are programming Clojure with the VSCode editor, I definitely recommend the excellent Calva extension. It provides great Clojure REPL integration for VSCode, paredit structural editing, and much more. If you are interested in trying Calva, I recommend reading the excellent Calva documentation and starting to use it. I have also written three blog posts regarding my Calva configurations:

An important part of using Clojure is the keybindings (e.g. for evaluating forms, giving paredit commands, etc.). I have written a couple of blog posts regarding my keybindings:

And one hint: keep your VSCode configurations (at least keybindings.json and settings.json) in version control (Git).

Babashka

Babashka is a marvelous tool for writing scripts and automating tasks. I have written a couple of blog posts regarding Babashka:

I learned from one Metosin example project how to use Babashka as a task runner for my projects. See my latest Clojure fullstack exercise in which I used Babashka as a task runner, especially the file bb.edn and the bb-scripts directory for how to start the backend and frontend REPLs.

Fullstack Libraries

Metosin Libraries: Reitit, Malli and Jsonista

These are my favourite Metosin libraries that I always include in my Clojure fullstack projects. You can use these libraries both in the backend and the frontend.

Reitit provides excellent routing functionalities. See the routing setup in the Clojure fullstack application I mentioned previously.
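
As a rough flavor of what Reitit routing looks like, here is a minimal hedged sketch (the route and handler below are made up for illustration and are not from that application):

  (require '[reitit.ring :as ring])

  ;; A tiny router with a single hypothetical route.
  (def app
    (ring/ring-handler
      (ring/router
        [["/api/ping" {:get (fn [_] {:status 200 :body "pong"})}]])))

  ;; Calling the handler with a Ring request map:
  (app {:request-method :get :uri "/api/ping"})
  ;;=> {:status 200, :body "pong"}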

Malli provides excellent schemas that you can define in a common Clojure (cljc) file and use in both your backend API and your frontend to validate that the backend returns data conforming to the schema. See the example in that Clojure fullstack application: schema.cljc.
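
As a small hedged sketch of that idea (the book shape below is guessed from the demo data in this post, not the actual schema.cljc):

  (require '[malli.core :as m])

  ;; Hypothetical schema resembling the book data used later in this post.
  (def Book
    [:map
     [:id :int]
     [:title :string]
     [:price :double]])

  (m/validate Book {:id 2001 :title "Kalevala" :price 3.95})
  ;;=> true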

Jsonista is a Clojure library for JSON encoding and decoding. Using Muuntaja you can easily do edn/json encoding in your API.
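
For example, the basic Jsonista round trip looks roughly like this (a minimal sketch, not code from the demo application):

  (require '[jsonista.core :as j])

  (j/write-value-as-string {:name "Book" :price 29.99})
  ;;=> "{\"name\":\"Book\",\"price\":29.99}"

  (j/read-value "{\"name\": \"Book\", \"price\": 29.99}" j/keyword-keys-object-mapper)
  ;;=> {:name "Book", :price 29.99}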

Aero and Integrant

Aero is an excellent configuration library. See example in config.edn regarding the demonstration application configuration and how to read it in main.clj.
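
The general Aero pattern looks roughly like this (a hedged sketch; the keys and profile below are hypothetical and not the actual config.edn of the demo application):

  (require '[aero.core :as aero]
           '[clojure.java.io :as io])

  ;; config.edn could contain e.g. {:server {:port #profile {:dev 8331 :prod 80}}}
  (def config
    (aero/read-config (io/resource "config.edn") {:profile :dev}))

  (get-in config [:server :port])
  ;;=> 8331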

Integrant provides a nice way to define your application from components, define the relationships between the components in your configuration (see the config.edn file above), and reset/reload the state of your application using those components. See also db.clj in which the defmethod ig/init-key :db/tsv function reads the tab separated file and initializes our little “demonstration database.”
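
As a minimal hedged sketch of the Integrant pattern (the components below are made up and simpler than the real db.clj):

  (require '[integrant.core :as ig])

  ;; Hypothetical system configuration: the handler depends on the db component.
  (def config
    {:db/data     {:path "data"}
     :app/handler {:db (ig/ref :db/data)}})

  (defmethod ig/init-key :db/data [_ {:keys [path]}]
    (atom {:path path :books []}))

  (defmethod ig/init-key :app/handler [_ {:keys [db]}]
    (fn [_request] {:status 200 :body (count (:books @db))}))

  ;; Start and stop the whole system:
  (def system (ig/init config))
  (ig/halt! system)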

Replicant and Hiccup

In the frontend, I used Reagent for years, which is a React wrapper for ClojureScript. There are some technical challenges for Reagent in adopting the latest React versions, so I was looking for a new ClojureScript UI technology. I first considered UIx, which is also a React wrapper for ClojureScript. But then I discovered Replicant, which together with Hiccup is a very lightweight and Clojurish way of doing frontend development. I have covered Replicant in a couple of my blog posts:
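
As a tiny hedged sketch of the Replicant plus hiccup style (the element id and data are made up):

  (require '[replicant.dom :as r])

  ;; Render plain hiccup data into a DOM element; rendering the same element
  ;; again with new hiccup efficiently updates the DOM.
  (r/render (js/document.getElementById "app")
            [:ul
             [:li "Kalevala"]
             [:li "Seven Brothers"]])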

Development Practices and Tools

REPL

If you are learning Clojure, Programming at the REPL is something you definitely have to learn. You should check what kind of REPL support your editor provides and start learning to use it. If you are using VSCode, you can find more information above in the chapter VSCode, Calva and REPL Editor Integration.

I have three monitors at my desk. The main monitor is where I keep my VSCode editor, and on a side monitor I keep the REPL window. This way I can maximize the main monitor for editing but still see the REPL output beside it. If you are using VSCode, this is easy. First start the REPL with Calva. If you have done the same kind of Calva configuration that I explained in my previous blog posts, you should have the Calva output as a tab in the VSCode editor area. Give the VSCode command View: Move Editor into New Window; this moves the active editor tab into a new VSCode window. Now you can move the REPL output window to your second monitor.

I have a couple of similar commands to evaluate Clojure forms in Calva. Alt+L evaluates the form and shows the result in the editor as an ephemeral inline display, which you can dismiss with the Esc key. With Alt+Shift+L, Calva writes the evaluation result below the evaluated form, like this:

  (keys (deref (:db/tsv (user/env))))
  ;;=> (:books :movies)

The REPL is your power tool with Clojure, and you should learn to use it efficiently.

Personal Profile Deps

This is my current ~/.clojure/deps.edn file:


{:aliases {:kari {:extra-paths ["scratch"]
                  :extra-deps {; NOTE: hashp 0.2.1 sci print bug.
                               hashp/hashp {:mvn/version "0.2.2"}
                               org.clojars.abhinav/snitch {:mvn/version "0.1.16"}
                               com.gfredericks/debug-repl {:mvn/version "0.0.12"}
                               djblue/portal {:mvn/version "0.58.5"}}}

           :reveal {:extra-deps {vlaaad/reveal {:mvn/version "1.3.284"}}
                    :ns-default vlaaad.reveal
                    :exec-fn repl}

           :outdated {;; Note that it is `:deps`, not `:extra-deps`
                      :deps {com.github.liquidz/antq {:mvn/version "2.11.1269"}}
                      :main-opts ["-m" "antq.core"]}}}

I use these tools quite often and therefore keep them in my personal profile kari.

I then add my kari profile to the scripts I use to start a REPL in development, like this:

:backend-repl-command ["clojure -M:dev:backend:frontend:shadow-cljs:calva-external-repl:test:kari -i bb-scripts/backendinit.clj -m nrepl.cmdline --middleware \"[cider.nrepl/cider-middleware,shadow.cljs.devtools.server.nrepl/middleware]\""]
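
You can also start a plain REPL with only the personal profile, outside any project:

  clojure -M:kari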

Inline Defs

Inline defs is an old Clojure trick to debug Clojure code. Let’s explain it with a small example:

(defmethod ig/init-key :db/tsv [_ {:keys [path data] :as db-opts}]
  (log/infof "Reading tsv data, config is %s" (pr-str db-opts))
  (let [books (read-datafile (str path "/" (:books data)) book-line book-str)
        _ (def mybooks books) ;; THIS IS THE INLINE DEF
        movies (read-datafile (str path "/" (:movies data)) movie-line movie-str)]
    (atom {:books books
           :movies movies})))

(comment
  ;; AND HERE WE EXAMINE WHAT HAPPENED.
  (count mybooks)
  ;;=> 35
  (first mybooks)
  ;;=> {:id 2001,
  ;;    :product-group 1,
  ;;    :title "Kalevala",
  ;;    :price 3.95,
  ;;    :author "Elias Lönnrot",
  ;;    :year 1835,
  ;;    :country "Finland",
  ;;    :language "Finnish"}
  )

I hardly ever use the Calva debugger, since Clojure provides much better tools to examine your live program state. Nowadays instead of inline defs, I use Snitch.

Hashp

I used to use Hashp quite often in my debugging sessions, but nowadays I mostly use Snitch. Still, instead of adding a prn line to some let and watching the REPL output, hashp is a good alternative.
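
For reference, using hashp is just a matter of putting the #p reader tag in front of the form you want to inspect (a minimal sketch with a made-up function):

  (require 'hashp.core)

  (defn movie-total [movies]
    (reduce + #p (map :price movies)))

  ;; Calling the function prints something like the following to the REPL,
  ;; in addition to returning the normal result:
  ;; #p[user/movie-total:4] (map :price movies) => (9.95 14.95)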

Portal

You can use Portal in development to tap> various data. I have added a couple of examples of how to tap data in the application’s files.

On the Clojure side, in routes.clj:

  ;; Example how to tap to the data using djblue Portal:
  (require '[clj-http.client :as client])
  (require '[jsonista.core :as json])
  (defn json-to-edn [json-str]
    (json/read-value json-str (json/object-mapper {:decode-key-fn keyword}))) 
  (json-to-edn "{\"name\": \"Book\", \"price\": 29.99}") 
  
  (:body (client/get "http://localhost:8331/api/products/books"))
  ;; Tap to the data:
  ; https://github.com/djblue/portal
  (require '[portal.api :as p])
  ; This should open the Portal window.
  (def p (p/open))
  (add-tap #'p/submit) 
  (tap> :hello)
  (tap> (json-to-edn (:body (client/get "http://localhost:8331/api/products/books"))))
  ;; You should now see a vector of book maps in the portal window.

On the ClojureScript side, in app.cljs:

  ;; Example how to tap to the data using djblue Portal: 
  (require '[portal.web :as p])
  ; NOTE: This opens a popup window; you have to allow it in the browser!
  (def p (p/open))
  ; Now you should have a new pop-up browser window...
  (add-tap #'p/submit)
  (tap> :hello)
  (tap> (get-in @!state [:db/data :books]))
  ;; You should now see a vector of book maps in the portal window.

Gadget

Gadget is nowadays my main debugging tool with Replicant. It provides a very good view into your frontend state while developing the frontend.


Calva Debugger

Calva provides a nice debugger. As I already explained, I very seldom use it. But I just used it to create the example below, and I realized that it is actually quite a nice tool that I should use more in the future.


So you just add the #dbg reader tag to your code, and once execution reaches that point, the debugger triggers.
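
For example (a made-up function, just to show where the tag goes):

  (defn feed-total [items]
    ;; Evaluate this defn with Calva and then call the function;
    ;; execution stops here and the debugger opens.
    #dbg (reduce + (map :price items)))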

Snitch

Peter Strömberg, the creator of Calva, once again introduced an excellent new tool to me: Snitch. I watched Peter’s excellent demo of how he uses Snitch, and I immediately realized that I should switch from ad hoc inline defs to Snitch. I recommend watching Peter’s video on how to use Snitch.

Snitch is a tool that adds inline defs to your function.

I mostly use defn*, which injects inline defs for all the bindings in the function: parameters and let bindings. If I want to examine what happens in a function, my workflow is like this:

  1. Change defn => defn*.

  2. Integrant reset.

  3. Call the API (or whatever finally calls the function).

  4. Examine the bindings in the function by evaluating them in the function context.

Add this to your user.clj:

;; https://github.com/AbhinavOmprakash/snitch
(require '[snitch.core :refer [defn* defmethod* *fn *let]])
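
A tiny sketch of the workflow described above (the function is made up):

  ;; Change defn to defn* and re-evaluate (or do an Integrant reset):
  (defn* find-book [db id]
    (let [books (:books @db)
          book  (first (filter #(= id (:id %)) books))]
      book))

  ;; After something calls (find-book the-db 2001), the parameters and
  ;; let bindings are available as vars in this namespace:
  (comment
    id    ;;=> 2001
    books ;;=> [...]
    book  ;;=> {:id 2001, :title "Kalevala", ...}
    )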

Bonus Tool: Copilot

My corporation provides a GitHub Copilot Enterprise license. Copilot is a great tool to assist you in programming. I am still a bit of an old-school programmer in the sense that I hardly ever let Copilot edit the actual text files; instead, I mostly have a conversation with Copilot in the VSCode integrated Copilot Chat view.

I have explained my Copilot use in this blog post: Copilot Keybindings.

Conclusions

Clojure is an excellent programming language. It has a rich ecosystem and tools that you just don’t get in other programming languages, because languages that are not homoiconic simply can’t have, for example, a real REPL.

The writer is working at a major international IT corporation building cloud infrastructures and implementing applications on top of those infrastructures.

Kari Marttila

Kari Marttila’s home page on LinkedIn: https://www.linkedin.com/in/karimarttila/

Permalink

Clojure Deref (Apr 17, 2025)

Welcome to the Clojure Deref! This is a weekly link/news roundup for the Clojure ecosystem (feed: RSS). Thanks to Anton Fonarev for link aggregation.

Libraries and Tools

New releases and tools this week:

  • cursive 2025.1 - Cursive: The IDE for beautiful Clojure code

  • conjtest 0.0.1 - A command-line utility heavily inspired by and partially based on Conftest

  • cheshire 6.0.0 - Clojure JSON and JSON SMILE (binary json format) encoding/decoding

  • big-container - doom emacs development inside a container

  • datomic-pro-manager 1.0.0 - Download, setup, and run Datomic Pro backed by SQLite in a single command

  • amalgam 2.0.0 - Useful utilities and mixtures for com.stuartsierra/component

  • calva 2.0.501 - Clojure & ClojureScript Interactive Programming for VS Code

  • squint 0.8.146 - Light-weight ClojureScript dialect

  • cli 0.8.65 - Turn Clojure functions into CLIs!

  • slim 0.3.2 - The slim way to build Clojure

  • basilisp 0.3.8 - A Clojure-compatible(-ish) Lisp dialect targeting Python 3.9+

  • clojure-ts-mode 0.2.3 - The next generation Clojure major mode for Emacs, powered by TreeSitter

  • nippy 3.5.0 - Fast serialization library for Clojure

  • timbre 6.7.0 - Pure Clojure/Script logging library

  • tufte 2.7.0 - Simple performance monitoring library for Clojure/Script

  • http-kit 2.9.0-beta1 - Simple, high-performance event-driven HTTP client+server for Clojure

  • license-finder 0.4.0 - Finds licenses of your Clojure(Script) dependencies

  • babashka 1.12.198 - Native, fast starting Clojure interpreter for scripting

Permalink

Next-level backends with Rama: fault-tolerant timed notifications in 25 LOC

This is part of a series of posts exploring programming with Rama, ranging from interactive consumer apps, high-scale analytics, background processing, recommendation engines, and much more. This tutorial is self-contained, but for broader information about Rama and how it reduces the cost of building backends so much (up to 100x for large-scale backends), see our website.

Like all Rama applications, the example in this post requires very little code. It’s easily scalable to millions of reads/writes per second, ACID compliant, high performance, and fault-tolerant from how Rama incrementally replicates all state. Deploying, updating, and scaling this application are all one-line CLI commands. No other infrastructure besides Rama is needed. Comprehensive monitoring on all aspects of runtime operation is built-in.

In this post, I’ll explore implementing timed notifications in a Rama backend. More generally, I’ll be showing how to schedule future work in a fault-tolerant way.

The example will store a “feed” per user and accept events that specify when in the future to add an item to a user’s feed. Code will be shown in both Clojure and Java, with the total code being only 25 lines for each implementation. You can download and play with the Clojure implementation in this repository or the Java implementation in this repository.

Backend storage

Indexed datastores in Rama, called PStates (“partitioned state”), are much more powerful and flexible than databases. Whereas databases have fixed data models, PStates can represent infinite data models due to being based on the composition of the simpler primitive of data structures. PStates are distributed, durable, high-performance, and incrementally replicated. Each PState is fine-tuned to what the application needs, and an application makes as many PStates as needed. For this application, we’ll make two PStates: one to track the feed for each user, and one to manage scheduled work in the future.

Here’s the PState definition to track each user’s feed:

Clojure:
(declare-pstate
  topology
  $$feeds
  {String (vector-schema String {:subindex? true})})

Java:
topology.pstate("$$feeds",
                PState.mapSchema(String.class,
                                 PState.listSchema(String.class).subindexed()));

This declares the PState as a map of lists, with the key being a username string and the inner lists containing the string items for that user’s feed. The inner list is declared as “subindexed”, which instructs Rama to store the elements individually on disk rather than reading and writing the whole list as one value. Subindexing enables nested data structures to have billions of elements and still be read and written extremely quickly. This PState can support many queries in less than one millisecond: get the number of items in a feed, get a single item at a particular index, or get all items between two indices.

Here’s the definition of the PState to track scheduled work:

Clojure:
(let [scheduler (TopologyScheduler. "$$scheduled")]
  (.declarePStates scheduler topology))

Java:
TopologyScheduler scheduler = new TopologyScheduler("$$scheduled");
scheduler.declarePStates(topology);

This uses a higher-level abstraction called TopologyScheduler from the open-source rama-helpers project. TopologyScheduler is a small helper that handles the storage and logic for scheduling future work, designed to be used as part of any module. It keeps state in a PState which is declared into a topology with the declarePStates method. You’ll soon see its helper methods for injecting code into a topology to schedule future work and handling scheduled work which is ready to execute. Because TopologyScheduler is built upon Rama’s primitives of PStates and topologies, it’s scalable and fault-tolerant.

declarePStates takes as an argument the name of the PState for this TopologyScheduler instance. Later on when methods are called on TopologyScheduler to inject code, it will automatically reference that PState when doing reads and writes. TopologyScheduler stores in that PState an ordered queue based on the timestamp for which each piece of future work is scheduled.

Let’s now review the broader concepts of Rama in order to understand how these PStates will be utilized.

Rama concepts

A Rama application is called a “module”. In a module you define all the storage and implement all the logic needed for your backend. All Rama modules are event sourced, so all data enters through a distributed log in the module called a “depot”. Most of the work in implementing a module is coding “ETL topologies” which consume data from one or more depots to materialize any number of PStates. Modules look like this at a conceptual level:

Modules can have any number of depots, topologies, and PStates, and clients interact with a module by appending new data to a depot or querying PStates. Although event sourcing traditionally means that processing is completely asynchronous to the client doing the append, with Rama that’s optional. By being an integrated system Rama clients can specify that their appends should only return after all downstream processing and PState updates have completed.

A module deployed to a Rama cluster runs across any number of worker processes across any number of nodes, and a module is scaled by adding more workers. A module is broken up into “tasks” like so:

A “task” is a partition of a module. The number of tasks for a module is specified on deploy. A task contains one partition of every depot and PState for the module as well as a thread and event queue for running events on that task. A running event has access to all depot and PState partitions on that task. Each worker process has a subset of all the tasks for the module.

Coding a topology involves reading and writing to PStates, running business logic, and switching between tasks as necessary.

Implementing the module

Let’s start implementing the module for timed notifications. The first step to coding the module is defining the depots:

Clojure:
(defmodule TimedNotificationsModule
  [setup topologies]
  (declare-depot setup *scheduled-post-depot (hash-by :id))
  (declare-tick-depot setup *tick 1000)
  )

Java:
public class TimedNotificationsModule implements RamaModule {
  @Override
  public void define(Setup setup, Topologies topologies) {
    setup.declareDepot("*scheduled-post-depot", Depot.hashBy("id"));
    setup.declareTickDepot("*tick", 1000);
  }
}

This declares a Rama module called “TimedNotificationsModule” with two depots. The first depot is called *scheduled-post-depot and will receive all scheduled post information. Objects appended to a depot can be any type. The second argument of declaring the depot is called the “depot partitioner” – more on that later.

To keep the example simple, the data appended to the depot will be defrecord objects for the Clojure version and HashMap objects for the Java version. To have a tighter schema on depot records you could instead use Thrift, Protocol Buffers, or a language-native tool for defining the types. Here’s the function that will be used to create depot data:

Clojure:
(defrecord ScheduledPost [id time-millis post])

Java:
public static Map makeScheduledPost(String id, long timeMillis, String post) {
  Map ret = new HashMap();
  ret.put("id", id);
  ret.put("time-millis", timeMillis);
  ret.put("post", post);
  return ret;
}

The second depot *tick will be used for checking if any scheduled posts are ready to be appended to a user’s feed and then triggering the appropriate code to do so. Unlike the first depot, this is declared as a “tick depot”. Whereas a normal depot emits whenever new data is appended to it, a tick depot emits when the configured amount of time has passed. Subscribing to a tick depot from a topology is no different than subscribing to a regular depot. This particular tick depot is configured to emit once every 1000 milliseconds, meaning a scheduled post will be delivered within one second of its scheduled time.

Next, let’s begin defining the topology to consume data from the depot and materialize the PStates. Here’s the declaration of the topology with the PStates:

Clojure:
(defmodule TimedNotificationsModule
  [setup topologies]
  (declare-depot setup *scheduled-post-depot (hash-by :id))
  (declare-tick-depot setup *tick 1000)
  (let [topology (stream-topology topologies "core")
        scheduler (TopologyScheduler. "$$scheduled")]
    (declare-pstate
      topology
      $$feeds
      {String (vector-schema String {:subindex? true})})
    (.declarePStates scheduler topology)
    ))

Java:
public class TimedNotificationsModule implements RamaModule {
  @Override
  public void define(Setup setup, Topologies topologies) {
    setup.declareDepot("*scheduled-post-depot", Depot.hashBy("id"));
    setup.declareTickDepot("*tick", 1000);

    StreamTopology topology = topologies.stream("core");
    TopologyScheduler scheduler = new TopologyScheduler("$$scheduled");

    topology.pstate("$$feeds",
                    PState.mapSchema(String.class,
                                     PState.listSchema(String.class).subindexed()));
    scheduler.declarePStates(topology);
  }
}

This defines a stream topology called “core”. Rama has two kinds of topologies, stream and microbatch, which have different properties. In short, streaming is best for interactive applications that need single-digit millisecond update latency, while microbatching has update latency of a few hundred milliseconds and is best for everything else. Streaming is used here so a user gets immediate feedback that their post has been scheduled.

Notice that the PStates are defined as part of the topology. Unlike databases, PStates are not global mutable state. A PState is owned by a topology, and only the owning topology can write to it. Writing state in global variables is a horrible thing to do, and databases are just global variables by a different name.

Since a PState can only be written to by its owning topology, they’re much easier to reason about. Everything about them can be understood by just looking at the topology implementation, all of which exists in the same program and is deployed together. Additionally, the extra step of appending to a depot before processing the record to materialize the PState does not lower performance, as we’ve shown in benchmarks. Rama being an integrated system strips away much of the overhead which traditionally exists.

Let’s now add the code that consumes *scheduled-post-depot to get posts durably scheduled for the future:

Clojure:
(defmodule TimedNotificationsModule
  [setup topologies]
  (declare-depot setup *scheduled-post-depot (hash-by :id))
  (declare-tick-depot setup *tick 1000)
  (let [topology (stream-topology topologies "core")
        scheduler (TopologyScheduler. "$$scheduled")]
    (declare-pstate
      topology
      $$feeds
      {String (vector-schema String {:subindex? true})})
    (.declarePStates scheduler topology)
    (<<sources topology
      (source> *scheduled-post-depot :> {:keys [*time-millis] :as *scheduled-post})
      (java-macro! (.scheduleItem scheduler "*time-millis" "*scheduled-post"))
      )))

Java:
public class TimedNotificationsModule implements RamaModule {
  @Override
  public void define(Setup setup, Topologies topologies) {
    setup.declareDepot("*scheduled-post-depot", Depot.hashBy("id"));
    setup.declareTickDepot("*tick", 1000);

    StreamTopology topology = topologies.stream("core");
    TopologyScheduler scheduler = new TopologyScheduler("$$scheduled");

    topology.pstate("$$feeds",
                    PState.mapSchema(String.class,
                                     PState.listSchema(String.class).subindexed()));
    scheduler.declarePStates(topology);

    topology.source("*scheduled-post-depot").out("*scheduled-post")
            .each(Ops.GET, "*scheduled-post", "time-millis").out("*time-millis")
            .macro(scheduler.scheduleItem("*time-millis", "*scheduled-post"));
  }
}

This part of the topology is only three lines, but there’s a lot to unpack here. The business logic is implemented with dataflow. Rama’s dataflow API is exceptionally expressive, able to intermix arbitrary business logic with loops, conditionals, and moving computation between tasks. This post is not going to explore all the details of dataflow as there’s simply too much to cover. Full tutorials for Rama dataflow can be found on our website for the Java API and for the Clojure API.

Let’s go over each line of this topology implementation. The first step is subscribing to the depot:

Clojure:
(<<sources topology
  (source> *scheduled-post-depot :> {:keys [*time-millis] :as *scheduled-post})

Java:
topology.source("*scheduled-post-depot").out("*scheduled-post")
        .each(Ops.GET, "*scheduled-post", "time-millis").out("*time-millis")

This subscribes the topology to the depot *scheduled-post-depot and starts a reactive computation on it. Operations in dataflow do not return values. Instead, they emit values that are bound to new variables. In the Clojure API, the input and outputs to an operation are separated by the :> keyword. In the Java API, output variables are bound with the .out method.

Whenever data is appended to that depot, the data is emitted into the topology. The Java version binds the emit to the variable *scheduled-post and then gets the field “time-millis” from the map into the variable *time-millis, while the Clojure version captures the emit as the variable *scheduled-post and also destructures a field into the variable *time-millis. All variables in Rama code begin with a *. The subsequent code runs for every single emit.

Remember that last argument to the depot declaration called the “depot partitioner”? That’s relevant here. Here’s that image of the physical layout of a module again:

The depot partitioner determines on which task the append happens and thereby on which task computation begins for subscribed topologies. In this case, the depot partitioner says to hash by the “id” field of the appended data. The target task is computed by taking the hash and modding it by the total number of tasks. This means data with the same ID always go to the same task, while different IDs are evenly spread across all tasks.

Rama gives a ton of control over how computation and storage are partitioned, and in this case we’re partitioning by the hash of the user ID since that’s how we ultimately want the $$feeds PState to be partitioned. This allows us to easily locate the task storing data for any particular user.

The final line schedules the post for the future:

Clojure:
(java-macro! (.scheduleItem scheduler "*time-millis" "*scheduled-post"))

Java:
.macro(scheduler.scheduleItem("*time-millis", "*scheduled-post"));

This uses the scheduleItem method from TopologyScheduler to write the scheduled post into the PState managed by TopologyScheduler. scheduleItem returns a block of code that is inserted into the topology using Rama’s “macro” facility. “Macros” are a feature of Rama’s Java API for decomposing bits of dataflow code and later mixing them into any other dataflow code. The definition of scheduleItem from the TopologyScheduler implementation is as follows:

public Block.Impl scheduleItem(Object timestampMillis, Object item) {
  String uuidVar = Helpers.genVar("scheduledUUID");
  String tupleVar = Helpers.genVar("scheduleTuple");
  String longVar = Helpers.genVar("timestampLong");
  return Block.each(() -> UUID.randomUUID().toString()).out(uuidVar)
              .each((Number n) -> n.longValue(), timestampMillis).out(longVar)
              .each(Ops.TUPLE, new Expr(TopologyScheduler::padTimeStr, longVar), uuidVar).out(tupleVar)
              .localTransform(_pstateVar, Path.key(tupleVar).termVal(item));
}

Here you can see it writes to its managed PState using localTransform after doing a bit of preparatory logic. Any intermediate variables are given unique names using genVar so they don’t inadvertently shadow any variables that are already in scope in the dataflow code where this code is being inserted. You can read more about macros in this section. Note that the Clojure API has additional facilities for decomposition that the Java API does not have, such as deframaop.

Understanding the implementation of this macro isn’t important – what matters is it handles the work of scheduling an item for the future at a particular timestamp. scheduleItem takes in a timestamp and an item, and that item will be given to the callback that’s later invoked when the scheduled time is reached. The state for scheduling the item is stored on the PState on this task, and the callback will be invoked on this same task when it’s ready.

Since TopologyScheduler is built with the Java API, the Clojure API has the java-macro! operation so that it can use utilities built around the Java API. This is also why the variables are specified as strings instead of symbols in this line of Clojure code, since the Java API represents variables as strings beginning with * .

Let’s now take a look at handling callbacks from TopologyScheduler , which completes the module implementation:

Clojure:
(defmodule TimedNotificationsModule
  [setup topologies]
  (declare-depot setup *scheduled-post-depot (hash-by :id))
  (declare-tick-depot setup *tick 1000)
  (let [topology (stream-topology topologies "core")
        scheduler (TopologyScheduler. "$$scheduled")]
    (declare-pstate
      topology
      $$feeds
      {String (vector-schema String {:subindex? true})})
    (.declarePStates scheduler topology)
   (<<sources topology
     (source> *scheduled-post-depot :> {:keys [*time-millis] :as *scheduled-post})
     (java-macro! (.scheduleItem scheduler "*time-millis" "*scheduled-post"))

     (source> *tick)
     (java-macro!
      (.handleExpirations
        scheduler
        "*scheduled-post"
        "*current-time-millis"
        (java-block<-
          (identity *scheduled-post :> {:keys [*id *post]})
          (local-transform> [(keypath *id) AFTER-ELEM (termval *post)] $$feeds)
          )))
     )))

Java:
public class TimedNotificationsModule implements RamaModule {
  @Override
  public void define(Setup setup, Topologies topologies) {
    setup.declareDepot("*scheduled-post-depot", Depot.hashBy("id"));
    setup.declareTickDepot("*tick", 1000);

    StreamTopology topology = topologies.stream("core");
    TopologyScheduler scheduler = new TopologyScheduler("$$scheduled");

    topology.pstate("$$feeds",
                    PState.mapSchema(String.class,
                                     PState.listSchema(String.class).subindexed()));
    scheduler.declarePStates(topology);

    topology.source("*scheduled-post-depot").out("*scheduled-post")
            .each(Ops.GET, "*scheduled-post", "time-millis").out("*time-millis")
            .macro(scheduler.scheduleItem("*time-millis", "*scheduled-post"));

    topology.source("*tick")
            .macro(scheduler.handleExpirations(
              "*scheduled-post",
              "*current-time-millis",
              Block.each(Ops.GET, "*scheduled-post", "id").out("*id")
                   .each(Ops.GET, "*scheduled-post", "post").out("*post")
                   .localTransform("$$feeds",
                                   Path.key("*id").afterElem().termVal("*post"))));
  }
}

The code added was:

Clojure:
(source> *tick)
(java-macro!
  (.handleExpirations
    scheduler
    "*scheduled-post"
    "*current-time-millis"
    (java-block<-
      (identity *scheduled-post :> {:keys [*id *post]})
      (local-transform> [(keypath *id) AFTER-ELEM (termval *post)] $$feeds)
      )))

Java:
topology.source("*tick")
        .macro(scheduler.handleExpirations(
          "*scheduled-post",
          "*current-time-millis",
          Block.each(Ops.GET, "*scheduled-post", "id").out("*id")
               .each(Ops.GET, "*scheduled-post", "post").out("*post")
               .localTransform("$$feeds",
                               Path.key("*id").afterElem().termVal("*post"))));

This first adds a subscription to the tick depot, which emits at the configured 1000 millisecond frequency. The emit isn’t captured into a variable like the previous depot subscription since all this code cares about is the frequency at which the code runs.

Tick depots are global and emit only on task 0, so the subsequent code runs on just one task each time it emits. The next line handles expired items using the handleExpirations method from TopologyScheduler and inserting the code into the topology with a macro. handleExpirations does the following:

  • Goes to all tasks using the “all partitioner”.
  • Fetches expired items from the PState based on the current time.
  • Loops over expired items and runs the code in the provided callback.

handleExpirations takes in three arguments as input. The first specifies a variable to bind the expired item. This is the same item as was passed to scheduleItem before, and the variable will be in scope for the callback code. The next argument specifies a variable to bind the current time, which isn’t needed for this particular use case. The last argument is the callback code that specifies what to do with an expired item. It’s an arbitrary block of dataflow code. Since the method expects a Java API block of code, the Clojure API uses java-block<- to convert Clojure dataflow code to a Java API block.

Let’s go through each line of this callback code:

Clojure:
(identity *scheduled-post :> {:keys [*id *post]})

Java:
Block.each(Ops.GET, "*scheduled-post", "id").out("*id")
     .each(Ops.GET, "*scheduled-post", "post").out("*post")

First, the “id” and “post” fields from the scheduled post are extracted into the variables *id and *post . The Clojure API destructures them, while the Java API uses the Ops.GET function to fetch them from the map.

The final line adds the post to the feed for the user:

Clojure:
(local-transform> [(keypath *id) AFTER-ELEM (termval *post)] $$feeds)

Java:
.localTransform("$$feeds",
                Path.key("*id").afterElem().termVal("*post"))));

The PState is updated with the “local transform” operation. The transform takes in as input the PState $$feeds and a “path” specifying what to change about the PState. When a PState is referenced in dataflow code, it always references the partition of the PState that’s located on the task on which the event is currently running.

Paths are a deep topic, and the full documentation for them can be found here. A path is a sequence of “navigators” that specify how to hop through a data structure to target values of interest. A path can target any number of values, and they’re used for both transforms and queries. In this case, the path navigates by the key *id to the list of posts for that user. The next navigator, called AFTER-ELEM in Clojure and afterElem() in Java, navigates to the “void” element after the end of the list. Setting that “void” element to a value with the “term val” navigator causes that value to be appended to that list.

This code writes to the correct partition of $$feeds because the callback code is run on the same task where the post was scheduled. That was on the correct task because of the depot partitioner, as discussed before.

TopologyScheduler performance

TopologyScheduler adds very little overhead to processing. Scheduling an item of future work is a fast PState write, and checking for expired items is also very fast due to how it’s indexed by TopologyScheduler . The total overhead from TopologyScheduler is one small unit of work to schedule it, and then one small unit of work to emit it to the callback code later. This means TopologyScheduler can handle very large throughputs of scheduled items.

Summary

There’s a lot to learn with Rama, but you can see from this example application how much you can accomplish with very little code. A timed notifications feature like this probably isn’t going to be a module of its own, but rather part of the implementation of a larger module with broader scope. For example, in our Twitter-scale Mastodon implementation, the techniques shown in this post are used for scheduled posts and poll expirations.

As mentioned earlier, there’s a GitHub project for the Clojure version and for the Java version containing all the code in this post. Those projects also have tests showing how to unit test modules in Rama’s “in-process cluster” environment.
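
To give a rough idea of what such a test looks like, here is a hedged sketch of interacting with the module through Rama’s Clojure API in an in-process cluster. This is written from memory of the public API, so the exact names and the query path may differ slightly; see the tests in the linked repositories for the authoritative version.

  (require '[com.rpl.rama :as r]
           '[com.rpl.rama.path :as path]
           '[com.rpl.rama.test :as rtest])

  (with-open [ipc (rtest/create-ipc)]
    (rtest/launch-module! ipc TimedNotificationsModule {:tasks 4 :threads 2})
    (let [module-name (r/get-module-name TimedNotificationsModule)
          depot       (r/foreign-depot ipc module-name "*scheduled-post-depot")
          feeds       (r/foreign-pstate ipc module-name "$$feeds")]
      ;; Schedule a post two seconds in the future.
      (r/foreign-append! depot
        (->ScheduledPost "alice" (+ (System/currentTimeMillis) 2000) "Hello!"))
      ;; After the scheduled time plus a tick has passed, the post appears in the feed.
      (Thread/sleep 3500)
      ;; Query the feed for "alice" (exact path may differ; see the repository tests).
      (r/foreign-select [(path/keypath "alice") path/ALL] feeds)))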

You can get in touch with us at consult@redplanetlabs.com to schedule a free consultation to talk about your application and/or pair program on it. Rama is free for production clusters for up to two nodes and can be downloaded at this page.

Permalink

Skyscrapers or mud huts

Our last episode was with David Nolen. We talked about his development process, his origin, and his philosophy. The next episode is on Tuesday, April 22 with special guest Fogus. Please watch us live so you can ask questions.

Have you seen Grokking Simplicity, my book for beginners to functional programming? Please check it out or recommend it to a friend. You can also get it from Manning. Use coupon code TSSIMPLICITY for 50% off.


Skyscrapers or mud huts

From 2005-2007, I lived in the West African country of Guinea. I loved the people, but their country had a sad story. After rejecting their French colonizers, they made a series of bad deals with other countries. While they started out as the hope of West Africa due to their wealth of natural resources, after 45 years of kleptocracy, the infrastructure had decayed. While Guinea’s neighbors saw increasing prosperity, theirs was declining.

One of the symptoms of their decline was that they were de-urbanizing. In other countries in the region, people were leaving the village, looking for a better life in the cities. But in Guinea, apparently the better life was in the village.

I would often ponder this on my infrequent trips to the capital, Conakry. Large portions of the streets were broken or missing, revealing mud that could swallow a small car during the rainy season. Utility poles were a rat’s nest of spliced wires. Probably most of the wires were dead. Roofs leaked. Many inhabited ten-story highrises had no electricity or running water.

Contrast this to the village where I lived. Yes, they were poor, but I never saw squalor. What I came to discover was that the village was simply easier to maintain by people with few resources. You could make a house out of mud bricks and thatch it with straw you harvested from your field. If the roof leaked, you could climb up and patch it. If a wall crumbled, you could repair it.

In the city, the tall buildings require a huge number of resources to maintain. You need cranes and bulldozers and other diesel machines. You need cement and iron forges and other high-input materials. You need a functional supply chain to get the diesel and other inputs to you. And you need the highly skilled labor to do the repairs. Don’t forget a functioning economy to pay for those workers. In short, modern urban life, which is very attractive when it works, is hard to maintain.

Even if you aren’t talking about repairs, the cost of keeping the building working is expensive. You need to pump water up to the top floors. If the building is tall enough, you need to pump clean air up there as well. And you need to deal with all of the waste products of life. Things have to keep working or the top floors aren’t usable anymore.

When programming, sometimes I wonder if our high-power tools, like Clojure, cause us a similar issue. One person can create a wondrous indirection mechanism that solves the problem at hand. It’s like erecting a skyscraper. Sure, it might be easy to write the code—easier than construction. But there is a certain mental cost. The complexity of understanding it requires the mental equivalent of cranes.

Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it? — Brian Kernighan

This quote from Kernighan (though I believe the sentiment comes from before his time) expresses succinctly the idea I’m trying to convey. Clojure gives us the power to build things that are too clever to debug, just like a surge of political will can create a skyscraper that you don’t have the ability to maintain. Is maintaining a skyscraper twice as hard as building it?

I’ve worked on codebases where the cleverness was too much to maintain. The history of the project was a super smart programmer being brought onto the team to solve a tough problem. They built a brilliant solution. Then they left to use their brain somewhere more worthy of their mental capacity. The team enjoyed their skyscraper while it slowly decayed.

Richard Gabriel wrote a wonderful essay in Patterns of Software about the habitability of our code. In it, he argues that habitability is the important quality of good software. Programmers work inside the code, making adjustments, improvements, and expansions. We need to feel at home there—and like we can take care of it ourselves.

What is important is that it be easy for programmers to come up to speed with the code, to be able to navigate through it effectively, to be able to understand what changes to make, and to be able to make them safely and correctly. If the beauty of the code gets in the way, the program is not well written, just as an office building designed to win design awards is not well designed when the building later must undergo changes but those changes are too hard to make. — Richard Gabriel

A skyscraper has low habitability. It’s hard to adapt it to new purposes. It’s hard to adapt it as your needs change. But a mud hut is easy to change and expand. Our software needs that because we can’t plan it out ahead of time.

The genius’s plan for the software might have been good. But it was a master plan that only they could understand. They were the smartest programmer at the company and they used all of their capacity. How could anyone else keep up? Further, you can never fully communicate the “theory” behind the code. Just like Peter Naur recounted in Programming as Theory Building, it’s not just the raw intelligence. There’s also quite a lot of knowledge for how to build and extend the program:

In several major cases it turned out that the solutions suggested by group B [the extenders of the software] were found by group A [the original builders] to make no use of the facilities that were not only inherent in the structure of the existing compiler but were discussed at length in its documentation, and to be based instead on additions to that structure in the form of patches that effectively destroyed its power and simplicity. The members of group A were able to spot these cases instantly and could propose simple and effective solutions, framed entirely within the existing structure. This is an example of how the full program text and additional documentation is insufficient in conveying to even the highly motivated group B the deeper insight into the design, that theory which is immediately present to the members of group A. — Peter Naur

You have to learn to live in a skyscraper. And you have to learn to extend the software.

So we’ve got this triple whammy:

  1. The mental capacity of the original “hero” developer.

  2. The knowledge and plan they build while developing it (the “theory”).

  3. The high-leverage Clojure programming language.

But here’s the thing: We can’t consider the power of Clojure to be a bad thing. That can’t make sense. If it were true, we’d want to move toward a much poorer programming language, maybe all the way back to assembly. I can’t believe that. Neither can we consider intelligence bad. Where would that lead us?

It seems to me that we have a problem of balancing power with wisdom. What I mean by wisdom is the skill of using half of one’s mental capacity to build the original software so that you have enough intelligence to debug it. It’s the ability to see that our plan is too ambitious, we need to grow the software more organically. It’s the ability to keep the theory simple so you can teach it to the rest of the team. Software can get complicated. The only solution seems to be humility.

Permalink

Learning Fennel from Scratch to Develop Neovim Plugins

by Laurence Chen

As a Neovim user writing Clojure, I often watch my colleagues modify Elisp to create plugins—for example, setting a shortcut key to convert Hiccup-formatted data in the editor into HTML. My feelings are quite complex. On one hand, I just can’t stand Emacs; I’ve tried two or three times, but I could not learn it well. On the other hand, if developing Neovim plugins means choosing between Lua and VimScript, neither of those options excites me. Despite these challenges, I envy those who can extend their editor using Lisp.

Wouldn’t it be great if there were a Lisp I could use to develop my own Neovim plugins? One day, I discovered Fennel, which compiles to Lua. Great! Now, Neovim actually has a Lisp option. But then came the real challenge—getting started with Fennel and Neovim development was much harder than I expected.

Here’s how I initially failed.

I read the aniseed GitHub repository’s README and followed along with the tutorial. Some Clojure-like functions seemed to work, but the critical Conjure command, jump to definition, didn’t. Next, I tried setting a seemingly simpler goal and pushed forward. However, whenever I encountered problems, I often didn’t know how to handle them. Eventually, I put the whole thing aside.

Reducing Runtime Dependencies

My second attempt at learning Fennel was triggered by a specific issue: when developing ClojureScript with Conjure, I had to manually run the command :ConjureShadowSelect [build-id] to enable interactive development. But I kept forgetting that command.

One day, I found that others had the same problem. Someone proposed a solution using Neovim auto commands:

" Define a function `AutoConjureSelect` to auto-select
function! AutoConjureSelect()
  let shadow_build=system("ps aux | grep 'shadow-cljs watch' | head -1 | sed -E 's/.*?shadow-cljs watch //' | tr -d '\n'")
  let cmd='ConjureShadowSelect ' . shadow_build
  execute cmd
endfunction
command! AutoConjureSelect call AutoConjureSelect()

" Trigger `AutoConjureSelect` whenever a cljs file is opened
autocmd BufReadPost *.cljs :AutoConjureSelect

Unfortunately, this solution didn’t work on my machine, likely due to OS differences. So I wrote a Babashka script to replace system(...):

#!/usr/bin/env bb

(require '[clojure.edn :as edn])
(require '[clojure.java.io :as io])

(def shadow-config (edn/read-string (slurp (io/file "shadow-cljs.edn"))))
(def build-ids (map name (keys (:builds shadow-config)))) ;; Convert to strings

(print (first build-ids))

Seeing how simple this Babashka script was, I felt inspired to take on Fennel again. If I could rewrite this in Fennel, my editor configuration would rely solely on the Neovim runtime—no extra Babashka runtime required.

After a few days of struggling, I barely made it.

I wrote a Fennel script at .config/nvim/fnl/auto-conjure.fnl, roughly the same length as the Babashka script:

(local {: autoload} (require :nfnl.module))
(local a (autoload :nfnl.core))
(local {: decode} (autoload :edn))
(local nvim vim.api)

(fn shadow-cljs-content []
  (a.slurp :shadow-cljs.edn))

(fn build-key [tbl]
  (a.first (a.keys (a.get tbl :builds))))

(fn shadow_build_id []
  (build-key (decode (shadow-cljs-content))))

{: shadow_build_id}

I then replaced system(...) with luaeval("require('auto-conjure').shadow_build_id()"), creating an auto Conjure command that relied only on the Neovim runtime.

Here is the full implementation and discussion.

How to Learn Fennel and Develop Neovim Plugins?

I finally have a grasp of writing Fennel. After some detours, I found the following learning sequence to be effective:

  1. Spend time reading the documentation on the Fennel website. The Setup Guide, Tutorial, Rationale, Lua Primer, Reference, and Fennel from Clojure are particularly helpful.

  2. Set up Conjure, Fennel syntax highlighting, and s-expression editing for Fennel.

  3. Choose a small enough project—preferably one that you can first implement in Babashka before rewriting in Fennel. Keep it simple; it doesn’t have to be a full-fledged plugin—just part of a feature. You can start by placing an .fnl file in .config/nvim/fnl/ and loading it in .config/nvim/init.vim via luaeval().

  4. While developing, you’ll encounter many challenges. The key is to learn how to debug effectively in Neovim. Writing plugins in Fennel isn’t difficult due to syntax or semantics—those are easy to pick up. The real challenge is Neovim’s runtime, which is a unique environment. You’ll need to memorize new commands to inspect the runtime state and understand its core workings to make reasonable debugging assumptions.

Common Pitfalls

One major trap I fell into was trying to reuse others’ Neovim configurations or convert my entire config to Fennel at once. My system’s configuration differed slightly from others’, so copying their Neovim setup led to an overwhelming number of bugs.

Additionally, after trying several Neovim plugins, I found that many didn’t suit my workflow. In the end, I stuck with my VimScript-based Neovim config, making only minor adjustments for Fennel development.

Aniseed is Friendly for Clojure Developers, but nfnl Fits the Lua Ecosystem Better

Originally, Conjure’s author, Olical, wrote Conjure in Clojure and used Neovim’s msgpack RPC for communication. Later, he rewrote it in Fennel. Since vanilla Fennel lacked Clojure-like namespaces, he created the aniseed compiler, which introduced module, defn, and def macros.

Developing with aniseed feels great, especially for Clojure developers—it’s almost like renaming namespace to module, replacing require with autoload, and keeping defn and def.

However, over time, Olical realized that forcing Fennel code to depend on Aniseed’s design wasn’t ideal. He then developed nfnl, a new Fennel compiler with a different coding style. In nfnl, functions are declared with fn, variables with local, and instead of module, files follow Lua’s convention of returning a table at the end. This makes nfnl less Clojurish but more compatible with Lua’s ecosystem.

aniseed vs nfnl

One confusing point: even in the latest Conjure version (4.53.0), the variable vim.g.conjure#filetype#fennel still points to "conjure.client.fennel.aniseed". After checking the release notes, I found this clarification:

I’m still working on the new Fennel client that runs through nfnl and supports REPL-driven development with pure Fennel (no module or def macros required!), but that’s still in the works.

Debug experience in Neovim runtime

During my learning process, the biggest challenge I encountered was my lack of understanding of the Neovim runtime. It felt like dealing with a black box—unexpected things happened, and I couldn’t immediately figure out a reasonable way to handle them. Here are some examples:

Plugin not automatically enabled

This type of issue is relatively simple. Checking the plugin’s source code and modifying global variables usually solves the problem.

Fennel is a lesser-known Lisp in the Lisp family, so some generic Lisp plugins might not automatically recognize it, causing them not to activate. The design convention in Neovim plugins typically addresses such issues by setting global variables. When you examine the plugin’s source code, you can often find some global variables prefixed with g:.

For example, in the guns/vim-sexp plugin, I modified a global variable, and s-expression editing was enabled for Fennel:

let g:sexp_filetypes = 'clojure,scheme,lisp,fennel'

Plugin enabled despite not being in init.vim

Earlier, I took a detour by using someone else’s Neovim config. Their init.lua installed several unfamiliar plugins. Later, when I abandoned init.lua and switched back to my own init.vim, the plugins installed via init.lua were still present. Additionally, some Neovim behaviors were unexpectedly altered.

How should this be handled? After some research, I discovered the Neovim Ex command :scriptnames, which lists all currently loaded plugins. After identifying and removing the unintended plugins, everything returned to normal.

Below is a list of Ex commands used for observing the state.

nvim Ex command

nfnl automatic compilation not occurring

Fennel files placed under .config/nvim/fnl can be automatically compiled using either aniseed or nfnl. However, I encountered a strange issue—when using aniseed, automatic compilation worked as expected after configuration, but with nfnl, it didn’t trigger at all.

Thus, my debugging steps became:

  • Check whether the nfnl module is loaded successfully.

This can be verified using the Ex command :lua print(require('nfnl')). If you get a result like table: 0x0102671b80, it means the module has been successfully loaded. If the module fails to load, an error message will be displayed instead.

  • Check whether manual compilation works.

Open an .fnl file and execute the Ex command :NfnlCompileFile. If successful, you should see output like:

Compilation complete.                                                                                               
{:destination-path "$path_to_nvim/lua/auto-conjure.lua"                         
 :source-path "$path_to_nvim/fnl/auto-conjure.fnl"                              
 :status "ok"} 

If .nfnl.fnl is misconfigured, you might get:

Compilation complete.                                                                                               
{:source-path "$path_to_nvim/fnl/auto-conjure.fnl"                     
 :status "path-is-not-in-source-file-patterns"}   

After performing these checks and finding no issues, I narrowed the problem down to auto command execution. Eventually, I discovered that an auto-formatting auto command I had set up was conflicting with the nfnl module’s auto command.

My original auto command was:

function! Fnlfmt()
 !fnlfmt --fix %
 " :e is to force reload the file after it got formatted.
 :e
endfunction

autocmd BufWritePost *.fnl call Fnlfmt()

Once I identified the issue, the solution was straightforward—I modified the auto command to perform both auto-formatting and compilation:

augroup FennelOnSave
  autocmd!
  autocmd BufWritePost *.fnl call Fnlfmt() | NfnlCompileFile
augroup END

Conclusion

For me, learning Fennel to develop Neovim plugins was not just about learning a language or a development environment—it was an opportunity to rethink how to effectively learn new technologies from scratch.

This journey provided several key takeaways:

  1. Learning strategy is crucial — Instead of trying to refactor my entire Neovim setup at the start, I should have started with a small, well-defined project and learned progressively. An ideal project should be small enough to implement in another language first. Additionally, reading the official documentation with patience often yields unexpected benefits.

  2. Understanding the Neovim runtime is essential — Fennel’s syntax itself isn’t difficult; the real challenge lies in interacting with the Neovim runtime. Mastering how to observe Neovim’s internal state (e.g., using Ex commands) is critical.

  3. The value of a standardized ecosystem — During my research, I also studied Conjure’s transition from aniseed to nfnl. This was a large-scale migration, but I appreciated nfnl’s design, which helped me better understand the significance of standardized ecosystems for code reuse.

For Clojure developers already using Neovim and Conjure, learning Fennel won’t be a steep challenge. If you haven’t tried it yet, why not eat your own dog food—it actually tastes pretty good!

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.