Reagent and React 19 support

Permalink

Fixed Order Sorting in Clojure

Here’s another tidy bit of Clojure that makes me happy…

Of course it’s easy to sort items in the natural order:

(sort [1 3 4 2]) ; => (1 2 3 4)

Or via a mapping to elements that sort in a natural order:

(sort-by count ["xx" "xxx" "x"]) ; => ("x" "xx" "xxx")

But how do we sort in a user-defined, fixed order?

For example, let’s say we have projects, and each project has a status which is one of:

(def statuses [:draft :in-progress :completed :archived])

(Note that those statuses are ordered in a logical, temporal progression.)

Here’s a quick helper function:

(defn fixed-order [xs]
  (let [pos (zipmap xs (range))]
    (fn [x y]
      (compare (pos x) (pos y)))))

To sort statuses, we simply do:

(def status-order (fixed-order statuses))

(sort status-order [:archived :in-progress :draft :completed])
;; =>
(:draft :in-progress :completed :archived)
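
As an aside, since the position map is itself a function of its keys, the same ordering can also be had with sort-by directly (a small sketch along the lines of the helper above):

(sort-by (zipmap statuses (range))
         [:archived :in-progress :draft :completed])
;; =>
(:draft :in-progress :completed :archived)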

Now let’s say we want to display all of our projects, sorted by their status:

(def projects
  [{:name "Paint the house", :status :completed}
   {:name "Update the blog", :status :in-progress}
   {:name "Fly to the moon", :status :draft}
   {:name "Master Clojure", :status :in-progress}])

(sort-by :status status-order projects)
;; =>
({:name "Fly to the moon", :status :draft}
 {:name "Update the blog", :status :in-progress}
 {:name "Master Clojure", :status :in-progress}
 {:name "Paint the house", :status :completed})

Maybe our app should group those sorted in the right order:

(->> projects
 (group-by :status)
 (into (sorted-map-by status-order)))
;; =>
{:draft [{:name "Fly to the moon", :status :draft}]
 :in-progress [{:name "Update the blog", :status :in-progress}
               {:name "Master Clojure", :status :in-progress}]
 :completed [{:name "Paint the house", :status :completed}]}

Ship it! 🚀

Permalink

JSAM: a simple JSON writer and reader

JSam is a lightweight, zero-deps JSON parser and writer. Named after Jetstream Sam.

  • Small: only 14 Java files with no extra libraries;
  • Not the fastest, but pretty fast (see the chart below);
  • Has its own features, e.g. reading and writing multiple values;
  • Flexible and extendable.

Installation

Requires Java 17 or later. Add a new dependency:

;; lein
[com.github.igrishaev/jsam "0.1.0"]

;; deps
com.github.igrishaev/jsam {:mvn/version "0.1.0"}

Import the library:

(ns org.some.project
  (:require
    [jsam.core :as jsam]))

Reading

To read a string:

(jsam/read-string "[42.3e-3, 123, \"hello\", true, false, null, {\"some\": \"map\"}]")

[0.0423 123 "hello" true false nil {:some "map"}]

To read from any kind of source: a file, a URL, a socket, an input stream, a reader, etc.:

(jsam/read "data.json") ;; a file named data.json
(jsam/read (io/input-stream ...))
(jsam/read (io/reader ...))

Both functions accept an optional map of settings:

(jsam/read-string "..." {...})
(jsam/read (io/file ...) {...})

Here is a table of options that affect reading:

option                  default                 comment
:read-buf-size          8k                      Size of the read buffer
:temp-buf-scale-factor  2                       Scale factor for the inner temp buffer
:temp-buf-size          255                     Initial size of the inner temp buffer
:parser-charset         UTF-8                   Must be an instance of Charset
:arr-supplier           jsam.core/sup-arr-clj   An object to collect array values
:obj-supplier           jsam.core/sup-obj-clj   An object to collect key-value pairs
:bigdec?                false                   Use BigDecimal when parsing numbers
:fn-key                 keyword                 A function to process keys

If you want keys to stay strings and large numbers to be parsed with BigDecimal to avoid infinite values, this is what you pass:

(jsam/read-string "..." {:fn-key identity :bigdec? true})

We will discuss suppliers a bit later.

Writing

To dump data into a string, use write-string:

(jsam/write-string {:hello "test" :a [1 nil 3 42.123]})

"{\"hello\":\"test\",\"a\":[1,null,3,42.123]}"

To write into a destination, which might be a file, an output stream, a writer, etc, use write:

(jsam/write "data2.json" {:hello "test" :a [1 nil 3 42.123]})

;; or

(jsam/write (io/file ...))

;; or

(with-open [writer (io/writer ...)]
  (jsam/write writer {...}))

Both functions accept a map of options for writing:

option             default   comment
:writer-charset    UTF-8     Must be an instance of Charset
:pretty?           false     Use indents and line breaks
:pretty-indent     2         Indent growth for each level
:multi-separator   \n        How to split multiple values

This is how you pretty-print data:

(jsam/write "data3.json"
            {:hello "test" :a [1 {:foo [1 [42] 3]} 3 42.123]}
            {:pretty? true
             :pretty-indent 4})

This is what you’ll get (maybe needs some further adjustment):

{
    "hello": "test",
    "a": [
        1,
        {
            "foo": [
                1,
                [
                    42
                ],
                3
            ]
        },
        3,
        42.123
    ]
}

Handling Multiple Values

When you have 10,000,000 rows of data to dump into JSON, the regular approach is not developer friendly: it leads to a single array with 10M items that you have to read into memory at once. Only a few libraries provide facilities for reading arrays lazily.

It’s much better to dump rows one by one into a stream and then read them one by one without saturating memory. Here is how you do it:

(jsam/write-multi "data4.json"
                  (for [x (range 0 3)]
                    {:x x}))

The second argument is a collection that might be lazy as well. The content of the file is:

{"x":0}
{"x":1}
{"x":2}

Now read it back:

(doseq [item (jsam/read-multi "data4.json")]
  (println item))

;; {:x 0}
;; {:x 1}
;; {:x 2}

The read-multi function returns a lazy iterable object, meaning it won’t read everything at once. Also, both write-multi and read-multi are pretty-print friendly:

;; write
(jsam/write-multi "data5.json"
                  (for [x (range 0 3)]
                    {:x [x x x]})
                  {:pretty? true})

;; read
(doseq [item (jsam/read-multi "data5.json")]
  (println item))

;; {:x [0 0 0]}
;; {:x [1 1 1]}
;; {:x [2 2 2]}

The content of the data5.json file:

{
  "x": [
    0,
    0,
    0
  ]
}
{
  "x": [
    1,
    1,
    1
  ]
}
{
  "x": [
    2,
    2,
    2
  ]
}

Type Mapping and Extending

This chapter covers how to control type mapping between Clojure and JSON realms.

Writing is driven by a protocol named jsam.core/IJSON with a single encoding method:

(defprotocol IJSON
  (-encode [this writer]))

The default mapping is the following:

Clojure   JSON     Comment
nil       null
String    string
Boolean   bool
Number    number
Ratio     string   e.g. (/ 3 2) -> "3/2"
Atom      any      gets deref-ed
Ref       any      gets deref-ed
List      array    lazy seqs as well
Map       object   keys coerced to strings
Keyword   string   leading : is trimmed
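
To see a couple of these defaults in action, here is a rough sketch based on the table above (the exact output string is approximate and may differ in formatting; the keys are made up):

(jsam/write-string {:status :draft
                    :progress (atom 0.5)})
;; => approximately "{\"status\":\"draft\",\"progress\":0.5}"
;; the keyword value loses its leading colon, the atom gets deref-ed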

Anything else gets encoded as a string using the .toString invocation under the hood:

(extend-protocol IJSON
  ...
  Object
  (-encode [this ^JsonWriter writer]
    (.writeString writer (str this)))
  ...)

Here is how you override encoding. Imagine you have a special type SneakyType:

(deftype SneakyType [a b c]

  ;; some protocols...

  jsam/IJSON
  (-encode [this writer]
    (jsam/-encode ["I used to be a SneakyType" a b c] writer)))

Test it:

(let [data1 {:foo (new SneakyType :a "b" 42)}
      string (jsam/write-string data1)]
  (jsam/read-string string))

;; {:foo ["I used to be a SneakyType" "a" "b" 42]}

When reading data, there is a way to specify how array and object values get collected. The :arr-supplier and :obj-supplier options accept a Supplier instance whose get method returns instances of the IArrayBuilder or IObjectBuilder interfaces. Each interface knows how to add a value into a collection and how to finalize it.

Default implementations build Clojure persistent collections like PersistentVector or PersistentHashMap. There are a couple of Java-specific suppliers that build ArrayList and HashMap, respectively. Here is how you use them:

(jsam/read-string "[1, 2, 3]"
                  {:arr-supplier jsam/sup-arr-java})

;; [1 2 3]
;; java.util.ArrayList

(jsam/read-string "{\"test\": 42}"
                  {:obj-supplier jsam/sup-obj-java})

;; {:test 42}
;; java.util.HashMap

Here are some crazy examples that let you modify data while building collections. For an array:

(let [arr-supplier
      (reify java.util.function.Supplier
        (get [this]
          (let [state (atom [])]
            (reify org.jsam.IArrayBuilder
              (conj [this el]
                (swap! state clojure.core/conj (* el 10)))
              (build [this]
                @state)))))]

  (jsam/read-string "[1, 2, 3]"
                    {:arr-supplier arr-supplier}))

;; [10 20 30]

And for an object:

(let [obj-supplier
      (jsam/supplier
        (let [state (atom {})]
          (reify org.jsam.IObjectBuilder
            (assoc [this k v]
              (swap! state clojure.core/assoc k (* v 10)))
            (build [this]
              @state))))]

  (jsam/read-string "{\"test\": 1}"
                    {:obj-supplier obj-supplier}))

;; {:test 10}

Benchmarks

Jsam doesn’t try to squeeze out as much performance as possible; tuning JSON reading and writing is pretty challenging. But so far, the library is not as bad as you might think! It’s about two times slower than Jsonista and slightly slower than Cheshire, but several times faster than data.json, which is written in pure Clojure and thus quite slow.

The chart below shows my measurements of reading a 100MB JSON file and then dumping the parsed data back into a string. It’s pretty clear that Jsam is neither the best nor the worst in this competition. I’ll leave the question of performance for future work.

Measured on MacBook M3 Pro 36Gb.

Another benchmark made by Eugene Pakhomov. Reading:

size     jsam (mean)   data.json   cheshire   jsonista   jsoniter   charred
10 b     182 ns        302 ns      800 ns     230 ns     101 ns     485 ns
100 b    827 ns        1 µs        2 µs       1 µs       504 ns     1 µs
1 kb     5 µs          8 µs        9 µs       6 µs       3 µs       5 µs
10 kb    58 µs         108 µs      102 µs     58 µs      36 µs      59 µs
100 kb   573 µs        1 ms        968 µs     596 µs     379 µs     561 µs

Writing:

size     jsam (mean)   data.json   cheshire   jsonista   jsoniter   charred
10 b     229 ns        491 ns      895 ns     185 ns     2 µs       326 ns
100 b    2 µs          3 µs        2 µs       540 ns     3 µs       351 ns
1 kb     14 µs         14 µs       8 µs       3 µs       8 µs       88 ns
10 kb    192 µs        165 µs      85 µs      29 µs      96 µs      10 µs
100 kb   2 ms          2 ms        827 µs     325 µs     881 µs     88 µs

Measured on i7-9700K.

On Tests

You might wonder how this library was tested. Although considered a simple format, JSON has plenty of surprises. Jsam has three sets of tests, namely:

  • basic cases written by me;
  • a large test suite borrowed from the Charred library (many thanks to Chris Nuernberger, who allowed me to use his code);
  • an extra set of generative tests borrowed from the official clojure.data.json library developed by the Clojure team.

These three, I believe, cover most of the cases. Should you face any weird behavior, please let me know.

Permalink

Look at how little I need

I recently found the time to rewrite an old library of mine from the ground up: ez-form. This is one of the first libraries I wrote for Clojure, based on code I found myself writing yet again, in yet another project. Forms for the web are one of those things you find yourself doing a lot in web development. And there is a whole lot that goes into them: specification, validation, rendering, layout, error handling (where do we show the errors? do we show all of them?), i18n, reuse, and probably more.

Permalink

How G+D Netcetera used Rama to 100x the performance of a product used by millions of people

“Rama enabled us to improve the performance of two critical metrics for Forward Publishing by over 100x, and it reduced our AWS hosting costs by 55%. It sounds like a cliché, but learning Rama mostly involved us unlearning a lot of the conventional wisdom accepted by the software industry. It was a strange experience realising how much simpler software can be.”

Ognen Ivanovski, Principal Architect at G+D Netcetera

Forward Publishing from G+D Netcetera powers a large percentage of digital newspapers in Switzerland and Germany. Forward Publishing was launched in 2019 and handles millions of pageviews per day. The backend of Forward Publishing was rewritten on top of Rama in the past year to greatly improve its performance and expand its capabilities.

Forward Publishing determines what content to include on each page of a newspaper website with a complex set of rules considering recency, categories, and relations between content. Home pages link to dozens of stories, while story pages contain the content and links to other stories. Some portions, especially for showing lists of related content, are specified with dynamic queries. Rendering any of these pages requires determining what content to display at each position and then fetching the content for each particular item (e.g. summary, article text, images, videos). Determining what content to render on a page is a compute-intensive process. The original implementation of Forward Publishing had some serious performance issues:

  • Since computing a page from scratch on each pageview is too expensive, the architecture was reliant on a Varnish cache and only recomputing pages on an interval. This caused the system to take multiple minutes before new content would be displayed on a page. This was especially bad for breaking news.
  • Computing what to render on a page put a large amount of load on the underlying content-management system (CMS), such as Livingdocs. This load frequently negatively affected the experience of using the CMS by worsening performance.
  • The large amount of load sometimes caused outages due to the system becoming overwhelmed, especially if the cache had any sort of issue.

With Rama, they were able to improve the latency for new content becoming available on pages from a few minutes to less than a second, and they reduced the load on the CMS from Forward Publishing to almost nothing. Both of these are over 100x improvements compared to their previous implementation.

As a bonus, their Rama-based implementation requires much less infrastructure. They went from running 18 nodes per customer for Forward Publishing for various pieces of infrastructure to just 9 nodes per customer for their Rama implementation. In total their Rama-based implementation reduced their AWS hosting costs by 55%.

The new implementation reducing load on the CMS so much also has enabled G+D Netcetera to improve how the business is structured. Before, the CMS and Forward Publishing for each customer had to be operated as one integrated unit and required a lot of DevOps work to be able to handle the load. Now, the CMS can be operated independently of Forward Publishing and customers can customize it how they wish. It’s a better experience for G+D Netcetera’s customers, and it’s much less DevOps work for G+D Netcetera.

Before, Forward Publishing focused on its collaboration with the Livingdocs CMS because supporting a new CMS would require a large engineering effort to get it to handle the necessary load. Now, Forward Publishing is not tied to a single CMS anymore. To support a new CMS, they just need to write a little bit of code to ingest new data from the CMS into Rama. This greatly expands the market for Forward Publishing as they can interoperate with any CMS.

Shortly before Rama was announced, the engineers at G+D Netcetera realized the backend architecture of Forward Publishing was wasteful. The number of updates to the CMS is only hundreds per day, so the vast majority of the work to compute pages was repeated each time the TTL of the cache expired. Pages usually only have a few changes when they’re recomputed. Flipping the architecture by maintaining a denormalized view of each page that’s incrementally updated as new content arrives seemed like it would be a big improvement.

They initially built a few prototypes with core.async and Missionary to explore this idea. However, these did not address how to store and query the denormalized views in a data-parallel and fault-tolerant manner. As soon as Rama was announced, they realized it provided everything needed to make their idea a reality.

Traditional databases, especially RDBMS’s, serve as both a source of truth and an indexed store to serve queries. There’s a major tension from serving both these purposes, as you want data in a source of truth to be fully normalized to ensure data integrity and consistency. What G+D Netcetera found out the hard way is that only storing data fully normalized caused significant performance issues for their application.

Rama explicitly separates the source of truth from the indexed datastores that serve queries. It provides a coherent and general model for incrementally materializing indexed datastores from the source of truth in a scalable, high-performance, and fault-tolerant way. You get the data integrity benefits of full normalization and the freedom to fully optimize indexed datastores for queries in the same system. That tension between data integrity and performance that traditionally exists just does not exist in Rama.

Backend dataflow

At the core of G+D Netcetera’s Rama-based implementation is a microbatch topology that maintains a denormalized view of each page as new content comes in. The denormalized views are PStates (“partitioned state”) that reduce the work to render a page to just a single lookup by key.

New content could be a new article, edits to an existing article, or edits to the layout of a page. The microbatch topology determines all denormalized entities affected by new content, which involves updating and traversing the graph of entities. This whole process takes less than a second and results in content always being fresh for all pages.

G+D Netcetera built a small internal library similar to Pregel on top of Rama’s dataflow abstractions. This allows them to easily express the code performing graph operations like the aforementioned traversals.

The core microbatch topology relies heavily on Rama’s batch blocks, a computation abstraction that has the same capabilities as relational languages (inner joins, outer joins, aggregation, subqueries). Batch blocks are the core abstraction that enables G+D Netcetera’s graph computations.

PState examples

Unlike databases, which have fixed data models (e.g. relational, key/value, graph, column-oriented), PStates can have any data model conceivable. PStates are based on the simpler primitive of data structures, and each data model is just a particular combination of data structures. This flexibility allows each PState to be tuned to exactly match the use cases they support. G+D Netcetera materializes many PStates of many different shapes in their Forward Publishing implementation.

Let’s take a look at some of their PState schemas to see how they use the flexibility of PStates. These examples are in Clojure because G+D Netcetera uses Rama’s Clojure API, but Rama also has a Java API.

All the PState definitions below make use of this data type to identify entities:

(defrecord EntityId [tag id])

An entity ID consists of a “tag” (e.g. page, article, image) and an ID in the scope of that tag. You can store any types inside PStates, so this defrecord definition is used directly.

Two of their PStates are similar to how Datomic indexes data, called $$eavt and $$avet (PState names always begin with $$). The $$eavt definition is this:

(declare-pstate
  topology
  $$eavt
  {EntityId
   (map-schema
     Keyword ; attribute
     #{Object})}) ; values

This PState is a map from entity ID to a map from attribute to a set of values. The operations efficient on a PState are the same kinds of operations that are efficient on corresponding in-memory data structures. This PState efficiently supports queries like: “What are all the attributes for a particular entity?”, “What are all the values for an entity/attribute pair?”, “how many attributes does an entity have?”, and so on.
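
To make that shape concrete, here is a purely illustrative sketch in plain Clojure data (the attribute names and IDs are made up, and this is not Rama API code):

{(->EntityId :article 42)
 {:article/title   #{"Holiday schedule announced"}
  :article/related #{(->EntityId :article 7)
                     (->EntityId :article 9)}}}

;; Looking up all attributes of an entity is a single map lookup;
;; looking up the values of one attribute is two nested lookups.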

The $$avet PState is defined like this:

(declare-pstate
  topology
  $$avet
  {Keyword ; attribute
   (map-schema
     Object ; value
     (set-schema EntityId {:subindex? true})
     {:subindex? true})})

This PState is a map from attribute to a map from value to a set of entities. This allows different kinds of questions to be answered efficiently, like: “What are all the values associated with an attribute?”, “What are all the entities associated with a particular attribute/value pair?”, “How many entities have a particular attribute value?”, and many others.

This PState uses subindexing, which causes those nested data structures to index their elements individually on disk rather than serialize/deserialize the entire data structure as one value on every read and write. Subindexing enables reads and writes to nested data structures to be extremely efficient even if they’re huge, like containing billions of elements. As a rule of thumb, a nested data structure should be subindexed if it will ever have more than a few hundred elements. Since the number of values for an attribute and the number of entities for a particular attribute/value is unbounded, these nested data structures are subindexed.

While these PStates are similar to the data model of an existing database, G+D Netcetera has many PStates with data models that no database has. Here’s one example:

(declare-pstate
  topology
  $$dynamic-lists
  {EntityId
   (fixed-keys-schema
     {:spec  [[Object]]
      :max   Long
      :items (set-schema Object {:subindex? true}) ; set of [timestamp, EntityId]
      :count Long
      })})

This PState supports an incremental Datalog engine they wrote for finding the top matching entities for a given criteria. Dynamic lists are specified by users of Forward Publishing and can be part of the layout of pages. The idea behind “lists” on a page is to show related content a reader is likely to click on. Manually curating lists is time-intensive, so dynamic lists allow users to instead specify a list as a query on the attributes of all stories in the system, such as story category, location, and author.

A dynamic list has its own ID represented as an EntityId, and the fields of each dynamic list are:

  • :spec: the Datalog query which specifies which content to find for the list. A query could say something like “Find all stories that match either: a) from the region ‘Switzerland’ and tagged with ‘technology’, or b) tagged with ‘database’”.
  • :max: the maximum number of items to keep from entities that match
  • :items: the results of the Datalog query, tuples of timestamp and EntityId
  • :count: the total number of entities found, of which at most :max were selected for :items

Dynamic lists can be dynamically added and removed, and they’re incrementally updated as new content arrives. The code implementing this incremental Datalog engine is only 700 lines of code in the Forward Publishing codebase. By storing the results in a PState, the dynamic lists are durable and replicated, making them highly available against faults like process death or node death.

Module overview

A Rama module contains all the storage and computation for a backend. Forward Publishing is implemented with two modules. The first module manages all content and contains:

The second module is smaller and manages user preferences. It consists of:

Topologies

Stream topologies are used for interactive features that require single-digit millisecond update latency. Microbatch topologies have update latency on the order of a few hundred milliseconds, but they have expanded computation capabilities (the “batch blocks” mentioned before) and higher throughput. So microbatch topologies are used for all features that don’t need single-digit millisecond update latency.

Two of the microbatch topologies implement the core of Forward Publishing, while the other ones support various miscellaneous needs of the product. The two core microbatch topologies are:

  • “ingress”: Ingests new content from the CMS and appends it to a Rama depot called *incoming-entities
  • “main”: Consumes *incoming-entities to integrate new content into a bidirectional graph. This topology then implements the denormalization algorithm that traverses the graph.

The miscellaneous topologies include:

  • “aliases”: A stream topology that maps entity IDs to other entity IDs. Some entities can be referenced by an “alias” entity ID, and this tracks the mapping from an alias to the primary entity ID.
  • “bookmarks”: A stream topology that tracks pages bookmarked by users.
  • “recently-read”: A stream topology that tracks articles recently read by users.
  • “redirects”: A stream topology that maps URLs to alternate URLs to support browser redirects.
  • “selected-geo”: A stream topology that maps users to geographical regions of interest.
  • “indexing”: A microbatch topology tracking metadata about regions and lists.
  • “routing”: A microbatch topology tracking metadata for each page of the website.
  • “sitemap”: A microbatch topology that incrementally updates a sitemap for the website. Sitemaps allow search engines to index the full website. Before Rama, Forward Publishing could only update the sitemap once per month in an expensive batch process. Now, the sitemap is always up to date.
  • “dynamic-lists”: A microbatch topology that implements the incremental Datalog engine described above.

Summary

Rama has been a game changer for G+D Netcetera, improving the performance of multiple key metrics by over 100x while simultaneously reducing the cost of operation by 55%. Additionally, their system is much more stable and fault tolerant than it was before.

G+D Netcetera found it takes about two weeks for a new engineer to learn Rama and become productive, with “batch blocks” being the biggest learning hurdle. The benefits they’ve achieved with Rama have made that modest investment in learning well worth it.

You can get in touch with us at consult@redplanetlabs.com to schedule a free consultation to talk about your application and/or pair program on it. Rama is free for production clusters for up to two nodes and can be downloaded at this page.

Permalink

Small modular parts

Our last episode was with David Nolen. We talk about his development process, his origin, and his philosophy. The next episode is on Tuesday, April 22 with special guest Fogus. Please watch us live so you can ask questions.

I have finally released the new version of Introduction to Clojure, my flagship module in Beginner Clojure, my signature video course. This update is long overdue, but it makes up for its tardiness with fully updated content, modernized for current idioms and better teaching.

If you have the previous edition, you can already find the new edition in your dashboard. You get the upgrade for free as my thanks for being a part of this crazy journey.

If you buy Beginner Clojure now, you’ll also get the new version. Because it’s such a major upgrade, I’m going to raise the prices soon. If you want it, now is the time to buy. It will never be this cheap again.


Small modular parts

I’ve been steeping in the rich conceptual stews of Patterns of Software. In it, Richard Gabriel explores how the ideas of Christopher Alexander apply to software engineering (long before the GoF Design Patterns book). One of the early ideas in the book is that of habitability, the characteristic of a building to support human life. Architecture needs to provide the kinds of spaces humans need, and also be adaptable to changing life circumstances. A house must support time together, time alone, and time alone together (sharing a space but doing different things). But it also must allow adding an extra bedroom as your family grows.

Habitable software is analogous. Programmers live in the code. They must feel comfortable navigating around, finding what they need, and making changes as requirements change. Christopher Alexander says that it is impossible to create truly living structures out of modular parts. They simply don’t adapt enough to the circumstances.

However, we know that’s not entirely true. Bricks are modular parts, and many of the living examples he gives are buildings made of bricks. It must be that the modules need to be small enough to permit habitability. You can’t adjust the size of a wall if the wall is prefabricated. But you can adjust the size of the wall if the bricks are prefabricated to a resolution that is just right.

This is true in software as well. Large modules are not as reusable as small ones. Take classical Java. I think the language gets the size of the abstractions wrong. The affordances of the language are the for/if/… statements, arithmetic expressions, and method calls, plus a way to compose those up into a new class. It goes from the lowest level (basically C) to a very high level, with very little in-between.

Contrast that with Clojure, which gives you many general-purpose abstractions at a higher level than Java (map/filter/reduce, first-class functions, generic data structures), and then almost nothing above it. Just the humble function to parameterize and name a thing. Lambda calculus (basically first-class functions) goes a long way. Java’s methods and classes give you a way to build procedural abstractions over data storage and algorithms, but the language offers torturous facilities for control flow abstractions. First-class functions can abstract control flow. I think Clojure got the level right.

Except maybe Clojure’s standard library overdoes it. I’m a big fan of map/filter/reduce. You can do a lot with them. But then there are others. For instance, there’s keep, which is like `map` but it rejects `nil`s. And there’s `remove`, which is the opposite of `filter`. Any call to keep could be rewritten:

(keep {:a 1 :b 2 :c 3} [:a :b :c :d])
;; => (1 2 3)

(filter some? (map {:a 1 :b 2 :c 3} [:a :b :c :d]))
;; => (1 2 3)

Those two are equivalent. `remove` can also be rewritten:

(remove #{:a :b :c} [:a :b :c :d])
;; => (:d)

(filter (complement #{:a :b :c}) [:a :b :c :d])
;; => (:d)

I do use keep and remove sometimes, when I think about them. But how much do they really add? Is the cost of learning these worth it? How often do you have to switch it back to map and filter anyway to make the change you want?

Here’s what I think: keep is just slightly too big. It’s a modular part that does just a tad too much. map is like a standard brick. keep is like an L-shaped brick that’s only useful at the end of a wall or on a corner. Useful but not that useful, and certainly not necessary. The same is true of remove. It’s not useful enough.

I think Clojure did a remarkably good job of finding the right size of module. They feel human-scale, ready for composition in an understandable way. It makes programs of medium to large size feel more habitable. I see this in what little Smalltalk code I’ve read: Smalltalk’s classes are small, highly general modular units, like Point and Rectangle, not UserPictureDrawingManager.

One aspect of habitability is maintainability—the de moda holy grail of software design. Back in 1996, when Patterns of Software was published, Gabriel felt the need to argue against efficiency as the reason for software design. Somewhere between efficiency’s reign and today’s maintainability, code size and then complexity ruled.

Long-time readers may guess where I’m going: These characteristics all focus on the code. Abstraction gets talked about in terms of its (excuse me) abstract qualities. An abstraction is too big or small, too high- or low-level, too shallow or deep. At best, we’re talking about something measurable in the code, at worst, some mental structures only in the mind of the guru designer who talks about it.

I want to posit domain fit as a better measure—one that leads to habitability—and that is also objective. Domain fit asks: “How good is the mapping between what your code represents and the meanings available in the domain?” That mapping goes both ways. We ask both “How easily can I express a domain situation in the code?” and “How easily does the code express the domain situation it represents?” Fit covers both directions of expressivity.

I believe that domain misfit causes the most difficulties for code habitability. If your code doesn’t fit well with the domain, you’ll need many workarounds. Using reusable modules is only a problem because they don’t adapt well to the needs of your domain—not because they’re too big. It just so happens that bigger modules are harder to adapt. It’s not that a wall module is bad, per se, just that it’s almost never exactly the right size, and so you make compromises.

It’s not that the components in Clojure are the right size. It’s that Clojure’s domain—data-oriented programming—is the right size for many problems. It allows you, the programmer, to compose a solution out of parts—like bricks in a wall. And Clojure’s code fits the domain very well. Tangentially: It makes me wonder what the domain of Java is. I guess what I’m saying is that using a vector graphics API to do raster graphics is going to feel uninhabitable. But you can’t say it’s because vector graphics is a bigger abstraction than raster. It’s more about having the right model.

Now, Alexander might disagree that a pre-fab wall of exactly the right size is okay. He believes that there’s something in the handmadeness of things, too. It’s not just that the wall is the wrong or right size. Even if it were perfect, the perfection itself doesn’t lend itself to beauty. Geometrically precisely laid tiles cross some threshold where you don’t feel comfortable anymore. Ragged symmetry is better. We want bricks but they shouldn’t be platonic prisms.

So this is where I conclude and tease the next issue. I started this essay thinking size was important. I thought that Clojure got it right by finding a size of composable module that was a sweet spot. But now, I think it’s not about size. I don’t even know what size means anymore. It’s more about domain fit than ever. Perhaps I’m digging in deeper to my own biases (and please, I’m relying on you the reader to help me realize if I am). But this is what my reading is leading me to—the importance of building a domain model. When we talk about domain models, we often think of these jewel-like abstractions with perfect geometry. But this is a pipe dream. Our domains are too messy for that. In the next issue, I want to explore the dichotomy of geometric and organic adaptation.

Permalink

The Duality of Transducers

I finally got around to re-recording and posting this talk on Clojure’s transducers that I gave last year to the Austin Clojure Meetup:

The talk walks through what transducers are, their benefits, where they make sense to use (and where they don’t), and how to implement them from scratch.

The title refers to an idea that really helped make transducers click for me: namely that there are two different conceptual models of transducers that I needed to apply (to different contexts) to really get it. (This reminded me a lot of the wave-particle duality of light in physics, which describes a single underlying reality in two different ways, with each way tending to prove more practical in analyzing particular scenarios).

The two models, then, are:

  1. The encapsulated transformation model, where the transducer is an opaque representation of a (likely-compound) transformation of a collection, which merges with other transformations via the mechanical application of comp. And…

  2. The constructive model, where we’re dealing in the underlying machinery of transducers (say, implementing a transducible process), and it’s helpful to conceptualize a transducer simply as a function from reducing-function to reducing-function (rf -> rf).
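
Not from the talk itself, but as a quick illustration of the two models in plain Clojure (a minimal sketch): the first form treats transducers as opaque transformations composed with comp, while my-map shows the rf -> rf view by reimplementing map’s transducer from scratch.

;; Model 1: an encapsulated transformation, composed mechanically with comp.
(def xf (comp (map inc) (filter odd?)))
(into [] xf (range 10))       ; => [1 3 5 7 9]
(transduce xf + 0 (range 10)) ; => 25

;; Model 2: a transducer is a function from reducing function to reducing function.
(defn my-map [f]
  (fn [rf]
    (fn
      ([] (rf))                                  ; init
      ([result] (rf result))                     ; completion
      ([result input] (rf result (f input))))))  ; step

(into [] (my-map inc) [1 2 3]) ; => [2 3 4]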

I hope that comes through clearly in the presentation.

If you don’t use transducers in your Clojure code today, I highly suggest you do—you will see benefits. And I’m convinced that the best way to get them to really click is to implement them from scratch, which the video will walk you through.

For your convenience, here are the slides for the presentation. The links are not clickable there (sorry), but are all included in the video description on YouTube.

If you have any questions, feedback, or need mentorship, feel free to reach out on Bluesky, Mastodon, or X and I’d be happy to help.

Permalink

No, really, you can’t branch Datomic from the past (and what you can do instead)

I have a love-hate relationship with Datomic. Datomic is a Clojure-based database based on a record of immutable facts; this post assumes a passing familiarity with it – if you haven’t yet, I highly recommend checking it out, it’s enlightening even if you end up not using it.

I’ll leave ranting on the “hate” part for some other time; here, I’d like to focus on some of the love – and its limits.

Datomic has this feature called “speculative writes”. It allows you to take an immutable database value, apply some new facts to it (speculatively, i.e., without sending them over to the transactor – this is self-contained within the JVM), and query the resulting database value as if those facts had been transacted for real.

This is incredibly powerful. It lets you “fork” a Datomic connection (with the help of an ingenious library called Datomock), so that you can see all of the data in the source database up to the point of forking, but any new writes happen only in memory. You can develop on top of production data, but without any risk of damaging them! I remember how aghast I was upon first hearing about the concept, but now can’t imagine my life without it. Datomock’s author offers an analogy to Git: it’s like database values being commits, and connections being branches.

Another awesome feature of Datomic is that it lets you travel back in time. You can call as-of on a database value, passing a timestamp, and you get back a db as it was at that point in time – which you can query to your heart’s content. This aids immensely in forensic debugging, and helps answer questions which would have been outright impossible to answer with classical DBMSs.

Now, we’re getting to the crux of this post: as-of and speculative writes don’t compose together. If you try to create a Datomocked connection off of a database value obtained from as-of, you’ll get back a connection to which you can transact new facts, but you’ll never be able to see them. The analogy to Git falls down here: it’s as if Git only let you branch HEAD.

This is a well-known gotcha among Datomic users. From Datomic’s documentation:

as-of Is Not a Branch

Filters are applied to an unfiltered database value obtained from db or with. In particular, the combination of with and as-of means "with followed by as-of", regardless of which API call you make first. with plus as-of lets you see a speculative db with recent datoms filtered out, but it does not let you branch the past.

So it appears that this is an insurmountable obstacle: you can’t fork the past with Datomic.

Or can you?

Reddit user NamelessMason has tried to reimplement as-of on top of d/filter, yielding what seems to be a working approach to “datofork”! Quoting his post:

Datomic supports 4 kinds of filters: as-of, since, history and custom d/filter, where you can filter by arbitrary datom predicate. […]

d/as-of sets a effective upper limit on the T values visible through the Database object. This applies both to existing datoms as well as any datoms you try to add later. But since the tx value for the next transaction is predictable, and custom filters compose just fine, perhaps we could just white-list future transactions?

(defn as-of'' [db t]
  (let [tx-limit (d/t->tx t)
        tx-allow (d/t->tx (d/basis-t db))]
    (d/filter db (fn [_ [e a v tx]] (or (<= tx tx-limit) (> tx tx-allow))))))

[…] Seems to work fine!

Sadly, it doesn’t actually work fine. Here’s a counterexample:

(def conn (let [u "datomic:mem:test"] (d/create-database u) (d/connect u)))

;; Let's add some basic schema
@(d/transact conn [{:db/ident :test/id :db/valueType :db.type/string
                    :db/cardinality :db.cardinality/one :db/unique :db.unique/identity}])
(d/basis-t (d/db conn)) ;=> 1000

;; Now let's transact an entity
@(d/transact conn [{:test/id "test", :db/ident ::the-entity}])
(d/basis-t (d/db conn)) ;=> 1001

;; And in another transaction let's change the :test/id of that entity
@(d/transact conn [[:db/add ::the-entity :test/id "test2"]])
(d/basis-t (d/db conn)) ;=> 1003

;; Trying a speculative write, forking from 1001
(def db' (-> (d/db conn)
             (as-of'' 1001)
             (d/with [[:db/add ::the-entity :test/id "test3"]])
             :db-after))
(:test/id (d/entity db' ::the-entity)) ;=> "test" (WRONG! it should be "test3")

To recap what we just did: we transacted version A of an entity, then an updated version B, then tried to fork C off of A, but we’re still seeing A’s version of the data. Can we somehow save the day?

To see what d/filter is doing, we can add a debug println to the filtering function, following NamelessMason’s example (I’m translating tx values to t for easier understanding):

(defn as-of'' [db t]
  (let [tx-limit (d/t->tx t)
        tx-allow (d/t->tx (d/basis-t db))]
    (d/filter db (fn [_ [e a v tx :as datom]]
                   (let [result (or (<= tx tx-limit) (> tx tx-allow))]
                     (printf "%s -> %s\n" (pr-str [e a v (d/tx->t tx)]) result)
                     result)))))

Re-running the above speculative write snippet now yields:

[17592186045418 72 "test" 1003] -> false
[17592186045418 72 "test" 1001] -> true

So d/filter saw that tx 1003 retracts the "test" value for our datom, but it’s rejected because it doesn’t meet the condition (or (<= tx tx-limit) (> tx tx-allow)). And at this point, it never even looks at datoms in the speculative transaction 1004, the one that asserted our "test3". It looks like Datomic’s d/filter does some optimizations where it skips datoms if it determines they cannot apply based on previous ones.

But even if it did do what we want (i.e., include datoms from tx 1001 and 1004 but not 1003), it would have been impossible. Let’s see what datoms our speculative transaction introduces:

(-> (d/db conn)
    (as-of'' 1001)
    (d/with [[:db/add ::the-entity :test/id "test3"]])
    :tx-data
    (->> (mapv (juxt :e :a :v (comp d/tx->t :tx) :added))))
;=> [[13194139534316 50 #inst "2025-04-22T12:48:40.875-00:00" 1004 true]
;=>  [17592186045418 72 "test3" 1004 true]
;=>  [17592186045418 72 "test2" 1004 false]]

It adds the value of "test3" but retracts "test2"! Not "test"! It appears that d/with looks at the unfiltered database value to produce new datoms for the speculative db value (corroborated by the fact that we don’t get any output from the filtering fn at this point; we only do when we actually query db'). Our filter cannot work: transactions 1001 plus 1004 would be “add "test", retract "test2", add "test3"”, which is not internally consistent.

So, no, really, you can’t branch Datomic from the past.

Which brings us back to square one: what can we do? What is our usecase for branching the past, anyway?

Dunno about you, but to me the allure is integration testing. Rather than having to maintain an elaborate set of fixtures, with artificial entity names peppered with the word “example”, I want to test on data that’s close to production; that feels like production. Ideally, it is production data, isolated and made invincible by forking. At the same time, tests have to behave predictably: I don’t want a test to fail just because someone deleted an entity from production yesterday that the test depends on. Being able to fork the past would have been a wonderful solution if it worked, but… it is what it is.

So now I’m experimenting with a different approach. My observation here is that my app’s Datomic database is (and I’d wager a guess that most real-world DBs are as well) “mostly hierarchical”. That is, while its graph of entities might be a giant strongly-connected blob, it can be subdivided into many small subgraphs by judiciously removing edges.

This makes sense for testing. A test typically focuses on a handful of “top-level entities” that I need to be present in my testing database like they are in production, along with all their dependencies – sub-entities that they point to. Say, if I were developing a UI for the MusicBrainz database and testing the release page, I’d need a release entity, along with its tracks, label, medium, artist, country etc to be present in my testing DB. But just one release is enough; I don’t need all 10K of them.

My workflow is thus:

  • create an empty in-memory DB
  • feed it with the same schema that production has
  • get hold of a production db with a fixed as-of
  • given a “seed entity”, perform a graph traversal (via EAVT and VAET indexes) starting from that entity to determine reachable entities, judiciously blacklisting attributes (and whitelisting “backward-pointing” ones) to avoid importing too much
  • copy those entities to my fresh DB
  • run the test!

This can be done generically. I’ve written some proof-of-concept code that wraps a Datomic db to implement the Loom graph protocol, so that one can use Loom’s graph algorithms to perform a breadth-first entity scan, and a function to walk over those entities and convert them to a transaction applicable on top of a pristine DB. So far I’ve been able to extract meaningful small sub-dbs (on the order of ~10K datoms) from my huge production DB of 17+ billion datoms.
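
For a flavour of what such a traversal can look like, here is a rough sketch (not the actual proof-of-concept code; ref-attr? is a hypothetical predicate deciding which attribute ids to follow, and there is no VAET/backward traversal or blacklisting here):

(defn reachable-entities
  "Collects entity ids reachable from seed-eid by following ref attributes
   via the EAVT index. Sketch only."
  [db ref-attr? seed-eid]
  (loop [seen #{} frontier #{seed-eid}]
    (if (empty? frontier)
      seen
      (let [e    (first frontier)
            refs (->> (d/datoms db :eavt e)
                      (filter #(ref-attr? (:a %)))
                      (map :v)
                      (remove seen))]
        (recur (conj seen e)
               (-> frontier (disj e) (into refs)))))))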

This is a gist for now, but let me know if there’s interest and I can convert it into a proper library.

Permalink

Local S3 storage with MinIO for your Clojure dev environment

Simple Storage Service or S3 Cloud Object Storage is a versatile and cheap storage. I’ve used it for invoices, contracts, media, and configuration snapshots, among other things. It’s a perfect fit when the object key (or what many might think of as a filename or path) can be used as a unique key to retrieve what you need — and doing it from Clojure is no exception.

💡 This post assumes you have the following tools installed: Docker, Docker Compose, the aws CLI tool and, of course, Clojure.

S3 is best for write-once-read-many use cases, and in a well-designed application architecture, you can often derive an object key (path) from already available data, like a customer number.

Object key (path) examples:

customer-58461/invoice-1467.pdf
marketing-campaign-14/revision-3.template

Though S3 was originally an Amazon product introduced in 2006, many cloud providers now offer fully compatible services. S3 is cheaper than traditional block storage (also known as volume storage or disk storage), and while it is also slower, it scales extremely well, even across regions (like EU and US).

I probably wouldn’t be so enthusiastic about S3 if it weren’t so easy to use in a local development environment. That’s where MinIO, running in a Docker container, comes in.

I usually commit a docker-compose.yml file to the repo alongside any code that requires S3-compatible object storage:

services:
  s3:
    image: minio/minio
    ports:
      - "9000:9000"     # S3 API service endpoint
      - "9001:9001"     # Web interface
    volumes:
      - './data/minio:/data' # Persist data; path must match command 👇
    command: server /data --console-address ":9001"
    environment:
      MINIO_DOMAIN: localhost:9000 # Required for virtual-host bucket lookups
      MINIO_ROOT_USER: AKIAIOSFODNN7EXAMPLE
      MINIO_ROOT_PASSWORD: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Amazon S3 typically runs on the default HTTPS port (443), while MinIO defaults to port 9000. The web interface on port 9001 is excellent for browsing bucket content where using code or CLI is “too much”. Storing data on a volume outside the container makes it easy to update MinIO without losing data since “updating” usually means stopping and deleting the old container and then starting a new container.

A new version of the Docker image can be pulled with:

docker pull minio/minio

Setting the MINIO_DOMAIN environment variable is required to support “virtual host”-style presigned URLs (more on that later). Root user credentials are configured using MINIO_ROOT_* environment variables, and these credentials work for both logging into the web interface and generating presigned URLs.

Before starting the service, it’s possible to pre-create buckets by creating local folders inside the data volume:

mkdir -p data/minio/mybucket1

⚠️ Warning: Buckets created this way will be missing metadata, which can cause issues when listing ALL buckets.

Start the S3-compatible storage (MinIO) with:

docker compose up s3  # (Press CTRL+C to stop)

With older versions of Docker Compose the command looks like docker-compose up s3 (notice the dash).

An AWS profile is handy when using the AWS CLI, among other things. The following will create a profile named minio:

Add these lines to $HOME/.aws/credentials:

[minio]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

And these to $HOME/.aws/config:

[profile minio]
region = us-east-1
endpoint_url = http://localhost:9000

Notice: MinIO’s default region is us-east-1, and the endpoint URL matches the configuration in docker-compose.yml.

Make sure that everything is configured correctly by checking read access:

aws s3 ls mybucket1 --profile minio

This should return a non-error but empty result.

Now, add a file to the bucket:

echo "Hello, World!" > hello.txt
aws s3 cp hello.txt s3://mybucket1/ --profile minio

… then verify the new content:

aws s3 ls mybucket1 --profile minio

Lo and behold — the bucket now has content.

Alternatively, set the environment variable AWS_PROFILE (using export AWS_PROFILE=minio) to avoid repeating --profile minio in each command.

Having a local S3 bucket, let’s start interacting with it from Clojure.

The S3 clients from Cognitect AWS API and Aw Yeah (AWS API for Babashka) implicitly resolve credentials the same way the Amazon Java SDK and the aws CLI tool do.

By leveraging the AWS_PROFILE environment variable, both the Clojure code and the aws CLI tool will use the exact same configuration. This makes any credential-related issues easier to reproduce outside of your code. In theory, this also helps avoid the need for code along the lines of (if production? ... and (if dev? ..., which is usually a sign of poor software design.

There are many ways to manage environment variables, and each IDE handles them differently. To check if the REPL has picked up the expected environment configuration, use the following:

(System/getenv "AWS_PROFILE") ; Should return "minio"

In other contexts, I’d recommend avoiding implicit configuration, which using AWS_PROFILE is. However, in this case, it aligns with how AWS tooling is typically set up, making it intuitive for anyone already familiar with AWS. Plus, a wealth of resources (official docs, StackOverflow, etc.) rely on this convention.

To try out some code with the MinIO setup, start a REPL with AWS_PROFILE set and add the following dependencies to your deps.edn:

{:deps {com.cognitect.aws/api       {:mvn/version "0.8.741"}
        com.cognitect.aws/endpoints {:mvn/version "871.2.30.22"}
        com.cognitect.aws/s3        {:mvn/version "871.2.30.22"}
        dk.emcken/aws-simple-sign   {:mvn/version "2.1.0"}}}

To set up an S3 client:

(require '[cognitect.aws.client.api :as aws])

(def s3
  (aws/client {:api :s3 :endpoint-override {:protocol :http :hostname "localhost" :port 9000}}))

… and interact with the S3 bucket:

(aws/invoke s3 {:op :ListBuckets})
(aws/invoke s3 {:op :ListObjectsV2 :request {:Bucket "mybucket1"}})
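
Putting and getting objects follows the same pattern (a small sketch; the op and request shapes follow the standard S3 operations exposed by aws-api, and the key name here is made up):

(aws/invoke s3 {:op      :PutObject
                :request {:Bucket "mybucket1"
                          :Key    "hello-from-clojure.txt"
                          :Body   (.getBytes "Hello from Clojure!")}})

(-> (aws/invoke s3 {:op      :GetObject
                    :request {:Bucket "mybucket1"
                              :Key    "hello-from-clojure.txt"}})
    :Body   ; an InputStream
    slurp)
;; => "Hello from Clojure!"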

Unfortunately, neither Cognitect’s aws-api nor awyeah-api respects profiles or environment variables specifying an alternative endpoint, which is required when working with MinIO.

It forces the application to conditionally override the endpoint, which kinda ends up looking like (if dev? ... code 😭

I’ve reported the issue on GitHub. But until it’s fixed, I found the following generic code to be an acceptable compromise for working with MinIO locally:

(import '[java.net URL])

(defn url->map
  "Convenience function for providing a single connection URL over multiple
   individual properties."
  [^URL url]
  (let [port (.getPort url)
        path (not-empty (.getPath url))]
    (cond-> {:protocol (keyword (.getProtocol url))
             :hostname (.getHost url)}
      (not= -1 port)
      (assoc :port port)

      path
      (assoc :path path))))

(defn create-s3-client
  "Takes an endpoint-url (or nil if using default) and returns a S3 client."
  [endpoint-url]
  (let [url (some-> endpoint-url not-empty (URL.))]
    (cond-> {:api :s3}
      url (assoc :endpoint-override (url->map url))
      :always (aws/client))))

(def s3 ; local environment (using MinIO)
  (create-s3-client "http://localhost:9000"))

(def s3 ; production environment (using Amazon S3)
  (create-s3-client nil))

Now, replace the hardcoded local URL with an environment variable that is only set in a local development environment. The URL in the environment variable must match the value of the AWS profile setup.

; export MY_APP_S3_URL=http://localhost:9000
; only set MY_APP_S3_URL locally
(def s3
  (create-s3-client (System/getenv "MY_APP_S3_URL")))

The post could have ended here, but often it is beneficial to provide presigned URLs for the content in an S3 bucket. Luckily, aws-simple-sign can presign URLs and works seamlessly with both of the previously mentioned S3 clients:

(require '[aws-simple-sign.core :as aws-sign])

; Using the already configured client from above:
(aws-sign/generate-presigned-url s3 "mybucket1" "hello.txt" {})
; Open URL in your browser and see "Hello, World!"

“Virtual hosted-style” presigned URLs only work because the MINIO_DOMAIN environment variable is configured in docker-compose.yml. Alternatively, use path-style URLs. Check out Amazon’s official documentation on the different styles if you need a refresher.

This illustrates just how simple working with S3 from Clojure is, all the way from code iterations in the local development environment to transitioning the application into production.

Best of all, no AWS Java SDK is required. 🚀

Permalink

Setup Emacs to autoformat your Clojure code with Apheleia and zprint

Keeping code consistently formatted is important for readability and maintainability. Once you get used to having a computer format your code for you, manually formatting code can feel tedious.

For the last few years, my team has been using zprint to keep our Clojure codebase formatted to our specifications. zprint is great because it runs fast and is extremely customizable. This flexibility is powerful since it lets you format your code exactly how you want.

I've recently migrated from my own custom before-save-hook that triggered zprint whenever I saved a buffer to using Apheleia. Apheleia is an Emacs package that applies code formatters automatically on save. I won't quote the whole introduction in Apheleia's readme but it is designed to keep Emacs feeling responsive.

Here's the configuration I use in my Emacs setup:

(use-package apheleia
  :straight (apheleia :host github :repo "radian-software/apheleia")
  :config
  (setf (alist-get 'zprint apheleia-formatters)
        '("zprint" "{:style [:community] :map {:comma? false}}"))
  (setf (alist-get 'clojure-mode apheleia-mode-alist) 'zprint
        (alist-get 'clojure-ts-mode apheleia-mode-alist) 'zprint)
  (apheleia-global-mode t))

This snippet shows how to install and configure Apheleia using straight.el and use-package. The :config section tells Apheleia which modes should run zprint and how to run it.1 I found the docstring for apheleia-formatters to be crucial for figuring out how to hook zprint into Apheleia.

With this setup, your Clojure code will be automatically formatted using zprint every time you save. No more manual formatting needed. I've been running with this for a little while now and am enjoying it.

  1. I don't actually use :community and have my own custom formatting configuration but am using :community in this post so the snippet is immediately useful to readers.

Permalink

Optimizing syntax-quote

Syntax-quote in Clojure is an expressive reader macro for constructing syntax. The library backtick demonstrates how to write syntax-quote as a macro. Both approaches are ripe for optimization—take these programs that both evaluate to []:

'`[]
;=> (clojure.core/apply
;     clojure.core/vector
;     (clojure.core/seq (clojure.core/concat)))

(macroexpand-1 '(backtick/syntax-quote []))
;=> (clojure.core/vec (clojure.core/concat))

The reason syntax-quote’s expansion is so elaborate comes down to unquote-splicing (~@) support. When the program passed to ~@ is completely dynamic, the extra computation is essential.

(macroexpand-1 '(backtick/syntax-quote [1 ~@a 4]))
;=> (clojure.core/vec (clojure.core/concat [(quote 1)] a [(quote 4)]))

Since a cannot be evaluated at compile-time, we can’t do much better than this in terms of code size. The problem is that we’re stuck with this extra scaffolding even when ~@ is never used. We don’t even need it for unquote (~):

(macroexpand-1 '(backtick/syntax-quote [1 ~two ~three 4]))
;=> (clojure.core/vec
;     (clojure.core/concat
;       [(quote 1)] [two] [three] [(quote 4)]))

A more direct expansion would be ['1 two three '4].
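
To make the difference concrete, here is a minimal sketch of the vector case. This is not the code in my branch, and it ignores the symbol resolution, metadata, and nesting that a real syntax-quote must handle; it only shows the idea of emitting the concat scaffolding when ~@ is actually present:

(defn- unquote? [form]
  (and (seq? form) (= 'clojure.core/unquote (first form))))

(defn- splice? [form]
  (and (seq? form) (= 'clojure.core/unquote-splicing (first form))))

(defn expand-vector-template [items]
  (if (some splice? items)
    ;; dynamic path: element count is unknown until runtime
    `(vec (concat ~@(map (fn [item]
                           (cond
                             (splice? item)  (second item)
                             (unquote? item) [(second item)]
                             :else           [(list 'quote item)]))
                         items)))
    ;; static path: a literal vector is enough
    (mapv (fn [item]
            (if (unquote? item)
              (second item)
              (list 'quote item)))
          items)))

(expand-vector-template '[1 ~two ~three 4])
;;=> [(quote 1) two three (quote 4)]

(expand-vector-template '[1 ~@a 4])
;;=> (clojure.core/vec (clojure.core/concat [(quote 1)] a [(quote 4)]))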

I have implemented a branch of backtick that optimizes syntax-quote to only pay for the penalty of ~@ if it is used.

You can see the progression of the generated program becoming more dynamic as less static information can be inferred.

(macroexpand-1 '(backtick/syntax-quote []))
;=> []
(macroexpand-1 '(backtick/syntax-quote [~local-variable]))
;=> [local-variable]
(macroexpand-1 '(backtick/syntax-quote [~@local-variable]))
;=> (clojure.core/vec local-variable)

Future work includes flattening spliced collections, such as:

(macroexpand-1 '(backtick/syntax-quote [1 ~@[two three] 4]))
;=> (clojure.core/vec
;     (clojure.core/concat [(quote 1)] [two three] [(quote 4)]))

This should be simply ['1 two three '4].

PRs to my branch are welcome for further performance enhancements.

Also, if you are interested in implementing these changes in a real Clojure implementation, jank is accepting contributions. It will directly help with ongoing efforts towards AOT compilation.

Permalink

Q2 2025 Funding Announcement

Clojurists Together is excited to announce that we will be funding 6 projects in Q2 2025 for a total of $33K USD (3 for $9K and 3 shorter or more experimental projects for $2K). Thanks to all our members for making this happen! Congratulations to the 6 developers below:

$9K Projects
Bozhidar Batsov: CIDER
Brandon Ringe: CALVA
Jeaye Wilkerson: Jank

$2K Projects
Jeremiah Coyle: Bling
Karl Pietrzak: CodeCombat
Siyoung Byun: Scicloj - Building Bridges to New Clojure Users

Bozhidar Batsov: CIDER

Provide continued support for CIDER, nREPL and the related libraries (e.g. Orchard, cider-nrepl, etc.) and improve them in various ways.

Some ideas that I have in my mind:

  • Improve support for alternative Clojure runtimes
  • Simplify some of CIDER’s internals (e.g. jack-in, session management)
  • Improve CIDER’s documentation (potentially record a few up-to-date video tutorials as well)
  • Improve clojure-ts-mode and continue the work towards it replacing clojure-mode
  • Add support for clojure-ts-mode in inf-clojure
  • Continue to move logic outside of cider-nrepl
  • Improvements to the nREPL specification and documentation; potentially build a test suite for nREPL specification compatibility
  • Various improvements to the nREPL protocol
  • Stabilize Orchard and cider-nrepl enough to do a 1.0 release for both projects
  • Build a roadmap for CIDER 2.0
  • Write up an analysis of the State of Clojure 2024 survey results (connected to the roadmap item)

Brandon Ringe: CALVA

I’ll be working on a new REPL output view for Calva, which is a webview in VS Code. The current default REPL output view utilizes an editor and somewhat emulates a terminal prompt. The performance of the editor view degrades when there is a high volume of output and/or when large data structures are printed in it. The webview will allow us to add richer features to the output view, while also providing better performance.

I’ve started this work, and I’ll use the funding from Clojurists Together to get the work over the finish line and release an initial, opt-in version of the REPL output webview. I’ll also be adding tests, responding to user feedback about the feature, fixing bugs, and adding features to it.

This is the first feature of Calva that integrates with VS Code’s API directly from ClojureScript. This is partly an experiment to see if writing more of Calva in ClojureScript is a good idea; I suspect that it is.

Jeaye Wilkerson: Jank

In Q1 2025, I built out jank’s error reporting to stand completely in a category of its own, within the lisp world. We have macro expansion stack tracing, source info preserved across expansions so we can point at specific forms in a syntax quote, and even clever solutions for deducing source info for non-meta objects like numbers and keywords. All of this is coupled with gorgeous terminal reporting with syntax highlighting, underlining, and box formatting.

In Q2, I plan to aim even higher. I’m going to build jank’s seamless C++ interop system. We had native/raw, previously, for embedding C++ strings right inside of jank code. This worked alright, but it was tied to jank having C++ codegen. Now that we have LLVM IR codegen, embedding C++ is less practical. Beyond that, though, we want to do better. Here’s a snippet of what I have designed for jank this quarter.

; Feed some C++ into Clang so we can start working on it.
; Including files can also be done in a similar way.
; This is very similar to native/raw, but is only used for declarations.
; It cannot run code.
(c++/declare "struct person{ std::string name; };")

; let is a Clojure construct, but c++/person. creates a value
; of the person struct we just defined above, in automatic memory
; (i.e. no heap allocation).
(let [s (c++/person. "sally siu")
      ; We can then access structs using Clojure's normal interop syntax.
      n (.-name s)
      ; We can call member functions on native values, too.
      ; Here we call std::string::size on the name member.
      l (.size n)]
  ; When we try to give these native values to println, jank will
  ; detect that they need boxing and will automatically find a
  ; conversion function from their native type to jank's boxed
  ; object_ptr type. If such a function doesn't exist, the
  ; jank compiler fails with a type error.
  (println n l))


In truth, this is basically the same exact syntax that Clojure has for Java interop, except for the c++ namespace to disambiguate. Since I want jank to work with other langs in the future, I think it makes sense to spell out the lang. Later, we may have a swift or rust namespace which works similarly. But let’s talk about this code.

This interop would be unprecedented. Sure, Clojure JVM does it, but we’re talking about the native world. We’re talking about C++. Ruby, Python, Lua, etc. can all reach into C. The C ABI is the lingua franca of the native world. But here, we’re reaching into C++ from a dynamic lang. We’ll call constructors, pull out members, call member functions, and jank will automatically ensure that destructors are called for any locals. Furthermore, jank already has full JIT compilation abilities for C++ code, so that means we can use our seamless interop to instantiate templates, define new structs which never existed before, etc.

Jeremiah Coyle: Bling

Bling is a library for rich text formatting in the console (https://github.com/paintparty/bling). Work on Bling in Q2 of 2025 will focus on the following 3 goals:

  • Add support for using hiccup to style and format messages
  • Add support for a template string syntax to style and format messages
  • Create 1-3 additional formatting templates for callouts, headers, and points-of-interest.

The following 4 features are stretch goals for Q2. They will be pursued in the following order when the initial 3 goals are completed.

  • Add support for automatic detection of the 3 levels of color support (16-color, 256-color, or Truecolor), using an approach similar to https://github.com/chalk/supports-color
  • Add documentation about how to leverage Bling to create great-looking warnings and errors in your own projects. An example of using Bling's templates to create nice warnings can be found here: https://github.com/paintparty/fireworks?tab=readme-ov-file#helpful-warnings-forbad-option-values
  • Add documentation about using Bling in conjunction with existing libraries which format Spec and Malli messages into human readable form.
  • Support arbitrary hex colors, and their conversion, if necessary, to x256

Karl Pietrzak: CodeCombat

My project will focus on adding Clojure(Script) to CodeCombat.
See the wiki page at https://github.com/codecombat/codecombat/wiki/Aether.

Siyoung Byun: Scicloj - Building Bridges to New Clojure Users

In 2025, Scicloj aims to improve the accessibility of Clojure for individuals working with data, regardless of their programming backgrounds. The project will initially focus on reviewing existing Scicloj libraries, analyzing their codebases, and actively using them to better understand their documentation structure. Specifically, the initial effort will concentrate on clearly organizing and distinguishing between tutorials and API documentation. From these insights, the project aims to develop standardized templates to encourage greater consistency across the documentation of existing Scicloj ecosystem libraries, making those libraries more robust and user-friendly.

Permalink

Clojure Power Tools Part 3

(Image: Clojure REPL.)


Introduction

I already covered Clojure power tools some five years ago in a couple of blog posts.

In this new blog post, I will briefly summarize the most important tools discussed in those two blog posts and then introduce some new power tools that I have found useful recently. I thought it might be a good idea to list all the most important Clojure power tools in one blog post so that I don't forget them in the future. I also add to this list the Clojure libraries that I use in my Clojure(Script) fullstack applications.

I use this Clojure fullstack application to introduce those power tools: replicant-webstore.

VSCode, Calva and REPL Editor Integration

Your editor is of course one of your most important tools, whatever programming language you use. My current choice is Visual Studio Code. It is rather light but also provides a rich set of extensions for various programming purposes. Nowadays, it also provides good generative AI integration to help you with your programming tasks; I have written a couple of blog posts about Using Copilot in Programming and about my Copilot Keybindings.

If you are programming Clojure with the VSCode editor, I definitely recommend the excellent Calva extension. It provides great Clojure REPL integration for VSCode, paredit structural editing, and much more. If you are interested in trying Calva, I recommend reading the excellent Calva documentation and starting to use it. I have also written three blog posts about my Calva configurations.

An important part of using Clojure is the keybindings (e.g. for evaluating forms, giving paredit commands, etc.). I have written a couple of blog posts about my keybindings.

And one hint: keep your VSCode configurations (at least keybindings.json and settings.json) in version control (Git).

Babashka

Babashka is a marvelous tool for writing scripts and automating tasks. I have written a couple of blog posts about Babashka.

I learned from one Metosin example project how to use Babashka as a task runner for my projects. See my latest Clojure fullstack exercise in which I used Babashka as a task runner, especially the bb.edn file and the bb-scripts directory, for how to start the backend and frontend REPLs.
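
For a rough idea of what that looks like, here is a minimal bb.edn sketch. The task names and commands are hypothetical; the example project's bb.edn is more elaborate:

{:tasks
 {backend-repl  {:doc  "Start the backend nREPL"
                 :task (shell "clojure -M:dev:backend -m nrepl.cmdline")}
  frontend-repl {:doc  "Start the shadow-cljs watcher"
                 :task (shell "npx shadow-cljs watch app")}}}

With this in place, bb backend-repl starts the backend REPL from the command line, and bb tasks lists all available tasks.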

Fullstack Libraries

Metosin Libraries: Reitit, Malli and Jsonista

These are my favourite Metosin libraries, and I always include them in my Clojure fullstack projects. You can use these libraries both in the backend and in the frontend.

Reitit provides excellent routing functionality; see the routes in the Clojure fullstack application I mentioned previously.
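
As a reminder of what Reitit routes look like, here is a tiny sketch with hypothetical routes (not the ones in the example application):

(require '[reitit.ring :as ring])

(def app
  (ring/ring-handler
   (ring/router
    [["/api/ping" {:get (fn [_] {:status 200 :body "pong"})}]
     ["/api/products/:group" {:get (fn [{{:keys [group]} :path-params}]
                                     {:status 200 :body group})}]])))

(app {:request-method :get :uri "/api/ping"})
;;=> {:status 200, :body "pong"}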

Malli provides excellent schemas that you can put in a common (cljc) file and share between your backend API and your frontend, to validate that the backend returned data conforming to the schema. See the example in that Clojure fullstack application: schema.cljc.
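
For example, a minimal Malli sketch (a hypothetical schema, not the project's schema.cljc) looks like this:

(require '[malli.core :as m])

(def Book
  [:map
   [:id :int]
   [:title :string]
   [:price :double]])

(m/validate Book {:id 2001, :title "Kalevala", :price 3.95})
;;=> true
(m/validate Book {:id "2001", :title "Kalevala", :price 3.95})
;;=> false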

Jsonista is a Clojure library for JSON encoding and decoding. Using Muuntaja you can easily do edn/json encoding in your API.

Aero and Integrant

Aero is an excellent configuration library. See the example in config.edn for the demonstration application's configuration, and see how it is read in main.clj.

Integrant provides a nice way to define your application from components, define the relationships between the components in your configuration (see the config.edn file above), and reset/reload the state of your application using those components. See also db.clj in which the defmethod ig/init-key :db/tsv function reads the tab separated file and initializes our little “demonstration database.”
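
The combination condenses to something like the following sketch (the keys and paths are simplified and hypothetical; the example project's config.edn and db.clj differ in detail):

(require '[aero.core :as aero]
         '[integrant.core :as ig]
         '[clojure.java.io :as io])

;; config.edn might contain something like:
;; {:db/tsv {:path "data" :data {:books "books.tsv" :movies "movies.tsv"}}}

(defmethod ig/init-key :db/tsv [_ {:keys [path data]}]
  ;; read the TSV files here; an atom works as a tiny in-memory "database"
  (atom {:path path :data data}))

(defn start-system []
  (-> (aero/read-config (io/resource "config.edn"))
      (ig/init)))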

Replicant and Hiccup

In the frontend, I used Reagent for years, which is a React wrapper for ClojureScript. There are some technical challenges for Reagent in using the latest React versions, so I was looking for a new ClojureScript UI technology. I first considered UIx, which is also a React wrapper for ClojureScript. But then I discovered Replicant, which together with Hiccup is a very lightweight and Clojure-ish way of doing frontend development. I have covered Replicant in a couple of my blog posts.
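
Rendering with Replicant is just a function call on plain hiccup. A minimal ClojureScript sketch (the element id and the hiccup content are hypothetical):

(require '[replicant.dom :as d])

;; Render hiccup into a DOM element; calling render again with new hiccup
;; diffs and patches the DOM.
(d/render (js/document.getElementById "app")
          [:main
           [:h1 "Webstore"]
           [:ul [:li "Books"] [:li "Movies"]]])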

Development Practices and Tools

REPL

If you are learning Clojure, Programming at the REPL is something you definitely have to learn. Check what kind of REPL support your editor provides and start learning to use it. If you are using VSCode, you will find more information above in the chapter VSCode, Calva and REPL Editor Integration.

I have three monitors at my desk. The main monitor is where I keep my VSCode editor, and on a side monitor I keep the REPL window. This way I can maximize the main monitor for editing but still see the REPL output at my side. If you are using VSCode, this is easy. First start the REPL with Calva. If you have done the same kind of Calva configuration that I explained in my previous blog posts, you should have your Calva output as a tab in the VSCode editor area. Give the VSCode command View: Move Editor into New Window; this moves the active editor tab into a new VSCode window. Now you can move the REPL output window onto your second monitor.

I have a couple of similar commands for evaluating Clojure forms in Calva. Alt+L evaluates the form and shows the result in the editor as ephemeral output, which you can dismiss with the Esc key. With Alt+Shift+L, Calva writes the evaluation result below the evaluated form, like this:

  (keys (deref (:db/tsv (user/env))))
  ;;=> (:books :movies)

REPL is your power tool with Clojure and you should learn to use it efficiently.

Personal Profile Deps

This is my current ~/.clojure/deps.edn file:


{:aliases {:kari {:extra-paths ["scratch"]
                  :extra-deps {; NOTE: hashp 0.2.1 sci print bug.
                               hashp/hashp {:mvn/version "0.2.2"}
                               org.clojars.abhinav/snitch {:mvn/version "0.1.16"}
                               com.gfredericks/debug-repl {:mvn/version "0.0.12"}
                               djblue/portal {:mvn/version "0.58.5"}}}

           :reveal {:extra-deps {vlaaad/reveal {:mvn/version "1.3.284"}}
                    :ns-default vlaaad.reveal
                    :exec-fn repl}

           :outdated {;; Note that it is `:deps`, not `:extra-deps`
                      :deps {com.github.liquidz/antq {:mvn/version "2.11.1269"}}
                      :main-opts ["-m" "antq.core"]}}}

I use these tools quite often and therefore keep them in my personal profile kari.

I then add my kari profile to the scripts I use to start a REPL in development, like this:

:backend-repl-command ["clojure -M:dev:backend:frontend:shadow-cljs:calva-external-repl:test:kari -i bb-scripts/backendinit.clj -m nrepl.cmdline --middleware \"[cider.nrepl/cider-middleware,shadow.cljs.devtools.server.nrepl/middleware]\""]

Inline Defs

Inline defs are an old Clojure trick for debugging Clojure code. Let's look at a small example:

(defmethod ig/init-key :db/tsv [_ {:keys [path data] :as db-opts}]
  (log/infof "Reading tsv data, config is %s" (pr-str db-opts))
  (let [books (read-datafile (str path "/" (:books data)) book-line book-str)
        _ (def mybooks books) ;; THIS IS THE INLINE DEF
        movies (read-datafile (str path "/" (:movies data)) movie-line movie-str)]
    (atom {:books books
           :movies movies})))

(comment
  ;; AND HERE WE EXAMINE WHAT HAPPENED.
  (count mybooks)
  ;;=> 35
  (first mybooks)
  ;;=> {:id 2001,
  ;;    :product-group 1,
  ;;    :title "Kalevala",
  ;;    :price 3.95,
  ;;    :author "Elias Lönnrot",
  ;;    :year 1835,
  ;;    :country "Finland",
  ;;    :language "Finnish"}
  )

I hardly ever use the Calva debugger, since Clojure provides much better tools for examining your live program state. Nowadays, instead of inline defs, I use Snitch.

Hashp

I used to use Hashp quite often in my debugging sessions, but nowadays I mostly use Snitch. Still, instead of adding a prn line into some let and watching the REPL output, hashp is a good alternative.
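
A tiny hashp sketch shows the idea: prefix any form with #p and its value gets printed without changing what the code returns (the function here is hypothetical):

(require 'hashp.core)

(defn add-vat [price]
  ;; #p prints the form, its value and its location, then returns the value
  (* #p price 1.24))

(add-vat 3.95)
;; prints something like: #p[user/add-vat:4] price => 3.95
;;=> 4.898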

Portal

You can use Portal in development to tap> various data. I have added a couple of examples of how to tap> the data in the following files.

In the Clojure side, in routes.clj:

  ;; Example how to tap to the data using djblue Portal:
  (require '[clj-http.client :as client])
  (require '[jsonista.core :as json])
  (defn json-to-edn [json-str]
    (json/read-value json-str (json/object-mapper {:decode-key-fn keyword}))) 
  (json-to-edn "{\"name\": \"Book\", \"price\": 29.99}") 
  
  (:body (client/get "http://localhost:8331/api/products/books"))
  ;; Tap to the data:
  ; https://github.com/djblue/portal
  (require '[portal.api :as p])
  ; This should open the Portal window.
  (def p (p/open))
  (add-tap #'p/submit) 
  (tap> :hello)
  (tap> (json-to-edn (:body (client/get "http://localhost:8331/api/products/books"))))
  ;; You should now see a vector of book maps in the portal window.

In the Clojurescript side, in app.cljs:

  ;; Example how to tap to the data using djblue Portal: 
  (require '[portal.web :as p])
  ; NOTE: This asks a popup window, you have to accept it in the browser!!!
  (def p (p/open))
  ; Now you should have a new pop-up browser window...
  (add-tap #'p/submit)
  (tap> :hello)
  (tap> (get-in @!state [:db/data :books]))
  ;; You should now see a vector of book maps in the portal window.

Gadget

Gadget is nowadays my main debugging tool with Replicant. Gadget provides a very good view to your frontend state while developing the frontend.

(Image: Gadget.)

Calva Debugger

Calva provides a nice debugger. As I already explained before, I very seldom use it. But now, I just used it to provide the example below, and I realized that it is actually quite a nice tool, and I should use it more in the future.

(Image: Calva debugger.)

So, you just add the #dbg reader tag to your code, and once execution reaches that point, the debugger is triggered.
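
A small sketch (with a hypothetical function): evaluate the top-level form in Calva and then call it; execution pauses at the instrumented form and the debugger view opens.

(defn total-price [prices]
  ;; the #dbg reader tag instruments the form that follows it
  #dbg (reduce + prices))

(total-price [3.95 29.99])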

Snitch

Peter Strömberg, the creator of Calva, once again introduced an excellent new tool to me: Snitch. I watched Peter's excellent demo of how he uses Snitch and immediately realized that I would switch from ad hoc inline defs to Snitch. I recommend watching Peter's video on how to use Snitch.

Snitch is a tool that adds inline defs to your functions.

I mostly use defn*, which injects inline defs for all the bindings in the function: parameters and let bindings. If I want to examine what happens in a function, my workflow is like this:

  1. Change defn => defn*.
  2. Integrant reset.
  3. Call the API (or whatever finally calls the function).
  4. Examine the bindings in the function by evaluating them in the function context.

Add this to your user.clj:

;; https://github.com/AbhinavOmprakash/snitch
(require '[snitch.core :refer [defn* defmethod* *fn *let]])
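
Here is a small sketch of how defn* works in practice (the function and the path are hypothetical):

(require '[clojure.string :as str]
         '[snitch.core :refer [defn*]])

(defn* read-books [path]
  (let [lines (str/split-lines (slurp path))]
    (count lines)))

;; Call it once...
(read-books "data/books.tsv")

;; ...and afterwards the parameter and let bindings are defined as
;; ordinary vars in the namespace, ready for REPL inspection:
path   ;;=> "data/books.tsv"
lines  ;;=> ["..." "..."]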

Bonus Tool: Copilot

My corporation provides a GitHub Copilot Enterprise license. Copilot is a great tool to assist you in programming. I am still a bit of an old-school programmer in the sense that I hardly ever let Copilot edit the actual files; I mostly have a conversation with Copilot in the VSCode integrated Copilot Chat view.

I have explained my Copilot use in this blog post: Copilot Keybindings.

Conclusions

Clojure is an excellent programming language. It has a rich ecosystem and tools that you just don't get in other programming languages; since most other languages are not homoiconic, they simply cannot have, for example, a real REPL.

The writer is working at a major international IT corporation building cloud infrastructures and implementing applications on top of those infrastructures.

Kari Marttila

Kari Marttila’s Home Page in LinkedIn: https://www.linkedin.com/in/karimarttila/

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.