Massively scalable collaborative text editor backend with Rama in 120 LOC

This is part of a series of posts exploring programming with Rama, covering use cases ranging from interactive consumer apps to high-scale analytics, background processing, recommendation engines, and much more. This tutorial is self-contained, but for broader information about Rama and how it reduces the cost of building backends so much (up to 100x for large-scale backends), see our website.

Like all Rama applications, the example in this post requires very little code. It’s easily scalable to millions of reads/writes per second, ACID compliant, high performance, and fault-tolerant from how Rama incrementally replicates all state. Deploying, updating, and scaling this application are all one-line CLI commands. No other infrastructure besides Rama is needed. Comprehensive monitoring on all aspects of runtime operation is built-in.

In this post, I’ll explore building the backend for a real-time collaborative text editor like Google Docs or Etherpad. The technical challenge of building an application like this is conflict resolution. When multiple people edit the same text at the same time, what should be the result? If a user makes a lot of changes offline, when they come online how should their changes be merged into a document whose state may have diverged significantly?

There are many approaches for solving this problem. I’ll show an implementation of “operational transformations” similar to the one Google Docs uses as described here. Only incremental changes are sent back and forth between server and client, such as “Add text ‘hello’ to offset 128” or “Remove 14 characters starting from offset 201”. When clients send a change to the server, they also say what version of the document the change was applied to. When the server receives a change from an earlier document version, it applies the “operational transformation” algorithm to modify the addition or removal to fit with the latest document version.

Code will be shown in both Clojure and Java, with the total code being about 120 lines for the Clojure implementation and 160 lines for the Java implementation. Most of the code is implementing the “operational transformation” algorithm, which is just plain Clojure or Java functions, and the Rama code handling storage/queries is just 40 lines. You can download and play with the Clojure implementation in this repository or the Java implementation in this repository.

Operational transformations

The idea behind “operational transformations” is simple and can be understood through a few examples. Suppose Alice and Bob are currently editing a document, and Alice adds "to the " at offset 6 when the document is at version 3, like so:

However, when the change gets to the server, the document is actually at version 4 since Bob added "big " to offset 0:

Applying Alice’s change without modification would produce the string “big heto the llo world”, which is completely wrong. Instead, the server can transform Alice’s change based on the single add that happened between versions 3 and 4 by pushing Alice’s change to the right by the length of "big ". So Alice’s change of “Add ‘to the ’ at offset 6” becomes “Add ‘to the ’ at offset 10”, and the document becomes:

Now suppose instead the change Bob made between versions 3 and 4 was adding " again" to the end:

In this case, Alice’s change should not be modified since Bob’s change was to the right of her addition, and applying Alice’s change will produce “hello to the world again”.

Transforming an addition against a missed remove works the same way: if Bob had removed text to the left of Alice’s addition, her addition would be shifted left by the amount of text removed. If Bob removed text to the right, Alice’s addition is unchanged.

Now let’s take a look at what happens if Alice had removed text at an earlier version, which is slightly trickier to handle. Suppose Alice removes 7 characters starting from offset 2 when the document was “hello world” at version 3. Suppose Bob had added "big " to offset 0 between versions 3 and 4:

In this case, Alice’s removal should be shifted right by that amount to remove 7 characters starting from offset 6. Now suppose Bob had instead added "to the " to offset 6:

In this case, text had been added in the middle of where Alice was deleting text. It’s wrong and confusing for Alice’s removal to delete text that wasn’t in her version of the document. The correct way to handle this is split her removal into two changes: remove 3 characters from offset 13 and then remove 4 characters from offset 2. This produces the following document:

The resulting document isn’t legible, but it’s consistent with the conflicting changes that Alice and Bob were making at the same time without losing any information improperly.

Let’s take a look at one more case. Suppose again that Alice removes 7 characters from offset 2 when the document was “hello world” at version 3. This time, Bob had already removed one character starting from offset 3:

In this case, one of the characters Alice had removed was already removed, so Alice’s removal should be reduced by one to remove 6 characters from offset 2 instead of 7, producing:

There are a few more intricacies to how operational transformations work, but this gets the gist of the idea across. Since this post is really about using Rama to handle the storage and computation for a real-time collaborative editor backend, I’ll keep it simple and only handle adds and removes. The code can easily be extended to handle other kinds of document edits such as formatting changes.

The Rama code will make use of two functions that encapsulate the “operational transformation” logic: one to apply a series of edits to a document, and the other to transform an edit made against an earlier version of the document against all the edits that happened up to the latest version. These functions are as follows:

Clojure:
(defn transform-edit [edit missed-edits]
  (if (instance? AddText (:action edit))
    (let [new-offset (reduce
                       (fn [offset missed-edit]
                         (if (<= (:offset missed-edit) offset)
                           (+ offset (add-action-adjustment missed-edit))
                           offset))
                       (:offset edit)
                       missed-edits)]
      [(assoc edit :offset new-offset)])
    (reduce
      (fn [removes {:keys [offset action]}]
        (if (instance? AddText action)
          (transform-remove-against-add removes offset action)
          (transform-remove-against-remove removes offset action)))
      [edit]
      missed-edits)))

(defn apply-edits [doc edits]
  (reduce
    (fn [doc {:keys [offset action]}]
      (if (instance? AddText action)
        (setval (srange offset offset) (:content action) doc)
        (setval (srange offset (+ offset (:amount action))) "" doc)))
    doc
    edits))

Java:
public static List transformEdit(Map edit, List<Map> missedEdits) {
  int offset = offset(edit);
  if(isAddEdit(edit)) {
    for(Map e: missedEdits) {
      if(offset(e) <= offset) offset += addActionAdjustment(e);
    }
    Map newEdit = new HashMap(edit);
    newEdit.put("offset", offset);
    return Arrays.asList(newEdit);
  } else {
    List removes = Arrays.asList(edit);
    for(Map e: missedEdits) {
      if(isAddEdit(e)) removes = transformRemoveAgainstAdd(removes, e);
      else removes = transformRemoveAgainstRemove(removes, e);
    }
    return removes;
  }
}

private static String applyEdits(String doc, List<Map> edits) {
  for(Map edit: edits) {
    int offset = offset(edit);
    if(isAddEdit(edit)) {
      doc = doc.substring(0, offset) + content(edit) + doc.substring(offset);
    } else {
      doc = doc.substring(0, offset) + doc.substring(offset + amount(edit));
    }
  }
  return doc;
}

The “apply edits” function updates a document with a series of edits, and the “transform edit” function implements the described “operational transformation” algorithm. The “transform edit” function returns a list since an operational transformation on a remove edit can split that edit into multiple remove edits, as described in one of the cases above.
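
To make this concrete, here is a hedged REPL-style sketch of the first scenario from earlier, using the Clojure version (it relies on the edit constructors defined later in this post and assumes the add-action-adjustment helper from the repository returns the length of the text added by a missed add):

(def alice-edit (mk-add-edit 123 3 6 "to the "))  ; Alice adds at offset 6 against version 3
(def missed [(mk-add-edit 123 3 0 "big ")])       ; Bob's add that Alice hadn't seen yet

(transform-edit alice-edit missed)
;; => a one-element list containing Alice's edit with :offset shifted from 6 to 10

(apply-edits "big hello world" (transform-edit alice-edit missed))
;; => "big hello to the world"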

The implementation of the helper functions used by “apply edits” and “transform edit” can be found in the Clojure repository or the Java repository.

Backend storage

Indexed datastores in Rama, called PStates (“partitioned state”), are much more powerful and flexible than databases. Whereas databases have fixed data models, PStates can represent infinite data models due to being based on the composition of the simpler primitive of data structures. PStates are distributed, durable, high-performance, and incrementally replicated. Each PState is fine-tuned to what the application needs, and an application makes as many PStates as needed. For this application, we’ll make two PStates: one to track the latest contents of a document, and one to track the sequence of every change made to a document in its history.

Here’s the PState definition for the latest contents of a document:

Clojure:
(declare-pstate
  topology
  $$docs
  {Long String})

Java:
topology.pstate("$$docs", PState.mapSchema(Long.class, String.class));

This PState is a map from a 64-bit document ID to the string contents of the document. The name of a PState always begins with $$, and this is equivalent to a key/value database.

Here’s the PState tracking the history of all edits to a document:

Clojure:
(declare-pstate
  topology
  $$edits
  {Long (vector-schema Edit {:subindex? true})})

Java:
topology.pstate("$$edits",
                PState.mapSchema(Long.class,
                                 PState.listSchema(Map.class).subindexed()));

This declares the PState as a map of lists, with the key being the 64-bit document ID and the inner lists containing the edit data. The inner list is declared as “subindexed”, which instructs Rama to store the elements individually on disk rather than reading and writing the whole list as one value. Subindexing enables nested data structures to have billions of elements and still be read and written extremely quickly. This PState can answer many queries in less than one millisecond: get the number of edits for a document, get a single edit at a particular index, or get all edits between two indices.
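
As a hedged sketch, here is what those queries could look like through a Clojure PState client (clients like edits-pstate are created in the “Interacting with the module” section later in this post):

(foreign-select-one [(keypath 123) (view count)] edits-pstate)    ; number of edits for document 123
(foreign-select-one [(keypath 123) (srange 10 20)] edits-pstate)  ; edits 10 (inclusive) through 20 (exclusive)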

Let’s now review the broader concepts of Rama in order to understand how these PStates will be materialized.

Rama concepts

A Rama application is called a “module”. In a module you define all the storage and implement all the logic needed for your backend. All Rama modules are event sourced, so all data enters through a distributed log in the module called a “depot”. Most of the work in implementing a module is coding “ETL topologies” which consume data from one or more depots to materialize any number of PStates. Modules look like this at a conceptual level:

Modules can have any number of depots, topologies, and PStates, and clients interact with a module by appending new data to a depot or querying PStates. Although event sourcing traditionally means that processing is completely asynchronous to the client doing the append, with Rama that’s optional. By being an integrated system Rama clients can specify that their appends should only return after all downstream processing and PState updates have completed.

A module deployed to a Rama cluster runs across any number of worker processes across any number of nodes, and a module is scaled by adding more workers. A module is broken up into “tasks” like so:

A “task” is a partition of a module. The number of tasks for a module is specified on deploy. A task contains one partition of every depot and PState for the module as well as a thread and event queue for running events on that task. A running event has access to all depot and PState partitions on that task. Each worker process has a subset of all the tasks for the module.

Coding a topology involves reading and writing to PStates, running business logic, and switching between tasks as necessary.

Implementing the backend

Let’s start implementing the module for the collaborative editor backend. The first step to coding the module is defining the depot:

Clojure:
(defmodule CollaborativeDocumentEditorModule
  [setup topologies]
  (declare-depot setup *edit-depot (hash-by :id))
  )

Java:
public class CollaborativeDocumentEditorModule implements RamaModule {
  @Override
  public void define(Setup setup, Topologies topologies) {
    setup.declareDepot("*edit-depot", Depot.hashBy("id"));
  }
}

This declares a Rama module called “CollaborativeDocumentEditorModule” with a depot called *edit-depot which will receive all new edit information. Objects appended to a depot can be any type. The second argument of declaring the depot is called the “depot partitioner” – more on that later.

To keep the example simple, the data appended to the depot will be defrecord objects for the Clojure version and HashMap objects for the Java version. To have a tighter schema on depot records you could instead use Thrift, Protocol Buffers, or a language-native tool for defining the types. Here are the functions that will be used to create depot data:

Clojure:
(defrecord AddText [content])
(defrecord RemoveText [amount])

(defrecord Edit [id version offset action])

(defn mk-add-edit [id version offset content]
  (->Edit id version offset (->AddText content)))

(defn mk-remove-edit [id version offset amount]
  (->Edit id version offset (->RemoveText amount)))

Java:
public static Map makeAddEdit(long id, int version, int offset, String content) {
  Map ret = new HashMap();
  ret.put("type", "add");
  ret.put("id", id);
  ret.put("version", version);
  ret.put("offset", offset);
  ret.put("content", content);
  return ret;
}

public static Map makeRemoveEdit(long id, int version, int offset, int amount) {
  Map ret = new HashMap();
  ret.put("type", "remove");
  ret.put("id", id);
  ret.put("version", version);
  ret.put("offset", offset);
  ret.put("amount", amount);
  return ret;
}

Each edit contains a 64-bit document ID that identifies which document the edit applies to.

Next, let’s begin defining the topology to consume data from the depot and materialize the PStates. Here’s the declaration of the topology with the PStates:

Clojure:
(defmodule CollaborativeDocumentEditorModule
  [setup topologies]
  (declare-depot setup *edit-depot (hash-by :id))
  (let [topology (stream-topology topologies "core")]
    (declare-pstate
      topology
      $$docs
      {Long String})
    (declare-pstate
      topology
      $$edits
      {Long (vector-schema Edit {:subindex? true})})
    ))

Java:
public class CollaborativeDocumentEditorModule implements RamaModule {
  @Override
  public void define(Setup setup, Topologies topologies) {
    setup.declareDepot("*edit-depot", Depot.hashBy("id"));

    StreamTopology topology = topologies.stream("core");
    topology.pstate("$$docs", PState.mapSchema(Long.class, String.class));
    topology.pstate("$$edits",
                    PState.mapSchema(Long.class,
                                     PState.listSchema(Map.class).subindexed()));
  }
}

This defines a stream topology called “core”. Rama has two kinds of topologies, stream and microbatch, which have different properties. In short, streaming is best for interactive applications that need single-digit millisecond update latency, while microbatching has update latency of a few hundred milliseconds and is best for everything else. Streaming is used here because a collaborative editor needs quick feedback from the server as it sends changes back and forth.

Notice that the PStates are defined as part of the topology. Unlike databases, PStates are not global mutable state. A PState is owned by a topology, and only the owning topology can write to it. Writing state in global variables is a horrible thing to do, and databases are just global variables by a different name.

Since PStates can only be written to by their owning topologies, they’re much easier to reason about. Everything about them can be understood by just looking at the topology implementation, all of which exists in the same program and is deployed together. Additionally, the extra step of appending to a depot before processing the record to materialize the PState does not lower performance, as we’ve shown in benchmarks. Rama being an integrated system strips away much of the overhead which traditionally exists.

Let’s now add the code to materialize the PStates:

Clojure:
(defmodule CollaborativeDocumentEditorModule
  [setup topologies]
  (declare-depot setup *edit-depot (hash-by :id))
  (let [topology (stream-topology topologies "core")]
    (declare-pstate
      topology
      $$docs
      {Long String})
    (declare-pstate
      topology
      $$edits
      {Long (vector-schema Edit {:subindex? true})})
    (<<sources topology
      (source> *edit-depot :> {:keys [*id *version] :as *edit})
      (local-select> [(keypath *id) (view count)]
        $$edits :> *latest-version)
      (<<if (= *latest-version *version)
        (vector *edit :> *final-edits)
       (else>)
        (local-select>
          [(keypath *id) (srange *version *latest-version)]
          $$edits :> *missed-edits)
        (transform-edit *edit *missed-edits :> *final-edits))
      (local-select> [(keypath *id) (nil->val "")]
        $$docs :> *latest-doc)
      (apply-edits *latest-doc *final-edits :> *new-doc)
      (local-transform> [(keypath *id) (termval *new-doc)]
        $$docs)
      (local-transform>
        [(keypath *id) END (termval *final-edits)]
        $$edits)
      )))

Java:
public class CollaborativeDocumentEditorModule implements RamaModule {
  @Override
  public void define(Setup setup, Topologies topologies) {
    setup.declareDepot("*edit-depot", Depot.hashBy("id"));

    StreamTopology topology = topologies.stream("core");
    topology.pstate("$$docs", PState.mapSchema(Long.class, String.class));
    topology.pstate("$$edits",
                    PState.mapSchema(Long.class,
                                     PState.listSchema(Map.class).subindexed()));

    topology.source("*edit-depot").out("*edit")
            .each(Ops.GET, "*edit", "id").out("*id")
            .each(Ops.GET, "*edit", "version").out("*version")
            .localSelect("$$edits", Path.key("*id").view(Ops.SIZE)).out("*latest-version")
            .ifTrue(new Expr(Ops.EQUAL, "*latest-version", "*version"),
              Block.each(Ops.TUPLE, "*edit").out("*final-edits"),
              Block.localSelect("$$edits",
                                Path.key("*id")
                                    .sublist("*version", "*latest-version")).out("*missed-edits")
                   .each(CollaborativeDocumentEditorModule::transformEdit,
                         "*edit", "*missed-edits").out("*final-edits"))
            .localSelect("$$docs", Path.key("*id").nullToVal("")).out("*latest-doc")
            .each(CollaborativeDocumentEditorModule::applyEdits,
                  "*latest-doc", "*final-edits").out("*new-doc")
            .localTransform("$$docs", Path.key("*id").termVal("*new-doc"))
            .localTransform("$$edits", Path.key("*id").end().termVal("*final-edits"));
  }
}

The code to implement the topology is less than 20 lines, but there’s a lot to unpack here. The business logic is implemented with dataflow. Rama’s dataflow API is exceptionally expressive, able to intermix arbitrary business logic with loops, conditionals, and moving computation between tasks. This post is not going to explore all the details of dataflow as there’s simply too much to cover. Full tutorials for Rama dataflow can be found on our website for the Java API and for the Clojure API.

Let’s go over each line of this topology implementation. The first step is subscribing to the depot:

Clojure:
(<<sources topology
  (source> *edit-depot :> {:keys [*id *version] :as *edit})

Java:
topology.source("*edit-depot").out("*edit")
        .each(Ops.GET, "*edit", "id").out("*id")
        .each(Ops.GET, "*edit", "version").out("*version")

This subscribes the topology to the depot *edit-depot and starts a reactive computation on it. Operations in dataflow do not return values. Instead, they emit values that are bound to new variables. In the Clojure API, the inputs and outputs to an operation are separated by the :> keyword. In the Java API, output variables are bound with the .out method.

Whenever data is appended to that depot, the data is emitted into the topology. The Java version binds the emit into the variable *edit and then gets the fields “id” and “version” from the map into the variables *id and *version, while the Clojure version captures the emit as the variable *edit and also destructures its fields into the variables *id and *version. All variables in Rama code begin with a *. The subsequent code runs for every single emit.

Remember that last argument to the depot declaration called the “depot partitioner”? That’s relevant here. Here’s that image of the physical layout of a module again:

The depot partitioner determines on which task the append happens and thereby on which task computation begins for subscribed topologies. In this case, the depot partitioner says to hash by the “id” field of the appended data. The target task is computed by taking the hash and modding it by the total number of tasks. This means data with the same ID always go to the same task, while different IDs are evenly spread across all tasks.
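
That target-task computation can be sketched in a couple of lines of Clojure (an illustration of the concept only, not Rama’s actual internal hashing code):

;; Illustrative only: the same ID always maps to the same task.
(defn target-task [id num-tasks]
  (mod (hash id) num-tasks))

(target-task 123 64)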

Rama gives a ton of control over how computation and storage are partitioned, and in this case we’re partitioning by the hash of the document ID since that’s how we want the PStates to be partitioned. This allows us to easily locate the task storing data for any particular document.

The next line fetches the current version of the document:

Clojure:
(local-select> [(keypath *id) (view count)]
  $$edits :> *latest-version)

Java:
.localSelect("$$edits", Path.key("*id").view(Ops.SIZE)).out("*latest-version")

The $$edits PState contains every edit applied to the document, and the latest version of a document is simply the size of that list. The PState is queried with the “local select” operation with a “path” specifying what to fetch. When a PState is referenced in dataflow code, it always references the partition of the PState that’s located on the task on which the event is currently running.

Paths are a deep topic, and the full documentation for them can be found here. A path is a sequence of “navigators” that specify how to hop through a data structure to target values of interest. A path can target any number of values, and paths are used for both transforms and queries. In this case, the path navigates by the key *id to the list of edits for the document. The next navigator view runs a function on that list to get its size. The Clojure version uses Clojure’s count function, and the Java version uses the Ops.SIZE function. The output of the query is bound to the variable *latest-version. This is a fast sub-millisecond query no matter how large the list of edits.
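
Because Rama’s Clojure path API mirrors Specter, the same navigators can be tried out on a plain in-memory map. A hedged illustration:

(require '[com.rpl.specter :refer [select-one keypath view srange]])

(def edits-by-doc {123 [:e0 :e1 :e2 :e3 :e4]})

(select-one [(keypath 123) (view count)] edits-by-doc)  ;; => 5
(select-one [(keypath 123) (srange 1 3)] edits-by-doc)  ;; => [:e1 :e2]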

The next few lines run the “operational transformation” algorithm if necessary to produce the edits to be applied to the current version of the document:

Clojure:
(<<if (= *latest-version *version)
  (vector *edit :> *final-edits)
 (else>)
  (local-select>
    [(keypath *id) (srange *version *latest-version)]
    $$edits :> *missed-edits)
  (transform-edit *edit *missed-edits :> *final-edits))

Java:
.ifTrue(new Expr(Ops.EQUAL, "*latest-version", "*version"),
  Block.each(Ops.TUPLE, "*edit").out("*final-edits"),
  Block.localSelect("$$edits",
                    Path.key("*id")
                        .sublist("*version", "*latest-version")).out("*missed-edits")
       .each(CollaborativeDocumentEditorModule::transformEdit,
             "*edit", "*missed-edits").out("*final-edits"))

First, an “if” is run to check if the version of the edit is the same as the latest version on the backend. If so, the list of edits to be applied to the document is just the edit unchanged. As mentioned before, the operational transformation algorithm can result in multiple edits being produced from a single edit. The Clojure version produces the single-element list by calling vector, and the Java version does so with the Ops.TUPLE function. The list of edits is bound to the variable *final-edits.

The “else” branch of the “if” handles the case where the edit must be transformed against all edits up to the latest version. The “local select” on $$edits fetches all edits from the input edit’s version up to the latest version. The navigator to select the sublist, srange in Clojure and sublist in Java, takes in as arguments a start offset (inclusive) and end offset (exclusive) and navigates to the sublist of all elements between those offsets. This sublist is bound to the variable *missed-edits.

The function shown before implementing operational transformations, transform-edit in Clojure and CollaborativeDocumentEditorModule::transformEdit in Java, is then run on the edit and all the missed edits to produce the new list of edits and bind them to the variable *final-edits.

Any variables bound in both the “then” and “else” branches of an “if” conditional in Rama will be in scope after the conditional. In this case, *final-edits is available after the conditional. *missed-edits is not available since it is not bound in the “then” branch. This behavior comes from Rama implicitly “unifying” the “then” and “else” branches.

The next bit of code gets the latest document and applies the transformed edits to it:

Clojure:
(local-select> [(keypath *id) (nil->val "")]
  $$docs :> *latest-doc)
(apply-edits *latest-doc *final-edits :> *new-doc)

Java:
.localSelect("$$docs", Path.key("*id").nullToVal("")).out("*latest-doc")
.each(CollaborativeDocumentEditorModule::applyEdits,
      "*latest-doc", "*final-edits").out("*new-doc")

The “local select” fetches the latest version of the document from the $$docs PState. The second navigator in the path, nil->val in Clojure and nullToVal in Java, handles the case where this is the first ever edit on the document. In that case the document ID does not exist in this PState, and the key navigation by *id would navigate to null, so the next navigator navigates to the empty string instead.

The next line runs the previously defined “apply edits” function on the transformed edits to produce the new version of the document, which is bound to the variable *new-doc.

The next two lines finish this topology:

Clojure:
(local-transform> [(keypath *id) (termval *new-doc)]
  $$docs)
(local-transform>
  [(keypath *id) END (termval *final-edits)]
  $$edits)

Java:
.localTransform("$$docs", Path.key("*id").termVal("*new-doc"))
.localTransform("$$edits", Path.key("*id").end().termVal("*final-edits"));

The two PStates are updated with the “local transform” operation. Like “local select”, a “local transform” takes in as input a PState and a “path”. Paths for “local transform” navigate to the values to change and then use special “term” navigators to update them.

The first “local transform” navigates into $$docs by the document ID and uses the “term val” navigator to set the value there to *new-doc . This is exactly the same as doing a “put” into a hash map.

The second “local transform” appends the transformed edits to the end of the list of edits for this document. It navigates to the list by the key *id and then navigates to the “end” of the list. More specifically, the “end” navigator navigates to the empty list right after the overall list. Setting that empty list to a new value appends those elements to the overall list, which is what the final “term val” navigator does.
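
The same Specter analogy applies here. A hedged illustration of what the “end” navigator does on a plain map of vectors:

(require '[com.rpl.specter :refer [setval keypath END]])

(setval [(keypath 123) END] [:e5 :e6] {123 [:e0 :e1]})
;; => {123 [:e0 :e1 :e5 :e6]}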

This entire topology definition executes atomically – all the PState queries, operational transformation logic, and PState writes happen together, and nothing else can run on the task in between. This is a result of Rama colocating computation and storage, which will be explored more in the next section.

The power of colocation

Let’s take a look at the physical layout of a module again:

Every task has a partition of each depot and PState as well as an executor thread for running events on that task. Critically, only one event can run at a time on a task. That means each event has atomic access to all depot and PState partitions on that task. Additionally, those depot and PState partitions are local to the JVM process running that event, so interactions with them are fully in-process (as opposed to the inter-process communication used with databases).

A traditional database handles many read and write requests concurrently, using complex locking strategies and explicit transactions to achieve atomicity. Rama’s approach is different: parallelism is achieved by having many tasks in a module, and atomicity comes from colocation. Rama doesn’t have explicit transactions because transactional behavior is automatic when computation is colocated with storage.

When writing a topology in a module, you have full control over what constitutes a single event. Code runs synchronously on a task unless it explicitly goes asynchronous, such as with partitioners or yields.

This implementation for a collaborative editor backend is a great example of the power of colocation. The topology consists of completely arbitrary code running fine-grained “operational transformation” logic and manipulating multiple PStates, and nothing special needed to be done to get the necessary transactional behavior.

When you use a microbatch topology to implement an ETL, you get even stronger transactional behavior. All microbatch topologies are cross-partition transactions in every case, no matter how complex the logic.

You can read more about Rama’s strong ACID semantics on this page.

Query topology to fetch document and version

The module needs one more small thing to complete the functionality necessary for a real-time collaborative editor backend. When the frontend is loaded, it needs to load the latest document contents along with its version. The contents are stored in the $$docs PState, and the version is the size of the list of edits in the $$edits PState. So we need to read from both those PStates atomically in one event.

If you were to try to do this with direct PState clients, you would issue one query to the $$edits PState and another to the $$docs PState. Those queries would run as separate events, and the PStates could be updated in between them. This would result in the frontend having incorrect state.
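
To make the race concrete, here is a hedged sketch of that non-atomic approach with direct PState clients (it assumes a docs-pstate client analogous to the edits-pstate client created in the “Interacting with the module” section):

;; Two separate events; the topology may apply another edit in between them.
(def doc     (foreign-select-one [(keypath 123)] docs-pstate))
(def version (foreign-select-one [(keypath 123) (view count)] edits-pstate))
;; doc and version can now describe two different states of the document.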

Rama provides a feature called “query topologies” to handle this case. Query topologies are exceptionally powerful, able to implement high-performance, real-time, distributed queries across any or all of the PStates of a module and any or all of the partitions of those PStates. They’re programmed with the exact same dataflow API as used to program ETLs.

For this use case, we only need to query two PStates atomically. So this is a simple use case for query topologies. The full implementation is:

Clojure:
(<<query-topology topologies "doc+version"
  [*id :> *ret]
  (|hash *id)
  (local-select> (keypath *id) $$docs :> *doc)
  (local-select> [(keypath *id) (view count)] $$edits :> *version)
  (hash-map :doc *doc :version *version :> *ret)
  (|origin))

Java:
topologies.query("doc+version", "*id").out("*ret")
          .hashPartition("*id")
          .localSelect("$$docs", Path.key("*id")).out("*doc")
          .localSelect("$$edits", Path.key("*id").view(Ops.SIZE)).out("*version")
          .each((String doc, Integer version) -> {
            Map ret = new HashMap();
            ret.put("doc", doc);
            ret.put("version", version);
            return ret;
          }, "*doc", "*version").out("*ret")
          .originPartition();

Let’s go through this line by line. The first part declares the query topology and its arguments:

Clojure:
(<<query-topology topologies "doc+version"
  [*id :> *ret]

Java:
topologies.query("doc+version", "*id").out("*ret")

This declares a query topology named “doc+version” that takes in one argument *id as input. It declares the return variable *ret, which will be bound by the end of the topology execution.

The next line gets the query to the task of the module containing the data for that ID:

Clojure:
(|hash *id)

Java:
.hashPartition("*id")

This line does a “hash partition” by the value of *id. Partitioners relocate subsequent code, potentially to a new task, and a hash partitioner works exactly like the aforementioned depot partitioner. The details of relocating computation, like serializing and deserializing any variables referenced after the partitioner, are handled automatically. The code is linear without any callback functions, even though partitioners could be jumping around to different tasks on different nodes.

When the first operation of a query topology is a partitioner, query topology clients are optimized to go directly to that task. You’ll see an example of invoking a query topology in the next section.

The next two lines atomically fetch the document contents and version:

Clojure:
(local-select> (keypath *id) $$docs :> *doc)
(local-select> [(keypath *id) (view count)] $$edits :> *version)

Java:
.localSelect("$$docs", Path.key("*id")).out("*doc")
.localSelect("$$edits", Path.key("*id").view(Ops.SIZE)).out("*version")

These two queries are atomic because of colocation, just as explained above. Fetching the latest contents of a document is simply a lookup by key, and fetching the latest version is simply the size of the list of edits.

The next line packages these two values into a single object:

Clojure:
(hash-map :doc *doc :version *version :> *ret)

Java:
.each((String doc, Integer version) -> {
  Map ret = new HashMap();
  ret.put("doc", doc);
  ret.put("version", version);
  return ret;
}, "*doc", "*version").out("*ret")

This just puts them into a hash map.

Finally, here’s the last line of the query topology:

Clojure:
(|origin)

Java:
.originPartition();

The “origin partitioner” relocates computation to the task where the query began execution. All query topologies must invoke the origin partitioner, and it must be the last partitioner invoked.

Let’s now take a look at how a client would interact with this module.

Interacting with the module

Here’s an example of how you would get clients for *edit-depot, $$edits, and the doc+version query topology, such as in your web server:

Clojure:
(def manager (open-cluster-manager {"conductor.host" "1.2.3.4"}))
(def edit-depot
  (foreign-depot
    manager
    "nlb.collaborative-document-editor/CollaborativeDocumentEditorModule"
    "*edit-depot"))
(def edits-pstate
  (foreign-pstate
    manager
    "nlb.collaborative-document-editor/CollaborativeDocumentEditorModule"
    "$$edits"))
(def doc+version
  (foreign-query
    manager
    "nlb.collaborative-document-editor/CollaborativeDocumentEditorModule"
    "doc+version"
    ))

Java:
Map config = new HashMap();
config.put("conductor.host", "1.2.3.4");
RamaClusterManager manager = RamaClusterManager.open(config);
Depot editDepot = manager.clusterDepot("nlb.CollaborativeDocumentEditorModule",
                                       "*edit-depot");
PState editsPState = manager.clusterPState("nlb.CollaborativeDocumentEditorModule",
                                           "$$edits");
QueryTopologyClient<Map> docPlusVersion = manager.clusterQuery("nlb.CollaborativeDocumentEditorModule",
                                                               "doc+version");

A “cluster manager” connects to a Rama cluster by specifying the location of its “Conductor” node. The “Conductor” node is the central node of a Rama cluster, which you can read more about here. From the cluster manager, you can retrieve clients to any depots, PStates, or query topologies for any module. The objects are identified by the module name and their name within the module.

Here’s an example of appending an edit to the depot:

Clojure:
(foreign-append! edit-depot (mk-add-edit 123 3 5 "to the "))

Java:
editDepot.append(makeAddEdit(123, 3, 5, "to the "));

This uses the previously defined helper functions to create an add edit for document ID 123 at version 3, adding “to the ” to offset 5.

Here’s an example of querying the $$edits PState for a range of edits:

Clojure:
(def edits (foreign-select-one [(keypath 123) (srange 3 8)] edits-pstate))

Java:
List<Map> edits = editsPState.selectOne(Path.key(123L).sublist(3, 8));

This queries for the list of edits from version 3 (inclusive) to 8 (exclusive) for document ID 123.

Finally, here’s an example of invoking the query topology to atomically fetch the contents and version for document ID 123:

Clojure:
(def ret (foreign-invoke-query doc+version 123))

Java:
Map ret = docPlusVersion.invoke(123L);

This looks no different than invoking any other function, but it’s actually executing remotely on a cluster. This returns a hash map containing the doc and version just as defined in the query topology.
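
On the web server, the initial page load could therefore look something like this hedged sketch (the handler shape and response map are just assumptions for illustration):

(defn initial-load-handler [doc-id]
  ;; Fetch the document contents and version atomically, then hand both to the browser.
  (let [{:keys [doc version]} (foreign-invoke-query doc+version doc-id)]
    {:status 200
     :body   {:doc doc :version version}}))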

Workflow with frontend

Let’s take a look at how a frontend implementation can interact with our completed backend implementation to be fully reactive. The approach is the same as detailed in this post. Architecturally, besides Rama you would have a web server interfacing between Rama and web browsers.

The application in the browser needs to know as soon as there’s an update to the document contents on the backend. This is easy to do in Rama with a reactive query like the following:

Clojure:
(foreign-proxy [(keypath 123) (view count)]
  edits-pstate
  {:callback-fn (fn [new-version diff previous-version]
                  ;; push to browser
                  )})

Java:
editsPState.proxy(
  Path.key(123L),
  (Integer newVersion, Diff diff, Integer oldVersion) -> {
    // push to browser
  });

“Proxy” is similar to the “select” calls already shown. It takes in a path specifying a value to navigate to. Unlike “select”, “proxy”:

  • Returns a ProxyState which is continuously and incrementally updated as the value in that PState changes. This has a method “get” on it that can be called at any time from any thread to get its current value.
  • Can register a callback function that’s invoked as soon as the value changes in the module.

The time to invoke the callback function after the PState changes is less than a millisecond, and it’s given three arguments: the new value (what would be selected if the path was run from scratch), a “diff” object, and the previous value from the last time the callback function was run. The diff isn’t needed in this case, but it contains fine-grained information about how the value changed. For example, if the path was navigating to a set object, the diff would contain the specific info of what elements were added and/or removed from the set. Reactive queries in Rama are very potent, and you can read more about them here.

For this use case, the browser just needs to know when the version changes so it can take the following actions:

  • Fetch all edits that it missed (sketched below)
  • Run the operational transformation algorithm against any pending edits it has buffered but hasn’t yet sent to the server
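
Here is a hedged sketch of the server-side piece of that workflow: the proxy callback fetches the edits the browser missed and pushes them down (push-to-browser! is a hypothetical function standing in for your websocket layer):

(foreign-proxy [(keypath 123) (view count)]
  edits-pstate
  {:callback-fn (fn [new-version _diff previous-version]
                  (let [missed (foreign-select-one
                                 [(keypath 123) (srange previous-version new-version)]
                                 edits-pstate)]
                    ;; push-to-browser! is hypothetical, e.g. a websocket send
                    (push-to-browser! 123 new-version missed)))})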

The browser contains four pieces of information:

  • The document contents
  • The latest version its contents are based on
  • Edits that have been sent to the server but haven’t been acknowledged yet
  • Edits that are pending

The basic workflow of the frontend is to:

  • Buffer changes locally
  • Send one change at a time to the server. When it’s applied, run operational transformation against all pending edits, update the version, and then send the next pending change to the server if there is one.

The only time the full document contents are ever transmitted between server and browser is on initial load, when the query topology is invoked to atomically retrieve the document contents and version. Otherwise, only incremental edit objects are sent back and forth.

Summary

There’s a lot to learn with Rama, but you can see from this example application how much you can accomplish with very little code. For an experienced Rama programmer, a project like this takes only a few hours to fully develop, test, and have ready for deployment. The Rama portion of this is trivial, with most of the work being implementing the operational transformation algorithm.

As mentioned earlier, there’s a Github project for the Clojure version and for the Java version containing all the code in this post. Those projects also have tests showing how to unit test modules in Rama’s “in-process cluster” environment.

You can get in touch with us at consult@redplanetlabs.com to schedule a free consultation to talk about your application and/or pair program on it. Rama is free for production clusters for up to two nodes and can be downloaded at this page.

Permalink

State of CIDER 2024 Survey Results

Back in 2019, I shared the first community survey results for CIDER, the beloved Clojure interactive development environment for Emacs. Five years later, we’re back to take the pulse of the ecosystem and see how things have evolved.

In this post, we’ll explore the results of the 2024 State of CIDER Survey, compare them to the 2019 edition, and reflect on the progress and the road ahead.


Who Are CIDER Users Today?

Experience With CIDER


In 2019, most users had between 1-5 years of experience with CIDER. Fast forward to 2024, and we now see a significant portion with 5+ years under their belts — a great sign of CIDER’s stability and staying power. Newcomers are still joining, but the user base has clearly matured.

I guess this also has to do with the fact that Clojure is not growing as much as before and fewer new users are joining the Clojure community. For the record:

  • 550 took part in CIDER’s survey in 2019
  • 330 took part in 2024

I’ve observed drops in the rate of participation for State of Clojure surveys as well.

Prior Emacs Usage

Emacs Experience

The majority of respondents were Emacs users before trying CIDER in both surveys. A fun new entry in 2024: “It’s complicated” — we see you, Vim-converts and editor-hoppers!


Tools, Setups, and Installs

Emacs & CIDER Versions

CIDER Versions

Users have generally moved in sync with CIDER and Emacs development. CIDER 1.16 is now the dominant version, and Emacs 29/30 are common. This reflects good community alignment with upstream tooling.

Probably, by the time I wrote this article, most people were already using CIDER 1.17 (and 1.18 is right around the corner). Those results embolden us to be slightly more aggressive with adopting newer Emacs features.

Installation Methods

Package.el remains the most popular method of installation, although straight.el has carved out a notable niche among the more config-savvy crowd. Nothing really shocking here.


Usage Patterns

Professional Use


Just like in 2019, around half of the respondents use CIDER professionally. The remaining half are hobbyists or open-source tinkerers — which is awesome.

Upgrade Habits

Upgrade Frequency

There’s a visible shift from “install once and forget” toward upgrading with major releases or as part of regular package updates. CIDER’s release cadence seems to be encouraging healthier upgrade practices.

Used Features

This was the biggest addition to the survey in 2024 and the most interesting to me. It confirmed my long-held suspicions that most people use only the most basic CIDER functionality. I can’t be sure why that is, but I have a couple of theories:

  • Information discovery issues
  • Some features are somewhat exotic and few people would benefit from them

I have plans to address both points with better docs, video tutorials and gradual removal of some features that add more complexity than value. CIDER 1.17 and 1.18 both make steps in this direction.


Community & Documentation

Documentation Satisfaction

Docs Satisfaction

Documentation continues to score well. Most users rate it 4 or 5 stars, though there’s always room for growth, especially in areas like onboarding and advanced features.

From my perspective the documentation can be improved a lot (e.g. it’s not very consistent, the structure is also suboptimal here and there), but that’s probably not a big issue for most people.

Learning Curve


The majority of users rate CIDER’s learning curve as moderate (3-4 out of 5), consistent with the complexity of Emacs itself. Emacs veterans may find it smoother, but newcomers still face a bit of a climb.

I keep advising people not to try to learn Emacs and Clojure at the same time!


Supporting CIDER

Support for CIDER

While more users and companies are aware of ways to support CIDER (like OpenCollective or GitHub Sponsors), actual support remains low. As always, a small donation or contribution can go a long way toward sustaining projects like this. As a matter of fact, donations for CIDER and friends have dropped a lot since 2022, which is quite disappointing given all the effort other contributors and I have put into the project.


Conclusion

CIDER in 2024 is a mature, well-loved tool with a seasoned user base. Most users are professionals or long-time hobbyists, and satisfaction remains high. If you’re reading this and you’re new to CIDER — welcome! And if you’re a long-timer, thank you for helping build something great.

Thanks to everyone who participated in the 2024 survey. As always, feedback and contributions are welcome — and here’s to many more years of productive, joyful hacking with Emacs and CIDER.


Keep hacking!

Permalink

Clojure Is Awesome!!! [PART 17]

Adapting the DTO Pattern to Functional Bliss

Welcome back to Clojure Is Awesome! In Part 17, we’re crossing the bridge from the object-oriented world to functional territory by adapting the DTO (Data Transfer Object) pattern—a staple in Java/Spring Boot projects. DTOs are all about moving data between layers or systems, and while they’re typically tied to mutable classes in OOP, we’ll reimagine them in Clojure with immutability, specs, and pure functions.

What is the DTO Pattern, and Why Does It Matter?

In Java/Spring Boot, a Data Transfer Object (DTO) is a simple class designed to carry data between layers (e.g., from a database to a REST API) or across system boundaries. It’s not about business logic—just a lightweight container for structured data. A typical Java DTO might look like this:

public class UserDTO {
    private String id;
    private String name;
    private boolean active;

    // Getters, setters, constructors...
    public UserDTO(String id, String name, boolean active) {
        this.id = id;
        this.name = name;
        this.active = active;
    }
}

Why DTOs exist:

  • Decoupling: They separate internal models (e.g., database entities) from external representations (e.g., API responses).
  • Efficiency: They expose only what’s needed, trimming unnecessary fields.
  • Safety: They enforce a clear contract for data exchange.

In Clojure, we don’t have classes or mutable state—so how do we adapt this? The answer lies in immutable maps, clojure.spec for validation, and pure functions for transformation. Let’s build a functional DTO that keeps the spirit alive!

Crafting a Functional DTO in Clojure

In Clojure, a DTO can be a plain map with a well-defined shape, validated by specs, and paired with transformation functions. We’ll create a UserDTO equivalent, along with utilities to convert between internal data and DTOs—mimicking a real-world scenario like a REST API response.

Here’s the implementation:

(ns clojure-is-awesome.dto
  (:require [clojure.spec.alpha :as s]
            [clojure.pprint :as pp]))

(s/def ::id string?)
(s/def ::name string?)
(s/def ::active boolean?)
(s/def ::user-dto (s/keys :req-un [::id ::name ::active]))

(s/def ::created-at inst?)
(s/def ::user-entity (s/keys :req-un [::id ::name ::active ::created-at]))

(defn entity->dto
  "Converts an internal user entity to a DTO.
   Args:
     entity - A map representing the internal user entity.
   Returns:
     A map conforming to ::user-dto, or throws an exception if invalid."
  [entity]
  (if (s/valid? ::user-entity entity)
    (select-keys entity [:id :name :active])
    (throw (ex-info "Invalid user entity" {:problems (s/explain-data ::user-entity entity)}))))

(defn dto->entity
  "Converts a DTO to an internal user entity, adding default fields.
   Args:
     dto - A map conforming to ::user-dto.
   Returns:
     A map conforming to ::user-entity, or throws an exception if invalid."
  [dto]
  (if (s/valid? ::user-dto dto)
    (assoc dto :created-at (java.util.Date.))
    (throw (ex-info "Invalid DTO" {:problems (s/explain-data ::user-dto dto)}))))

(defn valid-dto?
  "Checks if a map is a valid user DTO.
   Args:
     dto - A map to validate.
   Returns:
     Boolean indicating validity."
  [dto]
  (s/valid? ::user-dto dto))

Testing with Pretty-Printed Output

Let’s simulate a round-trip from entity to DTO and back, with pprint for clarity:

(def user-entity {:id "u123"
                  :name "Borba"
                  :active true
                  :created-at (java.util.Date.)})

(def user-dto (entity->dto user-entity))
(println "User DTO:")
(pp/pprint user-dto)

(println "Valid DTO?" (valid-dto? user-dto))

(def new-entity (dto->entity user-dto))
(println "New Entity:")
(pp/pprint new-entity)

(try
  (dto->entity {:id "u124" :name "Borba"})
  (catch Exception e
    (println "Error:")
    (pp/pprint (ex-data e))))

Practical Use Case: REST API Response

Imagine a web service returning user data. We’ll convert an internal entity to a DTO for the API response:

(defn fetch-user-dto
  "Simulates fetching a user and returning its DTO representation.
   Args:
     user-id - The ID of the user to fetch.
   Returns:
     A user DTO map."
  [user-id]
  (let [mock-db {:u123 {:id "u123"
                        :name "Borba"
                        :active true
                        :created-at (java.util.Date.)}}]
    (entity->dto (get mock-db user-id))))

(def api-response (fetch-user-dto :u123))
(println "API Response:")
(pp/pprint api-response)

(def incoming-dto {:id "u124" :name "Bob" :active false})
(def stored-entity (dto->entity incoming-dto))
(println "Stored Entity:")
(pp/pprint stored-entity)

Why This Matters in Clojure

Our functional DTO brings some serious advantages over its Java cousin:

  • Immutability: No setters, no surprises—data stays consistent.
  • Validation: Specs enforce structure upfront, catching errors early.
  • Simplicity: Maps and functions replace boilerplate classes and frameworks.
  • Flexibility: Transformation functions can evolve without breaking the contract.

Unlike Java DTOs, we don’t need Lombok or Jackson for serialization—Clojure’s data structures are naturally serializable (e.g., to JSON). It’s leaner, yet just as powerful.
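
For example, turning the DTO into JSON is a one-liner with any Clojure JSON library (Cheshire here is just an illustrative choice):

(require '[cheshire.core :as json])

(json/generate-string (entity->dto user-entity))
;; => a JSON string like {"id":"u123","name":"Borba","active":true}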

Permalink

Experience with Claude Code

I spent one week with Claude Code, vibe coding two apps in Clojure. Hours and $134.79 in. Let me tell you what I got out of that.

Claude Code is a new tool from Anthropic. You cd to the folder, run claude and a CLI app opens.

You tell it what to do and it starts working. Every time it runs a command, it lets you decide whether to do the task, or do something else. You can tell it to always be allowed to do certain tasks which is useful for things like moving in the directory, changing files, running tests.

Task 1: Rewrite app from Python to Clojure

We have a small standalone app here implemented in Python. It’s an app that serves as a hub for many other ETL jobs. Every job is run, the result is collected, packed as JSON, and sent to the API for further processing.

So I copied:

  • the app that should be converted from Python to Clj (basically job runner, data collector)
  • source code of the API that receives data
  • source code of workers that process received data

I explained exactly what I want to achieve and started iterating.

Also, I created a CLAUDE.md file which explained the purpose of the project, how to use Leiningen (yes, I still use Leiningen for new projects; I used Prismatic Schema in this project too). It also contained many additional instructions like adding schema checking, asserts, tests.

An excerpt from the CLAUDE.md file:

## Generating data

Double check correctness of files you generate. I found you inserting 0 instead of } in JSON and “jobs:” string to beginning of lines in yml files, making them invalid.

The app is processing many Git repositories and most of the time is spent waiting for jobs to finish.

One innovation Claude Code came up with itself was creating 4 working copies for each repo and running 8 repositories in parallel. This means the app was processing all work 32× faster. I had to double check whether it was really running my jobs because of how fast the resulting app was.

There were minor problems, like generating invalid JSON files for unit tests, or having problems processing outputs from commands and wrapping them in JSON.

I was about 6 hours in and 90% finished.

There were still some corner cases, where jobs got stuck. Or when jobs took much longer than they should. Or when some commits in repositories weren’t processed.

I instructed Claude Code to write more tests and testing scripts and to iterate to resolve this. I wanted to make sure this would be a real drop-in replacement for the existing Python app: it would take the same input config, make compatible API calls, and just be 30× faster.

This is where I ran into a problem. Claude Code worked for hours creating and improving scripts, adding more tests. For an app that’s maybe 1000 lines of code, I ended up with 2000 lines of tests and 2000 lines of various testing scripts.

At that time, I also ran into problems with bash. Claude Code generated code that required associative arrays.

I started digging, why does this app even need associative arrays in bash. To my big disappointment, I found that over 2 days, Claude Code copied more and more logic from Clojure to bash.

It got to the point where all logic was transported from Clojure to bash.

My app wasn’t running Clojure code anymore! Everything was done in bash. That’s why it all ended up with 2000 lines of shell code.

That was the first time in my life that I told an LLM I was disappointed in it. It quickly apologized and started moving back to Clojure.

Unfortunately, after a few more hours, I wasn’t able to really get all the tests passing and my test tasks running at the same time. I abandoned my effort for a while.

Task 2: CLI-based DAW in Clojure

When I was much younger, I used to make and DJ music. It all started with trackers in DOS. 

Most of the world was using FastTracker II, but my favourite one was Impulse Tracker.

Nowadays, I don’t have time to do as much music as in the past. But I got an idea: how hard might it be to build my own simple music-making software?

Criteria I gave to Claude Code:

  • Build it in Clojure
  • Do not rewrite it in another language (guess how I got to this criteria?)
  • Use Overtone as an underlying library
  • Use CLI interface
  • Do not use just 80×25 or any small subset. Use the whole allocated screen space.
  • The control should be based on Spacemacs. So for example loading a sample would be [SPC]-[S]-[L].

Spacemacs [SPC]- based approach is awesome. Our HR software Frankie uses a Spacemacs-like approach too (adding new candidates with [SPC]-[C]-[A]). So it seems to me, it’s only natural to extend this approach to other areas.

Imagine, a music making software that is inspired by Spacemacs!

I called it spaceplay and let Claude Code cook.

After a while, I had a simple interface, session mode, samples, preferences, etc.

So now, I got to the point where I wanted to add tests and get the sound working.

One option was to hook directly on the interface. Just like every [SPC] command sequence dispatches an event, I could just simulate those presses and test, if the interface reacts.

The issue is, when you have a DAW, you have many things happening at the same time. I didn’t want to test only things that are related to the user doing things.

And at the same time, I wanted my DAW to be a live set environment, where the producer can make music. It is an environment with many moving parts. So it made sense to me to make a testing API. I exposed the internal state via API and let tests manipulate it.

This was a bit of a problem for Claude.

On top of that, Claude Code is extremely focused on text input and output. There’s no easy way to explain how to attach to sound output and test it. Even Overtone’s own test folder (overtone/test/overtone) doesn’t spend much time on this.

I wanted to test Claude Code in vibe coding mode. I didn’t want to do a lot of code review, or fixing code for it, like I do with Cursor.

So we ended up in a cycle where I wanted it to get sound working and tested, and it kept failing at that task.

Conclusion

After 6000–7000 lines of code (most of it Clojure, some of it bash) generated, $134.79 spent, and 12 hours invested, I came to a conclusion.

Cursor is a much better workflow for me. AI without human oversight still isn’t there when it comes to delivering maintainable apps. All of those people who are vibe coding are either going to throw those apps away later or regret keeping them.

AI is absolutely perfect at laying out the first 90%. Unfortunately, it’s the second 90% that it takes to actually finish the thing.

The apps I have successfully finished with LLMs were constantly code reviewed by humans, architected by a human (me), deployed to production early, and had big parts implemented or refactored by hand.

I will be happy to try Claude Code again six months from now. Until then, I’m back to Cursor and Calva.

The post Experience with Claude Code appeared first on Flexiana.

Permalink

(Clojure Managing-User-Permissions)

Clojure practice task:

Managing User Permissions with Sets

You are developing a role-based access control system for a web application. Each user has a set of permissions (e.g., :read, :write, :delete). Your goal is to implement a permission management system using Clojure sets.

User: {:user-name :email :permissions}

The set of all available permissions in the system is #{:read :write :delete}.

Tasks

Create a function to assign a set of permissions to a user.

(require '[clojure.set :refer [subset? intersection difference]])

(defn valid-user-permission? [permission]
  (subset? permission #{:read :write :delete}))

(defn valid-user? [user]
  (and (some? (:user-name user))
       (some? (:email user))))

(defn assign-user-permissions [user permissions]
  (when (and (valid-user? user) (valid-user-permission? permissions))
    (assoc user :permissions permissions)))

Write a function to check if a user has a specific permission.

(defn has-user-permission? [user permission]
  (when (and (valid-user? user) (valid-user-permission? #{permission}))
    (contains? (:permissions user) permission)))

Write a function to revoke a specific permission from a user.

(defn revoke-user-permissions [user permission]
  (when (and (valid-user? user) (valid-user-permission? #{permission}))
    (let [current-p (:permissions user)
          remove-p  (disj current-p permission)]
      (assoc user :permissions remove-p))))

Find users who have at least one common permission from a given permission set.

(defn find-users-common-permissions [users permissions]
  (filter #(seq (intersection permissions (:permissions %))) users))

Find users who have all required permissions from a given permission set.

(defn find-users-all-permissions [users permissions]
  (filter #(subset? permissions (:permissions %)) users))

Compute the difference between two users permissions.

(defn diff-users-permissions [user-1 user-2]
  (difference (:permissions user-1) (:permissions user-2)))

Generate a report of all users and their permissions.
Example Alice -> #{:read :write}

(defn users-permissions-report [users]
  (map (fn [{:keys [user-name permissions]}]
         (if (some? permissions)
           (str user-name " -> " permissions)
           (str user-name)))
       users))
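
A quick REPL sketch tying these together (the example users are made up):

(def alice (assign-user-permissions {:user-name "Alice" :email "alice@example.com"}
                                    #{:read :write}))
(def bob   (assign-user-permissions {:user-name "Bob" :email "bob@example.com"}
                                    #{:read}))

(has-user-permission? alice :write)                       ;; => true
(revoke-user-permissions alice :write)                    ;; => Alice with :permissions #{:read}
(find-users-common-permissions [alice bob] #{:write})     ;; => only Alice
(find-users-all-permissions [alice bob] #{:read :write})  ;; => only Alice
(users-permissions-report [alice bob])                    ;; => ("Alice -> #{:read :write}" "Bob -> #{:read}")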

Permalink

Clojure macros continue to surprise me

Clojure macros have two modes: avoid them at all costs/do very basic stuff, or go absolutely crazy.

Here’s the problem: I’m working on Humble UI’s component library, and I wanted to document it. While at it, I figured it could serve as an integration test as well—since I showcase every possible option, why not test it at the same time?

This is what I came up with: I write component code, and in the application, I show a table with the running code on the left and the source on the right:

It was important that code that I show is exactly the same code that I run (otherwise it wouldn’t be a very good test). Like a quine: hey program! Show us your source code!

Simple with Clojure macros, right? Indeed:

(defmacro table [& examples]
  (list 'ui/grid {:cols 2}
    (for [[_ code] (partition 2 examples)]
      (list 'list
        code (pr-str code)))))

This macro accepts code as an AST and, for each example, emits a pair: the AST itself back (basically a no-op) and the string that the AST serializes to.

This is what I consider to be a “normal” macro usage. Nothing fancy, just another day at the office.

Unfortunately, this approach reformats code: while in the macro, all we have is an already parsed AST (data structures only, no whitespaces) and we have to pretty-print it from scratch, adding indents and newlines.

I tried a couple of existing formatters (clojure.pprint, zprint, cljfmt) but wasn’t happy with any of them. The problem is tricky—sometimes a vector is just a vector, but sometimes it’s a UI component and shows the structure of the UI.

And then I realized that I was thinking inside the box all the time. We already have the perfect formatting—it’s in the source file!

So what if... No, no, it’s too brittle. We shouldn’t even think about it... But what if...

What if our macro read the source file?

Like, actually went to the file system, opened a file, and read its content? We already have the file name conveniently stored in *file*, and luckily Clojure keeps sources around.

So this is what I ended up with:

;; assumes (:require [clojure.java.io :as io] [clojure.string :as str])
(defn slurp-source [file key]
  (let [content      (slurp (io/resource file))
        key-str      (pr-str key)
        idx          (str/index-of content key-str)
        content-tail (subs content (+ idx (count key-str)))
        reader       (clojure.lang.LineNumberingPushbackReader.
                       (java.io.StringReader.
                         content-tail))
        indent       (re-find #"\s+" content-tail)
        [_ form-str] (read+string reader)]
    (->> form-str
      str/split-lines
      (map #(if (str/starts-with? % indent)
              (subs % (count indent))
              %)))))

Go to a file. Find the string we are interested in. Read the first form after it as a string. Remove common indentation. Render. As a string.
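
For illustration, a hypothetical call (the file path and key are made up); it returns the lines of the form that follows the key, with the shared indentation stripped:

;; Reads the first form after :examples/button in docs/buttons.clj (a classpath resource).
(slurp-source "docs/buttons.clj" :examples/button)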

Voilà!

I know it’s bad. I know you shouldn’t do it. I know. I know.

But still. Clojure is the most fun I have ever had with any language. It lets you play with code like never before. Do the craziest, stupidest things. Read the source file of the code you are evaluating? Fetch code from the internet and splice it into the currently running program?

In any other language, this would’ve been a project. You’d need a parser, a build step... Here—just ten lines of code, in the vanilla language, no tooling or setup required.

Sometimes, a crazy thing is exactly what you need.

Permalink

Data analysis with Clojure - free workshop, May 10th - initial survey

Following the maturing of the Noj toolkit for Clojure data science, we are planning a free online workshop for people who are curious to learn the Clojure language for data analysis. Please share this page broadly with your friends and groups who may be curious to learn Clojure at this occasion. The SciNoj Light conference schedule is emerging these days, with a fantastic set of talks. We want a broader audience to feel comfortable joining, and thus we wish to run a prep workshop one week earlier.

Permalink

Talk: Clojure workflow with Sublime Text @ SciCloj

A deep overview of Clojure Sublimed, Socket REPL, Sublime Executor, custom color scheme, clj-reload and Clojure+.
We discuss many usability choices, implementation details, and broader observations and insights regarding Clojure editors and tooling in general.

Permalink

Can jank beat Clojure's error reporting?

Hey folks! I’ve spent the past quarter working on jank’s error messages. I’ve focused on reaching parity with Clojure’s error reporting and improving upon it where possible. This has been my first quarter spent working on jank full-time and I’ve been so excited to sit at my desk every morning and get hacking. Thank you to all of my sponsors and supporters! You help make this work possible.

Permalink

Clojure Deref (Mar 28, 2025)

Welcome to the Clojure Deref! This is a weekly link/news roundup for the Clojure ecosystem (feed: RSS). Thanks to Anton Fonarev for link aggregation.

Blogs, articles, and projects

Libraries and Tools

New releases and tools this week:

  • tools.build 0.10.8 - Clojure builds as Clojure programs

  • clojure-plus 1.3.1 - A collection of utilities that improve Clojure experience

  • ClojureDart - Clojure dialect for Flutter and Dart

  • desiderata 1.0.1 - Things wanted or needed but missing from clojure.core

  • google-analytics 0.1.0-SNAPSHOT - A ClojureScript library for collecting custom events to Google Analytics

  • fluent-clj 0.0.2 - Project Fluent for Clojure/script

  • ring 1.14.0 - Clojure HTTP server abstraction

  • nvd-clojure 5.0.0 - National Vulnerability Database dependency checker for Clojure projects

  • toddler 0.9.7 - UI library based on lilactown/helix and shadow-css libraries

  • compliment 0.7.0 - Clojure completion library that you deserve

  • uix 1.4.0 - Idiomatic ClojureScript interface to modern React.js

  • Inview - Enlive inspired lib for transformation of Hiccup data for use in Clojure and ClojureScript

  • calva 2.0.495 - Clojure & ClojureScript Interactive Programming for VS Code

  • cljs-josh 0.0.7 - Scittle cljs live-reloading server

  • clay 2-beta37 - A tiny Clojure tool for dynamic workflow of data visualization and literate programming

  • noj 2-beta13 - A clojure framework for data science

  • tableplot 1-beta12 - Easy layered graphics with Hanami & Tablecloth

  • rv 0.0.7 - A Clojure library exploring the application of pure reasoning algorithms

  • virgil 0.4.0 - Recompile Java code without restarting the REPL

Permalink

Using JS in ClojureScript Projects

The pull toward JavaScript has never been stronger. While ClojureScript remains an extremely expressive language, the JavaScript ecosystem continues to explode with tools like v0, Subframe & Paper generating entire UI trees and even full websites.

I found the feedback loop of these tools extremely quick and often use v0 to prototype specific components or interactions.

To benefit from these new tools and development experiences in an existing ClojureScript codebase you have two options:

  1. Rewrite all the code to CLJS
  2. Somehow use it as JS

In reality what I do is a bit of both. I mostly translate components to UIx but sometimes will use JavaScript utility files as is. This post is about that second part.

(I’ll probably write about the first part soon as well!)

The shadow-cljs JS import toolchain

shadow-cljs, the de facto frontend for the ClojureScript compiler, has built-in support for importing JavaScript .js files directly into your ClojureScript codebase.

Recently this was helpful when I wanted to add a custom d3-shape implementation to a codebase. I experimented in v0 until I had the desired result, leaving me with rounded_step.js:

// rounded_step.js
function RoundedStep(context, t, radius) {
  this._context = context;
  this._t = t;           // transition point (0 to 1)
  this._radius = radius; // corner radius
}

RoundedStep.prototype = {
  // ... implementation details full of mutable state
};

const roundedStep = function (context) {
  return new RoundedStep(context, 0.5, 5);
}

export { roundedStep };

Now this code would be kind of annoying (and not very valuable) to rewrite to ClojureScript. I tried briefly but eventually settled on just requiring the JS file directly:

(ns app.molecules.charts
  (:require
   [applied-science.js-interop :as j]
   [uix.core :as uix :refer [defui $]]
   ["/app/atoms/charts/rounded_step" :refer [roundedStep]]
   ["recharts" :as rc]))

Note the path /app/atoms/charts/rounded_step - shadow-cljs understands this refers to a JavaScript file in your source tree and will look for it on the classpath.

Assuming you have :paths ["src"], the file would be at src/app/atoms/charts/rounded_step.js.
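
For concreteness, a minimal deps.edn sketch of that assumption (only the classpath part is shown):

;; deps.edn (sketch): with "src" on the classpath, the require
;; "/app/atoms/charts/rounded_step" resolves to src/app/atoms/charts/rounded_step.js
{:paths ["src"]}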

When to use JavaScript directly

While I generally will still translate components to UIx (using these instructions) using plain JS can be nice in a few cases:

  1. Code relying on mutability - some library APIs may expect it and it’s usually a bit annoying and perhaps even error prone to articulate in CLJS
  2. Hard to translate syntax constructs - spreading operators, async/await, etc.
  3. Performance - If you want to drop down a level to squeeze out higher performance

Limitations

  1. To use JSX you’ll need to set up a preprocessor, something I didn’t want to get into. And for writing components UIx is nicer anyway.
  2. “Leaf nodes” only, meaning you can’t require CLJS from JS and things like that. (Fine for my use cases.)

Making it work

Generally when dealing with JS libraries, the following has been helpful for my workflow:

  1. Use js-interop libraries - applied-science/js-interop and cljs-bean make working with JavaScript objects more ergonomic
  2. Use literal objects - The j/lit macro makes passing complex configuration objects cleaner (see the sketch below)
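
Here’s a small sketch of what that looks like in practice (the namespace and config shape are made up):

(ns app.example
  (:require [applied-science.js-interop :as j]))

;; j/lit builds nested JS objects/arrays directly, without a clj->js conversion at runtime
(def chart-config
  (j/lit {:margin {:top 10 :left 20}
          :series [{:key "revenue" :color "#5b8ff9"}]}))

;; j/get reads a property from a JS object, with an optional default
(defn chart-width [^js config]
  (j/get config :width 800))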

The payoff

The real benefit? You get to use the best of both worlds:

  • ClojureScript's expressive syntax, immutable data structures and functional approach where and when you want it
  • Plug in JavaScript snippets when it makes sense
  • Less friction when adopting new JavaScript tools

Some folks will argue for pure ClojureScript solutions to everything. But in today’s landscape, embracing JavaScript interop is the pragmatic choice.

After all, sometimes the best code is the code you don't have to write.

Permalink

Next-level backends with Rama: storing and traversing graphs in 60 LOC

This is the first of a series of posts exploring programming with Rama, ranging from interactive consumer apps, high-scale analytics, background processing, recommendation engines, and much more. This tutorial is self-contained, but for broader information about Rama and how it reduces the cost of building backends so much (up to 100x for large-scale backends), see our website.

Like all Rama applications, the example in this post requires very little code. It’s easily scalable to millions of reads/writes per second, ACID compliant, high performance, and fault-tolerant from how Rama incrementally replicates all state. Deploying, updating, and scaling this application are all one-line CLI commands. No other infrastructure besides Rama is needed. Comprehensive monitoring on all aspects of runtime operation is built-in.

In this post, I’ll explore building the backend for storing a graph and implementing fast queries that traverse the graph. Code will be shown in both Clojure and Java, with the total code being about 60 lines for each implementation. You can download and play with the Clojure implementation in this repository or the Java implementation in this repository.

There are endless use cases that utilize graphs, each one with particularities on exactly how the graph should be represented. To keep the post focused, I’ll demonstrate graphs with Rama with the use case of storing and querying family trees. It should be easy to see how the techniques in this post can be tweaked for any graph use case.

Family trees are technically directed acyclic graphs (except when someone is their own grandfather) where every node has two parents and any number of children.

Indexed datastores in Rama, called PStates (“partitioned state”), are much more powerful and flexible than databases. Whereas databases have fixed data models, PStates can represent infinite data models due to being based on the composition of the simpler primitive of data structures. PStates are distributed, durable, high-performance, and incrementally replicated. Each PState is fine-tuned to what the application needs, and an application makes as many PStates as needed. In this case, we only need one PState to represent the graph. Here’s how the PState is declared in the application we’ll write:

Clojure:
(declare-pstate
  topology
  $$family-tree
  {UUID (fixed-keys-schema
          {:parent1 UUID
           :parent2 UUID
           :name String
           :children #{UUID}})})
Java:
topology.pstate(
  "$$family-tree",
  PState.mapSchema(UUID.class,
                   PState.fixedKeysSchema(
                     "parent1", UUID.class,
                     "parent2", UUID.class,
                     "name", String.class,
                     "children", PState.setSchema(UUID.class)
                     )));

This declares the PState as a map of maps, where the inner map has a pre-determined set of keys each with their own schema. People are identified by a UUID, and each person has fields for their two parents, name, and children. Children are represented as a set of IDs.
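
For a sense of the shape being described, here’s a hypothetical example of a single entry in this PState once materialized (all UUIDs are made up):

{#uuid "aa7d7c5c-0679-4b01-b40b-cb3c20065549"
 {:parent1  #uuid "0b54df25-8113-4a41-a8a1-b3c0a54a44a5"
  :parent2  #uuid "7c9e6679-7425-40de-944b-e07fc1f90ae7"
  :name     "Alice"
  :children #{#uuid "16fd2706-8baf-433b-82eb-8c7fada847da"}}}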

Note that we’re not using a generic graph schema for this use case, where every node would have any number of outgoing nodes and any number of incoming nodes. This is the way you would have to store your data if you were using a graph database (due to its fixed data model). By tuning our PState to exactly what’s needed by the application, we’re able to trivially enforce that each person has exactly two parents and specify a tight schema as to what’s allowed for the other fields. By representing the children as a set instead of a list, we’re also able to enforce that a child doesn’t appear twice for the same parent. A graph database allowing multiple edges between nodes would not enforce this.

The queries we’ll implement on family trees will be:

  • Who are all the ancestors of a person ID within N generations?
  • How many direct descendants does a person have in each successive generation?

Rama concepts

A Rama application is called a “module”. In a module you define all the storage and implement all the logic needed for your backend. All Rama modules are event sourced, so all data enters through a distributed log in the module called a “depot”. Most of the work in implementing a module is coding “ETL topologies” which consume data from one or more depots to materialize any number of PStates. Modules look like this at a conceptual level:

Modules can have any number of depots, topologies, and PStates, and clients interact with a module by appending new data to a depot or querying PStates. Although event sourcing traditionally means that processing is completely asynchronous to the client doing the append, with Rama that’s optional. By being an integrated system Rama clients can specify that their appends should only return after all downstream processing and PState updates have completed.

A module deployed to a Rama cluster runs across any number of worker processes across any number of nodes, and a module is scaled by adding more workers. A module is broken up into “tasks” like so:

A “task” is a partition of a module. The number of tasks for a module is specified on deploy. A task contains one partition of every depot and PState for the module as well as a thread and event queue for running events on that task. A running event has access to all depot and PState partitions on that task. Each worker process has a subset of all the tasks for the module.

Coding a topology involves reading and writing to PStates, running business logic, and switching between tasks as necessary.

Materializing the PState

Let’s start implementing the family tree application by defining the code to materialize the PState, and then we’ll implement the queries. The first step to coding the module is defining the depot:

Clojure:
(defmodule FamilyTreeModule
  [setup topologies]
  (declare-depot setup *people-depot (hash-by :id))
  )
Java:
public class FamilyTreeModule implements RamaModule {
  @Override
  public void define(Setup setup, Topologies topologies) {
    setup.declareDepot("*people-depot", Depot.hashBy("id"));
  }
}

This declares a Rama module called “FamilyTreeModule” with a depot called *people-depot which will receive all new person information. Objects appended to a depot can be any type. The second argument of declaring the depot is called the “depot partitioner” – more on that later.

To keep the example simple, the data appended to the depot will be defrecord objects for the Clojure version and HashMap objects for the Java version. To have a tighter schema on depot records you could instead use Thrift, Protocol Buffers, or a language-native tool for defining the types. Here’s the function that will be used to create depot data:

Clojure:
(defrecord Person [id parent1 parent2 name])
Java:
public static Map createPerson(UUID id, UUID parent1, UUID parent2, String name) {
  Map ret = new HashMap();
  ret.put("id", id);
  ret.put("parent1", parent1);
  ret.put("parent2", parent2);
  ret.put("name", name);
  return ret;
}

Next, let’s begin defining the topology to consume data from the depot and materialize the PState. Here’s the declaration of the topology with the PState:

Clojure:
(defmodule FamilyTreeModule
  [setup topologies]
  (declare-depot setup *people-depot (hash-by :id))
  (let [topology (stream-topology topologies "core")]
    (declare-pstate
      topology
      $$family-tree
      {UUID (fixed-keys-schema
              {:parent1 UUID
               :parent2 UUID
               :name String
               :children #{UUID}})})
    ))
Java:
public class FamilyTreeModule implements RamaModule {
  @Override
  public void define(Setup setup, Topologies topologies) {
    setup.declareDepot("*people-depot", Depot.hashBy("id"));
    StreamTopology topology = topologies.stream("core");
    topology.pstate(
      "$$family-tree",
      PState.mapSchema(UUID.class,
                       PState.fixedKeysSchema(
                         "parent1", UUID.class,
                         "parent2", UUID.class,
                         "name", String.class,
                         "children", PState.setSchema(UUID.class)
                         )));
  }
}

This defines a stream topology called “core”. Rama has two kinds of topologies, stream and microbatch, which have different properties. In short, streaming is best for interactive applications that need single-digit millisecond update latency, while microbatching has update latency of a few hundred milliseconds and is best for everything else. Streaming is used here under the assumption that a family tree application would want immediate feedback on a new person being added to the system.

Notice that the PState is defined as part of the topology. Unlike databases, PStates are not global mutable state. A PState is owned by a topology, and only the owning topology can write to it. Writing state in global variables is a horrible thing to do, and databases are just global variables by a different name.

Since a PState can only be written to by its owning topology, they’re much easier to reason about. Everything about them can be understood by just looking at the topology implementation, all of which exists in the same program and is deployed together. Additionally, the extra step of appending to a depot before processing the record to materialize the PState does not lower performance, as we’ve shown in benchmarks. Rama being an integrated system strips away much of the overhead which traditionally exists.

Let’s now add the code to materialize the PState:

Clojure:
(defmodule FamilyTreeModule
  [setup topologies]
  (declare-depot setup *people-depot (hash-by :id))
  (let [topology (stream-topology topologies "core")]
    (declare-pstate
      topology
      $$family-tree
      {UUID (fixed-keys-schema
              {:parent1 UUID
               :parent2 UUID
               :name String
               :children #{UUID}})})
   (<<sources topology
     (source> *people-depot :> {:keys [*id *parent1 *parent2] :as *person})
     (local-transform>
       [(keypath *id) (termval (dissoc *person :id))]
       $$family-tree)
     (ops/explode [*parent1 *parent2] :> *parent)
     (|hash *parent)
     (local-transform>
       [(keypath *parent) :children NONE-ELEM (termval *id)]
       $$family-tree)
     )))
Java:
public class FamilyTreeModule implements RamaModule {
  @Override
  public void define(Setup setup, Topologies topologies) {
    setup.declareDepot("*people-depot", Depot.hashBy("id"));
    StreamTopology topology = topologies.stream("core");
    topology.pstate(
      "$$family-tree",
      PState.mapSchema(UUID.class,
                       PState.fixedKeysSchema(
                         "parent1", UUID.class,
                         "parent2", UUID.class,
                         "name", String.class,
                         "children", PState.setSchema(UUID.class)
                         )));
    topology.source("*people-depot").out("*person")
            .each(Ops.GET, "*person", "id").out("*id")
            .each(Ops.GET, "*person", "parent1").out("*parent1")
            .each(Ops.GET, "*person", "parent2").out("*parent2")
            .each(Ops.GET, "*person", "name").out("*name")
            .localTransform(
              "$$family-tree",
              Path.key("*id")
                  .multiPath(Path.key("parent1").termVal("*parent1"),
                             Path.key("parent2").termVal("*parent2"),
                             Path.key("name").termVal("*name")))
            .each(Ops.TUPLE, "*parent1", "*parent2").out("*parents")
            .each(Ops.EXPLODE, "*parents").out("*parent")
            .hashPartition("*parent")
            .localTransform(
              "$$family-tree",
              Path.key("*parent", "children").voidSetElem().termVal("*id"));
  }
}

The code to implement the topology is only a few lines, but there’s a lot to unpack here. At a high level, this creates the new node for the person with its attributes filled in, and then it updates each parent to list the new person as a child.

The business logic is implemented with dataflow. Rama’s dataflow API is exceptionally expressive, able to intermix arbitrary business logic with loops, conditionals, and moving computation between tasks. This post is not going to explore all the details of dataflow as there’s simply too much to cover. Full tutorials for Rama dataflow can be found on our website for the Java API and for the Clojure API.

Let’s go over this topology implementation line by line, and I’ll explain the details of what’s happening as we go. The first step is subscribing to the depot:

Clojure:
(<<sources topology
  (source> *people-depot :> {:keys [*id *parent1 *parent2] :as *person})
Java:
topology.source("*people-depot").out("*person")

This subscribes the topology to the depot *people-depot and starts a reactive computation on it. Operations in dataflow do not return values. Instead, they emit values that are bound to new variables. In the Clojure API, the input and outputs to an operation are separated by the :> keyword. In the Java API, output variables are bound with the .out method.

Whenever data is appended to that depot, the data is emitted into the topology into the variable *person . All variables in Rama code begin with a * . The subsequent code runs for every single emit.

Remember that last argument to the depot declaration called the “depot partitioner”? That’s relevant here. Here’s that image of the physical layout of a module again:

The depot partitioner determines on which task the append happens and thereby on which task computation begins for subscribed topologies. In this case, the depot partitioner says to hash by the “id” field of the appended data. The target task is computed by taking the hash and modding it by the total number of tasks. This means data with the same ID always go to the same task, while different IDs are evenly spread across all tasks.
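
As a rough sketch of that computation (just the idea, not Rama’s actual internals; the function name is ours):

;; Pick a target task index from a partitioning key, given the module's task count.
(defn target-task [partitioning-key num-tasks]
  (mod (hash partitioning-key) num-tasks))

(target-task #uuid "aa7d7c5c-0679-4b01-b40b-cb3c20065549" 64)
;; => an index between 0 and 63; the same key always yields the same task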

Rama gives a ton of control over how computation and storage are partitioned, and in this case we’re partitioning by the hash of the ID since that’s how we want the PState to be partitioned. This allows us to easily locate the PState partition storing data for any particular ID.

The next bit of code updates the PState for the new person with all attributes filled in:

Clojure:
(local-transform>
  [(keypath *id) (termval (dissoc *person :id))]
  $$family-tree)
Java:
.each(Ops.GET, "*person", "id").out("*id")
.each(Ops.GET, "*person", "parent1").out("*parent1")
.each(Ops.GET, "*person", "parent2").out("*parent2")
.each(Ops.GET, "*person", "name").out("*name")
.localTransform(
  "$$family-tree",
  Path.key("*id")
      .multiPath(Path.key("parent1").termVal("*parent1"),
                 Path.key("parent2").termVal("*parent2"),
                 Path.key("name").termVal("*name")))

In the Clojure version, the ID for the person was destructured from the object into the variable *id on the source> line. In the Java version, the Ops.GET function is run on the *person map to fetch all the fields into variables of the same name.

The PState is updated with the “local transform” operation. The transform takes in as input the PState $$family-tree and a “path” specifying what to change about the PState. When a PState is referenced in dataflow code, it always references the partition of the PState that’s located on the task on which the event is currently running.

Paths are a deep topic, and the full documentation for them can be found here. A path is a sequence of “navigators” that specifies how to hop through a data structure to target values of interest. A path can target any number of values, and they’re used for both transforms and queries.

In the Clojure version, the path is very basic and navigates to exactly one value. It first navigates by the key in the variable *id which hops to the inner map for that ID. The “term val” navigator then says to replace whatever’s there with the provided value, which is just the person object from the depot with the “id” field removed.

In the Java version, the fields of the inner map are set by the path individually. It first navigates to the inner map by the key *id . Then multiPath does three separate navigations from that point for each of the fields, setting them to the values from the *person map.

The next part of the topology relocates computation to each parent in order to update their “children” field:

Clojure:
(ops/explode [*parent1 *parent2] :> *parent)
(|hash *parent)
Java:
.each(Ops.TUPLE, "*parent1", "*parent2").out("*parents")
.each(Ops.EXPLODE, "*parents").out("*parent")
.hashPartition("*parent")

First, the two parents are put into a single list. This is done in the Clojure version by just putting *parent1 and *parent2 into a vector, and in the Java version it’s done with the call to Ops.TUPLE .

Then the “explode” operation is called on that list, which emits one time for each element of the list. In this case it emits twice, once for each parent, and the subsequent code runs for each parent. Putting the two parents into one list and then exploding it is an easy way to share the code for updating the two parents.

What’s happening here is very different from “regular” programming, where you call a function which returns one value as the result. In dataflow programming, operations can emit any number of times. You can think of the output of an operation as being the input arguments to the rest of the topology. When an operation emits, it invokes the “rest of the topology” with those arguments (the “rest of the topology” is also known as the “continuation”). This is a powerful generalization of the concept of a function, which as you’re about to see enables very elegant code.

The final line in this code does a “hash partition” by the value of *parent . Partitioners relocate subsequent code to potentially a new task, and a hash partitioner works exactly like the aforementioned depot partitioner. The details of relocating computation, like serializing and deserializing any variables referenced after the partitioner, are handled automatically. The code is linear without any callback functions even though partitioners could be jumping around to different tasks on different nodes.

The way the code is structured also causes the children for the two parents to be updated in parallel, which lowers the latency of the topology.

The final part of the topology updates the “children” field for each parent:

Clojure:
(local-transform>
  [(keypath *parent) :children NONE-ELEM (termval *id)]
  $$family-tree)
Java:
.localTransform(
  "$$family-tree",
  Path.key("*parent", "children").voidSetElem().termVal("*id"));

This is just like the previous transform, except it adds one element to the children set instead of replacing the entire inner map. It navigates first to the inner map for the *parent key and then to the “children” field within that map. The next navigator, called NONE-ELEM in Clojure and voidSetElem in Java, navigates to the “void” element of the set. Setting that “void” element to a value causes that value to be added to that set.
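
A rough plain-Clojure analogy of what this transform does to a single entry (this is just the equivalent data manipulation, not the Rama API; family-tree, parent-id, and child-id are hypothetical locals):

;; Add child-id to the parent's :children set, creating the set if it's absent.
(update-in family-tree [parent-id :children] (fnil conj #{}) child-id)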

That completes everything involved in materializing that PState. In just 20 lines of code we’ve implemented the equivalent of a graph database, except tailored to match our use case exactly.

Rama’s dataflow API is as expressive as a full programming language with the additional power of making it easy to distribute computation. What you’ve seen in this section is just a small taste of what it can do.

Querying the PState directly

The module already supports many queries just through its PState, which can be queried directly. Here’s an example of how you would get a client to the PState, such as in your web server:

Clojure:
(def manager (open-cluster-manager {"conductor.host" "1.2.3.4"}))
(def family-tree-pstate (foreign-pstate manager "nlb.family-tree/FamilyTreeModule" "$$family-tree"))
Java:
Map config = new HashMap();
config.put("conductor.host", "1.2.3.4");
RamaClusterManager manager = RamaClusterManager.open(config);
PState familyTreePState = manager.clusterPState("nlb.FamilyTreeModule", "$$family-tree");

A “cluster manager” connects to a Rama cluster by specifying the location of its “Conductor” node. The “Conductor” node is the central node of a Rama cluster, which you can read more about here. From the cluster manager, you can retrieve clients to any depots or PStates for any module.

Let’s look at a few examples of querying the $$family-tree PState using this client. This query fetches the name for the person with the UUID “aa7d7c5c-0679-4b01-b40b-cb3c20065549”:

Clojure:
(def name (foreign-select-one [(keypath (UUID/fromString "aa7d7c5c-0679-4b01-b40b-cb3c20065549"))
                               :name]
                              family-tree-pstate))
Java:
String name = familyTreePState.selectOne(Path.key(UUID.fromString("aa7d7c5c-0679-4b01-b40b-cb3c20065549"),
                                                  "name"));

As mentioned before, querying uses the exact same path API as used for transforms. This path just navigates to the “name” field for that person in the PState.

Here’s a query to fetch the number of children for the person:

Clojure:
(def num-children (foreign-select-one [(keypath (UUID/fromString "aa7d7c5c-0679-4b01-b40b-cb3c20065549"))
                                       :children
                                       (view count)]
                                      family-tree-pstate))
Java:
int numChildren = familyTreePState.selectOne(Path.key(UUID.fromString("aa7d7c5c-0679-4b01-b40b-cb3c20065549"),
                                                      "children")
                                                 .view(Ops.SIZE));

This path navigates to the “children” field for that person and then counts it using a count function. Paths execute completely server-side, so the only information sent from the client to the task is the path, and the only information sent back is the count. Any function can be provided to the “view” navigator, giving a ton of power and flexibility to PState queries. These functions are just regular Java or Clojure functions and don’t need any special registration.
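
As a hedged sketch of that flexibility, any plain Clojure function can serve as the view (the function and some-id below are ours, not from the module):

;; Sort the child IDs as strings, computed entirely server-side.
(defn sorted-child-strs [children]
  (sort (map str children)))

(def sorted-children
  (foreign-select-one [(keypath some-id) :children (view sorted-child-strs)]
                      family-tree-pstate))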

You could do traversal queries through PState clients, but that would require many roundtrips to navigate from parent to children or vice-versa. More importantly, the ability to parallelize those traversals client-side is limited. Instead, the next section will show how to easily implement traversal queries as on-demand distributed computations with a mechanism called “query topologies”.

Implementing the first graph traversal query

Here again are the traversal queries we wish to implement:

  • Who are all the ancestors of a person within N generations?
  • How many direct descendants does a person have in each successive generation?

These will be implemented with predefined queries in the module called “query topologies”. Query topologies are programmed with the exact same dataflow API as used before to implement the ETL topology.

Here’s the module with the query topology added to get all ancestors of a person within N generations:

Clojure:
(defmodule FamilyTreeModule
  [setup topologies]
  (declare-depot setup *people-depot (hash-by :id))
  (let [topology (stream-topology topologies "core")]
    (declare-pstate
      topology
      $$family-tree
      {UUID (fixed-keys-schema
              {:parent1 UUID
               :parent2 UUID
               :name String
               :children #{UUID}})})
   (<<sources topology
     (source> *people-depot :> {:keys [*id *parent1 *parent2] :as *person})
     (local-transform>
       [(keypath *id) (termval (dissoc *person :id))]
       $$family-tree)
     (ops/explode [*parent1 *parent2] :> *parent)
     (|hash *parent)
     (local-transform>
       [(keypath *parent) :children NONE-ELEM (termval *id)]
       $$family-tree)
     ))
  (<<query-topology topologies "ancestors"
    [*start-id *num-generations :> *ancestors]
    (loop<- [*id *start-id
             *generation 0
             :> *ancestor]
      (filter> (<= *generation *num-generations))
      (|hash *id)
      (local-select> [(keypath *id) (multi-path :parent1 :parent2) some?]
        $$family-tree
        :> *parent)
      (:> *parent)
      (continue> *parent (inc *generation)))
    (|origin)
    (aggs/+set-agg *ancestor :> *ancestors))
  )
Java:
public class FamilyTreeModule implements RamaModule {
  @Override
  public void define(Setup setup, Topologies topologies) {
    setup.declareDepot("*people-depot", Depot.hashBy("id"));
    StreamTopology topology = topologies.stream("core");
    topology.pstate(
      "$$family-tree",
      PState.mapSchema(UUID.class,
                       PState.fixedKeysSchema(
                         "parent1", UUID.class,
                         "parent2", UUID.class,
                         "name", String.class,
                         "children", PState.setSchema(UUID.class)
                         )));
    topology.source("*people-depot").out("*person")
            .each(Ops.GET, "*person", "id").out("*id")
            .each(Ops.GET, "*person", "parent1").out("*parent1")
            .each(Ops.GET, "*person", "parent2").out("*parent2")
            .each(Ops.GET, "*person", "name").out("*name")
            .localTransform(
              "$$family-tree",
              Path.key("*id")
                  .multiPath(Path.key("parent1").termVal("*parent1"),
                             Path.key("parent2").termVal("*parent2"),
                             Path.key("name").termVal("*name")))
            .each(Ops.TUPLE, "*parent1", "*parent2").out("*parents")
            .each(Ops.EXPLODE, "*parents").out("*parent")
            .hashPartition("*parent")
            .localTransform(
              "$$family-tree",
              Path.key("*parent", "children").voidSetElem().termVal("*id"));

    topologies.query("ancestors", "*start-id", "*num-generations").out("*ancestors")
              .loopWithVars(LoopVars.var("*id", "*start-id")
                                    .var("*generation", 0),
                Block.keepTrue(new Expr(Ops.LESS_THAN_OR_EQUAL, "*generation", "*num-generations"))
                     .hashPartition("*id")
                     .localSelect("$$family-tree",
                                  Path.key("*id")
                                      .multiPath(Path.key("parent1"),
                                                 Path.key("parent2"))
                                      .filterPred(Ops.IS_NOT_NULL)).out("*parent")
                     .emitLoop("*parent")
                     .continueLoop("*parent", new Expr(Ops.INC, "*generation"))).out("*ancestor")
              .originPartition()
              .agg(Agg.set("*ancestor")).out("*ancestors");
  }
}

The implementation iteratively looks at the parents of a node keeping a counter of how many generations back it has traversed so far. All ancestors are aggregated into a set for the return value. Let’s go through it line by line, starting with the declaration of the topology:

Clojure:
(<<query-topology topologies "ancestors"
  [*start-id *num-generations :> *ancestors]
Java:
topologies.query("ancestors", "*start-id", "*num-generations").out("*ancestors")

This declares a query topology named “ancestors” that takes in input arguments *start-id and *num-generations . It declares the return variable *ancestors , which will be bound by the end of the topology execution.

The next line starts traversal of the family tree graph by initiating a loop:

Clojure:
(loop<- [*id *start-id
         *generation 0
         :> *ancestor]
Java:
.loopWithVars(LoopVars.var("*id", "*start-id")
                      .var("*generation", 0),

Loops in dataflow first declare “loop variables” which are in scope for the body of the loop and are set to new values each time the loop is recurred. Here the two variables *id and *generation are declared, and they are initialized to *start-id and 0 for the first iteration of the loop. A loop can emit any number of times (with :> in Clojure and .emitLoop in Java), and each emit runs the code after the loop with the emitted values. The Clojure version binds the emitted value from this loop as *ancestor in the binding vector, while that’s specified after the loop body in the Java version.

The next line of code checks whether the query has traversed too far:

Clojure:
(filter> (<= *generation *num-generations))
Java:
Block.keepTrue(new Expr(Ops.LESS_THAN_OR_EQUAL, "*generation", "*num-generations"))

This operation, filter> in Clojure and .keepTrue in Java, emits exactly one time if its input is true and otherwise doesn’t emit. Just like how the “explode” call differs from regular functions by emitting many times, this differs from regular functions by potentially not emitting at all. No values are captured for the output since this operation doesn’t emit any values when it emits, but you can still think of its emit as invoking the “rest of the topology”.

The *generation variable keeps track of how many parents have been traversed, and this code stops execution of the loop iteration if *generation is greater than the *num-generations parameter from the invoke of the query.

The next lines fetch the two parents of the node:

Clojure:
(|hash *id)
(local-select> [(keypath *id) (multi-path :parent1 :parent2) some?]
  $$family-tree
  :> *parent)
Java:
.hashPartition("*id")
.localSelect("$$family-tree",
             Path.key("*id")
                 .multiPath(Path.key("parent1"),
                            Path.key("parent2"))
                 .filterPred(Ops.IS_NOT_NULL)).out("*parent")

The hash partition, just like in the ETL topology, moves the computation to the task containing information for that ID in the $$family-tree PState. The local select call selects both parents and emits them. The “multi path” navigator causes this local select to emit twice, once for each parent. The filter at the end of the path causes it not to emit parents that are null, which happens for nodes that have no parents specified.

The next line emits the found ancestor from the loop:

Clojure:
(:> *parent)
Java:
.emitLoop("*parent")

As mentioned before, this runs the code following the loop with that value bound to the variable *ancestor .

The next line invokes another iteration of the loop with that node:

Clojure:
(continue> *parent (inc *generation))
Java:
.continueLoop("*parent", new Expr(Ops.INC, "*generation"))

This runs the loop from the start, setting the value of *id to *parent and *generation to one more than it was before.

Something very different from loops in languages like Java or Clojure is happening here. The loop is being continued multiple times in one iteration, once for each parent. Along with the hash partitioner, this is causing the loop to recur an ever increasing number of times in parallel across the cluster until iterations reach the generation limit and filter themselves out. This is a very elegant way to express a parallel traversal.

The next line is declared after the loop and runs for each emit from the loop:

Clojure:
(|origin)
Java:
.originPartition()

The “origin partitioner” relocates computation to the task where the query began execution. It gets all the emitted ancestors onto the same task so they can be aggregated and returned.

The last line of the query topology aggregates the ancestors together:

Clojure:
(aggs/+set-agg *ancestor :> *ancestors)
Java:
.agg(Agg.set("*ancestor")).out("*ancestors");

Up until here, the query topology has been just like the ETL topology: every line processed emits from the preceding line and itself emits some number of times. Aggregators in dataflow are different in that they collect all emits that happened in the execution of the query topology and combine them into a single value.

Query topologies are “batch blocks”, which have expanded dataflow capabilities including aggregation. Batch blocks can also do inner joins, outer joins, subqueries, and everything else possible with relational languages. You can find the full documentation for batch blocks here, with Clojure-specific documentation here.

For this query, you may be wondering about the case of the same person being reached through multiple ancestry paths. The code as written will traverse that person multiple times, emitting them and all their ancestors every single time they’re reached. This doesn’t change the results since the set aggregation will eliminate duplicates, but it can be very wasteful. Fortunately, this can be optimized easily by just adding three more lines to the query topology:

Clojure:
(<<query-topology topologies "ancestors"
  [*start-id *num-generations :> *ancestors]
  (loop<- [*id *start-id
           *generation 0
           :> *ancestor]
    (filter> (<= *generation *num-generations))
    (|hash *id)
    (local-select> (view contains? *id) $$ancestors$$ :> *traversed?)
    (filter> (not *traversed?))
    (local-transform> [NONE-ELEM (termval *id)] $$ancestors$$)
    (local-select> [(keypath *id) (multi-path :parent1 :parent2) some?]
      $$family-tree
      :> *parent)
    (:> *parent)
    (continue> *parent (inc *generation)))
  (|origin)
  (aggs/+set-agg *ancestor :> *ancestors))
)
Java:
topologies.query("ancestors", "*start-id", "*num-generations").out("*ancestors")
          .loopWithVars(LoopVars.var("*id", "*start-id")
                                .var("*generation", 0),
            Block.keepTrue(new Expr(Ops.LESS_THAN_OR_EQUAL, "*generation", "*num-generations"))
                 .hashPartition("*id")
                 .localSelect("$$ancestors$$", Path.view(Ops.CONTAINS, "*id")).out("*traversed?")
                 .keepTrue(new Expr(Ops.NOT, "*traversed?"))
                 .localTransform("$$ancestors$$", Path.voidSetElem().termVal("*id"))
                 .localSelect("$$family-tree",
                              Path.key("*id")
                                  .multiPath(Path.key("parent1"),
                                             Path.key("parent2"))
                                  .filterPred(Ops.IS_NOT_NULL)).out("*parent")
                 .emitLoop("*parent")
                 .continueLoop("*parent", new Expr(Ops.INC, "*generation"))).out("*ancestor")
          .originPartition()
          .agg(Agg.set("*ancestor")).out("*ancestors");

The three lines added were:

Clojure:
(local-select> (view contains? *id) $$ancestors$$ :> *traversed?)
(filter> (not *traversed?))
(local-transform> [NONE-ELEM (termval *id)] $$ancestors$$)
Java:
.localSelect("$$ancestors$$", Path.view(Ops.CONTAINS, "*id")).out("*traversed?")
.keepTrue(new Expr(Ops.NOT, "*traversed?"))
.localTransform("$$ancestors$$", Path.voidSetElem().termVal("*id"))

Every query topology invocation has a temporary, in-memory PState it can use, named after the query topology surrounded by $$ . In this case, that PState is called $$ancestors$$ . This code uses that temporary PState as a per-task set to record which nodes have already been traversed and to skip traversal when a node has been seen before. Using the temporary PState like this is common in graph queries.

Let’s take a look at how to invoke this query topology from outside a module, like from a web server. First, you retrieve a client for the query topology just like how we retrieved a PState client earlier:

Clojure:
(def manager (open-cluster-manager {"conductor.host" "1.2.3.4"}))
(def ancestors-query (foreign-query manager "nlb.family-tree/FamilyTreeModule" "ancestors"))
Java:
Map config = new HashMap();
config.put("conductor.host", "1.2.3.4");
RamaClusterManager manager = RamaClusterManager.open(config);
QueryTopologyClient<Set> ancestorsQuery = manager.clusterQuery("nlb.FamilyTreeModule", "ancestors");

Then, the query topology can be invoked just like any other function. It takes in input and returns the result:

Clojure:
(def ancestors (foreign-invoke-query ancestors-query (UUID/fromString "aa7d7c5c-0679-4b01-b40b-cb3c20065549")))
Java:
Set ancestors = ancestorsQuery.invoke(UUID.fromString("aa7d7c5c-0679-4b01-b40b-cb3c20065549"));

It looks like any other function call, but it’s actually executing as a distributed query across the Rama cluster where the module is deployed.

Implementing the second graph traversal query

The second traversal query is to compute the number of direct descendants a person has per successive generation. It returns a map from generation number to the number of descendants in that generation, i.e. 3 children, 7 grand-children, 22 great-grand-children, etc.
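
For that example, the returned value would look something like this (in the implementation below, the immediate children end up keyed under generation 0):

{0 3, 1 7, 2 22}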

The implementation is similar to the last one:

Clojure:
(<<query-topology topologies "descendants-count"
  [*start-id *num-generations :> *result]
  (loop<- [*id *start-id
           *generation 0 :> *gen *count]
    (filter> (< *generation *num-generations))
    (|hash *id)
    (local-select> [(keypath *id) :children] $$family-tree :> *children)
    (:> *generation (count *children))
    (ops/explode *children :> *c)
    (continue> *c (inc *generation)))
  (|origin)
  (+compound {*gen (aggs/+sum *count)} :> *result))
Java:
topologies.query("descendants-count", "*start-id", "*num-generations").out("*result")
          .loopWithVars(LoopVars.var("*id", "*start-id")
                                .var("*generation", 0),
            Block.keepTrue(new Expr(Ops.LESS_THAN, "*generation", "*num-generations"))
                 .hashPartition("*id")
                 .localSelect("$$family-tree", Path.key("*id", "children")).out("*children")
                 .emitLoop("*generation", new Expr(Ops.SIZE, "*children"))
                 .each(Ops.EXPLODE, "*children").out("*c")
                 .continueLoop("*c", new Expr(Ops.INC, "*generation"))).out("*gen", "*count")
          .originPartition()
          .compoundAgg(CompoundAgg.map("*gen", Agg.sum("*count"))).out("*result");

The query topology takes in the person ID and the number of generations to count, and it returns the map from generation number to count. Since the implementation is similar to the last query topology, I’ll just point out the differences.

The body of the loop fetches the number of children for a node and continues traversal for each child:

Clojure:
(local-select> [(keypath *id) :children] $$family-tree :> *children)
(:> *generation (count *children))
(ops/explode *children :> *c)
(continue> *c (inc *generation)))
Java:
.localSelect("$$family-tree", Path.key("*id", "children")).out("*children")
.emitLoop("*generation", new Expr(Ops.SIZE, "*children"))
.each(Ops.EXPLODE, "*children").out("*c")
.continueLoop("*c", new Expr(Ops.INC, "*generation"))

The emit callsite emits two values, the generation number of that person and how many children that person has. Those counts will later be summed together to get the total number of descendants for that generation. You can see that the loop binds two variables for its output which corresponds to this callsite emitting two values per emit.

The way those counts are aggregated uses a slightly different mechanism than the last query topology:

Clojure:
(+compound {*gen (aggs/+sum *count)} :> *result)
Java:
.compoundAgg(CompoundAgg.map("*gen", Agg.sum("*count"))).out("*result");

This uses “compound aggregation”, which allows one or more aggregations to be composed into a data structure. This compound aggregation produces a map keyed by *gen with the values being the sum of the counts for that generation. This produces exactly what’s desired for the result of the query topology. You can read more about aggregation and compound aggregation on this page.

Summary

There’s a lot to learn with Rama, but you can see from this example application how much you can accomplish with very little code. Building the equivalent of a graph database with tailored queries to a particular use case is no small feat, but with Rama it only took 60 lines of code. There’s no additional work needed for deployment, updating, and scaling since that’s all built-in to Rama. For an experienced Rama programmer, a project like this takes only a few hours to fully develop, test, and have ready for deployment.

Rama being an event sourced system instills some extremely useful properties in applications that you don’t get without event sourcing. Depots provide an audit log of every change that’s ever happened to the application, making it possible to go back and answer questions about the application’s history. They also enable PStates to be recomputed in the future, which could save the company if a bad bug was deployed that corrupted vast portions of the PState. The fault tolerance you get from event sourcing is night and day compared to the alternative.

As mentioned earlier, there’s a Github project for the Clojure version and for the Java version containing all the code in this post. Those projects also have tests showing how to unit test modules in Rama’s “in-process cluster” environment.

You can get in touch with us at consult@redplanetlabs.com to schedule a free consultation to talk about your application and/or pair program on it. Rama is free for production clusters for up to two nodes and can be downloaded at this page.


Solving Datomic BI Integration: Why We Built Plenish

When Eleven approached us a few years ago to help build the Business Intelligence (BI) component of their product, they faced a significant challenge: their data resided in Datomic, a powerful immutable database, while their BI tool of choice, Metabase, did not support Datalog. Eleven, an accounting platform built with Clojure and Datomic, needed an efficient way to bridge the gap between these technologies.

Over the years, we have engineered a robust and maintainable solution, allowing seamless integration between Datomic and Metabase. This journey led to the development of Plenish, a Change Data Capture (CDC) solution tailored for Datomic.

The Requirements

To integrate Metabase as the BI interface within Eleven’s product, we needed to develop a glue layer that would:

  1. Make data available to Metabase.
  2. Provide suitable access control, ensuring that Metabase’s access control aligns with Eleven’s user data boundaries.
  3. Be easy to maintain.
  4. Be performant enough for BI queries.
  5. Be easy to deploy.

The Challenges of Using Metabase with Datomic

Metabase is a powerful dashboarding tool with a beautiful UI and easy connectivity to many relational databases. However, Eleven’s data resides in Datomic, and Metabase does not support Datalog.

Attempt 1: Custom Metabase Driver


We first developed a custom Metabase driver to communicate with Datomic. This worked reasonably well but was costly to maintain, mainly because Metabase has very little regard for maintaining compatibility with third-party driver vendors.

Attempt 2: Datomic Analytics via Presto


In 2019 the Datomic team came out with a Datomic driver for Presto, which is bundled with Datomic Pro and thus allows SQL access to Datomic’s triple-store data. Given that we could rely on the Datomic team to maintain this, it seemed like a logical choice to adopt it. However, it has not been a great experience.

  • Outdated Presto Driver: Presto has since forked into PrestoDB and PrestoSQL, with PrestoSQL renamed to Trino, which has gained the most traction. However, a Trino-compatible Datomic analytics driver has yet to emerge, blocking us from upgrading Metabase, which has dropped support for Presto in favor of Trino.
  • Operational Overhead: Datomic-analytics connects via a peer server. Running a peer server alongside Presto introduced complexity, requiring additional maintenance and frequent restarts to recognize new databases.
  • Performance Bottlenecks: Queries on large datasets suffered from severe slowdowns, leaving us with few optimization options beyond increasing CPU and RAM.
  • High Resource Consumption: The combined resource usage of the peer server, Presto, and Metabase exceeded that of the entire application stack.
  • Limited Access Control: We would like to integrate with other data sources at the SQL layer, which Presto is in theory a good fit for, but the Metabase driver for Presto only allows accessing a single “catalog” at a time, at least when using the Metabase query builder. Presto, on the other hand, has access to every Datomic database that the peer server can reach, with no finer-grained access control. That means we can’t give people raw SQL access and have to rely on the somewhat thin assurances of Metabase’s query builder to prevent unauthorized data access.

Plenish: A Solution for Datomic


With these hard-won lessons in mind, we adopted a third approach: mapping Datomic data to relational tables in a way similar to Datomic Analytics. However, instead of having "virtual tables" backed by Datomic, we actually copy the data into a relational database. Our database of choice has been PostgreSQL, because it’s a fantastic all-around relational database that is well supported and well understood. With this goal in mind, we started a new Lambda Island project called Plenish.

Plenish synchronizes Datomic to an RDBMS in real time by reading the Datomic transaction log and converting each event into corresponding SQL commands; a minimal sketch of this idea follows the list below. With this simple but challenging idea, Plenish elegantly satisfies all five requirements:

  • Seamless Metabase Integration: Since data is copied to PostgreSQL, Metabase can query it effortlessly.
  • Minimal Maintenance: Plenish relies on Datomic’s stable transaction log API, ensuring long-term reliability.
  • Optimized Performance: PostgreSQL is performant enough for OLAP workloads, eliminating the query speed issues seen with Presto.
  • Natural Access Control: Each Datomic database is synchronized to a corresponding PostgreSQL database, aligning naturally with Metabase’s user access model.
  • Simple Deployment: Plenish functions as a library, enabling straightforward integration.
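For a sense of what the transaction-log approach looks like in code, here is a minimal sketch of the idea, not Plenish’s actual API. It assumes the Datomic peer library and next.jdbc, and the target table and column names are made up for illustration:

;; Minimal sketch of transaction-log-driven CDC, not Plenish's actual API.
(require '[datomic.api :as d]
         '[next.jdbc :as jdbc])

(defn datom->sql
  "Turn one datom into a SQL statement vector. A real implementation maps each
  attribute onto its own table/column based on the Datomic schema."
  [db datom]
  (let [attr (str (d/ident db (:a datom)))
        e    (:e datom)
        v    (str (:v datom))]
    (if (:added datom)
      ["INSERT INTO datoms (e, a, v) VALUES (?, ?, ?)" e attr v]
      ["DELETE FROM datoms WHERE e = ? AND a = ? AND v = ?" e attr v])))

(defn sync-new-transactions!
  "Replay every Datomic transaction after basis-t into the relational target,
  one SQL transaction per Datomic transaction."
  [datomic-conn pg-datasource basis-t]
  (let [db (d/db datomic-conn)]
    (doseq [{:keys [data]} (d/tx-range (d/log datomic-conn) (inc basis-t) nil)]
      (jdbc/with-transaction [tx pg-datasource]
        (doseq [datom data]
          (jdbc/execute! tx (datom->sql db datom)))))))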

Future Directions

We have recently enhanced Plenish with two major improvements:

  1. Support for DuckDB – an embedded analytical database, reducing initial setup complexity while improving query performance.
  2. Extension Points for Contributors – making it easier to integrate new data warehouse adapters within the open-source community.

As we continue refining Plenish, we recognize that similar CDC solutions exist in the RDBMS world—such as Debezium, Airbyte, and Qlik Replicate—but the Datomic ecosystem still lacks a mature, widely adopted alternative. We welcome collaboration from engineers interested in evolving this space.

If you’re working with Datomic and facing BI integration challenges, we’d love to hear from you. Let’s build the future of Datomic analytics together.

