bisql (Clojure Data Access Library) released v0.4.0: Support Malli Validation

Bisql

I'm building Bisql, a data access library for 2-way SQL in Clojure.

https://github.com/hatappo/bisql

Since it's “2-way,” I called it bisql with the bi- prefix.

It is pronounced like bicycle.

A common weakness of SQL-template libraries, not just 2-way SQL libraries, is that writing all SQL as templates can become tedious.

To address that, many SQL-template libraries support a query builder.

You can write some queries as templates and others with a builder.

But I think that approach is fundamentally inconsistent.

If every query function is maintained as SQL or SQL templates, the cost of reviewing and understanding the data access layer drops significantly. Every database access has a concrete representation as an actual SQL file.

Once a query builder gets mixed in, that consistency is gone.

And in practice, what starts as “we’ll only use the builder for simple queries” often drifts into using it for complex queries as well, until the generated SQL is no longer obviously what you intended.

Bisql takes a different approach:

every database access should be written as SQL.

That said, hand-writing even simple CRUD operations for every table is tedious. So Bisql takes a different tack there: it automatically generates a comprehensive set of typical CRUD queries.

It connects to a real database, inspects the schema, considers indexes, and generates SQL templates for many index-friendly query patterns.

Then the defquery macro converts all of those .sql template files into Clojure functions at once.

So you keep the consistency of SQL-first development without the repetitive CRUD work.

Malli Support

In this release, generated SQL templates can now include :malli/in and :malli/out metadata declarations.

These hold schemas for:

  • the parameters passed to a query function
  • the response data returned from the query

Bisql SQL templates can already carry arbitrary metadata that becomes metadata on the generated query functions. Malli support builds on that.

If a query function has :malli/in and :malli/out metadata, Bisql can automatically run Malli validation during query execution. This behavior is configurable.
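To sketch what such validation could look like (this is an illustration only, not Bisql's actual implementation; wrap-validation and the example query function below are hypothetical), one could check the :malli/in metadata with malli.core before running the query:

```clojure
;; Illustration only: wrap-validation and get-by-id are hypothetical,
;; not part of Bisql's API. malli.core is the real Malli namespace.
(require '[malli.core :as m])

(defn wrap-validation
  "Wrap a query function so its :malli/in metadata is checked
   against the parameters before the query runs."
  [query-fn]
  (let [in-schema (:malli/in (meta query-fn))]
    (fn [params]
      (when (and in-schema (not (m/validate in-schema params)))
        (throw (ex-info "Invalid query parameters"
                        {:explain (m/explain in-schema params)})))
      (query-fn params))))

;; A stand-in for a generated query function.
(def get-by-id
  (with-meta
    (fn [{:keys [id]}] {:id id :email "alice@example.com"})
    {:malli/in [:map {:closed true} [:id int?]]}))

((wrap-validation get-by-id) {:id 1})
; {:id 1, :email "alice@example.com"}
((wrap-validation get-by-id) {:id "1"})
; throws ExceptionInfo: Invalid query parameters
```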

Bisql also generates a base Malli schema file for each table as schema.clj.

The :malli/in and :malli/out metadata refer to those generated schemas.

Example of a generated query

/*:name crud.get-by-id */
/*:cardinality :one */
/*:malli/in [:map {:closed true} [:id int?]] */
/*:malli/out [:maybe sql.postgresql.public.users.schema/row] */
SELECT *
FROM users
WHERE id = /*$id*/1

Example of a generated schema

(ns sql.postgresql.public.users.schema
  (:refer-clojure :exclude [update])
  (:require [bisql.schema :as bisql.schema]))

#_{:clojure-lsp/ignore [:clojure-lsp/unused-public-var]}
(def insert
  [:map
   {:closed true}
   [:id [:or int? bisql.schema/malli-default-sentinel]]
   [:email string?]
   [:display-name string?]
   [:status [:or string? bisql.schema/malli-default-sentinel]]
   [:created-at [:or [:fn bisql.schema/offset-date-time?] bisql.schema/malli-default-sentinel]]])

#_{:clojure-lsp/ignore [:clojure-lsp/unused-public-var]}
(def update
  (bisql.schema/malli-map-all-entries-optional insert))

#_{:clojure-lsp/ignore [:clojure-lsp/unused-public-var]}
(def row
  (bisql.schema/malli-map-all-entries-strip-default-sentinel insert))

In other words, typical CRUD queries and their schemas can now be generated automatically, and validation can run transparently with very little manual work.

A Small Expression Language for if

This release also adds a small expression language for if conditions inside SQL templates.

That makes conditional rendering more expressive without leaving SQL templates or introducing a separate query builder layer.


Total functions in untyped languages

I was explaining an idea from my book to a coworker. I was saying that total functions can reduce defensive coding and simplify code. My coworker commented that it’s kind of meaningless to talk about totality in an untyped language.

I don’t agree, but I do understand the idea. It is much harder to define “total function” in an untyped language than in a typed language. When you’ve got types, you can say “A function is total if it returns a value (instead of throwing an error) for any arguments that pass the type checker.” It’s a straightforward definition. Or so it seems.

Totality is an idea from mathematics, specifically in the field of computability. A function is total if every combination of arguments in the domain of the function results in a value in its range. The domain and range are each sets of values. The domain specifies valid arguments while the range specifies valid return values.

And here we see a major problem: We’re using domain in two different ways. One is about the valid arguments to a function. The other is domain as in domain modeling, where it means the area of concern of your software. It’s tricky to talk about the first when in the context of the second.

Division is total in mathematics. Division’s domain is

ℝ × (ℝ ∖ {0})

and the range is ℝ. That means division is defined for pairs of real numbers where the second element is not zero. That’s what we learned in high school algebra.

But in programming, division is the prime example of a partial function. How come? Well, when you define the function in a typed language like this:

function divide(a: number, b: number): number

You get a function that can throw an exception—so it’s not total—even after it passes the type checker:

divide(1, 0) //=> Exception! Divide by zero.

What we see is that we’ve mapped the “set of values” idea of domain from math onto the “types of arguments” idea in programming languages. It’s an imperfect mapping. And it’s a mapping many of us have gladly accepted. The function is partial because the language cannot express the domain perfectly.

That’s why the whole concept of total functions is important. It gets us thinking about the gaps in our mappings. It gets us thinking about how to do a better mapping. Or about what kinds of checks on the arguments we should do before we call a function or after we get a return value. In short, it’s about safety and trust.

But notice that even in typed languages, there is a gap in the mapping from domain to types. The gap may be smaller than with untyped languages, but there can be a gap. My colleague said it’s meaningless, which implies a Church Typing mindset. It implies that because there are no checks for types, you could pass anything, such as:

(+ "a" 4) ;=> Exception! Expected Number but got String.

That means that even venerable addition is not total in Clojure. I think this is a little disingenuous. This is obviously an incorrect program, even if in general, incorrectness is hard to define in untyped languages. The correct set of values is well-known for + even if it’s not written down. And for any function, all we have to do is write it down—in documentation—and the intended domain is explicit.

I don’t want this essay to be a debate about static vs. dynamic typing. This essay is about totality and how to define it. Totality is really a concept that came from computability theory. It’s for asking questions like “does function f halt for all integer inputs.” The domain is specified by the problem, not the function. That’s why division is partial in programming: The domain is specified from outside, by the language’s choice of types.

In programming we use totality to consider the possible and probable inputs of a function. We concede that we must make imperfect tradeoffs between ideal mathematical objects and available language features. In most type systems, we can’t say “it’s a number, but it can’t be zero” and have the type checker ensure it’s correct.
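A small Clojure sketch of this idea: we cannot express “non-zero divisor” as a type, but we can state it as a precondition on the function.

```clojure
;; The domain ℝ × (ℝ ∖ {0}) expressed as a precondition: the type
;; system cannot rule out a zero divisor, but the contract can.
(defn divide
  "Divide a by b. Domain: b must be non-zero."
  [a b]
  {:pre [(not (zero? b))]}
  (/ a b))

(divide 6 3)
; 2
(divide 1 0)
; throws AssertionError: Assert failed: (not (zero? b))
```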

So I’ve come up with a definition that takes the pragmatics of the idea into account:

A function is total if it returns a valid domain value for every combination of valid domain values as arguments.

In this definition, “valid domain value” is doing a lot of work. It’s purposefully vague. But it asks you, the domain modeler, to consider your domain (in the business domain sense) and the meaningful values in it. It asks you to consider what subset of those values are valid. These often don’t correspond perfectly to types in your language. Ideally there would be a perfect overlap between the domain of a function and the business domain values. Where the overlap is not perfect is where you should look.

I want programmers to think past types and peer deep into the domain. I want them to ask these questions:

  • What are the meaningful domain values that this function is meant to operate on?

  • Does that set constitute a cohesive concept?

  • Are there values that are conceptually cohesive that it won’t work for?

  • How can I map the set of domain values this function is defined on to language features to get some kind of safety guarantees?

This definition also has a curious effect. Let me demonstrate in Clojure. Let’s say I’m defining a function that should only be defined over non-negative numbers. I don’t have complex numbers in my business domain:

(defn square-root [x] …)

If I’m conscientious, I’ll add a docstring explaining the domain of the function (the set of arguments it is defined over):

(defn square-root
  "The square root of a non-negative number."
  [x]
  …)

Great! But I want the domain to be enforced in code. In Clojure, I might write:

(defn square-root
  "The square root of a non-negative number."
  [x]
  (assert (not (neg? x)))
  …)

Awesome! I’ve now made the function total! We’ve expressed the domain of the function adequately using the features of the language. Ironically, we have written code that is more likely to throw, even though totality is usually associated with throwing less often. But we are also making the valid arguments explicit. Calling (square-root -1) throws an AssertionError, indicating that you, the programmer, violated the contract. So the function is total over its explicitly defined domain.

These are the kinds of questions I wrestle with when writing a book. I know I use the idea of total function all the time when programming in Clojure. I think it’s valuable. But Clojure doesn’t have types, so what can it mean? Why do others believe it can’t mean anything? I think it’s important to get these ideas right. If you were ever wondering what takes so long to write a book, it’s reading, thinking, digging, conversing until I feel like I’ve got a handle on it.

When you’re working on software design, or any kind of design, there are always hard questions. You can’t rely on pat answers. Instead, you need conceptual models to give you different perspectives. Total functions is one of those. It helps you focus on the failure modes of your functions: Could this function be passed something that will break it? Will it ever return an unexpected value? What would cause it to throw an exception? It’s not about writing code according to some rule, or using a language feature in a particular pattern. You have to think broader than that.

Proximal Policy Optimization with Clojure and PyTorch

(Cross posting article published at Clojure Civitas)

Motivation

Recently I started to look into the problem of reentry trajectory planning in the context of developing the sfsim space flight simulator. I had looked into reinforcement learning before and even tried out Q-learning using the lunar lander reference environment of OpenAI’s gym library (now maintained by the Farama Foundation). However, it had stability issues: the algorithm would converge on a strategy and then suddenly diverge again.

More recently (2017), the Proximal Policy Optimization (PPO) algorithm was published, and it has gained popularity. PPO is inspired by Trust Region Policy Optimization (TRPO) but is much easier to implement. PPO also handles continuous observation and action spaces, which is important for control problems. The Stable Baselines3 Python library has an implementation of PPO, TRPO, and other reinforcement learning algorithms. However, I found XinJingHao’s PPO implementation easier to follow.

In order to use PPO with a simulation environment implemented in Clojure, and to get a better understanding of PPO, I decided to implement PPO in Clojure.

Dependencies

For this project we are using the following deps.edn file. The Python setup is shown further down in this article.

{:deps
 {org.clojure/clojure {:mvn/version "1.12.4"}
  clj-python/libpython-clj {:mvn/version "2.026"}
  quil/quil {:mvn/version "4.3.1563"}
  org.clojure/core.async {:mvn/version "1.9.865"}}
}

The required namespaces can then be loaded with the following statement.

(require '[clojure.math :refer (PI cos sin exp to-radians)]
         '[clojure.core.async :as async]
         '[tablecloth.api :as tc]
         '[scicloj.tableplot.v1.plotly :as plotly]
         '[quil.core :as q]
         '[quil.middleware :as m]
         '[libpython-clj2.require :refer (require-python)]
         '[libpython-clj2.python :refer (py.) :as py])

Pendulum Environment

screenshot of pendulum environment

To validate the implementation, we will implement the classical pendulum environment in Clojure. In order to be able to switch environments, we define a protocol modelled on the environment abstract class used in OpenAI’s gym.

(defprotocol Environment
  (environment-update [this action])
  (environment-observation [this])
  (environment-done? [this])
  (environment-truncate? [this])
  (environment-reward [this action]))

Here is a configuration for testing the pendulum.

(def frame-rate 20)

(def config
  {:length  (/ 2.0 3.0)
   :max-speed 8.0
   :motor 6.0
   :gravitation 10.0
   :dt (/ 1.0 frame-rate)
   :save false
   :timeout 10.0
   :angle-weight 1.0
   :velocity-weight 0.1
   :control-weight 0.0001})

Setup

A method to initialise the pendulum is defined.

(defn setup
  "Initialise pendulum"
  [angle velocity]
  {:angle          angle
   :velocity       velocity
   :t              0.0})

As in OpenAI’s gym, the angle is zero when the pendulum is pointing up. Here a pendulum is initialised pointing down with an angular velocity of 0.5 radians per second.

(setup PI 0.5)
; {:angle 3.141592653589793, :velocity 0.5, :t 0.0}

State Updates

The angular acceleration due to gravitation is implemented as follows.

(defn pendulum-gravity
  "Determine angular acceleration due to gravity"
  [gravitation length angle]
  (/ (* (sin angle) gravitation) length))

The angular acceleration depends on the gravitation, length of pendulum, and angle of pendulum.

(pendulum-gravity 9.81 1.0 0.0)
; 0.0
(pendulum-gravity 9.81 1.0 (/ PI 2))
; 9.81
(pendulum-gravity 9.81 2.0 (/ PI 2))
; 4.905

The motor is controlled using an input value between -1 and 1. This value is simply multiplied by the maximum angular acceleration provided by the motor.

(defn motor-acceleration
  "Angular acceleration from motor"
  [control motor-acceleration]
  (* control motor-acceleration))
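For example, with the maximum motor acceleration of 6.0 from the configuration above, a control input is scaled as follows.

```clojure
(motor-acceleration 0.5 6.0)
; 3.0
(motor-acceleration -1.0 6.0)
; -6.0
```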

A simulation step of the pendulum is implemented using Euler integration.

(defn update-state
  "Perform simulation step of pendulum"
  ([{:keys [angle velocity t]}
    {:keys [control]}
    {:keys [dt motor gravitation length max-speed]}]
   (let [gravity        (pendulum-gravity gravitation length angle)
         motor          (motor-acceleration control motor)
         t              (+ t dt)
         acceleration   (+ motor gravity)
         velocity       (max (- max-speed)
                             (min max-speed
                                  (+ velocity (* acceleration dt))))
         angle          (+ angle (* velocity dt))]
     {:angle          angle
      :velocity       velocity
      :t              t})))

Here are a few examples for advancing the state in different situations.

(update-state {:angle PI :velocity 0.0 :t 0.0} {:control 0.0} config)
; {:angle 3.141592653589793, :velocity 9.184850993605151E-17, :t 0.05}
(update-state {:angle PI :velocity 0.1 :t 0.0} {:control 0.0} config)
; {:angle 3.146592653589793, :velocity 0.1000000000000001, :t 0.05}
(update-state {:angle (/ PI 2) :velocity 0.0 :t 0.0} {:control 0.0} config)
; {:angle 1.6082963267948966, :velocity 0.75, :t 0.05}
(update-state {:angle 0.0 :velocity 0.0 :t 0.0} {:control 1.0} config)
; {:angle 0.015000000000000003, :velocity 0.30000000000000004, :t 0.05}

Observation

The observation of the pendulum state uses the cosine and sine of the angle to resolve the wrap-around problem of angles. The angular speed is normalized to be between -1 and 1 as well. This so-called feature scaling is done to improve convergence.

(defn observation
  "Get observation from state"
  [{:keys [angle velocity]} {:keys [max-speed]}]
  [(cos angle) (sin angle) (/ velocity max-speed)])

The observation of the pendulum is a vector with 3 elements.

(observation {:angle 0.0 :velocity 0.0} config)
; [1.0 0.0 0.0]
(observation {:angle 0.0 :velocity 0.5} config)
; [1.0 0.0 0.0625]
(observation {:angle (/ PI 2) :velocity 0.0} config)
; [6.123233995736766E-17 1.0 0.0]

Note that the observation needs to capture all information required for achieving the objective, because it is the only information available to the actor for deciding on the next action.

Action

The action of the pendulum is a vector with one element between 0 and 1. The following method clips the value and converts it to the action hashmap used by the pendulum environment. Note that, in general, an action can consist of several values.

(defn action
  "Convert array to action"
  [array]
  {:control (max -1.0 (min 1.0 (- (* 2.0 (first array)) 1.0)))})

The following examples show how the action vector is mapped to a control input between -1 and 1.

(action [0.0])
; {:control -1.0}
(action [0.5])
; {:control 0.0}
(action [1.0])
; {:control 1.0}

Termination

The truncate method is used to stop a pendulum run after a specific amount of time.

(defn truncate?
  "Decide whether a run should be aborted"
  ([{:keys [t]} {:keys [timeout]}]
   (>= t timeout)))

(truncate? {:t 50.0} {:timeout 100.0})
; false
(truncate? {:t 100.0} {:timeout 100.0})
; true

It is also possible to define a termination condition. For the pendulum environment we specify that it never terminates.

(defn done?
  "Decide whether pendulum achieved target state"
  ([_state _config]
   false))

Reward

The following method normalizes an angle to be between -PI and +PI.

(defn normalize-angle
  "Angular deviation from up angle"
  [angle]
  (- (mod (+ angle PI) (* 2 PI)) PI))
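A few example values illustrate the mapping into the interval from -PI to +PI. Note that +PI itself maps to -PI, since the interval is half-open; both values represent the same downward-pointing position.

```clojure
(normalize-angle 0.0)
; 0.0
(normalize-angle PI)
; -3.141592653589793
(normalize-angle (* -0.5 PI))
; -1.5707963267948966
```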

We also need the square of a number.

(defn sqr
  "Square of number"
  [x]
  (* x x))

The reward function penalises deviation from the upright position, non-zero velocities, and non-zero control input. Note that it is important for the reward function to be continuous, because training uses gradient descent.

(defn reward
  "Reward function"
  [{:keys [angle velocity]}
   {:keys [angle-weight velocity-weight control-weight]}
   {:keys [control]}]
  (- (+ (* angle-weight (sqr (normalize-angle angle)))
        (* velocity-weight (sqr velocity))
        (* control-weight (sqr control)))))
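For example, with the configuration above, the upright resting state incurs no penalty, while a pendulum hanging straight down at rest receives the maximum angle penalty of -PI².

```clojure
(reward {:angle 0.0 :velocity 0.0} config {:control 0.0})
; -0.0
(reward {:angle PI :velocity 0.0} config {:control 0.0})
; -9.869604401089358
```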

Environment Protocol

Finally we are able to implement the pendulum as a generic environment.

(defrecord Pendulum [config state]
  Environment
  (environment-update [_this input]
    (->Pendulum config (update-state state (action input) config)))
  (environment-observation [_this]
    (observation state config))
  (environment-done? [_this]
    (done? state config))
  (environment-truncate? [_this]
    (truncate? state config))
  (environment-reward [_this input]
    (reward state config (action input))))

The following factory method creates an environment with an initial random state covering all possible pendulum states.

(defn pendulum-factory
  []
  (let [angle     (- (rand (* 2.0 PI)) PI)
        max-speed (:max-speed config)
        velocity  (- (rand (* 2.0 max-speed)) max-speed)]
    (->Pendulum config (setup angle velocity))))

Visualisation

The following method is used to draw the pendulum and visualise the motor control input.

(defn draw-state [{:keys [angle]} {:keys [control]}]
  (let [origin-x   (/ (q/width) 2)
        origin-y   (/ (q/height) 2)
        length     (* 0.5 (q/height) (:length config))
        pendulum-x (+ origin-x (* length (sin angle)))
        pendulum-y (- origin-y (* length (cos angle)))
        size       (* 0.05 (q/height))
        arc-radius (* (abs control) 0.2 (q/height))
        positive   (pos? control)
        tip-angle  (if positive 225 -45)]
    (q/frame-rate frame-rate)
    (q/background 255)
    (q/stroke-weight 5)
    (q/stroke 0)
    (q/fill 175)
    (q/line origin-x origin-y pendulum-x pendulum-y)
    (q/stroke-weight 1)
    (q/ellipse pendulum-x pendulum-y size size)
    (q/no-fill)
    (q/arc origin-x origin-y
           (* 2 arc-radius) (* 2 arc-radius)
           (to-radians -45) (to-radians 225))
    (q/with-translation [(+ origin-x (* (cos (to-radians tip-angle)) arc-radius))
                         (+ origin-y (* (sin (to-radians tip-angle)) arc-radius))]
      (q/with-rotation [(to-radians (if positive 225 -45))]
        (q/triangle 0 (if positive 10 -10) -5 0 5 0)))
    (when (:save config)
      (q/save-frame "frame-####.png"))))

Animation

With Quil we can create an animation of the pendulum and react to mouse input.

(defn -main [& _args]
  (let [done-chan   (async/chan)
        last-action (atom {:control 0.0})]
    (q/sketch
      :title "Inverted Pendulum with Mouse Control"
      :size [854 480]
      :setup #(setup PI 0.0)
      :update (fn [state]
                  (let [action {:control (min 1.0
                                              (max -1.0
                                                   (- 1.0 (/ (q/mouse-x)
                                                             (/ (q/width) 2.0)))))}
                        state  (update-state state action config)]
                    (when (done? state config) (async/close! done-chan))
                    (reset! last-action action)
                    state))
      :draw #(draw-state % @last-action)
      :middleware [m/fun-mode]
      :on-close (fn [& _] (async/close! done-chan)))
    (async/<!! done-chan))
  (System/exit 0))

manually controlled pendulum

Neural Networks

PPO is a machine learning technique using backpropagation to learn the parameters of two neural networks.

  • The actor network takes an observation as an input and outputs the parameters of a probability distribution for sampling the next action to take.
  • The critic network takes an observation as an input and outputs the expected cumulative reward for the current state.

Import PyTorch

For implementing the neural networks and backpropagation, we can use the Python-Clojure bridge libpython-clj2 and the PyTorch machine learning library. The PyTorch library is quite comprehensive, is free software, and has extensive documentation. The default version of PyTorch on pypi.org comes with CUDA (Nvidia) GPU support. There are also PyTorch wheels provided by AMD which come with ROCm support. Here we are going to use a CPU version of PyTorch, which is a much smaller install.

You need to install Python 3.10 or later. For package management we are going to use the uv package manager. The following pyproject.toml file is used to install PyTorch and NumPy.

[project]
name = "ppo"
version = "0.1.0"
description = "Proximal Policy Optimization"
authors = [{ name="Jan Wedekind", email="jan@wedesoft.de" }]
requires-python = ">=3.10.0"
dependencies = [
    "numpy",
    "torch",
]

[tool.uv]
python-preference = "only-system"

[tool.uv.sources]
torch = { index = "pytorch" }
numpy = { index = "pytorch" }

[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu"

[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"

Note that we are specifying a custom repository index to get the CPU-only version of PyTorch. We are also using the system version of Python to prevent uv from trying to install its own version, which lacks the _cython module. To freeze the dependencies and create a uv.lock file, you need to run

uv lock

You can install the dependencies using

uv sync

In order to access PyTorch from Clojure you need to run the clj command via uv:

uv run clj

Now you should be able to import the Python modules using require-python.

(require-python '[builtins :as python]
                '[torch :as torch]
                '[torch.nn :as nn]
                '[torch.nn.functional :as F]
                '[torch.optim :as optim]
                '[torch.distributions :refer (Beta)]
                '[torch.nn.utils :as utils])
; :ok

Tensor Conversion

First we implement a few methods for converting nested Clojure vectors to PyTorch tensors and back.

Clojure to PyTorch

The method tensor is for converting a Clojure datatype to a PyTorch tensor.

(defn tensor
  "Convert nested vector to tensor"
  ([data]
   (tensor data torch/float32))
  ([data dtype]
   (torch/tensor data :dtype dtype)))

(tensor PI)
; tensor(3.1416)
(tensor [2.0 3.0 5.0])
; tensor([2., 3., 5.])
(tensor [[1.0 2.0] [3.0 4.0] [5.0 6.0]])
; tensor([[1., 2.],
;         [3., 4.],
;         [5., 6.]])
(tensor [1 2 3] torch/long)
; tensor([1, 2, 3])

PyTorch to Clojure

The next method is for converting a PyTorch tensor back to a Clojure datatype.

(defn tolist
  "Convert tensor to nested vector"
  [tensor]
  (py/->jvm (py. tensor tolist)))

(tolist (tensor [2.0 3.0 5.0]))
; [2.0 3.0 5.0]
(tolist (tensor [[1.0 2.0] [3.0 4.0] [5.0 6.0]]))
; [[1.0 2.0] [3.0 4.0] [5.0 6.0]]

PyTorch scalar to Clojure

A tensor with no dimensions can also be converted using toitem.

(defn toitem
  "Convert torch scalar value to float"
  [tensor]
  (py. tensor item))

(toitem (tensor PI))
; 3.1415927410125732

Critic Network

The critic network is a neural network with an input layer of size observation-size and two fully connected hidden layers of size hidden-units with tanh activation functions. The critic output is a single value (an estimate for the expected cumulative return achievable by the given observed state).

(def Critic
  (py/create-class
    "Critic" [nn/Module]
    {"__init__"
     (py/make-instance-fn
       (fn [self observation-size hidden-units]
           (py. nn/Module __init__ self)
           (py/set-attrs!
             self
             {"fc1" (nn/Linear observation-size hidden-units)
              "fc2" (nn/Linear hidden-units hidden-units)
              "fc3" (nn/Linear hidden-units 1)})
           nil))
     "forward"
     (py/make-instance-fn
       (fn [self x]
           (let [x (py. self fc1 x)
                 x (torch/tanh x)
                 x (py. self fc2 x)
                 x (torch/tanh x)
                 x (py. self fc3 x)]
             (torch/squeeze x -1))))}))

When running inference, you need to run the network with gradient accumulation disabled, otherwise gradients get accumulated and can leak into a subsequent training step. In Python this looks like this.

with torch.no_grad():
    # ...

Here we create a Clojure macro to do the same job.

(defmacro without-gradient
  "Execute body without gradient calculation"
  [& body]
  `(let [no-grad# (torch/no_grad)]
     (try
       (py. no-grad# ~'__enter__)
       ~@body
       (finally
         (py. no-grad# ~'__exit__ nil nil nil)))))

Now we can create a network and try it out. We create a test multilayer perceptron with three inputs, two hidden layers of 8 units each, and one output.

(def critic (Critic 3 8))

example of critic multilayer perceptron

Note that the network produces a non-zero output because PyTorch randomly initialises the weights for us.

(without-gradient
  (toitem (critic (tensor [-1 0 0]))))
; -0.38925105333328247

We can also create a wrapper for using the neural network with Clojure datatypes.

(defn critic-observation
  "Use critic with Clojure datatypes"
  [critic]
  (fn [observation]
      (without-gradient (toitem (critic (tensor observation))))))

Here is the output of the network for the observation [-1 0 0].

((critic-observation critic) [-1 0 0])
; -0.38925105333328247

Training

Training a neural network starts with defining a loss function. The loss is then calculated for a mini-batch of training data. One can then use PyTorch’s backpropagation to compute the gradient of the loss value with respect to every parameter of the network. The gradient is then used to perform a gradient descent step. A popular gradient descent method is the Adam optimizer.

Here is a wrapper for the Adam optimizer.

(defn adam-optimizer
  "Adam optimizer"
  [model learning-rate weight-decay]
  (optim/Adam (py. model parameters) :lr learning-rate :weight_decay weight-decay))

PyTorch also provides the mean square error (MSE) loss function.

(defn mse-loss
  "Mean square error cost function"
  []
  (nn/MSELoss))

A training step can be performed as follows. Here we only use a single mini-batch with a single observation and an expected output of 1.0.

(def optimizer (adam-optimizer critic 0.01 0.0))
(def criterion (mse-loss))
(def mini-batch [(tensor [[-1 0 0]]) (tensor [1.0])])
(let [prediction (critic (first mini-batch))
      expected   (second mini-batch)
      loss       (criterion prediction expected)]
  (py. optimizer zero_grad)
  (py. loss backward)
  (py. optimizer step))

As you can see, the output of the network for the observation [-1 0 0] is now closer to 1.0.

((critic-observation critic) [-1 0 0])
; -0.3086397051811218

Actor Network

The actor network for PPO takes an observation as an input and outputs the parameters of a probability distribution over actions. In addition to the forward pass, the actor network has a method deterministic_act to choose the expectation value of the distribution as a deterministic action.

(def Actor
  (py/create-class
    "Actor" [nn/Module]
    {"__init__"
     (py/make-instance-fn
       (fn [self observation-size hidden-units action-size]
           (py. nn/Module __init__ self)
           (py/set-attrs!
             self
             {"fc1"     (nn/Linear observation-size hidden-units)
              "fc2"     (nn/Linear hidden-units hidden-units)
              "fcalpha" (nn/Linear hidden-units action-size)
              "fcbeta"  (nn/Linear hidden-units action-size)})
           nil))
     "forward"
     (py/make-instance-fn
       (fn [self x]
           (let [x (py. self fc1 x)
                 x (torch/tanh x)
                 x (py. self fc2 x)
                 x (torch/tanh x)
                 alpha (torch/add 1.0 (F/softplus (py. self fcalpha x)))
                 beta  (torch/add 1.0 (F/softplus (py. self fcbeta x)))]
             [alpha beta])))
     "deterministic_act"
     (py/make-instance-fn
       (fn [self x]
            (let [[alpha beta] (py. self forward x)]
              (torch/div alpha (torch/add alpha beta)))))
     "get_dist"
     (py/make-instance-fn
       (fn [self x]
           (let [[alpha beta] (py. self forward x)]
             (Beta alpha beta))))}))

Furthermore the actor network has a method get_dist to return a Torch distribution object which can be used to sample a random action or query the current log-probability of an action. Here (as the default in XinJingHao’s PPO implementation) we use the Beta distribution with parameters alpha and beta both greater than 1.0. See here for an interactive visualization of the Beta distribution.

(defn indeterministic-act
  "Sample action using actor network returning random action and log-probability"
  [actor]
  (fn indeterministic-act-with-actor [observation]
      (without-gradient
        (let [dist    (py. actor get_dist (tensor observation))
              sample  (py. dist sample)
              action  (torch/clamp sample 0.0 1.0)
              logprob (py. dist log_prob action)]
          {:action (tolist action) :logprob (tolist logprob)}))))

We create a test multilayer perceptron with three inputs, two hidden layers of 8 units each, and two outputs which serve as parameters for the Beta distribution.

(def actor (Actor 3 8 1))

example of actor multilayer perceptron

One can then use the network to:

a. get the parameters of the distribution for a given observation.

(without-gradient (actor (tensor [-1 0 0])))
; (tensor([1.7002]), tensor([1.7489]))

b. choose the expectation value of the distribution as an action.

(without-gradient (py. actor deterministic_act (tensor [-1 0 0])))
; tensor([0.4929])

c. sample a random action from the distribution and get the associated log-probability.

((indeterministic-act actor) [-1 0 0])
; {:action [0.6526480913162231], :logprob [0.2350209504365921]}

We can also query the current log-probability of a previously sampled action.

(defn logprob-of-action
  "Get log probability of action"
  [actor]
  (fn [observation action]
      (let [dist (py. actor get_dist observation)]
        (py. dist log_prob action))))

Here is a plot of the probability density function (PDF) output by the actor for a single observation.

(without-gradient
  (let [actions (range 0.0 1.01 0.01)
        logprob (fn [action]
                    (tolist
                      ((logprob-of-action actor) (tensor [-1 0 0]) (tensor action))))
        scatter (tc/dataset
                  {:x actions
                   :y (map (fn [action] (exp (first (logprob [action])))) actions)})]
    (-> scatter
        (plotly/base {:=title "Actor output for a single observation" :=mode :lines})
        (plotly/layer-point {:=x :x :=y :y}))))

probability density function output of actor for a single observation

Finally we can also query the entropy of the distribution. By incorporating the entropy into the loss function later on, we can encourage exploration and prevent the probability density function from collapsing.

(defn entropy-of-distribution
  "Get entropy of distribution"
  [actor observation]
  (let [dist (py. actor get_dist observation)]
    (py. dist entropy)))

(without-gradient (entropy-of-distribution actor (tensor [-1 0 0])))
; tensor([-0.0825])

Proximal Policy Optimization

Sampling data

In order to perform optimization, we sample the environment using the current policy (indeterministic-act with the actor).

(defn sample-environment
  "Collect trajectory data from environment"
  [environment-factory policy size]
  (loop [state             (environment-factory)
         observations      []
         actions           []
         logprobs          []
         next-observations []
         rewards           []
         dones             []
         truncates         []
         i                 size]
    (if (pos? i)
      (let [observation      (environment-observation state)
            sample           (policy observation)
            action           (:action sample)
            logprob          (:logprob sample)
            reward           (environment-reward state action)
            done             (environment-done? state)
            truncate         (environment-truncate? state)
            next-state       (if (or done truncate)
                               (environment-factory)
                               (environment-update state action))
            next-observation (environment-observation next-state)]
        (recur next-state
               (conj observations observation)
               (conj actions action)
               (conj logprobs logprob)
               (conj next-observations next-observation)
               (conj rewards reward)
               (conj dones done)
               (conj truncates truncate)
               (dec i)))
      {:observations      observations
       :actions           actions
       :logprobs          logprobs
       :next-observations next-observations
       :rewards           rewards
       :dones             dones
       :truncates         truncates})))

Here, for example, we sample three consecutive states of the pendulum.

(sample-environment pendulum-factory (indeterministic-act actor) 3)
; {:observations
;  [[-0.7596729533565417 0.6503053159390207 0.5479034035454418]
;   [-0.8900589293843874 0.4558454806435161 0.5866609335014912]
;   [-0.9762048336009674 0.21685046196424718 0.6368372482766531]],
;  :actions
;  [[0.20388542115688324] [0.5992106795310974] [0.1662445366382599]],
;  :logprobs
;  [[0.08455279469490051] [0.26384592056274414] [-0.028919726610183716]],
;  :next-observations
;  [[-0.8900589293843874 0.4558454806435161 0.5866609335014912]
;   [-0.9762048336009674 0.21685046196424718 0.6368372482766531]
;   [-0.99941293940555 -0.034260422483655656 0.6321353193336707]],
;  :rewards [-7.8437431872499745 -9.322367484397839 -11.139601368813137],
;  :dones [false false false],
;  :truncates [false false false]}

Advantages

Theory

If we are in state s_t and take an action a_t at timestep t, we receive reward r_t and end up in state s_{t+1}. The cumulative reward for state s_t is a finite or infinite sequence using a discount factor γ<1:

$$R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \ldots = \sum_{k=0}^{\infty} \gamma^k r_{t+k}$$

The critic V estimates the expected cumulative reward for starting from the specified state.

$$V(s_t) \approx \mathbb{E}\left[R_t\right]$$

In particular, the difference between discounted rewards can be used to get an estimate for the individual reward:

$$r_t = R_t - \gamma R_{t+1} \approx V(s_t) - \gamma V(s_{t+1})$$

The deviation of the individual reward received in state s_t from the expected reward is:

$$\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$

The special case where a time series is “done” (and the next one is started) uses 0 as the remaining expected cumulative reward.

$$\delta_t = r_t - V(s_t)$$

If we have a sample set with a sequence of T states (t=0,1,…,T-1), one can compute the cumulative advantage for each time step going backwards:

$$A_t = \delta_t + \gamma \delta_{t+1} + \gamma^2 \delta_{t+2} + \ldots + \gamma^{T-1-t} \delta_{T-1}$$

I.e. we can compute the cumulative advantages as follows:

$$A_t = \delta_t + \gamma A_{t+1}, \qquad A_T = 0$$

PPO uses an additional factor λ≤1 called Generalized Advantage Estimation (GAE), which can be used to steer the training towards more immediate rewards if there are stability issues; the recursion then becomes $A_t = \delta_t + \gamma \lambda A_{t+1}$. See Schulman et al. for more details.
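As an illustration (a pure-Python sketch, not the article's Clojure code), the backward advantage computation can be written as a recursion that restarts the accumulation at episode boundaries:

```python
def advantages(deltas, dones, gamma, lam):
    """Backward recursion: each advantage is its delta plus the
    discounted advantage of the following step, restarting the
    accumulation whenever an episode ends."""
    result = []
    advantage = 0.0
    for delta, done in zip(reversed(deltas), reversed(dones)):
        advantage = delta + (0.0 if done else gamma * lam * advantage)
        result.append(advantage)
    return list(reversed(result))

# with gamma * lambda = 0.5 and all deltas 1.0 the advantages
# approach 2.0 going backwards in time
print(advantages([1.0, 1.0, 1.0], [False, False, False], 0.5, 1.0))
# [1.75, 1.5, 1.0]
```

This mirrors the Clojure advantages function implemented later in this article.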

Implementation of Deltas

The code for computing the $\delta$ values follows here:

(defn deltas
  "Compute difference between actual reward plus discounted estimate of next state and estimated value of current state"
  [{:keys [observations next-observations rewards dones]} critic gamma]
  (mapv (fn [observation next-observation reward done]
            (- (+ reward
                  (if done 0.0 (* gamma (critic next-observation))))
               (critic observation)))
        observations next-observations rewards dones))

If the reward is zero and the critic outputs constant zero, there is no difference between the expected and received reward.

(deltas {:observations [[4]] :next-observations [[3]] :rewards [0] :dones [false]}
        (constantly 0)
        1.0)
; [0.0]

If the reward is 1.0 and the critic outputs zero for both observations, the difference is 1.0.

(deltas {:observations [[4]] :next-observations [[3]] :rewards [1] :dones [false]}
        (constantly 0)
        1.0)
; [1.0]

If the reward is 1.0 and the difference of critic outputs is also 1.0 then there is no difference between the expected and received reward (when $\gamma=1$).

(defn linear-critic [observation] (first observation))
(deltas {:observations [[4]] :next-observations [[3]] :rewards [1] :dones [false]}
        linear-critic
        1.0)
; [0.0]

If the next critic value is 1.0 and discounted with 0.5 and the current critic value is 2.0, we expect a reward of 1.5. If we only get a reward of 1.0, the difference is -0.5.

(deltas {:observations [[2]] :next-observations [[1]] :rewards [1] :dones [false]}
        linear-critic
        0.5)
; [-0.5]

If the run is terminated, the current critic value is compared with the reward, which in this case is the last reward received in this run.

(deltas {:observations [[4]] :next-observations [[3]] :rewards [4] :dones [true]}
        linear-critic
        1.0)
; [0.0]

Implementation of Advantages

The advantages can be computed in an elegant way using reductions and the previously computed deltas.

(defn advantages
  "Compute advantages attributed to each action"
  [{:keys [dones truncates]} deltas gamma lambda]
  (vec
    (reverse
      (rest
        (reductions
          (fn [advantage [delta done truncate]]
              (+ delta (if (or done truncate) 0.0 (* gamma lambda advantage))))
          0.0
          (reverse (map vector deltas dones truncates)))))))

For example, when all deltas are 1.0 and the discount factor is 0.5, the advantages asymptotically approach 2.0 when going backwards in time.

(advantages {:dones [false false false] :truncates [false false false]}
            [1.0 1.0 1.0]
            0.5
            1.0)
; [1.75 1.5 1.0]

When an episode is terminated (or truncated), the accumulation of advantages starts again when going backwards in time. I.e. the computation of advantages does not distinguish between terminated and truncated episodes (unlike the deltas).

(advantages {:dones [false false true false false true]
             :truncates [false false false false false false]}
            [1.0 1.0 1.0 1.0 1.0 1.0]
            0.5
            1.0)
; [1.75 1.5 1.0 1.75 1.5 1.0]

We add the advantages to the batch of samples with the following function.

(defn assoc-advantages
  "Associate advantages with batch of samples"
  [critic gamma lambda batch]
  (let [deltas     (deltas batch critic gamma)
        advantages (advantages batch deltas gamma lambda)]
    (assoc batch :advantages advantages)))

Critic Loss Function

The target values for the critic are simply the current values plus the new advantages. The target values can be computed using PyTorch’s add function.

(defn critic-target
  "Determine target values for critic"
  [{:keys [observations advantages]} critic]
  (without-gradient (torch/add (critic observations) advantages)))

We add the critic targets to the batch of samples with the following function.

(defn assoc-critic-target
  "Associate critic target values with batch of samples"
  [critic batch]
  (let [target (critic-target batch critic)]
    (assoc batch :critic-target target)))

If we add the target values to the samples, we can compute the critic loss for a batch of samples as follows.

(defn critic-loss
  "Compute loss value for batch of samples and critic"
  [samples critic]
  (let [criterion (mse-loss)
        loss      (criterion (critic (:observations samples)) (:critic-target samples))]
    loss))

Actor Loss Function

The core of the actor loss function relies on the ratio of action probabilities under the updated and the old policy (actor network output). The ratio is defined as

$$r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}$$

Note that r_t(θ) here refers to the probability ratio, as opposed to the reward of the previous section.

The sampled observations, log probabilities, and actions are combined with the actor’s parameter-dependent log probabilities.

(defn probability-ratios
  "Probability ratios for actions using the updated policy and the old policy"
  [{:keys [observations logprobs actions]} logprob-of-action]
  (let [updated-logprobs (logprob-of-action observations actions)]
    (torch/exp (py. (torch/sub updated-logprobs logprobs) sum 1))))
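The `sum 1` call sums the log-probability differences over the action dimensions before exponentiating. A pure-Python sketch (illustrative only) of the underlying identity: the exponential of summed log differences equals the product of the per-dimension probability ratios.

```python
import math

def probability_ratio(updated_logprobs, old_logprobs):
    # summing log-probability differences over the action dimensions
    # and exponentiating yields the product of per-dimension ratios
    return math.exp(sum(u - o for u, o in zip(updated_logprobs, old_logprobs)))

updated = [math.log(0.6), math.log(0.5)]
old = [math.log(0.3), math.log(0.5)]
print(round(probability_ratio(updated, old), 6))  # 2.0, i.e. (0.6 / 0.3) * (0.5 / 0.5)
```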

The objective is to increase the probability of actions which lead to a positive advantage and reduce the probability of actions which lead to a negative advantage, i.e. to maximise the following objective function.

$$L(\theta) = \hat{\mathbb{E}}_t\left[r_t(\theta) A_t\right]$$

The core idea of PPO is to use clipped probability ratios in the loss function in order to increase stability. The probability ratio is clipped to stay below 1+ε for positive advantages and above 1-ε for negative advantages.

$$L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta) A_t,\ \operatorname{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon) A_t\right)\right]$$

See Schulman et al. for more details.

Because PyTorch minimizes a loss, we need to negate the objective function above.

(defn clipped-surrogate-loss
  "Clipped surrogate loss (negative objective)"
  [probability-ratios advantages epsilon]
  (torch/mean
    (torch/neg
      (torch/min
        (torch/mul probability-ratios advantages)
        (torch/mul (torch/clamp probability-ratios (- 1.0 epsilon) (+ 1.0 epsilon))
                   advantages)))))

We can plot the objective function for a single action and a positive advantage.

(without-gradient
  (let [ratios  (range 0.0 2.01 0.01)
        loss    (fn [ratio advantage epsilon]
                    (toitem
                      (torch/neg
                        (clipped-surrogate-loss (tensor ratio)
                                                (tensor advantage)
                                                epsilon))))
        scatter (tc/dataset
                  {:x ratios
                   :y (map (fn [ratio] (loss ratio 0.5 0.2)) ratios)})]
    (-> scatter
        (plotly/base {:=title "Objective Function for Positive Advantage" :=mode :lines})
        (plotly/layer-point {:=x :x :=y :y}))))

actor loss over ratio for positive advantage

And for a negative advantage.

(without-gradient
  (let [ratios  (range 0.0 2.01 0.01)
        loss    (fn [ratio advantage epsilon]
                    (toitem
                      (torch/neg
                        (clipped-surrogate-loss (tensor ratio)
                                                (tensor advantage)
                                                epsilon))))
        scatter (tc/dataset
                  {:x ratios
                   :y (map (fn [ratio] (loss ratio -0.5 0.2)) ratios)})]
    (-> scatter
        (plotly/base {:=title "Objective Function for Negative Advantage" :=mode :lines})
        (plotly/layer-point {:=x :x :=y :y}))))

actor loss over ratio for negative advantage

We can now implement the actor loss function which we want to minimize. The loss function uses the clipped surrogate loss function as defined above. The loss function also penalises low entropy values of the distributions output by the actor in order to encourage exploration.

(defn actor-loss
  "Compute loss value for batch of samples and actor"
  [samples actor epsilon entropy-factor]
  (let [ratios         (probability-ratios samples (logprob-of-action actor))
        entropy        (torch/mul
                         entropy-factor
                         (torch/neg
                           (torch/mean
                             (entropy-of-distribution actor (:observations samples)))))
        surrogate-loss (clipped-surrogate-loss ratios (:advantages samples) epsilon)]
    (torch/add surrogate-loss entropy)))

A notable detail in XinJingHao’s PPO implementation is that the advantage values used in the actor loss (not in the critic loss!) are normalized.

(defn normalize-advantages
  "Normalize advantages"
  [batch]
  (let [advantages (:advantages batch)]
    (assoc batch :advantages (torch/div (torch/sub advantages (torch/mean advantages))
                                        (torch/std advantages)))))
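For intuition, here is the same normalization as a pure-Python sketch (illustrative only); note that torch.std, like statistics.stdev below, uses the n-1 denominator by default:

```python
import statistics

def normalize(advantages):
    # subtract the mean and divide by the sample standard deviation,
    # so the result has mean 0 and standard deviation 1
    mean = statistics.mean(advantages)
    std = statistics.stdev(advantages)
    return [(a - mean) / std for a in advantages]

normalized = normalize([1.0, 2.0, 3.0, 4.0])
print([round(a, 4) for a in normalized])
```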

Preparing Samples

Shuffling

The data required for training needs to be converted to PyTorch tensors.

(defn tensor-batch
  "Convert batch to Torch tensors"
  [batch]
  {:observations (tensor (:observations batch))
   :logprobs (tensor (:logprobs batch))
   :actions (tensor (:actions batch))
   :advantages (tensor (:advantages batch))})

Furthermore it is good practice to shuffle the samples. This ensures that samples early and late in the sequence are not treated differently. Note that you need to shuffle after computing the advantages, because the computation of the advantages relies on the order of the samples.

We separate the generation of random indices to facilitate unit testing of the shuffling function.

(defn random-order
  "Create a list of randomly ordered indices"
  [n]
  (shuffle (range n)))

(defn shuffle-samples
  "Random shuffle of samples"
  ([samples]
   (shuffle-samples samples (random-order (python/len (first (vals samples))))))
  ([samples indices]
   (zipmap (keys samples)
           (map #(torch/index_select % 0 (torch/tensor indices)) (vals samples)))))

Here is an example of shuffling observations:

(shuffle-samples {:observations (tensor [[1] [2] [3] [4] [5] [6] [7] [8] [9] [10]])})
; {:observations tensor([[ 1.],
;         [ 4.],
;         [ 6.],
;         [ 5.],
;         [10.],
;         [ 8.],
;         [ 7.],
;         [ 2.],
;         [ 9.],
;         [ 3.]])}

Creating Batches

Furthermore we split up the samples into smaller batches to improve training speed.

(defn create-batches
  "Create mini batches from environment samples"
  [batch-size samples]
  (apply mapv
         (fn [& args] (zipmap (keys samples) args))
         (map #(py. % split batch-size) (vals samples))))

(create-batches 5 {:observations (tensor [[1] [2] [3] [4] [5] [6] [7] [8] [9] [10]])})
; [{:observations tensor([[1.],
;         [2.],
;         [3.],
;         [4.],
;         [5.]])} {:observations tensor([[ 6.],
;         [ 7.],
;         [ 8.],
;         [ 9.],
;         [10.]])}]

Putting it All Together

Finally we can implement a method which

  • samples data
  • adds advantages
  • converts to PyTorch tensors
  • adds critic targets
  • normalizes the advantages
  • shuffles the samples
  • creates batches

(defn sample-with-advantage-and-critic-target
  "Create batches of samples and add advantages and critic target values"
  [environment-factory actor critic size batch-size gamma lambda]
  (->> (sample-environment environment-factory (indeterministic-act actor) size)
       (assoc-advantages (critic-observation critic) gamma lambda)
       tensor-batch
       (assoc-critic-target critic)
       normalize-advantages
       shuffle-samples
       (create-batches batch-size)))

PPO Main Loop

Now we can implement the PPO main loop.

The outer loop samples the environment using the current actor (i.e. policy) and computes the data required for training.

The inner loop performs a small number of updates using the samples from the outer loop.

Each update step performs a gradient descent update for the actor and a gradient descent update for the critic. Another detail from XinJingHao’s PPO implementation is that the gradient norm for the actor update is clipped.
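Gradient-norm clipping rescales all gradients so that their global L2 norm does not exceed a threshold. A pure-Python sketch of the idea behind PyTorch's clip_grad_norm_ (illustrative only, not the library implementation):

```python
import math

def clip_grad_norm(grads, max_norm):
    # compute the global L2 norm over all gradient values and
    # rescale the gradients if the norm exceeds max_norm
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        return [g * scale for g in grads]
    return grads

print(clip_grad_norm([3.0, 4.0], 0.5))  # norm 5.0 rescaled down to 0.5
print(clip_grad_norm([0.1, 0.1], 0.5))  # below the threshold, unchanged
```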

At the end of each epoch, the smoothed loss values are printed, together with the deterministic actions and entropies for a few observations, which helps with parameter tuning. Furthermore the entropy factor is slowly lowered so that the policy reduces exploration over time.

The actor and critic models are saved to disk at each checkpoint.

(defn -main [& _args]
  (let [factory          pendulum-factory
        actor            (Actor 3 64 1)
        critic           (Critic 3 64)
        n-epochs         100000
        n-updates        10
        gamma            0.99
        lambda           1.0
        epsilon          0.2
        n-batches        8
        batch-size       50
        checkpoint       100
        entropy-factor   (atom 0.1)
        entropy-decay    0.999
        lr               5e-5
        weight-decay     1e-4
        smooth-actor-loss  (atom 0.0)
        smooth-critic-loss (atom 0.0)
        actor-optimizer  (adam-optimizer actor lr weight-decay)
        critic-optimizer (adam-optimizer critic lr weight-decay)]
    (doseq [epoch (range n-epochs)]
           (let [samples (sample-with-advantage-and-critic-target factory actor critic
                                                                  (* batch-size n-batches)
                                                                  batch-size
                                                                  gamma lambda)]
             (doseq [k (range n-updates)]
                    (doseq [batch samples]
                           (let [loss (actor-loss batch actor epsilon @entropy-factor)]
                             (py. actor-optimizer zero_grad)
                             (py. loss backward)
                             (utils/clip_grad_norm_ (py. actor parameters) 0.5)
                             (py. actor-optimizer step)
                             (swap! smooth-actor-loss
                                    (fn [x] (+ (* 0.999 x) (* 0.001 (toitem loss)))))))
                    (doseq [batch samples]
                           (let [loss (critic-loss batch critic)]
                             (py. critic-optimizer zero_grad)
                             (py. loss backward)
                             (py. critic-optimizer step)
                             (swap! smooth-critic-loss
                                    (fn [x] (+ (* 0.999 x) (* 0.001 (toitem loss))))))))
             (println "Epoch:" epoch
                      "Actor Loss:" @smooth-actor-loss
                      "Critic Loss:" @smooth-critic-loss
                      "Entropy Factor:" @entropy-factor))
           (without-gradient
             (doseq [input [[1 0 -1.0] [1 0 1.0] [0 -1 -1.0] [0 -1 1.0] [0 1 -1.0] [0 1 1.0] [-1 0 -1.0] [-1 0 1.0]]]
                    (println
                      input
                      "->" (action (tolist (py. actor deterministic_act (tensor input))))
                      "entropy" (toitem (entropy-of-distribution actor (tensor input))))))
           (swap! entropy-factor * entropy-decay)
           (when (= (mod epoch checkpoint) (dec checkpoint))
             (println "Saving models")
             (torch/save (py. actor state_dict) "actor.pt")
             (torch/save (py. critic state_dict) "critic.pt")))
    (torch/save (py. actor state_dict) "actor.pt")
    (torch/save (py. critic state_dict) "critic.pt")
    (System/exit 0)))

Visualisation of Actor Output

We can use dtype-next to visualise the output of the actor. First we need to load additional modules.

(require '[tech.v3.datatype :as dtype]
         '[tech.v3.tensor :as dtt]
         '[tech.v3.libs.buffered-image :as bufimg]
         '[tech.v3.datatype.functional :as dfn])

Here we load a pre-trained model and visualise the output of the actor.

(def actor (Actor 3 64 1))
(py. actor load_state_dict (torch/load "src/ppo/actor.pt"))
; <All keys matched successfully>

(let [angle-values   (torch/linspace (- PI) PI 854)
      speed-values   (torch/linspace 1.0 -1.0 480)
      grid           (torch/meshgrid speed-values angle-values :indexing "ij")
      cos-angle      (torch/cos (last grid))
      sin-angle      (torch/sin (last grid))
      observations   (torch/stack [(py. cos-angle ravel)
                                   (py. sin-angle ravel)
                                   (py. (first grid) ravel)]
                                  :axis 1)
      actions        (without-gradient
                       (py. (py. (py. actor deterministic_act observations)
                                 reshape 480 854) numpy))
      actions-tensor (dtt/clone
                       (dtype/elemwise-cast (dtt/ensure-tensor (py/->jvm actions))
                                            :float32))
      actions-trsps  (dtt/transpose actions-tensor [1 0])]
  (dtt/mset! actions-tensor 240 (dfn/- 1.0 (actions-tensor 240)))
  (dtt/mset! actions-trsps 427 (dfn/- 1.0 (actions-trsps 427)))
  (bufimg/tensor->image (dfn/* actions-tensor 255)))

Actor function output over state space

This image shows the motor control input as a function of pendulum angle and angular velocity. As one can see, the pendulum is decelerated when the speed is high (dark values at the top of the image). Near the centre of the image (speed zero and angle zero) one can see how the pendulum is accelerated when the angle is negative and the speed is small, and decelerated when the angle is positive and the speed is small. Also the image is not symmetrical, because otherwise the pendulum would not start swinging up when pointing downwards (left and right boundary of the image).

Automated Pendulum

The pendulum implementation can now be updated to use the actor instead of the mouse position as motor input when the mouse button is pressed.

(defn -main [& _args]
  (let [actor       (Actor 3 64 1)
        done-chan   (async/chan)
        last-action (atom {:control 0.0})]
    (when (.exists (java.io.File. "actor.pt"))
      (py. actor load_state_dict (torch/load "actor.pt")))
    (q/sketch
      :title "Inverted Pendulum with Mouse Control"
      :size [854 480]
      :setup #(setup PI 0.0)
      :update (fn [state]
                  (let [observation (observation state config)
                        action      (if (q/mouse-pressed?)
                                      (action (tolist (py. actor
                                                           deterministic_act
                                                           (tensor observation))))
                                      {:control (min 1.0
                                                     (max -1.0
                                                          (- 1.0 (/ (q/mouse-x)
                                                                    (/ (q/width) 2.0)))))})
                        state       (update-state state action config)]
                    (when (done? state config) (async/close! done-chan))
                    (reset! last-action action)
                    state))
      :draw #(draw-state % @last-action)
      :middleware [m/fun-mode]
      :on-close (fn [& _] (async/close! done-chan)))
    (async/<!! done-chan))
  (System/exit 0))

Here is a small demo video of the pendulum being controlled using the actor network. You can find a repository with the code of this article as well as unit tests at github.com/wedesoft/ppo.

automatically controlled pendulum

Enjoy!

Permalink

Clojure Deref (Apr 21, 2026)

Welcome to the Clojure Deref! This is a weekly link/news roundup for the Clojure ecosystem (feed: RSS).

Clojure Documentary

The Clojure Documentary is live!

Afterward, enjoy the Clojure Documentary Q&A with Rich Hickey and other key people in Clojure’s history!

Don’t miss the Documentary show notes with links to:

  • The foundational research papers

  • Influential books

  • Rich’s talks

  • Historical archives

  • Dialects and runtimes

  • Community resources

  • Getting started videos

  • A glossary

  • and more!

Clojure Community Check-In

The world is going through changes: in programming, technology, work, and in specific countries and regions, each with its own form of trouble, hope, or confusion.

People in Clojure communities, like elsewhere, are finding their way through it, sometimes with questions and sometimes with a sense of being alone in it.

The Clojure Community Check-In is a space to share how we’re doing.

Watch a short video from the organizers.

Sessions:

Clojure/Conj 2026

September 30 – October 2, 2026
Charlotte Convention Center, Charlotte, NC

Join us for the largest gathering of Clojure developers in the world! Meet new people and reconnect with old friends. Enjoy two full days of talks, a day of workshops, social events, and more.

Early bird and group tickets are now on sale.

Is your company interested in sponsoring? Email us at clojure_conj@nubank.com.br to discuss opportunities.

Upcoming Events

Libraries and Tools

Debut release

  • webgen - Parameter driven web app generator

  • cljam - Clojure interpreter with a tokenizer, reader, macro expander, evaluator, incremental compiler, vite plugin, nREPL server compatible with calva on vscode, embedded browser REPL, CLI compatible with node and bun as host

  • bisql - Keep SQL executable, call it as Clojure functions 🚲️

  • miniforge-standards - Shared engineering standards for all miniforge.ai repositories

  • cljs-mjml - Write MJML email templates with Hiccup syntax in ClojureScript (or Node Babashka)

Updates

  • clojure 1.12.5-rc1 - The Clojure programming language

  • clj-kondo 2026.04.15 - Static analyzer and linter for Clojure code that sparks joy

  • baredom 2.2.0 - BareDOM: Lightweight CLJS UI components built on web standards (Custom Elements, Shadow DOM, ES modules). No framework, just the DOM

  • clj-format 0.1.2 - A Clojure DSL for cl-format inspired by Hiccup. No dependencies. Drop-in compatibility. The power of FORMAT made easy.

  • any 0.1.1 - Objects for smart comparison in tests.

  • spel 0.9.5 - Idiomatic Clojure wrapper for Playwright. Browser automation, API testing, Allure reporting, and native CLI - for Chromium, Firefox, and WebKit

  • ordered-collections 0.2.1 - Fast, modern, ropes and ordered collections that do more than sort.

  • dexter 0.1-alpha-6 - Dexter - Graphical Dependency Explorer

  • gloat 0.1.26 - Glojure AOT Tool

  • glojure 0.6.5-rc17 - Clojure interpreter hosted on Go, with extensible interop support.

  • squint 0.11.188 - Light-weight ClojureScript dialect

  • dataspex 2026.04.1 - See the shape of your data: point-and-click Clojure(Script) data browser

  • meme-clj 5.0.0 - meme-clj — M-Expressions with Macro Expansion

  • charm.clj 0.2.71 - A Clojure TUI (Terminal User Interface) library inspired by Bubble Tea

  • clj-xref 0.1.1 - LLM-friendly cross-reference database for Clojure code. Query who-calls, calls-who, who-implements, ns-deps to feed precise dependency neighborhoods to AI assistants instead of entire source trees. Built on clj-kondo.

  • babashka 1.12.218 - Native, fast starting Clojure interpreter for scripting

  • fs 0.5.33 - File system utility library for Clojure

  • phel-lang 0.34.1 - A functional, Lisp-inspired language that compiles to PHP. Inspired by Clojure, Phel brings macros, persistent data structures, and expressive functional idioms to the PHP ecosystem.

  • nippy 3.7.0-beta1 - Fast serialization library for Clojure

  • statecharts 1.4.0-RC11 - A Statechart library for CLJ(S)

  • clojure-clr clojure-1.12.3-alpha7 - A port of Clojure to the CLR, part of the Clojure project

Permalink

Your GitHub Actions Workflow is a Waste of Time

Over the years, I’ve experimented with almost every flavor of development environment. I’ve gone from manually provisioning tools on a Mac—hoping I’d remember every brew install six months later—to exploring Docker, Nix, and remote environments.

My journey has touched it all: asdf, Brew, Docker, Nix, and Devbox. I’ve jumped between terminal emulators and multiplexers like Tmux, Kitty, WezTerm, and Zellij.

My Modern Development Environment

My latest setup is built for speed, reproducibility, and a “keyboard-first” philosophy. It lives entirely in the terminal across two environments: my local iMac and an OCI Ampere VPS.

  • Terminal: Ghostty
  • Multiplexer: Zellij
  • Environment Management: Devenv
  • Editor: Doom Emacs

Achieving CI Parity with Self-Hosted Runners

If you haven’t switched to a self-hosted GitHub runner yet, do it for your own sanity. You can thank me later.

By running your CI on your own hardware (like an OCI Ampere instance), you eliminate the overhead of public runners and gain full control over the environment. When paired with Devenv, your CI environment becomes an exact mirror of your local machine.

How to set it up:

  1. Navigate to your GitHub repository Settings.
  2. Go to Actions -> Runners.
  3. Click New self-hosted runner and follow the configuration steps for your OS.
  4. Update your workflow .yml file to use the self-hosted label.

Quantifying the Impact on Feedback Loops

By moving to a self-hosted ARM64 runner, my feedback loop became incredibly tight. My tests now finish in 24 seconds, and the entire image creation process takes just 1 minute and 22 seconds.

Here is what the streamlined job looks like:

jobs:
  test:
    runs-on:
      - self-hosted
      - Linux
      - ARM64
    steps:
      - uses: actions/checkout@v5
        with:
          fetch-depth: 0
      - name: Run tests
        id: run_tests
        run: devenv shell clojure -X:test

The Power of Declarative Environments

Because I’m using devenv shell, I don’t have to worry about whether the CI runner has Clojure, the right JDK, or specific libraries installed. If it works in my local terminal, it works in the CI. Period.

Final Thoughts: Simplify Your Workflow

The way we use GitHub Actions today is often redundant. We spend an enormous amount of time writing complex YAML configurations to install dependencies, manage versions, and configure caching—essentially re-architecting our entire development environment for every single commit.

Provisioning and caching are solved problems. If you are using tools like Devenv or Nix, you’ve already defined exactly what your project needs to run. By moving to a self-hosted runner, you stop fighting the CI and start using it as a natural extension of your workstation. You gain:

  • Total Parity: If the code runs in your local devenv shell, it will run in CI. No exceptions.
  • Instant Caching: Since the runner is persistent, you don’t need to upload or download massive cache blobs; the dependencies are already there.
  • Minimal Configuration: Your workflow files shrink from dozens of lines of setup boilerplate to a single command.

It’s time to stop treating CI like a special snowflake and start treating it like the high-performance terminal it should be. Stop provisioning twice, stop waiting for public runners, and start shipping faster.

Would you like a follow-up on this topic? I’d love to hear your thoughts and experiences.

Permalink

How Nubank Uses Transformers to Model Financial Habits at Scale

Written by: Nubank Editorial

What if, instead of relying on manual feature engineering, we could learn directly from raw financial behavior at scale? 

That was the thesis we set out to defend on episode 122 of the Data Hackers podcast, Brazil’s largest Data and AI community. Founded in 2018, the community brings together thousands of data professionals and thought leaders to discuss the cutting edge of technology.

In conversation with hosts Monique Femme and Paulo Vasconcellos, Arissa Yoshida and Rafael Celente, Senior Research Engineers at Nubank, walked through the breakthroughs behind the paper “Your Spending Needs Attention: Modeling Financial Habits with Transformers” and how this research is already making its way into production.

The starting point is straightforward: financial institutions sit on massive volumes of data — transactions, in-app events, customer interactions — yet extracting real value from this data remains a hard problem. Its sequential, unstructured nature has historically pushed teams toward tabular models built on hand-crafted features.

The paper charts a different course: leveraging Transformer-based architectures and self-supervised learning to build representations directly from raw data. This work gave rise to nuFormer, a model that blends structured and textual transaction attributes and supports fine-tuning for tasks like credit scoring, fraud detection, and product recommendation — delivering measurable gains at scale.

From traditional machine learning to foundation models

To appreciate why this matters, consider where the industry started. For years, traditional ML models — particularly tree-based methods paired with heavy feature engineering — dominated financial applications. These models remain effective, but they hit a ceiling when the problem involves large volumes of unstructured data and the need to capture complex temporal patterns.

At Nubank, where we have an extraordinarily rich dataset — especially long sequences of financial transactions — this limitation becomes hard to ignore. As Arissa Yoshida puts it, these traditional approaches lean heavily on a manual, specialized step of variable construction.

 “With traditional models, you rely heavily on handcrafted features — essentially building an entire engineering pipeline to extract value from your data. That requires people with deep domain expertise who can manually work through the data.” 

Arissa Yoshida, Senior Machine Learning Engineer at Nubank

This dependency makes the process less scalable and more expensive, particularly as data volume and complexity grow. Rafael Celente reinforces this point by explaining that the challenge goes beyond modeling itself — it’s about generalization: “we have a massive dataset, and our hypothesis was that we could get a model to generalize customer behavior from that data.”

This limitation, combined with the need for models that learn directly from data, opens the door to foundation models in finance.

Treating financial data as language

The key paradigm shift lies in how we look at the data. Rather than treating transactions as isolated records, the idea is to interpret them as sequences with structure, context, and meaning — much like natural language.

Transformers operate on tokens and learn relationships between them. By converting transactions into tokenized sequences, we can capture behavioral patterns at a much deeper level. The model doesn’t care whether it’s processing words, pixels, or financial events — what matters is the relationships between these elements.
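As a toy illustration of that conversion (my own sketch, not Nubank's actual pipeline — the attribute names and bucketing scheme are made up), a transaction could be flattened into a token sequence like this:

```clojure
;; Toy illustration only -- not Nubank's pipeline. Each transaction's
;; attributes become discrete tokens the Transformer can attend over.
(require '[clojure.string :as str])

(defn transaction->tokens
  [{:keys [merchant category amount]}]
  (concat ["[TX]"]
          (str/split (str merchant) #"\s+")
          [(str "CAT_" (name category))
           ;; bucket the amount by order of magnitude
           (str "AMT_BUCKET_" (int (Math/log10 (max 1 amount))))]))

(transaction->tokens {:merchant "Corner Cafe" :category :dining :amount 42.0})
;; => ("[TX]" "Corner" "Cafe" "CAT_dining" "AMT_BUCKET_1")
```

A customer's history then becomes one long sequence of such tokens, which is exactly the input shape a Transformer expects.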

This flexibility is precisely what makes it possible to apply an architecture originally designed for natural language to an entirely different domain like finance.

What nuFormer is and why it matters

This is the context in which nuFormer was born — a foundation model developed by Nubank’s AI Core team to learn representations from financial data at scale. The goal isn’t to solve a single problem, but to build a reusable foundation for different applications across the bank. From these representations, we can improve use cases like fraud detection, product recommendation, and risk modeling.

The key differentiator is generalization. Instead of training a model from scratch for every problem, nuFormer learns a representation of financial behavior that can be reused across multiple contexts, giving different applications a shared starting point. 

As Arissa Yoshida explains in the episode, the vision behind this new kind of model is “to generalize and extract insight from raw, often unstructured data, and scale that across many different problems.”

Although the initial work started with transactions, the model quickly evolved to incorporate different data types. Today, the vision is multimodal — capable of integrating not just structured financial data, but also behavioral signals, in-app interactions, and other information sources. 

This broadens the model’s potential significantly: it moves beyond isolated events to represent a more complete picture of customer behavior, unlocking more sophisticated applications.

This evolution also connects to other AI Core initiatives, such as AI agents that leverage these representations to operate in real-world scenarios at scale. The team shared these examples in the posts “Building AI agents in practice with Clojure” and “Building AI agents for 131 million customers”, here on Building Nubank.

Engineering, data, and governance for foundation models

One of the most insightful parts of the conversation made clear that the biggest challenge isn’t the model itself — it’s the engineering required to make it work. Training a model of this scale demands robust infrastructure: well-structured pipelines, GPU management, and distributed training. But the real pain point shows up when you try to take it to production.

Transformer-based models tend to have higher latency, which can be a sensitive factor in financial applications. Still, with the right infrastructure and specialized teams, it’s possible to achieve performance levels comparable to traditional models. This reality highlights that the challenge extends beyond ML — it’s a systems problem that requires cross-functional collaboration. As Rafael Celente sums it up: “it’s not just a machine learning problem — it’s a systems problem.”

This complexity extends to the role of data and model evaluation. While training at scale is already a reality, ensuring models are learning correctly remains one of the biggest challenges. That involves building consistent data pipelines, continuous monitoring, and defining metrics aligned with business impact.

On top of that, the financial sector adds another layer of rigor: governance. Models must pass multiple rounds of validation before going to production, ensuring compliance with regulations and internal standards. In this landscape, building foundation models requires the joint effort of data engineering, infrastructure, evaluation, product, and business teams — ensuring solutions not only work, but deliver sustainable real-world impact.

Results and impact

Deploying these models into existing systems has driven significant gains in key metrics within just a few months — surpassing improvements that had been accumulated over years with traditional approaches.

These gains aren’t limited to a single use case. The model is already being applied across multiple fronts, including credit, lending, income prediction, and cross-sell, demonstrating that the approach can be reused across a variety of contexts within the bank.

This reusability doesn’t just accelerate the development of new solutions — it creates a multiplier effect, allowing different products to benefit from the same learning foundation.

Looking ahead, our ambition isn’t just to keep up with the state of the art — it’s to contribute to it. That means exploring new architectures, expanding multimodal capabilities, and continuing to share what we learn with the community. As discussed in the episode, the goal is to challenge the status quo and set new standards for AI in finance.

Our appearance on Data Hackers Podcast #122 underscores a central pillar of our strategy: foundation models are already being applied in practice to solve real problems in finance, with direct impact on how we build products, make decisions, and scale intelligence.

By applying Transformers to model financial habits at scale, Nubank is building an AI platform that learns directly from data and evolves continuously. nuFormer in production, with applications in credit and beyond, shows how this approach can expand horizontally and generate consistent value.

If you want to work on problems like these — dealing with data at scale, developing foundation models, and impacting over 131 million customers — we’re hiring on the AI Core team.

The post How Nubank Uses Transformers to Model Financial Habits at Scale appeared first on Building Nubank.

Permalink

Biff 2.0 sneak peek

I have for the past year or two been working on some large Biff changes, such as those discussed in Structuring large Clojure codebases with Biff and Biff support for XTDB v2 is in pre-release. Now that coding agents have gone mainstream (and in particular, now that I personally have started using them heavily), I've had a few more ideas for changes I'd like to make to Biff. And also thanks to coding agents, I've actually been able to make consistent progress instead of my Biff development time being bottlenecked by how late I can stay awake on weekend nights after my kids are sleeping. So we, fingers crossed, are getting close to some major Biff updates, and I figure I may as well slap a 2.0 label on it.

Here's what I've got in the works.

SQLite will be the default database

This is the biggest change. Biff will retain first-class support for XTDB, but it'll also have first-class support for SQLite, and I'll update the starter project to use SQLite by default. There will still be a (non-default) starter project that uses XTDB.

Biff has used XTDB since its (Biff's) initial release in 2020, back when the database was still called Crux. About a year ago I started working on migrating Biff from XTDB v1 to XTDB v2, which brings a whole new architecture, including column-oriented indexes that make analytical queries faster. Besides writing some Biff-specific helper code for XTDB v2, I migrated Yakread (a 10k-LOC article recommender system) to v2 and did a bunch of benchmarking for Yakread's queries. (A big thank you to the XTDB team who responded to lots of my questions during this time and also made a bunch of query optimizations!)

Long-story short: despite the optimizations, I had trouble getting Yakread's page load times to be as quick as I wanted. For the particular queries Yakread runs—which are mostly row-oriented—I've generally found v2's performance to be slower than v1. There is also a larger per-query latency overhead, perhaps another design tradeoff of the new architecture (you can still run v2 as an embedded node within your application process, but it’s designed primarily to be run on a separate machine like more traditional networked databases).

I also will admit that before this benchmarking exercise I had not actually used SQLite much, and I was unaware of how ridiculously fast it is. And one of the main downsides of SQLite when compared to XTDB—that SQLite is a mutable database—is mitigated by Litestream, which streams changes to object storage and lets you restore from (and even run ad-hoc queries on) historical snapshots saved with 30-second granularity.

I could see myself switching back to XTDB at some point in the future. It's still the early days for v2 and the XTDB team is doing lots of work, including on query performance. And SQLite's speed comes with tradeoffs:

  • Scaling beyond one machine is an unsolved problem. LiteFS can let you put SQLite nodes in a cluster where writes get forwarded to a single leader and changes are streamed to the other nodes. However, to use it with Litestream, you have to disable automatic leader failover. So you basically have to choose between high availability (HA) and point-in-time recovery (PITR).

  • SQLite only supports a few basic datatypes: ints, floats, strings, and blobs (byte arrays). A large part of my work in integrating SQLite into Biff has been to set up automatic data type coercion so you can use richer types (UUID, boolean, instant, enum, map/set/vector) in your schema without having to do manual coercion when reading and writing.

  • Litestream's snapshots-at-30-second-granularity is fine for recovering from bad transactions like a DELETE FROM without the WHERE, but it's less helpful than XTDB/Datomic for the debugging-weird-production-issues use case: you can't include a transaction ID or similar in your production logs and then re-run queries with 100% confidence that the results you're seeing are what the application saw when it e.g. threw an unexpected exception.
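I don't know exactly how Biff implements the automatic coercion layer mentioned in the second point, but the write-side half of the idea can be sketched in a few lines (hypothetical code; decoding needs the schema to know each column's target type, so it's omitted):

```clojure
;; Hypothetical sketch, not Biff's actual code: richer Clojure values
;; are encoded down to the four types SQLite understands before writing.
(defn encode-value [v]
  (cond
    (uuid? v)               (str v)        ; UUID          -> TEXT
    (boolean? v)            (if v 1 0)     ; boolean       -> INTEGER
    (inst? v)               (inst-ms v)    ; instant       -> INTEGER (epoch ms)
    (keyword? v)            (name v)       ; enum          -> TEXT
    (or (map? v) (set? v)
        (vector? v))        (pr-str v)     ; map/set/vector -> TEXT (EDN)
    :else                   v))            ; ints/floats/strings pass through

(encode-value #uuid "00000000-0000-0000-0000-000000000000")
;; => "00000000-0000-0000-0000-000000000000"
(encode-value #{:a :b})
;; => "#{:b :a}" (or "#{:a :b}" -- set print order is unspecified)
```

The point is that this mapping is mechanical and schema-driven, so the application code never touches it.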

I was chatting with Jeremy from the XTDB team last week, and he mentioned they've been working on having XTDB ingest changes directly from Postgres. It sounds like it shouldn't be much work to make that work with SQLite too, which means that you could stick an XTDB node alongside your SQLite-powered Biff app and then get more granular historical queries. Maybe XTDB could be a replacement for Litestream?

That could get even more interesting if eventually we can do the inverse as well, where data from our immutable XTDB log could be sent both to a bitemporal index for historical queries and also to SQLite "indexes"/databases for the application servers to use. That would solve the HA problem too.

Anyway. However it happens, I'm looking forward to the glorious future when we finally have an inside-out database that's fast for all query shapes, highly available, models time correctly, and can even do advanced things like let you put a UUID in it. In the meantime, I think SQLite is a reasonable default given Biff's focus on solo developers, and I would absolutely consider XTDB today for situations in which modeling time correctly is a top concern.

Alternate starter projects will get easier

Biff consists of a starter project, a bunch of helper code exposed through a single com.biffweb namespace, tooling for CLI tasks and deployment, and a big pile of documentation. The com.biffweb namespace is on its way out: I'll be publishing Biff helper code as individual libraries like com.biffweb.sqlite (and com.biffweb.xtdb), com.biffweb.authentication, com.biffweb.middleware, com.biffweb.config, etc.

Part of the motivation for this change is that Biff is more mature than it was five years ago and it's become more clear what the different cohesive parts of Biff should actually be. I started out with a single kitchen-sink library because splitting it up felt premature; I didn't think it would realistically make sense to use one of them outside a standard Biff project that would already be depending on all the Biff libraries anyway.

But over the past few months, I've been developing a couple new side projects from scratch without even using Biff. As I've done this, I've started extracting various things into standalone libraries, and this time I do see them as useful libraries in their own right. For example, the new biff.authentication library will be an easy way to add email-based authentication to any Clojure web app that uses Reitit—it even comes with a default sign-in page.

The other factor behind this change is agent-driven development. Mixing and matching different libraries has become so much easier that I briefly wondered if Biff was even needed anymore. Developing those new side projects via agent has disabused me of that notion: agents still need a lot of structure (e.g. in the form of these Biff libraries) to guide them. Even for starting new projects, why have everyone generate a different starter project via some prompt when you could have a single person generate the starter project, make sure it actually works, and then publish that?

That's still a meaningful change though: the effort required to create and maintain new project templates has decreased significantly. So I think it makes more sense for Biff to be split up into multiple libraries that can themselves be mixed-and-matched. I will myself provide Biff starter projects for SQLite and XTDB, respectively. If anyone else wants to make a Biff starter project variant with different library choices, they'll similarly be able to do that without much effort.

For vanity reasons, I'll need to continue having a single "main" Biff repo of some sort (did I mention Biff hit 1,000 github stars recently?). Maybe I'll have that repo be the default starter project.

New approaches for structuring application logic

Two of these Biff libraries that happen to contain some new stuff—instead of being a splitting-out of code that was already in Biff—are biff.graph, which lets you structure your domain model as a queryable graph, inspired by Pathom; and biff.fx, which helps you remove effectful code from your application logic via state machines.

Both libraries help you write purer code (and thus code that's easier to understand and test). biff.graph is a higher-level abstraction that helps with code that reads data. biff.fx is a lower-level thing that I mostly use when writing data. However they're also useful together: e.g. my GET request handlers are typically biff.fx machines that run a biff.graph query and pass the results to the (now pure) rendering code:

(def some-route
  ["/some-page/:id"
   {:get
    (fx/machine ::some-page

      :start
      (fn [{:keys [path-params] :as request}]
        {:stuff [:biff.fx/graph
                 {:stuff/id (parse-uuid (:id path-params))}
                 [:stuff/foo :stuff/bar]]
         :biff.fx/next :render-stuff})

      :render-stuff
      (fn [{:keys [stuff] :as request}]
        {:status 200
         :headers {"content-type" "text/html"}
         :body (render-html
                [:div "foo: " (:stuff/foo stuff)
                 ", bar: " (:stuff/bar stuff)])}))}])

biff.fx provides a defroute macro to make this kind of thing more concise, so the code I actually write looks more like this:

(fx/defroute some-page "/some-page/:id"
  [:biff.fx/graph
   {:params/stuff [:stuff/foo :stuff/bar]}]

  :get
  (fn [request stuff]
    [:div
     "foo: " (:stuff/foo stuff)
     ", bar: " (:stuff/bar stuff)]))

I'll save a fuller explanation for later; hopefully that gives you the flavor of what these libs do.

I've been using Pathom heavily over the past few years, both for work and pleasure. I've started referring to the code structure it enables as “data-oriented dependency injection.” It helps you structure your application in small easy-to-understand chunks that declare exactly what data they need as input and what data they provide as output. The main downside in my experience is that it can be difficult to understand exactly what Pathom is doing and debug when things go wrong.

For “serious” projects, that's a price worth paying. For the kinds of solo projects that Biff is aimed at, I've felt apprehensive about foisting another layer of abstraction on people for code structure benefits that they may or may not notice.

However, my own experience is that even for small apps, the benefit is real. So biff.graph is an attempt to provide the same graph computational model / “data-oriented dependency injection” with as small of an implementation as possible: biff.graph is about 400 lines of code currently, whereas Pathom is closer to 10k.

The main tradeoff I've made in service of that goal is to omit the query planning step that Pathom uses. biff.graph traverses directly over your input query, looking up which resolver(s) to call for each attribute as it goes. For each resolver, biff.graph runs what is more-or-less a separate query to get that resolver's inputs. This hopefully makes it easier to trace and understand what biff.graph is doing, but it also means biff.graph isn't able to optimize the query plan the way Pathom does. (biff.graph does at least support batch resolvers and caching.)

biff.fx is more of an original creation. Instead of a single function, you have a map of functions, one for each state. Effects happen in the transitions. You define global “fx handlers” that do things like HTTP requests, database queries/transactions, etc, represented by keywords (e.g. :biff.fx/graph in the example). I’ve changed up the format for describing effects a few times; I think I've finally landed on something that feels ergonomic ([:do-something arg1 arg2] as a replacement for (do-something! ctx arg1 arg2)).
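The effect-description idea can be sketched like this (hypothetical names, not biff.fx's actual API — the point is just that effects become data dispatched to a central handler map):

```clojure
;; Hypothetical sketch of keyword-dispatched effect handlers, in the
;; spirit of [:do-something arg1 arg2] replacing (do-something! ctx arg1 arg2).
(def fx-handlers
  {:log  (fn [ctx msg] (println msg) ctx)
   :incr (fn [ctx k] (update ctx k (fnil inc 0)))})

(defn run-fx
  "Apply a sequence of effect-description vectors to a context map."
  [ctx effects]
  (reduce (fn [ctx [fx-key & args]]
            (apply (fx-handlers fx-key) ctx args))
          ctx
          effects))

(run-fx {} [[:incr :count] [:incr :count] [:log "done"]])
;; => {:count 2}
```

Because the effects are plain data, state functions stay pure and you can test them by asserting on the vectors they return, without ever running the handlers.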

Authorization rules are so back

Biff entered this world as a replacement for Firebase, which I had enjoyed using but left me with the desire for a regular long-lived clojure backend. Firebase lets your frontend submit arbitrary transactions from the frontend, and then they're checked against some centralized authorization rules you define (e.g. “documents in the stuff table can only be edited if the current user's ID is the same as stuff.user_id”). I implemented a similar thing where you would submit transactions in a format similar to Firebase's, then I would translate them to XTDB's transaction format and pass a diff of the database changes to your authorization functions.

I ended up abandoning the SPA approach altogether for server-side rendering (with htmx), and that made authorization rules unnecessary since transactions were originating from the backend: I no longer needed to validate completely arbitrary transactions.

Once again, coding agents have changed the game. When working on mature codebases, of course we all read our generated code carefully before submitting a pull request. But when I've got a new app idea, I want to mostly just vibe code it until I get to the MVP. I'd like to be able to do a light review just to make sure the structure of the code is reasonable. With authorization rules, you can carefully review those central rules in the same way you'd carefully review the database schema, and then you can have confidence that the feature code isn't missing an authorization check. (Of course you still have to make sure the agent didn't bypass the authorization rules...)
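The shape of such rules is roughly a map from document type to a predicate over the user and the proposed write. This is a hypothetical sketch of the idea, not Biff's actual API:

```clojure
;; Hypothetical sketch of centralized, Firebase-style write rules.
;; Each rule sees the authenticated user and the proposed document.
(def write-rules
  {:stuff (fn [{:keys [uid]} doc]
            (= uid (:stuff/user-id doc)))})

(defn authorized-write?
  "Deny by default: a write is allowed only if a rule exists and passes."
  [user [doc-type doc]]
  (if-let [rule (write-rules doc-type)]
    (boolean (rule user doc))
    false))

(authorized-write? {:uid "u1"} [:stuff {:stuff/user-id "u1"}]) ;; => true
(authorized-write? {:uid "u2"} [:stuff {:stuff/user-id "u1"}]) ;; => false
```

Reviewing that one map is the write-side analogue of reviewing the database schema.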

This is only for writing data. For reading data, I typically have a few Pathom/biff.graph resolvers that e.g. read an entity ID from the incoming request's path parameters and ensure the user has access to that entity (like the :params/stuff resolver alluded to in the example above). Other related entities are queried as joins against that root entity, so if the authorization check fails, the rest of the query will fail too. So once again you have a way to put authorization logic in a central place that can be reused by your feature-specific code.

oh yeah and datastar

As mentioned above, Biff uses htmx. I like server-side rendering and I think it's a particularly good fit for Biff's solo developer focus. htmx however has a critical flaw: it's too popular. It has 47k github stars—that's half of what Tailwind has.

Datastar fixes this problem by being a much younger project—a niche of a niche. There is a much smaller chance that your colleagues will have heard of it. Datastar also has some smaller but still tangible benefits:

  • It has some frontend reactivity built in. With htmx, you typically use another tool like _hyperscript or Alpine.js to provide interactivity in cases where you really don't want to wait for a server roundtrip (e.g. a dropdown menu). Datastar has a concept of "signals" baked in so you don't need a second tool.
  • It has a smaller API surface; much of what htmx offers is replaced by "just use out-of-band swaps." So it might be easier to learn?
  • It works well for fancy CQRS stuff (still on my list of things to try out).
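The built-in signals mentioned in the first point look roughly like this, based on my reading of the Datastar docs (treat the attribute names as a sketch rather than gospel):

```html
<!-- Sketch of Datastar's built-in signals: a dropdown toggle with no
     server roundtrip and no second tool like Alpine.js. -->
<div data-signals="{open: false}">
  <button data-on-click="$open = !$open">Menu</button>
  <nav data-show="$open">
    <a href="/one">One</a>
    <a href="/two">Two</a>
  </nav>
</div>
```

With htmx you'd reach for _hyperscript or Alpine.js here; with Datastar the reactive state lives in the same library that does the server-driven swaps.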

Of the changes I've mentioned, this one is the most experimental. I actually haven't even made an official decision if I really will switch Biff from htmx to Datastar; at this point I'm just making a prediction that I probably will.

More broadly, I would like to explore how far I can push the server-side rendering model before I feel it breaking down. E.g. what approach would I use to handle forms with 50+ fields, lots of conditional logic, and complex validation? How about charts? (What I'm getting at: would I regret asking an LLM to migrate our large codebase at work over to htmx/Datastar?)


I’d like to give an honorable mention to inline snapshot testing which I’ve been excited about for a year and a half but now find unnecessary—counterproductive, even—with coding agents. I had started working on some updates to my test code so you could do inline snapshot tests in plain .clj files instead of in .edn files (turns out that tooling support is best when you put your code in files meant for code). But with coding agents, I’ve found that I don’t want tests that auto-update when the actual result changes: it’s too easy for agents to ignore new results that are obviously incorrect. And of course I don’t care if my coding agent finds updating unit tests to be tedious. So the test-related stuff that Biff does will be limited to making your application code more pure so you (your agent) can write dumb (is (= (f x) y)) tests. I might add some structure/patterns for integration tests, though.
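In other words, tests like these, which fail loudly instead of silently self-updating (slugify here is a made-up pure function, just for illustration):

```clojure
;; Plain, "dumb" assertions -- no snapshot machinery for an agent to
;; auto-accept. `slugify` is a hypothetical pure function under test.
(ns app.slug-test
  (:require [clojure.test :refer [deftest is]]
            [app.slug :refer [slugify]]))

(deftest slugify-test
  (is (= "hello-world" (slugify "Hello World")))
  (is (= "" (slugify ""))))
```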

Another change driven by coding agents, not a change to the code but a change to my philosophy: I'm more interested in smaller projects. As mentioned, my time for working on personal projects has been extremely limited until a few months ago. I've only ever had a single Biff project at a time that I have attempted to work on regularly; new projects started after the old one failed. So the primary use case I designed Biff for was “serious side projects,” applications that may be solo projects now but will definitely be bringing in a 6-figure income and fulfilling all your entrepreneurial desires at... some point. That one project is the only thing I've ever had a chance of having time for.

Now I can code up an MVP for something over a weekend without ever sitting down at my desk. I built an app that helps me find good Star Wars: Unlimited decks to play. I'm building a blogging platform next. After that maybe I'll build a music recommender system. Or a state legislation tracker/summarizer.

I'm having a blast. Maybe that will affect design decisions I make down the road? I certainly am interested in the use case of doing agent-driven development from a mobile device, so maybe expect something in that area.

Permalink

Exploring core.async.flow as an Agent Executor

I have been building an experimental graph-based AI agent runtime in Clojure called Ayatori. It is a side project where I try to stay hands-on with new ideas, in my favorite language. For Ayatori, I had in mind agents as graphs: nodes as functions, edges defining routing, and an executor tying it together.

This article is about exploring whether core.async.flow could replace the executor.

The agent model

In Ayatori, an agent is a data structure. It declares nodes, edges, capabilities, and dependencies.

{:nodes {:preprocess (fn [input] {:result {...}})
         :llm        {:type :llm :client ... :prompt "..."}
         :search     (fn [input] {:result {...}})}
 :edges {:preprocess :llm
         :llm        {:search :search :done :ayatori/done}
         :search     :llm}
 :caps  {:chat {:entry :preprocess}}
 :deps  [:external-cap]}

Nodes are either plain functions or typed maps for built-in behaviors like LLM invocation or fan-out. Edges are either a keyword (unconditional routing) or a map from a route key to a target node (conditional routing). Caps declare which entry points are externally callable. Deps declare which external capabilities the agent needs at runtime, resolved by the system at start time.

The executor’s job is to take this definition and run it.

What I was building by hand was a straightforward go-loop:

(defn execute [graph input]
  (async/go
    (loop [node (:entry graph)
           data input
           step 0]
      (when (>= step max-steps)
        (throw (ex-info "Max steps exceeded" {})))
      (let [result (<! (invoke-node graph node data))]
        (when (instance? Throwable result)
          (throw result))
        (if-let [next (resolve-route graph node result)]
          (recur (:next-node next) (:input next) (inc step))
          (extract-result result))))))

This worked well enough for an experiment. But I kept accumulating concerns around it: What if I need to pause mid-execution? What if a node produces faster than the next can consume? What if I want to stop one node without tearing down the whole graph? Each concern meant more code in the executor.


For readers unfamiliar with core.async.flow, the overview and guide are good starting points.


Mapping to core.async.flow

Thinking about those concerns (node lifecycle, channel wiring, backpressure, routing), the problem started to feel familiar. These are the kinds of concerns a runtime handles, not application code. Then I remembered core.async.flow. I had not found the time to look at it carefully when it was first announced. Working on this executor turned out to be a good reason to go back.

core.async.flow describes itself as a library for building “concurrent, event driven data processing flows out of communication-free functions, while centralizing control, reporting, execution and error handling.” The model is a directed graph of processes communicating via channels. You define step functions that transform data. The flow manages channel creation, process lifecycle, backpressure, and error handling.

Those were the same concerns accumulating in the executor. At first glance, the agent DSL and flow’s topology model seem to map well onto each other. An agent is a directed graph of computation steps. A flow is a directed graph of processes communicating through channels. The unit of work in both is a function that takes input and produces output. The connections are explicit and declared separately from the logic. That structural similarity is what made me want to try this.

  • Agent → Flow graph

  • Node → Process

  • Edge → Connection [[from :out] [to :in]]

  • Conditional edge → Multi-output process, route key selects port

  • Cap entry point → flow/inject target

  • Dep (cross-agent call) → IO process, blocking resolver

  • Fan-out node → Multi-output process with step state correlation

  • Graph result → Collector sink

The topology that I was managing imperatively in a go-loop would become a data structure passed to create-flow. Each node requires a step function and explicit port declarations, but the wiring and lifecycle are handled by the framework.

A simplified view of what that topology definition looks like:

{:procs {:preprocess {:proc (process #'preprocess-step)}
         :llm        {:proc (process #'llm-step) :args llm-config}
         :search     {:proc (process #'search-step)}
         :collector  {:proc (process (collector-step registry))}}
 :conns [[[:preprocess :out] [:llm :in]]
         [[:llm :search]    [:search :in]]
         [[:search :out]    [:llm :in]]
         [[:llm :done]      [:collector :in]]]}

The nodes, edges, and routing from the Ayatori definition translate directly into procs and conns.


The mismatch

flow’s model is fire-and-forget. A caller injects a message and the graph processes it downstream. There is no built-in way to wait for a result.

Ayatori needs callers to wait for a result. This mismatch was not a surprise. Flow is a poor fit for RPC-style request paths, as Alex Miller noted on Ask Clojure. What I wanted to find out was what bridging that gap would actually cost in practice.

What I did was build request/reply semantics on top of flow using a correlation ID.

When a caller invokes a cap, a UUID is generated and stored in a registry alongside a promise channel. The message carries that ID through the entire flow. The terminal node is a dedicated collector process that delivers results directly to the caller’s channel. I initially considered using ::flow/report for result delivery, but the guide describes it as a channel for unified logging across the flow. Using it for results would be working against its purpose, so I went with the collector process instead.

;; inject side
(let [corr-id (str (random-uuid))
      result-ch (async/promise-chan)]
  (swap! registry assoc corr-id {:ch result-ch})
  (flow/inject flow [entry-key :in] [{:data input :corr-id corr-id}])
  result-ch)

;; collector step
(defn- collector-step [registry]
  (flow/map->step
   {:describe (fn [] {:ins {:in ""}})
    :transform
    (fn [state _ msg]
      (when-let [{:keys [ch]} (get @registry (:corr-id msg))]
        (async/put! ch (:data msg))
        (swap! registry dissoc (:corr-id msg)))
      [state {}])}))

;; topology - terminal outputs route to collector
{:procs {...
         :collector {:proc (process (collector-step registry))}}
 :conns [...
         [[:llm :done] [:collector :in]]]}

Error handling follows the same pattern. Processes throw exceptions with the corr-id in ex-data. A consumer on the flow’s error channel extracts the corr-id and delivers the exception back to the correct caller. Without this, errors would go to the error channel but never reach the caller waiting on the promise.

(catch Throwable e
  (throw (ex-info "Node failed" {:corr-id corr-id} e)))

;; error handler
(let [corr-id (some-> err ::flow/ex ex-data :corr-id)]
  (when corr-id
    (deliver-result! registry corr-id (::flow/ex err))))

The corr-id is a plain string. In multi-node work, the same pattern can apply across process boundaries without structural change.

This also introduces external mutable state. The registry is an atom that lives outside the flow. Flow’s own model assumes communication-free functions. The collector process breaks that assumption: it reads from and writes to the registry. This feels like forcing the model rather than working with it.


Fan-out

Fan-out requires coordinating results from multiple branches before emitting a final output. A fan-out node broadcasts to N processes and waits for all of them to respond.

flow does not yet provide a built-in way to wait for results from multiple branches. Rich Hickey has noted a planned sync->map process that will handle exactly this. For now, step state serves as a workaround: the fan-out process accumulates branch results across invocations and emits only when all have arrived.

This comes with tradeoffs. If a branch never responds, the pending entry stays in state indefinitely. There is no timeout in this implementation. For an experiment on a single node this is acceptable, but it is something to address before going further.

Each broadcast generates a fan-out ID. The step state holds a pending map keyed by that ID. When all expected results arrive, the step emits the aggregate with the original corr-id.

;; step state init
([_params] {:pending {}})

;; on input - broadcast to all branches
([state :in msg]
 (let [fan-id  (random-uuid)
       outputs (into {} (map (fn [b] [b [{:data (:data msg) :fan-out-id fan-id}]])
                             branches))]
   [(assoc-in state [:pending fan-id] {:expected (set branches)
                                       :results  {}
                                       :corr-id  (:corr-id msg)})
    outputs]))

;; on branch result - collect and maybe emit
([state result-port msg]
 (let [fan-id  (:fan-out-id msg)
       branch  (parse-branch result-port)
       updated (update-in state [:pending fan-id :results] assoc branch (:data msg))]
   (if (all-branches-done? updated fan-id)
     [(dissoc-in updated [:pending fan-id])
      {:out [{:data    (get-in updated [:pending fan-id :results])
              :corr-id (get-in updated [:pending fan-id :corr-id])}]}]
     [updated {}])))



Streaming: the self-loop

The LLM node handles streaming responses. Tokens arrive on a channel as the provider sends them. The node needs to read from that channel continuously until the stream is done, then emit the final result.

The approach is a self-loop: the process routes output back to one of its own input ports. On each invocation, it reads one event from the stream channel and either loops back to read another or emits the final result.

;; on request arriving at :in
([state :in msg]
 (let [stream-ch (llm/start-stream config (:data msg) state)]
   [(assoc state :stream-ch stream-ch :corr-id (:corr-id msg))
    {::self-out [{:type :next}]}]))

;; on self-loop trigger at ::self-in
([state ::self-in _]
 (let [event (async/<!! (:stream-ch state))]
   (case (:type event)
     :delta [(update state :accumulated str (:delta event))
             {::token-out [{:delta (:delta event) :corr-id (:corr-id state)}]
              ::self-out  [{:type :next}]}]
     :done  [(dissoc state :stream-ch :corr-id :accumulated)
             {::done-out [{:data (:message event) :corr-id (:corr-id state)}]}])))

The process routes ::self-out back to ::self-in in the flow topology. This keeps the process self-contained: it does not need a reference to the flow itself, and the loop is visible in the topology definition rather than hidden inside imperative code.

Ayatori targets Java 21+. Processes declared with :workload :io run on virtual threads. That is why a blocking read inside the self-loop is not a practical concern. If flow were used on an older JVM, this implementation would need a different approach.

The flow guide says a step need not output at all: it can receive input, update its state, and emit nothing until it is ready. I think this self-loop pattern fits that intent. That said, the blocking read is working around flow’s async model rather than with it. It holds for now, but it is worth noting.


What I found

In this branch, lifecycle management that I would otherwise write by hand comes from the framework. Pause, resume, stop, and ping work across the entire graph or per process without additional code. Backpressure is automatic. The topology is still inspectable as data before execution begins.

(aya/pause-agent! sys :assistant)
(aya/ping-agent   sys :assistant)
(aya/resume-agent! sys :assistant)

These things could certainly be added to the hand-written executor. But each one takes time, and none of them is what this experiment is actually about. Each one was a solved problem I was solving again. The execution model is now more explicit: the topology is a data structure, the connections are declared, and the lifecycle is managed in one place. With core.async.flow-monitor, visualization comes for free as well. And the topology itself can be inspected as data:

(aya/describe-topology agent)
;; => {:procs {...} :conns [...] :entry-key :preprocess :deps #{...}}

Closing

Where flow worked well in this experiment: topology as data, lifecycle management, backpressure, observability. These came for free and replaced code I would have written by hand.

Where it required bridging: request/reply semantics, fan-out coordination, streaming. Each of those is outside what flow is designed for. The correlation pattern introduces external mutable state. The fan-out workaround has no timeout. The self-loop uses a blocking read inside the step function, bypassing flow’s assumption that step functions do not interact with channels directly.

Rich Hickey’s “Effective Programs” talk distinguishes between situated programs, long-running systems with state, context, and lifecycle, and transient programs, short-lived computations that start, do work, and finish. Ayatori’s runtime is situated. But its external API is transient: invoke a cap, get a result back. Flow handles the situated side well. That tension was always going to require bridging. Whether the bridge I built here is worth maintaining is what I am still working out.

Each workaround gave me the feeling of using flow outside its intended purpose. That might still be acceptable. I have not decided. What I did learn is what those costs actually look like in practice, which is what I set out to find.


The branch is not merged. The code is here: https://github.com/serefayar/ayatori/tree/flow

If you have worked with flow in a similar context, or think I am looking at this the wrong way, I would be glad to hear it.

Permalink

On sabotaging projects by overthinking, scope creep, and structural diffing

Hi friends,

I’ll be attending Babashka Conf on May 8 and Dutch Clojure Days on May 9. If you’re attending either (or just visiting Amsterdam), drop me a line!

On sabotaging projects by overthinking

When I have an idea for a project, it tends to go in one of these two directions:

  1. I just do it. Maybe I make a few minor revisions, but often it turns out exactly how I’d imagined and I’m happy.

  2. I think, “I should look for prior art”. There’s a lot of prior art, dealing with a much broader scope than I’d originally imagined. I start to wonder if I should incorporate that scope. Or perhaps try to build my thing on top of the existing sorta-nearby-solutions. Or maybe I should just use the popular thing. Although I could do a better job than that thing, if I put a bunch of time into it. But actually, I don’t want to maintain a big popular project, nor do I want to put that much time into this project. Uh oh, now I’ve spent a bunch of time, having neither addressed the original issue nor experienced the joy of creating something.

I prefer the first outcome, and I think the pivotal factor is how well I’ve internalized my own success criteria.

For example, last weekend I hosted my friend Marcin and we decided it’d be fun to do some woodworking, so we threw together this shelf and 3d-printed hangers for my kitchen:

a black shelf with a painted orange/pink edge and Ikea food bins hanging off the bottom

Absolute banger of a project:

  • brainstormed the design over coffee
  • did a few 3d-print iterations for the Ikea bin hangers (OnShape CAD, if you want to print your own)
  • used material leftover from my workbench
  • rounded the corner by eye with a palm sander
  • sealed the raw plywood edge with some leftover paint from a friend
  • done in a weekend

The main success criteria was to jam on woodworking with a friend, and that helped me not overthink the object-level success criteria: Just make a shelf for my exact kitchen!

In contrast, this past Friday I noticed difftastic did a poor job, so I decided to shop around for structural/semantic diff tools and related workflows (a topic I’ve never studied, that I’m increasingly interested in as I’m reviewing more and more LLM-generated code).

I spent 4 hours over the weekend researching existing tools (see my notes below), going through dark periods of both “semantic tree diffing is a PhD-level complex problem” and “why do all of these have MCP servers? I don’t want an MCP server”, before I came to my senses and remembered my original success criteria: I just want a nicer diffing workflow for myself in Emacs, I should just build it myself — should take about 4 hours.

I’m cautiously optimistic that, having had this realization and committing myself to a minimal scope, I’ll be able to knock out a prototype before running out of motivation.

However, other long-running interests of mine seem to be deep in the well of outcome #2.

That is, I’ve spent hundreds of hours on background research and little prototypes, but haven’t yet synthesized anything that addresses the original motivating issue.

It’s not quite that I regret that time — I do love learning by reading — but I have a nagging sense of unease that my inner critic (fear of failure?) is silencing my generative tendencies, keeping me from the much more enjoyable (and productive!) learning by doing.

I think in these cases the success criteria has been much fuzzier: Am I trying to replace my own usage of Rust/Clojure? Only for some subset of problems? Or is it that I actually just need a playground to learn about language design/implementation, and it’s fine if I don’t end up using it?

Ditto for CAD: Am I trying to replace my commercial CAD tool in favor of my own? Only for some subset of simple or particularly parametric parts? Do I care if it’s useful for others? Does my tool need to be legibly different from existing open-source tools?

It’s worth considering these questions, sure. But at the end of the day, I’d much rather have done a lot than have only considered a lot.

So I’m trying to embrace my inner clueless 20-year-old and just do things — even if some turn out to be “obviously bad” in hindsight, I’ll still be coming out ahead on net =D

Conservation of scope creep

Of course, there’s only so much time to “just do things”, and there’s a balance to be had. I’m not sure how many times I’ll re-learn YAGNI (“you ain’t gonna need it”) in my career, but I was reminded of it again after writing a bunch of code with an LLM agent, then eventually coming to my senses and throwing it all out.

I wanted a Finda-style filesystem-wide fuzzy path search for Emacs. Since I’ve built (by hand, typing the code myself!) this exact functionality before (walk filesystem to collect paths, index them by trigram, do fast fuzzy queries via bitmap intersections), I figured it’d only take a few hours to supervise an LLM to write all the code.

I started with a “plan mode” chat, and the LLM suggested a library, Nucleo, which turned up since I wrote Finda (10 years ago, eek!). I read through it, found it quite well-designed and documented, and decided to use it so I’d get its smart case and Unicode normalization functionality. (E.g., query foo matches Foo and foo, whereas query Foo won’t match foo; similarly for cafe and café.)

Finding a great library wasn’t the problem, the problem was that Nucleo also supported some extra functionality: anchors (^foo only matches at the beginning of a line).

This got me thinking about what that might mean in a corpus that consists entirely of file paths. Anchoring to the beginning of a line isn’t useful (everything starts with /), so I decided to try and interpret the anchors with respect to the path segments. E.g., ^foo would match /root/foobar/ but not /root/barfoo/.

But to do this efficiently, the index needs to keep track of segment boundaries so that the query can be checked against each segment quickly.

But then we also need to handle a slash occurring in an anchored query (e.g., ^foo/bar) since that wouldn’t get matched when only looking at segments individually (root, foo, bar, and baz of a matching path /root/foo/bar/baz/).

Working through this took several hours: first throwing around design ideas with an LLM, having it write code to wrap Nucleo’s types, then realizing its code was bloated and didn’t spark joy, so finally writing my own (smaller) wrapper.

Then, after a break, I realized:

  1. I can’t think of a situation where I’d ever wished Finda had anchor functionality
  2. In a corpus of paths, I can anchor by just adding / to the start or end of a query (this works for everything except anchoring to the end of a filename).

So I tossed all of the anchoring code.

I’m pretty sure I still came out ahead compared to if I’d tried to write everything myself sans LLM or discussion with others, but I’m not certain.

Perhaps there’s some kind of conservation law here: Any increases in programming speed will be offset by a corresponding increase in unnecessary features, rabbit holes, and diversions.

Structural diffing

Speaking of unnecessary diversions, let me tell you everything I’ve learned about structural diffing recently — if you have thoughts/feelings/references in this space, I’d love to hear about ‘em!

When we’re talking about code, a “diff” usually means a summary of the line-by-line changes between two versions of a file. This might be rendered as a “unified” view, where changed lines are prefixed with + or - to indicate whether they’re additions or deletions. For example:
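A reconstructed illustration of the kind of unified diff described here (the original post renders this as an image):

```diff
 milk
-coffee
 bread
+apple
```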

We’ve removed coffee and added apple.

The same diff might also be rendered in a side-by-side view, which can be easier to read when there are more complex changes:

The problem with these line-by-line diffs is that they’re not aware of higher-level structure like functions, types, etc. — if some braces match up somehow between versions, they might not be shown at all, even if the braces “belong” to different functions.

There’s a wonderful tool, difftastic, which tries to address this by calculating diffs using treesitter-provided concrete syntax trees. It’s a huge improvement over line-based diffs, but unfortunately it doesn’t always do a great job matching entities between versions.

Here’s the diff that motivated this entire foray:

Note that it doesn’t match up struct PendingClick: it’s shown as deleted on the left and added on the right.

I haven’t dug into why difftastic fails to match here, but I do feel like it’s wrong — even if the overall diff would be longer, I’d still rather see PendingClickRequest and PendingClick matched up between both sides.

Here’s a summary of tools / references in the space:

  • The most “baked” and thoughtful semantic diff tool I found is, perhaps unsurprisingly, semanticdiff.com, a small German company with a free VSCode plugin and web app that shows diffs for github PRs. Unfortunately they don’t have any code libraries I can use as a foundation for the workflow I want.

    Context-sensitive keywords in particular were a constant source of annoyance. The grammar looks correct, but it will fail to parse because of the way the lexer works. You don’t want your tool to abort just because someone named their parameter “async”.

  • diffsitter

    • built on treesitter, has MCP server. README includes list of similar projects.
    • lots of github stars, but doesn’t seem particularly well-documented; I couldn’t find an explanation of how it works, but the difftastic wiki says it “runs longest-common-subsequence on the leaves of the tree”
  • gumtree

    • research / academic origin in 2014
    • requires Java, so no-go for my use case of a quick tool I can use via Emacs
  • mergiraf: treesitter-based merge-driver written in rust

    • very nice architecture overview; tool uses Gumtree algorithm
    • docs and adorable illustrations indicate this project was clearly written by a thoughtful human
    • semanticdiff.com author in HN comments: > GumTree is good at returning a result quickly, but there are quite a few cases where it always returned bad matches for us, no matter how many follow-up papers with improvements we tried to implement. In the end we switched over to a dijkstra based approach that tries to minimize the cost of the mapping
  • weave: also a treesitter-based merge-driver written in Rust

    • feels a bit “HN-optimized” (flashy landing pages, lots of github stars, MCP server, etc.)
    • I looked into their entity extraction crate, sem
    • core diffing code is OK but pretty wordy
    • greedy entity matching algorithm
    • data model can’t detect intra-file moves, even though those might be significant
    • includes a lot of heuristic “impact” analysis, which feels like overreaching-scope to me since it’d require much tighter language integration before I’d trust it
      • ran into buggy output when running sem diff --verbose HEAD~4; it showed lines as having changed that…didn’t change at all.
    • Too much 80%-done, hypothetically useful functionality for me to use as a foundation, but props for sure to the undergrad/student(?) who’s built all this in just three months.
  • diffast: tree edit-distance of ASTs based on an algorithm from a 2008 academic paper.

  • autochrome: Clojure-specific diffs based on dynamic programming

    • excellent visual explanation and example walkthrough
  • Tristan Hume has a great article on Designing a Tree Diff Algorithm Using Dynamic Programming and A*

My primary use case is reviewing LLM output turn-by-turn — I’m very much in-the-loop, and I’m not letting my agent (or dozens of them, lol) run wild generating 10k+ lines of code at a time.

Rather, I give an agent a scoped task, then come back in a few minutes and want to see an overview of what it did and then either revise/tweak it manually in Emacs or throw the whole thing out and try again (or just write it myself).

The workflow I want, then, is to

  • see a high-level overview of the diff: what entities (types/functions/methods) were added/removed/changed?
  • quickly see textual diffs on an entity-by-entity basis (“expanding” parts of the above summary)
  • quickly edit any changes, without having to navigate elsewhere (i.e., do it inline, rather than having to switch from “diff” to “file”)

Basically, I want something like Magit’s workflow for reviewing and staging changes, but on an entity level rather than file/line level.

In light of the “minimal scope, just get your project done” lesson I’ve just re-learned for the nth time, my plan is to:

  • throw together my own treesitter-based entity extraction framework (just Rust for now)
  • do some simple greedy matching for now
  • render the diff to the command line

Once that seems reasonable (i.e., it does a better job than difftastic did on that specific commit), I’ll:

  • wire into a more interactive Magit-like Emacs workflow (maybe I can reuse Magit itself!?!)
  • add support for new languages, as I need them
  • potentially explore more sophisticated score-based global matching rather than simple greedy matching

Mayyybe if I’m happy with it I’ll end up releasing something. But I’m not trying to collect Github stars or HN karma, so I might just happily use it in the privacy of my own home without trying to “commercialize it”.

After all, sometimes I just want a shelf.

Misc. stuff

Permalink

Week Notes 2026.16

The Clojure documentary is released, taking inspiration from the Datomic Ions approach to logging and alerting, and configuring AWS Lambda schedules with Terraform.

Permalink

From Locks to Actors: The Four Pillars of Modern Concurrency

Most working engineers have spent ninety percent of their concurrent-programming life in one model: shared memory protected by locks. Threads that all see the same variables. Mutexes around the critical sections. Hope and care. It's the model every OS textbook teaches, every mainstream language supports, and every senior engineer has a horror story about.

It's also not the only option. Or even the best one, for many of the problems it gets used for. Three other models — CSP, actors, and software transactional memory — have been around for decades, mature enough for production, and each solves a class of problems that lock-based designs handle poorly.

This is a map of all four, from a working backend engineer who uses each of them for different jobs, and a take on when each is the right answer.


tl;dr — Concurrency has four viable pillars: shared memory + locks (threads, mutexes), CSP (channels, Go), actors (mailboxes, Erlang), and STM (transactional memory, Clojure). None is universally better. Each solves a different problem and has a different failure mode. Senior designs often mix three of them in one system. Mutex-for-everything works until it doesn't — usually at exactly the scale you promised you'd never reach.

Pillar 1: Shared Memory + Locks

The default. Threads, mutexes, atomics, condition variables. Every mainstream language has them.

How it works: multiple threads of execution share the same address space. They read and write the same data. Mutexes make sure only one thread touches a critical section at a time. Atomics do the same for single-word operations without a full lock.

Where it shines:

  • Simple shared counters and caches. atomic.AddInt64, sync.Map, LRU caches. The right tool.
  • Tight single-process coordination where the code is small enough for one person to hold in their head.
  • Performance-critical paths where the overhead of channel sends or actor dispatches is too much.

Failure modes:

  • Deadlocks. Two threads acquire locks in opposite order. Happens.
  • Priority inversion. Low-priority thread holds the lock, high-priority thread waits, work piles up.
  • Lock ordering bugs at scale. When N components each take M locks, the reasoning gets exponential.
  • Memory-model weirdness. What one thread writes, another may not immediately see. You start caring about happens-before, acquire/release semantics, and why volatile in Java is not what you thought.
  • Invisible races. The worst kind. Tests pass; production fails weirdly twice a month.

Use mutexes for small, localized shared state. Once the shared state has three collaborators or more, or a nontrivial invariant across fields, reach for one of the other models.

Pillar 2: CSP (Communicating Sequential Processes)

Tony Hoare's 1978 paper, popularized by Occam and now Go. The model Rob Pike and Ken Thompson picked for Go's concurrency.

How it works: processes don't share memory; they send messages on named channels. Senders and receivers rendezvous on the channel. Ownership of data moves with the message. "Do not communicate by sharing memory; share memory by communicating."

Where it shines:

  • Pipelines. Data flows through stages, each a goroutine, connected by channels. Clean to read.
  • Fan-out / fan-in. One producer, many workers, one aggregator. The channel topology is the architecture.
  • Backpressure. A bounded channel blocks the producer when full. No extra flow control needed.
  • Cancellation coordination. select with <-ctx.Done() is a clean primitive.
  • Lifecycle control. Closing a channel is a broadcast to every listener.

Failure modes:

  • Deadlocks remain possible. Two goroutines each waiting on the other's channel. Cycles in the channel graph are lethal.
  • Memory leaks via unclosed channels. A goroutine blocked on a send that will never be received lives forever.
  • Awkward request/reply. You end up passing a reply channel with each request, which works but feels verbose.
  • Order isn't free. Channel ordering is only per-channel. If you fan out and fan in, the aggregation is unordered unless you sort.

Use CSP for coordination-heavy designs. When the structure of "who's alive, who sends to whom, when do things stop" is the architecture, channels make that visible in the code.

Go is the obvious exemplar, but CSP-style is also available in Rust (crossbeam-channel, tokio::sync::mpsc), Kotlin (coroutines with channels), Python (asyncio.Queue), and C# (System.Threading.Channels).

Pillar 3: Actors

Carl Hewitt's 1973 paper. Made practical by Erlang (1986) and later Akka (Scala/Java). The model behind WhatsApp, a decade of telecom, and most fault-tolerant messaging infrastructure.

How it works: an actor is a named entity with private state and a mailbox. Other actors send messages to its address. Messages are processed one at a time from the mailbox. No shared memory. Parent actors supervise children; when a child crashes, the parent decides to restart, escalate, or ignore. Crashes are normal.

Where it shines:

  • Fault isolation at scale. One actor crashing is expected; it doesn't take down the system. Supervision hierarchies make "let it crash" a sensible engineering strategy.
  • Stateful services. Each actor holds its own state. Conceptually clean: no shared global state, no locks around it.
  • Location transparency. An actor can live in the same process, another process, or another machine. The sender doesn't know. This is where actors shine in distributed systems — the model scales across the network boundary natively.
  • Massive concurrency with stateful semantics. Erlang routinely runs millions of actors per node. Each is cheap.

Failure modes:

  • Mailbox unboundedness. If a producer sends faster than the actor can process, the mailbox grows without bound. Bounded mailboxes exist; use them.
  • Message-ordering assumptions break across the network. Within one node, delivery order is preserved per sender. Across nodes, all bets are off without explicit sequencing.
  • Testing is harder. Actors make their own state opaque; you test behavior through message exchange. Good frameworks help, but the habits needed are different from testing normal code.
  • Conceptual mismatch in CRUD-style backends. If your business logic is "select some rows, transform them, insert result," actors are overkill. They shine on long-lived stateful entities (a game character, a connected device, a user session), not on stateless request handlers.

Erlang and Elixir are the canonical runtimes. Akka brings actors to the JVM. Pony is a rare actor-first typed language. In Go, you can simulate actors with a goroutine + channel-as-mailbox pattern, but you lose Erlang's supervision and "let it crash" semantics unless you build them yourself.

Use actors when you have long-lived stateful entities with fault requirements. Telecom, messaging, multiplayer game servers, IoT device shadows, any system where "this particular entity has its own state machine, and we really care when it crashes" is the shape.

Pillar 4: Software Transactional Memory (STM)

Imagine database transactions, but for in-memory data. That's STM.

How it works: critical sections are wrapped in transactions. The runtime tracks reads and writes optimistically. On commit, if any data touched was modified by another transaction, the current one rolls back and retries. No explicit locks. Composability — two transactions can be combined into a larger one without redesigning the locking order.

Where it shines:

  • Composable concurrent code. Combining operations that were individually correct usually stays correct under STM. Lock-based code famously does not.
  • Read-mostly workloads. STM with multi-version concurrency control scales reads without blocking.
  • Avoiding the lock-ordering bug class. No locks, no deadlocks. The failure mode is retry storms, which are easier to reason about.

Failure modes:

  • I/O inside transactions is awful. Transactions may retry. If you did I/O, you may have done it multiple times. Either separate I/O from transactional state, or the runtime has to forbid I/O inside transactions (Haskell's STM monad does this at the type level).
  • Retry storms under contention. Heavy write contention on the same data means constant retries. In the worst case, throughput can be worse than locks.
  • Limited language support. Clojure (built-in), Haskell (STM), Scala (scala-stm), Rust (experimental stm crates). Not a mainstream feature of Go/Java/C#.

Clojure is the canonical "STM as a first-class citizen" language — its refs and transactions are idiomatic. Haskell's STM monad is arguably the cleanest realization. In other ecosystems, STM exists as libraries but hasn't displaced mutexes.

Use STM when the concurrent state is small-to-medium, the access pattern is read-heavy with occasional writes, and you want the composability. For the rare problems that fit, STM is strictly simpler to reason about than locks. For problems that don't fit (I/O-heavy, write-contention-heavy), STM is worse.

How Real Systems Mix Them

The surprise for engineers who've only used one model: mature systems mix three of them in one codebase.

A typical backend service I'd build today:

  • Mutexes / atomics for the inner loops — counters, caches, rate-limiter state, anything performance-critical with one clear owner.
  • Channels (CSP) for coordination — worker pools, pipelines, cancellation, shutdown signaling, bounded queues.
  • Actors (in a sense) for long-lived stateful entities — each connected client session, each in-flight request, each background job. In Go I'd model this as "one goroutine per entity, communicating via channels," which isn't formal actors but inherits the useful semantics: isolated state, message-passing, crash-isolation.

And I wouldn't use STM in that stack. Not because it's bad, but because the language runtime doesn't make it first-class. If I were writing Clojure, STM would be a natural fit for the in-memory state machines that would otherwise be locked maps.

The old "pick one concurrency model" debate was always a false choice. The real decision is per-problem: what shape is the concurrent work, what's the state-sharing pattern, and what failure semantics do I want.

Decision Guide

Quick map:

  • I have a counter that multiple goroutines read and update. → atomic or mutex.
  • I have a pipeline of work that flows through stages. → channels (CSP).
  • I have a fleet of long-lived sessions, each with its own state and lifetime. → actor pattern (goroutine + mailbox channel, or real actor framework).
  • I have a fleet of connected devices each with a state machine that must survive crashes. → actor framework with supervision (Erlang, Akka, or Go with explicit crash/restart logic).
  • I have complex shared state with nontrivial invariants across fields, and updates are occasional but important to compose. → STM if your language supports it; otherwise, lots of careful mutex discipline.
  • I have a request/response flow with fan-out to downstreams. → CSP with errgroup.WithContext.
  • I have no idea what I have. → Start with mutexes, switch when it hurts. Don't over-engineer the first version.

The Real Lesson

Most people who get bitten by concurrency bugs got bitten because they used the wrong model, not because they used it wrong. A mutex-heavy design for a workload that's really a pipeline is fragile. A channels-for-everything design when there's a shared counter underneath ends up with awkward rendezvous. An actors-everywhere design when the business is CRUD requests reads like over-engineering.

The four pillars aren't competing theories of concurrency. They're four tools, each good at specific jobs. Senior engineers know all four and reach for the right one. Junior engineers reach for the only one they know and force-fit it.

If your career so far has been mostly mutexes, spend a weekend reading the other three. Write a toy pipeline in Go channels. Read Erlang's supervision documentation. Play with Clojure refs. The investment pays back every time you sit in a design review and someone proposes locking their way out of a structural problem.


Runtime async

Starting with .NET 11, the .NET runtime now offers runtime support for async/await. The latest release of ClojureCLR (1.12.3-alpha6) provides experimental support for this feature. In this post, we’ll take a look at what async/await is, how it works on the CLR, and how ClojureCLR supports it.

The async feature

Async is a feature that allows methods to suspend their computation while waiting for a subcomputation to complete. Often, the subcomputation is something that requires waiting for an event, such as completion of an I/O operation. Suspending the computation means yielding control of the thread of execution, which frees the current thread to be used for other purposes during the wait, thus improving multi-threading efficiency.

To take advantage of async in C#, one marks the method’s signature with the async keyword. Within the method’s body, the await keyword can be used to mark the specific suspension points. A simple example:

public static class AsyncExample
{
    public static async Task<List<string>> GetData(List<string> filenames)
    {
        List<string> contents = new List<string>(filenames.Count);
        foreach (string filename in filenames)
        {
            contents.Add(await GetDataFromFile(filename));
        }
        return contents;
    }

    private static async Task<string> GetDataFromFile(string filename)
    {
       return await System.IO.File.ReadAllTextAsync(filename);
    }  
 
}

(No need for the class or the methods to be static, but that seems correct for this example.)

Several things to note:

  • Methods marked as async must have a return type derived from one of these:
    • Task
    • ValueTask
    • Task<TResult>
    • ValueTask<TResult>
  • There is contagion here. We defined GetDataFromFile as async. When we use await on it in the caller, the caller must be marked as async. (One can use methods from the Task library to avoid using await, but await is the typical mechanism.)

Runtime implementation

Prior to .NET 11, async and await in C# were implemented using code transformations performed by the compiler. A method marked async was rewritten as a finite-state machine. Suspension points required mechanisms to save and restore state around the await call. This consumed heap memory. And the resulting code caused things like stack traces in exceptions to be loaded with compiler-generated method names that were not very helpful to the programmer.

As of .NET 11, these heroic compiler efforts are no longer required. The CLR core runtime detects methods marked as async (the compiler needs to pass along that information in the method metadata). The compiler translates await calls into certain special method calls that the runtime recognizes. The runtime now has the capability of dealing with saving and restoring state on its own. Simpler, and also more efficient, according to benchmarks.

The story for ClojureCLR

Starting with version 1.12.3-alpha6, ClojureCLR provides experimental support for runtime async. A new library clojure.clr.async.task.alpha provides functions to help with this. Internally, the ClojureCLR compiler has been modified to pass along the critical metadata on async functions and to rewrite await calls to the special methods used by the runtime.

(This library is marked “alpha” because we are still experimenting with the API. Also, runtime async in .NET 11 is still in preview. In the final release, the library will be in namespace clojure.clr.async.task.)

It is helpful to require the namespace. From the sample file for this post, we start with:

(ns test.async-test
  (:require [clojure.clr.async.task.alpha :as t])
  (:import [System.Threading.Tasks Task]
           [System.IO Path File]
           [System.Threading CancellationToken]))

We have imported a few other classes for the example code. Task is useful. We will also find Task<Object> useful, so we define an alias for it.

(alias-type ObjTask |System.Threading.Tasks.Task`1[Object]|)

We’re going to do some file I/O in our examples, so it is helpful to have a few files to work with:

(def ^String in-file (Path/GetTempFileName))  
(def ^String out-file (Path/GetTempFileName))

(File/WriteAllText in-file "Some random content.")

Accessing a task result

If all you want to do is call an async method without awaiting it (that is, without yielding the thread), then you can use t/result. t/result will run a task if it is not already started/completed and wait to return its result. More typically we use it to extract the result of a task that has already completed, but it will start the task and wait if necessary.

(t/result (File/ReadAllTextAsync in-file CancellationToken/None))  ;; => "Some random content."

Async functions

If you want to use await to mark a suspension point where control is yielded, you need to be working in an async context. One way to provide that context is to defn a function with the ^:async tag. This marks all of its overloads (IFn.invoke methods) as async for the runtime. It also type hints the return type of the function as System.Threading.Tasks.Task<Object>. This applies to all the arities of the function.

(defn ^:async shout-it-out [infile outfile]
  (let [content (t/await (File/ReadAllTextAsync infile CancellationToken/None))
        capitalized (.ToUpper content)]
    (t/await (File/WriteAllTextAsync ^String outfile capitalized CancellationToken/None))
    "I'm done yelling."))

We have suspension points at the read and write calls, which are asynchronous I/O operations. The function will yield control at those points, allowing the thread to be used for other purposes while waiting for the I/O operations to complete. When the operations complete, the function will resume execution at the point of suspension.

Note that calling shout-it-out will return a Task<Object>.

(def t1 (shout-it-out in-file out-file))

(t/task? t1)  ; => true

To get the result of the task, we can use t/result. In this case, again, the t/result will start and wait on the task if it is not already completed.

(t/result t1) ; => "I'm done yelling."

If you want to use await in a function but don’t want the function itself to return a task, but just get on with things, you can use t/async to provide an :async context. t/async wraps its body in an ^:async (fn [] ...body...). This form returns a task, so you will need to perform a task operation on it to get work done. Typically, you will call t/result if you want to get the value from the awaited call.

(defn just-read [infile]
   (t/result (t/async (t/await (File/ReadAllTextAsync infile CancellationToken/None)))))

(just-read in-file) ;; => "Some random content."

The call to t/async returns a task, so we need to call t/result to get the value from the awaited call. If we had not done that, we would have gotten a Task<Object> back instead of the string content. Generally it is preferable to just define an ^:async function if you want to use await in it, but t/async can be useful when you want an anonymous async function, for example to construct tasks for use in wait-all or wait-any calls. (See below.)

Some utility functions

There are several utility functions to create basic tasks.

(t/->task 42)      ;; creates a Task<Object> that returns 42 when run
(t/completed-task) ;; creates a Task that is already completed
(t/delay-task 3000) ;; creates a Task that delays for 3000 milliseconds

You can run any zero-arg Clojure function as a task:

(t/result (t/run (fn [] (+ 1 2 3))))

(defn now [] DateTime/Now)
(t/result (t/run now))

Again, calling t/result on the result of t/run will start the task if it is not already started and wait for it to complete, returning the result. This does not take full advantage of suspension.

Waiting for one or for all

You can do wait-for-one and wait-for-all operations on a group of tasks. You can either just run the tasks or ask for their value(s).

  • (t/wait-all tasks): wait for all the tasks to complete; return nil
  • (t/wait-any tasks): start all tasks; return the first one to complete
  • (t/wait-all-results tasks): wait for all the tasks to complete; return a lazy sequence of their results
  • (t/wait-any-result tasks): return the result of the first task to complete

An example:

;; A little dummy function to delay and then return a value.

(defn ^:async delayed-value [msecs val]
  (t/await (t/delay-task msecs))
  val)
  
;; Just a little test to make sure things are taking time.

(time (t/result (delayed-value 4000 7)))  ;; => takes more than 4 seconds  

(t/wait-all-results [(delayed-value 2000 2000) 
                     (delayed-value 4000 4000) 
                     (delayed-value 6000 6000)]) ;; => (2000 4000 6000)
(t/wait-any-result [(delayed-value 2000 2000) 
                    (delayed-value 4000 4000) 
                    (delayed-value 6000 6000)])  ;; => 2000 (most likely)

(t/wait-all-results [(delayed-value 2000 2000) 
                     (delayed-value 4000 4000) 
                     (delayed-value 6000 6000)] 
                    500)                          ;; => nil  (times out)
(t/wait-any-result [(delayed-value 2000 2000) 
                    (delayed-value 4000 4000) 
                    (delayed-value 6000 6000)] 
                   500)                           ;; => nil  (times out)

Does it work?

Short answer: yeah.

I did some simple tests to look at things like thread affinity and flooding the thread pool. The simplest thing to do is to replace a call like (t/await ...) with (.Wait ...). The latter does not yield its thread; the difference in performance is notable.


Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.