Since it's “2-way,” I called it bisql with the bi- prefix.
It is pronounced like bicycle.
A common weakness of SQL-template libraries, not just 2-way SQL libraries, is that writing all SQL as templates can become tedious.
To address that, many SQL-template libraries support a query builder.
You can write some queries as templates and others with a builder.
But I think that approach is fundamentally inconsistent.
If every query function is maintained as SQL or SQL templates, the cost of reviewing and understanding the data access layer drops significantly. Every database access has a concrete representation as an actual SQL file.
Once a query builder gets mixed in, that consistency is gone.
And in practice, what starts as “we’ll only use the builder for simple queries” often drifts into using it for complex queries as well, until the generated SQL is no longer obviously what you intended.
Bisql takes a different approach:
every database access should be written as SQL.
That said, hand-writing even simple CRUD operations for every table is tedious. So Bisql takes another approach there: it generates a large and comprehensive set of typical CRUD queries automatically.
It connects to a real database, inspects the schema, considers indexes, and generates SQL templates for many index-friendly query patterns.
Then the defquery macro converts all of those .sql template files into Clojure functions at once.
So you keep the consistency of SQL-first development without the repetitive CRUD work.
Malli Support
In this release, generated SQL templates can now include :malli/in and :malli/out metadata declarations.
These hold schemas for:
the parameters passed to a query function
the response data returned from the query
Bisql SQL templates can already carry arbitrary metadata that becomes metadata on the generated query functions. Malli support builds on that.
If a query function has :malli/in and :malli/out metadata, Bisql can automatically run Malli validation during query execution. This behavior is configurable.
Bisql also generates a base Malli schema file for each table as schema.clj.
The :malli/in and :malli/out metadata refer to those generated schemas.
In other words, typical CRUD queries and their schemas can now be generated automatically, and validation can run transparently with very little manual work.
A Small Expression Language for if
This release also adds a small expression language for if conditions inside SQL templates.
That makes conditional rendering more expressive without leaving SQL templates or introducing a separate query builder layer.
I was explaining an idea from my book to a coworker. I was saying that total functions can reduce defensive coding and simplify code. My coworker commented that it’s kind of meaningless to talk about totality in an untyped language.
I don’t agree, but I do understand the idea. It is much harder to define a total function in an untyped language than in a typed language. When you’ve got types, you can say “A function is total if it returns a value (instead of throwing an error) for any arguments that pass the type checker.” It’s a straightforward definition. Or so it seems.
Totality is an idea from mathematics, specifically in the field of computability. A function is total if every combination of arguments in the domain of the function results in a value in its range. The domain and range are each sets of values. The domain specifies valid arguments while the range specifies valid return values.
And here we see a major problem: We’re using domain in two different ways. One is about the valid arguments to a function. The other is domain as in domain modeling, where it means the area of concern of your software. It’s tricky to talk about the first when in the context of the second.
Division is total in mathematics. Division’s domain is
ℝ × (ℝ ∖ {0})
and the range is ℝ. That means division is defined for pairs of real numbers where the second element is not zero. That’s what we learned in high school algebra.
But in programming, division is the prime example of a partial function. How come? Well, when you define the function in a typed language like this:
function divide(a: number, b: number): number
You get a function that can throw an exception—so it’s not total—even after it passes the type checker:
divide(1, 0) //=> Exception! Divide by zero.
What we see is that we’ve mapped the “set of values” idea of domain from math onto the “types of arguments” idea in programming languages. It’s an imperfect mapping. And it’s a mapping many of us have gladly accepted. The function is partial because the language cannot express the domain perfectly.
That’s why the whole concept of total functions is important. It gets us thinking about the gaps in our mappings. It gets us thinking about how to do a better mapping. Or about what kinds of checks on the arguments we should do before we call a function or after we get a return value. In short, it’s about safety and trust.
But notice that even in typed languages, there is a gap in the mapping from domain to types. The gap may be smaller than with untyped languages, but there can be a gap. My colleague said it’s meaningless, which implies a Church Typing mindset. It implies that because there are no checks for types, you could pass anything, such as:
(+ "a" 4) ;=> Exception! Expected Number but got String.
That means that even venerable addition is not total in Clojure. I think this is a little disingenuous. This is obviously an incorrect program, even if in general, incorrectness is hard to define in untyped languages. The correct set of values is well-known for + even if it’s not written down. And for any function, all we have to do is write it down—in documentation—and the intended domain is explicit.
I don’t want this essay to be a debate about static vs. dynamic typing. This essay is about totality and how to define it. Totality is really a concept that came from computability theory. It’s for asking questions like “does function f halt for all integer inputs.” The domain is specified by the problem, not the function. That’s why division is partial in programming: The domain is specified from outside, by the language’s choice of types.
In programming we use totality to consider the possible and probable inputs of a function. We concede that we must make imperfect tradeoffs between ideal mathematical objects and available language features. In most type systems, we can’t say “it’s a number, but it can’t be zero” and have the type checker ensure it’s correct.
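As a small sketch of this idea (in Python for illustration; the function name and the runtime check are my own, not from the essay), we can at least enforce the documented domain at the boundary, turning an undefined case into an explicit contract violation:

```python
def divide(a: float, b: float) -> float:
    """Divide a by b. Domain: b must be non-zero."""
    # The type system cannot express "non-zero number", so we enforce
    # the documented domain at runtime: calling with b == 0 signals a
    # contract violation rather than producing an undefined result.
    assert b != 0, "divide: b must be non-zero (domain violation)"
    return a / b
```

Within its explicitly stated domain, this function always returns a value; the assertion marks the edge of the domain rather than papering over it.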
So I’ve come up with a definition that takes the pragmatics of the idea into account:
A function is total if it returns a valid domain value for every combination of valid domain values as arguments.
In this definition, “valid domain value” is doing a lot of work. It’s purposefully vague. But it asks you, the domain modeler, to consider your domain (in the business domain sense) and the meaningful values in it. It asks you to consider what subset of those values are valid. These often don’t correspond perfectly to types in your language. Ideally there would be a perfect overlap between the domain of a function and the business domain values. Where the overlap is not perfect is where you should look.
I want programmers to think past types and peer deep into the domain. I want them to ask these questions:
What are the meaningful domain values that this function is meant to operate on?
Does that set constitute a cohesive concept?
Are there values that are conceptually cohesive that it won’t work for?
How can I map the set of domain values this function is defined on to language features to get some kind of safety guarantees?
This definition also has a curious effect. Let me demonstrate in Clojure. Let’s say I’m defining a function that should only be defined over non-negative numbers. I don’t have complex numbers in my business domain:
(defn square-root [x] …)
If I’m conscientious, I’ll add a docstring explaining the domain of the function (the set of arguments it is defined over):
(defn square-root "The square root of a non-negative number." [x] …)
Great! But I want the domain to be enforced in code. In Clojure, I might write:
(defn square-root "The square root of a non-negative number." [x] (assert (not (neg? x))) …)
Awesome! I’ve now made the function total! We’ve expressed the domain of the function adequately using the features of the language. Ironically, though, we have written code that is more likely to throw when totality is usually defined as throwing less often. But we’re also defining what are valid arguments. Calling (square-root -1) throws an AssertionError, indicating that you, the programmer, violated the contract. So it is total, considering the explicitly defined domain of the function.
These are the kinds of questions I wrestle with when writing a book. I know I use the idea of total function all the time when programming in Clojure. I think it’s valuable. But Clojure doesn’t have types, so what can it mean? Why do others believe it can’t mean anything? I think it’s important to get these ideas right. If you were ever wondering what takes so long to write a book, it’s reading, thinking, digging, conversing until I feel like I’ve got a handle on it.
When you’re working on software design, or any kind of design, there are always hard questions. You can’t rely on pat answers. Instead, you need conceptual models to give you different perspectives. Total functions is one of those. It helps you focus on the failure modes of your functions: Could this function be passed something that will break it? Will it ever return an unexpected value? What would cause it to throw an exception? It’s not about writing code according to some rule, or using a language feature in a particular pattern. You have to think broader than that.
Recently I started to look into the problem of reentry trajectory planning in the context of developing the sfsim space flight simulator.
I had looked into reinforcement learning before and even tried out Q-learning using the lunar lander reference environment of OpenAI’s gym library (now maintained by the Farama Foundation).
However, it had stability issues.
The algorithm would converge on a strategy and then suddenly diverge again.
More recently (2017) the Proximal Policy Optimization (PPO) algorithm was published and it has gained in popularity.
PPO is inspired by Trust Region Policy Optimization (TRPO) but is much easier to implement.
Also PPO handles continuous observation and action spaces which is important for control problems.
The Stable Baselines3 Python library has an implementation of PPO, TRPO, and other reinforcement learning algorithms.
However I found XinJingHao’s PPO implementation which is easier to follow.
In order to use PPO with a simulation environment implemented in Clojure, and also to get a better understanding of PPO, I decided to implement PPO in Clojure.
Dependencies
For this project we are using the following deps.edn file.
The Python setup is shown further down in this article.
To validate the implementation, we will implement the classical pendulum environment in Clojure.
In order to be able to switch environments, we define a protocol according to the environment abstract class used in OpenAI’s gym.
As in OpenAI’s gym, the angle is zero when the pendulum is pointing up.
Here a pendulum is initialised to be pointing down and have an angular velocity of 0.5 radians per second.
The motor is controlled using an input value between -1 and 1.
This value is simply multiplied by the maximum angular acceleration provided by the motor.
(defn motor-acceleration
  "Angular acceleration from motor"
  [control motor-acceleration]
  (* control motor-acceleration))
A simulation step of the pendulum is implemented using Euler integration.
(defn update-state
  "Perform simulation step of pendulum"
  ([{:keys [angle velocity t]} {:keys [control]} {:keys [dt motor gravitation length max-speed]}]
   (let [gravity      (pendulum-gravity gravitation length angle)
         motor        (motor-acceleration control motor)
         t            (+ t dt)
         acceleration (+ motor gravity)
         velocity     (max (- max-speed) (min max-speed (+ velocity (* acceleration dt))))
         angle        (+ angle (* velocity dt))]
     {:angle angle :velocity velocity :t t})))
Here are a few examples for advancing the state in different situations.
The observation of the pendulum state uses the cosine and sine of the angle to resolve the wrap-around problem of angles.
The angular speed is normalized to be between -1 and 1 as well.
This so-called feature scaling is done in order to improve convergence.
(defn observation
  "Get observation from state"
  [{:keys [angle velocity]} {:keys [max-speed]}]
  [(cos angle) (sin angle) (/ velocity max-speed)])
The observation of the pendulum is a vector with 3 elements.
Note that the observation needs to capture all information required for achieving the objective, because it is the only information available to the actor for deciding on the next action.
Action
The action of a pendulum is a vector with one element between 0 and 1.
The following method clips it and converts it to an action hashmap used by the pendulum environment.
Note that an action can consist of several values.
(defn action
  "Convert array to action"
  [array]
  {:control (max -1.0 (min 1.0 (- (* 2.0 (first array)) 1.0)))})
The following examples show how the action vector is mapped to a control input between -1 and 1.
The truncate method is used to stop a pendulum run after a specific amount of time.
(defn truncate?
  "Decide whether a run should be aborted"
  ([{:keys [t]} {:keys [timeout]}]
   (>= t timeout)))

(truncate? {:t 50.0} {:timeout 100.0})   ; false
(truncate? {:t 100.0} {:timeout 100.0})  ; true
It is also possible to define a termination condition.
For the pendulum environment we specify that it never terminates.
The following method normalizes an angle to be between -PI and +PI.
(defn normalize-angle
  "Angular deviation from up angle"
  [angle]
  (- (mod (+ angle PI) (* 2 PI)) PI))
We also need the square of a number.
(defn sqr
  "Square of number"
  [x]
  (* x x))
The reward function penalises deviation from the upright position, non-zero velocities, and non-zero control input.
Note that it is important that the reward function is continuous because machine learning uses gradient descent.
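As a rough illustration of such a reward (a pure-Python sketch; the exact penalty constants are my own assumptions, loosely following the classic Gymnasium pendulum reward rather than the article's code):

```python
from math import pi

def normalize_angle(angle):
    # Map angle into [-pi, pi), measuring deviation from the upright position
    return (angle + pi) % (2 * pi) - pi

def reward(angle, velocity, control):
    # Quadratic penalties on angle deviation, velocity, and control input
    # keep the reward continuous, which helps gradient-based learning.
    return -(normalize_angle(angle) ** 2
             + 0.1 * velocity ** 2
             + 0.001 * control ** 2)
```

The reward is zero only at the upright resting position with no control input, and smoothly decreases away from it.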
With Quil we can create an animation of the pendulum and react to mouse input.
(defn -main
  [& _args]
  (let [done-chan   (async/chan)
        last-action (atom {:control 0.0})]
    (q/sketch
      :title "Inverted Pendulum with Mouse Control"
      :size [854 480]
      :setup #(setup PI 0.0)
      :update (fn [state]
                (let [action {:control (min 1.0 (max -1.0 (- 1.0 (/ (q/mouse-x) (/ (q/width) 2.0)))))}
                      state  (update-state state action config)]
                  (when (done? state config)
                    (async/close! done-chan))
                  (reset! last-action action)
                  state))
      :draw #(draw-state % @last-action)
      :middleware [m/fun-mode]
      :on-close (fn [& _] (async/close! done-chan)))
    (async/<!! done-chan))
  (System/exit 0))
Neural Networks
PPO is a machine learning technique using backpropagation to learn the parameters of two neural networks.
The actor network takes an observation as an input and outputs the parameters of a probability distribution for sampling the next action to take.
The critic takes an observation as an input and outputs the expected cumulative reward for the current state.
Import PyTorch
For implementing the neural networks and backpropagation, we can use the Python-Clojure bridge libpython-clj2 and the PyTorch machine learning library.
The PyTorch library is quite comprehensive, is free software, and you can find a lot of documentation on how to use it.
The default version of PyTorch on pypi.org comes with CUDA (Nvidia) GPU support.
There are also PyTorch wheels provided by AMD which come with ROCm support.
Here we are going to use a CPU version of PyTorch which is a much smaller install.
You need to install Python 3.10 or later.
For package management we are going to use the uv package manager.
The following pyproject.toml file is used to install PyTorch and NumPy.
Note that we are specifying a custom repository index to get the CPU-only version of PyTorch.
Also we are using the system version of Python to prevent uv from trying to install its own version which lacks the _cython module.
To freeze the dependencies and create a uv.lock file, you need to run
uv lock
You can install the dependencies using
uv sync
In order to access PyTorch from Clojure you need to run the clj command via uv:
uv run clj
Now you should be able to import the Python modules using require-python.
A tensor with no dimensions can also be converted using toitem
(defn toitem
  "Convert torch scalar value to float"
  [tensor]
  (py. tensor item))

(toitem (tensor PI))  ; 3.1415927410125732
Critic Network
The critic network is a neural network with an input layer of size observation-size and two fully connected hidden layers of size hidden-units with tanh activation functions.
The critic output is a single value (an estimate for the expected cumulative return achievable by the given observed state).
When running inference, you need to run the network with gradient accumulation disabled, otherwise gradients get accumulated and can leak into a subsequent training step.
In Python this looks like this.
with torch.no_grad():
    # ...
Here we create a Clojure macro to do the same job.
(defmacro without-gradient
  "Execute body without gradient calculation"
  [& body]
  `(let [no-grad# (torch/no_grad)]
     (try
       (py. no-grad# ~'__enter__)
       ~@body
       (finally
         (py. no-grad# ~'__exit__ nil nil nil)))))
Now we can create a network and try it out.
We create a test multilayer perceptron with three inputs, two hidden layers of 8 units each, and one output.
(def critic (Critic 3 8))
Note that the network creates non-zero outputs because PyTorch performs random initialisation of the weights for us.
Training a neural network is done by defining a loss function.
The loss of the network then is calculated for a mini-batch of training data.
One can then use PyTorch’s backpropagation to compute the gradient of the loss value with respect to every single parameter of the network.
The gradient then is used to perform a gradient descent step.
A popular gradient descent method is the Adam optimizer.
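To make the loss → gradient → update cycle concrete, here is a minimal pure-Python sketch using plain gradient descent on a one-parameter linear model. PyTorch's autograd and the Adam optimizer automate and refine exactly these steps; the toy model and data below are my own illustration, not part of the article:

```python
def mse_loss(w, data):
    # Model: y = w * x; loss: mean squared error over the mini-batch
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def gradient(w, data):
    # Analytic derivative of the mean squared error with respect to w
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def sgd_step(w, data, lr=0.1):
    # One gradient descent step: move against the gradient
    return w - lr * gradient(w, data)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated with w = 2
w = 0.0
for _ in range(100):
    w = sgd_step(w, data)
```

After a few steps w converges towards 2, the parameter that minimizes the loss; Adam adds per-parameter adaptive step sizes and momentum on top of this basic scheme.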
The actor network for PPO takes an observation as an input and it outputs the parameters of a probability distribution over actions.
In addition to the forward pass, the actor network has a method deterministic_act to choose the expectation value of the distribution as a deterministic action.
Furthermore the actor network has a method get_dist to return a Torch distribution object which can be used to sample a random action or query the current log-probability of an action.
Here (as the default in XinJingHao’s PPO implementation) we use the Beta distribution with parameters alpha and beta both greater than 1.0.
See here for an interactive visualization of the Beta distribution.
(defn indeterministic-act
  "Sample action using actor network returning random action and log-probability"
  [actor]
  (fn indeterministic-act-with-actor [observation]
    (without-gradient
      (let [dist    (py. actor get_dist (tensor observation))
            sample  (py. dist sample)
            action  (torch/clamp sample 0.0 1.0)
            logprob (py. dist log_prob action)]
        {:action (tolist action) :logprob (tolist logprob)}))))
We create a test multilayer perceptron with three inputs, two hidden layers of 8 units each, and two outputs which serve as parameters for the Beta distribution.
(def actor (Actor 3 8 1))
One can then use the network to get the parameters of the distribution for a given observation.
We can also query the current log-probability of a previously sampled action.
(defn logprob-of-action
  "Get log probability of action"
  [actor]
  (fn [observation action]
    (let [dist (py. actor get_dist observation)]
      (py. dist log_prob action))))
Here is a plot of the probability density function (PDF) of the actor output for a single observation.
(without-gradient
  (let [actions (range 0.0 1.0 0.01)
        logprob (fn [action] (tolist ((logprob-of-action actor) (tensor [-1 0 0]) (tensor action))))
        scatter (tc/dataset {:x actions
                             :y (map (fn [action] (exp (first (logprob [action])))) actions)})]
    (-> scatter
        (plotly/base {:=title "Actor output for a single observation" :=mode :lines})
        (plotly/layer-point {:=x :x :=y :y}))))
Finally we can also query the entropy of the distribution.
By incorporating the entropy into the loss function later on, we can encourage exploration and prevent the probability density function from collapsing.
(defn entropy-of-distribution
  "Get entropy of distribution"
  [actor observation]
  (let [dist (py. actor get_dist observation)]
    (py. dist entropy)))

(without-gradient (entropy-of-distribution actor (tensor [-1 0 0])))  ; tensor([-0.0825])
Proximal Policy Optimization
Sampling data
In order to perform optimization, we sample the environment using the current policy (indeterministic action using actor).
(defn sample-environment
  "Collect trajectory data from environment"
  [environment-factory policy size]
  (loop [state             (environment-factory)
         observations      []
         actions           []
         logprobs          []
         next-observations []
         rewards           []
         dones             []
         truncates         []
         i                 size]
    (if (pos? i)
      (let [observation      (environment-observation state)
            sample           (policy observation)
            action           (:action sample)
            logprob          (:logprob sample)
            reward           (environment-reward state action)
            done             (environment-done? state)
            truncate         (environment-truncate? state)
            next-state       (if (or done truncate)
                               (environment-factory)
                               (environment-update state action))
            next-observation (environment-observation next-state)]
        (recur next-state
               (conj observations observation)
               (conj actions action)
               (conj logprobs logprob)
               (conj next-observations next-observation)
               (conj rewards reward)
               (conj dones done)
               (conj truncates truncate)
               (dec i)))
      {:observations observations
       :actions actions
       :logprobs logprobs
       :next-observations next-observations
       :rewards rewards
       :dones dones
       :truncates truncates})))
Here, for example, we are sampling 3 consecutive states of the pendulum.
If we are in state s_t and take an action a_t at timestep t, we receive reward r_t and end up in state s_{t+1}.
The cumulative reward for state s_t is a (finite or infinite) discounted sum using a discount factor γ<1: G_t = r_t + γ r_{t+1} + γ² r_{t+2} + …
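The discounted return can be computed by folding from the end of the reward sequence (a pure-Python sketch for illustration):

```python
def discounted_return(rewards, gamma):
    # G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    # Working backwards turns the sum into a simple recursion:
    # G_t = r_t + gamma * G_{t+1}
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

discounted_return([1.0, 1.0, 1.0], 0.5)  # 1 + 0.5 + 0.25 = 1.75
```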
The critic V estimates the expected cumulative reward for starting from the specified state.
In particular, the difference between the discounted value estimates can be used to get an estimate for the individual reward: r_t ≈ V(s_t) − γ V(s_{t+1}).
The deviation of the individual reward received in state s_t from the expected reward is: δ_t = r_t + γ V(s_{t+1}) − V(s_t)
The special case where a time series is “done” (and the next one is started) uses 0 as the remaining expected cumulative reward.
If we have a sample set with a sequence of T states (t=0,1,…,T−1), one can compute the cumulative advantage for each time step going backwards: A_t = δ_t + γ A_{t+1}, with A_T = 0.
I.e. we can compute the cumulative advantages as A_t = δ_t + γ δ_{t+1} + γ² δ_{t+2} + … + γ^(T−1−t) δ_{T−1}.
PPO uses an additional factor λ≤1 called Generalized Advantage Estimation (GAE), giving the recursion A_t = δ_t + γ λ A_{t+1}, which can be used to steer the training towards more immediate rewards if there are stability issues.
See Schulman et al. for more details.
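The δ values and the backward advantage recursion can be sketched in pure Python (this mirrors the structure of the Clojure code in this article, but is simplified and uses my own argument names):

```python
def deltas(rewards, values, next_values, dones, gamma):
    # delta_t = r_t + gamma * V(s_{t+1}) - V(s_t);
    # a terminated episode contributes no future value
    return [r + (0.0 if d else gamma * nv) - v
            for r, v, nv, d in zip(rewards, values, next_values, dones)]

def advantages(deltas_, dones, gamma, lam):
    # Backward recursion: A_t = delta_t + gamma * lambda * A_{t+1},
    # restarting the accumulation at episode boundaries
    adv, out = 0.0, []
    for delta, done in zip(reversed(deltas_), reversed(dones)):
        adv = delta + (0.0 if done else gamma * lam * adv)
        out.append(adv)
    return out[::-1]
```

With γλ = 0.5 and all deltas equal to 1.0, the advantages going backwards in time are 1.0, 1.5, 1.75, …, approaching 2.0.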
Implementation of Deltas
The code for computing the δ values follows here:
(defn deltas
  "Compute difference between actual reward plus discounted estimate of next state and estimated value of current state"
  [{:keys [observations next-observations rewards dones]} critic gamma]
  (mapv (fn [observation next-observation reward done]
          (- (+ reward (if done 0.0 (* gamma (critic next-observation))))
             (critic observation)))
        observations next-observations rewards dones))
If the reward is zero and the critic outputs constant zero, there is no difference between the expected and received reward.
If the reward is 1.0 and the difference of critic outputs is also 1.0, then there is no difference between the expected and received reward (when γ = 1).
If the next critic value is 1.0 and discounted with 0.5 and the current critic value is 2.0, we expect a reward of 1.5.
If we only get a reward of 1.0, the difference is -0.5.
The advantages can be computed in an elegant way using reductions and the previously computed deltas.
(defn advantages
  "Compute advantages attributed to each action"
  [{:keys [dones truncates]} deltas gamma lambda]
  (vec (reverse (rest (reductions
                        (fn [advantage [delta done truncate]]
                          (+ delta (if (or done truncate) 0.0 (* gamma lambda advantage))))
                        0.0
                        (reverse (map vector deltas dones truncates)))))))
For example, when all deltas are 1.0 and using a discount factor of 0.5, the advantages approach 2.0 asymptotically when going backwards in time.
When an episode is terminated (or truncated), the accumulation of advantages starts again when going backwards in time.
I.e. the computation of advantages does not distinguish between terminated and truncated episodes (unlike the deltas).
We add the advantages to the batch of samples with the following function.
(defn assoc-advantages
  "Associate advantages with batch of samples"
  [critic gamma lambda batch]
  (let [deltas     (deltas batch critic gamma)
        advantages (advantages batch deltas gamma lambda)]
    (assoc batch :advantages advantages)))
Critic Loss Function
The target values for the critic are simply the current values plus the new advantages.
The target values can be computed using PyTorch’s add function.
(defn critic-target
  "Determine target values for critic"
  [{:keys [observations advantages]} critic]
  (without-gradient
    (torch/add (critic observations) advantages)))
We add the critic targets to the batch of samples with the following function.
(defn assoc-critic-target
  "Associate critic target values with batch of samples"
  [critic batch]
  (let [target (critic-target batch critic)]
    (assoc batch :critic-target target)))
If we add the target values to the samples, we can compute the critic loss for a batch of samples as follows.
(defn critic-loss
  "Compute loss value for batch of samples and critic"
  [samples critic]
  (let [criterion (mse-loss)
        loss      (criterion (critic (:observations samples)) (:critic-target samples))]
    loss))
Actor Loss Function
The core of the actor loss function relies on the action probability ratio of using the updated and the old policy (actor network output).
The ratio is defined as r_t(θ) = π_θ(a_t|s_t) / π_θ_old(a_t|s_t).
Note that r_t(θ) here refers to the probability ratio as opposed to the reward of the previous section.
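Since the networks output log-probabilities, the ratio is obtained by exponentiating the difference of log-probabilities (a pure-Python sketch for a single action):

```python
from math import exp, log

def probability_ratio(logprob_new, logprob_old):
    # r_t(theta) = pi_new(a|s) / pi_old(a|s) = exp(log pi_new - log pi_old)
    # Working in log space avoids numerical underflow of small probabilities.
    return exp(logprob_new - logprob_old)
```

A ratio above 1 means the updated policy makes the sampled action more likely than the old policy did.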
The sampled observations, log probabilities, and actions are combined with the actor’s parameter-dependent log probabilities.
(defn probability-ratios
  "Probability ratios for actions using updated policy and old policy"
  [{:keys [observations logprobs actions]} logprob-of-action]
  (let [updated-logprobs (logprob-of-action observations actions)]
    (torch/exp (py. (torch/sub updated-logprobs logprobs) sum 1))))
The objective is to increase the probability of actions which lead to a positive advantage and reduce the probability of actions which lead to a negative advantage.
I.e. maximising the objective function L(θ) = E_t[r_t(θ) A_t].
The core idea of PPO is to use clipped probability ratios in the loss function in order to increase stability.
The probability ratio is clipped to stay below 1+ε for positive advantages and to stay above 1-ε for negative advantages.
Because PyTorch minimizes a loss, we need to negate the objective function above.
(defn clipped-surrogate-loss
  "Clipped surrogate loss (negative objective)"
  [probability-ratios advantages epsilon]
  (torch/mean
    (torch/neg
      (torch/min (torch/mul probability-ratios advantages)
                 (torch/mul (torch/clamp probability-ratios (- 1.0 epsilon) (+ 1.0 epsilon))
                            advantages)))))
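For a single sample, the same clipped loss can be written in pure Python (a sketch for illustration, independent of PyTorch):

```python
def clipped_surrogate(ratio, advantage, epsilon):
    # PPO objective for one sample: min(r * A, clip(r, 1-eps, 1+eps) * A).
    # The clipping removes the incentive to move the ratio far from 1.
    clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    # Negated because optimizers minimize a loss.
    return -min(ratio * advantage, clipped * advantage)
```

For a positive advantage the objective is capped once the ratio exceeds 1+ε; for a negative advantage it is capped once the ratio drops below 1−ε.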
We can plot the objective function for a single action and a positive advantage.
(without-gradient
  (let [ratios  (range 0.0 2.0 0.01)
        loss    (fn [ratio advantage epsilon]
                  (toitem (torch/neg (clipped-surrogate-loss (tensor ratio) (tensor advantage) epsilon))))
        scatter (tc/dataset {:x ratios
                             :y (map (fn [ratio] (loss ratio 0.5 0.2)) ratios)})]
    (-> scatter
        (plotly/base {:=title "Objective Function for Positive Advantage" :=mode :lines})
        (plotly/layer-point {:=x :x :=y :y}))))
And for a negative advantage.
(without-gradient
  (let [ratios  (range 0.0 2.0 0.01)
        loss    (fn [ratio advantage epsilon]
                  (toitem (torch/neg (clipped-surrogate-loss (tensor ratio) (tensor advantage) epsilon))))
        scatter (tc/dataset {:x ratios
                             :y (map (fn [ratio] (loss ratio -0.5 0.2)) ratios)})]
    (-> scatter
        (plotly/base {:=title "Objective Function for Negative Advantage" :=mode :lines})
        (plotly/layer-point {:=x :x :=y :y}))))
We can now implement the actor loss function which we want to minimize.
The loss function uses the clipped surrogate loss function as defined above.
The loss function also penalises low entropy values of the distributions output by the actor in order to encourage exploration.
(defn actor-loss
  "Compute loss value for batch of samples and actor"
  [samples actor epsilon entropy-factor]
  (let [ratios         (probability-ratios samples (logprob-of-action actor))
        entropy        (torch/mul entropy-factor
                                  (torch/neg (torch/mean (entropy-of-distribution actor (:observations samples)))))
        surrogate-loss (clipped-surrogate-loss ratios (:advantages samples) epsilon)]
    (torch/add surrogate-loss entropy)))
A notable detail in XinJingHao’s PPO implementation is that the advantage values used in the actor loss (not in the critic loss!) are normalized.
The data required for training needs to be converted to PyTorch tensors.
(defn tensor-batch
  "Convert batch to Torch tensors"
  [batch]
  {:observations (tensor (:observations batch))
   :logprobs     (tensor (:logprobs batch))
   :actions      (tensor (:actions batch))
   :advantages   (tensor (:advantages batch))})
Furthermore it is good practice to shuffle the samples.
This ensures that samples early and late in the sequence are not treated differently.
Note that you need to shuffle after computing the advantages, because the computation of the advantages relies on the order of the samples.
We separate the generation of random indices to facilitate unit testing of the shuffling function.
(defn random-order
  "Create a list of randomly ordered indices"
  [n]
  (shuffle (range n)))

(defn shuffle-samples
  "Random shuffle of samples"
  ([samples]
   (shuffle-samples samples (random-order (python/len (first (vals samples))))))
  ([samples indices]
   (zipmap (keys samples)
           (map #(torch/index_select % 0 (torch/tensor indices)) (vals samples)))))
(defn sample-with-advantage-and-critic-target
  "Create batches of samples and add advantages and critic target values"
  [environment-factory actor critic size batch-size gamma lambda]
  (->> (sample-environment environment-factory (indeterministic-act actor) size)
       (assoc-advantages (critic-observation critic) gamma lambda)
       tensor-batch
       (assoc-critic-target critic)
       normalize-advantages
       shuffle-samples
       (create-batches batch-size)))
PPO Main Loop
Now we can implement the PPO main loop.
The outer loop samples the environment using the current actor (i.e. policy) and computes the data required for training.
The inner loop performs a small number of updates using the samples from the outer loop.
Each update step performs a gradient descent update for the actor and a gradient descent update for the critic.
Another detail from XinJingHao’s PPO implementation is that the gradient norm for the actor update is clipped.
At the end of the loop, the smoothed loss values are displayed, together with the deterministic actions and entropies for a few observations, which helps with parameter tuning.
Furthermore the entropy factor is slowly lowered so that the policy reduces exploration over time.
The actor and critic model are saved to disk after each checkpoint.
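The structure of the outer and inner loop described above can be sketched in Python (the function names here are placeholders of my own, not the article's actual API; sampling, the update rules, entropy decay, and checkpointing are abstracted away):

```python
def ppo_loop(sample, update_actor, update_critic, iterations, epochs):
    # Outer loop: sample the environment with the current policy.
    # Inner loop: perform a small number of updates on the same batch.
    history = []
    for _ in range(iterations):
        batch = sample()
        for _ in range(epochs):
            actor_loss = update_actor(batch)    # gradient step for the actor
            critic_loss = update_critic(batch)  # gradient step for the critic
            history.append((actor_loss, critic_loss))
    return history
```

Reusing each sampled batch for several update epochs is what makes the clipped ratio necessary: without clipping, repeated updates could push the policy too far from the one that generated the data.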
This image shows the motor control input as a function of pendulum angle and angular velocity.
As one can see, the pendulum is decelerated when the speed is high (dark values at the top of the image).
Near the centre of the image (speed zero and angle zero) one can see how the pendulum is accelerated when the angle is negative and the speed small and decelerated when the angle is positive and the speed is small.
Also the image is not symmetrical, since a symmetrical policy would not start swinging the pendulum up when it points straight down (left and right boundary of the image).
Automated Pendulum
The pendulum implementation can now be updated to use the actor instead of the mouse position as motor input when the mouse button is pressed.
(defn-main[&_args](let[actor(Actor3641)done-chan(async/chan)last-action(atom{:control0.0})](when(.exists(java.io.File."actor.pt"))(py.actorload_state_dict(torch/load"actor.pt")))(q/sketch:title"Inverted Pendulum with Mouse Control":size[854480]:setup#(setupPI0.0):update(fn[state](let[observation(observationstateconfig)action(if(q/mouse-pressed?)(action(tolist(py.actordeterministic_act(tensorobservation)))){:control(min1.0(max-1.0(-1.0(/(q/mouse-x)(/(q/width)2.0)))))})state(update-statestateactionconfig)](when(done?stateconfig)(async/close!done-chan))(reset!last-actionaction)state)):draw#(draw-state%@last-action):middleware[m/fun-mode]:on-close(fn[&_](async/close!done-chan)))(async/<!!done-chan))(System/exit0))
Here is a small demo video of the pendulum being controlled using the actor network.
You can find a repository with the code of this article as well as unit tests at github.com/wedesoft/ppo.
The world is going through changes: in programming, technology, work, and in
specific countries and regions, each with its own form of trouble, hope, or
confusion.
People in Clojure communities, like elsewhere, are finding their way through
it, sometimes with questions and sometimes with a sense of being alone in it.
The Clojure Community Check-In is a space to share how we’re doing.
September 30 – October 2, 2026
Charlotte Convention Center, Charlotte, NC
Join us for the largest gathering of Clojure developers in the world! Meet new
people and reconnect with old friends. Enjoy two full days of talks, a day of workshops, social events, and more.
cljam - Clojure interpreter with a tokenizer, reader, macro expander, evaluator, incremental compiler, vite plugin, nREPL server compatible with calva on vscode, embedded browser REPL, CLI compatible with node and bun as host
bisql - Keep SQL executable, call it as Clojure functions 🚲️
miniforge-standards - Shared engineering standards for all miniforge.ai repositories
cljs-mjml - Write MJML email templates with Hiccup syntax in ClojureScript (or Node Babashka)
spel 0.9.5 - Idiomatic Clojure wrapper for Playwright. Browser automation, API testing, Allure reporting, and native CLI - for Chromium, Firefox, and WebKit
dataspex 2026.04.1 - See the shape of your data: point-and-click Clojure(Script) data browser
meme-clj 5.0.0 - meme-clj — M-Expressions with Macro Expansion
charm.clj 0.2.71 - A Clojure TUI (Terminal User Interface) library inspired by Bubble Tea
clj-xref 0.1.1 - LLM-friendly cross-reference database for Clojure code. Query who-calls, calls-who, who-implements, ns-deps to feed precise dependency neighborhoods to AI assistants instead of entire source trees. Built on clj-kondo.
babashka 1.12.218 - Native, fast starting Clojure interpreter for scripting
fs 0.5.33 - File system utility library for Clojure
phel-lang 0.34.1 - A functional, Lisp-inspired language that compiles to PHP. Inspired by Clojure, Phel brings macros, persistent data structures, and expressive functional idioms to the PHP ecosystem.
Over the years, I’ve experimented with almost every flavor of development environment. I’ve gone from manually provisioning tools on a Mac—hoping I’d remember every brew install six months later—to exploring Docker, Nix, and remote environments.
My journey has touched it all: asdf, Brew, Docker, Nix, and Devbox. I’ve jumped between terminal emulators and multiplexers like Tmux, Kitty, WezTerm, and Zellij.
My Modern Development Environment
My latest setup is built for speed, reproducibility, and a “keyboard-first” philosophy. It lives entirely in the terminal across two environments: my local iMac and an OCI Ampere VPS.
Terminal: Ghostty
Multiplexer: Zellij
Environment Management: Devenv
Editor: Doom Emacs
Achieving CI Parity with Self-Hosted Runners
If you haven’t switched to a self-hosted GitHub runner yet, do it for your own sanity. You can thank me later.
By running your CI on your own hardware (like an OCI Ampere instance), you eliminate the overhead of public runners and gain full control over the environment. When paired with Devenv, your CI environment becomes an exact mirror of your local machine.
How to set it up:
Navigate to your GitHub repository Settings.
Go to Actions -> Runners.
Click New self-hosted runner and follow the configuration steps for your OS.
Update your workflow .yml file to use the self-hosted label.
Quantifying the Impact on Feedback Loops
By moving to a self-hosted ARM64 runner, my feedback loop became incredibly tight. My tests now finish in 24 seconds, and the entire image creation process takes just 1 minute and 22 seconds.
Here is what the streamlined job looks like:
jobs:
  test:
    runs-on:
      - self-hosted
      - Linux
      - ARM64
    steps:
      - uses: actions/checkout@v5
        with:
          fetch-depth: 0
      - name: Run tests
        id: run_tests
        run: devenv shell clojure -X:test
The Power of Declarative Environments
Because I’m using devenv shell, I don’t have to worry about whether the CI runner has Clojure, the right JDK, or specific libraries installed. If it works in my local terminal, it works in the CI. Period.
Final Thoughts: Simplify Your Workflow
The way we use GitHub Actions today is often redundant. We spend an enormous amount of time writing complex YAML configurations to install dependencies, manage versions, and configure caching—essentially re-architecting our entire development environment for every single commit.
Provisioning and caching are solved problems. If you are using tools like Devenv or Nix, you’ve already defined exactly what your project needs to run. By moving to a self-hosted runner, you stop fighting the CI and start using it as a natural extension of your workstation. You gain:
Total Parity: If the code runs in your local devenv shell, it will run in CI. No exceptions.
Instant Caching: Since the runner is persistent, you don’t need to upload or download massive cache blobs; the dependencies are already there.
Minimal Configuration: Your workflow files shrink from dozens of lines of setup boilerplate to a single command.
It’s time to stop treating CI like a special snowflake and start treating it like the high-performance terminal it should be. Stop provisioning twice, stop waiting for public runners, and start shipping faster.
Would you like to have a follow-up on this topic? What are your thoughts? I’d love to hear your experiences.
What if, instead of relying on manual feature engineering, we could learn directly from raw financial behavior at scale?
That was the thesis we set out to defend on episode 122 of the Data Hackers podcast, Brazil’s largest Data and AI community. Founded in 2018, the community brings together thousands of data professionals and thought leaders to discuss the cutting edge of technology.
In conversation with hosts Monique Femme and Paulo Vasconcellos, Arissa Yoshida and Rafael Celente, Senior Research Engineers at Nubank, walked through the breakthroughs behind the paper “Your Spending Needs Attention: Modeling Financial Habits with Transformers” and how this research is already making its way into production.
The starting point is straightforward: financial institutions sit on massive volumes of data — transactions, in-app events, customer interactions — yet extracting real value from this data remains a hard problem. Its sequential, unstructured nature has historically pushed teams toward tabular models built on hand-crafted features.
The paper charts a different course: leveraging Transformer-based architectures and self-supervised learning to build representations directly from raw data. This work gave rise to nuFormer, a model that blends structured and textual transaction attributes and supports fine-tuning for tasks like credit scoring, fraud detection, and product recommendation — delivering measurable gains at scale.
From traditional machine learning to foundation models
To appreciate why this matters, consider where the industry started. For years, traditional ML models — particularly tree-based methods paired with heavy feature engineering — dominated financial applications. These models remain effective, but they hit a ceiling when the problem involves large volumes of unstructured data and the need to capture complex temporal patterns.
At Nubank, where we have an extraordinarily rich dataset — especially long sequences of financial transactions — this limitation becomes hard to ignore. As Arissa Yoshida puts it, these traditional approaches lean heavily on a manual, specialized step of variable construction.
“With traditional models, you rely heavily on handcrafted features — essentially building an entire engineering pipeline to extract value from your data. That requires people with deep domain expertise who can manually work through the data.”
Arissa Yoshida, Senior Machine Learning Engineer at Nubank
This dependency makes the process less scalable and more expensive, particularly as data volume and complexity grow. Rafael Celente reinforces this point by explaining that the challenge goes beyond modeling itself — it’s about generalization: “we have a massive dataset, and our hypothesis was that we could get a model to generalize customer behavior from that data.”
This limitation, combined with the need for models that learn directly from data, opens the door to foundation models in finance.
Treating financial data as language
The key paradigm shift lies in how we look at the data. Rather than treating transactions as isolated records, the idea is to interpret them as sequences with structure, context, and meaning — much like natural language.
Transformers operate on tokens and learn relationships between them. By converting transactions into tokenized sequences, we can capture behavioral patterns at a much deeper level. The model doesn’t care whether it’s processing words, pixels, or financial events — what matters is the relationships between these elements.
This flexibility is precisely what makes it possible to apply an architecture originally designed for natural language to an entirely different domain like finance.
What nuFormer is and why it matters
This is the context in which nuFormer was born — a foundation model developed by Nubank’s AI Core team to learn representations from financial data at scale. The goal isn’t to solve a single problem, but to build a reusable foundation for different applications across the bank. From these representations, we can improve use cases like fraud detection, product recommendation, and risk modeling.
The key differentiator is generalization. Instead of training a model from scratch for every problem, nuFormer learns a representation of financial behavior that can be reused across multiple contexts, giving different applications a shared starting point.
As Arissa Yoshida explains in the episode, the vision behind this new kind of model is “to generalize and extract insight from raw, often unstructured data, and scale that across many different problems.”
Although the initial work started with transactions, the model quickly evolved to incorporate different data types. Today, the vision is multimodal — capable of integrating not just structured financial data, but also behavioral signals, in-app interactions, and other information sources.
This broadens the model’s potential significantly: it moves beyond isolated events to represent a more complete picture of customer behavior, unlocking more sophisticated applications.
This evolution also connects to other AI Core initiatives, such as AI agents that leverage these representations to operate in real-world scenarios at scale. The team shared these examples in the posts “Building AI agents in practice with Clojure” and “Building AI agents for 131 million customers”, here on Building Nubank.
Engineering, data, and governance for foundation models
One of the most insightful parts of the conversation made clear that the biggest challenge isn’t the model itself — it’s the engineering required to make it work. Training a model of this scale demands robust infrastructure: well-structured pipelines, GPU management, and distributed training. But the real pain point shows up when you try to take it to production.
Transformer-based models tend to have higher latency, which can be a sensitive factor in financial applications. Still, with the right infrastructure and specialized teams, it’s possible to achieve performance levels comparable to traditional models. This reality highlights that the challenge extends beyond ML — it’s a systems problem that requires cross-functional collaboration. As Rafael Celente sums it up: “it’s not just a machine learning problem — it’s a systems problem.”
This complexity extends to the role of data and model evaluation. While training at scale is already a reality, ensuring models are learning correctly remains one of the biggest challenges. That involves building consistent data pipelines, continuous monitoring, and defining metrics aligned with business impact.
On top of that, the financial sector adds another layer of rigor: governance. Models must pass multiple rounds of validation before going to production, ensuring compliance with regulations and internal standards. In this landscape, building foundation models requires the joint effort of data engineering, infrastructure, evaluation, product, and business teams — ensuring solutions not only work, but deliver sustainable real-world impact.
Results and impact
Deploying these models into existing systems has driven significant gains in key metrics within just a few months — surpassing improvements that had been accumulated over years with traditional approaches.
These gains aren’t limited to a single use case. The model is already being applied across multiple fronts, including credit, lending, income prediction, and cross-sell, demonstrating that the approach can be reused across a variety of contexts within the bank.
This reusability doesn’t just accelerate the development of new solutions — it creates a multiplier effect, allowing different products to benefit from the same learning foundation.
Looking ahead, our ambition isn’t just to keep up with the state of the art — it’s to contribute to it. That means exploring new architectures, expanding multimodal capabilities, and continuing to share what we learn with the community. As discussed in the episode, the goal is to challenge the status quo and set new standards for AI in finance.
Our appearance on Data Hackers Podcast #122 underscores a central pillar of our strategy: foundation models are already being applied in practice to solve real problems in finance, with direct impact on how we build products, make decisions, and scale intelligence.
By applying Transformers to model financial habits at scale, Nubank is building an AI platform that learns directly from data and evolves continuously. nuFormer in production, with applications in credit and beyond, shows how this approach can expand horizontally and generate consistent value.
If you want to work on problems like these — dealing with data at scale, developing foundation models, and impacting over 131 million customers — we’re hiring on the AI Core team.
I have for the past year or two been working on some large Biff changes, such as those discussed in
Structuring large Clojure codebases with Biff
and Biff support for XTDB v2 is in pre-release. Now that
coding agents have gone mainstream (and in particular, now that I personally have started using them
heavily), I've had a few more ideas for changes I'd like to make to Biff. And also thanks to coding
agents, I've actually been able to make consistent progress instead of my Biff development time
being bottlenecked by how late I can stay awake on weekend nights after my kids are sleeping. So we,
fingers crossed, are getting close to some major Biff updates, and I figure I may as well slap a 2.0
label on it.
Here's what I've got in the works.
SQLite will be the default database
This is the biggest change. Biff will retain first-class support for XTDB, but it'll also have
first-class support for SQLite, and I'll update the starter project to use SQLite by default. There
will still be a (non-default) starter project that uses XTDB.
Biff has used XTDB since its (Biff's) initial release in 2020, back when the database was still
called Crux. About a year ago I started working on migrating Biff from XTDB v1 to XTDB v2, which
brings a whole new architecture, including column-oriented indexes that make analytical queries
faster. Besides writing some Biff-specific helper code for XTDB v2, I migrated
Yakread (a 10k-LOC article recommender system) to v2 and did a bunch of
benchmarking for Yakread's queries. (A big thank you to the XTDB team who responded to lots of my
questions during this time and also made a bunch of query optimizations!)
Long story short: despite the optimizations, I had trouble getting Yakread's page load times to be
as quick as I wanted. For the particular queries Yakread runs—which are mostly row-oriented—I've
generally found v2's performance to be slower than v1. There is also a larger per-query latency
overhead, perhaps another design tradeoff of the new architecture (you can still run v2 as an
embedded node within your application process, but it’s designed primarily to be run on a separate
machine like more traditional networked databases).
I also will admit that before this benchmarking exercise I had not actually used SQLite much, and I
was unaware of how ridiculously fast it is. And one of the main downsides of SQLite when compared
to XTDB—that SQLite is a mutable database—is mitigated by Litestream, which streams
changes to object storage and lets you restore from (and even run ad-hoc
queries on) historical snapshots saved with 30-second
granularity.
I could see myself switching back to XTDB at some point in the future. It's still the early days for
v2 and the XTDB team is doing lots of work, including on query performance. And SQLite's speed comes
with tradeoffs:
Scaling beyond one machine is an unsolved problem. LiteFS can let you put SQLite nodes in a
cluster where writes get forwarded to a single leader and changes are streamed to the other nodes.
However, to use it with Litestream, you have to disable automatic leader
failover. So you basically
have to choose between high availability (HA) and point-in-time recovery (PITR).
SQLite only supports a few basic datatypes: ints, floats, strings, and blobs (byte arrays). A
large part of my work in integrating SQLite into Biff has been to set up automatic data type
coercion so you can use richer types (UUID, boolean, instant, enum, map/set/vector) in your schema
without having to do manual coercion when reading and writing.
Litestream's snapshots-at-30-second-granularity is fine for recovering from bad transactions like
a DELETE FROM without the WHERE, but it's less helpful than XTDB/Datomic for the
debugging-weird-production-issues use case: you can't include a transaction ID or similar in your
production logs and then re-run queries with 100% confidence that the results you're seeing are
what the application saw when it e.g. threw an unexpected exception.
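The coercion layer mentioned in the second tradeoff could look roughly like the following. This is an illustrative sketch, not Biff's actual implementation: rich values are encoded to SQLite-friendly primitives on write and decoded back on read, driven by a schema type.

```clojure
(require '[clojure.edn :as edn])

;; Sketch of schema-driven type coercion for SQLite (illustrative;
;; not Biff's actual code). SQLite only stores ints, floats, strings,
;; and blobs, so richer types are round-tripped through those.
(defn encode-value [type v]
  (case type
    :uuid    (str v)
    :boolean (if v 1 0)
    :instant (.toEpochMilli ^java.time.Instant v)
    :edn     (pr-str v)                     ; maps/sets/vectors as strings
    v))

(defn decode-value [type v]
  (case type
    :uuid    (java.util.UUID/fromString v)
    :boolean (= 1 v)
    :instant (java.time.Instant/ofEpochMilli v)
    :edn     (edn/read-string v)
    v))
```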
I was chatting with Jeremy from the XTDB team last week, and he mentioned they've been working on
having XTDB ingest changes directly from Postgres. It sounds like it shouldn't be much work to make
that work with SQLite too, which means that you could stick an XTDB node alongside your
SQLite-powered Biff app and then get more granular historical queries. Maybe XTDB could be a
replacement for Litestream?
That could get even more interesting if eventually we can do the inverse as well, where data from
our immutable XTDB log could be sent both to a bitemporal index for historical queries and also to
SQLite "indexes"/databases for the application servers to use. That would solve the HA problem too.
Anyway. However it happens, I'm looking forward to the glorious future when we finally have an
inside-out database
that's fast for all query shapes, highly available, models time correctly, and can even do advanced
things like let you put a UUID in it. In the meantime, I think SQLite is a reasonable default given
Biff's focus on solo developers, and I would absolutely consider XTDB today for situations in which
modeling time correctly is a top concern.
Alternate starter projects will get easier
Biff consists of a starter project, a bunch of helper code exposed through a single com.biffweb
namespace, tooling for CLI tasks and deployment, and a big pile of documentation. The
com.biffweb namespace is on its way out: I'll be publishing Biff helper code as individual
libraries like com.biffweb.sqlite (and com.biffweb.xtdb), com.biffweb.authentication,
com.biffweb.middleware, com.biffweb.config, etc.
Part of the motivation for this change is that Biff is more mature than it was five years ago and
it's become more clear what the different cohesive parts of Biff should actually be. I started out
with a single kitchen-sink library because splitting it up felt premature; I didn't think it would
realistically make sense to use one of them outside a standard Biff project that would already be
depending on all the Biff libraries anyway.
But over the past few months, I've been developing a couple new side projects from scratch without
even using Biff. As I've done this, I've started extracting various things into standalone
libraries, and this time I do see them as useful libraries in their own right. For example, the
new biff.authentication library will be an easy way to add email-based authentication to any Clojure
web app that uses Reitit—it even comes with a default sign-in page.
The other factor behind this change is agent-driven development. The difficulty of
mixing and matching different libraries has dropped dramatically, to the point where I wondered
briefly if Biff was even needed anymore. Developing those new side projects via agent has disabused
me of that notion: agents still need a lot of structure (e.g. in the form of these Biff libraries)
to guide them. Even for starting new projects, why have everyone generate a different starter
project via some prompt when you could have a single person generate the starter project, make sure
it actually works, and then publish that?
That's still a meaningful change though: the effort required to create and maintain new project
templates has decreased significantly. So I think it makes more sense for Biff to be split up into
multiple libraries that can themselves be mixed-and-matched. I will myself provide Biff starter
projects for SQLite and XTDB, respectively. If anyone else wants to make a Biff starter project
variant with different library choices, they'll similarly be able to do that without much effort.
For vanity reasons, I'll need to continue having a single "main" Biff repo of some sort (did I
mention Biff hit 1,000 github stars recently?). Maybe I'll have that repo be the default starter
project.
New approaches for structuring application logic
Two of these Biff libraries that happen to contain some new stuff—instead of being a
splitting-out of code that was already in Biff—are
biff.graph, which lets you structure your domain model
as a queryable graph, inspired by Pathom; and biff.fx,
which helps you remove effectful code from your application logic via state machines.
Both libraries help you write purer code (and thus code that's easier to understand and test).
biff.graph is a higher-level abstraction that helps with code that reads data.
biff.fx is a lower-level thing that I mostly use when writing data. However
they're also useful together: e.g. my GET request handlers are typically biff.fx
machines that run a biff.graph query and pass the results to the (now pure) rendering code.
I'll save a fuller explanation for later; hopefully that gives you the flavor of what these libs do.
I've been using Pathom heavily over the past few years, both for work and pleasure. I've started
referring to the code structure it enables as “data-oriented dependency injection.” It helps you
structure your application in small easy-to-understand chunks that declare exactly what data they
need as input and what data they provide as output. The main downside in my experience is that it
can be difficult to understand exactly what Pathom is doing and debug when things go wrong.
For “serious” projects, that's a price worth paying. For the kinds of solo projects that Biff is
aimed at, I've felt apprehensive about foisting another layer of abstraction on people for code
structure benefits that they may or may not notice.
However, my own experience is that even for small apps, the benefit is real. So biff.graph is an
attempt to provide the same graph computational model / “data-oriented dependency injection” with as
small of an implementation as possible: biff.graph is about 400 lines of code currently, whereas
Pathom is closer to 10k.
The main tradeoff I've made in service of that goal is to omit the query planning step that Pathom
uses. biff.graph traverses directly over your input query, looking up which resolver(s) to call for
each attribute as it goes. For each resolver, biff.graph runs what is more-or-less a separate query
to get that resolver's inputs. This hopefully makes biff.graph easier to trace and understand what
it's doing, but it also means biff.graph isn't able to optimize the query plan the way Pathom does.
(biff.graph does support batch resolvers and caching at least).
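A toy version of the planner-free traversal described above might look like this. It is illustrative only, not biff.graph's actual implementation: each resolver declares the attributes it needs and provides, and resolving an attribute first recursively resolves that resolver's inputs, with the env map doubling as a cache.

```clojure
;; Illustrative sketch (not biff.graph's actual API): walk the query
;; attribute by attribute, looking up a resolver for each attribute
;; and resolving its declared inputs first.
(defn resolve-attr [resolvers env attr]
  (if (contains? env attr)
    env
    (let [{:keys [input resolve-fn]}
          (first (filter #(some #{attr} (:output %)) resolvers))
          ;; recursively resolve this resolver's own inputs first
          env (reduce (partial resolve-attr resolvers) env input)]
      (merge env (resolve-fn (select-keys env input))))))
```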
biff.fx is more of an original creation. Instead of a single function, you have a
map of functions, one for each state. Effects happen in the transitions. You define global “fx
handlers” that do things like HTTP requests, database queries/transactions, etc, represented by
keywords (e.g. :biff.fx/graph in the example). I’ve changed up the format
for describing effects a few times; I think I've finally landed on something that feels ergonomic
([:do-something arg1 arg2] as a replacement for (do-something! ctx arg1 arg2)).
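To make the shape of this concrete, here is one minimal interpretation of the model described above. This is my own reading, not biff.fx's actual API: each state function is pure and returns the next state plus a vector of effect descriptions, and a registry of fx handlers performs those effects between transitions.

```clojure
;; Minimal interpretation of the state-machine-with-effects model
;; (illustrative; not biff.fx's actual API). Effects are data like
;; [:do-something arg1 arg2]; handlers perform them between states.
(defn run-machine [state-fns fx-handlers start ctx]
  (loop [state start, ctx ctx]
    (if-let [state-fn (get state-fns state)]
      (let [{:keys [next effects]} (state-fn ctx)
            ctx (reduce (fn [c [fx & args]]
                          (apply (get fx-handlers fx) c args))
                        ctx effects)]
        (recur next ctx))
      ctx)))
```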
Authorization rules are so back
Biff entered this world as a replacement for Firebase, which I had enjoyed using but left me with
the desire for a regular long-lived clojure backend. Firebase lets your frontend submit arbitrary
transactions from the frontend, and then they're checked against some centralized authorization
rules you define (e.g. “documents in the stuff table can only be edited if the current user's ID is
the same as stuff.user_id”). I implemented a similar thing where you would submit transactions in a
format similar to Firebase's, then I would translate them to XTDB's transaction format and pass a
diff of the database changes to your authorization functions.
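A centralized rule of the kind described (“stuff can only be edited if the current user's ID matches stuff.user_id”) could be expressed like this. The shape is illustrative, not Biff's old implementation: each rule receives the user and a before/after diff of one document.

```clojure
;; Sketch of Firebase-style centralized authorization rules
;; (illustrative; key names are assumptions). Each rule decides
;; whether a write is allowed given the user and the document diff.
(def rules
  {:stuff (fn [{:keys [uid]} {:keys [before after]}]
            ;; editable only by the owner, and ownership is immutable
            (and (= uid (:user-id (or after before)))
                 (or (nil? before) (nil? after)
                     (= (:user-id before) (:user-id after)))))})

(defn allowed? [user table diff]
  (boolean (when-let [rule (get rules table)]
             (rule user diff))))
```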
I ended up abandoning the SPA approach altogether for server-side rendering (with htmx), and that
made authorization rules unnecessary since transactions were originating from the backend: I no
longer needed to validate completely arbitrary transactions.
Once again, coding agents have changed the game. When working on mature codebases, of course we all
read our generated code carefully before submitting a pull request. But when I've got a new app
idea, I want to mostly just vibe code it until I get to the MVP. I'd like to be able to do a light
review just to make sure the structure of the code is reasonable. With authorization rules, you can
carefully review those central rules in the same way you'd carefully review the database schema, and
then you can have confidence that the feature code isn't missing an authorization check. (Of course
you still have to make sure the agent didn't bypass the authorization rules...)
This is only for writing data. For reading data, I typically have a few Pathom/biff.graph resolvers
that e.g. read an entity ID from the incoming request's path parameters and ensure the user has
access to that entity (like the :param/stuff resolver alluded to in the example above). Other
related entities are queried as joins against that root entity, so if the authorization check fails,
the rest of the query will fail too. So once again you have a way to put authorization logic in a
central place that can be reused by your feature-specific code.
oh yeah and datastar
As mentioned above, Biff uses htmx. I like server-side
rendering and I think it's a particularly good fit for
Biff's solo developer focus. htmx however has a critical flaw: it's too popular. It has 47k github
stars—that's half of what Tailwind has.
Datastar fixes this problem by being a much younger project—a niche of a niche. There is a
much smaller chance that your colleagues will have heard of it. Datastar also has some smaller but
still tangible benefits:
It has some frontend reactivity built in. With htmx, you typically use another tool like
_hyperscript or Alpine.js to provide interactivity in cases where you really
don't want to wait for a server roundtrip (e.g. a dropdown menu). Datastar has a concept of
"signals" baked in so you don't need a second tool.
It has a smaller API surface; much of what htmx offers is replaced by "just use out-of-band
swaps." So it might be easier to learn?
It works well for fancy CQRS
stuff
(still on my list of things to try out).
Of the changes I've mentioned, this one is the most experimental. I actually haven't even made an
official decision if I really will switch Biff from htmx to Datastar; at this point I'm just making
a prediction that I probably will.
More broadly I would like to explore how far I can push the server-side rendering model before I
feel it breaking down. e.g. what approach would I use with it to handle forms with 50+ fields and
lots of conditional logic, complex validation logic etc? How about charts? (What I'm getting at:
would I regret asking an LLM to migrate our large codebase at work over to htmx/Datastar?).
I’d like to give an honorable mention to inline snapshot testing which I’ve been excited
about for a year and a half but now find
unnecessary—counterproductive, even—with coding agents. I had started working on some updates to
my test code so you could do inline snapshot tests in plain .clj files instead of in .edn files
(turns out that tooling support is best when you put your code in files meant for code). But with
coding agents, I’ve found that I don’t want tests that auto-update when the actual result changes:
it’s too easy for agents to ignore new results that are obviously incorrect. And of course I don’t
care if my coding agent finds updating unit tests to be tedious. So the test-related stuff that Biff
does will be limited to making your application code more pure so you (your agent) can write dumb
(is (= (f x) y)) tests. I might add some structure/patterns for integration tests, though.
Another change driven by coding agents, not a change to the code but a change to my philosophy: I'm
more interested in smaller projects. As mentioned, my time for working on personal projects has been
extremely limited until a few months ago. I've only ever had a single Biff project at a time that I
have attempted to work on regularly; new projects started after the old one failed. So the primary
use case I designed Biff for was “serious side projects,” applications that may be solo projects now
but will definitely be bringing in a 6-figure income and fulfilling all your entrepreneurial
desires at... some point. That one project is the only thing I've ever had a chance of having time
for.
Now I can code up an MVP for something over a weekend without ever sitting down at my desk. I built
an app that helps me find good Star Wars: Unlimited decks to play.
I'm building a blogging platform next. After that maybe I'll build a music recommender system. Or a
state legislation tracker/summarizer.
I'm having a blast. Maybe that will affect design decisions I make down the road? I certainly am
interested in the use case of doing agent-driven development from a mobile device, so maybe expect
something in that area.
I have been building an experimental graph-based AI agent runtime in Clojure called Ayatori. It is a side project where I try to stay hands-on with new ideas, in my favorite language. For Ayatori, I had in mind agents as graphs: nodes as functions, edges defining routing, and an executor tying it together.
This article is about exploring whether core.async.flow could replace the executor.
The agent model
In Ayatori, an agent is a data structure. It declares nodes, edges, capabilities, and dependencies.
Nodes are either plain functions or typed maps for built-in behaviors like LLM invocation or fan-out. Edges are either a keyword (unconditional routing) or a map from a route key to a target node (conditional routing). Caps declare which entry points are externally callable. Deps declare which external capabilities the agent needs at runtime, resolved by the system at start time.
The executor’s job is to take this definition and run it.
What I was building by hand was a straightforward go-loop: take the current node, run its function, consult the edges to pick the next node, repeat until a terminal node is reached.
This worked well enough for an experiment. But I kept accumulating concerns around it: What if I need to pause mid-execution? What if a node produces faster than the next can consume? What if I want to stop one node without tearing down the whole graph? Each concern meant more code in the executor.
For readers unfamiliar with core.async.flow, the overview and guide are good starting points.
Mapping to core.async.flow
Thinking about those concerns (node lifecycle, channel wiring, backpressure, routing), the problem started to feel familiar. These are the kinds of concerns a runtime handles, not application code. Then I remembered core.async.flow. I had not found the time to look at it carefully when it was first announced. Working on this executor turned out to be a good reason to go back.
core.async.flow describes itself as a library for building “concurrent, event driven data processing flows out of communication-free functions, while centralizing control, reporting, execution and error handling.” The model is a directed graph of processes communicating via channels. You define step functions that transform data. The flow manages channel creation, process lifecycle, backpressure, and error handling.
Those were the same concerns accumulating in the executor. At first glance, the agent DSL and flow’s topology model seem to map well onto each other. An agent is a directed graph of computation steps. A flow is a directed graph of processes communicating through channels. The unit of work in both is a function that takes input and produces output. The connections are explicit and declared separately from the logic. That structural similarity is what made me want to try this.
Agent → Flow graph
Node → Process
Edge → Connection [[from :out] [to :in]]
Conditional edge → Multi-output process, route key selects port
Cap entry point → flow/inject target
Dep (cross-agent call) → IO process, blocking resolver
Fan-out node → Multi-output process with step state correlation
Graph result → Collector sink
The topology that I was managing imperatively in a go-loop would become a data structure passed to create-flow. Each node requires a step function and explicit port declarations, but the wiring and lifecycle are handled by the framework.
In a simplified view of that topology definition, each node becomes an entry in :procs and each edge becomes a connection in :conns of the form [[from :out] [to :in]].
The nodes, edges, and routing from the Ayatori definition translate directly into procs and conns.
The mismatch
flow’s model is fire-and-forget. A caller injects a message and the graph processes it downstream. There is no built-in way to wait for a result.
Ayatori needs callers to wait for a result. This mismatch was not a surprise. Flow is a poor fit for RPC-style request paths, as Alex Miller noted on Ask Clojure. What I wanted to find out was what bridging that gap would actually cost in practice.
What I did was build request/reply semantics on top of flow using a correlation ID.
When a caller invokes a cap, a UUID is generated and stored in a registry alongside a promise channel. The message carries that ID through the entire flow. The terminal node is a dedicated collector process that delivers results directly to the caller’s channel. I initially considered using ::flow/report for result delivery, but the guide describes it as a channel for unified logging across the flow. Using it for results would be working against its purpose, so I went with the collector process instead.
Error handling follows the same pattern. Processes throw exceptions with the corr-id in ex-data. A consumer on the flow’s error channel extracts the corr-id and delivers the exception back to the correct caller. Without this, errors would go to the error channel but never reach the caller waiting on the promise.
The corr-id is a plain string, so in multi-node deployments the same pattern can carry across process boundaries without structural change.
This also introduces external mutable state. The registry is an atom that lives outside the flow. Flow’s own model assumes communication-free functions. The collector process breaks that assumption: it reads from and writes to the registry. This feels like forcing the model rather than working with it.
Fan-out
Fan-out requires coordinating results from multiple branches before emitting a final output. A fan-out node broadcasts to N processes and waits for all of them to respond.
flow does not yet provide a built-in way to wait for results from multiple branches. Rich Hickey has noted a planned sync->map process that will handle exactly this. For now, step state serves as a workaround: the fan-out process accumulates branch results across invocations and emits only when all have arrived.
This comes with tradeoffs. If a branch never responds, the pending entry stays in state indefinitely. There is no timeout in this implementation. For an experiment on a single node this is acceptable, but it is something to address before going further.
Each broadcast generates a fan-out ID. The step state holds a pending map keyed by that ID. When all expected results arrive, the step emits the aggregate with the original corr-id.
Streaming: the self-loop
The LLM node handles streaming responses. Tokens arrive on a channel as the provider sends them. The node needs to read from that channel continuously until the stream is done, then emit the final result.
The approach is a self-loop: the process routes output back to one of its own input ports. On each invocation, it reads one event from the stream channel and either loops back to read another or emits the final result.
The process routes ::self-out back to ::self-in in the flow topology. This keeps the process self-contained: it does not need a reference to the flow itself, and the loop is visible in the topology definition rather than hidden inside imperative code.
Ayatori targets Java 21+. Processes declared with :workload :io run on virtual threads. That is why a blocking read inside the self-loop is not a practical concern. If flow were used on an older JVM, this implementation would need a different approach.
The flow guide says a step need not output at all: it can receive input, update its state, and emit nothing until it is ready. I think this self-loop pattern fits that intent. That said, the blocking read is working around flow’s async model rather than with it. It holds for now, but it is worth noting.
What I found
In this branch, lifecycle management that I would otherwise write by hand comes from the framework. Pause, resume, stop, and ping work across the entire graph or per process without additional code. Backpressure is automatic. The topology is still inspectable as data before execution begins.
These things could certainly be added to the hand-written executor. But each one takes time, and none of them is what this experiment is actually about. Each one was a solved problem I was solving again. The execution model is now more explicit: the topology is a data structure, the connections are declared, and the lifecycle is managed in one place. And with core.async.flow-monitor, visualization comes for free as well:
Where flow worked well in this experiment: topology as data, lifecycle management, backpressure, observability. These came for free and replaced code I would have written by hand.
Where it required bridging: request/reply semantics, fan-out coordination, streaming. Each of those is outside what flow is designed for. The correlation pattern introduces external mutable state. The fan-out workaround has no timeout. The self-loop uses a blocking read inside the step function, bypassing flow’s assumption that step functions do not interact with channels directly.
Rich Hickey’s “Effective Programs” talk distinguishes between situated programs (long-running systems with state, context, and lifecycle) and transient programs (short-lived computations that start, do work, and finish). Ayatori’s runtime is situated. But its external API is transient: invoke a cap, get a result back. Flow handles the situated side well. That tension was always going to require bridging. Whether the bridge I built here is worth maintaining is what I am still working out.
Each workaround gave me the feeling of using flow outside its intended purpose. That might still be acceptable. I have not decided. What I did learn is what those costs actually look like in practice, which is what I set out to find.
When I have an idea for a project, it tends to go in one of these two directions:
I just do it. Maybe I make a few minor revisions, but often it turns out exactly how I’d imagined and I’m happy.
I think, “I should look for prior art”. There’s a lot of prior art, dealing with a much broader scope than I’d originally imagined. I start to wonder if I should incorporate that scope. Or perhaps try to build my thing on top of the existing sorta-nearby-solutions. Or maybe I should just use the popular thing. Although I could do a better job than that thing, if I put a bunch of time into it. But actually, I don’t want to maintain a big popular project, nor do I want to put that much time into this project. Uh oh, now I’ve spent a bunch of time, having neither addressed the original issue nor experienced the joy of creating something.
I prefer the first outcome, and I think the pivotal factor is how well I’ve internalized my own success criteria.
For example, last weekend I hosted my friend Marcin and we decided it’d be fun to do some woodworking, so we threw together this shelf and 3d-printed hangers for my kitchen:
Absolute banger of a project:
brainstormed the design over coffee
did a few 3d-print iterations for the Ikea bin hangers (OnShape CAD, if you want to print your own)
sealed the raw plywood edge with some leftover paint from a friend
done in a weekend
The main success criterion was to jam on woodworking with a friend, and that helped me not overthink the object-level success criteria: Just make a shelf for my exact kitchen!
In contrast, this past Friday I noticed difftastic did a poor job, so I decided to shop around for structural/semantic diff tools and related workflows (a topic I’ve never studied, that I’m increasingly interested in as I’m reviewing more and more LLM-generated code).
I spent 4 hours over the weekend researching existing tools (see my notes below), going through dark periods of both “semantic tree diffing is a PhD-level complex problem” and “why do all of these have MCP servers? I don’t want an MCP server”, before I came to my senses and remembered my original success criteria: I just want a nicer diffing workflow for myself in Emacs, I should just build it myself — should take about 4 hours.
I’m cautiously optimistic that, having had this realization and committing myself to a minimal scope, I’ll be able to knock out a prototype before running out of motivation.
However, other long-running interests of mine:
interfaces for prototyping hardware (discussed September 2023)
a programming language that fuses what I like about Clojure and Rust (November 2023)
That is, I’ve spent hundreds of hours on background research and little prototypes, but haven’t yet synthesized anything that addresses the original motivating issue.
It’s not quite that I regret that time — I do love learning by reading — but I have a nagging sense of unease that my inner critic (fear of failure?) is silencing my generative tendencies, keeping me from the much more enjoyable (and productive!) learning by doing.
I think in these cases the success criteria has been much fuzzier: Am I trying to replace my own usage of Rust/Clojure?
Only for some subset of problems?
Or is it that I actually just need a playground to learn about language design/implementation, and it’s fine if I don’t end up using it?
Ditto for CAD: Am I trying to replace my commercial CAD tool in favor of my own?
Only for some subset of simple or particularly parametric parts?
Do I care if it’s useful for others?
Does my tool need to be legibly different from existing open-source tools?
It’s worth considering these questions, sure.
But at the end of the day, I’d much rather have done a lot than have only considered a lot.
So I’m trying to embrace my inner clueless 20-year-old and just do things — even if some turn out to be “obviously bad” in hindsight, I’ll still be coming out ahead on net =D
Of course, there’s only so much time to “just do things”, and there’s a balance to be had. I’m not sure how many times I’ll re-learn YAGNI (“you ain’t gonna need it”) in my career, but I was reminded of it again after writing a bunch of code with an LLM agent, then eventually coming to my senses and throwing it all out.
I wanted a Finda-style filesystem-wide fuzzy path search for Emacs.
Since I’ve built (by hand, typing the code myself!) this exact functionality before (walk filesystem to collect paths, index them by trigram, do fast fuzzy queries via bitmap intersections), I figured it’d only take a few hours to supervise an LLM to write all the code.
I started with a “plan mode” chat, and the LLM suggested a library, Nucleo, which turned up since I wrote Finda (10 years ago, eek!).
I read through it, found it quite well-designed and documented, and decided to use it so I’d get its smart case and Unicode normalization functionality.
(E.g., query foo matches Foo and foo, whereas query Foo won’t match foo; similarly for cafe and café.)
Finding a great library wasn’t the problem; the problem was that Nucleo also supported some extra functionality: anchors (^foo only matches at the beginning of a line).
This got me thinking about what that might mean in a corpus that consists entirely of file paths.
Anchoring to the beginning of a line isn’t useful (everything starts with /), so I decided to try and interpret the anchors with respect to the path segments.
E.g., ^foo would match /root/foobar/ but not /root/barfoo/.
But to do this efficiently, the index needs to keep track of segment boundaries so that the query can be checked against each segment quickly.
But then we also need to handle a slash occurring in an anchored query (e.g., ^foo/bar) since that wouldn’t get matched when only looking at segments individually (root, foo, bar, and baz of a matching path /root/foo/bar/baz/).
Working through this took several hours: first throwing around design ideas with an LLM, having it write code to wrap Nucleo’s types, then realizing its code was bloated and didn’t spark joy, so finally writing my own (smaller) wrapper.
Then, after a break, I realized:
I can’t think of a situation where I’d ever wished Finda had anchor functionality
In a corpus of paths, I can anchor by just adding / to the start or end of a query (this works for everything except anchoring to the end of a filename).
So I tossed all of the anchoring code.
I’m pretty sure I still came out ahead compared to if I’d tried to write everything myself sans LLM or discussion with others, but I’m not certain.
Perhaps there’s some kind of conservation law here: Any increases in programming speed will be offset by a corresponding increase in unnecessary features, rabbit holes, and diversions.
Speaking of unnecessary diversions, let me tell you everything I’ve learned about structural diffing recently — if you have thoughts/feelings/references in this space, I’d love to hear about ‘em!
When we’re talking about code, a “diff” usually means a summary of the line-by-line changes between two versions of a file.
This might be rendered as a “unified” view, where changed lines are prefixed with + or - to indicate whether they’re additions or deletions.
For example:
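Here is a made-up grocery-list example (any real post renders its own diff here):

```diff
 milk
-coffee
 bread
+apple
 eggs
```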
We’ve removed coffee and added apple.
The same diff might also be rendered in a side-by-side view, which can be easier to read when there are more complex changes:
The problem with these line-by-line diffs is that they’re not aware of higher-level structure like functions, types, etc. — if some braces match up somehow between versions, they might not be shown at all, even if the braces “belong” to different functions.
There’s a wonderful tool, difftastic, which tries to address this by calculating diffs using treesitter-provided concrete syntax trees.
It’s a huge improvement over line-based diffs, but unfortunately it doesn’t always do a great job matching entities between versions.
Here’s the diff that motivated this entire foray:
Note that it doesn’t match up struct PendingClick; it shows it as deleted on the left and added on the right.
I haven’t dug into why difftastic fails to match here, but I do feel like it’s wrong — even if the overall diff would be longer, I’d still rather see PendingClickRequest and PendingClick matched up between both sides.
Here’s a summary of tools / references in the space:
The most “baked” and thoughtful semantic diff tool I found is, perhaps unsurprisingly, semanticdiff.com, from a small German company with a free VSCode plugin and web app that shows diffs for github PRs. Unfortunately they don’t have any code libraries I can use as a foundation for the workflow I want.
one of the authors has great HN comments with hard-won background knowledge. E.g., they moved away from treesitter because it’s unreliable for semantics:
Context-sensitive keywords in particular were a constant source of annoyance. The grammar looks correct, but it will fail to parse because of the way the lexer works. You don’t want your tool to abort just because someone named their parameter “async”.
built on treesitter, has MCP server. README includes list of similar projects.
lots of github stars, but doesn’t seem particularly well-documented; I couldn’t find an explanation of how it works, but the difftastic wiki says it “runs longest-common-subsequence on the leaves of the tree”
docs and adorable illustrations indicate this project was clearly written by a thoughtful human
semanticdiff.com author in HN comments:
> GumTree is good at returning a result quickly, but there are quite a few cases where it always returned bad matches for us, no matter how many follow-up papers with improvements we tried to implement. In the end we switched over to a dijkstra based approach that tries to minimize the cost of the mapping
weave: also a treesitter-based merge-driver written in Rust
feels a bit “HN-optimized” (flashy landing pages, lots of github stars, MCP server, etc.)
data model can’t detect intra-file moves, even though those might be significant
includes a lot of heuristic “impact” analysis, which feels like overreaching-scope to me since it’d require much tighter language integration before I’d trust it
ran into buggy output when running sem diff --verbose HEAD~4; it showed lines as having changed that…didn’t change at all.
Too much 80%-done, hypothetically useful functionality for me to use as a foundation, but props for sure to the undergrad/student(?) who’s built all this in just three months.
diffast: tree edit-distance of ASTs based on an algorithm from a 2008 academic paper.
supports “Python, Java, Verilog, Fortran, and C/C++ via dedicated parsers”
My primary use case is reviewing LLM output turn-by-turn — I’m very much in-the-loop, and I’m not letting my agent (or dozens of them, lol) run wild generating 10k+ lines of code at a time.
Rather, I give an agent a scoped task, then come back in a few minutes and want to see an overview of what it did and then either revise/tweak it manually in Emacs or throw the whole thing out and try again (or just write it myself).
The workflow I want, then, is to
see a high-level overview of the diff: what entities (types/functions/methods) were added/removed/changed?
quickly see textual diffs on an entity-by-entity basis (“expanding” parts of the above summary)
quickly edit any changes, without having to navigate elsewhere (i.e., do it inline, rather than having to switch from “diff” to “file”)
In light of the “minimal scope, just get your project done” lesson I’ve just re-learned for the nth time, my plan is to:
throw together my own treesitter-based entity extraction framework (just Rust for now)
do some simple greedy matching for now
render the diff to the command line
Once that seems reasonable (i.e., it does a better job than difftastic did on that specific commit), I’ll:
wire into a more interactive Magit-like Emacs workflow (maybe I can reuse Magit itself!?!)
add support for new languages, as I need them
potentially explore more sophisticated score-based global matching rather than simple greedy matching
Mayyybe if I’m happy with it I’ll end up releasing something.
But I’m not trying to collect Github stars or HN karma, so I might just happily use it in the privacy of my own home without trying to “commercialize it”.
I’m in the market for a few square meters of Tyvek or other translucent, non-woven material suitable for building a light diffuser — let me know if you have any favorite vendors that can ship to the EU.
The Easiest Way To Design Furniture…. Laura Kampf on getting off the computer and designing physical spaces with tape, lil’ wood sticks, and cardboard.
Loon is a Lisp. Thrilled to discover I’m not the only one who wants to mash together Clojure and Rust. The current implementation seems to have been manically vibe-coded and I quickly ran into some terrible bugs, but on the other hand it exists so I’m not going to be a hater.
“There isn’t a lot of reliable information out there about how to buy a gas mask, especially for the specific purpose of living under state repression. But hopefully after reading this guide you’ll feel equipped to make an educated decision.”
“This zoomable map shows every page of every issue of BYTE starting from the front cover of the first issue (top left) to the last page of the final edition (bottom right). The search bar runs RE2 regex over the full text of all 100k pages.” A lovely reminder that user-interfaces can be extremely fast and information dense.
The Clojure documentary is released.
Taking inspiration from the Datomic Ions approach to logging and alerting.
Configuring AWS Lambda schedules with Terraform.
Most working engineers have spent ninety percent of their concurrent-programming life in one model: shared memory protected by locks. Threads that all see the same variables. Mutexes around the critical sections. Hope and care. It's the model every OS textbook teaches, every mainstream language supports, and every senior engineer has a horror story about.
It's also not the only option. Or even the best one, for many of the problems it gets used for. Three other models — CSP, actors, and software transactional memory — have been around for decades, are mature enough for production, and each solves a class of problems that lock-based designs handle poorly.
This is a map of all four, from a working backend engineer who uses each of them for different jobs, and a take on when each is the right answer.
tl;dr — Concurrency has four viable pillars: shared memory + locks (threads, mutexes), CSP (channels, Go), actors (mailboxes, Erlang), and STM (transactional memory, Clojure). None is universally better. Each solves a different problem and has a different failure mode. Senior designs often mix three of them in one system. Mutex-for-everything works until it doesn't — usually at exactly the scale you promised you'd never reach.
Pillar 1: Shared Memory + Locks
The default. Threads, mutexes, atomics, condition variables. Every mainstream language has them.
How it works: multiple threads of execution share the same address space. They read and write the same data. Mutexes make sure only one thread touches a critical section at a time. Atomics do the same for single-word operations without a full lock.
Where it shines:
Simple shared counters and caches: atomic.AddInt64, sync.Map, LRU caches. The right tool.
Tight single-process coordination where the code is small enough for one person to hold in their head.
Performance-critical paths where the overhead of channel sends or actor dispatches is too much.
Failure modes:
Deadlocks. Two threads acquire locks in opposite order. Happens.
Priority inversion. Low-priority thread holds the lock, high-priority thread waits, work piles up.
Lock ordering bugs at scale. When N components each take M locks, the reasoning gets exponential.
Memory-model weirdness. What one thread writes, another may not immediately see. You start caring about happens-before, acquire/release semantics, and why volatile in Java is not what you thought.
Invisible races. The worst kind. Tests pass; production fails weirdly twice a month.
Use mutexes for small, localized shared state. Once the shared state has three collaborators or more, or a nontrivial invariant across fields, reach for one of the other models.
Pillar 2: CSP (Communicating Sequential Processes)
Tony Hoare's 1978 paper, popularized by Occam and now Go. The model Rob Pike and Ken Thompson picked for Go's concurrency.
How it works: processes don't share memory; they send messages on named channels. Senders and receivers rendezvous on the channel. Ownership of data moves with the message. "Do not communicate by sharing memory; share memory by communicating."
Where it shines:
Pipelines. Data flows through stages, each a goroutine, connected by channels. Clean to read.
Fan-out / fan-in. One producer, many workers, one aggregator. The channel topology is the architecture.
Backpressure. A bounded channel blocks the producer when full. No extra flow control needed.
Cancellation coordination: select with <-ctx.Done() is a clean primitive.
Lifecycle control. Closing a channel is a broadcast to every listener.
Failure modes:
Deadlocks remain possible. Two goroutines each waiting on the other's channel. Cycles in the channel graph are lethal.
Memory leaks via unclosed channels. A goroutine blocked on a send that will never be received lives forever.
Awkward request/reply. You end up passing a reply channel with each request, which works but feels verbose.
Order isn't free. Channel ordering is only per-channel. If you fan out and fan in, the aggregation is unordered unless you sort.
Use CSP for coordination-heavy designs. When the structure of "who's alive, who sends to whom, when do things stop" is the architecture, channels make that visible in the code.
Go is the obvious exemplar, but CSP-style is also available in Rust (crossbeam-channel, tokio::sync::mpsc), Kotlin (coroutines with channels), Python (asyncio.Queue), and C# (System.Threading.Channels).
Pillar 3: Actors
Carl Hewitt's 1973 paper. Made practical by Erlang (1986) and later Akka (Scala/Java). The model behind WhatsApp, a decade of telecom, and most fault-tolerant messaging infrastructure.
How it works: an actor is a named entity with private state and a mailbox. Other actors send messages to its address. Messages are processed one at a time from the mailbox. No shared memory. Parent actors supervise children; when a child crashes, the parent decides to restart, escalate, or ignore. Crashes are normal.
Where it shines:
Fault isolation at scale. One actor crashing is expected; it doesn't take down the system. Supervision hierarchies make "let it crash" a sensible engineering strategy.
Stateful services. Each actor holds its own state. Conceptually clean: no shared global state, no locks around it.
Location transparency. An actor can live in the same process, another process, or another machine. The sender doesn't know. This is where actors shine in distributed systems — the model scales across the network boundary natively.
Massive concurrency with stateful semantics. Erlang routinely runs millions of actors per node. Each is cheap.
Failure modes:
Mailbox unboundedness. If a producer sends faster than the actor can process, the mailbox grows without bound. Bounded mailboxes exist; use them.
Message-ordering assumptions break across the network. Within one node, delivery order is preserved per sender. Across nodes, all bets are off without explicit sequencing.
Testing is harder. Actors make their own state opaque; you test behavior through message exchange. Good frameworks help, but the habits needed are different from testing normal code.
Conceptual mismatch in CRUD-style backends. If your business logic is "select some rows, transform them, insert result," actors are overkill. They shine on long-lived stateful entities (a game character, a connected device, a user session), not on stateless request handlers.
Erlang and Elixir are the canonical runtimes. Akka brings actors to the JVM. Pony is a rare actor-first typed language. In Go, you can simulate actors with a goroutine + channel-as-mailbox pattern, but you lose Erlang's supervision and "let it crash" semantics unless you build them yourself.
Use actors when you have long-lived stateful entities with fault requirements. Telecom, messaging, multiplayer game servers, IoT device shadows, any system where "this particular entity has its own state machine, and we really care when it crashes" is the shape.
Pillar 4: Software Transactional Memory (STM)
Imagine database transactions, but for in-memory data. That's STM.
How it works: critical sections are wrapped in transactions. The runtime tracks reads and writes optimistically. On commit, if any data touched was modified by another transaction, the current one rolls back and retries. No explicit locks. Composability — two transactions can be combined into a larger one without redesigning the locking order.
Where it shines:
Composable concurrent code. Combining operations that were individually correct usually stays correct under STM. Lock-based code famously does not.
Read-mostly workloads. STM with multi-version concurrency control scales reads without blocking.
Avoiding the lock-ordering bug class. No locks, no deadlocks. The failure mode is retry storms, which are easier to reason about.
Failure modes:
I/O inside transactions is awful. Transactions may retry. If you did I/O, you may have done it multiple times. Either separate I/O from transactional state, or the runtime has to forbid I/O inside transactions (Haskell's STM monad does this at the type level).
Retry storms under contention. Heavy write contention on the same data means constant retries. In the worst case, throughput can be worse than locks.
Limited language support. Clojure (built-in), Haskell (STM), Scala (scala-stm), Rust (experimental stm crates). Not a mainstream feature of Go/Java/C#.
Clojure is the canonical "STM as a first-class citizen" language — its refs and transactions are idiomatic. Haskell's STM monad is arguably the cleanest realization. In other ecosystems, STM exists as libraries but hasn't displaced mutexes.
Use STM when the concurrent state is small-to-medium, the access pattern is read-heavy with occasional writes, and you want the composability. For the rare problems that fit, STM is strictly simpler to reason about than locks. For problems that don't fit (I/O-heavy, write-contention-heavy), STM is worse.
How Real Systems Mix Them
The surprise for engineers who've only used one model: mature systems mix three of them in one codebase.
A typical backend service I'd build today:
Mutexes / atomics for the inner loops — counters, caches, rate-limiter state, anything performance-critical with one clear owner.
Actors (in a sense) for long-lived stateful entities — each connected client session, each in-flight request, each background job. In Go I'd model this as "one goroutine per entity, communicating via channels," which isn't formal actors but inherits the useful semantics: isolated state, message-passing, crash-isolation.
And I wouldn't use STM in that stack. Not because it's bad, but because the language runtime doesn't make it first-class. If I were writing Clojure, STM would be a natural fit for the in-memory state machines that would otherwise be locked maps.
The old "pick one concurrency model" debate was always a false choice. The real decision is per-problem: what shape is the concurrent work, what's the state-sharing pattern, and what failure semantics do I want.
Decision Guide
Quick map:
I have a counter that multiple goroutines read and update. → atomic or mutex.
I have a pipeline of work that flows through stages. → channels (CSP).
I have a fleet of long-lived sessions, each with its own state and lifetime. → actor pattern (goroutine + mailbox channel, or real actor framework).
I have a fleet of connected devices each with a state machine that must survive crashes. → actor framework with supervision (Erlang, Akka, or Go with explicit crash/restart logic).
I have complex shared state with nontrivial invariants across fields, and updates are occasional but important to compose. → STM if your language supports it; otherwise, lots of careful mutex discipline.
I have a request/response flow with fan-out to downstreams. → CSP with errgroup.WithContext.
I have no idea what I have. → Start with mutexes, switch when it hurts. Don't over-engineer the first version.
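For the "goroutine + mailbox" actor pattern above, a rough Clojure equivalent uses core.async: the channel is the mailbox, the go-loop is the entity's process. The message shapes here are invented for illustration:

```clojure
(require '[clojure.core.async :as a])

;; One "actor" per session: its state lives only inside this loop,
;; and the only way to touch it is to send a message to the mailbox.
(defn start-session []
  (let [mailbox (a/chan 16)]
    (a/go-loop [state {:events 0}]
      (when-let [msg (a/<! mailbox)]
        (case (:op msg)
          :event (recur (update state :events inc))
          :get   (do (a/>! (:reply msg) state)
                     (recur state)))))
    mailbox))
```

Closing the mailbox ends the loop, which gives you a clean shutdown story per entity; crash-isolation and supervision are what a real actor framework adds on top.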
The Real Lesson
Most people who get bitten by concurrency bugs got bitten because they used the wrong model, not because they used it wrong. A mutex-heavy design for a workload that's really a pipeline is fragile. A channels-for-everything design when there's a shared counter underneath ends up with awkward rendezvous. An actors-everywhere design when the business is CRUD requests reads like over-engineering.
The four pillars aren't competing theories of concurrency. They're four tools, each good at specific jobs. Senior engineers know all four and reach for the right one. Junior engineers reach for the only one they know and force-fit it.
If your career so far has been mostly mutexes, spend a weekend reading the other three. Write a toy pipeline in Go channels. Read Erlang's supervision documentation. Play with Clojure refs. The investment pays back every time you sit in a design review and someone proposes locking their way out of a structural problem.
Starting with .NET 11, the .NET runtime offers native support for async/await. The latest release of ClojureCLR (1.12.3-alpha6) provides experimental support for this feature. In this post, we’ll take a look at what async/await is, how it works on the CLR, and how ClojureCLR supports it.
The async feature
Async is a feature that allows methods to suspend their computation while waiting for a subcomputation to complete. Often, the subcomputation is something that requires waiting for an event, such as completion of an I/O operation. Suspending the computation means yielding control of the thread of execution, which frees the current thread to be used for other purposes during the wait, thus improving multi-threading efficiency.
To take advantage of async in C#, one marks the method’s signature with the async keyword. Within the method’s body, the await keyword can be used to mark the specific suspension points. A simple example:
public static class AsyncExample
{
    public static async Task<List<string>> GetData(List<string> filenames)
    {
        List<string> contents = new List<string>(filenames.Count);
        foreach (string filename in filenames)
        {
            contents.Add(await GetDataFromFile(filename));
        }
        return contents;
    }

    private static async Task<string> GetDataFromFile(string filename)
    {
        return await System.IO.File.ReadAllTextAsync(filename);
    }
}
(No need for the class or the methods to be static, but that seems correct for this example.)
Several things to note:
Methods marked as async must have a return type derived from one of these:
Task
ValueTask
Task<TResult>
ValueTask<TResult>
There is contagion here. We defined GetDataFromFile as async. When we use await on it in the caller, the caller must be marked as async. (One can use methods from the Task library to avoid using await, but await is the typical mechanism.)
Runtime implementation
Prior to .NET 11, async and await in C# were implemented using code transformations performed by the compiler. A method marked async is rewritten as a finite-state machine. Suspension points required mechanisms to save and restore state around the await call. This consumed heap memory. And the resulting code caused things like exception stack traces to be loaded with compiler-generated method names that were not very helpful to the programmer.
As of .NET 11, these heroic compiler efforts are no longer required. The CLR core runtime detects methods marked as async (the compiler needs to pass along that information in the method metadata). The compiler translates await calls into certain special method calls that the runtime recognizes. The runtime now has the capability of dealing with saving and restoring state on its own. Simpler, and also more efficient, according to benchmarks.
The story for ClojureCLR
Starting with version 1.12.3-alpha6, ClojureCLR provides experimental support for runtime async. A new library clojure.clr.async.task.alpha provides functions to help with this. Internally, the ClojureCLR compiler has been modified to pass along the critical metadata on async functions and to rewrite await calls to the special methods used by the runtime.
(This library is marked “alpha” because we are still experimenting with the API. Also, runtime async in .NET 11 is still in preview. In the final release, the library will be in namespace clojure.clr.async.task.)
It is helpful to require the namespace. From the sample file for this post, we start with:
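A plausible form of that opening require (the alias t and the interop classes are inferred from the calls used throughout the rest of the post):

```clojure
(require '[clojure.clr.async.task.alpha :as t])
(import '[System.IO File Path]
        'System.Threading.CancellationToken)
```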
We’re going to do some file I/O in our examples, so it is helpful to have a few random files to work with:
(def ^String in-file (Path/GetTempFileName))
(def ^String out-file (Path/GetTempFileName))
(File/WriteAllText in-file "Some random content.")
Accessing a task result
If all you want to do is call an async method without awaiting it (that is, without yielding the thread), you can use t/result. t/result will start a task if it is not already started or completed and wait to return its result. More typically we use it to extract the result of a task that has already completed, but it will start the task and wait if necessary.
(t/result (File/ReadAllTextAsync in-file CancellationToken/None))
;; => "Some random content."
Async functions
If you want to use await to mark a suspension point where control is yielded, you need to be working in an async context. One way to provide that context is to defn a function with the ^:async tag. This marks all of its overloads (IFn.invoke methods) as async for the runtime. It also type hints the return type of the function as System.Threading.Tasks.Task<Object>. This applies to all the arities of the function.
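The sample function discussed below is along these lines — a sketch reconstructed from the surrounding description (read a file, write its contents uppercased, return a string); the exact body in the sample file may differ:

```clojure
(defn ^:async shout-it-out [in out]
  (let [contents (t/await (File/ReadAllTextAsync in CancellationToken/None))]
    ;; Both the read above and the write below are suspension points.
    (t/await (File/WriteAllTextAsync out (.ToUpper contents) CancellationToken/None))
    "I'm done yelling."))

(def t1 (shout-it-out in-file out-file))
```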
We have suspension points at the read and write calls, which are asynchronous I/O operations. The function will yield control at those points, allowing the thread to be used for other purposes while waiting for the I/O operations to complete. When the operations complete, the function will resume execution at the point of suspension.
Note that calling shout-it-out will return a Task<Object>.
To get the result of the task, we can use t/result. In this case, again, the t/result will start and wait on the task if it is not already completed.
(t/result t1) ;; => "I'm done yelling."
If you want to use await in a function but don’t want the function itself to return a task, you can use t/async to provide an async context. t/async wraps its body in an ^:async (fn [] ...body...). This form returns a task, so you will need to apply a task operation to it to get work done.
Typically, you will need to call t/result if you want to get the value from the awaited call.
(defn just-read [infile]
  (t/result
    (t/async
      (t/await (File/ReadAllTextAsync infile CancellationToken/None)))))

(just-read in-file)
;; => "Some random content."
The call to t/async returns a task, so we need to call t/result to get the value from the awaited call. Had we not done that, we would have gotten a Task<Object> back instead of the string content. Generally, it is preferable to define an ^:async function if you want to use await in it, but t/async can be useful when you want an anonymous async function, for example to construct tasks for use in wait-all or wait-any calls. (See below.)
Some utility functions
There are several utility functions to create basic tasks.
(t/->task 42)        ;; creates a Task<Object> that returns 42 when run
(t/completed-task)   ;; creates a Task that is already completed
(t/delay-task 3000)  ;; creates a Task that delays for 3000 milliseconds
You can run any zero-arg Clojure function as a task:
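A minimal sketch of what that looks like (assuming t/run takes the zero-arg function and returns a task, as the next sentence describes):

```clojure
(def t (t/run (fn [] (reduce + (range 10)))))

(t/result t) ;; => 45
```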
Again, calling t/result on the result of t/run will start the task if it is not already started and wait for it to complete, returning the result. This does not take full advantage of suspension.
Waiting for one or for all
You can do wait-for-one and wait-for-all operations on a group of tasks.
You can either just run the tasks or ask for their value(s).
(t/wait-all tasks)          waits for all the tasks to complete; returns nil
(t/wait-any tasks)          starts all tasks; returns the first one to complete
(t/wait-all-results tasks)  waits for all the tasks to complete; returns a lazy sequence of their results
(t/wait-any-result tasks)   returns the result of the first task to complete
An example:
;; A little dummy function to delay and then return a value.
(defn ^:async delayed-value [msecs val]
  (t/await (t/delay-task msecs))
  val)

;; Just a little test to make sure things are taking time.
(time (t/result (delayed-value 4000 7)))
;; => takes more than 4 seconds

(t/wait-all-results [(delayed-value 2000 2000)
                     (delayed-value 4000 4000)
                     (delayed-value 6000 6000)])
;; => (2000 4000 6000)

(t/wait-any-result [(delayed-value 2000 2000)
                    (delayed-value 4000 4000)
                    (delayed-value 6000 6000)])
;; => 2000 (most likely)

(t/wait-all-results [(delayed-value 2000 2000)
                     (delayed-value 4000 4000)
                     (delayed-value 6000 6000)]
                    500)
;; => nil (times out)

(t/wait-any-result [(delayed-value 2000 2000)
                    (delayed-value 4000 4000)
                    (delayed-value 6000 6000)]
                    500)
;; => nil (times out)
Does it work?
Short answer: yeah.
I did some simple tests to look at things like thread affinity and flooding the thread pool. The simplest thing to do is to replace a call like (t/await ...) with (.Wait ...). The latter does not yield its thread; the difference in performance is notable.
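The shape of that comparison is roughly the following sketch (my reconstruction, not the actual test code; spawning many of each and timing them is what exposes the thread-pool difference):

```clojure
;; Yields the thread to the pool while the delay runs.
(defn ^:async polite-wait []
  (t/await (t/delay-task 1000)))

;; Blocks a thread-pool thread for the full second instead of yielding it.
(defn ^:async rude-wait []
  (.Wait (t/delay-task 1000)))
```

Launch a few hundred of each: the polite version completes in about a second overall, while the blocking version serializes behind the exhausted thread pool.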