A game loop in a core.async goroutine

Growing up I had an old Commodore 64 with an assortment of games on 5¼" floppy disks. One of them was an election game, possibly President Elect by Strategic Simulations, Inc., but at the time I was too young to understand it, and I never played it much. Last year I wanted to teach my kids about the electoral college, so I made my own election game, one aimed more at a third grader. I omitted real-world parties, issues, and polling data, and focused instead on a tight loop of nudging states' preferences for or against two different candidates. There's a little geography, and a lot of puns. I called it Electravaganza. You can play here.

Screenshot of Electravaganza

It's implemented in ClojureScript, with the UI rendered by Replicant, events handled by Nexus, and game data stored in DataScript. The interface is mostly a map of the United States; the player's controls are on the left side. In the original implementation, each component had buttons which triggered a Replicant event when clicked; Nexus then routed the event to an action that produced an effect and updated the DB, often by setting a new UI component in the DB. Then the renderer would update the UI with the new component.

A simple Replicant event loop

That's a fairly standard rendering loop for Replicant, but putting a UI component in the DB was an experiment. I didn't end up liking it. It made the game loop hard to follow through the code. I had to cross back and forth between code in two different files to trace the event fired by a component's button to the action handler that specified the next UI component, then find that component and its button(s). Even if all the logic had been in action handlers, the code would have been split into small pieces that required lots of jumping around in the file, goto style.

Also, using Nexus actions for the game loop made it hard to change the rules. The first version of the game gave the player a random hand of action cards to choose from, then chose a random action for the computer opponent, then gave the human player a totally new set of cards. That was good enough to get the basic mechanics working, but it didn't allow for any strategy across turns. I wanted to tinker with the game loop, but having the logic split between components and action handlers meant that changing the sequence of steps required rewriting the actions. That was tedious and error-prone.

To address both problems, I introduced a proper state machine. While I briefly toyed with a high-level, data-oriented definition of states and transitions, the need for a loop with a turn counter smelled like a programming language, so first I tried a different form of state machine: a core.async goroutine.

I hadn't used core.async in a while. Though I appreciate Go's model of communicating sequential processes, it's overkill for most of the helper scripts and small apps I write. In this case, I didn't need to execute code in parallel, but I did need something to serve as a goroutine's namesake: a pausable coroutine. After some refactoring, I ended up with a game loop defined like this:

(defn basic-game [conn turns]
  (let [chans [(async/chan) (async/chan)]]
    (async/go
      (let [[player1 player2] (<! (choose-party chans))]
        (<! (choose-hairdo chans @conn player1))
        (<! (choose-hairdo chans @conn player2))
        (loop [turns-left turns]
          (<! (set-turns-left chans turns-left))
          (if (pos? turns-left)
            (do
              (<! (choose-action chans @conn player1))
              (<! (choose-action chans @conn player2))
              (recur (dec turns-left)))
            (<! (results chans))))))
    chans))

The syntactic gymnastics of core.async chan, go, and <! make it a bit noisy, but it did centralize the logic of the game loop. The helper functions provided a domain-level grammar of the game's sequence.

The game initializes this goroutine during setup, stores the returned channels in the DB, sends Nexus effects keyed with :effect/send to the input channel, and spawns a second goroutine that listens for output messages and dispatches actions:

A Replicant event loop with a goroutine loop
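As a minimal sketch of that second goroutine (the names start-dispatcher! and dispatch! are illustrative, not from the game's code), it just relays messages from the game loop's output channel to a dispatch function until the channel closes:

```clojure
;; Illustrative sketch, not the game's actual code: a goroutine that
;; consumes the game loop's output channel and dispatches each message.
(require '[clojure.core.async :as async :refer [go-loop <!]])

(defn start-dispatcher! [out-chan dispatch!]
  (go-loop []
    (when-some [msg (<! out-chan)]
      (dispatch! msg)   ; e.g. hand the message to Nexus as an action
      (recur))))
```

The key design point is that the goroutine owns the sequencing while the dispatcher stays a dumb relay, so the game loop can be swapped without touching the event plumbing.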

That requires each of the helper functions used by the goroutine to have some of the same syntactic overhead as the main loop, littered with go and <! and >!. Maybe that boilerplate could be cleaned up with macros. But a perfect DSL wasn't my main concern. I was more interested in having a game loop defined in a compact and comprehensible chunk, as well as one that could be easily swapped out for a different loop. A goroutine gave me that. I could fiddle with the loop so each player started with random cards then discarded played cards and kept the rest for the next turn:

(defn hand-of-actions [conn turns actions-per-turn]
  (let [chans [(async/chan) (async/chan)]
        hand-limit actions-per-turn]
    (async/go
      (let [[player1 player2] (<! (choose-party chans))]
        (<! (choose-hairdo chans @conn player1))
        (<! (choose-hairdo chans @conn player2))
        (loop [turns-left turns
               hand1 (db/random-actions @conn hand-limit)
               hand2 (db/random-actions @conn hand-limit)]
          (<! (set-turns-left chans turns-left))
          (if (pos? turns-left)
            (let [[played1 hand1] (<! (choose-actions chans @conn player1 hand1))
                  [played2 hand2] (<! (choose-actions chans @conn player2 hand2))]
              (recur (dec turns-left)
                     (replace-action @conn hand1 hand-limit)
                     (replace-action @conn hand2 hand-limit)))
            (<! (results chans))))))
    chans))

I didn't have to add features to a custom state machine evaluator, but instead used Clojure's built-in language constructs to keep state between passes through the loop. A minimum viable implementation.

The end result is more complicated, with more moving parts—a game-loop goroutine running in the background, a goroutine listening for output messages, channels saved to the database—but it groups logic according to its responsibilities and makes the individual parts easier to reason about.

That's the important part. A functional language like Clojure usually avoids the kind of imperative logic above, but sometimes the clearest expression of a solution is step-by-step. Especially for a game, which is inherently imperative. Do this, then do that. The goal isn't to always avoid side effects and only use pure functions. My original Nexus handlers were pure functions. The goal is to make code understandable and to facilitate new ways of using it. Having clusters of tiny pure functions can undermine those goals.

Both the code and the game have plenty of room for improvement, but using a core.async goroutine let me try out new game loops easily enough to arrive at rules my son found fun to play, and ones that hopefully taught him a little about the electoral college and geography (and wordplay).

If you have suggestions for the game, or the code, feel free to email me.


Clojure Deref (Apr 28, 2026)

Welcome to the Clojure Deref! This is a weekly link/news roundup for the Clojure ecosystem (feed: RSS).

Clojure/Conj 2026 CFP

We’re looking for 40-minute talks that go beyond the basics: hard-won lessons, production stories, trade-offs, deep dives into language features, libraries, or tools, and ideas that change how people build things. Tracks include: Language, Experience Report, Library, Tools, AI, Ideas, and Fun.

Join us for the largest gathering of Clojure developers in the world! Meet new people and reconnect with old friends. Enjoy two full days of talks, a day of workshops, social events, and more.

September 30 – October 2, 2026
Charlotte Convention Center, Charlotte, NC

Early bird and group tickets are on sale now.

Clojure Documentary

In case you missed it, the Clojure Documentary is live!

Follow it up with the Clojure Documentary Q&A.

Don’t miss the Documentary show notes.

Libraries and Tools

Debut release

  • feles-tales - A game about a cat sneaking and stealing stuff

  • eca-desktop - ECA Desktop - Use ECA from any machine

  • svar - Type‑safe LLM output for Clojure. Works with any text‑only model.

  • pg-datahike - Postgres compatibility layer for Datahike.

  • disorganized-notes - Tables, lists, reminders, real-time sync, multiplatform. Written in ClojureDart.

  • bareforge - Companion visual builder for BareDOM web components. Drag components, declare reactive state, export fully interactive CLJS or JS project

Updates

  • stripe-clojure 2.3.0 - Clojure SDK for the Stripe API.

  • calva-backseat-driver 0.0.30 - VS Code AI Agent Interactive Programming. Tools for Copilot and other assistants. Can also be used as an MCP server.

  • re-frame-query 0.9.0 - Declarative data fetching and caching for re-frame inspired by tanstack query and redux toolkit query

  • plumcp 0.2.0 - Clojure/ClojureScript library for making MCP server and client

  • pretty 3.8.0 - Library for helping print things prettily, in Clojure - ANSI fonts, formatted exceptions

  • clay 2.0.16 - A REPL-friendly Clojure tool for notebooks and datavis

  • sqlatom 1.2.0 - Clojure library that stores atoms in a SQLite database

  • clojure-clr 1.12.3-alpha8 - A port of Clojure to the CLR, part of the Clojure project

  • baredom 2.4.1 - BareDOM: Lightweight CLJS UI components built on web standards (Custom Elements, Shadow DOM, ES modules). No framework, just the DOM

  • calva 2.0.579 - Clojure & ClojureScript Interactive Programming for VS Code

  • datomic-pro 1.0.7622 - The fully transactional, cloud-ready, distributed database.


Clojure on Fennel part three: parsing

The two previous posts were not related to the compiler itself, but were kicked off by the start of the compiler development. I’d say this project was the reason that I made proper immutable data structures for Fennel and Lua. But now we’re finally going to talk about the compiler itself! And we’ll start with the parsing stage - transforming Clojure code into something that can be operated on by the compiler.

Manual single-pass parser and compiler

When I started this project, as I usually do, I underestimated the complexity at hand. My first attempt was a single-pass compiler that accepted a stream of characters of Clojure code and compiled it on the fly.

My idea was to write a simple recursive-descent parser that, upon accepting a character, would decide how to translate it to Fennel, assuming my fennel-cljlib library handled most of the semantics. So, for example, when encountering [ it would translate it to (cljlib.vector, and thus [1 2 3] would descend into compile-vector and expand to (cljlib.vector 1 2 3).

The idea is simple, and at first I thought it would suffice, but then I remembered that there are a lot of other things in Clojure I had to consider. For example, #_ - the ignore form syntax - and ^foo - the metadata syntax. This complicated things a lot, so I added initial support for them and moved on, because I had decided that this would be a bootstrap compiler for a proper Clojure parser.

Clojure parsers

After some time I had a compiler that could compile most of the arbitrary code I threw at it. Of course, it was not exactly to the Clojure spec, but it worked and I was going to replace it anyway. So I started looking at available Clojure parsers written in Clojure.

Edamame

I started with Edamame. The choice was based on the fact that the Squint project uses it and my project is similar to Squint, or so I thought. This parser is also used in SCI - a Clojure interpreter, so it seemed like a good fit.

The first thing I did was make sure that Edamame at least compiled with my bootstrap compiler. The fact that it compiled doesn't mean that it worked - it just meant everything was translated into Fennel, with bits of cljlib here and there. So the next step was to make it work.

But then I realized I didn't need to do that at all - I could just parse Edamame with Edamame, and write a compiler against its output! Meaning, I could use Edamame's parser directly until my compiler was complete enough to compile Edamame itself. And since my idea was to replace the bootstrap compiler in the end anyway, I decided it didn't make much sense to make it work well enough to fully compile Edamame into runnable Fennel code. So I used Edamame to parse Edamame, and got back…

… the same Edamame source.

Imagine my face when I realized that Edamame doesn't produce an abstract syntax tree with all the information, but instead just produces Clojure data structures, such as lists, vectors, and so on. All of the information about the source code sits inside the metadata of these data structures, not in the output. Yes, it does transform some Clojure forms into lists - @foo becomes (deref foo) - but that still doesn't help me, because Fennel has no list data type, and its support for metadata is nowhere near compatible with Clojure's.

I get it: I should have read the readme more carefully instead of making assumptions based on wild guesses. Still, I wasn't dropping the idea of combining an existing Clojure parser with the bootstrap compiler. I just needed a different parser.

Rewrite-clj

Next one I looked at was rewrite-clj.

This library does return a tree of objects. It's still list-based, but at least it contains nodes that can be translated into Fennel objects. However, it still had a few problems.

First, its source code is much bigger than Edamame's. Its API is more complicated, and it expects the use of zippers, which I don't yet have in cljlib. I didn't want such a big parser for my compiler, given that its job is quite simple: produce an AST that I will always walk in full.

Second, it’s still producing some Clojure data structures that I can’t cleanly map to Fennel, so it would mean that the compiler itself would depend on cljlib. And I didn’t want that.

Luckily for me, its user guide mentioned another parser.

Parcera

Finally, parcera.

Looking at the readme, I finally saw what I wanted to see:

(ns example.core
  (:require [parcera.core :as parcera]))

;;parse clojure code from a string
(parcera/ast (str '(ns parcera.core
                     (:require [clojure.data :as data]
                               [clojure.string :as str]))))

;; => returns a data structure with the result from the parser
(:code
 (:list
  (:symbol "ns")
  (:whitespace " ")
  (:symbol "parcera.core")
  (:whitespace " ")
  (:list
   (:keyword ":require")
   (:whitespace " ")
   (:vector (:symbol "clojure.data")
            (:whitespace " ")
            (:keyword ":as")
            (:whitespace " ")
            (:symbol "data"))
   (:whitespace " ")
   (:vector (:symbol "clojure.string")
            (:whitespace " ")
            (:keyword ":as")
            (:whitespace " ")
            (:symbol "str")))))

Tagged tree. Yes, it’s still a list, but at least this can be easily mapped to Fennel sequential tables. So I opened the code and…

…there’s nothing there! Turns out, this project uses an ANTLR grammar for parsing Clojure.

At this point I was quite frustrated that there's no Clojure parser providing something similar to what parcera provides, but written in pure Clojure - or at least in Java, which I could hand-translate as a last resort. Yes, there's a Clojure parser inside the Clojure project itself, written in Java, but my goal was to recompile an existing parser written in Clojure.

Grammar-based parser

After a bit of consideration, I decided that a grammar-based approach is not that bad of an idea after all. I couldn’t use the ANTLR grammar directly, but at least I had it, and it seemed complete enough.

In Lua there’s a great library called LPeg that can generate parsers at runtime from a PEG grammar, and I wanted to try it out for quite some time. But to do that I needed a PEG grammar for Clojure. ANTLR and PEG are quite different, but it’s not impossible to translate one grammar into another.

So that’s what I did:

# Clojure PEG Grammar
#
# Adapted from https://github.com/carocad/parcera (ANTLR4 grammar v0.11.6)
#
# Key differences:
#
# - Unlike ANTLR, PEG choices are ordered (/).
#   More specific alternatives need to come first.
# - No separate lexer phase.
#   All rules operate on characters directly.
# - ANTLR fragment rules are prefixed with '_'.
# - ANTLR's SENTINEL (catch-all for invalid tokens) has no direct PEG equivalent.
#   The parser will silently fail on invalid input.

# Parser rules

code        <- input* !.

input       <- ignore / form

ignore      <- whitespace / comment / discard

form        <- literal / collection / reader_macro

# sets and namespaced maps are under dispatch (they start with #)
collection  <- list / vector / map

list        <- '(' input* ')'

vector      <- '[' input* ']'

map         <- '{' input* '}'

# macro_keyword before keyword ('::' is longer than ':')
literal     <- macro_keyword / keyword / string / number / character / symbol

keyword       <- KEYWORD
macro_keyword <- MACRO_KEYWORD
string        <- STRING

# ordering: most-specific prefix first; LONG last (least specific)
number        <- HEXADECIMAL / OCTAL / RADIX / DOUBLE / RATIO / LONG

# NAMED_CHAR first (longest); UNICODE before UNICODE_CHAR (more specific)
character     <- NAMED_CHAR / OCTAL_CHAR / UNICODE / UNICODE_CHAR

symbol        <- SYMBOL

# unquote_splicing before unquote ('~@' vs '~')
reader_macro <- unquote_splicing
              / unquote
              / metadata
              / backtick
              / quote
              / dispatch
              / deref

metadata <- ((metadata_entry / deprecated_metadata_entry) ignore*)+
            ( symbol
            / collection
            / set
            / namespaced_map
            / tag
            / fn
            / unquote_splicing
            / unquote
            / conditional_splicing
            / conditional
            / deref
            / quote
            / backtick
            / var_quote
            )

metadata_entry            <- '^' ignore* (map / symbol / string / keyword / macro_keyword / conditional)

# #^ is deprecated syntax for metadata
deprecated_metadata_entry <- '#^' ignore* (map / symbol / string / keyword / macro_keyword / conditional)

backtick          <- '`' ignore* form

quote             <- "'" ignore* form

# negative lookahead prevents '~' from matching when '~@' follows
unquote           <- '~' !'@' ignore* form

unquote_splicing  <- '~@' ignore* form

deref             <- '@' ignore* form

# conditional_splicing before conditional ('#?@' vs '#?')
# tag last (most general: '#' + symbol)
dispatch <- conditional_splicing
          / conditional
          / set
          / namespaced_map
          / fn
          / regex
          / var_quote
          / symbolic
          / eval
          / tag

# no whitespace allowed between '#' and the delimiter
fn                    <- '#' list

regex                 <- '#' STRING

set                   <- '#{' input* '}'

namespaced_map        <- '#' (macro_keyword / keyword / auto_resolve) ignore* map

auto_resolve          <- '::'

var_quote             <- "#'" ignore* form

discard               <- '#_' ignore* form

tag                   <- '#' symbol ignore* form

conditional           <- '#?' whitespace* list

conditional_splicing  <- '#?@' whitespace* list

# ##Inf, ##-Inf, ##NaN  (allows arbitrary symbolic values)
symbolic              <- '##' ignore* SYMBOL

eval                  <- '#=' ignore* (symbol / list / conditional)

whitespace <- WHITESPACE

comment    <- COMMENT


# Lexical rules (terminals)

HEXADECIMAL <- _SIGN? '0' [xX] [0-9A-Fa-f]+ _BIG_INT?

OCTAL       <- _SIGN? '0' [0-7]+ _BIG_INT?

# base 2-36, then r/R, then digits in that base
RADIX       <- _SIGN? ([2-9] / [12] [0-9] / '3' [0-6]) [rR] [0-9a-zA-Z]+

RATIO       <- _SIGN? _DIGIT+ '/' _DIGIT+

LONG        <- _SIGN? _DECIMAL _BIG_INT?

# FRACTION EXPONENT before FRACTION alone (longer match first)
DOUBLE      <- _SIGN? _DECIMAL+ ((_FRACTION _EXPONENT / _FRACTION / _EXPONENT) 'M'? / 'M')

_BIG_INT  <- 'N'
_FRACTION <- '.' _DIGIT*
_EXPONENT <- [eE] _SIGN? _DIGIT+
_DECIMAL  <- '0' / [1-9] _DIGIT*


STRING <- '"' ([^"\\] / '\\' .)* '"'


# \p{White_Space} approximation
WHITESPACE <- [ \t\n\r\f,]+

COMMENT <- (';' / '#!') [^\r\n]*


# Named characters must not be followed by name-chars (word boundary)
NAMED_CHAR   <- '\\' ('newline' / 'return' / 'space' / 'tab' / 'formfeed' / 'backspace')
                !(_ALLOWED_NAME_CHARACTER / _DIGIT)

UNICODE      <- '\\' 'u' [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]
                !(_ALLOWED_NAME_CHARACTER / _DIGIT)

# octal char: 0-377; longest alternative first
OCTAL_CHAR   <- '\\' 'o' ([0-3] [0-7] [0-7] / [0-7] [0-7] / [0-7])
                !(_ALLOWED_NAME_CHARACTER / _DIGIT)

# any single non-combining-mark character after backslash
UNICODE_CHAR <- '\\' ![\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF] .
                !(_ALLOWED_NAME_CHARACTER / _DIGIT)


# ::/ is NOT a valid macro keyword
MACRO_KEYWORD <- '::' (_SIMPLE_KEYWORD '/')? _SIMPLE_KEYWORD

KEYWORD       <- ':' (_SIMPLE_KEYWORD '/')? (_SIMPLE_KEYWORD / '/')

_SIMPLE_KEYWORD <- _KEYWORD_HEAD _KEYWORD_BODY+
                 / _KEYWORD_HEAD

_KEYWORD_BODY <- _KEYWORD_HEAD / ':'

_KEYWORD_HEAD <- _ALLOWED_NAME_CHARACTER / _DIGIT / [#'] / _SIGN


SYMBOL <- (_SIMPLE_SYMBOL '/')? (_SIMPLE_SYMBOL / '/')

# longer alternatives first for correct PEG matching
_SIMPLE_SYMBOL <- _SIGN _SYMBOL_HEAD _SYMBOL_BODY*
                / _ALLOWED_NAME_CHARACTER (_SYMBOL_HEAD / _DIGIT / ':') _SYMBOL_BODY*
                / _ALLOWED_NAME_CHARACTER
                / _SIGN

_SYMBOL_BODY <- _SYMBOL_HEAD / _DIGIT / ':'

_SYMBOL_HEAD <- _ALLOWED_NAME_CHARACTER / [#'] / _SIGN


# Characters allowed in names: everything EXCEPT reserved characters.
# Equivalent to ANTLR ~[\p{White_Space},()[\]{}"@~^;`\\:#'/0-9+-]
# Note: \p{White_Space} approximated with common whitespace chars here.
_ALLOWED_NAME_CHARACTER <- ![ \t\n\r\f,()\[\]{}"@~^;`\\:#'/0-9+\-] .

_SIGN  <- [+\-]

_DIGIT <- [0-9]

But of course, you can't just load a PEG grammar into LPeg. Instead, you need to write the grammar programmatically, using LPeg's API. I'll spare you the details, but here's the full source code of the parser.

After the conversion work, I finally had a parser that could produce an AST:

>> (local {: parse} (require :impl.parser))
nil
>> (parse "(ns parcera.core
             (:require [clojure.data :as data]
                       [clojure.string :as str]))")
[:code
 [:list
  [:symbol "ns"]
  [:whitespace " "]
  [:symbol "parcera.core"]
  [:whitespace "\n             "]
  [:list
   [:keyword ":require"]
   [:whitespace " "]
   [:vector
    [:symbol "clojure.data"]
    [:whitespace " "]
    [:keyword ":as"]
    [:whitespace " "]
    [:symbol "data"]]
   [:whitespace "\n                       "]
   [:vector
    [:symbol "clojure.string"]
    [:whitespace " "]
    [:keyword ":as"]
    [:whitespace " "]
    [:symbol "str"]]]]]

Or not:

>> (parse "(ns parcera.core
             (:require [clojure.data :as data]
                       [clojure.string :as str])") ;; <- missing ')'
nil

Turns out, LPeg doesn’t have any way to report errors if the input string doesn’t match. It simply returns nil.

Luckily for me, there’s an additional library, called lpeglabel that can help with that. It implements an extension to the LPeg library that allows defining tagged failures:

>> (parse "(ns parcera.core
             (:require [clojure.data :as data]
                       [clojure.string :as str])")
unknown:3:49: Parse error: EOF while reading: Expected an ')'

Now I have a working Clojure parser with a usable output.

The parser

I mentioned before that Clojure metadata and special characters introduced problems in my original parser. Well, with this grammar, that's no longer an issue:

>> (each [_ s (ipairs ["@foo" "^Foo bar" "^{:baz true} qux" "#_1 2"])]
     (pp (parse s)))
[:code [:deref [:symbol "foo"]]]
[:code [:metadata
        [:metadata-entry [:symbol "Foo"]]
        [:whitespace " "]
        [:symbol "bar"]]]
[:code [:metadata
        [:metadata-entry
         [:map [:keyword ":baz"] [:whitespace " "] [:symbol "true"]]]
        [:whitespace " "]
        [:symbol "qux"]]]
[:code [:discard [:number "1"]] [:whitespace " "] [:number "2"]]

Yes, the parser doesn’t transform @foo into (deref foo), but it’s OK, as I can handle it in the compiler easily now. Metadata is fully supported by the parser and is stripped in the compiler. And discard (#_) actually works in all of its supported ways, i.e. #_#_ 1 2 works fine.

Each parser result starts with :code, the entry point for the compiler. And the compiler itself is just a simple recursive descent over all of the nodes in a loop. So, that's good, right?
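To illustrate the shape of that descent (in Clojure for readability; the real compiler is Fennel, and this function is hypothetical), here is a pass that walks the tagged tree and drops the :whitespace nodes, which carry no semantic information:

```clojure
;; Illustrative sketch, not the project's actual compiler: a recursive
;; descent over the tagged tree that removes :whitespace nodes.
(defn strip-whitespace [node]
  (cond
    ;; whitespace nodes are dropped entirely
    (and (vector? node) (= :whitespace (first node))) nil
    ;; any other tagged node: keep the tag, recurse into the children
    (vector? node) (into [(first node)]
                         (keep strip-whitespace)
                         (rest node))
    ;; leaf values (strings) pass through unchanged
    :else node))

(strip-whitespace [:code [:list [:symbol "ns"] [:whitespace " "] [:symbol "x"]]])
;; => [:code [:list [:symbol "ns"] [:symbol "x"]]]
```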

Well, yes and no. There are some cons to this approach.

First and foremost, this parser doesn't support streaming, meaning I can't feed it chunks of code and compile them as they arrive. It's not impossible - it just works on strings only for now. It would be possible to write a second parser that splits text into top-level expressions, but for now that's an optimization waiting to happen. This means files are parsed in full, with a correspondingly larger memory footprint.

Second, and arguably more important, this is a C library dependency. Both LPeg and lpeglabel are C libraries, meaning the compiler now works only on Lua runtimes that can load object files, and the choice of systems where it can run is limited to those that can compile these libraries. Not that big of a deal, if you ask me, since C runs almost everywhere, but it makes the compiler difficult to ship as a no-dependency binary. It's possible to compile a Fennel script into a binary and include all of its dependencies, but that would mean producing binaries for many different operating systems and architectures - or, alternatively, requiring people to install LPeg and lpeglabel through luarocks.

For now, I’m willing to accept this. There’s a port of LPeg in pure Lua, but it’s quite slow and doesn’t have support for lpeglabel features. There’s an open request for that, but it’s been sitting open for quite some time.

Other than that, the parsing is complete and we can look at the compiler part of the ClojureFnl project. But that’s gonna be in the next post.


Immutability - Not a Universal Law but a Trade-off

Introduction

Immutability is often presented as a best practice: once data is created, it is never changed. Instead of modifying existing data, you create new versions.

That sounds clean, predictable, and safe—and it often is. But treating immutability as a universal rule leads to overengineering and unnecessary complexity. Like most architectural decisions, immutability is a trade-off.

This post explores where immutability shines, where it hurts, and how to apply it pragmatically across different levels of a system.

Code Level

At the code level, immutability is usually a clear win.

Where it works well

Functions
Pure functions benefit directly from immutable data. No hidden side effects, easier reasoning, simpler testing.

Example: updating user’s balance using Clojure.

(defn add-balance [user amount]
  (update user :balance + amount))

(def user {:id 1 :balance 100})

(def updated-user (add-balance user 20))

;; user stays unchanged
;; updated-user is a new map

Why this is strong

  • No mutation
  • Same input → same output
  • Original data untouched
  • Thread-safe by default

Concurrency (threads/services updating shared data)
With mutation: Problems are hidden inside objects
With immutability: Problems are visible at coordination points

Example: updating user's balance using Java

public class User {
    private int balance;

    public User(int balance) {
        this.balance = balance;
    }

    public void addBalance(int amount) {
        this.balance += amount;
    }

    public int getBalance() {
        return balance;
    }
}

public class Main {
    public static void main(String[] args) throws InterruptedException {
        User user = new User(0);

        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                user.addBalance(1);
            }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);

        t1.start();
        t2.start();

        t1.join();
        t2.join();

        System.out.println("Final balance: " + user.getBalance());
/*
What should happen? Expected result: 2000.
What actually happens? Usually some other, smaller number.
The problem is that this.balance += amount; is not atomic:
two threads can interleave it -> a classic race condition.
*/
    }
}

/*
Fix 1: Synchronization (works, but costly)
*/ 

public class User {
    private int balance;

    public User(int balance) {
        this.balance = balance;
    }

    public synchronized void addBalance(int amount) {
        this.balance += amount;
    }

    public synchronized int getBalance() {
        return balance;
    }
}

/*
Fix 2: Atomic types (better, but still mutable)
*/

public class User {
    private final AtomicInteger balance = new AtomicInteger(0);

    public void addBalance(int amount) {
        balance.addAndGet(amount);
    }

    public int getBalance() {
        return balance.get();
    }
}

/*
Fix 3: Immutability (in Java, the reference to the immutable User can still be overwritten unsafely)
*/ 

public final class User {
    private final int balance;

    public User(int balance) {
        this.balance = balance;
    }

    public User addBalance(int amount) {
        return new User(this.balance + amount);
    }

    public int getBalance() {
        return balance;
    }
}

/*
Fix 4: Immutability + explicit coordination

With an immutable User inside an AtomicReference, every change goes
through one visible coordination point.

In mutable code (this.balance += 1;) concurrency problems are:
* invisible
* implicit
* easy to miss
*/

public class Main {
    public static void main(String[] args) throws InterruptedException {
        AtomicReference<User> userRef = new AtomicReference<>(new User(0));

        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                userRef.updateAndGet(user -> user.addBalance(1));
            }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);

        t1.start();
        t2.start();

        t1.join();
        t2.join();

        System.out.println(userRef.get().getBalance());
    }
}

Concurrency problems are not caused by threads—they are caused by shared mutable state.

Why this is strong

  • Mutable state fails silently under concurrency.
  • Immutable state forces you to be explicit about how change happens.
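The same fix reads naturally in Clojure (a minimal sketch, analogous to the AtomicReference version above): an immutable map held in an atom, where swap! is the single, visible coordination point. Because swap! retries on contention, the update function must stay pure.

```clojure
;; Sketch: immutable value + atom as the coordination point.
(def user (atom {:id 1 :balance 0}))

(defn add-balance! [amount]
  (swap! user update :balance + amount))   ; atomic compare-and-swap

;; two threads, 1000 increments each: the same scenario as the Java example
(let [threads (repeatedly 2 #(Thread. (fn [] (dotimes [_ 1000] (add-balance! 1)))))]
  (run! #(.start %) threads)
  (run! #(.join %) threads))

(:balance @user)
;; => 2000
```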

Conclusion

Immutability at the code level is not controversial—it’s cheap and effective.
The real trade-offs begin when you move beyond functions into data modeling and system design.

Data & Domain Level

This is where things get more complicated, because immutability becomes a design decision with consequences. Let's explore them, but first, let's understand the problem:

At this level, the question changes from:

Should I mutate this object?

to:

What is my data model—state or history?

You are choosing between:

  • A mutable, state-based model: you store the current truth. The system answers, "What is the state right now?"
  • An immutable, event- or history-based model: you store what happened. The system answers, "How did we get here?"

Example
Mutable Model

class User {
    int balance;
}
// operations
user.balance += 20;
user.balance -= 10;

What you get is the current balance.

Example Immutable Model (event-based)

import java.util.List;

sealed interface Event {}

record BalanceIncreased(int amount) implements Event {}
record BalanceDecreased(int amount) implements Event {}

// state is derived
int apply(List<Event> events) {
    return events.stream()
        .mapToInt(event -> {
            if (event instanceof BalanceIncreased inc) {
                return inc.amount();
            } else if (event instanceof BalanceDecreased dec) {
                return -dec.amount();
            } else {
                return 0;
            }
        })
        .sum();
}
/*
 This works nicely for simple aggregations.
 For real domain logic, you almost always end up with a fold/reduce over immutable state.
*/

What you get is a history of transactions.

How it works

Instead of updating a single record:

  • user.status = "active"

You model state transitions:

  • UserRegistered
  • UserActivated
  • UserSuspended

Or instead of overwriting:

  • balance = 120

You append:

  • +20
  • -10
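Replaying that appended history recovers the current state. A compact, self-contained sketch (the Event types mirror the earlier example; Replay is an illustrative name):

```java
import java.util.List;

sealed interface Event permits BalanceIncreased, BalanceDecreased {}
record BalanceIncreased(int amount) implements Event {}
record BalanceDecreased(int amount) implements Event {}

class Replay {
    // Fold the event history into the current balance, starting from 0.
    static int currentBalance(List<Event> history) {
        int balance = 0;
        for (Event e : history) {
            if (e instanceof BalanceIncreased inc) balance += inc.amount();
            else if (e instanceof BalanceDecreased dec) balance -= dec.amount();
        }
        return balance;
    }
}
```

Appending +20 and then -10 to an empty history yields a current balance of 10, and any past state can be reconstructed by replaying a prefix of the log.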

Pros

  • Traceability
    You know what happened, not just what is.

  • Auditability
    Full history is preserved.

  • Debugging
    You can reconstruct past states.

  • Temporal queries
    “What was the value at time T?”

Cons

  • More storage
    You store every change, not just the latest state.

  • More complex data modeling
    Designing events or state transitions requires more thinking than simple CRUD: you must design events, invariants, and transitions.

  • Not always needed
    Do you really need to know when and why user.name changed?

When to use it

  • Shared state across systems
  • Domains where history matters (finance, auditing, workflows)

When not to use it

  • Large objects with frequent small updates
  • Domains where history has no business value (Data changes frequently with no meaning)
  • Simple data, like user profiles, where only the latest state matters (e.g. CRUD systems: forms → tables → save)

Conclusion

Neither model is universally better: weigh the pros and cons above, and choose based on whether history has business value in your domain.

Types of immutable models

  • Event-based model: domain events (just discussed)

  • Versioned state: instead of storing events, you store full versions of the state.

  • Append-only log (state + history hybrid): lightweight immutability, because you track what changed, not why. Example:
    { "field": "status", "old": "PENDING", "new": "SHIPPED" }

  • Immutable value objects: you keep immutability inside the domain, but store only the current state.
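To make the "versioned state" variant concrete, here is a minimal sketch (OrderVersion and OrderHistory are illustrative names, not from any library): each update appends a full snapshot, which makes temporal queries trivial.

```java
import java.util.ArrayList;
import java.util.List;

// Versioned state: store full snapshots instead of events (hypothetical sketch).
record OrderVersion(int version, String status) {}

class OrderHistory {
    private final List<OrderVersion> versions = new ArrayList<>();

    // Each "update" appends a new full snapshot; nothing is overwritten.
    void transitionTo(String status) {
        versions.add(new OrderVersion(versions.size() + 1, status));
    }

    OrderVersion current() {
        return versions.get(versions.size() - 1);
    }

    // Temporal query: what was the state at version v?
    OrderVersion at(int version) {
        return versions.get(version - 1);
    }
}
```

Compared to the event-based model, this trades "why did it change" for simpler reads: any version is already materialized.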

API Level

Traditional APIs are usually state-oriented. They ask the client to send the final desired state.

Instead of sending the final state, the client sends the intent, i.e. a command.

Command-Based API

  • The client sends intent
  • API becomes business-driven

Example
POST /orders/123/ship

Instead of

PUT /orders/123
{
  "status": "SHIPPED"
}

You end up with more endpoints
Instead of one CRUD endpoint, you may have many:

/ship
/cancel
/refund

Client must know actions, not just state shape

Event-Style API

This style records what happened.

Example

POST /orders/123/events
{
  "type": "OrderShipped",
  "timestamp": "2026-04-23T10:00:00Z"
}

Commands are requests
Events are facts

Clients usually should not publish domain events directly
Better:

  • clients send commands
  • server emits events
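That split between commands and events can be sketched in a few lines of Java (the types and the handle method are hypothetical, for illustration): the client constructs a command expressing intent, and the server turns a valid command into an emitted event.

```java
// Hypothetical command types: the client sends intent, not final state.
sealed interface OrderCommand permits ShipOrder, CancelOrder, RefundOrder {}
record ShipOrder(String orderId) implements OrderCommand {}
record CancelOrder(String orderId, String reason) implements OrderCommand {}
record RefundOrder(String orderId, int amountCents) implements OrderCommand {}

class OrderService {
    // The server validates the command and emits the corresponding event name
    // (simplified: a real service would check invariants and persist the event).
    static String handle(OrderCommand command) {
        return switch (command) {
            case ShipOrder s   -> "OrderShipped";
            case CancelOrder c -> "OrderCancelled";
            case RefundOrder r -> "OrderRefunded";
        };
    }
}
```

Because OrderCommand is sealed, the switch is exhaustive: adding a new command is a compile error until every handler deals with it.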

Use command/event style when

  • workflows matter
  • domain logic is rich
  • multiple systems react to actions
  • auditability matters

Avoid when

  • simple CRUD apps (forms + tables) with no business workflows

Conclusion

CRUD APIs model data.
Command APIs model behavior.

Next steps

So far, we’ve looked at immutability in code, domain modeling, and APIs. At the code level it’s mostly a win, but once you move into domain and API design, trade-offs start to appear, and you need to be aware of them.

This is only the first part. In the next post, we’ll zoom out even further and explore what happens when these ideas hit persistence and system design, where those trade-offs become even more apparent.

Permalink

Why Agents Need an Infrastructure Package Manager

Kief Morris recently published Why I have my agents write infrastructure code, and it is one of the clearest articulations of Agentic DevOps I have read. His central observation is hard to argue with: when you let an agent drive the AWS CLI directly, it takes multiple tries, casually blows away data, and replaces solid implementations with slow, less-reliable alternatives. But when you ask an agent to write reusable infrastructure code instead of executing low-level CLI tools, you get something much safer: a composable, auditable artifact that a human can review before anything touches production.

We agree with everything in that post. In fact, we have been building exactly the ecosystem he envisions—we just call it a package manager.

The Gap Between Intent and Execution

Morris frames the problem as an agent adapting pre-existing components to specific needs. The missing piece, as he notes, is the ecosystem of components itself. Without it, agents fall back to generating raw HCL or shell scripts from scratch, which is precisely where hallucinations and cascading failures live.

This is the same gap the web ecosystem solved between 2010 and 2015. Before React and npm, every JavaScript developer rewrote the same DOM manipulation logic from scratch. After React, the ecosystem converged on components. A developer—or today, an agent—doesn’t write a date-picker from scratch; they install one from a registry and wire it into the application.

DevOps needs the same shift. BigConfig is the attempt to make it happen.

BigConfig as the npm for Infrastructure

BigConfig is a Clojure-based infrastructure package manager. The mental model is intentionally close to npm or Maven:

  • Packages are versioned, composable units of infrastructure logic (compute, DNS, SMTP, Kubernetes controllers, etc.)
  • The lock file records the exact versions resolved, so every bb bigconfig apply is deterministic
  • Babashka executes the packages locally—no server, no daemon, no state machine to manage

When an agent works with BigConfig, it is not guessing Terraform resource attributes or wrestling with provider schema drift. It is composing packages the same way a React developer composes components: declare what you want, let the abstraction handle the how.

bb.edn
{:deps {io.github.amiorin/once {:git/url "https://github.com/amiorin/once"
                                :git/sha "8ffbbc2ea0974365575c7ee44b7d890e69447144"}}
 :tasks {:requires ([io.github.amiorin.once.package :as pkg])
         package {:doc "bb package create | bb package delete"
                  :task (pkg/once*
                         *command-line-args*
                         {:big-config.render/profile "online"
                          :big-config.workflow/params
                          {:domain "bigconfig.online"
                           :package "online"
                           :once {:applications [{:host "www.bigconfig.online"
                                                  :image "ghcr.io/amiorin/big-config-website:latest"}]}
                           :provider-compute "oci"
                           :oci-config-file-profile "DEFAULT"
                           :oci-display-name "bigconfig-online"
                           :oci-shape "VM.Standard.A1.Flex"
                           :oci-ocpus 1
                           :oci-memory-in-gbs 6
                           :oci-boot-volume-size-in-gbs 50
                           :oci-boot-volume-vpus-per-gb 30
                           :oci-ssh-authorized-keys "~/.ssh/id_ed25519.pub"
                           :provider-smtp "resend"
                           :resend-server "smtp.resend.com"
                           :resend-port 587
                           :resend-username "resend"
                           :provider-dns "cloudflare"
                           :provider-backend "s3"}})}}}

An agent reading this file understands the intent immediately. It does not need to know the Hetzner API, the Cloudflare zone API, or the Resend domain verification flow. Those details are encapsulated inside the package.

Why Clojure?

Morris notes that unlike static libraries, agents can adapt code to specific needs. This is where the language matters. Clojure’s data-driven programming model means that infrastructure packages are, at their core, plain data maps. An agent can read them, modify them, and reason about them without parsing a DSL or reverse-engineering a class hierarchy. The code that acts on the data is small and testable independently.

Babashka—the scripting runtime we use—runs the same Clojure source without a JVM startup penalty, which means an agent can invoke bb package create as naturally as it runs terraform apply, but with far more predictable outcomes.

Reusability Without Rigidity

One concern Morris raises is the tension between reusability and adaptability. A static library forces you into its abstractions; you fight it the moment your requirements diverge. BigConfig avoids this by treating packages as data transformations rather than black-box executables. Each package exposes its intermediate data before it runs, so an agent—or a human—can inspect and override specific values without forking the package.

This is the same principle as React’s props. You do not rewrite the component; you pass different props. In BigConfig, you pass different configuration keys.

The Ecosystem Is Already Growing

The once package is the first published example: a complete personal PaaS (compute + DNS + SMTP) that an agent can provision on Hetzner or OCI with a single command. More packages are in progress for Kubernetes namespaces, observability stacks, and database clusters.

Each new package is a node in the ecosystem Morris envisions. Each one reduces the surface area an agent must hallucinate, and increases the confidence that what gets applied to production is what was intended.

What This Means for Practitioners

Morris’s post ends with a hopeful note: if agents can adapt working, pre-built code to specific needs, we may have the basis for a healthy ecosystem of infrastructure components. We share that hope, and we think the path there is clear:

  1. Stop asking agents to write raw Terraform. They are being asked to generate “assembly code” for infrastructure, and they will keep making assembly-level mistakes.
  2. Build or adopt a package layer. The package is the unit of trust, not the generated file.
  3. Let the agent compose, not author. The moment the agent’s job becomes “pick the right packages and supply the right configuration,” the error rate drops dramatically.

If you are thinking about how to structure infrastructure for an agentic world, the package manager mental model is worth exploring. BigConfig is one implementation of that idea, built in Clojure with Babashka, and it is open source.

To see how your agent can operate a BigConfig package that encapsulates multiple Terraform resources and Ansible playbooks, start at the BigConfig Demo.

Permalink

The Quiet Sabotage: Why Most of My Dead Projects Died of Overthinking

Kevin Lynagh published a short essay this week about how he sabotages his own projects by overthinking, scope creep, and structural diffing. Four hours researching semantic diff tools when he needed an Emacs shortcut. Hundreds of hours on background research for a Clojure-Rust hybrid and a constraint-based CAD tool, neither shipped. The piece landed on me hard because I have been keeping a list.

My list is of projects I killed without shipping. Over the last three years it has grown to about forty. In the same window I shipped twenty. So for every one I ship I kill two.

The patterns of the killed ones are almost always the same. Market did not kill them. Competition did not kill them. I killed them by thinking too hard before building.

The three ways I sabotage a project

1. Researching the wrong depth first

Every dead project has a research graveyard in my notes. Deep reads on niche architecture choices. Benchmarks of libraries I would never actually use. Comparison tables of state management patterns. All of it collected before I had written a single line of the thing I wanted to build.

The depth is always disproportionate. I am four layers into a Rust async runtime before I have drawn the UI flow. I know the latency tradeoffs of three vector databases before I know if my users need vector search at all. The research feels productive because the notes fill up. The research is actually procrastination in a smart hat.

The tell: if you have spent more time on research than on the smallest shippable version of the thing, you are not researching. You are avoiding commitment.

2. Scope creep before mile one

Kevin's essay names this directly. He spent hundreds of hours on a programming language that was meant to be a tool to build a CAD app that was meant to design a shelf. The shelf never got built.

I do this constantly. A recent example: I wanted to add a simple analytics view to one of my tools. By the time I finished "planning," I had a writeup of how to build a self-hosted Plausible alternative because I did not want to pay for SaaS analytics for the other thirteen products I run. The scope went from "bar chart of last 30 days" to "OSS analytics platform." The bar chart never got built. The OSS analytics platform also never got built.

When the scope creeps before you have mile one in production, the original need gets orphaned. The orphan dies quietly because the new scope is now too ambitious for a single weekend, and you already have a day job.

3. Tooling yak-shaves dressed up as foundational work

This is the subtle one. You are about to start the project. You open your editor. The editor setup feels slightly wrong. You spend a Saturday on a new dotfiles config. The config involves a new terminal. The new terminal does not render your font correctly. You rebuild your font-rendering pipeline. By Sunday night you have not written a line of the project and you have a beautiful terminal.

The yak-shave feels adjacent to real work because it involves tools, and tools are productive. They are only productive if the thing they enable is also happening.

I have learned to run a sanity check: if the task I just started is not the project, and does not directly unblock the project within the next hour, I close the tab.

What actually ships

Every project I have shipped followed the same shape. Not "planned carefully." Not "researched thoroughly." The shape is:

  1. Wrote the smallest useful thing (1 weekend)
  2. Put it in front of one person who might use it (1 evening)
  3. Fixed the thing they actually needed fixed, not the thing I thought they needed fixed (1 weekend)
  4. Released a v0.1 publicly (1 afternoon)
  5. Started iterating based on real usage (ongoing)

None of this requires prep research. It requires picking a small wedge and accepting that the first version will be wrong in ways you cannot predict.

The heuristic I now use

When I catch myself about to open another tab for research, I ask: can I make progress on this with a slightly worse tool?

  • Do I need the semantically-correct diff tool, or can I use git diff and move on?
  • Do I need a vector database, or can I use SQLite full-text search for the first 1,000 users?
  • Do I need a real design system, or can I use Tailwind defaults and ship?

The answer is almost always yes. The worse tool is almost always enough. The time saved goes into actual product.

One specific reframe that helped

I stopped asking "what is the right way to do this." I started asking "what is the smallest thing I can ship that would embarrass me in a useful way."

The embarrassment is the signal. If v0.1 makes you flinch when users see it, that flinch will tell you what to build next. If v0.1 is polished, you have over-scoped and you will not learn what actually matters.

My most successful products all embarrassed me at launch. theSVG shipped with 400 icons when the category leader had 10,000. Stacklit shipped with a single compression algorithm when more sophisticated approaches existed. Glin-Profanity shipped for one language. Every one of them got where they are because the first version landed and users told me what to fix.

The projects I should have killed sooner

Looking at my graveyard, the top regret is not the projects I started and did not finish. It is the projects I kept alive for eight months of quiet background worry before admitting they were dead. Each of those eight-month zombies cost me a weekend per month in research and note-taking, while producing nothing.

If you have been working on something for more than sixty days without a shipped version anyone has touched, you are probably not going to finish. Kill it. The grief is shorter than the slow bleed.

The uncomfortable conclusion

The reason overthinking sabotages projects is not that thinking is bad. Thinking is fine. Thinking without a deadline is the problem. Without a shipping date, every new piece of information looks like it might matter, and the project drowns in adjacency.

Kevin's shelf project worked because he gave it a weekend. Mine work when I give them one too. When I do not, they join the graveyard.

Build the small thing. Ship the bad thing. Let users tell you what the good thing is.

Do you have a graveyard? What killed the projects you did not ship? Comments open.

Permalink

bisql (Clojure Data Access Library) released v0.4.0: Support Malli Validation

Bisql

I'm building Bisql, a data access library for 2-way SQL in Clojure.

https://github.com/hatappo/bisql

Since it's “2-way,” I called it bisql with the bi- prefix.

It is pronounced like bicycle.

A common weakness of SQL-template libraries, not just 2-way SQL libraries, is that writing all SQL as templates can become tedious.

To address that, many SQL-template libraries support a query builder.

You can write some queries as templates and others with a builder.

But I think that approach is fundamentally inconsistent.

If every query function is maintained as SQL or SQL templates, the cost of reviewing and understanding the data access layer drops significantly. Every database access has a concrete representation as an actual SQL file.

Once a query builder gets mixed in, that consistency is gone.

And in practice, what starts as “we’ll only use the builder for simple queries” often drifts into using it for complex queries as well, until the generated SQL is no longer obviously what you intended.

Bisql takes a different approach:

every database access should be written as SQL.

That said, hand-writing even simple CRUD operations for every table is tedious. So Bisql takes another approach there: it generates a large and comprehensive set of typical CRUD queries automatically.

It connects to a real database, inspects the schema, considers indexes, and generates SQL templates for many index-friendly query patterns.

Then the defquery macro converts all of those .sql template files into Clojure functions at once.

So you keep the consistency of SQL-first development without the repetitive CRUD work.

Malli Support

In this release, generated SQL templates can now include :malli/in and :malli/out declaration metadata.

These hold schemas for:

  • the parameters passed to a query function
  • the response data returned from the query

Bisql SQL templates can already carry arbitrary metadata that becomes metadata on the generated query functions. Malli support builds on that.

If a query function has :malli/in and :malli/out metadata, Bisql can automatically run Malli validation during query execution. This behavior is configurable.

Bisql also generates a base Malli schema file for each table as schema.clj.

The :malli/in and :malli/out metadata refer to those generated schemas.

Example of a generated query

/*:name crud.get-by-id */
/*:cardinality :one */
/*:malli/in [:map {:closed true} [:id int?]] */
/*:malli/out [:maybe sql.postgresql.public.users.schema/row] */
SELECT *
FROM users
WHERE id = /*$id*/1

Example of a generated schema

(ns sql.postgresql.public.users.schema
  (:refer-clojure :exclude [update])
  (:require [bisql.schema :as bisql.schema]))

#_{:clojure-lsp/ignore [:clojure-lsp/unused-public-var]}
(def insert
  [:map
   {:closed true}
   [:id [:or int? bisql.schema/malli-default-sentinel]]
   [:email string?]
   [:display-name string?]
   [:status [:or string? bisql.schema/malli-default-sentinel]]
   [:created-at [:or [:fn bisql.schema/offset-date-time?] bisql.schema/malli-default-sentinel]]])

#_{:clojure-lsp/ignore [:clojure-lsp/unused-public-var]}
(def update
  (bisql.schema/malli-map-all-entries-optional insert))

#_{:clojure-lsp/ignore [:clojure-lsp/unused-public-var]}
(def row
  (bisql.schema/malli-map-all-entries-strip-default-sentinel insert))

In other words, typical CRUD queries and their schemas can now be generated automatically, and validation can run transparently with very little manual work.

A Small Expression Language for if

This release also adds a small expression language for if conditions inside SQL templates.

That makes conditional rendering more expressive without leaving SQL templates or introducing a separate query builder layer.

Permalink

Total functions in untyped languages


I was explaining an idea from my book to a coworker. I was saying that total functions can reduce defensive coding and simplify code. My coworker commented that it’s kind of meaningless to talk about totality in an untyped language.

I don’t agree, but I do understand the idea. It is much harder to define a total function in an untyped language than in a typed one. When you’ve got types, you can say “A function is total if it returns a value (instead of throwing an error) for any arguments that pass the type checker.” It’s a straightforward definition. Or so it seems.

Totality is an idea from mathematics, specifically in the field of computability. A function is total if every combination of arguments in the domain of the function results in a value in its range. The domain and range are each sets of values. The domain specifies valid arguments while the range specifies valid return values.

And here we see a major problem: We’re using domain in two different ways. One is about the valid arguments to a function. The other is domain as in domain modeling, where it means the area of concern of your software. It’s tricky to talk about the first when in the context of the second.

Division is total in mathematics. Division’s domain is

ℝ x (ℝ - 0)

and the range is ℝ. That means division is defined for pairs of real numbers where the second element is not zero. That’s what we learned in high school algebra.

But in programming, division is the prime example of a partial function. How come? Well, when you define the function in a typed language like this:

function divide(a: number, b: number): number

You get a function that can throw an exception—so it’s not total—even after it passes the type checker:

divide(1, 0) //=> Exception! Divide by zero.

What we see is that we’ve mapped the “set of values” idea of domain from math onto the “types of arguments” idea in programming languages. It’s an imperfect mapping. And it’s a mapping many of us have gladly accepted. The function is partial because the language cannot express the domain perfectly.

That’s why the whole concept of total functions is important. It gets us thinking about the gaps in our mappings. It gets us thinking about how to do a better mapping. Or about what kinds of checks on the arguments we should do before we call a function or after we get a return value. In short, it’s about safety and trust.
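There are two standard ways to close that gap, and a typed mainstream language makes them concrete. You can widen the range so the function always returns something (an Optional), or shrink the domain so zero is unrepresentable (a wrapper type). A sketch in Java (these names and types are illustrative, not from the essay):

```java
import java.util.OptionalDouble;

class Totality {
    // Option 1: widen the range. The function is total over all doubles,
    // returning empty instead of a value when the math is undefined.
    static OptionalDouble divide(double a, double b) {
        return b == 0.0 ? OptionalDouble.empty() : OptionalDouble.of(a / b);
    }

    // Option 2: shrink the domain. A NonZero value cannot hold zero, so
    // divideTotal is total over its declared argument types.
    record NonZero(double value) {
        NonZero {
            if (value == 0.0) {
                throw new IllegalArgumentException("NonZero cannot be zero");
            }
        }
    }

    static double divideTotal(double a, NonZero b) {
        return a / b.value();
    }
}
```

In the second option the check moves to the point where the domain value is created, so the division itself never has to be defensive.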

But notice that even in typed languages, there is a gap in the mapping from domain to types. The gap may be smaller than with untyped languages, but there can be a gap. My colleague said it’s meaningless, which implies a Church Typing mindset. It implies that because there are no checks for types, you could pass anything, such as:

(+ "a" 4) ;=> Exception! Expected Number but got String.

That means that even venerable addition is not total in Clojure. I think this is a little disingenuous. This is obviously an incorrect program, even if in general, incorrectness is hard to define in untyped languages. The correct set of values is well-known for + even if it’s not written down. And for any function, all we have to do is write it down—in documentation—and the intended domain is explicit.

I don’t want this essay to be a debate about static vs. dynamic typing. This essay is about totality and how to define it. Totality is really a concept that came from computability theory. It’s for asking questions like “Does function f halt for all integer inputs?” The domain is specified by the problem, not the function. That’s why division is partial in programming: the domain is specified from outside, by the language’s choice of types.

In programming we use totality to consider the possible and probable inputs of a function. We concede that we must make imperfect tradeoffs between ideal mathematical objects and available language features. In most type systems, we can’t say “it’s a number, but it can’t be zero” and have the type checker ensure it’s correct.

So I’ve come up with a definition that takes the pragmatics of the idea into account:

A function is total if it returns a valid domain value for every combination of valid domain values as arguments.

In this definition, “valid domain value” is doing a lot of work. It’s purposefully vague. But it asks you, the domain modeler, to consider your domain (in the business domain sense) and the meaningful values in it. It asks you to consider what subset of those values are valid. These often don’t correspond perfectly to types in your language. Ideally there would be a perfect overlap between the domain of a function and the business domain values. Where the overlap is not perfect is where you should look.

I want programmers to think past types and peer deep into the domain. I want them to ask these questions:

  • What are the meaningful domain values that this function is meant to operate on?

  • Does that set constitute a cohesive concept?

  • Are there values that are conceptually cohesive that it won’t work for?

  • How can I map the set of domain values this function is defined on to language features to get some kind of safety guarantees?

This definition also has a curious effect. Let me demonstrate in Clojure. Let’s say I’m defining a function that should only be defined over non-negative numbers. I don’t have complex numbers in my business domain:

(defn square-root [x] …)

If I’m conscientious, I’ll add a docstring explaining the domain of the function (the set of arguments it is defined over):

(defn square-root
  "The square root of a non-negative number."
  [x]
  …)

Great! But I want the domain to be enforced in code. In Clojure, I might write:

(defn square-root
  "The square root of a non-negative number."
  [x]
  (assert (not (neg? x)))
  …)

Awesome! I’ve now made the function total! We’ve expressed the domain of the function adequately using the features of the language. Ironically, though, we have written code that is more likely to throw when totality is usually defined as throwing less often. But we’re also defining what are valid arguments. Calling (square-root -1) throws an AssertionError, indicating that you, the programmer, violated the contract. So it is total, considering the explicitly defined domain of the function.

These are the kinds of questions I wrestle with when writing a book. I know I use the idea of total function all the time when programming in Clojure. I think it’s valuable. But Clojure doesn’t have types, so what can it mean? Why do others believe it can’t mean anything? I think it’s important to get these ideas right. If you were ever wondering what takes so long to write a book, it’s reading, thinking, digging, conversing until I feel like I’ve got a handle on it.

When you’re working on software design, or any kind of design, there are always hard questions. You can’t rely on pat answers. Instead, you need conceptual models to give you different perspectives. Total functions is one of those. It helps you focus on the failure modes of your functions: Could this function be passed something that will break it? Will it ever return an unexpected value? What would cause it to throw an exception? It’s not about writing code according to some rule, or using a language feature in a particular pattern. You have to think broader than that.

Permalink

Proximal Policy Optimization with Clojure and PyTorch

(Cross posting article published at Clojure Civitas)

Motivation

Recently I started to look into the problem of reentry trajectory planning in the context of developing the sfsim space flight simulator. I had looked into reinforcement learning before and even tried out Q-learning using the lunar lander reference environment of OpenAI’s gym library (now maintained by the Farama Foundation). However it had stability issues. The algorithm would converge on a strategy and then suddenly diverge again.

More recently (2017) the Proximal Policy Optimization (PPO) algorithm was published, and it has gained in popularity. PPO is inspired by Trust Region Policy Optimization (TRPO) but is much easier to implement. PPO also handles continuous observation and action spaces, which is important for control problems. The Stable Baselines3 Python library has an implementation of PPO, TRPO, and other reinforcement learning algorithms. However, I found XinJingHao’s PPO implementation, which is easier to follow.

In order to use PPO with a simulation environment implemented in Clojure, and also to get a better understanding of PPO, I decided to implement PPO in Clojure.

Dependencies

For this project we are using the following deps.edn file. The Python setup is shown further down in this article.

{:deps
 {org.clojure/clojure {:mvn/version "1.12.4"}
  clj-python/libpython-clj {:mvn/version "2.026"}
  quil/quil {:mvn/version "4.3.1563"}
  org.clojure/core.async {:mvn/version "1.9.865"}}
}

The dependencies can be pulled in using the following statement.

(require '[clojure.math :refer (PI cos sin exp to-radians)]
         '[clojure.core.async :as async]
         '[tablecloth.api :as tc]
         '[scicloj.tableplot.v1.plotly :as plotly]
         '[quil.core :as q]
         '[quil.middleware :as m]
         '[libpython-clj2.require :refer (require-python)]
         '[libpython-clj2.python :refer (py.) :as py])

Pendulum Environment

screenshot of pendulum environment

To validate the implementation, we will implement the classical pendulum environment in Clojure. In order to be able to switch environments, we define a protocol according to the environment abstract class used in OpenAI’s gym.

(defprotocol Environment
  (environment-update [this action])
  (environment-observation [this])
  (environment-done? [this])
  (environment-truncate? [this])
  (environment-reward [this action]))

Here is a configuration for testing the pendulum.

(def frame-rate 20)

(def config
  {:length  (/ 2.0 3.0)
   :max-speed 8.0
   :motor 6.0
   :gravitation 10.0
   :dt (/ 1.0 frame-rate)
   :save false
   :timeout 10.0
   :angle-weight 1.0
   :velocity-weight 0.1
   :control-weight 0.0001})

Setup

A method to initialise the pendulum is defined.

(defn setup
  "Initialise pendulum"
  [angle velocity]
  {:angle          angle
   :velocity       velocity
   :t              0.0})

As in OpenAI’s gym, the angle is zero when the pendulum is pointing up. Here a pendulum is initialised pointing down with an angular velocity of 0.5 radians per second.

(setup PI 0.5)
; {:angle 3.141592653589793, :velocity 0.5, :t 0.0}

State Updates

The angular acceleration due to gravitation is implemented as follows.

(defn pendulum-gravity
  "Determine angular acceleration due to gravity"
  [gravitation length angle]
  (/ (* (sin angle) gravitation) length))

The angular acceleration depends on the gravitational acceleration, the length of the pendulum, and its angle.

(pendulum-gravity 9.81 1.0 0.0)
; 0.0
(pendulum-gravity 9.81 1.0 (/ PI 2))
; 9.81
(pendulum-gravity 9.81 2.0 (/ PI 2))
; 4.905

The motor is controlled using an input value between -1 and 1. This value is simply multiplied by the maximum angular acceleration provided by the motor.

(defn motor-acceleration
  "Angular acceleration from motor"
  [control motor-acceleration]
  (* control motor-acceleration))

A simulation step of the pendulum is implemented using Euler integration.

(defn update-state
  "Perform simulation step of pendulum"
  ([{:keys [angle velocity t]}
    {:keys [control]}
    {:keys [dt motor gravitation length max-speed]}]
   (let [gravity        (pendulum-gravity gravitation length angle)
         motor          (motor-acceleration control motor)
         t              (+ t dt)
         acceleration   (+ motor gravity)
         velocity       (max (- max-speed)
                             (min max-speed
                                  (+ velocity (* acceleration dt))))
         angle          (+ angle (* velocity dt))]
     {:angle          angle
      :velocity       velocity
      :t              t})))

Here are a few examples for advancing the state in different situations.

(update-state {:angle PI :velocity 0.0 :t 0.0} {:control 0.0} config)
; {:angle 3.141592653589793, :velocity 9.184850993605151E-17, :t 0.05}
(update-state {:angle PI :velocity 0.1 :t 0.0} {:control 0.0} config)
; {:angle 3.146592653589793, :velocity 0.1000000000000001, :t 0.05}
(update-state {:angle (/ PI 2) :velocity 0.0 :t 0.0} {:control 0.0} config)
; {:angle 1.6082963267948966, :velocity 0.75, :t 0.05}
(update-state {:angle 0.0 :velocity 0.0 :t 0.0} {:control 1.0} config)
; {:angle 0.015000000000000003, :velocity 0.30000000000000004, :t 0.05}
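As a sanity check, the Euler step can be reproduced in a few lines of plain Python (a sketch mirroring update-state and the configuration above; it is not part of the simulator):

```python
import math

# Values from the Clojure config map above.
LENGTH, MAX_SPEED, MOTOR, GRAVITATION, DT = 2.0 / 3.0, 8.0, 6.0, 10.0, 1.0 / 20.0

def update_state(angle, velocity, t, control):
    """One Euler integration step of the pendulum (mirrors update-state)."""
    gravity = math.sin(angle) * GRAVITATION / LENGTH
    acceleration = control * MOTOR + gravity
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity + acceleration * DT))
    angle = angle + velocity * DT
    return angle, velocity, t + DT

angle, velocity, t = update_state(math.pi / 2, 0.0, 0.0, 0.0)
print(angle, velocity, t)  # roughly 1.6083, 0.75, 0.05, matching the third example
```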

Observation

The observation of the pendulum state uses the cosine and sine of the angle to resolve the wrap-around problem of angles. The angular velocity is normalized to be between -1 and 1 as well. This so-called feature scaling is done in order to improve convergence.

(defn observation
  "Get observation from state"
  [{:keys [angle velocity]} {:keys [max-speed]}]
  [(cos angle) (sin angle) (/ velocity max-speed)])

The observation of the pendulum is a vector with 3 elements.

(observation {:angle 0.0 :velocity 0.0} config)
; [1.0 0.0 0.0]
(observation {:angle 0.0 :velocity 0.5} config)
; [1.0 0.0 0.0625]
(observation {:angle (/ PI 2) :velocity 0.0} config)
; [6.123233995736766E-17 1.0 0.0]
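The benefit of the (cos, sin) encoding is easy to demonstrate: two angles that differ by a full turn describe the same physical state and yield the same observation. A small plain-Python sketch (not part of the environment code):

```python
import math

def observation(angle, velocity, max_speed=8.0):
    """Mirror of the Clojure observation function."""
    return (math.cos(angle), math.sin(angle), velocity / max_speed)

a = observation(0.1, 4.0)
b = observation(0.1 + 2 * math.pi, 4.0)  # same state, angle wrapped by a full turn
print(a)  # third component is 4.0 / 8.0 = 0.5
```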

Note that the observation needs to capture all information required for achieving the objective, because it is the only information available to the actor for deciding on the next action.

Action

The action of a pendulum is a vector with one element between 0 and 1. The following method maps it to a control value between -1 and 1, clips the result, and wraps it in the action hashmap used by the pendulum environment. Note that in general an action can consist of several values.

(defn action
  "Convert array to action"
  [array]
  {:control (max -1.0 (min 1.0 (- (* 2.0 (first array)) 1.0)))})

The following examples show how the action vector is mapped to a control input between -1 and 1.

(action [0.0])
; {:control -1.0}
(action [0.5])
; {:control 0.0}
(action [1.0])
; {:control 1.0}
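The arithmetic behind this mapping, scale by 2, shift by -1, then clip, can be sketched in plain Python:

```python
def action(array):
    """Map the first element from [0, 1] to a control value in [-1, 1], clipping the result."""
    return {"control": max(-1.0, min(1.0, 2.0 * array[0] - 1.0))}

print(action([0.25]))  # {'control': -0.5}
print(action([2.0]))   # out-of-range input is clipped: {'control': 1.0}
```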

Termination

The truncate method is used to stop a pendulum run after a specific amount of time.

(defn truncate?
  "Decide whether a run should be aborted"
  ([{:keys [t]} {:keys [timeout]}]
   (>= t timeout)))

(truncate? {:t 50.0} {:timeout 100.0})
; false
(truncate? {:t 100.0} {:timeout 100.0})
; true

It is also possible to define a termination condition. For the pendulum environment we specify that it never terminates.

(defn done?
  "Decide whether pendulum achieved target state"
  ([_state _config]
   false))

Reward

The following method normalizes an angle to be between -PI and +PI.

(defn normalize-angle
  "Angular deviation from up angle"
  [angle]
  (- (mod (+ angle PI) (* 2 PI)) PI))
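Python's % operator has the same semantics as Clojure's mod for a positive divisor, so the function translates directly; a quick sketch to check a few values:

```python
import math

def normalize_angle(angle):
    """Wrap an angle into the interval [-pi, pi) (mirrors normalize-angle)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

print(normalize_angle(3 * math.pi / 2))   # -pi/2: three quarter turns wrap backwards
print(normalize_angle(-3 * math.pi / 2))  # pi/2
```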

We also need the square of a number.

(defn sqr
  "Square of number"
  [x]
  (* x x))

The reward function penalises deviation from the upright position, non-zero velocities, and non-zero control input. Note that it is important that the reward function is continuous, because training relies on gradient descent and a discontinuous reward gives a poor learning signal.

(defn reward
  "Reward function"
  [{:keys [angle velocity]}
   {:keys [angle-weight velocity-weight control-weight]}
   {:keys [control]}]
  (- (+ (* angle-weight (sqr (normalize-angle angle)))
        (* velocity-weight (sqr velocity))
        (* control-weight (sqr control)))))
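With the weights from the configuration above, the reward is 0 at the upright rest state with no control input, and -π² when the pendulum hangs straight down. A plain-Python sketch of the same penalty:

```python
import math

ANGLE_WEIGHT, VELOCITY_WEIGHT, CONTROL_WEIGHT = 1.0, 0.1, 0.0001  # from config

def normalize_angle(angle):
    return (angle + math.pi) % (2 * math.pi) - math.pi

def reward(angle, velocity, control):
    """Quadratic penalty on angle deviation, velocity, and control (mirrors reward)."""
    return -(ANGLE_WEIGHT * normalize_angle(angle) ** 2
             + VELOCITY_WEIGHT * velocity ** 2
             + CONTROL_WEIGHT * control ** 2)

print(reward(0.0, 0.0, 0.0))      # no penalty at the upright rest state
print(reward(math.pi, 0.0, 0.0))  # -pi**2 when hanging straight down
```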

Environment Protocol

Finally we are able to implement the pendulum as a generic environment.

(defrecord Pendulum [config state]
  Environment
  (environment-update [_this input]
    (->Pendulum config (update-state state (action input) config)))
  (environment-observation [_this]
    (observation state config))
  (environment-done? [_this]
    (done? state config))
  (environment-truncate? [_this]
    (truncate? state config))
  (environment-reward [_this input]
    (reward state config (action input))))

The following factory method creates an environment with an initial random state covering all possible pendulum states.

(defn pendulum-factory
  []
  (let [angle     (- (rand (* 2.0 PI)) PI)
        max-speed (:max-speed config)
        velocity  (- (rand (* 2.0 max-speed)) max-speed)]
    (->Pendulum config (setup angle velocity))))

Visualisation

The following method is used to draw the pendulum and visualise the motor control input.

(defn draw-state [{:keys [angle]} {:keys [control]}]
  (let [origin-x   (/ (q/width) 2)
        origin-y   (/ (q/height) 2)
        length     (* 0.5 (q/height) (:length config))
        pendulum-x (+ origin-x (* length (sin angle)))
        pendulum-y (- origin-y (* length (cos angle)))
        size       (* 0.05 (q/height))
        arc-radius (* (abs control) 0.2 (q/height))
        positive   (pos? control)
        tip-angle  (if positive 225 -45)]
    (q/frame-rate frame-rate)
    (q/background 255)
    (q/stroke-weight 5)
    (q/stroke 0)
    (q/fill 175)
    (q/line origin-x origin-y pendulum-x pendulum-y)
    (q/stroke-weight 1)
    (q/ellipse pendulum-x pendulum-y size size)
    (q/no-fill)
    (q/arc origin-x origin-y
           (* 2 arc-radius) (* 2 arc-radius)
           (to-radians -45) (to-radians 225))
    (q/with-translation [(+ origin-x (* (cos (to-radians tip-angle)) arc-radius))
                         (+ origin-y (* (sin (to-radians tip-angle)) arc-radius))]
      (q/with-rotation [(to-radians (if positive 225 -45))]
        (q/triangle 0 (if positive 10 -10) -5 0 5 0)))
    (when (:save config)
      (q/save-frame "frame-####.png"))))

Animation

With Quil we can create an animation of the pendulum and react to mouse input.

(defn -main [& _args]
  (let [done-chan   (async/chan)
        last-action (atom {:control 0.0})]
    (q/sketch
      :title "Inverted Pendulum with Mouse Control"
      :size [854 480]
      :setup #(setup PI 0.0)
      :update (fn [state]
                  (let [action {:control (min 1.0
                                              (max -1.0
                                                   (- 1.0 (/ (q/mouse-x)
                                                             (/ (q/width) 2.0)))))}
                        state  (update-state state action config)]
                    (when (done? state config) (async/close! done-chan))
                    (reset! last-action action)
                    state))
      :draw #(draw-state % @last-action)
      :middleware [m/fun-mode]
      :on-close (fn [& _] (async/close! done-chan)))
    (async/<!! done-chan))
  (System/exit 0))

manually controlled pendulum

Neural Networks

PPO is a machine learning technique using backpropagation to learn the parameters of two neural networks.

  • The actor network takes an observation as an input and outputs the parameters of a probability distribution for sampling the next action to take.
  • The critic takes an observation as an input and outputs the expected cumulative reward for the current state.

Import PyTorch

For implementing the neural networks and backpropagation, we can use the Python-Clojure bridge libpython-clj2 and the PyTorch machine learning library. The PyTorch library is quite comprehensive, is free software, and you can find a lot of documentation on how to use it. The default version of PyTorch on pypi.org comes with CUDA (Nvidia) GPU support. There are also PyTorch wheels provided by AMD which come with ROCm support. Here we are going to use a CPU version of PyTorch which is a much smaller install.

You need to install Python 3.10 or later. For package management we are going to use the uv package manager. The following pyproject.toml file is used to install PyTorch and NumPy.

[project]
name = "ppo"
version = "0.1.0"
description = "Proximal Policy Optimization"
authors = [{ name="Jan Wedekind", email="jan@wedesoft.de" }]
requires-python = ">=3.10.0"
dependencies = [
    "numpy",
    "torch",
]

[tool.uv]
python-preference = "only-system"

[tool.uv.sources]
torch = { index = "pytorch" }
numpy = { index = "pytorch" }

[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu"

[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"

Note that we are specifying a custom repository index to get the CPU-only version of PyTorch. Also we are using the system version of Python to prevent uv from trying to install its own version which lacks the _cython module. To freeze the dependencies and create a uv.lock file, you need to run

uv lock

You can install the dependencies using

uv sync

In order to access PyTorch from Clojure you need to run the clj command via uv:

uv run clj

Now you should be able to import the Python modules using require-python.

(require-python '[builtins :as python]
                '[torch :as torch]
                '[torch.nn :as nn]
                '[torch.nn.functional :as F]
                '[torch.optim :as optim]
                '[torch.distributions :refer (Beta)]
                '[torch.nn.utils :as utils])
; :ok

Tensor Conversion

First we implement a few methods for converting nested Clojure vectors to PyTorch tensors and back.

Clojure to PyTorch

The method tensor is for converting a Clojure datatype to a PyTorch tensor.

(defn tensor
  "Convert nested vector to tensor"
  ([data]
   (tensor data torch/float32))
  ([data dtype]
   (torch/tensor data :dtype dtype)))

(tensor PI)
; tensor(3.1416)
(tensor [2.0 3.0 5.0])
; tensor([2., 3., 5.])
(tensor [[1.0 2.0] [3.0 4.0] [5.0 6.0]])
; tensor([[1., 2.],
;         [3., 4.],
;         [5., 6.]])
(tensor [1 2 3] torch/long)
; tensor([1, 2, 3])

PyTorch to Clojure

The next method is for converting a PyTorch tensor back to a Clojure datatype.

(defn tolist
  "Convert tensor to nested vector"
  [tensor]
  (py/->jvm (py. tensor tolist)))

(tolist (tensor [2.0 3.0 5.0]))
; [2.0 3.0 5.0]
(tolist (tensor [[1.0 2.0] [3.0 4.0] [5.0 6.0]]))
; [[1.0 2.0] [3.0 4.0] [5.0 6.0]]

PyTorch scalar to Clojure

A tensor with no dimensions can also be converted using toitem.

(defn toitem
  "Convert torch scalar value to float"
  [tensor]
  (py. tensor item))

(toitem (tensor PI))
; 3.1415927410125732

Critic Network

The critic network is a neural network with an input layer of size observation-size and two fully connected hidden layers of size hidden-units with tanh activation functions. The critic output is a single value (an estimate for the expected cumulative return achievable by the given observed state).

(def Critic
  (py/create-class
    "Critic" [nn/Module]
    {"__init__"
     (py/make-instance-fn
       (fn [self observation-size hidden-units]
           (py. nn/Module __init__ self)
           (py/set-attrs!
             self
             {"fc1" (nn/Linear observation-size hidden-units)
              "fc2" (nn/Linear hidden-units hidden-units)
              "fc3" (nn/Linear hidden-units 1)})
           nil))
     "forward"
     (py/make-instance-fn
       (fn [self x]
           (let [x (py. self fc1 x)
                 x (torch/tanh x)
                 x (py. self fc2 x)
                 x (torch/tanh x)
                 x (py. self fc3 x)]
             (torch/squeeze x -1))))}))

When running inference, you need to run the network with gradient tracking disabled; otherwise PyTorch records a computation graph, and gradients can leak into a subsequent training step. In Python this looks like this.

with torch.no_grad():
    # ...

Here we create a Clojure macro to do the same job.

(defmacro without-gradient
  "Execute body without gradient calculation"
  [& body]
  `(let [no-grad# (torch/no_grad)]
     (try
       (py. no-grad# ~'__enter__)
       ~@body
       (finally
         (py. no-grad# ~'__exit__ nil nil nil)))))

Now we can create a network and try it out. We create a test multilayer perceptron with three inputs, two hidden layers of 8 units each, and one output.

(def critic (Critic 3 8))

example of critic multilayer perceptron

Note that the network creates non-zero outputs because PyTorch performs random initialisation of the weights for us.

(without-gradient
  (toitem (critic (tensor [-1 0 0]))))
; -0.38925105333328247

We can also create a wrapper for using the neural network with Clojure datatypes.

(defn critic-observation
  "Use critic with Clojure datatypes"
  [critic]
  (fn [observation]
      (without-gradient (toitem (critic (tensor observation))))))

Here is the output of the network for the observation [-1 0 0].

((critic-observation critic) [-1 0 0])
; -0.38925105333328247

Training

Training a neural network is done by defining a loss function. The loss of the network is then calculated for a mini-batch of training data. One can then use PyTorch’s backpropagation to compute the gradient of the loss value with respect to every single parameter of the network. The gradient is then used to perform a gradient descent step. A popular gradient descent method is the Adam optimizer.

Here is a wrapper for the Adam optimizer.

(defn adam-optimizer
  "Adam optimizer"
  [model learning-rate weight-decay]
  (optim/Adam (py. model parameters) :lr learning-rate :weight_decay weight-decay))

PyTorch also provides the mean square error (MSE) loss function.

(defn mse-loss
  "Mean square error cost function"
  []
  (nn/MSELoss))

A training step can be performed as follows. Here we only use a single mini-batch with a single observation and an expected output of 1.0.

(def optimizer (adam-optimizer critic 0.01 0.0))
(def criterion (mse-loss))
(def mini-batch [(tensor [[-1 0 0]]) (tensor [1.0])])
(let [prediction (critic (first mini-batch))
      expected   (second mini-batch)
      loss       (criterion prediction expected)]
  (py. optimizer zero_grad)
  (py. loss backward)
  (py. optimizer step))

As you can see, the output of the network for the observation [-1 0 0] is now closer to 1.0.

((critic-observation critic) [-1 0 0])
; -0.3086397051811218
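The zero_grad / backward / step cycle above is just gradient descent. As an illustration (a hand-computed sketch using vanilla gradient descent rather than Adam), here is one step on an MSE loss for a one-parameter model in plain Python:

```python
# One-parameter model w * x, single training sample (x, y), vanilla gradient descent.
w, x, y, lr = 0.0, 1.0, 1.0, 0.01

grad = 0.0                   # zero_grad: clear the accumulated gradient
loss = (w * x - y) ** 2      # forward pass: mean square error for one sample
grad += 2 * x * (w * x - y)  # backward: derivative of the loss with respect to w
w -= lr * grad               # step: move the parameter against the gradient

print(w, loss)  # 0.02 1.0: the weight moved towards the target
```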

Actor Network

The actor network for PPO takes an observation as an input and it outputs the parameters of a probability distribution over actions. In addition to the forward pass, the actor network has a method deterministic_act to choose the expectation value of the distribution as a deterministic action.

(def Actor
  (py/create-class
    "Actor" [nn/Module]
    {"__init__"
     (py/make-instance-fn
       (fn [self observation-size hidden-units action-size]
           (py. nn/Module __init__ self)
           (py/set-attrs!
             self
             {"fc1"     (nn/Linear observation-size hidden-units)
              "fc2"     (nn/Linear hidden-units hidden-units)
              "fcalpha" (nn/Linear hidden-units action-size)
              "fcbeta"  (nn/Linear hidden-units action-size)})
           nil))
     "forward"
     (py/make-instance-fn
       (fn [self x]
           (let [x (py. self fc1 x)
                 x (torch/tanh x)
                 x (py. self fc2 x)
                 x (torch/tanh x)
                 alpha (torch/add 1.0 (F/softplus (py. self fcalpha x)))
                 beta  (torch/add 1.0 (F/softplus (py. self fcbeta x)))]
             [alpha beta])))
     "deterministic_act"
     (py/make-instance-fn
       (fn [self x]
            (let [[alpha beta] (py. self forward x)]
              (torch/div alpha (torch/add alpha beta)))))
     "get_dist"
     (py/make-instance-fn
       (fn [self x]
           (let [[alpha beta] (py. self forward x)]
             (Beta alpha beta))))}))

Furthermore the actor network has a method get_dist to return a Torch distribution object which can be used to sample a random action or query the current log-probability of an action. Here (as the default in XinJingHao’s PPO implementation) we use the Beta distribution with parameters alpha and beta both greater than 1.0. See here for an interactive visualization of the Beta distribution.
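For reference, the mean of a Beta(α, β) distribution is α/(α+β), which is exactly what deterministic_act returns, and its density can be written with the gamma function. A plain-Python sketch (independent of PyTorch):

```python
import math

def beta_pdf(x, alpha, beta):
    """Probability density of the Beta distribution on (0, 1)."""
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm * x ** (alpha - 1) * (1 - x) ** (beta - 1)

def beta_mean(alpha, beta):
    """Expectation of the Beta distribution, as used by deterministic_act."""
    return alpha / (alpha + beta)

print(beta_mean(2.0, 2.0))      # 0.5: a symmetric distribution
print(beta_pdf(0.5, 2.0, 2.0))  # 1.5: the density peaks at the centre
```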

(defn indeterministic-act
  "Sample action using actor network returning random action and log-probability"
  [actor]
  (fn indeterministic-act-with-actor [observation]
      (without-gradient
        (let [dist    (py. actor get_dist (tensor observation))
              sample  (py. dist sample)
              action  (torch/clamp sample 0.0 1.0)
              logprob (py. dist log_prob action)]
          {:action (tolist action) :logprob (tolist logprob)}))))

We create a test multilayer perceptron with three inputs, two hidden layers of 8 units each, and two outputs which serve as parameters for the Beta distribution.

(def actor (Actor 3 8 1))

example of actor multilayer perceptron

One can then use the network to:

a. get the parameters of the distribution for a given observation.

(without-gradient (actor (tensor [-1 0 0])))
; (tensor([1.7002]), tensor([1.7489]))

b. choose the expectation value of the distribution as an action.

(without-gradient (py. actor deterministic_act (tensor [-1 0 0])))
; tensor([0.4929])

c. sample a random action from the distribution and get the associated log-probability.

((indeterministic-act actor) [-1 0 0])
; {:action [0.6526480913162231], :logprob [0.2350209504365921]}

We can also query the current log-probability of a previously sampled action.

(defn logprob-of-action
  "Get log probability of action"
  [actor]
  (fn [observation action]
      (let [dist (py. actor get_dist observation)]
        (py. dist log_prob action))))

Here is a plot of the probability density function (PDF) output by the actor for a single observation.

(without-gradient
  (let [actions (range 0.0 1.01 0.01)
        logprob (fn [action]
                    (tolist
                      ((logprob-of-action actor) (tensor [-1 0 0]) (tensor action))))
        scatter (tc/dataset
                  {:x actions
                   :y (map (fn [action] (exp (first (logprob [action])))) actions)})]
    (-> scatter
        (plotly/base {:=title "Actor output for a single observation" :=mode :lines})
        (plotly/layer-point {:=x :x :=y :y}))))

probability density function output of actor for a single observation

Finally we can also query the entropy of the distribution. By incorporating the entropy into the loss function later on, we can encourage exploration and prevent the probability density function from collapsing.

(defn entropy-of-distribution
  "Get entropy of distribution"
  [actor observation]
  (let [dist (py. actor get_dist observation)]
    (py. dist entropy)))

(without-gradient (entropy-of-distribution actor (tensor [-1 0 0])))
; tensor([-0.0825])

Proximal Policy Optimization

Sampling data

In order to perform optimization, we sample the environment using the current policy (indeterministic actions sampled from the actor).

(defn sample-environment
  "Collect trajectory data from environment"
  [environment-factory policy size]
  (loop [state             (environment-factory)
         observations      []
         actions           []
         logprobs          []
         next-observations []
         rewards           []
         dones             []
         truncates         []
         i                 size]
    (if (pos? i)
      (let [observation      (environment-observation state)
            sample           (policy observation)
            action           (:action sample)
            logprob          (:logprob sample)
            reward           (environment-reward state action)
            done             (environment-done? state)
            truncate         (environment-truncate? state)
            next-state       (if (or done truncate)
                               (environment-factory)
                               (environment-update state action))
            next-observation (environment-observation next-state)]
        (recur next-state
               (conj observations observation)
               (conj actions action)
               (conj logprobs logprob)
               (conj next-observations next-observation)
               (conj rewards reward)
               (conj dones done)
               (conj truncates truncate)
               (dec i)))
      {:observations      observations
       :actions           actions
       :logprobs          logprobs
       :next-observations next-observations
       :rewards           rewards
       :dones             dones
       :truncates         truncates})))

Here for example we are sampling 3 consecutive states of the pendulum.

(sample-environment pendulum-factory (indeterministic-act actor) 3)
; {:observations
;  [[-0.7596729533565417 0.6503053159390207 0.5479034035454418]
;   [-0.8900589293843874 0.4558454806435161 0.5866609335014912]
;   [-0.9762048336009674 0.21685046196424718 0.6368372482766531]],
;  :actions
;  [[0.20388542115688324] [0.5992106795310974] [0.1662445366382599]],
;  :logprobs
;  [[0.08455279469490051] [0.26384592056274414] [-0.028919726610183716]],
;  :next-observations
;  [[-0.8900589293843874 0.4558454806435161 0.5866609335014912]
;   [-0.9762048336009674 0.21685046196424718 0.6368372482766531]
;   [-0.99941293940555 -0.034260422483655656 0.6321353193336707]],
;  :rewards [-7.8437431872499745 -9.322367484397839 -11.139601368813137],
;  :dones [false false false],
;  :truncates [false false false]}

Advantages

Theory

If we are in state \(s_t\) and take an action \(a_t\) at timestep \(t\), we receive reward \(r_t\) and end up in state \(s_{t+1}\). The cumulative reward for state \(s_t\) is a finite or infinite sequence using a discount factor \(\gamma<1\):

\[ r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \gamma^3 r_{t+3} + \ldots \]

The critic \(V\) estimates the expected cumulative reward for starting from the specified state.

\[ V(s_t) = \mathop{\hat{\mathbb{E}}} [ r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \gamma^3 r_{t+3} + \ldots ] \]

In particular, the difference between discounted rewards can be used to get an estimate for the individual reward:

\[ V(s_t) = \mathop{\hat{\mathbb{E}}} [ r_t ] + \gamma V(s_{t+1})\Leftrightarrow\mathop{\hat{\mathbb{E}}} [ r_t ] = V(s_t) - \gamma V(s_{t+1}) \]

The deviation of the individual reward received in state \(s_t\) from the expected reward is:

\[ \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)\mathrm{\ if\ not\ }\operatorname{done}_t \]

The special case where a time series is “done” (and the next one is started) uses 0 as the remaining expected cumulative reward.

\[ \delta_t = r_t - V(s_t)\mathrm{\ if\ }\operatorname{done}_{t} \]

If we have a sample set with a sequence of \(T\) states (\(t=0,1,\ldots,T-1\)), one can compute the cumulative advantage for each time step going backwards:

\[ \begin{aligned} \hat{A} _ {T-1} & = -V(s_{T-1}) + r_{T-1} + \gamma V(s_T) = \delta_{T-1} \\ \hat{A} _ {T-2} & = -V(s_{T-2}) + r_{T-2} + \gamma r_{T-1} + \gamma^2 V(s_T) = \delta_{T-2} + \gamma \delta_{T-1} \\ & \vdots \\ \hat{A} _ 0 & = -V(s_0) + r_0 + \gamma r_1 + \gamma^2 r_2 + \ldots + \gamma^{T-1} r_{T-1} + \gamma^{T} V(s_{T}) \\ & = \delta_0 + \gamma \delta_1 + \gamma^2 \delta_2 + \ldots + \gamma^{T-1} \delta_{T-1} \end{aligned} \]

I.e. we can compute the cumulative advantages as follows:

  • Start with \(\hat{A} _ {T-1} = \delta_{T-1}\)
  • Continue with \(\hat{A} _ t = \delta_t + \gamma \hat{A} _ {t+1}\) for \(t=T-2,T-3,\ldots,0\)

PPO uses an additional factor \(\lambda\le 1\), the Generalized Advantage Estimation (GAE) parameter, which can be used to steer the training towards more immediate rewards if there are stability issues. See Schulman et al. for more details.
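For a single, uninterrupted episode this recursion can be sketched in a few lines of Python (the Clojure implementation below also restarts the accumulation at episode boundaries):

```python
def gae(deltas, gamma, lam):
    """Backward recursion A_t = delta_t + gamma * lambda * A_{t+1}."""
    advantage, result = 0.0, []
    for delta in reversed(deltas):
        advantage = delta + gamma * lam * advantage
        result.append(advantage)
    return list(reversed(result))

print(gae([1.0, 1.0, 1.0], 0.5, 1.0))  # [1.75, 1.5, 1.0]
```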

Implementation of Deltas

The code for computing the \(\delta\) values follows here:

(defn deltas
  "Compute difference between actual reward plus discounted estimate of next state and estimated value of current state"
  [{:keys [observations next-observations rewards dones]} critic gamma]
  (mapv (fn [observation next-observation reward done]
            (- (+ reward
                  (if done 0.0 (* gamma (critic next-observation))))
               (critic observation)))
        observations next-observations rewards dones))

If the reward is zero and the critic outputs constant zero, there is no difference between the expected and received reward.

(deltas {:observations [[4]] :next-observations [[3]] :rewards [0] :dones [false]}
        (constantly 0)
        1.0)
; [0.0]

If the reward is 1.0 and the critic outputs zero for both observations, the difference is 1.0.

(deltas {:observations [[4]] :next-observations [[3]] :rewards [1] :dones [false]}
        (constantly 0)
        1.0)
; [1.0]

If the reward is 1.0 and the difference of critic outputs is also 1.0 then there is no difference between the expected and received reward (when \(\gamma=1\)).

(defn linear-critic [observation] (first observation))
(deltas {:observations [[4]] :next-observations [[3]] :rewards [1] :dones [false]}
        linear-critic
        1.0)
; [0.0]

If the next critic value is 1.0 and discounted with 0.5 and the current critic value is 2.0, we expect a reward of 1.5. If we only get a reward of 1.0, the difference is -0.5.

(deltas {:observations [[2]] :next-observations [[1]] :rewards [1] :dones [false]}
        linear-critic
        0.5)
; [-0.5]

If the run is terminated, the current critic value is compared with the reward which in this case is the last reward received in this run.

(deltas {:observations [[4]] :next-observations [[3]] :rewards [4] :dones [true]}
        linear-critic
        1.0)
; [0.0]

Implementation of Advantages

The advantages can be computed in an elegant way using reductions and the previously computed deltas.

(defn advantages
  "Compute advantages attributed to each action"
  [{:keys [dones truncates]} deltas gamma lambda]
  (vec
    (reverse
      (rest
        (reductions
          (fn [advantage [delta done truncate]]
              (+ delta (if (or done truncate) 0.0 (* gamma lambda advantage))))
          0.0
          (reverse (map vector deltas dones truncates)))))))

For example, when all deltas are 1.0 and using a discount factor of 0.5, the advantages approach 2.0 asymptotically when going backwards in time.

(advantages {:dones [false false false] :truncates [false false false]}
            [1.0 1.0 1.0]
            0.5
            1.0)
; [1.75 1.5 1.0]

When an episode is terminated (or truncated), the accumulation of advantages starts again when going backwards in time. I.e. the computation of advantages does not distinguish between terminated and truncated episodes (unlike the deltas).

(advantages {:dones [false false true false false true]
             :truncates [false false false false false false]}
            [1.0 1.0 1.0 1.0 1.0 1.0]
            0.5
            1.0)
; [1.75 1.5 1.0 1.75 1.5 1.0]

We add the advantages to the batch of samples with the following function.

(defn assoc-advantages
  "Associate advantages with batch of samples"
  [critic gamma lambda batch]
  (let [deltas     (deltas batch critic gamma)
        advantages (advantages batch deltas gamma lambda)]
    (assoc batch :advantages advantages)))

Critic Loss Function

The target values for the critic are simply the current values plus the new advantages. The target values can be computed using PyTorch’s add function.

(defn critic-target
  "Determine target values for critic"
  [{:keys [observations advantages]} critic]
  (without-gradient (torch/add (critic observations) advantages)))

We add the critic targets to the batch of samples with the following function.

(defn assoc-critic-target
  "Associate critic target values with batch of samples"
  [critic batch]
  (let [target (critic-target batch critic)]
    (assoc batch :critic-target target)))

If we add the target values to the samples, we can compute the critic loss for a batch of samples as follows.

(defn critic-loss
  "Compute loss value for batch of samples and critic"
  [samples critic]
  (let [criterion (mse-loss)
        loss      (criterion (critic (:observations samples)) (:critic-target samples))]
    loss))

Actor Loss Function

The core of the actor loss function relies on the ratio of the probability of an action under the updated policy to its probability under the old policy (the actor network output). The ratio is defined as \[ r_t(\theta)=\frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{\operatorname{old}}}(a_t|s_t)}. \]

Note that \(r_t(\theta)\) here refers to the probability ratio as opposed to the reward of the previous section.

The sampled observations, log probabilities, and actions are combined with the actor’s parameter-dependent log probabilities.

(defn probability-ratios
  "Probability ratios for actions using the updated policy and the old policy"
  [{:keys [observations logprobs actions]} logprob-of-action]
  (let [updated-logprobs (logprob-of-action observations actions)]
    (torch/exp (py. (torch/sub updated-logprobs logprobs) sum 1))))

The objective is to increase the probability of actions which lead to a positive advantage and reduce the probability of actions which lead to a negative advantage. I.e. maximising the following objective function.

\[ L^{CPI}(\theta) = \mathop{\hat{\mathbb{E}}}_t [\frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{\operatorname{old}}} (a_t|s_t)} \hat{A}_t] = \mathop{\hat{\mathbb{E}}}_t [r_t (\theta) \hat{A}_t] \]

The core idea of PPO is to use clipped probability ratios in the loss function in order to increase stability. The probability ratio is clipped to stay below \(1+\epsilon\) for positive advantages and to stay above \(1-\epsilon\) for negative advantages.

\[ L^{CLIP}(\theta) = \mathop{\hat{\mathbb{E}}}_t [\min(r_t (\theta) \hat{A}_t, \mathop{\operatorname{clip}}(r_t (\theta), 1-\epsilon, 1+\epsilon) \hat{A}_t)] \]

See Schulman et al. for more details.

Because PyTorch minimizes a loss, we need to negate the objective function above.

(defn clipped-surrogate-loss
  "Clipped surrogate loss (negative objective)"
  [probability-ratios advantages epsilon]
  (torch/mean
    (torch/neg
      (torch/min
        (torch/mul probability-ratios advantages)
        (torch/mul (torch/clamp probability-ratios (- 1.0 epsilon) (+ 1.0 epsilon))
                   advantages)))))
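A scalar plain-Python version of the same formula makes the clipping easy to verify by hand:

```python
def clipped_surrogate_loss(ratio, advantage, epsilon):
    """Scalar clipped surrogate loss (the negated PPO objective)."""
    clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    return -min(ratio * advantage, clipped * advantage)

print(clipped_surrogate_loss(1.5, 1.0, 0.2))   # -1.2: gain capped at 1 + epsilon
print(clipped_surrogate_loss(0.5, -1.0, 0.2))  # 0.8: penalty floored at 1 - epsilon
```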

We can plot the objective function for a single action and a positive advantage.

(without-gradient
  (let [ratios  (range 0.0 2.01 0.01)
        loss    (fn [ratio advantage epsilon]
                    (toitem
                      (torch/neg
                        (clipped-surrogate-loss (tensor ratio)
                                                (tensor advantage)
                                                epsilon))))
        scatter (tc/dataset
                  {:x ratios
                   :y (map (fn [ratio] (loss ratio 0.5 0.2)) ratios)})]
    (-> scatter
        (plotly/base {:=title "Objective Function for Positive Advantage" :=mode :lines})
        (plotly/layer-point {:=x :x :=y :y}))))

actor loss over ratio for positive advantage

And for a negative advantage.

(without-gradient
  (let [ratios  (range 0.0 2.01 0.01)
        loss    (fn [ratio advantage epsilon]
                    (toitem
                      (torch/neg
                        (clipped-surrogate-loss (tensor ratio)
                                                (tensor advantage)
                                                epsilon))))
        scatter (tc/dataset
                  {:x ratios
                   :y (map (fn [ratio] (loss ratio -0.5 0.2)) ratios)})]
    (-> scatter
        (plotly/base {:=title "Objective Function for Negative Advantage" :=mode :lines})
        (plotly/layer-point {:=x :x :=y :y}))))

actor loss over ratio for negative advantage

We can now implement the actor loss function which we want to minimize. The loss function uses the clipped surrogate loss function as defined above. The loss function also penalises low entropy values of the distributions output by the actor in order to encourage exploration.

(defn actor-loss
  "Compute loss value for batch of samples and actor"
  [samples actor epsilon entropy-factor]
  (let [ratios         (probability-ratios samples (logprob-of-action actor))
        entropy        (torch/mul
                         entropy-factor
                         (torch/neg
                           (torch/mean
                             (entropy-of-distribution actor (:observations samples)))))
        surrogate-loss (clipped-surrogate-loss ratios (:advantages samples) epsilon)]
    (torch/add surrogate-loss entropy)))

A notable detail in XinJingHao’s PPO implementation is that the advantage values used in the actor loss (not in the critic loss!) are normalized.

(defn normalize-advantages
  "Normalize advantages"
  [batch]
  (let [advantages (:advantages batch)]
    (assoc batch :advantages (torch/div (torch/sub advantages (torch/mean advantages))
                                        (torch/std advantages)))))
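In plain Python the normalization is just a shift to zero mean and a scale to unit standard deviation (a stdlib sketch; Python's statistics.stdev, like torch/std, computes the sample standard deviation):

```python
from statistics import mean, stdev

def normalize_advantages(advantages):
    """Shift advantages to zero mean and scale them to unit standard deviation."""
    m, s = mean(advantages), stdev(advantages)
    return [(a - m) / s for a in advantages]
```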

Preparing Samples

Shuffling

The data required for training needs to be converted to PyTorch tensors.

(defn tensor-batch
  "Convert batch to Torch tensors"
  [batch]
  {:observations (tensor (:observations batch))
   :logprobs (tensor (:logprobs batch))
   :actions (tensor (:actions batch))
   :advantages (tensor (:advantages batch))})

Furthermore it is good practice to shuffle the samples. This ensures that samples early and late in the sequence are not treated differently. Note that you need to shuffle after computing the advantages, because the computation of the advantages relies on the order of the samples.

We separate the generation of random indices to facilitate unit testing of the shuffling function.

(defn random-order
  "Create a list of randomly ordered indices"
  [n]
  (shuffle (range n)))

(defn shuffle-samples
  "Random shuffle of samples"
  ([samples]
   (shuffle-samples samples (random-order (python/len (first (vals samples))))))
  ([samples indices]
   (zipmap (keys samples)
           (map #(torch/index_select % 0 (torch/tensor indices)) (vals samples)))))

Here is an example of shuffling observations:

(shuffle-samples {:observations (tensor [[1] [2] [3] [4] [5] [6] [7] [8] [9] [10]])})
; {:observations tensor([[ 1.],
;         [ 4.],
;         [ 6.],
;         [ 5.],
;         [10.],
;         [ 8.],
;         [ 7.],
;         [ 2.],
;         [ 9.],
;         [ 3.]])}
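The same parallel-reorder idea in plain Python (a stdlib sketch, not the library code): every array in the map is reindexed with one shared order, so rows stay aligned across keys.

```python
import random

def shuffle_samples(samples, indices=None):
    """Reorder every array in the map with the same index order."""
    n = len(next(iter(samples.values())))
    if indices is None:
        indices = random.sample(range(n), n)  # a random permutation of 0..n-1
    return {key: [values[i] for i in indices] for key, values in samples.items()}
```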

Creating Batches

Furthermore we split up the samples into smaller batches to improve training speed.

(defn create-batches
  "Create mini batches from environment samples"
  [batch-size samples]
  (apply mapv
         (fn [& args] (zipmap (keys samples) args))
         (map #(py. % split batch-size) (vals samples))))

(create-batches 5 {:observations (tensor [[1] [2] [3] [4] [5] [6] [7] [8] [9] [10]])})
; [{:observations tensor([[1.],
;         [2.],
;         [3.],
;         [4.],
;         [5.]])} {:observations tensor([[ 6.],
;         [ 7.],
;         [ 8.],
;         [ 9.],
;         [10.]])}]
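The splitting logic corresponds to slicing each array into consecutive chunks; a plain-Python sketch (our own helper, not the library code):

```python
def create_batches(batch_size, samples):
    """Split every array in the sample map into consecutive mini batches."""
    n = len(next(iter(samples.values())))
    return [{key: values[i:i + batch_size] for key, values in samples.items()}
            for i in range(0, n, batch_size)]
```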

Putting it All Together

Finally we can implement a function which

  • samples data
  • adds advantages
  • converts to PyTorch tensors
  • adds critic targets
  • normalizes the advantages
  • shuffles the samples
  • creates batches

(defn sample-with-advantage-and-critic-target
  "Create batches of samples and add advantages and critic target values"
  [environment-factory actor critic size batch-size gamma lambda]
  (->> (sample-environment environment-factory (indeterministic-act actor) size)
       (assoc-advantages (critic-observation critic) gamma lambda)
       tensor-batch
       (assoc-critic-target critic)
       normalize-advantages
       shuffle-samples
       (create-batches batch-size)))

PPO Main Loop

Now we can implement the PPO main loop.

The outer loop samples the environment using the current actor (i.e. policy) and computes the data required for training.

The inner loop performs a small number of updates using the samples from the outer loop.

Each update step performs a gradient descent update for the actor and a gradient descent update for the critic. Another detail from XinJingHao’s PPO implementation is that the gradient norm for the actor update is clipped.
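Gradient norm clipping rescales the whole gradient vector when its L2 norm exceeds a threshold; a plain-Python sketch of the idea behind clip_grad_norm_ (the real PyTorch utility modifies the parameter gradients in place across all tensors):

```python
import math

def clip_grad_norm(grads, max_norm):
    """Scale gradients down so their global L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        return [g * scale for g in grads]
    return grads
```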

At the end of each epoch, the smoothed loss values are printed, along with the deterministic actions and entropies for a few observations, which helps with parameter tuning. Furthermore the entropy factor is slowly lowered so that the policy reduces exploration over time.
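Both the smoothed losses and the entropy schedule are simple recurrences; a plain-Python sketch of the two updates (the constants match the ones in the code):

```python
def smooth(old, value, alpha=0.001):
    """Exponential moving average used for the smoothed loss displays."""
    return (1.0 - alpha) * old + alpha * value

def decayed_entropy_factor(initial, decay, epochs):
    """Entropy factor after one multiplicative decay per epoch."""
    return initial * decay ** epochs
```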

The actor and critic models are saved to disk at each checkpoint.

(defn -main [& _args]
  (let [factory          pendulum-factory
        actor            (Actor 3 64 1)
        critic           (Critic 3 64)
        n-epochs         100000
        n-updates        10
        gamma            0.99
        lambda           1.0
        epsilon          0.2
        n-batches        8
        batch-size       50
        checkpoint       100
        entropy-factor   (atom 0.1)
        entropy-decay    0.999
        lr               5e-5
        weight-decay     1e-4
        smooth-actor-loss  (atom 0.0)
        smooth-critic-loss (atom 0.0)
        actor-optimizer  (adam-optimizer actor lr weight-decay)
        critic-optimizer (adam-optimizer critic lr weight-decay)]
    (doseq [epoch (range n-epochs)]
           (let [samples (sample-with-advantage-and-critic-target factory actor critic
                                                                  (* batch-size n-batches)
                                                                  batch-size
                                                                  gamma lambda)]
             (doseq [k (range n-updates)]
                    (doseq [batch samples]
                           (let [loss (actor-loss batch actor epsilon @entropy-factor)]
                             (py. actor-optimizer zero_grad)
                             (py. loss backward)
                             (utils/clip_grad_norm_ (py. actor parameters) 0.5)
                             (py. actor-optimizer step)
                             (swap! smooth-actor-loss
                                     (fn [x] (+ (* 0.999 x) (* 0.001 (toitem loss)))))))
                    (doseq [batch samples]
                           (let [loss (critic-loss batch critic)]
                             (py. critic-optimizer zero_grad)
                             (py. loss backward)
                             (py. critic-optimizer step)
                             (swap! smooth-critic-loss
                                    (fn [x] (+ (* 0.999 x) (* 0.001 (toitem loss))))))))
             (println "Epoch:" epoch
                      "Actor Loss:" @smooth-actor-loss
                      "Critic Loss:" @smooth-critic-loss
                      "Entropy Factor:" @entropy-factor))
           (without-gradient
             (doseq [input [[1 0 -1.0] [1 0 1.0] [0 -1 -1.0] [0 -1 1.0] [0 1 -1.0] [0 1 1.0] [-1 0 -1.0] [-1 0 1.0]]]
                    (println
                      input
                      "->" (action (tolist (py. actor deterministic_act (tensor input))))
                      "entropy" (toitem (entropy-of-distribution actor (tensor input))))))
           (swap! entropy-factor * entropy-decay)
           (when (= (mod epoch checkpoint) (dec checkpoint))
             (println "Saving models")
             (torch/save (py. actor state_dict) "actor.pt")
             (torch/save (py. critic state_dict) "critic.pt")))
    (torch/save (py. actor state_dict) "actor.pt")
    (torch/save (py. critic state_dict) "critic.pt")
    (System/exit 0)))

Visualisation of Actor Output

We can use dtype-next to visualise the output of the actor. First we need to load additional modules.

(require '[tech.v3.datatype :as dtype]
         '[tech.v3.tensor :as dtt]
         '[tech.v3.libs.buffered-image :as bufimg]
         '[tech.v3.datatype.functional :as dfn])

Here we load a pre-trained model and visualise the output of the actor.

(def actor (Actor 3 64 1))
(py. actor load_state_dict (torch/load "src/ppo/actor.pt"))
; <All keys matched successfully>

(let [angle-values   (torch/linspace (- PI) PI 854)
      speed-values   (torch/linspace 1.0 -1.0 480)
      grid           (torch/meshgrid speed-values angle-values :indexing "ij")
      cos-angle      (torch/cos (last grid))
      sin-angle      (torch/sin (last grid))
      observations   (torch/stack [(py. cos-angle ravel)
                                   (py. sin-angle ravel)
                                   (py. (first grid) ravel)]
                                  :axis 1)
      actions        (without-gradient
                       (py. (py. (py. actor deterministic_act observations)
                                 reshape 480 854) numpy))
      actions-tensor (dtt/clone
                       (dtype/elemwise-cast (dtt/ensure-tensor (py/->jvm actions))
                                            :float32))
      actions-trsps  (dtt/transpose actions-tensor [1 0])]
  (dtt/mset! actions-tensor 240 (dfn/- 1.0 (actions-tensor 240)))
  (dtt/mset! actions-trsps 427 (dfn/- 1.0 (actions-trsps 427)))
  (bufimg/tensor->image (dfn/* actions-tensor 255)))

Actor function output over state space

This image shows the motor control input as a function of pendulum angle and angular velocity. As one can see, the pendulum is decelerated when the speed is high (dark values at the top of the image). Near the centre of the image (speed and angle both zero) one can see that the pendulum is accelerated when the angle is negative and the speed is small, and decelerated when the angle is positive and the speed is small. Also the image is not symmetrical, because otherwise the pendulum would not start swinging up when pointing downwards (left and right boundary of the image).

Automated Pendulum

The pendulum implementation can now be updated to use the actor instead of the mouse position as motor input when the mouse button is pressed.

(defn -main [& _args]
  (let [actor       (Actor 3 64 1)
        done-chan   (async/chan)
        last-action (atom {:control 0.0})]
    (when (.exists (java.io.File. "actor.pt"))
      (py. actor load_state_dict (torch/load "actor.pt")))
    (q/sketch
      :title "Inverted Pendulum with Mouse Control"
      :size [854 480]
      :setup #(setup PI 0.0)
      :update (fn [state]
                  (let [observation (observation state config)
                        action      (if (q/mouse-pressed?)
                                      (action (tolist (py. actor
                                                           deterministic_act
                                                           (tensor observation))))
                                      {:control (min 1.0
                                                     (max -1.0
                                                          (- 1.0 (/ (q/mouse-x)
                                                                    (/ (q/width) 2.0)))))})
                        state       (update-state state action config)]
                    (when (done? state config) (async/close! done-chan))
                    (reset! last-action action)
                    state))
      :draw #(draw-state % @last-action)
      :middleware [m/fun-mode]
      :on-close (fn [& _] (async/close! done-chan)))
    (async/<!! done-chan))
  (System/exit 0))

Here is a small demo video of the pendulum being controlled using the actor network. You can find a repository with the code of this article as well as unit tests at github.com/wedesoft/ppo.

automatically controlled pendulum

Enjoy!

Permalink

Clojure Deref (Apr 21, 2026)

Welcome to the Clojure Deref! This is a weekly link/news roundup for the Clojure ecosystem (feed: RSS).

Clojure Documentary

The Clojure Documentary is live!

Afterward, enjoy the Clojure Documentary Q&A with Rich Hickey and other key people in Clojure’s history!

Don’t miss the Documentary show notes with links to:

  • The foundational research papers

  • Influential books

  • Rich’s talks

  • Historical archives

  • Dialects and runtimes

  • Community resources

  • Getting started videos

  • A glossary

  • and more!

Clojure Community Check-In

The world is going through changes: in programming, technology, work, and in specific countries and regions, each with its own form of trouble, hope, or confusion.

People in Clojure communities, like elsewhere, are finding their way through it, sometimes with questions and sometimes with a sense of being alone in it.

The Clojure Community Check-In is a space to share how we’re doing.

Watch a short video from the organizers.

Sessions:

Clojure/Conj 2026

September 30 – October 2, 2026
Charlotte Convention Center, Charlotte, NC

Join us for the largest gathering of Clojure developers in the world! Meet new people and reconnect with old friends. Enjoy two full days of talks, a day of workshops, social events, and more.

Early bird and group tickets are now on sale.

Is your company interested in sponsoring? Email us at clojure_conj@nubank.com.br to discuss opportunities.

Upcoming Events

Libraries and Tools

Debut release

  • webgen - Parameter driven web app generator

  • cljam - Clojure interpreter with a tokenizer, reader, macro expander, evaluator, incremental compiler, vite plugin, nREPL server compatible with calva on vscode, embedded browser REPL, CLI compatible with node and bun as host

  • bisql - Keep SQL executable, call it as Clojure functions 🚲️

  • miniforge-standards - Shared engineering standards for all miniforge.ai repositories

  • cljs-mjml - Write MJML email templates with Hiccup syntax in ClojureScript (or Node Babashka)

Updates

  • clojure 1.12.5-rc1 - The Clojure programming language

  • clj-kondo 2026.04.15 - Static analyzer and linter for Clojure code that sparks joy

  • baredom 2.2.0 - BareDOM: Lightweight CLJS UI components built on web standards (Custom Elements, Shadow DOM, ES modules). No framework, just the DOM

  • clj-format 0.1.2 - A Clojure DSL for cl-format inspired by Hiccup. No dependencies. Drop-in compatibility. The power of FORMAT made easy.

  • any 0.1.1 - Objects for smart comparison in tests.

  • spel 0.9.5 - Idiomatic Clojure wrapper for Playwright. Browser automation, API testing, Allure reporting, and native CLI - for Chromium, Firefox, and WebKit

  • ordered-collections 0.2.1 - Fast, modern, ropes and ordered collections that do more than sort.

  • dexter 0.1-alpha-6 - Dexter - Graphical Dependency Explorer

  • gloat 0.1.26 - Glojure AOT Tool

  • glojure 0.6.5-rc17 - Clojure interpreter hosted on Go, with extensible interop support.

  • squint 0.11.188 - Light-weight ClojureScript dialect

  • dataspex 2026.04.1 - See the shape of your data: point-and-click Clojure(Script) data browser

  • meme-clj 5.0.0 - meme-clj — M-Expressions with Macro Expansion

  • charm.clj 0.2.71 - A Clojure TUI (Terminal User Interface) library inspired by Bubble Tea

  • clj-xref 0.1.1 - LLM-friendly cross-reference database for Clojure code. Query who-calls, calls-who, who-implements, ns-deps to feed precise dependency neighborhoods to AI assistants instead of entire source trees. Built on clj-kondo.

  • babashka 1.12.218 - Native, fast starting Clojure interpreter for scripting

  • fs 0.5.33 - File system utility library for Clojure

  • phel-lang 0.34.1 - A functional, Lisp-inspired language that compiles to PHP. Inspired by Clojure, Phel brings macros, persistent data structures, and expressive functional idioms to the PHP ecosystem.

  • nippy 3.7.0-beta1 - Fast serialization library for Clojure

  • statecharts 1.4.0-RC11 - A Statechart library for CLJ(S)

  • clojure-clr clojure-1.12.3-alpha7 - A port of Clojure to the CLR, part of the Clojure project

Permalink

Your GitHub Actions Workflow is a Waste of Time

Over the years, I’ve experimented with almost every flavor of development environment. I’ve gone from manually provisioning tools on a Mac—hoping I’d remember every brew install six months later—to exploring Docker, Nix, and remote environments.

My journey has touched it all: asdf, Brew, Docker, Nix, and Devbox. I’ve jumped between terminal emulators and multiplexers like Tmux, Kitty, WezTerm, and Zellij.

My Modern Development Environment

My latest setup is built for speed, reproducibility, and a “keyboard-first” philosophy. It lives entirely in the terminal across two environments: my local iMac and an OCI Ampere VPS.

  • Terminal: Ghostty
  • Multiplexer: Zellij
  • Environment Management: Devenv
  • Editor: Doom Emacs

Achieving CI Parity with Self-Hosted Runners

If you haven’t switched to a self-hosted GitHub runner yet, do it for your own sanity. You can thank me later.

By running your CI on your own hardware (like an OCI Ampere instance), you eliminate the overhead of public runners and gain full control over the environment. When paired with Devenv, your CI environment becomes an exact mirror of your local machine.

How to set it up:

  1. Navigate to your GitHub repository Settings.
  2. Go to Actions -> Runners.
  3. Click New self-hosted runner and follow the configuration steps for your OS.
  4. Update your workflow .yml file to use the self-hosted label.

Quantifying the Impact on Feedback Loops

By moving to a self-hosted ARM64 runner, my feedback loop became incredibly tight. My tests now finish in 24 seconds, and the entire image creation process takes just 1 minute and 22 seconds.

Here is what the streamlined job looks like:

jobs:
  test:
    runs-on:
      - self-hosted
      - Linux
      - ARM64
    steps:
      - uses: actions/checkout@v5
        with:
          fetch-depth: 0
      - name: Run tests
        id: run_tests
        run: devenv shell clojure -X:test

The Power of Declarative Environments

Because I’m using devenv shell, I don’t have to worry about whether the CI runner has Clojure, the right JDK, or specific libraries installed. If it works in my local terminal, it works in the CI. Period.

Final Thoughts: Simplify Your Workflow

The way we use GitHub Actions today is often redundant. We spend an enormous amount of time writing complex YAML configurations to install dependencies, manage versions, and configure caching—essentially re-architecting our entire development environment for every single commit.

Provisioning and caching are solved problems. If you are using tools like Devenv or Nix, you’ve already defined exactly what your project needs to run. By moving to a self-hosted runner, you stop fighting the CI and start using it as a natural extension of your workstation. You gain:

  • Total Parity: If the code runs in your local devenv shell, it will run in CI. No exceptions.
  • Instant Caching: Since the runner is persistent, you don’t need to upload or download massive cache blobs; the dependencies are already there.
  • Minimal Configuration: Your workflow files shrink from dozens of lines of setup boilerplate to a single command.

It’s time to stop treating CI like a special snowflake and start treating it like the high-performance terminal it should be. Stop provisioning twice, stop waiting for public runners, and start shipping faster.

Would you like to have a follow-up on this topic? What are your thoughts? I’d love to hear your experiences.

Permalink

How Nubank Uses Transformers to Model Financial Habits at Scale

Written by: Nubank Editorial

What if, instead of relying on manual feature engineering, we could learn directly from raw financial behavior at scale? 

That was the thesis we set out to defend on episode 122 of the Data Hackers podcast, Brazil’s largest Data and AI community. Founded in 2018, the community brings together thousands of data professionals and thought leaders to discuss the cutting edge of technology.

In conversation with hosts Monique Femme and Paulo Vasconcellos, Arissa Yoshida and Rafael Celente, Senior Research Engineers at Nubank, walked through the breakthroughs behind the paper “Your Spending Needs Attention: Modeling Financial Habits with Transformers” and how this research is already making its way into production.

The starting point is straightforward: financial institutions sit on massive volumes of data — transactions, in-app events, customer interactions — yet extracting real value from this data remains a hard problem. Its sequential, unstructured nature has historically pushed teams toward tabular models built on hand-crafted features.

The paper charts a different course: leveraging Transformer-based architectures and self-supervised learning to build representations directly from raw data. This work gave rise to nuFormer, a model that blends structured and textual transaction attributes and supports fine-tuning for tasks like credit scoring, fraud detection, and product recommendation — delivering measurable gains at scale.

From traditional machine learning to foundation models

To appreciate why this matters, consider where the industry started. For years, traditional ML models — particularly tree-based methods paired with heavy feature engineering — dominated financial applications. These models remain effective, but they hit a ceiling when the problem involves large volumes of unstructured data and the need to capture complex temporal patterns.

At Nubank, where we have an extraordinarily rich dataset — especially long sequences of financial transactions — this limitation becomes hard to ignore. As Arissa Yoshida puts it, these traditional approaches lean heavily on a manual, specialized step of variable construction.

 “With traditional models, you rely heavily on handcrafted features — essentially building an entire engineering pipeline to extract value from your data. That requires people with deep domain expertise who can manually work through the data.” 

Arissa Yoshida, Senior Machine Learning Engineer at Nubank

This dependency makes the process less scalable and more expensive, particularly as data volume and complexity grow. Rafael Celente reinforces this point by explaining that the challenge goes beyond modeling itself — it’s about generalization: “we have a massive dataset, and our hypothesis was that we could get a model to generalize customer behavior from that data.”

This limitation, combined with the need for models that learn directly from data, opens the door to foundation models in finance.

Treating financial data as language

The key paradigm shift lies in how we look at the data. Rather than treating transactions as isolated records, the idea is to interpret them as sequences with structure, context, and meaning — much like natural language.

Transformers operate on tokens and learn relationships between them. By converting transactions into tokenized sequences, we can capture behavioral patterns at a much deeper level. The model doesn’t care whether it’s processing words, pixels, or financial events — what matters is the relationships between these elements.

This flexibility is precisely what makes it possible to apply an architecture originally designed for natural language to an entirely different domain like finance.

What nuFormer is and why it matters

This is the context in which nuFormer was born — a foundation model developed by Nubank’s AI Core team to learn representations from financial data at scale. The goal isn’t to solve a single problem, but to build a reusable foundation for different applications across the bank. From these representations, we can improve use cases like fraud detection, product recommendation, and risk modeling.

The key differentiator is generalization. Instead of training a model from scratch for every problem, nuFormer learns a representation of financial behavior that can be reused across multiple contexts, giving different applications a shared starting point. 

As Arissa Yoshida explains in the episode, the vision behind this new kind of model is “to generalize and extract insight from raw, often unstructured data, and scale that across many different problems.”

Although the initial work started with transactions, the model quickly evolved to incorporate different data types. Today, the vision is multimodal — capable of integrating not just structured financial data, but also behavioral signals, in-app interactions, and other information sources. 

This broadens the model’s potential significantly: it moves beyond isolated events to represent a more complete picture of customer behavior, unlocking more sophisticated applications.

This evolution also connects to other AI Core initiatives, such as AI agents that leverage these representations to operate in real-world scenarios at scale. The team shared these examples in the posts “Building AI agents in practice with Clojure” and “Building AI agents for 131 million customers”, here on Building Nubank.

Engineering, data, and governance for foundation models

One of the most insightful parts of the conversation made clear that the biggest challenge isn’t the model itself — it’s the engineering required to make it work. Training a model of this scale demands robust infrastructure: well-structured pipelines, GPU management, and distributed training. But the real pain point shows up when you try to take it to production.

Transformer-based models tend to have higher latency, which can be a sensitive factor in financial applications. Still, with the right infrastructure and specialized teams, it’s possible to achieve performance levels comparable to traditional models. This reality highlights that the challenge extends beyond ML — it’s a systems problem that requires cross-functional collaboration. As Rafael Celente sums it up: “it’s not just a machine learning problem — it’s a systems problem.”

This complexity extends to the role of data and model evaluation. While training at scale is already a reality, ensuring models are learning correctly remains one of the biggest challenges. That involves building consistent data pipelines, continuous monitoring, and defining metrics aligned with business impact.

On top of that, the financial sector adds another layer of rigor: governance. Models must pass multiple rounds of validation before going to production, ensuring compliance with regulations and internal standards. In this landscape, building foundation models requires the joint effort of data engineering, infrastructure, evaluation, product, and business teams — ensuring solutions not only work, but deliver sustainable real-world impact.

Results and impact

Deploying these models into existing systems has driven significant gains in key metrics within just a few months — surpassing improvements that had been accumulated over years with traditional approaches.

These gains aren’t limited to a single use case. The model is already being applied across multiple fronts, including credit, lending, income prediction, and cross-sell, demonstrating that the approach can be reused across a variety of contexts within the bank.

This reusability doesn’t just accelerate the development of new solutions — it creates a multiplier effect, allowing different products to benefit from the same learning foundation.

Looking ahead, our ambition isn’t just to keep up with the state of the art — it’s to contribute to it. That means exploring new architectures, expanding multimodal capabilities, and continuing to share what we learn with the community. As discussed in the episode, the goal is to challenge the status quo and set new standards for AI in finance.

Our appearance on Data Hackers Podcast #122 underscores a central pillar of our strategy: foundation models are already being applied in practice to solve real problems in finance, with direct impact on how we build products, make decisions, and scale intelligence.

By applying Transformers to model financial habits at scale, Nubank is building an AI platform that learns directly from data and evolves continuously. nuFormer in production, with applications in credit and beyond, shows how this approach can expand horizontally and generate consistent value.

If you want to work on problems like these — dealing with data at scale, developing foundation models, and impacting over 131 million customers — we’re hiring on the AI Core team.

The post How Nubank Uses Transformers to Model Financial Habits at Scale appeared first on Building Nubank.

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.