Humble Chronicles: The Inescapable Objects

In HumbleUI, there is a full-fledged OOP system that powers lower-level component instances. Sacrilegious, I know, in Clojure we are not supposed to talk about it. But...

Look. Components (we call them Nodes in Humble UI because they serve the same purpose as DOM nodes) have state. Plain and simple. No way around it. So we need something stateful to store them.

They also have behaviors. Again, pretty unavoidable. State and behavior work together.

Still not a case for OOP yet: could’ve been maps and functions. One can just

(def node []
  {:state   (volatile! state)
   :measure (fn [...] ...)
   :draw    (fn [...] ...)})

But there’s more to consider.

Code reuse

Many nodes share the same pattern: e.g. a wrapper is a node that “wraps” another node. padding is a wrapper:

[ui/padding {:padding 10}
 [ui/button "Click me"]]

So is center:

[ui/center
 [ui/button "Click me"]]

So is rect (it draws a rectangle behind its child):

[ui/rect {:paint ...}
 [ui/button "Click me"]]

The first two are different in how they position their child but identical in drawing and event handling. The third one has a different paint function, but the layout and event handling are the same.

I want to write AWrapperNode once and let the rest of the nodes reuse that.

Now — you might think — still not a case for OOP. Just extract a bunch of functions and then pick and choose!

;; shared library code
(defn wrapper-measure [...] ...)

(defn wrapper-draw [...] ...)

;; a node
(defn padding [...]
  {:measure (fn [...]
              <custom measure fn>)
   :draw    wrapper-draw}) ;; reused

This has an added benefit of free choice: you can mix and match implementations from different parents, e.g. measure from wrapper and draw from container.

Partial code replacement

Some functions call other functions! What a surprise.

One direction is easy. E.g. Rect node can first draw itself and then call a parent. We solve this by wrapping one function into another:

(defn rect [opts child]
  {:draw (fn [...]
           (canvas/draw-rect ...)
           ;; reuse by wrapping
           (wrapper-draw ...))})

But now I want to do it the other way: the parent defines wrapping behavior and the child only replaces one part of it.

E.g., for Wrapper nodes we always want to save and restore the canvas state around the drawing, but the drawing itself can be redefined by children:

(defn wrapper-draw [callback]
  (fn [...]
    (let [layer (canvas/save canvas)]
      (callback ...)
      (canvas/restore canvas layer))))

(defn rect [opts child]
  {:draw (wrapper-draw ;; reuse by inverse wrapping
           (fn [...]
             (canvas/draw-rect ...)
             ((:draw child) child ...)}))})

I am not sure about you, but to me, it starts to feel a little too high-ordery.

Another option would be to pass “this” around and make shared functions lookup implementations in it:

(defn wrapper-draw [this ...]
  (let [layer (canvas/save canvas)]
    ((:draw-impl this) ...) ;; lookup in a child
    (canvas/restore canvas layer))))

(defn rect [opts child]
  {:draw      wrapper-draw   ;; reused
   :draw-impl (fn [this ...] ;; except for this part
                (canvas/draw-rect ...)
                ((:draw child) child ...)}))

Starts to feel like OOP, doesn’t it?

Future-proofing

Final problem: I want Humble UI users to write their own nodes. This is not the default interface, mind you, but if somebody wants/needs to go low-level, why not? I want them to have all the tools that I have.

The problem is, what if in the future I add another method? E.g. when it all started, I only had:

-measure
-draw
-event

Eventually, I added -context, -iterate, and -*-impl versions of these. Nobody guarantees I won’t need another one in the future.

Now, with the map approach, the problem is that there will be none. A node is written as:

{:draw    ...
 :measure ...
 :event   ...}

will not suddenly have a context method when I add one.

That’s what OOP solves! If I control the root implementation and add more stuff to it, everybody will get it no matter when they write their nodes.

How does it look

We still have normal protocols:

(defprotocol IComponent
  (-context              [_ ctx])
  (-measure      ^IPoint [_ ctx ^IPoint cs])
  (-measure-impl ^IPoint [_ ctx ^IPoint cs])
  (-draw                 [_ ctx ^IRect rect canvas])
  (-draw-impl            [_ ctx ^IRect rect canvas])
  (-event                [_ ctx event])
  (-event-impl           [_ ctx event])
  (-iterate              [_ ctx cb])
  (-child-elements       [_ ctx new-el])
  (-reconcile            [_ ctx new-el])
  (-reconcile-impl       [_ ctx new-el])
  (-should-reconcile?    [_ ctx new-el])
  (-unmount              [_])
  (-unmount-impl         [_]))

Then we have base (abstract) classes:

(core/defparent ANode
  [^:mut element
   ^:mut mounted?
   ^:mut rect
   ^:mut key
   ^:mut dirty?]
  
  protocols/IComponent
  (-context [_ ctx]
    ctx)

  (-measure [this ctx cs]
    (binding [ui/*node* this
              ui/*ctx*  ctx]
      (ui/maybe-render this ctx)
      (protocols/-measure-impl this ctx cs)))

  ...)

Note that parents can also have fields! Admit it: We all came to Clojure to write better Java.

Then we have intermediate abstract classes that, on one hand, reuse parent behavior, but also redefine it where needed. E.g.

(core/defparent AWrapperNode [^:mut child] :extends ANode
  protocols/IComponent
  (-measure-impl [this ctx cs]
    (when-some [ctx' (protocols/-context this ctx)]
      (measure (:child this) ctx' cs)))

  (-draw-impl [this ctx rect canvas]
    (when-some [ctx' (protocols/-context this ctx)]
      (draw-child (:child this) ctx' rect canvas)))
  
  (-event-impl [this ctx event]
    (event-child (:child this) ctx event))
  
  ...)

Finally, leaves are almost normal deftypes but they pull basic implementations from their parents.

(core/deftype+ Padding [] :extends AWrapperNode
  protocols/IComponent
  (-measure-impl [_ ctx cs] ...)
  (-draw-impl [_ ctx rect canvas] ...))

Underneath, there’s almost no magic. Parent implementations are just copied into children, fields are concatenated to child’s fields, etc.

Again, this is not the interface that the end-user will use. End-user will write components like this:

(ui/defcomp button [opts child]
  [clickable opts
   [clip-rrect {:radii [4]}
    [rect {:paint button-bg)}
     [padding {:padding 10}
      [center
       [label child]]]]]])

But underneath all these rect/padding/center/label will eventually be instantiated into nodes. Heck, even your button will become FnNode. But you are not required to know this.

Also, a reminder: all these solutions, just like Humble UI itself, are a work in progress at the moment. No promises it’ll stay that way.

Conclusion

I’ve heard a rumor that OOP was originally invented for UIs specifically. Mutable objects with mostly shared but sometimes different behaviors were a perfect match for the object paradigm.

Well, now I know: even today, no matter how you start, eventually you will arrive at the same conclusion.

I hope you find this interesting. If you have a better idea — let me know.

Permalink

Humble Chronicles: Shape of the Component

Last time I ran a huge experiment trying to figure out how components should work in Humble UI. Since then, I’ve been trying to bring it to the main.

This was trickier than I anticipated — even with a working prototype, there are still lots of decisions to make, and each one takes time.

I discussed some ideas in Humble Chronicles: Managing State with VDOM, but this is what we ultimately arrived at.

The simplest component:

(ui/defcomp my-comp []
  [ui/label "Hello, world!"])

Note the use of square brackets [], it’s important. We are not creating nodes directly, we return a “description” of UI that will later be analyzed and instantiated for us by Humble UI.

Later if you want to use your component, you do the same:

(ui/defcomp other-comp []
  [my-comp])

You can pass arguments to it:

(ui/defcomp my-comp [text text2 text3]
  [ui/label (str text ", " text2 ", " text3)])

To use local state, return a function. In that case, the body itself will become the “setup” phase, and the returned function will become the “render” phase. Setup is called once, render is called many times:

(ui/defcomp my-comp [text]
  ;; setup
  (let [*cnt (signal/signal 0)]
    (fn [text]
      ;; render
      [ui/label (str text ": " @*cnt)])))

As you can see, we have our own signals implementation. They seem to fit very well with the rest of the VDOM paradigm.

Finally, the fullest form is a map with the :render key:

(ui/defcomp my-comp [text]
  (let [timer (timer/schedule #(println 123) 1000)]
    {:after-unmount
     (fn []
       (timer/cancel timer)) 
     :render
     (fn [text]
       [ui/label text])}))

Again, the body of the component itself becomes “setup”, and :render becomes “render”. As you can see, the map form is useful for specifying lifecycle callbacks.

Code reuse

React has a notion of “hooks”: small reusable bits of code that have access to all the same state and lifecycle machinery that components have.

For example, a timer always needs to be cancelled in unmount, but I don’t want to write after-unmount every time I want to use a timer. I want to use a timer and have its lifecycle to be registered automatically.

Our alternative is with macro:

(defn use-timer []
  (let [*state (signal/signal 0)
        timer  (timer/schedule #(println @*state) 1000)
        cancel (fn []
                 (timer/cancel timer))]
    {:value         *state
     :after-unmount cancel}))

(ui/defcomp ui []
  (ui/with [*timer (use-timer)]
    (fn []
      [ui/label "Timer: " @*timer])))

Under the hood, with just takes a return map of its body and adds stuff it needs to it. Simple, no magic, no special “hooks rules”.

Same as with hooks, with can be used inside with recursively. It just works.

Thanks Kevin Lynagh for the idea.

Shared state

One of the goals of Humble UI was to make component reuse trivial. Web, for example, has hundreds of properties to customize a button, and still, it’s often not enough.

I lack the resources to make hundreds of properties, so I wanted to take another route: make components out of simple reusable parts, and let end users recombine them.

So a button becomes clickable (behavior) and button-look (visual). Want a custom button? Implement your own look, and use the same behavior. Want to reuse the look in another component (e.g. a toggle button?). Write your own behavior, and reuse the visuals.

The look itself consists of simple parts that can be reused and recombined:

(ui/defcomp button-look [child]
  [clip-rrect {:radii [4]}
   [rect {:paint button-bg)}
    [padding {:padding 10}
     [center
      [label child]]]]])

And then the button becomes:

(ui/defcomp button [opts child]
  [ui/clickable opts
   [ui/button-look child]])

(this and a previous one are simplified for clarity)

Now, the problem. The button is, of course, interactive. It reacts to being hovered, pressed, etc. But the state that represents it lives in clickable (the behavior). How to share?

The first idea was to use signals. Like this:

(ui/defcomp button [opts child]
  (let [*state (signal/signal nil)]
    (fn [opts child]
      [ui/clickable {:*state *state}
       [ui/button-look @*state child]])))

Which does work, of course, but a little too verbose. It also forces you to define state outside, while logically clickable should be responsible for it.

So the current solution is this:

(ui/defcomp button [opts child]
  [ui/clickable opts
   (fn [state]
     [ui/button-look state child])])

Which is a bit tighter and doesn’t expose the state unnecessarily. The look component is also straightforward: it accepts the state as an argument, without any magic, so it can be reused anywhere.

Where to try

Current development happens in the “vdom” branch. Components migrate slowly, but steadily, to the new model.

Current screenshot for history:

Soon we will all live in a Virtual DOM world, I hope.

Permalink

OSS Updates March and April 2024

This is a summary of the open source work I've spent my time on throughout March and April, 2024. Overall it was a really insightful couple of months for me, with lots of productive discussions and meetings happening among key contributors to Clojure's data science ecosystem and great progress toward some of our most ambitious goals.

Grammar of graphics in Clojure

With help from Daniel Slutsky and others in the community, I started some concrete work on implementing a grammar of graphics in Clojure. I'm convinced this is the correct long-term solution for dataviz in Clojure, but it is a big project that will take time, including a lot of hammock time. It's still useful to play around with proofs of concept whilst thinking through problems, though, and in the interest of transparency I'm making all of those experiments public.

The discussions around this development are all also happening in public. There were two visual tools meetups focused on this over the last two months (link 1, link 2). And at the London Clojurians talk I just gave today I demonstrated an example of one proposed implementation of a grammar-of-graphics-like API on top of hanami implemented by Daniel.

There are more meetups planned for the coming months and work in this area for the foreseeable future will look like researching and understanding the fundamentals of the grammar of graphics in order to design a simple implementation in Clojure.

Clojure's ML and statistics tools

I spent a lot of time these last couple of months documenting and testing out Clojure's current ML tools, leading to many great conversations and one blog post that generated many more interesting discussions. The takeaway is that the tools themselves in this area are all quite mature and stable, but there are still ongoing discussions around how to best accommodate the different ways that people want to work with them. The overall goal in this area of my work is to stabilize the solutions so we can start advocating for specific ways of using them.

Below are some key takeaways from my research into all this stuff. Note none of these are my decisions to make alone, but represent my current opinions and what I will be advocating for within the community:

Smile will be slowly sunsetted from the ecosystem. The switch to GPL licensing was made in bad faith and many of the common models don't work on Apple chips. Given the abundance of suitable alternatives, the easiest option is to move away from depending on it.
A greater distinction between statistical modelling and machine learning workflows will be helpful. Right now there are many uses of the various models that are available in Clojure, and the wrappers and tools surrounding them are usually designed with a specific type of user in mind. For example machine learning people almost always have separate training and testing datasets, whereas statisticians "train" their models on an entire dataset. The highest-level APIs for these different usages (among others) look quite different, and we would benefit from having APIs that are ergonomic and familiar to our target users of various backgrounds.
We should agree on standards for accomplishing certain very common and basic tasks and propose a recommended usage for users. For example, there are almost a dozen ways to do linear regression in Clojure and it's not obvious which is "the best" way to someone not deeply familiar with the ecosystem.
Everything should work with tablecloth datasets and expect them as inputs. This is mostly the case already, but there is still some progress to be made.

Foundations of Clojure's data science stack

I continue to work on guides and tutorials for the parts of Clojure's data science stack that I feel are ready for prime time, mainly tablecloth and all of the amazing underlying libraries it leverages. Every once in a while this turns up surprises, for example this month I was surprised at how column header processing is handled for nippy files specifically. I also fixed one bug in tablecloth itself, which I discovered in the process of writing a tutorial earlier in March. I have a pile of in-progress guides focusing on some more in-depth topics from developing the London Clojurians talk that I'm going to tidy up and publish in the coming months.

The overarching goal in this area is to create a unified data science stack with libraries for processing, modelling, and visualization that all interoperate seamlessly and work with tablecloth datasets, like the tidyverse in R. Part of achieving that is making sure that tablecloth is rock solid, which just takes a lot of poking and prodding.

London Clojurians talk

This talk was a big inspiration for diving deep into Clojure's data science ecosystem. I experimented with a ton of different datasets for the workshop and discovered tons of potential areas for future development. Trying to put together a polished data workflow really exposed many of the key areas I think we should be focusing on and gave me a lot of inspiration for future work. I spent a ton of time exploring all of the possible ways to demonstrate a broad sample of data science tools and learned a lot along the way.

The resources from the talk are all available in this repo and the video will be posted soon.

Summary of future work

I mentioned a few areas of focus above, below is a summary of the ongoing work as I see it. A framework for organizing this work is starting to emerge, and I've been thinking about in terms of four key areas:

Visualisation

Priority here is to release a stable dataviz API using the tools and wrappers we currently have so that we can start releasing guides and tutorials that follow a consistent style.
The long-term goal is to develop a robust, flexible, and stable data visualization library in Clojure itself based on the grammar of graphics.

Machine learning

Priority is to decide which APIs we will commit to supporting in the long term and stabilize the "glue" libraries that provide the high-level APIs for data-first users.
Long term goal is to support the full spectrum of libraries and models that are in everyday use by data science professionals.

Statistics

Priority is to document the current options for accomplishing basic statistical modelling tasks, including Clojure libraries we do have, Java libs, and Python interop.
Long term goal is to have tablecloth-compatible stats libraries implemented in pure Clojure.

Foundations

Priority is to build a tidyverse for Clojure. This includes battle-testing tablecloth, fully documenting its capabilities, and fixing remaining, small, sharp edges.

Going forward

My overarching goal (personally) is still to write a canonical resource for working with Clojure's data science stack (the Clojure Data Cookbook), and I'm still working on finding the right balance of documenting "work-in-progress" tools and libraries vs. delaying progress until I feel they are more "ready". Until now I've let the absence of stable or ideal APIs in certain areas hinder development of this book, but I'm starting to feel very confident in my understanding of the current direction of the ecosystem, enough so that I would feel good about releasing something a little bit more formal than a tutorial or guide and recommending usages with the caveat that development is ongoing in some areas. And while it will take a while to get where we want to go, I feel like I can finally see the path to getting there. It just takes a lot of work and lot of collaboration, but with your support we'll make it happen! Thanks for reading.

Permalink

OSS Updates March and April 2024

Grammar of graphics in Clojure

Clojure's ML and statistics tools

Smile will be slowly sunsetted from the ecosystem. The switch to GPL licensing was made in bad faith and many of the common models don't work on Apple chips. Given the abundance of suitable alternatives, the easiest option is to move away from depending on it.
A greater distinction between statistical modelling and machine learning workflows will be helpful. Right now there are many uses of the various models that are available in Clojure, and the wrappers and tools surrounding them are usually designed with a specific type of user in mind. For example machine learning people almost always have separate training and testing datasets, whereas statisticians "train" their models on an entire dataset. The highest-level APIs for these different usages (among others) look quite different, and we would benefit from having APIs that are ergonomic and familiar to our target users of various backgrounds.
We should agree on standards for accomplishing certain very common and basic tasks and propose a recommended usage for users. For example, there are almost a dozen ways to do linear regression in Clojure and it's not obvious which is "the best" way to someone not deeply familiar with the ecosystem.
Everything should work with tablecloth datasets and expect them as inputs. This is mostly the case already, but there is still some progress to be made.

Foundations of Clojure's data science stack

London Clojurians talk

The resources from the talk are all available in this repo and the video will be posted soon.

Summary of future work

Visualisation

Priority here is to release a stable dataviz API using the tools and wrappers we currently have so that we can start releasing guides and tutorials that follow a consistent style.
The long-term goal is to develop a robust, flexible, and stable data visualization library in Clojure itself based on the grammar of graphics.

Machine learning

Priority is to decide which APIs we will commit to supporting in the long term and stabilize the "glue" libraries that provide the high-level APIs for data-first users.
Long term goal is to support the full spectrum of libraries and models that are in everyday use by data science professionals.

Statistics

Priority is to document the current options for accomplishing basic statistical modelling tasks, including Clojure libraries we do have, Java libs, and Python interop.
Long term goal is to have tablecloth-compatible stats libraries implemented in pure Clojure.

Foundations

Priority is to build a tidyverse for Clojure. This includes battle-testing tablecloth, fully documenting its capabilities, and fixing remaining, small, sharp edges.

Going forward

Permalink

OSS updates March and April 2024

In this post I'll give updates about open source I worked on during March and April 2024.

To see previous OSS updates, go here.

Updates

Here are updates about the projects/libraries I've worked on last month.

squint: CLJS syntax to JS compiler
- #509: Optimization: use arrow fn for implicit IIFE when possible
- Optimization: emit const in let expressions, which esbuild optimizes better
- Don't wrap arrow function in parens, see this issue
- Fix #499: add support for emitting arrow functions with ^:=> metadata
- Fix #505: Support :rename in :require
- Fix #490: render css maps in html mode
- Fix #502: allow method names in defclass to override squint built-ins
- Fix #496: don't wrap strings in another set of quotes
- Fix rendering of attribute expressions in HTML (should be wrapped in quotes)
- Compile destructured function args to JS destructuring when annotated with ^:js. This benefits working with vitest and playwright.
- #481: BREAKING, squint no longer automatically copies all non-compiled files to the :output-dir. This behavior is now explicit with :copy-resources, see docs.
- Add new #html reader for producing HTML literals using hiccup. See docs and playground example.
- #483: Fix operator precedence problem
neil: A CLI to add common aliases and features to deps.edn-based projects.
Released version 0.3.65 with the following changes:
- #209: add newlines between dependencies
- #185: throw on non-existing library
- Bump babashka.cli
- Fetch latest stable slipset/deps-deploy, instead of hard-coding (@vedang)
- Several emacs package improvements (@agzam)
clj-kondo: static analyzer and linter for Clojure code that sparks joy.
Released 2024.03.13
- Fix memory usage regression introduced in 2024.03.05
- #2299: Add documentation for :java-static-field-call.
- #1732: new linter: :shadowed-fn-param which warns on using the same parameter name twice, as in (fn [x x])
- #2276: New Clojure 1.12 array notation (String*) may occur outside of metadata
- #2278: bigint in CLJS is a known symbol in extend-type
- #2288: fix static method analysis and suppressing :java-static-field-call locally
- #2293: fix false positive static field call for (Thread/interrupted)
- #2296: publish multi-arch Docker images (including linux aarch64)
- #2295: lint case test symbols in list
  Unreleased changed:
- #1035: Support SARIF output with --config {:output {:format :sarif}}
- #2309: report unused for expression
- #2135: fix regression with unused JavaScript namespace
- #2302: New linter: :equals-expected-position to enforce expected value to be in first (or last) position. See docs
- #2304: Report unused value in defn body
CLI: Turn Clojure functions into CLIs!
Released version 0.8.58-59
- Fix #96: prevent false defaults from being removed/ignored
- Fix #91: keyword options and hyphen options should not mix
- Fix #89: long option never represents alias
rewrite-edn: Utility lib on top of rewrite-clj with common operations to update EDN while preserving whitespace and comments
Released 0.4.8 with the following update:
- Add newline after adding new element to top level map with assoc-in
nbb: Scripting in Clojure on Node.js using SCI
- nbb bundle JS output will ignore nbb.edn
- #351: Update bun docs/example.
- Add cljs.core/exists?
clojure-mode: Clojure/Script mode for CodeMirror 6.
- Fix #49: bug with hitting backspace after line comment Test it in the squint playground.
instaparse-bb: Use instaparse from babashka
- Serialize regexes in parse results
scittle: Execute Clojure(Script) directly from browser script tags via SCI
Released v0.6.17
- #77: make dependency on browser (js/document) optional so scittle can run in webworkers, Node.js, etc.
- #69: executing script tag with src + whitespace doesn't work
- #72: add clojure 1.11 functions like update-vals
- #75: Support reader conditionals in source code
cherry: Experimental ClojureScript to ES6 module compiler
- #127: fix duplicate cherry-cljs property in package.json which caused issues with some bundlers
- Bump squint common compiler code
clerk
- #646 Fix parsing + location issue which fixes compatibility with honey.sql
http-client: babashka's http-client
Released 0.4.17-19
- #55: allow :body be java.net.http.HttpRequest$BodyPublisher
- Support a Clojure function as :client option, mostly useful for testing
- #49: add ::oauth-token interceptor
- #52: document :throw option
bbin: Install any Babashka script or project with one command
These fixes have been made by @rads:
- Fix #62: bbin ls is unnecessarily slow
- Fix #72: bbin install [LOCAL-FILE] should not be restricted to files with the .clj extension
SCI: Configurable Clojure/Script interpreter suitable for scripting and Clojure DSLs
- Fix #626: add cljs.core/exists?
- Fix #919: :js-libs + refer + rename clashes with core var
- Fix #906: merge-opts loses :features or previous context
deps.clj: A faithful port of the clojure CLI bash script to Clojure
- Fix Windows issue related to relative paths (which took me all day, argh!)
fs - File system utility library for Clojure
- #122: fs/copy-tree: fix copying read-only directories with children (@sohalt)
- #127: Inconsistent documentation for the :posix-file-permissions options (@teodorlu)
babashka: native, fast starting Clojure interpreter for scripting.
- Fix #1679: bump timbre and fix wrapping timbre/log!
- Add java.util.concurrent.CountDownLatch
- Add java.lang.ThreadLocal
- Bump versions of included libraries

Other projects

These are (some of the) other projects I'm involved with but little to no activity happened in the past month.

Click for more details

process: Clojure library for shelling out / spawning sub-processes
babashka.json: babashka JSON library/adapter
tools-deps-native and tools.bbuild: use tools.deps directly from babashka
edamame: Configurable EDN/Clojure parser with location metadata
http-server: serve static assets
squint-macros: a couple of macros that stand-in for applied-science/js-interop and promesa to make CLJS projects compatible with squint and/or cherry.
sci.configs: A collection of ready to be used SCI configs.
grasp: Grep Clojure code using clojure.spec regexes
lein-clj-kondo: a leiningen plugin for clj-kondo
http-kit: Simple, high-performance event-driven HTTP client+server for Clojure.
babashka.nrepl: The nREPL server from babashka as a library, so it can be used from other SCI-based CLIs
jet: CLI to transform between JSON, EDN, YAML and Transit using Clojure
quickdoc: Quick and minimal API doc generation for Clojure
pod-babashka-go-sqlite3: A babashka pod for interacting with sqlite3
pod-babashka-fswatcher: babashka filewatcher pod
lein2deps: leiningen to deps.edn converter
sql pods: babashka pods for SQL databases
cljs-showcase: Showcase CLJS libs using SCI
babashka.book: Babashka manual
rewrite-clj: Rewrite Clojure code and edn
pod-babashka-buddy: A pod around buddy core (Cryptographic Api for Clojure).
gh-release-artifact: Upload artifacts to Github releases idempotently
carve - Remove unused Clojure vars
4ever-clojure - Pure CLJS version of 4clojure, meant to run forever!
pod-babashka-lanterna: Interact with clojure-lanterna from babashka
joyride: VSCode CLJS scripting and REPL (via SCI)
clj2el: transpile Clojure to elisp
deflet: make let-expressions REPL-friendly!
deps.add-lib: Clojure 1.12's add-lib feature for leiningen and/or other environments without a specific version of the clojure CLI

Permalink

Rama is a testament to the power of Clojure

It took more than ten years of full-time work for Rama to go from an idea to a production system. I shudder to think of how long it would have taken without Clojure.

Rama is a programming platform that integrates and generalizes backend development. Whereas previously backends were built with a hodgepodge of databases, application servers, queues, processing systems, deployment tools, monitoring systems, and more, Rama can build end-to-end backends at any scale on its own in a tiny fraction of the code. At its core is a new programming language implementing a new programming paradigm, at the same level as the “object-oriented”, “imperative”, “logic”, and “functional” paradigms. Rama’s Clojure API gives access to this new language directly, and Rama’s Java API is a thin wrapper around a subset of this language.

There’s a lot in Clojure’s design that’s been instrumental to developing Rama. Three things stand out in particular: its flexibility for defining abstractions, its emphasis on immutability, and its orientation around programming with plain data structures. Besides these being essential to maintaining simplicity in Rama’s implementation, Rama also embraces these principles in its approach to distributed programming and indexing.

Ability to do in libraries what requires language support in other languages

Rama’s language is Turing-complete and defined largely via Clojure macros. So it’s still Clojure, but its semantics are different in many fundamental ways. At its core, Rama generalizes the concept of a function into something called a “fragment”. Whereas a function works by taking in any number of input parameters and then returning a single value as the last thing it does, a fragment can output many times (called “emitting”), can output to multiple “output streams”, and can do more work between or after emitting. A function is just a special case of a fragment. Rama fragments compile to efficient bytecode, and fragments that happen to be functions execute just as efficiently as functions in Java or Clojure.

Even though Rama contains this new programming language implementing this new programming paradigm, it’s still Clojure. So it interoperates perfectly. Rama code can invoke Clojure code directly, and Clojure code can invoke Rama directly as well. There’s no friction between them. Rama itself is implemented in a mixture of regular Clojure code and Rama code.

Neither Rich Hickey nor John McCarthy ever envisioned this completely different programming paradigm being built within their abstractions, much less one that reformulates the basis of nearly every programming language (the function). They didn’t need to. Clojure, along with its Lisp predecessors, are languages that put almost no limitations on your ability to form abstractions. With every other language you at least have to conform to their syntax and basic semantics, and you have limited ability to control what happens at compile-time versus runtime. Lisps have great control over what happens at compile-time, which lets you do incredible things.

Lisp programmers have struggled ever since it was invented to explain why this is so powerful and why this has a major impact on simplifying software development. So I won’t try to explain how powerful this is in general and will focus on how instrumental it was for Rama. I’ll instead point you to Paul Graham’s essay “Beating the Averages”, which was the essay that first inspired me to learn Lisp back when I was in college. When I first read that essay I didn’t understand it completely, but I was particularly compelled by the lines “A big chunk of our code was doing things that are very hard to do in other languages. The resulting software did things our competitors’ software couldn’t do. Maybe there was some kind of connection.”

The new language at Rama’s core is an example of this. Other languages can only have multiple fundamentally different paradigms smoothly interoperating if designed and implemented at the language level. Otherwise, you have to resort to string manipulation (as is done with SQL), which is not smooth and creates a mess of complexity. No amount of abstraction can hide this complexity completely, and attempting to often creates new complexities (like ORMs).

With Clojure, you can do this at the library level. We required no special language support and built Rama on top of Clojure’s basic primitives.

Rama’s language is not our only example of mixing paradigms like this. Another example is Specter. Specter is a generically useful library for querying and manipulating data structures. It’s also a critical part of Rama’s API (since views in Rama, called PStates, are durable data structures of any composition), and it’s a critical part of Rama’s implementation. About 1% of the lines of code in Rama’s source and tests are Specter callsites.

You can define Specter’s abstractions in any language. What makes it special in Clojure is how performant it is. Queries and manipulations with Specter are faster than even hand-rolled Clojure code. The key to Specter’s performance is its inline caching and compilation system. Inline caching is a technique I’ve only seen used before at the language or VM level. It’s a critical part of how the JVM implements polymorphism, for example. Because of the flexibility of Clojure, and the ability to program what happens at compile-time for a Specter callsite, we’re able to utilize the technique at the library level. It’s all done completely behind the scenes, and users of Specter get an expressive and concise API that’s extremely fast.

Power of immutability and data structure orientation

Clojure is unique among Lisps in the degree that it emphasizes immutability. It’s core API is oriented to working with immutable data structures. Additionally, Clojure encourages representing program state with plain data structures and having an expressive API for working with those data structures. The quote “It’s better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures.” is part of Clojure’s rationale.

These philosophies have had a major impact on Rama’s development, helping a tremendous amount in managing complexity within Rama’s implementation. The less state you have to think about in a section of code, the easier it is to reason about. When a project gets as big as Rama (190k lines of source, 220k lines of tests), with many layers of abstractions and innumerable subsystems, it’s impossible to keep even a fraction of the whole system “in your head” for reasoning. I frequently have to re-read sections of code to remind myself on the details of that particular subsystem. The dividends you get from lowering complexity of the system, with immutability being a huge part of that, compounds more and more the bigger the codebase gets.

Clojure doesn’t force immutability for every situation, which is also important. Rama tracks a lot of different kinds of state, and we find it much simpler in some cases to use mutability rather than work with state indirectly as you would through something like the State Monad in Haskell. There are also some algorithms that are much simpler to write when they use a volatile internal in the implementation. That said, the vast majority of code in Rama is written in an immutable style. When we use mutability it’s almost always isolated within a single thread. Rather than have concurrent mutability using something like an atom, we use a volatile and send events to its owning thread to interact with it.

Rama embraces and expands upon Clojure’s principles of immutability and orienting code around data structures. These principles are fundamental to Rama’s approach for expressing end-to-end backends. A lot of Rama programming revolves around materializing views (PStates), which are literally just data structures interacted with using the exact same Specter API as used to interact with in-memory data structures. This stands in stark contrast with databases, which have fixed data models and special APIs for interacting with them. Any database can be replicated in a PState in both expressivity and performance, since a data model is just a specific combination of data structures (e.g. key/value is a map, column-oriented is a map of sorted maps, document is a map of maps, etc.).

Rama’s language extends Clojure’s immutable principles into writing distributed, fault-tolerant, and async code. There’s a lot of similarities with Clojure like anonymous operations with lexical closures, immutable local variables, and identical semantics when it comes to shadowing. Rama takes things a step further for distributed computation, doing things like scope analysis to determine what vars needs to be transferred across network boundaries. Rama’s loops have similar syntax to Clojure and have the additional capability of being able to be a distributed computation that hops around the cluster during loop iterations. With Rama this is all written linearly through the power of dataflow, with switching threads/nodes being an operation like anything else (called a “partitioner”).

Clojure’s principles are just sound ideas that really do make a huge impact on simplifying software development. These principles are even more relevant in distributed systems / databases which historically have been overrun with complexity. That’s why these principles are so core to Rama and its implementation.

Conclusion

There’s a seeming contradiction here – if Clojure enables such productivity gains, then why is it still a niche language in the industry? Why aren’t those using Clojure crushing their competition so thoroughly that every programmer is now rushing to adopt Clojure to even the playing field?

I believe this is simply because Clojure does not address all aspects of software development. This is not a criticism of Clojure but a recognition of its scope. Things like durable data storage, deployment, monitoring, evolving an application’s state and logic over time, fault-tolerance, and scaling are huge costs of building end-to-end software. Oftentimes the principles of Clojure are corrupted when using a database, as the database forces you to orient your code around its data model and capabilities.

This is why we’re so excited about Rama and have worked so long on it, because Rama does address everything involved in building end-to-end backends, no matter the scale. Rama provides flexible data storage expressed in terms of data structures, has deployment and monitoring built-in, has first-class features for evolving an application and updating it, is completely fault-tolerant, and is inherently scalable. It does all this while maintaining Clojure’s great principles and functional programming roots.

If you’d like to discuss on the Clojure Slack, we’re active in the #rama channel.

Permalink

What I learned from Alcoa

Rally to a vision to help people change

Permalink

Clojurists Together project - Scicloj community building - April 2024 update

The Clojurists Together organisation has decided to sponsor Scicloj community building for Q1 2024, as a project by Daniel Slutsky. The project is taking place in February, March, and April 2024. Here is Daniel’s update for April. Here are the previous ones: Feb 2024, Mar 2024 Comments and ideas would help. 🙏 Clojurists Together update - April 2024 - Daniel Slutsky # April 2024 was the last of three months on the Clojurists Together project titled “Scicloj Community Building and Infrastructure”.

Permalink

Clojure 1.12.0-alpha11

Clojure 1.12.0-alpha11 is now available! Find download and usage information on the Downloads page.

CLJ-2848 - Qualified instance methods without param-tags should use the qualified method class, not the target object type
CLJ-2847 - Improve error message when a qualified method in value position matches no methods

Permalink

S3 presigned URL generation with Babashka

A while back, I needed to generate presigned URLs for S3 objects in Amazon Web Services. I wanted to use Babashka (Clojure scripting), to avoid my painful friend from the past - Bash.

I looked all the usual places for a Clojure-friendly approach, but even Cognitect’s AWS API did not have any means to presign URLs. Everybody seemed to reluctantly tolerate having to use AWS Java SDK directly for presigning URLs.

— Java interop, oh joy 😣😅

Being forced to use AWS Java SDK would mean a no-go for Babashka. Also, based on how often similar questions pop up, I felt it deserved a better solution.

Let’s imagine for a moment that the AWS Java SDK would work just fine with Babashka, the dependency complexity still matters.

I am not using any scientific way of measuring it, I just follow my intuition. I notice when dependencies take up many MB, when they themselves have many dependencies, or when they contain lots of unneeded code (which potentially could contain security issues).

The com.amazonaws/aws-java-sdk-s3 dependency contains several hundred Java classes, of which - I assume - only a few would be needed to sign URLs. The dependency would also increase the total dependency size by around 7MB. At least to me, that is disproportionate compared to the less than 250 lines of Clojure code, that made up the initial take on solving URL signing.

6 months ago I released the code in a reusable library named aws-simple-sign, stitched together from code snippets in GitHub issues, Gists, blog posts, mailing lists and long hopeless stares at the screen.

Since then, I have made a few small improvements, like support for signing HTTP requests and opt-in support for the legacy “path style” URLs to name some. But I never got around to implementing support for container credential providers.

A couple of days ago, I found myself needing exactly this, and suddenly I got the idea that leveraging Cognitects AWS API client would give me the functionality “for free”.

Blinded by “I need it now”, the ego-stroking notion of having a good idea, and only requiring the functionality in the JVM, I failed to ask the most important question:
Will it work with Babashka?

Surprise… it doesn’t, but alas, the Clojure community heroes came to the rescue. It turns out that I am not the only person who finds it important to have good tooling available in Babashka.

I was kindly pointed in the direction of awyeah-api (aka. aws-api for Babashka), which creates a client that is compatible with Cognitect’s AWS API client.
— Thank you ❤️

By outsourcing “providing credentials and config” using an external client, I simplified the code by having less responsibility (still less than 250 lines of code with more features). Also, these clients are battle-tested (or at least “better tested”), compared to my old sad excuse for a “client implementation”. On top, the external clients are faster because they cache credentials and config, avoiding re-reading files from disk and environment variables.

But “How does using the library look?” you ask.

(require '[com.grzm.awyeah.client.api :as aws]
         '[aws-simple-sign.core :as aws-sign])

(def client
  (aws/client {:api :s3}))

(aws-sign/generate-presigned-url client "somebucket" "someobject.txt" {})
; "https://somebucket.s3.us-east-1.amazonaws.com/someobject.txt?X-Amz-Security-Token=FwoG..."

It has been tested both using an AWS S3 bucket and locally using MinIO, a Cloud storage server compatible with Amazon S3. MinIO has an excellent Docker image, perfect for testing S3 code in a local setup.

(def client
  (aws/client {:api :s3
               :endpoint-override {:protocol :http
                                   :hostname "localhost"
                                   :port 9000}}))

(aws-sign/generate-presigned-url client "somebucket" "someobject.txt"
                                 {:path-style true})
; http://localhost:9000/somebucket/someobject.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&..."

According to “Use endpoints in the AWS CLI” documentation, endpoints can be provided by either using the environment variables AWS_ENDPOINT_URL and AWS_ENDPOINT_URL_S3 or using the endpoint_url setting within a profile - BUT neither client libraries pick up on that. 🤷

For now use :endpoint-override as shown above.

I would love to get some feedback from people using this on non-Amazon clouds like Google, Azure and Digital Ocean among others. There are lots of things I still don’t understand about all the different scenarios in which the S3 technology is used.

On a final note, support for upload URLs is still on the TODO list, but I suspect (hope?) it wouldn’t be too hard to crack.

I really hope this will be useful for others than me.

Permalink

Clojure 1.12.0-alpha10

Clojure 1.12.0-alpha10 is now available! Find download and usage information on the Downloads page.

Method values

Clojure programmers often want to use Java methods in higher-order functions (e.g. passing a Java method to map). Until now, programmers have had to manually wrap methods in functions. This is verbose, and might require manual hinting for overload disambiguation, or incur incidental reflection or boxing.

Programmers can now use Java qualified methods as ordinary functions in value contexts - the compiler will automatically generate the wrapping function. New in this release: the compiler will generate a reflective call when a qualified method does not resolve due to overloading. Developers can supply :param-tags metadata on qualified methods to specify the signature of a single desired method, 'resolving' it.

New in this release: the compiler will generate a reflective call when param tags are not supplied on a qualified method that does not resolve due to overloading.

See: CLJ-2793, CLJ-2844, CLJ-2835

Qualified methods - `Class/method`, `Class/.method`, and `Class/new`

Java members inherently exist in a class. For methods as values we need a way to explicitly specify the class of an instance method because there is no possibility for inference.

Qualified methods have value semantics when used in non-invocation positions:

Classname/method - value is a Clojure function that invokes a static method
Classname/.method - value is a Clojure function that invokes an instance method
Classname/new - value is a Clojure function that invokes a constructor

New in this release: developers must use Classname/method and Classname/.method syntax to differentiate between static and instance methods.

Qualified method invocations with param-tags use only the tags to resolve the method. Without param-tags they behave like the equivalent dot syntax, except the qualifying class takes precedence over hints of the target object, and over its runtime type when invoked via reflection.

Note: Static fields are values and should be referenced without parens unless they are intended as function calls, e.g (System/out) should be System/out. Future Clojure releases will treat the field’s value as something invokable and invoke it.

See: CLJ-2844

:param-tags metadata

When used as values, qualified methods supply only the class and method name, and thus cannot resolve overloaded methods.

Developers can supply :param-tags metadata on qualified methods to specify the signature of a single desired method, 'resolving' it. The :param-tags metadata is a vector of zero or more tags: [… tag …]. A tag is any existing valid :tag metadata value. Each tag corresponds to a parameter in the desired signature (arity should match the number of tags). Parameters with non-overloaded types can use the placeholder _ in lieu of the tag. When you supply :param-tags metadata on a qualified method, the metadata must allow the compiler to resolve it to a single method at compile time.

A new metadata reader syntax ^[ … ] attaches :param-tags metadata to member symbols, just as ^tag attaches :tag metadata to a symbol.

See: CLJ-2805

Array class syntax

Clojure supports symbols naming classes both as a value (for class object) and as a type hint, but has not provided syntax for array classes other than strings.

Developers can now refer to an array class using a symbol of the form ComponentClass/#dimensions, eg String/2 refers to the class of a 2 dimensional array of Strings. Component classes can be fully-qualified classes, imported classes, or primitives. Array class syntax can be used as both type hints and values.

Examples: String/1, java.lang.String/1, long/2.

See: CLJ-2807

Bug fixes

CLJ-2843 - Reflective calls to Java methods that take primitive long or double now work when passed a narrower boxed number at runtime (Integer, Short, Byte, Float). Previously, these methods were not matched during reflection and an error was thrown.
CLJ-2841 - IDeref should also implement DoubleSupplier

Permalink

jank development update - Lazy sequences!

This quarter, I&aposm being funded by Clojurists Together to build out jank&aposs lazy sequences, special loop* form, destructuring, and support for the for and doseq macros. Going into this quarter, I had only a rough idea of how Clojure&aposs lazy sequences were implemented. Now, a month in, I&aposm ready to report some impressive progress!

Permalink

Let's write a simple microservice in Clojure

Initially, this post was published here: https://www.linkedin.com/pulse/lets-write-simple-microservice-clojure-andrew-panfilov-2ghqe/

Intro

This article will explain how to write a simple service in Clojure. The sweet spot of making applications in Clojure is that you can expressively use an entire rich Java ecosystem. Less code, less boilerplate: it is possible to achieve more with less. In this example, I use most of the libraries from the Java world; everything else is a thin Clojure wrapper around Java libraries.

From a business logic standpoint, the microservice calculates math expressions and stores the history of such calculations in the database (there are two HTTP endpoints for that).

Github repository with source code: https://github.com/dzer6/calc

This educational microservice project will provide the following:

Swagger descriptor for REST API with nice Swagger UI console. Nowadays, it is a standard de facto. Microservices should be accessible via HTTP and operate with data in a human-readable JSON format. As a bonus, it is super easy to generate data types and API client code for the client side (it works well for a TypeScript-based front-end, for example).
Postgres-based persistence with a pretty straightforward mapping of SQL queries to Clojure functions. If you have ever used Java with Hibernate ORM for data persistence, you will feel relief after working with the database in Clojure with Hugsql. The model of the persistence layer is much simpler and easier to understand without the need for Session Cache, Application Level Cache and Query Cache. Debugging is straightforward, as opposed to the nightmare of debugging asynchronous actual SQL invocation that is never in the expected location. It is such an incredible experience to see the query invocation result as just a sequence of plain Clojure maps instead of a bag of Java entity proxies.
REPL-friendly development setup. DX (dev experience) might not be the best in class, but it is definitely not bad. Whenever you want to change or add something to the codebase, you start a REPL session in an IDE (in my case, Cursive / IntelliJ Idea). You can run code snippets to print their results, change the codebase, and reload the application. In addition, you can selectively run needed tests. You do not need to restart the JVM instance every time after the codebase changes (JVM is famous for its slow start time). Using the mount library, all stateful resources shut down and initialize correctly every reload.

Leiningen

The project.clj file is a configuration file for Leiningen, a build automation and dependency management tool for Clojure. It specifies the project's metadata, dependencies, paths, and other settings necessary for building the project. Let's break down the libraries listed in the project.clj file into two groups: pure Java libraries and Clojure libraries, and describe each.

Clojure Libraries:

org.clojure/clojure: The Clojure language itself.
org.clojure/core.memoize: Provides memoization capabilities to cache the results of expensive functions.
org.clojure/tools.logging: A simple logging abstraction that allows different logging implementations.
mount: A library for managing state in Clojure applications.
camel-snake-kebab: A library for converting strings (and keywords) between different case formats.
prismatic/schema: A library for structuring and validating Clojure data.
metosin/schema-tools: Utilities for Prismatic Schema.
clj-time: A date and time library for Clojure.
clj-fuzzy: A library for fuzzy matching and string comparison.
slingshot: Provides enhanced try/catch capabilities in Clojure.
ring: A Clojure web applications library.
metosin/compojure-api: A library for building REST APIs with Swagger support.
cprop: A configuration library for Clojure.
com.taoensso/encore: A utility library providing additional Clojure and Java interop facilities.
com.zaxxer/HikariCP: A high-performance JDBC connection pooling library.
com.github.seancorfield/next.jdbc: A modern, idiomatic JDBC library for Clojure.
com.layerware/hugsql-core: A library for defining SQL in Clojure applications.
metosin/jsonista: A fast JSON encoding and decoding library for Clojure.

Pure Java Libraries:

ch.qos.logback: A logging framework.
org.codehaus.janino: A compiler that reads Java expressions, blocks, or source files, and produces Java bytecode.
org.slf4j: A simple logging facade for Java.
org.postgresql/postgresql: The JDBC driver for PostgreSQL.
org.flywaydb: Database migration tool.
com.fasterxml.jackson.core: Libraries for processing JSON.
org.mvel/mvel2: MVFLEX Expression Language (MVEL) is a hybrid dynamic/statically typed, embeddable Expression Language and runtime.

To build the project, just run it in a terminal: lein uberjar

The path to a resulting fat-jar with all needed dependencies: target/app.jar

Frameworks VS Libraries

In the Java world, one common approach is to use full-fledged frameworks that provide comprehensive solutions for various aspects of software development. These frameworks often come with a wide range of features and functionalities built-in, aiming to simplify the development process by providing pre-defined structures and conventions. Examples of such frameworks include the Spring Framework, Java EE (now Jakarta EE), and Hibernate.

On the other hand, in the Clojure world, the approach tends to favour using small, composable libraries rather than monolithic frameworks. Clojure promotes simplicity and flexibility, encouraging developers to choose and combine libraries that best fit their needs. These libraries typically focus on solving one problem well, making them lightweight and easy to understand. Examples of popular Clojure libraries include Ring for web development, Compojure for routing, and Spec for data validation.

The difference between these approaches lies in their philosophies and design principles. Full bloated frameworks in the Java world offer convenience and a one-size-fits-all solution but may come with overhead and complexity. In contrast, small libraries in the Clojure world emphasize simplicity, modularity, and flexibility, allowing developers to build tailored solutions while keeping the codebase lightweight and maintainable.

Docker

Dockerfile sets up a containerized environment for the application, leveraging Amazon Corretto 22 on Alpine Linux. It downloads the AWS OpenTelemetry Agent (you can use the standard one if you don't need AWS-related) to enable observability features, including distributed tracing, and then copies the application JAR file into the container. Environment variables are configured to include the Java agent for instrumentation and allocate 90% of available RAM (which is useful for a container-based setup). Finally, it exposes port 8080 and specifies the command to start the Java application server.

Dev Experience

REPL

The Read-Eval-Print Loop in Clojure is a highly effective tool for interactive development, which allows developers to work more efficiently by providing immediate feedback. Unlike traditional compile-run-debug cycles, the REPL enables developers to evaluate expressions and functions on the fly, experiment with code snippets, and inspect data structures in real time. This makes the development process more dynamic and exploratory, leading to a deeper understanding of the codebase. Additionally, the REPL's seamless integration with the language's functional programming paradigm empowers developers to embrace Clojure's expressive syntax and leverage its powerful features, ultimately enhancing productivity and enabling rapid prototyping and iterative development cycles. REPL is a bee's knees, in other words.

First you start REPL-session:

Next you type (init) to invoke initialization function and press Enter – application will start and you will see something similar to:

The session logs show that the application loads configurations and establishes a connection with a PostgreSQL database. This involves initializing a HikariCP connection pool and Flyway for database migrations. The logs confirm that the database schema validation and migration checks were successful. The startup of the Jetty HTTP server follows, and the server becomes operational and ready to accept requests on the specified port.

After any code change to apply it, you should type (reset) and press Enter.

To run tests, you should type (run-tests) and press Enter.

Docker Compose

Using Docker Compose to run Postgres and any third-party services locally provides a streamlined and consistent development environment. Developers can define services in a docker-compose.yml file, which enables them to configure and launch an entire stack with a single command. In this case, Postgres is encapsulated within a container with predefined configurations. Docker Compose also facilitates easy scaling, updates, and isolation of services, enhancing development efficiency and reducing the setup time for new team members or transitioning between projects. It encapsulates complex configurations, such as Postgres' performance monitoring and logging settings, in a manageable, version-controlled file, simplifying and replicating the service setup across different environments.

Stateful Resources

The mount Clojure library is a lightweight and idiomatic solution for managing application state in Clojure applications. It offers a more straightforward and functional approach than the Spring Framework, which can be more prescriptive and heavy. Mount emphasizes simplicity, making it an excellent fit for the functional programming paradigm without requiring extensive configuration or boilerplate code. This aligns well with Clojure's philosophy, resulting in a more seamless and efficient development experience.

Only two functions: for start and stop.

REST API

Compojure's DSL for web applications makes it easy to set up REST API routes with corresponding HTTP methods. Adding a Swagger API descriptor through libraries like ring-swagger provides a visual interface for interacting with the API and enables client code generation. You can use the Prismatic schema library for HTTP request validation and data coercing to ensure the API consumes and produces data that conforms to predefined schemas. Compojure's middleware approach allows for modular and reusable components that can handle cross-cutting concerns like authentication, logging, and request/response transformations, enhancing the API's scalability and maintainability.

The middleware chain is set up in HTTP server-related namespace:

Developers and QA engineers find Swagger UI console highly convenient. I encourage you to run the service locally and try the console in a browser. Here is a list of HTTP endpoints with data schemas:

Isn't it awesome?

Business Logic

The calc.rpc.controller.calculation controller houses the business logic that defines two primary operations: evaluate and obtain-past-evaluations.

The evaluate operation processes and evaluates mathematical expressions received as requests, storing the results in a database:

The obtain-past-evaluations operation fetches a list of previously executed calculations based on provided offset and limit parameters:

Ensuring that exceptions or database inconsistencies are handled gracefully is crucial for the successful execution of these operations.

The integration of external libraries, MVEL (MVFLEX Expression Language) for expression evaluation, and JDBC for database transactions highlights Clojure's interoperability with Java.

Another essential principle demonstrated by using the MVEL library is never to write your implementation of something already written in Java in Clojure. Most of your business cases are already covered by some Java library written, stabilized, and optimized years ago. You should have strong reasons to write something from scratch in Clojure instead of using a Java analog.

Persistence Layer

Thanks to the hugsql library, we can use autogenerated Clojure functions directly mapped to SQL queries described in a plain text file:

As Clojure is not an object-oriented language, we don't need to specially map query result sets coming from a relational database to a collection of objects in a programming language. No OOP, no ORM. Very convenient. The relational algebra paradigm seamlessly marries with a functional paradigm in Clojure. Very natural:

Compared to NoSQL databases, migrating the data schema in relational databases such as Postgres is a well-established practice. This is typically done through migrations, which is made easy by using the flyway library. To adjust the data schema in Postgres, we simply need to create a new text file containing the Data Definition Language (DDL) commands. In our case there is only one migration file:

Whenever you change an SQL query in the queries.sql file, do not forget to run the (reset) function in the REPL-session console. It automatically regenerates the Clojure namespace with query declarations and runtime-generated SQL wrapper functions.

Configuration

The system uses the Clojure library cprop to manage its configuration. The library adopts a sequential merge policy to construct the application's configuration map. It starts by loading default-config.edn from resources and overlays it with local-config.edn if available. Then, it applies settings from an external config.edn and overrides by environment variables (adhering to the 12-factor app guidelines). This ensures that the latest source has precedence.

The configuration is essential during development and is a Clojure map validated against a Prismatic schema. If discrepancies are detected, the system immediately shuts down, adhering to the fail-fast principle.

Additionally, feature flags within the configuration enable selective feature toggling, aiding in the phased introduction of new functionality and ensuring robustness in production environments.

Logging

The service utilizes org.clojure/tools.logging to offer a logging API at a high level, which works in conjunction with Logback and Slf4j—two Java libraries that are well-known for their reliability in logging. The logging setup is customized for the application's environment: while in development, logs are produced in a plain text format that is easy to read, allowing for efficient debugging. On the other hand, when the service is deployed on servers, logs are structured in a JSON format, which makes them ideal for machine parsing and analysis, optimizing their performance in production.

Tests

This is a real-world industrial example. Yes, we do have tests. Not many. But for this size codebase is pretty much okay.

Unfortunately, most open-source Clojure-based projects on Github do not contain good examples of integration tests. So, here we are, trying to close this gap.

We use the TestContainers library to raise real Postgres instances during the tests. Before Docker and TestContainers, the standard de facto in the Java world was running embedded pure Java database H2, trying to mimic Postgres. It was not good, but there was not much choice then.

The evaluate operation integration test:

The obtain-past-evaluations operation integration test:

After the tests run, you should see this:

Conclusion

Now, when you go through the service codebase and know its internals, you can copy-paste it for yourself, change it according to your requirements, and voila, you will have a really good-looking microservice.

The described codebase is based on years of Clojure programming and a number of projects that have been implemented in Clojure. Some used libraries may look outdated, but in the Clojure world, if a library works, it is okay not to update it often—the language itself is super-stable, and you can easily read and support code written even a decade ago.

Permalink

Virtual Threads in Clojure

How to use Project Loom from Clojure and take advantage of millions of threads.

Permalink

2.5x better performance: Rama vs. MongoDB and Cassandra

We ran a number of benchmarks comparing Rama against the latest stable versions of MongoDB and Cassandra. The code for these benchmarks is available on Github. Rama’s indexes (called PStates) can reproduce any database’s data model since each PState is an arbitrary combination of durable data structures of any size. We chose to do our initial benchmarks against MongoDB and Cassandra because they’re widely used and like Rama, they’re horizontally scalable. In the future we’ll also benchmark against other databases of different data models.

There are some critical differences between these systems that are important to keep in mind when looking at these benchmarks. In particular, Cassandra by default does not guarantee writes are durable when giving acknowledgement of write success. It has a config commitlog_sync that specifies its strategy to sync its commit log to disk. The default setting “periodic” does the sync every 10 seconds. This means Cassandra can lose up to 10 seconds of acknowledged writes and regress reads on those keys (we disagree strongly with this setting being the default, but that’s a post for another day).

Rama has extremely strong ACID properties. An acknowledged write is guaranteed to be durable on the leader and all in-sync followers. This is an enormous difference with Cassandra’s default settings. As you’ll see, Rama beats or comes close to Cassandra in every benchmark. You’ll also see we benchmarked Cassandra with a commitlog_sync setting that does guarantee durability, but that causes its performance to plummet far below Rama.

MongoDB, at least in the latest version, also provides a durability guarantee by default. We benchmarked MongoDB with this default setting. Rama significantly outperforms MongoDB in every benchmark.

Another huge difference between Rama and MongoDB/Cassandra (and pretty much every database) comes from Rama being a much more general purpose system. Rama explicitly distinguishes data from indexes and stores them separately. Data is stored in durable, partitioned logs called “depots”. Depots are a distinct concept from “commit logs”, which is a separate mechanism that MongoDB, Cassandra, and Rama also have as part of their implementations. When using Rama, you code “topologies” that materialize any number of indexes of any shape from depots. You can use depots to recompute indexes if you made a mistake, or you can use depots to materialize entirely new indexes in the future to support new features. Depots can be consumed by multiple topologies materializing multiple indexes of different shapes. So not only is Rama in these benchmarks materializing equivalent indexes as MongoDB / Cassandra with great comparable performance, it’s also materializing a durable log. This is a non-trivial amount of additional work Rama is doing, and we weren’t expecting Rama to perform so strongly compared to databases that aren’t doing this additional work.

Benchmark setup

All benchmarks were done on a single m6gd.large instance on AWS. We used this instance type rather than m6g.large so we could use a local SSD to avoid complications with IOPS limits when using EBS.

We’re just testing single node performance in this benchmark. We may repeat these tests with clusters of varying sizes in the future, including with replication. However, all three systems have already demonstrated linear scalability so we’re most interested in raw single-node performance for this set of benchmarks.

For all three systems we only tested with the primary index, and we did not include secondary indexes in these tests. We tried configuring Cassandra to have the same heap size of Rama’s worker (4GB) instead of the default 2GB that it was choosing, but that actually made its read performance drastically worse. So we left it to choose its own memory settings.

The table definition used for Cassandra was:

1
2
3
4
5
6

CREATE TABLE IF NOT EXISTS test.test (
pk text,
ck text,
value text,
PRIMARY KEY (pk, ck)
);

This is representative of the kind of indexing that Cassandra can handle efficiently, like performing range queries on a clustering key.

All Cassandra reads/writes were done with the prepared statements "SELECT value FROM test.test WHERE pk = ? AND ck = ?;" and "INSERT INTO test.test (pk, ck, value) VALUES (?, ?, ?);" .

Cassandra was tested with both the “periodic” commitlog_sync config, which does not guarantee durability of writes, and the “batch” commitlog_sync config, which does guarantee durability of writes. We played with different values of commitlog_sync_batch_window_in_ms , but that had no effect on performance. We also tried the “group” commitlog_sync config, but we couldn’t get its throughput to be higher than “batch” mode. We tried many permutations of the configs commitlog_sync_group_window (e.g. 1ms, 10ms, 20ms, 100ms) and concurrent_writes (e.g. 32, 64, 128, 256), but the highest we could get the throughput was about 90% that of batch mode. The other suggestions on the Cassandra mailing list didn’t help.

The Rama PState equivalent to this Cassandra table had this data structure schema:

1	{[String, String] -> String}

The module definition was:

1
2
3
4
5
6
7
8
9
10
11

(defmodule CassandraModule [setup topologies]
(declare-depot setup *insert-depot :random)

(let [s (stream-topology topologies "cassandra")]
(declare-pstate s $$primary {java.util.List String})
(<<sources s
(source> *insert-depot :> *data)
(ops/explode *data :> [*pk *ck *val])
(|hash *pk)
(local-transform> [(keypath [*pk *ck]) (termval *val)] $$primary)
)))

This receives triples of partitioning key, clustering key, and value and writes it into the PState, ensuring the data is partitioned by the partitioning key.

Cassandra and Rama both index using LSM trees, which sorts on disk by key. Defining the key as a pair like this is equivalent to Cassandra’s “partitioning key” and “clustering key” definition, as it’s first sorted by the first element and then by the second element. This means the same kinds of efficient point queries or range queries can be done.

The Rama PState equivalent to MongoDB’s index had this data structure schema:

1	{String -> Map}

The module definition was:

1
2
3
4
5
6
7
8
9
10
11

(defmodule MongoModule [setup topologies]
(declare-depot setup *insert-depot :random)

(let [s (stream-topology topologies "mongo")]
(declare-pstate s $$primary {String java.util.Map})
(<<sources s
(source> *insert-depot :> *data)
(ops/explode *data :> {:keys [*_id] :as *m})
(|hash *_id)
(local-transform> [(keypath *_id) (termval *m)] $$primary)
)))

This receives maps containing an :_id field and writes each map to the $$primary index under that ID, keeping the data partitioned based on the ID.

We used strings for the IDs given to MongoDB, so we used strings in the Rama definition as well. MongoDB’s documents are just maps, so they’re stored that way in the Rama equivalent.

Writing these modules using Rama’s Java API is pretty much the same amount of code. There’s no difference in performance between Rama’s Clojure and Java APIs as they both end up as the same bytecode.

Max write throughput benchmark

For the max write throughput benchmark, we wrote to each respective system as fast as possible from a single client colocated on the same node. Each request contained a batch of 100 writes, and the client used a semaphore and the system’s async API to only allow 1000 writes to be in-flight at a time. As requests got acknowledged, more requests were sent out.

As described above, we built one Rama module that mimics how MongoDB works and another module that mimics how Cassandra works. We then did head to head benchmarks against each database with tests writing identical data.

For the MongoDB tests, we wrote documents solely containing an “_id” key set to a UUID. Here’s MongoDB vs. Rama:

Rama’s throughput stabilized after 50 minutes, and MongoDB’s throughput continued to decrease all the way to the end of the three hour test. By the end, Rama’s throughput was 9x higher.

For the Cassandra tests, each write contained a separate UUID for the fields “pk”, “ck”, and “value”. We benchmarked Cassandra both with the default “periodic” commit mode, which does not guarantee durability on write acknowledgement, and with the “batch” commit mode, which does guarantee durability. As mentioned earlier, we couldn’t get Cassandra’s “group” commit mode to match the performance of “batch” mode, so we focused our benchmarks on the other two modes. Here’s a chart with benchmarks of each of these modes along with Rama:

Since Rama guarantees durable writes, the equivalent comparison is against Cassandra’s batch commit mode. As you can see, Rama’s throughput is 2.5x higher. Rama’s throughput is only a little bit below Cassandra when Cassandra is run without the durability guarantee.

Mixed read/write throughput benchmark

For the mixed read/write benchmark, we first wrote a fixed amount of data into each system. We wanted to see the performance after each system had a significant amount of data in it, as we didn’t want read performance skewed by the dataset being small enough to fit entirely in memory.

For the MongoDB tests, we wrote documents solely containing an “_id” field with a stringified number that incremented by two for each write (“0”, “2”, “4”, “6”, etc.). We wrote 250M of those documents (max ID was “500000000”). Then for the mixed reads/writes test, we did 50% reads and 50% writes. 1000 pairs of read/writes were in-flight at a time. Each write was a single document (as opposed to batch write test above which did 100 at a time), and each read was randomly chosen from the keyspace from “0” to the max ID. Since only half the numbers were written, this means each read had a 50% chance of being a hit and a 50% chance of being a miss.

Here’s the result of the benchmark for MongoDB vs. Rama:

We also ran another test of MongoDB with half the initial data:

MongoDB’s performance is unaffected by the change in data volume, and Rama outperforms MongoDB in this benchmark by 2.5x.

For the Cassandra tests, we followed a similar strategy. For every write, we incremented the ID by two and wrote that number stringifed for the “pk”, “ck”, and “value” fields (e.g. "INSERT INTO test.test (pk, ck, value) VALUES ('2', '2', '2');" ). Reads were similarly chosen randomly from the keyspace from “0” to the max ID, with each read fetching the value for a “pk” and “ck” pair. Just like the MongoDB tests, each read had a 50% chance of being a hit and a 50% chance of being a miss.

After writing 250M rows to each system, here’s the result of the benchmark for Cassandra vs. Rama:

Rama performs more than 2.5x better in this benchmark whether Cassandra is guaranteeing durability of writes or not. Since Cassandra’s write performance in this non-durable mode was a little higher than Rama in our batch write throughput test, this test indicates its read performance is substantially worse.

Cassandra’s non-durable commit mode being slightly worse than its durable commit mode in this benchmark, along with Cassandra’s reputation as a high performance database, made us wonder if we misconfigured something. As described earlier, we tried increasing the memory allocated to the Cassandra process to match Rama (4GB), but that actually made its performance much worse. We made sure Cassandra was configured to use the local SSD for everything (data dir, commit log, and saved caches dir). Nothing else in the cassandra.yaml or cassandra-env.sh files seemed misconfigured. There are a variety of configs relating to compaction and caching that could be relevant, but Rama has similar configs that we also didn’t tune for these tests. So we left those at the defaults for both systems. After double-checking all the configs we reran this benchmark for Cassandra for both commit modes and got the same results.

One suspicious data point was the amount of disk space used by each system. Since we wrote a fixed amount of identical data to each system before this test, we could compare this directly. Cassandra used 11GB for its “data dir”, which doesn’t include the commit log. Rama used 4GB for the equivalent. If you add up the raw amount of bytes used by 250M rows with identical “pk”, “ck”, and “value” fields that are stringified numbers incrementing by two, you end up with 6.1GB. Both Cassandra and Rama compress data on disk, and since there are so many identical values compression should be effective. We don’t know enough about the implementation of Cassandra to say why its disk usage is so high relative to the amount of data being put into it.

We ran the test again for Cassandra with half the data (125M rows), and these were the results:

Cassandra’s numbers are much better here, though the numbers were degrading towards the end. Cassandra’s read performance seems to suffer as the dataset gets larger.

Conclusion

We were surprised by how well Rama performed relative to Cassandra and MongoDB given that it also materializes a durable log. When compared to modes of operation that guarantee durability, Rama performed at least 2.5x better in every benchmark.

Benchmarks should always be taken with a grain of salt. We only tested on one kind of hardware, with contrived data, with specific access patterns, and with default configs. It’s possible MongoDB and Cassandra perform much better on different kinds of data sets or on different hardware.

Rama’s performance is reflective of the amount of work we put into its design and implementation. One of the key techniques we use all over the place in Rama’s implementation is what we call a “trailing flush”. This technique allows all disk and network operations to be batched even though they’re invoked one at a time. This is important because disk syncs and network flushes are expensive. For example, when an append is done to a depot (durable log), we don’t apply that immediately. Instead the appends gets put into an in-memory buffer, and an event is enqueued that will flush that buffer if no such event is already enqueued. When that event comes to the front of the processing queue, it flushes whatever has accumulated on the buffer. If the rate of appends is low, it may do a disk operation for a single append. As the rate of appends gets higher, the number of appends that gets performed together increases. This technique greatly increases throughput while also minimizing latency. We use this technique for sending appends from a client, for flushing network messages in Netty (called “flush consolidation”), for writing to indexes, for sending replication messages to followers, and more.

The only performance numbers we shared previously were for our Twitter-scale Mastodon instance, so we felt it was important to publish some more numbers against tools many are already familiar with. If there are any flaws in how we benchmarked MongoDB or Cassandra, please share with us and we’ll be happy to repeat the benchmarks.

Since Rama encompasses so much more than data indexing, in the future we will be doing more benchmarks against different kinds of tooling, like queues and processing systems. Additionally, since Rama is an integrated system we expect its most impressive performance numbers to be when benchmarked against combinations of tooling (e.g. Kafka + Storm + Cassandra + ElasticSearch). Rama eliminates the overhead inherent when using combinations of tooling like that.

Finally, since Rama is currently in private beta you have to join the beta to get access to a full release in order to be able to reproduce these benchmarks. As mentioned at the start of this post, the code we used for the benchmarks is on our Github. Benchmarks are of course better when they can be independently reproduced. Eventually Rama will be generally available, but in the meantime we felt publishing the numbers was still important even with this limitation.

Permalink

Code reuse

Partial code replacement

Future-proofing

How does it look

Conclusion

Code reuse

Shared state

Where to try

Sponsors

Grammar of graphics in Clojure

Clojure's ML and statistics tools

Foundations of Clojure's data science stack

London Clojurians talk

Summary of future work

Visualisation

Machine learning

Statistics

Foundations

Going forward

Sponsors

Grammar of graphics in Clojure

Clojure's ML and statistics tools

Foundations of Clojure's data science stack

London Clojurians talk

Summary of future work

Visualisation

Machine learning

Statistics

Foundations

Going forward

Sponsors

Updates

Other projects

Ability to do in libraries what requires language support in other languages

Power of immutability and data structure orientation

Conclusion

Method values

Qualified methods - Class/method, Class/.method, and Class/new

:param-tags metadata

Array class syntax

Bug fixes

Intro

Leiningen

Frameworks VS Libraries

Docker

Dev Experience

REPL

Docker Compose

Stateful Resources

REST API

Business Logic

Persistence Layer

Configuration

Logging

Tests

Conclusion

Benchmark setup

Max write throughput benchmark

Mixed read/write throughput benchmark

Conclusion

About

Subscriptions

Planetarium

Syndicate

Qualified methods - `Class/method`, `Class/.method`, and `Class/new`