Using HTTPS with S3 static website hosting in 50 simple steps

I run this site on AWS S3, using static website hosting. In terms of simplicity, it's hard to beat: just toss your stuff in an S3 bucket, make sure the Content-type metadata is correct, and off you go. However, it doesn't support HTTPS by default. My site doesn't have anything on it that really requires end-to-end encryption, but a professional programmer who has been deep into AWS for the past 10 years or so having an HTTP-only site in 2022 is tacky and embarrassing. So today we shall remedy this, in 50 simple steps!

  1. Apply to The College of William and Mary in Virginia.
  2. Drink 18 cups of coffee one night at a jazz club.
  3. Try to go to work the next day but then have to come home because you feel like shit warmed over.
  4. Curl up on the couch and try to sleep.
  5. Get woken up when the mail carrier slides a fat envelope through the mail slot in your apartment door.
  6. See that the envelope is from The College of William and Mary in Virginia.
  7. Years later, write a blog post on how to enable HTTPS for your S3 website in which you mention The College of William and Mary in Virginia a lot, and realise that you really should explain to your readers that whilst The College of William and Mary in Virginia currently writes its name as "William & Mary", its proper name is in fact The College of William and Mary in Virginia, unless they've changed it at some point since you went there. It's kinda like Ohio State: if you didn't go there, you call it Ohio State, but actual Buckeyes will invariably remind you that it is really called The Ohio State University.
  8. Rip open that envelope with trembling hands (coffee hangover or excitement: you decide).
  9. Read the word "congratulations" and, overcome by joy, pass out on the couch.
  10. Get assigned an email address by W&M (haha! another way to write it!) which is the first letter of your first name, the first letter of your middle name (or "x" if you don't have a middle name, if you recall correctly), and the first four letters of your surname (if your surname is fewer than four letters, you honestly can't remember what W&M would make of that).
  11. Decide that email address is pretty dope.
  12. Enroll in Computer Science 101 and find out that your Unix username is the same as your email address, just without the "" bit on the end.
  13. Some years later, register a domain with your dope-ass Unix username.
  14. Create a primitive website and serve it off Apache on this old computer that you keep under your desk in Columbus, Ohio whilst your wife gets a master's degree in Japanese language and pedagogy.
  15. Have intermittent fights with Apache because it can be a real PITA to configure sometimes.
  16. Get an SSL cert from something similar to Let's Encrypt that you forget the name of, but then remember that you must still have an account there because you're a trained assurer, so look it up in your encrypted password file that you started sometime back in the very late 90s.
  17. Discover that it is called CAcert, that it still exists, and that it is still not included in the trusted certificate authorities that ship with Firefox.
  18. Enable mod_ssl on Apache and rejoice in the "s" that you now get to add before the ":" when you type "http://" to visit your website!
  19. Ask your friend Adrian to host your domain and website and mailserver for you because you're moving to Japan because your wife is super smart and got a scholarship to this intensive Japanese language study programme and so you need to ditch your tower computers and buy a laptop instead.
  20. Forget about your website for many many years.
  21. See that AWS has added a static website hosting feature to S3.
  22. Point your domain at it.
  23. Realise at some point that https:// doesn't work no more.
  24. Cry bitter tears but then get over it.
  25. Transfer your domain to AWS Route 53 at some point.
  26. Remember this whole HTTPS thing again and become embarrassed enough to do something about it.
  27. Try to get it working through some ACM witchcraft, but then get quite frustrated for some reason and ragequit.
  28. Don't think about it for many years.
  29. Go through the process of Creating a blog with Clojure in 50 simple steps.
  30. Proudly post a link to your blog.
  31. Get super embarrassed when your friend Thomas DMs you on Twitter to tell you to sort your shit out vis-à-vis HTTPS because c'mon, person!
  32. Wait for your friend Plínio to offer to share his technique with you.
  33. Mix up some accidentally double-strength margaritas for yourself and your friend Simon and then play some Guitar Hero all night.
  34. Start watching "Star Trek: Generations" after Simon leaves for home.
  35. Send a drunk text to your mean but cool friend Sen to tell her that you're watching "Generations" and she can suck it.
  36. Send a drunk WhatsApp voice message to your friends Micheleangelo and Tane telling them how great they are.
  37. Wake up at 09:30 the next morning with a slight headache and some serious cotton mouth.
  38. Take the poor patient doggy out for a walk.
  39. Tell your friend Ray about your tequila measurement issue.
  40. Make a pot of coffee and an enormous greasy breakfast.
  41. Sit down at your computer to write.
  42. Realise that you could probably get HTTPS working for your website.
  43. Write 43 steps for how to do it before you actually get around to so much as opening the link that Thomas sent you because Plínio is just a tease and hasn't yet shared the good stuff with you. C'mon, Plínio, puff puff pass already, brah!
  44. Pop over to the ACM console to register a cert.
  45. Realise that you might be getting ahead of yourself and open the Configuring a static website using a custom domain registered with Route 53 page that Thomas sent you first so you don't take any missteps.
  46. Create a bucket to hold your access logs, because that seems like a good idea.
  47. Enable server access logging in your root domain bucket.
  48. Refresh a page on your website and then excitedly check your logs bucket and get disappointed when you don't see anything there. Shrug your shoulders and assume that there's some buffering happening, so something will probably show up there sooner or later.
  49. Read a little note on the AWS page:


    Amazon S3 does not support HTTPS access to the website. If you want to use HTTPS, you can use Amazon CloudFront to serve a static website hosted on Amazon S3.

    For more information, see How do I use CloudFront to serve a static website hosted on Amazon S3? and Requiring HTTPS for communication between viewers and CloudFront.

  50. Start to say "well, duh", but then remember that "duh" is an ableist word, so say "uh, yeah" instead because it is one of the helpful alternatives provided by the super awesome Lydia X. Z. Brown on the super awesome autistichoya blog. Start to move onto the next step but then realise that you're already on step 50 and thus you now have a conundrum: do you
    • Just add another step, even though you've already titled this piece "Using HTTPS with S3 static website hosting in 50 simple steps" and the previous two posts in this format have been exactly 50 steps each, and that's kinda the point of a format: sticking to it?
    • Cheat by using a bullet list within step 50?
    • Realise that it's late in the day and you really need to take the dog out before you walk over to the vet to get her to sign one place on your dog's doggy passport that she forgot to yesterday but was nice enough to call you about 30 minutes ago and ask you to check because she wasn't sure she had signed everywhere and also your friend Tim has posted chapter 2 of "Story of a mediocre fan" over on 7amkickoff, so you don't actually have to post this piece today anyway, so you can actually stop writing and finish this stuff up tomorrow?

Which to choose, which to choose?


Babashka CLI: turn Clojure functions into CLIs

Babashka CLI is a new library for command line argument parsing. The main ideas:

  • Put as little effort as possible into turning a Clojure function into a CLI, similar to -X style invocations. For lazy people like me! If you are not familiar with clj -X, read the docs here.
  • But with a better UX by not having to use quotes on the command line as a result of having to pass EDN directly: :dir foo instead of :dir '"foo"' or who knows how to write the latter in cmd.exe or Powershell.
  • Open world assumption: passing extra arguments does not break and arguments can be re-used in multiple contexts.
  • Because the line between calling functions from the command line and Clojure itself is blurred, validation of arguments should happen in your Clojure function, using your favorite tools (manually, spec, schema, malli...). As such, the library only focuses on coercion: turning argument strings into data which is then passed to your function.

Given the function:

(defn foo [{:keys [force dir] :as m}]
  (prn m))

and with a little bit of config in your deps.edn, you can call the function from the command line using:

clj -M:foo --force --dir=src

or:

clj -M:foo --force --dir src

which will then print:

{:force true, :dir "src"}

We did not have to teach babashka CLI anything about the expected arguments.
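Under the hood this is babashka CLI's parse-opts function (the same one the http-server example calls later), which you can try directly at the REPL, assuming org.babashka/cli is on the classpath:

```clojure
(require '[babashka.cli :as cli])

;; parse-opts turns a sequence of argument strings into a map;
;; --force with no value becomes a boolean flag:
(cli/parse-opts ["--force" "--dir" "src"])
; => {:force true, :dir "src"}
```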

Another accepted syntax is:

clj -M:foo :force false :dir src

and this is parsed as:

{:force false, :dir "src"}

Booleans, numbers and keywords are auto-coerced, but if you want to make things strict, you can use metadata. E.g. if we want to accept a keyword for the option mode:

clj -M:foo :force false :dir src :mode overwrite

and parse it as:

{:force false, :dir "src", :mode :overwrite}

you can teach babashka CLI using metadata:

(defn foo
  {:org.babashka/cli {:coerce {:mode :keyword}}}
  [{:keys [force dir mode] :as m}]
  (prn m))

A leading colon is also accepted (and auto-coerced as keyword):

clj -M:foo :force false :dir src :mode :overwrite

The metadata format is set up in such a way that libraries need not have a dependency on babashka CLI itself.
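That works because the hints live in ordinary var metadata under the :org.babashka/cli key, which any tool can read back with plain clojure.core functions. A quick sketch:

```clojure
(defn foo
  "A function carrying babashka CLI hints as plain var metadata."
  {:org.babashka/cli {:coerce {:mode :keyword}}}
  [m]
  m)

;; Reading the hints back requires no dependency on babashka CLI:
(:org.babashka/cli (meta #'foo))
; => {:coerce {:mode :keyword}}
```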

Did you notice that the -M invocation now becomes almost identical to -X, but without quotes?

clj -M:foo :force true :dir src :mode :overwrite
clj -X:foo :force true :dir '"src"' :mode :overwrite

Let's look at a recent project, http-server, where I used babashka CLI to serve both -X, and -M needs.

The only argument hints defined there right now are:

(def ^:private cli-opts {:coerce {:port :long}})

although that could have been left out since numbers are auto-coerced.

The -main function simply defers to the clojure exec API function (intended for -X usage) with the parsed arguments:

(defn ^:no-doc -main [& args]
  (exec (cli/parse-opts args cli-opts)))

In turn, the exec function adds some light logic making it suitable for command line usage. It prints help when :help is true. Because I'm lazy, I just print the docstring of serve, the function that's going to be called:

(defn exec
  "Exec function, intended for command line usage. Same API as serve but
  blocks until process receives SIGINT."
  {:org.babashka/cli cli-opts}
  [opts]
  (if (:help opts)
    (println (:doc (meta #'serve)))
    (do (serve opts)
        @(promise))))

Also the exec function blocks, preventing the process from immediately exiting.

Now when I add this function to deps.edn using:

:serve {:deps {org.babashka/http-server {:mvn/version "0.1.3"}}
        :main-opts ["-m" "babashka.http-server"]
        :exec-fn babashka.http-server/exec}

it can be called both with -M and -X:

$ clj -M:serve --port 1339


$ clj -M:serve :port 1339


$ clj -X:serve :port 1339

And help printing is supported in both styles:

$ clj -M:serve --help
Serves static assets using web server.
  * `:dir` - directory from which to serve assets
  * `:port` - port


$ clj -X:serve :help true

The -main function can also be used in babashka scripts:

#!/usr/bin/env bb

(require '[babashka.deps :as deps])
(deps/add-deps '{:deps {org.babashka/http-server {:mvn/version "0.1.3"}}})

(require '[babashka.http-server :as http-server])

(apply http-server/-main *command-line-args*)

Such a script can then be invoked like any other command line tool:

$ http-server --help
$ http-server --port 1339

I hope you're convinced that with very little code, babashka CLI can let you support both -M, -X style invocations and babashka scripts, while improving command line UX!


Maple Leaf Rag with Clojure Sound

I've just noticed that it's been one year since the last post on this blog. It's not because I've been lazy. I had just finished two books (check them out!) and I probably suffered a bit from writer's exhaustion afterwards. I'm joking. I wasn't that exhausted; I've been busy with programming. Thanks to generous support from Clojurists Together, and thanks to people who bought or subscribed to my books, I've been able to take on some bigger chunks of time to work on Uncomplicate libraries, old and new. OK, enough talking; you're not here to read my reports, but to see some nice Clojure code.

A couple of months ago I released a new library - Clojure Sound, but I haven't had time to write about it. Now that it's time to revive my blog, is there a better way to start than with music?

How to access Clojure Sound

First I'll show you a basic code example; I'll save the full story of what Clojure Sound is for some other time. It's a programming library, available from Clojars. Include Clojure Sound in your project.clj by adding [org.uncomplicate/clojure-sound "0.1.0"] to :dependencies. Start the REPL and connect your preferred Clojure editor.

First, we require the appropriate namespaces. In this example, we're using functions from core and midi.

(require '[uncomplicate.clojure-sound
             [core :refer :all]
             [midi :refer :all]])

Playing a nice song with virtual piano

We're now going to instruct a virtual piano on our computer to play something. But, instead of playing a few random notes, we would like to hear a wonderful song. I've chosen a cheerful Maple Leaf Rag. You might think: "What song is that?", but I can bet that you'll instantly recognize it as soon as you hear your REPL sing (I hope that you follow along by typing the code yourself).

There are plenty of ways to specify the notes that our virtual piano should play. Since I'm lazy, I'll take a score that already specifies all the details in a MIDI file. Since the song is 123 years old, it's in the public domain, so there are many free MIDI transcriptions. Please find a version on the Internet, rename the file to maple.mid, and put it in the classpath of your project. If you're not feeling adventurous, you can use the one I'm using.

The data that should be played is stored in sequences. Not Clojure sequences, but sound sequences (sorry about this name duplication, but that's a general domain-specific term). We can make a sequence in several ways, from adding each note programmatically, to loading the sequence stored in a MIDI file that someone else prepared. I access the file as a Clojure resource, and give it to Clojure Sound sequence function.

(require '[clojure.java.io :as io])
(def maple (sequence (io/resource "maple.mid")))
{:ticks 110592, :id 1336053642}

Now, we have a sequence object that contains more than 100,000 ticks (basically, things that our piano has to do). But the music sheet is not going to play itself. It needs someone to read the actual instructions, and to press the right places on the instrument. This is what a sequencer is for. A sequencer is just like a player. When a real person plays the instrument, that real person is the sequencer. In Clojure Sound, the sequencer function creates that player. In this example, I'll use the default sequencer, which already has its default instruments collection.

(def sqcr (sequencer))

While sequencer plays music, it has to access the sound card on your computer, and produce side-effects, the actual sounds that you enjoy. That requires careful opening and closing protocols, so it interferes with other multimedia software on your computer as little as possible. Before we use our sequencer, we have to open it.

(open! sqcr)
{:class "RealTimeSequencer", :status :open, :micro-position 0, :description "Software sequencer", :name "Real Time Sequencer", :vendor "Oracle Corporation", :version "Version 1.0"}

Now, we tell our sequencer to take the music sheet (the maple sequence) that it's going to perform.

(sequence! sqcr maple)
{:class "RealTimeSequencer", :status :open, :micro-position 0, :description "Software sequencer", :name "Real Time Sequencer", :vendor "Oracle Corporation", :version "Version 1.0"}

Finally, we're ready to rock rag!

(start! sqcr)
{:class "RealTimeSequencer", :status :open, :micro-position 0, :description "Software sequencer", :name "Real Time Sequencer", :vendor "Oracle Corporation", :version "Version 1.0"}

Normally, the Maple's gonna rag for 2-3 minutes. If you'd like to stop it, here's how to do it.

(stop! sqcr)
{:class "RealTimeSequencer", :status :open, :micro-position 145833, :description "Software sequencer", :name "Real Time Sequencer", :vendor "Oracle Corporation", :version "Version 1.0"}

Finally, when you're tired from dancing, you can tell the player to go get some rest, too:

(require '[uncomplicate.commons.core :refer [close!]])
(close! sqcr)
{:class "RealTimeSequencer", :status :closed, :micro-position 0, :description "Software sequencer", :name "Real Time Sequencer", :vendor "Oracle Corporation", :version "Version 1.0"}

That's it. I hope you've enjoyed this little demonstration. I'll be back with more music! :)


Loopr: A Loop/Reduction Macro for Clojure

I write a lot of reductions: loops that combine every element from a collection in some way. For example, summing a vector of integers:

(reduce (fn [sum x] (+ sum x)) 0 [1 2 3])
; => 6

If you’re not familiar with Clojure’s reduce, it takes a reducing function f, an initial accumulator init, and a collection xs. It then invokes (f init x0), where x0 is the first element in xs. f returns a new accumulator value, which is then passed to (f accumulator x1) to produce a new accumulator, and so on until every x in xs is folded into the accumulator. That final accumulator is the return value of reduce.
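One way to see that fold order concretely is clojure.core's reductions, which returns every intermediate accumulator of the same fold:

```clojure
;; Each element is the accumulator after folding in one more x:
;; init, (f init x0), (f (f init x0) x1), ...
(reductions (fn [sum x] (+ sum x)) 0 [1 2 3])
; => (0 1 3 6)
```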

In writing reductions, there are some problems that I run into over and over. For example, what if you want to find the mean of some numbers in a single pass? You need two accumulator variables–a sum and a count. The usual answer to this is to make the accumulator a vector tuple. Destructuring bind makes this… not totally awful, but a little awkward:

(reduce (fn [[sum count] x]
          [(+ sum x) (inc count)])
        [0 0]
        [1 2 3 4 5 6 7])
; => [28 7]

Ah, right. We need to divide sum by count to get the actual mean. Fine, we’ll wrap that in a let binding:

(let [[sum count] (reduce (fn [[sum count] x]
                            [(+ sum x) (inc count)])
                          [0 0]
                          [1 2 3 4 5 6 7])]
  (/ sum count))
; => 4

This is awkward for a few reasons. One is that we’ve chewed up a fair bit of indentation, and indentation is a precious commodity if you’re an 80-column masochist like me. Another is that we’ve got the accumulator structure specified in four different places: the let binding’s left hand side, the fn arguments, the fn return value(s), and the initial value. When the reducing function is large and complex, these expressions can drift far apart from one another. They may not even fit on a single screen–start juggling a half-dozen variables and it’s easy for init to get out of sync with the fn args. Not the end of the world, but a little frustrating. Then there’s the runtime overhead. Creating and tearing apart all those vectors comes with significant performance cost.

We could write this as a loop:

(loop [sum   0
       count 0 
       xs    [1 2 3 4 5 6 7]]
  (if (seq xs)
    (recur (+ sum (first xs)) (inc count) (next xs))
    (/ sum count)))
; => 4

No let binding, significantly less indentation. Brings the initial values for accumulators directly next to their names, which is nice. No vector destructuring overhead.

On the flip side, we now burn a lot of time in seq machinery: next allocates a seq wrapper for every single step. Clojure’s reduce traverses the internal structure of vectors without these wrappers, and is significantly more efficient. We also have this extra boilerplate for sequence traversal, and the traversal logic is mixed together with the reduction accumulator. When we used reduce, that traversal was implicit.

Enter Loopr

Check this out:

(require '[dom-top.core :refer [loopr]])
(loopr [sum   0
        count 0]
       [x [1 2 3 4 5 6 7]]
       (recur (+ sum x) (inc count))
       (/ sum count))
; => 4

loopr is a hybrid of loop and reduce. Like loop, it starts with a binding vector of accumulator variables and initial values. Then it takes a binding vector of iteration variables: for each x in the vector [1 2 3 4 5 6 7], it evaluates the third form–the body of the loop. Just like Clojure’s loop, that body should recur with new values for each accumulator. The fourth argument to loopr is a final form, and is evaluated with the final values of the accumulators bound. That’s the return value for the loop.

Like loop, it keeps initial values close to their names, and needs no destructuring of accumulators. Like reduce, it leaves iteration implicit–closer to a for loop. It avoids the need for wrapping the reduce return value in another destructuring let, and requires much less indentation.

Did I mention it’s faster than both the reduce and loop shown here?

(def bigvec   (->> (range 10000) vec))
(def bigarray (->> (range 10000) long-array))
(def bigseq   (->> (range 10000) (map identity)))

(require '[criterium.core :refer [quick-bench]])

(quick-bench
  (loop [sum   0
         count 0
         xs    bigvec]
    (if-not (seq xs)
      [sum count]
      (let [[x & xs] xs]
        (recur (+ sum x) (inc count) xs)))))
; Evaluation count : 456 in 6 samples of 76 calls.
;              Execution time mean : 1.366176 ms
;     Execution time std-deviation : 23.450717 µs
;    Execution time lower quantile : 1.334857 ms ( 2.5%)
;    Execution time upper quantile : 1.392398 ms (97.5%)
;                    Overhead used : 20.320257 ns

(quick-bench
  (reduce (fn [[sum count] x]
            [(+ sum x) (inc count)])
          [0 0]
          bigvec))
; Evaluation count : 588 in 6 samples of 98 calls.
;              Execution time mean : 1.284103 ms
;     Execution time std-deviation : 118.742660 µs
;    Execution time lower quantile : 1.106587 ms ( 2.5%)
;    Execution time upper quantile : 1.354247 ms (97.5%)
;                    Overhead used : 20.320257 ns

(quick-bench
  (loopr [sum 0, count 0]
         [x bigvec]
         (recur (+ sum x) (inc count))))
; Evaluation count : 792 in 6 samples of 132 calls.
;              Execution time mean : 793.698823 µs
;     Execution time std-deviation : 52.061355 µs
;    Execution time lower quantile : 763.412280 µs ( 2.5%)
;    Execution time upper quantile : 883.322045 µs (97.5%)
;                    Overhead used : 20.320257 ns

How so fast? loopr macroexpands into loop over a mutable iterator, loop with aget for arrays, or reduce, depending on some heuristics about which tactic is likely to be fastest for your structure. It speeds up multi-accumulator reduce by squirreling away extra accumulators in stateful volatiles.
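The volatile trick can be sketched by hand (this is just the idea, not loopr's actual expansion): keep one accumulator as reduce's own, and tuck the second into a volatile! that the reducing function mutates, so no tuple vectors are allocated per step:

```clojure
;; Single-pass mean: sum is reduce's accumulator, count lives in a volatile.
(let [count* (volatile! 0)
      sum    (reduce (fn [sum x]
                       (vswap! count* inc)
                       (+ sum x))
                     0
                     [1 2 3 4 5 6 7])]
  (/ sum @count*))
; => 4
```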

Multidimensional Reductions

Another problem I hit all the time: reducing over nested collections. Say we’ve got a bunch of people, each one with some pets:

(def people [{:name "zhao"
              :pets ["miette" "biscuit"]}
             {:name "chloe"
              :pets ["arthur meowington the third" "miette"]}])

And I wanted to, say, find the set of all pet names. With nested collections, we need a new reduce for each level of nesting.

(reduce (fn [pet-names person]
          (reduce (fn [pet-names pet]
                    (conj pet-names pet))
                  pet-names
                  (:pets person)))
        #{}
        people)
; => #{"biscuit" "miette" "arthur meowington the third"}

(I know you could write this with a single level of reduce via into or set/union, but we’re using this to illustrate a pattern that’s necessary for more complex reductions, especially those in 3 or 4 dimensions.)

Two problems here. One is that those reduces chew up indentation real quick. Another is that we wind up specifying pet-names over and over again–threading it in and out of the inner reduce. Reduce is kind of backwards too–the things it starts with, the initial value and the collection, come last. The whole thing reads a bit inside-out.

What about loop? Any better?

(loop [pet-names    #{}
       people       people]
  (if-not (seq people)
    pet-names
    (let [[person & people] people
          pet-names (loop [pet-names pet-names
                           pets      (:pets person)]
                      (if-not (seq pets)
                        pet-names
                        (let [pet (first pets)]
                          (recur (conj pet-names pet)
                                 (next pets)))))]
      (recur pet-names people))))
; => #{"biscuit" "miette" "arthur meowington the third"}

Ooof. Again we’re threading accumulators in and out of nested structures, and the loop bodies are interwoven with iteration machinery. We could fold this all into a single loop, in theory…

(loop [pet-names    #{}
       people       people
       pets         (:pets (first people))]
  (if-not (seq people)
    pet-names ; Done with outer loop
    (if-not (seq pets)
      ; Done with this person, move on to next
      (recur pet-names (next people) (:pets (first (next people))))
      (let [[pet & pets] pets]
        (recur (conj pet-names pet) people pets)))))

It is shorter, but there are so many ways to get this subtly wrong. I made at least four mistakes writing this loop, and it’s not even that complicated! More complex multi-dimensional iteration is (at least for my brain) playing on nightmare mode.

We do have a lovely, simple macro for nested iteration in Clojure: for.

(for [person people
      pet    (:pets person)]
  pet)
; => ("miette" "biscuit" "arthur meowington the third" "miette")

Problem is, for returns a sequence of results, one for each iteration–and there’s no ability to carry accumulators. It’s more like map than reduce. That’s why loopr can take multiple iteration bindings. Just like for, it traverses the first binding, then the second, then the third, and so on. Each binding pair has access to the currently bound values of the previous iterators.

(loopr [pet-names #{}]
       [person people
        pet    (:pets person)]
       (recur (conj pet-names pet)))

Without an explicit final expression loopr returns its sole accumulator (or a vector of accumulators, if more than one is given). We get a clear separation of accumulators, iterators, and the loop body. No nesting, much less indentation. It performs identically to the nested reduce, because it actually macroexpands to a very similar nested reduce. Both are about 40% faster than the nested loop with seqs.
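The vector-of-accumulators behaviour looks like this (a sketch, assuming dom-top is on the classpath):

```clojure
(require '[dom-top.core :refer [loopr]])

;; Two accumulators, no final form: loopr returns [sum count] directly.
(loopr [sum 0, count 0]
       [x [1 2 3]]
       (recur (+ sum x) (inc count)))
; => [6 3]
```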

Early Return

In reduce we use (reduced x) to return a value immediately; in loop you omit recur. The same works in loopr. Here, we find the first odd number in a collection, returning its index in the collection and the odd value.

(loopr [i 0]
       [x [0 3 4 5]]
       (if (odd? x)
         {:index i, :number x}
         (recur (inc i))))
; => {:index 1, :number 3}
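For comparison, here is the same search written with plain reduce, using reduced for the early return (only clojure.core needed):

```clojure
;; The accumulator is the candidate index; (reduced ...) stops the
;; fold as soon as an odd number is found.
(reduce (fn [i x]
          (if (odd? x)
            (reduced {:index i, :number x})
            (inc i)))
        0
        [0 3 4 5])
; => {:index 1, :number 3}
```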

With zero accumulators, loopr still iterates. This can be helpful for side effects or search. Here’s how to find a key in a map by that key’s corresponding value. Note that iterator bindings support the usual destructuring syntax–we can iterate over a map as [key value] pairs.

(loopr []
       [[k v] {:x 1, :y 2}]
       (if (= v 2)
         k
         (recur))
       :not-found)
; => :y

If we don’t return early, loopr returns the final form: :not-found.


Arrays

Sometimes you write an algorithm which reduces over vectors or other collections as a prototype, then start using arrays for speed. loopr can compile the same reduction to a loop using integer indices and aget operations. Just tell it you’d like to iterate :via :array.

(def ary (long-array (range 10000)))
(loopr [sum 0]
       [x ary :via :array]
       (recur (+ sum x)))
; => 49995000

This is on par with (reduce + ary) for single-dimensional reductions. For multi-dimensional arrays loopr is ~65% faster on my machine, though you do have to explicitly type-hint. Faster still with multiple accumulators, of course.

(def matrix (to-array-2d (repeat 1000 (range 1000))))
(loopr [sum 0]
       [row                     matrix :via :array
        x   ^"[Ljava.lang.Object;" row  :via :array]
       (recur (+ sum x)))
; => 499500000

You can control the iteration tactic for each collection separately, by the way. Here’s the average of the numbers in a vector of vectors, where we traverse the outermost vector using a reduce, and the inner vectors using a mutable iterator.

(loopr [count 0
        sum   0]
       [row [[1 2 3] [4 5 6] [7 8 9]] :via :reduce
        x   row                       :via :iterator]
       (recur (inc count) (+ sum x))
       (/ sum count))
; => 5

In Summary

I’m always hesitant to introduce nontrivial macros. That said, in the last five months I’ve written a lot of code with and without loopr, and I’ve found it frequently useful. It’s often clearer and more efficient than the code I was already writing, and it makes refactoring complex reductions less difficult. If you’d like to give it a shot, you’ll find it in dom-top, which is a small control flow library. You’ll find tons of examples and Criterium benchmarks in the test suite. I hope loopr helps you too.


Clojure Deref (June 24, 2022)

Welcome to the Clojure Deref! This is a weekly link/news roundup for the Clojure ecosystem. (@ClojureDeref RSS)


The Stack Overflow Developer Survey results are available and it was good to see Clojure listed as #3 on the "most loved" list and #1 in highest paid (matching similar results in previous years). Have fun AND get paid, sign me up!

Libraries and Tools

New releases and tools this week:

  • ClojureScript 1.11.60 - Clojure to JS compiler

  • malli 0.8.8 - Data-Driven Schemas for Clojure/Script

  • Clojure CLI

  • compliment 0.3.13 - The Clojure completion library you deserve

  • clojure-dependency-update-action v4 - Clojure Dependency Update Action

  • atemoia - A simple full-stack clojure app

  • 0.6.2 - Machine learning functions for metamorph based machine learning pipelines

  • Postmortem 0.5.1 - A tiny data-oriented debugging tool for Clojure(Script), powered by transducers

  • transit-cljs - Transit for ClojureScript

  • proof-specs 0.1.3 - Automates testing clojure.spec data generators

  • Wielder - enables you to write Clojure code directly in Obsidian

  • clojure-lsp 2022.06.22-14.09.50 - A Language Server for Clojure(script)

  • flowless - Cljfx wrapper of Flowless

  • clj-kondo 2022.06.22 - A linter for Clojure code that sparks joy

  • platypub - Blogging + newsletter tool

  • clojure-extras 0.7.3 - Custom features added on top of Cursive for Clojure Lovers

  • Cardigan Bay 0.7.1 - A wiki engine, which is intended to be run as a personal notebook / knowledge management system / “Digital Gardening” system

  • ring-openapi-validator 0.1.3 - Clojure library with middleware for validating Ring requests and responses

  • http-server 0.1.3 - Serve static assets


The REPL is Not Enough

By Alys Brooks

The usual pitch for Clojure typically has a couple ingredients, and a key one is the REPL. Unfortunately, it’s not always clear what ‘REPL’ means. Sometimes the would-be Clojure advocate breaks down what the letters mean—R for read, E for eval, P for print, and L for loop—and how the different stages connect to the other Lispy traits of Clojure, at least. However, even a thorough understanding of the mechanics of the REPL fails to capture what we’re usually getting at: The promise of interactive development.

To explore why this is great, let’s build it up from (mostly) older and more familiar languages. If you’re new to Clojure, this eases you in. If Clojure’s one of your first languages, hopefully it gives you some appreciation of where it came from.

Casino REPL: Interactive code execution

The first level is being able to interactively enter code and run it at all. You might be surprised to learn that (technically) Java has a REPL. Similar tools exist for other static languages, like C and C#.

These can be handy for figuring out an obscure or forgotten bit of syntax or playing around with an API.

These REPLs are typically afterthoughts and have limitations on what you can define. In jshell, for example, you can define static methods but no classes.

License to Eval: Full Access to the Language

The next step is basically no-compromises eval/execute: all the constructs of the language are available. Most dynamic languages offer this, as do static functional languages like Haskell and OCaml. Shells, such as bash and PowerShell, also have this level of capability.

These languages are generally high-level and may even resemble the pseudocode you wrote or referenced, so using these REPLs to try out ideas and do some quick testing can be a fluent experience. After all, shells were the main interface to computers for several decades, and have remained in the toolbox of power users, system administrators, developers, and quartermasters since.

Still, you run into some disadvantages:

  • These REPLs start with a blank slate, but most development happens in the context of an existing program, often a very large one. You have to use imports to bring in the relevant code, and that can take quite a bit of typing.

  • What you write in the REPL is often ephemeral. Ephemeral in the quality sense is actually okay: writing one (or two, or three) versions in the REPL to throw away isn’t bad. But it’s also ephemeral in a more literal sense. Once the REPL session ends, your code is either lost or not in a convenient format. Recent history is typically just a few up-arrow presses away, but to find earlier code you have to either search or wade through typos, uses of doc and other inspection, and design dead ends.

From Devtools With Love: Adding Context

Going beyond a sequential experience of entering code and seeing the result takes us another step toward understanding what our code is doing. Actually, it’s really two steps:

  1. Going beyond text to include graphs, images, animations, and widgets.

  2. Showing the current state at all times.

Web developers and data scientists have taken the lead here. Every major desktop browser has a suite of tools for inspecting not only the JavaScript code but also the interface (the DOM) it generates or alters. Similarly, RStudio and Spyder are data science-oriented IDEs that keep a running process and allow you to see the values of any currently defined variables.

Some supercharged REPLs and REPL sidekicks exist for Clojure:

  • Dirac tweaks Chrome’s DevTools to accommodate ClojureScript.

  • Reveal adds the ability to explore and visualize data structures returned by the forms you’re evaluating.

  • Portal, inspired by Reveal, similarly lets you explore a variety of data types.

Along similar lines, re-frame applications can use re-frame-10x, which allows for stepping back and forward to see the state of the application.

Notebooks are another way of moving past the textual paradigm. Notebooks let you have inline diagrams, images, graphs, and even widgets. They also allow you to embed explanatory text and diagrams—the promised literate programming all the way from the 1970s. Some notebooks add a variable inspector and debugger, blurring the line between IDE and notebook. Clerk brings these to Clojure.

The Clojurian With the Golden Form: Out of the Textual REPL

Clojure’s base REPL already has the strengths of the dynamic, expressive languages mentioned in License to Eval (especially if you add tools and quality-of-life libraries from the previous section). However, many Clojure developers find they are most productive if they can evaluate code as they edit source code.

This is often done through advanced REPLs like nREPL, pREPL, and their alternatives. Rich Hickey has argued that “REPL” is a misnomer, at least in nREPL’s case.

Fully realized, Clojure forms and values become a lingua franca allowing you to control, inspect, and redefine a variety of systems, as you send code from your editor, terminal, or notebook to a REPL, a local instance of your program, a browser, a Node backend, or even a production instance. Unfortunately, most of these require some setup. In particular, getting a ClojureScript REPL is a multistage process, much like modern rockets, and prone to failure, much like early rockets.

These advantages transcend the basic command-line evaluation that “REPL” often suggests, so listing the REPL among Clojure’s advantages actually undersells the feature if you don’t explain what it can actually do.

Sessions are Forever: Common Lisp

Clojure’s interactive development is not the apex, though. Common Lisp went even further by persisting state between sessions and letting you examine that state.

Perhaps the most noticeable feature is that these sessions save where you left off. This makes it easier to build up your program over time, at the cost of some state ambiguity. If you wrote a function process-input, renamed it to canonicalize-user-commands, fixed a bug, and refactored it, process-input would still be hanging around, with subtle differences. Arguably, working from an editor is a better fit for making changes to long-running systems or collaborating with other programmers, but being able to persist sessions would be nice for smaller programs or experimentation. In addition to Common Lisp, Smalltalk and R also remember where you left off.

Common Lisp has another super power: conditions and restarts. When your program fails, it’s paused at the moment everything went wrong, allowing you to try to recover, explore what went wrong, or even redefine things and resume execution like nothing happened.

In Clojure and most other languages, an error, exception, or panic basically shuts everything down. You can see the stack at the moment of failure, but you can’t interact with it. Rich Hickey was inspired by Common Lisp, and Clojure’s lack of a condition system is not because he thought conditions weren’t valuable. As he explains in The History of Clojure,

I experimented a bit with emulating Common Lisp’s conditions system, but I was swimming upstream.

Some Clojurians have decided to try swimming upstream, but since these libraries aren’t widely used, you’ll have to think carefully about how they’ll interact with libraries that rely on Clojure’s native exceptions.


Common Lisp being the endpoint of our journey puts the lie to my blog post’s structure. Like most stories of only-increasing progress, this one isn’t completely true. Interactive development hasn’t simply gotten better and better over time. We’ve lost ground in some areas even as we’ve gained ground in others.

Still, we’re in a good place with Clojure. As the recent introduction of Clerk demonstrates, there’s still interest in improving the interactive development experience in Clojure.

Appendix: All the James Bond-Clojure Puns I Could, Regrettably, Not Fit in this Post

  • Dr. Nil
  • Dyn Another Day
  • Live and let Die
  • From nREPL with Love
  • The spy Who Loved Me
  • You Only defonce
  • MoonREPL

    Unboxing the JDK

    By Alys Brooks

    It’s easy to forget the Java Development Kit is, in fact, a kit. Many Clojure developers, myself included, rarely work with commands like java directly, instead using lein, boot, or clojure. Often we don’t even use the Java standard library directly in favor of idiomatic wrappers.

    There are a lot of advantages to staying in the Clojure level. Often, Clojure-specific tools ergonomically support common practices like live-reloading, understand Clojure data structures, and can tuck away some of the intermediate layers of Clojure itself that aren’t a part of your application.

    But the Java tools are still useful. So, in the spirit of a YouTube unboxing video, let’s take a look at what’s in the JDK box. We’ll be looking at OpenJDK 16, since later versions of the JDK remove some legacy tools, like appletviewer, which aren’t useful to Clojure developers today anyway.

    The tools in this post are listed roughly in order of importance.

    The usual suspects

    The two tools Java developers use most are java and javac. There’s a good chance you’re familiar with these (and if you are, feel free to skip ahead!) from doing Java development.

    java starts the JVM, the program that converts Java bytecode into commands the platform can run natively, collects garbage, profiles and optimizes the running program, and provides information to monitoring tools. When you pass JVM arguments to tools like the Clojure CLI, they’re sent to java.

    javac compiles Java source code into Java bytecode. Clojure projects don’t actually use javac unless they include Java source code; Clojure has its own compiler that emits Java bytecode using a subset of ASM.

    You’re probably also familiar with javadoc, even if you haven’t heard of it. As the default documentation software for Java, it’s responsible for all of the Java Standard Library docs, plus many third-party libraries. (You may also remember it for generating pages with HTML frames well into the 00s.) Clojure doesn’t have an official documentation tool, but options exist: Autodoc, which is used for the clojure.* namespaces plus a few others, and Cljdoc, which is used by many open source Clojure(Script) libraries, including Lambda Island’s.


    Regardless of what monitoring you do, start by running jps to identify the process ID. Unfortunately, most Clojure apps will be listed as main because that’s the standard entry point. If jps is ambiguous, you can also use jcmd, which will list the process ID and the command instead.

    From that ID, a world of tools opens up:

    • Running jstack ID will show you all the running threads in that process. This is particularly handy when your application is taking longer to run than expected.
    • Running jcmd ID plus a command lets you find out specific information about your running process, like GC stats: jcmd ID GC.heap_info

    • jconsole has much of the same monitoring, just in a GUI format.

    Miscellaneous helpers

    jshell (as in “java shell”, although I can’t help reading it as “JS hell”) is a Java REPL. While the Clojure REPL makes a pretty good Java REPL, thanks to its interop capabilities, if you’re interested in how something works in Java, jshell can be a handy choice.

    jdeps shows dependencies of JARs, class files, or folders containing them.

    jdb is a commandline debugger for Java.

    Two JAR tools

    While you can create JARs with jar, it’s probably a better idea to leave JAR creation to your build tool and use jar solely to examine the contents. Note that JARs are actually zip files, so you can use any tool that opens zip files.

    jarsigner, well, signs JARs.

    Experimental tools

    The following are experimental, so while they may be useful for debugging, you probably don’t want to build them into scripts. They’re also slightly redundant compared with jcmd, jconsole, and visualvm, but you may prefer their format instead:

    • jinfo dumps a lot of details about the current JVM, including the class path, JVM arguments (including the defaults), and more.
    • jstat and jstatd provide many of the same monitoring options but using sampling.
    • jmap shows statistics about all the classes loaded by a program and also lets you dump your program’s heap. It may be useful as an initial look into your application’s memory usage, but isn’t helpful for deep exploration on its own, since there’s not much detail. You can open the dumps in jhat or visualvm for further analysis.
    • jhat lets you analyze heap dumps through a very spartan web browser interface.

    jaotc is an experimental ahead-of-time compiler. It’s also on its way out so I won’t say much more about it. GraalVM is the suggested alternative, and there’s lots of information and examples on using it for Clojure projects.

    jlink is a bit of a dark horse. It lets you create custom runtimes with just the parts your application needs. This doesn’t make a lot of sense in scenarios where more than one Java program runs on the same machine, but for Docker images and other situations where your program ships with its own JRE, it might be worth it to slim down the JRE.

    jrunscript runs various script files. To run a script file written in a language other than JavaScript, you need to provide a JAR. You probably already have a good runtime for any scripting language you’d use, so this doesn’t seem useful except in cases where you have the JDK available and little else.

    jhsdb is similar to jdb but for crashed JVMs.

    And one more thing

    For many versions, the JDK contained jvisualvm, a Java profiler. It’s still available, just not as a part of the JDK. It has a much nicer interface than jconsole and more functionality, particularly profiling and sampling, so consider downloading it if you’re doing significant performance analysis.

    When running it on Linux, you may want to provide --laf; on my machine, the default “look and feel” selects a tiny font.

    Packing up

    One interesting thing is the many ways these commands are modeled after C and Unix generally: jar after tar, jdb after gdb, javac after cc, and so on. This isn’t a game-changing insight, but it is interesting historically. Considering Java’s place in programming language history, it makes sense. In the mid-90s, virtual machines, while not new, were much less common than they are today, and C and C++ were much more dominant, so drawing a comparison made things easier for people coming from C or C++ and probably increased Java’s credibility. By the time Clojure emerged, tools like Ant and Maven were already widespread in the Java ecosystem, so Leiningen had more predecessors to learn from.



    Software Engineer at LegalSifter, Inc.


    60000 - 110000

    Contracts are the most important document in global commerce. They are also universally a pain. They slow down commerce. People cannot afford to send them to attorneys, because they are too expensive and too slow for the average transaction. Attorneys cannot afford to review them faster or cheaper because they have not had the technology that allows them to deliver contract reviews at scale. Moreover, once contracts are signed, the world cannot keep track of them. It buys databases and then struggles to keep them organized. Everyone builds a library, but the books are in a mess on the floor.

    At LegalSifter, our mission is to bring affordable legal services to the world. We make products that combine artificial and human intelligence—Combined Intelligence®—built as software, service, or both. We focus on contracts because they are the most important documents in global commerce and are a pain for businesses and consumers alike.

    LegalSifter is growing quickly, and we need someone to join our team as a full-stack Software Engineer. You will work individually and with others to solve challenging problems and build high-quality software that drives our legal technology solutions forward. You will get the opportunity to work in all parts of the tech stack, including: UI, Middleware, API endpoints, Database, DevOps (including infrastructure and CI/CD pipelines), and more.


    • Demonstrable experience modeling our core values in your career to date

    • Design, develop, test, deploy, maintain, and improve the software in an agile environment.

    • Manage individual project priorities, deadlines, and deliverables.

    • Take pride in your work and approach it with a sense of ownership.

    • Must have some experience in several of the following technologies:

    • Clojure and ClojureScript (or similar functional programming language experience)

    • JavaScript/TypeScript

    • React (or similar JavaScript framework experience)

    • Node.js

    • Postgres or similar RDBMS

    • Datomic

    • Docker

    • AWS

    • Contribute to code reviews, ensuring quality and standards are always met.

    • BS degree in Computer Science, similar technical field of study, or equivalent practical experience.


    We cure contract pain with Combined Intelligence before and after you sign.

    Learn more about our products on our website at


    • Speed: We work fast.

    • Security: We are vigilant and committed to maintaining the privacy of client data.

    • Humility: We use our energy to serve others; we are lifelong learners.

    • Intelligence: We operate with aligned visions, strategies, plans, and budgets.

    • Boldness: We are transparent, courageous, and unafraid to make mistakes.

    • Balance: We work flexibly to keep promises to teammates, shareholders, clients, friends, and family.


    LegalSifter offers a full suite of benefits, including a bonus plan, employer-paid health insurance, and equity.

    LegalSifter is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status.

    Job Type: Full-time

    COVID-19 considerations:
    We are remote right now with only exceptional visits to the office.


    Separate data schema from data representation

    With data separated from code and represented with generic and immutable data structures, now comes the question of how do we express the shape of the data? In DOP, the expected shape is expressed as a data schema that is kept separated from the data itself. The main benefit of Principle #4 is that it allows developers to decide which pieces of data should have a schema and which pieces of data should not.

    This article is an excerpt from my book about Data-Oriented Programming.

    More excerpts are available on my blog.

    This article is an exploration of the fourth principle of Data-Oriented Programming. The other principles of DOP are explored here:

    Principle #4 — Separate data schema from data representation.

    Illustration of Principle #4

    Think about handling a request for the addition of an author to the system. To keep things simple, imagine that such a request contains only basic information about the author: their first name and last name and, optionally, the number of books they have written. As seen in Principle #2 (represent data with generic data structures), in DOP, request data is represented as a string map, where the map is expected to have three fields:

    • firstName — a string

    • lastName — a string

    • books — a number (optional)

    In DOP, the expected shape of data is represented as data that is kept separate from the request data. For instance, a JSON Schema can represent the data schema of the request with a map. The following listing provides an example.

    var addAuthorRequestSchema = {
      "type": "object", // (1)
      "required": ["firstName", "lastName"], // (2)
      "properties": {
        "firstName": {"type": "string"}, // (3)
        "lastName": {"type": "string"}, // (4)
        "books": {"type": "integer"} // (5)
      }
    };

    1. Data is expected to be a map (in JSON, a map is called an object).

    2. Only firstName and lastName fields are required.

    3. firstName must be a string.

    4. lastName must be a string.

    5. books must be a number (when it is provided).

    A data validation library is used to check whether a piece of data conforms to a data schema. For instance, we could use the Ajv JSON Schema validator to validate data with the validate function, which returns true when data is valid and false when data is invalid. The following listing shows this approach.

    var ajv = new Ajv();
    var validAuthorData = {
      firstName: "Isaac",
      lastName: "Asimov",
      books: 500
    };
    ajv.validate(addAuthorRequestSchema, validAuthorData); // (1)
    // → true
    var invalidAuthorData = {
      firstName: "Isaac",
      lastNam: "Asimov",
      books: "five hundred"
    };
    ajv.validate(addAuthorRequestSchema, invalidAuthorData); // (2)
    // → false
    1. Data is valid.

    2. Data has lastNam instead of lastName, and books is a string instead of a number.

    When data is invalid, the details about data validation failures are available in a human readable format. The next listing shows this approach.

    var invalidAuthorData = {
      firstName: "Isaac",
      lastNam: "Asimov",
      books: "five hundred"
    };
    var ajv = new Ajv({allErrors: true}); // (1)
    ajv.validate(addAuthorRequestSchema, invalidAuthorData);
    ajv.errorsText(ajv.errors); // (2)
    // → "data should have required property 'lastName',
    // →  data.books should be number"
    1. By default, Ajv stores only the first data validation error. Set allErrors: true to store all errors.

    2. Data validation errors are stored internally as an array. In order to get a human readable string, use the errorsText function.

    Benefits of Principle #4

    Separation of data schema from data representation provides numerous benefits. The following sections describe these benefits in detail:

    • Freedom to choose what data should be validated

    • Optional fields

    • Advanced data validation conditions

    • Automatic generation of data model visualization

    Benefit #1: Freedom to choose what data should be validated

    When data schema is separated from data representation, we can instantiate data without specifying its expected shape. Such freedom is useful in various situations. For example,

    • Rapid prototyping or experimentation

    • Code refactoring and data validation

    Consider rapid prototyping. In classic OOP, we need to instantiate every piece of data through a class. During the exploration phase of coding, when the final shape of our data is not yet known, being forced to update the class definition each time the data model changes slows us down. DOP enables a faster pace during the exploration phase by delaying the data schema definition to a later phase.

    One common refactoring pattern is split-phase refactoring, where a single large function is split into multiple smaller functions with private scope. We call these smaller functions with data that has already been validated by the larger function. In DOP, it is not necessary to specify the shape of the arguments of the inner functions; we rely on the data validation that has already occurred.

    Consider how to display some information about an author, such as their full name and whether they are considered prolific. Using the code shown earlier to illustrate Principle #2 to calculate the full name and the prolificity level of the author, one might come up with a displayAuthorInfo function as the following listing shows.

    class NameCalculation {
      static fullName(data) {
        return data.firstName + " " + data.lastName;
      }
    }
    class AuthorRating {
      static isProlific(data) {
        return data.books > 100;
      }
    }
    var authorSchema = {
      "type": "object",
      "required": ["firstName", "lastName"],
      "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "books": {"type": "integer"}
      }
    };
    function displayAuthorInfo(authorData) {
      if(!ajv.validate(authorSchema, authorData)) {
        throw "displayAuthorInfo called with invalid data";
      }
      console.log("Author full name is: ", NameCalculation.fullName(authorData));
      if(authorData.books == null) {
        console.log("Author has not written any book");
      } else {
        if (AuthorRating.isProlific(authorData)) {
          console.log("Author is prolific");
        } else {
          console.log("Author is not prolific");
        }
      }
    }
    Notice that the first thing done inside the body of displayAuthorInfo is to validate the argument passed to the function. Now, apply the split-phase refactoring pattern to this simple example and split the body of displayAuthorInfo into two inner functions:

    • displayFullName displays the author’s full name.

    • displayProlificity displays whether the author is prolific or not.

    The next listing shows the resulting code.

    function displayFullName(authorData) {
      console.log("Author full name is: ", NameCalculation.fullName(authorData));
    }
    function displayProlificity(authorData) {
      if(authorData.books == null) {
        console.log("Author has not written any book");
      } else {
        if (AuthorRating.isProlific(authorData)) {
          console.log("Author is prolific");
        } else {
          console.log("Author is not prolific");
        }
      }
    }
    function displayAuthorInfo(authorData) {
      if(!ajv.validate(authorSchema, authorData)) {
        throw "displayAuthorInfo called with invalid data";
      }
      displayFullName(authorData);
      displayProlificity(authorData);
    }

    Having the data schema separated from data representation eliminates the need to specify a data schema for the arguments of the inner functions displayFullName and displayProlificity. It makes the refactoring process a bit smoother. In some cases, the inner functions are more complicated, and it makes sense to specify a data schema for their arguments. DOP gives us the freedom to choose!
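To make the choice concrete, here is a small sketch of what validating only one of the inner functions might look like. It is illustrative only: the hand-rolled validate function stands in for Ajv (so the snippet runs without third-party dependencies), and fullNameSchema is a hypothetical schema, not one from the excerpt.

```javascript
// Minimal stand-in for Ajv's validate: checks required fields and their types.
// Illustrative only; a real application would use a JSON Schema validator.
function validate(schema, data) {
  return (schema.required || []).every(function(field) {
    return typeof data[field] === schema.properties[field].type;
  });
}

// Hypothetical schema for the more complicated inner function.
var fullNameSchema = {
  "required": ["firstName", "lastName"],
  "properties": {
    "firstName": {"type": "string"},
    "lastName": {"type": "string"}
  }
};

// This inner function gets its own schema check...
function displayFullName(authorData) {
  if (!validate(fullNameSchema, authorData)) {
    throw "displayFullName called with invalid data";
  }
  return "Author full name is: " + authorData.firstName + " " + authorData.lastName;
}

// ...while this one simply trusts that the caller already validated the data.
function displayProlificity(authorData) {
  return authorData.books > 100 ? "prolific" : "not prolific";
}

displayFullName({firstName: "Isaac", lastName: "Asimov"});
// → "Author full name is: Isaac Asimov"
```

The point is not the validator itself but the freedom: each inner function can opt in or out of validation independently.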

    Benefit #2: Optional fields

    In OOP, allowing a class member to be optional is not easy. For instance, in Java one needs a special construct like the Optional class introduced in Java 8. In DOP, it is natural to declare a field as optional in a map. In fact, in JSON Schema, by default, every field is optional.

    In order to make a field not optional, its name must be included in the required array as, for instance, in the author schema in the following listing, where only firstName and lastName are required, and books is optional. Notice that when an optional field is defined in a map, its value is validated against the schema.

    var authorSchema = {
      "type": "object",
      "required": ["firstName", "lastName"], // (1)
      "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "books": {"type": "number"} // (2)
      }
    };
    1. books is not included in required as it is an optional field.

    2. When present, books must be a number.

    Let’s illustrate how the validation function deals with optional fields. A map without a books field is considered to be valid:

    var authorDataNoBooks = {
      "firstName": "Yehonathan",
      "lastName": "Sharvit"
    };
    ajv.validate(authorSchema, authorDataNoBooks); // (1)
    // → true
    1. The validation passes as books is an optional field.

    Alternatively, a map with a books field, where the value is not a number, is considered to be invalid:

    var authorDataInvalidBooks = {
      "firstName": "Albert",
      "lastName": "Einstein",
      "books": "Five"
    };
    ajv.validate(authorSchema, authorDataInvalidBooks); // (1)
    // → false
    1. The validation fails as books is not a number.

    Benefit #3: Advanced data validation conditions

    In DOP, data validation occurs at run time. This allows the definition of data validation conditions that go beyond the type of a field: for example, validating that a field is not only a string, but a string with a maximum number of characters, or a number that falls within a given range.

    JSON Schema supports many other advanced data validation conditions, such as regular expression validation for string fields or number fields that should be a multiple of a given number. The author schema in the following listing expects firstName and lastName to be strings of at most 100 characters, and books to be an integer between 0 and 10,000.

    var authorComplexSchema = {
      "type": "object",
      "required": ["firstName", "lastName"],
      "properties": {
        "firstName": {
          "type": "string",
          "maxLength": 100
        },
        "lastName": {
          "type": "string",
          "maxLength": 100
        },
        "books": {
          "type": "integer",
          "minimum": 0,
          "maximum": 10000
        }
      }
    };

    Benefit #4: Automatic generation of data model visualization

    With the data schema defined as data, we can use several tools to generate data model visualizations. With tools like JSON Schema Viewer and Malli, a UML diagram can be generated from a JSON schema.

    For instance, the JSON schema in the following listing defines the shape of a bookList field, which is an array of books where each book is a map, and the following figure visualizes it as a UML diagram.

      "type": "object",
      "required": ["firstName", "lastName"],
      "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "bookList": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "title": {"type": "string"},
              "publicationYear": {"type": "integer"}
    (Figure: the author schema visualized as a UML diagram)

    Cost for Principle #4

    Applying Principle #4 comes with a price. The following sections look at these costs:

    • Weak connection between data and its schema

    • Small performance hit

    Cost #1: Weak connection between data and its schema

    By definition, when data schema and data representation are separated, the connection between data and its schema is weaker than when data is represented with classes. Moreover, the schema definition language (e.g., JSON Schema) is not part of the programming language. It is up to the developer to decide where data validation is necessary and where it is superfluous. As the saying goes, with great power comes great responsibility.

    Cost #2: Small performance hit

    As mentioned earlier, implementations of JSON Schema validation exist in most programming languages. In DOP, data validation occurs at run time, and running that validation takes some time. In OOP, data validation usually occurs at compile time.

    This drawback is mitigated by the fact that, even in OOP, some parts of data validation occur at run time. For instance, the conversion of a request JSON payload into an object occurs at run time. Moreover, in DOP, it is quite common to have some data validation parts enabled only during development and to disable them when the system runs in production. As a consequence, this performance hit is not significant.
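One way to sketch the dev-only validation mentioned above is to make the check a no-op when a development flag is off. The mechanism below (the makeChecker helper and its names) is an assumption of mine; the excerpt does not prescribe a specific technique.

```javascript
// Sketch of validation that can be disabled in production. The isDev flag
// and helper names are hypothetical, not from the excerpt.
function makeChecker(isDev, validate) {
  return function(schema, data, label) {
    // When isDev is false this branch is skipped entirely, so the
    // run-time cost of data validation disappears in production.
    if (isDev && !validate(schema, data)) {
      throw label + " called with invalid data";
    }
  };
}

// A deliberately trivial validator for the demo: requires a numeric num field.
var hasNum = function(schema, data) { return typeof data.num === "number"; };

var devCheck = makeChecker(true, hasNum);
var prodCheck = makeChecker(false, hasNum);

prodCheck({}, {}, "demo"); // no-op: invalid data passes silently in production
// devCheck({}, {}, "demo") would throw "demo called with invalid data"
```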

    Summary of Principle #4

    In DOP, data is represented with immutable generic data structures. When additional information about the shape of the data is required, a data schema can be defined (e.g., using JSON Schema). Keeping the data schema separate from the data representation gives us the freedom to decide where data should be validated.

    Moreover, data validation occurs at run time. As a consequence, data validation conditions that go beyond the static data types (e.g., the string length) can be expressed. However, with great power comes great responsibility, and it is up to the developer to remember to validate data.

    DOP Principle #4: Separate data schema from data representation

    To adhere to this principle, separate the data schema from the data representation.


    Benefits include

    • Freedom to choose what data should be validated

    • Optional fields

    • Advanced data validation conditions

    • Automatic generation of data model visualization

    The cost for implementing Principle #4 includes

    • Weak connection between data and its schema

    • A small performance hit



    Data is immutable

    With data separated from code and represented with generic data structures, how are changes to the data managed? DOP is very strict on this question. Mutation of data is not allowed! In DOP, changes to data are accomplished by creating new versions of the data. The reference to a variable may be changed so that it refers to a new version of the data, but the value of the data itself must never change.

    This article is an exploration of the third principle of Data-Oriented Programming:

    Principle #3 — Data is immutable.

    Illustration of Principle #3

    Think about the number 42. What happens to 42 when you add 1 to it? Does it become 43? No, 42 stays 42 forever! Now, put 42 inside an object: {num: 42}. What happens to the object when you add 1 to 42? Does it become 43? It depends on the programming language.

    • In Clojure, a programming language that embraces data immutability, the value of the num field stays 42 forever, no matter what.

    • In many programming languages, the value of the num field becomes 43.

    For instance, in JavaScript, mutating a field of a map referred to by two variables affects both variables. The following listing demonstrates this.

    var myData = {num: 42};
    var yourData = myData;
    yourData.num = yourData.num + 1;
    console.log(myData.num);
    // → 43

    Now, myData.num equals 43. According to DOP, however, data should never change! Instead of mutating data, a new version of it is created. A naive (and inefficient) way to create a new version of the data is to clone it before modifying it. For instance, the following listing shows a function that changes the value of a field inside an object by cloning the object via Object.assign, provided natively by JavaScript. When changeValue is called on myData, myData is not affected; myData.num remains 42. This is the essence of data immutability!

    function changeValue(obj, k, v) {
      var res = Object.assign({}, obj);
      res[k] = v;
      return res;
    }
    var myData = {num: 42};
    var yourData = changeValue(myData, "num", myData.num + 1);
    console.log(myData.num);
    // → 42

    Embracing immutability in an efficient way, both in terms of computation and memory, requires a third-party library like Immutable.js, which provides an efficient implementation of persistent data structures (a.k.a. immutable data structures). In most programming languages, there exist libraries that provide an efficient implementation of persistent data structures.

    With Immutable.js, JavaScript native maps and arrays are not used, but rather, immutable maps and immutable lists instantiated via Immutable.Map and Immutable.List. An element of a map is accessed using the get method. A new version of the map is created when a field is modified with the set method.

    Here is how to create and manipulate immutable data efficiently with a third-party library. In the output, yourData.get("num") is 43, but myData.get("num") remains 42.

    var myData = Immutable.Map({num: 42});
    var yourData = myData.set("num", 43);
    console.log(yourData.get("num"));
    // → 43
    console.log(myData.get("num"));
    // → 42

    When data is immutable, instead of mutating data, a new version of it is created.

    Benefits of Principle #3

    When programs are constrained from mutating data, we derive benefit in numerous ways. The following sections detail these benefits:

    • Data access to all with confidence

    • Predictable code behavior

    • Fast equality checks

    • Concurrency safety for free

    Benefit #1: Data access to all with confidence

    According to Principle #1 (separate code from data), data access is transparent. Any function is allowed to access any piece of data. Without data immutability, we must be careful when passing data as an argument to a function. We can either make sure the function does not mutate the data or clone the data before it is passed to the function. When adhering to data immutability, none of this is required.

    When data is immutable, it can be passed to any function with confidence because data never changes.
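    As a small plain-JavaScript sketch of the contrast (sortInPlace is a hypothetical function outside our control that mutates its argument):

    ```javascript
    // A hypothetical function we don't control: it mutates its argument.
    function sortInPlace(nums) {
      return nums.sort((a, b) => a - b);
    }

    var myNums = [3, 1, 2];

    // Without immutability: clone defensively before passing the data along.
    var sorted = sortInPlace([...myNums]);
    // myNums is still [3, 1, 2], but only because we remembered to copy.

    // With frozen (immutable) data, forgetting the copy is caught instead of
    // silently corrupting myNums:
    var frozenNums = Object.freeze([3, 1, 2]);
    // sortInPlace(frozenNums) would throw a TypeError rather than mutate.
    ```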

    Benefit #2: Predictable code behavior

    As an illustration of what is meant by predictable, here is an example of an unpredictable piece of code that does not adhere to data immutability. Take a look at the piece of asynchronous JavaScript code in the following listing. When data is mutable, the behavior of asynchronous code is not predictable.

    var myData = {num: 42};
    setTimeout(function (data) {
      console.log(data.num);
    }, 1000, myData);
    myData.num = 0;

    The value of data.num inside the timeout callback is not predictable. It depends on whether the data is modified by another piece of code during the 1,000 ms of the timeout. However, with immutable data, it is guaranteed that data never changes and that data.num is always 42 inside the callback.
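    As a contrasting sketch, here Object.freeze serves as a stand-in for a real persistent data structure; the callback's view of the data can no longer be changed from outside:

    ```javascript
    "use strict";
    var myData = Object.freeze({num: 42});

    setTimeout(function (data) {
      console.log(data.num); // always prints 42: no code path can change it
    }, 1000, myData);

    // An attempted mutation throws in strict mode instead of racing the callback:
    // myData.num = 0; // TypeError: Cannot assign to read only property 'num'
    ```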

    When data is immutable, the behavior of code that manipulates data is predictable.

    Benefit #3: Fast equality checks

    With UI frameworks like React.js, there are frequent checks to see what portion of the UI data has been modified since the previous rendering cycle. Portions that did not change are not rendered again. In fact, in a typical frontend application, most of the UI data is left unchanged between subsequent rendering cycles.

    In a React application that does not adhere to data immutability, it is necessary to check every (nested) part of the UI data. However, in a React application that follows data immutability, it is possible to optimize the comparison of the data for the case where data is not modified. Indeed, when the object address is the same, then it is certain that the data did not change.

    Comparing object addresses is much faster than comparing all the fields. In Part 1 of my book, fast equality checks are used to reconcile between concurrent mutations in a highly scalable production system.
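    A plain-JavaScript sketch of the idea (spread-copies standing in for a real persistent data structure; the field names are invented): an immutable update rebuilds only the changed path, so untouched subtrees keep their object identity and can be compared by reference.

    ```javascript
    var prev = {user: {name: "Ada"}, posts: ["p1", "p2"]};

    // "Modify" posts by building a new version; user is reused, not copied.
    var next = {...prev, posts: [...prev.posts, "p3"]};

    console.log(prev.user === next.user);   // → true: same reference, skip re-rendering
    console.log(prev.posts === next.posts); // → false: this subtree changed
    ```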

    Immutable data enables fast equality checks by comparing data by reference.

    Benefit #4: Free concurrency safety

    In a multi-threaded environment, concurrency safety mechanisms (e.g., mutexes) are often used to prevent the data in thread A from being modified while it is accessed in thread B. In addition to the slight performance hit they cause, concurrency safety mechanisms impose a mental burden that makes code writing and reading much more difficult.

    Adherence to data immutability eliminates the need for a concurrency mechanism. The data you have in hand never changes!

    Cost for Principle #3

    As with the previous principles, applying Principle #3 comes at a price. The following sections look at these costs:

    • Performance hit

    • Required library for persistent data structures

    Cost #1: Performance hit

    As mentioned earlier, there exist implementations of persistent data structures in most programming languages. But even the most efficient implementation is a bit slower than the in-place mutation of the data. In most applications, the performance hit and the additional memory consumption involved in using immutable data structures is not significant. But this is something to keep in mind.

    Cost #2: Required library for persistent data structures

    In a language like Clojure, the native data structures of the language are immutable. However, in most programming languages, adhering to data immutability requires the inclusion of a third-party library that provides an implementation of persistent data structures.

    The fact that the data structures are not native to the language means that it is difficult (if not impossible) to enforce the usage of immutable data across the board. Also, when integrating with third-party libraries (e.g., a chart library), persistent data structures must be converted into equivalent native data structures.
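    As a rough sketch of that friction (PMap is an invented stand-in for a persistent map; with Immutable.js the equivalent conversion is done by its toJS method):

    ```javascript
    // An invented persistent-map stand-in: set() returns a new version.
    class PMap {
      constructor(obj) {
        this.obj = Object.freeze({...obj});
      }
      get(k) { return this.obj[k]; }
      set(k, v) { return new PMap({...this.obj, [k]: v}); }
      // Third-party libraries (e.g., a chart library) expect plain objects,
      // so the persistent structure must be converted at the boundary.
      toJS() { return {...this.obj}; }
    }

    var data = new PMap({label: "sales", value: 42});
    var plain = data.toJS(); // a plain, mutable object: {label: "sales", value: 42}
    ```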

    Summary of Principle #3

    DOP considers data as a value that never changes. Adherence to this principle results in code that is predictable even in a multi-threaded environment, and equality checks are fast. However, a non-negligible mind shift is required, and in most programming languages, a third-party library is needed to provide an efficient implementation of persistent data structures.

    DOP Principle #3: Data is immutable

    To adhere to this principle, data is represented with immutable structures.

    Benefits include

    • Data access to all with confidence

    • Predictable code behavior

    • Fast equality checks

    • Concurrency safety for free

    The cost for implementing Principle #3 includes

    • A performance hit

    • Required library for persistent data structures

    This article is an excerpt from my book about Data-Oriented Programming.

    More excerpts are available on my blog.


    The old todo list

    I'm getting a late start with the old blogging today, so I'll make do with a todo list for the summer.

    Things that will definitely happen:

    • Take a walk over to a nearby cafe with my friend Esther to have some fika. This is definitely going to happen, as Esther is Dutch and therefore has made an entry in her diary and entries in Dutch diaries are non-negotiable. Also, this event will take place in 15 minutes.
    • Stop by the store on the way back from fika and buy a few ingredients for making guacamole and harissa. No, I will not be eating both of these things together. Unless I decide to put harissa on my black bean burrito that I'm going to also put guacamole on, which actually sounds kinda good now that I say it out loud. I'll let you know how this crime against food turns out.
    • Walk my dog. This will definitely happen at least twice a day, because otherwise Rover gets very sad because he still hasn't learned to use the big boy potty. Also he needs exercise.
    • Play some Guitar Hero: Warriors of Rock with my friend Simon. This will involve drinking beer and eating some chips with guacamole, as long as I don't eat all of the guacamole on my burrito like a greedy bastard. Headbanging, throwing horns, and "the man stance" are also likely to feature.

    Things that are likely to happen:

    • Create an AWS Lambda custom runtime for Babashka. I want to do this so I can get blazing fast startup times for my lambda functions AND be able to edit the source in the lambda console AND not be subjected to NodeJS in any way, shape, or form. ClojureScript is a wonderful wonderful thing, but to mangle the words of the great Rich Hickey, "Clojure rocks, Node reeks". Ray and I are in violent disagreement about this, but Ray is wrong and can therefore suck it.
    • Enable HTTPS for my blog, because it's just embarrassing. I use S3 static website hosting, and I know there's a way to use CloudFront plus an ACM (no, not that ACM, this one) cert to do this, but I'm pretty sure it will require a full day, mystic chanting, and the sacrifice of a monitor or two (when I throw it across the room out of frustration).
    • Dig into REPL-acement with Ray.
    • Learn what's so awesome about Nix flakes.

    Things that are unlikely to happen but really should:

    • Learn Swedish.

    OK, I need to run because otherwise I'm gonna be late for fika, and that just will not do.

    Update: I actually did some of these things!


    Test-induced design damage in Clojure

    Writing tests is the software equivalent of doing the dishes: unloved, but necessary. Unfortunately, because language designers are almost never interested in testing, it's usually a second-class concept, assuming the language even bothers to treat it as anything different from the rest of code in the first place. This is unfortunate, because it can have detrimental consequences to your code design.

    Test-Induced Design Damage (TIDD) is not a new concept. DHH of Rails wrote about TIDD back in 2014. This is not even a new concept for Clojure, as Eric Normand wrote a newsletter about it in the past. Unfortunately, Normand's post didn't have the impact I'd hoped for, nor did it go into enough detail for me, so I'm going to try and give examples that will help people understand the issue and the trade-offs a bit better.

    What is test-induced design damage?

    I suggest you read DHH's post above, but in short, it's altering code to better support tests at the expense of other aspects of the system. Communities like Test-Driven Development (TDD) have taken it as a priori gospel that better testing is a primary goal, while downplaying the consequences. As they say, software developers know the value of everything and the cost of nothing.

    For example, extracting hidden/private/closed-over code so it can be mocked for testing also carries detriments like requiring names (it usually can't be anonymous any more), cluttering docs, expanding argument lists (since support objects must be passed-in or injected), potential misuse (if users can now directly access/create/use an object when they shouldn't), and indirection overhead (both mental and code-based).

    To be clear: alterations that support testing may have other benefits that justify their use, but this needs to be evaluated on a case-by-case basis, rather than assuming improved testing is a sufficient justification itself.

    It's even possible that complicating your code to support testing actually increases the number of bugs, despite more testing. This is because, all other things being equal, larger codebases have more bugs1.

    The rest of this post will look at changes made solely for testing. I'll show you some examples of TIDD, in order of increasing complexity.


    Imagine you have some function that takes too long to use in local tests. Maybe it makes a network call that takes a while.

    Let's say you made a plain function:

    (defn my-fn []
      (call-slow-endpoint)) ; call-slow-endpoint stands in for the slow network call
    ;; usage
    (my-fn)

    You decide you need to mock it for testing, so what are your options?

    Redefining via with-redefs or with-redefs-fn

    Using with-redefs, you can temporarily replace the root definition for testing without touching the original code at all (alter-var-root can work, too, though it's more cumbersome to use). This sounds like the perfect way to leave non-testing code clean, right? Eric Normand suggested this in his original newsletter.

    (defn my-fn []
      (call-slow-endpoint)) ; the original definition, untouched
    ;; test
    (deftest redef-ing-my-fn
      (with-redefs [my-fn #(call-mock-endpoint)]
        (is (= (my-fn) some-expected-result))))

    Unfortunately, with-redefs requires a lot of care with multi-threaded / lazy code, since the var root definition is changed for all threads for a limited time. Code in other threads that run after the with-redefs ends can easily use an unintended value. Tim Baldridge wrote a long post on how vars work under the hood and why redefs can be tricky, and it's worth reading before using functions like binding/with-redefs in any context.

    You could safely use with-redefs if you can guarantee all of the following:

    1. Don't run multiple tests simultaneously - slower unit testing is the price
    2. Don't rely on background threads - these are fragile anyway, and create timing concerns even if you don't redef anything
    3. Wrap the entire body of the test in with-redefs - you don't need to worry about lazy evaluation happening after with-redefs ends if you've already forced the values you need
    4. Ensure you always join with other threads before exiting the with-redefs if those threads do anything an assertion relies on - this may require code distortion itself

    That's a lot of constraints. #1 is easy to satisfy but undesirable, and ensuring #2 and #4 may range from annoying to infeasible without altering our main code, which violates the goal of avoiding TIDD.

    (There's also a long discussion about with-redefs in the Reddit comments on Eric Normand's original newsletter. Unfortunately, Eric's example involved a database connection, which inherently has state and was thus a stronger candidate for protocols/components, and many people latched onto that aspect instead of considering the bigger picture.)

    Rebinding via binding or with-bindings

    This is similar to the above, and works, but it requires you declare my-fn as ^:dynamic.

    (defn ^:dynamic my-fn []
      (call-slow-endpoint)) ; note the ^:dynamic
    ;; test usage
    (deftest rebinding-my-fn
      (binding [my-fn #(call-mock-endpoint)]
        (is (= (my-fn) some-expected-result))))  

    On the upside, it only changes the definition for the local thread and its children, so tests can run in parallel. Care must still be taken with background threads, but you should avoid those in tests anyway. Threads started inside the binding carry the binding frame with them, even after the parent thread ends, so it's much safer for multi-threaded code.

    However, this is still a slight alteration of the code. Declaring it as ^:dynamic means it's slightly slower to execute in production code. Worse, it sends a false signal to users that they may need/want to rebind it. Plus, it suffers a variant of the expression problem, since you cannot mark outside vars as ^:dynamic without forking the code. (One can argue you shouldn't test outside code, but creating wrapper fns just to mock is TIDD again.) Still, this is almost the ideal solution, if not for ^:dynamic.2

    Branch inside the function on a testing flag

    This might be an option if you already use feature flags heavily. For testing, it would look something like:

    (defn my-fn []
      (if-not global.flags/is-testing?
        (call-slow-endpoint)    ; normal behavior
        (call-mock-endpoint)))  ; test-only behavior

    Then you need to set global.flags/is-testing? only when testing. This keeps the function signature clean, but clutters the global namespace, complicates the function body, makes multiple mock behaviors difficult, and adds branching overhead.

    You could also use compile-time constants or macros to make this pattern more efficient, but it would still be less flexible and cluttered.


    What about polymorphism? You could make my-fn polymorphic with multimethods by dispatching based on whether you're running normally or for testing:

    (defmulti my-fn (fn [type] type))
    (defmethod my-fn :normal [_]
      (call-slow-endpoint))
    (defmethod my-fn :test [_]
      (call-mock-endpoint))
    ;; usage
    (my-fn :normal)
    ;; test usage
    (deftest polymorphic-multimethod-test
      (is (= (my-fn :test) some-expected-result)))

    The problem is you now have more code, and you have to weave the right dispatch value into all calls to my-fn (and possibly their parents), which alters the param signatures. You could set the dispatch value as a global var, but that has many of the same problems as internal branching does.

    Which leaves protocols...


    The pattern I've seen the most in real Clojure code, and unfortunately, the most complicated option, is to replace plain functions with protocols and records.

    (defprotocol MyProtocol
      (my-fn [_]))
    (defrecord MyFunctionner []
      MyProtocol
      (my-fn [_]
        (call-slow-endpoint)))
    (defrecord MyTestFunctionner []
      MyProtocol
      (my-fn [_]
        (call-mock-endpoint)))
    ;; non-default constructors are commonly added
    (defn my-functionner []
      (->MyFunctionner))
    (defn my-test-functionner []
      (->MyTestFunctionner))
    ;; usage
    (let [my-fn-er (my-functionner)]
      (my-fn my-fn-er))
    ;; add component deps for bonus points
    (def system
      (component/system-map
       :my-functionner (my-functionner)
       :something-else (component/using
                         (something-else) ; some other component
                         [:my-functionner ...])))

    Protocols have the inherent problem of requiring state, since they can only be used with an object. Even if the type/record defines no state internally, lifecycle state itself must be taken into consideration. Unlike a function or multimethod, which is effectively available once its namespace is required, protocol functions cannot be used before an object is created or after it's destroyed. Plus, the object must be passed around everywhere it's used, cluttering up argument lists and adding to naming overhead everywhere.

    For bonus complexity, non-default constructors are extremely common additions, and once people have a type/record with a lifecycle, they add it to their initialization system, so they end up writing a bunch of extra Component/Integrant/etc code to support it, too.

    Is all this worth it? How many protocols have you seen that exist just to support testing and nothing else?

    Solution: dynamic redef

    The solution I've settled on is one created by Mourjo Sen, and I think it deserves to be more widely known. It's encapsulated in a mini-library called dynamic-redef.

    The basic idea is to mimic the propagated thread-local behavior of binding without having to declare anything ^:dynamic or mess with our main code. It uses alter-var-root to permanently replace the root definition of a function with one that looks up its current definition in a ^:dynamic map but falls back to the original definition if no overrides are found. Then "dynamically redefining" a function involves adding a new binding frame under the hood with updated fn definitions for the dynamic function lookup map.

    His original gist shows the technique in full.


    Pros:

    1. Allows you to leave your main code completely unaltered
    2. Incurs no performance penalty in production code
    3. Replaces definitions in a more thread-safe manner than raw with-redefs

    Cons:

    1. Does not play well with background threads (though you should avoid those in tests when possible)
    2. Like binding, does not work with plain Java threading, which doesn't use Clojure thread frames


    This is not meant to eliminate testing-specific protocols/records, but to offer an option that's more suitable in some use cases. My personal "middle way" of testing is, examine the thing to be mocked and determine if it has inherent state. If so, it's probably a better fit for protocols. But if not, don't complicate your code just to test it. Give dynamic redef a try. It may be unfamiliar, but it's simpler than the alternatives when it fits.

    1. Code Complete has some industry-generated estimates on bugs/LOC, but the much-discussed study, A Large-Scale Study of Programming Languages and Code Quality in Github, actually computed the overall effect of code size (independent of language) as a control variable. If you look at the discussion of the control variables in Table 6, "...they are all positive and significant". All else being equal, less code means fewer bugs.
    2. Technically, you don't have to declare a var ^:dynamic to use binding on it. There's an undocumented .setDynamic method on vars, but to use this dark art successfully, you'd have to invoke it before the compiler gets to any call sites with the var. Otherwise, it'll compile a call to the root definition, and never check for binding frames. I've seen some code that claims to do this reliably via macros, but it doesn't seem to work for me.


    Announcing Platypub: open-source blogging + newsletter tool

    Today I'm making the first public release of Platypub, an open-source blogging + newsletter tool which I started developing on the side a few months ago. I've used Platypub to write this very announcement, along with the rest of the website. I've also built another site with it.

    My first motivation for building Platypub is that I wanted to have an open-source project that would help people learn Biff, the Clojure web framework I made (Platypub is built with Biff). Since lots of programmers have blogs, I figured it might give people a nice opportunity to submit PRs to a real, working project they use themselves. At a minimum, it'll provide some source code to read.

    My second motivation, and the reason I picked this particular application to build, is that I'm scratching my own itch. I publish blog articles and newsletters in several different places using several different tools, and it was becoming unwieldy. I wanted a single publishing tool that would handle all my needs in one place. Something both convenient enough for non-technical users and yet with a theme system that provides the same level of flexibility you would have if you coded the site from scratch.

    That being said, Platypub is still very rough around the edges. It is not yet ready for non-technical users. I'm only releasing it now because it is ready for other people to hack on it. Indeed, I've intentionally restrained myself from putting too much work into it beyond the core functionality so that there's more low-hanging fruit for others to implement.

    If you'd like to give it a spin, see the README. I've made several issues as well. If you're interested in hacking on a Biff project, feel free to take on one of those, or better yet, try using Platypub for a bit and then see what missing pieces you'd like to add. I'm happy to help you get started and provide code review. With this release out of the way, I'm also planning to start doing streamed pair programming sessions. At least two people have expressed interest already, so those might start soon. You can join #biff on Clojurians Slack and/or the Biff newsletter for announcements.

    Architecture + some demo screenshots

    Platypub includes a CMS and a theme system for rendering your websites and emails. For actually hosting the sites and sending the emails, Platypub integrates with Netlify and Mailgun. As such, you don't need to deploy Platypub to use it; you can just run it locally. You will need to get your own API keys for Netlify, Mailgun, and a couple other services.

    Eventually I'd like to host an instance of Platypub so anyone can use it without needing to set it up. It's designed so that a single instance can service multiple users, so offering a free plan will be feasible.

    TinyMCE is used for editing posts (no markdown, sorry! I prefer WYSIWYG. Also no block-based editing, thank goodness). If you add some S3 credentials to the config file, the editor will handle image uploads.

    When you've finished writing a post, you can preview and publish it from the Sites page.

    You'll need to create at least one site first, which involves setting some configuration options. Some of these options are required for all sites, while some of them are additional options specified by the theme you select:

    A theme is basically a script that reads in your posts and site configuration from an input.edn file and then spits out a set of static files. (The default theme is a Babashka script.) When you publish your site, those files are deployed to Netlify. If your site needs any backend functionality, the theme can output some serverless function code, which will also be hosted by Netlify. The default theme contains one serverless function, which powers the newsletter signup form:

    (The site shown above uses the default theme, while the Biff website uses a custom theme.)

    This theme setup gives you a lot of flexibility, but it also means that before Platypub can be provided as a managed service, we need to come up with a way to run the theme code in a sandbox.

    Themes work similarly for the newsletter side of things: the theme script reads in a specific post and then returns some HTML, which is sent out to your mailing list via Mailgun.

    We use Mailgun's mailing lists API for managing subscribers' email addresses. That also means we can let Mailgun handle unsubscribe requests (though we should probably add another serverless function for that eventually, since Mailgun's unsubscribe page is ugly).

    At the moment, if you want to actually see your list of subscribers, you'll need to do it from the Mailgun dashboard. Similarly, you can set up a custom domain for your sites by going to Netlify.

    Some bigger picture stuff

    Last week I wrote about a grand scheme I have, which includes promoting a media ecosystem that consists of many separate, interchangeable services instead of a few giant social media monoliths. Platypub is part of that: it's a "publishing service," meant to complement other services for consumption, discussion, and aggregation (read the post for more explanation). Within that framing, I think there are lots of interesting things to be experimented with. I see Platypub as a vector for trying out these experiments.


    Copyright © 2009, Planet Clojure. No rights reserved.
    Planet Clojure is maintained by Baishampayan Ghose.
    Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
    Theme by Brajeshwar.