So it's been somewhat longer than I'd hoped. After the first blog post in this series I started on the second item in the checklist, "Automate the build". And that's where I got suited up and blasted off as a fully-fledged architecture astronaut.
Constructing the monolith
Everything started off reasonably sensibly. Since I'm using Clojure's deps system to configure my application, it made sense to use the tools.build library to build an UberJAR for the service. Because of the simplicity of the application so far, it was basically a copy-and-paste job from the tools.build example.
The first step was to create a :build alias in my deps.edn file to manage the build.
I like to keep all my infrastructure/configuration away from the rest of my code, so I created an infra directory and I created a build.clj file to house the build code.
restaurant
...
├── infra
│ └── build.clj
...
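Tying those together, a minimal :build alias might look something like this (the tools.build version is illustrative, and whether you use :paths or :extra-paths depends on how you combine aliases; check the tools.build guide for the current coordinates):

```clojure
{:aliases
 {:build {;; make infra/build.clj loadable as the build namespace
          :paths ["infra"]
          :deps {io.github.clojure/tools.build {:mvn/version "0.10.5"}}
          ;; lets unqualified function names (e.g. `uber`) resolve to build/uber
          :ns-default build}}}
```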
I then copied the example build namespace from the guide. Including the build number in the file name of the JAR makes life a little more complicated later, so I removed it and tidied up the rest of the file.
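Stripped of the build number, the build namespace ends up close to the guide's example, something along these lines (the restaurant main namespace is an assumption based on the jar name; treat this as a sketch rather than the exact file):

```clojure
(ns build
  (:require [clojure.tools.build.api :as b]))

(def class-dir "target/classes")
(def uber-file "target/restaurant.jar")

(defn uber [_]
  ;; clean, compile, then package everything into a single runnable jar
  (b/delete {:path "target"})
  (let [basis (b/create-basis {:project "deps.edn"})]
    (b/copy-dir {:src-dirs ["src"] :target-dir class-dir})
    (b/compile-clj {:basis basis
                    :ns-compile '[restaurant]
                    :class-dir class-dir})
    (b/uber {:class-dir class-dir
             :uber-file uber-file
             :basis basis
             :main 'restaurant})))
```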
This allowed me to create the UberJAR from either the REPL (as long as it's started with the build alias)
(build/uber nil)
or the command line.
clojure -X:build uber
Both of those commands create the UberJAR at target/restaurant.jar.
I was now able to run the application UberJAR from the command line
sudo java -jar target/restaurant.jar
and then check that it worked using curl (or your web browser of choice)
$ curl localhost
Hello World!
Shooting for the stars
I had recently come across Large Scale Software Development (and a big trap) by the YouTube channel Code to the Moon, which nicely encapsulates the architecture I wanted to use. Essentially, I wanted to architect the service as a stateless monolith run behind a load balancer. This allows for fairly straightforward scaling (both horizontally and vertically) and downtimeless deployments, two things that modern software engineering embraces. In my experience this is a suitable architecture for the vast majority of projects to begin with, and for a lot of them it will be all the architecture that is ever needed.
Wishlist
"All" I wanted was somewhere I could deploy my service that
was a Platform-as-a-Service (PaaS) that aligned with the architecture defined above
was API and/or CLI first
could run a javaagent alongside my service
would not let me deploy a trashed version of the service
was cheap
Platform-as-a-service
Because I'm a developer most of the time, my ops skills are a gaping chasm of desirability. Could I learn to cobble together something in AWS, GCP or Azure? Probably. Could I do something similar in Kubernetes (or k8s or whatever it's called)? It's probably not beyond me. Did I want to spend a bunch of time doing that? Nope. Would it have been secure, performant or reliable? Highly unlikely. All I wanted was someone to do the hard work of putting all the pieces together and present them to me on a platter. Not too much to ask, I think you'll agree.
API/CLI first
My intent is to deploy this service in small increments and very often, using some sort of CI/CD tooling, so at the very least the deployments must be automatable. If everything else is as well, that will allow me to build any tooling I might need. I don't want your easy "one button deployment"; I want a simple "one call deployment".
Run a javaagent
Logging has been a wonderful asset over the last few decades, but I really believe that tracing (via OpenTelemetry) and the observability it provides is the future of understanding live systems. I suspect it will supersede most, if not all, use cases for logging (not with the certainty that certain corners of the internet believed Blockchain was the future of finance, or that similar corners believe A.I. is the future of everything, but pretty close). OpenTelemetry provides a Java agent that can automatically instrument a JVM application using libraries that it knows about. E.g. it will instrument the Jetty HTTP server, because that's widely known enough that the good folk over at the OpenTelemetry project have created automatic instrumentation for it, as opposed to the http-kit server, which isn't as well known. For me to benefit from this automated tracing I need to be able to start my application with this Java agent.
Won't let me deploy trashed service versions
I'm a firm believer in trunk based development, or at the very least, short lived feature branches. Ideally I was looking for a service that would provide canary releases which would allow me to progressively roll out new versions of my service to increasingly larger percentages of my users. At the very least I wanted somewhere that would not allow me to deploy obviously broken versions of my service.
Cheap
I'm only a lowly software engineer, this isn't an attempt to make money and I already have plenty of expensive hobbies. I don't need to add cloud computing to that list.
Google-fu failures
I'm not proud to say it, but in this moment, my Google-fu failed me. Heroku looked promising as it could host the UberJAR, but there was no way that I could find to run the OpenTelemetry javaagent alongside it. Railway probably should have been enough, but the documentation didn't immediately click and somehow I missed fly.io (we'll get back to that one).
Blasting off
So I did what any self-respecting (some might say foolish or naive) software engineer suffering from "not invented here" syndrome would do, and spent the next 6 months, off and on, building my deployment process over SSH to a Digital Ocean droplet. Had I not been suffering from said syndrome I could have spent that time learning something useful, like k8s or AWS. Don't get me wrong, I learnt a lot about Docker's API, how to use lispyclouds' contajners library and Metosin's sieppari, and how to drive SSH programmatically, but it appears building a PaaS isn't as easy as it first looks. Who knew?
Crashing back to earth
And then fly.io turned up in one of my social media feeds (I suspect it was The Primeagen, but I can't find the video now). It's a PaaS primarily driven through the CLI, it runs Docker containers so I can run a javaagent alongside my service, it's got multiple deployment options and it's cheap. Plus it handles SSL certs, secret management and has GitHub Actions integrations. Tick, tick, tick, tick. Okay, it's got a release option called "Canary" that isn't what I would describe as a canary release, but at least it should be enough to stop me shooting myself in the foot. If I'm reasonably careful.
Containing the UberJAR
Since I'm not a Docker expert I decided to steal the expertise of other humans who actually know what they're doing. Practicalli has some excellent documentation on building Docker images for Clojure applications. Andrey Fadeev has a really useful YouTube video about Docker and Clojure that helped it click for me. In fact, all his videos are great; if you're looking to step up your Clojure game you could do a lot worse than work your way through his back catalogue.
I created the following Dockerfile in the infra directory.
The strategy was to use a fully featured base image to build the UberJAR and then a very thin base image to run the service itself.
FROM clojure:temurin-21-alpine AS builder
RUN mkdir -p /build
WORKDIR /build
COPY deps.edn /build/
RUN clojure -P -X:build
COPY ./src /build/src
RUN mkdir -p /build/infra
COPY ./infra/build.clj /build/infra/
RUN clojure -T:build uber
FROM eclipse-temurin:21-jre-alpine AS final
LABEL org.opencontainers.image.source=https://github.com/HughPowell/restaurant
LABEL org.opencontainers.image.description="Restaurant reservation application"
LABEL org.opencontainers.image.licenses="MPL-2.0"
RUN apk add --no-cache \
    dumb-init~=1.2.5
ARG UID=10001
RUN adduser \
    --disabled-password \
    --gecos "" \
    --home "/nonexistent" \
    --shell "/sbin/nologin" \
    --no-create-home \
    --uid "${UID}" \
    clojure
RUN mkdir -p /service && chown -R clojure:clojure /service
USER clojure
WORKDIR /service
COPY --from=builder --chown=clojure:clojure /build/target/restaurant.jar /service/restaurant.jar
ENTRYPOINT ["/usr/bin/dumb-init", "--"]
CMD ["java", "-jar", "/service/restaurant.jar"]
Since I was defining a license here I also included the text of the license in a LICENSE file at the root of the project. My go to is the Mozilla Public License V2 (MPL-2.0).
Flying back to the moon
Now this is what I call a quick start guide. Install, signup, launch (which generated a local config file in infra/fly.toml), deploy, all from the CLI. Just one small problem: it wouldn't let me (probably quite reasonably) expose port 80 on the Docker container, so I updated the application to listen on port 3000 and exposed that port from the Dockerfile. I didn't need to change the port the service ran on, but it had the added benefit that I could run the UberJAR locally without needing sudo. Once that was fixed up I successfully deployed the service and for good measure I pointed a subdomain of my hughpowell.net domain name to it.
Now all I needed was a way to prevent me shooting myself in the foot as often as possible. Fly.io provides a number of deployment strategies including what they call 'canary'. This boots a single additional machine, waits for it to become healthy and then replaces all the running machines one-by-one. Not how I would describe a canary deployment strategy, but enough for the time being.
To enable this I set up a [deploy] section in my infra/fly.toml file like so
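In its minimal form that section is just the strategy choice (a sketch; fly.io's fly.toml reference documents the other deploy options):

```toml
[deploy]
  strategy = "canary"
```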
With my deployments now configured, all I needed was to automate them every time a change was committed to main. I chose GitHub Actions as my CI/CD pipeline orchestrator. No great reason: it's integrated into GitHub, I've used it a couple of times before and it seems as competent as anything else at this scale. Having added my fly.io API token to the FLY_API_TOKEN secret, I check out the project, install the flyctl application and deploy the service. I limit the job to 15 minutes because I never want build times to exceed that.
name: Build the Restaurant service

on:
  push:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Setup deployment controller
        uses: superfly/flyctl-actions/setup-flyctl@master
      - name: Deploy service
        run: flyctl deploy --config infra/fly.toml
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
So, yeah. The last six months has basically boiled down to flyctl deploy --config infra/fly.toml and a lot of learning.
What's going on?
Now that my service is running in production I need a way to check that it's doing everything it should and nothing it shouldn't. As noted above, I wanted to use OpenTelemetry for that. Instrumenting the service locally required downloading the OpenTelemetry Java agent and then starting the UberJAR with the -javaagent:<path-to-OpenTelemetry-agent> flag. You can also pass a similar option to your REPL and you'll get traces while developing. There are a couple of options for interacting with OpenTelemetry locally: you can export the traces to the console, or to an observability tool like SigNoz, my current tool of choice, or Digma.
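Concretely, both invocations look something like this (the agent jar path is illustrative; download it from the opentelemetry-java-instrumentation releases first):

```shell
# start the UberJAR with the agent attached
java -javaagent:opentelemetry-javaagent.jar -jar target/restaurant.jar

# or pass the same JVM option through the Clojure CLI to get traces from a REPL
clj -J-javaagent:opentelemetry-javaagent.jar
```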
What about the old school logs?
Unfortunately OpenTelemetry hasn't penetrated the entire Java ecosystem (yet!), so I still needed a way to output the logs being written by the libraries I have imported and will import. Luckily OpenTelemetry defines a standard, which the Java agent has implemented, to facilitate recording logs as spans.
First off I had to deal with the chaos that is Clojure's logging infrastructure. To be fair, it's mostly Java's fault. Either way, the magic incantation was to add a stack of dependencies.
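The exact stack isn't shown here, but the usual trick is to route every legacy logging API through SLF4J and let a single backend handle output. In deps.edn terms that looks something like the following (the artifacts are real, the versions are illustrative, and the author's actual selection may differ):

```clojure
{:deps
 {org.clojure/tools.logging      {:mvn/version "1.3.0"}   ; Clojure-side logging facade
  org.slf4j/slf4j-api            {:mvn/version "2.0.13"}
  org.slf4j/jul-to-slf4j         {:mvn/version "2.0.13"}  ; java.util.logging -> SLF4J
  org.slf4j/jcl-over-slf4j       {:mvn/version "2.0.13"}  ; commons-logging -> SLF4J
  org.slf4j/log4j-over-slf4j     {:mvn/version "2.0.13"}  ; log4j 1.x -> SLF4J
  ch.qos.logback/logback-classic {:mvn/version "1.5.6"}}} ; the one actual backend
```

With everything funnelled through a single backend, the OpenTelemetry Java agent only needs to instrument that one library to capture all the logs.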
It's marginally disconcerting that logging now takes up almost exactly half of the service's code, but to be fair, the rest is just managing the HTTP server.
The important information
I added the OpenTelemetry Java agent to my Dockerfile and added the required arguments to attach to the JVM as the service started.
FROM clojure:temurin-21-alpine AS builder
RUN mkdir -p /artifacts
RUN wget -O /artifacts/opentelemetry-javaagent.jar https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v2.6.0/opentelemetry-javaagent.jar
...
FROM eclipse-temurin:21-jre-alpine AS final
...
COPY --from=builder --chown=clojure:clojure /artifacts/opentelemetry-javaagent.jar /service/opentelemetry-javaagent.jar
...
CMD ["java", "-javaagent:/service/opentelemetry-javaagent.jar", "-jar", "/service/restaurant.jar"]
For production traces I like Honeycomb. (I'm also a big fan of Honeycomb's CTO, Charity Majors, especially her perspectives on observability and socio-technical systems; I highly recommend perusing her blog when you have a chance.) They've got a free plan of 20 million events per month, which is more than enough to get started with. Having signed up, I used flyctl to store the sensitive environment variables that required my API key
fly secrets set OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=<API key>"
fly secrets set OTEL_EXPORTER_OTLP_METRICS_HEADERS="x-honeycomb-team=<API key>,x-honeycomb-dataset=restaurant"
and configured the rest of the required environment variables in my fly.toml configuration
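The non-secret half can live in fly.toml's [env] section; something like the following (the endpoint and protocol follow Honeycomb's documented OTLP settings, and the service name is illustrative):

```toml
[env]
  OTEL_SERVICE_NAME = "restaurant"
  OTEL_EXPORTER_OTLP_ENDPOINT = "https://api.honeycomb.io"
  OTEL_EXPORTER_OTLP_PROTOCOL = "http/protobuf"
```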
The service was redeployed and logs and traces were being sent to Honeycomb.
Summary
The last six months (on and often off) has basically resulted in one line of code, flyctl deploy --config infra/fly.toml. It's pretty humbling, knowing I'm still perfectly capable of barreling headlong into a rabbit hole far enough that the diggers need to be called in.
The restaurant service is now running in production, re-deployed every time a new change is pushed to main and is pushing traces and logs through OpenTelemetry to Honeycomb.
Next time we'll finish off the check-list, "Turn on all the error messages". See you then. Hopefully somewhat sooner than this time.
C++ became a standard language for games and graphics software a long time ago, and there was an actual reason: working with real-time graphics and physics requires high performance. Processing geometry, managing buffers, matrix calculations - all of that does take time.
But what about high-level logic? Game mechanics, user interface, storage management, network requests? There, stability and safety are in much greater demand than speed.
Responsibility distribution
We can implement performance-demanding functions in a compiled language, such as C++, and call them from a program written in a dynamic language, such as Python.
But today we already have well-documented and easy-to-use libraries for Python (pygame, pyopengl, pyassimp, pybullet, numpy) that are implemented primarily in C/C++ and provide functions for heavy calculations, or physics/graphics in particular. We may never face the necessity of implementing such libraries on our own.
Is C++ the only choice?
It is generally accepted that garbage-collected languages, such as Java or C#, are slower than C++ and don't really meet the requirements for heavy calculations. This is, of course, not true.
C++ may outperform Java or C# by 20-30% in some special cases, but when it comes to runtime abstractions, such as dynamic function dispatch, interaction between languages, asynchronous tasks, and managing text or abstract collections, Java and C# show much higher efficiency than C++.
Also, we can run our Python programs on the same runtime as Java or C#, using Jython or IronPython. This brings a lot of benefits, such as shared garbage-collected memory, a shared type system and easy access to C# or Java libraries right out of the box. Nice dynamic languages such as Clojure and Groovy are implemented on the JVM; they have complete access to the Java Class Library and share the previously mentioned benefits.
What really influences performance?
Today's personal computers are much faster than those of 15-20 years ago. But most desktop programs and games do not run as fast as expected (despite still being mostly implemented in C/C++). Today we need good algorithms and effective approaches much more than raw language speed. A constant-time function in Python is preferable to a linear-time function in C. Painting 100 trees with 15 lines of Python code is preferable to painting 500 trees with 300 lines of C++ code.
Care about game, not language
It is not really important what language you use when you don't have any game made, right?
Making a game in C++ is much more demanding and exhausting than doing the same in Python or Ruby. In the time it takes to make 1 game in C++, you could make 10 games in Python. And by the time you'd made 5 games in Python, you'd still have 0 games in C++.
Let us care about games and fun; otherwise, what's the point?
Starting a podcast has been on my to-do list for a while, and I'm happy to announce that I've finally taken the first steps.
My friend and colleague from Metosin, Martin Varela, was happy to jump in front of the camera to capture the following conver...
CIDER 1.14 is our most ambitious release since CIDER 1.8 (“Geneva”), which was released last autumn.
The single most notable user-visible change of this release is that CIDER is now more robust when evaluating and displaying large values. CIDER will no longer hang when C-x C-e-ing a big value in a source buffer or stepping over such a value with the CIDER debugger.
I’m guessing that many people will also appreciate the improvements we’ve made to flex completion (which is finally fully compliant with the Emacs completion API), the inspector, and the cider-cheatsheet functionality, which was mostly redesigned.
nREPL 1.2 restores the ability to interrupt evaluation on JDK 20+ (see https://github.com/nrepl/nrepl/pull/318 for details) and CIDER 1.15 implements support for nREPL 1.2.
More interesting work is in progress, so I hope I’ll have another exciting report for all of you in a couple of months!
Michiel Borkent
Updates
In this post I’ll give updates about open source I worked on during May and June 2024. To see previous OSS updates, go here.
Sponsors
I’d like to thank all the sponsors and contributors that make this work possible. Without you, the below projects would not be as mature or wouldn’t exist or be maintained at all.
Fix #98: internal options should not interfere with :restrict
deps.clj: A faithful port of the clojure CLI bash script to Clojure
Upgrade/sync with clojure CLI v1.11.3.1463
Other projects
There are many other projects I’m involved with but that had little to no activity in the past month. Check out the Other Projects section (more details) of my blog here to see a full list.
This is a summary of the open source work I’ve spent my time on throughout May and June, 2024. There were lots of small bug fixes and reports, driven by work on the Clojure Data Cookbook. This work was also the impetus for my initial release of tcutils, a library of utility functions for working with tablecloth datasets. I also had the wonderful opportunity to attend PyData London in June and found it really insightful and inspiring. Read on for more details.
Sponsors
This work is made possible by the generous ongoing support of my sponsors. I appreciate all of the support the community has given to my work and would like to give a special thanks to Clojurists Together and Nubank for providing me with lucrative enough grants that I can reduce my client work significantly and afford to spend more time on these projects.
If you find my work valuable, please share it with others and consider supporting it financially. There are details about how to do that on my GitHub sponsors page. On to the updates!
Ecosystem issue reports and bug fixes
Working on the cookbook these last couple of months turned up a few small issues in ecosystem libraries. The other developers of Clojure’s data science tools are such a pleasure to work with; it’s so rare and nice to have a distributed team of people capable of getting cool things built asynchronously. Here are some details of a few particular issues that came up:
Some good discussions about how best to incorporate the myriad of dependencies required to use Java machine learning libraries in Clojure libs, including sorting out what to do about transitive dependencies in our tribuo wrapper, led by Carsten Behring.
Initial release of tcutils
In my explorations of other languages' tools for working data I often come across nice utility functions that are super simple but have a big impact on the ergonomics of using the tools. I wanted to start bringing some of these convenience utilities to Clojure, so for now I’m putting them in tcutils. So far only a handful of helpers are implemented (lag, lead, cumsum, and clean-column-names). The goal is to eventually fill out more utilities that save people from having to dig into the documentation of half a dozen different libraries to figure out how to implement things like these. The goal is not to achieve feature parity or to exactly copy similar libraries, like pandas or dplyr, but rather to take inspiration from them and make our tools easier to use for people who are used to these conveniences.
Progress on Clojure Data Cookbook
I spent a lot of time on the Clojure Data Cookbook over these last two months. Notable progress includes:
The introductory chapters bear some resemblance now to the final form they’ll take.
The overall structure of the book is much more clear now.
I started the example analysis that will serve as the high-level introductory section of the book.
The publishing and deployment process is finally working.
It’s still very much in progress, but in the interest of transparency the work-in-progress version is available online now. It will continue to evolve and change as I fill out more and more of the chapters, but there’s enough of it available now to hopefully give a sense of the style and tone I’m going for. I also finally have the publishing workflow set up and it’s generating a nice-looking Quarto book, thanks to all of Daniel Slutsky’s amazing work on Clay and Quarto integration recently.
Progress on high-level goals
The high-level goal of my work in general remains to steward Clojure’s data science ecosystem to a state of maturity and flourishing so that data practitioners can use it to get real work done. Toward this end, I set up a project board to track progress toward what I see as the main components of this project.
Over the last couple of months, beginning with a prototype demoed at my London Clojurians talk in April, Daniel Slutsky has made tremendous progress on our goal of implementing a grammar of graphics in Clojure in the new hanamicloth library. The near-term goal is to stabilize the API of this library enough that it can be used to provide a user-friendly way to accomplish all of the simple data visualization tasks that are currently possible with our other tools. The long term goal is to take the lessons we learn from this library and build a JVM-only grammar of graphics library for doing data visualization “right” in Clojure.
The development and surrounding discussions of hanamicloth have also made me realize it would be useful to write an overview of the current state of dataviz options for Clojure and why we’re working on building something new. That’s on my list for the coming months, but lower priority than actual development work.
Impressions from PyData London
I got to attend PyData London this year thanks to a client of mine who was sponsoring the conference. I learned a lot and found the talks very interesting. My overall impression is that data science is maturing as a discipline, with more polished methods and robust theory backing up different approaches to data-related problems. With this maturation, though, comes higher expectations for production-ready, professional quality results. Most of the talks focused on high-level concerns like observability, scalability, and long-term stewardship of large open-source projects.
There are a lot of reasons why Python is just not ideal for building highly available, high-performance systems, and I really believe this is a good time to be building alternative tools for data science. Python is obviously entrenched as the current default language for working with data, but it is difficult and slow to write code that can take full advantage of modern hardware (because of the infamous global interpreter lock, reference counting, slow I/O, among other reasons). And to be fair, the Python community knows this. It’s why virtually all of the libraries that do the heavy lifting for data science in Python are actually implemented in C (numpy, pandas) or Rust (Polars, Pydantic), or are wrappers around C++ (PyTorch, TensorFlow, matplotlib) or Java (PySpark, Pydoop, confluent-kafka) libraries.
I think this provides a lot of insights into what data practitioners want. It’s clear that users want approachable, simple, human-readable interfaces for all of these tools, and that any new tool needs to interoperate with the rest of the ones currently in use. People are also tired of churn and are craving stability. I think Clojure has a lot to offer in all of these areas and is well placed to become more widely adopted for data science.
Ongoing work
My focus over the next two months will remain on the cookbook. My main goal is to finish the introductory chapter with the housing price analysis and to continue putting together the data import section with instructions and examples for all file formats that can reasonably be supported easily at this time.
I’ll continue to support and contribute to all of the ecosystem libraries I come across in my writings and analysis work in hopes of smoothing out all the rough edges I find.
Thanks for reading. I always love hearing from people who are interested in any of the things I’m working on. If that’s you, don’t hesitate to be in touch :)
Nikita Prokopov
Hi, I’m Nikitonsky and this is my open-source update for the past two months. Some good work was done on Humble UI (finally!), DataScript and new project — AlleKinos.de.
New project: AlleKinos.de, a no-nonsense movie showtimes site for the entire Germany:
A simple view for all movie screenings in Germany, inspired by Bret Victor’s Magic Ink
Developed in Clojure, data stored in DataScript, hosted on application.garden
Includes many many small cities (up to 671 now!),
And all the cinemas that were reported missing before.
Started my 3-month sabbatical in June with a road trip with the kids, a welcome reset! Now back home, learning and doing.
Refreshed my knowledge of the latest TypeScript, Zod and XState, with a goal to pull some of the good things into Clojure (into Malli + a fully XState-compatible FSM library). Also working on a template project with monorepo + malli + reitit, using Java 21 and Virtual Threads.
Library Releases
reitit 0.7.1 (active)
Fixing regression bugs from 0.7.0 + latest features via dependent libraries.
Changelog here.
malli 0.16.2 (active)
Welcome Experimental Simplified Function Schemas!
[:-> :any]             ; [:=> :cat :any]
[:-> :int :any]        ; [:=> [:cat :int] :any]
[:-> [:cat :int] :any] ; [:=> [:cat [:cat :int]] :any]
[:-> a b c d :any]     ; [:=> [:cat a b c d] :any]

;; guard property
[:-> {:guard (fn [[[arg] ret]] ...)} :string :boolean]
; [:=> [:cat :string] :boolean [:fn (fn [[[arg] ret]] ...)]]
Small fixes and improvements, Changelog here.
If you are a user of spec-tools and want to help, feel free to ping me on Clojurians Slack, happy to take a new contributor here.
Something Else
Old abandoned Soviet-era sanatorium in Latvia.
Peter Taoussanis
A big thanks to Clojurists Together, Nubank, lambdaschmiede, and other sponsors of my open source work! I realise that it’s a tough time for a lot of folks and businesses lately, and that sponsorships aren’t always easy 🙏
2024 May - Jun
Hi folks! 👋
The last couple months have been light on big-ticket releases. Have been focused on maintenance, support, and groundwork for future releases. Output included:
Nippy and Carmine security releases
If you haven’t yet, please do try to update to the latest versions of Nippy and/or Carmine when possible:
In short: Carmine uses Nippy for its serialization, and Nippy uses a Java compression library for its compression. Earlier releases of that Java library may be vulnerable when decompressing malicious data directly crafted by an attacker. The attack is believed to require arbitrary control of the data provided to Nippy for thawing.
Relevant posts were made to the Clojure subreddit, Clojurians Slack, and my X account.
Telemere
Work has continued on Telemere, my new structured logging and telemetry library for Clojure/Script.
There were numerous minor beta releases to address various issues that came up, and to polish sharp edges and documentation, etc.
Instead of detailing all that here, I’ll just point to the current release - v1.0.0-beta14. The latest beta release will always include a summary of all major recent changes.
I’m aiming to try out RC1 around the end of August, but won’t needlessly rush. I’d like the API to be completely stable after v1 final is out, so I’d rather go a bit slower now to get things right.
Big thanks to early adopters and testers for all the valuable feedback so far! 🙏
Carmine
Work has continued on Carmine v4. It’s quite an undertaking, but I’ve recently updated and merged the first parts of the new v4 core into mainline.
The current plan is for all the new stuff to live in a parallel taoensso.carmine-v4 namespace. This’ll make it easier for me to roll out the new work in stages, and get feedback from early adopters without negatively impacting existing users.
There’ll be a lot to say on Carmine v4, but that’ll come later.
Upcoming work
My current roadmap can always be found here, and it’s now also possible to vote to help guide my priorities.
Hopefully release the final stable version of Tempel - my new data security framework for Clojure. Before the final release I’m planning to investigate support for MFA, extend the docs re: use with OpenID, OWASP, and make a few other last improvements. Originally had this planned for earlier, but rescheduled so that I could prioritise the Nippy security topic.
When I first became interested in functional programming, a more experienced engineer told me: “you know functional programming doesn’t really amount to much more than procedural programming.” As I insisted on the benefits of map, filter and reduce, he simply shook his head. “You’re thinking in the small. Go look at a large real-world application.”
It took some time for me to see what he meant. My preferred language, Clojure, is a functional language.
More to the point: Where’s the point? Recently I had to dig into the BigDecimal implementation to fix a reported bug. Every time I have to look at the BigDecimal code, it is a journey of rediscovery. I’m going to write down a few things to save me some time in the future.
General Decimal Arithmetic Specification
All the implementations of BigDecimal that I looked at when I started working on this implemented some variation of the specification given in The General Decimal Arithmetic Specification. The specification is a bit … long. (74 pages.) And dense. But between the spec and the implementations, I got enough insight to proceed.
The GDAS has lots of added complexity. I was only trying to match the capabilities of java.math.BigDecimal. That implementation targeted a subset of the spec, more or less the X3.274 subset that is specified in an appendix of the GDAS. This subset shields us from some complexities: NaNs, infinite values, subnormal values, and negative zero. But I figured if it was good enough for Java, … .
What is a number?
The GDAS provides an abstract model for finite numbers. (I’m going to ignore the designated special cases.) A finite number is defined by three integer parameters:
sign: 0 for positive, 1 for negative
coefficient: an integer which is zero or positive
exponent: a signed integer
(There is a lot of discussion about allowed values for these parameters and interactions of limits and ranges. You’re welcome to have a go at it.)
The numerical value of a finite number is given by the formula:
value = (-1)^sign * coefficient * 10^exponent
In what follows, I’ll use the abstract representation for clarity. It will be notated as [sign,coeff,exp]. For convenience, I’ll often reduce this to [coeff,exp] and assume the coefficient is a signed integer value. For example, I might notate the
number 123.45 as [0, 12345, -2] or [12345, -2], depending on the context. There should be no confusion.
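To make the abstract model concrete, here is a quick sketch (in Python, purely for illustration) that evaluates a [sign, coeff, exp] triple exactly:

```python
from fractions import Fraction

def value(sign, coeff, exp):
    # value = (-1)^sign * coefficient * 10^exponent, computed exactly
    return (-1) ** sign * coeff * Fraction(10) ** exp

print(value(0, 12345, -2))  # 2469/20, i.e. 123.45
```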
The first thing to understand is this:
This abstract definition deliberately allows for multiple representations of values which are numerically equal but are visually distinct (such as 1 and 1.00). [GDAS, p. 10]
What is 1? 1.00? Simple. The former is [1, 0] and the latter is [100, -2]. They have the same value. They differ in precision: the former has a precision of one digit; the latter has a precision of three.
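You can see the same distinction in Python’s decimal module, which also follows the GDAS model:

```python
from decimal import Decimal

# 1 and 1.00 are numerically equal but have distinct representations
print(Decimal("1").as_tuple())     # sign=0, digits=(1,), exponent=0
print(Decimal("1.00").as_tuple())  # sign=0, digits=(1, 0, 0), exponent=-2
print(Decimal("1") == Decimal("1.00"))  # True
```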
Conversions
The GDAS defines specific algorithms for converting an abstract representation to a string and a string to an abstract representation. The algorithms are not complicated, but a little longer than we need to get into. Here are some examples from the GDAS:
but one finds it hard to imagine a situation where 29 digits of precision are really necessary. But, you do you.
In case you would like to limit the precision, the GDAS provides a context object which can be used to control the precision and other parameters affecting arithmetic operations. We need only these:
precision: “An integer which must be positive (greater than 0). This sets the maximum number of significant digits that can result from an arithmetic operation.” [GDAS, p. 13]
rounding: “A named value which indicates the algorithm to be used when rounding is necessary. Rounding is applied when a result coefficient has more significant digits than the value of precision; in this case the result coefficient is shortened to precision digits and may then be incremented by one (which may require a further shortening), depending on the rounding algorithm selected and the remaining digits of the original coefficient. The exponent is adjusted to compensate for any shortening.” [GDAS, p. 13]
There are five rounding ‘algorithms’ – usually called rounding modes – that must be implemented.
Again quoting from GDAS:
round-down: (Round toward 0; truncate.) The discarded digits are ignored; the result is unchanged.
round-half-up: If the discarded digits represent greater than or equal to half (0.5) of the value of a one in the next left position then the result coefficient should be incremented by 1 (rounded up). Otherwise the discarded digits are ignored.
round-half-even: If the discarded digits represent greater than half (0.5) the value of a one in the next left position then the result coefficient should be incremented by 1 (rounded up). If they represent less than half, then the result coefficient is not adjusted (that is, the discarded digits are ignored). Otherwise (they represent exactly half) the result coefficient is unaltered if its rightmost digit is even, or incremented by 1 (rounded up) if its rightmost digit is odd (to make an even digit).
round-ceiling: (Round toward +∞.) If all of the discarded digits are zero or if the sign is 1 the result is unchanged. Otherwise, the result coefficient should be incremented by 1 (rounded up).
round-floor: (Round toward -∞.) If all of the discarded digits are zero or if the sign is 0 the result is unchanged. Otherwise, the sign is 1 and the result coefficient should be incremented by 1.
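Python’s decimal module, also built on the GDAS, names these modes ROUND_DOWN, ROUND_HALF_UP, and so on, which makes it a handy sandbox for checking one’s understanding of the five modes:

```python
from decimal import (Decimal, Context, ROUND_DOWN, ROUND_HALF_UP,
                     ROUND_HALF_EVEN, ROUND_CEILING, ROUND_FLOOR)

x = Decimal("123.45")  # five significant digits

# round to four significant digits; the discarded digit is an exact half
for mode in (ROUND_DOWN, ROUND_HALF_UP, ROUND_HALF_EVEN,
             ROUND_CEILING, ROUND_FLOOR):
    print(mode, "->", Context(prec=4, rounding=mode).plus(x))
```

The half-even case keeps 123.4 because the rightmost retained digit is already even.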
In Clojure, the dynamic Var named *math-context* is used to hold the current context. The default value of nil indicates unbounded precision and no rounding mode. The Numbers suite of arithmetic operations will use this context to determine the precision and rounding mode for the operation. The context can be set using the with-precision macro. For example:
Note: if :rounding is not specified, the default is HalfUp.
Some operations require an explicit context, typically when the exact result of an operation cannot be represented in a finite number of digits. Division is the poster child.
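Python’s decimal shows the same thing: an exact 1/3 does not exist, so the context’s precision and rounding mode decide what you get (roughly what wrapping the division in Clojure’s with-precision does):

```python
from decimal import Decimal, Context, ROUND_HALF_UP

ctx = Context(prec=5, rounding=ROUND_HALF_UP)
print(ctx.divide(Decimal(1), Decimal(3)))  # 0.33333
print(ctx.divide(Decimal(2), Decimal(3)))  # 0.66667 (last digit rounded up)
```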
The GDAS provides algorithms for the basic arithmetic operations. Some of them are rather involved. In fact, in my original implementation in C#, I have comments specifically noting places where I felt compelled to “port while looking”, i.e., I pretty much just straight translated the code from the OpenJDK implementation.
We can look at one operation, addition, to get a feel for how arithmetic computations are done, especially with regard to how the context comes into play for limiting precision and rounding.
Paraphrasing the GDAS (which combines the description of addition and subtraction – I’m subtracting the subtraction part):
The coefficient of the result is computed by adding the aligned coefficients of the two operands.
The aligned coefficients are computed by comparing the exponents of the operands:
If the exponents are equal, the aligned coefficients are the same as the original coefficients.
Otherwise, the aligned coefficient of the number with the larger exponent is multiplied by 10^n, where n is the absolute difference between the exponents; the aligned coefficient of the other operand is the same as the original coefficient.
The result exponent is the minimum of the exponents of the two operands.
In other words, you do the equivalent of shifting to align the decimal points, except that we speak only of exponents, never of decimal points.
The result is then rounded to precision digits if necessary, counting from the most significant digit of the result.
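Ignoring precision limits for a moment, the whole algorithm fits in a few lines. A sketch, using plain Python tuples for the [coeff, exp] representation:

```python
def add(x, y):
    # add two numbers in [coeff, exp] form by aligning exponents first
    (xc, xe), (yc, ye) = x, y
    if xe > ye:
        xc *= 10 ** (xe - ye)  # scale the larger-exponent coefficient
    elif ye > xe:
        yc *= 10 ** (ye - xe)
    return (xc + yc, min(xe, ye))

print(add((12345, -2), (5, 0)))  # 123.45 + 5 -> (12845, -2)
print(add((1, 0), (100, -2)))    # 1 + 1.00 -> (200, -2), i.e. 2.00
```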
Now, this is where you are going to get into trouble. Precision is how many digits you want to keep. Rounding with a context does not make it easy to say: give me just an integer result.
Given these definitions for our friend in the previous example:
(Remember that a precision of 0 means no precision limit.)
What if you really want to round a result to get an integer?
See below. But first, let’s write some code to at least get us through addition. It is easiest to discuss the rounding operation concretely.
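As an aside, Python’s decimal, following the same spec, surfaces the round-to-integral operations directly, which is exactly the behaviour in question:

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR

print(Decimal("1234.567").to_integral_value(rounding=ROUND_CEILING))  # 1235
print(Decimal("1234.567").to_integral_value(rounding=ROUND_FLOOR))    # 1234
```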
It’s coding time
Let’s get a context type going first. We need an enum to cover the rounding modes.
type RoundingMode =
| Up
| Down
| Ceiling
| Floor
| HalfUp
| HalfDown
| HalfEven
| Unnecessary
The context is a record type.
[<Struct>]
type Context =
{ precision: uint32
roundingMode: RoundingMode }
// There are some standard contexts that can be used
/// Standard precision for 32-bit decimal
static member val Decimal32 =
{ precision = 7u
roundingMode = HalfEven }
/// Standard precision for 64-bit decimal
static member val Decimal64 =
{ precision = 16u
roundingMode = HalfEven }
static member val Unlimited =
{ precision = 0u
roundingMode = HalfUp }
/// Default mode
static member val Default =
{ precision = 9ul
roundingMode = HalfUp }
// And some factory methods
/// Create a Context with specified precision and roundingMode = HalfEven
static member ExtendedDefault precision =
{ precision = precision
roundingMode = HalfEven }
/// Create a Context from the given precision and rounding mode
static member Create(precision, roundingMode) =
{ precision = precision
roundingMode = roundingMode }
Now we can start implementing BigDecimal. For exposition purposes, I’ll present the code out of order. You’ll need to rearrange for an F# compilation to work.
We need three fields. The coefficient is a BigInteger and the exponent is an int. In addition, we lazily compute the precision of the number itself. We provide the precision in our private constructor. We supply 0, indicating not-yet-computed, much of the time, but if we know it at construction time we can supply it.
[<Sealed>]
type BigDecimal private (coeff, exp, precision) =
// Precision
// Constructor precision is shadowed with a mutable.
// Value of 0 indicates precision not computed
let mutable precision: uint = precision
// Compute actual precision and cache it.
member private _.GetPrecision() =
match precision with
| 0u -> precision <- Math.Max(ArithmeticHelpers.getBIPrecision (coeff), 1u)
| _ -> ()
precision
// Public properties related to precision
member this.Precision = this.GetPrecision()
member _.RawPrecision = precision
member this.IsPrecisionKnown = this.RawPrecision <> 0u
I’m going to skip that little helper method getBIPrecision for now. It deserves its own (short) post.
Now let’s take a look at addition. We’ll provide two versions, one taking a context and one not. If we don’t take a context, then there is no rounding involved. Just align the coefficients, add, and use the smaller exponent.
member this.Add(y: BigDecimal) =
let xc, yc, exp = BigDecimal.align this y in BigDecimal(xc + yc, exp, 0u)
We will define align to give us back the aligned coefficients and the smaller exponent from the two BigDecimals.
/// Return the aligned coefficients and the smaller exponent
static member private align (x: BigDecimal) (y: BigDecimal) =
if y.Exponent > x.Exponent then
(x.Coefficient, BigDecimal.computeAlign y x, x.Exponent)
elif x.Exponent > y.Exponent then
(BigDecimal.computeAlign x y, y.Coefficient, y.Exponent )
else
(x.Coefficient, y.Coefficient, y.Exponent)
The computeAlign function is simple. It just multiplies the coefficient of the larger exponent by 10 raised to the difference in exponents. The larger value is the first argument.
The biPowerOfTen function is a simple helper function to compute BigInteger powers of ten.
When contexts are involved, we have to deal with rounding:
member this.Add(y: BigDecimal, c: Context) =
let result = this.Add(y)
if c.precision = 0u
|| c.roundingMode = RoundingMode.Unnecessary then
// rounding not required
result
else
BigDecimal.round result c
Rounding is required if the precision of the result is greater than the precision of the context.
The precision of the result is just the number of digits in its BigInteger coefficient. Suppose we have the
BigDecimal value [123456789, -2] (= 1234567.89) and the context precision is 4. We need to reduce the coefficient to four digits, leaving us with either 1234 or 1235, depending on the rounding mode. We get the 1234 by dividing by a power of ten, the power being the difference in precision. Here the difference is 9 - 4 = 5, so we divide the coefficient by 100000 = 10^5. This yields 1234. Rounding up means adding 1, yielding 1235. Finally, to construct the result, we need the correct exponent. We divided by 100000, so we should multiply by the same amount; equivalently, increase the exponent by 5. The result is [1235, 3] or 1235000. In other words:
static member private round (v: BigDecimal) c =
let vp = v.GetPrecision()
if (vp <= c.precision) then
// No rounding required: precision is less than or equal to context precision
v
else
// Rounding required
let drop = vp - c.precision
let divisor = ArithmeticHelpers.biPowerOfTen (drop)
let rounded =
BigDecimal.roundingDivide2 v.Coefficient divisor c.roundingMode
// read below
let exp =
BigDecimal.checkExponentE ((int64 v.Exponent) + (int64 drop)) rounded.IsZero
// check for the case where we had a 9999... that rounded up to 10000...
if (ArithmeticHelpers.getBIPrecision(rounded) > c.precision && c.precision > 0u) then
let newCoeff = rounded / ArithmeticHelpers.biTen
BigDecimal(newCoeff, exp+1, c.precision)
else
BigDecimal(rounded, exp, c.precision)
The call to checkExponentE is to ensure that the exponent is not too large. Our exponents are limited to the range of int32. The increment of the exponent might cause an overflow. We do the arithmetic in int64. checkExponentE will make sure it is in bounds (and also checks if we have a zero result, for which we can just set the exponent to 1). If the exponent is too large, it throws an exception.
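The shape of that check is simple enough to sketch in Python (the function name and the clamping of the zero-coefficient case are my paraphrase of the description above, not the actual F# source):

```python
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1

def check_exponent(candidate, is_zero):
    # candidate was computed in wide arithmetic (int64 in the F# code)
    if INT32_MIN <= candidate <= INT32_MAX:
        return candidate
    if is_zero:
        # a zero coefficient is zero at any exponent, so clamp into range
        return INT32_MAX if candidate > INT32_MAX else INT32_MIN
    raise OverflowError(f"exponent {candidate} outside int32 range")

print(check_exponent(5, False))       # 5
print(check_exponent(2 ** 31, True))  # 2147483647
```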
The final conditional covers this warning in the GDAS:
When a result is rounded, the coefficient may become longer than the current precision.
In this case the least significant digit of the coefficient (it will be a zero) is removed
(reducing the precision by one), and the exponent is incremented by one.
An example: Context = (4, HalfUp). Number is [999996789, -2]. As in our previous example, we divide by 100000, yielding 9999. Rounding up gives 10000, and adding the drop of 5 to the exponent gives [10000, 3]. However, our precision is now 5, not 4. The code detects that the coefficient does not have the correct precision; in that case, we divide by 10 and increment the exponent. The result is [1000, 4].
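Both worked examples can be double-checked against Python’s decimal, which implements the same spec:

```python
from decimal import Decimal, Context, ROUND_HALF_UP

ctx = Context(prec=4, rounding=ROUND_HALF_UP)

# [123456789, -2] = 1234567.89 rounds to [1235, 3]
print(ctx.plus(Decimal("1234567.89")).as_tuple())  # digits (1, 2, 3, 5), exponent 3

# [999996789, -2] = 9999967.89 rounds up and renormalizes to [1000, 4]
print(ctx.plus(Decimal("9999967.89")).as_tuple())  # digits (1, 0, 0, 0), exponent 4
```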
Where’s my integer?
Dig around in the GDAS and you will note mention of functions round-to-integral-exact and round-to-integral-value. These are both defined as calls to the quantize function. The quantize text mentions that it used to be called rescale, with slightly different parameters. I decided to write Quantize and Rescale. (Some other implementations have a version of rescale – java.math.BigDecimal has a setScale method.)
The lhs is the value to be quantized or rescaled. The second argument provides the exponent. (Quantize takes the exponent from a BigDecimal value; Rescale gets the exponent directly.)
The GDAS has this to say about quantize:
[…] , quantize returns the number which is equal in value (except for any rounding) and sign to the first (left-hand) operand and which has an exponent set to be equal to the exponent of the second (right-hand) operand.
The coefficient of the result is derived from that of the left-hand operand. It may be rounded using the current rounding setting (if the exponent is being increased), multiplied by a positive power of ten (if the exponent is being decreased), or is unchanged (if the exponent is already equal to that of the right-hand operand).
Unlike other operations, if the length of the coefficient after the quantize operation would be greater than precision then an Invalid operation condition is raised. This guarantees that, unless there is an error condition, the exponent of the result of a quantize is always equal to that of the right-hand operand.
Let’s work through two cases, increasing vs. decreasing the exponent.
Suppose we call Rescale with [123456789, -4] (= 12345.6789) and desired exponent -1. Use mode HalfUp. We can guess what the result should be: 12345.7 = [123457, -1].
Now, we could figure out the code to do that directly but as it turns out, we can press round into service for us. We can view this as decreasing the precision. We just need to figure out what the new precision is. We can see it is 6 = 9 - 3 = the precision of the left-hand side minus the change in the exponent.
For the other direction, suppose we have [123456,-1] and we want to rescale to exponent -4. In other words, we go from 12345.6 to 12345.6000 = [123456000,-4]. No rounding will be needed. Just multiply by the appropriate power of 10.
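Again we can sanity-check both directions with Python’s decimal, whose quantize takes the target exponent from its right-hand operand just as the GDAS describes:

```python
from decimal import Decimal, ROUND_HALF_UP

# increasing the exponent (-4 -> -1): rounding may be needed
print(Decimal("12345.6789").quantize(Decimal("1E-1"), rounding=ROUND_HALF_UP))  # 12345.7

# decreasing the exponent (-1 -> -4): just multiply by a power of ten
print(Decimal("12345.6").quantize(Decimal("1E-4")))  # 12345.6000
```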
And that is essentially the code below. A few minor tweaks to compute the precision of the result (since we know it) and a recursive call to Rescale handle the 999… => 1000… problem.
static member Rescale(lhs: BigDecimal, newExponent, mode): BigDecimal =
let increaseExponent delta =
// delta negative => increasing the exponent => we might have to round to a new precision
let decrease = -delta |> uint
let p = lhs.Precision
let newPrecision = if p < decrease then 0u else p - decrease
let r =
lhs.Round
({ precision = newPrecision
roundingMode = mode })
if (r.Exponent = newExponent) then r else BigDecimal.Rescale(r, newExponent, mode)
let decreaseExponent delta =
// delta positive => decrease the exponent => multiply by 10^some power and don't underflow
let newCoeff =
lhs.Coefficient
* ArithmeticHelpers.biPowerOfTen (delta)
let oldPrec = lhs.Precision
let newPrec =
oldPrec + (if oldPrec = 0u then 0u else delta)
BigDecimal(newCoeff, newExponent, newPrec)
let delta =
BigDecimal.checkExponentE ((int64 lhs.Exponent) - (int64 newExponent)) false
if delta = 0 then lhs
elif lhs.Coefficient.IsZero then BigDecimal(BigInteger.Zero, newExponent, 0u)
elif delta < 0 then increaseExponent delta
else decreaseExponent (uint delta)
And I think that’s it for the exposition.
You can stick around for a little background, if you like.
How we got here
I was inspired to write this post by a report on the Clojurians Slack that there was a bug in converting a BigDecimal to an integer using rounding mode Ceiling. That got me digging into the BigDecimal code, re-reading the GDAS, etc. It was a simple fix, but it took me a long time to get my head into the game; I hadn’t done any substantive work on this code in 14 years.
I thought writing some things down might help future me (or a future maintainer) should I venture this way again.
While writing this post, I found a few things hard to explain/justify and ended up making some code tweaks that got rid of an allocation and simplified a recursive call. A nice side-effect of posting.
But this raises the deeper question of how we got here: Why did I implement BigDecimal?
When Clojure first appeared, it supported all the primitive numeric types of Java and also the java.math.BigInteger and java.math.BigDecimal classes: the Lisp reader supports literals of those types (123N and 123.45M, respectively); the Numbers suite of arithmetic operations supports them via casts, arithmetic contagion, etc.
This presented a bit of a problem when porting to the CLR. At that time, there were no standard packages for these types in the Base Class Library (BCL). If ClojureCLR was going to provide support for arbitrary precision integers and decimals, the choices seemed to be either find some libraries to include or write my own. I chose the latter approach, in part because there seemed to be a definite distaste for including third-party libraries in the Clojure ecosystem. Okay, also in part because I thought it would be fun.
So I looked around at some implementations of BigInteger packages, decided on what I wanted to provide – I needed to support at least the basic methods available in the Java version – and started coding. That was relatively straightforward. I could look at Microsoft.Scripting.Math.BigInteger (part of the IronPython project) and the java.math.BigInteger source code from OpenJDK. But mostly I relied on Donald Knuth’s The Art of Computer Programming, Volume 2. It was worth doing just for the excuse to read that book.
The coding was not terribly hard: We have a pretty intuitive feeling for integer arithmetic. If you can add two integers represented as sequences of digits in the range ‘0’ to ‘9’, how hard can it be to add two integers represented as sequences of ‘digits’ in the range 0 to UInt32.MaxValue? (We use arrays of uint32 to represent values.) See? You’re almost there.
BigDecimal was another game entirely. This is not floating-point, so no inspiration there. Even the System.Decimal class in the CLR is of no help in this world.
Fortunately, I found references to the GDAS early in the process.
I also looked at various implementations of BigDecimal in other languages. My primary sources were the OpenJDK, IronPython, and IronRuby implementations.
Each has its own approach. The OpenJDK implementation is similar to ours, except instead of the exponent, they have a scale field, which is the negative of the exponent. (Translating that is fraught. All the comparisons and adjustments are backwards.) They also have some efficiency hacks, such as a compact representation for when the coefficient is small enough to fit into a regular integer; I decided not to bother with the added complexity.
I’m not sure of the origin of the IronPython version – I think it comes from some older Python implementation. This package provides a much more complete implementation of the GDAS. It has more of the signals (Inexact, Subnormal, etc.). The representation is [exponent, coefficient, sign, isSpecial], and it implements NaNs, infinities, and other special values.
The following comment in the code indicated that I wasn’t going to get much help here:
# Note that the coefficient, self._int, is actually stored as
# a string rather than as a tuple of digits. This speeds up
# the "digits to integer" and "integer to digits" conversions
# that are used in almost every arithmetic operation on
# Decimals. This is an internal detail: the as_tuple function
# and the Decimal constructor still deal with tuples of
# digits.
The IronRuby implementation also has more of the spec: NaNs and infinity, the overflow exception modes. The representation for a finite value is [sign, fraction, exponent].
The fraction is of type Fraction, implemented in the package. The representation here is unusual. An array of uint is used for the value, but rather than using the whole range in each ‘digit’, each uint will be in the range of 0 to 999999999. This makes translating to and from strings much simpler. It might be fun to play with a BigInteger implementation using this representation.
For ClojureCLR.Next, I decided to ditch my own clojure.lang.BigInteger and use System.Numerics.BigInteger instead. Less to maintain. So the F# version of BigDecimal uses System.Numerics.BigInteger for the coefficients.
Last time on "Soundcljoud, or a young man's Soundcloud clonejure", I promised to clone Soundcloud, but then got bogged down in telling the story of my
life and never got around to the actual cloning part. 😬
To be fair to myself, I did do a bunch of
stuff to prepare for cloning, so now we can get to it with no further ado! (Skipping the ado bit is very out of character for me, I know. I'll just claim this parenthetical as my ado and thus fulfil your expectations of me as the most verbose writer in the Clojure community. You're welcome!)
I'll start by creating a player directory and dropping a bb.edn into it:
{:deps {io.github.babashka/sci.nrepl
{:git/sha "2f8a9ed2d39a1b09d2b4d34d95494b56468f4a23"}
io.github.babashka/http-server
{:git/sha "b38c1f16ad2c618adae2c3b102a5520c261a7dd3"}}
:tasks {http-server {:doc "Starts http server for serving static files"
:requires ([babashka.http-server :as http])
:task (do (http/serve {:port 1341 :dir "public"})
(println "Serving static assets at http://localhost:1341"))}
browser-nrepl {:doc "Start browser nREPL"
:requires ([sci.nrepl.browser-server :as bp])
:task (bp/start! {})}
-dev {:depends [http-server browser-nrepl]}
dev {:task (do (run '-dev {:parallel true})
(deref (promise)))}}}
In short, what's happening here is I'm setting up a Babashka project with a dev task that starts a webserver on port 1341 serving up the files in the public/ directory, starts an nREPL server on port 1339 that we can connect to with Emacs (or any inferior text editor of your choosing), and a websocket server on port 1340 that is connected to the nREPL server on one end and waiting for a ClojureScript app to connect to the other end.
Speaking of the public/ directory, I need a public/index.html file to serve up:
The index.html file loads three JavaScript scripts:
Scittle itself, which knows how to interpret ClojureScript scripts
The Scittle Promesa plugin, which provides some niceties for dealing with promises
The Scittle nREPL plugin, which will connect to that websocket server on port 1340 and complete the circuit that will allow us to REPL-drive our browser from Emacs (or the inferior text editor of your choosing)
Once this JavaScript is in place, index.html loads the soundcljoud.cljs ClojureScript file, which we'll come to in just a second.
All of this stuff is about using screen real estate effectively. The first chunk of CSS applies universally, but the bit inside this:
@media screen and (min-width: 900px) {
/* ... */
}
only applies to windows at least 900px wide. So our page defaults to a layout that's appropriate for phones (or really narrow browser windows), but then adjusts to move more content "above the fold" so you can probably see the entire UI without scrolling if you're viewing the page on a standard computer.
Now that we have all of the HTML and CSS plumbing in place, let's add a public/soundcljoud.cljs file to get started with some ClojureScripting:
(ns soundcljoud
(:require [promesa.core :as p]))
Firing up the REPL
Before we can start REPL-driving, we need to put the key in the ignition and give it a right twist! In other words, we open up a terminal in the top-level player/ directory and invoke Babashka:
: jmglov@alhana; bb dev
Serving static assets at http://localhost:1341
nREPL server started on port 1339...
Websocket server started on 1340...
This by itself is of course monumentally boring, so let's inject some excitement into our lives by jumping into soundcljoud.cljs and pressing C-c l C (cider-connect-cljs), selecting localhost, port 1339, and nbb for the REPL type (assuming you're in Emacs; if you're using some other editor, perform the incantations necessary to connect your ClojureScript REPL to localhost:1339).
If everything went according to plan, you should see something like this in your terminal window:
And something like this in your editor's REPL window:
;; Connected to nREPL server - nrepl://localhost:1339
;; CIDER 1.12.0 (Split)
;;
;; ClojureScript REPL type: nbb
;;
nil>
Let's prove that it works by evaluating the buffer with C-c C-k (cider-load-buffer), adding a Rich
comment, putting some ClojureScript in there that grabs our wrapper div, positioning our cursor at the end of the form, and evaluating that sucker with C-c C-v f c e (cider-pprint-eval-last-sexp-to-comment):
We've proven that we can evaluate ClojureScript code in the running browser process from our REPL buffer, which is nifty for sure, but our page still bores us, and the result of evaluating that code is pretty useless:
#object[HTMLDivElement [object HTMLDivElement]]
Let's actually do something with the div we've pulled down, and whilst we're at it, provide a useful way of logging stuff:
Fantastic! By using js/console.log (by the way, that js/ prefix is the way you instruct ClojureScript to do some JavaScript interop; it's basically saying "look for the next symbol in the top-level scope in JavaScript land"), we now get the fancy inspection tools in the browser's JavaScript console so we can expand parts of the object and drill down to see stuff we're interested in.
Now that we've established a baseline, we can get stuck in and do some real work. 💪🏻
Reading some RSS
Do you remember the MP3 files and RSS
feed we prepared in the previous blog post? Let's plop those down in our public/ directory so we can access them from the webapp we're slowly constructing:
: jmglov@alhana; mkdir -p 'public/Garth Brooks/Fresh Horses'
: jmglov@alhana; cp /tmp/soundcljoud.12524185230907219576/*.{rss,mp3} !$
: jmglov@alhana; ls -1 !$
album.rss
'Garth Brooks - Cowboys and Angels.mp3'
'Garth Brooks - Ireland.mp3'
"Garth Brooks - It's Midnight Cinderella.mp3"
"Garth Brooks - Rollin'.mp3"
"Garth Brooks - She's Every Woman.mp3"
"Garth Brooks - That Ol' Wind.mp3"
'Garth Brooks - The Beaches of Cheyenne.mp3'
'Garth Brooks - The Change.mp3'
'Garth Brooks - The Fever.mp3'
'Garth Brooks - The Old Stuff.mp3'
Now that our files are in place, let's see about loading the RSS feed from ClojureScript:
That looks quite familiar! That also looks like a bunch of text, which is not the nicest thing to extract data from. Luckily, that's a bunch of structured text, and more luckily, it's XML (XML is great, and don't let anyone tell you otherwise! And don't get me started on how we've reinvented XML but poorly with JSON Schema and all of this other nonsense we've built up around JSON because we realised that things like data validation are important when exchanging data between machines. 🤦🏼♂️), and most luckily of all, browsers know how to parse XML (which makes sense, as XML and HTML are close relatives):
Now that we know how to fetch and parse XML, let's see how to extract useful information from it. Looking at the log output, we can see that the parsed XML is of type #document, just like our good friend js/document (the current webpage that the browser is displaying). That's right, we have a Document Object Model, which means we can use all the tasty DOM functions we're used to, such as document.querySelector() to grab a node using a CSS selector.
Before we go any further, let's create some functions from this big blob of code. At the moment, we're complecting two things:
Extracting data from the XML DOM
Updating the HTML DOM to display the data
Let's do the functional programming thing and create a purely functional core and a mutable shell. Instead of extracting and updating, we'll create a function that transforms the XML DOM representation of an album into a ClojureScript representation:
Displaying the album title and cover art is all well and good, but in order to complete our Soundcloud clone, we need some way of actually listening to the music on the album. If you recall, our RSS feed contains a series of <item> tags representing the tracks:
What we need from each item in order to display and play the track is:
Song title
Artist (for this album, all tracks are from Garth, but an album could be a compilation of songs by different artists, so let's grab the artist in case we later decide to display it)
Track number
URL of the source audio
Let's write an aspirational function that assumes it will be called with a DOM element representing an <item> and transforms it into a ClojureScript map, just as we did for the item itself:
For the track number, we need to convert it to an integer, since the text content of an XML element is, well, text, and we'll want to sort our tracks numerically.
Now that we have a function to convert an <item> into a track, let's plug that into our ->album function to add a list of tracks to the album:
(defn ->album [xml]
{:title (xml-get xml "title")
:image (xml-get-attr xml "image" "href")
:tracks (->> (.querySelectorAll xml "item")
(map ->track)
(sort-by :number))})
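(If you'd like to poke at the same title-and-items transformation outside a browser, here's a rough Python parallel using ElementTree; the feed snippet is invented for illustration and much simpler than the real album.rss:)

```python
import xml.etree.ElementTree as ET

rss = """<rss><channel>
  <title>Fresh Horses</title>
  <item><title>The Old Stuff</title></item>
  <item><title>The Fever</title></item>
</channel></rss>"""

def parse_album(xml_text):
    # mirror of ->album: pull the album title and one entry per <item>
    channel = ET.fromstring(xml_text).find("channel")
    return {"title": channel.findtext("title"),
            "tracks": [item.findtext("title") for item in channel.findall("item")]}

print(parse_album(rss))  # {'title': 'Fresh Horses', 'tracks': ['The Old Stuff', 'The Fever']}
```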
OK, we have data representing a list of tracks, so we need to consider how we want to display it. If we cast our mind back to our HTML, we have a div where the tracks should go:
What we can do is create a <span> for each track, something like this:
<span>1. The Old Stuff</span>
Let's go ahead and write that function:
(defn track->span [{:keys [number artist title] :as track}]
(let [span (js/document.createElement "span")]
(set! (.-innerHTML span) (str number ". " title))
span))
(comment
(p/->> (load-album (str base-path "/album.rss"))
:tracks
first
track->span
(log "The first track is:"))
;; => #<Promise[~]>
)
In the JavaScript console, we see:
The first track is: <span>1. The Old Stuff</span>
This is cool, because the track->span function is still pure—there's no mutation occurring there. We have one and only one place where that's doing mutation, and that's display-album!, which is where we can hook into our functional core and display the tracks. In order to do that, we'll take our list of tracks, turn them into a list of <span> elements, and then set them as the children of the #tracks div.
This is fantastic... if all we want to do is know what's on an album. But of course my initial problem was wanting to listen to Garth and not having a way to do that. Now I have written much Clojure and ClojureScript, and still cannot listen to Garth. 🤔
Play it again, Sam
Of course what I do have is an HTML <audio> element and an MP3 file with a source URL, and I bet if I can just put these two things together, my ears will soon be filled with the sweet sweet sounds of 90s country music.
Let's start out with the simplest thing we can do, which is to activate the first track on the album once it's loaded. Since display-album! returns the album, we can just add some code to the end of the pipeline:
As soon as we evaluate this code, the <audio> element comes to life, displaying a duration and activating the play button. Pressing the play button, we do in fact hear some Garth! 🎉
However, our UX is quite poor, since there's no visual representation of which track is playing. We can fix this by emboldening the active track:
Speaking of UX, though, one would imagine that they'd be able to change to a track by clicking on it. At the moment, clicking does nothing, but that's easy enough to fix by adding an event handler to our span for each track that activates the track. Let's create a function and shovel our track activating code in there:
By the way, that clj->js function takes a ClojureScript data structure (in this case, our track map) and recursively transforms it into a JavaScript object so it can be printed nicely in the JS console.
OK, now that we have activate-track! as a function, we can use it in a click handler:
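Something along these lines (again assuming each track map holds its :span):

```clojure
;; Sketch: attach a click handler to each track's span so that
;; clicking a track activates it.
(doseq [track (:tracks album)]
  (.addEventListener (:span track) "click"
                     (fn [_event] (activate-track! track))))
```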
Evaluating this code activates the first track on the album as before, and then clicking another track highlights it in bold and loads it into the <audio> element. That's good, but what isn't so good is that the first track stays bold. 😬
Luckily, there's an easy fix for this. All we need to do is reset the weight of all the track spans before bolding the active one in activate-track!:
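Sketched out (assuming the track spans are the children of the #tracks div, per the display-album! description above):

```clojure
;; Sketch: reset every track span's weight before bolding the
;; active one, so only one track is ever bold.
(defn activate-track! [{:keys [span src] :as track}]
  (doseq [el (array-seq (.-children (get-el "#tracks")))]
    (set! (.. el -style -fontWeight) "normal"))
  (set! (.. span -style -fontWeight) "bold")
  (set! (.-src (get-el "audio")) src)
  track)
```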
Whilst we're ticking off UX issues, let's think about what should happen when our user clicks on a different track. At the moment, we load the track into the player and then the user has to click the play button to start listening to it. That is perfectly reasonable when first loading the album, but if I'm listening to a track and then select another one, I would kinda expect the new track to start playing automatically instead of me having to click play manually.
Let's see how we can do this. According to the HTMLMediaElement documentation, our <audio> element should have a paused attribute telling us whether playback is currently paused. Let's try it out:
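In the REPL, the check might look like this (a sketch):

```clojure
(comment
  (.-paused (get-el "audio"))
  ;; => true
  ;; Click ▶️ and...
  (.-paused (get-el "audio"))
  ;; => false
  )
```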
Excellent! Now let's see how we programmatically start playing a newly loaded track. Referring back to the documentation, we discover an HTMLMediaElement.play() method. Let's try that out:
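One way to sketch this, building on the assumed activate-track! from above: check the player's paused flag before loading the new track, and resume playback afterwards if it was playing.

```clojure
;; Sketch: if the player was playing, keep playing the new track.
;; The :span and :src keys remain assumptions.
(defn activate-track! [{:keys [span src] :as track}]
  (let [audio   (get-el "audio")
        paused? (.-paused audio)]
    (doseq [el (array-seq (.-children (get-el "#tracks")))]
      (set! (.. el -style -fontWeight) "normal"))
    (set! (.. span -style -fontWeight) "bold")
    (set! (.-src audio) src)
    (when-not paused?
      (.play audio))
    track))
```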
When the album loads, the first track is activated but doesn't start playing. Clicking on another track activates it but doesn't start playing it. However, if we click the play button and start listening to the active track, then click on another track, the new track is activated and immediately starts playing.
This, my friends, is some seriously good UX! Of course, we can improve it further.
Keep playing it, Sam
The next UX nit that we should pick is the fact that when a track ends, our poor user has to manually click on the next track and then manually click the play button just to keep listening to the album. This seems a bit mean of us, so let's see what we can do in order to be the nice people that we know we are, deep down inside.
Our good friend HTMLMediaElement has a bunch of
events that tell us useful things about what's happening with the media, and one of these events is ended:
Fired when playback stops when end of the media (<audio> or <video>) is reached or because no further data is available.
This seems like it will fit the bill quite nicely. Hopping back in our hammock for a minute, we think about what should happen when the end of a track is reached:
The next track is activated and starts playing, unless
It's the last track on the album, in which case nothing should happen.
We can of course add an ended event listener to the <audio> element every time a new track is activated, but this is problematic because we would then want to remove the previous event listener, and it turns out that removing event listeners is a bit complicated. What if we instead had an event listener that knew what track was currently playing, where that track comes in the album, and what track (if any) is next? Then we'd only have to attach a listener once, right after we load the album. Let's think through how we could do that.
So far, we've been relying on the state of the DOM to tell us things like if the track is paused. A much more functional approach would be to control the state ourselves using immutable data structures and so on. A nice side effect of this (sorry, Haskell folks, Clojurists are just fine with uncontrolled side effects) is that it actually makes REPL-driven development easier as well! 🤯
Let's start by extracting a function to handle the tedium of loading the album, displaying it, and then activating the first track:
(defn load-ui! [dir]
(p/->> (load-album (str dir "/album.rss"))
display-album!
:tracks
first
activate-track!))
Now that we have this, we'll define a top-level atom to hold the state, then update our load-ui! function to stuff the album into the atom once it's loaded:
(def state (atom nil))
(defn load-ui! [dir]
(p/->> (load-album (str dir "/album.rss"))
display-album!
(assoc {} :album)
(reset! state)
:album
:tracks
first
activate-track!))
What we're doing here is creating a map to hold the state, then assoc-ing the loaded album into the map under the :album key, then putting that map into the state atom with reset!, which returns the new value saved in the atom, which is the one we just put in there, which will look like this:
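Sketched out (the keys inside :album are my guess, based on the track maps we've seen so far), the value looks something like:

```clojure
;; Sketch of the state map's shape after reset!
{:album {:tracks [{:number 1, :title "The Old Stuff", :src "..."}
                  ;; ...the rest of the tracks
                  ]}}
```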
We'll then grab the album back out of the map and proceed as before to activate the first track. This is a little gross, but we'll clean it up as we go.
Oh yeah, and remember when I promised this would make debugging easier? Check this out:
That's right, we no longer have to rely on logging stuff to the JS console in our promise chains!
OK, but we haven't really changed anything other than making the load-ui! function more complicated. Let's add a little more to our state atom so we can actually tackle the problem of auto-advancing tracks. First, we'll add a :paused? key:
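Presumably something like this, threading the extra key through the same pipeline (a sketch):

```clojure
;; Sketch: seed the state map with :paused? true alongside :album.
(defn load-ui! [dir]
  (p/->> (load-album (str dir "/album.rss"))
         display-album!
         (assoc {:paused? true} :album)
         (reset! state)
         :album
         :tracks
         first
         activate-track!))
```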
Now let's add an event listener to the <audio> element that updates the state when the play button is pressed, doing a little cleanup of the load-ui! function whilst we're at it:
(defn load-ui! [dir]
(p/let [album (load-album (str dir "/album.rss"))]
(display-album! album)
(reset! state {:paused? true, :album album})
(->> album
:tracks
first
activate-track!)
(.addEventListener (get-el "audio") "play"
#(swap! state assoc :paused? false))))
(comment
(load-ui! "http://localhost:1341/Garth+Brooks/Fresh+Horses")
;; => #<Promise[~]>
(:paused? @state)
;; => true
;; Click the play button and...
(:paused? @state)
;; => false
)
If you're not familiar with swap!, it takes an atom and a function which will be called with the current value of the atom, then sets the next value of the atom to whatever the function returns, just like update does for plain old maps. And also just like update, it has a shorthand form so that instead of writing this:
(swap! state #(assoc % :paused? false))
you can write this:
(swap! state assoc :paused? false)
in which case swap! will treat the arg after the atom as a function which will be called with the current value first, then the rest of the args to swap!. You can imagine that swap! is written something like this:
(defn swap!
([atom f]
(reset! atom (f @atom)))
([atom f & args]
(reset! atom (apply f @atom args))))
It's obviously not written like that, even though that would technically probably maybe work. It's actually written like this:
(defn swap!
"Atomically swaps the value of atom to be:
(apply f current-value-of-atom args). Note that f may be called
multiple times, and thus should be free of side effects. Returns
the value that was swapped in."
{:added "1.0"
:static true}
([^clojure.lang.IAtom atom f] (.swap atom f))
([^clojure.lang.IAtom atom f x] (.swap atom f x))
([^clojure.lang.IAtom atom f x y] (.swap atom f x y))
([^clojure.lang.IAtom atom f x y & args] (.swap atom f x y args)))
But you get the point.
Aaaaaanyway, I seem to have digressed—which is firmly on brand for this blog, so I apologise for nothing!
But yeah, at this point, we're back to the functionality that we had before. If we click on a track whilst the player is paused, the new track is selected but doesn't start playing, and if we click on a new track whilst the player is playing, the player plays on by playing the new track. Got it?
However, activate-track! is still relying on the DOM to keep track of whether the player is paused. Let's fix this by checking the state atom instead:
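A sketch of both pieces: activate-track! reading :paused? from the atom, plus an advance-track! that finds the next track in the album. The :active-track key is my own bookkeeping invention, and the :span/:src keys on tracks remain assumptions.

```clojure
;; Sketch: drive playback from the state atom instead of the DOM.
(defn activate-track! [{:keys [span src] :as track}]
  (swap! state assoc :active-track track)
  (doseq [el (array-seq (.-children (get-el "#tracks")))]
    (set! (.. el -style -fontWeight) "normal"))
  (set! (.. span -style -fontWeight) "bold")
  (let [audio (get-el "audio")]
    (set! (.-src audio) src)
    (when-not (:paused? @state)
      (.play audio)))
  track)

;; Sketch: activate the next track, or do nothing on the last one.
(defn advance-track! []
  (let [{:keys [album active-track]} @state
        next-track (->> (:tracks album)
                        (drop-while #(not= % active-track))
                        second)]
    (when next-track
      (activate-track! next-track))))
```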
We'll also have seen the highlighted track change when we evaluated the (advance-track!) form! 🎉
Is this the end?
What we're building up to is of course the ability to play our album continuously. When one track ends, the next should begin. And our good friend <audio> has just what we need, in the form of the ended event. If we add one line of code to register advance-track! as the listener for the ended event:
(defn load-ui! [dir]
(p/let [album (load-album (str dir "/album.rss"))]
(display-album! album)
(reset! state {:paused? true, :album album})
(->> album
:tracks
first
activate-track!)
(.addEventListener (get-el "audio") "play"
#(swap! state assoc :paused? false))
(.addEventListener (get-el "audio") "ended"
advance-track!)))
(comment
(load-ui! "http://localhost:1341/Garth+Brooks/Fresh+Horses")
;; => #<Promise[~]>
;; Click ▶️ and witness the glory!
)
We win!
Winners who have won before and know how to win will of course know that the best thing to do after winning is to stride triumphantly to the podium, receive your 🥇, wave to your adoring public, soak up the applause like warm sunshine on a July day (unless you're in the southern hemisphere, in which case the warm sunshine is best appreciated in December, unless you're close enough to the equator to appreciate warm sunshine whenever you damn well please, unless you're too close to the equator and that sunshine is too warm to appreciate because you're sweating like wild), and then head home, find a comfy chair and open a bottle of champagne or fizzy water or tasty whiskey or whatever.
I, of course, am no such winner, so instead of retiring to my comfy chair with a glass of Lagavulin, I want to jump ahead in a track, so I confidently reach for the audio control and click ahead in the timeline, and... nothing happens. WTF?
Reading more documentation, I discover that I can see the current time in seconds in the track by reading its currentTime property, and I can seek to an arbitrary time by setting currentTime, so let's give that a try, shall we? (Spoiler: we shall.)
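The attempt looks something like this in the REPL (a sketch):

```clojure
(comment
  ;; Where are we in the track?
  (.-currentTime (get-el "audio"))
  ;; Try seeking a minute in...
  (set! (.-currentTime (get-el "audio")) 60)
  ;; ...and the playhead snaps straight back to 0. 😭
  )
```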
To make a long story short, this all boils down to how the browser actually implements seeking. When it first loads the audio track, it issues a request like this:
GET /Garth+Brooks/Fresh+Horses/Garth+Brooks+-+The+Old+Stuff.mp3 HTTP/1.1
Range: bytes=0-
It will then buffer the bytes it got back and make the track seekable within those bytes, as described here. You can peer under the hood by inspecting the buffered and seekable properties of the <audio> element:
The buffering looks fine, but it seems that we can only seek between 0 seconds and 0 seconds in the track, which kinda explains why attempting to set currentTime to any number that isn't 0 results in seeking back to 0. 😭
Seeking apparently only works if we get that blessed 206 Partial Content response from the webserver, so the browser knows how to make subsequent range requests to buffer more data, and unfortunately, the built-in babashka.http-server that we're using to serve up files in public/ responds like this:
HTTP/1.1 200 OK
Content-length: 5943424
Content-Type: audio/mpeg
Server: http-kit
No partial content?
We may attempt to fix this next time on "Soundcljoud, or a young man's Soundcloud clonejure", that is if there is a next time.
We’ve got several updates to share from our Q2 2024 project developers. Check out the latest in their June and July Reports following the project list below.
clj-merge tool: Kurt Harriger
This project focuses on developing a git diff and merge tool for edn and clojure code with the aim of creating a git mergetool that can be used as a replacement for git’s default merge tool for clj(s) and edn files.
Compojure-api: Ambrose Bonnaire-Sergeant
This project will deploy the first new releases since 2019 (including compojure-api 1.x, the 2.0.0-alpha branch, and ring-swagger), compojure-api/reitit migration tools, and Swagger 3.0.
Enjure: Janet A. Carr
This project focuses on MVP for the Enjure CLI tool and providing the ability to create new projects and view/controller templates as well as delete templates.
Jank: Jeaye Wilkerson
Jank's library parity with clojure.core is around 20%. The next step is to fill out the language to make it feel more like Clojure, including lazy sequences, loop/recur, destructuring, symbol interning, and the for and doseq macros.
Lost in Lambduhhs Podcast: L. Jordan Miller
Rejuvenate and streamline production of the Lost In Lambduhhs Podcast, where the audience gets the opportunity to “meet the person behind the Github” - illuminating the personal narratives and insights of tech luminaries, giving them a platform to share their perspectives while promoting their library or tool.
Clj-merge: Kurt Harriger
Q2 2024 Report No. 2. Published July 1, 2024
Introduction
This tool aims to reduce unnecessary conflicts due to whitespace and syntax peculiarities by using a more semantic approach to diffing and merging. I’m grateful for the support from ClojuristsTogether and the invaluable feedback and support from the Clojure community.
Recent Progress
This month, I focused on the following improvements:
Bug Fixes: Several bugs were fixed to enhance the stability of the tool.
CI/CD Pipeline: A CI/CD pipeline was added to streamline the installation process and prevent additional regressions.
Error Reporting: Simplified error reporting to make it easier for users to provide useful feedback when the tool does not work as expected.
Due to an exceptionally busy schedule, progress on diff visualization and project promotion was limited.
Milestones Overview
The project was structured around several key milestones:
Development of the MVP - Mostly complete
Enhancement of diff handling and presentation - Ongoing
Community engagement and feedback integration - Ongoing
Performance optimization and cross-platform compatibility - Done
Milestone Progress
Development of the MVP
Goals: To create a minimal viable product using editscript and rewrite-clj.
Recent Updates: Bug fixes were implemented to enhance the stability of the tool.
Status: Mostly complete. From a technical perspective, I have been able to test the feasibility of the implementation and learned a lot. It's hard to say when this is "done"; I don't quite feel ready to push adoption until more work has been done on the diff visualization.
Enhancement of Diff Handling and Presentation
Goals: To improve the readability and utility of diffs for developers.
Recent Updates: Limited progress on diff visualization due to time constraints.
Status: Much more work still needs to be done here.
Community Engagement and Feedback Integration
Goals: To actively engage with the community to gather detailed feedback and real-world merge conflict examples.
Recent Updates: Simplified error reporting to facilitate better feedback.
Next Steps: Increase efforts to engage the community and aim to present at a Clojure meetup in the near future.
Performance Optimization and Cross-Platform Compatibility
Goals: Simplify the installation process.
Recent Updates: A CI/CD pipeline was added to streamline the installation process.
Status: Done
Conclusion
Thank you for your support and contributions to the clj-mergetool project.
Compojure-api: Ambrose Bonnaire-Sergeant
Q2 2024 Report No. 3. Published July 8, 2024
Last month I successfully added support
for compojure-api 1.x coercions in the 2.x branch. This is one of the last steps towards backwards-compatibility of 1.x code using the 2.x branch.
I did not make any progress on this front this month, but the remaining steps are starting
to crystallize, which I will talk a bit about here.
Implementation-wise, a 1.x coercion is detected with fn?, and implies the Schema backend (Spec support was added in 2.x). Such a coercion is a nested function with the shape request->field->schema->coercer, often implemented like (constantly nil) or (constantly {:body matcher}).
Several insights during this 3-month project made backwards-compatibility particularly clean.
Once I grokked the main differences between 1.x and 2.x coercions at the end of month 2, I realized that
my efforts to restore support for ring-middleware-format were misguided. I could instead
translate 1.x coercions to muuntaja’s expected format.
There are two steps to this. The first was to add support for “legacy” coercions in 2.x’s coercion
abstraction. This involved changing the coerce-request implementation in compojure.api.coercion.schema.
The second step is to translate ring-middleware-format’s :format options to muuntaja’s :formats.
I have only done this for the default options, and currently any custom :format extensions do not work.
One wrinkle that is difficult to reconcile in all cases is that 2.x dropped implicit support for several coercion formats such as yaml. In order to maintain backwards compatibility with 1.x coercions, we want to ensure that yaml formats are supported by default for legacy coercions.
I have not completely solved this problem, but I identified that coercions are usually configured in terms of the api-defaults var, which is a map containing :format in 1.x and :formats in 2.x.
In both branches, I introduced a breaking change, renaming api-defaults to api-defaults-v1 and api-defaults-v2, using :format and :formats respectively. This might help decide whether to include yaml coercion by default, but requires more thought.
Finally, I removed any attempt to add ring-middleware-format support in the 2.x branch since I realized it was unnecessary.
recompojure
As part of this 3-month project, I am releasing recompojure, a library providing compojure-style macros that expand to reitit.
It does not have a stable release yet, but there are some interesting problems to solve.
I worked on recompojure for the first month of this project, but stopped after I ported it out of a corporate repo I prototyped it in. I realized that compojure-api has a different set of features than reitit, and decided that my time would be better served working on compojure-api itself.
One big difference between reitit and compojure-api is that reitit accepts a local configuration (opts) map, whereas compojure-api is extended using global state. To preserve this local configuration style, recompojure is structured as a macro-generating macro that is passed a top-level options map. An additional subtlety is that this options map is needed at compile time, when compojure-api does most of its work.
For example, the load-api call here defines all the compojure-api macros such as GET, POST, context, etc., but their extensions can (eventually) be centralized in the options var. This attempts to address a common concern when using compojure-api, where care must be taken to ensure extensions are loaded before any routing macros are expanded; this style should centralize extensions such that they are deterministic without further safeguards.
(ns com.recompojure.compojure-api1
"Exposes the API of compojure.api.core v1.1.13 but compiling to reitit."
(:require [com.recompojure.compojure-api1.impl :as impl]
[clojure.set :as set]))
(def ^:private options {:impl :compojure-api1})
(impl/load-api `options)
From here, I would like to add compojure-api 2.x support, and fully take advantage of the implicit options map
as described. The next big feature would be to reconcile compojure-api and reitit’s middleware support
so that compojure-api-style applications can easily be translated to reitit via recompojure. In particular,
most compojure-api apps use the api function to create an app, but recompojure does not yet support translating this to reitit.
Project Summary
This 3-month project had two main focuses.
The first half concentrated on performance of the 2.x branch of compojure-api and ensuring stable versions
of security fixes were deployed.
My main goal for the second half of this 3-month project was to ease future maintenance of compojure-api by
retiring the 1.x branch. That way, features need only be developed in the 2.x branch and can
still be enjoyed by 1.x users.
This was much more challenging and onerous than I anticipated, and I would not have
been able to invest time in this if Clojurists Together had not funded the project.
My main activity was attempting to understand and compare two versions of the same project and reverse-engineer
the evolution of the project.
I’d like to thank Clojurists Together for selecting this project for funding.
Enjure: Janet A. Carr
Q2 2024 Report. Published June 12, 2024
Progress has been good. I regularly stream Enjure development to my audience on Twitch, which seems to be bootstrapping Enjure's GitHub stars.
Despite the progress, I intentionally expanded the scope of the project by implementing an
HTTP router for Enjure. I’m not entirely sure that this was a wise decision, but
it adheres to Enjure’s guiding philosophy. Enjure’s HTTP router is implemented
with a Radix tree and supports path parameters, in pure Clojure. I’m hoping to come up with a scheme for query, body, path, and form coercion soon, but I
haven’t decided on a scheme I like. The router lives in a dynamic var managed
by Enjure and is updated by macros for defining pages and controllers.
Enjure has several macros enforcing a similar convention to define HTTP resources. The purpose is to bring together a resource's routes, contract, coercion, and handling expressions in a single namespace. Clojure web applications are often structured with several libraries; for example, consider an application using Reitit with Ring and next.jdbc with PostgreSQL. The application will likely have its routes in one namespace, its handlers in another, its business logic in a third, and its data modification language (DML) in yet another, necessitating opening several source files to accomplish a small task. Rarely do the routes, contracts, and handlers change, and if they do, it's from minute changes. Bringing these together cuts down on the cyclomatic complexity of developing web applications in Clojure. Thanks to homoiconicity, I can create constructs to help with exactly this:
;; Example from Enjure repo
(defpage user "/users/:user-id"
[req]
(let [{:keys [path-params]} req
{:keys [user-id]} path-params]
(format "<h1>Hello, %s</h1>" user-id)))
This "page" construct is simply a function var under the hood, but it also manages the routing-table var. There's no middleware required to update the routing table upon REPL reload; simply evaluating the buffer/namespace will change the routing table. defpage expects a string as its return value, as it's largely tied to the content type text/html. In the future I hope to have other, similar view constructs to support other popular application MIME types.
Similarly, there are actions, changes, and removals that correspond to the POST, PUT, and DELETE HTTP methods, respectively. Since Enjure places a high emphasis on convention, I'll only show a simple example of a sign-in action for a user:
(defaction signin "/signin"
[req]
(let [{:keys [email password]} (:form-params req)]
(if (check-db email password)
(redirect pages/home :see-other) ;; redirects to whatever route pages/home var has.
(pages/signin req) ;; this is a page var to render
)))
Since resources in Enjure are just function vars, they can be called directly and also reverse-routed to using some of the response macros. In the above example, if the sign-in check passes, redirect redirects to whatever route pages/home has declared. (Reverse-routing is largely for convenience and not mandatory; the redirect macro supports redirecting to static paths/URLs, Enjure resources like pages, and values from functions.)
Ideally, resources would interact with the database through a data model supported by the framework.
My ideas for this are still experimental and can be found in the repository under the internal “frm”
namespace. Currently, it’s some simple templating of basic queries by querying the information_schema in
Postgres, and interning the query functions as vars. These are queries I’ve seen regularly over the years,
and I'm sure that, once implemented, they will give developers a boost in productivity. Plus, there's the added benefit of being decoupled from whatever mechanism I choose for supporting migrations/entities in Enjure (still a TBD). However, this model does not alienate developers who opt to create more complex queries, as those are supported as well with next.jdbc.
Another idea I've been experimenting with is something I call the ReactiveRecord. ReactiveRecord uses software transactional memory to synchronize with the database, providing an in-memory DB representation, given the functional interface provided by FRM above and the information schema data. It might be interesting, but I do believe this kind of transacting might be a faux pas or even dangerous, so more thought is needed here on my part.
All of this will ideally be controlled with the Enjure CLI. Enjure puts a heavy emphasis on reducing
developer friction. Given a base installation of Clojure, installing Enjure should allow for the creation and
management of Enjure projects very easily. I’ll admit this is an area I’ve been slacking on a bit since I
wanted to finish the other core components first. As of writing this, the Enjure CLI has two basic commands: notes and help. notes searches a project for comments containing NOTES, FIXME, TODO, and HACK; help just prints out the help dialog. Soon enough the Enjure CLI will support creating and deleting resources, migrations, entities, dependencies, etc., as well as creating new projects. The Enjure CLI can already be installed to a user's path as a CLI utility written entirely in Clojure.
Finally, documentation of the project has become my lowest priority and definitely at risk. However, I’m not too concerned about the documentation faltering. In some sense, I’m a technical writer thanks to my blog, so I believe writing documentation for Enjure won’t be as challenging as the rest of the project.
Jank: Jeaye Wilkerson
Q2 2024 Report 3. Published June 30, 2024
Welcome back to another jank development update! For the past month, I’ve been
pushing jank closer to production readiness primarily by working on multimethods
and by debugging issues with Clang 19 (currently unreleased). Much love to
Clojurists Together and all of my
Github sponsors for their support this
quarter.
Multimethods
I thought, going into this month, that I had a good idea of how multimethods
work in Clojure. I figured we define a dispatch function with defmulti:
(defmulti sauce-suggestion ::noodle-type)
Then we define our catch-all method for handling types:
(defmethod sauce-suggestion :default [noodle]
(println "You can't go wrong with some butter and garlic."))
Then we define some specializations for certain values which come out of our
dispatch function.
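For instance, a specialization for shell-shaped noodles, consistent with the output of the calls that follow:

```clojure
;; A specialization for one particular dispatch value.
(defmethod sauce-suggestion ::shell [noodle]
  (println "Cheeeeeeeese!"))
```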
Then, when you call the sauce-suggestion function, first the dispatch
function is called and then the correct method is looked up and called.
(sauce-suggestion {::noodle-type ::shell})
Cheeeeeeeese!
(sauce-suggestion {::noodle-type ::spaghetti})
You can't go wrong with some butter and garlic.
This is as much as I knew. But wait, there’s more!
Hierarchies
It turns out that multimethods match dispatch values based on a couple of
different hierarchies, too. If you’re matching actual class types, like
String, you could have a method which is parameterized on Object and it will
be a catch-all. So this would allow you to match on everything which inherits
from IRenderable, for example, and then use that interface to render the
object. I wasn’t concerned about this, since jank’s object model isn’t based on
inheritance. I figured I could leave this whole feature out of multimethods.
However, it turns out that Clojure supports another form of hierarchies! Even
crazier, we have full control over those hierarchies at run-time and we can
build as many as we want. Check this out.
; We can classify spaghetti and penne as Italian.
; They will both be considered children of ::italian.
(derive ::spaghetti ::italian)
(derive ::penne ::italian)
; Then we can define a method based on the parent.
(defmethod sauce-suggestion ::italian [noodle]
(println "Sugo al pomodoro."))
; This allows us to match multiple dispatch values in a
; deterministic and intuitive way.
(sauce-suggestion {::noodle-type ::penne})
Sugo al pomodoro.
There are a handful of related core functions for working with these
hierarchies. jank now implements all of them.
make-hierarchy
isa?
parents
ancestors
descendants
derive
underive
As I was implementing multimethods, I needed a few more core functions, so those
were all implemented as well:
hash-set
disj
defmulti
alter-var-root
bound?
thread-bound?
Notably, this includes bound?, which required me to actually create a
dedicated unbound var object so I could distinguish between unbound vars and
vars holding nil.
Clang/LLVM 19
Most of my time this past month was not spent developing new features for jank,
which is why I only have multimethods and 13 new functions to report. Instead,
my time was spent trying to get jank ported over to the latest Clang/LLVM
version, which will allow us to leave Cling behind. jank uses these for JIT
compiling C++ code and upgrading to the upstream Clang will unlock huge
performance wins, make compiling jank easier, and will allow for jank to follow
the bleeding edge of the native JIT space. However, before we get there, we have a
couple of bugs to get past.
Extern templates
The first bug, which was causing JIT
linking issues, I reduced down to a simple test case involving an extern
template which is linked either in the current process or in a loaded shared
library. Clang will be unable to resolve the address of the definition of that
function. As it happens, the fmt library uses this pattern to provide some
optimized versions of certain templates. However, we can fortunately work around
this, since fmt wraps those definitions in a FMT_HEADER_ONLY preprocessor flag.
The relevant fmt source is here.
The process of narrowing this down from the entire jank runtime is cumbersome,
ruling out chunks of code at a time while still trying to keep things compiling
and correct.
Optimization crash
This is the blocking bug
preventing jank from switching to Clang. It only happens in release builds,
which also makes it harder to debug. This month, I traced the bug down from
a crash in jank all the way to a minimal test case involving assignments with an
implicit constructor. However, when testing whether or not the bug existed in
Clang 18, I found that it indeed did not. This meant that it’s since been
introduced in the yet unreleased Clang 19. So I bisected around 1300 commits,
each time requiring a fresh Clang/LLVM compilation and taking ~30m. It was an
entire day of all 32 cores on my machine being busy compiling, but fortunately I
could script all of the hard work just using some bash. Bisecting allowed me to
find the commit which introduced the issue. This has yet to be fixed and I don’t
have the expertise to know what’s wrong with that commit, but I’ve provided a
test case, pinged the relevant people, and now I’m hoping the real experts can
come in for the save.
Clang status
Aside from those two issues, only one of them being a blocker, the port to Clang
is ready. In debug builds (which avoid the second bug), jank can pass its full
test suite using Clang 19. Even better, some early benchmarking has shown that
Clang 19 is more than twice as fast as Cling when it comes to JIT compiling
large amounts of generated C++ code (such as all of clojure.core). That will
mean faster startup times and shorter REPL iteration loops.
What’s next?
Implementing multimethods identified a couple of issues related to certain
sequence types in jank which I’m still investigating. Once those are sorted,
I’ll continue working through the requirements to implement clojure.test,
which is why I was implementing multimethods in the first place. From there, I
can start testing my jank code using more jank code and the dogfooding cycle can
really begin. Stay tuned, folks!
Lost in Lambduhhs Podcast: L. Jordan Miller
Q2 2024 Report 2. Published June 27, 2024.
I have made continued progress on my new podcast series, thanks to the support from Clojurists Together. Here are the key milestones I’ve achieved since my last update and my plans moving forward:
Theme Music and Audio Engineering
Created Theme Music and Audio Engineering Template: Developed the theme music and an audio engineering template to ensure a consistent and professional sound for each episode.
Riverside.fm Proficiency
Learned Riverside.fm Editing Software: Gained proficiency in using Riverside.fm’s editing software and created a workflow for efficiently editing audio.
Episode Releases
Released Two Episodes:
David Nolen
Arne Brasseur
Guest Coordination and Diversity Efforts
Gender Diversity Challenge: I am striving to ensure gender diversity in my episodes, which has been challenging.
Reached Out to Prospective Guests: Contacted three prospective guests, with two having returned my communications.
Recent Challenges
Scheduling Conflicts: Faced scheduling conflicts due to a death in my family followed by getting sick with strep throat. I am now on day 8 of recovering from the sickness and have recordings scheduled for next week.
Next Steps
Continue Outreach: Continue to reach out to schedule recordings, ensuring a diverse lineup of guests.
Timely Editing and Release: Edit and release episodes in a timely manner, promoting on Clojurians Slack, Clojure Weekly updates, LinkedIn, and Twitter.
Expand Promotion Channels: Create a Mastodon account to help promote the podcast.
Conclusion
Despite recent challenges, I am on track with my project timeline and excited about the content I am creating. I will continue to provide updates as I progress further.
I spend a lot of time developing and teaching people about Clojure's open source tools for working with data. Almost everybody who wants to use Clojure for this kind of work is coming from another language ecosystem, usually R or Python. Together with Daniel Slutsky, I'm working on formalizing some of the common teachings into a course. Part of that is providing context for people coming from other ecosystems, including "translations" of how to accomplish data science tasks in Clojure.
As part of this development, I wanted to share an early preview in this blog post. The format is inspired by this great blog post I read a while ago comparing R and Polars side by side (where "R" here refers to the tidyverse, an opinionated collection of R libraries for data science, and realistically mostly dplyr specifically). I'm adding Pandas because it's among the most popular dataset manipulation libraries, and of course Clojure, specifically tablecloth, the primary data manipulation library in our ecosystem.
I'll use the same dataset as the original blog post, the Palmer Penguins dataset. For the sake of simplicity, I saved a copy of the dataset as a CSV file and made it available on this website. I will also refer to the data as a "dataset" throughout this post because that's what Clojure people call a tabular, column-major data structure, but it's the same thing that is variously referred to as a dataframe, data table, or just "data" in other languages. I'm also assuming you know how to install the packages required in the given ecosystems, but any necessary imports or requirements are included in the code snippets the first time they appear. Versions of all languages and libraries used in this post are listed at the end. Here we go!
Reading data
Reading data is straightforward in every language, but as a bonus we want to be able to indicate on the fly which values should be interpreted as "missing", whatever that means in the given libraries. In this dataset, the string "NA" means "missing", so we want to tell the dataset constructor this as soon as possible. Here's the comparison of how to accomplish that in various languages:
Note that tablecloth interprets the string "NA" as missing (nil, in Clojure) by default.
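tablecloth
The tablecloth version is not shown in this excerpt; a minimal sketch (assuming tc/dataset can read the CSV straight from the URL, which tablecloth supports):

```clojure
(require '[tablecloth.api :as tc])

;; "NA" is treated as missing (nil) by default, so no extra option is needed.
(def ds (tc/dataset "https://codewithkira.com/assets/penguins.csv"))
```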
R
In reality, in R you would get the dataset from the R package that contains the dataset. This is a fairly common practice in R. In order to compare apples to apples, though, here I'll show how to initialize the dataset from a remote CSV file, using the readr package's read_csv, which is part of the tidyverse:
library(tidyverse)
ds <- read_csv("https://codewithkira.com/assets/penguins.csv",
na = "NA")
Pandas
import pandas as pd
ds = pd.read_csv("https://codewithkira.com/assets/penguins.csv")
Note that pandas has a fairly long list of values it considers NaN already, so we don't need to specify what missing values look like in our case, since "NA" is already in that list.
Polars
import polars as pl
ds = pl.read_csv("https://codewithkira.com/assets/penguins.csv",
null_values="NA")
Basic commands to explore the dataset
The first thing people usually want to do with their dataset is see it and poke around a bit. Below is a comparison of how to accomplish basic data exploration tasks using each library.
Operation
tablecloth
dplyr
see first 10 rows
(tc/head ds 10)
head(ds, 10)
see all column names
(tc/column-names ds)
colnames(ds)
select column
(tc/select-columns ds "year")
select(ds, year)
select multiple columns
(tc/select-columns ds ["year" "sex"])
select(ds, year, sex)
select rows
(tc/select-rows ds #(> (% "year") 2008))
filter(ds, year > 2008)
sort column
(tc/order-by ds "year")
arrange(ds, year)
Operation
pandas
polars
see first n rows
ds.head(10)
ds.head(10)
see all column names
ds.columns
ds.columns
select column
ds[["year"]]
ds.select(pl.col("year"))
select multiple columns
ds[["year", "sex"]]
ds.select(pl.col("year", "sex"))
select rows
ds[ds["year"] > 2008]
ds.filter(pl.col("year") > 2008)
sort column
ds.sort_values("year")
ds.sort("year")
Note there are some differences in how the libraries sort missing values: tablecloth and polars place them at the beginning (at the top when a column is sorted in ascending order, last when descending), while dplyr and pandas place them last regardless of whether ascending or descending order is specified.
As you can see, these commands are all pretty similar, with the exception of selecting rows in tablecloth. This is a shorthand syntax for writing an anonymous function in Clojure, which is how rows are selected. Clojure is a functional language, so functions are "first-class", which basically just means they are passed around as arguments willy-nilly, all over the place, all the time. In this case, the second argument to tablecloth's select-rows function is a predicate (a function that returns a boolean) that takes as its argument a dataset row as a map of column names to values. Don't worry, though, tablecloth doesn't process your entire dataset row-wise. Under the hood, datasets are highly optimized to perform column-wise operations as fast as possible.
Here's an example of what it looks like to string a couple of these basic dataset exploration operations together, for example in this case to get the bill_length_mm of all penguins with body_mass_g below 3800:
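In tablecloth, the chain might look something like this (a sketch reconstructing the elided snippet; tc/drop-missing removes the rows whose filter column is nil):

```clojure
(require '[tablecloth.api :as tc])

(-> ds
    (tc/drop-missing "body_mass_g")               ;; nil can't be compared to numbers
    (tc/select-rows #(< (% "body_mass_g") 3800))
    (tc/select-columns "bill_length_mm"))
```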
Note that in tablecloth we have to explicitly omit rows where the value we're filtering by is missing, unlike in other libraries. This is because tablecloth actually uses nil (as opposed to a library-specific construct) to indicate a missing value, and in Clojure nil is not treated as comparable to numbers. If we were to try to compare nil to a number, we would get an exception telling us that we're trying to compare incomparable types. Clojure is fundamentally dynamically typed in that it only does type checking at runtime and bindings can refer to values of any type, but it is also strongly typed, as we see here, in the sense that it explicitly avoids implicit type coercion. For example, deciding whether 0 is greater or smaller than nil requires some assumptions, and these are intentionally not baked into the core of Clojure or into tablecloth as a library, as is the case in some other languages and libraries.
This example also introduces Clojure's "thread-first" macro. The -> arrow is like R's |> operator or the unix pipe, effectively passing the output of each function in the chain as input to the next. It comes in very handy for data processing code like this.
Here is the equivalent operation in the other libraries:
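For instance, in pandas the same chain might look like this (a sketch using a tiny inline stand-in for the penguins data; note that pandas excludes NaN rows from the `<` comparison automatically, since NaN compares as False):

```python
import pandas as pd

# Tiny stand-in for the penguins dataset (values are illustrative).
ds = pd.DataFrame({
    "bill_length_mm": [39.1, 46.5, 41.1],
    "body_mass_g":    [3750, 4550, 3200],
})

# Filter rows by body mass, then select a single column.
result = ds[ds["body_mass_g"] < 3800][["bill_length_mm"]]
print(result)
```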
Here is what some more complicated data wrangling looks like across the libraries.
Select all columns except for one
Library
Code
tablecloth
(tc/select-columns ds (complement #{"year"}))
dplyr
select(ds, -year)
pandas
ds.drop(columns=["year"])
polars
ds.select(pl.exclude("year"))
Another property of functional languages in general, and especially Clojure, is that they really take advantage of the fact that a lot of things are functions that you might not be used to treating like functions. They also leverage function composition to simply combine multiple functions into a single operation.
For example, a set (indicated with the #{} syntax in Clojure) is a special function that returns a truthy value indicating whether the given argument is a member of the set or not. And complement is a function in clojure.core that effectively inverts the function given to it, so combined, (complement #{"year"}) means "every value that is not in the set #{"year"}", which we can then use as our predicate column selector function to filter out certain columns.
Select columns whose names start with "bill"
polars
import polars.selectors as cs
ds.select(cs.starts_with("bill"))
Select only numeric columns
Library
Code
tablecloth
(tc/select-columns ds :type/numerical)
dplyr
select(ds, where(is.numeric))
pandas
ds.select_dtypes(include='number')
polars
ds.select(cs.numeric())
The symbol :type/numerical in Clojure here is a magic keyword that tablecloth knows about and can accept as a column selector. This list of magic keywords that tablecloth knows about is not (yet) documented anywhere, but it is available in the source code.
Note here we handle the missing values in the body_mass_g column differently than above, by specifying a default value for the map lookup. We're explicitly telling tablecloth to treat missing values as 0 in this case, which can then be compared to other numbers. This is probably the better way to handle this case, but the method above works, too, plus it gave me the opportunity to soapbox about Clojure types for a moment.
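That default-value variant might look like this (a sketch; the original snippet is not shown in this excerpt):

```clojure
;; (get row "body_mass_g" 0) returns 0 when the value is missing,
;; so the comparison always has two numbers to work with.
(-> ds
    (tc/select-rows #(< (get % "body_mass_g" 0) 3800))
    (tc/select-columns "bill_length_mm"))
```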
Adding columns based on some other existing columns
There are many reasons you might want to add columns, and often new columns are combinations of other ones. Here's how you'd generate a new column based on the values in some other columns in each library:
Note that this is where the wheels start to come off if you're not working in a functional way with immutable data structures. Clojure data structures (including tablecloth datasets) are immutable, which is not the case with Pandas. The Pandas code above mutates the dataset in place, so as soon as you do any mutating operations like these, you now have to keep mental track of the state of your dataset, which can quickly lead to high cognitive overhead and lots of incidental complexity.
Again beware, the Pandas implementation shown here mutates the dataset in place. Also manually specifying every column name transformation you want to do is one way to accomplish the task, but sometimes that can be tedious if you want to apply the same transformation to every column name, which is fairly common.
Transforming column names
Here's how you would upper case all column names:
Library
Code
tablecloth
(tc/rename-columns ds :all str/upper-case)
dplyr
rename_with(ds, toupper)
pandas
ds.columns = ds.columns.str.upper()
polars
ds.select(pl.all().name.to_uppercase())
Like the other libraries, tablecloth's rename-columns accepts both types of arguments – a simple mapping of old -> new column names, or any column selector and any transformation function. For example, removing the units from each column name would look like this in each language:
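For tablecloth, the unit-stripping rename might look like this (a sketch; the regex mirrors the pandas version shown for this task):

```clojure
(require '[clojure.string :as str])

;; :all selects every column; the function transforms each name.
(tc/rename-columns ds :all #(str/replace % #"_(mm|g)$" ""))
```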
pandas
import re
ds.rename(columns=lambda x: re.sub(r"(.+)_(mm|g)$", r"\1", x))
polars
ds = ds.rename({ col: col.replace("_mm", "").replace("_g", "") for col in ds.columns })
Grouping and aggregating
Grouping behaves somewhat unconventionally in tablecloth. Datasets can be grouped by a single column name or a sequence of column names like in other libraries, but grouping can also be done using any arbitrary function. Grouping in tablecloth also returns a new dataset, similar to dplyr, rather than an abstract intermediate object (as in pandas and polars). Grouped datasets have three columns: the name of the group, the group id, and a column containing a new dataset of the grouped data. Once a dataset is grouped, the group values can be aggregated in a variety of ways. Here are a few examples, with comparisons between libraries:
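As one concrete sketch (the column and helper names are assumptions, not the post's actual example), grouping by species and averaging body mass might look like:

```clojure
(require '[tech.v3.datatype.functional :as dfn])

(-> ds
    (tc/group-by "species")
    ;; aggregate applies each function to every group's sub-dataset
    (tc/aggregate {:mean-body-mass #(dfn/mean (% "body_mass_g"))}))
```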
As you can see, all of these libraries are perfectly suitable for accomplishing common data manipulation tasks. Choosing a language and library can impact code readability, maintainability, and performance, though, so understanding the differences between available toolkits can help us make better choices.
Clojure's tablecloth emphasizes functional programming concepts and immutability, which can lead to more predictable and re-usable code, at the cost of adopting a potentially new paradigm. Hopefully this comparison serves not only as a translation guide, but as an intro to the different philosophies underpinning these common data science tools.
Thanks for reading :)
Versions
The code in this post works with the following language and library versions:
Clojure macros have two modes: avoid them at all costs/do very basic stuff, or go absolutely crazy.
Here’s the problem: I’m working on Humble UI’s component library, and I wanted to document it. While at it, I figured it could serve as an integration test as well—since I showcase every possible option, why not test it at the same time?
This is what I came up with: I write component code, and in the application, I show a table with the running code on the left and the source on the right:
It was important that code that I show is exactly the same code that I run (otherwise it wouldn’t be a very good test). Like a quine: hey program! Show us your source code!
This macro accepts code AST and emits a pair of AST (basically a no-op) back and a string that we serialize that AST to.
This is what I consider to be a “normal” macro usage. Nothing fancy, just another day at the office.
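The macro itself isn't shown in this excerpt; its shape is roughly this (a sketch, not Humble UI's actual code):

```clojure
(require '[clojure.pprint :as pprint])

(defmacro example
  "Expands to a pair: the form itself (evaluated at runtime) and its
   source, pretty-printed to a string at macroexpansion time."
  [form]
  [form (with-out-str (pprint/pprint form))])
```

Note that the string here comes from pretty-printing the already-parsed form, which is exactly where the original formatting gets lost.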
Unfortunately, this approach reformats code: while in the macro, all we have is an already parsed AST (data structures only, no whitespaces) and we have to pretty-print it from scratch, adding indents and newlines.
I tried a couple of existing formatters (clojure.pprint, zprint, cljfmt) but wasn’t happy with any of them. The problem is tricky—sometimes a vector is just a vector, but sometimes it’s a UI component and shows the structure of the UI.
And then I realized that I was thinking inside the box all the time. We already have the perfect formatting—it’s in the source file!
So what if... No, no, it’s too brittle. We shouldn’t even think about it... But what if...
What if our macro read the source file?
Like, actually went to the file system, opened a file, and read its content? We already have the file name conveniently stored in *file*, and luckily Clojure keeps sources around.
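A sketch of the idea (names are assumptions; it relies on the source file being available on the classpath and on the :line metadata the reader attaches to forms):

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defmacro example
  "Pairs a form with the raw text of the source line it starts on."
  [form]
  (let [lines (-> *file* io/resource slurp str/split-lines)
        line  (:line (meta &form))]
    [form (nth lines (dec line))]))
```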
In any other language, this would've been a project. You'd need a parser, a build step... Here it's just ten lines of code, in the vanilla language, no tooling or setup required.
Sometimes, a crazy thing is exactly what you need.
Do you like going to the cinema? I do. But I also like to know where I am going and which movie I am going to see. But how do you choose?
You can’t go to the cinema’s website. There are just too many. Of course, you might have a favorite one and always go to it, but you won’t know what you are missing out.
Then, there are aggregators. The idea is good: gather everything that’s playing in cinemas right now in one place. Flight aggregators, but for movies.
Implementation, unfortunately, is not that good. As with any other website, the aggregator’s goal is to make you go through as many web pages as possible, do as many clicks as possible, and show you as many ads as possible.
Please use an ad blocker, this is unbearable
They even play a freaking TV ad in place of a movie trailer!
It’s a website that shows every movie screening in every cinema across all of Germany.
And when I say EVERY screening, I mean it:
Every screening, every cinema, every movie. All in one long HTML table.
What else can it do?
Just filter. You can filter:
by city,
by city district (don’t want to travel too far),
by a particular cinema (maybe you have a favorite one),
by genre (want to see something with your kid but don’t know what),
or by movie (which cities does it still play?).
That’s it. That’s the site.
Oh, we also have a list of premieres so you would know what’s coming. But that’s it.
What about the interface?
There isn’t one. I mean, there is, of course, but I tried to make it as invisible as possible. There’s no logo. No menu. No footer. No pagination. No “See more”. No cookie banners (because no cookies). No ChatGPT/SEO generated bullshit. No ads, of course.
Why? Because people don’t care about that stuff. They care about function. And our UI is a pure function.
But how do I search?
Well, Ctrl+F, of course. We are too humble, too lazy, and too smart to try to compete with in-browser implementation.
Wait, what about page size?
It’s totally fine. I mean, for Berlin, for example, we serve 1.4 MB of HTML. 3 MB with posters. It’s fine.
Slack loads 50 MB (yes, MEGA bytes) to show you a list of 10 chats. AirBnB loads 15 MB, including 500 KB HTML, just to show 20 images. LinkedIn loads 1.5 MB of just HTML (37 MB total) for a fraction of the data we’re showing. So we are fine.
It’s kind of refreshing, actually, the speed you get from a table with a thousand rows. It sounds like a lot, but it still feels faster than anything on the modern web.
What about mobile?
That is a good question. I am still thinking about it.
The table trick won’t work on mobile. So layout needs to be different, but I also want it to have the same information density as the desktop, which is tricky.
If you just make the table vertical, it’ll be too much to scroll even for people with the strongest fingers. Maybe I’ll figure something out one day.
When I looked at the data, I realized it’s multidimensional: there are movies, they have genres, years, countries, languages, there are cinemas, which are located in districts, which are located in cities, then there are showings, which have day and time, and very possibly something else will come up later, too.
Now, I had no idea how that data would be accessed. Is the cinema part of the movie or is the movie part of the cinema? So I decided to make it all flat and put it into the database.
And it worked! It worked remarkably well. Now I can take advantage of DataScript queries being data and build them on the fly:
(defn search [{:keys [city cinema district movie genre]}]
(let [inputs
(cond-> [['$ db]]
city (conj ['?city city])
cinema (conj ['?cinema cinema])
district (conj ['?district district])
movie (conj ['?movie movie])
genre (conj ['?genre genre]))
where
(cond-> [:where]
city (conj '(or
[?cinema :cinema/city ?city]
[?cinema :cinema/area ?city]))
cinema (conj '[?cinema :cinema/title ?cinema-title])
district (conj '[?cinema :cinema/district ?district])
movie (conj '[?movie :movie/title ?movie-title])
genre (conj '[?movie :movie/genre ?genre]))]
(apply ds/q
(concat
'[:find ?show ?date ?time ?url ?cinema ?version ?movie
:keys id date time url cinema version movie
:in]
(map first inputs)
where
'[[?show :show/cinema ?cinema]
[?show :show/date ?date]
[?show :show/time ?time]
[?show :show/url ?url]
[?show :show/movie-version ?version]
[?version :movie-version/movie ?movie]])
(map second inputs))))
The whole database is around 11 MB, basically nothing. I don’t even bother with proper storage, I just serialize the whole thing to a single JSON file every time it updates.
The hosting
I have been building websites for a while. I have two (Grumpy and this blog) running right now on my own server. I already spent my time, I have figured this all out. I have all the templates at my fingertips.
Garden is a hosting service for small Clojure web apps (still in private beta) that’s supposed to take care of insignificant details for you and let you focus on your app first and foremost.
And it works! It’s refreshingly simple: you download a single binary that operates as a command-line tool, create a garden.edn file with your project’s name, and call garden deploy. That’s it! Your app is live!
No, seriously. You tend to forget how many annoying small details there are before other people can use your app. But when something like Garden takes them away, you remember and get blown away again! If that’s what Heroku used to feel like back in the day, I’m all in for it.
The beauty of Garden is that it helps you start fast, but it’s not a toy. It easily scales all the way up to production. Custom domain, HTTPS, auth, cron, logs, persistent storage: they take care of all of this for you.
And a cherry on top: they even provide nREPL to production! Again, no setup, just garden repl and you are in! Perfect for debugging weird performance issues or running one-off jobs.
An example: when I implemented premieres and committed the code, I still needed to run it for the first time. Instead of making a special flag or endpoint or adding and then immediately removing the startup code, I just connected to remote nREPL and invoked the function in the code. It doesn’t get easier than that!
Uncharacteristic of me, but I kind of enjoy building web apps again, when it’s that simple. Might build more in the future.
Conclusion
In the beginning, I wanted a simple website that solved my problem. I wanted a website that I’d enjoy using.
But I don’t want to make a product out of it. We have enough products already. It’s time someone took a user’s side. And I am one of the users.
Magic things happen when you trust your users and just show them everything you’ve got.
For example, I found some rare films playing that I had no idea about. Matrix in German (!), but once a week and only in one cinema. Or Mars Express, they play it in three cities only, excluding mine. How do you find out about stuff like this?
Here, I just discovered them. You look at the data and you start seeing stuff that is otherwise completely invisible.
Anyway, enjoy. If this becomes a trend, I’m all in for it. Wouldn’t mind seeing more sites like this in the future.
Most Clojurians write only good things about Clojure. I decided to start
sharing techniques and patterns that I consider bad practice. Unfortunately, we
still have plenty of them in Clojure projects.
My first candidate is a widely used macro called with-retry:
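The macro's definition is not shown here; a typical shape looks something like this (a sketch, not any particular library's implementation):

```clojure
(defmacro with-retry
  "Runs body; on an exception, sleeps `timeout` ms and tries again,
   up to `attempts` extra times before rethrowing."
  [{:keys [attempts timeout]} & body]
  `(loop [n# ~attempts]
     (let [result# (try
                     {:ok (do ~@body)}
                     (catch Exception e#
                       (when-not (pos? n#)
                         (throw e#))
                       {:error e#}))]
       (if (contains? result# :ok)
         (:ok result#)
         (do (Thread/sleep ~timeout)
             (recur (dec n#)))))))
```

Usage would be along the lines of (with-retry {:attempts 3 :timeout 1000} (fetch-the-thing)), where fetch-the-thing is a placeholder.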
A common alternative acts the same but accepts a function rather than arbitrary
code. A form can easily be turned into a function by putting a sharp sign in
front of it. After all, it looks almost the same:
Although retrying is considered a good practice, here is the outcome of using it
in production.
Practice proves that, even if you wrap something into that macro, you cannot
recover from a failure anyway. Imagine you’re downloading a file from S3 and
pass wrong credentials. You cannot recover no matter how many times you
retry. Wrong creds remain wrong forever. Now there is a missing file: again, no
matter how hard you retry, it’s all in vain and you only waste resources. Should
you put a file into S3, and submit wrong headers, it’s the same. If your network
is misconfigured or some resources are blocked, or you have no permissions, it’s
the same again: no matter how long you have been trying, it’s useless.
There might be dozens of reasons when your request fails, and there is no way to
recover. Instead of invoking a resource again and again, you must investigate
what went wrong.
There might be some rare cases which are worth retrying though. One of them is
an IOException caused by a network blink. But in fact, modern HTTP clients
already handle it for you. If you GET a resource and receive an IOException,
most likely your client has already done three attempts silently with growing
timeouts. By wrapping the call in with-retry, you perform 9 attempts or so under
the hood.
Another case might be a 429 error code, which stands for rate limiting on the
server side. Personally, I don’t think a slight delay will help. Most likely
you need to bump the limits, rotate API keys, and so on, but not Thread.sleep in
the middle of your code.
I’ve seen terrible usage of with-retry macro across various projects. One
developer specified 10 attempts with 10 seconds timeout to reach a remote API
for sure. But he was calling the wrong API handler in fact.
Another developer put two nested with-retry forms. They belonged to different
functions and thus could not be visible at once. I’m reproducing a simplified
version:
According to math, 4 times 3 is 12. When the (do-something-else) function
failed, the whole top-level block started again. It led to 12 executions in
total with terrible side effects and logs which I could not investigate.
One more case: a developer wrapped a chunk of logic that inserted something into
the database. He messed up with foreign keys so the records could not be
stored. Postgres replied with an error “foreign key constraint violation” yet
the macro tried to store them three times before failing completely. Three
broken SQL invocations… for what? Why?
So: whenever you use with-retry, most likely it’s a bad sign. Most often you
cannot recover from a failure, whether you are adding two numbers, uploading a
file, or writing to a database. You should only retry in certain situations like
IOException or rate limiting, and even those cases are questionable and might
be mitigated without retrying.
Next time you’re going to wrap a block of logic in with-retry, think hard about
whether you really need to retry. Will it really help in the case of wrong
creds, a missing file, an incorrect signature, or similar things? Perhaps not.
Thus, don’t retry in vain. Just fail and write detailed logs. Then find the real
problem, fix it, and let it never happen again.
Last week two new language bindings were added to the YAMLScript family:
Go and
Julia.
Go
The Go binding has been a long time coming.
Several people have been working on it this year but it was Andrew Pam who finally got it over the finish line.
Go is a big user of the YAML data language, so we're happy to be able to provide
this library and hope to see it used in many Go projects.
Julia
The Julia binding was a bit more of a recent surprise
addition.
A few weeks ago a Julia hacker dropped by the YAML Chat Room to ask some questions about YAML.
I ended up asking him more about Julia and if he could help write a YAMLScript
binding.
He invited Kenta Murata to the chat room and Kenta
said he could do it for us.
Then Kenta disappeared for a few weeks.
Last week he came back with a fully working Julia binding for YAMLScript!
Fun fact: Julia is Clark Evans's favorite
programming language!
Clark is one of the original authors of the YAML data language.
YAMLScript Loader Libraries
These YAMLScript language bindings are intended to be an alternative YAML loader
library for the respective languages.
They can load normal existing YAML files in a consistent way, with a common API
across all languages.
They can also load YAML files with embedded YAMLScript code, to achieve data
importing, transformation, interpolation; anything a programming language can
do.
The current list of YAMLScript loader libraries is:
If your language is missing a YAMLScript binding or you want to help improve
one, please drop by the YAMLScript Chat Room and we'll get you started.
All of the bindings are part of the YAMLScript Mono-Repo on GitHub.
If you look at the existing bindings, you'll see that they are all quite small.
You'll need to learn about basic FFI (Foreign Function Interface) for your
language, to make calls to the YAMLScript shared library libyamlscript, but
that's about it.
It's a great way to get started with a new language project.
Some Future Plans
There's a lot of upcoming work planned for YAMLScript.
I've mapped some of it out in the YAMLScript Roadmap.
Currently YAMLScript (written in Clojure, which compiles
to JVM bytecode, which…) compiles to a native binary interpreter using the
GraalVM native-image compiler.
This is great for performance and distribution, but it's not great for
portability, limiting it to Linux, MacOS and Windows.
The JVM is a great platform for portability, so we're planning to make a JVM
version of the ys YAMLScript interpreter.
Of course, having YAMLScript available as a JVM language is also a good thing
for Linux, MacOS and Windows users.
We also want to make WebAssembly, JavaScript and C++
versions of the YAMLScript interpreter.
And of course we still want to get to our goal of 42 language bindings!!!