Let's write a simple microservice in Clojure

Initially, this post was published here: https://www.linkedin.com/pulse/lets-write-simple-microservice-clojure-andrew-panfilov-2ghqe/

Intro

This article will explain how to write a simple service in Clojure. The sweet spot of building applications in Clojure is that you can expressively use the entire rich Java ecosystem. Less code, less boilerplate: it is possible to achieve more with less. In this example, most of the libraries I use come from the Java world; everything else is a thin Clojure wrapper around Java libraries.

From a business logic standpoint, the microservice calculates math expressions and stores the history of such calculations in the database (there are two HTTP endpoints for that).

Github repository with source code: https://github.com/dzer6/calc

This educational microservice project will provide the following:

  1. Swagger descriptor for the REST API with a nice Swagger UI console. Nowadays, it is the de facto standard. Microservices should be accessible via HTTP and operate with data in a human-readable JSON format. As a bonus, it is super easy to generate data types and API client code for the client side (it works well for a TypeScript-based front-end, for example).
  2. Postgres-based persistence with a pretty straightforward mapping of SQL queries to Clojure functions. If you have ever used Java with Hibernate ORM for data persistence, you will feel relief after working with the database in Clojure with HugSQL. The model of the persistence layer is much simpler and easier to understand, with no need for a Session Cache, Application-Level Cache or Query Cache. Debugging is straightforward, as opposed to the nightmare of debugging asynchronously issued SQL that never fires where you expect it. It is such an incredible experience to see the query invocation result as just a sequence of plain Clojure maps instead of a bag of Java entity proxies.
  3. REPL-friendly development setup. DX (dev experience) might not be best in class, but it is definitely not bad. Whenever you want to change or add something to the codebase, you start a REPL session in an IDE (in my case, Cursive / IntelliJ IDEA). You can run code snippets to print their results, change the codebase, and reload the application. In addition, you can selectively run the tests you need. You do not need to restart the JVM instance every time the codebase changes (the JVM is famous for its slow start time). Using the mount library, all stateful resources shut down and initialize correctly on every reload.

Leiningen

The project.clj file is a configuration file for Leiningen, a build automation and dependency management tool for Clojure. It specifies the project's metadata, dependencies, paths, and other settings necessary for building the project. Let's break down the libraries listed in the project.clj file into two groups: pure Java libraries and Clojure libraries, and describe each.

Clojure Libraries:

  1. org.clojure/clojure: The Clojure language itself.
  2. org.clojure/core.memoize: Provides memoization capabilities to cache the results of expensive functions.
  3. org.clojure/tools.logging: A simple logging abstraction that allows different logging implementations.
  4. mount: A library for managing state in Clojure applications.
  5. camel-snake-kebab: A library for converting strings (and keywords) between different case formats.
  6. prismatic/schema: A library for structuring and validating Clojure data.
  7. metosin/schema-tools: Utilities for Prismatic Schema.
  8. clj-time: A date and time library for Clojure.
  9. clj-fuzzy: A library for fuzzy matching and string comparison.
  10. slingshot: Provides enhanced try/catch capabilities in Clojure.
  11. ring: A Clojure web applications library.
  12. metosin/compojure-api: A library for building REST APIs with Swagger support.
  13. cprop: A configuration library for Clojure.
  14. com.taoensso/encore: A utility library providing additional Clojure and Java interop facilities.
  15. com.zaxxer/HikariCP: A high-performance JDBC connection pooling library.
  16. com.github.seancorfield/next.jdbc: A modern, idiomatic JDBC library for Clojure.
  17. com.layerware/hugsql-core: A library for defining SQL in Clojure applications.
  18. metosin/jsonista: A fast JSON encoding and decoding library for Clojure.

Pure Java Libraries:

  1. ch.qos.logback: A logging framework.
  2. org.codehaus.janino: A compiler that reads Java expressions, blocks, or source files, and produces Java bytecode.
  3. org.slf4j: A simple logging facade for Java.
  4. org.postgresql/postgresql: The JDBC driver for PostgreSQL.
  5. org.flywaydb: Database migration tool.
  6. com.fasterxml.jackson.core: Libraries for processing JSON.
  7. org.mvel/mvel2: MVFLEX Expression Language (MVEL) is a hybrid dynamic/statically typed, embeddable Expression Language and runtime.

To build the project, just run this in a terminal: lein uberjar

The resulting fat jar with all needed dependencies is at: target/app.jar

Frameworks VS Libraries

In the Java world, one common approach is to use full-fledged frameworks that provide comprehensive solutions for various aspects of software development. These frameworks often come with a wide range of features and functionalities built-in, aiming to simplify the development process by providing pre-defined structures and conventions. Examples of such frameworks include the Spring Framework, Java EE (now Jakarta EE), and Hibernate.

On the other hand, in the Clojure world, the approach tends to favour using small, composable libraries rather than monolithic frameworks. Clojure promotes simplicity and flexibility, encouraging developers to choose and combine libraries that best fit their needs. These libraries typically focus on solving one problem well, making them lightweight and easy to understand. Examples of popular Clojure libraries include Ring for web development, Compojure for routing, and Spec for data validation.

The difference between these approaches lies in their philosophies and design principles. Full-blown frameworks in the Java world offer convenience and a one-size-fits-all solution but may come with overhead and complexity. In contrast, small libraries in the Clojure world emphasize simplicity, modularity, and flexibility, allowing developers to build tailored solutions while keeping the codebase lightweight and maintainable.

Docker

If you do not intend to run the microservice only locally on a laptop, you will probably use containerization, and Docker is today's de facto standard for this.

The Dockerfile sets up a containerized environment for the application, leveraging Amazon Corretto 22 on Alpine Linux. It downloads the AWS OpenTelemetry Agent (you can use the standard one if you don't need the AWS-related features) to enable observability features, including distributed tracing, and then copies the application JAR file into the container. Environment variables are configured to include the Java agent for instrumentation and to let the JVM use up to 90% of available RAM (which is useful for a container-based setup). Finally, it exposes port 8080 and specifies the command to start the Java application server.

Dev Experience

REPL

The Read-Eval-Print Loop in Clojure is a highly effective tool for interactive development, which allows developers to work more efficiently by providing immediate feedback. Unlike traditional compile-run-debug cycles, the REPL enables developers to evaluate expressions and functions on the fly, experiment with code snippets, and inspect data structures in real time. This makes the development process more dynamic and exploratory, leading to a deeper understanding of the codebase. Additionally, the REPL's seamless integration with the language's functional programming paradigm empowers developers to embrace Clojure's expressive syntax and leverage its powerful features, ultimately enhancing productivity and enabling rapid prototyping and iterative development cycles. The REPL is the bee's knees, in other words.

First, you start a REPL session:
The REPL is started and ready for code evaluation

Next, you type (init) to invoke the initialization function and press Enter – the application will start and you will see something similar to:
:done means that the service is up and running

The session logs show that the application loads configurations and establishes a connection with a PostgreSQL database. This involves initializing a HikariCP connection pool and Flyway for database migrations. The logs confirm that the database schema validation and migration checks were successful. The startup of the Jetty HTTP server follows, and the server becomes operational and ready to accept requests on the specified port.

To apply any code change, type (reset) and press Enter.

To run tests, you should type (run-tests) and press Enter.
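The (init), (reset) and (run-tests) helpers live in a dev-only namespace. The following is a minimal sketch of what such a namespace typically looks like with mount; the use of org.clojure/tools.namespace and the exact function bodies are assumptions here, not the repo's literal code:

(ns user
  (:require [clojure.test :as test]
            [clojure.tools.namespace.repl :as tn]
            [mount.core :as mount]))

(defn init []
  ;; start every mount-managed state: config, db pool, HTTP server, ...
  (mount/start)
  :done)

(defn reset []
  ;; stop stateful resources, reload changed namespaces, then start again
  (mount/stop)
  (tn/refresh :after 'user/init))

(defn run-tests []
  ;; run all loaded test namespaces
  (test/run-all-tests #".*-test$"))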

Docker Compose

Using Docker Compose to run Postgres and any third-party services locally provides a streamlined and consistent development environment. Developers can define services in a docker-compose.yml file, which enables them to configure and launch an entire stack with a single command. In this case, Postgres is encapsulated within a container with predefined configurations. Docker Compose also facilitates easy scaling, updates, and isolation of services, enhancing development efficiency and reducing the setup time for new team members or transitioning between projects. It encapsulates complex configurations, such as Postgres' performance monitoring and logging settings, in a manageable, version-controlled file, simplifying and replicating the service setup across different environments.

This approach ensures that all team members work in identical settings, thus mitigating the "it works on my machine" problem.

Stateful Resources

The mount Clojure library is a lightweight and idiomatic solution for managing application state in Clojure applications. It offers a more straightforward and functional approach than the Spring Framework, which can be more prescriptive and heavy. Mount emphasizes simplicity, making it an excellent fit for the functional programming paradigm without requiring extensive configuration or boilerplate code. This aligns well with Clojure's philosophy, resulting in a more seamless and efficient development experience.

An example of managing a database connection as a stateful resource.

Only two functions: one for start and one for stop.
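A minimal sketch of such a mount state for a HikariCP datasource (the namespace name and connection settings here are illustrative, not the project's actual values):

(ns app.db
  (:require [mount.core :refer [defstate]])
  (:import (com.zaxxer.hikari HikariConfig HikariDataSource)))

(defn- make-pool []
  (let [cfg (doto (HikariConfig.)
              (.setJdbcUrl "jdbc:postgresql://localhost:5432/calc")
              (.setUsername "postgres")
              (.setPassword "postgres"))]
    (HikariDataSource. cfg)))

;; the pool opens on (mount/start) and closes on (mount/stop)
(defstate datasource
  :start (make-pool)
  :stop  (.close datasource))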

REST API

Compojure's DSL for web applications makes it easy to set up REST API routes with corresponding HTTP methods. Adding a Swagger API descriptor through libraries like ring-swagger provides a visual interface for interacting with the API and enables client code generation. You can use the Prismatic schema library for HTTP request validation and data coercing to ensure the API consumes and produces data that conforms to predefined schemas. Compojure's middleware approach allows for modular and reusable components that can handle cross-cutting concerns like authentication, logging, and request/response transformations, enhancing the API's scalability and maintainability.

A declarative, concise DSL for the REST API.
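A hedged sketch of what such a route table looks like with compojure-api and Prismatic schema (the endpoint names, schemas and inline result are illustrative, not the repo's exact code):

(ns app.routes
  (:require [compojure.api.sweet :refer [api context POST]]
            [ring.util.http-response :refer [ok]]
            [schema.core :as s]))

(s/defschema EvaluationRequest
  {:expression s/Str})

(s/defschema EvaluationResult
  {:expression s/Str
   :result     s/Num})

(def app
  (api
    {:swagger {:ui   "/"
               :spec "/swagger.json"
               :data {:info {:title "calc"}}}}
    (context "/api" []
      :tags ["calculation"]
      (POST "/evaluate" []
        :body    [request EvaluationRequest]
        :return  EvaluationResult
        :summary "Evaluates a math expression"
        ;; a placeholder response; the real handler delegates to the controller
        (ok {:expression (:expression request) :result 42.0})))))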

The middleware chain is set up in the HTTP server-related namespace:

The HTTP request middleware chain is a powerful yet dangerous tool – be careful when changing it.

Developers and QA engineers find the Swagger UI console highly convenient. I encourage you to run the service locally and try the console in a browser. Here is a list of HTTP endpoints with data schemas:

All information about the service's REST API in one place!

Isn't it awesome?

Endpoint documentation, request-response data schemas and even a cURL command ready to use in the terminal!

Business Logic

The calc.rpc.controller.calculation controller houses the business logic that defines two primary operations: evaluate and obtain-past-evaluations.

The evaluate operation processes and evaluates mathematical expressions received as requests, storing the results in a database:

Only successful calculations will be stored in the database.
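A hedged sketch of what the evaluate operation can look like with MVEL and next.jdbc (the argument shape, table name and error handling are illustrative; the repo's controller differs in detail):

(ns calc.rpc.controller.calculation
  (:require [next.jdbc :as jdbc])
  (:import (org.mvel2 MVEL)))

(defn evaluate [datasource {:keys [expression]}]
  ;; MVEL/eval throws on an invalid expression, so nothing is persisted in that case
  (let [result (MVEL/eval expression)]
    (jdbc/execute-one! datasource
                       ["INSERT INTO expressions (expression, result) VALUES (?, ?)"
                        expression result])
    {:expression expression
     :result     result}))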

The obtain-past-evaluations operation fetches a list of previously executed calculations based on provided offset and limit parameters:

This operation does not have a request body schema, as it is exposed as a GET HTTP endpoint.

Ensuring that exceptions or database inconsistencies are handled gracefully is crucial for the successful execution of these operations.

The integration of external libraries, MVEL (MVFLEX Expression Language) for expression evaluation, and JDBC for database transactions highlights Clojure's interoperability with Java.

Another essential principle demonstrated by the use of the MVEL library: never rewrite in Clojure something that has already been written in Java. Most of your business cases are already covered by some Java library written, stabilized, and optimized years ago. You should have strong reasons to write something from scratch in Clojure instead of using a Java analog.

Persistence Layer

Thanks to the hugsql library, we can use autogenerated Clojure functions directly mapped to SQL queries described in a plain text file:

The HugSQL library uses specially formatted SQL comments to declare the Clojure functions it generates.

As Clojure is not an object-oriented language, we don't need to map query result sets coming from a relational database onto collections of objects. No OOP, no ORM. Very convenient. The relational algebra paradigm marries seamlessly with the functional paradigm in Clojure. Very natural:

Remember `-- :name find-expressions :query :many` in the queries.sql file? It renders as the `query/find-expressions` Clojure function.
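A minimal sketch of the wiring (the SQL file path and the parameter names are assumptions):

(ns app.query
  (:require [hugsql.core :as hugsql]))

;; generates one Clojure function per `-- :name ...` comment in the SQL file
(hugsql/def-db-fns "sql/queries.sql")

(comment
  ;; each returned row is a plain Clojure map
  (find-expressions datasource {:limit 10 :offset 0}))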

Compared to NoSQL databases, migrating the data schema in relational databases such as Postgres is a well-established practice. This is typically done through migrations, which is made easy by using the flyway library. To adjust the data schema in Postgres, we simply need to create a new text file containing the Data Definition Language (DDL) commands. In our case there is only one migration file:

The beauty of the declarative nature of relational DDL.

Whenever you change an SQL query in the queries.sql file, do not forget to run the (reset) function in the REPL-session console. It automatically regenerates the Clojure namespace with query declarations and runtime-generated SQL wrapper functions.

Configuration

The system uses the Clojure library cprop to manage its configuration. The library adopts a sequential merge policy to construct the application's configuration map. It starts by loading default-config.edn from resources and overlays it with local-config.edn if available. Then it applies settings from an external config.edn and, finally, overrides from environment variables (adhering to the 12-factor app guidelines). This ensures that the latest source takes precedence.

The configuration is essential during development and is a Clojure map validated against a Prismatic schema. If discrepancies are detected, the system immediately shuts down, adhering to the fail-fast principle.
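A hedged sketch of that loading-and-validation step with cprop and Prismatic schema (the schema shape is illustrative, and the optional local/external file overlays are omitted for brevity):

(ns app.config
  (:require [cprop.core :refer [load-config]]
            [cprop.source :refer [from-env]]
            [schema.core :as s]))

(s/defschema Config
  {:http     {:port s/Int}
   s/Keyword s/Any})

(defn read-config []
  ;; later sources win: resource defaults are overridden by environment variables
  (let [config (load-config :resource "default-config.edn"
                            :merge    [(from-env)])]
    (s/validate Config config)   ;; throws, so the app fails fast on a bad config
    config))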

Additionally, feature flags within the configuration enable selective feature toggling, aiding in the phased introduction of new functionality and ensuring robustness in production environments.

Logging

The service utilizes org.clojure/tools.logging to offer a logging API at a high level, which works in conjunction with Logback and Slf4j—two Java libraries that are well-known for their reliability in logging. The logging setup is customized for the application's environment: while in development, logs are produced in a plain text format that is easy to read, allowing for efficient debugging. On the other hand, when the service is deployed on servers, logs are structured in a JSON format, which makes them ideal for machine parsing and analysis, optimizing their performance in production.

Good old XML.

Tests

This is a real-world industrial example. Yes, we do have tests. Not many. But for a codebase of this size, that is pretty much okay.

Unfortunately, most open-source Clojure-based projects on Github do not contain good examples of integration tests. So, here we are, trying to close this gap.

We use the Testcontainers library to spin up real Postgres instances during the tests. Before Docker and Testcontainers, the de facto standard in the Java world was running H2, an embedded pure-Java database, trying to mimic Postgres. It was not good, but there was not much choice back then.
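A minimal sketch of a Testcontainers-backed fixture using plain Java interop (the repo may wire this differently, for example through a wrapper library or a shared once-fixture):

(ns app.test-db
  (:import (org.testcontainers.containers PostgreSQLContainer)))

(defn with-postgres [f]
  ;; start a throwaway Postgres in Docker, hand its coordinates to the test, stop it after
  (let [container (doto (PostgreSQLContainer. "postgres:16-alpine") (.start))]
    (try
      (f {:jdbc-url (.getJdbcUrl container)
          :username (.getUsername container)
          :password (.getPassword container)})
      (finally
        (.stop container)))))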

The evaluate operation integration test:

Looks pretty concise and declarative.

The obtain-past-evaluations operation integration test:

Unfortunately, the downside of these integration tests is time – they are not fast tests.

After the tests run, you should see this:

Zero fails and zero errors. Awesome!

Conclusion

Now, when you go through the service codebase and know its internals, you can copy-paste it for yourself, change it according to your requirements, and voila, you will have a really good-looking microservice.

The described codebase is based on years of Clojure programming and a number of projects implemented in Clojure. Some of the libraries used may look outdated, but in the Clojure world, if a library works, it is okay not to update it often—the language itself is super-stable, and you can easily read and support code written even a decade ago.


2.5x better performance: Rama vs. MongoDB and Cassandra

We ran a number of benchmarks comparing Rama against the latest stable versions of MongoDB and Cassandra. The code for these benchmarks is available on Github. Rama’s indexes (called PStates) can reproduce any database’s data model since each PState is an arbitrary combination of durable data structures of any size. We chose to do our initial benchmarks against MongoDB and Cassandra because they’re widely used and like Rama, they’re horizontally scalable. In the future we’ll also benchmark against other databases of different data models.

There are some critical differences between these systems that are important to keep in mind when looking at these benchmarks. In particular, Cassandra by default does not guarantee writes are durable when giving acknowledgement of write success. It has a config commitlog_sync that specifies its strategy to sync its commit log to disk. The default setting “periodic” does the sync every 10 seconds. This means Cassandra can lose up to 10 seconds of acknowledged writes and regress reads on those keys (we disagree strongly with this setting being the default, but that’s a post for another day).

Rama has extremely strong ACID properties. An acknowledged write is guaranteed to be durable on the leader and all in-sync followers. This is an enormous difference with Cassandra’s default settings. As you’ll see, Rama beats or comes close to Cassandra in every benchmark. You’ll also see we benchmarked Cassandra with a commitlog_sync setting that does guarantee durability, but that causes its performance to plummet far below Rama.

MongoDB, at least in the latest version, also provides a durability guarantee by default. We benchmarked MongoDB with this default setting. Rama significantly outperforms MongoDB in every benchmark.

Another huge difference between Rama and MongoDB/Cassandra (and pretty much every database) comes from Rama being a much more general purpose system. Rama explicitly distinguishes data from indexes and stores them separately. Data is stored in durable, partitioned logs called “depots”. Depots are a distinct concept from “commit logs”, which is a separate mechanism that MongoDB, Cassandra, and Rama also have as part of their implementations. When using Rama, you code “topologies” that materialize any number of indexes of any shape from depots. You can use depots to recompute indexes if you made a mistake, or you can use depots to materialize entirely new indexes in the future to support new features. Depots can be consumed by multiple topologies materializing multiple indexes of different shapes. So not only is Rama in these benchmarks materializing equivalent indexes as MongoDB / Cassandra with great comparable performance, it’s also materializing a durable log. This is a non-trivial amount of additional work Rama is doing, and we weren’t expecting Rama to perform so strongly compared to databases that aren’t doing this additional work.

Benchmark setup

All benchmarks were done on a single m6gd.large instance on AWS. We used this instance type rather than m6g.large so we could use a local SSD to avoid complications with IOPS limits when using EBS.

We’re just testing single node performance in this benchmark. We may repeat these tests with clusters of varying sizes in the future, including with replication. However, all three systems have already demonstrated linear scalability so we’re most interested in raw single-node performance for this set of benchmarks.

For all three systems we only tested with the primary index, and we did not include secondary indexes in these tests. We tried configuring Cassandra to have the same heap size as Rama's worker (4GB) instead of the default 2GB that it was choosing, but that actually made its read performance drastically worse. So we left it to choose its own memory settings.

The table definition used for Cassandra was:

CREATE TABLE IF NOT EXISTS test.test (
  pk text,
  ck text,
  value text,
  PRIMARY KEY (pk, ck)
);

This is representative of the kind of indexing that Cassandra can handle efficiently, like performing range queries on a clustering key.

All Cassandra reads/writes were done with the prepared statements "SELECT value FROM test.test WHERE pk = ? AND ck = ?;" and "INSERT INTO test.test (pk, ck, value) VALUES (?, ?, ?);".

Cassandra was tested with both the “periodic” commitlog_sync config, which does not guarantee durability of writes, and the “batch” commitlog_sync config, which does guarantee durability of writes. We played with different values of commitlog_sync_batch_window_in_ms, but that had no effect on performance. We also tried the “group” commitlog_sync config, but we couldn’t get its throughput to be higher than “batch” mode. We tried many permutations of the configs commitlog_sync_group_window (e.g. 1ms, 10ms, 20ms, 100ms) and concurrent_writes (e.g. 32, 64, 128, 256), but the highest we could get the throughput was about 90% that of batch mode. The other suggestions on the Cassandra mailing list didn’t help.

The Rama PState equivalent to this Cassandra table had this data structure schema:

{[String, String] -> String}

The module definition was:

(defmodule CassandraModule [setup topologies]
  (declare-depot setup *insert-depot :random)

  (let [s (stream-topology topologies "cassandra")]
    (declare-pstate s $$primary {java.util.List String})
    (<<sources s
      (source> *insert-depot :> *data)
      (ops/explode *data :> [*pk *ck *val])
      (|hash *pk)
      (local-transform> [(keypath [*pk *ck]) (termval *val)] $$primary)
      )))

This receives triples of partitioning key, clustering key, and value and writes it into the PState, ensuring the data is partitioned by the partitioning key.

Cassandra and Rama both index using LSM trees, which sorts on disk by key. Defining the key as a pair like this is equivalent to Cassandra’s “partitioning key” and “clustering key” definition, as it’s first sorted by the first element and then by the second element. This means the same kinds of efficient point queries or range queries can be done.

The Rama PState equivalent to MongoDB’s index had this data structure schema:

{String -> Map}

The module definition was:

(defmodule MongoModule [setup topologies]
  (declare-depot setup *insert-depot :random)

  (let [s (stream-topology topologies "mongo")]
    (declare-pstate s $$primary {String java.util.Map})
    (<<sources s
      (source> *insert-depot :> *data)
      (ops/explode *data :> {:keys [*_id] :as *m})
      (|hash *_id)
      (local-transform> [(keypath *_id) (termval *m)] $$primary)
      )))

This receives maps containing an :_id field and writes each map to the $$primary index under that ID, keeping the data partitioned based on the ID.

We used strings for the IDs given to MongoDB, so we used strings in the Rama definition as well. MongoDB’s documents are just maps, so they’re stored that way in the Rama equivalent.

Writing these modules using Rama’s Java API is pretty much the same amount of code. There’s no difference in performance between Rama’s Clojure and Java APIs as they both end up as the same bytecode.

Max write throughput benchmark

For the max write throughput benchmark, we wrote to each respective system as fast as possible from a single client colocated on the same node. Each request contained a batch of 100 writes, and the client used a semaphore and the system’s async API to only allow 1000 writes to be in-flight at a time. As requests got acknowledged, more requests were sent out.
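As a rough sketch of that flow-control pattern in Clojure (append-async! is a hypothetical stand-in for the system's asynchronous write call, assumed to return a CompletableFuture that completes on acknowledgement; it is not the actual client API of any of these systems):

(import '(java.util.concurrent Semaphore)
        '(java.util.function BiConsumer))

(defn run-write-benchmark [append-async! make-batch]
  (let [in-flight (Semaphore. 1000)                    ;; at most 1000 requests outstanding
        on-ack    (reify BiConsumer
                    (accept [_ _result _error]
                      (.release in-flight)))]
    (while true
      (.acquire in-flight)                             ;; wait for a free slot
      (.whenComplete (append-async! (make-batch 100))  ;; each request batches 100 writes
                     on-ack))))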

As described above, we built one Rama module that mimics how MongoDB works and another module that mimics how Cassandra works. We then did head to head benchmarks against each database with tests writing identical data.

For the MongoDB tests, we wrote documents solely containing an “_id” key set to a UUID. Here’s MongoDB vs. Rama:

Rama’s throughput stabilized after 50 minutes, and MongoDB’s throughput continued to decrease all the way to the end of the three hour test. By the end, Rama’s throughput was 9x higher.

For the Cassandra tests, each write contained a separate UUID for the fields “pk”, “ck”, and “value”. We benchmarked Cassandra both with the default “periodic” commit mode, which does not guarantee durability on write acknowledgement, and with the “batch” commit mode, which does guarantee durability. As mentioned earlier, we couldn’t get Cassandra’s “group” commit mode to match the performance of “batch” mode, so we focused our benchmarks on the other two modes. Here’s a chart with benchmarks of each of these modes along with Rama:

Since Rama guarantees durable writes, the equivalent comparison is against Cassandra’s batch commit mode. As you can see, Rama’s throughput is 2.5x higher. Rama’s throughput is only a little bit below Cassandra when Cassandra is run without the durability guarantee.

Mixed read/write throughput benchmark

For the mixed read/write benchmark, we first wrote a fixed amount of data into each system. We wanted to see the performance after each system had a significant amount of data in it, as we didn’t want read performance skewed by the dataset being small enough to fit entirely in memory.

For the MongoDB tests, we wrote documents solely containing an “_id” field with a stringified number that incremented by two for each write (“0”, “2”, “4”, “6”, etc.). We wrote 250M of those documents (max ID was “500000000”). Then for the mixed reads/writes test, we did 50% reads and 50% writes. 1000 pairs of read/writes were in-flight at a time. Each write was a single document (as opposed to batch write test above which did 100 at a time), and each read was randomly chosen from the keyspace from “0” to the max ID. Since only half the numbers were written, this means each read had a 50% chance of being a hit and a 50% chance of being a miss.

Here’s the result of the benchmark for MongoDB vs. Rama:

We also ran another test of MongoDB with half the initial data:

MongoDB’s performance is unaffected by the change in data volume, and Rama outperforms MongoDB in this benchmark by 2.5x.

For the Cassandra tests, we followed a similar strategy. For every write, we incremented the ID by two and wrote that number stringified for the “pk”, “ck”, and “value” fields (e.g. "INSERT INTO test.test (pk, ck, value) VALUES ('2', '2', '2');" ). Reads were similarly chosen randomly from the keyspace from “0” to the max ID, with each read fetching the value for a “pk” and “ck” pair. Just like the MongoDB tests, each read had a 50% chance of being a hit and a 50% chance of being a miss.

After writing 250M rows to each system, here’s the result of the benchmark for Cassandra vs. Rama:

Rama performs more than 2.5x better in this benchmark whether Cassandra is guaranteeing durability of writes or not. Since Cassandra’s write performance in this non-durable mode was a little higher than Rama in our batch write throughput test, this test indicates its read performance is substantially worse.

Cassandra’s non-durable commit mode being slightly worse than its durable commit mode in this benchmark, along with Cassandra’s reputation as a high performance database, made us wonder if we misconfigured something. As described earlier, we tried increasing the memory allocated to the Cassandra process to match Rama (4GB), but that actually made its performance much worse. We made sure Cassandra was configured to use the local SSD for everything (data dir, commit log, and saved caches dir). Nothing else in the cassandra.yaml or cassandra-env.sh files seemed misconfigured. There are a variety of configs relating to compaction and caching that could be relevant, but Rama has similar configs that we also didn’t tune for these tests. So we left those at the defaults for both systems. After double-checking all the configs we reran this benchmark for Cassandra for both commit modes and got the same results.

One suspicious data point was the amount of disk space used by each system. Since we wrote a fixed amount of identical data to each system before this test, we could compare this directly. Cassandra used 11GB for its “data dir”, which doesn’t include the commit log. Rama used 4GB for the equivalent. If you add up the raw amount of bytes used by 250M rows with identical “pk”, “ck”, and “value” fields that are stringified numbers incrementing by two, you end up with 6.1GB. Both Cassandra and Rama compress data on disk, and since there are so many identical values compression should be effective. We don’t know enough about the implementation of Cassandra to say why its disk usage is so high relative to the amount of data being put into it.

We ran the test again for Cassandra with half the data (125M rows), and these were the results:

Cassandra’s numbers are much better here, though the numbers were degrading towards the end. Cassandra’s read performance seems to suffer as the dataset gets larger.

Conclusion

We were surprised by how well Rama performed relative to Cassandra and MongoDB given that it also materializes a durable log. When compared to modes of operation that guarantee durability, Rama performed at least 2.5x better in every benchmark.

Benchmarks should always be taken with a grain of salt. We only tested on one kind of hardware, with contrived data, with specific access patterns, and with default configs. It’s possible MongoDB and Cassandra perform much better on different kinds of data sets or on different hardware.

Rama’s performance is reflective of the amount of work we put into its design and implementation. One of the key techniques we use all over the place in Rama’s implementation is what we call a “trailing flush”. This technique allows all disk and network operations to be batched even though they’re invoked one at a time. This is important because disk syncs and network flushes are expensive. For example, when an append is done to a depot (durable log), we don’t apply that immediately. Instead, the append gets put into an in-memory buffer, and an event is enqueued that will flush that buffer if no such event is already enqueued. When that event comes to the front of the processing queue, it flushes whatever has accumulated on the buffer. If the rate of appends is low, it may do a disk operation for a single append. As the rate of appends gets higher, the number of appends performed together increases. This technique greatly increases throughput while also minimizing latency. We use this technique for sending appends from a client, for flushing network messages in Netty (called “flush consolidation”), for writing to indexes, for sending replication messages to followers, and more.
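As a rough illustration of the pattern (not Rama's actual implementation), a trailing flush can be sketched in Clojure like this: writes land in an in-memory buffer, and a flush task is enqueued only when none is already pending, so each disk or network operation covers however many writes accumulated in the meantime.

(import '(java.util.concurrent Executors ExecutorService))

(def ^ExecutorService flusher (Executors/newSingleThreadExecutor))

(def state (atom {:buffer [] :flush-pending? false}))

(defn- flush! [write-batch!]
  ;; atomically take whatever accumulated and clear the pending flag
  (let [[{:keys [buffer]} _] (swap-vals! state assoc :buffer [] :flush-pending? false)]
    (when (seq buffer)
      (write-batch! buffer))))        ;; one disk/network operation for the whole batch

(defn append! [write-batch! record]
  (let [[old _] (swap-vals! state #(-> %
                                       (update :buffer conj record)
                                       (assoc :flush-pending? true)))]
    (when-not (:flush-pending? old)   ;; enqueue a flush only if none is already pending
      (.submit flusher ^Runnable #(flush! write-batch!)))))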

The only performance numbers we shared previously were for our Twitter-scale Mastodon instance, so we felt it was important to publish some more numbers against tools many are already familiar with. If there are any flaws in how we benchmarked MongoDB or Cassandra, please share with us and we’ll be happy to repeat the benchmarks.

Since Rama encompasses so much more than data indexing, in the future we will be doing more benchmarks against different kinds of tooling, like queues and processing systems. Additionally, since Rama is an integrated system we expect its most impressive performance numbers to be when benchmarked against combinations of tooling (e.g. Kafka + Storm + Cassandra + ElasticSearch). Rama eliminates the overhead inherent when using combinations of tooling like that.

Finally, since Rama is currently in private beta you have to join the beta to get access to a full release in order to be able to reproduce these benchmarks. As mentioned at the start of this post, the code we used for the benchmarks is on our Github. Benchmarks are of course better when they can be independently reproduced. Eventually Rama will be generally available, but in the meantime we felt publishing the numbers was still important even with this limitation.


What the Reagent Component?!

Did you know that when you write a form-1, form-2 or form-3 Reagent component they all default to becoming React class components?

For example, if you were to write this form-1 Reagent component:

(defn welcome []
  [:h1 "Hello, friend"])

By the time Reagent passes it to React it would be the equivalent of you writing this:

class Welcome extends React.Component {
  render() {
    return <h1>Hello, friend</h1>
  }
}

Okay, so, Reagent components become React Class Components. Why do we care? This depth of understanding is valuable because it means we can better understand:

The result of all of this "fundamental" learning is we can more effectively harness JavaScript from within ClojureScript.

A Pseudoclassical Pattern

The reason all of your Reagent components become class components is because all of the code you pass to Reagent is run through an internal Reagent function called create-class.

create-class is interesting because of how it uses JavaScript to transform a Reagent component into something that is recognized as a React class component. Before we look into what create-class is doing, it's helpful to review how "classes" work in JavaScript.

Prior to ES6, JavaScript did not have classes, and this made some JS developers sad because classes are a common pattern used to structure code and provide support for:

  • instantiation
  • inheritance
  • polymorphism

But as I said, prior to ES6, JavaScript didn't have a formal syntax for "classes". To compensate for the lack of classes, the JavaScript community got creative and developed a series of instantiation patterns to help simulate classes.

Of all of these patterns, the pseudoclassical instantiation pattern became one of the most popular ways to simulate a class in JavaScript. This is evidenced by the fact that many of the "first generation" JavaScript libraries and frameworks, like Google Closure Library and Backbone, are written in this style.

The reason we are going over this history is that a programming language has both "patterns" and "syntax". The challenge with "patterns" is:

  • They're disseminated culturally (tribal knowledge)
  • They're difficult to identify
  • They're often difficult to search
  • They often require a deeper knowledge to understand how and why to use a pattern.

The last point in particular is relevant to our conversation because patterns live in a context and assume prior knowledge. Knowledge like how well we know the context of a problem, the alternative approaches to addressing a problem, advancements in a language and so on.

The end result is that a pattern can just become a thing we do. We can forget or never know why it started in the first place or what the world could look like if we chose a different path.

For example, the most common way of writing a React class component is to use ES6 class syntax. But did you know that ES6 class syntax is little more than syntactic sugar around the pseudoclassical instantiation pattern?

For example, you can write a valid React class component using the pseudoclassical instantiation pattern like this:

// 1. define a function (component) called `Welcome`
function Welcome(props, context, updater) {
  React.Component.call(this, props, context, updater)

  return this
}

// 2. connect `Welcome` to the `React.Component` prototype
Welcome.prototype = Object.create(React.Component.prototype)

// 3. re-define the `constructor`
Object.defineProperty(Welcome.prototype, 'constructor', {
  enumerable: false,
  writable: true,
  configurable: true,
  value: Welcome,
})

// 4. define your React components `render` method
Welcome.prototype.render = function render() {
  return <h2>Hello, Reagent</h2>
}

While the above is a valid React Class Component, it's also verbose and error prone. For these reasons JavaScript introduced ES6 classes to the language:

class Welcome extends React.Component {
  render() {
    return <h1>Hello, Reagent</h1>
  }
}

For those looking for further evidence, we can support our claim that ES6 classes result in the same thing as the pseudoclassical instantiation pattern by using JavaScript's built-in introspection tools to compare the pseudoclassical instantiation pattern to the ES6 class syntax.

pseudoclassical instantiation pattern:

function Welcome(props, context, updater) {
  React.Component.call(this, props, context, updater)

  return this
}

// ...repeat steps 2 - 4 from above before completing the rest

var welcome = new Welcome()

Welcome.prototype instanceof React.Component
// => true

Object.getPrototypeOf(Welcome.prototype) === React.Component.prototype
// => true

welcome instanceof React.Component
// => true

welcome instanceof Welcome
// => true

Object.getPrototypeOf(welcome) === Welcome.prototype
// => true

React.Component.prototype.isPrototypeOf(welcome)
// => true

Welcome.prototype.isPrototypeOf(welcome)
// => true

ES6 class

class Welcome extends React.Component {
  render() {
    console.log('ES6 Inheritance')
  }
}

var welcome = new Welcome()

Welcome.prototype instanceof React.Component
// => true

Object.getPrototypeOf(Welcome.prototype) === React.Component.prototype
// => true

welcome instanceof React.Component
// => true

welcome instanceof Welcome
// => true

Object.getPrototypeOf(welcome) === Welcome.prototype
// => true

React.Component.prototype.isPrototypeOf(welcome)
// => true

Welcome.prototype.isPrototypeOf(welcome)
// => true

What does all of this mean? As far as JavaScript and React are concerned, both definitions of the Welcome component are valid React Class Components.

With this in mind, let's look at Reagent's create-class function and see what it does.

What Reagent Does

The history lesson from the above section is important because create-class uses a modified version of the pseudoclassical instantiation pattern. Let's take a look at what we mean.

The following code sample is a simplified version of Reagent's create-class function:

function cmp(props, context, updater) {
  React.Component.call(this, props, context, updater)

  return this
}

goog.extend(cmp.prototype, React.Component.prototype, classMethods)

goog.extend(cmp, React.Component, staticMethods)

cmp.prototype.constructor = cmp

What we have above is Reagent's take on the pseudoclassical instantiation pattern with a few minor tweaks:

// 1. we copy the properties + methods of React.Component
goog.extend(cmp.prototype, React.Component.prototype, classMethods)

goog.extend(cmp, React.Component, staticMethods)

// 2. the constructor is not as "thorough"
cmp.prototype.constructor = cmp

Exploring point 1, we see that Reagent has opted to copy the properties and methods of React.Component directly to the Reagent components we write. That is what's happening here:

goog.extend(cmp.prototype, React.Component.prototype, classMethods)

If we were using the traditional pseudoclassical approach we would instead do this:

cmp.prototype = Object.create(React.Component.prototype)

Thus, the difference is that Reagent's approach copies all the methods and properties from React.Component to the cmp prototype, whereas the second approach links the cmp prototype to the React.Component prototype. The benefit of linking is that each time you instantiate a Welcome component, the Welcome component does not need to re-create all of React.Component's methods and properties.

Exploring the second point, Reagent is doing this:

cmp.prototype.constructor = cmp

whereas with the traditional pseudoclassical approach we would instead do this:

Object.defineProperty(Welcome.prototype, 'constructor', {
  enumerable: false,
  writable: true,
  configurable: true,
  value: Welcome,
})

The difference in the above approaches is that if we just use = as we are doing in the Reagent version we create an enumerable constructor. This can have an implication depending on who consumes our classes, but in our case we know that only React is going to be consuming our class components, so we can do this with relative confidence.

What is one of the more interesting results of the above two Reagent modifications? First, if React depended on JavaScript introspection to tell whether or not a component is a child of React.Component we would not be happy campers:

Welcome.prototype instanceof React.Component
// => false...Welcome is not a child of React.Component

Object.getPrototypeOf(Welcome.prototype) === React.Component.prototype
// => false...React.Component is not part of Welcome's prototype chain

welcome instanceof React.Component
// => false...Welcome is not an instance of React.Component

welcome instanceof Welcome
// => true...welcome is a child of Welcome

Object.getPrototypeOf(welcome) === Welcome.prototype
// => true...welcome is linked to Welcome prototype

console.log(React.Component.prototype.isPrototypeOf(welcome))
// => false...React.Component.prototype is not in welcome's prototype chain

console.log(Welcome.prototype.isPrototypeOf(welcome))
// => true...Welcome.prototype is in welcome's prototype chain

What the above shows is that Welcome is not a child of React.Component even though it has all the properties and methods that React.Component has. This is why we're lucky that React is smart about detecting class vs. function components.

Second, by copying rather than linking prototypes we could incur a performance cost. How much of a performance hit? In our case, this cost is likely negligible.

Conclusion

In my experience, digging into the weeds and going on these detours has been an important part of my growth as a developer. The weeds have allowed me to be a better programmer because I'm honing my ability to understand challenging topics and find answers. The result is a strange feeling of calm and comfort.

This calm and comfort shouldn't be overlooked. So much of our day-to-day is left unquestioned and unanalyzed. We let knowledge become "cultural" or "tribal". This is scary. It's scary because it leads to bad decisions, because no one around us knows the whys or wherefores. Ultimately, it's a bad habit. A bad habit which is seen by some as a virtue because it would simply take too much time for us to learn things ourselves. That is, until you actually start doing this kind of work, spend time learning and observing, and see that the "new things" we encounter all the time aren't really new, but just another example of an old thing coming back around.


What are the Clojure Tools?

The Clojure Tools are a group of convenience tools which currently consist of:

  • Clojure CLI
  • tools.build

The Clojure Tools were designed to answer some of the following questions:

  • How do I install Clojure? (Clojure CLI)
  • How do I run a Clojure program? (Clojure CLI)
  • How do I manage Clojure packages (dependencies)? (Clojure CLI)
  • How do I configure a Clojure project? (Clojure CLI)
  • How do I build Clojure for production? (tools.build)

The rest of this post will dig into each of these tools.

Clojure CLI

The Clojure CLI is a CLI program. Here is what it looks like to use the Clojure CLI and some of the things it can do:

Run a Clojure repl

clj

Run a Clojure program

clj -M -m your-clojure-program

Manage Clojure dependencies

clj -Sdeps '{:deps {bidi/bidi {:mvn/version "2.1.6"}}}'

Like all Clojure programs, the Clojure CLI is built on a few libraries:

The following sections will provide overviews of each of the above tools.

The Clojure CLI is invoked by calling either clj or clojure shell commands:

# clj
clj -M -m your-clojure-program

# clojure
clojure -M -m your-clojure-program

Under the hood, clj actually calls clojure. The difference is that clj wraps the clojure command with a tool called rlwrap. rlwrap improves the developer experience by making it easier for you, a human, to type in the terminal while you're running your Clojure REPL. However, even though it's easier for you to type, rlwrap can make it hard to compose the clj command with other tools. As a result, it's a common practice to use clojure in production/CI environments. Additionally, not all environments have access to rlwrap, so it's another dependency you would have to install.

Okay, so they do the same thing. What do they do? clj/clojure has one job: run Clojure programs against a classpath.

The next sections will outline the tools that make up the Clojure CLI tool.

clj/clojure

If you dig into it, clj/clojure is just a bash script which ultimately calls a command like this:

java [java-opt*] -cp classpath clojure.main [init-opt*] [main-opt] [arg*]

Thus, the Clojure CLI tool makes it easier to run Clojure programs. It saves you from having to type out a gnarly Java command and make it work on different environments (Windows, Linux, macOS, etc.). It also orchestrates the building of the classpath by calling out to tools.deps.

tools.deps

tools.deps is a Clojure library responsible for managing your dependencies. It does the following things:

  • reads in dependencies from a deps.edn file
  • resolves the dependencies and their transitive dependencies
  • builds a classpath

What's interesting about this program is that it's just a Clojure library. This means that you can use it outside of the Clojure CLI.

The other thing that makes tools.deps great is that it's a small and focused library. This is great because if something goes wrong, it's easy to read and learn the library in a short period of time.

deps.edn

deps.edn is just an edn file where you configure your project and specify project dependencies. You can think of it like Clojure's version of package.json. The deps.edn file is a Clojure map with a specific structure. Here's an example of some of the properties of a deps.edn file:

{:deps    {...}
 :paths   [...]
 :aliases {...}}

As you can see, we use the keywords :deps, :paths and :aliases and more to start to describe your project and the dependencies it requires.

As we noted above, deps.edn is read in when you run clj/clojure and tells clj/clojure which dependencies are required to run your project.
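For a more concrete picture, here is a small hedged example of what a filled-in deps.edn can look like (the library choices, versions and alias names are illustrative):

{:paths ["src" "resources"]

 :deps  {org.clojure/clojure {:mvn/version "1.11.3"}
         bidi/bidi           {:mvn/version "2.1.6"}}

 :aliases
 {:test  {:extra-paths ["test"]
          :extra-deps  {io.github.cognitect-labs/test-runner
                        {:git/tag "v0.5.1" :git/sha "dfb30dd"}}
          :main-opts   ["-m" "cognitect.test-runner"]}
  :build {:deps       {io.github.clojure/tools.build {:mvn/version "0.9.6"}}
          :ns-default build}}}

:paths ends up on the classpath, :deps are resolved by tools.deps, and :aliases add optional paths, dependencies, or entry points that you activate from the command line (e.g. clj -M:test).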

Tools.Build

tools.build is a Clojure library with functions for building Clojure projects, for example building a jar or uberjar.

The way you would use tools.build is by writing a separate program inside your app which knows how to build your app. The convention is to create a build.clj file in the root of your project, require tools.build, and use the functions it provides to build your program.
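As a sketch (assuming an uberjar build; the main namespace my.app.core is hypothetical), a typical build.clj looks like this:

(ns build
  (:require [clojure.tools.build.api :as b]))

(def class-dir "target/classes")
(def uber-file "target/app.jar")
(def basis (b/create-basis {:project "deps.edn"}))

(defn clean [_]
  (b/delete {:path "target"}))

(defn uber [_]
  (clean nil)
  (b/copy-dir {:src-dirs ["src" "resources"] :target-dir class-dir})
  (b/compile-clj {:basis basis :src-dirs ["src"] :class-dir class-dir})
  (b/uber {:class-dir class-dir
           :uber-file uber-file
           :basis     basis
           :main      'my.app.core}))   ;; hypothetical main namespace

With a :build alias whose :ns-default is build, you would invoke it as clj -T:build uber (more on the -T switch below).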

The 3 main types of Clojure programs one might build fall into 3 sub-categories:

  • A tool
  • A library
  • An app

When you run your build.clj file, you will use the Clojure CLI's -T switch. The -T switch is meant to run general Clojure programs via the Clojure CLI, and since build.clj is a separate program, distinct from the app you are writing, you would run it via the -T switch.

You would use -T for Clojure programs that you want to run as a "tool". For example, deps-new is a Clojure library which creates new Clojure projects based on a template you provide. This is a great example of a Clojure project which is built to be a "tool".

I don't want to go into more detail about -T now because that means we would have to dive into other Clojure CLI switches like -X and -M. That's for another post. On to the Installer!

Installer

The "Clojure CLI Installer" is a fancy way of referring to the brew tap used to install Clojure on mac and linux machines. As of February 2020, Clojure started maintaining their own brew tap. Thus, if you installed the Clojure CLI via

brew install clojure

you will likely want to uninstall clojure and install the following:

brew install clojure/tools/clojure

In all likelihood, you would probably be fine with brew install clojure as it will receive updates. However, while brew install clojure will still see some love, it won't be as active as the clojure/tools/clojure tap.

clj v lein v boot

This section will provide a quick comparison of clj, lein and boot.

Firstly, all of the above tools are more or less addressing the same problems in their own way. Your job is to choose the one you like best.

If you're curious which to choose, my answer is the Clojure CLI. The reason I like the Clojure CLI is because the tool is simple. You can read through clj and tools.deps in an afternoon and understand what they are doing. The same (subjectively of course) cannot be said for lein or boot. I will note that Clojure CLI's API is not straightforward and can be confusing.

Secondly, the Clojure Tools promote libraries over frameworks. This is important when working with a language like Clojure because it really does reward you for breaking down your thinking.

Finally, the Clojure community is really leaning into building tools for Clojure CLI. For example, where lein used to have significantly more functionality, the community has built a ton of incredible tools that will cover many of your essential requirements.


Transducers and Eduction in Clojure simply explained

With help from our coming AI overlords:

Transducers and eduction in Clojure are ways to efficiently process and transform data, especially when working with large datasets.

Here's a simple explanation:

Transducers:

  • Transducers are functions that can be composed together to create a sequence of transformations.
  • They allow you to define a series of operations (like mapping, filtering, etc.) that can be applied to any collection of data, without having to worry about the specific data structure.
  • Transducers are "context-independent" - they don't care about the input or output data structures, they just focus on the transformations.
  • This makes transducers very flexible and reusable. You can combine them in different ways to create complex data pipelines.

Eduction:

  • Eduction is a way to apply a transducer to a collection of data without creating intermediate data structures.
  • Normally, when you apply a series of transformations to a collection (e.g. map, filter, etc.), you end up creating a new collection at each step.
  • With eduction, the transformations are applied "on the fly" as the data is consumed, without creating those intermediate collections.
  • This can be much more efficient, especially when working with large datasets, because you don't have to allocate memory for all the intermediate results.
  • eduction is a Clojure core function that takes transducers as arguments and captures the transduction process together with the source collection. It applies the transducers to the input collection, but the result is a reducible/iterable, not a concrete data structure. You need to use functions like iterator-seq to get a sequence from the reducible result.

In summary, transducers allow you to define reusable data transformations, and eduction allows you to apply those transformations efficiently without creating unnecessary data structures. Together, they provide a powerful way to build composable, high-performance data processing pipelines in Clojure.

Eduction is best used when the result will be completely consumed in a reducible context. But transducers can be used with other functions as well, depending on the specific use case.
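A minimal example tying the two together (the sums are easy to check by hand):

(def xf
  (comp (map inc)
        (filter even?)))

;; transduce applies the transducer eagerly and produces a concrete result
(transduce xf + 0 (range 10))
;; => 30

;; eduction captures the same transformation as a reducible/iterable value;
;; no intermediate collections are built until something consumes it
(def ed (eduction xf (range 10)))

(reduce + 0 ed)
;; => 30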


Clojure 1.11.3

Clojure 1.11.3 is now available.

  • CLJ-2843 - Reflective calls to Java methods that take primitive long or double now work when passed a narrower boxed number at runtime (Integer, Short, Byte, Float). Previously, these methods were not matched during reflection and an error was thrown.

Java 21 added an overload to the method Thread/sleep in the 1-arity. When upgrading to Java 21, existing Clojure calls to Thread/sleep become reflective, but continue to work. As usual, you can detect reflection with *warn-on-reflection* and address with a type hint (here, ^long) to choose the desired overload. Previously, passing a Short or Integer value to a reflective call like Thread/sleep that takes a long would not match, that has been corrected.
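A minimal illustration of detecting and fixing the reflective call, following the release note above:

(set! *warn-on-reflection* true)

(defn pause [ms]
  (Thread/sleep ms))         ;; reflective on Java 21+: two 1-arity overloads match

(defn pause-hinted [ms]
  (Thread/sleep ^long ms))   ;; the ^long hint selects the primitive overload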


Senior Clojure Back-End Engineer at Peruse Technology Inc


130000 - 220000

Responsibilities:

Development and support of an application, database and associated APIs.

Required Technical Skills:

Clojure, Datomic, GraphQL and general AWS experience (some or all of: Lambda, API gateway, EKS, Open Search, Textract, Cloudwatch, Cloudfront, Cognito, SES, Route 53, DynamoDB, etc.)

Other Details:

Must be US or Canada based.

Must be available to start within a week of the interview.

This is a full-time, 40 hour/week role.

May be either a temporary or long-term role, depending on your preference and the mutual fit.

You must be interested in working in a fast-paced, start-up environment.  If your only experience is working with a large company, this probably isn’t a fit for you.



Clojure Goodness: Pretty Printing Collection Of Maps #Clojure

The namespace clojure.pprint has some useful functions to pretty print different data structures. The function print-table is particularly useful for printing a collection of maps, where each map represents a row in the table and the keys of the maps represent the column headers. The print-table function accepts the collection as an argument and prints the table to the console (or any writer that is bound to the *out* var). We can also pass a vector with the keys we want to include in the table; only the keys we specify are in the output. The order of the keys in the vector we pass as an argument is also preserved in the generated output.
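A small example of both forms (the data is made up):

(require '[clojure.pprint :refer [print-table]])

(def people
  [{:name "Ada"   :language "Clojure" :since 2015}
   {:name "Grace" :language "Java"    :since 2009}])

;; print all keys as columns
(print-table people)

;; print only the selected columns, in this order
(print-table [:language :name] people)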


Keeping the :argslist of Clojure functions DRY

The problem: I had a case where I had several “pairs” of functions that shared their interface, and I wanted to type as little as possible and not have to remember to keep their argument lists and doc strings in sync. The functions return SVG, and the interface is a configuration map. One of the functions in a pair returns Hiccup, because that's what I need for the components rendering the SVG.
Read this ↗️


Keeping the :arglists of Clojure functions DRY

The problem: function-a takes the same arguments as function-b. In fact, function-a calls function-b. Without too much synchronized updating of the function signatures, I want (doc function-a) to show me the same argument lists as (doc function-b). The solution: Use :arglists in the metadata of the function.

(defn function-b [{:keys [a b c]}]
  (println a b c))

(defn function-a
  {:arglists '([{:keys [a b c]}])}
  [args]
  (function-b args))

That’s the TL;DR. Read on for some rationale, and for some nerdy diving into the worlds of static and dynamic analysis.
Read this ↗️


53: Clojure LSP with Eric Dallo

Eric Dallo talks about the LSP protocol and Clojure LSP. Sorry about the audio quality on this recording, I missed that I was using my MacBook microphone instead of my podcast microphone. Links: Clojure LSP, Langserver.org, lsp-mode, clj-kondo analysis data, clojure-lsp-intellij


Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.