SynaptiQ is 3E’s SaaS product, an independent, evolutionary software suite for asset management of renewable energy portfolios. SynaptiQ collects near-real-time data from more than 20 million devices across ten thousand utility-scale and commercial solar and wind sites around the globe.
We develop and operate advanced analytical services to enrich the monitoring data by:
satellite imaging data,
meteorological modelling,
advanced system modelling,
machine learning & artificial intelligence.
The platform spans big data, high-performance processing, IoT protocols and AI, and is the product of a multidisciplinary team of developers, scientists, renewable energy architects, electrical engineers, and sales professionals who implement, operate and commercialize SynaptiQ worldwide.
The added value realized by SynaptiQ is performance improvement and operational cost reduction for its Operations & Maintenance customers.
What you will be doing
We are looking for a Back-End Developer with experience in Clojure and a passion for creating efficient, scalable, and accessible back-end systems. At 3E, you will have the opportunity to work on meaningful projects that contribute to the advancement of renewable energy technologies and digitalization.
Responsibilities
Develop, test, and maintain robust, performant, and scalable back-end codebase in accordance with design and product requirements.
Identify and optimize performance bottlenecks in the application (and MySQL db) using best practices and techniques.
Use Polylith, Reitit, and Malli to build efficient, maintainable APIs.
Design and develop efficient business-logic processors that act on data.
Conduct thorough code reviews and enforce good back-end practices to ensure quality and maintainability of code.
Collaborate with multidisciplinary teams of developers, scientists, renewable energy architects, electrical engineers, and sales enthusiasts to achieve project goals (in-person or via email/MS Teams).
Track project evolution (Jira).
Maintain a codebase that is easy to understand, modify, and extend, and adhere to coding standards and best practices.
Requirements
To fulfil this role we are looking for someone with:
Minimum of 3 years of experience in Clojure, OR significant open-source contributions that demonstrate a comparable level of skill.
A proven track record of optimizing performance bottlenecks, enforcing good back-end practices.
Good understanding of performance optimization techniques in the context of Clojure & MySQL.
Product-based experience – supporting and modifying a product through several years – living with the decisions of the past and building on top of them.
Experience with refactoring a codebase as new features are written.
Bonus points for:
Good profiling skills (JVM profiler).
Other Lisp dialects (e.g. SBCL).
Knowledge of Docker.
Benefits
Our offices are hidden in the centre of Brussels with a view on a pond, where ducks and a heron pay regular visits. In addition to a stimulating atmosphere in a highly motivated group of people, 3E offers a unique opportunity to develop yourself further in a company and team with an ambitious growth plan, delivering innovative services.
Furthermore:
Flexible & gliding working hours
Open to fully remote candidates
An international environment with colleagues of 25+ nationalities and projects in over 100 countries.
An open-minded company where everybody can bring their ideas to the table
During Clojure South, Marlon Silva, Senior Software Engineer at Nubank, shared his perspective on a recurring challenge for software engineers working with AI today: how to move beyond using AI assistants and start engineering reliable, task-oriented AI agents.
According to Marlon, the industry has made it increasingly easy to consume AI — APIs, copilots, and assistants are everywhere — but building AI-powered systems still requires engineers to make a series of low-level, often uncomfortable decisions. In his talk, he focused on demystifying those decisions, framing AI not as a black box but as infrastructure and integrations that need to be reasoned about explicitly.
Rather than presenting a new framework or abstraction layer, Marlon walked through the architectural choices he believes matter most when building agents in practice, and explained why Clojure offers a particularly strong foundation for exploring this space.
Infrastructure first: where your models live matters
Marlon started by arguing that any serious AI initiative begins with infrastructure — specifically, how teams access models. While this decision is often treated as an implementation detail, he emphasized that it directly impacts scalability, experimentation, security, and integration with existing systems.
From his perspective, engineers typically face two options:
Direct AI vendors: Marlon noted that these providers are an excellent entry point for individual developers. Signing up is straightforward, APIs are well-documented, and it is possible to start experimenting almost immediately. For learning and early exploration, this path minimizes friction.
Cloud providers: For organizations, however, Marlon argued that leveraging existing cloud relationships is usually the better long-term decision. Most companies already have accounts, billing, security controls, and observability in place. Cloud providers like AWS and GCP make it possible to access models from multiple AI labs without introducing new suppliers into the stack.
According to Marlon, when teams are operating inside an organization, the most pragmatic default is to use the models already available through their cloud providers. This removes procurement overhead and allows engineers to focus on building systems instead of managing vendors.
The fragmentation problem: too many APIs
Once access to models is established, Marlon pointed out a second, inevitable problem: API fragmentation. Each provider exposes different request formats, parameters, and SDKs, which quickly complicates development and makes experimentation costly.
In his talk, Marlon described this fragmentation as one of the first scaling pain points teams encounter when AI moves beyond a single script or proof of concept.
To address this, he introduced LiteLLM as a practical unification layer. LiteLLM is a proxy that standardizes access to multiple models and providers behind a single API, regardless of where the model is hosted.
Marlon highlighted three concrete benefits of this approach:
A unified interface, allowing teams to switch models without rewriting integration code.
Centralized observability, creating a single point for logging, debugging, and auditing model interactions.
Cost control, which he emphasized as critical. Token usage scales quickly in production, and LiteLLM enables organizations to track and limit usage per team, service, or key.
From Marlon’s perspective, this kind of proxy is not an optimization: it becomes foundational infrastructure as soon as AI is part of a real system.
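To make this concrete: since LiteLLM exposes an OpenAI-compatible endpoint, calling a proxied model from Clojure can be a plain HTTP POST. The following is a minimal sketch, not code from the talk; the port, model alias, and key are placeholders, and clj-http and cheshire are assumed as the HTTP and JSON libraries:

```clojure
(require '[clj-http.client :as http]
         '[cheshire.core :as json])

;; LiteLLM speaks the OpenAI-compatible /chat/completions protocol
;; regardless of which provider hosts the underlying model.
(defn chat [prompt]
  (-> (http/post "http://localhost:4000/v1/chat/completions" ; placeholder proxy URL
                 {:headers {"Authorization" "Bearer sk-local-key"} ; placeholder key
                  :content-type :json
                  :body (json/generate-string
                         {:model "bedrock-llama4-scout" ; alias configured in the proxy
                          :messages [{:role "user" :content prompt}]})})
      :body
      (json/parse-string true)
      (get-in [:choices 0 :message :content])))
```

Because the interface is uniform, swapping the backing model is a change to the proxy configuration, not to this code.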
Why smaller models often work better for agents
Marlon then challenged a common assumption in the AI space: that larger models are always better.
For task-oriented AI agents, he argued, this is rarely true. Citing recent research from Nvidia, Marlon explained that Small Language Models (SLMs) are often a better fit for agents designed to execute specific, well-defined actions.
According to him, the broad generalization capabilities of large LLMs, models capable of writing essays or long-form prose, are unnecessary for most agent workloads. Using them in these contexts leads to wasted capacity, higher costs, and increased complexity.
He outlined several practical advantages of SLMs:
Cost efficiency: models like Llama 4 Scout on AWS Bedrock cost orders of magnitude less per token than large proprietary models.
Lower energy consumption: a more responsible, eco-friendly choice at scale.
Feasible fine-tuning: adapting a 7–10B parameter model to a specific domain is realistic, whereas doing the same with very large models is out of reach for most companies.
For Marlon, choosing SLMs is not a compromise but an engineering decision aligned with the actual requirements of agent-based systems.
Why Clojure fits this problem space
From there, Marlon shifted focus to tooling. He explained that AI development is inherently experimental: prompts change, parameters are adjusted, models are swapped, and assumptions are constantly tested. In that context, developer feedback loops matter. This is where Clojure stands out.
Marlon described Clojure as an ergonomic language — not because of syntax alone, but because of its REPL-driven development model. The ability to evaluate functions incrementally, inspect results immediately, and iterate without restarting an application fundamentally changes how engineers explore problem spaces.
In his experience, this interactive workflow aligns closely with how AI systems are built and refined. Beyond the REPL, Marlon highlighted two interoperability advantages:
Java interoperability: Because Clojure runs on the JVM, it has seamless access to the Java ecosystem. Cloud SDKs, HTTP clients, observability tools, and mature libraries are immediately available.
Python interoperability: With libraries such as libpython-clj, Clojure can import and execute Python code directly. While not as seamless as Java interop, this capability allows engineers to reuse Python-based AI tooling without abandoning Clojure’s interactive workflow.
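For a flavor of what that looks like, here is a minimal sketch (not code from the talk): the model id is a placeholder, and the litellm package is assumed to be installed in the embedded Python environment.

```clojure
(require '[libpython-clj2.python :as py]
         '[libpython-clj2.require :refer [require-python]])

;; Start an embedded Python interpreter inside the JVM process.
(py/initialize!)

;; Import a Python module as if it were a Clojure namespace.
(require-python '[litellm :as llm])

;; Call litellm.completion from the REPL (model name is a placeholder).
(llm/completion :model "bedrock/llama4-scout"
                :messages [{:role "user" :content "Hello from Clojure!"}])
```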
For Marlon, this combination makes Clojure a strong orchestration layer for AI systems that need to integrate with multiple ecosystems.
From theory to practice: the live demonstration
To make these ideas concrete, Marlon walked through a live demonstration.
He started with simple Python scripts that sent text, images, and PDFs to Bedrock-hosted models via a local LiteLLM proxy. These examples established a baseline using familiar tooling.
Next, he reproduced the same workflows in Clojure by importing the Python LiteLLM package directly into the Clojure runtime. Python functions were called from Clojure code, with inputs and outputs handled interactively.
According to Marlon, the most important part of the demo was not that this approach works, but how it changes the development experience. Python-based workflows often require frequent context switching — editing files, running scripts, and restarting processes. With Clojure and the REPL, the entire feedback loop stays inside the editor.
For exploratory domains like AI, Marlon argued, this difference directly translates into faster iteration and deeper focus.
Conclusion
Marlon closed by emphasizing that AI agents remain a young and rapidly evolving field. Libraries, architectures, and best practices are still in flux, which makes flexibility a key requirement.
He also offered a note of caution: granting models excessive autonomy without clear boundaries and controls can lead to fragile systems. In his view, frameworks that favour obscure control flow and rely mostly on the LLM itself encourage a “ship and pray” approach to AI development. At the same time, Marlon pointed out that new agent architectures are actively emerging from research groups at organizations like Nvidia and DeepMind, signaling that significant changes are still ahead.
His conclusion was pragmatic: combining well-chosen infrastructure, models sized to the problem, and tools that favor exploration creates a solid foundation for building AI systems grounded in engineering discipline. The repository shared during the talk serves as a starting point for engineers interested in continuing that exploration.
Syntax highlighting is a tool. It can help you read code faster. Find things quicker. Orient yourself in a large file.
Like any tool, it can be used correctly or incorrectly. Let’s see how to use syntax highlighting to help you work.
Christmas Lights Diarrhea
Most color themes have a unique bright color for literally everything: one for variables, another for language keywords, constants, punctuation, functions, classes, calls, comments, etc.
Sometimes it gets so bad one can’t see the base text color: everything is highlighted. What’s the base text color here?
The problem with that is, if everything is highlighted, nothing stands out. Your eye adapts and considers it a new norm: everything is bright and shiny, and instead of getting separated, it all blends together.
Here’s a quick test. Try to find the function definition here:
and here:
See what I mean?
So yeah, unfortunately, you can’t just highlight everything. You have to make decisions: what is more important, what is less. What should stand out, what shouldn’t.
Highlighting everything is like assigning “top priority” to every task in Linear. It only works if most of the tasks have lesser priorities.
If everything is highlighted, nothing is highlighted.
Enough colors to remember
There are two main use-cases you want your color theme to address:
Look at something and tell what it is by its color (you can tell by reading text, yes, but why do you need syntax highlighting then?)
Search for something. You want to know what to look for (which color).
1 is a direct index lookup: color → type of thing.
2 is a reverse lookup: type of thing → color.
Truth is, most people don’t do these lookups at all. They might think they do, but in reality, they don’t.
Let me illustrate. Before:
After:
Can you see it? I misspelled return as retunr, and its color switched from red to purple.
I can’t.
Here’s another test. Close your eyes (not yet! Finish this sentence first) and try to remember: what color does your color theme use for class names?
Can you?
If the answer to both questions is “no”, then your color theme is not functional. It might give you comfort (as in: I feel safe; if it’s highlighted, it’s probably code), but you can’t use it as a tool. It doesn’t help you.
What’s the solution? Have an absolute minimum of colors. So little that they all fit in your head at once. For example, my color theme, Alabaster, only uses four:
Green for strings
Purple for constants
Yellow for comments
Light blue for top-level definitions
That’s it! And I was able to type it all from memory, too. This minimalism allows me to actually do lookups: if I’m looking for a string, I know it will be green. If I’m looking at something yellow, I know it’s a comment.
Limit the number of different colors to what you can remember.
If you swap green and purple in my editor, it’ll be a catastrophe. If somebody swapped colors in yours, would you even notice?
What should you highlight?
Something there isn’t a lot of. Remember—we want highlights to stand out. That’s why I don’t highlight variables or function calls—they are everywhere, your code is probably 75% variable names and function calls.
I do highlight constants (numbers, strings). These are usually used more sparingly and often are reference points—a lot of logic paths start from constants.
Top-level definitions are another good idea. They give you an idea of the structure quickly.
Punctuation is worth dimming: it helps separate names from syntax a little, and you care about names first, especially when quickly scanning code.
Please, please don’t highlight language keywords: class, function, if, else, stuff like this. You rarely look for them: “where’s that if” is a valid question, but you will be looking not at the if keyword itself, but at the condition after it. The condition is the important, distinguishing part. The keyword is not.
Highlight names and constants. Grey out punctuation. Don’t highlight language keywords.
Comments are important
The tradition of using grey for comments comes from the times when people were paid by line. If you have something like this comment that merely restates the code (a made-up stand-in; the original shows a screenshot):
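```clojure
;; increment the counter by one
(swap! counter inc) ;; `counter` is a hypothetical atom
```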
of course you would want to grey it out! This is bullshit text that doesn’t add anything and was written to be ignored.
But for good comments, the situation is opposite. Good comments ADD to the code. They explain something that couldn’t be expressed directly. They are important.
So here’s another controversial idea:
Comments should be highlighted, not hidden away.
Use bold colors, draw attention to them. Don’t shy away. If somebody took the time to tell you something, then you want to read it.
Two types of comments
Another secret nobody is talking about is that there are two types of comments:
Explanations
Disabled code
Most languages don’t distinguish between those, so there’s not much you can do syntax-wise. Sometimes there’s a convention (e.g. -- vs /* */ in SQL), then use it!
Here’s a real example from Clojure codebase that makes perfect use of two types of comments:
Disabled code is gray, explanation is bright yellow
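In Clojure, that split might look something like this (an illustrative sketch, not the actual code from the screenshot; `users` is a placeholder):

```clojure
;; We sort by id so pagination stays stable across refreshes.  <- explanation
(def sorted-users (sort-by :id users))

;; (println "DEBUG" sorted-users)                              <- disabled code
```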
Light or dark?
According to statistics, 70% of developers prefer dark themes. As someone in the other 30%, I’ve always been puzzled by that. Why?
And I think I have an answer. Here’s a typical dark theme:
and here’s a light one:
On the latter one, colors are way less vibrant. Here, I picked them out for you:
Notice how many colors there are. No one can remember that many.
This is because dark colors are in general less distinguishable and more muddy. Look at the hue scale as we move brightness down:
Basically, in the dark part of the spectrum, you just get fewer colors to play with. There’s no “dark yellow” or good-looking “dark teal”.
Nothing can be done here. There are no magic colors hiding somewhere that have good contrast on a white background and look good at the same time. By choosing a light theme, you are dooming yourself to a very limited, bad-looking, barely distinguishable set of dark colors.
So it makes sense. Dark themes do look better. Or rather: light ones can’t look good. Science ¯\_(ツ)_/¯
But!
But.
There is one trick you can do, that I don’t see a lot of. Use background colors! Compare:
The first one has nice colors, but the contrast is too low: letters become hard to read.
The second one has good contrast, but you can barely see colors.
The last one has both: high contrast and clean, vibrant colors. Lighter colors are readable even on a white background since they fill a lot more area. Text is the same brightness as in the second example, yet it gives the impression of clearer color. It’s all upside, really.
UI designers have known about this trick for a while, but I rarely see it applied in code editors:
If your editor supports choosing background color, give it a try. It might open light themes for you.
Bold and italics
Don’t use. This goes into the same category as too many colors. It’s just another way to highlight something, and you don’t need too many, because you can’t highlight everything.
In theory, you might try to replace colors with typography. Would that work? I don’t know. I haven’t seen any examples.
Using italics and bold instead of colors
Myth of number-based perfection
Some themes pay too much attention to being scientifically uniform: all colors have the same exact lightness, and hues are distributed evenly on a circle.
This could be nice to know (if you have OCD), but in practice, it doesn’t work as well as it sounds:
The idea of highlighting is to make things stand out. If you make all colors the same lightness and chroma, they will look very similar to each other, and it’ll be hard to tell them apart.
Our eyes are way more sensitive to differences in lightness than in color, and we should use it, not try to negate it.
Let’s design a color theme together
Let’s apply these principles step by step and see where it leads us. We start with the theme from the start of this post:
First, let’s remove highlighting from language keywords and re-introduce base text color:
Next, we remove color from variable usage:
and from function/method invocation:
The thinking is that your code is mostly references to variables and method invocation. If we highlight those, we’ll have to highlight more than 75% of your code.
Notice that we’ve kept variable declarations. These are not as ubiquitous and help you quickly answer a common question: where does this thing come from?
Next, let’s tone down punctuation:
I prefer to dim it a little bit because it helps names stand out more. Names alone can give you the general idea of what’s going on, and the exact configuration of brackets is rarely equally important.
But you might roll with base color punctuation, too:
Okay, getting close. Let’s highlight comments:
We don’t use red here because you usually need it for squiggly lines and errors.
This is still one color too many, so I unify numbers and strings to both use green:
Finally, let’s rotate colors a bit. We want to respect nesting logic, so function declarations should be brighter (yellow) than variable declarations (blue).
Compare with what we started:
In my opinion, we got a much more workable color theme: it’s easier on the eyes and helps you find stuff faster.
It’s also been ported to many other editors and terminals; the most complete list is probably here. If your editor is not on the list, try searching for it by name—it might be built-in already! I always wondered where these color themes come from, and now I’ve become the author of one (and I still don’t know).
Feel free to use Alabaster as is or build your own theme using the principles outlined in the article—either is fine by me.
As for the principles themselves, they worked out fantastically for me. I’ve never wanted to go back, and just one look at any “traditional” color theme gives me a scare now.
I suspect that the only reason we don’t see more restrained color themes is that people never really thought about it. Well, this is your wake-up call. I hope this will inspire people to use color more deliberately and to change the default way we build and use color themes.
I have a weird relationship with statistics: on one hand, I try not to look at it too often. Maybe once or twice a year. It’s because analytics is not actionable: what difference does it make if a thousand people saw my article or ten thousand?
I mean, sure, you might try to guess people’s tastes and only write about what’s popular, but that will destroy your soul pretty quickly.
On the other hand, I feel nervous when something is not accounted for, recorded, or saved for future reference. I might not need it now, but what if ten years later I change my mind?
Seeing your readers also helps to know you are not writing into the void. So I really don’t need much, something very basic: the number of readers per day/per article, maybe, would be enough.
Final piece of the puzzle: I self-host my web projects, and I use an old-fashioned web server instead of delegating that task to Nginx.
Static sites are popular and for a good reason: they are fast, lightweight, and fulfil their function. I, on the other hand, might have an unfinished gestalt or two: I want to feel the full power of the computer when serving my web pages, to be able to do fun stuff that is beyond static pages. I need that freedom that comes with a full programming language at your disposal. I want to program my own web server (in Clojure, sorry everybody else).
Existing options
All this led me on a quest for a statistics solution that would uniquely fit my needs. Google Analytics was out: bloated, not privacy-friendly, terrible UX, Google is evil, etc.
What is going on?
Some other JS solution might’ve been possible, but still questionable: SaaS? Paid? Will they be around in 10 years? Self-host? Are their cookies GDPR-compliant? How to count RSS feeds?
Nginx has access logs, so I tried server-side statistics that feed off those (namely, Goatcounter). Easy to set up, but then I needed to create domains for them, manage accounts, monitor the process, and it wasn’t even performant enough on my server/request volume!
My solution
So I ended up building my own. You are welcome to join, if your constraints are similar to mine. This is how it looks:
It’s pretty basic, but does a few things that were important to me.
Setup
Extremely easy to set up. And I mean it as a feature.
Just add our middleware to your Ring stack and get everything automatically: collecting and reporting.
```clojure
(def app
  (-> routes
      ...
      (ring.middleware.params/wrap-params)
      (ring.middleware.cookies/wrap-cookies)
      ...
      (clj-simple-stats.core/wrap-stats))) ;; <-- just add this
```
It’s zero setup in the best sense: nothing to configure, nothing to monitor, minimal dependency. It starts to work immediately and doesn’t ask anything from you, ever.
See, you already have your web server, why not reuse all the setup you did for it anyway?
Request types
We distinguish between request types. In my case, I am only interested in live people, so I count them separately from RSS feed requests, favicon requests, redirects, wrong URLs, and bots. Bots are particularly active these days. Gotta get that AI training data from somewhere.
RSS feeds are live people in a sense, so extra work was done to count them properly. The same reader requesting feed.xml 100 times in a day is only counted once.
Hosted RSS readers often report user count in User-Agent, like this:
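For example, in the format Feedbin uses (an illustrative sample, not a real log line):

```
Feedbin feed-id:1234 - 42 subscribers
```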
My personal respect and thank you to everybody on this list. I see you.
Graphs
Visualization is important, and so is choosing the correct graph type. This is wrong:
A continuous line suggests interpolation. It reads as if, between 1 visit at 5 am and 11 visits at 6 am, there were moments with 2, 3, 5, and 9 visits in between. Maybe even 5.5 visits! That is not the case.
This is how a semantically correct version of that graph should look:
Some attention was also paid to having reasonable labels on axes. You won’t see something like 117, 234, 10875. We always choose round numbers appropriate to the scale: 100, 200, 500, 1K etc.
It goes without saying that all graphs have the same vertical scale and synchronized horizontal scroll.
Insights
We don’t offer much (as I don’t need much), but you can narrow reports down by page, query, referrer, user agent, and any date slice.
Not implemented (yet)
It would be nice to have some insights into “What was this spike caused by?”
Some basic breakdown by country would be nice. I do have IP addresses (for what they are worth), but I need a way to package GeoIP into some reasonable size (under 1 Mb, preferably; some loss of resolution is okay).
Finally, one thing I am really interested in is “Who wrote about me?” I do have referrers, only question is how to separate signal from noise.
Performance. DuckDB is a good sport: it compresses data and runs columnar queries, so storing extra columns per row doesn’t affect query performance. Still, each dashboard hit is a query across the entire database, which at this moment (~3 years of data) sits around 600 MiB. I definitely need to look into building some pre-calculated aggregates.
You compile your code to .so, export some functions with extern "C", write bindings in the target language. Done, interoperability achieved.
Except that's not how it works in practice. At least not when your lib is a database written in Clojure, compiled with GraalVM native-image, that needs to manage Git and Lucene internally, and will be called from Rust in environments ranging from dev laptops to CI containers with 8MB of stack.
Episode 12 of "Clojure in product. Would you do it again?" is live — Marcin Maicki, Global Data Developer & Lead Developer at Dentons, joins Artem Barmin and Vadym Kostiuk to talk about running Clojure inside a large, decentralized enterprise.
Highlights:
- Marcin’s Clojure origin story: started with ClojureScript, moved into Clojure, and found the functional mindset a natural fit coming from React.
- How Clojure landed at Dentons: a conscious choice for a focused referral‑network venture that valued expressiveness and small teams.
- Practical stack and ops: Postgres, Elasticsearch, Reagent/Re‑frame + Material UI, Metabase; Marcin also works on PySpark/Databricks in his global data role.
- Maintenance and risk: why they’re migrating away from old, unmaintained libs; regular security scans and external testing make dependency health a real concern.
- Team, onboarding, and hiring: a small Clojure pod (Marcin + one dev, testers, DevOps); knowledge sharing, docs, and close pairing are the onboarding tools — hiring remains the main practical blocker.
- Enterprise realities: polycentric org structure, integration friction with firm standards (Power BI, Azure), and the tradeoffs that make Clojure a strong fit in some contexts but a harder sell in others.
Watch Episode 12 to hear the full conversation and the nuances of keeping a nine‑year Clojure codebase healthy in a corporate setting.
There is a growing movement to use SQLite for everything. Kent C. Dodds
argues for defaulting to SQLite in web
development
due to its zero-latency reads and minimal operational burden. Wesley
Aptekar-Cassels makes a strong
case that SQLite works for
web apps with large user bases, provided they don't need tens of thousands of
writes per second. Discussions on Hacker News and elsewhere cite companies
like Apple, Adobe, and Dropbox using SQLite in production. Even the official
SQLite documentation encourages its
use for most websites with fewer than 100K hits per day.
These points are fair. The overarching theme is a pushback against
automatically choosing complex, client-server databases like PostgreSQL when
SQLite is often more than sufficient, simpler to manage, and faster for the
majority of use cases. I agree with that framing. The debate has settled into
a well-understood set of tradeoffs:
For "SQLite for everything"
Known limitations
Zero-latency reads as an embedded library
Write concurrency limited to a single writer
No separate server to set up or maintain
Not designed for distributed or clustered systems
Reliable, self-contained, battle-tested (most deployed DB in the world)
No built-in user management; relies on filesystem permissions
Fast enough for most human-driven web workloads
Schema migration can be more complex in large projects
These are the terms of the current discussion. But there is an important,
often overlooked dimension missing from this framing.
SQLite struggles with complex queries. More specifically, SQLite is not
well-suited to handle the kind of multi-join queries that arise naturally in any
serious production system. This goes beyond the usual talking points about
deployment concerns (write concurrency, distribution, and so on). It points to
a system-level limitation: the query optimizer itself. That limitation matters even for
read-heavy, single-node deployments, which is exactly the use case where SQLite
is supposed to shine.
I have benchmark
evidence
showing this clearly. This post focuses on join-heavy analytical queries, not on
the many workloads where SQLite is already the right choice. But first, let me
explain why this matters more than people think.
Multi-join queries are not exotic
A common reaction to discussing multi-join queries is: "I don't write queries
with 10 joins." This usually means one of three things: the schema is
denormalized, the logic has been moved into application code, or the product
is simple. None of these mean the problem goes away.
In any system with many entity types, rich relationships, history or
versioning, permissions, and compositional business rules, multi-join queries
inevitably appear. They emerge whenever data is normalized and questions are
compositional. Here are concrete examples from real production systems.
Enterprise SaaS (CRM / ERP / HR). A query like "show me all open enterprise
deals" in a Salesforce-like system touches accounts, contacts, products,
pricebooks, territories, users, permissions, and activity logs. Real queries in
these systems routinely involve 10-20 joins. Every dimension of the business
(customers, ownership, products, pricing, regions, access control, activity
statistics) is often normalized into its own table.
Healthcare (EHR). "Patients with condition X, treated by doctors in
department Y, prescribed drug Z in the last 6 months, and whose insurance covers
that drug" spans patients, visits, diagnoses, providers, departments, prescriptions,
drugs, insurance plans, coverage rules, and claims. Exceeding 15 joins is
common.
E-commerce and Marketplaces. "Orders in the last 30 days that include
products from vendor V, shipped late, refunded, with customers in region R"
touches orders, order items, products, vendors, shipments, delivery events,
refunds, customers, addresses, regions, and payment methods. Again, 10+ joins.
Authorization and Permission systems. "Which documents can user U see?"
requires traversing users, groups, roles, role assignments, resource
policies, ACLs, inheritance rules, and organizational hierarchies. This
alone can be 12+ joins, sometimes recursive.
Analytics and BI. Star schemas look simple on paper, but real dashboard
queries add slowly changing dimensions, hierarchy tables, permission joins,
and attribution models. A "simple" dashboard query often hits 6-10 dimension
tables plus access control.
Knowledge graphs and semantic systems. "Papers authored by people affiliated
with institutions collaborating with company X on topic Y" requires joining
papers, authors, affiliations, institutions, collaborations, and topics.
Very common in search and recommendation systems.
Event sourcing and temporal queries. Reconstructing the state of an
account at a point in time with approval chains requires joining entity
tables, event tables, approval tables, history tables, and version joins.
Temporal dimensions multiply join counts quickly.
AI / ML feature pipelines. Feature stores generate massive joins.
Assembling a feature vector often requires joining user profiles, sessions,
events, devices, locations, and historical aggregates. This is why feature
stores are expensive.
The pattern is consistent across domains:
| Domain | Typical join count |
|---|---|
| SaaS CRM / ERP | 8-20 |
| Healthcare | 10-25 |
| Authorization | 6-15 |
| BI dashboards | 6-12 |
| Knowledge graphs | 10-30 |
| Feature pipelines | 8-20 |
Complex joins are not accidental. They emerge from normalized data, explicit
relationships, compositional business rules, layered authorization, and
historical records. Again, if you don't see many joins in your system, it usually
means the schema is denormalized, the logic is in the application layer, or
the product hasn't reached sufficient complexity yet. This does not mean the
system is better. It often means complexity has been pushed into the
application layer, which can add engineering cost without adding real value.
The evidence: JOB benchmark
The Join Order Benchmark
(JOB) is a standard
benchmark designed specifically to stress database query optimizers on complex
multi-join queries [1]. Based on the Internet Movie Database (IMDb), a
real-world, highly normalized dataset with over 36 million rows in its largest
table, it contains 113 analytical queries with 3 to 16 joins each, averaging
8 joins per query. Unlike synthetic benchmarks like TPC, JOB uses real data
with realistic data distributions, making it a much harder test of query
optimization.
I ran this benchmark comparing three databases: SQLite (via JDBC),
PostgreSQL 18, and Datalevin (an
open-source database I build). All were tested in default configurations
with no tuning, on a MacBook Pro M3 Pro with 36GB RAM. This is not a tuning
shootout, but a look at out-of-the-box optimizer behavior. Details of the
benchmark methodology can be found
here.
Overall wall clock time
| Database | Total time (113 queries) |
|---|---|
| Datalevin | 93 seconds |
| PostgreSQL | 171 seconds |
| SQLite | 295 seconds (excluding 9 timeouts) |
SQLite needed a 60-second timeout per query, and 9 queries failed to complete
within that limit. The actual total time for SQLite would be substantially
higher if these were included. For example, query 10c, when allowed to run to
completion, took 446.5 seconds.
Execution time statistics (milliseconds)
| Database | Mean | Median | Min | Max |
|---|---|---|---|---|
| Datalevin | 773 | 232 | 0.2 | 8,345 |
| PostgreSQL | 1,507 | 227 | 3.5 | 36,075 |
| SQLite | 2,837 | 644 | 8.1 | 37,808 |
The median tells the story: SQLite's median is nearly 3x worse than
the other two.
Per-query speedup: Datalevin vs. SQLite
The chart at the top of this post shows the speedup ratio (SQLite time /
Datalevin time) for each of the queries on a logarithmic scale (excluding 9 timeouts). Points
above the 1x line (10^0) mean Datalevin is faster; points below mean SQLite
is faster. The horizontal lines mark 1x, 10x, and 100x speedups.
Several patterns stand out:
The vast majority of points are above the 1x line, often by 10x or more.
For the hardest queries, Datalevin achieves 100x+ speedups. These are
precisely the complex multi-join queries where SQLite's optimizer breaks
down.
SQLite is rarely faster, and when it is, the margin is small.
The 9 timed-out queries (not shown) would push the ratio even higher.
Where SQLite breaks down
Timeouts. Queries 8c, 8d, 10c, 15c, 15d, 23a, 23b, 23c, and 28c
all timed out at the 60-second limit during the benchmark runs. These
represent queries with higher join counts where SQLite's optimizer failed
to find an efficient plan.
Extreme slowdowns. Even among queries that completed, SQLite was often
dramatically slower. Query 9d took 37.8 seconds on SQLite versus 1.6 seconds
on Datalevin (24x). Query 19d took 20.8 seconds versus 5.7 seconds. Query
families 9, 10, 12, 18, 19, 22, and 30 all show SQLite performing
significantly worse, often by 10-50x.
Why SQLite falls behind
SQLite's query optimizer has fundamental limitations for complex joins:
Limited join order search. SQLite uses exhaustive search for join
ordering only up to a limited number of tables. Beyond that threshold, it
falls back to heuristics that produce poor plans for complex queries.
Weak statistics model. SQLite's cardinality estimation is simpler than
PostgreSQL's, which itself has well-documented weaknesses [1]. With fewer
statistics to guide optimization, SQLite makes worse choices about which
tables to join first and which access methods to use.
No cost-based plan selection for complex cases. For queries with many
tables, SQLite's planner cannot explore enough of the plan space to find
good join orderings. The result is plans that process orders of magnitude
more intermediate rows than necessary.
These limitations are architectural; they are not bugs likely to be fixed in a
near-term release. They reflect design tradeoffs inherent in SQLite's goal of
being a lightweight, embedded database.
What this means for "SQLite in production"
SQLite is excellent for what it was designed to be: an embedded database for
applications with simple query patterns. It excels as a local data store, a
file format, and a cache. For read-heavy workloads with straightforward
queries touching a few tables, it works extremely well.
But the production systems described above (CRM, EHR, e-commerce,
authorization, analytics) are precisely where SQLite's query optimizer
becomes a bottleneck. These are not hypothetical workloads, but the
day-to-day reality of systems that serve businesses and users.
The "SQLite in production" advocates often benchmark simple cases: key-value
lookups, single-table scans, basic CRUD operations. On those workloads, SQLite
does extremely well. But production systems grow. Schemas become more normalized as
data integrity requirements increase. Questions become more compositional as
business logic matures. And at that point, the query optimizer becomes the
bottleneck, not the network round trip to a database server.
Before choosing SQLite for a production system, ask: will our queries stay
simple forever? If the answer is no, and it usually is, the savings in
deployment simplicity may not be worth the cost in query performance as the
system grows.
An alternative approach
In a previous
post,
I described how Datalevin, a triplestore using Datalog, handles these complex
queries effectively. Its query optimizer uses counting and sampling on its
triple indices to produce accurate cardinality estimates, resulting in
better execution plans. Unlike row stores, where cardinality estimation is
notoriously difficult due to bundled storage, a triplestore can count and
sample individual data atoms directly.
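As a sketch of what such a compositional query looks like in Datalevin's
Datalog (the schema and attribute names below are invented for illustration,
not taken from the benchmark):

```clojure
(require '[datalevin.core :as d])

;; Hypothetical e-commerce schema; every attribute name here is an assumption.
(def conn
  (d/get-conn "/tmp/shop-db"
              {:order/customer {:db/valueType :db.type/ref}
               :order/item     {:db/valueType :db.type/ref
                                :db/cardinality :db.cardinality/many}
               :item/product   {:db/valueType :db.type/ref}
               :product/vendor {:db/valueType :db.type/ref}}))

;; "Orders containing products from vendor V, for customers in region R" —
;; each :where clause corresponds to one join in the SQL formulation.
(d/q '[:find ?order
       :in $ ?vendor-name ?region
       :where
       [?vendor   :vendor/name     ?vendor-name]
       [?product  :product/vendor  ?vendor]
       [?item     :item/product    ?product]
       [?order    :order/item      ?item]
       [?order    :order/customer  ?customer]
       [?customer :customer/region ?region]]
     (d/db conn) "Acme" "EMEA")
```

Each triple pattern is a join the optimizer must order; the sampling-based
estimates described above are what make that ordering cheap to get right.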
This approach yields plans that are not only better than SQLite's, but
consistently better than PostgreSQL's across the full range of JOB queries.
Despite Datalevin being written in Clojure on the JVM rather than optimized C code,
it still managed to halve the total query time in the JOB benchmark. The quality of the
optimizer's decisions matters more than the raw execution speed of the engine.
For systems that need both deployment simplicity (Datalevin works as an
embedded database too) and the ability to handle complex queries as they
inevitably arise, a triplestore with a cost-based optimizer offers a practical
alternative to either SQLite or a full client-server RDBMS. It is not a silver
bullet, but it can deliver SQLite-like operational simplicity without giving
up complex-query performance.
If you have different results or have tuned SQLite to handle these queries
well, I would love to compare notes. The goal here is not to dunk on SQLite,
but to surface a missing dimension in a discussion that often defaults to
deployment tradeoffs alone.
References
[1] Leis, V., et al. "How good are query optimizers, really?" Proceedings of
the VLDB Endowment, 2015.
We are actively pursuing new work with clients old and new. We're first and foremost interested in Clojure/ClojureScript gigs, but happy to chat about any potential leads even if you yourself are not hiring.
Things we can do for you:
Long- or short-term team augmentation
Programming new features
Modernizing old codebases
Clojure / ClojureScript coaching
Make your team more productive by improving dev tooling
Consulting on how to manage YOUR open source and community
Feature work on OUR open source (all lambdaisland/Gaiwan libraries)
Architecture review
Cleaning up vibe-coded code bases that have become unmaintainable
Mitesh Shah profile photo with wild hair and a blue suit.
Our colleague Mitesh Shah is open to new work. He's a talented engineer, practical and entrepreneurial, with a lot of experience at startups. He knows how to ship, he's a great communicator, and above all he's a great person to work with. He's truly full stack, from UI/UX to the database and back.
Bettina Shzu-Juraschek, who covers legal, tax, HR, and operations at Gaiwan, can create financial overviews and close money leaks for small businesses.
2026 Conferences Preview
This year we're looking forward to attending the Babashka conference on May 8, 2026, and Dutch Clojure Days on May 9, 2026, in Amsterdam. It's always great to get together with the Clojure community in real life! We're booking a sailboat in Weesp with accommodations for up to 16 people; if you want to book a berth either in your own room or in a shared room, sign up here.
Bettina will be at FOSS Backstage and FOSS Backstage Design in Berlin from March 16-18, 2026, to push forward our open source projects. Reach out if you want to connect there.
#tea-break
At Gaiwan we share interesting reads in our #tea-break channel. Here's a selection:
Why there’s no European Google? "Can we just stop equating success with short-term economic growth? What if we used usefulness and longevity? What if we gave more value to the fundamental technological infrastructure instead of the shiny new marketing gimmick used to empty naive wallets?"
The Bitter Lesson by Rich Sutton. In the long run, general-purpose methods that leverage massive computation—specifically search and learning—consistently outperform specialized systems built on human domain expertise.
Yayyay events shares the financial statements from their events. "Organizing Lambda World this year came with a price tag of €52,000,...with a €4,000 deficit." If we had included the costs of our salaries in the budget for the Heart of Clojure conference we organized in 2024, we would have also been 5 figures in the red. 🙁
Anyone who started programming in the early 1980s might have started with Apple II BASIC, BBC BASIC or Sinclair BASIC (ZX-81 or ZX Spectrum) - and 6502 or Z80 assembler.
Those early environments were all defined by immediacy. You typed something in; the machine did something. There was no ambiguity about what the language was for.
Since then a professional programmer might have ventured through FORTRAN, C, C++, UNIX shell, Visual Basic, VBA, VB.NET, C#, F#, JavaScript and more recently Rust, Zig, Nim and Odin. Every one of those fits elegantly into a mental slot: Systems language, Application language, Functional language, Runtime language or Tooling language.
Python, oddly, doesn’t. It can be tricky, for an experienced programmer, to grasp what it is, how it relates to the other languages, or which slot to put it in. Often, they will conclude "I don't like Python" and express confusion at its vast popularity - and even primacy - in the 2020s.
A slippery language
Traditionally we’re taught to classify languages along a few axes:
compiled vs interpreted
scripting vs “real” languages
imperative vs OO vs functional
Python fits poorly into all of them.
It isn't a compiled language in the C or Rust sense: it doesn't result in a standalone executable. But it isn't purely interpreted either, since its source is compiled to bytecode before execution. It supports imperative, object-oriented and functional styles but isn’t optimized for any of them. It began as a scripting language, but today it’s used to build large, long-running systems.
So just what IS Python?
Python is not a binary-producing language
The turning point is to realize that Python is not defined by the artefact it produces.
C, C++, Rust, Zig and Fortran produce binaries that can be directly run. The output is the thing. Once compiled, the language more or less disappears.
Python doesn’t work like that.
Python source code is compiled to bytecode, and that bytecode is executed by a virtual machine. The VM, the object model, the garbage collector and the standard library are not incidental. They are Python. A Python program needs this ecosystem to run; it can't run standalone unless the runtime is bundled in with it.
In structural terms, Python sits alongside languages with runtimes and ecosystems:
.NET (C#, F#, VB.NET)
the JVM (Java, Scala, Kotlin, Clojure)
In all three cases, the runtime is the unit of execution, not the compiled artefact.
“Interpreted vs compiled” is therefore a false dichotomy. CPython parses Python source-code to an Abstract Syntax Tree (AST), compiles it to bytecode and then executes that bytecode on a VM. That’s not conceptually different from Java or .NET — just simpler and often slower, while the runtime startup overhead persists.
Python’s real role: orchestration
The solution to the puzzle of "What IS Python?" is to realize that Python is a runtime-centric language, whereupon its real role becomes obvious.
Python is not primarily about doing work. It’s about controlling work. It's exceptionally good at quickly lashing things together: more like Lego, Meccano or snap-on tooling than traditional software construction.
The most important Python libraries — NumPy, SciPy, Pandas, PyTorch, TensorFlow — are not written in Python in any meaningful sense. Python provides the API, the glue and the control flow. The heavy lifting happens in underlying libraries written in C, C++, Fortran or CUDA - anything that can expose a C ABI (Application Binary Interface).
Python performs the same role over its libraries as:
SQL over databases
shell over Unix
VBA over Office
It is an orchestration language sitting above high-performance systems. That’s why it thrives in scientific computing, data pipelines and machine learning. It lets you build rapidly and easily, with simple syntax, whilst the underlying libraries deliver the performance. So long as orchestration overhead is low, Python-based systems can scale surprisingly far.
Why Python still feels slippery
Even with this framing, Python can still feel oddly unsatisfying if you come from strongly structured languages.
Compared with .NET or the JVM, Python has:
weak static guarantees
loose module boundaries
a simpler, leakier object model
If you’re used to the discipline of C#, F# or Rust, Python can feel vague. Things work — until they don’t — and the language often declines to help you reason about correctness ahead of time.
It turns out that being able to throw things together quickly, in easy-to-understand code, with a broad ecosystem, matters far more for mass adoption than type-safety, compilation, raw performance or architectural rigidity. Make something easy, and more people will do it, more often.
Python's winning formula is to lower the barrier to entry for proof-of-concept and prototype-stage projects - much like Visual BASIC and VBA did in the 1990s - and it can even get you to MVP (Minimum Viable Product). You can always make it faster, later, by translating critical paths into a compiled language.
Getting something working, at all and quickly, turns out to be hugely more important than getting it working fast or elegantly - something shell scripting showed us as far back as the 1970s.
Clearing up potential misunderstandings
Common misconceptions are worth addressing:
“Python is slow”
Python orchestrates underlying code. In most applications, performance-critical paths live in native libraries. Only in a small number of domains — such as ultra-low-latency systems — does Python itself become the limiting factor.
“Python is a scripting language”
Historically true, as that's how it originated, but it has evolved vastly since then.
“Python is interpreted”
Better to say it is pre-compiled to bytecode that is then executed by a virtual machine.
A better language classification
A proper taxonomy therefore looks like this:
Standalone native languages
C, C++, Rust, Zig, Fortran
→ the binary is the product
Runtime ecosystems
Python, JVM languages, .NET languages
→ the runtime is the product
Host-bound scripting languages
Bash, PowerShell, VBA
→ the host environment is the product
Python belongs firmly in the second group.
A brief note for Rust and Go proponents
A common challenge from Rust or Go developers is that Python’s role is better served by “doing it properly” in a compiled language from the start.
That view makes sense — if your problem is well-specified, stable, performance-critical, and worth committing to upfront architectural constraints. In those cases, Rust or Go are often excellent choices - although these languages are more specialized than, say, C#, F# or JavaScript.
But many real-world problems do not start that way. They begin as ill-defined, exploratory or evolving systems: data pipelines, research code, internal tools, integration glue. A research team needs to test an idea quickly at small scale, rather than performantly on terabytes of data. A business-development team needs to solve a problem quickly and tactically, because the business needed a solution "yesterday". Some problems move too fast to wait for a strategic solution, or the cost of a strategic solution cannot yet be justified because too much is unknown. In those contexts, early commitment to strict typing, memory models or concurrency primitives can slow learning rather than accelerate it.
Python’s advantage is not that it replaces C#, Java, Rust or Go. It is that it defers commitment. You can explore the problem space quickly, validate assumptions, and only later decide which parts deserve the cost of rewriting in a compiled language. Your Proof-of-Concept or Prototype written in Python becomes your teacher and learning exercise.
In practice, Python, Rust and Go are not competitors but complements: Python for orchestration and discovery; Rust or Go for stabilised, performance-critical components where very specific issues need to be solved. Rust eliminates entire classes of bugs related to memory management and threading; Go is superb at server-side services. These are not everyday programming needs.
Summary
Python isn’t confused, incoherent or a "toy" language. It simply departs from the mental models of earlier generations of languages and fulfills a unique role that no other language can quite match.
Python is not compiled, not interpreted, and not "just a scripting language". It's a runtime-centric orchestration layer and a complete ecosystem of its own. It's the Visual Basic and VBA of the internet era: ideal for rapid assembly, experimentation and leverage rather than purity or control.
And that makes it incredibly useful - and wildly popular.
Here is a corresponding Midje test.
Note that ideally you practise Test-Driven Development (TDD), i.e. you start by writing one failing test.
Because this is a Clojure notebook, the unit tests are displayed after the implementation.
We test the method by replacing the random function with a deterministic function.
(facts"Place random point in a cell"(with-redefs[rand(fn[s](*0.5s))](random-point-in-cell{:cellsize1}00)=>(vec20.50.5)(random-point-in-cell{:cellsize2}00)=>(vec21.01.0)(random-point-in-cell{:cellsize2}03)=>(vec27.01.0)(random-point-in-cell{:cellsize2}20)=>(vec21.05.0)(random-point-in-cell{:cellsize2}235)=>(vec311.07.05.0)))
We can now use the random-point method to generate a grid of random points.
The grid is represented using a tensor from the dtype-next library.
(facts"Greate grid of random points"(let[params-2d(make-noise-params3282)params-3d(make-noise-params3283)](with-redefs[rand(fn[s](*0.5s))](dtype/shape(random-pointsparams-2d))=>[88]((random-pointsparams-2d)00)=>(vec22.02.0)((random-pointsparams-2d)03)=>(vec214.02.0)((random-pointsparams-2d)20)=>(vec22.010.0)(dtype/shape(random-pointsparams-3d))=>[888]((random-pointsparams-3d)235)=>(vec322.014.010.0))))
Here is a scatter plot showing one random point placed in each cell.
(facts"Wrap around components of vector to be within -size/2..size/2"(mod-vec{:size8}(vec223))=>(vec223)(mod-vec{:size8}(vec252))=>(vec2-32)(mod-vec{:size8}(vec225))=>(vec22-3)(mod-vec{:size8}(vec2-52))=>(vec232)(mod-vec{:size8}(vec22-5))=>(vec223)(mod-vec{:size8}(vec3231))=>(vec3231)(mod-vec{:size8}(vec3521))=>(vec3-321)(mod-vec{:size8}(vec3251))=>(vec32-31)(mod-vec{:size8}(vec3235))=>(vec323-3)(mod-vec{:size8}(vec3-521))=>(vec3321)(mod-vec{:size8}(vec32-51))=>(vec3231)(mod-vec{:size8}(vec323-5))=>(vec3233))
Using the mod-dist function we can calculate the distance between two points in the periodic noise array.
The tabular macro implemented by Midje is useful for running parametrized tests.
(tabular"Wrapped distance of two points"(fact(mod-dist{:size8}(vec2?ax?ay)(vec2?bx?by))=>?result)?ax?ay?bx?by?result00000.000202.000503.000022.000053.020002.050003.002002.005003.0)
Modular lookup
We also need to lookup elements with wrap around.
We recursively use tensor/select and then finally the tensor as a function to lookup along each axis.
A tensor with index vectors is used to test the lookup.
(facts"Wrapped lookup of tensor values"(let[t(tensor/compute-tensor[46]vec2)](wrap-gett23)=>(vec223)(wrap-gett27)=>(vec221)(wrap-gett53)=>(vec213)(wrap-get(wrap-gett5)3)=>(vec213)))
The following function converts a noise coordinate to the index of a cell in the random point array.
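A minimal version might look like this (a sketch rather than the notebook's exact code; `divisions`, the number of cells per axis, is an assumed parameter):

```clojure
(defn cell-index-sketch [{:keys [cellsize divisions]} coordinate]
  ;; Integer index of the cell containing the coordinate, wrapped to the grid.
  (mod (long (quot coordinate cellsize)) divisions))
```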
Using above functions one can now implement Worley noise.
For each pixel the distance to the closest seed point is calculated.
This is achieved by determining the distance to each random point in all neighbouring cells and then taking the minimum.
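Sketched with the helpers defined above (`neighbouring-cells` is a hypothetical helper returning the index pairs of the cell containing the point and its neighbours; this is an illustration, not the notebook's exact code):

```clojure
(defn worley-sketch [params points point]
  ;; Distance to the closest seed point over the surrounding cells.
  (apply min
         (for [idx (neighbouring-cells params point)]
           (mod-dist params point (apply wrap-get points idx)))))
```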
Perlin noise is generated by choosing a random gradient vector at each cell corner.
The noise tensor’s intermediate values are interpolated with a continuous function, utilizing the gradient at the corner points.
Random gradients
The 2D or 3D gradients are generated by creating a vector where each component is set to a random number between -1 and 1.
Random vectors are generated until the vector length is greater than 0 and less than or equal to 1.
The vector then is normalized and returned.
Random vectors outside the unit circle or sphere are discarded in order to achieve a uniform distribution on the surface of the unit circle or sphere.
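The rejection sampling described above can be sketched as follows (the name and arity are assumptions, not the notebook's exact code; `mag` and `div` are fastmath vector operations):

```clojure
(defn random-gradient-sketch []
  (let [v (vec2 (dec (rand 2.0)) (dec (rand 2.0))) ; components in -1..1
        l (mag v)]
    (if (and (> l 0.0) (<= l 1.0))
      (div v l)   ; normalize and return
      (recur)))) ; discard vectors outside the unit circle
```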
In the following tests, the random function is again replaced with a deterministic function.
(facts"Create unit vector with random direction"(with-redefs[rand(constantly0.5)](random-gradient00)=>(roughly-vec(vec2(-(sqrt0.5))(-(sqrt0.5)))1e-6))(with-redefs[rand(constantly1.5)](random-gradient00)=>(roughly-vec(vec2(sqrt0.5)(sqrt0.5))1e-6)))
The random gradient function is then used to generate a field of random gradients.
The next step is to determine the vectors to the corners of the cell for a given point.
First we define a function to determine the fractional part of a number.
```clojure
(defn frac [x] (- x (Math/floor x)))

(facts "Fractional part of floating point number"
  (frac 0.25)  => 0.25
  (frac 1.75)  => 0.75
  (frac -0.25) => 0.75)
```
This function can be used to determine the relative position of a point in a cell.
(defn cell-pos [{:keys [cellsize]} point]
  (apply vec-n (map frac (div point cellsize))))

(facts "Relative position of point in a cell"
  (cell-pos {:cellsize 4} (vec2 2 3))   => (vec2 0.5 0.75)
  (cell-pos {:cellsize 4} (vec2 7 5))   => (vec2 0.75 0.25)
  (cell-pos {:cellsize 4} (vec3 7 5 2)) => (vec3 0.75 0.25 0.5))
A 2 × 2 tensor of corner vectors can be computed by subtracting the corner coordinates from the point coordinates.
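A sketch of corner-vectors along these lines (the index-order assumption matches the tests below; tensor/clone realizes the lazy tensor):

(defn corner-vectors [{:keys [dimensions] :as params} point]
  (let [pos (cell-pos params point)]
    (tensor/clone
      (tensor/compute-tensor (vec (repeat dimensions 2))
                             (fn [& corner]
                               ;; corner indices arrive as e.g. (y x) while the
                               ;; vector components are ordered (x y)
                               (sub pos (apply vec-n (reverse corner))))))))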
(facts"Compute relative vectors from cell corners to point in cell"(let[corners2(corner-vectors{:cellsize4:dimensions2}(vec276))corners3(corner-vectors{:cellsize4:dimensions3}(vec3765))](corners200)=>(vec20.750.5)(corners201)=>(vec2-0.250.5)(corners210)=>(vec20.75-0.5)(corners211)=>(vec2-0.25-0.5)(corners3000)=>(vec30.750.50.25)))
Extract gradients of cell corners
The function below retrieves the gradient values at a cell’s corners, utilizing wrap-get for modular access.
The result is a 2 × 2 tensor of gradient vectors.
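Such a function might look as follows (a sketch under the same index-order assumptions as above, not the original listing):

(defn corner-gradients [{:keys [cellsize dimensions]} gradients point]
  (let [cell (reverse (map #(long (quot % cellsize)) point))]  ; e.g. (y x)
    (tensor/clone
      (tensor/compute-tensor (vec (repeat dimensions 2))
                             (fn [& corner]
                               (apply wrap-get gradients (map + cell corner)))))))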
(facts"Get 2x2 tensor of gradients from a larger tensor using wrap around"(let[gradients2(tensor/compute-tensor[46](fn[yx](vec2xy)))gradients3(tensor/compute-tensor[468](fn[zyx](vec3xyz)))]((corner-gradients{:cellsize4:dimensions2}gradients2(vec296))00)=>(vec221)((corner-gradients{:cellsize4:dimensions2}gradients2(vec296))01)=>(vec231)((corner-gradients{:cellsize4:dimensions2}gradients2(vec296))10)=>(vec222)((corner-gradients{:cellsize4:dimensions2}gradients2(vec296))11)=>(vec232)((corner-gradients{:cellsize4:dimensions2}gradients2(vec22315))11)=>(vec200)((corner-gradients{:cellsize4:dimensions3}gradients3(vec3963))000)=>(vec3210)))
Influence values
The influence value at a corner is the dot product of that corner's random gradient with the vector from the corner to the point.
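Given the two corner tensors, this is an element-wise dot product (a sketch; dot comes from fastmath.vector):

(defn influence-values [gradients vectors]
  (tensor/clone
    (tensor/compute-tensor (dtype/shape gradients)
                           (fn [& index]
                             (dot (apply gradients index) (apply vectors index))))))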
(facts"Compute influence values from corner vectors and gradients"(let[gradients2(tensor/compute-tensor[22](fn[_yx](vec2x10)))vectors2(tensor/compute-tensor[22](fn[y_x](vec21y)))influence2(influence-valuesgradients2vectors2)gradients3(tensor/compute-tensor[222](fn[zyx](vec3xyz)))vectors3(tensor/compute-tensor[222](fn[_z_y_x](vec3110100)))influence3(influence-valuesgradients3vectors3)](influence200)=>0.0(influence201)=>1.0(influence210)=>10.0(influence211)=>11.0(influence3111)=>111.0))
Interpolating the influence values
For interpolation the following “ease curve” is used.
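Presumably this is the usual quintic 6t⁵ − 15t⁴ + 10t³ (a sketch, but it reproduces the values tested below):

(defn ease-curve [t]
  (* t t t (+ (* t (- (* t 6) 15)) 10)))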
(facts"Monotonously increasing function with zero derivative at zero and one"(ease-curve0.0)=>0.0(ease-curve0.25)=>(roughly0.1035161e-6)(ease-curve0.5)=>0.5(ease-curve0.75)=>(roughly0.8964841e-6)(ease-curve1.0)=>1.0)
The ease curve increases monotonically in the interval from zero to one.
Here x-, y-, and z-ramps are used to test that interpolation works.
(facts"Interpolate values of tensor"(let[x2(tensor/compute-tensor[46](fn[_yx]x))y2(tensor/compute-tensor[46](fn[y_x]y))x3(tensor/compute-tensor[468](fn[_z_yx]x))y3(tensor/compute-tensor[468](fn[_zy_x]y))z3(tensor/compute-tensor[468](fn[z_y_x]z))](interpolatex22.53.5)=>3.0(interpolatey22.53.5)=>2.0(interpolatex22.54.0)=>3.5(interpolatey23.03.5)=>2.5(interpolatex20.00.0)=>2.5(interpolatey20.00.0)=>1.5(interpolatex32.53.55.5)=>5.0(interpolatey32.53.53.0)=>3.0(interpolatez32.53.55.5)=>2.0))
Octaves of noise
Fractal Brownian Motion is implemented by computing a weighted sum of the same base noise function using different frequencies.
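A rough sketch of the idea (names assumed; the original operates on tensors): each octave doubles the frequency and is weighted by the corresponding entry in weights.

(defn fbm-value [noise weights point]
  (reduce +
          (map-indexed (fn [octave weight]
                         (* weight (noise (mult point (bit-shift-left 1 octave)))))
                       weights)))

The weighted sum no longer spans 0..1, so the result is remapped to a displayable range; the remap function is tested with a tabular Midje test below.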
(tabular"Remap values of tensor"(fact((remap(tensor/->tensor[?value])?low1?high1?low2?high2)0)=>?expected)?value?low1?high1?low2?high2?expected001010101011001232101233223010323011102042)
The clamp function is used to element-wise clamp values to a range.
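A sketch of such an element-wise clamp, assuming the dtype alias refers to tech.v3.datatype so that emap is available:

(defn clamp [t low high]
  (dtype/emap #(-> % (max low) (min high)) :float32 t))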
In order to render the clouds we create a window and an OpenGL context.
Note that we need to create an invisible window to get an OpenGL context, even though we are not going to draw to the window.
The following method creates a program and the quad VAO and sets up the memory layout.
The program and VAO are then used to render a single pixel.
Using this method we can write unit tests for OpenGL shaders!
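The mock noise function itself is not reproduced here; presumably it is something like the following checkerboard over unit cells, which is consistent with the octave tests further below (a sketch, not the original shader):

(def noise-mock "#version 130
float noise(vec3 idx)
{
  vec3 cell = floor(idx);
  return mod(cell.x + cell.y + cell.z, 2.0);
}")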
We can test this mock function using the following probing shader.
Note that we are using the template macro of the comb Clojure library to generate the probing shader code from a template.
(def noise-probe
  (template/fn [x y z] "#version 130
out vec4 fragColor;
float noise(vec3 idx);
void main()
{
  fragColor = vec4(noise(vec3(<%= x %>, <%= y %>, <%= z %>)));
}"))
Here multiple tests are run to check that the mock implements a checkerboard pattern correctly.
Again we use a probing shader to test the shader function.
(def octaves-probe
  (template/fn [x y z] "#version 130
out vec4 fragColor;
float octaves(vec3 idx);
void main()
{
  fragColor = vec4(octaves(vec3(<%= x %>, <%= y %>, <%= z %>)));
}"))
A few unit tests with one or two octaves are sufficient to drive development of the shader function.
(tabular"Test octaves of noise"(fact(first(render-pixel[vertex-passthrough][noise-mock(noise-octaves?octaves)(octaves-probe?x?y?z)]))=>?result)?x?y?z?octaves?result000[1.0]0.0100[1.0]1.0100[0.5]0.50.500[0.01.0]1.00.500[0.01.0]1.0100[1.00.0]1.0)
Shader for intersecting a ray with a box
The following shader implements intersection of a ray with an axis-aligned box.
The shader function returns the distance of the near and far intersection with the box.
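The shader itself is not shown here; a sketch using the standard slab method would be consistent with the tests below (GLSL division by zero yields ±infinity, which the min/max comparisons handle):

(def ray-box "#version 130
vec2 ray_box(vec3 box_min, vec3 box_max, vec3 origin, vec3 direction)
{
  vec3 t1 = (box_min - origin) / direction;
  vec3 t2 = (box_max - origin) / direction;
  vec3 lo = min(t1, t2);
  vec3 hi = max(t1, t2);
  float near = max(max(lo.x, lo.y), lo.z);
  float far = min(min(hi.x, hi.y), hi.z);
  near = max(near, 0.0);
  far = max(far, near);
  return vec2(near, far);
}")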
The ray-box shader is tested with different ray origins and directions.
(tabular"Test intersection of ray with box"(fact((juxtfirstsecond)(render-pixel[vertex-passthrough][ray-box(ray-box-probe?ox?oy?oz?dx?dy?dz)]))=>?result)?ox?oy?oz?dx?dy?dz?result-200100[1.03.0]-200200[0.51.5]-222100[0.00.0]0-20010[1.03.0]0-20020[0.51.5]2-22010[0.00.0]00-2001[1.03.0]00-2002[0.51.5]22-2001[0.00.0]000100[0.01.0]200100[0.00.0])
Shader for light transfer through clouds
We test the light transfer through clouds using constant density fog.
The following fragment shader is used to render an image of a box filled with fog.
The pixel coordinate and the resolution of the image are used to determine a viewing direction, which is rotated using the rotation matrix and normalized.
The origin of the camera is set at a specified distance to the center of the box and rotated as well.
The ray box function is used to determine the near and far intersection points of the ray with the box.
The cloud transfer function is used to sample the cloud density along the ray and determine the overall opacity and color of the fog box.
The background is a mix of blue color and a small blob of white where the viewing direction points to the light source.
The opacity value of the fog is used to overlay the fog color over the background.
Uniform variables are parameters that remain constant throughout the shader execution, unlike vertex input data.
Here we use the following uniform variables:
resolution: a 2D vector containing the window pixel width and height
light: a 3D unit vector pointing to the light source
rotation: a 3x3 rotation matrix to rotate the camera around the origin
focal_length: the ratio of camera focal length to pixel size of the virtual camera
The following function sets up the shader program, the vertex array object, and the uniform variables.
Then GL11/glDrawElements draws the background quad used for performing volumetric rendering.
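A condensed sketch of what such a setup might look like with LWJGL (function name, draw mode, and index count are assumptions; rotation is assumed to be a flat sequence of nine floats):

(import '[org.lwjgl.opengl GL11 GL20 GL30])

(defn render-quad [program vao width height light rotation focal-length]
  (GL20/glUseProgram program)
  (GL30/glBindVertexArray vao)
  (GL20/glUniform2f (GL20/glGetUniformLocation program "resolution") width height)
  (GL20/glUniform3f (GL20/glGetUniformLocation program "light")
                    (light 0) (light 1) (light 2))
  (GL20/glUniformMatrix3fv (GL20/glGetUniformLocation program "rotation")
                           true (float-array rotation))
  (GL20/glUniform1f (GL20/glGetUniformLocation program "focal_length") focal-length)
  ;; draw the background quad (mode and index count assumed)
  (GL11/glDrawElements GL11/GL_QUADS 4 GL11/GL_UNSIGNED_INT 0))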
We also need to convert the floating point array to a tensor and then to a BufferedImage.
The one-dimensional array gets converted to a tensor and then reshaped to a 3D tensor containing width × height RGBA values.
The RGBA data is converted to BGR data, multiplied by 255, and clamped.
Finally the tensor is converted to a BufferedImage.
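As a plain-interop sketch matching the call below (the author's version goes through dtype-next tensors instead):

(import '[java.awt.image BufferedImage])

(defn rgba-array->bufimg [data width height]
  (let [img (BufferedImage. width height BufferedImage/TYPE_INT_RGB)]
    (doseq [y (range height) x (range width)]
      (let [offset  (* 4 (+ (* y width) x))
            ;; scale a float channel to 0..255 and clamp
            channel (fn [i] (-> (aget ^floats data (+ offset i))
                                (* 255.0) (max 0.0) (min 255.0) int))
            rgb     (bit-or (bit-shift-left (channel 0) 16)
                            (bit-shift-left (channel 1) 8)
                            (channel 2))]
        (.setRGB img x y rgb)))
    img))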
Now we are ready to render the volumetric fog.
(rgba-array->bufimg (render-fog 640 480) 640 480)
Rendering of 3D noise
This method converts a floating point array to a buffer and initialises a 3D texture with it.
It is also necessary to set the texture parameters for interpolation and wrapping.
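A sketch of such a method using LWJGL (the function name and the single-channel R32F format are assumptions):

(import '[org.lwjgl BufferUtils]
        '[org.lwjgl.opengl GL11 GL12 GL30])

(defn make-float-texture-3d [^floats data width height depth]
  (let [buffer  (doto (BufferUtils/createFloatBuffer (alength data))
                  (.put data)
                  (.flip))
        texture (GL11/glGenTextures)]
    (GL11/glBindTexture GL12/GL_TEXTURE_3D texture)
    (GL12/glTexImage3D GL12/GL_TEXTURE_3D 0 GL30/GL_R32F width height depth 0
                       GL11/GL_RED GL11/GL_FLOAT buffer)
    ;; interpolation and wrapping parameters
    (GL11/glTexParameteri GL12/GL_TEXTURE_3D GL11/GL_TEXTURE_MIN_FILTER GL11/GL_LINEAR)
    (GL11/glTexParameteri GL12/GL_TEXTURE_3D GL11/GL_TEXTURE_MAG_FILTER GL11/GL_LINEAR)
    (GL11/glTexParameteri GL12/GL_TEXTURE_3D GL11/GL_TEXTURE_WRAP_S GL11/GL_REPEAT)
    (GL11/glTexParameteri GL12/GL_TEXTURE_3D GL11/GL_TEXTURE_WRAP_T GL11/GL_REPEAT)
    (GL11/glTexParameteri GL12/GL_TEXTURE_3D GL12/GL_TEXTURE_WRAP_R GL11/GL_REPEAT)
    texture))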
In-scattering of light towards the observer depends on the angle between the light source and the viewing direction.
Here we are going to use the phase function by Cornette and Shanks, which depends on the asymmetry parameter g and on mu = cos(theta).
(def mie-scatter
  (template/fn [g] "#version 450 core
#define M_PI 3.1415926535897932384626433832795
#define ANISOTROPIC 0.25
#define G <%= g %>
uniform vec3 light;
float mie(float mu)
{
  return 3 * (1 - G * G) * (1 + mu * mu) /
         (8 * M_PI * (2 + G * G) * pow(1 + G * G - 2 * G * mu, 1.5));
}
float in_scatter(vec3 point, vec3 direction)
{
  return mix(1.0, mie(dot(light, direction)), ANISOTROPIC);
}"))
We define a probing shader.
(def mie-probe
  (template/fn [mu] "#version 450 core
out vec4 fragColor;
float mie(float mu);
void main()
{
  float result = mie(<%= mu %>);
  fragColor = vec4(result, 0, 0, 1);
}"))
The shader is tested using a few values.
(tabular"Shader function for scattering phase function"(fact(first(render-pixel[vertex-passthrough][(mie-scatter?g)(mie-probe?mu)]))=>(roughly?result1e-6))?g?mu?result00(/3(*16PI))01(/6(*16PI))0-1(/6(*16PI))0.50(/(*30.75)(*8PI2.25(pow1.251.5)))0.51(/(*60.75)(*8PI2.25(pow0.251.5))))
We can define a function to compute a particular value of the scattering phase function using the GPU.
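For example, reusing the probing setup from the tests above (a sketch; the function name is assumed):

(defn phase [g mu]
  (first (render-pixel [vertex-passthrough] [(mie-scatter g) (mie-probe mu)])))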
Finally we can implement the shadow function by also sampling towards the light source to compute the shading value at each point.
Testing the function requires extending the render-pixel function to accept a function for setting the light uniform.
We leave this as an exercise for the interested reader 😉.
(def shadow
  (template/fn [noise step] "#version 130
#define STEP <%= step %>
uniform vec3 light;
float <%= noise %>(vec3 idx);
vec2 ray_box(vec3 box_min, vec3 box_max, vec3 origin, vec3 direction);
float shadow(vec3 point)
{
  vec2 interval = ray_box(vec3(-0.5, -0.5, -0.5), vec3(0.5, 0.5, 0.5), point, light);
  float result = 1.0;
  for (float t = interval.x + 0.5 * STEP; t < interval.y; t += STEP) {
    float density = <%= noise %>(point + t * light);
    float transmittance = exp(-density * STEP);
    result *= transmittance;
  }
  return result;
}"))
Welcome to the Clojure Deref! This is a weekly link/news roundup for the Clojure ecosystem (feed: RSS).
Clojure Dev Call
Join the Clojure core team for an update on what we’ve been working on and
what’s on our horizon. We’ll save time for a Q&A, so bring your questions.
Feb 10 @ 18:00 UTC.
Register here.
eca 0.97.0 - Editor Code Assistant (ECA) - AI pair programming capabilities agnostic of editor
agent-o-rama release/0.8.0 - End-to-end LLM agent platform for Java and Clojure for building, tracing, testing, and monitoring agents with integrated storage and one-click deployment. Inspired by LangGraph/LangSmith.
Well, I guess people will just inevitably run into the problem of the classpath, one way or another. The post The Classpath is a Lie described the problem very well: the classpath is a lie. The classpath, per se, is a simple colon-separated list; the real work is done by the ClassLoader.
Nevertheless, this isn't a post about the classpath and ClassLoaders. There are already a lot of great articles on the subject (links at the end of this post), and I can't claim to understand ClassLoaders well enough to confidently teach others about them.
This is a post about what I found while trying to add new directories to the classpath and require Clojure files from them at runtime. ClassLoaders in Clojure are messy, and the best strategy is probably to avoid the problem altogether. But if you really want to do it, I hope the following content offers some help.
Clojure has a built-in add-classpath function. Although it has been deprecated, it works for simple use cases.
(import '[java.io File]
        '[java.nio.file Files])

(defn check-dynamic-load []
  (let [tmp-dir  (.toFile (Files/createTempDirectory
                            "classpath-demo"
                            (into-array java.nio.file.attribute.FileAttribute [])))
        tmp-clj  (File/createTempFile "demo" ".clj" tmp-dir)
        tmp-name (subs (.getName tmp-clj)
                       0
                       (.lastIndexOf (.getName tmp-clj) "."))]
    ;; Add `tmp-dir` to the classpath using the built-in `add-classpath`.
    (add-classpath (.toURL tmp-dir))
    ;; Put a Clojure file under the directory
    (spit tmp-clj (str "(ns " tmp-name ") (def a 1)"))
    ;; `require` the Clojure file and resolve the variable
    (assert (= 1 (var-get (requiring-resolve (symbol tmp-name "a")))))
    ;; Update the Clojure file
    (spit tmp-clj (str "(ns " tmp-name ") (def a 2)"))
    ;; Reload the Clojure file
    (require (symbol tmp-name) :reload-all)
    ;; We can read the new value
    (assert (= 2 (var-get (requiring-resolve (symbol tmp-name "a")))))
    (println "success")))

;; success
(check-dynamic-load)
If we evaluate the above code in a REPL or in CIDER, it works and prints "success". As the code shows, we can require a Clojure file whose path is determined at runtime, and reload it to get the updated value.
However, there is a reason it was deprecated. We can check its source code:
// The method used by clojure.core/add-classpath
static public void addURL(Object url) throws MalformedURLException {
    URL u = (url instanceof String) ? toUrl((String) url) : (URL) url;
    ClassLoader ccl = Thread.currentThread().getContextClassLoader();
    if (ccl instanceof DynamicClassLoader)
        ((DynamicClassLoader) ccl).addURL(u);
    else
        throw new IllegalAccessError("Context classloader is not a DynamicClassLoader");
}
It checks whether the current thread's ContextClassLoader is a DynamicClassLoader. If so, it calls the DynamicClassLoader's addURL method.
That means this method will fail if some code has set the current ContextClassLoader to something other than a DynamicClassLoader.
We might expect add-classpath to keep working in that case, because the convention when installing a new ClassLoader is to make the current ClassLoader the parent of the newly created one. However, clojure.core/add-classpath only checks the current ClassLoader; more on this in a later section.
Unlike add-classpath from clojure.core, pomegranate's add-classpath tries to find the modifiable ClassLoader closest to the primordial ClassLoader and calls its addURL method.
;; in the `add-classpath` function in pomegranate.clj
(let [classloaders (classloader-hierarchy)]
  (if-let [cl (last (filter modifiable-classloader? classloaders))]
    (add-classpath jar-or-dir cl)
    (throw (IllegalStateException.
             (str "Could not find a suitable classloader to modify from "
                  (mapv (fn [^ClassLoader c]
                          (-> c .getClass .getSimpleName))
                        classloaders))))))
If you are running Clojure in a REPL or with nREPL, the ContextClassLoader of the current thread will certainly be a DynamicClassLoader, set by one of those tools. However, when you run the clj command non-interactively, or run an AOT-compiled jar file, this won't be the case.
This problem is quite easy to solve: we just need to set the ContextClassLoader to a DynamicClassLoader of our own in the entrypoint of the program.
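For example (a minimal sketch):

;; Install a DynamicClassLoader at the entrypoint so add-classpath
;; keeps working in non-interactive runs as well.
(defn -main [& args]
  (let [thread (Thread/currentThread)]
    (.setContextClassLoader thread
                            (clojure.lang.DynamicClassLoader.
                              (.getContextClassLoader thread)))
    ;; ... rest of the program ...
    ))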
So far, so good. Except when you finish the code and try to run your tests under the kaocha test runner: kaocha also does its own thing with ClassLoaders and provides its own add-classpath function, which breaks the previous approach.
Detecting kaocha's presence and calling its add-classpath alongside the pomegranate one solved the issue for me.
When you create a thread, the new thread inherits the ContextClassLoader of the thread that created it. However, when you explicitly or implicitly (for example, when using future) use a thread pool, the executor may pick an existing thread, which could have a ContextClassLoader different from that of the calling thread.
You may consider creating a thread directly instead of relying on future in this case.
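For example (a sketch):

;; A freshly created Thread inherits the creating thread's
;; ContextClassLoader, whereas a pooled thread reused by future may not.
(defn run-in-fresh-thread [f]
  (doto (Thread. ^Runnable f)
    (.start)))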
This post is definitely not comprehensive, and there are still a lot of things I do not yet understand. The method described above works for me for now. If you want to learn more about this topic, I have listed a few links below.
Critical Infrastructure: Clojars Maintenance and Support Update by Toby Crawley
November-December, 2025. Published January 24, 2026
This is an update on the work I’ve done maintaining Clojars in November and December of 2025.
Most of my work on Clojars is reactive, based on issues reported through the community or noticed through monitoring.