Exercism 48in24 Recap

I participated in Exercism's #48in24 challenge: 48 exercises over the course of 2024. The exercises came weekly, and with 52 weeks in a year, even a few skipped weeks still kept me pretty busy.

I did not reach the goal of completing all 48, but I did submit about 66 solutions, spread across 14 different languages. The number after each exercise is how many solutions I did for it:

  • leap: 10 - calculating leap years
  • reverse string: 7 - reversing strings
  • raindrops: 4 - basic flow control
  • roman numerals: 3 - converting numbers to roman numerals
  • protein translation: 4 - translating RNA to proteins
  • list ops: 0 - list operations
  • acronym: 2 - string manipulation
  • circular buffer: 0 - circular buffer
  • parallel letter frequency: 0 - parallel letter frequency
  • allergies: 2 - bit manipulation
  • sieve: 1 - prime numbers
  • luhn: 2 - checksums
  • scrabble score: 1 - scoring words
  • difference of squares: 1 - basic math
  • pangram: 1 - string manipulation
  • all your base: 1 - basic math
  • zebra puzzle: 0 - constraint-satisfaction problem
  • minesweeper: 0 - data structures
  • d&d character: 1 - random numbers and data structures
  • pig latin: 1 - string manipulation
  • space age: 4 - basic math
  • yacht: 0 - basic math
  • matching brackets: 1 - string parsing
  • rna transcription: 0 - string manipulation
  • binary search: 1 - searching algorithms
  • spiral matrix: 0 - algorithms
  • secret handshake: 1 - bit manipulation
  • anagram: 2 - string manipulation
  • kindergarten garden: 1 - data structures
  • robot simulator: 1 - data structures
  • knapsack: 0 - math and data structures
  • meetup: 1 - date manipulation
  • pascal's triangle: 0 - basic math
  • hamming: 1 - string manipulation
  • rotational cipher: 1 - string manipulation
  • phone number: 1 - string parsing
  • isogram: 1 - string manipulation
  • bob: 1 - string manipulation
  • two bucket: 0 - algorithms
  • grade school: 1 - data structures
  • bank account: 0 - data structures
  • food chain: 1 - data structures
  • eliud's eggs: 0 - bit manipulation
  • collatz conjecture: 1 - basic math
  • run-length encoding: 1 - string manipulation
  • armstrong numbers: 2 - basic math
  • diamond: 1 - string manipulation
  • largest series product: 1 - string manipulation and basic math

Through the many challenges I touched the following languages, ordered alphabetically:

  • C
  • C++
  • Clojure
  • Common Lisp
  • Elixir
  • Go (my primary learning path)
  • JavaScript (my secondary learning path)
  • jq (part of my toolbox, though on a basic level)
  • MIPS
  • Nim
  • Perl 5 (my primary language and part of my boring stack)
  • Python
  • Ruby
  • TypeScript

In the beginning I did one challenge and covered several languages. Later, as time went by, I stuck to one language and did several solutions. As the deadline approached, I did the problems in the language I was most familiar with (Perl) and then revisited the challenges in familiar languages, working my way out to more and more unfamiliar ones.

Every time I started a new challenge I disabled Copilot in VS Code, because I found out it was very knowledgeable about these problems. I had to do this manually, but after the challenge ended, while doing some more exercises, I found out that I could disable Copilot per workspace like so (I just copy the file in when I have pulled an exercise from Exercism).

File: .vscode/settings.json

{
  "github.copilot.enable": {
    "*": false
  }
}

Overall it was awesome and very educational to solve problems and familiarize myself with some very unfamiliar languages; every time I touched a new language I also had to set up the tooling and do a "Hello World" exercise. Often I got stuck, even with easy exercises, and it was most frustrating. I often had to sleep on a problem, a piece of syntax, a method, or an approach - alternatives to sleeping were going for a run or a hike, all very familiar ways of stepping away from a problem and then returning to it. Other days I would solve three exercises in a row.

I learned a lot about thinking about problems and generalizing my approach - not that this is something I do not do every day, but practicing it on very isolated problems without:

  • Company specific constraints or politics
  • Deadlines
  • Legacy code
  • Disturbances

made the problem solving very focused, and I could really concentrate on the problem at hand.

I also learned a lot about the problems themselves, as I started to see patterns in the problems and the solutions; many of the problems were very similar, but with a twist. For example, when you are working with ciphers, many of them follow specific patterns, as do many string-oriented problems. Often you can develop a method which is generally useful. When you work on a programming task as part of a larger problem area, you solve it and move on to another part of your application, which is somewhat different. For many of the cipher or string related problems, I would do parsing, then tokenizing, and then the special part: shuffling the tokens to solve the problem at hand. This has also led me to go back and improve a few of my solutions as I learned better approaches.
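To make that pattern concrete, here is a minimal sketch in Python (one of the languages from the list above) of the parse/tokenize/transform shape, using the acronym exercise as the example. The function name and the splitting rule are my own illustration, not an official Exercism solution:

```python
import re

def acronym(phrase):
    # parse/tokenize: split the phrase into word tokens,
    # treating anything that is not a letter or apostrophe as a separator
    words = re.split(r"[^A-Za-z']+", phrase)
    # the exercise-specific part: keep each word's first letter, uppercased
    return "".join(word[0].upper() for word in words if word)

print(acronym("Portable Network Graphics"))  # PNG
```

Only the last step changes from exercise to exercise; the parsing and tokenizing steps carry over almost unchanged.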

I always recommend Exercism; whether you just want to practice your programming with some fun exercises or you want to learn a new language, it is a very good platform.

Even though the challenge has ended, I am still doing exercises, and hopefully I will be able to complete some of the tracks at some point.

The problem solving part has really influenced my work, and if I get around to it I might see if I can distill some problems that have challenged me at work into small, contained exercises, perhaps well suited for Exercism or, in my case, for easier modelling and understanding.

If I get the time I would very much like to share my notes on adopting the various languages and perhaps even my solutions to some of the exercises. I have some reservations about doing the latter, since it does spoil the fun of solving the exercises yourself. I have also made some basic tooling which could be of interest or inspiration to you if you are on Exercism.

Permalink

Taggie

Taggie is an experimental library trying to find an answer to a strange question: is it possible to benefit from Clojure tags and readers, and how?

Taggie extends printing methods such that types that could not be read back from their representation now can be. A quick example: if you print an atom, you’ll get a weird string:

(atom 42)
#<Atom@7fea5978: 42>

Feed that string back, and the REPL won’t understand you:

#<Atom@7fea5978: 42>
Syntax error reading source at (REPL:962:5).
Unreadable form

But with Taggie, it goes this way:

(atom 42)
#atom 42 ;; represented with a tag

And vice versa:

#atom 42 ;; run it in repl
#atom 42 ;; the result

The value is indeed an atom; you can check it:

(deref #atom 42)
42

Tags can be nested. Let’s try some madness:

(def omg #atom #atom #atom #atom #atom #atom 42)

(println omg)
#atom #atom #atom #atom #atom #atom 42

@@@@@@omg
42

But this is not only about atoms! Taggie extends many types, e.g. refs, native Java arrays, File, URI, URL, Date, the java.time.* classes, and more. See the corresponding section below.

Installation and Usage

Add this to your project:

;; lein
[com.github.igrishaev/taggie "0.1.0"]

;; deps
com.github.igrishaev/taggie {:mvn/version "0.1.0"}

Then import the core namespace:

(ns com.acme.server
  (:require
    taggie.core))

Now type in the repl any of these:

#LocalDate "2025-01-01"
#Instant "2025-01-01T23:59:59Z"
#File "/path/to/a/file.txt"
#URL "https://clojure.org"
#bytes [0x00 0xff]
#ints [1 2 3]
#floats [1 2 3]
#ByteBuffer [0 1 2 3 4]
...

Each expression gives an instance of a corresponding type: a LocalDate, an Instant, a File, etc. #bytes, #ints and similar produce native Java arrays.

You can pass tagged values into functions as usual:

(deref #atom 42)
42

(alength #longs [1 2 3])
3

To observe what happens under the hood, prepend your expression with a backtick:

`(alength #longs [1 2 3])

(clojure.core/alength (taggie.readers/__reader-longs-edn [1 2 3]))

Internally, all tags expand into an invocation of an EDN reader. Namely, #longs items becomes (taggie.readers/__reader-longs-edn items), which, when evaluated, returns a native array of longs.

EDN Support

Taggie provides functions to read and write EDN with tags. They live in the taggie.edn namespace. Use it as follows:

(def edn-dump
  (taggie.edn/write-string #atom {:test 1
                                  :values #longs [1 2 3]
                                  :created-at #LocalDate "2025-01-01"}))

(println edn-dump)

;; #atom {:test 1,
;;        :values #longs [1, 2, 3],
;;        :created-at #LocalDate "2025-01-01"}

It produces a string with custom tags, and the data is pretty-printed. Let’s read it back:

(taggie.edn/read-string edn-dump)

#atom {:test 1,
       :values #longs [1, 2, 3],
       :created-at #LocalDate "2025-01-01"}

The write function writes EDN into a destination which might be a file path, a file, an output stream, a writer, etc:

(taggie.edn/write (clojure.java.io/file "data.edn")
                  {:test (atom (ref (atom :secret)))})

The read function reads EDN from any kind of source: a file path, a file, an input stream, a reader, etc. Internally, the source is transformed into a PushbackReader instance:

(taggie.edn/read (clojure.java.io/file "data.edn"))

{:test #atom #ref #atom :secret}

Both read and read-string accept standard clojure.edn/read options, e.g. :readers, :eof, etc. The :readers map gets merged with a global map of custom tags.

Motivation

Jokes aside, this library might save your day. I often see people dump data into .edn files, and the data has atoms, regular expressions, exceptions, and other unreadable types:

(spit "data.edn"
      (with-out-str
        (clojure.pprint/pprint
          {:regex #"foobar"
           :atom (atom 42)
           :error (ex-info "boom" {:test 1})})))

(println (slurp "data.edn"))

{:regex #"foobar", :atom #<Atom@4f7aa8aa: 42>, :error #error {
 :cause "boom"
 :data {:test 1}
 :via
 [{:type clojure.lang.ExceptionInfo
   :message "boom"
   :data {:test 1}
   :at [user$eval43373$fn__43374 invoke "form-init6283045849674730121.clj" 2248]}]
 :trace
 [[user$eval43373$fn__43374 invoke "form-init6283045849674730121.clj" 2248]
  [user$eval43373 invokeStatic "form-init6283045849674730121.clj" 2244]
  ;; truncated
  [clojure.lang.AFn run "AFn.java" 22]
  [java.lang.Thread run "Thread.java" 833]]}}

This dump cannot be read back due to:

  1. unknown #"foobar" tag (EDN doesn’t support regex);
  2. broken #<Atom@4f7aa8aa: 42> expression;
  3. unknown #error tag.

But with Taggie, the same data produces tagged fields that can be read back.

Supported Types

In alphabetic order:

Type Example
java.nio.ByteBuffer #ByteBuffer [0 1 2]
java.util.Date #Date "2025-01-06T14:03:23.819Z"
java.time.Duration #Duration "PT72H"
java.io.File #File "/path/to/file.txt"
java.time.Instant #Instant "2025-01-06T14:03:23.819994Z"
java.time.LocalDate #LocalDate "2034-01-30"
java.time.LocalDateTime #LocalDateTime "2025-01-08T11:08:13.232516"
java.time.LocalTime #LocalTime "20:30:56.928424"
java.time.MonthDay #MonthDay "--02-07"
java.time.OffsetDateTime #OffsetDateTime "2025-02-07T20:31:22.513785+04:00"
java.time.OffsetTime #OffsetTime "20:31:39.516036+03:00"
java.time.Period #Period "P1Y2M3D"
java.net.URI #URI "foobar://test.com/path?foo=1"
java.net.URL #URL "https://clojure.org"
java.time.Year #Year "2025"
java.time.YearMonth #YearMonth "2025-02"
java.time.ZoneId #ZoneId "Europe/Paris"
java.time.ZoneOffset #ZoneOffset "-08:00"
java.time.ZonedDateTime #ZonedDateTime "2025-02-07T20:32:33.309294+01:00[Europe/Paris]"
clojure.lang.Atom #atom {:inner 'state}
boolean[] #booleans [true false]
byte[] #bytes [1 2 3]
char[] #chars [\a \b \c]
double[] #doubles [1.1 2.2 3.3]
Throwable->map #error <result of Throwable->map> (see below)
float[] #floats [1.1 2.2 3.3]
int[] #ints [1 2 3]
long[] #longs [1 2 3]
Object[] #objects ["test" :foo 42 #atom false]
clojure.lang.Ref #ref {:test true}
java.util.regex.Pattern #regex "version: \d+"
java.sql.Timestamp #sql/Timestamp "2025-01-06T14:03:23.819Z"

The #error tag is a bit special: it returns the value with no parsing. It prevents an error when reading the result of printing an exception:

(println (ex-info "boom" {:test 123}))

#error {
 :cause boom
 :data {:test 123}
 :via
 [{:type clojure.lang.ExceptionInfo
   :message boom
   :data {:test 123}
   :at [taggie.edn$eval9263 invokeStatic form-init2367470449524935680.clj 97]}]
 :trace
 [[taggie.edn$eval9263 invokeStatic form-init2367470449524935680.clj 97]
  [taggie.edn$eval9263 invoke form-init2367470449524935680.clj 97]
  ;; truncated
  [java.lang.Thread run Thread.java 833]]}

When reading such data from EDN with Taggie, you’ll get a regular map.

Adding Your Types

Imagine you have a custom type and you want Taggie to handle it:

(deftype SomeType [a b c])

(def some-type
  (new SomeType (atom :test)
                (LocalDate/parse "2023-01-03")
                (long-array [1 2 3])))

To override the way it gets printed, run the defprint macro:

(taggie.print/defprint SomeType ^SomeType some-type writer
  (let [a (.-a some-type)
        b (.-b some-type)
        c (.-c some-type)]
    (.write writer "#SomeType ")
    (print-method [a b c] writer)))

The first argument is a symbol bound to a class. The second is a symbol bound to the instance of this class (in some cases you’ll need a type hint). The third symbol is bound to the Writer instance. Inside the macro, you .write certain values into the writer. Above, we write the leading "#SomeType " string and a vector of the fields a, b and c. Calling print-method guarantees that all nested data will be written with their custom tags.

Now if you print some-type or dump it into EDN, you’ll get:

#SomeType [#atom :test #LocalDate "2023-01-03" #longs [1 2 3]]

The opposite step: define readers for SomeType class:

(taggie.readers/defreader SomeType [vect]
  (let [[a b c] vect]
    (new SomeType a b c)))

It’s quite simple: the vector of fields is already parsed, so you only need to split it and pass fields into the constructor.

The defreader macro mutates a global map of EDN readers. When you read an EDN string, SomeType will be handled. But it won’t work in the REPL: for example, running #SomeType [...] in the REPL will throw an error. The thing is, REPL readers cannot be overridden at runtime.

But you can declare your own readers: in the src directory, create a file called data_readers.clj with a map:

{SomeType some.namespace/__reader-SomeType-clj}

Restart the REPL, and now the tag will be available.

As you might have guessed, the defreader macro creates two functions:

  • __reader-<tag>-clj for a REPL reader;
  • __reader-<tag>-edn for an EDN reader.

Each -clj reader relies on a corresponding -edn reader internally.

Emacs & CIDER caveat: I noticed that the M-x cider-ns-refresh command ruins loading of REPL tags. After this command has been run, any attempt to execute something like #LocalDate "..." ends up with an error saying “unbound function”. Thus, if you use Emacs and CIDER, avoid this command.

Permalink

WebForms.java Update to WebFormsJS 1.6

Good news for Java developers: you can now experience WebForms Core technology in Java. Elanat has updated the WebForms class for Java with the latest version of WebFormsJS, 1.6. The WebForms class on the server and the WebFormsJS library on the client constitute the WebForms Core technology.

By using WebForms Core technology, you don't need to develop a front-end, because all HTML tags are managed from the server.

The video below demonstrates the functionality of WebForms Core technology for content management.

Download Links

You can download the WebForms classes for the Java programming language from the link below.

WebForms.java

To download the latest version of WebFormsJS, visit the following link.

Download WebFormsJS

Example: Using WebForms Core in Spring Boot

Below is an example demonstrating how to use WebForms Core with Spring Boot. In this example, the background color and text color of the "form" tag change to the colors entered in the text boxes after the button is clicked.

In the view below, after the "form" tag, there is a "p" tag that holds the value of the "WebFormsTag" variable. The value of the "WebFormsTag" variable will be set through the controller.

View (default.html file)

<!DOCTYPE html>
<html>
<head>
  <title>Using WebForms Core</title>
  <script type="text/javascript" src="/script/web-forms.js"></script>
</head>
<body>
    <h1>WebForms Core Technology in JAVA</h1>
    <form method="post" action="/" >
        <label for="TextColor">Set Text Color</label>
        <input name="TextColor" id="TextColor" type="text" /><br><br>
        <label for="BackgroundColor">Set Background Color</label>
        <input name="BackgroundColor" id="BackgroundColor" type="text" /><br><br>
        <input name="Button" type="submit" value="Click to send data" />
    </form>
    <p th:utext="${WebFormsTag}"></p>
</body>
</html>

This controller class has three methods; each handles a different type of HTTP request and processes the input data using the WebFormsCore library to dynamically update the form on the webpage.

Controller

package com.example.demo;

import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;

import WebFormsCore.WebForms;
import WebFormsCore.InputPlace;
import WebFormsCore.Fetch;
import WebFormsCore.HtmlEvent;

@Controller
public class MyController {

    @GetMapping("/")
    public String PageLoad(Model model) {

        WebForms form = new WebForms();

        form.setGetEvent("BackgroundColor", HtmlEvent.OnKeyUp, "/set-color");
        form.setGetEvent("TextColor", HtmlEvent.OnKeyUp, "/set-color");

        model.addAttribute("WebFormsTag", form.exportToWebFormsTag(null));

        return "default"; // This will map to default.html
    }

    @GetMapping("/set-color")
    public ResponseEntity<String> PageLoad(@RequestParam(name = "set-text-color", required = false) String Color) {

        WebForms form = new WebForms();

        form.saveValue("BackgroundColor","*");
        form.setBackgroundColor("BackgroundColor", Fetch.saved("*"));
        form.saveValue("TextColor","*");
        form.setTextColor("TextColor", Fetch.saved("*"));
        form.setCache();

        return ResponseEntity.ok(form.response());
    }

    @PostMapping("/")
    @ResponseBody
    public String PageLoad(@RequestParam("BackgroundColor") String backgroundColor, @RequestParam("TextColor") String textColor) {

        WebForms form = new WebForms();

        form.setBackgroundColor(InputPlace.tag("form"), backgroundColor);
        form.setTextColor(InputPlace.tag("form"), textColor);
        form.setDisabled(InputPlace.name("Button"), true);
        form.addTag(InputPlace.tag("form"), "h3", null);
        form.setText(InputPlace.tag("h3"), "The background color and the text color was set!");
        form.delete(InputPlace.tag("h3"));
        form.assignDelay(3, -1);

        return form.response(); 
    }
}

The GIF image below shows how the above code works.

WebForms Core in JAVA

My first experience with Spring Boot

As someone who has just created his first simple MVC example with the Spring Boot framework, I must say that the crucial aspect of this framework is mapping. In this example, I specified the "/set-color" path in the setGetEvent method of the WebForms class and coded the necessary commands under @GetMapping("/set-color").

Initialization

First, we need to import the classes required by the WebForms Core technology. WebForms is the main class in this technology, and the other classes play a supporting role.

import WebFormsCore.WebForms;
import WebFormsCore.InputPlace;
import WebFormsCore.Fetch;
import WebFormsCore.HtmlEvent;

GET request (Initial Page Load):

@GetMapping("/")
public String PageLoad(Model model) {

    WebForms form = new WebForms();

    form.setGetEvent("BackgroundColor", HtmlEvent.OnKeyUp, "/set-color");
    form.setGetEvent("TextColor", HtmlEvent.OnKeyUp, "/set-color");

    model.addAttribute("WebFormsTag", form.exportToWebFormsTag(null));
    return "default"; // This will map to default.html
}

PageLoad Method with GetMapping("/")

  • Initializes a new WebForms instance.
  • Sets events for BackgroundColor and TextColor inputs that trigger the "/set-color" path on the OnKeyUp event.
  • Exports the WebForms data to the model attribute WebFormsTag.
  • Returns the view name "default" which maps to "default.html".

Key up event

@GetMapping("/set-color")
public ResponseEntity<String> PageLoad(@RequestParam(name = "set-text-color", required = false) String Color) {

    WebForms form = new WebForms();

    form.saveValue("BackgroundColor","*");
    form.setBackgroundColor("BackgroundColor", Fetch.saved("*"));
    form.saveValue("TextColor","*");
    form.setTextColor("TextColor", Fetch.saved("*"));
    form.setCache();

    return ResponseEntity.ok(form.response());
}

PageLoad Method with GetMapping("/set-color")

  • Initializes a new WebForms instance.
  • Saves and fetches values for BackgroundColor and TextColor inputs.
  • Sets the background and text colors based on saved values.
  • Caches the changes and returns the response as a String.

Note: The general structure of WebForms Core technology is that the client makes a request to the server and the server sends commands back to the client. However, this technology also supports an offline mode: the tag values are first saved and then retrieved again without further data from the server. Calling the setCache method causes the client to make a request to the server only once. Offline mode in WebForms Core technology provides a high-level structure similar to React Server Components.

Form submit

@PostMapping("/")
@ResponseBody
public String PageLoad(@RequestParam("BackgroundColor") String backgroundColor, @RequestParam("TextColor") String textColor) {

    WebForms form = new WebForms();

    form.setBackgroundColor(InputPlace.tag("form"), backgroundColor);
    form.setTextColor(InputPlace.tag("form"), textColor);
    form.setDisabled(InputPlace.name("Button"), true);
    form.addTag(InputPlace.tag("form"), "h3", null);
    form.setText(InputPlace.tag("h3"), "The background color and the text color was set!");
    form.delete(InputPlace.tag("h3"));
    form.assignDelay(3, -1);

    return form.response(); 
}

PageLoad Method with PostMapping("/")

  • Takes BackgroundColor and TextColor as parameters.
  • Sets the background and text colors of the form based on the inputs.
  • Disables the submit button ("Button").
  • Adds and deletes a temporary heading ("h3") with a message.
  • Sets a delay for the next action.
  • Returns the response as a String.

Note: When the HTML page is requested for the first time, if you are using the WebForms class, use the exportToWebFormsTag method. For subsequent requests, use the response method.

You can get the sample code at the link below.
https://elanat.net/content/31/WebForms.java Update to WebFormsJS 1.6.html

WebForms Core technology is also supported by all JVM-based programming languages!

In addition to the Java programming language, you can use the WebForms classes located in the WebFormsCore directory directly or after compilation in the following programming languages (or implementations):

  • Kotlin
  • Scala
  • Groovy
  • Clojure
  • JRuby
  • Jython
  • Fantom
  • Mirah
  • Ceylon
  • JPHP

Conclusion

With this update, Java developers can leverage WebForms Core, which allows them not only to create more dynamic applications but also to simplify their workflow significantly, since no front-end knowledge is required anymore! This technology supports all JVM-based languages, including Kotlin!

Permalink

Clojure Deref (Feb 6, 2025)

Welcome to the Clojure Deref! This is a weekly link/news roundup for the Clojure ecosystem (feed: RSS). Thanks to Anton Fonarev for link aggregation.

Libraries and Tools

New releases and tools this week:

Permalink

One-Shot Babashka CLI Scripts

Like everyone, I've been exploring AI tools, and reading Simon Willison's excellent blog I discovered how he uses LLMs to generate one-off Python tools.

In this post I'm gonna share a bit more about how I generate Babashka scripts in similar ways.

It's all in the context

While Claude is pretty good at Clojure already, it often generated code that didn't quite work. One particular case that kept occurring: Claude kept thinking that babashka.fs/glob returns files (something that slurp could read) when in reality it returns paths. Globs can match directories, after all.

So I started to copy together some pieces of documentation into a snippet that I'd always provide as context. Among the documentation I also included some guidelines for how I want things to be done.

# general tips for writing babashka code

1. When using `p/process` and `p/shell` a variable list of strings is expected at the end. When creating the command using a vector or similar, be sure to use `apply` so that the vector is unwrapped
  1. Example: `(apply p/process {} ["echo" "123"])`

2. Some useful flags for file processing scripts
  1. `--dry-run` only print actions, don’t execute
  2. `--verbose` log additional output

3. When creating namespaces and functions using the babashka.cli APIs, it is useful to alias them in your `bb.edn` file so that they can be used as a shorter command
  -  e.g. `{:tasks {prune some.ns/prune}}`

Maintaining a list of prompts and context windows seems like a useful thing to do! There's a link to this one and a few others at the end.

Generating a script

Let's say I have a directory of markdown files and I want to add some frontmatter to each of them, say type: post.

With the provided context I'd write a short prompt:

Please write a babashka CLI script that transforms markdown files by adding type: post to their YAML frontmatter. It should be possible to specify individual files as well as a glob pattern.

The result will be a script like this one. It's not the prettiest, and if you look closely there are definitely a few interesting idiosyncrasies, but overall it's pretty good for something I didn't have 30 seconds ago!

  • There are CLI options for verbose and dry-run modes
  • Files to process can be specified via files or pattern options
  • Emojis are used in progress messages
  • It's readable!

(If you actually want to read this code, this Gist will display it better.)

#!/usr/bin/env bb

(require '[babashka.cli :as cli]
         '[babashka.fs :as fs]
         '[clojure.string :as str]
         '[clj-yaml.core :as yaml])

(def cli-opts
  {:spec {:files {:desc "Individual markdown files to process"
                  :coerce []}
          :pattern {:desc "Regex pattern to match markdown files (e.g. \"posts/*.md\")"
                   :alias :p}
          :dry-run {:desc "Print what would be changed without making changes"
                   :coerce :boolean}
          :verbose {:desc "Print additional information during processing"
                   :coerce :boolean}}})

(defn extract-frontmatter
  "Extracts YAML frontmatter from markdown content.
   Returns [frontmatter remaining-content] or nil if no frontmatter found."
  [content]
  (when (str/starts-with? content "---\n")
    (when-let [end-idx (str/index-of content "\n---\n" 4)]
      (let [frontmatter (subs content 4 end-idx)
            remaining (subs content (+ end-idx 5))]
        [frontmatter remaining]))))

(defn update-frontmatter
  "Updates the frontmatter by adding type: post if not present"
  [markdown-str]
  (if-let [[frontmatter content] (extract-frontmatter markdown-str)]
    (let [data (yaml/parse-string frontmatter)
          updated-data (cond-> data
                        (not (:type data)) (assoc :type "post"))
          new-frontmatter (yaml/generate-string updated-data :dumper-options {:flow-style :block})]
      (str "---\n" new-frontmatter "---\n" content))
    markdown-str))

(defn process-file
  "Process a single markdown file, updating its frontmatter"
  [file {:keys [dry-run verbose]}]
  (let [content (slurp file)
        updated-content (update-frontmatter content)]
    (when verbose
      (println "📝 Processing" (str file)))
    (if (= content updated-content)
      (when verbose
        (println "⏭️ No changes needed for" (str file)))
      (do
        (when verbose
          (println "🔄 Updating frontmatter in" (str file)))
        (when-not dry-run
          (spit file updated-content))))))

(defn process-files
  "Process multiple markdown files based on CLI options"
  [{:keys [files pattern] :as opts}]
  (let [pattern-files (when pattern
                       (->> (fs/glob "." pattern)
                            (map fs/file)
                            (filter #(str/ends-with? (str %) ".md"))))
        all-files (concat (map fs/file files) pattern-files)]
    (if (seq all-files)
      (do
        (when (:verbose opts)
          (println "🔍 Found" (count all-files) "files to process"))
        (doseq [file all-files]
          (process-file file opts))
        (println "✨ Processing complete!"))
      (println "⚠️ No markdown files found to process"))))

(defn -main [& args]
  (let [opts (cli/parse-opts args cli-opts)]
    (if (:help opts)
      (println (cli/format-opts cli-opts))
      (process-files opts))))

(when (= *file* (System/getProperty "babashka.file"))
  (apply -main *command-line-args*))

Context windows

Using a prepared context window/prompt, I can describe something in a generic way and turn it into something concrete in no time.

Here are two context windows that might be helpful to you:

  1. The Babashka one previewed here
  2. A JS/ClojureScript conversion helper that understands Uix, Reagent & React and lets you translate code between those.

PRs to improve these are welcome!

I'm curious to make a full REPL loop that runs code that was generated this way. Natural language in, Clojure forms out, confirm to run. Could be fun!

Permalink

How to make each feature take less code

The Apropos show with Zach Tellman was a success (despite a technical hiccup; it wouldn’t really be Apropos without hiccups). And next week we have Alex Miller as a guest. Please join us on YouTube. You can ask us questions in the chat.

And let’s not forget Grokking Simplicity, my book for beginners to functional programming. Please check it out or recommend it to a friend. You can also get it from Manning.


How to make each feature take less code

Last week I floated the dream of being able to write the same features with less code. Let’s talk about some practical ways of achieving this.

Reusable abstractions

The first way is the normal, humdrum way: you build abstractions that are used multiple times, so you get more features out of every line of code. This has been the pipe dream of the industry for a long time. I want to argue that we have achieved this, but in an unexpected way.

Let’s say your company writes software to schedule the delivery of king cakes (it is Mardi Gras season). As the number of features grows, the programmers start to notice certain regularities. For instance, they have logic for doing date math all over the codebase. Each place has to handle rolling a date over to the next day when you go past midnight. So they start to gather those bits of code into a central place. They see that there is much repetition and irregularity. They redesign the date math as a library with clean abstractions. And the code is smaller for it.
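As a rough sketch of that refactoring (the domain and the function name are hypothetical, not from any real codebase; Python used here purely for illustration), the rollover logic each call site used to hand-roll collapses into one small helper once it lives in the date library:

```python
from datetime import datetime, timedelta

# Hypothetical king-cake scheduling helper: before the refactoring,
# every call site contained its own "if we pass midnight, bump the
# day" logic; centralizing it makes that special case disappear
# from feature code, since timedelta arithmetic rolls over for us.
def delivery_eta(start: datetime, prep_minutes: int) -> datetime:
    return start + timedelta(minutes=prep_minutes)

eta = delivery_eta(datetime(2025, 2, 25, 23, 50), 30)
print(eta)  # 2025-02-26 00:20:00
```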

The smallness comes from:

  • Identifying important domain concepts (seen as repeated routines)

  • Writing them at the right level of generality

  • Reusing them throughout the code

But what happens when you go to write the next feature? You obviously use the date library you just wrote. But is there more? Does each new feature make the next one take less code? In my experience, no. Even if you’re very diligent about identifying reuse, there seems to be a limit to how far you can take this. But why?

Is it because there just isn’t that much repetition? Is it that even if you squeeze out all of the repeated ideas, there’s still so much special casing? I tend to believe that we don’t take it far enough. I only have some anecdotes about what it would really look like to wring out all of the general ideas (mostly through rewrites). Practically, it’s often not possible to do a rewrite. And the repeated concepts aren’t so clear in the code. And the system is already too big to hold in your head to see how things might work. Finding those new abstractions is a research project. We resign ourselves to modularize only to contain the mess.

But I promised that I’d argue that we have achieved this kind of widespread reuse. Well, we have, but not in a way that makes each feature less costly than the last. We have massive reuse in our software today. We build software by piecing together existing libraries—usually open source. Those libraries are reused all over the world, in many different contexts. We rarely write our own data structures—they come with the standard library. Or our own HTTP libraries. Or the UI widgets we use. It’s all reused code, but at a global scale.

Before I move on to the next section, I’d like to plug a paper: Lisp: A Language for Stratified Design. It goes into detail about how abstraction can improve your design. I covered it on my podcast, but there’s also something missing from the paper.

Just for completeness, here are the relevant highlights:

  • our designs need to be more general than the problem they are solving

  • identifying the right domain concepts is crucial

  • layering function definitions allows us to express the specific in terms of the general

  • the closure property explodes the number of possible combinations through nesting

  • metalinguistic abstraction lets you switch to a new paradigm

It’s all about leverage. Each bit of leverage is a small amount of code, but it adds disproportionate power. Each layer adds leverage, building on the leverage from below. The top layer has a huge amount of power from the combination of the tiny bits of code it sits atop.
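As a toy illustration of that layering (my own sketch, not from the paper), each definition below is expressed in terms of the more general layer beneath it, and only the top layer mentions the domain:

```clojure
;; Layer 1: fully general.
(defn sum [xs] (reduce + 0 xs))

;; Layer 2: general statistics, built on layer 1.
(defn average [xs] (/ (sum xs) (count xs)))

;; Layer 3: domain-specific, built on layer 2.
(defn average-order-value [orders]
  (average (map :total orders)))

(average-order-value [{:total 10} {:total 20}])
;; => 15
```

Because the layers also accept each other’s outputs (the closure property), a handful of small definitions combine into a large space of possible programs.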

Driving the exponential

While writing my essay last week, I thought of another way to achieve the same thing, even if it is theoretical. Here’s a graph I showed last week of a dream for exponential feature growth. That is, more features per line of code:

In modern startups, they often look for exponential growth. If it’s not one of those lucky products that gets adoption with no effort, the road to exponential growth is slow and a slog.

If you ask anyone who’s been through it, they would say that their exponential curve is a stack of linear curves, each with a higher slope. Each new line is a discovery: An ad campaign that drives traffic. A new offer on the website. A new channel that brings new customers.

Last week, I scoffed at the idea of a new language feature helping us out. All it can do is improve the slope. And it’s true: It only improves the slope once. What we need is a way to continually improve the slope. We need a way for new code we write to make the next line of code more powerful.

This sounds like an argument for continuous and aggressive software design. But it rarely happens. Why?

My hunch is the same explanation for why Aristotelian physics lasted for thousands of years before a handful of scientists (Galileo, Newton, etc.) found a better abstraction within a lifetime. It is that it’s hundreds of times harder to design software this way (to get more and more out of less and less code) than it is to get the next feature working without design. Galileo spent years developing equipment and experimenting with balls of different weights and sizes. Newton toiled over scientific texts, refined his scientific method, and invented a new kind of math just to explain what he was seeing. It’s laborious and requires the luck of insight. As much as Newton mastered alchemy, he did not invent chemistry.

When you’re working in a well designed system that is at the right-hand side of this graph, it feels amazing. But the work to get there is enormous. In the lifecycle of businesses, where time is money, you’d much rather work on the red graph, with its high slope, for as long as possible, switching to the orange graph after you’ve secured marketshare and cashflow.

But we do see that big companies, after they make billions, do reinvest to switch over to the orange curve. Facebook experiments with their own PHP compiler, new GUI frameworks, and other paradigms. Google writes their own languages, invents distributed computing frameworks, and builds vast libraries of high-generality code. Once they make it big, they want to switch curves by rebuilding the lower layers. But these are research projects. They have a low probability of success but a high payoff if they succeed. Most of us don’t have the luxury of that kind of work. We’re raking muck and piling it higher. It’s the economically correct thing to do.

Does it ever make sense to design before you’ve made it big? I think so. Every team has a maximum capacity of difficulty. As you approach that capacity, you slow down, even if the slope of the curve isn’t that high. Before you hit that capacity, you’ve got to do everything you can to bring the difficulty down. You can add tooling to flatten the curve. Or you can do some design work to lower the exponent. You’ve got to convince the company, using the idea of risk to the feature and revenue milestones, to get that curve under control. The biggest difficulty is that you have to go from feature-building mode to a more reflective, scientific curiosity mode, to look for the abstractions that will help.

Permalink

My Experience Using Prolog in 2024 to Solve Advent of Code

This was my first year doing Advent of Code. Previous years I had other stuff I wanted to do with my free time besides sit at a computer even more than I already do… but after seeing some of my coworkers post epic solutions I got FOMO and joined the challenge.

I decided to try to solve things in Prolog to “kill two pigs with one bird” as the Gen Zs say. Green pig #1 was learning Prolog at more than just a really superficial level… I’m a big fan of logic programming languages and Prolog has been on my list for a while. Green pig #2 was I thought it would be a good fit for solving some of these challenges since I’d (in theory) mostly be encoding the rules and letting the computer do the work for me. (Foreshadowing: it was not really this simple in practice.) A bonus pig was flexing on everyone with 1337 skillz which I think I did actually achieve a little bit, at least I like to think so.

I made it all the way to Day 18 before I got sidetracked with family get-togethers and whatnot. I still think I’m going to go back and finish the last week one of these days tho. At least before AoC 2025 hopefully, if the world still exists by then. Checc out my code on GitHub.

Someone convinced me to Twitch stream myself solving the challenges. I really didn’t believe anyone would want to sit around and watch someone program but I guess on the off chance any of the youfs in college might find it useful to see how the “pros” (legends?) work I went ahead and did it… I actually had a lot of people tune in, and no hecklers. I might be the first person to ever live stream coding in Prolog on Twitch. Will I be doing more of this in the future? Maybe. My dream is to be a content creator/influencer and get free merch and a blue checkmark next to my name. Every day I wake up and hang my head in disgust because I am only a minor local celebrity and my invite to the Met Gala gets lost in the mail most years. Gotta stay on that grind!!!!

Anyways you can checc out my Twitch here. Will I be doing more streaming in the future? Was the moon landing faked? Scientists don’t have answers to life’s two biggest questions yet, but keep your eyes peeled.

I also posted videos of some of the live streams on my YouTube for your viewing pleasure.

TL;DR How Was Using Prolog Anyway?

Some of the challenges this year were super easy to solve in Prolog, but others really were not. There was an excessive amount of 2-D grid stuff this year, and Prolog is just really not great at that. (I used SWI-Prolog; maybe one of the other implementations is better, but as a beginner I found it easy to get started with, and it had great Emacs integration.) There is no native array type in Prolog, so the idiomatic way of representing a grid is probably a linked list of linked lists… this is fine for small grids, but the linear lookup/modification times really bite you in the butt when the grid is something larger like 2000 ✕ 2000 and you have to do thousands or even millions of lookups.

At the end of it I spent way more time trying to get my Prolog code to run quickly than I would have with a language that had a better selection of data types and libraries. I wanted to change my grid code to use bit sets instead of lists-of-lists so I could use bit arithmetic for O(1) lookup and modification, but there was no bit set data type or library (that I know of) in SWI-Prolog… so I had to implement a hacky version myself using big integers. A lot faster than lists-of-lists for sure, but if I were writing this in Clojure I could have just used something like java.util.BitSet. This story kept repeating itself… I had to implement A* myself, too.
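For comparison, here is roughly what the bit-set grid looks like from Clojure with java.util.BitSet (a sketch with invented helper names; `width` stands in for the grid dimension):

```clojure
(import 'java.util.BitSet)

(def width 2000) ; grid dimension

;; Map a 2-D cell onto a single bit index.
(defn grid-index [row col]
  (+ (* row width) col))

(defn mark! [^BitSet grid row col]
  (.set grid (grid-index row col)))

(defn marked? [^BitSet grid row col]
  (.get grid (grid-index row col)))

(def grid (BitSet. (* width width)))
(mark! grid 10 20)
(marked? grid 10 20)
;; => true
```

Lookup and modification are O(1) bit operations, which is exactly what the big-integer workaround was emulating by hand.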

Almost every challenge had me coming up with an elegant and straightforward working solution (for the examples) relatively quickly that was just way too slow for the actual challenge input. I then had to spend time reworking my pretty code and making it faster and uglier in order to complete the actual input in a reasonable amount of time.

I like the semantics and power of Prolog a lot better than miniKanren/core.logic. I really enjoyed wrapping my head around Prolog, and I really wish it were more practical to use for more things.

Pits

I think the biggest missing things for me are:

  • A real lack of built-in data types… if Clojure can have immutable arrays then why can’t Prolog?

  • A lack of libraries – why am I wasting time implementing A* myself?

  • Fractured ecosystem – libraries and code written for SWI Prolog are incompatible with GNU Prolog or SICStus or Scryer.

  • A lack of easy escape hatches to drop to lower-level stuff – I wish I could have dropped to something like java.util.BitSet or a raw byte array to implement my bitset code. I don’t even know how I would go about implementing a custom data type; I’m sure it would require C code, but I don’t have a good sense of how well supported extensions like that are.

  • Questionable performance. I think it’s way too easy to shoot yourself in the foot here and write code that is too slow for real-world use… I spent way too much time performance tuning my code.

  • Some basic things like lambdas (anonymous functions/closures) are missing from the core language, and you can use various competing libraries that utilize operator overloading to add support for them… the syntax is ugly and sad. Prolog syntax is not terrible but I would have preferred writing it in S-expressions.

  • I wish constraint logic programming was more generalized and not limited to specific libraries that apply constraints to specific data types. What if this was the way things worked normally instead of being something you have to opt in to?

  • Thinking in terms of relations instead of functions really is harder, and made my head hurt sometimes. Generally not too bad once I got used to it, but it was a little sad when I had to rework some of my relations so they would work efficiently if, say, the first parameter was ungrounded versus the second – this sort of “extra-logical” code needed to make things work efficiently takes away from the magic of pure logic programming.

Peaks

  • The promise of writing declarative rules-based code (tell the computer what you want instead of how you want it to do something) is so alluring it still makes up for the harsh reality of actually using things in a lot of cases.

  • Pattern matching is amazing – this makes me want to use something like Clojure’s defun all over the place.

  • SWI Prolog’s Emacs integration, Sweep, is really good, and debugging my code was really easy – especially compared to something like core.logic, which in my experience was a real nightmare to debug.

  • I really like Prolog’s built-in definite clause grammar parsing capabilities. They worked really well for parsing the input for most of the challenges.

  • I used Microsoft Copilot to tell me how to do basic stuff in Prolog, and it generally gave me pretty good answers – better than I expected given how niche Prolog is. I guess 50 years of Prolog code to train on worked well here. And while more JS code probably gets written in a given day than all the Prolog code ever written, because Prolog is so niche the AIs are probably training mostly on good code.

  • I am happy Prolog is not as dogmatic about pure logic programming as miniKanren. To me it seemed like the Common Lisp vs. Scheme split – the former is for actually solving real problems and the latter is more for writing research papers and admiring the elegance of your solutions.

  • There are lots of good books about learning Prolog from “back in the day”… they don’t all cover a lot of modern stuff like CLP but they did a pretty good job of helping me wrap my head around thinking about things in a Prolog way (as opposed to “writing C in any language”).

Other thoughts

  • I “learned” Prolog in a programming languages course in college, and wow did they really sell the language short. I really did not fully understand the power until I spent a month doing self-study.

  • The Mercury language looks really interesting and makes bold promises about performance but it doesn’t seem to have a lot of resources available for learning it.

  • I would really love if there was a nice implementation of Prolog in Clojure that let me use Clojure goodness like S-expressions and macros and drop-down to low-level Java when needed. I think there are a few JVM implementations of Prolog, but I’m not sure what state they’re in. I spent a little time playing around with things and got a basic Prolog implementation working in Clojure… it’s honestly not rocket science. If I get more free time I might try to play around with this more and see if I can turn it into something actually useful. Maybe I can use it for AoC 2025.

  • I know back in the day a lot of the Common Lisp people played around with implementing Prolog in CL – I even have a few books on my bookshelf that walk thru how to write an implementation. Time to do some reading.
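On the “not rocket science” point above: the heart of a Prolog implementation is unification, and a toy version of it really is small. Here is a minimal sketch in Clojure (my own illustration, not the author’s code), using ?-prefixed symbols as logic variables and a plain map as the substitution:

```clojure
(defn variable? [x]
  (and (symbol? x) (.startsWith (name x) "?")))

(defn walk
  "Follow a variable through the substitution map until it bottoms out."
  [s u]
  (if (and (variable? u) (contains? s u))
    (recur s (get s u))
    u))

(defn unify
  "Unify terms a and b under substitution s; return the extended
   substitution, or nil on failure. (No occurs check.)"
  [s a b]
  (let [a (walk s a) b (walk s b)]
    (cond
      (= a b)       s
      (variable? a) (assoc s a b)
      (variable? b) (assoc s b a)
      (and (sequential? a) (sequential? b)
           (= (count a) (count b)))
      (reduce (fn [s [x y]]
                (if-let [s' (unify s x y)]
                  s'
                  (reduced nil)))
              s
              (map vector a b))
      :else nil)))

(unify {} '(point ?x 2) '(point 1 ?y))
;; => {?x 1, ?y 2}
```

A real implementation adds the occurs check, backtracking search over clauses, and indexing, but this substitution-map core is the essential trick.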

👇 Like and subscribe below! 👇

Permalink

Classifying data for developers in a hurry using LLMs

My need is to categorize RSS feed entries coming from various tech websites into themes like AI, Big Tech, Cloud, Deep Tech, Food delivery, etc.

Initially, I handcrafted my classifier to use the text from the categories (sometimes provided with a feed entry), or the title and description if there was nothing better. It worked okay, but some entries were missing themes, and the code needed constant upkeep to incorporate the latest tech trends.

It also took me a while to understand which themes made sense for me and how to group them together. I did not know the domain as well as I thought, and the dataset, while small, was too big to check manually.

I tried out some NLP techniques like stemming and tokenizing texts, but the results were a bit lackluster, and it would have required even more refinement to get it to work: weighting the words depending on where they appear and using heuristics to figure out what the entry is about.

LLM enters the scene

Instead of this handcrafted classifier, what if I asked an LLM to find the themes for me? I used ChatGPT because it was free, but I think it would work with any LLM. If I were to use an API, I would use Mistral. Here is a maintained list of all the LLMs you can play with for free: https://github.com/eugeneyan/open-llms

Without much refinement, it works pretty well. Here is the prompt I use:

I want you to categorize news items in different categories. At maximum I want 3 themes by item but I want only accurate theme. My themes are: agritech, agtech, ai, big tech, biotech, climate tech (the rest of my list). I don’t want any other themes I want the result to be in EDN like this: {:title “” :themes [] :reason “explain why these themes”} Below are my items, as an EDN datastructure:
[{:title
"Zuck shrugs off DeepSeek, vows to spend hundreds of billions on AI",
:link
"https://techcrunch.com/2025/01/29/zuck-shrugs-off-deepseek-vows-to-spend-hundreds-of-billions-on-ai/",
:categories ["AI" "chips" "deepseek" "Meta" "nvidia"]}
{:title
"LinkedIn passes $2B in premium revenue in 12 months, with overall revenue up 9% on the year",
:link
"https://techcrunch.com/2025/01/29/linkedin-passes-2b-in-premium-revenues-in-12-months-with-overall-revenues-up-9-on-the-year/",
:categories
["Enterprise" "Social" "earnings" "LinkedIn" "Microsoft" "Premium"]}
{:title
"Elon Musk claims Tesla will launch a self-driving service in Austin in June",
:link
"https://techcrunch.com/2025/01/29/elon-musk-claims-tesla-will-launch-a-self-driving-service-in-austin-in-june/",
:categories ["Transportation" "self-driving car" "Tesla"]}
{:title
"Threads adds another 20M monthly users since December, reaching 320M",
:link
"https://techcrunch.com/2025/01/29/threads-adds-another-20m-users-since-december-reaching-320m/",
:categories ["Apps" "earnings" "Meta" "social networking" "Threads"]}
{:title
"U.S. Copyright Office says AI generated content can be copyrighted — if a human contributes to or edits it",
:link
"https://venturebeat.com/ai/u-s-copyright-office-says-ai-generated-content-can-be-copyrighted-if-a-human-contributes-to-or-edits-it/",
:categories
["AI"
"Business"
"AI, ML and Deep Learning"
"category-/News"
"Conversational AI"
"copyright"
"copyright office"
"copyright protections"
"government"
"intellectual property"
"NLP"
"US Copyright Office"]}
{:title
"Hackers are hijacking WordPress sites to push Windows and Mac malware",
:link
"https://techcrunch.com/2025/01/29/hackers-are-hijacking-wordpress-sites-to-push-windows-and-mac-malware/",
:categories
["Security"
"Apple"
"Automattic"
"Chrome"
"cybersecurity"
"macos"
"malware"
"Windows"
"WordPress"]}
{:title
"Agentic AI needs orchestration: How ServiceNow’s AI orchestrator automates complex enterprise workflows",
:link
"https://venturebeat.com/ai/agentic-ai-needs-orchestration-how-servicenows-ai-orchestrator-automates-complex-enterprise-workflows/",
:categories
["AI"
"Agentic AI"
"ai orchestration"
"category-/Computers & Electronics/Software/Business & Productivity Software"
"enteprise workflow"
"itsm"
"LangChain"
"LLM router"
"Now Assist"
"Now Platform"
"ServiceNow"]}
{:title "Microsoft brings a DeepSeek model to its cloud",
:link
"https://techcrunch.com/2025/01/29/microsoft-brings-a-deepseek-model-to-its-cloud/",
:categories
["AI"
"Enterprise"
"Azure"
"azure ai foundry"
"copilot+ pcs"
"deepseek"
"deepseek r1"
"Generative AI"
"Microsoft"
"r1"]}
{:title "Joe Lonsdale’s 8VC seeking $1B for new fund",
:link
"https://techcrunch.com/2025/01/29/joe-lonsdales-8vc-seeking-1b-for-new-fund/",
:categories
["Fundraising" "Venture" "8VC" "defense tech" "Joe Lonsdale"]}
{:title "DeepSeek might have a trademark problem in the US",
:link
"https://techcrunch.com/2025/01/29/deepseek-might-have-a-trademark-problem-in-the-u-s/",
:categories
["AI"
"Apps"
"brand"
"deepseek"
"patent"
"Trademark"
"United States Patent and Trademark Office"
"uspto"]}
{:title "MoviePass might pivot to crypto",
:link
"https://techcrunch.com/2025/01/29/moviepass-might-pivot-to-crypto/",
:categories ["Media & Entertainment" "crypto" "moviepass" "web3"]}
{:title
"Climate change ignited LA’s wildfire risk — these startups want to extinguish it",
:link
"https://techcrunch.com/2025/01/29/climate-change-ignited-las-wildfire-risk-these-startups-want-to-extinguish-it/",
:categories
["Climate"
"Startups"
"BurnBot"
"climate change"
"FireDome"
"Google"
"Kodama"
"la wildfires"
"muon space"
"Pano AI"
"vibrant planet"
"wildfires"]}
{:title
"WeatherTech Meteomatics raises €21.1 million to predict weather’s impact on critical enterprises",
:link
"https://www.eu-startups.com/2025/01/weathertech-meteomatics-raises-e21-1-million-to-predict-weathers-impact-on-critical-enterprises/",
:categories
["Funding"
"Switzerland-Startups"
"Alantra Energy Transition"
"Armira Growth"
"Fortyone"
"Klima"
"Martin Fengler"
"Meteodrones"
"Meteomatics"
"Saint Gallen"
"Series C"
"WeatherTech"]}

And the output file:

[
{:title "Zuck shrugs off DeepSeek, vows to spend hundreds of billions on AI",
:themes [:ai :big-tech],
:reason "The article discusses Meta's investment in AI, making it relevant to AI and Big Tech."}

{:title "LinkedIn passes $2B in premium revenue in 12 months, with overall revenue up 9% on the year",
:themes [:big-tech :money],
:reason "LinkedIn, owned by Microsoft, is a major Big Tech company, and the focus on revenue categorizes this under money."}

{:title "Elon Musk claims Tesla will launch a self-driving service in Austin in June",
:themes [:electric-vehicles],
:reason "Tesla's autonomous vehicle plans are directly related to electric vehicles."}

{:title "Threads adds another 20M monthly users since December, reaching 320M",
:themes [:big-tech],
:reason "Threads is a Meta-owned platform, making it part of Big Tech developments."}
...
{:title "WeatherTech Meteomatics raises €21.1 million to predict weather’s impact on critical enterprises",
:themes [:climate-tech],
:reason "Meteomatics focuses on climate and weather impact prediction."}
...

A few takes from this:

  • It can generate EDN just fine, avoiding text or JSON parsing. The output is reusable as is; no need to fiddle with it.
  • It’s pretty accurate considering I used only 100 entries (some having no categories and a minimal title). To be fair, the themes often have a “close to direct match” in the text, but not close enough to be matched word by word.
  • The `reason` field is overkill for this task but it was useful for making sense of the location classifier attempt (see below).
  • I did not use the description because I reached the limit of ChatGPT’s free tier really quickly. If I were to use this system for production code, I would clean up the entry to limit the text sent (i.e., turn the HTML description into simple Markdown and remove the redundant text) and, of course, I would not mind spending a few euros to improve the results (*).
  • I could automate the process for free, considering the limited amount of data I have, by using Mistral or any other free provider with an API. If I had more data, the cost would still be really low, especially compared to the price of coding this classifier from scratch.

(*) For production, there is also a confidentiality/security question: how okay is it to send all your data to a third-party actor operating under the Patriot Act or similar laws? For certain fields, it is just not an option in the current world.

Removing the LLM

There is a lot of similarity in the data to categorize, and I want to keep the cost and complexity low since it’s a side project. Nobody wants an extra dependency if it can be avoided. What if I asked the LLM to create a classifier? I used this prompt:

I’m a Clojure developer. Can you generate a Clojure code using regex to find the same themes with the same data as input?

I then asked for a second try, restating the themes to use. Here is a fragment of the code output:

(def themes
  {"Agritech"     #"(agritech|agriculture technology|smart farming)"
   "Agtech"       #"(agtech|precision agriculture|farm automation)"
   "AI"           #"(artificial intelligence|AI|machine learning|neural network|deep learning)"
   "Big Tech"     #"(big tech|FAANG|Google|Apple|Microsoft|Amazon|Facebook)"
   "Biotech"      #"(biotech|biotechnology|genomics|biopharma|bioengineering)"
   "Climate Tech" #"(climate tech|carbon capture|renewable energy|climate solutions)"
   "Clojure"      #"(Clojure|functional programming|Lisp)"
   "Cloud"        #"(cloud computing|AWS|Azure|Google Cloud|SaaS|serverless)"
   "Crypto"       #"(crypto|cryptocurrency|blockchain|Bitcoin|Ethereum|DeFi|NFT)"
   "Deeptech"     #"(deeptech|advanced technology|quantum computing|nanotech|fusion energy)"
   "Ecommerce"    #"(ecommerce|online shopping|Amazon|Shopify|marketplace)"
   ...

The output contains a map of regexes that match the domain, plus some boilerplate to iterate over it.
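That iteration boilerplate is short. Here is a minimal sketch of how a map like the one above gets used (with a trimmed-down, illustrative theme map rather than the full generated one):

```clojure
;; Trimmed-down theme map for illustration; (?i) makes the
;; matching case-insensitive.
(def themes
  {"AI"       #"(?i)(artificial intelligence|\bAI\b|machine learning)"
   "Big Tech" #"(?i)(big tech|Google|Apple|Microsoft|Amazon|Meta)"
   "Crypto"   #"(?i)(crypto|blockchain|Bitcoin|Ethereum)"})

(defn classify
  "Return the sorted vector of themes whose regex matches the title."
  [title]
  (vec (sort (for [[theme re] themes
                   :when (re-find re title)]
               theme))))

(classify "Microsoft brings a DeepSeek model to its cloud")
;; => ["Big Tech"]
```

Keeping the classifier as a map of regexes makes it data-driven: adding a theme is one new entry, not new code.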

I did not reuse the code directly but I used it as a starting point for a regex state machine. Mainly:

  • If it starts with what/which/when… and ends with a ‘?’, then it’s an opinion piece
  • A few regex improvements
  • If it has a dollar, euro, or pound sign, it’s about funding. If it’s a pound sign, it’s also about the UK.
  • I incorporate the regex matching in my system.

And that was it.

Overall, it took me more time to compare the new classifier with the old one than to write the code. Writing this, I guess I could have used an LLM for comparing the results 🤔. Pretty sweet to avoid the tedious coding and domain exploration. I know it could be even better if I fed it more data and played around with the prompts more, but it is good enough for my purpose.

After that, I did the same for my feeds related to Energy, and a few minutes later I had the same kind of classifier. This time I let the LLM suggest themes since my knowledge of the energy domain is more limited. In 5 minutes it was ready to be used.

What about extracting location in my tech feeds?

Calling the LLM with the EDN data works great: it was close to 100% accurate. Especially for well-known companies, it figures out where they are based without it being obvious in the text, and it can figure out all the aliases and variants of country names.

However, the code generated to do the same job was not usable. The data sample was probably too small, and the LLM inferred that this job should be done by an external API instead of keeping the tech-hub cities in memory and matching against them. Maybe with more context on what I’m building, it could still figure it out? After all, software complexity is arbitrary; I’m sure some dev would be happy to get a code sample that runs against a third-party provider. If I were to use a third-party provider for this task, though, an LLM would make more sense, and it would be in the same price range.

What else could an LLM do for me?

I feel like I’ve only scratched the surface of what’s possible for my smart RSS reader. LLMs work well for classifying similar data when categories are limited and clues are obvious for a human. The costs of calling LLMs for this purpose are dropping almost every week, which makes it a handy tool to have in my toolbox.

An interesting next step would be prompting an LLM with unmatched data to suggest classifier improvements. If my classifier remains modular, the system could self-improve it with minimal human oversight — just a quick code review. Pretty exciting stuff.

Conclusion

Using LLMs for development has a lot of potential to prevent spending time on tedious code and to make experienced developers more productive. I’m sure a machine learning specialist would produce a better system, but without that expertise, I’m happy to solve my problem in a few hours at zero cost and with zero new dependencies.

I did not try Cursor + Anthropic, but I guess that would have worked similarly, with slightly better code suggestions: I could have reached a solution even faster without ever leaving my IDE. The key is to keep the function small, dedicated to the task, and free of side effects, which is good code hygiene anyway.

Permalink

Clojure Keybindings


Clojure Logo.

Introduction

In my previous blog post, Copilot Keybindings, I wrote about my VSCode keybinding configurations with Copilot. In this blog post I briefly list my favorite Clojure programming keybindings.

VSCode + Calva

Nowadays I use the excellent VSCode Calva extension for Clojure programming. I can really recommend Calva, even if you use VSCode with other programming languages as well. Calva provides really good REPL integration, and also other interesting stuff for Clojure programming. I have written three articles regarding my Calva configurations; you might want to read those as well.

My Favorite Clojure Keybindings

I use paredit for structural editing of Clojure code. These are the paredit keybindings that I use the most.

Slurping and barfing (several keybindings). These are the standard paredit features every paredit user uses constantly. I have already mentioned these in my previous blog posts. The Calva paredit documentation explains slurping and barfing well. Another great website to visualize slurping and barfing is The Animated Guide to Paredit.

Raise Sexp (alt+shift+i). This is another favorite of mine in Clojure programming. Quite often you realize that you don’t need the outer function; just raise the Sexp inside it.

Evaluate Sexp (alt+l). Evaluating code in the editor is part of the standard REPL workflow. You definitely have to learn it.

Evaluate Current Form (Sexp) to Comment (alt+shift+l). This is a nice feature. When you are working on something, you might want to experiment and save the evaluation result to a comment, so that you have the result as a visual aid. Like this:

  (interpose 0 [1 2 3])<alt+shift+l here>
  ;;=> (1 0 2 0 3)

Kill Next Sexp (ctrl+shift+k). Example: [:a |:b :c] => [:a | :c]. I started programming using Emacs as my editor way back in the 90’s, and I have used Emacs keybindings in every editor since (Eclipse, IntelliJ, and now VSCode). Since ctrl+k is the Emacs command to kill to the end of the line and add it to the kill ring, it is natural to have a similar keybinding for killing Sexps.

Kill All Next Sexps (ctrl+alt+k). The previous keybinding kills only the next Sexp; this keybinding kills all the following Sexps inside the current Sexp. Example: [:a |:b :c] => [:a |].

Copy Sexp (alt+shift+k). Again, this is a similar keybinding, but instead of killing, we copy the Sexp. This one is actually provided by the excellent Joyride, which I learned about from Clojurians Slack in the #calva channel some years ago.

  {
    "key": "shift+alt+k",
    "command": "joyride.runCode",
    "when": "calva:connected && calva:keybindingsEnabled && editorLangId == 'clojure'",
    "args": "(p/do (vscode/commands.executeCommand \"paredit.selectForwardSexp\") (vscode/commands.executeCommand \"editor.action.clipboardCopyAction\") (vscode/commands.executeCommand \"cursorUndo\"))"
  },

Joyride is an excellent way for a Clojurian to extend VSCode functionality. I highly recommend reading more about Joyride.

Show Clojure Documentation for a Symbol (ctrl+alt+d). I learned this one, too, from Clojurians Slack in the #calva channel some years ago.

  {
    "key": "ctrl+alt+d",
    "command": "joyride.runUserScript",
    "args": "clojuredocs.cljs"
  },

The clojuredocs.cljs script can be found in the Joyride repo. This is a nice feature: you can browse the Clojure documentation in VSCode without needing to open an external web browser. Like this:

In the symbol: ctrl+alt+d:

Show documentation - click

=> Simple browser opens the Clojure documentation for the symbol.

Show documentation - show browser

Read more about ClojureDocs integration in the Calva User Guide.

Copy Clojure Documentation for a Symbol to a Rich Comment (ctrl+shift+d). Saves you from opening the documentation and copy-pasting it manually. This way you can have the documentation in a Rich Comment for experimentation.

Some VSCode Specific Keybindings Used in Programming

These keybindings are not Clojure related, but work with any programming language in my VSCode configuration:

Go to Definition (ctrl+shift+i). E.g. with Clojure, you can quickly navigate to the function definition. This is actually a great way to learn the Clojure standard library functions - look at the source code.

Go Back (ctrl+shift+j). Since I have configured i,j,k,l as arrow keys (see e.g. Dygma Raise Keyboard Reflections Part 1), if I “go up” to the Symbol using i = up, it is logical to go back with j = left. Ok, I admit. Logically, it would make more sense to “drill down” to the symbol definition (with k), but since ctrl+shift+k is reserved for killing Sexps I had to use i.

Go Forward (ctrl+shift+l). I guess you already figured out, why l (= right).

Kill Editor Tab (ctrl+x k). I use this quite often after Show Clojure Documentation for a Symbol, i.e., when I have browsed the documentation, I close the tab with ctrl+x k.

Next/previous Editor Tab (CapsLock+ctrl+u / CapsLock+ctrl+o). I actually use CapsLock as a special key, read more about that in my blog Linux Keyboard Configuration with Keyd. Anyway, using these keybindings it is fast to navigate between different editor tabs. I use the same keybindings with Ghostty (see: Linux Configurations for Year 2025) - it is a good general rule to use the same keybindings for similar functionality in different tools.

Show Settings (ctrl+t ctrl+j). Every once in a while you need to check or configure something in VSCode. It is nice to have a shortcut for VSCode settings.

Show Keybindings (ctrl+t ctrl+k). For tweaking keybindings, it is also nice to have a keybinding to configure keybindings.
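For anyone wanting something similar, a sketch of the keybindings.json entries could look like the following. The ctrl+t chords are my own choices; the command IDs are VSCode's built-in ones (worth double-checking in your own Keyboard Shortcuts editor):

  {
    "key": "ctrl+t ctrl+j",
    "command": "workbench.action.openSettings"
  },
  {
    "key": "ctrl+t ctrl+k",
    "command": "workbench.action.openGlobalKeybindings"
  },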

Show Hover Documentation for a Symbol (ctrl+alt+e). You can see the hover documentation when you place your mouse over the symbol. I prefer not to use the mouse, if possible. Therefore, I have a keybinding to show the hover documentation. If you want to scroll the hover documentation, you need to give the command twice (i.e., ctrl+alt+e and again ctrl+alt+e). To close the hover window, press Esc. This is a nice feature, since you can have the Clojure documentation in the same editor tab as a hover window, scroll the documentation, and close it - all just using the keyboard. Like this:
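If you want to set this up yourself, the hover can be opened from the keyboard with VSCode's built-in editor.action.showHover command; a minimal keybindings.json sketch (the ctrl+alt+e chord is my own choice) could look like this:

  {
    "key": "ctrl+alt+e",
    "command": "editor.action.showHover",
    "when": "editorTextFocus"
  },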

Hover documentation.

Help Needed!

There is one part of the above-mentioned setup that does not work flawlessly with my no-mouse workflow. If I open the Clojure documentation with the ctrl+alt+d keybinding, the Simple Browser view does open, and the Joyride script loads the Clojure symbol documentation in it. But there are two problems:

  • I cannot scroll the web content unless I first click inside it with the mouse. Only then is the focus inside the web content, and I can scroll the web page using my keyboard.
  • If I have clicked the web content, I cannot close the Simple Browser tab view with my ctrl+x k chord, since the web browser consumes all keyboard events. I once again need to use my mouse to close the Simple Browser tab.

This is a bit annoying. If anyone has solved this issue, please ping me in the Clojurians Slack.

Conclusions

As a programmer, it is important to configure your editor to make your workflow as smooth as possible. Therefore, try to figure out logical keybindings for various common functionalities.

And, remember to keep your editor settings in Git!

The writer is working at a major international IT corporation building cloud infrastructures and implementing applications on top of those infrastructures.

Kari Marttila

Kari Marttila’s Home Page in LinkedIn: https://www.linkedin.com/in/karimarttila/

Permalink

About fixing the right problem

It is fascinating how we humans often choose to treat the symptom instead of addressing the actual problem — like endlessly replacing water-filled buckets with empty ones instead of fixing the leaking roof.

At least that is how it felt at work the other day… again. This time though, the problem was easier to spot in the code than usual.

Our code needs to sort a map numerically (:X29, :X137, :X1108) rather than lexically (:X1108, :X137, :X29).

But the business hands down specifications to my team where these keys sometimes include letters, like :X137a.

For reasons now shrouded in the mists of time, we opted to replace letters with numbers in the code. Usually :X137a is represented by :X137.1.

The gap between the business specifications and the limitations of the technical solution isn't obvious. New developers fall into this trap almost inevitably, and even seasoned developers occasionally forget that letters aren't allowed.

So of course I found the team fixing the sorting to account for letter postfixes 🚀 creating a test case that blocks “merge to main branch” when letters are mistakenly introduced. 🥺 😭

I suspect that treating symptoms annoys me more than the average person for some reason. This time, my annoyance got the better of me, and I found the code responsible for sorting:

(into (sorted-map) (reduce (fn [acc [k v]]
                             (conj acc [(Float/parseFloat (apply str (rest (name k)))) v]))
                           []
                           some-map))

It should probably have been written like the following for readability:

(->> some-map
     (reduce (fn [acc [k v]]
               (conj acc [(Float/parseFloat (apply str (rest (name k)))) v]))
             [])
     (into (sorted-map)))

The code (apply str (rest (name k))) is used to remove the X prefix, and I found that the keys in the resulting map weren’t used for anything other than sorting.

My gut feeling told me that sorted-map-by was the way to go. Even if the “converted to float” keys would be expected later in the call chain, it seemed likely that a trivial code change would be enough to support the original keyword keys instead.

Not having sorted things for a while, I needed to refresh my understanding. A vector representation proved to get the job done:

(sort [[137 "a"] [29 "b"] [1108 "x"]])
; ([29 "b"] [137 "a"] [1108 "x"])

All that was missing was a function that could convert the special keyword to this vector format:

(defn make-sortable
  [n]
  (-> (name n)
      (subs 1) ; removes "X"
      (str/split #"(?=[^\d\.])" 2) ; splits number from non-number
      (update 0 #(Float/parseFloat %))))

While writing this blog post, I suddenly noticed the following sorting behavior, which occurs because str/split makes no guarantees about the length of the returned vector:

(sort [[137 "a"] [137] [29 "b"] [1108 "x"]])
; ([137] [29 "b"] [137 "a"] [1108 "x"])

Having 137 sorted before 29b was not going to cut it, so I made the following amendments:

(defn pad-vector
  [v len pad-val]
  (vec (take len (concat v (repeat pad-val)))))

(defn make-sortable
  [n]
  (-> (name n)
      (subs 1) ; removes "X"
      (str/split #"(?=[^\d\.])" 2) ; splits number from non-number
      (pad-vector 2 nil) ; enforce same length for consistent sort
      (update 0 #(Float/parseFloat %))))
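As a quick sanity check (assuming clojure.string is required as str and the pad-vector and make-sortable definitions above are loaded), the amended function now produces equal-length vectors that sort as intended:

(require '[clojure.string :as str])

(make-sortable :X137)
; => [137.0 nil]

(make-sortable :X137a)
; => [137.0 "a"]

(sort (map make-sortable [:X137a :X29 :X137 :X1108]))
; => ([29.0 nil] [137.0 nil] [137.0 "a"] [1108.0 nil])

Since nil compares as less than any string, a plain :X137 now sorts just before :X137a, exactly where it belongs.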

Now, back from the detour, the above would allow “sorting code” to look like the following:

(->> some-map
     (into (sorted-map-by #(compare (make-sortable %1)
                                    (make-sortable %2)))))
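To illustrate with some hypothetical sample data: the comparator keeps the original keyword keys in the map while sorting them numerically, letter suffixes and all:

(->> {:X137a 1, :X29 2, :X1108 3, :X137 4}
     (into (sorted-map-by #(compare (make-sortable %1)
                                    (make-sortable %2)))))
; => {:X29 2, :X137 4, :X137a 1, :X1108 3}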

Instead of adding 30 lines of test suite code to prevent using letters, we wrote 12 lines that allow letters in keys while maintaining functionality — implemented slightly faster. Most importantly, the software behavior is now more intuitive, leading to less confusion.

Way too often, we avoid exploring the unknown, and instead reach for an empty bucket. It’s extra frustrating because my team is filled with both intelligent and competent people. Though most are young and inexperienced, I think the main problem is rooted elsewhere.

In the organization where we work, there’s a culture that often frowns upon suggesting fixing the root cause. Management assumes it’s generally faster to delay and work around problems, and mentioning “technical debt” is a no-no. Do we really want to teach the next generation of developers to “fix” problems like this? How much of the code AI is learning from is addressing symptoms rather than root causes?

Not knowing any better, you might think that at least this time, we got the roof fixed. But the reason we need sorting in the first place is that we never implemented a “dependency resolver”. Instead, we rely on sorting to guarantee dependency availability, following the convention: Never depend on something with a higher number.

How could this ever go wrong? 🤣

Can we please fix the flipping roof already?

Permalink

Clojure Deref (Jan 30, 2025)

Welcome to the Clojure Deref! This is a weekly link/news roundup for the Clojure ecosystem (feed: RSS). Thanks to Anton Fonarev for link aggregation.

Libraries and Tools

New releases and tools this week:

  • brew-install 1.12.0.1501 - Clojure CLI

  • editscript 0.6.6 - A library to diff and patch Clojure/ClojureScript data structures

  • neanderthal 0.52.0 - Fast Clojure Matrix Library

  • cljs-worklet 0.2.0 - Run ClojureScript functions on a worklet thread in React Native

  • hazel - POC exploring adaptation of Datomic principles for the frontend

  • clojure-plus 1.0.0 - A collection of utilities that improve Clojure experience

  • next-jdbc 1.3.994 - A modern low-level Clojure wrapper for JDBC-based access to databases

  • replicant 2025.01.29 - A data-driven rendering library for Clojure(Script) that renders hiccup to DOM or to strings

  • basilisp-blender 0.4.0 - A library designed to facilitate the execution of Basilisp Clojure code within Blender

  • stratify 0.3.0 - Explore and improve architecture of software, visualize structure and dependencies of codebases, calculate code metrics

  • pretty 3.3.1 - Library for helping print things prettily, in Clojure - ANSI fonts, formatted exceptions

  • cli-tools 0.15 - CLIs and subcommands for Clojure or Babashka

  • statecharts 1.2.13 - A Statechart library for CLJ(S)

  • fulcro 3.8.2 - A library for development of single-page full-stack web applications in clj/cljs

  • fulcro-rad-semantic-ui 1.4.6 - Semantic UI Rendering Plugin for RAD

  • clj-java-decompiler 0.3.7 - REPL-integrated Clojure-to-Java decompiler

  • cli-tools 0.15.1 - CLIs and subcommands for Clojure or Babashka

  • calva 2.0.485 - Clojure & ClojureScript Interactive Programming for VS Code

  • deps.clj 1.12.0.1501 - A faithful port of the clojure CLI bash script to Clojure

  • virgil 0.3.2 - Recompile Java code without restarting the REPL

  • lexical-chocolate 0.0.3 - Provides utilities for building lexical contours

  • devcontainer-templates 1.1.2 - Devcontainer templates for Clojure

  • dda-backup 5.3.0 - Backup library

Permalink

Nothing Janky About This New Programming Language

With programming languages among the top interests of TNS readers, we are constantly on the lookout for new languages that could have a potential impact on developers. Jeaye Wilkerson’s jank is a dialect of Clojure that he says can be used anywhere C++ and Lua are used. It is a general-purpose language aimed at gaming and other use cases.

jank includes Clojure’s code-as-data philosophy and strong macro system. It is a functional-first language that builds upon Clojure’s rich set of persistent, immutable data structures, Wilkerson said.

TNS News Editor Darryl K. Taft interviewed Wilkerson, who recently quit his job at Electronic Arts (EA) to focus full-time on jank, about his creation.

What was the tipping point that made you decide to work on jank full-time?

Wilkerson: jank has been ramping up over the past two years, since I switched to part-time at EA. At this point, it’s an incredibly popular project, for something which is unreleased, and I still didn’t feel like I had enough time to work on it. I had been considering working on jank full-time for months. I suppose the tipping point was when my wife said something along the lines of “Well, are you going to do it or not?”

What were the key technical challenges in developing jank?

Wilkerson: jank is marrying Clojure and the native world. The two could not be more dissimilar. To get Clojure running in a native environment, we first needed to be able to JIT compile native code. The tooling around this is still young and I’ve had to work closely with multiple LLVM devs to get issues fixed.

Beyond that, Clojure is built on the JVM, which has decades of development put into it. To replicate that on native, I need to effectively build a mini VM myself. This includes the object model, the module loader, the JIT compiler, and the garbage collector, to name a few. I use off-the-shelf solutions where possible, but everything needs to be put together manually.

Lastly, jank aims to provide seamless interop with C++ from an entirely dynamic language. This requires JIT compiling C++ at runtime so that we can know the types of values, look up which functions exist, do overload resolution, instantiate templates, and carry on C++’s RAII guarantees. There is no mainstream dynamic language which has this level of interop with C++, from what I’ve seen. The primary reason is that it’s incredibly difficult.

What’s your vision for jank’s development over the next year?

Wilkerson: In 2025, I will release an alpha version of jank for people to start using. I’ll collect feedback, improve stability, and start changing the Clojure ecosystem to be one which has first-class support for native.

How are you approaching sustainability/funding for jank’s development?

Wilkerson: This year, I’m just focused on getting jank out the door. I’m prepared to not receive any funding, if none comes in. However, in the past, jank has received open source grants from Clojurists Together and I will continue to apply for those. It’s my dream that I can be paid to work on jank full-time, but in order to do that I need jank to start providing value and become an indispensable part of Clojure’s ecosystem. If that doesn’t happen, it’s no problem. I’ll get another job when I need to, but I’ll keep working on jank either way.

Are you working completely solo, or do you have any collaborators/contributors?

Wilkerson: I’m the only one working on jank full-time, but I do have some regular contributors. I also have three mentees, as part of the SciCloj mentorship program, with whom I meet weekly and give them tasks which further develop jank and their compiler hacking skills.

Building a healthy community is important to me. Encouraging people to help out, and making it easy for them to do so, is part of that.

Are you looking to build a community around jank? If so, what kind of contributions would be most valuable?

Wilkerson: Absolutely. In the coming weeks, jank will get easier and easier to install. As of today, it can now be installed via homebrew, which wasn’t possible last month, for example. As jank becomes more accessible, having people try it, provide feedback, and, more importantly, report bugs will be crucial to jank having a stable launch later this year.

How does jank handle performance optimization, particularly for resource-intensive applications?

Wilkerson: jank is written in C++ and has seamless C++ interop, but it’s not C++. It’s not a systems programming language. It’s a Clojure dialect and it has the same performance characteristics of Clojure. jank will do well anywhere Clojure does well, and perhaps in other places, too, since it’s lighter in memory usage and starts up much faster.

With that said, after I reach parity with Clojure and I continue to develop jank’s feature set beyond that, I will be adding more control on the spectrum of dynamism so that some parts of jank can be locked down, with little to no dynamic allocations, using static typing and monomorphized functions.

Clojure, historically, has not had a strong focus on optimizing its compiler. It relies on the JVM for all of the heavy lifting. jank will not follow that path; I think that a smarter compiler can make all the difference and every low hanging fruit will be plucked.

Are there any specific technical decisions or trade-offs you made that you think would be interesting for other developers to understand?

Wilkerson: The JVM is a heavily optimized machine. Competing with it, from the ground up, is tough, even starting on native. In the micro-benchmarking I’ve done, jank is quite competitive with Clojure, but I’ve done this by taking advantage of the fact that I’m building a system specifically for jank while the JVM is a much more generic system. An example of that is jank’s object model, which doesn’t use virtual dispatch to avoid the cost of vtables. [This is documented here: https://jank-lang.org/blog/2023-07-08-object-model/]

What advice would you give to other developers considering a similar move with their projects?

Wilkerson: You only live once.

The post Nothing Janky About This New Programming Language appeared first on The New Stack.

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.