Clojure Goodness: Interleave Keys And Values Into A Map With zipmap

The Clojure function zipmap creates a map by interleaving a collection of keys with a collection of values. The first element of the keys collection becomes the first map entry's key and the first element of the values collection becomes that entry's value, and so on for the remaining elements in the keys and values collections.

In the following example code we use zipmap with different kinds of keys and values collections:

(ns mrhaki.core.zipmap
  (:require [clojure.test :refer [is]]))

;; zipmap creates a map with keys from the
;; first collection and values from the second.
(is (= {:name "Hubert" :alias "mrhaki"} 
       (zipmap [:name :alias] ["Hubert" "mrhaki"])))

;; If the size of the values collection is smaller
;; than the size of the keys collection, only 
;; keys that map to a value end up
;; in the resulting map. 
(is (= {:name "Hubert"}
       (zipmap [:name :alias] ["Hubert"])))

;; If the size of the keys collection is smaller
;; than the size of the values collection, then the
;; returned map only contains keys from the keys 
;; collection and the extra values are ignored.
(is (= {:name "Hubert"}
       (zipmap [:name] ["Hubert" "mrhaki"])))

;; Using a lazy sequence created by the repeat
;; function we can set a default value for all keys.
(is (= {:name "" :alias "" :city ""}
       (zipmap [:name :alias :city] (repeat ""))))

;; If the same key appears more than once, the last
;; mapping ends up in the resulting map.
(is (= {:name "mrhaki"}
       (zipmap [:name :name] ["Hubert" "mrhaki"])))

;; Keys for the resulting map don't have to be keywords,
;; but can be any type.
(is (= {"name" "Hubert" "alias" "mrhaki"}
       (zipmap ["name" "alias"] ["Hubert" "mrhaki"])))

Written with Clojure 1.10.1.


Venturing into front-end land

I've been a professional software developer for almost half a year now, and I'm still super insecure about my front-end skills. So insecure that at work I make a conscious effort to avoid front-end related tasks by citing my specialty and affinity for the back-end, which is a complete lie; I like the back-end because most of my professional training and experience has been back-end oriented.

Recently, I've summoned up the courage to get out of my comfort zone and take a few courses on JS, CSS, and HTML, and they've certainly helped me feel at ease about rendering something in the browser. Strangely, it took me this long to discover a huge merit of working on the front-end: getting to interact directly with what I'm building. In front-end land, I know what I'm building is pretty ugly (😅), but at least I'm directly responsible for the mess, whereas it can be a muddy process tracking exactly what it is that I do on the back-end.

So where am I at? Well, I think I have a loose grasp of everything in JS land. A mentor advised me to start blogging to reinforce the concepts, so here I am. Over the past three weeks, I've learned tons of new things, and I thought it would be nice to share them here. You can tell me if what I learned is BS. For reference, here are some of the topics I intend to cover herein:

  • JavaScript
  • React
  • Vue
  • D3

As for how my newfound skills will help me professionally, I'm not sure. Until I get enough practice and feel like I really know what I'm doing, nothing's going to change at work; I'll continue trudging along in Clojure land, which has been my favorite land as of late. Anyways, I hope you enjoy my writing.

Warmly,
EK


Env Vars in under 100 Lines of Code — (.env)


Like many projects that run on Node.js, our apps use the dotenv package from npm to load environment variables from a local file that is kept out of git.

This lets our developers keep important secrets out of source control while our apps maintain a consistent way to read from the environment.

But I like to break things and decided to rewrite dotenv entirely in ClojureScript; the result has been added to our degree9/enterprise repo.

Let’s get started!

Ok, so the Node.js environment is mutable within the running process, which means we can override the js/process.env object with variables we load from our local file. This is basically the same approach that dotenv uses.
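
As a quick illustration (my own snippet, not from the original post), mutating the environment from a ClojureScript Node REPL looks like this:

(require '[goog.object :as obj])

;; Write into the live environment and read it back.
(obj/set js/process.env "MY_ENV_VAR" "foobar")
(obj/get js/process.env "MY_ENV_VAR") ;; => "foobar"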

Let’s start with some simple helper functions; these will make working with the .env file and js/process.env a bit easier.

First up is read-file. We want to make sure the environment is populated before anything tries to read from it, so we use the synchronous operation. By providing an encoding, the function returns a string instead of a buffer.

;; Assumed requires (not shown in the post): ["fs" :as fs] ["path" :as path]
;; [clojure.string :as cstr] [goog.object :as obj]. Note that the public
;; get defined below shadows clojure.core/get.
(defn- read-file [path]
  (.readFileSync fs path #js{:encoding "utf8"}))

(defn- env-file [dir]
  (.resolve path dir ".env"))

The env-file helper is a simple path resolver which allows us to provide an optional directory to look for the .env file in.

(defn- split-kv [kvstr]
  (cstr/split kvstr #"=" 2))

(defn- split-config [config]
  (->> (cstr/split-lines config)
       (map split-kv)
       (into {})))

Next we have the split-kv helper; it produces a key/value pair from a line like MY_ENV_VAR=foobar.

split-config takes the result of our synchronous file read, splits it into lines, and applies split-kv to each line.
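
For example, at the REPL (my own sketch):

(split-kv "MY_ENV_VAR=foobar")   ;; => ["MY_ENV_VAR" "foobar"]
;; the limit of 2 keeps any later = signs inside the value
(split-kv "URL=http://a.b/?x=1") ;; => ["URL" "http://a.b/?x=1"]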

(defn- dot-env [path]
  (-> (env-file path)
      (read-file)
      (split-config)))

Combining our existing helpers gives us the dot-env function, a convenient way to load a .env file as a hash-map config.

(defn- node-env [env]
  (->> (js-keys env)
       (map (fn [key] [key (obj/get env key)]))
       (into {})))

It has a counterpart, the node-env function, which converts a js/process.env object into a Clojure hash-map.

Our last helper is populate-env!, which takes a Clojure map and sets each key/value pair on the js/process.env object.

(defn- populate-env! [env]
  (doseq [[k v] env]
    (obj/set js/process.env k v)))

Now that we have all our helper functions ready to go, we move on to the public API we will be using to interact with the environment variables.

(defn init!
  "Initialize environment with variables from .env file."
  ([] (init! {:path (.cwd js/process) :env js/process.env}))
  ([{:keys [path env]}]
   (populate-env!
     (merge (dot-env path)
            (node-env env)))))

Starting up our application, we want to initialize the environment; we do this by populating it with the result of merging our .env and js/process.env maps. We want to call this init! function as early as possible in our code so that the environment is ready before anything tries to read from it.
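
One detail worth noting: merge is right-biased, so values already present in the real environment win over values from the .env file. A small illustration (mine, not from the post):

;; node-env (the right-hand map) wins on conflicts:
(merge {"A" "from-file" "B" "file-only"}  ;; dot-env result
       {"A" "from-process"})              ;; node-env result
;; => {"A" "from-process", "B" "file-only"}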

In our application we simply read from the environment using our other public function, get:

(defn get
  "Return the value for `key` in environment or `default`."
  ([key] (get key nil))
  ([key default] (get js/process.env key default))
  ([env key default] (obj/get env key default)))

Put it all together!

(require '[degree9.env :as env])
(env/init!)
(env/get "MY_ENV_VAR")

Env Vars in under 100 Lines of Code — (.env) was originally published in degree9 on Medium.


REPL Driven Design

If you follow me on Facebook you know that I’ve been publishing daily coronavirus statistics. I generate these statistics using the daily updates in the Johns Hopkins github repository.

At first I just hand copied the data into a spreadsheet. But that became tedious quite rapidly.

Then, in late March, I wrote a little Clojure program to extract and process the data. Every morning I pull the repo, and then run my little program. It reads the files, does the math, and prints the results.

Of course I used TDD to write this little program.

But over the last several weeks I’ve made quite a few small modifications to the program; and it has grown substantially. In making these adaptations I chose to use a different discipline: REPL Driven Design.

REPL Driven Design is quite popular in Clojure circles. It’s also quite seductive. The idea is that you try some experiments in the REPL to make sure you’ve got the right ideas. Then you write a function in your code using those ideas. Finally, you test that function by invoking it at the REPL.

It turns out that this is a very satisfying way to work. The cycle time – the time between a code experiment and the test at the REPL – is nearly as small as with TDD. This breeds lots of confidence in the solution. It also seems to save the time needed to mock and create fake data because, at least in my case, I could use real production data in my REPL tests. So, overall, it felt like I was moving faster than I would have with TDD.

But then, in late April, I wanted to do something a little more complicated than usual. It required a design change to my basic structure. And suddenly I found myself full of fear. I had no way to ensure that those design changes wouldn’t leave the system broken in some way. If I made those changes, I’d have to examine every output to make sure that none of them had broken. So I postponed the change until I could muster the courage, and set aside the dedicated time it would require.

The change was not too painful. Clojure is an easy language to work with. But the verification was not trivial, which led me to deploy the program with a small bug – a bug I caught 4 days later. That bug forced me to go back and correct the data and graphs that I generated.

Why did I need the design change? Because I was not mocking and creating fake data. My functions just read from the repo files directly. There was no way to pass them fake data. The design change I needed to make was precisely the same as the design change that I’d have needed for mocking and fake data.
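
To make that concrete, here is a hedged sketch of the decoupling in question – the names and data shapes are invented for illustration, not taken from the actual program. The pure core takes data as an argument; a thin shell does the file I/O.

(ns covid.stats
  (:require [clojure.string :as str]))

;; Pure core: easy to test with fake data or real data.
(defn parse-row [line]
  (let [[region cases] (str/split line #",")]
    {:region region :cases (Long/parseLong cases)}))

(defn total-cases [rows]
  (reduce + (map :cases rows)))

;; Impure shell: the only part that touches the repo files.
(defn daily-total [file]
  (->> (slurp file)
       str/split-lines
       (map parse-row)
       total-cases))

;; The pure core needs no mocking:
;; (total-cases [{:region "US" :cases 3} {:region "IT" :cases 4}]) ;=> 7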

Had I stuck with the TDD discipline I would have automatically made that design change, and I would not have faced the fear, the delay, and the error.

Is it ironic that the very design change that TDD would have forced upon me was the design change I eventually needed? Not at all. The decoupling that TDD forces upon us in order to pass isolated inputs and gather isolated outputs is almost always the design that facilitates flexibility and promotes change.

So I’ve learned my lesson. REPL driven development feels easier and faster than TDD; but it is not. Next time, it’s back to TDD for me.


ClojureScript web server with Macchiato, Shadow CLJS and Reitit

ClojureScript is a variant of Clojure which compiles to JavaScript. This makes ClojureScript usable in devices that can run JavaScript such as browsers and backend services using Node.js.

Traditionally ClojureScript has mostly been used in browsers, and backends have been made with Clojure or with other technologies. While pure ClojureScript backend usage remains rare, the recent movement towards serverless technologies has changed the industry. Serverless platforms typically scale instances up and down regularly, and because of that the startup time of the language matters. Compared to the JVM, Node.js starts up much faster, and the same therefore holds in the ClojureScript vs Clojure comparison. That means that in certain cases, ClojureScript might be a better alternative for backend programming than Clojure.

The problem is that few tutorials exist for ClojureScript web development. Googling “ClojureScript web development” returns, at least for me, only results that show how to create web servers with Clojure. This leads one to wonder: is it really possible, and how? Fortunately, yes it is.

This post is intended for everyone who would like to give ClojureScript a chance in web development.

Server options

The first option would be to use an existing JavaScript framework like Express.js and write the route handlers in ClojureScript. This is a valid alternative, but personally I think a pure ClojureScript alternative is better.

After a bit of googling I found the Macchiato project, a ready-made framework that adapts the Node.js internal web server to the Ring model. This means that we can use existing Ring middlewares and routers, and existing knowledge of Clojure web development can be utilized.

I also want to use Metosin's Reitit, which is an excellent router library. Reitit also supports Swagger generation and Clojure spec based request and response coercion. Personally, I prefer Shadow CLJS over Leiningen as the build tool for ClojureScript.

Using Macchiato, Reitit and Shadow CLJS together needs certain tweaks compared to a normal Clojure Ring project, which I will introduce next.

I have made a Git repository https://github.com/hjhamala/macchiato-example-solita-dev-blog containing branches for the different parts of this post.

As a prerequisite I will assume that Node.js and NPM are installed.

REPL driven development

This differs a lot depending on what development environment is used, so please read Shadow CLJS documentation for different scenarios.

One way is to:

# Start Shadow CLJS compilation
npm run watch
# Connect node to the compiled output
node target/main.js
# Connect the REPL to the port which Shadow CLJS exposed,
# then invoke from the REPL:
(shadow/repl :app)
# Start coding :)

Installing Shadow CLJS

git branch 01_minimal_project

Shadow CLJS is installed via NPM. First create a package.json file with the following contents:

{
  "name": "macchiato-shadow-cljs-example",
  "version": "0.1.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "watch": "shadow-cljs watch app",
    "compile": "shadow-cljs compile app",
    "release": "shadow-cljs release app",
    "start_release": "node -e 'require(\"./target/main.js\").server()'"
  },
  "keywords": [],
  "devDependencies": {
    "shadow-cljs": "2.9.8"
  },
  "dependencies": {
  }
}

After this, run the command npm install in the same directory.

Shadow CLJS compilation is configured in shadow-cljs.edn, so create a new file with the following content:

{:source-paths ["src" "test"]
 :dependencies [[com.taoensso/timbre "4.10.0"]]
 :builds {:app {:target :node-library
                :source-map true
                :exports {:server macchiato-test.core/server}
                :output-dir "target"
                :output-to "target/main.js"
                :compiler-options {:optimizations :simple}}}}

I added Timbre as a logging library but that part is optional.

Then we create a new ClojureScript file, src/macchiato_test/core.cljs, for starting the server:

(ns macchiato-test.core
  (:require [taoensso.timbre :refer [info]]))

(defn server []
  (info "Hey I am running now!"))

Let's compile the file and run it. This can be done from the REPL as well.

npm run release
npm run start_release

This should print out something like

INFO [macchiato-test.core:5] - Hey I am running now!

and then exit.

Making minimal Macchiato server

git branch 02_add_minimal_macchiato

First we need to install Macchiato as a dependency by adding the following to the dependencies in shadow-cljs.edn:

[macchiato/core "0.2.16"]

Change core.cljs to the following content:

(ns macchiato-test.core
  (:require [taoensso.timbre :refer [info]]
            [macchiato.server :as http]))

(defn handler
  [request callback]
  (callback {:status 200
             :body "Hello Macchiato"}))

(defn server []
  (info "Hey I am running now!")
  (let [host "127.0.0.1"
        port 3000]
    (http/start
      {:handler    handler
       :host       host
       :port       port
       :on-success #(info "macchiato-test started on" host ":" port)})))

Then start the server from the REPL by invoking

(server)

And be greeted with many errors…

Most likely the compiler or Node.js will report errors like ‘MODULE_NOT_FOUND’. The missing NPM modules are listed in the Macchiato core library's project.clj file. Shadow CLJS does not load them because it expects NPM dependencies to be in package.json. This means that we must add them there ourselves. This could be avoided by letting Leiningen handle all the dependencies.

npm add ws concat-stream content-type cookies etag lru multiparty random-bytes qs simple-encryptor url xregexp
npm add source-map-support --save-dev

After this, running the server should print to console:

INFO [macchiato-test.core:19] - macchiato-test started on 127.0.0.1 : 3000.

We can test the server by invoking curl localhost:3000, which should return

Hello Macchiato

Adding Reitit

git branch 03-reitit-added

First we need to add Metosin's Reitit and spec-tools to the Shadow CLJS dependencies. These don't have any NPM dependencies, so there is no need to update package.json.

[metosin/reitit "0.5.1"]
[metosin/spec-tools "0.10.3"]

Then replace core.cljs with the following content.

(ns macchiato-test.core
  (:require [taoensso.timbre :refer [info]]
            [macchiato.server :as http]
            [reitit.ring :as ring]
            [reitit.coercion.spec :as c]
            [reitit.swagger :as swagger]
            [macchiato.middleware.params :as params]
            [reitit.ring.coercion :as rrc]
            [macchiato.middleware.restful-format :as rf]))

(def routes
  [""
   {:swagger  {:info {:title       "Example"
                      :version     "1.0.0"
                      :description "This is really an example"}}
    :coercion c/coercion}
   ["/swagger.json"
    {:get {:no-doc  true
           :handler (fn [req respond _]
                      (let [handler (swagger/create-swagger-handler)]
                        (handler req (fn [result]
                                       (respond (assoc-in result [:headers :content-type] "application/json"))) _)))}}]
   ["/test"
    {:get  {:parameters {:query {:name string?}}
            :responses  {200 {:body {:message string?}}}
            :handler    (fn [request respond _]
                          (respond {:status 200 :body {:message (str "Hello: " (-> request :parameters :query :name))}}))}
     :post {:parameters {:body {:my-body string?}}
            :handler    (fn [request respond _]
                          (respond {:status 200 :body {:message (str "Hello: " (-> request :parameters :body :my-body))}}))}}]
   ["/bad-response-bug"
       {:get  {:parameters {:query {:name string?}}
               :responses  {200 {:body {:message string?}}}
               :handler    (fn [request respond _]
                             (respond {:status 200 :body {:messag (str "Hello: " (-> request :parameters :query :name))}}))}}]])

(defn wrap-coercion-exception
  "Catches potential synchronous coercion exception in middleware chain"
  [handler]
  (fn [request respond _]
    (try
      (handler request respond _)
      (catch :default e
        (let [exception-type (:type (.-data e))]
          (cond
            (= exception-type :reitit.coercion/request-coercion)
            (respond {:status 400
                      :body   {:message "Bad Request"}})

            (= exception-type :reitit.coercion/response-coercion)
            (respond {:status 500
                      :body   {:message "Bad Response"}})
            :else
            (respond {:status 500
                      :body   {:message "Truly internal server error"}})))))))

(defn wrap-body-to-params
  [handler]
  (fn [request respond raise]
    (handler (-> request
                 (assoc-in [:params :body-params] (:body request))
                 (assoc :body-params (:body request))) respond raise)))

(def app
  (ring/ring-handler
    (ring/router
      [routes]
      {:data {:middleware [params/wrap-params
                           #(rf/wrap-restful-format % {:keywordize? true})
                           wrap-body-to-params
                           wrap-coercion-exception
                           rrc/coerce-request-middleware
                           rrc/coerce-response-middleware]}})
    (ring/create-default-handler)))


(defn server []
  (info "Hey I am running now!")
  (let [host "127.0.0.1"
        port 3000]
    (http/start
      {:handler    app
       :host       host
       :port       port
       :on-success #(info "macchiato-test started on" host ":" port)})))

Middlewares

The first two middlewares are needed for parameter handling. params/wrap-params parses query params and reads the body of the HTTP request, if present. rf/wrap-restful-format does encoding/decoding depending on the Content-Type of the request. wrap-body-to-params is a new middleware that I wrote because Reitit expects body params to be in a map named body-params in the Ring request.

wrap-coercion-exception is a middleware which catches request and response coercion errors and returns 400 or 500 level error messages. In real development, at least the 400 response should also include some information about why the request was rejected. Reitit's error object contains data that could be transformed into a more human-friendly form pretty easily.

Asynchronous handlers

Compared to regular Clojure Ring handlers, Macchiato uses an asynchronous variant of the handler which takes three parameters instead of the regular one. The additional parameters are the respond and raise callbacks. For Node.js we only need to use respond.
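
In its simplest form an asynchronous handler looks like this (a minimal sketch of my own, mirroring the handlers above):

;; `respond` delivers the Ring response map; `raise` reports an error.
(defn hello-handler [request respond raise]
  (respond {:status 200
            :body   "Hello"}))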

Testing the server

Start the server and run the following commands:

curl localhost:3000/swagger.json returns the Swagger file.

# Missing parameter gives error
curl localhost:3000/test
{"message":"Bad Request"}
# Request with good parameter returns expected response
curl localhost:3000/test?name=heikki
{"message":"Hello: heikki"}
# Reitit coerces bad response
curl localhost:3000/bad-response-bug?name=heikki
{"message":"Bad Response"}

Conclusion

I hope that this post shows the steps necessary to use Reitit with Macchiato.

But is it worth it? Personally, I think that Macchiato could be used for Lambda development. ClojureScript development gets its power from working at the REPL, and there is no easy way to connect the REPL to either a cloud-based Lambda or a local AWS API Gateway.

Instead, we could run Macchiato locally as an alternative to API Gateway. I will introduce one way to do this in a future post, so stay tuned on the Solita Dev Blog!


Faster GraalVM Clojure Compilation Times

<img alt="Crispin Wellington" class="blog-splash" src="https://epiccastle.io/blog/faster-graalvm-clojure-compilation-times/splash-20.jpg" style="width:50%;"><h3>Slash GraalVM Clojure Compilation Time With This One Weird Trick</h3><p>Are you building a clojure project and compiling it to a native binary with GraalVM? As the project has grown, has the GraalVM compilation time gotten longer and longer? Here is the story of how I discovered <a href="https://twitter.com/borkdude">@borkdude</a>'s trick to getting GraalVM native image times in under 5 minutes.</p><h4>The Story</h4><p>As I've been building <a href="https://epiccastle.io/spire/">spire</a>, compilation times have gradually been increasing. Then in order to bring a bunch of increased functionality I added a large number of Java class bindings to my sci environment. I check the code in and go to bed. When I wake I discover my <a href="https://circleci.com/gh/epiccastle/spire/1055">Circle CI builds</a> have been killed in the night for taking too long. The build was underway for an hour when the CI system killed the build!</p><p>So I try compiling on my local linux workstation to see what is going on. After an eternity the build crashes with:</p><pre><code class="shell">[build/spire:2202] analysis: 12,784,610.52 ms, 6.22 GB Fatal error:java.lang.OutOfMemoryError: GC overhead limit exceeded Error: Image build request failed with exit status 1 com.oracle.svm.driver.NativeImage$NativeImageError: Image build request failed with exit status 1 at com.oracle.svm.driver.NativeImage.showError(NativeImage.java:1527) at com.oracle.svm.driver.NativeImage.build(NativeImage.java:1289) at com.oracle.svm.driver.NativeImage.performBuild(NativeImage.java:1250) at com.oracle.svm.driver.NativeImage.main(NativeImage.java:1209) at com.oracle.svm.driver.NativeImage$JDK9Plus.main(NativeImage.java:1707) </code></pre><p><strong>12,784,610ms</strong> spent in analysis. That is over <strong>three and a half hours</strong>!</p><p>As I just added a bunch of new class bindings and reflection config for them all, I assume that must be the culprit. I remove 75% of them. And build again. I time this run. It succeeds, but it isn't pretty.</p><pre><code class="shell">[build/spire:11341] classlist: 5,953.42 ms, 0.94 GB [build/spire:11341] (cap): 800.78 ms, 0.94 GB [build/spire:11341] setup: 2,536.63 ms, 0.94 GB [build/spire:11341] (typeflow): 1,544,374.55 ms, 6.51 GB [build/spire:11341] (objects): 5,164,930.94 ms, 6.51 GB [build/spire:11341] (features): 4,320.62 ms, 6.51 GB [build/spire:11341] analysis: 6,724,448.94 ms, 6.51 GB [build/spire:11341] (clinit): 1,898.20 ms, 6.51 GB [build/spire:11341] universe: 47,567.23 ms, 6.51 GB [build/spire:11341] (parse): 12,451.68 ms, 4.17 GB [build/spire:11341] (inline): 9,098.24 ms, 3.94 GB [build/spire:11341] (compile): 75,592.43 ms, 3.59 GB [build/spire:11341] compile: 102,870.54 ms, 3.59 GB [build/spire:11341] image: 14,383.80 ms, 3.61 GB [build/spire:11341] write: 1,725.39 ms, 3.61 GB [build/spire:11341] [total]: 6,899,951.34 ms, 3.61 GB cp build/spire spire real 115m59.386s user 839m24.587s sys 1m56.493s </code></pre><p>Wow! Analysis is now down to only 1 hour and 50 minutes! Good times!</p><p>So this leads me to wonder how Michiel (<a href="https://twitter.com/borkdude">@borkdude</a>) is building his babashka images on Circle CI without hitting the wall. I go over to have a look at his <a href="https://circleci.com/gh/borkdude/babashka/9392">Circle CI builds</a>. 
They are building his images in <strong>less that 5 minutes</strong>.... whaaaaaaat?</p><h4>That One Weird Trick</h4><p>This leads me to a bunch of comparisons between the compilation and building of babashka with that of spire. I rule out reflection as the source of the problem. I rule out GraalVM options. And then I discover <a href="https://github.com/borkdude/babashka/blob/c3f9480efe08827dfa4ac0fb21f7376d80287ce6/project.clj#L53">the magical incantation</a>.</p><p>I had this setting on my GraalVM build, but I did not have it on when clojure was doing AOT compilation for the uberjar. Before we turn it on lets look into what it does.</p><h4>Direct Linking</h4><p>Clojure's direct linking can be activated by passing <code>-Dclojure.compiler.direct-linking=true</code> to the compiler. This feature is <a href="https://clojure.org/reference/compilation#directlinking">documented here</a>. From this discussion we read:</p><blockquote><p> &quot;Normally, invoking a function will cause a var to be dereferenced to find the function instance implementing it, then invoking that function... <em>Direct linking</em> can be used to replace this indirection with a direct static invocation of the function instead. This will result in faster var invocation. Additionally, the compiler can remove unused vars from class initialization and direct linking will make many more vars unused. Typically this results in smaller class sizes and faster startup times.&quot; </p></blockquote><p>And faster Graal compilation times to boot! This option will produce JVM byte code that will be much more like what a standard Java program will produce. Java is a statically typed language after all, and Java programs are not dereferencing vars every time they are invoked. The Graal compiler, being built primarily to compile Java programmes, is obviously having a very hard time with the dynamic nature of clojure's compiled byte code.</p><p>But do we lose anything if we compile our code with direct linking? According to the docs:</p><blockquote><p> &quot;One consequence of direct linking is that var redefinitions will not be seen by code that has been compiled with direct linking (because direct linking avoids dereferencing the var). Vars marked as ^:dynamic will never be direct linked. If you wish to mark a var as supporting redefinition (but not dynamic), mark it with ^:redef to avoid direct linking.&quot; </p></blockquote><p>As Michiel pointed out to me, things like a general use of <code>with-redefs</code> won't work with direct linking. But if we do want to do something dynamic like <code>with-redefs</code> in our code, we can individually mark those vars with <code>^:redef</code> meta data to allow them to work. Also, things like <code>with-redefs</code> is more commonly used in writing tests, so we can keep the option off in our test code and save direct linking for our uberjar builds.</p><h4>The Fixed Build</h4><p>Now taking the original problematic build that crashed after three and a half hours, I switch that setting on in my uberjar compilation and rebuild. 
Here's the result:</p><pre><code class="shell">[build/spire:22871] classlist: 4,379.73 ms, 0.96 GB [build/spire:22871] (cap): 739.03 ms, 0.96 GB [build/spire:22871] setup: 2,268.45 ms, 0.96 GB [build/spire:22871] (typeflow): 50,731.79 ms, 5.92 GB [build/spire:22871] (objects): 116,840.36 ms, 5.92 GB [build/spire:22871] (features): 2,682.61 ms, 5.92 GB [build/spire:22871] analysis: 173,423.16 ms, 5.92 GB [build/spire:22871] (clinit): 1,193.08 ms, 5.92 GB [build/spire:22871] universe: 3,049.85 ms, 5.92 GB [build/spire:22871] (parse): 7,224.44 ms, 5.79 GB [build/spire:22871] (inline): 13,408.30 ms, 4.19 GB [build/spire:22871] (compile): 77,773.85 ms, 4.16 GB [build/spire:22871] compile: 104,068.58 ms, 4.16 GB [build/spire:22871] image: 13,542.17 ms, 4.16 GB [build/spire:22871] write: 1,792.26 ms, 4.22 GB [build/spire:22871] [total]: 302,882.67 ms, 4.22 GB cp build/spire spire real 5m32.050s user 33m7.865s sys 0m12.756s </code></pre><p>Analysis is down from 3½ hours... to a little under <strong>3 minutes</strong>. And amazing improvement!</p><h4>Summary</h4><p>Add <code>-Dclojure.compiler.direct-linking=true</code> to your clojure compilation JVM options when building your uberjar and when compiling with GraalVM.</p><p>In lein:</p><pre><code class="clojure">(defproject foo &quot;0.1.0-SNAPSHOT&quot; ;; missing lines :profiles {:uberjar {:aot :all :jvm-opts [&quot;-Dclojure.compiler.direct-linking=true&quot;]} ) </code></pre><p>And in GraalVM native-image:</p><pre><code class="shell">graalvm-ce-java11-20.1.0-dev/bin/native-image \ ... -J-Dclojure.compiler.direct-linking=true \ ... </code></pre>
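
For tools.deps based projects the same flag can live in a deps.edn alias — my own addition, not from the post:

;; deps.edn (hypothetical): enable direct linking during AOT compilation
{:aliases
 {:direct-linking {:jvm-opts ["-Dclojure.compiler.direct-linking=true"]}}}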


PurelyFunctional.tv Newsletter 379: get produces unexpected behavior, as expected

Issue 379 – May 25, 2020 · Archives · Subscribe

Clojure Tip 💡

get has unexpected behavior, as expected

Some people wonder why get returns nil when the key is not found in the map. I actually like that behavior, and I dislike the behavior in other languages where it is an error to ask for something that doesn’t exist. Errors when the key doesn’t exist feel a bit like going into a restaurant, ordering spaghetti, and getting kicked out because they don’t have it. A simple nil would have sufficed.

Do you prefer:

(get some-map :key) ;=> nil

OR

(get some-map :key)
KeyNotFoundException on line 3212312

Obviously, Rich Hickey prefers the first one.

However, if you think returning nil when the key is not found is bad, try this one at the REPL:

(get :key some-map)

For those of you far from the REPL, the answer is nil.

(get :key some-map) ;=> nil

Ugh. That one seems really bad. You get the same answer with the arguments reversed. This does not seem like desirable behavior. In fact, it seems actively hostile.

If you look at the code for get, you can see this behavior implemented. It’s in the Java method getFrom().

static Object getFrom(Object coll, Object key){
	if(coll == null)
		return null;
	else if(coll instanceof Map) {
		Map m = (Map) coll;
		return m.get(key);
	}
	else if(coll instanceof IPersistentSet) {
		IPersistentSet set = (IPersistentSet) coll;
		return set.get(key);
	}
	else if(key instanceof Number &&
	        (coll instanceof String || coll.getClass().isArray())) {
		int n = ((Number) key).intValue();
		if(n >= 0 && n < count(coll))
			return nth(coll, n);
		return null;
	}
	else if(coll instanceof ITransientSet) {
		ITransientSet set = (ITransientSet) coll;
		return set.get(key);
	}

	return null;
}

Notice the last line: a final return null if none of the types match. Java requires some return statement (or a thrown exception) and Rich chose to return null.

Clojure’s functions are rife with this pattern: check a bunch of types, then return null if none match.

I’ve written about this more here. The biggest problem is that it violates most people’s default assumptions about how types are checked in dynamic languages. It appears that types are not checked in Clojure, and that that policy was implemented systematically. Most people assume types will be checked in some way.

Clojure’s behavior is very confusing to beginner and expert alike. The biggest benefit to this approach is that some code is faster. But many functions could throw exceptions with no performance hit. The only positive thing I can say is that you eventually get used to it. And empirically, this is true. Many companies are very productive with Clojure.

So, the bottom line is: be careful. REPL-driven development really helps uncover the problems sooner. Write code in small steps and test it frequently.
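
If the nil-punning bites you, a few idioms (my suggestions, not from the newsletter) make a missing key explicit:

(def some-map {:a 1})

(get some-map :key :not-found) ;=> :not-found (sentinel default)
(contains? some-map :key)      ;=> false (membership test)
(find some-map :a)             ;=> [:a 1], and nil when the key is absent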

PurelyFunctional.tv Update 💥

Just before the quarantine, I enacted plans to discontinue new memberships to PurelyFunctional.tv. However, the timing was terrible! I planned to make individual course sales much more interesting, but I have not gotten to them yet.

I’ve reopened memberships to anyone who wants them. I’ll discontinue them again later, with the same policy of allowing existing memberships to continue.

If you would like a membership, check out the Register page. Memberships give you access to 100 hours of video courses on Clojure, ClojureScript, and tooling.

Book update 📖

Chapter 8 was just released as part of the Manning Early Access Program. Chapter 8 is all about Stratified Design, where we learn to organize our code into layers. This chapter took me a long time. It was hard for me to boil down this design skill.

You can buy the book and use the coupon code TSSIMPLICITY for 50% off.

Chapter 11 first draft is done. It’s still rough, but it’s good. It clocked in at 52 pages, which is kind of long. I hope I don’t have to split it. I’m letting that chapter sit for a bit before I clean it up. Now I’m onto Chapter 12, all about update, nested update (with recursion!), and returning functions from functions. Will those three topics fit into one chapter? There’s only one way to find out.

Quarantine update 😷

I know a lot of people are going through tougher times than I am. If you, for any reason, can’t afford my courses, and you think the courses will help you, please hit reply and I will set you up. It’s a small gesture I can make, but it might help.

I don’t want to shame you or anybody into feeling that we should be using this time to work on our skills. The number one priority is your health and safety. I know I haven’t been able to work very much, let alone learn some new skill. But if learning Clojure is important to you, and you can’t afford it, just hit reply and I’ll set you up. Keeping busy can keep us sane.

Stay healthy. Wash your hands. Stay at home. Wear a mask. Take care of loved ones.

Clojure Challenge 🤔

Last week’s challenge

The challenge in Issue 378 was to classify the symmetry of patterns. You can see the submissions here.

Many people expressed surprise that this could be considered Expert level in JavaScript. One solution was 8 lines long in Clojure. If you’re really curious: write it out yourself!

You can leave comments on these submissions in the gist itself. Please leave comments! There are lots of great discussions in there. You can also hit the Subscribe button to keep abreast of the comments. We’re all here to learn.

And I must say that I am so happy with the discussions happening in the gist comments. People are getting fast feedback from each other and trying out multiple implementations. Check it out.

This week’s challenge

Word segmentation

One of the issues with domain names is that spaces aren’t allowed. So we get domain names like this:

  • penisland.com (Pen Island)
  • expertsexchange.com (Experts Exchange)

Now we also have the problem with #hashtags on social media platforms.

We want to be able to take a string without spaces and insert the spaces so that the words are separated and our grade school teacher can be happy again.

Your task is to write a function that takes a string without spaces and a dictionary of known words and returns all possible ways it could be segmented (i.e., insert spaces) into those words. If it can’t be segmented, it should return an empty sequence.

(segmentations "hellothere" ["hello" "there"]) ;=> ("hello 
there")
(segmentations "fdsfsfdsjkljf" ["the" "he" "she" "it"...]) ;=> ()

Bonus: use a dictionary file and some text from somewhere and do a real test.

Super bonus: make it lazy.
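
If you want a hint (spoiler: this is my own naive sketch, not one of the submissions), a recursive search over dictionary prefixes does the job, and for keeps it lazy:

(require '[clojure.string :as str])

(defn segmentations [s words]
  (if (empty? s)
    [""]
    (for [w    words
          :when (str/starts-with? s w)
          tail (segmentations (subs s (count w)) words)]
      (if (empty? tail) w (str w " " tail)))))

(segmentations "hellothere" ["hello" "there"]) ;=> ("hello there")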

Thanks to this site for the challenge idea where it is considered Expert level in JavaScript.

You can also find these same instructions here. I might update them to correct errors and clarify the descriptions. That’s also where submissions will be posted. And there’s a great discussion!

As usual, please reply to this email and let me know what you tried. I’ll collect them up and share them in the next issue. If you don’t want me to share your submission, let me know.

Rock on!
Eric Normand



Using Datahike’s Java API to build a web application

The goal of this article is to introduce Datahike’s Java API. To do so we will create a small web application together. The figure below illustrates the application architecture and lists some of the core advantages of using Datahike. Datahike is an open source, light-weight Datalog runtime with durable and auditable storage. It has an efficient query engine and is written in Clojure. On the left we can see the Spring Boot based application interacting with Datahike’s Java API through its Controllers. Since Clojure is also hosted on the JVM, objects such as lists or functions can be transparently passed into Datahike’s runtime.

Architecture of Java application using Datahike.

There are good reasons for using a Datalog database like Datahike. First, it is simple: a very small, well-factored core codebase (< 5000 lines of code) built around a few core concepts. This allows it to be flexibly recomposed and integrated with existing data sources in novel ways. Furthermore, it is more declarative than SQL thanks to its roots in logic programming languages like Prolog, which provide first-class support for implicit binding through logic variables. Because of its support for recursion it is also strictly more expressive than pure relational algebras such as the one described by SQL. Compared to non-functional databases, Datahike provides coordination-free read scaling by automatically snapshotting all write operations. These snapshots can be read in parallel in the JVM runtime context of an arbitrary number of reading processes. It can also be audited at any point in time, like git. Datahike requires only Java dependencies and can be used in-memory, with a simple file based backend, with Redis for high throughput, with auto-scaling cloud infrastructure like AWS S3 and DynamoDB, or with all of these backends combined in one query context.

This combination makes Datahike an extremely powerful, but light-weight runtime to reason about data. Datahike is following the path pioneered by Cognitect’s Datomic and DataScript. We also take inspiration from the long and rich research tradition on Datalog. Datahike is largely API-compatible to Datomic. Datomic has more features and is more mature, but Datahike is open-source, leaner and easier to adapt.

Quick introduction to Datahike

Datoms and Entities

A Datahike database stores all facts as so called Datoms. A Datom is a tuple made of an entity id, an attribute and the associated value. The same concept is at the foundation of the semantic web where resources are described using a subject–predicate–object relation also known as triples (see RDF). It is possible to model relational, graph or columnar databases. Below is a Datahike fact stating that the player with id 532 is named ’John’.

[532 :player/name "John"]

In this article we will often use Clojure syntax; unless obvious, I will introduce the terms along the way. The example above is a Clojure vector. The attribute part, :player/name, is what is called a Clojure keyword. In Datahike, an entity is a group of Datoms sharing the same entity id. An entity can be seen as an object, grouping multiple facts together in time and space. The first three Datoms below describe the entity for player 532.

;; ...
[532 :player/name  "John"]
[532 :player/team  1201]
[532 :player/event 2534]
;; ...
[1201 :team/name   "The Blue Jays"]
;; ...

It tells us that the player’s name is John and that the player’s team is the entity whose id is 1201 (that entity is partially shown in the listing above). This example shows how entities reference other entities. The database also contains the fact that the player’s team name is ’The Blue Jays’ and that the player is associated with the event with entity id 2534.

Queries

Datahike queries are written in Datalog. The general structure of a query (in Clojure syntax) is as follows:

[:find ?e
 :where [?e :player/name "John"]]

Here ?e is a variable and [?e :player/name "John"] is a clause. The query extracts all Datoms which have an attribute :player/name and value John.
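
For instance, running it against a connection looks like this (a sketch assuming the facts above, with datahike.api aliased as d):

(d/q '[:find ?e
       :where [?e :player/name "John"]]
     @conn)
;; => #{[532]}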

Recursion

 1  (transact conn [{:db/id 1
 2                   :ancestor 2}
 3                  {:db/id 2
 4                   :ancestor 3}
 5                  {:db/id 3
 6                   :ancestor 4}])
 7  
 8  (def rule '[[(ancestor ?e1 ?e2)
 9               [?e1 :ancestor ?e2]]
10              [(ancestor ?e1 ?e2)
11               [?e1 :ancestor ?t]
12               (ancestor ?t ?e2)]])
13  
14  (q '[:find  ?u1 ?u2
15       :in    $ %
16       :where (ancestor ?u1 ?u2)]
17     @conn
18     rule)

The above example illustrates how Datahike supports recursion. The call to the transact function inserts three entities into the database (represented by the connection conn). The syntax is slightly different here, as we use Clojure’s map syntax to pass each entity to the transactor. This transactor DSL is flexible and convenient. The transaction ends up with the following Datoms as facts:

[1 :ancestor  2]
[2 :ancestor  3]
[3 :ancestor  4]

Lines 8 to 12 define a Datalog rule. A rule’s role is to infer new facts from existing ones. In our example the rule defines the meaning of an ’ancestor’: ?e1 is an ancestor of ?e2 if the fact [?e1 :ancestor ?e2] is in the database. And, by induction, ?e1 is an ancestor of ?e2 if ?e1 is an ancestor of another entity ?t and ?t is an ancestor of ?e2.

Lines 14-18 define the recursive query for deducing and retrieving all ancestor relationships. Notice that the equivalent query in SQL would require a lot more work and would not be expressed as elegantly. Datalog turns out to be well-suited to both SQL and graph database use cases.

Our small web application example

The goal of our small web application is to let a team plan for events it is going to participate in. More precisely it lets teams state which of their players will participate in which events.

Below is an illustration of the application. It shows a team with its events and for each event the players who are going to join.

A team at events

As the application is built using Spring Boot, we are going to use Thymeleaf for the view part, and the project will be bootstrapped using Spring Initializr. The latter generates a skeleton of the application with all the libraries and dependencies correctly set up.

To build our example, first go to the Spring Initializr web page https://start.spring.io. Fill in the fields as shown in the form below. Don’t forget to add Spring Web, Spring Boot DevTools and Thymeleaf to the dependencies, then click Generate. This will download a zip file called team-events.zip. Unzip the file and you have your project ready. Import the project into your favorite editor. As I am using IntelliJ, I will illustrate the steps using it.

Screenshot of Spring Initializr.

Select File | New | Project from Existing Source and choose the folder where you unzipped your project. Choose to import the project as a Maven project: Import project from external model | maven. Select a JDK no older than JDK 11. In your pom.xml add dependencies on the latest versions of Datahike and Clojure.

<dependency>
    <groupId>io.replikativ</groupId>
    <artifactId>datahike</artifactId>
    <version>0.3.0</version>
</dependency>

<dependency>
    <groupId>org.clojure</groupId>
    <artifactId>clojure</artifactId>
    <version>1.10.1</version>
</dependency>

Then select the TeamEventsApplication.java file and run it. If all went well, it should start a web server listening locally on port 8080. If you go to your web browser at localhost:8080 you should see the following error page.

Screenshot of the error page.

Building the application

It is now time to build the application. First we need to create a database. With a Datahike database, the first step is to decide whether to create it with or without a schema. In our application we are going to use a schema. In Datahike, the schema’s role is to define and constrain each attribute.

{:db/ident :player/event
 :db/valueType :db.type/ref
 :db/cardinality :db.cardinality/many}

In the above example, we declare that the :player/event attribute is of type ref because it is used to reference another entity in the database, and that it has cardinality many, meaning that the attribute can appear multiple times inside one entity. An attribute can also be declared unique, which means that each of its values uniquely identifies its entity. This allows very handy shorthands in queries.
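
For example (a Clojure-syntax sketch assuming :player/name is declared unique, as it is in the schema below), a unique attribute lets a lookup ref stand in for a numeric entity id:

;; with datahike.api aliased as d:
(d/pull @conn '[*] [:player/name "John"])
;; behaves like (d/pull @conn '[*] 532)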

Below is a typical sequence for creating a Datahike database in Java. After defining the schema, we create the database and connect to it. This returns conn, a reference to the database which we will use to interact with it.

Object schema = Clojure.read(" [{:db/ident :team/name\n" +
                             "                 :db/valueType :db.type/string\n" +
                             "                 :db/unique :db.unique/identity\n" +
                             "                 :db/index true\n" +
                             "                 :db/cardinality :db.cardinality/one}\n" +

                             "                {:db/ident :team/event\n" +
                             "                 :db/valueType :db.type/ref\n" +
                             "                 :db/cardinality :db.cardinality/many}" +

                             "                {:db/ident :event/name\n" +
                             "                 :db/valueType :db.type/string\n" +
                             "                 :db/unique :db.unique/identity\n" +
                             "                 :db/cardinality :db.cardinality/one}" +

                             "                {:db/ident :player/name\n" +
                             "                 :db/valueType :db.type/string\n" +
                             "                 :db/unique :db.unique/identity\n" +
                             "                 :db/cardinality :db.cardinality/one}" +

                             "                {:db/ident :player/event\n" +
                             "                 :db/valueType :db.type/ref\n" +
                             "                 :db/cardinality :db.cardinality/many}" +

                             "                {:db/ident :player/team\n" +
                             "                 :db/valueType :db.type/ref\n" +
                             "                 :db/cardinality :db.cardinality/many}" +

                             "]");
                             
String uri = "datahike:file:///home/user/temp/team-db";
Datahike.createDatabase(uri, k(":initial-tx"), schema);
Object conn = Datahike.connect(uri);

It is now time to build the logic of our application. We will start with what is required to handle players.

Creating a player

To create a new player, we define in the PlayerController class the behaviour for a POST request at the /players URL. The request provides the player’s name as a request parameter.

    @RequestMapping(path = "/players", method = RequestMethod.POST)
    public String create(Model model, @RequestParam("name") String name) {
        Datahike.transact(conn, vec(map(k(":player/name"), name)));
        return "redirect:/players";
    }

As we can see, adding a new entity to a Datahike database is fairly simple. It consists of transacting a Datom whose attribute is :player/name, associated with the player’s name. In Clojure this would be done like this: (transact conn [{:player/name name}]). The equivalent code in Java is what is written in our controller: Datahike.transact(conn, vec(map(k(":player/name"), name))). We can see that the Java methods vec and map are the equivalents of Clojure’s literal vector and map (written [] and {} respectively). Notice also that Clojure keywords are built using the k method in Java, which expects a Java string starting with a colon as its argument. Finally, we use conn as the reference to the database for the transaction.

Listing all the players

Listing the players currently in the database is also fairly simple:

    @RequestMapping(path = "/players", method = RequestMethod.GET)
    public String index(Model model){
        String query = "[:find ?e ?n :in $ :where [?e :player/name ?n]]";
        Set<PersistentVector> res = Datahike.q(query, dConn(conn));
        model.addAttribute("players", res);
        return "players/index";
    }

Datahike’s q method is used for querying the database. It takes a query and a variable number of arguments used as inputs to the query. In the current version of the API, we pass the query as a string written in Clojure syntax. In our example the query input is the database itself. More precisely, we use a dereferenced version of the database, dConn(conn); this ensures that we are always accessing the latest version of the database as a snapshot. A query returns a set of PersistentVectors, the Java implementation of Clojure’s vector type. In Java we can use it simply as a Java vector.

The query [:find ?e ?n :in $ :where [?e :player/name ?n]] extracts all existing players. This query returns a set of tuples consisting of an entity id and the name of a player (the variables ?e and ?n respectively). The query result is then passed as an attribute to the Spring model object so that it becomes reachable from the view. An excerpt of the view listing the players is shown below:

    <h1>Players</h1>
    
    <table>
      <tr>
        <th>NAME</th>
      </tr>
      <tr th:each="player : ${players}">
        <td th:text="${player[0]}">id</td>
        <td th:text="${player[1]}">name</td>
        ...
      </tr>
    </table>

We use Thymeleaf’s th:each operator to iterate over the players that were passed to the view as the players attribute. The figure below shows the list of players.

List of Players

Adding a player to an event

For a player to attend an event with her/his team, we need the following controller:

     1  @RequestMapping(path = "/events/{eventId}/players", method = RequestMethod.POST)
     2  public String create(Model model, @PathVariable("eventId") int eventId,
     3                       @RequestParam("playerId") int playerId,
     4                       @RequestParam("teamName") String teamName) {
     5      String query = "[" +
     6          ":find ?ti ?teamName " +
     7          ":in $ ?teamName " +
     8          ":where [?ti :team/name ?teamName]]";
     9      Set<PersistentVector> res = Datahike.q(query, dConn(conn), teamName);
    10      int teamId = (int) res.iterator().next().get(0);
    11  
    12      Datahike.transact(conn,
    13                        vec(vec(k(":db/add"), playerId, k(":player/event"), eventId),
    14                            vec(k(":db/add"), playerId, k(":player/team"), teamId)));
    15      return "redirect:/teams/" + teamName;
    16  }

The goal is to transact the association between a player and an event, plus the association between a player and a team. This is what the call to Datahike’s transact method is doing (line 12). To be safe to use with user inputs, a query (lines 5-8) should always be a constant String; parameters can be freely passed to q explicitly, so there is no need to concatenate user input into the query string. The string is only concatenated here to make it more readable.

Since in the first part of the controller the request only provides us with the team name, we query the database for the team id of the given team. Notice that along with the database, here the teamName is passed as additional data to the query (line 9).

Removing a player from an event

     1  @RequestMapping(value = "events/{eventId}/teams/{teamId}/players/delete/{playerId}",
     2                  method = RequestMethod.GET)
     3  public String deleteFromEvent(Model model, @PathVariable("teamId") int teamId,
     4                                                @PathVariable("eventId") int eventId,
     5                                                @PathVariable("playerId") int playerId) {
     6      Datahike.transact(conn, vec(vec(k(":db/retract"), playerId, k(":player/event"),
     7                                      eventId)));
     8  
     9      String query = "[:find ?teamName :in $ ?ti :where [?ti :team/name ?teamName]]";
    10      Set<PersistentVector> res = Datahike.q(query, dConn(conn), teamId);
    11      String teamName = (String) res.iterator().next().get(0);
    12      return "redirect:/teams/" + teamName;
    13  }

To remove a player from an event, we use Datahike’s retraction API, which consists of transacting a tuple starting with the keyword :db/retract followed by the Datom we want to remove (line 6).

The rest of the method is a query to retrieve the team name from its team id and use its name to render the correct view.

Removing a player from all its events and teams

    @RequestMapping(value = "/players/delete/{id}", method = RequestMethod.GET)
    public String delete(Model model, @PathVariable("id") int id) {
        Datahike.transact(conn, vec(vec(k(":db.fn/retractEntity"), id)));
        "return redirect:/players;
    }

To fully remove a player from the database, we remove its entity. This removes all its relations to events and teams. To do so, we use Datahike’s entity removal API, which consists of transacting the keyword :db.fn/retractEntity followed by the entity id to remove.

Other controllers

The controllers for events and teams follow the same principle as the player controller. For brevity, I will not list the code here. You can find the full code in the application repository.

Conclusion

In this article we have introduced the Java API of Datahike, a durable Datalog based database. We have shown how easy it is to build a Java web application on top of Datahike. We will keep the Java API in sync with our ongoing development of Datahike. As future work, we plan to provide an embedding of Datalog as a Java DSL in addition to the string based query representation. If you want more information on Datahike please visit our repository, the Java API definition or the Slack and Zulip channels. If you have special needs regarding Datahike, we are happy to help. In that case, please contact info@lambdaforge.io.


Subverting Common Lisp Types And Emacs Interaction For Clj

Ok, so profiling the extra-low-hanging-fruit in terms of generic function performance revealed that it didn't do much in our situation. My next idea was to subvert the Common Lisp type system to give our set and map primitives some hints about what kind of equality operations to use.

I'm once again starting this piece before having written the code it'll be explaining, so this is less a thoughtful tour and more a stream-of-consciousness account of the writing.

Defining Types

Assuming the thing you're defining fits into the pre-existing Common Lisp types, you're fine. As soon as you want to do something like define polymorphic key/value structures you are, near as I can tell, on your fucking own bucko.

So I guess I'm rolling my own here?

Ok, the good news is that I'm in just enough of a hacky mood that I don't give a flying fuck how shitty this is going to be. That... might come back to bite me later, but we'll burn that bridge and salt it as we pass.

Here's how I have to define the map type.

(deftype map (&optional keys vals)
  (let ((sym (intern (format nil "MAP-TYPE-~a-~a" keys vals) :clj)))

    (unless (fboundp sym)
      (setf (fdefinition sym) (kv-types keys vals)))

    `(and (satisfies map?) (satisfies ,sym))))

This feels batshit insane. In order to properly define a polymorphic key/value type, I have to manually intern predicates that deal with the specific types in question at declaration time. The problem is that satisfies specifically only accepts a symbol that must refer to a function of one argument that's meant to return a boolean. If it could take lambda terms, I could do something like

(defun kv-type (k-type v-type)
  (lambda (thing)
    (and (map? thing)
	 (every (lambda (pair)
		  (and (typep (car pair) k-type)
		       (typep (cdr pair) v-type)))
		(values thing)))))
...
(satisfies (kv-type 'keyword 'integer))

This is, unfortunately, off the table. Oh well. The complete definitions for both the map and set types are

(defun map? (thing)
  (typep thing 'cl-hamt:hash-dict))

(defun map-type? (type)
  (and type
       (listp type)
       (eq (car type) 'map)))

(defun kv-types (k-type v-type)
  (lambda (map)
    (cl-hamt:dict-reduce
     (lambda (memo k v)
       (and memo (typep k k-type) (typep v v-type)))
     map t)))

(deftype map (&optional keys vals)
  (let ((sym (intern (format nil "MAP-TYPE-~a-~a" keys vals) :clj)))

    (unless (fboundp sym)
      (setf (fdefinition sym) (kv-types keys vals)))

    `(and (satisfies map?) (satisfies ,sym))))

(defun set? (thing)
  (typep thing 'cl-hamt:hash-set))

(defun set-type? (type)
  (and type
       (listp type)
       (eq (car type) 'set)))

(defun seq-types (v-type)
  (lambda (set)
    (cl-hamt:set-reduce
     (lambda (memo elem)
       (and memo (typep elem v-type)))
     set t)))

(deftype set (&optional vals)
  (let ((sym (intern (format nil "SET-TYPE-~a" vals) :clj)))

    (unless (fboundp sym)
      (setf (fdefinition sym) (seq-types vals)))

    `(and (satisfies set?) (satisfies ,sym))))
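
Since both deftypes bottom out in satisfies checks, these types should also work with a plain typep call. A quick sanity check I'd expect to hold, given the definitions above and the map literal syntax:

CLJ> (typep {:a 1 :b 2} '(map keyword integer))
T
CLJ> (typep {:a "one"} '(map keyword integer))
NIL
CLJ>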

Once I've got that, I can declare things. Like,

CLJ> (let ((a {:a 1 :b 2}))
  (declare (type (map keyword t) a))
  a)
{:A 1 :B 2}
CLJ>

Checking for equalities

There's some more work to do; this is where the whole point of the exercise comes in. Once I've got a type declared, I need to do the work I actually care about: figuring out which of the built-in structural equality operations is the most efficient one I can use while also being as correct as possible.

(defun fullest-equality (equalities)
  (find-if
   (lambda (e) (member e equalities :test #'eq))
   '(clj:== cl:equalp cl:equal cl:eql cl:eq cl:string= cl:=)))

(defun equality-function (name) (fdefinition name))

(defun equality-of (type)
  (cond
    ((member type '(integer number float ratio rational bignum bit complex long-float short-float signed-byte unsigned-byte single-float double-float fixnum))
     'cl:=)
    ((member type '(string simple-string))
     'cl:string=)
    ((member type '(atom symbol keyword package readtable null stream random-state))
     'cl:eq)
    ((member type '(standard-char character pathname))
     'cl:eql)
    ((member type '(cons list))
     'cl:equal)
    ((and (listp type) (eq 'or (first type)))
     (fullest-equality (mapcar #'equality-of (rest type))))
    ((member type '(hash-table sequence array bit-vector simple-array simple-bit-vector simple-vector vector))
     'cl:equalp)
    ((and (listp type) (member (car type) '(array simple-array simple-bit-vector simple-vector vector)))
     'cl:equalp)
    ((member type '(compiled-function function))
     nil)
    (t 'clj:==)))

It's a fairly naive binding table, completely inextensible for the moment, that maps a type to the name of an equality operation that will accurately compare values of that type. Hopefully, I mean. As long as I didn't fuck something up.

CLJ> (equality-of '(map keyword t))
==
CLJ> (equality-of 'keyword)
EQ
CLJ> (equality-of 'list)
EQUAL
CLJ> (equality-of 'hash-table)
EQUALP
CLJ> (equality-of 'string)
STRING=
CLJ>
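
The (or ...) branch defers to fullest-equality, which scans its preference list from most to least general and returns the first equality that any of the component types mapped to. Assuming the definitions above are loaded, I'd expect a union of list and string to fall back to cl:equal, since it can compare both:

CLJ> (equality-of '(or list string))
EQUAL
CLJ> (fullest-equality '(cl:string= cl:equal))
EQUAL
CLJ>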

Seems legit.

Putting it all together

The next step is to use this equality selection procedure to make our map and set constructors pick a better equality than == when they can.

(defparameter *type* nil)
...
(defun alist->map (alist &key equality)
  (let ((equality (or equality
		      (if (map-type? *type*)
			  (equality-function (equality-of (second *type*)))
			  #'==))))
    (loop with dict = (cl-hamt:empty-dict :test equality)
       for (k . v) in alist do (setf dict (cl-hamt:dict-insert dict k v))
       finally (return dict))))

(defun list->map (lst &key equality)
  (assert (evenp (length lst)) nil "Map literal must have an even number of elements")
  (let ((equality (or equality
		      (if (map-type? *type*)
			  (equality-function (equality-of (second *type*)))
			  #'==))))
    (loop with dict = (cl-hamt:empty-dict :test equality)
       for (k v) on lst by #'cddr
       do (setf dict (cl-hamt:dict-insert dict k v))
       finally (return dict))))
...
(defun list->set (lst)
  (let ((equality (if (set-type? *type*)
		      (equality-function (equality-of (second *type*)))
		      #'==)))
    (reduce
     (lambda (set elem)
       (cl-hamt:set-insert set elem))
     lst :initial-value (cl-hamt:empty-set :test equality))))

So, we've got a *type* special var that we can use to declare the type of the map/set we're defining, and if it's set, we use it to pick an appropriate equality. Otherwise, we just go with #'==, because that's as general as it gets.

CLJ> (list->set (list 1 2 3 4))
#{3 2 1 4}
CLJ> (cl-hamt::hamt-test (list->set (list 1 2 3 4)))
#<STANDARD-GENERIC-FUNCTION CLJ:== (8)>
CLJ> (let ((*type* '(set integer))) (list->set (list 1 2 3 4)))
#{3 2 1 4}
CLJ> (let ((*type* '(set integer))) (cl-hamt::hamt-test (list->set (list 1 2 3 4))))
#<FUNCTION =>
CLJ> (list->map (list :a 1 :b 2 :c 3))
{:A 1 :C 3 :B 2}
CLJ> (cl-hamt::hamt-test (list->map (list :a 1 :b 2 :c 3)))
#<STANDARD-GENERIC-FUNCTION CLJ:== (8)>
CLJ> (let ((*type* '(map keyword t))) (cl-hamt::hamt-test (list->map (list :a 1 :b 2 :c 3))))
#<FUNCTION EQ>
CLJ>

Nice.

It doesn't fit all of our use cases, though.

CLJ> {:a 1 :b 2 :c 3}
{:A 1 :C 3 :B 2}
CLJ> (let ((*type* '(map keyword t))) {:a 1 :b 2 :c 3})
{:A 1 :C 3 :B 2}
CLJ> (let ((*type* '(map keyword t))) (cl-hamt::hamt-test {:a 1 :b 2 :c 3}))
#<STANDARD-GENERIC-FUNCTION CLJ:== (8)>
CLJ>

The problem is that, because we have reader syntax for our maps and sets, this decision kicks in too late to deal with them. We unfortunately also need a reader macro to handle type declarations.

Reader Macro for Type Declaration

The naive solution here is

...
(defun type-literal-reader (stream sub-char numarg)
  (declare (ignore sub-char numarg))
  (let* ((*type* (read stream))
	 (form (read stream))
	 (val (eval form)))
    (assert (typep val *type*) nil "Type checking failure ~s ~s" *type* form)
    val))

...
  (:dispatch-macro-char #\# #\# #'type-literal-reader))

I don't really want to define this using :: because of the headache-inducing implications of doing (make-dispatch-macro-character #\:); I'm trying to avoid those for the moment. Same story with #:, because uninterned symbols are common and I don't want to stomp them here. So I had to pick something else, and arbitrarily settled on ## even though #t or #T would have been equally reasonable choices.
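
For anyone following along without the surrounding readtable machinery, the equivalent hookup in plain Common Lisp should be a one-liner (a sketch; note that it shadows the standard #n# label-reference syntax):

(set-dispatch-macro-character #\# #\# #'type-literal-reader)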

This technically works.

CLJ> {:a 1 :b 2 :c 3}
{:A 1 :C 3 :B 2}
CLJ> ## (map keyword t) {:a 1 :b 2 :c 3}
{:A 1 :C 3 :B 2}
CLJ> (cl-hamt::hamt-test ## (map keyword t) {:a 1 :b 2 :c 3})
#<FUNCTION EQ>
CLJ>

But I want to avoid calling eval as part of it. The more macro-like version would look something like

(defun type-literal-reader (stream sub-char numarg)
  (declare (ignore sub-char numarg))
  (let* ((*type* (read stream))
	 (form (read stream))
	 (res (gensym)))
    (if *type*
	`(let ((,res ,form))
	   (check-type ,res ,*type*)
	   ,res)
	form)))
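
For reference, reading ## (map keyword t) {:a 1 :b 2} with this version hands the evaluator a form along these lines (the gensym name is illustrative):

(let ((#:g42 {:a 1 :b 2}))
  (check-type #:g42 (map keyword t))
  #:g42)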

It still works...

CLJ> ## (map keyword t) {:a 1 :b 2}
{:A 1 :B 2}
CLJ> (cl-hamt::hamt-test ## (map keyword t) {:a 1 :b 2})
#<FUNCTION EQ>
CLJ>

... but has the added bonuses of not calling eval and also making use of check-type, which we couldn't do if we wanted to do that check inline at read time.

I don't really like the syntax[1], but that's good enough for now[2].

Performance implications

(defun untyped-benchmark (&key (times 10000))
  (loop repeat times
     do (let* ((m {:a 1 :b "two" :c :three :d 44})
	       (inserted (insert m (cons :test-key :test-value))))
	  (list (len m)
		(lookup inserted :test-key)
		(len inserted)))))

(defun typed-benchmark (&key (times 10000))
  (loop repeat times
     do (let* ((m ## (map keyword t) {:a 1 :b "two" :c :three :d 44})
	       (inserted (insert m (cons :test-key :test-value))))
	  (list (len m)
		(lookup inserted :test-key)
		(len inserted)))))

With the above defined in benchmark.lisp, running the benchmarks, reporting with M-x slime-profile-report, and resetting with M-x slime-profile-reset between runs gives us...

CLJ> (untyped-benchmark :times 1000000)
NIL
measuring PROFILE overhead..done
  seconds  |     gc     |     consed    |   calls   |  sec/call  |  name
---------------------------------------------------------------
     1.230 |      0.068 | 1,076,698,880 | 1,000,000 |   0.000001 | CLJ::INSERT
     0.931 |      0.000 |        32,768 | 2,000,000 |   0.000000 | CLJ::LEN
     0.617 |      0.000 |     1,679,216 | 1,000,000 |   0.000001 | CLJ::LOOKUP
     0.000 |      0.018 |    59,768,832 |         1 |   0.000000 | CLJ::UNTYPED-BENCHMARK
     0.000 |      0.000 |             0 | 1,000,000 |   0.000000 | CLJ:==
---------------------------------------------------------------
     2.778 |      0.086 | 1,138,179,696 | 5,000,001 |            | Total

estimated total profiling overhead: 9.09 seconds
overhead estimation parameters:
  1.8e-8s/call, 1.8179999e-6s total profiling, 8.8e-7s internal profiling

These functions were not called:
 CLJ:ALIST->MAP CLJ::EQUALITY-FUNCTION CLJ::EQUALITY-OF
 CLJ::FULLEST-EQUALITY CLJ::KV-TYPES CLJ::LIST->MAP CLJ:LIST->SET
 CLJ::MAP-LITERAL-READER CLJ::MAP-TYPE-KEYWORD-T CLJ::MAP-TYPE?
 CLJ::MAP? CLJ::SEQ-TYPES CLJ::SET-LITERAL-READER CLJ::SET-TYPE?
 CLJ::SET? CLJ::TYPE-LITERAL-READER CLJ::TYPED-BENCHMARK

CLJ> (typed-benchmark :times 1000000)
NIL
  seconds  |     gc     |     consed    |   calls   |  sec/call  |  name
---------------------------------------------------------------
     1.195 |      0.040 | 1,076,307,616 | 1,000,000 |   0.000001 | CLJ::INSERT
     0.768 |      0.000 |             0 | 2,000,000 |   0.000000 | CLJ::LEN
     0.605 |      0.000 |             0 | 1,000,000 |   0.000001 | CLJ::LOOKUP
     0.000 |      0.004 |    59,703,296 |         1 |   0.000000 | CLJ::TYPED-BENCHMARK
---------------------------------------------------------------
     2.568 |      0.044 | 1,136,010,912 | 4,000,001 |            | Total

estimated total profiling overhead: 7.27 seconds
overhead estimation parameters:
  1.8e-8s/call, 1.8179999e-6s total profiling, 8.8e-7s internal profiling

These functions were not called:
 CLJ:== CLJ:ALIST->MAP CLJ::EQUALITY-FUNCTION CLJ::EQUALITY-OF
 CLJ::FULLEST-EQUALITY CLJ::KV-TYPES CLJ::LIST->MAP CLJ:LIST->SET
 CLJ::MAP-LITERAL-READER CLJ::MAP-TYPE-KEYWORD-T CLJ::MAP-TYPE?
 CLJ::MAP? CLJ::SEQ-TYPES CLJ::SET-LITERAL-READER CLJ::SET-TYPE?
 CLJ::SET? CLJ::TYPE-LITERAL-READER CLJ::UNTYPED-BENCHMARK
CLJ>

A pretty goddamn tiny difference. I'm not sure this approach is worth much more effort, but I'll plug away for a bit longer to see how elegant I can make it. In the meantime,

Emacs Interaction Improvements

So, the sad thing about all of this is that I've been lying to you. Whenever I show you one of those nice readouts from the SLIME REPL that says something like {:a 1 :b 2}, that's a result of me correcting it. Because, by default, when I type {, what I get is {$. Which I then have to manually backspace and correct. Using this shiny new syntax in Common Lisp mode is also less than ideal, because the default paredit doesn't provide s-exp support for curly braces. It's not as simple as adding

"{" 'paredit-open-curly
"}" 'paredit-close-curly

to a mode-map somewhere, because that does pair them, but doesn't help with navigation.

After messing around with modifying existing syntax-tables, redefining matching-paren, and poking around in paredit internals, the solution I settled on was just adding a mode-hook to a bunch of lisp modes and slime-repl modes that activates the clojure-mode-syntax-table.

You can do this in your .emacs file by doing something like

(defun use-clojure-syntax-table () (set-syntax-table clojure-mode-syntax-table))
(add-hook 'common-lisp-mode-hook 'use-clojure-syntax-table)
(add-hook 'slime-mode-hook 'use-clojure-syntax-table)
(add-hook 'slime-repl-mode-hook 'use-clojure-syntax-table)

I added it to my .emacs by doing

(hooks (common-lisp lisp emacs-lisp scheme lisp-interaction slime clojure slime-repl)
       (lambda ()
	 (setq autopair-dont-activate t)
	 (autopair-mode -1))
       'enable-paredit-mode
       (lambda () (set-syntax-table clojure-mode-syntax-table)))

Which is both more thorough and more extensive, but requires me to define some conveniences (the hooks macro among them) first.

The next time I write about CLJ, the SLIME repl snippets will not be a lie.

  1. Ideally, the type annotation would be declared like (:: type form), :: type form, or possibly type :: form. However, infix operators are more complicated to deal with, and : already has various meanings in Common Lisp that would make using it as a read-macro-char more complicated than I'd like. Specifically, as hinted at above, doing (make-dispatch-macro-character #\:) instantly complicates the parsing of keywords, uninterned-symbols and any qualified name you end up typing. I'll read up on it a bit and see if there's a way to fall through to the default behavior somehow, but in the absence of that option, this is absolutely more hassle than it's worth.
  2. Possible future improvements include inferring the type of a map literal based on its initial values, and storing the type annotation somehow so that it can be checked against by insert later. I'm not sure any of this is worth the time, and once we pick an appropriate interface, it'll be easy to change internals later.

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.