PurelyFunctional.tv Newsletter 386: factoring equations in code

Issue 386 – July 13, 2020 · Archives · Subscribe

Clojure Tip 💡

factoring equations in code

Once in college we had to write a function that computed a simple formula. I don’t remember exactly what it was, but it was something like this:

2x + 2y + z

Now, being 18 and wanting to impress the teacher (and having read too many books on how to optimize video game code), I got clever and factored out the 2.

2(x + y) + z

Notice that there is now only one multiplication instead of two! I could save a whole instruction each time this was called!

In those days we printed our code listings out on wide, green dot matrix paper. I handed mine in and waited for my excellent grade.

But the teacher marked it wrong. I went to talk to him about it. I showed him that it was the same formula. That it got the right answer. But he said something I still remember: “That wasn’t part of the spec. Don’t go above and beyond.” He did give me my points back, but the lesson stuck with me.

I still don’t know if he was right. Is it important that it worked? Was the readability impacted that much by my small factoring? What if I had found a way to do something in constant time that, as expressed in the problem, took quadratic time? Wouldn’t that be a worthwhile improvement? Where do you draw the line between saving a multiply and improving the complexity?

Or was I just caught in his pedagogical style, where he wants the code listings to be easy to read so he can grade them quickly? Or, to be generous, to guide me on a journey of discovery with a known destination—don’t stray too far from the path or you may not discover what he wants you to.

Anyway, I’m still thinking about it, and I was reminded of it in last week’s challenge. The quadratic formula is well known. You can find its common expression with a google search. We even had to recite it in school so we would remember it.

But when I went to implement it as code, part of me really wanted to refactor it. I resisted, though I did name the parts. The formula is still intact. Others, however, didn’t resist. There are some really interesting factorings of the formula. And I’m torn.

The factorings achieve some neat properties. No conditionals! Reuse of calculated parts. Even some interesting symmetries that require study.

But the factorings also obscure the traditional expression of the formula. If I look at one of the more factored versions, I can’t tell if it is correct. I can’t see the original formula in there without breaking out a paper and pencil and un-factoring it back. Although the solution is elegant, I wonder if something important is lost.
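For reference, a plain rendering of the textbook formula in Clojure (a sketch of my own, not any particular submission) might look like this:

```clojure
(defn quadratic-roots
  "Real roots of ax^2 + bx + c = 0, written to mirror the textbook
  formula. Returns nil when the discriminant is negative (no real roots)."
  [a b c]
  (let [discriminant (- (* b b) (* 4 a c))]
    (when-not (neg? discriminant)
      (let [sqrt-d (Math/sqrt discriminant)]
        [(/ (+ (- b) sqrt-d) (* 2 a))
         (/ (- (- b) sqrt-d) (* 2 a))]))))

(quadratic-roots 1 -3 2) ;=> [2.0 1.0]
```

Nothing is factored out here, which is exactly the point: you can read the discriminant and the two roots straight off the formula.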

All this is to say that, as I read the submissions, I feel for my professor from 20+ years ago. Even when I look at my solution, it feels too clever. I wish it had been a straightforward implementation of the formula.

Don’t get me wrong. Exploration is great. But when you check in your final version to git, your solution doesn’t have to be the furthest out you reached. Maybe you go exploring and find out the best place is just where you started.

Podcast episode🎙

My new episode about the classic paper Why Functional Programming Matters by John Hughes is out. In it, we try to understand this paper’s answer to the question of why functional programming is powerful.

Quarantine update 😷

I know a lot of people are going through tougher times than I am. If you, for any reason, can’t afford my courses, and you think the courses will help you, please hit reply and I will set you up. It’s a small gesture I can make, but it might help.

I don’t want to shame you or anybody else into thinking we should be using this time to work on our skills. The number one priority is your health and safety. I know I haven’t been able to work very much, let alone learn a new skill. But if learning Clojure is important to you, and you can’t afford it, just hit reply and I’ll set you up. Keeping busy can keep us sane.

Stay healthy. Wash your hands. Stay at home. Wear a mask. Take care of loved ones.

Clojure Challenge 🤔

Last week’s challenge

The challenge in Issue 385 was to write a function that computes the real roots of a quadratic equation. You can find the submissions here.

Very nice variety of answers in this one.

Please do participate in the discussion on the gist where the submissions are hosted. It’s active and it’s a great way to get comments on your code.

This week’s challenge

indices of a value

Let’s say you’ve got a sequence, [:a 1 :d :f :r 4 :d] and you want to find all of the indices where :d is. That would be (2 6). Your task is to write that function:

(indices-of :d [:a 1 :d :f :r 4 :d])  ;=> (2 6)
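If you want a spoiler to compare against later, here is one possible sketch using keep-indexed (just one of many reasonable approaches):

```clojure
(defn indices-of
  "Return a sequence of every index in coll whose element equals x."
  [x coll]
  (keep-indexed (fn [i v] (when (= v x) i)) coll))

(indices-of :d [:a 1 :d :f :r 4 :d]) ;=> (2 6)
```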

Thanks to this site for the challenge idea; there it is rated Medium level in Python.

You can also find these same instructions here. I might update them to correct errors and clarify the descriptions. That’s also where submissions will be posted. And there’s a great discussion!

As usual, please reply to this email and let me know what you tried. I’ll collect them up and share them in the next issue. If you don’t want me to share your submission, let me know.

Rock on!
Eric Normand

The post PurelyFunctional.tv Newsletter 386: factoring equations in code appeared first on PurelyFunctional.tv.

Permalink

Clojure Goodness: Create And Initialize Object Based On Java Class With doto

It is very easy to work with Java classes in Clojure. If we want to create a new object from a Java class and immediately invoke methods to initialize it, we can use the doto macro. The first argument is an expression that creates the new object, and the rest of the arguments are forms that invoke methods on the newly created object. The object returned by the first argument is inserted as the first argument of each method invocation. The doto macro returns the object created by the first argument.

In the following example code we use the doto macro in several cases:

(ns mrhaki.core.doto
  (:require [clojure.test :refer [is]]))

;; With doto we can invoke functions on object returned
;; by the first argument, where the object is passed
;; before the given arguments.
(def sb (doto (StringBuilder.)
          (.append "one")
          (.append "two")
          (.reverse)))

(is (= "owteno" (.toString sb)))

;; We can use functions in doto. For example
;; to create a value for the function invocation
;; on the type of the first argument.
(def sample (doto (new StringBuilder)
              (.append "{")
              (.append (apply str (repeat 10 "a")))
              (.append "}")))

(is (= "{aaaaaaaaaa}" (.toString sample)))

;; Type returned is the same as result of evaluation
;; of first argument, not the last argument.
(is (instance? StringBuilder (doto (StringBuilder.) (.toString))))

Written with Clojure 1.10.1.

Permalink

Why Functional Programming Matters

In this episode, I read excerpts from Why Functional Programming Matters by John Hughes. Does it answer the question of what is functional programming and why is it powerful? Read the paper here.

The post Why Functional Programming Matters appeared first on LispCast.

Permalink

Learning about babashka (bb), a minimalist Clojure for building CLI tools

A few years back, I wrote Clojonic: Pythonic Clojure, which compares Clojure to Python, and concluded:

My exploration of Clojure so far has made me realize that the languages share surprisingly more in common than I originally thought as an outside observer. Indeed, I think Clojure may be the most “Pythonic” language running on the JVM today (short of Jython, of course).

That said, as that article discussed, Clojure is a very different language than Python. As Rich Hickey, the creator of Clojure, put it in his “A History of Clojure”:

Most developers come to Clojure from Java, JavaScript, Python, Ruby and other OO languages. [… T]he most significant […] problem  [in adopting Clojure] is learning functional programming. Clojure is not multiparadigm, it is FP or nothing. None of the imperative techniques they are used to are available. That said, the language is small and the data structure set evident. Clojure has a reputation for being opinionated, opinionated languages being those that somewhat force a particular development style or strategy, which I will graciously accept as meaning the idioms are clear, and somewhat inescapable.

There is one area in which Clojure and Python seem to have a gulf between them, for a seemingly minor (but, in practice, major) technical reason. Clojure, being a JVM language, inherits the JVM’s slow startup time, especially for short-lived scripts, as is common for UNIX CLI tools and scripts.

As a result, though Clojure is a relatively popular general purpose programming language — and, indeed, one of the most popular dynamic functional programming languages in existence — it is still notably unpopular for writing quick scripts and commonly-used CLI tools. But, in theory, this needn’t be the case!

If you’re a regular UNIX user, you probably have come across hundreds of scripts with a “shebang”, e.g. something like #!/usr/bin/env python3 at the top of Python 3 scripts or #!/bin/bash for bash scripts. But I bet you have rarely, perhaps never, come across something like #!/usr/bin/env java or #!/usr/bin/env clojure. It’s not that either of these is impossible or unworkable. No, they are simply unergonomic. Thus, they aren’t preferred.

The lack of ergonomics stems from a number of reasons inherent to the JVM, notably slow startup time and complex system-level classpath/dependency management.

Given Clojure’s concision, readability, and dynamism, it might be a nice language for scripting and CLI tools, if we could only get around that slow startup time problem. Could we somehow leverage the Clojure standard library and a subset of the Java standard library as a “batteries included” default environment, and have it all compiled into a fast-launching native binary?

Well, it turns out, someone else had this idea, and went ahead and implemented it. Enter babashka.

babashka

To quote the README:

Babashka is implemented using the Small Clojure Interpreter. This means that a snippet or script is not compiled to JVM bytecode, but executed form by form by a runtime which implements a sufficiently large subset of Clojure. Babashka is compiled to a native binary using GraalVM. It comes with a selection of built-in namespaces and functions from Clojure and other useful libraries. The data types (numbers, strings, persistent collections) are the same. Multi-threading is supported (pmap, future). Babashka includes a pre-selected set of Java classes; you cannot add Java classes at runtime.

Wow! That’s a pretty neat trick. If you install babashka — which is available as a native binary for Windows, macOS, and Linux — you’ll be able to run bb to try it out. For example:

$ bb
Babashka v0.1.3 REPL.
Use :repl/quit or :repl/exit to quit the REPL.
Clojure rocks, Bash reaches.

user=> (+ 2 2)
4
user=> (println (range 5))
(0 1 2 3 4)
nil
user=> :repl/quit
$
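Because startup is so fast, shebang-style Babashka scripts become practical. A minimal sketch (upcase.clj is a hypothetical example of my own, not from the post):

```clojure
#!/usr/bin/env bb
;; upcase.clj (hypothetical): echo the command-line arguments in upper
;; case. The `#!` line is read as a line comment by the Clojure reader,
;; so the same file also runs under the JVM `clojure` CLI.
(require '[clojure.string :as str])
(println (str/join " " (map str/upper-case *command-line-args*)))
```

After chmod +x upcase.clj, running ./upcase.clj hello world should print HELLO WORLD, with bb's usual millisecond-scale startup.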

And, the fast startup time is legit. For example, here’s a simple “Hello, world!” in Clojure stored in hello.clj:

(println "Hello, world!")

Now compare:

$ multitime -n 10 -s 1 clojure hello.clj
...
        Mean        Std.Dev.    Min         Median      Max
user    1.753       0.090       1.613       1.740       1.954       
...
$ multitime -n 10 -s 1 bb hello.clj
...
        Mean        Std.Dev.    Min         Median      Max
user    0.004       0.005       0.000       0.004       0.012       
...

That’s a pretty big difference on my modern machine! That’s a median startup time of 1.7 seconds using the JVM version, and a median startup time of 0.004 seconds — that is, four one-thousandths of a second, or 4 milliseconds — using bb, the Babashka version! The JVM version is almost 500x slower!

How does this compare to Python?

$ multitime -n 10 -s 1 python3 hello.py
...
        Mean        Std.Dev.    Min         Median      Max
user    0.012       0.004       0.006       0.011       0.018       
...

So, bb's startup is as fast as, perhaps even a little faster than, Python 3's. Pretty cool!

All that said, the creator of Babashka has said, publicly:

It’s not targeted at Python programmers or Go programmers. I just want to use Clojure. The target audience for Babashka is people who want to use Clojure to build scripts and CLI tools.

Fair enough. But, as Rich Hickey said, there can be really good reasons for Python, Ruby, and Go programmers to take a peek at Clojure. There are some situations in which it could really simplify your code or approach. Not always, but there are certainly some strengths. Here’s what Hickey had to say about it:

[New Clojure users often] find the amount of code they have to write is significantly reduced, 2—5x or more. A much higher percentage of the code they are writing is related to their problem domain.

Aside from being a useful tool for this niche, bb is also just a fascinating F/OSS research project. For example, the way it manages to pull off native binaries across platforms is via the GraalVM native-image facility. Studying GraalVM native-image is interesting in itself, but bb makes use of this facility and makes its benefit accessible to Clojure programmers without resorting to complex build toolchains.

With bb now stable, its creator took a stab at rewriting the clojure wrapper script itself in Babashka. That is, Clojure programmers may not have realized that when they invoke clojure on Linux, what’s really happening is that they are calling out to a bash script that then detects the local JVM and classpath, and then execs out to the java CLI for the JVM itself. On Windows, that same clojure wrapper script is implemented in PowerShell, pretty much by necessity, and serves the same purpose as the Linux bash script, but is totally different code. Well, now there’s something called deps.clj, which eliminates the need to use bash and PowerShell here, and uses Babashka-flavored Clojure code instead. See the deps.clj rationale in the README for more on that.

If you want a simple real-world example of a full-fledged Babashka-flavored Clojure program that does something useful at the command-line, you can take a look at clj-kondo, a simple command-line Clojure linter (akin to pyflakes or flake8 in the Python community), which is also by the same author.

Overall, Babashka is not just a really cool hack, but also a very useful tool in the Clojurist’s toolbelt. I’ve become a convert and evangelist, as well as a happy user. Congrats to Michiel Borkent on a very interesting and powerful piece of open source software!


Note: Some of my understanding of Babashka solidified when hearing Michiel describe his project at the Clojure NYC virtual meetup. The meeting was recorded, so I’ll update this blog post when the talk is available.

Permalink

Property-Based Testing: From Erlang/Elixir to Clojure Part 2

In the previous post, which covered chapter 5 of Property-Based Testing with PropEr, Erlang, and Elixir, I ported most of the code from the book with very few modifications. The code in this post, covering chapter 6 (Properties-Driven Development), is not a straight port. All the code so far is hosted at https://github.com/shaolang/pbtic. I’m using test.check as the property-based testing tool in Clojure. This chapter of the book tackles the Back to the Checkout code kata: calculate the total price of the items scanned at the checkout and handle specials correctly.

Permalink

CLOJURE in Practice: Functional Approaches and Implementation Strategies


There is no shortage of information about Clojure as a modern functional language. Plenty of articles, discussions, and news pieces already cover its key characteristics, theoretical points, and general insights. We decided to dig deeper and offer our readers a somewhat different take on Clojure’s nature and implementation strategies. This article focuses on the practical, applied side of the issue, drawing a parallel between Clojure and Java and, consequently, the advantages of the former. The information will not be limited to those features alone: we will also give a short insight into a project Agiliway is currently running, to paint a concrete picture of the scope of Clojure’s functionality. Let us roll, then.

Clojure is quick, simple, elegant, and highly dynamic, and it has been gaining popularity since its first days. Rich Hickey, its creator, hit the target and made the routine of mainstream developers much more colorful and engaging than it used to be. So what exactly happened, and what changes ‘enlightened’ the world of programming? As there are a lot of them, we will stress only the most general and relevant ones, comparing Clojure to Java, one of today’s most popular programming languages.

Clojure vs Java 

The Java community is one of the largest and most active today, in spite of its chaotic landscape and a number of weak points. Java has a rather verbose and complicated syntax and semantics; many developers find it lacks beauty and consider it time-consuming, boring, even ‘painful’ to write in.

Clojure, in its turn, has gained popularity as a modern, practical Lisp, a considerable improvement over Java, with a number of new tools to use (though one still has to weigh the various options and configurations to make the best and most relevant tooling choices).

What do Java developers say about Clojure? 

  • ‘Plain, simple and compact syntax and structure’ 
  • ‘Dynamic language’; ‘Wide-scope eco-system with versatile libraries and modern Lisp characteristics’  
  • ‘Great self-realization’ 
  • ‘Great architecture’ 
  • ‘Perfect symbiosis with React namely’ 
  • And much more. 

Obviously, the pros are numerous and diverse, and it seems well worth giving Clojure a good try. 

Clojure project implementation/implication 

At present, the Agiliway team is developing a project built on a full Clojure stack (nothing but Clojure): our engineering team is writing the frontend, backend, mobile app, and database layer in Clojure. Such unification (as compared to Java) is another reward Clojure brings, since it allows better cooperation and understanding within the development team. The developers involved have a perfect opportunity to gain solid experience much faster and more efficiently than with other programming languages. These practical benefits are due to the flexibility of the programming team and the peculiarities of the JVM platform. 

What about challenges on the project? Frankly speaking, there have been no serious challenges to discuss here. Everything runs smoothly, quickly, logically, and in unison, which is another considerable advantage of Clojure. 

On the project, Clojure, as compared to Java, has given us a perfect chance to: 

  • Write the needed code fast: SPEED 
  • Use and share it on web and mobile platforms with minimum effort: FLEXIBILITY 
  • Decrease the number of source lines of code (approx. 40 lines instead of nearly 200 in Java): SIMPLICITY 

So, speed, flexibility, simplicity (SFS): the three main features that make Clojure stand out against the wide array of programming languages available today, leaving a range of them on the periphery of the IT sphere. 

In case you have not gotten started with Clojure yet, maybe it is high time to do so and experience the joy of programming. Clojure is for the Brave, True, and Smart, so go out and make its acquaintance; the reward may be right around the corner. 

READ ALSO: CLOJURE: FUSION OF SIMPLICITY AND SOPHISTICATION

Permalink

Clojure Goodness: Replacing Characters In A String With escape Function

The clojure.string namespace contains a lot of useful functions for working with string values. The escape function can be used to replace characters in a string. It accepts the string value as its first argument and a map as its second. The map’s keys are the characters to be replaced, and each key’s value is the replacement. For example, the map {\a 1 \b 2} replaces the character a with 1 and the character b with 2.

In the following example code we use the escape function in several cases:

(ns mrhaki.string.escape-string
  (:require [clojure.string :as str]
            [clojure.test :refer [is]]))

(is (= "I 10v3 C10jur3"
       (str/escape "I love Clojure" {\o 0 \e 3 \l 1})))

(is (= "mrHAKI" 
       (str/escape "mrhaki" {\h "H" \a "A" \k "K" \i "I" \x "X"})))

(def html-escaping {(char 60) "&lt;" (char 62) "&gt;" (char 38) "&amp;"})
(is (= "&lt;h1&gt;Clojure &amp; Groovy rocks!&lt;/h1&gt;"
       (str/escape "<h1>Clojure & Groovy rocks!</h1>" html-escaping)))

(is (= "Special chars: \\t \\n"
       (str/escape "Special chars: \t \n" char-escape-string)))

Written with Clojure 1.10.1.

Permalink

Serving Findka's recommendations via libpython-clj

I've written previously about the simple baseline algorithm that Findka's been using to generate content recommendations. My brother Marshall (who's into data science) has since then started helping out part-time with improving on it. As of last night, we have an off-the-shelf KNN algorithm (written in Python) running in an A/B test. This was my first time integrating Clojure with Python, so I thought I'd give an outline of what we did.

Background: dataset + algorithm

Findka's data set consists of about 75K ratings: tuples of user ID, item ID, rating. The ratings are -1 (for thumbs-down) or 1 (for thumbs-up or bookmark). We also keep track of when items are skipped, but currently that data isn't used for building the model (a lot of users mash the skip button). Marshall used Surprise (a Python recommendation lib) to run a bunch of experiments on our existing rating data and found that KNN Baseline performed the best. His words:

Establishing the best metric was crucial for proper testing. Surprise is designed to handle an arbitrary range of real-valued ratings while our data is binary (thumbs up/down), so the default RMSE used by surprise isn’t the most accurate for our purposes. Surprise also offers a fraction of concordant pairs metric that is more accurate for our data, but using any built in metric on the entire test set doesn’t give the best optimization for our purposes because what we really care about are the top N ratings for each individual user. This way a single user with 500 accurate predictions in the test set won’t offset the overall accuracy when there are lots of users with only a few ratings in the test set, and predictions will change after a few items from the top are rated anyway.

Using accuracy of the top 3 items for each user as a metric, I tested SVD, Co-Clustering, all the KNN algorithms, and a couple other algorithms from Surprise. KNN with baseline was slightly better than any others. One of the biggest advantages of KNN is that we can determine a confidence along with the prediction rating by looking at the number of actual neighbors that were used in each prediction. The algorithm uses 40 neighbors at most, but many items don’t have that many. If an item has a predicted rating of 1 with 40 neighbors, we can have a fair amount of confidence in that prediction. To determine the best predictions, we simply sort with estimated rating as the primary key and actual neighbors as a secondary key. Based on our test set, KNN with baseline will give good recommendations to users more than 75% of the time. Interestingly, the prediction accuracy is actually 83%, but the algorithm correctly predicts that 8% of the top recommendations will be bad because some users dislike almost everything.

Integration via libpython-clj

Since recommendations are made online (i.e. recommendations are computed while you use the app, rather than e.g. once a week like with Spotify's Discover Weekly), we used libpython-clj to integrate Surprise into the Clojure web app.

The first piece is a python file which exposes an informal interface of three functions:

  • train(ratings_csv_filename) -> surprise.KNNBaseline.
  • top(knn_baseline, user_id, item_ids) -> list of the top 20 recommended items, taken from item_ids.
  • set_rating(knn_baseline, user_id, item_id, rating) -> None.

(View source)

This file is kept at resources/python/knn_baseline.py, so we can require it from Clojure like so (and it gets deployed with the rest of the app):

(ns findka.algo
  (:require
    [clojure.java.io :as io]
    [libpython-clj.python :as py]
    [libpython-clj.require :refer [require-python]]
    ...))

(require-python 'sys)
(py/py. sys/path append (.getPath (io/resource "python")))
(require-python '[knn_baseline :reload :as knn])

Thanks to :reload, any changes made to the Python code will come into effect whenever we evaluate the findka.algo namespace.

Then we call the three functions from different parts of Biff. During startup, we pull all the ratings from Crux and store them in a CSV, then pass it to knn/train:

(ns findka.core
  (:require
    [biff.system :refer [start-biff]]
    [findka.algo :as algo]
    ...))

(defn start-findka [sys]
  (-> sys
    (merge
      ...
      {:findka/model (atom {})})
    (start-biff 'findka)
    algo/update-knn-model!
    ...))
(ns findka.algo
  (:require
    [clojure.java.io :as io]
    [crux.api :as crux]
    [findka.trident.crux :as tcrux]
    [trident.util :as u]
    ...))

...

(defn knn-model-inputs [user+item+ratings]
  (let [f (io/file (u/tmp-dir) "knn-ratings.csv")]
    (with-open [w (io/writer f)]
      (doseq [[user item rating] user+item+ratings]
        (.write w (str
                    (hash user) ","
                    (hash item) ","
                    (if (= rating :dislike) -1 1) "\n"))))
    #:knn{:model (knn/train (.getPath f))}))

(defn update-knn-model! [{:keys [findka.biff/node findka/model] :as sys}]
  (swap! model merge
    ; wraps crux.api/open-q
    (tcrux/lazy-q (crux/db node)
      {:find '[user item rating]
       :where '[[user-item :rating rating]
                [(!= rating :skip)]
                [user-item :user user]
                [user-item :item item]]}
      knn-model-inputs))
  sys)

I should also set up a cron job to call update-knn-model! periodically... though for now I'm just calling it manually once a day (via a tunneled nrepl connection) since I have other manual tasks I run daily anyway.

Whenever a client needs some more recommendations, it submits a Sente event. The handler calls findka.algo/recommend, which checks the user's A/B test assignment and then calls knn/top if the user is on the B arm (and returns a Biff transaction, along with some other data for logging purposes).

(ns findka.algo
  (:require
    [crux.api :as crux]
    [trident.util :as u]
    ...))

...

(def hash-str (comp str hash))

(defn recommend-knn [{:keys [user-ref model db n]
                      :or {n 5}}]
  (let [; A list of Crux document IDs for items to possibly recommend. Currently we
        ; query Crux for all the items and then exclude ones which the user has
        ; already rated.
        candidates ...
        hash-str->candidate (u/map-from hash-str candidates)
        ; user-ref is a Crux document ID for the user.
        user-id (hash-str user-ref)
        exploit (for [m (knn/top model (hash-str user-ref) (map hash-str candidates))]
                  (merge (u/map-keys keyword m)
                    {:item (hash-str->candidate (get m "item-id"))
                     :algo :knn/exploit}))
        recs (->> candidates
               shuffle
               (map #(hash-map :algo :knn/explore :item %))
               ; Like interleave, but probabilistically choose items from the first
               ; collection X% of the time.
               (interleave-random 0.666 exploit)
               distinct
               (take n))]
    {:items (map :item recs)
     ; A Biff transaction
     :tx ...}))

(defn recommend [{:keys [user-ref db] :as env}]
  (let [algo-assignment (:ab/algo (crux/entity db {:ab/user user-ref}))
        unassigned (nil? algo-assignment)
        algo-assignment (or algo-assignment (first (shuffle [:cooc :knn])))
        f (case algo-assignment
            :cooc recommend-cooc
            :knn recommend-knn)]
    (cond-> (f env)
      true (assoc :algo algo-assignment)
      unassigned (update :tx conj
                   [[:ab {:ab/user user-ref}]
                    {:db/merge true :ab/algo algo-assignment}]))))

We're still using an epsilon-greedy strategy for exploration: 1/3 of the recommendations are purely random. That's probably a good area for future experiments.
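The interleave-random helper used in recommend-knn isn't defined in the post; a hypothetical sketch of what it might look like, based on its inline comment:

```clojure
(defn interleave-random
  "Hypothetical helper (my sketch, not Findka's actual code): like
  interleave, but takes the next element from xs with probability p and
  from ys otherwise. When one collection runs out, the rest of the
  other follows."
  [p xs ys]
  (lazy-seq
    (cond
      (empty? xs)  ys
      (empty? ys)  xs
      (< (rand) p) (cons (first xs) (interleave-random p (rest xs) ys))
      :else        (cons (first ys) (interleave-random p xs (rest ys))))))
```

With p at 0.666, roughly two thirds of the emitted items come from the exploit list and the rest from the shuffled explore candidates, matching the epsilon-greedy split described above.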

Interlude: performance

On my first pass at using libpython-clj, recommend-knn was really slow. Even when I limited the candidates value to 500 random items (out of ~4.5K), the function was taking anywhere between 8 and 40 seconds to run on the server. (Curiously, it was never that slow on my laptop, even though most experiments I've run go about 25% faster on the server... so I only found out after I deployed.)

Long story short, the culprit was passing too many objects between Python and Clojure. libpython-clj wraps Python objects in Clojure interfaces and vice-versa (or something like that), but evidently that shouldn't be relied on willy-nilly. In particular:

  1. Originally I was passing the rating data directly from Crux to Surprise without writing to a CSV in between. I switched to the approach shown above, having Surprise read the data from disk instead.

  2. Instead of using the knn/top function, I was calling a Surprise method from Clojure to get the rating prediction for each item, after which I sorted the items and took the top 20. Now knn/top does that all in Python, so it only has to pass 20 objects back to Clojure instead of hundreds (or thousands).

I'm also still limiting candidates (to 1,000) because KNN is relatively slow even when I run the Python code directly (i.e. not via Clojure). That should be unnecessary once we switch to matrix factorization (when we tested SVD, predictions were about 30x faster and almost as accurate).

With those changes, recommend-knn for a typical user takes about 1,200 ms to run on the server—roughly 500 ms for knn/top and 700 ms for fetching candidates from Crux. The latter could be optimized further as well. I haven't investigated raw index access with Crux, but we could keep candidates in memory for active clients if nothing else.

(Also: before using knn/top, I tried returning tuples instead of dictionaries (maps), but it was still slow. Passing in a large list of strings (map hash-str candidates) from Clojure wasn't an issue though.)

Some simple tests indicated that computation via libpython-clj ran about 25% slower than plain Python, even when there was no object passing involved. I don't know if that's inherent or if I'm doing something wrong. Right now it doesn't matter, but if future us cares about it, I'm thinking we could run Python in a separate process and communicate over IPC (assuming IPC isn't slow, another thing which I have not investigated). Maybe by that time we'll have switched to rolling our own models in Clojure.

Python integration continued

Finally, on to knn/set_rating. Again, Findka does recommendations online. If someone visits the site for the first time, we want the algorithm to adapt to their preferences as soon as they rate an item. So we must update the model incrementally (at least the portion of the model representing the user's tastes). This was mildly inconvenient because Surprise (and I suspect most other recommendation libraries) aren't written with that use case in mind. After peering into the source, I came up with this:

@synchronized  # lock a mutex during execution
def set_rating(knn_baseline, user_id, item_id, rating):
    if not knn_baseline.trainset.knows_item(item_id):
        return

    inner_item_id = knn_baseline.trainset.to_inner_iid(item_id)
    try:
        inner_user_id = knn_baseline.trainset.to_inner_uid(user_id)
        new_user = False
    except ValueError:
        inner_user_id = len(knn_baseline.trainset._raw2inner_id_users)
        new_user = True
        new_bu = np.append(knn_baseline.bu, 0)
        knn_baseline.bu = new_bu
        knn_baseline.by = new_bu
        knn_baseline.yr[inner_user_id] = []

    new_ratings = [(i, r) for i, r in knn_baseline.yr[inner_user_id] if i != inner_item_id]
    if rating is not None:
        new_ratings += [(inner_item_id, rating)]
    knn_baseline.yr[inner_user_id] = new_ratings

    if new_user:
        # Do this last to make sure top doesn't get messed up if it calls while
        # this function is executing (a likely occurrence).
        knn_baseline.trainset._raw2inner_id_users[user_id] = inner_user_id
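The snippet above assumes numpy is imported as np and that a synchronized decorator exists. A plausible implementation of that decorator (my assumption, not necessarily Findka's) is a module-level mutex:

```python
import threading
from functools import wraps

_lock = threading.Lock()

def synchronized(f):
    """Hold a module-level mutex while f runs, so concurrent calls serialize."""
    @wraps(f)
    def wrapper(*args, **kwargs):
        with _lock:
            return f(*args, **kwargs)
    return wrapper

# Trivial demonstration: concurrent bumps can never interleave mid-update.
@synchronized
def bump(counter):
    counter["n"] += 1
    return counter["n"]
```

Note that this only serializes calls that carry the decorator; top itself is not locked, which is why the original code is careful to register the new user id last.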

set_rating is called from a small wrapper function:

(ns findka.algo
  ...)

(defn set-rating-knn! [{:keys [findka/model]
                        {:keys [user item rating]} :user-item}]
  (knn/set_rating
    (:knn/model @model)
    (hash-str user)
    (hash-str item)
    (normalize-rating rating)))

Which is called from a Biff trigger whenever someone rates an item:

(ns findka.triggers
  ...)

...

(defn ratings-updated [{:keys [doc doc-before] :as env}]
  (when (some-changed? doc doc-before :rating)
    (let [{:keys [user item]} (merge doc-before doc)
          rating (:rating doc)]
      (algo/set-rating-knn! (assoc env :user-item
                              {:user user
                               :item item
                               :rating rating})))
    ...))

(def triggers
  {:user-items {:write ratings-updated}
   ...})

I'm quite happy with this now that it's all set up. I think it'll help us iterate quickly on the algorithm. I'm particularly looking forward to seeing the results of this A/B test. My next mini project is focused on analytics to that end, since we haven't done any A/B testing until now.

Permalink

GraalVM Native Image: Spring VS Quarkus

In 2018, Oracle announced the 1.0 release of GraalVM.

GraalVM is a universal virtual machine for running applications written in JavaScript, Python, Ruby, R, JVM-based languages like Java, Scala, Groovy, Kotlin, Clojure, and LLVM-based languages such as C and C++. [source]

One of the most interesting features for me is compilation to a native binary (in GraalVM usually called Native Image) for JVM-based applications. A native image is a standalone binary that does not require a JVM (like a C, C++, or Go binary), but it comes with trade-offs (JIT vs. AOT) and limitations.

Spring Framework, one of the most mature and popular frameworks in the JVM world, does not yet support native image compilation out of the box because of the limitations I mentioned earlier. In mid-2019, spring-graalvm-feature was developed to support native image compilation for Spring Boot (Spring Framework 5.2), and in September 2019 it officially joined the Spring Experimental Projects. I was very interested in trying it, but the build process requires a huge amount of memory; my laptop has only 8 GiB of RAM, and there's currently no reason to add more just for this 😅. The good news: this commit makes memory usage smaller.

Some newer Java frameworks already have native image compilation out of the box, such as Quarkus, Micronaut, and Helidon. I'd been interested in trying Quarkus for a while (1.0 was actually released last November), but I didn't have much free time. A few months ago, after a few days of trying Quarkus, I got curious to benchmark it against Spring (spring-graal-native). After several experiments with spring-graal-native, though, I found it still painful, even for a simple HTTP API and JPA (with a mainstream SQL database); my issues so far: #1, #54, #55, #93, #197. No wonder: it was at version 0.6.0. Maybe waiting for the next version, or for official support in Spring 5.3, is a good idea.

A couple of weeks ago, spring-graalvm-native 0.7.1 was released, many issues have been closed, and I have free credits on GCP, so let's try again and do some benchmarking. I know it's unfair to compare spring-graalvm-native right now, so let's just call this an "experiment report" 😂.

Application

This benchmark covers four types of application:

  1. Spring Boot
  2. Spring Boot with Webflux
  3. Quarkus
  4. Quarkus (Spring API Extension)

All of them are simple CRUD applications over HTTP backed by a PostgreSQL database. There are six HTTP endpoints:

  1. Insert
  2. Find by id
  3. Find by page
  4. Find by specific column (filtering)
  5. Update
  6. Delete

You can check my repository for the source code. Note that data access in the Webflux version (number 2) still uses non-reactive Spring Data JDBC; I added the Webflux version just for comparison, since Quarkus is Netty-based.

For how to compile the applications to a native image, see this documentation for Quarkus and this documentation for spring-graalvm-native. I'm using hybrid mode for spring-graalvm-native with a custom build script (see compile.sh in the root directory of each Spring project).

For framework and runtime versions, I'm using Spring Boot 2.3.1 (with spring-graalvm-native 0.7.1), Quarkus 1.5.2, and GraalVM 20.1.0 on Java 11.

Data Model

  • id (primary key)
  • resource_string (for sample filtering)
  • resource_text (3,000 random characters, for payload)

Build Process

The build process was run on a VM with 4 vCPUs and 16 GiB RAM (2.8% RAM usage when idle).

1. Build Time

spring-boot 8 minutes 39 seconds
spring-boot-webflux 8 minutes 12 seconds
quarkus 3 minutes 55 seconds
quarkus-spring-api 3 minutes 57 seconds

After compiling several times, I found that build times can vary by 20–30 seconds.

2. CPU Usage (%) over time

3. RAM Usage (%) over time

4. Peak RAM Usage

spring-boot 10.1 GiB
spring-boot-webflux 9.8 GiB
quarkus 8 GiB
quarkus-spring-api 8 GiB

5. Binary Size

spring-boot 196.6 MB
spring-boot-webflux 182.7 MB
quarkus 69.8 MB
quarkus-spring-api 69.6 MB

Well, the build time and binary size for Spring are much higher; let's wait for (and maybe also contribute to) Spring 5.3 and a non-experimental version of spring-graalvm-native.

Startup Time

All of the applications start in under 1 s on both 2 vCPUs and 1 vCPU, so I don't think this section needs more detail.

Stress Test

VM Specification

Database: 6 vCPUs, 16 GiB RAM, SSD (us-west1-b)

Application: 1 vCPU, 2 GiB RAM (us-west1-b)

JMeter: 4 vCPUs, 16 GiB RAM (us-west2-a)

The JMeter VM is in a different region because I'm using a free-trial account, which is limited to 8 vCPUs per region.

Scenario

I'm using the JMeter Random Controller to send requests randomly to all endpoints for 15 minutes. Before each run, I truncate the table and insert 10,000 records.

I didn't do any special tuning of the applications; everything is at the defaults (including Spring WebMVC, so 200 Tomcat threads) except for the connection pool size. The only tuning was at the VM level: increasing the maximum number of open files and enabling tcp_tw_reuse.

Throughput Result

Before running the 15-minute test, I ran several 60-second tests to find the most suitable settings (number of JMeter threads, i.e. concurrent users, and connection pool size) for this VM specification. I used Quarkus as the baseline, and this is the result.

Based on those results, I think 150 JMeter threads is enough. For efficiency, let's use 2 connections in the pool and 1 vCPU.

Well, these are the results for 2 pooled connections handling 150 JMeter threads over 15 minutes:

spring-boot 390.1/s
spring-boot-webflux 631.6/s
quarkus 1042.7/s
quarkus-spring-api 871.1/s

And this is the average throughput/s over time (sampled every 30 s).

Resource Utilization (during Stress Test)

Application VM

  • CPU Usage (%) over time

Note: As you can see, application VM CPU usage for quarkus-spring-api and all the Spring versions decreases over time, while database VM CPU usage for those versions increases (see the next section).

  • RAM Usage (%) over time

Database VM

  • CPU Usage (%) over time

Note: Continuing the note from the application VM section, database VM CPU usage for quarkus-spring-api and all the Spring versions increases over time. I haven't investigated this further yet; it's probably because I used Quarkus as the baseline when choosing the number of JMeter threads, and the others can't keep up at that setting. I'll re-test with 100 JMeter threads later.

  • RAM Usage (%) over time

JMeter VM

  • CPU Usage (%) over time
  • RAM Usage (%) over time

The full source code is available on my GitHub page.

ard333/native-binaries-benchmark

Thanks for reading (sorry for my bad English 😅), and feel free to comment.

GraalVM Native Image: Spring VS Quarkus was originally published in The Startup on Medium, where people are continuing the conversation by highlighting and responding to this story.

Permalink

How to Check if A List Contains a Value in Clojure

Recently I started learning Clojure, and usually my first step is to complete small programming exercises. To practice, I decided to solve some CodingBat problems. A common task there is to check whether an array contains some value, with a boolean result expected. When I moved to a practical implementation, I realized how tricky a simple thing can be in Clojure.

In this post, I'd like to share some thoughts on how to check that a list contains an element in Clojure. Like almost any beginner, I started with the contains? function; however, it may not be your best option. I'll walk through several solutions, including some that come from Java.

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.