Writing library for both Clojure and ClojureScript

Clojure 1.7 introduced reader conditionals and a standardized way to write Clojure code for various platforms. A few months ago, when I started to write Rigui, I decided to support both Clojure and ClojureScript, and it turned out to be possible. However, during development you need to be aware of a few differences between the two hosts, from tooling and syntax to the nature of the host itself. There is much more to it than reader conditionals.

Host differences

The most important difference between the JVM and JavaScript runtimes is multithreading support. At the moment, almost all JavaScript runtimes are event-driven: most of the time there is only one thread your code can run on. When it does come to multithreading, as with WebWorkers, the platform offers a message-passing API and you don't have any shared memory or data. So features like STM and agents are unnecessary and simply unavailable in this runtime. If you were using dosync or send in your Clojure code, you need to add a :cljs branch and switch to an atom.

#?(:clj (dosync
          (let [b (get (ensure (.-buckets parent)) trigger-time)]
            (alter (.-buckets parent) dissoc trigger-time)
            (ensure b)))
    :cljs (let [b (get @(.-buckets parent) trigger-time)]
            (swap! (.-buckets parent) dissoc trigger-time)
            @b))

For the same reason, future and promise (the Clojure promise) are not available in ClojureScript. If you were exposing a promise as a derefable result in your API, you will need to find another solution in ClojureScript.

Reader conditionals

The Clojure wiki has a brief intro for reader conditionals.

One thing to add: if a namespace has completely different implementations per platform, you can write them in separate files, for example platform.clj and platform.cljs. When you require the namespace with (require '[platform]), the reader will pick the right file for you.
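
As a hypothetical sketch (the namespace and function are made up for illustration), the same namespace could expose a platform-specific clock:

;; platform.clj
(ns platform)

(defn now-millis []
  (System/currentTimeMillis))

;; platform.cljs
(ns platform)

(defn now-millis []
  (.getTime (js/Date.)))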

Syntax differences

There are some minor syntax differences between clj and cljs. To access a field of an object or record, you need to use (.-fieldName obj); otherwise cljs will try to call it as a function. This syntax is also supported in clj these days, so I recommend switching to it everywhere in your Clojure code.

Cljs requires explicit macro imports. You will need (:require-macros ...) in your (ns) header, or a :refer-macros option in a :require form.
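
For example, a minimal sketch (the library and macro names here are hypothetical):

(ns my.app
  (:require-macros [my.lib.macros :refer [with-timing]]))

;; or, equivalently, with :refer-macros inside :require:
(ns my.app
  (:require [my.lib.macros :refer-macros [with-timing]]))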

Tooling

First, if you are writing a library for ClojureScript, you can use Leiningen without any plugins, because cljs libraries are also distributed as jars and you don't need to generate JavaScript files. In case you do need to, cljsbuild is the de facto plugin for cljs development.

Second, to be honest, there is no REPL workflow in ClojureScript as polished as in Clojure. It's a pain to test your cljs code the Clojure way against a browser or Node environment. If you have a recommendation, please tell me.

I thought testing was as painful as the REPL until I found doo. doo is a lein plugin that makes cljs testing as easy as in Clojure. All you have to do is provide a runner namespace and generate the JavaScript for that runner: no copy/paste, no duplicated code. Without doo, you would need to generate JavaScript sources from your Clojure test files with cljsbuild, namespace by namespace, and execute the files one by one. You can find my test runner here and its cljsbuild configuration.
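
A doo runner namespace only takes a few lines; a sketch along these lines (the namespace names are guesses for illustration, not Rigui's actual ones):

(ns rigui.test-runner
  (:require [doo.runner :refer-macros [doo-tests]]
            [rigui.core-test]))

(doo-tests 'rigui.core-test)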

Conclusion

These are my two cents on developing a library for both clj and cljs. The conclusion is simply: it works. In 2016 you can, and you should feel encouraged to, write Clojure libraries with cross-platform support in mind.

Permalink

Beating bugs with brute force

For years I've been writing tests for the applications I build. However, it turns out that computers can do a better job. Property based testing is the doorway to a more advanced world of testing that can dramatically improve quality and find the kinds of bugs that would often go undetected outside the live system.

Generative Testing

When you write tests, you will often have to write test data (aka fixtures).

For example, let's say we have a micro-service for dealing with customer details. This is likely a CRUD service, so we might write a test that POSTs a new customer and then tries to GET the customer. We will define some sample data to build a new customer request:

(def a-customer
  {:name "David Smith"
   :age 33
   :gender "male"})

We can then re-use this in various different tests. We may even turn this into a function to try to vary the details for certain tests:

(defn a-customer
  [m]
  (merge {:name "David Smith"
          :age 33
          :gender "male"}
         m))

An alternative would be to use generators to create our data:

;; assuming (:require [clojure.test :refer [deftest]]
;;                    [clojure.test.check.generators :as gen])
(def a-customer
  (gen/hash-map :name gen/string
                :age gen/int
                :gender gen/string))

(deftest my-test
  (let [new-customer (gen/generate a-customer)]
    ;; ... POST new-customer, then GET it back and compare ...
    ))

The gen namespace is part of a Clojure library called test.check which provides functions for generating random data. For example, gen/string will generate a random string, gen/int will generate a random integer etc.

So why would you use the generated version? First, it means you don't have to waste your time coming up with witty values, but more importantly you are defining more precisely how your function works. In production, the service will not always receive a customer whose name is "David Smith"; it will receive a name whose value is a string. With a generator we state this explicitly. On top of that, generators tend to generate loads of rubbish that can screw with your functions surprisingly quickly; I've found quite a few bugs the first time I hit a service with generated data.

Generators can be a bit daunting at first; I thought they might become so complicated that you would need to test your generators! It turns out, though, that this is not the case, and property testing libraries like test.check have the tools to generate just about anything fairly easily. You can also come up with your own patterns and helpers to make things easier. One of the best examples of this is Plumatic Schema's experimental generators.

Given any schema, the library will provide you with a generator to generate values that conform to this schema. If you are already validating your new customer in the microservice using schema then there is really no work involved:

;; assuming (:require [schema.core :as s :refer [defschema]]
;;                    [schema.experimental.generators :as sg])
(defschema Customer
  {:name s/Str
   :age s/Int
   :gender s/Str})

(sg/generate Customer)

It really is as simple as that, we've eliminated the tedious work of writing sample data and at the same time we've increased the scope for finding bugs.

Property Based Testing

Property based testing is a method of testing functions pioneered by the Haskell community. From Hackage:

QuickCheck is a library for random testing of program properties.

The programmer provides a specification of the program, in the form of properties which functions should satisfy, and QuickCheck then tests that the properties hold in a large number of randomly generated cases.

Property based testing libraries such as test.check have 2 distinct parts. The first part is a framework for random value generation as we saw above, the second part is a clever test runner that will try to find the simplest failing case.

As a simple example taken directly from the test.check README.md, let's say you have a function called sort which sorts a vector of integers. You provide a generator which will generate vectors of random sizes containing random integers, and you use these as inputs to your function. Finally you provide a set of properties that should hold true; in this example we can say that sorting a vector twice should give the same result as sorting it once. A library such as QuickCheck or Clojure's test.check will then try to find an example that causes the test to fail by generating hundreds or thousands of test cases.

(def sort-idempotent-prop
  (prop/for-all [v (gen/vector gen/int)]
    (= (sort v) (sort (sort v)))))

(tc/quick-check 100 sort-idempotent-prop)
;; => {:result true, :num-tests 100, :seed 1382488326530}

This all sounds great; however, all the online examples test small, pure functions that are only a small part of the software we write. Impressive as it is, I was struggling to see how often I would use this type of testing in my everyday development of systems such as HTTP microservices, which often have limited functionality and not much complex logic. However, that all changed once I started to have a go!

You wanna play rough?

In a recent project we had built a microservice that would take a request through a RESTful interface, provide a small amount of validation and then place the result on RabbitMQ. We decided to use our new yada library to take care of all the HTTP/REST infrastructure for us.

The service wouldn't be used in a particularly intensive way however the team felt that it would be a good idea to write some load tests to see at what point it falls down and what happens when it does.

Say hello to my little friend!

We decided to use clj-gatling for our load testing. This is a Clojure testing tool designed primarily for hitting servers with thousands of requests in parallel and producing nice reports about what happened. Since we had already written integration tests to check the functionality of the service (using test.check), it was simply a matter of reusing these tests in a slightly modified manner. We would hit the service on a few of the endpoints and check that the appropriate messages were present on the Rabbit queue. I knew that both RabbitMQ and the aleph server that yada is built on were designed for high performance, so I imagined that we would have to really push things to see any problems; after all, we had already verified that the service worked reliably with the integration tests.

(deftest load-test-all-endpoints
  (let [{:keys [api-root test-config]} (test-common/load-config)]
    (g/run-simulation [{:name     "sequentially try each endpoint"
                        :requests [{:name "Put user on queue"
                                    :fn   (partial post-user api-root)}
                                   {:name "Put articles queue"
                                    :fn   (partial post-articles-csv api-root)}
                                   {:name "hit health check endoint"
                                    :fn   (partial health-check api-root)}]}]
                      (:users test-config)
                      {:requests (:requests test-config)})
    (let [total-tests (+ @post-user-count @post-articles-csv-count)]
      (is (= 0 (count @errors)) (format "some requests failed e.g. %s" (first @errors)))
      (eventually-is (= total-tests (count (keys (deref test-common/msgs)))) (:message-timeout test-config)
                     (format "all messages should be received within %sms" (:message-timeout test-config))))))

Who put this thing together?

In the first run I decided to hit the service with 1000 requests from 10 'users' in parallel. One of the endpoints was a CSV file upload and I was surprised to find that some of the messages from this endpoint had not appeared on the queue. My initial reaction was that perhaps there was a small overhead getting messages onto Rabbit and that, although throughput would be high, I might need to give a bit of time after the test had fired its requests to see all the results. However, I discovered that the messages were simply not getting put on the Rabbit queue; they were just disappearing.

With some old-school 'print line' debugging, it was possible to see that the request was getting into the server but the body was not appearing in my yada handler. This would happen for about 0.5% - 1% of requests, which of course we would never have found with our integration tests. Perhaps occasionally we would have had a failed Jenkins build, but run it again and everything would pass; it would, in all probability, be put down to something weird on the Jenkins slave and be ignored. We would have lost data in production at some point.

Lesson number one; Lesson number two

Firstly, this made us realise that we should give a 400 response if the body was empty, something we had failed to think about.

Next, careful investigation revealed that the library yada was using for finding multipart boundaries was broken. As a side note, this library was a prime candidate for property based testing and it would have revealed this bug. We were making use of the new multipart streaming-upload feature of yada which allows a web API to process large request bodies asynchronously, useful for uploading large images and videos. When handling multipart request bodies, yada needs to efficiently detect boundaries (known sequences of characters). The library it was using (clj-index) had a bug in it that meant that in certain circumstances boundaries would go undetected.
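
As a sketch of what such a property test might look like (find-boundaries below is a made-up stand-in for the function under test, not the actual clj-index or yada API), you generate random payloads and check the implementation against a naive reference:

(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop]
         '[clojure.string :as str])

(defn naive-find-boundaries
  "Reference implementation: the index of every occurrence of boundary in s."
  [boundary s]
  (loop [from 0 acc []]
    (if-let [i (str/index-of s boundary from)]
      (recur (inc i) (conj acc i))
      acc)))

;; stand-in for the real implementation under test
(def find-boundaries naive-find-boundaries)

(def boundary-detection-prop
  (prop/for-all [boundary (gen/not-empty gen/string-alphanumeric)
                 chunks   (gen/vector gen/string-alphanumeric)]
    (= (naive-find-boundaries boundary (str/join boundary chunks))
       (find-boundaries boundary (str/join boundary chunks)))))

(tc/quick-check 1000 boundary-detection-prop)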

Malcolm, the primary author of yada, developed a new asynchronous implementation of the Boyer-Moore-Horspool algorithm and released a new version.

We ran the tests again but still found some failures! Working with Malcolm we found that under certain circumstances, the logic of piecing together the chunks of an uploaded file was incorrect.

The issue was fixed and finally the tests passed. We were able to push the service hard and it continued working flawlessly (well, until we finally ran out of file descriptors!).

Now you're talking to me baby!

So what did I learn from this experience?

  • Load tests are important, they can test more than just performance.
  • Generative tests are vital and can find bugs that would have resulted in loss of revenue.
  • Even wearing a QA hat, we can miss simple failure scenarios that should be planned for and dealt with appropriately (the 400 response in this case).
  • It's vital to use libraries that are either battle-tested or that are actively maintained so that bugs can be fixed promptly.
  • Property based testing should be applied where possible, especially when it comes to testing implementations of algorithms such as Boyer-Moore-Horspool string finding.

Permalink

Alda: A Music Programming Language Built With Clojure

I gave this talk about Alda a few months ago as part of Clojure Remote, the world’s first remote-only Clojure conference.

This talk serves as a quick introduction to music programming languages and the philosophy that led me to create Alda. I talk a little bit about what Alda is and a few of the things you can do with it, but a large part of my talk focuses on why Clojure proved to be the perfect language for building something like Alda.

The conference was a blast - I was honored to be asked to give a talk, but even happier to attend and watch all the other awesome talks! I’m looking forward to seeing what kind of awesomeness Clojure Remote 2017 will bring.

Permalink

Know when to fold 'em: visualizing the streaming, concurrent reduce

Holy cow! Can you believe your luck? What better way to spend some portion of whatever day of the week it is than to think about parallelizing reduction algorithms?

Really? You can't think of one? In that case, I'll kick off our inevitable friendship by contributing a clojure implementation of a fancy algorithm that asynchronously reduces arbitrary input streams while preserving order. Then I'll scare you with glimpses of my obsessive quest to visualize what this algorithm is actually doing; if you hang around, I'll show you what I eventually cobbled together with clojurescript and reagent. (Hint. It's an ANIMATION! And COLORFUL!)

(If you want to play with the algorithm or the visualization yourself, it's better to clone it from github than to attempt copying and pasting the chopped up code below.)

Every hand's a winner

The first really interesting topic in functional programming is reduction. After absorbing the idea that mutating state is a "bad thing", you learn that you can have your stateless cake and eat it too.

   (reduce + 0 (range 10))

does essentially the same thing as

   int total=0;
   for (int i=0; i<10; i++) {
     total += i;
   }
   return total;

without the fuss of maintaining the running sum explicitly. It becomes second nature to translate this sort of accumulation operation into a fold, and it's usually foldl - the one where the running accumulant is always the left-hand side of the reduction operator.

foldl and foldr considered slightly harmful

In August 2009, Guy Steele, then at Sun Microsystems, gave a fantastic talk titled Organizing Functional Code for Parallel Execution; or, foldl and foldr Considered Slightly Harmful, alleging that this beautiful paradigm is hostile to concurrency. He's right. By construction, you can only do one reduction operation at a time. That has the advantage of being extremely easy to visualize. ASCII art will do (for now):

1   a     b   c   d   e   f   g  h
     \   /   /   /   /   /   /   /
2     ab    /   /   /   /   /   /
        \  /   /   /   /   /   /
3       abc   /   /   /   /   /
          \  /   /   /   /   /
4        abcd   /   /   /   /
           \   /   /   /   /
5          abcde  /   /   /
            \    /   /   /
6           abcdef  /   /
              \    /   /
7            abcdefg  /
                \    /
8              abcdefgh

Here, we are concatenating the first 8 letters of the alphabet, and it will take us 7 sequential steps (aka O(n)) to do it. Steele points out that you can do better if you know in advance that your reduction operator is associative; we can do bits of the reduction in parallel and then reduce the reduced bits. Concatenation is the ultimate in associative operators; we can grab any random consecutive sequence and reduce it, then treat the reduction as if it were just another element of input, e.g.

   a b c d e f g h --> a b c def g h --> abc def gh --> abcdefgh

Approaching this not so randomly, we can repeatedly divide and conquer

1    a   b c   d e    f g    h
      \ /   \ /   \  /   \  /
2     ab     cd    ef     gh
        \   /       \     /
3        abcd         efgh
             \      /
4            abcdefgh

now completing in only 3 steps (aka O(log n)). The main thrust of Steele's talk is that you should use data structures that foster this sort of associative reduction. The minimal requirement of such a structure is that you're in possession of all its elements, so you know how to divide them in half. This may remind you of merge-sort, which is in fact a parallelizable, associative reduction, taking advantage of the fact that merging is associative.
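
(As a concrete aside: Clojure's reducers library, whose fold we'll meet again below, does exactly this divide-and-conquer over vectors, which can be split in half repeatedly. A minimal sketch:)

(require '[clojure.core.reducers :as r])

;; + is associative, and a vector can be split into halves, so fold
;; can reduce the halves in parallel and combine the partial sums.
(r/fold + (vec (range 1000000)))
;; => 499999500000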

associative vs commutative

Suppose, however, that you're reducing a stream of unknown length. It isn't clear anymore how to divvy up the inputs. That isn't a problem if our reduction operation is commutative, rather than just associative. In that case, we can just launch as many reductions as we can, combining new elements as they become available with reduced values, as they complete. If the reduction operator isn't actually commutative, accuracy will suffer:

1   a    b c     d   e  f g h
     \  /   \   /   /   | | |
2     ab     \ /   /    | | |
       \     cd   /     | | |
        \      \ /      | | |
         \     /\      /  | |
          \   /  \    /   | |
3          abe    \  /   /  |
             \     cdf  /   |
              \       \/    |
               \      /\   /
4               abeg    cdfh
                  \      /
5                 abegcdfh

To take an extremely practical example, suppose I needed to keep track of the orientation of my remote-control spaceship (USS Podsnap), which transmits to me a stream of 3D rotation matrices, each representing a course correction. Matrix multiplication is associative, so a streaming associative reduce is just the ticket. Matrix multiplication is not, however, commutative, so if I mess up the order I will be lost in space. (Note that 2D rotation matrices - rotation around a single angle - are commutative, so this wouldn't be a problem for my remote-control wheat combine.)

It seems that I truly need a streaming, associative reduce -- where order matters, but nobody has told me when inputs will stop arriving, at what rate they will arrive, or how long the reductions themselves will take.

streaming associative reduce - a possible algorithm

Here's a possible approach. We maintain multiple queues, and label them 1, 2, 4 etc., corresponding to reductions of that many elements. When the very first element arrives, we throw it onto (the empty) queue #1. When subsequent elements arrive, we check if there's anything at the head of queue #1 and, if so, launch a reduction with it; otherwise, we put it in queue #1. If we do launch a reduction involving an element from queue #1, we'll add a placeholder to queue #2, into which the result of the reduction will be placed when it completes. After a while queue #2 may contain a number of placeholders, some populated with results of completed reductions, others still pending. As soon as we have two complete reductions at the head of queue #2, we launch a reduction for them, and put onto queue #4 a placeholder for the result. And so on.

Sometime after the stream finally completes, we'll find ourselves with all queues containing zero or one reduced value. Because of the way we constructed these queues, we know that any reduction in queue i involves only input elements preceding those involved in reductions in any other queue j<i. Accordingly, we just take the single values remaining, put them in reverse order of bucket label, and treat them as a new input series to reduce.

I think we're almost at the limits of ASCII visualization, but let's try anyway. Values in parentheses below are pending:

       1           2            4
       ----------  -----------  -------
       a
       a b
       c           (ab)
       c d          ab
       e            ab (cd)
       e f          ab  cd
       g           (ef)         (abcd)
       g h          ef           abcd
                    ef (gh)      abcd
                    efgh         abcd
       abcd efgh
                   (abcdefgh)
                    abcdefgh

The actual state at any point in time is going to depend on

  1. When inputs arrive.
  2. The amount of time a reduction takes.
  3. The permitted level of concurrency.

Note that you might not actually achieve the permitted level of concurrency, because we might not yet have two consecutive completed reductions at the front of a queue. Suppose that queue 2 looks like this (left is front):

2:     (ab) cd (ef) gh

For some reason reducing a/b and e/f is taking longer than reducing c/d and g/h. Only when a/b finishes

2:    ab cd (ef) gh

can we grab the two head elements and launch a reduction of them in queue #4

2:   (ef) gh
4:   (abcd)

Now imagine a case where reductions take essentially no time compared to the arrival interval. Since we do a new reduction instantly upon receiving a new element, the algorithm reduces to foldl, plus a decorative binary counter as we shuffle partial reductions among the queues:

Now another, where inputs arrive as fast as we take them, reductions take 1 second, and we can do up to 10 of them at a time. (n is the actual number of inputs left; np is the actual number of reductions currently running; green squares represent the number of completed reductions, red the number in flight; the actual ordering of reductions and placeholders in the queue is not shown):

See how we immediately fill up queue #2 with placeholders for 10 reductions, which we replenish as they complete and spill over into later buckets.

Finally, here's a complicated example: inputs arrive every millisecond (essentially as fast as we can consume them), reductions take between 1 and 100ms, and we are willing to run up to 10 of them in parallel.

It achieves pretty good concurrency, slowing down only during the final cleanup.

Learn to play it right

Clojure of course contains reduce (and educe and transduce and... well, we've been down that road already), and it even contains an associative reduce, which sticks it to those stuck-up Haskellers by calling itself fold.[1] Our reduce will look like

(defn assoc-reduce [f c-in])

where f is a function of two arguments, returning a core.async channel that delivers their reduction and c-in is a channel of inputs; assoc-reduce returns a channel that will deliver the final reduction. In typesprach it would look like this:

(defn :forall [A] assoc-reduce
   ([f     :- (Fn [A A -> (Chan A)])
     c-in  :- (Chan A)
     ] :- (Chan A)))

The central data structure for this algorithm is a queue of place-holders, which I ultimately implemented as a vector of volatiles. That's a bit of a compromise, as it would be possible to employ a fully functional data structure, but we can structure our code to localize the impurity.

When launching a reduction, we place a new (volatile! nil) at the end of the queue where its result is supposed to go, and when the answer comes back, we vreset! the volatile. Crucially, we do not let this resetting occur asynchronously, but arrange for incoming reduction results to contain the volatile placeholder:

   (let [iq     (inc old-queue-number)
         v      (volatile! nil)
         queues (update-in queues [iq] conj v)] ;; put placeholder volatile on queue
     (go (>! reduction-channel                  ;; launch reduction asynchronously
           [iq                ;; queue number
            (<! (f a b))      ;; reduction result
            v])))             ;; destination volatile

The main loop now knows exactly where to put the results, and we know exactly when they were put there. No race conditions here.

   (go-loop [queues {}]
      (let [[iq r v] (<! reduction-channel)
            _        (vreset! v r)
            queues   (launch-reductions-using-latest queues)]
         (recur queues)))

What then? After a reduction comes back, we may have an opportunity to launch more, by pulling pairs of reduced values off the current queue, for further reduction into the next:

(defn launch-reductions [c-redn f iq queue]
  (let [pairs (take-while (fn [[a b]] (and a b))
                          (partition 2 (map deref queue)))]
    (map (fn [[a b]]
           (let [v (volatile! nil)]
             (go (>! c-redn [(inc iq) (<! (f a b)) v]))
             v)) pairs)))

So far, we've thought about what to do with results coming off a reduction channel; we also have to worry about raw inputs. Life will be a little simpler if we make the input channel look like the reduction channel, so we map our stream of xs into [0 x nil]s. One used to do this with (async/map> c-in f), but that's been deprecated in favor of channels with built-in transducers, so we'll create one of those and pipe our input channel to it:

(let [c-in (pipe c-in-orig (chan 1 (map (fn [x] [0 x nil]))))] ...)

Then we'll listen with alts! on [c-redn c-in], taking real or fake reductions as they arrive.

Actually, it's a little more complicated than that, because we don't want to find ourselves listening when no results are expected, and we don't want to accept more inputs when already at maximum parallelization. This means we're going to have to keep track of a little more state than just the queues. Specifically, we'll keep c-in, with the convention that it's set to nil when closed, and np, the number of reductions currently launched:

    (go-loop [{:keys [c-in queues np] :as state} {:c-in c-in :queues {} :np 0}]

The first thing we do in the loop is build a list of channels (possibly empty - a case we'll handle a bit further down):

       (if-let [cs (seq (filter identity (list
             (if (pos? np) c-redn)     ;; include reductions if some are expected
             (if (< np np-max) c-in)   ;; include c-in if still open and np < np-max
             )))]

and listen for our "stuff":

       (let [[[l res v]  c] (alts! cs)]

The only reason we might get back nil here is that the input channel has been closed, in which case we record that fact and continue looping:

          (if-not l
            (recur (assoc state :c-in nil))

If we do get back a reduction, we put it into the volatile expecting it,

            (let [q (if v
                      (do (vreset! v res) (queues l))        ;; real reduction
                      (concat (queues 0) [(volatile! res)])) ;; actually an input

launch as many reductions as we can from pairs at the head of the queue,

                  vs (launch-reductions c-redn f l q)
                  nr (count vs)
                  q  (drop (* 2 nr) q)

adjust the number of running reductions accordingly,

                  np (cond-> (+ np nr) (pos? l) dec)

put the placeholders on the next queue,

                  l2 (inc l)
                  q2 (concat (queues l2) vs)]

and continue looping

              (recur (assoc state :n n :np np :queues (assoc queues l q l2 q2))))))

In the case where c-in was closed and np was zero, our queues contain nothing but complete reductions, which we extract in reverse order

        (let [reds (->> (seq queues)
                        (sort-by first)     ;; sort by queue number
                        (map second)        ;; extract queues
                        (map first)         ;; take the head, if any
                        (filter identity)   ;; ignore empty heads
                        (map deref)         ;; unpack the volatile
                        reverse
                        )]

If there's only one reduction, we're well and truly done. Otherwise, we treat the new series as inputs:

          (if (<= (count reds) 1)
            (>! c-result (first reds)) ;; return result
            (let [c-in (chan 1 (map (fn [x] [0 x nil])))]
              (onto-chan c-in reds)
              (recur {:n (count reds) :c-in c-in :queues {} :np 0}))))))

Knowin' what the cards were

Surprisingly, it wasn't that difficult to get this working. While the state is a bit messy, we're careful to "modify" it only on one thread, and we enjoy the masochistic frisson of admonishment every time we type one of Clojure's mutation alert[2] exclamation points.

Unfortunately, I suffer from a rare disorder in which new algorithms induce psychotic hallucinations. For hours after "discovering" binary trees as an adolescent, I paced slowly back and forth in my friend Steve's family's living room, grinning at phantom nixie numbers[3] dancing before my eyes and gesticulating decisively, like some demented conductor. (Subsequently, I used that knowledge to implement in BASIC an animal guessing game, which I taught to disambiguate some kid named Jeremy from a pig with the question, "is it greasy?", so in some ways I was a normal teenager.)

The streaming reduce is particularly attractive - pearls swept forth in quadrilles and copulae - but I guess the graphic equalizer thingie is an ok approximation. Still, I couldn't even see how to make one of those without some horrible sacrifice, like learning javascript. Someday, I will be able to write only clojure, and some kind soul will translate it into whatever craziness the browser wants.

Someday is today

The combination of clojurescript over javascript and reagent over react allows you to do a tremendous amount with the bare minimum of webbish cant. The basic idea of reagent is that you use a special implementation of atom

   (defonce mystate (r/atom "Yowsa!"))

which can be swap!ed and reset! as usual and, when dereferenced in the middle of HTML (here represented as hiccup)

   [:div "The value of mystate is " @mystate]

just plugs in the value as if it had been typed there, updating it whenever it changes. You can also update attributes, which is particularly interesting in SVG elements:

  [:svg [:rect {:height 10 :width @applause-volume}]]

It's handy to use core.async to glue an otherwise web-agnostic application to depiction in terms of r/atoms, e.g.

   (go-loop []
     (reset! mystate (use-contents-of (<! mychannel)))
     (recur))
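
Putting those pieces together, a self-contained component might look something like this sketch (applause-volume and the "app" div are made up for illustration):

(ns demo.core
  (:require [reagent.core :as r]))

(defonce applause-volume (r/atom 10))

(defn volume-bar []
  [:svg {:width 300 :height 20}
   [:rect {:height 10 :width @applause-volume :fill "green"}]])

;; re-renders automatically whenever applause-volume changes
(r/render [volume-bar] (.getElementById js/document "app"))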

Since assoc-reduce was already keeping track of a state, I just introduced an optional debug parameter - a channel which, if not nil, should receive the state whenever it's updated. To simulate varying rates of input and reduction, we use timeouts, optionally fixed or random:

(defn pluss [t do-rand]
  (fn [a b] (go (<! (timeout (if do-rand (rand-int t) t))) (+ a b))))
(defn delay-spool [as t do-rand]
  (let [c (chan)]
    (go-loop [[a & as] as]
      (if a (do (>! c a)
                (<! (timeout (if do-rand (rand-int t) t)))
                (recur as))
        (async/close! c)))
    c))
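
For what it's worth, here's a usage sketch assuming the two-argument assoc-reduce signature above: stream the numbers 1 through 20 in with random delays, add them with randomly slow reductions, and read the final sum off the result channel.

(go
  (let [c-in (delay-spool (range 1 21) 10 true)      ;; inputs arrive every 0-10ms
        sum  (<! (assoc-reduce (pluss 50 true) c-in))]
    (println "reduced to" sum)))                     ;; => reduced to 210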

There's some uninteresting massaging of the state into queue lengths, and some even less interesting boilerplate to read parameters from constructs originally intended for CGI, but in less than half an hour, the following emerges:

Here it is:

It would be more work to make this efficient and to prevent you from breaking it with silly inputs, but I feel vindicated in waiting for the clojurescript ecosystem to catch up to my laziness.

(Of course what I'm really hoping for is that somebody actually animates the dancing pearls for me. Or at least to tell me how to get a carriage return after the damn thing.)

Go forth and reduce!


  1. Haskellers respond with wounding jeers that we do not understand monoids and semigroups, which we will pretend not to care about but obsess over in private. 

  2. A trademark of Cognitect Industries, all rights reserved. 

  3. Yes, numbers. We didn't have pointers back then, so you made structures with arrays of indices into other arrays. Glory days. 

Permalink

Introducing clojure.spec

I'm happy to introduce today clojure.spec, a new core library and support for data and function specifications in Clojure.

Better communication

Clojure is a dynamic language, and thus far we have relied on documentation or external libraries to explain the use and behavior of functions and libraries. But documentation is difficult to produce, is frequently not maintained, cannot be automatically checked and varies greatly in quality. Specs are expressive and precise. Including spec in Clojure creates a lingua franca with which we can state how our programs work and how to use them.

More leverage and power

A key advantage of specifications over documentation is the leverage they provide. In particular, specs can be utilized by programs in ways that docs cannot. Defining specs takes effort, and spec aims to maximize the return you get from making that effort. spec gives you tools for leveraging specs in documentation, validation, error reporting, destructuring, instrumentation, test-data generation and generative testing.
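
As a rough illustration of the kind of thing spec lets you state (a sketch against the initial alpha API; details may differ):

(require '[clojure.spec :as s])

(s/def ::name string?)
(s/def ::age (s/and integer? pos?))
(s/def ::customer (s/keys :req [::name ::age]))

(s/valid? ::customer {::name "David Smith" ::age 33})   ;; => true
(s/conform ::age 33)                                    ;; => 33
(s/explain ::age -1)                                    ;; prints why -1 fails the spec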

Improved developer experience

Error messages from macros are a perennial challenge for new (and experienced) users of Clojure. Specs can be used to conform data in macros instead of using a custom parser. And Clojure's macro expansion will automatically use specs, when present, to explain errors to users. This should result in a greatly improved experience for users when errors occur.

More robust software

Clojure has always been about simplifying the development of robust software. In all languages, dynamic or not, tests are essential to quality - too many critical properties are not captured by common type systems. spec has been designed from the ground up to directly support generative testing via test.check. When you use spec you get generative tests for free.

Taken together, I think the features of spec demonstrate the ongoing advantages of a powerful dynamic language like Clojure for building robust software - superior expressivity, instrumentation-enhanced REPL-driven development, sophisticated testing and more flexible systems. I encourage you to read the spec rationale and overview. Look for spec's inclusion in the next alpha release of Clojure, within a day or so.

I hope you find spec useful and powerful.

Rich

Permalink

Clojure Gazette 174: Deepening the tree

Clojure Gazette -- Issue 174 - May 22, 2016


Please consider advertising in the Gazette.


Hi Clojurists,

I wrote last week that we need to write more, smaller abstractions. That's the key to reducing the risk of developing the wrong abstraction. However, no matter how great your language, abstractions have a line count overhead. Line count is highly correlated with bug rate and maintenance cost. Should we pay the cost of more abstraction?

Here's a snippet of code from one of my projects:

    (+ x y)))               ;; end of function above it
                            ;; one blank line of overhead for separation
(defn sum [nums]            ;; one line to name the abstraction
  (reduce + 0 nums))

I could choose to write the reduce expression inline, where I need it. But instead, I chose to increase the number of abstractions and name it sum at the top level. That adds two lines of overhead, one line for the body, and I still have to call sum in another line. So that's four lines of code where one line sufficed. 4x! Try it in any language. Abstraction has a line count cost. For a small program like this, there are more lines of overhead than lines of useful code. It seems ridiculous to scale this to programs of hundreds of thousands of lines.

But not so fast! Things don't always scale linearly. Let's look at the call graph of an imaginary program. This program is one -main function and all of the lines call library or core functions.

The size of this code is eight lines plus one line of overhead to name it -main (9 lines). It lies at one extreme of the spectrum: there's no abstraction, just straight up system calls.

At the other extreme, we have an equivalent function with one line. That function calls two functions on one line. Each of those calls two functions on one line, etc.

The leaf nodes are the same as before. But our cost is now huge. There's one line to name the main, the one line in the body of main, and each function takes three lines to define. Total: 20 lines. So far, it's not looking good for abstraction.
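
To make the two extremes concrete, here is a sketch with println standing in for the library/core calls:

;; Flat: one line to name -main plus eight lines of calls -- 9 lines.
(defn -main []
  (println "call 1")
  (println "call 2")
  (println "call 3")
  (println "call 4")
  (println "call 5")
  (println "call 6")
  (println "call 7")
  (println "call 8"))

;; Deep: -main calls a and b, which call c/d and e/f, which make the calls.
;; Two lines for -main plus three lines (counting the blank) per helper -- 20 lines.
(defn c []
  (println "call 1") (println "call 2"))

(defn d []
  (println "call 3") (println "call 4"))

(defn e []
  (println "call 5") (println "call 6"))

(defn f []
  (println "call 7") (println "call 8"))

(defn a []
  (c) (d))

(defn b []
  (e) (f))

(defn -main []
  (a) (b))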

What if you wanted to maximize the effect of adding one more line of code to the program? In the case of the first program, you could add one more system call to the -main function in one line. But in the second program, you could call one of the functions it already calls again. If you call c, d, e, or f, one extra line would result in two system calls. If you call a or b, your new line of code would result in four system calls! So we know at least in principle that we can get big leverage if we reuse stuff near the top of the tree.

To take advantage of this effect in our programs, we need our code to approximate this ideal system. We can do that by making our tree deeper rather than shallower and making more of our functions reusable. If a or b is not reusable, we can't really call it twice like we did in the example.

To make our tree deeper, we need to have lots of nodes that call other nodes. Practically, we need to write functions that call other functions that we've written. We need to build abstractions on top of our abstractions.

Making functions reusable is somewhat of an art. It requires a lot of knowledge about your language's features and your domain. But a great start is the rule "small abstractions are more reusable" if only because small abstractions have less that you don't want to reuse.

Each abstraction adds a fixed line count cost to our code, but they let us do exponentially more with each line. If your code base is significant, the exponential growth of system calls will outweigh the linear growth of overhead, so smaller abstractions can significantly decrease line count.

The keys to maximizing the exponential growth of the potential of one line of code are to build abstractions in terms of other abstractions you write (building a deeper tree) and to make them small (so they're more likely to be reusable). There's a third factor that we didn't explore, which is the branching factor of the tree. In our example, the branching factor was two. What effect does a higher branching factor have on our code size and the riskiness of abstractions? We'll talk about that next time.

Rock on!
Eric Normand <eric@lispcast.com>

PS Want to get this in your email? Subscribe!
PPS Want to advertise to smart and talented Clojure devs?

Permalink

Editor Abstractions

I was recently inspired by a comment from a respected coworker:

“I am just as productive with basic Vim commands as I am with a refactoring suite like ReSharper.”

I have pair-programmed with him for hundreds of hours of C# development. He is equally productive with both, that much I know. On some tasks he is less efficient than with ReSharper, on others he is more. To clarify, we use the superb VsVim inside Visual Studio, so he still relies on the built-in tools for “Auto-complete” and “Go to Definition”.

The greatest benefit comes when we work in JavaScript, Haskell, or Clojure. His productivity doesn’t drop! His Vim and grep skills work just as effectively on any text.

The tools in our editors and IDEs are concrete abstractions. If you have a good set of abstractions, you can use them to solve any problem. A well-designed abstraction composes well, and can be combined with others for new utility.

Consider the sequence abstractions. With only map, filter, and fold, you can transform any sequence of data into another shape. Mastering the three sequence abstractions empowers you to transform any data. The power comes from how easily they can be combined.
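
As a quick illustration of how the three compose (in Clojure, where fold is spelled reduce): keep the even numbers, square them, and sum the squares.

(->> (range 10)
     (filter even?)
     (map #(* % %))
     (reduce + 0))
;; => 120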

Editor abstractions are most powerful when they can be composed. You can replicate most of the functionality of a refactoring suite using basic, composable text-editing commands. Well-designed editor abstractions can be recorded, edited, and replayed to transform text in any way you need. While no replacement for semantic tools like “Language Errors”, “Go to Definition”, and “Auto-complete”, they are an easy replacement for most other refactorings.

If you work in multiple languages, composable text-editing commands are a much better abstraction than those provided by a refactoring suite. Refactoring suites often have dozens of bespoke commands that only work in certain contexts. Even the best of these suites are often constrained to a single language. If you ever work in more than one language, you will get the most value learning to rely on abstractions that are constant across all environments.

I find I get the most value with a Vim plugin inside whatever environment provides the best semantic tools for the language. When building an Android app, I use IDEAVim inside Android Studio. For C#: VsVim inside Visual Studio. For all other languages: Evil mode inside Emacs. Instead of hundreds of specialized commands and contexts, I rely on a few basic abstractions to achieve any text transformation I can imagine.

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.