## PurelyFunctional.tv Newsletter 226: Hacking+Ruby+Clojure

 Issue 226 – May 22, 2017

Hi Clojurers,

Thanks to everyone who responded to my plea last week. I have really great readers with great insights.

I got some optimistic perspectives and some were rather pessimistic. I just wanted to reassure everyone that I do have healthy revenue growth on my videos and I do plan on continuing helping people learn Clojure. If anything, I’ve got a more realistic view of the community and a lot of motivation to improve it. We’re not bad off and there’s plenty of room for growth.

Rock on!
Eric Normand <eric@purelyfunctional.tv>

PS Want to get this in your email? Subscribe!

## To Clojure and back: writing and rewriting in Ruby YouTube

I learned about this conference talk this week. It is excellent. Everything Phill MV talks about makes total sense. The thing that is most painful to me is that the phrase “user hostile” rang true with his Clojure experience. We need to listen to these experiences. They may not be totally true in general, but for every Clojure failure story we hear, there must be a hundred we don’t.

An unfortunate thing happened after the talk was published. Apparently the video drew a lot of abuse. This is not okay. He shared his personal experience and there’s no reason to get defensive about it. Please, please, be more excellent to each other.

## Emacs and Clojure, a Lispy Love Affair YouTube

Arne Brasseur totally shares his screen while rocking Emacs.

## Bounce! Hacking Jazzfest with Social Videos

A couple of weeks ago I participated in a hackathon. I love hackathons if they’re done right. They remind me of the joy of programming and creation.

## Ruby versus the Titans of FP YouTube

Cassandra Cruz loved Functional Programming and tried to make it great in Ruby. Her talk is a great lesson on some of the tougher concepts in FP: higher order functions, function composition, and currying.

## Who’s using Clojure, and to do what?

A recent, partial list of companies using Clojure. I was surprised by the quantity and variety. Does your company use Clojure? Put it on this page. Also, want to get more productive? Email me about training.

## Filling gaps in TensorFlow’s Java api

TensorFlow’s announcement of a Java API is great news for the Clojure community. I wrote a post a couple of weeks ago arguing that TF’s Java API already provides everything we need to do useful things. This is mostly true: by leveraging interop we can easily get tensors flowing, use any of TensorFlow’s many operations, and do useful calculations with them. That said, if your plan is to use TensorFlow for machine learning, and I’m guessing this is most people, you’ll probably regret the absence of the great optimizers that Python’s TensorFlow API provides. Sure, you can build your own backpropagation out of TensorFlow operations, but this becomes very tedious if your network is more than a few layers deep. So this week I had a go at implementing a functional gradient descent optimizer in Clojure/TensorFlow, and I thought I’d share what I’ve learned.

### TL;DR

The result is a function, gradient-descent, which, given a TensorFlow operation representing the error of a network, returns an operation capable of minimizing that error.

```clojure
;; with a network described ...
;; using squared difference of training output
;; and the results of our network.
(def error (tf/pow (tf/sub targets network) (tf/constant 2.)))

(session-run
 [(tf/global-variables-initializer)
  (repeat 1000 (gradient-descent error))])
```


In this post I’m going to discuss how machine learning “learns”, how gradients are computed on computational graphs, and how to implement gradient computation in Clojure.

## How does machine learning “learn”?

The first thing to say about machine learning is that when we say “learn” we really mean “optimize”. Each learning algorithm uses an error function to measure how well or poorly an algorithm has performed. Learning is equivalent to changing internal features of a model, usually weights and/or biases until the error is as small as it can be.

For supervised learning, which makes up a huge number of ML applications, our error function measures how different the network’s outputs are from known training outputs. If the difference is minimal, we say that the network has learned the pattern in the data.

But how do we know what to do to our model in order to reduce the error? A brute force solution would be to try every possible setting and select the one which produces the lowest error. Unfortunately, this method is extremely inefficient and not feasible for any real world application. Instead, we can use calculus to find the direction and rate of change of the error with respect to each weight or bias in the system and move in the direction which minimizes the error.
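To make the idea concrete, here is a tiny, library-free sketch in plain Clojure (the error function, learning rate, and iteration count are all made up for illustration): we repeatedly nudge a single weight against a numerically estimated gradient and watch the error shrink.

```clojure
;; Illustrative only: a one-parameter error function with its minimum at w = 3.
(defn error [w] (Math/pow (- w 3.0) 2))

;; Estimate the rate of change of f at w using a central difference.
(defn gradient-at [f w]
  (let [h 1e-5]
    (/ (- (f (+ w h)) (f (- w h))) (* 2 h))))

;; Move w a small step in the direction that reduces the error.
(defn step [f learning-rate w]
  (- w (* learning-rate (gradient-at f w))))

;; Starting from w = 0, 100 steps land very close to the minimum at 3.
(def trained (nth (iterate (partial step error 0.1) 0.0) 100))
```

Real optimizers compute the gradient symbolically rather than numerically, which is exactly what the rest of this post builds up to.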

## Computing gradients on computational graphs

Consider the following Clojure/TensorFlow code.

```clojure
;; training data
(def input (tf/constant [[1. 0.] [0. 0.] [1. 1.] [0. 1.]]))
(def target (tf/constant [[1.] [0.] [1.] [0.]]))
;; random weights
(def weights (tf/variable
              (repeatedly 2 #(vector (dec (* 2 (rand)))))))
;; model
(def weightedsum (tf/matmul input weights))
(def output (tf/sigmoid weightedsum))
;; error
(def diff (tf/sub target output))
(def error (tf/pow diff (tf/constant 2.)))
```


Under the hood, we’re interoping with TensorFlow via Java to build a graph representation of the computation.

A visualisation of the graph might look something like this.

Each node is a TensorFlow operation and each arrow (or edge) represents the use of a previous operation as an argument.

As above, to train the network we need to find the rate of change (gradient) of the error with respect to the weights.

The process is actually fairly simple in the abstract.

### Step 1: Find the path between the nodes

We need to find the path of TensorFlow operations between error and weights.

In this example, the path of operations is:

tf/pow → tf/sub → tf/sigmoid → tf/matmul → tf/variable

### Step 2: Derive each function and apply the chain rule

Invoking the chain rule, we know that the derivative of error with respect to weights is equal to the derivative of each operation in the path multiplied together.

tf/pow' * tf/sub' * tf/sigmoid' * tf/matmul' * tf/variable'

It’s pretty much as simple as that.
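We can sanity-check this claim numerically with a scalar stand-in for the network (plain Clojure, no TensorFlow; the values of x, target, and w below are made up): the product of the local derivatives along the path should match a numeric derivative of the whole error.

```clojure
(defn sigmoid [z] (/ 1.0 (+ 1.0 (Math/exp (- z)))))

;; Scalar stand-in for the network: error(w) = (target - sigmoid(w * x))^2
(def x 2.0)
(def target 1.0)
(defn error [w] (Math/pow (- target (sigmoid (* w x))) 2))

(def w 0.5)
(def out (sigmoid (* w x)))

;; Local derivatives along the path pow' * sub' * sigmoid' * matmul'
(def chained
  (* (* 2.0 (- target out))  ; pow':     d(diff^2)/d(diff) = 2 * diff
     -1.0                    ; sub':     d(target - out)/d(out)
     (* out (- 1.0 out))     ; sigmoid': out * (1 - out)
     x))                     ; matmul' (scalar case): d(w * x)/dw = x

;; Numeric derivative of the whole error, for comparison.
(defn numeric-grad [f v]
  (let [h 1e-6]
    (/ (- (f (+ v h)) (f (- v h))) (* 2 h))))
```

Evaluating chained and (numeric-grad error w) gives the same value to several decimal places, which is the chain rule doing its job.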

## Implementing gradient computation in Clojure

The algorithm described above gets a little hairier when we come to actually implement it in Clojure.

### Finding the path between nodes

Our first task is to find the paths between two nodes in the graph. To do this, we need a way of knowing whether a node, or any node it takes as input, leads to the node we are looking for. The problem is, there is currently no method for getting the inputs of a TensorFlow operation in the Java API. To fix this, we can add a “shadow graph” to our code. This will be a secondary graph of our TensorFlow operations, each recorded as a Clojure map.

```clojure
(def shadow-graph (atom []))
```


So, every time we add an operation to the TensorFlow graph, we will also conj its profile to the shadow graph.

We won’t ever use the shadow graph to actually run computations, but will simply reference it when we need information about an operation we can’t yet get from TensorFlow itself.

Now, in order to get the inputs of a TensorFlow operation we look it up in the shadow graph and retrieve the :inputs key.

```clojure
(defn get-op-by-name [n]
  (first (filter #(= (:name %) n) @build/shadow-graph)))

(def get-inputs (comp :inputs get-op-by-name #(.name (.op %))))
```
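To see the shape of this idea without TensorFlow, here is a self-contained toy version of the same functions over plain Clojure maps; the op names and :inputs values are made up for illustration, and lookup works on names rather than real TensorFlow operations.

```clojure
(def shadow-graph (atom []))

;; Record each "operation" as a plain map as it is defined.
(doseq [op [{:name "input"   :inputs []}
            {:name "weights" :inputs []}
            {:name "matmul"  :inputs ["input" "weights"]}
            {:name "sigmoid" :inputs ["matmul"]}
            {:name "sub"     :inputs ["target" "sigmoid"]}
            {:name "pow"     :inputs ["sub" "two"]}]]
  (swap! shadow-graph conj op))

(defn get-op-by-name [n]
  (first (filter #(= (:name %) n) @shadow-graph)))

(def get-inputs (comp :inputs get-op-by-name))
```

With this in place, (get-inputs "sub") returns ["target" "sigmoid"], which is all the path-finding machinery needs.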


### Following the path of inputs

We can define dependence recursively like so: one operation depends on another if it is equal to that operation, or if any of its inputs depend on that node.

```clojure
(defn depends-on?
  "Does b depend on a?"
  [a b]
  (or (some (partial depends-on? a) (get-inputs b))
      (= b a)))
```


To get paths from one operation to another, we can use depends-on? to recursively test whether either of its inputs depends on the target operation; if so, we conj it to the path and repeat the process on the dependent inputs.

The paths function returns a list of all possible paths between two operations in a graph.

```clojure
(defn paths
  "Get all paths from one op to another"
  [from to]
  (let [paths (atom [])]
    (collate-paths from to paths [])
    @paths))
```


I ended up using a function-scoped atom to collect all the possible paths because of the structural sharing that can happen when an operation splits off in the middle of the graph. It works, but I’m not totally happy with the statefulness of this solution, so if anyone out there has a better idea, hit me up.
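For what it’s worth, here is one stateless sketch of the same idea over a toy dependency graph of op names (plain Clojure maps, not TensorFlow ops; the graph literal is made up for illustration): each path is built by recursing into the inputs that lead to the target.

```clojure
;; Toy graph: op name -> names of its inputs.
(def graph
  {"pow"     ["sub"]
   "sub"     ["sigmoid"]
   "sigmoid" ["matmul"]
   "matmul"  ["input" "weights"]
   "input"   []
   "weights" []})

(defn all-paths
  "Every path of op names from one op down to another, without an atom."
  [from to]
  (if (= from to)
    [[to]]
    (for [input (graph from)
          path  (all-paths input to)]
      (cons from path))))
```

Here (all-paths "pow" "weights") yields the single path ("pow" "sub" "sigmoid" "matmul" "weights"); a graph where one op feeds several consumers would yield several paths, with the shared prefix appearing in each.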

All of the hard work above is handled by the collate-paths function, which recursively follows each input of the operation that also depends on the to operation.

The other important thing collate-paths does is build a map of the information necessary to differentiate the operation. The :output key stores the node on the path, and the :which key stores which of the inputs depends on the to variable. This is important because the derivative of x^y with respect to x is not the same as its derivative with respect to y. Don’t worry about the :chain-fn key for now; we’ll get to that shortly.

```clojure
(defn collate-paths [from to path-atom path]
  (let [dependents (filter (partial depends-on? to) (get-inputs from))
        which-dependents (map #(.indexOf (get-inputs from) %) dependents)]
    (if (= from to)
      (swap! path-atom conj
             (conj path {:output (ops/constant 1.0)
                         :which first
                         :chain-fn ops/mult}))
      (doall
       (map
        #(collate-paths
          %1 to path-atom
          (conj path
                {:output from
                 :which (fn [x] (nth x %2))
                 :chain-fn
                 (case (.type (.op from))
                   "MatMul" (if (= 0 %2)
                              (comp ops/transpose ops/dot-b)
                              ops/dot-a)
                   ops/mult)}))
        dependents which-dependents)))))
```


There’s a lot going on in this function, so don’t worry if it doesn’t completely make sense. The main idea is that it collates all the information we will need to differentiate the operations.

### Find the derivative of each node in the path

Actually differentiating the operations is fairly simple. First, we have a map, registered-gradients, which contains a derivative function for each input of each operation.

The get-registered-gradient function looks up registered-gradients using the information we collected with collate-paths and returns a TensorFlow operation representing its derivative.

```clojure
(defn get-registered-gradient
  [node]
  (let [{output :output which :which} node]
    (apply (which (get @registered-gradients (.type (.op output))))
           (get-inputs output))))
```
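As a plain-Clojure sketch of what such a registry can look like (scalar functions only, made up for illustration; the real map returns TensorFlow operations), each op type maps to a vector of derivative functions, one per argument position:

```clojure
(def registered-gradients
  {"Pow"     [(fn [x n] (* n (Math/pow x (dec n))))        ; d/dx of x^n
              (fn [x n] (* (Math/log x) (Math/pow x n)))]  ; d/dn of x^n
   "Sigmoid" [(fn [z]
                (let [s (/ 1.0 (+ 1.0 (Math/exp (- z))))]
                  (* s (- 1.0 s))))]})                     ; d/dz of sigmoid(z)

;; The :which key picks the right position, e.g. the derivative of x^2
;; with respect to x, evaluated at x = 3:
(def dpow-dx ((first (get registered-gradients "Pow")) 3.0 2.0))
;; => 6.0
```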


### Applying the chain rule

This is also slightly more difficult in Clojure. Remember I said we’d get back to the :chain-fn key? Well, we need it because we can’t just use the standard tf/mult operation to chain all our derivatives. Mathematically speaking, we are multiplying all the derivatives together, but when it comes to tensors, not all multiplication operations are created equal.

The problematic operation in our example is tf/matmul because it changes the shape of the tensor output and which elements are multiplied with which based on the order of arguments.
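A plain-Clojure illustration of why (a naive nested-vector matmul, made up for this example): swapping the arguments changes both the shape of the result and which elements get multiplied together, whereas elementwise multiplication would not care.

```clojure
(defn matmul
  "Naive matrix multiplication on nested vectors."
  [a b]
  (vec (for [row a]
         (vec (for [col (apply map vector b)]
                (reduce + (map * row col)))))))

(matmul [[1 2]] [[3] [4]])  ; 1x2 times 2x1 => [[11]]
(matmul [[3] [4]] [[1 2]])  ; 2x1 times 1x2 => [[3 6] [4 8]]
```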

For a graph which contains no tf/matmul operations we could get away with chaining our derivatives like so:

```clojure
(reduce tf/mult (paths y x))
```


Which is a shame, because it’s so darn elegant.

Instead, we have the slightly more complicated:

```clojure
(defn gradient [y x]
  ;; chain the derivatives along each path with that node's :chain-fn,
  ;; starting from a constant 1., then sum the contributions of all paths
  (reduce ops/add
          (map (partial reduce
                        (fn [chained node]
                          ((:chain-fn node) chained (get-registered-gradient node)))
                        (ops/constant 1.))
               (paths y x))))
```


But the principle is the same.

As a small extension of the gradient function, gradients maps gradient over a list of weights, biases, etc.

```clojure
(defn gradients
  "The symbolic gradient of y with respect to xs."
  ([y & xs] (map (partial gradient y) xs))
  ([y] (apply (partial gradients y) (relevant-variables y))))
```


The other useful addition of gradients is that it can be applied without any xs. In this case it returns the gradients of any variable nodes which y depends on. This is powerful because we can optimize an error function without specifying all the weights and biases in the system.

The last function we need is apply-gradients, which, as you might’ve guessed, takes a list of variable nodes and a list of gradients and assigns each variable to its new value.

```clojure
(defn apply-gradients
  [variables gradients]
  (map #(ops/assign %1 (ops/sub %1 %2))
       variables gradients))
```


And that’s all we need to compute gradients on the TensorFlow graph in Clojure.

Computed gradients are the basis of all the great optimizers we use in supervised learning, but the simplest of these is gradient descent, which simply subtracts the gradient from the variable.

```clojure
(defn gradient-descent
  "The very simplest optimizer."
  ([cost-fn & weights]
   (apply-gradients weights (apply gradients (cons cost-fn weights))))
  ([cost-fn] (apply (partial gradient-descent cost-fn)
                    (relevant-variables cost-fn))))
```


And as easy as that, we have a generalised optimizer for TensorFlow, which can be plugged into a network of any number of layers.

```clojure
(session-run
 [(tf/global-variables-initializer)
  (repeat 1000 (gradient-descent error))])
```


## Final Thoughts

Optimizers and gradient computation are extremely useful for neural networks because they eliminate most of the difficult math required to get networks learning. In doing so, the programmer’s role is reduced to choosing the layers for the network and feeding it data.

So why should you bother learning this at all? Personally, I believe it’s important to understand how machine learning actually works, especially for programmers. We need to break down some of the techno-mysticism that’s emerging around neural networks. It’s not black magic or godlike hyper-intelligence; it’s calculus.

***

I’ve included the code from this post as part of a library with some helper functions to make interoping with TensorFlow through Java a bit nicer. This project is decidedly not an API, but just as little code as we can get away with to make TensorFlow okay to work with while Java gets sorted.

There are, however, some cool people working on a proper Clojure API for TensorFlow here.

To really get stuck into ML with Clojure you’ve gotta use Cortex. It provides way better optimizers, composable layer functions, and a properly Clojurian API.

## defn Podcast #21 James Reeves

In which we talk to one of the most prolific Clojure OSS library developers, James Reeves, a.k.a. @weavejester.

## CUDA and cuBLAS GPU matrices in Clojure

The new Neanderthal 0.11 comes with a new CUDA engine! The high-performance Clojure matrix library now supports all three major choices that you'd want to crunch those billions of numbers with: CPU, Nvidia GPU with CUDA, and AMD's or Nvidia's GPUs or other accelerators with OpenCL. Let's see why this new stuff is important (and it really is!).

## CUDA/cuBLAS based GPU engine

I prefer free software, but many times I need access to the most powerful stuff. I've added a new engine to Neanderthal that gives us the full speed of Nvidia's CUDA-based cuBLAS library. Since I have already stabilized Neanderthal's architecture by now, this is transparent to the user. If you've ever written code for the native CPU engine or the OpenCL GPU engine, you already know how to use it!

The only thing you need to take care of is basic engine configuration, but even that only if the defaults are not the right thing for you. If you want to use the first available Nvidia GPU in your machine, here's the hello world:

```clojure
(require '[uncomplicate.commons.core :refer [with-release]]
         '[uncomplicate.neanderthal
           [core :refer [asum mm! copy]]
           [cuda :refer [cuv cuge with-default-engine]]])

(with-default-engine
  (with-release [gpu-x (cuv 1 -2 5)]
    (asum gpu-x)))
```


You can study the Hello World example project on GitHub, which shows a basic Leiningen project setup based on Neanderthal, with examples of CPU, OpenCL, and CUDA code.

## Even faster!

Neanderthal was already optimized for top CPU speed, and even more speed on both AMD's and Nvidia's GPU. You could write very concise Clojure code using vector and matrix API, and also easily combine those with customized GPU code in ClojureCL.

One major thing was still left untapped, though: Nvidia's proprietary CUDA-based libraries, which require Nvidia's proprietary, closed-source CUDA technology that is also tied to Nvidia hardware. Bad. Saint IGNUcius does not approve of this sinful technology, and I am ashamed to indulge in this blasphemy. On the other hand, it gives us access to a ridiculously well optimized set of libraries, ranging from linear algebra to deep learning, that Nvidia offers at no charge. How fast is it? Let's see…

In the OpenCL engine tutorial, I did a basic exploration of the capabilities of the OpenCL-based Neanderthal engine, that is based on Cedric Nugteren's excellent open-source library CLBlast. It is amazingly fast. For example, it multiplies three 8192 × 8192 matrices ($$C_{8192\times8192} = \alpha A_{8192\times8192} \cdot B_{8192\times8192}$$) in 220 ms on Nvidia GTX 1080.

Theoretically, matrix multiplication requires $$2\times m \times k \times n$$ floating point operations. (This does not even count memory operations, but that's the problem of the implementer.) $$(2 \times 8192^3) \div 0.220$$ is $$4.99 \times 10^{12}$$ FLOPS, which boils down to 5 TFLOPS out of the 8.228 that the card is theoretically capable of. That's 60% utilization, which is quite impressive for a one-man team working part-time! Now I'm trying the same stuff on the same hardware with the CUDA-based engine:

```clojure
(require '[uncomplicate.clojurecuda.core :refer [synchronize!]])

(with-default-engine
  (let [cnt 8192]
    (with-release [gpu-a (cuge cnt cnt (range (* cnt cnt)))
                   gpu-b (copy gpu-a)
                   gpu-c (copy gpu-a)]
      (time (do (mm! 3 gpu-a gpu-b 2 gpu-c)
                (synchronize!) ;; wait for the asynchronous mm!
                gpu-c)))))
;; => #CUGEMatrix[float, mxn:8192x8192, order:column, offset:0, ld:8192]
```



141 ms! $$(2 \times 8192^3) \div 0.141$$ is 7.798 TFLOPS, almost at the specification maximum of the hardware! Nvidia did a really good job here, and the performance difference is, according to Cedric's experiments, even larger for smaller or non-quadratic matrices. In Clojure land, we now have the choice between a great free OpenCL backend and an impressive proprietary Nvidia backend with unbeatable speed!
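The arithmetic behind both measurements is easy to double-check in the REPL (the timings are the ones reported above):

```clojure
;; GEMM needs 2 * m * k * n floating point operations; for square n x n
;; matrices that's 2 * n^3. Divide by seconds and 10^12 to get TFLOPS.
(defn gemm-tflops [n seconds]
  (/ (* 2.0 n n n) seconds 1e12))

(gemm-tflops 8192 0.220)  ; OpenCL engine => ~5.0 TFLOPS
(gemm-tflops 8192 0.141)  ; CUDA engine   => ~7.8 TFLOPS
```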

## Access to the Nvidia CUDA ecosystem

What's left for you to do now? Write a few lines of high-level Clojure function calls to these GPU operations and easily create a top-notch machine learning cash machine? Well… not so fast. Yes, linear algebra operations can take us a long way, but there is always some custom stuff that needs to be written, sometimes because we cannot express what we need with standard operations, sometimes because we need to optimize for our special use case.

Neanderthal has us covered - we can still use its data structures and automatic transfer methods to manage our data, and combine them with our custom CUDA kernels with the help of ClojureCUDA. That's where the real power is.

And that's not all. Nvidia ships several libraries for domains beyond linear algebra. Neanderthal helps with connecting to those in a similar way it helps with our custom ClojureCUDA-based stuff - by taking the data management task off our shoulders. Expect some of those Nvidia libraries to be integrated into the Clojure ecosystem as additional Neanderthal engines, or as separate Uncomplicate libraries.

I expect to integrate at least cuSPARSE and cuSOLVER into Neanderthal, and I'm eyeing cuDNN as an engine for a possible deep learning library. That shouldn't stop you from creating those before I do!

## Wrap it up

I've just remembered that I had skipped the announcement post for ClojureCUDA last month when it was released. So, check out that library also :) I had in mind a couple of additional things to say, but this post is getting long, so, until the next time, please check out the new Neanderthal 0.11.0, ClojureCUDA 0.2.0, play with the examples, experiment, and write about what you discovered!

## Automating resilience testing with Docker and Property Based Testing - Devoxx UK 2017

Video from my presentation on Devoxx UK 2017 on Docker, resilience and property based testing

## Immutant 2.1.7 Release

We just released Immutant 2.1.7. This release includes the following changes:

• We've upgraded our Undertow dependency to the latest stable version (1.4.14.Final) to resolve an issue with socket leaks when using async http responses/SSE.
• We've also upgraded our Ring dependency to 1.6.0.
• For HEAD requests, async channels now automatically close after the first send! call.

## What is Immutant?

Immutant is an integrated suite of Clojure libraries backed by Undertow for web, HornetQ for messaging, Infinispan for caching, Quartz for scheduling, and Narayana for transactions. Applications built with Immutant can optionally be deployed to a WildFly or JBoss EAP cluster for enhanced features. Its fundamental goal is to reduce the inherent incidental complexity in real world applications.

## Get In Touch

As always, if you have any questions, issues, or other feedback about Immutant, you can always find us on #immutant on freenode or our mailing lists.

## Issues resolved in 2.1.7

• [IMMUTANT-627] - Upgrade to Undertow >= 1.3.27.Final
• [IMMUTANT-629] - Update to Ring 1.6
• [IMMUTANT-630] - Web applications can't adequately respond to HEAD requests in certain cases

## Immutant 2.1.6 Release

We just released Immutant 2.1.6. This release includes the following changes:

• Update to Ring 1.5.1 to address a security vulnerability. This vulnerability only affects applications that are running from the filesystem, not from an uberjar or war, so most users aren't affected.
• Remove our dependency on Potemkin. This was a common source of collision with other application dependencies, so we now use an internal copy of Potemkin under different namespaces so it doesn't conflict.
• A minor update of the version of tools.nrepl on which we depend (0.2.11 -> 0.2.12)

## What is Immutant?

Immutant is an integrated suite of Clojure libraries backed by Undertow for web, HornetQ for messaging, Infinispan for caching, Quartz for scheduling, and Narayana for transactions. Applications built with Immutant can optionally be deployed to a WildFly or JBoss EAP cluster for enhanced features. Its fundamental goal is to reduce the inherent incidental complexity in real world applications.

## Get In Touch

As always, if you have any questions, issues, or other feedback about Immutant, you can always find us on #immutant on freenode or our mailing lists.

## Loading Clojure Libraries Directly From Github

Did you ever fix a bug in an open source library, and then had to wait until the maintainer released an updated version?

It’s happened to me many times, the latest one being Toucan. I had run into a limitation, and found out that there was already an open ticket. It wasn’t a big change so I decided to dive in and address it. Just a little yak shave so I could get on with my life.

Now this pull request needs to be reviewed, and merged, and eventually be released to Clojars, but ain’t nobody got time for that stuff. No sir-ee.

So what do I do? Do I push my own version to Clojars? Call it lambdaisland/toucan? If the project seems abandoned, and it could help others as well, then maybe I would, the way I did with ring.middleware.logger, but Toucan is actively maintained, so releasing a fork might cause confusion (and maybe even be taken the wrong way).

For cases like these it’s good to know that you can use Github as a Maven repository. That’s right, you don’t need to deploy or build any jars. Any Clojure library with a proper project.clj will do, thanks to the power of Jitpack.

All you need to do is create a git tag and push it to GitHub.

```shell
$ git tag v1.0.2-hydrate-fix
$ git push plexus --tags
```


Now add the Jitpack Maven repository, and you can load your library straight away.

```clojure
(defproject lambdaisland "0.160.0"
  :dependencies [,,,
                 [com.github.plexus/toucan "v1.0.2-hydrate-fix"]]
  :repositories [["jitpack" "https://jitpack.io"]])
```


Notice the format of the dependency vector: [com.github.your-github-name/your-project-name "git-tag"].

Let’s try that out:

```shell
$ lein deps
Could not transfer artifact com.github.plexus:toucan:pom:v1.0.2-hydrate-fix from/to jitpack (https://jitpack.io): Read timed out
This could be due to a typo in :dependencies or network issues.
If you are behind a proxy, try setting the 'http_proxy' environment variable.
```

This is usually the case: the first time you try to download the jar, Jitpack still needs to fetch the code and build it, so this tends to time out. But try again a minute later and, tada \o/

```shell
$ lein deps
Retrieving com/github/plexus/toucan/v1.0.2-hydrate-fix/toucan-v1.0.2-hydrate-fix.pom from jitpack
Retrieving com/github/plexus/toucan/v1.0.2-hydrate-fix/toucan-v1.0.2-hydrate-fix.jar from jitpack
```


Jitpack also has paid plans, for when you want to deploy stuff straight from private repositories, but they’re free for open source, so kudos to them!

## Interesting Talk: "Idée Fixe"

I've just watched this wonderful talk by David Nolen

## -main

Arne Brasseur on Lambda Island turning one, and the rocky journey so far. "I could see the eduction, presumably still wrapping the reified IReduceInit, perched atop a ziggurat of lesser transducers. I looked, and at the very bottom lay a blasphemous side-effectful invocation of map. I did not belong in this place."

## Onyx now has a Java API

At work we were building a new system and looked very seriously at Onyx. We talked with Michael Drogalis and Lucas Bradstreet and found them to be very knowledgeable and professional to work with. I'd highly recommend putting Onyx on your shortlist if you need to do analytics, CQRS, or anything that works with streaming data.

## Libraries & Books

• Ataraxy is an experimental data-driven Ring routing library from James Reeves. Think Compojure, but with data instead of macros.
• test2junit 1.3.0 is out
• huffman-keybindings is a neat way to generate short keybindings to commonly used UI components. Note that it's licensed under AGPL-3.0, which makes it difficult to use with Clojure.
• Luke Vanderhart has an update on Arachne in a Reddit comment
• Process jobs with Redis and Farmhand
• People are worried about Types.

## Tools

• Cursive 1.5.0 is out with a bunch of improvements. I wrote a post about the improvements relating to dependencies.
• The last post had a bullet On Lumo's Growth and Sustainability but was missing the hyperlink.
• Prepack looks like it could really improve ClojureScript startup time. Combining Prepack and the Closure compiler seems fruitful too.
• ClojureScript REPLs now support evaluating multiple forms on a single line.
• Lambda Island again, this time on writing Node.js scripts with ClojureScript
• Using lein to produce multiple artifacts from a Clojure monorepo. There's been a slow trend of companies moving towards monorepos in the last few years. We've looked at it for work too, but decided against it for now.
• Java Stream Debugger for IntelliJ looks very very cool

## Recent Developments

• CLJS-2020: defmulti "miss" performance is poor

## Learning

• The status antipattern describes an alternative to having a single "status" field to describe an entity.
• Data wrangling with transducers for a machine learning problem
• Dealing with lazy data in a with-open

## Misc

• I watched this video about Allen Hemberger making every recipe in the Alinea cookbook and documenting it in their own beautiful book. I'm still thinking about it a few days later. The teamwork that Allen and his wife Sarah put into the project was inspiring to watch. They're now working with the Alinea Group on a Kickstarter to produce a molecular gastronomy cocktail book.