Learning Ring Next Steps

Previously, I wrote about Clojure and Ring in an introductory post where a small debugging application called Echo was developed. If you haven't read that post and are looking for an introduction to Ring, I suggest you start there and then come back here.

For this post I've created another sample application called ring-next which contains routines to demonstrate a number of concepts in Ring. These are:

  • Middleware
  • Responses
  • Parameters
  • Cookies
  • File Uploads
  • Routes

For each of these I'll show a few code snippets and then explain what is going on. The repo for this project is linked at the end of this post.

Middleware

Handlers are the functions that make up your application. They accept requests and return responses. Middleware are functions that add functionality to handlers. They are higher-order functions in that they accept a handler as a parameter and return a new handler function that calls the original handler. This pattern allows you to wrap your handler in a number of middleware functions to build up layers of functionality.

To give you an idea of middleware functions, here are three from ring-next. They are meant to illustrate the idea.

The first simply adds a new field with a value to all requests. The second prints each request to STDOUT; when you lein run the application from a terminal window you'll see requests printed to the output. The last was built during the development of ring-next because I found that Chrome was requesting the favicon.ico file along with some requests, which was cluttering up my debugging output with extra printed requests. The wrap-ignore-favicon-request function solved the problem by effectively neutering the call. Wrapping this function before all the others ensured a 404 was returned before any other middleware and handlers were called.

;; You can add items to the request ahead of your main handler
;; Here we are adding a new field :message with a string which has been passed in as a parameter
;;
;; (wrap-request-method-message handler "This is a test message!!")
;;
(defn wrap-request-method-message [handler s]
  (fn [request]
    (handler (assoc request :message s))))

;; Log/print some debug messages during a handler's execution
;;
;; (wrap-debug-print-request handler)
;;
(defn wrap-debug-print-request [handler]
  (fn [request]
    ;; Note: slurping :body consumes the request's input stream,
    ;; so handlers downstream cannot read the body again
    (pprint/pprint (conj request {:body (slurp (:body request))}))
    (handler request)))

;; Ignore favicon.ico requests
;;
;; (wrap-ignore-favicon-request handler)
;;
(defn wrap-ignore-favicon-request [handler]
  (fn [request]
    (if (= (:uri request) "/favicon.ico")
      {:status 404}
      (handler request))))

Notice how each function takes a handler as a parameter and returns a function that accepts a request and closes over the handler. In other words, the returned function is a handler which calls the handler passed to it as a parameter.

With this in mind, assuming you have a main handler called myhandler, then you'd set up your system as follows.

(def app (wrap-ignore-favicon-request (wrap-debug-print-request (wrap-request-method-message myhandler "TESTING"))))

(run-jetty app {:port 3000})

Responses

Your handlers ultimately return responses. The most basic response is a map of three fields.

{:status 200
 :headers {}
 :body "Hello"}

You can build your own response maps, but that quickly becomes tedious, so there are a number of helper functions included in the ring.util.response namespace. For example, to replace our simple map we can use the response function.

(ring.util.response/response "Hello")

Another common type of response serves a static file. You can return a file from the file system or from within the resources directory. Later we'll need to return some HTML files, so I'll use the function which returns resources, resource-response. The resource-response function is in the ring.util.response namespace. It returns files from the /resources directory of the project, in this case from the public sub-directory.

For example:

(defn static-file
  "Return a static file"
  [filename]
  (response/resource-response filename {:root "public"}))

There are other useful functions in the ring.util.response namespace so it is worth taking a look.
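For instance, here is a quick sketch combining a few of those helpers (status, content-type, and redirect all live in ring.util.response):

(-> (response/response "Hello")
    (response/content-type "text/plain")  ;; set the Content-Type header
    (response/status 202))                ;; override the default 200 status

(response/redirect "/index.html")         ;; build a 302 redirect response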

Parameters

Parameters are key value pairs passed to web applications through the query string or in the body of a request. With Ring, there are two middleware libraries to help manage these values. If you've used Echo to submit or post values and have observed the request maps you'll see that without a library you'd need to write functions to parse the key value pairs out of the submitted query or body. The [ring.middleware.params](http://ring-clojure.github.io/ring/ring.middleware.params.html) library does this for you.

The way it works is that you include the library and then wrap your handler with the wrap-params function. This function pre-processes your requests and adds a :params map to your request with all of the key value pairs inside. Nicely, it builds the map with both your query string values and your body values in cases where both are present.

A second function, wrap-keyword-params, is useful to use with wrap-params. It turns your keys into keywords, making your :params map more useful. Make sure to wrap the two in the right order, because the keyword function will be looking for the existence of the :params map.

(wrap-params (wrap-keyword-params handler))
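To illustrate, a request like GET /search?q=ring&limit=10 passed through both wrappers would reach your handler with a :params map along these lines (a sketch; note the values stay strings):

;; {:uri "/search"
;;  :query-string "q=ring&limit=10"
;;  :params {:q "ring", :limit "10"}
;;  ...}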

Cookies

Support for cookies comes from a few libraries. First you need to include the wrap-cookies middleware. This function is in the ring.middleware.cookies namespace.

You will also use the set-cookie function from the [ring.util.response](http://ring-clojure.github.io/ring/ring.util.response.html) namespace to create your cookie.

The following functions from the example code show support for setting, reading and clearing a cookie. The cookie handler is called from three routes:

  • /cookie/get
  • /cookie/set
  • /cookie/clear

These are set up later in the Routes section. The points of interest here are:

The keyword params are pulled out of the request during the set operation. The parameters are generated by a form, cookie.html, which you can find in the /resources/public directory. With the value for the cookie in hand, the set-cookie function uses ring.util.response/set-cookie to build a response with the cookie set. Reading the cookie is possible thanks to the wrap-cookies middleware function, which turns the cookie value in a request into a :cookies field. Lastly, clearing a cookie is done by setting the cookie's value to an empty string with a :max-age of 1.
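For reference, with wrap-cookies in place an incoming Cookie header shows up in the request roughly like this:

;; {:cookies {"ring-next-cookie" {:value "some-value"}}}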

(defn get-cookie [request cookie-info]
  (:value (get (:cookies request) (:name cookie-info))))

(defn set-cookie [response request cookie-info val]
  (response/set-cookie response (:name cookie-info) val {:path (:path cookie-info)}))
   
(defn clear-cookie [response request cookie-info]
  (response/set-cookie response (:name cookie-info) "" {:max-age 1 :path (:path cookie-info)}))

(defn cookie
  "Handle cookie request. If GET then read and return the cookie value
else if POST then accept posted value and set the cookie."
  [request]
  (let [cookie-command (fn [s] (keyword (get (clojure.string/split s #"/") 2)))
        cmd (cookie-command (:uri request))          ;; :get, :set, :clear
        cookie-info {:name "ring-next-cookie" :path "/cookie"}]
    (if (and (= cmd :set) (= :post (:request-method request)))    ;; only allow set if posted
      (let [val (:value (:params request))]                       ;; (wrap-params (wrap-keyword-params handler))
        (set-cookie (response/response "Set cookie") request cookie-info val))
      (if (= cmd :clear)
        (clear-cookie (response/response "Clear cookie") request cookie-info)
        (response/response (str "Cookie value is '" (get-cookie request cookie-info) "'"))))))

File Uploads

Uploading files is supported with the wrap-multipart-params middleware function. This function is in the ring.middleware.multipart-params namespace.

The following example code shows how to use it. To start, look at the file file.html in the resources/public directory. This is a basic file upload form. It lets you submit a file to the application. There are two routes supported.

  • /file/upload
  • /file/download

The upload route accepts the posted file and saves it using the information organized in the request by wrap-multipart-params. Here we get the original filename and the path to the temporary file in which the uploaded file has been saved. Then we save the temporary file's path in a cookie.

When the download route is exercised, the cookie is read, which lets the download-file function read and return the file. Notice how the file-response function is used to return the file.

;; Middleware: wrap-multipart-params
;;
;; :params
;;  {"file" {:filename     "words.txt"
;;           :content-type "text/plain"
;;           :tempfile     #object[java.io.File ...]
;;           :size         51}}

(defn upload-file [request cookie-info]
  (let [original-filename (:filename (:file (:params request)))
        tempfile (:tempfile (:file (:params request)))]
    ;; save tempfile location in cookie
    (set-cookie (response/response "File uploaded") request cookie-info (.getPath tempfile))))

(defn download-file [request cookie-info]
  ;; read file from tempfile location stored in cookie
  (let [filepath (get-cookie request cookie-info)]
    (response/file-response filepath)))

(defn file
  "Handle file request. If GET then read the file and return its contents
else if POST then accept the posted file and save it."
  [request]
  (let [file-command (fn [s] (keyword (get (clojure.string/split s #"/") 2)))
        cmd (file-command (:uri request))
        cookie-info {:name "ring-next-file" :path "/file"}]

    (if (and (= cmd :upload) (= :post (:request-method request)))   ;; only allow upload if posted
      (upload-file request cookie-info)
      (download-file request cookie-info))))

Routes

Routing describes the mapping of URLs to specific functions. Here with Ring we'll need to do our own routing, and as an example I've shown the following routes function. The idea with this function is to look at the :uri field in the request and decide how to handle the request. If you look you'll see a few static file requests which are served by static-file. Other sections are supported by the cookie and file functions to demonstrate those examples.

Another thing to mention is that in this example routes is what I'd call my main handler. It needs to be the handler that is wrapped up by all of the middleware. The middleware will pre-process the request and routes here will dispatch to the specific support functions to build the appropriate responses.

Lastly, this function is meant to show the idea of dispatching to support functions. There are other, more clever ways to do this, and at some point you'll want to look to a library to make this look a lot better. A common library for this is Compojure.

(defn routes [request]
  (let [uri (:uri request)]
    (case uri
      ;; static file
      "/" (static-file "index.html")
      "/index.html" (static-file "index.html")

      ;; cookie
      "/cookie" (static-file "cookie.html")
      "/cookie/get" (cookie request)
      "/cookie/set" (cookie request)
      "/cookie/clear" (cookie request)

      ;; file
      "/file" (static-file "file.html")
      "/file/upload" (file request)
      "/file/download" (file request)

      ;; default to our main 'echo' handler
      (debug-return-request request))))
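To give a flavor of that, here is a rough, untested sketch of equivalent routes in Compojure (reusing the helper functions defined above):

(require '[compojure.core :refer [defroutes GET ANY]])

(defroutes routes
  (GET "/" [] (static-file "index.html"))
  (GET "/index.html" [] (static-file "index.html"))
  (GET "/cookie" [] (static-file "cookie.html"))
  (ANY "/cookie/:cmd" request (cookie request))
  (GET "/file" [] (static-file "file.html"))
  (ANY "/file/:cmd" request (file request))
  (ANY "*" request (debug-return-request request)))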

As a last step, here is my app var which contains the setup of wrapping middleware around my routes function. Notice the commented-out first version so you can see how this sort of thing is built, and then the cleaner version using the -> threading macro. Both versions work the same.

(def app
  ;; Initial call chain
  ;; (wrap-ignore-favicon-request (wrap-multipart-params (wrap-cookies (wrap-params (wrap-keyword-params (wrap-request-method-message (wrap-debug-print-request routes) "This is a test message!!"))))))

  ;; Using the threading macro
  (-> routes
      wrap-debug-print-request
      (wrap-request-method-message "This is a test message!!")
      wrap-keyword-params    ;; runs 'after' wrap-params at request time, so there is a :params field for it to work on
      wrap-params
      wrap-cookies
      wrap-multipart-params
      wrap-ignore-favicon-request))


(defn -main []
  (jetty/run-jetty app {:port 3000}))

Source Code

The repo for ring-next is available at https://github.com/bradlucas/ring-next.

Here are a few tips for working with the sample code. First, open two terminal windows. In the first, start the application with lein run. This window will show debugging output from the wrap-debug-print-request middleware. In the second terminal, you can test with curl.

For a first step try:

$ curl "http://localhost:3000"

You'll get the index.html page. Try the same url in a browser to see what it looks like.

Next, try:

$ curl "http://localhost:3000/testing/this/app?name=foo&city=anywhere"

This url doesn't have a specific handler in routes, so we have the request echoed back to us. Things to look for are:

The :message field was set by wrap-request-method-message

:message "This is a test message!!"

The :uri field has our path of /testing/this/app. And lastly, our parameters have been added to the :params field.

:params {:name "foo", :city "anywhere"}

Notice that our keys are now keywords thanks to wrap-keyword-params.

To try the cookie and file features, start from a browser with the following URLs: http://localhost:3000/cookie and http://localhost:3000/file.

After each operation use the back button to return and try the next.

For example, in the file form, upload a file. You'll get a message telling you it was uploaded. Then use the back button to return to the form. Then click Download File to view the file.

The cookie form works the same: set the cookie, view the message, go back, and view the cookie. Same for clearing the cookie. You can verify the cookie using your browser's developer tools as well.

Summary

One last thought. The above examples are meant to show an exploratory way of learning about Ring. There are certainly better and more clever ways of doing things but here I was aiming for simple and straightforward. Suggestions are of course welcome and please feel free to leave comments with alternatives to the above functions.

Permalink

A verbose explanation of compact code

Cover image found on imgur.com

My wife is a genius. She's straight up brilliant. She's an astrophysicist with a passion for process and product improvement. However, for the life of me, I have not found an effective way to talk to her about details of code that I've written. Granted, most of the time we have to discuss work happenings is at the dinner table with 2 or 3 kids (depending on extracurriculars) being loud, so I have to be succinct. I can answer "how many SLOC" and "how can you reuse this elsewhere" and "what does it do" and "why did you pick that language" fairly easily. But when she wants more technical detail... well, I haven't found a way to be succinct and still convey understanding. So this post is dedicated to the conversation I was trying to have with my wife a few weeks ago about a task I finished at work that day.

Scope of the work

I have a customer that absolutely loves to automate as much as possible so that his people (re: me and the other folks in the lab) can maximize our time spent in code. He's been a really great customer so far, and I volunteered to write something: an automated task to pull together status on all the findings that we found and/or worked on this week, generate a PDF, and ship it off to his boss. I mean, how hard could it be?

Tech Choices

My customer has a love of keeping things as simple as possible, and wanted to build our reports in Markdown, with our findings reported each week in JIRA placed in a table. This was going to require flattening out a response from the JIRA REST API and printing out the applicable data, with a "TBD" where there were nulls. Oh, and he wanted to be able to reuse this for multiple reports, where the different table columns could change.

At this point, I'd had a few things chosen for me:

  • JIRA input
  • Markdown output

Now, all I had to do was choose a language. I finally saw a place to use my pet project language (Clojure) in production! It wasn't just a whimsical choice; I evaluated a few different languages before finally settling in on Clojure.

Breaking apart the JIRA response

This project didn't really take very long; most of my time was spent in Chrome Dev Tools figuring out the IDs for the custom fields so that I could flatten the data structure into a simple HashMap. Since all I had to deal with was the issues portion of the JIRA response, I chose to treat each issue as its own "object". This provided a good mapping from response to table because each JIRA issue was going to be its own row in the table. I created a function similar to the following:

(defn issues->useable
  [issue]
  { :finding-id (str "[" (:key issue) "](" (:self issue) ")")
    :title (get-in issue [:fields :summary])
    :status (get-in issue [:fields :status :name])
    :priority (get-in issue [:fields :priority :name])
    :date-created (get-in issue [:fields :created])
    :components (->> issue :fields :components (map :name) (str/join ","))
    :due-date (get-in issue [:fields :duedate])
    :labels (->> issue :fields :labels (str/join ","))
    :risk-consequence (get-in issue [:fields :customfield_22007 :value])
    :risk-probability (get-in issue [:fields :customfield_22006 :value])
  })

Yes, I was a little cheeky in my function naming. And yes, I know there's a lot of code to digest here. I'll step through it.

Output

The output of this function isn't readily apparent to those who haven't looked at Clojure before. Clojure typically represents data structures as HashMaps. The tokens with a leading ":" are keywords, mostly like an atom in Elixir or a symbol in Ruby. I'll elaborate on why I say "mostly" later, as it was critical to choosing Clojure. In Clojure, a map is surrounded in curly braces with keys and values grouped together. For example,

{:a 1, "foo" "bar"}

is a map with the key/value pairs :a => 1 and "foo" => "bar". Yes, Clojure can have non-keywords as keys, but it's not something that is commonly done in practice. The comma in between the pairs is actually optional; Clojure treats all commas as whitespace. So the output of my issues->useable function is a map with keyword/string pairs.

The get-in function is also rather nice. Given a nested hashmap and a sequence of keys, it will parse its way down through the nested hashmap and return the value stored there. If there is no value associated with that ending key, nil is returned.

A few other Clojure concepts come to life in this section of code as well. If we look at how the components field is generated in my output map, you'll see the ->> macro. ->> is an end-threading macro that applies the output of the previous function evaluation as the last argument of the next function evaluation. So for

    :components (->> issue :fields :components (map :name) (str/join ","))

First, the issue is passed to the function :fields.

I thought :fields was a keyword!

Here is the big reason I chose Clojure. Keywords in Clojure also implement the IFn interface, meaning that they are considered to be functions in Clojure. So the expression (:a {:a 5 :b 6}) will evaluate to 5. It is returning the value associated with the :a keyword.

Sure, I could've written that part of the function as (:fields issue), but then when I wanted to get further in depth, I'd have to write the whole function as

(str/join "," (map :name (:components (:fields issue))))

And that's nowhere near as clean as with using the ->> macro. By using the front-threading and rear-threading macros in Clojure, you are able to see the data pipeline much more cleanly, similar to how you would use the |> operator (and its variants) in OCaml, F#, or Elixir.

Creating table rows

So now I could transform an issue into a flattened map, but I need to have it as a markdown row. Markdown separates its table columns with pipes (|), with pipes on the outsides to box in the whole row. I wrote the following function to do this:

(defn create-row
  [values]
  (->> values
       (map (fnil name "TBD"))
       (into '(nil))
       (cons nil)
       reverse
       (interpose "|")
       (apply str)))

So again, I start with the rear-threading macro and just the values from my key/value pairs. Then I map (fnil name "TBD") over each of the values. fnil returns a higher-order function that takes nil parameters passed to it, replaces them with a different specified value (in our case "TBD"), and calls the function listed as the first argument. I used name to return a string-ified version of the value in the map (or "TBD" in the case of a nil). Sure, I could've used the identity function instead of name, but that's 4 whole characters more ;-) I love fnil solely because it reduces the footprint of NULL/nil checking logic.
The next line may look a little bizarre to you then, based on my previous line: (into '(nil)) takes the values that have just been string-/"TBD"-ified and puts them into a list that contains only the value nil. I'll explain that in a sec, along with the (cons nil) line. The call to reverse ought to be fairly clear: it reverses the order of items in a sequence.

The reason for using nils above was for use in the interpose function. interpose takes a sequence of items and inserts the specified value in between them. For instance, (interpose 1 [3 4 5]) will yield the sequence (3 1 4 1 5). However, I need pipes at the beginning and ending of my sequence as well, for the "outer walls" of the table. So I use (into '(nil)) to pass the values from my sequence into a list that has only a nil in it. Since into performs a conj operation on each element, the elements in the sequence that we've just created will be added in reverse order. For example (into '(nil) [1 5]) will yield (5 1 nil). If I cons a nil onto the front of that collection, then I'll have a sequence of (nil 5 1 nil), and reversing it gives (nil 1 5 nil). Interposing pipes into that will then give me (nil "|" 1 "|" 5 "|" nil).

I then apply the str function to the collection that we have just created. The str function creates a string by concatenating the args passed to it together, with nil yielding an empty string. Applying str to (nil "|" 1 "|" 5 "|" nil) will yield the string "|1|5|". Perfect! This is exactly what we need to create a table row in Markdown.
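To see it all in action, here's a hypothetical call with a mix of a string, a nil, and a keyword:

(create-row ["a" nil :b])
;; => "|a|TBD|b|"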

Creating the whole table

I now have a function to create my rows, but rows aren't terribly useful in a table that has no labels. So we need to have a function that creates the whole table, complete with which "column" of data is which. I was able to generate that in 6 very dense lines of Clojure:

(defn create-table
  [ordered-keys issues]
  (let [header        (create-row (map ->PascalCase ordered-keys))
        separator-row (create-row (repeat (count ordered-keys) "---"))
        rows (map (comp create-row (apply juxt ordered-keys)) issues)]
    (apply vector header separator-row rows)))

Let's walk through this. The function create-table takes a list of ordered keys, as well as the flattened issues (or, basically, any flat hashmap) and generates a Markdown table. I'm also assigning values to three local variables: header, separator-row, and rows. Let's look at the expression where we use those variables: (apply vector header separator-row rows). The vector function takes a list of arguments and creates a vector (i.e., [1 2 3] and not a list like (1 2 3)) from those arguments. In this case, I'm creating a vector of table rows, starting with the header, and a Markdown-required separator row between the headings and values. The header is merely a row created from converting the ordered-keys from keywords to PascalCase using the indispensable camel-snake-kebab library. This way, :finding-id will be printed as the more management-palatable FindingId. Secondly, the separator-row is just three dashes repeated as many times as we have columns. Now, onto my favorite line of Clojure in this project, just due to the dense power of it:

rows (map (comp create-row (apply juxt ordered-keys)) issues)

Clearly, I'm mapping a function across all the issues passed in. That function is fairly complex. juxt is one of my absolute favorite functions in all of Clojure. It takes a sequence of functions and returns a function that passes the same argument to all functions in that sequence and returns a vector of their results. For instance, ((juxt + *) 3 4) yields [7 12]. When I was doing primarily .NET development, I ported this function and multiple arities of it so that I could calculate multiple, independent metrics on terabytes of data in one pass. Applying juxt to our list of ordered keys (which are all keywords), I just created a function that, when evaluating our data, will generate a vector of the values associated with those keywords in my data, in the order specified. For instance, ((juxt :a :id) {:a 5 :b 60 :id "blue"}) will yield [5 "blue"]. I then compose that (the comp function) with the create-row function, because

(map create-row
     (map (apply juxt ordered-keys) issues))

just didn't look as clean.
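Putting it together, a hypothetical call with made-up issue data looks like this:

(create-table [:title :status]
              [{:title "Fix login" :status "Open"}
               {:title "Patch server" :status nil}])
;; => ["|Title|Status|" "|---|---|" "|Fix login|Open|" "|Patch server|TBD|"]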

Conclusion

So that's basically it! It's a lot of dense code that is hard to understand at one pass, but is extremely flexible once you understand its use. I can now create multiple tables from any set of data, not tied to any particular type of data. I can pass my JIRA issues (in this case) into my create-table function with different sets of ordered keys to create different reports - for example, one table useful for middle management to report to upper management, and one for developers to see the visibility of the issues they are working on. I'm also not tied to JSON or any other format of data. So long as I can get the data into a map, I can reuse this to generate Markdown tables.

I know this was a rather long read, so congrats if you made it through! If you've got some particularly tricky yet useful code, I'd love to see you write about it as well. And if you have yet to explore Clojure, I welcome you to give it a try and see these and some other exciting language features :-)

Permalink

Build Your Own Transducer and Impress Your Cat - Part 5

This post is a part of a series:

  1. Introduction to transducers
  2. Anatomy of a transducer
  3. Stateful transducers
  4. Early termination in transducers
  5. Functions which are using transducers (this post)

This article describes some functions which use transducers, and how to write your own.

Transducers, getting into using them

If you followed the previous 4 parts of this blog series, you may have noticed that I only mentioned one function which uses transducers so far: the into function.

(into [] (map inc) (list 3 4 5))
; => [4 5 6]

There are more of that kind. Their role is to provide the context of a stream transformation. This includes providing a data stream to the transducer, and dealing with the transformed stream that they get back from the transducer.

They are:

  • the from where and the to where,
  • the how do I take that input and the what should I do with that output.

And since they are the ones calling the transducer, they are also in charge of deciding the when do I do this to that.

Transducer users from the standard library

Note: In the following paragraph, the parameter xform refers to a transducer.

  • (into to xform from): When you want the output stream's elements to be all stored in a collection, which can be a vector, a list, a map, a sorted map, or any other type of collection. I love that function because it appends to the collection in a very efficient way.

  • (sequence xform coll): When you want the stream to be processed "only when needed". Useful if you think that the sequence consumer may not need all the output elements, or if you don't want all of it to be processed immediately. That's useful for throttling the transformation w.r.t. the CPU workload, or if you need to send the output stream over a slower channel like the network and you don't want to buffer the whole output stream in advance.

  • (eduction xform* coll): ... it's complicated, see the docs for a precise explanation. Rarely needed by the average programmer. It returns a non-lazy sequence which is evaluated using the transducers each time it is being used by a reduce function.

  • (transduce xform f init coll): When you want to use the reduce operation but you also want to perform a stream transformation on the data to be reduced.
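For example, transduce combines the (map inc) transformation with a + reduction in one pass:

(transduce (map inc) + 0 (list 3 4 5))
; => 15, i.e. (+ 0 4 5 6)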

Transducer users from the clojure.core.async library
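The clojure.core.async library also accepts transducers in a few places. A channel can be created with a transducer that transforms every value passing through it, and pipeline runs a transducer over values moving between two channels. A minimal sketch:

(require '[clojure.core.async :as async])

; A channel whose values are transformed by (map inc) as they pass through
(def c (async/chan 10 (map inc)))

; A pipeline running (filter even?) over values from one channel to another:
; (async/pipeline 4 to-chan (filter even?) from-chan)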

Other uses of transducers

There may be other functions that use transducers, adapted to different contexts. If you know some, please let me know by leaving a comment and I will add them if they are relevant to the audience.

Let's implement some!

Good news: transducers are now your friends, and you can call them whenever you want. But if you want to be a good friend, there are some rules to follow.

The output function

Transducers are technically functions which transform a reducing function into another reducing function.

If you don't know what that means, you can also see them as functions that transform a kind of output function into another one, except that this output function takes a kind of accumulator value as its first parameter.

Let's play with this idea a bit and implement a function that provides a stream of data from a collection to a transducer and outputs the transformed stream to the standard output, element by element.

; Notes:
; - Don't do this at home, this is still a toy function.
; - The transducer is often called 'xf' and the reducer function 'rf'.
(defn print-duce [transducer coll]
  (let [reduction-fn #(print (str %2 \space))
        process (transducer reduction-fn)]
    (loop [c coll]
      (when (seq c)
        (process nil (first c))
        (recur (rest c))))))

(print-duce (map inc) (list 3 4 5))
; Output:
; 4 5 6

; But this does not work:
(print-duce (partition-all 2) (list 3 4 5))
; Output:
; [3 4]
;
; the [5] is missing!

Arity-{0 1 2}

You said Arrietty?

If you read the parts 1 to 4 of this blog series, you will know that transducers are expected to be called with different arities.

Arity-0: Just ignore it, don't call it unless you know what you are doing (and let me know why in the comments).

Arity-2: Call it each time you want to feed one more data element to the transducer.

Arity-1: Call it once to signal that there are no inputs available anymore. The transducer (even a stateless one) may flush a few more data elements to your output function.

Now let's improve our print-ducer a bit.

; Note: Still a toy function.
(defn print-duce [xf coll]
  (let [rf (fn ([])
               ([result] (print (str \newline "--EOS")))
               ([result input] (print (str input \space))))
        process (xf rf)]
    (loop [c coll]
      (if (seq c)
        (do
          (process nil (first c)) ; 2-arity 'process'
          (recur (rest c)))
        (process nil)))))         ; 1-arity 'flush'

(print-duce (map inc) (list 3 4 5))
; Output:
; 4 5 6 
; --EOS

; Now this works:
(print-duce (partition-all 2) (list 3 4 5))
; Output:
; [3 4] [5] 
; --EOS

Enough is enough (early termination)

Your transducer has its say on when to stop the stream. It will return a reduced value when it decides that no more elements should be processed. You need to pay attention to it and not feed it more input.

(defn print-duce [xf coll]
  (let [rf (fn ([])
               ([result] (print (str \newline "--EOS")))
               ([result input] (print (str input \space))))
        process (xf rf)]
    (loop [c coll]
      (if (seq c)
        (let [result (process nil (first c))]
          (if (reduced? result)
            (process (deref result)) ; unwrap it and stop processing the stream
            (recur (rest c))))       ; continue processing the stream
        (process nil)))))

Reduce function as a parameter

What if we want to make our function more general and accept a "1-and-2-arity" reducing function as a parameter?

(defn multi-duce [xf rf init coll]
  (let [process (xf rf)]
    (loop [acc init
           c coll]
      (if (seq c)
        (let [result (process acc (first c))]
          (if (reduced? result)
            (process (deref result))
            (recur result (rest c))))
        (process acc)))))

(multi-duce (map inc) conj [] (list 3 4 5))
; => [4 5 6]

(multi-duce (partition-all 2) conj [] (list 3 4 5))
; => [[3 4] [5]]

Easy made simple (by my cat)

This is not my cat

My cat just told me that I am an idiot as there is a shorter and more efficient way to implement the function above.

I gave the keyboard to him and here is what he showed me:

(defn cat-duce [xf rf init coll]
  (let [process (xf rf)
        result (reduce process init coll)]
      (process result)))

(cat-duce (map inc) conj [] (list 3 4 5))
; => [4 5 6]

(cat-duce (partition-all 2) conj [] (list 3 4 5))
; => [[3 4] [5]]

Indeed, it seems to work. Thank you cat!

The transduce function

The cat-duce function is very close to the implementation of the legendary transduce function. It has the same signature and almost the same implementation, the difference being the added support for collections which want to be notified when they are being reduced (like the result of the eduction function).

What's next

Congratulations!!!

By now you should know everything you need to use transducers in your own custom way. You may still need to exercise your skills a little bit - practice makes perfect.

I am going to publish some transducer-related exercises on May 30th 2018 to be used as a support for a transducer workshop in Taipei. The link will be added here soon after.

Stay tuned!

Permalink

Improving legacy Om code (II): Using effects and coeffects to isolate effectful code from pure code

Introduction.

In the previous post, we applied the humble object pattern idea to avoid having to write end-to-end tests for the interesting logic of a hard to test legacy Om view, and managed to write cheaper unit tests instead. Then, we saw how those unit tests were far from ideal because they were highly coupled to implementation details, and how these problems were caused by a lack of separation of concerns in the code design.

In this post we’ll show a solution to those design problems using effects and coeffects that will make the interesting logic pure and, as such, really easy to test and reason about.

Refactoring to isolate side-effects and side-causes using effects and coeffects.

We refactored the code to isolate side-effects and side-causes from the pure logic. This way, not only did testing the logic get much easier (the logic would now be in pure functions), but it also made the tests less coupled to implementation details. To achieve this we introduced the concepts of coeffects and effects.

The basic idea of the new design was:

  1. Extracting all the needed data from globals (using coeffects for getting application state, getting component state, getting DOM state, etc).
  2. Using pure functions to compute the description of the side effects to be performed (returning effects for updating application state, sending messages, etc) given what was extracted in the previous step (the coeffects).
  3. Performing the side effects described by the effects returned by the called pure functions.
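In code, the shape of an event handler following this design might look something like the following sketch (the handler name, the coeffect descriptions, and the logic namespace are hypothetical; process-all! and extract-all! are the functions mentioned below):

;; A sketch only; names and data shapes are hypothetical
(defn on-node-click [node]
  (let [cofx (extract-all! [[:app-state] [:component-state node]]) ; gather side-causes into a map
        effects (logic/handle-node-click cofx)]                    ; pure function decides what to do
    (process-all! effects)))                                       ; perform the described side-effects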

The main difference in the code of horizon.controls.widgets.tree.hierarchy after this refactoring was that the event handler functions were moved back into it again, and that they were using the process-all! and extract-all! functions to perform the side-effects described by effects and to extract the values of the side-causes tracked by coeffects, respectively. The event handler functions are shown in the next snippet (to see the whole code click here):

Now all the logic in the companion namespace was comprised of pure functions, with neither asynchronous nor mutating code:

Thus, its tests became much simpler:

Notice how the pure functions receive a map of coeffects already containing all the extracted values they need from the "world", and they return a map with descriptions of the effects. This makes testing much easier than before, and removes the need to use test doubles.

Notice also how the test code is now around 100 lines shorter. The main reason for this is that the new tests know much less about how the production code is implemented than the previous ones. This made it possible to remove some tests that, in the previous version of the code, were testing branches that we considered reachable when testing implementation details, but which, when considering the whole behaviour, are actually unreachable.

Now let’s see the code that is extracting the values tracked by the coeffects:

which is using several implementations of the Coeffect protocol:

All the coeffects were created using factories to localize in only one place the "shape" of each type of coeffect. This indirection proved very useful when we decided to refactor the code that extracts the value of each coeffect, substituting its initial implementation as a conditional with its current implementation using polymorphism with a protocol.

These are the coeffects factories:

Now there was only one place where we needed to test side-causes (using test doubles for some of them). These are the tests for extracting the coeffect values:

Very similar code processes the side-effects described by effects:

which uses different effects implementing the Effect protocol:

that are created with the following factories:

Finally, these are the tests for processing the effects:

Summary.

We have seen how by using the concept of effects and coeffects, we were able to refactor our code to get a new design that isolates the effectful code from the pure code. This made testing our most interesting logic really easy because it became comprised of only pure functions.

The basic idea of the new design was:

  1. Extracting all the needed data from globals (using coeffects for getting application state, getting component state, getting DOM state, etc).
  2. Computing in pure functions the description of the side effects to be performed (returning effects for updating application state, sending messages, etc) given what was extracted in the previous step (the coeffects).
  3. Performing the side effects described by the effects returned by the called pure functions.

Since the time we did this refactoring, we have decided to go deeper in this way of designing code and we’re implementing a full effects & coeffects system inspired by re-frame.

Acknowledgements.

Many thanks to Francesc Guillén, Daniel Ojeda, André Stylianos Ramos, Ricard Osorio, Ángel Rojo, Antonio de la Torre, Fran Reyes, Miguel Ángel Viera and Manuel Tordesillas for giving me great feedback to improve this post and for all the interesting conversations.

Permalink

Improving legacy Om code (I): Adding a test harness

Introduction.

I’m working at GreenPowerMonitor as part of a team developing a challenging SPA to monitor and manage renewable energy portfolios using ClojureScript. It’s a two-year-old Om application which contains a lot of legacy code. When I say legacy, I’m using Michael Feathers’ definition of legacy code as code without tests. This definition views legacy code from the perspective of code being difficult to evolve because of a lack of automated regression tests.

The legacy (untested) Om code.

Recently I had to face one of these legacy parts when I had to fix some bugs in the user interface that was presenting all the devices of a given energy facility in a hierarchy tree (devices might be comprised of other devices). This is the original legacy view code:

This code contains not only the layout of several components but also the logic to both conditionally render some parts of them and to respond to user interactions. This interesting logic is full of asynchronous and effectful code that is reading and updating the state of the components, extracting information from the DOM itself and reading and updating the global application state. All this makes this code very hard to test.

Humble Object pattern.

It’s very difficult to write component tests for non-component code like the code in this namespace, which makes writing end-to-end tests look like the only option.

However, following the idea of the humble object pattern, we might reduce the untested code to just the layout of the view. The humble object pattern can be used when code is too closely coupled to its environment to make it testable. To apply it, the interesting logic is extracted into a separate easy-to-test component that is decoupled from its environment.

In this case we extracted the interesting logic to a separate namespace, where we thoroughly tested it. With this we avoided writing the slower and more fragile end-to-end tests.

We wrote the tests using the test-doubles library (I’ve talked about it in a recent post) and some home-made tools that help test asynchronous code based on core.async.

This is the logic we extracted:

and these are the tests we wrote for it:

See here how the view looks after this extraction. Using the humble object pattern, we managed to test the most important bits of logic with fast unit tests instead of end-to-end tests.

The real problem was the design.

We could have left the code as it was (in fact we did for a while) but its tests were highly coupled to implementation details and hard to write because its design was far from ideal.

Even though, by applying the humble object pattern idea, we had separated the important logic from the view (which allowed us to focus on writing tests with more ROI, avoiding end-to-end tests), the extracted logic still contained many concerns. It was not only deciding how to interact with the user and what to render, but also mutating and reading state, getting data from global variables and from the DOM, and making asynchronous calls. Its effectful parts were not isolated from its pure parts.

This lack of separation of concerns made the code hard to test and hard to reason about, forcing us to use heavy tools: the test-doubles library and our async-test-tools assertion functions to be able to test the code.

Summary.

First, we applied the humble object pattern idea to manage to write unit tests for the interesting logic of a hard to test legacy Om view, instead of having to write more expensive end-to-end tests.

Then, we saw how those unit tests were far from ideal because they were highly coupled to implementation details, and how these problems were caused by a lack of separation of concerns in the code design.

Next.

In the next post we’ll solve the lack of separation of concerns by using effects and coeffects to isolate the logic that decides how to interact with the user from all the effectful code. This new design will make the interesting logic pure and, as such, really easy to test and reason about.

Permalink

neo4j-clj: a new Neo4j library for Clojure

On designing a ‘simple’ interface to the Neo4j graph database

While creating a platform where humans and AI collaborate to detect and mitigate cybersecurity threats at CYPP, we chose to use Clojure and Neo4j as part of our tech stack. To do so, we created a new driver library (around the Java Neo4j driver), following the clojuresque way of making simple things easy. And we chose to share it, to co-develop it under the Gorillalabs organization. Follow along to understand our motivation, get to know our design decisions, and see examples. If you choose a similar tech stack, this should give you a head start.


Who we are

Gorillalabs is a developer-centric organization (not a Company) dedicated to Open Source Software development, mainly in Clojure.

I (@Chris_Betz on Twitter, @chrisbetz on Github) created Gorillalabs to host Sparkling, a Clojure library for Apache Spark. Coworkers joined in, and now Gorillalabs brings together people and code from different employers to create a neutral collaboration platform. I work at CYPP, simplifying cybersecurity for mid-sized companies.

Most of Gorillalabs projects stem from the urge to use the best tools available for a job and make them work in our environment. That’s the fundamental idea and the start of our organization. And for our project at CYPP, using Clojure and Neo4j was the best fit.

Why Clojure?

I started using Common LISP in the 90s, moved to Java development for a living, and switched to using Clojure in production in 2011 as a good synthesis of the two worlds. And, while constantly switching roles between designing and developing software and managing software development, I specialized in delivering research-heavy projects.

For many of those projects, Clojure has two nice properties: First, it comes with a set of immutable data structures (reducing errors a lot, making it easier to evolve the domain model). And second, with the combination of ClojureScript and Clojure, you can truly use one language in backend and frontend code. Although you need to understand different concepts on both ends, with your tooling staying the same, it is easier to develop vertical (or feature) slices instead of horizontal layers. Check out my EuroClojure 2017 talk on that, if you’re interested.


Graphs are everywhere — so make use of them

For threat hunting, i.e. the process of detecting cybersecurity threats in an organisation, graphs are a natural data modelling tool. The most obvious graph is the one where computers are connected through TCP/IP connections. You can find malicious behaviour if one of your computers shows unwanted connections. (Examples are over-simplified here.)

But that’s just the 30,000-foot view. In fact, connections are between processes running on computers. And you see malicious behaviour if a process binds to an unusual port.

Processes are running with a certain set of privileges defined by the “user” running the process. Again, it’s suspicious if a user who should be unprivileged started a process listening for an inbound connection.

You get the point: Graphs are everywhere, and they help us cope with threats in a networked world.

Throughout our quest for the best solution around, we experimented with other databases and query languages, but we came to Neo4j and Cypher. First, it’s a production quality database solution, and second, it has a query language you really can use. We used TinkerPop/Gremlin before, but found it not easy to use for simple things, and really hard for complex queries.

Why we created a new driver

There’s already a Neo4j driver for Clojure. There’s even an example project on the Neo4j website. What on earth were we thinking creating our own Neo4j driver?

Neo4j introduced Bolt with Neo4j 3.x as the new protocol to interact with Neo4j. It made immediate sense; however, neocons did not pick it up, at least not at the pace we needed. Instead, it seemed as if the project had lost traction, having had only very few contributions for a long time. So we needed to decide whether we should fork neocons to move it to Neo4j 3.x or not.

However, with Bolt and the new Neo4j Java Driver, we would have ended up with a second, parallel implementation of the driver. That was the point where we decided to go all the way and build a new driver: neo4j-clj was born.

Design choices and code examples

Creating a new driver gave us the opportunity to fit it exactly to our needs and desires. We made choices you might like or disagree with, but you should know why we made them.

If you want to follow the examples below, you need to have a Neo4j instance up and running.

Then, you just need to know one namespace alias for neo4j-clj.core and one connection to your test database (also named db):

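Following neo4j-clj's README, that setup looks roughly like this (the URL and credentials are placeholders):

(require '[neo4j-clj.core :as db])

(def db (db/connect "bolt://localhost:7687" "neo4j" "password"))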

Using “raw” Cypher

The most obvious thing is our choice to keep “raw” Cypher queries as strings, but to be able to use them as Clojure functions. The idea behind this is actually not new and not our own, but borrowed from yesql. Doing so, you do not bend one language (Cypher) into another (Clojure), but keep each language for the problems it’s designed for. And, as a bonus, you can easily copy code over from one tool (code editor) to another (Neo4j browser), or use plugins to your IDE to query a database with the Cypher queries from your code.

So, to create a function wrapping a Cypher query, you just wrap that Cypher string in a defquery macro like this:

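For example, a query for all hosts might be defined like this (the query itself is hypothetical):

(db/defquery get-all-hosts
  "MATCH (h:Host) RETURN h")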

And, you can easily copy the string into your Neo4j browser or any other tool to check the query, profile it, whatever you feel necessary.

With this, you can easily run the query like this:

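A session comes from the connection, and the generated query function takes it as its first argument (a sketch):

(with-open [session (db/get-session db)]
  (get-all-hosts session))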

and, depending on the data in your test database, you will end up with a sequence of maps representing your hosts.


This style makes it clear that you should not be constructing queries on the fly, but using a defined set of queries in your codebase. If you need a new query, define one specifically for that purpose. Think about which indices you need, and how this query performs best, reads best, you name it.

However, this decision has some drawbacks. There’s no compiler support and no IDE checks, as Cypher queries are not recognized as such; they are just strings. However, there’s not much Cypher support in IDEs anyhow. That’s different than with yesql, where you usually have SQL linting with the appropriate files.

Each query function will return a list, even if it’s empty. There’s no convenience function for creating queries that return a single object (for something like host-by-id). If you know there’s only one, pick it using first.

Relying on the Java driver, but working with Clojure data structures

We just make use of the Java driver, so basically, neo4j-clj is only a thin wrapper. However, we wanted to be able to live in the Clojure world as much as possible. To us, that meant we need to interact with Neo4j using Clojure data structures. You saw that in the first example, where a query function returns a list of maps.

However, you can also parameterize your queries using maps:

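A sketch of such a parameterized query (hypothetical names again):

(db/defquery get-host-by-id
  "MATCH (h:Host) WHERE h.id = $host.id RETURN h")

(with-open [session (db/get-session db)]
  (get-host-by-id session {:host {:id "some-host-id"}}))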

This example is more complex than necessary just to make a point clear: you can destructure Clojure maps {:host {:id "..."}} by navigating them in Cypher as $host.id.

A nice thing is that you can easily test these queries in the Neo4j browser if you set the parameters there first (the browser’s :param command lets you do that).


Joplin integration built-in

We’re fans of having seeding and migration code for the database in our version control. Thus, we use Joplin, and we suggest you do, too. That’s why we built Joplin support right into neo4j-clj.

With Joplin, you can write migrations and seed functions to populate your database. This isn’t as important in Neo4j as it is in relational databases, but it’s necessary, e.g. for index or constraint generation.

First, Joplin migrates your database if it isn’t at the latest stage (path-to-joplin-neo4j-migrators points to a folder of migration files, which are applied in alphabetical order).


And each migration file has (at least) the two functions up and down to perform the actual migration.


Also, you can seed your database with a seed function that populates it with initial data.


You can then run the seeding using a config identical to the one from migration.


Our seed and query functions follow a style we got used to: we prefix all the functions created by defquery with db> and we use the ! suffix to mark functions with side-effects. That way, you see when code leaves your platform and what you can expect to happen.

Tested all the way

Being big fans of testing, we wanted the tests for our driver to be as easy and as fast as possible. You should be able to combine that with a REPL-first approach, where you can experiment in the REPL. Luckily, you can run Neo4j in embedded mode, so we did not need to rely on an existing Neo4j installation or a running Docker image of Neo4j. Instead, all our tests run isolated in embedded Neo4j instances. We just needed to make sure not to use the Neo4j embedded API, but the Bolt protocol. Easy: my colleague Max Lorenz just bound the embedded Neo4j instance to an open port and connected the driver to that, just as you would do in production.

Using a with-temp-db-fixture, we just create a new session against that embedded database and test the neo4j-clj functions in a round-trip without external requirements. Voilà.

Use it, fork it, blog it

neo4j-clj is ready to be used. We do. We’d love to hear from you (@gorillalabs_de or @chris_betz). Share your experiences with neo4j-clj.

There are still some rough edges: Maybe you need more configuration options. Or support for some other property types, especially the new Date/Time and Geolocation types. We’ll add stuff over time. If you need something specific, please open an issue on Github, or add it yourself and create a Pull Request on the ‘develop’ branch.

We welcome contributions, so feel free to hack right away!



Permalink

Learning Ring And Building Echo

When you come to Clojure and want to build a web app you'll discover Ring almost immediately. Even if you use another library you are likely to find Ring in use.

What is Ring?

As stated in the Ring repository, it is a library that abstracts the details of HTTP into a simple API. It does this by turning HTTP requests into Clojure maps which can be inspected and modified by a handler which returns an HTTP response. The handlers are Clojure functions that you create. You are also responsible for creating the response. Ring connects your handler with the underlying web server and is responsible for taking the requests and calling your handler with the request map.

If you've had experience with Java Servlets you'll notice a pattern here, but you'll quickly see how much simpler this is.

Requests

Requests typically come from web browsers and can have a number of fields. Requests also have different methods (GET, POST, etc), a URI with an optional query string, and a message body. Ring takes all of this information and converts it into a Clojure map.

Here is an example of a request map generated by Ring for a request to http://localhost:3000.

{:ssl-client-cert nil,
 :protocol "HTTP/1.1",
 :remote-addr "0:0:0:0:0:0:0:1",
 :headers
 {"cache-control" "max-age=0",
  "accept"
  "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
  "upgrade-insecure-requests" "1",
  "connection" "keep-alive",
  "user-agent"
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36",
  "host" "localhost:3000",
  "accept-encoding" "gzip, deflate, br",
  "accept-language" "en-US,en;q=0.9"},
 :server-port 3000,
 :content-length nil,
 :content-type nil,
 :character-encoding nil,
 :uri "/",
 :server-name "localhost",
 :query-string nil,
 :body "",
 :scheme :http,
 :request-method :get}
 

How did I get this map so I could show it here? From Echo, which is the application that I'll describe next.

Echo

When learning about Ring you'll hear all about the request map and you'll learn over time what fields it typically contains but sometimes you'll want to see it. Also, you may want to see what a script or form is posting to another system. In such a situation it can be very handy to point the script or form to a debugging site which shows what they are sending. This is the purpose of Echo.

To state it clearly: Echo is a web application that echoes back everything that is sent to it. It takes a request map and formats it in such a way that it can be returned to the caller.

Steps

1. Start a new Clojure project

$ lein new echo

2. Add Ring dependencies

Add ring-core and ring-jetty-adapter to your project.clj file. Also, add a :main entry pointing at echo.core so you can run your application.

  :dependencies [[org.clojure/clojure "1.8.0"]
                 [ring/ring-core "1.6.3"]
                 [ring/ring-jetty-adapter "1.6.3"]]
  :main echo.core

3. Create a handler and connect it to your Ring adapter

Inside of core.clj add a handler function and connect it to your adapter. Also add :gen-class and create a -main function so you can run your application.

Here is the complete core.clj file. Notice that you are requiring ring.adapter.jetty. This is the adapter for the Jetty web server; it passes requests to your handler.

Here the handler will return a minimal response with the words "Hello from Echo".

(ns echo.core
  (:require [ring.adapter.jetty :as jetty])
  (:gen-class))


(defn handler [request]
  {:status 200
   :headers {"Content-Type" "text/plain"}
   :body "Hello from Echo"})


(defn -main []
  (jetty/run-jetty handler {:port 3000}))

At this point, you can test your app. From the root of your project enter the following to run it.

$ lein run

Then open a browser to http://localhost:3000/. You should see the following as a response.

Hello from Echo

4. Modify the handler to return the full request

Next, we'll modify the handler to return everything in the request. But there are a couple of things to figure out to make this work. First, you can't just send the request map back as the :body or you'll get an error: a map isn't a valid response body, and the request's body is a stream that needs to be read.

To see what I mean modify your handler so it simply returns the request. The snippet looks like:

:body request

As a next step, you might want to pprint the request. You can try this by adding [clojure.pprint :as pprint] to your :require clause and then calling pprint on the request, expecting its output to end up in the body, using the following snippet.

:body (pprint/pprint request)

Try that and watch the terminal where you entered lein run. You'll see the pprint output there.

Now, that would be great if it were passed back to the browser. How? By capturing the output of pprint in a string and then passing that string to the browser through the :body field.

(defn handler [request]
  (let [s (with-out-str (pprint/pprint request))]
    {:status  200
     :headers {"Content-Type" "text/plain"}
     :body    s}))

At this point, go through and test with a new request from a browser. Here I see the following:

{:ssl-client-cert nil,
 :protocol "HTTP/1.1",
 :remote-addr "0:0:0:0:0:0:0:1",
 :headers
 {"cache-control" "max-age=0",
  "accept"
  "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
  "upgrade-insecure-requests" "1",
  "connection" "keep-alive",
  "user-agent"
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36",
  "host" "localhost:3000",
  "accept-encoding" "gzip, deflate, br",
  "accept-language" "en-US,en;q=0.9"},
 :server-port 3000,
 :content-length nil,
 :content-type nil,
 :character-encoding nil,
 :uri "/",
 :server-name "localhost",
 :query-string nil,
 :body
 #object[org.eclipse.jetty.server.HttpInputOverHTTP 0x494e22ea "HttpInputOverHTTP@494e22ea"],
 :scheme :http,
 :request-method :get}

There is one thing here that isn't a problem yet but will be when you try Echo with a POST from a form: the :body field. See how it is an HttpInputOverHTTP reference? This is a stream you want to read before building the response so its contents show up in the output. To do this, see this final version of the handler.

(defn handler [request]
  (let [s (with-out-str (pprint/pprint (conj request {:body (slurp (:body request))})))]
    {:status  200
     :headers {"Content-Type" "text/plain"}
     :body    s}))

Notice how the :body of the request is read with the slurp function, and how conj replaces the value of the :body field in the request before it is passed to pprint.

With a final test you should see something similar to the following:

{:ssl-client-cert nil,
 :protocol "HTTP/1.1",
 :remote-addr "0:0:0:0:0:0:0:1",
 :headers
 {"cache-control" "max-age=0",
  "accept"
  "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
  "upgrade-insecure-requests" "1",
  "connection" "keep-alive",
  "user-agent"
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36",
  "host" "localhost:3000",
  "accept-encoding" "gzip, deflate, br",
  "accept-language" "en-US,en;q=0.9"},
 :server-port 3000,
 :content-length nil,
 :content-type nil,
 :character-encoding nil,
 :uri "/",
 :server-name "localhost",
 :query-string nil,
 :body "",
 :scheme :http,
 :request-method :get}
 

Last test. Let's try posting something to our Echo.

$ curl --data 'firstname=Bob&lastname=Smith' http://localhost:3000/add/name?v=1

Here I see the following returned.

{:ssl-client-cert nil,
 :protocol "HTTP/1.1",
 :remote-addr "0:0:0:0:0:0:0:1",
 :headers
 {"user-agent" "curl/7.54.0",
  "host" "localhost:3000",
  "accept" "*/*",
  "content-length" "28",
  "content-type" "application/x-www-form-urlencoded"},
 :server-port 3000,
 :content-length 28,
 :content-type "application/x-www-form-urlencoded",
 :character-encoding nil,
 :uri "/add/name",
 :server-name "localhost",
 :query-string "v=1",
 :body "firstname=Bob&lastname=Smith",
 :scheme :http,
 :request-method :post}
 

Notice the body as well as the URI and query-string values. These are all pulled out of the request by Ring and as such, you can build handlers to look for values in these fields and respond accordingly.
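
For example (a sketch, not part of Echo itself), a handler could dispatch on the method and URI like this:

(defn name-handler [request]
  ;; respond only to POST /add/name; echo the raw form body back
  (if (and (= (:request-method request) :post)
           (= (:uri request) "/add/name"))
    {:status 200
     :headers {"Content-Type" "text/plain"}
     :body (str "Received: " (slurp (:body request)))}
    {:status 404
     :headers {"Content-Type" "text/plain"}
     :body "Not found"}))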

Summation

At this point, you've created a very basic Ring application. Hopefully, it's one that you can use in the future to help debug other applications. Also, now that you see what fields are in requests you can build up from here.

Source code for the example Echo application is available at https://github.com/bradlucas/echo/tree/release/1.0.1.

Permalink

Glad you liked it!

Glad you liked it! :)
I’d be more than happy to have you translate it, do send me a link afterwards (I’m also learning Japanese in my free time xD).

I’m actually curious about the Clojure ecosystem in Japan too, are there any companies using it? What do you think?

Permalink

Shelving; Building a Datalog for fun! and profit?

This post contains my speaker notes from my May 3 SF Clojure talk (video) on building Shelving, a toy Datalog implementation.


Databases?

  • 1960s; the first data stores. These early systems were very tightly coupled to the physical representation of data on tape, and were difficult to use, develop, query and evolve.
    • CODASYL, a set of COBOL patterns for building data stores; essentially using doubly linked lists
    • IBM's IMS which had a notion of hierarchical (nested) records and transactions
  • 1969; E. F. Codd presents the "relational data model"

Relational data

Consider a bunch of data

[{:type ::developer
  :name {:given "Reid"
         :family "McKenzie"
         :middle "douglas"
         :signature "Reid Douglas McKenzie"}
  :nicknames ["arrdem" "wayde"]}
 {:type ::developer
  :name {:given "Edsger"
         :family "Djikstra"}
  :nicknames ["ewd"]}]

In a traditional (pre-relational) data model, you could imagine laying out a C-style struct in memory, where the name structure is mashed into the developer structure at known byte offsets from the start of the record. Or perhaps the developer structure references a name by its tape offset and has a length-tagged array of nicknames trailing behind it.

The core insight of the relational data model is that we can define "joins" between data structures. But we need to take a couple steps here first.

Remember that maps are sequences of keys and values. So to take one of the examples above,

{:type ::developer
 :name {:given "Edsger"
        :family "Djikstra"}
 :nicknames ["ewd"]}

;; <=> under maps are k/v sequences

[[:type ::developer]
 [:name [[:given "Edsger"]
         [:family "Djikstra"]]]
 [:nicknames ["ewd"]]]

;; <=> under kv -> relational tuple decomp.

[[_0 :type ::developer]
 [_0 :name _1]
 [_0 :nickname "ewd"]
 [_1 :given "Edsger"]
 [_1 :family "Djikstra"]]

We can also project maps to tagged tuples and back if we have some agreement on the order of the fields.

{:type ::demo1
 :foo 1
 :bar 2}

;; <=>

[::demo1 1 2] ;; under {0 :type 1 :foo 2 :bar}
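
In Clojure terms, that agreement is just a fixed field order. A small sketch of the idea:

(def field-order [:type :foo :bar])

(defn map->tuple [m] (mapv m field-order))   ;; look each key up in order
(defn tuple->map [t] (zipmap field-order t)) ;; zip the order back onto values

(map->tuple {:type ::demo1 :foo 1 :bar 2}) ;; => [::demo1 1 2]
(tuple->map [::demo1 1 2]) ;; => {:type ::demo1, :foo 1, :bar 2}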

Finally, having projected maps (records) to tuples, we can display many tuples as a table where columns are tuple entries and rows are whole tuples. I mention this mostly for completeness, since rows and columns are the common terms of use.

foo bar
1 2
3 4

Okay so we've got some data isomorphisms. What of it?

Well the relational algebra is defined in terms of ordered, untagged tuples.

Traditionally data stores didn't include their field identifiers in the storage implementation as an obvious space optimization.

That's it. That's the relational data model - projecting flat structures to relatable tuple units.

Operating with Tuples

The relational algebra defines a couple of operations on tuples, or to be more precise, on sets of tuples. There are your obvious set-theoretic operators - union, intersection and difference - and there are three more.

cartesian product

Let R, S be tuple sets. Then ∀r∈R, ∀s∈S: r+s ∈ R×S, where + is tuple concatenation.

Ex. {(1,) (2,)} x {(3,) (4,)} => {(1, 3,) (1, 4,) (2, 3,) (2, 4,)}

projection (select keys)

Projection is better known as select-keys. It's an operator for selecting some subset of the fields of every tuple in a tuple space. For instance, if we have R defined as

A B C
a b c
d a f
c b d

π(A,B)(R) would be the space of tuples from R excluding the C column -

A B
a b
d a
c b

selection

Where projection selects elements from tuples, selection selects tuples from sets of tuples. I dislike the naming here, but I'm going with the original.

To recycle the example R from above,

A B C
a b c
d a f
c b d

σ₍B=b₎(R) - select where B=b over R would be

A B C
a b c

Joins

Finally given the above operators, we can define the most famous one(s), join and semijoin.

join (R⋈S)

The (natural) join of two tuple sets is the subset of the set R×S where any fields COMMON to both r∈R and s∈S are "equal".

Consider some tables, R

A B C
a b c
d e f

and S,

A D E
a 1 3
d 2 3

We then have R⋈S to be

A B C D E
a b c 1 3
d e f 2 3

semijoin

This is a slightly restricted form of join - you can think of it as the join on some particular column. If R and S had several overlapping columns, the (natural) join operation would join on all of them. But we may want to have several relations between two tables - and consequently leave open the possibility of several different joins.

In general, when talking about joins for the rest of this presentation, I'll be talking about natural joins over tables designed to have only one overlapping field, so that the natural join and the semijoin collapse.
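
Incidentally, clojure.set ships these operators over sets of maps, which makes them easy to experiment with at the REPL (using the R and S from above):

(require '[clojure.set :as set])

(def R #{{:A "a" :B "b" :C "c"} {:A "d" :B "e" :C "f"}})
(def S #{{:A "a" :D 1 :E 3} {:A "d" :D 2 :E 3}})

(set/project R [:A :B])        ;; projection: #{{:A "a" :B "b"} {:A "d" :B "e"}}
(set/select #(= (:B %) "b") R) ;; selection:  #{{:A "a" :B "b" :C "c"}}
(set/join R S)                 ;; natural join on the shared :A column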

Enter Datalog

Codd's relational calculus, as we've gone through, is a formulation of how to view data and data storage in terms of timeless, placeless algebraic operations. Like the Lambda Calculus, or "Maxwell's Laws of Software" as Kay has described the original Lisp formulation, it provides a convenient generic substrate for building up operational abstractions. It's the basis for entire families of query systems which have come since.

Which is precisely what makes Datalog interesting! Datalog is almost a direct implementation of the relational calculus, along with some insights from logic programming. Unfortunately, this also means that it's difficult to give a precise definition of what Datalog is. Like Lisp, it's simple enough that there are decades worth of implementations, re-implementations, experimental features and papers.

Traditionally, Datalog and Prolog share a fair bit of notation so we'll start there.

In traditional Datalog, as in Prolog, "facts" are declared with a notation like the following. This particular code is in Soufflé, a Datalog dialect which happens to have an Emacs mode. This is the example I'll be trying to focus on going forwards.

State("Alaska")
State("Arizona")
State("Arkansas")

City("Juneau", "Alaska")
City("Phoenix", "Arizona")
City("Little Rock", "Arkansas")

Population("Juneau", 2018, 32756)
Population("Pheonix", 2018, 1.615e6)
Population("Little Rock", 2018, 198541)

Capital("Juneau")
Capital("Phoenix")
Capital("Little Rock")

Each one of these lines defines a tuple in the datalog "database". The notation is recognizable from Prolog, and is mostly agreed upon.

Datalog also has rules, also recognizable from logic programming. Rules describe sets of tuples in terms of either other rules or sets of tuples. For instance

CapitalOf(?city, ?state) :- State(?state), City(?city, ?state), Capital(?city).

This is a rule which defines the CapitalOf relation in terms of the State, City and Capital tuple sets. The CapitalOf rule can itself be directly evaluated to produce a set of "solutions" as we'd expect.

?city and ?state are logic variables; the ?-prefix convention is taken from Datomic.

That's really all there is to "common" datalog. Rules with set intersection/join semantics.

Extensions

Because Datalog is so minimal (which makes it attractive to implement) it's not particularly useful. Like Scheme, it can be a bit of a hair shirt. Most Datalog implementations have several extensions to the fundamental tuple and rule system.

Recursive rules!

Support for recursive rules is one very interesting extension. Given recursive rules, we could use a recursive Datalog to model network connectivity graphs:

Reachable(?s, ?d) :- Link(?s, ?d).
Reachable(?s, ?d) :- Link(?s, ?z), Reachable(?z, ?d).

This rule defines reachability in terms of either there existing a link between two points in a graph, or there existing a link between the source point and some intermediate Z which is recursively reachable to the destination point.

The trouble is that implementing recursive rules efficiently is difficult although possible. Lots of fun research material here!

Negation!

You'll notice that basic Datalog doesn't support negation of any kind, unless "positively" stated in the form of some kind of "not" rule.

TwoHopLink(?s, ?d) :- Link(?s, ?z), Link(?z, ?d), ! Link(?s, ?d).

It's quite common for databases to make the closed world assumption - that is all possible relevant data exists within the database. This sort of makes sense if you think of your tuple database as a subset of the tuples in the world. All it takes is one counter-example to invalidate your query response if suddenly a negated tuple becomes visible.

Incremental queries / differentiability!

Datalog is set-oriented! It doesn't have a concept of deletion, or any aggregation operators such as ordering which require realizing an entire result set. This means that it's possible to "differentiate" a Datalog query and evaluate it over a stream of incoming tuples, because no possible new tuple (without negation at least) will invalidate the previous result(s).

This creates the possibility of using Datalog to do things like describe application views over incremental update streams.

Eventual consistency / distributed data storage!

Sets form a monoid under merge - no information can ever be lost. This creates the possibility of building distributed data storage and query answering systems which are naturally consistent and don't have the locking / transaction ordering problems of traditional place oriented data stores.

The Yak

Okay. So I went and built a Datalog.

Why? Because I wanted to store documentation, and other data.

95 Theses

Who's ready for my resident malcontent bit?

Grimoire

Grimoire has a custom backing data store - lib-grimoire - which provides a pretty good model for talking about Clojure and ClojureScript's code structure and documentation.

https://github.com/clojure-grimoire/lib-grimoire#things

lib-grimoire was originally designed to abstract over concrete storage implementations, making it possible to build tools which generate or consume Grimoire data stores. That purpose it has served admirably for me. Unfortunately, looking at my experiences onboarding contributors, it's clearly been a stumbling block, and the current Grimoire codebase doesn't respect the storage layer abstraction; there are lots of places where Grimoire makes assumptions about how the backing store is structured, because I've only ever had one.

Grenada

https://github.com/clj-grenada/grenada-spec

In 2015 I helped mentor Richard Moehn on his Grenada project. The idea was to take a broad view of the Clojure ecosystem and try to develop a "documentation as data" convention which could be used to pack documentation, examples and other content separately from source code - and particularly to enable third-party documenters like myself to create packages for artifacts we don't control (core, contrib libraries). The data format Richard came up with never caught on, I think because the scope of the project was just the data format, not developing a suite of tools to consume it.

What was interesting about Grenada is that it tried to talk about schemas, and provide a general framework for talking about the annotations provided in a single piece of metadata rather than relying on a hard-coded schema the way Grimoire did.

cljdoc

https://github.com/martinklepsch/cljdoc

In talking to Martin about cljdoc and some other next-generation tools, the concept of docs-as-data has resurfaced again. Core's documentation remains utterly atrocious, and a consistent gripe in the community survey year over year.

Documentation for core gets more traffic than documentation for any other single library, so documenting core and some parts of contrib is a good way to get traction and add value for a new tool or suite thereof.

Prior art

You can bolt persistence à la carte onto most of the existing in-memory options with Transit, or just use edn, but then your serialization isn't incremental at all.

Building things is fun!

Design goals

  • Must lend itself to some sort of "merge" of many stores
    • Point reads
    • Keyspace scans
  • Must have a self-descriptive schema which is sane under merges / overlays
  • Must be built atop a meaningful storage abstraction
  • Design for embedding inside applications first, no server

Building a Datalog

Storage models!

Okay, let's settle on an example that we can get right and refine some.

Take a step back - Datalog is really all about sets, and relating a set of sets of tuples to itself. What's the simplest possible implementation of a set that can work? An append-only write log!

[[:state "Alaska"]
 [:state "Arizona"]
 [:state "Arkansas"]
 [:city "Juneau" "Alaska"]
 [:city "Pheonix" "Arizona"]
 ...]

Scans are easy - you just iterate the entire thing.

Writes are easy - you just append to one end of the entire thing.

Upserts don't exist: we have set semantics, so either you insert a straight duplicate (which doesn't change the set) or you add a new element.

Reads are a bit of a mess, because you have to do a whole scan, but that's tolerable. Correct is more important than performant for a first pass!
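
A toy version of this write-log "set" might look like the following (a sketch, not Shelving's actual storage code):

(defn make-log [] (atom []))

(defn put-tuple! [log tuple]
  ;; writes append to one end of the log
  (swap! log conj tuple)
  tuple)

(defn scan [log pred]
  ;; reads are full scans; deduplicating on read preserves set semantics
  (distinct (filter pred @log)))

;; (def db (make-log))
;; (put-tuple! db [:state "Alaska"])
;; (scan db #(= :state (first %))) ;; => ([:state "Alaska"])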

Schemas!

So this sort of "sequence of tuples" thing is how core.logic.pldb works. It maintains a map of sequences of tuples, keyed by the tuple "name" so that scans can at least be restricted to single tuple "spaces".

Anyone here think that truly unstructured data is a good thing?

Yeah I didn't think so.

Years ago I did a project - spitfire - based on pldb. It was a sketch of a game engine which would load data files for the Warmachine tabletop game's pieces and provide a rules quick reference, and ultimately, I hoped, a full simulation to play against.

As with most tabletop war games, play proceeds by executing a clock, and repeatedly consulting tables of properties describing each model. Which we recognize as database query.

Spitfire used pldb to try to solve the data query problem, and I found that it was quite awkward to write to, in large part because it was really easy to mess up the tuples you put into pldb. There was no schema system to save you if you messed up your column count somewhere. I built one, but its ergonomics weren't great.

Since then, we got clojure.spec(.alpha), which enables us to talk about the shape of and requirements on data structures. Spec is designed for talking about data in a forwards-compatible way that enables evolution, unlike traditional type systems which intentionally introduce brittleness.

While this may or may not be an appropriate trade-off for application development, it's a pretty great trade-off for schemas on persisted, iterated data!

https://github.com/arrdem/shelving#schemas

(s/def :demo/name string?)
(s/def :demo.state/type #{:demo/state})
(s/def :demo/state
  (s/keys :req-un [:demo/name
                   :demo.state/type]))

(defn ->state [name]
  {:type :demo/state, :name name})

;; Note: the city's state field gets its own spec so it doesn't clobber
;; the :demo/state record spec defined above.
(s/def :demo.city/state string?)
(s/def :demo.city/type #{:demo/city})
(s/def :demo/city
  (s/keys :req-un [:demo.city/type
                   :demo/name
                   :demo.city/state]))

(defn ->city [state name]
  {:type :demo/city, :name name, :state state})

(s/def :demo/name string?)
(s/def :demo.capital/type #{:demo/capital})
(s/def :demo/capital
  (s/keys :req-un [:demo.capital/type
                   :demo/name]))

(defn ->capital [name]
  {:type :demo/capital, :name name})

(def *schema
  (-> sh/empty-schema
      (sh/value-spec :demo/state)
      (sh/value-spec :demo/city)
      (sh/value-spec :demo/capital)
      (sh/automatic-rels true))) ;; lazy demo

Writing!

#'shelving.core/put!

  • Recursively walk spec structure
    • depth first
    • spec s/conform equivalent
  • Generate content hashes for every tuple
  • Recursively insert every tuple (skipping dupes)
  • Insert the topmost parent record with either a content hash ID or a generated ID, depending on record/value semantics.
  • Create schema entries in the db if automatic schemas are on and the target schema/spec doesn't exist in the db.
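
The dedupe step relies on values hashing stably. A rough sketch of the content-hash idea (not Shelving's actual implementation, and glossing over canonical printing of large maps):

(import 'java.security.MessageDigest)

(defn content-hash
  "A stable ID for a value: hex SHA-256 of its printed representation."
  [value]
  (let [digest (MessageDigest/getInstance "SHA-256")]
    (->> (.digest digest (.getBytes (pr-str value) "UTF-8"))
         (map #(format "%02x" %))
         (apply str))))

;; equal values get equal IDs, so re-inserting a duplicate is a no-op
(= (content-hash {:type :demo/state :name "Alaska"})
   (content-hash {:type :demo/state :name "Alaska"}))
;; => true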

Okay, so let's throw some data in -

(def *conn
  (sh/open
   (->MapShelf *schema "/tmp/demo.edn"
               :load false
               :flush-after-write false)))
;; => #'*conn

(let [% *conn]
  (doseq [c [(->city "Alaska" "Juneau")
             (->city "Arizona" "Pheonix")
             (->city "Arkansas" "Little Rock")]]
    (sh/put-spec % :demo/city c))

  (doseq [c [(->capital "Juneau")
             (->capital "Pheonix")
             (->capital "Little Rock")]]
    (sh/put-spec % :demo/capital c))

  (doseq [s [(->state "Alaska")
             (->state "Arizona")
             (->state "Arkansas")]]
    (sh/put-spec % :demo/state s))

  nil)
;; => nil

Schema migrations!

Can be supported automatically, if we're just adding more stuff!

  • Let the user compute the proposed new schema
  • Check compatibility
  • Insert into the backing store if there are no problems
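
Sketched as code, the compatibility check is just "grow-only". This assumes the schema is a map of spec to entry, and the helpers current-schema and install-schema! are hypothetical names, not Shelving's API:

(defn compatible?
  "A proposed schema is compatible if it only adds specs, never
  removes or redefines existing ones."
  [old-schema new-schema]
  (every? (fn [[spec entry]] (= entry (get new-schema spec)))
          old-schema))

(defn migrate! [conn new-schema]
  (let [old (current-schema conn)] ;; hypothetical accessor
    (if (compatible? old new-schema)
      (install-schema! conn new-schema) ;; hypothetical writer
      (throw (ex-info "Incompatible schema change!"
                      {:old old :new new-schema})))))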

Query parsing!

Shelving does the same thing as most of the other Clojure datalogs and rips off Datomic's datalog DSL.

(sh/q *conn
  '[:find ?state
    :in ?city
    :where [?_0 [:demo/city :demo/name] ?city]
           [?_0 [:demo/city :demo.city/state] ?state]
           [?_1 [:demo/capital :demo/name] ?city]])

This is defined to have the same "meaning" (query evaluation) as

(sh/q *conn
      '{:find  [?state]
        :in    [?city]
        :where [[?_0 [:demo/city :demo/name] ?city]
                [?_0 [:demo/city :demo.city/state] ?state]
                [?_1 [:demo/capital :demo/name] ?city]]})

How can we achieve this? Let alone test it reasonably?

Spec to the rescue once again! See src/test/clj/shelving/parsertest.clj for conform/unform "normal form" round-trip testing!

Spec's normal form can also be used as the "parser" for the query compiler!
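
As a sketch of what "spec as parser" can look like (a deliberately simplified grammar, not Shelving's real one), s/conform both validates and normalizes a query:

(require '[clojure.spec.alpha :as s]
         '[clojure.string :as str])

(s/def ::lvar (s/and symbol? #(str/starts-with? (name %) "?")))
(s/def ::clause (s/tuple ::lvar (s/or :attr keyword? :rel vector?) any?))
(s/def ::find (s/coll-of ::lvar :kind vector? :min-count 1))
(s/def ::in (s/coll-of ::lvar :kind vector?))
(s/def ::where (s/coll-of ::clause :kind vector? :min-count 1))
(s/def ::query (s/keys :req-un [::find ::where] :opt-un [::in]))

;; the conformed value is the "normal form" the query compiler consumes
(s/conform ::query
           '{:find  [?state]
             :in    [?city]
             :where [[?_0 [:demo/city :demo/name] ?city]]})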

Query planning!

Traditional SQL query planning is based around optimizing disk I/O, typically by trying to do windowed scans or range order scans which respect the I/O characteristics of spinning disks.

This is below the abstractive level of Shelving!

Keys are (abstractly) unsorted, and all we have to program against is a write log anyway! For a purely naive implementation we really can't do anything interesting; we're stuck at a bottom of one full scan per lvar.

Let's say we added indices - maps from the IDs of values of one spec to the IDs of values of other specs they relate to. Suddenly query planning becomes interesting. We still have to do scans of relations, but we can restrict ourselves to subscans based on relates-to information.

  • Take all lvars
  • Infer spec information from annotations & rels
  • Topsort lvars
  • Emit state-map -> [state-map] transducers & filters

TODO:

  • Planning using spec cardinality information
  • Simultaneous scans (rank-sort vs topsort)
  • Blocked scans for cache locality

Goodies

API management!

Documentation generation!

Covered previously on the blog - I wrote a custom Markdown generator and updater to help me keep my docstrings as the documentation source of truth, updating the Markdown files in the repo by inserting the appropriate content from docstrings when they change.

More fun still to be had

What makes Datalog really interesting is the wealth of extensions which have been proposed for it, and several of them remain on the table here.

Negation!

  • Really easy to bolt onto the parser, or enable as a query language flag
  • Doesn't invalidate any of the current stream/filter semantics
  • Closed world assumption, which most databases happily make

Recursive rules!

More backends!

Transactions!

  • Local write logs as views of the post-transaction state
  • transact! writes an entire local write log all-or-nothing
  • server? optimistic locking? consensus / consistency issues
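
With the toy write log from the storage section above, the all-or-nothing flavor could be as simple as staging tuples locally and appending them in a single swap! (again a sketch):

(defn transact!
  "Append a whole batch of tuples atomically: readers dereferencing the
  log see either none or all of the transaction's writes."
  [log tx-tuples]
  (swap! log into tx-tuples))

;; (transact! db [[:state "Alaska"] [:city "Juneau" "Alaska"]])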

Ergonomics!

The query DSL wound up super verbose unless you really leverage the inferencer :c

Actually replacing Grimoire…

  • Should have just used a SQL ;) but this has been educational

Permalink

Software Developer (Flowerpilot) (m/f) at LambdaWerk GmbH (Full-time)

Join us as a software developer to grow a new product in the agricultural analytics field. Our client is a startup in the United States who is working on a plant identification system based on Ion Mobility Spectroscopy (IMS). We're looking for a lead programmer to work on scientific measurement workflow support systems, mobile applications and embedded system integration to pave the path from prototype to product.

What you'll do:

Develop lab workflow support and demonstrator applications

Evaluate technology and implement prototypes

Integrate systems with existing deployment infrastructure

Support data science development for measurement classification

What we expect from you:

A desire to program

Professional experience with functional programming

Be comfortable with multiple programming languages

Experience with Clojure and the JVM is a plus

Willingness to embrace XML

Knowledge of web technology and internet protocols

Experience with GIT, Linux shell

Ability to communicate efficiently with your colleagues in English (written and spoken)

What we offer:

A nice office in Berlin

International project setting in a dynamic market

A small, focused and experienced team

Lots of interesting technology to learn and use

Training seminars and conference visits

A competitive salary

About LambdaWerk

We are a software development shop specializing in the implementation of data processing systems for healthcare and insurance companies. We are owned by a major US player in the dental healthcare space and our systems play a crucial role in their day-to-day operations. This is a permanent, full-time position in Berlin, Germany, requiring on-site presence; we are not presently offering visa sponsorship. We are very interested in increasing the diversity of our team, so please do not hesitate to apply, especially if you're not a white male!

Get information on how to apply for this position.

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.