Dynamic prepared statements in Clojure

Recently, at Logic Soft, we put our first (small) Clojure project into production: a library management system custom developed for one of our existing clients.

At Logic Soft, we are primarily a .NET house, but we're beginning to experiment with different platforms and technologies, and this felt like a small enough project to give Clojure a shot (alongside other experiments we conducted with it).

This is our first time writing Clojure code at all, so the code is quite naive and very imperative in nature. Because Clojure's data structures are immutable by default, that was something we got for free with the choice of language itself.

One of the challenges we faced was dynamically creating prepared statements to query the DB. The requirement came from having to generate reports in the app with a variety of inputs. We didn't want to simply concatenate everything into a single query string, since that is a huge SQL injection risk; prepared statements are the way to go.

Also, we just went with plain old clojure.java.jdbc without any DSLs. We decided to use as few libraries as possible, stick to the absolute essentials, and write the major chunk of the code ourselves for the sake of experience.


Disclaimer: Since this is our first project, and my first time writing anything for production in Clojure, there might be a lot of mistakes. I am always looking to learn more, so please feel free to write in / tweet me with any constructive criticism and feedback.

Also this post is long because it has a lot of code in it.


Today I'd like to share how we went about constructing these dynamic prepared statements in Clojure.

Dependencies

I totally recommend the lein-try plugin by Ryan Neufeld to quickly try stuff out in the REPL before using it in the project itself.

These are my dependencies, both in lein-try format for you to throw onto the command line and in project.clj format.

lein-try

$ lein try org.clojure/java.jdbc "0.3.0" org.postgresql/postgresql "9.2-1003-jdbc4"

project.clj

:dependencies [;...
               [org.clojure/java.jdbc "0.3.0"]
               [org.postgresql/postgresql "9.2-1003-jdbc4"]]

Querying

Issuing queries against a DB with clojure.java.jdbc is really simple. Let us start off by requiring the clojure.java.jdbc namespace in the REPL:

(require '[clojure.java.jdbc :as j])

A map is used to define the connection parameters.

(def conn {:classname "org.postgresql.Driver"
           :subprotocol "postgresql"
           :subname "//localhost:5432/DB"
           :user "postgres"
           :password "postgres"})

With that defined, we can fire a query to get a list of authors from the DB:

(j/query conn
         ["select id, trim(name) as name from author order by id"])

To fetch a particular author, we use ? in the query, followed by the parameter in the vector we pass to the query function:

(j/query conn
         ["select id, trim(name) as name from author where id = ?" 1])

To insert a new author:

(j/insert! conn
           :author
           {:name "Some Author"})

You can find a lot more great documentation for clojure.java.jdbc in the API Reference or on the community-driven documentation site.

Understanding the case at hand

Let us consider a report that allows us to filter books based on the author, the publisher and parts of the title itself. The base query to get the columns is quite simple:

Note: for the sake of not diverging from the topic, let us not discuss the reasoning behind the schema.

select  i.isbn, 
        i.title, 
        a.name as author, 
        p.name as publisher

from item as i

inner join author as a 
  on i.author_id = a.id

inner join publisher as p 
  on i.publisher_id = p.id

where i.type = 1

If an Author is provided, the query needs to be appended with

and a.id = 1

If a Publisher is provided, the query needs to be appended with

and p.id = 2

If parts of a title are provided, say Art and Programming, then the query needs to be appended with

and upper(i.title) like '%ART%' and upper(i.title) like '%PROGRAMMING%'

Preparing the Prepared Statement

We now know that for substitution we just need ? placeholders in the query string and the arguments passed in the same order. So, with all the filters applied as in the last section, our call to query should look something like this:

(j/query conn
         ["select i.isbn, 
                  i.title,
                  a.name as author, 
                  p.name as publisher

          from item as i

          inner join author as a 
          on i.author_id = a.id

          inner join publisher as p 
          on i.publisher_id = p.id

          where i.type = 1
                and a.id = ?
                and p.id = ?
                and upper(i.title) like ?
                and upper(i.title) like ?"

          1 2 "%ART%" "%PROGRAMMING%"])

How hard can this be?

Since Clojure's core data structures are immutable, it took me a while to figure out how to do such a thing dynamically.

I took to expressing the construction of the prepared statement as a map, with a :query key to store the base query and a :params key to store a vector of params to be passed in, in order, for all the ?s in the :query.
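
For example, once the author filter and two title words have been applied, the map would look something like this (query shortened):

{:query "select ... where i.type = 1 and a.id = ? and upper(i.title) like ? and upper(i.title) like ?"
 :params [1 "%ART%" "%PROGRAMMING%"]}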

This map went into a let block in a function called filter-titles. All of the prepared statement construction had to happen within that let block, since that is one of the few places where you can bind local names in Clojure.

(defn filter-titles [author publisher title-parts]
  (let [ps-params {:query "select i.isbn, 
                                  i.title, 
                                  a.name as author, 
                                  p.name as publisher

                           from item as i

                           inner join author as a 
                           on i.author_id = a.id

                           inner join publisher as p 
                           on i.publisher_id = p.id

                           where i.type = 1"
                   :params []}

        ;...

Based on the presence of each filter, I had to append a piece of text to the query and an argument to the params vector.

For discussion purposes I've abstracted away the part that ensures author, publisher and title-parts each either have some value or are nil. This means that I can use a simple if to check for their presence and, if present, append and return a new ps-params.

To take the old ps-params and return a new one with the where clause appended to :query and the respective parameter appended to :params, I wrote a function. It takes the old ps-params, a where-clause to be appended and a where-clause-value to be substituted (if any), and returns a new map with the query and parameter appended:

(defn add-to-prepared-statement [prepared-statement
                                 where-clause
                                 where-clause-value]
  (let [{:keys [query params]} prepared-statement]
    {:query (str query " " where-clause)
     :params (if where-clause-value
               (conj params where-clause-value)
               params)}))

It first extracts the :query and :params from the given prepared statement

(let [{:keys [query params]} prepared-statement]

then returns a map with the where clause appended to :query

{:query (str query " " where-clause)

And if there is a where-clause-value, conj that into the :params

:params (if where-clause-value
         (conj params where-clause-value)
         params)}
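
A quick REPL-style example (with a trimmed-down query) shows what it returns:

(add-to-prepared-statement {:query "select * from item where type = 1"
                            :params []}
                           "and author_id = ?"
                           1)
;; => {:query "select * from item where type = 1 and author_id = ?"
;;     :params [1]}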

With the add-to-prepared-statement function in my arsenal, appending to ps-params became easier, and the let bindings of filter-titles could be extended with:

;...
ps-params (if author
            (add-to-prepared-statement ps-params
                                       "and a.id = ?"
                                       author)
            ps-params)
ps-params (if publisher
            (add-to-prepared-statement ps-params
                                       "and p.id = ?"
                                       publisher)
            ps-params)
;...

Okay! So far so good.

At this point, though, I hit another road block: based on the number of words in title-parts, I had to dynamically append that many and upper(i.title) like ? clauses to :query and that many values to :params.

I could've done this with a loop but decided that I wanted to do it functionally.

So first, I had to write a function to clean and split the given title-parts string

;; clojure.string needs to be required first, e.g.
(require '[clojure.string :as str])

(defn clean-and-split [title-parts]
  (-> title-parts
      (str/trim)
      (str/replace #"\ +" " ")
      (str/split #" ")))

For this, I first trim the string, replace multiple spaces with a single space, and split it at the space character to get a list of words. This function uses another neat Clojure feature called the thread-first macro (->), which I spoke a little about in my previous post about Clojure. It is something you should definitely check out.
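
For example:

(clean-and-split "  Art   Programming ")
;; => ["Art" "Programming"]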

Now that we have a list of words, we need to generate that many and upper(i.title) like ? strings to be appended to :query.

The way I chose to do this is with Clojure's repeat function, which returns a lazy (and infinite) sequence of the given value, or, if a length is specified, a sequence of that many occurrences of it.
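
For instance:

(repeat 2 "and upper(i.title) like ?")
;; => ("and upper(i.title) like ?" "and upper(i.title) like ?")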

(let [title-part-splits (clean-and-split title-parts)
      no-of-parts (count title-part-splits)
      title-part-queries (repeat no-of-parts "and upper(i.title) like ?")
      ;...

Now that I had a sequence containing as many clauses as I wanted appended to the query, I just used a simple reduce to join them all together.

(let [title-part-splits (clean-and-split title-parts)
      no-of-parts (count title-part-splits)
      where-clause (reduce #(str %1 " " %2)
                           (repeat no-of-parts 
                                   "and upper(i.title) like ?"))
      ;...

Before calling add-to-prepared-statement, I needed to prepend and append a % character to each part of the title and convert them to upper case. This was simple enough with map:

(let [title-part-splits (clean-and-split title-parts)
      no-of-parts (count title-part-splits)
      where-clause (reduce #(str %1 " " %2)
                           (repeat no-of-parts 
                                   "and upper(i.title) like ?"))
      where-clause-values (map #(str "%" (str/upper-case %) "%")
                               title-part-splits)
      ;...

Soon enough I hit the next road block with the add-to-prepared-statement function, since I had originally intended for it to take only a single where-clause-value. Now I had a sequence whose contents needed to be appended to the :params vector.

To do this, I changed the add-to-prepared-statement function to check whether the parameter provided was a single value or a sequential collection, and accordingly either conj it in or pour it in with into:

(defn add-to-prepared-statement [prepared-statement 
                                 where-clause 
                                 where-clause-value]
  (let [{:keys [query params]} prepared-statement
        new-query (str query " " where-clause)
        new-params (if where-clause-value
                     (if (sequential? where-clause-value)
                       (into params where-clause-value)
                       (conj params where-clause-value))
                     params)]
    {:query  new-query
     :params new-params}))

This was a great guide that helped me understand when to use coll?, sequential? and the other collection/sequence predicates.
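
A few REPL checks make the distinction clear:

(sequential? [1 2 3])                    ;; => true
(sequential? '("%ART%" "%PROGRAMMING%")) ;; => true
(sequential? #{1 2 3})                   ;; => false
(sequential? "a string")                 ;; => false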

With these modifications, I could now extend the let of the original filter-titles function

;...
ps-params (if title-parts
            (let [title-part-splits (clean-and-split title-parts)
                  no-of-parts (count title-part-splits)
                  where-clause (reduce #(str %1 " " %2)
                                       (repeat no-of-parts 
                                               "and upper(i.title) like ?"))
                  where-clause-values (map #(str "%" (str/upper-case %) "%")
                                           title-part-splits)]
              (add-to-prepared-statement ps-params
                                         where-clause
                                         where-clause-values))
            ps-params)
;...

By the end of this, ps-params had everything I needed to create the prepared statement to pass to jdbc's query function.

;...
prepared-statement (-> []
                       (conj (:query ps-params))
                       (into (:params ps-params)))
;...

Which really left the body of the filter-titles function as simple as:

(j/query conn
         prepared-statement)

With this approach, I was able to dynamically create the prepared statement I wanted without just appending everything into one single query string.

Here is the whole filter-titles function for completeness' sake:

(defn filter-titles [author publisher title-parts]
  (let [ps-params {:query "select i.isbn, 
                                  i.title, 
                                  a.name as author, 
                                  p.name as publisher

                           from item as i

                           inner join author as a 
                           on i.author_id = a.id

                           inner join publisher as p 
                           on i.publisher_id = p.id

                           where i.type = 1"
                   :params []}
        ps-params (if author
                    (add-to-prepared-statement ps-params
                                               "and a.id = ?"
                                               author)
                    ps-params)
        ps-params (if publisher
                    (add-to-prepared-statement ps-params
                                               "and p.id = ?"
                                               publisher)
                    ps-params)
        ps-params (if title-parts
                    (let [title-part-splits (clean-and-split title-parts)
                          no-of-parts (count title-part-splits)
                          where-clause (reduce #(str %1 " " %2)
                                               (repeat no-of-parts 
                                                       "and upper(i.title) like ?"))
                          where-clause-values (map #(str "%" (str/upper-case %) "%")
                                                   title-part-splits)]
                      (add-to-prepared-statement ps-params
                                                 where-clause
                                                 where-clause-values))
                    ps-params)
        prepared-statement (-> []
                               (conj (:query ps-params))
                               (into (:params ps-params)))]
    (j/query conn
             prepared-statement)))
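
For instance, a call like the one below (with made-up ids and the query shortened) ends up passing roughly this vector to j/query:

(filter-titles 1 nil "Art Programming")

;; prepared-statement:
;; ["select ... where i.type = 1 and a.id = ? and upper(i.title) like ? and upper(i.title) like ?"
;;  1 "%ART%" "%PROGRAMMING%"]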

Conclusion

This was the approach I took for the rest of the reports implemented in the system as well. Because of my history of mutating variables all over the place and not thinking functionally, it was hard to reason about this initially. But when it finally clicked, the feeling was priceless :)

My sincerest gratitude goes out to Ramki Sir and Dheeraj, who helped me pull this project together and see it to production. I've come to notice that people in the Clojure community are very forthcoming and helpful, and that means a lot to a beginner like me working on their first project.

Thank you, everyone! Hope this helped.


Displanting a Function OR Folding a Reverse

"Displant a town, reverse a prince's doom"
-- Shakespeare, Romeo and Juliet
Act III, Scene III, Line 60

Reverse


The interesting thing about the Reverse function is that it is not really doing anything. With a small clerical error in a visit-and-recombine function, you have reverse.

In Dr. Hutton's excellent paper, "A tutorial on the universality and expressiveness of fold", the following definition is given for reversing:

reverse :: [α] → [α]
reverse = fold (λx xs → xs ++ [x]) [ ]

We see that we concat the memoize with the member, in that order, thus reversing the collection.


Since I like advertising myself, let us go through an example with my name (someone has to advertise for me).


First time the Memoize has nothing and X has M.


Second time the Memoize has M and X is i.


Third time Memoize has i and M and X has k.


Fourth time Memoize has k, i, and M while X has e.


Leaving us with ekiM.

Let us look at some code examples.

Clojure




We see with the Clojure code that we are using the cons function to place the current member at the front of the memoize. We do this for the whole collection, thus giving us the collection in reverse.
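
A minimal sketch of the idea in Clojure, using reduce as the fold and keeping the memoize naming, might look like this:

(reduce (fn [memoize x] (cons x memoize)) '() "Mike")
;; => (\e \k \i \M)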

C#




With the C# code we see that we need to create something to contain the resulting collection, in this case we'll create a List.  We create the reversed collection in an immutable way by creating a new List every time in the lambda.

ECMAScript 2015




With JavaScript (ECMAScript 2015) we use the unshift method on the array. Since unshift returns the length of the array after the member is added to the head of the array (which I totally did not expect), we need to manually return the memoize after applying unshift.

Fin


There you have it again: yet another function which can be created using Fold, showing once again that all you need is Fold.


Using Bayes Theorem for NI Startup Probabilities (#startups #clojure #statistics)

This is probably as close to serious as I’ll ever get on the subject, so hold on to your hipster pork pie hats…. The title headings are based on a fairly common path for Northern Ireland startups, other territories will have their own methods I’m sure. Regardless, I need a picture….


The Odds Are Against You.

The harsh reality is that the odds are stacked against you succeeding. I'll be ultra liberal with my probabilities and say 4% (I should really be saying 2%, but it's a Bank Holiday weekend and I'm in a good mood and not my grumpy self). This number could be quantified by mining all the previous startups and seeing who lasted longer than three years, for example. So four in every hundred isn't a bad starting point. Let's call this our prior probability.

What we're trying to establish is: if an event happens during the startup journey, what does that do to the existing probability? The nice thing with Bayes, as you'll see, is that for every milestone event (or any event) we can re-run the numbers. Ready?

Wow! We Got Proof Of Concept 40k!

Current Prior Probability: 4%

Great news! The good folks at TechstartNI have shone the light on your idea and given you the clawback funds to build via a respected development house or developer. Will that have an effect on our posterior probability? It may do, but PoC is not a confirmation of your startup really, just access to build. What we can do, though, is use Bayes Theorem to recalculate the probability now that we have a new event to include.

Bayes In Clojure

So Bayes works on three inputs: the prior probability (in our case the 4% figure we started with), the probability of the event assuming the hypothesis holds (you'll last longer than three years), and the probability of the event assuming it doesn't (you won't last longer than three years).

Assuming that x is our prior, y is the positive event probability and z is the negative event probability, we can use the formula: (x * y) / ((x * y) + (z * (1 - x)))

If I were to code that up in Clojure it would look like this:

(ns very-simple-bayes.core)

(defn calculate-bayes [prior-prob newevent-positive-hypothesis newevent-negative-hypothesis]
  (double (/ (* prior-prob newevent-positive-hypothesis) 
             (+ (* prior-prob newevent-positive-hypothesis) 
                (* newevent-negative-hypothesis (- 1 prior-prob))))))

(defn calculate-bayes-as-percent [prior-prob newevent-positive-hypothesis newevent-negative-hypothesis]
  (* 100 (calculate-bayes prior-prob newevent-positive-hypothesis newevent-negative-hypothesis)))

The first function does the actual Bayes calculation and the second merely converts that into a percentage for me.

Right, back to our TechstartNI PoC. Let’s see how that affects our chances of survival.

Just because PoC funds give you some ground to build a product, it has little impact on the survival of the company as a whole. Being liberal again, let's say the positive impact on the hypothesis is 90% and the negative will be 10%. It can only be a good thing to have a product to sell.

very-simple-bayes.core> (calculate-bayes-as-percent 0.04 0.9 0.1)
27.272727272727277

While PoC has a huge effect on you getting a product out of the door (do I dare utter the letters M, V and P at this point?), it has little effect on your long-term survival. So your 4% chance of three-year survival has gone to 27.2%. A positive start, but all you have is a product.

Put the Champagne on ice just don’t open it…..

Propelling Forward

Current Prior Probability: 27.2%

The next logical step is to look at something like the Propel Programme to get you into the sales and marketing mindset, but also to make you "investor ready", which is what I see the real aim of Propel to be. So with the new event we can recalculate our survival probability. The 20k doesn't make a huge dent in your survival score, but it does help you get through, so I will take that into account.

As I've not experienced Propel first hand, it's unfair of me to say how things will pan out; you'll have to ask someone who's done it. It doesn't, though, stop me guessing some numbers out of the air to test against, and you should really do the same.

Propel will have a positive impact on your startup, no doubt, there’s a lot to learn and you’ll be in the same room as others going through the same process. The “up to” £20k is good to know but there’s no 100% certainty, apart from death and taxes, that you’ll get the full amount.

Propel’s positive probability on hypothesis: 40%

Propel’s false positive probability on the hypothesis: 80%

Running the numbers through Bayes again, let’s see what the new hypothesis probability is looking like.

very-simple-bayes.core> (calculate-bayes-as-percent 0.272 0.4 0.8)
15.74074074074074

That brought us back down to earth a bit: a 15.74% chance of a positive hypothesis. No reflection on Propel at all, that's just how the numbers came out. Now I could be all biased and say that if you do Propel you're gonna be a unicorn-hipster-star, but the reality is far from that.

The false positive is interesting: doing these things can sometimes fool the founder into thinking they're doing far better than they actually are. If 100 startups went through Propel and 40 are still trading today, then our positive event probability is right. And that's the nice thing about applying Bayes in this way: we can make some fairly reasoned assumptions that we can use to calculate our scores.

I’m Doing Springboard too Jase!

Okay! And the nice thing is these things can happen in parallel, but let's treat it as a sequential matter to preserve sanity in the probability.

Current Prior Probability: 15.74%

I think the same event +/- probabilities would apply here. Springboard is good for mentorship and contacts. Hand on heart, my gut thinks the numbers are going to be the same as Propel's for what we are looking at here.

Springboard positive hypothesis: 40%

Springboard as a false positive on hypothesis: 80%

Let’s run the numbers again (now you can see why I wrote a Clojure program first).

very-simple-bayes.core> (calculate-bayes-as-percent 0.1574 0.4 0.8)
8.54227721697601

Interestingly, according to the numbers, doing the two programmes has a negative impact on your startup if you have no revenue and no customers. Well, the numbers say so.

Getting That First Seed.

Current Prior Probability: 8.54%

Okay, so you've got your Proof of Concept in hand, grafted through Propel and then gone through Springboard until you get that nice picture on the website. "Investor Ready" is an odd term: markets can change, fashions come and go, and investors go looking for different things as time goes by. So all the while you've been slaving away, investors could be looking for something else.

So the opportunity arrives to pitch to one of the NI-based VCs/Angels for some "proper" money. Once again, a fairly normal route to go down. It could be a mixture of different funding sources (Crescent, Kernel or Techstart). If accepted, the goalposts change, as there are now people on the board and a results focus (i.e. are you hitting targets month on month).

The average figure for a seed round in NI is between £150-£300k, but I'll head for the upper figure. Regardless, money in the bank (even though it's not yours) is a good thing if you are prepared to give up some equity. Saying that, investments can go bad, so we need to yin and yang this out a bit. So I've put in a 20% chance of the investment being a false positive. Once again, if you had the term sheets of 20 companies you'd be able to do some maths yourself to get a better idea.

Seed round has positive outcome on hypothesis: 70%

Seed round is a false positive on hypothesis: 20%

Investment is good PR, and the hype cycle loves a good startup investment story. It opens up the doors to talking guff in far more places than you did before. How does that affect our probability though?

very-simple-bayes.core> (calculate-bayes-as-percent 0.0854 0.7 0.2)
 24.631231973629998

Positive indeed. From 4.0% to 24.6% is good: from a 1 in 25 to a 1 in 4 chance of lasting three years, though the majority of that increase hinged on getting investment in the latter stages. There's a chance that by this point you'd be two thirds of the way into the three-year plan.

Blessed Are The 2%

At the start I used 4% as a very optimistic probability of a startup lasting more than three years. I wonder what would happen if I went for a more realistic starting point of 2%?

Proof Of Concept Stage

very-simple-bayes.core> (calculate-bayes-as-percent 0.02 0.9 0.1)
15.517241379310345

Propel Stage

very-simple-bayes.core> (calculate-bayes-as-percent 0.1551 0.4 0.8)
8.40695972681446

Springboard Stage

very-simple-bayes.core> (calculate-bayes-as-percent 0.0840 0.4 0.8)
4.3841336116910234

Investor Stage

very-simple-bayes.core> (calculate-bayes-as-percent 0.0438 0.7 0.2)
13.817034700315457

So a 1 in 7 chance of you lasting 3 years….. You can finally put your "passion" line to bed as well.

Concluding…..

You can see why some folk decide to hold startup events instead: the risk is far lower and the chances of sponsorship are higher and recurring. Saying that, we know that those who want to work 17 hours a day by the seat of their knickers will continue to do so. Depending on your point of view, it is indeed easier to sell the shovels.

And As For That Champagne….

Marilyn drank it.

#171: Living Clojure, ClojureScript, and more with Carin Meier

Our guest this week is Carin Meier. She joins the show to talk about Clojure, ClojureScript, her book Living Clojure, and all the fun things she loves about math, physics, and creating a programming language.

Download: MP3 Audio

Show sponsors

  • Codeship – Get started for free, or use the code THECHANGELOGPODCAST to get 20% off ANY plan for 3 months
  • imgix – Real-time Image Processing. Resize, crop, and process images on the fly, simply by changing their URLs.
  • DigitalOcean – Use the code CHANGELOG to get a $10 hosting credit when you create your DigitalOcean account

Show notes and links


Subscribe to Changelog Weekly - our free weekly email covering everything that hits our open source radar.


The post #171: Living Clojure, ClojureScript, and more with Carin Meier appeared first on The Changelog.


Announcement - Our 'Storefront' is Open Source

We’re excited to have officially open sourced the front end of the Mayvenn shopping experience!

We've previously written about the architecture in our Transitions and Effects blog post, but now our whole application is available on GitHub. The readme also has a bit more description and explanation of how we thought to structure things. We hope it can serve as another reference point for what a full single-page app written in ClojureScript/Om could look like.

It includes integration with HTML5 History, Optimizely, Honeybadger, Yotpo and of course a bunch of ClojureScript libraries that make it all possible. We think these could be handy examples for similar integrations.

A quick note about the license:

We've put 'All Rights Reserved' on it since our images/styles/etc. are in the same repo as the source code and Mayvenn maintains all rights over these works (they may not be used/copied/modified without permission). Rather than come up with some sort of complicated license that protects these specific assets, we've marked the whole repo as 'All Rights Reserved'. We hope that this is still useful for seeing some of the patterns. It's certainly a departure from our other open source repositories and we hope not to have to do it again!


Practical Data Coercion with Prismatic/schema

If you follow me on Twitter, you probably know I’m a big fan of Prismatic’s schema library, which gives us a convenient way to validate and enforce the format of data in our Clojure applications. I use schema extensively both to provide some of the comfort / confirmation of a static type system, and to enforce run-time contracts for data coming off the wire.

But a problem quickly arises when we’re enforcing contracts over data drawn from an external format like JSON: the range of types available in JSON is limited compared to what we’re used to in Clojure, and might not include some of the types we’ve used in our schemas, leaving our schemas impossible to satisfy. Note that I’m not necessarily talking about anything exotic—simple things like sets, keywords, and dates are missing. The situation is even worse if we’re talking about validating command line parameters, where everything is a string regardless of if it logically represents a number, an enumeration value, or a URL.

What are we to do? Try to walk this data of unknown format, which is perhaps nested with optional components, transforming certain bits, and then running the result through our schema validation? That sounds ugly. And what do those error messages look like when it doesn’t match? Or we could validate that our (say) “date” parameters are present and are strings in a format that looks like we could parse it, then transform the data (which is at least in a known format now), and then validate it again? Obviously that’s less than ideal. And we’re going to end up with a proliferation of schemas which differ only in predictable ways—e.g. “params come in as a hash of two date-like strings, then get transformed to a hash of two dates”.

Fortunately for us, the fine folks at Prismatic must have run into this before we did, and thus they provided a fine solution in the form of schema-driven data transformations, which allow us to say “here are all the (consistent, well-defined) tricks you can use to beat this data into the right format—could you make it validate? And what did that resultant, valid data look like?” The official docs are good, and this blog post contains a wealth of information, but I found myself struggling to understand certain parts of the documentation until I’d read some of the implementation details and struggled through some coercion code of my own1. My goal here is to provide a practical example of how to use schema’s coercions so you can hit the ground running.

An Illustrative Example

Pretend we’re writing a command-line tool to download users' tweets and output them to a local archive in either plain text or JSON format. We’ll also allow setting a date indicating the earliest tweets we want to fetch in case we don’t need every tweet the users have ever written. Oh, and Ops wants our configuration to be done via JSON. Something, something, Docker.

Configuring the application is pretty simple: we’ll need a set of usernames (strings), a date, and a keyword indicating the output format:2

(ns camdez.blog.coerce
  (:require [clojure.data.json :as json]
            [clojure.instant :refer [read-instant-date]]
            [clojure.java.io :as io]
            [schema.coerce :as coerce]
            [schema.core :as s]
            [schema.utils :as s-utils])
  (:import java.util.Date))

(def Config
  {:users  #{s/Str}
   :after  (s/maybe Date)
   :format (s/enum :txt :json)})

If it seemed odd earlier when I said we might use types in our schema that JSON doesn’t support—why not just not do that?—hopefully this clears things up; everything used above is pretty standard Clojure data modeling—and none of it works in JSON. To illustrate, let’s try loading our config from a JSON-format file.

Here’s a fairly natural JSON representation of our configuration:

{
  "users": ["horse_ebooks", "swiftonsecurity"],
  "after": "2015-01-01T00:00:00.000Z",
  "format": "txt"
}

Let’s drop that into a config.json file, and then add a quick function to encapsulate the repetitive elements of what we’re going to cover:

(def config-file-name "config.json")

(defn load-config-file []
  (-> config-file-name
      (io/reader)
      (json/read :key-fn keyword)))

A First Attempt, Sans Coercion

Now what we want to do is to load our config from JSON, enforcing our Config schema. Here’s a first cut—and where we’ll see what the problem is:

(->> (load-config-file)
     (s/validate Config))

;; Value does not match schema: {:users (not (set?
;; a-clojure.lang.PersistentVector)), :after (not (instance?
;; java.util.Date a-java.lang.String)), :format (not (#{:txt :json}
;; "txt"))}

Oh, snap. Literally nothing about that worked. It seemed so simple, but none of our three map entries are valid. But note that all three of the validation errors are quite similar: :users is not a set because JSON doesn't have them, :after is not a date because JSON doesn't have them, and :format is not a keyword (belonging to the set we specified) because JSON doesn't have them. JSON simply isn't expressive enough to represent our config. What's a dev to do?

Let’s Get Coercive

This is where coercion comes in—we want to automatically transform the data based on the expectations of the schema. Logically we know that this process is going to require three things: (1) a schema, (2) a specification for how to transform, and (3) the data itself. Keep that in mind as you read the following code snippet:

(defn coerce-and-validate [schema matcher data]
  (let [coercer (coerce/coercer schema matcher)
        result  (coercer data)]
    (if (s-utils/error? result)
      (throw (Exception. (format "Value does not match schema: %s"
                                 (s-utils/error-val result))))
      result)))

(->> (load-config-file)
     (coerce-and-validate Config coerce/json-coercion-matcher))

(Don't worry too much about the if statement: the coercer returned by schema.coerce/coercer doesn't throw exceptions the way schema.core/validate does, so I've built a quick recreation of that functionality to maintain parity with the first example.)

Notice that we're now using schema.coerce/json-coercion-matcher, which gets passed to schema.coerce/coercer along with our Config schema. What we get back is a function we can apply to a piece of data to transform that data to match the schema, or return an error if it can't find a way to fulfill the transformation.

For the moment, just regard json-coercion-matcher as a magical black box of goodness (we’ll dive into matchers in the next section), but the important thing to understand is that it contains instructions for transforming data. This particular matcher is provided with the schema library, and it encapsulates several common JSON to Clojure transformations.
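
As a rough illustration (reusing the requires from the ns form above), a coercer built with json-coercion-matcher can turn a JSON string into a keyword, or a JSON array into a set, whenever the schema asks for one:

((coerce/coercer s/Keyword coerce/json-coercion-matcher) "txt")
;; => :txt

((coerce/coercer #{s/Str} coerce/json-coercion-matcher) ["a" "b" "b"])
;; => #{"a" "b"}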

Now when we try to load the config:

;; Value does not match schema: {:after (not (instance? java.util.Date
;; a-java.lang.String))}

This means that the json-coercion-matcher knew how to transform :users’s [Str] to a #{Str}, and :format’s Str to a keyword in the enumeration just based on the expectations laid out by our schema, and without us saying get the value at this key and transform it in that way. Awesome.

Two down, one to go.

Match Me Another

Since json-coercion-matcher didn't magically transform that date string into a Date for us, it's time to crack open the black box and learn how to write a matcher of our own. They're really not that complicated. Fundamentally, a matcher is a piece of code that's handed a single node in the tree of input data and the corresponding node in the schema.

A matcher used by schema.coerce/coercer will be applied to every node in the input, resulting in one of three possible outcomes:

  1. The matcher signals that it can’t be used here, based on the schema alone. In this case it fails fast, without even looking at the input data. (e.g. if a matcher only transforms to Dates, there’s no need to run the matcher unless the schema says we’re looking for a Date.)
  2. The matcher returns transformed data—it knew how to transform the input data, so it did so.
  3. The matcher returns the input data unchanged, effectively signaling that it doesn’t know how to transform the input data to the desired format. (Technically this is a subcase of transformation where the transformation is the identity function, but logically it’s a separate case.)

Keep those cases in mind as we look at some code:

(def datetime-regex #"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{3}Z")

(defn datetime-matcher [schema]
  (when (= Date schema)
    (coerce/safe
     (fn [x]
       (if (and (string? x) (re-matches datetime-regex x))
         (read-instant-date x)
         x)))))

Let’s break that down:

  • We’re actually expected to return either nil—this transformation doesn’t apply to this schema node (case #1 above)—or a closure (read: function) that attempts to transform the data. The when line says: if you’re not looking for a Date, this matcher can’t help you.
  • The if line checks the input value to make sure it’s something we can transform. We check that it’s a string, and not just any string, but a string in a format that we think we can parse. There’s no sense throwing every single string at read-instant-date—we just want to try to parse the ones that look like dates. If it looks like a date, we parse it (case #2). If it doesn’t, we return it unchanged (case #3).
  3. Just because a string matches our datetime-regex, that doesn't necessarily mean it can be parsed by read-instant-date3, and in these cases read-instant-date will throw; that's what coerce/safe is there for. This handy little utility will catch any exceptions and return the original input value unchanged (i.e. "it couldn't be transformed", case #3).
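
Put together, a coercer built from just this matcher should behave something like this (again assuming the requires and import from the ns form earlier):

((coerce/coercer Date datetime-matcher) "2015-01-01T00:00:00.000Z")
;; => #inst "2015-01-01T00:00:00.000-00:00"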

Not too bad, right? Cool. Now we can start putting it all together.

Keep in mind that we can’t just replace the original use of json-coercion-matcher with datetime-matcher or we’ll break the other two coercion cases we already fixed, so we’ll need to combine the two matchers:

(def config-matcher
  (coerce/first-matcher [datetime-matcher coerce/json-coercion-matcher]))

coerce/first-matcher is a matcher combinator: given a sequence of matchers, it returns a matcher that applies the first one that reports it matches. Keep in mind this is based on that initial, schema-only, sans-data check (case #1). Once we find a matcher that says it can produce the desired output type, we apply it and live with whatever we get back. This is sufficient for the majority of cases where you want to apply multiple matchers.

Finally, keep in mind that while I’ve named this config-matcher, there’s nothing about it that is specific to the particular Config schema that we’re using. It represents a generic set of rules about how to transform JSON (or other input) into Clojure data, and we might well apply it to all JSON our application handles.

Ok, let ‘er rip!

(->> (load-config-file)
     (coerce-and-validate Config config-matcher))

;; {:users #{"swiftonsecurity" "horse_ebooks"}
;;  :after #inst "2015-01-01T00:00:00.000-00:00"
;;  :format :txt}

Bingo! No validation errors, and all the configuration data we need with no manual data munging code.

Closing Remarks

Schema-driven transformations are super cool because they spare us from writing a whole bunch of fiddly, error-prone, repetitive code. They allow us to establish consistent data transformation rules that we can apply as narrowly or as widely as we like, and they provide schema-based mismatch errors that represent the end-to-end totality of the data transformation, unlike a validation at the border and another after a manual transformation step.

Keep in mind, this definitely isn’t just for config files. I think that is a useful, real-world scenario, but consider this approach any time you need to transform data to a well-structured format—a problem that nearly always arises when crossing boundaries from one data format (JSON, XML, CSV, YAML, edn, CLI params, envvars, DB data, etc.) to another. A particularly powerful case to consider is transforming web API parameters to application domain objects. That’s definitely a usage that I will be exploring more.
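
As a rough sketch of that idea (the SearchParams schema and the incoming params map are invented for illustration, and this leans on schema.coerce/string-coercion-matcher, which also knows how to coerce numbers and keywords from strings):

(def SearchParams
  {:page s/Int
   :tags #{s/Keyword}})

((coerce/coercer SearchParams coerce/string-coercion-matcher)
 {:page "2" :tags ["clojure" "schema"]})
;; => {:page 2, :tags #{:clojure :schema}}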

Thanks for reading! If you have any questions or great ideas, please feel free to leave a comment or hit me up on Twitter.


  1. In particular, while the docs discuss the basics of coercion, as well as the details of writing walkers, it wasn't immediately obvious how to add support for a non-core type, or how to augment the existing json-coercion-matcher with custom extensions.

  2. I’m also pulling in all of the dependencies we’ll need for the rest of the post so there’s no need to fuss with that later.

  3. Especially as I’ve been fairly lax about my regex. Consider cases like "2015-99-99T00:00:00.000-00:00".


Think About Length OR How Fold Can Do Everything, Even Count

"Leave nothing out for length, and make us think"
-- Shakespeare, Coriolanus
Act II, Scene II, Line 47

How Long is It?

I am not sure why but before I start reading a chapter or watching something I've recorded I almost always check to see how long it is.  Today's post will be using Fold to find the length of a collection.  In Dr. Hutton's excellent paper, "A tutorial on the universality and expressiveness of fold", the following definition is given for finding the length:

length :: [α] → Int
length = fold (λx n → 1 + n) 0

We see with this definition that we do nothing with the current member of the collection (x above) and instead only act on the memoize (n above).


In the simple example below we will see that the string "Mike" has 4 characters in it.

At the start we have a Seed of 0 and a collection with the members M, i, k, and e.


First time Memoize has 0 in it and X has M.


Second time Memoize has 1 in it and X has i.
Third time Memoize has 2 in it and X has the letter k.
Last time Memoize has 3 and X has e.
There you have it (maybe I should have used my sister Kim's name instead).  Let us see some code.

Clojure


We see in the Clojure example that the function we give reduce must take two parameters, so we call the second one _ to denote that it is not used.
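
A minimal sketch of that in Clojure:

(reduce (fn [memoize _] (inc memoize)) 0 "Mike")
;; => 4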

C#


With the C# code, the lambda we give Aggregate has two parameters, of which we name the second one _ to denote that it is not used.

JavaScript (ECMAScript 2015)


With ECMAScript 2015 we use lodash's foldl and see that the lambda only has to take one parameter, which is the memoize.

Fin

There you have it: counting with Fold, showing once again that all you need is Fold.


Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.