Elastisch 2.1.0-beta9 is released

TL;DR

Elastisch is a battle-tested, small but feature-rich Clojure client for ElasticSearch. It supports virtually every ElasticSearch feature and has solid documentation.

2.1.0-beta9 is a preview release of Elastisch 2.1 which introduces a minor feature.

Changes between Elastisch 2.1.0-beta8 and 2.1.0-beta9

Ability to Specify Aliases In index.create-template

clojurewerkz.elastisch.rest.index.create-template now supports the :aliases option:

(require '[clojurewerkz.elastisch.rest.index :as idx])

(idx/create-template conn "accounts" {:template "account*"
                                      :settings {:index {:refresh_interval "60s"}}
                                      :aliases  {:account-alias {}}})

Contributed by Jeffrey Erikson.

Changes between Elastisch 2.1.0-beta7 and 2.1.0-beta8

clj-http Update

The clj-http dependency has been upgraded to version 1.0.x.

Allow Retry On Conflict Option

Updates and upserts now allow the retry-on-conflict option to be set. This helps work around ElasticSearch version conflicts.

GH issue: #119.

Contributed by Michael Nussbaum (Braintree).

Changes between Elastisch 2.1.0-beta6 and 2.1.0-beta7

REST API Bulk Indexing Filters Out Operation Keys

clojurewerkz.elastisch.rest.bulk/bulk-index now filters out all operation/option keys so that they don’t get stored in the document body.

GH issue: #116.

Contributed by Michael Nussbaum (Braintree).

Full Change Log

Elastisch change log is available on GitHub.

Thank You, Contributors

Kudos to Michael Nussbaum and Jeffrey Erikson for contributing to this release.

Elastisch is a ClojureWerkz Project

Elastisch is part of the group of libraries known as ClojureWerkz, together with

  • Langohr, a Clojure client for RabbitMQ that embraces the AMQP 0.9.1 model
  • Monger, a Clojure MongoDB client for a more civilized age
  • Cassaforte, a Clojure Cassandra client
  • Titanium, a Clojure graph library
  • Neocons, a client for the Neo4J REST API
  • Welle, a Riak client with batteries included
  • Quartzite, a powerful scheduling library

and several others. If you like Elastisch, you may also like our other projects.

Let us know what you think on Twitter or on the Clojure mailing list.

About the Author

Michael on behalf of the ClojureWerkz Team

Permalink

Meltdown 1.1.0 is released

TL;DR

Meltdown is a Clojure interface to Reactor, an asynchronous programming, event passing and stream processing toolkit for the JVM.

1.1.0 is a minor release that updates Reactor to the most recent point release.

Changes between 1.0.0 and 1.1.0

Reactor Update

Reactor is updated to 1.1.x.

Change log

Meltdown change log is available on GitHub.

Meltdown is a ClojureWerkz Project

Meltdown is part of the group of libraries known as ClojureWerkz, together with

  • Langohr, a Clojure client for RabbitMQ that embraces the AMQP 0.9.1 model
  • Elastisch, a Clojure client for ElasticSearch
  • Monger, a Clojure MongoDB client for a more civilized age
  • Cassaforte, a Clojure Cassandra client
  • Titanium, a Clojure graph library
  • Neocons, a client for the Neo4J REST API
  • Quartzite, a powerful scheduling library

and several others. If you like Meltdown, you may also like our other projects.

Let us know what you think on Twitter or on the Clojure mailing list.

About the Author

Michael on behalf of the ClojureWerkz Team

Permalink

Analysis of the State of Clojure and ClojureScript Survey 2014

Yesterday, we posted the raw results from the 2014 State of Clojure and ClojureScript survey, where you can find not only the raw results but the methodology and other details of how it was conducted.  I want to first thank Chas Emerick for having launched the survey back in 2010 and repeating it through 2013, and for reaching out to Alex to run it this year when he could not.  

As always, the purpose of the survey is to shed some light on how and for what Clojure and ClojureScript are being adopted, what is going well and what could stand improvement.   

What's the overview?

We'll look at the individual questions below, but there are some demonstrable trends we can tease out of these responses.  

  1. Clojure (and ClojureScript) are seeing increasing use for commercial purposes, with noticeable growth on all measures where this survey tracks such things.  From use on commercial products and services, to "I use it at work", we're seeing strong positive movement.
  2. ClojureScript is coming along for the ride - even though it does not seem to have a substantial independent identity separate from Clojure, it is also seeing strong growth in commercial application.
  3. The community is adding new users faster each year, which could imply accelerating growth (though remember that this is not a scientific survey).

Let's look at the questions individually, first from the Clojure survey:

How would you characterize your use of Clojure/ClojureScript/ClojureCLR today?

The percentage of respondents using Clojure at work has nearly doubled since the 2012 survey (38% to 65%), which is the big news here.  This would seem to comport with the changes we see in the domains question, and is a continued sign of robust commercial adoption of the platform.

In which domains are you applying Clojure/ClojureScript/ClojureCLR?

Web development is still the top dog, and that helps explain the continued increase in usage of ClojureScript as well.  What is significant is the jump in respondents working on commercial products and services (from the low 20s to the low-to-mid 30s), while NoSQL and math/data analysis took a small tumble, essentially reversing positions with commercial development. Network programming is the only other thing to make a substantial move (dropping down about 10%).  Really the takeaway is that commercial development is gaining steadily, demonstrating continued growth of Clojure in commercial settings, with a quite dramatic increase from 2012 (12% and 14%, respectively).

How long have you been using Clojure?

While the answer distribution has remained largely the same, the relative growth of the "Months" response (moving up a slot) matches up with other metrics, like Kovas Boguta's post analyzing GitHub metrics, to show a continued picture of accelerating growth in the development community.   

Do you use Clojure, ClojureScript, or ClojureCLR?

The JVM platform clearly dominates, which is no shock.  There is no cross-over in the responses between Clojure and ClojureCLR. However, a quite impressive 54.9% of respondents are also using ClojureScript, which is a measurable increase since 2013, though there is still no significant sign in the survey results of a ClojureScript-only userbase.  This seems to imply a continuation of the theme from last time: ClojureScript is adopted (in growing numbers) by existing Clojure developers, not as an independent entity.

We have not previously had ClojureCLR explicitly included in this survey. The recent release of Arcadia (ClojureCLR + Unity gaming engine) may have spurred some recent interest in the ClojureCLR platform. Thanks as always to David Miller for his tireless efforts in this area.

What is your *primary* Clojure/Script/CLR dev environment?

Cursive (a Clojure IDE built on IntelliJ) is the big winner, jumping dramatically to second place.  Interesting that Light Table saw absolute growth in both respondents and percentage, but still fell a spot due to Cursive's massive growth. While Emacs continues to dominate, it is great to see a vibrant collection of options here, to suit every developer's and team's needs.

Which versions of Clojure do you currently use in production or development?

Great to see that 1.6 dominates and it would seem that everyone is able to keep up with the new releases.  A full 18% are using the 1.7 alphas in production or development already.

What version of the JDK do you target?

These answers comport with other survey results recently released, showing a rapid uptake of 1.8 at 49% of respondents.  1.7 is still the most common platform at 70%, and 1.6 is slowly fading at only 14%. Last year, 1.6 was still 19% of the sample, while 1.8 was a mere 5%.  

What tools do you use to compile/package/deploy/release your Clojure projects?
 
Leiningen would now appear to be ubiquitous, with a whopping 98% of respondents using it (up from 75% last year).  There isn't a significant change anywhere else.

What has been most frustrating for you in your use of Clojure?

There has been remarkably little motion in this list over the years.  Staffing concerns, which jumped all the way to #2 last year, fell a spot this year, falling behind documentation/tutorials.  It is interesting to note that #4, finding editing environments, remains steady even though there have been dramatic shifts and growth within the editor responses.  Otherwise, it is hard to see any new trends here. Congratulations to everyone for "unpleasant community interactions" continuing to come in dead last.

Next, let's look at the ClojureScript survey

How would you characterize your use of ClojureScript today?

Once again, we see a dramatic jump in usage at work - from 28% to 49% in just the last year. Serious hobby also climbed roughly 20%.  It would appear that the rising tide floats all boats, as the entire Clojure ecosystem is seeing growth in commercial development use.  

Which JavaScript environments do you target?

Browsers are now ubiquitous, being targeted by 97% of the community, with everything else on the list trailing far behind.

Which ClojureScript REPL do you use most often?

Chas was quite distressed last year to note that more than a fourth of the respondents used no REPL at all.  This year, that number is almost a third. On the other hand, Austin took a major jump all the way to #2 at 22%, probably due to Chas's commitment to it after last year's survey.  Light Table also came from literally nowhere (not named on last year's survey) to occupy the third spot.  In fact, it wasn't even included in the original responses until a bunch of people requested it be added, so it might be under-represented in the list. So even though more people than ever are using no REPL at all, the options seem to be growing.

What has been most frustrating for you in your use of CLJS?

Through the change in the question style, we can get a better picture of the real answers here.  The difficulty of using a REPL jumps from 14% in 2013 to a whopping 68% this year, while debugging generated JavaScript rises from 14% to 43%.  This is a much better window into how much pain those two items cause the community.  It is impossible to tell because of this change in survey methodology, but the addition of CLJS source map support has most likely made that issue less difficult for many.

The Takeaway

While not a scientific survey, with five years of data it is really possible to spot trends.  Clojure has clearly transitioned from exploratory status to a viable, sustainable platform for development at work.  And as the community continues to add new users at an accelerating pace, we should only expect that trend to continue. 

Our thanks to everyone who took the time to fill out the survey - this kind of sharing is invaluable to the wider community. 

Permalink

Welcome to The Dark Side: Switching to Emacs

I have to start this post by saying I’ve been a dogmatic Vim partisan since the 1990’s, when I started using vi on the Solaris and Irix boxen I had access to, and then on my own machines when I got started with Linux in 1994.

I flamed against Emacs on Usenet, called it all the epithets (Escape Meta Alt Ctrl Delete, Eight Megs And Constantly Swapping (8 megs was a lot then), Eventually Mangles All Computer Storage)… I couldn’t stand the chord keys and lack of modality.

Even once I got heavily into Lisp I still tried to stick with Vim, or tried LightTable, or Atom, or SublimeText. But then one day I hit a wall and Emacs (plus cider-mode and slime and a few other packages) was the obvious solution. Now I’m out there evangelizing Emacs (I’m writing this post in the Markdown major mode plus some helpful minor modes) and I figure I’d offer some advice for those looking to convert to the Church of Emacs.

St. Ignucius

Primarily, this post is inspired by a request I received on Twitter:

Instead of just compiling some links in a gist, I figured it was worthy of a blog post, so my seniors in the Church of Emacs can tell me where I’m wrong in the comments. But this is based on my experience converting from Vim to Emacs, so I’ll explain what worked for me.

Emacs Prelude

Prelude is really a great way to hit the ground running. It provides a wealth of sensible default packages, fixes the color scheme, and configures your .emacs.d config directory in a way that makes it easy to configure without breaking shit.

The install instructions are here and I highly recommend it.

UPDATE: I forgot something vitally important about prelude. Prelude comes with guru-mode enabled by default, which disables your arrow keys and prods you to use Emacs default navigation commands instead (i.e. C-p for up, C-n for down, C-b for left, C-f for right). These commands are worth knowing, but I felt like I was being trolled when my arrow keys just told me what chord combination to use instead. (As an aside, Thoughtbot’s dotfiles do the same thing with vim).

So you have two options: one is to M-x guru-mode to toggle it every session. The more permanent solution is to add the following to your config (if you’re using Prelude, it should go in ~/.emacs.d/personal/preload/user.el):

(setq prelude-guru nil)

Just my personal preference, but something I found really annoying when I got started.

As far as all those useful navigation and editing commands, emacs (naturally) has a built-in tutorial accessible from M-x help-with-tutorial or just C-h t.

UPDATE TO THE UPDATE:

Bozhidar Batsov (the author of Prelude) pointed out in this comment that the current default behavior is to warn when arrow keys are used, not to disable them.

I hadn’t noticed the change, which came in with this commit.

You can find the configuration options for guru-mode in the README here.

Emacs for Mac OS X

I really like using the packaged app version of Emacs available from http://emacsformacosx.com/. It works great with Prelude, and doesn’t include the cruft that Aquamacs tacks on to make it more Mac-ish.

You get a nice packaged Emacs.app that follows OS X conventions, but is really just straight GNU Emacs.

evil-mode

So, this is a touchy subject for me. When I first switched I used evil-mode to get my familiar Vim keybindings in emacs, but I actually found it made it more difficult to dive into emacs. Evil-mode is actually impressively complete when it comes to imposing vim functionality on top of emacs, but there are still times when you need to hit C-x k or M-x something-mode, and the cognitive dissonance of switching between them was just overwhelming.

So I’d forego evil-mode and just keep Emacs Wiki open in your browser for the first few days. It doesn’t take that long to dive in head-first.

Projectile

It ships with Prelude, so not a major headline, but it does help to keep your projects organized and navigate files.

On Lisp

Since this is really about Clojure development environments, I might as well dive into the inherent Lispiness of emacs. The extension language is a Lisp dialect, and very easy to learn and use. Emacs is so extensible that one of the running jokes is that it’s a great operating system in need of a decent text editor. I’ll get to that later.

cider-mode

Interacting with Clojure is amazing with cider. You get an in-editor REPL, inline code evaluation, documentation lookup, a scratch buffer for arbitrary code evaluation, and a dozen other features. LightTable is nice with its InstaRepl but emacs/cider is the real deal. You cannot wish for a better Clojure dev environment… and the community agrees:

cider-jack-in connects to a lein repl :headless instance, and cider-mode gives you inline evaluation in any Clojure file. It’s amazing.

paredit and smartparens

Ever have trouble keeping your parens balanced? You’re covered. paredit is the classic solution, but a lot of folks are using smartparens instead… I’ve been using smartparens in strict mode and it’s made me a lot more disciplined about how I place my forms.

Other Languages

I’ve been using Emacs for Ruby, Javascript, Haskell, C++, and so on, and it’s been great. The only time I launch another app is when I have to deal with Java, because IntelliJ/Android Studio make life so much easier. But most of that is all the ridiculous build ceremony for Java, so that’s neither here nor there.

EmacsOS

That joke about Emacs being an operating system? Not such a joke.

My favorite Twitter client right now is Emacs twittering-mode. There’s Gnus for Usenet and Email, and Emacs 24.4 just came out with an improved in-editor web browser called eww.

Emacs is a deep, deep rabbit hole. The only way in is head first. But there’s so much you can do in here, and it’s a staggeringly powerful environment.

Welcome to the dark side. We have macros.

Dark Side

Permalink

Clojure Data Science: Sent Counts and Aggregates


This is Part 3 of a series of blog posts called Clojure Data Science. Check out the previous post if you missed it.


For this post, we want to generate some summaries of our data by doing aggregate queries. We won’t yet be pulling tools like Apache Storm into the mix, since we can accomplish this through Datomic queries. We will also talk about trade-offs of running aggregate queries on large datasets and devise a way to save our data back to Datomic.

Updating dependencies

It has been some time since we worked on autodjinn. Libraries move fast in the Clojure ecosystem, and we want to make sure that we’re developing against the most recent versions of each dependency. Before we begin making changes, let’s update everything. If you have already read my [Clojure Code Quality Tools](/blog/2014/09/15/clojure-code-quality-tools/) post, you’ll be familiar with the lein ancient plugin.

Below is output when I run lein ancient on the last post’s finished git tag, v0.1.1. To go back to that state, you can run git checkout v0.1.1 on the autodjinn repo.

It looks like our nomad dependency is out of date. Update the version number in project.clj to 0.7.0 and run lein ancient again to verify that it worked.

If you take a look at project.clj yourself, you may notice that our project is still on Clojure 1.5.1. lein ancient doesn’t look at the version of Clojure that we’re specifying; it assumes you have a good reason for picking the Clojure version you specify. In our case, we’d like to be on the latest stable Clojure, version 1.6.0. Update the version of Clojure in project.clj and then run your REPL. There should be no issues with using the functionality in the app that we created in previous posts. If there are, carefully read the error messages and try to find a solution before moving on.

To save on the hassle of upgrading, I have created a tag for the project after upgrading Clojure and nomad. To go to that tag in your local copy of the repo, run git checkout v0.1.2.

Datomic query refresher

If you remember back to the first post, we wrapped up by querying for entity IDs and then using Datomic’s built-in entity and touch functions to instantiate each message with all of its attributes. We had to do this because the query itself only returned a set of entity IDs:

Note that the Datomic query is made up of several parts:

  • The :find clause says what will be returned. In this case, it is the ?eid variable for each record we matched in the rest of the query.
  • The :where clause gives a condition to match. In this case, we want all ?eid where the entity has a :mail/uid fact, but we don’t care about the :mail/uid fact’s value, so we give it a wildcard with the underscore (_).

We could pass in the :mail/uid we care about, and only get one message’s entity-ID back.

Notice how the ?uid variable gets passed in with the :in clause, as the third argument to d/q?

Or we could change the query to match on other attributes:

In all these cases, we’d still get the entity IDs back because the :find clause tells Datomic to return ?eid. Typically, we pass around entity IDs and lazy-load any facts (attributes) that we need off that entity.

But, we could just as easily return other attributes from an entity as part of a query. Let’s ask for the recipients of all the emails in our system:

While it is less common to return only the value of an entity’s attribute, being able to do so will allow us to build more functionality on top of our email abstraction later.

One last thing. Take a look at the return of that query above. Remember that the results returned by a Datomic query are a set. In Clojure, sets are a collection of unique values. So we’re seeing the unique list of addresses that are in the To: field in our data. What we’re not seeing is duplicate recipient addresses. To be able to count the number of times an email address received a message, we’ll need a list with non-unique members.
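A quick pure-Clojure illustration of that collapsing behavior (the addresses here are hypothetical):

```clojure
;; Datomic query results come back as a set, and sets keep only unique
;; values -- the same collapsing happens when we pour a seq into a set:
(def to-addresses ["a@example.com" "b@example.com" "a@example.com"])

(count to-addresses)       ;; => 3
(count (set to-addresses)) ;; => 2, the duplicate address is gone
```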

Datomic creates a unique set for the values returned by a query. This is generally a great thing, since it gets around some of the issues that one can run into with JOINing in SQL. But in this case, it is not ideal for what we want to accomplish. We could try to get around the uniqueness constraint on output by returning vectors of the entity ID and the ?to address, and then mapping across the result to pull out the second item:

There’s a simpler way that we can use in the Datomic query. By keeping it inside Datomic, we can later combine this approach with more-complex queries. We can tell the Datomic query to look at other attributes when considering what the unique key is by passing the query a :with clause. By changing our query slightly to include a :with clause, we end up with the full list of recipients in our datastore:

At this point, it might be a good idea to review Datomic’s querying guide. We’ll be using some of the advanced querying features found in the later sections of that guide, most notably aggregate functions.

Sent Counts

For this feature, we want to find all the pairs of from-to addresses for each email in our datastore, and then sum up the counts for each pair. We will save all these sent counts into a new entity type in Datomic. This will allow us to ask Datomic questions like who sends you the most email, and who you send the most email to.

We start by building up the query in our REPL. Let’s start with a simpler query, to count how many emails have been sent to each email address in our data store. Note that this isn’t sufficient to answer the question above, since we won’t know who those emails came from; they could have been sent by us or by someone else, or they could have been sent to us. Later, we’ll make it work with from-to pairs that allow us to know things like who is sending email to us.

A simple way to do this would be to wrap our previous query in the frequencies function that clojure.core provides. frequencies returns a map of items with their counts from a Clojure collection.
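For instance, with a couple of hypothetical addresses:

```clojure
;; frequencies turns a collection into a map of item -> occurrence count.
(frequencies ["a@example.com" "b@example.com" "a@example.com"])
;; => {"a@example.com" 2, "b@example.com" 1}
```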

However, we want to perform the same sort of thing in Datomic itself. To do that, we’re going to need to know about aggregate functions. Aggregate functions operate over the intermediate results of a Datomic query. Datomic provides functions like max, min, sum, count, rand (for getting a random value out of the query results), and more. With aggregates, we need to be sure to use a :with clause to ensure we aggregate over all our values.

Looking at that short list of aggregate functions I’ve named, we can see that we probably want to use the count function to count the occurrence of each email address in a to field in our data. To see how aggregates work, I’ve come up with a simpler example (the only new thing to know is that Datomic’s Datalog implementation can query across Clojure collections as easily as it can against a database value, so I’ve given a simple vector-of-vectors here to describe data in the form

[database-id person-name]

When the query looks at records in the data, our :where clause gives each position in the vector an id and a name based on position in the vector.)

Let’s review what happened there. Before the count aggregate function was applied, our results looked like this:

[["Jon"] ["Jon"] ["Bob"] ["Chris"]]

So the count function just counts across the values of the variable it is passed (in our case, ?name), and by pairing it with the original ?name value, we get each name and the number of times it appears in our dataset.
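For comparison, the plain-Clojure equivalent of that aggregate is again frequencies, applied to the name column:

```clojure
;; The intermediate results before the count aggregate is applied:
(def names (map first [["Jon"] ["Jon"] ["Bob"] ["Chris"]]))

;; Counting each name, like pairing ?name with (count ?name):
(frequencies names)
;; => {"Jon" 2, "Bob" 1, "Chris" 1}
```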

It makes sense that we can do the same thing with our recipient email addresses from the previous query. Combining our previous queries with the count aggregate function, we get:

That looks like the same kind of data we were getting with the use of the frequencies function before! So now we know how to use a Datomic aggregate function to count results in our queries.

What’s next? Well, what we really want is to get results that are of the form

[from-address to-address]

and count those tuples. That way, we can differentiate between email sent to us versus email we’ve sent to others, etc. And eventually, we’d like to save those queries off as functions that we can call to compute the counts from other places in our project.

We can’t pass a tuple like [from-address to-address] to the count aggregate function in one query. The way around this is to write two queries. The inner query will return the tuples, and the outer query will return the tuple and a count of the tuple in the output data. Since the queries run on the peer, we don’t really have to worry about whether it is one query or two, just that it returns the correct data at the end.
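Outside of Datomic, the pair counting we are after can be sketched in plain Clojure with frequencies over [from to] tuples (the data here is hypothetical; the real implementation uses the nested queries described above):

```clojure
(def messages
  [{:mail/from "alice@example.com" :mail/to "bob@example.com"}
   {:mail/from "alice@example.com" :mail/to "bob@example.com"}
   {:mail/from "bob@example.com"   :mail/to "alice@example.com"}])

;; Build a [from to] tuple per message, then count each distinct pair.
(def sent-counts
  (frequencies (map (juxt :mail/from :mail/to) messages)))

sent-counts
;; => {["alice@example.com" "bob@example.com"] 2,
;;     ["bob@example.com" "alice@example.com"] 1}
```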

So what would the inner query look like? Remember that the outer query will still need a field to pass to the :with clause, so we’ll probably want to pass through the entity ID.

Those tuples will be used by our outer query. However, we also need a combined value for the count to operate on. For that, we can throw in a function call in the :where clause and give it a binding at the end for Datomic to use for that new value. In this case, I’ll combine the ?from and ?to values into a PersistentVector that the count aggregate function can use. The combined query ends up looking like this:

And the output is as we expect.

Reusable functions

The next step is to turn the query above into various functions we can use to query for from-to counts later. In our data, we don’t just have recipients in the To: field, we also have CC and BCC recipients. Those fields will need their own variations of the query function, but since they will share so much functionality, we will try to compose our functions in such a way that we avoid duplicate code.

In general, when I write query functions for Datomic, I use multiple arities to always allow a database value to be passed to the query function. This can be useful, for example, when we want to query against previous (historical) values of the database, or when we want to work with a particular database value across multiple queries, to ensure our data is consistent and doesn’t change between queries.

Such a query function typically looks like this:

By taking advantage of multiple arities, we can default to not having to pass a database value into the function. But in the cases where we do need to ensure a particular database version is used, we can do that. This is a very powerful idiom that I’ve learned since I began to use Datomic, and I suggest you structure all your query functions similarly.
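The arity pattern itself can be illustrated with plain Clojure, using an in-memory stand-in for the database (get-db and the data shape here are hypothetical; the real function body would call d/q against a Datomic database value):

```clojure
;; An in-memory stand-in for a Datomic database value.
(def current-db
  [{:mail/to "bob@example.com"}
   {:mail/to "alice@example.com"}])

(defn get-db
  "Stand-in for fetching the current database value."
  []
  current-db)

(defn to-addresses
  ;; Zero-arity: default to the current database value.
  ([] (to-addresses (get-db)))
  ;; One-arity: accept an explicit database value, which with Datomic
  ;; lets a caller pin one consistent value across several queries.
  ([db] (set (map :mail/to db))))

(sort (to-addresses))
;; => ("alice@example.com" "bob@example.com")
```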

Now, let’s take that function that only queries for :mail/to addresses and make it more generic, with specific wrapper functions for each case where we’d want to use it:

Note that we had to change the inner query to take the attr we want to query on as a variable; this is the proper way to pass a piece of data into a query we want to run. The $ that comes first in the :in clause tells Datomic to use the second d/q argument as our dataset (the db value we pass in), and the ?attr tells it to bind the third d/q argument as the variable ?attr.

While the three variations on functions are similar, we keep the code DRY. (DRY is an acronym for Don’t Repeat Yourself.) In the long run, less code should mean fewer bugs and the ability to fix problems in one place.

Building complex systems by composing functions is one of the features of Clojure that I enjoy the most! And notice how we got to these finished query functions by building up functionality in our REPL: another aspect of writing systems in Clojure that I appreciate.

Querying against large data sets

Right now, our functions calculate the sent counts across all messages every time they’re called. This is fine for the small sample dataset I’ve been working with locally, but if it were to run against the 35K+ messages that are in my Gmail inbox alone (not to mention all the labels and other places my email lives…) it would take a very long time. With even bigger datasets, we can run into an additional problem: the results may not fit into memory.

When building systems with datasets big enough that they don’t fit into memory, or that may take too much time to compute to be practical, there are two general approaches that we will explore. The first is storing results as data (known as memoizing or caching the results), and the other is breaking up the work to run on distributed systems like Hadoop or Apache Storm.

For this data, we only want to avoid redoing the calculation every time we want to know the sent counts. Currently, the data in our system changes infrequently, and it’s likely that we could tell the system to recompute sent counts only after ingesting new data from Gmail. For these reasons, a reasonable solution will be to store the computed sent counts back into Datomic.

A new entity type to store our results

For all three query functions we wrote, each result is of the form:

[from-address to-address count]

Let’s add to the Datomic schema in our core.clj file to create a new :sent-count entity type with these three attributes. Note that sent counts don’t really have a unique identifier of their own; it is the combination of from -> to addresses that uniquely identifies them. However, we will leave the from and to addresses as separate fields so it is easy to use them in queries.

Add the following maps to the schema-txn vector:

You’ll have to call the update-schema function in your REPL to run the schema transaction.

Something that’s worth calling out is that we’re using a Datomic schema valueType that we haven’t seen yet in this project: db.type/ref. In most cases, you’d want to use the ref type to associate with other entities in Datomic. But we can also use it to associate with a given list of facts. Here, we give the ref type an enum of the possible values that :sent-count/type can have: to, cc, and bcc. By adding this type field to our new entities, we can either choose to look at sent counts for only one type of address, or we can sum up all the counts for a given from-to pair and get the total counts for the system.
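A sketch of what those attribute maps might look like (the attribute and enum names other than :sent-count/type are assumptions based on the description above; a real Datomic schema transaction would also carry :db/id and :db.install/_attribute entries, as in the rest of the project's schema):

```clojure
;; Hypothetical schema sketch for the :sent-count entity type.
(def sent-count-schema
  [{:db/ident       :sent-count/from
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/doc         "Sender address for this from->to pair"}
   {:db/ident       :sent-count/to
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/doc         "Recipient address for this from->to pair"}
   {:db/ident       :sent-count/count
    :db/valueType   :db.type/long
    :db/cardinality :db.cardinality/one
    :db/doc         "Number of messages sent from -> to"}
   {:db/ident       :sent-count/type
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/one
    :db/doc         "Which header the count came from: to, cc, or bcc"}
   ;; The enum values that :sent-count/type can point at:
   {:db/ident :sent-count.type/to}
   {:db/ident :sent-count.type/cc}
   {:db/ident :sent-count.type/bcc}])
```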

Our next job is to add some functions to create the initial sent counts data, as well as to query for it. To keep things clean, I created a sent-counts namespace for these functions to live in. I’ve provided it below with minimal explanation, since it should look very familiar to what we’ve already done.

/src/autodjinn/sent_counts.clj

After adding in the sent_counts.clj file, running:

(sent-counts/create-sent-counts)

will populate your datastore with the sent counts computed with functions we created earlier.

Note: The sent counts don’t have any sort of unique key on them, so if you run create-sent-counts multiple times, you’ll get duplicate results. We’ll handle that another time when we need to update our data.

Wrapping up

We’ve covered a lot of material on querying Datomic. In particular, we used aggregate functions to get the counts and sums of records in our data store. Because we don’t want to run the queries all the time, we created a new entity type to store our sent counts and saved our data into it. With query functions like those found in the sent-counts namespace, we can start to ask our data questions like “In the dataset, what address was sent the most email?”
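As a hypothetical example of asking that question (using the assumed attribute names from the sketch above, which may differ from the real autodjinn schema), a Datalog query could sum the counts per recipient:

```clojure
;; Hypothetical query; :sent-count/to and :sent-count/count are
;; assumed attribute names, not taken from the autodjinn repo.
(require '[datomic.api :as d])

;; :with ?e keeps each sent-count entity distinct so identical
;; counts for the same address are not collapsed before summing.
(d/q '[:find ?to (sum ?count)
       :with ?e
       :where [?e :sent-count/to ?to]
              [?e :sent-count/count ?count]]
     (d/db conn))
```

Taking the pair with the largest sum then gives the most-emailed address.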

If you want to compare what you’ve done with my version, you can run git diff v0.1.3 on the autodjinn repo.

Please let me know what you think of these posts by sending me an email at contact@mattgauger.com. I’d love to hear from you!

Permalink

Some syntactic sugar for Clojure's threading macros

TL;DR: threading all the things

Introduction

The other day I was writing a Clojure let block to transform a map. It was a pretty usual Clojure pipeline of functions, a use case Clojure excels at. The pipeline included a cond, a couple of maps, some of my own functions, and finally an assoc and dissoc to “update” the input map with the result of the pipeline and delete some redundant keys.

Even though Clojure syntax is quite spare, there was quite a bit of inevitable clutter in the code, and it struck me that the code would be cleaner and clearer if I could use the thread-first (->) macro.

If you grok macros you can probably guess the rest of this post (likely you will have seen the title and probably said to yourself “Oh yeah, that’s obvious, nothing to see here” and moved along ☺ )

Threading Macros

Threading Macros - “thread first” and “thread last”

Clojure core has a family of threading macros including the “thread first” ones ->, some->, as->, and cond->, and their equivalent “thread last” (->>) ones.

I’m not going to explain the threading macros in depth as these have been well covered already - see for example this very nice post by Debasish Ghosh (btw Debasish’s book DSLs in Action is worth your money).

Simply put: the threading macros allow a pipeline of functions to be written in a visually clean and clear way, avoiding a perhaps deeply nested inside-to-outside functional form, by weaving the result of the previous function into the current function as either the first (“thread first”) or last (“thread last”) argument.
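A quick way to see this weaving at work is to macroexpand a small pipeline (these use only clojure.core):

```clojure
;; "thread first" inserts the value as the first argument of each step...
(macroexpand-1 '(-> m (assoc :a 1)))
;; => (assoc m :a 1)

;; ...while "thread last" inserts it as the last argument
(macroexpand-1 '(->> coll (map inc) (filter odd?)))
;; => (filter odd? (map inc coll))
```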

The example below, which uses “thread last” to sum the balances of a bank’s savings accounts, is taken from Debasish’s post. I have reworked his example slightly and added some narrative comments to make it completely clear what is going on:

;; Debasish's example slightly reworked
;; For original see http://debasishg.blogspot.co.uk/2010/04/thrush-in-clojure.html

(def all-accounts
  [{:no 101 :name "debasish" :type 'savings :balance 100}
   {:no 102 :name "john p." :type 'checking :balance 200}
   {:no 103 :name "me" :type 'checking :balance -500}
   {:no 104 :name "you" :type 'savings :balance 750}])

(def savings-accounts-balance-sum-using-thread-last
  (
   ;; use the thread-last macro
   ->>

   ;; ... and start from the collection of all accounts
   all-accounts

   ;; ... select only the savings accounts
   (filter #(= (:type %) 'savings))

   ;; ... get the balances from all the saving accounts
   (map :balance)

   ;; ... and add up all their balances
   (apply +)))

(doall (println "savings-accounts-balance-sum-using-thread-last" savings-accounts-balance-sum-using-thread-last))

;; check the answer
(assert (= 850 savings-accounts-balance-sum-using-thread-last))

Threading Macros - what’s not to like?

Nothing, really, although there are limitations of course. For example, threading through map requires “thread last” while assoc requires “thread first”, and you can’t mix first and last together directly.

Although “thread first” and “thread last” cover a wide range of use cases, there are times when you have to jump through hoops to incorporate code that needs the current value of the pipeline as something other than the first or last argument, or that needs to use the value multiple times in multiple subforms.

Threading Macros - using a partial

There are ways around the limitations, of course, and one is to use partial with “thread first” so that the threaded value ends up as the last argument to the function.

This horrid example using “thread first” with lots of partials is bonkers but does demonstrate the point:

;; Using partials with thread-first

(def savings-accounts-balance-sum-using-thread-first
  (
   ;; use the thread-first macro
   ->

   ;; ... and start from the collection of all accounts
   all-accounts

   ;; ... select only the savings accounts
   ((partial filter #(= (:type %) 'savings)))

   ;; ... get the balances from all the saving accounts
   ((partial map :balance))

   ;; ... and add up all their balances
   ((partial apply +))))

(doall (println "savings-accounts-balance-sum-using-thread-first" 
                savings-accounts-balance-sum-using-thread-first))

;; check the answer
(assert (= 850 savings-accounts-balance-sum-using-thread-first))

Note that each partial call is the first (and only) form inside another form; the “thread first” macro weaves the threaded value in as the second element of that outer form, so it becomes the argument to the partial. (Otherwise the macro would weave the previous result into the partial declaration itself.)

Threading Macros - using an in-line function

More generally, you can always escape the confines of the first or last constraint by using an in-line function.

The following example sums the balances of all the checking accounts in deficit, applying 10% interest, to find the total owed to the bank. It uses an in-line function to apply the interest.

;; Calculate the total balance of all the checking accounts in deficit

;; applies interest to any in deficit

(def deficit-accounts-balance-sum-using-interest-function
  (
   ;; use the thread-last macro
   ->>

   ;; ... and start from the collection of all accounts
   all-accounts

   ;; ... select only the checking accounts
   (filter #(= (:type %) 'checking))

   ;; ... select the accounts in deficit
   (filter #(> 0 (:balance %)))

   ;; add 10% interest to any in deficit
   ;; interest rate is first argument; second (last) is the deficit accounts
   ((fn [interest-rate deficit-accounts]
      (map
       (fn [deficit-account]
         (let [balance (:balance deficit-account)
               interest (* interest-rate balance)]
           (assoc deficit-account :balance (+ balance interest))))
       deficit-accounts))
       ;; interest rate is 10%
       0.1)

   ;; ... get the balances from all the deficit accounts
   (map :balance)

   ;; ... and add up all their balances to get net balance
   (apply +)))

(doall (println "deficit-accounts-balance-sum-using-interest-function" 
                deficit-accounts-balance-sum-using-interest-function))

;; check the answer
(assert (= -550.0 deficit-accounts-balance-sum-using-interest-function))

Note the in-line function declaration is the first form inside another form (for the same reason as the partials above were).

Threading Macros - capturing the result of the previous step

In the above examples the steps were calls to core functions: filter, map and apply.

But the step can be a call to a macro and the macro will be passed the current value of the form being evaluated by the threading macro.

In the example below, a simple macro show-the-argument will print the current evaluated form and return it to continue the evaluation.

;; using a simple macro to show what's passed to each step in the pipeline

(defmacro show-the-argument
  [argument]
  (doall (println argument))
  `(do
     ~argument))

(def savings-accounts-balance-sum-and-show-the-argument
  (
   ;; use the thread-last macro
   ->>

   ;; ... and start from the collection of all accounts
   all-accounts

   ;; show the argument
   (show-the-argument)

   ;; ... select only the savings accounts
   (filter #(= (:type %) 'savings))

   ;; show the argument
   (show-the-argument)

   ;; ... get the balances from all the saving accounts
   (map :balance)

   ;; show the argument
   (show-the-argument)

   ;; ... and add up all their balances
   (apply +)))

(doall (println "savings-accounts-balance-sum-and-show-the-argument" 
                savings-accounts-balance-sum-and-show-the-argument))

;; check the answer
(assert (= 850 savings-accounts-balance-sum-and-show-the-argument))

If you look at the prints, you’ll see something like the following output for the call to show-the-argument that comes after the filter step (I’ve reformatted to aid clarity):

(filter
 (fn* [p1__1419#] (= (:type p1__1419#) (quote savings)))
 (show-the-argument all-accounts))

You can see how “thread last” has woven the previous call to show-the-argument (wrapping all-accounts) into the filter form. The prints happen at macroexpansion time: each show-the-argument sees the form “thread last” has built so far, before that fully woven form is handed to the compiler for evaluation.

Threading Macros - “thread-first-let”

show-the-argument demonstrates how simple it is in a macro to grab hold of the current form being evaluated and do something with it.

Let’s do that then. The macro “thread-first-let” below takes the argument together with some body forms, and returns a new form that binds the argument to a let local named x# (note: the literal symbol x#, not an auto-gensym, so the body forms can refer to it) and evaluates the body forms inside that let, so they can use x# anywhere needed.

The macro includes a println to show the form returned by “thread-first-let” to the compiler:

;; The "thread-first-let" macro creates a let, binds the current form to x#,
;; and evaluates the body in the let context, with x# available in the body

(defmacro thread-first-let
  [argument & body]
  (let [let-form# `(let [~'x# ~argument]
                     (~@body))]
    (doall (println let-form#))
    `(do
       ~let-form#)))

Let’s revisit the horrid example above where I used partials with “thread first”, and use “thread-first-let” instead.

;; Using "thread-first-let" inside a "thread-first"

(def savings-accounts-balance-sum-using-thread-first-let
  (
   ;; use the thread-first macro
   ->

   ;; ... and start from the collection of all accounts
   all-accounts

   ;; ... select only the savings accounts
   (thread-first-let filter #(= (:type %) 'savings) x#)

   ;; ... get the balances from all the saving accounts
   (thread-first-let map :balance x#)

   ;; ... and add up all their balances
   (thread-first-let apply + x#)))

(doall (println "savings-accounts-balance-sum-using-thread-first-let" savings-accounts-balance-sum-using-thread-first-let))

;; check the answer
(assert (= 850 savings-accounts-balance-sum-using-thread-first-let))

One of the prints from “thread-first-let” shows the final form of the filter:

(clojure.core/let [x# all-accounts]
  (filter (fn* [p1__1431#] (= (:type p1__1431#) (quote savings))) x#))

The takeaway here is that inside the thread-first-let the code is exactly what you would write outside of a core threading macro, and you have the freedom to use the argument (x#) wherever and whenever you need it, not just once and not just in first or last position.

Final Words

“thread-first-let” is a very simple seven-line macro that allows arbitrary code to participate in the core threading macros, keeping the whole pipeline as clean and clear as possible.

It’s not hard to see how this simple idea could be taken forward to define macros supporting “thread last”, or even “packaged” macros such as a thread-first-map.
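As a sketch of that idea (my code, not the post’s), a hypothetical “thread-last-let” for use inside “thread last” only needs to take the threaded value from the end of the argument list instead of the front:

```clojure
;; Hypothetical companion macro for ->>; the threaded value arrives
;; as the *last* argument, so we split it off the end of args.
(defmacro thread-last-let
  [& args]
  (let [body     (butlast args)
        argument (last args)]
    `(let [~'x# ~argument]
       (~@body))))

;; usage inside "thread last":
(->> [1 2 3]
     (thread-last-let map inc x#)
     (thread-last-let apply + x#))
;; => 9
```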

The more general point is how even a trivial use of macros makes for a welcome improvement in keeping the code clean and clear, and really does bring home how tractable and malleable macros make Clojure.

Permalink

Weekly Update: Talk Transcripts, Clojure Architecture, OS X Yosemite

As I have no other article to publish this week, I thought a weekly update would be in order. Last week I wrote about making relevant and interesting talks more accessible. In the course of that project, I had eleven talks transcribed so far, four more than when I announced the project last week. Not only have I received great feedback about how appreciated this is, I have also learned a lot myself while proofreading the transcripts.

With all the stuff that I have learned and am still learning (with a few more talks in the pipeline), there are a couple of things that I want to rethink regarding the architecture of my BirdWatch application before I continue describing it. So let me think first before I publish the next article on the application’s architecture. No worries, I expect to have the next one out next week, or the week after that at the latest.

Thoughts from Guy Steele’s talk on Parallel Programming

The talk that got me thinking the most about the BirdWatch application’s architecture is Guy Steele’s talk about parallel programming. Not only does he give a great explanation of the differences between parallelism and concurrency, he also gives great insights into the use of accumulators. So what, according to him, is concurrency? Concurrency is when multiple entities, such as users or processes, compete for scarce resources. In that case, we want efficient ways of utilizing the scarce resources (CPU, memory bandwidth, network, I/O in general) so that more of the entities can be served simultaneously on the same box or number of boxes.

Parallelism, on the other hand, is when there are vast resources and we want to allocate as many of them as possible to the same number of entities. For example we could have a CPU-bound operation, a single user and 8, 12 or however many cores. If the operation is single-threaded, we won’t be able to utilize the resources well at all.

We could, of course, split up the computation so that it runs on all the cores (maybe even on hundreds of boxes and thousands of cores), but that’s easier said than done. Which brings me to accumulators. The accumulator, as the name suggests, is where intermediate results are stored while a computation is ongoing. As Guy points out, this has served us extremely well for as long as we didn’t have to think about parallelism. If the computation happens serially in a single thread, the accumulator is great, but what do we do when we want to spawn 20 threads on a 32-core machine, or 1000 threads on 100 machines? If each of them had to work with the same accumulator, things would become messy and the accumulator would become a source of contention, with all kinds of ugly coordination and locking. That doesn’t scale at all.

Guy suggests using divide-and-conquer instead so that each process in a parallelized approach only creates a partial result which will be combined with other partial results later. He argues for MapReduce in the small in addition to MapReduce in the large. I think this makes a lot of sense. That way, the partial results are created in the map phase on a number of threads (potentially on many machines) and the reduction is where the partial results are combined into a final result.
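Clojure’s own clojure.core.reducers/fold is a ready-made example of this “MapReduce in the small”: it splits the input into chunks, reduces the chunks (potentially in parallel on a fork/join pool), and then combines the partial results, relying on the combining function being associative:

```clojure
(require '[clojure.core.reducers :as r])

;; fold splits the vector into chunks, reduces each chunk
;; independently, and combines the partial sums with +.
(r/fold + (vec (range 1001)))
;; => 500500
```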

I had been thinking along these lines for a while already when I thought about moving parts of the computation in BirdWatch for previous tweets (word count, retweet count, reach, …) to the server side, as the current approach uses way more network bandwidth than strictly necessary. I was mostly thinking about it in terms of mergeability of partial results, which implies that the merge operation between two partial results is both associative and commutative.

To explain associative, let’s say we have partial results A, B, C, and D, and we can group the merges in any way we want, for example ((A + B) + C) + D or A + (B + (C + D)). As another example, say you have a manuscript of 100 pages in 10 stacks. It doesn’t matter in which order we build intermediate piles, as long as we only merge neighboring piles and the pile with the higher page numbers goes under the one with the lower page numbers.

Commutative means that order does not matter: for example, 11 + 5 + 16 + 10 and 10 + 16 + 5 + 11 are the same, as both add up to 42.
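For partial results like word counts, merging with + is both associative and commutative, so in Clojure the combination step is just merge-with (a small illustration):

```clojure
;; Two partial word-count maps from different workers
(def part-a {"clojure" 3 "datomic" 1})
(def part-b {"clojure" 2 "macro" 5})

;; merge-with + combines them; swapping part-a and part-b, or grouping
;; merges differently across more parts, yields the same result
(merge-with + part-a part-b)
;; => {"clojure" 5, "datomic" 1, "macro" 5}
```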

After listening to Guy Steele’s talk and proofreading the transcript, I don’t want to push the redesign any further but instead tackle it right away. I think it should be possible to divide the aggregation tasks in BirdWatch into smaller chunks that can then be combined in an associative and commutative way on the client (in ClojureScript), and I have an idea of how to do that. But let me get back into the hammock1 and ponder the idea some more. I’ll certainly let you know what I come up with.

Update to OS X Yosemite

Last weekend I updated my production laptop to Yosemite. Of course, I did a full backup with Carbon Copy Cloner first and I also made sure that my old backup laptop was still working before I embarked on the update adventure, just in case. That turned out to be a good idea.

The system upgrade did not cause any actual trouble; all went smoothly, and I also think that the new design looks great. BUT IT TOOK FOREVER. The time estimate was so far off that it was worse than the worst Windows installation experiences ever. Overall it probably took six or seven hours. Apparently this had to do with Homebrew; check out this article for more information2.

Luckily I had read about the upgrade taking longer in a forum somewhere, so I wasn’t too worried and just let the installer do its thing. If you plan on doing the upgrade, I think it will be worth it, but only do it when you don’t need your machine for a while, like overnight (or follow the instructions in the article above). Everything works nicely on my machine now, even without doing anything special; the only consequence was a free afternoon, since I couldn’t get any work done.

Also, you can press CMD-L to get console output, which I found much more reassuring than having the installer tell me it would need another 2 minutes that turn into 2 hours.

Conclusion

Okay, that’s it for today. There are some additions to the Clojure Resources project and I have also added links to the talk transcripts in there. Please check out the talk-transcripts if you haven’t done so already. I would love to hear from you if any of these transcripts helped you at all and made the content more accessible than it would have been otherwise.

Until next week, Matthias


  1. If you’ve never listened to Rich Hickey’s talk about Hammock-driven development, you really should. Now there’s also a transcript for that talk. You find the link to the video recording alongside the transcript.

  2. Thanks to @RobStuttaford for pointing this out.

Permalink

Results of 2014 State of Clojure and ClojureScript Survey

Update 10/24/14: these were the raw results; you can see the analysis of the results here.

The 2014 State of Clojure and ClojureScript Survey was open from Oct. 8-17th. The State of Clojure survey (which was applicable to all users of Clojure, ClojureScript, and ClojureCLR) had 1339 respondents. The more targeted State of ClojureScript survey had 642 respondents.

Reports with charts from some of the survey questions and links to the raw data (see "Export Data" in upper right corner of each report) are here:

Those reports contain charts for all but the grid-style and text response questions.

You can find all of the text responses (extracted and sorted) for the text questions here:

You may wish to refer back to the 2013 or 2012 survey results as well!

Permalink

Immutant 2 (The Deuce) Alpha2 Released

We're as happy as a cat getting vacuumed to announce our second alpha release of The Deuce, Immutant 2.0.0-alpha2.

Big, special thanks to all our early adopters who provided invaluable feedback on alpha1 and our incremental releases.

What is Immutant?

Immutant is an integrated suite of Clojure libraries backed by Undertow for web, HornetQ for messaging, Infinispan for caching, Quartz for scheduling, and Narayana for transactions. Applications built with Immutant can optionally be deployed to a WildFly cluster for enhanced features. Its fundamental goal is to reduce the inherent incidental complexity in real world applications.

A few highlights of The Deuce compared to the previous 1.x series:

  • It uses the Undertow web server -- it's much faster, with WebSocket support
  • The source is licensed under the Apache Software License rather than LGPL
  • It's completely functional "embedded" in your app, i.e. no app server required
  • It may be deployed to latest WildFly for extra clustering features

What's changed in this release?

  • Though not strictly part of the release, we've significantly rearranged our documentation. The "tutorials" are now called "guides", and we publish them right along with the apidoc. This gives us a "one-stop doc shop" with better, cross-referenced content.
  • We've introduced an org.immutant/transactions library to provide support for XA distributed transactions, a feature we had in Immutant 1.x, but only recently made available in The Deuce, both within WildFly and out of the container as well. The API is similar, with a few minor namespace changes, and all Immutant caches and messaging destinations are XA capable.
  • We're now exposing flexible SSL configuration options through our immutant.web.undertow namespace, allowing you to set up an HTTPS listener with some valid combination of SSLContext, KeyStore, TrustStore, KeyManagers, or TrustManagers.
  • We've made a large, breaking change to our messaging API. Namely, we've removed the connection and session abstractions, and replaced them with a single one: context. This is somewhat motivated by our implementation using the new JMS 2.0 api's.
  • Datomic can now be used with an Immutant application when inside of WildFly without having to modify the WildFly configuration or add any exclusions. Unfortunately, you still cannot use Datomic with an application that uses org.immutant/messaging outside of WildFly, due to conflicts between the HornetQ version we depend on and the version Datomic depends on. See IMMUTANT-497 for more details.
  • HornetQ is now configured via standard configuration files instead of via static Java code, allowing you to alter that configuration if need be. See the messaging guide for details.

We've also released a new version of the lein-immutant plugin (2.0.0-alpha2). You'll need to upgrade to that release if you want to use alpha2 of Immutant with WildFly.

For a full list of changes, see the issue list below.

How to try it

If you're already familiar with Immutant 1.x, you should take a look at our migration guide. It's our attempt at keeping track of what we changed in the Clojure namespaces.

The guides are another good source of information, along with the rest of the apidoc.

For a working example, check out our Feature Demo application!

Get It

There is no longer any "installation" step as there was in 1.x. Simply add the relevant dependency to your project as shown on Clojars.

What's next?

We expect to release a beta fairly soon, once we ensure that everything works well with the upcoming WildFly 9 release.

Get In Touch

If you have any questions, issues, or other feedback about Immutant, you can always find us on #immutant on freenode or our mailing lists.

Issues resolved in 2.0.0-alpha2

  • [IMMUTANT-466] - App using datomic can't find javax.net.ssl.SSLException class in WildFly
  • [IMMUTANT-467] - Datomic HornetQ Conflicts with WildFly
  • [IMMUTANT-473] - web/run only works at deployment inside wildfly
  • [IMMUTANT-474] - See if we need to bring over any of the shutdown code from 1.x to use inside the container
  • [IMMUTANT-475] - Write tutorial on overriding logging settings in-container
  • [IMMUTANT-477] - Figure out how to get the web-context inside WildFly
  • [IMMUTANT-478] - Consider wrapping scheduled jobs in bound-fn
  • [IMMUTANT-479] - Get XA working in (and possibly out of) container
  • [IMMUTANT-480] - Immutant running out of a container does not handle laptop suspend gracefully
  • [IMMUTANT-481] - Expose way to set the global log level
  • [IMMUTANT-482] - Destinations with leading slashes fail to deploy in WildFly
  • [IMMUTANT-483] - Allow nil :body in ring response
  • [IMMUTANT-484] - app-uri has a trailing slash
  • [IMMUTANT-485] - The wunderboss-core jar file has a logback.xml file packaged inside of it which competes with a locally configured logback.xml
  • [IMMUTANT-487] - Enable explicit control of an embedded web server
  • [IMMUTANT-488] - Provide better SSL support than just through the Undertow.Builder
  • [IMMUTANT-489] - Re-running servlets yields IllegalStateException
  • [IMMUTANT-490] - Don't register fressian codec by default
  • [IMMUTANT-491] - at-exit handlers can fail if they refer to any wboss components
  • [IMMUTANT-492] - Expose HornetQ broker configuration options
  • [IMMUTANT-493] - Revert back to :host instead of :interface for nrepl options
  • [IMMUTANT-494] - Expose controlling the context mode to listen
  • [IMMUTANT-496] - Expose way to override HornetQ data directories
  • [IMMUTANT-498] - Replace connection and session with a single context abstraction
  • [IMMUTANT-499] - Consider renaming :client-id on context to :subscription-name
  • [IMMUTANT-500] - Throw if listen, queue, or topic is given a non-remote context
  • [IMMUTANT-501] - Running the standalone JAR with default "/" context path requires extra slash for inner routes
  • [IMMUTANT-502] - Rename caching/compare-and-swap! to swap-in!

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.