Transit

We are pleased to announce today the initial release of Transit.

Transit is a format and set of libraries for conveying values between applications written in different programming languages. The key objectives of Transit are to support:

  • Sending values between applications
  • written in different programming languages
  • without requiring schemas/context
    • i.e., to be self-describing at the bottom
  • with extensibility
  • and good performance
  • with reach to the browser

JSON currently dominates similar use cases, but it has a limited set of types, no extensibility, and is verbose. Actual applications of JSON are rife with ad hoc and context-dependent workarounds for these limitations, yielding coupled and fragile programs.

On the other hand, the reach of JSON is undeniable. High performance parsers are widely available. Thus Transit is specified as an encoding to and from both JSON and MessagePack, a binary JSON-like format with widely available parsers. In particular, both formats have parsers written in C for languages like Ruby and Python that reach to C for performance.

Transit supports a minimal but rich set of core types:

  • strings
  • booleans
  • integers (to 64 bits w/o truncation)
  • floats
  • nil/null
  • arrays
  • maps (with arbitrary scalar keys, not just strings)

Transit also includes a wider set of extension types:

  • timestamps
  • UUIDs
  • URIs
  • arbitrary precision integers and decimals
  • symbols, keywords, characters
  • bytes
  • sets
  • lists
  • hypermedia links
  • maps with composite keys

Transit is extensible - users can define extension types in exactly the same way as the included extension types.

The emphasis of Transit is on communication between programs, thus it prioritizes programmatic types and data structures over human readability and document orientation. That said, it does have a readable verbose mode for JSON.

Transit is self-describing using tags, and encourages transmitting information using maps, whose named keys/fields will repeat often in data. These overheads, typical of self-describing formats, are mitigated in Transit by an integrated cache code system, which replaces repetitive data with small codes. This yields not only a reduction in size, but also improvements in performance and memory utilization. Contrast this with gzipping, which, while it may reduce size on the wire, takes time and doesn't reduce the amount of text to be parsed or the number of objects generated in memory after parsing.
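To make the caching idea concrete, here is a toy sketch of the scheme. This illustrates the idea only, not Transit's actual codec; `cacheable?` and `cache-encode` are made-up names for this example. Repeated cacheable strings, such as the `"~:keyword"` forms Transit uses for map keys, are replaced by small `"^n"` codes:

```clojure
;; Toy illustration of Transit-style cache codes -- not the real codec.
;; Only "keyword-like" strings are treated as cacheable here.
(defn cacheable? [t]
  (and (string? t) (.startsWith ^String t "~:")))

(defn cache-encode
  "Replace repeat occurrences of cacheable tokens with small codes."
  [tokens]
  (loop [ts tokens, seen {}, out []]
    (if-let [[t & more] (seq ts)]
      (cond
        (contains? seen t) (recur more seen (conj out (seen t)))
        (cacheable? t)     (recur more
                                  (assoc seen t (str "^" (count seen)))
                                  (conj out t))
        :else              (recur more seen (conj out t)))
      out)))

(cache-encode ["~:name" "a" "~:points" "~:name" "b" "~:points"])
;; => ["~:name" "a" "~:points" "^0" "b" "^1"]
```

A decoder keeps the same table in the same order, so the codes can be resolved without any out-of-band context.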

We are shipping a 0.8 version of the Transit spec, which has extensive documentation for implementors, as well as interoperable implementations for a number of languages.

We welcome feedback and suggestions on the transit-format list.

I'd like to thank the team at Cognitect that built Transit:

  • Tim Ewald - Team Lead
  • Brenton Ashworth
  • Timothy Baldridge
  • Bobby Calderwood
  • David Chelimsky
  • Paul deGrandis
  • Benoit Fleury
  • Michael Fogus
  • Yoko Harada
  • Ben Kamphaus
  • Alex Miller
  • David Nolen
  • Russ Olsen

I hope you find Transit useful, and look forward to your feedback.

Rich

 

Permalink

July 2014 London Clojure Dojo at ThoughtWorks

When:
Tuesday, July 29, 2014 from 7:00 PM to 9:30 PM (BST)

Where:
ThoughtWorks London Office
173 High Holborn
WC1V London
United Kingdom

Hosted By:
London Clojurians

Register for this event now at :
http://www.eventbrite.com/e/july-2014-london-clojure-dojo-at-thoughtworks-tickets-12365027129?aff=rss

Event Details:

London Clojure Dojo at ThoughtWorks

The goal of the session is to help people learn to start working with Clojure through practical exercises, but we hope that more experienced developers will also come along to help form a bit of a London Clojure community. The dojo is a great place for new and experienced Clojure coders to learn more. If you want to know how to run your own dojo or get an idea of what dojos are like you can read more here.

 

We hope to break up into groups for the dojo. So if you have a laptop with a working clojure environment please bring it along.

 

We’ll be discussing the meetup on the london-clojurians mailing list.

 

Clojure is a JVM language that is syntactically similar to Lisp, has full integration with Java and its libraries, and focuses on providing a solution to the issue of single-machine concurrency.

 

Its small core makes it surprisingly easy for Java developers to pick up, and it provides a powerful set of concurrency strategies and data structures designed to make immutable data easy to work with. If you went to Rich Hickey’s LJC talk about creating Clojure you’ll already know this; if not, it’s well worth watching Rich Hickey’s “Clojure for Java Programmers” video or Stuart Halloway’s “Radical Simplicity” video.


Permalink

Ninety-Nine Haskell Problems [Euler/Clojure too]

Ninety-Nine Haskell Problems

From the webpage:

These are Haskell translations of Ninety-Nine Lisp Problems, which are themselves translations of Ninety-Nine Prolog Problems.

Also listed are:

Naming isn’t the only hard problem in computer science. The webpage points out that due to gaps and use of letters, there are 88 problems and not 99.

If you want something a bit more challenging, consider the Project Euler problems. No peeking but there is a wiki with some Clojure answers, http://clojure-euler.wikispaces.com/.

Enjoy!

I first saw this in a tweet by Computer Science.

Permalink

Machine Learning in Clojure - part 2

I am trying to implement the material from the Machine Learning course on Coursera in Clojure.

My last post was about doing linear regression with 1 variable. This post will show that the same process works for multiple variables, and then explain why we represent the problem with matrices.

The only code in this post calls the functions introduced in the last one. I also use the same examples, so this post will make a lot more sense if you read that one first.

For reference, here is the linear regression function:


(defn linear-regression [x Y a i]
  (let [m (first (cl/size Y))
        X (add-ones x)]
    (loop [Theta (cl/zeros 1 (second (cl/size X))) i i]
      (if (zero? i)
        Theta
        (let [ans (cl/* X (cl/t Theta))
              diffs (cl/- ans Y)
              dx (cl/* (cl/t diffs) X)
              adjust-x (cl/* dx (/ a m))]
          (recur (cl/- Theta adjust-x)
                 (dec i)))))))


Because the regression function works with matrices, it does not need any changes to run a regression over multiple variables.

Some Examples

In the English Premier League, a team gets 3 points for a win, and 1 point for a draw. Trying to find a relationship between wins and points gets close to the answer.

(->> (get-matrices [:win] :pts)
     reg-epl
     (print-results "wins->points"))

** wins->points **
A 1x2 matrix
-------------
1.24e+01 2.82e+00


When we add a second variable, the number of draws, we get close enough to ascribe the difference to rounding error.

(->> (get-matrices [:win :draw] :pts)
     reg-epl
     (print-results "wins+draws->points"))

** wins+draws->points **
A 1x3 matrix
-------------
-2.72e-01 3.01e+00 1.01e+00

In the last post, I asserted that scoring goals was the key to success in soccer.

(->> (get-matrices [:for] :pts)
     reg-epl
     (print-results "for->points"))


** for->points **
A 1x2 matrix
-------------
2.73e+00 9.81e-01

If you saw Costa Rica in the World Cup, you know that defense counts for a lot too. Looking at both goals for and against can give a broader picture.

(->> (get-matrices [:for :against] :pts)
     reg-epl
     (print-results "for-against->pts"))


** for-against->pts **
A 1x3 matrix
-------------
3.83e+01 7.66e-01 -4.97e-01


The league tables contain 20 fields of data, and the code works for any number of variables. Will adding more features (variables) make for a better model?

We can expand the model to include whether the goals were scored at home or away.

(->> (get-matrices [:for-h :for-a :against-h :against-a] :pts)
     reg-epl
     (print-results "forh-fora-againsth-againsta->pts"))


** forh-fora-againsth-againsta->pts **
A 1x5 matrix
-------------
3.81e+01 7.22e-01 8.26e-01 -5.99e-01 -4.17e-01

The statistical relationship we have found suggests that goals scored on the road are worth about .1 points more than those scored at home. The difference in goals allowed is even greater: they cost .6 points at home and only .4 on the road.

Wins and draws are worth the same number of points, no matter where the game takes place, so what is going on?

In many sports there is a “home field advantage”, and this is certainly true in soccer. A team that is strong on the road is probably a really strong team, so the relationship we have found may indeed be accurate.

Adding more features indiscriminately can lead to confusion.

(->> (get-matrices [:for :against :played :gd :for-h :for-a] :pts)
     reg-epl
     (map *)
     (print-results "kitchen sink"))

** kitchen sink **
(0.03515239958218979 0.17500425607459014 -0.22696465757628984 1.3357911841232217 0.4019689136508527 0.014497060396707949 0.1605071956778842)


When I printed out this result the first time, the parameter representing the number of games played displayed as a decimal point with no digit before or after. Multiplying each term by 1 got the numbers to appear. Weird.

The :gd stands for “goal difference”; it is the difference between the number of goals a team scores and the number it gives up. Because we are also pulling for and against, this is a redundant piece of information. Pulling home and away goals for makes the combined goals-for column redundant as well.

All of the teams in the sample played the same number of games, so that variable should not have influenced the model. Looking at the values, our model says that playing a game is worth 1.3 points, and this is more important than all of the other factors combined. Adding that piece of data removed information.

Let’s look at one more model with redundant data: goals for, goals against, and the goal difference, which is just the difference of the two.

(->> (get-matrices [:for :against :gd] :pts)
     reg-epl
     (print-results "for-against-gd->pts"))

** for-against-gd->pts **
A 1x4 matrix
-------------
3.83e+01 3.45e-01 -7.57e-02 4.21e-01


points = 38.3 + 0.345 * goals-for - 0.0757 * goals-against + 0.421 * goal-difference

The first term, Theta[0] is right around 38. If a team neither scores nor allows any goals during a season, they will draw all of their matches, earning 38 points. I didn’t notice that the leading term was 38 in all of the cases that included both goals for and against until I wrote this model without the exponents.
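As a sanity check, we can plug a hypothetical season into the fitted model above (the input numbers here are made up for illustration):

```clojure
;; points = 38.3 + 0.345*for - 0.0757*against + 0.421*gd
;; A hypothetical team that scores 70 and allows 30 (gd = +40):
(+ 38.3 (* 0.345 70) (* -0.0757 30) (* 0.421 40))
;; => roughly 77 points
```

That lands near the top of a typical Premier League table, which is what we would expect for a team with that goal record.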

Is this model better or worse than the one that looks at goals for and goals against, without goal difference? I can’t decide.

Why Matrices?

Each of our training examples has a series of X values and one corresponding Y value. Our dataset contains 380 examples (20 teams * 19 seasons).

Our process is to make a guess as to the proper value for each parameter to multiply the X values by, and compare the results in each case to the Y value. We use the differences between the products of our guesses and the real-life values to improve our guesses.

This could be done with a loop. With m examples and n features we could do something like

for i = 1 to m
    guess = 0
    for j = 1 to n
        guess = guess + X[i, j] * Theta[j]
    end for j
    difference[i] = guess - Y[i]
end for i

We would need another loop to calculate the new values for Theta.

Matrices have operations defined that replace the above loops. When we multiply the X matrix by the Theta vector, for each row of X, we multiply each element by the corresponding element in Theta, and add the products together to get the first element of the result.

Matrix subtraction requires two matrices that are the same size. The result of subtraction is a new matrix that is the same size, where each element is the difference of the corresponding elements in the original matrices.

Using these two operations, we can replace the loops above with

Guess = X * Theta
Difference = Guess - Y

Clearly the notation is shorter. The other advantage is that there are matrix libraries that are able to do these operations much more efficiently than can be done with loops.
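Here is a minimal sketch of those two operations in plain Clojure, with vectors-of-vectors standing in for a real matrix library (`mat-mult` and `mat-sub` are throwaway names for this illustration):

```clojure
;; Multiply an m-by-n matrix by an n-element column vector:
;; for each row, multiply element-wise and sum the products.
(defn mat-mult [X theta]
  (mapv (fn [row] (reduce + (map * row theta))) X))

;; Element-wise subtraction of two same-sized vectors.
(defn mat-sub [a b]
  (mapv - a b))

(def X [[1 10] [1 20] [1 30]])  ; three examples, with the leading column of 1s
(def Theta [5 2])               ; hypothetical parameter guesses
(def Y [26 44 70])

(def Guess (mat-mult X Theta))      ; => [25 45 65]
(def Difference (mat-sub Guess Y))  ; => [-1 1 -5]
```

A real library does the same thing, but with optimized storage and arithmetic rather than per-element function calls.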

There are two more operations that are needed in the linear regression calculations. One is multiplying a matrix by a single number, called a scalar. When multiplying a matrix by a number, multiply each element by that number: [1 2 3] * 3 = [3 6 9].

The other operation we perform is called a transpose. Transposing a matrix turns all of its rows into columns, and its columns into rows. In our examples, the size of X is m by n, and the size of Theta is 1 by n. We don’t have any way to multiply an m by n matrix by a 1 by n matrix, but we can multiply an m by n matrix by an n by 1 matrix. The product will be an m by 1 matrix.
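A transpose can be sketched in a line of plain Clojure (a throwaway helper for illustration, not Clatrix's implementation):

```clojure
;; Rows become columns and columns become rows.
(defn transpose [m]
  (apply mapv vector m))

(transpose [[1 2 3]
            [4 5 6]])
;; => [[1 4] [2 5] [3 6]]
```

Note the 2x3 input becomes a 3x2 output, which is exactly the dimension fix we need above.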

In the regression function there are a couple of transposes to make the dimensions line up. That is the meaning of the cl/t expression. cl is an alias for the Clatrix matrix library.

Even though we replaced a couple of calculations that could have been done in loops with matrix calculations, we are still performing these calculations in a series of iterations. There is a technique for calculating linear regression without the iterative process, called the Normal Equation.

I am not going to discuss the Normal Equation for two reasons. First, I don’t understand the mathematics. Second, the process we use, Gradient Descent, can be used with other types of machine learning techniques, and the Normal Equation cannot.

Permalink

Clojure Gazette 1.85

Clojure Gazette 1.85

Magic, Games, Shootout

Clojure Gazette

Issue 1.85 July 20, 2014


Editorial

Greetings loyal readers,

Yes, it's true. I have the best newsletter readers on the planet. You all are awesome and I'm continually blown away with the support you show me each time I ask. :)

And now, I ask for something again ;)

I've been writing the Clojure Gazette for over two years now. It's brought me joy. It's brought me pain. But I keep at it and I have grown to love it. It started on the free plan of the newsletter service but graduated some time ago to the not-free plan. I've paid for everything myself and continued to build the list.

Although the growth has been steady in the last year, the slope was way lower than I would have liked. I put any marketing efforts on the back burner, not really worrying about it too much. But now, I can't ignore it.

I have recently reworked the Clojure Gazette landing page to be a little bit snazzier and I'm starting to link to it more regularly from my blog. But I need your help.

There are two ways you can help: one is to send me a testimonial. Just a short blurb about how your life is better because you receive the Gazette and a profile-style picture. Be honest! Why do you read the Gazette? I can't promise I'll put yours on the landing page, but I will send you a nice thank you. A lot of people have sent me great emails over the years, but I never asked permission to publish them so I don't feel right putting them on the page.

The second way to help is by sharing it with your friends. It came as a surprise to me, but some people have never heard of the Clojure Gazette! If you know one of these people, it is your duty to inform them of its existence ;) And if you have any kind of internet presence (blog, Twitter, podcast, etc), I would greatly appreciate links, tweets, likes, reblogs, shares, plugs, +1s, hearts, favorites, thumbs up, high fives, pins, retweets, blogs, posts, or whatever your respective social networking technology uses. In return, I will issue fives, high fives, and low fives (unless you're too slow) to any and all at events that I frequent.

Thanks in advance!

Sincerely,
Eric Normand <ericwnormand@gmail.com>

PS Please follow the DISCUSS links and comment!

Of Mages and Grimoires


Reid McKenzie has created a new Clojure documentation site called Grimoire, and this post discusses the reasoning behind it. It looks like a cool project to make the documentation very useful for quick access (faster than ClojureDocs). It also has lots of examples and the collection is growing. I'm glad someone is working on this. Check out, as an example, the page for defprotocol. It includes, in one place, the docstring, examples, and source. I'd love to see hyperlinks in the code to their respective definitions. DISCUSS

Clojure web server shootout


Charts and graphs comparing the performance of different Clojure web server options. I'm surprised to see Jetty Ring fare so poorly. A great service to the community. It's a collaborative repo and they are accepting benchmarks. DISCUSS

Brute - Entity Component System for Clojure


I have a confession to make: ever since I was young, I've wanted to program games. So, far from being an unbiased link curator, I do have a soft spot for game-related programming posts. This one is an Entity Component System, which is a pattern for composing game objects instead of writing them from scratch each time or using inheritance. And now it works in ClojureScript. DISCUSS


Room Key Case Study


I almost didn't link to this because it was not very technical, but then I thought that it was such a good document on so many other levels that it was worth sharing. It is well designed, well written, and well scoped. It would make a great post to email to a project manager to convince them to give Clojure and Datomic a try. DISCUSS

ki


If game programming was one of my first programming loves, programming languages have to be second. And that love has been much more requited :) ki is an interesting language: it's a sweet.js macro that expands to Javascript, but inside the macro, it's a Lisp. The compiler can run in your browser. It uses mori for its persistent data structures. Very interesting! DISCUSS

Building Single Page Apps with Reagent


Om is all-the-rage, but there are alternative React libraries out there. Reagent is one of them and this post begins with a discussion of the advantages of Reagent over Om, then goes on to show the code and what kinds of things Reagent is good at. DISCUSS

Elm – functional reactive dreams + missile command


A smart, in-depth look at Elm, which is a great new language that's worth watching. DISCUSS

nginx-clojure


A Ring adapter embedded in nginx? Wowee, that's fast! (It's currently winning the Clojure web server shootout.) DISCUSS
Copyright © 2014 LispCast, All rights reserved.



Permalink

A Case for Luminus

First and foremost - frameworks. I view frameworks suspiciously and prefer to build my systems out of a core set of libraries, allowing them to evolve and breathe a little without the artificial constraints of a framework. There is a time and place for frameworks, but that time and place never seems to be here and now where my work is concerned. This may be one of the many reasons I find Clojure so enjoyable.

Luminus is a "Clojure web framework". It says so right in the tagline of the Luminus homepage. This single line meant I hesitated to dive into Luminus ever since I started breaking the back of my Clojure dabbling. I wanted to avoid it until I felt it useful or necessary to invest my time in it; a framework, after all, needs to be understood as a whole, unlike its modular library counterparts.

So my first foray into building web apps in Clojure meant building them out from scratch using a foundation of Ring, Compojure, Hiccup - the usual suspects. Data access meant looking at a few possibilities, assessing each one and making a decision. Various authentication libraries needed thorough inspection. This, as you can expect, leads to a tremendous amount of yak-shaving and time spent satisfying curiosity.

So I had a look at Luminus. I wanted to see what decisions a framework had made and possibly steal some of the better ideas. I used Luminus on my next project. I didn't steal the good ideas for my own work, I didn't need to - the image I have in my head when someone mentions framework is simply not what Luminus is.

In its most basic form a framework is a collection of 3rd party and bespoke libraries tied together with some code that abstracts the library specifics into a more cohesive whole. Luminus doesn't really have this. There are no bespoke modules or libraries; there is no abstraction. Clojars has no luminus library. It does have a Leiningen template that gives you a decent starting point, but even that isn't a heavily opinionated starting point and can be customised to suit your own needs.

So what is Luminus? IMHO Luminus is what you'd end up with if you wanted to personally build a curated list of recommended libraries for building web applications in Clojure. It's what would happen if you bothered to document that process and offer alternatives. In fact, while the curated set of libraries makes for a productive development process with reduced decision anxiety, it's the documentation that launches it out of the park. Rather than "Do it like this", the documentation offers recommendations and guides for alternative libraries for concerns such as migrations, data access, HTML templating and authentication. Luminus gives you enough to get started and generously gives you a leg up to the next level.

At no point in time does Luminus hide anything from you, it wears its lineage as a badge of honour and won't get offended if you decide it's wrong in places.

I can respect that.

Permalink

How to Make Your Open Source Project Really Awesome, Part 2

TL;DR

This is a continuation to How to Make Your Open Source Project Really Awesome.

In addition to the things recommended in the first part, do the following:

  • Talk to your biggest users
  • Make it easy to contribute
  • Publish releases often

How To Make Your Open Source Project Really Awesome, Part 2

As ClojureWerkz approaches its 3rd birthday, it’s time to expand our (relatively) popular blog post from early 2013, How to Make Your Open Source Project Really Awesome.

The focus of this part is on how to help your users help you, the maintainer. Very few successful open source projects are one-man shows. The more people contribute, the better the project can get and the less time it will take on your end to maintain it well.

Talk to Your Biggest Users

If your project is reasonably licensed and you maintain it well, eventually there will be people using it in commercial software. Some of them will be one-man shops; others, giant public corporations. Most will lie in between. It’s still possible to identify some key users: those that really push your project to its limits. Their engineers will file the most issues, turn up on your mailing list more often than others, etc.

You need to identify those users and talk to them. Ask them what they do and don’t like, what they find missing, why they decided to use your project over an alternative.

Not only is this kind of feedback the best roadmap for your project, but you will also build some connections along the way. With ClojureWerkz, we are very fortunate to have commercial users among the companies we admire. Not only that, we now know some of their engineering team members. It’s never a bad position to be in.

Make It Easy to Contribute

We’ve touched on this in the first part. Time to expand on this key subject a little bit.

Identify Low Hanging Fruit Issues

Most contributors start with small issues: contribute a documentation fix here, a tiny feature there, a few extra failing test cases for this issue. Few of your users will jump in and contribute a major feature from the get-go. There are fairly objective reasons for this:

  • It takes time to get familiar with a codebase
  • People are more confident contributing to the projects where they “know” one of the maintainers
  • Contributing major features requires having some experience with the project

All of this takes time. Making it painless for your users to become contributors is perhaps the most important thing you can do. Except for the smallest projects, it usually separates the projects that will die from those that will march on even if you, the author, completely lose interest or ability to work on it.

Here’s a trick we’ve been trying with some of our projects (e.g. Elastisch): label GitHub issues with low-hanging fruit. When someone asks how she can help you, point the person to this label. We’ve seen this work very well for possibly the most widely used Clojure project out there: Leiningen.

At the time of writing, Leiningen has 217 contributors, which is not a small number by any stretch (the most popular ClojureWerkz projects have something between 25 and 40).

Document Development Setup

Many projects have some kind of development setup that needs to be performed before you can work on them. It can be a database of some kind running locally, or an env variable set to a particular value. For example, testing Elastisch requires setting an env variable that tells the test suite what the name of the local ElasticSearch cluster should be.

On top of that, there can be multiple ways to run the tests, multiple test suites, OS-specific hacks necessary, VM/compiler version requirements, etc. Document as much of this process as you can. It should be dead obvious how to set things up for development.

If you have a specific VCS workflow (e.g. development happens on the branch devel and master is only used for stable releases), document this as well. Add a CONTRIBUTING.md file to the repo to make this more visible.

Not doing so will result in frustration for potential contributors. Don’t know about you but I’m not very motivated to contribute to a project that makes it frustrating.

Don’t Be an A-hole

This is a subject for a blog post of its own, but be a decent person on the Internet (yes, it’s possible!). Be respectful. Point out issues in contributed pull requests in a reasonably polite way.

Nobody ever will contribute to a project maintained by an a-hole.

Publish Releases Often

This applies to both bug fix releases and development milestone versions. For ClojureWerkz projects, we often publish a new bug fix release when only 1 or 2 bugs were fixed. It’s easy for us to do and may save some pointless work for the users affected by the bugs.

With Leiningen, it takes a couple of commits (version changes) and 2 lines in the terminal to push a new release:

lein do pom, jar
scp target/[jar] clojars@clojars.org:

With Leiningen 2.4, it can be as easy as a single command, lein release.

It is also an important thing to do for development milestones. Added a couple of neat features to your project? Publish a new beta release. There will be users who were dying to get their hands on this feature, so much so that they’re willing to run a beta in production just to not spend days reinventing the wheel.

As always, state release stability clearly in the change log and release notes.

This helps avoid another common mistake (applicable primarily to JVM languages): forcing your users to use snapshot releases. Snapshot releases make repeatable builds pretty much impossible, and some tools (e.g. Leiningen) won’t allow non-snapshot releases to depend on snapshot ones.

Give Credit Where Due

It’s always a good idea to give your contributors some credit by mentioning them in the change log. Even better, have a “Thank You, Contributors” section in your release announcement. Few things are as motivating as getting recognized for your work.

Final Thoughts

Just like it is not terribly hard to make your project approachable, it is also not that hard to make it contributor-friendly. Having a healthy number of contributors (even if it’s just 2-4) ensures that the project will live on even if you no longer have the time (or interest) to maintain it. For example, I have a project that I haven’t contributed any code to since 2008. It is still being used and improved by other people.

Maybe more importantly, making your project contributor-friendly will introduce you to a lot of people and may open a lot of doors to you in the software industry, at least in engineering.

About the Author

Michael on behalf of the ClojureWerkz team.

Permalink

Testing, Continuously

This is the fourth post in a series about my current Clojure workflow.

In my last post, I laid out a workflow based on TDD but with the addition of “literate programming” (writing prose interleaved with code) and experimentation at the REPL. Here I dive a bit deeper into my test setup.

Even before I started developing in Clojure full time, I discovered that creating a configuration that provides near-instant test feedback made me more efficient as a developer. I expect to be able to run all relevant tests every time I hit the “Save” button on any file in my project, and to see the results within “a few” (preferably less than 3-4) seconds. Some examples of systems which provide this are:

  1. Ruby-based Guard, commonly used in the Rails community;
  2. Conttest, a language-agnostic Python-based test runner I wrote;
  3. in the Clojure sphere:
    1. Midje, using the :autotest option;
    2. Expectations, using the autoexpect plugin for Leiningen;
    3. Speclj
    4. clojure.test, with quickie (and possibly other) plugins
  4. ….

I used to use Expectations; now I use Midje since I like its rich DSL and the ability to develop functionality bottom-up or top-down.

In Midje, using the lein midje plugin, the following will run all tests in a project, and then re-run them more or less instantly whenever you save a file with changes:

$ lein midje :autotest

Since I run this so often, I define a Leiningen alias as follows:

$ lein autotest

by updating the project file project.clj as follows:

:aliases {"autotest" ["midje" ":autotest"]}

Since the Midje dependency is only needed during development, and not in production, it is added to the :dev profile in the project file only:

:profiles {:dev {:dependencies [[midje "1.5.1"]]}}

I also add the Midje plugin for Leiningen to ~/.lein/profiles.clj:

{:user {:plugins [[lein-midje "3.1.1"]]}}

Running your tests hundreds of times per day not only reduces debugging time (you generally can figure out exactly what you broke much easier when the deltas to the code since the last successful test are small), but they also help build knowledge of what parts of the code run slowly. If the tests start taking longer than a few seconds to run, I like to give some thought to what could be improved – either I am focusing my testing effort at too high of a level (e.g. hitting the database, starting/stopping subprocesses, etc.) or I have made some questionable assumptions about performance somewhere.

Sometimes, however, despite best efforts, the tests still take longer than a few seconds to run. At this point I start annotating tests with metadata indicating that those tests should be skipped during autotesting (I still run the full test suite before committing (usually) or pushing to master (always)). In Midje, this is done with the :slow tag as follows:

(fact "I'm a slow test." :slow
   (Thread/sleep 1000)
   (+ 1 1) => 2)

Updating our autotest alias skips the slow tests:

:aliases {"autotest" ["midje" ":autotest" ":filter" "-slow"]}

While I’m working on a slow test (or its associated feature), I omit the :slow tag until I’m done with the feature in question. In this way, I can continue to get the most feedback in real time as possible – which helps me develop quality code efficiently.

A note about Emacs integration: While there is a Midje plugin which annotates one’s working buffer with Midje test results, I prefer to run lein autotest in an Emacs shell window, since that allows me to add printlns, SpyScope debugging, etc.

In the next post, we’ll switch gears and talk about literate programming with Marginalia.

Permalink

A Workflow: TDD, RDD and DDD

This is the third post in a series about my current Clojure workflow.

Having discussed my Emacs setup for Clojure, I now present my “ideal” workflow, in which I supplement traditional TDD with literate programming and REPL experimentation.

First, some questions:

  1. How do you preserve the ability to make improvements without fear of breaking things?
  2. What process best facilitates careful thinking about design and implementation?
  3. How do you communicate your intentions to future maintainers (including future versions of yourself)?
  4. How do you experiment with potential approaches and solve low-level tactical problems as quickly as possible?
  5. Since "simplicity is a prerequisite for reliability" (Dijkstra), how do you arrive at simple designs and implementations?

The answer to (1) is, of course, by having good tests; and the best way I know to maintain good tests is to write test code along with, and slightly ahead of, the production code (test-driven development, a.k.a. TDD). However, my experience with TDD is that it doesn’t always help much with the other points on the list (though it helps a bit with (2), (3), and (5)). In particular, Rich Hickey points out that TDD is not a substitute for thinking about the problem at hand.

As an aid for thinking, I find writing to be invaluable, so a minimal sort of literate programming has increasingly become a part of my workflow, at least for hard problems.

The Workflow

Now for the workflow proper. My workflow, in its Platonic Form, is:

  1. Is the path forward clear enough to write the next failing test? If not, go to step 2. If it is, go to step 3.
  2. Think and write (see below) about the problem. Go to 1.
  3. Write the next failing test. This test, when made to pass, should represent the smallest “natural” increase of functionality.
  4. Is it clear how to make the test pass? If not, go to step 5. If it is, write the simplest “production” code which makes all tests pass. Go to 6.
  5. Think, write, and conduct REPL experiments. Go to 4.
  6. Is the code as clean, clear, and simple as possible? If so, go to step 7. If not, refactor, continuously making sure tests continue to pass with every change. Go to 6.
  7. Review the Marginalia docs. Is the intent of the code clear? If so, go to step 1. If not, write some more. Go to 7.

“Writing” in each of the steps above refers to updating comments and docstrings, as described in a subsequent post on literate programming.
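As a concrete (and deliberately tiny) instance of steps 3 and 4, a Midje red-green cycle might look like the following; `classify` is a hypothetical function invented for illustration, not something from the post:

```clojure
(ns example.core-test
  (:require [midje.sweet :refer :all]))

;; Step 4's answer appears first only so this snippet loads cleanly;
;; in the workflow, the fact below was written first and failed (red)
;; before this simplest-possible definition made it pass (green).
(defn classify [_n] :small)

;; Step 3: the smallest "natural" increase of functionality.
(fact "classify labels small numbers"
  (classify 1) => :small)
```

A later fact such as `(classify 100) => :large` would force a real implementation, continuing the red-green-refactor loop described above.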

Here are the above steps as a flow chart: [figure: “Workflow, as a flow chart”]

The workflow presented above is a somewhat idealized version of what I actually manage to pull off during any given coding session. It is essentially the red-green-refactor of traditional test-driven development, with the explicit addition of REPL experimentation (“REPL-driven development,” or RDD) and continuous writing of documentation (“documentation-driven development,” or DDD) as a way of steering the effort and clarifying the approach.

The utility of the REPL needs no elaboration to Clojure enthusiasts and I won’t belabor the point here. Furthermore, a lot has been written about test-first development and its advantages or drawbacks. At the moment, the practice seems to be particularly controversial in the Rails community. I don’t want to go too deep into the pros and cons of TDD other than to say once again that the practice has saved my bacon so many times that I try to minimize the amount of code I write that doesn’t begin life as a response to a failing test.

What I want to emphasize here is how writing and the use of the REPL complement TDD. These three ingredients cover all the bases, (1) through (5), above. While I’ve been combining unit tests and the REPL for some time, the emphasis on writing is new to me, and I am excited about it. Much more than coding by itself, I find that writing things down and building small narratives of code and prose together forces me to do the thinking I need in order to write the best code I can.

Always beginning again

While I don’t always follow each of the above steps to the letter, the harder the problem, the more closely I will tend to follow this plan, with one further modification: I am willing to wipe the slate clean and begin again if new understanding shows that the current path is unworkable, or leads to unneeded complexity.

The next few posts tackle specifics about testing and writing, presenting what I personally have found most effective (so far), and elaborating on helpful aspects of each.

How does your preferred workflow differ from the above?


Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.