Monkeying with Clojure’s deftest

Say you have a namespace a that needs to be tested:

(ns a)
(defn ^{:private true} foo [] 42)

Using Clojure’s clojure.test libs you might think it would be as simple as the following:

(ns b
  (:use [clojure.test :only [deftest is]]))

(deftest test-foo
  (is (= 42 (a/foo))))
; java.lang.IllegalStateException: var: #'a/foo is not public [...]
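A common workaround, shown here as a minimal sketch and not necessarily the approach the truncated post goes on to describe, is to name the var itself with #'a/foo. Looking a var up this way is not subject to the privacy check, and a var can be invoked like the function it holds.

```clojure
(ns a)
(defn ^{:private true} foo [] 42)

(ns b
  (:use [clojure.test :only [deftest is]]))

;; #'a/foo is the var, not the symbol a/foo, so the "not public" check
;; that rejects (a/foo) does not apply. Invoking the var calls foo.
(deftest test-foo
  (is (= 42 (#'a/foo))))
```

The trade-off is that the test now reaches past the namespace's public surface, so it is best kept to cases where the private function really deserves a test of its own.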

Permalink

Thinking in Clojure?

There's a discussion on the Clojure mailing list about how to learn to "think in Clojure" (or think in Lisp or, really, think in functional programming terms). A prominent recommendation is The Joy Of Clojure by Michael Fogus and Chris Houser, which everyone says is a great book, but here are a couple of free online books that were also recommended:

  • Structure and Interpretation of Computer Programs by Abelson, Sussman, and Sussman. It's the "entry-level subject in computer science at the Massachusetts Institute of Technology" and it uses a dialect of Lisp called Scheme, not Clojure, but it provides a good grounding in both computer science and functional programming.
  • How to Design Programs by Felleisen, Findler, Flatt and Krishnamurthi. This is another introduction to programming / computer science style book that also uses Scheme for its examples.

Enjoy!

Permalink

cfmljure - using Clojure from CFML

If you follow me on Twitter, you'll have seen me posting about Clojure quite a bit recently. I really like the simplicity and elegance of Clojure. I like the functional programming style. I like that it's a dynamic scripting language. I like that it can also be compiled to JVM bytecode and used in any mixed-language project on the JVM.

About a month ago I helped someone get some Clojure code compiled and integrated into CFML, like any other Java-based project, but that set me thinking about being able to just use raw Clojure scripts from CFML without needing to go thru the compilation and deployment process. I asked on the Clojure mailing list how to load and run scripts from Java and that gave me what I needed to create a simple CFC wrapper that lets you write Clojure scripts and dynamically load and execute them from inside CFML.

That's how cfmljure was born on github! It's very early days for the project - I consider this an 'experimental' version - but I've created a Google mailing list for cfmljure and it's also listed on RIAForge. I don't expect it to be crazy popular (like FW/1 for example) but I expect to use it on production projects and thought it would be good to put out there for others to experiment with and provide feedback on.

Things on the roadmap include making it more Leiningen friendly (Leiningen is the de facto standard build tool for Clojure and it definitely makes life simpler) as well as figuring out how to access Clojure variables from CFML. I may even try to figure out how to pass CFCs into Clojure and have them be callable (Clojure can call Java but I'll probably go the route of a Clojure proxy function initially).

Have fun with it! Join the Google Group if you have questions / problems / suggestions!

Permalink

Towards generic APIs for the open world


In my last post on how Clojure protocols encourage open abstractions, I did some quick rounds between type classes in Haskell and protocols in Clojure. At the end, in the section titled "Not really a type class", I mentioned the read function of Haskell's Read type class. read takes a String and returns a type - hence it doesn't dispatch on the function argument, but rather on the return type. Clojure protocols can't do this, and I am not aware of any dynamic language that can. Check out James Iry's insightful comment on this subject on the post.
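To make the contrast concrete, here is a small Clojure sketch. The protocol Describe and the multimethod read-as are hypothetical names invented for this illustration. The protocol dispatches on the type of its first argument, and the closest a dynamic language gets to read is asking the caller to pass the target explicitly.

```clojure
;; Protocol dispatch is driven by the runtime type of the first argument.
(defprotocol Describe
  (describe [x]))

(extend-protocol Describe
  Long   (describe [_] "a long")
  String (describe [_] "a string"))

;; Hypothetical read-as: the caller has to name the target type as a value,
;; because there is no static return type for the runtime to dispatch on.
(defmulti read-as (fn [target s] target))

(defmethod read-as :int   [_ s] (Long/parseLong s))
(defmethod read-as :float [_ s] (Double/parseDouble s))

(describe 123)           ; "a long"
(read-as :int "123")     ; 123
(read-as :float "123.0") ; 123.0
```

The explicit :int and :float tokens do the job that type inference does for Haskell's read.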


With type classes all dispatch is static - the dispatch map is passed as a dictionary of types and inferred by the compiler. What benefit does this bring us? Do we really get anything special when the language supports APIs like the read method of Haskell's Read type class?


In this post I try to explore how type classes help design generic APIs that are open and can work seamlessly with abstractions that you implement much later than the type class itself. This is in contrast to subtype polymorphism, where all subtypes are bound by the contracts that the super type exposes. In this sense subtype polymorphism is closed.


This post is inspired in part by the excellent article Generalizing APIs by Edward Z. Yang. For this post I will use Scala, my current language of choice for most of the things I do today.


My generic API


I want to implement a read API like the one in Haskell encoded in a Scala type class .. Let's make it generic in the type that it returns ..


// type class
// reads a string, returns a T
trait Read[T] {
  def read(s: String): T
}

For the open world


We can define instances of this type class by instantiating the trait as objects. Type classes are implemented in Scala using implicits. In case you're not familiar with the concept, here's what I wrote about them some time back.


// instance for Int
implicit object IntRead extends Read[Int] {
  def read(s: String) = s.toInt
}

// instance for Float
implicit object FloatRead extends Read[Float] {
  def read(s: String) = s.toFloat
}

These are very much like what you would do with type class instances in Haskell. You can even create instances for your own abstractions ..


case class Name(last: String, first: String)

object NameDescription {
  // "first/last" => Some((last, first)); anything else => None
  def unapply(s: String): Option[(String, String)] = s.split("/") match {
    case Array(first, last) => Some((last, first))
    case _ => None
  }
}

// instance for Name
import NameDescription._
implicit object NameRead extends Read[Name] {
  def read(s: String) = s match {
    case NameDescription(l, f) => Name(l, f)
    case _ => sys.error("invalid")
  }
}

So the Read type class in Scala is generic enough to be instantiated for all kinds of abstractions. Note that unlike interfaces in Java, the polymorphism is not coupled with inheritance hierarchies. With interfaces, your abstraction needs to implement the interface statically, which means that the interface has to exist before you design your abstraction. With type classes, the abstractions for Int and Float existed well before we defined the Read type class.


Now if we have a generic function that takes a String, we can make it return an instance of the type it is generic on.


def foo[T : Read](s: String) = implicitly[Read[T]].read(s)

foo[Int]("123") // 123
foo[Float]("123.0") // 123.0
foo[Name]("debasish/ghosh") // Name("ghosh", "debasish")

Ok .. so that was our generic read API adapting violently to already existing abstractions. In this case it's exactly the Scala variant of how simple type class instances behave in Haskell. The authors of Real World Haskell use the term open world assumption to describe this feature of the type class system.


Context for selecting the API instance


When the function foo is invoked, the compiler needs to find the exact instance of the Read type class: from the method dictionary in the case of Haskell, and from the implicit values in scope in the case of Scala. For this we specify the context bound of the generic type T as T : Read. This is the same as the context of the type class that we have in Haskell. It specifies that the method foo can return any type T provided the type is an instance of the type class Read. Apart from using the context bound, in Scala you can also use view bounds to implement the context of a type class. The Haskell equivalent is ..


foo :: Read a => String -> a

Irrespective of Haskell or Scala, our API becomes hugely expressive through such constraints that the static type system allows us to write. And all these constraints are checked during compile time.


Context in implementing specific instances


When defining a generic API, you can also set up a context for specific instances of the type class. Consider our read method for a List datatype in Scala. Haskell defines the instance as ..


instance Read a => Read [a] where ..

Note the context Read a following the instance keyword. This is called the context of the type class instance which says that we can read a List of a only if all individual a's also implement the Read type class. 


We do this in Scala using conditional implicits as ..


implicit def ListRead[A](implicit r: Read[A]) = 
  new Read[List[A]] {
    def read(s: String) = {
      val es = s.split(" ").toList
      es.map(r.read(_))
    }
  }

The implicit definition itself takes another implicit argument to validate during compile time that the individual elements of the List also are instances of the type class. This is similar to what the context does in case of Haskell's type class instantiation.


foo[List[Int]]("12 234 45 678") // List(12, 234, 45, 678)
foo[List[Float]]("12.0 234.0 45.0 678.0") // List(12.0, 234.0, 45.0, 678.0)
foo[List[Name]]("debasish/ghosh maulindu/chatterjee nilanjan/das")
  // List(Name("ghosh", "debasish"), Name("chatterjee", "maulindu"), Name("das", "nilanjan"))

As one of the common GHC extensions, Haskell also provides support for overlapping instances of type classes ..


instance Read a => Read [a] where ..
instance Read [Int] where ..

In such cases although there are two possible matches for [Int], the compiler can make an unambiguous decision and select the most specific instance. With Scala, there is no such ambiguity to be resolved since Scala anyway allows multiple implementations of the same type class and it's up to the user to import the specific one into the module.


In this post I discussed the power that you get with type class based generic API design. In functional languages like Haskell, type classes are the most potent way to implement extensible APIs for the open world. Of course in object functional languages like Scala, you also have the power of subtyping, which comes good in many circumstances. It will be interesting to come up with a comparative analysis of situations when we prefer one to the other. But that's up for some other day, some other post ..

Permalink

Today in the Intertweets (Sept 2nd Ed)

  • Programming Challenge for Newbies in #Clojure and #Python too? Share your thoughts (here, via @IndianGuru) — RubyLearning has been holding monthly Ruby programming challenges for newbies. They’re thinking about expanding them to Clojure and Python too.
  • John Rose on JVM Summit is all about moving towards a functional paradigm; it seems #clojure’s guiding the way (here, via @pedroteixeira) — didn’t I say those talks were full of gold?!
  • cfmljure – calling #clojure from #coldfusion – is available to play (here, via @seancofrield) — ColdFusion is a veteran of the web scripting languages/frameworks. Now you can finally do cool stuff with it ;)

Permalink

Compojure Demystified with an example – Part 5

In this part let's write our own middleware. From part 4 you will remember, “Middleware are functions that could be chained together to process a request. Middleware functions can take any number of arguments, but the spec states that the first argument should be a handler and the function should return a handler. An example for middleware is [...]
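As a rough illustration of that contract (a sketch with hypothetical names, not the code from the excerpted post): a middleware takes a handler as its first argument, may take further arguments, and returns a new handler. Handlers here are plain functions from a request map to a response map, so no library is needed to try it.

```clojure
;; Hypothetical middleware: the handler comes first, extra arguments after it,
;; and the return value is itself a handler.
(defn wrap-server-header
  [handler server-name]
  (fn [request]
    (let [response (handler request)]
      (assoc-in response [:headers "Server"] server-name))))

;; A trivial handler to wrap.
(defn hello-handler [request]
  {:status 200 :headers {} :body "hello"})

;; Middleware chain by wrapping one handler in another.
(def app (wrap-server-header hello-handler "demo"))

(app {:uri "/"})
;; {:status 200, :headers {"Server" "demo"}, :body "hello"}
```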

Permalink

Today in the Intertweets (Sept 1st Ed)

  • Editing trees in #clojure with clojure.zip (here, via @marick) — clojure.zip is a library for functionally traversing and modifying (well, creating modified copies of) trees. How to use this library is not immediately obvious, and this article explains how to use it.
  • The Joy of Clojure: Thinking the Clojure Way – Book Review (here, via @ibmkhd) — “reading “Joy of Clojure” might actually require the reader to use a dictionary, as the lexical range used by the authors is broad and might be a barrier”. I disagree, english is my third language and I didn’t need a dictionary. I even got some of the jokes! This is a horribly shallow review, actually.
  • Did you know about linear search in #clojure? (here, via @kotarak) — Short article explaining how ‘contains?’ works (which seems to confuse a lot of people) and how ‘some’ is a very useful higher-level function for finding elements in a sequence.
  • “at the moment we believe #fsharp and #clojure to be better suited to most organisations for assessing than #scala” (here, via @ptrelford) — Technology Radar is an advisory publication by the IT consultancy ThoughtWorks that periodically reviews new technologies and assesses their levels of maturity and desirability for IT customers. Well, in this issue they are continuing to favor Clojure over Scala… aaaand now it is time to run for cover ;)
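For readers who have not hit the contains? confusion yet, a quick sketch of the difference (my own example, not taken from the linked article): contains? asks whether a key is present, and for a vector the keys are indices, while some with a set as predicate does the linear search for a value.

```clojure
(def v [:a :b :c])

(contains? v 1)   ; true, because index 1 exists
(contains? v :b)  ; false, because :b is a value and not an index
(some #{:b} v)    ; :b, found by walking the sequence
(some #{:z} v)    ; nil, no such element
```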

Permalink

Clojure: Mocking

An introduction to clojure.test is easy, but it doesn't take long before you feel like you need a mocking framework. As far as I know, you have 3 options.

  1. Take a look at Midje. I haven't gone down this path, but it looks like the most mature option if you're looking for a sophisticated solution.

  2. Go simple. Let's take an example where you want to call a function that computes a value and sends a response to a gateway. Your first implementation looks like the code below. (destructuring explained)
    (defn withdraw [& {:keys [balance withdrawal account-number]}]
      (gateway/process {:balance (- balance withdrawal)
                        :withdrawal withdrawal
                        :account-number account-number}))
    No, it's not pure. That's not the point. Let's pretend that this impure function is the right design and focus on how we would test it.

    You can change the code a bit and pass in the gateway/process function as an argument. Once you've changed how the code works you can test it by passing identity as the function argument in your tests. The full example is below.
    (ns gateway)

    (defn process [m] (println m))

    (ns controller
      (:use clojure.test))

    (defn withdraw [f & {:keys [balance withdrawal account-number]}]
      (f {:balance (- balance withdrawal)
          :withdrawal withdrawal
          :account-number account-number}))

    (withdraw gateway/process :balance 100 :withdrawal 22 :account-number 4)
    ;; => {:balance 78, :withdrawal 22, :account-number 4}

    (deftest withdraw-test
      (is (= {:balance 78, :withdrawal 22, :account-number 4}
             (withdraw identity :balance 100 :withdrawal 22 :account-number 4))))

    (run-all-tests #"controller")
    If you run the previous example you will see the println output and the clojure.test output, verifying that our code is working as we expected. This simple solution of passing in your side effect function and using identity in your tests can often obviate any need for a mock.

  3. Solution 2 works well, but has the limitations that only one side-effecty function can be passed in and its result must be used as the return value.

    Let's extend our example and say that we want to log a message if the withdrawal would cause insufficient funds. (Our gateway/process and log/write functions will simply println since this is only an example, but in production code their behavior would differ and both would be required)
    (ns gateway)

    (defn process [m] (println "gateway: " m))

    (ns log)

    (defn write [m] (println "log: " m))

    (ns controller
      (:use clojure.test))

    (defn withdraw [& {:keys [balance withdrawal account-number]}]
      (let [new-balance (- balance withdrawal)]
        (if (> 0 new-balance)
          (log/write "insufficient funds")
          (gateway/process {:balance new-balance
                            :withdrawal withdrawal
                            :account-number account-number}))))

    (withdraw :balance 100 :withdrawal 22 :account-number 4)
    ;; => gateway: {:balance 78, :withdrawal 22, :account-number 4}

    (withdraw :balance 100 :withdrawal 220 :account-number 4)
    ;; => log: insufficient funds
    Our new withdraw implementation calls two functions that have side effects. We could pass in both functions, but that solution doesn't seem to scale very well as the number of passed functions grows. Also, passing in multiple functions tends to clutter the signature and make it hard to remember what is the valid order for the arguments. Finally, if we need withdraw to always return a map showing the balance and withdrawal amount, there would be no easy solution for verifying the string sent to log/write.

    Given our implementation of withdraw, writing a test that verifies that gateway/process and log/write are called correctly looks like a job for a mock. However, thanks to Clojure's binding macro, it's very easy to redefine both of those functions to capture values that can later be tested.

    The following code rebinds both gateway/process and log/write to partial functions that capture whatever is passed to them in an atom that can easily be verified directly in the test.
    (ns gateway)

    (defn process [m] (println "gateway: " m))

    (ns log)

    (defn write [m] (println "log: " m))

    (ns controller
      (:use clojure.test))

    (defn withdraw [& {:keys [balance withdrawal account-number]}]
      (let [new-balance (- balance withdrawal)]
        (if (> 0 new-balance)
          (log/write "insufficient funds")
          (gateway/process {:balance new-balance
                            :withdrawal withdrawal
                            :account-number account-number}))))

    (deftest withdraw-test1
      (let [result (atom nil)]
        (binding [gateway/process (partial reset! result)]
          (withdraw :balance 100 :withdrawal 22 :account-number 4)
          (is (= {:balance 78, :withdrawal 22, :account-number 4} @result)))))

    (deftest withdraw-test2
      (let [result (atom nil)]
        (binding [log/write (partial reset! result)]
          (withdraw :balance 100 :withdrawal 220 :account-number 4)
          (is (= "insufficient funds" @result)))))

    (run-all-tests #"controller")
In general I use option 2 when I can get away with it, and option 3 where necessary. Option 3 adds enough additional code that I'd probably look into Midje quickly if I found myself writing more than a few tests that way. However, I generally go out of my way to design pure functions, and I don't find myself needing either of these techniques very often.
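A note for anyone trying option 3 on a newer Clojure: from Clojure 1.3 on, vars are not dynamic by default, so binding on a plain defn such as gateway/process throws an IllegalStateException. The sketch below reuses the example's namespaces and swaps binding for with-redefs, which temporarily replaces the root value of a var. The test name is my own, and this is one possible adaptation, not the author's code.

```clojure
(ns gateway)

(defn process [m] (println "gateway: " m))

(ns log)

(defn write [m] (println "log: " m))

(ns controller
  (:use clojure.test))

(defn withdraw [& {:keys [balance withdrawal account-number]}]
  (let [new-balance (- balance withdrawal)]
    (if (> 0 new-balance)
      (log/write "insufficient funds")
      (gateway/process {:balance new-balance
                        :withdrawal withdrawal
                        :account-number account-number}))))

;; with-redefs swaps the root binding for the duration of the body and
;; restores it afterwards, so it works on ordinary non-dynamic vars.
(deftest withdraw-with-redefs-test
  (let [result (atom nil)]
    (with-redefs [gateway/process (partial reset! result)]
      (withdraw :balance 100 :withdrawal 22 :account-number 4)
      (is (= {:balance 78, :withdrawal 22, :account-number 4} @result)))))
```

Because with-redefs changes the root value, the replacement is visible to every thread while the body runs, which is worth remembering if tests run in parallel.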

Permalink

Programming Challenge for Newbies in Clojure and Python too?

RubyLearning has been conducting the monthly Ruby Programming Challenge for Newbies for over a year now and so far 12 challenges have been completed. The 13th challenge is in progress. All this was possible due to the extensive support we got from Rubyists across the world. Also, you all indicated that we should continue with these challenges in the months to come.

Recently, my colleague Dhananjay Nene posted a Python based solution to the 13th Ruby challenge. While discussing the solution it struck me that it would help Clojure and Python Newbies, if we opened up these challenges in these languages too. Dhananjay and some of my Clojure colleagues are interested in evaluating the submitted solutions in Clojure and Python and maybe we could start the challenges from Oct. 2010.

Clojure and Python enthusiasts – interested? What do you think? What is your opinion? Please share in the comments below.

Update

3rd Sept. Thanks for the very encouraging response. Based on the feedback received so far, we have decided the following:

  • We will start the challenges for Clojure, Python and Ruby from 1st Oct. 2010. We will call these “Programming Challenge for Newbies” and host them on this blog until the end of Dec. 2010. If the response is encouraging, we can host the challenges on different domains.
  • We will have separate panels to evaluate the solutions. One each for Clojure, Python and Ruby.
  • We will keep separate prizes for the 3 languages (and hopefully would find some sponsors).
  • The challenge problem setters (fixed till Dec. 2010) would be told that the problem should be solvable in all languages and specifically Clojure, Python and Ruby. This means that the problem setter should not set a problem that needs to be solved by some specific language feature.

Posted by Satish Talim

Permalink

git cheatsheet and class notes

I recently gave a talk at work about git. I created a cheatsheet based on Steve Tayon’s Clojure Cheatsheet.

Git Cheat Sheet Preview

I realize there are a number of cheatsheets for git already. However, I wanted a simple, one-page sheet specifically for my attendees.

You can download it here:

Like it? Hate it? Find a typo? Leave your feedback in the comments!


Here are my raw notes from the talk:

;; -*- mode: Markdown; -*-

# How to read:
commands are indented
actions to perform while presenting are marked with @
left to right

# Welcome
see progit.org
what is version control

why use it:

  * backup/restore
  * synchronization sharing
  * track changes
  * ownership
  * branching and merging

who has used subversion 

git
  * you've heard it's distributed
  * b/c branching and merging

pace - slow, no slides

leave with practical understanding

# Install & Config

    sudo port install git-core +svn
    git config --global user.name "Nate Murray"
    git config --global user.email "nate@natemurray.com"

# Basic Commands

    cd ~
    mkdir -p projects/demo       # explain only a little
    cd projects/demo
    git init
    git status                   # nothing here
    ls -a                        # talk .git repository vs. working copy
    echo "version 1" > README.txt
    git status                   # untracked file
    git add README.txt
    git status                   # changes to be committed
    git commit -m "added version one of the file"
    git status                   # clean

stop, draw the picture of the local operation phases - e.g. svn vs. git

> Principle 1: (almost) everything is local

so now that you know about the staging area, let's do it again

    echo "new file" > sheep.rb
    git status                   # draw untracked
    git add sheep.rb
    git status                   # draw staged
    git commit -m "added"

    cat README.txt                 # draw unmodified
    echo "version 2" > README.txt
    git status                   # draw modified
    git commit -a -m "updated version" # shorthand for git add
    git status

Tips:

    git config --global alias.st status
    git st

# Git Internals

* Before we can talk about branching you *have* to understand how git works (tried to avoid this)
* files and folders

three objects -  @ Draw first commit

  * blob        - raw data
  * tree        - folder (stores blobs and trees)
  * commit      - snapshot of the repo + meta 

You won't need to use `git cat-file` on a daily basis. however, understanding
the concepts we're going to talk about is really important for branching.

    git log # view the log
    git show ----  # first commit, whatever it is

    git cat-file -p  ---- # first commit
    git cat-file -p  ---- # tree
    git cat-file -p  ---- # blob

draw the rest using git `cat-file`

    git log           # show the log again
    git cat-file -p ---- # second commit

draw the picture. point out the parent connection.
note committer / author

    git cat-file -p ---- # tree

note here there are two blobs!

finish drawing out the second commit
* git stores reference to first file.
* snapshot of the *whole project*
* git stores each file once
* filename is in the `tree` 

draw the last commit

     git log
     git cat-file -p ---- # third commit

> Principle #2 : Git commits are snapshots

* A commit in git is a snapshot of the entire project, not just a list of diffs.
* snapshot is based on the SHA-1 hash function. guarantees file integrity

# refs/branches

questions?

@ stop. redraw commits as *linear* . looking only at commits

ready to define a branch
a branch is a pointer to a commit
text file with a sha. that's it.

start with one branch called `master`

    git branch

bash prompt

    # skip this
    tree .git/refs/
    cat .git/refs/heads/master
    git log
    # compare the SHAs

update diagram by adding a `ref` to our commit. (`master`). 

@ draw circle pointing to commit

create testing branch

# branching

So let's create another branch:

    git branch testing
    git branch

only created, didn't switch. just created a ref pointing to this
commit

@ update diagram

How does git know what branch we are "on"?

special ref called `HEAD` that points to the local branch
since we are still on master HEAD points to master

@ add HEAD

To switch the working copy, use `git checkout`

    git checkout testing
    git branch

HEAD moves from `master` to `testing`

@ update diagram

master and testing point to the same commit, working directory isn't changed

checkout means something different in git than it does in svn.
checkout in git switches our working directory to a particular commit.

now make changes:

    cat README.txt
    echo "we are on the testing branch!" > README.txt
    cat README.txt
    git commit -a -m "updated the readme"
    git log

@ update diagram, adding new commit. move the testing ref and the HEAD ref with it

add a "test"

    echo "this is a test" > test.rb
    git add test.rb                    # stage it for our commit
    git commit -m "added a test"       # now commit
    git log

@ update diagram - should have two commits

hotfix - scenario: you need to switch back to master

    git checkout master
    ls

@ move HEAD

so notice two things.
1) switching to this branch was fast - everything is local
2) our file test.rb is absent

and if we

    cat README.txt

it says 'version 2' just like we would expect

    echo "applying fix" >> sheep.rb
    cat sheep.rb
    git commit -a -m "applied important fix"
    git log
    git cat-file -p ---- # last commit

@ draw the new commit, and draw its reference back to the parent. move HEAD and master

now fixed, can push into production
and get back to work in `testing`

    git checkout testing
    cat README.txt
    cat test.rb

This is a general pattern:

> Principle #3: Branching is cheap, use it often

If you are working on a particular feature, create a branch. 

If you're coming from svn, making frequent branches might seem unnatural.
in svn, a branch is global -> namespace issues.
vs. git: private branches
name your branch 'test' and it won't collide with anyone else's

But branching itself isn't that useful unless it's easy to merge.

* how many of you have merged a branch in svn?
* how many of you enjoyed it?

merging is one of git's strengths and git makes it relatively easy

# merging

    cat sheep.rb

two branches: `master` and `testing` - need to merge

    git checkout master
    git merge testing
    git show HEAD

instead of a 'parent' we have a line that says 'merge'
a merge commit has more than one parent

@ draw the commit object
@ draw lines to the commits

    gitx

sometimes merging doesn't go as planned - conflicts

    git checkout -b breaker

this is shorthand for create and then checkout a new branch based on the
current HEAD

    vi sheep.rb # changing fix
    git commit -a -m "changed the fix"
    git checkout master
    vi sheep.rb # improving fix
    git commit -a -m "improved the fix"

@(update diagram, adding breaker and master refs)

    git merge breaker
    git status

there are many diff viewing tools.
* perforce
* opendiff - from apple

    git mergetool -t opendiff

I don't really like using the visual tools.
Sometimes you need character level editing

    vi sheep.rb
    git add sheep.rb
    git commit -a

talk about merge with conflicts

@ update diagram draw new merge commit

    gitx

Questions?

# Remotes

Everything so far on one machine. 

I work offline (I take the train)
If I break something I can roll back and see where I was an hour ago

want to share our changes.
might seem scary or messy because changes are made to totally independent lines of the code.
but in practice it's not a problem.

svn version numbers are incremental - so two repos would get out of
step
no easy way of merging two separate repositories.

git blob identifiers are a SHA of the content.
if the same content is created anywhere in the universe you'll still
have the same SHA
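A small sketch of that idea, written in Clojure only so it can be tried without touching a repo. It relies on one fact about git's object format: a blob's id is the SHA-1 of the header `blob <byte count>` followed by a NUL byte and then the content. The function name git-blob-id is made up for this illustration.

```clojure
(import 'java.security.MessageDigest)

(defn git-blob-id
  "Hex SHA-1 of the blob header plus the content, the way git names a blob."
  [^String content]
  (let [body   (.getBytes content "UTF-8")
        ;; header is "blob <byte count>" followed by a NUL byte
        header (.getBytes (str "blob " (alength body) "\u0000") "UTF-8")
        md     (MessageDigest/getInstance "SHA-1")]
    (.update md header)
    (.update md body)
    ;; mask each signed byte to 0..255 before formatting as two hex digits
    (apply str (map #(format "%02x" (bit-and % 0xff)) (.digest md)))))

(git-blob-id "")
;; "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391", the well-known id of the empty blob
```

The id depends only on the bytes, so the same content hashed on any machine gets the same name, and that is why two repos can exchange objects without coordinating.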

git doesn't care about where your commits come from or how you get them

Protocols:
  * ssh
  * git
  * http
  * local file system

sample project on our github

    cd ..
    open http://XXX/nmurray/simple-echo
    git clone git@XXX:nmurray/simple-echo.git
    cd simple-echo
    git log

svn checkout just HEAD
vs. git - whole repo

To be able to collaborate with others you have to manage 'remote repositories'.
When you clone a project, you have a default remote called 'origin'. 

    git remote -v

Remotes are pointers to other repositories that are _usually_ over the network.
'pull' and 'push' changes.

    vi README.mkd
    # make a change
    git commit -a -m "make a change"
    git push

If someone else makes a change:

    git pull origin master

This means pull from `origin` the branch `master` into local branch `master`. You can often just

    git pull

which means pull from origin whatever branch I'm on (i.e. HEAD) into this branch.

Now let's say someone pushes a change and I make a change
I can't push unless I pull first. This is good.

# remote forks

So that is while we are on the same line. What if we're on different lines?

@(open up webbrowser again)

Bh also has forked my project. But when we say forked, all that means is he has
created his own development line from some of my commits

    git remote add bh git@XXX:bhenderson/simple-echo.git
    git remote -v

Now you shouldn't be surprised to learn that adding the remote doesn't change
anything. First we have to `fetch` his changes

    git fetch bh

`fetch` brings his commits into my repo but again, doesn't change my working copy.

fetch brought branches + commits into repo
work with those branches just like any other branch.

    git branch -a

So you see here we have 

* `master`, which is our local master
* at the bottom we have `origin/master`, which is the master branch of the origin we pulled from
* and then we have `bh/master`, which is bhenderson's master branch

These are all regular branches: they are just pointers to commits. We
can even check one out as a branch

    git checkout bh/master

scary message

    git checkout master

So how would we merge bhenderson's changes with our own? I'm sure you could guess by now. Simply:

    git merge bh/master # don't press enter!!!

But let's take it up a notch.
say you didn't want to merge bh's changes in your master branch.
real world, you might not know if his changes would merge cleanly
don't want to mess up your master branch.

What we are going to do is
* create a new branch,
* merge bh's branch in THAT branch
* then we're going to merge to master.

It will make more sense when we do it. Let's try:

Okay we first want to create a new branch based on our master

    git checkout -b bh-merge
    git branch -a 

Now let's merge his changes

    cat simple-echo.rb
    git merge bh/master
    cat simple-echo.rb
    git log                  # see bh as the author of the commit

okay everything was clean! *phew* now let's go back to master

    git checkout master
    git merge bh-merge
    git log

and there we go! merged nicely.
now I don't need bhenderson's merge branch anymore, so let's delete it

    git branch -d bh-merge
    git branch -a

git is distributed

Instead of one central server, that everyone has to sync to,
* independent lines of work can go on.
* If someone creates something good in their branch, they just tell people about it.
* permission-less 

you can see why it is so good for open-source development

questions about branching?

# Advanced

* tagging
* rebase
* cherry pick
* git bisect
* hooks
* tracking branches
* submodules
* interactive staging
* squashing commits
* git-svn
* setting up your own server
* patches via email
* gitjour

Permalink

Consistent APIs for collections

I have been using Clojure a lot for work this year, and the consistent API for anything that is a seq (lists, vectors, maps, trees, etc.) is probably my favorite language feature. Scala 2.8 collections offer the same uniform API. For me, Clojure and Scala, with a fairly small number of operations to remember across most collections, represent a new paradigm for programming compared to older languages like Java, Scheme, and Common Lisp, which force you to remember too many different operation names. The Ruby Enumerable module also provides a nice consistent API over collections. Most Ruby collection classes mix in Enumerable, but the API consistency is not as good as in Scala and Clojure. That said, even though Enumerable is built on a small core (a class only has to implement each to get map, find, etc.), the ability to combine these methods with blocks is very flexible.
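
To make the point concrete, here is a small sketch of the same few sequence functions applied to four different Clojure collection types. The var names (as-list, as-vector, as-set, as-map) are made up for this illustration.

```clojure
;; The same handful of sequence functions works on every collection type.
(def as-list   '(3 1 2))
(def as-vector [3 1 2])
(def as-set    #{3 1 2})
(def as-map    {:a 3 :b 1 :c 2})

(map inc as-list)          ; => (4 2 3)
(map inc as-vector)        ; => (4 2 3)
(sort (map inc as-set))    ; => (2 3 4)  sets are unordered, hence the sort
(sort (map val as-map))    ; => (1 2 3)  a map is seen as a seq of key/value entries
(filter odd? as-vector)    ; => (3 1)
(reduce + (vals as-map))   ; => 6
```

Nothing here is specific to one collection type: map, filter, and reduce only ask that their argument can be viewed as a seq.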

Permalink

“Editing” trees in Clojure with clojure.zip

Clojure.zip is a library that lets you move around freely within trees and also create changed copies of them. This is a tutorial I wish I’d had when I started using it.

In this tutorial, I’ll use sequences as trees. You can create your own kind of trees if you want, but I won’t cover that.

Here’s what we’ll be working with:

user=> (def original [1 '(a b c) 2])
#'user/original

It’s a tree whose root is a vector with three children: 1, the subtree (a,b,c) and 2. We need to convert it into some sort of data structure that allows free movement. That’s done like this:

user=> (require '[clojure.zip :as zip])
nil
user=> (def root-loc (zip/seq-zip (seq original)))
#'user/root-loc

Notice that I used the alias zip. If you refer or use clojure.zip, you’ll find yourself overwriting useful functions like clojure.core/next.

Notice also that I explicitly wrapped the tree in a seq. If you use seq-zip on an unwrapped vector, you’ll get confusing results.
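
Here is a small sketch of what "confusing" can look like. My understanding is that seq-zip only treats nodes for which seq? is true as branches, so a bare vector at the root looks like a leaf with no children. The names unwrapped and wrapped are just for this example.

```clojure
(require '[clojure.zip :as zip])

(def original [1 '(a b c) 2])

;; Without the seq wrapper, the root vector is not considered a branch.
(def unwrapped (zip/seq-zip original))
(zip/down unwrapped)              ; => nil, there is nowhere to go

;; With the wrapper, the root is a seq and has three children.
(def wrapped (zip/seq-zip (seq original)))
(zip/node (zip/down wrapped))     ; => 1
```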

At this point, I’d describe root-loc as “the location (or loc) of the root of the tomorrow tree.” I say “tomorrow tree” because it’s not a tree itself, but something that can later be converted into a tree. In reality, root-loc names both the loc and the tomorrow tree, bundled up together, but I think it most straightforward to leave the tomorrow tree implicit.

(The actual data structure is called a “zipper”, which is a decent analogy for the actual implementation but didn’t help me understand how to use the library.)

Moving around inside a tomorrow tree

With the loc, we can move around:

user=> original
[1 (a b c) 2]
user=> (zip/node (zip/down root-loc))
1

zip/down moves to the leftmost child of the current loc and returns that child’s loc. zip/node gives you the subtree of the original tree corresponding to its loc argument. It’s one of the main ways you get parts of a tree (regular lists, vectors, and the like) “out of” a tomorrow tree.

The -> macro makes movement in the tomorrow tree much easier to understand:

user=> (-> root-loc zip/down zip/right zip/node)
(a b c)
user=> (-> root-loc zip/down zip/right zip/down zip/right zip/node)
b

Nevertheless, I always wrap anything but the simplest traversals in their own functions.

Here are some other movement functions for you to try: up, left, rightmost, and leftmost. Beware of (arguable) inconsistencies in the handling of edge cases. For example, right and rightmost behave differently when moving “off the end” of a list of siblings:

user=> original
[1 (a b c) 2]
user=> (def last-one (-> root-loc zip/down zip/right zip/right))
#'user/last-one
user=> (zip/node last-one)
2
user=> (-> last-one zip/right)
nil         ; off into nothingness
user=> (-> last-one zip/rightmost zip/node)
2           ; stays put

Parts of a tree

In addition to zip/node, there are other functions for recovering parts of the original tree. All work relative to the current loc.

user=> (def b (-> root-loc zip/down zip/right zip/down zip/right))
#'user/b
user=> (zip/node b)
b
user=> (zip/lefts b)
(a)
user=> (zip/rights b)
(c)

An interesting one shows all subtrees from the root of the tree down to just above the current loc:

user=> (zip/path b)
[  (1 (a b c) 2)
      (a b c)     ]

Changing the tree

A number of functions take a current loc (and associated tomorrow tree) and produce a new loc inside a different tomorrow tree. For example, let’s delete the '(a b c) subtree:

user=> (def loc-in-new-tree (zip/remove (zip/up b)))
#'user/loc-in-new-tree

How does the tree represented by this new tomorrow tree differ from the original tree? We can see that with the root function, which applies zip/node to the root of the tomorrow tree:

user=> (zip/root loc-in-new-tree)
(1 2)
user=> original
[1 (a b c) 2]

Where, exactly, is the new location?

user=> (zip/node loc-in-new-tree)
1

The new loc has “backed up” from its previous version. (That’s not an exact enough description, but it’ll do for a few more paragraphs.) The other editing functions return an “unchanged” loc (except in the sense that it’s pointing into a new tomorrow tree with a changed structure): insert-left, insert-right, replace, edit, insert-child, and append-child. Try them out.
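
If you want a starting point for trying them out, here is a minimal sketch that applies each of those editing functions at (or just above) the loc of b and looks at the resulting tree with zip/root. The inserted symbols (B, x, y, z, d) are arbitrary.

```clojure
(require '[clojure.zip :as zip])

(def root-loc (zip/seq-zip (seq [1 '(a b c) 2])))
(def b (-> root-loc zip/down zip/right zip/down zip/right))

(zip/root (zip/replace b 'B))                  ; => (1 (a B c) 2)
(zip/root (zip/insert-left b 'x))              ; => (1 (a x b c) 2)
(zip/root (zip/insert-right b 'y))             ; => (1 (a b y c) 2)
(zip/root (zip/edit b str "!"))                ; => (1 (a "b!" c) 2)

;; insert-child and append-child work on a branch, so move up to (a b c) first.
(zip/root (zip/insert-child (zip/up b) 'z))    ; => (1 (a z b c) 2)
(zip/root (zip/append-child (zip/up b) 'd))    ; => (1 (a b c d) 2)

;; The returned loc stays where it was, but in the new tomorrow tree.
(zip/node (zip/replace b 'B))                  ; => B
```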

In which I reveal I’m slow

At first, I easily forgot that these functions create a new tomorrow tree and don’t really “replace” or “insert” or “edit” parts of the old one. That is:

user=> (zip/root loc-in-new-tree)
(1 2)                ; I see that I've edited the tree.
user=> (zip/root b)
(1 (a b c) 2)        ; Wait - I thought I changed the tree?!

“Well, duh”, you might say, “it’s a functional language with immutable state, so of course it doesn’t change the old tree.” You’re absolutely right, but it was surprisingly easy for old habits to sneak up on me. So, two rules:

  • If you ever fail to use the return value of one of these functions, you’re doing it wrong.

  • If you ever write code like this:

    (let [stashed-location (zip/whatever ...)]
       ... make "changes" ...
       ... use stashed location ...)
    

    you might be doing it wrong. Make sure that you’re not unthinkingly assuming that later changes to “the” tree are reflected in your stashed-location.

Whole-tree editing

Here’s an example of printing out a whole tree, one node at a time:

(defn print-tree [original]
  (loop [loc (zip/seq-zip (seq original))]
    (if (zip/end? loc)
      (zip/root loc)
      (recur (zip/next
                (do (println (zip/node loc))
                    loc))))))

This is an ordinary recursive loop. It visits each location in the tomorrow tree, stopping when zip/end? is true. zip/next returns a new current loc that is the next one in the tomorrow tree, where “next” means “in preorder depth-first order”. To see that, here’s what one run of the function prints:

user=> (print-tree [1 '(a (i ii iii) c) 2])
(1 (a (i ii iii) c) 2)
1
(a (i ii iii) c)
a
(i ii iii)
i
ii
iii
c
2

To make changes to the tree, add a cond. The default case should hand the current loc to zip/next. The other cases should yield a loc pointing into a changed copy of the tomorrow tree:

  (loop [loc (zip/seq-zip original-tree)]
    (if (zip/end? loc)
      (zip/root loc)
      (recur (zip/next
          (cond (subtree-to-change? loc)
                (modify-subtree loc)
                …
                :else loc)))))

The tricky bit is making sure that modify-subtree returns a loc just before the next loc of interest (in a depth-first traversal). (It has to be before so that zip/next takes you to the interesting loc.) To get to that loc, you can use any of the movement functions (zip/next, zip/up, zip/rightmost, and so on). There’s also a zip/prev that returns the loc just before the current one.
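
As a minimal, runnable sketch of that cond-loop, here is a function that increments every number in a tree. The name increment-numbers and the sample tree are made up for illustration. Because zip/edit returns a loc at the same position, handing its result straight to zip/next is enough in this simple case; edits that add or remove nodes need the extra positioning care described above.

```clojure
(require '[clojure.zip :as zip])

(defn increment-numbers
  "Returns a copy of tree in which every number has been incremented."
  [tree]
  (loop [loc (zip/seq-zip (seq tree))]
    (if (zip/end? loc)
      (zip/root loc)
      (recur (zip/next
               (cond (number? (zip/node loc))
                     (zip/edit loc inc)     ; loc stays put, in a changed tree
                     :else loc))))))

(increment-numbers [1 '(a (10 b) c) 2])
;; => (2 (a (11 b) c) 3)
```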

To keep from confusing myself, I write little helper functions, each named by what it does and what loc it returns. So, for example, I have one function that glories in this name:

(defn wrap-with-expect__at-rightmost-wrapped-location [loc]
  (assert (start-of-arrow-sequence? loc))
  (let [right-hand (-> loc zip/right zip/right)
        edited-loc (zip/edit loc
                   (fn [loc] `(expect ~loc => ~(zip/node right-hand))))]
    (-> edited-loc zip/right zip/right zip/remove zip/remove)))

It takes a form like ... (f 1) => (+ 2 3) :next, with the current loc being (f 1), and turns it into this:

... (expect (f 1) => (+ 2 3)) :next

… with the current loc being at the 3 so that the zip/next returns a loc at :next. This positioning works because I use zip/remove, which returns a loc that’s “backed up” to the previous loc in a depth-first traversal. (That’s the fix to my earlier imprecision about what zip/remove returns. It’s not the previous loc at the same level, which—for the sake of a simpler explanation—I earlier allowed you to assume.)

By building and testing these little functions first, my main cond-loop is easier to get right. You can see some more examples in my test package, Midje. You can look at both the tests and the code.

Permalink

Clojure: Using Sets and Maps as Functions

Clojure sets and maps are functions.

Since they are functions, you don't need functions to get values out of them. You can use the map or set as the example below shows.

(#{1 2} 1)
> 1

({:a 2 :b 3} :a)
> 2
That's nice, but it's not exactly game changing. However, when you use sets or maps with higher-order functions, you can get a lot of power with a little code.

For example, the following code removes all of the elements of a vector if the element is also in the set.
(def banned #{"Steve" "Michael"})
(def guest-list ["Brian" "Josh" "Steve"])

(remove banned guest-list)
> ("Brian" "Josh")
I'm a big fan of using sets in the way described above, but I don't often find myself using maps in the same way. The following code works, but I rarely use maps as predicates.
(def banned {"Steve" [] "Michael" []})
(def guest-list ["Brian" "Josh" "Steve"])

(remove banned guest-list)
> ("Brian" "Josh")
However, yesterday I needed to compare two maps and get the list of ids in the second map where the quantities didn't match the quantities in the first map. I started by using filter and defining a function that checks if the quantities are not equal. The following code shows solving the problem with that approach.
; key/value pairs representing order-id and order-quantity
(def map1 {1 44 2 33})
(def map2 {1 55 2 33})

(defn not=quantities [[id qty]] (not= (map1 id) qty))
(keys (filter not=quantities map2))
> (1)
However, since you can use maps as filter functions you can also solve the problem by merging the maps with not= and filtering by the result. The following code shows an example of merging and using the result as the predicate.
; key/value pairs representing order-id and order-quantity
(def map1 {1 44 2 33})
(def map2 {1 55 2 33})

(filter (merge-with not= map1 map2) (keys map2))
> (1)
I don't often find myself using maps as predicates, but in certain cases it's exactly what I need.
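
One caveat worth sketching: a set or map used as a predicate returns the stored value, so the trick misbehaves when that value is false or nil. The names flags and quantities below are invented for this example; contains? is the usual way around the problem.

```clojure
(def flags #{false :on})

(flags :on)                 ; => :on
(flags false)               ; => false, found but the result is falsey
(contains? flags false)     ; => true

(remove flags [false :on :off])                  ; => (false :off)
(remove #(contains? flags %) [false :on :off])   ; => (:off)

;; The same applies to maps whose values may be nil or false.
(def quantities {1 nil 2 33})
(quantities 1)              ; => nil, although the key is present
(contains? quantities 1)    ; => true
```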

Permalink

Today in the Intertweets (Aug 31st Ed)

  • Protocols are faster than Clojure multimethods. Can be as fast as direct func calls in fact and inlined by hotspot (via @chouser) — This is an important fact.
  • Clojure or: How I Learned to Stop Worrying and Love the Parentheses (here, via @nathanmarz) — This is an article that explains one of the main differences between Lisps and other mainstream languages: the ability to extend the language (even create your own mini-languages) to fit your purposes from within the host language itself, without adding any accidental complexity to the system. Most other languages add accidental complexity when trying to do so.
  • Sweet, a Clojure Ring adapter for Mongrel2 by mikejs (here, via @zedshaw) — This allows you to run Ring/Compojure applications inside Mongrel2, a cool new language agnostic web server.

Permalink

Clojure or: How I Learned to Stop Worrying and Love the Parentheses

I'm a longtime Java, Ruby, and Python programmer. Yet Clojure is the first language I've used that I truly enjoy using on a daily basis.

Clojure is a special language. There have been many attempts to articulate the benefits of Lisp-based languages before, but most of these attempts seem to end in futility. Until you use the language, it's hard to understand why functional programming, macros, and immutability are such a big deal.

So I'm going to take a different approach in explaining the virtues of Clojure. I'm going to start off somewhat unusually by talking about SQL, show how querying is done fundamentally differently in Clojure, and transition from there into a broader discussion about domain specific languages, accidental complexity, and how Clojure solves problems that have plagued programmers throughout programming's history.

The problem with SQL

SQL is a language for querying relational databases. SQL is one of the most successful technologies ever, but very few technologies can claim to have led to as much unnecessary complexity as SQL.

SQL solves the problem of querying a relational database in a concise, expressive manner. In that regard, SQL does a very good job.

The problem with SQL is that it's a custom language. Using SQL from other languages causes a host of other problems - problems that are orthogonal to querying a database. These problems are examples of accidental complexity, complexity in an application caused by the tool used to solve a problem rather than the problem itself.

The prime example of accidental complexity caused by SQL being a custom language is the SQL injection attack.

SQL injection has nothing to do with querying databases

SQL injection results from using one language from within another by doing string manipulation. This has nothing to do with querying databases; it's an integration problem. As we'll see later, it's a problem we can avoid in Clojure.

I know what you're thinking. "There are X, Y, and Z libraries for parameterizing SQL queries and avoiding SQL injection attacks!" This raises the question: then why are SQL injection attacks so pervasive?

The obvious answer is that string manipulation is the most straightforward way to use one language within another. There's something wrong with your tools when the obvious, straightforward way to do something causes major security problems.

There are other problems that arise from using one language within another. The embedded language is second class. The usual techniques programmers use to reduce program complexity, modularization and composition, can't be fully applied to the embedded language.

Similar problems arise when generating HTML - cross site scripting attacks are very pervasive. This is due to the same problems that occur when trying to use one language from within another.

Clojure lets you create integrated languages

There are some serious issues that arise when using distinct languages together. Yet the languages are distinct for a reason - they're intended for different problems and operate with completely different mental models.

What if you could fully integrate the query language into your general purpose programming language? What if queries were first class and could be manipulated as such?

Say hello to Clojure (and Lisps in general).

In Clojure, you can extend the language within the language to create domain specific languages. These mini-languages are fully integrated into Clojure and can be manipulated like anything else in Clojure. Most importantly, you get the benefits of a custom language - conciseness and expressiveness - without the accidental complexities.

There are a number of reasons why building mini-languages is possible in Clojure. These include the "code as data" philosophy of the language, macros, closures, and the emphasis on functional programming.
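
As a tiny illustration of "code as data" and macros (this is not taken from Cascalog), the sketch below treats a piece of code as an ordinary list and then defines a new control construct as a library function would be defined. The macro name unless is hypothetical and chosen just for this example.

```clojure
;; Code is data: a quoted form is an ordinary list that can be inspected.
(def form '(+ 1 2))
(first form)     ; => +
(eval form)      ; => 3

;; A macro receives code as data and returns new code.
;; `unless` is a made-up name for this sketch, not part of clojure.core.
(defmacro unless [test & body]
  `(if ~test nil (do ~@body)))

(unless false :ran)                     ; => :ran
(unless true :ran)                      ; => nil
(macroexpand-1 '(unless false :ran))    ; => (if false nil (do :ran))
```

The new construct is used exactly like a built-in one, which is the property that lets a library feel like an embedded language.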

An example of an integrated query language for Clojure

I wrote an integrated query language for Clojure called Cascalog. Cascalog is a query language for Hadoop clusters, but a very similar library could be built for querying relational databases.

Cascalog forgoes the syntax-heavy design of SQL in favor of the syntax-light design of Datalog. Here are some examples of what Cascalog looks like, compared against the equivalent SQL queries. Remember, Cascalog is a library for Clojure that has the look and feel of an embedded language:

Teaching Cascalog is not the goal of this article, so don't worry if you don't fully understand the Cascalog queries. I just wanted to show that Cascalog is just as concise and declarative as SQL. To learn more about Cascalog, see the introductory tutorial.

The key difference between Cascalog and SQL, of course, is that Cascalog is an embedded language within Clojure. The first class integration between Clojure and Cascalog avoids accidental complexity and lets us use techniques that are otherwise restricted. Queries written with Cascalog, unlike SQL, can be modularized and composed in all sorts of useful and interesting ways.

For example, you can make functions that return subqueries:

You can parameterize your queries without needing to explicitly say that you're doing so. You just use variables in your query like you were passing them to any other function:

You can pass a subquery to a function to use in another query:

You can compose operations together to create new operations. Here's how to define the "average" aggregator in terms of count, sum, and division:

"average" can then be used like any other operation, as in the following query which determines the average age of people in the dataset:

Mold Clojure to your problem

Linq is an integrated query system for C#. It exists to solve the integration problems I discussed when using a query language from within a general purpose language.

There's one huge difference between Cascalog and Linq: Linq is part of C#. You can't define Linq in terms of regular C#, it needed to be added by the language designers. Cascalog, on the other hand, needs no special support from Clojure. Cascalog is a regular Clojure library.

This means that you can define DSLs yourself in Clojure, whereas in C# you have to wait for the language designers to add them. I've created lots of mini-languages, optimized to my problem domains.

Clojure has a relentless focus on minimizing accidental complexity

The ability to make embedded languages from within Clojure is just one example of Clojure's relentless focus on minimizing accidental complexity.

Clojure has a very opinionated approach to mutable state, another big source of accidental complexity. Looking back on my Java programming days, I'm amazed at how much of my programming time involved controlling when and how the states of objects were modified.

Clojure prefers immutable data and forces the programmer to be explicit about manipulating state. Clojure makes explicit the difference between a value (an immutable piece of data) and an identity (an entity whose value changes over time).
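
A minimal sketch of the value/identity distinction, using an atom as the identity. The names snapshot and person and the sample data are invented for the example.

```clojure
;; A value: an immutable map. It never changes.
(def snapshot {:name "Ann" :age 30})

;; An identity: an atom that refers to different values over time.
(def person (atom snapshot))

;; State changes are explicit: swap! points the identity at a new value.
(swap! person update-in [:age] inc)

@person     ; => {:name "Ann", :age 31}
snapshot    ; => {:name "Ann", :age 30}, the old value is untouched
```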

Concurrency can also be a huge source of accidental complexity. Locks and semaphores are not the right abstraction for a large number of concurrency problems. Clojure has a number of concurrency primitives baked in such as software transactional memory, futures, and promises. These primitives are higher level than locks and more appropriate for many problems (although it's worth saying there are some problems where locks are appropriate). Clojure's concurrency features are fully integrated with how it handles mutable state.
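
For a flavor of the software transactional memory mentioned here, a short sketch with two refs. The account names and the transfer! helper are hypothetical; the point is only that both changes happen together inside dosync, with no explicit locks.

```clojure
(def checking (ref 100))
(def savings  (ref 0))

(defn transfer!
  "Moves amount from one ref to another inside a single transaction."
  [from to amount]
  (dosync
    (alter from - amount)
    (alter to + amount)))

(transfer! checking savings 40)

[@checking @savings]    ; => [60 40]
```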

You should watch this excellent talk by Rich Hickey where he talks in depth about Clojure's philosophy on state, value, and identity.

Conclusion

A lot of people talk about how wonderfully expressive Clojure is. However, expressiveness is not the goal of Clojure. Clojure aims to minimize accidental complexity, and its expressiveness is a means to that end.

You should follow me on Twitter here.

Permalink

Today in the Intertweets (Aug 30th Ed)

  • In which the lessons of ZZ Top are applied to the marketplace (here, via @planetclojure) — Tangentially related to Clojure, except that the author of the article built Leiningen for us, and along the way he did something bad… and nationwide. A good read about why a hacker sometimes needs to assert her/his opinion in the code.
  • Clojure and SQL (here, via @cfbloggers) — Short tutorial on getting some SQL working with Clojure.

Permalink

Clojure and SQL

I spent quite a bit of time playing with Clojure over the weekend (including writing my first plugin and my first hook for Leiningen, Clojure's popular build tool) and I started experimenting with reading data from a database. I tweeted that I was pleased with myself for succeeding and Marc Esher asked "is reading data so hard in clojure that it warrants celebration?" so I figured I'd post my little example, so you could see how easy it is (or isn't, depending on your point of view).

(ns sean.core
  (:require [clojure.contrib.sql :as sql])
  (:gen-class))

This just declares a namespace for my code to live in - the file is sean/core.clj - and says I'm going to be using the library package clojure.contrib.sql under the alias sql. :gen-class says I want the file compiled to a Java class.

(def db {:classname "com.mysql.jdbc.Driver"
         :subprotocol "mysql"
         :subname "//127.0.0.1:3306/mydb"
         :user "dbuser"
         :password "secret"})

This declares the datasource I'm going to use, MySQL, locally, database 'mydb' with the credentials I'm using to login.

(defn print-users
  [] (sql/with-connection db
       (sql/with-query-results res
         ["SELECT * FROM user"]
         (doseq [rec res]
           (println rec)))))

This declares a function print-users taking no arguments [] which reads all users from the 'user' table and prints them out (they are automatically formatted as records with key: value pairs for columns). I'll explain this in more detail below but it really is pretty straightforward.

(defn -main
  [] (print-users))

And that's my 'main' function. Like a regular Java main function. Once this is compiled (I'm using the CounterClockwise Plugin for Eclipse which lets me easily compile Clojure files but you can also use Leiningen - I'll blog examples of both later), I can run it like a regular Java program:

java -cp ./classes/:clojure.jar:clojure-contrib.jar:mysql-connector-java-bin.jar sean.core

So what exactly does it do? The key thing to remember with functional languages is that they often have expressions which are treated as functions that get applied to data in a context. The body of print-users is (sql/with-connection db expression) so it gets a connection for the specified db and then evaluates expression in that context. (sql/with-query-results res [vector] expression) executes the SQL in the vector (subsequent elements of the vector are parameters substituted into the SQL - see below), binds the result to 'res' and then evaluates the expression in that context. (doseq [rec res] expression) takes the sequence 'res' and iterates over each element, binding the element to 'rec' and then evaluating the expression in that context. So, it gets a connection, selects all users and prints each row.

If you wanted to just get users with a particular status, you'd say something like (sql/with-query-results res [ "SELECT * FROM user WHERE status = ?", search-status ] expression) where search-status was a variable containing the status you were searching on.
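
To see why the vector form matters, here is a sketch that needs no database. Both helper functions are hypothetical and exist only for this comparison; neither is part of clojure.contrib.sql. The first builds the SQL by string concatenation, the second builds the vector described above.

```clojure
;; Hypothetical helpers for illustration only.
(defn unsafe-query
  "String concatenation: the user's input becomes part of the SQL text."
  [search-status]
  (str "SELECT * FROM user WHERE status = '" search-status "'"))

(defn parameterized-query
  "Vector form: SQL text first, parameters after, kept apart as data."
  [search-status]
  ["SELECT * FROM user WHERE status = ?" search-status])

(unsafe-query "x' OR '1'='1")
;; => "SELECT * FROM user WHERE status = 'x' OR '1'='1'"

(parameterized-query "x' OR '1'='1")
;; => ["SELECT * FROM user WHERE status = ?" "x' OR '1'='1"]
```

In the first result the input has changed the shape of the query; in the second it is still just a value sitting next to the SQL.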

Permalink

(Community Standards)

As with any Lisp ever created, Clojure has recently been infected with talk of alternative layout styles of closing (and in some cases, open) parentheses...

Permalink

Akamai Technologies, Engineering Manager, Clojure, Cambridge, MA

Hi Will, Mike McLaughlin writes: [Akamai Technologies] has just opened a new Engineering Manager role in Cambridge, MA for our mobile team. The position will help lead a team of engineers working with Clojure. We really want to hire someone that will help act as an advocate and evangelist for the Clojure work we are [...]

Permalink

in which the lessons of ZZ Top are applied to the marketplace

I've been thinking a lot about ZZ Top recently. This isn't something I generally do as a rule, but it was prompted by re-reading a blog post by the inimitable Giles Bowkett ostensibly about the song I'm Bad; I'm Nationwide. Go ahead and cue that up in the background while you read this post; it's vaguely relevant.

zz top

"Bad" in this sense of course is the good kind of bad, like the Michael Jackson song. It seems to be mostly about attitude. Giles makes the point that generally being bad is not correlated with being nationwide.

"I'm Bad, I'm Nationwide" is a ZZ Top song. Hopefully you can figure out what it's about, but just in case, the singer's point is that he is bad, and he is nationwide. [...]
It's good to be bad. It's good to be nationwide. It's even better to be worldwide. How can we apply the lessons of ZZ Top in the workplace?
Obviously if you walk into your boss' office, jump on his or her desk, pull down your pants, and perform toilet functions all over the place, that would be bad. But it would not be nationwide, and it would not encourage becoming nationwide. In fact, it would not really be bad, it would just be stupid. But this silly example highlights a deeper paradox: that which is bad is usually local, and that which is nationwide is usually good.

Giles goes on to talk about how the bad/nationwide balancing act applies to a career in software development, which is interesting, but I've been thinking about it in terms of projects instead. Take the familiar realm of editors. There are a few of them that are nationwide just by virtue of having survived and built up a following over the course of several decades. And they're also often bad when flame wars erupt over them, as is fairly common.

So Emacs and vi have somehow achieved the intersection of bad and nationwide, which as Giles posits is tricky to pull off. Simply being bad doesn't work in the long-term, and while quiet competence sometimes does, it's worth noting that in many cases attention helps a project improve in concrete ways—especially projects whose users are developers. This is pretty key for things like languages, libraries, and build tools.

The problem I'm faced with here is that being bad is also often correlated with being inflammatory. The easiest way to get attention in the software world is to pick a fight. You see this all the time on sites like Reddit; when people smell blood they upvote, which is why stuff like Aleph vs Node.js: the smackdown makes it to the front page despite being a superficial comparison.

The thing you have to remember about picking fights with another project or language just for the sake of it is that often the attention fallout is more evenly distributed than is intended. When someone goes out of their way to pick a fight, they usually aren't much good at hiding the fact that they've got an investment in one side. Impartial readers can usually pick up on this pretty easily, and they're likely to spot holes in the argument or write it off as a piece of cheerleading. In cases of particularly unfair partisanship, they may even begin to sympathize with the target under attack.

The closest I've come to this sort of bad/nationwide is this post I made on Twitter a few months back:

Q: What's the difference between Ant and Maven? A: The creator of Ant has apologized.

This turned out to be just the right mix of nasty and clever to really take off; hundreds and hundreds of people passed it on, and over the next few days searches for my name came up with just pages and pages of this over and over again. At the time of this writing it's still on the second page of results in a search for my name.

I've got to admit, as the author of a build system that competes with Maven, this felt kind of good. The problem is it's totally a cheap shot—everyone involved with build tools ends up in a position of needing to apologize to their users given enough time. James Duncan Davidson has expressed his regrets over the use of XML in ant, Dave Thomas is less than proud of how RDoc has turned out, and I'm pretty sure the only reason the guy responsible for the tabs/spaces distinction in Makefiles hasn't apologized is that he fled to Tijuana for facial reconstructive surgery. Anyway, Leiningen will eventually be in the same position if it's not already.

why sample

So we're still left with this question of whether you can be bad and nationwide without also being a jerk. I think it's doable, but you just don't see it much because picking fights is so much easier. One example that comes readily to mind is _why the lucky stiff. He qualified not just by his off-kilter visual style but by his aversion to what he scoffed at as "best practices".

Perhaps this is why I have trouble swallowing unit testing or extreme programming or other best practices as the law. I guess there’s a place for these tricks (the work place,) but they do not speak to the pure form of hacking for hacking’s sake, which I so ardently defend! Unit testing, in particular, is designed to reel in spontaneous hacking. It is like framing a picture before it has been painted. Hacking, at heart, will continue to be something of spontaneous order, something of anarchy, and the landscape of hacking is something which comes from human action but is not of human design.

― This Hack was not Properly Planned

This may not sound particularly controversial, but in the context of the test-driven-fanatic Ruby community it was a pretty weighty heresy. But he was all about exploring the fringe, and some excellent ideas came of it. It caught peoples' eyes and drew them in, so much so that when he disappeared, the communities surrounding his projects picked up the orphaned bits and carried them forward.

gibbons

Bad... and nationwide.

This is way more productive than us-vs-them fights that normally accompany attempts to be bad and nationwide. So what does this mean for you and me? Most people can't draw like _why, but injecting your own particular brand of crazy into your projects may be a slick hack you can pull to sidestep the negativity.

Here's an example: the new task in Leiningen spits out a blank Clojure project skeleton. At the time I saw a few too many "foojure"-type names popping up for new projects, and when I saw one called "Couverjure" I said enough is enough. Now the new task will refuse to generate projects named after *jure puns. Arbitrary? You bet. Ridiculous? Perhaps. But harmless and easy to work around. And don't forget controversial:

no more jure names

The point is: don't take yourself too seriously. Hack the good hack and leave an easter egg or two around for the adventurous. Then you too can be bad... and nationwide.

Permalink

This weekend in the Intertweets (Aug 29th Ed)

  • stumbling towards the clojure api (a code example using de maybe monad and null safe operator .?.) (here, via @jneira) — The many ways to deal with functions that can return a null value.
  • I put together a small example of how to use the #websocket support in #aleph using #clojure (here, via @maclausen) — Don’t you worry, you won’t be reading much code as the code in this example is very succinct.
  • Random thoughts on Clojure Protocols (here, via @debasishg) — This is a very informative article about what protocols in clojure are and are not, the latter part (what protocols are not) being most informative for all of you programming polyglots.
  • Porting #clojure ants concurrency demo to #haskell (here, via @wmacgyver) — Here is a follow-up post, “Speeding up the Ants program” which contains some interesting profiling info.
  • lein-hadoop is now available on clojars.org. Contributions welcome! (here, via @xefyr) — A leiningen plugin that lets you create hadoop-compatible jar files.
  • The only real problem I have with Clojure is that after learning it, you never want to program in another language again (via @mauritsrijk) — I know, it happens…
  • Beware Choosing the Most Complex Tool for the Job (here, via @stuartsierra) — A word of caution for those too eager to use the new 1.2 features (protocols, records) that could make your programs more complex and less flexible.
  • static – static blog generator in #clojure (here, via @maclausen) — Nurullah Akkaya moved his blog away from Compojure and into the static world by using only hiccup. The code is here.

Permalink

Clojure Plugin For Grails

According to a post from a while back by XML co-creator Tim Bray, Clojure is considered "the new hotness among people who think the JVM is an interesting platform" for other languages to build on, for people who think that "there's still life in that ol' Lisp beast," and for "people who worry about concurrency and state in the context of the multicore future." In...

Permalink

Using cljr for Clojure development

At work I now use the Clojure setup that everyone else uses, emacs+swank-clojure, with our custom repositories. For my own Clojure hacking (my own projects) I have just about settled on using cljr for convenience and agility. For me, the big win is being able to access Clojure libraries, Java libraries, and JAR files containing data sets I use often for NLP work from any directory. I don't need a heavyweight project setup such as Leiningen with all dependencies loaded locally. cljr uses Leiningen to manage the packages in the single ~/.cljr repository. When you start up cljr, everything in ~/.cljr is on your JVM classpath: this may seem a little heavy, but it is very convenient.

As an example, this morning I noticed an old Twitter direct message from the author of the Nozzle library asking me if I had had a chance to try it. Instead of setting up a separate Leiningen project directory, I just did a cljr install com.ashafa/nozzle 0.2.1, went to my catch-all directory where I keep short snippets of Clojure code, and entered Tunde's test program for Nozzle:

;; assumes: cljr install com.ashafa/nozzle 0.2.1

(use 'com.ashafa.nozzle)

(def username (System/getenv "TWITTER_ACCOUNT"))
(def passwd (System/getenv "TWITTER_PASSWD"))

(defn my-callback
  [message]
  (println message))

(def noz (create-nozzle "filter" username passwd my-callback {:track "twitter"}))

and running it is as simple as:

cljr run nozzle-twitter-test.clj

or, using swank and Emacs:

cljr swank

and in Emacs do M-x slime-connect and in the repl: (load "nozzle-twitter-test")

Permalink

I am merging my other three blogs into this (my main) blog

I had what I thought was a good idea in the last year: split out special interests into:

I am going to leave my other three blogs intact, as-is, but I am going to start doing two things: all of my non-book writing will go into this single blog and I am going to copy a few of my recent articles in the other three blogs to this one. Having four distinct blogs has been a nuisance.

Permalink

Beware Choosing the Most Complex Tool for the Job

I once saw a TV show about competing groups of archeologists trying to demonstrate how the ancient Egyptians raised stone obelisks weighing hundreds of tons.

One group of archeologists built a complex apparatus involving a wooden frame and lots of rope. It looked impressive, but it didn’t work.

The other team built a sand pit, dragged the obelisk to the top, and gradually removed sand from below the obelisk until it reached its final position. Simple, unglamorous, and it worked on the first try. (See the whole series of photos.)

The leader of the wooden-frame group admitted they were mistaken in basing their design on the most complex ancient technology they could find, sailing ships. Instead, he said, “We should have asked, ‘What is the simplest way they could have done it?’”

***

From the ancient Egyptians, jump forward about four thousand years to me, sitting at my computer, writing a new testing framework for Clojure. I was excited about the new Clojure features datatypes and protocols. I based my whole framework around them. It was a beast. Lots of weird edge cases and complex interactions that were hard to reason about and even harder to debug. Datatypes and protocols may be a powerful tool, but that doesn’t always make them the right tool.

After my fourth or fifth rewrite of Lazytest, I started asking myself, “What is the simplest way I could do this?”

The answer was staring me in the face. Clojure is a functional language (mostly). What’s the simplest way to do anything? Functions!

All of a sudden, complexity started to fall away. My protocols, most of which only had a single method anyway, became ordinary functions. My typed data structures became ordinary maps. The code shrank by many lines, and it was vastly easier to understand. Even better, I discovered new possibilities in the simpler design.

Functions are a fantastic abstraction because they can be composed. In the new Lazytest, everything is a function: test cases, test suites, and contexts. The RSpec-like describe macros are still there, but they’re simpler. They do what macros are supposed to do: provide a convenient syntactic layer over functional definitions, not define a completely new language.
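To make the idea concrete, here is a minimal sketch of tests as plain functions. The names test-case and suite are hypothetical and chosen only for illustration; they are not Lazytest's actual API.

```clojure
;; Hypothetical helpers for illustration; not Lazytest's real API.

(defn test-case
  "Returns a zero-argument function that runs pred and reports the outcome."
  [description pred]
  (fn []
    {:description description
     :pass? (boolean (pred))}))

(defn suite
  "Composes any number of test functions into one function that runs them all."
  [& tests]
  (fn []
    (vec (map (fn [t] (t)) tests))))

(def arithmetic
  (suite
   (test-case "addition" (fn [] (= 4 (+ 2 2))))
   (test-case "subtraction" (fn [] (= 1 (- 2 2))))))

(arithmetic)
;; => [{:description "addition", :pass? true}
;;     {:description "subtraction", :pass? false}]
```

Because a suite is just another zero-argument function, the same calling convention applies to a single test and to a whole group of them, which is what makes this kind of design easy to compose.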

I haven’t completely finished the new API — fixtures, now renamed back to “contexts,” are not supported yet. But I’m much happier with this version than with the old one. I actually feel like this is something I’d be willing to release soon. So give it a spin and tell me what you think.

Permalink

Random thoughts on Clojure Protocols

Great languages are those that offer orthogonality in design. Stated simply it means that the language core offers a minimal set of non-overlapping ways to compose abstractions. In an earlier article A Case for Orthogonality in Design I discussed some features from languages like Haskell, C++ and Scala that help you compose higher order abstractions from smaller ones using techniques offered by those languages.

In this post I discuss the new feature in Clojure that just made its way in the recently released 1.2. I am not going into what Protocols are - there are quite a few nice articles that introduce Clojure Protocols and the associated defrecord and deftype forms. This post will be some random rants about how protocols encourage non intrusive extension of abstractions without muddling inheritance into polymorphism. I also discuss some of my realizations about what protocols aren't, which I felt was equally important along with understanding what they are.

Let's start with the familiar Show type class of Haskell ..

> :t show
show :: (Show a) => a -> String

Takes a value of some type and renders a string for it. You get show for your type if you have made it an instance of the Show type class. The Show type class extends your abstraction transparently through an additional behavior set. We can do the same thing using protocols in Clojure ..

(defprotocol SHOW 
  (show [val]))

The protocol definition just declares the contract without any concrete implementation in it. Under the covers it generates a Java interface which you can use in your Java code as well. But a protocol is not an interface.

Adding behaviors non-invasively ..

I can extend an existing type with the behaviors of this protocol. And for this I need not have the source code for the type. This is one of the benefits that ad hoc polymorphism of type classes offers - type classes (and Clojure protocols) are open. Note how this is in contrast to the compile time coupling of Java interface and inheritance.

Extending java.lang.Integer with SHOW ..

(extend-type Integer
  SHOW
  (show [i] (.toString i)))

We can extend an interface also. And get access to the added behavior from *any* of its implementations .. Here's extending clojure.lang.IPersistentVector ..

(extend-type clojure.lang.IPersistentVector
  SHOW
  (show [v] (.toString v)))

(show [12 1 4 15 2 4 67])
> "[12 1 4 15 2 4 67]"

And of course I can extend my own abstractions with the new behavior ..

(defrecord Name [last first])

(defn name-desc [name]
  (str (:last name) " " (:first name)))

(name-desc (Name. "ghosh" "debasish")) ;; "ghosh debasish"

(extend-type Name
  SHOW
  (show [n]
    (name-desc n)))

(show (Name. "ghosh" "debasish")) ;; "ghosh debasish"

No Inheritance

Protocols help you wire abstractions that are in no way related to each other. And it does this non-invasively. An object conforms to a protocol only if it implements the contract. As I mentioned before, there's no notion of hierarchy or inheritance related to this form of polymorphism.

No object bloat, no monkey patching

And there's no object bloat going on here. You can invoke show on any abstraction for which you implement the protocol, but show is never added as a method on that object. As an example try the following after implementing SHOW for Integer ..

(filter #(= "show" (.getName %)) (.getMethods Integer))

will return an empty list. Hence there is no scope of *accidentally* overriding some one else's monkey patch on some shared class.
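The points above can be checked at the REPL. This is a sketch that assumes Clojure 1.3 or later, where integer literals are java.lang.Long, so it extends Long rather than Integer; the protocol is redefined here only to keep the snippet self-contained.

```clojure
;; Assumes Clojure 1.3 or later, where integer literals are java.lang.Long.
(defprotocol SHOW
  (show [val]))

(extend-type Long
  SHOW
  (show [i] (.toString i)))

(show 42)               ;; => "42"
(satisfies? SHOW 42)    ;; => true
(satisfies? SHOW "42")  ;; => false, String was never extended

;; The Long class itself is untouched: no method named "show" was added.
(empty? (filter #(= "show" (.getName %)) (.getMethods Long)))
;; => true
```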

Not really a type class

Clojure protocols dispatch on the first argument of the methods. This keeps them from offering the full power of Haskell / Scala type classes. Consider the counterpart of Show in Haskell, which is the Read type class ..

> :t read  
read :: (Read a) => String -> a

If your abstraction implements Read, then the exact instance of the method invoked will depend on the return type. e.g.

> [1,2,3] ++ read "[4,5,6]"
=> [1,2,3,4,5,6]

The specific instance of read that returns a list of integers is automatically invoked here. Haskell maintains the dispatch match as part of its global dictionary.

We cannot do this with Clojure protocols, since they are unable to dispatch based on the return type. Protocols dispatch only on the first argument of the function.
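One common workaround, shown here only as a sketch, is to make the desired result type an explicit first argument so that there is a value to dispatch on. The multimethod read-as and its dispatch keywords are hypothetical names invented for this example; this is not a substitute for what Haskell infers from the return type.

```clojure
;; read-as and its dispatch keywords are hypothetical names for this sketch.
(defmulti read-as
  "Parses string s into the kind of value named by target."
  (fn [target s] target))

(defmethod read-as :int [_ s]
  (Long/parseLong s))

(defmethod read-as :int-list [_ s]
  (vec (map #(Long/parseLong %) (re-seq #"-?\d+" s))))

(concat [1 2 3] (read-as :int-list "[4,5,6]"))
;; => (1 2 3 4 5 6)
```

The caller has to name the target explicitly, which is exactly the information Haskell recovers from the surrounding expression.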


Permalink

Ants and Haskell

Software Transactional Memory (STM) is a concurrency control mechanism designed to simplify programming for shared memory computers. Beautiful Code contains a great introduction to the concepts of STM in the Haskell language.

In most languages, locks and condition variables are the main mechanisms for controlling access, but these are notoriously hard to get right. Java Concurrency in Practice is a good read to understand just how many ways there are to shoot yourself in the foot (too few locks, too many locks, wrong locks, race conditions, wrong order, error conditions, deadlock, livelock, live stock, brain explosion). STM simplifies shared memory programming by providing database-like semantics for changing memory. Reads and writes to shared memory happen within a transaction: each memory access appears to happen in isolation from the others and appears atomic to observers. If a transaction conflicts with another, then one of the transactions is retried. An implementation typically records the memory accesses somehow and can then decide whether there was a conflict. Languages that restrict mutability (like Clojure and Haskell) have a significantly simpler implementation than imperative languages such as C/C++.

Composability is another advantage for STM. For example, take java.util.Hashtable - what if you want to do an insert/delete as a single atomic operation and only make the contents visible to other threads once finished? As the original design didn't do this, you're on your own. In contrast STM composes well.

Both Clojure and Haskell feature support for STM. The canonical example in Clojure is Ants.clj that demonstrates STM via a simple simulation of foraging ants (see also Flocking about with Clojure). As a learning exercise I thought it'd be neat to try to convert this over to Haskell!
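For readers who have not seen the Clojure side, here is a minimal sketch of the same idea using refs and dosync. The two-cell world and the function move-ant! are hypothetical simplifications made up for this example; the real Ants.clj is considerably richer.

```clojure
;; Hypothetical two-cell world; the real Ants.clj is considerably richer.
(def cell-a (ref {:ant true}))
(def cell-b (ref {}))

(defn move-ant!
  "Moves the ant from one cell to another in a single transaction.
  Returns true if the ant moved, nil otherwise."
  [from to]
  (dosync
   (when (and (:ant @from) (not (:ant @to)))
     (alter from dissoc :ant)
     (alter to assoc :ant true)
     true)))

(move-ant! cell-a cell-b) ;; => true
@cell-a                   ;; => {}
@cell-b                   ;; => {:ant true}
(move-ant! cell-a cell-b) ;; => nil, there is no ant left in cell-a
```

Both alter calls commit together or not at all, so no other thread can observe the ant in two cells or in none. Note that this sketch simply gives up when the target cell is occupied, rather than waiting for it to become free.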

Ants in Haskell

To model the ants world, I use the following data structures. Transactional variables (TVars) are used to hold a reference to a mutable variable. For the Ants simulation, I use a Vector of TCells to represent the ants world.



TVars can only be modified within the STM context. The key thing is that the only way to mutate transactional variables is from within the STM monad. To fiddle with the variables within TVar you can use newTVar, readTVar and writeTVar. Oddly there didn't seem to be a primitive operation to update a TVar based on its current value. updateTVar updates the TVar by applying a function to the value inside.



check verifies that a condition is true and if it isn't true then the transaction is retried. The key point is that the transaction is only retried when there's a reason to do so (e.g. memory read/write) so you aren't just heating the CPU whilst the condition is being validated. As an example, when we move an ant forward, we want to check that there is not an ant in the way. If there is an ant in the way, we'll wait till the coast is clear before moving.



At some point you have to run your STM actions. atomically runs the STM transactions from the IO monad (e.g. the top-level) and returns the result. A very important point is that you want your actions to be in the STM monad as much as possible. If all of your functions are run within the IO monad then you lose the composability aspect. The pattern to use is make everything use the STM monad and glue together randomly and you won't have a threading problem (you still have other problems though, it's not magic).

The Clojure code used agents to represent each ant. I'm not sure what the most idiomatic translation to Haskell is, but I spawned a thread for each ant using Control.Concurrent and forkIO. Haskell threads are incredibly light-weight so spawning even thousands of them is not a problem. Each ant thread simply evaluates the behaviour, moves, sleeps and repeats.

The rest of the code is more or less a direct translation from the Clojure. It's pretty verbose so I won't bother posting it here, but the full code is on my github page. You should be able to compile it with ghc -lglut --make -main-is AntsVis AntsVis.hs. Any hints on how to make it suck less appreciated!

Performance seems very good with the default compiler options: I'm able to run with 100+ ant agents all running concurrently. The programming is very simple, and once I'd found out about and added the appropriate check logic into move, everything worked properly.

Hurrah for simple concurrent programming!

(Update 30/8/2010: after finding out that the performance sucked after a while with a few more ants than I'd tested with, I looked at speeding up the Ants program.)


Permalink

ClojureCLR

Installation (Getting to the REPL)



First off, you need Visual Studio. I’m doing this with 2008, but I’m pretty sure it would also work on 2010 and the relevant Express versions. I have doubts as to how well it would work with 2005 and am clearly not going to test it.

Download ClojureCLR 1.1.0 from the github page.

Follow the instructions here.


If you have trouble with that, use these instructions. They worked for me:

  1. Download ClojureCLR and unzip it somewhere

  2. Download the Dynamic Language Runtime: http://dlr.codeplex.com/

  3. Copy the unzipped folder into your Clojure parent directory if you want the Clojure solution to work out of the box. Rename it DLR_Main. Just to make this step clear, if your Clojure solution file is C:\dev\gukjoon-clojure\Clojure\ClojureCLR.sln, then you need to have DLR in C:\Dev\DLR_Main. The subfolders for that should be C:\Dev\DLR_Main\src and C:\Dev\DLR_Main\Samples.

  4. Open the Clojure solution in the Clojure folder.

  5. For now, unload Clojure.Test from the solution.

  6. I had trouble with the post-build in Clojure.Main so I just took that out. If you do this, you will also have to copy the “clojure” folder from Clojure.Source into the output directory (“bin/Debug”) of Clojure.Main.

  7. Run Clojure!


Making a package to build external solutions


Installation is kind of a bitch with ClojureCLR, especially compared to Java Clojure. To help smooth over acceptance, I created a Clojure package that had all the Clojure binaries in it and two batch scripts that would use nant to build from target solutions and start a REPL with the target solutions loaded.

First copy your bin/Debug directory into a folder somewhere. Then, get nant and put that in the same directory.

Bootstrap.build
You’ll have to customize this nant build file for your solutions. You set a base directory for your solution (project.dir) and then can set up targets for each solution you have. I’m not very good with Nant so if you have recommendations for improving this buildfile, definitely let me know.

BuildClojure.bat
BuildClojure.bat runs the nant build file. It takes in an argument that it passes along to the nant build as the target. So depending on how you set up the nant build file, you could build specific solutions or all of them.

RunClojure.bat
RunClojure.bat is kind of worthless. It’s just one line: .\Clojure.Main.exe .\Startup.clj. I mostly have it because the Windows command prompt is a usability nightmare. I would much rather click to run Clojure.

Startup.clj
RunClojure.bat starts clojure with a startup script that loads all the assemblies you need. .NET classloading doesn’t work quite as well as Java so here I am loading the assemblies manually. You will want to change “Adc.*.dll” to something else, unless you also happen to work at AOL Advertising.

Generics


A big problem I came across was the lack of generics support in vanilla ClojureCLR. You actually need to pass in parameter types in .NET for generic types and methods and our legacy code was littered with generic types and methods. I didn’t really even need to create generics, just use them.

It’s fairly easy to hack your way around this using reflection. However, I would advise against doing it the way I did:

Clojure/CljCompiler/GenGeneric.cs
Clojure/CljCompiler/GenGenericMethod.cs
core-clr.clj additions

I created two new classes to do the actual reflection, when you can totally do the whole thing in Clojure with interop. You probably should also put your clojure code in a separate file from the other core-clr functions. This is left as an exercise to the reader.

If you choose to use this code, just add it into your solution in the appropriate places and recompile Clojure.Main and Clojure.Source (make sure to copy over core-clr.clj.dll from the clojure directory in the Clojure.Source build). gen-generic takes in a generic type, a vector of types and then arguments to the constructor. It will return an instance of the type. gen-generic-method takes in your callsite (either a type or an object), the method as a String, a vector of types and arguments to the method.

Since you still need to pass in a generic type to the gen-generic function, you will have to use Type/getType to get the generic type from its string representation. In .NET the number of type parameters is reflected by a ` and then the number. For example, a list would be “System.Collections.Generic.List`1” and a dictionary would be “System.Collections.Generic.Dictionary`2.”

How I used it:


The problem I found with debugging any sort of object oriented code is that your control tends to be limited to an external interface. While the Visual Studio debugger is very helpful in narrowing down where your code is broken, the stubby fingers that C# (and Java, too) force on you make pinpointing irregular behavior harder than it needs to be. I believe that as a developer, no component of your system should be a black box if you don’t want it to be.

My debugging usually works this way:
1) Duplicate the bug in my dev environment, as described by QA
2) Walk through the callstack to pinpoint the exact point where irregular behavior occurs
3) Consistently duplicate this irregular behavior
4) Figure out how to fix it.

Any seasoned developer will tell you that 1-3 are far more difficult and time consuming than 4. Fixing shit is easy. Figuring out what to fix is hard. Furthermore, straight up exceptions tend to be easy to pinpoint using the Visual Studio debugger. The hard bugs to fix are ones that cause bad behavior without any overt exceptions.

ClojureCLR is a tool for helping with 2-3 (sorry, you’re still shit out of luck for non-deterministic bugs that are impossible to duplicate) in finding “soft” bugs. You want to stand up individual components, test these components with inputs and see if you get the expected outputs. It’s far easier to do this on the fly in Clojure than to write a separate program, compile it and run it.


Anyways, long story short: don’t use ClojureCLR unless you have to, but it is useful if you have to deal with .NET.

Permalink

A simple chat app using Aleph, Websockets and Clojure

I implemented a small example showing how to use the websocket support in Aleph, the asynchronous web framework for Clojure built on Netty.

UPDATE: The example is tested with Chrome and Firefox on Ubuntu. It should work in all modern browsers, as it relies on web-socket-js for websocket emulation in browsers that do not have native support. Please note the updated Usage instructions. The socket-policy-server necessary for Flash websocket emulation has to listen on port 843 (at least that is the first place Flash asks for the policy file), so the server has to be run using sudo.

If you are interested, the example is hosted on github, along with usage instructions.

Permalink

Today in the Intertweets (Aug 26th Ed)

  • How we Deploy our Clojure App (here, via @kyleburton) — Automatically deploy a clojure webapp using Chef.
  • Reusable method implementations for deftypes (here, via @planetclojure) — The new types in clojure 1.2 are very powerful, both in terms of their flexibility and their performance. Deftype allows you to create lean java classes from clojure. Records are deftypes extended to become first-class clojure citizens (e.g. map support, metadata support, etc…) Currently there is no support for reusing method implementations in deftypes (i.e. reuse map support from records.) This article introduces the library methods-a-la-carte that allows you to do just that.
  • Get @chrishouser’s top-selling “The Joy of Clojure” for 40% off using code s140 (until Sept 1) (here, via @fogus)
  • Used my lunch break to update the Trammel docs with the new syntax (here, via @fogus) — Trammel is a contracts programming library for Clojure that is WIP. Lately the syntax has changed quite a bit and this article introduces this new syntax.
  • Follow-up to yesterday’s post with some examples on using #zeromq to connect #clojure to #ruby! (here, via @trydionel) — This is a follow-up to this article showing how to connect to zeromq from Clojure, and it shows how to get Clojure and Ruby talking.

Permalink

Using 0MQ for Clojure and Ruby Interop

Now that you have ZeroMQ working with Clojure, some introductory examples are in order. The demonstrations below show how simple it is to bridge the gap between programming languages by using ZeroMQ, focusing on connecting Clojure and Ruby.

Pub/Sub Pattern

Our first example is the pub/sub pattern. Our code will broadcast messages from Ruby which our Clojure process can subscribe to. One interesting capability of 0MQ is that we can subscribe to “channels” — that is, we’ll only receive messages which start with a given query.1

#!/usr/bin/env ruby -wKU
# pub.rb
require 'rubygems'
require 'zmq'

zmq = ZMQ::Context.new
socket = zmq.socket(ZMQ::PUB)
socket.bind "tcp://127.0.0.1:5555"

# A REPL of sorts.  Messages entered here
# will get sent to our Clojure process.
while true
  '> '.display

  input = gets
  break unless input

  # each_line works on Ruby 1.8 and 1.9+ (String#each was removed in 1.9)
  input.each_line do |msg|
    socket.send(msg)
  end
end

exit 0

; src/my-project/sub.clj
(ns my-project.sub
  (:use [org.zeromq.clojure :as zmq]))

(defn- string-to-bytes [s] (.getBytes s))
(defn- bytes-to-string [b] (String. b))
(defn- on-thread [f]
  (doto (Thread. #^Runnable f)
    (.start)))

(defn launch-subscriber [query]
  (on-thread
   ; A plain fn is used here: #((let ...)) would try to call the
   ; value returned by let as a function.
   (fn []
     (let [ctx (zmq/make-context 1)
           socket (zmq/make-socket ctx zmq/+sub+)]
       (zmq/connect socket "tcp://127.0.0.1:5555")
       ; Note the dot-syntax! clojure-zmq is a bit behind
       ; on the Java API, so you have to fall back to straight
       ; Java interop to subscribe to a pub "channel".
       (.subscribe socket (string-to-bytes query))
       (while true
         (println (bytes-to-string (zmq/recv socket))))))))

; Subscribing to the "foo" channel.  We'll only see messages prefixed with foo.
(launch-subscriber "foo")

After launching the Clojure process, hop into the shell:2

$ ruby pub.rb
> Hello There
# No entry in the Clojure output
> foo Hello There
# prints "foo Hello There" in the Clojure output
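The filtering seen in the session above is a plain prefix match on the message. As a sketch in ordinary Clojure, with no ZeroMQ involved, the rule amounts to the following; matches-subscription? is a hypothetical helper written for this example and is not part of clojure-zmq.

```clojure
;; Hypothetical helper that mimics SUB-socket prefix filtering on strings.
(defn matches-subscription?
  "True when message starts with the subscribed query string."
  [^String query ^String message]
  (.startsWith message query))

(filter (partial matches-subscription? "foo")
        ["Hello There" "foo Hello There"])
;; => ("foo Hello There")
```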

Request/Response Pattern

The request/response technique allows us to delegate work from our master program to a worker process. In this particular example, we’re sending lists of values from our Ruby process to our Clojure process, which adds them and sends back the result.

#!/usr/bin/env ruby -wKU
# req.rb

require 'rubygems'
require 'zmq'

zmq = ZMQ::Context.new
socket = zmq.socket(ZMQ::REQ)
socket.bind "tcp://127.0.0.1:5556"

while true
  '> '.display

  input = gets
  break unless input

  # each_line works on Ruby 1.8 and 1.9+ (String#each was removed in 1.9)
  input.each_line do |msg|
    socket.send(msg)
  end
  puts socket.recv
end

exit 0

; src/my-project/rep.clj
(ns my-project.rep
  (:use [org.zeromq.clojure :as zmq])
  (:use [clojure.contrib str-utils]))

(defn- string-to-bytes [s] (.getBytes s))
(defn- bytes-to-string [b] (String. b))
(defn- on-thread [f]
  (doto (Thread. #^Runnable f)
    (.start)))

(defn handler [request]
  (let [request (bytes-to-string request)
	values (map
		#(Integer/parseInt %)
		(re-seq #"\d+" request))
	output (str (apply + values))]
    ; str is enough here; apply would spread output into characters.
    (println (str
              (str-join " + " values)
              " = "
              output))
    output))

(defn make-adder []
  (on-thread
   ; A plain fn is used here: #((let ...)) would try to call the
   ; value returned by let as a function.
   (fn []
     (let [ctx (zmq/make-context 1)
           socket (zmq/make-socket ctx zmq/+rep+)]
       (zmq/connect socket "tcp://127.0.0.1:5556")
       (while true
         (let [request (zmq/recv socket)
               result (handler request)]
           (zmq/send- socket (string-to-bytes result))))))))

(make-adder)

Fire up the Clojure process, and hop back into the shell:

$ ruby req.rb
> 2 2
4
# The Clojure process will print "2 + 2 = 4"
> 8 8
16
# The Clojure process will print "8 + 8 = 16"

You can find more ZeroMQ patterns on the official cookbook page.

  1. This could be useful for using 0MQ for method dispatch, e.g. this socket will accept “add” messages and sum the terms in a message, whereas another socket will accept “log” messages and log the message to file.
  2. If you launched the process through lein and swank-clojure, your output will appear in the console which started lein swank.

Permalink

Clojure EMACS swank slime maven maven-clojure-plugin

I've just had to set this up on a fresh install of Ubuntu and it has all become very easy.


Install maven: 


$sudo apt-get install maven2 


Then create a pom.xml file to tell maven which repositories to use. There's an example below:


$mvn clojure:swank


Will start a swank server with clojure and clojure-contrib 1.2 on the classpath.


That's it from the clojure side.


For emacs, 


$sudo apt-get install emacs


Then install the emacs lisp package archive (see http://tromey.com/elpa/)

by evaluating this code (cut and paste it into the scratch buffer, put the cursor in the middle, and use M-C-x):


--emacs lisp to install elpa---------------

(let ((buffer (url-retrieve-synchronously
               "http://tromey.com/elpa/package-install.el")))
  (save-excursion
    (set-buffer buffer)
    (goto-char (point-min))
    (re-search-forward "^$" nil 'move)
    (eval-region (point) (point-max))
    (kill-buffer (current-buffer))))

---end of emacs lisp to install elpa-------------




Now use M-x package-list-packages to bring up the list of packages, and use i and then x to mark and then install slime, slime-repl, and clojure-mode.


Then connect emacs to the already running clojure image with M-x slime-connect


That should be it. You should now be at a running clojure 1.2 repl inside emacs.







-------------------------------------------------------------------------------------------------------------------------------

Here's an example of a pom.xml file that pulls in clojure and clojure-contrib 1.2 . Just cut and paste it.




As well as the essentials, I've also added the maven versions plugin, which helps with keeping everything cutting edge, and jline so that command line repls work better (mvn clojure:repl)

I like to have my startup repls conditioned a little, so if you have a startup script that you always want to run, add this snippet

        <configuration>
          <replScript>startup.clj</replScript>
        </configuration>

to the clojure-maven-plugin section so that when maven starts a repl, the code in startup.clj is loaded as the first action.
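What goes into startup.clj is entirely up to you. As a purely hypothetical illustration, a small script might pull in a namespace you always want and print a banner; the banner function below is made up for this example.

```clojure
;; startup.clj (hypothetical contents, adjust to taste)

;; Make a commonly used namespace available under a short alias.
(require '[clojure.string :as string])

(defn banner
  "Builds a short greeting that includes the running Clojure version."
  []
  (str "Clojure " (clojure-version) " ready"))

(println (string/upper-case (banner)))
```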

-pom.xml----------------------------------------------------------------------------------------------------------------



<project>

  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>hello-maven-clojure-swank</artifactId>

  <version>1.0-SNAPSHOT</version>
  <name>hello-maven</name>
  <description>maven, clojure, emacs: together at last</description>

  <repositories>
    <repository>
      <id>clojars</id>
      <url>http://clojars.org/repo/</url>
    </repository>
    <repository>
      <id>clojure</id>
      <url>http://build.clojure.org/releases</url>
    </repository>
    <repository>
      <id>central</id>
      <url>http://repo1.maven.org/maven2</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>org.clojure</groupId>
      <artifactId>clojure</artifactId>
      <version>1.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.clojure</groupId>
      <artifactId>clojure-contrib</artifactId>
      <version>1.2.0</version>
    </dependency>
    <dependency>
      <groupId>jline</groupId>
      <artifactId>jline</artifactId>
      <version>0.9.94</version>
    </dependency>
    <dependency>
      <groupId>swank-clojure</groupId>
      <artifactId>swank-clojure</artifactId>
      <version>1.2.1</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>com.theoryinpractise</groupId>
        <artifactId>clojure-maven-plugin</artifactId>
        <version>1.3.3</version>
      </plugin>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>versions-maven-plugin</artifactId>
        <version>1.2</version>
      </plugin>
    </plugins>

  </build>

</project>

--------end of pom.xml

Permalink

Reusable method implementations for deftypes

One of the big new features in the recently released Clojure 1.2 is the possibility of defining new types with data fields and methods that conform to interfaces. Clojure provides two levels of user-defined types: the basic deftype, for defining everything from scratch, and defrecord, which adds method implementations for a couple of interfaces (some from Java, some from Clojure) that make the new type act like a Clojure map object.

But what if you want something in between? For example, to make your type a good Clojure citizen, you want it to accept metadata (a feature provided by defrecord), but you don’t want all the map stuff. Or perhaps you want a map-like interface for the fields of your type, but without the possibility to extend the map with new keys. Clojure doesn’t help you out of the box; your only choice is to re-implement the required interfaces yourself, or borrow the code from Clojure’s defrecord, if you are up to deciphering how it works. There is no way to reuse method implementations.
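To see what "re-implement the required interfaces yourself" means in practice, here is a minimal hand-written sketch of metadata support on a plain deftype. The type Point and its field m are hypothetical names chosen for this example; it does not use methods-a-la-carte.

```clojure
;; Hand-rolled metadata support; Point and its field m are hypothetical.
(deftype Point [x y m]
  clojure.lang.IObj
  ;; meta returns whatever is stored in the metadata field.
  (meta [this] m)
  ;; withMeta builds a fresh Point with the same data and new metadata.
  (withMeta [this new-meta]
    (Point. x y new-meta)))

(def p (with-meta (Point. 1 2 nil) {:a :b}))

(meta p) ;; => {:a :b}
(.x p)   ;; => 1
```

Every type that wants metadata has to repeat these two methods, with the constructor call adjusted to its own field list, which is the duplication the post is complaining about.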

This also becomes a problem if you want to reuse your own method implementations. You’d need to write your methods outside of any deftype, possibly in a way that allows parametrization, and then insert the code into a deftype form. You might be tempted to use macros for this, but that won’t work: macros are expanded as part of the evaluation of forms, but inside a deftype form, almost nothing gets evaluated. The only place where macros can be put to use inside a deftype is inside the code of the individual methods.

The library methods-a-la-carte (also available at clojars) comes to your rescue. It defines a templating system, similar in spirit to syntax-quote but with some important differences, that lets you define parametrized templates for methods and sets of methods. It also defines an enhanced version of deftype, called deftype+, which expands such templates inside its body. Finally, it comes with a small collection of predefined method implementations, corresponding to the features of defrecord but available individually.

First, a simple example of a type that reuses just the metadata protocol implementation:

(ns example
  (:use [methods-a-la-carte.core :only (deftype+)])
  (:use [methods-a-la-carte.implementations :only (metadata keyword-lookup)]))

(deftype+ foo
  [field1 field2 __meta]
  ~@(metadata __meta))

(def a-foo (with-meta (new foo 1 2) {:a :b}))
(prn (meta a-foo))

This type has two plain fields (named rather unimaginatively field1 and field2), and a special field __meta for storing the metadata. This happens to be the name that Clojure’s defrecord uses for the metadata field, but this is unimportant. What is important is that the name begins with a double underscore, as deftype handles such fields specially: they are omitted from the constructor argument list (to the best of my knowledge this is an undocumented feature of deftype). Whatever name you choose, you have to give the same name as a parameter to the metadata template.

Let’s add another feature to our type: keyword lookup:

(deftype+ foo
  [field1 field2 __meta]
  ~@(metadata __meta)
  ~@(keyword-lookup field1 field2))

(def a-foo (new foo 1 2))
(prn (:field1 a-foo))
(prn (:field2 a-foo))

The parameters to the template keyword-lookup are the field names for which you want keyword lookup. It can be any subset of the type’s fields.
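
For comparison, here is a rough sketch of what keyword lookup looks like when written by hand with plain deftype, without the library. The type name LookupFoo is made up for this illustration, and the code is not what the keyword-lookup template actually generates; it only shows the clojure.lang.ILookup interface that makes (:field1 x) work.

```clojure
;; Hand-written keyword lookup for two fields (illustrative sketch only).
(deftype LookupFoo [field1 field2]
  clojure.lang.ILookup
  ;; One-argument lookup delegates to the two-argument version.
  (valAt [this k] (.valAt this k nil))
  (valAt [this k not-found]
    (case k
      :field1 field1
      :field2 field2
      not-found)))

(def a-lookup-foo (LookupFoo. 1 2))
(prn (:field1 a-lookup-foo))
(prn (get a-lookup-foo :missing :none))
```

Every type that wants this behaviour has to repeat the same boilerplate, which is the repetition the template is meant to remove.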

By now you might be curious to know how the templates are defined, for example in order to define your own. Here’s the metadata template, the simplest one in the collection:

(defimpl metadata [fld]
  clojure.lang.IObj
  (meta [this#]
    ~fld)
  (withMeta [this# m#]
    (new ~this-type ~@(replace {'~fld 'm#} '~this-fields))))

This template has one parameter, fld, naming the field that stores the metadata. Everything after the parameter list is the content of the template, with a tilde standing for expressions that are replaced by their values, just as with syntax-quote templates. Another similarity with syntax-quote is that symbols ending with # are replaced by freshly generated unique symbols.

There are two major differences between the new templating mechanism and the well-known syntax-quote:

  1. Symbols are not namespace-resolved. This is important because, contrary to the use of templates in macro definition, namespace resolution is not appropriate for most of the symbols in a method template (method names, method arguments, interface and protocol names).
  2. Symbols are not looked up in the lexical environment (there is none), but first in a dynamic environment and then in the namespace of the template definition.
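
To see why the first difference matters, here is a small sketch of what ordinary syntax-quote does to a method-like form. It uses only core Clojure; the var names quoted, method-name and this-sym are chosen for this illustration.

```clojure
;; Syntax-quote namespace-qualifies plain symbols and turns symbols
;; ending in # into generated symbols.
(def quoted `(withMeta [this# m#] this#))

(def method-name (first quoted))          ; qualified with the current namespace
(def this-sym    (first (second quoted))) ; an auto-generated symbol

(println method-name)
(println this-sym)
```

A namespace-qualified method name such as the one printed first is not usable inside a deftype body, which is one reason a plain syntax-quote template is a poor fit for method definitions.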

The dynamic environment is initialized by deftype+ with the following values:

  • this-type: the symbol naming the type being defined
  • this-fields: the vector of field names supplied to deftype+

The above method template used both these values in its code for withMeta. Here is what the first example (type foo with just the metadata implementation) expands to:

(deftype
  foo
  [field1 field2 __meta]
  clojure.lang.IObj
  (meta [this#2515] __meta)
  (withMeta [this#2515 m#2516] (new foo field1 field2 m#2516)))
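
The same shape can be written directly with plain deftype. The following is a minimal sketch, assuming a made-up type name MetaFoo and an ordinary field name meta-map, so the constructor takes all three arguments explicitly and does not rely on the double-underscore convention.

```clojure
;; Hand-written metadata support, mirroring the expansion above.
(deftype MetaFoo [field1 field2 meta-map]
  clojure.lang.IObj
  (meta [this] meta-map)
  ;; withMeta must return a new instance; the fields are immutable.
  (withMeta [this m] (new MetaFoo field1 field2 m)))

(def a-meta-foo (with-meta (new MetaFoo 1 2 nil) {:a :b}))
(prn (meta a-meta-foo))
```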

As with all templating mechanisms, including syntax-quote, the interplay of evaluation rules, substitution rules, and quoting requires some experience before it comes to seem natural. Be prepared for some head-scratching as you write your first templates. Simply using them should be much easier, and probably sufficient for most users. Feedback welcome!


Permalink

stumbling towards the clojure api

Here are two examples of code I’ve written and re-written lately: My first example came about from dealing with DOM Document elements and Nodes, specifically getting a named attribute from a given Node: ;; bad, throws a NullPointerException if any of the Java methods return nil (defn get-attribute1 [elt attr] (.. elt getAttributes (getNamedItem attr) [...]

Permalink

Sparse Matrices: infer, Clojure and the jvm

When we started Infer, and ran the first set of linear algebra benchmarks, we talked about the advantage of UJMP in bringing other matrix packages onto the classpath and using UJMP's wrappers around them via a consistent API.  This turned out to be very helpful for benchmarking UJMP's native dense and sparse matrices against colt's, and parallel colt's.

A special thanks to Holger Arndt, UJMP committer, for help using the various sparse matrix implementations via UJMP and catching us up on the state of the art for jvm sparse matrix support.

Holger shares the status for sparse matrices in UJMP:

  • matrix multiplication: full support
  • add/subtract/times/divide: not optimized but working
  • decomposition such as svd, inv, etc: treated as dense matrix

The DefaultSparseDoubleMatrix is UJMP's default implementation, which supports multi-dimensional sparse matrices and stores the entries in a HashMap of coordinate-value pairs.

For two dimensions, this is not an optimal representation, so we test it out against ColtSparseDoubleMatrix2D and ParallelColtSparseDoubleMatrix2D, as well as the built in dense matrix operations.

The only other libraries that I know of that support sparse matrix operations are MTJ (which, as Holger pointed out, allows only immutable matrices) and Mahout, which can perform matrix multiplications and truncated svd on distributed disk-based sparse matrices.  We want to try this out in Infer, but distributed matrix operations are a later step - we want to find the best solution for in-memory sparse matrices first.

Holger hypothesizes that for matrices of less than 5000x5000, we should probably be using UJMP's dense matrix implementation, as he thinks that it will be faster than Colt's or ParallelColt's sparse matrix.  We test this out below to see at what point sparse matrix operations are faster than dense.


Creating sparse matrices in Infer

To create a sparse matrix in Infer you just call sparse-matrix.  This creates a matrix backed by UJMP's DefaultSparseDoubleMatrix.  You pass it a seq of maps, where each map represents a row in the matrix, with the keys being the column indices, and the values being the entries in the matrix at the corresponding row and column index.

user> (use 'infer.matrix)
nil
user> (sparse-matrix [{0 1, 5 2, 9 3} {4 4, 9 5, 16 6}])

Then to get back to Clojure you can do:

user> (from-sparse-matrix (sparse-matrix [{0 1, 5 2, 9 3} {4 4,9 5, 16 6}]))
> ([0 0 1.0] [0 5 2.0] [0 9 3.0] [1 4 4.0] [1 9 5.0] [1 16 6.0])

Notice that you get vectors of the available coordinates and the entry at that coordinate.  You can also get back the map representation for the 2d case.

user> (from-sparse-2d-matrix (sparse-matrix [{0 1, 5 2, 9 3} {4 4,9 5, 16 6}]))
> ({0 1.0, 5 2.0, 9 3.0} {4 4.0, 9 5.0, 16 6.0})
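
To make the two representations concrete, here is a plain-Clojure sketch that converts between the seq-of-row-maps form and the coordinate form. The helper names rows->triples and triples->rows are hypothetical and are not part of infer.matrix; the sketch only mirrors the shapes shown in the REPL output above.

```clojure
;; Row maps -> [row col value] triples, ordered by row and then column.
(defn rows->triples [rows]
  (for [[i row] (map-indexed vector rows)
        [j v]   (sort-by key row)]
    [i j (double v)]))

;; Triples -> vector of row maps; n-rows must be given because
;; empty rows leave no trace in the coordinate form.
(defn triples->rows [n-rows triples]
  (reduce (fn [rows [i j v]] (assoc-in rows [i j] v))
          (vec (repeat n-rows {}))
          triples))

(def example-rows [{0 1, 5 2, 9 3} {4 4, 9 5, 16 6}])
(prn (rows->triples example-rows))
```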

You can also create sparse colt matrices and sparse parallel colt matrices.

user> (sparse-colt-matrix [{0 1, 5 2, 9 3} {4 4,9 5, 16 6}])
user> (sparse-pcolt-matrix [{0 1, 5 2, 9 3} {4 4,9 5, 16 6}])


OK, now we're ready to do some benchmarks

We are benching UJMP dense matrices vs. UJMP built in default sparse matrices vs. colt sparse matrices vs. parallel colt sparse matrices.

In order, the benchmarks below reflect the timings for the following fns from infer.matrix-bench in infer/test:

  • bench-dense
  • bench-sparse
  • bench-sparse-colt
  • bench-sparse-pcolt
infer.matrix-bench> (bench-all [10 times 100 1000 1000 100 0.01])
"Elapsed time: 88.47919 msecs"
"Elapsed time: 126.646131 msecs"
"Elapsed time: 40.448648 msecs"
"Elapsed time: 232.068246 msecs"

infer.matrix-bench> (bench-all [10 times 1000 1000 1000 1000 0.01])
"Elapsed time: 9330.419378 msecs"
"Elapsed time: 1254.609118 msecs"
"Elapsed time: 415.164851 msecs"
"Elapsed time: 2351.761057 msecs"

infer.matrix-bench> (bench-all [10 times 1000 10000 10000 1000 0.01])
"Elapsed time: 115559.75808 msecs"
"Elapsed time: 12666.080122 msecs"
"Elapsed time: 4827.539895 msecs"
"Elapsed time: 25237.850561 msecs"

This benchmark runs matrix multiplication (the "times" fn) 10 times, and the sparse matrices in the benchmark have 1% column sparsity.  In order to take advantage of sparsity, your matrix should be sufficiently sparse (probably <= 1% is a good start) and the dimensions of your matrix should be sufficiently large.
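
As a rough illustration of the setup (not the actual code in infer.matrix-bench), the sketch below builds a random sparse row in the row-map format and times a function over n runs. The names random-sparse-row and elapsed-ms are hypothetical, and "density" here is assumed to mean the probability that a given column is populated.

```clojure
;; A row map in which each column is populated with probability `density`.
(defn random-sparse-row [n-cols density]
  (into {} (for [j (range n-cols)
                 :when (< (rand) density)]
             [j (rand)])))

;; Wall-clock milliseconds taken by n calls of the zero-argument fn f.
(defn elapsed-ms [n f]
  (let [start (System/nanoTime)]
    (dotimes [_ n] (f))
    (/ (- (System/nanoTime) start) 1e6)))

;; A 1000-column row at 1% density holds about 10 entries on average.
(prn (count (random-sparse-row 1000 0.01)))
```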

We can see that we are generally taking advantage of sparsity already with 1000X1000 matrices, and colt sparse matrices are faster even for the 100X1000 case. 
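
The relative speedups can be read straight off the timings above. The following sketch just redoes that arithmetic; the map keys and the helper name speedups are chosen for this illustration, and the numbers are copied from the benchmark output.

```clojure
;; Elapsed msecs from the three bench-all runs above, keyed by the
;; dimensions of the left-hand matrix.
(def timings-ms
  {"100x1000"   {:dense 88.47919     :sparse 126.646131   :colt 40.448648   :pcolt 232.068246}
   "1000x1000"  {:dense 9330.419378  :sparse 1254.609118  :colt 415.164851  :pcolt 2351.761057}
   "1000x10000" {:dense 115559.75808 :sparse 12666.080122 :colt 4827.539895 :pcolt 25237.850561}})

;; Speedup of each sparse implementation relative to dense (> 1 means faster).
(defn speedups [row]
  (let [dense (:dense row)]
    (into {} (for [[impl ms] (dissoc row :dense)]
               [impl (/ dense ms)]))))

(doseq [[dims row] timings-ms]
  (prn dims (speedups row)))
```

On these numbers the colt sparse matrix is roughly 2x faster than dense in the smallest case and more than 20x faster in the two larger ones, while UJMP's default sparse matrix is slower than dense in the smallest case.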

I've seen many times that parallel colt is slower, even on multiple cores.  It's strange.  I have no explanation for this other than a guess that perhaps we need to configure things differently and it is currently just paying synchronization costs but not threading properly.

Looking at SVD:

infer.matrix-bench> (bench-all [10 #(svd %2) 100 1000 1000 100 0.01])
"Elapsed time: 3069.474062 msecs"
"Elapsed time: 3000.594768 msecs"
"Elapsed time: 1925.07847 msecs"
"Elapsed time: 1923.685277 msecs"

infer.matrix-bench> (bench-sparse-colt [10 #(svd %2) 1000 1000 1000 1000 0.01])
; Evaluation aborted.
infer.matrix-bench> (bench-sparse-colt 10 #(svd %2) 1000 1000 1000 1000 0.01)
; Evaluation aborted.
infer.matrix-bench> (bench-sparse-colt 10 #(svd %2) 100 10000 10000 100 0.01)
"Elapsed time: 60045.680803 msecs"

We can see that the SVD is taking so long that we have to kill it for 1000X1000 matrices, so we ran the benchmark just once rather than 10 times to keep the run time tolerable.

infer.matrix-bench> (bench-dense 1 #(svd %2) 1000 1000 1000 1000)
"Elapsed time: 55596.25651 msecs"

infer.matrix-bench> (bench-sparse-colt 1 #(svd %2) 1000 1000 1000 1000 0.01)
"Elapsed time: 59941.746648 msecs"

So we can do 1 SVD on a 1000X1000 matrix with 1% of its values populated in about 60 seconds.  This is not fast.

To get some perspective, let's compare this with the SVD of a 1000X1000 dense matrix in R and NumPy on the same machine.

First, in R.

> a <- matrix(rnorm(1000*1000), nrow=1000)
> system.time(svd(a))

user  system elapsed
7.704   0.056   7.773

Then numpy.

>>> import numpy
>>> import time
>>>
>>> a = numpy.random.random((1000,1000))
>>>
>>> t = time.time(); numpy.linalg.svd(a); print time.time() - t

7.57887601852

So we see that the java SVD implementations are roughly 7 to 8 times slower than what is achievable in R or NumPy (about 56 to 60 seconds against about 7.7 seconds).  This makes us want to explore better decompositions in Infer.


Conclusions and next steps

You can see that you start to get performance enhancements from sparse multiplications with pretty small dimensions as long as your matrices have sufficient sparsity.  The colt sparse matrix is as fast as it gets on the jvm, and you can create these with sparse-colt-matrix, and the matrix operations will work on them just as with any other Infer matrix (wrapping UJMP matrices).

SVD performance is disappointing in general, and is something we plan to look into implementing in Java ourselves.

Thanks again to Holger Arndt for help with UJMP and understanding the state of sparse matrices on the jvm, and to Hamilton Ulmer for pairing with me on sparse matrices in Infer.

Permalink

Best In Class: Developer Productivity - The Red Pill

Preface

"I know why you're here. I know what you've been doing... why you hardly sleep, why you live alone, and why night after night, you sit by your computer. You're looking for it. I know because I was once looking for the same thing. And when I found it, I knew what I'd been searching for. I was looking for an answer. It's the question that drives us. It's the question that brought you here. You know the question, just as I did."                        --- The (sorta) Matrix

Last time I blogged, I had just returned from the first Conj Labs in Brussels, and since then I've been tied up with Clojure development, so only now, just as we open registrations for Conj Labs Frankfurt, am I able to blog again. My last blogpost was meant as an inspiration to adapt your work environment to make you the most productive - I purposely didn't move into the 'how' as I just wanted to get people interested. It worked, it seems, so now I'm back to wrap up the series with some productivity tips as well as configs.

 

Top 10 Productivity Boosters

(by request)

 

I've tried to think about some of the deliberate choices I've made to function better in my job, but I'm sure your mileage will vary.

1. Get around 7 hours of sleep every single night

If your sleep isn't good you will suffer mentally and energy-wise. Catching up on a bad night's sleep is trickier than it sounds, so the key is to stick to a routine.

2. Avoid caffeine entirely (coffee & tea)

I've been a long time coffee drinker (read: bottomless coffee pit), but after I returned from Conj Labs Brussels I considered the effect of the caffeine in my system and how I got up in the morning and felt like I was in a daze until I got my first cup. Taking that thought further I reasoned that after having slept through the first 2 hours of the night, the next 5 would be spent by my body craving the next cup, which was how I would wake up. Since I only have 2 settings (off or on) I quit coffee overnight, cold turkey. It took 3 - 4 days before I was mentally on top again, but it took 14 days for the headaches to go away (they were bad). Now that it's over and done with I'm not touching that stuff again. I'm mentally alert and sharp from the moment I roll out of bed till I get back in - No additives needed!

3. Maintain your tools

What good is a lumberjack who doesn't sharpen his axe? Only a little more than a developer who doesn't exercise his body. In our younger years most of us felt immortal, nothing we ever did gave us any lasting marks despite our parents' warnings (at least I hope you were as fortunate as me), but as we grow older it becomes clear that the years take their toll. Lack of exercise is a killer - You'll feel it in your bones, when typing, in your energy levels, in your ability to focus - Everything works better if you maintain your primary tool: Your body. Personally I try to work in some exercise either into my lunch-break or in the evening - It takes a little time, but it's worth it.

4. Don't be vain

In all honesty, one of the reasons I wanted to try out Linux way back when, was because of some of the Compiz/Beryl visual effects for the desktop - this is vanity. One of my friends recently saw my desktop (in all its Awesome Emacsy splendor) and his comment was "upgrade that ****" because he didn't feel that it was as pretty as his new Windows 7 installation, which it isn't. But in the time he has booted to his Desktop I've already answered 3 emails, said good morning on #clojure and written my first few lines of code - Vanity slows you down, whether it's your choice of VM, OS (read: OSX or Windows) or anything else - If you want to be productive, aim for productivity not prettiness.

5. Avoid Social Networks

One of the strongest weapons for building quality systems is the ability to concentrate over prolonged periods of time. If you get regular Facebook updates, Twitter updates, or any other kind of updates that steal your attention, even if it's just for a few seconds, I'm willing to bet that you're working at 50% of your full capacity. Why? Because even though it takes you 5 seconds to read a twitter update, I'm guessing it takes you 5 minutes to regain full mental focus. If your updates are coming in at a rate around 1 update every 5 minutes you're never running at full speed. If you are connected to something like twitter, which I am, I recommend that you check it once in a while - I do it in the morning or evening, but typically not during the day - and not every day. There are 2 exceptions: If my phone rings or if I get an email I usually reply ASAP whenever possible, since customers shouldn't have to wait.

6. Work off a TODO list

I showed this in my last screencast as well, how I organize my TODO bullets into categories A, B or C. A means 'will lose significant value if not done today', B means 'important, but will not lose significant value if not done today', C means 'optional'. I cannot stress enough how important this bullet is - On its own it might overshadow the other 9 in the short-term. When working as a Project Manager I have seen several developers who select some minor task to work on, and every time they reach a milestone they look around, not knowing what to do, then start reading online newspapers or chit-chatting. Sure there's a time and a place for that, but if you're doing it consistently every time you've put down 50 lines of code, you have a problem. Working off a TODO list means that when you're in need of a high level of productivity you cross one item off the list and move on to the next! For many developers, huge wins in terms of productivity are available when the downtime between 2 tasks is cut out.

7. Use a tiling WM

If you're not on a tiling WM, you're constantly switching between the mouse and keyboard. Every time you want to click an item with the mouse you have to find it visually, take aim, click, maybe miss, click again. In most cases you hit the first time, but you've wasted time context switching and taking aim. When I was younger I had shoulder/neck pains which I think came from using the mouse too much - These days I never have pains.

8. Use Emacs for Everything

This was a central theme in my last blogpost: Integrate as much as you can into Emacs: Code editing, HTML editing, Emailing, reading Twitter streams, file management, Git integration, Day planning and whatever else you can think of. Having a uniform interface and heavy integration between your tools speeds you up enormously.

9. ...Use Conkeror for the rest

Whether you like it or not, you probably spend quite a bit of time in a web browser - Either finding javadocs, searching for libraries or something similar. When I switched from Chrome to Conkeror my ability to browse became almost equal to my ability to read and think. As soon as I could think of which link to click, the page was already loading - That's the power of a keyboard-based browser, why settle for less?  (preemptive strike: Some would argue that the lack of Firebug integration etc. is an argument for not using Conkeror, but notice I said use it for 'browsing', I use many different browsers for debugging/testing)

10. Always keep looking

I didn't start out with Arch, Awesome or Emacs, but a continual focus on optimizing my tools eventually got me here - Who knows where I'll be in 10 years. Right now I'm trying to learn the NEO keyboard layout, because I'm told it greatly reduces the distance your fingers have to travel when typing, thus speeding you up and minimizing the risk of getting RSI.

 

DISCLAIMER

You might be thinking after reading the above, that I'm a sadistic terminator who is determined to run my colleagues/employees into the ground, sweating, sobbing, broken. Nothing could be further from the truth. The truth is, relax time is important and fun time is important. But I find that I can enjoy all of these the most if I've put in a good amount of work first. So if I am going to work for something like 3 - 4 hours in a row, doesn't it make sense to make those hours as productive as possible without wrecking yourself trying to cut down on breaks, sleep, family time and what not? I think a lot of the common problems that developers struggle with after years in the field can be avoided by adopting the techniques above, but as always I welcome input.

 

Watch out for the pitfall

So what do you do when you've applied all of the above tips and your back is still against the wall timewise? Those who have the resources in terms of colleagues or coworkers tend to try and delegate. Delegation can be like taking a loan to buy something which you really can't afford: It comes back and bites you and ends up costing more than you wanted to pay. Why?

I think primarily because people don't delegate intelligently enough. I recognize two kinds of delegation: Gopher Delegation and Delegation:

Distinguishing

Gopher delegation is the type of delegation that sounds like 'please go to this store, pick up these 3 items, come back and put them on the table in the cantina, arrange the plates, forks and glasses around the table, evenly distributed in the exact amount of people attending the lunch meeting'. This type of delegation is in stark contrast to true delegation which sounds like 'Please make the necessary preparations for our lunch meeting at 12:00'. The key thing to learn, is that gophers need gopher delegation and trusted employees/colleagues need regular delegation.

In the above example, the gopher given the first task would probably forget the knives - he was only asked to get forks and since the entire task had been nicely cut out into smaller tasks he felt no responsibility to go above and beyond, he simply follows orders. For gophers, this is what you want, fortunately there aren't too many of them.

The trusted employee might feel free to pick up more than the 3 items, might clean the table, or apply any number of improvements to your original plan (which he wasn't told) because he feels it's his task and his responsibility to work out a great solution. For Conj Labs we work out a number of lab exercises in advance of the course. Imagine I was to ask Christophe to prepare such a lab. Would I get the best result by telling him the exact scope of the exercise, where to inject explanatory slides, which functions to use, etc etc? Or would it be better to say 'Christophe, I would love a lab on DSLs can you try to come up with something?' - Let me tell you, he has not yet failed to surprise me :)

In the past I have been bitten by delegating important tasks to gophers without being clear enough, and I'm sure I have choked trusted employees' creativity and intelligence by being too specific - Both are enormous errors where delegation ends up being a pain instead of giving you some freedom with your time. But now for the practical stuff:

 

Emacs keys worth knowing:

I promised to list some of my most used keyboard bindings, so I've asked my fingers and here is the result. Since this is almost an inexhaustible topic, I'll keep it brief and open up the comments section :)

Small helpers

Zap-To-Char (M-z): Kills all characters up to and including the one you supply as an argument

Recenter (C-l): Try hitting it 3 times and see what happens each time (that's C-lower-case L)

Query-replace (M-%): Interactive search and replace (y to replace, n to skip, ! to replace all, works on regions as well); C-M-% is the regexp version

Goto line (M-g g): Jumps to a specific line

These five guys go together and I use them constantly for rearranging text:

Kill-Line (C-k): Kills from point to the end of the line

Kill-Region (C-w): Kills a region (region: Selected area)

Kill-Ring-Save (M-w): Copies a region to the clipboard

Paste-Ring (C-y): Pastes whatever's at the top of the kill-ring

Paste-again (M-y): Keeps replacing what you just pasted with the next item in the kill ring

For repetitive tasks, these are a must

record-macro-start (C-x (): Records all keystrokes until you stop recording

record-macro-end (C-x )): Stops recording

play-macro (C-x e): Plays the last recorded macro

play-macro-on-region (C-x C-k r): Plays it on each line of the selected region

play-macro-n-times (C-u 10 C-x e): Plays the macro 10 times

save-macro-with-name (C-x C-k n): Give it a name, refer to it later

M-x insert-kbd-macro: Lets you save the Lisp code of your macro for use in future sessions

Optimizations

M-x paredit-mode: Toggles paredit, so it will disable it if you enabled it by accident

 

 

Configs

I've put my configs on Github - I don't plan on updating them, so they are there now and can be used as inspiration.

Emacs:

Nothing too special here. There's my swank setup, which has a very customized classpath; this is for the times when I want to fiddle with or contribute to some project - Most of my development goes on after calling M-x swank-clojure-project. If someone is pinging me in #clojure I hit F12 which makes the channel fullscreen, and once I'm done I hit F12 again and the original window configuration is restored completely. At the very bottom I've added some repos to technomancy's new version of ELPA, which I never finished testing (sorry).

Awesome (rc.lua): The awesome config (rc.lua) is looted from anrxc (#archlinux)

This setup will likely save you half a lifetime - Installing Awesome takes a few seconds but configuring your way out of the lua madness (indices start at 1) takes quite some time. Thankfully #archlinux is a good place to get help. There's nothing too specific about this setup, except that when you hit M-q a small Emacs-Orgmode-Remember window pops up, which allows you to quickly take notes. This feature won't work unless you use (parts of) my emacs config and you need to change a few paths. Finally, your battery is most likely not named the same as mine, so to get battery stats in the top bar fix the string on line 50 in /awesome/vicious/bat.lua. The entire config goes into ~/.config/awesome. (yes I know it's ridiculously hacky, I love using Awesome, not configuring it)

Wanderlust:

There are several Email applications for Emacs - They all suck, Wanderlust sucks the least. I use Wanderlust sporadically for sending emails and also for reading emails - I do however always keep a thunderbird icon in the tray to alert me to new emails, as this is one of the (many?) bugs of Wanderlust's IMAP integration - It doesn't notify you of new emails. Word of caution: If you subscribe to the Wanderlust ML, you can only unsubscribe by sending an email from within Wanderlust - Not knowing this caused a little pain and a lot of flame. When they get the bugs ironed out, I might switch 100% to a fetchmail/wanderlust combo but it's not quite ready yet.

Note, that when you're composing an email, hit C-h to convert it to HTML using org-mode-htmlize.

You can find the files: here

 

Conclusion

It's important to be productive - I see it as driving a car where there are no speed limits, why not see how fast it can go? I hope I was able to inspire some of you to revisit your setup and to start asking the question "How productive can we become?".

There's of course a final very definite way to be more productive and that is to use a screwdriver for screws instead of a hammer, and to use Clojure for development of quality software. If you would like to learn more about Clojure and how to use it professionally, I recommend that you join us at the next installment of Conj Labs - This time in Frankfurt, Germany. Once again I'm teaming up with the fantastic Christophe Grand (author of Enlive and Moustache) to provide 3 days of Clojure training - We hope to see you there. As a new thing, we have acquired the help of InnoQ who are helping us in spreading the word, so we hope to see many Germans (as well as foreigners) there.


Permalink

Today in the Intertweets (Aug 25th Ed)

  • #Compojure Demystified with an example – Part 4 (here, via @sivajag) — Here are part 1, 2 and 3. This is a series about creating webapps with Compojure and Clojure.
  • Looks like I can improve the Clojure section of DSLs In Action with the new Protocols introduced in 1.2. Non invasive abstractions FTW (via @debasishg) — Those protocols sure are neat…
  • I joined Los Angeles Clojure Users Group on Meetup (here, via @nickmain_ ) — So there, there is a LA Clojure group now.
  • Wrote up the details on getting #zeromq working with #clojure on #osx (here, via @trydionel) — ZeroMQ is a (very fast) messaging library, meant to be used programmatically, as opposed to being a shrink-wrap solution. Making it work with Clojure is not a walk in the park. This article might help if you want to do that on OSX.

Permalink

Setting up 0MQ for Clojure on OSX

ZeroMQ is a simple and robust asynchronous messaging layer.1 Unfortunately, using it with Clojure is far from simple. The steps below are the magic key that worked for me:

  • Starting out nice and slow, install the core zmq libraries and command line tools with homebrew:2
    brew install zmq
  • With ZMQ installed, you’re ready to start building jzmq, the Java bindings. You’ll need to patch the homebrew formula for pkg-config, as jzmq requires the latest-and-greatest to build correctly3
    brew edit pkg-config
    require 'formula'
    
    class PkgConfig <Formula
      homepage 'http://pkgconfig.freedesktop.org'
      url 'http://pkgconfig.freedesktop.org/releases/pkg-config-0.25.tar.gz'
      md5 'a3270bab3f4b69b7dc6dbdacbcae9745'
    
      def install
        paths=%W[#{HOMEBREW_PREFIX}/lib/pkgconfig /usr/local/lib/pkgconfig /usr/lib/pkgconfig /usr/X11/lib/pkgconfig].uniq
        system "./configure", "--with-pc-path=#{paths*':'}", "--disable-debug", "--prefix=#{prefix}"
        system "make install"
      end
    end
    
  • At this point, you’ll need to copy one of the essential pkg-config files into a location that jzmq’s build tools can find.4
    sudo cp /usr/local/Cellar/pkg-config/0.25/share/aclocal/pkg.m4 /usr/share/aclocal/pkg.m4
  • Now we’re ready to build jzmq. This will build the dylib you’ll need to actually use the bindings, and place it in /usr/local/lib.5
    git clone git://github.com/zeromq/jzmq.git
    cd jzmq
    ./autogen.sh
    ./configure
    make
    sudo make install
    
  • Getting closer! We’ll need the zmq.jar file produced during the build, so copy that into your project’s lib directory:6
    cp ./src/zmq.jar /path/to/my/project/lib/zmq.jar
    
  • Now we can actually start setting up Clojure for ZMQ. Edit your leiningen project.clj file:
    (defproject my-project "1.0.0-SNAPSHOT"
      :description "FIXME: write"
      :dependencies [[org.clojure/clojure "1.2.0"]
                     [org.clojure/clojure-contrib "1.2.0"]
                     [org.clojars.mikejs/clojure-zmq "2.0.7-SNAPSHOT"]]
      :dev-dependencies [[swank-clojure "1.2.1"]]
      ; This sets the 'java.library.path' property
      ; so Java can find the ZeroMQ dylib
      :native-path "/usr/local/lib")
    
  • Collect the leiningen dependencies:
    lein deps
    
  • Start your swank server:
    lein swank
    
  • Bask in the results with the canonical example:
    (ns my-project.core
      (:use [org.zeromq.clojure :as zmq]))
    
    (defn- string-to-bytes [s] (.getBytes s))
    (defn- bytes-to-string [b] (String. b))
    
    (defn test-zmq []
      (let [ctx (zmq/make-context 1)
            socket (zmq/make-socket ctx zmq/+upstream+)]
        (zmq/connect socket "tcp://127.0.0.1:5555")
        (while true
          (println (bytes-to-string (zmq/recv socket))))))
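
The two byte-conversion helpers in the example rely on the platform default charset. Below is a small sketch of variants that pin the encoding to UTF-8, plus a quick way to check which native library path the JVM actually sees. The names string->bytes and bytes->string are made up for this illustration; nothing here touches the ZeroMQ bindings themselves.

```clojure
;; Explicit UTF-8 conversions, so both ends of a socket agree on the
;; encoding regardless of platform defaults.
(defn string->bytes [^String s] (.getBytes s "UTF-8"))
(defn bytes->string [^bytes b] (String. b "UTF-8"))

(println (bytes->string (string->bytes "hello zmq")))

;; If loading the native library fails, inspecting this property shows
;; which directories the JVM searches for the dylib.
(println (System/getProperty "java.library.path"))
```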
    

This process is definitely rough around the edges, but I hope it helps you get started quickly!

  1. Read this excellent ZeroMQ introduction if you’re not familiar with ZMQ. You’ll love it.
  2. This guide is EXTREMELY brew-centric. If you’re running OS X, you should be using homebrew for dependency management.
  3. There’s an open ticket to update pkg-config in the homebrew backlog, so this may be unnecessary soon.
  4. This step should either be taken care of by homebrew, or by jzmq’s ./configure, but I wasn’t able to get it working. Please leave a note if you know how I could resolve this step!
  5. This is another step that seems problematic. It would be best to just install the dylib straight into OS X’s Java extensions directory. Again, please leave a comment if you can help!
  6. Another hacky step. I gather it’s possible to build jzmq with leiningen, but the native-deps dependency doesn’t seem to work with lein 1.3.0. Suggestions?

Permalink

Quick update

Long time no post. No code in this one though.
However I’m pleased to announce that Lau and I are going to give another Clojure course: October 26-28 in Frankfurt, beginners welcome. So if you want to learn Clojure, register!

Permalink

Today in the Intertweets (Aug 24th Ed)

  • Scala classes in clojure (here, via @ScalaAtSO) — That’s right: make love, not war. It’s refreshing to see Scala and Clojure in the same sentence without a ‘vs.’ in between.
  • Are github-hosted repos the maven gateway drug for my Clojure brethren? Sorry guys: it’s a fact of life on the JVM. (here, via @cemerick) — Yes, it is a fact of life that your project will be broken into a few modules that might evolve concurrently, and you’ll have to manage their dependencies.  This article is actually only tangentially related to Clojure, as it applies to Java itself and all the other JVM-based languages. The fact of life is that you will need to set up a Maven repository which, as the article proposes, can be done without adding much infrastructure overhead.

Permalink

SLIME debugger

http://hugoduncan.org/post/2010/swank_clojure_gets_a_break_with_the_local_environment.xhtml



;; All power to:

;; http://hugoduncan.org/post/2010/swank_clojure_gets_a_break_with_the_local_environment.xhtml

;; Who has given us a way to debug things under emacs/slime/swank, by stopping a
;; running program to examine the variables

;; First define this function under emacs/slime/swank

(defn factorial [n]
  (when (= n 23) (swank.core/break))
  (if (< n 2) n
      (* n (factorial (dec n)))))


;; Then try
(factorial 30)
;; at the repl, so it runs in the repl thread, rather than executing it with
;; C-M-x or C-x e, which for some reason doesn't work as well.

;; Hugo's article explains how to view the local environment and create a repl
;; in context so that you can examine the state of the program when break was
;; called. You can then restart as if nothing had happened.

;; It's not as wonderful as traditional SLIME/LISP debugging, but it's a good
;; start!


Permalink

Reduce: Not Scary



;; Three very fundamental operators in any sort of programming are map, filter
;; and reduce.

;; They represent the common programming tasks of transforming a collection,
;; selecting from it, and summing over it.

;; Most people think that map and filter are fairly obvious, but there seems to
;; be a certain amount of confusion about reduce.

;; But it's actually very simple in concept, and represents an easy idea.


;; Often one needs to loop over a collection, and store results in an
;; accumulator.

;; The simplest example of a reduction would be adding the numbers in a list.

;; Suppose the numbers are 1,2,3,4,5,6,7,8,9,10 and we want to find their sum

;; In C, or another naturally imperative language, we'd say:

;; int list[]={1,2,3,4,5,6,7,8,9,10};

;; int len=10;

;; int a=0;
;; int i;

;; for (i=0; i<len; i++){
;; a += list[i];
;; }

;; Using atoms to provide mutable state, we can do something similar in Clojure:


(def lst '(1 2 3 4 5 6 7 8 9 10))
(def len 10)

(def a (atom 0))

(dotimes [i len]
  (swap! a + (nth lst i)))


;; The value ends up in the atom a, just as in the C version.

;; In clojure, this looks slightly more complicated.

;; Partly because mutation in clojure is intrinsically more complicated, because
;; clojure is extremely concerned with thread safety, and so we need to allocate
;; and dereference atoms rather than mutating local variables.

;; And partly because C has very good notations for its fundamental operations.

;; But logically they're the same algorithm.


;; But I'd feel dirty writing this code in clojure, even though that would have
;; been a perfectly good piece of LISP in the sixties. It's just a feeling that
;; I have that it is better to avoid mutation unless it's actually necessary.

;; Even though the mutation-less algorithms are often harder to write, they're
;; often easier to debug and test.

;; A more natural way to accumulate over a list in clojure is the
;; loop-as-function-call, with accumulator and iterator as parameters:


(loop [a 0 i 0]
  (if (= i len) a
      (recur (+ a (nth lst i)) (inc i))))

;; This is much more idiomatic code in clojure, and it doesn't mutate any values
;; even though the variable-rebinding in the recur call produces a very similar
;; effect.

;; And here the final value is the value of the expression, which is nicer.

;; Of course, clojure's lists know when they are empty, so we don't need an
;; explicit loop counter.

;; So how about:
(loop [a 0 l lst]
  (if (empty? l) a
      (recur (+ a (first l)) (rest l))))


;; l moves along the list, while a accumulates the values.

;; It still looks a bit long-winded, but we can easily imagine that this is a
;; common pattern:
(loop [a _ l _]
  (if (empty? l) a
      (recur (_ a (first l)) (rest l))))


;; Where the blanks represent holes in the boilerplate we have to fill in.

;; It should be almost as common as the equivalent pattern:

;; a= _
;; for(i=0; i<_; i++)
;; {
;; a _= _ [i]
;; }


;; is in C.

;; Where in both cases we need to fill in the _ with the initial value of the
;; accumulator, the list to be accumulated over, and the operation to be
;; performed.

;; Pretty much the first law of programming is:
;; If you see a common pattern, you should name it and abstract it so it goes away.

;; The pattern is called reduce.


;; We need to fill in the blanks with the function to do the accumulating,
;; the initial value of the accumulator, and the list

;; Since we're reducing the list lst, using the operation +, and starting
;; with the value zero, we write:

(reduce + 0 lst)

;; reduce is clojure's natural way of expressing accumulation over a list
;; in the same way as the for-loop over += and ++ is C's

;; Here are some other examples

(reduce * 1 lst)

;; We use * instead of +, and start with 1 instead of 0
;; This produces the product of the numbers in the list.

;; In these cases where the order of the arguments doesn't matter
;; we can think of reduce as 'put the function between the values'

(reduce + 0 '(1 2 3 4 5 6 7 8 9 10)) ; (0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10)
(reduce * 1 '(1 2 3 4 5 6 7 8 9 10)) ; (1 * 1 * 2 * 3 * 4 * 5 * ........)

;; But this image is actually harmful when the ordering does matter.

;; What's really going on is that the operator is being used to feed values
;; into the accumulator one by one.

(reduce + 0 '(1 2 3))
;; proceeds like:
;; (+ 0 1) -> 1
;; (+ 1 2) -> 3
;; (+ 3 3) -> 6

;; If this seems confusing, even though you're happy with the C version,
;; think about the C loop.

;; They're actually just different ways of expressing the same idea,
;; and should look equally natural. Seeing how either one can always be
;; transformed into the other will help.
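
;; One way to watch the accumulator change, assuming Clojure 1.2 or later, is
;; clojure.core's reductions, which returns every intermediate value that
;; reduce passes through. This is only a sketch for playing with at the repl:

```clojure
;; Each element of the result is the accumulator after one more step,
;; starting with the initial value itself.
(reductions + 0 '(1 2 3))          ; => (0 1 3 6)
(reductions str "" ["a" "b" "c"])  ; => ("" "a" "ab" "abc")

;; The last intermediate value is what reduce itself returns.
(= (last (reductions + 0 '(1 2 3)))
   (reduce + 0 '(1 2 3)))          ; => true
```

;; Reading the result left to right is the same as reading the value of the
;; variable a on each pass through the C loop.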


;; Here's an example where the order does matter:
(reduce conj '() '(1 2 3))

;; How do we think about this?
(conj '() 1) ; -> '(1)
(conj '(1) 2) ; -> '(2 1)
(conj '(2 1) 3) ; -> '(3 2 1)

;; So
(reduce conj '() '(1 2 3)) ; -> '(3 2 1)


;; Of course this simple reduction is so common that the pattern
;; (reduce conj '() _ ) already has a name

(reverse '( 1 2 3 ))

;; Here's the definition of reverse in the clojure.core source code!
(defn reverse
  "Returns a seq of the items in coll in reverse order. Not lazy."
  {:added "1.0"}
  [coll]
  (reduce conj () coll))


;; An acceptable definition of reduce itself would be:

;; (the function parameter is called f so that it doesn't shadow fn)
(defn my-reduce [f init coll]
  (loop [acc init l (seq coll)]
    (if (empty? l) acc
        (recur (f acc (first l)) (rest l)))))


;; This works on any collection that can be made into a sequence:
(my-reduce * 1 '(1 2 3)) ;; a list
(my-reduce * 1 #{1,2,3}) ;; a set
(my-reduce * 1 [1,2,3]) ;; a vector

;; The real reduce in clojure.core is an optimised version and can deal with all
;; sorts of collections efficiently, but in spirit it is just making every
;; collection into a sequence and then doing what my little skeleton above did.


;; It also has another feature, which is that if you don't provide an initial
;; value for the accumulator, then it takes the first element of the sequence as
;; its initial value, and accumulates over the rest of the sequence.

;; For operations which produce answers of the same type as their arguments,
;; this is often what you want.

(reduce * '(1 2 3 4)) ;24
(reduce + [1 2 3 4]) ;10
(reduce bit-xor '(1 2 3 4 5)) ;1

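;; One edge case worth knowing about, sketched here as an aside: with no
;; initial value and an empty collection, reduce calls the function with no
;; arguments, and with a single element it returns that element without
;; calling the function at all.

```clojure
(reduce + [])     ; => 0, because (+) is 0
(reduce * [])     ; => 1, because (*) is 1
(reduce + [7])    ; => 7, + is never called
(reduce + 5 [])   ; => 5, the initial value comes straight back

;; A function with no zero-argument form, such as max, throws on an empty
;; collection, so supplying an initial value is the safer habit there.
(reduce max 0 []) ; => 0
```
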
;; So why has this simple operation got a scary reputation?

;; I think it's because all the common cases are so useful that they have
;; already been further abstracted away, like reverse. So in fact you don't
;; meet it that often in practice.

;; Let's see if we can construct something more interesting:

;; Suppose you had a list of strings

(def strlist '("fred" "barney" "fred" "wilma"))


;; And wanted to count how many times each string occurs in the list.

;; We want an accumulator to keep the strings and counts in, and a function
;; which will take that accumulator, and a new string, and return the updated
;; accumulator.

;; The obvious accumulator is a map. We'd want the answer to be something like
{"fred" 2, "barney" 1, "wilma" 1}


;; So what function will add strings to a map?

;;In a rather naive and long-winded way:

(defn addtomap [map string]
  (let [oldval
        (if (contains? map string)
          (map string)
          0)]
    (assoc map string (inc oldval))))


;; Here's how we'd use it to count our list, starting from the empty map {}, and
;; using addtomap to add each string into the accumulator returned by each call.
(addtomap {} "fred") ;; {"fred" 1}
(addtomap {"fred" 1} "barney") ;; {"barney" 1, "fred" 1}

(addtomap {"fred" 1, "barney" 1} "fred") ;; {"fred" 2, "barney" 1}
(addtomap {"fred" 2, "barney" 1} "wilma") ;; {"wilma" 1, "fred" 2, "barney" 1}


;; So the reduce part is obvious, once you have addtomap

(reduce addtomap {} strlist)

;; But a real Clojure programmer would look at addtomap and think:

;; We can write (map string 0) instead of
;; (if (contains? map string)
;;   (map string)
;;   0)

;; So a better version of addtomap would be:

(defn addtomap [map string]
  (let [oldval (map string 0)]
    (assoc map string (inc oldval))))

(reduce addtomap {} strlist)


;; And now the let statement looks redundant, so let's say
(defn addtomap [map string]
  (assoc map string (inc (map string 0))))

(reduce addtomap {} strlist)


;; And then he might say
;; "since I'm only going to use this function here, why not make it anonymous?"
(fn [map string] (assoc map string (inc (map string 0))))


;; And now the reduce looks like:
(reduce (fn [map string] (assoc map string (inc (map string 0)))) {} strlist)


;; And, well, at this point, any reasonable man is going to think:
;; "Since I'm writing a one-liner, I might as well use the anonymous shorthand"

#(assoc %1 %2 (inc (%1 %2 0)))

(reduce #(assoc %1 %2 (inc (%1 %2 0))) {} strlist)


;; And if you already understand reduce and anonymous functions, and how maps
;; work, this is actually not too hard to understand.

;; In fact this is the version of the function that I originally wrote.

;; But I can see it might be a bit off-putting if you thought reduce itself was
;; scary.

;; Actually the obfuscation / pleasing terseness is all in the anonymous
;; function, and the behaviour of maps, and the reduce bit isn't scary at all.
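
;; If the shorthand is too terse, here is one alternative spelling, offered
;; only as a sketch: update-in together with fnil (both in clojure.core as of
;; 1.2) says "increment the count, treating a missing key as 0". The names
;; count-word, counts and word are purely illustrative.

```clojure
(def strlist '("fred" "barney" "fred" "wilma"))

;; (fnil inc 0) behaves like inc, but substitutes 0 when handed nil,
;; which is what update-in passes for a key that isn't in the map yet.
(defn count-word [counts word]
  (update-in counts [word] (fnil inc 0)))

(reduce count-word {} strlist)
;; => {"fred" 2, "barney" 1, "wilma" 1}
```

;; The reduce call is unchanged; only the accumulating function is spelled
;; differently.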


;; Here's another deliberately obscure example, using a little structure as an
;; accumulator. See if you can figure out what it does using the above ideas to
;; unpack it. I'm using the destructuring notation to take the little structure
;; apart, and then the function modifies each part and puts them back together again

(reduce (fn[[c s] n] [(+ c n), (str s n)]) [0,""] lst)


;; The trick is to work out what the anonymous function does to the starting
;; value of the accumulator when it gets a value from the list.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; Clojure's natural facility with abstractions and small functions allows
;; some truly terse code.

;; This little piece of code counts words in a file and orders them by popularity:

(sort #(< (%1 1) (%2 1))
      (reduce #(assoc %1 %2 (inc (%1 %2 0))) {}
              (clojure.contrib.string/split
               #"\W"
               (slurp "/home/john/hobby-code/reduce.clj"))))


;; With practice this sort of thing is actually readable. Promise!

;; But if I was actually writing it for someone else to read,
;; I'd probably split it up and give the bits names.

(let [filecontents (slurp "/home/john/hobby-code/reduce.clj")
      words (clojure.contrib.string/split #"\W" filecontents)
      wordmap (reduce #(assoc %1 %2 (inc (%1 %2 0))) {} words)
      sortedwords (sort #(< (%1 1) (%2 1)) wordmap)]
  sortedwords)


;; And if I knew the library, I'd remember that two of those little operations
;; actually already have names:

;; The idiom
(reduce #(assoc %1 %2 (inc (%1 %2 0))) {} .... )

;; which I used to find myself writing all the time, is such a useful thing
;; that it too has made it into clojure.core as the function frequencies:


(frequencies strlist)

;; and the sort with the comparator on the second elements can be replaced by sort-by, and second

(let [filecontents (slurp "/home/john/hobby-code/reduce.clj")
      words (clojure.contrib.string/split #"\W" filecontents)
      wordmap (frequencies words)
      sortedwords (sort-by second wordmap)]
  sortedwords)


;; And then I'd abstract away the word counting operations from the file reading part

(defn sorted-word-frequencies [string]
  (sort-by second (frequencies
                   (clojure.contrib.string/split #"\W+" string))))

;; So now I can ask for the 10 most popular words:

(take 10 (reverse (sorted-word-frequencies (slurp "/home/john/hobby-code/reduce.clj"))))

;; which is also pleasingly terse, but I think more readable.
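
;; For readers without clojure.contrib on the classpath, here is a sketch of
;; the same idea using clojure.string, which ships with Clojure 1.2 and later.
;; Note that clojure.string/split takes the string first and the regex second,
;; the other way round from the contrib version. top-words is an illustrative
;; name, and sorting in descending order avoids the separate reverse.

```clojure
(require '[clojure.string :as string])

(defn top-words
  "Returns the n most frequent words in text as [word count] pairs."
  [n text]
  (->> (string/split text #"\W+")
       (remove empty?)      ; a leading separator yields an empty first string
       frequencies
       (sort-by val >)      ; highest count first
       (take n)))

(top-words 2 "the cat and the hat and the bat")
;; => (["the" 3] ["and" 2])
```

;; To run it over a file, pass (slurp "some-file") as the text argument.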







Permalink

One Language, Many Implementations

The interpreter model that I use to implement Lisp is very simple and flexible. It was originally derived from Peter Norvig’s SILK interpreter, but has gone through multiple revisions over the years. Here is a list of various versions that are currently working: Gwt-Clojure – Clojure syntax, built in Java/JavaScript using Google’s GWT compiler and libraries [...]

Permalink

Today in the Intertweets (Aug 23rd Ed)

  • Alle drei Teile unserer Clojure-Serie im JavaSPEKTRUM sind mittlerweile online (here, via @stilkov) — For the German speakers (well, readers would be fine too!), here are the three parts of a series of articles on Clojure for JavaSPEKTRUM (Overview, Data Types and Java Integration, and Concurrency).
  • Version 0.6 of the Grails Clojure plugin utilizes Clojure 1.2.0 (here, via @jeffscottbrown) — Grails is a Groovy-based web development framework. This plugin lets you use clojure from within the framework, which makes it very easy to develop webapps and use Clojure in the backend.
  • Securing #Clojure web applications with Sandbar – Part 2. Form-based authentication and channel security (here, via @brentonashworth) — Sandbar is a library intended to work on top of Compojure/Ring and that simplifies writing web applications.
  • Basic authentication for ring (and compojure etc.) (here, via @planetclojure) — Basic Authentication, but this time without Sandbar ;)
  • A micro-manual for LISP Implemented in C (here, via @planetclojure) — Ok, bear with me here. Nurullah has a free weekend, and what do you normally do with a free weekend? Well, it turns out he decides to write a LISP in C. So he writes it, and it works. Let me ask here, how many of you pull this kind of software over the weekends?!? I don’t. I would if I could though!
  • Are we to static for a dynamic world? (here, via @kotarak) — The pros and cons of extending records statically or dynamically (speed vs. flexibility). Good insight; you might want to read this, since these features are new to Clojure 1.2.0.

Permalink

Basic authentication for ring (and compojure etc.)

I’ve always liked HTTP authentication (like basic and digest) over login pages because they look so.. technically savvy. Finally somebody who bothered to read an RFC to implement it and make me feel warm and welcome like peers do.

Okay, I must admit, it takes a customer just a couple of moments to request a logout button, which is a real pain to implement, if possible at all. And I wouldn’t want to login on something I care about from a public computer either. But it is very nice for services!

Anyway here’s my implementation as ring middleware: ring-basic-authentication

Permalink

Syntax highlighting for clojure code

Having left people hanging last week, the tool I used to format the code for the blog hosted on blogger is: GNU enscript. None of the available tools would work out of the box. GNU enscript provides the most bang for the buck. I've been using it for printing text for a while now - it has all the flexibility the "in the browser" highlighters have, at least when printing postscript. While it's natural

Permalink

Securing Clojure Web Applications with Sandbar - Part 2

In part 1 of this series, a simple authorization scheme was added to a small Clojure web application. Continuing with this example, this post will demonstrate how features of sandbar.auth may be used to add form-based authentication and channel security.

Form-based authentication and channel security


If you would like to follow along:

$ git clone git://github.com/brentonashworth/sandbar-examples.git
$ cd sandbar-examples/security
$ open src/sandbar/examples/part_two/start.clj
$ lein deps

This code is a bit different from what we ended with last time. A stylesheet has been added and the layout has been improved. The complete source for our starting point is shown below. Make sure you understand this code before moving on.


(ns sandbar.examples.part-two.start
  (:use (ring.adapter jetty)
        (ring.middleware file)
        (compojure core)
        (hiccup core page-helpers)
        (sandbar core stateful-session auth)))

(defn query [type]
  (ensure-any-role-if (= type :top-secret) #{:admin}
                      (= type :members-only) #{:member}
                      (str (name type) " data")))

(defn layout [content]
  (html
   (doctype :html4)
   [:html
    [:head
     (stylesheet "sandbar.css")
     (icon "icon.png")]
    [:body
     [:h2 "Sandbar Security Example"]
     content
     [:br]
     [:div (if-let [username (current-username)]
             [:div
              (str "You are logged in as " username ". ")
              (link-to "logout" "Logout")])]]]))

(defn data-view [title data & links]
  [:div
   [:h3 title]
   [:p data]
   (if (seq links) links [:div (link-to "home" "Home")])])

(defn home-view []
  (data-view "Home"
             (query :public)
             [:div (link-to "member" "Member Data")]
             [:div (link-to "admin" "Admin Data")]
             [:br]
             [:div (cond (any-role-granted? :admin)
                         "Hello administrator!"
                         (any-role-granted? :member)
                         "Hello member!"
                         :else "Click on one of the links above to log in.")]))

(defn member-view []
  (data-view "Member Page"
             (query :members-only)))

(defn admin-view []
  (data-view "Admin Page"
             (query :top-secret)))

(defn permission-denied-view []
  [:div
   [:h3 "Permission Denied"]
   [:div (link-to "home" "Home")]])

(defroutes my-routes
  (GET "/home*" [] (layout (home-view)))
  (GET "/member*" [] (layout (member-view)))
  (GET "/admin*" [] (layout (admin-view)))
  (GET "/logout*" [] (logout! {}))
  (GET "/permission-denied*" [] (layout (permission-denied-view)))
  (ANY "*" [] (layout (home-view))))

(defn authenticate [request]
  (let [uri (:uri request)]
    (cond (= uri "/member") {:name "joe" :roles #{:member}}
          (= uri "/admin") {:name "sue" :roles #{:admin}})))

(def app (-> my-routes
             (with-security authenticate)
             wrap-stateful-session
             (wrap-file "public")))

(defn run []
  (run-jetty (var app) {:join? false :port 8080}))

Encryption


Our application must ensure that passwords are not sent across the network in plain text. In a production environment, one would enable SSL support on the web server and purchase a legitimate SSL certificate from a valid authority for use with the site's domain. For development, it is good enough to create a self-signed certificate and turn on Jetty's SSL support.

Use Java's keytool to create a self-signed certificate.

$ keytool -genkey -alias sandbar -keyalg RSA -keystore my.keystore -keypass foobar
Enter keystore password:
Re-enter new password:
What is your first and last name?
[Unknown]: localhost
What is the name of your organizational unit?
[Unknown]: dev
What is the name of your organization?
[Unknown]: clojure
What is the name of your City or Locality?
[Unknown]: New York
What is the name of your State or Province?
[Unknown]: New York
What is the two-letter country code for this unit?
[Unknown]: US
Is CN=localhost, OU=dev, O=clojure, L=New York, ST=New York, C=US correct?
[no]: y

For this example, the password "foobar" was entered. keytool has created a keystore in the file named my.keystore. Make sure this file is located in the root directory of the security module (at the same level as the public directory).

To make use of this keystore, update the run function so that it matches the version shown below.


(defn run []
  (run-jetty (var app) {:join? false :ssl? true :port 8080 :ssl-port 8443
                        :keystore "my.keystore"
                        :key-password "foobar"}))

Setting :join? to false causes the call to run-jetty to return so that the REPL may still be used. The :ssl-port defaults to 443; here we set it to 8443. Everything else is straightforward. If you are following along, now would be a great time to start a REPL and test that everything is working as expected.

$ lein repl


user=> (use 'sandbar.examples.part-two.start)
user=> (run)

Navigating to https://localhost:8443/ and http://localhost:8080/ confirms that the application may be used over SSL or standard http and that everything works the same as it did before.

Note: You will get a warning message because the certificate that we are using is not legitimate. This is fine for development; do what you need to do to add an exception for this certificate.

Adding form-based authentication


In the last post, the with-security middleware was added and configured to use our authenticate function. In this section, the authenticate function will be replaced with an authentication function from sandbar.form-authentication and a pre-built login form will be added.

Start by adding the required namespaces sandbar.form-authentication and sandbar.validation.

Delete the authenticate function and replace authenticate with form-authentication in our with-security middleware. form-authentication will redirect a user to a login form when that user is not authenticated.

To implement the login form, add form-authentication-routes to the list of routes.


(form-authentication-routes (fn [_ c] (layout c))
                            (form-authentication-adapter))

The parameters to form-authentication-routes are: a layout function and something that satisfies the protocol FormAuthAdapter. The layout function must take two parameters: the request and the content to layout. The FormAuthAdapter protocol specifies two functions which allow us to adapt this component to our system. The functions are load-user and validate-password. load-user takes a username and password and returns a user map. A user map must at least have the keys :username and :roles. validate-password returns a function that can validate the user map created by load-user.


(defrecord DemoAdapter []
  FormAuthAdapter
  (load-user
   [this username password]
   (let [login {:username username :password password}]
     (cond (= username "member")
           (merge login {:roles #{:member}})
           (= username "admin")
           (merge login {:roles #{:admin}})
           :else login)))
  (validate-password
   [this]
   (fn [m]
     (if (= (:password m) (:username m))
       m
       (add-validation-error m "Username and password do not match!")))))

(defn form-authentication-adapter []
  (DemoAdapter.))

This implementation of load-user will simply look at the username to determine if the user is a member or admin. The validate-password implementation will ensure that the username and password are the same. add-validation-error is a function from sandbar.validation which is being used here to display an error message.

(For more information about validators, see the post Clojure Macros Make Me Happy. I will not go into any more detail here.)

After making these changes, saving them and reloading the namespace,


user=> (use :reload-all 'sandbar.examples.part-two.start)

we can return to our application where we should now have an operational login form. Try submitting the form while it is empty. Try entering a username and password that do not match. The form does the right thing in each situation.

Form Customization


At this point you may want to customize the field names on the form as well as the error messages that are displayed. While we are at it, let's make a custom logout landing page. Create a map with keys that correspond to the fields and errors that we would like to update as well as the key :logout-page that indicates where to go after we logout.


(def properties
  {:username "Username"
   :password "Password"
   :username-validation-error "Enter either admin or member"
   :password-validation-error "Enter a password!"
   :logout-page "/after-logout"})

Update the form-authentication-adapter constructor to merge these properties into our DemoAdapter. Because Clojure's records implement the persistent map interface, the form-authentication module uses the FormAuthAdapter as a map to look up field names and error messages for the login form.


(defn form-authentication-adapter []
  (merge (DemoAdapter.) properties))

After making this change, we will see our own field names and error messages displayed on the login form.

Next, create the logout landing page by replacing the empty map that was passed to logout! with properties and then creating a route


(GET "/after-logout" [] (layout (after-logout-view)))

and view.


(defn after-logout-view []
  [:div
   [:h3 "Logout"]
   [:p "You are no longer logged in!"]
   [:div (link-to "home" "Home")]])

In all the changes that have been made so far, a pattern is emerging: add components in the form of middleware or parametrized routes, then adapt them to our project. This same pattern will be used in the next section to add channel security.

Channel security


One thing that you may have noticed is that even though SSL is enabled, there is no way to control when it is used and when not. The goal in this application is to secure passwords sent from the client to the server and, because of the additional delay, to use SSL on as few pages as possible. This may be done by adding the with-secure-channel middleware which takes four parameters: the routes to be wrapped, a security configuration, the port and the SSL port. After adding this middleware, the app var definition now looks like this:


(def app (-> my-routes
             (with-security form-authentication)
             wrap-stateful-session
             (wrap-file "public")
             (with-secure-channel security-config 8080 8443)))

The security configuration security-config is a vector of pairs. Each pair is a regular expression literal followed by a configuration. We use a vector here instead of a map because each entry is checked against the current URI in order with the first match being selected.


(def security-config
  [#"/login.*" :ssl
   #".*\.css|.*\.png" :any-channel
   #".*" :nossl])

Three keywords are used to represent the three kinds of channel security: :ssl, :nossl and :any-channel. The above configuration causes the login screen to always be accessed through SSL, images and stylesheets to go over any channel and everything else to go through standard http.
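
The first-match rule can be sketched in a few lines. This is not Sandbar's implementation, only an illustration of why the order of the pairs matters; channel-for is a hypothetical helper, and the patterns here escape the dots so that they match only a literal ".css" or ".png".

```clojure
(def security-config
  [#"/login.*"        :ssl
   #".*\.css|.*\.png" :any-channel
   #".*"              :nossl])

;; Hypothetical helper: walk the pairs in order and return the setting
;; attached to the first pattern that matches the whole URI.
(defn channel-for [config uri]
  (some (fn [[pattern setting]]
          (when (re-matches pattern uri) setting))
        (partition 2 config)))

(channel-for security-config "/login")       ; => :ssl
(channel-for security-config "/sandbar.css") ; => :any-channel
(channel-for security-config "/home")        ; => :nossl
```

Because "/login" also matches the catch-all #".*", moving that pair to the front of the vector would send every page, including the login form, over standard http.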

A future post will show how this same vector may be used to authorize access based on URI patterns.

If you did not follow along, you may want to take a look at the final version of the source for this example.

Conclusion


Progress has been made, but this app is still not secure. Passwords should not be hard coded into an application like this. The application may be improved by having a way to easily assign different passwords to each user and store them securely. There will be at least two more posts in this series; one of them will cover some of the new features of Sandbar which can help make this easy for the most common cases. The other will demonstrate how to authorize access to resources based on URI patterns.

Permalink

This weekend in the intertweets (Aug 22nd Ed)

  • 32 days after Leiningen 1.2.0: Leiningen 1.3.0 is released (here, via @technomancy) — Multiple connections to the same REPL, task chaining, user-level plugins, and shell script launchers for your jar files.
  • Clojure, concurrency and silver bullets (here, via @cbeust) — Cedric Beust, the author of TestNG among other things, comments on Bob Martin’s article “Why Clojure?” and argues that Clojure is no silver bullet when it comes to concurrency and that straight Java with the Concurrent library is better.
  • Modularization of #clojure contrib is complete; lib authors please read (here, via @stuartsierra) — This will allow authors to provide updates to each library in clojure.contrib much faster than before.
  • There are two kind of databases, those that can do map-reduce queries in Clojure on those that don’t (here, via @old_sound) — The author attempts to use Clojure to perform the map-reduce computations in Riak. Note that you can already do this in CouchDB via clutch.
  • rolled the SIGHUP config reloading business for #clojure into a library (here, via @alandipert) — “Reload configuration files in Clojure daemons when the JVM receives a SIGHUP. Configuration files are Clojure code, and can contain any Clojure data structures”. Un*x only.
  • any sufficiently advanced magic is indistinguishable from #clojure (via @alandipert) — And with this, somebody on the Internet will call you a ‘fanboy’.
  • A Clojurist’s Guide to Java (here, via @ihodes) — From the article: “it should serve as a “Getting Started with Java from Clojure” guide that will hopefully enable you to more easily navigate the Java documentation and use Java in your Clojure projects when the need arises”
  • Clojure in Python (here, via @HNTweets) — What do you do if Clojure takes too long to start (or re-start) when hosted in Google App Engine? Rewrite Clojure in Python, of course, since Python starts way faster!
  • Clojure programmers don’t write their apps in Clojure. They write the language that they use to write their apps in Clojure. (via @fogus) — I think it should read “Good Clojure programmers…”, my code doesn’t quite look like that (yet)
  • equiv branch has been merged to #clojure master branch. primitive args have arrived.(here, via @wmacgyver) — Get ready to see your Clojure code fly even faster!

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishamapayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.