Testing Clojure Web Applications with Selenium

This article is brought with ❤ to you by Semaphore.

Selenium is a commonly used set of tools for automating browsers. It allows you to drive a browser's interaction with a web page by writing code. It's most often used to write browser-based tests for web applications. Tests can be executed in a development environment or even on a real application as part of a smoke test suite.

Selenium provides support for most popular languages and browsers. This tutorial explains how to use Selenium for testing Clojure web applications. Our setup will be based on clj-webdriver and Firefox, and we will use Compojure to write a very simple web application.

Setting Up the Project

Prerequisites

For developing a Clojure application using this tutorial you will need:

  • Java JDK version 6 or later.
  • Leiningen 2.
  • Firefox 39 or lower.

At the time of this writing, if you have a newer version of Firefox, the example might not work for you. If it doesn't, you will need to downgrade Firefox. The Selenium changelog page lists the latest version of Firefox that Selenium officially supports. If you plan to use Selenium-based tests regularly, you might want to hold Firefox updates until Selenium starts supporting them.

Create a Hello World Compojure Application

Compojure is a routing library for Ring, and a popular choice for writing web applications in Clojure. Leiningen provides a Compojure template that allows us to get started with Compojure quickly.

Create a Compojure-based Clojure project:

lein new compojure clj-webdriver-tutorial

The second parameter compojure is the name of the template that's going to be used for creating the application. The last parameter, clj-webdriver-tutorial, is the name of your project.

Navigate to the project directory:

cd clj-webdriver-tutorial

Start the server:

lein ring server-headless

After the server starts, visit http://localhost:3000 in a browser and you should see the Hello World greeting from the application:

Hello World

Compojure Application Structure

The structure of your application should look like this:

├── project.clj
├── README.md
├── resources
│   └── public
├── src
│   └── clj_webdriver_tutorial
│       └── handler.clj
├── target
│   ├── ...
└── test
    └── clj_webdriver_tutorial
        └── handler_test.clj

The file that we're interested in is src/clj_webdriver_tutorial/handler.clj. If you open it, it should contain the following code:

(ns clj-webdriver-tutorial.handler
  (:require [compojure.core :refer :all]
            [compojure.route :as route]
            [ring.middleware.defaults :refer [wrap-defaults site-defaults]]))

(defroutes app-routes
  (GET "/" [] "Hello World")
  (route/not-found "Not Found"))

(def app
  (wrap-defaults app-routes site-defaults))

It defines the access point to the application (/ - the root path), and we can see that this is where that "Hello World" is coming from.

We can also notice that Leiningen created the handler_test.clj file that's using clojure.test to test the handler. Since we're concentrating on clj-webdriver instead, let's remove the test:

rm test/clj_webdriver_tutorial/handler_test.clj

Install clj-webdriver

Install clj-webdriver by adding the project development dependencies to project.clj:

(defproject clj-webdriver-tutorial "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :min-lein-version "2.0.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [compojure "1.3.1"]
                 [ring/ring-defaults "0.1.2"]]
  :plugins [[lein-ring "0.8.13"]]
  :ring {:handler clj-webdriver-tutorial.handler/app}
  :profiles
  {:dev {:dependencies [[clj-webdriver "0.7.1"]
                        [org.seleniumhq.selenium/selenium-server "2.47.0"]
                        [javax.servlet/servlet-api "2.5"]
                        [ring-mock "0.1.5"]
                        [ring/ring-jetty-adapter "1.4.0"]]}})

There are several new things in project.clj:

  • We added clj-webdriver "0.7.1".
  • Next, we explicitly added the selenium-server that supports at least Firefox 39. If you have a newer version of Firefox, you can try upgrading selenium-server to the latest available Selenium version.
  • We also added ring-jetty-adapter to run the application before executing tests.

First clj-webdriver Test

Create the features directory where you will put clj-webdriver tests:

mkdir test/clj_webdriver_tutorial/features

Open test/clj_webdriver_tutorial/features/config.clj and add some common configurations that we will use in tests:

(ns clj-webdriver-tutorial.features.config)

(def test-port 5744)
(def test-host "localhost")
(def test-base-url (str "http://" test-host ":" test-port "/"))

This configuration states that tests will be executed against the application running on http://localhost:5744. The port 5744 is an arbitrary choice for the test server, so any free port will do.

Our first test will check if the home page really displays the "Hello World" message. Since we're testing by opening a real browser, the test needs some setup and teardown. Here are the steps that need to be executed:

  1. Start the server for the application.
  2. Open the root path in the browser.
  3. Check if the "Hello World" message is present on the page.
  4. Close the browser.
  5. Shut down the server.

Let's write a skeleton of that code in test/clj_webdriver_tutorial/features/homepage.clj:

(ns clj-webdriver-tutorial.features.homepage
  (:require [clojure.test :refer :all]
            [ring.adapter.jetty :refer [run-jetty]]
            [clj-webdriver.taxi :refer :all]
            [clj-webdriver-tutorial.features.config :refer :all]
            [clj-webdriver-tutorial.handler :refer [app-routes]]))

(deftest homepage-greeting
  (let [server (start-server)]
    (start-browser)
    (to test-base-url)
    (is (= (text "body") "Hello World"))
    (stop-browser)
    (stop-server server)))

The most important parts are the to and text functions that are used for navigating to a page and extracting text from a node, respectively. They are part of the clj-webdriver Taxi API.

Before running the test, we need to implement the start-server, start-browser, stop-browser and stop-server functions.

The start-server function is the most complex one, as it starts the jetty server on the test port and waits for the server to be started:

(defn start-server []
  (loop [server (run-jetty app-routes {:port test-port, :join? false})]
    (if (.isStarted server)
      server
      (recur server))))

The other functions are much simpler:

(defn stop-server [server]
  (.stop server))

(defn start-browser []
  (set-driver! {:browser :firefox}))

(defn stop-browser []
  (quit))

As they are just thin wrappers around the respective functions in clj-webdriver, they can be used directly in a real application test.

Putting it all together, our first test in test/clj_webdriver_tutorial/features/homepage.clj looks like this:

(ns clj-webdriver-tutorial.features.homepage
  (:require [clojure.test :refer :all]
            [ring.adapter.jetty :refer [run-jetty]]
            [clj-webdriver.taxi :refer :all]
            [clj-webdriver-tutorial.features.config :refer :all]
            [clj-webdriver-tutorial.handler :refer [app-routes]]))

(defn start-server []
  (loop [server (run-jetty app-routes {:port test-port, :join? false})]
    (if (.isStarted server)
      server
      (recur server))))

(defn stop-server [server]
  (.stop server))

(defn start-browser []
  (set-driver! {:browser :firefox}))

(defn stop-browser []
  (quit))

(deftest homepage-greeting
  (let [server (start-server)]
    (start-browser)
    (to test-base-url)
    (is (= (text "body") "Hello World"))
    (stop-browser)
    (stop-server server)))

Run the test suite:

lein test

And you will see that the test passed:

Ran 1 tests containing 1 assertions.
0 failures, 0 errors.

You now have a basic setup for testing Clojure web applications with Selenium.

Cleaning Up

This setup works well, but we have to remember to start the server and the browser before each test, and to shut them down after the tests are done. To make things easier, we can implement test fixtures that will do this automatically before and after every test.

The fixture for handling the server can be implemented as follows:

(defn with-server [t]
  (let [server (start-server)]
    (t)
    (stop-server server)))

The t parameter stands for the test case function. The fixture starts the server before the test case, executes the test function, and then stops the server.

The fixture for handling the browser is similar:

(defn with-browser [t]
  (start-browser)
  (t)
  (stop-browser))

Using fixtures, we can write a much cleaner test:

(ns clj-webdriver-tutorial.features.homepage
  (:require [clojure.test :refer :all]
            [ring.adapter.jetty :refer [run-jetty]]
            [clj-webdriver.taxi :refer :all]
            [clj-webdriver-tutorial.features.config :refer :all]
            [clj-webdriver-tutorial.handler :refer [app-routes]]))

;; Fixtures

(defn start-server []
  (loop [server (run-jetty app-routes {:port test-port, :join? false})]
    (if (.isStarted server)
      server
      (recur server))))

(defn stop-server [server]
  (.stop server))

(defn with-server [t]
  (let [server (start-server)]
    (t)
    (stop-server server)))

(defn start-browser []
  (set-driver! {:browser :firefox}))

(defn stop-browser []
  (quit))

(defn with-browser [t]
  (start-browser)
  (t)
  (stop-browser))

(use-fixtures :once with-server with-browser)

;; Tests

(deftest homepage-greeting
  (to test-base-url)
  (is (= (text "body") "Hello World")))

Note that we passed the :once parameter to the use-fixtures function. This means that the browser will be started once before all tests, and stopped after all tests are finished. The same goes for the server. This should significantly speed up the tests in the file.

In a real application, you can move fixture functions to a separate namespace that is shared by all tests.

Conclusion

Selenium is a valuable tool for testing web applications, and it is indispensable if an application relies heavily on JavaScript. Setting up Selenium with Clojure requires the several steps covered in this tutorial. Using fixtures, a test setup can be reused, which results in cleaner and faster tests.

You can find more information about common functions for interacting with web pages and inspecting page content in the clj-webdriver Taxi API documentation.


Defn Podcast Episode 30 – Bruce Hauman

This is a summary of the defn podcast interview with Bruce Hauman.

Once again, it’s been a great and long episode where Ray and Vijay talked to Bruce Hauman. You’ll find my notes below.

These are my notes to help me quickly review the key information. I might have misinterpreted something. I encourage you to listen to the full episode.

 

Summary

  • The name Hauman is of German origin.
  • How he got to Clojure and Figwheel
    • He likes languages, parsing et al.
    • He has always had a passion for LISP – he has been a LISPer for a long time (since college).
    • In the real world, you work with Ruby, PHP, Java, etc.
      • As a result, he got sick of them and started to play with ClojureScript – there were quite a few rough edges and thus he created Figwheel.
    • Now Bruce lives in a typical apartment in Montreal.
    • ClojureScript doesn’t “encapsulate” functions and modules.
      • You can just load the file, and all your definitions are reloaded => very easy.
      • If you want to do stateful stuff like https://threejs.org/ it becomes problematic => you can turn auto-loading on and off with Figwheel.
    • Using Figwheel with Node.js is different.
      • You have no display and feedback is less visible.
  • Joke: he’s not a real programmer, he just builds cool demos that make people think.
    • Figwheel is quite a complicated real application written in Clojure.
  • He recently built some application with Clojure and Ethereum – that was a really cool experience.
  • Clojure as a strange maximum
    • He doesn’t see himself gravitating to other languages (although he loves languages a lot).
    • In terms of getting things done, he sees little benefit in using other languages.
  • Static types
    • The static typing folks overstate the guarantees they get from types – in real-world complex systems with lots of state (think “Microservices”) the benefits are diminished.
    • In Clojure we are very productive – look at Advent of Code and compare Clojure solutions to other languages.
    • Ray: certain types of functions would benefit from types but exploring external resources and APIs/data is much easier without types ceremony.
    • Racket, gradual typing, etc. (Vijay asked Bruce what he thinks about that and what his experience is)
      • Paying upfront cost with types doesn’t make much sense to him because so much programming is about exploration – Bruce prefers to have the flexibility to explore.
  • Building new Clojure/ClojureScript app – what are his libraries/tools of choice?
    • Pick as few libraries as possible.
    • He (always) uses React for frontend applications.
    • He prefers Om style – passing state explicitly to downstream components instead of referencing global state.
      • But he didn’t do much ClojureScript development in the last year or so (e.g. re-frame got a lot better)
    • In lots of applications, you don’t need to make decisions about REST and GraphQL until you reach a certain size.
      • just pushing JSON data through APIs
      • If you know exactly what you’re building and how big it’s gonna be then it might be useful to pay that cost upfront.
  • Strictly-specking library
    • It was written mostly for the purpose of checking Figwheel/ClojureScript configuration options of which there are many and they’re easy to get wrong.
    • He started to write it in core.logic, then spec came out and he re-wrote it using spec.
      • Back then, spec was missing some features (it wasn’t so easy to get an exact pointer to the problem inside the input data structure).
      • After that, there were some improvements to spec which made his job easier.
  • Spec error messages
    • Concerning Clojure.core we’re talking about macros and special forms.
      • You get very detailed error messages – however, programming is a very incremental activity, and you just need a very brief and clear error message.
      • BUT, you can write a library that will match certain spec errors for core macros and output precise and clear user-facing error messages.
    • When saying “better error messages”, it helps to be a lot more precise about what that means – e.g. “We need a concise error message along with a pointer to the start and the end of the relevant code.”
      • Unfortunately, that pointer is contextual – it depends on the file in which you are, etc.
    • Bruce really wants better error messages to bring more people to Clojure – that’s the reason why he is writing rebel-readline.
  • Devcards
    • All Bruce’s projects are focused on bringing Clojure interactivity to people.
    • When writing browser applications you always run in a broader context of the browser.
    • Devcards’ idea is that it should be easy to create independent pieces of an application and have them together.
  • Future (ideas about Figwheel and more)
    • Spec error descriptions
    • Keeping statistics and visual history of errors and messages.
  • rebel-readline
    • Story
      • He does Advent of Code every year.
      • Working on programming projects lets you reflect on your programming experience.
      • TIS-100 – assembly language programming game – the easiest way to learn assembly programming again.
      • REPL is kind of a game – idea of building challenges in REPL; he never got too far with this.
    • Experienced Clojure programmers don’t need a great REPL experience, but beginners need it.
      • When you’re new to Clojure it’s impossible to choose an editor – everybody says: “Cursive, Emacs, Atom, …”
      • Ray: he struggled for one year or so to grasp the REPL – it’s really a superpower of Clojure and now he uses it all the time.
    • JLine provides a lot of functionality and makes things a lot easier.
    • rebel-readline is practically an editor, and you can put many features there, but it’s already great.
    • IPython-like notebooks
      • Not being in a file feels very constraining.
      • You can use Devcards as a graphical REPL in a browser, and you have your code in a file!
    • Reddit discussion: Pre-release of rebel-readline by Bruce Hauman! 😀
      • But the greatest benefit for me is that we can finally show newcomers an almost proper Clojure workflow without sending them to setup Emacs/vim/Cursive first.
    • crepl project idea
      • collaborative REPL
      • Nice idea but they eventually ran out of money.
      • Tmux can be used for shared typing into REPL but you can’t see who’s typing what and when he typed.
      • Joke: with Clojure we don’t need multiple people; we’re so productive that just one man is enough.
    • Bruce would like to have rebel-readline ready for getting people’s feedback in a couple of weeks.
  • JavaScript experience
    • Bruce enjoyed JavaScript back in the day because of its dynamic nature.
    • If he had a choice, he’d choose CoffeeScript.
  • Clojurists Together initiative
    • Figwheel is one of the sponsored projects.
    • It’s great; please join in and support Clojure open source projects!

 


Machine Learning for the Lazy Beginner


This article was prompted by a tweet I saw which asked for a walkthrough on training a machine learning service to recognise new members of 3 different data sets.

@rem: Being lazy here: I'm after a (machine learning) service that I can feed three separate datasets (to train with), and then I want to ask: "which dataset is this new bit of content most like".

Is there a walkthrough/cheatsheet/service for this?

My first thought was that this sounds like a classification task, and the idea that there are 3 sets of data should be the other way round: there is one set of data and each item in the set has one of 3 labels.

I didn't have a walkthrough in mind but I do know how to train a classifier to perform this exact task, so here is my walkthrough of classifying text documents using JavaScript.

Do You Have Adequate Supervision?

Machine learning can be classified (no pun intended) as either supervised or unsupervised. The latter refers to problems where the data you feed to the algorithm has no predetermined label. You might have a bunch of text documents and you want to find out if they can be grouped together into similar categories - that would be an example of clustering.

Supervised learning is where you know the outcome already. You have a set of data in which each member fits into one of n categories, for example a set of data on customers of your e-commerce platform, labelled according to what category of product they're likely to be interested in. You train your model against that data and use it to predict what new customers might be interested in buying - this is an example of classification.

Get in Training

For the classification task we've said that we "train" a model against the data we know the labels for. What that means is that we feed each instance in a dataset into the classifier, saying which label it should have. We can then pass the classifier a new instance, for which we don't know the label, and it will predict which class that instance fits into, based on what it's seen before.

There's a JavaScript package called natural which has several different classifiers for working with text documents (natural language). Using one looks like this:

const { BayesClassifier } = require('natural');
const classifier = new BayesClassifier();

// Feed documents in, labelled either 'nice' or 'nasty'
classifier.addDocument('You are lovely', 'nice');
classifier.addDocument('I really like you', 'nice');
classifier.addDocument('You are horrible', 'nasty');
classifier.addDocument('I do not like you', 'nasty');

// Train the model
classifier.train();

// Predict which label these documents should have
classifier.classify('You smell horrible');
// 'nasty'
classifier.classify('I like your face');
// 'nice'
classifier.classify('You are nice');
// 'nice'

We add labelled data, train the model and then we can use it to predict the class of text we haven't seen before. Hooray!

Performance Analysis

Training a machine learning model with a dataset of 4 instances clearly isn't something that's going to be very useful - its experience of the problem domain is very limited. Machine learning and big data are closely linked because the more data you have the better you can train your model, in the same way that the more experience someone has of a topic the more they're likely to know about it. So how do we know how clever our model is?

The way we evaluate supervised learning models is to split our data into a training set and a testing set, train it using one and test it using the other (I'll leave you to guess which way round). The more data in the training set the better.
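As a rough illustration of that split, here is a minimal sketch in plain JavaScript. The names shuffle and trainTestSplit, and the 20% test share, are assumptions made for this example and do not come from the natural package.

```javascript
// Fisher-Yates shuffle on a copy, so the caller's array is left untouched.
function shuffle(items, random = Math.random) {
  const copy = items.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Split labelled instances into a training set and a testing set.
// testFraction is the share of instances held back for testing (assumed 20%).
function trainTestSplit(instances, testFraction = 0.2, random = Math.random) {
  const shuffled = shuffle(instances, random);
  const testSize = Math.round(shuffled.length * testFraction);
  return {
    test: shuffled.slice(0, testSize),
    train: shuffled.slice(testSize),
  };
}

const data = [
  { text: 'You are lovely', label: 'nice' },
  { text: 'I really like you', label: 'nice' },
  { text: 'You are horrible', label: 'nasty' },
  { text: 'I do not like you', label: 'nasty' },
  { text: 'You are kind', label: 'nice' },
];

const { train: trainSet, test: testSet } = trainTestSplit(data, 0.2);
console.log(trainSet.length, testSet.length); // 4 1
```

Shuffling first matters when the gathered data is grouped by label, otherwise the testing set could end up containing a single class.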

When we get the predictions for our test data we can determine if the model accurately predicted the class each item is labelled with. Adding up the successes and errors will give us numbers indicating how good the classifier is. For example, successes over total instances processed is our accuracy; errors divided by the total is the error rate. We can get more in-depth analysis by plotting a confusion matrix showing actual classes against predictions:

                  Actual
                  nice    nasty
Predicted  nice   21      2
           nasty  1       10

This is really valuable for assessing performance when it's OK to incorrectly predict one class but not another. For example, when screening for terminal diseases it would be much better to bias for false positives and have a doctor check images manually rather than incorrectly give some patients the all clear.
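To make the arithmetic concrete, the sketch below rebuilds the confusion matrix above from pairs of actual and predicted labels and derives accuracy and error rate from it. The helper names (repeatPair, confusionMatrix, accuracy) are made up for this illustration.

```javascript
const labels = ['nice', 'nasty'];

// Build [actual, predicted] pairs; the counts are taken from the matrix above.
function repeatPair(actualLabel, predictedLabel, count) {
  return Array.from({ length: count }, () => [actualLabel, predictedLabel]);
}

const pairs = [
  ...repeatPair('nice', 'nice', 21),
  ...repeatPair('nasty', 'nice', 2),
  ...repeatPair('nice', 'nasty', 1),
  ...repeatPair('nasty', 'nasty', 10),
];

// matrix[predicted][actual] = number of test instances with that outcome.
function confusionMatrix(labelPairs, classLabels) {
  const matrix = {};
  for (const predicted of classLabels) {
    matrix[predicted] = {};
    for (const actual of classLabels) matrix[predicted][actual] = 0;
  }
  for (const [actual, predicted] of labelPairs) matrix[predicted][actual] += 1;
  return matrix;
}

// Accuracy = correct predictions (the diagonal) over all predictions.
function accuracy(matrix, classLabels) {
  let correct = 0;
  let total = 0;
  for (const predicted of classLabels) {
    for (const actual of classLabels) {
      total += matrix[predicted][actual];
      if (predicted === actual) correct += matrix[predicted][actual];
    }
  }
  return correct / total;
}

const matrix = confusionMatrix(pairs, labels);
const acc = accuracy(matrix, labels);
console.log(`Accuracy: ${(acc * 100).toFixed(2)}%`); // Accuracy: 91.18%
console.log(`Error: ${((1 - acc) * 100).toFixed(2)}%`); // Error: 8.82%
```

The off-diagonal cells are the interesting ones: here 2 nasty documents were predicted as nice, and 1 nice document was predicted as nasty.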

Train On All the Data

One way to train with as much data as possible is to use cross validation, where we take a small subset of our data to test on and use the rest for training. A commonly used technique is k-fold cross validation, where the dataset is divided into k different subsets (k can be any number, even the number of instances in the dataset), each of which is used as a testing set while the rest is used for training - the process is repeated until each subset has been used for testing i.e. k times.

k-fold cross validation
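The procedure can be sketched in a few lines of JavaScript. This is only an outline under some assumptions: kFolds, crossValidate and majorityBaseline are names invented here, and the scoring function stands in for whatever classifier you actually train.

```javascript
// Divide instances into k folds of near-equal size (round-robin assignment).
function kFolds(instances, k) {
  const folds = Array.from({ length: k }, () => []);
  instances.forEach((item, i) => folds[i % k].push(item));
  return folds;
}

// Run k rounds: each fold is the testing set once, the rest is for training.
// trainAndScore(trainSet, testSet) is supplied by the caller and returns a score.
function crossValidate(instances, k, trainAndScore) {
  const folds = kFolds(instances, k);
  const scores = folds.map((testFold, i) => {
    const trainSet = folds.filter((_, j) => j !== i).flat();
    return trainAndScore(trainSet, testFold);
  });
  // Report the mean score over the k rounds.
  return scores.reduce((sum, score) => sum + score, 0) / k;
}

// Hypothetical stand-in for a real classifier:
// always predict the most common label seen in training.
function majorityBaseline(trainSet, testSet) {
  const counts = {};
  for (const { label } of trainSet) counts[label] = (counts[label] || 0) + 1;
  const guess = Object.keys(counts).sort((a, b) => counts[b] - counts[a])[0];
  return testSet.filter((item) => item.label === guess).length / testSet.length;
}

const instances = Array.from({ length: 10 }, (_, i) => ({
  text: `document ${i}`,
  label: i < 7 ? 'nice' : 'nasty',
}));

console.log(kFolds(instances, 3).map((fold) => fold.length)); // [ 4, 3, 3 ]
console.log(crossValidate(instances, 5, majorityBaseline));
```

Setting k to the number of instances gives the "leave one out" variant mentioned above, where every round tests on a single instance.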

Tweet Data Example

I've put together an example using the natural JavaScript package. It gets data from Twitter, searching for 3 different hashtags, then trains a model using those 3 hashtags as classes and evaluates the performance of the trained model. The output looks like this:

$ node gather.js
Found 93 for #javascript
Found 100 for #clojure
Found 68 for #python

$ node train.js
{ positives: 251, negatives: 10 }
Accuracy: 96.17%
Error: 3.83%

The code is on GitHub: classification-js

Machine Learning is That Easy?!

Well, no. The example is really trivial and doesn't do any pre-processing on the gathered data: it doesn't strip out the hashtag that it searched for from the text (meaning that it would probably struggle to predict a tweet about Python that didn't include "#python"); it doesn't remove any stop words (words that don't really add any value, such as a or the. In fact, natural does this for us when we feed documents in, but we didn't know that...); it doesn't expand any of the shortened URLs in the text (learnjavascript.com surely means more than t.co). We don't even look at the gathered data before using it, for example graphing word-frequencies to get an idea of what we've got: are some of the "#python" tweets from snake enthusiasts talking about their terrariums?
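For a flavour of what such pre-processing could look like, here is a small sketch. It is not part of the example project: the preprocess function and the tiny stop-word list are assumptions for illustration, and it simply drops URLs rather than expanding them.

```javascript
// Small illustrative stop-word list; real lists are much longer.
const STOP_WORDS = new Set(['a', 'an', 'the', 'is', 'to', 'and', 'of', 'in']);

// Clean one tweet before training:
//  - drop URLs (a simplification; expanding them would keep more meaning),
//  - strip punctuation but keep '#' so hashtags stay recognisable,
//  - drop the hashtag used as the search term, since it gives the label away,
//  - drop stop words.
function preprocess(text, searchedHashtag) {
  const hashtag = searchedHashtag.toLowerCase();
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((token) => !/^https?:\/\//.test(token))
    .map((token) => token.replace(/[^a-z0-9#]/g, ''))
    .filter((token) => token.length > 0 && token !== hashtag && !STOP_WORDS.has(token));
}

console.log(preprocess('Learning the #python basics is fun https://t.co/abc', '#python'));
// [ 'learning', 'basics', 'fun' ]
```

The cleaned tokens can then be joined back into a string, or passed on as they are if the classifier accepts token arrays.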

To misquote Tom Lehrer, machine learning is like a sewer: what you get out depends on what you put in.

Wrapping Up

The aim of this article was to give an overview of how a machine learning model is trained to perform a classification task. Hopefully, for the beginner, this goes some way to lifting the lid on some of that mystery.

Cover image by: https://www.flickr.com/photos/mattbuck007/


PurelyFunctional.tv Newsletter 264: Happy Lundi Gras

Issue 264 – February 12, 2018

Hi Clojurers,

Well, this is going to be a shortish issue. It’s carnival here. Things slow down. School is closed. Streets are blocked. Fun is had. And Clojure SYNC is nigh, so I’m busy with that.

Happy Mardi Gras!

Rock on!
Eric Normand <eric@purelyfunctional.tv>

PS Want to get this in your email? Subscribe!


Clojure SYNC is this week!

Clojure SYNC is coming quickly! If you’re not attending, videos will be available online eventually. We won’t have the super-quick turnaround of Clojure/conj.

Lunch and Evening event organization is coming along a little more slowly than I would have liked. But we’ve got great volunteers and I’m sure they’ll come through. The events we do have are awesome.


Robotics in Our Everyday Lives YouTube

Carla Diana talks about her work designing robotic products. Very cool, deep thinking.


rebel-readline GitHub

Bruce Hauman, the creator of Figwheel, has started a new project to add interactivity to the REPL. What if you had autocomplete, parentheses matching, and more? And contributions are welcome. Join the project to make the terminal REPL a better experience.


Bruce Hauman on the defn podcast Podcast

Bruce has a lot to say about the beginner experience. Listen to this.


PurelyFunctional.tv Learning Paths

I’ve got over 60 hours of video on PurelyFunctional.tv. I’m well-aware that it’s hard to figure out what’s there and what to watch next. Organizing the material for different people has been a challenge.

I’ve hit on something that’s working. I’m calling it Learning Paths. Each Learning Path exists for a specific purpose. It gives you a selection of the courses for that purpose. And a suggested order.

Using the lesson completion feature (the checkmark under the video), you can see what courses you’ve already watched.

Here are the Learning Paths I’ve built so far:



The REPL


prepl, Clojurists Together, and readline

Notes.

Sorry for the less-than-weekly cadence of The REPL lately. Things have been pretty busy at work and home, plus it's summer in NZ so I've been spending as much time at the beach as possible. Things are starting to settle down though, and The REPL should be returning to a more regular rhythm.

Thanks for reading!

-main

  • The thing I'm personally most excited about this week is the recent announcement from Clojurists Together that we have funded clj-http and Figwheel in our first funding round. We've been chipping away at this since January 2016, and it feels really good to finally start funding projects. If you've signed up as a member, I really truly appreciate it.
  • Bruce Hauman has a very cool demo and prerelease of a new readline library he is working on for Clojure/ClojureScript.
  • Nikita Prokopov on macros and API design, using the specific case of HTML generation. It added something to the way I think about API design when I read it.
  • Ramsey Nassar is definitely not working on a post-Clojure lisp.

Libraries & Books.

  • Java Concurrency in Practice is on sale in paperback and Kindle form

People are worried about Types.

Foundations.

Tools.

Recent Developments.

  • The most intriguing news this week was this commit from Rich Hickey introducing prepl. Alex Miller gives some more context on Reddit.

Learning.

Misc.

Bryan Cantrill - The Hurricane's Butterfly: Debugging Pathologically Performing Systems
Copyright © 2018 Daniel Compton, All rights reserved.




Defn Podcast Episode 30 – Zach Tellman

This is a summary of the defn podcast interview with Zach Tellman.

It was a wonderful 30th episode of the defn podcast where they interviewed Zach Tellman. As is so often the case with Zach’s talks, it’s a great discussion full of insights.

I usually take personal notes for many podcasts that I listen to (see Busy (Clojure) Developer Guide to Podcasts). The reasons are to help me remember more and sometimes to be able to do a quick review without needing to re-listen to the whole thing.

Most of the podcasts that I listen to have only very brief show notes. And it’s very true for defn podcast too. Therefore, I decided to put together a somewhat longer version. Beware that these are my own interpretations and because I wrote them down while walking and listening they may also be slightly incorrect. I encourage you to listen to the whole episode, take your own notes and draw your conclusions.

You can also expect more podcasts’ summaries on this blog in the future.

Show notes

  • Elements of Clojure
    • This is Zach’s new self-published book that has already been available for some time.
    • There’s also a discussion forum dedicated to the book.
    • He started the first serious work on this topic two years ago.
    • He didn’t anticipate how much work it would be and spent a lot of time reading many resources on the topic.
  • A lot of the code we (programmers) write is throw-away code
    • More frequent in the LISP community.
    • That’s often okay.
    • Writing reusable code is hard.
  • Correctness & proofs
    • 1972 paper: Proof of correctness of data representations
    • People should stop talking about software being correct and talk about software being self-consistent.
    • The hardest problem is to write software that’s useful when it’s outside of its usual environment.
  • Dynamic vs. static typing
    • Having some dynamism at the edges of your system is useful.
    • The real world doesn’t have schemas.
    • Clojure’s unique feature is that it brings immutability into the ecosystem of dynamic languages.
    • Clojure is a very good language for writing glue between more “rigid” components in your system. Those components might be written in other, more static languages (often in the form of Java library used directly from Clojure).
  • nil handling
    • One thing that Zach considers less than ideal in Clojure is that you can end up with nil propagating through half of the code only to eventually fail with NPE
    • The usual idiom (when fn-arg (do stuff)) is terrible
    • A lot of things in Clojure are designed to be concise. In some situations, it could be better to be explicit: e.g., requiring a default value for clojure.core/get
  • Clojure concurrency features
    • Why STM hasn’t been as successful as expected
      • STM makes things much more complicated (compared to a simple atom) while it gives us only small benefits (mostly under heavy contention).
      • We usually don’t write systems with that heavy concurrency – we instead scale them by distributing load over multiple machines.
    • core.async is mostly useful on the client side (ClojureScript)
      • People often use it on the server side even if there’s no compelling reason to do so.
  • Laziness
    • It can be tricky because there’s no clear distinction between data in the process and outside the process (file, network, etc.)
    • If Rich had known about transducers before the Clojure 1.0 release, he would have made laziness much less of a default in Clojure.
  • Light Table
    • Venture capitalists aim at billion dollar businesses which is problematic for developer tools like Light Table – it’s a pretty decent editor but it’s not clear how much money this area can bring to investors
    • Light Table is a nice demonstration of Bret Victor’s ideas
  • The great aspect of American culture is that it allows you to fail in your business => it encourages experimentation.
  • Zach’s history
    • He studied computer science at college.
    • He originally wanted to do games development, but later realized that it’s not for him.
    • He spent some time at a C# shop, working on low-level Windows graphics stuff.
    • He joined Google for a short period of time.
    • Eventually, he started working for a Clojure startup and has been with Clojure ever since.
  • Zach’s successful open-source projects: Aleph and Manifold
  • Spec
    • Spec is perhaps the first addition to Clojure in the last five years that can really change the way programmers think about the language.
    • Spec error reporting is rather weak, though.
      • Cognitect’s approach of leaving this as an exercise for the reader is less than ideal.
    • Spec is a clever technical solution to what is a fundamentally human problem.
      • To have really good human-readable error messages, you’d need to be able to specify an error message generator with every spec.
  • Mean-time-to-accomplish-something is an important criterion for new people starting with a programming language
    • For Zach, Clojure has been great in this respect because it allowed him to do rapid experiments with the OpenGL library.
  • Community has to step in


Both, actually.

Both, actually. CLJS wasn’t an option for company-internal reasons, but having spent 2 years with TS now and not having to worry about compatibility with non-ES6 setups, I think I prefer my new workflow to the one I had with CLJS. The structural typing works well for my use cases and has hugely simplified and accelerated my authoring & refactoring experiences. Occasionally, I do miss things like immutability by default (and some tooling, [e.g. figwheel]), but I’ve also realized they aren’t as important to me as I previously thought and there are some fairly easy workarounds. Generally, my setup just feels a bit more nimble now, but having said this I still do love Clojure/script and still think everyone should at least dabble in it for a while; it was one of the best & deep learning (no pun!) experiences for me…


Reloading Woes

Setting the Stage

When doing client work I put a lot of emphasis on tooling and workflow. By coaching people on their workflow, and by making sure the tooling is there to support it, a team can become many times more effective and productive.

An important part of that is having a good story for code reloading. Real world projects tend to have many dependencies and a large amount of code, making them slow to boot up, so we want to avoid having to restart the process.

Restarting the process is done when things get into a bad state. This can be application state (where did my database connection go?), or Clojure state (I’m redefining this middleware function, but the webserver isn’t picking it up.)

tools.namespace and component/integrant

We can avoid the cataclysm of a restart if we can find some other way to get to a clean slate. To refresh Clojure’s view of the world, there’s clojure.tools.namespace, which is able to unload and then reload namespaces.

To recreate the app state from scratch there’s the “reloaded workflow” as first described by Stuart Sierra, who created Component for this reason.

Combining these two allows you to tear down the app state, reload all your Clojure code, then recreate the app state, this time based on the freshly loaded code. Combining them like this is important because when you use tools.namespace to reload your code, it effectively creates completely new and separate versions of functions, records, multimethods, vars, even namespace objects, and any state that is left around from before the reload might still be referencing the old ones, making matters even worse than before.

So you need to combine the two, which (naively) would look something like this.

(ns user
  (:require [com.stuartsierra.component :as component]
            [clojure.tools.namespace.repl :as c.t.n.r]))

(def app-state nil)

(defn new-system []
  (component/system-map ,,,))

(defn go []
  (alter-var-root #'app-state (fn [_]
                                (-> (new-system)
                                    component/start))))

(defn reset
  "Stop the app, reload all changed namespaces, start the app again."
  []
  (alter-var-root #'app-state #(some-> % component/stop))

  ;; Clojure tools namespace takes a symbol, rather than a function, so that it
  ;; can resolve the function *after code reloading*. That way it's sure to get
  ;; the latest version.
  (c.t.n.r/refresh :after 'user/go))

This pattern is encoded in reloaded.repl; System also contains an implementation. I recommend using an existing implementation over rolling your own, as there are some details to get right. It’s also interesting to note that reloaded.repl uses suspendable, an extension to Component that adds a suspend/resume operation.

To dig deeper into this topic, check out the Lambda Island episodes on Component and System.

The author of reloaded.repl, James Reeves, eventually became dissatisfied with Component, and created Integrant as an alternative, with integrant-repl as the counterpart of reloaded.repl.

With reloaded.repl your own utility code now looks like this:

(ns user
  (:require reloaded.repl
            com.stuartsierra.component))

(defn new-system []
  (com.stuartsierra.component/system-map ,,,))

(reloaded.repl/set-init! new-system)

Now you start the system with (reloaded.repl/go). To stop the system, reload all changed namespaces, and start the system again, you do (reloaded.repl/reset).

With integrant-repl things look similar.

(ns user
  (:require integrant.repl
            clojure.edn))

(defn system-config []
  (-> "system_config.edn" slurp clojure.edn/read-string))

(integrant.repl/set-prep! system-config)

Now (integrant.repl/go) will bring your system up, and with (integrant.repl/reset) you can get back to a clean slate.

cider-refresh

CIDER (Emacs’s Clojure integration) supports tools.namespace through the cider-refresh command. On its own this only reloads code, but you can tell it to run a function before and after the refresh to achieve the same effect.

;; emacs config

;; reloaded.repl
(setq cider-refresh-before-fn "reloaded.repl/suspend")
(setq cider-refresh-after-fn "reloaded.repl/resume")

;; integrant-repl
(setq cider-refresh-before-fn "integrant.repl/suspend")
(setq cider-refresh-after-fn "integrant.repl/resume")

You can also configure this on a per-project basis, by creating a file called .dir-locals.el in the project root, which looks like this.

((nil . ((cider-refresh-before-fn . "integrant.repl/suspend")
         (cider-refresh-after-fn . "integrant.repl/resume"))))

Now calling cider-refresh (Spacemacs: , s x) will again stop the system, reload all namespaces, and restart.

Plot twist

So far so good. This was a lengthy introduction, but I wanted to make sure we’re all on the same page. Now let’s look at some of the things that might spoil our lovely reloading fun.

AOT

Clojure has a nifty feature called AOT or “ahead of time compilation”. This causes Clojure to pre-compile namespaces to Java classes and save those out to disk. This can be a useful feature as part of a deployment pipeline, because it can speed up booting your app. It has some serious drawbacks though, as it messes with Clojure’s dynamic nature.

What tends to happen is that at some point people find out about AOT, they think it’s amazing, and enable it everywhere. A bit later errors start popping up during development that just make no sense.

AOT should only be used for deployed applications. Don’t use it during development, and don’t use it for libraries you publish.

Even when following this advice you can get into trouble. Say you have aot enabled as part of the process of building an uberjar for deployment.

(defproject my-project "0.1.0"
  ,,,
  :profiles {:uberjar
             {:aot :all}})

You do a lein uberjar to test your build locally. This will create AOT compiled classes and put them on the classpath (under target/ to be precise). Next time you try to (reset) it will tell you it can’t find certain Clojure source files, even though they’re right there. I have lost hours of my life figuring this out, and have more than once found my own Stack Overflow question+answer when googling for this. Watch out for the ghost of AOT!

By the way, here’s a handy oneliner: git clean -xfd. It removes any files not tracked by git, including files in .gitignore. It’s the most thorough way of cleaning out a repository. Do watch out, this might delete files you still want to keep! With git clean -nxfd you can do a dry run to see what it plans to delete.

defonce won’t save you

Ideally all your state is inside your system, but maybe there’s something else that you want to carry over between reloads. No need to judge, life is compromise.

You might think, “I know, I’ll just defonce it!”, but defonce won’t save you. tools.namespace will completely remove namespaces with everything in them before loading them again, so that var created by defonce the first time is long gone by the time defonce gets called again, and so it happily defines it anew.

What you can do instead is add some namespace metadata telling tools.namespace not to unload the namespace.

(ns my-ns
  (:require [clojure.tools.namespace.repl :as c.t.n.r]))

(c.t.n.r/disable-unload!) ;; this adds :clojure.tools.namespace.repl/unload false
                          ;; to the namespace metadata

(defonce only-once (java.util.Date.))

This namespace will still get reloaded, so if you have functions in there then you’ll get the updated definitions, but it won’t be unloaded first, so defonce will work as expected. This does also mean that if you remove (or rename) a function, the old one is still around.

To completely opt out of reloading, use (c.t.n.r/disable-reload!). This implies disable-unload!.

cljc + ClojureScript source maps

When ClojureScript generates source maps, it needs a way to point the browser at the original source files, so it copies cljs and cljc files into your web server document root (your “public/” directory). Seems innocent enough, but it can confuse tools.namespace.

It is quite a common practice to search for files that are requested over HTTP in the classpath, so the “public/” directory tends to be on the classpath as well. tools.namespace scans the classpath for Clojure files (clj or cljc) and finds those cljc files that ClojureScript copied there, but their namespace name doesn’t correspond with their location, and things break. There is a JIRA ticket for this for tools.namespace: TNS-45, with several proposed patches, but no consensus yet about the right way forward.

The easiest way around this is to limit tools.namespace to only scan your own source directories.

(c.t.n.r/set-refresh-dirs "src/clj")

CIDER does its own thing

cider-refresh roughly does the same thing as clojure.tools.namespace.repl/refresh, and it uses tools.namespace under the hood, but it reimplements refresh using lower level tools.namespace primitives. This leads to subtle differences.

Currently cider-refresh does not honor set-refresh-dirs. It honors the unload/load metadata, but does not follow the convention that :clojure.tools.namespace.repl/load false implies :clojure.tools.namespace.repl/unload false.

I proposed a patch to address both of these issues. In general if cider-refresh seems to have issues, then try to use (reset) directly in the REPL.


The Power of Clojure: Debugging

A common question we hear is "How do I use Clojure for real?" Not the language basics, but the practicalities of building software – questions like how to structure the project file tree and namespace hierarchy, how to write tests, or how to use the REPL.

This post tackles a small subset of this: debugging, and more specifically, REPL-based debugging in Clojure.

Debugging is fundamentally difficult. Clojure is a simple language, which helps with the cognitive load of debugging, and the debugging tools available to Clojure programmers are simple and powerful too.

Debugging in Clojure with a REPL involves using a number of small, simple tools in a systematic approach to achieve a thorough understanding of potentially complex situations. In this post, we will cover both these tools and the approach for applying them to your debugging problems.

Why Is This Important?

A few years ago, a Cambridge University study found that "[...] on average, software developers spend 50% of their programming time finding and fixing bugs." Debugging is inevitable, necessary, important work.

For our purposes, we will take a look at two aspects of debugging: exploration of existing code and fixing bugs.

Exploration: When exploring new projects, debugging is often how you start learning about how the code works and how it's structured. When you need to alter code structure (e.g. breaking apart a large function before changing its behavior) you need to understand how it works. Learning how libraries work can be difficult. Often the documentation is insufficient and it’s necessary to dive into the code to answer questions.

Fixing Bugs: It's unavoidable: sooner or later you will have to solve issues that present themselves in production. People will always write bugs, therefore you will always have to debug code.

Both of these contexts are an intrinsic part of the work of a software developer, and if these constitute 50% of your work (at least in time), debugging skills are important to hone. But note: debugging skills are just that: skills. Learnable, systematized, something to study.

What We Won’t Cover:

Whilst the following topics are important and related, this post won’t cover them:

  • Setting up logging frameworks

  • Reading and understanding Clojure/Script exceptions

  • Profiling (understanding the performance characteristics of code)

  • Interactive debuggers such as CIDER’s debugger or Cursive’s integration with IntelliJ’s debugger.

Debugging Framework: The Scientific Method

 https://upload.wikimedia.org/wikipedia/commons/c/c7/The_Scientific_Method.jpg


Debugging with the scientific method is hardly a novel idea and plenty of great resources exist, but if you’re looking for some information specific to Clojure, check out Stu Halloway’s 2015 Conj talk (video, slides and links) as well as Kyle Kingsbury’s "Clojure from the Ground Up" chapter on debugging. Kyle’s work specifically covers debugging and many other valuable techniques.

In short:

  • Observe the situation (problem)
  • Form a hypothesis to explain the problem
  • Devise and execute a test to show the correctness of your hypothesis
    • Inconclusive? Try again.
    • Conclusive? Good! The problem space has been reduced, try again with a smaller subset.

Contextually, we use debugging techniques to help observe and understand a problem, and to quantitatively evaluate the outcome of a test.

Instrumenting: Inspecting, Tracing

https://pixabay.com/en/meters-armatures-pipes-factory-498677/


Adding instrumentation to code is like adding gauges and meters to a mechanical system.

Instrumentation can be added to code in two ways: by changing the code to explicitly instrument a particular value or control path, or by changing the runtime environment to provide insights into the code running within.

Changing the code is easy, precise, but often messy, overly redundant, and prone to leaving the instrumentation in by mistake.

Instrumenting the runtime is powerful and more easily able to paint a fuller picture, but it is generally more complex and harder to master, and can overwhelm you with data.

Example Code

The code samples used below are variations on a typical programming problem. The basic overview with lots of annotations and comments can be found here:

Clojure Debugging Playbook

The three following sections outline techniques for:

  • instrumenting the code,
  • instrumenting the runtime, and
  • using the REPL itself as a debugging tool.

In our experience, problems are usually solved by applying a combination of two or three of these techniques.

Instrumenting the Code

Clojure is a dynamic language. It’s very easy to make changes to a function, eval the code in the REPL, and immediately observe the effects. We can use this tight feedback loop to directly instrument the code.

println debugging

"The most effective debugging tool is still careful thought, coupled with judiciously placed print statements." – Brian Kernighan, Unix for Beginners (1979).

First in our list is probably every programmer’s first debugging technique: println. It’s sometimes seen as a naive approach, perhaps because in compiled languages, like C or Java, the edit/execute/inspect cycle time can be so long. However, in an immutable, functional, dynamic language such as Clojure it is highly effective.

The technique is simple: to see the value of an s-expression as the code is executed, println it. To trace the execution of the code, add println statements to act as a “breadcrumb trail” through the code.

For example, the example code has an infinite loop. Knowing that recursion can be tricky, we add a little println into the loop to see what’s happening:
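As a minimal sketch of what such a loop might look like, assume a hypothetical total-balance-buggy function and account maps with :id and :balance keys (these names are illustrative and not taken from the playbook code). A max-steps guard has been added so the sketch terminates instead of looping forever:

```clojure
;; Hypothetical example: sums account balances with loop/recur.
;; The bug: recur passes `accounts` unchanged, so the loop never advances.
;; `max-steps` is an added safety guard so this sketch terminates.
(defn total-balance-buggy
  [accounts max-steps]
  (loop [accounts accounts
         total    0
         step     0]
    (if (or (empty? accounts) (>= step max-steps))
      total
      (let [account (first accounts)]
        ;; The breadcrumb: which account are we on, and what is the total?
        (println "processing account" (:id account) "total so far" total)
        ;; Should have been (recur (rest accounts) ...)
        (recur accounts (+ total (:balance account)) (inc step))))))

(total-balance-buggy [{:id 1 :balance 10} {:id 2 :balance 20}] 3)
;; prints:
;; processing account 1 total so far 0
;; processing account 1 total so far 10
;; processing account 1 total so far 20
```

The repeated "processing account 1" line is the giveaway: the second account is never reached.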

We never finish processing the first account. Looking at the loop/recur, we never reduce the size of the accounts collection - we forgot a (rest accounts) in the recur call.

When you have more than a couple of print statements you will need to be careful to provide enough context in your println so that you know what each output value corresponds to. You might also have to change the structure of your code a little, e.g. introducing a let binding to capture a variable’s value. You can make this easier with a short helper function:
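One possible shape for such a helper is sketched below. The name dbg and its label argument are our own choices for illustration; the important property is that it returns its argument, so it can wrap any expression without changing the result:

```clojure
(defn dbg
  "Prints `label` and the value `x`, then returns `x` unchanged."
  [label x]
  (println (str label ":") (pr-str x))
  x)

;; Wrap any expression in place; the surrounding code is unaffected.
(let [subtotal (dbg "subtotal" (+ 40 2))
      total    (dbg "total" (* subtotal 2))]
  total)
;; prints:
;; subtotal: 42
;; total: 84
```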

When using printlns it’s necessary to delete them prior to committing code. It’s very easy to forget and leave them in. This problem can be mitigated by using a logging framework and adjusting environments to output the logs at appropriate levels. We cover more on logging below.

inspecting let bindings

We often want to see one or more of the values in a let binding. A specific instance of println value inspection is to use the _ (underscore) convention for an unused variable to act as the name to bind in order to add a println call directly into the let bindings.

Looking at the full stacktrace from the last error, we can see the error is thrown from inside the as-currency function. We can introduce a let and inspect all the intermediate values to see where the problem might be:
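A sketch of the idea, using a simplified, hypothetical as-currency (the real function in the playbook may differ; the intermediate names trimmed and digits are assumptions made for this example):

```clojure
(require '[clojure.string :as str])

;; Hypothetical as-currency: turns a string such as "$12.50" into a BigDecimal.
;; Every `_` binding exists only for its println side effect.
(defn as-currency
  [amount]
  (let [_       (println "amount:" (pr-str amount))
        trimmed (str/trim amount)
        _       (println "trimmed:" (pr-str trimmed))
        digits  (str/replace trimmed "$" "")
        _       (println "digits:" (pr-str digits))]
    (bigdec digits)))

(as-currency " $12.50 ")
;; prints:
;; amount: " $12.50 "
;; trimmed: "$12.50"
;; digits: "12.50"
```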

A common error is forgetting the _ binding before each println.

A more elegant way to do this is with a debugging let macro. This example shows how it works, but you would likely put the dlet definition in your user namespace somewhere.
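The following is one way such a macro could be written; treat it as a sketch rather than the definitive dlet. It assumes simple symbol bindings (destructuring forms are printed, but less readably):

```clojure
(defmacro dlet
  "Like let, but prints each binding's name and value as it is bound."
  [bindings & body]
  `(let [~@(mapcat (fn [[sym init]]
                     ;; Keep the original binding, then add a `_` binding
                     ;; whose only job is to print the value just bound.
                     [sym init
                      '_ `(println (pr-str '~sym) "=>" (pr-str ~sym))])
                   (partition 2 bindings))]
     ~@body))

(dlet [a 1
       b (+ a 1)]
  (* a b))
;; prints:
;; a => 1
;; b => 2
```

Switching between let and dlet is a one-character edit, which also makes the instrumentation easy to remove again.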

The macro instruments every binding in the let form automatically, printing out all data with a single additional character. This is a useful timesaver for larger let forms.

Inspecting threading intermediates

Again, another special case of println debugging is to inspect the intermediate values in a chain of threaded functions.

This technique inserts an invocation of an identity function into a list of threaded forms, but the function has the additional side effect of printing out its argument. The following example shows both the direct inline anonymous function, and a quick helper function:
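A small sketch of both variants; the helper name spy is our own choice for this example:

```clojure
(defn spy
  "Identity function with a side effect: prints `x`, then returns it."
  [x]
  (println "=>" (pr-str x))
  x)

(-> 5
    inc
    ((fn [x] (println "after inc:" x) x)) ; inline: define and immediately call
    (* 2)
    spy                                   ; helper: same idea, less noise
    dec)
;; prints:
;; after inc: 6
;; => 12
```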

The double parens aren’t very common, but here they are necessary - we use them to define and immediately execute a function. Also, we use them to make sure to return the value from the inspection function!

logging

Logging: it's basically println debugging but better. From a debugging point of view it is useful because you get a file name and line number for each log statement. From a maintenance point of view, you can leave the log statements in but disable them from printing via configuration.

Each log statement is assigned a log level (typically trace, debug, info, warning, and error) and the log framework is configured to emit log statements at a particular log level on a per-namespace basis. For example, you may want to configure your development environment to print all debug statements, however you just want warnings and errors in production. 

Like println, logger calls return nil, so if you are instrumenting a function that needs a return variable, you may need to capture the value to log and then return it. Here are a couple of ways to do that:
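The sketch below illustrates two common patterns. To keep it self-contained, log-debug is a stand-in we define ourselves for a real logger call, and compute-fee is a made-up function under instrumentation:

```clojure
;; Stand-in for a real logger call; like println, it returns nil.
(defn log-debug [& args]
  (apply println "DEBUG" args)
  nil)

;; Made-up function whose result we want to log.
(defn compute-fee [amount]
  (* amount 2))

;; Option 1: capture the value in a let, log it, then return it.
(defn fee-with-let [amount]
  (let [fee (compute-fee amount)]
    (log-debug "fee:" fee)
    fee))

;; Option 2: doto runs the logging form for its side effect and
;; returns the original value; ->> puts that value last in the call.
(defn fee-with-doto [amount]
  (doto (compute-fee amount)
    (->> (log-debug "fee:"))))

(fee-with-let 10)  ; logs "DEBUG fee: 20" and returns 20
(fee-with-doto 10) ; logs "DEBUG fee: 20" and returns 20
```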

Spyscope

The Spyscope library is like a ninja version of println debugging. From its GitHub page:

Spyscope includes 3 reader tools for debugging your Clojure code, which are exposed as reader tags: #spy/p, #spy/d, and #spy/t, which stand for print, details, and trace, respectively. Reader tags were chosen because they allow one to use Spyscope by only writing 6 characters, and since they exist only to the left of the form one wants to debug, they require the fewest possible keystrokes, optimizing for developer happiness.

#spy/p has the same uses and drawbacks of println debugging, but it’s shorter and easier to add into code, and it outputs pretty printed colorized output.

#spy/d is even more powerful, showing one or more stack frames for where it was called from, timing data, and lots of other useful stuff. The project README contains full documentation of all spyscope’s functionality.

Here is a screencap of how you can quickly add spyscope tracing to your code, and what the output looks like:

spyscope.gif

Instrumenting the runtime

There are a number of more sophisticated tools that observe control flow and return values on your behalf, giving you ways to view and even interact with that data.

clojure/tools.trace (+ CIDER)

The canonical tracing library, clojure.tools.trace, exposes two functions to add traces and two functions to remove them, one each for functions and for namespaces. Tracing is applied dynamically without modifying the code being instrumented. As traced functions are run, arguments and return values are printed. Nested function calls are printed with nesting indicated. 
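To make "applied dynamically" more concrete, here is a simplified sketch of the general idea using only clojure.core. It is not the library's implementation or API: trace-var!, untrace-var! and add-interest are names made up for this illustration.

```clojure
(defn trace-var!
  "Wraps the function held by var `v` so that calls print their arguments
  and return value. The original function is kept in the var's metadata."
  [v]
  (when-not (::original (meta v))
    (let [f     @v
          fname (:name (meta v))]
      (alter-meta! v assoc ::original f)
      (alter-var-root v
                      (fn [_]
                        (fn [& args]
                          (println "TRACE" (pr-str (cons fname args)))
                          (let [result (apply f args)]
                            (println "TRACE =>" (pr-str result))
                            result)))))))

(defn untrace-var!
  "Restores the original function stored by trace-var!."
  [v]
  (when-let [f (::original (meta v))]
    (alter-var-root v (constantly f))
    (alter-meta! v dissoc ::original)))

;; The function being traced is never edited.
(defn add-interest [balance] (* balance 2))

(trace-var! #'add-interest)
(add-interest 21)
;; prints:
;; TRACE (add-interest 21)
;; TRACE => 42
(untrace-var! #'add-interest)
```

Because callers go through the var, swapping its root value is enough to instrument every call site at once.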

This screencap shows the same faulty bank transactions example, first being run without tracing, tracing getting applied, and then running with tracing:
 

clojure_tools_trace.gif

Helpfully (for debugging), the function throwing an exception is printed last, clearly showing its arguments. 

CIDER has a couple of convenience functions to help with tracing and untracing, plus highlights traced functions.

cider-trace.gif

Clearly, these are powerful tools. You get to see argument and return values, and get a sense of the control flow, with no need to modify the source itself. A possible downside is that you can get an overwhelming amount of trace data back.

An additional noteworthy mention is for Sayid. It is an inspecting debugger for the Clojure runtime and has a corresponding mode for Emacs. It offers even more advanced views into the data with the ability to drill down and even replay execution. 


One final advantage of inspecting the runtime instead of modifying the code is it is a safer technique: if you have to connect to a production server’s nREPL to live debug a particularly tricky problem, adding traces to the vars is much less error prone than adding instrumentation to the code and re-evaluating live.

re-frisk / re-frame-trace

Re-frisk and re-frame-trace are specific to ClojureScript re-frame apps, but, given their current popularity, these tools are worth mentioning. Both provide inspection tools for the global application state, and an inspectable log containing the sequence of re-frame events triggered during the application’s use. In combination with the other debugging techniques shown here, this specialized view of the re-frame events is very useful for solving the front end bugs that are typically hard to track down.

Evaluative techniques

The least obvious way to debug for new Clojure programmers is to leverage the language’s dynamic nature and tight REPL integrations. Of the three general categories of debugging tools presented here, this is the most powerful and the most general purpose. It is also a key technique for effectively authoring Clojure code.

Using the REPL to eval code

Fast keystroke-based REPL integration with your editor is a must. Emacs, Atom, Cursive and Vim all support this workflow. We highly recommend you become familiar with this way of working. It is truly one of the most amazing aspects of programming in Clojure.

You can use your editor to evaluate code in the Clojure REPL directly from the source code. You might be evaluating a form to see what it evaluates as, or to observe any side effects (e.g. println output). Or you might be evaluating a new function, or a newly changed function, so that it is available for use by other code. Notably with all of these, you don’t need to reload your project, restart the REPL, type directly into the REPL, or even save your source file. Your Clojure runtime is always running; you can change the program, execute parts of it, or write new parts and see how they work in context.

Perhaps the best way to explain this is by example: using the same code sample from earlier, I run the calculation and see an exception, add some instrumentation to observe the program flow, spot a problem case and run that manually, fix the code error and load the fixed function, then re-run the code. The pane on the right shows the Emacs keystrokes and more importantly the commands they correspond to. 
 

repling.gif

repl, def, eval

Many times you want to evaluate how some particular function might be breaking. We’ve shown a number of ways how you can insert code to instrument a function’s operation, but you can use the REPL instead. 

Stu Halloway has another fantastic blog post where he uses a combination of binary search, brainpower, and the REPL to debug an error without even having access to a stacktrace. The REPL technique he uses is this: temporarily define variables that correspond to the names used within your function, then evaluate the code in your function directly.

For example, say we came up with an as-currency function that would handle messier input, like comma separators for thousands:

Running the function throws an exception, so something isn’t right. The following screencap shows how we test how the components of the bigdec conversion work assuming the negative? and cleaned-amount checks are correct, then narrowing the problem to the value of cleaned-amount. That’s definitely not as expected, and so we further narrow the problem to the str/replace call:
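In text form, the narrowing process might look like the sketch below. The as-currency shown here is a hypothetical reconstruction with a deliberate bug in its str/replace call (separators are replaced with a space instead of being removed); the actual function and bug in the playbook may differ.

```clojure
(require '[clojure.string :as str])

;; Hypothetical as-currency with a deliberate bug in the str/replace call.
(defn as-currency
  [amount]
  (let [negative?      (str/starts-with? amount "-")
        cleaned-amount (str/replace amount #"[-,]" " ")
        value          (bigdec cleaned-amount)]
    (if negative? (- value) value)))

;; (as-currency "-1,234.50") throws a NumberFormatException.

;; Temporarily def the names used inside the function...
(def amount "-1,234.50")
(def negative? (str/starts-with? amount "-"))
(def cleaned-amount (str/replace amount #"[-,]" " "))

;; ...then evaluate the sub-expressions one at a time.
negative?                        ;=> true
cleaned-amount                   ;=> " 1 234.50", not what bigdec expects
(str/replace amount #"[-,]" "")  ;=> "1234.50", the intended result
(bigdec "1234.50")               ;=> 1234.50M
```

Remember to remove the temporary defs afterwards (or work in a scratch namespace) so they do not shadow anything later.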

Capture runtime data for post-run REPL analysis 

In the previous example, by manually entering some appropriate data and binding a var we can evaluate subcomponents of a function independently. Sometimes the data we need to correctly operate a function is more complex than can reasonably be typed in. Similarly, it might be that you don’t quite know the exact data to “break” a function, but you can run the code from its top level and get the code to break.

In these cases, we can capture runtime data, either by defining a var or by using an atom. Then, when the main execution flow has completed, we can use the data we captured and evaluate that data using our functions, under controlled conditions in the REPL.
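A minimal sketch of the atom-based variant, with hypothetical names throughout (captured and apply-transaction are invented for this example):

```clojure
;; Collects every input the instrumented function sees.
(def captured (atom []))

(defn apply-transaction
  [balance {:keys [amount] :as tx}]
  ;; Temporary instrumentation: remember the arguments of each call.
  (swap! captured conj {:balance balance :tx tx})
  (+ balance amount))

;; Run the code from its top level as usual...
(reduce apply-transaction 100 [{:amount 5} {:amount -20} {:amount 7}])
;=> 92

;; ...then replay any captured call under controlled conditions.
(let [{:keys [balance tx]} (second @captured)]
  (apply-transaction balance tx))
;=> 85
```

For a one-off, placing a temporary (def last-tx tx) inside the function body achieves the same thing with less ceremony, at the cost of keeping only the most recent value.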

Other thoughts

In wrapping up, another of Brian Kernighan’s brilliant aphorisms is applicable:

"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?" – Brian Kernighan, The Elements of Programming Style

While this has been an entire article focused on inspecting and instrumenting code at run time, without an understanding of the code at hand these efforts are seriously hampered. We must always be able to read and comprehend the code that we are evaluating. When you write code, always strive for readability. Your 6-months-future self will thank you.

Summary

Hopefully we’ve helped answer the question, “How do I debug in Clojure?” 

Remember these three general techniques:

  • instrument the code, 
  • instrument the runtime, and
  • use the REPL itself as a debugging tool. 

You can modify code live in a REPL or editor (depending upon your build setup) to quickly add println, log, and trace statements. You can temporarily define variables and atoms to capture and inspect values in a connected REPL, or inspect a re-frame SPA’s app-db and all events in the system from a browser window with tools like re-frame-trace and re-frisk. There are myriad techniques to help you understand what your Clojure and ClojureScript code is doing, and in practice you will probably use more than one of these at the same time.

Once you are familiar with these techniques, you will learn which is the most effective to use to solve a particular problem. The techniques all overlap, each having its own particular pros and cons, and each having different maintainability, complexity, and flexibility characteristics.

Remember though, the easiest program to debug is the one that doesn’t need it, because it is written so clearly that any errors are obvious. Decomposing code into simple, pure functions, covered by reasonable unit tests, goes a long way towards not needing to use the above techniques too often.


Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishamapayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.