Package a Clojure web application using Docker

This is the second blog post in a three-part series about building, testing, and deploying a Clojure web application. You can find the first post here.

In this post, we will be focusing on how to add a production database (PostgreSQL, in this instance) to an application, how to package the application as a Docker image, and how to run the application and the database inside Docker. To follow along, I would recommend going through the first post and following the steps to create the app. Otherwise, you can get the source by forking this repository and checking out the master branch. If you choose this method, you will also need to set up your CircleCI account as described in the first post.


Quality Assurance Engineer

MX Healthcare | Berlin
MX Healthcare improves healthcare by applying deep learning to radiology
€50,000 - €70,000

What we do


Merantix transforms how businesses, healthcare organizations, and governments operate by building products that apply machine intelligence to their enterprise datasets. We acquire datasets that enable promising products, incubate those products, and spin off ventures to support their growth. We take advantage of three core competencies: Our experience applying machine learning, our expertise fostering enterprise partnerships to assemble vast datasets, and our culture of rapidly iterating to build scalable products.

Our projects are meaningful and diverse. Currently we work on:

  • Automated medical image diagnosis
  • Financial models for automated trading on new markets
  • Deep neural networks used for autonomous driving

As our current ventures progressively grow mature and become more independent, we will look into new industries such as satellite imagery, pharmaceutical drugs, industrial IoT and automated manufacturing.

Our team is made up of entrepreneurs, scientists, and engineers from premier universities around the world. Many of us have PhDs and work experience at top tech companies. We're based in Europe's startup capital, Berlin, and are growing quickly!

Job

Do you want to work on something with a purpose and make healthcare safer and more efficient?

Merantix is looking for a Quality Assurance Engineer. You will be working on our product team, testing a platform to securely view and report on X-Rays, MRIs, and CT-scans. Our product is written in Clojure and Python, and this unique role will involve automated testing (mostly in Clojure; you can learn it on the job!), as well as exploratory/manual testing where appropriate.

What we provide

  • Work in healthcare: You get to work in an innovative, well-funded startup that is becoming part of the healthcare industry.
  • Startup: Don't lose time on politics; build things and have an impact instead.
  • Stay ahead of industry trends: You'll have the chance to build on your existing QA skills while learning a functional language.
  • Great team, opportunity for growth, mentoring: Our engineers are active members of the Berlin Clojure community.

Qualifications

We define ourselves by a culture of friendship and ownership. We're looking for capable, driven, and thoughtful people who think outside the box and add to our vision.

Basic qualifications:

  • You're self-motivated and want to take ownership of product quality
  • You have a solid understanding of quality assurance practices and test strategies
  • You're interested in Clojure and functional programming and willing to learn it (if you already have experience in this area, that's great too!)

Preferred qualifications:

  • Understanding of web application fundamentals (JavaScript/HTML/CSS, REST APIs, relational databases, single page applications, etc.)
  • Experience building automated test suites
  • Experience with sound engineering practices: Git, code review, testing, continuous delivery and automation

The preferred qualifications are just that: preferred. If a few of these points apply to you, we definitely want to talk!

Working at Merantix:

Our hierarchy is flat and communication direct, which means that we operate and learn fast, as a team. You are a great fit for our team if this describes you:

  • Entrepreneurial mindset: Creative, focused, and not easily discouraged
  • Ability to work in a fast-paced environment where continuous innovation is occurring and ambiguity is the norm
  • Excellent technical and analytical problem-solving skills
  • Structured work style to develop effective solutions with minimal oversight
  • Ability to analyze and solve problems at their root, stepping back to understand the broader context
  • Strong organizational and multitasking skills with the ability to balance competing priorities
  • Excellent communication (verbal and written) and interpersonal skills, and an ability to communicate effectively with both business and technical teams
  • Love what you do (and own it)

In addition to a competitive salary and equity package, you can expect the following:

Work-life balance:

Some of us come in during regular business hours, and some of us prefer to start later or earlier. Occasionally, we work from home in the mornings or evenings rather than in the office. We don't track vacation time, and expect you to take time off as you need it. And while the infrequent unavoidable deadline may demand extraordinary effort, we value our evenings and weekends and overtime is the exception, not the rule. Just ask one of our founders, a father of two.

Professional Development:

We aim to attend at least one professional conference each year. Our entire machine learning team attended NIPS and ICCV, while our product team attended ClojuTRE, clojure/conj, and clojureD. We make major contributions to the open source community. We have weekly engineering meetings for both machine learning and Clojure development, which include presenting papers, talking through interesting pieces of code, or just mentioning some cool libraries.

Collaborative Culture:

We have regular outings and team dinners. If you want to participate, that's great! And if you don't, that's fine too.

We are an equal-opportunity employer and value diversity. We consider all applications equally regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, or gender identity. We strongly encourage individuals from groups traditionally underrepresented in tech to apply, and we can help with immigration.


PurelyFunctional.tv Newsletter 318: Tip: Beware the order of keys in hashmaps

Issue 318 – March 18, 2019 · Archives · Subscribe

Clojure Tip 💡

Beware the order of keys in hashmaps

You cannot rely on key-value pairs in hashmaps coming out in the same order as they went in. The trouble is, when you test, it might appear that you can. It won’t be until later, with a larger input, that the problem arises. But at that point, your mind has moved on to new problems, and you won’t know where to look.

This tip is probably obvious to you, but just to make sure it’s really clear, I’ll demonstrate.

Maps may keep their order for small inputs

Here’s some code, with a small input, that shows that the keys come out in the same order as they went in.

(keys (into {} [[:a 1] [:b 2] [:c 3]])) 
;=> (:a :b :c) 

In fact, I can go all the way up to 8 key-value pairs in a map on my machine and it still maintains the order:

(keys (into {} [[:a 1] [:b 2] [:c 3] [:d 4] [:e 5] [:f 6] [:g 7] [:h 8]]))
;=> (:a :b :c :d :e :f :g :h)

But adding a ninth scrambles them:

(keys (into {} [[:a 1] [:b 2] [:c 3] [:d 4] [:e 5] [:f 6] [:g 7] [:h 8] [:i 9]]))
;=> (:e :g :c :h :b :d :f :i :a)

Why do maps keep their order for small inputs?

Well, it’s an implementation detail. Small maps are of type PersistentArrayMap.

(type (into {} [[:a 1] [:b 2] [:c 3]])) 
;=> clojure.lang.PersistentArrayMap

Large maps are of type PersistentHashMap.

(type (into {} [[:a 1] [:b 2] [:c 3] [:d 4] [:e 5] [:f 6] [:g 7] [:h 8] [:i 9]]))
;=> clojure.lang.PersistentHashMap

ArrayMaps happen to maintain order (though the interface does not guarantee it). And at some point, associng one more key-value pair will flip it over into a HashMap.
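
You can watch the flip happen at the REPL (a quick sketch; the exact threshold is an implementation detail, so don’t rely on it):

(def small (array-map :a 1 :b 2 :c 3 :d 4 :e 5 :f 6 :g 7 :h 8))

(type small)
;=> clojure.lang.PersistentArrayMap

(type (assoc small :i 9))
;=> clojure.lang.PersistentHashMap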

Why is this a problem?

Well, at small inputs, if you’re doing interactive tests, it looks like you are getting a correct result. It looks like your algorithm is correct, even if you are relying on order.

The trouble is that we often use small inputs when we’re testing. That testing will give you a false confidence. You’ll move on to other areas of the code. And when things start breaking, you won’t know where to look.

How to fix it

Well, it’s not so easy. I suggest two habits.

  1. Think of all maps as unordered, like bags of stuff. You put stuff in a bag, shake it up, and things get mixed up.
  2. Always test with an input at least 10 elements long. I usually do this test at the end, just to catch ordering problems (see the sketch below).
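
Here’s a minimal sketch of habit 2, using a made-up index-by-id function: build the map from more than 8 entries so it’s a PersistentHashMap, and compare keys as sets so ordering can’t hide a bug.

(require '[clojure.test :refer [deftest is]])

;; hypothetical function under test
(defn index-by-id [rows]
  (into {} (map (juxt :id identity)) rows))

(deftest index-by-id-with-big-input
  (let [rows (for [i (range 12)] {:id i :name (str "row-" i)})]
    ;; compare as sets so the test can't accidentally rely on map order
    (is (= (set (keys (index-by-id rows)))
           (set (range 12))))))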

Brain skill 😎

Y’all, we all need to do a little more deliberate practice. Deliberate practice is when you practice with a purpose, and it is the key to mastery. We all write a lot of code, whether at work, for school assignments, or even for fun. But are we focused on improving a particular skill?

In deliberate practice, you break down a large skill (say, Clojure programming) into smaller skills (paren management, IDE keystrokes, data structure usage, concurrency, etc). You then focus practice on that skill until mastery. Then you move on to the next.

How small should the skills be? Research shows that you want a skill you can master (achieve 95% accuracy) within 1-3 45-minute sessions (1 is better than 3). If you can’t master it in that time, you need smaller skills.

At 95% accuracy, the skill is reliable and automatic. Under that, you can still do the skill, but only with effort. The trouble is, without pushing past the 95% limit, we’ll have lots of skills that take a lot of effort. It’s better to have a partial skill at 95% accuracy than a full skill at anything less. That’s why it’s important to break things down and focus on one at a time.

Also, you’ll need to have a way of measuring your progress. How do you know how accurate you are? That’s one place where a REPL really shines. It gives you really fast feedback to let you know you’re on track.

Reference: Badass: Making Users Awesome by Kathy Sierra.

Action

Write down a skill you would like to learn. Could you master it in 45 minutes? Chances are, it’s too big. Break it into smaller skills until you’re confident that 45 minutes of practice will get you there.

Clojure Puzzle 🤔

Last two weeks of puzzles

When I got back from vacation, I was so happy to see so many great submissions to both puzzles.

Issue 316 drop every nth element

It’s really amazing how concise these can be. See the submissions.

Issue 317 run-length encoding/decoding

There were some really great answers. See them here.

Thanks to all those who submitted.

This week’s puzzle

generate combinations

If I have 4 flowers to choose from (#{:rose :lily :daisy :tulip}), I can generate 4 different combinations of 3 flowers.

(#{:rose :lily :daisy}, #{:rose :lily :tulip}, #{:rose :daisy :tulip}, #{:lily :daisy :tulip})

Write a function combinations that takes a collection of values and a number of items to choose and generates all combinations of that size.

Example:

(defn combinations [coll n]
 ...)

(combinations #{:rose :lily :daisy :tulip} 3)
; => (#{:rose :lily :daisy}, #{:rose :lily :tulip}, #{:rose :daisy :tulip}, #{:lily :daisy :tulip})

Bonus points for clarity, interest, and efficiency.

As usual, please send me your implementations. I’ll share them all in next week’s issue. If you send me one, but you don’t want me to share it publicly, please let me know.

Rock on! Eric Normand

PS ✍️

I have started recording a new course called Repl-Driven Development in Clojure. It’s an important topic, and there just isn’t a comprehensive take on the subject out there. Repl-Driven Development is where Clojure shines, and without it, Clojure can seem like a drag.

The course is available right now as part of an early access program (read: serious discount). If you buy now, you’ll receive updates to the course as new lessons come out. There are already 1.5 hours of video, and many more are coming. If I had to guess, I’d say 6-8 hours total when I’m done. But I can’t be sure. That’s what the discount is for. As the course is fleshed out, the price will go up.

Of course, members of PurelyFunctional.tv will get the course as part of their membership for no extra cost. Just another benefit of being a member.

Check out the course. The first lesson is free.

The post PurelyFunctional.tv Newsletter 318: Tip: Beware the order of keys in hashmaps appeared first on PurelyFunctional.tv.


Do locks slow down your code?

Yes. Locks slow down your code. But they enable your code to be correct! It’s a tradeoff, but who would ever trade correctness for a little speed? In this episode, we look at the tradeoff, how to make locks less of a speed tradeoff, and some alternatives.

Transcript

Eric Normand: Locks slow down your code. By the end of this episode, I hope to give an engineer’s perspective on the time trade-off of locks.

My name is Eric Normand. I help people thrive with functional programming. This is an important topic because locks are one way, an important way to achieve concurrency. That means sharing of resources between different threads. We need to understand their trade-offs.

This discussion topic was brought up by someone I was having a phone conversation with.

I was talking about how locks are an important concurrency mechanism and he brought up, “Yeah, but they slow down your code.” It struck me as funny to bring that up, because I did learn that in school. I remember specifically, in the class where we learned about locks at university, that this was a primary concern: “Remember, these are going to slow down your code.”

I don’t know why people talk about that, because after years of actually building systems using locks and other concurrency primitives, it is just so clear how much more important correctness is than that kind of speed.

It might be an old-fashioned idea. Computers weren’t as fast back then, and maybe locks were considered pretty expensive, and university classes often lagged behind the technology a little bit. Professors are just teaching what they learned in school, and this person is older than I am, so he probably learned it even earlier. His professors were even older.

It’s one of those things that I think we have to get over: there’s nothing you want to trade off for correctness.

If your program doesn’t do what it’s supposed to do, then it doesn’t matter how fast it is. It just doesn’t make any sense. That’s why it struck me as funny. I just wanted to bring that up and re-emphasize it. Now, that’s not to say that they don’t slow down your code. They definitely do.

There’s a few things that we have to talk about with that. Locks definitely are slower than letting your threads run without locks. There’s no question about that. The problem is without locks or some other kind of concurrency primitive, you run the risk of what are called race conditions.

I have a whole episode on race conditions. Race conditions generally, in a nutshell, mean your threads are not sharing nicely. For example, they could be using the same variable to store stuff in, or the same mutable data structure. They’re reading and writing at the same time over each other.

It’s like you’re all trying to color on the same paper at the same time. You’re bumping into each other, and you’re like, “Hey, I wanted that to be blue, now you just colored it red.” You need some kind of way to share. I use this simplistic example, a kid example, because it’s like learning to share as a kid.

Like it’s my paper right now, I’m going to color it. When it’s your turn, you can color it. Obviously, that’s going to slow down the kids. They’re going to have to wait. At least you get some sense of order to the drawing. In some sense, it’s going to be more correct.

There’s one thing that you could do in this scenario to reduce that trade-off, reduce that cost of time, which is to reduce the amount of time you spend inside the lock. One thing that we do in Clojure is, because we’re dealing with immutable data structures, you can actually calculate the value that you will store in the locked variable ahead of time.

Normally what you would do if you wanted to operate on some data structure, let’s say it’s immutable data structure, but you want to do it safely. You want to do it correctly with multiple threads.

You would create a lock, and all threads have to acquire the lock before they can modify that data structure. Only one thread is going to have the lock at a time. You try to get the lock; if it’s already locked, you have to block, you’re just going to wait.

If it’s not locked, you get it. Now you can modify the data structure. When you’re done modifying it, you release. You might do 5, 10, or 100 operations on that data structure. The longer you go doing operations inside that lock, the more other threads are going to have to wait.

What you want to do is reduce the amount of time that you spend inside that lock. One thing that you could do is, instead of doing 10 operations inside the lock, you could, for instance, make a copy of the data structure without getting a lock or maybe you get a lock just long enough to make the copy.

Then you operate on that copy. It’s your copy. You can do whatever you want. You don’t need to lock. It’s not a shared resource. Then you grab the lock and swap out that data structure for the one that’s in there because you have the lock, you’re allowed to do that. You’re allowed to make modifications to it.

Now we do this in Clojure because we have immutable data structures that have built-in copy-on-write semantics. We don’t even think about the copy. We are making a copy. We can also guarantee that nothing else is modifying it while we’re reading it. Once you have a pointer to it, you’ll know it’s an immutable thing.

We have this built into the language. It’s one of the things that makes Clojure really nice. Now, here’s the thing, when you get that lock, you have to make sure that it hasn’t changed since you read it. You made this copy. You had to read the data structure to make the copy. Has the data structure changed in another thread since you read it?

This is what’s called compare-and-swap. You make this copy, you modify it. Then when you grab the lock, you have to check, “Hey has it changed since I read it last? Has some other thread changed it?” If it has, you got to start over. You’re like, “OK, I’ll read it again make a new copy and make modifications to this new thing.”

If it hasn’t changed, you can just set it right away. What this does is it lets you have a much smaller window of locked code, a much smaller mutual exclusion section. Mutual exclusion and lock are pretty much two words for the same thing. It just makes sure that you are spending much less time in that locked state.
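
Here’s a minimal sketch of that pattern in Clojure, just to make the episode’s description concrete (the names are made up, and in practice you’d reach for an atom, which comes up below): do the expensive work on your own snapshot outside the lock, then hold the lock only long enough to compare and swap.

(def lock (Object.))
(def shared-state (volatile! {:count 0}))

(defn update-shared! [f]
  (loop []
    (let [snapshot @shared-state      ;; read the current (immutable) value
          result   (f snapshot)       ;; expensive work, no lock held
          swapped? (locking lock
                     (when (identical? @shared-state snapshot)
                       (vreset! shared-state result)
                       true))]
      (if swapped?
        result
        (recur)))))                   ;; it changed underneath us: retry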

There’s another trade-off here, which is that you’re going to be doing work on your threads. This is instead of the threads simply blocking and not doing anything. Mostly it’s a trade-off that doesn’t matter, except there will be a little bit more heat: your threads are working, so they are taking up CPU from other threads that could be working, so there is some trade-off there.

I also want to talk about how locks are error-prone. There are a lot of possibilities there for, say, forgetting to acquire the lock before operating on this shared resource. There’s stuff like, how do you make sure that you release the lock when you’re done?

You could forget to do that. If you’ve got multiple locks, you’ve got to lock them in the same order in every single thread. It actually becomes a pretty hard challenge, and a lot of bugs creep in.

It’s one of the reasons why even with locks, people think multi-threaded code is very difficult. It’s because you’ve got these challenges now with reasoning about, “If I have this lock, what can I do? I need two locks. What order should I get them in?”

It’s actually a pretty hard thing to reason about once you’ve got a real sizable system. In Clojure, we don’t generally use locks themselves. We use primitives that are often built on top of locks.

The locking has been solved once and for all, and we can think at a higher level. We have something called an atom. It’s probably the most commonly used Clojure concurrency primitive, and it does that compare-and-swap.

It’s built on a Java class; I believe it’s called AtomicReference, and it does that compare-and-swap. All it does is store a single immutable value and give you an interface for modifying that value, with retries if something else has modified it while you were modifying it.

It’s probably got locks down…I haven’t looked at the Java implementation, but it probably has locks down at the bottom so that you can do this compare-and-swap. Meaning has this value changed or has this pointer changed since I read it and made a copy?

If it has, I’m going to retry. It means I’m going to read it again and make a new thing. All of this is done with…You don’t have to deal with the locks. It’s a higher level of working.
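
To make that concrete, here’s a tiny example (the map and key names are just for illustration). swap! applies a pure function to the current value and retries automatically if another thread changed it in the meantime.

(def counters (atom {}))

(swap! counters update :page-views (fnil inc 0))
;;=> {:page-views 1}

;; Keep the function pure and cheap: under contention it may run
;; more than once before its result wins the swap.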

Another common concurrency primitive is the queue. Instead of racing to grab the lock, and then whoever acquires lock first gets to go first, you could put the work into a queue and have a single thread operating on that shared resource at a time.

This is another way of sharing. It’s taking turns, or you’d line up. This works a lot better when there’s a lot more contention. Just imagine a crowd of people all trying to use the same…

I like to think of it like kids because kids don’t know the rules sometimes or they break the rules. They’re trying to share this toy. A few kids can work through the turn-taking. Whoever grabs it first, they get to play with it until they’re done. Then they pass it on to the next person, and then that person gets to play with it.

Somehow, they manage. Once it gets to 10, 20 kids, some kid is going to be like, “I haven’t played with it in like an hour. Please, can I?” This whole grabbing it, whoever gets it first isn’t going to work anymore.

Now, you have to line up. You get in line. Whoever’s at the front of the line gets to play with it as long as they want. Then when you’re done, if you want to play with it again, you’ve got to go back to the end of the line.

It has a fairness to it: you can ensure that things happen in a certain order, and no one goes without playing with the toy. The queue changes the nature of the shared resource.

That toy, yes, it’s being shared, but only by one thread at a time. One thread runs that shared resource. Often, the way it’s implemented in software is you have one thread that’s called the worker thread.

Tasks that operate on that shared resource get put into the queue and run on the worker thread. By putting something into the queue, you’re given a promise or a future that will hold the result of your work when it’s done.

The thread can continue doing some other stuff. Then when the worker thread is done, it will put the value into the future. Then the original thread can continue working with it.

Really, only one thing is accessing that resource at a time. You could say it’s not sharing that resource anymore. It’s only one thread using it. What becomes shared is the queue.

You get smart people. They sit down. They harden this queue. They make sure it works concurrently, and all the locks are in place. That queue becomes the shared resource because the threads are going to be adding tasks to that queue so that they’re all adding them at the same time in parallel.

They’re sharing. There’s some discipline that’s going on with the locks for how things get put in in order. I just wanted to go over these different ways of getting at a higher level than locks, because locks are just very hard to reason about.
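
Here’s a minimal sketch of that worker-queue idea, assuming core.async is on the classpath (the names are made up): callers put a task and a promise on the channel, and a single worker thread owns the shared resource and delivers results.

(require '[clojure.core.async :as async])

(def work-queue (async/chan 100))

;; the single worker thread that owns the shared resource
(async/thread
  (loop []
    (when-let [{:keys [task result]} (async/<!! work-queue)]
      (deliver result (task))         ;; only this thread runs tasks
      (recur))))

(defn submit! [task]
  (let [result (promise)]
    (async/>!! work-queue {:task task :result result})
    result))                          ;; deref it when you need the value

;; usage: @(submit! #(slurp "shared-file.txt"))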

When we’re doing concurrency, it’s all about shared resources. Locks are…Imagine it’s like sharing a bathroom. If you have a couple of roommates, a little lock on the door of your bathroom can really help you share this bathroom safely.

You try to open the door, “Oh, it’s locked. I’ll come back in a little bit and try again. Someone must be in there.” If you had 20 people sharing a bathroom with locks, everyone is constantly knocking on the door.

What if someone takes a long time in there? It becomes a mess. It’s not enough, so you need some other system like a queue. Put your name on the board.

When someone leaves the bathroom, they’re going to call the person’s name on the board and whoever that is can come. You don’t have to just wait and stand in line. It’s also got an order to it so that things work out.

Recap. Locks do trade time for correctness, but correctness is something that you need. It’s not like you’re going to say, “Yeah, we’ll just be wrong half the time, but at least it’s fast.” You just never do that.

You do need to understand that once you go concurrent, meaning you have multiple threads sharing this resource, you are going to trade-off a little time. There’s no way around it.

Just like if you were sharing a bathroom, there are going to be times when you need to use the bathroom, but someone’s in it, so you have to wait. That’s just going to happen.

However, there is an efficiency there, because a lot of times the bathroom isn’t being used. Why have a bathroom per person? It doesn’t make sense. It also lets you scale up faster. Because the bathroom isn’t being used most of the time, you can add more people and — though there might be some bottleneck times — most of the time it will work out, if you’ve got a good system. It’s good for scaling.

The main thing you want to do with locks is reduce the amount of time inside the lock. That’s the main thing. If you can get people to go to the bathroom faster, for instance, you’ve got your toilet in there, then after you use the toilet, you go to the sink, you wash your hands, then you dry your hands, then you open the door.

You could say, “Well, we’re going to move the sink outside of the bathroom. Doesn’t need to be that private.” Now, we’re doing less with the locked door.

People can be using this shared resource more effectively. We’re doing less inside of the lock. I also want to say, because locks are so error-prone, they’re the main reason why people think concurrent programming is difficult. You should look into other primitives that are probably built on top of locks but have a better interface.

They’re less error-prone. They’re actually, I would say, less abstract and more specific. Locks are basically a general purpose tool. Just like in your house, a lock is a general purpose way to keep a door closed from one side. What you want is something much more specific to your bathroom and how to share a bathroom, or how to share a kitchen, etc.

When you get more specific, there’s less you have to think about. There’s less that you have to do as a programmer to make sure it works, less discipline. You can encode the discipline in code. That’s what these concurrency primitives do. I’d mentioned to you compare-and-swap. In Clojure, we call that an atom and queues.

Do yourself a favor. Go research some concurrency primitives in your language. If you happen to be into Clojure, I have a big list of them for Clojure. Just search for Clojure concurrency. I’m not number one right now, but I am up there in the top rankings. It’s a PurelyFunctional.tv article.

Do me a favor, please. If you found this valuable, you should subscribe because then you’ll get the next valuable episode that I’m doing.

I also like to get into discussions with people. If you have a question, a comment, a disagreement, or agreement, I appreciate all of it. I read all of it. Email me at eric@lispcast.com. You can also message me on Twitter; just at-mention me, I am @ericnormand with a D.

I am also getting into LinkedIn. Find me on LinkedIn, Eric Normand, with a D.

Awesome. See you next time.

The post Do locks slow down your code? appeared first on LispCast.


It’s okay to invent unusual things (lessons learned from history of science)

TL;DR: Learn from books or other people, but let nobody say that what you're doing is wrong just because it's unusual.

Great things happen when people aren’t satisfied with what they have and are willing to change it by inventing something new.

Sometimes you meet open-minded people whose reaction is great and they’re always there to help and support you.

But sometimes you don’t.

Fermat and Semmelweis

Pierre de Fermat was a mathematician who lived back in the 17th century.

He was into number theory, which seemed useless back then. Really, who needs prime numbers when all we need are agricultural calculations?

Nobody understood number theory back then, but three centuries later Fermat’s research paved the way for modern cryptography. We have HTTPS and blockchain because of him.

Okay, that story was actually funny, but stubbornness can lead to really tragic things, as in the story of Ignaz Semmelweis.

Semmelweis was the doctor who noticed the correlation between keeping hands clean and the risk of being infected with fatal diseases.

He pretty much tried to teach doctors of that time to wash their hands.

His approach faced so much stubbornness and straight-up hatred that he actually went insane and died in an asylum. Sometimes when I read things like this I don’t really believe them, but it is so hard to argue with the facts.

My personal experience

Here are real quotes from real people that I heard throughout my career. They were said to me and to other developers I worked with:

Web does not need reactive programming. Nobody does that, so you shouldn’t.

Everybody use PHP now, we don’t need your fancy stuff here (about Clojure and functional programming)

You have to be a moron to write server-side code in JavaScript.

While straight-up stubborn reactions to your vision are pretty much unavoidable, you can really push things forward and change the state of the art forever.

Just like the Quake developers invented an unusual way to calculate inverse square roots unbelievably fast, or Dan Abramov wasn’t satisfied with existing solutions and created Redux, which later became a de facto standard, you may be the one the world needs.

Just carry on.

Patreon link


Faster and Friendlier Routing with Reitit 0.3.0

We are happy to introduce a new version of reitit, a fast new data-driven routing library for Clojure/Script!

[metosin/reitit "0.3.0"]

The version is mostly backwards compatible, but contains a lot of big features, including a new route syntax and a new error formatter. As the last post was about version 0.2.0, here's a quick tour of the major changes since then.

New Route Syntax

Before 0.3.0, only colon-based path-parameters were supported. Parameters had to fill the whole space between slashes:

[["/users/:user-id"]
 ["/api/:version/ping"]]

The syntax is simple, but it doesn't allow the use of qualified path-parameters. The new wildcard routing in 0.3.0 supports both the old syntax and a new bracket syntax:

[["/users/{user-id}"]
 ["/api/{version}/ping"]]

Qualified keywords can be used, and parameters don't have to span whole segments between slashes:

[["/users/{domain/user-id}"]
 ["/files/file-{domain.file/number}.pdf"]]

More details in the route syntax documentation.

On Error

There have been a lot of complaints about error messages in Clojure. With Clojure 1.10.0, things are a bit better, but the default error printing is still far from the friendly error messages of Elm and Eta.

In reitit, router-creation-time error messages have been rethought in 0.3.0. In case of an error, an ExceptionInfo is thrown with a qualified error name and ex-data containing all the relevant data. reitit.core/router catches all exceptions and rethrows them with an enhanced error message, produced by a configured error formatter.

The default formatter formats errors just like before. For friendlier errors, there is a new module, reitit-dev, which contains an error message formatter based on fipp and expound (and the lovely 8-bit colors from rebel-readline).

Below are a few sample error messages.

On Route Conflict

(require '[reitit.core :as r])
(require '[reitit.dev.pretty :as pretty])

(r/router
  [["/ping"]
   ["/:user-id/orders"]
   ["/bulk/:bulk-id"]
   ["/public/*path"]
   ["/:version/status"]]
  {:exception pretty/exception})

Invalid Route Data

(require '[reitit.spec :as spec])
(require '[clojure.spec.alpha :as s])

(s/def ::role #{:admin :user})
(s/def ::roles (s/coll-of ::role :into #{}))

(r/router
  ["/api/admin" {::roles #{:adminz}}]
  {:validate spec/validate
   :exception pretty/exception})

The error formatter is developed and tested on macOS and currently only supports a dark theme. There has been discussion in the #clj-commons Slack about whether there could be a community-driven error formatter.

Performance

For routes with path-parameters, version 0.2.0 used a segment trie written in Clojure. It was already one of the fastest routers for Clojure, but still much slower than fast routers in other languages.

In search of better performance, the segment trie was ported to Java, yielding 2x better performance. This was already good, but we didn't want to stop there.

For 0.3.0, the wildcard routing trie was fully rewritten, both in Clojure/Script and in Java. We used profilers and flame graphs (via clj-async-profiler) to find and eliminate the performance bottlenecks. On the JVM, the trie is now 3x faster than the previous Java version, making it 6x faster than the one in 0.2.0.

I'll be talking about reitit at Clojure/North and will walk through the performance journey there in more detail.

According to the perf tests, reitit is now orders of magnitude faster than the other tested routing libraries and less than twice as slow as httprouter, the fast(est) router in Go. So, still some work to do ;)

The ClojureScript version is not as highly optimized as the Java version. If you have skills in tuning ClojureScript/JavaScript performance, feel free to contribute.

Spec Coercion

Spec coercion is now much more complete and works with plain clojure.spec Specs. Most of the changes have happened in spec-tools, which now also has a proper coercion guide. It currently supports only Spec1. There are also coercion guides on the reitit side, both for core routing and for http/ring.

(require '[clojure.spec.alpha :as s])

(s/def ::x int?)
(s/def ::y int?)
(s/def ::total int?)
(s/def ::request (s/keys :req-un [::x ::y]))
(s/def ::response (s/keys :req-un [::total]))

(def math-routes
  ["/math"
   {:swagger {:tags ["math"]}}

   ["/plus"
    {:get {:summary "plus with query parameters"
           :parameters {:query ::request}
           :responses {200 {:body ::response}}
           :handler (fn [request]
                      (let [x (-> request :parameters :query :x)
                            y (-> request :parameters :query :y)]
                        {:status 200
                         :body {:total (+ x y)}}))}
     :post {:summary "plus with body parameters"
            :parameters {:body ::request}
            :responses {200 {:body ::response}}
            :handler (fn [request]
                       (let [x (-> request :parameters :body :x)
                             y (-> request :parameters :body :y)]
                         {:status 200
                          :body {:total (+ x y)}}))}}]])

See full example apps with spec coercion (and api-docs) for reitit-ring, reitit-http, reitit-frontend and reitit-pedestal.

Frontend

Small improvements, including a new, polished API for controllers; HTML5 History routing now also works with IE11. Both the Reagent template and kee-frame now default to reitit, which is really awesome.

Pedestal

Support for Pedestal already shipped in 0.2.10 via the new reitit-pedestal module. It allows the default Pedestal router to be swapped out for the reitit-http router. See the official documentation and an example app.

Final Words

0.3.0 was a big release; thanks to all contributors and pilot users! The full list of changes is found in the Changelog. We'll continue to actively develop the core libraries and the ecosystem around them; the roadmap is mostly laid out as GitHub issues. Many issues are marked with help-wanted and good-first-pr, and contributions are most welcome.

To discuss reitit or to get help, there is a #reitit channel in the Clojurians Slack.

And last, but not least, I'll be giving a talk about reitit at Clojure/North on 20.4.2019. Looking forward to it.


Journal 2019.11 - spec/select, clj jiras, tools.deps

spec/select

I spent most of the week on spec/select and committed the first version of it. The idea with s/select is to separate the set of attributes allowed in a map from which ones are required in a given context - see Maybe Not. In the talk, Rich uses s/schema to declare the keyset but that is still in flux, so you can currently use either an s/keys spec or a vector of attrs to declare the keyset.

(require '[clojure.spec-alpha2 :as s] 
         '[clojure.spec-alpha2.gen :as gen])

(s/def ::street string?)
(s/def ::city string?)
(s/def ::state string?) ;; for demo
(s/def ::zip int?)      ;; for demo
(s/def ::addr (s/keys :opt [::street ::city ::state ::zip]))

(s/def ::id int?)
(s/def ::first string?)
(s/def ::last string?)
(s/def ::user (s/keys :opt [::id ::first ::last ::addr]))

;; (s/select keyset selection)
;; The selection pattern is a vector of either keywords for 
;; required attrs or maps of optional attrs to a selection pattern
;; to use in a submap.
(s/valid? (s/select ::user [::id ::addr {::addr [::zip]}]) 
          {::id 100 ::addr {::zip 63011}})
;;=> true

;; And it gens, using all the optional sub-parts, but ensures all
;; the required selected paths
(gen/sample (s/gen (s/select ::user [::id ::addr {::addr [::zip]}])))
;;=> (#:user{:last "", :id 0, :addr #:user{:city "", :zip -1}}
;;    #:user{:last "", :id 0, :addr #:user{:zip 0}}
;;    #:user{:first "ie", :id -2, :addr #:user{:city "", :zip -1}}
;;    #:user{:last "uS", :id -1, :addr #:user{:city "9G2", :zip -4}}
;;    #:user{:id -2, :addr #:user{:zip -2}}
;;    #:user{:last "i0T97", :first "", :id -3, 
;;           :addr #:user{:street "88", :city "joOc", :zip 1}}
;;    ...)

You can reuse the same keyset but select different paths (and get different gen and conform):

(gen/sample
  (s/gen
    (s/select ::user [::first ::last ::addr
                      {::addr [::street ::city ::state ::zip]}])))
;;=> (#:user{:id -1, :first "", :last "",
;;           :addr #:user{:city "", :street "", :state "", :zip -1}}
;;    #:user{:id 0, :first "", :last "",
;;           :addr #:user{:city "", :street "C", :state "x", :zip 0}}
;;    #:user{:first "vU", :last "",
;;           :addr #:user{:city "", :street "0", :state "d", :zip 1}}
;;    ... )

select can also take a vector of possible keys (as a keyspec):

(gen/sample (s/gen (s/select [::first ::last] [])))
;;=> (#:user{:last ""}
;;    #:user{:last "", :first "J"}
;;    {}
;;    #:user{:first "p"}
;;    #:user{:first "a9", :last "0"} ...)

Note that this spec has no required keys (the second vector is empty), so valid maps may be empty, or have ::first, ::last, or both.

And the vector keyspec can provide inline specs for unqualified keys:

(gen/sample (s/gen (s/select [{:a int? :b string?}] [])))
;;=> ({} {:a -1} {:a 1, :b "f"} {:a 1, :b "5j9"} ...)

THIS SYNTAX AND MANY OF THE DETAILS ARE A WORK IN PROGRESS. It will change; don’t get stuck on it.

Clojure jiras

Stu screened the two jiras I’ve discussed in prior journals, CLJ-2484 (the Java performance regression when loading user.clj) and CLJ-2463 (improving error printing for the clojure.main runner). Rich will look at them next. Still moving towards a 1.10.1 RC with these.

tools.deps

A number of people installed and tested clj on Windows this week. There are some known issues to work through, and there seem to be a lot of ways people like to install things. Try it if you haven’t!

I spent some time this week on TDEPS-74 (and related TDEPS-106) for fixing issues around using relative paths in local deps, and I committed some changes for that. Also going to look at the proxy support ticket TDEPS-20 and get all this stuff released.

It also came to my attention that the newest versions of jgit have greatly expanded support for ssh keys and other things via Apache MINA which would potentially resolve several issues we’ve seen in ssh git deps. Hoping to take a look at that soon.

Other stuff I enjoyed this week…

This time of year, my heart is in one place - the SXSW music festival in Austin. My parents lived there for about 15 years and I started going to SXSW in 1996. I went almost every year for a number of years and last attended in 2007 (and brought my 1 and 2 year olds along at the time!). I know Austin has changed a lot in the decade since, and SXSW too I have no doubt, but I’m going to choose to believe that the spirit of serendipity and love of music continues to thrive there.

Literally the first music I saw at the first SXSW I went to was an Iggy Pop show, on a stage in the middle of 6th St. Amazing. Back when I was attending regularly, one of my favorite venues was the Steamboat (long gone) where I discovered local Austin bands I loved for years, like Sister 7 (nee Little Sister), Vallejo, and Pushmonkey. I spent many magical sxsw nights there with my closest friends seeing great shows. I’ll be throwing back a Shiner Bock this weekend to these and dozens more memories of those times.

Here’s a good video of Vallejo doing House at the Steamboat in 1996. I’m pretty sure when we saw them in 1996 at sxsw they were at the Ritz (name changed many times since then, not sure what it is now). Never in my life have I seen a band hit the stage with so much energy. It was like walking into a tornado. Instantly hooked. I saw them more times than I could count - every SXSW but also many other times in Austin and other cities. Me and a good buddy saw them in Cincinnati once on a frigid winter night and we were literally the only people in the place, but they did an incredible gig anyways.


Ep 020: Data Dessert

Christoph and Nate discuss the flavor of pure data.

  • “The reduction of the good stuff.”
  • “We filter the points and reduce the good ones.”
  • Concept 1: To use the power of Clojure core, you give it functions as the “vocabulary” to describe your data (see the sketch after this list).
    • “predicate” function: produce truth values about your data
    • “view” or “extractor” function: returns a subset or calculated value from your data
    • “mapper” function: transforms your data into different data
    • “reduction” (or “reducer”) function: combines your data together
  • Concept 2: Don’t ignore the linguistic aspect of how you name your functions.
    • Reading the code can describe what it is doing.
    • Good naming is for humans. Clojure doesn’t care.
  • Concept 3: Transform the data source into a big “bag” of data that is true to the structure and information of the source.
    • Source data describes the source information well and is not concerned with the processing aspects.
    • Transform into data that is useful for processing.
  • Concept 4: Using loop + recur for data transform is a code smell.
    • Not composable: encourages shoving everything together in one place.
    • “End up with a ball of mud instead of a bag of data you can sift through.”
    • “You know what mud sticks to really well? More mud! It’s very cohesive! And what couldn’t be better than cohesive programs!”
  • Concept 5: Use loop + recur for recursion or blocking operations (like core.async)
    • Data shows up asynchronously
    • Useful when logic is more naturally expressed as recursion than filter + map + reduce.
  • Concept 6: Duality: stepwise vs aggregate
    • Stepwise problem: advance a game state, apply async event, stream processing, etc.
    • Stepwise: reduce, loop + recur
    • Aggregate problem: selecting the right data and combining it together.
    • Aggregate: filter + map + reduce
    • Aggregate problems tend to be eager–they want to process the whole data set.
  • Concept 7: Use your bag of granular data to work toward a bag of higher-level data.
    • We went from lines → entries → days → weeks
    • “Each level of data allows you to answer different questions.”
  • Concept 8: Duality: higher-level data vs granular data with lots of dimensions
    • Eg. having a single “day” record vs a bunch of “entry” records that all share the same “date” field.
    • The “right” choice depends on your usage pattern.
    • Dimensional data tends to stay flat, but high-level data tends toward nesting.
    • A high-level record is a pre-calculated answer you can use over and over quickly.
    • Highly-dimensional, granular record allows you to “ask” questions spanning arbitrary dimensions. Eg. “What weeknights in January did I work past midnight?”
  • Concept 9: Keep it pure. Avoid side effects as much as possible.
    • Pure functions are the bedrock of functional programming.
    • REPL and unit test friendly.
    • “You can use data without hidden attachments. You remember side effects when you’re writing them, but you don’t remember them three months later.”
  • Concept 10: Keep I/O at the “edges” with pure functions in the “middle”.
    • “I/O should be performed by functions that you didn’t write.”
    • Use pure functions to format your data so you only have to hand it off to the I/O function. Eg. Create a list of “line” strings to emit with (run! println lines).
    • You can describe your I/O operations in data and make a “boring” function that just follows them. This allows you to unit test the complicated logic that determines the operations.
    • Separates out I/O specific problems from business logic problem: eg. retries, I/O exceptions, etc.

Related episodes:

Clojure in this episode:

  • filter, map, reduce
  • loop, recur
  • group-by
  • run!
  • println


What is idempotence?

Idempotence means duplicates don’t matter. It means you can safely retry an operation with no issues. The classic example is the elevator button: you press it twice and it does not call two elevators. We explore why we would want that property in an email server.

Transcript

Eric Normand: What is idempotence? Why does it help so much with programming in distributed systems? By the end of this episode, you will know how to implement idempotence in your own system.

Hi, my name is Eric Normand, and I help people thrive with functional programming. Idempotence is important because it captures the essence of the safe retry. Without safe retries, you really cannot implement safe distributed protocols.

What is idempotence? The essence of it is that if you ask twice, it’s the same as asking once. It has the same effect. The classic example is the elevator button. You go into a bank of elevators. You press the button. It lights up. It calls the elevator. Then someone else comes in, and they go and they press the button, too, the same button. It’s already lit up.

We know that that doesn’t have an effect. We still want to do it for some reason. It’s a just-in-case. Maybe they’re right, maybe the signal didn’t get to the elevator. It’s worth trying because it can’t hurt anything. That’s the kind of thing that we want to instill in our distributed systems. Technically, it is an algebraic property.

When you’re talking about pressing a button, this is an active effect you’re having on the world. Whereas in algebra, it’s a property of pure functions, mathematical functions. It means, if you capitalize the letters of a string twice, it doesn’t matter. The first time is enough. Technically, if you apply F to a value x, getting F(x), it’s the same as applying F again to that result: F(F(x)) = F(x).

You do the double application of F. It has the same effect as the single application. You could just say that it means duplicates don’t matter. I pressed the button twice. The second one doesn’t matter. If I apply the same function twice, the second time does not matter. The first time matters. The second time, the third time, the fifth time, those do not matter.
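
A one-liner version of that, using upper-casing as the function F:

(require '[clojure.string :as str])

;; F(F(x)) = F(x): applying it twice is the same as applying it once
(= (str/upper-case (str/upper-case "hello"))
   (str/upper-case "hello"))
;;=> true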

Why is this important? In a distributed system, especially in a distributed system, we have this problem where messages over the network are unreliable. Basically, if you send a message, it might not get there and you won’t know. You cannot know if it got there.

Sometimes, you know if it didn’t get there. You get some connection-broken message, but sometimes you just don’t hear back. It timed out.

Did it get there and the acknowledgement timed out, or did it never get there? Did the other system crash? Did it crash before it sent my email or after it sent my email? You don’t know. It crashed, it’s too late. Email is actually a good example because you don’t want to send the same email twice.

Let’s say you have an email server and you send it a message saying, “Please send this email to my customer.” You don’t hear back. You just don’t hear an answer. What do you do? What happened? Do you send it again?

What if it did already send the email? Is it going to send the same email a second time? If it didn’t send it and I don’t send the message again, then the customer won’t get the email.

This is really a real business problem. Idempotence would solve that. If I could send the same message again, and it won’t break anything, it won’t have a second effect. Just like that elevator button, I could send this message all day. I could send it a hundred times, and the email would only get sent once. That’s a good thing.

What it lets you do is decouple. It’s decoupling the number of effects that happen with the number of times that you request that effect. I can request it a hundred times, but it will only get sent once. That is something that you really want to have. You want to be able to retry safely with limited information.

I don’t know if this went through, but I’m going to try again. That is a very nice property to have in your system.

How do you implement idempotence? The simple way, if we look at this email server, is that you need some way of identifying the email. Some way of saying, “This is the ID of the email that I want to send. If I send you an email with the same ID again, don’t send it a second time.”

The server that’s receiving it has to remember all of the IDs of the emails it has ever sent. That is for total complete idempotence. Usually, that’s not practical. You can’t remember every ID because it could be in the millions. They could date back for many years. It is unlikely that you’re going to get a request that takes years to arrive.

In a practical case, you might have a window that says, “Well, we keep three days of IDs.” That means that you can resend the same ID within those three days, and we won’t send it a second time. You have to find some practical limit that balances the memory requirements and the retries that you’re doing in your system.

Notice, it’s very important, this concept of identity is very important. If you don’t have a concept of identity, what does it mean to send the same message again? If I want to send two emails to this person, I need to be able to send two emails to them. I need some way of saying that they’re different. If I want to retry, I want some way to say that this one is the same as that one.

You need some identity on your requests. If you’re looking at an elevator button, there’s probably an identity, deep in the electronics of this elevator service. It knows what button I pressed. It’s third-floor-up, or fourth-floor-down. There’s some identifier for that button, which allows it to light up, first of all, and stay lit until it needs to be turned off.

That identifier is probably used in multiple places. It puts in a request, “Oh, we need an up-elevator on the third floor because we know that button and what it means.” It’s also used to say, “Hey, I’m already sending that third floor elevator. I don’t need to do that again.” It’s using that identifier.

Let’s recap on this. Wait, no, I did not say how. I did halfway of how. The first half is you need an identity. The second one is once you have that identity, you use a data structure with an operation that is already idempotent. A common idempotent data structure with an idempotent operation is a set, like an in-memory set.

If you have a set of numbers, you give each email a unique number. As the email server sends off emails, it remembers the number in a set, just adds it to that set. If you add to the set twice, you’ve got idempotence already.

Same for an elevator. If you have a button that has an ID, let’s say it’s like the string third-floor-up, or third-floor-down, or fourth-floor-up, fourth-floor-down, you save that into a set that says it’s active. It’s been requested. That means you can send it twice, and it won’t have any effect, to send it twice.

Now, of course, this does not take into account the actual action, the effect that happens by pressing that button, which is to send an elevator to that floor. Same with the email. It doesn’t take into account sending the email or not sending the email.

To determine whether you want to send it, it’s pretty simple. Before you add the ID to the set, you ask the set, “Do you contain this ID?” If it does, then you’re done. If it doesn’t, you send the email and then you put the ID in the set.
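
Here’s a minimal sketch of that check-and-add in Clojure, with a hypothetical send-email! function; recording the ID atomically before sending means a concurrent retry can’t double-send.

(def sent-ids (atom #{}))   ;; in practice, bound this to a time window

(defn send-once! [email-id email]
  ;; conj the ID and look at the set as it was before: if the ID was
  ;; already there, an earlier request won and we do nothing
  (let [[before _] (swap-vals! sent-ids conj email-id)]
    (when-not (contains? before email-id)
      (send-email! email))))          ;; send-email! is hypothetical I/O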

There are other data structures that are idempotent. Hash maps, for example: if you add the same key and value twice, it has no extra effect. Another thing that you could consider idempotent is adding zero to a number. If you need some kind of idempotent addition, you can do that.

There are other data structures that have idempotence in them. They’re more complicated and they’re specialized usage. I’m not going to go into them.

Imagine they’re like sets with other properties. You can add things in, but maybe they don’t grow as fast as a set would grow. They’re more probabilistic kinds of data structures.

I mentioned strings being uppercased. That’s something that’s idempotent as an operation. You could use that if you need to write, like uppercase name means something different from regular case name.

That means that you could do it twice, and it wouldn’t have any extra effect. You probably use this in something, like your email system might lowercase all email addresses before it compares them. What if it’s already lowercased? That doesn’t matter. It’s just going to lowercase everything. If it’s already lowercased, there’s no problem.

Let’s recap. Idempotence means duplicates don’t matter. It’s an algebraic property of certain functions, certain operations, but we extend it to actions in the world. We extend it to effects that we can have on the world, where we’re saying requesting that effect twice is the same as requesting it once. Those duplicates don’t matter there, also.

We need it in distributed systems so that we can have safe retries. It lets us decouple what gets done from how many times we request that it’s done. You can easily implement it using idempotent data structures and operations. It requires a sense, a notion of identity in the messages.

Do yourselves a favor and look for some services that need to happen exactly once. Could be something like sending an email. Could be writing a message to a log. Could be some user setting in your user-panel, and wrap them in something like a data structure that makes them idempotent.

Do me a favor please and share this with friends. If you found it valuable, they might find it valuable, too. Also, if you found it valuable, you probably want to subscribe. That way you’ll get all of the other new episodes as they come out. You won’t miss that value that you have already discovered.

I like to be in deep discussions with smart people. Please email me. I’m eric@lispcast.com or get in a discussion on Twitter. I try to use Twitter as a discussion medium. I’m @ericnormand with a D there.

Also, you can find me on LinkedIn. I’m trying to get better at LinkedIn. It’s a little hard for me. If that’s where you like to connect, let’s connect and start having a conversation.

All right. See you later.

The post What is idempotence? appeared first on LispCast.

