How to upgrade your Clojure projects to use Java 11

JDK 11 is the latest release of the JDK (Java Development Kit). There are several changes since JDK 8 that will require projects to update their dependencies or build tools. Many open source projects have resolved these issues when adding support for JDK 9 and 10. For most projects that have been updating their dependencies regularly, the upgrade process to JDK 11 shouldn’t be too difficult.

Last year I wrote a guide on upgrading Clojure projects to Java 9. This guide can be read on its own, but you can find some more background and context in last year’s guide too.

JDK 11

JDK 11 was released on September 25, 2018. JDK 11 is the first long term support release since JDK 8. Due to the short support lifespan of JDK 9 and 10 (six months each), and the number of breaking changes in those releases, many businesses and individuals have continued to use JDK 8.

Free public updates for Oracle’s JDK 8 end in January 2019 for commercial users, and December 2020 for personal users. After those dates, Oracle will not be providing any more free updates for JDK 8. This means that many Clojure projects will be looking to upgrade to JDK 11 soon, or investigating other JDK providers which support JDK 8. Oracle has also created a guide for updating to JDK 11. Most of the notes in the upgrade guide are not relevant to the majority of Clojure programs, but it’s worth a quick scan as well.

Checking for library upgrades

Before beginning to upgrade to JDK 11, I would recommend checking for new versions of your dependencies. These may have fixes for incompatibilities introduced in new versions of the JDK, reducing the amount of breakage you need to resolve later. It is also much easier to upgrade your libraries first and then your JDK, rather than trying to upgrade both at the same time. You can check for updates to Leiningen projects with lein-ancient, Boot projects with boot-deps, and tools.deps projects with deps-ancient. If your project is publicly available on GitHub, you can use Deps Versions to add a badge to your README to show if your dependencies are up-to-date.

java.util.Collection toArray

If you have previously upgraded to support JDK 9 or JDK 10, the main issue you are likely to face when upgrading to JDK 11 is the addition of a method in the java.util.Collection interface. A new toArray method was added which overloads the existing 1-arity method. Java and other statically typed languages on the JVM have the type information to resolve the ambiguity, but Clojure is not able to resolve this without developers adding extra type hints to specify which method to use.

Without the type hints you would get an error like this:

Exception in thread "main" java.lang.IllegalArgumentException: Must hint overloaded method: toArray, compiling:(clojure/core/rrb_vector/rrbt.clj:282:1)
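As a rough illustration of the kind of hint involved, here is a minimal sketch (not taken from any of the affected libraries) of a reify that implements java.util.Collection. The seq-collection name is made up for this example, and only a few of the interface's methods are implemented. The ^objects hint on the one-argument toArray tells the compiler to implement toArray(Object[]) rather than the newly added overload.

```clojure
(defn seq-collection
  "Hypothetical helper: wraps a Clojure collection in a minimal,
  read-only java.util.Collection. Only the methods needed for this
  illustration are implemented."
  [coll]
  (reify java.util.Collection
    (size [_] (count coll))
    (toArray [_] (object-array coll))
    ;; Without the ^objects hint, JDK 11 offers two candidate
    ;; one-argument overloads and compilation fails with
    ;; "Must hint overloaded method: toArray".
    (toArray [_ ^objects arr]
      (let [n (count coll)
            ;; Reuse the caller's array if it is big enough,
            ;; otherwise allocate one of the same component type.
            ^objects out (if (>= (alength arr) n)
                           arr
                           (make-array (.getComponentType (class arr)) n))]
        ;; Copy the elements into the target array.
        (loop [i 0, s (seq coll)]
          (when s
            (aset out i (first s))
            (recur (inc i) (next s))))
        out))))
```

The same hint works on JDK 8, so a library can add it without dropping support for older JDKs.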

This change affected several projects directly including core.rrb-vector (CRRBV-18), org.flatland/ordered (#37), datascript (#273), and Clojure (CLJ-2374). In practice, this isn’t an issue for Clojure. Clojure is distributed as an AOT compiled JAR and is compiled against older versions of Java. You were only likely to run into this issue if you were working on Clojure itself.

RRB Vector and Ordered have a much larger impact however. Many projects have dependencies or transitive dependencies on one of these libraries, including Midje, Lacinia, Fipp, Puget, lein-monolith, clj-yaml, compojure-api, and ring-middleware-format.

There are new releases for RRB Vector and Ordered: [org.clojure/core.rrb-vector "0.0.13"], and [org.flatland/ordered "1.5.7"] respectively. Many of the downstream consumers of these libraries have been updated to use these versions, but some haven’t yet.

If you are using Leiningen, this is a great use-case for the :managed-dependencies feature. If you add a vector of :managed-dependencies to your project.clj, Leiningen will choose those versions whenever your project depends, directly or transitively, on any version of that dependency. boot-bundle provides a similar feature for Boot.

:managed-dependencies [[org.clojure/core.rrb-vector "0.0.13"]
                       [org.flatland/ordered "1.5.7"]]

If you have any Leiningen plugins with dependencies on rrb-vector or ordered, you can override the resolved dependency by adding an explicit dependency to the :plugins vector. For example:

:plugins [[lein-monolith "1.0.1"]
          ;; Overrides older version of rrb-vector that
          ;; doesn't work on JDK 11.
          [org.clojure/core.rrb-vector "0.0.13"]]

Deprecations and removals

JDK 9 deprecated several Java EE and CORBA modules. The modules were still in the JDK, but not resolved by default. To get access to those modules you needed to explicitly add them back with the command line flag --add-modules, e.g. --add-modules "java.xml.bind".

Without this flag, you would get errors like:

Caused by: java.lang.ClassNotFoundException: javax.xml.bind.DatatypeConverter
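For Leiningen users on JDK 9 or 10, one way to pass the flag described above is through :jvm-opts. This is only a sketch: the project name, version, and Clojure dependency are placeholders, not taken from any real project.

```clojure
;; project.clj (sketch; project name and versions are placeholders)
(defproject my-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.9.0"]]
  ;; Only valid on JDK 9 and 10: JDK 8 does not understand the flag,
  ;; and JDK 11 no longer ships the module.
  :jvm-opts ["--add-modules" "java.xml.bind"])
```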

The most common use of these modules in the Clojure community was the Base64 converters in javax.xml.bind. Java 8 added a Base64 class to java.util which is a suitable replacement.
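A minimal sketch of that replacement, assuming UTF-8 text. The function names encode-base64 and decode-base64 are made up for this example.

```clojure
(import '(java.util Base64))

(defn encode-base64
  "Encodes the UTF-8 bytes of s as a Base64 string."
  [^String s]
  (.encodeToString (Base64/getEncoder) (.getBytes s "UTF-8")))

(defn decode-base64
  "Decodes a Base64 string back into a UTF-8 string."
  [^String s]
  (String. (.decode (Base64/getDecoder) s) "UTF-8"))

(encode-base64 "hello")    ;; => "aGVsbG8="
(decode-base64 "aGVsbG8=") ;; => "hello"
```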

If you need to support JDKs older than Java 8, http-kit shows a way of using macros to support both methods. Note that if you do this and AOT compile your applications, make sure that you compile with the same version of Java you’ll be running in production.

JDK 9 also removed several of the sun.* APIs, including sun.misc.BASE64Encoder and sun.misc.BASE64Decoder. Again, the suggested migration path is to use java.util.Base64.

JDK 11 removed the Java EE and CORBA modules. If you or your dependencies were still using these modules, you will need to add them explicitly as dependencies. JEP 320 has more information on the removed modules and where to get their replacements. If you’re using a library that expects these dependencies, you will need to add a dependency on the JAR to download it from Maven Central. For example:

[javax.xml.bind/jaxb-api "2.4.0-b180830.0359"]

Build Tools

Leiningen 2.8.0 or later is required to use JDK 11. You can get the latest version of Leiningen by running lein upgrade. At the time of writing, Leiningen 2.8.1 was the latest version available.

Boot has been updated to 2.8.2 to build under JDK 11. However, in my testing, it seems that Boot 2.6.0 and up can run under JDK 11. 2.8.2 was released on 14 September 2018. Update boot with boot -u to get the latest version. At the time of writing, Boot 2.8.2 was the latest version.

tools.deps appears to have no blocking issues running on JDK 11. There is an open issue about Java 9 and up adding spurious newlines to a POM, but this is unlikely to be an issue for most people.

If you run into any issues with any of these build tools, let me know, and I’ll update this post.

Clojure and ClojureScript versions

Clojure and ClojureScript both have fixes to improve compatibility with JDK 9 and up. ClojureScript 1.10.63 and up include CLJS-2377 which avoids depending on java.xml.bind.

Clojure 1.9-beta1 has support for running on the bootclasspath under Java 9. If you can’t upgrade Leiningen to 2.8.1, then upgrading Clojure to 1.9-beta1 or later should also work.

Keeping up to date

Java’s new six-monthly release cycle has increased the rate of change in the JDK, and the number of versions that library consumers may be using. The increased willingness to deprecate and extract parts of the JDK means that many developers should consider testing against early access versions of the JDK to ensure that they aren’t caught off-guard. If you’re using Travis CI, I’ve written a guide on testing against the latest JDK.


PurelyFunctional.tv Newsletter 304: Back to the factory

Issue 304 – December 10, 2018

Hi Clojurists,

More resources about Agile and how it came from the factory. In my experience, people turn to Agile without really understanding where it came from, what it’s about, and what the alternatives are. I’m exploring those myself, because in my experience Agile makes me feel like Charlie Chaplin working at the factory.

Please enjoy the issue!

Rock on!
Eric Normand <eric@purelyfunctional.tv>

PS Want to get this in your email? Subscribe!


I will be speaking at IN/Clojure 2019

I will be speaking at IN/Clojure in January. Be sure to get your tickets while they last.


3 models of software development as a factory

Is software engineering like a factory? If it is, then how so? Is there more than one way to frame that metaphor?


The Dehumanisation of Agile and Objects YouTube

James Coplien at his best, deconstructing Agile.


Real Software Engineering YouTube

Glenn Vanderburg talks about whether programming is engineering, how we got here, and where we can go moving forward. He also has a pretty good breakdown of some of the practices of eXtreme Programming as ways of building feedback loops of various time lengths into your development.


On Storytelling

Evan Czaplicki talks about his process for building Elm. It’s slow and methodical, much like Clojure. People complain and claim that bugs go unfixed for a long time. But he’s optimizing for something more broad and global instead of getting tickets through as fast as possible.


Jason Fried Q&A YouTube

Jason Fried talks in-depth about the process they use at Basecamp, which is nothing like the traditional sprint cycle.


Hammock Driven Development YouTube

Rich Hickey’s classic talk contains a few critiques of Agile development, including calling every cycle a “sprint”. Where is the time for thoughtful reflection?


Unlocking data-driven systems YouTube

A different way to develop: think a lot, then build a solution. Does your company give you time to think? Or are you always busy doing work others have thought about?


Implementing Programmer Anarchy YouTube

Fred George on managing the managers and giving programmers the freedom to solve problems instead of completing tickets. It’s a combination of shielding programmers from over-management, establishing clear goals, and providing a good architecture and coding practices to enable programmers to experiment.


Hiccup id and class Shortcuts Free lesson

Next week’s free lesson is all about how best to use the id and class shortcuts in Reagent’s Hiccup.

I’m going through this course, making one lesson free each week. You can follow PurelyFunctional.tv on Twitter to be reminded.

The post PurelyFunctional.tv Newsletter 304: Back to the factory appeared first on PurelyFunctional.tv.


Easy Clojure, Easy REPL

Have you ever tried to convince a friend that Clojure is a great language?

Have you ever tried to convince your boss to let your team work on a side project with Clojure?

If that’s the case, you have probably used as a solid argument the fact that in Clojure, the REPL is very powerful and very helpful for beginners. Inside the REPL, you can do everything and it is so simple to hack with Clojure expressions in the REPL.

If you did a good job, your friend or your boss decided to give it a try. She installed the Clojure CLI on her machine as explained in this blog post and she launched the Clojure REPL with a simple 3-letter command line:

>clj
user=>

Fantastic! Now she had the ability to experiment with Clojure.

After having typed a couple of arithmetic expressions like (+ 2 3) and a few list manipulation expressions like (map inc [1 2 3]), she probably tried to define a variable with a form like the apparently innocent (def my-var 42):

user=> (def my-var 42)
#'user/my-var

Then she asked you, with a naive expression on her face: “What is this weird dash quote user?”

At this point, you had two options:

  1. “Oh! That’s the fully qualified name of the variable you created”
  2. “Forget about it, you will understand this part, when you are more experienced with Clojure”

No matter what your answer was, my guess is that she felt a bit confused…


And it probably got worse when you tried to explain to her that this was something “simple but not easy”…

The point of this imaginary story is to illustrate the fact that the default Clojure REPL is not beginner-friendly.

When I started to write my Get Programming with Clojure book, I had to find a way to let my readers enjoy the power of the REPL without being confused by some weird dash quote symbols too early in their Clojure journey.

The solution I came up with was to create my own REPL with a single objective in mind: to be beginner-friendly. I named it the Klipse REPL and it is available on GitHub.

The way I am handling def forms in the Klipse REPL is to display the value of the variable instead of its fully-qualified name.

user=> (def my-var 42)
42

No more questions about “weird dash quote symbols”!

Similarly, for function definitions with defn forms, I decided to display the name of the function and the arguments that the function expects:

user=> (defn foo [x]
    =>   (* 42 x))
Created function foo ([x])
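As a rough sketch of the idea (this is not the Klipse REPL's actual implementation), a REPL could post-process each evaluation result before printing it. The friendly-result function below is a hypothetical name: it unwraps vars returned by def, and summarises functions created by defn using the var's metadata.

```clojure
(defn friendly-result
  "Hypothetical post-processing step for a beginner-friendly REPL.
  Vars holding functions become a short description, other vars
  become their value, and everything else is returned unchanged."
  [result]
  (if (var? result)
    (let [value (deref result)
          {:keys [name arglists]} (meta result)]
      (if (fn? value)
        ;; defn stores the function name and argument lists in the var's metadata
        (str "Created function " name " " (pr-str arglists))
        value))
    result))

(friendly-result (def my-var 42))            ;; => 42
(friendly-result (defn foo [x] (* 42 x)))    ;; => "Created function foo ([x])"
```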

I made another small improvement in the traditional doc form that the default REPL provides: The doc macro provided by Klipse REPL includes a link to the form page in clojuredocs.org. Clojuredocs is one of the most valuable resources for Clojure beginners as it provides examples of usage of the Clojure forms. (There are so many situations where the docstring is so cryptic.)

For instance, take a look at the last line of the output of (doc inc):

user=> (doc inc)
-------------------------
clojure.core/inc
([x])
  Returns a number one greater than num. Does not auto-promote
  longs, will throw on overflow. See also: inc'
-------------------------
Online doc: https://clojuredocs.org/clojure.core/inc

The Klipse REPL also includes all of the great features of Bruce Hauman’s rebel-readline for the simple reason that the Klipse REPL is built on top of rebel-readline. Some of them are:

  1. autocompletion
  2. indentation of multi-line expressions
  3. coloring of forms

Now that my book is available for early access, you can use it as another solid argument to convince your friend and your boss about the value of Clojure.


CLJS ❤️'s Lambda

What is Clojure and why do you care?

Because every post about Clojure or Clojurescript (CLJS) has to explain what Clojure(script) is and why you should use it:

But in all seriousness, a bunch of people have done better introductions to Clojure than this post will:

🔥Serverless🔥

Unfortunately there isn't an amazing parody commercial for Serverless so instead let's just try to actually motivate why you should care about it. Serverless is just a shorthand for describing a way to run code without worrying about infrastructure. What does that mean practically? Give your cloud provider a zip and they throw it on some unused hardware, that's it!

It is supported on AWS, Google Cloud, and Azure and probably other clouds. Because the examples are built with AWS, that is where we will focus.

Probably the biggest motivator for migrating to, or building new services with, serverless is cost. The free tier is very generous and it's pay as you go, so it's free to prototype!


You also get ease of deployment/operation and native integration with your cloud provider making it super easy to tap into services like cloud provided DBs, metrics/analytics, queues, email services, etc.

CLJS + Lambda

While there exist ways for you to plug together Clojurescript and Lambda they are coupled to specific build tooling and are a bit too much like a black box for my liking. This also is a good excuse to learn how all the pieces plug together. This post is a distillation of how to do just that. Before finally getting to the meat of the post we have to introduce one more player to the stage.

Shadow-CLJS

Shadow-CLJS is an alternative frontend to the Clojurescript compiler; it fills the same role as lein-cljsbuild or Figwheel-main, if you are familiar with the space. The big advantage it has over the other tools is awesome npm support. We can use basically all of npm with CLJS via Shadow, which is important if we want to leverage things like AppSync or the AWS Node SDK.

MAKE THE DAMN LAMBDA ALREADY

Setup

Okay, to begin, let's make a Lambda. Eventually we will throw away everything inside of it, but for now just leave it be.

And with our basic Lambda up we just need our tools and some example code. First the tools:

brew install yarn # Or your OS equivalent: apt-get, yum, etc.

And that's it!

Example Code

For the example code I have forked Chenyong's minimal Shadow + Node example to create a hello world lambda with Shadow:

royalaid / minimal-shadow-cljs-nodejs

shadow-cljs hot code swapping for Node.js

Node.js example for shadow-cljs

Develop

Watch compile with hot reloading:

yarn
yarn shadow-cljs watch app

Start program:

node target/main.js

REPL

Start a REPL connected to current running program, app for the :build-id:

yarn shadow-cljs cljs-repl app

Build

shadow-cljs release app

Compiles to target/main.js.

You may find more configurations on http://doc.shadow-cljs.org/ .

Steps

  • add shadow-cljs.edn to config compilation
  • compile ClojureScript
  • run node target/main.js to start app and connect reload server

License

MIT


git clone https://github.com/royalaid/minimal-shadow-cljs-nodejs.git /tmp/lambda-example
cd /tmp/lambda-example
yarn

and now all of your deps are set up!

Code Breakdown

While the project should be pretty self-explanatory, I will give a high-level overview of the important bits: the shadow-cljs.edn and src/server/main.cljs files.

shadow-cljs.edn

The shadow-cljs.edn is the equivalent of project.clj for lein or build.boot for Boot; it's the build configuration file for Shadow. The most important part of that file for us is the :target :node-library line, which tells Shadow to set up module.exports for use by Node. This is the bridge between the CLJS code we write and the Javascript that Node reads. The second most important line is :exports {:handler server.main/handler}. This line tells Shadow what to call the export and then what CLJS function to bind to it, in our case the handler function in the Clojurescript namespace server.main. This function is what AWS will end up calling when we execute the Lambda.
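Put together, a shadow-cljs.edn with those two lines might look roughly like the sketch below. It is reconstructed from the description above rather than copied from the repository, so the actual file may contain more keys; the app build id and the target/main.js output path are taken from the commands shown earlier.

```clojure
;; shadow-cljs.edn (sketch, reconstructed from the description above)
{:source-paths ["src"]
 :builds
 {:app {:target    :node-library           ; emit a module that Node can require
        :output-to "target/main.js"        ; file that ends up in the Lambda zip
        ;; module.exports.handler is bound to the CLJS function server.main/handler
        :exports   {:handler server.main/handler}}}}
```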

main.cljs

As for src/server/main.cljs, this is what will eventually be run by AWS after Shadow transpiles it for us. The lone function inside the namespace is what actually gets invoked. Important note: the function signature has to have 3 args (2 if using async/await, which we aren't), and the function must "return" its value by invoking the cb arg and passing the result as the second argument of that callback (cb) invocation.

We need to do this because Lambda expects the result to come as a promise or from the callback passed in. This took me waaaaaaaaay too long to piece together, and hopefully this won't trip up anyone else. Additionally we need to be sure to return Javascript values and not Clojurescript values, hence the use of #js for the array.
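A minimal ClojureScript sketch of a handler with that shape follows. It is not the code from the repository: the returned value here is an illustrative JavaScript object, whereas the original example returns a different value.

```clojure
(ns server.main)

(defn handler
  "Entry point that Lambda calls. event and context come from AWS;
  cb is the Node-style callback used to hand back the result."
  [event context cb]
  ;; First argument is the error (nil on success), second is the result.
  ;; #js produces a plain JavaScript object rather than a CLJS map.
  (cb nil #js {:statusCode 200
               :body       "Hello from CLJS!"}))
```

Passing a non-nil first argument to cb instead would report the invocation as failed.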

Build and release time

Note: I am going to skip the process of actually developing the code but check out the earlier linked Shadow + Node example for details and workflow for how to use Shadow with Nodejs and hot-reloading.

cd /tmp/lambda-example # Get you back to the correct directory
yarn shadow-cljs release app # This transpiles and optimizes your CLJS
zip -r archive.zip target # Prep your output for upload to Lambda

Then we just upload our new zip to our old Lambda and point the handler function at our CLJS function (which will be .handler, main.handler in the recording).

🎉🎉🎉

And there we have it! If you have any questions or feedback reach out on Twitter, Mastodon, or @royalaid on the Clojurians Slack or Zulip


3 models of software development as a factory

Summary: Agile software development came from borrowing processes and ideas from manufacturing. Is software engineering like factory work? I examine three metaphors of software engineering as a factory.

Many of the processes we associate with the Agile movement, and certainly many of the ideals, were inspired by manufacturing process management–Lean Manufacturing and the Toyota Production System. Many people criticize the Agile movement on the grounds that software development is very different from manufacturing and so the metaphor might be more harmful than helpful.

Because they believe it is harmful, people have pulled back from the metaphor and only used what seems to be appropriate for teams of programmers. However, do those people know anything about what manufacturing, especially Lean Manufacturing, is even like? How do they know that software engineering on a team is not like building a car?

I’ve been reading books on Agile and the Toyota Production System to really understand what’s happening to my industry. My experience on development teams, and talking to other developers, is that only a few know anything about manufacturing. They only imagine what building a car must be like: turning the same screw on 1,000 different cars every day. My research tells me it’s anything but. And by doing that, they’re basically acting like a bad factory. Maybe it’s time to reexamine this bias.

In this essay, I’d like to really dig deep into the metaphor. I’m asking the question: how *is* software development like a factory? I explore three different interpretations and their consequences.

1. Dev team as the factory

I think the most popular way to imagine a dev team as a factory is to see it as a set of features moving down an assembly line, getting gradually closer to deployment. The feature goes from conception, to design, to scheduling into the sprint, to development, through code review, then testing, and finally deployment.

Ideally, the process always flows forward as each step is completed. This way of rationalizing the process is strangely soothing, especially to a manager who would like to track progress. Number of tickets deployed, or its proxy Velocity, feels like a nice metric to get a feel for how the features are coming along.

Fleshing it out

To flesh out the metaphor, the people working the process are like factory workers. Work comes to their station, and they do it. So a feature gets to QA, QA tests the software, then hits a button for it to proceed to the next step. Each worker is a cog in a greater machine, and no one has to know more than their job.

Consequences

So what are the consequences of this model? From my experience, this model makes the developers, and probably the other folks on the “assembly line”, feel like cogs in a machine. Everyone understands their part, but they can’t appreciate the whole. They are like so many Lucille Balls desperately trying to keep up.

People are rewarded for velocity. They don’t know how each feature fits into the whole. All they know is that it was upstream from them and if they pass it on, they can push that button. And so problems get pushed down the line. I’ve been there and it’s not fun.

The teams get divided by function. That means design is separate from dev is separate from testing, etc. Instead of working together, this metaphor can create animosity between the groups as they blame each other for problems.

2. Software as the factory

This is a metaphor that I have never really seen before, but I think it has some merit. Instead of seeing the team as a factory, you see the software as a factory. If car factories deliver cars, your accounting software delivers accounting. A smoothly running factory means smooth accounting. That means invoices move down an assembly line. Programmers build that assembly line.

Fleshing it out

If our software is the factory, then how do the various roles correspond? Well, at the bottom, invoices and payments are the partially completed cars moving through the assembly line. As an invoice comes in, it is put through a series of steps, like review, recording, and finally payment.

The factory workers are the necessary humans in the accounting loop–usually the customers of the software, but occasionally someone from the software company will have to unjam the works. They work at their stations and review invoices or sign checks.

So what are the programmers, designers, and operations people, then? They’re the process engineers, of course. They are making the machinery that the factory workers use to do accounting.

Consequences

I like this model because it clearly aligns everyone to the job the customer is paying for. Think about it. It’s the process engineer’s job at Toyota to make sure the workers can safely and effectively deliver cars to customers. Wouldn’t it be nice if everyone’s job at an accounting software company was to ensure that accountants could effectively deliver accounting to customers? It’s much better than the job of delivering accounting software features. Customers aren’t paying for features. They want their accounting to be done.

This metaphor also means that every software engineer at the company has to understand accounting. I strongly believe that programmers should become subject matter experts themselves. Doesn’t someone engineering a process for building cars have to know about cars?

Further, it means that the programmer, just like the process engineer, will sometimes go into the factory to help clear things up. This means you’ll see programmers helping customers with accounting issues, like not knowing what to do with a particular invoice. If it’s your job to deliver accounting and your factory’s current machines can’t handle it, you’ve got to do it by hand.

Finally, this metaphor reunites teams. Programmers are not at different stations from designers. They’re all there to make this factory run smoothly and make the work of the customer better. What would an Agile team look like that took this approach?

3. Compiler as the factory

A final metaphor I’d like to explore is the compiler as the factory. In this metaphor, the various software artifacts that get deployed are the product. The compiler, the build tool, etc, those are the factory workers putting the thing together.

This metaphor was inspired by Glenn Vanderburg in a series of talks he has given.

Fleshing it out

Let’s see how the various pieces correspond in this metaphor. The car, our product, is the artifact that runs the software. That could be a binary for a compiled language, or a set of files that will be interpreted by a runtime. The factory workers, and the whole factory, are the compiler and build tools that assemble that artifact. So where do programmers fit in?

Well, the programmers are engineers. They design the artifact their compiler will build using high-level design documents that they call *programs*. That’s right. Coding up a solution is a blueprint for the compiler to use to know what to build. Engineers do tests and experiments to make sure their designs work. And so do programmers!

Consequences

This is a very neat metaphor, which boldly lets us call ourselves *engineers*. Seriously, watch the talk to understand how programmers typically caricature the idea of engineering. I really like that we can be engineers again and not “developers”.

The main problem with the metaphor is this: where do the other disciplines come in? Where does the designer fit? Are they industrial designers, working alongside industrial engineers to design the product?

Another problem: how do programmers get feedback from customers about how the software can be improved? I imagine programmers and designers in an R&D lab, trying out different ideas and doing tests for how well they work with real customers. In this way, they may be more like an IDEO product design team than manual laborers.

Conclusions

We’ve assumed the manufacturing metaphor through the Agile movement, but have we really examined it? I believe the way we build software, under the Agile umbrella, is missing out on some of the best parts of modern factory work. If you look at it the wrong way, programmers are reduced to factory workers, trudging through their mindless work. Measurements of story velocity only incentivize the company to push for more features faster.

But we could be so much more! If you look from a different angle, programmers could be aligned with the business value, and measured on delivery of that, instead of delivery of features that may or may not help the customer.

I hope that exploring different ways of seeing our work can give us ideas for the ruts we’re stuck in. In my experience there is lots of room for improvement. We’re stuck thinking of ourselves as car assemblers when we could be designing cars–or even designing car manufacturing itself. And with these three metaphors, I wonder if there are more practices we can borrow from manufacturing.

The post 3 models of software development as a factory appeared first on LispCast.


Fast starting Clojure AWS Lambdas using GraalVM and a Lambda custom runtime

Clojure is a dynamic Lisp language which is compiled to JVM bytecode for running as a normal JVM application. Starting a Clojure program is pretty slow compared to many other languages. The start-up takes at least one second but depending on program size this could be almost ten seconds.

The JVM itself starts fast, in 50 ms. The slow start-up time is mostly caused by JVM class loading. Unfortunately Clojure generates a lot of classes because every Clojure variable definition and function are compiled to classes. This also applies to anonymous functions, which are quite common in Clojure applications. For example, a Clojure REST API with 2800 lines of code is compiled to over 200 JVM classes. The start-up time of the program is almost eight seconds when running on a 2017 MacBook Pro. The minimum start-up time of one second or more also makes Clojure a bad choice for command-line tools, which are expected to run very fast.

Lambda is an AWS serverless technology which lets users run code without managing servers. Lambdas are paid for only by the consumed compute time, which makes them a very attractive option for rarely used web applications. Lambdas themselves have a start-up time which depends on whether the Lambda instance is cold or warm. When a Lambda is used for the first time, or when about 15 to 30 minutes have passed since its last use, it takes about 600 ms to start the Lambda. After that the Lambda is warm, and subsequent invocations perform much like applications running in normal virtual machines or containers. The performance of a Lambda depends on the memory allocated for it on creation. Allocating more memory also gives more CPU power.

The start-up time is not a problem for Lambdas which are run as cron-like tasks, SQS queue pollers or otherwise where there is no need for quick synchronous responses.

Clojure running in JVM AWS Lambdas

Due to Clojure's start-up time, running it in Lambdas has its penalties. The next table shows statistics for different Hello World applications. The Clojure applications were tested with 1000 MB and 3000 MB of memory to evaluate the effect of memory on Clojure start-up. Other runtimes were tested with 1000 MB of memory. Tests were made using a consumer broadband connection located in Finland against the AWS Ireland region (eu-west-1). The latencies could be better with a better network and a shorter distance to the region. Running the tests inside the AWS region on an EC2 virtual machine would give lower network latencies.

Memory (MB)  Runtime              N     Average (s)  Standard deviation (s)
1000         JVM Java (cold)      1000  0.585        0.108
1000         JVM Java (warm)      5000  0.205        0.063
1000         JVM Clojure (cold)   922   2.942        0.391
1000         JVM Clojure (warm)   4600  0.237        0.133
3000         JVM Clojure (cold)   1000  2.348        0.325
3000         JVM Clojure (warm)   5000  0.228        0.134
1000         Python (cold)        1000  0.594        0.210
1000         Python (warm)        5000  0.358        0.074

The results are quite problematic for Clojure. Using the fastest available Lambda gives a start-up time of 2.34 seconds. The start-up time using a 1000 MB Lambda is nearly three seconds. The long start-up time makes Clojure unusable for, for example, outgoing Slack commands, which have a timeout of three seconds. The slow start-up is recognized in the Clojure community, and it may be partially solved with changes to Clojure itself. Fortunately, a new VM has been created with good results for Clojure.

A new hope emerges - GraalVM and Lambda custom runtime

This year has given two new releases which make the situation better. First, Oracle released a new GraalVM universal virtual machine for running applications written in JavaScript, Python, Ruby, R, JVM-based languages like Java, Scala, Kotlin, Clojure, and LLVM-based languages such as C and C++. GraalVM can ahead-of-time (AOT) compile JVM applications to native binaries which start very fast compared to just-in-time (JIT) compiled programs running in the regular JVM. The memory footprint is also smaller in native images.

The second important release was custom AWS Lambda runtimes, which were introduced at AWS re:Invent 2018. Custom runtimes make it possible to build Lambdas with any technology which can be run on Linux. Support is now available for, for example, Ruby, PHP and Cobol! Before, only JVM, Python, Node.js, C#, Go and PowerShell were supported. With a custom runtime it is possible to compile Clojure as a native GraalVM binary and run it in AWS Lambda.

The custom runtime API exposes its location via environment variables. The API itself contains three different REST methods: one for fetching invocations, one for posting responses and one for reporting errors. The AWS documentation contains a useful tutorial for creating a custom runtime.
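As a rough sketch of what those three methods look like from the runtime’s side, the functions below only build the URLs. The paths and the AWS_LAMBDA_RUNTIME_API variable follow the AWS custom runtime documentation; the function names are made up for illustration, and the HTTP calls themselves are left out.

```clojure
;; Version prefix used in the Lambda Runtime API paths.
(def api-version "2018-06-01")

(defn base-url
  "Builds the base URL from the host:port value that Lambda passes
   in the AWS_LAMBDA_RUNTIME_API environment variable."
  [runtime-api]
  (str "http://" runtime-api "/" api-version "/runtime"))

(defn next-invocation-url
  "GET this URL to fetch the next invocation (the call blocks until one arrives)."
  [runtime-api]
  (str (base-url runtime-api) "/invocation/next"))

(defn response-url
  "POST the handler result here. request-id comes from the
   Lambda-Runtime-Aws-Request-Id header of the invocation."
  [runtime-api request-id]
  (str (base-url runtime-api) "/invocation/" request-id "/response"))

(defn error-url
  "POST error details here when the handler fails."
  [runtime-api request-id]
  (str (base-url runtime-api) "/invocation/" request-id "/error"))

;; In a real runtime the host would come from the environment:
;; (System/getenv "AWS_LAMBDA_RUNTIME_API")
(next-invocation-url "127.0.0.1:9001")
;; => "http://127.0.0.1:9001/2018-06-01/runtime/invocation/next"
```

A runtime is then a loop: fetch the next invocation, run the handler, post the result or the error, and repeat.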

Results using GraalVM

I made a simple custom runtime for running a Clojure program. Compiling it was quite simple but required using a Docker container, because the binary must be compiled in a Linux environment. Compilation times are long compared to plain JVM compilation, which must still be done before compiling the native image. The results are described in the next table.

Memory (MB)  Runtime                  N     Average (s)  Standard deviation (s)
1000         GraalVM Clojure (cold)   1000  0.624        0.202
1000         GraalVM Clojure (warm)   5000  0.202        0.068
1000         JVM Java (cold)          1000  0.585        0.108
1000         JVM Java (warm)          5000  0.205        0.063
1000         JVM Clojure (cold)       922   2.942        0.391
1000         JVM Clojure (warm)       4600  0.237        0.133

GraalVM makes Clojure run excellently in the Lambda environment. The cold start time is comparable to Java on the regular JVM. A quite interesting result is the warm performance, which is better with GraalVM than with Clojure on the regular JVM.

Limitations of GraalVM

GraalVM currently has some problems compiling native images. For example, no instances are allowed in the image heap for a class that is initialized or reinitialized at image runtime. Such classes must be given as parameters, which is cumbersome. The test program contained SSL libraries, which caused compilation problems. Also, certain libraries cannot currently be compiled. The Apache HTTP client, which is used for example by the Clojure clj-http library, is one of them. The compilation problem does not seem to be Clojure specific, so it should be fixed in future versions of GraalVM. GraalVM added HTTPS protocol support to native-image in version 1.0.0-rc7, but it still has limitations. First, the provided certificate store has only a limited set of CA certificates and second, you must configure the path to libsunec.so (the Sun Elliptic Curve crypto library). GraalVM tries to load the library from the current directory or from java.library.path when it is first used. You can work around these limitations as follows:

  1. Copy the certificate store, or make a symbolic link to it, from, e.g., your distribution’s OpenJDK to your GraalVM installation. The certificate store is usually located in the file $JDK_HOME/jre/lib/security/cacerts.
  2. Configure java.library.path to include the directory of the library libsunec.so (on Linux this is $GRAALVM_HOME/jre/lib/amd64/), or copy the library file to the working directory.

The runtime performance of native images is slightly worse than that of the regular JVM with the HotSpot compiler. GraalVM is still in the release candidate phase for version 1.0, so the situation may change in the future.

The Java HotSpot VM is a battle-tested technology compared to GraalVM, which is a relatively young invention. How stable GraalVM is compared to the Java HotSpot VM is not known yet. HotSpot is also able to aggressively compile the most used code paths at runtime, unlike a GraalVM native image. Of course, this is not a very big advantage for Lambdas, which may have relatively short lifetimes.

What about ClojureScript?

Instead of using Clojure, we could use ClojureScript, which is compiled to JavaScript. This makes it possible to run it in the Node.js runtime. Start-up time is about the same as with native JavaScript, but the tooling is currently poorer because ClojureScript has traditionally targeted the browser environment. If Clojure becomes unviable in AWS, perhaps ClojureScript will be the Lisp family’s choice for cloud native compilation in the future?

Conclusion

Lambda and the other serverless technologies are most likely to be very important parts of any software product running in a cloud in the future. To be competitive in an enterprise environment, we must fix Clojure’s slow start-up time when running Lambdas. GraalVM seems to fix this by allowing Clojure programs to be compiled to native binaries. The future looks good for Lisp users in the AWS cloud.

Source for tests: https://github.com/hjhamala/graalvm-clojure-lambda-tests


SQL NULL, S/nilable, and Optionality

Rich Hickey gave a very thought-provoking talk at Clojure/conj 2018 called Maybe Not, where he mused on optionality and how we represent the absence of a value.

His talk covered many things, including how clojure.spec/keys currently complects both structure and optionality (and his thoughts on fixing that in a future version of clojure.spec), but his mention of s/nilable was what triggered an “ah-ha!” moment for me.

At World Singles Networks, we deal with a lot of data in SQL (specifically in Percona’s fork of MySQL) and, in SQL, you represent the absence of a value with NULL in a column. Columns that represent optional data must be declared as nullable and when you read data from them with clojure.java.jdbc you get hash map entries in the rows that have nil values. If you’re using clojure.spec to describe your tables, rows, and columns, then you are going to have lots of s/nilable specs – and now your “optionality” has been reified into nil values, cast in the stone of your specs… which is clearly not an ideal situation!
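To make the difference concrete, here is a small sketch of the two styles using clojure.spec as it exists today. The spec names and the :nickname column are hypothetical, chosen only for illustration; they do not come from any real schema.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::id pos-int?)

;; Style 1: the key is always present and its value may be nil,
;; which is what a nullable column read via java.jdbc gives you.
(s/def :nilable/nickname (s/nilable string?))
(s/def :nilable/row (s/keys :req-un [::id :nilable/nickname]))

;; Style 2: the key is simply absent when there is no value.
(s/def :optional/nickname string?)
(s/def :optional/row (s/keys :req-un [::id] :opt-un [:optional/nickname]))

(s/valid? :nilable/row {:id 1 :nickname nil})   ;; => true
(s/valid? :nilable/row {:id 1})                 ;; => false
(s/valid? :optional/row {:id 1})                ;; => true
(s/valid? :optional/row {:id 1 :nickname nil})  ;; => false
```

In the second style a nickname, when present, is always a string, so code that consumes the row never has to check for nil.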

This made me realize that java.jdbc probably should just omit keys whose values represent SQL NULL. They are, after all, optional values rather than truly nilable values.

That would be a potentially breaking change in behavior for java.jdbc users. Sure, in most cases, if you have a hash map representing a row in a database table, you’re not really going to care whether (:col row) gives you nil because :col maps to nil or because row doesn’t contain :col. But there are use cases where it matters: contains?, row/column specs, tabular printing.
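A minimal sketch of the difference, using a hypothetical helper (omit-nil-vals is not part of clojure.java.jdbc) and a made-up row:

```clojure
;; Hypothetical helper for illustration only.
(defn omit-nil-vals
  "Returns row without the entries whose value is nil."
  [row]
  (into {} (remove (comp nil? val)) row))

;; A row as java.jdbc returns it today, with a NULL nickname column.
(def row {:id 1 :name "Alice" :nickname nil})

;; The same row with the NULL column omitted.
(def sparse-row (omit-nil-vals row))
;; => {:id 1, :name "Alice"}

;; Keyword lookup behaves the same either way...
(:nickname row)                   ;; => nil
(:nickname sparse-row)            ;; => nil

;; ...but contains? can tell the two rows apart.
(contains? row :nickname)         ;; => true
(contains? sparse-row :nickname)  ;; => false
```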

Along with changing the behavior of NULL columns and supporting datafy and nav, I have a lot of other changes that I’d like to apply to java.jdbc, such as automatically qualifying column keys with the table from which they came, improving overall performance (by no longer converting ResultSet objects to sequences of hash maps), dramatically simplifying and streamlining the options that are available (since many of them are very rarely used), and focusing on a reducible-first API. All of which would be breaking changes.

I’ve learned a lot – about Clojure, idioms, and databases – over the seven years that I’ve been maintaining org.clojure/java.jdbc, and it is time for a new namespace or perhaps even a completely new project, that offers a better way to deal with SQL databases from Clojure! I’ll be writing a series of blog posts about the differences I envisage between the current de facto standard JDBC wrapper and where I’d like to go with this, so that I can get community feedback on what should stay, what should change, and what should go. Stay tuned!


How do you create a semantic base layer?

In stratified design, we are looking for layers of meaning, each one implemented on top of the last. But how do you go about building those in an existing codebase? While it remains more of an exploration than a step-by-step method, we can still describe some techniques that help find them. In this episode, I talk about four of them.

Transcript

Eric Normand: How do you begin to turn an existing code base into a stratified design?

Hi, my name is Eric Normand, and these are my thoughts on functional programming. In a previous episode, I talked about stratified design and how it’s a design characterized by different layers — meaning semantic layers — one built on top of the other. This was a good way to design and structure your application.

It’s a way that suggests a good structure to your code, like what code goes in what module, things like that. It’s also good architecturally. It puts things that change frequently together and things that change seldom together, which is another good thing architecturally. I had a question. A nice listener posed a very good question, which was the following.

“How do I begin to come up with these base layers?” The base layer is the bottom layer if you think of it like a pyramid. Start at the bottom. The ground is the programming language layer. It’s all the stuff the programming language gives you: functions, objects, data types, the mathematical operations, all that stuff that you get in the language layer.

Then the base layer. What he’s referring to is what’s defined directly on top of that, which is a thin layer of semantic meaning on top of that language layer. It’s thin, meaning it doesn’t do much. It doesn’t provide much functionality. Because it doesn’t provide much functionality, it can be really solid.

Then you build another layer on top, another layer on top, and another layer on top until you’re at the layer that changes the most. It’s up at the top. It’s short because it’s built out of more powerful pieces underneath. You don’t have a lot of code that’s changing. What’s at the bottom? It changes very, very seldom. Maybe you’ll add to it, but you shouldn’t have to change the things that are there.

Now, just as a really simple example, in your software, you might have to deal with email addresses. You might have these operations on email addresses that are universal, like, they are timeless. Email addresses don’t change. It’s a standard. It’s going to be stuff like pull out the domain part.

You know how you can add pluses to your email address? Well, you might want to have a thing that can remove that to canonicalize it. Maybe there’s a lowercase operation you do. These are very standard email address-specific operations.

Your email address is going to be a string. That’s a language layer. On top of that, you’re going to build these email address-specific operations out of string operations. It might use regex or toLowerCase or whatever functions you have in your language.
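Sketched in Clojure, such a thin email layer on top of plain strings might look like the following. The function names are made up for illustration, and the input is assumed to be a well-formed address.

```clojure
(require '[clojure.string :as str])

(defn email-domain
  "Returns the part after the @ sign."
  [email]
  (second (str/split email #"@" 2)))

(defn remove-plus-tag
  "Drops a +tag from the local part: a+b@x.com becomes a@x.com."
  [email]
  (str/replace email #"\+[^@]*@" "@"))

(defn canonicalize-email
  "Trims, lowercases and removes any +tag."
  [email]
  (-> email str/trim str/lower-case remove-plus-tag))

(email-domain "eric@lispcast.com")
;; => "lispcast.com"
(canonicalize-email " Eric+News@LispCast.com ")
;; => "eric@lispcast.com"
```

Everything here is built directly out of string and regex functions from the language layer, and nothing in it knows about logins or users.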

Then on top of this email address layer, you’re going to add the stuff that you need to do with the email addresses as your software’s domain. What are you doing with them? That’s their user ID for logging in and whatever you’re doing with that.

That’s going to change a lot more frequently than these email address operations. Those email address operations could probably be useful as a library that would be shared across multiple applications in multiple domains, because email address is such a common data type in software in general.

That’s what you’re looking for. Something that is so solid and universal that you can build the whole application on top of it and never touch it. Like I said, you might add to it because you might realize we’re missing something, but you’re not going to change the thing that finds the domain. Once you get that right, it’s done.

How do you go about finding those things in your code? Email addresses are easy because they are a standard already. You’re looking for it in something that maybe hasn’t been done before. You’re looking for this semantic layer in the domain-specific concepts.

Without knowing the specifics of the code, it might be kind of hard to give general advice, but I’m going to try. I’m going to give the advice that I would give knowing [laughs] almost nothing about this software.

I’m also assuming that this is existing working software that has been made by a typical process where you weren’t thinking about layers while you were writing it.

My first go-to refactoring, for when I don’t know what other refactoring to do, when it’s not clear and there isn’t enough semantic information to figure out what to do next, is this: I shorten functions.

If I have a function that has 10 lines in it, I try to extract out smaller functions from within it. Try to come up with good names for them and make that original function shorter. Instead of having, and I’ll give an example, you might have a function that’s a reduce and it has an anonymous function in it, then it’s got the data that it’s reducing over.

I would take that anonymous function and I would pull it out and name it, like at the top level. That name, trying to come up with like, “What is the meaningful operation here. What level of semantics is it at?”

Say I have a list of employees, and so now this reduce operation is taking the employees and doing something to them. Is it treating them like employees? Or is it treating them like data? What’s going on?

I’m trying to name it and I’m just coming up with random examples, but this is summing up all the salaries of all the employees so I know how much I need to pay them this month.

I do a reduce and then I’m like, “Oh, wait, inside the reduce I’m pulling out the salary of each employee.” That reduce function, the function I just pulled out, it might have 15 lines by itself. I go in and I say, “What are the things, what’s going on? Can I pull things out and name them?”
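As a small illustration of that first refactoring, here is the salary example with the anonymous function pulled out and named. The employee data and the function names are invented for the sketch.

```clojure
;; Made-up data for illustration.
(def employees
  [{:name "Ann" :salary 5000}
   {:name "Bob" :salary 4000}])

;; Before: the meaning is hidden inside an anonymous function.
(defn total-payroll-v1 [employees]
  (reduce (fn [total employee] (+ total (:salary employee)))
          0
          employees))

;; After: the step is pulled out to the top level and given a name.
(defn add-salary
  "Adds one employee's salary to a running total."
  [total employee]
  (+ total (:salary employee)))

(defn total-payroll-v2 [employees]
  (reduce add-salary 0 employees))

(total-payroll-v1 employees) ;; => 9000
(total-payroll-v2 employees) ;; => 9000
```

Naming the step also makes it visible that add-salary does two things at once: it reads a salary out of an employee and it adds money.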

This is my go-to refactoring in general when things need to be cleaned up. If I don’t know specifically what to do. It’s usually a good thing because there’s a lot of mess hidden inside big functions.

What I’m trying to do is find all the different layers. I’m trying to find the different operations that happen at different levels of meaning by pulling them out. If I’ve got a one or two line function, I know that’s pretty succinct. It’s probably just one layer more than what it’s built on.

If I have 5 to 10 lines, it’s probably skipping layers and there’s a layer in between that I could be building on. This is what to do with existing code where you don’t know where to begin. I think it’s a good place to begin. It’ll help clarify your code in general, even if you never arrive at some kind of solid base layer that is unchanging forever. Like a universal base layer.

It’s good in general. It might lead you to some understanding. The next thing is, I would be trying to find monoids. That’s just a thing I have. I try to find monoids. Why monoids? There’s a thing about monoids, which is that they’re binary operations. A monoid takes two entities of the same type, two values of the same type, and it returns a value of the same type.

What that means in this discussion is that you’re staying at the same semantic level. If I’m renting something from a car dealership, I take…That’s a bad example. Let’s say I want to combine two discounts. I’m going to have a sale and I need to represent the discount that you get if you’re in the sale.

I combine two discounts. How do they combine? There’s 10 percent off and another 10 percent off. Do I add them? I don’t want to treat it just like a number, because what if you multiply them instead of add them?

A discount, you can start to think of it like, “OK. This is a real semantic thing, because I could have 10 percent off. I could have a fixed amount off, a constant amount like $10 off.” I have this operation where I realize I’m adding two numbers. What I really want to do is combine two discounts.

I’m taking two discounts and returning a new discount. I’m looking for things like that. Usually, they happen with reduces, like that reduction where I’m adding the salaries of all the employees. Maybe I don’t want to be, inside the reduction function, pulling out the salary and adding it to the accumulator value.

What I really want to be doing is taking two salaries and combining them into a new thing. Maybe I wouldn’t consider it a salary. I would consider it an amount of money. It’s just a quantity of money, number of dollars, something like that. I combine those two.

It’s addition, but I get a new quantity of money out. Then it’s not just a reduce. It’s a map over the employees, a map converting all these employees into their salaries. A list of employees into a list of salaries, and then I reduce over that.

I’ve extracted out that part of the reduce function that was doing two things. It was both adding the salaries and extracting it out. Now, I have a monoid. It takes two sums of money and it gives you a new sum of money.
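A sketch of that split, with invented names and data: the map turns employees into amounts of money, and the reduce only ever combines money with money.

```clojure
;; Made-up data for illustration.
(def employees
  [{:name "Ann" :salary 5000}
   {:name "Bob" :salary 4000}])

;; The monoid: two amounts of money in, one amount of money out.
(defn combine-money [a b]
  (+ a b))

;; The identity element: combining with it changes nothing.
(def no-money 0)

(defn total-payroll
  "Maps employees to salaries, then reduces with the money monoid."
  [employees]
  (reduce combine-money no-money (map :salary employees)))

(total-payroll employees) ;; => 9000
(total-payroll [])        ;; => 0
```

combine-money knows nothing about employees, which is what makes it a candidate for a lower, more stable layer.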

This is an operation that is a candidate for something that is solid. Now, you can have a library. Think of it like, “I have a library of money operations that my accounting department can start to use. I have a library of sale operations.”

That’s stuff like combining two discounts. Maybe they don’t combine. They don’t add. Maybe when you combine two discounts, it just chooses the greater one. If I have a coupon and there’s a sale going on, you don’t want to give 50 percent off of something.

The 30 percent off coupon trumps the 20 percent sale. That might be a company policy. Instead of doing an addition, which is what you had before, you should be calling this other operation because it gives you a place to define the semantics of that operation.
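A minimal sketch of that policy, assuming discounts are represented as plain percentages (a simplification; the names are made up):

```clojure
(defn combine-discounts
  "Company policy in this sketch: discounts do not add up,
   the greater one wins."
  [a b]
  (max a b))

;; Identity element: no discount at all.
(def no-discount 0)

(combine-discounts 20 30)
;; => 30
(reduce combine-discounts no-discount [20 30 10])
;; => 30
```

If the policy ever changes to adding the discounts, only combine-discounts changes and none of its callers do.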

I’m going to recap. Number one was pull out smaller bits of functions to make your functions shorter. You start to get a whole bunch of functions and now you want to organize. Now, you organize them along the dependency lines. That’s going to be number three.

The whole idea is refactor big functions into lots of smaller functions. That’s one. Number two is look for monoids, because monoids are operations that stay, by definition, at a particular semantic level. This is a good number three, too.

Also, monoids, that is, they’re usually combiners, combining operations. The thing about combining operations that makes them nice is that they’re the most complicated kind of operation you’ll find.

[drum sounds]

Eric: There’s a marching band in the street. You want to do those first. Since they are the most complicated, they’re the ones that are going to require you to model the data you need most specifically.

If you do your easy operations first, which is what most people do — they leave the hard operations until later — what happens is they finally get to those hard operations and they realize they don’t have the data they need for them.

I have a whole episode about this with a really good example from real software that I’ve used, so I’m not going to go into it anymore. Go listen to that, or just imagine missing data because you didn’t think about it until later.

The thing you’re trying to do is to find these combining operations and define them. They might not be monoids. You might combine two cars into a fleet. Then, of course, you have fleet combining. That’s a cool operation.

You might have operations that aren’t really combining. They’re not monoids but they are combining, meaning they’re returning a different thing. They might have three types but they’re still combiners. Those are good things to focus on at first.

When you find them, they will help you flesh out the semantic entity that you need to be focused on. Then the other operations become easy around them. You can do that same refactoring we talked about in number one where you pull them out.

This will be number four if I can remember it now. It’s something I was thinking about coming back to — the directionality of dependencies. As you are pulling these things out and let’s say you leave them all in the same module, which is fine, this module starts to get bigger and bigger.

What you should be noticing, what you should be looking for is what is depending on what. Function A calls function B. That means A depends on B. If A depends on B, there’s two possible choices.

One is A and B are in the same semantic level, absolutely the same. Or A is in a higher semantic level from B. You got to use your judgment here and you shouldn’t just look at two. You should look at them all as a whole. Start pulling this apart, but you should be looking for a directionality. If A calls B, and B calls C, and C calls D, almost certainly A and D are not in the same semantic level.

This could happen a lot where you’re doing something like a map over an entity. By calling map…Map is a sequence operation, so you are treating this entity like a sequence. You’re probably skipping layers. It is a smell.

Why am I doing map here? Now, you might be doing map because you have something like a sale has a collection of vehicles that are in the sale. Collection, that’s fine. You’re going to do a map at some point, because that’s part of the semantics of the sale.

In general, that’s what you’re looking for. You’re looking for stuff that like, “Wait. Why is it that A calls B, B calls C, C calls D, and then D calls B? Is it because it just happened to have the same shape of data? Maybe that shouldn’t happen.

Or why is it that B calls C, B calls D, and C calls D? Maybe I need to create more of a hierarchy between these things. Maybe D is at a lower level than both B and C, but maybe B has skipped a level. Maybe there is an operation in C.”

It’s not strict. It’s not that a layer can only call stuff one layer below it. That smells. It’s guiding your nose, guiding you through the discovery of these layers. You should be able to graph it.

You should be able to say like, “If you graphed them, if you graph the dependencies in something like Graphviz and you let its algorithm bubble it up like a tree, you should be able to see these layers.”

You should be able to see, “OK. These five operations at the top, these are the highest level operations, and then they call this other layer that goes in here and this other layer. Then at the bottom, at the leaves, are these operations that just call, like, basic language things.” You should be able to see that.
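One rough way to compute such layers without drawing anything is sketched below. The call graph is hand-written, hypothetical data (it matches the B calls C, B calls D, C calls D shape from above), and the sketch assumes there are no cycles.

```clojure
;; Hypothetical call graph: each function maps to the set of functions it calls.
(def calls
  {:a #{:b}
   :b #{:c :d}
   :c #{:d}
   :d #{}})

(defn layer
  "Functions that call nothing from the graph are at layer 0. Any other
   function sits one layer above its highest callee.
   Assumes the graph has no cycles."
  [calls f]
  (let [callees (get calls f)]
    (if (empty? callees)
      0
      (inc (apply max (map #(layer calls %) callees))))))

(defn layers
  "Groups all functions in the graph by their layer number."
  [calls]
  (group-by #(layer calls %) (keys calls)))

(layer calls :a) ;; => 3
(layer calls :b) ;; => 2
```

Here :b sits at layer 2 yet calls :d at layer 0 directly, which is the kind of skipped layer worth a second look.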

I’ve never done that, actually visualized it with a graph visualization algorithm. That might be an interesting thing to do, but you’re always looking to say like, “Relative to these other things that are in its graph, where does it belong? Does it belong with this other thing? Does it belong on this layer or that layer?”

It’s not very specific advice but I think that that’s what I do. Now, the last thing I want to say is I do have a talk called “Building Composable Abstractions” that basically tries to approach this if you got a greenfield project, a greenfield abstraction.

It tries to come from the other direction and do a lot more upfront thinking about how to build this abstraction. I’ll briefly talk about it. I don’t want to go too deep into it because I have a whole hour-long talk that was already condensed from the hours and hours that I could talk about it.

The idea is you pick some core concept in your domain, in your app. If it’s car sales, it might be the sale, the promotion that you’re running. Let’s say it is. Like on a white board, you write down all the operations.

When I say pick the sale, what I mean is you pick that concept and you really develop the metaphor for it. Develop it in your mind. Think about what is this like. This sale is like if you went to a clothing store and they had a sale.

Before the sale starts, you go around and you put a ticket like a red sticker on every sale item. That red sticker means it’s 10 percent off. Then you go through and put a blue sticker on it, and that means 20 percent off.

Boom. Automatically, you have a picture in your mind of what a sale looks like. You’re basically going through all your inventory and you’re tagging cars with what discount they’re going to get.

That’s just one possible way to do a sale. I just made that up. Your sale might be different. It might be, “If your name starts with an F, you get 50 percent off. But if your name starts with a J, you get…” Whatever you want to do. It’s up to you. It’s up to your business.

You need to have that in mind before you go into the next step, because that metaphor is going to give you answers. Then you go through and you figure out what the operations are on the sale.

In my case, the operations would be tagging. Given a car, I give it a tag, which represents a discount. I’m also going to need a representation for discount, what the colored tag means.

I’m going to have maybe something that maps blue means 20 percent, green means 30 percent, something like that. I’ll have that also as a concept. You go through and you find all the operations and you start with the combining operations.

Combining operation in a sale might be something like if you have the sale, you want to add a new car to that sale. Not all the cars are on sale but you add cars to the sale. How do you do that?

Do you add one car at a time? Do you add, “All the 2018 cars are now on sale.”? You come up with that operation, and these combining operations inform us.

Now, this is an iterative process, so you might not get it right the first time. You do that and you try to figure out what these operations are. If you have to, you start over and you find different operations.

Then you take those operations and you implement them as functions on data. Then you can test it out. It’s not implement. It’s, let’s say, model it with code. [laughs] This isn’t the final implementation.
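A sketch of what modeling the sticker metaphor in memory could look like. The representation (a sale as a map from car id to tag color) and all names are assumptions made for illustration; the percentages are the ones from the sticker example.

```clojure
;; What each sticker color means, as a percentage off.
(def tag->discount {:red 10 :blue 20 :green 30})

;; A sale is modeled as a map from car id to tag color.
(def empty-sale {})

(defn tag-car
  "Puts a colored tag on a car, adding it to the sale."
  [sale car-id tag]
  (assoc sale car-id tag))

(defn combine-sales
  "Combining operation: two sales in, one sale out.
   In this sketch the tags of the second sale win on conflict."
  [a b]
  (merge a b))

(defn discount-for
  "Percentage off for a car, or 0 when the car is not in the sale."
  [sale car-id]
  (get tag->discount (get sale car-id) 0))

(def sale
  (-> empty-sale
      (tag-car "car-1" :red)
      (tag-car "car-2" :blue)))

(discount-for sale "car-1") ;; => 10
(discount-for sale "car-3") ;; => 0
```

Because it is only data and functions, it is cheap to play with at the REPL and to throw away if the metaphor turns out to be wrong.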

The final implementation is going to involve the database and Ajax requests and stuff like that. This is just model it in memory with code, so you can play with it, maybe visualize it. Then step four is implement it once you’ve worked out all the kinks in it.

Watch the talk. It’s much better than what I’ve just done right now. It’s called “Building Composable Abstractions.” I’ve given it both in Clojure at the Conj, the Clojure conference, and at OSCON in JavaScript, a functional JavaScript if that’s what you’re into.

They’re both on my site, LispCast.com. You can find them there. I hope this answered the question. I hope it wasn’t too ranty and rambly. It probably was. I apologize, but this is a deep topic, so it had to.

My name is Eric Normand. This has been my thought on functional programming. If you want to get in touch with me, ask me more questions. I love it. I love getting questions. I’m at the audience size now, where I feel like I’m getting a regular stream of questions and I really appreciate that.

Get in touch with me. I’m on Twitter, @ericnormand. You can also email me. Probably better for questions, eric@lispcast.com. LispCast, L-I-S-P-C-A-S-T. You can also find me on LinkedIn if that’s your bag. See you later. Bye.

Transcript

Eric Normand: How do you begin to turn an existing code base into a stratified design?

Hi, my name is Eric Normand, and these are my thoughts on functional programming. In a previous episode, I talked about stratified design and how it’s a design characterized by different layers — meaning semantic layers — one built on top of the other. This was a good way to design and structure your application.

It’s a way that suggests a good structure to your code, like what code goes in what module, things like that. It’s also good architecturally. It puts things that change frequently together and things that change seldom together, which is another good thing architecturally. I had a question. A nice listener posed a very good question, which was the following.

“How do I begin to come up with these base layers?” The base layer is the bottom layer if you think of it like a pyramid. Start at the bottom. The ground is the programming language layer. All the stuff the programming layer, which gives you functions, objects, data types, the mathematical operations — all that stuff that you get in the language layer.

Then the base layer. What he’s referring to is what’s defined directly on top of that, which is a thin layer of semantic meaning on top of that language layer. It’s thin, meaning it doesn’t do much. It doesn’t provide much functionality. Because it doesn’t provide much functionality, it can be really solid.

Then you build another layer on top, another layer on top, and another layer on top until you’re at the layer that changes the most. It’s up at the top. It’s short because it’s built out of more powerful pieces underneath. You don’t have a lot of code that’s changing. What’s up with the bottom? It changes very, very seldom. Maybe you’ll add to it but you shouldn’t have to change the things.

Now, just as a really simple example, in your software, you might have to deal with email addresses. You might not have these operations on email addresses that are universal like they are timeless. Email addresses don’t change. It’s a standard. It’s going to be stuff like pull out the domain part.

You know how you can add plusses to your email address while you might want to have a thing that can remove that to canonicalize it. Maybe there’s a lowercase operation you do. These are very standard email address-specific operations.

Your email address is going to be a string. That’s a language layer. On top of that, you’re going to build these email address-specific operations out of string operations. It might use ReAjax or two lowercase or whatever functions you have in your language.

Then on top of this email address layer, you’re going to add the stuff that you need to do with the email addresses as your software’s domain. What are you doing with them? That’s their user ID for logging in and whatever you’re doing with that.

That’s going to change a lot more frequently than these email address operations. Those email address operations could probably be useful as a library that would be shared across multiple applications in multiple domains, because email address is such a common data type in software in general.

That’s what you’re looking for. Something that is so solid and universal that you can build the whole application on top of it and never touch it. Like I said, you might add to it because you might realize we’re missing something, but you’re not going to change the thing that finds the domain. Once you get that right, it’s done.

How do you go about finding those things in your code? Email addresses is easy because it is a standard already. You’re looking for it in something that maybe hasn’t been done before. You’re looking for this semantic layer in the domains specific concept.

Without knowing the specifics of the code, it might be kind of hard to give general advice, but I’m going to try. I’m going to give the advice that I would do not knowing, [laughs] almost nothing about this software.

I’m also assuming that this is existing working software that has been made by a typical process where you weren’t thinking about layers while you were writing it.

My first go to refactoring for when I don’t know what other refactoring to do, it’s not clear, there isn’t enough semantic information to figure out what to do next. What I do is I shorten functions.

If I have a function that has 10 lines in it, I try to extract out smaller functions from within it. Try to come up with good names for them and make that original function shorter. Instead of having, and I’ll give an example, you might have a function that’s a reduce and it has an anonymous function in it, then it’s got the data that it’s reducing over.

I would take that anonymous function and I would pull it out and name it, like at the top level. For that name, I’m trying to come up with, “What is the meaningful operation here? What level of semantics is it at?”

Say I have a list of employees, and now this reduce operation is taking the employees and doing something to them. Is it treating them like employees? Or is it treating them like data? What’s going on?

I’m trying to name it and I’m just coming up with random examples, but this is summing up all the salaries of all the employees so I know how much I need to pay them this month.

I do a reduce and then I’m like, “Oh, wait, inside the reduce I’m pulling out the salary of each employee.” That reduce function, the function I just pulled out, might have 15 lines by itself. I go in and I say, “What are the things? What’s going on? Can I pull things out and name them?”
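To make that first refactoring concrete, here is a minimal Python sketch. The employee records, the `salary` field, and all the function names are made up for illustration; they don’t come from any real codebase.

```python
from functools import reduce

# Hypothetical data: each employee is a plain dict with a "salary" field.
employees = [
    {"name": "Ann", "salary": 5000},
    {"name": "Bob", "salary": 4200},
    {"name": "Cy", "salary": 3800},
]


# Before: the reducing step is an anonymous function buried in the reduce.
def monthly_payroll_before(staff):
    return reduce(lambda total, e: total + e["salary"], staff, 0)


# After: the anonymous function is pulled out and named at the top level.
def add_salary(total, employee):
    """Add one employee's salary to a running total."""
    return total + employee["salary"]


def monthly_payroll(staff):
    """Sum the salaries of everyone on staff."""
    return reduce(add_salary, staff, 0)
```

Once the step has a name, it’s easier to see that `add_salary` is still doing two jobs: pulling out the salary and adding it.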

This is my go-to refactoring in general when things need to be cleaned up. If I don’t know specifically what to do. It’s usually a good thing because there’s a lot of mess hidden inside big functions.

What I’m trying to do is find all the different layers. I’m trying to find the different operations that happen at different levels of meaning by pulling them out. If I’ve got a one or two line function, I know that’s pretty succinct. It’s probably just one layer more than what it’s built on.

If I have 5 to 10 lines, it’s probably skipping layers and there’s a layer in between that I could be building on. This is what to do with existing code where you don’t know where to begin. I think it’s a good place to begin. It’ll help clarify your code in general, even if you never arrive at some kind of solid base layer that is unchanging forever. Like a universal base layer.

It’s good in general. It might lead you to some understanding. The next thing is, I would be trying to find monoids. That’s just a thing I have. I try to find monoids. Why monoids? There’s a thing about monoids, which is that they’re binary operations. A monoid takes two entities of the same type, two values of the same type, and it returns a value of the same type.

What that means in this discussion is that you’re staying at the same semantic level. If I’m renting something from a car dealership, I take…That’s a bad example. Let’s say I want to combine two discounts. I’m going to have a sale and I need to represent the discount that you get if you’re in the sale.

I combine two discounts. How do they combine? There’s 10 percent off and another 10 percent off. Do I add them? I don’t want to treat it just like a number, because what if you multiply them instead of add them?

A discount, you can start to think of it like, “OK. This is a real semantic thing, because I could have 10 percent off. I could have a fixed amount off, a constant amount like $10 off.” I have this operation where I realize I’m adding two numbers. What I really want to do is combine two discounts.

I’m taking two discounts and returning a new discount. I’m looking for things like that. Usually, they happen with reduces, like that reduction where I’m adding salaries of all the employees. Maybe I don’t want to be, inside the reduction function, pulling out the salary and adding it to the accumulator value.

What I really want to be doing is taking two salaries and combining them into a new thing. Maybe I wouldn’t consider it a salary. I would consider it an amount of money. It’s just a quantity of money, number of dollars, something like that. I combine those two.

It’s addition, but I get a new quantity of money out. Then it’s not just a reduce. It’s a map over the employees, a map converting all these employees into their salaries. A list of employees into a list of salaries, and then I reduce over that.

I’ve extracted out that part of the reduce function that was doing two things. It was both adding the salaries and extracting them. Now, I have a monoid. It takes two sums of money and it gives you a new sum of money.
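Here is one way that split might look in Python, as a sketch only. The `Money` type, the `salary` field, and the function names are assumptions for the example. Strictly speaking, a monoid also needs an associative operation and an identity value; the sketch uses zero dollars as the identity.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Money:
    """A quantity of money in whole dollars (hypothetical type)."""
    dollars: int


ZERO = Money(0)  # identity value: combining with ZERO changes nothing


def combine_money(a, b):
    """Money x Money -> Money. Stays at the 'money' semantic level."""
    return Money(a.dollars + b.dollars)


def salary_of(employee):
    """Employee -> Money. The extraction step, now on its own."""
    return Money(employee["salary"])


def total_payroll(staff):
    """Map employees to salaries, then reduce with the monoid."""
    salaries = map(salary_of, staff)
    return reduce(combine_money, salaries, ZERO)
```

`combine_money` knows nothing about employees, so it’s a candidate for that library of money operations.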

This is an operation that is a candidate for something that is solid. Now, you can have a library. Think of it like, “I have a library of money operations that my accounting department can start to use. I have a library of sale operations.”

That’s stuff like combining two discounts. Maybe they don’t combine. They don’t add. Maybe when you combine two discounts, it just chooses the greater one. If I have a coupon and there’s a sale going on, you don’t want to give 50 percent off of something.

The 30 percent off coupon trumps the 20 percent sale. That might be a company policy. Instead of doing an addition, which is what you had before, you should be calling this other operation because it gives you a place to define the semantics of that operation.
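A small Python sketch of that idea, assuming a made-up `Discount` type that only handles percent-off discounts and a “bigger discount wins” policy. Both are illustrations, not anyone’s real business rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discount:
    """A percent-off discount (hypothetical type; no fixed-amount case)."""
    percent_off: int


NO_DISCOUNT = Discount(0)


def combine_discounts(a, b):
    """Discount x Discount -> Discount.

    Assumed policy: discounts don't add, the larger one wins.
    If the policy changes, this is the one place to change it.
    """
    return a if a.percent_off >= b.percent_off else b
```

Callers write `combine_discounts(coupon, sale)` instead of `coupon + sale`, so the meaning of combining lives in one named operation.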

I’m going to recap. Number one was pull out smaller bits of functions to make your functions shorter. You start to get a whole bunch of functions and now you want to organize. Now, you organize them along the dependency lines. That’s going to be number three.

The whole idea is refactor big functions into lots of smaller functions. That’s one. Number two is look for monoids, because monoids are operations that stay, by definition, at a particular semantic level. This is a good number three, too.

Also, monoids are usually combiners, combining operations. The thing about combining operations that makes them nice is that they’re the most complicated kind of operation you’ll find.

[drum sounds]

Eric: There’s a marching band in the street. You want to do those first. Since they are the most complicated, they’re the ones that are going to require you to model the data you need most specifically.

If you do your easy operations first, which is what most people do — they leave the hard operations until later — what happens is they finally get to those hard operations and they realize they don’t have the data they need for them.

I have a whole episode about this with a really good example from real software that I’ve used, so I’m not going to go into it anymore. Go listen to that, or just imagine missing data because you didn’t think about it until later.

The thing you’re trying to do is to find these combining operations and define them. They might not be monoids. You might combine two cars into a fleet. Then, of course, you have fleet combining. That’s a cool operation.

You might have operations that aren’t monoids but are still combining, meaning they’re returning a different kind of thing. They might involve three types but they’re still combiners. Those are good things to focus on at first.
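To show the difference between the two kinds of combiner, here is a hedged Python sketch. `Car`, `Fleet`, the `vin` field, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Car:
    vin: str  # hypothetical identifier for a car


@dataclass(frozen=True)
class Fleet:
    cars: frozenset  # the set of Car values in the fleet


def fleet_of(a, b):
    """Car x Car -> Fleet. A combiner, but not a monoid: the types differ."""
    return Fleet(frozenset({a, b}))


def add_car(fleet, car):
    """Fleet x Car -> Fleet. Also a combiner across different types."""
    return Fleet(fleet.cars | {car})


def combine_fleets(x, y):
    """Fleet x Fleet -> Fleet. Same type in and out, so monoid-shaped."""
    return Fleet(x.cars | y.cars)
```

Writing these first forces the question of what a fleet actually has to hold, before the easier operations get written.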

When you find them, they will help you flesh out the semantic entity that you need to be focused on. Then the other operations become easy around them. You can do that same refactoring we talked about in number one where you pull them out.

This will be number four if I can remember it now. It’s something I was thinking about coming back to — the directionality of dependencies. As you are pulling these things out and let’s say you leave them all in the same module, which is fine, this module starts to get bigger and bigger.

What you should be noticing, what you should be looking for is what is depending on what. Function A calls function B. That means A depends on B. If A depends on B, there’s two possible choices.

One is A and B are in the same semantic level, absolutely the same. Or A is in a higher semantic level than B. You got to use your judgment here and you shouldn’t just look at two. You should look at them all as a whole. Start pulling this apart, but you should be looking for a directionality. If A calls B, and B calls C, and C calls D, almost certainly A and D are not in the same semantic level.

This could happen a lot where you’re doing something like a map over an entity. By calling map…Map is a sequence operation, so you are treating this entity like a sequence. You’re probably skipping layers. It is a smell.

Why am I doing map here? Now, you might be doing map because you have something like a sale has a collection of vehicles that are in the sale. Collection, that’s fine. You’re going to do a map at some point, because that’s part of the semantics of the sale.

In general, that’s what you’re looking for. You’re looking for stuff like, “Wait. Why is it that A calls B, B calls C, C calls D, and then D calls B? Is it because it just happened to have the same shape of data? Maybe that shouldn’t happen.

Or why is it that B calls C, B calls D, and C calls D? Maybe I need to create more of a hierarchy between these things. Maybe D is at a lower level than both B and C, but maybe B has skipped a level. Maybe there is an operation in C.”

It’s not strict. It’s not that a layer can only call stuff one layer below it. These are smells. They’re guiding your nose, guiding you through the discovery of these layers. You should be able to graph it.

You should be able to say like, “If you graphed them, if you graphed the dependencies in something like Graphviz and you let its algorithm bubble it up like a tree, you should be able to see these layers.”

You should be able to see, “OK. These five operations at the top, these are the highest level operations, and then they call this other layer that goes in here and this other layer. Then at the bottom, at the leaves, are these operations that just call basic language things.” You should be able to see that.

I’ve never actually done that, actually visualized it with a graph visualization algorithm. That might be an interesting thing to do, but you’re always looking to say like, “Relative to these other things that are in its graph, where does it belong? Does it belong with this other thing? Does it belong on this layer or that layer?”
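If you wanted to try that without a drawing tool, a rough Python sketch might assign each function a layer number from its call graph. The call graph and function names here are invented, and the sketch assumes the graph has no cycles (a cycle like D calling back into B would make the recursion fail, which is itself the smell described above).

```python
from functools import lru_cache

# Hypothetical call graph: each function maps to the functions it calls.
calls = {
    "run_sale": ["apply_discount", "total_price", "combine_money"],
    "total_price": ["combine_money"],
    "apply_discount": ["combine_discounts", "combine_money"],
    "combine_discounts": [],
    "combine_money": [],
}


def layers(call_graph):
    """Return {function: layer}. Leaves are layer 0; assumes no cycles."""
    @lru_cache(maxsize=None)
    def layer(fn):
        callees = call_graph.get(fn, [])
        if not callees:
            return 0
        # A function sits one layer above its highest callee.
        return 1 + max(layer(c) for c in callees)

    names = set(call_graph)
    for callees in call_graph.values():
        names.update(callees)
    return {fn: layer(fn) for fn in names}


def skipped_layers(call_graph):
    """List (caller, callee) pairs that reach down more than one layer."""
    level = layers(call_graph)
    return [
        (caller, callee)
        for caller, callees in call_graph.items()
        for callee in callees
        if level[caller] - level[callee] > 1
    ]
```

A pair showing up in `skipped_layers` isn’t an error. It’s a place to ask whether a layer is missing in between.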

It’s not very specific advice but I think that that’s what I do. Now, the last thing I want to say is I do have a talk called “Building Composable Abstractions” that basically tries to approach this if you got a greenfield project, a greenfield abstraction.

It tries to come from the other direction and do a lot more upfront thinking about how to build this abstraction. I’ll briefly talk about it. I don’t want to go too deep into it because I have a whole hour-long talk that was already condensed from the hours and hours that I could talk about it.

The idea is you pick some core concept in your domain, in your app. If it’s car sales, it might be the sale, the promotion that you’re running. Let’s say it is. Like on a white board, you write down all the operations.

When I say pick the sale, what I mean is you pick that concept and you really develop the metaphor for it. Develop it in your mind. Think about what is this like. This sale is like if you went to a clothing store and they had a sale.

Before the sale starts, you go around and you put a ticket like a red sticker on every sale item. That red sticker means it’s 10 percent off. Then you go through and put a blue sticker on it, and that means 20 percent off.

Boom. Automatically, you have a picture in your mind of what a sale looks like. You’re basically going through all your inventory and you’re tagging cars with what discount they’re going to get.

That’s just one possible way to do a sale. I just made that up. Your sale might be different. It might be, “If your name starts with an F, you get 50 percent off. But if your name starts with a J, you get…” Whatever you want to do. It’s up to you. It’s up to your business.

You need to have that in mind before you go into the next step, because that metaphor is going to give you answers. Then you go through and you figure out what the operations are on the sale.

In my case, the operations would be tagging. Given a car, I give it a tag, which represents a discount. I’m also going to need a representation for discount, what the colored tag means.

I’m going to have maybe something that maps blue means 20 percent, green means 30 percent, something like that. I’ll have that also as a concept. You go through and you find all the operations and you start with the combining operations.

Combining operation in a sale might be something like if you have the sale, you want to add a new car to that sale. Not all the cars are on sale but you add cars to the sale. How do you do that?

Do you add one car at a time? Do you add, “All the 2018 cars are now on sale”? You come up with that operation, and these combining operations inform us.

Now, this is an iterative process, so you might not get it right the first time. You do that and you try to figure out what these operations are. If you have to, you start over and you find different operations.

Then you take those operations and you implement them as functions on data. Then you can test it out. It’s not implement. It’s, let’s say, model it with code. [laughs] This isn’t the final implementation.

The final implementation is going to involve the database and Ajax requests and stuff like that. This is just model it in memory with code, so you can play with it, maybe visualize it. Then step four is implement it once you’ve worked out all the kinks in it.
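As a sketch of what “model it in memory” could mean for the sticker metaphor, here is some Python. The tag colors, the percentages, the car IDs, and the rule that the bigger discount wins when two sales overlap are all assumptions made up for this example.

```python
# Hypothetical meaning of each colored tag, as percent off.
TAG_DISCOUNTS = {"red": 10, "blue": 20, "green": 30}


def empty_sale():
    """A sale is modeled as a dict from car ID to tag color."""
    return {}


def tag_car(sale, car_id, color):
    """Return a new sale in which the car carries the given tag."""
    if color not in TAG_DISCOUNTS:
        raise ValueError(f"unknown tag color: {color}")
    return {**sale, car_id: color}


def combine_sales(a, b):
    """Sale x Sale -> Sale. Assumed policy: on overlap, bigger discount wins."""
    merged = dict(a)
    for car_id, color in b.items():
        current = merged.get(car_id)
        if current is None or TAG_DISCOUNTS[color] > TAG_DISCOUNTS[current]:
            merged[car_id] = color
    return merged


def sale_price(sale, car_id, list_price):
    """Price of a car under the sale; untagged cars pay list price."""
    percent = TAG_DISCOUNTS.get(sale.get(car_id), 0)
    return list_price * (100 - percent) / 100
```

There’s no database and no requests here, just functions on data you can play with before committing to a real implementation.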

Watch the talk. It’s much better than what I’ve just done right now. It’s called Building Composable Abstractions. I’ve given it both in Clojure, at the Conj, the Clojure conference, and in JavaScript at OSCON. Functional JavaScript, if that’s what you’re into.

They’re both on my site, LispCast.com. You can find them there. I hope this answered the question. I hope it wasn’t too ranty and rambly. It probably was. I apologize, but this is a deep topic, so it had to be.

My name is Eric Normand. This has been my thought on functional programming. If you want to get in touch with me, ask me more questions. I love it. I love getting questions. I’m at the audience size now, where I feel like I’m getting a regular stream of questions and I really appreciate that.

Get in touch with me. I’m on Twitter, @ericnormand. You can also email me. Probably better for questions, eric@lispcast.com. LispCast, L-I-S-P-C-A-S-T. You can also find me on LinkedIn if that’s your bag. See you later. Bye.

The post How do you create a semantic base layer? appeared first on LispCast.
