Building An Image Location Service With Clojure

Recently I was building a small library to extract information from image files. While doing so, I thought it would be useful to build a simple site that extracts GPS information from images, and to describe how it was created using Clojure.

The source code for this project is available on GitHub at https://github.com/bradlucas/imagelocation, while a running example is available at http://imagelocation.beaconhill.com/. If you don't have a photo handy and would like to quickly see what the site returns, see the About page: http://imagelocation.beaconhill.com/about.

Overview

I'll assume that you, the reader, have some experience with Clojure and have built something in the language. With that in mind, I'll walk through the steps I took to build up the application.

They are as follows:

  1. Create Initial Project
  2. Add Compojure
  3. Command line support and Jetty Adapter
  4. Process Image Files
  5. Adding Template and Selmer
  6. Final Version

Step 1. Create Initial Project

Let's start with a basic project.

$ lein new imagelocation

https://github.com/bradlucas/imagelocation/commit/0ccb62ab4996b85bcd31931e972b89b086fd3ade

Step 2. Add Compojure

Here we add the Compojure library and get a simple route working.

Add the following to the project.clj file's dependencies.

[compojure "1.6.1"]
[ring/ring-defaults "0.3.2"]   

Also, add the following to the project.clj file.

 :repl-options {:init-ns imagelocation.core}

  :plugins [[lein-ring "0.12.5"]]
  :ring {:handler imagelocation.handler/app}
  :profiles
  {:dev {:dependencies [[javax.servlet/servlet-api "2.5"]
                        [ring/ring-mock "0.3.2"]]}}

Create a file called handler.clj, add the require entries, and define an initial app-routes and app.

(ns imagelocation.handler
  (:require [compojure.core :refer :all]
            [compojure.route :as route]
            [ring.middleware.defaults :refer [wrap-defaults site-defaults]]))

(defroutes app-routes
  (GET "/" [] "Hello World")
  (route/not-found "Not Found"))

(def app
  (wrap-defaults app-routes site-defaults))

Details: https://github.com/bradlucas/imagelocation/commit/9581ccdfc6d464b57fe165d6332160d51d1d3d02

Now test this step with:

$ lein ring server

Step 3. Command line support and Jetty Adapter

Using lein ring server is fine to get started, but we'll want to run a standalone web app. Also, we'll need to run our system from the command line. To do this, see the following commit:

https://github.com/bradlucas/imagelocation/commit/026ac4c4ab4fd0bc092f351e5766aef9d5c411ea

Here we are adding the following libraries.

  [ring/ring-jetty-adapter "1.7.1"]
  [org.clojure/tools.cli "0.4.2"]

The ring-jetty-adapter lets us run standalone with a statement like:

(jetty/run-jetty handler/app {:port 4002})

The org.clojure/tools.cli library lets us process command line arguments easily. See the above mentioned commit for details on how to check for a -f filename to process a single file.
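As a rough sketch (not the repo's exact code; the option names and the single-file behavior shown here are illustrative assumptions), the entry point might look something like this:

(ns imagelocation.core
  (:require [clojure.tools.cli :refer [parse-opts]]
            [ring.adapter.jetty :as jetty]
            [imagelocation.handler :as handler]))

(def cli-options
  ;; illustrative options: -f processes a single file, otherwise we start the server
  [["-f" "--file FILE" "Image file to process"]
   ["-h" "--help"]])

(defn -main [& args]
  (let [{:keys [options]} (parse-opts args cli-options)]
    (if-let [filename (:file options)]
      (println "Processing" filename)   ;; real image handling comes in Step 4
      (jetty/run-jetty handler/app {:port 4002}))))

With a :main imagelocation.core entry in project.clj, lein run -f photo.jpg would process a single file, while a plain lein run would start the Jetty server.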

Step 4 - Process Image Files

The following commit introduces the main functionality to extract location information from images.

See this commit:

https://github.com/bradlucas/imagelocation/commit/e39a49516f97c8f899b0d58e0ec0dc77a483cf78

The library the application is using is the metadata-extractor from Drew Noakes https://github.com/drewnoakes/metadata-extractor.

To start a reference to the library is added to the project.clj file.

[com.drewnoakes/metadata-extractor "2.12.0"]

Then see the image.clj file https://github.com/bradlucas/imagelocation/blob/e39a49516f97c8f899b0d58e0ec0dc77a483cf78/src/imagelocation/image.clj.

The main routine to extract the data is image-data.

(defn image-data
  "Return map of all image data fields
"
  [filename]
  (->> (io/file filename)
       ImageMetadataReader/readMetadata
       .getDirectories
       (map #(.getTags %))
       (into {} (map extract-from-tag))))

Its output is passed to get-location-data, which pulls out just the GPS fields.

(defn get-location-data
  "Return `GPS Latitude` and `GPS Longitude` values in a map
"
  [filename]
  (let [info (image-data filename)]
    {:lat (get info "GPS Latitude")
     :lng (get info "GPS Longitude")}))

The file also contains routines to convert the three-part (degrees/minutes/seconds) lat/lng strings into single decimal values, as well as a routine to create a Google Maps link.
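The repo has the details, but a rough sketch of the idea (not the exact code; handling of the N/S/E/W direction and sign is omitted here) might look like this:

;; Convert a value like 40° 26' 46.32" into decimal degrees
(defn dms->decimal [s]
  (let [[d m sec] (map #(Double/parseDouble %) (re-seq #"[0-9.]+" s))]
    (+ d (/ m 60.0) (/ sec 3600.0))))

;; Build a Google Maps link from decimal coordinates
(defn google-map-link [lat lng]
  (format "https://www.google.com/maps?q=%s,%s" lat lng))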

Step 5 - Adding Template and Selmer

The next-to-last step is to add a simple UI. I chose to use the Selmer library for templating and a basic Bootstrap theme from Bootswatch (https://bootswatch.com/lux/).

The commit which introduces the files is https://github.com/bradlucas/imagelocation/commit/ce92691f84ee620cb5d353bd73441b3f3f398779.

Adding Selmer consists of adding the following to your project.clj file.

[selmer "1.12.12"]

Getting things set up is a bit more involved than in previous steps, so it might be worthwhile to look carefully over the above-mentioned commit. What you'll need is the following:

  • A resources/templates directory with a base.html file along with files for each view
  • The Bootstrap template files in resources/public
  • Modifications to your handlers to display the upload form and process the posted upload

If you are following along and building as you go, focus on getting a static page working first. This means the base.html and index.html templates rendering along with the associated public files.

Then focus on the handlers that process the upload. This may be tricky at first; the sketch below shows the general shape, but review the repo for the actual solution.
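Here is a rough sketch of what the handlers end up looking like (the template names, the :file parameter, and the result page are assumptions; the repo is the authoritative version):

(ns imagelocation.handler
  (:require [compojure.core :refer :all]
            [compojure.route :as route]
            [ring.middleware.defaults :refer [wrap-defaults site-defaults]]
            [selmer.parser :as selmer]
            [imagelocation.image :as image]))

(defroutes app-routes
  (GET "/" [] (selmer/render-file "templates/index.html" {}))
  ;; site-defaults parses multipart params, so the uploaded file arrives as a map with a :tempfile
  (POST "/upload" request
    (let [tempfile (get-in request [:params :file :tempfile])]
      (selmer/render-file "templates/result.html"
                          (image/get-location-data (.getPath tempfile)))))
  (route/not-found "Not Found"))

(def app
  ;; note: site-defaults enables anti-forgery, so the upload form needs to include the CSRF token
  (wrap-defaults app-routes site-defaults))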

Step 6 - Final Version

The last step comes after many tweaks to make the system more robust and better looking. Feel free to step through the commits to see the details.

The final version as of this writing is version 1.0 and available on this branch https://github.com/bradlucas/imagelocation/tree/release/1.0.

Also, a running version to try is available at http://imagelocation.beaconhill.com/

Permalink

Ep 047: What Is "Nil Punning"?

Each week, we answer a different question about Clojure and functional programming.

If you have a question you’d like us to discuss, tweet @clojuredesign, send an email to feedback@clojuredesign.club, or join the #clojuredesign-podcast channel on the Clojurians Slack.

This week, the question is: “What is ‘nil punning’?” We gaze into the nil and find a surprising number of things to talk about.

Selected quotes:

  • “The lowly, magnificent nil. Some people love it, some people hate it.”
  • “Null is the value you give your program if you want to see it die.”
  • “Nil is not null.”
  • “This function found nothing, and I passed that to the next function, and it found nothing in the nothing.”
  • “It’s amazing how much nothing you can find in nothing.”
  • “You can pull data out without fear.”
  • “What does a nil Cat look like?”
  • “A lot of arithmetic stuff is nil-intolerant.”
  • “No answer isn’t going to start becoming an answer later.”

Permalink

Comparing Graph Databases II

Part 2: ArangoDB, OrientDB, and AnzoGraph DB

It’s great to see how passionate people are about their favorite graphing database provider or, at least, the graphing database company they work for. Since there are so many available options, please check out db-engine’s list of 33 different options for your graphing database needs. In this article, I will briefly highlight:

  • OrientDB
  • ArangoDB
  • AnzoGraph DB

These three rank 3rd, 4th, and 26th in popularity, according to the list provided by db-engines.com. In the interest of credible journalism, I will humbly attempt to be as unbiased as possible, and report on the information provided on the company’s website as well as recent articles.

Please see my previous article (Part 1) for a quick comparison of relational database management systems and graph databases.

OrientDB — “The database designed for the modern world”

Website: https://orientdb.com/

Documentation: https://orientdb.org/docs/3.0.x/

Initially released in 2010, OrientDB supports many programming languages including: .Net, C, C#, C++, Clojure, Java, JavaScript, JavaScript (Node.js), PHP, Python, Ruby, and Scala. OrientDB is a schema-free multi-model database system supporting graph, document, key/value, and object models. It supports schema-less, schema-full, and schema-mixed modes. Gremlin and SQL queries are both supported for graph traversal. OrientDB is implemented with Java, so it can be run on all operating systems with a Java JDK ≥ JDK6. It is open-sourced, but commercial support is available from OrientDB as well.

According to their website, OrientDB provides the service of a graph database system without the need to “deploy multiple systems to handle other data types.” This method serves to increase “performance and security while supporting scalability.” OrientDB differentiates itself from the many graph database systems by managing a multi-model system by design. It does not simply “add layers for additional models, resulting in decreased performance.”

There are 33 user reviews on G2 with an average rating of 4/5 stars. The majority of the reviews are very positive, and it should be noted that the last average review was from June 2016, so it seems like OrientDB is doing a stand-up job of fixing bugs and deploying fully-developed features. The main criticism of reviews seems to be the desire for more robust documentation. Main praises include the reasonable price, quick-installation, and that it is user-friendly.

https://www.g2.com/products/orientdb/features
https://www.predictiveanalyticstoday.com/orientdb/

ArangoDB — “One engine. One query language. Multiple models.”

Website: https://www.arangodb.com/

Documentation: https://www.arangodb.com/documentation/

Originally called AvocadoDB in 2011, as evident in its logo, ArangoDB came into being in 2012. ArangoDB is open-sourced, multi-model (key/value, documents, and graphs), and implemented with C, C++, and JavaScript. Server operating systems include: Linux, OS X, Raspbian, Solaris, and Windows. It is schema-free and supports the following languages: C#, Clojure, Java, JavaScript (Node.js), PHP, Python, and Ruby. ArangoDB operates with one database core and its own unified query language AQL (ArangoDB Query Language), which is similar to SQL in many ways. AQL is declarative and allows the combination of different data access patterns in a single query. ArangoDB was designed specifically to allow key/value, document, and graph data to be stored together and queried with a common language.

According to their website, ArangoDB can operate as a distributed & highly scalable database cluster. It runs on Kubernetes, including persistent primitives & easy cluster setup. ArangoDB has natively integrated cross-platform indexing, text-search and ranking engine for information retrieval and it is optimized for speed and memory. Full GeoJSON support is also provided.

There are 41 user reviews on G2 with an average of 5/5 stars. The single average rating, from 2017, indicated a lack of SQL support and dissatisfaction with having to adapt to AQL. Others refer to AQL as “intuitive” and describe ArangoDB as “feature-rich.”

https://www.g2.com/products/arangodb/features
https://www.predictiveanalyticstoday.com/arangodb/

For a sneak peek at the new features in the upcoming 3.6 release, check out ArangoDB’s Webinar on Oct 10th, 2019 at 1PM EST.

Webinar: ArangoDB 3.6 - The future is full of features - ArangoDB

AnzoGraph — “Build Your Solutions on a Fast, Scalable Database”

Website: www.anzograph.com

Documentation: https://docs.cambridgesemantics.com/anzograph/userdoc/home.htm

Initially released in 2018, this commercial graph database operates with the RDF (Resource Description Framework). The RDF model represents information as triples in the form of subject-predicate-object. RDF stores can be considered a subclass of graph DBMSs, but they are distinguished by offering specific methods that go beyond the general graph DBMS. Most RDF stores, AnzoGraph included, support SPARQL, a SQL-like query language used for OLAP (online analytical processing) style analytics. AnzoGraph DB’s operating server is Linux and it supports C++ and Java.

According to their website, AnzoGraph DB is built for online data analytics with performance that linearly scales. It is a Massively Parallel Processing (MPP) native graph database built for analytics at scale (trillions of triples and more), speed and deep link insights. It is intended for embedded analytics that require graph algorithms, graph views, named queries, aggregates, built-in data science functions, data warehouse-style BI, and reporting functions. You can try out their real-world test, check out their benchmark study, and download a 60-day free trial.

Some reviews of AnzoGraph DB exist right here on Medium. Check out this article: Graph Databases. What’s the Big Deal? by Favio Vázquez. He points out that “graph OLAP databases are becoming very important as Machine Learning and AI grows since a number of Machine Learning algorithms are inherently graph algorithms and are more efficient to run on a graph OLAP database vs. running them on a RDBMS.”

Another examination of AnzoGraph exists in the article written by George Anadiotis, which compares AnzoGraph to TigerGraph.

Conclusion

There are many options for graph databases, and it seems as though each is trying to find its own corner of the market. Which one is the best? It really depends on your needs. Every graph DB will have its unique strengths, weaknesses, and benchmarks. And as these systems develop and grow, their weaknesses will change, and they will likely become more comprehensive and capable. Take the time to shop around and educate yourself on all the available options, because there are a lot, and the numbers keep growing.


Comparing Graph Databases was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

Permalink

Datahike Release 0.2.0

Last week we released version 0.2.0 of datahike which brings new features and marks a step away from its datascript origin. The major new features are schema flexibility and time travel. After integrating the latest datascript code we extended it through protocols on storage and index level.

Datascript Integration and Cleanup

While datahike began as a fork of datascript and integration of the hitchhiker tree, we are now adding novel features and moving away from the original project. For this release we integrated the latest datascript code and refactored it in preparation for the next features.

Additionally protocols were created for both index and storage layer. Alejandro Gómez provided protocols for the storage backends that support different solutions like in-memory, file-based, LevelDB, or PostgreSQL. New stores can be added by implementing four functions on top of konserve bindings. A good example are the PostgreSQL calls.

The underlying index is now also configurable supporting both the persistent sorted set used in datascript and the hitchhiker tree. Through protocols other interesting index data structures can be added easily.

Since our partners from the datopia project needed the datalog parsing functionality, we moved all functions and tests around datalog query parsing into a separate project. That project will also support functions not used in datahike, like creating datalog queries from a syntax tree, so you can parse a datalog query, add additional data, and create an optimized query for a database to use.

Schema Flexibility

Up until now datahike did not enforce any schema on its data, so badly shaped data could be added. Starting with datahike 0.2.0, the schema-on-write approach is supported: the schema has to be defined explicitly and the transactor ensures that all transacted data conforms to it. Through configuration, the previous schema-on-read behavior is still supported.
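As a rough illustration (the API calls here follow the project's README at the time and may differ in detail), defining a schema and transacting conforming data looks roughly like this:

(require '[datahike.api :as d])

;; schema attributes are themselves just data that gets transacted
(def schema [{:db/ident       :user/email
              :db/valueType   :db.type/string
              :db/cardinality :db.cardinality/one}])

(def uri "datahike:mem://example")

(d/create-database uri)
(def conn (d/connect uri))

(d/transact conn schema)

;; conforming data is accepted; badly shaped data is rejected by the transactor
(d/transact conn [{:user/email "alice@example.com"}])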

datahike supports large parts of Datomic’s schema definition except tuples, full text search and byte types. The data validation happens on transaction level using clojure.spec.

In contrast to datascript’s schema in the database record, datahike stores the schema additionally as transactions in the index, which allows queries and time travel that can help auditing schema changes.

Time Travel

Modern database solutions require an auditable data history without adding timestamp attributes to all data models. Similar to Datomic’s historic data, datahike supports time travel capabilities that allow datalog queries against the whole history, against any point in the past, or against the difference since a certain point in time.
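Sketched with the illustrative schema from above (the function names mirror Datomic and the datahike docs; exact forms may vary):

;; all of history: every assertion and retraction ever made
(d/q '[:find ?e ?email :where [?e :user/email ?email]]
     (d/history @conn))

;; the database as it was at a point in the past
(d/q '[:find ?email :where [?e :user/email ?email]]
     (d/as-of @conn #inst "2019-09-01"))

;; only what changed since a point in time
(d/q '[:find ?email :where [?e :user/email ?email]]
     (d/since @conn #inst "2019-09-01"))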

In accordance with the GDPR, data purging capabilities were added that completely remove data either from all indices or only from the historical indices.

In order to keep the current data view fast and clean, a separate set of indices was created that tracks all past data. This separation makes it very easy to clean up data after certain retention periods without interfering with the current view of the data.

Configuration

With the addition of the latest features, datahike required better configuration. Therefore a clean concept was implemented that allows picking, at database creation time, only those datahike features that you need. For example, both time travel and schema-on-write capabilities can be deactivated if you have no use for them.

Examples

Since it is always helpful to have examples, we provide projects inside the code base that show different features as well as documentation about new features like time travel, schema, and configuration.

These projects just need a REPL and can be worked through at your own pace. Where it was possible we added explanations and different ways to use a certain feature. Have a look at the basic example which introduces topics like stores, schema or time travel. Over the next weeks we will add more examples about queries and transactions.

Future Plans

For the next release we are planning to make datalog available to a wider audience by releasing Java/Scala API bindings for datahike.

In collaboration with our partners from the 3DF project we aim to support remote capabilities with an HTTP layer, having only a thin client on the application side instead of the full datahike stack.

With support of a datalog query planner we will be able to predict the costs of each query and recommend possible solutions to improve your overall queries.

For security reasons it would be useful to have an identity and access management concept ready to be used. Therefore user and role definitions will be added to the system entities.

Since we have experience with probabilistic programming in anglican we will try to integrate probabilistic reasoning in datalog and datahike itself.

From a development side we will try to release improvements on a smaller scale with more minor and patch releases and a more transparent development process around GitHub issues.

Thanks

Many thanks to all the people from the Job Tech project supporting us towards the implementation of the schema and time travel functionalities. Also thanks to Chrislain Razafimahefa for his code reviews, Alejandro Gómez for his storage contributions, and all you guys either on slack, GitHub or personally for fruitful discussions and feedback.

Have fun with the latest version of datahike and let us know what you think about it.

Permalink

What is an abstraction barrier?

Structure and Interpretation of Computer Programs talked about abstraction barriers as a way to hide the intricacies of data structures made out of cons cells. Is this concept still useful in a world of literal hashmaps with string keys? I say, yes, but in a much more limited way than before. In this episode, we go into what abstraction barriers are, why (and why not) to use them, and the limits of their usefulness.
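As a quick illustration of the idea in Clojure (a sketch for this write-up; the episode itself talks in terms of Scheme and cons cells):

;; without a barrier: callers dig into the structure by position
(def person [[:name "Ada"] [:born 1815]])

(second (first person))   ;=> "Ada"

;; with a barrier: one well-named operation hides the digging
(defn lookup [pairs k]
  (some (fn [[k' v]] (when (= k k') v)) pairs))

(lookup person :name)     ;=> "Ada"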

Transcript

Eric Normand: What is an abstraction barrier? This is a concept from “Structure and Interpretation of Computer Programs.” In this episode, we’re going to talk about what it is, why to use it, and the limits of its usefulness.

My name is Eric Normand, and I help people thrive with functional programming. Like I said, this is a concept that I was first introduced to in Structure and Interpretation of Computer Programs. I imagine it goes back further than that just because it’s a natural concept to develop.

What it is basically is instead of doing some complex operations inline, you move them, you extract them out into a function, you name that function. Now you have a barrier, where you don’t really have to think about the internals of how this thing gets calculated, you have an operation that’s got a nice, clear, meaningful name that you’re working on.

If you do this with a data structure, well, in Structure and Interpretation of Computer Programs they're using Scheme. The data structure they use is just basically these cons cells, just pairs. They're like tuples, but always pairs. They're building these intricate little data structures out of them.

For instance, in one example I can remember, they build an associative data structure where you can put keys and values and replace keys and values and look up values by keys. It's all just cons cells. It's all just pairs, and deeply nested stuff. They build the operations like add a new key value pair. Find the value for this key.

These are operations. They're giving them nice names. When you look at the code it's just like…I have to explain this. In Scheme, to get the first element from a cons cell, you use a function called car, C-A-R, and to get the second element from a cons cell you use a function called cdr, which is C-D-R.

You would string these things together, like get the first element: the car of the cdr of the cdr of the car of the car, and that will give you the element you're looking for. As you're looking at these operations, they're not very meaningful, except it's like first, second, second, first, first, first.

It’s just hard to wrap your head around what is that actually doing. It would be really nice if someone gave it a very meaningful name, like, “Find value given a key.” [laughs] Or just like “get” something like that. You do this to help your mind to encapsulate to put a barrier on the meaning.

You can say, I don’t have to think about this anymore. There’s three operations I’m going to do on these things. I have them well named. I don’t have to go digging around myself and remember how to get a cdr, you know, what cdr, how many cdrs, I need to get the value of something.

This is the what and the why. It's because sometimes you've got these deeply nested things. Just for your mental capacity, it's hard when you inline those to be able to reason about the code. What's happening is you're just doing that basic naming operation. You're naming this thing that you're going to use a lot.

You’re taking what are meaningless operations like car and cdr. They have a meaning but their level of meaning is very low. You’re elevating that function into a new level based on the name. The name is something much more meaningful at a higher level of meaning.

It’s not data hiding. It is different from data hiding in one very important respect. It’s still all there. It’s still transparent. If you want to pierce your abstraction barrier, go right ahead. You can still map over it. It’s still a list. It’s still cons cells. You can still call car on it.

You are not forced to use those abstraction barriers, those defined operations. Some people say it’s a fine line, because you’re saying you should. There’s some arguments like if this data structure escapes, no one’s going to know to use those things or they’re going to have to use them anyway. You have to give them to them.

All that is true. I don’t want to argue about that. What I just want to argue is that there is this very important difference, which is that you don’t have to use those operations.

It’s not like data hiding in an object-oriented system, where you’ve got all this private stuff. Then the three operations that you want to do on it are the public methods, and you don’t know how it’s implemented and you’re supposed to not care. You’re supposed to not even be able to mess with the data.

You can mess with the data if you want to. This is important, because you want to be able to move through those levels. The trouble is, cons cells are not a very rich data structure. They basically suck. I’m just going to say like that. They’re really neat. They’re complete. You can build trees and lists and a whole variety of data structures with them.

They’re not self-describing. You don’t know what you have, you just have this, if you printed them out they’re just be parentheses with stuff inside. They don’t have…They’re not human-readable. They are, but not very. You need to know what the different levels of the of the parentheses mean.

You need some out-of-band communication, it’s not self-describing. A lot of problems that we have with these data structures are solved by having self-describing names, like a hash map, where the keys are strings. You can have a nice name.

Before you had to build this associative data structure; now you can have a data structure that says, “Hey, I am an associative data structure. I have the curly braces in JSON. My keys are strings. My values are also some value that you can understand.”

This is amazing. Now we don’t need to have these abstraction barriers to do the same thing. A lot of problems are solved just by having better data structures. You don’t need the abstraction. It removes a whole slew of reasons for needing abstraction barriers especially around these intricate data structures.

There’s a thing where if you have public-facing data, you shouldn’t really use abstraction barriers. You should design that data to be easy to consume, easy to produce, not necessarily using specific operations. You want someone to build a type in this literal JSON and it be correct.

You don’t want something where you need some complex operations that they have to define and basically copy from your code base in order to build the thing up. You don’t want that. A public-facing thing, you want very clear names. When you design them, and you put it out into the world, those names are a commitment.

They’re a commitment on your part as an implementer that you’re going to honor those names. If you send me this JSON to my NPI endpoint, I’m going to read it. I’m telling you what is this key means and what the value, how I’m going to interpret the value. That is a commitment that you’re making. The self-describing nature of it is really important.

You shouldn’t rely on an abstraction barrier. However, we also use besides using it for public-facing data, like a public-facing schema or spec. We also use data structures internally in our software. If you need some intermediate index of something, you’ll use a hash map to index it.

Sometimes when you make the index, you want to keep track of when you added the thing and when was the last time you accessed it. You've got all these other bits of information that you have to maintain. Sometimes, you want to maintain the order, and it's in a hash map that doesn't have order. You want to keep them in sync.

Now, you’re starting to talk about this intricate data structure nested and other data structures. At some point, you’re back to the same problem that you had with cons cells which is deeply nested. You’re forgetting how many levels you have deep.

You’re in-lining all these cdrs and cars, except they’re not cdrs and cars. They’re like, “I’ll get this internal map. Inside of that, get this thing.” Then that’s going to give you a map. Then you need the value out of that map. It’s all deeply nested again. It’s easy to get wrong.

When you add a thing to the index, there's like five things that you need to do, and they all have to be right. You want to make it easy to get those things right. It's like doing five things. It's probably five lines of code. You're repeating that everywhere. You want to DRY this up. You want to take that duplication, put it in a function, give it a good name.

All the sudden, you’re doing abstraction barriers again. It’s just the way it is. Is just happens when you have these complex things. Like I said, this isn’t for something that’s going to go external. External, you want to be nice, and clean, and neat, and human-writable, human-readable.

When you’re working internally, sometimes you need an index that’s really tricky and complicated, or you need some data structure that’s super weirdly nested. You want to start extracting out all those operations again.

I want to say another thing. A lot of times…I think in SICP, it gives two reasons, and I disagree with one of them. I disagree with SICP, that's Structure and Interpretation of Computer Programs. One of the reasons they give for using abstraction barriers is so that you can change the data structure if you need to.

This is just so overused. We write code today, more complicated than it needs to be because maybe one day in the future, we might want to change it. I just think that that’s wrong. Why complicate your life today for something that might or might not happen in a way that you can’t even predict?

If you know how it’s going to change, if you’re saying, “Look, I know I’m going to swap out my database in one year. I’m using this one database now because I can’t afford the one I really want. When my company does better, I’ll have an income. I’ll be able to pay for that database I do want. I want to be able to swap it out easily.”

Sure. If you’ve got some plan for changing it and you don’t want to have to change all the code again, sure, fine. If that’s part of your plan, you need to be able to change it, yes. Put some kind of indirection in there. If you’re just doing it like a just in case, like maybe we’ll need it, no, do not do that.

People say it, but I think you should not put abstraction barriers just because you might want to change it. You should put abstraction barriers to make it clear what’s going on, especially when you got these intricate, tricky things. It’s hard to get right.

You shouldn’t use it for public-facing data. That should be well-designed, clean, simple, something that another person could write code to generate and not rely on your perfect implementation of all these operations.

You should use it for these intricate data structures that never leave. These indexes aren’t meant to be printed out and send over a wire. They’re meant to be stored in memory for some algorithm or something you’re doing on it where you need constant time access.

All right, so abstraction barriers, I’m going to recap real fast. Abstraction barriers are simply taking operations that you’re doing on some data structure or some piece of data. You’re repeating it. It doesn’t have enough meaning, so you extract it out and give it a name.

If you can count all the operations you’re doing on this data structure…Let’s say there’s three, there’s four of them. You extract all of them out, give them good names. Now, you no longer have to go down into the data structure and manipulate things at the low level. You can operate it at a higher level. That sounds like a good thing to me.

It differs from data hiding in that you can always pierce the barrier. You can look at it, and it’s just raw data. It’s not some encapsulated class or object that has some bespoke methods on it that you can’t see how it’s implemented inside. You can pierce it.

Hash maps and the other much more descriptive data structures that we have in modern languages, these are better because they're literals. They have descriptive names. They have more well-understood properties, like an array has a certain order to it. You don't need to use a cons cell, which has almost no meaning behind it.

You’ve got higher-level stuff, self-describing. You have literal versions of it. You don’t have to even think about constructors so much anymore. It’s much nicer, but we still build up these intricate, highly nested things for internal use.

I believe that abstraction barriers, as I’ve defined them here, are useful for that, that you want to be able to be operating at a higher level even though it’s this really intricate turning of machines and stuff. It’s just normal.

Let me say it a different way. Hash maps, descriptive names and stuff remove a huge need for the abstraction barriers. We reinvent the problem, because we have all these highly-nested intricate data structures.

Again, they are now made of hash maps, and vectors, and sets, and whatever else we have. They’re still there. They’re just not cons cells anymore. They’re just some other thing.

I’ve seen so many messes in languages like Clojure, that use these data structures a lot when they start getting nested, people forget what they have. They start coupling code together because they’re coupling the “where a value lives” deeply nested in this map.

Because they’re using some path into the nested data structure with the operation — what they want to do. The “where” and the “what they want to do” get coupled together. Having a little bit of barriers when you have a mess, it’s like having little bins to put all your stuff in instead of having in one big bin. It’s just a way to organize it in a way to keep a little bit of sanity when things start to get into a mess.

All right. This has been all about abstraction barriers. This might be a tad controversial. Abstraction barrier shouldn’t be used all the time. I’m not saying that. I still think that they’re really useful, especially when you’ve got nested data structures and you’re starting to get into a mess.

If you liked this episode, please subscribe. Go to lispcast.com/podcast. There you’re going to find all the past episodes. Listen to the one where I talk about building your interface first. Listen to the one where I say just use data. I really think these are subtle issues and it’s not as simple as like use this, don’t use that. You got to allow for some subtlety in there.

You’ll find all the past episodes with audio, video and text transcripts. You listen to it however you want, watch it, or you can even read it if that’s how you like to do it. You can also subscribe. There you’ll find links to subscribe in the various platforms, and also links to find me on social media.

That’s email, Twitter, LinkedIn. Get in touch with me. If you disagree with me, I would love to have a discussion about this because I think it is a bit controversial.

I’d love to hear more arguments for and against. If you’ve got one of those, or you’ve got a question because I didn’t go over something clearly enough, come on, just hit me up, and we’ll talk. Awesome.

My name is Eric Normand. This has been my thought on functional programming. Thank you for listening and rock on.

The post What is an abstraction barrier? appeared first on LispCast.

Permalink

A Common Gotcha with Asynchronous GPU Computing

For the most part, computing libraries such as Neanderthal abstract away the complexity of GPU computing. When you have high-level linear algebra operations, such as mm! (matrix multiply), nrm2 (vector/matrix norm), or sum, you do not have to worry about compiling kernels, loading modules, sending operations to the right stream in the right context, handling stream execution errors, and a bunch of other details. The code we write and test on the CPU can be executed on the GPU!

However, there is one important thing we have to keep in mind: GPU computing routines are asynchronous by default! Most of the time we see the benefits of such behavior, but in some common situations, it can surprise even an experienced developer (yes, humans are fallible).

Suppose we have some code that uses a costly matrix multiplication operation.

(with-release [cpu-a (entry! (fge 100 100) 0.01)
               cpu-b (entry! (fge 100 100) 0.02)
               cpu-c (entry! (fge 100 100) 0.02)]
  (time (dotimes [i 100]
          (mm! 1.0 cpu-a cpu-b 0.0 cpu-c))))
"Elapsed time: 1.153745 msecs"

We called the mm! operation 100 times in a loop and measured the execution time. A bit over 1 millisecond in total, or 10 microseconds per operation.

We decided that, although it is quite fast, it's not fast enough for our customer's requirements. There's an opportunity to use powerful GPU accelerators, and we read in their specs that they can achieve quite a performance boost over Intel CPUs. Since we do not have to adjust the code to run on the GPU (that is, not that much), we quickly write a few tests to see whether the platform switch is worth the effort.

The following code is a port of the previous example to the Nvidia GPU. It is a bit different since GPU context has to be created and managed. It is possible to further abstract these differences away, but I wanted to keep this example straightforward and obvious.

(with-default
  (with-default-engine
    (with-release [gpu-a (entry! (cuge 100 100) 0.01)
                   gpu-b (entry! (cuge 100 100) 0.02)
                   gpu-c (entry! (cuge 100 100) 0.02)]
;; synchronize! makes sure that measurement is not initiated
;; until the GPU stream is ready
      (synchronize!)
      (time
       (dotimes [i 100]
         (mm! 1.0 gpu-a gpu-b 0.0 gpu-c))))))
"Elapsed time: 0.796928 msecs"

With the same size and computation complexity, we got a modest speedup. What is the problem? We expected a boost! We ask around, and see that communication with the GPU driver carries a fixed overhead. GPU pays off with more expensive operations.

OK, then; let's give it some monster matrices.

(with-default
  (with-default-engine
    (with-release [gpu-a (entry! (cuge 10000 10000) 0.01)
                   gpu-b (entry! (cuge 10000 10000) 0.02)
                   gpu-c (entry! (cuge 10000 10000) 0.02)]
      (synchronize!)
      (time
       (dotimes [i 100]
         (mm! 1.0 gpu-a gpu-b 0.0 gpu-c))))))
"Elapsed time: 2.232541 msecs"

Wow, we increased the matrix sizes by a lot, and the GPU didn't even sweat. It finished in almost the same time as before!

What happens on the CPU, do large matrices slow it down?

(with-release [cpu-a (entry! (fge 10000 10000) 0.01)
               cpu-b (entry! (fge 10000 10000) 0.02)
               cpu-c (entry! (fge 10000 10000) 0.02)]
  (time (mm! 1.0 cpu-a cpu-b 0.0 cpu-c)))
"Elapsed time: 6557.759385 msecs"

We excitedly tell everyone and decide to transfer all computations to the GPU, since what took days to complete on the CPU will, according to our calculations, complete in seconds.

A skeptical colleague is not so sure!

We are reminded that (many) GPU operations are asynchronous.

Our mm! operation does have the same signature on the GPU and the CPU, but while on the CPU it's synchronous, on the GPU it's asynchronous. Simply put, on the GPU, calling mm! means "I acknowledge that I received your request for matrix multiplication and I have put it in the computation queue", not "I have completed the matrix multiplication that you've requested", as it does on the CPU.

When we measured it in the last example, we just measured how fast the GPU can receive the requests for the mm! operation. Not surprisingly, the size of the arguments does not make much difference there.

To measure how fast the GPU can complete the operation, we must synchronize the queue.

(with-default
  (with-default-engine
    (with-release [gpu-a (entry! (cuge 10000 10000) 0.01)
                   gpu-b (entry! (cuge 10000 10000) 0.02)
                   gpu-c (entry! (cuge 10000 10000) 0.02)]
      (synchronize!)
      (time
       (dotimes [i 100]
         (mm! 1.0 gpu-a gpu-b 0.0 gpu-c)
         (synchronize!))))))
"Elapsed time: 17117.724387 msecs"

Ouch! Instead of 2 milliseconds for 100 invocations, we get 17 seconds. That's 171 milliseconds per operation, not 20 microseconds. Disappointed? We should not be! It's almost 40 times faster than the 6.5 seconds we measured on the CPU. For large matrices, it usually does make sense to use the GPU; for medium-sized ones, it depends.

If we only work with small-ish data, it never pays off to use GPU. Never? Well, that depends, too. We can group small chunks into larger units that can be computed in batches, but that is out of scope of this article.

Anyway, suppose we have decided that it pays off to add a GPU engine to our project. The code is similar, but there are enough differences. This call to synchronize! bothers us, since it is CUDA-specific.

Fortunately, most of the time we do not need to synchronize streams, and even when we do have to, we do not have to do it explicitly. There are operations available on both CPU and GPU, that implicitly synchronize the stream.

For example, summary matrix and vector operations such as sum, nrm2, dot, etc, which compute a scalar result, implicitly synchronize the stream. The operations that transfer data between CPU and GPU are synchronous by default, too.

(with-default
  (with-default-engine
    (with-release [gpu-a (entry! (cuge 10000 10000) 0.01)
                   gpu-b (entry! (cuge 10000 10000) 0.02)
                   gpu-c (entry! (cuge 10000 10000) 0.02)]
      ;; (synchronize!)
      (time
       (dotimes [i 100]
         (mm! 1.0 gpu-a gpu-b 0.0 gpu-c)
         (asum gpu-c))))))
;; "Elapsed time: 17211.007038 msecs"

Source code for this article

The Leiningen project with full source code is available at the draganrocks Patreon page. You can choose one of the pre-selected tiers, or pledge any amount that you can afford, to support the https://github.com/uncomplicate open-source Clojure libraries.

Permalink

Senior Developer

T-Scape Limited | Lisbon, Portugal or Cork, Ireland or Remote
€0 - €500

We are a small company creating a niche financial services product. We have strong domain knowledge in the areas we are focusing on and want to bring fresh ideas to old problems. We are creating a development base in Lisbon.

We are open to new initiatives and ideas.

You will be working with a small team in close contact with the Business Analysts, and you will need to understand and question their requirements for ongoing tasks and the expansion of our flagship product.

We are looking for someone who can find innovative solutions to problems. You will:

  • Be self-motivated, energetic and upbeat;
  • Take a logical and systematic approach to problem solving;
  • Strive for timely completion of projects;
  • Take ownership and responsibility;
  • Be able to make decisions, but not foolhardy ones;
  • Ask questions and take advice;
  • Be pragmatic and prefer simple solutions;
  • Have the ability to prioritize the urgency of multiple requests and take charge of these to meet deadlines;
  • Be fluent in English.

You will need to liaise with our clients if problems arise or to discuss their requirements. As the company expands you must be able to help and nurture junior members of the team.

Skills required.

The product itself is split into three components: the backend server, an application server, and a web-based client front end. The key development for this role relates to the backend server and the application server, both of which are written in Clojure.

You will need

  • Excellent knowledge of, and commercial experience with, Clojure and Java 8+. You should have a good understanding of the STM.
  • Database access via JDBC to one or more of Oracle, SQL Server or Postgres
  • Source control is via GIT, with frequent branches and merges.
  • Documentation is produced in Confluence and Issue Reporting uses JIRA.

Any knowledge of the following will be a plus but not essential:

  • Front End development using Angular 7+, angular cli, CSS, HTML5 and ag-grid
  • Spring MVC and Spring Security for the application server
  • Middleware in the form of Spring Integration for interfaces.

Permalink

Solving Project Euler puzzles using Clojure, 7 years later

During the last few weeks, I've been working on once again implementing solutions to problems from Project Euler, starting with the first 50 problems, which are now published on this GitHub repo: euler-cljc.

Back in 2012 I was interested in learning Clojure but couldn't think of problems that were small enough to be worked on in small bites. Then I ran into Project Euler, loved the idea, and opened an account to track my progress. I would use the problems as a means to get familiar with the language and its core library. Eventually, I managed to solve 75 of the puzzles with varying degrees of success: many of the solutions are decent, but others demonstrate not only that I wasn't familiar with the language yet, but also that I settled for a naive solution, left it running for a while, and eventually the code brute-forced the problem.

Seven years later, I decided to create a variation of Borkdude's Advent of CLJC, but for problems from Project Euler: for fun, but also to review that old code and share some of the things I found when comparing the old programming style, approach, and development workflow. Some of the following learnings could also be interesting to people who are new to Clojure/ClojureScript and still figuring out the individual pieces.

Development workflow

Back in 2012 I was coming from a Java/Groovy background, and possibly the most advanced usage of a REPL equivalent that I had seen was the one in Rails projects, where you could start a Ruby console and create blog posts from the interactive shell. In general, though, the discipline I was used to back then was: create a test, see the test fail, create the simplest implementation that makes the test pass, see the test pass, then refactor.

Your feedback loop was constrained to seeing the results of the execution of your unit tests. Some frameworks would let you have an agent running that would re-run your tests automatically when a file was saved, because it was clear that you should have feedback as soon as possible. Having to click on "Re-run tests" in your IDE and seeing the progress bar fill was not fast enough.

It took me a while to understand what the REPL-driven development workflow was like. If you haven't seen it before, think of it as editing your code while getting feedback as instantly as entering numbers in a spreadsheet: you change a formula in a given cell, some other cells update immediately, and you can quickly see whether that helps you solve the problem at hand.

When I was learning Clojure I would have my editor and a terminal, run lein run -m my.project.namespace every now and then, and call a few functions from the console; or I might have a REPL into which I would paste code and then evaluate some small expressions that way. My old code is plagued with small -main functions and other bits that tell me I ran the code directly from the command line. Also, some of that is because my solutions were poor and I left the code running for more than a few seconds (but I'll touch on that later).

It was not until much later, when I saw a talk where someone demonstrated how to evaluate code directly from the code editor with a keyboard shortcut, that it clicked for me. Eventually it became second nature, but it's the kind of thing that doesn't seem possible until you see it from somebody else. Nowadays, coding in any language that doesn't offer a similar experience seems very rudimentary.

If you are learning Clojure, you should know that learning how to use the REPL from your editor of choice is a time investment that will pay for itself and boost your productivity. Fortunately, most popular code editors have some plugin or feature that allows starting or connecting to a running REPL server.

Nested definitions

If you are coming to Clojure from another language, it will be natural to try to re-map the concepts you already know into a different syntax, but this can also be a source of errors.

My old code is full of nested defs, because in Java you use the same syntax to define variables in any scope (or maybe I got the impression it would work like the internal definitions example from SICP?). Most of those defs should be let bindings in the scope of an outer function, and a few of the names should just be moved to the top of the namespace so that readers don't get confused by an "inner def". def always creates a binding at the namespace level, regardless of how nested it is in the code.

If you are learning Clojure, it would be a good idea for you to bookmark the Clojure Style Guide at this point, it's full of great advice that will make your code more readable to other people, because a lot of the time it's about communicating the intent and your assumptions in code too.

Eval time vs. runtime

Another issue my old code has is defining values that are the result of an expensive computation. If you see something like:

;; takes 10 seconds to compute
(def pi-digits (compute-pi-slow (* 1000 1000)))

... what will happen when Clojure loads this line is that, before loading the rest of your application, the name pi-digits will be bound to the result of computing those digits of pi. This approach can slow your application startup or give you a false sense of how fast your code is, because effectively you have evaluated something expensive at one time (when the namespace is being loaded) and you might be calling functions that use these definitions at an entirely different time.

The solution in this case is to turn this computation into a function that you call at will, or to wrap the value you need in a delay so that the program can load without actually computing the value. The latter requires you to use deref (or @) in your code, which can be read as a marker saying "here's a value that is potentially slow to compute". Note that the value is computed once and cached for later.
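For example (compute-pi-slow is the hypothetical function from above), wrapping the computation in a delay defers it until the first deref:

;; nothing is computed when the namespace loads
(def pi-digits (delay (compute-pi-slow (* 1000 1000))))

;; the first @pi-digits triggers the computation; the result is cached afterwards
(defn leading-digit []
  (first @pi-digits))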

Know your standard library

Another issue that I faced when learning Clojure was getting familiar with the many different functions of the standard library and the different ways they can be combined.

When you start learning Clojure, you may come from a mindset where it's natural to encapsulate your data in different ways and create ad-hoc code to transform one model representing your data into another. Sadly, this is common in large Java codebases and APIs, to the point where libraries exist just to help with this problem. Rich Hickey has mentioned this a number of times: each new class is a new ad-hoc, glorified API, incompatible with existing general-purpose functions.

In Clojure, it's natural to represent data in terms of simple values that aggregate into lists of things or maps from keys of any type and shape to values of any type or shape. And, at the very bottom, there's the sequence abstraction that applies to every collection: a List is naturally a sequence of the head of the list followed by the rest of the list. A HashMap is a sequence (in some order) of key-value pairs. Then you have a large number of functions that can operate at this level and help you extract and recombine your data in different ways. That is tremendously powerful and it's where the Alan Perlis quote shines: "It's better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures".

Examples from ClojureDocs are very helpful, but sometimes you don't even know the name of what you are looking for: this is what REPL functions like apropos are for. If you find yourself implementing a function that is fairly general (e.g. the smallest value in a map, or pairing elements with their position in a collection), trust your gut feeling when it tells you "I'm sure this function already exists in the clojure.core namespace".
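Both of those examples already have clojure.core equivalents, and apropos helps you find them from the REPL (a quick sketch; output abbreviated):

;; smallest value in a map, returned as the [key value] entry
(apply min-key val {:a 3 :b 1 :c 2})   ;=> [:b 1]

;; pairing elements with their position in a collection
(map-indexed vector [:x :y :z])        ;=> ([0 :x] [1 :y] [2 :z])

;; searching the loaded namespaces for likely candidates
(apropos "indexed")                    ;=> (... map-indexed keep-indexed ...)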

When solving math problems, compute the absolutely minimum required

This may sound obvious and not only applicable to problems from Project Euler; in my experience it applies to the code you write in your daily job too. It's very tempting to do some kind of micro-benchmark with a very small subset of data or small collections, but as you start adding zeroes to the size of the input set, the space/time complexity will kill you.

If the problem asks you about "how many routes exist through a grid of size NxN where you can move only to the right or down" my first instinct is to be able to find a way to encode each route, then generate each possible valid route, then count them all. This will work only for small grids, but as you attempt to compute the solution for larger and larger grids you need to realize how it explodes in complexity and wonder if it may be possible to compute the total number without counting each valid option.

The same idea applies to other problems that ask for digit permutations, and to other problems that look deceptively simple for smaller bounds. If your program needs to run for more than, say, 30 seconds to find a solution, it's a strong signal that you are solving the wrong problem. Go outside or have some hammock time and let your back-burner, subconscious brain explore the problem.

Pipeline of data transforms and bottom-up vs. top-down

Now that I know more about the language than seven years ago, I noticed how a pattern emerged in many of my new solutions:

<generate a basic (often lazy) collection: prime numbers, fibonacci numbers, input from a file, etc.>

then

<possibly combine with some index or some other data from the domain>

then

<filter elements by some criteria>

then

<transform again using some function>

then

<aggregate the results and produce a single result, the solution to the problem>

This pattern made me realize something that other people have described about their own development workflows: rather than starting from the top with a whole problem and breaking it down until you implement it in terms of small functions, I found myself doing the opposite: understanding how to build from the smaller blocks and how to combine them purposefully, embracing the problem, until you solve it.

I'm starting to think the former approach (top-down) is part of the perspective of the OOP culture, where you describe the world in terms of a discrete set of entity classes with their attributes and the interactions between them. In this world you want to map the domain in terms of roles, and it even seems reasonable because it sounds like divide-and-conquer: to understand the whole, break it into individual parts and their relations.

It should be clear by now that this approach is not universal: it helps you come up with a mental model, but from there to a prescription for implementation is a larger stretch.

Exploring the problem iteratively

Sometimes you have a hint or a gut feeling of how large is the problem space, but sometimes it's perfectly acceptable to iterate. In problem 44 you are expected to find a value but I didn't want to hard-code an arbitrary limit of how far to search, so I first implemented the solution exploring the problem space in powers of ten: look for the solution in the first 1,000 pentagonal numbers, then in the first 10,000 and so on.

This worked fine on the JVM, but later when running the solution in Node I ran into an issue where this code put too much stress on the GC and crashed after not being able to free memory after a short while.

Be aware of differences between Clojure and ClojureScript

Sharing code between Clojure and ClojureScript is a great thing, but don't forget that both are hosted languages. Besides the differences in the multi-threading stories of their respective platforms, there are subtle differences between the two for the purposes of solving these puzzles:

  • ClojureScript doesn't have a character literal as in Clojure; characters are single-character strings instead. In practice this is not very important unless you want to extract digits from a number or the index of characters in the English alphabet (e.g. "E" is the 5th letter). For digits you can either use quot and rem to extract the individual digits in base 10, or a function like the following:
    (defn char-score [c]
      #?(:clj  (inc (- (int c) (int \A)))
         :cljs (inc (- (.charCodeAt c 0) (.charCodeAt "A" 0)))))

    (reduce + (map char-score "COLIN")) ;; => 3 + 15 + 12 + 9 + 14 = 53

  • ClojureScript doesn't support BigDecimal or BigInteger. A new type, BigInt, was added to V8 about a year ago and is supported on most current JS engines (it also worked in my tests using Node 10 on my laptop). This may be a problem if you are trying to rely on arbitrary-size integers to solve problem 13, so you need to think a little more, or use code that uses BigIntegers on the JVM and BigInts in Node.
  • Differences in regular expressions: there are minor differences in how regexes work in ClojureScript compared to Clojure because each uses the features provided by its host platform, but these are rarely a concern. I solved one problem using regexes (Problem 26, about the length of periodic decimals) and it wasn't an issue.
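To make the big-integer point concrete, here is a hedged sketch using a reader conditional in a .cljc file; it assumes a JavaScript engine with BigInt support on the ClojureScript side, and the names are my own:

;; BigInteger on the JVM, js/BigInt on engines that support it (e.g. recent Node).
(defn parse-big [s]
  #?(:clj  (BigInteger. s)
     :cljs (js/BigInt s)))

(parse-big "123456789012345678901234567890")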

Laziness as a virtue

This is related to the "computing the absolute minimum required" idea. By having lazy sequences at hand, it's possible to express elegantly a number of constructs in this sort of problem: sequences of primes, Fibonacci numbers, triangular / pentagonal / hexagonal numbers, etc.

In some of the new implementations of the puzzles I relied a lot on the language to compute only what was required using map, lazy-seq or take-while, which are also lazy. If you understand how laziness works you can let the code decide how many elements of these sequences are needed for any given step, and more importantly, when they need to be computed.
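A small illustration (my own example, not from the repo): an infinite lazy sequence of triangular numbers, of which only as many elements as take-while needs are ever realized.

(def triangulars
  (map #(/ (* % (inc %)) 2) (iterate inc 1)))

;; Only as many elements as needed get computed.
(take-while #(< % 50) triangulars)
;; => (1 3 6 10 15 21 28 36 45)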

On the other hand, there are scenarios where you don't want to be lazy because having predictable performance characteristics in your code is a must, and the language gives you the option.

So, that was 9 ideas after using Clojure (again) to solve puzzles. If you are planning to learn Clojure, or are just starting, I think this advice applies the most to you.

If you are interested, here's my GitHub repo with the new solutions. Feel free to take a look, and if you want to submit your own solutions, PRs are welcome. I'll try to keep uploading solutions to the remaining problems too.

Permalink

In the onion architecture, how do you make business decisions that rely on information from actions?

I’ve gotten several questions about how to do X or Y in the Onion Architecture. It seems like giving the architecture a name has miscommunicated how simple it is. It’s just function calls that at some point are all calculations. In this episode, I try to deconstruct what makes the onion architecture work. Spoiler: it’s just function calls.

Transcript

Eric Normand: In the Onion Architecture, how do you make business decisions that rely on information from actions? I get this question a lot, especially when I bring up Onion Architecture in more of these episodes. In this episode, I am going to answer it for everyone. My name is Eric Normand. I help people thrive with functional programming.

The Onion Architecture is a way of structuring your application with actions on the outside. These are called the interaction layer because that's where you're interacting with the world; that's where your actions live. You're receiving requests from the outside. You're making API requests yourself.

You’re reading from the database. You’re doing a lot of IO, having effects on the world or sending emails. You’re making lights blink. Whatever your software does. That’s all in the interaction layer.

Inside you have a nice, pure set of layers that are all about calculations. I like to divide them up in a certain way. I like to put the business rules as my first layer inside. Inside of that, a domain layer.

Both of those, it doesn't matter how you divide it up, especially for this discussion. They're calculations. They're pure. They are not based on stuff from the outside. They have no effect on the outside. They're like a little brain: you can give it questions and it will answer those questions. It's making decisions, basically.

You've got these calculations making decisions: business decisions, even simple domain decisions. The actions are doing stuff like fetching things from the database. Sending data to an API.

Here’s the question that I get a lot, “If you’ve got the Onion Architecture, how do you make decisions that should be calculations like how many times to retry an API? If it fails the first time, do you retry?” That’s a decision your software has to make. How do you have a calculation that decides that it needs to have more information from the database?

It says, OK, I've done a bunch of stuff. Now I know I need more information. It can't get it itself. How does that information get up to the higher layer, so that the higher layer can fetch it and then give it back? It's weird. It starts to sound like a really difficult problem.

I said I've gotten this question several times in different forms. It's a confusion I've caused in how I've explained it. I'm using language to try to talk about them as separate layers, and people aren't really used to thinking in layers.

I'm going to try to pick it apart and put it back together with a new explanation. When I talk about layers, I'm talking about functions in your language; they're either calculations or actions. Functions calling other functions. Function A calls function B.

There is a relationship there. You draw all of the relationships between functions: what functions they call, what functions those functions call, and so on. You make sure all of the arrows are pointing down. You have the stuff that nothing calls up at the top; that would be like your main.

Then you have the stuff that calls nothing else at the bottom, and you can just arrange them all. At some point, you could draw a line and say, "Everything below this line is calculations." Because a calculation cannot call an action. If a calculation called an action, it wouldn't be a calculation.

By definition, calculations cannot call anything above that line. You could draw a line and say, “This stuff down here is all calculations.” The stuff above is all actions.

That’s what I mean by the layers. It’s all just function calls, it’s not like some kind of protocol for communicating and getting information like, “Here’s a decision I need you to make,” so you give me the answer, and then I ask you what I do with that answer.

It’s not that, it’s just function calls. Let’s look at the two examples I gave, these two questions that were asked. If you have to do a retry, you make an API call, and it fails, it times out, you don’t know what happened. I’m going to retry it. How do you decide whether to retry it?

Let’s say your rule in your system is retry it three times or retry it two times. You try it once, and you have two more times to retry. That’s really just a less than. You keep track of how many times you called it, and then you see if it’s less than three. If it’s less than three, then you’d keep trying. Then you decrement or whatever, and increment.

That less than is a calculation. It is not named, and it's probably inlined right in your action. That is a calculation. It is a decision being made. You could, if you wanted to, say that this less than, this less-than-three operation, is actually a business rule.

I probably wouldn’t call it a business rule. It’s more like an architectural rule or system integrity rule, something like that. You could say, I want to name this function that will decide, based on how many times I’ve already tried the API, whether I should try again.

Just a simple Boolean. It's basically just the body of the function. It's just less than: less than three, or whatever number you choose. You could say that's a business rule. Put it in there. The action is just calling this function, called retry? or should-retry. That's the name of the function.

Notice that this is an action calling a calculation. That's it, that's all it is. The action is in the interaction layer, and the calculation is in the business layer.
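Here's a hedged sketch of that retry example in Clojure (the names and structure are mine, not from the episode): the rule is a pure calculation, and the action calls it.

;; Calculation: a pure rule that decides whether to try again.
(defn should-retry? [attempts]
  (< attempts 3))

;; Action: calls the API and consults the calculation when a call fails.
(defn call-with-retries [api-call]
  (loop [attempts 1]
    (let [result (try (api-call)
                      (catch Exception _ ::failed))]
      (if (and (= result ::failed) (should-retry? attempts))
        (recur (inc attempts))
        result))))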

What if you’re doing something, and you’re doing this big calculation. You’ve managed to turn it into a nice data pipeline. At some point in the pipeline, something says, “Whoa, I need more data from the database. I need this record,” or, “I need this whole set of records from the database to continue working.” What do you do?

Again, when I read the question, it sounds like the same thing, where people are thinking like, “This interaction layer is telling the business layer to do all this work.” Then, somehow, the business layer needs to communicate back up to the interaction layer, and it’s going to go fetch something. There’s this back and forth, back and forth communication.

I don't know what people are thinking, but it sounds a lot like an object-oriented mindset, where you've got these two peers that are communicating.

It's like peer-to-peer communication, with this protocol of, "You tell me what data you've got, and I'll start calculating. When I need more, I'll tell you that I need more, and then you'll fetch it for me, and then I'll keep going, and then I'll tell you I need more."

That is not what I'm trying to get at. When I have taken the code that people have given me as an example of "this is really hard to do in the Onion Architecture," basically all I do is move stuff out of this big action. It's like 20, 30 lines, and I just move things. I say, "Oh, that could be a calculation. That's like a business rule, and this is a business rule." I just move it into calculations.

Then the action just gets shorter, because now it's just calling these named functions instead of all this inline code. A lot of it moves into other actions, by the way. It's like threading it through.

OK, we're going to fetch this thing from the database, and then pass that to this calculation, and it's going to give us an answer. Then we take another thing from the database, and we pass it to the next function with what we already had. That's going to do some other calculation.

It looks like regular code. It just looks like normal code. I feel like by naming the thing, like Onion Architecture, I’ve somehow confused people that they think it has to be much more sophisticated, complicated than it has to be.

This is the way I see it. A calculation can only make decisions based on what it knows, what it has been passed through the arguments. It can’t say, “Oh, I need more data.” Whatever has to decide, “This thing needs more data,” it’s not a calculation. It’s an action in an upper layer.

What comes out of this is that we're doing it for efficiency: we might not need to fetch that huge data set from the database. We won't know until we're halfway through the calculation whether we're going to need it or not.

That’s cool. That’s a different problem. Now, we’re talking about, “Are we doing a lazy thing?” Or, “Does this calculation really have a natural point, a break point, where it’s really two calculations that can be called separately, and results from the first thing can be threaded into the second part?”

Is that what's going on? The idea is that this calculation gets to a point and says, "OK, I need more data." Laziness might solve it. We've talked about that before, in previous episodes. We construct a delay. The delay is a suitable thing here, because the calculation is still pure, even though triggering this delay to be realized fetches data.

That logic was injected in. It was passed in from the outside. In fact, I’ve seen Onion Architecture implementations that get passed a function, and then that function just returns the data. You could pass it in a function with dummy data that just gets returned, or you can pass it a function that will fetch the data from the database.
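As a hedged sketch of that injection style (every name here is hypothetical): the calculation just calls whatever function it was handed, so a test can pass canned data and production can pass a real database query.

;; fetch-orders is whatever the caller passes in: a stub or a real query.
(defn big-customer? [fetch-orders customer-id]
  (> (count (fetch-orders customer-id)) 10))

;; In a test: pass a function that returns canned data.
(big-customer? (constantly (repeat 12 {:total 20})) 42) ;; => true

;; In production: pass something like #(db/find-orders %) (a hypothetical query).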

Does that turn that calculation into an action, because it is now fetching from the database even though it doesn't know it? That is a very philosophical question. You're going to have to draw the line somewhere you feel comfortable with. I would be very comfortable with that kind of thing. That's up to you.

I don’t know if I’ve really done my goal, [laughs] achieved my goal of deconstructing this, and really framing it back in terms of function calls.

The Onion Architecture, it’s maybe more of just a way to look at it, that you don’t have to have everything in a big set of actions. That you can push business rules down into calculations, down the layers.

Remember, if all of our dependencies, all of our function call lines are pointing down, that means calculations…They can’t call up because they’re below the actions. That’s what I mean by pushing it down.

We’re taking these business rules, making sure that they’re implemented as pure functions as calculations. They go into a separate layer that then gets called by the interaction layer. That interaction layer is stuff like, I got a Web request — that’s an action. That depends on when, it’s a timely thing. You also can’t decide when you’d get it, it just comes.

I got a Web request. Now, I have to decide what this Web request is. I’m going to route it. The route, that’s probably a calculation. It’s going to take that path, and it’s going to tell me something, like how this request should be handled.

I take that information, I know how it should be handled, still in the interaction layer. That means I had to call this handler. This handler needs XYZ to be called. It needs a little bit more data. It needs the user information, the session, that kind of stuff.

It adds that in, and then the handler gets called. Handler is probably still part of the interaction layer, it still might fetch stuff from the database. This thing is going to fetch that from the database, make a big decision or several small decisions from the business rules, package it up as a response, and then send it back out.
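As a rough sketch of that flow (all the names below are made up, not from the episode):

;; Interaction layer: actions fetch data, calculations decide, then we respond.
(defn handle-order-request [request]
  (let [user  (db/find-user (:session request))     ; action (hypothetical)
        order (db/find-order (:order-id request))   ; action (hypothetical)
        total (total-with-tax order (:region user)) ; calculation: business rule
        body  (render-summary order total)]         ; calculation: pure formatting
    {:status 200 :body body}))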

It’s all function calls. The handler is calling calculations, and then coming back. I look at it like, you don’t need to do anything special, except just make sure that at some point, you do have all calculations going down.

That you don't have a thing where a business rule that is deep down in the call graph is going to reach out to the database somewhere. That fetching should actually be up at the top, and the stuff below it should be pure. That's all it is.

I hope this hasn’t been too mystifying. I fear it has been. If it has, get in touch with me. I want to talk about this in a way that’s more understandable. If you have a better way to explain it, if I’m confusing you more, send me examples of things that you don’t understand how you could turn into an Onion Architecture.

You can get in touch with me by going to lispcast.com/podcast. There you’re going to find links to social media like email, Twitter, stuff like that — whatever you think is best for communicating with me, based on the length. Probably if you got some code, it’s best to go by email, not Twitter, but you decide.

You’ll also find links to subscribe to this podcast. You’ll see all the old episodes with audio, video and text transcripts, so if you need to go back and binge-listen, binge-watch or binge-read, it’s there. I was looking at it the other day. I have 139 episodes, so this is 140. That’s quite a number.

This has been my thought on functional programming. My name is Eric Normand. Thank you for listening and rock on.

The post In the onion architecture, how do you make business decisions that rely on information from actions? appeared first on LispCast.

Permalink

PurelyFunctional.tv Newsletter 344: Tip: thank a Clojure OSS dev today

Issue 344 – September 16, 2019 · Archives · Subscribe

Clojure Tip 💡

say thanks to Clojure open source developers

Open source work is hard. People demand a lot of you, and you work for very little money, if any. This goes for open source library, tool, and Clojure core developers. They could all use more messages of thanks for their mostly thankless job.

How to say thanks: keep it simple. “Thanks for all your work on X. I use it all the time and it helps me Y.” A quick message over email or other social media is all it takes to brighten someone’s day.

Book status 📖

I have new codes for my book. They give you 40% off of everything at Manning.

Code: PUREFUNC40

Feel free to buy my book and anything else that suits your fancy. I’ll get a small affiliate fee 🙂

You can find my book, Grokking Simplicity, here.

About the book itself, I’ve found a pretty good layout for showing code progress. That is, code before and after small changes. A sequence of those pages can show a significant change to the code without losing the readers.

It's mostly a graphic design challenge: what font sizes, weights, and other typographical features a) make everything fit side-by-side and b) make it clear what's being changed.

You’d think this was already solved, but it’s not. Most authors use the whole width of the page for code listings, then show progress by describing changes in prose. It works, but it’s a lot of burden to put on the reader. I’m hoping this is one of the features that will set my book apart.

Currently recording 🎥

Another week without any new lessons in the Property-based Testing course. My apologies. I try to keep it professional but life does get in the way.

Thanks for all of your messages of interest in Property-Based Testing. I am going to finish the course. It’s an important topic and there isn’t enough material on it.

Brain skill 😎

align your learning with purpose

Learning for learning’s sake is great. Following your curiosity is both pleasurable and fruitful.

But curiosity is not always aligned with our desires. For instance, I want to learn Elixir and Rust. I’m curious about them, but that curiosity is easily quenched by reading high-level descriptions of them with a few code examples. I’ve accepted that curiosity is not going to drive my learning.

So I must find a purpose. I must find some purpose for which learning Elixir or Rust would really help. It’s either that or force myself to learn them. But that sounds doubly tedious. Be both laborer and manager.

Purpose has a couple of benefits. For one, it has motivation built in. If I succeed, I have something to be proud of. The second benefit is that it forces me to stay practical. If I’m learning Rust to build an embedded device, I know what I have to learn and getting the software to compile and run is a practical goal. Otherwise, I could read endless documentation about the borrow checker. Purpose is clarifying.

So what purpose are you going to put Clojure to?

Clojure Challenge 🤔

Last week’s challenge

The challenge in Issue 343 was to make a higher-order function that makes another function idempotent.

You can check out the submissions here.

This week’s challenge

retry three times

One of the beautiful things about functional programming is higher-order functions. We write functions that operate on functions. This lets us pass around code to run later, bundled up as a function.

In last week’s challenge, we saw how we could wrap a function in another to make it idempotent. This week, we will make a function that retries another function three times, or until it succeeds.

But first, what does it mean to fail? On the JVM, failure is commonly represented with a thrown exception. So, your task is to write a function that will call its argument. If it throws an exception, it tries again, up to three times total.

As usual, please send me your implementations. I’ll share them all in next week’s issue. If you send me one, but you don’t want me to share it publicly, please let me know.

Rock on!
Eric Normand

The post PurelyFunctional.tv Newsletter 344: Tip: thank a Clojure OSS dev today appeared first on PurelyFunctional.tv.

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.