PurelyFunctional.tv Newsletter 362: Tip: double recursion

Issue 362 – January 27, 2020 · Archives · Subscribe

Clojure Tip 💡

double recursion

A few issues ago, I deconstructed the parts of recursion that you need in order for it to work. They were 1) a base case and 2) progress toward the base case. But did you know that you can have multiple base cases, making progress toward some or all of them in each recursion? This is how you can implement algorithms that in other paradigms might be written as nested loops. Let me demonstrate.

If I want to recurse through a doubly-nested vector, I might traditionally use nested for loops. But we can do it with a single recursive function.

(defn print-double-nested
  "v is a doubly nested vector, like [[1 2 3] [4 5 6]]"
  [v]
  (loop [inner [] outer v]
    (cond
      (not (empty? inner))
      (let [[f & rst] inner]
        (println f)
        (recur rst outer))

      (not (empty? outer))
      (let [[f & rst] outer]
        (recur f rst))
      
      :else
      :done!)))

Let’s gather some information about this function. There are three cases: one base case (when both inner and outer are empty) and two recursive cases. When inner is not empty, the recursion makes progress on inner but not outer. When outer is not empty, the recursion “refills” inner and makes progress on outer. Eventually, we will run out of nested vectors, and the whole thing will end. Two recursive calls, each making progress toward the same base case.

But you can have multiple base cases!

In the challenge from last week, we needed to find the closest “gapful number” to a given number. The closest number could be smaller or bigger. In the case that both are equally close, we are supposed to favor the smaller one.

Here is a doubly-recursive solution:

(defn closest-gapful2 [n]
  (loop [high? true low (dec n) high n]
    (if high?
      (if (gapful? high)
        high
        (recur false low (inc high)))
      (if (gapful? low)
        low
        (recur true (dec low) high)))))

Note that there are two base cases, one that returns high and one that returns low. These are returned if we have found a gapful number, which we will eventually.

We are also making progress in our two recursive cases. One increments high, making it higher, and the other decrements low, making it lower. In this way, we alternate between making progress up and making progress down. Eventually, one will be a gapful number, and we return.
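The gapful? helper isn’t shown above. A minimal sketch, assuming the usual definition of a gapful number (a positive number with at least three digits that is divisible by the number formed from its first and last digits):

```clojure
;; Sketch of the gapful? predicate used by closest-gapful2, assuming
;; the usual definition: at least three digits, and divisible by the
;; two-digit number formed from the first and last digits.
(defn gapful? [n]
  (let [s (str n)]
    (and (>= (count s) 3)
         (zero? (rem n (Long/parseLong (str (first s) (last s))))))))

(gapful? 100) ;; => true  (divisible by 10)
(gapful? 112) ;; => false (not divisible by 12)
```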

I often use double recursion like this to traverse known structures. I’m not sure if it’s something I can recommend you do. Should I keep doing them? They often seem too complex and clever. Anyway, it’s something you may encounter and find useful. What do you think about all this?

Relevant courses 📚

If you like recursion, laziness, and reduce, check out these courses.

Book update 📖

Chapter 6 is out! Buy it now: Grokking Simplicity.

I’m still working on Chapter 7, but it is “content complete”. I’ve said everything I need to in the chapter. Now it’s just about final polish: layout, design, editing, and what my editor calls “over authoring”, where you layer on helpful information to make it easier for the reader.

Chapter 7 is about stratified design. And I’m very happy diving into the reality of design. The iterations, the false starts, the dead ends, and sometimes, rarely, finding something sublime. But most of the time you just make it easy to read so you can come back later.

Since Chapter 6 just came out, it will be about a month before Chapter 7 comes out.

You can buy the book and use the coupon code TSSIMPLICITY for 50% off.

Podcast episode🎙

I’ve revamped the format of my podcast. If you like reading old computer science papers to really understand how we got here (and sometimes uncover old gems), this podcast is for you. Subscribe!

In the first episode of the season, we read a paper from 1987 by Abelson and Sussman, the authors of SICP, called *Lisp: A language for stratified design*. The paper asks the question: “What is Lisp good for?” Read the paper, then follow along on the podcast.

The next episode comes out next Monday. We are reading The Early History of Smalltalk.

Clojure Challenge 🤔

Last week’s challenge

The challenge in Issue 361 was a function to find the closest “gapful number” to a given integer. You can see the submissions.

You can leave comments on these submissions in the gist itself. Please leave comments! We’re all here to learn.

This week’s challenge

Parse query parameters

URLs can have optional query parameters. They are the key-value pairs that follow the path.

Write a function that takes a URL (represented as a string) and parses out the query parameters into a hashmap.

Notes:

  • The query string is the string containing the query parameters. It appears after the ? and before a #. Ex: https://lispcast.com/search?q=clojure#page2
  • The keys and values are URL Encoded. You can use java.net.URLDecoder/decode to decode them.
  • The key-value pairs are separated by &. Ex: a=1&b=2
  • Each key-value pair contains the key, followed by =, followed by the value. Ex: a=1

Bonus:

Query parameters can contain duplicate keys. Handle them gracefully.
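If you want to check your approach afterward, here is one possible sketch following the steps in the notes (names are illustrative, not from the issue); duplicate keys are collected into vectors to handle the bonus case:

```clojure
(require '[clojure.string :as str])
(import 'java.net.URLDecoder)

;; Sketch: split off the fragment, then the query string, then the
;; key-value pairs. Every value is collected into a vector so that
;; duplicate keys are handled gracefully.
(defn parse-query-params [url]
  (let [query (-> url
                  (str/split #"#" 2)
                  first
                  (str/split #"\?" 2)
                  second)]
    (if (nil? query)
      {}
      (reduce (fn [m pair]
                (let [[k v] (str/split pair #"=" 2)]
                  (update m
                          (URLDecoder/decode k "UTF-8")
                          (fnil conj [])
                          (URLDecoder/decode (or v "") "UTF-8"))))
              {}
              (str/split query #"&")))))

(parse-query-params "https://lispcast.com/search?q=clojure#page2")
;; => {"q" ["clojure"]}
```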

As usual, please reply to this email and let me know what you tried. I’ll collect them up and share them in the next issue. If you don’t want me to share your submission, let me know.

Rock on!
Eric Normand

The post PurelyFunctional.tv Newsletter 362: Tip: double recursion appeared first on PurelyFunctional.tv.

Permalink

Q1 2020 Funding Announcement

Clojurists Together is happy to announce that for Q1 of 2020 (February-April) we are funding four projects: Reagent, fireplace.vim, Ring, and Calva.

Recap

For those who are new to Clojurists Together, our goal is simple: Fund critical Clojure open-source projects to keep them healthy and sustainable. Clojure companies and individual developers sign up for a monthly contribution, and we pick projects to fund each quarter. Previously we have supported datascript, kaocha, cljdoc, Shadow CLJS, clj-http, Figwheel, ClojureScript, CIDER, Aleph, Neanderthal, Fireplace, and other projects.

Funding details

We support our grantees by paying them a total of $9,000 over three months ($3,000/mo).

Reagent

Why is this project important to the Clojure community?

Based on Clojars downloads, Reagent is the most used ClojureScript React wrapper. Reagent is also one of the most popular ClojureScript projects on GitHub with 3.7k stars, and re-frame, a framework built on top of Reagent, has even more stars.

What are you wanting to achieve with this funding?

I can work on Reagent at work, during our 10% time and in some cases during customer projects, but this is mostly maintenance and small fixes, as it is hard to find time for bigger changes.

To implement larger changes I’d need some weeks of dedicated time. Two of the bigger changes I’ve been thinking about are:

  • Supporting React hooks
  • Configurable (Reagent-)hiccup compiler

One of the most requested features is Reagent support for React hooks. Unfortunately, hooks are only usable with React function components, and the current Reagent RAtom implementation uses stateful component state. RAtoms need to know which components depend on them, and be able to re-render the component when RAtoms change. This should be possible by storing state and triggering re-renders using hooks. The trick will be to implement this in a way that doesn’t break existing Reagent components, which presume stateful components.

As this affects how Reagent turns Hiccup into React elements and components, I have some ideas on allowing users to configure the Reagent Hiccup compiler, similar to what Hicada does. This would also allow introducing optional features that would break existing Reagent code, by making users opt in to them. One case would be to make React component interop simpler.

As hooks are the more important feature, I’d start by testing out the ideas I have for that, and then see if it makes sense to implement that alone, or if the Hiccup-compiler changes are needed or make sense to implement at the same time.

I think a good outcome would be a Reagent alpha/RC release with hooks support from the funding, and even better if I also have time to test ideas for the Hiccup-compiler feature.

fireplace.vim

Note: Tim has other work commitments for the next few months, and will start work on fireplace once those finish up.

Why is this project important to the Clojure community?

Fireplace.vim was used by 10% of Clojure developers in the most recent Clojure survey.

What are you wanting to achieve with this funding?

  • Leverage the recently added asynchronous APIs for user facing features, starting with eval.
  • Better automatic configuration for projects, notably Shadow CLJS, which is currently a pain to get up and running with.
  • Attend to various feature requests in the issue tracker (I got the low hanging fruit during the last sponsorship so most of what’s left is pretty meaty).

Ring

Why is this project important to the Clojure community?

Ring is the most commonly used HTTP abstraction layer for Clojure.

What are you wanting to achieve with this funding?

I’d like to bring out a draft spec and experimental alpha for Ring 2.0, in order to better support asynchronous HTTP connections, and to pave the way to support HTTP/2 and HTTP/3 in future. As a secondary objective, I’d also like to remove the provided dependency on Java servlets from Ring core.

Calva

Why is this project important to the Clojure community?

We’ve seen that Calva helps new Clojurists get up and running with Clojure without having to worry so much about tooling. Perhaps more importantly, VS Code has become such a major platform for developers and it’s important to have Clojure(Script) support in the developers’ editor of choice.

What are you wanting to achieve with this funding?

We would like to add the ability to work with large data sets. Currently, large data sets can freeze up the REPL session, and this makes Calva unusable for some developers who routinely work with such data. We would also like to add Clojure debugging to Calva. These are the two main goals, but handling large data sets is the priority. If at any point one of these problems seems to require a lot more time than we anticipated, we may shift our focus to handling the one task for the duration of the funding period, if necessary. In addition to those goals, we’d like to handle issues here and there and maintain the issue log where needed.

Voting details

The projects that applied this quarter were:

  • Conjure
  • Calva
  • Reagent
  • Graphqlize
  • Ring
  • Tupelo Forest
  • iSpooge Live
  • fireplace.vim
  • Light Table
  • Clojure Goes Fast
  • Saite
  • Clojure’s Boring Web Framework
  • Practicalli Spacemacs
  • vim-iced
  • Chlorine
  • Typed Clojure
  • Datahike
  • Cloxp 2.0
  • Perun
  • Formic
  • Intro to Programming w/Clojure in Georgian Language
  • Pathom
  • Ohmyclj
  • Klipse
  • Origami
  • Oz
  • form-validator-cljs

Q1 2020 Funding

We had a bunch of great applications from great projects. If you’d like to see more projects get funded, then please join. If you applied for the last funding cycle, we will re-use that application for future funding cycles. If you maintain a Clojure/ClojureScript project that is important to the community, consider applying for funding so we can help you keep it sustainable.

Lastly, a big thank you to all of our members. We couldn’t have done it without your support.

Permalink

Clojure & JavaScript Higher-order Functions

It was not easy for me to learn JavaScript. And when it came to learning higher-order functions, I had to take a break — literally get up and take a walk around the block — because I couldn’t seem to understand it. But through my pain, I’ve put together a short guide that hopefully makes it clearer for you.

Higher-order functions are functions that take other functions as arguments or return them as output, as opposed to first-order functions, which take and return only non-function values. Functions passed as arguments to another function are what we refer to as callback functions. In this post we learn about higher-order functions through examples in JavaScript and Clojure, two functional languages with very different syntax.

Why are they necessary?

Flexibility! Higher-order functions allow you to change how a function behaves by layering your own behavior onto its existing behavior. You are able to create something complex that is specific to your needs, hence people refer to higher-order functions as a “powerful abstraction tool.”

In this post, I will focus on the three most commonly-used higher-order functions: filter, reduce, and map.

Filter

This function takes a callback function and a collection. It returns a sequence in Clojure, or an array in JavaScript, containing only the items in the collection for which the callback returns a truthy value.

Example in Clojure


(def numbers [1 20 23 4 75 3])
(filter #(>= % 20) numbers)

;;; output
;;; (20 23 75)

Example in JavaScript


const numbers = [1, 20, 23, 4, 75, 3];
const greaterNumbers = numbers.filter(num => num >= 20);

console.log(greaterNumbers);

// output
// [20, 23, 75]

Map

Map takes a function and a collection. It then returns a lazy sequence in Clojure, or an array in JavaScript, that is the result of applying the function to each item in the collection.

Example in Clojure


(map #(* % 10) [1 2 3 4 5])

;;; output
;;; (10 20 30 40 50)

Example in JavaScript


const numbers = [1, 2, 3, 4, 5];
const multipleNumbers = numbers.map(number => number * 10);

console.log(multipleNumbers);

// output
// [10, 20, 30, 40, 50]

Reduce

Reduce also takes a callback function and a collection. The callback should be a binary function, that is, it should take two arguments. If reduce is passed an initial value, the callback takes that value and the first item in the collection to produce a result. If it is not passed an initial value, the callback begins by taking the first and second items in the collection as arguments and returns a result. Let’s see this in action:

Example in JavaScript

Without a value


const numbers = [1, 2, 3, 4];
const summedNums = numbers.reduce((accumulator, currentValue) => accumulator + currentValue);

console.log(summedNums);

// output
// 10

In the above example, the callback function took the first two items and added them together, then used the result of that operation as the first argument in the second iteration. The table below describes the steps in this iteration:

iteration-table

With a value


const numbers = [1, 2, 3, 4];
const summedNums = numbers.reduce((accumulator, currentValue) => accumulator + currentValue, 100);

console.log(summedNums);

// output
// 110

In this case we pass 100 as the first argument to the callback function, and the iteration proceeds as in the table below:

iteration-table-100

Example in Clojure

Without a value


(reduce + [1 2 3 4])

;;; output
;;; 10

Here + is the callback function; it takes the first and second items, then folds each remaining item into the running total.

With a value


(reduce + 100 [1 2 3 4])

;;; output
;;; 110

That is it! At the end of the day, higher-order functions are quite similar regardless of the programming language you are using. As long as you understand the general concepts and goals of higher-order functions in the language of your choice, you will be able to reapply that knowledge and use those same higher-order functions in other languages.

With this knowledge, I can now confidently create my own higher-order functions to write maintainable code in multiple languages. Higher-order functions allow me to write code that uses fewer lines while still being understandable. With higher-order functions the code is less repetitive and more efficient.
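Creating your own higher-order function can be sketched in a few lines of Clojure (twice is a hypothetical example, not from the post above):

```clojure
;; twice takes a function and returns a new function that applies it
;; two times -- a home-made higher-order function.
(defn twice [f]
  (fn [x] (f (f x))))

((twice inc) 5) ;; => 7
```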

Permalink

Clojure Interop With Python NLP Libraries

clojure-python

In this edition of the blog series of Clojure/Python interop with libpython-clj, we’ll be taking a look at two popular Python NLP libraries: NLTK and SpaCy.

NLTK – Natural Language Toolkit

I was taking requests for examples of Python-Clojure interop libraries on Twitter the other day, and by far NLTK was the most requested library. After looking into it, I can see why. It’s the most popular natural language processing library in Python, and you will see it everywhere someone is working with text.

Installation

To use the NLTK toolkit you will need to install it. I use sudo pip3 install nltk, but libpython-clj now supports virtual environments with this PR, so feel free to use whatever is best for you.

Features

We’ll take a quick tour of the features of NLTK, following along initially with the official NLTK book and then moving on to this more data-task-centered tutorial.

First, we need to require all of our things as usual:

(ns gigasquid.nltk
  (:require [libpython-clj.require :refer [require-python]]
            [libpython-clj.python :as py :refer [py. py.. py.-]]))
(require-python '([nltk :as nltk]))

Downloading packages

There are all sorts of packages available to download from NLTK. To start out and tour the library, I would go with a small one that has basic data for the nltk book tutorial.

 (nltk/download "book")
  (require-python '([nltk.book :as book]))

There are other sorts of downloads as well, such as (nltk/download "popular") for the most-used ones. You can also download "all", but beware that it is big.

You can check out some of the texts it downloaded with:

  (book/texts)

  ;;; prints out in repl
  ;; text1: Moby Dick by Herman Melville 1851
  ;; text2: Sense and Sensibility by Jane Austen 1811
  ;; text3: The Book of Genesis
  ;; text4: Inaugural Address Corpus
  ;; text5: Chat Corpus
  ;; text6: Monty Python and the Holy Grail
  ;; text7: Wall Street Journal
  ;; text8: Personals Corpus
  ;; text9: The Man Who Was Thursday by G . K . Chesterton 1908

  book/text1 ;=>  <Text: Moby Dick by Herman Melville 1851>
  book/text2 ;=>  <Text: Sense and Sensibility by Jane Austen 1811>

You can do fun things like see how many tokens are in a text

  (count (py.- book/text3 tokens))  ;=> 44764

Or even see the lexical diversity, which is a measure of the richness of the text, computed by comparing the number of unique word tokens against the total number of tokens.

  (defn lexical-diversity [text]
    (let [tokens (py.- text tokens)]
      (/ (-> tokens set count)
         (* 1.0 (count tokens)))))

  (lexical-diversity book/text3) ;=> 0.06230453042623537
  (lexical-diversity book/text5) ;=> 0.13477005109975562

This is of course all very interesting, but I prefer to look at some more practical tasks, so we are going to look at some sentence tokenization.

Sentence Tokenization

Text can be broken up into individual word tokens or sentence tokens. Let’s start with the tokenize package.

(require-python '([nltk.tokenize :as tokenize]))
(def text "Hello Mr. Smith, how are you doing today? The weather is great, and city is awesome.
The sky is pinkish-blue. You shouldn't eat cardboard")

To tokenize sentences, you take the text and use tokenize/sent_tokenize.

 (def tokenized-sent (tokenize/sent_tokenize text))
 tokenized-sent
 ;;=> ['Hello Mr. Smith, how are you doing today?', 'The weather is great, and city is awesome.', 'The sky is pinkish-blue.', "You shouldn't eat cardboard"]

Likewise, to tokenize words, you use tokenize/word_tokenize:

 (def tokenized-word (tokenize/word_tokenize text))
 tokenized-word
  ;;=> ['Hello', 'Mr.', 'Smith', ',', 'how', 'are', 'you', 'doing', 'today', '?', 'The', 'weather', 'is', 'great', ',', 'and', 'city', 'is', 'awesome', '.', 'The', 'sky', 'is', 'pinkish-blue', '.', 'You', 'should', "n't", 'eat', 'cardboard']

Frequency Distribution

You can also look at the frequency distribution of the words using the probability package.

 (require-python '([nltk.probability :as probability]))

 (def fdist (probability/FreqDist tokenized-word))
 fdist ;=> <FreqDist with 25 samples and 30 outcomes>

 (py. fdist most_common)
  ;=> [('is', 3), (',', 2), ('The', 2), ('.', 2), ('Hello', 1), ('Mr.', 1), ('Smith', 1), ('how', 1), ('are', 1), ('you', 1), ('doing', 1), ('today', 1), ('?', 1), ('weather', 1), ('great', 1), ('and', 1), ('city', 1), ('awesome', 1), ('sky', 1), ('pinkish-blue', 1), ('You', 1), ('should', 1), ("n't", 1), ('eat', 1), ('cardboard', 1)]

Stop Words

Stop words are considered noise in text, and there are ways to remove them using the nltk.corpus package.

(require-python '([nltk.corpus :as corpus]))

(def stop-words (into #{} (py. corpus/stopwords words "english")))
stop-words
;=> #{"d" "itself" "more" "didn't" "ain" "won" "hers"....}

Now that we have a collection of the stop words, we can filter them out of our text in the normal way in Clojure.

(def filtered-sent (->> tokenized-sent
                        (map tokenize/word_tokenize)
                        (map #(remove stop-words %))))
filtered-sent
;; (("Hello" "Mr." "Smith" "," "today" "?")
;; ("The" "weather" "great" "," "city" "awesome" ".")
;; ("The" "sky" "pinkish-blue" ".")
;; ("You" "n't" "eat" "cardboard"))

Lexicon Normalization and Lemmatization

Stemming and Lemmatization allow ways for the text to be reduced to base words and normalized. For example, the word flying has a stemmed word of fli and a lemma of fly.

(require-python '([nltk.stem :as stem]))
(require-python '([nltk.stem.wordnet :as wordnet]))

(let [lem (wordnet/WordNetLemmatizer)
      stem (stem/PorterStemmer)
      word "flying"]
  {:lemmatized-word (py. lem lemmatize word "v")
   :stemmed-word (py. stem stem word)})
;=> {:lemmatized-word "fly", :stemmed-word "fli"}

POS Tagging

It also has support for Part-of-Speech (POS) Tagging. A quick example of that is:

(let [sent "Albert Einstein was born in Ulm, Germany in 1879."
      tokens (nltk/word_tokenize sent)]
  {:tokens tokens
   :pos-tag (nltk/pos_tag tokens)})
;; {:tokens
;; ['Albert', 'Einstein', 'was', 'born', 'in', 'Ulm', ',', 'Germany', 'in', '1879', '.'],
;; :pos-tag
;; [('Albert', 'NNP'), ('Einstein', 'NNP'), ('was', 'VBD'), ('born', 'VBN'), ('in', 'IN'), ('Ulm', 'NNP'), (',', ','), ('Germany', 'NNP'), ('in', 'IN'), ('1879', 'CD'), ('.', '.')]}

Phew! That’s a brief overview of what NLTK can do, now what about the other library SpaCy?

SpaCy

SpaCy is the main competitor to NLTK. It is a more opinionated library and more object-oriented than NLTK, which mainly processes strings. SpaCy has better performance for tokenization and POS tagging, and it supports word vectors, which NLTK does not.

Let’s dive in and take a look at it.

Installation

To install spaCy, you will need to do:

  • pip3 install spacy
  • python3 -m spacy download en_core_web_sm to load up the small language model

We’ll be following along with this tutorial.

We will, of course, need to load up the library

(require-python '([spacy :as spacy]))

and its language model:

(def nlp (spacy/load "en_core_web_sm"))

Linguistic Annotations

There are many linguistic annotations available, including POS tags, lemmas, and more:

(let [doc (nlp "Apple is looking at buying U.K. startup for $1 billion")]
  (map (fn [token]
         [(py.- token text) (py.- token pos_) (py.- token dep_)])
       doc))
;; (["Apple" "PROPN" "nsubj"]
;;  ["is" "AUX" "aux"]
;;  ["looking" "VERB" "ROOT"]
;;  ["at" "ADP" "prep"]
;;  ["buying" "VERB" "pcomp"]
;;  ["U.K." "PROPN" "compound"]
;;  ["startup" "NOUN" "dobj"]
;;  ["for" "ADP" "prep"]
;;  ["$" "SYM" "quantmod"]
;;  ["1" "NUM" "compound"]
;;  ["billion" "NUM" "pobj"])

Here are some more:

(let [doc (nlp "Apple is looking at buying U.K. startup for $1 billion")]
  (map (fn [token]
         {:text (py.- token text)
          :lemma (py.- token lemma_)
          :pos (py.- token pos_)
          :tag (py.- token tag_)
          :dep (py.- token dep_)
          :shape (py.- token shape_)
          :alpha (py.- token is_alpha)
          :is_stop (py.- token is_stop)} )
       doc))

;; ({:text "Apple",
;;   :lemma "Apple",
;;   :pos "PROPN",
;;   :tag "NNP",
;;   :dep "nsubj",
;;   :shape "Xxxxx",
;;   :alpha true,
;;   :is_stop false}
;;  {:text "is",
;;   :lemma "be",
;;   :pos "AUX",
;;   :tag "VBZ",
;;   :dep "aux",
;;   :shape "xx",
;;   :alpha true,
;;   :is_stop true}
;;  ...

Named Entities

It also handles named entities in the same fashion.

(let [doc (nlp "Apple is looking at buying U.K. startup for $1 billion")]
  (map (fn [ent]
         {:text (py.- ent text)
          :start-char (py.- ent start_char)
          :end-char (py.- ent end_char)
          :label (py.- ent label_)} )
       (py.- doc ents)))

;; ({:text "Apple", :start-char 0, :end-char 5, :label "ORG"}
;;  {:text "U.K.", :start-char 27, :end-char 31, :label "GPE"}
;;  {:text "$1 billion", :start-char 44, :end-char 54, :label "MONEY"})

As you can see, it can handle pretty much the same things as NLTK. But let’s take a look at what it can do that NLTK cannot: word vectors.

Word Vectors

In order to use word vectors, you will have to load up a medium or large size data model because the small ones don’t ship with word vectors. You can do that at the command line with:

python3 -m spacy download en_core_web_md

You will need to restart your repl and then load it with:

(require-python '([spacy :as spacy]))
(def nlp (spacy/load "en_core_web_md"))

Now you can see cool word vector stuff!

(let [tokens (nlp "dog cat banana afskfsd")]
  (map (fn [token]
         {:text (py.- token text)
          :has-vector (py.- token has_vector)
          :vector_norm (py.- token vector_norm)
          :is_oov (py.- token is_oov)} )
       tokens))

;; ({:text "dog",
;;   :has-vector true,
;;   :vector_norm 7.033673286437988,
;;   :is_oov false}
;;  {:text "cat",
;;   :has-vector true,
;;   :vector_norm 6.680818557739258,
;;   :is_oov false}
;;  {:text "banana",
;;   :has-vector true,
;;   :vector_norm 6.700014114379883,
;;   :is_oov false}
;;  {:text "afskfsd", :has-vector false, :vector_norm 0.0, :is_oov true})

And find similarity between different words.

(let [tokens (nlp "dog cat banana")]
  (for [token1 tokens
        token2 tokens]
    {:token1 (py.- token1 text)
     :token2 (py.- token2 text)
     :similarity (py. token1 similarity token2)}))

;; ({:token1 "dog", :token2 "dog", :similarity 1.0}
;;  {:token1 "dog", :token2 "cat", :similarity 0.8016854524612427}
;;  {:token1 "dog", :token2 "banana", :similarity 0.2432764321565628}
;;  {:token1 "cat", :token2 "dog", :similarity 0.8016854524612427}
;;  {:token1 "cat", :token2 "cat", :similarity 1.0}
;;  {:token1 "cat", :token2 "banana", :similarity 0.28154364228248596}
;;  {:token1 "banana", :token2 "dog", :similarity 0.2432764321565628}
;;  {:token1 "banana", :token2 "cat", :similarity 0.28154364228248596}
;;  {:token1 "banana", :token2 "banana", :similarity 1.0})

Wrap up

We’ve now taken a grand tour of the two most popular natural language processing Python libraries, which you can use through Clojure interop!

I hope you’ve enjoyed it, and if you are interested in exploring yourself, the code examples are here

Permalink

Ep 065: Stuck in the Middle

Each week, we discuss a different topic about Clojure and functional programming.

If you have a question or topic you’d like us to discuss, tweet @clojuredesign, send an email to feedback@clojuredesign.club, or join the #clojuredesign-podcast channel on the Clojurians Slack.

This week, the topic is: “Ring middleware.” We find that the middle is a very good place to start when almost everything is composed of functions.

Selected quotes:

  • “It’s all fun from Ring on up!”
  • “I want to have more than one handler.”
  • “We want to build up different scenarios declaratively.”
  • “The way you compose functions together is by building a function that takes functions and returns functions.”
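The composition pattern the quotes describe can be sketched as a minimal Ring-style middleware (names here are illustrative, not from the episode):

```clojure
;; A handler takes a request map and returns a response map.
(defn handler [req]
  {:status 200 :body "hi"})

;; Middleware takes a handler and returns a new handler -- here it
;; adds a response header around whatever the wrapped handler returns.
(defn wrap-header [h]
  (fn [req]
    (assoc-in (h req) [:headers "X-Served-By"] "sketch")))

((wrap-header handler) {})
;; => {:status 200, :body "hi", :headers {"X-Served-By" "sketch"}}
```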


Permalink

Journal 2020.2 - s3 transporter, survey

Happy new year!

clj s3 transporter

I spent a good chunk of the last two weeks wrapping up and releasing the new S3 transporter for tools.deps/clj (supporting access to public and private S3 repos). The S3 transporter now uses a new implementation of the Maven resolver transporter API based on the cognitect aws-api.

The old version used the transporter-wagon adapter, the unmaintained spring s3 wagon, and the AWS Java SDK. Those relatively heavy-weight deps have been happily dropped on the floor, and all other deps were bumped as well. Support for some newer AWS credential sources may be better, particularly when running on AWS.

s3 buckets are inherently region-based. The old s3 wagon, and the new implementation, will attempt to determine a bucket’s region to properly access it. In cases where that doesn’t succeed, the new s3 transporter lets you explicitly set the s3 bucket region in your deps.edn, like {:mvn/repos {:url "s3://bucket/path?region=us-east-1"}} - this is a feature of this transporter, not a general AWS thing.

The intention here is that any existing use of an s3 maven repo should continue to work as it did before. If you find that not to be the case, please let me know! It has been tested in a number of configurations by several people, but you know how it goes.

Also, in the latest clj, clj -h will tell you the version on the first line of the help, which had been requested.

Clojure survey

The 2020 State of Clojure Survey has been open for the last couple of weeks and is now officially closed, but it is still accepting responses until Monday if you missed it somehow. We will start poring over the results and hope to have a write-up and the full results released in a couple of weeks.

Misc

Lots of other tidbits this week:

  • Rich Hickey’s paper on the History of Clojure was accepted to the HOPL IV conference in London in June 2020. Several of us are planning to attend and we hope to connect with the London Clojure community while we’re there - details very TBD! I’ve been helping Rich with some of the data for the paper this week.
  • Added a page about clj on brew to the web site, with help on switching to an older version if you need to do that for some reason. Also looking at some ideas about ways to better provide advance “dev” versions of clj, perhaps from our own tap.
  • Fixed a few issues in tools.deps related to the dependency updates around the s3 transporter.
  • Starting to refresh some stuff for future Clojure work again.
  • Working on odds and ends for spec 2, bigger stuff coming.

Other stuff I enjoyed this week

I’m a long time Pearl Jam fan, and their new single from the forthcoming album goes in a much different direction, calling forth some Talking Heads maybe?!? I dig the track, although it sounds like it has a drum machine instead of Matt Cameron. I’ve seen references online that it’s actually not, but have not seen anything definitive. Anyway, can’t wait to hear the rest of the album and see them on the tour!

Permalink

Transparent Functions with Equality Semantics

Opaque and transparent functions

Opaque functions are functions that encapsulate all the state and scope they have and provide only one interaction interface: invoke. This is how functions in Clojure operate by default, and this seems to be a great fundamental building block.

By transparent functions I mean functions that might also be data, meaning they provide some sort of additional insight into what kind of scope they have, possibly allowing you to create derived functions with a changed scope and, by being data, additionally providing equality semantics.

There are good reasons why functions are opaque by default:

  • It is very hard to define equality for functions. The class name of a function is not a guarantee of equality. Bytecode or source-code equivalence is not enough if a function closes over some state. Functions with equal behavior on every input can have different code structure.
  • You can’t easily serialize a function as you can serialize data. Putting a function that closes over a database connection on the wire just does not make any sense.
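As a quick illustration of the first point: in Clojure, two textually identical anonymous functions are not equal, because equality for functions falls back to object identity.

```clojure
;; Each (fn ...) form produces a distinct object, and = on
;; functions is identity comparison, so this is false:
(= (fn [x] (inc x))
   (fn [x] (inc x)))
;; => false
```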

Partial transparency

With that said, for some functions it’s very easy to define equality semantics. For partial it’s equality of the wrapped function and arguments. For comp it’s the chain of wrapped functions. For constantly it’s its return value. The middleware pattern (wrapping functions in functions) can be viewed as an interceptor chain.
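For example, a transparent partial could be sketched as a record that remembers its pieces, so that equality is structural. (The names TransparentPartial and t-partial, and the single-argument invoke arity, are my own simplifications, not from the article.)

```clojure
;; A record holding the wrapped function and the already-supplied
;; arguments. Two instances built from the same pieces compare equal.
(defrecord TransparentPartial [f args]
  clojure.lang.IFn
  (invoke [_ x]
    (apply f (concat args [x]))))

(defn t-partial [f & args]
  (->TransparentPartial f args))

((t-partial + 1 2) 3)
;; => 6

;; Structural equality, unlike clojure.core/partial:
(= (t-partial + 1 2) (t-partial + 1 2))
;; => true
(= (partial + 1 2) (partial + 1 2))
;; => false
```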

How to make transparent functions

The trick is to define a record that implements the clojure.lang.IFn interface. That way you get a function that is also a map, so you can create derived functions by modifying it as a map. Example:

(defrecord Add [x]
  clojure.lang.IFn
  (invoke [_ y]
    (+ x y)))
;; => user.Add

(def add-5 (->Add 5)) 
;; => #user.Add{:x 5}

(def add-6 (update add-5 :x inc))
;; => #user.Add{:x 6}

(add-6 1)
;; => 7

If you want it to work with apply, you will also need to implement the applyTo method defined on IFn.
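A sketch of what that might look like, redefining the Add record from above with a one-line applyTo (one plausible implementation; the article doesn’t show it):

```clojure
(defrecord Add [x]
  clojure.lang.IFn
  (invoke [_ y]
    (+ x y))
  ;; applyTo receives the argument list as a seq, so we can
  ;; delegate straight to + here.
  (applyTo [_ args]
    (apply + x args)))

((->Add 5) 2)
;; => 7
(apply (->Add 5) [2])
;; => 7
```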

Do you need it?

I’ve felt the need for transparent functions twice, and both times I later decided to achieve my goals by other means, so that might be a sign of some issues with the approach. That’s why I decided against writing a helper library to reduce the boilerplate involved. Hopefully, you’ll find it useful at least as food for thought.

What do you think? Discuss on reddit.

Permalink

Lisp: A language for stratified design

In this first episode of season 3, we analyze a great paper called Lisp: A language for stratified design. The paper asks the question “What is Lisp good for?”

Download the paper.


Transcript

Eric Normand: “People who first learn about Lisp often want to know for what particular programming problems Lisp is the right language. The truth is that Lisp is not the right language for any particular problem. Rather, Lisp encourages one to attack a new problem by implementing new languages tailored to that problem. Such a language might embody an alternative computational paradigm as in the rule language, or it might be a collection of procedures that implement new primitives, means of combination, and means of abstraction embedded within Lisp, as in the Henderson drawing language.”

My name is Eric Normand. We are talking about a paper today called Lisp: A Language for Stratified Design by Harold Abelson and Gerald J. Sussman. Now, these are the authors of Structure and Interpretation of Computer Programs, and, as far as I can tell, they take two examples from within the book and extract them into a research paper/article that they published. And this is dated August 1987.

That paragraph was from the conclusion; it’s almost the last thing they say. And the reason I read that section is that every research paper has a main question that it’s trying to answer, and I think that’s the answer. And the question is: what is Lisp good for?

Why do people like Lisp? Now we have to go back to 1987 for historical context: there weren’t a lot of functional programming languages, and Scheme was very young at that time. The authors of this paper worked on the original Scheme. And so it’s important to put ourselves back there: when people talked about Lisp, a lot of what they were talking about is what we would call functional programming today. That is, first-class functions, a lot of interesting data types that you can manipulate recursively, things like that, that really differed from what other languages gave you. Okay, so I’m going to be reading excerpts from this paper.

I really liked this paper and I’ve read it, I don’t know, four or five times now. I made notes and so I’m going to read some sections and talk about them. Okay. Just as an analysis of the paper. Like I said, August, 1987 so we’re going to have to go back in time a little bit here. Okay. So the first part I want to read is in the abstract.

“Lisp as a language for expressing the design and organization of computational systems.”

Okay. So this is, you know, if you go back to the context in which this was written, people weren’t talking about programming languages as a means for expressing a design or an organization. It was a way to get the computer to do what you needed it to do. Okay. So they’re kind of establishing what I think today is a lot more well-understood and well-accepted that there are different readers, you have to write the program for your machine, for your compilers so that it can run on the machine. But you’re also writing it for human readers, for yourself at a later date. Because you’re going to be reading this later, working on a team of people building the software. And so they’re trying to build out this idea that we kind of take for granted today.

So that’s a big theme when you’re reading these old papers that sometimes you stumble upon a paper where they’re saying something totally obvious to someone from 2019, which is when I’m— 2020, when I’m recording this. But then you realize, Oh no, back then it must have been different. This must have been something you had to say as context for why you’re writing the paper.

So I think that that’s very important in this paper. For instance, he said, or they say, “a programming language is a medium for communicating methods, not just a means for getting a computer to perform operations. Programs are written for people to read as much as they are written for machines to execute.”

Okay. This is, this takes a very prominent place in the paper. It’s in the first paragraph that they’re trying to establish this idea that I think is pretty well understood today. You’re not just trying to write the most efficient program for your computer to run. You need to basically live in this software as an employee of the company, you’re going to be dealing with the software all the time, modifying it, trying to debug it. You need it to be for people as well.

“This article exhibits programs that illustrate the power of Lisp as a language for expressing the design and organization of computational systems.”

Okay, so it’s answering that question: what is Lisp for? Right. According to this paper, what they’re trying to say is that it does more than other languages do.

It’s particularly good at something. There’s a reason they like Lisp. They wrote Scheme, right? They must have chosen a Lisp for a reason. So they’re trying to explain themselves.

Okay. And now they’re talking about the paper.

“The examples are chosen to highlight the importance of abstraction in program design and to draw attention to the use of procedures to express abstractions. Any programming language provides primitive components, means by which these can be combined, and means by which patterns of combination can be named and manipulated as if they were primitive.”

So often when we’re reading, when we’re researching, we’re looking for The Truth. And that’s dangerous because the truth is elusive. And often there are multiple truths, and it’s much better to find perspectives, different ways of looking at some phenomenon. And so I’m going to invite you at this time to not ask, is it true that Lisp is better for something?

Or, what is a programming language and what does it provide? Don’t demand The Truth; rather, let this paper be a doorway. Enter into this world. You can always leave. Okay, it’s totally safe. But they’re establishing some definitions here. This happens to be the same definition that they give at the beginning of Structure and Interpretation of Computer Programs.

Another reason why this is a great paper: it’s short. It boils down a really great idea. But let’s go over those things again, a program, “any programming language provides primitive components, means by which these can be combined and means by which patterns of combination can be named and manipulated as if they were primitive.”

Okay, and this last one: patterns can be combined and named and manipulated. This is what they mean by abstraction. Okay? There are different definitions of abstraction, and sometimes we even use the word without really knowing what we mean by it, but they’re defining it right here. So you write a new function, you give it a name, and now that function, because it’s first-class, can be combined in other ways using the other primitives that exist.

Okay. So I want to stick to this for a little bit. So primitive components. This is the stuff that comes with the language, okay? These are your built in data types. Your built in functions. Then you have means by which these can be combined. Okay. So in a Lisp, that would be, make a new function or making a compound data type if you’ve got types in your language. In something like Java, it would be you can combine these classes, these objects into a new class.

And then means by which those patterns of combinations—so the things that you created and combined—you can give them a name and manipulate them as if they were primitive.

So you’re back in the cycle. You’ve built all these things. Now you can act like they’re primitive. You could act like they came with the language. But you wrote them. They’re part of your software. Okay? So this is opening up the idea that there’s a method to our madness: this cycle where I can take something, build something new out of it, and then treat that new thing as if it originally came with the language.

Okay. A lot of languages can’t do that. In a lot of languages, you could make a subroutine. But that subroutine is different from the keyword that came with the language. Right. Or you could make a data type, but it’s never going to be like the data types that come with the language.

Some languages are like that. Probably at this time, more languages were like that. Okay. We’ve learned a lot since then.

Okay, so I’m going to continue.

“With appropriate abstractions to separate the specification of components from the details of their implementation, we can provide a library of standard components that can be freely interconnected, allowing great flexibility in design.”

Okay. So there’s a lot of words in here that we need to go into, and they’re going to touch on this topic.

This is very early. They’re still introducing what they’re going to talk about. And I’ve gone through this sentence many times trying to really understand it. I mean, it sounds good, but I wanted to understand. These papers are written very tightly. I’ve written research papers before and you write them and you rework phrasing. You have a space limit.

So you’re trying to condense stuff and you don’t want to miss this important idea. So you, you really tighten up your language so that you express everything. So I trust that they’ve done that and that this sentence, very dense sentence, contains a lot of information. Okay. So

“with appropriate abstractions”

which means you have to do some work. You have to find the abstractions. Can’t just use any old abstractions. You have to do this on purpose. You’re trying to find these abstractions, which remember means you build a thing and you name it. Now you can use that thing as if it were primitive.

“to separate the specification of components from the details of their implementation.”

Now, you don’t know how many times I pored over this phrasing and re-read the article with this idea in mind. This is what I think it means. I don’t think it’s clear; it’s a guess. Okay, I’m 80, 90% sure. The specification of components: I think what they are talking about, in an abstract, conceptual way, is the function signature. Right? We would think of it, in a typed language, as the type of the function: what arguments does it take, what are their types, and what does it return? And then maybe the documentation, like, what is it supposed to do? Because you can’t go just by the function signature.

You have to go also by what it’s supposed to do. But that’s all the spec, the specification. So you’re separating that from the details of the implementation, and the implementation is the body of the function. Okay. How does it work? What does it do mechanically? Okay.

“We can provide a library of standard components that can be freely interconnected.”

Standard. It’s an interesting idea, right? The standard components. I think what they mean is general components, or components that you would look at and say, yes, I can reuse that; that’s a good implementation. So you can provide a library of standard components that can be freely interconnected, allowing great flexibility in design.

So that is the sentence. This incredibly dense sentence is introducing this method that they are about to explain. Okay. Now again, the title is Lisp: A language for stratified design. So they’re explaining the stratified design method. Okay. All right.

“A language for design should not unnecessarily limit our ability to make abstractions.”

What do they mean by limit? Limit our ability.

“Most traditional programming languages, however, place arbitrary restrictions on procedural abstractions.”

They’re saying that there are things we should be able to do that our languages don’t let us do.

“Three common restrictions are 1) requiring that a procedure be named and then referred to by name rather than stating its definition at the point of reference.”

Alright, so there it is already: they want anonymous functions. Many languages don’t have anonymous functions; you have to define them with a name and then refer to them by name later. So they want anonymous functions. Okay, that’s great. Luckily, in 2020, a lot of languages have anonymous functions. JavaScript does; you could even say that Java does now, in a certain sense. Most functional languages obviously have anonymous functions. So that’s progress, I guess. And also good: it means you might be able to do stratified design in these other languages besides Lisp. Okay.

“2) forbidding procedures to be returned as the values of other procedures.”

Now, when I read this, I keep thinking about the use of the word procedure. They’re not using the word function. And I wonder if it’s just a trend now to use the word function, because you know, the JavaScript keyword is function. You know, there’s like just a lot of stuff in the air. We’re calling them functions, C called them functions. They’re calling them procedures, which I really respect because that’s what they are.

They’re step-by-step. They’re imperative, you know, at least in Scheme. I know other languages like Haskell aspire to have what they call functions be much more like the mathematical notion of a function.

But in scheme, they don’t do that. So I really appreciate using the word procedure here. And so number two, they’re saying that a lot of languages don’t let you return functions from another function. Right. Return a procedure from a procedure. Luckily we have that. Also in a lot of languages.

So let’s go to

“3) forbidding procedures to be components of such data structures as records or arrays.”

I mean, really the three restrictions that they’ve brought up that are common, they say, maybe they’re not so common anymore, and they’re really just all about first-class functions. They’re saying we need first-class functions.

We want first-class functions. That is great. I feel like we’ve made a lot of progress, and I’ve said this before. I think that JavaScript is the best thing to happen to functional programming in a long time. Just because of this, right? We have first-class functions. People are used to doing anonymous functions all over the place, returning functions, taking functions as arguments.

It’s great. All right. So they’re gonna start contrasting stratified design with what they’re calling top-down structured design. Okay. They’re calling it a well-publicized programming methodology. I haven’t read those papers, but I do know that one thing I was taught in school was: you have this program to write, so you break it up into steps, and then each of those steps becomes a procedure. And then those procedures each become a bunch of steps, where each step is a procedure, and you just kind of take it from the top down. Like, what are the things that need to happen?

And then, well, that’s kinda hard; it’s not a one-liner. So let’s make it a procedure. And then that’s not a one-liner, so let’s make it, you know, four procedures. You just break it up from the top down.

Okay, so they don’t like that.

“The methodology is flawed. If a system is to be robust, it must have more generality than is needed for the particular application.”

I think this is a really interesting statement. If it wasn’t in a dense, published, peer-reviewed article, you might just dismiss it; we’re so used to reading blogs that are just kind of first drafts, right?

So I’m not sure about Harold Abelson, but Gerald Sussman was an electrical engineer, and they have a principle, I can’t remember the name of it, but you’re supposed to be a better signal generator than what you can receive. And that cleans up the electrical signal through a circuit, right?

So let’s say you’re a resistor, and some component that is feeding you voltage is noisy. It’s supposed to be a clean plus five volts, but it’s wavy; it’s not clean. You’re trying to make your component able to deal with that but still output a clean five volts. You don’t want to pass the garbage through; you want to clean it up. This is related to the idea of control systems, cybernetics: you’ve got to have a more flexible system. To be robust, the system has to be more flexible than the application it’s dealing with. Right? Otherwise you don’t get control, if you are less flexible.

“The means of combining the parts must allow for after-the-fact changes in the design plan as bugs are discovered and as requirements change.”

Okay? So we have a clear benefit here. They’re saying we need to be able to change the design as bugs are discovered and requirements change. Okay. This is 1987 just for context.

“It must be easy to substitute parts for one another and to vary the arrangement by which parts are combined.”

Okay. And so they’re saying that, okay, wait, let me finish.

“This is necessary so that small changes in the problem to be solved can be effected by making small changes in the design.”

A small change in the problem should mean a small change in the design. This is still that same paragraph where they are saying that top-down structured design doesn’t work. You cannot take a system, break it down through procedural abstraction like that, and then vary the design easily. That’s what they’re saying. That is their assertion. What you do is you make a system that does exactly what you thought it was supposed to do at the beginning, and it doesn’t have more generality.

It’s not more flexible than the system it is programming. It’s probably either exactly as flexible or a little less flexible. And so now, as your requirements change, you learn something new, a bug happens, and your code, which did exactly the right thing, can’t cope. I mean, you can change it, but it’d be a lot of work for even a small change.

Okay. So now this was their sort of antithesis. Right? So now we’re starting into what is stratified design.

“To this end, expert engineers stratify complex designs. Each level is constructed as a stylized combination of interchangeable parts that are regarded as primitive at that level. The parts constructed at each level are used as primitives at the next level. Each level of stratified design may be thought of as a specialized language with a variety of primitives and means of combination appropriate to that level of detail.”

This is, I mean, this is the money paragraph right here. Okay. Because what they’ve done is they’ve taken their definition of what a programming language has, three things, and they’ve said, we can stratify this. We can say, well, one level defines the primitives. The next level defines new means of combination that are named, and then those named things become primitives at the next level. Okay. And stratified means layered.

They’re saying layers, levels; I think they’re using the words interchangeably here. Okay. So each level is constructed as a stylized combination of interchangeable parts that are regarded as primitive at that level. So you’ve got your primitives that are defined at one level. Then at the level you’re programming at right now, you combine them and name them. And then at the next level, those things that you just named are the primitives. That’s stratified design. Okay, so you’re basically inventing languages at each level, what they’re calling languages, because you have the primitives, the means of combination, and the way to name something.

Okay. They give an example from electrical design, and, you know, I’m not familiar enough with digital circuits to really understand it, so I’m not going to read it.

“The real power of Lisp is that its unrestricted abstractions support the construction of new languages, greatly facilitating the strategy of stratified design.”

So this is it. This is where they’re explaining the title. They’re saying the main benefit of Lisp is stratified design. People ask me this about Clojure all the time: what problem is it particularly good at?

This comes from this idea of, say, Python: oh, it’s good for machine learning. Right. Or JavaScript: oh, well, it’s good for doing webpages. What is Clojure good for? Well, it doesn’t have an easy answer. Right. But this is an answer. It is their answer. The real power of Lisp is its unrestricted abstractions.

Basically they’re saying that first-class functions support the construction of new languages, greatly facilitating the strategy of stratified design. So there you go. It’s a language for, well, we might call them DSLs now. It’s a language that lets you build new languages, and in the paper they actually build two languages, in two different ways.

Wow. I’m only on page two. Okay. So that’s just because that’s a very dense introduction. I read a lot of that. Okay. So I’m gonna start skipping around here in this.

This is section one. This is expressing abstractions as procedures.

“Procedural abstractions can help elucidate a process by allowing us to express it as an instance of a more general idea.”

What they’re saying here is that when you solve a problem with a procedure, you are expressing an instance of a more general idea, a general process. Right? And you can take that procedure. Let’s say you’ve written a program to do something a certain way, say using a for loop. You can actually extract out that for loop

into map, for instance, which is a more general idea than that specific instance of the for loop that you used. map is not more general than a for loop in general, because a for loop can do anything; map is more general than that particular use of the for loop. And now you can use the procedure map in place of that for loop.
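In Clojure terms, the kind of extraction Eric is describing might look like this (my own toy example, not from the paper):

```clojure
;; A specific loop that doubles every number:
(defn double-all-loop [xs]
  (loop [xs xs, acc []]
    (if (empty? xs)
      acc
      (recur (rest xs) (conj acc (* 2 (first xs)))))))

;; Extracting the general idea: map is the more general procedure,
;; and the doubling is now just one small use of it.
(defn double-all [xs]
  (mapv #(* 2 %) xs))

(double-all-loop [1 2 3]) ;; => [2 4 6]
(double-all [1 2 3])      ;; => [2 4 6]
```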

So you’re extracting out a more general idea. And notice this is subtle, but if you’re thinking of building layers out of the stuff under it, the more general stuff is at the bottom, right? You’re extracting map out, since this function calls map as a primitive, it has to be above it. And so map is now the more general idea lower down.

And this makes sense if you follow that logic all the way to its conclusion. So map is written out of a for loop, which is more general. And the for loop compiles to machine code, which itself is more general; that’s what the machine knows how to do. And then that’s built out of NAND gates and stuff.

That’s even more general, so it makes sense if you follow this down, that general stuff goes to the bottom. I think one of the confusions is that general and abstract tend to be used interchangeably, but here what they mean by general is you can use it for more situations. Abstract doesn’t really mean you can use it in more situations.

And they have a specific definition of abstract, which means like an abstract concept. You know, you see people holding hands and people kissing and you’d say, Oh, that’s all love, right? So you, you’ve named this thing that abstracted away all the details and now you can talk about love as a thing.

That’s what they’re talking about. Now, love can be part of your philosophy. You can write books about it. It doesn’t have to be any particular person and their love. Okay. All right.

So they go into a square root implementation, which I’m not going to read here; you should read the paper. I don’t have much to say about it. But they mention that it “is not built out of components that can be easily isolated for use in solving other problems”. They solve the problem. It works. They have a procedure that does what it needs to do, but it is not built out of components that can be easily isolated.

So they proceed to isolate little pieces of it and pull those out into more general functions that are reusable for solving other problems.

Okay. So after doing that, they have a couple of other functions that are now reusable. They can use it in other situations.

“The advantage of this formulation is that it decomposes the method into useful pieces. The two pieces are finding fixed points of general functions and using damping to encourage convergence. These ideas are formalized as procedural abstractions, identifiable units that are available to be used in other contexts.”

Okay. They’re now saying this is how mechanically it is done. You have some problem, stuff gets pulled out underneath, and now there are primitives that can be used at this higher layer, right?

They’re trying to explain this idea of abstraction, that this is how abstraction can give you some kind of power, some reuse, a general purposeness, right? And you’re actually expressing a design, a deeper design. Now we’re getting into stratified design.

This is section two. Okay. All right, so this describes this whole like picture language.

It’s really interesting, really fascinating, but I don’t want to go into the picture language, but I do want to talk about some points that they make within that. So just a little bit about the picture language.

“There is a language of primitive pictures that are constructed from points and lines. Built on top of this is a language of geometric combination that describes how pictures are placed relative to one another to make compound pictures. Built on this is a language that abstracts common patterns of picture combination.”

All right, so what they’re describing here is actually four different levels. First, there are points and lines, however those are described. Then, on top of that, there’s a language of primitive pictures that are just points and lines, just drawing little things. Okay. On top of that, the third level, there’s a language of geometric combination: stuff like, put this next to this, put this on top of this, flip this one, mirror it so you have the two next to each other. And then fourth, a language that abstracts common patterns of picture combination.

So they have all sorts of cool stuff, like, you know, find the fixed point of spreading this out and it like gets smaller each time. It’s, it’s pretty neat. They have four different levels that this is abstracted into. I’m not going to go into all of those different operators they have, but here’s an important idea that they bring up.

“In this, the set of all pictures is closed under combination.”

Okay. They have this function called beside. A procedure, sorry. A procedure called beside that takes two pictures. Okay. But it makes a new picture, right? It’s closed in the abstract algebra sense: the argument types are the same as the return type, which means you can use the result of a beside in another beside or in any of the other combinators.

And this is the kind of snake-eating-its-tail that gives it a lot of power. I can use these combinators, these ways of combining pictures, and compose them up into bigger, larger structures, and I’m never changing types, right? This means that all of them are still available; I can use them recursively. It’s just a very nice, elegant property that you never leave the sphere of pictures in this whole system. You know, first there are the points and lines; I think those aren’t pictures. But once you’ve got, at the second level, the language of primitive pictures, the third level and fourth level are also pictures.

They’re just means of combining those pictures. All right, so there are two properties here.
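To make the closure property concrete, here is a toy model in Clojure: a “picture” is just a set of [x y] points in the unit square (an invented representation, far simpler than the paper’s), and beside takes two pictures and returns a picture, so results can be fed straight back in.

```clojure
;; beside squeezes p into the left half and q into the right half.
;; Picture in, picture out: the set of pictures is closed under it.
(defn beside [p q]
  (into (set (map (fn [[x y]] [(/ x 2.0) y]) p))
        (map (fn [[x y]] [(+ 0.5 (/ x 2.0)) y]) q)))

(def dot #{[0.5 0.5]})

;; Because the result is itself a picture, combinations nest freely:
(beside dot (beside dot dot))
;; => #{[0.25 0.5] [0.625 0.5] [0.875 0.5]}
```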

“The combinator language derives its power from the closure and abstraction properties of the combinators. That is why it can describe seemingly complex figures using only a few simple ideas.”

Alan Perlis said something like: I’d rather have one data structure operated on by a hundred functions than ten data structures operated on by ten functions each. This is that same idea: when you’re operating on a specific data type and you’ve got all these operations that work on it and return values of that same type, everything you make can be combined in more ways. It is an exponentially growing system of combinations that you can do. And so at each level, you’re actually increasing the design freedom. You’re increasing the number of combinations that you can perform with these primitives. Okay. That is what they’re talking about.

The power comes from abstraction, meaning I can make a new thing, refer to it by name, and pass it as an argument. It has the same status, it’s first-class, as the built-in ones. And because the system is closed, all the stuff that already exists for operating on pictures can operate on this new thing I’ve made, and so you’re just exponentially increasing the number of possible combinations.

Okay.

Now, we’ve got this, and remember, there are two properties. I think I should make that clear. There’s the closure property. And then there’s abstraction, which is that you can make stuff that has the same status as the other things. It can be primitive; it can act in the same way as primitive things.

And then there's closure, with an S. I know I've said Clojure already, with a J; this is closure with an S. It means that when you combine two pictures, you get a picture out. Okay, so these two properties are what give you this explosion of possibilities.
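To make the closure property concrete, here's a toy sketch in Clojure (my own representation, not the paper's Scheme code): a picture is a function from a frame to line segments, and a combinator like `beside` takes two pictures and returns a picture, so combinations nest arbitrarily.

```clojure
;; Toy sketch: a picture is a function from a frame to line segments.
;; The frame shape and combinator names are my invention.
(defn box
  "A primitive picture: the outline of its frame."
  []
  (fn [{:keys [x y w h]}]
    [[[x y] [(+ x w) y]]
     [[(+ x w) y] [(+ x w) (+ y h)]]
     [[(+ x w) (+ y h)] [x (+ y h)]]
     [[x (+ y h)] [x y]]]))

(defn beside
  "The closure property: takes two pictures, returns a picture."
  [p q]
  (fn [{:keys [x y w h]}]
    (concat (p {:x x :y y :w (/ w 2) :h h})
            (q {:x (+ x (/ w 2)) :y y :w (/ w 2) :h h}))))

;; Because beside returns a picture, combinations nest arbitrarily:
(def row (beside (box) (beside (box) (box))))
(count ((beside row row) {:x 0 :y 0 :w 16 :h 4})) ;; => 24 segments
```

The point is only the types: every value stays a picture, so everything that works on pictures works on the things you build.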

“We can vary the pieces at any level. We can change the location of a point in the primitive picture leg, [which they defined somewhere]. We can replace the compound picture animal by some other basic repeated unit. We can replace triangle by some other combination to be pushed, or we can replace push by some other transformation of combinators.”

All right, so what they’re talking about here, it’s another important idea. It’s a different idea that you can change one piece of this. So, okay, you build up this complex combination of things, starting from primitives at the lowest level, all the way up to stuff like the fourth level. And at any level you can choose to change something.

I want to change the leg of this cow, right? You're drawing a cow and you're doing this M.C. Escher thing where it's mirrored and rotated and tessellated and shrunk down and pushed to the side, so it's this weird, complex pattern, and I want to change the leg. So you just change the procedure leg, and now all of the legs change but the pattern still persists.

You haven't changed the pattern; you've just changed the leg. Or you can replace the animal: you'd say, I don't want a cow, I want a dog. Or you can replace triangle, which was one of the combinators that made a corner. It took a square drawing, shrunk it down, and repeated it around the edges on one side, so it made a bigger square out of the smaller pieces. And then push took the fixed point of that: it basically applied that again and again until you're down below the pixel level and you can't see the pattern anymore.

So this is what they're talking about with being able to change the requirements and the design: so much of the code can stay the same; you don't have to change it. You can totally change the pattern, the animal, how the animal's leg is drawn, all that stuff, without changing the rest. I can just pinpoint a different level of abstraction.

Now, I was listening to a podcast with Donald Knuth recently, and he was talking about how he believes that that is one of the requirements for being a good programmer, is to be able to move up and down these different levels of abstraction very freely. So you write a program and you have to think, wait, is this a problem in the compiler or in my, you know, basic data structures?

Or is this a logic problem at some other level? Or is it an off-by-one error? Because you've got this huge stack of abstractions that you're dealing with. You have to be able to move up and down that stack and diagnose problems freely within it. And so.

Likewise, when you're making design changes, you have to think: wait, I don't need to change most of it, I just need to change the leg. All right, so that is what they are calling stratified design using procedural abstraction. You can make a new language just out of procedures because, remember, what they call a language has three things: primitives, a means of combination, and a means of abstraction. If you have those three things, you have a language, which is why they were calling the different levels different languages. But all they were doing was defining new functions, new procedures.

Now they're getting into a section called metalinguistic abstractions. The first time I read it, I was just, oh yeah, metalinguistic abstractions. But then I was like, wait a second, what does that even mean, metalinguistic abstractions? Is my mind going to be blown? Am I ready for this? Do I need to sit down? But it's actually a thing that we do in Lisp all the time.

And we’re going to go into it. It’s actually a pretty cool little section here. This is section three, if you’re following along.

“For some problems, the appropriate means of combination may be awkward to express as compositions of procedures; towers of abstractions may not suffice.”

Okay. Just awkward. It's saying, yeah, this doesn't always work, this building up of procedures on top of procedures. Then they talk about different kinds of programming paradigms. They mention object-oriented, imperative, and logic.

“Then no single paradigm is sufficient. Large systems typically have some parts that are naturally described using one style and other parts that are more naturally expressed in other ways.

“Part of the wonder of computation is that we have the freedom to change the framework by which the descriptions of processes are combined.”

Here’s a mindblower here.

“If we can precisely describe a system in any well-defined notation, then we can build an interpreter to execute programs expressed in the new notation or we can build a compiler to translate programs expressed in the new notation into any other programming language.”

That's the wonder of computation, right? This is back to Turing completeness: anything expressed for one Turing-complete machine can be translated to, and run on, any other, which is awesome.

And it's a wonder, right, that once you hit a certain set of features, boom, you're done. You've got everything you need, ever. You're never going to be able to do better than that. Which is cool. But they are saying something significant here, besides just being philosophical: a large system typically has some parts that are naturally described using, say, imperative programming and other parts that are more naturally expressed in object-oriented or logic style or something like that.

All right, so this is what they’re talking about when they talk about metalinguistic.

“When we design a system that incorporates an application language, we must address metalinguistic issues as well as linguistic issues. We must not only consider how to describe a process, but also how to describe the language in which the process is to be described. We must view our program from two perspectives. From the perspective of the interpreter or compiler, an application program is merely data. The interpreter or compiler operates on that data without reference to what the program is intended to express. When we write programs in the application language, we view that same data as a program with meaning in the application domain.”

Okay, so this is what the metalinguistic part is. You're not just programming; now you are designing a language, right? How do I describe the program? What is the language in which I can describe this program? That is the metalinguistic abstraction. There are two levels. One is: I'm just coding. I know the language, I'm just typing the syntax out; that's what I do. The other level is: I'm now designing how that language looks, what it's going to do, all these things. These are metalinguistic, and at that level, your program is data: a stream of bytes I have to feed to a parser, et cetera.
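Lisp makes these two perspectives easy to see, because a program literally is a data structure. A small Clojure illustration: the same s-expression is inert data to one tool and a meaningful program to another.

```clojure
;; The same s-expression, viewed two ways.
(def program '(+ 1 2))

;; As data: just a list whose first element is a symbol.
(first program)   ;; => +
(count program)   ;; => 3

;; As a program: hand it to the interpreter and it has meaning.
(eval program)    ;; => 3
```

Neither view is the "real" one; which view applies depends on whether you're the interpreter or the application programmer.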

All right?

“Interpreters and compilers are just programs, but they are special programs. An application program to be interpreted or compiled is not a composition of abstractions of the language in which the interpreter is implemented or to which the compiler translates. This linguistic shift transcends the limits of abstraction.”

Okay. So what they're saying is there are things that you can't do just by making new functions and naming them. You can shift to a new paradigm, a new language, in a way that you couldn't with just some functions that you combined. You're getting something else: a totally new paradigm, some new abstraction, meaning there are details you can now ignore because you're letting the interpreter or compiler deal with them. All right.

That brings up a good point. Abstraction, which they talk about only in these titles, like metalinguistic abstraction, has another sense: it's not just naming, okay.

There are multiple senses of abstraction. In this other sense, abstraction is treating two things that are different in the same way. Okay? So I said before: two people holding hands, they love each other. Two people kissing, they love each other. Different people, different phenomena, happening at different times, but we treat them the same. They're both love. Okay? That is what we do with abstraction: we're ignoring the details. Now when I'm talking about love, I'm not talking about those two particular hand-holders. I'm talking about everybody. You know what I mean?

It doesn't matter. You could change the people, right? Any people holding hands. This is abstraction. We do this all the time; we don't even think about it. And so they're talking about this here. It says, “the interpreter or compiler operates on that data without reference to what the program is intended to express.”

So at this low level of the interpreter, they're ignoring the details of what the program is even supposed to do, right? It's just like, you know, look, you give me some bytes, I parse them, I do what they say. I don't care what this is for. It could be curing cancer, or it could be sending missiles.

Like, that's not my job. I don't care about that. But then at the other level, in the program, when you're writing your application, you certainly care about whether you're curing cancer or launching missiles. But what you don't care about is what machine code this is going to get compiled into.

Ah, that's the compiler's job, right? So you're able to ignore these details, and that is a kind of abstraction, a different kind from what they usually talk about in this paper. But it means that each layer (this is me going a little past where the paper goes) in this stratified design is a different level of abstraction. It has different details that you focus on and details that you can ignore. That's just my little bit of extrapolation there.

They make this statement “Lisp’s facility with symbolic data provides unusually good support for developing such subsystems.” And they’re talking about interpreters and compilers.

Okay? In Lisp, we have the symbol data structure and the list data structure, and that gives you quite a lot to work with for writing interpreters. They go through a rule language in which you express simplifications of algebraic expressions. It's interesting; I'm not going to read it, but then they write the rule interpreter.

So they describe the language, then they describe the interpreter, and again they say:

“Programming a simplifier in the rule language allows us to focus on the rules themselves without becoming distracted by concerns about the strategy and mechanism by which rules are applied.”

So this metalinguistic abstraction is the second kind of abstraction, not the kind they talk about in the first part of the paper. It's an abstraction where you get to ignore details. It's not about naming and manipulating a thing at a different level; it's about being able to ignore details.

I can write these rules; I don't need to know the implementation of the rule language interpreter. And likewise, I can write the interpreter; I don't need to know what it's even going to be applied to. In fact, they talk about a different kind of rule language that could express other kinds of simplifications, not just algebraic ones.
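To get a feel for what such a rule language might look like, here's a toy simplifier in Clojure (my own sketch with invented names, much smaller than the paper's Scheme version): symbols starting with `?` are pattern variables, and each rule is a pattern/result pair.

```clojure
(defn match
  "Match expr against pat, returning a bindings map or nil."
  [pat expr binds]
  (cond
    ;; a ?-symbol matches anything and records a binding
    (and (symbol? pat) (= \? (first (name pat))))
    (assoc binds pat expr)

    ;; matching lists: match element by element
    (and (sequential? pat) (sequential? expr)
         (= (count pat) (count expr)))
    (reduce (fn [b [p e]] (and b (match p e b)))
            binds
            (map vector pat expr))

    ;; anything else must match literally
    :else (when (= pat expr) binds)))

(defn instantiate
  "Fill a result template with the matched bindings."
  [template binds]
  (if (sequential? template)
    (map #(instantiate % binds) template)
    (get binds template template)))

(def rules
  "Each rule is [pattern result]."
  '[[(+ ?x 0) ?x]
    [(* ?x 1) ?x]
    [(* ?x 0) 0]])

(defn simplify
  "Apply the first rule that matches; otherwise return expr unchanged."
  [expr]
  (or (some (fn [[pat result]]
              (when-let [binds (match pat expr {})]
                (instantiate result binds)))
            rules)
      expr))

(simplify '(+ (* a b) 0)) ;; => (* a b)
```

The payoff is exactly the one the paper describes: you can add rules without touching the matching machinery, and rewrite the machinery without touching the rules.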

It's really interesting. You should read this, but I'm going to breeze through and move on to the conclusion. I'm going to reread the same paragraph that I read at the beginning. So this is in the conclusion.

“People who first learn about Lisp often want to know for what particular programming problems Lisp is the right language. The truth is that Lisp is not the right language for any particular problem. Rather, Lisp encourages one to attack a new problem by implementing new languages tailored to that problem. Such a language might embody an alternative computational paradigm as in the rule language, or it might be a collection of procedures that implement new primitives, means of combination, and means of abstraction embedded within Lisp as in the drawing language.”

“Perhaps that is why Lisp, although the second oldest computer language in widespread use today [only Fortran is older], still seems new and adaptable [1987 remember] and continues to accommodate current ideas about programming methodology.”

And it was true: they were very prolific with compiler design and, you know, all sorts of stuff when they were doing Scheme. Okay. So this is “A note on Scheme.” It comes after the conclusion, number five:

“Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary.”

So this is their philosophy of Scheme right here, which is actually really nicely put. Scheme is a very minimal language, and they made it so that we have first-class functions. One of the innovations they came up with was a way to compile functions more efficiently, so that function calls were very cheap.

Subroutine calls in languages up until then were actually really expensive. People would save all the registers, and, you know, it was like: okay, we're getting ready, we're going to jump all the way over there, we've got to be able to get back here, so let's save as much as we can. And they developed a way to make function calls cheap enough that you can just write a one-liner, or define functions that just pass through to other functions, and it's cheap. It doesn't cost much. A lot of that is tail call optimization, that kind of thing. But thanks to them, we now have very fast function calls, and that's a paper for another day.

Well, I think that's the end. This paper was called Lisp: A language for stratified design, by Harold Abelson and Gerald J. Sussman, published in August of 1987. An old paper now, and a good one. I learned a lot about stratified design and what makes Lisp special. I hope you did too.

You can subscribe to this podcast on Spotify, Apple Podcasts, Overcast, whatever you like. Just search for Thoughts on Functional Programming.

You can also find all those links at lispcast.com/podcast. There, you'll find this and all of the old episodes. Just a note: this is the start of season three, so it's a big format change. I used to riff for a short time on small but important questions in functional programming, the kind I could answer in 15 minutes.

And now I’m going into these papers. I feel like these papers need to have more light shined on them. And so here I am with a light and I hope you like what I’m doing. Let me know. You can get in touch with me. If you go to lispcast.com/podcast you’ll find links to get in touch with me on social media.

Well, this has been my thought, my big thought on functional programming. My name is Eric Normand. Thanks for listening, and as always, rock on.


The post Lisp: A language for stratified design appeared first on LispCast.

Permalink

PurelyFunctional.tv Newsletter 361: Tip: trampoline your tail recursion

Issue 361 – January 20, 2020 · Archives · Subscribe

Clojure Tip 💡

trampoline your tail recursion

A few issues ago, I made the big distinction in Clojure between non-tail recursion and tail recursion. Non-tail recursion uses a stack frame per recursive call, so you risk blowing the stack. You can replace tail recursion with the recur form, which will optimize your recursion into a loop, and hence no stack explosion.

However, there are some things that look like tail calls but aren’t really. For example, you can’t do a recur inside of a try. And recur only works with self-recursion (a function calling itself) and not tail calls in general. Those add to the stack, too. Is there a way to avoid a stack explosion where we can’t use recur?

Yes! It’s called trampoline. Trampolining is a general technique of simulating a loop with return values. You run a function, it returns the next function to run, and the trampoline runs it. It keeps bouncing until you stop returning the next function to run. In the case of trampoline in Clojure, it keeps bouncing until you return a value that is not a function.
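The bouncing mechanism itself is tiny. Here's a stripped-down sketch of what clojure.core/trampoline does (the real one also accepts initial arguments for the first call):

```clojure
;; A minimal trampoline: keep calling as long as we get a function back.
(defn my-trampoline [f]
  (let [ret (f)]
    (if (fn? ret)
      (recur ret)   ;; bounce: run the next iteration
      ret)))        ;; a non-function return value ends the loop

;; Each step returns a zero-argument function for the next step:
(defn countdown [n]
  (if (zero? n)
    :done
    #(countdown (dec n))))

(my-trampoline #(countdown 100000)) ;; => :done, with no stack growth
```

Because the trampoline calls each thunk after the previous one has returned, the stack never gets deeper than one frame of the loop.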

Let’s take a look at a function that will continue to retry something in case of an error.

(defn retry-forever [f]
  (try
    (f)
    (catch Exception e
      (println "Got an error, retrying.")
      (recur f))))

This won’t work. The recur looks like it’s in tail position, but the try/catch needs to clean up before you return. You can’t use recur inside of a try to call the enclosing function. We could switch it out to do regular recursion, but then this would eventually blow the stack.

(defn retry-forever [f]
  (try
    (f)
    (catch Exception e
      (println "Got an error, retrying.")
      (retry-forever f)))) ;; may blow the stack

Things seem dire, but they're not! A trampoline can help. Recall from above that a trampoline expects to be passed a function that is the next iteration of the loop. It will call that function. If it returns a function, it will call that, etc. Instant loop! All we have to do is make retry-forever return a function instead of recursing itself. In Clojure, this is a one-character change. See if you can spot the character.

(defn retry-forever [f]
  (try
    (f)
    (catch Exception e
      (println "Got an error, retrying.")
      #(retry-forever f)))) ;; return a function for the next iteration

Notice that we are now returning a function, which tells the trampoline what the next iteration of the loop is. retry-forever is not technically recursive now. It is not calling itself. All branches (the success and the catch branches) return. That means the stack frame will be popped before retry-forever is called again.

To use this, we just have to call it like this:

(trampoline retry-forever f)

That’s great, but not terribly convenient. The caller shouldn’t really care how this is implemented. Let’s move the trampoline inside the function. We’ll do the important work in a helper.

(defn retry-forever* [f]   ;; helper
  (try
    (f)
    (catch Exception e
      (println "Got an error, retrying.")
      #(retry-forever* f))))
      
(defn retry-forever [f]
  (trampoline retry-forever* f))

Now the caller doesn’t even know it’s a trampoline.

But there’s another problem. What if f returns a function? trampoline is not smart enough to know that it is a special return value, not the next loop iteration. trampoline will call the returned function, like all the others.

The solution is to wrap it up in a non-function. Here’s one solution:

(defn retry-forever* [f]   ;; helper
  (try
    {:value (f)}
    (catch Exception e
      (println "Got an error, retrying.")
      #(retry-forever* f))))
      
(defn retry-forever [f]
  (:value (trampoline retry-forever* f)))

Now the helper is either returning a map or a function, regardless of the return value of f. We can unwrap it when trampoline returns from its bouncing. Wrap and unwrap.

That will work, but there’s a built-in that is a better fit for indicating the end of a loop. It’s called reduced. It’s made for signaling to reduce that it can stop and return a value instead of continuing to the end of the sequence. We can use it here with its opposite, unreduced.

(defn retry-forever* [f]   ;; helper
  (try
    (reduced (f))
    (catch Exception e
      (println "Got an error, retrying.")
      #(retry-forever* f))))
      
(defn retry-forever [f]
  (unreduced (trampoline retry-forever* f)))

You can use trampoline for optimizing mutual tail recursion as well. Mutual recursion is where a calls b and b calls a. They can’t be optimized by the JVM (yet!), so they too can blow the stack. But trampolining can help. Mutual recursion comes up a lot in things like recursive descent parsers. That’s a bit too much to go into now, but you can learn more about that parsing technique in this lesson.
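Here's the classic shape of that, sketched with the usual even/odd pair: each function returns a thunk of the call to the other instead of making the call directly, so the stack unwinds between bounces.

```clojure
(declare my-odd?)

;; Mutually recursive, but each "call" is returned as a thunk.
(defn my-even? [n]
  (if (zero? n) true #(my-odd? (dec n))))

(defn my-odd? [n]
  (if (zero? n) false #(my-even? (dec n))))

;; Direct mutual recursion at this depth would blow the stack:
(trampoline my-even? 1000000) ;; => true
```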

One quick note about tradeoffs: trampolining is more expensive than tail call optimization. The function has to unroll the stack, the trampoline has to check the type, and then do another function call. Tail call optimization can be as quick as a jump instruction (GOTO). But on our platform, sometimes tail call optimizations are not possible.

Relevant courses 📚

If you like recursion, laziness, and reduce, check out these courses.

Book update 📖

Chapter 6 is out! Buy it now: Grokking Simplicity.

Chapter 6 is all about immutable disciplines you can apply to your code. The two disciplines are copy-on-read and copy-on-write. They let you have immutable data when your language doesn’t provide it.

Chapter 7 is finally coming along. Design is a very intuitive, feeling-led endeavor. It is not 100% objective or quantifiable. I’m focused now on showing how the reader can develop their intuition along the lines of stratified design.

I’m not sure anyone has written as deeply as I am about it—I haven’t found anything. There are articles written about it, but they are descriptive, not prescriptive. They characterize it, but don’t tell you how to do it yourself. This will be an important chapter to differentiate the book. Finally, a book teaching FP design!

You can buy the book and use the coupon code TSSIMPLICITY for 50% off.

Podcast episode🎙

I’ve revamped the format of my podcast. If you like reading old computer science papers to really understand how we got here (and sometimes uncover old gems), this podcast is for you.

In the first episode of the season, we read a paper from 1987 by Abelson and Sussman, the authors of SICP, called *Lisp: A language for stratified design*. The paper asks the question: “What is Lisp good for?” Read the paper, then follow along on the podcast.

And speaking of podcasts, I was a guest on Functional Geekery and Defn.

State of Clojure Community Survey 📋

Please support the ongoing development of Clojure by filling out the annual survey. There are still a few days left.

This survey has been going on for years and it collects very important statistics on the Clojure community. I find the information valuable myself. I can only imagine how important it would be for the folks in Clojure’s core team to know this stuff.

Please fill it out, regardless of how you use Clojure. It’s all great information.

Cognitect analyzes the data and produces a summary. Here is the one from last year. They also release all of the data. Daniel Compton also analyzes the free-text responses and writes a summary. Here is his from last year.

Clojure Challenge 🤔

Last week’s challenge

The challenge in Issue 360 was a function to find the coordinates of a particular value in a nested vector. You can see the submissions.

Peter Strömberg asks for comments on his implementation. You can leave comments on this or any gist in the gist itself. Please leave comments! We’re all here to learn.

This week’s challenge

Gapful numbers

Some numbers are gapful, some aren’t. 100 is gapful because it has at least three digits and 100 is divisible by 10, the concatenation of its first and last digits ((str 1 0)). That’s the definition of gapful: it has at least three digits and is divisible by the number formed by concatenating the first and last digits.

Create a function that takes a number and finds the closest gapful number. The function should return its argument if the argument itself is gapful. And if there are two gapful numbers equidistant to the argument, return the lower one.
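If you want a starting point, a gapful? predicate might look like this (my sketch, assuming positive integers; not an official solution):

```clojure
(defn gapful?
  "True if n has at least three digits and is divisible by the
   number formed from its first and last digits. Assumes n is a
   positive integer."
  [n]
  (let [s (str n)]
    (and (>= (count s) 3)
         (zero? (mod n (Long/parseLong (str (first s) (last s))))))))

(gapful? 100) ;; => true  (100 is divisible by 10)
(gapful? 112) ;; => false (112 is not divisible by 12)
```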

Thanks to this site for the challenge idea!

As usual, please reply to this email and let me know what you tried. I’ll collect them up and share them in the next issue. If you don’t want me to share your submission, let me know.

Rock on!
Eric Normand

The post PurelyFunctional.tv Newsletter 361: Tip: trampoline your tail recursion appeared first on PurelyFunctional.tv.

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.