Exploring programming in Thamil (not English) through Clojure

Or: A clear example of what macros can do

Introduction

I started working on a library called clj-thamil that I envision as a general-purpose library for Thamil language computing (ex: mobile & web input methods), but a slight excursion in that work has led me to some very deep, intriguing ideas — some of which are technical, and some of which are socio-cultural. But they all fit together in my mind — Clojure, macros, opportunity and diversity (in computing), and the non-English-speaking world.

I think that the implications are things that we should all think about. But if nothing else, hopefully you can read this account and understand something about macros — the kind of power they uniquely provide, and at least one good use case where they are necessary.

Technical Aspects

How does one even begin??

I tried starting on this Thamil language project a year ago, but I immediately shelved it and left it alone for a large majority of that time. Why? I couldn’t find an editor that would support programming and typing of Thamil characters properly at the same time.

(FYI: The standard spelling is Tamil since British colonial times, but it is pronounced “Thamil”.)

I’m using Mac OS X, which has supported Unicode well. Thamil, like other South & Southeast Asian languages, is set up in Unicode so that most of its letters [human language elements] require more than one character [computer memory storage type]. Character is not synonymous with letter. For example, the letter கி in Unicode is the combination of the characters க + ி. But the rendering of the character ி is not an actual letter in the Thamil language. Also, both characters have to be treated together as a unit by the OS as well as the applications rendering the text — basically, the stack between storage and user interface — for க + ி to be recognized as being side-by-side and converted into a different shape, கி. Many Mac-native applications and editors like TextEdit handle this by default. But many programming-specific editors are cross-platform and/or non-native, so even their ports to Mac OS X don’t use the OS support required for proper rendering. None of Emacs for OS X, Eclipse, IntelliJ, jEdit, or a couple of other programming text editors “worked” – that is, got OS support to combine characters. I basically gave up, but 9 months later, I tried Aquamacs on a whim, and it worked!

Java, Unicode, and Clojure

Java was designed to support Unicode from the beginning. And by that, they mean that instead of a character being an 8-bit ASCII element, characters in Java are 16 bits, as defined by the original Unicode spec. Since Clojure emits bytecode that runs on the JVM, it also supports Unicode by default. What that means is that you can use symbols (‘variable names’) whose characters come from ranges designated for other languages without problems. So the following works fine:

(def π 3.14159)

From functions to macros

Clojure, like any language that supports a functional programming paradigm, has functions as first-class values. The interesting part is what first-class values buy us: we can take any function and create a new binding (a different ‘name’) whose value is equal to the original function.

(def ιитєяρσѕє interpose)
(ιитєяρσѕє "," ["one", "two", "three"])
;; => ("one" "," "two" "," "three")

So now we can ‘translate’ function names, even if superficially. So how far can we go? The core library of Clojure operations comes from special forms, functions, and macros. Special forms and macros don’t resolve to values, though, so if we can find a different way to “translate” them, we can pull off a fairly extensive translation of Clojure from English to an entirely different human language (aka “natural language”).

What are macros?

In essence, a macro is a special type of function where the input is a block of code, and the output is a block of code. As a result, macros are run in a special way — they run on code blocks before the contents of those code blocks get evaluated.

This enables macros to abstract out code repetition in ways that regular functions can’t. Basically, if you see any code repetition whatsoever, and it can’t be helped by better code design or by refactoring the repetitious code into a new function, then a macro will be your answer. My favorite example is the with-open macro, which gracefully handles try-catch-finally blocks for I/O objects with minimal code. doto and its ‘fancier’ cousins, the threading macros -> (“thread first”) and ->> (“thread last”), are also good examples.
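To see the kind of repetition being abstracted away, compare a use of with-open with roughly what you would otherwise write by hand (a sketch; the actual expansion differs in minor details):

(with-open [r (clojure.java.io/reader "data.txt")]
  (slurp r))

;; is roughly equivalent to:
(let [r (clojure.java.io/reader "data.txt")]
  (try
    (slurp r)
    (finally
      (.close r))))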

Translating macros and special forms using macros

Macros operate on code at a ‘higher’ level than a regular function does — the code blocks passed to a macro are just a bunch of shapes that the macro manipulates. Basically, we treat the text of the code as data to operate on, and only afterwards do we take the result and evaluate it like regular code.

So at the level on which macros operate, we can do the following to pull off our ‘translation’ idea for the special forms and macros: create a macro that takes whatever was given to it and passes it along verbatim to some other special form/macro.

As an example, if I take the Thamil word for ‘if’ – ‘எனில்’, then I want to create a macro where whatever I pass to ‘எனில்’ — (எனில் . .. …) — gets passed verbatim to ‘if’ — (if . .. …). And it turns out to be simple:

(defmacro எனில்
  [& body]
  `(if ~@body))

The code says to take the forms passed to ‘எனில்’, package them up into a sequence of code shapes called ‘body’, and then splice those shapes into a call to the ‘if’ special form.
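A quick usage check (assuming the macro above has been defined):

(எனில் (< 1 2) :yes :no)
;; expands to (if (< 1 2) :yes :no) and evaluates to :yes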

So we’re done! Right? All we have to do is just list out all of the functions, macros, and special forms to translate in this manner, and we will be done:

(def எடு take)
(def விடு drop)
...
(defmacro எனில்
  [& body]
  `(if ~@body))
(defmacro வரையறு
  [& body]
  `(def ~@body))
...

Macros, macros, everywhere!

That seems tedious. Inefficient. There is a lot of repetitive code here (the “def”, the “defmacro”, the shape of the defmacro definition, etc.), and we can’t write a function to refactor out the repetitive code. But I just said that this is exactly the kind of case that a macro can solve.

Once you strip out the repetitive code, all you are left with is:

take எடு
drop விடு
...
if எனில்
def வரையறு
...

This looks like a couple of maps, which makes sense: we’re associating an English word with a corresponding Thamil one. We need to represent the words as symbols so that they don’t get evaluated. Putting a single quote (‘) in front of the words converts them into their symbol forms:

{'take 'எடு
 'drop 'விடு
 ...}
{'if 'எனில்
 'def 'வரையறு
 ...}

I’ll fast-forward through the details and say that you can see the final macros, which take a map of symbols (the symbol of the English name mapping to the symbol of the Thamil name). And you can see the progression of steps that it took to get there in the linked slides.
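To give a flavor of what such macros can look like, here is a minimal sketch (hypothetical names, not the library’s actual code) that generates the def forms and the wrapping defmacro forms from literal maps of symbols:

;; Hypothetical sketch: emit (def thamil-name english-fn) for each entry.
(defmacro translate-fns [m]
  `(do ~@(for [[en th] m]
           `(def ~th ~en))))

;; Hypothetical sketch: emit a wrapping defmacro per entry, so that
;; (thamil-name args...) expands to (english-name args...).
(defmacro translate-macros [m]
  `(do ~@(for [[en th] m]
           `(defmacro ~th [& body#]
              (list* '~en body#)))))

(translate-fns {take எடு, drop விடு})
(translate-macros {if எனில், def வரையறு})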

The final results — programming in Thamil

And here is a namespace of functions, written in Thamil, that do basic natural language operations (pluralizing a noun, adding noun case suffixes). The pluralizing function looks like this:

(வரையறு-செயல்கூறு பன்மை
  "ஒரு சொல்லை அதன் பன்மை வடிவத்தில் அக்குதல்
  takes a word and pluralizes it"
  [சொல்]
  (வைத்துக்கொள் [எழுத்துகள் (சரம்->எழுத்துகள் சொல்)]
    (பொறுத்து

     ;; (fmt/seq-prefix? (புரட்டு சொல்) (புரட்டு "கள்"))
     (பின்னொட்டா? சொல் "கள்")
     சொல்

     (= "ம்" (கடைசி எழுத்துகள்))
     (செயல்படுத்து சரம் (தொடு (கடைசியின்றி எழுத்துகள்) ["ங்கள்"]))

     (மற்றும் (= 1 (எண்ணு எழுத்துகள்))
            (நெடிலா? சொல்))
     (சரம் சொல் "க்கள்")

     (மற்றும் (= 2 (எண்ணு எழுத்துகள்))
            (ஒவ்வொன்றுமா? அடையாளம் (விவரி குறிலா? எழுத்துகள்)))
     (சரம் சொல் "க்கள்")

     (மற்றும் (= 2 (எண்ணு எழுத்துகள்))
            (குறிலா? (முதல் எழுத்துகள்))
            (= "ல்" (இரண்டாம் எழுத்துகள்)))
     (சரம் (முதல் எழுத்துகள்) "ற்கள்")

     (மற்றும் (= 2 (எண்ணு எழுத்துகள்))
            (குறிலா? (முதல் எழுத்துகள்))
            (= "ள்" (இரண்டாம் எழுத்துகள்)))
     (சரம் (முதல் எழுத்துகள்) "ட்கள்")

     :அன்றி
     (சரம் சொல் "கள்"))))

Commentary on macros and state

Because macros aren’t values the way numbers, strings, and functions are, you can’t compose them. Once you use a macro, you might end up having to use more macros around it (ex: you can’t pass a macro to existing higher-order functions). Our use case is an example of that. So use macros sparingly, as a last resort. Prefer using functions — they compose and can be passed as arguments to other functions. This is why I have a separate macro for translating function names, even though the macro for translating the names of macros and special forms alone would be sufficient.
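To make the contrast concrete (the exact error text varies by Clojure version):

;; Functions are values, so they can be handed to higher-order functions:
(map inc [1 2 3])   ; => (2 3 4)

;; Macros are not values; this fails at compile time with an error
;; along the lines of "Can't take value of a macro":
;; (map when [true false])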

While the benefit of only needing a map of symbols can be viewed as simplicity or elegance, it is really the result of an instinct about programming, imparted by Clojure’s design, to isolate state and operate on it with a toolset of composable functions. It’s a mindset that keeps paying dividends.

Technical implications

Since the only Thamil-specific information required to effect the “translation” is stored in just 2 maps, does this mean that we can use the same strategy for any other language? Sure! Why not? As far as Java is concerned, all of the characters it sees when it parses code are 16-bit Unicode characters/codepoints. It doesn’t know which range the codepoints fall in, or how they have to be handled by the OS and applications to appear properly. So nothing here is Thamil-specific.

Also, it’s important enough to be worth pointing out, even if it is obvious to you, that none of the macro code here required modifying Clojure as a language, or the Clojure parser or compiler. This is all “user-level” code. And yet, we’ve created what is truly an entirely new programming language. I can create code that is entirely in Thamil without knowing that Clojure / Lisp exists underneath. Cascalog is another favorite example of mine of a new language created on top of Lisp using “user-level” Lisp code, even though it doesn’t quite syntactically resemble the core Clojure / Lisp that it is built on. The power to shape your language to suit your needs, even if it starts looking like another language, is the power that macros give you. And this is why Paul Graham’s book about Lisp is called On Lisp — the title emphasizes that Lisp lets you write new languages on top of Lisp.

Technical gaps and future possibilities

The method for translation is not a true translation, as you can tell. It’s cosmetic. So there are a few places where our abstraction fails:

  • Clojure is based on Java (it runs on the Java Virtual Machine).
    Since Java is written entirely in English, any Java interop from
    Clojure will require English. Also, stack traces and error messages
    will all be in English
  • The translation of functions is done by assigning existing Clojure
    functions to Thamil symbols, because functions are values. This means
    that evaluating a Thamil symbol referring to a function will show
    the name of its value — the (English) name of the Clojure function
  • The namespace bootstrapping problem — in order to use Thamil names
    in a namespace, you need to ‘import’ (require, in Clojure parlance)
    the namespace that contains the translations (here, clj-thamil.core).
    But until those translations are imported (‘required’), they aren’t available, so
    the require statement has to be in English. If namespace
    sounds like a weird concept, think of it like a package, module, or
    file.
  • Things like literals (true/false, special keywords in Clojure macros
    like :as, :refer, :keys) would have to be translated at read time.
    Numbers represented in other languages’ numerals would need their own
    logic to interpret. The boolean values true and false are tricky since
    they represent the Java values, so if they are returned by a Clojure
    function, how could you change that behavior? Change the Clojure
    function to return a different, equivalent value? Then create your own
    translated implementations of true?, false?, nil?, and if that use
    your new booleans (and redefine if to point to your translated if)?
    At that point, you would need to re-evaluate all of the functions/macros
    that use if (ex: when, if-not, and) before re-evaluating your translations

Some of these issues might be solved by modifying the Clojure reader, which some projects already do. Another idea is to localize the source code for Clojure itself somehow. I would consider exploring how far the first idea can take you. The second approach seems like it would be near-comprehensive, but also a lot of difficult work that risks obsolescence when the language changes. Fortunately, Clojure as a language is “stable” as I see it — the design is carefully thought out and controlled in a consistent and cohesive way. Changes are usually additions to the language or implementation details, making most code forward-compatible (including all of the code used here).

Social and Cultural Importance

There are a lot of implications to creating the ability to program in another human language, which, I think, is on balance a net positive for the world. The most obvious point is that English is not the primary language for most of the world.

For all the kids in the non-English-speaking world, especially the ones in non-Western / non-developed countries, learning to program means having to learn and think in a second language in order to write code in a programming language. Even in a place like Southern India, which is a hotspot for programming work, this creates a challenge for kids who do not enjoy the privilege of access to good English education but who still want to program (and get lucrative jobs). The divide is clear: even the state government of Tamil Nadu, where Thamil speakers live, which creates the Thamil language textbooks and distributes them for free to all grade school students, uses screenshots of the default English interface of basic computer software in its computer/technology textbooks (at least when I last checked). Of course, the hands-on classes would be more of the same. Students who aren’t fluent in English by their teenage years manage by memorizing which clicks of which icons and UI elements do what they need. The presence of an error dialog box may tell you that something is wrong, but being able to read the text of the error message, comprehend it (along with the jargon), and take action accordingly is a different task altogether.

The task of learning programming is hard enough. It is a technical area that requires learning a separate vocabulary. It is an abstract subject that is not necessarily easy to explain. Having programming in someone’s first language allows that person to deal with only the concepts of programming when learning it. And through different human languages, we may open up different approaches to programming than what we get through English alone. What does it mean to write code that mimics human language when your language isn’t subject-verb-object (SVO) but subject-object-verb (SOV), as is most common in the world’s languages? Does OOP make more intuitive sense to people who speak SOV languages? What about Clojure/Lisp? In my limited experience of programming in Thamil in Clojure, it feels pretty similar. Human languages that start with a verb are rare, so in one way, you could say that Lisp is equally strange to most people. But the fact that there is less syntax to learn, that the rules of the language are few and simple, and that the code you write fits the contours of the problem domain you’re trying to solve all contribute to the experience of Clojure in Thamil being similar to Clojure in English.

The tech industry, as epitomized by Silicon Valley, has recently been contemplating its lack of diversity — the people at leading companies and startups are overwhelmingly white and/or male. I’m happy to see the small, growing, wonderful efforts to address the inequity in various programming circles. But the clj-thamil project has helped me take a step back and think about addressing the segment of programmers who lack the privileges of others not only in an American context but in a global context — their language, their region’s wealth, and their personal wealth.

The privileges that we enjoy in the English-speaking world should not enable us to rationalize away these differences and privileges, though. Some people might think that, perhaps, the world would be better off if everyone were to speak one language. But suppose we did. Which language would be that one language? Chinese, because it is spoken in the most populous country? Or English, because it is spoken by more people and in disproportionately wealthy countries (a legacy of unfair colonial conquests)? Esperanto, an artificial language that inherits many aspects from European languages? There is no way to decide which language is universal without establishing more inequity. Also, selecting a universal language would erase cultural and geographic knowledge (and diversity in lifestyles!). And barring these concerns, if there were magically some agreeable universal language, and given a medium that could globally connect the world instantly (ex: the internet), that universal language would still fragment along geographic and socio-economic lines because humans naturally maintain differences to mark these distinctions.

Along the lines of what Bret Victor said in Inventing on Principle, I hope that we can properly enable programming for people all around the world in the language they think most easily in, since that is a form of expression we would be opening up and allowing to flourish.

Permalink

Emacs: Down The Rabbit Hole

So I wrote Welcome to The Dark Side: Switching to Emacs in response to a tweet, but as any of my co-workers will attest, it doesn’t take much to get me started on my Emacs workflow.

I feel like a religious convert… I miss those simple, unadorned Vim services but I’m floored by the majesty of the stained glass and altar dressings and ritual of the Church of Emacs.

So before the jump, in the spirit of “I’ll show you mine if you show me yours,” my .emacs.d.

An Unexpected Journey

I lived in my happy little Vim hobbit hole, smoking my pipe and enjoying my brandy. It was not a dirty hole, or a sandy hole, but a hobbit hole, which means comfort.

One day, a wizard visited me.

And that’s when things began to get weird…

Okay, so maybe I didn’t receive a visit from the revenant spirit of John McCarthy, ghost of programming past, present and future. Or maybe I did.

Maybe Paul Graham just convinced me I was coding in Blub, for whatever value of Blub I happened to be using.

See, the thing about Blub is it’s a mutable value. When you’re using C++ and Java comes along, you realize C++ was actually Blub. When you’re using Perl for your day-to-day and discover Python, and then Ruby, you realize that not only was Python Blub, but Perl was an even Blubbier Blub.

Ruby… oh, Ruby. I still love Ruby. But then something happened.

I need to backpedal a bit.

There’s using a language, and then there’s building something in it. I’d played with Scheme (SICP is wonderful), and even Common Lisp, and I knew enough to appreciate the Lisp-nature of Ruby which, when combined with its Smalltalk-nature, I thought made for the perfect productive language.

But see, I was building things in Ruby while I was playing with Lisp.

Along comes Clojure.

I was working in a pretty isolated programming role that granted me a lot of de facto autonomy. So when I got a request for a new service, I thought “why not Clojure?”

We’re in late 2012 here, so bear with me.

My first Clojure project ran like a champ and was hailed as an unqualified success. Eventually I even blogged about a piece of that project that handled datetimes.

Fast-forward to the present: I’ve written Clojure in Sublime Text, Atom, and mostly Vim, with the help of some awesome plugins from Tim Pope.

Like I mentioned before, I’ve had a religious hatred for Emacs since the mid-1990s when I entered the *nix world and got involved in USENET.

The war is far from over…

…but, I digress.

I started the Baltimore Clojure Meetup and met more Emacs users in one place than I had in a long time. Again, I dismissed Emacs.

That is, until I found LightTable completely b0rked again and threw up my hands.

Perhaps I shouldn’t have eaten my hands to begin with… sorry, equivocation humor. Can’t resist.

Welcome to Emacs

So yeah, I went over my starter packages in the earlier post, but I didn’t talk about the full experience of discovery I underwent when I fully committed to Emacs.

Sure, there’s the whole cider-mode and cider-jack-in and cider-nrepl and even cider-scratch that make LightTable’s inline evaluation modes look like child’s play (no offense to Chris Granger, LightTable is beautiful, I love it, but… y’know, Emacs).

So I did those things, started with Prelude, added all the Clojure fun I could find, and got to work.

I also subscribed to /r/emacs, and did a little reading on the Emacs Wiki.

Have you ever been comfortably reading (or coding) under a tree, and you see a white rabbit in a waistcoat with a pocket-watch run by, complaining he’s late?

Thus such adventures begin.

EAT ME / DRINK ME

As I fell to the bottom (or so I thought) of the rabbit-hole, I found a bottle of cider labeled Drink Me, and so I drank the cider. Suddenly, I could eval Clojure inline, jump to docstrings, jump to source for a fn, and it was wonderful.

The last time I tried Emacs, I always joked about how I was using Emacs but editing my .emacs config with Vim.

“Not this time,” I thought, and used Projectile to manage my .emacs.d and edited my user.el in Emacs. Oh, it was better! Then, thought I, I should put my .emacs.d in source control (actually, it was demanded… yeah).

But then I realized I was doing the ⌘-Tab to iTerm to run git ci -a (I pity the fool that doesn’t alias common git commands) in… wait for it… $EDITOR=/usr/bin/vim.

That’s when I found a bit of fairy cake called magit, and I ate a bit of that and my Git workflow was inside of Emacs. Now it was a simple M-x magit-status to view my working tree state, where I could hit s to stage files for commit, and C-c C-c to commit changes, and P P to push.

Oh, it’s beautiful.

Curiouser and Curiouser

Well, if Emacs can handle my Git workflow, what can’t it do, I wondered?

I went a bit mad playing with multiple buffer and frame layouts; on one occasion I opened a shell inside an Emacs buffer and launched the command-line version of Emacs in a shell inside the windowed version of Emacs.

Recursive rabbit holes.

When you’re running the Cocoa-native version of Emacs (not Aquamacs, fuck that noise, but just GNU Emacs packaged as a .app), you get some suggestions from the menus. Gnus for USENET or email, various games, a calendar…

Calendar?

That’s when I discovered Org-Mode.

Org-Mode FTW

Org Mode is an Emacs major mode that lets you organize your life. All of it. I’m not even going into detail here, it’s a deep, deep well. You can use it for a TODO list, sync it with your phone, use it to write your Octopress blog.

(Confession: This blog is powered by Octopress, and although it’s now written in Emacs, I’ve not gone full crazy and started composing it with Org-Mode.)

Twittering-Mode WTF

That’s when I started going down the tunnel of “well, what else can it do?”

And I discovered twittering-mode.

A quick M-x package-install RET twittering-mode puts a Twitter client in your text editor. Like you always needed. M-x twit will jump you right into your Twitter feed, i will enable user icons (yes, user avatars right in goddamn Emacs), and u will jump you to a buffer where you can compose a Tweet and hit C-c C-c to send it.

Playing Games

I’d be remiss if I didn’t mention that M-x package-install RET 2048-mode will install a game of 2048 in Emacs. Because that’s really fucking important, you know?

Sigh

For good reason, Emacs comes standard with an AI psychotherapist named Eliza.

A quick M-x doctor and you’re in therapy.

Which you’ll probably need.

…and Much, Much More

I’ve barely scratched the surface, but I feel like this post is long enough. There’s so much down here, down the Emacs rabbit hole, that it will probably take me weeks to even catch up to where I am right now; what I’ve described so far is my first few days with this operating system text editor.

But it’s a fun ride.

Postscript

Sorry for the Tolkien digression when my dominant allusion was Alice in Wonderland… Emacs is a weird place.

Permalink

verbs, nouns and file watch semantics

I've recently had a fascination with file-watcher semantics in Clojure libraries. Having trialed a bunch of them in the past, I decided that it was time to have a go at one myself, and I wanted to share some of my thoughts:

Typically, file watchers are implemented using one of two patterns:

  1. verb based - (add-watch directory callback options)
  2. noun based - (start (watcher callback options))

The difference is very subtle and really centers around the verb start. If the verb start does not exist, we can treat the file-watch as an action on the file object. If it does exist, we can treat the file-watch as an object, or, these days, a stuartsierra/componentisable thing. My preference is for the verb style, though it really depends on how the functionality fits within a bigger application. Currently, most bigger applications revolve around the component/dependency-injection pattern, so it makes sense to have something be componentisable as well.

A survey of existing Clojure file-watch libraries and their semantics yields the following results:

java-watch (verb based)

(use ['com.klauer.java-watcher.core :only [register-watch]])
(register-watch "/some/path/directory/here" [:modify] #(println "hello event " %))

dirwatch (verb based)

(require '[juxt.dirwatch :refer (watch-dir)])
(watch-dir println (clojure.java.io/file "/tmp"))

panoptic (noun based)

(use 'panoptic.core)
(def w (-> (file-watcher :checksum :crc32)
           (on-file-modify #(println (:path %3) "changed"))
           (on-file-create #(println (:path %3) "created"))
           (on-file-delete #(println (:path %3) "deleted"))))
(run-blocking! w ["error.log" "access.log"])

clojure-watch (verb based)

(use 'clojure-watch.core)
(start-watch [{:path "/home/derek/Desktop"
               :event-types [:create :modify :delete]
               :bootstrap (fn [path] (println "Starting to watch " path))
               :callback (fn [event filename] (println event filename))
               :options {:recursive true}}])

ojo (noun based)

(defwatch watcher
  ["../my/dir/" [["*goodstring.dat" #"^\S+$"]]] [:create :modify]
  {:parallel parallel?
   :worker-poll-ms 1000
   :worker-count 10
   :extensions [throttle track-appends]
   :settings {:throttle-period (config/value :watcher :throttle-period)}}
  (let [[{:keys [file kind appended-only? bit-position] :as evt}]
        *events*]
    (reset! result
            (format "%s%s"
                    (slurp file)
                    (if appended-only? "(append-only)" "")))))

watchtower (noun based with implicit start)

(watcher ["src/" "resources/"]
  (rate 50) ;; poll every 50ms
  (file-filter ignore-dotfiles) ;; add a filter for the files we care about
  (file-filter (extensions :clj :cljs)) ;; filter by extensions
  (on-change #(println "files changed: " %)))

filevents (verb based)

(watch (fn [kind file]
         (println kind)
         (println file))
       "foo" "bar/")

Out of all the watchers, ojo is seriously cool. I only properly looked at it after finishing my own file watcher and that would be my recommendation if anyone wants an industrial strength watcher.

another file watcher?

Yep. Though it was more of an exercise in design than anything performance-based. I'm on a Mac, and I chose to wrap the java.nio.file.WatchService API already done to death by many of the file-watch libraries before me. I'm hoping that in a year or two's time, they can replace the poll-based approach with something quicker. The lag time for events is devastatingly long. I often found myself starting up the file watcher, creating a new file for a test, and then having to wait... and wait... I twiddle my thumbs for a bit, sometimes going for a cup of tea. On coming back, I find that the :create file event has successfully registered. Granted, my laptop is a bit old, but I'm extremely disappointed with the WatchService performance.

watch the concept

Clojure already has an add-watch function built around refs and atoms. There's a pattern that already exists:

(add-watch object :key (fn [object key previous next]))

However, add-watch is really quite a generic concept, and it could be applied to all sorts of situations. Also, watching something usually comes with a condition. We don't react to every change that comes to us in our lives; we only react when a certain condition comes about. For example, in everyday life, I get told all the time to:

"Watch the noodles on the stove and IF it starts
  boiling over, add some cold water to the pot"

So this then becomes a much more generic concept:

(add-watch object :key (fn [object key previous next]) conditions)
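To connect the analogy to that signature (a hypothetical sketch: pot, :boiling-over?, and add-cold-water! are made up, and the option keys mirror the ones hara.common.watch uses below):

(add-watch pot :noodle-watch
           (fn [pot k previous next]
             (add-cold-water! pot))  ;; the reaction
           {:select :boiling-over?   ;; only look at this part of the state
            :diff true})             ;; only react when it actually changes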

In Orwell's 1984, there is a concept of Newspeak, whereby the vocabulary used by the populace becomes increasingly controlled by the party, such that most ideas can be conveyed using a very limited subset of the language. In this way, individual thought and expression become non-existent, which allows the party greater control over the population as well as provides a source of unity. In our society, newspeak is more subtle, though it exerts influence through collective mindshare. Most corporate jargon can also be considered a form of newspeak.

I tend to get conflicted when I program because the key to having greater control is in the limitation of language. So culling words and combining two concepts into one word is very powerful as a strategy for control, though it may not be such a great thing for humanity in general.

Anyways... the concept of adding options to watch was implemented as a protocol and realised in hara.common.watch:

(require '[hara.common.watch :as watch])
(let [subject  (atom {:a 1 :b 2})
      observer (atom nil)]
  (watch/add subject :clone
             (fn [_ _ p n] (reset! observer n))

             ;; Options
             {:select :b   ;; we will only look at :b
              :diff true   ;; we will only trigger if :b changes
              })

  (swap! subject assoc :a 0) ;; change in :a does not
  @observer => nil           ;; affect watch

  (swap! subject assoc :b 1) ;; change in :b does
  @observer => 1)

So the watch/add, watch/list, and watch/remove implementations extend the functionality of existing atoms and refs, and they open up the same semantics so that other data structures can take advantage of the same shape of interface. watch/add follows the same structural semantics as add-watch. These operations are then implemented as protocols around the java.io.File object in hara.io.watch:

(require '[hara.io.watch])
(require '[hara.common.watch :as watch])

(def ^:dynamic *happy* (promise))

;; We add a watch  
(watch/add (io/file ".") :save
           (fn [f k _ [cmd file]]

             ;; One-shot strategy where we remove the 
             ;; watch after a single event
             (watch/remove f k)
             (.delete file)
             (deliver *happy* [cmd (.getName file)]))

           ;; Options
           {:types #{:create :modify} 
            :recursive false
            :filter  [".hara"]
            :exclude [".git" "target"]
            :async false})

;; We can look at the watches on the current directory
(watch/list (io/file "."))
=> {:save function<>}

;; Create a file to see if the watch triggers
(spit "happy.hara" "hello")

;; It does!
@*happy*
=> [:create "happy.hara"]

;; We see that the one-shot watch has worked
(watch/list (io/file "."))
=> {}

but what about components?

It was actually very easy to build hara.io.watch using the idea of something that is startable and stoppable. watcher, start-watcher and stop-watcher all follow the conventions and so it becomes easy to wrap the component model around the three methods:

(require '[hara.component :as component])
(require '[hara.io.watch :refer :all])

(extend-protocol component/IComponent
  Watcher
  (component/-start [watcher]
    (println "Starting Watcher")
    (start-watcher watcher))

  (component/-stop [watcher]
    (println "Stopping Watcher")
    (stop-watcher watcher)))

(component/start
  (watcher ["."] println
           {:types #{:create :modify}
            :recursive false
            :filter  [".clj"]
            :exclude [".git"]
            :async false}))

but I want stuartsierra/components!

Okay, okay... I get it.

(require '[com.stuartsierra.component :as component])
(require '[hara.io.watch :refer :all])

(extend-protocol component/Lifecycle
  Watcher
  (component/start [watcher]
    (println "Starting Watcher")
    (start-watcher watcher))

  (component/stop [watcher]
    (println "Stopping Watcher")
    (stop-watcher watcher)))

(component/start
  (watcher ["."] println
          {:types #{:create :modify}
           :recursive false
           :filter  [".clj"]
           :exclude [".git"]
           :async false}))

Permalink

Onyx 0.4.0: DAGs, catalog grouping, lib-onyx


I’m pleased to announce the release of Onyx 0.4.0. It’s been about 6 weeks since Onyx was released at StrangeLoop. I’ve been quiet on the blog, but steadily grinding out new features and bug fixes. Onyx is already starting to make its way into non-critical production systems in the field. This release of Onyx is a massive step forward, shipping fundamental advancements to the information model. The release notes are here, but I’d like to take you on a tour of exactly what’s new myself.

Directed Acyclic Graph Workflows

The biggest change concerns Onyx’s workflow representation. Originally, Onyx’s workflow specifications were strictly outward-branching trees. Starting in 0.4.0, you can also model directed acyclic graph workflows that join data flows back together, using a vector-of-vectors. This new type of workflow makes it natural to write streaming joins on your data set. Onyx remains fully backwards compatible with the original version. Check out the comparison in the docs.

A DAG example. Inputs A, B, and C. Outputs J, K, and L. All other nodes are functions.
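For instance, that picture might be encoded along the following lines (hypothetical edges, not taken from the docs; each inner vector is an [upstream downstream] pair):

[[:A :D] [:B :D] [:C :E]
 [:D :F] [:E :F]
 [:F :J] [:F :K] [:F :L]]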

Catalog-level Grouping

Prior to 0.4.0, Onyx featured grouping and aggregation through special elements in a workflow. In an email exchange with David Greenberg, he suggested that grouping could instead be specified inside of the catalog entry. I gave this some serious thought, and realized that grouping at the level of a workflow is a form of structural complecting. Data flow representation ought to be orthogonal to how Onyx pins particular segments across different virtual peers. I present to you two new ways to do grouping in a fully data-driven manner: automatic grouping by key, and grouping by arbitrary function. Aggregation now becomes an implicit, user-level activity at each task site in the workflow. This change significantly refines Onyx’s information model with respect to keeping anything that’s not structure out of the workflow. Thanks David!

An Onyx workflow for word count. Note the :onyx/group-by-key :word association. This automatically pins all segments with the same value for :word to the same virtual peer. That means each peer gets all segments with a particular value assigned to it, and the “count” of each word is correct with respect to the entire data set.
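As a sketch, a catalog entry using this feature might look something like the following; :onyx/group-by-key :word comes straight from the example above, while the task name and function are hypothetical and the remaining keys are ordinary catalog boilerplate:

{:onyx/name :count-words
 :onyx/fn :my.app/count-word   ; hypothetical task function
 :onyx/type :function
 :onyx/group-by-key :word      ; pin segments by the value of :word
 :onyx/batch-size 1000}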

lib-onyx

My final piece of exciting news is the announcement of a new supporting library - lib-onyx. Onyx has been built from the ground up on the notion that you can combine a sound information model with the full power of Clojure - everywhere. It’s no surprise that shortly after launching, reusable idioms started springing up across different code bases. lib-onyx is a library that packages up common techniques for doing operations like in-memory streaming joins, automatic message retry, and interval-based actions. You can use all of these operations today by adding lib-onyx to your project and adding a few extra key/values to your catalog entry. Contrast this composability with Apache Storm’s Tick Tuple feature. Just as core.async changed the Clojure world without touching the language itself, Onyx needed no implementation adjustments here either. lib-onyx is just a library.

Conclusion

That’s all I have for now. Thank you so much to everyone who has helped me since Onyx launched. I’d like to thank Bruce Durling, Malcolm Sparks, Lucas Bradstreet, and Bryce Blanton for their contributions to the 0.4.0 release. Now we turn our attention to 0.5.0 - especially exciting things will be happening in the next few months. Stay tuned, friends!

Permalink

What language should you learn?

I travel a lot these days. I'd call myself a “digital nomad” as a shorthand, if there were any way to say it without sounding impossibly smug. Let's just say I'm homeless but employed, and my wife and I live in Airbnbs.

One of the challenges of moving around so much is dealing with language barriers. For the most part, even in places where English isn't widely understood, it's perfectly possible to get whatever you need with gestures, chief among them pointing and holding up money. It's the little things that are harder when you can't speak the language.

By way of example, I spent much of today wandering the streets of Istanbul in search of somewhere I could buy a simple envelope, because it turns out that without a Staples around I'm completely incapable of purchasing office supplies. I bet somewhere there's a whole bazaar full of old men with long grey beards flaunting staplers and paper clips – that's how it seems to go here – but I didn't happen across anyone who spoke enough English to ask.

So, I really prefer having at least some grasp of the language of wherever I'm going, but learning a language is pretty tough work. Being a nerd as well as a clueless putz, I decided today to compute empirically which languages are the best to learn if one were to, hypothetically, travel around randomly on the basis of wherever has the cheapest airline fares from here.

Get to the point, dammit.

I've been wondering this for a little while, but it turns out to be a trending topic on Quora today too, so I thought I might as well put in the effort and just get some numbers. Specifically, one of the answers on Quora linked to this page, listing the most widely spoken languages in the world along with the countries they're spoken in. It might seem that most of the work is done here, but the problem not addressed by this list is overlap – if you came to Canada after learning French on the strength of this list, you'd probably be mighty disappointed.

I definitely wanted to avoid the part of the process where we debate the merits of different ways of measuring how many people speak which language in which country, so this list is perfect. We have a simple job: take this data, and figure out which sets of N languages cover the most countries.

Source Code

If you're interested in the code, I've put the commented Clojure source in this gist for your perusal.
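The approach boils down to something like the following minimal sketch (toy data here, not the real country lists; the full commented source is in the gist):

(require '[clojure.math.combinatorics :as combo]
         '[clojure.set :as set])

;; language -> set of countries where it gets you by (toy data)
(def coverage
  {"English" #{"Canada" "UK" "USA"}
   "French"  #{"Canada" "France" "Senegal"}
   "Spanish" #{"Spain" "Mexico" "Argentina"}})

;; score every set of n languages by the size of the union of their
;; country sets, best first
(defn best-combos [n]
  (->> (combo/combinations (keys coverage) n)
       (map (fn [langs]
              [langs (count (apply set/union (map coverage langs)))]))
       (sort-by second >)))

;; (best-combos 2) => pairs ranked by number of countries covered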

The Results + Discussion

If you can only learn two languages, you should learn English and French. Here are the top pairings:

Pairing                # Countries
English, French        95
Arabic, English        91
English, Turkic        90
English, Spanish       89
English, Portuguese    79
English, Russian       79
Persian, English       77
German, English        75
Italian, English       74
Dutch, English         73
Chinese, English       71
Indonesian, English    71
Tamil, English         71
English, Swedish       70
English, Romanian      70
Bengali, English       69
English, Hindi         69
Turkic, French         54
Turkic, French54

I took an arbitrary sample there because it's interesting to me that the top pairing without English (Turkic + French) gets you by in significantly fewer countries than just English. Lucky us.

Boringly, the top result is what you would predict from the Wikipedia article anyhow. I thought there might be more overlap between English and French, but perhaps that's just because I'm so used to it, being Canadian. Actually, most of the results are just English + (other languages in descending order of speakers).

However, this is good news for us native English speakers: French and English actually overlap a lot linguistically. About 30% of English words have French roots.

What about learning three languages? Perhaps the results of that will be less boring. If you're more on-the-ball, here's how you'll do:

Languages                      # Countries
English, Turkic, French        117
English, Spanish, French       115
Arabic, English, French        114
Arabic, English, Spanish       112
English, Spanish, Turkic       111
Arabic, English, Turkic        111
English, Russian, French       106
English, Portuguese, French    105
Persian, English, French       104
Arabic, English, Portuguese    102
English, Turkic, Portuguese    101
Arabic, English, Russian       101
English, Russian, Spanish      100
Italian, English, French       100

There you have it: your third language should be Turkic. It makes sense, given the overlap between Arabic and French in northwest Africa.

I'm most intrigued by the English-Spanish-French triple, actually. There's a lot of overlap between Spanish and French too, so this is almost certainly the easiest triple for native English speakers to learn.

So there you have it: learn you some French. Bonne chance, et au revoir!

Permalink

Clojure DevOps Engineer, Diligence Engine, Toronto or Remote

DiligenceEngine is a Toronto-based startup using machine learning to automate legal work. We’re looking for a DevOps engineer to help us manage and automate our technology stack. Our team is small, pragmatic, and inquisitive; we love learning new technologies and balance adoption with good analysis. We prefer to hire in the Toronto area, but also welcome remote work in a time zone within North America.

Full job listing at their blog: We’re hiring a Clojure engineer!


Permalink

Reify This!

On the way home this afternoon I was asked to explain Clojure’s reify macro, and apparently I did quite well, as an “Aha!” moment resulted. So I shall endeavour to explain reify here in the hope that such a moment might be available to others.

Reify derives from the Latin res, or “thing.” So reify fundamentally means “make a thing out of…”

Protocols and Datatypes

Clojure protocols are similar to Java interfaces: They define a set of methods/functions purely by their signatures without providing implementation details. Declaring that a class implements an interface (in Java) or that a record implements a protocol (in Clojure) is a contract that specifies that the given class or record, in order to be valid, will provide concrete implementations of those methods/functions.

But sometimes we don’t need a reusable entity with reusable implementations that we can instantiate willy-nilly; sometimes we just need a thing that implements those methods.

In Java, anonymous inner classes can fulfill this purpose. In Clojure, we have reify.

That Nameless Thing

OK, it’s not really going to be nameless… let’s say we have a putative protocol as follows:

(defprotocol Foo
    (bar [this])
    (baz [this st])
    (quux [this x y]))

So if we were creating a new record, we might do:

(defrecord FooRecord []
    Foo
    (bar [this] (println this))
    (baz [this st] (str this st))
    (quux [this x y] (str this (* x y))))

Which is perfect if we need to repeatedly instantiate a FooRecord that implements the Foo protocol. But sometimes we just need a Foo and be done with it. And so, Clojure gives us reify.

One-Off Things

Instead of creating a defrecord (I’m going to leave the issue of runtime class generation for another post), we have the option of creating an individual, unique object that implements the desired protocol via reify.

Like so:

(def athing
  (reify Foo
    (bar [this] (println this))
    ;; assumes (require '[clojure.string :as str]) at the namespace level
    (baz [this st] (str/replace (str this) (re-pattern st) ""))
    (quux [this x y] (str this (+ x y)))))

Now I have athing that implements the Foo protocol in a manner appropriate to its context; I don’t have to worry about declaring a general case (class, or defrecord), and I can use this object while it’s handy and let it get GC’d when I’m done with it.
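And using it is just like using any other implementation of Foo:

(bar athing)        ;; prints the reify instance
(quux athing 2 3)   ;; => the instance's print form with "5" tacked on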

Incomplete, and Mostly Wrong

This is a really brief description of the reify macro, and more details are available in the Clojure Grimoire. But it apparently clarified things for one person, so I thought I’d share it here.

But in the words of Steve Jobs…

And One More Thing…

We’ve got a Lisp here in Clojure, right? We’re doing functional programming, so why all of this larking about with objects?

It’s not just Clojure’s Java heritage. Forms like defrecord, defprotocol, and reify aren’t about Java interop.

Let me take you back in time…

Once upon a time, there was a common Lisp dialect, established by ANSI standard, called Common Lisp.

In the times of mist, the original neckbeards established that this Common Lisp should have an object system, known as CLOS, or the Common Lisp Object System.

Clojure has an object system as well; some of it seems tied to its underlying Java architecture (at the moment); the emergence of Clojure-CLR and cljs has opened up the possibilities for the object model, maybe?

Not really. OOP models aren’t all that creative. Ruby has quite a novel object model but other than that, OOP is pretty boring and let’s just forget about that unhappy chapter in our past, shall we?

Let’s.

Permalink

Type-safe transducers in Clojure. And Scala. And Haskell.

TL;DR

  1. As noted earlier, transducers can be properly annotated in Clojure using core.typed and they probably should be.
  2. But... there are a few tricks necessary to make it work.
  3. Transducers in Scala require tricks too, but different ones.
  4. Oh, but they're so lovely in Haskell.

Updates 10-29 20:00 (thanks, Twittersphere)

  1. I am apparently confused about the difference between universal and existential types. Happily, I don't seem to be alone in this, but I promise to figure it out anyway...

  2. It would probably be more natural (and certainly more concise) to stick to the trait/apply solution in Scala than to try to emulate a Haskell style, interesting though the attempt may have been. Under the hood, the complicated functions are still classes with an apply method anyway.

  3. My Haskell type doesn't acknowledge that transducers might have state. The Scala and Clojure versions don't either, but that's more acceptable in their cultures.

Transducers

I won't explain transducers here. The canonical introduction is Rich Hickey's blog post, with further explanation in his Strangeloop talk. I contributed a brief glossary, which may possibly be helpful.

Why bother with typed transducers in Clojure

At the end of an earlier post, I noted that, despite some controversy on the subject, a transducer's type can be defined with core.typed (I'll walk through this a bit further down, so don't panic...)

  (t/defalias ReducingFn
     (t/TFn [[a :variance :contravariant]
             [r :variance :invariant]]
        [r a -> r]))

  (t/defalias Transducer (t/TFn [[a :variance :covariant]
                                 [b :variance :contravariant]]
                (t/All [r] [(ReducingFn a r) -> (ReducingFn b r)])))

in a manner fairly evocative of the way you'd do it in Haskell:

  type ReducingFn a r = r -> a -> r
  type Transducer a b = forall r . ReducingFn a r -> ReducingFn b r

While these representations may be more explanatory (to some, anyway) than the graphical illustration of "little boxes" in Rich's talk, explanation is not the main point. Neither is the triumphal riposte that transducers are yet another thing that isn't a good example of the superiority of dynamic typing.

With or without types, you're going to figure out transducers eventually, and I doubt you're going to understand them by types alone. It might even be better to go untyped, since a good flailing of trial and error can have educational value.

That's a less attractive option when writing code that's meant to do something real, and that's where a type system can be helpful. If you use transducers - and you will, because they're incredibly powerful - you will at some point be confounded by mysterious bugs of your own creation. You will get confused by the funny reversed order of composition. And then you will stare, despairingly, at long stack traces containing multiple anonymous functions. Then you'll festoon your code with more and more printlns (or, if you're fancy, logging macros) until the head-slap moment occurs.

Slapless

I'll get into the details of the above annotations in a bit, but for now just take them as given. Accept also that, for some reason, there's a special composition function compt just for transducers.

Our artificial goal is going to be to take a sequence of strings representing integers, like ["1" "2" "3"], parse them, repeat each one, then for each integer $n$ calculate $\sqrt[n]{2}$, and finally add those roots up. Here are my three transducers (ignoring, for simplicity, the zero- and one-argument alternatives for the returned function):

 (t/ann t-parsei (Transducer t/Int t/Str))
 (defn t-parsei [rf]
   (fn [result input]
     (rf result (Integer/parseInt input))))

 (t/ann t-repn (Transducer Number Number))
 (defn t-repn [rf]
   (fn [result input]
     (rf (rf result input) input)))

 (t/ann t-root (Transducer Double Number))
 ;; pow isn't in clojure.core; assume something like
 ;; (defn pow [b e] (Math/pow b e)) is in scope.
 (defn t-root [rf]
   (fn [acc in]
     (rf acc (pow 2.0 (/ 1.0 (double in))))))

Taking the Transducer type function as given, these annotations make sense. The first transducer transforms a function that reduces over integers to one that reduces over strings; the last transforms a function that reduces over doubles to one that reduces over integers; and the one in the middle doesn't change the type at all.

If all goes well, I should be able to compose the transducers, apply them to the + reducing function and reduce,

(reduce ((compt t-root t-repn t-parsei) +) 0 ["1" "2" "3"])

but this doesn't get past type-checking:

     Domains:[x -> y] [b ... b -> x]
     Arguments:
        [[t/Any Number -> t/Any] -> [t/Any  Number -> t/Any]] [[t/Any t/Int -> t/Any] -> [t/Any t/Str -> t/Any]]

Squinting at the last line slightly,

        [       Number         ] -> [        Number         ]] [[      t/Int         ] -> [      t/Str         ]]

we see the problem: the transducers are reversed. That's an easy mistake to make, with all those functions of functions strewn about, but it's also easy to fix, once we have a timely and specific error. (I won't pretend that it's a particularly elegant error, but, once you get used to reading it, it's a hell of a lot more timely and specific than an exception and stack trace at runtime.)

Back on the straight and narrow, we get the result we wanted:

user> (t/cf (compt t-parsei t-repn t-root))
  (t/All [r] [[r Double -> r] -> [r String -> r]])
user> (reduce ((compt t-root t-repn t-parsei) +) 0 ["1" "2" "3"])
  9.348269224535935

Type function definitions

So, ReducingFn and Transducer seem pretty useful. How did we make them?

  (t/defalias ReducingFn
     (t/TFn [[a :variance :contravariant]
             [r :variance :invariant]]
        [r a -> r]))

The TFn indicates that we're making a type function, i.e. a function of types that returns another type. The two types it takes are a (the type we are reducing over) and r (the type we're reducing to). Since we ought to be able to substitute a function that knows how to consume Numbers in general for a function that will encounter only Ints, the ReducingFn is contravariant in a, by the Liskov substitution principle. On the other hand, the exact opposite is true for the value returned by a function: if the recipient wants Int, it's not going to be happy with any old Number, but it could handle a Short or some other subtype. As r appears both as an argument (suggesting contravariance) and as the return type (suggesting covariance), it has to be invariant.

The Transducer type function returns the type of a function that consumes one ReducingFn and returns another.

  (t/defalias Transducer (t/TFn [[a :variance :covariant]
                                 [b :variance :contravariant]]
                (t/All [r] [(ReducingFn a r) -> (ReducingFn b r)])))

If someone is expecting a Transducer that consumes a particular kind of ReducingFn, they should be happy with a Transducer that consumes a supertype of that ReducingFn, i.e. Transducer is contravariant in the type ReducingFn used as its argument. But, since ReducingFns are themselves contravariant in the type they reduce over, the Transducer must be covariant in a. By contrast, the Transducer is covariant in the type of ReducingFn it returns, but since the ReducingFn is contravariant in the type it consumes, the Transducer must be contravariant in b.

Phew. It might come as a relief that the Transducer doesn't give a damn about the type r being reduced to. To advertise our apathy, while at the same time promising that we won't mess with r, we need the All keyword, indicating a so-called existential type.

  (t/defalias Transducer (t/TFn [[a :variance :covariant]
                                 [b :variance :contravariant]]
                (t/All [r] [(ReducingFn a r) -> (ReducingFn b r)])))

Tricks and compromises with typed Clojure

You may have wondered why we had to define a special t-repn for repeating numbers. We could in fact have created a more general version

(t/ann ^:no-check t-rep (t/All [a] (Transducer a a)))

with (since Clojure is still dynamically typed underneath our annotations) exactly the same definition. However, when we actually use t-rep, we need to inform typed Clojure exactly which existential variant we really want, by instantiating it:

(t/cf (compt t-parsei (t/inst t-rep t/Int) t-root))

This is because typed Clojure only performs local type inference. You can read more about the limitation in this post and in the references it contains, but the gist is that nothing is ever inferred by working backwards from the return type of a function, so you need to provide a crutch. Most languages with some kind of automatic type inference perform the local variety; a few, like OCaml and Haskell, do a much fuller job; and of course the vast majority of languages do none whatsoever.

The other oddity is one I mentioned earlier: we're not using Clojure's normal comp. Why? Well, consider the type of a simple composition function:

  (All [a b c] [[b -> c] [a -> b] -> [a -> c]])

That makes sense. The first function to be applied converts from a to b, and then the second converts the b to a c. Now, let's compose 3 and 4 functions:

  (All [a b c d]            [[c -> d] [b -> c] [a -> b] -> [a -> d]])
  (All [a b c d e] [[d -> e] [c -> d] [b -> c] [a -> b] -> [a -> e]])

The pattern is pretty clear, but there isn't an obvious annotation that would capture the type of all variadic possibilities. Instead, core.typed suggests

  (All [x y b ...] [[x -> y] [b ... b -> x] -> [b ... b -> y]])

which means the 2nd and succeeding functions all have the same signature. Even this limited composition type challenges core.typed if the functions are even slightly polymorphic. E.g.

  (t/cf (comp identity identity))

will fail with an error, roughly like:

  user> (t/cf (comp identity identity))
    Type Error polymorphic function comp could not be applied to arguments:
    Polymorphic Variables:  a b c
    Domains:    [b -> c] [a -> b]
    Arguments:  (t/All [x] [x -> x]) (t/All [x] [x -> x])

As noted above, we are allowed to instantiate a specific version of the polymorphic type, so

  user> (t/cf (t/inst identity Long))
    [Long -> Long]
  user> (t/cf (comp (t/inst identity Long) (t/inst identity Long)))
    [Long -> Long]

In summary:

  1. (comp identity identity) fails, because identity is polymorphic
  2. (comp (t/inst identity Long) (t/inst identity Long)) succeeds, because we have instantiated a specific type.
  3. (comp (t/inst identity Long) (t/inst identity Long) (t/inst identity Long)) fails again, because comp is called with three arguments.

Haskell's type inference is of course more sophisticated, but it also makes the problem easier by eschewing variadics in favor of currying. There's one composition function, which takes one argument and happens to return another function:

  (.) :: (b -> c) -> (a -> b) -> a -> c

There are thus at least two reasons why Haskell can easily deduce:

  id :: a -> a
  (id . id . id) :: c -> c

First, it does non-local type inference; second, it doesn't have to deal with variadic functions.

A slightly better variadic comp

We can't do much about local type inference, but we can write a comp that lets core.typed check an arbitrary series of composed transformations. The trick, as usual when we need to go easy on the type checker, is to use a macro to simplify what it needs to check:

(defmacro comp* [& [f1 f2 & fs]]
  (if-not fs
    `(comp ~f1 ~f2)
    `(comp ~f1 (comp* ~f2 ~@fs))))

so (comp* c->d b->c a->b) unwinds to (comp c->d (comp b->c a->b)), and failure #3 now succeeds:

  user> (t/cf (comp* (t/inst identity Long) (t/inst identity Long) (t/inst identity Long)))
  [Long -> Long]
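To watch the unwinding happen, we can ask for a single step of macroexpansion (sketched here as if comp* were defined in the user namespace):

  user> (macroexpand-1 '(comp* c->d b->c a->b))
  (clojure.core/comp c->d (user/comp* b->c a->b))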

Now, the general transducer (Transducer a b) is of course polymorphic, but even a specific-seeming one like t-repn (which is (Transducer Number Number)) still has that (All [r] ...) polymorphism in the type being reduced to. Thus, (comp t-repn t-repn) will fail with the now familiar "could not be applied" error.

Fortunately, we know that the transducer doesn't care at all about r, so, without loss of actual generality, we can lie:

  user> (t/cf (comp (t/inst t-repn Any) (t/inst t-repn Any)))
    [[Any Number -> Any] -> [Any Number -> Any]]

Having lied, we can make it right again by casting the polymorphism back in:

  (t/ann ^:no-check lie-again
       (t/All [a b] [[[t/Any a -> t/Any] -> [t/Any b -> t/Any]] ->
                     (t/All [r] [[r a -> r] -> [r b -> r]])]))
  (def lie-again identity)

so that:

  user> (t/cf (lie-again (comp (t/inst t-repn t/Any) (t/inst t-repn t/Any))))
    (t/All [r] [[r Number -> r] -> [r Number -> r]])

Now we combine the two lies and the de-variadification into a single macro

(defmacro compt [& tds]
  (let [its (map #(list 't/inst % 't/Any) tds)]
    `(lie-again (comp* ~@its))))

and, as demonstrated way above, we can compose transducers. Now you know why we need compt.
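For reference, a single expansion step (again assuming the user namespace) shows both lies and the comp* delegation in one place:

  user> (macroexpand-1 '(compt t-parsei t-repn t-root))
  (user/lie-again (user/comp* (t/inst t-parsei t/Any) (t/inst t-repn t/Any) (t/inst t-root t/Any)))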

It's far prettier in Haskell

There's not too much to say about this. While the Transducer type definition

  type ReducingFn a r = r -> a -> r
  type Transducer a b = forall r . ReducingFn a r -> ReducingFn b r

is essentially the same as in Clojure, everything else is easier. We can write fully general transducers

  t_dub :: Num a => Transducer a a
  t_dub f r b = f r (2 * b)

  t_rep :: Transducer a a
  t_rep f r b = f (f r b) b

  t_parse :: Read a => Transducer a String
  t_parse f r s = f r $ read s

  t_root :: Transducer Double Integer
  t_root f r i = f r $ 2.0 ** (1.0 / fromInteger i)

and compose them with no special effort.

  (t_parse . t_rep . t_dub . t_root) :: ReducingFn Double r -> ReducingFn String r
  (foldl ((t_parse . t_rep . t_dub . t_root) (+)) 0.0 ["1","2","3"]) :: Double

Scala is not Haskell either

Let's start out unambitiously. Trying to compose the identity function in Scala seems to run into the same problem as in Clojure

  scala> identity _ compose identity 
  <console>: error: type mismatch;
   found   : Nothing => Nothing
   required: A => Nothing
              identity _ compose identity
                                 ^

but what's going on here is a slightly different problem. While identity is defined polymorphically as identity[A](a:A):A, by the time we see it in the REPL, all type information has been erased. (We deliberately erased it, by instantiating the function with _ in a context where no other type information is available.)

If we put it back explicitly, composition works, and the composed function can itself be used polymorphically:

  scala> def ia[A] = identity[A] _ compose identity[A]
  ia: [A]=> A => A

  scala> ia(3)
  res39: Int = 3

  scala> ia(3.0)
  res40: Double = 3.0

We can chain compositions in a manner that looks a bit like Haskell

  scala> identity[Int] _ compose identity[Int] compose identity[Int]
  res33: Int => Int = <function1>

but is really quite different. Scala's compose is a method of the Function1 class rather than a standalone function, as this less sugary rendition makes clear:

scala> (identity[Int] _).compose(identity[Int] _).compose(identity[Int] _)
res36: Int => Int = <function1>

That's OK. Scala's OO nature gives us a set of tools completely different from those we got from Clojure's homoiconicity, but they can be deployed for qualitatively similar purposes - in this case, safe and reasonably attractive transducers.

In fact, I've seen transducers in Scala implemented as a trait, which then delegates to a virtual transform method, e.g.

  type ReducingFn[A, R] = (R,A) => R
  trait TransducerT[A, B] {
    def transform[R]: ReducingFn[A, R] => ReducingFn[B, R]
    ...
  } 

To make TransducerT act more like a function, we would add an apply method, and to make chained composition pretty, a compose method:

    def apply[R] = transform[R] _
    def compose[C](t2: TransducerT[C, A]): TransducerT[C, B] = {
      val t1 = this
      new TransducerT[C, B] {
        override def transform[R]: (ReducingFn[C, R]) => ReducingFn[B, R] = rf => t1(t2(rf))
      }
      }
    }

This will work, but it's more amusing to try to define transducers as existential types, using the semi-mystical forSome annotation, which Scala uses for the same purpose as Haskell's forall and typed Clojure's All:

    type ReducingFn[-A, R] = (R,A) => R
    type Transducer3[+A,-B,R] = ReducingFn[A,R] => ReducingFn[B,R]
    type Transducer[+A,-B] = Transducer3[A,B,R forSome {type R}]

(To be honest, I don't know if it's possible to do this without the intermediate ternary type.)

To assist in creating simple transducers that just transform individual elements of the cargo, we write mapping, again with an intermediate ternary type,

    def mapping3[A,B,R](f : A => B) : Transducer3[B,A,R] = { rf : ReducingFn[B,R] =>
      (r : R  ,a:A) => rf(r,f(a))}
    def mapping[A,B] = mapping3[A,B,R forSome {type R}] _

which we use like this:

    val t_parsei: Transducer[Int, String] = mapping { s: String => s.toInt}
    def t_root2 : Transducer[Double,Int] = mapping { i : Int => Math.pow(2.0,1.0/i)}

Nice, so far, but let's try reducing something easy:

    scala> println(List("1","2","3").foldLeft[Int](0)(t_parsei (_+_)))
    <console>:12: error: type mismatch;
      found   : Int
      required: String

Huh? Maybe it's having trouble understanding _+_:

  scala> println(List("1","2","3").foldLeft[Int](0)(t_parsei {(i:Int,j:Int)=>i+j}))
  <console>:12: error: type mismatch;
   found   : (Int, Int) => Int
   required: TransducerExistential.ReducingFn[Int,R forSome { type R }]
      (which expands to)  (R forSome { type R }, Int) => R forSome { type R }

Different but not better. Maybe it will work to cast explicitly to the ternary type:

  scala> println(List("1","2","3").foldLeft[Int](0)(t_parsei.asInstanceOf[Transducer3[Int,String,Int]] (_+_)))
  6

But that's a little ugly, and whenever something is even slightly ugly in Scala, you introduce an implicit to make it confusing instead. Hence

    implicit class TransducerOps[A,B](t1 : Transducer[A,B]) {
      def transform[R](rf : ReducingFn[A,R]) = t1.asInstanceOf[Transducer3[A,B,R]](rf)
    }

to coerce automatically after hoisting the Transducer into the TransducerOps container class.

Since we've already crossed the Rubicon, let's bring some slick Unicode along for the ride:

      def ⟐[R] = transform[R] _

Now

  scala> println(List("1","2","3").foldLeft[Int](0)(t_parsei   (_+_)))
  6

Finally, we're going to want chained function composition, so let's put a method for that, plus a nifty symbol, into the implicit class

      def compose[C](t2 : Transducer[C,A]) : Transducer[C,B] = comp(t1,t2)
      def ∘[C](t2: Transducer[C, A]): Transducer[C, B] = compose(t2)

so that:

  scala> println(List("1", "2", "3").foldLeft[Double](0.0)((t_parsei  t_repeat  t_root2)  {(x:Double,y:Double) => x+y}))
  9.348269224535935
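(For contrast, plain unityped Clojure needs none of this ceremony. With the definitions used earlier, an ordinary reduce does the job - a sketch that should agree with the Scala result above:)

  ;; compose with plain comp, fold with reduce; no type checker to appease
  (reduce ((comp t-parsei t-rep t-root) +) 0.0 ["1" "2" "3"])
  ;; => 9.348269224535935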

I suspect that there will be more trickery further down the road, as we flesh out the standard library of transducer functions. To get sequence to work, I ended up performing multiple coercions:

    def sequence[A, B](t: Transducer[B, A], data: Seq[A]): Seq[B] = {
      val rf1: ReducingFn[B, Seq[B]] = { (r, b) => r :+ b}
      val rf2: ReducingFn[A, Seq[B]] = t(rf1.asInstanceOf[ReducingFn[B, R forSome {type R}]]).asInstanceOf[ReducingFn[A, Seq[B]]]
      data.foldLeft[Seq[B]](data.companion.empty.asInstanceOf[Seq[B]])(rf2)
    }
  scala> println(sequence(t_parsei ∘ t_repeat ∘ t_root2, List("1", "2", "3")));
  List(2.0, 2.0, 1.4142135623730951, 1.4142135623730951, 1.2599210498948732, 1.2599210498948732)

Conclusions

The type of one transducer is not obscure, and it's not much harder to understand than a callback. However, once you combine several transducers into a working program, the business of reconciling and checking their types can be challenging. Of the languages I know, only Haskell handles it gracefully. Building an entire system in Haskell might be intimidating, but the transducer bits will be - bracing ourselves for a word not normally applied to Haskell - easy.

Transducers were invented for and clearly work in unityped Clojure, but I find myself wondering if they'll be one function abstraction too far for projects large enough to require many developers, and the argument that I find them beautiful might not carry the day. I do believe that a capable type framework would at least reduce the frequency of bugs, but typed Clojure is not at the point where telling someone to use it for transducers will obviously improve her or his life. It does not seem to be the case that a little macro cleverness can nudge the problem into the core.typed sweet spot.

It was interesting to play with transducers in Scala, if only because not many people have. Given the industrial efforts that have gone into Scala and the centrality of type checking to the language, it's hardly surprising that it does a better job than typed Clojure. But the margin of victory is slimmer than I would have expected. Even with the latest release of the IntelliJ plugin, many type errors didn't show up until a complete compilation. In general, once you get to forSome and its ilk, there isn't a wealth of straightforward advice available. (Hie thee, of course, to the Twitter-curated tutorials, which are about as good as it gets.)


  1. I'm pretty sure I heard someone say this. 


Survey Finds Clojure Adoption Progresses Year-to-Year

Cognitect has recently published the results of a community survey aimed at finding out "how and for what Clojure and ClojureScript are being adopted, what is going well and what could stand improvement." According to Cognitect, the survey, though not scientific, shows how Clojure has "transitioned from exploratory status to a viable, sustainable platform for development at work." By Sergio De Simone


Datomic Pull API

Datomic's new Pull API is a declarative way to make hierarchical selections of information about entities. You supply a pattern to specify which attributes of the entity (and nested entities) you want to pull, and db.pull returns a map for each entity.

Pull API vs. Entity API

The Pull API has two important advantages over the existing Entity API:

Pull uses a declarative, data-driven spec, whereas Entity encourages building results via code. Data-driven specs are easier to build, compose, transmit and store. Pull patterns are smaller than entity code that does the same job, and can be easier to understand and maintain.

Pull API results match standard collection interfaces (e.g. Java maps) in programming languages, where Entity results do not. This eliminates the need for an additional allocation/transformation step per entity.
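As a minimal sketch of the first point (using the led-zeppelin entity id from the examples below), the same selection looks like this in each API:

;; Entity API: navigate attribute by attribute, in code
(let [e (d/entity db led-zeppelin)]
  {:artist/name      (:artist/name e)
   :artist/startYear (:artist/startYear e)})

;; Pull API: state the desired attributes as data
(d/pull db [:artist/name :artist/startYear] led-zeppelin)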

Wildcards

A pull pattern is a list of attribute specifications.  If you want all attributes, you can use the wildcard (*) specification along with an entity identifier.  (In the examples below, entity identifiers such as led-zeppelin are variables defined in the complete code examples in Clojure and Java.)

;; Clojure API
(d/pull db '[*] led-zeppelin)

;; Java API
db.pull("[*]", ledZeppelin)

A pull result is a map per entity, shown here in edn:

;; result
{:artist/sortName "Led Zeppelin",
 :artist/name "Led Zeppelin",
 :artist/type {:db/id 17592186045746},
 :artist/country {:db/id 17592186045576},
 :artist/gid #uuid "678d88b2-87b0-403b-b63d-5da7465aecc3",
 :artist/endDay 25,
 :artist/startYear 1968,
 :artist/endMonth 9,
 :artist/endYear 1980,
 :db/id 17592186050305}

Attributes

You can also specify the attributes you want explicitly, as with :artist/name and :artist/gid below:

;; pattern
[:artist/name :artist/gid]

;; input
led-zeppelin

;; result
{:artist/gid #uuid "678d88b2-87b0-403b-b63d-5da7465aecc3",
 :artist/name "Led Zeppelin"}

The underscore prefix reverses the direction of an attribute, so :artist/_country pulls all the artists for a particular country:

;; pattern
[:artist/_country]

;; input
greatBritain

;; result
{:artist/_country [{:db/id 17592186045751}
                   {:db/id 17592186045755}
                   ...]}

Components

Datomic component attributes are pulled recursively by default, so the :release/media pattern below automatically returns a release's tracks as well:

;; pattern
[:release/media]

;; input
darkSideOfTheMoon

;; result
{:release/media
 [{:db/id 17592186121277,
   :medium/format {:db/id 17592186045741},
   :medium/position 1,
   :medium/trackCount 10,
   :medium/tracks
   [{:db/id 17592186121278,
     :track/duration 68346,
     :track/name "Speak to Me",
     :track/position 1,
     :track/artists [{:db/id 17592186046909}]}
    {:db/id 17592186121279,
     :track/duration 168720,
     :track/name "Breathe",
     :track/position 2,
     :track/artists [{:db/id 17592186046909}]}
    {:db/id 17592186121280,
     :track/duration 230600,
     :track/name "On the Run",
     :track/position 3,
     :track/artists [{:db/id 17592186046909}]}
    ...]}]}

Map Specifications

Instead of just an attribute name, you can use a nested map specification to pull related entities.  The pattern below pulls the :db/id and :artist/name of each artist:

;; pattern
[:track/name {:track/artists [:db/id :artist/name]}]

;; input
ghostRiders

;; result
{:track/artists [{:db/id 17592186048186, :artist/name "Bob Dylan"}
                 {:db/id 17592186049854, :artist/name "George Harrison"}],
 :track/name "Ghost Riders in the Sky"}

And of course everything nests arbitrarily, in case you need the release's medium's track's names and artists:

;; pattern
[{:release/media
  [{:medium/tracks
    [:track/name {:track/artists [:artist/name]}]}]}]

;; input
concertForBanglaDesh

;; result
{:release/media
 [{:medium/tracks
   [{:track/artists
     [{:artist/name "Ravi Shankar"} {:artist/name "George Harrison"}],
     :track/name "George Harrison / Ravi Shankar Introduction"}
    {:track/artists [{:artist/name "Ravi Shankar"}],
     :track/name "Bangla Dhun"}]}
  {:medium/tracks
   [{:track/artists [{:artist/name "George Harrison"}],
     :track/name "Wah-Wah"}
    {:track/artists [{:artist/name "George Harrison"}],
     :track/name "My Sweet Lord"}
    {:track/artists [{:artist/name "George Harrison"}],
     :track/name "Awaiting on You All"}
    {:track/artists [{:artist/name "Billy Preston"}],
     :track/name "That's the Way God Planned It"}]}
  ...]}

Try It Out

The Pull API has many other capabilities not shown here.  See the full docs for defaults, limits, bounded and unbounded recursion, and more.  Or check out the examples in Clojure and Java.


Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.