Meet Datomic: the immutable and functional database.

"When you combine two pieces of data you get data. When you combine two machines you get trouble." - Rich Hickey presenting The Functional Database.

Concepts from functional programming, especially immutability, are increasingly present in our day-to-day work. So it is only fair to get to know a database whose philosophy is the immutability of data, tracking facts in a format completely different from what we are used to.

In this article, we will get to know Datomic, which owes its name precisely to the fact that it handles data in a slightly unconventional format, bringing data immutability down to the database level with a functional approach designed to work well with distributed systems.

Table of Contents

  • What is Datomic?
  • Architecture
  • Data structure
  • How does a transaction work?
  • Conclusion

What is Datomic?

At the beginning of 2012, the Relevance team (which later joined Metadata to form Cognitect), together with Rich Hickey, launched Datomic, which they had been working on since 2010. The main motivation was to move a substantial part of the power traditionally assigned to database servers over to application servers, so that programmers would have more power over data within the application logic.

Datomic Cloud was released in early 2018, built on AWS services:

  • DynamoDB, EFS, EBS and S3 as storage services;
  • CloudFormation for deployment;
  • AWS CloudWatch for logging, monitoring, and metrics.

Cognitect (the company previously responsible for developing Datomic) was acquired by Nubank in 2020, and in April 2023 Nubank announced that the Datomic binaries are publicly available and free to use (which means the Pro version is now free).

Written in Clojure, Datomic works a little differently from the databases we are used to: it manages data, but does not store it itself. We will go through its architecture in more detail, but in short this means Datomic can use several other storage services to persist transactions, even other databases, which can make for a nice combination.

Concept

The main operating concept of Datomic is that the data is immutable as a whole. An interesting analogy for you to imagine and understand how it works a little better is:

  • Imagine a database that you are used to working with, such as PostgreSQL or MySQL;
  • Imagine now that you have two tables, a product table, but you also have a log table for these products, storing every modification that was made to the original product table;
  • When you update an item, this item has its data modified in the product table, but we add its previous value to the log table, highlighting what was done (in our case an update on its value);
  • For Datomic, there would only be the product "table" (in our case, a schema) with an additional column indicating whether that row is currently true. When we update the product's value, a new row is added with the new value, and the old row has its check column set to false; after all, it is no longer true at the current time.

This means that past data continues to exist, but there is this representation that indicates whether the value of the product is valid or not. These lines are called facts, so when the word "fact" is mentioned, remember this analogy.

One point to highlight: no matter how many times the product's value has changed, each old value remains a fact. It is no longer valid at the current time, but it is still a fact that occurred; after all, the product did have that value in the past.

Architecture

To better understand how everything works within Datomic, we first need to better understand how its architecture works.

How can I store data?

As previously mentioned, Datomic does not store data the way the databases we are used to do; it is mainly there to "transact and manage data". You can combine Datomic with different storage options, such as:

  • SQL databases (such as PostgreSQL and MySQL);
  • DynamoDB (if you choose to use Datomic Cloud, the Indexes will be stored in S3, Transaction Log in DynamoDB and Cache in EFS);
  • Cassandra and Cassandra2;
  • Dev mode, storing data in memory.

How it works is very simple: depending on your choice, a table (or equivalent structure) will be created within your storage service to hold all the data, which Datomic will manage.

Peers

Every interaction an application makes with Datomic goes through a Peer, which is responsible for maintaining a small cache, assembling and executing queries, fetching index data, and sending commits. Peers can be used in two main ways:

  • Peer Library: a library added to your dependencies that runs alongside your application; it is through the Peer Library that you carry out any kind of action against your database;
  • Peer Server: used mainly with Datomic in its Cloud format; your application carries only a Client library, responsible for communicating with the Peer Server, which performs the actions against Datomic on its behalf, much like the Peer Library does.

Peers can work with a given database value for an extended period of time without concern. These values are immutable and provide a stable, consistent view of the data for as long as the application needs one, functioning as a kind of "snapshot" of the database that lets data be read conveniently without issuing multiple real-time queries. This is a marked difference from relational databases, which require work to be done quickly using a short-lived transaction. You can read more about how Peers work in the official documentation.
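
To make this more concrete, here is a minimal sketch of what working with a database value through the Peer Library might look like, assuming the datomic.api namespace, an illustrative dev storage URI, and the :user/favorite-color attribute used later in this article:

(require '[datomic.api :as d])

;; Connect through the Peer library (the URI below is illustrative).
(def conn (d/connect "datomic:dev://localhost:4334/my-db"))

;; `db` is an immutable value: a consistent snapshot of the database at
;; this point in time. It can be held and queried for as long as needed.
(def db (d/db conn))

;; A Datalog query against the snapshot; once the relevant segments are
;; cached by the Peer, reads require little or no network traffic.
(d/q '[:find ?color
       :where [42 :user/favorite-color ?color]]
     db)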

An important point to highlight: if you are using the Peer Library, each node of your application (each service) will have a Peer running alongside it. In a distributed system, this means multiple Peers will be sending commits to your database, and consistency becomes critical: if multiple services send writes in parallel, how can we guarantee the data remains correct and survives race conditions? To understand this better, let's look at the Transactor.

Transactor

The Transactor is responsible for transacting the received commits and storing data within our database. The architecture of an application with Datomic is characterized by having multiple Peers working and sending commits (after all, we will have multiple services), but only a single Transactor, guaranteeing total data consistency. This means that regardless of whether the Transactor is receiving n commits per second, they will all be queued to guarantee total consistency.

The main point of this architectural format is that dealing with concurrency directly, allowing multiple writes to be stored in parallel in our database, especially in a distributed system, could compromise the consistency of the stored data, which would be a drastic problem.

Now, we can look in more detail at a diagram that demonstrates how this entire architecture behaves:
Datomic architecture

Note in the diagram above that the Transactor is responsible for all activities that require direct communication with the data storage service used, in addition to controlling data indexing formats, working with a memcached cluster, and responding to commits. Thus, we can state that Datomic deals with ACID transactions, an acronym that refers to the set of 4 key properties that define a transaction: Atomicity, Consistency, Isolation, and Durability.

Storage Services

"Peers read facts from the Storage Services. The facts the Storage Service returns never change, so Peers do extensive caching. Each Peer's cache represents a partial copy of all the facts in the database. The Peer cache implements a least-recently used policy for discarding data, making it possible to work with databases that won't fit entirely in memory. Once a Peer's working set is cached, there is little or no network traffic for reads." - from Datomic Pro Documentation.

Storage Services can be configured however you like, all you need to do is create a properties file (called transactor.properties) to represent how your Transactor will be created and managed.

So, if you use PostgreSQL, for example, you will have to configure the driver to be used, the connection URL, username and password; you can also configure values for memory-index-threshold, memory-index-max, object-cache-max, read-concurrency and even write-concurrency. In the end, a table named datomic_kvs is created within your PostgreSQL:

CREATE TABLE datomic_kvs (
  id text NOT NULL,
  rev integer,
  map text,
  val bytea,
  CONSTRAINT pk_id PRIMARY KEY (id)
) WITH (OIDS=FALSE);
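
Once the Transactor is running against this storage, a Peer connects using a storage-specific URI. The sketch below assumes the Peer Library and a PostgreSQL instance configured as above; the database name and credentials are illustrative:

(require '[datomic.api :as d])

;; A Datomic database named "inventory", whose data lives inside the
;; datomic_kvs table of the PostgreSQL database used by the Transactor.
(def uri
  "datomic:sql://inventory?jdbc:postgresql://localhost:5432/datomic?user=datomic&password=datomic")

(d/create-database uri) ;; returns false if the database already exists
(def conn (d/connect uri))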

Data structure

Well, we've already talked about the architecture of how Datomic as a whole works, so now let's better visualize what the basis of Datomic's data structure is like, starting with Datoms.

Datoms

"A datom is an immutable atomic fact that represents the addition or retraction of a relation between an entity, an attribute, a value, and a transaction."

So basically a datom is a single fact in the log, representing a change to a relation. We can express a datom as a five-tuple:

  • an entity id (E)
  • an attribute (A)
  • a value for the attribute (V)
  • a transaction id (Tx)
  • a boolean (Op) indicating whether the datom is being added or retracted

These definitions come from the Datomic Cloud documentation. Look at the example below:

E 42
A :user/favorite-color
V :blue
Tx 1234
Op true

Entities

"A Datomic entity provides a lazy, associative view of all the information that can be reached from a Datomic entity id."

Looking at an entity, we can visualize it as a table:

| E  | A                    | V      | Tx   | Op    |
|----|----------------------|--------|------|-------|
| 42 | :user/favorite-color | :blue  | 1234 | true  |
| 42 | :user/first-name     | "John" | 1234 | true  |
| 42 | :user/last-name      | "Doe"  | 1234 | true  |
| 42 | :user/favorite-color | :green | 4567 | true  |
| 42 | :user/favorite-color | :blue  | 4567 | false |

The transaction id can be seen as a point in time identifying when the data was asserted. In the example above we have 1234 and 4567. Look at 1234: in the first row, the :user/favorite-color attribute has the value :blue, with Op set to true. Later, at 4567, the datom with value :blue has its Op set to false (and :green is now asserted as true).

We never changed the Op manually. Datomic did this automatically when we updated the value of :user/favorite-color. That means Datomic manages our data for us, asserting and retracting values, and we have the exact point in time at which :user/favorite-color was changed.
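
We can see all of these facts, including the retracted one, by querying the history view of the database. A hedged sketch, reusing entity 42 and the conn/db from the earlier snippets (the transaction ids shown are the illustrative ones from the table above):

;; The history database contains every datom ever asserted or retracted.
(d/q '[:find ?v ?tx ?op
       :in $ ?e
       :where [?e :user/favorite-color ?v ?tx ?op]]
     (d/history db)
     42)
;; => #{[:blue 1234 true] [:green 4567 true] [:blue 4567 false]}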

Schemas

As the documentation says: Attributes are defined using the same data model used for application data. That is, attributes are themselves defined by entities with associated attributes.

Well, for defining a new attribute we need to define:

  • :db/ident, a name that is unique within the database
  • :db/cardinality, specifying whether entities can have one or a set of values for the attribute
  • :db/valueType, the type allowed for an attribute's value
  • :db/doc (optional), the attribute's description/documentation

Note that :db/ident, :db/cardinality and the rest are themselves just entities that point to each other. They are generated automatically by Datomic during its initial bootstrap, which means they have default entity ids.
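
For illustration, installing the :user/favorite-color and :user/first-name attributes used in the examples above might look like the following sketch (Peer API assumed):

(def user-schema
  [{:db/ident       :user/favorite-color
    :db/valueType   :db.type/keyword
    :db/cardinality :db.cardinality/one
    :db/doc         "The user's favorite color."}
   {:db/ident       :user/first-name
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}])

;; Schema is installed by transacting it, just like any other data.
@(d/transact conn user-schema)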

How does a transaction work?

"Every transaction in Datomic is its own entity, making it easy to add facts about why a transaction was added (or who added it, or from where, etc.)"

We have "two options" for transactions: add or retraction. Every transaction returns the transaction id and the database state before and after the transaction. The forms can be:

[:db/add entity-id attribute value]
[:db/retract entity-id attribute value]

As we saw before, every transaction is processed through a queue.

If a transaction completes successfully, data is committed to the database and we have a transaction report returned as a map with the following keys:

| key        | usage                                  |
|------------|----------------------------------------|
| :db-before | database value before the transaction  |
| :db-after  | database value after the transaction   |
| :tx-data   | datoms produced by the transaction     |
| :tempids   | map from temporary ids to assigned ids |

The database value is like a "snapshot" from the database, as we saw before.
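
With the Peer API, this report is what you get back from dereferencing d/transact. A small sketch, reusing the conn and entity 42 from the earlier snippets:

(def report
  @(d/transact conn [[:db/add 42 :user/favorite-color :green]]))

(:tempids report) ;; map from temporary ids to assigned ids
(:tx-data report) ;; datoms produced, including the automatic retraction of :blue

;; :db-after is a database value that already includes the new fact.
(d/q '[:find ?c .
       :where [42 :user/favorite-color ?c]]
     (:db-after report))
;; => :green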

Let's see an example of how :db/add works. Look at the example below:

;; Suppose we have this entity
{:internal/id ...
 :internal/value 0
 :internal/key " "}

;; Making a simple transaction
[[:db/add id :internal/value 1]]
;; This will update the value...

;; But, we can perform multiple
;; transactions, look:
[[:db/add id :internal/value 1]
 [:db/add id :internal/key "another"]]
;; It will work fine.

;; But, when we perform something like:
[[:db/add id :internal/value 1]
 [:db/add id :internal/value 2]]
;; We will have a conflict

The conflict occurs when a single transaction asserts two different values for the same entity and attribute. That makes sense, because we can't have a fact updated multiple times at the same point in time.

A cool fact: the operations within a single transaction can be processed in parallel. This is safe because, as we saw above, the same entity/attribute pair can't be asserted twice in the same transaction.

Conclusion

This article, the first in a series, aims to introduce Datomic and present its many possibilities and advantages for general use. It is worth highlighting that the official Datomic documentation is excellent, so use it for any further in-depth research! And of course, if you want to take your first steps with Datomic, feel free to follow the Getting Started guide in the official documentation; if you prefer a repository with the code used here, I've made one available on my GitHub (don't forget to give it a star)!

Permalink

Erlang Workshop 2024 - Call for Papers

23rd Edition of the Erlang Workshop
Monday 2nd September 2024, Milan, Italy
A satellite workshop of ICFP 2024

https://icfp24.sigplan.org/home/erlang-2024

Deadline: Thursday 30 May

The Erlang Workshop aims to bring together the open source, academic, and industrial communities of Erlang-like languages (e.g., Elixir, Gleam, Erlang, Akka, Clojure), including the concurrent, distributed systems, and fault-tolerant communities. The workshop will enable participants to familiarize themselves with recent developments on new techniques and tools, novel applications, draw lessons from users' experiences and identify research problems and common areas relevant to the practice of Erlang, Erlang-like languages, functional programming, distribution, concurrency etc.

Topics

This year we invite three types of submissions:

  • Technical papers describing language extensions, critical discussions of the status quo, formal semantics of language constructs, program analysis and transformation, virtual machine extensions and compilation techniques, implementations and interfaces of Erlang in/with other languages, and new tools (profilers, tracers, debuggers, testing frameworks, etc.). Submissions related to Erlang, Elixir, Lisp Flavored Erlang, and topics in functional, concurrent, and distributed programming are welcome and encouraged. The maximum length for technical papers is restricted to 12 pages, but short papers (max. 6 pages) are also welcome.

  • Practice and application papers describing uses of Erlang and related languages in the “real-world”, libraries for specific tasks, experiences from using Erlang in specific application domains, reusable programming idioms and elegant new ways of using Erlang to approach or solve particular problems, etc. The maximum length for the practice and application papers is restricted to 12 pages, but short papers (max. 6 pages) are also welcome.

  • Lightning talks describing topics related to the workshop goals that allow participants to present and demonstrate projects and preliminary work in academia and industry. Presentations in this category will be given at most an hour of shared simultaneous presentation time, will not be part of the peer review process and will not be part of the formal proceedings. Notification of acceptance will be continuous.

Important dates

Submission deadline 30 May 2024
Notification 27 Jun 2024
Workshop 2 Sep 2024

Deadlines are anywhere on Earth.

Workshop Co-Chairs

  • Kiko Fernandez-Reyes, Ericsson, Sweden
  • Laura Voinea, University of Glasgow, UK

Program Committee

(Note: the Workshop Co-Chairs are also committee members)

  • Ingela A. Andin (Ericsson, Sweden)
  • Duncan Paul Attard (University of Glasgow, United Kingdom)
  • Annette Bieniusa (University of Kaiserslautern-Landau, Germany)
  • Laura Bocchi (University of Kent, United Kingdom)
  • Lars-Ake Fredlund (Universidad Politécnica de Madrid, Spain)
  • Saša Jurić (Elixir mentor, Croatia)
  • Luca Padovani (Università di Camerino, Italy)
  • Marjan Sirjani (Malardalen University, Sweden)
  • Hayleigh Thompson (Gleam core team, United Kingdom)
  • Brooklyn Zelenka (Fission, Canada)

Instructions to authors

Submission

Submissions must adhere to SIGPLAN’s republication policy
(http://sigplan.org/Resources/Policies/Republication/), and authors should be aware of ACM’s policies on plagiarism
(https://www.acm.org/publications/policies/plagiarism). Program Committee members are allowed to submit papers, but their papers will be held to a higher standard.

Papers must be submitted online via HotCRP at:

https://erlang2024.hotcrp.com

Lightning talks can be submitted to the workshop's co-chairs via e-mail.

Formatting

Submitted papers should be in portable document format (PDF),
formatted using the ACM SIGPLAN style guidelines. Authors should use the acmart format, with the sigplan sub-format for ACM proceedings. For details, see:

http://www.sigplan.org/Resources/Author/#acmart-format

It is recommended to use the review option when submitting a paper; this option enables line numbers for easy reference in reviews.

Supplementary material

Authors have the option to attach supplementary material to a submission, on the understanding that reviewers may choose not to look at it. This supplementary material should not be submitted as part of the main document; instead, it should be uploaded as a separate PDF document or tarball. Supplementary material should be uploaded at submission time, not by providing a URL in the paper that points to an external repository.

Artifacts

Authors of accepted papers are encouraged to make auxiliary material (artifacts like source code, test data, etc.) available with their paper. They can opt to have these artifacts published alongside their paper in the ACM Digital Library (copyright of artifacts remains with the authors). If an accepted paper’s artifacts are made permanently available for retrieval in a publicly accessible archival repository like the ACM Digital Library, that paper qualifies for an Artifacts Available badge (https://www.acm.org/publications/policies/artifact-review-badging#available). Applications for such a badge can be made after paper acceptance and will be reviewed by the PC co-chairs.

Proceedings

As with previous years, the accepted workshop papers will be published
by the ACM and will appear in the ACM Digital Library.

AUTHORS TAKE NOTE:

The official publication date is the date the proceedings are made available in the ACM Digital Library.

This date may be up to two weeks prior to the first day of your conference. The official publication date affects the
deadline for any patent filings related to published work.

For more information, please see ACM Copyright Policy
(http://www.acm.org/publications/policies/copyright-policy) and ACM Author Rights (http://authors.acm.org/main.html).

Accepted lightning talks will be posted on the workshop's website but not formally published in the proceedings.

Permalink

52: Coding in YAML with Ingy döt Net

Ingy döt Net talks about his new programming language YAMLScript, compiling YAML to Clojure, and the development of the YAML format.

Show notes:

  • SML mailing list archive
  • ActiveState
  • Data::Denter
  • Zope
  • Ingy.net personal website
  • Acmeism
  • SnakeYAML / clj-yaml
  • BPAN
  • PST - Package Super Tool
  • YAMLScript docs
  • release-yamlscript file
  • Yes expressions, e.g. a(b c) => (a b c) and (a + b) => (+ a b)
  • Deno - capabilities/permissions
  • Advent of YAMLScript
  • New YAML version

Permalink

New Library: clj-reload

The problem

Do you love interactive development? Although Clojure is set up perfectly for that, evaluating buffers one at a time can only get you so far.

Once you start dealing with state, you get data dependencies, and with them evaluation order starts to matter. Now you change one line but have to re-eval half of your application to see the change.

But how do you know which half?

The solution

Clj-reload to the rescue!

Clj-reload scans your source dir, figures out the dependencies, tracks file modification times, and when you are finally ready to reload, it carefully unloads and loads back only the namespaces that you touched and the ones that depend on those. In the correct dependency order, too.

Let’s do a simple example.

a.clj:

(ns a
  (:require b))

b.clj:

(ns b
  (:require c))

c.clj:

(ns c)

Imagine you change something in b.clj and want to see these changes in your current REPL. What do you do?

If you call

(clj-reload.core/reload)

it will notice that

  • b.clj was changed,
  • a.clj depends on b.clj,
  • there’s c.clj but it doesn’t depend on a.clj or b.clj and wasn’t changed.

Then the following will happen:

Unloading a
Unloading b
Loading b
Loading a

So:

  • c wasn’t touched — no reason to,
  • b was reloaded because it was changed,
  • a was loaded after the new version of b was in place. Any dependencies a had will now point to the new versions of b.

That’s the core proposition of clj-reload.

Usage

Here, I recorded a short video:

But if you prefer text, then start with:

(require '[clj-reload.core :as reload])

(reload/init
  {:dirs ["src" "dev" "test"]})

:dirs are relative to the working directory.

Use:

(reload/reload)
; => {:unloaded [a b c], :loaded [c b a]}

reload can be called multiple times. If reload fails, fix the error and call reload again.

Works best if assigned to a shortcut in your editor.

Usage: Return value

reload returns a map of namespaces that were reloaded:

{:unloaded [<symbol> ...]
 :loaded   [<symbol> ...]}

By default, reload throws if it can’t load a namespace. You can change it to return the exception instead:

(reload/reload {:throw false})

; => {:unloaded  [a b c]
;     :loaded    [c b]
;     :failed    b
;     :exception <Throwable>}

Usage: Choose what to reload

By default, clj-reload will only reload namespaces that were both:

  • Already loaded
  • Changed on disk

If you pass :only :loaded option to reload, it will reload all currently loaded namespaces, no matter if they were changed or not.

If you pass :only :all option to reload, it will reload all namespaces it can find in the specified :dirs, no matter whether loaded or changed.
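
For example (a small sketch based on the options described above):

;; Default: reload only loaded namespaces that changed on disk
(reload/reload)

;; Reload every currently loaded namespace, changed or not
(reload/reload {:only :loaded})

;; Reload every namespace found in :dirs, loaded or not
(reload/reload {:only :all})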

Usage: Skipping reload

Some namespaces contain state you always want to persist between reloads. E.g. running web-server, UI window, etc. To prevent these namespaces from reloading, add them to :no-reload during init:

(reload/init
  {:dirs ...
   :no-reload '#{user myapp.state ...}})

Usage: Unload hooks

Sometimes your namespace contains a stateful resource that requires proper shutdown before unloading. For example, if you have a running web server defined in a namespace and you unload that namespace, the server will just keep running in the background.

To work around that, define an unload hook:

(def my-server
  (server/start app {:port 8080}))

(defn before-ns-unload []
  (server/stop my-server))

before-ns-unload is the default name for the unload hook. If a function with that name exists in a namespace, it will be called before unloading.

You can change the name (or set it to nil) during init:

(reload/init
  {:dirs [...]
   :unload-hook 'my-unload})

This is a huge improvement over tools.namespace. tools.namespace doesn’t report which namespaces it’s going to reload, so your only option is to stop everything before reload and start everything after, no matter what actually changed.

Usage: Keeping vars between reloads

One of the main innovations of clj-reload is that it can keep selected variables between reloads.

To do so, just add ^:clj-reload/keep to the form:

(ns test)

(defonce x
  (rand-int 1000))

^:clj-reload/keep
(def y
  (rand-int 1000))

^:clj-reload/keep
(defrecord Z [])

and then reload:

(let [x test/x
      y test/y
      z (test/->Z)]
  
  (reload/reload)
  
  (let [x' test/x
        y' test/y
        z' (test/->Z)]
    (is (= x x'))
    (is (= y y'))
    (is (identical? (class z) (class z')))))

Here’s how it works:

  • defonce works out of the box. No need to do anything.
  • def/defn/deftype/defrecord/defprotocol can be annotated with ^:clj-reload/keep and will be persisted too.
  • Project-specific forms can be added by extending clj-reload.core/keep-methods multimethod.

Why is this important? With tools.namespace you have to structure your code in a way that works with its reload implementation. For example, you’d probably move persistent state and protocols into separate namespaces, not because logic dictates it, but because the reload library won’t work otherwise.

clj-reload allows you to structure the code the way business logic dictates it, without the need to adapt to developer workflow.

Simply put: the fact that you use clj-reload during development does not spill into your production code.

Comparison: Evaluating buffer

The simplest way to reload Clojure code is just re-evaluating an entire buffer.

It works for simple cases but fails to account for dependencies. If something depends on your buffer, it won’t see these changes.

The second pitfall is removing/renaming vars or functions. If you had:

(def a 1)

(def b (+ a 1))

and then change it to just

(def b (+ a 1))

it will still compile! New code is evaluated “on top” of the old one, without unloading the old one first. The definition of a will persist in the namespace and let b compile.

It might be really hard to spot these errors during long development sessions.

Comparison: (require ... :reload-all)

Clojure has :reload and :reload-all options for require. They do track upstream dependencies, but that’s about it.

In our original example, if we do

(require 'a :reload-all)

it will load both b and c. This is excessive (b or c might not have changed), doesn’t keep track of downstream dependencies (if we reload b, it will not trigger a, only c) and it also “evals on top”, same as with buffer eval.

Comparison: tools.namespace

tools.namespace is a tool originally written by Stuart Sierra to work around the same problems. It’s a fantastic tool and the main inspiration for clj-reload. I’ve been using it for years and loving it, until I realized I wanted more.

So the main proposition of both tools.namespace and clj-reload is the same: they will track file modification times and reload namespaces in the correct topological order.

This is how clj-reload is different:

  • tools.namespace reloads every namespace it can find. clj-reload only reloads the ones that were already loaded. This allows you to have broken/experimental/auxiliary files lie around without breaking your workflow TNS-65
  • First reload in tools.namespace always reloads everything. In clj-reload, even the very first reload only reloads files that were actually changed TNS-62
  • clj-reload supports namespaces split across multiple files (like core_deftype.clj, core_defprint.clj in Clojure) TNS-64
  • clj-reload can see dependencies in top-level standalone require and use forms TNS-64
  • clj-reload supports load and unload hooks per namespace TNS-63
  • clj-reload can specify exclusions during configuration, without polluting the source code of those namespaces.
  • clj-reload can keep individual vars around and restore previous values after reload. E.g. defonce doesn’t really work with tools.namespace, but it does with clj-reload.
  • clj-reload has 2× smaller codebase and 0 runtime dependencies.
  • clj-reload doesn’t support ClojureScript. Patches welcome.

That’s it!

Clj-reload grew from my personal needs on Humble UI project. But I hope other people will find it useful, too.

Let me know what works for you and what doesn’t! I’ll try to at least be on par with tools.namespace.

And of course, here’s the link:

Permalink

PG2 release 0.1.11: HugSQL support

The latest 0.1.11 release of PG2 introduces HugSQL support.

The pg2-hugsql package brings integration with the HugSQL library. It creates functions out from SQL files like HugSQL does but these functions use the PG2 client instead of JDBC. Under the hood, there is a special database adapter as well as a slight override of protocols to make inner HugSQL stuff compatible with PG2.

Since the package already depends on core HugSQL functionality, there is no need to add the latter to dependencies: having pg2-hugsql by itself will be enough (see Installation).

Basic Usage

Let’s go through a short demo. Imagine we have a demo.sql file with the following queries:

-- :name create-demo-table :!
create table :i:table (id serial primary key, title text not null);

-- :name insert-into-table :! :n
insert into :i:table (title) values (:title);

-- :name insert-into-table-returning :<!
insert into :i:table (title) values (:title) returning *;

-- :name select-from-table :? :*
select * from :i:table order by id;

-- :name get-by-id :? :1
select * from :i:table where id = :id limit 1;

-- :name get-by-ids :? :*
select * from :i:table where id in (:v*:ids) order by id;

-- :name insert-rows :<!
insert into :i:table (id, title) values :t*:rows returning *;

-- :name update-title-by-id :<!
update :i:table set title = :title where id = :id returning *;

-- :name delete-from-table :n
delete from :i:table;

Prepare a namespace with all the imports:

(ns pg.demo
  (:require
   [clojure.java.io :as io]
   [pg.hugsql :as hug]
   [pg.core :as pg]))

To inject functions from the file, pass it into the pg.hugsql/def-db-fns function:

(hug/def-db-fns (io/file "test/demo.sql"))

It accepts either a string path to a file, a resource, or a File object. If there were no exceptions and the file was correct, the current namespace gets the new functions declared in the file. Let’s examine them and their metadata:

create-demo-table
#function[pg.demo...]

(-> create-demo-table var meta)

{:doc ""
 :command :!
 :result :raw
 :file "test/demo.sql"
 :line 2
 :arglists ([db] [db params] [db params opt])
 :name create-demo-table
 :ns #namespace[pg.demo]}

Each newborn function has at most three bodies:

  • [db]
  • [db params]
  • [db params opt],

where:

  • db is a source of a connection. It might be either a Connection object, a plain Clojure config map, or a Pool object.
  • params is a map of HugSQL parameters like {:id 42};
  • opt is a map of pg/execute parameters that affect processing the current query.

Now that we have functions, let’s call them. Establish a connection first:

(def config
  {:host "127.0.0.1"
   :port 10140
   :user "test"
   :password "test"
   :dbname "test"})

(def conn
  (pg/connect config))

Let’s create a table using the create-demo-table function:

(def TABLE "demo123")

(create-demo-table conn {:table TABLE})
{:command "CREATE TABLE"}

Insert something into the table:

(insert-into-table conn {:table TABLE
                         :title "hello"})
1

The insert-into-table function has the :n flag in the source SQL file. Thus, it returns the number of rows affected by the command. Above, there was a single record inserted.

Let’s try an expression that inserts something and returns the data:

(insert-into-table-returning conn
                             {:table TABLE
                              :title "test"})
[{:title "test", :id 2}]

Now that the table is not empty any longer, let’s select from it:

(select-from-table conn {:table TABLE})

[{:title "hello", :id 1}
 {:title "test", :id 2}]

The get-by-id shortcut fetches a single row by its primary key. It returns nil for a missing key:

(get-by-id conn {:table TABLE
                 :id 1})
{:title "hello", :id 1}

(get-by-id conn {:table TABLE
                 :id 123})
nil

Its bulk version called get-by-ids relies on the in (:v*:ids) HugSQL syntax. It expands into the following SQL vector: ["... where id in ($1, $2, ... )" 1 2 ...]

-- :name get-by-ids :? :*
select * from :i:table where id in (:v*:ids) order by id;
(get-by-ids conn {:table TABLE
                  :ids [1 2 3]})

;; 3 is missing
[{:title "hello", :id 1}
 {:title "test", :id 2}]

To insert multiple rows at once, use the :t* syntax which is short for “tuple list”. Such a parameter expects a sequence of sequences:

-- :name insert-rows :<!
insert into :i:table (id, title) values :t*:rows returning *;
(insert-rows conn {:table TABLE
                   :rows [[10 "test10"]
                          [11 "test11"]
                          [12 "test12"]]})

[{:title "test10", :id 10}
 {:title "test11", :id 11}
 {:title "test12", :id 12}]

Let’s update a single row by its id:

(update-title-by-id conn {:table TABLE
                          :id 1
                          :title "NEW TITLE"})
[{:title "NEW TITLE", :id 1}]

Finally, clean up the table:

(delete-from-table conn {:table TABLE})

Passing the Source of a Connection

Above, we’ve been passing a Connection object called conn to all functions. But it can be something else as well: a config map or a pool object. Here is an example with a map:

(insert-rows {:host "..." :port ... :user "..."}
             {:table TABLE
              :rows [[10 "test10"]
                     [11 "test11"]
                     [12 "test12"]]})

Pay attention: when the first argument is a config map, a Connection object is established from it and then closed again before the function returns. This might break a pipeline if you rely on state stored in a connection. A temporary table is a good example. Once you close a connection, all the temporary tables created within that connection get wiped. Thus, if you create a temp table in the first function, and select from it in a second function while passing a config map, it won’t work: the second function won’t know anything about that table.

The first argument might be a Pool instance as well:

(pool/with-pool [pool config]
  (let [item1 (get-by-id pool {:table TABLE :id 10})
        item2 (get-by-id pool {:table TABLE :id 11})]
    {:item1 item1
     :item2 item2}))

{:item1 {:title "test10", :id 10},
 :item2 {:title "test11", :id 11}}

When the source is a pool, each function call borrows a connection from it and returns it afterwards. But you cannot be sure that both get-by-id calls share the same connection. A parallel thread may interfere and borrow the connection used by the first get-by-id before the second get-by-id call acquires it. As a result, any pipeline that relies on shared state across two subsequent function calls might break.

To ensure the functions share the same connection, use either pg/with-connection or pool/with-connection macros:

(pool/with-pool [pool config]
  (pool/with-connection [conn pool]
    (pg/with-tx [conn]
      (insert-into-table conn {:table TABLE :title "AAA"})
      (insert-into-table conn {:table TABLE :title "BBB"}))))

Above, there is a 100% guarantee that both insert-into-table calls share the same conn object borrowed from the pool. The calls are also wrapped in a transaction, which produces the following session:

BEGIN
insert into demo123 (title) values ($1);
  parameters: $1 = 'AAA'
insert into demo123 (title) values ($1);
  parameters: $1 = 'BBB'
COMMIT

Passing Options

PG2 supports a lot of options when processing a query. To use them, pass a map as the third parameter of any function. Below, we override the function that processes column names: instead of the default keyword, let it be clojure.string/upper-case:

(get-by-id conn
           {:table TABLE :id 1}
           {:fn-key str/upper-case})

{"TITLE" "AAA", "ID" 1}

If you need such keys everywhere, submitting a map into each call might be inconvenient. The def-db-fns function accepts a map of predefined overrides:

(hug/def-db-fns
  (io/file "test/demo.sql")
  {:fn-key str/upper-case})

Now, all the generated functions return string column names in upper case by default:

(get-by-id config
           {:table TABLE :id 1})

{"TITLE" "AAA", "ID" 1}

For more details, refer to the official HugSQL documentation.

Permalink

Launching Columns for Tablecloth

This is a cross-post of a recent post by Ethan Miller at his blog, announcing the release of a substantial addition to the Tablecloth library.

Permalink

Launching Columns for Tablecloth

The New Column API

Today we at scicloj deployed a new Column API (tablecloth.column.api) into the data processing library Tablecloth (available as of version 7.029.1). This new API adds a new primitive to the Tablecloth system: the column. Here’s how we use it:

(require '[tablecloth.column.api :as tcc])

(tcc/column [1 2 3 4 5])
;; => #tech.v3.dataset.column<int64>[5]
null
[1, 2, 3, 4, 5]

The new column is the same as the columns that comprise a dataset. It is a one-dimensional typed sequence of values. Underneath the hood, the column is just the column defined in tech.ml.dataset, the library that backs Tablecloth.

The difference is that now when you are using Tablecloth you have the option of interacting directly with a column using an API that provides a set of operations that always take and return a column.

Basic Usage

Let’s go through a simple example. Let’s say we have some test scores that we need to analyze:

(def test-scores (tcc/column [85 92 78 88 95 83 80 90]))

test-scores
;; => #tech.v3.dataset.column<int64>[8]
null
[85, 92, 78, 88, 95, 83, 80, 90]

Now that we have these values in a column, we can easily perform operations on them:

(tcc/mean test-scores)
;; => 86.375

(tcc/standard-deviation test-scores)
;; => 5.926634795564849

There are many operations one can perform. At the moment, the available operations are those you would previously have accessed by importing the tech.v3.datatype.functional namespace from dtype-next.

To get a fuller picture of the Column API and how it works, please consult the Column API section in the Tablecloth documentation.

Easier Column Operations on the Dataset

The changes we’ve deployed also improve the expressive power of Tablecloth’s standard Dataset API. Previously, if you needed to do something simple like a group by and aggregation on a column in a dataset, the code could become unnecessarily verbose:

(defonce stocks
  (tc/dataset "https://raw.githubusercontent.com/techascent/tech.ml.dataset/master/test/data/stocks.csv" {:key-fn keyword}))


(tc/column-names stocks)

(-> stocks
    (tc/group-by [:symbol])
    (tc/aggregate (fn [ds]
                    (-> ds
                        :price
                        tech.v3.datatype.functional/mean))))
;; => _unnamed [5 2]:

| :symbol |      summary |
|---------|-------------:|
|    MSFT |  24.73674797 |
|    AMZN |  47.98707317 |
|     IBM |  91.26121951 |
|    GOOG | 415.87044118 |
|    AAPL |  64.73048780 |

With the new column operations for datasets, you can now simply write:

(-> stocks
    (tc/group-by [:symbol])
    (tc/mean [:price]))

The same set of operations available on a column can be called on columns in the dataset. However, when operating on a dataset, functions that would return a scalar value act as aggregator functions, as seen above.

Functions that would return a new column allow the user to specify a target column to be added to the dataset, as in this example where we first use the method above to add a column with the mean back to stocks:

(def stocks-with-mean
  (-> stocks
      (tc/group-by [:symbol])
      (tc/mean [:price])
      (tc/rename-columns {"summary" :mean-price})
      (tc/inner-join stocks :symbol)))


stocks-with-mean
;; => inner-join [560 4]:
;;    | :symbol | :mean-price |      :date | :price |
;;    |---------|------------:|------------|-------:|
;;    |    MSFT | 24.73674797 | 2000-01-01 |  39.81 |
;;    |    MSFT | 24.73674797 | 2000-02-01 |  36.35 |
;;    |    MSFT | 24.73674797 | 2000-03-01 |  43.22 |
;;    |    MSFT | 24.73674797 | 2000-04-01 |  28.37 |

Then we use a dataset column operation that returns a column – column division, in this case – to add a new column holding the relative daily price of the stock:

(-> stocks-with-mean
    (tc// :relative-daily-price [:price :mean-price]))
;; => inner-join [560 5]:
;;    | :symbol | :mean-price |      :date | :price | :relative-daily-price |
;;    |---------|------------:|------------|-------:|----------------------:|
;;    |    MSFT | 24.73674797 | 2000-01-01 |  39.81 |            1.60934655 |
;;    |    MSFT | 24.73674797 | 2000-02-01 |  36.35 |            1.46947368 |
;;    |    MSFT | 24.73674797 | 2000-03-01 |  43.22 |            1.74719814 |
;;    |    MSFT | 24.73674797 | 2000-04-01 |  28.37 |            1.14687670 |

For more information on these operations, please consult the documentation here.

Thanks to Clojurist Together

This contribution to Tablecloth was supported by Clojurists Together through their Quarterly Fellowships for open source development.

Permalink

Clojure Deref (Apr 12, 2024)

Welcome to the Clojure Deref! This is a weekly link/news roundup for the Clojure ecosystem (feed: RSS). Thanks to Anton Fonarev for link aggregation.

Blogs, articles, and projects

Libraries and Tools

New releases and tools this week:

Permalink

Heart of Clojure Tickets For Sale

Tickets for Heart of Clojure are now available.

Grab them here!

We have a limited amount of Early Bird tickets, they will be on sale until the end of April, but we expect them to sell out before that.

Meanwhile the orga team is in full swing! The next major milestone will be to get the CFP up, so people can submit their talk proposals. While we get that ready you can already start thinking about the talk or session you want to propose.

We are also busy reaching out to potential sponsors. We really need sponsors to make this event happen, and in the current economic climate they're a little harder to find than five years ago.

But we're confident we can convince enough of them to support us. It's a unique opportunity to reach three to four hundred smart and talented technologists, and to be associated with probably the coolest event of the year. If you know of companies that would be interested, please send an intro to orga@heartofclojure.eu, or pass on our sponsorship prospectus.

- Bettina & Arne

Permalink

Upload files to SharePoint using Babashka

My team and I were publishing some files several times weekly and we were obligated to put them on Microsoft SharePoint for non-technical people to be able to find them. After all, it is fair to not assume everyone knows their way around GitHub or how to run small pieces of code.

The tedious file upload to SharePoint was done manually until recently when Babashka came to the rescue and helped leverage our CI/CD pipeline.

tl;dr Link to a Babashka script that can upload files to SharePoint — you need to fill out your authentication information.

Getting an app-only principal

Since it was a CI/CD pipeline that was supposed to do all the work, we wanted to be granted access using SharePoint App-Only. Being in a huge organization, the people handling such things are “far away”, and they are required to follow a process with a lot of bureaucratic hoops.

I am not sure exactly how all the specifics were done, but I am told we got an app-only principal with the following setup:

<AppPermissionRequests AllowAppOnlyPolicy="true">
  <AppPermissionRequest Scope="http://sharepoint/content/sitecollection/web" Right="FullControl"/>
</AppPermissionRequests>

Though I don’t think it matters for this use case, App Domain and Redirect URI were using the following settings:

App Domain: www.localhost.com
Redirect URI: https://www.localhost.com

Acquiring an access token

Looking at the SharePoint API documentation it seemed fairly straightforward, but acquiring an access token that the SharePoint API would accept, proved to be the hardest part.

Documentation and guides both official and community suggested acquiring access tokens from login.microsoftonline.com in one way or another. But I kept getting rejected by the SharePoint API regardless, with cryptic error messages that didn’t help me understand the issue.

I was tempted to give up on several occasions, but I also dreaded the thought of maintaining code, that would be unnecessarily long and complex. I got a taste when a colleague showed me some Python code using the Office365-REST-Python-Client library. The Python code was using A LOT more lines of code — even with the library — than I suspected would be necessary in Clojure without a library.

After several weeks on and off, whenever I could find a bit of time in between my normal work, I stumbled over the blog post How To Perform A SharePoint App-Only Authentication In Power Automate, which was acquiring an access token from accounts.accesscontrol.windows.net:

(def sharepoint-principal-id
  "00000003-0000-0ff1-ce00-000000000000")

(defn get-token
  [{:keys [client-id client-secret tenant-id tenant-name]}]
  (let [params {"client_id" (str client-id "@" tenant-id )
                "client_secret" client-secret
                "grant_type" "client_credentials"
                "resource" (str sharepoint-principal-id "/" tenant-name ".sharepoint.com@" tenant-id) }]
    (-> (http/post (str "https://accounts.accesscontrol.windows.net/" tenant-id "/tokens/OAuth/2")
                   {:form-params params})
        :body
        (json/decode true))))

(comment
  ;; replace client and tenant dummy information with your own
  (def access-token
    (-> (get-token {:client-id "71e0b131-84a1-40ed-a9f1-366fbecfd05f"
                    :client-secret "NDMzODQ3NWQtZTh/Tk0YzYtOGEzZjI3+MDc+2NWM3Cg="
                    :tenant-id "contoso.com"
                    :tenant-name "contoso"})
        :access_token))
)

… and just like that, we were back in business 😅

Interacting with SharePoint REST API

The rest fell into place easily, with the SharePoint API documentation in hand, and a file was quickly uploaded.

(defn percent-encode
  [s]
  (-> s
      (URLEncoder/encode)
      (str/replace "+" "%20")))

(defn upload-file
  [access-token endpoint {:keys [input-stream name destination] :as _file-info} opts]
  (let [mime-type (or (:mime-type opts)
                      (not-empty (URLConnection/guessContentTypeFromName name)))]
    (when-not mime-type
      (throw (ex-info "Unable to automatically determine mime-type" {:file-name name})))

    (println "Uploading" (str destination "/" name))
    (http/post (str endpoint "/_api/web"
                    "/GetFolderByServerRelativeUrl('" (percent-encode destination) "')"
                    "/Files/add(url='" (percent-encode name) "',overwrite=true)")
               {:headers {"Authorization" (str "Bearer " access-token)
                          "Content-Type" mime-type}
                :body input-stream})))

(comment
  (def endpoint
    (let [tenant-name "contoso"
          site "somesite"]
      (str "https://" tenant-name ".sharepoint.com/sites/" site)))

  (upload-file access-token endpoint
               {:input-stream (io/file "test-file.txt")
                :name "super-file.txt"
                :destination "folder that exists"}
               {:mime-type "text/plain"}) ; optional - should be auto detected for most filetypes
)

Notice: percent-encode is necessary when dealing with file and folder names containing spaces.

Also, it didn’t take long before I found out that you cannot upload a file to a folder that doesn’t exist. Luckily, the API can also be used to create such folders.

(defn create-folder
  [access-token endpoint folder-name]
  (http/post (str endpoint "/_api/web/Folders/add('" (percent-encode folder-name) "')" )
             {:version :http1.1 ; SharePoint API gets confused about empty body when using HTTP v. 2
              :headers {"Authorization" (str "Bearer " access-token)}}))

Notice: :version :http1.1 is required to work around a bug (presumably in the SharePoint API). The combination of the Java HTTP client used by Babashka and the HTTP/2 protocol causes SharePoint to complain about a missing content-type header (for an empty body).
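
For completeness, creating a destination folder and then uploading into it could look like the sketch below; the folder path is hypothetical and depends on your site's document library layout:

;; Create the folder first (it must not conflict with an existing file).
(create-folder access-token endpoint "Shared Documents/reports")

;; Then upload into it, passing the mime-type explicitly.
(upload-file access-token endpoint
             {:input-stream (io/file "report.txt")
              :name "report.txt"
              :destination "Shared Documents/reports"}
             {:mime-type "text/plain"})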

I have put the Babashka script for uploading files to SharePoint in a GitHub Gist, which includes a nice helper function that takes a local folder and uploads all its contents to SharePoint (including nested folders and files).

The script only takes up 111 lines of code, which are all simple and easy to read (maybe with the exception of upload-info-keep-relative-path).

Damn, Clojure ROCKS 🚀

Permalink

Clojure Deref (Apr 5, 2024)

Welcome to the Clojure Deref! This is a weekly link/news roundup for the Clojure ecosystem (feed: RSS). Thanks to Anton Fonarev for link aggregation.

Blogs, articles, and projects

Libraries and Tools

New releases and tools this week:

  • clay 2-beta4 - A tiny Clojure tool for dynamic workflow of data visualization and literate programming

  • portal 0.54.2 - A clojure tool to navigate through your data

  • sane-math 0.1.0 - Clojure/Script library for infix (normal) math expressions

  • pact 1.0.6 - clojure.spec to json-schema generation library

  • Tutkain 0.20.1 (alpha) - A Sublime Text package for interactive Clojure development

  • beep-boop - Audible and visual feedback for test runs

  • neil 0.3.65 - A CLI to add common aliases and features to deps.edn-based projects

  • fdb 1.0.0 - Reactive database environment for your files

  • clojure-repl-intellij 1.0.0 - Free OpenSource IntelliJ plugin for Clojure REPL development

  • kc-repl 2.0 - An interactive, command line tool for exploring Kafka clusters

  • tg-clj 0.2.2 - A telegram bot api wrapper inspired by aws-api

  • tg-clj-server 0.3.0 - A more framework-y library for use with tg-clj inspired by ring web-servers

  • omniconf 0.5.2 - Configuration library for Clojure that favors explicitness

  • clay.el 1.3 - Emacs bindings for the Clojure Clay tool

  • flow-storm-debugger 3.14.0 - A debugger for Clojure and ClojureScript with some unique features

  • clojure-lsp 2024.03.31-19.10.13 - Clojure & ClojureScript Language Server (LSP) implementation

  • calva 2.0.435 - Clojure & ClojureScript Interactive Programming for VS Code

  • sitefox 0.0.19 - Node + cljs backend web framework

  • squint 0.7.104 - Light-weight ClojureScript dialect

  • datalevin 0.9.4 - A simple, fast and versatile Datalog database

  • semantic-router-clj 0.1.0-alpha.1 - Decision making layer for LLMs in Clojure

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishamapayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.