Clojure Ads.txt Crawler Reporting Site

Continuing with the Ads.txt crawler has lead to the idea to store the crawler results in a database and have them available from a web site. This post introduces the first pass as such a site with the source code available in the following repository:

Ads.txt Review

As a quick review the Ads.txt standard is one where publishers can host a simple text file with the names of authorized ad networks that have permission to sell the publisher's inventory. There is a reference Python crawler for such files and I've built a crawler in Clojure as an alternative. See this link for a series of posts about the Ads.txt specification and the development of the crawler. The crawler project is here.

Crawler Library

The crawler previously developed has a command line interface but can be easily used as a library from another Clojure application. To facilitate this I've pushed a version of the crawler to Clojars.org. It's listing is:

Luminus Framework

For a batteries included framework to build a Clojure web application I highly recommend Luminus. Start with the tutorial and get that working first. Then I'd suggest repeating the excercise with the database you are ultimately going to us. For my example I built the Guestbook with a local Postgres instance. Then, I knew I was going to deploy to Heroku so moved the app to Heroku as another excercise.

Heroku

If you aren't familiar with Heroku it is another recommendation. You can once you are familiar with their overall system host test and development versions without any cost. This includes backing them with Postgres.

Here is the short list of commands to setup a site such as the Ads.txt site. The details are for you to research.

$ heroku apps:create ads-txt
$ heroku addons:create heroku-postgresql:hobby-dev
$ git push heroku master
$ heroku run lein run migrate

Bulk Loading

I added the ability to add domains to the system manually but to get things started it was important to allow bulk loading using the crawler library. Here are some sample commands with references to lists of domains that may be useful.

$ heroku run lein run -t https://gist.githubusercontent.com/bradlucas/f5060dfc602cfeedb140a23bb4d77403/raw/f47ae3cfe938b5035c4396d1069a1a5c2e8324c2/top-100-programmatic-domains.txt --app ads-txt
$ heroku run lein run -t https://gist.githubusercontent.com/bradlucas/5b80fae610b9a5547ff10d1c8a706d35/raw/3a6262da02d17908517cbbb5499404aa58f856a6/the-moz-top-500.txt  --app ads-txt
$ heroku run lein run -t https://gist.githubusercontent.com/bradlucas/df0a5dc3e7d4a92eaecf2ac28bc0f17a/raw/d53ae31b3aaba88019ef2dff644a6bfeb8c9a088/adstxt_domains_2017-09-11.txt --app ads-txt

The Site

The above described site lets you request domains to be crawled for their Ads.txt file records and have the results stored. The site is currently running on Heroku at https://ads-txt.herokuapp.com/.

https://ads-txt.herokuapp.com/

Permalink

Съездил в Балтимор

На прошлой неделе съездил в Балтимор на конференцию по Кложе. Впервые посетил США. Гулял по набережной, видел Рича Хикки – словом, путешествие удалось. Ниже – случайные заметки обо всем, что осталось в голове, плюс немного любительских фото с завеленным горизонтом.

Визу в США я успел получить за неделю до событий, когда повыгоняли американских дипломатов, а штаты в ответ усложнили выдачу виз. Насколько я знаю, визу и сейчас можно получить, просто это займет больше времени. Документы лучше готовить в местном визовом центре, стоит примерно 10 тыс. рублей.

В Москве при входе в посольство случился казус: внутрь не пускают с сумками и рюкзаками. А у меня ноут, документы, все дела. Ячеек нет, куда идти неясно. Рядом тусуются мутные типы, принимают вещи на хранение. Пришлось бегать по магазинам в поисках ячеек. Это полный пиздец, конечно. Вопиющее неуважение к людям.

На собеседовании уделяют особое внимание людям пожилого возраста. Никто не хочет, чтобы приехал очередной нахлебник. Одного дедулю, собравшегося к сыну, взяли в оборот и мурыжили где-то час. Пока стоял в очереди, услышал его биографию от пятого колена, но так и не узнал, дали визу или нет.

Тех, кто едет по бизнесу, просят шпрехнуть на английском. Одна пара утверждала, что едут заключать контракты на миллионы, а сами ни бельмеса по-английски. Тут не надо быть офицером, чтобы что-то заподозрить.

Впереди стояла женщина, которая была в посольстве явно не в первый раз, всем указывала, все знала. На собеседовании сказала, что собралась замуж за американца. Это было так жалко. Брак считается очень слабой причиной для визы. Вас расспросят о самых интимных моментах знакомства, попросят личную переписку, фотки, открытки, потому что жалающих попасть в США через мнимую любовь хоть отбавляй.

Когда летишь из провинции в провинцию, твой путь лежит через столицы, хочешь этого или нет. Из Воронежа в Москву, оттуда в Нью-Йорк, затем в Балтимор. Обратно сначала на авто в Вашингтон, оттуда в Германию, в Москву и в Воронеж. Шесть самолетов. 20 часов туда, 15 обратно.

Самое тяжелое – перелетать океан. Занимает 9 часов, временной сдвиг, устаешь от самолета. Ближе к концу наступает апатия: и книжки надоели, на ноуте делать нечего, читалка внушает отвращение.

Смутно помню, что все путешествие меня преследовал голод. Почему-то не удавалось нормально поесть. Во время пересадок я только успевал менять терминалы. Опасаясь шмона в США, выкинул кошерную воронежскую булочку (везде пишут, что еда строго запрещена). Во время очередного полета мне эта булочка приснилась.

Посмотрел в самолете Сферу с Гермионой. (Только не поправляйте, я все равно не помню актеров по имени. Вы еще скажите как Гарри Поттера зовут.) Удивительно глупый фильм, я прямо удивился, как такую пургу сняли. Опоздали лет на 10-15. Героине тридцатник, а она решает проблемы подростков. “Чтобы добиться демократии, нужно заставить всех голосовать силой”. Лучше бы в Золушке снялась. Не рекомендую.

На конференции видел Рича Хикки и даже сидел рядом с ним на соседнем стуле. Это дает преимущество в споре с любым кложуристом, верно? Под конец с ним можно было даже сфотаться, но я что-то тупанул. Зато стал свидетелем этих кадров:

Кложе 10 лет, Рич режет юбилейный торт. Шок, смотреть до конца.

Балтимор считается неблагополучным городом. Половина населения черные. Кварталы с неграми опасны, там берзаботица, гетто и обособленная от общества жизнь. Где белые, там более-менее хорошо. Балтимор один из городов-лидеров по обороту наркотиков.

В центре города чисто, хорошая инфраструктура. Нет бордюров, пространство плавное. Ходят трамваи, машин мало. Светофоры и знаки расставлены правильно, переходами пользоваться удобно. Все парковки платные. Встречаются заборы, закругленные снизу. Интересно, зачем это? Чтобы ничего нельзя было прислонить? Для устойчивости?

Интересно выглядят автомобили. Много длинных грузовичков. Тяжелая техника брутальна: трубы, хром, цепи до пола.

В Америке паровое отопление, по вечерам то тут, то там из недр канализации выходит пар. Он с особым запахом, но ничего противного в этом нет. Наоборот, по мне это так романтично! Вспоминаю угрюмый Нью-Йорк из рассказов и компьютерных игр: ночь, улица, фонарь, полицейская сирена, пар из люка.

В город глубоко вдается гавань. Это прекрасное место! Не застроено всяким говном как бывает в России, нет. Там целая инфраструктура: пристань, музей, памятник, большое пространство для прогулок и отдыха.

С одной стороны гигантский книжный магазин на пять этажей. Это бывшая корабельная верфь. От пола до крыши восходят 5 огромных труб как у Титаника. Внутри балки и кирпичи, гайки размером с кулак, но все чисто и опрятно.

Это даже не магазин, а этакий центр творчества. Прямо среди сталлажей сидит черная молодежь и что-то креативят: рисуют фломастерами, разгадывают кроссворды, плетут фенечки. Кто-то валяется на полу с книжкой, никому нет дела.

Напротив магазина два в одном: океанариум и ботанический сад. Выглядит как огромный стеклянный куб, сквозь который все видно. Билет стоит 20 долларов, на входе фотографируют.

Поразил водопад на 5 этажей, очень круто. Хотя воронежский океанариум в Сити-парке все же лучше, серьезно.

В Америке много толстых людей. Это особенно заметно после России. У нас население хоть здоровьем не блещет, но, на мой взгляд, толстых меньше. Полные в основном люди пенсионного возраста. А в штатах много юных толстяков. Смотрится гротескно и неприятно. На молодом теле жир лежит искусственно, словно привязанная подушка.

Толстяки не идут, а плывут, словно мыльные пузыри. Я бы на месте министра здравоохранения схватился бы за голову. Лишний вес, да еще в столь юном возрасте – это же какая нагрузка на сердце!

Плохое питание главная причина полноты. В Америке едят черт знает что. Понятно, я не мог обойти магазины, но нигде не видел фруктов или нормальной молочной продукции. Сплошные бутерброды и пицца, все острое, соленое, с майонезом. Из напитков только кофе и газировки. Чай если есть, то почему-то холодный и сладкий.

Американцы считают чай бзиком азиатов и держат пакетики на ресепшене только для китайцев. Это не обязательно будет настоящий чай, может оказаться аптечный сбор, ромашка, какое-то сено – один хер чай. В буфете на конференции я спер несколько пакетиков Липтона, был счастлив.

Не мог не обратить внимание на то, как отличается американское произношение от британского. Я раньше в этом ничего не понимал, казалось бы, и там и тут английский. Ан нет. У американцев больше диалектов, произношений. В их восприятии британский английский слишком академичен. С удивлением заметил, что никто не понимает меня с первого раза. И лишь только когда разговоришься с человеком, налаживается диалог.

Совершенно невозможно понять негров. Когда черный служащий обращался ко мне, я понимал только по отдельным словам. Послушал разговор негров на остановке – вообще ни слова не понял, набор звуков. Мэтт, учитель английского, объяснил в чем дело.

Черные горазды на диалекты. Иной раз они столь сильно отличаются от оригинала, что им дают имя. В Балтиморе преобладает т.н. “Black English”, английский для черных. В нем начисто игнорируются некоторые основы языка, например глагол “be” не склоняется по лицам и временам: черные говорят “I be at home” (вместо “I am at home” или “I was at home”). Не считая того, что целые слоги проглатываются, ударения смещены и все в этом роде.

Даже черные учителя борятся с этим, снижая за black English оценки ученикам. Кажется, не помогает.

Словом, после того как я пожаловался Мэтту на трудности, он ответил, что это норма (малышева.jpeg). Сказал, что проблемы в восприятии британцев американцами и наоборот - старая проблема. Я теперь занят американским английским.

Балтимор портовый город. Под вечер улицы наполняются молодыми чернокожими. Они улыбаются и просят немного мелочи. Город знаменит крабами, которых ловят в океане и продают за баснословные деньги. Правильно приготовленный краб считается здесь чем-то вроде утки по-Пекински в Китае. Ходят слухи, что туристам под видом дорогих крабов впаривают обычных, на порядок дешевле.

Футболки на конференции были с логотипом Кложи и крабом. Я только потом догадался.

Американцы гротескно вежливы. Если ловишь чей-то взгляд, обязательно кивнут, улыбнутся, хау-ю-ду. Это хорошая практика.

Продираясь через бесконечные аэропорты, иной раз помогал женщинам с чемоданами. Все эти базары что европейские женщины отказываются от помощи – брехня. Ни одна не отказалась. Потому что там хоть и Европа, а переходы иной раз тоже через жопу. А вот как толпы негров идут на расслабоне мимо бабушки с чемоданом на лестнице, это для меня вопрос.

До свидания, Америка! Хоть и тяжело добираться, было круто. Еще вернусь.

Permalink

Closing all parentheses at once

While watching interesting presentation called Inspiring a future Clojure editor with forgotten Lisp UX, I've noticed author mentioned one really cool feature I was looking for some time - Interlisp's super-paren.

In short, Interlisp had this unique super-paren option, bound to ] key, that would close all opened parentheses at once.

To my knowledge, Emacs doesn't have something like this out of the box, unless you use Allegro CL mode or SLIME, but let's see how would it be hard to implement it.

First Google hit is this, which brings really nice implementation from Gilles. Copied code is down below:

(defun close-all-parentheses ()
  (interactive "*")
  (let ((closing nil))
    (save-excursion
      (while (condition-case nil
         (progn
           (backward-up-list)
           (let ((syntax (syntax-after (point))))
             (case (car syntax)
               ((4) (setq closing (cons (cdr syntax) closing)))
               ((7 8) (setq closing (cons (char-after (point)) closing)))))
           t)
           ((scan-error) nil))))
    (apply #'insert (nreverse closing))))

The function is using backward-up-list to find open parentheses for current block and get matching closing parentheses using syntax-table. Unique feature of this implementation is that target language doesn't have to have parentheses only. It works with Clojure, Java, C/C++, Javascript...

Assuming this function is bound to C-c ] in Emacs, let's see how it works out of the box with nested block in Clojure:

(defn my-func []
  {:key
    [(fun1) (fun2) (fun3 {:key2 
                           {:a1 "str"
                            :a2 (call [1 2 3 
                                            ^
                                            cursor is here

When you place the cursor where caret sign is (^) and press C-c ], everything will be nicely balanced and closed:

(defn my-func []
  {:key
    [(fun1) (fun2) (fun3 {:key2 
                           {:a1 "str"
                            :a2 (call [1 2 3])}})]})

Even better, it works correctly when open parentheses is found in comments:

;; testing ([{[[[
(defn my-func []
  {:key
    [(fun1) (fun2) (fun3 {:key2 
                           {:a1 "str"
                            :a2 (call [1 2 3])}})]})

Where it doesn't work is inside blocks where broader knowledge of surrounding structure is required. For example:

(defn my-fun []
  (let [a (call [1 2
                    ^
    (println a)))

and when you try to close the parentheses where caret is, it will yield:

(defn my-fun []
  (let [a (call [1 2])]))
    (println a)))

In short, it will close the whole block! However, to keep things fairly simple without adding complex modes like paren-mode, smartparens and etc. I'm pretty much fine with this.

Use it with other languages

Let's be adventurous and try this this facility for other modes, specifically editing C code (this should be applicable for C++, Java, Javascript...).

Nested C code example:

if (myfunc()) {
  if (v1) {
    if (v2) {
      call(); 
             ^

again, cursor is when caret is placed. Pressing C-c ] will yield a surprise:

if (myfunc()) {
  if (v1) {
    if (v2) {
      call();}}}

which is kind of too lispy for ordinary C developer. Let's fix that by adding formatting function and optional argument for calling it:

(defun close-all-parentheses (arg)
  (interactive "P")
  (let ((closing nil))
    (save-excursion
      (while (condition-case nil
         (progn
           (backward-up-list)
           (let ((syntax (syntax-after (point))))
             (case (car syntax)
               ((4) (setq closing (cons (cdr syntax) closing)))
               ((7 8) (setq closing (cons (char-after (point)) closing)))))
           t)
           ((scan-error) nil))))

    ;; changed part - call (newline-and-indent) when C-u prefix is given
    (dolist (token (nreverse closing))
      (when arg
        (newline-and-indent))
      (insert token))))

Now, when you go to above C code example and press C-u C-c ] (argument is set to true), it will insert remaining braces with correct formatting.

if (myfunc()) {
  if (v1) {
    if (v2) {
      call();
      }
    }
  }

Close, but not perfect. c-mode is using c-electric-brace to insert braces with proper formatting and something like this will make it near perfect. Here is a modified chunk of above code:

;; ...
    (dolist (token (nreverse closing))
      (if arg
          (progn
            (let ((last-command-event ?}))
              (newline)
              (c-electric-brace nil)))
        (insert token))

which will yield this:

if (myfunc()) {
  if (v1) {
    if (v2) {
        call();
    }
  }
}

Notice that calling (insert token) is not necessary: setting last-command-event to } in combination with c-electric-brace will actually insert } in buffer and place it with correct indentation.

This starts to show all complexities of c-mode that one should tackle with and I even didn't touch c++-mode with templates (implementing proper completion for <> should not be that hard).

Now, let's make close-all-parentheses implementation a bit more generic, so caller can provide own formatting function if necessary.

;; internal function which does most of the job

(defun close-all-parentheses* (indent-fn)
  (let* ((closing nil)
         ;; by default rely on (newline-and-indent)
         (local-indent-fn (lambda (token)
                            (newline-and-indent)
                            (insert token)))
         (indent-fn (if indent-fn
                      indent-fn
                      local-indent-fn)))
    (save-excursion
      (while (condition-case nil
         (progn
           (backward-up-list)
           (let ((syntax (syntax-after (point))))
             (case (car syntax)
               ((4) (setq closing (cons (cdr syntax) closing)))
               ((7 8) (setq closing (cons (char-after (point)) closing)))))
           t)
           ((scan-error) nil))))
    (dolist (token (nreverse closing))
      (if arg
        (funcall indent-fn token)
        (insert token)))))

Formatting function is expected to be in form:

(defun my-formatter (token)
  ;; do some formatting if necessary
  ;; and finally insert a token
  (insert token))

where (insert token) will do actual matched character insertion. Emacs supports number of ways to insert a character, so this is optional approach.

Here is the final implementation of close-all-parentheses:

(defun close-all-parentheses (arg)
  (interactive "P")
  (let ((my-format-fn (lambda (token)
                        ;; 125 is codepoint for '}'
                        (if (and (= token 125)
                                 ;; C, C++ and Java
                                 (member major-mode '(c-mode c++-mode java-mode)))
                            (let ((last-command-event ?}))
                              (newline)
                              (c-electric-brace nil))
                          (insert token)))))
    (close-all-parentheses* my-format-fn)))

Default formatting is using newline-and-indent, which will be enough for most cases. For specialized modes, close-all-parentheses can be a starting point. Note however that my implementation of c-mode formatting isn't perfect: mixing braces and brackets will easily confuse it so there is a bit room for improvements.

Again, if you prefer simplicity like I do, use original implementation ;)

Permalink

36. Datomic Quickstart, part 1

Datomic is a database based on the same principles that underly the design of Clojure itself. Learn what makes it different, and how to start using from Clojure immediately. This first part gives an overview of the architecture and data model, and walks you through your first transactions.

Permalink

Dealing with emoji in Clojure

Generally, I hate emoji and try to avoid them everywhere I could. Those colored faces look dull to me comparing to a good old text smile. But still, emoji might be helpful replacing icons with them. When you need a globe, a mail envelope or a flight sign, putting a proper emoji could be a fast and good enough solution.

After long Python experience, I though Java supports long unicode literals started with capital U and two bytes as follows (Python version):

>>> print len(u"\U0001F535") # prints 2

Surprisingly, it doesn’t. But I needed to put a blue circle sign that’s got U+1F535 number. So how should I turn that number into a string?

term

After googling for a while, I’ve done with a short Clojure function:

(defn unicode-to-string
  "Turns a hex unicode symbol into a string.
  Deals with such long numbers as 0x1F535 for example."
  [code]
  (-> code Character/toChars String.))

Usage example:

term

Adding it into business logic:

(let [caption "Some important feature"
      is-on? (get-feature-state)
      sign (if is-on?
             (unicode-to-string 0x1F535)  ;; blue circle
             (unicode-to-string 0x26AA))] ;; white circle
  (str sign \space caption))

Depending on whether the feature was enabled or not, the result message will have either a blue (active) or white (inactive) circle in front of it.

Permalink

Fallback on maps lookup on Clojure (or a gentle introduction to macros)

So, you have to deal with a map, but you can't be sure whether or not you have a value for a given key.

The easiest, and probably the most known is to use the get function (that accepts optionally a third argument which is the fallback value).

(def a-qb {:name "Josh"
           :surname "McCown"
           :team "New York Jets"})

(def another-qb {:name "Robert"
                 :surname "Griffin III"})

(get a-qb :team) ; => "New York Jets"
(get another-qb :team) ; => nil 
(get another-qb :team "Free Agent") ; => "Free Agent"

Another way is to use the keyword as a function or even the map as a function.

(:team a-qb) ; => "New York Jets"
(:team another-qb) ; => nil 

(a-qb :team) ; => "New York Jets"
(another-qb :team) ; => nil

I've learned that you can also specify a second argument to these cases, and if the value is not found, it will return this second argument.

(:team a-qb "Free Agent") ; => "New York Jets"
(:team another-qb "Free Agent") ; => "Free Agent" 

(a-qb :team "Free Agent") ; => "New York Jets"
(another-qb :team "Free Agent") ; => "Free Agent"

But what if you want to raise an exception if it does not exists?

(a-qb :team (throw (Exception. )))
(a-qb :team (throw (Exception. )))

If you call it this way, it will evaluate all parameters before the function is called (because it is a function), raising an exception regardless if the key-value pair exists or not on the map. A way to workaround this it is to use an or.

(or (a-qb :team) 
    (throw (Exception. ))

Because the or is implemented as a macro, it will only evaluate the second sexp (throw ...) unless the first sexp is falsy (nil or false) and thus raising an exception only on this case.

Permalink

Clojure Numerics, Part 5 - Orthogonalization and Least Squares

How to solve linear systems that have many solutions, or those that have no solutions at all? That's the theme for a thick math textbook, of course, but from the programmer's point of view, we are interested in the practical matters, so I'll stick to the main point. When I have less equations than there are unknowns, or I have too many equations, what can I do in Clojure to make those unknowns known? What functions do I call?

Before I continue, a few reminders:

  • Include Neanderthal library in your project to be able to use ready-made high-performance linear algebra functions.
  • Read articles in the introductory Clojure Linear Algebra Refresher series.
  • This is the fifth article in a more advanced series. It is a good idea to read them all in sequence if this is your first contact with this kind of software.

The namespaces we'll use:

(require '[uncomplicate.neanderthal
           [native :refer [dge dv]]
           [core :refer [mm trans view-sy view-tr axpy nrm2 mm! copy submatrix]]
           [linalg :refer [qrf org svd sv ls!]]])

Let's begin

I wrote about solving linear systems, and I think it was clear that, while there are complications along the way when the specific numbers are a bit funky, it is otherwise a quite straightforward business. Many problems are not structured so well. What about the case when we have more equations than there are unknowns? Such systems are overdetermined; there are so many conditions that there probably isn't any solution to satisfy them all. In that case, we have to choose the solution that is somehow "least bad". What is "least bad" is also its own matter. On top of that, since I am a practical programmer who prefers Clojure, I also require an elegant programming API to find the solution. And, since I accept nothing less than the best from Neanderthal, I want it to be superfast…

The Full-rank Least Squares

Let's consider the problem first. I have a system \(Ax=b\), where \(A\in{R^{m\times{n}}}\) is a data matrix holding coefficients, and \(b\in{R^m}\) is an observation vector, while \(m\geq{n}\). Such system is called overdetermined, and usually there is no exact solution to such system.

Since we can't find the exact solution, we opt for the next best, the "nearest" solution. Recall from the Clojure Linear Algebra Refresher series that in linear algebra we use norms to measure distance. Therefore, the nearest something is something with the least norm \({\lVert Ax-b \rVert}_p\). But, which norm to choose? 1-norm? ∞-norm? Different norms will give different answers. The matter is, fortunately, solved by the fact that minimization of 1-norm and ∞-norm is too complicated, so our best buddy the Euclidean norm is the winner.

Minimization of the Euclidean norm \(min {\lVert Ax-b \rVert}_2\) is called the least squares problem. It is convenient because this function is differentiable, and the 2-norm is preserved under orthogonal transformations (see Clojure Linear Algebra Refresher series for the basics, and the specific details follow here shortly).

When \(A\) is full-rank (\(ran(A)=n\)), there is a unique linear squares solution, and it can be found by solving the symmetric positive definite system \(A^TAx_{LS}=A^Tb\). You remember one of the previous articles in this series? We talked about these.

Let's try a random example in Clojure:

First, I'll construct a (randomly chosen) matrix a, and make sure its rank is 2 (here, by doing SVD):

(def a (dge 3 2 [1 2 3 -1 1 -1]))
(svd a)
#uncomplicate.neanderthal.internal.common.SVDecomposition{:sigma #RealDiagonalMatrix[double, type:gd mxn:2x2, offset:0]
   ▧                       ┓
   ↘       3.79    1.63
   ┗                       ┛
, :u nil, :vt nil, :master true}

There are 2 non-zero values there, so the rank is 2, which I wanted to check.

Next, I'll construct \(A^TA\):

(def ata (mm (trans a) a))
ata
#RealGEMatrix[double, mxn:2x2, layout:column, offset:0]
   ▥       ↓       ↓       ┓
   →      14.00   -2.00
   →      -2.00    3.00
   ┗                       ┛

I created ata as a general matrix, so we can both look at it and see that it is symmetric, as promised by math theory. Now I'll make it symmetric (in the Clojure data structure sense). Of course, in a real project, I'd skip those intermediate steps.

(def b (dge 3 1 [1 -1 -2]))
(def solution (sv (view-sy (mm (trans a) a)) (mm (trans a) b)))
solution
#RealGEMatrix[double, mxn:2x1, layout:column, offset:0]
   ▥       ↓       ┓
   →      -0.55
   →      -0.37
   ┗               ┛

This should be the LS solution, and we'll check later whether a more general method will give the same answer. Yes, there are more general methods, so what I've shown you until now is just a learning-oriented warm-up. Usually you do not need to be so creative, and can just go to the method X and say "Solve this for me!".

An interesting side note here is that solving these normal equations is connected to solving gradient equations. Which might be an interesting topic if you're interested in machine learning. I'll have to say more on that when I finish these more general topics in the Linear Algebra series.

When we find the LS solution \(x_{LS}\), we can also compute the minimum residual \(r_{LS} = b-Ax_{LS}\). It's norm will tell us how far the system we solved exactly is from the original system.

(axpy -1 (mm a solution) b)
#RealGEMatrix[double, mxn:3x1, layout:column, offset:0]
   ▥       ↓       ┓
   →       1.18
   →       0.47
   →      -0.71
   ┗               ┛

Minimum residual, the norm of the residual, seems quite high here. We'll check whether our solution is correct later.

(nrm2 (axpy -1 (mm a solution) b))
1.459992790176863

This method is quite OK for full-rank systems, but even then, it is sensitive to the presence of small singular values (recall the last article about Singular Value Decomposition).

Another method is to use normal equations, which includes Cholesky factorization, matrix multiplication and matrix-vector multiplication. I'll skip this, because there is a better and more general method.

These better methods usually rely on the QR factorization.

QR Factorization

The idea behind the QR factorization is similar to the idea behind the LU and Cholesky factorizations that we met in previous articles: destructure matrix \(A\) to a product of a few matrices that have special properties (symmetric, triangular, positive definite, etc.) and then use that destructured form in "easier" computations. In the QR, the trick is to find an orthogonal matrix \(Q\) and and upper triangular \(R\), such that \(A=QR\). Recall from the refresher series that \(Q\) is orthogonal if \(Q^{-1}=Q^T\).

There is a proven theorem (look it up in a math textbook if you're interested in the details :) that for \(A\in{R^{m\times{n}}}\), there are orthogonal \(Q\in{R^{m\times{n}}}\) and upper triangular \(R\in{R^{m\times{n}}}\) such that \(A=QR\), the QR factorization of \(A\).

Since Q is orthogonal, and orthogonal transformations preserve the rank, shape and distance, we can intuitively guess that such factorization have many properties useful in rank-sensitive operations. I'll plead with you to trust the theory and believe me that this works quite well without further convincing (if you want to know more, the textbooks are at your disposal).

Most of these techniques are based on partitioning the columns of into the form such as

\begin{equation} A=QR= [Q_1 | Q_2] \begin{bmatrix} R_1\\0\\ \end{bmatrix} \end{equation}

and then using \(Q_1 R_1\) further. What is important to me as a programmer is, mostly:

  • To keep in mind that the QR factorization is the key step for these kinds of problems
  • To know what can be computed from QR factorization, and which method does it.
  • To keep in mind that, while very powerful, QR-based algorithms are not almighty; there are cases when further options should be explored.

The complexity of various flavors varies, but boils down to \(O(m\times{n^2})\).

But how to solve the system with it?

From \(Ax=QRx=b\), we have \(Q^{T}Ax=Rx\) and

\begin{equation} Q^Tb= \begin{bmatrix} c\\d\\ \end{bmatrix} \end{equation}

so \(R_1x_{LS}=c\). Since \(c\) is readily computed by a matrix multiplication, \(R_1\) is upper triangular system, and and triangular systems are quite easy to solve, this is what we were looking for.

Here I call Neanderthal's sv method, which computes the factorization and maintains all crutial parts of it. Please note that the matrix :or and vector :tau together hold Q and R encoded. Using raw :or would produce gibberish. Other functions in Neanderthal expect exactly this raw structure and know how to work with it, so do not mess with its contents yourself.

(def qr (qrf a))
qr
#uncomplicate.neanderthal.internal.common.OrthogonalFactorization{:eng #object[uncomplicate.neanderthal.internal.host.mkl.DoubleGEEngine 0x4e780e3a "uncomplicate.neanderthal.internal.host.mkl.DoubleGEEngine@4e780e3a"], :or #RealGEMatrix[double, mxn:3x2, layout:column, offset:0]
   ▥       ↓       ↓       ┓
   →      -3.74    0.53
   →       0.42   -1.65
   →       0.63   -0.01
   ┗                       ┛
, :jpiv nil, :tau #RealBlockVector[double, n:2, offset: 0, stride:1]
[   1.27    2.00 ]
, :master true, :fresh #atom[true 0x4a43cc2b], :m 3, :n 3, :or-type :qr, :api-orm #function[uncomplicate.neanderthal.internal.api/eval6950/fn--7034/G--6891--7047], :api-org #function[uncomplicate.neanderthal.internal.api/eval6950/fn--7129/G--6923--7138]}

Did you know that even the mm! method can work with Q encoded in this form? It is important, since we need to compute \(Q^Tb\). Even transpose works with QR, although it is not a proper matrix itself:

(def cd (mm! 1 (trans qr) (copy b)))
cd
#RealGEMatrix[double, mxn:3x1, layout:column, offset:0]
   ▥       ↓       ┓
   →       1.87
   →       0.61
   →      -1.46
   ┗               ┛

\(R\) is just the upper triangular part of the qr instance that we have, so:

(let [r1 (view-tr (:or qr) {:uplo :upper})]
  (sv r1 (submatrix cd 0 0 2 1)))
#RealGEMatrix[double, mxn:2x1, layout:column, offset:0]
   ▥       ↓       ┓
   →      -0.55
   →      -0.37
   ┗               ┛

Yep, we got the same result using different method. Please note that the third component, \(d\) is equals to the one that we predicted using minimum residual \({\rho_{LS}= \lVert d \rVert}_2\) with the first method.

Least Squares in one quick sweep

Let's check with the function we could have used at the start, ls. Yes, there is a simple almighty method that makes all this fuss unnecessary. But, If I just showed you that, what would we have learned about the fine points of solving these kinds of systems and the many gotchas that await us there?

So:

(ls! (copy a) (copy b))
#RealGEMatrix[double, mxn:3x1, layout:column, offset:0]
   ▥       ↓       ┓
   →      -0.55
   →      -0.37
   →      -1.46
   ┗               ┛

Hey, all methods that we used gave the same solution! That's a good sign.

Let's see whether we can recover the original Q and try with it, just to further our understanding:

(let [q (org qr)
      r1 (view-tr (:or qr) {:uplo :upper})]
  (sv r1 (mm (trans q) b)))
#RealGEMatrix[double, mxn:2x1, layout:column, offset:0]
   ▥       ↓       ┓
   →      -0.55
   →      -0.37
   ┗               ┛

This is the same result indeed! This is good enough for me, today at least.

Please note that I copied all these instances in code, but for best performance, you'd probably use the destructive variants of those methods.

Next installments…

I'll probably take some time to write some more about problems connected to least squares, and their solutions. Or I'll skip right to eigen-problems. We'll see. Anyway, more Clojure & Neanderthal stuff follows soon.

Permalink

PurelyFunctional.tv Newsletter 247: Clojure SYNC, Guy Steele, Zach Tellman

Issue 247 – October 16, 2017

Hi Clojurers,

I recently got back from Clojure/conj 2017. I’ve included some of my favorite talks, but I haven’t seen all of them yet. However, I left the Conj very proud of the community we have. Despite the drama we sometimes see online, the people in the community really care. They are listening. And they are working to make things better.

Be sure to check out the message about Clojure SYNC below.

Please enjoy the issue.

Rock on!
Eric Normand <eric@purelyfunctional.tv>

PS Want to get this in your email? Subscribe!


Watching a Language Grow YouTube

My favorite talk was by The Joy of Clojure authors Chris Houser and Michael Fogus. They reviewed the development of Clojure, its features, and its community. Such good memories.


Rich Hickey’s Opening Keynote YouTube

Rich Hickey talks about the reasoning behind the design of Clojure–what problems it was meant to solve. Along the way, he ruffles some feathers, especially in the static typing world. I was a surprised by some of the statements he made, and I was baffled by others. I’m sure others are in the same boat. This talk is worth another listen and I hope to shed some light on his ideas.


Clojure SYNC

I talk about SYNC every week, yet I met many people at the Conj with questions like “when are tickets available” and “are there speakers lined up yet?” I asked a lot of people at the Conj for advice on this, and they all said to talk about it more. So here I am!

If Strange Loop is about the confluence of Industry and Academia, Clojure SYNC is about the connections between our skills, our work, and the history of technology, using Clojure as a lens.

  • The complete speaker lineup is here; most notable speaker: Dr. Gerald Jay Sussman.
  • Tickets are available here
  • The venue and hotels are beautiful
  • Dates: February 15 & 16, 2018, just after Mardi Gras
  • Remember: Clojure/West is merging with the Conj in 2018, so only the Conj will happen (I’m guessing in the Fall). The Clojure calendar around SYNC is wide open.

And here’s a little secret that I’ll share with you: the ticket sales are not at break-even yet. If you are even thinking about coming, please email me if there is anything to help you make up your mind.

Also, start talking to your managers about getting it in next year’s budget. Or pay for it now with this year’s leftover budget! The conference would also make a great company retreat for remote teams. Get out of the cold of February!


Homoiconicity It is What It Is YouTube

Stuart Sierra researches and presents the history of the term homoiconic and how it applies to Clojure.


It’s Time for a New Old Language YouTube

I cannot resits a Guy Steele talk. This one does not disappoint. Guy Steele examines the history of the notation used in Computer Science papers to describe the syntax and semantics of type systems and languages.


All I needed for FP I learned in High School Algebra

My own talk at the Conj went pretty well. I talk about the process of programming and of analyzing the properties of the operations in our domain. Unfortunately there was a problem in the recording equipment and my slides weren’t recorded. They were added in later, without all the nice animations and a little out of sync. Still, it’s pretty good!


Baishampayan Ghose (BG) to speak at Clojure SYNC

BG has been a big part of the Clojure community and an inspiration to me since I first started learning Clojure. As the CTO of Helpshift, he’s built a team of 80 Clojure programmers in India. He’ll be speaking about the reality of building a company on Clojure.


Elements of Clojure

Zach Tellman hosted an Unsession discussion at the Conj. It inspired me to finish reading his book, which I did on the plane home. It’s a good book, filled with useful nuggets of wisdom. But the last chapeter (the one I read on the plane), which is all about Indirection, was a gem. It starts technical but then gets wonderfully personal and vivid, calling on examples from literature and Tellman’s own life. Highly worth a read, even before it’s released.

Oh, and Zach Tellman will speak at Clojure SYNC.

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishamapayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.