Writing a Simple Clojure Library

I’ve been learning/using Clojure on and off for about 2 years. The lispy syntax isn’t a deterrent for me at all, in fact, I’m quite fond of it and consider it very elegant. However, it does take some time to get used to. I don’t use Clojure or anything remotely close in my day job, but I love to find something useful to implement using Clojure. In the past few days I found such niche.

Terminal colours

Ever wonder how some console applications can output coloured text? Most terminals and terminal emulators (think iTerm or konsole) support colour through a system of escape sequences.

An Escape Sequence starts with the ASCII code for the escape character (0x1b or 33). There a list of control characters you can specify after the escape character that controls the colour and style of the text after it. For example, sending “[31mfoo” to a terminal means “output red coloured text from now on”. Everything the terminal output will be coloured red, until the terminal reads another escape sequence, including “[ESC]0m”, which tells the terminal to reset the styles to the default.

So I had this idea to implement this in Clojure as a library so that console application authors can use it to output stylized text from their application using idiomatic Clojure, and here comes Lumiere.

Implementing it in Clojure

Design the interface

What’s our end goal here? We would like to output colour sequence wrapped text with Clojure function calls such as

(red "foo") ; output foo in red colour
(bg-green "bar") ;"bar" with green background colour
(bold (magenta "nyan cat")) ; bold and magenta
(-> "nyan cat" red bg-white bold) ; use Clojure's "thread macro" to combine these functions, resulting in red foreground, white background and bold "nyan cat"

Toolchain

I wrote a blog post a year ago about how I liked Cake the Clojure build system alternative to the defacto Leiningen. The recent news on this is that Cake and Leiningen are merging. This time, I decided to use Leiningen from the start. Even though Leiningen hasn’t ported some of the Cake goodies, but I’m hoping they will get ported soon.

First working version:

With lein new lumiere, Leiningen generates the default project layout. By default, Leiningen generates src/lumiere/core.clj and test/lumiere/core.clj. Because Lumiere is such a small script, we don’t need the ‘lumiere.core namespace, rather I’d like the functions to be in the ‘lumiere namespace. The easiest way is to delete src/lumiere/ folder and create lumiere.clj under src. Same goes for the test file.

Following the spirit of TDD, off I went to write my first test:

(ns lumiere.test
  (:use [lumiere])
  (:use [clojure.test]))

(deftest test-only-foreground
  (is (= "\033[30msome black text\033[0m" (black "some black test"))))

This tests that the correct sequences of characters are generated by the call to (black "..."). 33 is the ASCII code for .

To make this test pass, I added this in src/lumiere.clj,

(ns lumiere)
(defn black [text]
  (format "\033[0m" 30 text)) ; 30 is the code for black foreground.

This passes the tests, but obviously there’s room for abstraction:
* the red foreground code is 31, green 32, etc…
* the black background code is 40, red 41, green 42, etc…

So here we go:

(defn- colour ; defn- makes this function private to the current namespace.
  ( (format "\033[%dm" (+ (if is-bg? 40 30) code)))
  ( (colour code false)))

(defn black [text] (colour 0))
(defn red [text] (colour 1))
(defn bg-black [text] (colour 0 true))
(defn bg-red [text] (colour 1 true))

Clojure supports "default" arguments through method overloading. Here we adjust the offset based on whether or not that colour is a foreground or background.

Second take: use macro to define declaratively create colour functions

One advantage of lispy syntax is the convergence of programming with meta-programming. If you read the core Clojure library code, you'll find that Clojure defines a few special forms and most control structures are written in macros. In my case, however, I'm using macros to define colour/style functions in a declarative way. Some may argue it doesn't justify using macro for this purpose, but I just want to practice writing macros, and it does make the public interface of the library a bit prettier.

First off, again, we need to define what the end result should look like. I would still like the functions to remain the same, e.g., red, bg-red, black, bg-black, etc. However, defining these functions takes a lot of boilerplate code. I'd like to simply call (defcolour BLACK 0) or the like to generate the `black` and `bg-black` functions for me.

Disclaimer: I'm a novice macro writer. Advanced readers, please hold your nose and tell me what I did wrong or any improvements could be made 🙂

(def RESET "\033[0m")
(defmacro defcolour [colour-func-name bg-colour-func-name colour-code]
  `(do
    (defn ~colour-func-name [text#]
      (format "%s%s%s" (colour ~colour-code) text# RESET))
    (defn ~bg-colour-func-name [text#]
      (format "%s%s%s" (colour ~bg-colour-code true) text# RESET))))

(defcolour black bg-black 0)
(defcolour red bg-red 1)
(defcolour green bg-green 2)
; etc...

A few special reader macros you need to know about when writing a macro:
- tick (`) indicates the following code should be quoted and treated as a template.
- tilda (~) indicates that the symbol should not be quoted (unquote), and should be replaced with the value in the current context.
- hash (#) indicates that the macro system should generate a unique name for this symbol so it doesn't conflict. Otherwise, it will be expanded to its fully qualified name.

Run tests and because we didn't change our public interface, everything all tests should still pass.

Take 3: combining styles

Alright, now that we have a fully functional style system, we can cascade the styles, e.g., (red (bg-green "foo")). It work as expected when trying it in a REPL, but the character sequence it generates is "33[31m33[42mfoo33[0m33[0m" and surely it isn't optimal. If we want to add styles such as bold, it's going to get even worse.

So, we need some abstraction here. When you call (red "some text") it shouldn't generate the character sequence right away. Instead, the caller should decide when the sequence should be generated. We need some data structure to represent a "luminated" text. In Clojure we can define a "record".

(defrecord Lumiere [text opts])

"opts" is a map with keys :fg, :bg, :styles. We also want to override the toString() method so when the user calls (str lumiered-text), he will get the character sequence ready to be printed to the console. The downside of this is that we modified the interface, so we need to go back and change the tests so they call (str (red "foo")):

  (is (= "\033[30msome black text\033[0m" (str (black "some black test")))))

To override toString, we need to extend our Lumiere type to conform to the IObject protocol:

(defn- ansi-escape-seq [& codes]
  (format "\033[%sm" (join ";" (filter #(not= % nil) codes))))

(defrecord Lumiere [text fg bg styles]
  Object
  (toString [this]
    (let [prefix (ansi-escape-seq (:fg this) (:bg this) (:styles this))]
      (format "%s%s%s" prefix (:text this) RESET))))

Next we need to let the colour/style functions return Lumiere object, rather than plain character sequence. There are two situations we need to adapt:
1. When we first start decorating a text, text input is going to be a plain string. In this case, we need to create a Lumiere object with the text and options.
2. When the return of a colour/style function is chained into another colour/style function, we need to modify the options of the Lumiere object.

(defn- adapt-lum [text option value]
  (let [local-option-map (merge {:fg nil :bg nil :styles nil} {option value})]
    (cond
      (instance? String text) (Lumiere. text (:fg local-option-map) (:bg local-option-map) (:styles local-option-map))
      (instance? Lumiere text) (assoc text option value)
      :else (throw (java.lang.IllegalArgumentException.)))))

and we need to modify the macros so "adapt-lum" helper is used:

(defmacro defcolour [colour-func-name bg-colour-func-name ^Integer colour-code]
  `(do
     (defn ~colour-func-name [text#]
       (adapt-lum text# :fg ~colour-name))
     (defn ~bg-colour-func-name [text#]
       (adapt-lum text# :bg ~bg-colour-name))))

Publish to Clojar.org

Now that the library is in a relatively stable state. I'd like to publish this snapshot version to a repository. Clojars.org is the most popular clojure library repository. Register on clojars.org, add let them know your public key. Then do lein pom && lein deploy, voila!

Advertisements

Write sudoku solver in Clojure

…yeah, because the world just needs another Sudoku solver. Well, I’m not trying to solve world hunger with it, but just an attempt to practice Clojure, I took (read: stole) Peter Norvig’s sudoku solver algorithm (written in Python) and adapted it into Clojure. I put it up on Github under sudoku-clj. The algorithm itself isn’t *that* hard to understand. The porting to a lisp-y syntax made the code a little longer than its Python counterpart. I’m sure seasoned Lisp/Clojure users can point out dozens of places where more idiomatic/succinct syntax can be used (If you happen to be one, do tell, by the way).

Here’s a few things I noticed:

  • Mutable states in clojure are captured using `ref`s. The object itself (in this case, the grid, which is a hash map) doesn’t mutate, but the reference is changed to point to different grid objects that represent a configuration at a given step.
  • Clojure sequences are Lazy. A few times I tried to print out the current state (remaining digits) of the square, but if you simply do (println seq), you will get a Java-ish toString() output of the sequence object. You need to force the lazy sequence to be evaluated by (println (apply str seq)). Needless to say, you lose the advantage of lazy sequences, so use it sparingly.
  • Python’s list comprehension syntax is fabulous. Clojure’s counterpart for comprehension doesn’t feel as elegent, nor is map a function onto a sequence to achieve that (the way I used it)
  • Cake is yummy!
  • The performance isn’t great…I must have done something wrong, but the easy sudoku grid took about 2 seconds (with the JVM already booted), while the Python algorithm solves it in a fraction of a second.
  • Because assign/eliminate are mutually recursive, my current implementation uses the naive way of doing recursion, i.e., let the stack grow. Clojure has a function `trampoline`, which adds a level of indirection that applies to mutually recursive functions. It uses `recur` at tail end position (basically translates the recursive calls into loops) which doesn’t fill your process’s stack. It might not be obvious (to me anyways) how one can do that with a few levels of function calls in between assign/eliminate, but I’m sure there’s a way

Cake – the yummy Clojure build system

About 10 minutes ago I heard about cake clojure build system, and gave it a try. And 10 minutes later, it won me over! Wow, it addresses all the pain points of leiningen

BLAZINGLY FAST
Sorry for using all CAPS but I’m very excited about this improvement over leiningen — OK, it may not be the fault of leiningen that JVM cold startup time is non-trivial but hey, someone came up with an idea of having a long running JVM process in the background, so subsequent clojure tasks reuse the same JVM instance. Cake folks integrated that nicely. It takes about 10-15 seconds to boot up a JVM but subsequent cake tasks or execution of clojure code is virtually instant! Comparing to leiningen, which doesn’t take this approach and every single task (such as common ones like lein test) takes around 5 seconds. This adds up quickly and makes you less efficient. The speed improvement alone is enough for me to switch to cake.

Advanced REPL functionalities: tab completion, history
It just works. Very useful for having instant feedbacks while exploring the language and API. No more manually adding jLine to your classpath or hack around tab completion wrapper…It just works! (I know I said it already)

run clojure files directly
OK, leiningen can do this too, but through plugin. I feel this is a very handy functionality, which probably should be included in the core.

autotest
Detects your code change and automatically run your test suites! Sweet.

compatible with leiningen project definition files
Cake understand project.clj, so I don’t need to do anything for my existing leiningen projects. Change directory to the project and `cake` away 😀

Overall, it just works out of the box. No more mucking around with dev-dependencies and other chores and let you focus on what you’d love to do.

Setting up a Clojure Project with Maven

In this blog post I’m going to record my recent experience in setting up a Clojure project using the clojure-maven-plugin.

Clojure-Maven-Plugin

First you need to compile the plugin from source:
git clone git://github.com/talios/clojure-maven-plugin.git
cd clojure-maven-plugin
mvn install
Of course, you will need to have Maven2 installed already.

After that, the compiled plugin jar will be in your maven local repository. Create a pom.xml file to use the plugin. I’m using the pom.xml from my project Programming Collective Intelligence as an example:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>pci</groupId>
    <artifactId>pci</artifactId>
    <name>Programming Collective Intelligence</name>
    <version>0.0.1-SNAPSHOT</version>

    <build>
        <plugins>
            <plugin>
                <groupId>com.theoryinpractise</groupId>
                <artifactId>clojure-maven-plugin</artifactId>
                <version>1.2-SNAPSHOT</version>
            </plugin>
        </plugins>
    </build>

</project>

Also, setting up clojure-lang and clojure-contrib (optional, but nice to have) as dependencies as follows:

    <dependencies>
        <dependency>
            <groupId>org.clojure</groupId>
            <artifactId>clojure-lang</artifactId>
            <version>1.1.0-alpha-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>org.clojure</groupId>
            <artifactId>clojure-contrib</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>
    </dependencies>

Clojure builds haven’t reached Maven central repository yet, so you need to build Clojure from source, and install it into your local repository by using
mvn install:install-file -Dfile=clojure-lang.jar -DgroupId=org.clojure -DartifactId=clojure-lang -Dversion=1.1.0-alpha-SNAPSHOT -Dpackaging=jar

Do the same for clojure-contrib.

You can also create a directory in under your project root and have pom.xml looking for artifacts in that directory as well. I found this to be useful for artifacts that are not in the central repository. To do so, add repository definition in pom.xml:

    <repositories>
        <repository>
            <id>lib-repo</id>
            <name>lib-m2-repository</name>
            <url>file://${basedir}/lib-repo</url>
            <layout>legacy</layout>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
            <releases>
                <checksumPolicy>ignore</checksumPolicy>
            </releases>
        </repository>
    </repositories>

Afterwards, you need to setup your directory structure as follows:

${project_root}
-- lib-repo
    -- org.clojure
        -- jars
            * clojure-lang-1.1.0-alpha-SNAPSHOT.jar
            * clojure-contrib-1.1.0-alpha-SNAPSHOT.jar
        -- poms
            * clojure-lang-1.1.0-alpha-SNAPSHOT.pom
            * clojure-contrib-1.1.0-alpha-SNAPSHOT.pom

The pom files can be found in your local repository after you’ve done mvn install:install-file.

If you want to run clojure:repl goal, you’d better add jline to your dependency as well:

<dependencies>
  ...
        <dependency>
            <groupId>jline</groupId>
            <artifactId>jline</artifactId>
            <version>0.9.94</version>
        </dependency>

  ...
</dependencies>

By convention, the plugin compiles everything under ${project_root}/src/main/clojure and ${project_root}/src/test/clojure.

It’s hard to imagine working in a language like Clojure without automated unit tests. Fortunately, clojure-maven-plugin has the clojure:test goal which runs unit tests. All you need to do is telling the plugin where’s the entry point of your unit test. Add the following configuration in the build section:

    <build>
        <plugins>
            <plugin>
                <groupId>com.theoryinpractise</groupId>
                <artifactId>clojure-maven-plugin</artifactId>
                <version>1.2-SNAPSHOT</version>
                <configuration>
                    <testScript>src/test/clojure/pci/test.clj</testScript>
                </configuration>
            </plugin>
        </plugins>
    </build>

The test script looks like the following:

(ns my-namespace
  (:use clojure.contrib.test-is
          test-module-1
          test-module-2))
(run-tests 'test-module-n)

There you have it! The sample files can be found in my pci-clj project on GitHub.

Programming Collective Intelligence in Clojure

They say the best way to learn a new programming language is by programming in it. Therefore I’m starting this project converting algorithms in the book Programming Collective Intelligence into Clojure, while learning the best practices and language idioms during the process.

I’ve created a GitHub project for this. I’m not sure how far I’m able to go but let’s see.

Explore Clojure: Building a Bifid cipher

Lately I’ve been teaching myself Clojure, a Lisp dialect on the JVM platform. I still love Erlang and still learning it, but Clojure has a special draw for me being a JVM language and its Lisp roots. I studied Scheme (another Lisp dialect) in my college years and deemed it purely academic. However, Clojure has the potential of changing this and bring the expressiveness of Lisp and the power of functional programming to the Java world.

The best way to learn a language is to use it to solve some non-trivial practical problems. I stumbled this problem on Programming Praxis. It’s about building a Bifid cipher, which I thought could be a good practical problem for me to solve in Clojure.

Polybius Square

Bifid cipher is based on Polybius Square, which looks like this:

1 2 3 4 5
1 B G W K Z
2 Q P N D S
3 I O A X E
4 F C L U M
5 T H Y V R

Every encodable letter is given a vector value, which is the coordinate in this square. For example, X=[3 4], H=[5 2]. The first task would be building the Polybius square in Clojure.

My function polybius-square is going to take 2 parameters: charset is a list of encodable characters, and size is the size of the Polybius square.

(defn polybius-square
  "Polybius square by the given charset and size"
  [charset square-size]
  (map #(vector %1 [%2 %3]) charset
    (for [x (range 1 (+ 1 square-size)), y (range 1 (+ 1 square-size))] x)
    (take (* square-size square-size) (cycle (range 1 (+ 1 square-size))))))

Let’s see how this works. Suppose you’re calling this function with
(polybius-square "ABCDE" 3), charset is bound to “ABCDE” and size is bound to 3. What I want is a seq of vectors in the form of: (\Letter [x y]) where x and y are the coordinates. It’s natural to use map, with the function-to-map taking three parameters and vectorize them. That’s what the anonymous function does:
#(vector %1 [%2 %3])

%1 is the letter, %2 and %3 are the coordinates. I want the coordinate sequence to be like this: [1 1], [1 2], [1 3], [2 1], [2 2], [2 3]. In imperative language, you’d use a nested for loop, but we’ve established that’s ugly. We want the parameters fed to map to be of the following:
"ABCDEFGHJI" (1 1 1 2 2 2 3 3 3) (1 2 3 1 2 3 1 2 3 1 2 3)

To generate the first list, we use list comprehension:
(for [ x (range 1 (+ 1 square-size)), y (range 1 (+ 1 square-size))] x)
It reads: for every x in range 1 to square-size+1 (because “range” is right-exclusive) and for every y in the same range, output the value of x and take the whole output as the list. Because the binding of y comes later, and Lisp evaluates from right to left, the result of this list comprehension is (1 1 1 2 2 2 3 3 3)

Now let’s look at the other list: (1 2 3 1 2 3 1 2 3) – a cyclic list. Fortunately, Clojure has a built-in cycle function, which returns a lazy sequence of cycles of a collection. We want to call it with cycle '(1 2 3), so it’s going to be cycle (range 1 (+ 1 square-size)). Because cycle will be evaluated to an infinite sequence, we have to tell it when to stop.
When you see the square as a flat list, it’s clear that the cycle should stop when it reaches square-size^2, hence (take (* square-size square-size) (cycle (range 1 (+ 1 square-size)))).

The codec maps

After the Polybius square is built, we have the bases for our encryption. However, it’s not user-friendly: it’s a flat list of tuples. We want two maps for fast searching from letters to their encoding forms (letter-2-code) and from transposed vectors to the letters (transposed-2-letter).

(defn- letter-2-code
  "Produce a mapping between letters and their codes"
  [square]
  (letfn [(helper [the-square the-map]
          (if (empty? the-square)
            the-map
            (let [the-item (first the-square)
                  the-key (first the-item)
                  the-number (second the-item)]
              (recur (rest the-square) (assoc the-map the-key the-number)))))]
    (helper square (empty hash-map))))

The strategy here is to walk through the list, add each item to a hash-map. Here we define an inner function, which is basically a recursive helper. The function itself is pretty straightforward. Take note that in Clojure, when you need to do tail recursion, always use recur instead of calling the function by its name. This is because JVM cannot do automatic tail cut optimization (TCO), but Clojure does the optimization through the recur keyword. After the binding, we kick off the recursion by calling it with the initial list.

In the similar fashion, we have:

(defn- transposed-2-letter
  "Produce a mapping between the transposed codes to letter"
  [square]
  (letfn [(helper [the-square the-map]
            (if (empty? the-square)
              the-map
              (let [the-item (first the-square)
                    the-key (second the-item)
                    the-letter (first the-item)]
                (recur (rest the-square) (assoc the-map the-key the-letter)))))]
    (helper square (empty hash-map))))

Encode

Now comes the encode function. It’s important to note that the encoding will be different if a different polybius square is provided. Therefore, both encode and decode functions take the Polybius square as parameter.

During the encoding phase, we first convert every character in the message into a vector, then reorganize the vectors. To better illustrate, we use an example:
Suppose we have the following mapping:
{I [3 3], H [3 2], G [3 1], F [2 3], E [2 2], D [2 1], C [1 3], B [1 2], A [1 1]}
and we want to encode the word “CEDE”
CEDE => ([1 3] [2 2] [2 1] [2 2]), then we want to transpose the “matrix” composed of the above vectors, so it would be like:
([1 2] [2 2] [3 2] [1 2]), and finally looking up every item in the transposed-2-letter mapping
{[3 3] I, [3 2] H, [3 1] G, [2 3] F, [2 2] E, [2 1] D, [1 3] C, [1 2] B, [1 1] A}
=>BEHB

To get a mapping of message to vectors, we again map a function to the list:
(map #(get l2c % %) message) (l2c has been bound)
Then we apply the interleave function to the result, this gives:
(1 2 2 2 3 2 1 2). This is good but we want a list of vectors instead of a flat list. Clojure provides partition function, which does exactly it:
(partition 2 '(1 2 2 2 3 2 1 2))
gives:
([1 2] [2 2] [3 2] [1 2])

Then, we want to lookup every item in the result in the transposed-to-letter map and get a list of mapped characters. To achieve this, again, we use map, providing a different mapping:
(map #(get tr2l % %) ...) (tr2l is bound)
This returns a list of characters. We want a string, so we concatenate the characters by applying str function:
(apply str (map #(get tr2l % %) ...))

Here’s the complete code:

(defn encode
  "Return the message encoded by the specified polybius-square"
  [message polybius-square]
  (let [l2c (letter-2-code polybius-square)
        tr2l (transposed-2-letter polybius-square)]
    (apply str
      (map #(get tr2l % %) (partition 2 (apply interleave (map #(get l2c % %) message)))))))

Here you see the expressiveness of Clojure: all operations are chained together and formed in expression.

The decode function behaves similarly. The only trick is we need to split the translated code list in two. We use partition method again, with size being half of the entire list.

(defn decode
  "Return the decoded message using the specified polybius-square"
  [encoded polybius-square]
  (let [l2c (letter-2-code polybius-square)
        c2l (transposed-2-letter polybius-square)
        codes (apply concat (map #(get l2c % %) encoded))
        len (count codes)
        parts (partition (/ len 2) codes)]
    (apply str (map #(get c2l %1 %1) (map #(vector %1 %2) (first parts) (second parts))))))

Putting it together

Now with all the pieces are in place, we can put it together by writing a simple test:

(defn -main
  "A simple test case entry point"
  []
  (let [square (polybius-square "ABCDEFGHIKLMNOPQRSTUVWXYZ" 5)]
    (do (println (encode "BIFIDCIPHER" square))
      (println (decode (encode "BIFIDCIPHER" square) square)))))

Output:

BGAHFRQTOXW
BIFIDCIPHER

Have the complete source code here at http://github.com/kevinjqiu/bifid-clj, I’m sure there are a lot to be improved of the above implementation of the Bifid cipher. Especially currently I have no elegant way to solve the situation where the letter to be encoded is not in the Polybius square. Also, I’m sure my code isn’t entirely the idiomatic Clojure. If you have any suggestions, don’t hesitate to write a comment.

Although I have been learning Clojure for 3 days, I’m already enjoying its expressiveness and the clean look. As a Java programmer, Clojure also gives me the benefit of full access to the Java land. The potential is endless!