Writing a Simple Clojure Library

I’ve been learning/using Clojure on and off for about 2 years. The lispy syntax isn’t a deterrent for me at all, in fact, I’m quite fond of it and consider it very elegant. However, it does take some time to get used to. I don’t use Clojure or anything remotely close in my day job, but I love to find something useful to implement using Clojure. In the past few days I found such niche.

Terminal colours

Ever wonder how some console applications can output coloured text? Most terminals and terminal emulators (think iTerm or konsole) support colour through a system of escape sequences.

An Escape Sequence starts with the ASCII code for the escape character (0x1b or 33). There a list of control characters you can specify after the escape character that controls the colour and style of the text after it. For example, sending “[31mfoo” to a terminal means “output red coloured text from now on”. Everything the terminal output will be coloured red, until the terminal reads another escape sequence, including “[ESC]0m”, which tells the terminal to reset the styles to the default.

So I had this idea to implement this in Clojure as a library so that console application authors can use it to output stylized text from their application using idiomatic Clojure, and here comes Lumiere.

Implementing it in Clojure

Design the interface

What’s our end goal here? We would like to output colour sequence wrapped text with Clojure function calls such as

(red "foo") ; output foo in red colour
(bg-green "bar") ;"bar" with green background colour
(bold (magenta "nyan cat")) ; bold and magenta
(-> "nyan cat" red bg-white bold) ; use Clojure's "thread macro" to combine these functions, resulting in red foreground, white background and bold "nyan cat"

Toolchain

I wrote a blog post a year ago about how I liked Cake the Clojure build system alternative to the defacto Leiningen. The recent news on this is that Cake and Leiningen are merging. This time, I decided to use Leiningen from the start. Even though Leiningen hasn’t ported some of the Cake goodies, but I’m hoping they will get ported soon.

First working version:

With lein new lumiere, Leiningen generates the default project layout. By default, Leiningen generates src/lumiere/core.clj and test/lumiere/core.clj. Because Lumiere is such a small script, we don’t need the ‘lumiere.core namespace, rather I’d like the functions to be in the ‘lumiere namespace. The easiest way is to delete src/lumiere/ folder and create lumiere.clj under src. Same goes for the test file.

Following the spirit of TDD, off I went to write my first test:

(ns lumiere.test
  (:use [lumiere])
  (:use [clojure.test]))

(deftest test-only-foreground
  (is (= "\033[30msome black text\033[0m" (black "some black test"))))

This tests that the correct sequences of characters are generated by the call to (black "..."). 33 is the ASCII code for .

To make this test pass, I added this in src/lumiere.clj,

(ns lumiere)
(defn black [text]
  (format "\033[0m" 30 text)) ; 30 is the code for black foreground.

This passes the tests, but obviously there’s room for abstraction:
* the red foreground code is 31, green 32, etc…
* the black background code is 40, red 41, green 42, etc…

So here we go:

(defn- colour ; defn- makes this function private to the current namespace.
  ( (format "\033[%dm" (+ (if is-bg? 40 30) code)))
  ( (colour code false)))

(defn black [text] (colour 0))
(defn red [text] (colour 1))
(defn bg-black [text] (colour 0 true))
(defn bg-red [text] (colour 1 true))

Clojure supports "default" arguments through method overloading. Here we adjust the offset based on whether or not that colour is a foreground or background.

Second take: use macro to define declaratively create colour functions

One advantage of lispy syntax is the convergence of programming with meta-programming. If you read the core Clojure library code, you'll find that Clojure defines a few special forms and most control structures are written in macros. In my case, however, I'm using macros to define colour/style functions in a declarative way. Some may argue it doesn't justify using macro for this purpose, but I just want to practice writing macros, and it does make the public interface of the library a bit prettier.

First off, again, we need to define what the end result should look like. I would still like the functions to remain the same, e.g., red, bg-red, black, bg-black, etc. However, defining these functions takes a lot of boilerplate code. I'd like to simply call (defcolour BLACK 0) or the like to generate the `black` and `bg-black` functions for me.

Disclaimer: I'm a novice macro writer. Advanced readers, please hold your nose and tell me what I did wrong or any improvements could be made 🙂

(def RESET "\033[0m")
(defmacro defcolour [colour-func-name bg-colour-func-name colour-code]
  `(do
    (defn ~colour-func-name [text#]
      (format "%s%s%s" (colour ~colour-code) text# RESET))
    (defn ~bg-colour-func-name [text#]
      (format "%s%s%s" (colour ~bg-colour-code true) text# RESET))))

(defcolour black bg-black 0)
(defcolour red bg-red 1)
(defcolour green bg-green 2)
; etc...

A few special reader macros you need to know about when writing a macro:
- tick (`) indicates the following code should be quoted and treated as a template.
- tilda (~) indicates that the symbol should not be quoted (unquote), and should be replaced with the value in the current context.
- hash (#) indicates that the macro system should generate a unique name for this symbol so it doesn't conflict. Otherwise, it will be expanded to its fully qualified name.

Run tests and because we didn't change our public interface, everything all tests should still pass.

Take 3: combining styles

Alright, now that we have a fully functional style system, we can cascade the styles, e.g., (red (bg-green "foo")). It work as expected when trying it in a REPL, but the character sequence it generates is "33[31m33[42mfoo33[0m33[0m" and surely it isn't optimal. If we want to add styles such as bold, it's going to get even worse.

So, we need some abstraction here. When you call (red "some text") it shouldn't generate the character sequence right away. Instead, the caller should decide when the sequence should be generated. We need some data structure to represent a "luminated" text. In Clojure we can define a "record".

(defrecord Lumiere [text opts])

"opts" is a map with keys :fg, :bg, :styles. We also want to override the toString() method so when the user calls (str lumiered-text), he will get the character sequence ready to be printed to the console. The downside of this is that we modified the interface, so we need to go back and change the tests so they call (str (red "foo")):

  (is (= "\033[30msome black text\033[0m" (str (black "some black test")))))

To override toString, we need to extend our Lumiere type to conform to the IObject protocol:

(defn- ansi-escape-seq [& codes]
  (format "\033[%sm" (join ";" (filter #(not= % nil) codes))))

(defrecord Lumiere [text fg bg styles]
  Object
  (toString [this]
    (let [prefix (ansi-escape-seq (:fg this) (:bg this) (:styles this))]
      (format "%s%s%s" prefix (:text this) RESET))))

Next we need to let the colour/style functions return Lumiere object, rather than plain character sequence. There are two situations we need to adapt:
1. When we first start decorating a text, text input is going to be a plain string. In this case, we need to create a Lumiere object with the text and options.
2. When the return of a colour/style function is chained into another colour/style function, we need to modify the options of the Lumiere object.

(defn- adapt-lum [text option value]
  (let [local-option-map (merge {:fg nil :bg nil :styles nil} {option value})]
    (cond
      (instance? String text) (Lumiere. text (:fg local-option-map) (:bg local-option-map) (:styles local-option-map))
      (instance? Lumiere text) (assoc text option value)
      :else (throw (java.lang.IllegalArgumentException.)))))

and we need to modify the macros so "adapt-lum" helper is used:

(defmacro defcolour [colour-func-name bg-colour-func-name ^Integer colour-code]
  `(do
     (defn ~colour-func-name [text#]
       (adapt-lum text# :fg ~colour-name))
     (defn ~bg-colour-func-name [text#]
       (adapt-lum text# :bg ~bg-colour-name))))

Publish to Clojar.org

Now that the library is in a relatively stable state. I'd like to publish this snapshot version to a repository. Clojars.org is the most popular clojure library repository. Register on clojars.org, add let them know your public key. Then do lein pom && lein deploy, voila!

Advertisements

Building a Google Reader plugin using Chrome extension

OK, ok, I understand. The title is a bit misleading. Google Reader isn’t open for 3rd party plugins, and there’s no indication that Google will ever. However, with Google Chrome extension, we can build such local “plugins”.

What are we going to achieve?

Anyone uses Google Reader to read DZone feeds? I do. DZone is a very good tech news aggregator and you can vote and comment on stories. With Google Reader, you get DZone feeds like the following. I don’t know about you but for me, sometimes I just want to read the original story without going to the DZone page. It’d be nice if they have a “click through” action (like the following) on the action bar that brings you to the original story.
End goal

Basic strategy

So, how are we going to implement this feature? Chrome Extension code can be injected into the running page, and have full access to its DOM. Therefore, we can write code such that if the currently opened entry is a DZone entry, we insert ‘Click Through’ action into the entry action bar (action bar is the bar underneath the main entry, where ‘Add Star’, ‘Like’, ‘Share’ actions are). The ‘Click Through’ action, when clicked, will read the feed URL, fetch it in the background, parse it and get the URL of the original story, and open the original URL in a separate tab.

Create a manifest

A Chrome extension must have a manifest.json file containing the metadata of the extension.

{
    "name":"GReader",
    "version":"1.0",
    "description":"Enhanced Google Reader experience",
    "permissions": [
        "http://*.dzone.com/"
    ],
    "background_page":"background.html",
    "content_scripts":[{
        "matches":[
            "*://www.google.com/reader/view/*"
        ],
        "js":[
            "lib/jquery-1.6.4.min.js",
            "src/greader.js"
        ],
        "run_at":"document_idle"
    }]
}

Here we

  • specify that the extension needs to access any URL on dzone.com including its subdomains.
  • specify the background page
  • specify the content script

The “run_at” property will dictate when the content script is going to be run. Because Google Reader is a full AJAX application, we want our script to be run when the document is fully rendered.
We also specify the “matches” property, so our content script is only activated when the URL matches.

The content script

We start with:

(function($) {

})(jQuery);

This creates a function scope, which separates our $ variable apart from the current page’s $ variable. Google Reader (I assume, is using Google’s own closure library), already defines $ and it’s not the jQuery object. This idiom gives $ as jQuery.

We want to insert the “Click through” action in the entry action bar. To achieve this, we will need to listen on “DOMNodeInserted” event, and when such event happens and the node inserted is of the right CSS class name (“entry-action” here), we proceed to manipulate the DOM to add our customized actions.

    $("#entries").live('DOMNodeInserted', function(e) {
        if (!e.target.className.match(/entry\-actions/))
            return;

        var entryAction = new EntryAction($(e.target));
        if (entryAction.entry.url.match(/^http\:\/\/feeds\.dzone\.com/)) {
            entryAction.addAction({
                'name':'Click Through',
                'fn':function(entry) {
                    chrome.extension.sendRequest({"type":"fetch_entry", "url":entry.url}, function(response) {
                        var matched = /<div class="ldTitle">(.*?)<\/div>/.exec(response.data);
                        var href = ($(matched[1]).attr("href"));
                        if (href !== null) {
                            chrome.extension.sendRequest({"type":"open_tab", "url":href}, function(response) {
                                // TODO: do something afterwards?
                            });
                        }
                    });
                }
            });
        }
    });

Here I built a little bit of abstraction around the raw entry action bar. It’s encapsulated in EntryAction class, which I’ll show in a moment. Basically, if the current displaying entry’s feed URL starts with feed.dzone.com, I’ll build the “click through” action, and set the click handler. It sends the feed URL to the background script. The background script will do the cross-site request to fetch the feed content and send it back. Then the content script will regex match the content to get the original story’s URL, and ask chrome to open the URL in a new tab.

Here’s the code for EntryAction:

    var EntryAction = function(element) {
        this.element = element;
        var entryElmt = this.element.parent(".entry");
        var url = $(entryElmt).find(".entry-title-link").attr('href');
        this.entry = {
            "url" : url
        };
    };

    EntryAction.prototype.addAction = function(action) {
        var that = this;
        var onclick = function(e) {
            var actionFunc = action['fn'];
            actionFunc(that.entry);
        }

        this.element.append($("<span>")
            .addClass("link unselectable")
            .text(action['name'])
            .click(onclick));
    };

I won’t delve too much into this code. It makes assumptions about the structure of the DOM that Google Reader renders into. This does make the extension brittle but that’s the reality we have to deal with for client-side scripting. Luckily, Google Reader markup doesn’t change very often. For people new to object-oriented Javascript, this is one way to create a “class” (prototype) and put “instance” methods on a “class”.

Background.html

Unlike content scripts, which is injected and runs in the target page, the background page runs in its own process (the extension’s process) and keeps running while the extension is active. It’s comparable to the “server” side of the extension. For our extension’s purpose, we’re using the background script to make requests to 3rd party web sites (DZone).

<html>
<script type="text/javascript" src="lib/jquery-1.6.4.min.js"></script>
<script>
    (function($) {
        chrome.extension.onRequest.addListener(
            function(request, sender, sendResponse) {
                if (request.type === 'fetch_entry') {
                    $.get(request.url, function(data) {
                        sendResponse({"data":data});
                    });
                } else if (request.type === 'open_tab') {
                    chrome.tabs.create({'url':request.url});
                    sendResponse({"status":"ok"});
                }
            }
        );
    })(jQuery);
</script>
</html>

We register handlers for events that our “client” (the content script) is able to raise. Here we deal with 2 kinds of events: fetch_entry and open_tab.

  • Although Chrome 13 allows cross-site requests from content scripts, I’m actually quite fond of this pattern of delegating requests to the background page.
  • chrome.tabs isn’t accessible in the content script. That’s why open_tab is an event the client (the content script) can raise and delegate chrome specific API calls to the background script.

After Thoughts

That’s it! That’s my first Chrome extension. It’s not earth shattering or anything but I learned quite a lot. I like Chrome extension development — it’s straightforward and simple. The architecture is quite simple yet powerful. The code is on Github and I plan to expand it to a framework for customizing Google Reader experience. Here are a few things we can do with the extension:

  • Link “Share” action to twitter/Google+
  • Click on “Like” action to automatically vote up on DZone (or any other news aggregator)
  • Share with comment on Google Reader puts the comment on the entry on DZone (or any other news aggregator)
  • endless opportunities…

Spectrum.vim – My first Vim plugin

Over the past few months, I’ve been using Vim as my primary development tool at work and at home, and I have to say, I’m addicted to it! I’m thinking about writing a blog post of why I get hooked on “walking without crutches”, but for this post, I’m just going to introduce you to my first plugin in Vim – Spectrum.

Introduction

Spectrum is a vim colorscheme roulette. Ever getting tired of staring at the same colorscheme every day? Having hundreds of colorschemes in your repo but too lazy to deterministically pick one? Spectrum helps you by randomly pick colorschemes from your vim runtime path or from the web. From there, you have the chance of voting up a colorscheme so Spectrum will have a higher probability to pick it or voting down a colorscheme so you wouldn’t see it again.

Development

To be honest, I’m not a fan of Vim script – the language is not very expressive and doesn’t have a lot of object oriented features. Semi-fortunately, since Vim 7, they have added support for Python, Ruby and Perl scripts. I said ‘semi-fortunately’ because the support isn’t too comprehensive. For most core vim features, you still have to resort to calling Vim commands to achieve them, but at least I don’t have to use Vim script for the most part.

Spectrum is written in Python, and use the “vim“ module to interact with the hosting Vim instance. There is a bit of bootstrapping to do if you want to separate most of the Python code out of the entry point vim script (see https://github.com/kevinjqiu/spectrum.vim/blob/master/plugin/spectrum.vim). Many vim plugins written in Python require you to install the python code into your Python runtime before you can use them, but for a simple module like Spectrum, I opted for monkey patching syspath to include modules in the plugin folder.

Anyhow, give it a try and hope you like it.

https://github.com/kevinjqiu/spectrum.vim

Write sudoku solver in Clojure

…yeah, because the world just needs another Sudoku solver. Well, I’m not trying to solve world hunger with it, but just an attempt to practice Clojure, I took (read: stole) Peter Norvig’s sudoku solver algorithm (written in Python) and adapted it into Clojure. I put it up on Github under sudoku-clj. The algorithm itself isn’t *that* hard to understand. The porting to a lisp-y syntax made the code a little longer than its Python counterpart. I’m sure seasoned Lisp/Clojure users can point out dozens of places where more idiomatic/succinct syntax can be used (If you happen to be one, do tell, by the way).

Here’s a few things I noticed:

  • Mutable states in clojure are captured using `ref`s. The object itself (in this case, the grid, which is a hash map) doesn’t mutate, but the reference is changed to point to different grid objects that represent a configuration at a given step.
  • Clojure sequences are Lazy. A few times I tried to print out the current state (remaining digits) of the square, but if you simply do (println seq), you will get a Java-ish toString() output of the sequence object. You need to force the lazy sequence to be evaluated by (println (apply str seq)). Needless to say, you lose the advantage of lazy sequences, so use it sparingly.
  • Python’s list comprehension syntax is fabulous. Clojure’s counterpart for comprehension doesn’t feel as elegent, nor is map a function onto a sequence to achieve that (the way I used it)
  • Cake is yummy!
  • The performance isn’t great…I must have done something wrong, but the easy sudoku grid took about 2 seconds (with the JVM already booted), while the Python algorithm solves it in a fraction of a second.
  • Because assign/eliminate are mutually recursive, my current implementation uses the naive way of doing recursion, i.e., let the stack grow. Clojure has a function `trampoline`, which adds a level of indirection that applies to mutually recursive functions. It uses `recur` at tail end position (basically translates the recursive calls into loops) which doesn’t fill your process’s stack. It might not be obvious (to me anyways) how one can do that with a few levels of function calls in between assign/eliminate, but I’m sure there’s a way

New Year’s Resolution

2011 here we come! In the spirit of continual learning, I’m going to write down the technology I’d love to learn this year.

  • Haskell
  • Now that I’m more interested in functional languages, I’d love to look into this “pure” functional language that inspired countless other ones of its kind.

  • Lift
  • Last year I scratched the surface of Scala, a hybrid JVM language. I’m very fond of it, and think it has tremendous potential. Twitter and FourSquare are already using Scala, so it has been put to the test of some pretty high-profile usages. Lift is the most popular web framework built on top of Scala. It claims to have the rapid application development benefits from Rails and the benefits from statically typed language.

  • More advanced features of Clojure
  • In the past two years, I explored Clojure and on off. I love the Lisp idea of homoiconicity which unifies programming and meta-programming. That said, I haven’t been using macro in Clojure too much, and there are other cool ideas of Clojure I haven’t been able to explore deeply, such as protocols and software transactional memory (STM)

  • Ruby and Rails 3
  • For the past few years, I’ve been refraining myself from learning Ruby, because I’m already quite adept at Python, and I feel I should be learning languages that are different from what I already know. However, the more I heard about Ruby and its ideas, the more interesting it appears to me. On top of that, Rails 3 has come out, and it appears to have improved significantly. Moreover, there are a lot of advancement in the Ruby VM world such as JRuby, which means Ruby applications don’t have to run on the old and dreadful (hear-say) MRI. I wouldn’t mind taking Ruby and Rails out for a spin this year.

  • Android
  • Last year I got an HTC Legend Android phone, with the intention of developing for Android at some point of time. It didn’t work out that way, though, but Android continues to be a very interesting and fast growing platform. Mobile *is* the future, and I’d like to poke into the Android world this year, primarily because it’s open source. iOS is equally interesting technically, but I don’t own a Mac and I don’t like the idea of paying Apple $99 for SDK even though I don’t intend to publish on AppStore.

I think that’s enough for a year…or is it? There’s a few more technologies I wish to learn or keep up with:

  • GWT and Google App Engine
  • 2 years of professional GWT development made me a firm fan of this Google technology. I got out of GWT for different reasons, but I love the engineering effort they put into GWT. It may not take over the world but it’s definitely a solid player in the front-end web development arena. Especially now they integrated with the Spring framework and made deploying to App Engine easy, it may pick up more traction this year. I think the Java language is both the pros and cons of GWT. I’d love to see an alternative language (Scala) being implemented for GWT, but it may not happen any time soon.

  • A “NoSQL” database
  • Let’s face it, “NoSQL” is a terrible name, but it grabs people’s attention. I flirted with CouchDB briefly last year, and would love to continue this journey this year. Also, MongoDB seems interesting too.

  • Node.js
  • It’s the least I think I’d learn this year. It’s hot in the geekdom right now, and it has its value, for example, having both the server and client side written in Javascript eliminates the need to implement the validation logic in two different languages. However, I’m just not a big fan of Javascript. I think it’s a language that’s by a chain of serendipitous events became the world’s most widely used language. It carries a huge historical burden, and although it has cool features, some other modern languages have them too and do better. Regardless, given the stardom status of Node.js, it deserves some looking into 🙂

Use Python decorator to curry functions

It’s been a while since the last time I wrote about Python. This morning, I was listening to a podcast on my way to work. They were discussing functional programming and dynamic languages…I learned Python before I went into Computer Science, and then I learned about functional programming and through learning of Scala and Clojure, my functional programming concepts have been enriched. As I was listening, it suddenly appeared to me that there isn’t a way in Python to curry a function. Not that it’s critical to everyday development, but wouldn’t it be neat if I can curry a function in Python?

Then the hosts of the podcast discussed how dynamic languages are so flexible that you can pretty much do anything to it. “You can take a function as parameter, return a function from a function, and so on.” Hey, isn’t that what Python’s decorator can do? I learned decorators before, but I haven’t used it beyond the scope of creating properties and certainly haven’t written any decorators. I thought this would be a good exercise for learning decorators.

Here’s a simple example of what function currying: suppose you have a method

def add(x,y):
  return x+y

Then calling add(1,2) should be the same as add(1)(2). add(1) is what they call a partially applied function. It’s a function that takes one parameter.

Our goal here is to write a decorator “curried” that takes a function with n parameters and transform it in a way that can be applied n times and get the final result.

We’ll start with unit tests first:

import unittest

class CurryTest(unittest.TestCase):

	def test_with_no_args(self):
		@curried
		def do_nothing():
			return ""
		self.assertEquals("", do_nothing())

	def test_with_int_args(self):
		@curried
		def add_int(x,y):
			return x+y
		self.assertEquals(3, add_int(1)(2))
	def test_with_str_args(self):
		@curried
		def add_str(x,y):
			return "%s%s"%(x,y)
		self.assertEquals("ab", add_str("a")("b"))

So we make sure that a currying on a function takes no parameter is valid but should be a pass through, and also the “curried” decorator can be applied to any function with arguments (excluding positional arguments and keyword arguments)

A decorator is simply a function that takes a function as parameter:

def curried(fn):
  pass

and @curried is simply a syntactic sugar for:

def fn(...): ...
fn=curried(fn)

So, now we can write “curried” decorator.
To make the test for function with no argument pass, in curried() function, we can test to see if fn has arguments. Python’s standard library provides inspect.getargspec method:

def curried(fn):
  argspec = inspect.getargspec(fn)
  if len(argspec.args)==0:
    return fn
  else:
    # later

Now the first test passes.

For the other two cases, here’s the strategy. In Python, when a class defines __call__ method, the instance of that class is said to be “callable”. For instance:

class A(object):
  def __call__(self, arg):
    return arg

f=A()
f("echo")  # this gives you "echo"

This is very similar to Scala’s apply() function. Now that we have this in our inventory, we can define a `PartialFunction` class, take all the required parameters of the original function, and allow them to be applied one at a time. So the __call__ method of PartialFunc will look like this:

def __call__(self, value):
  # Xxx

If all the required parameters are passed in, PartialFunc should evaluate the original function with the complete argument list. Otherwise, PartialFunc stores the parameter in an instance variable, and returns itself.

Here’s the complete code:

class PartialFunc(object):
	def __init__(self, fn, argspec):
		self.fn = fn
		self.argspec = argspec 
		self.args = []

	def __call__(self, value):
		self.args.append(value) 
		if len(self.args) == len(self.argspec.args):
			arglist = ",".join(["self.args[%d]"%i for i in range(0, len(self.args))])
			return eval("self.fn(" + arglist + ")")
		else:
			return self

and the curried decorator:

def curried(fn):
	argspec = inspect.getargspec(fn)
	if len(argspec.args) == 0:
		return fn
	else:
		return PartialFunc(fn, argspec)

It’s pretty straightforward. When the parameters are complete, I construct a python statement that calls the original function with the complete argument list, and then pass the statement into an eval statement. I know evals are evil, but I can’t find a way in Python to dynamically change the signature of the original method and make it accept a variable length argument (varargs).

So this is it. It’s quite simple. Python methods can have varargs and keyword args, the situation gets a little more complicated. The thing is, both varargs and keyword args are not mandatory, so it’s hard for the curried function to know whether the argument list has been completed…Also, if you take default values into account, it could get even more complicated.

My impression on Scala so far

I’ve been exploring Scala on and off for some time now. Here’s my highly subjective and very limited impression of Scala.

What I like about Scala:

0) It’s statically-typed language.

That’s right! I don’t care what you ninjas say. As much as I love dynamic languages, I just prefer statically typed language for big projects. The benefit of having type information is enormous for a project with a large code base. I know you have to write unit tests for it anyways, but when 80% of your unit test code is checking types, it’s just counter-productive.

1) The ability to combine functional programming style with OOP.

This is what Scala is known for – a hybrid language that boasts the better in both OOP world and FP world. I’m not a functional programming purist, but I think Scala did a good job blending the functional programming elements into OOP. Writing Scala programs, I find myself much more empowered to be able to choose from both styles where I see fit. The ability to pass functions around and apply high order functions can significantly reduce boilerplate code and visual clutter.

2) The ability to create your own control flow (well, sort of).

I had a blog post about how to use function curry to create a customized control flow. This is very empowering and yet, it’s not an one-off syntactic sugar that’s sprinkled randomly into the language like some other language would do. Customization of control flow is a result of combining generic language features (such as currying, by-name parameter, implicit conversion, etc).

3) Object instead of static

Scala does away with the Java static keyword. Instead, it provides “object” keyword to define a class that only has one instance. It has the same power as singleton design pattern, without the extra boilerplate code. (if you think implementing singleton in Java is simple, think again. Especially in the multi-threaded context). Also, you can use the same construct to define “companion” objects, which can be used to implement the factory pattern, and put in miscellaneous methods, implicit conversion methods and so on. Extremely powerful yet elegant.

4) Operator overloading (well, sort of).

OK, it’s not exactly operator overloading – Scala allows many non-alpha-numeric characters in the method name, so operators are essentially methods. duh! if you think about it…why should + be treated any different from “add()”? Being able to use special characters in method names makes code easy to understand. However, I’m a bit disappointed that question mark (?) cannot be used in method names…One of the things I liked about LISP is you can define a function (odd? (x) (…)) and readers immediately know this function returns a boolean. I don’t have to ponder whether I should name it isOdd() or hasChildren(). Anyway…

5) Traits

In most cases, multiple inheritance is manageable. Seems Java threw the baby out with the bath water – rejecting multiple inheritance completely. It’s ironic that the AOP guys cracks open Java classes in the byte code level to do the “mixins”…With Scala, this seems to be natural.

6) Built-in parser-combinator library

It’s easy to do some non-mission critical parsing using the Scala standard library. The library itself is implemented as an internal DSL such that writing parser rules feels like writing EBNF directly.

Here’s what’s not so hot for me so far:

1) The generics still is obtrusive sometimes.

That said, however, it’s vastly superior than Java’s generics system with far better type inferencing. But sometimes, you still have to write down a lot of types especially on a parameterized method. What’s worse is because Scala is more strict than Java wrt types, you cannot ignore generic types the way you can in Java code. Moreover, because Scala runs on JVM, and because of type-erasure, you still can’t write code such as val t = new T()…although we can’t blame this on Scala.

2) The collection library.

For each collection data structure, Scala has the mutable implementation, immutable implementation, and the native Java Collection Library. If you’re writing code that calls Java library, very often you have to wrap JCL lists/maps/sets into the Scala ones, because you want to use the nice functional features Scala provides. I’m not smart enough to know a better solution to this, I’m just merely pointing out a sore spot. Clojure has a very clever and elegant way of integrating JCL data structures into Clojure code, but since Clojure and Scala are vastly different, I don’t know how relevant this is.

3) Start up time

One of Scala’s goal is to be able to both scale up and scale down. Scala programs can be run as scripts by the interpreter, but it can’t compete with Python or Ruby in terms of development turn-around time in terms of scripting. Again, this is because of the dreadful JVM startup process.

4) Lack of good tooling

The best Scala development environment I found so far is the Scala plugin for IntelliJ. It’s still not great for debugging, but for coding and integration with Maven and such, it’s great. The Eclipse plugin lags significantly, despite having built a brand new website and seemingly having more resource pour into it…I know I shouldn’t be overly critical of a community effort, but it’s just frustrating that it can’t even do auto import of classes…

Overall, I think Scala really has the potential to become a big name in the programming language landscape. It can do everything Java can and does better. It has so much features that Java doesn’t dare to add to keep backward compatibility or keep the language dead simple or politics or whatever reason…

Scala is not a hype, but in order to be adopted more rapidly, I think it needs to

1) Stablize the language and core library.

2) Tooling, tooling, tooling! Seriously.

Use function currying to reduce repetition and make code clean

Lately, I’ve been writing a parser using the Scala parser-combinator framework to parse some saves from a game. As a responsible programmer (:P), I write unit tests for each rule. However, I found myself having to write the following code over and over again:

@Test
def testRule1() {
  parserRule.apply(new CharSequence("someInput")) match {
    case Success(result, _) => {
      assertEquals("expected", result)
      /* other asserts if the result is a collection of something else */
      }
    case NoSuccess(msg, _) => fail(msg)
  }
}

It’s worth noting that you cannot pass in a string value to a Parser. A parser expects a CharSequence CharSequenceReader object. Luckily, with Scala, you can define implicit conversion to turn a string into a CharSequence CharSequenceReader:

implicit def str2charSeqReader(v:String) = new CharSequenceReader(v)

Another goody Scala offers is that obj.apply(arg) method can be invoked by obj(arg), so the above code becomes:

/* ... */
parserRule("someInput") match {
/* ... */
}

It’s much cleaner.

However, the matching part is not clean still. It’s not that bad for a single test case, but it’s hardly a good practice to copy/paste it all over the test code base. We need to refactor this. The first thing comes to mind is to extract the matching part into a separate method, and let the caller provide the assertion part. Because Scala is a functional language, we can declare the method takes a function as a parameter, instead of using the delegate pattern, creating another class to encapsulate the method to be called, as you would do in Java:

def tryMatch[T](pr:ParseResult[T], f:(T)=>Unit) {
  pr match {
    case Success(r, _) => f(r)
    case NoSuccess(msg, _) => fail(msg)
  }
}

Now the consumer (the test code) looks like this:

@Test
def test1() {
  tryMatch(rule1("input"), (result)=>{assertEquals("expected",result)})
}

It’s better, except that when you have multiple assert* methods need to be called, it quickly becomes ugly:

@Test
def test2() {
  tryMatch(listRule("1,2,3,4"), (result)=>{assertEquals(1,result(0));assertEquals(2,result(1));assertEquals(3,result(2));assertEquals(4,result(3))})
}

To do better, we can use function currying. I read about function currying before, but not until now did it dawn on me that I can use this technique to make my code look cleaner. Basically, currying means that a function with n parameters is the same as the function being applied one parameter at a time in succession. e.g., function add(x1:Int,x2:Int,…,xn:Int) adds all its parameters. add(1,2)=[add(1)](2), where add(1) becomes a partially applied function that takes an integer and returns an integer. Then the partial function is applied to the second parameter which gives the eventual result.

This is all very theoretical, but the practical implication is that we can transform tryMatch() method shown above from a method with 2 parameters into a curried function. Why? Because with Scala, when you have a method that only takes a single parameter (as is the case with the partially applied function), you can replace parentheses with curly brackets! Let’s see how it works:

def tryMatch[T](pr:ParseResult[T])(f:(T)=>Unit) {
  pr match {
    case Success(r, _) => f(r)
    case NoSuccess(msg, _) => fail(msg)
  }
}

Basically, we break the two parameter definitions. This creates a PartialFunction in Scala. No other code needs to be changed at all. Now we can call tryMatch like this:

@Test
def testList() {
  tryMatch(listRule("1 2 3")) {
    result => {
      assertEquals(1, result(0))
      assertEquals(2, result(1))
      assertEquals(3, result(2))
    }
  }
}

This is like creating your own control flow (like try-catch-finally). Ain’t that neat?

Finding Happy Numbers using Scala

The problem was posted on Programming Praxis. The algorithm itself is pretty straightforward, anyone can do it with a few if/else/fors, but to coerce myself to think functionally, I decide to practice writing it in Scala.

A number is a happy number if the sum of square of its digits eventually arrive at 1. For example, 7=>72=49=>42+92=97=>92+72=>130=12+32+02=10=>12+02=1, so 7 is a happy number. 17 is not a happy number because by applying the above process, it goes into a loop.

Step1: calculating the sum of squares of a number

To get a list of numbers from a given number, we can first convert the number into a string, and then map every character of the string to its corresponding integer value. A more mathematical way is to divide the number continually by 10 until the original number becomes 0, adding the remainder to a list each time…The first method is easier to visualize, so here it goes:
n.toString.toCharArray.map{digit=>Integer.valueOf(""+digit)}

It’s quite wordy but I’ve yet to find a better way. I’m pretty sure there is so if you know, do inform me.
Now I need to square each item in the list. So modify the above statement:
n.toString.toCharArray.map{digit=>Math.pow(Integer.valueOf(""+digit).doubleValue,2.0)}
One thing bothers me is that I have to explicitly call someInt.doubleValue…I thought Scala does the implicit conversion for me? Then I realized Integer.valueOf(…) gives me java.lang.Integer, not scala.lang.Int. So I have to write a implicit conversion function myself:

implicit def integer2double(i:Integer):Double = i.doubleValue

Now I can get rid of the hideous someInt.doubleValue.
Since we’re doing implicit conversion already, why not just implicitly convert a numeric character to a double so that it can be accepted by Math.pow?

implicit def char2double(ch:Char):Double = Integer.valueOf("" + ch).doubleValue

Now the code can be shortened to

n.toString.toCharArray.map { digit => Math.pow(digit, 2) }

Isn’t that sweet? Implicit conversion is cool but it’s easy to get carried away and do everything implicit, which makes the code hard to maintain, so there gotta be a balance somewhere. In the scope of this small exercise, I guess it’s OK to use it.

Now that we have the squares of individual digits in a list, we can calculate the sum by reducing or folding it:

((n.toString.toCharArray.map { digit => Math.pow(digit, 2) }).foldLeft(0.0) { _ + _ }).toInt

{_ + _} seems a lot like line noise. The underscore converts the statement into a closure. It’s a shortcut for (a,b)=>a+b. It’s succinct yet should be used sparingly.

Step2:return

lol, yeah, no extra fluff is needed. We can capture the constraints in one return statement, but before that, I need to decide on what states I need to carry on from each recursive step. (You know I’m going to use recursion, don’t you? :P)

There are a couple of states involved:
1) the limit. It’s the number of steps we allow before the final “1” is reached. It took number seven 5 steps to reach 1. This variable is the cut-off: if the given number can’t reach 1 before the limit, it’s considered unhappy (return false)
2) the current number of steps
3) the set of numbers already appeared. In the case of seven, the set is {49 97 130 10}. This set is used to determine if the calculations fall into a loop. If the current calculated number appears in this set, the original number is unhappy.

So here’s our method:

def isHappyNumber(n:Int, limit:Int, numOfTries:Int, alreadySeen:Set[Int]):Boolean = {
val sos = ((n.toString.toCharArray.map { digit => Math.pow(digit, 2) }).foldLeft(0.0) { _ + _ }).toInt
return sos == 1 ? true : (!alreadySeen.contains(sos) && numOfTries+1 <= limit && isHappyNumber(sos, limit, numOfTries+1, alreadySeen+sos))
}

The second line basically says: when sos (sum of squares) is 1, return true, otherwise, is the number already in the set of numbers seen during the calculation?, if not, does the number of calculation exceed the limit? if not, repeat the calculation, with sos being the “original” number, increase the counter and put sos in the alreadySeen set.

Note that this is a tail recursive function, we can add @tailrec annotation to the method to let the compiler optimize it – turn the recursion into a loop so that it won’t grow in stack.

Now that we’ve had the body of the function, we can write an overload method that provides initial values:

def isHappyNumber(n:Int):Boolean = isHappyNumber(n, 10, 0, Set[Int]())

To find all happy numbers between 1 and 100:

println (1 to 100 filter { isHappyNumber(_) })

Google IO 2010 is here!

This year’s Google IO conference is finally here! I don’t want to sound like a fanboy but I benefited a lot from last year’s IO sessions and looking at this year’s session list, I’m sure I won’t be disappointed.

This year’s keynote speech is an overview of Google’s strategy to bring web to the next level. It seems HTML5 is rapidly gaining grounds. Although the specification is not finalized yet, many of the new features are already implemented in most modern browsers, such as AppCache, WebSQL, LocalStorage, Web Worker, and WebSocket. GWT is still in a very strategic place in Google’s plan, and this year, they partnered with Spring to make it dead easy to setup a GWT-Spring application from ground up. I tried it myself shortly after I saw the keynote and it brought a lot of the agile element into Java development – scaffolding, ORM, test driven, fast reloading time in development and project lifecycle management with Maven. It all looked very good. I certainly will look more into this development setup.