Moving to blog.reminiscential.org

Recently I discovered static site generators, and I think they’re a better platform for a tech blog. From now on, my tech blog will be hosted at blog.reminiscential.org.

Use Python’s sys.settrace() for fun and for profit

The itch to scratch

Everyone in the software industry knows Kent Beck, one of the pioneers of extreme programming and test-driven development and the co-author of JUnit. One of his lesser-known projects was JUnitMax, which aimed to reduce the time developers have to wait while tests are running. One of the ideas behind it is that when code changes, only the test cases that exercise that code need to be run, instead of the entire suite. The idea made a lot of sense to me, but at the time I (and the development shop I was in) wasn’t practising enough TDD, so unit test time wasn’t a big problem for me back then.

Fast-forward a few years: as the project at my current company gets bigger, the time it takes to run tests is slowly becoming a drag on my productivity. I remembered JUnitMax and said to myself, wouldn’t it be neat if something like it were available? As the name suggests, JUnitMax is for Java, while my project is in Python. Java, being a statically-typed language, has the blessing of static analysis, which means a tool like JUnitMax can figure out which test cases cover which lines of code simply through static analysis. Python, however, being a dynamic language, doesn’t have this ability.

A few days ago, while I was running unit tests with coverage, it dawned on me: if the coverage tool knows which lines of the source code are covered by unit tests, couldn’t the same technique be used to figure out which lines are covered by which test cases?

So I started looking into coverage.py’s source code and watching its author Ned Batchelder’s excellent PyCon 2011 video on sys.settrace. I wanted to build a proof-of-concept tool that integrates with nose, the de facto Python unit-test tool, and that, when run, gathers information about which lines in the files in the source folder are covered by which test cases. Hence nostrils was born.

Here comes `sys.settrace()`

Python’s motto is “batteries included”, and it shows in many of Python’s standard library modules, such as ast (source code parsing) and dis (bytecode disassembly). One of these batteries is sys.settrace(), which makes the Python interpreter call a function of your choosing whenever a new line of code is about to be executed (among other events). You can do a lot of fun stuff with it: for example, Coverage.py uses it to build code coverage data, and pdb uses it to stop at breakpoints and change the way a Python program is executed.
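To get a feel for the API before involving nose, here is a minimal, self-contained sketch (not part of nostrils) that records every 'line' event inside one function. The names `tracer`, `lines_seen` and `demo` are made up for illustration.

```python
import sys

lines_seen = []  # (function name, line number) for every traced line


def tracer(frame, event, arg):
    # The global trace function receives a 'call' event for each new frame.
    # Returning a function installs it as that frame's local tracer, which
    # then receives the 'line' events for the frame.
    if event == "line":
        lines_seen.append((frame.f_code.co_name, frame.f_lineno))
    return tracer


def demo(x):
    y = x * 2
    return y + 1


sys.settrace(tracer)  # only frames entered from now on are traced
try:
    result = demo(3)
finally:
    sys.settrace(None)  # always uninstall the tracer

print(result)
print(lines_seen)
```

The module-level code itself is not traced, because sys.settrace() only affects frames created after it is called; that is why only the two lines of `demo` show up.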

How can it be used?

For nostrils, we need to write a nose plugin that installs the trace function when a test is encountered. The trace function records the line numbers and the current test case name. After all tests are run, we have our map.

A simple use case

To start, we need a simple use case:

# worker.py
# this is the code-under-test
def add(x, y):
    z = x + y
    return z

def subtract(x, y):
    z = x - y
    return z

# test_worker.py
# test cases

import worker

def test_add():
    assert 1 == worker.add(1, 0)

def test_add_negative():
    assert 0 == worker.add(-1, 1)

def test_subtract():
    assert 0 == worker.subtract(0, 0)

class TestFoo(object):

    def test_add(self):
        assert 5 == worker.add(5, 0)

As you can see, we have 4 tests and 2 functions under test. Our goal is that when running `nosetests --with-nostrils` (--with-nostrils is the switch that turns on the nostrils plugin), we get the following mappings:


worker.py

def add(x, y):
  z = x + y # test_add, test_add_negative, TestFoo.test_add
  return z  # test_add, test_add_negative, TestFoo.test_add

def subtract(x, y):
  z = x - y # test_subtract
  return z  # test_subtract

Nose plugin

I won’t go into the details of how to create a plugin for nose. You can read about it here, and you can take a look at my sample setup here. In a nutshell, every plugin has a name, and when nose is run with --with-plugin_name, your plugin is activated. Nose provides test lifecycle hooks that plugins can implement. For example, startTest is called right before a test case runs, addSuccess is called when a test case succeeds, and finalize is called when all tests are finished.

Here’s what my plugin looks like:

import sys

from nose.plugins import Plugin


class Nostrils(Plugin):
    name = 'nostrils'

    # Whatever the outcome of a test, put the previous tracer back.
    def addError(self, test, err, *args):
        self._restore_tracefn()

    def addFailure(self, test, err, *args):
        self._restore_tracefn()

    def addSkip(self, test, *args):
        self._restore_tracefn()

    def addSuccess(self, test):  # nose passes only the test here
        self._restore_tracefn()

    def startTest(self, test):
        # Remember which test is running, then start tracing it.
        self._current_test = test
        self._install_tracefn()

    def finalize(self, result):
        self._print()  # report the collected line -> tests map

    def _install_tracefn(self):
        self._orig_tracefn = sys.gettrace()
        sys.settrace(self._trace) # See below

    def _restore_tracefn(self):
        sys.settrace(self._orig_tracefn)

The idea is that we install the trace function when a test starts, and restore the previous trace function when the test finishes, whatever its outcome. We also keep track of the current test in self._current_test.

Trace function

Now let’s have a look at the trace function:

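class Nostrils(Plugin):
    # ...
    def _trace(self, frame, event, arg):
        if event == 'line':
            self._trace_down(frame)
        # Returning the trace function makes it the local tracer of each
        # new frame, so we keep receiving 'line' events.
        return self._trace

    def _trace_down(self, frame):
        # Code object of the test's entry point (__code__ works on bound
        # methods in Python 2.6+ and Python 3).
        entry_code = self._current_test.__call__.__code__
        while frame is not None:
            if frame.f_code == entry_code:
                break  # reached the test itself: no need to go further up

            self._collect(frame)
            frame = frame.f_back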

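A trace function takes 3 parameters:

  • frame: the current frame object
  • event: the type of event that triggered the call: 'call', 'line', 'return' or 'exception'
  • arg: an event-specific argument, e.g. the return value for a 'return' event or an (exception, value, traceback) tuple for an 'exception' event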

Here, I’m only interested in the ‘line’ event, which is triggered when a new line of code is about to be executed. When this happens, we invoke _trace_down, which walks up the frame stack by following frame.f_back; when it’s None, we’ve reached the outermost frame. Because we’re tracing the execution of tests, we can stop traversing once the frame’s code object is the entry point of the test case, i.e. the code object of the current test’s __call__ method. This way, we save ourselves some unnecessary traversals.
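The early exit can be tried outside nose. In this simplified sketch (the function names are made up), `entry` stands in for the test’s entry point, and `sys._getframe()`, a CPython-specific helper, supplies the starting frame instead of a trace callback:

```python
import sys


def walk_until(frame, stop_code):
    """Collect (function name, line) for frame and its callers, stopping
    at the first frame whose code object is stop_code."""
    collected = []
    while frame is not None:
        if frame.f_code is stop_code:
            break  # reached the entry point: stop walking
        collected.append((frame.f_code.co_name, frame.f_lineno))
        frame = frame.f_back
    return collected


def inner():
    # sys._getframe() returns the frame of inner() itself
    return walk_until(sys._getframe(), entry.__code__)


def middle():
    return inner()


def entry():  # plays the role of the test case's entry point
    return middle()


stack = entry()
print([name for name, _ in stack])
```

Only `inner` and `middle` are collected; `entry` and everything above it (the module level, or the test runner in nostrils’ case) are skipped.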

Data Collection

There are a few things we need to collect: the filename and line number of the code being executed, and the name of the test case that covers it.

from collections import defaultdict


class Nostrils(Plugin):
    def __init__(self):
        super(Nostrils, self).__init__()
        # filename -> line number -> set of test names
        self._data = defaultdict(lambda: defaultdict(set))

    def _collect(self, frame):
        filename, lineno = frame.f_code.co_filename, frame.f_lineno
        # test.address() is a (filename, module, call) tuple in nose
        self._data[filename][lineno].add("%s:%s.%s" % self._current_test.address())

The data structure we use here is a dictionary of dictionaries. At the top level, the keys are filenames and the values are dictionaries whose keys are line numbers and whose values are sets of test case names. It looks like this:

{
  'foo.py':{
      1 : set(['test_foo.py:test_foo_case1', 'test_foo.py:test_foo_case2']),
      2 : set(['test_foo.py:test_foo_case1', 'test_foo.py:test_foo_case2']),
      3 : set(['test_foo.py:test_foo_case2'])
  }
}

There we have it! We have a prototype of what could become a PyUnitMax 😉
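To see the trace function and the nested dictionary working together without nose, here is a self-contained sketch. It is a simplification, not the plugin’s actual code: test functions are called directly, `run_with_tracing` and `coverage_map` are invented names, and tests are identified by plain function names instead of nose addresses.

```python
import sys
from collections import defaultdict

# filename -> line number -> set of test names, as in the plugin
coverage_map = defaultdict(lambda: defaultdict(set))


def add(x, y):  # stands in for worker.add
    z = x + y
    return z


def subtract(x, y):  # stands in for worker.subtract
    z = x - y
    return z


def test_add():
    assert 1 == add(1, 0)


def test_subtract():
    assert 0 == subtract(0, 0)


def run_with_tracing(test):
    """Run one test function and record which lines it executes."""
    def tracer(frame, event, arg):
        if event == "line":
            # Walk from the executing frame back up to the test's own frame.
            f = frame
            while f is not None and f.f_code is not test.__code__:
                coverage_map[f.f_code.co_filename][f.f_lineno].add(test.__name__)
                f = f.f_back
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        test()
    finally:
        sys.settrace(previous)  # restore whatever tracer was there before


for t in (test_add, test_subtract):
    run_with_tracing(t)

this_file = add.__code__.co_filename
add_body = add.__code__.co_firstlineno + 1  # the "z = x + y" line
print(sorted(coverage_map[this_file][add_body]))
```

Lines of the test functions themselves are not recorded, because the walk stops as soon as it reaches the test’s frame.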

Potential Problems

  • Scale: so far I’ve only run nostrils on a trivial code base. Profiling and optimization are needed before nostrils can be used in real-world cases.
  • Multi-threading: No consideration was given to multi-threading at this stage.

Collaborators welcome!

I have since refactored the code, revised the data structure and published it on GitHub. Please send me your feedback and suggestions.

Realtime notification delivery using rabbitmq, Tornado and websocket

Our company holds “hack-off” days once in a while, where we developers get to work on whatever we like and present it to the entire company at the end of the day. I had been hearing the websocket buzz for a while and wanted to build something interesting with it.

WebSocket

WebSocket is a persistent, bi-directional connection between the browser and the server. With WebSocket, the browser can send messages to the server, but what’s more interesting is that the server can push messages to the client (browser). This breaks away from the traditional request/response model of web applications, where the client makes a request and waits for the server to answer. AJAX was revolutionary, but it’s essentially still the same model: the client asks the server whether there’s anything interesting, not the other way around. With WebSocket, the server becomes more involved and can deliver a more engaging user experience.

Our company provides a web application for online invoicing. It allows users to create clients, create invoices, send them to clients, and so on. Each of these is an “event” that gets sent to RabbitMQ. We then have a plethora of RabbitMQ consumers that read messages off the queue and do interesting stuff with them.

Proof of concept

For this hack-off, my goal was to write a RabbitMQ consumer that reads messages off the message queue and delivers them (as notifications) to the front-end using websocket.

I’d heard good things about Tornado. Having read its docs on the websocket request handler, I felt it was straightforward enough, so I chose Tornado as my backend.

Pika

One problem arises, though: the Tornado server runs as a regular server, waiting for incoming websocket connections, while the RabbitMQ consumer also needs to run in the same process and event loop, waiting for incoming messages from the message queue. I looked into a few solutions such as sparkplug and stormed-amqp, but neither seemed a good fit here. Finally, I stumbled on Pika. It comes with a Tornado event loop adapter, which allows the RabbitMQ consumer and the websocket handlers to run inside the same event loop. Perfect.

The entry point looks like this:


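# main.py
import tornado.ioloop
import tornado.web

import pika

import client    # PikaClient lives in client.py
import handlers  # MyWebSocketHandler lives in handlers.py

application = tornado.web.Application([
    (r'/ws', handlers.MyWebSocketHandler),
])

def main():
    pika.log.setup(color=True)

    io_loop = tornado.ioloop.IOLoop.instance()

    # PikaClient is our rabbitmq consumer
    pc = client.PikaClient(io_loop)
    application.pc = pc
    application.pc.connect()

    application.listen(8888)
    io_loop.start()

if __name__ == '__main__':
    main()

# handlers.py
import pika
import tornado.websocket

class MyWebSocketHandler(tornado.websocket.WebSocketHandler):

    def open(self, *args, **kwargs):
        pika.log.info("WebSocket opened")

    def on_close(self):
        pika.log.info("WebSocket closed")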

That was straightforward. However, I was now faced with the problem of how to make the AMQP consumer notify the websocket handlers when a message arrives from the message queue. Each websocket connection has its own MyWebSocketHandler instance, but those instances aren’t available from the Tornado application object. Maybe there’s a way to get at them by other means, but I’m not familiar enough with the Tornado API to know.

However, from the handler we do get the application object, and because we attached pc (our AMQP consumer) to the application, we have access to it inside our socket handler. Hey, how about registering the handler with the client when the websocket connects, and letting the client “notify” the handler when events are received? Hey, isn’t that the observer pattern?

Here’s the code:

class MyWebSocketHandler(tornado.websocket.WebSocketHandler):

    def open(self, *args, **kwargs):
        self.application.pc.add_event_listener(self)
        pika.log.info("WebSocket opened")

    def on_close(self):
        pika.log.info("WebSocket closed")
        self.application.pc.remove_event_listener(self)

Now our PikaClient object needs to support add_event_listener() and remove_event_listener() methods.

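import json

import pika
from pika.adapters.tornado_connection import TornadoConnection

# event_factory() (not shown) turns a raw message body into a
# JSON-serializable event object.

class PikaClient(object):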

    def __init__(self, io_loop):
        pika.log.info('PikaClient: __init__')
        self.io_loop = io_loop

        self.connected = False
        self.connecting = False
        self.connection = None
        self.channel = None

        self.event_listeners = set([])

    def connect(self):
        if self.connecting:
            pika.log.info('PikaClient: Already connecting to RabbitMQ')
            return

        pika.log.info('PikaClient: Connecting to RabbitMQ')
        self.connecting = True

        cred = pika.PlainCredentials('guest', 'guest')
        param = pika.ConnectionParameters(
            host='localhost',
            port=5672,
            virtual_host='/',
            credentials=cred
        )

        self.connection = TornadoConnection(param,
            on_open_callback=self.on_connected)
        self.connection.add_on_close_callback(self.on_closed)

    def on_connected(self, connection):
        pika.log.info('PikaClient: connected to RabbitMQ')
        self.connected = True
        self.connection = connection
        self.connection.channel(self.on_channel_open)

    def on_channel_open(self, channel):
        pika.log.info('PikaClient: Channel open, Declaring exchange')
        self.channel = channel
        # declare exchanges, which in turn, declare
        # queues, and bind exchange to queues

    def on_closed(self, connection):
        pika.log.info('PikaClient: rabbit connection closed')
        self.io_loop.stop()

    def on_message(self, channel, method, header, body):
        pika.log.info('PikaClient: message received: %s' % body)
        self.notify_listeners(event_factory(body))

    def notify_listeners(self, event_obj):
        # here we assume the message the sourcing app posts
        # to the message queue is in JSON format
        event_json = json.dumps(event_obj)

        for listener in self.event_listeners:
            listener.write_message(event_json)
            pika.log.info('PikaClient: notified %s' % repr(listener))

    def add_event_listener(self, listener):
        self.event_listeners.add(listener)
        pika.log.info('PikaClient: listener %s added' % repr(listener))

    def remove_event_listener(self, listener):
        try:
            self.event_listeners.remove(listener)
            pika.log.info('PikaClient: listener %s removed' % repr(listener))
        except KeyError:
            pass

I left out the queue setup code here for brevity. The on_message callback is called when the consumer gets a message from the queue. The client, in turn, notifies all registered websocket handlers by calling write_message() on each of them, so the message gets relayed to the front-end’s websocket.onmessage callback. Obviously, in a real application you’d want some kind of authentication and filtering, so that the right message gets to the right receiver.
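Stripped of Tornado and RabbitMQ, this is the plain observer pattern. The sketch below uses invented class names (`EventSource`, `FakeSocketHandler`) and made-up event payloads; only `write_message` mirrors the handler method the real code calls.

```python
class EventSource(object):
    """Subject: keeps a set of listeners and pushes each event to all of them."""

    def __init__(self):
        self.event_listeners = set()

    def add_event_listener(self, listener):
        self.event_listeners.add(listener)

    def remove_event_listener(self, listener):
        self.event_listeners.discard(listener)  # no error if already removed

    def notify_listeners(self, message):
        for listener in self.event_listeners:
            listener.write_message(message)


class FakeSocketHandler(object):
    """Stands in for a websocket handler: just remembers what it was sent."""

    def __init__(self):
        self.sent = []

    def write_message(self, message):
        self.sent.append(message)


source = EventSource()
a, b = FakeSocketHandler(), FakeSocketHandler()
source.add_event_listener(a)     # what open() does
source.add_event_listener(b)
source.notify_listeners('{"type": "invoice.sent"}')
source.remove_event_listener(b)  # what on_close() does
source.notify_listeners('{"type": "client.created"}')
```

`set.discard` plays the same role as the try/except KeyError in `remove_event_listener` above.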

Here’s some front-end code:

(function($){
    $(document).ready(function() {
        var ws = new WebSocket('ws://localhost:8888/ws');
        ws.onmessage = function(evt){
            alert(evt.data);
        }
    });
})(jQuery);

Yes, we simply show the message in an alert box. For the hack-off, I parsed the data, rendered a slightly more detailed notification message, and displayed it using jquery-toaster.

Conclusion

This is my first stab at websocket and the tornado web framework. I’m not an expert on either subject, so chances are there are better ways to achieve the same result.

I think websocket is a very interesting technology. It opens up a wide range of possibilities for more interactive and engaging web applications. Our web application has a traditional architecture: the server renders most of the page, and most requests involve a full page load. Websocket may not bring much benefit there, as the application doesn’t have that much user interaction, so my hack-off was more of a proof of concept. However, for a single-page web app (no full page reloads), the websocket model works very well.

Writing a Simple Clojure Library

I’ve been learning/using Clojure on and off for about 2 years. The lispy syntax isn’t a deterrent for me at all; in fact, I’m quite fond of it and consider it very elegant. However, it does take some time to get used to. I don’t use Clojure or anything remotely close in my day job, but I love finding something useful to implement in it. In the past few days I found such a niche.

Terminal colours

Ever wonder how some console applications can output coloured text? Most terminals and terminal emulators (think iTerm or konsole) support colour through a system of escape sequences.

An escape sequence starts with the escape character, ASCII code 27 (0x1b in hex, 033 in octal). There is a list of control codes you can send after the escape character to control the colour and style of the text that follows. For example, sending “[ESC][31mfoo” to a terminal means “output red coloured text from now on”: everything the terminal outputs will be coloured red until it reads another escape sequence, such as “[ESC][0m”, which tells the terminal to reset the styles to the default.
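The byte-level format is the same from any language. Here is a small Python illustration; the `coloured` helper is made up for this example and is not part of Lumiere.

```python
ESC = "\033"          # the escape character: octal 033, hex 0x1b, decimal 27
RESET = ESC + "[0m"   # reset colours and styles to the terminal default


def coloured(text, code):
    """Wrap text in an escape sequence, e.g. code 31 = red foreground."""
    return "%s[%dm%s%s" % (ESC, code, text, RESET)


red_foo = coloured("foo", 31)
print(red_foo)  # shows "foo" in red on a terminal that understands ANSI codes
```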

So I had this idea to implement this in Clojure as a library so that console application authors can use it to output stylized text from their application using idiomatic Clojure, and here comes Lumiere.

Implementing it in Clojure

Design the interface

What’s our end goal here? We would like to output text wrapped in colour sequences with Clojure function calls such as

(red "foo") ; output foo in red colour
(bg-green "bar") ;"bar" with green background colour
(bold (magenta "nyan cat")) ; bold and magenta
(-> "nyan cat" red bg-white bold) ; use Clojure's "thread macro" to combine these functions, resulting in red foreground, white background and bold "nyan cat"

Toolchain

I wrote a blog post a year ago about how I liked Cake, the Clojure build tool that was an alternative to the de facto Leiningen. The recent news is that Cake and Leiningen are merging, so this time I decided to use Leiningen from the start. Leiningen hasn’t ported some of the Cake goodies yet, but I’m hoping they will be ported soon.

First working version:

With lein new lumiere, Leiningen generates the default project layout. By default, Leiningen generates src/lumiere/core.clj and test/lumiere/core.clj. Because Lumiere is such a small script, we don’t need the ‘lumiere.core namespace, rather I’d like the functions to be in the ‘lumiere namespace. The easiest way is to delete src/lumiere/ folder and create lumiere.clj under src. Same goes for the test file.

Following the spirit of TDD, off I went to write my first test:

(ns lumiere.test
  (:use [lumiere])
  (:use [clojure.test]))

(deftest test-only-foreground
  (is (= "\033[30msome black text\033[0m" (black "some black text"))))

This tests that the correct sequence of characters is generated by the call to (black "..."). \033 is the octal escape for the ESC character (ASCII 27).

To make this test pass, I added this in src/lumiere.clj,

(ns lumiere)
(defn black [text]
  (format "\033[%dm%s\033[0m" 30 text)) ; 30 is the code for black foreground.

This passes the tests, but obviously there’s room for abstraction:
* the red foreground code is 31, green 32, etc…
* the black background code is 40, red 41, green 42, etc…

So here we go:

(defn- colour ; defn- makes this function private to the current namespace.
  ([code is-bg?] (format "\033[%dm" (+ (if is-bg? 40 30) code)))
  ([code] (colour code false)))

(defn black [text] (str (colour 0) text "\033[0m"))
(defn red [text] (str (colour 1) text "\033[0m"))
(defn bg-black [text] (str (colour 0 true) text "\033[0m"))
(defn bg-red [text] (str (colour 1 true) text "\033[0m"))

Clojure supports "default" arguments through multi-arity functions: the one-argument version of colour simply calls the two-argument version with false. The offset is 40 for a background colour and 30 for a foreground colour.
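The offset arithmetic is language-independent. As a rough Python equivalent of the two-arity colour function, with a keyword default in place of the second arity (the `COLOUR_INDEX` table, listing the eight standard ANSI colours, is added here for illustration):

```python
def colour(code, is_bg=False):
    """Escape sequence for colour index 0-7: foreground codes start at 30,
    background codes at 40."""
    return "\033[%dm" % ((40 if is_bg else 30) + code)


COLOUR_INDEX = {"black": 0, "red": 1, "green": 2, "yellow": 3,
                "blue": 4, "magenta": 5, "cyan": 6, "white": 7}

print(repr(colour(COLOUR_INDEX["red"])))
print(repr(colour(COLOUR_INDEX["green"], is_bg=True)))
```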

Second take: use a macro to declaratively create colour functions

One advantage of lispy syntax is the convergence of programming with meta-programming. If you read the core Clojure library code, you'll find that Clojure defines only a few special forms and most control structures are written as macros. In my case, however, I'm using macros to define colour/style functions in a declarative way. Some may argue this doesn't justify using a macro, but I just want to practise writing macros, and it does make the public interface of the library a bit prettier.

First off, again, we need to define what the end result should look like. I would still like the functions to remain the same, e.g., red, bg-red, black, bg-black, etc. However, defining these functions takes a lot of boilerplate code. I'd like to simply call (defcolour BLACK 0) or the like to generate the `black` and `bg-black` functions for me.

Disclaimer: I'm a novice macro writer. Advanced readers, please hold your nose and tell me what I did wrong or any improvements could be made 🙂

(def RESET "\033[0m")
(defmacro defcolour [colour-func-name bg-colour-func-name colour-code]
  `(do
    (defn ~colour-func-name [text#]
      (format "%s%s%s" (colour ~colour-code) text# RESET))
    (defn ~bg-colour-func-name [text#]
      (format "%s%s%s" (colour ~colour-code true) text# RESET))))

(defcolour black bg-black 0)
(defcolour red bg-red 1)
(defcolour green bg-green 2)
; etc...

A few special reader macros you need to know about when writing a macro:
- backtick (`), or syntax-quote, indicates that the following code should be quoted and treated as a template. Symbols inside it are resolved to their fully qualified names.
- tilde (~), or unquote, indicates that the symbol should not be quoted and should be replaced with its value in the current context.
- a trailing hash (#), as in text#, tells the macro system to generate a unique name for the symbol (an auto-gensym) so it doesn't conflict with other names. Otherwise, the symbol would be expanded to its fully qualified name.

Run the tests: because we didn't change our public interface, all tests should still pass.
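Python has no macros, but a similar declarative effect can be approximated at runtime by generating functions. This is only an analogy, not how Lumiere works; `defcolour` and `_make_pair` here are hypothetical Python helpers that add module-level functions through `globals()`.

```python
RESET = "\033[0m"


def _make_pair(code):
    """Build the foreground and background wrappers for one colour index."""
    def fg(text):
        return "\033[%dm%s%s" % (30 + code, text, RESET)

    def bg(text):
        return "\033[%dm%s%s" % (40 + code, text, RESET)

    return fg, bg


def defcolour(name, code):
    """Rough analogue of the defcolour macro: define `name` and `bg_name`
    as functions in this module's namespace."""
    fg, bg = _make_pair(code)
    globals()[name] = fg
    globals()["bg_" + name] = bg


defcolour("black", 0)
defcolour("red", 1)
defcolour("green", 2)
```

The difference is timing: the Clojure macro expands into ordinary defn forms when the code is compiled, whereas this Python version creates the functions while the module runs.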

Take 3: combining styles

Alright, now that we have a fully functional style system, we can cascade styles, e.g., (red (bg-green "foo")). It works as expected when trying it in a REPL, but the character sequence it generates is "\033[31m\033[42mfoo\033[0m\033[0m", which surely isn't optimal. If we add styles such as bold, it's going to get even worse.

So, we need some abstraction here. When you call (red "some text") it shouldn't generate the character sequence right away. Instead, the caller should decide when the sequence should be generated. We need some data structure to represent a "luminated" text. In Clojure we can define a "record".

(defrecord Lumiere [text fg bg styles])

The fields fg, bg and styles hold the foreground, background and style codes. We also want to override the toString() method, so that when users call (str lumiered-text), they get the character sequence ready to be printed to the console. The downside is that this changes the interface, so we need to go back and change the tests so they call (str (red "foo")):

(deftest test-only-foreground
  (is (= "\033[30msome black text\033[0m" (str (black "some black text")))))

To override toString, we implement java.lang.Object's toString method inside the record definition:

; join comes from clojure.string, e.g. (:use [clojure.string :only [join]]) in the ns form
(defn- ansi-escape-seq [& codes]
  (format "\033[%sm" (join ";" (remove nil? codes)))) ; skip unset codes

(defrecord Lumiere [text fg bg styles]
  Object
  (toString [this]
    (let [prefix (ansi-escape-seq (:fg this) (:bg this) (:styles this))]
      (format "%s%s%s" prefix (:text this) RESET))))

Next, we need the colour/style functions to return Lumiere objects rather than plain character sequences. There are two situations we need to handle:
1. When we first start decorating a text, text input is going to be a plain string. In this case, we need to create a Lumiere object with the text and options.
2. When the return of a colour/style function is chained into another colour/style function, we need to modify the options of the Lumiere object.

(defn- adapt-lum [text option value]
  (let [local-option-map (merge {:fg nil :bg nil :styles nil} {option value})]
    (cond
      (instance? String text) (Lumiere. text (:fg local-option-map) (:bg local-option-map) (:styles local-option-map))
      (instance? Lumiere text) (assoc text option value)
      :else (throw (java.lang.IllegalArgumentException.)))))

and we need to modify the macro so that the adapt-lum helper is used:

(defmacro defcolour [colour-func-name bg-colour-func-name colour-code]
  `(do
     (defn ~colour-func-name [text#]
       (adapt-lum text# :fg (+ 30 ~colour-code)))    ; store the final foreground code
     (defn ~bg-colour-func-name [text#]
       (adapt-lum text# :bg (+ 40 ~colour-code)))))  ; store the final background code

Publish to Clojars.org

Now that the library is in a relatively stable state, I'd like to publish this snapshot version to a repository. Clojars.org is the most popular Clojure library repository. Register on clojars.org and give them your public key, then run lein pom && lein deploy. Voila!

Building a Google Reader plugin using Chrome extension

OK, OK, I understand, the title is a bit misleading. Google Reader isn’t open to 3rd-party plugins, and there’s no indication that Google ever will open it. However, with a Google Chrome extension, we can build such local “plugins”.

What are we going to achieve?

Does anyone else use Google Reader to read DZone feeds? I do. DZone is a very good tech news aggregator where you can vote and comment on stories. In Google Reader, each DZone entry points to DZone rather than to the original story. I don’t know about you, but sometimes I just want to read the original story without going through the DZone page. It’d be nice to have a “Click Through” action on the action bar that brings you to the original story.
[Screenshot: end goal]

Basic strategy

So, how are we going to implement this feature? A Chrome extension’s content scripts are injected into the running page and have full access to its DOM. Therefore, we can write code such that if the currently opened entry is a DZone entry, we insert a ‘Click Through’ action into the entry action bar (the bar underneath the main entry, where the ‘Add Star’, ‘Like’ and ‘Share’ actions are). When clicked, the ‘Click Through’ action will read the feed URL, fetch it in the background, parse it to get the URL of the original story, and open that URL in a separate tab.

Create a manifest

A Chrome extension must have a manifest.json file containing the metadata of the extension.

{
    "name":"GReader",
    "version":"1.0",
    "description":"Enhanced Google Reader experience",
    "permissions": [
        "http://*.dzone.com/*"
    ],
    "background_page":"background.html",
    "content_scripts":[{
        "matches":[
            "*://www.google.com/reader/view/*"
        ],
        "js":[
            "lib/jquery-1.6.4.min.js",
            "src/greader.js"
        ],
        "run_at":"document_idle"
    }]
}

Here we

  • specify that the extension needs to access any URL on dzone.com including its subdomains.
  • specify the background page
  • specify the content script

The “run_at” property will dictate when the content script is going to be run. Because Google Reader is a full AJAX application, we want our script to be run when the document is fully rendered.
We also specify the “matches” property, so our content script is only activated when the URL matches.

The content script

We start with:

(function($) {

})(jQuery);

This creates a function scope in which $ is bound to jQuery. Google Reader (I assume it uses Google’s own Closure Library) already defines its own $, which is not the jQuery object. Content scripts run in an isolated world and shouldn’t see the page’s variables anyway, but this idiom makes it explicit that $ means jQuery in our code.

We want to insert the “Click Through” action into the entry action bar. To do this, we listen for the “DOMNodeInserted” event, and when it fires for a node with the right CSS class name (“entry-actions” here), we manipulate the DOM to add our custom actions.

    $("#entries").live('DOMNodeInserted', function(e) {
        // Only react when the entry action bar is inserted
        // (text nodes have no string className).
        var cls = e.target.className;
        if (typeof cls !== 'string' || !cls.match(/entry\-actions/))
            return;

        var entryAction = new EntryAction($(e.target));
        if (entryAction.entry.url.match(/^http\:\/\/feeds\.dzone\.com/)) {
            entryAction.addAction({
                'name':'Click Through',
                'fn':function(entry) {
                    // Ask the background page to fetch the DZone page for us.
                    chrome.extension.sendRequest({"type":"fetch_entry", "url":entry.url}, function(response) {
                        var matched = /<div class="ldTitle">(.*?)<\/div>/.exec(response.data);
                        if (!matched)
                            return;  // page layout not recognised
                        var href = $(matched[1]).attr("href");
                        if (href) {  // attr() returns undefined when missing
                            chrome.extension.sendRequest({"type":"open_tab", "url":href}, function(response) {
                                // TODO: do something afterwards?
                            });
                        }
                    });
                }
            });
        }
    });

Here I built a little abstraction around the raw entry action bar, encapsulated in the EntryAction class, which I’ll show in a moment. Basically, if the currently displayed entry’s feed URL starts with feeds.dzone.com, I build the “Click Through” action and set its click handler. The handler sends the feed URL to the background page, which does the cross-site request to fetch the feed content and sends it back. The content script then regex-matches the content to get the original story’s URL and asks the background page to open that URL in a new tab.
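The extraction step is easy to check in isolation. This Python sketch mirrors the regex used in the content script, but the sample markup is made up, and DZone’s real page is only assumed to contain an `ldTitle` div wrapping a link; `original_story_url` is a hypothetical helper that returns None when nothing matches.

```python
import re

TITLE_DIV = re.compile(r'<div class="ldTitle">(.*?)</div>')
HREF = re.compile(r'href="([^"]*)"')


def original_story_url(page_html):
    """Return the href inside the ldTitle div, or None if not found."""
    div = TITLE_DIV.search(page_html)
    if div is None:
        return None
    link = HREF.search(div.group(1))
    return link.group(1) if link else None


sample = ('<html><div class="ldTitle"><a href="http://example.com/story">'
          'A story</a></div></html>')
url = original_story_url(sample)
print(url)
```

Regular expressions on HTML are brittle, which is in keeping with the caveat below about relying on page structure.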

Here’s the code for EntryAction:

    var EntryAction = function(element) {
        this.element = element;
        var entryElmt = this.element.parent(".entry");
        var url = $(entryElmt).find(".entry-title-link").attr('href');
        this.entry = {
            "url" : url
        };
    };

    EntryAction.prototype.addAction = function(action) {
        var that = this;
        var onclick = function(e) {
            var actionFunc = action['fn'];
            actionFunc(that.entry);
        }

        this.element.append($("<span>")
            .addClass("link unselectable")
            .text(action['name'])
            .click(onclick));
    };

I won’t delve too much into this code. It makes assumptions about the structure of the DOM that Google Reader renders. This does make the extension brittle, but that’s the reality of client-side scripting. Luckily, Google Reader’s markup doesn’t change very often. For people new to object-oriented JavaScript, this is one way to create a “class” (a constructor with a prototype) and put “instance” methods on it.

Background.html

Unlike content scripts, which are injected into and run in the target page, the background page runs in its own process (the extension’s process) and keeps running while the extension is active. It’s comparable to the “server” side of the extension. For our extension’s purposes, we use the background page to make requests to 3rd-party web sites (DZone).

<html>
<script type="text/javascript" src="lib/jquery-1.6.4.min.js"></script>
<script>
    (function($) {
        chrome.extension.onRequest.addListener(
            function(request, sender, sendResponse) {
                if (request.type === 'fetch_entry') {
                    $.get(request.url, function(data) {
                        sendResponse({"data":data});
                    });
                } else if (request.type === 'open_tab') {
                    chrome.tabs.create({'url':request.url});
                    sendResponse({"status":"ok"});
                }
            }
        );
    })(jQuery);
</script>
</html>

We register handlers for events that our “client” (the content script) is able to raise. Here we deal with 2 kinds of events: fetch_entry and open_tab.

  • Although Chrome 13 allows cross-site requests from content scripts, I’m actually quite fond of this pattern of delegating requests to the background page.
  • chrome.tabs isn’t accessible from the content script. That’s why open_tab is an event the client (the content script) can raise, delegating the Chrome-specific API call to the background page.

After Thoughts

That’s it! That’s my first Chrome extension. It’s not earth-shattering or anything, but I learned quite a lot. I like Chrome extension development: the architecture is simple yet powerful. The code is on GitHub, and I plan to expand it into a framework for customizing the Google Reader experience. Here are a few things we could do with the extension:

  • Link the “Share” action to Twitter/Google+
  • Clicking “Like” automatically votes the entry up on DZone (or any other news aggregator)
  • Sharing with a comment on Google Reader posts the comment on the entry on DZone (or any other news aggregator)
  • …and endless other opportunities

Spectrum.vim – My first Vim plugin

Over the past few months, I’ve been using Vim as my primary development tool at work and at home, and I have to say, I’m addicted to it! I’m thinking about writing a blog post on why I got hooked on “walking without crutches”, but in this post I’m just going to introduce my first Vim plugin, Spectrum.

Introduction

Spectrum is a Vim colorscheme roulette. Ever get tired of staring at the same colorscheme every day? Have hundreds of colorschemes in your repo but too lazy to pick one yourself? Spectrum helps by randomly picking colorschemes from your Vim runtime path or from the web. From there, you can vote a colorscheme up so Spectrum is more likely to pick it, or vote it down so you won’t see it again.
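Spectrum’s actual selection code isn’t reproduced in this post, so what follows is only a minimal sketch of how a vote-weighted roulette could work. It assumes each colorscheme starts with weight 1, an up-vote adds 1, and a down-vote removes the scheme from the pool. The class and method names (Roulette, vote_up, vote_down, pick) are hypothetical.

```python
import random
from typing import Dict, Iterable, Optional

class Roulette:
    """Hypothetical vote-weighted colorscheme picker (not Spectrum's real API)."""

    def __init__(self, schemes: Iterable[str], rng: Optional[random.Random] = None):
        # Every colorscheme starts with the same weight.
        self.weights: Dict[str, int] = {name: 1 for name in schemes}
        self.rng = rng or random.Random()

    def vote_up(self, name: str) -> None:
        # A higher weight means a proportionally higher chance of being picked.
        self.weights[name] = self.weights.get(name, 1) + 1

    def vote_down(self, name: str) -> None:
        # Assumption: a down-vote bans the scheme outright.
        self.weights.pop(name, None)

    def pick(self) -> str:
        names = list(self.weights)
        if not names:
            raise LookupError("no colorschemes left to pick from")
        # random.choices draws with probability proportional to the weights.
        return self.rng.choices(names, weights=[self.weights[n] for n in names])[0]
```

For example, after `r = Roulette(["desert", "molokai", "zenburn"])`, `r.vote_down("desert")` and `r.vote_up("zenburn")`, `r.pick()` never returns desert and returns zenburn about twice as often as molokai.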

Development

To be honest, I’m not a fan of Vim script: the language is not very expressive and doesn’t have many object-oriented features. Semi-fortunately, since Vim 7 there has been support for Python, Ruby and Perl scripts. I say “semi-fortunately” because the support isn’t very comprehensive. For most core Vim features you still have to resort to calling Vim commands, but at least I don’t have to use Vim script for the most part.

Spectrum is written in Python and uses the `vim` module to interact with the hosting Vim instance. There is a bit of bootstrapping to do if you want to separate most of the Python code from the entry-point Vim script (see https://github.com/kevinjqiu/spectrum.vim/blob/master/plugin/spectrum.vim). Many Vim plugins written in Python require you to install the Python code into your Python runtime before you can use them, but for a simple module like Spectrum, I opted for monkey-patching sys.path to include the modules in the plugin folder.
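The sys.path trick takes only a few lines. This is a minimal illustration, not Spectrum’s actual bootstrap code. Inside Vim, the plugin directory would usually be derived from the location of the plugin script. Here the hypothetical helper add_plugin_dir simply takes that path as an argument, so the sketch runs outside Vim.

```python
import os
import sys

def add_plugin_dir(plugin_dir: str) -> str:
    """Prepend plugin_dir to sys.path (only once) so modules that live next
    to the plugin can be imported without installing them into site-packages."""
    path = os.path.abspath(plugin_dir)
    if path not in sys.path:
        # Insert at the front so the bundled modules take precedence over any
        # same-named modules elsewhere on the path.
        sys.path.insert(0, path)
    return path
```

After that call, a plain `import` of a module sitting in the plugin folder works as usual. Prepending instead of appending is a design choice: it favors the bundled copy, at the risk of shadowing a same-named installed package.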

Anyhow, give it a try, and I hope you like it.

https://github.com/kevinjqiu/spectrum.vim

Scala Simple Build Tool — Not so simple after all, at least for now…

Update: I got sbt working by building it directly from the master branch of their Github repo. The current version is 0.7.5; the tagged 0.9.4 version is actually an older one. Anyway, I tried it and kinda loved it.

This is just another late-night rambling… I was trying to set up a proper Scala build system. I had been using the Maven Scala plugin for a while, but was longing for something simpler and more scalanic (is there such a word?). I was pretty happy with Cake, the Clojure build system, and expected SBT to let me break away from using Maven to build Scala projects… boy, was I wrong…

First off, when you google ‘simple build tool’, you get a link to the SBT Google Code home page. Nothing wrong there, except the “latest” version on Google Code was 0.7.4, released half a year ago… Maybe it’s not that outdated, I thought, so I downloaded it, followed the setup instructions and set up my ~/bin/sbt script. When I ran it, it asked me to set up a project, and it only supported Scala up to 2.7.7… Hrm, 2.8 has been out for a while now, so obviously SBT 0.7.4 isn’t the latest. Reading their home page more carefully, I saw they’re moving the repository to Github. Awesome! I’d pick Github over Google Code any time too.

Heading over to their Github repo, I found that the latest stable version was 0.9.2. Good! So it should support Scala 2.8 now! I downloaded the zip and unzipped it, and of course it wasn’t executable: you need to build it. There’s a README.md, so I quickly less’ed it. Step 1 asked me to go to the setup wiki page on Google Code (!), which described the same steps I had followed to set up 0.7.4… I guess they’re using 0.7.4 as a bootstrapping build… Anyway, I did that. Step 2 was to run `sbt update "project Launcher" proguard "project Simple Build Tool" "publish-local"`. Of course it didn’t work. It complained that the 0.7.4 version of sbt-launch couldn’t download Scala 2.7.7 from any of the repositories… bummer! But hey, I can get the Scala 2.7.7 library from Maven! So I quickly updated the pom.xml of one of my projects to use Scala 2.7.7 and did an upgrade. Now 2.7.7 sits happily in my local Maven repo.

I ran that command again, and hooray, it started to build! Judging by the number of packages it was building, “simple” isn’t the first adjective that comes to mind. Anyway, at least it was building, so even if it’s a little complicated, so be it… Except… of course it broke halfway. And why?

[info] Post-analysis: 107 classes.
[info] == Precompiled 2.7.7 / compile ==
[info]
[info] Precompiled 2.8.0 / compile …
[info]
[info] == Precompiled 2.8.0 / compile ==
[info] Source analysis: 9 new/modified, 0 indirectly invalidated, 0 removed.
[info] Compiling main sources…
[warn] there were deprecation warnings; re-run with -deprecation for details
[warn] one warning found
[info] Compilation successful.
[info] Post-analysis: 108 classes.
[info] == Precompiled 2.8.0 / compile ==
java.lang.OutOfMemoryError: PermGen space
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
at java.lang.ClassLoader.defineClass(ClassLoader.java:616)

You’ve gotta be kidding me! I set -Xmx512M and it’s not enough? And why is it building every version of Scala *from source*?? Is there something called a…JAR?

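Anyway, I increased -Xmx from 512M to 1024M, ran it again, waited, and the same thing happened! Out of PermGen space… urrgh… (For the record, PermGen is sized separately from the heap, via -XX:MaxPermSize, so raising -Xmx alone doesn’t help with this error.)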

I decided to give up, at least for the day… SBT is anything but simple, at least in my experience. I know it’s open source and people put effort into it without compensation, so I shouldn’t be too critical of it. I’ll give it another try, and hopefully it’s worth the time investment.

Write sudoku solver in Clojure

…yeah, because the world just needs another Sudoku solver. Well, I’m not trying to solve world hunger with it; as an exercise in Clojure, I took (read: stole) Peter Norvig’s Sudoku solver algorithm (written in Python) and adapted it to Clojure. I put it up on Github as sudoku-clj. The algorithm itself isn’t *that* hard to understand. Porting it to a Lisp-y syntax made the code a little longer than its Python counterpart. I’m sure seasoned Lisp/Clojure users can point out dozens of places where more idiomatic or succinct code could be used (if you happen to be one, do tell, by the way).
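For comparison, here is the board setup from Norvig’s Python solver, written to closely follow his published essay (check the original for the authoritative version). The list comprehensions build every square, the 27 units (rows, columns and boxes) and each square’s 20 peers. That is the part the Clojure port had to express with `for` and `map`.

```python
def cross(A, B):
    """Cross product of elements in A and elements in B, e.g. 'A' x '12' -> ['A1', 'A2']."""
    return [a + b for a in A for b in B]

digits = "123456789"
rows = "ABCDEFGHI"
cols = digits

# The 81 squares, named 'A1' .. 'I9'.
squares = cross(rows, cols)

# 27 units: 9 columns, 9 rows and 9 3x3 boxes.
unitlist = ([cross(rows, c) for c in cols] +
            [cross(r, cols) for r in rows] +
            [cross(rs, cs) for rs in ("ABC", "DEF", "GHI")
                           for cs in ("123", "456", "789")])

# Each square belongs to exactly 3 units: its column, its row and its box...
units = {s: [u for u in unitlist if s in u] for s in squares}

# ...and has 20 peers: the other squares that share at least one unit with it.
peers = {s: set(sum(units[s], [])) - {s} for s in squares}
```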

Here are a few things I noticed:

  • Mutable state in Clojure is captured using `ref`s. The object itself (in this case the grid, which is a hash map) doesn’t mutate; instead, the reference is changed to point to different grid objects, each representing the configuration at a given step.
  • Clojure sequences are lazy. A few times I tried to print out the current state (remaining digits) of a square, but if you simply do (println seq), you get a Java-ish toString() output of the sequence object. You need to force the lazy sequence to be evaluated with (println (apply str seq)). Needless to say, you lose the advantage of lazy sequences, so use it sparingly.
  • Python’s list comprehension syntax is fabulous. Clojure’s counterpart, `for`, doesn’t feel as elegant, and neither does mapping a function over a sequence to achieve the same thing (the way I used it).
  • Cake is yummy!
  • The performance isn’t great…I must have done something wrong, but the easy sudoku grid took about 2 seconds (with the JVM already booted), while the Python algorithm solves it in a fraction of a second.
  • Because assign/eliminate are mutually recursive, my current implementation uses the naive approach to recursion, i.e., it lets the stack grow. Clojure has a function, `trampoline`, which adds a level of indirection for mutually recursive functions: instead of calling each other directly, the functions return a function for the next step, and `trampoline` keeps calling those until it gets a non-function value. Internally it uses `recur` in tail position (basically translating the recursive calls into a loop), so it doesn’t eat up your process’s stack. It’s not obvious (to me, anyway) how to do that with a few levels of function calls in between assign/eliminate, but I’m sure there’s a way.
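Clojure’s `trampoline` runs a chain of returned thunks (zero-argument functions) until it gets a value that isn’t a function. Python has no built-in equivalent, so here is a hand-rolled sketch of the same idea. It uses a toy mutually recursive is_even/is_odd pair (hypothetical names) rather than the solver’s assign/eliminate.

```python
from typing import Any, Callable

def trampoline(f: Callable[..., Any], *args: Any) -> Any:
    """Call f(*args), then keep calling the result for as long as it is callable.

    Modeled on clojure.core/trampoline: each step returns a thunk for the next
    call instead of recursing, so the Python call stack never grows.
    Caveat (as in Clojure): the final result must not itself be callable.
    """
    result = f(*args)
    while callable(result):
        result = result()
    return result

def is_even(n: int):
    # Return a thunk instead of calling is_odd directly.
    return True if n == 0 else (lambda: is_odd(n - 1))

def is_odd(n: int):
    return False if n == 0 else (lambda: is_even(n - 1))
```

`trampoline(is_even, 100_000)` returns True even though 100,000 nested calls would blow past CPython’s default recursion limit. The price is that every call site has to be rewritten to return a thunk, which is exactly what makes it awkward when other functions sit between assign and eliminate.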

Cake – the yummy Clojure build system

About 10 minutes ago I heard about cake, a Clojure build system, and gave it a try. Ten minutes later, it had won me over! Wow, it addresses all my pain points with leiningen.

BLAZINGLY FAST
Sorry for the all-caps, but I’m very excited about this improvement over leiningen. OK, it may not be leiningen’s fault that JVM cold startup time is non-trivial, but someone came up with the idea of keeping a long-running JVM process in the background so that subsequent Clojure tasks reuse the same JVM instance, and the Cake folks integrated that nicely. It takes about 10-15 seconds to boot the JVM, but subsequent cake tasks or executions of Clojure code are virtually instant! Compare that to leiningen, which doesn’t take this approach, so every single task (even common ones like lein test) takes around 5 seconds. This adds up quickly and makes you less efficient. The speed improvement alone is enough for me to switch to cake.

Advanced REPL functionalities: tab completion, history
It just works. It’s very useful for getting instant feedback while exploring the language and API. No more manually adding jLine to your classpath or hacking around with a tab-completion wrapper… It just works! (I know I said that already.)

run clojure files directly
OK, leiningen can do this too, but only through a plugin. I feel this is a very handy feature that probably belongs in the core.

autotest
Detects code changes and automatically runs your test suites! Sweet.
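This post doesn’t describe how cake detects changes. A common, simple approach for autotest-style tools is polling file modification times, sketched below with the hypothetical helpers snapshot and changed_files. A real tool would call these in a loop, sleep between polls, and rerun the tests whenever the result is non-empty.

```python
import os
from typing import Dict, Iterable, List

def snapshot(paths: Iterable[str]) -> Dict[str, float]:
    """Record the last-modified time of every existing file in paths."""
    return {p: os.path.getmtime(p) for p in paths if os.path.isfile(p)}

def changed_files(before: Dict[str, float], after: Dict[str, float]) -> List[str]:
    """Files added, modified or deleted between two snapshots, sorted."""
    touched = {p for p, mtime in after.items() if before.get(p) != mtime}
    touched |= set(before) - set(after)  # files that disappeared
    return sorted(touched)
```

Polling is crude compared with OS-level file notifications, but it is portable and needs nothing beyond the standard library.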

compatible with leiningen project definition files
Cake understands project.clj, so I don’t need to do anything to my existing leiningen projects. Just change to the project directory and `cake` away 😀

Overall, it just works out of the box. No more mucking around with dev-dependencies and other chores; it lets you focus on what you’d love to do.

New Year’s Resolution

2011, here we come! In the spirit of continual learning, I’m going to write down the technologies I’d love to learn this year.

  • Haskell
    Now that I’m more interested in functional languages, I’d love to look into this “pure” functional language that has inspired countless others of its kind.

  • Lift
    Last year I scratched the surface of Scala, a hybrid JVM language. I’m very fond of it and think it has tremendous potential. Twitter and FourSquare are already using Scala, so it has been put to the test in some pretty high-profile deployments. Lift is the most popular web framework built on top of Scala. It claims to combine the rapid application development benefits of Rails with the benefits of a statically typed language.

  • More advanced features of Clojure
    In the past two years, I’ve explored Clojure on and off. I love the Lisp idea of homoiconicity, which unifies programming and meta-programming. That said, I haven’t been using macros in Clojure much, and there are other cool ideas in Clojure I haven’t been able to explore deeply, such as protocols and software transactional memory (STM).

  • Ruby and Rails 3
    For the past few years, I’ve refrained from learning Ruby, because I’m already quite adept at Python and I feel I should be learning languages that are different from what I already know. However, the more I hear about Ruby and its ideas, the more interesting it appears to me. On top of that, Rails 3 has come out, and it appears to have improved significantly. Moreover, there have been a lot of advancements in the Ruby VM world, such as JRuby, which means Ruby applications don’t have to run on the old and (by hearsay) dreadful MRI. I wouldn’t mind taking Ruby and Rails out for a spin this year.

  • Android
    Last year I got an HTC Legend Android phone, with the intention of developing for Android at some point. It didn’t work out that way, but Android continues to be a very interesting and fast-growing platform. Mobile *is* the future, and I’d like to poke into the Android world this year, primarily because it’s open source. iOS is equally interesting technically, but I don’t own a Mac, and I don’t like the idea of paying Apple $99 for the SDK when I don’t even intend to publish on the App Store.

I think that’s enough for a year… or is it? There are a few more technologies I wish to learn or keep up with:

  • GWT and Google App Engine
    Two years of professional GWT development made me a firm fan of this Google technology. I moved away from GWT for various reasons, but I love the engineering effort that has gone into it. It may not take over the world, but it’s definitely a solid player in the front-end web development arena. Especially now that it’s integrated with the Spring framework and makes deploying to App Engine easy, it may gain more traction this year. I think the Java language is both GWT’s biggest pro and its biggest con. I’d love to see an alternative language (Scala) implemented for GWT, but that may not happen any time soon.

  • A “NoSQL” database
    Let’s face it, “NoSQL” is a terrible name, but it grabs people’s attention. I flirted with CouchDB briefly last year and would love to continue that journey this year. MongoDB seems interesting too.

  • Node.js
    This is the one I’m least likely to learn this year. It’s hot in the geekdom right now, and it has its merits: for example, having both the server and client side written in Javascript eliminates the need to implement validation logic in two different languages. However, I’m just not a big fan of Javascript. I think it’s a language that, through a chain of serendipitous events, became the world’s most widely used language. It carries a huge historical burden, and although it has cool features, other modern languages have them too and do them better. Regardless, given Node.js’s star status, it deserves some looking into 🙂