Tuesday, June 17, 2008

Automatically parallelizing list comprehensions in Erlang

This evening I got to chatting with some people on IRC about how it would be neat if Erlang had syntax to automatically parallelize a list comprehension. For those who aren't familiar with Erlang yet, here's a brief introduction:

A list comprehension is a quick way of performing an operation over a list of values. For example, at the erlang prompt:
1> MyList = lists:seq(1,10).
2> [X * 2 || X <- MyList].

For those familiar with Python list comprehensions, this is equivalent to [x * 2 for x in range(1,11)]. It's also just like the map operator common in pretty much every functional language. In both Python and Erlang you can get a bit more complicated:
3> [X * 2 || X <- MyList, X rem 2 == 1]. %% Filter out the odd numbers from the list

or even take the cartesian product of several lists in the same list comprehension:
4> [{X, Y, X * Y} || X <- lists:seq(1,3), Y <- lists:seq(1,3)].

Now, back to the interesting part. Everyone talks about how Erlang is so easy to make concurrent. But if we add the self() function, which returns the current running process id, to the output of the list comprehension we see that it is not parallelized:
5> [{self(), X, Y, X * Y} || X <- lists:seq(1,3), Y <- lists:seq(1,3)].

All of the results were computed in the same process.

Handily, Erlang provides a way of inserting user code in between the parser and the compiler in the form of parse transforms. A parse transform is specified in the compile options for a module and can make arbitrary modifications to the abstract syntax tree of an Erlang program before the compiler gets to it, with the important caveat: "programmers are strongly advised not to engage in parse transformations and no support is offered for problems encountered." Everyone knows caveats are lame, so I went ahead and built a parse transform. Here's example usage:
-compile({parse_transform, plc}).

test() ->
Result = plc:lc([{self(), A, B, C} || A <- lists:seq(1,2),
B <- lists:seq(3,4),
C <- lists:seq(5,6),
A * B < 5
io:format("result: ~p~n", [Result]).


result: [{<0.3145.0>,1,3,5},

We can see that the list comprehension has been performed in parallel this time - each result element has a different value of self().

Want to check it out? Clone the git repo on github.

This probably doesn't work that well. Don't use it on production software. Unless you want to test it thoroughly first.

No comments: