Thursday, July 03, 2008

Cassandra source on Google Code

As predicted, Facebook has open sourced Cassandra. It's available on Google Code with very little fanfare.

Some notes after browsing through the source:

  • It uses hinted handoff and bootstrapping, just like Amazon's Dynamo.

  • Its consistency model doesn't seem to be quite the same: Dynamo uses vector clocks to determine causal relationships, whereas Cassandra seems to rely only on timestamps, with "majority rules" semantics when timestamps are tied.

  • Membership is communicated by a gossip protocol as described in the Dynamo paper.

  • Requests are made to the system by sending Thrift calls to any node. The Thrift interface is included in the source.
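
To make the consistency difference concrete, here is a minimal Python sketch of the two approaches as I understand them. It is not taken from the Cassandra source or the Dynamo paper: the function names, the (value, timestamp) reply format, and the dict-based clock representation are all my own assumptions for illustration.

```python
from collections import Counter


def resolve_by_timestamp(replies):
    """Pick a winner from (value, timestamp) replies, one per replica.

    Assumed reading of the timestamp approach: the newest timestamp wins,
    and if several values share it, the value reported by the most
    replicas wins ("majority rules").
    """
    newest = max(ts for _, ts in replies)
    candidates = [value for value, ts in replies if ts == newest]
    return Counter(candidates).most_common(1)[0][0]


def compare_vector_clocks(a, b):
    """Compare two vector clocks given as dicts of node name -> counter.

    Returns "before", "after", "equal" or "concurrent". A "concurrent"
    result is the case a plain timestamp cannot detect: neither write
    causally precedes the other, so the conflict has to be reconciled.
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

With timestamps there is always a single winner, so a concurrent update can be silently dropped; with vector clocks the "concurrent" case is visible and can be handed back to the client to merge.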



Some further thoughts:

  • It doesn't seem like there's a lock on the table during bootstrapping. What happens to mutations made on the source node while it is bootstrapping the destination? Are they marked for later hinted handoff?

  • Would system performance be improved by using the new Thrift TNonblockingServer (see THRIFT-5 on JIRA)? It should be more scalable than the TThreadPoolServer they're using now.

  • Cassandra is around 40K lines of Java. How many lines would an equivalent Erlang program be, and what would be the performance difference?
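
On the bootstrapping question, this is roughly what "marked for later hinted handoff" would mean. The sketch below is a toy in-memory model in Python, not Cassandra's implementation: the Node class, write, and deliver_hints are hypothetical names, and treating a bootstrapping node like an unreachable one is my assumption.

```python
class Node:
    """Toy in-memory node; every field here is an assumption for illustration."""

    def __init__(self, name):
        self.name = name
        self.up = True    # False while the node is down (or, hypothetically, bootstrapping)
        self.data = {}    # key -> value actually stored on this node
        self.hints = []   # (target_name, key, value) held on behalf of other nodes


def write(key, value, target, fallback):
    """Write to target if reachable; otherwise leave a hint on fallback."""
    if target.up:
        target.data[key] = value
    else:
        fallback.hints.append((target.name, key, value))


def deliver_hints(holder, nodes_by_name):
    """Replay hints whose target is reachable again; keep the rest."""
    remaining = []
    for target_name, key, value in holder.hints:
        target = nodes_by_name[target_name]
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target_name, key, value))
    holder.hints = remaining
```

If the source node recorded mutations this way while streaming data to the destination, replaying the hints after the transfer would bring the new node up to date without locking the table.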



All in all, it's a very interesting project that is sure to attract a lot of attention. Now that Powerset has been acquired by Microsoft, I'm a little worried for HBase's future -- two of the three main developers are Powerset employees. Maybe Cassandra can help fill the open source scalable database niche.