
Thursday, July 03, 2008

Cassandra source on Google Code

As predicted, Facebook has open sourced Cassandra. It's available on Google Code with very little fanfare.

Some notes after my browsing through the source:

  • It uses hinted handoff and bootstrapping, just like Amazon's Dynamo.

  • Its consistency model doesn't seem to be quite the same: Dynamo uses vector clocks to determine causal relationships, whereas Cassandra seems to rely solely on timestamps, with "majority rules" semantics when timestamps are tied.

  • Membership is communicated by a gossip protocol as described in the Dynamo paper.

  • Requests are made to the system by sending Thrift calls to any node. The Thrift interface is included in the source.
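To make the timestamp vs. vector-clock difference concrete, here is a minimal Python sketch of the two reconciliation styles. It is not code from either project: `Column`, `resolve_by_timestamp`, and `compare_clocks` are hypothetical names, and breaking a leftover tie by comparing values is my own assumption, added only to keep the result deterministic. With timestamps alone, concurrent writes are never detected as conflicts (one of them simply wins), while a vector clock can tell you that neither write saw the other.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Column:
    """A value tagged with a timestamp (hypothetical shape, not Cassandra's class)."""
    value: str
    timestamp: int


def resolve_by_timestamp(replicas: List[Column]) -> Column:
    """Last-write-wins: the highest timestamp wins outright.

    If several replicas share the highest timestamp, the value held by the most
    of them wins ("majority rules"); any remaining tie is broken by comparing
    values, purely so the result is deterministic (an assumption).
    """
    newest = max(c.timestamp for c in replicas)
    tied = Counter(c.value for c in replicas if c.timestamp == newest)
    winner = max(tied, key=lambda v: (tied[v], v))
    return Column(winner, newest)


VectorClock = Dict[str, int]  # node name -> number of updates seen from that node


def compare_clocks(a: VectorClock, b: VectorClock) -> str:
    """Return 'equal', 'before', 'after', or 'concurrent' for clock a relative to b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither update saw the other: a genuine conflict


# Replicas disagree at timestamp 7; the majority value "blue" wins.
replicas = [Column("red", 5), Column("blue", 7), Column("green", 7), Column("blue", 7)]
print(resolve_by_timestamp(replicas))  # Column(value='blue', timestamp=7)

# Each update saw a different node's write, so vector clocks flag a conflict.
print(compare_clocks({"A": 2, "B": 1}, {"A": 1, "B": 2}))  # concurrent
```

The trade-off is that the timestamp approach needs no per-node metadata and never hands conflicts back to the client, but it silently discards one of two concurrent writes; Dynamo's vector clocks keep both and leave reconciliation to the application.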



Some further thoughts:

  • It doesn't seem like there's a lock on the table during bootstrapping. What happens to mutations made on the source node while it is bootstrapping the destination? Are they marked for later hinted handoff?

  • Would system performance be improved by using the new Thrift TNonblockingServer (see THRIFT-5 on JIRA)? It should be more scalable than the TThreadPoolServer they're using now.

  • Cassandra is around 40K lines of Java. How many lines would an equivalent Erlang program be, and what would be the performance difference?
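One possible answer to the bootstrapping question, sketched in Python purely as a hypothetical model (this is not taken from the Cassandra source): the source node keeps accepting writes without a lock while it streams a snapshot of the moving key range, records any write to that range as a hint for the destination, and replays those hints once the snapshot has been delivered. `Node`, `start_bootstrap`, and `finish_bootstrap` are made-up names, and the model assumes timestamp-based last-write-wins so that replay order does not matter.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set, Tuple

Mutation = Tuple[str, str, int]  # (key, value, timestamp): hypothetical shape


@dataclass
class Node:
    name: str
    data: Dict[str, Tuple[str, int]] = field(default_factory=dict)
    hints: Dict[str, List[Mutation]] = field(default_factory=dict)
    streaming_to: Optional["Node"] = None
    streaming_keys: Set[str] = field(default_factory=set)

    def apply(self, key: str, value: str, ts: int) -> None:
        """Last-write-wins: keep the value only if it is newer than what we hold."""
        current = self.data.get(key)
        if current is None or ts > current[1]:
            self.data[key] = (value, ts)

    def write(self, key: str, value: str, ts: int) -> None:
        """Accept a write; no lock is taken even if a bootstrap is in progress."""
        self.apply(key, value, ts)
        if self.streaming_to is not None and key in self.streaming_keys:
            # The snapshot being streamed may miss this write, so record a hint.
            self.hints.setdefault(self.streaming_to.name, []).append((key, value, ts))

    def start_bootstrap(self, dest: "Node", keys: Iterable[str]) -> Dict[str, Tuple[str, int]]:
        """Begin handing `keys` to `dest` and return a point-in-time snapshot."""
        self.streaming_to = dest
        self.streaming_keys = set(keys)
        return {k: self.data[k] for k in self.streaming_keys if k in self.data}

    def finish_bootstrap(self, snapshot: Dict[str, Tuple[str, int]]) -> None:
        """Deliver the snapshot, then replay the hints recorded during the transfer."""
        dest = self.streaming_to
        if dest is None:
            raise RuntimeError("no bootstrap in progress")
        for key, (value, ts) in snapshot.items():
            dest.apply(key, value, ts)
        for key, value, ts in self.hints.pop(dest.name, []):
            dest.apply(key, value, ts)
        self.streaming_to = None
        self.streaming_keys = set()


source, target = Node("A"), Node("B")
source.write("k1", "v1", 1)
snapshot = source.start_bootstrap(target, ["k1", "k2"])
source.write("k2", "v2", 2)   # arrives mid-bootstrap, not in the snapshot
source.write("k1", "v1b", 3)  # overwrites a value the snapshot already captured
source.finish_bootstrap(snapshot)
```

Because every replica keeps the newest timestamp, replaying a hint after the snapshot (or even before it) converges to the same state; the cost of this approach is that the source must buffer hints for the whole duration of the transfer.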



All in all, it's a very interesting project that is sure to attract a lot of attention. Now that Powerset has been acquired by Microsoft, I'm a little worried about HBase's future: two of its three main developers are Powerset employees. Maybe Cassandra can help fill the open source scalable database niche.

Monday, June 30, 2008

Facebook's next open source projects: Hive and Cassandra

A couple weeks ago, Jeff Hammerbacher from Facebook presented some details on Cassandra (see later slides), a structured p2p storage system similar to Google's Bigtable or Amazon's Dynamo. What is most interesting about Cassandra is that they seem to be preparing to open source it imminently. Jeff bookmarked two things on delicious last night:
  1. Cassandra: Welcome to your new Wikidot site

  2. Cassandra: A Structured Storage System on a P2P Network in Launchpad

Both sites are empty as of now, but it looks like they're planning on releasing the source some time soon using bzr for version control.

Another interesting Facebook project is Hive, a sort of data warehousing solution built on Hadoop. They've been discussing open sourcing this for several months now, but it looks like things are starting to happen with HADOOP-3601: Hive as contrib project.

On the non-Facebook open source front, we have some news coming soon as well. We've decided to open source several of our internal tools under an MIT license; hold tight for more info.

Friday, May 16, 2008

FB Engineering blog post on Facebook Chat

Eugene at Facebook posted an interesting article about the technology behind the new Facebook Chat. This new service has large parts written in Erlang and communicates with the rest of the system using the Thrift bindings Amie Street and Facebook have been collaborating on for the last couple of months.

The good news for us: our Thrift bindings are pretty much guaranteed to be stable and free of leaks and bugs now that they're handling millions of messages per second over at FB.

If you're interested, check them out over at the Thrift git repository.