What's That Noise?! [Ian Kallen's Weblog]


Tuesday December 23, 2008

Another Hadoop-backended Database: CloudBase

This post to one of the Hadoop mailing lists caught my eye: "Announcing CloudBase-1.1 release." Wait, wasn't Cloudbase the embedded database company that IBM acquired several years back and ended up donating to the Apache Software Foundation as Derby? No, not that Cloudbase. This is apparently another project, one that aims to provide data warehousing on top of Hadoop.

I've been watching the emergence of HBase, Hypertable and, most recently, the proposed incubation of Facebook's Cassandra with great interest. The first two are modeled on Google's BigTable, but all three are essentially horizontally scalable, column-oriented databases. Their developers explicitly steer away from having these technologies pegged as relational databases, with the refrain: "We don't do joins." The CloudBase project, by contrast, doesn't model itself on BigTable; it aims to explicitly support joins between tables stored on an HDFS cluster. It looks like they've posted extensive documentation and released a JDBC driver. Pretty cool! This is the most interesting database initiative I've seen since Greenplum announced its support for MapReduce.
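The CloudBase announcement doesn't say how it executes joins internally. As a rough illustration of what "supporting joins" on top of MapReduce can look like, here is a minimal in-memory sketch of a reduce-side (repartition) join, a common textbook technique. It is not taken from CloudBase; the two tables and all function names are hypothetical, and the "shuffle" is simulated with a dictionary rather than run on a cluster.

```python
from collections import defaultdict
from itertools import product

# Hypothetical input tables, standing in for files stored on HDFS.
users = [(1, "alice"), (2, "bob"), (3, "carol")]           # (user_id, name)
orders = [(101, 1, 25.0), (102, 1, 10.0), (103, 2, 7.5)]   # (order_id, user_id, amount)

def map_phase(users, orders):
    """Emit (join_key, (tag, record)); the tag records which table a row came from."""
    for user_id, name in users:
        yield user_id, ("users", (name,))
    for order_id, user_id, amount in orders:
        yield user_id, ("orders", (order_id, amount))

def shuffle(pairs):
    """Group values by key, as the MapReduce framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """For each key, pair every row from one table with every row from the other (inner join)."""
    for key in sorted(groups):
        left = [rec for tag, rec in groups[key] if tag == "users"]
        right = [rec for tag, rec in groups[key] if tag == "orders"]
        for l, r in product(left, right):
            yield (key,) + l + r

joined = list(reduce_phase(shuffle(map_phase(users, orders))))
for row in joined:
    print(row)
```

User 3 has no orders, so that row drops out of the result, just as it would in an SQL inner join. On a real cluster the shuffle moves both tables across the network so that matching keys meet at the same reducer, which is what makes joins comparatively expensive at this scale.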

Yes, as far as scale-out data analytics goes, we live in interesting times.


(Dec 23 2008, 04:02:21 PM PST)
Comments [2]


Hey Ian,

The Hive project from Facebook is similar in motivation to CloudBase, but has the advantages of being Apache 2.0 licensed, being an official subproject of Hadoop, having nearly a petabyte under management at Facebook, and having a far more active user and developer community. If you're interested in data warehousing on Hadoop, I recommend checking out Hive as well! http://hadoop.apache.org/hive/

We're also offering training and support for Hive at Cloudera, which should make Hive easier to adopt for organizations outside of Facebook: http://www.cloudera.com/hadoop-training.

Later,
Jeff

Posted by Jeff Hammerbacher on December 24, 2008 at 01:49 PM PST

The IBM one is called Cloudscape.

Posted by Eugene Kuleshov on December 24, 2008 at 01:49 PM PST
