Analysis of the NoSQL Landscape


This is an overview of the current state of the NoSQL landscape. It is getting large and somewhat unwieldy, and some projects may have landed in the wrong category here. I have included object databases in the mix too. Seriously folks, some of you need to pick more Google-friendly project names. Here are the types and the players in each category. Background data is available in this Google Docs spreadsheet.

Key-value stores

Column-oriented stores

Google BigTable, HBase, Cassandra, HyperTable, OpenNeptune, KDI, QBase

Document Databases

CouchDB, MongoDB, Apache Jackrabbit, ThruDB, CloudKit, Persevere, Lotus Domino, Riak, Terrastore

Object Databases

ZODB, db4o, Versant, GemStone/S, Progress ObjectStore

Graph Databases

Neo4j, VertexDB, InfoGrid, Sones, Filament, AllegroGraph, HyperGraphDB

Projects by Type

If we graph all the projects by type we get this view:

projects_by_type(2).png

There are more key-value stores than all the other types combined. Why is this? Are key-value stores that much easier to implement? My guess is that this is the first area where we will see projects abandoned and projects converging. What matters is the features users want, not the projects themselves. There must be a lot of overlap here, with many projects that are nearly identical and differ only slightly. On the other hand, a lot of knowledge of these kinds of systems is being spread around, and there is a good chance of innovation. The combination of the best technical features and API features will hopefully bubble to the top and stick around.
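The tally behind a chart like this can be sketched in a few lines of Python. This is only an illustration under assumptions: it counts the names listed in this post rather than the spreadsheet data, the key-value list is not included in the text and is therefore left out, and the variable names are made up for the example.

```python
# Project names copied from the category lists in this post.
# The key-value list does not appear in the text, so it is left out here.
projects_by_type = {
    "column-oriented": ["Google BigTable", "HBase", "Cassandra", "HyperTable",
                        "OpenNeptune", "KDI", "QBase"],
    "document": ["CouchDB", "MongoDB", "Apache Jackrabbit", "ThruDB", "CloudKit",
                 "Persevere", "Lotus Domino", "Riak", "Terrastore"],
    "object": ["ZODB", "db4o", "Versant", "GemStone/S", "Progress ObjectStore"],
    "graph": ["Neo4j", "VertexDB", "InfoGrid", "Sones", "Filament",
              "AllegroGraph", "HyperGraphDB"],
}

# Count the projects in each listed category.
counts = {kind: len(names) for kind, names in projects_by_type.items()}

# Total across the four listed categories. For "more than the other types
# combined" to hold, the key-value category has to exceed this number.
combined_others = sum(counts.values())

# Print a crude text bar chart, largest category first.
for kind, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{kind:<16} {'#' * n} {n}")
print(f"combined (without key-value): {combined_others}")
```

Going by the lists in this post alone, the four non-key-value categories add up to 28 projects, so the key-value category would need at least 29 entries to outnumber them.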

License Breakdown

If we graph the projects in the list above by license chosen we get the following:

projects_by_license(2).png

This shows a clear dominance of open source licenses over commercial ones. Some products have chosen a dual licensing model (Neo4j and BerkeleyDB). Quite a few are unknown, which really means either that they are unable to communicate their license in an understandable manner or that the project couldn't really be found on the web at all (see the point about Google-friendly names).

Language Breakdown

Graphing the projects by implementation language we get the following:

projects_by_language(4).png

Java takes the lead, with C and C++ following close behind. But is the prevalence of Java a result of how widespread Java knowledge is and how heavily Java is used in open source, or is Java better suited than other languages to implementing these kinds of systems? It is interesting to note the number of Erlang implementations, and also that quite a few of the projects have implementations in more than one language. The ones with more than one implementation are mostly commercial.

Some ending questions:
* Have we reached the maximum number of sustainable projects, or will the ecosystem continue to grow?
* Will more of them go commercial? Or will more choose the model with support as the income, like 10gen has with MongoDB?
* How does one choose the right one for a given project? This is an increasingly hard problem, at least for key-value stores.


8 Comments

Hi,
How do you differentiate between a column-oriented store and a key-value store? I've always thought of Google BigTable and Amazon SimpleDB as being very similar products that arguably fall into both "column-oriented" and "key-value". I appreciate that you did state "there may be projects which have landed in the wrong category here".

Also, does Azure not qualify under one of those two categories?

Regards

What about all the native XML databases, e.g. eXist, xDB, MarkLogic, etc.?

@Jamie BigTable-type/column-oriented databases are a multidimensional sorted map, and data is stored column-wise, so access to data in one row is more or less sequential. Access to data in the columns is done by indexing separately. See http://en.wikipedia.org/wiki/Column-oriented_DBMS. Key-value stores are basically stored hashtables where one key gives you access to exactly one value (which can be a simple value or a compound value). See e.g. http://en.wikipedia.org/wiki/Redis_%28data_store%29. I have no idea how storage is done in Azure, but it is normally not mentioned when the NoSQL landscape is covered.
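To make the distinction in the reply above concrete, here is a minimal Python sketch. It is a toy under assumptions, not how BigTable, HBase, or Redis are implemented: the class `SortedColumnMap`, its methods, and the sample keys are all hypothetical names invented for the illustration. The key-value store is a plain hashtable, while the BigTable-style map is keyed by (row, column, timestamp) and kept in sorted order so that all cells of one row sit next to each other.

```python
import bisect

# Key-value store: one key gives access to exactly one value,
# which may be a simple value or a compound one.
kv_store = {}
kv_store["user:42"] = {"name": "Ada", "city": "Oslo"}


class SortedColumnMap:
    """Toy BigTable-style map: (row, column, timestamp) -> value, sorted by key."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, timestamp) tuples
        self._values = {}  # cell key -> stored value

    def put(self, row, column, timestamp, value):
        key = (row, column, timestamp)
        if key not in self._values:
            # Insert the new key at its sorted position.
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_row(self, row):
        """Return all cells of one row; they are contiguous in sorted order."""
        # (row,) sorts before every (row, column, timestamp) with the same row.
        start = bisect.bisect_left(self._keys, (row,))
        cells = []
        for key in self._keys[start:]:
            if key[0] != row:
                break
            cells.append((key, self._values[key]))
        return cells


table = SortedColumnMap()
table.put("row2", "info:name", 1, "Grace")
table.put("row1", "info:name", 1, "Ada")
table.put("row1", "info:city", 1, "Oslo")

print(kv_store["user:42"])     # single lookup by key
print(table.scan_row("row1"))  # sequential scan over one row's cells
```

The point of the sketch is the access pattern: the hashtable answers "give me the value for this key" and nothing else, whereas the sorted map also supports range scans over neighbouring keys.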

Hi Knut,
Thanks for the clarification.

Azure may not be mentioned but it is ostensibly a key-value store so perhaps it should be.

Regards


Splunk is a popular, high-performance search engine optimized for IT data with late/lazy binding of columns at search-time. It allows highly unstructured data to be inserted without having to have a rigid schema. There's now an open-source, free, MySQL front-end so it can be accessed with SQL/ODBC. Perhaps you can add it to your list. Thank you.

Overview:

http://blogs.splunk.com/2010/02/10/sql-splunk-splunkmse/

Project:

http://bitbucket.org/rdas/splunkengine/wiki/Home

@knut - Nice summary

Is there a particular reason why you missed GigaSpaces in your summary?


Note that:
1. We were categorized by Forrester as one of the leaders in in-memory elastic caching. See my recent post in that regard: WTF is Elastic Data Grid? (By Example)

2. We recently added Memcached support: Did Someone Say GigaSpaces Now Has Memcached Support?

3. We also added document-based support: see details here

And we're probably the only NoSQL implementation (except for Google App Engine, which provides JPA on top of their BigTable implementation) that provides SQL (JDBC) support as well.

I also wrote quite a bit on that topic myself, as you can see here, so I'm curious to know how that slipped under your radar.

Other than that, it's a pretty useful summary, so thanks for the effort and writeup.

Cheers
Nati S.

mini bio

Knut Haugen [Knu:t Hæugen], Norwegian software developer with a penchant for dynamic languages and anything to do with developer testing. Agile methodology geek with a bias towards Lean and Kanban.

meta

This page contains a single entry by Knut Haugen published on March 17, 2010 8:01 AM.
