This is an overview of the current state of the NoSQL landscape. It’s getting large and somewhat unwieldy, and some projects may have landed in the wrong category here. I have included object databases in the mix too. Seriously folks, some of you need to pick more Google-friendly project names. Here are the types and the players in each category. Background data is available in this Google Docs spreadsheet.

Key/value stores

Column-oriented stores

Google BigTable, HBase, Cassandra, HyperTable, OpenNeptune, KDI, QBase

Document Databases

CouchDB, MongoDB, Apache Jackrabbit, ThruDB, CloudKit, Persevere, Lotus Domino, Riak, Terrastore

Object Databases

ZODB, db4o, Versant, GemStone/S, Progress ObjectStore

Graph Databases

Neo4j, VertexDB, InfoGrid, Sones, Filament, AllegroGraph, HyperGraphDB

Projects by Type

If we graph all the projects by type we get this view:


There are more key/value stores than all the other types combined. Why is this? Are key/value stores that much easier to implement? My guess is that this is the first area where we will see projects abandoned and projects converging. What matters is the features users want, not the projects themselves. There must be a lot of overlap here, with many projects that differ only slightly or are almost identical. On the other hand, a lot of knowledge about these kinds of systems is being spread around, and there is a good chance of innovation. The combination of the best technical features and API features will hopefully bubble to the top and stay on.
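For anyone who wants to redo the by-type tally, here is a minimal sketch in Python. It only uses the four categories whose members are listed above; the key/value stores are left out rather than guessed, so it cannot reproduce the key/value comparison. The function names `count_by_type` and `text_bar_chart` are made up for this illustration.

```python
# Project lists taken from the categories above.
# Key/value stores are omitted: their members are not listed, so none are invented here.
PROJECTS_BY_TYPE = {
    "Column-oriented stores": [
        "Google BigTable", "HBase", "Cassandra", "HyperTable",
        "OpenNeptune", "KDI", "QBase",
    ],
    "Document Databases": [
        "CouchDB", "MongoDB", "Apache Jackrabbit", "ThruDB", "CloudKit",
        "Persevere", "Lotus Domino", "Riak", "Terrastore",
    ],
    "Object Databases": [
        "ZODB", "db4o", "Versant", "GemStone/S", "Progress ObjectStore",
    ],
    "Graph Databases": [
        "Neo4j", "VertexDB", "InfoGrid", "Sones", "Filament",
        "AllegroGraph", "HyperGraphDB",
    ],
}


def count_by_type(projects_by_type):
    """Return (category, project count) pairs, largest category first."""
    counts = [(kind, len(names)) for kind, names in projects_by_type.items()]
    # sorted() is stable, so categories with equal counts keep their listed order.
    return sorted(counts, key=lambda item: item[1], reverse=True)


def text_bar_chart(counts):
    """Render the counts as a plain-text bar chart, one category per line."""
    width = max(len(kind) for kind, _ in counts)
    return "\n".join(
        f"{kind.ljust(width)}  {'#' * n} {n}" for kind, n in counts
    )


if __name__ == "__main__":
    print(text_bar_chart(count_by_type(PROJECTS_BY_TYPE)))
```

Adding a "Key/value stores" entry to the dictionary, filled in from the spreadsheet, would be enough to bring that category into the chart.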

License Breakdown

If we graph the projects in the list above by license chosen we get the following:


This shows a clear dominance of open source licenses over commercial ones. Some products have chosen a dual licensing model (Neo4j and Berkeley DB). Quite a few are unknown, which really means either that they are unable to communicate their license in an understandable manner or that the project wasn’t really found on the web at all (see the point about Google-friendly names).

Language Breakdown

Graphing the projects by implementation language we get the following:


Java takes the lead, with C and C++ following close behind. But is the prevalence of Java a result of how widespread Java knowledge is and how heavily Java is used in open source, or is Java better suited than other languages to implementing these kinds of systems? It is interesting to note the number of Erlang implementations, and also that quite a few of the projects have implementations in more than one language. The ones with more than one implementation are mostly commercial ones.

Some ending questions:
* Have we reached the maximum number of sustainable projects, or will the ecosystem continue to grow?
* Will more of them go commercial? Or will more choose the model with support as the income, like 10gen has with MongoDB?
* How does one choose the right one to use for a given project? This is an increasingly hard problem, at least for key/value stores.




17 March 2010