This is an overview of the current state of the NoSQL landscape. It's getting large and somewhat unwieldy, and some projects may have landed in the wrong category here. I have included object databases in the mix too. Seriously folks, some of you need to pick more Google-friendly project names. Here are the types and the players in each category. Background data is available in this Google Docs spreadsheet.
Key-value stores
In-memory key-value stores: memcached, Repcached, Oracle Coherence, Infinispan, WebSphere eXtreme Scale, JBoss Cache, Velocity, Terracotta Ehcache
Regular key-value stores: Keyspace, Amazon SimpleDB, Flare, Schema-free, RAMCloud, Twisted Storage (TSnoSQL), Redis, Tokyo Cabinet, Lightcloud, NMDB, Lux IO, Memcachedb, Actord, BerkeleyDB, Scalaris, GT.M, Mnesia, HamsterDB, Chordless
Eventually consistent key-value stores: Amazon Dynamo, Voldemort, Dynomite, SubRecord, Mo8onDb, Dovetaildb, KAI
Column-oriented stores
Google BigTable, HBase, Cassandra, HyperTable, OpenNeptune, KDI, QBase
Document Databases
CouchDB, MongoDB, Apache Jackrabbit, ThruDB, CloudKit, Persevere, Lotus Domino, Riak, Terrastore
Object Databases
ZODB, db4o, Versant, GemStone/S, Progress ObjectStore
Graph Databases
Neo4j, VertexDB, InfoGrid, Sones, Filament, AllegroGraph, HyperGraphDB
Projects by Type
If we graph all the projects by type we get this view:

There are more key-value stores than all the other types combined. Why is this? Are key-value stores that much easier to implement? My guess is that this is the first area where we will see projects being abandoned and converging. What matters is the features users want, not the projects themselves. There must be a lot of overlap here, with many projects that are almost identical and differ only slightly. On the other hand, knowledge of these kinds of systems is now widely spread, so there is a good chance of innovation. The best combination of technical features and API design will hopefully bubble to the top and stay there.
License Breakdown
If we graph the projects in the list above by license chosen we get the following:

This shows a clear dominance of open source licenses over commercial ones. Some products have chosen a dual licensing model (Neo4j and BerkeleyDB). Quite a few are unknown, which really means that either the project does not communicate its license in an understandable manner or the project could not really be found on the web at all (see the point about Google-friendly names).
Language Breakdown
Graphing the projects by implementation language we get the following:

Java takes the lead, with C and C++ following close behind. But is the prevalence of Java a result of how widespread Java knowledge is and how heavily Java is used in open source, or is Java better suited than other languages for implementing these kinds of systems? It is interesting to note the number of Erlang implementations, and also that quite a few of the projects have implementations in more than one language. The ones with more than one implementation are mostly commercial.
Some ending questions:
* Have we reached the maximum number of projects that is sustainable, or will the ecosystem continue to grow?
* Will more of them go commercial? Or will more choose a model with support as the source of income, like 10gen has with MongoDB?
* How does one choose the right one to use for a given project? This is an increasingly hard problem, at least for key-value stores.
References:
- NoSQL presentation by Steve Yen
- http://nosql-database.org/
- http://nosql.mypopescu.com/
- http://www.dbms2.com/2010/03/14/nosql-taxonomy/
- http://blog.nahurst.com/visual-guide-to-nosql-systems
- All the various project pages and product websites.

Hi,
How do you differentiate between a column-oriented store and a key-value store? I've always thought of Google BigTable and Amazon SimpleDB as being very similar products that arguably fall into both "column-oriented" and "key-value". I appreciate that you did state "there may be projects which have landed in the wrong category here".
Also, does Azure not qualify under one of those two categories?
Regards
What about all the native XML databases e.g. eXist, xDB, MarkLogic, etc.
@Jamie BigTable-type/column-oriented databases are multidimensional sorted maps, and data is stored column-wise so access to data in one row is more or less sequential. Access to data in the columns is done by indexing separately (see http://en.wikipedia.org/wiki/Column-oriented_DBMS). Key-value stores are basically persisted hashtables where one key gives you access to exactly one value, which can be a simple value or a compound value (see e.g. http://en.wikipedia.org/wiki/Redis_%28data_store%29). I have no idea how storage is done in Azure, but it is normally not mentioned when the NoSQL landscape is covered.
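To make the difference a bit more concrete, here is a minimal Python sketch. It is only a toy model under my own assumptions, not how Redis or BigTable is actually implemented: the class name SortedColumnMap, its methods, and the sample keys are all made up for illustration. The key-value store is a plain dictionary with one opaque value per key, while the BigTable-style map is keyed on (row, column, timestamp) and kept sorted, so all cells of a row sit next to each other.

```python
from bisect import bisect_left, insort

# Key-value store: one key gives access to exactly one (possibly compound) value.
kv_store = {}
kv_store["user:42"] = {"name": "Jamie", "city": "Oslo"}


class SortedColumnMap:
    """Toy multidimensional sorted map: (row, column, timestamp) -> value.

    Hypothetical illustration only; real systems add persistence,
    distribution, column families and much more.
    """

    def __init__(self):
        # Sorted list of (row, column, -timestamp); the negated timestamp
        # makes the newest version of a cell sort first.
        self._keys = []
        self._values = {}

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column):
        """Return the newest value of one cell, or None if it is absent."""
        i = bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan_row(self, row):
        """Return {column: newest value} for one row via a contiguous scan."""
        result = {}
        i = bisect_left(self._keys, (row,))
        while i < len(self._keys) and self._keys[i][0] == row:
            column = self._keys[i][1]
            # The first hit per column is the newest version; keep it.
            result.setdefault(column, self._values[self._keys[i]])
            i += 1
        return result


table = SortedColumnMap()
table.put("row1", "info:name", 1, "old name")
table.put("row1", "info:name", 2, "new name")
table.put("row1", "info:city", 1, "Oslo")
table.put("row2", "info:name", 1, "someone else")
```

With the key-value store you can only fetch the whole value for "user:42". With the sorted map you can ask for a single cell, such as table.get("row1", "info:name"), or scan everything under one row key with table.scan_row("row1").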
@Paul XML databases should maybe be included under the NoSQL umbrella. But I guess it boils down to how they are queried and how they store data: if they are glorified RDBMSs queried with SQL, with an XML overlay, they definitely don't belong, but native XML implementations with their own storage and query models perhaps belong as a separate category.
I'm not familiar enough with the products and made a decision not to include them in the post for time reasons mainly. I also noticed that other blogs tracking nosql (like http://nosql-database.org/ and http://nosql.mypopescu.com/) do not include them in the core overview (NoSQL databases mentions them at the bottom, though). I guess the answer is it depends on your viewpoint and definition of NoSQL.
Hi Knut,
Thanks for the clarification.
Azure may not be mentioned but it is ostensibly a key-value store so perhaps it should be.
Regards
Splunk is a popular, high-performance search engine optimized for IT data with late/lazy binding of columns at search-time. It allows highly unstructured data to be inserted without having to have a rigid schema. There's now an open-source, free, MySQL front-end so it can be accessed with SQL/ODBC. Perhaps you can add it to your list. Thank you.
Overview:
http://blogs.splunk.com/2010/02/10/sql-splunk-splunkmse/
Project:
http://bitbucket.org/rdas/splunkengine/wiki/Home
@knut - Nice summary
Is there a particular reason why you missed GigaSpaces in your summary?
Note that:
1. We were categorized as one of the leaders in in-memory elastic caching by Forrester. See my recent post on that: WTF is Elastic Data Grid? (By Example)
2. We recently added Memcached support: Did Someone Say GigaSpaces Now Has Memcached Support?
3. We also added document based support: See details here
And we're probably the only NoSQL implementation (except for Google App Engine, which provides JPA on top of their BigTable implementation) that provides SQL (JDBC) support as well.
I have also written quite a bit on that topic myself, as you can see here, so I'm curious to know how that slipped under your radar.
Other than that, it's a pretty useful summary, so thanks for the effort and the writeup.
Cheers
Nati S.
@Nati
I don't know how it slipped under the radar. At the time of writing I started out with the overviews from nosql-database.org and nosql.mypopescu.com and did some additional digging around on the interwebs. Perhaps low (at the time) google rank for the search terms I used can explain some of it.
I can assure you it wasn't intentional, since I set out to cover them all. But finding them "all" and deciding when I had "all" is no easy task. Several new ones have popped up since the time of writing too.