Saturday, March 19, 2011

noSQL (Document-based) or Distributed Cache - a Practitioner's tale.

As numerous discussions at Shutterfly and on various online posts and this blog by Greg Luck indicates, there is definitely some interest in the market in terms of understandings where Distributed Cache and NoSQL Movement use-cases intersect and where they don't.

Of course, both aim to offload the RDBMS strong-hold in the enterprise by either
  • Deflection (i.e. minimizing direct access to the RDBMS) - i.e. a proxy to the RDBMS or
  • Avoidance (i.e. allowing for some types of data to not be stored in the RDBMS to begin with).

PURE-L2 Architecture:

Usage of L1-L2 analogies in distributed-cache literature stems from multi-level cache hierarchies described in CPU architectures - basically the L1 cache is local to your client application process and the L2 is an off-host/off-process manager of cache-data.

See this animated GIF -

As can be seen, whether the off-host process that manages the cache-data is MongoD or MemcacheD or Terracotta-Server, architecturally they all look equivalent - i.e. a pure-L2 with no-L1 - so that all data needs to be retrieved from over the network and then massaged into a POJO for consumption by the application. (FYI - I used these specific technology implementations as proxies for NoSQL and Distributed Cache since I have good familiarity with Mongo (Document Based noSQL), Terracotta-Ehcache and Memcache - given that we use these 3 technologies in our stack at Shutterfly).

Given the architectural equivalence here and our runtime-observations, one could argue that in a pure-L2 architecture, using No-SQL technologies seems like the way to go (especially if there is a requirement to view the data outside of the confines of the application's Domain Model - e.g MongoDB Shell). Or at least, drop them into the same pool of candidate technologies and map features as they relate to your use-case.

For us definitely, with Mongo-DB as L2, in certain use cases
- we observed good response times (see attached image)
- we experienced good query expressability
- we could validate data manipulations expected off the application easily from the MongoDB Shell.

Although there was definitely the domain model/ BSON impedance mismatch to be overcome with Morphia annotations and APIs. For a comparison along some common dimensions, see the table below:

DimensionMongoDMemcacheDTerracotta Ehcache
RetrievalBSON from MongoD that can be mapped via Morphia into a POJOSerialized version from MemcacheD - that needs to be de-serialized via Java-Serializaton as a POJOObject DNA from Terracotta Server - available at the application transparently as an Ehcache-Element.
Latency (as seen by app)
1ms - more1ms - more1ms - more
Horizontal ScalingMongo Replica Sets (available in open-source)Multiple MemcacheD (available in open-source)Terracotta Server Cluster Array (not Available in open-source, in enterprise-kit
High AvailabilityMongo Replica Sets (available in open-source)Not a good story - no HA available for MemcacheD (change to MemcacheDB but that has its own costs)Terracotta Server Cluster Array (not Available in open-source, in enterprise-kit
Query ExpressabilityVery good - see Mongo Querying and Advanced QueryingKey-Value lookup only Key-Value although Ehcache 2.5 and onward now offer some search capability via searchability API

Of course, there are use cases where cache access in the order of milli-seconds is a non-starter - you want cache-access in the order of mirco-seconds and hence if Network-hops and Impedance-mismatches can be avoided, you want to avoid them. A cache is about local data - and assuming your application has good locality characteristics, you want to exploit L1 caching and in such a case a L1-L2 architecture offered by some of the distributed cache implementations (Terracotta-Ehcache, Oracle-Coherence) or a bloated L1 architecture (Terracotta-BigMemory) could well be the way to go.

If you have experiences or comments on MongoDB (or other noSQL) and distributed cache usage comparisons, feel free to share them or comment.

About Me

I have spent the last 11+ years of my career either working for a vendor that provides infrastructure software or managing Platform Teams that deal with software-infrastructure concerns at large online-presences (e.g. at and currently at Shutterfly) - am interested in building and analyzing systems that speak to cross-cutting concerns across application development in the enterprise (scalability, availability, security, manageability, latency etc.)