Of course, both aim to loosen the RDBMS stranglehold in the enterprise, either through
- Deflection (minimizing direct access to the RDBMS, i.e., acting as a proxy in front of it), or
- Avoidance (allowing some types of data to never be stored in the RDBMS in the first place).
PURE-L2 Architecture:
The L1-L2 analogy in distributed-cache literature stems from the multi-level cache hierarchies of CPU architectures: the L1 cache is local to your client application process, while L2 is an off-host/off-process manager of the cache data.
See this animated GIF:
As can be seen, whether the off-host process that manages the cache data is MongoD, MemcacheD, or a Terracotta Server, architecturally they all look equivalent: a pure L2 with no L1, so all data must be retrieved over the network and then massaged into a POJO for consumption by the application. (FYI: I used these specific implementations as proxies for NoSQL and distributed caches because I have good familiarity with MongoDB (a document-oriented NoSQL store), Terracotta Ehcache, and Memcached, all three of which we use in our stack at Shutterfly.)
Given this architectural equivalence and our runtime observations, one could argue that in a pure-L2 architecture, NoSQL technologies are the way to go, especially if there is a requirement to view the data outside the confines of the application's domain model (e.g., from the MongoDB shell). At the very least, drop them into the same pool of candidate technologies and map their features against your use case.
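To make the pure-L2 read path concrete, here is a minimal, self-contained sketch. The `RemoteStore` below is a stand-in for MongoD/MemcacheD/a Terracotta server (not a real client library), and the key/class names are illustrative; the point is that with no L1, every read pays the round-trip plus the cost of massaging bytes back into a POJO.

```java
import java.io.*;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Pure-L2 sketch: the client holds no local (L1) copy, so every read
// crosses the "network" to the cache server, and the returned bytes
// must be deserialized into a POJO before the application can use them.
public class PureL2Demo {

    // Simulated off-host store: serialized bytes keyed by String.
    public static class RemoteStore {
        private final Map<String, byte[]> data = new ConcurrentHashMap<>();
        public void put(String key, byte[] value) { data.put(key, value); }
        public byte[] get(String key) { return data.get(key); }
    }

    // The POJO the application consumes.
    public static class User implements Serializable {
        public final String name;
        public User(String name) { this.name = name; }
    }

    public static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        RemoteStore l2 = new RemoteStore();
        l2.put("user:42", serialize(new User("ada")));

        // Every read pays the round-trip + deserialization cost.
        User u = (User) deserialize(l2.get("user:42"));
        System.out.println(u.name); // prints "ada"
    }
}
```

Swap the `RemoteStore` for a Memcached client and this is essentially the MemcacheD row of the table below; swap it for a BSON-returning driver and it is the MongoD row.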
For us, with MongoDB as the L2, in certain use cases:
- we observed good response times (see attached image),
- we had good query expressability, and
- we could easily validate the data manipulations expected of the application from the MongoDB shell.
There was, admittedly, a domain-model/BSON impedance mismatch to overcome with Morphia annotations and APIs. For a comparison along some common dimensions, see the table below:
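For a feel of what that mapping layer does, here is a hand-rolled sketch of the document-to-POJO hydration that Morphia automates with its annotations. A plain `Map` stands in for a BSON document, and the `User` fields are illustrative, not our actual schema.

```java
import java.util.HashMap;
import java.util.Map;

// Hand-rolled sketch of the document/POJO mapping that Morphia's
// annotations and APIs automate. A plain Map stands in for a BSON document.
public class DocumentMappingDemo {

    public static class User {
        public String name;
        public int age;
    }

    // Hydrate a POJO from a BSON-like document.
    public static User fromDocument(Map<String, Object> doc) {
        User u = new User();
        u.name = (String) doc.get("name");
        u.age = (Integer) doc.get("age");
        return u;
    }

    // Flatten a POJO back into a BSON-like document.
    public static Map<String, Object> toDocument(User u) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", u.name);
        doc.put("age", u.age);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "ada");
        doc.put("age", 36);
        User u = fromDocument(doc);
        System.out.println(u.name + " " + u.age); // prints "ada 36"
    }
}
```

The impedance mismatch is exactly this boilerplate (plus type coercions, nested documents, and collections); Morphia pushes it into annotations so the application deals only in POJOs.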
| Dimension | MongoD | MemcacheD | Terracotta Ehcache |
| --- | --- | --- | --- |
| Retrieval | BSON from MongoD, mapped via Morphia into a POJO | Serialized bytes from MemcacheD, which must be deserialized via Java serialization into a POJO | Object DNA from the Terracotta Server, available to the application transparently as an Ehcache Element |
| Latency (as seen by app) | 1 ms or more | 1 ms or more | 1 ms or more |
| Horizontal scaling | Mongo replica sets (available in open source) | Multiple MemcacheD instances (available in open source) | Terracotta Server Array (not available in open source; enterprise kit only) |
| High availability | Mongo replica sets (available in open source) | Not a good story: no HA for MemcacheD (switching to MemcacheDB is an option, but has its own costs) | Terracotta Server Array (not available in open source; enterprise kit only) |
| Query expressability | Very good; see Mongo Querying and Advanced Querying | Key-value lookup only | Key-value, although Ehcache 2.5 onward offers some search capability via the Search API |
Of course, there are use cases where cache access on the order of milliseconds is a non-starter: you want cache access on the order of microseconds, so if network hops and impedance mismatches can be avoided, you want to avoid them. A cache is about local data, and assuming your application has good locality characteristics, you want to exploit L1 caching. In such a case, the L1-L2 architecture offered by some distributed cache implementations (Terracotta Ehcache, Oracle Coherence), or a bloated-L1 architecture (Terracotta BigMemory), could well be the way to go.
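The L1-L2 read path can be sketched in a few lines. The `l2` function below simulates the remote store (it is not a real Terracotta or Coherence client), and the counter just makes the point visible: once a key is in L1, repeat reads never leave the process.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of the L1-L2 pattern: reads hit a local in-process map (L1)
// first and only fall through to the remote store (L2) on a miss.
public class TwoLevelCache<K, V> {

    private final Map<K, V> l1 = new ConcurrentHashMap<>(); // local, microsecond access
    private final Function<K, V> l2;                        // remote, millisecond access

    public TwoLevelCache(Function<K, V> l2) { this.l2 = l2; }

    public V get(K key) {
        // computeIfAbsent populates L1 on a miss, so repeat reads stay local.
        return l1.computeIfAbsent(key, l2);
    }

    public static void main(String[] args) {
        final int[] remoteHits = {0};
        TwoLevelCache<String, String> cache = new TwoLevelCache<>(key -> {
            remoteHits[0]++;               // simulate the network round-trip
            return "value-for-" + key;
        });
        cache.get("a");
        cache.get("a");                    // served from L1; no second remote hit
        System.out.println(remoteHits[0]); // prints 1
    }
}
```

What this sketch deliberately omits is the hard part: keeping the L1 copies on many client nodes coherent with the L2 (invalidation, eviction, locality-aware faulting), which is precisely what products like Terracotta Ehcache and Oracle Coherence provide.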
If you have experiences or comments comparing MongoDB (or other NoSQL stores) with distributed caches, feel free to share them or comment.
Cheers,