Sunday, November 30, 2008

Distributed Caching/Database Offload with Terracotta DSO


You've probably heard about how Terracotta DSO (Terracotta's flagship general-purpose JVM-level clustering technology) can help you offload your database. You have a variety of choices depending on your implementation and your goals. This article tries to bring them all together in one place so that you can more easily determine which Terracotta offering best suits your needs.

See the attached diagram for a decision matrix of sorts. The Green Lozenges lead you to products on terracotta.org that might best fit your stack.


In a nutshell:
  • At a high level, the first step is to acknowledge the problem with the implementation.
  • This problem will typically manifest itself as UNACCEPTABLE LATENCY associated with certain application transactions OR as DATABASE RESOURCES BEING PEGGED (e.g. CPU running too high for the DBA's/SysAdmin's comfort).
  • It follows that if the data the application is mutating is "close" to the application, latency is much better than if the data has to be fetched over the network from the SGA or disk of your Oracle database instance. This data may rightfully live in the database, but latency still improves greatly if all operations are done in memory against pre-fabricated Domain Model Objects. Enter CACHING. Your application presumably runs on a cluster of JVMs, so caching only locally on each JVM leaves the caches inconsistent with one another whenever the caches and/or the database also receive updates to the underlying data. Enter DISTRIBUTED CACHING, where all read and write access occurs in memory across all the cluster JVMs and is kept in sync with the database by "writing through" or "writing behind". So decide whether you just need caching (e.g. a read-only data set) or distributed caching (a data set that is also being modified by the application). For a read-only data set, a distributed cache only helps by cutting the query load on the database to roughly 1/N of what per-JVM local caches would generate, where N is the number of client JVMs in the cluster (each entry is loaded once for the cluster rather than once per JVM).
  • Once you've established that Distributed Caching is the solution, decide whether changes need to be persisted to the database and, if so, whether they need to be written synchronously (write-through) or asynchronously (write-behind).
  • Decide whether the application can tolerate any incorrectness amongst threads on different JVMs reading from and writing to the cache. In my opinion, Terracotta is really the only solution that still delivers scale and performance when the application cannot tolerate ANY incoherent concurrent access.
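The write-through versus write-behind choice can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Terracotta's implementation: the `Store` interface, the two cache classes, and `flush()` are hypothetical names invented for this example, Java 8 or later is assumed, and a real write-behind cache would drain its queue from a background thread and deal with failures and retries.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class WritePolicySketch {

    /** Hypothetical stand-in for the database (the system of record). */
    interface Store {
        void save(String key, String value);
    }

    /** Write-through: the caller waits until the store has been updated. */
    static class WriteThroughCache {
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final Store store;

        WriteThroughCache(Store store) { this.store = store; }

        void put(String key, String value) {
            store.save(key, value);  // synchronous write to the system of record
            cache.put(key, value);   // then update the in-memory copy
        }

        String get(String key) { return cache.get(key); }
    }

    /** Write-behind: update memory immediately, queue the store write for later. */
    static class WriteBehindCache {
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final BlockingQueue<String[]> pending = new LinkedBlockingQueue<>();
        private final Store store;

        WriteBehindCache(Store store) { this.store = store; }

        void put(String key, String value) {
            cache.put(key, value);                   // caller returns right away
            pending.add(new String[] {key, value});  // store write is deferred
        }

        String get(String key) { return cache.get(key); }

        /** Drains the queued writes to the store and returns how many were written. */
        int flush() {
            int written = 0;
            String[] entry;
            while ((entry = pending.poll()) != null) {
                store.save(entry[0], entry[1]);
                written++;
            }
            return written;
        }
    }

    public static void main(String[] args) {
        Map<String, String> db = new ConcurrentHashMap<>();  // pretend database

        WriteThroughCache through = new WriteThroughCache(db::put);
        through.put("order-1", "NEW");
        System.out.println("write-through, db has order-1: " + db.containsKey("order-1"));

        WriteBehindCache behind = new WriteBehindCache(db::put);
        behind.put("order-2", "NEW");
        System.out.println("write-behind before flush, db has order-2: " + db.containsKey("order-2"));
        behind.flush();
        System.out.println("write-behind after flush, db has order-2: " + db.containsKey("order-2"));
    }
}
```

The trade-off is visible in `put`: write-through pays the database latency on every write but never lets the database fall behind, while write-behind answers from memory and leaves a window in which the database lags the cache.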
In summary:
A> There are several technologies that enable "DISTRIBUTED CACHES". So in that sense, yes, Terracotta is one of many "distributed cache" products on the market, but Terracotta's Network Attached Memory innovation and implementation offer some unique and powerful advantages:
  1. Preservation of your Java Programming Model - you write natural Java, with no new API to program to. You can thus cluster your own custom caches, be they implemented as HashMaps, Hashtables, ConcurrentHashMaps, LinkedHashMaps etc.
  2. Correctness guarantees (as defined by the Java Memory Model) for access across the cluster of participating JVMs.
  3. Ready-made clustering solutions (Terracotta Integration Modules - TIMs) for popular OSS caches (e.g. Ehcache).
  4. Ready-made support for ORMs (Terracotta Integration Modules - TIMs): Hibernate (Detached Mode, 2nd Level Cache), iBATIS etc.
  5. "Intelligent replication" to minimize network chatter and replication payloads: bytecode injection guarantees that only mutated object fields are shipped, always to at least one other element in the network (the Terracotta Server), and to more than one if other elements in the cluster reference that field.
  6. Ability for the cache to exceed the size of the heap on a given application JVM without the need for partitioning (assuming somewhat Gaussian access patterns across the key space).
  7. In-depth monitoring and visibility via the Developer Console and the Terracotta Operations Center.
  8. An L1 (component in the client JVM) / L2 (dedicated Terracotta JVM) design with persistence affords a higher level of HA - e.g. the cache is preserved even if all JVMs are lost. HA is in the box, in that no shared disk needs to be provisioned.
  9. JMX events and hooks to respond programmatically to cluster situations such as a client JVM failure or the introduction of a new client JVM.
  10. An OSS model of distribution, and many others.
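To make point 1 concrete, here is the kind of "natural Java" cache that point describes: an ordinary map guarded by ordinary synchronization, with no caching API in sight. This is a minimal sketch; `ProductCache` and its `Loader` interface are hypothetical names made up for this example, and Java 8 or later is assumed for the lambda in `main`. As I understand DSO, which field is shared across JVMs and which methods act as cluster-wide lock boundaries is declared in Terracotta's configuration rather than in code, so none of that appears here.

```java
import java.util.HashMap;
import java.util.Map;

public class ProductCache {

    /** Hypothetical stand-in for a database lookup. */
    public interface Loader {
        String load(long id);
    }

    // An ordinary JDK map; this field is what one would nominate for clustering.
    private final Map<Long, String> byId = new HashMap<>();
    private final Loader loader;

    public ProductCache(Loader loader) {
        this.loader = loader;
    }

    /** Read-through get: loads from the backing source only on a cache miss. */
    public synchronized String get(long id) {
        String value = byId.get(id);
        if (value == null) {
            value = loader.load(id);
            byId.put(id, value);
        }
        return value;
    }

    /** Plain Java mutation guarded by the same monitor as get(). */
    public synchronized void update(long id, String value) {
        byId.put(id, value);
    }

    public synchronized int size() {
        return byId.size();
    }

    public static void main(String[] args) {
        ProductCache cache = new ProductCache(id -> "product-" + id);
        System.out.println(cache.get(42L));  // miss: goes to the loader
        System.out.println(cache.get(42L));  // hit: served from memory
        cache.update(42L, "renamed");
        System.out.println(cache.get(42L));
    }
}
```

On a single JVM the `synchronized` methods give the usual Java Memory Model visibility guarantees between threads; point 2 is the claim that the same guarantees carry over to threads on different JVMs once the map is clustered.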
In addition,
B> Evaluate whether certain data sets in the database really need to be persisted in the database, or whether the persistence afforded by Terracotta suffices. Typically, data should be in your RDBMS/System of Record:
  • If it is long-lived and business-critical. See http://tech.puredanger.com/2008/08/01/thinking-about-data-lifetimes/
  • If the data needs to be reported on.
  • If the data needs to be queried extensively along many diverse criteria. SQL is much more expressive than OQL or any such option (e.g. if you have an entity X with 3 attributes and Y with 4 attributes, where 1 attribute is common across them, then you can query X alone in 3C1 + 3C2 + 3C3 = 7 ways, Y alone in 4C1 + 4C2 + 4C3 + 4C4 = 15 ways, and X & Y together in multiple other ways).
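The counts in the last bullet are simply the number of non-empty subsets of an entity's attributes, i.e. 2^n - 1 for n attributes. A small sketch to check the arithmetic (the class and method names are made up for this example):

```java
public class QueryCombinations {

    /** Binomial coefficient nCk, computed incrementally so each step stays an integer. */
    static long choose(int n, int k) {
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    /** Number of ways to pick at least one of n attributes as query criteria. */
    static long nonEmptySubsets(int n) {
        long total = 0;
        for (int k = 1; k <= n; k++) {
            total += choose(n, k);
        }
        return total;  // equals 2^n - 1
    }

    public static void main(String[] args) {
        System.out.println("X alone (3 attributes): " + nonEmptySubsets(3)); // 7
        System.out.println("Y alone (4 attributes): " + nonEmptySubsets(4)); // 15
    }
}
```

The count roughly doubles with every attribute added, which is why ad hoc querying over wide entities is better left to SQL.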
In cases other than the ones mentioned above, there is a strong possibility that you could just work off memory, assuming some technology like Terracotta gives you HA for that chunk of memory. An advantage apart from DB offload is a simpler code base: one simply works off the Java Domain Model, and Terracotta delivers HA and correctness of access for those data structures, without the need to involve a database and the marshalling/unmarshalling code that overcomes the impedance mismatch between the object world and the relational world. This has been proven out on several occasions now - you can read more at http://www.terracotta.org/web/display/orgsite/Common+Use+Patterns

So hopefully, after reading this, it is easier for you to determine which Terracotta offering best solves your database offload problem.

About Me

I have spent the last 11+ years of my career either working for a vendor that provides infrastructure software or managing Platform Teams that deal with software-infrastructure concerns at large online presences (e.g. at Walmart.com and currently at Shutterfly). I am interested in building and analyzing systems that speak to cross-cutting concerns across application development in the enterprise (scalability, availability, security, manageability, latency etc.).