Friday, June 12, 2009

So... you want a Terracotta-based Distributed Cache

INTRODUCTION:

There are many parallels between the Eastern parable of the six blind monks and the elephant and Terracotta’s capabilities and usage. On the one hand, Terracotta ships a platform based on its flagship DSO (Distributed Shared Objects) technology, which can be exploited in a multitude of ways to build distributed systems out of plain POJOs from the J2SE JDK. On the other hand, it also ships many ready-made solutions (Terracotta Integration Modules – TIMs for short) that solve specific problems in the enterprise-computing landscape: TIMs exist for HTTP-session clustering, master-worker frameworks (i.e. Map/Reduce via tim-messaging), distributed job scheduling (via tim-quartz), distributed caching (via tim-concurrent-collections, tim-ehcache) and so on. An ever-increasing compendium of these "half-truths" of what Terracotta actually is can be found at http://forge.terracotta.org.

This article is focused on Distributed Cache implementations with Terracotta (i.e. Java-based implementations of in-memory data structures, backed by a data source such as a relational database) and builds on prior musings on the topic by Alex Miller and myself. It includes both POJO-based solutions and TIM-based solutions, with the latter affording you a faster time to market, as the description of TIMs above might lead you to conclude. It attempts to open the hood and describe how each distributed-cache feature is satisfied by Terracotta, describes the typical process around some common implementations within the community, and also highlights which interfaces are available to the application developer and why to choose one over another. I would also recommend reading the clustered data structures guide in conjunction with this blog to get a thorough understanding of the many ways you can skin this cat.



TYPICAL REQUIREMENTS OF A DISTRIBUTED CACHE:

When looking for a comprehensive Distributed Cache solution, customers consistently express the following requirements:

- Cache Specific Features: such as Correctness, Eviction, Disk Spill-over, Cache-Hydration, Querying, Eventing, Transactional Updates and Partitioning.

- Non Functional Features: such as High Availability, Scalability/Latency, Security, Visibility/Management and Replication capabilities with Geographically Dispersed Active or DR (Disaster Recovery) Data-centers. And in some rare cases, Cross Platform usability (e.g. Dot-Net, C++ and Java based cache-consumers)

The breakdown below describes how Terracotta delivers these requirements, based on the several hundred production deployments of this usage of Terracotta. Each feature is delivered at one or more of three layers: the Core DSO Platform, Terracotta TIMs, or externally (i.e. in the application/infrastructure).



Cache-Specific Features:

- Large Cache Sizes / Disk Spillover
  o Core DSO Platform: Terracotta Virtual Memory ensures that you can accommodate large caches that exceed the size of the client-JVM/Terracotta-Server-JVM heaps. Hot references are materialized in your local JVM heap, “warm” references in your Terracotta Server JVM heap, and everything is on disk (assuming the persistent-disk option is on).




- Cache Eviction
  o TIMs: TTL and TTI schemes are delivered via tim-map-evictor. Fixed-time expirations (e.g. empty the cache at 5 AM) and custom schemes can be hand-built. Note that the EHCache, H2LC, DistributedMap and tccache TIMs already have eviction built in.
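To make the "hand-built" option concrete, here is a minimal sketch of a fixed-time (5 AM) evictor over a clustered map. All names are hypothetical; it assumes the map has been declared a Terracotta root so that clear() is visible cluster-wide:

```java
import java.util.Calendar;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FixedTimeEvictor {

    private final ConcurrentMap<String, Object> cache; // assumed clustered via a Terracotta root
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public FixedTimeEvictor(ConcurrentMap<String, Object> cache) {
        this.cache = cache;
    }

    /** Clears the cache at the next 5 AM, then every 24 hours thereafter. */
    public void start() {
        Calendar fiveAm = Calendar.getInstance();
        fiveAm.set(Calendar.HOUR_OF_DAY, 5);
        fiveAm.set(Calendar.MINUTE, 0);
        fiveAm.set(Calendar.SECOND, 0);
        if (fiveAm.getTimeInMillis() <= System.currentTimeMillis()) {
            fiveAm.add(Calendar.DAY_OF_MONTH, 1); // already past 5 AM today
        }
        long initialDelay = fiveAm.getTimeInMillis() - System.currentTimeMillis();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                cache.clear(); // when clustered, the clear is seen by every JVM
            }
        }, initialDelay, TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
    }
}
```

In a cluster you would typically run this on a single JVM (or guard it with a clustered lock) so that the clear is not executed N times.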



- Cache Partitioning
  o Core DSO Platform: Terracotta Server Striping is available in FX (the paid version). The striped layer is basically multiple mirror groups, each of which contains one active Terracotta Server JVM and any number of mirror (i.e. hot-standby) Terracotta Server JVMs. See VIDEO.
  o External: One can also partition at the application tier, but then the partitioning scheme and routing have to be implemented by the application. Terracotta cluster events greatly ease partition management (refer to Terracotta Consulting if unsure).
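As a sketch of what application-tier partitioning can look like (all names hypothetical), keys can be hashed to one of N clustered maps, each of which could be homed in its own mirror group:

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;

public class PartitionedCache<V> {

    // e.g. one clustered map per mirror group / stripe
    private final List<ConcurrentMap<String, V>> partitions;

    public PartitionedCache(List<ConcurrentMap<String, V>> partitions) {
        this.partitions = partitions;
    }

    // The routing scheme must be deterministic and identical on every JVM.
    private ConcurrentMap<String, V> partitionFor(String key) {
        int index = (key.hashCode() & 0x7fffffff) % partitions.size();
        return partitions.get(index);
    }

    public V get(String key)           { return partitionFor(key).get(key); }
    public void put(String key, V val) { partitionFor(key).put(key, val); }
}
```

Cluster events would then be used to react to partitions (JVMs) joining and leaving.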






- Cache Correctness
  o Across threads in the same JVM and across JVM process boundaries:
    Core DSO Platform: the Java Memory Model is extended by DSO from "amongst threads in a single JVM process" to "amongst threads in a single JVM process and across JVM process boundaries".
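In practice this means the ordinary JMM idioms carry over unchanged. A sketch (names hypothetical, and assuming the map is declared a root and the method an autolock in tc-config.xml):

```java
import java.util.Map;

public class HitCounter {

    private final Map<String, Integer> hits; // assumed clustered as a Terracotta root

    public HitCounter(Map<String, Integer> hits) {
        this.hits = hits;
    }

    public void increment(String page) {
        // Under DSO this monitor acts cluster-wide: the same happens-before
        // guarantees the JMM gives threads in one JVM now span JVMs.
        synchronized (hits) {
            Integer n = hits.get(page);
            hits.put(page, n == null ? 1 : n + 1);
        }
    }
}
```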




  o Across JVM -> DB:
    TIMs: delivered via tim-async, which allows for asynchronous write-behind to the database. In the case of H2LC, Hibernate writes to the 2nd-level cache on commit; Terracotta is a transactional 2nd-level cache provider.
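The shape of the write-behind pattern, for illustration (this is not the tim-async API, just a hand-rolled equivalent with hypothetical names): cache writers enqueue an update and return immediately, while a worker drains the queue into the database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.concurrent.BlockingQueue;
import javax.sql.DataSource;

public class WriteBehindWorker implements Runnable {

    private final BlockingQueue<long[]> pending; // [id, balance]; assumed clustered
    private final DataSource dataSource;

    public WriteBehindWorker(BlockingQueue<long[]> pending, DataSource dataSource) {
        this.pending = pending;
        this.dataSource = dataSource;
    }

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                long[] update = pending.take(); // writers offer() and return immediately
                Connection c = dataSource.getConnection();
                try {
                    PreparedStatement ps =
                        c.prepareStatement("update account set balance = ? where id = ?");
                    ps.setLong(1, update[1]);
                    ps.setLong(2, update[0]);
                    ps.executeUpdate();
                } finally {
                    c.close();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (Exception e) {
                // a real implementation re-queues or dead-letters the failed update
            }
        }
    }
}
```

tim-async adds the once-and-only-once delivery guarantees discussed under Transactional Cache Updates below.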



  o Across DB -> JVM:
    External: typically implemented externally, where database triggers keep track of DML changes and publish them to a JMS topic; one client JVM subscribes to this topic and, upon consuming each message, updates the cache. The replication of this change is then done by Terracotta (assuming the cache is clustered via Terracotta DSO). Use Terracotta cluster events to inoculate against subscriber-JVM failures.
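A sketch of that subscriber JVM (the topic and message shape are made up), assuming the triggers publish the changed key and value as a JMS MapMessage:

```java
import java.util.concurrent.ConcurrentMap;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MapMessage;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.Session;
import javax.jms.Topic;

public class DbChangeSubscriber implements MessageListener {

    private final ConcurrentMap<String, String> cache; // assumed Terracotta-clustered

    public DbChangeSubscriber(ConcurrentMap<String, String> cache) {
        this.cache = cache;
    }

    public void start(ConnectionFactory factory, Topic dmlTopic) throws JMSException {
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        session.createConsumer(dmlTopic).setMessageListener(this);
        connection.start();
    }

    public void onMessage(Message message) {
        try {
            MapMessage m = (MapMessage) message;
            cache.put(m.getString("key"), m.getString("value"));
            // One put here; Terracotta replicates it to every other JVM.
        } catch (JMSException e) {
            // log and continue; a bad message should not kill the subscriber
        }
    }
}
```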






- Cache Eventing
  o Core DSO Platform / TIMs: could be implemented via distributed wait/notify, or DMI and cluster events (refer to Terracotta Consultants).
  o External: or via a custom event-publication mechanism (Terracotta gives you JVM-level clustering, after all, so Lists, Queues etc. can all be clustered just as easily as a Map).
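For example, a clustered queue makes a serviceable event channel; a minimal sketch (hypothetical names, queue assumed clustered as a root):

```java
import java.util.concurrent.BlockingQueue;

public class CacheEventBus {

    public interface Listener {
        void onEvent(String event);
    }

    private final BlockingQueue<String> events; // assumed clustered as a Terracotta root

    public CacheEventBus(BlockingQueue<String> events) {
        this.events = events;
    }

    public void publish(String event) {
        events.offer(event);
    }

    // A consumer thread on any JVM in the cluster can drain the channel.
    public void consume(Listener listener) throws InterruptedException {
        while (true) {
            listener.onEvent(events.take());
        }
    }
}
```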






- Cache Visibility and Management
  o Core DSO Platform: implemented via:
    - the Terracotta Developer Console
    - the Terracotta Operations Center (FX/paid only)
    - the JMX API
    - cluster events
    - the Locality API
  o TIMs: the H2LC TIM features a specialized management console based on Hibernate statistics (TC 3.1 onwards) that pinpoints queries being offloaded from the database, cache hit ratios etc., and is very useful in determining the efficacy of the 2nd-level cache in Hibernate-based apps.
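Application-level cache statistics can also be surfaced next to Terracotta's own beans through a plain standard MBean; a minimal sketch (the ObjectName and attributes are made up):

```java
// CacheStatsMBean.java
public interface CacheStatsMBean {
    int getSize();
    long getHitCount();
}

// CacheStats.java
import java.lang.management.ManagementFactory;
import java.util.Map;
import javax.management.ObjectName;

public class CacheStats implements CacheStatsMBean {

    private final Map<?, ?> cache;
    private volatile long hits;

    public CacheStats(Map<?, ?> cache) {
        this.cache = cache;
    }

    public int getSize()      { return cache.size(); }
    public long getHitCount() { return hits; }
    public void recordHit()   { hits++; }

    public void register() throws Exception {
        ManagementFactory.getPlatformMBeanServer()
            .registerMBean(this, new ObjectName("com.example:type=CacheStats"));
    }
}
```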







- Cache Indexing/Querying
  o TIMs: Searchable Map exports an enhanced Map interface and supports searches via a local Lucene data store, whose disk-based index is kept in sync across JVMs with Terracotta DSO.
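Absent tim-searchable, the common hand-rolled fallback is a brute-force scan over the map's values; a sketch that makes the trade-off obvious (O(n) per query, versus an index lookup):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MapQuery {

    public interface Predicate<V> {
        boolean matches(V value);
    }

    // Every query walks the whole value set; an index (Lucene, in the
    // Searchable Map case) turns this into a lookup instead.
    public static <K, V> List<V> select(Map<K, V> cache, Predicate<V> predicate) {
        List<V> results = new ArrayList<V>();
        for (V value : cache.values()) {
            if (predicate.matches(value)) {
                results.add(value);
            }
        }
        return results;
    }
}
```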





- Transactional Cache Updates
  o TIMs: the Terracotta Hibernate 2nd-level cache implementation supports (shortly) a transactional cache. Similarly, tim-async's implementation provides guarantees around once-and-only-once writes to the database with no transaction loss.
  o External: also often implemented in the app currently. E.g. in the case of a write-through cache, the clustered cache is updated only after the transaction has been committed to the database. Or, in the case of queues to be flushed to the db, double-headed queues are popular – i.e. "txn-begin = pop-work-off-queue1", process, "txn-end = pop-work-off-queue2" (refer to Terracotta Consultants).
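A sketch of the write-through ordering described above (schema and names are made up): the clustered cache only ever sees values the database has durably accepted.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.concurrent.ConcurrentMap;
import javax.sql.DataSource;

public class WriteThroughDao {

    private final DataSource dataSource;
    private final ConcurrentMap<Long, Long> balanceCache; // assumed clustered

    public WriteThroughDao(DataSource dataSource, ConcurrentMap<Long, Long> cache) {
        this.dataSource = dataSource;
        this.balanceCache = cache;
    }

    public void updateBalance(long id, long balance) throws Exception {
        Connection c = dataSource.getConnection();
        try {
            c.setAutoCommit(false);
            PreparedStatement ps =
                c.prepareStatement("update account set balance = ? where id = ?");
            ps.setLong(1, balance);
            ps.setLong(2, id);
            ps.executeUpdate();
            c.commit();                    // 1. commit to the database first...
            balanceCache.put(id, balance); // 2. ...only then publish to the clustered cache
        } catch (Exception e) {
            c.rollback();                  // a failed txn never reaches the cache
            throw e;
        } finally {
            c.close();
        }
    }
}
```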





- Cache Preloading
  o Core DSO Platform: achieved via fault-count tuning. It is a global setting, though (consultants know how to change this reflectively at run-time), and more features are on the way to make cache preloading across client JVMs/Terracotta Server JVMs virtually painless. Alternatively, one could implement Maps with non-partial behavior, so that the entire cache can be faulted in one go into the client JVM from the Terracotta Server JVM (or from disk into the Terracotta Server JVM).
  o External: the solutions above are about preloading between the Terracotta Server and client JVMs. To preload from the data source, cache warmers currently have to be written at the application tier. Note that you should be careful about coarsening your locks within your application code in the case of bulk loads. Expect more TIM offerings here soon (refer to Terracotta Consultants for now).
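A cache-warmer sketch illustrating the lock-coarsening advice (hypothetical schema): acquiring one coarse lock per batch instead of one per entry keeps distributed-lock traffic down during the bulk load.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;

public class CacheWarmer {

    private static final int BATCH_SIZE = 1000;

    public void warm(DataSource ds, Map<Long, Long> cache) throws SQLException {
        Connection c = ds.getConnection();
        try {
            Statement s = c.createStatement();
            ResultSet rs = s.executeQuery("select id, balance from account");
            Map<Long, Long> batch = new HashMap<Long, Long>(BATCH_SIZE);
            while (rs.next()) {
                batch.put(rs.getLong(1), rs.getLong(2));
                if (batch.size() == BATCH_SIZE) {
                    flush(cache, batch);
                }
            }
            flush(cache, batch); // remainder
        } finally {
            c.close();
        }
    }

    // One coarse lock per batch: far fewer distributed lock acquisitions
    // than locking around every individual put.
    private void flush(Map<Long, Long> cache, Map<Long, Long> batch) {
        synchronized (cache) {
            cache.putAll(batch);
        }
        batch.clear();
    }
}
```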

Cache Non-Functional Features:

- High Availability
  o Unplanned downtime:
    Core DSO Platform: Terracotta DSO + Network Active Passive technology ensures that you have no single point of failure in your architecture and that you are protected against some double/multiple failures. In persistent mode, all state received by the Terracotta server is written to disk as well, giving you much higher levels of HA than traditional, pure-N-memory-locations-based approaches.
  o Planned downtime:
    Core DSO Platform: Terracotta helps you with several planned-downtime scenarios in that you can execute rolling upgrades across the cluster. More detailed discussions are documented in the Terracotta run-books (documentation included in paid versions).






- Security
  o External: follow Terracotta Security Best Practices.






- Disaster Recovery
  o Core DSO Platform: could be implemented via Terracotta Backup (available in FX, the paid version) or via an additional hot-standby node in the DR site (assuming it is OK to impact every live transaction by a marginal amount). See notes…on this topic.
  o External: could be implemented in the application via client JVMs publishing via JMS to a receiving JVM in the DR site.






- Geographically Dispersed Data Centers
  o External: recommend local clusters and application-implemented JMS-based mechanisms (Apache ActiveMQ, for example) if synchronization is needed (or refer to Terracotta Consultants).






- Heterogeneous Platform Support (.NET, C#, C++, Java-based clients etc.)
  o External: Terracotta is a Java-only solution; however, non-Java clients can talk to a Java-based SOAP service that shares state with the Java-based cluster. Of course, object-identity contracts would be broken. (Or refer to Terracotta Consultants.)

Note that Scalability/Latency has deliberately not been included as a non-functional feature, since IMHO it is an emergent property of the architecture/implementation:

- First off, the fact that application threads execute I/O against shared in-memory data structures rather than against the RDBMS (or other data store) automatically gives you significantly better latencies and scale.

- Additionally, Terracotta replicates very smartly, with only fine-grained, field-level changes darting around the network, and only to those JVMs that have the relevant references materialized.

- Thirdly, Terracotta implements greedy locking (much like the JVM’s biased locking), lock GC and lock-lease policies, which ensure no lock starvation and significantly reduce distributed-lock overheads.

- Fourthly, if the cache size is too large to fit onto a single Terracotta Server JVM’s heap and you are faced with demanding latency SLAs, you can stripe the Terracotta Server tier to ensure that all clustered data fits onto heap on the Terracotta Server JVMs.

Terracotta thus offers you an extremely scalable implementation, where you can scale out the number of client JVMs and the number of Terracotta-Server-JVM mirror groups based on the application load and the amount of clustered I/O it executes – all on commodity hardware with the commodity HotSpot JVM you have grown to love.






DEVELOPER OPTIONS:

As a developer you have many options in terms of which cache interface you choose: POJOs, Spring and TIMs (modules). If anything, the trick with Terracotta is choosing the right implementation for your use-case out of the several available. The section below tries to answer that specific question.

Note that –

- Non-functional features such as

o Cache-Correctness (intra- and inter-JVM – Terracotta provides the highest levels of correctness out of the box, since no other product tries to honor the JMM across process boundaries),

o HA (very high-levels of HA given Disk Persistence and NAP),

o Scale

- And some cache-specific features such as:

o Cache-Pre-Load

o Disk-Spillover

o Partitioning

- Are provided at the core-platform layer – i.e. via Terracotta’s flagship technology of JVM-level clustering (DSO, or Distributed Shared Objects).

- Some features are to be implemented as Best Practices, externally in the application and/or infrastructure:

o WAN/DR Support,

o Security

So they are not depicted in the breakdown below. Note that you can use an existing TIM or package your own integration as a TIM.

APPLICATION INTERFACES:

For each application interface below: when to use it, which TIMs you need, and which of the four cache features (eviction via TTL/TTI, cache-specific visibility/management, querying, eventing) the interface itself covers.

- DSO Platform (regardless of the interface chosen) provides: cache-specific visibility/management via the Locality API and tools such as the TDC and the Ops Center, and eventing via cluster events and the JMX API.

Terracotta Modules/TIMs:

- H2LC
  o When to use: the application ORMs to the database via Hibernate and is in need of a highly scalable, available, distributed 2nd-level cache – i.e. a Hibernate accelerator (TC 3.1).
  o TIMs needed: tim-hibernate, tim-hibernate-cache, tim-map-evictor, tim-concurrent-collections.
  o Cache features: eviction (built in); visibility/management (via Hibernate statistics).
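Wiring-wise, enabling the 2nd-level cache is standard Hibernate configuration; a hedged sketch (take the exact provider class name and property set from the tim-hibernate-cache documentation for your Terracotta version):

```java
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateCacheSetup {

    public static SessionFactory build() {
        Configuration cfg = new Configuration().configure(); // reads hibernate.cfg.xml
        cfg.setProperty("hibernate.cache.use_second_level_cache", "true");
        cfg.setProperty("hibernate.cache.use_query_cache", "true");
        // Illustrative only: the Terracotta provider class name should be
        // taken from the TIM documentation, not from this sketch.
        cfg.setProperty("hibernate.cache.provider_class",
                "org.terracotta.hibernate.TerracottaHibernateCacheProvider");
        return cfg.buildSessionFactory();
    }
}
```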


- EHCACHE
  o When to use: the application is agnostic to the cache-store implementation or is already using EHCache as a cache store.
  o TIMs needed: tim-ehcache-*, tim-ehcache-commons.
  o Cache features: eviction (built in).
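Usage is then plain EHCache; a minimal sketch (the cache name is made up and assumed to be declared in ehcache.xml), with clustering handled transparently by tim-ehcache:

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class EhcacheExample {

    public static void main(String[] args) {
        CacheManager manager = CacheManager.getInstance(); // reads ehcache.xml
        Cache accounts = manager.getCache("accounts");     // assumed declared in ehcache.xml

        accounts.put(new Element("account-42", "balance=1000"));

        Element hit = accounts.get("account-42"); // null on a cache miss
        if (hit != null) {
            System.out.println(hit.getValue());
        }
        manager.shutdown();
    }
}
```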

- DISTRIBUTED MAP BUILDER
  o When to use: cache keys are Strings and you want a built-in evictor.
  o TIMs needed: tim-map-evictor, tim-concurrent-collections.
  o Cache features: eviction (built in).

- TCCACHE
  o When to use: you do not care about identity and want a distributed store with a get/put API. This frees you from having to deal with correct locking (which is needed for object-identity preservation) and with the instrumentation sections in tc-config.xml (since what lives in Terracotta is a byte array).
  o TIMs needed: tim-tccache, tim-concurrent-collections.
  o Cache features: eviction (built in).


- CSM (ConcurrentStringMap)
  o When to use: cache keys are Strings and you plan on implementing your own evictor – i.e. you just need a performant distributed data store with identity.
  o TIMs needed: tim-concurrent-collections.
  o Cache features: none built in (eviction, visibility, querying and eventing are left to the platform or your own code).


- SEARCHABLE MAP
  o When to use: you want to be able to query your Map and get back a subset, while otherwise maintaining a Map-like interface.
  o TIMs needed: tim-searchable (coming soon).
  o Cache features: querying.


- ESOTERIC MAP VARIATIONS
  o When to use: you want a non-partial HashMap, or a non-partial/non-clearable HashMap (data structures that are clearable by the Terracotta Virtual Memory Manager implement the Clearable interface).
  o TIMs needed: tim-maps or tim-tree-map-cache.
  o Cache features: none built in.









JDK data structures:

- CHM (ConcurrentHashMap)
  o When to use: high concurrency (given its striped locks) and the key is not a String (if the key is a String, use CSM).
  o TIMs needed: none; configured via tc-config.xml.
  o Cache features: none built in.
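A sketch of a CHM-backed clustered cache (names hypothetical): the only Terracotta-specific piece is declaring the static field a root in tc-config.xml.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ClusteredChmCache {

    // Declared a root in tc-config.xml (e.g. a <root> with field-name
    // com.example.ClusteredChmCache.CACHE), so all JVMs share one instance.
    private static final ConcurrentMap<Long, String> CACHE =
            new ConcurrentHashMap<Long, String>();

    public static String get(Long key) {
        return CACHE.get(key);
    }

    public static String putIfAbsent(Long key, String value) {
        // CHM operations are implicitly locked by Terracotta, so no explicit
        // synchronization is needed for single-key operations like this one.
        return CACHE.putIfAbsent(key, value);
    }
}
```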


- HashMap
  o When to use: you might be doing multi-segment operations on a CHM, or may need coarser locking than collection-level APIs such as get/put allow – use a HashMap, since it is not implicitly locked by Terracotta the way a CHM is.
  o TIMs needed: none; configured via tc-config.xml.
  o Cache features: none built in.
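For instance, a compound check-then-act across two keys wants one coarse lock around the whole operation; a sketch (hypothetical names, map assumed clustered, with the synchronized block configured as a lock in tc-config.xml):

```java
import java.util.HashMap;
import java.util.Map;

public class CoarseLockedCache {

    private final Map<String, Integer> balances = new HashMap<String, Integer>(); // clustered root

    // One cluster-wide lock spans the reads and writes of BOTH keys;
    // a CHM's implicit per-operation locking cannot express this.
    public void transfer(String from, String to, int amount) {
        synchronized (balances) {
            Integer src = balances.get(from);
            Integer dst = balances.get(to);
            if (src != null && src >= amount) {
                balances.put(from, src - amount);
                balances.put(to, (dst == null ? 0 : dst) + amount);
            }
        }
    }
}
```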



- Map (HashMaps)
  o When to use: Terracotta does not implement partial keys, so use this if the number of keys is so large that the keys alone won't ever fit into a single JVM.
  o TIMs needed: tim-maps.
  o Cache features: none built in.










SPRING:

- Spring just injects the framework/POJO and you cluster the underlying frameworks/POJOs – see the examples of injecting Terracotta-clustered EHCache, Hibernate, Quartz etc. via Spring.








COMMON TERRACOTTA-BASED DISTRIBUTED CACHE IMPLEMENTATION:

If one samples the hundreds of production deployments of distributed caches with Terracotta, 2-3 winners easily emerge out of the 10-odd possible options described in the section above. The rest are more esoteric – specific users have chosen those options/data structures given specific requirements in their use-cases. The 2-3 "market-share-based winners" are:

- EHCache via tim-ehcache

- Hibernate 2nd-level cache via tim-hibernate-cache/tim-hibernate (with EHCache as the 2nd-level cache provider in pre-Terracotta-3.x versions, and with Terracotta ConcurrentStringMap (CSM) as the 2nd-level cache provider thereafter), if the ORM is in the application’s data-access path.

- Distributed Map via tim-concurrent-collections.

The approach is to marry the data structure and interfaces (i.e. the cache) with the smart (fine-grained), transparent (no Java serialization/externalization), correct (in conformance with the JMM) replication capabilities afforded by the Terracotta DSO platform, to arrive at a scalable, highly available Distributed Cache.

Here is a description of the typical SDLC for an EHCache-based Distributed-Cache implementation, for example.

- INTEGRATION:

o Packaged as a TIM – so you just need the tim-ehcache-commons and tim-ehcache (and possibly tim-async) modules. Based on what keys/values are being dumped into the EHCache store, you may need tweaks to tc-config.xml. Refer to the EHCache integration manual.

o Usage of tim-async is optional, since it depends on what the application can tolerate. Use it if you can write-behind to the database. If you need to write-through to the database, the Hibernate integration will accommodate you; in applications without Hibernate, the recommendation is to write to the database and, on db-transaction commit, write to the clustered cache (of course, it depends on the particulars of your application).

o Other best practices to be aware of:

§ Application TTL and TTI – setting them too low can result in too much garbage on the Terracotta Server, and setting them too high increases cache size needlessly (not to mention data-freshness concerns, if the database receives DML from underneath the application). See the sketch after this list.

§ Read/ Write Ratio – (Terracotta does exceedingly well in all these scenarios, but the value you derive is different)

· All Read / No Write (except to hydrate): You deflect N-1 Queries off the database assuming N is the size of your cluster.

· Mostly Read / Some Write (update): Database Deflection + Correctness across JVMs.

· Mostly Write/ Some Read: Database Deflection + Correctness across JVMs. Might need transaction tuning as well.
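As promised above, a sketch of setting TTL/TTI programmatically via EHCache's Cache constructor (the cache name and values are made up; most deployments set the same attributes declaratively in ehcache.xml):

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;

public class TtlTtiTuning {

    public static void main(String[] args) {
        // Cache(name, maxElementsInMemory, overflowToDisk, eternal,
        //       timeToLiveSeconds, timeToIdleSeconds)
        Cache accounts = new Cache("accounts", 10000, false, false,
                3600,  // TTL: expire an hour after creation
                1800); // TTI: expire after 30 idle minutes
        CacheManager.getInstance().addCache(accounts);
    }
}
```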

- TESTING/TUNING:

o After functional testing, the next step is usually to execute stress tests – to simulate production-level cache sizes, TTL/TTI, read/write ratios and other cache I/O. Tuning here to meet SLAs would typically involve:

§ Instrumentation Tuning – may require you to reduce the scope of the instrumentation in your tc-config.xml. This depends on the object graphs that comprise the keys and values of the elements that are within the Cache.

§ Lock tuning -

· EhCache.concurrency may need to be changed from the default of 127. Higher concurrency allows multiple threads to perform simultaneous operations against EhCache.

· Lock Levels: set ehcache.lock.readLevel to something more optimistic than the default of READ if the application allows it. If the application can afford to read stale data, set ehcache.lock.readLevel = NO_LOCK. This allows reads to be performed without acquiring any lock, though the application runs the risk of not reading the latest data/changes (and exceptions in some cases) – so this is not recommended unless the cache is more-or-less read-only, and even then with caution.

§ Eviction tuning -

· Tune ehcache.global.eviction* properties and

· Based on application needs try setting 'timeToIdleSeconds and/or timeToLiveSeconds' inside ehcache.xml to reasonable values.

§ Memory Tuning -

· Garbage Collection Tuning.

· Distributed Garbage Collection Tuning.

· Fault/Flush Tuning if needed, based on cache-sizes and available Heap on Client and Terracotta Server JVMs.

§ Transaction Tuning, if needed, to exploit the core-platform’s batching, windowing and transaction-folding optimizations and based on whether the application tends to write in a burst or creates a lot of cache-objects.

- HARDENING and ROLLOUT:

o Once done, you may or may not wish to execute failure tests in your specific environment (Terracotta itself executes 25+ HA tests to guarantee no single points of failure in the architecture) and/or long-running tests. Of course, we recommend these, especially if the default heartbeats, tolerances and failover timings do not meet your SLA.

o Getting ready for production additionally involves ensuring monitoring is in place and that the technology is socialized amongst the operational staff who run and maintain the cluster. The various Terracotta run-books are very useful here (available with the Terracotta subscription). Your implementation may then end up looking like what is depicted in Figure 1:






SUMMARY:

In summary, you have many options in terms of interfaces, and the core platform provides the HA, scale, visibility and other non-functional features that a distributed Cache demands. You can read more at http://www.terracotta.org. A typical implementation with Terracotta ends up looking something like what is depicted here:





Hope this was useful, both as an overview and as a set of pointers to specific issues in your march across the SDLC.

About Me

I have spent the last 11+ years of my career either working for vendors that provide infrastructure software or managing platform teams that deal with software-infrastructure concerns at large online presences (e.g. at Walmart.com and currently at Shutterfly). I am interested in building and analyzing systems that speak to cross-cutting concerns across application development in the enterprise (scalability, availability, security, manageability, latency etc.).