Thursday, October 11, 2007

Why is RAT a better mouse-trap?



What is RAT and what is it better at?

RAT = Remotely Attached Tenured (generation). You might have also heard of it as NAM (Network Attached Memory/Heap) for your Java applications. Bear with me for a moment as we go through a broad argument as to why RAT makes sense.

Let’s assume you reference a Cache in your Java Application – as this cache fills up, your Java Heap comes under pressure. So, what are your options to avoid an OOME (OutOfMemoryError):

1. Implement an evictor ⇒
  • Reapers remove Cache-Entries based on some LRU/LFU algorithm (a minimal sketch follows this list).
  • The problem is that these Cache-Entries were expensive to construct in the first place. You have no choice if reaping is being done for cache-freshness, but doing it just to get around Heap constraints – that hurts…
2. Rely on Java’s SoftReference capabilities, i.e.
  • Create Cache-Entries as SoftReferences and
  • Then tune -XX:SoftRefLRUPolicyMSPerMB (where essentially the JVM nulls your "carefully-constructed, expensive" data-structures) e.g. if set to 100 and 200 MB of heap is free, then all SoftReferences that have not been touched in the last 20,000 (200x100) ms – i.e. 20s – become eligible for collection... Again, hurts (second sketch below).
3. Rely on Java Serialization and overflow to disk (third sketch below).
  • Overflow leastRecentlyUsed/leastFrequentlyUsed Cache-Entries to disk based on some upper watermark of Heap utilization and pull them back off disk if the reference is not live on the local Heap.
  • Cache-Entries and the object-graph associated with them need to implement java.io.Serializable and you need to marshall/unmarshall your object graph (at the very least a Cache-Entry would feature 2 methods to serialize and deserialize itself).
  • So it doesn't hurt as badly, although we had to “pollute” application code (with several classes having to implement java.io.Serializable and having to invest in “Marshalling/UnMarshalling from Domain Object Model <-> Disk Persistable Representation" code), Java serialization isn’t cheap, and the local disk can’t be shared with other JVMs on other boxes.
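
To make option 1 concrete, here is a minimal sketch of an LRU evictor built on java.util.LinkedHashMap (the class name and capacity are purely illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Option 1: a size-bounded LRU cache that simply discards the eldest entry.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        // accessOrder = true => iteration order is least-recently-used first
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after every put(); returning true evicts the
    // eldest (least recently used) entry -- the expensive-to-construct
    // Cache-Entry is simply lost.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```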
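Option 2 sketched with SoftReferences (the cache class here is hypothetical; -XX:SoftRefLRUPolicyMSPerMB is the HotSpot flag mentioned above):

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Option 2: values held via SoftReference; the collector may null them out
// under heap pressure (tunable with -XX:SoftRefLRUPolicyMSPerMB).
public class SoftRefCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<K, SoftReference<V>>();

    public synchronized void put(K key, V value) {
        map.put(key, new SoftReference<V>(value));
    }

    public synchronized V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;          // never cached
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key);      // GC cleared it -- the expensive entry is gone
        }
        return value;
    }
}
```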
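And a rough sketch of option 3 – overflowing to local disk with plain Java serialization. The watermark, file naming, and error handling are all deliberately simplified:

```java
import java.io.*;
import java.util.LinkedHashMap;
import java.util.Map;

// Option 3: overflow the least-recently-used entry to disk instead of losing it.
// Values must implement java.io.Serializable; one file per key (illustrative only).
public class DiskOverflowCache<K, V extends Serializable> {
    private final int maxInMemory;                     // the "upper watermark"
    private final File dir;
    private final LinkedHashMap<K, V> inMemory =
            new LinkedHashMap<K, V>(16, 0.75f, true);  // access order = LRU first

    public DiskOverflowCache(int maxInMemory, File dir) {
        this.maxInMemory = maxInMemory;
        this.dir = dir;
    }

    public synchronized void put(K key, V value) throws IOException {
        inMemory.put(key, value);
        if (inMemory.size() > maxInMemory) {
            Map.Entry<K, V> eldest = inMemory.entrySet().iterator().next();
            writeToDisk(eldest.getKey(), eldest.getValue());   // spill LRU entry
            inMemory.remove(eldest.getKey());
        }
    }

    public synchronized V get(K key) throws IOException, ClassNotFoundException {
        V value = inMemory.get(key);
        return (value != null) ? value : readFromDisk(key);    // pull back off disk
    }

    private void writeToDisk(K key, V value) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(
                new FileOutputStream(new File(dir, key.hashCode() + ".ser")));
        try { out.writeObject(value); } finally { out.close(); }
    }

    @SuppressWarnings("unchecked")
    private V readFromDisk(K key) throws IOException, ClassNotFoundException {
        File f = new File(dir, key.hashCode() + ".ser");
        if (!f.exists()) return null;
        ObjectInputStream in = new ObjectInputStream(new FileInputStream(f));
        try { return (V) in.readObject(); } finally { in.close(); }
    }
}
```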
So now what if, instead of all these gyrations around losing "carefully-constructed, expensive" Cache-Entries, or polluting your code - your object references (Cache-Entries and the graph attached to them) were able to just automatically become “Semi SoftReferences” when you encounter HEAP pressures – i.e.
  • LeastRecently/LeastFrequently used References just get nulled in the local JVM (so they become eligible for GC) and thus prevent the local JVM from throwing an OOME (similar to how Soft Reference would have worked).
  • But, in fact you don’t permanently lose the “carefully-constructed, expensive” Cache-Entry – since unbeknownst to the application-developer, these object references were also being maintained on a remote JVM – and when the local JVM does need such a “nulled” reference, it gets transparently pulled in from the remote JVM.
Sounds neat – doesn’t it? Your local JVM does not OOME and you did not have to invest any effort in your application to get this behavior of not losing the Cache-Entry from within your Java Domain Model.

That is precisely what is implemented as Terracotta’s virtual heap feature. The local JVM is your application JVM and the remote JVM is the “Terracotta Server” JVM and you get this behavior with just configuration. See:
http://www.terracotta.org/confluence/display/docs1/Concept+and+Architecture+Guide#ConceptandArchitectureGuide-VirtualHeap
and
http://www.terracotta.org/confluence/display/orgsite/Virtual+Heap+for+Large+Datasets
and
http://www.terracotta.org/confluence/display/orgsite/How+Terracotta+Works
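
For flavor, here is roughly what the application side can look like – a minimal sketch that assumes the 'cache' field has been declared as a Terracotta root (and its locking configured) in tc-config.xml; the class, field, and domain-object names are hypothetical, see the docs linked above for the actual configuration:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical application class: the 'cache' field is assumed to be declared
// as a root in tc-config.xml, so the object graph hanging off it is also
// maintained on the Terracotta Server, not just on the local heap.
public class ProductCatalog {
    // Plain Java Map -- no java.io.Serializable, no marshalling code.
    private final Map<String, Product> cache = new HashMap<String, Product>();

    public void put(String sku, Product p) {
        synchronized (cache) {   // ordinary synchronization; clustered via config
            cache.put(sku, p);
        }
    }

    public Product get(String sku) {
        synchronized (cache) {
            // If this entry was flushed from the local heap under memory
            // pressure, it is faulted back in transparently from the server.
            return cache.get(sku);
        }
    }
}

// Hypothetical domain object -- again, no Serializable required.
class Product {
    private final String sku;
    private final double price;
    Product(String sku, double price) { this.sku = sku; this.price = price; }
}
```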

In fact the remote JVM (the Terracotta Server) can additionally persist the object references it maintains to disk. You thus get sections of the Tenured (Old) Generation to become
  • Virtually unbounded (limited only by the size of the disk you can attach to the remote JVM).
  • Durable.
  • i.e. your Heap begins to look like the attached picture (please ignore Survivor, Perm etc. for the sake of this argument).

Now, assuming ON AVERAGE:
  • World population for the past 50 years has averaged 5 billion at any one point in time &&
  • That 5% of the population on average have been programmers &&
  • That each programmer worked 300*8 = 2400 hours/year (I kid - the real number is probably more like 4800) &&
  • That 20% of all that time was devoted to managing memory and dealing with resultant issues:
    • Back in Mainframe days, it was about preserving every single byte you consumed;
    • In C/C++ days, it was about debugging “malloc”s without “free”s and pointer arithmetic gone haywire and
    • In these Java days, it is about tuning GC and avoiding OOMEs.
  • Therefore, time spent by humankind on Memory related issues = 5 billion * 50 * .05 * 2400 * .2 = 6 trillion man-hours – clearly not a trivial amount of time ☺. Now that you can use Terracotta Virtual Heap and save yourself all the bother of writing all kinds of mechanisms to avoid OOMEs due to Heap constraints, hopefully we can make a dent in the human-hours spent dealing with this - 20% coming down to even 19% is a saving today of 6 billion human-hours/year ;-) .
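
For anyone who wants to redo the back-of-the-envelope arithmetic, here it is spelled out (same assumptions as the bullets above):

```java
// Back-of-the-envelope arithmetic from the bullets above.
public class MemoryHours {
    public static void main(String[] args) {
        double people        = 5e9;    // average world population
        double programmerPct = 0.05;   // assume 5% are programmers
        double hoursPerYear  = 2400;   // 300 days * 8 hours
        double memoryPct     = 0.20;   // time spent on memory issues
        double years         = 50;

        double totalHours = people * programmerPct * hoursPerYear * memoryPct * years;
        System.out.println("Total: " + totalHours);             // 6.0e12 -> 6 trillion hours

        // Shaving one percentage point (20% -> 19%) off a single year:
        double savedPerYear = people * programmerPct * hoursPerYear * 0.01;
        System.out.println("Saved per year: " + savedPerYear);  // 6.0e9 -> 6 billion hours/year
    }
}
```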
Of course, you should still persist a certain class of data (business-critical data, data you need to query/report on) to a database – databases are unmatched in terms of recovery, backup, and check-pointing features and accessibility via SQL.

But, now that you have this sliver of Tenured that can expand and expand (at the rate and cost of disk) and still stay persistent across JVM lifecycles, without you having to modify Java code – you have a new weapon in your armory. Check it out at http://www.terracotta.org

(Will post more details on how to control the virtual heap and a little bit about its inner workings in the next post.)

About Me

I have spent the last 11+ years of my career either working for vendors that provide infrastructure software or managing Platform Teams that deal with software-infrastructure concerns at large online presences (e.g. at Walmart.com and currently at Shutterfly). I am interested in building and analyzing systems that speak to cross-cutting concerns across application development in the enterprise (scalability, availability, security, manageability, latency, etc.)