Thursday, October 11, 2007

Why is RAT a better mouse-trap?



What is RAT and what is it better at?

RAT = Remotely Attached Tenured (generation). You might have also heard of it as NAM (Network Attached Memory/Heap) for your Java applications. Bear with me for a moment as we go through a broad argument as to why RAT makes sense.

Let’s assume your Java application references a Cache – as this cache fills up, your Java heap comes under pressure. So, what are your options to avoid an OOME (OutOfMemoryError)?

1. Implement an evictor ⇒
  • Reapers remove Cache-Entries based on some LRU/LFU algorithm.
  • The problem is that these Cache-Entries were expensive to construct in the first place. You have no choice if reaping is being done for cache freshness, but doing it just to get around heap constraints – that hurts… (a combined sketch of options 1 and 2 follows this list).
2. Rely on Java’s SoftReference capabilities, i.e.
  • Create Cache-Entries as SoftReferences and
  • Then tune -XX:SoftRefLRUPolicyMSPerMB (whereby the JVM essentially nulls your "carefully-constructed, expensive" data structures). E.g. if it is set to 100 and 200 MB of heap is free, then all SoftReferences last touched more than 20,000 (200 x 100) ms – i.e. 20 s – ago become eligible for collection… Again, that hurts.
3. Rely on Java Serialization and overflow to disk (a minimal sketch of this approach also follows the list).
  • Overflow least-recently-used / least-frequently-used Cache-Entries to disk once some upper watermark of heap utilization is hit, and pull them off disk when the reference is no longer live on the local heap.
  • Cache-Entries and the object graph associated with them need to implement java.io.Serializable, and you need to marshall/unmarshall your object graph (at the very least the Cache-Entry would feature 2 methods to serialize and deserialize itself).
  • So this doesn’t hurt as badly, although we had to “pollute” application code (several classes now implement java.io.Serializable, and we had to invest in “Domain Object Model <-> Disk Persistable Representation” marshalling/unmarshalling code), Java serialization isn’t cheap, and the local disk can’t be shared with other JVMs on other boxes.
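To ground options 1 and 2, here is a minimal, hand-rolled sketch (not anyone's production cache – the class name, watermark and types are illustrative): an access-ordered LinkedHashMap plays the role of the LRU reaper, and values are wrapped in SoftReferences so the collector may also clear them under heap pressure (pair this with -XX:SoftRefLRUPolicyMSPerMB as described above).

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

public class SoftLruCache<K, V> {

    private static final int MAX_ENTRIES = 10000; // illustrative upper watermark

    private final Map<K, SoftReference<V>> map =
            new LinkedHashMap<K, SoftReference<V>>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, SoftReference<V>> eldest) {
                    // Option 1: the "reaper" -- once past the watermark, the
                    // least-recently-used entry is dropped and its
                    // expensive-to-construct value is simply lost.
                    return size() > MAX_ENTRIES;
                }
            };

    public void put(K key, V value) {
        // Option 2: wrap the value in a SoftReference so the collector may
        // also null it under heap pressure (see -XX:SoftRefLRUPolicyMSPerMB).
        map.put(key, new SoftReference<V>(value));
    }

    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Evicted by the LRU or cleared by the GC: either way the entry
            // must be rebuilt from the system of record.
            map.remove(key);
        }
        return value;
    }
}
```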
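And a minimal sketch of option 3 – overflow to disk via Java serialization. Again, the class and method names are illustrative; a real implementation would also track heap watermarks and decide which entries to spill.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DiskOverflowStore {

    private final File dir;

    public DiskOverflowStore(File dir) {
        this.dir = dir;
    }

    /** Marshal the entry to disk so the on-heap reference can be dropped. */
    public void overflow(String key, Serializable entry) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(
                new FileOutputStream(new File(dir, key + ".ser")));
        try {
            out.writeObject(entry); // pays full Java-serialization cost
        } finally {
            out.close();
        }
    }

    /** Unmarshal the entry when it is no longer live on the local heap. */
    public Object pullFromDisk(String key) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(
                new FileInputStream(new File(dir, key + ".ser")));
        try {
            return in.readObject();
        } finally {
            in.close();
        }
    }
}
```

Note how the costs called out above show up directly in the code: the domain class must implement java.io.Serializable, and the spill files live on one box only.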
So now what if, instead of all these gyrations around losing "carefully-constructed, expensive" Cache-Entries, or polluting your code, your object references (Cache-Entries and the graph attached to them) could just automatically become “semi-SoftReferences” when you encounter heap pressure – i.e.
  • Least-recently/least-frequently used references simply get nulled in the local JVM (so they become eligible for GC) and thus prevent the local JVM from throwing an OOME (similar to how a SoftReference would have behaved).
  • But you don’t permanently lose the “carefully-constructed, expensive” Cache-Entry – because, unbeknownst to the application developer, these object references were also being maintained on a remote JVM, and when the local JVM does need such a “nulled” reference, it gets transparently pulled in from the remote JVM.
Sounds neat – doesn’t it? Your local JVM does not OOME, and you did not have to invest any effort in your application to get this behavior of never losing the Cache-Entry from within your Java Domain Model.
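To make the idea concrete, here is a purely conceptual, hand-rolled analogy. This is not Terracotta's actual mechanism – Terracotta does this transparently, with no wrapper class in your code – and the RemoteHeap interface and all names below are made up for illustration only.

```java
import java.lang.ref.SoftReference;

public class VirtualReference<V> {

    /** Stand-in for the remote JVM that keeps the authoritative copy. */
    public interface RemoteHeap {
        Object fetch(String id);           // fault the object back in
        void store(String id, Object obj); // keep the remote copy current
    }

    private final String id;
    private final RemoteHeap remote;
    private SoftReference<V> local; // may be nulled under local heap pressure

    public VirtualReference(String id, V value, RemoteHeap remote) {
        this.id = id;
        this.remote = remote;
        this.local = new SoftReference<V>(value);
        remote.store(id, value); // the remote JVM also holds the entry
    }

    @SuppressWarnings("unchecked")
    public V get() {
        V value = local.get();
        if (value == null) {
            // The local copy was cleared to relieve heap pressure, but it was
            // never really lost: pull it back from the remote JVM.
            value = (V) remote.fetch(id);
            local = new SoftReference<V>(value);
        }
        return value;
    }
}
```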

That is precisely what is implemented as Terracotta’s virtual heap feature. The local JVM is your application JVM and the remote JVM is the “Terracotta Server” JVM and you get this behavior with just configuration. See:
http://www.terracotta.org/confluence/display/docs1/Concept+and+Architecture+Guide#ConceptandArchitectureGuide-VirtualHeap
and
http://www.terracotta.org/confluence/display/orgsite/Virtual+Heap+for+Large+Datasets
and
http://www.terracotta.org/confluence/display/orgsite/How+Terracotta+Works
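For a feel of what this looks like from the application side, here is a minimal sketch. The class and field names are mine, and the snippet assumes the map below is declared as a clustered root in the Terracotta configuration file (not shown – see the docs linked above for the actual configuration syntax).

```java
import java.util.HashMap;
import java.util.Map;

public class CatalogCache {

    // A plain heap object to the developer; once declared a root in the
    // Terracotta configuration, its contents also live on the Terracotta
    // server and can be flushed from, and faulted back into, the local heap.
    private final Map<String, Object> cache = new HashMap<String, Object>();

    public void put(String key, Object value) {
        synchronized (cache) { // ordinary synchronization in the source
            cache.put(key, value);
        }
    }

    public Object get(String key) {
        synchronized (cache) {
            return cache.get(key);
        }
    }
}
```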

In fact the remote JVM (the Terracotta server) can additionally persist the object references it maintains to disk. Sections of your Tenured (Old) Generation thus become:
  • Virtually unbounded (limited only by the size of the disk you can attach to the remote JVM) and
  • Durable,
  • i.e. your heap begins to look like what is attached in the picture (please ignore Survivor, Perm etc. for the sake of this argument).

Now, assuming ON AVERAGE:
  • World population for the past 50 years has averaged 5 billion at any one point in time &&
  • That 5% of the population on average have been programmers &&
  • That each programmer worked 300*8 = 2,400 hours/year (I kid – the real number is probably more like 4,800) &&
  • That 20% of all that time was devoted to managing memory and dealing with the resultant issues:
    • Back in mainframe days, it was about preserving every single byte you consumed;
    • In C/C++ days, it was about debugging “malloc”s without “free”s and pointer arithmetic gone haywire and
    • In these Java days, it is about tuning GC and avoiding OOMEs.
  • Therefore, time spent by humankind on memory-related issues = 5 billion * 50 * 0.05 * 2,400 * 0.2 = 6 trillion man-hours – clearly not a trivial amount of time ☺. Now that you can use Terracotta virtual heap and save yourself the bother of writing all kinds of mechanisms to avoid OOMEs due to heap constraints, hopefully we can make a dent in the human-hours spent dealing with this – that 20% coming down to even 19% is a saving of roughly 6 billion human-hours/year (250 million programmers * 2,400 hours * 1%) ;-) .
Of course, you should still persist a certain class of data (business-critical data, data you need to query/report on) to a database – databases are unmatched in terms of recovery, backup and check-pointing features and accessibility via SQL.

But now that you have this sliver of Tenured that can keep expanding (at the rate and cost of disk) and still stay persistent across JVM lifecycles, without your having to modify Java code – you have a new weapon in your armory. Check it out at http://www.terracotta.org

(I will post more details on how to control the virtual heap, and a little bit about its inner workings, in the next post.)

Wednesday, July 11, 2007

Clustered Java - Operational Needs


Businesses are in constant flux and mutate in response to environmental stimuli – the Java application you deployed in the past has to change as new functionality, capacity, availability and monitorability requirements get tagged on.

Typically, it is the operations team (sysadmins, network engineers, DBAs etc.) who get measured on uptime, scale and the overall smooth functioning of the data-center. But of course, they rely very heavily on the application / application-infrastructure developer to help them get there. What is the expected application behavior:
1. When it goes from being deployed on 1 node to multiple nodes.
2. When 1 of the JVMs crashes.
3. When state gets out of sync across these nodes.
4. Under load etc.

Any streamlining here will simplify the Development <-> Operations handoff – i.e. fewer calls to you, the developer, and lower maintenance costs for the organization as a whole.
So, then:
  • Where does the application developer’s job end and the software-infrastructure engineer’s job begin?
  • Where does the software-infrastructure engineer’s job end and the operational-infrastructure engineer’s job begin?

Now whether you are in an IT culture where

  • Roles are clearly demarcated across silos – as in the attached figure OR
  • You are in a culture/organization where the roles described above don’t have clean boundaries…

there is no denying that, whatever the argument for “silo-ization”, concerns around application functionality, latency, scalability, availability and manageability cut across these boundaries. The characteristics of a successful, highly efficient IT organization arguably are:

  • Core-competencies within each silo.
  • Clearly defined artifacts and interfaces across each boundary, resulting in minimal friction/iterative loops, AND
  • A good understanding of the upstream/downstream consequences of decisions within each silo.

As an example, let’s focus on the big, hard-to-tame beast of Availability and Scalability for Java applications. Of course, as an application developer you are bound by whatever requirements flow downstream – but assuming reasonable business requirements/project management ;-), you impact the final infrastructure/management footprint of your application through the decisions you make around:

Task / Decision ⇒ Downstream Impact
  • Modelling of real-world entities as a database schema and as an OO hierarchy in the Java tier, and the implementation of those entities in the database and at the Java tier (data structures employed, package/class organization, "garbage" created) ⇒ Latency, Scale
  • Marshalling/unmarshalling across the OO <-> DB representations (ORM) ⇒ Latency, Scale
  • Computational algorithms and control flow ⇒ Latency, Scale
  • Dependency management ⇒ Maintainability, Manageability
  • Management of state across JVMs, and between JVMs and the DB/external systems ⇒ Availability, Scale

Wrong choices anywhere (e.g. an inappropriate data structure, needless SQL inside a tight loop, massive numbers of new objects instantiated per request, etc.) can adversely impact “downstream” concerns such as latency, scale and manageability. In today’s environment, the first four bullet points on the list above are clearly best contained within the application developer’s world, and classic application troubleshooting can get rid of reported problems.
Arguably, however, the last bullet point on the list – state management across JVMs and external systems – is the fuzziest, and it routinely crosses the Development/Operations chasm. The consequence of addressing this downstream concern much later is that the current state of the art is intrusive, labor-intensive, expensive, inconsistent and features frequent round-trips across the Dev/Ops interface – leading to inefficiencies and high support-overhead costs. In other words, one is better off taking the long-term view during the SDLC rather than while maintaining the application (i.e. during the SMLC – Software Maintenance LifeCycle).

Focusing now on an application that used to run on a single JVM and now runs on multiple JVMs: the interfaces between Ops and Dev (and at times between application-infrastructure development and application-software development) are weakly defined, resulting in frequent traversals across the chasm and inefficiencies. So, what are some of these operational concerns/needs when running a Java application on a cluster of JVMs?

  1. Load Balancing strategy:
    1. How to ensure that the load gets appropriately partitioned across the JVMs in the cluster?
    2. Does the application need Locality of Reference - i.e. do requests need to be stickily routed to the same JVM?
    3. How to re-route, and what is the user experience, when a JVM goes down?

  2. Application Correctness: not syncing state (e.g. caches), or syncing it only optimistically, implies
    1. Reacting to potentially incoherent application state (caches) across the cluster.
    2. Non-guaranteed delivery of messaging across JVMs can easily lead to split-brain across the participants in the cluster (e.g. if the cache reflects the price of an item in an electronic catalog and is inconsistent across 2 different JVMs, which JVM has the right cache value? How does operations recover – do they need to restart the cluster?).

  3. Capacity Planning:
    1. Operations needs a deterministic way to scale (e.g. add a node on demand); today such a mechanism is largely missing.
    2. Few low-overhead clustering solutions are available on the market (or home-grown), since the majority rely on expensive Java serialization – i.e. the requirement of more scale per operational dollar in a clustered environment is still largely unmet by today's solutions.
    3. Dynamic Infrastructure Provisioning based on usage characteristics.

  4. Breach of Availability SLA (unplanned downtime – i.e. the fewer pages that go off, the better):
    1. Due to an Application Server Failure (Industry Java-app availability ~ 98%)
    2. Cascading Failure (1 JVM goes down and the others go down as well).
    3. Unpalatable Recoverability (long restart, application/cache warming, infrastructure abuse)
    4. Cluster work-load balance/re-balance in response to outage/recovery.

  5. Breach of Availability SLA (planned downtime) – i.e. can I continue to maintain business continuity:
    1. Application Release Processes: rolling upgrades / incompatible patches (e.g. apps that depend on Java Serialization).
    2. Content Management (database/schema-related downtime).

  6. Manageability / Monitorability
    1. Few/No Intrusions into Release/Patch process
    2. No comprehensive cluster visibility - Cluster management today is per node.
    3. Proliferation of Tools/Mgmt Interfaces and complexity for the NOC.
    4. Running Stateless with Stateful applications (arbitrary servers out-of/in-rotation).

  7. Dev <-> Ops HandOff
    1. High Overhead/Friction in Dev<->Ops Hand-Off
    2. Operational Training costs
    3. NOC and Production-Support Complexity/Errors

  8. Others: Peculiar to a given IT/Organization’s culture.
Terracotta (the company behind the open-source clustering technology – DSO/Distributed Shared Objects) attempts (successfully, imho) to address all of the above operational concerns (with the exception of Load Balancing and Dynamic Infrastructure Provisioning, item 3.3) via a non-intrusive (simplicity for Dev, by clustering at the JVM level), cost-effective (open-source) and consistent mechanism for clustering state across JVMs:
  • The application architect identifies what is stateful and codifies the clustering behavior in an XML file. Application code is cleanly separated from the clustering concern.
  • Operations gets a standard terminology and a standard mechanism (i.e. a clean interface from Dev) to cluster Java applications efficiently (given the implementation) and cost-effectively (highly scalable and open-source).

You can read more about it on the numerous blogs and the website at http://www.terracotta.org.

About Me

I have spent the last 11+ years of my career either working for vendors that provide infrastructure software or managing platform teams that deal with software-infrastructure concerns at large online presences (e.g. at Walmart.com and currently at Shutterfly). I am interested in building and analyzing systems that speak to cross-cutting concerns across application development in the enterprise (scalability, availability, security, manageability, latency etc.).