GlassBox is a cool open-source tool that promises exactly that for your Java applications: it gives unprecedented visibility into the running application and adds a diagnostic layer on top that summarizes any potential issue succinctly (see figure). Added bonus: it does this via AOP, i.e. it is non-intrusive as far as your application is concerned.
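To make the "non-intrusive" point concrete, here is a minimal sketch of the interception idea using a plain JDK dynamic proxy (`java.lang.reflect.Proxy`). This is a swapped-in technique for illustration only: GlassBox's own instrumentation is based on AspectJ, not proxies, and the `OrderService` interface and timing maps below are hypothetical. What it shows is that the business class contains no monitoring code at all; timing is wrapped around it from the outside.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class NonIntrusiveTiming {

    // Hypothetical application interface: contains no monitoring code.
    public interface OrderService {
        String placeOrder(String item);
    }

    // Hypothetical implementation: plain business logic, unaware it is being timed.
    static class SimpleOrderService implements OrderService {
        public String placeOrder(String item) {
            return "ordered " + item;
        }
    }

    // Per-method call counts and cumulative elapsed time (ns), for this JVM only.
    static final Map<String, LongAdder> CALLS = new ConcurrentHashMap<>();
    static final Map<String, LongAdder> NANOS = new ConcurrentHashMap<>();

    // Returns a proxy that records timing around every interface call ("around advice").
    @SuppressWarnings("unchecked")
    static <T> T monitor(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception, unwrapped
            } finally {
                String key = iface.getSimpleName() + "." + method.getName();
                CALLS.computeIfAbsent(key, k -> new LongAdder()).increment();
                NANOS.computeIfAbsent(key, k -> new LongAdder()).add(System.nanoTime() - start);
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        OrderService service = monitor(new SimpleOrderService(), OrderService.class);
        System.out.println(service.placeOrder("book")); // ordered book
        System.out.println(service.placeOrder("pen"));  // ordered pen
        System.out.println("calls to OrderService.placeOrder: "
                + CALLS.get("OrderService.placeOrder").sum()); // calls to OrderService.placeOrder: 2
    }
}
```

Note that these statistics live in static maps inside one JVM, which is exactly the limitation discussed next.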
However, all of the measurements GlassBox provides are for a single JVM. Coming back to our earlier point about aggregating measurements: wouldn't it be cool if there were some way, with minimal code changes, to collect these stats across the entire cluster of JVMs? Then, when a problem occurs, you could correlate data across the individual JVMs and across the cluster as a whole, and easily tell whether the problem is cluster-wide or isolated to one or a few JVMs.
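As a rough illustration of the kind of correlation that cluster-wide stats make possible, the sketch below takes one statistic per JVM and labels a slowdown as cluster-wide or isolated. Everything here is hypothetical: the JVM names, the use of average response time, and the rule that "at least half the JVMs over the threshold" means cluster-wide are arbitrary choices for the example, not anything GlassBox or Terracotta defines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClusterCorrelation {

    // Input: hypothetical average response time (ms) of one operation, keyed by JVM id.
    // Returns "healthy" if no JVM exceeds the threshold, "cluster-wide" if at least
    // half of them do (an arbitrary cut-off), otherwise "isolated: [slow JVM ids]".
    static String classify(Map<String, Double> avgMillisByJvm, double thresholdMillis) {
        List<String> slow = new ArrayList<>();
        // TreeMap gives a deterministic (sorted) order for the reported JVM ids.
        for (Map.Entry<String, Double> e : new TreeMap<>(avgMillisByJvm).entrySet()) {
            if (e.getValue() > thresholdMillis) {
                slow.add(e.getKey());
            }
        }
        if (slow.isEmpty()) {
            return "healthy";
        }
        if (slow.size() * 2 >= avgMillisByJvm.size()) {
            return "cluster-wide";
        }
        return "isolated: " + slow;
    }

    public static void main(String[] args) {
        Map<String, Double> stats = Map.of("jvm1", 120.0, "jvm2", 95.0, "jvm3", 2300.0, "jvm4", 110.0);
        System.out.println(classify(stats, 1000.0)); // isolated: [jvm3]
    }
}
```

The interesting part is not the rule itself but that it needs every JVM's numbers in one place, which is what clustering the statistics provides.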
I sat down with Ron Bodkin, CTO of GlassBox, to see whether we could cluster GlassBox with Terracotta. Here is what we had to do:
- We installed GlassBox and then Terracotta for Spring (since GlassBox uses Spring internally).
- Modified the catalina.sh script to pass 4 additional options in JAVA_OPTS so that Terracotta could hook in. The container was Tomcat 5.5.x.
- Ron then identified the Spring bean that held all of the state (i.e. the per-JVM statistics).
- We ran into a few configuration issues that had to be sorted out. We upgraded Spring to 1.2.8 (Terracotta supports 1.2.8 and later). Even so, we hit some exceptions, and then realized that an older spring.jar was still on the classpath and had to be removed.
- Filled out the Terracotta config file (tc-config.xml), which is where you declare which beans need to be clustered. We used wildcards in the application name and in the resource path entry so that Terracotta could find the Spring application context file containing the bean definitions. The tc-config.xml entries look like this:
<jee-application name="*glassbox*">
  <paths><path>*beans.xml</path></paths>
- The requirements also called for the cluster-wide stats to be updated on a timed basis rather than in real time (i.e. not as soon as a given statistic changed on any JVM in the cluster). So Ron wrote a method that makes an intelligent deep clone of the object graph that needed to be clustered; this method fires every few seconds. We then clustered the bean that wraps this deep clone instead. The config entries were:
<beans>
  <bean name="distributedRegistryHolder"/>
</beans>
<locks>
  <autolock>
    <method-expression>* glassbox.track.api.StatisticsRegistryHolderImpl.copy(..)</method-expression>
    <lock-level>write</lock-level>
  </autolock>
</locks>
- Terracotta for Spring can also cluster HttpSessions with a one-line config entry, so we turned that on. It amounts to adding something like the following element to tc-config.xml:
<session-support>true</session-support>
GlassBox users monitor an application through a web-based client. Not much state lives in the HttpSession apart from things such as GUI preferences, but those are now preserved if a given app server goes down. Not much, but one less minor inconvenience in the world.
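The timed deep-clone step from the list above can be sketched roughly as follows. This is not Ron's actual code: the class names, the `AtomicLong` counters standing in for GlassBox's statistics, and the 5-second period are all hypothetical. The idea is that per-request updates touch only local, unclustered objects, while a scheduled task periodically copies their current values into a separate holder object, and only that holder would be clustered. Because Terracotta autolocks turn `synchronized` code in matched methods into cluster-wide locks, a `synchronized copy(..)` fits the `write` autolock on `copy(..)` in the config shown earlier.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class StatisticsSnapshotSketch {

    // Hypothetical live statistics: updated on every request, local to this JVM, never clustered.
    static final Map<String, AtomicLong> LIVE = new ConcurrentHashMap<>();

    static void record(String operation) {
        LIVE.computeIfAbsent(operation, k -> new AtomicLong()).incrementAndGet();
    }

    // Hypothetical stand-in for the clustered holder bean: only its snapshot would be shared.
    static class RegistryHolder {
        private Map<String, Long> snapshot = new HashMap<>();

        // Copies current values (not references to live counters) into a fresh map.
        // With a Terracotta autolock (lock-level "write") matching copy(..), this
        // synchronized method would act as a cluster-wide write lock.
        synchronized void copy(Map<String, AtomicLong> live) {
            Map<String, Long> clone = new HashMap<>();
            for (Map.Entry<String, AtomicLong> e : live.entrySet()) {
                clone.put(e.getKey(), e.getValue().get());
            }
            snapshot = clone;
        }

        // Returns a read-only copy of the latest snapshot.
        synchronized Map<String, Long> read() {
            return Collections.unmodifiableMap(new HashMap<>(snapshot));
        }
    }

    // Refreshes the holder's snapshot immediately and then every periodSeconds, on a daemon thread.
    static ScheduledExecutorService startSnapshots(RegistryHolder holder, long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "stats-snapshot");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> holder.copy(LIVE), 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        record("OrderService.placeOrder");
        record("OrderService.placeOrder");
        RegistryHolder holder = new RegistryHolder();
        ScheduledExecutorService scheduler = startSnapshots(holder, 5);
        Thread.sleep(500); // give the first (immediate) snapshot time to run
        System.out.println(holder.read());
        scheduler.shutdownNow();
    }
}
```

The design choice is a trade-off: other JVMs see statistics that may be a few seconds stale, in exchange for far less lock and replication traffic than clustering every counter update would cause.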