Wednesday, January 21, 2009

Terracotta Usage Amongst Content Portals

TYPICAL REQUIREMENTS:

  1. Content Creation: A combination of:
    • User-Generated Content
    • Feeds from Content Providers (e.g. movies, or weather from Yahoo)
    • Editorially Generated Content

  2. Content Publishing: On demand, once approved.

  3. Content Portal sites often feature pay-for-content services that imply:
    • User Subscription/Registration
    • User Authentication and Authorization

  4. Traditional distributed caches in front of the database, serving queries for user profile data and/or content attributes and specific content


TYPICAL IMPLEMENTATION:

  1. Typically, Content Creation/Publication has been implemented:

    • Against the database: the CMS publishes to the database, and any associated caches are then cleared. This has 2 issues:
      • The database can get overloaded, depending on how much content is streaming in, and because caches are cleared once new content is published, which results in a spike in database queries.
      • To hedge against such spikes, caches are often cleared only during the site's graveyard hours, so there is often a delay between content publication and its availability for consumption.
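The query spike described above can be made concrete with a small, self-contained Java model. It is a hypothetical sketch, not code from any real CMS: the "database" is an in-memory map with a query counter, and `ClearOnPublishDemo`, `read` and `publish` are illustrative names. It shows that publishing a single item and then clearing the cache forces every subsequent read to go back to the database once.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of "CMS writes to the DB, then the cache is cleared". */
public class ClearOnPublishDemo {
    // Stand-in for the content table in the database (no real JDBC here).
    static final Map<String, String> database = new ConcurrentHashMap<>();
    // Number of reads that actually reached the "database".
    static final AtomicInteger dbQueries = new AtomicInteger();
    // Cache-aside layer in front of the "database".
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    /** Consumer read path: serve from cache, load from the DB on a miss. */
    static String read(String contentId) {
        return cache.computeIfAbsent(contentId, id -> {
            dbQueries.incrementAndGet();
            return database.get(id);
        });
    }

    /** CMS publish path: write to the DB, then clear the whole cache. */
    static void publish(String contentId, String body) {
        database.put(contentId, body);
        cache.clear();
    }

    public static void main(String[] args) {
        List<String> ids = List.of("home", "weather", "movies", "news");
        for (String id : ids) {
            database.put(id, "v1 of " + id);
        }

        for (String id : ids) {
            read(id); // cold cache: every read is a miss
        }
        int warmupQueries = dbQueries.get();

        publish("weather", "v2 of weather"); // only one item changes
        for (String id : ids) {
            read(id); // cache was cleared, so every read misses again
        }
        int postPublishQueries = dbQueries.get() - warmupQueries;

        System.out.println("queries to warm cache: " + warmupQueries);
        System.out.println("queries after one publish: " + postPublishQueries);
    }
}
```

Running `main` reports 4 queries to warm the cache and 4 more after one publish, even though only one item changed. Because `ConcurrentHashMap.computeIfAbsent` loads each missing key at most once at a time, this model is a best case; with many concurrent readers and a cache that does not coalesce loads, the spike would be larger.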


    • With Terracotta:
      • You solve both problems. The CMS application publishes to in-memory Java data structures, which are replicated via Terracotta to the JVMs that serve content to users of the Content Portal.
      • This removes the need to go through the database and to clear caches, so content is available as soon as it is published.


    • Example:
      • The portal consists of multiple portlets. One portlet displays weather, and its content streams in through a Yahoo feed. When the weather for 5 zip codes changes, publication is done in memory to Java data structures (e.g. to specific entries in a ConcurrentHashMap (CHM) whose key is the zip code and whose value is a Weather-Forecast object, with the CHM shared between the Publication JVMs and the Consumption JVMs).
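The weather example might look roughly like the following in plain Java. This is a minimal, hypothetical sketch: `WeatherStore`, `WeatherForecast`, its fields and the sample zip codes are illustrative, and the Terracotta side (declaring the static `forecasts` field as a root, plus any locking and instrumentation settings in tc-config.xml) is assumed and not shown, so here everything runs in a single JVM. The point is that publication replaces only the changed map entries in place; nothing is cleared, and consumers see the new values on their next read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of the weather portlet's shared data structure.
 * Under Terracotta, "forecasts" would be declared as a root in tc-config.xml
 * (assumed, not shown) so that publishing and consuming JVMs share one map;
 * here everything runs in a single JVM.
 */
public class WeatherStore {

    /** Immutable value object standing in for the post's Weather-Forecast object. */
    public static final class WeatherForecast {
        public final String zipCode;
        public final String summary;
        public final int highF;

        public WeatherForecast(String zipCode, String summary, int highF) {
            this.zipCode = zipCode;
            this.summary = summary;
            this.highF = highF;
        }

        @Override
        public String toString() {
            return zipCode + ": " + summary + ", high " + highF + "F";
        }
    }

    // Key: zip code. Value: latest forecast for that zip code.
    static final Map<String, WeatherForecast> forecasts = new ConcurrentHashMap<>();

    /** Publication side: replace only the entries that changed in the feed. */
    static void publish(Map<String, WeatherForecast> changed) {
        forecasts.putAll(changed);
    }

    /** Consumption side: the portlet reads the current forecast (null if unknown). */
    static WeatherForecast forecastFor(String zipCode) {
        return forecasts.get(zipCode);
    }

    public static void main(String[] args) {
        // Initial feed load for five (illustrative) zip codes.
        Map<String, WeatherForecast> initial = new HashMap<>();
        for (String zip : new String[] {"94301", "94043", "95014", "94086", "95054"}) {
            initial.put(zip, new WeatherForecast(zip, "Sunny", 68));
        }
        publish(initial);

        // A later feed update changes two of them; the other entries are untouched.
        publish(Map.of(
                "94043", new WeatherForecast("94043", "Rain", 58),
                "95014", new WeatherForecast("95014", "Fog", 61)));

        System.out.println(forecastFor("94043")); // the updated forecast
        System.out.println(forecastFor("94301")); // still the initial forecast
    }
}
```

`putAll` on a `ConcurrentHashMap` is applied entry by entry, so a reader may briefly see some of the updated zip codes and not others. If a portlet needs a batch of changes to become visible together, publishing a new immutable snapshot and swapping a single reference is a common alternative.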



    • Terracotta Solution Considerations (see figure):
      • Publishing Application and Consumption Application are implemented as two different applications.

      • The applications MUST be factored in such a way that both share common data structures that represent the content.

      • Class loader issues: since these are 2 different applications, they have different class loaders, so a common class loader is needed.

      • EXAMPLE:
        • Tomcat app server: the CMS app is deployed under the CMSApp context root and the consuming app under the UserApp context root.
        • Tomcat creates a class loader for each web application. The CMSApp class loader loads all unpacked classes and resources in CMSApp/WEB-INF/classes, as well as classes and resources in JARs under CMSApp/WEB-INF/lib; these are visible to CMSApp but to no other web application. Likewise, the UserApp class loader loads the unpacked classes and resources in UserApp/WEB-INF/classes and the JARs in UserApp/WEB-INF/lib.
        • Given this visibility problem, the shared content classes must instead be loaded by the Shared class loader. All unpacked classes and resources in $CATALINA_BASE/shared/classes, and classes and resources in JAR files under $CATALINA_BASE/shared/lib, are visible to all web applications (and thus to both CMSApp and UserApp), and the class loader name for those classes stays identical across CMSApp and UserApp (a Terracotta requirement).
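When setting up the Shared class loader, it helps to confirm at runtime where each application actually got the shared content classes from. The helper below is a hypothetical sketch (the `LoaderChain` name is illustrative, not part of Tomcat or Terracotta) that walks the JVM class-loader chain of a class using only standard `ClassLoader` methods; the first entry is the loader that defined the class. Logged from both CMSApp and UserApp for a shared content class, it should report the same Shared loader first in both. If each app instead reports its own web application loader, a copy of the class is probably still in that app's WEB-INF, which Tomcat's web application loader consults before delegating to its parent by default. Note that it reports JVM class loaders only, not the loader names Terracotta uses internally.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic: which class loaders stand behind a given class? */
public final class LoaderChain {

    private LoaderChain() {}

    /** Defining loader of {@code type} first, then each parent, ending at bootstrap. */
    public static List<String> describe(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            // Loader class name plus identity hash, so two distinct instances
            // of the same loader class are still told apart in the log.
            chain.add(loader.getClass().getName() + "@"
                    + Integer.toHexString(System.identityHashCode(loader)));
            loader = loader.getParent();
        }
        chain.add("<bootstrap>"); // a null loader means the bootstrap loader
        return chain;
    }

    public static void main(String[] args) {
        // JDK core classes come from the bootstrap loader: a one-entry chain.
        System.out.println(describe(String.class));
        // Application classes show their defining loader and its parents.
        System.out.println(describe(LoaderChain.class));
    }
}
```

In the Tomcat setup above, one might call `LoaderChain.describe(...)` on the shared content class from a startup hook in each application and compare the first entries in the logs.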



  2. Subscription and User Authentication and Authorization:

  3. Traditional Distributed Caching: for caching JDBC/ORM queries against the database for specific content, see http://www.terracotta.org/web/display/orgsite/Data+Caching and http://javamuse.blogspot.com/2008/11/distributed-cachingdatabase-offload.html

  4. Other examples of Terracotta being used in Content Portals include message board clustering (e.g. JForum and others) and other special-case usages.



Hope those of you in the Portal business find this useful.

About Me

I have spent the last 11+ years of my career either working for a vendor that provides infrastructure software or managing platform teams that deal with software-infrastructure concerns at large online presences (e.g. Walmart.com and, currently, Shutterfly). I am interested in building and analyzing systems that address cross-cutting concerns across enterprise application development (scalability, availability, security, manageability, latency, etc.).