Global Channel Content Caching (GCCC) & More

Overview

This document is presented for discussion and identifies proposed enhancements to uPortal 2.5.x to increase performance. Modifications are applied to the framework and CWebProxy/CGenericXSLT channels.

Global Channel Content Caching

There are different caching strategies utilized by the portal at different levels. This enhancement is specifically designed to reduce the number of outgoing HTTP/HTTPS calls the portal makes to retrieve external content, at the cost of additional memory use. How much additional memory is needed depends on the size of the retrieved content.

Channel Content Caching is currently supported for the CWebProxy and CGenericXSLT channels per user. In other words, if a channel is configured for caching, content is retrieved for a user who logs into the portal and cached for a set amount of time. This type of caching is defined in the framework by the ICacheable and IMultithreadedCacheable interfaces:

public interface ICacheable {

    /**
     * Requests the channel to generate a key uniquely describing its current state,
     * and a description of key usage.
     */
    public ChannelCacheKey generateKey();

    /**
     * Requests the channel to verify validity of the retrieved cache based on the validator object.
     */
    public boolean isCacheValid(Object validity);
}

The content cached by the framework is "processed" in that it is not the raw content retrieved from external resources. Instead, it is the data generated in the renderXML() method into the ContentHandler.

This strategy is beneficial since:

  1. The framework handles caching details (except the decision as to the validity of the cache entry)
  2. It reduces the number of channel renderXML() calls (and thereby external HTTP/S calls)

In summary, per-user (framework) caching caches content only for that user's instance of the channel.

This enhancement attempts to add an additional layer of caching. In the myRutgers portal, two observations have been noted:

  1. Many CWebProxy/CGenericXSLT channels render the same content for all users
  2. During high login activity (e.g., 50 logins per second), this can generate many external resource calls

External IO calls such as HTTP/S are blocking. In other words, the Java thread(s) the portal uses to make these calls for CWebProxy/CGenericXSLT will block on the external resource, waiting for the complete response. Reducing the number of network IO calls means fewer blocked threads, which helps most during peak load periods.

Global Channel Content Cache (GCCC): at a high level, a new cache is being introduced. This cache (backed by the WhirlyCache software) will be populated by CWebProxy/CGenericXSLT channels that cache content to be shared across users.

For example, suppose the portal has a CWebProxy channel that displays the Academic Calendar for the university. This calendar is the same for all students and is hosted at http://some.place.edu/acal

The first user who logs into the portal will attempt to retrieve this content from the GCCC. If it is not found, the channel makes the external call to retrieve the content and stores it in the GCCC. All subsequent users will then benefit from the first user's retrieval of the content.

After a period of time, this content will age and be removed from the cache. Another user, when rendering the channel, will then retrieve/populate the cache with updated content.
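
The look-up, populate, and expire flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a plain ConcurrentHashMap and a caller-supplied clock rather than WhirlyCache, it is written in current Java rather than the Java version uPortal 2.5.x targets, and the names GlobalContentCacheSketch, getOrFetch, and the fetcher argument are hypothetical and do not appear in the patch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical stand-in for the GCCC; the actual patch uses WhirlyCache.
final class GlobalContentCacheSketch {

    // A cached body plus the time at which it was fetched.
    private static final class Entry {
        final String content;
        final long fetchedAtMillis;

        Entry(String content, long fetchedAtMillis) {
            this.content = content;
            this.fetchedAtMillis = fetchedAtMillis;
        }
    }

    private final ConcurrentMap<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    GlobalContentCacheSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Returns cached content for the URI, calling the fetcher only when the
    // entry is missing or older than the TTL. Concurrent misses may each
    // fetch once; the last writer wins.
    String getOrFetch(String uri, long nowMillis, Function<String, String> fetcher) {
        Entry entry = entries.get(uri);
        if (entry == null || nowMillis - entry.fetchedAtMillis > ttlMillis) {
            entry = new Entry(fetcher.apply(uri), nowMillis);
            entries.put(uri, entry);
        }
        return entry.content;
    }

    public static void main(String[] args) {
        GlobalContentCacheSketch cache = new GlobalContentCacheSketch(15 * 60 * 1000L);
        int[] externalCalls = {0};
        Function<String, String> fetcher = uri -> "calendar#" + (++externalCalls[0]);
        String uri = "http://some.place.edu/acal";

        System.out.println(cache.getOrFetch(uri, 0L, fetcher));              // first user: external call
        System.out.println(cache.getOrFetch(uri, 60_000L, fetcher));         // later user: served from cache
        System.out.println(cache.getOrFetch(uri, 16 * 60 * 1000L, fetcher)); // after expiry: external call
        System.out.println("external calls: " + externalCalls[0]);
    }
}
```

In this sketch only the first and third requests reach the external resource; every request in between is answered from the shared entry.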

Note that the suggested changes are to the CWebProxy/CGenericXSLT channels, NOT the portal framework. Additionally, this proposed solution has been used by myRutgers successfully over the past few months and is being shared here for comment to be included into the uPortal codebase.

Changes are presented in an attached patch. A few notes:

Global Properties

properties/portal.properties

Introduce two new parameters that control default caching for all CWebProxy/CGenericXSLT channels. If not defined in this file, they will default to "false", providing backward compatibility (in behavior) with previous uPortal releases.

Note also that no additional parameters (e.g., cache timeouts) are added. The implementation re-uses existing framework cache parameters to reduce complexity.

+org.jasig.portal.channels.CGenericXSLT.cache_global_mode=false
+org.jasig.portal.channels.webproxy.CWebProxy.cache_global_mode=false

Cache Configuration

properties/whirlycache.xml

The WhirlyCache configuration includes a new cache definition, "contentCache". Default values were used and should be sufficient for most portals.

Exposing Cache To Portal Framework

source/org/jasig/portal/utils/cache/CacheFactory.java

Exposes the contentCache to the portal/framework according to portal APIs.

Channel properties

webpages/media/org/jasig/portal/channels/CGenericXSLT/CGenericJustXSLT.cpd
webpages/media/org/jasig/portal/channels/CGenericXSLT/CGenericXSLT.cpd
webpages/media/org/jasig/portal/channels/CGenericXSLT/RSS/RSS.cpd
webpages/media/org/jasig/portal/channels/webproxy/CWebProxy.cpd

XSLT/CWebProxy channel definitions were modified to have a channel specific parameter cacheGlobalMode/cw_cacheGlobalMode that overrides site-wide defaults for global caching.
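
How the site-wide default and the channel-specific parameter might combine can be sketched as below. This is a hedged illustration, not the lookup code from the patch: the class GlobalCacheModeSketch and its resolve() method are hypothetical, and only the property name is taken from this document.

```java
import java.util.Properties;

// Hypothetical helper; the patch's actual lookup code is not shown here.
final class GlobalCacheModeSketch {

    // Returns the channel-level value when present, otherwise the site-wide
    // default from portal.properties, otherwise false.
    static boolean resolve(Properties portalProps, String propertyName, String channelParam) {
        if (channelParam != null) {
            return Boolean.parseBoolean(channelParam.trim());
        }
        return Boolean.parseBoolean(portalProps.getProperty(propertyName, "false").trim());
    }

    public static void main(String[] args) {
        String name = "org.jasig.portal.channels.webproxy.CWebProxy.cache_global_mode";
        Properties props = new Properties();

        System.out.println(resolve(props, name, null));    // property absent: falls back to false
        props.setProperty(name, "true");
        System.out.println(resolve(props, name, null));    // site-wide default applies
        System.out.println(resolve(props, name, "false")); // channel parameter overrides the default
    }
}
```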

CGenericXSLT Channel

source/org/jasig/portal/channels/CGenericXSLT.java

As discussed, the content cache is retrieved from the CacheManager and stored in a static variable. External content is retrieved if not found in the cache, then stored for a configured period (the same period as defined by the per-user caching configuration parameter).

Of particular interest is the isCacheValid() method. The framework is asking, "is the processed data still valid?". For per-user caching, this is straightforward. For global caching, it is a more interesting question.

An example might best explain the concern. If user A logs into the portal at time O and retrieves content to be stored for 15 minutes in the Global Cache, the user also has a per-user cache of 15 minutes.

Then user B logs into the portal at O + 14. Content is retrieved from the GCCC, but the user also has a per-user cache value defined for 15 minutes (O + 14 + 15). To keep the user up-to-date, isCacheValid() should return false anytime after O + 15.

Therefore, the isCacheValid() method was changed to indicate to the framework a per-user cache expiration anytime after O + 15. This will cause the framework to invoke the channel renderXML(), causing retrieval of new content from the GCCC (or external resource if not found in the GCCC).
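
The timing rule in this example can be written down as a small sketch. It is an assumption-laden illustration and not the isCacheValid() code from the patch: times are plain minute counts, and the class and parameter names are hypothetical.

```java
// Hypothetical sketch of the timing rule; not the patch's isCacheValid() code.
final class GlobalCacheValiditySketch {

    // The per-user rendering stays valid only while the shared entry it was
    // built from is within its lifetime, regardless of when this particular
    // user rendered the channel.
    static boolean isCacheValid(long globalFetchedAtMinute, long timeoutMinutes, long nowMinute) {
        return nowMinute <= globalFetchedAtMinute + timeoutMinutes;
    }

    public static void main(String[] args) {
        long fetchedAt = 0; // user A populates the GCCC at time O
        long timeout = 15;  // same value for per-user and global caching

        // User B rendered from the GCCC entry at O + 14.
        System.out.println(isCacheValid(fetchedAt, timeout, 14)); // still valid
        System.out.println(isCacheValid(fetchedAt, timeout, 16)); // expired, although B's own 15 minutes have not passed
    }
}
```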

CWebProxy Channel

source/org/jasig/portal/channels/webproxy/CWebProxy.java

Changes are similar to CGenericXSLT. As in the CGenericXSLT channel, determination of when the channel content should be globally cached is very important.

Some rules that turn off global caching include:

  • state.cacheGlobalMode == false: defined through portal.properties default or channel specific parameter. If false, do not globally cache.
  • "none".equalsIgnoreCase(state.cacheMode): if per-user framework caching is turned off, do not do global caching.
  • "post".equalsIgnoreCase(state.runtimeData.getHttpRequestMethod()): do not globally cache HTTP POSTs.
  • state.fullxmlUri.indexOf('?') >= 0: do not globally cache if parameters are found on the url.
  • state.localConnContext != null: if a local connection context is defined, do not globally cache.
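
Taken together, the rules above amount to a single eligibility check. The sketch below is a simplified, hypothetical mirror of that check: the method takes plain arguments instead of the channel's state object, and the class name and the "all" cache mode value used in the demonstration are assumptions, not taken from the patch.

```java
// Hypothetical, simplified mirror of the rules listed above; the parameter
// names are illustrative and do not match the channel's state object.
final class GlobalCachingRulesSketch {

    static boolean isGloballyCacheable(boolean cacheGlobalMode,
                                       String cacheMode,
                                       String httpMethod,
                                       String fullXmlUri,
                                       Object localConnContext) {
        if (!cacheGlobalMode) return false;                       // disabled by default or by channel parameter
        if ("none".equalsIgnoreCase(cacheMode)) return false;      // per-user caching is off
        if ("post".equalsIgnoreCase(httpMethod)) return false;     // never share POST results
        if (fullXmlUri == null || fullXmlUri.indexOf('?') >= 0) return false; // URL carries parameters
        if (localConnContext != null) return false;                // connection is customized per user
        return true;
    }

    public static void main(String[] args) {
        String uri = "http://some.place.edu/acal";
        // "all" is an assumed placeholder for any cache mode other than "none".
        System.out.println(isGloballyCacheable(true, "all", "GET", uri, null));
        System.out.println(isGloballyCacheable(true, "all", "POST", uri, null));
        System.out.println(isGloballyCacheable(true, "all", "GET", uri + "?term=fall", null));
    }
}
```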

Reduce Tidy Objects

Reduces the number of temporary objects by creating a single Tidy object per thread.

Tidy is used to clean invalid XML data to aid in parsing. Instead of creating one Tidy object each time data is being Tidied, a single Tidy object is created per Thread. It is then re-used by that thread each time content needs to be Tidied.

Index: source/org/jasig/portal/channels/webproxy/CWebProxy.java
===================================================================
RCS file: /home/cvs/jasig/portal/source/org/jasig/portal/channels/webproxy/CWebProxy.java,v
retrieving revision 1.40.2.6
diff -u -r1.40.2.6 CWebProxy.java
--- source/org/jasig/portal/channels/webproxy/CWebProxy.java    20 Oct 2005 20:57:14 -0000  1.40.2.6
+++ source/org/jasig/portal/channels/webproxy/CWebProxy.java    21 Aug 2006 13:53:18 -0000
+
+  // Optimized; share the tidy object in this thread
+  private static final ThreadLocal perThreadTidy = new ThreadLocal() {
+      protected Object initialValue() {
+          Tidy tidy = new Tidy ();
+          tidy.setXHTML (true);
+          tidy.setDocType ("omit");
+          tidy.setQuiet(true);
+          tidy.setShowWarnings(false);
+          tidy.setNumEntities(true);
+          tidy.setWord2000(true);
+
+          tidy.setErrout(devNull);
+
+          return (tidy);
+      }
+  };

   /**
     * Get the contents of a URI as a String but send it through tidy first.
@@ -888,13 +1035,7 @@
         encoding = encoding.substring(1,encoding.length()+1);
     }

-    Tidy tidy = new Tidy ();
-    tidy.setXHTML (true);
-    tidy.setDocType ("omit");
-    tidy.setQuiet(true);
-    tidy.setShowWarnings(false);
-    tidy.setNumEntities(true);
-    tidy.setWord2000(true);
+    Tidy tidy = (Tidy) perThreadTidy.get();

     // If charset is specified in header, set JTidy's
     // character encoding  to either UTF-8, ISO-8859-1
@@ -916,7 +1057,7 @@
       tidy.setCharEncoding(org.w3c.tidy.Configuration.UTF8);
     }

-    tidy.setErrout(devNull);
+    // tidy.setErrout(devNull);

     ByteArrayOutputStream stream = new ByteArrayOutputStream (1024);
     BufferedOutputStream out = new BufferedOutputStream (stream);

Optimize BaseMarkupSerializer

Much time was being spent in the BaseMarkupSerializer.printEscaped() method, and calls to getEntityRef() from it were very numerous. To reduce them, an inline check handles the most common character data (ASCII letters and whitespace) directly, without calling getEntityRef().

Index: source/org/jasig/portal/serialize/BaseMarkupSerializer.java
===================================================================
RCS file: /home/cvs/jasig/portal/source/org/jasig/portal/serialize/BaseMarkupSerializer.java,v
retrieving revision 1.13.2.1
diff -u -r1.13.2.1 BaseMarkupSerializer.java
--- source/org/jasig/portal/serialize/BaseMarkupSerializer.java	5 Aug 2005 18:39:26 -0000	1.13.2.1
+++ source/org/jasig/portal/serialize/BaseMarkupSerializer.java	21 Aug 2006 13:53:19 -0000
@@ -1405,8 +1405,12 @@
         // character, print it. The list of available entity
         // references is almost but not identical between
         // XML and HTML.
-        charRef = getEntityRef( ch );
-        if ( charRef != null ) {
+
+        // Optimized; quick checks for ASCII letter/whitespace
+        if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
+                || ch == ' ' || ch == '\n' || ch == '\t' || ch == '\r') {
+            _printer.printText((char)ch );
+        } else if ((charRef = getEntityRef(ch)) != null ) {
             _printer.printText( '&' );
             _printer.printText( charRef );
             _printer.printText( ';' );

Optimize HTMLdtd

Calls to HTMLdtd.initialize() were numerous, although initialize() only needs to be called once. Since it is already invoked from a static block, the per-call invocation was removed.

Index: source/org/jasig/portal/serialize/HTMLdtd.java
===================================================================
RCS file: /home/cvs/jasig/portal/source/org/jasig/portal/serialize/HTMLdtd.java,v
retrieving revision 1.3.10.2
diff -u -r1.3.10.2 HTMLdtd.java
--- source/org/jasig/portal/serialize/HTMLdtd.java	4 Aug 2005 22:17:46 -0000	1.3.10.2
+++ source/org/jasig/portal/serialize/HTMLdtd.java	21 Aug 2006 13:53:19 -0000
@@ -343,7 +343,8 @@
        if (value > 0xffff)
             return null;

-        initialize();
+        // Optimized; no need to call since done in static block
+        // initialize();
         if( value < _entity.length){
         return  _entity[value];
         } else {

Optimize Printer

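Much time was being spent in the Printer.printText() method. The logic was re-worked to make more efficient use of the buffer: text is copied into the buffer in bulk rather than one character at a time, and text too large for the buffer is written directly.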
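
The buffering strategy can be tried in isolation with the sketch below. It is a hypothetical, stripped-down class that follows the same steps as the patched method; it is not the Printer class itself, and the names BufferedTextSketch and flush() are assumptions made for the illustration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical, stripped-down buffer that follows the same strategy as the
// patched printText(); it is not the Printer class itself.
final class BufferedTextSketch {
    private final Writer writer;
    private final char[] buffer;
    private int pos = 0;

    BufferedTextSketch(Writer writer, int bufferSize) {
        this.writer = writer;
        this.buffer = new char[bufferSize];
    }

    void printText(String text) throws IOException {
        int length = text.length();
        // Not enough room left: write out what is buffered so far.
        if (length + pos >= buffer.length) {
            writer.write(buffer, 0, pos);
            pos = 0;
        }
        if (length + pos >= buffer.length) {
            // Text is as large as the whole buffer: bypass the buffer.
            writer.write(text);
        } else {
            // Bulk copy instead of a per-character loop.
            text.getChars(0, length, buffer, pos);
            pos += length;
        }
    }

    // Writes any buffered characters to the underlying writer.
    void flush() throws IOException {
        writer.write(buffer, 0, pos);
        pos = 0;
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        BufferedTextSketch printer = new BufferedTextSketch(out, 8);
        printer.printText("ab");         // buffered
        printer.printText("cdefghijkl"); // forces a write of "ab", then goes out directly
        printer.printText("mn");         // buffered again
        printer.flush();
        System.out.println(out);
    }
}
```

Because the buffer is always written out before oversized text bypasses it, output order is preserved.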

Index: source/org/jasig/portal/serialize/Printer.java
===================================================================
RCS file: /home/cvs/jasig/portal/source/org/jasig/portal/serialize/Printer.java,v
retrieving revision 1.2
diff -u -r1.2 Printer.java
--- source/org/jasig/portal/serialize/Printer.java	4 Apr 2003 00:46:44 -0000	1.2
+++ source/org/jasig/portal/serialize/Printer.java	21 Aug 2006 13:53:19 -0000
@@ -203,13 +203,22 @@
     {
         try {
             int length = text.length();
-            for ( int i = 0 ; i < length ; ++i ) {
-                if ( _pos == BufferSize ) {
-                    _writer.write( _buffer );
-                    _pos = 0;
-                }
-                _buffer[ _pos ] = text.charAt( i );
-                ++_pos;
+
+            // if buffer doesn't have enough
+            // room for text, write it
+            if (length + _pos >= BufferSize) {
+                _writer.write(_buffer, 0, _pos);
+                _pos = 0;
+            }
+
+            // now, make sure the text isn't bigger
+            // than the buffer; if so, write text directly
+            if (length + _pos >= BufferSize) {
+                _writer.write(text);
+            } else {
+                // append text to buffer
+                text.getChars(0, length, _buffer, _pos);
+                _pos += length;
             }
         } catch ( IOException except ) {
             // We don't throw an exception, but hold it

Optimize XHTMLSerializer

Many calls to String.toLowerCase() were identified in XHTMLSerializer.startElement(). Lower-casing the element name once and re-using the result reduces the number of objects created.

Index: source/org/jasig/portal/serialize/XHTMLSerializer.java
===================================================================
RCS file: /home/cvs/jasig/portal/source/org/jasig/portal/serialize/XHTMLSerializer.java,v
retrieving revision 1.5
diff -u -r1.5 XHTMLSerializer.java
--- source/org/jasig/portal/serialize/XHTMLSerializer.java	13 May 2005 20:12:57 -0000	1.5
+++ source/org/jasig/portal/serialize/XHTMLSerializer.java	21 Aug 2006 13:53:19 -0000
@@ -220,9 +220,12 @@
                     htmlName = null;
             }

+            // Optimized; lowercase rawName only once
+            String lrawName = rawName.toLowerCase();
+
             // XHTML: element names are lower case, DOM will be different
             _printer.printText( '<' );
-            _printer.printText( rawName.toLowerCase() );
+            _printer.printText( lrawName );
             _printer.indent();

             // For each attribute serialize it's name and value as one part,
@@ -241,7 +244,7 @@
                         _printer.printText( name );
                         _printer.printText( "=\"" );
                         value = ProxyWriter.considerProxyRewrite(name,localName,value);
-                        value = appendAnchorIfNecessary(rawName.toLowerCase(),name,value);
+                        value = appendAnchorIfNecessary(lrawName,name,value);
                         printEscaped( value );
                         _printer.printText( '"' );
                     }
@@ -294,8 +297,8 @@

             // Handle SCRIPT and STYLE specifically by changing the
             // state of the current element to CDATA
-            if ( htmlName != null && ( rawName.equalsIgnoreCase( "SCRIPT" ) ||
-                                       rawName.equalsIgnoreCase( "STYLE" ) ) ) {
+            if ( htmlName != null && ( lrawName.equals( "script" ) ||
+                                       lrawName.equals( "style" ) ) ) {
                     // XHTML: Print contents as CDATA section
                     state.doCData = true;
             }

Optimize ResourceLoader

Much time was being spent in ResourceLoader.getResourceAsURLString(). The method is called frequently and was re-loading the same resource from disk each time. A simple, small cache was introduced to hold the first few resource lookups.
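
The idea of a small, bounded, two-level cache (outer map keyed by class name, inner map keyed by resource) can be sketched as follows. This is a hypothetical illustration in current Java, not the patched ResourceLoader: the class TwoLevelCacheSketch, its size limits, and the loader argument are all assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a small, bounded two-level cache; the limits are
// constructor parameters here rather than the fixed values used in the patch.
final class TwoLevelCacheSketch {
    private final Map<String, Map<String, String>> outer =
            Collections.synchronizedMap(new HashMap<>());
    private final int maxOuter;
    private final int maxInner;

    TwoLevelCacheSketch(int maxOuter, int maxInner) {
        this.maxOuter = maxOuter;
        this.maxInner = maxInner;
    }

    String get(String className, String resource, Supplier<String> loader) {
        Map<String, String> inner = outer.get(className);
        if (inner == null && outer.size() < maxOuter) {
            // The size check is not atomic, so the bound is approximate.
            outer.putIfAbsent(className, Collections.synchronizedMap(new HashMap<>()));
            inner = outer.get(className);
        }
        if (inner != null) {
            String cached = inner.get(resource);
            if (cached != null) {
                return cached;
            }
        }
        // Cache miss, or the cache is full for this class: do the expensive work.
        String value = loader.get();
        if (value != null && inner != null && inner.size() < maxInner) {
            inner.put(resource, value);
        }
        return value;
    }

    public static void main(String[] args) {
        TwoLevelCacheSketch cache = new TwoLevelCacheSketch(96, 8);
        int[] loads = {0};
        Supplier<String> loader = () -> "file:/resource#" + (++loads[0]);
        System.out.println(cache.get("ExampleChannel", "example.xsl", loader)); // loaded
        System.out.println(cache.get("ExampleChannel", "example.xsl", loader)); // cached
        System.out.println("loads: " + loads[0]);
    }
}
```

Once either limit is reached, lookups for new keys fall through to the loader on every call, so the cache never grows beyond roughly maxOuter times maxInner entries.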

Index: source/org/jasig/portal/utils/ResourceLoader.java
===================================================================
RCS file: /home/cvs/jasig/portal/source/org/jasig/portal/utils/ResourceLoader.java,v
retrieving revision 1.20
diff -u -r1.20 ResourceLoader.java
--- source/org/jasig/portal/utils/ResourceLoader.java	20 Apr 2005 07:24:14 -0000	1.20
+++ source/org/jasig/portal/utils/ResourceLoader.java	21 Aug 2006 13:53:19 -0000
@@ -14,6 +14,9 @@
 import java.net.MalformedURLException;
 import java.net.URL;
 import java.net.URLDecoder;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
 import java.util.Properties;

 import javax.xml.parsers.DocumentBuilderFactory;
@@ -108,16 +111,58 @@
    * @throws org.jasig.portal.ResourceMissingException
    */
   public static String getResourceAsURLString(Class requestingClass, String resource) throws ResourceMissingException {
-    return getResourceAsURL(requestingClass, resource).toString();
-  }
+      String res;
+        final String key = requestingClass.getName();
+        // Optimized; cache results of first n lookups
+
+        // maintain a hashmap of hashmaps; keyed off of requestingClass name
+        Map rmap = (Map) chm.get(key);
+        if (rmap == null && chm.size() < 96) {
+            // we store about 96 items; may be a few more since we're not
+            // sync'ing
+            chm.put(key, Collections.synchronizedMap(new HashMap(12)));
+
+            // it's possible rmap below isn't the value we just put - that's ok
+            // though
+            rmap = (Map) chm.get(key);
+
+        } else if (rmap != null && (res = (String) rmap.get(resource)) != null) {
+            return (res);
+        }
+
+        // at this point, we have to execute the expensive operation
+        res = getResourceAsURL(requestingClass, resource).toString();
+
+        if (res != null && rmap != null && rmap.size() < 8) {
+            rmap.put(resource, res);
+        }
+
+        return (res);
+    }
+
+    // The resource hash map (chm) is keyed off of the requestingClass name,
+    // and will contain entries of HashMap's, each keyed off of resource. A
+    // single hashmap could have been used with a key of
+    // "classname:resourcename",
+    // but that would involve constructing many string objects when putting
+    // and/or getting from the map. Therefore, two maps are used. Cache sizes
+    // were selected at random; numbers selected successfully cached the
+    // values for the myRutgers portal
+    private static final Map chm = Collections
+            .synchronizedMap(new HashMap(128));

   /**
-   * Returns the requested resource as a File.
-   * @param requestingClass the java.lang.Class object of the class that is attempting to load the resource
-   * @param resource a String describing the full or partial URL of the resource to load
-   * @return the requested resource as a File
-   * @throws org.jasig.portal.ResourceMissingException
-   */
+     * Returns the requested resource as a File.
+     *
+     * @param requestingClass
+     *            the java.lang.Class object of the class that is attempting to
+     *            load the resource
+     * @param resource
+     *            a String describing the full or partial URL of the resource to
+     *            load
+     * @return the requested resource as a File
+     * @throws org.jasig.portal.ResourceMissingException
+     */
   public static File getResourceAsFile(Class requestingClass, String resource) throws ResourceMissingException {
     return new File(getResourceAsFileString(requestingClass, resource));
   }

Optimize SubstitutionIntegerFilter and SubstitutionWriter

Many calls were being made to write individual char values while iterating over character arrays. The code was modified to pass the array to the filter instead.

Index: source/org/jasig/portal/utils/SubstitutionIntegerFilter.java
===================================================================
RCS file: /home/cvs/jasig/portal/source/org/jasig/portal/utils/SubstitutionIntegerFilter.java,v
retrieving revision 1.8.4.1
diff -u -r1.8.4.1 SubstitutionIntegerFilter.java
--- source/org/jasig/portal/utils/SubstitutionIntegerFilter.java	9 Sep 2005 14:50:38 -0000	1.8.4.1
+++ source/org/jasig/portal/utils/SubstitutionIntegerFilter.java	21 Aug 2006 13:53:19 -0000
@@ -40,6 +40,7 @@
     private char[] buffer;
     private int bufferindex;
     private int maxBuffer = DEFAULT_BUFFER_SIZE;
+    private int maxBufferSubstitute = maxBuffer - 2;

     /**
      * Creates a new <code>SubstitutionIntegerFilter</code> instance.
@@ -72,6 +73,7 @@
         this.matchindex=0;
         this.bufferindex=0;
         this.maxBuffer=bufferSize-target.length;
+        this.maxBufferSubstitute = this.maxBuffer - 2;
         this.buffer=new char[maxBuffer + target.length];
     }

@@ -95,6 +97,39 @@
         }
     }

+    // Optimized; perform filter against given array
+    public void write(final char[] ca, final int off, final int len) throws IOException {
+        final int tlen = target.length - 1;
+
+        for (int i = off; i < off + len; i++) {
+            final char c = ca[i];
+            if (c == target[matchindex]) {
+                if (matchindex < tlen) {
+                    matchindex++;
+                } else {
+                    // we have a match, roll back buffer and add substitute
+                    bufferindex = bufferindex - matchindex;
+                    matchindex = 0;
+                    for (int x =0; x<substitute.length;x++){
+                        if ((bufferindex > (maxBufferSubstitute)) && matchindex == 0) {
+                            flush();
+                        }
+                        buffer[bufferindex++] = substitute[x];
+                    }
+                    continue;
+                }
+            } else {
+                matchindex=0;
+            }
+
+            if ((bufferindex > maxBufferSubstitute) && matchindex == 0) {
+                flush();
+            }
+
+            buffer[bufferindex++] = c;
+        }
+    }
+
     public void flush() throws IOException {
         // do internal flush
         out.write(buffer,0,bufferindex);
@@ -109,7 +144,7 @@

     protected void addToBuffer(char i) throws IOException{
       // flush if buffer fills up, but only if we're not tracking a possible substitution
-        if ((bufferindex > (maxBuffer-2)) && matchindex==0){
+        if ((bufferindex > maxBufferSubstitute) && matchindex==0){
           flush();
         }
         buffer[bufferindex++] = i;



Index: source/org/jasig/portal/utils/SubstitutionWriter.java
===================================================================
RCS file: /home/cvs/jasig/portal/source/org/jasig/portal/utils/SubstitutionWriter.java,v
retrieving revision 1.6.4.1
diff -u -r1.6.4.1 SubstitutionWriter.java
--- source/org/jasig/portal/utils/SubstitutionWriter.java	9 Sep 2005 14:50:38 -0000	1.6.4.1
+++ source/org/jasig/portal/utils/SubstitutionWriter.java	21 Aug 2006 13:53:19 -0000
@@ -60,9 +60,8 @@
         // check boundaries
         if(off+len>cbuf.length) throw new IOException("Invalid offsent or length specified");

-        for(int i=0;i<len;i++) {
-            filter.write(cbuf[i]);
-        }
+        // Optimized; write character array directly to filter
+        filter.write(cbuf, off, len);
     }

     /**