uPortal Heap Tuning

Reasoning

In preparation for the Fall 2006 semester, UW-Madison looked for ways to improve portal performance. The initial effort focused on tuning the JVM heap for the way uPortal uses memory.

Notes & Warnings

This configuration is meant for a multi-cpu machine. It WILL NOT perform well on a single CPU machine.
This configuration was created specifically for the hardware and usage patterns at UW-Madison. While it may work well as is for you, it is recommended that you perform your own performance testing and tweak the configuration as needed.

Preparation

Reading

Before making changes to the heap configuration, the following white papers were read to understand the options available for the heap and garbage collectors.

Java Tuning White Paper
- A general overview of tuning strategies for the JVM.
Tuning Garbage Collection with Java 5
- Details about the GC options for Java 5.
Understanding Concurrent Mark Sweep Garbage Collector Logs
- Provides good explanations of all the information in the GC logs.
The PrintCompilationFlag
- Not covered here but useful for learning what the JIT is doing on your system (or if you have HotSpot core dumps).

Testing

We used a JMeter load test script to test each change made to the JVM configuration. In each iteration the script logged into the portal, visited one other tab (chosen at random) and logged out. Each test lasted 2 hours and was capped at a throughput of 2 requests per second. The throughput cap only applied to the initial view of the home tab after login and the random tab visit.

Breakdown of Configuration

# Use the Hot Spot server compiler
-server

# Set the initial & max heap size to the same value. This makes monitoring heap usage a bit easier
-Xms1280m
-Xmx1280m

# Set the NewSize and MaxNewSize (space for eden and survivor) to half of the max heap. uPortal creates a lot of temporary objects; the large NewSize provides enough space for these objects to be created, used, and die in eden between GCs. This reduces the need to copy objects from eden to survivor. DO NOT set this to more than half of the max heap or the GC can't fulfill the Young Generation Guarantee, which will cause constant full GCs.
-XX:NewSize=640m
-XX:MaxNewSize=640m

# Set the survivor heap ratio. To determine the size of your eden and survivor spaces use the formulas "SurvivorSize = MaxNewSize / (SurvivorRatio + 2)" and "EdenSize = SurvivorSize * SurvivorRatio". Remember that there are two Survivor spaces and one Eden space.
-XX:SurvivorRatio=5

# The TargetSurvivorRatio specifies how full the Survivor space is allowed to be. This defaults to 50%; setting it to 90% allows more of the memory to be used and is desired for high throughput applications.
-XX:TargetSurvivorRatio=90

# MaxTenuringThreshold is the maximum number of times a live object can be copied between the survivor spaces before being moved to the tenured space. Objects are copied between the survivor spaces at each minor GC. See the PrintTenuringDistribution section below for how to tune this setting for your portal.
-XX:MaxTenuringThreshold=12

# Enable the Concurrent Mark Sweep (CMS) GC engine for the tenured space. CMS is not the default collector in JDK5 (server-class machines default to the parallel throughput collector), so it has to be enabled explicitly. CMS does most of its work in the tenured space concurrently with the application, reducing the length of the "stop-the-world" pauses the GC has to do.
-XX:+UseConcMarkSweepGC

# Enable incremental mode for CMS (iCMS). This provides a significant improvement in throughput for applications running on machines with a low number of CPUs (1 to 2). iCMS will run the CMS collector between the young generation collections. This has the effect of constant small iCMS GCs ensuring low pause times and good maintenance of the tenured space.
-XX:+CMSIncrementalMode
-XX:+CMSIncrementalPacing
-XX:+CMSParallelRemarkEnabled

# Turn on the parallel new (ParNew) GC engine. This runs young generation collections on multiple GC threads in parallel. Young GCs still stop the world, but the pauses should be shorter with this option turned on.
-XX:+UseParNewGC

# On JDK5 with a 64-bit processor, the JVM reserves 1GB of virtual address space for the PermGen heap. This can cause some problems, so we set it to 64MB, the default for JDK6.
-XX:PermSize=64m
-XX:MaxPermSize=64m
-XX:+UseTLAB

# Enable class unloading. This is needed with ConcMarkSweepGC or unused classes will never be unloaded and there is the possibility of running out of PermGen space.
-XX:+CMSClassUnloadingEnabled
-XX:+CMSPermGenSweepingEnabled

# PrintCompilation prints the classes & methods that HotSpot is compiling to native code to System.out. The code cache memory settings and CompileCommandFile are set because of a core dump we were seeing in the JVM when it tried to compile a uPortal class. The hotspot_compiler file contains the line "exclude org.jasig.portal.UserInstance renderState"
-XX:+PrintCompilation
-XX:CodeCacheMinimumFreeSpace=2M
-XX:ReservedCodeCacheSize=64M
-XX:CompileCommandFile=/my/portal/bin/hotspot_compiler

# Enable JMX Remote Monitoring. Monitoring memory usage via JConsole provides very good insight into what the portal is doing.
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9000
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.password.file=/my/portal/bin/jmxremote.password
-Dcom.sun.management.jmxremote.access.file=/my/portal/bin/jmxremote.access

# Turn on GC logging. The GC logs are very useful for tuning your heap. PrintTenuringDistribution should be enabled when adjusting the MaxTenuringThreshold.
-verbose:gc
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-Xloggc:/my/portal/logs/portal/gc.log

# Enable remote debugging port
-Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n
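
The sizing rules described in the breakdown above (NewSize at most half of the max heap, the SurvivorRatio formulas, and TargetSurvivorRatio) can be checked with a few lines of arithmetic before a JVM is ever started. The following is a minimal Python sketch under that assumption; the function name `heap_layout` and its parameters are hypothetical helpers and not part of uPortal or the JVM, and the sizes the JVM actually reports may differ slightly from these estimates.

```python
def heap_layout(max_heap_mb, max_new_mb, survivor_ratio, target_survivor_ratio):
    """Apply the sizing formulas from the breakdown. All sizes are in MB."""
    # Rule of thumb from this page: young generation must not exceed half the heap.
    if max_new_mb > max_heap_mb / 2:
        raise ValueError("young generation is larger than half of the max heap")

    # SurvivorSize = MaxNewSize / (SurvivorRatio + 2)
    survivor_mb = max_new_mb / (survivor_ratio + 2)
    # EdenSize = SurvivorSize * SurvivorRatio
    eden_mb = survivor_mb * survivor_ratio

    return {
        "tenured_mb": max_heap_mb - max_new_mb,
        "eden_mb": eden_mb,
        "survivor_mb": survivor_mb,  # size of EACH of the two survivor spaces
        "desired_survivor_mb": survivor_mb * target_survivor_ratio / 100,
    }


# Values from the configuration on this page:
# -Xmx1280m, -XX:MaxNewSize=640m, -XX:SurvivorRatio=5, -XX:TargetSurvivorRatio=90
layout = heap_layout(1280, 640, 5, 90)
for name, size in layout.items():
    print(f"{name}: {size:.1f} MB")
```

For this configuration the sketch gives roughly 457 MB of eden, 91 MB for each survivor space, and a desired survivor size of about 82 MB, which is in line with the "Desired survivor size 86232268 bytes" reported in the GC log example further down this page.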

PrintTenuringDistribution

PrintTenuringDistribution lists, for each age from 1 to X, the total size of the objects that have survived that many young generation GCs, where X is the current TenuringThreshold.

Example gc log output with PrintTenuringDistribution enabled:

1096.789: [GC 1096.789: [ParNew
Desired survivor size 86232268 bytes, new threshold 6 (max 12)
- age   1:   50754696 bytes,   50754696 total
- age   2:   12147696 bytes,   62902392 total
- age   3:   12295552 bytes,   75197944 total
- age   4:    6537136 bytes,   81735080 total
- age   5:    2435944 bytes,   84171024 total
- age   6:    3013488 bytes,   87184512 total
- age   7:     627368 bytes,   87811880 total
- age   8:     999536 bytes,   88811416 total
- age   9:     924656 bytes,   89736072 total
- age  10:    1811480 bytes,   91547552 total
: 554848K->89528K(561792K), 0.5317388 secs] 607743K->146164K(1217152K) icms_dc=18 , 0.5326526 secs]

This entry is from a young generation collection by the ParNew collector. The MaxTenuringThreshold is set to 12, but the JVM has just adjusted the actual TenuringThreshold to 6: the running total first exceeds the desired survivor size (86232268 bytes) at age 6, so only objects up to about that age will fit into the survivor space (in this config each survivor space is 91MB and is allowed to be 90% full).
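
The threshold in this log entry can be reproduced with a simple running total. The sketch below is a simplified model that is consistent with the entry above, not a copy of the JVM's implementation; the function name `estimate_threshold` is hypothetical, and the numbers are taken directly from the example log.

```python
# Bytes per age (age 1 first), copied from the example log entry.
AGE_BYTES = [
    50754696, 12147696, 12295552, 6537136, 2435944,
    3013488, 627368, 999536, 924656, 1811480,
]
DESIRED_SURVIVOR_BYTES = 86232268  # "Desired survivor size" in the log
MAX_TENURING_THRESHOLD = 12        # -XX:MaxTenuringThreshold=12


def estimate_threshold(age_bytes, desired_bytes, max_threshold):
    """Return the first age at which the running total exceeds the desired
    survivor size, capped at max_threshold (simplified model)."""
    total = 0
    for age, size in enumerate(age_bytes, start=1):
        total += size
        if total > desired_bytes:
            return min(age, max_threshold)
    # Everything fits, so objects may age up to the configured maximum.
    return max_threshold


print(estimate_threshold(AGE_BYTES, DESIRED_SURVIVOR_BYTES, MAX_TENURING_THRESHOLD))
```

The running total is 84171024 bytes at age 5 and 87184512 bytes at age 6, so the sketch returns 6, matching the "new threshold 6 (max 12)" in the log.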

The log shows that generally each age group is smaller than the one before it. This is expected and will continue up to a certain point. After that point the numbers start going back up, as the remaining objects stay alive in the survivor space for a long time. The low point in size is the sweet spot and is where your MaxTenuringThreshold should be set. This has the effect that objects that would generally survive past the sweet spot are copied into tenured space sooner rather than later, saving them from unneeded copies between the survivor spaces.
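
One way to look for the low point is to pull the per-age sizes out of the GC log and find the age with the smallest value. The following Python sketch assumes the log lines have the same format as the example above; the names `parse_ages` and `low_point` are hypothetical. A single log entry is noisy, so in practice it is probably better to compare many entries from a representative load before settling on a value.

```python
import re

# Matches lines such as "- age   7:     627368 bytes,   87811880 total"
AGE_LINE = re.compile(r"- age\s+(\d+):\s+(\d+) bytes,\s+(\d+) total")

SAMPLE = """\
- age   1:   50754696 bytes,   50754696 total
- age   2:   12147696 bytes,   62902392 total
- age   3:   12295552 bytes,   75197944 total
- age   4:    6537136 bytes,   81735080 total
- age   5:    2435944 bytes,   84171024 total
- age   6:    3013488 bytes,   87184512 total
- age   7:     627368 bytes,   87811880 total
- age   8:     999536 bytes,   88811416 total
- age   9:     924656 bytes,   89736072 total
- age  10:    1811480 bytes,   91547552 total
"""


def parse_ages(log_text):
    """Return {age: bytes} for every tenuring distribution line in log_text."""
    return {int(age): int(size) for age, size, _total in AGE_LINE.findall(log_text)}


def low_point(ages):
    """Return the age with the smallest number of surviving bytes."""
    return min(ages, key=ages.get)


ages = parse_ages(SAMPLE)
print(low_point(ages))
```

For the example entry the smallest age group is age 7 (627368 bytes), after which the sizes start to climb again.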