Monitoring uPortal MBeans, Jmxterm and Cacti

Monitoring uPortal MBeans with Jmxterm

Background 

The uPortal software maintains a number of statistics as managed beans (MBeans) in the Java Virtual Machine (JVM). A Java Management Extensions (JMX) client, such as Jmxterm, can retrieve those statistics so that they can be made available to an external monitoring system. This document uses Cacti (open source - http://www.cacti.net/) as the monitoring system; it is well suited to archiving this kind of data and displaying it as a collection of versatile graphs. The same approach can be used to feed a monitoring system such as Spectrum (commercial - http://www.ca.com/). Spectrum's strength is that you can set up thresholds and have it raise alerts when they are exceeded.

Note: This documentation is rather UNIX/Linux specific.

Opening JMX up on the JVM

There are a number of JMX-related command line arguments that can be given to the "java" command when starting uPortal. Here are some that I've used:

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=7777
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false 

Some of those settings might not be appropriate for your site. If you restrict access to the TCP/IP port you open, for example with a firewall, you may be comfortable going without authentication or SSL, as shown above. For instance, if port 7777 is reachable only from localhost (127.0.0.1), and anyone who can invoke a shell on the host has already authenticated, then you may not be concerned with JMX authentication. If all communication stays within the host, there is no need for SSL.
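
As a minimal sketch of how these flags might be passed, assume uPortal runs inside Tomcat (the Catalina domain in the listing further down suggests it does) and that your Tomcat start scripts source a bin/setenv.sh file. Both the file location and the use of CATALINA_OPTS are assumptions about your deployment, not something uPortal requires.

```shell
#!/bin/bash
# Hypothetical $CATALINA_HOME/bin/setenv.sh fragment (assumed location).
# Appends the JMX flags shown above to whatever CATALINA_OPTS already holds.
JMX_PORT=7777   # same port as in the example above; pick one that suits your site

CATALINA_OPTS="${CATALINA_OPTS:-} -Dcom.sun.management.jmxremote"
CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote.ssl=false"
export CATALINA_OPTS
```

Keeping the port in one variable makes it easier to keep the firewall rule and the monitoring scripts in step with the JVM setting.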

Working with those assumptions, we can write shell scripts to feed a locally installed Simple Network Management Protocol (SNMP) agent, which can then pass the information to an enterprise monitoring service.

If you're planning to connect to your JVM remotely, you'll want to revisit this topic. For reference, this is the same configuration needed to access a JVM with JConsole.

Downloading Jmxterm

Jmxterm is open source software offered by the CyclopsGroup at their web site:

http://wiki.cyclopsgroup.org/jmxterm

The software is contained entirely within a single JAR file. At the time of this writing, the most recent file was "jmxterm-1.0-alpha-4-uber.jar". Download this file and put it into a directory that makes sense for your site.

The documentation available on that web site is much more complete and comprehensive than this document, which is focused on providing some uPortal-specific examples.

Invoking Jmxterm as a Shell 

To work interactively with Jmxterm, you can invoke it as a shell.

Example:

[tongb@portal2 batch]$ java -jar jmxterm-1.0-alpha-4-uber.jar
Welcome to JMX terminal. Type "help" for available commands.
$> 

Once you receive the "$>" prompt, you can enter jmxterm commands. For a complete treatment of all of the commands, consult the jmxterm web site. This document will show just the steps necessary for retrieving uPortal MBean attributes.

Retrieving uPortal MBean Attributes

There are a couple of steps to go through before you can retrieve the uPortal MBean attributes. First, open a connection to the JVM.

Example:

$> open localhost:7777
#Connection to localhost:7777 is opened

MBeans are grouped by domain. We start by getting a list of all of the available domains; uPortal is the one we want.

Example:

$> domains
#following domains are available
BookmarkPortlets
Catalina
JMImplementation
Users
WebProxyPortlet
com.sun.management
java.lang
java.util.logging
net.sf.ehcache
uPortal

Next, we can get a list of the MBeans in a domain.

Example:

$> beans -d uPortal
#domain = uPortal:
uPortal:name=HibernateStatistics,section=Persistence
uPortal:name=PortalDB,section=Persistence
uPortal:name=PortalStatsHibernateStatistics,section=Persistence
uPortal:name=Statistics,section=Framework

The results show there are four MBeans in the uPortal domain. You can examine what an MBean can provide using the "info" command. The "-b" argument specifies the bean name. The "-d" argument specifies the domain; it can be omitted, as in the example below, once a domain has been selected with the "domain" command.

Example:

$> info -b name=Statistics,section=Framework
#mbean = uPortal:name=Statistics,section=Framework
#class name = org.jasig.portal.jmx.FrameworkMBeanImpl
# attributes
  %0   - AuthenticationAverage (long, r)
  %1   - AuthenticationHighMax (long, r)
  %2   - AuthenticationLast (long, r)
  %3   - AuthenticationMax (long, r)
  %4   - AuthenticationMin (long, r)
  %5   - AuthenticationTotalLogins (long, r)
  %6   - ChannelRendererActiveThreads (long, r)
  %7   - ChannelRendererMaxActiveThreads (long, r)
  %8   - DatabaseAverage (long, r)
  %9   - DatabaseHighMax (long, r)
  %10  - DatabaseLast (long, r)
  %11  - DatabaseMax (long, r)
  %12  - DatabaseMin (long, r)
  %13  - DatabaseTotalConnections (long, r)
  %14  - GuestSessionCount (long, r)
  %15  - LastAuthentication (org.jasig.portal.utils.MovingAverageSample, r)
  %16  - LastDatabase (org.jasig.portal.utils.MovingAverageSample, r)
  %17  - LastRender (org.jasig.portal.utils.MovingAverageSample, r)
  %18  - RDBMActiveConnectionCount (int, r)
  %19  - RDBMMaxConnectionCount (int, r)
  %20  - RecentProblems ([Ljava.lang.String;, r)
  %21  - RenderAverage (long, r)
  %22  - RenderHighMax (long, r)
  %23  - RenderLast (long, r)
  %24  - RenderMax (long, r)
  %25  - RenderMin (long, r)
  %26  - RenderTotalRenders (long, r)
  %27  - StartedAt (java.util.Date, r)
  %28  - ThreadCount (long, r)
  %29  - UserSessionCount (long, r)
# operations
  %0   - long getAuthenticationAverage()
  %1   - long getAuthenticationHighMax()
  %2   - long getAuthenticationLast()
  %3   - long getAuthenticationMax()
  %4   - long getAuthenticationMin()
  %5   - long getAuthenticationTotalLogins()
  %6   - long getChannelRendererActiveThreads()
  %7   - long getChannelRendererMaxActiveThreads()
  %8   - long getDatabaseAverage()
  %9   - long getDatabaseHighMax()
  %10  - long getDatabaseLast()
  %11  - long getDatabaseMax()
  %12  - long getDatabaseMin()
  %13  - long getDatabaseTotalConnections()
  %14  - long getGuestSessionCount()
  %15  - org.jasig.portal.utils.MovingAverageSample getLastAuthentication()
  %16  - org.jasig.portal.utils.MovingAverageSample getLastDatabase()
  %17  - org.jasig.portal.utils.MovingAverageSample getLastRender()
  %18  - int getRDBMActiveConnectionCount()
  %19  - int getRDBMMaxConnectionCount()
  %20  - [Ljava.lang.String; getRecentProblems()
  %21  - long getRenderAverage()
  %22  - long getRenderHighMax()
  %23  - long getRenderLast()
  %24  - long getRenderMax()
  %25  - long getRenderMin()
  %26  - long getRenderTotalRenders()
  %27  - java.util.Date getStartedAt()
  %28  - long getThreadCount()
  %29  - long getUserSessionCount()
#there's no notifications

What we really want to monitor are certain attributes of those beans. The "get" command takes the same "-b" and "-d" arguments, followed by the attributes to retrieve. The "*" argument tells jmxterm that we want all of the attributes for that bean.

Example:

$> get -b name=Statistics,section=Framework -d uPortal *
#mbean = uPortal:name=Statistics,section=Framework:
RDBMActiveConnectionCount = 0;
RDBMMaxConnectionCount = 3;
AuthenticationAverage = 611;
AuthenticationHighMax = 611;
AuthenticationLast = 611;
AuthenticationMax = 611;
AuthenticationMin = 611;
AuthenticationTotalLogins = 1;
ChannelRendererActiveThreads = 0;
ChannelRendererMaxActiveThreads = 13;
DatabaseAverage = 2;
DatabaseHighMax = 59;
DatabaseLast = 2;
DatabaseMax = 8;
DatabaseMin = 1;
DatabaseTotalConnections = 682;
GuestSessionCount = 0;
#RuntimeIOException: Runtime IO exception: error unmarshalling return; nested exception is:
        java.lang.ClassNotFoundException: org/jasig/portal/utils/MovingAverageSample (no security manager: RMI class loader disabled)

(!) The exception at the end means that jmxterm could not unmarshal an attribute of type MovingAverageSample. That uPortal class is not on jmxterm's classpath, and because jmxterm runs without a security manager, its remote method invocation (RMI) class loader will not download the class definition. I have not found a way around this issue, so the attributes of that type (LastAuthentication, LastDatabase and LastRender) are unavailable.

You can request a single attribute by giving its name instead of the "*" wildcard.

Examples:

$> get -b name=Statistics,section=Framework -d uPortal ChannelRendererActiveThreads
#mbean = uPortal:name=Statistics,section=Framework:
ChannelRendererActiveThreads = 0;

$> get -b name=Statistics,section=Framework -d uPortal -s ChannelRendererActiveThreads
#mbean = uPortal:name=Statistics,section=Framework:
0

The "-s" option stands for "simple": jmxterm prints only the value, without the "name = value;" expression, so you get a little less output.
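
Because the wildcard request above aborted at the first MovingAverageSample attribute, one possible approach is to ask for the scalar attributes by name instead. The sketch below generates one "get" command per attribute and pipes them all into a single jmxterm session, using the same non-interactive flags as the scripts later in this document. The function names, the JMXTERM_JAR and JMX_URL variables and their defaults are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: ask for several scalar attributes by name in one jmxterm session,
# avoiding the "*" wildcard that aborts on the MovingAverageSample attributes.
# JMXTERM_JAR and JMX_URL are assumed defaults; adjust them for your site.
JMXTERM_JAR="${JMXTERM_JAR:-jmxterm-1.0-alpha-4-uber.jar}"
JMX_URL="${JMX_URL:-localhost:7777}"

# Print one "get" command per attribute name given as an argument.
build_get_commands() {
    for attr in "$@"
    do
        echo "get -b name=Statistics,section=Framework -d uPortal $attr"
    done
}

# Feed the generated commands to jmxterm non-interactively.
fetch_attributes() {
    build_get_commands "$@" | /usr/bin/java -jar "$JMXTERM_JAR" -l "$JMX_URL" -v silent -n
}
```

Calling fetch_attributes UserSessionCount GuestSessionCount against a running portal sends both requests through one jmxterm invocation, so the cost of starting Java is paid only once.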

Scripting Jmxterm

There are several reasons why you might want to wrap jmxterm commands in a shell script. One is that some Unix/Linux sites use snmpd to provide information to enterprise monitoring services. The snmpd agent can be configured to execute shell scripts and report the value each script returns. (For more information about this feature of snmpd, see the "EXTENDING AGENT FUNCTIONALITY" section of the snmpd.conf manual page.)

Example Script (get-rendering-thread-pool-count.sh):

#!/bin/bash
# The command to issue to jmxterm to return a value...
JMXTERM_CMD="get -b name=Statistics,section=Framework -d uPortal -s ChannelRendererActiveThreads"

# Invoking jmxterm non-interactively to get the value...
echo "$JMXTERM_CMD" | /usr/bin/java -jar jmxterm-1.0-alpha-4-uber.jar -l localhost:7777 -v silent -n

This script will send the number of currently active channel renderer threads to stdout.

Example:

[tongb@portal2 batch]$ ./get-rendering-thread-pool-count.sh
0

At this point, a system administrator can make an entry in snmpd.conf that associates this script and its results with an SNMP OID that will be reported to the enterprise monitoring system. That monitoring system can then set thresholds for notification. For instance, it could issue a low-level warning if the count reached 100, and a higher-level notice if the count reached 150. The size of the rendering thread pool is a customizable setting within uPortal, so you would set your thresholds based on that.
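
The threshold logic normally lives in the monitoring system, but it can be sketched in shell to make the idea concrete or to test a script locally. The function name and the OK/WARNING/CRITICAL labels below are hypothetical, and the defaults of 100 and 150 are only the example values from the paragraph above, not recommendations.

```shell
#!/bin/bash
# Sketch of the threshold logic a monitoring system would apply.
# Usage: classify_thread_count COUNT [WARN_THRESHOLD] [CRITICAL_THRESHOLD]
classify_thread_count() {
    count="$1"
    warn="${2:-100}"    # example warning threshold from the text
    crit="${3:-150}"    # example higher-level threshold from the text

    if [ "$count" -ge "$crit" ]
    then
        echo "CRITICAL"
    elif [ "$count" -ge "$warn" ]
    then
        echo "WARNING"
    else
        echo "OK"
    fi
}
```

Passing the thresholds as arguments lets you derive them from whatever rendering thread pool size your uPortal installation is configured with.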

Consider the following script, which shows the number of current sessions.

Example Script (get-user-session-count.sh):

#!/bin/bash
# The command to issue to jmxterm to return a value...
JMXTERM_CMD="get -b name=Statistics,section=Framework -d uPortal -s UserSessionCount"

# Invoking jmxterm non-interactively to get the value...
echo "$JMXTERM_CMD" | /usr/bin/java -jar jmxterm-1.0-alpha-4-uber.jar -l localhost:7777 -v silent -n

Combined with snmpd, this script would allow our enterprise monitoring system to track, and maintain a history of, the number of current user sessions for each node in our production cluster.
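
The two scripts above differ only in the attribute name, so one option is a single script that takes the name as an argument. The following is a sketch of that idea, not the script we ran in production; the script name get-framework-stat.sh, the function names and the JMXTERM_JAR and JMX_URL variables are all hypothetical.

```shell
#!/bin/bash
# Sketch of a hypothetical get-framework-stat.sh that takes the attribute name
# as its first argument instead of hard-coding it.
# JMXTERM_JAR and JMX_URL are assumed defaults; adjust them for your site.
JMXTERM_JAR="${JMXTERM_JAR:-jmxterm-1.0-alpha-4-uber.jar}"
JMX_URL="${JMX_URL:-localhost:7777}"

# Build the jmxterm command for one attribute of the Framework Statistics bean.
build_get_cmd() {
    echo "get -b name=Statistics,section=Framework -d uPortal -s $1"
}

# Run jmxterm non-interactively and print the bare value on stdout.
get_attribute() {
    build_get_cmd "$1" | /usr/bin/java -jar "$JMXTERM_JAR" -l "$JMX_URL" -v silent -n
}

# Query only when an attribute name was supplied on the command line.
if [ -n "${1:-}" ]
then
    get_attribute "$1"
fi
```

Running ./get-framework-stat.sh UserSessionCount would then stand in for get-user-session-count.sh, and any other attribute listed by the "info" command with a simple numeric type could be requested the same way.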

Here’s the entry for the /etc/snmp/snmpd.conf file:

# portal specific stats
extend .1.3.6.1.4.1.32396.1.3.205 uportal-user-count /opt/portal/batch/get-user-session-count.sh

The OID value (.1.3.6.1.4.1.32396.1.3.205) is specific to Ohio University. That is, it comes from Ohio University’s allocation of OID values from the Internet Assigned Numbers Authority (IANA). You can probably reuse it for testing and experimenting. It would probably also work for you in production, though if you ever ran software sold or distributed by Ohio University that uses the same OID, you might someday face a collision. Getting an OID range for your own organization is quite painless (contact the IANA), and you might find you already have one. As of this writing, Jasig did not have an allocated range.

A More Substantial Script

During the course of operationalizing all of this, the script being used became more substantial. It turned out the SNMP agent we used demanded a more responsive script and could not wait up to 30 seconds to get a response. The script was altered to be used in two ways. A “load” option would be called by the “cron” service to determine the value and cache the results. Then snmpd could call on the “show” option any time it wanted, which would just return what was in the cache. The script follows:

#!/bin/bash
# This script can be executed two ways:
#   "load" finds the value and writes it to a cache file.
#   "show" echos the value contained in the cache file.
#
# The reason for the two approaches is that finding the value (load) takes
# a while, but the services that need the value cannot afford to wait that
# long to get an answer. So we teach cron to run the "load" option every
# so many minutes, and the various services can "show" any time they want.

# Where to find the jmxterm JAR file...
PORTAL_BATCH_PATH=/opt/portal/batch

# The command to issue to jmxterm to return a value...
JMXTERM_CMD="get -b name=Statistics,section=Framework -d uPortal -s UserSessionCount"

# The result cache file...
CACHE_FILE=$PORTAL_BATCH_PATH/get-user-session-count.cache

# Load up the metric...
if [ "$1" = "load" ]
then
    # Invoking jmxterm non-interactively to get the value...
    RESULT=$(echo "$JMXTERM_CMD" | /usr/bin/java -jar "$PORTAL_BATCH_PATH/jmxterm-1.0-alpha-4-uber.jar" -l localhost:7777 -v silent -n 2> /dev/null)
    RC=$?

    # If there was no error, cache the result; otherwise cache 0...
    if [ $RC -eq 0 ]
    then
        echo "$RESULT" > "$CACHE_FILE"
    else
        echo "0" > "$CACHE_FILE"
    fi

    exit 0
fi

# Show the previously loaded metric...
if [ "$1" = "show" ]
then
    if [ -f $CACHE_FILE ]
    then
        /bin/cat $CACHE_FILE
    else
        echo "0"
    fi

    exit 0
fi

# If we made it here, there were no valid command line arguments.
echo "Usage: $0 [load|show]"
exit 100
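
One possible refinement, offered as a sketch rather than something we ran: the "load" branch above truncates the cache file before writing to it, so a "show" that arrives at that instant could read an empty file. The hypothetical write_cache helper below writes to a temporary file in the same directory and renames it into place, and it falls back to 0 when the value is not a plain integer. It assumes jmxterm prints nothing but the bare number, as in the examples earlier.

```shell
#!/bin/bash
# Sketch: validate a freshly loaded value and replace the cache file atomically.
# write_cache is a hypothetical helper, not part of the script above.
# Usage: write_cache VALUE CACHE_FILE
write_cache() {
    value="$1"
    cache_file="$2"

    # Fall back to 0 when the value is empty or contains a non-digit character.
    case "$value" in
        ''|*[!0-9]*) value=0 ;;
    esac

    # Write to a temporary file in the same directory, then rename it into place.
    tmp_file="$(mktemp "$cache_file.XXXXXX")" || return 1
    echo "$value" > "$tmp_file"

    # mktemp creates the file readable by its owner only; open it up so that a
    # service running as another user (snmpd, for instance) can still read it.
    chmod 644 "$tmp_file"
    mv -f "$tmp_file" "$cache_file"
}
```

In the script above, the two echo lines in the "load" branch could be replaced by a call such as write_cache "$RESULT" "$CACHE_FILE", leaving the "show" branch unchanged.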

Cacti Configuration

I wasn't very involved in the Cacti configuration, but our Cacti administrator was willing to export the configuration for the graphs we defined. Hopefully these can serve as an example. You might even be able to import them directly using Cacti's import features. They depend on the OID values I mentioned above, so you may have to alter the definitions to fit the OID values you selected. You should find this configuration in a ZIP file attached to this page.

Final Results

Once everything is implemented, your SNMP-based monitoring system can start to produce results like these:

MBean Attribute: UserSessionCount
SNMP OID: .1.3.6.1.4.1.32396.1.3.205

 

MBean Attribute: AuthenticationAverage
SNMP OID: .1.3.6.1.4.1.32396.1.3.200

 

MBean Attribute: DatabaseLast
SNMP OID: .1.3.6.1.4.1.32396.1.3.201

 

MBean Attribute: Not applicable. File descriptors are counted by a script using this command: /usr/sbin/lsof -u "tomcat" | wc -l
SNMP OID: .1.3.6.1.4.1.32396.1.3.202

 

MBean Attribute: RenderAverage
SNMP OID: .1.3.6.1.4.1.32396.1.3.203