Clustering CAS
Implementing clustering introduces CAS server security concerns
It's easy to visualize the requirements to secure the path of sensitive information when working with a single-server installation of CAS:
- Protect user passwords with SSL encryption
- Secure the communication between the CAS server and the credential store
- Ensure that the Ticket Granting Cookie is only sent from the browser to the CAS server
- Ensure that Proxy Tickets are only issued to an SSL-protected endpoint
- Secure the validation of Service Tickets and Proxy Tickets with SSL encryption
It is also easy to visualize how clustering CAS servers may create additional security concerns. This article, while thorough in explaining the need for CAS servers to share their data with one another, does not aim to explain how to secure these additional network communication channels. It is imperative that implementers analyze each of the steps described below for potential security weaknesses in their network environments.
Relevant webinar
See also the relevant September 2010 Jasig CAS Community Call, with both slides and audio available, which featured a presentation with a perspective on clustering CAS from Howard Gilbert at Yale University.
Overview
Clustering is essential if your CAS instance is to be "highly available," or HA in manager-speak. Since CAS is a stateful application, there must be a way for each CAS instance to know about what the other CAS instance has done. It would be nice to just use one CAS instance (and one instance on the appropriate hardware can probably easily handle your login needs), but if that instance fails, you do not want all of your users to have to log in again.
As mentioned above, CAS is a stateful application, and stateful in more than one way. CAS keeps track of users in the application's session, and it keeps track of the services the user visits and the tickets used to visit those services. Although the service and proxy tickets are only stored in memory for a brief amount of time, if you are load balancing and clustering CAS, each instance of CAS must immediately know about those tickets. If they do not, CAS simply will not work (most of the time). You may think that LB sticky sessions will save you, but they won't! Sticky sessions are good for sending the user (via a web browser) back to the same CAS instance, but they do not solve the problem that applications also use CAS, and the LB may have already determined (via sticky sessions) that a particular application should be using another CAS instance!
So, there are several things that need to be done for clustering to work:
- Replicate user login information
- Replicate tickets
- Ensure all tickets (TGTs, service, and proxy tickets) are unique across JVMs
Since CAS is a Java application (and based on Spring at that), there are many ways to do clustering. Furthermore, there is no easy "on/off" switch for clustering, hence this document. The CAS clustering described here takes advantage of the Spring aspects of CAS, and implements the clustering purely via XML configuration! (Of course, we do use Java classes that have already been written by the CAS team.)
Assumptions
This HOW TO makes the following assumptions:
- CAS 3.0.6 or greater
- CAS 3.1.0 or greater
- CAS 3.2.0 or greater
- Tomcat 5.5 or 6.0
- JBOSS 4 running "all"
- You know how to deploy CAS 3.0.x / CAS 3.1.x in Tomcat
- You know how to configure Tomcat (or at least poke blindly at the controls until they let you go)
- That CAS is configured to actually work, i.e., users can actually use your CAS for authN
- You have some load balancing mechanism for your (soon to be) clustered environment
- You have checked in with your network administrators about using Multicast on your network
- One CAS instance per host - if you have more, you will have to make some adjustments, but they will be obvious to figure out
Clustering
Guaranteeing Ticket Uniqueness
If you are using CAS 3.2.x, feel free to skip this step. It is already part of your implementation.
Since all the tickets need to be unique across JVMs, we will configure this part first, and it is the easiest part to do, too.
The first problem you need to solve is what unique identifier to use. I chose the hostname of the server from which CAS is being served. Because this is Java and we do everything via XML configuration and not Java code, we will solve this problem using the applicationContext.xml file and one other file external to CAS. The benefit of this approach is that a single deployable (WAR file) can be used across all nodes of the cluster, with host-specific properties resolved from the filesystem of each host. We use this strategy at Virginia Tech and it works very well.
By default CAS gets vital host-specific configuration properties from the cas.properties file packaged in the WAR file. Instead, place that file in a convenient filesystem location that is accessible by the Java process running the servlet container, e.g.,
/apps/local/share/etc/cas.properties
The contents of cas.properties should be exactly the same as that distributed with the CAS distribution:
# Unique name for each node in cluster
# Host name is a good choice, but can be anything
host.name=eiger

# Security settings
cas.securityContext.serviceProperties.service=https://eiger.some.edu/cas/services/j_acegi_cas_security_check

# Names of roles allowed to access the CAS service manager
cas.securityContext.serviceProperties.adminRoles=ROLE_MIDDLEWARE.STAFF

cas.securityContext.casProcessingFilterEntryPoint.loginUrl=https://eiger.some.edu/cas/login
cas.securityContext.ticketValidator.casServerUrlPrefix=https://eiger.some.edu/cas

# Name of properties file defining views
cas.viewResolver.basename=default
In order for CAS to load properties from the filesystem instead of the classpath of the unpacked WAR file, you must modify the /WEB-INF/applicationContext.xml file:
<bean id="placeholderConfig" class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer"> <property name="locations"> <list> <value>file:/apps/local/share/etc/cas.properties</value> </list> </property> </bean>
The host.name property placeholder is used by ticket generators to tag tickets issued by a particular cluster node:
<bean id="ticketGrantingTicketUniqueIdGenerator" class="org.jasig.cas.util.DefaultUniqueTicketIdGenerator"> <constructor-arg index="0" type="int" value="50" /> <constructor-arg index="1" value="${host.name}" /> </bean> <bean id="serviceTicketUniqueIdGenerator" class="org.jasig.cas.util.DefaultUniqueTicketIdGenerator"> <constructor-arg index="0" type="int" value="20" /> <constructor-arg index="1" value="${host.name}" /> </bean> <bean id="proxy20TicketUniqueIdGenerator" class="org.jasig.cas.util.DefaultUniqueTicketIdGenerator"> <constructor-arg index="0" type="int" value="20" /> <constructor-arg index="1" value="${host.name}" /> </bean> ...
This creates tickets, for example, like the following:
TGT-2-Lj1aIVkEqGDCSLaXwXVQlIcYQcyyqcI0tuR-<hostname of your server>
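Service and proxy tickets issued by a node carry the same suffix. For illustration only (assuming host.name=eiger as in the example cas.properties above, and a 20-character random part per the generator configuration), a service ticket would look something like:

ST-1-aBcDeFgHiJkLmNoPqRsT-eiger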
Tomcat Session Replication
Since CAS stores the login information in the application session, we need to set up session replication between our Tomcat instances.
Note: older resources sometimes reference an approach for preserving application login state via a Spring Web Flow 1.0 configuration option. Spring Web Flow 2.0+ (used in modern versions of CAS) no longer has this feature, meaning this state must be maintained in some other way (such as the Tomcat session replication covered here).
The first thing you need to do is tell CAS (the application) that it is distributable 1. So, in the CAS web.xml file you need to add the <distributable/> tag. The web.xml file is located here:
CAS 3.0.x
cas-distribution/webapp/WEB-INF/web.xml
CAS 3.1.x & CAS 3.2.x
cas-distribution/cas-server-webapp/src/main/webapp/WEB-INF/web.xml
In this file, I put the distributable tag right below the context-param section:
...
<context-param>
  <param-name>contextConfigLocation</param-name>
  <param-value>
    /WEB-INF/applicationContext.xml,
    /WEB-INF/deployerConfigContext.xml
  </param-value>
</context-param>

<!-- Set the application as distributable:
     http://tomcat.apache.org/tomcat-5.0-doc/cluster-howto.html -->
<distributable />
...
Now you need to tell Tomcat to replicate the session information by adding Cluster elements under the Host elements. In the following examples, data is replicated via UDP multicast since it requires the least amount of host-specific configuration. An alternative is to use TCP, where each node must explicitly know about its peers. Regardless of your choice, you should thoroughly test node failure with your replication strategy to determine whether your network supports graceful node loss and recovery.
<Server port="8005" shutdown="SHUTDOWN"> ... <Engine name="Standalone" defaultHost="localhost"> ... <Host name="localhost" deployOnStartup="true" autoDeploy="false" appBase="webapps"> ... <!-- When configuring for clustering, you also add in a valve to catch all the requests coming in, at the end of the request, the session may or may not be replicated. A session is replicated if and only if all the conditions are met: 1. useDirtyFlag is true or setAttribute or removeAttribute has been called AND 2. a session exists (has been created) 3. the request is not trapped by the "filter" attribute The filter attribute is to filter out requests that could not modify the session, hence we don't replicate the session after the end of this request. The filter is negative, ie, anything you put in the filter, you mean to filter out, ie, no replication will be done on requests that match one of the filters. The filter attribute is delimited by ;, so you can't escape out ; even if you wanted to. filter=".*\.gif;.*\.js;" means that we will not replicate the session after requests with the URI ending with .gif and .js are intercepted. --> <Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster" managerClassName="org.apache.catalina.cluster.session.DeltaManager" expireSessionsOnShutdown="false" useDirtyFlag="true"> <Membership className="org.apache.catalina.cluster.mcast.McastService" mcastAddr="239.255.0.1" mcastPort="45564" mcastFrequency="500" mcastDropTime="3000" mcastTTL="1"/> <Receiver className="org.apache.catalina.cluster.tcp.ReplicationListener" tcpListenAddress="auto" tcpListenPort="4001" tcpSelectorTimeout="100" tcpThreadCount="6"/> <Sender className="org.apache.catalina.cluster.tcp.ReplicationTransmitter" replicationMode="synchronous"/> <Valve className="org.apache.catalina.cluster.tcp.ReplicationValve" filter=".*\.gif;.*\.js;.*\.jpg;.*\.htm;.*\.html;.*\.txt;"/> <ClusterListener className="org.apache.catalina.cluster.session.ClusterSessionListener"/> </Cluster> </Host> </Engine> </Server>
<Server port="8005" shutdown="SHUTDOWN"> ... <Engine name="Catalina" defaultHost="localhost"> ... <Host name="localhost" appBase="webapps" unpackWARs="true" autoDeploy="true" xmlValidation="false" xmlNamespaceAware="false"> ... <!-- When configuring for clustering, you also add in a valve to catch all the requests coming in, at the end of the request, the session may or may not be replicated. A session is replicated if and only if all the conditions are met: 1. useDirtyFlag is true or setAttribute or removeAttribute has been called AND 2. a session exists (has been created) 3. the request is not trapped by the "filter" attribute The filter attribute is to filter out requests that could not modify the session, hence we don't replicate the session after the end of this request. The filter is negative, ie, anything you put in the filter, you mean to filter out, ie, no replication will be done on requests that match one of the filters. The filter attribute is delimited by ;, so you can't escape out ; even if you wanted to. filter=".*\.gif;.*\.js;" means that we will not replicate the session after requests with the URI ending with .gif and .js are intercepted. --> <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"> <Manager className="org.apache.catalina.ha.session.DeltaManager" expireSessionsOnShutdown="false" notifyListenersOnReplication="true"/> <Channel className="org.apache.catalina.tribes.group.GroupChannel"> <Membership className="org.apache.catalina.tribes.membership.McastService" address="239.255.0.1" port="45564" frequency="500" dropTime="3000" mcastTTL="1"/> <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver" address="auto" port="4000" autoBind="0" selectorTimeout="100" maxThreads="6"/> <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter"> <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/> </Sender> <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/> <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/> </Channel> <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter=".*\.gif;.*\.js;.*\.jpg;.*\.htm;.*\.html;.*\.txt;"/> <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/> </Cluster> </Host> </Engine> </Server>
See http://tomcat.apache.org/tomcat-6.0-doc/cluster-howto.html and http://tomcat.apache.org/tomcat-6.0-doc/config/cluster.html for more information on Tomcat 6 clustering.
Note 1: Again, please check with your network administrator before turning this on. I have set mcastTTL to 1 because my network admin told me, "If you want to force it to stay within your subnet, my understanding is that you can do so by using a TTL of 1." If you want to do clustering outside of a single subnet, you will probably have to change this value, or remove the mcastTTL attribute and value altogether.
Note 2: You will see a lot of references to the jvmRoute attribute of the Engine tag, but you only need to specify it if you are clustering more than one Tomcat on one host. In that case, you will have to specify the jvmRoute that corresponds to the Apache worker you have specified for that Tomcat instance.
Note 3: If your Tomcat cluster doesn't work (a Tomcat instance does not see the other members), you may need to replace auto in tcpListenAddress="auto" with the IP address of the server.
Note 4: If your Tomcat cluster still doesn't work ensure that the TCP and UDP ports on the servers are not being blocked by a host-based firewall, that your network interface has multicast enabled, and that it has the appropriate routes for multicast.
Note 5: If you see a large stacktrace in the cas.log file that ends with a root cause of: "java.net.BindException: Cannot assign requested address", it's likely due to the JVM trying to use IPv6 sockets while your system is using IPv4. Set the JVM to prefer IPv4 by setting the Java system property -Djava.net.preferIPv4Stack=true. You can set the CATALINA_OPTS environment variable so Tomcat will pick it up automatically with:
export CATALINA_OPTS=-Djava.net.preferIPv4Stack=true
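To make this setting survive restarts without editing Tomcat's main startup scripts, one option (a sketch, assuming your catalina.sh sources bin/setenv.sh, which the stock Tomcat 5.5/6.0 scripts do) is to put it in a setenv.sh file:

# $CATALINA_BASE/bin/setenv.sh -- create this file if it does not exist
CATALINA_OPTS="$CATALINA_OPTS -Djava.net.preferIPv4Stack=true"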
Now start up your two (or more) Tomcat instances (on separate hosts!) and you should see something like the following in the catalina.out log:
May 22, 2007 4:25:54 PM org.apache.catalina.cluster.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.cluster.mcast.McastMember [tcp://128.32.143.78:4001,catalina,128.32.143.78,4001, alive=5]
Conversely, in the catalina.out log on my other server, I see:
May 22, 2007 4:27:13 PM org.apache.catalina.cluster.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.cluster.mcast.McastMember [tcp://128.32.143.79:4001,catalina,128.32.143.79,4001, alive=5]
Excellent, you now have clustering of the user's login information for CAS. Test it out by logging into CAS, stopping Tomcat on the server you logged in at, and then hitting the login page again; CAS should show you the "you are already logged in" page.
Ticket Cache Replication
Now we need to set up the ticket cache replication using the org.jasig.cas.ticket.registry.JBossCacheTicketRegistry class. We implement this by editing the applicationContext.xml config file again.
<bean id="ticketRegistry" class="org.jasig.cas.ticket.registry.JBossCacheTicketRegistry" p:cache-ref="cache" /> <bean id="cache" class="org.jasig.cas.util.JBossCacheFactoryBean" p:configLocation="classpath:jbossTicketCacheReplicationConfig.xml" />
Note 1: Do not put a space between classpath: and jbossTicketCacheReplicationConfig.xml, otherwise you will get a "not found" exception.
In the cache bean above, there is a property with a value of classpath:jbossTicketCacheReplicationConfig.xml, so now we have to find and do something with this file.
jbossTicketCacheReplicationConfig.xml started out life as jbossTestCache.xml. Since I do not like to put things into production with the word "test" in them, I changed the name (and a few things inside the file). This file is located at:
CAS 3.0.x
cas-distribution/core/src/test/resources/jbossTestCache.xml
CAS 3.1.x & CAS 3.2.x
cas-distribution/cas-server-integration-jboss/src/test/resources/jbossTestCache.xml
Open this file up and get ready for some editing. I discovered that the default file did not work in my installation, as was noted by some others on the CAS mailing list. Scott Battaglia sent an edited version to the list. 2
You have to comment out the following lines:
<!-- <depends>jboss:service=TransactionManager</depends> -->
and:
<!-- <attribute name="TransactionManagerLookupClass"> org.jboss.cache.DummyTransactionManagerLookup</attribute> -->
Next, you have to edit the mcast_addr. In the ClusterConfig section, set the mcast_addr to a value appropriate for your network, and if your hosts are on the same subnet, set ip_ttl to 1. You may also need to set the bind_addr property to the IP address on which you want this host to listen for TreeCache updates; this is especially true if you are using bonding and/or IPv6 on your system:
<UDP mcast_addr="239.255.0.2" mcast_port="48866" ip_ttl="1" ip_mcast="true" bind_addr="192.168.10.10" mcast_send_buf_size="150000" mcast_recv_buf_size="80000" ucast_send_buf_size="150000" ucast_recv_buf_size="80000" loopback="false"/>
Now that you have edited this file, you have to get it onto your CLASSPATH. I have decided to put it directly into my Tomcat directory:
$CATALINA_BASE/share/classes/jbossTicketCacheReplicationConfig.xml
For JBOSS, this is a good location:
$JBOSS_HOME/server/all/conf/jbossTicketCacheReplicationConfig.xml
If you know of a better way to get it on your CLASSPATH by putting it somewhere in the localPlugins directory, please let me know.
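If copying files into the Tomcat installation is not to your taste, another possibility (a sketch only, assuming a stock catalina.properties; verify the class loader layout of your Tomcat version) is to point the shared class loader at the directory that already holds your external CAS files, e.g., the same example directory used above for cas.properties:

# $CATALINA_BASE/conf/catalina.properties
# Add the directory containing jbossTicketCacheReplicationConfig.xml to the shared loader.
shared.loader=/apps/local/share/etc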
Now, the hard part: rounding up the 10 jars needed to make JBossCache work! JBossCache for CAS requires the following jars (skip this step if you are running on JBoss):
concurrent.jar
jboss-cache-jdk50.jar
jboss-common.jar
jboss-j2ee.jar
jboss-jmx.jar
jboss-minimal.jar
jboss-serialization.jar
jboss-system.jar
jgroups.jar
trove.jar
CAS 3.0.x
You can get all of these jar files in the JBossCache distribution. 3 Once you have these jars, put them in your localPlugins/lib directory:
cas-distribution/localPlugins/lib
CAS 3.1.x
Using Maven 2, this is not as hard as with the CAS 3.0.x branch.
Add the following dependency to the pom.xml file located in the cas-server-webapp folder, and the JBoss Cache libraries will be included in cas.war.
Remark: this dependency is needed if you are NOT using the JBoss Application Server.
...
<dependency>
  <groupId>org.jasig.cas</groupId>
  <artifactId>cas-server-integration-jboss</artifactId>
  <version>3.1</version>
  <scope>runtime</scope>
</dependency>
...
CAS 3.2.x on JBOSS (or probably any CAS implementation on JBoss)
You need to exclude some jars from the deployment; otherwise they will conflict with JBoss.
<dependency>
  <groupId>org.jasig.cas</groupId>
  <artifactId>cas-server-integration-jboss</artifactId>
  <version>3.2.1-RC1</version>
  <scope>runtime</scope>
  <exclusions>
    <exclusion>
      <groupId>concurrent</groupId>
      <artifactId>concurrent</artifactId>
    </exclusion>
    <exclusion>
      <groupId>jboss</groupId>
      <artifactId>jboss-serialization</artifactId>
    </exclusion>
    <exclusion>
      <groupId>jboss</groupId>
      <artifactId>jboss-jmx</artifactId>
    </exclusion>
    <exclusion>
      <groupId>jboss</groupId>
      <artifactId>jboss-common</artifactId>
    </exclusion>
    <exclusion>
      <groupId>jboss</groupId>
      <artifactId>jboss-j2ee</artifactId>
    </exclusion>
    <exclusion>
      <groupId>jboss</groupId>
      <artifactId>jboss-minimal</artifactId>
    </exclusion>
    <exclusion>
      <groupId>jboss</groupId>
      <artifactId>jboss-system</artifactId>
    </exclusion>
  </exclusions>
</dependency>
Ok, now let's test this thing! Build cas.war and redeploy it to your two (or more) Tomcat instances, and you should see the JBossCache info in the catalina.out log:
2007-05-23 16:59:34,486 INFO [org.jasig.cas.util.JBossCacheFactoryBean] - <Starting TreeCache service.>
-------------------------------------------------------
GMS: address is 128.32.143.78:51052
-------------------------------------------------------
In the catalina.out log on my other server, I see:
2007-05-23 17:01:22,113 INFO [org.jasig.cas.util.JBossCacheFactoryBean] - <Starting TreeCache service.>
-------------------------------------------------------
GMS: address is 128.32.143.79:56023
-------------------------------------------------------
If you see this, and no Java exceptions, you are doing well! If you see Java exceptions, they are probably related to Tomcat not being able to find the jbossTicketCacheReplicationConfig.xml file in its CLASSPATH, or not being able to find some class related to JBossCache, i.e., one of the jars is missing.
Ensuring Ticket Granting Ticket Cookie Visibility
The last step before you can test out whether CAS is set up to be clustered correctly is to ensure that the ticket granting ticket (TGT) cookie set in the users' browsers is visible to all of the nodes in the CAS cluster. Using your favorite text editor (shameless plug for vim), open the cas-servlet.xml file and look for the warnCookieGenerator and ticketGrantingTicketCookieGenerator beans. Both of these beans need to have the cookieDomain property set to the domain where the TGT cookie should be visible. Edit the bean declarations based on the following example (substitute your domain as necessary):
Protect your ticket granting cookies!
Warning: do not set the cookieDomain any wider than absolutely necessary. All hosts in the cookieDomain must be absolutely trusted, at the security level of your CAS server itself. Ideally, all clustered CAS server instances will appear to the end user's web browser to be answering the very same URLs (e.g., the cluster is fronted by a hardware load balancer), and so the cookieDomain can be maximally restrictive.
Setting the cookie domain such that untrusted servers have access to the Ticket Granting Cookie will allow those servers to hijack the end user's single sign on session and acquire service tickets in his or her name to illicitly authenticate to CASified applications.
<bean id="warnCookieGenerator" class="org.jasig.cas.web.support.CookieRetrievingCookieGenerator" p:cookieSecure="true" p:cookieMaxAge="-1" p:cookieName="CASPRIVACY" p:cookiePath="/cas" p:cookieDomain="example.com"/>
<bean id="ticketGrantingTicketCookieGenerator" class="org.jasig.cas.web.support.CookieRetrievingCookieGenerator" p:cookieSecure="true" p:cookieMaxAge="-1" p:cookieName="CASTGC" p:cookiePath="/cas" p:cookieDomain="example.com" />
Verification
JBOSS
You will need to deploy as a .war file into JBoss's farm at:
${JBOSS_HOME}/server/all
After you have started your cluster servers, ensure you have a cluster by checking the JBoss DefaultPartition. The CurrentView should show all the IPs of your cluster. If not, you will need to research why your cluster is not finding the other nodes.
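One way to inspect the CurrentView from the command line (a sketch assuming the standard JBoss 4 DefaultPartition MBean name; the web-based JMX console works just as well) is JBoss's twiddle utility:

# Query the clustering partition MBean on a running node
$JBOSS_HOME/bin/twiddle.sh get "jboss:service=DefaultPartition" CurrentView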
Service Management
If you use the service management feature to restrict access to the CAS server based on CAS client service URLs/URL patterns, a Quartz job like the following must be added to one of your Spring contexts. The purpose of the job is to make the other nodes aware of service changes by reloading the services from the backing store. A service registry implementation that supports clustering, e.g., JpaServiceRegistryDaoImpl or LdapServiceRegistryDao, is required for proper clustering support. Both the service registry reloader job and trigger should be added to ticketRegistry.xml.
<!--
  Job to periodically reload services from service registry. This job is needed
  for a clustered CAS environment since service changes in one CAS node are not
  known to the other until a reload.
-->
<bean id="serviceRegistryReloaderJobDetail"
      class="org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean"
      p:targetObject-ref="servicesManager"
      p:targetMethod="reload" />

<bean id="periodicServiceRegistryReloaderTrigger"
      class="org.springframework.scheduling.quartz.SimpleTriggerBean"
      p:jobDetail-ref="serviceRegistryReloaderJobDetail"
      p:startDelay="120000"
      p:repeatInterval="120000" />
In order for the above job to fire, the trigger must be added to the Quartz scheduler bean as follows:
<bean id="scheduler"> <property name="triggers"> <list> <ref local="periodicServiceRegistryReloaderTrigger" /> </list> </property> </bean>
References
The following references are used in this document: