Load Balancing

There are a number of techniques you can employ to achieve load balancing across a cluster of portal servers. Most of this is not specific to uPortal. While you should look for options and detailed information elsewhere, a number of examples are detailed below.

Software-based Load Balancing

Apache

Hardware-based Load Balancing

Nortel Networks Alteon 184

F5

Load Balancing general guidelines

Sticky Sessions

uPortal caches a significant amount of data for each user session.  For best performance, every request from a user should be routed to the same server for the duration of their session.  This is commonly called sticky sessions (or session affinity).

  • Some sites have had problems routing requests based on a cookie assigned by the load balancer.  One approach that works well is to route based on the JSESSIONID cookie assigned by Tomcat.
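As a rough illustration of routing on the Tomcat-assigned JSESSIONID, the sketch below assumes each Tomcat Engine sets a jvmRoute, in which case Tomcat appends a "." and that route name to the session ID. The node names and addresses are hypothetical, and a real load balancer would express this in its own configuration language rather than Python.

```python
from typing import Optional

# Hypothetical mapping of Tomcat jvmRoute names to backend addresses.
NODES = {
    "portal1": "10.0.0.11:8080",
    "portal2": "10.0.0.12:8080",
}


def route_from_session_id(session_id: Optional[str]) -> Optional[str]:
    """Return the jvmRoute suffix of a JSESSIONID, or None if there is none."""
    if not session_id or "." not in session_id:
        return None
    return session_id.rsplit(".", 1)[1]


def pick_backend(session_id: Optional[str], fallback: str) -> str:
    """Send a request to the node named in the session ID.

    Requests without a session, or naming an unknown node, go to the
    fallback chosen by the normal load distribution algorithm.
    """
    route = route_from_session_id(session_id)
    return NODES.get(route, fallback)
```

A request carrying JSESSIONID=A1B2C3.portal2 would stay on portal2, while a first request with no cookie falls through to whatever node the distribution algorithm selects.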

Load Distribution

There are a number of algorithms for load distribution, none of them perfect.  Refer to your load balancer documentation for supported methods and additional guidance, including using pool groups (clusters at different data centers, for example) or weighting multiple factors.  

Some of the load balance algorithms at a high level are:

Least connections

  • Assumes each connection has an equivalent impact on a server.  If your campus supports unauthenticated access, this mechanism does not take into account that guest access is heavily cached, so a guest session generally has lower impact and requires fewer resources than an authenticated session.
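To make the limitation concrete, here is a minimal sketch comparing plain least connections with a variant that discounts guest connections. The Node type, the per-node counts, and the GUEST_COST factor are all hypothetical; many load balancers cannot tell guest from authenticated connections, so treat this as an illustration of the bias rather than a feature to expect.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    authenticated: int  # open connections from logged-in users
    guest: int          # open connections from guest users


# Assumed relative cost of a guest connection; measure before relying on it.
GUEST_COST = 0.25


def plain_least_connections(nodes: List[Node]) -> Node:
    """Pick the node with the fewest connections, all counted equally."""
    return min(nodes, key=lambda n: n.authenticated + n.guest)


def weighted_least_connections(nodes: List[Node],
                               guest_cost: float = GUEST_COST) -> Node:
    """Pick the node with the lowest estimated load, discounting guests."""
    return min(nodes, key=lambda n: n.authenticated + guest_cost * n.guest)
```

With one node holding 10 authenticated and 100 guest connections and another holding 60 authenticated and 20 guest connections, the plain count prefers the second node, while the weighted estimate prefers the first.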

Number of active HTTP sessions (retrieved from Tomcat)

  • Can result in imbalances due to long HTTP session timeouts (30 minutes): guest or authenticated users who access the landing page and then branch off to other campus systems still appear active until their HTTP session times out.  As with least connections, if your campus supports unauthenticated access, this mechanism does not take into account that guest access is heavily cached and that a guest session generally has lower impact and requires fewer resources than an authenticated session.

Response time

  • Distributing load based on the response time of the node operational health check, or of another test, can provide a reasonable indication of a node's performance.
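One way a response-time signal might be smoothed before it drives routing is an exponentially weighted moving average, sketched below. The class name, the alpha value, and the idea of feeding it health check timings are assumptions for illustration; your load balancer may compute this differently.

```python
from typing import Dict


class ResponseTimeTracker:
    """Keeps a smoothed response time per node from periodic probes."""

    def __init__(self, alpha: float = 0.3) -> None:
        # alpha near 1 reacts quickly to new samples; near 0 smooths more.
        self.alpha = alpha
        self.avg: Dict[str, float] = {}

    def record(self, node: str, seconds: float) -> None:
        """Fold one measured response time into the node's average."""
        prev = self.avg.get(node)
        if prev is None:
            self.avg[node] = seconds
        else:
            self.avg[node] = self.alpha * seconds + (1 - self.alpha) * prev

    def fastest(self) -> str:
        """Return the node with the lowest smoothed response time."""
        return min(self.avg, key=self.avg.get)
```

Smoothing keeps a single slow probe from diverting all new traffic away from an otherwise healthy node.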

Target node metrics (such as avg CPU load)

  • Assumes something like average CPU has a rough correlation to response time and load.

Round robin

  • One of the least desirable algorithms, as it does not take target node performance or load into account.

Regardless of which algorithm you choose, if your load balancer supports it, configure a slow ramp-up time so that a node newly added to the cluster is not hammered with many connections at once. uPortal takes considerable time to initialize, and the first few connections pay a heavy cost while they fill some of the in-memory caches. User login is also a very heavyweight operation, since a great deal of computation and database activity goes into creating the user's authenticated environment and home page, so you want logins spread out. Failing to configure a ramp-up time for a new node will typically result in poor performance for users on that node until its behavior stabilizes.
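The effect of a slow ramp-up can be pictured as a weight that grows from a small floor to full weight over the ramp period. The linear shape, the 300-second period, and the 0.1 floor below are assumptions chosen for illustration; consult your load balancer documentation for the curve and settings it actually uses.

```python
def ramp_weight(seconds_in_pool: float,
                ramp_seconds: float = 300.0,
                floor: float = 0.1) -> float:
    """Return the share of its normal traffic a new node should receive.

    The weight rises linearly from `floor` when the node joins the pool
    to 1.0 once `ramp_seconds` have elapsed.
    """
    if ramp_seconds <= 0:
        return 1.0  # ramp-up disabled: full weight immediately
    fraction = min(max(seconds_in_pool / ramp_seconds, 0.0), 1.0)
    return floor + (1.0 - floor) * fraction
```

Under these assumed values a node that joined 150 seconds ago would receive roughly 55% of the traffic a fully warmed node gets, which gives its caches time to fill and spreads out the expensive logins.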

Node operational health check

Currently, uPortal does not have a dedicated health check page that the load balancer can use to validate that a node is operational.  In lieu of that, the following approaches allow the load balancer to determine some level of operational capability on the uPortal servers.

GET /uPortal/layout.json

  • Preferred approach.  Returns HTTP 200 if the layout can be returned.  Returns HTTP 500 if uPortal is unable to connect to the database (by default, reads occur from the UP_MESSAGE table and render event writes go to UP_RAW_EVENTS, unless the event aggregation configuration has this disabled).  The data (the guest layout) is heavily cached and rarely pulled from the database, so this is a moderately low-load health check.  A fair amount of computation still goes into generating the response, so it can also provide a rough indication of target system response time for load leveling.

If the load balancer has trouble following HTTP 302 redirects, configure it to send, in each request, a fixed cookie value that Tomcat/Java would not itself create. For example:

wget --header="Cookie: JSESSIONID=23485898E75DB49-LoadBalancer" http://localhost:8080/uPortal/layout.json

The first request will be redirected (HTTP 302) through /Login as normal, but subsequent requests will return immediately with HTTP 200. However, it is better to have the load balancer follow HTTP 302 redirects (you might need to enable Connection: keep-alive for this).

Configuring the load balancer to send a fixed cookie value means the health check re-uses a single HTTP session instead of creating many unnecessary HTTP sessions in Tomcat. Though the operational impact is not large, this is a useful optimization that helps minimize the heap memory cost of health checks.
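For testing the fixed-cookie check outside the load balancer, a minimal Python sketch using only the standard library might look like the following. The URL and cookie value are taken from the wget example above; the function names and the 5-second timeout are hypothetical. Note that urllib follows HTTP 302 redirects by default, so this probe reports the final status.

```python
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8080/uPortal/layout.json"
# Fixed value that Tomcat would not generate, so one session is re-used.
FIXED_COOKIE = "JSESSIONID=23485898E75DB49-LoadBalancer"


def build_health_request(url: str = HEALTH_URL,
                         cookie: str = FIXED_COOKIE) -> urllib.request.Request:
    """Build the GET request with the fixed session cookie attached."""
    return urllib.request.Request(url, headers={"Cookie": cookie})


def node_is_healthy(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True only if the node answers the probe with HTTP 200.

    HTTP errors such as 500, connection failures, and timeouts all
    count as unhealthy.
    """
    try:
        request = build_health_request(url)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Calling node_is_healthy() against each node from a cron job or monitoring script gives a quick check that the fixed cookie behaves as expected before you configure the same probe in the load balancer.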

GET /uPortal/f/welcome/normal/render.uP

  • More load-intensive approach that returns the uPortal guest page.  Indicates a greater level of uPortal operation than layout.json (it verifies that the guest page is rendered).  However, this URL returns multiple HTTP 302 redirects as part of the authentication process, so the load balancer must be configured to follow HTTP redirects automatically.

 uPortal 4.2.0+: If your load balancer has trouble with the cookieCheck process, you can configure specific User-Agent strings that skip the cookie check and configure your load balancer to send that HTTP header. See bean 'remoteCookieCheckFilter' in uportal-war/src/main/resources/properties/contexts/mvcContext.xml.

Additional References

Having problems with these instructions?

Please send us feedback at uportal-user@lists.ja-sig.org