...
[14:52:17 CDT(-0500)] <sjungling> yep, one moment
[14:53:30 CDT(-0500)] <sjungling> https://gist.github.com/70950a606b5fe4a0de49
[14:54:16 CDT(-0500)] <EricDalquist> so one thing I'd look at is cache sizes
[14:54:20 CDT(-0500)] <EricDalquist> either via the cache manager portlet
[14:54:21 CDT(-0500)] <EricDalquist> or jmx
[14:54:30 CDT(-0500)] <EricDalquist> see how full your persondir caches are
[14:54:39 CDT(-0500)] <EricDalquist> you might be running out of space with only 1000 elements
[14:54:43 CDT(-0500)] <EricDalquist> resulting in more queries
[14:54:55 CDT(-0500)] <EricDalquist> same with 500 for ldap queries
[14:55:22 CDT(-0500)] <sjungling> one of our boxes… (21 / 500) for merged
[14:55:25 CDT(-0500)] <EricDalquist> if you get more than 500 unique, active, users over 10 minutes you'll end up with extra queries being run
[14:56:29 CDT(-0500)] <sjungling> groups.IEntity, StylesheetUserPreferencesImpl.Query, and FragmentDefinition.Query are the only ones with full objects
[14:56:46 CDT(-0500)] <EricDalquist> hrm ok
[14:57:27 CDT(-0500)] <EricDalquist> and can you add your ldapContext.xml to that gist?
[15:03:54 CDT(-0500)] <sjungling> added
[15:04:46 CDT(-0500)] <EricDalquist> hrm
[15:04:55 CDT(-0500)] <EricDalquist> so I think that is the LDAP server returning an error
[15:05:34 CDT(-0500)] <EricDalquist> or ... well it could be the csuc-student-announcements portlet timing out
[15:05:44 CDT(-0500)] <EricDalquist> since when a portlet times out thread.interrupt is called on it
[15:06:15 CDT(-0500)] <EricDalquist> yeah that is the case
[15:06:23 CDT(-0500)] <EricDalquist> so the portal is trying to execute csuc-student-announcements
[15:06:33 CDT(-0500)] <EricDalquist> which takes longer than the configured timeout
[15:06:52 CDT(-0500)] <EricDalquist> and when the portlet times out the portal sticks that worker thread on a recovery queue
[15:07:00 CDT(-0500)] <EricDalquist> and periodically calls Thread.interrupt on it
[15:07:04 CDT(-0500)] <EricDalquist> until the worker returns
[15:07:12 CDT(-0500)] <EricDalquist> if you're seeing a lot of thse
[15:07:25 CDT(-0500)] <EricDalquist> I'd say it implies that the LDAP searches are taking a long time
[15:07:35 CDT(-0500)] <EricDalquist> looking at the bottom of the stack trace
[15:07:41 CDT(-0500)] <EricDalquist> the LDAP connection is established
[15:07:59 CDT(-0500)] <EricDalquist> and the code is waiting in readReply to read the response from the LDAP server
[15:14:11 CDT(-0500)] <EricDalquist> what is the timeout for that portlet?>
[15:14:48 CDT(-0500)] <sjungling> checking…
[15:14:52 CDT(-0500)] <sjungling> probably the default for webproxy
[15:15:13 CDT(-0500)] <sjungling> 5000
[15:15:25 CDT(-0500)] <EricDalquist> ok
[15:15:35 CDT(-0500)] <EricDalquist> so at this point I'd go back to your LDAP admins
[15:15:42 CDT(-0500)] <EricDalquist> and try to figure out why LDAP queries are taking so long
[15:15:54 CDT(-0500)] <sjungling> yeah, just got done talking to them
[15:16:05 CDT(-0500)] <EricDalquist> though the fact that this query is happing during a portlet render is a little weird
[15:16:07 CDT(-0500)] <sjungling> they're claiming the uportal is making deferred operations?
[15:16:09 CDT(-0500)] <EricDalquist> since I'd expect that to be cached
[15:16:19 CDT(-0500)] <EricDalquist> deferred operations?
[15:16:51 CDT(-0500)] <sjungling> "It means that an operation that was just received on that connection has been deferred, because there are other operations ahead of it that are still pending. I.e., there's a bunch of operations on that connection waiting to execute, and the newcomer is last in line."
[15:16:55 CDT(-0500)] <sjungling> from one of the OpenLDAP guys
[15:17:10 CDT(-0500)] <EricDalquist> ahhh'
[15:17:15 CDT(-0500)] <EricDalquist> so that is probably due to the connection poolingh
[15:17:19 CDT(-0500)] <sjungling> ok
[15:17:23 CDT(-0500)] <sjungling> kinda my thought
[15:17:38 CDT(-0500)] <EricDalquist> the portal gives up on a thread making an ldap query
[15:17:46 CDT(-0500)] <sjungling> and sounds like you guys have pretty good experience with the config you supplied?
[15:17:50 CDT(-0500)] <EricDalquist> yes
[15:18:01 CDT(-0500)] <sjungling> of course, not all ldap enviros being equal
[15:18:05 CDT(-0500)] <EricDalquist> the portlet is interuppted
[15:18:10 CDT(-0500)] <EricDalquist> and the connection is returned to the poo;
[15:18:16 CDT(-0500)] <EricDalquist> even though it is still executing a query
[15:18:25 CDT(-0500)] <EricDalquist> another thread comes along and tries to use it
[15:18:30 CDT(-0500)] <EricDalquist> that makes sense
[15:18:39 CDT(-0500)] <EricDalquist> still the original cause would be slow query results
[15:18:43 CDT(-0500)] <EricDalquist> you could disable pooling
[15:18:56 CDT(-0500)] <EricDalquist> (just comment out the pooling bean and rename the target bean to the pool bean name)
[15:18:58 CDT(-0500)] <EricDalquist> and see how that goes
[15:19:14 CDT(-0500)] <sjungling> hrm, kk
[15:19:40 CDT(-0500)] <sjungling> you recommend starting there and then going to more advanced pooling if the issue persists?
[15:20:48 CDT(-0500)] <EricDalquist> well that would fix the deferred issue
[15:20:51 CDT(-0500)] <EricDalquist> \but if queries are still sllow
[15:20:57 CDT(-0500)] <EricDalquist> you're just going to change the overload
[15:21:11 CDT(-0500)] <EricDalquist> to a potential large increase in the number of open connections
[15:21:23 CDT(-0500)] <EricDalquist> I'm still back on ... why are ldap queries taking so long
[15:21:36 CDT(-0500)] <EricDalquist> I'd assume that a direct user query should be like .... a millisecond or two
[15:21:46 CDT(-0500)] <sjungling> right
[15:21:47 CDT(-0500)] <EricDalquist> that is the whole point of LDAP versus a DB
[15:21:55 CDT(-0500)] <EricDalquist> if you're seeing this behavior with a 5s timeout
[15:22:10 CDT(-0500)] <EricDalquist> that seems to imply your LDAP server is having problems
[15:22:15 CDT(-0500)] <sjungling> right
[15:23:32 CDT(-0500)] <sjungling> 5s is the default timeout?
[15:23:41 CDT(-0500)] <sjungling> with the webproxy portlet, right?
[15:23:55 CDT(-0500)] <EricDalquist> I'm not sure, I'd have to go look at portlet manager
[15:31:36 CDT(-0500)] <EricDalquist> uhg athena I'm trying to track down this ALL stuff
[15:31:40 CDT(-0500)] <EricDalquist> such a rabbit hole
[15:32:30 CDT(-0500)] <EricDalquist> https://github.com/Jasig/uPortal/commit/b412d0f13fa336db165c2722c9fe05091d5602c7
[15:32:36 CDT(-0500)] <EricDalquist> that is the commit that did it though
[15:33:04 CDT(-0500)] <EricDalquist> https://github.com/Jasig/uPortal/commit/b412d0f13fa336db165c2722c9fe05091d5602c7#L6R88
[15:33:06 CDT(-0500)] <EricDalquist> that block
[15:33:23 CDT(-0500)] <EricDalquist> that ends up in a chunk of code that tries to figure out if the ALL_TARGET is a portlet or a person
[15:33:32 CDT(-0500)] <EricDalquist> and since it doesn't match the portlet pattern it assumes it is a PERSON
[15:33:36 CDT(-0500)] <EricDalquist> and runs it through PAGs
[15:33:43 CDT(-0500)] <athena> ugh
[15:33:43 CDT(-0500)] <EricDalquist> which does the attribute query
[15:34:31 CDT(-0500)] <EricDalquist> the thing is I can;t figure out where ALL_TARGET is actually ever tested against
[15:34:36 CDT(-0500)] <EricDalquist> but I haven't read the whole patch yet
[15:34:38 CDT(-0500)] <EricDalquist> have to run
[15:34:43 CDT(-0500)] <EricDalquist> and will dig into this more tomorrow.