Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

[14:50:08 CDT(-0500)] <EricDalquist> not sure

[14:51:18 CDT(-0500)] <EricDalquist> what does your target org.springframework.ldap.core.support.LdapContextSource bean look like again?

[14:51:59 CDT(-0500)] <EricDalquist> also do you have a snippet of your ehcache config handy for the caches used by PD?

[14:52:17 CDT(-0500)] <sjungling> yep, one moment

[14:53:30 CDT(-0500)] <sjungling> https://gist.github.com/70950a606b5fe4a0de49

[14:54:16 CDT(-0500)] <EricDalquist> so one thing I'd look at is cache sizes

[14:54:20 CDT(-0500)] <EricDalquist> either via the cache manager portlet

[14:54:21 CDT(-0500)] <EricDalquist> or jmx

[14:54:30 CDT(-0500)] <EricDalquist> see how full your persondir caches are

[14:54:39 CDT(-0500)] <EricDalquist> you might be running out of space with only 1000 elements

[14:54:43 CDT(-0500)] <EricDalquist> resulting in more queries

[14:54:55 CDT(-0500)] <EricDalquist> same with 500 for ldap queries

[14:55:22 CDT(-0500)] <sjungling> one of our boxes… (21 / 500) for merged

[14:55:25 CDT(-0500)] <EricDalquist> if you get more than 500 unique, active, users over 10 minutes you'll end up with extra queries being run

[14:56:29 CDT(-0500)] <sjungling> groups.IEntity, StylesheetUserPreferencesImpl.Query, and FragmentDefinition.Query are the only ones with full objects

[14:56:46 CDT(-0500)] <EricDalquist> hrm ok

[14:57:27 CDT(-0500)] <EricDalquist> and can you add your ldapContext.xml to that gist?

[15:03:54 CDT(-0500)] <sjungling> added

[15:04:46 CDT(-0500)] <EricDalquist> hrm

[15:04:55 CDT(-0500)] <EricDalquist> so I think that is the LDAP server returning an error

[15:05:34 CDT(-0500)] <EricDalquist> or ... well it could be the csuc-student-announcements portlet timing out

[15:05:44 CDT(-0500)] <EricDalquist> since when a portlet times out thread.interrupt is called on it

[15:06:15 CDT(-0500)] <EricDalquist> yeah that is the case

[15:06:23 CDT(-0500)] <EricDalquist> so the portal is trying to execute csuc-student-announcements

[15:06:33 CDT(-0500)] <EricDalquist> which takes longer than the configured timeout

[15:06:52 CDT(-0500)] <EricDalquist> and when the portlet times out the portal sticks that worker thread on a recovery queue

[15:07:00 CDT(-0500)] <EricDalquist> and periodically calls Thread.interrupt on it

[15:07:04 CDT(-0500)] <EricDalquist> until the worker returns

[15:07:12 CDT(-0500)] <EricDalquist> if you're seeing a lot of thse

[15:07:25 CDT(-0500)] <EricDalquist> I'd say it implies that the LDAP searches are taking a long time

[15:07:35 CDT(-0500)] <EricDalquist> looking at the bottom of the stack trace

[15:07:41 CDT(-0500)] <EricDalquist> the LDAP connection is established

[15:07:59 CDT(-0500)] <EricDalquist> and the code is waiting in readReply to read the response from the LDAP server

[15:14:11 CDT(-0500)] <EricDalquist> what is the timeout for that portlet?>

[15:14:48 CDT(-0500)] <sjungling> checking…

[15:14:52 CDT(-0500)] <sjungling> probably the default for webproxy

[15:15:13 CDT(-0500)] <sjungling> 5000

[15:15:25 CDT(-0500)] <EricDalquist> ok

[15:15:35 CDT(-0500)] <EricDalquist> so at this point I'd go back to your LDAP admins

[15:15:42 CDT(-0500)] <EricDalquist> and try to figure out why LDAP queries are taking so long

[15:15:54 CDT(-0500)] <sjungling> yeah, just got done talking to them

[15:16:05 CDT(-0500)] <EricDalquist> though the fact that this query is happing during a portlet render is a little weird

[15:16:07 CDT(-0500)] <sjungling> they're claiming the uportal is making deferred operations?

[15:16:09 CDT(-0500)] <EricDalquist> since I'd expect that to be cached

[15:16:19 CDT(-0500)] <EricDalquist> deferred operations?

[15:16:51 CDT(-0500)] <sjungling> "It means that an operation that was just received on that connection has been deferred, because there are other operations ahead of it that are still pending. I.e., there's a bunch of operations on that connection waiting to execute, and the newcomer is last in line."

[15:16:55 CDT(-0500)] <sjungling> from one of the OpenLDAP guys

[15:17:10 CDT(-0500)] <EricDalquist> ahhh'

[15:17:15 CDT(-0500)] <EricDalquist> so that is probably due to the connection poolingh

[15:17:19 CDT(-0500)] <sjungling> ok

[15:17:23 CDT(-0500)] <sjungling> kinda my thought

[15:17:38 CDT(-0500)] <EricDalquist> the portal gives up on a thread making an ldap query

[15:17:46 CDT(-0500)] <sjungling> and sounds like you guys have pretty good experience with the config you supplied?

[15:17:50 CDT(-0500)] <EricDalquist> yes

[15:18:01 CDT(-0500)] <sjungling> of course, not all ldap enviros being equal

[15:18:05 CDT(-0500)] <EricDalquist> the portlet is interuppted

[15:18:10 CDT(-0500)] <EricDalquist> and the connection is returned to the poo;

[15:18:16 CDT(-0500)] <EricDalquist> even though it is still executing a query

[15:18:25 CDT(-0500)] <EricDalquist> another thread comes along and tries to use it

[15:18:30 CDT(-0500)] <EricDalquist> that makes sense

[15:18:39 CDT(-0500)] <EricDalquist> still the original cause would be slow query results

[15:18:43 CDT(-0500)] <EricDalquist> you could disable pooling

[15:18:56 CDT(-0500)] <EricDalquist> (just comment out the pooling bean and rename the target bean to the pool bean name)

[15:18:58 CDT(-0500)] <EricDalquist> and see how that goes

[15:19:14 CDT(-0500)] <sjungling> hrm, kk

[15:19:40 CDT(-0500)] <sjungling> you recommend starting there and then going to more advanced pooling if the issue persists?

[15:20:48 CDT(-0500)] <EricDalquist> well that would fix the deferred issue

[15:20:51 CDT(-0500)] <EricDalquist> \but if queries are still sllow

[15:20:57 CDT(-0500)] <EricDalquist> you're just going to change the overload

[15:21:11 CDT(-0500)] <EricDalquist> to a potential large increase in the number of open connections

[15:21:23 CDT(-0500)] <EricDalquist> I'm still back on ... why are ldap queries taking so long

[15:21:36 CDT(-0500)] <EricDalquist> I'd assume that a direct user query should be like .... a millisecond or two

[15:21:46 CDT(-0500)] <sjungling> right

[15:21:47 CDT(-0500)] <EricDalquist> that is the whole point of LDAP versus a DB

[15:21:55 CDT(-0500)] <EricDalquist> if you're seeing this behavior with a 5s timeout

[15:22:10 CDT(-0500)] <EricDalquist> that seems to imply your LDAP server is having problems

[15:22:15 CDT(-0500)] <sjungling> right

[15:23:32 CDT(-0500)] <sjungling> 5s is the default timeout?

[15:23:41 CDT(-0500)] <sjungling> with the webproxy portlet, right?

[15:23:55 CDT(-0500)] <EricDalquist> I'm not sure, I'd have to go look at portlet manager

[15:31:36 CDT(-0500)] <EricDalquist> uhg athena I'm trying to track down this ALL stuff

[15:31:40 CDT(-0500)] <EricDalquist> such a rabbit hole

[15:32:30 CDT(-0500)] <EricDalquist> https://github.com/Jasig/uPortal/commit/b412d0f13fa336db165c2722c9fe05091d5602c7

[15:32:36 CDT(-0500)] <EricDalquist> that is the commit that did it though

[15:33:04 CDT(-0500)] <EricDalquist> https://github.com/Jasig/uPortal/commit/b412d0f13fa336db165c2722c9fe05091d5602c7#L6R88

[15:33:06 CDT(-0500)] <EricDalquist> that block

[15:33:23 CDT(-0500)] <EricDalquist> that ends up in a chunk of code that tries to figure out if the ALL_TARGET is a portlet or a person

[15:33:32 CDT(-0500)] <EricDalquist> and since it doesn't match the portlet pattern it assumes it is a PERSON

[15:33:36 CDT(-0500)] <EricDalquist> and runs it through PAGs

[15:33:43 CDT(-0500)] <athena> ugh

[15:33:43 CDT(-0500)] <EricDalquist> which does the attribute query

[15:34:31 CDT(-0500)] <EricDalquist> the thing is I can;t figure out where ALL_TARGET is actually ever tested against

[15:34:36 CDT(-0500)] <EricDalquist> but I haven't read the whole patch yet

[15:34:38 CDT(-0500)] <EricDalquist> have to run

[15:34:43 CDT(-0500)] <EricDalquist> and will dig into this more tomorrow.