Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 28 Next »

[13:00:44 CDT(-0500)] <EricDalquist> The String.split method annoys me

[13:00:56 CDT(-0500)] <EricDalquist> every time you call it you're compiling a new Pattern

[13:01:04 CDT(-0500)] <EricDalquist> so it is useful

[13:01:07 CDT(-0500)] <EricDalquist> it saves a little code

[13:01:10 CDT(-0500)] <EricDalquist> but used in hot spots

[13:01:17 CDT(-0500)] <EricDalquist> it really chews CPU

[13:07:52 CDT(-0500)] <athena> yeah

[13:07:57 CDT(-0500)] <athena> that's a good point

[13:08:08 CDT(-0500)] <athena> can you feed it a pre-compiled pattern?

[13:08:14 CDT(-0500)] <EricDalquist> no but you can do:

[13:08:27 CDT(-0500)] <EricDalquist> p = Pattern.compile(...);

[13:08:33 CDT(-0500)] <EricDalquist> p.split(String)

[13:08:40 CDT(-0500)] <EricDalquist> and store p as a static final

[13:08:47 CDT(-0500)] <athena> that makes sense

[13:09:05 CDT(-0500)] <EricDalquist> maybe someday uPortal's code will be clean enough that we can add checkstyle (smile)

[13:11:25 CDT(-0500)] <athena> (smile)

[13:42:31 CDT(-0500)] <JoeMoore> Im trying to configure portlet parameters. It wont let me edit them at the beginning, but when I try to edit them with the rich editor I get http://pastebin.com/Q5Y9j5Cw

[14:50:19 CDT(-0500)] <JoeMoore> I guess my pastebin timed out (one hour). Let me know if someone wants to take a look and I'll paste again with a longer timeout on the problem of configuring portlet parameters.

[15:40:06 CDT(-0500)] <drewwills> EricDalquist I'd like to pick your brain for a sec about UP-3161: Misbehaving portlets can cause channel rendering threads to leak and bring the portal to its knees

[15:40:14 CDT(-0500)] <EricDalquist> ok

[15:40:20 CDT(-0500)] <EricDalquist> I have a few spare moments

[15:40:30 CDT(-0500)] <EricDalquist> working franticly toward a uP4.0.4 upgrade in 7 days (tongue)

[15:40:46 CDT(-0500)] <drewwills> the way I remember it, things lie as follows: there is logic to detect and log the problem, which is "on" be default...

[15:41:06 CDT(-0500)] <EricDalquist> are we talking 3.2 or 4.0

[15:41:12 CDT(-0500)] <EricDalquist> because they have different mechanisms

[15:41:16 CDT(-0500)] <drewwills> there is logic to prevent the offending portlets from taking more threads, which is "off" by default

[15:41:20 CDT(-0500)] <drewwills> all of this is 3.2

[15:41:23 CDT(-0500)] <EricDalquist> ok

[15:41:47 CDT(-0500)] <drewwills> and (now) for 4.x, there's different realities, and I'm not sure how much is there

[15:42:50 CDT(-0500)] <drewwills> question 1: can we turn on the thread-capping tech? (currently based on positive-integer threadshold for not allocating more)

[15:43:17 CDT(-0500)] <drewwills> question #2: do we need any of that translated and added to 4?

[15:43:19 CDT(-0500)] <EricDalquist> for 3.2, if you're very confident it wont' break existing 3.2 uPortal installs: yes

[15:43:22 CDT(-0500)] <EricDalquist> for 4.0

[15:43:32 CDT(-0500)] <EricDalquist> the portlet execution is done by the org.jasig.portal.utils.threading.QoSThreadPoolExecutorFactoryBean

[15:43:41 CDT(-0500)] <drewwills> pretty sure... OHIO has been running it for 2 quarters or more

[15:43:43 CDT(-0500)] <EricDalquist> sorry that creates the ExecutorService that is used

[15:43:56 CDT(-0500)] <EricDalquist> and the #s will scale well to schools of various sizes?

[15:44:13 CDT(-0500)] <EricDalquist> the Queue for that executor service is org.jasig.portal.portlet.rendering.worker.PortletWorkerExecutionQueue

[15:44:19 CDT(-0500)] <drewwills> i should think... we can pick a high number if we're worried

[15:44:25 CDT(-0500)] <EricDalquist> that would be good

[15:44:42 CDT(-0500)] <EricDalquist> the idea in 4.0 is that the queue that feeds the thread pool is actually a pool of queues

[15:44:46 CDT(-0500)] <EricDalquist> with one queue per fname

[15:44:55 CDT(-0500)] <drewwills> maybe change default thread pool size to 100, max errant to 10?

[15:45:15 CDT(-0500)] <EricDalquist> so that means if one fname has more than 10 threads in use it stops?

[15:45:20 CDT(-0500)] <drewwills> ah one queue per fname means this problem doesn't exist, basically

[15:45:29 CDT(-0500)] <drewwills> no, not that...

[15:45:30 CDT(-0500)] <EricDalquist> well it still could in 4.0

[15:45:40 CDT(-0500)] <EricDalquist> but it is at least mitigated

[15:46:02 CDT(-0500)] <drewwills> it means if 1 fname hase more than 10 rendering FOR MORE THAN DOUBLE THE CONFIGURED TIMEOUT, it won't get more threads

[15:46:24 CDT(-0500)] <EricDalquist> ah

[15:46:24 CDT(-0500)] <EricDalquist> ok

[15:46:28 CDT(-0500)] <EricDalquist> that seems reasonable

[15:46:29 CDT(-0500)] <EricDalquist> https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/portlet/rendering/worker/PortletWorkerExecutionQueue.java

[15:46:49 CDT(-0500)] <EricDalquist> QualityOfServiceBlockingQueue handles all the real work of maintaining a thread safe queue of queues

[15:46:58 CDT(-0500)] <drewwills> roger

[15:47:21 CDT(-0500)] <EricDalquist> PortletWorkerExecutionQueue essentially has the work of:

[15:47:21 CDT(-0500)] <EricDalquist> 1. getElementKey - determines the key for the queue the worker gets placed into

[15:47:23 CDT(-0500)] <drewwills> so if 1 portlet (by fname) runs out of threads, it doesn't affect the others, in essence?

[15:47:36 CDT(-0500)] <EricDalquist> 2. getNextElementKey - get the next queue name to grab a worker from

[15:47:59 CDT(-0500)] <EricDalquist> so lets say right now you have a queue that has had the following portlets offered to it:

[15:48:20 CDT(-0500)] <EricDalquist> proxy, news, alerts, news, proxy, news, news

[15:48:30 CDT(-0500)] <EricDalquist> you would have 3 internal queues

[15:48:44 CDT(-0500)] <drewwills> oh, and one more thing: if the portlet starts behaving, and the number of "errant" threads come beack below the threshold, it will start getting threads again

[15:49:07 CDT(-0500)] <EricDalquist> proxy (2 entries), news (4 entries), alerts (1 entry)

[15:49:13 CDT(-0500)] <drewwills> roger

[15:49:39 CDT(-0500)] <EricDalquist> the PortletWorkerExecutionQueue gets a list of all internal queue names

[15:49:59 CDT(-0500)] <EricDalquist> and when getNextElementKey it finds the next internal queue with a worker in it

[15:50:20 CDT(-0500)] <EricDalquist> and when it gets to the end of the queue name list it refreshes the list and starts again

[15:50:26 CDT(-0500)] <EricDalquist> so you can still have thread starvation here

[15:51:06 CDT(-0500)] <drewwills> so it's a multi-queue setup, but not multi-threadpool?

[15:51:17 CDT(-0500)] <EricDalquist> right

[15:51:21 CDT(-0500)] <EricDalquist> there is one pool

[15:51:28 CDT(-0500)] <EricDalquist> with one "meta queue" that feeds it

[15:51:41 CDT(-0500)] <EricDalquist> and that meta queue needs does some very basic 'balancing' right now

[15:52:08 CDT(-0500)] <EricDalquist> what needs to happen is the logic in PortletWorkerExecutionQueue.getNextElementKey needs to get smarter

[15:52:24 CDT(-0500)] <drewwills> what if we did something similar... detect when an fname has n number of rendering threads allocated+overdue by 2x or more... then log and refuse to allocate more

[15:52:54 CDT(-0500)] <EricDalquist> yup

[15:53:13 CDT(-0500)] <EricDalquist> so the best way to do that is to add some share instrumentation between

[15:53:14 CDT(-0500)] <EricDalquist> https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/portlet/rendering/worker/PortletExecutionWorker.java

[15:53:14 CDT(-0500)] <EricDalquist> and

[15:53:14 CDT(-0500)] <EricDalquist> https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/portlet/rendering/worker/PortletWorkerExecutionQueue.java

[15:53:33 CDT(-0500)] <EricDalquist> PortletExecutionWorker is the base class for the four portlet execution types (render, action, resource, event)

[15:53:56 CDT(-0500)] <EricDalquist> it handles all the logic around knowing when a worker is actually running

[15:54:00 CDT(-0500)] <EricDalquist> if it has timed out

[15:54:01 CDT(-0500)] <EricDalquist> etc

[15:54:10 CDT(-0500)] <EricDalquist> hrm

[15:54:12 CDT(-0500)] <EricDalquist> actually

[15:54:12 CDT(-0500)] <drewwills> one pool for all execution types, right?

[15:54:39 CDT(-0500)] <EricDalquist> yes

[15:54:47 CDT(-0500)] <EricDalquist> portlet workers in uP4 have an interceptor API

[15:54:49 CDT(-0500)] <EricDalquist> https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/resources/properties/contexts/portletContainerContext.xml#L43

[15:55:29 CDT(-0500)] <EricDalquist> here is an example: https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/portlet/rendering/worker/ThreadNamingPortletExecutionInterceptorAdaptor.java

[15:55:52 CDT(-0500)] <EricDalquist> so that lets you write a little class that gets called preSubmit, preExec, and postExec

[15:56:05 CDT(-0500)] <EricDalquist> so you could write a portlet execution interceptor

[15:56:20 CDT(-0500)] <EricDalquist> that counts the number of portlets of each fname in the pool

[15:56:25 CDT(-0500)] <drewwills> one that would keep a count

[15:56:26 CDT(-0500)] <drewwills> ?

[15:56:32 CDT(-0500)] <drewwills> yes, exactly

[15:56:40 CDT(-0500)] <EricDalquist> and uses that info to influence sub-queue skipping in PortletWorkerExecutionQueue

[15:56:51 CDT(-0500)] <drewwills> terrific, that's great

[15:57:13 CDT(-0500)] <drewwills> I have some bandwidth to take this on under coop – sound good?

[15:58:00 CDT(-0500)] <EricDalquist> great

[15:58:16 CDT(-0500)] <EricDalquist> create a feature-branch off master if you would

[15:58:25 CDT(-0500)] <drewwills> sure deal

[15:58:38 CDT(-0500)] <EricDalquist> then if you push that to your fork we can collaborate on code better (smile)

[15:58:56 CDT(-0500)] <drewwills> wait – feature branch in Jasig/uPortal? not a ticket branch in my personal?

[15:59:08 CDT(-0500)] <EricDalquist> oh either works actually

[15:59:44 CDT(-0500)] <EricDalquist> so the other place to consider looking is https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/portlet/rendering/worker/PortletExecutionWorker.java#L244

[15:59:54 CDT(-0500)] <EricDalquist> that is the handler that gets called every time a portlet times out

[16:00:44 CDT(-0500)] <drewwills> does postExecution() get called even if the tread times out, is killed, errors out, etc?

[16:00:56 CDT(-0500)] <EricDalquist> I believe so

[16:01:01 CDT(-0500)] <EricDalquist> it gets called in a finally block

[16:01:16 CDT(-0500)] <EricDalquist> so anything short of the thread dieing should result in execution

[16:01:42 CDT(-0500)] <drewwills> roger

[16:03:00 CDT(-0500)] <EricDalquist> another thing you might want to look at in all of this is PortletExecutionManager.checkWorkerCompletion

[16:03:10 CDT(-0500)] <EricDalquist> that is called after each request the portal handles

[16:03:24 CDT(-0500)] <EricDalquist> the portal keeps track of all workers started during a request as a request attribute

[16:03:36 CDT(-0500)] <EricDalquist> that goes through and makes sure each worker has completed

[16:04:33 CDT(-0500)] <EricDalquist> if it finds a worker that hasn't completed it tries to cancel the worker (eventually Thread.interrupt is called) and adds the worker to a singleton queue

[16:05:00 CDT(-0500)] <EricDalquist> there is a background process that runs every 200ms which goes through that queue and calls interrupt on each worker until the worker eventually stops

[16:05:50 CDT(-0500)] <athena> EricDalquist: i think i've found the issue with the layout.json document. that URL wasn't behind the URL canonicalizing filter until this commit: https://github.com/Jasig/uPortal/commit/53d54bc66ea808414614d62a2a52175a3b167da0

[16:06:07 CDT(-0500)] <athena> er, wrong link - https://github.com/Jasig/uPortal/commit/6e7396b547a32ea2ecdf301e0f03de6db85c675d

[16:06:20 CDT(-0500)] <EricDalquist> ah

[16:06:29 CDT(-0500)] <EricDalquist> right that url shouldn't get canonicalized

[16:06:30 CDT(-0500)] <drewwills> i see... so if we had another process – or perhaps the same process – count the ones that won't die by fname, it could trigger the no-new-threads valve

[16:06:38 CDT(-0500)] <EricDalquist> yup

[16:06:47 CDT(-0500)] <EricDalquist> so drewwills there is no right answer here

  • No labels