Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

[15:56:51 CDT(-0500)] <drewwills> terrific, that's great

[15:57:13 CDT(-0500)] <drewwills> I have some bandwidth to take this on under coop – sound good?

[15:58:00 CDT(-0500)] <EricDalquist> great

[15:58:16 CDT(-0500)] <EricDalquist> create a feature-branch off master if you would

[15:58:25 CDT(-0500)] <drewwills> sure deal

[15:58:38 CDT(-0500)] <EricDalquist> then if you push that to your fork we can collaborate on code better (smile)

[15:58:56 CDT(-0500)] <drewwills> wait – feature branch in Jasig/uPortal? not a ticket branch in my personal?

[15:59:08 CDT(-0500)] <EricDalquist> oh either works actually

[15:59:44 CDT(-0500)] <EricDalquist> so the other place to consider looking is https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/portlet/rendering/worker/PortletExecutionWorker.java#L244

[15:59:54 CDT(-0500)] <EricDalquist> that is the handler that gets called every time a portlet times out

[16:00:44 CDT(-0500)] <drewwills> does postExecution() get called even if the tread times out, is killed, errors out, etc?

[16:00:56 CDT(-0500)] <EricDalquist> I believe so

[16:01:01 CDT(-0500)] <EricDalquist> it gets called in a finally block

[16:01:16 CDT(-0500)] <EricDalquist> so anything short of the thread dieing should result in execution

[16:01:42 CDT(-0500)] <drewwills> roger

[16:03:00 CDT(-0500)] <EricDalquist> another thing you might want to look at in all of this is PortletExecutionManager.checkWorkerCompletion

[16:03:10 CDT(-0500)] <EricDalquist> that is called after each request the portal handles

[16:03:24 CDT(-0500)] <EricDalquist> the portal keeps track of all workers started during a request as a request attribute

[16:03:36 CDT(-0500)] <EricDalquist> that goes through and makes sure each worker has completed

[16:04:33 CDT(-0500)] <EricDalquist> if it finds a worker that hasn't completed it tries to cancel the worker (eventually Thread.interrupt is called) and adds the worker to a singleton queue

[16:05:00 CDT(-0500)] <EricDalquist> there is a background process that runs every 200ms which goes through that queue and calls interrupt on each worker until the worker eventually stops

[16:05:50 CDT(-0500)] <athena> EricDalquist: i think i've found the issue with the layout.json document. that URL wasn't behind the URL canonicalizing filter until this commit: https://github.com/Jasig/uPortal/commit/53d54bc66ea808414614d62a2a52175a3b167da0

[16:06:07 CDT(-0500)] <athena> er, wrong link - https://github.com/Jasig/uPortal/commit/6e7396b547a32ea2ecdf301e0f03de6db85c675d

[16:06:20 CDT(-0500)] <EricDalquist> ah

[16:06:29 CDT(-0500)] <EricDalquist> right that url shouldn't get canonicalized

[16:06:30 CDT(-0500)] <drewwills> i see... so if we had another process – or perhaps the same process – count the ones that won't die by fname, it could trigger the no-new-threads valve

[16:06:38 CDT(-0500)] <EricDalquist> yup

[16:06:47 CDT(-0500)] <EricDalquist> so drewwills there is no right answer here

[16:06:54 CDT(-0500)] <EricDalquist> but there are a lot of places we can hook into portlet execution

[16:06:58 CDT(-0500)] <athena> i can just edit that line out as long as that wn't cause any other issues

[16:06:59 CDT(-0500)] <drewwills> or perhaps there are several (smile)

[16:07:06 CDT(-0500)] <EricDalquist> athena: yes

[16:07:21 CDT(-0500)] <athena> cool.

[16:07:37 CDT(-0500)] <EricDalquist> that was an oversight

[16:07:39 CDT(-0500)] <EricDalquist> drewwills: yes

[16:07:48 CDT(-0500)] <EricDalquist> and probably several metrics we could eventually watch

[16:07:49 CDT(-0500)] <athena> no worries (smile)

[16:08:00 CDT(-0500)] <EricDalquist> # of workers that won't die per fname

[16:08:09 CDT(-0500)] <EricDalquist> # of workers that timeout per fname

[16:08:15 CDT(-0500)] <EricDalquist> # of workers in the pool per fname

[16:08:22 CDT(-0500)] <EricDalquist> though that last one may not be a good once

[16:08:40 CDT(-0500)] <EricDalquist> since you could have a portal where 50% of your work is one fname

[16:08:43 CDT(-0500)] <EricDalquist> and that is ok

[16:10:14 CDT(-0500)] <EricDalquist> I'll be interest to see what you come up with (smile)

[16:10:29 CDT(-0500)] <EricDalquist> the other piece that may be missing is a way to reject new workers from the queue

[16:10:50 CDT(-0500)] <EricDalquist> potentially adding another abstract method to QualityOfServiceBlockingQueue

[16:11:05 CDT(-0500)] <EricDalquist> which would allow the subclass to reject a newly offered worker

[16:11:31 CDT(-0500)] <EricDalquist> otherwise if portlet foo gets hung up and never recovers we could be in a situation where foo's queue just keeps growing

[16:21:25 CDT(-0500)] <drewwills> sorry, got distracted – yes, rl experience it would be better simply not to accept workers for portlets that have gone rogue... since they normally do so for 10 min or more

[16:21:37 CDT(-0500)] <EricDalquist> yup

[16:21:39 CDT(-0500)] <drewwills> "rl experience suggests"*

[16:21:56 CDT(-0500)] <drewwills> ok, i'm excited to see what I come up with too (tongue)

[16:21:58 CDT(-0500)] <EricDalquist> so that would be an additional enhancement to that QualityOfServiceBlockingQueue and the portlet worker subclass

[16:22:00 CDT(-0500)] <EricDalquist> (smile)

[16:22:30 CDT(-0500)] <EricDalquist> ok drewwills all my crn enhancements have been committed

[16:22:40 CDT(-0500)] <EricDalquist> more stuff I found working on exporting layouts from 3.2

[16:22:49 CDT(-0500)] <EricDalquist> also found two tables in dire need of indexes

[16:32:36 CDT(-0500)] <EricDalquist> on a semi-related note

[16:32:49 CDT(-0500)] <EricDalquist> uP4 appears to be a bit stricter on timeout enforcement

[16:33:16 CDT(-0500)] <EricDalquist> and we're running into a fun problem where uP calls interrupt on a portlet that has timed out while the portlet is in the middle of classloading

[16:33:31 CDT(-0500)] <EricDalquist> the result is the class it is loading gets marked bad and never loads

[16:33:58 CDT(-0500)] <EricDalquist> so I'm adding some logic to PortletExecutionManager to multiply a portlet's timeout by 20 the first 5 times it executes