...
[16:05:00 CDT(-0500)] <EricDalquist> there is a background process that runs every 200ms which goes through that queue and calls interrupt on each worker until the worker eventually stops
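The mechanism described above can be sketched roughly as follows. This is an illustrative reconstruction, not uPortal's actual code; the class and method names (`HungWorkerReaper`, `report`) are invented for the sketch:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the reaper described above: a task that runs every
// 200ms and calls interrupt() on each worker thread still in the queue of
// hung workers, until each one eventually stops.
class HungWorkerReaper {
    private final Queue<Thread> hungWorkers = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start() {
        scheduler.scheduleWithFixedDelay(this::interruptHungWorkers,
                200, 200, TimeUnit.MILLISECONDS);
    }

    /** Hand a hung worker thread to the reaper. */
    public void report(Thread worker) {
        hungWorkers.add(worker);
    }

    private void interruptHungWorkers() {
        for (Thread worker : hungWorkers) {
            if (worker.isAlive()) {
                worker.interrupt();          // keep poking it each pass
            } else {
                hungWorkers.remove(worker);  // finally stopped; forget it
            }
        }
    }
}
```

The key property is that interruption is retried on every pass rather than attempted once, since a worker stuck in interruptible I/O or a wait may need repeated interrupts before it actually exits.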
[16:05:50 CDT(-0500)] <athena> EricDalquist: i think i've found the issue with the layout.json document. that URL wasn't behind the URL canonicalizing filter until this commit: https://github.com/Jasig/uPortal/commit/53d54bc66ea808414614d62a2a52175a3b167da0
[16:06:07 CDT(-0500)] <athena> er, wrong link - https://github.com/Jasig/uPortal/commit/6e7396b547a32ea2ecdf301e0f03de6db85c675d
[16:06:20 CDT(-0500)] <EricDalquist> ah
[16:06:29 CDT(-0500)] <EricDalquist> right that url shouldn't get canonicalized
[16:06:30 CDT(-0500)] <drewwills> i see... so if we had another process – or perhaps the same process – count the ones that won't die by fname, it could trigger the no-new-threads valve
[16:06:38 CDT(-0500)] <EricDalquist> yup
[16:06:47 CDT(-0500)] <EricDalquist> so drewwills there is no right answer here
[16:06:54 CDT(-0500)] <EricDalquist> but there are a lot of places we can hook into portlet execution
[16:06:58 CDT(-0500)] <athena> i can just edit that line out as long as that won't cause any other issues
[16:06:59 CDT(-0500)] <drewwills> or perhaps there are several
[16:07:06 CDT(-0500)] <EricDalquist> athena: yes
[16:07:21 CDT(-0500)] <athena> cool.
[16:07:37 CDT(-0500)] <EricDalquist> that was an oversight
[16:07:39 CDT(-0500)] <EricDalquist> drewwills: yes
[16:07:48 CDT(-0500)] <EricDalquist> and probably several metrics we could eventually watch
[16:07:49 CDT(-0500)] <athena> no worries
[16:08:00 CDT(-0500)] <EricDalquist> # of workers that won't die per fname
[16:08:09 CDT(-0500)] <EricDalquist> # of workers that timeout per fname
[16:08:15 CDT(-0500)] <EricDalquist> # of workers in the pool per fname
[16:08:22 CDT(-0500)] <EricDalquist> though that last one may not be a good one
[16:08:40 CDT(-0500)] <EricDalquist> since you could have a portal where 50% of your work is one fname
[16:08:43 CDT(-0500)] <EricDalquist> and that is ok
[16:10:14 CDT(-0500)] <EricDalquist> I'll be interested to see what you come up with
[16:10:29 CDT(-0500)] <EricDalquist> the other piece that may be missing is a way to reject new workers from the queue
[16:10:50 CDT(-0500)] <EricDalquist> potentially adding another abstract method to QualityOfServiceBlockingQueue
[16:11:05 CDT(-0500)] <EricDalquist> which would allow the subclass to reject a newly offered worker
[16:11:31 CDT(-0500)] <EricDalquist> otherwise if portlet foo gets hung up and never recovers we could be in a situation where foo's queue just keeps growing
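The proposed enhancement could look something like this simplified, hypothetical sketch. It is not the actual `QualityOfServiceBlockingQueue` API; the class names and the `shouldAccept` hook are assumptions for illustration:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a queue with an overridable hook that lets a subclass reject a
// newly offered worker, so a hung portlet's queue can't grow without bound.
abstract class RejectingWorkerQueue<W> {
    private final Queue<W> delegate = new ArrayDeque<>();

    /** Subclasses return false to reject the newly offered worker. */
    protected abstract boolean shouldAccept(W worker);

    public synchronized boolean offer(W worker) {
        if (!shouldAccept(worker)) {
            return false;  // rejected: the rogue portlet's queue stops growing
        }
        return delegate.offer(worker);
    }

    public synchronized W poll() {
        return delegate.poll();
    }
}

// Portlet-worker subclass that refuses new work for fnames flagged as rogue.
class PortletWorkerQueue extends RejectingWorkerQueue<String> {
    private final Set<String> rogueFnames = ConcurrentHashMap.newKeySet();

    public void markRogue(String fname) {
        rogueFnames.add(fname);
    }

    @Override
    protected boolean shouldAccept(String fname) {
        return !rogueFnames.contains(fname);
    }
}
```

Returning `false` from `offer` matches the `BlockingQueue` convention for a rejected element, so callers can fail fast (e.g. render an error for the portlet) instead of queueing work that will never run.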
[16:21:25 CDT(-0500)] <drewwills> sorry, got distracted – yes, rl experience suggests it would be better simply not to accept workers for portlets that have gone rogue... since they normally do so for 10 min or more
[16:21:37 CDT(-0500)] <EricDalquist> yup
[16:21:56 CDT(-0500)] <drewwills> ok, i'm excited to see what I come up with too
[16:21:58 CDT(-0500)] <EricDalquist> so that would be an additional enhancement to that QualityOfServiceBlockingQueue and the portlet worker subclass
[16:22:30 CDT(-0500)] <EricDalquist> ok drewwills all my crn enhancements have been committed
[16:22:40 CDT(-0500)] <EricDalquist> more stuff I found working on exporting layouts from 3.2
[16:22:49 CDT(-0500)] <EricDalquist> also found two tables in dire need of indexes
[16:32:36 CDT(-0500)] <EricDalquist> on a semi-related note
[16:32:49 CDT(-0500)] <EricDalquist> uP4 appears to be a bit stricter on timeout enforcement
[16:33:16 CDT(-0500)] <EricDalquist> and we're running into a fun problem where uP calls interrupt on a portlet that has timed out while the portlet is in the middle of classloading
[16:33:31 CDT(-0500)] <EricDalquist> the result is the class it is loading gets marked bad and never loads
[16:33:58 CDT(-0500)] <EricDalquist> so I'm adding some logic to PortletExecutionManager to multiply a portlet's timeout by 20 the first 5 times it executes
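The warm-up logic described could be sketched as follows. This is a hedged reconstruction, not uPortal's actual `PortletExecutionManager` code; the class name and structure here are invented, though the constants (20x for the first 5 executions) come from the discussion above:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the warm-up timeout logic: multiply a portlet's configured
// timeout by 20 for its first 5 executions, so that classloading during the
// first few requests isn't interrupted mid-load (which would mark the class
// bad so it never loads).
class WarmupTimeoutCalculator {
    private static final int WARMUP_EXECUTIONS = 5;
    private static final long WARMUP_MULTIPLIER = 20;

    private final Map<String, AtomicInteger> executionCounts = new ConcurrentHashMap<>();

    /** Returns the effective timeout for this execution and records it. */
    public long effectiveTimeout(String fname, long configuredTimeoutMs) {
        int execution = executionCounts
                .computeIfAbsent(fname, k -> new AtomicInteger())
                .incrementAndGet();
        return execution <= WARMUP_EXECUTIONS
                ? configuredTimeoutMs * WARMUP_MULTIPLIER
                : configuredTimeoutMs;
    }
}
```

After the warm-up window the configured timeout applies unchanged, so strict enforcement still kicks in once the portlet's classes are loaded.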