[13:00:44 CDT(-0500)] <EricDalquist> The String.split method annoys me
[13:00:56 CDT(-0500)] <EricDalquist> every time you call it you're compiling a new Pattern
[13:01:04 CDT(-0500)] <EricDalquist> so it is useful
[13:01:07 CDT(-0500)] <EricDalquist> it saves a little code
[13:01:10 CDT(-0500)] <EricDalquist> but used in hot spots
[13:01:17 CDT(-0500)] <EricDalquist> it really chews CPU
[13:07:52 CDT(-0500)] <athena> yeah
[13:07:57 CDT(-0500)] <athena> that's a good point
[13:08:08 CDT(-0500)] <athena> can you feed it a pre-compiled pattern?
[13:08:14 CDT(-0500)] <EricDalquist> no but you can do:
[13:08:27 CDT(-0500)] <EricDalquist> p = Pattern.compile(...);
[13:08:33 CDT(-0500)] <EricDalquist> p.split(String)
[13:08:40 CDT(-0500)] <EricDalquist> and store p as a static final
[13:08:47 CDT(-0500)] <athena> that makes sense
[13:09:05 CDT(-0500)] <EricDalquist> maybe someday uPortal's code will be clean enough that we can add checkstyle
[13:11:25 CDT(-0500)] <athena>
[13:42:31 CDT(-0500)] <JoeMoore> Im trying to configure portlet parameters. It wont let me edit them at the beginning, but when I try to edit them with the rich editor I get http://pastebin.com/Q5Y9j5Cw
[14:50:19 CDT(-0500)] <JoeMoore> I guess my pastebin timed out (one hour). Let me know if someone wants to take a look and I'll paste again with a longer timeout on the problem of configuring portlet parameters.
[15:40:06 CDT(-0500)] <drewwills> EricDalquist I'd like to pick your brain for a sec about UP-3161: Misbehaving portlets can cause channel rendering threads to leak and bring the portal to its knees
[15:40:14 CDT(-0500)] <EricDalquist> ok
[15:40:20 CDT(-0500)] <EricDalquist> I have a few spare moments
[15:40:30 CDT(-0500)] <EricDalquist> working franticly toward a uP4.0.4 upgrade in 7 days
[15:40:46 CDT(-0500)] <drewwills> the way I remember it, things lie as follows: there is logic to detect and log the problem, which is "on" be default...
[15:41:06 CDT(-0500)] <EricDalquist> are we talking 3.2 or 4.0
[15:41:12 CDT(-0500)] <EricDalquist> because they have different mechanisms
[15:41:16 CDT(-0500)] <drewwills> there is logic to prevent the offending portlets from taking more threads, which is "off" by default
[15:41:20 CDT(-0500)] <drewwills> all of this is 3.2
[15:41:23 CDT(-0500)] <EricDalquist> ok
[15:41:47 CDT(-0500)] <drewwills> and (now) for 4.x, there's different realities, and I'm not sure how much is there
[15:42:50 CDT(-0500)] <drewwills> question 1: can we turn on the thread-capping tech? (currently based on positive-integer threadshold for not allocating more)
[15:43:17 CDT(-0500)] <drewwills> question #2: do we need any of that translated and added to 4?
[15:43:19 CDT(-0500)] <EricDalquist> for 3.2, if you're very confident it wont' break existing 3.2 uPortal installs: yes
[15:43:22 CDT(-0500)] <EricDalquist> for 4.0
[15:43:32 CDT(-0500)] <EricDalquist> the portlet execution is done by the org.jasig.portal.utils.threading.QoSThreadPoolExecutorFactoryBean
[15:43:41 CDT(-0500)] <drewwills> pretty sure... OHIO has been running it for 2 quarters or more
[15:43:43 CDT(-0500)] <EricDalquist> sorry that creates the ExecutorService that is used
[15:43:56 CDT(-0500)] <EricDalquist> and the #s will scale well to schools of various sizes?
[15:44:13 CDT(-0500)] <EricDalquist> the Queue for that executor service is org.jasig.portal.portlet.rendering.worker.PortletWorkerExecutionQueue
[15:44:19 CDT(-0500)] <drewwills> i should think... we can pick a high number if we're worried
[15:44:25 CDT(-0500)] <EricDalquist> that would be good
[15:44:42 CDT(-0500)] <EricDalquist> the idea in 4.0 is that the queue that feeds the thread pool is actually a pool of queues
[15:44:46 CDT(-0500)] <EricDalquist> with one queue per fname
[15:44:55 CDT(-0500)] <drewwills> maybe change default thread pool size to 100, max errant to 10?
[15:45:15 CDT(-0500)] <EricDalquist> so that means if one fname has more than 10 threads in use it stops?
[15:45:20 CDT(-0500)] <drewwills> ah one queue per fname means this problem doesn't exist, basically
[15:45:29 CDT(-0500)] <drewwills> no, not that...
[15:45:30 CDT(-0500)] <EricDalquist> well it still could in 4.0
[15:45:40 CDT(-0500)] <EricDalquist> but it is at least mitigated
[15:46:02 CDT(-0500)] <drewwills> it means if 1 fname hase more than 10 rendering FOR MORE THAN DOUBLE THE CONFIGURED TIMEOUT, it won't get more threads
[15:46:24 CDT(-0500)] <EricDalquist> ah
[15:46:24 CDT(-0500)] <EricDalquist> ok
[15:46:28 CDT(-0500)] <EricDalquist> that seems reasonable
[15:46:29 CDT(-0500)] <EricDalquist> https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/portlet/rendering/worker/PortletWorkerExecutionQueue.java
[15:46:49 CDT(-0500)] <EricDalquist> QualityOfServiceBlockingQueue handles all the real work of maintaining a thread safe queue of queues
[15:46:58 CDT(-0500)] <drewwills> roger
[15:47:21 CDT(-0500)] <EricDalquist> PortletWorkerExecutionQueue essentially has the work of:
[15:47:21 CDT(-0500)] <EricDalquist> 1. getElementKey - determines the key for the queue the worker gets placed into
[15:47:23 CDT(-0500)] <drewwills> so if 1 portlet (by fname) runs out of threads, it doesn't affect the others, in essence?
[15:47:36 CDT(-0500)] <EricDalquist> 2. getNextElementKey - get the next queue name to grab a worker from
[15:47:59 CDT(-0500)] <EricDalquist> so lets say right now you have a queue that has had the following portlets offered to it:
[15:48:20 CDT(-0500)] <EricDalquist> proxy, news, alerts, news, proxy, news, news
[15:48:30 CDT(-0500)] <EricDalquist> you would have 3 internal queues
[15:48:44 CDT(-0500)] <drewwills> oh, and one more thing: if the portlet starts behaving, and the number of "errant" threads come beack below the threshold, it will start getting threads again
[15:49:07 CDT(-0500)] <EricDalquist> proxy (2 entries), news (4 entries), alerts (1 entry)
[15:49:13 CDT(-0500)] <drewwills> roger
[15:49:39 CDT(-0500)] <EricDalquist> the PortletWorkerExecutionQueue gets a list of all internal queue names
[15:49:59 CDT(-0500)] <EricDalquist> and when getNextElementKey it finds the next internal queue with a worker in it
[15:50:20 CDT(-0500)] <EricDalquist> and when it gets to the end of the queue name list it refreshes the list and starts again
[15:50:26 CDT(-0500)] <EricDalquist> so you can still have thread starvation here
[15:51:06 CDT(-0500)] <drewwills> so it's a multi-queue setup, but not multi-threadpool?
[15:51:17 CDT(-0500)] <EricDalquist> right
[15:51:21 CDT(-0500)] <EricDalquist> there is one pool
[15:51:28 CDT(-0500)] <EricDalquist> with one "meta queue" that feeds it
[15:51:41 CDT(-0500)] <EricDalquist> and that meta queue needs does some very basic 'balancing' right now
[15:52:08 CDT(-0500)] <EricDalquist> what needs to happen is the logic in PortletWorkerExecutionQueue.getNextElementKey needs to get smarter
[15:52:24 CDT(-0500)] <drewwills> what if we did something similar... detect when an fname has n number of rendering threads allocated+overdue by 2x or more... then log and refuse to allocate more
[15:52:54 CDT(-0500)] <EricDalquist> yup
[15:53:13 CDT(-0500)] <EricDalquist> so the best way to do that is to add some share instrumentation between
[15:53:14 CDT(-0500)] <EricDalquist> https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/portlet/rendering/worker/PortletExecutionWorker.java
[15:53:14 CDT(-0500)] <EricDalquist> and
[15:53:14 CDT(-0500)] <EricDalquist> https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/portlet/rendering/worker/PortletWorkerExecutionQueue.java
[15:53:33 CDT(-0500)] <EricDalquist> PortletExecutionWorker is the base class for the four portlet execution types (render, action, resource, event)
[15:53:56 CDT(-0500)] <EricDalquist> it handles all the logic around knowing when a worker is actually running
[15:54:00 CDT(-0500)] <EricDalquist> if it has timed out
[15:54:01 CDT(-0500)] <EricDalquist> etc
[15:54:10 CDT(-0500)] <EricDalquist> hrm
[15:54:12 CDT(-0500)] <EricDalquist> actually
[15:54:12 CDT(-0500)] <drewwills> one pool for all execution types, right?
[15:54:39 CDT(-0500)] <EricDalquist> yes
[15:54:47 CDT(-0500)] <EricDalquist> portlet workers in uP4 have an interceptor API
[15:54:49 CDT(-0500)] <EricDalquist> https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/resources/properties/contexts/portletContainerContext.xml#L43
[15:55:29 CDT(-0500)] <EricDalquist> here is an example: https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/portlet/rendering/worker/ThreadNamingPortletExecutionInterceptorAdaptor.java
[15:55:52 CDT(-0500)] <EricDalquist> so that lets you write a little class that gets called preSubmit, preExec, and postExec
[15:56:05 CDT(-0500)] <EricDalquist> so you could write a portlet execution interceptor
[15:56:20 CDT(-0500)] <EricDalquist> that counts the number of portlets of each fname in the pool
[15:56:25 CDT(-0500)] <drewwills> one that would keep a count
[15:56:26 CDT(-0500)] <drewwills> ?
[15:56:32 CDT(-0500)] <drewwills> yes, exactly
[15:56:40 CDT(-0500)] <EricDalquist> and uses that info to influence sub-queue skipping in PortletWorkerExecutionQueue
[15:56:51 CDT(-0500)] <drewwills> terrific, that's great
[15:57:13 CDT(-0500)] <drewwills> I have some bandwidth to take this on under coop – sound good?
[15:58:00 CDT(-0500)] <EricDalquist> great
[15:58:16 CDT(-0500)] <EricDalquist> create a feature-branch off master if you would
[15:58:25 CDT(-0500)] <drewwills> sure deal
[15:58:38 CDT(-0500)] <EricDalquist> then if you push that to your fork we can collaborate on code better
[15:58:56 CDT(-0500)] <drewwills> wait – feature branch in Jasig/uPortal? not a ticket branch in my personal?
[15:59:08 CDT(-0500)] <EricDalquist> oh either works actually
[15:59:44 CDT(-0500)] <EricDalquist> so the other place to consider looking is https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/portlet/rendering/worker/PortletExecutionWorker.java#L244
[15:59:54 CDT(-0500)] <EricDalquist> that is the handler that gets called every time a portlet times out
[16:00:44 CDT(-0500)] <drewwills> does postExecution() get called even if the tread times out, is killed, errors out, etc?
[16:00:56 CDT(-0500)] <EricDalquist> I believe so
[16:01:01 CDT(-0500)] <EricDalquist> it gets called in a finally block
[16:01:16 CDT(-0500)] <EricDalquist> so anything short of the thread dieing should result in execution
[16:01:42 CDT(-0500)] <drewwills> roger
[16:03:00 CDT(-0500)] <EricDalquist> another thing you might want to look at in all of this is PortletExecutionManager.checkWorkerCompletion
[16:03:10 CDT(-0500)] <EricDalquist> that is called after each request the portal handles
[16:03:24 CDT(-0500)] <EricDalquist> the portal keeps track of all workers started during a request as a request attribute
[16:03:36 CDT(-0500)] <EricDalquist> that goes through and makes sure each worker has completed
[16:04:33 CDT(-0500)] <EricDalquist> if it finds a worker that hasn't completed it tries to cancel the worker (eventually Thread.interrupt is called) and adds the worker to a singleton queue
[16:05:00 CDT(-0500)] <EricDalquist> there is a background process that runs every 200ms which goes through that queue and calls interrupt on each worker until the worker eventually stops
[16:05:50 CDT(-0500)] <athena> EricDalquist: i think i've found the issue with the layout.json document. that URL wasn't behind the URL canonicalizing filter until this commit: https://github.com/Jasig/uPortal/commit/53d54bc66ea808414614d62a2a52175a3b167da0
[16:06:07 CDT(-0500)] <athena> er, wrong link - https://github.com/Jasig/uPortal/commit/6e7396b547a32ea2ecdf301e0f03de6db85c675d
[16:06:20 CDT(-0500)] <EricDalquist> ah
[16:06:29 CDT(-0500)] <EricDalquist> right that url shouldn't get canonicalized
[16:06:30 CDT(-0500)] <drewwills> i see... so if we had another process – or perhaps the same process – count the ones that won't die by fname, it could trigger the no-new-threads valve
[16:06:38 CDT(-0500)] <EricDalquist> yup
[16:06:47 CDT(-0500)] <EricDalquist> so drewwills there is no right answer here
[16:06:54 CDT(-0500)] <EricDalquist> but there are a lot of places we can hook into portlet execution
[16:06:58 CDT(-0500)] <athena> i can just edit that line out as long as that wn't cause any other issues
[16:06:59 CDT(-0500)] <drewwills> or perhaps there are several
[16:07:06 CDT(-0500)] <EricDalquist> athena: yes
[16:07:21 CDT(-0500)] <athena> cool.
[16:07:37 CDT(-0500)] <EricDalquist> that was an oversight
[16:07:39 CDT(-0500)] <EricDalquist> drewwills: yes
[16:07:48 CDT(-0500)] <EricDalquist> and probably several metrics we could eventually watch
[16:07:49 CDT(-0500)] <athena> no worries
[16:08:00 CDT(-0500)] <EricDalquist> # of workers that won't die per fname
[16:08:09 CDT(-0500)] <EricDalquist> # of workers that timeout per fname
[16:08:15 CDT(-0500)] <EricDalquist> # of workers in the pool per fname
[16:08:22 CDT(-0500)] <EricDalquist> though that last one may not be a good once
[16:08:40 CDT(-0500)] <EricDalquist> since you could have a portal where 50% of your work is one fname
[16:08:43 CDT(-0500)] <EricDalquist> and that is ok
[16:10:14 CDT(-0500)] <EricDalquist> I'll be interest to see what you come up with
[16:10:29 CDT(-0500)] <EricDalquist> the other piece that may be missing is a way to reject new workers from the queue
[16:10:50 CDT(-0500)] <EricDalquist> potentially adding another abstract method to QualityOfServiceBlockingQueue
[16:11:05 CDT(-0500)] <EricDalquist> which would allow the subclass to reject a newly offered worker
[16:11:31 CDT(-0500)] <EricDalquist> otherwise if portlet foo gets hung up and never recovers we could be in a situation where foo's queue just keeps growing
[16:21:25 CDT(-0500)] <drewwills> sorry, got distracted – yes, rl experience it would be better simply not to accept workers for portlets that have gone rogue... since they normally do so for 10 min or more
[16:21:37 CDT(-0500)] <EricDalquist> yup
[16:21:39 CDT(-0500)] <drewwills> "rl experience suggests"*
[16:21:56 CDT(-0500)] <drewwills> ok, i'm excited to see what I come up with too
[16:21:58 CDT(-0500)] <EricDalquist> so that would be an additional enhancement to that QualityOfServiceBlockingQueue and the portlet worker subclass
[16:22:00 CDT(-0500)] <EricDalquist>
[16:22:30 CDT(-0500)] <EricDalquist> ok drewwills all my crn enhancements have been committed
[16:22:40 CDT(-0500)] <EricDalquist> more stuff I found working on exporting layouts from 3.2
[16:22:49 CDT(-0500)] <EricDalquist> also found two tables in dire need of indexes