[11:27:20 CST(-0600)] <jwennmacher1> EricDalquist: Good morning. Drew suggested I contact you about working on some statistics reporting.

[11:28:25 CST(-0600)] <EricDalquist> hi

[11:28:27 CST(-0600)] <EricDalquist> awesome

[11:29:21 CST(-0600)] <jwennmacher1> I am at a very early stage; I was just glancing at the statistics that are being collected right now. Initial observation is there are two sets currently collected not being reported on. Portlet/folder added/deleted/removed from layout. I was thinking maybe of starting with one of those (probably portlet). Thoughts?

[11:30:10 CST(-0600)] <EricDalquist> well, let me do a little overview of the whole stats/aggregation/reporting system

[11:30:51 CST(-0600)] <EricDalquist> opening up uPortal source just a minute ....

[11:33:09 CST(-0600)] <EricDalquist> so forgive me if you're familiar with parts of this already but I figure a full picture is good

[11:33:32 CST(-0600)] <EricDalquist> uportal uses an extension of the spring application context event apis

[11:33:38 CST(-0600)] <jwennmacher1> yep

[11:33:56 CST(-0600)] <EricDalquist> one of the event handlers sticks the events onto a concurrent queue

[11:34:08 CST(-0600)] <EricDalquist> and a background thread periodically flushes them out to the db via JpaPortalEventStore

[11:34:16 CST(-0600)] <EricDalquist> these are what we call raw events

[11:34:21 CST(-0600)] <EricDalquist> and are actually stored as JSON CLOBs
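
The queue-and-flush pattern described above can be sketched roughly as follows. This is a minimal illustration, not uPortal's actual code: the class name `BufferedEventWriter`, the use of plain JSON strings for events, and the `Consumer` standing in for the event store are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

/** Hypothetical sketch: buffer events on a concurrent queue and write them out in batches. */
final class BufferedEventWriter {
    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    private final Consumer<List<String>> store; // stand-in for the real event store (assumption)

    BufferedEventWriter(Consumer<List<String>> store) {
        this.store = store;
    }

    /** Called by the event handler on request threads; only enqueues, never touches the db. */
    void onEvent(String eventJson) {
        queue.add(eventJson);
    }

    /** Called periodically by a background thread; returns the number of events written. */
    int flush() {
        List<String> batch = new ArrayList<>();
        String event;
        while ((event = queue.poll()) != null) {
            batch.add(event);
        }
        if (!batch.isEmpty()) {
            store.accept(batch);
        }
        return batch.size();
    }
}
```

In a running system `flush()` would typically be driven by something like `ScheduledExecutorService.scheduleWithFixedDelay`, so request threads never wait on the database.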

[11:35:32 CST(-0600)] <EricDalquist> then there is a background process that runs on one machine in the cluster that periodically aggregates that raw event data into some form that is easier to report on / process

[11:35:46 CST(-0600)] <EricDalquist> PortalEventProcessingManagerImpl is essentially the entry point for all of that logic

[11:36:45 CST(-0600)] <EricDalquist> that uses all instances of IPortalEventAggregator it finds in the app context, so we have sort of a pluggable api for doing this aggregation work

[11:37:26 CST(-0600)] <EricDalquist> right now we have aggregators that track: concurrent users, logins (unique & total), tab renders, & portlet executions

[11:37:40 CST(-0600)] <EricDalquist> this code is really finicky to write

[11:37:45 CST(-0600)] <EricDalquist> and VERY performance sensitive

[11:38:01 CST(-0600)] <EricDalquist> since you need to make sure the aggregator can handle processing the data faster than it is created

[11:39:08 CST(-0600)] <EricDalquist> for example looking at our logs at UW right now

[11:39:13 CST(-0600)] <EricDalquist> the aggregator is falling behind a bit

[11:39:20 CST(-0600)] <EricDalquist> Aggregated 10000 events created at 16.4745 events/second between 2012-12-11T09:54:51.359-06:00 and 2012-12-11T10:04:59.202-06:00 in 1108885ms - 9.0181 e/s a 0.5474x speedup.
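
The last two figures in that log line can be checked by hand: 10000 events in 1108885 ms is about 9.0181 events/second, and dividing that by the reported creation rate of 16.4745 events/second gives about 0.5474. A ratio below 1.0 means events are created faster than they are aggregated. A small sketch of the arithmetic, with hypothetical class and method names, and with the creation rate taken from the log line as given:

```java
/** Hypothetical helper reproducing the arithmetic behind the aggregation log line. */
final class AggregationRate {
    /** Events processed per second, given an event count and elapsed milliseconds. */
    static double eventsPerSecond(long events, long elapsedMillis) {
        return events * 1000.0 / elapsedMillis;
    }

    /** Processing rate divided by creation rate; below 1.0 means the aggregator is falling behind. */
    static double speedup(double processedPerSecond, double createdPerSecond) {
        return processedPerSecond / createdPerSecond;
    }

    public static void main(String[] args) {
        double processed = eventsPerSecond(10_000, 1_108_885);
        System.out.printf("%.4f e/s, %.4fx speedup%n", processed, speedup(processed, 16.4745));
    }
}
```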

[11:39:27 CST(-0600)] <EricDalquist> but we track A LOT of data

[11:39:41 CST(-0600)] <EricDalquist> and this seems to happen as our DB index statistics slowly get out of data

[11:39:46 CST(-0600)] <EricDalquist> out of date*

[11:39:54 CST(-0600)] <EricDalquist> so just something to keep in mind

[11:40:44 CST(-0600)] <EricDalquist> then there is the reporting piece

[11:40:59 CST(-0600)] <EricDalquist> we currently have two of those, LoginTotalsStatisticsController and ConcurrentUsersStatisticsController

[11:41:15 CST(-0600)] <EricDalquist> these are what let us get nice graphs/reports out of the aggregated data

[11:41:24 CST(-0600)] <EricDalquist> ok … does that all make sense?

[11:42:05 CST(-0600)] <jwennmacher1> yes. Good to know, especially the performance requirement

[11:43:13 CST(-0600)] <EricDalquist> so for a place to start

[11:43:28 CST(-0600)] <EricDalquist> I think what may actually be the best spot are the reporting portlets

[11:43:46 CST(-0600)] <EricDalquist> we are currently collecting a ton of data about tab renders and portlet executions

[11:44:01 CST(-0600)] <jwennmacher1> yes. I see there are aggregators already written for portlet execution and tab mapping. I haven't checked to see if they are used yet. Would these be good candidates to consider since some of the foundation work appears to be present? I'm still somewhat of a newbie; I've done a bit of portlet work but not uPortal yet. I only have a few days to contribute before I'm off to another project for a while.

[11:44:27 CST(-0600)] <jwennmacher1> Reporting portlets are what Drew and I discussed.

[11:45:06 CST(-0600)] <EricDalquist> yeah these would be the best place to start

[11:45:14 CST(-0600)] <EricDalquist> I'd probably start with tabs first

[11:45:20 CST(-0600)] <EricDalquist> as they have the simpler of the two data models

[11:45:32 CST(-0600)] <jwennmacher1> Sounds good.

[11:46:05 CST(-0600)] <jwennmacher1> Have the aggregators had adequate performance testing or will I need to be concerned about that?

[11:46:25 CST(-0600)] <EricDalquist> yeah they have had a lot of performance testing

[11:46:28 CST(-0600)] <EricDalquist> many hours with a profile

[11:46:32 CST(-0600)] <EricDalquist> profiler*

[11:46:42 CST(-0600)] <EricDalquist> what I would recommend is starting with LoginTotalsStatisticsController

[11:46:45 CST(-0600)] <EricDalquist> copying that

[11:47:03 CST(-0600)] <EricDalquist> and reworking it to work against the TabRenderAggregationDao

[11:47:45 CST(-0600)] <EricDalquist> so the tab renders have one more "dimension" than logins to

[11:48:06 CST(-0600)] <EricDalquist> logs have: date&time & group

[11:48:17 CST(-0600)] <EricDalquist> logins have* (sorry for all the typos this morning)

[11:48:31 CST(-0600)] <EricDalquist> tab renders have: date&time, group & tab name

[11:48:38 CST(-0600)] <EricDalquist> so that is a little bit of added complexity

[11:49:20 CST(-0600)] <EricDalquist> ConcurrentUsersStatisticsController and LoginTotalsStatisticsController are good examples to get you started though

[11:49:47 CST(-0600)] <EricDalquist> the reporting portlet should just "auto detect" any other controllers that implement BaseStatisticsReportController

[11:49:52 CST(-0600)] <EricDalquist> and show it in the report list

[11:50:13 CST(-0600)] <EricDalquist> so just a copy and paste of LoginTotalsStatisticsController and then reworking for tab renders will be a good first step

[11:50:42 CST(-0600)] <EricDalquist> once you get that working and are more comfortable we can talk about additional report uis

[11:52:20 CST(-0600)] <EricDalquist> since tab renders track render count and then a bunch of data about the render time: sum of squares, population variance, geometric mean, sum of logs, mean, variance, standard deviation, max, min, and sum

[11:52:26 CST(-0600)] <EricDalquist> so lots of time to render data
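
Most of the render-time statistics listed above can be derived from a handful of running totals. The sketch below is only an illustration under assumptions: the class name `RenderTimeStats` is hypothetical, "variance" is assumed to mean the sample (n-1) form since the chat does not say, render times are assumed to be positive, and the naive sum-of-squares formula is used for brevity even though it is less numerically stable than an incremental method.

```java
/** Hypothetical accumulator for render-time statistics built from running totals. */
final class RenderTimeStats {
    private long count;
    private double sum;
    private double sumOfSquares;
    private double sumOfLogs;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Records one render time in milliseconds (assumed > 0 so the log is defined). */
    void add(double millis) {
        count++;
        sum += millis;
        sumOfSquares += millis * millis;
        sumOfLogs += Math.log(millis);
        min = Math.min(min, millis);
        max = Math.max(max, millis);
    }

    long count() { return count; }
    double sum() { return sum; }
    double min() { return min; }
    double max() { return max; }
    double mean() { return sum / count; }

    /** Variance over the whole population: E[x^2] - (E[x])^2. */
    double populationVariance() {
        double m = mean();
        return sumOfSquares / count - m * m;
    }

    /** Sample variance with the n-1 correction (assumed meaning of "variance"). */
    double variance() {
        double m = mean();
        return (sumOfSquares - count * m * m) / (count - 1);
    }

    double standardDeviation() { return Math.sqrt(variance()); }

    /** Geometric mean recovered from the sum of logs. */
    double geometricMean() { return Math.exp(sumOfLogs / count); }
}
```

Because every value is derived from totals, two intervals can be combined by adding their totals, which is one reason an aggregator might store sums rather than finished averages.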

[11:52:43 CST(-0600)] <EricDalquist> which could turn into some interesting reports

[11:52:46 CST(-0600)] <EricDalquist> even non-graph reports

[11:53:05 CST(-0600)] <EricDalquist> like for the portlet execution side (which tracks the same timing data) we'd love to have a "slowest portlet" report

[11:53:21 CST(-0600)] <EricDalquist> like I can log into the portal and see which portlets are taking the longest to render over the last 5 minutes

[11:53:44 CST(-0600)] <EricDalquist> ok … I think I'm done with my wall of text

[11:53:50 CST(-0600)] <EricDalquist> I'll be around all day/week

[11:54:03 CST(-0600)] <EricDalquist> so just poke me if you have questions or even want to chat about report ideas

[11:57:11 CST(-0600)] <jwennmacher1> Thanks. good idea on slowest portlet. For tabs what are the 'groups' you mentioned as another dimension? It's same as normal groups (everyone, students, etc.)?

[11:57:47 CST(-0600)] <EricDalquist> yes, but to insulate from portal config changes the event aggregation has its own group, tab and portlet lookup tables

[11:58:23 CST(-0600)] <EricDalquist> AggregatedGroupLookupDao, AggregatedTabLookupDao, AggregatedPortletLookupDao

[11:58:40 CST(-0600)] <EricDalquist> these capture the group/tab/portlet data from the primary uPortal daos the first time it is seen

[11:58:48 CST(-0600)] <EricDalquist> and the stats data actually refers to these

[11:58:54 CST(-0600)] <EricDalquist> this is so that if, say, a tab or portlet is deleted

[11:58:59 CST(-0600)] <EricDalquist> you don't lose the stats data about it

[11:59:18 CST(-0600)] <EricDalquist> note that you may well run into areas where there are missing APIs

[11:59:30 CST(-0600)] <EricDalquist> like no way to get a list of all the tabs in the lookup dao

[11:59:38 CST(-0600)] <EricDalquist> this is simply due to nothing needing that api yet

[11:59:49 CST(-0600)] <EricDalquist> so you or I will need to add those APIs when you find the holes
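
The idea behind the lookup tables, capturing a name the first time it is seen and pointing the stats at a stable id from then on, can be sketched as below. This is an in-memory illustration only: the class `AggregatedNameLookup` and its methods are hypothetical and are not the API of AggregatedTabLookupDao or the other real DAOs, which are database-backed.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory stand-in for a stats-side lookup table. */
final class AggregatedNameLookup {
    private final Map<String, Long> idsByName = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    /** Returns the stable id for a name, creating one the first time the name is seen. */
    long idFor(String name) {
        return idsByName.computeIfAbsent(name, n -> nextId.getAndIncrement());
    }

    /**
     * Every name seen so far, including ones since deleted from the live portal config.
     * This is the kind of "list everything" call the chat says may still be missing.
     */
    Set<String> knownNames() {
        return new TreeSet<>(idsByName.keySet());
    }
}
```

Since entries are never removed, aggregated rows that reference an id stay meaningful after the original tab or portlet is gone.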

[12:01:22 CST(-0600)] <jwennmacher1> Ahh I see what you mean about insulating. Gotcha. Thanks for the overview. That helps me quite a bit.

[12:01:44 CST(-0600)] <jwennmacher1> I'm sure I'll have tons of questions as I dig into it (smile)

[12:04:18 CST(-0600)] <EricDalquist> (smile)

[12:06:42 CST(-0600)] <EricDalquist> drewwills: you have a few minutes to talk about person directory?

[12:06:48 CST(-0600)] <EricDalquist> on a more abstract level?

[12:30:34 CST(-0600)] <drewwills> i will EricDalquist, sure

[12:31:04 CST(-0600)] <EricDalquist> so looking at PD on a higher level with various features

[12:31:12 CST(-0600)] <EricDalquist> what do you think of the current sql/ldap query templating

[12:31:28 CST(-0600)] <EricDalquist> where you stick a {0} in where you want the search/restrictions to appear

[13:21:14 CST(-0600)] <drewwills1> sorry EricDalquist... i was on a call

[13:21:26 CST(-0600)] <EricDalquist> no problem

[13:21:59 CST(-0600)] <drewwills1> the issue i run into with that is that sometimes i need more flexibility

[13:22:22 CST(-0600)] <EricDalquist> yeah

[13:22:25 CST(-0600)] <EricDalquist> that is my thought as well

[13:22:29 CST(-0600)] <drewwills1> i may need to "select foo from bar where netId = {username}"

[13:22:29 CST(-0600)] <EricDalquist> I'm not sure what a solution is though

[13:22:58 CST(-0600)] <EricDalquist> since that works great for simple queries but doesn't work for attribute sources that can have a more flexible search done

[13:23:07 CST(-0600)] <drewwills1> one sec...

[13:23:07 CST(-0600)] <EricDalquist> I'm open to all ideas here (smile)

[13:26:27 CST(-0600)] <drewwills1> i've used this approach, and it's attractive in some ways: https://gist.github.com/4261275

[13:26:42 CST(-0600)] <drewwills1> more complex, i guess, but flexible

[13:27:13 CST(-0600)] <EricDalquist> ok

[13:27:31 CST(-0600)] <EricDalquist> so the idea is that attribute source has a fixed set of query attributes

[13:27:47 CST(-0600)] <EricDalquist> you always have to ask for data with username=X

[13:29:04 CST(-0600)] <EricDalquist> so maybe the solution is just a variety of options when configuring attribute sources

[13:29:15 CST(-0600)] <EricDalquist> we have some like this that use named parameters

[13:29:19 CST(-0600)] <drewwills1> that's fine, but you might want to do lower(username) = x

[13:29:50 CST(-0600)] <EricDalquist> oh I was just saying that the incoming PD query would be for username=X

[13:29:57 CST(-0600)] <drewwills1> ah yes

[13:30:00 CST(-0600)] <EricDalquist> and X gets inserted for :username

[13:30:07 CST(-0600)] <drewwills1> yep, totally

[13:30:46 CST(-0600)] <drewwills1> but something like this allows you to "decorate" the SQL where the rubber hits the road around username = x

[13:31:05 CST(-0600)] <EricDalquist> ok

[13:31:20 CST(-0600)] <drewwills1> you might even want to do "where :grad_year > 2014"

[13:31:40 CST(-0600)] <drewwills1> use cases with cascading DAOs
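
The named-parameter style discussed here, where the incoming query supplies username=X and X is bound wherever `:username` appears, can be sketched as follows. The class `NamedQuery` is hypothetical and is neither Person Directory's implementation nor the code in the linked gist. It turns each `:name` into a JDBC `?` placeholder and collects the values in order, so the SQL around the placeholder (for example `lower(...)`) stays entirely under the configurer's control. The simple regex is an assumption for brevity and does not handle colons inside string literals or `::` casts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: rewrite :name placeholders to ? and collect the bound values. */
final class NamedQuery {
    private static final Pattern PARAM = Pattern.compile(":([A-Za-z_][A-Za-z0-9_]*)");

    final String sql;        // template with each :name replaced by ?
    final List<Object> args; // values in placeholder order, ready for PreparedStatement

    private NamedQuery(String sql, List<Object> args) {
        this.sql = sql;
        this.args = args;
    }

    static NamedQuery bind(String template, Map<String, ?> query) {
        Matcher m = PARAM.matcher(template);
        StringBuffer sql = new StringBuffer();
        List<Object> args = new ArrayList<>();
        while (m.find()) {
            String name = m.group(1);
            if (!query.containsKey(name)) {
                throw new IllegalArgumentException("No value supplied for :" + name);
            }
            args.add(query.get(name));
            m.appendReplacement(sql, "?");
        }
        m.appendTail(sql);
        return new NamedQuery(sql.toString(), args);
    }
}
```

Binding values as parameters rather than splicing them into the SQL text also avoids injection problems when the attribute value comes from user input.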

[13:32:26 CST(-0600)] <drewwills1> brb... lunch

[13:33:54 CST(-0600)] <EricDalquist> ok

[14:39:23 CST(-0600)] <EricDalquist> drewwills: you around?

[14:57:48 CST(-0600)] <drewwills> yep

[14:57:55 CST(-0600)] <EricDalquist> so more PD stuff

[14:58:06 CST(-0600)] <EricDalquist> do you ever see the need for more complex person queries?

[14:58:07 CST(-0600)] <drewwills> ok

[14:58:18 CST(-0600)] <EricDalquist> right now we have this terrible API

[14:58:27 CST(-0600)] <drewwills> +1

[14:58:29 CST(-0600)] <EricDalquist> where you give it a map of attribute name>value pairs

[14:58:40 CST(-0600)] <EricDalquist> and each source figures out how it will handle those

[14:58:49 CST(-0600)] <EricDalquist> do we really need something more expressive?

[14:59:15 CST(-0600)] <drewwills> maybe... can you throw out an example?

[14:59:17 CST(-0600)] <EricDalquist> like something where you could do (firstName=Jane && (lastName=Doe || lastName=Smith))

[14:59:33 CST(-0600)] <drewwills> hmmm... possibly

[14:59:53 CST(-0600)] <drewwills> that would support full-featured directory searching, for example

[15:00:04 CST(-0600)] <EricDalquist> note that something like that really only works for attribute sources that support a more free-form query building

[15:00:18 CST(-0600)] <EricDalquist> sources that are using direct named attribute replacement wouldn't really fit here

[15:00:22 CST(-0600)] <EricDalquist> they would either have to be ignored

[15:00:38 CST(-0600)] <EricDalquist> or queried once per result after getting results from the more flexible sources

[15:00:42 CST(-0600)] <drewwills> instead of passing attr/value pairs, you could pass criteria objects

[15:00:49 CST(-0600)] <EricDalquist> right

[15:01:18 CST(-0600)] <EricDalquist> I'd probably model something after: http://static.springsource.org/spring-ldap/site/apidocs/org/springframework/ldap/filter/package-frame.html

[15:01:34 CST(-0600)] <drewwills> looking

[15:02:28 CST(-0600)] <drewwills> sure... not dissimilar to PAGS and DLM evaluators actually

[15:02:35 CST(-0600)] <EricDalquist> yeah

[15:02:41 CST(-0600)] <EricDalquist> it is essentially the LDAP filter syntax

[15:02:42 CST(-0600)] <drewwills> i wonder if we could consolidate even

[15:02:45 CST(-0600)] <EricDalquist> in java object form
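
A criteria-object API along these lines might look like the sketch below, which builds the earlier example (firstName=Jane && (lastName=Doe || lastName=Smith)) and renders it in LDAP filter syntax. All names here (`PersonCriteria`, `Criteria`, `eq`, `and`, `or`) are hypothetical. This is not the Spring LDAP filter API or any existing Person Directory API, just an illustration of the shape being discussed.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical criteria objects that can render themselves as an LDAP filter string. */
final class PersonCriteria {
    interface Criteria {
        String toLdapFilter();
    }

    static Criteria eq(String attribute, String value) {
        return () -> "(" + attribute + "=" + escape(value) + ")";
    }

    static Criteria and(Criteria... parts) {
        return () -> join("&", parts);
    }

    static Criteria or(Criteria... parts) {
        return () -> join("|", parts);
    }

    private static String join(String op, Criteria[] parts) {
        return Arrays.stream(parts)
                .map(Criteria::toLdapFilter)
                .collect(Collectors.joining("", "(" + op, ")"));
    }

    /** Escapes the characters that are special inside an LDAP filter value. */
    private static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

A source that supports free-form queries could translate the same object tree into SQL or another query language, while a source limited to fixed named attributes would need the fallback handling described in the chat.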

[15:02:58 CST(-0600)] <EricDalquist> so all of this is coming from a potential chunk of time I have from UW to do a big refactoring of PD

[15:03:09 CST(-0600)] <EricDalquist> our motivations are around better query support

[15:03:15 CST(-0600)] <EricDalquist> and better concurrent support
