[05:07:16 EST(-0500)] * mad (n=chatzill@pcit-8752.HIG.SE) has joined ##uportal
[09:48:37 EST(-0500)] * athena7 (n=athena7@adsl-99-130-147-23.dsl.wlfrct.sbcglobal.net) has joined ##uportal
[09:53:22 EST(-0500)] * EricDalquist (n=dalquist@bohemia.doit.wisc.edu) has joined ##uportal
[10:16:26 EST(-0500)] * lennard1 (n=sparhk@ip68-98-56-21.ph.ph.cox.net) has left ##uportal
[10:31:23 EST(-0500)] * lennard1 (n=sparhk@wsip-98-174-242-39.ph.ph.cox.net) has joined ##uportal
[10:31:40 EST(-0500)] * holdorph (n=holdorph@wsip-98-174-242-39.ph.ph.cox.net) has joined ##uportal
[10:45:19 EST(-0500)] * holdorph (n=holdorph@wsip-98-174-242-39.ph.ph.cox.net) has joined ##uportal
[10:47:26 EST(-0500)] <lennard1> hey... pearson is starting to lean towards some sort of a reporting portlet again. Something that would allow them to monitor usage of the portal.
[10:47:50 EST(-0500)] <EricDalquist> well
[10:47:56 EST(-0500)] <EricDalquist> 3.1 will have database stats logging
[10:47:57 EST(-0500)] <lennard1> any chance something like that already exists... and isn't a guaranteed performance killer?
[10:48:09 EST(-0500)] <EricDalquist> and I'm just finishing up a spring batch tool to aggregate that data
[10:48:17 EST(-0500)] <EricDalquist> all we're missing is reporting off of those tables
[10:48:42 EST(-0500)] <lennard1> what is recorded?
[10:49:57 EST(-0500)] <EricDalquist> so the portal records a whole bunch of stuff
[10:50:11 EST(-0500)] <EricDalquist> you can look at the type hierarchy for PortalEvent
[10:50:18 EST(-0500)] <EricDalquist> our aggregator does:
[10:51:07 EST(-0500)] <EricDalquist> channel render count, average render time, max render time, action count, avg action time, max action time, targeted count, rendered from cache count
[10:51:15 EST(-0500)] <EricDalquist> tab render count, avg render time, max render time
[10:51:24 EST(-0500)] <EricDalquist> concurrent users
[10:51:31 EST(-0500)] <EricDalquist> login frequency
[10:51:40 EST(-0500)] <EricDalquist> total and unique logins
[10:52:07 EST(-0500)] <EricDalquist> each of those can be aggregated at any number of intervals (minute, 5minute, hour, day, week, month, quarter, academic term, year)
[10:52:18 EST(-0500)] <EricDalquist> and each of those is tracked globally and on a per-group basis
[10:52:32 EST(-0500)] <EricDalquist> which groups are tracked is also configurable
[10:52:40 EST(-0500)] <EricDalquist> for a reference on #s
[10:53:00 EST(-0500)] <lennard1> the portal keeps track of the numbers and then periodically writes the data to the db?
[10:53:16 EST(-0500)] <EricDalquist> so the portal itself just writes 'raw' stats data to a database
[10:53:30 EST(-0500)] <lennard1> how often?
[10:53:36 EST(-0500)] <EricDalquist> so one row per 'event' such as login, logout, channel render, tab render, etc
[10:53:42 EST(-0500)] <EricDalquist> the code uses JPA/Hib
[10:53:56 EST(-0500)] <EricDalquist> and batches those raw stats to the DB once per second
[10:53:59 EST(-0500)] <EricDalquist> then
[10:54:04 EST(-0500)] * lennard1 now has perf concerns... but can understand why that decision was made.
[10:54:18 EST(-0500)] <EricDalquist> there is almost no perf overhead
[10:54:30 EST(-0500)] <EricDalquist> the stats framework in the portal has its own thread pool
[10:54:53 EST(-0500)] <lennard1> ok... and that thread handles writing the data.
[10:54:53 EST(-0500)] <EricDalquist> so the 3.0 framework code always generates all stats
[10:55:03 EST(-0500)] <EricDalquist> what this DB version does is use a concurrent queue
[10:55:09 EST(-0500)] <EricDalquist> portal threads write to that
[10:55:24 EST(-0500)] <EricDalquist> then a background thread fires every second and writes out all queued events
[10:55:32 EST(-0500)] <EricDalquist> the portal NEVER waits for stats
[10:55:42 EST(-0500)] <EricDalquist> the whole stats storing process can fail horribly
[10:55:46 EST(-0500)] <EricDalquist> and the portal will keep on chugging
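(Editor's sketch, not part of the conversation.) The queue-and-flush design described above can be illustrated roughly as follows. This is a minimal Python sketch, not uPortal's actual Java/JPA code: the class `StatsRecorder`, its methods, and the `write_batch` callback are hypothetical names chosen for illustration. It assumes an unbounded in-memory queue and that a failed write simply drops the batch.

```python
import queue
import threading


class StatsRecorder:
    """Hypothetical stand-in for the queue-based stats writer described above."""

    def __init__(self, write_batch, interval_seconds=1.0):
        self._queue = queue.SimpleQueue()  # unbounded, thread-safe queue
        self._write_batch = write_batch    # e.g. a function doing one batched DB insert
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = None

    def record(self, event):
        # Called by request threads: enqueue and return immediately.
        self._queue.put(event)

    def flush_once(self):
        # Drain everything queued so far and hand it to the writer as one batch.
        batch = []
        while True:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return 0
        try:
            self._write_batch(batch)
        except Exception:
            # A failed write drops the batch; it never propagates to callers.
            return 0
        return len(batch)

    def start(self):
        def loop():
            # Event.wait returns False on timeout and True once stop() is called.
            while not self._stop.wait(self._interval):
                self.flush_once()
            self.flush_once()  # final drain on shutdown

        self._thread = threading.Thread(target=loop, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

Request threads only ever call `record`, so their cost is a single queue insert. Everything slow or failure-prone happens in `flush_once` on the background thread, which is why a broken stats store would not stall page rendering in a design like this.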
[10:56:00 EST(-0500)] <EricDalquist> then we have an external aggregation process
[10:56:05 EST(-0500)] <EricDalquist> just finishing this part up
[10:56:07 EST(-0500)] <EricDalquist> uses spring batch
[10:56:15 EST(-0500)] <EricDalquist> reads in data from the raw stats tables
[10:56:18 EST(-0500)] <EricDalquist> and generates aggregates
[10:57:12 EST(-0500)] <EricDalquist> if you're tracking aggregates at a per-minute level 2mil raw stats rows generates about 600k aggregate rows (channel request aggr being 90% of that)
[10:57:19 EST(-0500)] <EricDalquist> we're aggregating at the 5 minute level
[10:57:46 EST(-0500)] <EricDalquist> and look to be translating about 2mil raw stats rows into about 180k rows of aggregates
[10:58:03 EST(-0500)] <EricDalquist> or about 20MB of table/index data per day in our aggregates
[10:58:15 EST(-0500)] <EricDalquist> and we keep about 2 weeks of raw stats which we can use to debug problems if needed
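(Editor's sketch, not part of the conversation.) To make the interval trade-off concrete, here is a small Python illustration of rolling raw channel-render events up into fixed time windows. It is not the Spring Batch aggregator discussed above: the function name `aggregate_renders` and the `(timestamp, channel, render_ms)` row layout are assumptions made for the example.

```python
from collections import defaultdict


def aggregate_renders(events, interval_seconds=300):
    """Group (timestamp, channel, render_ms) rows into fixed windows.

    Returns {(window_start, channel): {"count", "avg_ms", "max_ms"}}.
    """
    buckets = defaultdict(list)
    for timestamp, channel, render_ms in events:
        # Round the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % interval_seconds)
        buckets[(window_start, channel)].append(render_ms)
    return {
        key: {
            "count": len(times),
            "avg_ms": sum(times) / len(times),
            "max_ms": max(times),
        }
        for key, times in buckets.items()
    }


# Four raw rows; timestamps are seconds since an arbitrary start.
raw = [(0, "news", 10), (60, "news", 30), (299, "mail", 5), (300, "news", 20)]
five_minute = aggregate_renders(raw, interval_seconds=300)
one_minute = aggregate_renders(raw, interval_seconds=60)
```

With 5-minute windows the two early "news" renders collapse into one aggregate row, while 1-minute windows keep them apart. That is the same effect, at toy scale, as the difference between the per-minute and 5-minute aggregate row counts quoted above.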
[10:58:16 EST(-0500)] <lennard1> just thinking about how that will scale for a user like pearson...
[10:58:30 EST(-0500)] <EricDalquist> yeah
[10:58:35 EST(-0500)] <EricDalquist> it depends on what they want to track
[10:58:50 EST(-0500)] <lennard1> nearing 6 million users and growing by an insane percentage every year.
[10:58:55 EST(-0500)] <EricDalquist> if you disable per-channel raw event logging that would drop the majority of the events
[10:59:02 EST(-0500)] <EricDalquist> how many concurrent users?
[10:59:16 EST(-0500)] <EricDalquist> and honestly, this is probably your best bet
[10:59:22 EST(-0500)] <EricDalquist> not sure what else you could do
[10:59:29 EST(-0500)] <lennard1> that is the kicker... I happen to think many students just create a new account rather than use their old one.
[10:59:31 EST(-0500)] <EricDalquist> the reality is for large installs ... stats is A LOT of data
[10:59:38 EST(-0500)] <lennard1> peak... probably 150k concurrent users.
[10:59:50 EST(-0500)] <EricDalquist> over what time range?
[10:59:55 EST(-0500)] <EricDalquist> is that 150k in a 5 minute window?
[11:00:33 EST(-0500)] <lennard1> well... for specifics have to look at a report I can't lay my hand on right now.
[11:00:53 EST(-0500)] <EricDalquist> ok
[11:00:55 EST(-0500)] <lennard1> 150k active sessions would be the safe bet.
[11:01:00 EST(-0500)] <EricDalquist> wow
[11:01:15 EST(-0500)] <lennard1> as to how active they are over a given minute or so... that varies.
[11:01:25 EST(-0500)] <lennard1> you can see why perf is a concern :)
[11:02:11 EST(-0500)] <EricDalquist> yeah we see between 500 & 800 concurrent users in a 5 minute window (defined as a user having generated a stats event in that window)
[11:02:30 EST(-0500)] <EricDalquist> so the raw stats storage code is in trunk if you want to take a look
[11:02:54 EST(-0500)] <EricDalquist> we are planning on having this aggregation tool out by the end of the year but could put it out there earlier if folks are really interested
[11:03:00 EST(-0500)] <EricDalquist> we're still missing reporting tools though
[11:03:35 EST(-0500)] <lennard1> right now they are talking about tracking the 'community' functionality we have in the portal.
[11:03:51 EST(-0500)] <lennard1> that is only used by a much smaller subset of users (instructors only)
[11:04:13 EST(-0500)] <EricDalquist> so right now on the portal side you can filter stats by event type
[11:04:14 EST(-0500)] <lennard1> Pearson might be 'really interested'
[11:04:19 EST(-0500)] <lennard1> am in a meeting to find out now
[11:04:26 EST(-0500)] <EricDalquist> but it wouldn't be a stretch to add additional filtering options
[11:04:37 EST(-0500)] <EricDalquist> like only write out events that are from a specific group
[11:05:07 EST(-0500)] * lennard1 nods
[11:06:15 EST(-0500)] <EricDalquist> some other random #s on this stuff
[11:06:29 EST(-0500)] <EricDalquist> with that user load we generate about 50 events/second peak
[11:06:46 EST(-0500)] <EricDalquist> our current aggregator code looks like it can consistently process about 500 events/second
[11:07:00 EST(-0500)] <EricDalquist> database space & management is a BIG part of all of this
[11:07:21 EST(-0500)] <EricDalquist> and the aggregator code right now is oracle specific
[11:08:36 EST(-0500)] <EricDalquist> one thing I know some schools are using is google analytics
[11:08:54 EST(-0500)] <EricDalquist> which can provide a lot of the page/browser/remote host type metrics
[11:10:26 EST(-0500)] <EricDalquist> we would love to be able to collaborate with someone on reporting tools though
[11:41:19 EST(-0500)] <lennard1> do you track average length of sessions?
[11:41:28 EST(-0500)] <EricDalquist> no
[11:41:35 EST(-0500)] <EricDalquist> well not yet
[11:41:42 EST(-0500)] <EricDalquist> it may get added to our aggregates eventually
[11:42:01 EST(-0500)] <EricDalquist> oh and some updated numbers for processing
[11:42:22 EST(-0500)] <EricDalquist> doing 5 minute intervals as our smallest instead of 1 minute we're processing around 1000 events/second
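(Editor's sketch, not part of the conversation.) A quick back-of-the-envelope check of the throughput figures quoted in this log, written as Python. The numbers (2 million raw rows, 50 events/second peak generation, 500 and 1000 events/second aggregator throughput) come from the discussion; the helper names are made up for the example, and the calculation assumes the quoted rates hold steadily.

```python
def seconds_to_process(rows, events_per_second):
    """Time needed to aggregate `rows` raw events at a steady rate."""
    return rows / events_per_second


def headroom(aggregator_rate, peak_generation_rate):
    """How many times faster the aggregator is than peak event generation."""
    return aggregator_rate / peak_generation_rate


RAW_ROWS = 2_000_000  # raw stats rows, as quoted in the log
PEAK_RATE = 50        # events/second generated at peak, as quoted

for rate in (500, 1000):  # aggregator throughput at 1-minute vs 5-minute smallest interval
    minutes = seconds_to_process(RAW_ROWS, rate) / 60
    print(f"{rate} events/s: {minutes:.1f} min for {RAW_ROWS} rows, "
          f"{headroom(rate, PEAK_RATE):.0f}x peak generation")
```

On these figures the aggregator runs 10 to 20 times faster than events are produced at peak, so it can keep up with the load described here. A site with far more concurrent users would need to redo the arithmetic with its own event rate.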
