uPortal IRC Logs-2009-06-01

[03:26:16 EDT(-0400)] * higmad (n=chatzill@pcit-8752.hig.se) has joined ##uportal
[08:36:22 EDT(-0400)] * athena (n=athena@99.129.100.66) has joined ##uportal
[09:13:28 EDT(-0400)] * jessm (n=Jess@c-71-232-1-65.hsd1.ma.comcast.net) has joined ##uportal
[09:18:07 EDT(-0400)] * fj4000 (n=Jacob@142.150.154.106) has joined ##uportal
[09:33:17 EDT(-0400)] * colinclark (n=colin@bas2-toronto09-1176406828.dsl.bell.ca) has joined ##uportal
[09:41:42 EDT(-0400)] * michelled (n=team@142.150.154.193) has joined ##uportal
[09:57:39 EDT(-0400)] * lennard1 (n=sparhk@ip68-98-56-21.ph.ph.cox.net) has left ##uportal
[10:00:19 EDT(-0400)] * higmad_ (n=chatzill@pcit-8752.HIG.SE) has joined ##uportal
[10:19:30 EDT(-0400)] * lennard1 (n=sparhk@wsip-98-174-242-39.ph.ph.cox.net) has joined ##uportal
[10:48:24 EDT(-0400)] * holdorph (n=holdorph@wsip-98-174-242-39.ph.ph.cox.net) has joined ##uportal
[10:56:59 EDT(-0400)] * invisibill (i=80876350@gateway/web/ajax/mibbit.com/x-724b69dba5ee780b) has joined ##uportal
[11:05:05 EDT(-0400)] * invisibill (i=80876350@gateway/web/ajax/mibbit.com/x-724b69dba5ee780b) has joined ##uportal
[11:31:20 EDT(-0400)] * EricDalquist (n=dalquist@bohemia.doit.wisc.edu) has joined ##uportal
[12:47:47 EDT(-0400)] * Sememmon (n=Sememmon@uni1.unicon.net) has joined ##uportal
[13:11:26 EDT(-0400)] * colinclark (n=colin@142.150.154.101) has joined ##uportal
[13:51:54 EDT(-0400)] <EricDalquist> just stumbled on this: http://code.google.com/apis/visualization/documentation/
[13:51:57 EDT(-0400)] <EricDalquist> like google charts
[13:52:06 EDT(-0400)] <EricDalquist> but a JavaScript API that outputs SVG
[13:52:17 EDT(-0400)] <EricDalquist> so you can have nice mouse-over effects and such
[13:53:06 EDT(-0400)] <athena> yes
[13:53:20 EDT(-0400)] <athena> they're super super cool
[13:53:36 EDT(-0400)] <athena> i was sort of hoping we could use that to display portal statistics someday
[13:53:53 EDT(-0400)] <athena> there are also some more fancy flash-based ones
[13:54:09 EDT(-0400)] <athena> someday too we could do things like display a little map of who's using uportal
[13:54:13 EDT(-0400)] <athena> other cool stuff like that
[13:55:12 EDT(-0400)] <EricDalquist> yeah
[13:55:21 EDT(-0400)] <EricDalquist> the charts API has the map based stuff
[13:56:20 EDT(-0400)] <athena> yep
[13:56:21 EDT(-0400)] * invisibill (i=80876350@gateway/web/ajax/mibbit.com/x-8b66fd7424afaa0d) has joined ##uportal
[13:56:31 EDT(-0400)] <athena> i really like it - i've been itching to use it for something in uportal
[13:56:40 EDT(-0400)] <EricDalquist> stats sounds great (smile)
[13:56:47 EDT(-0400)] <athena> yeah i think that's a perfect use case
[13:56:51 EDT(-0400)] <EricDalquist> it would be nice to get that aggregator code DB agnostic
[13:57:02 EDT(-0400)] <EricDalquist> if we can do that then I think we could probably package it into uPortal
[13:57:06 EDT(-0400)] <athena> and it prevents us from having to require people to run headless, store graphics on the server, etc.
[13:57:12 EDT(-0400)] <athena> gotcha
[13:57:31 EDT(-0400)] <EricDalquist> if we set up multi-server Quartz it should work just fine for running the aggregation jobs
[13:58:19 EDT(-0400)] <athena> sounds good
[15:08:53 EDT(-0400)] * EricDalquist really wants that reporting tool for the stats aggregates
[15:09:24 EDT(-0400)] <EricDalquist> right now I'm stuck writing SQL: http://uportal.pastebin.com/d475a17d6
[15:09:52 EDT(-0400)] <athena> oh (sad)
[15:10:04 EDT(-0400)] <athena> so basically we need something that will work on databases other than oracle?
[15:10:21 EDT(-0400)] <EricDalquist> well that would be the first step
[15:10:22 EDT(-0400)] <athena> is the current issue that it's difficult to write HQL that doesn't have performance issues?
[15:10:32 EDT(-0400)] <EricDalquist> I didn't even try that
[15:10:38 EDT(-0400)] <athena> ah (smile)
[15:10:44 EDT(-0400)] <athena> started out just as an in-house thing?
[15:10:49 EDT(-0400)] <EricDalquist> yeah
[15:10:53 EDT(-0400)] <EricDalquist> so there are two sides to the app
[15:11:09 EDT(-0400)] <EricDalquist> the reading side, which only has very minimal oracle feature usage
[15:11:33 EDT(-0400)] <EricDalquist> it just selects the next X events that haven't been processed yet (tracking done via datestamps)
[15:11:53 EDT(-0400)] <EricDalquist> X defaults to like 10k events at a go which seems to be the best performance for us
[15:12:12 EDT(-0400)] <EricDalquist> though if the tool is running as expected it aggregates every 60 seconds so there are never actually 10k events to process
[15:12:32 EDT(-0400)] <EricDalquist> then it loads those into a minimal object model, doing lookups for things like groups, tabs & channels from the portal DB
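A minimal sketch of the reading side described above, in Python against an in-memory SQLite database. The `raw_events` table, its columns, and the function names are hypothetical (the real tool reads uPortal's raw stats tables, whose schema isn't shown in this log); the batch size mirrors the ~10k default mentioned.

```python
import sqlite3

# Default batch size, mirroring the "like 10k events at a go" figure above.
BATCH_SIZE = 10_000


def read_next_batch(conn, last_processed, batch_size=BATCH_SIZE):
    """Return up to batch_size events newer than last_processed, oldest first.

    Assumes a hypothetical raw_events(event_time, user_name, event_type) table.
    """
    return conn.execute(
        "SELECT event_time, user_name, event_type FROM raw_events "
        "WHERE event_time > ? ORDER BY event_time LIMIT ?",
        (last_processed, batch_size),
    ).fetchall()


def drain_backlog(conn, last_processed, batch_size=BATCH_SIZE):
    """Read batches until none remain; return (new datestamp, batch count)."""
    batches = 0
    while True:
        batch = read_next_batch(conn, last_processed, batch_size)
        if not batch:
            return last_processed, batches
        # A real reader would build its object model here (group, tab and
        # channel lookups) and hand the events to the writer side.
        last_processed = batch[-1][0]  # advance the datestamp marker
        batches += 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_events (event_time INTEGER, user_name TEXT, event_type TEXT)")
    conn.executemany(
        "INSERT INTO raw_events VALUES (?, ?, ?)",
        [(t, f"user{t % 3}", "LOGIN") for t in range(1, 26)],
    )
    print(drain_backlog(conn, 0, batch_size=10))  # (25, 3)
```

The strict `>` comparison assumes event timestamps are unique; if several events can share a datestamp, a batch boundary that splits them would skip some, so a real implementation would need a tiebreaker such as an event ID.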
[15:12:50 EDT(-0400)] <EricDalquist> then there is the writer side
[15:13:28 EDT(-0400)] <EricDalquist> it calculates as much as it can in-memory, flushing calculations out to the DB at each event time interval, like for us we use 5 minutes as our smallest interval
[15:13:44 EDT(-0400)] <EricDalquist> so after each 5 minutes of events it flushes the aggregations to the aggregate DB
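A minimal sketch of the writer-side idea: keep counts in memory and flush them once per 5-minute interval. `IntervalAggregator` and its `flush` callback are hypothetical stand-ins for the aggregating DAO, and events are assumed to arrive in timestamp order.

```python
from collections import Counter

INTERVAL_SECONDS = 5 * 60  # smallest aggregation interval (5 minutes, per the log)


def interval_start(ts):
    """Floor a Unix timestamp in seconds to the start of its 5-minute interval."""
    return ts - ts % INTERVAL_SECONDS


class IntervalAggregator:
    """Counts events per type in memory; flushes a finished interval when the
    first event of a later interval arrives. Assumes time-ordered input."""

    def __init__(self, flush):
        self.flush = flush      # hypothetical callback: flush(interval_start, counts)
        self.current = None     # start of the interval being accumulated
        self.counts = Counter()

    def add(self, ts, event_type):
        start = interval_start(ts)
        if self.current is not None and start != self.current:
            self._flush_current()
        self.current = start
        self.counts[event_type] += 1

    def close(self):
        """Flush the final, partially filled interval."""
        if self.current is not None:
            self._flush_current()

    def _flush_current(self):
        if self.counts:
            self.flush(self.current, dict(self.counts))
        self.counts = Counter()


if __name__ == "__main__":
    written = []
    agg = IntervalAggregator(lambda start, counts: written.append((start, counts)))
    for ts, kind in [(0, "LOGIN"), (10, "LOGIN"), (299, "RENDER"), (300, "LOGIN"), (650, "RENDER")]:
        agg.add(ts, kind)
    agg.close()
    print(written)
```

Only one interval's counts are held in memory at a time, so the database sees one write per interval rather than one per event.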
[15:14:15 EDT(-0400)] <athena> ah, interesting
[15:14:17 EDT(-0400)] <EricDalquist> but some of the stats, like unique logins or concurrent users, use a scratch table in the database to allow for calculations over long periods of time
[15:14:31 EDT(-0400)] <EricDalquist> updating those tables is where most of the time is spent
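One way a scratch table can support a long-period statistic such as unique logins, sketched with SQLite: each batch only sees a few minutes of events, so the set of users already counted for the period has to live in the database between batches. The table and function names are hypothetical, and `INSERT OR IGNORE` is SQLite syntax, not necessarily what the Oracle-based tool uses.

```python
import sqlite3


def init_scratch(conn):
    # Hypothetical scratch table: at most one row per (period, user).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS login_scratch ("
        " period_start INTEGER NOT NULL,"
        " user_name TEXT NOT NULL,"
        " PRIMARY KEY (period_start, user_name))"
    )


def record_logins(conn, period_start, user_names):
    # INSERT OR IGNORE is SQLite-specific: a user already recorded for this
    # period hits the primary key and is silently skipped.
    conn.executemany(
        "INSERT OR IGNORE INTO login_scratch (period_start, user_name) VALUES (?, ?)",
        [(period_start, name) for name in user_names],
    )
    conn.commit()


def unique_logins(conn, period_start):
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM login_scratch WHERE period_start = ?",
        (period_start,),
    ).fetchone()
    return count
```

Every batch adds rows to this table, which is why these updates dominate the run time as the period grows.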
[15:14:35 EDT(-0400)] <athena> makes sense
[15:14:57 EDT(-0400)] <EricDalquist> especially if you have to do the "update, check for success, insert if failure"
[15:15:09 EDT(-0400)] <EricDalquist> like for channel stats
[15:15:20 EDT(-0400)] <EricDalquist> we would have to run that query once per channel per group per interval
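To make that cost concrete, a sketch of the DB-agnostic pattern in Python/SQLite: one UPDATE, then an INSERT only when no row matched, run once per channel/group pair for an interval. The `channel_stats` table and the function names are hypothetical.

```python
import sqlite3


def upsert_channel_stat(conn, interval_start, channel_id, group_id, renders):
    """Portable "update, check for success, insert if failure" for one row.

    Returns how many statements were sent: 1 if the UPDATE matched a row,
    2 if an INSERT was also needed. Not safe against a concurrent writer
    inserting the same key between the two statements.
    """
    cur = conn.execute(
        "UPDATE channel_stats SET renders = renders + ? "
        "WHERE interval_start = ? AND channel_id = ? AND group_id = ?",
        (renders, interval_start, channel_id, group_id),
    )
    if cur.rowcount > 0:
        return 1
    conn.execute(
        "INSERT INTO channel_stats (interval_start, channel_id, group_id, renders) "
        "VALUES (?, ?, ?, ?)",
        (interval_start, channel_id, group_id, renders),
    )
    return 2


def flush_interval(conn, interval_start, counts):
    """counts maps (channel_id, group_id) -> renders; one upsert per pair."""
    statements = sum(
        upsert_channel_stat(conn, interval_start, channel, group, renders)
        for (channel, group), renders in counts.items()
    )
    conn.commit()
    return statements
```

With C channels and G groups this sends between C·G and 2·C·G statements per interval, each a separate round trip on a networked database.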
[15:16:00 EDT(-0400)] <athena> oh wow
[15:16:03 EDT(-0400)] <EricDalquist> so what we did instead is stuff like this: http://uportal.pastebin.com/m3ebbafb1
[15:16:15 EDT(-0400)] <EricDalquist> which is the Oracle version of UPDATE OR INSERT
[15:16:19 EDT(-0400)] <athena> ouch (smile)
[15:16:26 EDT(-0400)] <EricDalquist> but only requires a single DB round trip
[15:16:36 EDT(-0400)] <EricDalquist> and can be batched
[15:16:39 EDT(-0400)] <EricDalquist> that was the other problem
[15:16:50 EDT(-0400)] <EricDalquist> with Oracle you can't batch SQL updates if you want to know the result value
[15:16:50 EDT(-0400)] <athena> gotcha
[15:17:15 EDT(-0400)] <EricDalquist> so with that merge I can make one DB call and update or insert 50+ entries
[15:17:27 EDT(-0400)] <EricDalquist> whereas that would have potentially been 200 calls with DB agnostic SQL
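A sketch of the single-statement alternative. Oracle's MERGE (in the linked pastebin, not reproduced here) is swapped for SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` upsert (SQLite 3.24 or later), batched with `executemany`. Table and function names are hypothetical. SQLite runs in-process, so this only illustrates the shape of the statement, not the network round-trip savings.

```python
import sqlite3

# SQLite's upsert, standing in for Oracle's MERGE: insert the row, or add to
# the existing counter when the (interval, channel, group) key already exists.
UPSERT_SQL = (
    "INSERT INTO channel_stats (interval_start, channel_id, group_id, renders) "
    "VALUES (?, ?, ?, ?) "
    "ON CONFLICT (interval_start, channel_id, group_id) "
    "DO UPDATE SET renders = renders + excluded.renders"
)


def flush_interval_batched(conn, interval_start, counts):
    """counts maps (channel_id, group_id) -> renders; one batched call for all."""
    rows = [
        (interval_start, channel, group, renders)
        for (channel, group), renders in counts.items()
    ]
    conn.executemany(UPSERT_SQL, rows)
    conn.commit()
    return len(rows)
```

The conflict target must match a unique key (here the table's primary key); the statement never needs to report back whether it updated or inserted, which is what makes it batchable.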
[15:18:12 EDT(-0400)] <EricDalquist> there are 7 MERGE statements total
[15:18:25 EDT(-0400)] <EricDalquist> I think it could be possible to abstract that part of the aggregating DAOs
[15:18:32 EDT(-0400)] <EricDalquist> so you could write a db agnostic version
[15:18:44 EDT(-0400)] <EricDalquist> and still have DB specific versions
[15:18:57 EDT(-0400)] <EricDalquist> it is more than fast enough for us right now to keep up
[15:19:08 EDT(-0400)] <EricDalquist> the problem is if it gets behind because of the machine it is on being down
[15:19:28 EDT(-0400)] <EricDalquist> it has to be fast enough to catch up to a potential multi-day backlog before our raw stats DB grows too large
[15:19:44 EDT(-0400)] <athena> gotcha
[15:19:48 EDT(-0400)] <athena> that all makes a lot of sense
[15:20:04 EDT(-0400)] <athena> and i think mixing in db-specific code where performance requires it is a decent approach
[15:20:12 EDT(-0400)] <EricDalquist> I had actually written the whole thing to be DB-agnostic originally
[15:20:22 EDT(-0400)] <EricDalquist> but the performance was so bad we would never have caught up (tongue)
[15:20:27 EDT(-0400)] <athena> lol
[15:20:28 EDT(-0400)] <athena> yeah
[15:20:35 EDT(-0400)] <EricDalquist> but at a smaller school it could be fine
[15:20:46 EDT(-0400)] <EricDalquist> for us it has to handle peaks of 5k events/minute
[15:20:55 EDT(-0400)] <EricDalquist> and more in the future
[15:21:15 EDT(-0400)] <athena> well i think it really is a good thing to make it as performant as a school like UW requires
[15:21:41 EDT(-0400)] <athena> other schools might have less data, but also be running on less hardware
[15:21:43 EDT(-0400)] <EricDalquist> it's too bad that those more advanced SQL features like Oracle's MERGE don't have a standard
[15:21:50 EDT(-0400)] <athena> and nice to have a solution that scales anyway
[15:21:51 EDT(-0400)] <athena> yeah, it is
[15:21:52 EDT(-0400)] <EricDalquist> well our DB hardware isn't anything huge
[15:22:02 EDT(-0400)] <EricDalquist> it's really not a hardware limitation as much
[15:22:08 EDT(-0400)] <EricDalquist> just simple volume
[15:22:32 EDT(-0400)] <EricDalquist> hardware can only speed up having to do 200 DB calls so much versus compacting it into a single call
[15:22:49 EDT(-0400)] <EricDalquist> I think in the DB-agnostic version I had, those calls were sub-10ms each
[15:22:54 EDT(-0400)] <EricDalquist> but it adds up fast
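Rough arithmetic with the figures quoted here, as an upper bound rather than a measurement: about 200 calls at under 10 ms each, compared with a single batched call.

$$200 \times 10\,\text{ms} = 2\,\text{s}$$

If, hypothetically, one such batch were needed for every 5-minute interval, a single day of backlog would be $1440 / 5 = 288$ intervals, or

$$288 \times 2\,\text{s} = 576\,\text{s} \approx 9.6\,\text{min}$$

of round-trip time for that one group of writes alone.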
[15:23:10 EDT(-0400)] <EricDalquist> but the code is out there
[15:23:27 EDT(-0400)] <EricDalquist> so hopefully Susan, Arlo or someone can help a bit with that
[15:25:29 EDT(-0400)] <EricDalquist> the other thing it still needs is a job to purge the spring-batch control tables
[15:25:38 EDT(-0400)] <EricDalquist> spring-batch doesn't provide any way to do that
[15:25:52 EDT(-0400)] <EricDalquist> and running 2 jobs every minute has resulted in those tables getting rather large :p
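A minimal sketch of such a purge job, using a hypothetical, heavily simplified pair of control tables (`job_execution` and `step_execution`) rather than Spring Batch's real schema, whose table names and foreign keys should be checked against the Spring Batch version in use. Child rows are deleted before their parents so foreign keys are never violated.

```python
import sqlite3


def purge_batch_history(conn, cutoff):
    """Delete job executions that ended before `cutoff`, child rows first.

    Hypothetical simplified schema, not the real Spring Batch tables:
      job_execution(id, end_time)
      step_execution(id, job_execution_id)
    Executions still running (end_time IS NULL) are kept, because
    NULL < cutoff is never true in SQL.
    Returns the number of job executions removed.
    """
    with conn:  # one transaction: commit both deletes, or roll back on error
        conn.execute(
            "DELETE FROM step_execution WHERE job_execution_id IN "
            "(SELECT id FROM job_execution WHERE end_time < ?)",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM job_execution WHERE end_time < ?", (cutoff,)
        )
    return cur.rowcount
```

Scheduled like the aggregation jobs themselves (for example once a day with a cutoff a few days back), this keeps the control tables bounded.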
[15:32:08 EDT(-0400)] <athena> gotcha
[17:58:31 EDT(-0400)] * Sememmon (n=Sememmon@uni1.unicon.net) has joined ##uportal
[18:00:11 EDT(-0400)] * lennard1 (n=sparhk@wsip-98-174-242-39.ph.ph.cox.net) has joined ##uportal
[19:13:01 EDT(-0400)] * Sememmon (n=Sememmon@wsip-98-174-242-39.ph.ph.cox.net) has joined ##uportal
[20:33:06 EDT(-0400)] * lennard1 (n=sparhk@wsip-98-174-242-39.ph.ph.cox.net) has left ##uportal
[21:26:03 EDT(-0400)] * Sememmon (n=Sememmon@wsip-98-174-242-39.ph.ph.cox.net) has joined ##uportal