Active Document Cacher
Many uPortal deployers run into the same issue: how do I cache all of this remote web content (RSS, WebProxy, iCal, ...) locally to improve performance and reduce dependency on remote services? Yale, UW-Madison and many others have implemented their own local solutions to this problem, with limited success and portability. This page collects requirements and design information for a general solution to this problem.
Requirements
- Clients retrieve cached documents by requesting a URL from the document cacher service.
- Actively retrieve a configured URL at a specified interval
- Ability to vary interval based on absolute timing or timing relative to last successful or failed retrieval
- Configure complex retrieval intervals
  - One idea here would be to allow cron expressions
- Specify the action to take when a retrieval fails
  - Continue serving the old data
  - Serve a per-URL error message
- Set an optional max age for cached data
- Share the cached data between multiple server nodes (Optional?)
- Allow 'easy' configuration via a single long URL that carries all of the configuration parameters
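As a sketch of how the per-URL requirements above might map onto one configuration object, the following Java parses every setting out of the query string of a single long configuration URL. The query parameter names (`url`, `intervalSeconds`, `retryIntervalSeconds`, `timing`, `onFailure`, `errorMessage`, `maxAgeSeconds`), their defaults, and the class `CachedUrlConfig` are all hypothetical illustrations, not an agreed format. The class also shows one way to compute the next retrieval time for absolute versus relative timing. Cron expressions are not covered.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-URL settings for the document cacher (parameter names are illustrative). */
final class CachedUrlConfig {
    /** ABSOLUTE: fixed schedule; RELATIVE: measured from the end of the last attempt. */
    enum Timing { ABSOLUTE, RELATIVE }
    /** What to serve when a retrieval fails. */
    enum FailureAction { SERVE_STALE, SERVE_ERROR }

    final String url;
    final Duration interval;       // normal retrieval interval
    final Duration retryInterval;  // used after a failed retrieval in RELATIVE mode
    final Timing timing;
    final FailureAction onFailure;
    final String errorMessage;     // per-URL message; may be null
    final Duration maxAge;         // null means cached data never expires

    private CachedUrlConfig(String url, Duration interval, Duration retryInterval, Timing timing,
                            FailureAction onFailure, String errorMessage, Duration maxAge) {
        this.url = url;
        this.interval = interval;
        this.retryInterval = retryInterval;
        this.timing = timing;
        this.onFailure = onFailure;
        this.errorMessage = errorMessage;
        this.maxAge = maxAge;
    }

    /** Parses every setting out of the query string of one long configuration URL. */
    static CachedUrlConfig fromConfigUrl(String configUrl) {
        Map<String, String> params = new HashMap<>();
        String query = URI.create(configUrl).getRawQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                if (eq <= 0) {
                    continue; // skip malformed "key" or "=value" pairs
                }
                params.put(URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8),
                           URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
            }
        }
        String url = params.get("url");
        if (url == null) {
            throw new IllegalArgumentException("missing required 'url' parameter");
        }
        Duration interval = seconds(params.getOrDefault("intervalSeconds", "300"));
        Duration retry = params.containsKey("retryIntervalSeconds")
                ? seconds(params.get("retryIntervalSeconds")) : interval;
        Duration maxAge = params.containsKey("maxAgeSeconds") ? seconds(params.get("maxAgeSeconds")) : null;
        return new CachedUrlConfig(url, interval, retry,
                Timing.valueOf(params.getOrDefault("timing", "ABSOLUTE")),
                FailureAction.valueOf(params.getOrDefault("onFailure", "SERVE_STALE")),
                params.get("errorMessage"), maxAge);
    }

    private static Duration seconds(String value) {
        return Duration.ofSeconds(Long.parseLong(value));
    }

    /** Computes when this URL should next be retrieved. */
    Instant nextRetrieval(Instant previousScheduled, Instant lastAttemptFinished, boolean lastAttemptSucceeded) {
        if (timing == Timing.ABSOLUTE) {
            return previousScheduled.plus(interval); // fixed rate, independent of how the fetch went
        }
        return lastAttemptFinished.plus(lastAttemptSucceeded ? interval : retryInterval);
    }
}
```

With `timing=ABSOLUTE`, a slow or failed fetch does not shift the schedule. With `timing=RELATIVE`, the next attempt is counted from when the last one finished, and the hypothetical `retryIntervalSeconds` allows a different back-off after a failure. The target URL itself must be percent-encoded inside the configuration URL (e.g. `url=http%3A%2F%2Fnews.example.edu%2Ffeed.rss`).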
Design Ideas
- Define a DocumentRetrievalService (DRS) interface
  - DRS lookup by document URI
- Cache service interface
  - Allows for per-document cache settings
- How to store the service configuration?
  - XStream: a local XML file
  - Embedded database (similar to bookmarks?)
- Quartz for scheduling
  - Does this need a database-backed job store?
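The DRS and cache-service split above might look roughly like the following Java sketch. It assumes that a DRS implementation is chosen by the URI scheme of the document (e.g. `http`), that documents are plain strings, and that the scheduler (Quartz or otherwise) simply calls `refresh` for each configured document. `DocumentCache`, `CacheSettings`, `registerService`, `configure`, `refresh` and `get` are hypothetical names, and the maps are in-memory only, so nothing is shared between server nodes.

```java
import java.io.IOException;
import java.net.URI;
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical DRS contract: fetches the current content of one remote document. */
interface DocumentRetrievalService {
    String retrieve(URI documentUri) throws IOException;
}

/** Hypothetical per-document cache settings. */
final class CacheSettings {
    final Duration maxAge;      // null means the cached copy never expires
    final String errorMessage;  // served when no usable copy exists

    CacheSettings(Duration maxAge, String errorMessage) {
        this.maxAge = maxAge;
        this.errorMessage = errorMessage;
    }
}

/** Hypothetical cache service that delegates fetching to a DRS chosen by URI scheme. */
final class DocumentCache {
    private static final CacheSettings DEFAULTS = new CacheSettings(null, "Document unavailable");

    /** A cached copy and the time it was successfully retrieved. */
    private static final class Entry {
        final String content;
        final Instant fetchedAt;

        Entry(String content, Instant fetchedAt) {
            this.content = content;
            this.fetchedAt = fetchedAt;
        }
    }

    private final Map<String, DocumentRetrievalService> servicesByScheme = new ConcurrentHashMap<>();
    private final Map<URI, CacheSettings> settings = new ConcurrentHashMap<>();
    private final Map<URI, Entry> entries = new ConcurrentHashMap<>();

    void registerService(String scheme, DocumentRetrievalService drs) {
        servicesByScheme.put(scheme, drs);
    }

    void configure(URI documentUri, CacheSettings documentSettings) {
        settings.put(documentUri, documentSettings);
    }

    /** Intended to be called by the scheduler; returns false and keeps the old copy on failure. */
    boolean refresh(URI documentUri, Instant now) {
        String scheme = documentUri.getScheme();
        DocumentRetrievalService drs = scheme == null ? null : servicesByScheme.get(scheme);
        if (drs == null) {
            return false; // no DRS registered for this kind of URI
        }
        try {
            entries.put(documentUri, new Entry(drs.retrieve(documentUri), now));
            return true;
        } catch (IOException e) {
            return false; // leave any previous entry in place so it can still be served
        }
    }

    /** Returns the cached copy if it is within its max age, otherwise the per-document error message. */
    String get(URI documentUri, Instant now) {
        CacheSettings s = settings.getOrDefault(documentUri, DEFAULTS);
        Entry e = entries.get(documentUri);
        boolean usable = e != null && (s.maxAge == null || !e.fetchedAt.plus(s.maxAge).isBefore(now));
        return usable ? e.content : s.errorMessage;
    }
}
```

Keeping the last good copy after a failed `refresh` gives the "continue serving old data" behaviour, while the per-document `maxAge` bounds how stale that copy may become before the error message is served instead. Sharing cached data between nodes would mean replacing the in-memory maps with a shared store.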