Person Directory 2.0 Design Notes

Here are my notes from some internal design work on PD2.0. The primary goals are:

Simplify configuration. This will likely involve a custom Spring namespace handler to provide a more complete XML configuration language.
Improve lookup speed, adding in an ExecutorService to allow for parallel lookup of attributes from various sources.
Simplify the API. Provide a true criteria API for complex searches in addition to the ability to look up attributes for a single user.

Secondary goals:

Add JMX monitoring of performance of each attribute source.

QUESTIONS
# are attribute names case insensitive? YES according to PD1.5 behavior

api - public interface
* need to think/design the query builder API, something fluent would be good
** http://static.springsource.org/spring-ldap/site/apidocs/org/springframework/ldap/filter/package-frame.html
* do we need an Attribute class or are Attributes just Strings?
*
*
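
As one possible shape for the fluent query builder, here is a minimal sketch. The Criteria interface and its eq/and/or methods are hypothetical names invented for illustration, not an existing PD or Spring LDAP API. It assumes attribute names are matched case-insensitively (per the question above) and values are compared exactly.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical criteria type for illustration; not an existing PD class. */
interface Criteria {
    /** Returns true when the given person attributes satisfy this criteria. */
    boolean matches(Map<String, List<String>> attributes);

    /** Leaf comparison; the attribute name is matched case-insensitively. */
    static Criteria eq(String name, String value) {
        return attrs -> attrs.entrySet().stream()
                .anyMatch(e -> e.getKey().equalsIgnoreCase(name) && e.getValue().contains(value));
    }

    /** Both this criteria and the other must match. */
    default Criteria and(Criteria other) {
        return attrs -> matches(attrs) && other.matches(attrs);
    }

    /** Either this criteria or the other must match. */
    default Criteria or(Criteria other) {
        return attrs -> matches(attrs) || other.matches(attrs);
    }
}

final class CriteriaDemo {
    public static void main(String[] args) {
        // (firstName=Jane && (isStudent=Y || lastName=Doe))
        Criteria query = Criteria.eq("firstName", "Jane")
                .and(Criteria.eq("isStudent", "Y").or(Criteria.eq("lastName", "Doe")));

        Map<String, List<String>> person = new HashMap<>();
        person.put("FIRSTNAME", Arrays.asList("Jane"));
        person.put("lastName", Arrays.asList("Doe"));
        System.out.println(query.matches(person)); // prints "true"
    }
}
```

Evaluating criteria in code like this would also cover the "filter during merge" case for sources that cannot take the whole query.
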
Complex queries and multiple attribute sources
* default root query object ORs its parts together?
* break root query object up by OR clause?
* the problem:
** Given a query like (firstName=Jane && (isStudent=Y || lastName=Doe))
** How do we handle sources that do not support all of the attributes in the query?
*** do a multi pass query, query sources that support all attributes first
*** query sources that support a subset of the attributes second; during the merge, filter their results in code using the attributes that were not passed to the source
*** query non-searchable sources
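
The multi-pass idea above could be planned roughly as follows. This is a sketch only: MultiPassPlanner, Pass, and the method names are made up for illustration, and it assumes attribute names have already been normalised to lower case. A source with no overlap with the query is treated here as non-searchable for that query, which is an assumption.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical planner that sorts sources into the three passes. */
final class MultiPassPlanner {
    enum Pass { FULL, PARTIAL, NON_SEARCHABLE }

    /** Classify a source by how much of the query it can search on. */
    static Pass classify(Set<String> queryAttributes, Set<String> searchable) {
        if (searchable.containsAll(queryAttributes)) {
            return Pass.FULL;
        }
        for (String attribute : queryAttributes) {
            if (searchable.contains(attribute)) {
                return Pass.PARTIAL;
            }
        }
        return Pass.NON_SEARCHABLE;
    }

    /** Query attributes the source cannot take; apply these in code during merge. */
    static Set<String> notPassed(Set<String> queryAttributes, Set<String> searchable) {
        Set<String> rest = new TreeSet<>(queryAttributes);
        rest.removeAll(searchable);
        return rest;
    }

    /** Group source names by pass, keeping the configured order within each pass. */
    static Map<Pass, List<String>> plan(Set<String> queryAttributes,
                                        Map<String, Set<String>> searchableBySource) {
        Map<Pass, List<String>> passes = new EnumMap<>(Pass.class);
        for (Pass pass : Pass.values()) {
            passes.put(pass, new ArrayList<>());
        }
        searchableBySource.forEach((name, searchable) ->
                passes.get(classify(queryAttributes, searchable)).add(name));
        return passes;
    }
}
```
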

General Query Logic
* attribute query
** ex: by username, [foo=bar, name=smith, ....]
** Run MS & PS sources
*** turn map into OR() criteria for MS
** Run S sources once per existing result
* criteria query
** ex: (firstName=jane && (lastName=smith || lastName=doe))
** Run MS sources
*** merge results
** Run PS sources
*** merge results
** Run S sources once per existing result
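
The flow above can be sketched as below. The SearchableSource and SinglePersonSource interfaces and the QueryFlow class are hypothetical, attributes are simplified to single String values, and merging simply overwrites by primaryId. A real PS pass would also need the query reduced to what the source supports, which this sketch skips.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical MS/PS source: may return several people for one query. */
interface SearchableSource {
    List<Map<String, String>> search(Map<String, String> query);
}

/** Hypothetical S source: returns attributes for exactly one person. */
interface SinglePersonSource {
    Map<String, String> lookup(String primaryId);
}

final class QueryFlow {
    /** Run MS sources, then PS sources, then S sources once per existing result. */
    static Map<String, Map<String, String>> run(Map<String, String> query,
                                                List<SearchableSource> msSources,
                                                List<SearchableSource> psSources,
                                                List<SinglePersonSource> sSources) {
        Map<String, Map<String, String>> byPrimaryId = new LinkedHashMap<>();
        mergeAll(byPrimaryId, query, msSources);
        mergeAll(byPrimaryId, query, psSources);
        for (SinglePersonSource source : sSources) {
            for (Map.Entry<String, Map<String, String>> entry : byPrimaryId.entrySet()) {
                entry.getValue().putAll(source.lookup(entry.getKey()));
            }
        }
        return byPrimaryId;
    }

    /** Merge each returned person into the result keyed by its primaryId. */
    private static void mergeAll(Map<String, Map<String, String>> byPrimaryId,
                                 Map<String, String> query,
                                 List<SearchableSource> sources) {
        for (SearchableSource source : sources) {
            for (Map<String, String> person : source.search(query)) {
                byPrimaryId.computeIfAbsent(person.get("primaryId"), id -> new LinkedHashMap<>())
                        .putAll(person);
            }
        }
    }
}
```
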

attribute source classes - how do we tell/config the difference?
* fully searchable (MS) - CriteriaSearchAttributeSource
** uses a query template (supports arbitrary logic)
** ldap or primary use directories go here
* partially searchable (PS) - SimpleSearchAttributeSource
** uses named placeholders but still can return multiple people for one query
** small associated sources go here
* single-person only (S)
** will only ever return a single result ... is this useful?
** in-memory sources, such as those for Shibboleth, go here

spi - what the code in support implements to provide data

core - big ugly guts
* core code that does
** dependency tree calc of sources
** determine query order and potential for parallelism; probably better to assume everything runs in parallel and add "block" points that wait for other sources to complete
** caching of results from each source
** handling of query timeouts
** merging results from various sources
** mapping attribute names from the API side to the SPI side
** jmx metrics for per-source usage & performance
** primaryId
*** Used when a find person by primary id query is run
*** Used to merge data from multiple sources (each result must have a primaryId set)

* add a list of AttributeSourceFilter
** these are called in order (sorted by their order value)
** if any filter returns false the filtered source is not executed
** filterchain style API that allows for modification of search?
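
A minimal sketch of the filter list described above. AttributeSourceFilter is named in these notes, but its methods (getOrder, accept), the of factory, and the FilterChain class are assumptions made for illustration. Filters run in ascending order and the first false stops the chain.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

/** Hypothetical filter contract; the method names are assumptions. */
interface AttributeSourceFilter {
    int getOrder();

    /** Return false to prevent the named source from being executed. */
    boolean accept(String sourceName, Map<String, String> query);

    /** Convenience factory for building a filter from a predicate. */
    static AttributeSourceFilter of(int order, BiPredicate<String, Map<String, String>> test) {
        return new AttributeSourceFilter() {
            @Override public int getOrder() { return order; }
            @Override public boolean accept(String sourceName, Map<String, String> query) {
                return test.test(sourceName, query);
            }
        };
    }
}

final class FilterChain {
    private final List<AttributeSourceFilter> filters;

    FilterChain(Collection<AttributeSourceFilter> filters) {
        this.filters = new ArrayList<>(filters);
        this.filters.sort(Comparator.comparingInt(AttributeSourceFilter::getOrder));
    }

    /** True only if every filter, called in order, accepts the source. */
    boolean shouldExecute(String sourceName, Map<String, String> query) {
        for (AttributeSourceFilter filter : filters) {
            if (!filter.accept(sourceName, query)) {
                return false; // later filters are not consulted
            }
        }
        return true;
    }
}
```

A regex or SpEL filter would then just be another implementation of the same interface.
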

* dependency tree calculation on configured attribute sources
** needs to fail to init if something is wrong with the tree
** this probably needs to be calculated and cached for each query since the tree will look different every time based on the input
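
One way the per-query calculation might look, as a sketch under assumptions: each source declares the attributes it requires and the attributes it returns, and sources are grouped into "waves" where everything in a wave can run in parallel once the previous wave has completed. DependencyPlanner and Source are hypothetical names. Sources whose requirements can never be met for the given input are simply left out here; a real implementation would likely report them.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DependencyPlanner {
    /** Hypothetical source description: what it needs and what it provides. */
    static final class Source {
        final String name;
        final Set<String> required;
        final Set<String> returned;

        Source(String name, Collection<String> required, Collection<String> returned) {
            this.name = name;
            this.required = new HashSet<>(required);
            this.returned = new HashSet<>(returned);
        }
    }

    /** Group sources into waves; each wave only needs attributes known before it starts. */
    static List<List<String>> waves(Collection<String> queryAttributes, List<Source> sources) {
        Set<String> available = new HashSet<>(queryAttributes);
        List<Source> pending = new ArrayList<>(sources);
        List<List<String>> waves = new ArrayList<>();
        while (!pending.isEmpty()) {
            List<Source> ready = new ArrayList<>();
            for (Source source : pending) {
                if (available.containsAll(source.required)) {
                    ready.add(source);
                }
            }
            if (ready.isEmpty()) {
                break; // remaining sources can never run for this input
            }
            List<String> names = new ArrayList<>();
            for (Source source : ready) {
                names.add(source.name);
            }
            for (Source source : ready) {
                available.addAll(source.returned); // visible to the next wave only
            }
            pending.removeAll(ready);
            waves.add(names);
        }
        return waves;
    }
}
```
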

* caching of results - part of XML config support
** for each configured source, set cache name or reference to Ehcache bean
** optional cache name/ref for misses
** optional cache name/ref for exceptions
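
To illustrate the three caches (results, misses, exceptions), here is a small sketch that uses plain in-memory maps as a stand-in for Ehcache. CachingSource is a hypothetical wrapper and nothing here is Ehcache API. There is no expiry or size limit, which a real cache configuration would provide.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical caching wrapper around a single-person lookup. */
final class CachingSource {
    private final Map<String, Map<String, String>> hits = new ConcurrentHashMap<>();
    private final Set<String> misses = ConcurrentHashMap.newKeySet();
    private final Map<String, RuntimeException> failures = new ConcurrentHashMap<>();
    private final Function<String, Map<String, String>> delegate;

    CachingSource(Function<String, Map<String, String>> delegate) {
        this.delegate = delegate;
    }

    /** Returns cached attributes, null for a cached miss, or rethrows a cached failure. */
    Map<String, String> lookup(String primaryId) {
        Map<String, String> hit = hits.get(primaryId);
        if (hit != null) {
            return hit;
        }
        if (misses.contains(primaryId)) {
            return null;
        }
        RuntimeException failure = failures.get(primaryId);
        if (failure != null) {
            throw failure;
        }
        try {
            Map<String, String> result = delegate.apply(primaryId);
            if (result == null) {
                misses.add(primaryId);
            } else {
                hits.put(primaryId, result);
            }
            return result;
        } catch (RuntimeException e) {
            failures.put(primaryId, e);
            throw e;
        }
    }
}
```
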

* query timeout - part of XML config support
** set maximum wait for query result
** set behavior on timeout? (ignore, fail)
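
The timeout behaviour could be built on the standard ExecutorService and Future.get(timeout, unit) calls, roughly as sketched here. TimedLookup and the OnTimeout enum are hypothetical names. Note that the wait is applied per source in turn, so the total wait can exceed the configured maximum; a shared deadline would be a refinement.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimedLookup {
    /** Hypothetical config values for the "behavior on timeout" setting. */
    enum OnTimeout { IGNORE, FAIL }

    /** Submit every source in parallel, then collect results with a per-source maximum wait. */
    static Map<String, String> queryAll(ExecutorService pool,
                                        Map<String, Callable<Map<String, String>>> sources,
                                        long maxWaitMillis,
                                        OnTimeout onTimeout) {
        Map<String, Future<Map<String, String>>> futures = new LinkedHashMap<>();
        sources.forEach((name, task) -> futures.put(name, pool.submit(task)));

        Map<String, String> merged = new LinkedHashMap<>();
        for (Map.Entry<String, Future<Map<String, String>>> entry : futures.entrySet()) {
            try {
                merged.putAll(entry.getValue().get(maxWaitMillis, TimeUnit.MILLISECONDS));
            } catch (TimeoutException e) {
                entry.getValue().cancel(true); // interrupt the slow source
                if (onTimeout == OnTimeout.FAIL) {
                    throw new IllegalStateException("Source timed out: " + entry.getKey(), e);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted waiting for: " + entry.getKey(), e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("Source failed: " + entry.getKey(), e.getCause());
            }
        }
        return merged;
    }
}
```
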

* merge behavior - part of XML config
** does it work for each source to have a "prepend/append/overwrite" flag?
** if so we probably need support for Spring's Ordered on the SPI impl
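
A sketch of what a per-source "prepend/append/overwrite" flag might mean for multi-valued attributes. The Merger class and MergeMode enum are hypothetical. Because the outcome depends on which source is merged first, some ordering of sources is needed for the result to be predictable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class Merger {
    /** Hypothetical per-source merge flag. */
    enum MergeMode { PREPEND, APPEND, OVERWRITE }

    /** Merge one source's attributes into the accumulated result for a person. */
    static void merge(Map<String, List<String>> target,
                      Map<String, List<String>> incoming,
                      MergeMode mode) {
        for (Map.Entry<String, List<String>> entry : incoming.entrySet()) {
            List<String> existing = target.get(entry.getKey());
            List<String> merged = new ArrayList<>();
            switch (mode) {
                case PREPEND:
                    merged.addAll(entry.getValue());
                    if (existing != null) {
                        merged.addAll(existing);
                    }
                    break;
                case APPEND:
                    if (existing != null) {
                        merged.addAll(existing);
                    }
                    merged.addAll(entry.getValue());
                    break;
                default: // OVERWRITE: the incoming values replace whatever was there
                    merged.addAll(entry.getValue());
            }
            target.put(entry.getKey(), merged);
        }
    }
}
```
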

* attribute name mapping - part of XML config support
** for each configured source, option to allow for saying api attr "username" is actually "uid" in this spi
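
The per-source name mapping might be as simple as the following sketch, which translates query attributes to the source's names on the way in and results back on the way out. AttributeNameMapper is a hypothetical class; names are looked up case-insensitively and unmapped names pass through unchanged, both of which are assumptions.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical two-way mapping between API-side and SPI-side attribute names. */
final class AttributeNameMapper {
    private final Map<String, String> apiToSpi = new HashMap<>();
    private final Map<String, String> spiToApi = new HashMap<>();

    /** @param apiNameToSpiName for example "username" to "uid" */
    AttributeNameMapper(Map<String, String> apiNameToSpiName) {
        apiNameToSpiName.forEach((api, spi) -> {
            apiToSpi.put(api.toLowerCase(Locale.ROOT), spi);
            spiToApi.put(spi.toLowerCase(Locale.ROOT), api);
        });
    }

    /** Rename query attributes before handing them to the source. */
    Map<String, String> toSpi(Map<String, String> apiAttributes) {
        return rename(apiAttributes, apiToSpi);
    }

    /** Rename result attributes before returning them to the API caller. */
    Map<String, String> toApi(Map<String, String> spiAttributes) {
        return rename(spiAttributes, spiToApi);
    }

    private static Map<String, String> rename(Map<String, String> in, Map<String, String> names) {
        Map<String, String> out = new LinkedHashMap<>();
        in.forEach((name, value) ->
                out.put(names.getOrDefault(name.toLowerCase(Locale.ROOT), name), value));
        return out;
    }
}
```
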

* attribute lists
** in the config are these the PD side or the source side of the attr mapping?
** at least one required or optional search attribute must be specified
** required search
*** ALL of these attributes must be included in a query for this source to be able to run the query
** optional search
*** Together with the required set, these make up the collection of attributes that can be used to search; attributes outside this set are ignored
** available return
*** The list of attributes the source returns; this is a best-effort set, and the source may return more attributes than are named in it
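
The required and optional search lists could be checked along these lines. SearchAttributes is a hypothetical class, and the sketch assumes attribute names are already normalised to one case.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical holder for a source's required and optional search attributes. */
final class SearchAttributes {
    private final Set<String> required;
    private final Set<String> optional;

    SearchAttributes(Set<String> required, Set<String> optional) {
        if (required.isEmpty() && optional.isEmpty()) {
            throw new IllegalArgumentException(
                    "At least one required or optional search attribute must be specified");
        }
        this.required = new TreeSet<>(required);
        this.optional = new TreeSet<>(optional);
    }

    /** The source can run only if ALL required attributes are present in the query. */
    boolean canRun(Set<String> queryAttributes) {
        return queryAttributes.containsAll(required);
    }

    /** Query attributes the source will actually search on; everything else is ignored. */
    Set<String> usable(Set<String> queryAttributes) {
        Set<String> searchable = new TreeSet<>(required);
        searchable.addAll(optional);
        Set<String> usable = new TreeSet<>(queryAttributes);
        usable.retainAll(searchable);
        return usable;
    }
}
```
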

support
* attribute sources
** jdbc (MS,PS,S)
*** single row
*** multi row
** ldap (MS,PS,S)
** xml (MS)
** request attribute (S)
* filters
** regex
** spel