Person Directory 2.0 Design Notes

Here are my notes from some internal design work on PD2.0. The primary goals are:

Simplify configuration. This will likely involve a custom Spring namespace handler to provide a more complete XML configuration language.
Improve lookup speed, adding in an ExecutorService to allow for parallel lookup of attributes from various sources.
Simplify the API. Provide a true criteria API for complex searches in addition to the ability to look up attributes for a single user.

Secondary goals:

Add JMX monitoring of performance of each attribute source.

QUESTIONS

are attribute names case insensitive? YES according to PD1.5 behavior
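
If attribute names stay case-insensitive as in PD1.5, one way to get that behavior is to key attribute maps with a case-insensitive comparator. A minimal sketch; the class and method names are hypothetical and not part of PD:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Person Directory.
final class CaseInsensitiveAttributes {
    /** Creates an attribute map whose keys ignore case on lookup. */
    static Map<String, List<Object>> newAttributeMap() {
        return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    }
}
```

With this, a value stored under "firstName" is also found under "FIRSTNAME".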

api - public interface

need to think/design the query builder API, something fluent would be good
- http://static.springsource.org/spring-ldap/site/apidocs/org/springframework/ldap/filter/package-frame.html
do we need an Attribute class or are Attributes just Strings?
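
To make the "something fluent" idea concrete, here is a rough sketch of what a fluent query builder could look like. Everything in it (the Criteria type, where/and/or, matches) is hypothetical and only meant to show the shape; it treats attribute values as plain Strings, which is one possible answer to the Attribute class question.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical fluent criteria type; none of these names come from PD.
final class Criteria {
    private enum Kind { EQ, AND, OR }

    private final Kind kind;
    private final String attribute;     // only set for EQ
    private final String value;         // only set for EQ
    private final List<Criteria> parts; // only set for AND / OR

    private Criteria(Kind kind, String attribute, String value, List<Criteria> parts) {
        this.kind = kind;
        this.attribute = attribute;
        this.value = value;
        this.parts = parts;
    }

    /** Starts a criteria with a single attribute=value test. */
    static Criteria where(String attribute, String value) {
        return new Criteria(Kind.EQ, attribute, value, List.of());
    }

    Criteria and(Criteria other) {
        return new Criteria(Kind.AND, null, null, List.of(this, other));
    }

    Criteria or(Criteria other) {
        return new Criteria(Kind.OR, null, null, List.of(this, other));
    }

    /** Evaluates the criteria in code against one person's attributes. */
    boolean matches(Map<String, String> person) {
        switch (kind) {
            case EQ:  return value.equals(person.get(attribute));
            case AND: return parts.stream().allMatch(p -> p.matches(person));
            default:  return parts.stream().anyMatch(p -> p.matches(person));
        }
    }

    @Override
    public String toString() {
        if (kind == Kind.EQ) {
            return attribute + "=" + value;
        }
        String separator = kind == Kind.AND ? " && " : " || ";
        return parts.stream().map(Criteria::toString)
                .collect(Collectors.joining(separator, "(", ")"));
    }
}
```

Usage would read like `Criteria.where("firstName", "Jane").and(Criteria.where("isStudent", "Y").or(Criteria.where("lastName", "Doe")))`. The in-code matches method is there because some results will need to be filtered after the sources return.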

Complex queries and multiple attribute sources

default root query object ORs its parts together?
break root query object up by OR clause?
maxResults?
the problem:
- Given a query like (firstName=Jane && (isStudent=Y || lastName=Doe))
- How do we handle sources that do not support all of the attributes in the query?
  - do a multi-pass query: query sources that support all attributes first
  - query sources that support a subset of the attributes second; during merge, filter their results in code using the attributes that were not passed to the source
  - query non-searchable sources
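
A sketch of how the pass assignment could be decided, assuming each source can report the set of attributes it can search on. The names (MultiPassPlanner, Pass, classify, residual) are made up for illustration, and this only looks at attribute coverage, not at the AND/OR structure of the query.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical planner; assumes each source exposes its searchable attribute names.
final class MultiPassPlanner {
    enum Pass { FULL, PARTIAL, NOT_SEARCHABLE }

    /** Picks the pass a source belongs to for the attributes used in a query. */
    static Pass classify(Set<String> queryAttributes, Set<String> sourceSearchable) {
        if (sourceSearchable.containsAll(queryAttributes)) {
            return Pass.FULL;
        }
        for (String attribute : queryAttributes) {
            if (sourceSearchable.contains(attribute)) {
                return Pass.PARTIAL;
            }
        }
        return Pass.NOT_SEARCHABLE;
    }

    /** Attributes that could not be passed to the source and must be checked in code during merge. */
    static Set<String> residual(Set<String> queryAttributes, Set<String> sourceSearchable) {
        Set<String> rest = new TreeSet<>(queryAttributes);
        rest.removeAll(sourceSearchable);
        return rest;
    }
}
```

For the example query, a source that can only search on isStudent lands in the PARTIAL pass with firstName and lastName left over for in-code filtering.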

General Query Logic

attribute query
- ex: by username, [foo=bar, name=smith, ....]
- Run MS & PS sources
  - turn map into OR() criteria for MS
- Run S sources once per existing result
criteria query
- ex: (firstName=jane && (lastName=smith || lastName=doe))
- Run MS sources
  - merge results
- Run PS sources
  - merge results
- Run S sources once per existing result

attribute source classes - how do we tell/config the difference?

fully searchable (MS) - CriteriaSearchAttributeSource
- uses a query template (supports arbitrary logic)
- ldap or primary use directories go here
partial searchable (PS) - SimpleSearchAttributeSource
- uses named placeholders but still can return multiple people for one query
- small associated sources go here
single-person only (S)
- will only ever return a single result ... is this useful?
- in-memory sources like for shib go here
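
One possible answer to the tell/config question is to let the Java type carry the difference, so core can branch on which interface a source implements. A sketch only: the first two interface names come from the notes above, while SinglePersonAttributeSource, all method signatures, and kindOf are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical SPI shapes; method signatures are assumptions.
final class SourceKinds {
    /** MS: accepts an arbitrary criteria expression rendered through a query template. */
    interface CriteriaSearchAttributeSource {
        List<Map<String, Object>> search(String criteriaExpression);
    }

    /** PS: accepts named placeholder values but may still return several people. */
    interface SimpleSearchAttributeSource {
        List<Map<String, Object>> search(Map<String, Object> placeholders);
    }

    /** S: only ever returns attributes for one already-identified person. */
    interface SinglePersonAttributeSource {
        Optional<Map<String, Object>> lookup(String primaryId);
    }

    /** Lets core tell the kinds apart by type instead of by a configuration flag. */
    static String kindOf(Object source) {
        if (source instanceof CriteriaSearchAttributeSource) return "MS";
        if (source instanceof SimpleSearchAttributeSource) return "PS";
        if (source instanceof SinglePersonAttributeSource) return "S";
        throw new IllegalArgumentException("Not an attribute source: " + source);
    }
}
```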

spi - what code in support implements to provide data

core - big ugly guts

core code that does
- dependency tree calc of sources
- determine query order and potential for parallelism; probably better to treat every source as always parallel and add "block" points that wait for other sources to complete
- caching of results from each source
- handling of query timeouts
- merging results from various sources
- mapping attribute names from the API side to the SPI side
  - TODO move this into a transformation API
- jmx metrics for per-source usage & performance
- primaryId
  - Used when a find person by primary id query is run
  - Used to merge data from multiple sources (each result must have a primaryId set)
add a list of AttributeSourceFilter
- these are called in order (sorted by their order value)
- if any filter returns false the filtered source is not executed
- filterchain style API that allows for modification of search?
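
A sketch of the veto behavior described above, without the filterchain-style modification of the search. The notes only name the AttributeSourceFilter type; getOrder, accept, and the helper methods are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical filter contract and runner.
final class SourceFilters {
    interface AttributeSourceFilter {
        int getOrder();
        boolean accept(String sourceName, Map<String, Object> query);
    }

    /** Builds a filter from an order value and a predicate. */
    static AttributeSourceFilter filter(int order, BiPredicate<String, Map<String, Object>> test) {
        return new AttributeSourceFilter() {
            @Override public int getOrder() { return order; }
            @Override public boolean accept(String sourceName, Map<String, Object> query) {
                return test.test(sourceName, query);
            }
        };
    }

    /** Runs filters lowest order first; the first false vetoes the source. */
    static boolean shouldExecute(List<AttributeSourceFilter> filters,
                                 String sourceName, Map<String, Object> query) {
        List<AttributeSourceFilter> sorted = new ArrayList<>(filters);
        sorted.sort(Comparator.comparingInt(AttributeSourceFilter::getOrder));
        for (AttributeSourceFilter f : sorted) {
            if (!f.accept(sourceName, query)) {
                return false; // later filters are not consulted
            }
        }
        return true;
    }
}
```
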
dependency tree calculation on configured attribute sources
- needs to fail to init if something is wrong with the tree
- this probably needs to be calculated and cached for each query since the tree will look different every time based on the input
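
A sketch of the ordering part, assuming the dependencies are available as a map from source name to the names of the sources it depends on (how that map is derived from the config or the query is not covered here). It throws when the structure is broken, which is where a fail-to-init check could hook in. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: depth-first ordering with cycle detection.
final class SourceDependencyOrder {
    /** Returns source names so that each one appears after everything it depends on. */
    static List<String> order(Map<String, Set<String>> dependsOn) {
        List<String> ordered = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String source : new TreeSet<>(dependsOn.keySet())) {
            visit(source, dependsOn, done, inProgress, ordered);
        }
        return ordered;
    }

    private static void visit(String source, Map<String, Set<String>> dependsOn,
                              Set<String> done, Set<String> inProgress, List<String> ordered) {
        if (done.contains(source)) {
            return;
        }
        if (!dependsOn.containsKey(source)) {
            throw new IllegalStateException("Unknown source: " + source);
        }
        if (!inProgress.add(source)) {
            throw new IllegalStateException("Dependency cycle at: " + source);
        }
        for (String dependency : new TreeSet<>(dependsOn.get(source))) {
            visit(dependency, dependsOn, done, inProgress, ordered);
        }
        inProgress.remove(source);
        done.add(source);
        ordered.add(source);
    }
}
```
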
caching of results - part of XML config support
- for each configured source, set cache name or reference to Ehcache bean
- optional cache name/ref for misses
- optional cache name/ref for exceptions
query timeout - part of XML config support
- set maximum wait for query result
- set behavior on timeout? (ignore, fail)
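
The timeout handling could sit on top of the ExecutorService from the primary goals. A minimal sketch using only java.util.concurrent; TimedSourceQuery, OnTimeout, and run are hypothetical names, and a real version would submit all sources before waiting on any of them.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper around a single source query.
final class TimedSourceQuery {
    enum OnTimeout { IGNORE, FAIL }

    /** Runs the query on the executor and waits at most maxWaitMillis for its result. */
    static Optional<Map<String, Object>> run(ExecutorService executor,
                                             Callable<Map<String, Object>> query,
                                             long maxWaitMillis,
                                             OnTimeout onTimeout)
            throws InterruptedException, ExecutionException, TimeoutException {
        Future<Map<String, Object>> future = executor.submit(query);
        try {
            return Optional.ofNullable(future.get(maxWaitMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so the thread is freed
            if (onTimeout == OnTimeout.FAIL) {
                throw e;
            }
            return Optional.empty(); // IGNORE: carry on without this source
        }
    }
}
```
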
merge behavior
- if two sources return different values for the same attribute, ignore the second and log an error
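
A sketch of that first-value-wins rule, assuming results are maps of attribute name to value list and that java.util.logging stands in for whatever logger PD actually uses. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical merge helper.
final class FirstValueWinsMerge {
    private static final Logger LOG = Logger.getLogger(FirstValueWinsMerge.class.getName());

    /** Merges a later result into the merged one; returns the names of conflicting attributes. */
    static List<String> mergeInto(Map<String, List<Object>> merged, Map<String, List<Object>> later) {
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, List<Object>> entry : later.entrySet()) {
            List<Object> existing = merged.putIfAbsent(entry.getKey(), entry.getValue());
            if (existing != null && !existing.equals(entry.getValue())) {
                conflicts.add(entry.getKey());
                LOG.severe("Conflicting values for '" + entry.getKey()
                        + "': keeping " + existing + ", ignoring " + entry.getValue());
            }
        }
        return conflicts;
    }
}
```
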
attribute name mapping - part of XML config support
- for each configured source, an option to say that the API attribute "username" is actually "uid" in this SPI source
- which direction does this mapping work?
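
On the direction question, one option is to configure the mapping once (API name to source name) and derive the reverse, so queries are renamed on the way out and results on the way back. A sketch under that assumption; AttributeNameMapper and its methods are hypothetical, and names without a mapping pass through unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-source mapper; lookups ignore case.
final class AttributeNameMapper {
    private final Map<String, String> apiToSpi = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    private final Map<String, String> spiToApi = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    AttributeNameMapper(Map<String, String> apiToSpiNames) {
        for (Map.Entry<String, String> entry : apiToSpiNames.entrySet()) {
            apiToSpi.put(entry.getKey(), entry.getValue());
            if (spiToApi.put(entry.getValue(), entry.getKey()) != null) {
                throw new IllegalArgumentException("Two API names map to " + entry.getValue());
            }
        }
    }

    /** Used when building the query sent to the source. */
    String toSpi(String apiName) {
        return apiToSpi.getOrDefault(apiName, apiName);
    }

    /** Used when reading names that came back from the source. */
    String toApi(String spiName) {
        return spiToApi.getOrDefault(spiName, spiName);
    }

    /** Renames the keys of one source result back to API-side names. */
    Map<String, Object> resultToApi(Map<String, Object> spiResult) {
        Map<String, Object> renamed = new LinkedHashMap<>();
        spiResult.forEach((name, value) -> renamed.put(toApi(name), value));
        return renamed;
    }
}
```
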
attribute lists
- in the config are these the PD side or the source side of the attr mapping?
- at least one required or optional search attribute must be specified
- required search
  - ALL of these attributes must be included in a query for this source to be able to run the query
- optional search
  - This plus the required set make up the collection of attributes that can be used to search; attributes outside this set are ignored
- available return
  - The list of attributes the source returns; this is a best-effort set and the source may return more attributes than are named in it
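
The required/optional rules above could be checked per source roughly like this. A sketch with hypothetical names; it compares names exactly (case handling left out), and it assumes a source with no required attributes still needs at least one usable attribute in the query, which the notes do not settle.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical holder for one source's search attribute lists.
final class SearchAttributeRules {
    private final Set<String> required;
    private final Set<String> optional;

    SearchAttributeRules(Set<String> required, Set<String> optional) {
        if (required.isEmpty() && optional.isEmpty()) {
            throw new IllegalArgumentException(
                    "At least one required or optional search attribute must be specified");
        }
        this.required = Set.copyOf(required);
        this.optional = Set.copyOf(optional);
    }

    /** Query attributes this source can search on; anything else is ignored. */
    Set<String> usable(Set<String> queryAttributes) {
        Set<String> usable = new TreeSet<>(queryAttributes);
        usable.removeIf(a -> !required.contains(a) && !optional.contains(a));
        return usable;
    }

    /** Assumption: all required attributes present and at least one usable attribute overall. */
    boolean canRun(Set<String> queryAttributes) {
        return queryAttributes.containsAll(required) && !usable(queryAttributes).isEmpty();
    }
}
```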

support

attribute sources
- jdbc (MS,PS,S)
  - single row
  - multi row
- ldap (MS,PS,S)
- xml (MS)
- request attribute (S)
filters
- regex
- spel