Person Directory 2.0 Design Notes

Here are my notes from some internal design work on PD2.0. The primary goals are:

  • Simplify configuration. This will likely involve a custom Spring namespace handler to provide a more complete XML configuration language.
  • Improve lookup speed by adding an ExecutorService to allow parallel lookup of attributes from various sources.
  • Simplify the API: provide a true criteria API for complex searches in addition to the ability to look up attributes for a single user.

Secondary goals:

  • Add JMX monitoring of performance of each attribute source.

 

QUESTIONS

  1. Are attribute names case-insensitive? YES, according to PD1.5 behavior.


api - public interface

Complex queries and multiple attribute sources

  • default root query object ORs its parts together?
  • break root query object up by OR clause?
  • maxResults?
  • the problem:
    • Given a query like (firstName=Jane && (isStudent=Y || lastName=Doe))
    • How do we handle sources that do not support all of the attributes in the query?
      • do a multi pass query, query sources that support all attributes first
      • query sources that support a subset of the attributes second; during the merge, filter these results in code using the attributes that were not passed to the source
      • query non-searchable sources
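
One way to read the multi-pass idea, as a rough sketch only: model the query as a small criteria tree, decide whether a source can take the whole query by comparing the query's attribute names with the source's searchable attributes, and re-apply the full criteria in code to merged results from sources that only saw a subset. The Criteria interface and the eq/and/or/supportsAll/filterMerged helpers are hypothetical names made up for illustration, not an existing PD API. Attribute names are compared exactly here; case-insensitive matching is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CriteriaSketch {
    /** Hypothetical node of a criteria tree. */
    interface Criteria {
        boolean matches(Map<String, String> person);
        Set<String> attributeNames();
    }

    /** Leaf: attribute 'name' must equal 'value'. */
    static Criteria eq(final String name, final String value) {
        return new Criteria() {
            @Override public boolean matches(Map<String, String> person) {
                return value.equals(person.get(name));
            }
            @Override public Set<String> attributeNames() {
                return Set.of(name);
            }
        };
    }

    static Criteria and(Criteria... parts) { return combine(true, parts); }
    static Criteria or(Criteria... parts) { return combine(false, parts); }

    private static Criteria combine(final boolean all, final Criteria... parts) {
        return new Criteria() {
            @Override public boolean matches(Map<String, String> person) {
                for (Criteria part : parts) {
                    if (part.matches(person) != all) {
                        return !all; // AND stops at the first miss, OR at the first hit
                    }
                }
                return all;
            }
            @Override public Set<String> attributeNames() {
                Set<String> names = new LinkedHashSet<>();
                for (Criteria part : parts) {
                    names.addAll(part.attributeNames());
                }
                return names;
            }
        };
    }

    /** First pass candidates: sources whose searchable set covers every attribute in the query. */
    static boolean supportsAll(Set<String> searchable, Criteria query) {
        return searchable.containsAll(query.attributeNames());
    }

    /** Second pass: re-check merged results against the full query in code. */
    static List<Map<String, String>> filterMerged(List<Map<String, String>> merged, Criteria query) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> person : merged) {
            if (query.matches(person)) {
                kept.add(person);
            }
        }
        return kept;
    }
}
```

With the example query above, a source that can only search firstName and lastName fails supportsAll, so it would be queried with a reduced query and its merged results passed through filterMerged.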

General Query Logic

  • attribute query
    • ex: by username, [foo=bar, name=smith, ....]
    • Run MS & PS sources
      • turn map into OR() criteria for MS
    • Run S sources once per existing result
  • criteria query
    • ex: (firstName=jane && (lastName=smith || lastName=doe))
    • Run MS sources
      • merge results
    • Run PS sources
      • merge results
    • Run S sources once per existing result
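
The criteria-query ordering above could look roughly like the following. This is a sketch under assumptions: MS and PS sources are modelled as plain Suppliers that are already bound to the query, S sources as Functions from primaryId to attributes, and the merge keeps the first value seen for an attribute. The class and method names (QueryPhasesSketch, merge, run) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

final class QueryPhasesSketch {
    static final String PRIMARY_ID = "primaryId";

    /** Merge results into 'people', keyed by primaryId; values already present win. */
    static void merge(Map<String, Map<String, String>> people, List<Map<String, String>> results) {
        for (Map<String, String> result : results) {
            String id = result.get(PRIMARY_ID);
            if (id == null) {
                continue; // a result without a primaryId cannot be merged
            }
            Map<String, String> person = people.computeIfAbsent(id, k -> new LinkedHashMap<>());
            result.forEach(person::putIfAbsent);
        }
    }

    static Map<String, Map<String, String>> run(
            List<Supplier<List<Map<String, String>>>> msSources,
            List<Supplier<List<Map<String, String>>>> psSources,
            List<Function<String, Map<String, String>>> sSources) {
        Map<String, Map<String, String>> people = new LinkedHashMap<>();
        // Phase 1: fully searchable sources, then merge.
        for (Supplier<List<Map<String, String>>> ms : msSources) {
            merge(people, ms.get());
        }
        // Phase 2: partially searchable sources, then merge.
        for (Supplier<List<Map<String, String>>> ps : psSources) {
            merge(people, ps.get());
        }
        // Phase 3: single-person sources, once per existing result.
        for (Function<String, Map<String, String>> s : sSources) {
            for (Map.Entry<String, Map<String, String>> entry : people.entrySet()) {
                Map<String, String> extra = s.apply(entry.getKey());
                extra.forEach(entry.getValue()::putIfAbsent);
            }
        }
        return people;
    }
}
```

The phases run sequentially here to keep the ordering visible; nothing in the sketch addresses parallelism, caching, or timeouts.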

attribute source classes - how do we tell/config the difference?

  • fully searchable (MS) - CriteriaSearchAttributeSource
    • uses a query template (supports arbitrary logic)
    • ldap or primary use directories go here
  • partial searchable (PS) - SimpleSearchAttributeSource
    • uses named placeholders but still can return multiple people for one query
    • small associated sources go here
  • single-person only (S)
    • will only ever return a single result ... is this useful?
    • in-memory sources like for shib go here
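
One possible answer to the tell/config question, sketched with heavy assumptions: let the interface a source implements declare its class, and have the core classify by the most capable interface found. CriteriaSearchAttributeSource and SimpleSearchAttributeSource are the names from the list above; SinglePersonAttributeSource, every method signature, and kindOf are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class SourceKindsSketch {
    enum Kind { MS, PS, S }

    /** Single-person only (S): at most one result for one primaryId. */
    interface SinglePersonAttributeSource {
        Map<String, String> findPerson(String primaryId);
    }

    /** Partially searchable (PS): named placeholders, may return several people. */
    interface SimpleSearchAttributeSource {
        List<Map<String, String>> search(Map<String, String> placeholders);
    }

    /** Fully searchable (MS): arbitrary criteria rendered through a query template. */
    interface CriteriaSearchAttributeSource {
        List<Map<String, String>> search(String renderedQuery);
    }

    /** Classify a source by the most capable interface it implements. */
    static Kind kindOf(Object source) {
        if (source instanceof CriteriaSearchAttributeSource) {
            return Kind.MS;
        }
        if (source instanceof SimpleSearchAttributeSource) {
            return Kind.PS;
        }
        if (source instanceof SinglePersonAttributeSource) {
            return Kind.S;
        }
        throw new IllegalArgumentException("not an attribute source: " + source);
    }
}
```

An explicit attribute in the XML config would be an equally plausible way to express the same distinction.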


spi - what code in support implements to provide data


core - big ugly guts

  • core code that does
    • dependency tree calc of sources
    • determine query order and potential for parallelism; probably better to treat everything as parallel by default and add "block" spots that wait for other sources to complete
    • caching of results from each source
    • handling of query timeouts
    • merging results from various sources
    • mapping attribute names from the API side to the SPI side
      • TODO move this into a transformation API
    • jmx metrics for per-source usage & performance
    • primaryId
      • Used when a find person by primary id query is run
      • Used to merge data from multiple sources (each result must have a primaryId set)
  • add a list of AttributeSourceFilter
    • these are called in order (sorted by their order value)
    • if any filter returns false the filtered source is not executed
    • filterchain style API that allows for modification of search?
  • dependency tree calculation on configured attribute sources
    • needs to fail to init if something is wrong with the tree
    • this probably needs to be calculated and cached for each query since the tree will look different every time based on the input
  • caching of results - part of XML config support
    • for each configured source, set cache name or reference to Ehcache bean
    • optional cache name/ref for misses
    • optional cache name/ref for exceptions
  • query timeout - part of XML config support
    • set maximum wait for query result
    • set behavior on timeout? (ignore, fail)
  • merge behavior
    • if two sources return different values for the same attribute, ignore the second and log an error
  • attribute name mapping - part of XML config support
    • for each configured source, an option to say that the api attr "username" is actually "uid" in this spi
    • which direction does this mapping work?
  • attribute lists
    • in the config are these the PD side or the source side of the attr mapping?
    • at least one required or optional search attribute must be specified
    • required search
      • ALL of these attributes must be included in a query for this source to be able to run the query
    • optional search
      • This plus the required set make up the collection of attributes that can be used to search; attributes outside this set are ignored
    • available return
      • The list of attributes the source returns; this is a best-effort set and the source may return more attributes than are named in the set
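
The parallel lookup, timeout, and merge items above might fit together roughly as follows. This is a sketch, not a design decision: it assumes one shared deadline for all sources (the notes suggest the real setting would be per source), models each source as a Callable, and uses a boolean for the ignore/fail choice. ParallelLookupSketch, mergeFirstWins, and lookup are hypothetical names; only the java.util.concurrent calls are real.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ParallelLookupSketch {
    /** First value for an attribute wins; returns the names of attributes that conflicted. */
    static List<String> mergeFirstWins(Map<String, String> target, Map<String, String> incoming) {
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, String> e : incoming.entrySet()) {
            String existing = target.putIfAbsent(e.getKey(), e.getValue());
            if (existing != null && !existing.equals(e.getValue())) {
                conflicts.add(e.getKey());
            }
        }
        return conflicts;
    }

    /** Submit every source at once, then collect results in source order until the deadline. */
    static Map<String, String> lookup(ExecutorService pool,
                                      List<Callable<Map<String, String>>> sources,
                                      long timeoutMillis,
                                      boolean failOnTimeout) throws Exception {
        List<Future<Map<String, String>>> futures = new ArrayList<>();
        for (Callable<Map<String, String>> source : sources) {
            futures.add(pool.submit(source));
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        Map<String, String> merged = new LinkedHashMap<>();
        for (Future<Map<String, String>> future : futures) {
            long remaining = Math.max(0L, deadline - System.nanoTime());
            try {
                Map<String, String> result = future.get(remaining, TimeUnit.NANOSECONDS);
                for (String name : mergeFirstWins(merged, result)) {
                    // Stand-in for real error logging.
                    System.err.println("conflicting values for attribute " + name);
                }
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the slow source
                if (failOnTimeout) {
                    for (Future<Map<String, String>> other : futures) {
                        other.cancel(true);
                    }
                    throw e;
                }
                // "ignore" behavior: drop this source and keep going
            }
        }
        return merged;
    }
}
```

Collecting in source order keeps the "ignore the second" rule deterministic even though the sources run concurrently. Dependency "block" spots and caching are not covered here.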

support

  • attribute sources
    • jdbc (MS,PS,S)
      • single row
      • multi row
    • ldap (MS,PS,S)
    • xml (MS)
    • request attribute (S)
  • filters
    • regex
    • spel
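
For the regex filter, a minimal sketch of how it could plug into the AttributeSourceFilter list described under core: filters are sorted by order, called in turn, and the source is skipped as soon as one returns false. Only the AttributeSourceFilter name comes from these notes; getOrder, shouldRun, regexFilter, and shouldExecute are assumed names, and the filter here only inspects the query attributes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class FilterSketch {
    /** Hypothetical shape of the filter contract. */
    interface AttributeSourceFilter {
        int getOrder();
        boolean shouldRun(String sourceName, Map<String, String> queryAttributes);
    }

    /** Passes only if the named query attribute is present and fully matches the regex. */
    static AttributeSourceFilter regexFilter(final int order, final String attribute, String regex) {
        final Pattern pattern = Pattern.compile(regex);
        return new AttributeSourceFilter() {
            @Override public int getOrder() {
                return order;
            }
            @Override public boolean shouldRun(String sourceName, Map<String, String> queryAttributes) {
                String value = queryAttributes.get(attribute);
                return value != null && pattern.matcher(value).matches();
            }
        };
    }

    /** Call filters in ascending order; the first false means the source is not executed. */
    static boolean shouldExecute(List<AttributeSourceFilter> filters,
                                 String sourceName,
                                 Map<String, String> queryAttributes) {
        List<AttributeSourceFilter> sorted = new ArrayList<>(filters);
        sorted.sort(Comparator.comparingInt(AttributeSourceFilter::getOrder));
        for (AttributeSourceFilter filter : sorted) {
            if (!filter.shouldRun(sourceName, queryAttributes)) {
                return false;
            }
        }
        return true;
    }
}
```

A SpEL-based filter could implement the same interface by evaluating an expression against the query attributes instead of a pattern.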