Data Model

Data Model ERDs

(revised April 2011)

Currently Being Revised

Initial Design

Source Files

  • Available in SVN

Notes

Data Model Design

Process Flow (by example)

  1. Source/batch files are loaded into raw (R) tables
  2. The data is normalized and moved into standardized (S) tables, e.g. prs_sor_role_records and associated tables
  3. Where a person has multiple records, the "best" biodem data is elected into prc_persons. This covers (e.g.) correction of typos and name changes.
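
The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields, the standardize and elect_best helpers, and the election rule (the most recently updated record wins) are all hypothetical, since this page does not define what makes biodem data "best" or how the R/S/C tables are laid out.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical shape of a standardized (S) record; not the real table layout.
@dataclass(frozen=True)
class StandardizedRecord:
    sor: str          # system of record that asserted the data
    person_key: str   # hypothetical key linking the records of one person
    given_name: str
    family_name: str
    updated: date


def standardize(raw_row: dict) -> StandardizedRecord:
    """Step 2: normalize one raw (R) row into the common (S) format."""
    return StandardizedRecord(
        sor=raw_row["sor"].strip().lower(),
        person_key=raw_row["person_key"].strip(),
        given_name=raw_row["first"].strip().title(),
        family_name=raw_row["last"].strip().title(),
        updated=date.fromisoformat(raw_row["updated"]),
    )


def elect_best(records: list[StandardizedRecord]) -> StandardizedRecord:
    """Step 3: elect one record per person.

    Assumed rule for this sketch only: the most recently updated record wins.
    """
    return max(records, key=lambda r: r.updated)


# Step 1: rows as they might arrive from two SORs (made-up sample data).
raw_rows = [
    {"sor": "HR", "person_key": "p1", "first": " jane", "last": "SMYTH", "updated": "2010-09-01"},
    {"sor": "SIS", "person_key": "p1", "first": "Jane", "last": "smith ", "updated": "2011-03-15"},
]

standardized = [standardize(row) for row in raw_rows]

# Group the standardized records by person, then elect one per person.
by_person: dict[str, list[StandardizedRecord]] = {}
for record in standardized:
    by_person.setdefault(record.person_key, []).append(record)

elected = {key: elect_best(recs) for key, recs in by_person.items()}
```

Here the later record corrects the misspelled family name, which is the kind of typo correction step 3 refers to.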

Guidelines

  1. The database is a "black box": only core Registry code accesses it directly. All other manipulation is done via APIs.
  2. Where possible, tables should be consolidated to keep the number of tables down and simplify administering them. As a general rule of thumb, if two tables have the same structure and vary by only one column name, the tables should be consolidated.
  3. As a general rule, only Calculated data is referenced for publishing outside the Registry.

Assumptions

  1. Each SOR can only assert one set of biodem data and one official name.
  2. prs_sor_roles must have only one entry per role, where role is department + title.
  3. When an SOR role assertion disappears, the record remains in prc_role_records, possibly with a stop date added.
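
Assumptions 2 and 3 can be illustrated with a small sketch. It assumes the roles of one person from one SOR are held in a dictionary keyed by (department, title), which enforces one entry per role; the reconcile_roles function and the stop_date field name are hypothetical. The source says a stop date is only "possibly" added, so always setting it here is a choice made for the example.

```python
from datetime import date


def reconcile_roles(existing, asserted, today):
    """Apply the current SOR assertions to the stored roles.

    existing: dict mapping (department, title) -> {"stop_date": date or None}
    asserted: set of (department, title) pairs the SOR currently asserts
    Roles are never deleted; a role that is no longer asserted gets a stop date.
    Roles that already carry a stop date are left untouched.
    """
    # Add newly asserted roles; keying by (department, title) keeps one entry per role.
    for key in asserted:
        existing.setdefault(key, {"stop_date": None})

    # Close roles that have disappeared from the SOR, but keep the record.
    for key, record in existing.items():
        if key not in asserted and record["stop_date"] is None:
            record["stop_date"] = today
    return existing


roles = {
    ("Physics", "Professor"): {"stop_date": None},
    ("Library", "Clerk"): {"stop_date": None},
}
reconcile_roles(
    roles,
    {("Physics", "Professor"), ("Physics", "Chair")},
    date(2011, 4, 1),
)
```

After the call, the Library/Clerk role is still present but carries a stop date, and the new Physics/Chair role has been added as an open role.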

Naming Conventions

  1. Table names are prefixed CCT_ where CC indicates the responsible component and T indicates the type of table as enumerated under Table Types below.
  2. Table and column names are all lowercase, with underscores (_) to separate words/fragments. StudlyCaps are not used.
  3. Natural English is preferred over major/minor ordering. For example, start_date, not date_start.
  4. Column names should avoid incorporating the table name.
  5. The column name id indicates a row identifier.
  6. The suffix _t indicates a type identifier, as defined in ctx_data_types.
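
As a rough illustration of conventions 1 and 2, the following Python sketch checks names against the prefix and lowercase rules. It assumes the component code is exactly two lowercase letters (inferred from names such as prs_sor_role_records and ctx_data_types) and that digits are allowed in name fragments; parse_table_name and is_valid_column_name are hypothetical helpers, not part of the Registry.

```python
import re

# Type letters as listed under "Table Types" on this page.
TABLE_TYPES = {
    "c": "Calculated",
    "d": "Dictionary",
    "o": "Original",
    "r": "Raw",
    "v": "View",
    "s": "Standardized",
    "x": "Supporting",
}

# Two component letters, one type letter, an underscore, then lowercase
# fragments separated by single underscores.
TABLE_NAME = re.compile(r"([a-z]{2})([cdorvsx])_([a-z0-9]+(?:_[a-z0-9]+)*)")

# Lowercase fragments separated by single underscores, starting with a letter.
COLUMN_NAME = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def parse_table_name(name):
    """Return (component, table type, remainder), or None if the name is invalid."""
    match = TABLE_NAME.fullmatch(name)
    if match is None:
        return None
    component, type_letter, remainder = match.groups()
    return component, TABLE_TYPES[type_letter], remainder


def is_valid_column_name(name):
    """Check the lowercase-with-underscores rule for a column name."""
    return COLUMN_NAME.fullmatch(name) is not None
```

For example, parse_table_name("prs_sor_role_records") yields ("pr", "Standardized", "sor_role_records"), while a StudlyCaps name such as "PrsPersons" is rejected. The sketch does not check conventions 3 to 6, which depend on meaning rather than on the shape of the name.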

Terminology

Table Data Descriptions

  • (Pre)defined: Definitions provided out of the box, may be added to by local deployment
  • Instantiated: Definitions added by local deployment
  • Standardized: Normalized data transformed from SOR specific format to common format
  • Raw: Untouched data from SOR
  • Calculated: Data calculated by transformation on raw, standardized, or calculated data

Table Types

  • Calculated (C): Tables holding calculated data. May also include Standardized data.
  • Dictionary (D): Support tables that hold (pre)defined definitions external to the Registry. Example: A list of countries.
  • Original (O): Tables that hold data original to/originated by the Registry.
  • Raw (R): Tables that directly reflect the source (or "raw") data, with only minimal formatting changes.
  • View (V): A database view, constructed of underlying tables.
  • Standardized (S): Tables holding standardized data.
  • Supporting (X): Tables that relate to the management of the system, and that are generally only instantiated or modified by the system or an administrator.