Data Model ERDs
(revised October 2008)
Source Files
- Available in Rutgers ESS CVS for now.
Data Model Design
Process Flow (by example)
- Source/batch files are loaded into raw (R) tables
- The data is normalized and moved into standardized (S) tables, e.g. prs_sor_role_records and associated tables
- Where a person has multiple records from a given system of record (SOR), the "best" biodem data is elected into prs_sor_persons
  - This covers, e.g., correction of typos and name changes
- Where a person has multiple SORs, the "best" biodem data is elected into prc_persons
  - Note: The current table definition implies that the best name and the best biodem data come from the same SOR
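The two election steps above can be sketched in Python. This is only an illustration: the page does not define what makes a record "best", so the sketch assumes the most recently updated record wins. The SorPersonRecord class, its fields, and the sample SOR names are all hypothetical and do not reflect the actual prs_sor_persons or prc_persons columns.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SorPersonRecord:
    """Hypothetical standardized record; not the real table layout."""
    sor: str             # system of record the row came from
    sor_person_id: str   # the person's identifier within that SOR
    name: str
    updated: date        # assumed "last changed" stamp


def elect_best(records):
    """Elect one record. ASSUMPTION: 'best' means most recently updated."""
    return max(records, key=lambda r: r.updated)


def elect_per_sor(records):
    """Step 1: one elected record per (SOR, SOR person id) pair."""
    groups = {}
    for record in records:
        groups.setdefault((record.sor, record.sor_person_id), []).append(record)
    return {key: elect_best(rows) for key, rows in groups.items()}


# All rows below are assumed to describe the same person.
rows = [
    SorPersonRecord("hr", "1001", "Jon Smith", date(2008, 1, 5)),
    SorPersonRecord("hr", "1001", "John Smith", date(2008, 3, 2)),
    SorPersonRecord("sis", "S-77", "John Q. Smith", date(2008, 2, 1)),
]

per_sor = elect_per_sor(rows)                # analogous to prs_sor_persons
best_overall = elect_best(per_sor.values())  # analogous to prc_persons
```

Because the whole record is elected in the second step, the name and the rest of the data necessarily come from the same SOR, which matches the limitation called out in the note.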
Guidelines
- The database is a "black box": only core Registry code accesses it directly. All other manipulation is done via APIs.
- Where possible, tables should be consolidated to keep the number of tables down and simplify administering them. As a general rule of thumb, if two tables have the same structure and vary by only one column name, the tables should be consolidated.
Naming Conventions
- Table names are prefixed CCT_, where CC indicates the responsible component and T indicates the type of table as enumerated under Table Types.
- Table and column names are all lowercase, with underscores (_) to separate words/fragments. StudlyCaps are not used.
- Natural English is preferred over major/minor ordering. So start_date, not date_start.
- Column names should avoid incorporating the table name.
- The suffix _id indicates a row identifier.
- The suffix _t indicates a type identifier, as defined in ctx_data_types.
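As a rough illustration, the conventions above could be checked mechanically. The sketch below is not part of the Registry code. It assumes CC is exactly two letters and T exactly one, which is inferred from examples such as prs_sor_role_records and ctx_data_types rather than stated outright, and it assumes digits are permitted after the first character. The function names are hypothetical.

```python
import re

# ASSUMPTION: two lowercase letters (component), one lowercase letter (type),
# an underscore, then lowercase words separated by single underscores.
TABLE_NAME = re.compile(
    r"(?P<component>[a-z]{2})(?P<type>[a-z])_(?P<rest>[a-z0-9]+(?:_[a-z0-9]+)*)"
)
COLUMN_NAME = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def parse_table_name(name):
    """Split a table name into (component, type letter, remainder)."""
    match = TABLE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"table name does not follow the CCT_ convention: {name!r}")
    return match.group("component"), match.group("type"), match.group("rest")


def column_kind(name):
    """Classify a column by its suffix: _id, _t, or neither."""
    if COLUMN_NAME.fullmatch(name) is None:
        raise ValueError(f"column name is not lowercase_with_underscores: {name!r}")
    if name.endswith("_id"):
        return "row identifier"
    if name.endswith("_t"):
        return "type identifier"
    return "plain"
```

For example, parse_table_name("prs_sor_role_records") splits into the component "pr", the type letter "s", and the remainder "sor_role_records", while a StudlyCaps column name such as "StartDate" is rejected. A check for "natural English" word order is left out, since it cannot be decided by pattern matching alone.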
Terminology
Table Data Descriptions
- (Pre)defined: Definitions provided out of the box; local deployments may add to them
- Instantiated: Definitions added by local deployment
- Standardized: Normalized data transformed from SOR-specific format to a common format
- Raw: Untouched data from SOR
- Calculated: Data calculated by transformation on raw, standardized, or calculated data
Table Types
- Calculated (C): Tables holding calculated data. May also include Standardized data.
- Dictionary (D): Support tables that hold (pre)defined definitions external to the Registry. Example: A list of countries.
- Original (O): Tables that hold data original to/originated by the Registry.
- Raw (R): Tables that directly reflect the source (or "raw") data, with only minimal formatting changes.
- View (V): A database view, constructed of underlying tables.
- Standardized (S): Tables holding standardized data.
- Supporting (X): Tables that relate to the management of the system, and that are generally only instantiated or modified by the system or an administrator.
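The type letters listed above can be captured in a small lookup, shown here as a sketch. It assumes the letter appears in lowercase as the third character of the table prefix, as in prc_persons or ctx_data_types. The TableType enum and table_type_of function are hypothetical names, not part of the Registry.

```python
from enum import Enum


class TableType(Enum):
    """Table type letters, lowercased as they appear in table names."""
    CALCULATED = "c"
    DICTIONARY = "d"
    ORIGINAL = "o"
    RAW = "r"
    VIEW = "v"
    STANDARDIZED = "s"
    SUPPORTING = "x"


def table_type_of(table_name):
    """Return the TableType encoded in a table name's three-letter prefix."""
    prefix = table_name.split("_", 1)[0]
    if len(prefix) != 3:
        raise ValueError(f"expected a three-letter prefix, got {prefix!r}")
    # Enum lookup by value raises ValueError for an unknown type letter.
    return TableType(prefix[2])
```

With this lookup, table_type_of("prs_sor_persons") yields TableType.STANDARDIZED and table_type_of("ctx_data_types") yields TableType.SUPPORTING.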