# ssp-data-importer
ssp-data-importer is a tool for importing CSV data into the external tables of the SSP database. The parser allows some latitude in the CSV format, including whitespace, separator, and quote characters. Currently, the program supports only inserts and updates. It is possible to upload partial data that updates specific columns; when updating already existing data, the only requirement is that the columns containing key values be supplied.
## High-Level Workflow
The ssp-data-importer is expected to be run as part of a cron job or other periodic scheduling method. The job should be scheduled with a period appropriate to how often the external tables are to be updated, and uploads to the input folder should be timed against the running of the cron job.

1. A folder that will contain the .csv files to import (the input folder) is monitored at runtime.
2. If the folder contains .csv files that have had sufficient soak time since their last modification, the files are transferred to a second folder (the process folder) for reading and validation. Validation is performed on a column basis against database metadata constraints; all errors in a given row are documented for the final report.
3. Validated rows are written to another folder (the upsert folder).
4. From the upsert folder, rows are upserted into the staging tables in the database.
5. Additional validation steps are taken on the complete data set, checking for inconsistencies and any potential duplications.
6. The validated staging tables are then used to update/insert data into the corresponding external tables.
7. A report is generated giving pertinent information, including any validation errors, total lines processed per table, etc.
8. Finally, the staging tables are truncated, the process and upsert folders are removed, and the processed files can be archived.
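For example, on a Unix host the four working folders referenced in this workflow (and configured later in the properties file) might be laid out like this; the /opt/ssp-import root is purely an illustrative choice:

```shell
mkdir -p /opt/ssp-import/input    # uploaded .csv files land here
mkdir -p /opt/ssp-import/process  # files are moved here for reading and validation
mkdir -p /opt/ssp-import/upsert   # validated rows are written here before upsert
mkdir -p /opt/ssp-import/archive  # processed files can be archived here
```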
## Prerequisites

At a minimum you'll need a Java JDK installed; this program requires JDK 1.6+. The installation process varies widely from platform to platform, so you're on your own for that one. You will also need a PostgreSQL 9.1+ or SQL Server 2008+ database. To build, you will need a Maven distribution.
## Building

ssp-data-importer is a Maven project and can be built and tested using the standard mvn commands. To build for distribution, an assembly/descriptor.xml is supplied, as well as .bat and .sh scripts for running the importer.
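A typical build might look like the following (a sketch assuming a standard Maven setup; the exact assembly goal depends on how the assembly plugin is bound in the POM):

```shell
# compile, run tests, and package the project
mvn clean package

# build the distribution assembly described by assembly/descriptor.xml
mvn assembly:single
```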
## Running the Program

It is recommended that the supplied scripts, runJob.bat (Windows) or runJob.sh (Unix), be used to run ssp-data-importer.

The following arguments must be supplied:

- the database profile: `-Dspring.profiles.active=postgres|sqlserver`
- the location of the override properties file folder: `-Dssp.importer.configdir=/location/of/override/properties/file/folder/`
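For example, a one-off run against a PostgreSQL database (the properties folder path is illustrative):

```shell
./runJob.sh -Dspring.profiles.active=postgres -Dssp.importer.configdir=/ssp-properties/
```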
To run hourly using crontab, edit the crontab from the command line:

```
# crontab -e
```

then add the following line:

```
0 0-23 * * * /root/scripts/sh/runJob.sh -Dspring.profiles.active=postgres -Dssp.importer.configdir=/ssp-properties/
```
## Properties File Description

There are a number of properties required for the program to run properly. All values marked with * must be set, as defaults are not supplied for them.

IMPORTANT! These properties are stored in a file titled ssp-importer.properties, the location of which is set at run time. See Running the Program.
### FOLDER LOCATIONS

NOTE: the `file:` prefix must be used for location identification.

- `*batch.tables.input.folder=file:/location/of/input/folder/` full path to the folder that will contain the initial csv files
- `*batch.tables.process.folder=file:/location/of/process/folder/` full path to the folder where csv files will be processed
- `*batch.tables.upsert.folder=file:/location/of/upsert/folder/` full path to the folder where csv files will be upserted
- `*batch.tables.archive.folder=file:/location/of/archive/` full path to the archive folder
### FILE LOCATIONS

- `batch.tables.input.files=${batch.tables.input.folder}*.csv` pattern matching the uploaded files
- `batch.tables.process.files=${batch.tables.process.folder}*.csv` pattern matching the files to process
- `batch.tables.upsert.files=${batch.tables.upsert.folder}*.csv` pattern matching the files to upsert
### INITIALIZATION

- `batch.tables.lagTimeBeforeStartInMinutes=10` number of minutes files must be unmodified before processing begins (default: 10)
### ARCHIVING

- `batch.tables.retain.input.archive=true` turns archiving on (default: true)
- `batch.tables.archive=UNIQUE` which files to archive: ALL, NONE, or UNIQUE (default: UNIQUE)
### TOLERANCE

- `batch.rawitem.skip.limit=10` number of lines generating validation errors to allow during read/write
- `batch.rawitem.commit.interval=100` size of the batch
- `batch.upsertitem.skip.limit=10` number of lines generating validation errors to allow during upsert
- `batch.upsertitem.commit.interval=100` size of the batch; larger batch sizes reduce processing time but make errors less specific
### TESTING

- `batch.table.input.duplicate=false`
- `exists.only.for.testing.1=default.value.1`
- `exists.only.for.testing.2=default.value.2`
### DATABASE

- `db_name=ssp` name of the database (default: ssp)
- `batch.jdbc.url=jdbc:postgresql://127.0.0.1:5432/${db_name}` the full URL to the source database (default: jdbc:postgresql://localhost:5432/ssp)
- `batch.jdbc.driver=org.postgresql.Driver` the JDBC driver to be used (default: org.postgresql.Driver; for SQL Server use net.sourceforge.jtds.jdbc.Driver)
- `batch.jdbc.user=sspadmin` the username for the source database (default: sspadmin)
- `batch.jdbc.password=sspadmin` the password for the source database (default: sspadmin)
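For a SQL Server deployment using the jTDS driver mentioned above, the database properties would look roughly like this (host, port, database name, and credentials are illustrative):

```properties
batch.jdbc.url=jdbc:jtds:sqlserver://127.0.0.1:1433/ssp
batch.jdbc.driver=net.sourceforge.jtds.jdbc.Driver
batch.jdbc.user=sspadmin
batch.jdbc.password=sspadmin
```

Remember to pair these with `-Dspring.profiles.active=sqlserver` at run time.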
### REPORTING

- `batch.title=ssp_import`
### EMAIL NOTIFICATION

NOTE: ** indicates required if batch.sendEmail is true.

- `batch.sendEmail=true` activate email (default: true)
- `**batch.smtp.host=localhost` host name (default: localhost)
- `**batch.smtp.port=25` port (default: 25)
- `**batch.smtp.protocol=smtp` protocol to be used (default: smtp)
- `**batch.smtp.username=sysadmin` username (default: sysadmin)
- `**batch.smtp.password=password` password (default: password)

RECIPIENTS: comma-separated addresses as strings. Each address must follow RFC 822 syntax (default: "SSP DATA IMPORTER"<sysadmin@localhost>).

- `**batch.email.recipients="SSP DATA IMPORTER"<sysadmin@localhost>`
- `**batch.email.replyTo="SSP DATA IMPORTER"<sysadmin@localhost>`
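Putting the required pieces together, a minimal ssp-importer.properties for a PostgreSQL install might look like the following sketch (the folder paths are illustrative placeholders; the asterisks above are documentation markers and do not appear in the actual file):

```properties
batch.tables.input.folder=file:/opt/ssp-import/input/
batch.tables.process.folder=file:/opt/ssp-import/process/
batch.tables.upsert.folder=file:/opt/ssp-import/upsert/
batch.tables.archive.folder=file:/opt/ssp-import/archive/

batch.jdbc.url=jdbc:postgresql://127.0.0.1:5432/ssp
batch.jdbc.driver=org.postgresql.Driver
batch.jdbc.user=sspadmin
batch.jdbc.password=sspadmin

batch.smtp.host=localhost
batch.smtp.port=25
batch.smtp.protocol=smtp
batch.smtp.username=sysadmin
batch.smtp.password=password
batch.email.recipients="SSP DATA IMPORTER"<sysadmin@localhost>
batch.email.replyTo="SSP DATA IMPORTER"<sysadmin@localhost>
```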
## CSV File Format

- The name of the csv file MUST correspond to the name of the external table to be imported. Example: external_person.csv
- The supported format is comma-delimited. The quote character is " and the escape character is also ".
- Each file MUST have an initial line containing the column names in the format of the external table. For example, the first line for external_faculty_course would contain: faculty_school_id,term_code,formatted_course,title,section_code,section_number
- Partial data updates are supported. However, in addition to the data being updated, the key values for the specific table MUST be supplied (see the example following the key table below).
For SSP version 2.0, the keys for each external table are:

| TABLE | KEY |
|-------|-----|
| external_course | code |
| external_course_program | program_code, course_code |
| external_course_requisite | requiring_course_code, required_course_code, requisite_code |
| external_course_tag | course_code, tag |
| external_course_term | course_code, term_code |
| external_department | code |
| external_division | code |
| external_faculty_course | (NONE) recommended: faculty_school_id, term_code, formatted_course, title |
| external_faculty_course_roster | (NONE) recommended: faculty_school_id, school_id, term_code, formatted_course, title |
| external_person | school_id |
| external_person_note | code |
| external_person_planning_status | school_id |
| external_program | code |
| external_registration_status_by_term | school_id, term_code |
| external_student_academic_program | school_id, degree_code, program_code |
| external_student_financial_aid | school_id |
| external_student_test | school_id, test_code, sub_test_code, test_date, discriminator |
| external_student_transcript | school_id |
| external_student_transcript_course | school_id, term_code, formatted_course, section_code |
| external_student_transcript_term | school_id, term_code |
| external_term | code |
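As a sketch of a partial update (the data values here are made up), a file named external_faculty_course.csv could update only the section_code column by supplying the recommended key columns plus the column being changed; section_number is omitted from the header entirely and is left untouched:

```
faculty_school_id,term_code,formatted_course,title,section_code
sherman123,FA12,Biology 101,Introduction To Biology,B101-001
```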
For a complete list of column names and constraints, please see the relevant mappings for your version of SSP:

- Data Integration Mappings for Version 2.1.0
- Data Integration Mappings for Version 2.0.1
- Data Integration Mappings for Version 2.0.0
- Data Integration Mappings for Version 1.2.1
For questions on csv formatting, please see the expected csv format examples below. Empty strings (including those consisting entirely of whitespace) will be entered as null values.
Acceptable examples:

No quotes:

```
faculty_school_id,term_code,formatted_course,title,section_code,section_number
sherman123,FA12,Biology 101,Introduction To Biology,,
```

All quotes (note the enclosed comma in column 1):

```
"faculty_school_id","term_code","formatted_course","title","section_code","section_number"
"sherman, 123","FA12","Biology 101","Introduction To Biology","",""
```

All quotes with escaped quotes (note the enclosed comma in column 1):

```
"faculty_school_id","term_code","formatted_course","title","section_code","section_number"
"sherman"", 123","FA12","this is a string "", with escaped quotes","Introduction To Biology","",""
```