...
[11:47:44 CST(-0600)] <dmccallum54> non-specific
[11:47:52 CST(-0600)] <TonyUnicon> you've been talking to me too much
[11:48:16 CST(-0600)] <TonyUnicon> ok, so then I'm not sure what we were arguing about on the call
[11:48:34 CST(-0600)] <TonyUnicon> I thought we were talking about what we wanted to do before we wrote to the database in terms of validation
[11:48:55 CST(-0600)] <dmccallum54> what i heard on the call was a proposal to move all validations to database operations
[11:49:12 CST(-0600)] <dmccallum54> specifically to attempt a non-batched insert into stage tables for each raw record
[11:49:18 CST(-0600)] <TonyUnicon> right right, ok I didn't know we definitely wanted to do that
[11:49:28 CST(-0600)] <TonyUnicon> that is the argument I'm making, lean on the DB
[11:50:15 CST(-0600)] <TonyUnicon> i dont think we want to do non-batched inserts
[11:51:04 CST(-0600)] <TonyUnicon> especially if we want to use indices
[11:51:27 CST(-0600)] <TonyUnicon> I think
[11:51:31 CST(-0600)] <TonyUnicon> despite what the doc says
[11:51:46 CST(-0600)] <dmccallum54> if we use the db entirely, one downside is we can report at most one error per row
[11:51:51 CST(-0600)] <TonyUnicon> we can determine the bad row via the database error itself and not any sort of state the framework stores
[11:52:09 CST(-0600)] <dmccallum54> the other is the row-specificity issue that you're talking about now
[11:53:29 CST(-0600)] <TonyUnicon> right
[11:53:33 CST(-0600)] <TonyUnicon> the more in-memory validation we have
[11:53:51 CST(-0600)] <TonyUnicon> the more detail we can give about the validity of the file
[11:54:06 CST(-0600)] <TonyUnicon> we can probably identify all bad rows in one shot, and maybe multiple errors per row
[11:54:09 CST(-0600)] <TonyUnicon> relying on the database
[11:54:16 CST(-0600)] <TonyUnicon> would mean it would fail fast on the first error
[11:54:21 CST(-0600)] <TonyUnicon> which could mean more iteration
[11:54:37 CST(-0600)] <TonyUnicon> so
[11:54:56 CST(-0600)] <TonyUnicon> I think that is a good enough reason to try to put as much validation in the java as we can
[11:55:05 CST(-0600)] <dmccallum54> that is still my vote
[11:55:14 CST(-0600)] <TonyUnicon> the ayes have it
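The trade-off being voted on above can be sketched in Java. This is a hypothetical example (the column rules and class names are illustrative, not from the actual framework): in-memory validation can walk every row in one pass and collect multiple errors per row, where a DB insert would stop at the first failure.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: validate every row up front and collect all problems,
// rather than failing fast on the first bad value the way a DB insert would.
public class RowValidator {

    // Returns row index -> list of error messages; an empty map means the file is clean.
    public static Map<Integer, List<String>> validate(List<String[]> rows) {
        Map<Integer, List<String>> errors = new LinkedHashMap<>();
        for (int i = 0; i < rows.size(); i++) {
            String[] row = rows.get(i);
            List<String> rowErrors = new ArrayList<>();
            if (row.length < 2) {
                rowErrors.add("expected at least 2 columns, got " + row.length);
            } else {
                if (row[0].isEmpty()) {
                    rowErrors.add("column 1 (id) is empty");
                }
                if (!row[1].matches("\\d+")) {
                    rowErrors.add("column 2 must be numeric, got '" + row[1] + "'");
                }
            }
            if (!rowErrors.isEmpty()) {
                errors.put(i, rowErrors);  // multiple errors per row, all rows in one pass
            }
        }
        return errors;
    }
}
```

The point of the map-of-lists shape is that one report back to the user can name every bad row and every problem in it, avoiding the fix-one-error-resubmit iteration loop mentioned above.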
[11:55:30 CST(-0600)] <dmccallum54> poking around on SO re error granularity in batched jdbc statements
[11:55:53 CST(-0600)] <dmccallum54> looks like identifying the bad row might be a bit driver specific, if possible at all
[11:56:30 CST(-0600)] <TonyUnicon> ok, well in that case i'll do my best to put as much info into the logs as we can, at least for postgres and sqlserver
[11:56:31 CST(-0600)] <dmccallum54> i.e. if the driver keeps ploughing ahead after failed statements, getUpdateCounts won't help
[11:57:59 CST(-0600)] <dmccallum54> cool. sounds like we are agreed, then
[11:58:04 CST(-0600)] <TonyUnicon> yep
[11:58:06 CST(-0600)] <TonyUnicon> thanks
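A sketch of the driver-specific behavior discussed above, assuming the standard JDBC contract: when a batch fails, `BatchUpdateException.getUpdateCounts()` either returns a short array (driver aborted at the first failure) or a full-length array with failed entries marked `Statement.EXECUTE_FAILED` (driver ploughed ahead). The helper below is hypothetical; only the `java.sql.Statement` constant is real API.

```java
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for interpreting the array from
// BatchUpdateException.getUpdateCounts(). Which of the two shapes you get
// is driver-specific, per the JDBC spec.
public class BatchErrorReport {

    public static List<Integer> failedRows(int[] updateCounts, int batchSize) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < updateCounts.length; i++) {
            if (updateCounts[i] == Statement.EXECUTE_FAILED) {
                failed.add(i);  // driver continued past the failure and flagged this row
            }
        }
        // If the driver aborted instead, the first missing index is the bad row;
        // anything after it is simply unknown.
        if (failed.isEmpty() && updateCounts.length < batchSize) {
            failed.add(updateCounts.length);
        }
        return failed;
    }
}
```

Either way, at most one pass through the batch yields the failing index (or indices) for the log, which is about the best row-specificity a purely DB-side approach can offer.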
[12:22:11 CST(-0600)] <js70> interesting: https://github.com/42BV/jarb/ https://blog.42.nl/articles/using-database-constraints-in-java/
[12:26:12 CST(-0600)] <js70> so, an outstanding question that I have: we still need a little information to start using the metadata for validation, mainly the table name. that is going to come from the file name, correct? headers contain the column names and away we go?
[12:27:41 CST(-0600)] <dmccallum54> simplest thing would be for the file names to match table names and file column headers to match db column names
[12:27:57 CST(-0600)] <dmccallum54> the result, of course, is that if the db names change, the file protocol changes
[12:28:22 CST(-0600)] <dmccallum54> but… the advantage is that it's totally obvious how to go from our published spec for the db tables to what your CSV files need to look like
[12:29:08 CST(-0600)] <dmccallum54> so my vote is to try to get as far as we can with unmapped correlations between file/table and column/column names
[12:30:20 CST(-0600)] <js70> k.
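The unmapped convention agreed above can be sketched in a few lines. This is an assumption-laden illustration (the file name, class, and method names are made up, not from the actual codebase): the file's base name is the table name, and the CSV header row supplies the column names directly, with no translation layer in between.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the unmapped file/table convention:
// file base name == table name, CSV header names == column names.
public class FileTableConvention {

    // e.g. "External_Person.csv" -> "external_person" (illustrative name only)
    public static String tableName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = (dot >= 0) ? fileName.substring(0, dot) : fileName;
        return base.toLowerCase(Locale.ROOT);
    }

    // Header row "id, First_Name" -> column list used to build the INSERT
    public static List<String> columnNames(String headerLine) {
        String[] parts = headerLine.split(",");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim().toLowerCase(Locale.ROOT);
        }
        return Arrays.asList(parts);
    }
}
```

The downside noted above applies directly: any rename in the DB schema changes what these two functions must produce, i.e. the file protocol itself changes.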