[11:09:48 CST(-0600)] <dmccallum54> TonyUnicon here is the section of the SpringBatch reference docs i was talking about w/r/t error granularity
[11:09:51 CST(-0600)] <dmccallum54> http://docs.spring.io/spring-batch/reference/html-single/index.html#databaseItemWriters
[11:10:57 CST(-0600)] <dmccallum54> what i'd like to do is preserve the ability to report very fine-grained validation errors and the flexibility to aggressively batch inserts to the staging tables for performance (using JdbcBatchItemWriter perhaps)
[11:12:18 CST(-0600)] <dmccallum54> because of that, we can't rely exclusively on db inserts to check the validity of inbound flat file records
[11:12:59 CST(-0600)] <dmccallum54> that's not to say all inserts are guaranteed to succeed, but the idea is to catch as many fine-grained errors as early and as cheaply as possible using in memory validations
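The in-memory validation idea described above can be sketched in plain Java: check each inbound record for fine-grained problems (null natural keys, missing required fields) and collect per-field errors before the record ever reaches a batched writer such as JdbcBatchItemWriter. All names here (`RawPersonRecord`, `ValidationError`, the field set) are hypothetical illustrations, not taken from the actual codebase.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: validate one flat-file record entirely in memory and
// report errors at field granularity, so failures are cheap and precise.
public class InMemoryValidator {

    // Hypothetical inbound record shape.
    public record RawPersonRecord(String schoolId, String firstName, String lastName) {}

    // Fine-grained error: which line, which field, and why.
    public record ValidationError(int lineNumber, String field, String message) {}

    public static List<ValidationError> validate(int lineNumber, RawPersonRecord r) {
        List<ValidationError> errors = new ArrayList<>();
        if (r.schoolId() == null || r.schoolId().isBlank()) {
            errors.add(new ValidationError(lineNumber, "schoolId",
                    "natural key must not be null/blank"));
        }
        if (r.lastName() == null || r.lastName().isBlank()) {
            errors.add(new ValidationError(lineNumber, "lastName",
                    "required field missing"));
        }
        return errors;
    }
}
```

Records that pass this step can then be handed to an aggressively batched insert, with the understanding that only the coarser, DB-level failures remain.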
[11:15:35 CST(-0600)] <dmccallum54> js70… for db metadata… http://docs.oracle.com/javase/6/docs/api/java/sql/DatabaseMetaData.html#getColumns(java.lang.String, java.lang.String, java.lang.String, java.lang.String) and http://docs.oracle.com/javase/6/docs/api/java/sql/DatabaseMetaData.html#getPrimaryKeys(java.lang.String, java.lang.String, java.lang.String)
[11:18:17 CST(-0600)] <TonyUnicon> reading now
[11:23:45 CST(-0600)] <dmccallum54> key non-uniqueness is one example of an invalidity that's not going to be easily detectable until inserts. in that case we're going to need to accept that the error granularity will be unfortunately coarse
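One caveat to the point above: while collisions against rows that already exist in the database can only surface at insert time, duplicates *within* the inbound file itself can still be caught cheaply in memory with a seen-set. A sketch, with key extraction left as a hypothetical upstream step:

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch: detect natural keys that repeat within a single inbound file,
// so only conflicts with pre-existing DB rows are left to fail coarsely
// at insert time.
public class DuplicateKeyCheck {

    /** Returns the natural keys that appear more than once, in first-seen order. */
    public static Set<String> duplicateKeys(List<String> naturalKeys) {
        Set<String> seen = new HashSet<>();
        Set<String> dupes = new LinkedHashSet<>();
        for (String key : naturalKeys) {
            if (!seen.add(key)) {  // add() returns false if already present
                dupes.add(key);
            }
        }
        return dupes;
    }
}
```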
[11:24:03 CST(-0600)] <js70> yep
[11:24:25 CST(-0600)] <dmccallum54> unless we require you to deploy on unix and we just call out to the shell… then it's trivial
[11:25:34 CST(-0600)] <TonyUnicon> so
[11:25:45 CST(-0600)] <TonyUnicon> i'm fine with using the in-memory approach btw
[11:25:55 CST(-0600)] <TonyUnicon> but I think what the doc itself is saying is
[11:26:04 CST(-0600)] <TonyUnicon> there is no way for the framework to know which one failed
[11:26:36 CST(-0600)] <TonyUnicon> but you should be able to tell which row failed from the database error, which you can propagate into the logs
[11:27:11 CST(-0600)] <TonyUnicon> if you're worried about partial writes
[11:27:54 CST(-0600)] <TonyUnicon> if we really want restartability to be an option we have to be able to pick up after a failed batch
[11:28:24 CST(-0600)] <TonyUnicon> and from what I gather from the framework docs, it may come for free
[11:29:11 CST(-0600)] <TonyUnicon> it will be easy to see as I write the code
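The "for free" restartability mentioned above comes from Spring Batch tracking step state in its JobRepository metadata tables. A sketch of what a fault-tolerant, restartable chunk step might look like in Java config; bean names, chunk size, the record type, and the reader/processor/writer wiring are all placeholders, and this uses the Spring Batch 4 builder style:

```java
// Hypothetical step config: aggressive chunked inserts via JdbcBatchItemWriter,
// with skip logic so individual bad lines don't fail the whole job. Restart
// picks up from the last committed chunk via JobRepository metadata.
@Bean
public Step stagingLoadStep(StepBuilderFactory steps,
                            ItemReader<RawPersonRecord> reader,
                            ItemProcessor<RawPersonRecord, RawPersonRecord> validator,
                            JdbcBatchItemWriter<RawPersonRecord> writer) {
    return steps.get("stagingLoadStep")
            .<RawPersonRecord, RawPersonRecord>chunk(500)   // batch size: placeholder
            .reader(reader)
            .processor(validator)
            .writer(writer)
            .faultTolerant()
            .skip(FlatFileParseException.class)  // skip unparseable lines
            .skipLimit(100)
            .build();
}
```

On a restart after a failed batch, the framework re-runs from the last successfully committed chunk boundary rather than from the top of the file.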
[11:32:16 CST(-0600)] <TonyUnicon> and btw, if the validations have to be done by running queries
[11:32:24 CST(-0600)] <TonyUnicon> and those queries aren't batched
[11:32:39 CST(-0600)] <TonyUnicon> won't that be expensive as well?
[11:32:57 CST(-0600)] <TonyUnicon> just playing devil's advocate
[11:33:22 CST(-0600)] <js70> you make a good one.
[11:33:51 CST(-0600)] <TonyUnicon> just a single intersection query?
[11:34:08 CST(-0600)] <TonyUnicon> i could look at the groovy code
[11:35:49 CST(-0600)] <js70> It was my understanding that the initial validation would not be against the database. The groovy was slow: each bean was processed with a select, then an update or insert.
[11:36:24 CST(-0600)] <js70> but after a bean validation processing step
[11:37:32 CST(-0600)] <TonyUnicon> if my understanding of the 'itemWriter' validation is correct, we want to catch these natural-key-type errors against existing external data… you've had to go to the db
[11:37:35 CST(-0600)] <js70> In use it made a big difference validating the customer data. I know that is a concern
[11:38:34 CST(-0600)] <TonyUnicon> but i guess you can write one giant query for the whole file
[11:38:34 CST(-0600)] <js70> eventually, first step was to ensure Natural Keys were not null.
[11:38:50 CST(-0600)] <TonyUnicon> that would be the raw data validation i would expect
[11:39:09 CST(-0600)] <TonyUnicon> i don't think the raw data validation should need to concern itself with existing data
[11:39:09 CST(-0600)] <dmccallum54> the "raw data" validations that jim was running against beans did not involve database interactions
[11:39:16 CST(-0600)] <TonyUnicon> right
[11:39:21 CST(-0600)] <TonyUnicon> and I would expect that
[11:39:24 CST(-0600)] <dmccallum54> and you're right… it did not concern itself with existing data
[11:40:00 CST(-0600)] <TonyUnicon> so would you expect the validation step to happen in one giant query
[11:40:14 CST(-0600)] <TonyUnicon> or on an unbatched, query-per-row basis?
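The "one giant query" option raised above can be sketched as a single parameterized IN-clause existence check over all the natural keys in the file (or per chunk), instead of one select per row. Table and column names here are hypothetical:

```java
import java.util.Collections;
import java.util.stream.Collectors;

// Sketch: build one batched existence query with a placeholder per key,
// to be executed once per file (or per chunk) rather than once per row.
public class BatchedExistenceQuery {

    /** Builds a parameterized IN-clause query for the given number of keys. */
    public static String forKeyCount(int keyCount) {
        String placeholders = Collections.nCopies(keyCount, "?")
                .stream()
                .collect(Collectors.joining(", "));
        return "SELECT school_id FROM person_staging WHERE school_id IN ("
                + placeholders + ")";
    }
}
```

The rows returned mark which inbound keys already exist, so the collision errors can be reported per-key before (or instead of) letting the batched insert fail coarsely. Very large files would need the key list split into chunks to respect database limits on IN-list size.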