Better error descriptions if CSV is invalid #215

kspurgin · 2024-08-02T16:49:12Z

Improve error message for invalid character encoding in CSV
0e1b7e4

- As long as the CSV is getting invalid_encoding error(s), only the new
  message will be shown. This is because encoding errors sometimes cause
  structural errors that cascade through the file, producing hundreds of
  structural error messages in which the message about encoding gets
  lost. Often, if the encoding is fixed, the structural errors
  disappear. Showing only the encoding message makes it clear that
  fixing the invalid character encoding is required.
- Shows the first occurrence of each invalid character in context, so
  users can more easily find/fix the issues.
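
As a rough illustration of the "first occurrence of each invalid character in context" idea, here is a minimal sketch. It is not the code from this commit: the method name `first_invalid_sequences`, the `context` parameter, and the shape of the returned hash are all assumptions made for the example, and it assumes the file is expected to be UTF-8.

```ruby
# Hypothetical helper (not from the PR): report the first place each
# invalid byte appears, with a little surrounding text for context.
def first_invalid_sequences(text, context: 10)
  seen = {}
  # Work line by line on a binary copy so splitting never trips over bad bytes.
  text.b.each_line.with_index(1) do |raw, lineno|
    line = raw.force_encoding(Encoding::UTF_8)
    next if line.valid_encoding?

    line.each_char.with_index do |char, idx|
      next if char.valid_encoding?

      key = char.unpack1("H*") # hex of the offending byte(s), e.g. "e9"
      next if seen.key?(key)   # only the first occurrence is reported

      start = [idx - context, 0].max
      # Replace bad bytes with "?" so the snippet is safe to display.
      snippet = line[start, (context * 2) + 1].scrub("?").chomp
      seen[key] = { line: lineno, context: snippet }
    end
  end
  seen
end

sample = "name,city\nJos\xE9,Paris\nRen\xE9e,Lyon\nZo\xEB,Nice\n"
first_invalid_sequences(sample).each do |bytes, info|
  puts "0x#{bytes} first seen on line #{info[:line]}: #{info[:context]}"
end
```

Reporting each distinct byte once keeps the output short even when the same mis-encoded character appears on hundreds of rows.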

Show errors for Csvlint-valid files CSV library can't parse
6b3586f

Resolves #213
Currently, you are allowed to proceed to run the pre-processing job
with such a file, but the job fails when processing hits a row that
causes a CSV::MalformedCSVError. Unfortunately, this often occurs at
the very end of the file.

This commit adds a separate csv_parse_validator_for(batch) method
that is only called if Csvlint reports the file is valid.

CSV-Parsing the whole file before creating the batch may seem like
overkill, but (a) I can't come up with a better way to catch all
possible issues; and (b) it seems better than using system
resources/user time to pre-process a whole batch that is only going to
fail at the end.
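
The full-parse check described above could look roughly like the sketch below. This is an assumption-laden illustration, not the actual `csv_parse_validator_for(batch)` from this commit: the method name `csv_parse_errors`, its IO argument, and the array-of-messages return value are hypothetical. It relies only on Ruby's standard `csv` library, which raises `CSV::MalformedCSVError` when it cannot parse a row.

```ruby
require "csv"
require "stringio"

# Hypothetical stand-in for the validator: read every row once and
# collect the parser's complaint instead of letting the job crash later.
def csv_parse_errors(io)
  CSV.new(io, headers: true).each { |_row| } # parse every row, discard it
  []
rescue CSV::MalformedCSVError => e
  [e.message]
end

good = StringIO.new("id,title\n1,Vase\n2,Bowl\n")
bad  = StringIO.new("id,title\n1,Vase\n2,\"Bowl\n") # unclosed quote on the last row

puts csv_parse_errors(good).inspect
puts csv_parse_errors(bad).inspect
```

Because the parser stops at the first malformed row, a check like this reports one problem per run, and the user would re-upload after each fix.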

Thinking forward: If we are eventually moving to using the database to
store information about each row and its state/status, then this step
can be reworked to serve a dual purpose: (1) prepare each row to be
added to database if we are able to prepare all rows successfully;
and (2) if not able to prepare all rows successfully, show error
messages and destroy batch.


kspurgin · 2024-08-09T01:36:34Z

Splitting what this was originally going to cover into two PRs

kspurgin marked this pull request as draft August 2, 2024 16:53

kspurgin force-pushed the DRYD-1488 branch 2 times, most recently from 7ad66fe to 4ad4944 Compare August 2, 2024 21:54

kspurgin force-pushed the DRYD-1488 branch from 4ad4944 to 923367a Compare August 2, 2024 21:55

kspurgin marked this pull request as ready for review August 2, 2024 21:58

kspurgin requested review from mark-cooper and removed request for mark-cooper August 2, 2024 21:58

kspurgin marked this pull request as draft August 2, 2024 22:20

kspurgin marked this pull request as ready for review August 9, 2024 01:36

kspurgin closed this Aug 9, 2024
