Better error descriptions if CSV is invalid #215
Closed
Improve error message for invalid character encoding in CSV
0e1b7e4
message will be shown. This is because encoding errors sometimes cause
structural errors that cascade through the file, causing hundreds of
error messages about structure in which the message about encoding
gets lost. Often if the encoding is fixed, the structural errors
disappear. This makes it clear that fixing the invalid
character encoding is the required step.
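
The "show only the encoding message" behavior described above could be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `LintError`, `errors_to_display`, and the `:invalid_encoding` type symbol are hypothetical stand-ins for whatever error objects the validator actually returns.

```ruby
# Hypothetical error record; the real validator's error objects may differ.
LintError = Struct.new(:type, :row, :message)

# Assumption: encoding problems are tagged with the type :invalid_encoding.
# If any are present, return only those, so the encoding message is not
# buried under the structural errors that tend to cascade from it.
def errors_to_display(errors)
  encoding_errors = errors.select { |e| e.type == :invalid_encoding }
  encoding_errors.empty? ? errors : encoding_errors
end
```

When no encoding errors are present, the full list is returned unchanged, so structural problems are still reported in the ordinary case.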
users can more easily find/fix the issues
Show errors for Csvlint-valid files CSV library can't parse
6b3586f
Resolves #213
Currently, you are allowed to proceed to run the pre-processing job
with such a file, but the job fails when processing hits a row that
causes a CSV::MalformedCSVError. Unfortunately, this often occurs at
the very end of the file.
This commit adds a separate `csv_parse_validator_for(batch)` method
that is only called if Csvlint reports the file is valid.
CSV-parsing the whole file before creating the batch may seem like
overkill, but (a) I can't come up with a better way to catch all
possible issues; and (b) it seems better than using system
resources/user time to pre-process a whole batch that is only going to
fail at the end.
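
As a rough sketch of the full-file parse check described above (not the actual `csv_parse_validator_for(batch)` implementation, whose details are not shown here), something like the following would surface a `CSV::MalformedCSVError` before any batch is created. The helper name `csv_parse_error_for` and the choice to take a file path are assumptions made for illustration.

```ruby
require "csv"

# Hypothetical helper: reads every row of the file with Ruby's CSV library.
# Returns nil if the whole file parses, or the parser's error message
# if any row raises CSV::MalformedCSVError.
def csv_parse_error_for(path)
  # The block is intentionally empty; we only care whether parsing succeeds.
  CSV.foreach(path, headers: true) { |_row| }
  nil
rescue CSV::MalformedCSVError => e
  e.message
end
```

Because `CSV.foreach` streams rows rather than loading the file into memory, the cost of this check is mostly the time to read the file once, which is the trade-off weighed above.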
Thinking forward: If we are eventually moving to using the database to
store information about each row and its state/status, then this step
can be reworked to serve a dual purpose: (1) prepare each row to be
added to the database, if we are able to prepare all rows successfully;
and (2) if we are not able to prepare all rows successfully, show error
messages and destroy the batch.