emp::DataFile output should comply with CSV standard RFC 4180 #489
Is your feature request related to a problem? Please describe.
Serialization through emp::DataFile and deserialization through emp::File default to the CSV format, but by default they support only a subset of it. For example, this file
should be read as
according to RFC 4180. However, it would currently read as
Note that in the RFC 4180-compliant version, the quotes around "g" are interpreted as enclosing a single field, making the actual value g.
In the current reading, the quotes are being interpreted literally, so the field reads as """g""".
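To make the quoting rule concrete, here is a minimal sketch of RFC 4180-style field splitting for a single record. `ParseRfc4180Line` is a hypothetical helper, not part of Empirical, and it assumes the record contains no embedded line breaks.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not an Empirical API): split one CSV record into
// fields using RFC 4180 quoting rules. Assumes no embedded line breaks.
std::vector<std::string> ParseRfc4180Line(const std::string& line) {
  std::vector<std::string> fields;
  std::string cur;
  bool in_quotes = false;
  for (std::size_t i = 0; i < line.size(); ++i) {
    const char c = line[i];
    if (in_quotes) {
      if (c == '"') {
        if (i + 1 < line.size() && line[i + 1] == '"') {
          cur += '"';  // "" inside a quoted field is one literal quote
          ++i;
        } else {
          in_quotes = false;  // closing quote of the field
        }
      } else {
        cur += c;  // commas are literal while inside quotes
      }
    } else if (c == '"') {
      in_quotes = true;  // opening quote, not part of the value
    } else if (c == ',') {
      fields.push_back(cur);  // unquoted comma ends the field
      cur.clear();
    } else {
      cur += c;
    }
  }
  fields.push_back(cur);  // last field has no trailing comma
  return fields;
}
```

With this sketch, the record `a,"b,c","g"` splits into three fields, `a`, `b,c`, and `g`, whereas a plain split on commas would yield four fields with the quote characters kept literally.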
Describe the solution you'd like
Probably, for performance reasons, the default behavior of emp::DataFile and emp::File should not change. However, RFC 4180 modes or classes should be available.
In debug mode, emp::DataFile and emp::File should probably warn of RFC 4180 noncompliance where pertinent. An easy way to do this would be to compare results against an RFC 4180-enabled interpretation and warn when the naive interpretation differs.
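A full comparison needs an RFC 4180 parser. As a cheaper proxy, and assuming single-line records, the naive comma split can only disagree with the RFC 4180 reading when the line contains a double quote, so a debug check could be sketched as below. All names here are hypothetical, and `NDEBUG` stands in for whatever debug switch the library actually uses.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Naive reading: split on every comma, keep quote characters literally.
std::vector<std::string> NaiveSplit(const std::string& line) {
  std::vector<std::string> fields;
  std::string cur;
  for (char c : line) {
    if (c == ',') {
      fields.push_back(cur);
      cur.clear();
    } else {
      cur += c;
    }
  }
  fields.push_back(cur);
  return fields;
}

// Hypothetical check: true when the naive reading may disagree with an
// RFC 4180 reading. Assumes records never span multiple lines.
bool MayDifferFromRfc4180(const std::string& line) {
  return line.find('"') != std::string::npos;
}

// Hypothetical debug-mode wrapper: behavior is unchanged, but a warning
// is printed to stderr when the build is not compiled with NDEBUG.
std::vector<std::string> ReadLineChecked(const std::string& line) {
#ifndef NDEBUG
  if (MayDifferFromRfc4180(line)) {
    std::cerr << "warning: line would read differently under RFC 4180: "
              << line << '\n';
  }
#endif
  return NaiveSplit(line);
}
```

Because the check compiles away in release builds, the default fast path keeps its current cost.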
Describe alternatives you've considered
For serialization, users could currently get part of the way by setting the beginning, separator, and end delimiters to a double quote ("), quote-comma-quote (","), and a double quote ("), respectively. This delimiter kludge wouldn't work as a deserialization solution because it would fail on plain csv files like
For serialization, this delimiter kludge would add unnecessary quotes to lots of csv output, and it would not properly escape double quotes (") inside output strings as doubled quotes ("").
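By contrast, an RFC 4180-style writer quotes a field only when it has to and doubles any embedded quotes. The following is a minimal sketch under that reading of the RFC; `EscapeRfc4180Field` and `JoinRfc4180Record` are hypothetical names, not Empirical functions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: quote a field only if it contains a comma, a double
// quote, or a line break, and write each embedded " as "".
std::string EscapeRfc4180Field(const std::string& field) {
  const bool needs_quotes =
      field.find_first_of(",\"\r\n") != std::string::npos;
  if (!needs_quotes) return field;  // plain fields are written unchanged
  std::string out = "\"";
  for (char c : field) {
    if (c == '"') {
      out += "\"\"";  // escape an embedded quote by doubling it
    } else {
      out += c;
    }
  }
  out += '"';
  return out;
}

// Hypothetical helper: join escaped fields into one comma-separated record
// (without the trailing line break).
std::string JoinRfc4180Record(const std::vector<std::string>& fields) {
  std::string line;
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (i > 0) line += ',';
    line += EscapeRfc4180Field(fields[i]);
  }
  return line;
}
```

For instance, the fields `a` and `b,c` would be written as `a,"b,c"`, leaving the first field unquoted.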
Additional context
Find RFC 4180 here.
The pertinent content is: