Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Remove to string for dictionary pages (#138)
# Problem As reported in #121 and #131 there are issues with the parsing of `FIXED_LEN_BYTE_ARRAY` values. This may be caused by the fact that they are converted to strings as part of the decoding process for DICTIONARY_PAGE types. This should close both issues. Solution ======== By removing the `toString` conversion this removes the error, and byte array values get returned as Buffers, as well as also returning numeric types as numbers, and not as strings. To try to check that the change would only impact things as expected, I also added some regression testing for all of the reference files. For this I've added a check for the first record returned by each reference file. The process to get this data was to add the test, see the error message and then fix the test. With this regression in place, I then removed the `toString` call, and saw how the tests failed. The errors were all in places where the schema was returning either numeric types or things like timestamps or byte arrays. Previously these would be getting stringified if the file was using a dictionary page. **Note, this does cause a breaking change to the way that these dictionary page files are decoded, as the values being returned will no longer be strings by default, and this may create issues for some consumers as the runtime types will change.** There may be an improvement to add something like snapshot testing for the reference-test files so that the whole file can be compared without having to manually write it all out. I know this is natively supported in jest/vitest but I'm not sure how to do it in mocha. ## Change summary: - Add testing to validate the first record in each reference file - Remove the `toString` conversion when decoding dictionary pages ## Steps to Verify: It may be worth double checking that some of the schemas for some of the reference files line up with the values that I've added. I've split it into two commits so that the first commit is adding in the regression, and the second commit is removing the `toString`. Additionally I've ran the test added in #121, the test needs to be updated to either parse the age value from a buffer to the number, or to compare the ages as Buffers. The `id` is also now returned as a bigint as the column is an int64. ``` assert.deep,Equal(records, [{ id: 10n, age: Buffer.from([0,0,0,0,0,0,0,0,0,0,0,9,199,101,36,0]), full_name: 'Jonathan Cohen' }, { id: 11n, age: Buffer.from([0,0,0,0,0,0,0,0,0,0,0,0,178,208,94,0]), }]) ``` The data recorded in #131 is returned as a buffer, but I've not fully checked the contents of that buffer that is returned.
- Loading branch information