Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail gracefully on bad UTF-8. #19

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

krokodilerian
Copy link

(an updated version, missed one change)

For some reason it's possible to get an invalid UTF8 input in downgrade,
which leads to a very weird stop of processing and errors. The problem
was observed on Request Tracker 4.2.8 with emails with utf8-encoded subject
lines.

This seems like a really hacky fix, and the issue might be somewhere else
on the stack, like some kind of a failure to mark a specific string as UTF-8,
but I have no idea how to track that down.

For some reason it's possible to get an invalid UTF8 input in downgrade,
which leads to a very weird stop of processing and errors. The problem
was observed on Request Tracker 4.2.8 with emails with utf8-encoded subject
lines.

This seems like a really hacky fix, and the issue might be somewhere else
on the stack, like some kind of a failure to mark a specific string as UTF-8,
but I have no idea how to track that down.
@ilmari
Copy link
Collaborator

ilmari commented Oct 13, 2015

This looks like you're passing character strings to DBD::Pg, but not using client_encoding=utf8.
Since version 3.2, the behaviour has been to deal in character strings for client_encoding=utf8 and byte strings (i.e. you need to do your own encoding and decoding) for other encodings.

Falling back to using the UTF-8 representation for non-downgradeable strings means you get inconsistently encoded data in your database: code points in the 0-255 range are represented as-is, while those above are represented by characters corresponding to the bytes in their UTF-8 encoding.

The error message and documentation of this clearly needs improving, though.

@pali
Copy link
Contributor

pali commented Aug 21, 2019

Previous code was correct, this change is wrong. You cannot pass values above 0xFF to binary fields which are used by pg_downgraded_sv as binary data must be in range 0x00 ... 0xFF. And when second parameter of sv_utf8_downgrade is false then that function throws an error on invalid data input -- so correct behavior.

@turnstep
Copy link
Contributor

@ilmari Does this need any work? A doc patch or anything?

@turnstep
Copy link
Contributor

Anyone want to improve the docs on this?

@pali
Copy link
Contributor

pali commented May 24, 2021

@turnstep: I agree that problem is not in code, but in "documentation", which can be improved. But I really do not have time to write documentation updates. But if somebody else is going to prepare documentation patch, I can find time for reviewing it!

@turnstep: But in case nobody is going to prepare and send documentation patch, please at least open an issue ticket that documentation needs to be improved. So in case somebody would be confused again we can at least redirect to documenation issue ticket. Meaning information about "documentation improvement is needed" is not lost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants