Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modernize encoding conversion for FPDF #260

Merged
merged 4 commits into from
Sep 5, 2024

Conversation

kohlerdominik
Copy link
Contributor

@kohlerdominik kohlerdominik commented Sep 3, 2024

Fixes #245

I'm not entirly shure what the method toUtf8 did, but surely not convert to UTF-8. It converted away from UTF-8; my best guess it that it's to replace characters that are not allowed in the payment slip?

ISO-8859-1 should be correct here, as the specs require Latin Character Set. But maybe there was a specific reason that the previous characterset was chosen?

Also, iconv is not able to handle MultiByteCharacters (🦓🦓🦓), so maybe we can fix that by using mb_* instead. I explicitly added the Symfony mb_*-polyfill, as its an indirect dependency anyway, so it will not bloat the package.

This PR should be considered as breaking change.

@sprain
Copy link
Owner

sprain commented Sep 4, 2024

Thanks for the PR 🙌

Some background:
The initial functionality stems from this PR. The approach seems to originate here.

The method name toUtf8() is definitely wrong – as you said, it does the opposite. However, I think substituteUnsupportedCharacters() is also not the best, as it could be confused with what we did here.

So maybe we could call it setEncoding()? It really is about encoding.

Regarding the encoding itself, I think your approach here should be fine. But I am no expert on this topic, so it would be great if @supercosh could give it a test run, as mentioned in #245 (comment).

This PR should be considered as breaking change.

I am not sure if it's really needed. Is there another reason besides unsupported horse emojis? 😁🦓🦓

@kohlerdominik
Copy link
Contributor Author

Hi @sprain

Thanks for your information, that cleared up a lot.

Your link provides information, why this was introduced to FpdfOutput. So my guess was not accurate. The real problem is, that FPDF does only support ISO-8859-1 and its extension Windows-1252. This should be easy fixable by mb_convert_encoding: it supports Windows-1252 without relying on the environment (OS). Only problem is, that if the mbstring-extension is not available, the Symfony-Polyfill will fall back to iconv again.

For TcPdfOutput, the conversion seems completely useless (at least acording to the tests?). Maybe you just copied over from FpdfOutput in 65f5b6a?. So I suggest to remove it there.

So I just updated my PR. This should be 100% backwards compatible, while fixing #245:

  • If ext-mbstring is installed, it should work perfectly
  • If fallback symfony/polyfill-mbstring is used, it should not throw an error for @supercosh, as i changed the encoding from Windows-1252 to CP1252. In mbstring, they are aliases anyway.

It might change output in edge-cases though, so the release should be tagged accordingly.

@kohlerdominik kohlerdominik changed the title Fixed bad support for substitute of unsupported characters in PDF output Modernize encoding conversion for FPDF Sep 4, 2024
@sprain sprain merged commit 3805597 into sprain:master Sep 5, 2024
9 checks passed
@supercosh
Copy link

Regarding the encoding itself, I think your approach here should be fine. But I am no expert on this topic, so it would be great if @supercosh could give it a test run, as mentioned in #245 (comment).

My answer is a bit late, but I can confirm the fix is working. I see that the fix is already in the master and released. Thank you all for the help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Character set windows-1252 should be CP1252 under Unix
3 participants