Baseencoding fallback #669

Merged 4 commits on Feb 2, 2024

Conversation

GreyWyvern
Contributor

When a document doesn't include a BaseEncoding header, StandardEncoding should be assumed as the default instead of an empty string.

Type of pull request

  • Bug fix (involves code and configuration changes)

About

Some short-and-sweet documents may not include a BaseEncoding header. For that case, the PDF Reference 1.7 describes StandardEncoding as the default.

Chapter 5, page 426:

Latin-text font programs produced by Adobe Systems use the Adobe standard encoding, often referred to as StandardEncoding. The name StandardEncoding has no special meaning in PDF, but this encoding does play a role as a default encoding.

Section 5.5, page 431:

  • If the Encoding entry is a dictionary, the table is initialized with the entries from the dictionary's BaseEncoding entry (see Table 5.11). Any entries in the Differences array are used to update the table. Finally, any undefined entries in the table are filled using StandardEncoding.
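
The three steps quoted above can be sketched roughly as follows. This is an illustrative Python sketch, not pdfparser's PHP code: the function name `build_encoding_table`, the `BASE_TABLES` lookup, and the shape of the parsed dictionary are all assumptions made for the example, and the glyph tables are tiny excerpts rather than the full encodings.

```python
# Tiny illustrative excerpts; the real tables in the PDF Reference cover far more codes.
STANDARD_ENCODING = {0x27: "quoteright", 0x41: "A", 0x42: "B", 0x60: "quoteleft"}
BASE_TABLES = {
    "WinAnsiEncoding": {0x27: "quotesingle", 0x41: "A", 0x42: "B", 0x60: "grave"},
}


def build_encoding_table(encoding_dict):
    """Build a code -> glyph-name table from a parsed Encoding dictionary (hypothetical shape)."""
    # Step 1: start from the BaseEncoding table, if one is named and known.
    base_name = encoding_dict.get("BaseEncoding", "")
    table = dict(BASE_TABLES.get(base_name, {}))

    # Step 2: apply Differences. A number sets the current code; each name
    # that follows is assigned to the current code, which then advances by one.
    code = 0
    for item in encoding_dict.get("Differences", []):
        if isinstance(item, int):
            code = item
        else:
            table[code] = item
            code += 1

    # Step 3: fill any still-undefined codes from StandardEncoding.
    for std_code, glyph in STANDARD_ENCODING.items():
        table.setdefault(std_code, glyph)
    return table


# No BaseEncoding at all: everything not overridden comes from StandardEncoding.
no_base = build_encoding_table({"Differences": [0x41, "Alpha"]})
with_base = build_encoding_table(
    {"BaseEncoding": "WinAnsiEncoding", "Differences": [0x41, "Alpha", "Beta"]}
)
```

In this sketch a missing BaseEncoding leaves the table empty after step 1, so step 3 supplies every remaining entry from StandardEncoding, which is the behavior the quoted passage describes.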

If the result of checking for the BaseEncoding returns an empty string, use StandardEncoding as the value instead. Resolves #665.
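
As a rough illustration of that fallback (in Python rather than the library's PHP, and with a hypothetical helper name `resolve_base_encoding` that does not come from pdfparser), the change amounts to substituting a default name when the lookup comes back empty:

```python
DEFAULT_BASE_ENCODING = "StandardEncoding"


def resolve_base_encoding(declared):
    """Return the declared BaseEncoding name, or the default when none was found.

    `declared` stands in for whatever the lookup returned; an empty string
    (or None) is treated as "no BaseEncoding present". Hypothetical helper.
    """
    return declared if declared else DEFAULT_BASE_ENCODING


missing = resolve_base_encoding("")
present = resolve_base_encoding("WinAnsiEncoding")
```

A declared encoding passes through unchanged; only the empty result is replaced.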

Checklist for code / configuration changes

  • Please add at least one test case (unit test, system test, ...) to demonstrate that the change is working. If existing code was changed, your tests cover these code parts as well.
  • Please run PHP-CS-Fixer before committing, to conform with our coding styles. See https://github.com/smalot/pdfparser/blob/master/.php-cs-fixer.php for more information about our coding styles.
  • In case you fix an existing issue, please do one of the following:
    • Write in this text something like fixes #1234 to outline that you are providing a fix for the issue #1234.

When a document doesn't include a BaseEncoding, 'StandardEncoding' should be assumed as the default instead of an empty string.
@GreyWyvern
Contributor Author

PHP CS Fixer is complaining about indentation in Document.php, PDFObject.php and RawData\RawDataParser.php. Files I didn't even modify. :( Running PHP CS Fixer on my local (Windows) machine doesn't find these issues either.

@k00ni k00ni added the fix label Jan 25, 2024
@k00ni
Collaborator

k00ni commented Jan 25, 2024

I merged #670 into master, which fixes these coding style issues. Please merge master into your branch to get rid of them.

@k00ni k00ni merged commit 4db3b81 into smalot:master Feb 2, 2024
29 checks passed
@k00ni
Collaborator

k00ni commented Feb 2, 2024

Thank you!

@GreyWyvern GreyWyvern deleted the baseencoding-fallback branch February 14, 2024 15:35
Development

Successfully merging this pull request may close these issues.

Attempting to parse a PDF with a form field with two dots in it causes pdfparser to throw an exception
2 participants