Fix for adjacent escaped slashes and escaped parentheses in strings #711

Merged: 2 commits, Jun 5, 2024

GreyWyvern
Contributor

Type of pull request

  • Bug fix (involves code and configuration changes)

About

The regexp in formatContent() that replaces (string) literals only looked back two characters for escaped backslashes. When an escaped backslash (\\) immediately preceded an escaped parenthesis (\( or \)), the script misread the sequence \\\( as an escaped backslash followed by an unescaped parenthesis. As a result, the loop either never found the "end" of the string (for an open parenthesis) or found it prematurely (for a close parenthesis).
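To illustrate the failure mode, here is a small Python sketch. NAIVE_UNESCAPED is a hypothetical stand-in for a two-character backcheck, not the actual regexp from formatContent(). It is compared against the correct rule: a parenthesis is escaped exactly when an odd number of consecutive backslashes precede it. The helper names are also hypothetical.

```python
import re
from typing import List

# Hypothetical two-character backcheck (NOT the actual pdfparser regexp):
# a parenthesis counts as unescaped if no backslash precedes it, or if the
# two characters before it are \\ (assumed to be an escaped backslash).
NAIVE_UNESCAPED = re.compile(r"(?<!\\)[()]|(?<=\\\\)[()]")


def naive_unescaped_positions(s: str) -> List[int]:
    """Indices of parentheses the two-character backcheck treats as unescaped."""
    return [m.start() for m in NAIVE_UNESCAPED.finditer(s)]


def is_escaped(s: str, i: int) -> bool:
    """s[i] is escaped iff an odd number of consecutive backslashes precede it."""
    run = 0
    j = i - 1
    while j >= 0 and s[j] == "\\":
        run += 1
        j -= 1
    return run % 2 == 1


def correct_unescaped_positions(s: str) -> List[int]:
    """Indices of parentheses that are really unescaped."""
    return [i for i, c in enumerate(s) if c in "()" and not is_escaped(s, i)]


def find_string_end(s: str, positions: List[int]) -> int:
    """Index of the ')' closing the string opened at positions[0], or -1."""
    depth = 0
    for i in positions:
        depth += 1 if s[i] == "(" else -1
        if depth == 0:
            return i
    return -1


# Escaped backslash followed by an escaped "(" inside a string literal.
opening = r"(a\\\(b)"
print(find_string_end(opening, naive_unescaped_positions(opening)))    # -1: end never found
print(find_string_end(opening, correct_unescaped_positions(opening)))  # 7

# Escaped backslash followed by an escaped ")".
closing = r"(a\\\)b)"
print(find_string_end(closing, naive_unescaped_positions(closing)))    # 5: premature end
print(find_string_end(closing, correct_unescaped_positions(closing)))  # 7
```

The two sample inputs reproduce both symptoms described above: the escaped "(" is counted as an opener, so depth never returns to zero, and the escaped ")" is counted as a closer, so the string ends two characters early.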

The fix performs a string replacement that first removes all escaped backslashes and then all escaped parentheses; neither is needed when the goal is only to check for balanced, unescaped parentheses. The same backslash removal is also added to the inline images section above, for the same reason.
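The replacement approach can be sketched in Python as follows. This is an illustration of the idea, not the PHP code merged in this PR; strip_escapes, strip_escapes_wrong_order and has_balanced_parens are hypothetical names, and the sketch assumes that only escaped backslashes and escaped parentheses matter for the balance check.

```python
def strip_escapes(s: str) -> str:
    # Step 1: drop every escaped backslash (the two characters \\).
    # In a run such as \\\( this leaves only \( behind.
    s = s.replace("\\\\", "")
    # Step 2: drop escaped parentheses (\( and \)). Any parenthesis left
    # after this is unescaped and takes part in nesting.
    return s.replace("\\(", "").replace("\\)", "")


def strip_escapes_wrong_order(s: str) -> str:
    # For comparison only: removing \( before \\ turns the run \\( (an escaped
    # backslash, then an unescaped parenthesis) into a stray backslash and
    # loses the real parenthesis.
    s = s.replace("\\(", "").replace("\\)", "")
    return s.replace("\\\\", "")


def has_balanced_parens(s: str) -> bool:
    """True if the unescaped parentheses in s are balanced."""
    depth = 0
    for c in strip_escapes(s):
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


print(strip_escapes(r"(a\\\(b)"))              # (ab)
print(strip_escapes(r"(a\\(b))"))              # (a(b))
print(strip_escapes_wrong_order(r"(a\\(b))"))  # (a\b))
print(has_balanced_parens(r"(a\\\(b)"))        # True
print(has_balanced_parens(r"(a\\(b)"))         # False
```

Because the stripped copy is only used to test balance, the shifted character positions it produces do not matter; this is what lets a plain string replacement stand in for escape-aware parsing.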

Resolves #709.

Checklist for code / configuration changes

If you changed the code or configuration, please read each of the following checkboxes, as they contain valuable information:

  • Please add at least one test case (unit test, system test, ...) to demonstrate that the change works. If existing code was changed, your tests should cover those code parts as well.
  • Please run PHP-CS-Fixer before committing to confirm that your changes conform to our coding style. See https://github.com/smalot/pdfparser/blob/master/.php-cs-fixer.php for more information about our coding style.
  • If you fix an existing issue, please do one of the following:
    • Write something like "fixes #1234" in this text to indicate that you are providing a fix for issue #1234.

@k00ni
Collaborator

k00ni commented May 14, 2024

@huihuangjiuai Does this fix #709 for you?

k00ni added the fix label on May 14, 2024

k00ni left a comment


👍 Short and clean.

I only moved the test code into a separate function to keep the code readable. Otherwise, test code for different issues gets mixed together, which makes debugging and maintenance more difficult in the future.

k00ni self-assigned this on May 15, 2024
k00ni merged commit bd8abee into smalot:master on Jun 5, 2024
29 checks passed
Successfully merging this pull request may close these issues:

  • preg_match(): Compilation failed: regular expression is too large at offset 38605
2 participants