Commit
Better octal and hex-entity decode (#640)
* Better octal and hex-entity decode

  Octal strings can include a series of backslashes of arbitrary length. If there is an odd number of backslashes, a following octal code is valid, but if there is an even number, the following octal code should not be translated.

  Previously PdfParser would only account for two backslashes directly preceding an octal code. A commit from in-progress PR #634 extended this to three, which probably covers 99.99% of all cases. This change ups that to 100%: a string can contain any number of backslashes in a row, and codes will be correctly translated.

  Also update decodeEntities() to use a preg_replace_callback() instead of the bulkier preg_split() + foreach loop. Make sure it matches all hexadecimal digits, including a-f.

  Add new tests for both of these.

* Use #2D to ensure we're capturing hex letters

* Change order of special string replacement

  Move the special string replacement after the unescaping of parentheses so we don't unescape any parentheses we shouldn't.

  Add more tests to make sure this is working.

* Apply suggestions from code review

  Co-authored-by: Konrad Abicht <[email protected]>

---------

Co-authored-by: Konrad Abicht <[email protected]>
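The two decoding rules described in this commit message can be illustrated with a minimal Python sketch. This is not the PdfParser code, which is PHP and is not shown here; the function names `decode_octal` and `decode_entities` and both regular expressions are assumptions made for illustration. The sketch only shows the backslash-parity rule for octal codes and a callback-based replacement for `#xx` hex entities that accepts the letters a-f.

```python
import re

# Hypothetical pattern: a run of backslashes followed by 1-3 octal digits.
_OCTAL = re.compile(r"(\\+)([0-7]{1,3})")

# Hypothetical pattern: '#' followed by exactly two hex digits (0-9, a-f, A-F).
_HEX_ENTITY = re.compile(r"#([0-9A-Fa-f]{2})")


def decode_octal(text: str) -> str:
    """Translate octal codes only when preceded by an odd number of backslashes."""

    def replace(match: re.Match) -> str:
        slashes, digits = match.group(1), match.group(2)
        if len(slashes) % 2 == 0:
            # Even run: the backslashes pair up as escaped backslashes,
            # so the digits are literal text and nothing changes here.
            return match.group(0)
        # Odd run: the final backslash introduces the octal code.
        # The remaining (even) backslashes are left for a later unescape step.
        # Masking to one byte is an assumption about overflow handling.
        return slashes[:-1] + chr(int(digits, 8) & 0xFF)

    return _OCTAL.sub(replace, text)


def decode_entities(text: str) -> str:
    """Replace each #xx hex entity with the character it encodes."""
    return _HEX_ENTITY.sub(lambda m: chr(int(m.group(1), 16)), text)
```

With this sketch, a single backslash before `101` yields `A`, two backslashes leave the text unchanged, and three backslashes yield two backslashes followed by `A`. For entities, `Times#2DRoman` becomes `Times-Roman`, which exercises a hex letter in the same way the `#2D` test mentioned above does.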