Add ability to ignore PDF encryption check #632
Thank you @DivineOmega, it is a helpful addition!
I only have a few things:
- Please add a simple test or two, to prove it is working as intended and to avoid regressions later on
- Would you mind adding a section to https://github.com/smalot/pdfparser/blob/master/doc/CustomConfig.md as well?
This is a good addition but, as the OP says, it is a workaround for this simple check:

```php
if (isset($xref['trailer']['encrypt'])) {
    throw new \Exception('Secured pdf file are currently not supported.');
}
```

It should be taken into account that an eventual fix for this would make the config option being added here obsolete. That's probably the only thing I don't like about this change.
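To make the trade-off concrete, here is a minimal, self-contained sketch of how a config flag could gate that check. `SketchConfig`, `setIgnoreEncryption`, `getIgnoreEncryption`, and `assertReadable` are hypothetical names chosen for illustration and are not taken from this PR's diff; only the `isset` condition and the exception message come from the snippet quoted above.

```php
<?php

// Hypothetical stand-in for the library's config object. The class and
// method names are assumptions made for this sketch only.
final class SketchConfig
{
    private bool $ignoreEncryption = false;

    public function setIgnoreEncryption(bool $ignore): void
    {
        $this->ignoreEncryption = $ignore;
    }

    public function getIgnoreEncryption(): bool
    {
        return $this->ignoreEncryption;
    }
}

// Throws when the trailer carries an encrypt entry, unless the caller
// has explicitly opted out of the check through the config flag.
function assertReadable(array $xref, SketchConfig $config): void
{
    if (isset($xref['trailer']['encrypt']) && !$config->getIgnoreEncryption()) {
        throw new \Exception('Secured pdf file are currently not supported.');
    }
}

// A trailer with an encrypt entry, as a PDF flagged as encrypted would have.
// The value is a made-up placeholder.
$xref = ['trailer' => ['encrypt' => '12 0 R']];

// Default behaviour: the check still rejects the file.
$config = new SketchConfig();
$defaultRejected = false;
try {
    assertReadable($xref, $config);
} catch (\Exception $e) {
    $defaultRejected = true;
}

// Opt-in behaviour: the check is skipped and parsing would continue.
$config->setIgnoreEncryption(true);
$optInRejected = false;
try {
    assertReadable($xref, $config);
} catch (\Exception $e) {
    $optInRejected = true;
}
```

Keeping the flag off by default means existing callers see no change in behaviour. If real decryption support lands later, a flag like this could become a no-op, which is the obsolescence concern raised above.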
@DivineOmega Are you still with us here?
Hi. Sorry for the delayed response. Things have been busy recently. I didn't end up actually using this functionality myself. I found that a majority of the PDFs I ignored the encryption check for would actually be parsed as containing no text or limited useful text. I'm not sure why this is and so my workaround here ended up not being useful for my use case. This library still provides some of the best parsing I've found. My solution was to use an alternative parser if this one detected an encrypted PDF.
@k00ni Can you please reopen and merge this, as in some cases the PDFs are from a predictable origin and are readable but are marked as encrypted. I believe it is up to the caller to test that the data they get is valid. I am willing to write the test (using test.pdf from #488) and the docs. But first I would need agreement that the merge would be done if those conditions are met. Thanks.
@unixnut Thank you for your interest. You have my full support. It would be great if we could agree on the following list:
In some cases a PDF file is internally marked as encrypted even though its content is not encrypted and can be read.
This MR adds a config option that tells the PDF parser to ignore the encryption flag and attempt to read the PDF anyway.
It therefore provides a workaround for the following issues: