Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore encryption #653

Merged
merged 9 commits into from
Dec 1, 2023
Merged

Ignore encryption #653

merged 9 commits into from
Dec 1, 2023

Conversation

unixnut
Copy link
Contributor

@unixnut unixnut commented Nov 22, 2023

Type of pull request

  • New feature (involves code and configuration changes)
  • Documentation update

About

In some cases PDF files may be internally marked as encrypted even though the content is not encrypted and can be read.

This MR provides a config option to inform the PDF parser to ignore the encryption and attempt to read the PDF anyway.

This therefore provides a work around for the following issues:

Thanks to @DivineOmega for making the original pull request.

(PHP-CS-Fixer has not been run.)

Jordan Hall and others added 4 commits August 14, 2023 08:44
doc/Usage.md:

  - Moved description of `setIgnoreEncryption` option to doc/CustomConfig.md
  - Added brief "PDF encryption" section

doc/CustomConfig.md: added `setIgnoreEncryption` option and section to describe it.

src/Smalot/PdfParser/Config.php: Doc comment for Config::setIgnoreEncryption()

Added tests/PHPUnit/Integration/EncryptionTest.php

Added samples/not_really_encrypted.pdf (thanks to @parijke who
orginially created this as test.pdf)

See #653
@unixnut
Copy link
Contributor Author

unixnut commented Nov 22, 2023

PHP-CS-Fixer has been run successfully.

All tests passing including new ones in tests/PHPUnit/Integration/EncryptionTest.php

@unixnut unixnut marked this pull request as ready for review November 22, 2023 14:21
@unixnut
Copy link
Contributor Author

unixnut commented Nov 23, 2023

@k00ni Looks like there were pre-existing PHP-CS-Fixer violations in the code; I ignored those because I assumed they were OK. (And just fixed ones in files I added/changed.) Might have come from PR 632 commits.

Any chance you could triage these, please?

@k00ni
Copy link
Collaborator

k00ni commented Nov 23, 2023

@unixnut thank you for this pull request. I am very busy these days, but I will try to get back to you until next week. Only skimmed your code, but it looks great so far.

@unixnut
Copy link
Contributor Author

unixnut commented Nov 23, 2023

More re. PHP-CS-Fixer:

  • src/Smalot/PdfParser/PDFObject.php is showing up in the list from GitHub actions, but not locally for me
  • src/Smalot/PdfParser/Config.php was my bad, fixed
  • src/Smalot/PdfParser/Parser.php is showing up locally for me (blank_line_before_statement, line 80 in parseFile()) but not in the list from GitHub actions
  • src/Smalot/PdfParser/RawData/RawDataParser.php (statement_indentation, line 405) appears to be spurious, i.e. the correction is bad
  • src/Smalot/PdfParser/Document.php (statement_indentation, line 258): comment line should be moved into else clause

Except the one I fixed, I believe all of these are out of scope for the PR.

@k00ni
Copy link
Collaborator

k00ni commented Nov 23, 2023

CS related stuff doesn't really matter, because I will clean/correct it myself locally, if needed. The CI stuff is important and there are no problems in your code. Thank you for investing extra time on that 👍

@unixnut
Copy link
Contributor Author

unixnut commented Nov 30, 2023

Hi, @k00ni Sorry to bug you, but it looks like the PR is ready to approve. Any ideas when you will have merged it and released a new version? This is so I can give an update to my client.

Copy link
Collaborator

@k00ni k00ni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason my Git was not able to push my local changes back into your branch, even though you allowed changes by maintainers. I had to make the changes by hand, therefore the messy history.

Summary:

  • Adapted a few texts, only minor changes.
  • Moved the tests into ParserTest, because a separate EncryptionTest class makes no sense at this point. Our tests need better organisation, but for now that the most practical solution I can think of.

Your pull request is not based on our latest master branch. I tried it locally and everything worked fine. Please check that next time.

I will merge it later this day and can prepare a new version. But it will be a release candidate, because we are in the middle of a big transition and watching how things are going.

@k00ni k00ni merged commit 268a620 into smalot:master Dec 1, 2023
28 of 29 checks passed
@k00ni
Copy link
Collaborator

k00ni commented Dec 2, 2023

@UnnitMetaliya
Copy link

@k00ni any idea why this was deprecated later on? Do we have a permanent fix? If yes, what is that?

@k00ni
Copy link
Collaborator

k00ni commented Sep 17, 2024

It was marked deprecated because our current implementation is incomplete/not working. In case of a false positive one can override the check setting the appropriate config value (we had a few issues with those already). In the future there might be someone who implements this part and then this config value can be removed.

FYI: https://github.com/smalot/pdfparser/blob/master/doc/Usage.md#pdf-encryption

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants