Regression when reading partially broken PDF files #2926

Open
stefan6419846 opened this issue Oct 29, 2024 · 0 comments
Labels
is-regression Regression introduced as a side-effect of another change PdfReader The PdfReader component is affected

Reading https://github.com/py-pdf/sample-files/blob/main/017-unreadable-meta-data/unreadablemetadata.pdf stopped working at some point after the 3.17.0 release (3.17.4 appeared to work fine as well). With 5.1.0, it fails with the traceback shown below.
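
Since the breaking release is only known to lie somewhere after 3.17.4, one way to narrow it down is a binary search over an ordered list of releases. The following is a minimal sketch, not part of pypdf: `first_bad_version` and the `is_good` callback are hypothetical names, and the caller is assumed to supply both the ordered version list and a check that installs a given release and runs the reproduction.

```python
from typing import Callable, Optional, Sequence


def first_bad_version(
    versions: Sequence[str],
    is_good: Callable[[str], bool],
) -> Optional[str]:
    """Return the earliest version for which is_good() returns False.

    Assumptions (not verified here): `versions` is ordered oldest to
    newest, the first entry is known to work, and once a version is
    broken every later version is broken too.
    """
    if not versions:
        return None
    lo, hi = 0, len(versions) - 1
    if is_good(versions[hi]):
        # The newest version works, so there is nothing to bisect.
        return None
    # Invariant: versions[lo] is good, versions[hi] is bad.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[hi]
```

In practice, `is_good` could install the given release into a scratch virtual environment and run the reproduction from this report. That part is left out because it depends on the local setup.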

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-6.4.0-150600.23.25-default-x86_64-with-glibc2.38

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.1.0, crypt_provider=('pycryptodome', '3.18.0'), PIL=10.0.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader

reader = PdfReader('sample-files/017-unreadable-meta-data/unreadablemetadata.pdf')
list(reader.pages)
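
For checking the same file against several pypdf versions, the reproduction can be wrapped so that it reports success or failure instead of raising. This is a rough sketch under assumptions: `try_read_pages` and its `open_reader` parameter are hypothetical helpers introduced here for illustration. Only `PdfReader` and `reader.pages` come from the example above.

```python
from typing import Callable, Optional, Tuple


def try_read_pages(
    path: str,
    open_reader: Optional[Callable] = None,
) -> Tuple[bool, str]:
    """Try to enumerate all pages of `path` and report the outcome.

    `open_reader` is a hypothetical hook for substituting a different
    reader class; by default pypdf's PdfReader is used.
    """
    if open_reader is None:
        # Imported lazily so the helper can be exercised without pypdf.
        from pypdf import PdfReader
        open_reader = PdfReader
    try:
        reader = open_reader(path)
        count = len(list(reader.pages))
    except Exception as exc:  # broad on purpose: error types may differ by version
        return False, f"{type(exc).__name__}: {exc}"
    return True, f"{count} pages"
```

Calling `try_read_pages('sample-files/017-unreadable-meta-data/unreadablemetadata.pdf')` under an affected version would be expected to return a failure tuple carrying the `PdfReadError` message rather than a page count.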

Traceback

This is the complete traceback I see:

Invalid parent xref., rebuild xref
parsing for Object Streams
Object 172 0 not defined.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/stefan/temp/pdf/pypdf/_page.py", line 2520, in __len__
    return self.length_function()
  File "/home/stefan/temp/pdf/pypdf/_doc_common.py", line 354, in get_num_pages
    self._flatten(self._readonly)
  File "/home/stefan/temp/pdf/pypdf/_doc_common.py", line 1163, in _flatten
    raise PdfReadError("Invalid object in /Pages")
pypdf.errors.PdfReadError: Invalid object in /Pages
@stefan6419846 stefan6419846 added PdfReader The PdfReader component is affected is-regression Regression introduced as a side-effect of another change labels Oct 29, 2024