-
What precisely do you mean by "forces my script to exit"? You need to paste the actual error; otherwise I can't really judge what this may be about. Is it a Python traceback, or even a C crash? That said, I already have two notes:
-
Thank you very much for following up and reporting the bug. FYI: I'm on a Windows 11 machine with Python 3, and I ran the script in VS Code, in IDLE, and on the Windows command line, all with the same result. I tried your patch from the devel_new branch, but it produced the same result. Good catch that it's the pages with transparency that fail; I also noticed that if I print the ImageNotExtractableError in the … The only other file the script failed with is from the same creator: 1143CabdGhaniNabulusi.HadraUnsiyya.pdf. My script is part of a pipeline that should process dozens of texts at a time. Although the problem seems to be with pdfium, it would be great if the pypdfium call to …
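For a pipeline that processes many files, one common workaround for a native crash that takes down the interpreter (offered here as a general suggestion, not as the fix that was eventually used in this thread) is to do the risky work in a child process, so that a crash only ends that child. This is a minimal sketch: `run_isolated` is a hypothetical helper, and the per-file extraction script you would pass to it is assumed to be your own code.

```python
import subprocess
import sys


def run_isolated(script: str, *args: str, timeout: float = 300.0) -> int:
    """Run a Python snippet in a fresh interpreter and return its exit code.

    If native code in the child crashes (segfault, abort), only the child
    dies; the parent just sees a non-zero return code. On POSIX, a child
    killed by a signal reports a negative code (e.g. -6 for SIGABRT).
    Raises subprocess.TimeoutExpired if the child runs longer than `timeout`.
    """
    completed = subprocess.run(
        [sys.executable, "-c", script, *args],  # extra args land in sys.argv[1:]
        timeout=timeout,
    )
    return completed.returncode


if __name__ == "__main__":
    ok = run_isolated("import sys; print('processing', sys.argv[1])", "example.pdf")
    crashed = run_isolated("import os; os.abort()")  # simulate a native crash
    print(ok, crashed != 0)
```

A pipeline would call something like `run_isolated(EXTRACT_SCRIPT, pdf_path)` once per file (`EXTRACT_SCRIPT` being a hypothetical string holding your extraction code) and, on a non-zero code, log the file or retry it with a render-only path. Separately, running the original script with `python -X faulthandler` is a cheap way to tell a Python traceback from a native crash: on a fatal error, `faulthandler` dumps the Python traceback of the crashing thread.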
-
Brilliant, it worked! My pipeline now works without the annoying exit.
-
I'm trying to extract images from PDFs by looping over the page objects: if a page contains exactly one image object and no text objects, I try to extract the image directly (to retain its original quality); in all other cases, I render the page and save the rendered page as an image (see the code below).
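The selection rule described above can be written as a small pure function. This is a minimal sketch, not the poster's actual code: `choose_strategy` is a hypothetical helper that only looks at page-object type codes, and the two constant values are copied from pdfium's `fpdf_edit.h` (pypdfium2 exposes the same constants as `pypdfium2.raw.FPDF_PAGEOBJ_*`).

```python
from collections import Counter
from typing import Iterable

# Page-object type codes as defined in pdfium's fpdf_edit.h.
FPDF_PAGEOBJ_TEXT = 1
FPDF_PAGEOBJ_IMAGE = 3


def choose_strategy(object_types: Iterable[int]) -> str:
    """Return "extract" if the page holds exactly one image object and
    no text objects; otherwise return "render".

    Other object types (paths, shadings, forms) do not affect the decision,
    following the rule as stated in the question.
    """
    counts = Counter(object_types)
    if counts[FPDF_PAGEOBJ_IMAGE] == 1 and counts[FPDF_PAGEOBJ_TEXT] == 0:
        return "extract"
    return "render"


if __name__ == "__main__":
    print(choose_strategy([FPDF_PAGEOBJ_IMAGE]))                      # extract
    print(choose_strategy([FPDF_PAGEOBJ_IMAGE, FPDF_PAGEOBJ_TEXT]))   # render
    print(choose_strategy([FPDF_PAGEOBJ_IMAGE, FPDF_PAGEOBJ_IMAGE]))  # render
```

Assuming pypdfium2 4.x, where page objects expose a `type` attribute, the input could come from `[obj.type for obj in page.get_objects()]`; check this against the version you actually use.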
In some cases, pypdfium fails to extract the image directly (without rendering it); this is a limitation of pdfium itself, as described here: https://issues.chromium.org/issues/42270939. In those cases, I render the page instead and save the rendered page as an image.
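The extract-then-render fallback can be expressed as a generic wrapper. This is a sketch under assumptions: `extract_or_render` is a hypothetical helper, and the two callables stand in for whatever extraction and rendering code is used. In practice you would narrow `catch` to the library's own error type, such as the `ImageNotExtractableError` mentioned in this thread, rather than catching every `Exception`.

```python
from __future__ import annotations

from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def extract_or_render(
    extract: Callable[[], T],
    render: Callable[[], T],
    catch: Tuple[type, ...] = (Exception,),
) -> Tuple[str, T]:
    """Try direct extraction first; fall back to rendering if extraction
    raises one of the exception types in `catch`.

    Returns a (method, result) pair so callers can log which path was used.
    Note: this only handles Python exceptions. A crash inside native code
    terminates the whole interpreter and never reaches the except clause.
    """
    try:
        return "extracted", extract()
    except catch:
        return "rendered", render()
```

The limitation in the docstring is the crux of this thread: if the native call itself aborts the process, no `try`/`except` in the same interpreter can intercept it.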
However, with some PDFs this doesn't work either: the call to pdfium_c.FPDFImageObj_GetBitmap() forces my script to exit, instead of returning an error as it is supposed to:
Does anyone have an idea why the code exits instead of raising an error, and how I could solve this?
This is an example of a PDF where this fails: 0309Hallaj.Diwan.pdf
This is my (slightly simplified) code: