Use mb_detect_encoding() instead of finfo()
The Fileinfo extension is not enabled by default on Windows, so use a different method to determine whether the stream is readable text or binary data.
GreyWyvern committed Aug 22, 2023
1 parent 2167541 commit 42d3ec6
Showing 1 changed file with 2 additions and 3 deletions.
5 changes: 2 additions & 3 deletions src/Smalot/PdfParser/PDFObject.php
@@ -187,13 +187,12 @@ public function cleanContent(?string $content): string

         // Now that all strings and dictionaries are hidden, the only
         // PDF commands left should all be plain text.
-        // Detect MIME-type of the current string and prevent reading
+        // Detect text encoding of the current string to prevent reading
         // content streams that are images, etc. This prevents PHP
         // error messages when JPEG content is sent to this function
         // by the sample file '12249.pdf' from:
         // https://github.com/smalot/pdfparser/issues/458
-        $finfo = new \finfo(\FILEINFO_MIME);
-        if (false === strpos($finfo->buffer($content), 'text/plain')) {
+        if (false === mb_detect_encoding($content, null, true)) {
             return '';
         }
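
The guard introduced above can be tried in isolation with a minimal sketch like the one below. The helper name `looksLikeText()` is hypothetical and not part of PdfParser. The sketch also assumes the mbstring extension is loaded, and it passes an explicit encoding list (ASCII, UTF-8) instead of `null` so the result does not depend on the configured detection order.

```php
<?php
// Hypothetical helper for illustration only; it mirrors the check in the diff.
function looksLikeText(string $content): bool
{
    // In strict mode, mb_detect_encoding() returns false when the bytes are
    // not valid in any of the candidate encodings. The explicit list is an
    // assumption of this sketch; the commit passes null, which falls back to
    // the detection order configured for mbstring.
    return false !== mb_detect_encoding($content, ['ASCII', 'UTF-8'], true);
}

// A plain-text PDF content stream passes the check.
var_dump(looksLikeText('BT /F1 12 Tf (Hello) Tj ET'));

// The first bytes of a JPEG file are not valid ASCII or UTF-8, so it is rejected.
var_dump(looksLikeText("\xFF\xD8\xFF\xE0\x00\x10JFIF"));
```

This is an encoding validity check, not a MIME-type check: binary data that happens to form valid UTF-8 would still pass, so it is a cheap filter rather than a guarantee that the stream is text.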

