Commit

Use text matrix 'i' instead of 'b'
The correct matrix elements to use for scaling the x-axis are actually the first *column*, so 'a' and 'i', not 'a' and 'b'. My bad! It worked before because almost always the x-axis scaling is equal to the y-axis scaling.
GreyWyvern committed Aug 21, 2023
1 parent 449cdde commit 2167541
Showing 2 changed files with 10 additions and 5 deletions.
8 changes: 4 additions & 4 deletions src/Smalot/PdfParser/Font.php
@@ -442,17 +442,17 @@ public function decodeText(
         $words = [];

         // Ensure we have a valid $textMatrix
-        // Values 'a' and 'b' come from the top row of the text matrix
+        // Values 'a' and 'i' come from the first column of the text matrix
         // and determine the amount of horizontal (x-axis) scaling. Since
-        // we are only dealing with one line here, we can ignore the 'i'
+        // we are only dealing with one line here, we can ignore the 'b'
         // and 'j' values for vertical (y-axis) scaling, but they are
         // included here for clarity.
-        if (false === is_array($textMatrix) || false === isset($textMatrix['a']) || false === isset($textMatrix['b'])) {
+        if (false === is_array($textMatrix) || false === isset($textMatrix['a']) || false === isset($textMatrix['i'])) {
             $textMatrix = ['a' => 1, 'b' => 0, 'i' => 0, 'j' => 1];
         }

         $font_space = $this->getFontSpaceLimit();
-        $font_space = $font_space * (float) $textMatrix['a'] + $font_space * (float) $textMatrix['b'];
+        $font_space = $font_space * (float) $textMatrix['a'] + $font_space * (float) $textMatrix['i'];

         foreach ($commands as $command) {
             switch ($command[PDFObject::TYPE]) {
7 changes: 6 additions & 1 deletion src/Smalot/PdfParser/PDFObject.php
@@ -347,6 +347,11 @@ private function getDefaultFont(Page $page = null): Font
     }

     /**
+     * Decode a '[]TJ' command and attempt to use alternate fonts if
+     * the current font results in output that contains Unicode control
+     * characters. See Font::decodeText for a full description of
+     * $textMatrix
+     *
      * @param array<int,array<string,string|bool>> $command
      * @param array<string,float> $textMatrix
      */
@@ -719,7 +724,7 @@ public function getTextArray(Page $page = null): array
                     $whiteSpace = "\n";
                 } else {
                     $curX = $currentX - $current_position['x'];
-                    $factorX = 10 * $current_position_tm['a'] + 10 * $current_position_tm['b'];
+                    $factorX = 10 * $current_position_tm['a'] + 10 * $current_position_tm['i'];
                     if (true === $reverse_text) {
                         if ($curX < -abs($factorX * 8)) {
                             $whiteSpace = "\t";
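
Both hunks swap the second term of the same sum from the 'b' entry to the 'i' entry. A minimal sketch of that arithmetic follows. The helper name scaleFontSpace() and the sample matrices are made up for illustration and are not part of the library. In this sketch the old and new expressions give the same result whenever 'b' and 'i' hold the same value, as in the default matrix from the diff where both are 0, and they diverge once the two entries differ.

```php
<?php
// Hypothetical helper, not part of smalot/pdfparser: it repeats the sum
// from the diff, with the key of the second term passed in so the old
// ('b') and new ('i') variants can be compared side by side.
function scaleFontSpace(float $fontSpace, array $textMatrix, string $secondKey): float
{
    return $fontSpace * (float) $textMatrix['a'] + $fontSpace * (float) $textMatrix[$secondKey];
}

// Uniform scaling (assumed sample values): 'b' and 'i' are both 0.
$uniform = ['a' => 2, 'b' => 0, 'i' => 0, 'j' => 2];

// Assumed sample values where 'b' and 'i' differ.
$skewed = ['a' => 2, 'b' => 0.5, 'i' => 0, 'j' => 2];

$sameOld = scaleFontSpace(10.0, $uniform, 'b'); // 20.0
$sameNew = scaleFontSpace(10.0, $uniform, 'i'); // 20.0

$old = scaleFontSpace(10.0, $skewed, 'b'); // 25.0, uses 'a' and 'b'
$new = scaleFontSpace(10.0, $skewed, 'i'); // 20.0, uses 'a' and 'i'
```

With the uniform matrix the change is invisible, which is consistent with the commit message's note that the earlier code usually worked.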
