fix: set page number using 1-based indexing (#22)
Signed-off-by: Panos Vagenas <[email protected]>
vagenas authored Jul 31, 2024
1 parent e102827 commit d2d9543
Showing 4 changed files with 11 additions and 11 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -56,7 +56,7 @@ print(doc.export_to_markdown()) # output: "## DocLayNet: A Large Human-Annotate

### Convert a batch of documents

-For an example of converting multiple documents, see [convert.py](https://github.com/DS4SD/docling/blob/main/examples/convert.py).
+For an example of batch-converting documents, see [convert.py](https://github.com/DS4SD/docling/blob/main/examples/convert.py).

From a local repo clone, you can run it with:

10 changes: 5 additions & 5 deletions docling/datamodel/document.py
@@ -125,7 +125,7 @@ def to_ds_document(self) -> DsDocument:
desc = DsDocumentDescription(logs=[])

page_hashes = [
-PageReference(hash=p.page_hash, page=p.page_no, model="default")
+PageReference(hash=p.page_hash, page=p.page_no + 1, model="default")
for p in self.pages
]

@@ -159,7 +159,7 @@ def to_ds_document(self) -> DsDocument:
prov=[
Prov(
bbox=target_bbox,
-page=element.page_no,
+page=element.page_no + 1,
span=[0, len(element.text)],
)
],
@@ -242,7 +242,7 @@ def make_spans(cell):
prov=[
Prov(
bbox=target_bbox,
-page=element.page_no,
+page=element.page_no + 1,
span=[0, 0],
)
],
@@ -264,7 +264,7 @@ def make_spans(cell):
prov=[
Prov(
bbox=target_bbox,
-page=element.page_no,
+page=element.page_no + 1,
span=[0, 0],
)
],
@@ -274,7 +274,7 @@ def make_spans(cell):
)

page_dimensions = [
-PageDimensions(page=p.page_no, height=p.size.height, width=p.size.width)
+PageDimensions(page=p.page_no + 1, height=p.size.height, width=p.size.width)
for p in self.pages
]

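The hunks above apply the same `+ 1` shift at every place a page number leaves the internal model. As a minimal sketch of that conversion, assuming (as the `+ 1` in this diff suggests) that `page_no` is 0-based internally: the `Page` class and the helper `to_exported_page_number` below are hypothetical names for illustration, not part of docling.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for an internal page object (0-based page_no)."""

    page_no: int
    height: float
    width: float


def to_exported_page_number(page_no: int) -> int:
    """Convert a 0-based internal page index to a 1-based exported page number."""
    if page_no < 0:
        raise ValueError("page_no must be non-negative")
    return page_no + 1


# Three pages indexed 0, 1, 2 internally are exported as pages 1, 2, 3.
pages = [Page(page_no=i, height=792.0, width=612.0) for i in range(3)]
exported = [to_exported_page_number(p.page_no) for p in pages]
print(exported)  # [1, 2, 3]
```

Centralizing the shift in one helper is only one possible design; the commit itself inlines `+ 1` at each call site.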
8 changes: 4 additions & 4 deletions poetry.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -23,7 +23,7 @@ packages = [{include = "docling"}]
[tool.poetry.dependencies]
python = "^3.10"
pydantic = "^2.0.0"
-docling-core = "^1.1.0"
+docling-core = "^1.1.2"
docling-ibm-models = "^1.1.0"
deepsearch-glm = ">=0.19.0,<1"
filetype = "^1.2.0"
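The dependency bump raises the minimum accepted `docling-core` release from 1.1.0 to 1.1.2 while keeping the same upper bound, since a Poetry caret constraint such as `^1.1.2` means `>=1.1.2,<2.0.0`. The sketch below illustrates that range check; `caret_allows` is a hypothetical helper, not a Poetry API, and it deliberately covers only plain `X.Y.Z` versions with a major version of 1 or higher.

```python
def caret_allows(constraint: str, candidate: str) -> bool:
    """Check a candidate version against a caret constraint like "^1.1.2".

    Simplified sketch: handles only numeric X.Y.Z versions with major >= 1.
    """
    base = tuple(int(x) for x in constraint.lstrip("^").split("."))
    cand = tuple(int(x) for x in candidate.split("."))
    if base[0] == 0:
        raise ValueError("this sketch only covers major versions >= 1")
    upper = (base[0] + 1, 0, 0)  # next major version, exclusive
    return base <= cand < upper


print(caret_allows("^1.1.0", "1.1.0"))  # True: allowed before this commit
print(caret_allows("^1.1.2", "1.1.0"))  # False: below the new minimum
print(caret_allows("^1.1.2", "1.1.2"))  # True
print(caret_allows("^1.1.2", "2.0.0"))  # False: next major is excluded
```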
