
how to extract figures from the pdf? #70

Closed
PeterGriffinJin opened this issue Feb 7, 2024 · 3 comments

PeterGriffinJin commented Feb 7, 2024

Hi there,

Thank you so much for the nice package!

Can I ask how to extract the figures from a PDF? I have tried:

from papermage.recipes import CoreRecipe

recipe = CoreRecipe()
doc = recipe.run("papermage/tests/fixtures/2020.acl-main.447.pdf")
doc.figures

But this does not seem to return the figure data. Is figure extraction achievable with your package?

Best,
Bowen

Collaborator

kyleclo commented Mar 13, 2024

Hey @PeterGriffinJin, sorry, this looks like a bug. Once #73 merges, it should be fixed. Thanks!

kyleclo self-assigned this Mar 13, 2024

kyleclo commented Mar 18, 2024

Just merged #73. Here's me testing out the recipe locally on that PDF to get Figures:

import pathlib

from papermage.recipes import CoreRecipe
from papermage.visualizers.visualizer import plot_entities_on_page

# Load the document by running the recipe on the test fixture PDF
recipe = CoreRecipe()
pdfpath = pathlib.Path(__file__).parent.parent / "tests/fixtures/2020.acl-main.447.pdf"
doc = recipe.from_pdf(pdf=pdfpath)

# Select the figure entities whose boxes overlap the first page
page_id = 0
figures = doc.pages[page_id].intersect_by_box("figures")

# Draw the figure boxes on top of the rendered page image
plot_entities_on_page(page_image=doc.images[page_id], entities=figures)

[Two images attached]
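
For anyone who wants the figure pixels rather than an overlay plot, here is a minimal sketch of the coordinate step. It assumes each figure box is stored in page-relative coordinates (left, top, width, height, each between 0 and 1). That is an assumption about the library, not something confirmed in this thread. `RelBox` and `to_pixel_box` are hypothetical names used only for illustration and are not part of papermage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelBox:
    """Hypothetical stand-in for a figure box in page-relative coordinates."""
    l: float  # left edge, as a fraction of page width
    t: float  # top edge, as a fraction of page height
    w: float  # width, as a fraction of page width
    h: float  # height, as a fraction of page height


def to_pixel_box(box: RelBox, page_width: int, page_height: int) -> tuple:
    """Convert a relative box to (left, upper, right, lower) pixel coordinates."""

    def clamp(value: int, upper: int) -> int:
        # Keep the coordinate inside the page image
        return max(0, min(value, upper))

    left = clamp(round(box.l * page_width), page_width)
    upper = clamp(round(box.t * page_height), page_height)
    right = clamp(round((box.l + box.w) * page_width), page_width)
    lower = clamp(round((box.t + box.h) * page_height), page_height)
    return (left, upper, right, lower)


# Example: a box covering the middle band of a 1000 x 2000 pixel page image
print(to_pixel_box(RelBox(l=0.1, t=0.25, w=0.5, h=0.25), 1000, 2000))
# (100, 500, 600, 1000)
```

The resulting tuple follows the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so if the page image is available as a PIL image, it could be passed to `crop` to cut out the figure region.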


kyleclo commented Mar 18, 2024

I'm going to close this for now. Please re-open if it's not resolved. Thanks!
