
Cannot transverse from clean_top_node to clean_doc or doc #497

Closed
AndyTheFactory opened this issue Oct 24, 2023 · 2 comments

@AndyTheFactory

Issue by monstrfolk
Sun Dec 13 01:34:57 2020
Originally opened as codelucas/newspaper#862


Perhaps I misunderstand the relationship between clean_top_node and clean_doc or doc, but I cannot traverse from clean_top_node to clean_doc or doc.

For example, the following does not work:

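from newspaper import Article

a = Article('https://somesite.com/some_article')
a.download()
a.parse()
# Fails as reported: clean_top_node does not appear to belong to the clean_doc tree
print(a.clean_doc.getroottree().getpath(a.clean_top_node))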

I would expect to be able to print the path from clean_doc/doc to clean_top_node.
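
To illustrate why a lookup like this can fail, here is a minimal sketch using only the standard library's xml.etree.ElementTree as a stand-in for the lxml trees that newspaper builds. The markup and the contains helper are made up for illustration, and the deep copy is an assumption about how the two trees come to differ, not a description of the library's internals. The point is only that a node taken from one tree is not an element of a copy of that tree, even though the copy looks identical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for a downloaded article.
html = "<html><body><div><p>Hello</p></div></body></html>"

doc = ET.fromstring(html)
top_node = doc.find("./body/div")   # this node lives in `doc`
clean_doc = copy.deepcopy(doc)      # a separate tree with equal content


def contains(root, node):
    """Return True if `node` is the very same object as an element under `root`."""
    return any(el is node for el in root.iter())


in_doc = contains(doc, top_node)
in_clean_doc = contains(clean_doc, top_node)
print(in_doc, in_clean_doc)
```

Any path computation that starts from clean_doc has no way to reach top_node here, because the two objects belong to different trees.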

@AndyTheFactory

Comment by monstrfolk
Sun Dec 13 01:36:21 2020


Please see codelucas/newspaper#863 with a fix for this issue.

@AndyTheFactory added the bug and PR-verify labels Oct 30, 2023
@AndyTheFactory added this to the Release 0.9.1 milestone Oct 30, 2023
@AndyTheFactory

Ensured that clean_doc and clean_top_node are on the same DOM.
Likewise, doc and top_node are together on another DOM.

Added an Article.text_clean property that returns the cleaned text of an article, based on clean_top_node.
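
For readers wondering what "on the same DOM" means in practice, one way to get a matching node in a copied tree is to record the positional path of the node in the original and replay it in the copy. This is only a sketch with the standard library's xml.etree.ElementTree, not the change that was made in the library. index_path and resolve are hypothetical helpers, and the markup is invented.

```python
import copy
import xml.etree.ElementTree as ET


def index_path(root, node):
    """Return the child indexes leading from `root` to `node`, or None if absent."""
    if root is node:
        return []
    for i, child in enumerate(root):
        sub = index_path(child, node)
        if sub is not None:
            return [i] + sub
    return None


def resolve(root, path):
    """Follow a list of child indexes down from `root`."""
    for i in path:
        root = root[i]
    return root


doc = ET.fromstring(
    "<html><body><div id='a'/><div id='b'><p>Text</p></div></body></html>"
)
top_node = doc.find("./body/div[@id='b']")

# Copy the tree, then find the counterpart of top_node inside the copy.
clean_doc = copy.deepcopy(doc)
path = index_path(doc, top_node)
clean_top_node = resolve(clean_doc, path)

print(path, clean_top_node.get("id"))
```

With both objects in the same tree, walking from clean_doc down to clean_top_node (or computing a path between them) works, which is the behaviour the original report asked for.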
