Version 0.9.1 code refactoring and bugfixes

@AndyTheFactory AndyTheFactory released this 08 Nov 13:40
· 170 commits to master since this release

New features:

  • version bump(f7107be)
  • tests: Add test case for(592f6f6)
  • parse: added the possibility to follow "read more" links in articles(0720de1)
  • Allow passing any requests parameter to the Article constructor. You can now pass verify=False to ignore certificate errors (issue #462)(5ff5d27)
  • parse: extended data parsing of json-ld metadata (issue #518)(fc413af)
  • tests: added script to create test cases(9df8c16)
  • parse: added tag for date detection (issue #835)(41152eb)
  • parse: added og:regDate to known date tags(dc35e29)
  • tests: convert unittest to pytest(45c4e8d)
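
As an illustration of the requests-parameter item above: per that note, a call such as Article(url, verify=False) is now accepted. The sketch below shows only the general pattern of collecting extra keyword arguments so they can be forwarded to the HTTP layer. SketchArticle, requests_params, and download_args are hypothetical names used for illustration; this is not the library's actual implementation.

```python
from typing import Any, Dict


class SketchArticle:
    """Hypothetical stand-in for illustration; NOT newspaper's Article class."""

    def __init__(self, url: str, **request_kwargs: Any) -> None:
        self.url = url
        # Any extra keyword argument (verify, timeout, ...) is stored
        # untouched so it can later be forwarded to the HTTP call.
        self.requests_params: Dict[str, Any] = dict(request_kwargs)

    def download_args(self) -> Dict[str, Any]:
        # The arguments that would be handed to something like
        # requests.get(**args); no network access happens in this sketch.
        return {"url": self.url, **self.requests_params}


article = SketchArticle("https://example.com/story", verify=False, timeout=10)
print(article.download_args())
```

Keeping the extra arguments in one dictionary means the constructor does not need to know each requests option by name, which is presumably why the release describes "any requests parameter" rather than a fixed list.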

Bugs fixed:

  • typing annotation for set in Python 3.8(895343f)
  • parse: improve meta tag content for articles and pubdate(37bb0b7)
  • parse: 📝 improved author detection. improved video links detection(23c547f)
  • parse: ensured that clean_doc/doc to clean_top_node are on the same DOM, and that doc/top_node are on the same DOM.(6874d05)
  • small changes, replace os.path with pathlib(5598d95)
  • parse: use one stopwords file for English, the one in the standard folder (issue #503)(6bdf813)
  • parse: better author parsing based on issue #493(f93a9c2)
  • parse: make the url date parsing stricter. Issue #514(0cc1e83)
  • parse: replace \n with space in sentence split (Issue #506)(3ccb87c)
  • parsing: catch URL errors resulting from parsed image links(9140a04)
  • correct python versions in pipeline(7e671df)
  • gitignore update(8855f00)
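
To illustrate the sentence-split fix listed above (Issue #506), here is a minimal sketch of why newlines are replaced with spaces before splitting: a line break inside a sentence should not be treated as a boundary. The function name split_sentences and the naive regex splitter are assumptions made for this example; the library's own tokenizer may work differently.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive illustrative splitter; a stand-in, not the library's tokenizer."""
    # Replace each newline (plus surrounding whitespace) with one space,
    # so a line break in the middle of a sentence does not split it.
    flattened = re.sub(r"\s*\n\s*", " ", text).strip()
    # Split after ., ! or ? when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", flattened)
    return [part for part in parts if part]


sample = "The cat sat\non the mat. It purred.\nThen it slept."
print(split_sentences(sample))
```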