
Releases: dginev/CorTeX

ar5iv 04-2024 release

29 Apr 21:54

Tagging the version of CorTeX used for the full arXiv conversion run up to 04.2024, which was also used to produce the ar5iv 04-2024 dataset.

Recent changes mostly track package updates and include small refinements to the report pages.

arXMLiv 2022 release

11 Jan 21:47

This variant of CorTeX was used for converting the ~2 million arXiv articles (up to the end of 2022) into HTML.

arxmliv 2020 release

24 Jan 19:47

Minor release capturing the latest CorTeX state for generating and bundling the arXMLiv 2020 dataset.

arxmliv 08.2019 release

18 Sep 19:38

This version of CorTeX was used to convert and bundle the 08.2019 version of the arXMLiv dataset.

History Feature polish and feedback

03 Apr 22:59

Minor polish of the newly released history reports, plus related patches.

History Feature

01 Apr 19:16

CorTeX now has an automatic "historical runs" reporting capacity.

It provides insight into incremental changes across successive runs of a service over a corpus, helping to track both improvements and regressions at a coarse-grained severity level.

See #41 for additional details.
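As a rough illustration of what a historical-runs comparison computes, the sketch below takes per-severity task totals for two runs and reports the change for each severity. The `RunTotals` alias, the `run_delta` function and the severity labels are hypothetical names chosen for this example; they are not taken from the CorTeX code base.

```rust
use std::collections::BTreeMap;

/// Task counts per severity label for a single run.
/// The labels themselves are illustrative assumptions.
pub type RunTotals = BTreeMap<String, i64>;

/// Change in task counts between two consecutive runs, per severity label.
/// A positive value means more tasks ended at that severity in `current`.
pub fn run_delta(previous: &RunTotals, current: &RunTotals) -> BTreeMap<String, i64> {
    let mut delta = BTreeMap::new();
    for (severity, count) in current {
        let before = previous.get(severity).copied().unwrap_or(0);
        delta.insert(severity.clone(), count - before);
    }
    // Severities that were present before but are absent from the current run.
    for (severity, count) in previous {
        delta.entry(severity.clone()).or_insert(-count);
    }
    delta
}
```

Under these assumptions, a drop in the error and fatal totals alongside a rise in the clean total would read as an improvement between runs, and the reverse as a regression.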

Update to Rocket 0.4

21 Jan 20:18

Minor hygiene release: Update to latest Rust nightly (1.33) and Rocket (0.4).

arxmliv 08.2018 upgrades

19 Sep 18:30

Frontend upgrades, as well as stability fixes, for the successful conversion run of arXiv up to 08.2018 with the tex_to_html service.

Improved Report Interfaces

17 Aug 01:05

Combined release of changes up to 0.3.0, including:

  • pagination and dedicated preview URLs for task list reports
  • worker metadata tracking, as well as per-service worker reports
  • breaking changes to dispatcher API, as sink (zmq::PULL) replies are now also required to include an identity message, for better tracking
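To make the identity requirement concrete, here is a minimal sketch of how a sink might validate a multipart reply whose first frame carries the worker identity. The frame layout (identity, service name, task id, then payload), the `SinkReply` struct and the `parse_sink_reply` function are all assumptions for illustration; the actual CorTeX dispatcher protocol may order or encode its frames differently, and no ZeroMQ calls are made here.

```rust
/// Hypothetical parsed form of a sink reply.
#[derive(Debug, PartialEq)]
pub struct SinkReply {
    pub identity: String,
    pub service: String,
    pub task_id: i64,
    pub payload: Vec<u8>,
}

/// Reasons a reply could be rejected in this sketch.
#[derive(Debug, PartialEq)]
pub enum ReplyError {
    MissingIdentity,
    MissingFrame(&'static str),
    Malformed(&'static str),
}

/// Parse already-received multipart frames, insisting on a non-empty
/// identity frame first (assumed layout: identity, service, task id, payload...).
pub fn parse_sink_reply(frames: &[Vec<u8>]) -> Result<SinkReply, ReplyError> {
    let mut iter = frames.iter();
    let identity = match iter.next() {
        Some(frame) if !frame.is_empty() => String::from_utf8(frame.clone())
            .map_err(|_| ReplyError::Malformed("identity"))?,
        _ => return Err(ReplyError::MissingIdentity),
    };
    let service = iter.next().ok_or(ReplyError::MissingFrame("service"))?;
    let service =
        String::from_utf8(service.clone()).map_err(|_| ReplyError::Malformed("service"))?;
    let task_id = iter.next().ok_or(ReplyError::MissingFrame("task id"))?;
    let task_id = std::str::from_utf8(task_id)
        .ok()
        .and_then(|text| text.parse::<i64>().ok())
        .ok_or(ReplyError::Malformed("task id"))?;
    // Any remaining frames are concatenated into the payload.
    let payload = iter.flat_map(|frame| frame.iter().copied()).collect();
    Ok(SinkReply { identity, service, task_id, payload })
}
```

Rejecting replies without an identity up front is one way the dispatcher could attribute every result to a specific worker, which is what the per-service worker reports rely on.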

Diesel.rs Backend

03 Dec 19:18
2eff00e

This release, detailed in PR #24, is a major backend rewrite that ensures a more solid and maintainable foundation. This includes:

  • The postgresql backend is now realized entirely using the diesel ORM
  • The log messaging table has now been split into 5 tables - one per (latexml-convention) severity, in an effort to keep the final table sizes for billions of messages reasonable.
  • The new LogRecord trait makes the split tables usable in Rust with moderate boilerplate, which I find acceptable.
  • The implications for the code base are more significant - there are large refactors in the backend APIs and coding style.
  • The code quality has been boosted by a more disciplined use of rustfmt and clippy.
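The per-severity split and the shared trait can be pictured with a small sketch in plain Rust. Everything below is illustrative: the severity names, the table names, the struct fields and the trait methods are assumptions made for this example, and the diesel mappings of the actual backend are omitted entirely.

```rust
use std::collections::BTreeMap;

/// Five severities, one table each (names assumed for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Info,
    Warning,
    Error,
    Fatal,
    Invalid,
}

impl Severity {
    /// Hypothetical table name holding messages of this severity.
    pub fn table_name(self) -> &'static str {
        match self {
            Severity::Info => "log_infos",
            Severity::Warning => "log_warnings",
            Severity::Error => "log_errors",
            Severity::Fatal => "log_fatals",
            Severity::Invalid => "log_invalids",
        }
    }
}

/// Illustrative stand-in for a trait shared by all per-severity record types.
pub trait LogRecord {
    fn task_id(&self) -> i64;
    fn category(&self) -> &str;
    fn what(&self) -> &str;
    fn severity(&self) -> Severity;
    /// Default method, so each record type only supplies the accessors above.
    fn summary(&self) -> String {
        format!(
            "task {} -> {}: {}:{}",
            self.task_id(),
            self.severity().table_name(),
            self.category(),
            self.what()
        )
    }
}

pub struct LogWarning { pub task_id: i64, pub category: String, pub what: String }
pub struct LogError { pub task_id: i64, pub category: String, pub what: String }

// The repeated accessor impls are the "moderate boilerplate" in this sketch.
impl LogRecord for LogWarning {
    fn task_id(&self) -> i64 { self.task_id }
    fn category(&self) -> &str { &self.category }
    fn what(&self) -> &str { &self.what }
    fn severity(&self) -> Severity { Severity::Warning }
}

impl LogRecord for LogError {
    fn task_id(&self) -> i64 { self.task_id }
    fn category(&self) -> &str { &self.category }
    fn what(&self) -> &str { &self.what }
    fn severity(&self) -> Severity { Severity::Error }
}

/// Count records per destination table, treating all severities uniformly.
pub fn count_per_table(records: &[&dyn LogRecord]) -> BTreeMap<&'static str, usize> {
    let mut counts = BTreeMap::new();
    for record in records {
        *counts.entry(record.severity().table_name()).or_insert(0) += 1;
    }
    counts
}
```

The trade-off in this shape is that each record type repeats a few accessor methods, while reporting code such as `count_per_table` can work over any mix of severities through the trait.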

As a basic sanity check, the release has undergone a stress test of converting 1000 arXiv articles and using the respective reports.