Decontaminate training datasets #403

Merged
merged 24 commits into from
Oct 25, 2024

Commits on Oct 11, 2024

  1. checking in indexing script

    pdasigi committed Oct 11, 2024
    b5ab7e1
  2. d266d55

Commits on Oct 14, 2024

  1. better logging

    pdasigi committed Oct 14, 2024
    cd29385
  2. bug

    pdasigi committed Oct 14, 2024
    abb501c
  3. search script

    pdasigi committed Oct 14, 2024
    5e77cbc
  4. f7036ab
  5. multiple fields

    pdasigi committed Oct 14, 2024
    c21fcc5
  6. runtime fixes

    pdasigi committed Oct 14, 2024
    3a64aa0
  7. write tsv

    pdasigi committed Oct 14, 2024
    e9c287a
  8. eval dataset limit

    pdasigi committed Oct 14, 2024
    127edeb

Commits on Oct 16, 2024

  1. ngram matching and threshold

    pdasigi committed Oct 16, 2024
    9275f8f
  2. runtime fixes

    pdasigi committed Oct 16, 2024
    64d3ac5
  3. restructured

    pdasigi committed Oct 16, 2024
    aec68d4

Commits on Oct 17, 2024

  1. read data mixer config

    pdasigi committed Oct 17, 2024
    06472ae
  2. a61039e

Commits on Oct 18, 2024

  1. better logging and bug fixes

    pdasigi committed Oct 18, 2024
    8d091a5
  2. 7565dc5

Commits on Oct 23, 2024

  1. readme and basic cleanup

    pdasigi committed Oct 23, 2024
    13b7b25
  2. updated top-level readme

    pdasigi committed Oct 23, 2024
    88c29e7

Commits on Oct 24, 2024

  1. decontaminate train datasets

    pdasigi committed Oct 24, 2024
    308f2a8
  2. better logging

    pdasigi committed Oct 24, 2024
    1cfe5e9
  3. decontaminate train sets

    pdasigi committed Oct 24, 2024
    8d19339
  4. b3c19d0

Commits on Oct 25, 2024

  1. search size

    pdasigi committed Oct 25, 2024
    a7b608f
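The commit titles in this PR ("checking in indexing script", "search script", "ngram matching and threshold", "decontaminate train datasets") name the pieces of a decontamination pass without showing how they fit together. As a rough illustration only, here is a minimal sketch of n-gram overlap decontamination: it indexes evaluation-set n-grams and drops any training instance that contains at least a threshold fraction of some evaluation instance's n-grams. Everything in it is an assumption and not this PR's code: the function names (`ngrams`, `build_index`, `decontaminate`), lowercased whitespace tokenization, an in-memory dict standing in for the index, and the defaults `n=8` and `threshold=0.5`.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical sketch of n-gram overlap decontamination; not the code from this PR.
NGram = Tuple[str, ...]


def ngrams(text: str, n: int) -> Set[NGram]:
    """Distinct word n-grams of `text` (assumption: lowercase + whitespace split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(eval_texts: List[str], n: int) -> Tuple[Dict[NGram, Set[int]], List[int]]:
    """Inverted index mapping each n-gram to the ids of eval instances containing it,
    plus the number of distinct n-grams in each eval instance."""
    index: Dict[NGram, Set[int]] = defaultdict(set)
    sizes: List[int] = []
    for idx, text in enumerate(eval_texts):
        grams = ngrams(text, n)
        sizes.append(len(grams))
        for gram in grams:
            index[gram].add(idx)
    return index, sizes


def decontaminate(
    train_texts: List[str],
    eval_texts: List[str],
    n: int = 8,
    threshold: float = 0.5,
) -> Tuple[List[str], List[str]]:
    """Split training texts into (kept, removed). A training text is removed if, for
    some eval instance, it contains >= `threshold` of that instance's n-grams."""
    index, sizes = build_index(eval_texts, n)
    kept: List[str] = []
    removed: List[str] = []
    for text in train_texts:
        hits: Dict[int, int] = defaultdict(int)  # eval id -> matched n-gram count
        for gram in ngrams(text, n):
            for idx in index.get(gram, ()):  # .get does not insert missing keys
                hits[idx] += 1
        # An eval id only appears in `hits` if it has at least one n-gram,
        # so sizes[idx] > 0 and the division is safe.
        contaminated = any(count / sizes[idx] >= threshold for idx, count in hits.items())
        (removed if contaminated else kept).append(text)
    return kept, removed


if __name__ == "__main__":
    kept, removed = decontaminate(
        ["the quick brown fox jumps over", "completely unrelated sentence here"],
        ["quick brown fox"],
        n=2,
    )
    print("kept:", kept)
    print("removed:", removed)
```

With `n=2`, the training text "the quick brown fox jumps over" contains both bigrams of the eval text "quick brown fox", so it is removed, while the unrelated sentence is kept. The in-memory dict is only a stand-in; indexing large training or evaluation sets may call for an external search index instead.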