Split checksum file into chunks #961

Open: wants to merge 1 commit into base release/114

TamaraNaboulsi (Member)

Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion.

Requirements

  • Filling out the template is required.
  • Review the contributing guidelines for this repository; remember in particular:
    • do not modify code without testing for regression
    • provide simple unit tests to test the changes
    • if you change the schema you must patch the test databases as well, see Updating the schema
    • the PR must not fail unit testing

Description

Split the large checksum file into smaller files before loading it into the database.

Use case

Timeout errors occur in the Checksum step of the pipeline because the 'LOAD DATA INFILE' command is run on a very large file. This fix splits the large file into multiple smaller files and runs the command on each one. Once loading is finished, the smaller files are combined back into a single file, restoring the state that existed before the step ran.
This change is accompanied by a related change to the DB model (ensembl-py) that sets the storage engine of the checksum_xref table to MyISAM, which further reduces the likelihood of the error.
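A minimal Python sketch of the split, load, and recombine flow described above. It is not the code in this PR: the function names (`split_file`, `combine_files`, `load_in_chunks`), the line-based chunking, the chunk file naming, and the default chunk size are all hypothetical, and `execute` stands in for whatever database cursor the pipeline actually uses.

```python
import shutil
from pathlib import Path
from typing import Callable, List

# Hypothetical default; the PR does not state how large each chunk is.
DEFAULT_LINES_PER_CHUNK = 1_000_000


def split_file(path: Path, lines_per_chunk: int = DEFAULT_LINES_PER_CHUNK) -> List[Path]:
    """Write every `lines_per_chunk` lines of `path` to a numbered sibling file."""
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be positive")
    chunks: List[Path] = []
    out = None
    try:
        # Binary mode keeps the bytes (and line endings) exactly as they were.
        with path.open("rb") as src:
            for i, line in enumerate(src):
                if i % lines_per_chunk == 0:
                    if out is not None:
                        out.close()
                    # Hypothetical naming scheme: checksum.txt.part0000, .part0001, ...
                    chunk = path.with_name(f"{path.name}.part{len(chunks):04d}")
                    out = chunk.open("wb")
                    chunks.append(chunk)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunks


def combine_files(chunks: List[Path], dest: Path) -> None:
    """Concatenate the chunk files, in order, into `dest`, then delete the chunks."""
    with dest.open("wb") as out:
        for chunk in chunks:
            with chunk.open("rb") as src:
                shutil.copyfileobj(src, out)
    for chunk in chunks:
        chunk.unlink()


def load_in_chunks(
    path: Path,
    table: str,
    execute: Callable[[str], object],
    lines_per_chunk: int = DEFAULT_LINES_PER_CHUNK,
) -> int:
    """Run one LOAD DATA INFILE per chunk, then restore the single original file."""
    chunks = split_file(path, lines_per_chunk)
    path.unlink()  # the chunks now hold all of the data
    try:
        for chunk in chunks:
            # The file name is interpolated into the SQL string, so the path and
            # table name are assumed to come from trusted pipeline configuration.
            execute(f"LOAD DATA INFILE '{chunk.resolve().as_posix()}' INTO TABLE {table}")
    finally:
        # Recombine even if a load fails, so the original file is always restored.
        combine_files(chunks, path)
    return len(chunks)
```

With a DB-API cursor this could be called as `load_in_chunks(checksum_path, "checksum_xref", cursor.execute)`. Chunks are cut on line boundaries because `LOAD DATA INFILE` treats each line as one row by default, so a boundary must never fall in the middle of a row. Note that `LOAD DATA INFILE` without `LOCAL` reads the file on the database server host, so the chunks must be written somewhere the server can read.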

Benefits

Reduces the likelihood of timeout errors in the Checksum step.

Possible Drawbacks

If applicable, describe any possible undesirable consequence of the changes.

Testing

  • Have you added/modified unit tests to test the changes?
  • If so, do the tests pass?
  • Have you run the entire test suite and no regression was detected?
  • TravisCI passed on your branch

Dependencies

If applicable, define what code dependencies were added and/or updated.

TamaraNaboulsi marked this pull request as ready for review on October 1, 2024 at 15:35.