Optimize time and space consumption. #85

Open
thequicksort opened this issue Feb 6, 2021 · 2 comments
Labels
enhancement New feature or request
Milestone

Comments

@thequicksort
Contributor

Brainstorming ideas for post-release optimizations. Please add them as you think of them.

  • (Katie mentioned): Removing Random Forest classifier

  • Use Python __slots__ in dataclasses that have no field defaults (should reduce attribute-access time and per-instance memory).
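A minimal sketch of the slots idea, assuming a made-up record type (the names `PlainRead` and `SlottedRead` and their fields are hypothetical, not taken from this codebase). Declaring `__slots__` by hand works with `@dataclass` as long as no field has a default value, which is why the bullet above limits it to dataclasses without defaults:

```python
from dataclasses import dataclass


@dataclass
class PlainRead:
    # Ordinary dataclass: each instance carries its own __dict__.
    read_id: str
    start: int
    end: int


@dataclass
class SlottedRead:
    # Hypothetical slotted variant. A field default here would clash
    # with the slot descriptor of the same name, so none are given.
    __slots__ = ("read_id", "start", "end")
    read_id: str
    start: int
    end: int


plain = PlainRead("r1", 0, 100)
slotted = SlottedRead("r1", 0, 100)

# The slotted instance has no per-instance __dict__, which is where
# the memory saving comes from.
plain_has_dict = hasattr(plain, "__dict__")
slotted_has_dict = hasattr(slotted, "__dict__")
```

The trade-off is that slotted instances reject attributes that are not listed in `__slots__`, so any code that attaches ad hoc attributes would need to change.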

@thequicksort thequicksort added the enhancement New feature or request label Feb 6, 2021
@thequicksort thequicksort added this to the 1.1 milestone Feb 6, 2021
@thequicksort
Contributor Author

It turns out that ONT has released some documentation for many of its data types. Currently we store things as Int64s which could be Int16s: https://github.com/nanoporetech/minknow_api/blob/6f2dfb66bf0ff03edd0a57d758913110f08c7f07/proto/minknow_api/data.proto#L302

Maybe there's an optimization there? But their wording seems "shifty" enough that I wouldn't bet the bank on it. We could try the optimization and keep it if the space savings are above some threshold and we feel safe that ONT will stay consistent.
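One cautious way to try this, sketched with NumPy under the assumption that the values live in integer arrays (the helper name `try_downcast_to_int16` and the sample values are hypothetical): only narrow to Int16 when every value actually fits, and fall back to the original array otherwise, so an inconsistency on ONT's side cannot silently corrupt data.

```python
import numpy as np


def try_downcast_to_int16(samples: np.ndarray) -> np.ndarray:
    """Return an int16 copy if every value fits, else the input unchanged."""
    limits = np.iinfo(np.int16)
    if samples.size and (samples.min() < limits.min or samples.max() > limits.max):
        # At least one value would overflow, so keep the wider dtype.
        return samples
    return samples.astype(np.int16)


# Made-up values that all fit in the int16 range.
raw = np.array([-120, 0, 431, 32767], dtype=np.int64)
small = try_downcast_to_int16(raw)

# Fraction of in-memory bytes saved by the narrower dtype.
savings = 1 - small.nbytes / raw.nbytes

# A value outside the int16 range leaves the array untouched.
too_big = np.array([0, 40000], dtype=np.int64)
kept = try_downcast_to_int16(too_big)
```

The `savings` figure could be compared against whatever threshold we settle on before committing to the narrower type.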

@thequicksort
Contributor Author

Cool info on optimizing h5py:
https://www.nersc.gov/assets/Uploads/H5py-2017-Feb23.pdf
