add SINA reference based alignment #53
Comments
Thanks, @epruesse! Another option would be, for now, to create a separate plugin as |
@wasade If I went the I think a minimal feature route with The primary features offered by the ARB dependency would not easily integrate with Qiime2 anyway. The ARB files are something of QZAs themselves, combining multiple sequence features, alignments, phylogenetic trees, taxonomic groupings, per-organism and per-alignment-column metadata, etc into one database file. |
It's similar to the other plugins for hosting. See I think an ARB independent version of SINA would reduce deployment burden, and perhaps it would make sense as a PR on this repo. @gregcaporaso or @thermokarst, any issues with a PR here vs. a separate q2 plugin? |
I think including in q2-alignment would be a good way to go if possible. This would be a really nice addition - thanks for your interest in contributing it @epruesse! |
@gregcaporaso I had waited for a while, hoping someone might add PyNAST so that I could copy-paste more easily ;-) |
@wasade Is there a way to have software required by q2 plugins installed into separate conda environments? I believe we touched upon this topic at ISME. Turns out that
In this case I know the culprit - the |
This is a good reason for moving away from the "core distribution" model and toward an a la carte "choose your favorite plugins" style (@ebolyen and I have been discussing this at length in recent months). Then if there are conflicting plugins, we at least stand a chance (just make two different envs and bounce between them). |
OFF TOPIC - re dependency hell

Roughly, what I'd envision is defining an action with an extra parameter:

    register_function(conda_env={'mycommand': {'bioconda': ['fastspar']}}, ...)

The function would then be passed an accessor object via which commands can be executed.

    def myfunction(input1: type1, conda_env: qiime2.envrunner):
        conda_env.run("mycommand", ['mycommand', '-i', str(input1)])
        ...

Upon call, the accessor object can ensure that the environment is available, installing it on demand in a suitable place (same disk as conda installation to allow hard links to work), and prefix the command with

This would limit dependency hell to the things needed by the plugin python code, which is much more controllable than the needs of the wrapped tools. Initial installation size would also be reduced, potentially allowing many more plugins to be delivered with qiime core.

The CI system could then actually pre-resolve the environments, run unit-tests and store fully qualified URLs to be used for creating environments on the user side. The latter would prevent user/support pain with changing packages in the conda channels, and also has the potential of greatly increasing installation speed.

I've got code doing essentially this for my own pipeline. If you want to derive from that I'll happily grant you license (code is GPL but all mine) to do so. |
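For illustration only, here is a minimal Python sketch of what such an accessor object could look like. It is not the code referred to in the comment and not an existing QIIME 2 API: the class name `EnvRunner`, its methods, the one-environment-per-command layout, and the use of `conda create --prefix` and `conda run --prefix` to create and enter the environment are all assumptions made for this example. The executor is injectable so the logic can be exercised without conda installed.

```python
import subprocess
from pathlib import Path


class EnvRunner:
    """Hypothetical accessor that runs wrapped tools in their own conda envs."""

    def __init__(self, spec, root, execute=subprocess.check_call):
        # spec maps a command name to {channel: [packages]}, mirroring the
        # conda_env={'mycommand': {'bioconda': ['fastspar']}} idea above.
        self.spec = spec
        self.root = Path(root)   # assumed location for on-demand environments
        self.execute = execute   # injectable so tests need no real conda

    def prefix(self, name):
        return self.root / name

    def create_command(self, name):
        cmd = ["conda", "create", "--yes", "--prefix", str(self.prefix(name))]
        for channel in self.spec[name]:
            cmd += ["--channel", channel]
        for packages in self.spec[name].values():
            cmd += packages
        return cmd

    def ensure(self, name):
        # A conda environment always contains a conda-meta directory, so its
        # absence is treated here as "environment not installed yet".
        if not (self.prefix(name) / "conda-meta").is_dir():
            self.execute(self.create_command(name))
        return self.prefix(name)

    def run(self, name, argv):
        prefix = self.ensure(name)
        # Assumption: the command is executed through `conda run`.
        return self.execute(["conda", "run", "--prefix", str(prefix), *argv])
```

With this sketch, `EnvRunner(spec, root).run("mycommand", ["mycommand", "-i", path])` would install the environment on first use and reuse it afterwards. Pinning pre-resolved package URLs, as suggested for CI, is left out.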
Ok, I've got the dependency issue fixed. How do I handle the ARB files and the indices that SINA needs for alignment? The "normal", non-Qiime workflow looks like this:
Step 2 can take quite a while, but since we usually work with a static reference database, the hour or so spent is only expended once. Questions:
|
|
ping @gregcaporaso - could you merge #56? The biggest chunk it adds in terms of disk is W.r.t. the ARB data type problem: I've worked around it by accepting either |
ACK. I'll push an update to #54 this evening then to see whether the unit tests pass on that. |
That way, using a large reference database it can stay in I'm also working on pushing SINA 1.4 out which will have native parallel processing and a built-in search engine obviating the need for the |
I had a thought the other day about the ARB situation. What if QIIME 2 artifacts had an ephemeral host-specific caching layer? We had discussed holding a "view cache" within the artifact, but we've always kind of discarded the idea since it breaks the immutability of an artifact. However if the cache (in this case an arb file) was just somewhere else on disk, we could use the UUID to look up if such a file exists. This would be useful for all manner of index formats which are usually large and avoids the unzip problem. The most immediate issue would be the provenance tracking of that transformation (which could have happened with different software versions), but that seems like a workable problem to me. |
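A rough Python sketch of the host-local cache idea, purely as an assumption about how it might work: `ViewCache`, `get_or_build`, and the `<uuid>.<format>` file naming are hypothetical, and provenance tracking of the transformation is not addressed. The derived file (for example an ARB database) lives outside the artifact and is looked up by the artifact's UUID, so the artifact itself stays immutable.

```python
import os
import tempfile
from pathlib import Path


class ViewCache:
    """Hypothetical on-disk cache of derived files, keyed by artifact UUID."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def path_for(self, uuid, fmt):
        # One file per (artifact UUID, derived format), e.g. "<uuid>.arb".
        return self.root / f"{uuid}.{fmt}"

    def get_or_build(self, uuid, fmt, build):
        target = self.path_for(uuid, fmt)
        if target.exists():
            return target  # cache hit: skip the expensive transformation
        # Build into a temporary file in the same directory, then rename,
        # so an interrupted build never leaves a half-written cache entry.
        fd, tmp = tempfile.mkstemp(dir=self.root, suffix=".part")
        os.close(fd)
        try:
            build(Path(tmp))
            os.replace(tmp, target)
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise
        return target
```

Here `build` is any callable that writes the derived file to the path it is given. If the transformation takes parameters, those would have to become part of the key as well.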
@ebolyen There are a few concepts in there:
Be aware that indexing is often parametrizable, even if that isn't used very often. |
I was thinking more along the lines of 2 but 1 would be pretty easy to do as a side effect of 2. I guess the hard part becomes not using up all of the disk without the user realizing. |
@ebolyen Exactly. And managing that is non-trivial. No good OS mechanisms exist to e.g. trigger a cleanup if disk space becomes low. The closest is using |
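Absent an OS-level trigger, one application-level fallback is to cap the cache at a size budget and evict the oldest entries whenever it is exceeded. The sketch below is only an illustration of that idea and is not the mechanism mentioned in the comment above. The function name `evict_to_budget` is hypothetical, and it assumes a flat cache directory in which the cache updates each file's modification time whenever the file is used.

```python
from pathlib import Path


def evict_to_budget(cache_dir, max_bytes):
    """Delete the oldest files until the directory fits within max_bytes.

    Assumes mtime reflects last use (the cache would have to touch files on
    every hit); access times are avoided since many mounts do not update them.
    """
    entries = []
    for path in Path(cache_dir).iterdir():
        if path.is_file():
            stat = path.stat()
            entries.append((stat.st_mtime, stat.st_size, path))

    total = sum(size for _, size, _ in entries)
    removed = []
    # Oldest first; stop as soon as the total is within budget.
    for _, size, path in sorted(entries, key=lambda entry: entry[0]):
        if total <= max_bytes:
            break
        path.unlink()
        total -= size
        removed.append(path)
    return removed
```

Such a function could be called after every cache write. It does not react to the disk filling up for other reasons, which is the harder problem raised above.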
@gregcaporaso #54 builds and tests fine now. Where do you want me to add docs? Anything else on your wishlist for this? |
See q2-alignment-reference-based-alignment-using-sina Is there a way to show warnings to the user even if |
No, unfortunately not (related: qiime2/qiime2#141). Would it make sense to raise an |
Some initial work for this was done in #54. Future implementations might want to build on the above. |
TODO
Addition Description
Allow using SINA to align to a reference (e.g. Greengenes or SILVA)
Current Behavior
Only de-novo alignment using MAFFT
Proposed Behavior
Reference-based alignment is useful for placement into a pre-existing tree, since the alignments must match, and it promises better scalability than progressive de-novo alignment.
Comments
SINA (currently) requires ARB, which in turn requires a ton of software, including X11. That creates a bit of a problem for a core plugin.
@wasade told me to PR to qiime2/q2-alignment anyway for now. For production, the issue will need some kind of fix. Options that come to mind:
qiime alignment sina
would be the natural spot, but this would require a (clean) way to inject actions into core plugins.
References
See PR #54