This repository has been archived by the owner on Nov 9, 2023. It is now read-only.

adds convergent mode to pick_open_reference_otus #1958

Closed

Conversation

gregcaporaso
Contributor

This replaces #1951.

We still need to do some more testing before this is merged, though. @josenavas, how is the EMP run going with this? As an additional test, can you confirm that all sequences are accounted for after different iterations? The count of input sequences should equal the count of sequences in the iteration's OTU map before singleton filtering.
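A minimal sketch of that sanity check (the helper names and file formats are illustrative, not QIIME's API; it assumes a FASTA input and a tab-delimited OTU map with one OTU per line, `otu_id<TAB>seq_id<TAB>seq_id...`):

```python
def count_fasta_seqs(fasta_fp):
    """Count sequences in a FASTA file by counting header lines."""
    with open(fasta_fp) as f:
        return sum(1 for line in f if line.startswith('>'))

def count_otu_map_seqs(otu_map_fp):
    """Count sequence ids in a tab-delimited OTU map.

    Each non-empty line is otu_id<TAB>seq_id<TAB>...; the first field
    is the OTU id, so subtract one per line.
    """
    with open(otu_map_fp) as f:
        return sum(len(line.rstrip('\n').split('\t')) - 1
                   for line in f if line.strip())

# The invariant to check after each iteration, before singleton filtering:
# count_fasta_seqs(iteration_input_fp) == count_otu_map_seqs(iteration_otu_map_fp)
```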

@josenavas
Member

Thanks for adding more documentation @gregcaporaso.
Agreed, we still need more testing. I'm thinking of making some further modifications to the code to improve performance.

The problem I found is that when the input files differ greatly in size (e.g., in the EMP some input files are 80 GB while others are less than 1 GB), once the smaller files are fully processed, the number of sequences analyzed per iteration drops. The change I'm planning is to set the number of sequences drawn from each input file in each step dynamically, so that each iteration analyzes approximately the same number of sequences. Does this sound reasonable to you @gregcaporaso?
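The dynamic allocation proposed above could be sketched roughly like this (function and variable names are hypothetical, not QIIME's API): each file contributes a share of its sequences proportional to its size in every iteration, so the per-iteration total stays roughly constant even when file sizes are very uneven.

```python
def seqs_per_iteration(seq_counts, n_iterations):
    """Compute how many sequences to draw from each input file per iteration.

    seq_counts: dict mapping input file path -> total sequence count
    n_iterations: number of iterations to spread the data across

    Each file contributes ~count/n_iterations sequences per iteration,
    so every iteration sees ~total/n_iterations sequences overall.
    (A real implementation would also handle the remainder sequences,
    e.g. by adding them to the final iteration.)
    """
    return {fp: max(1, count // n_iterations)
            for fp, count in seq_counts.items()}
```

For example, with one 80 GB file holding far more sequences than a small file, both are spread evenly across iterations instead of the small file being exhausted early.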

Another change would be to also allow convergent mode with a single input file, so we can analyze extremely large datasets in a convergent manner. Do you also agree with this change @gregcaporaso?

@ghost

ghost commented Mar 16, 2015

Build results will soon be (or already are) available at: http://ci.qiime.org/job/qiime-github-pr/1603/

@gregcaporaso
Contributor Author

Both of those sound like good additions, but I think you should focus on the first one, since we have an immediate application (EMP). Does the process still seem to be working for that analysis?

@josenavas
Member

Yeah, I will focus on the first one. The process seems to be working correctly on that data.
Another addition that I think will be very useful, and is going to be somewhat required, is the ability to checkpoint, i.e., if X iterations have already been executed and the process then fails, restart from that iteration rather than re-analyzing everything. I'm going to work on both of these issues today, as we are moving the compute to our local cluster at UCSD, and being able to resume work that has already been done will be extremely useful.

@josenavas
Member

Closing in favor of #1959.

@josenavas josenavas closed this Mar 17, 2015
@gregcaporaso gregcaporaso deleted the new-open-ref-workflow branch December 19, 2022 16:25