This repository has been archived by the owner on Nov 9, 2023. It is now read-only.

adds convergent mode to pick_open_reference_otus #1958

Closed

Conversation

gregcaporaso
Contributor

This replaces #1951.

We still need to do some more testing before this is merged, though. @josenavas, how is the EMP run going with this? As an additional test, can you confirm that all sequences are accounted for after different iterations? The count of input sequences should equal the count of sequences in the iteration's OTU map before singleton filtering.
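A minimal sketch of that sanity check (the helper names and file formats are illustrative, not QIIME's API; it assumes a FASTA input and a tab-delimited OTU map with one OTU per line, `otu_id<TAB>seq_id<TAB>seq_id...`):

```python
def count_fasta_seqs(fasta_fp):
    """Count sequences in a FASTA file by counting header lines."""
    with open(fasta_fp) as f:
        return sum(1 for line in f if line.startswith('>'))

def count_otu_map_seqs(otu_map_fp):
    """Count sequence ids in a tab-delimited OTU map.

    Each non-empty line is otu_id<TAB>seq_id<TAB>...; the first field
    is the OTU id, so subtract one per line.
    """
    with open(otu_map_fp) as f:
        return sum(len(line.rstrip('\n').split('\t')) - 1
                   for line in f if line.strip())

# The invariant to check after each iteration, before singleton filtering:
# count_fasta_seqs(iteration_input_fp) == count_otu_map_seqs(iteration_otu_map_fp)
```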

@josenavas
Member

Thanks for adding more documentation @gregcaporaso.
Agreed, we still need more testing. I'm thinking of making some further modifications to the code to improve performance.

The problem I found is that when the input files differ greatly in size (e.g., in the EMP some input files are 80 GB while others are less than 1 GB), once the smaller files are fully processed, the number of sequences analyzed per iteration drops. The change I'm planning is to set the number of sequences drawn from each input file in each step dynamically, so that each iteration analyzes approximately the same number of sequences. Does this sound reasonable to you @gregcaporaso?
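The dynamic allocation proposed above could be sketched roughly like this (function and variable names are hypothetical, not QIIME's API): each file contributes a share of its sequences proportional to its size in every iteration, so the per-iteration total stays roughly constant even when file sizes are very uneven.

```python
def seqs_per_iteration(seq_counts, n_iterations):
    """Compute how many sequences to draw from each input file per iteration.

    seq_counts: dict mapping input file path -> total sequence count
    n_iterations: number of iterations to spread the data across

    Each file contributes ~count/n_iterations sequences per iteration,
    so every iteration sees ~total/n_iterations sequences overall.
    (A real implementation would also handle the remainder sequences,
    e.g. by adding them to the final iteration.)
    """
    return {fp: max(1, count // n_iterations)
            for fp, count in seq_counts.items()}
```

For example, with one 80 GB file holding far more sequences than a small file, both are spread evenly across iterations instead of the small file being exhausted early.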

Another change would be to also allow convergent mode with a single input file, so we can analyze extremely large datasets in a convergent manner. Do you also agree with this change @gregcaporaso?

@ghost

ghost commented Mar 16, 2015

Build results will soon be (or already are) available at: http://ci.qiime.org/job/qiime-github-pr/1603/

@gregcaporaso
Contributor Author

Both of those sound like good additions, but I think you should focus on the first one, since we have an immediate application (EMP). Does the process still seem to be working for that analysis?

@josenavas
Member

Yeah, I will focus on the first one. The process seems to be working correctly on that data.
Another addition that I think will be very useful, and is going to be somewhat required, is the ability to checkpoint, i.e., if X iterations have already been executed and the process then fails, restart from that iteration rather than re-analyzing everything. I'm going to work on both of these issues today, as we are moving the compute to our local cluster at UCSD, and being able to resume work that has already been done will be extremely useful.

@josenavas
Member

Closing in favor of #1959.

@josenavas josenavas closed this Mar 17, 2015
@gregcaporaso gregcaporaso deleted the new-open-ref-workflow branch December 19, 2022 16:25