Taxonomy #18

Open
sjanssen2 opened this issue Nov 17, 2017 · 2 comments

Comments

@sjanssen2
Collaborator

sjanssen2 commented Nov 17, 2017

Improvement Description
I have been thinking about the FeatureData[Taxonomy] artifact and Daniel's warnings about the quality of the assigned taxonomic labels, which depends on how well the taxonomic labels are placed in the reference phylogeny. Furthermore, fragment insertion is not unambiguous: it results in a distribution of positions. I also remember Siavash suggesting his program TIPP for taxonomy assignment. I therefore think we should make the creation of a FeatureData[Taxonomy] a separate function instead of integrating it into the main function ("sepp").

Proposed Behavior
Currently, I am thinking about two alternatives to generate a FeatureData[Taxonomy], plus a possible third one for later:

  1. classify-paths: the current method, which collects all taxonomic labels along the path from tip to root. The single input would be the Phylogeny[Rooted] artifact.

  2. classify-otus: For every inserted fragment, we traverse the tree from tip to root. At each step, we check whether the current subtree contains any OTU nodes. If so, we stop; otherwise we repeat the procedure with the parent node. Once we have found one (or maybe several) OTUs, we look up their assigned taxonomic lineages in the Greengenes/Silva taxonomy table for the corresponding reference tree. If there are several OTUs, we report the longest common prefix. This would require two inputs: the Phylogeny[Rooted] artifact and the Greengenes taxonomy table with two columns, OTU-ID and lineage string. This is the more conservative method and should only produce results on par with current Greengenes-based taxonomy assignment algorithms.

  3. classify-tipp: A future development could use Siavash's TIPP to generate taxonomic lineages.
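
To make the difference between the first two options concrete, here is a minimal Python sketch of one possible reading of them. Everything in it is an assumption made for illustration: the `Node` class is a hypothetical stand-in for a rooted tree (not a QIIME 2 or scikit-bio type), the function names simply mirror the proposed method names, lineages are assumed to be "; "-separated strings, and the toy tree and taxonomy table are made up.

```python
class Node:
    """Hypothetical minimal rooted-tree node, used only for this sketch."""

    def __init__(self, name=None):
        self.name = name
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def tips(self):
        """Yield all tips in the subtree rooted at this node."""
        if not self.children:
            yield self
        for child in self.children:
            yield from child.tips()


def classify_paths(tip):
    """Collect the labels of all named ancestors of `tip`, ordered root first."""
    labels = []
    node = tip.parent
    while node is not None:
        if node.name:
            labels.append(node.name)
        node = node.parent
    return "; ".join(reversed(labels))


def longest_common_prefix(lineages):
    """Shared leading ranks of several '; '-separated lineage strings."""
    prefix = []
    for ranks in zip(*(lineage.split("; ") for lineage in lineages)):
        if len(set(ranks)) != 1:
            break
        prefix.append(ranks[0])
    return "; ".join(prefix)


def classify_otus(tip, otu_taxonomy):
    """Walk from `tip` toward the root and stop at the first subtree that
    contains reference OTUs; report the common prefix of their lineages."""
    node = tip.parent  # the tip itself is a fragment, so start one level up
    while node is not None:
        hits = [t.name for t in node.tips() if t.name in otu_taxonomy]
        if hits:
            return longest_common_prefix([otu_taxonomy[h] for h in hits])
        node = node.parent
    return ""  # no reference OTU anywhere in the tree


# Toy insertion tree: two reference clades and two inserted fragments.
root = Node("k__Bacteria")
firm = root.add(Node("p__Firmicutes"))
firm.add(Node("OTU1"))
inner = firm.add(Node())            # unlabeled internal node
inner.add(Node("OTU2"))
frag = inner.add(Node("acgtacgt"))  # inserted fragment, sister to OTU2
frag2 = firm.add(Node("ttgaccga"))  # inserted fragment, directly below p__Firmicutes
root.add(Node("p__Proteobacteria")).add(Node("OTU3"))

# Made-up two-column taxonomy table: OTU-ID -> lineage string.
taxonomy = {
    "OTU1": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "OTU2": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "OTU3": "k__Bacteria; p__Proteobacteria",
}

print(classify_paths(frag))            # k__Bacteria; p__Firmicutes
print(classify_otus(frag, taxonomy))   # k__Bacteria; p__Firmicutes; c__Clostridia
print(classify_otus(frag2, taxonomy))  # k__Bacteria; p__Firmicutes
```

In this toy case classify-otus is deeper than classify-paths for the first fragment because its only neighbouring OTU carries a class-level lineage, while the second fragment falls back to the common prefix of OTU1 and OTU2.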

Questions
@wasade what are your thoughts?

@wasade
Member

wasade commented Nov 17, 2017 via email

@sjanssen2
Collaborator Author

I think I want to change classify-paths to operate on two inputs: the insertion tree (Phylogeny[Rooted]) and the representative sequences (FeatureData[Sequence]), so that lineages are collected only for the inserted tips.
Otherwise, one would need to guess which tips belong to the reference phylogeny and which are inserted fragments. Guessing might work as long as fragment names are nucleotide sequences, but since we allow arbitrary reference phylogenies, we cannot rule out reference tip names that consist only of "acgt" characters.
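
A small Python sketch of why the second input helps. The helper names and the example tip names are hypothetical; the only assumption taken from the discussion above is that inserted tips are named after the feature IDs of the representative sequences.

```python
import re


def inserted_tips_by_guess(tip_names):
    """Heuristic: treat every tip whose name consists only of a/c/g/t
    characters as an inserted fragment."""
    return {name for name in tip_names if re.fullmatch(r"[acgtACGT]+", name)}


def inserted_tips_by_ids(tip_names, rep_seq_ids):
    """Explicit: keep only tips whose names are feature IDs of the
    representative sequences."""
    return set(tip_names) & set(rep_seq_ids)


# "gattaca" is a made-up reference tip whose name happens to look like a sequence.
tip_names = ["OTU1", "OTU2", "gattaca", "acgtacgt", "ttgaccga"]
rep_seq_ids = ["acgtacgt", "ttgaccga"]

print(sorted(inserted_tips_by_guess(tip_names)))
# ['acgtacgt', 'gattaca', 'ttgaccga']
print(sorted(inserted_tips_by_ids(tip_names, rep_seq_ids)))
# ['acgtacgt', 'ttgaccga']
```

The heuristic misclassifies the reference tip, while the explicit lookup does not depend on how reference tips are named.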
