Importer for Biobakery outputs: HUMAnN 3 #190

Open
antagomir opened this issue Dec 22, 2021 · 9 comments

@antagomir
Member

HUMAnN 3 provides functional predictions for metagenome profiles. An importer to MAE or altExp in mia would be useful as this is a common format.

Later in this page there is one import code example.

Some example data is on the way, for a closer look.

These are functional predictions based on metagenome profiles; they are not functional measurements (e.g. metabolites). Hence altExp might also be suitable, since it is another view of the same data (the metagenome) from which we also pull taxonomic abundance profiles. Conceptually, MAE could be suitable since taxonomic and functional profiles are two different data types, even if derived from the same source. I would tend to choose the latter (MAE).

@microsud
Member

I have been thinking about this also. The HUMAnN 3 output has one key aspect that is often underestimated.
Example data: [image]

There is function-species linkage information that can be viewed in two ways:
[image]

This is also the case with genome-resolved metagenomics where we have MAGs and pathway information for each of the MAGs across samples. So this is a general aspect which needs attention.

How can we store such information? Joining feature and microbe in a single column is not always the best form for analysis.
This is more like single-cell data, where pathway information for each microbe is available for every sample. Moreover, many pathways can be unique to specific microbes. Usually, though, we end up summing pathways per sample and thereby lose the information about which microbes contribute to these functions. In a biological sense this is a crucial aspect, considering the high functional redundancy in microbiomes.
During my own analysis, for instance, I found pathways that were interesting, then looked at which microbes contributed to them, and found interesting patterns in the bacterial contributions. I have been thinking about this but have had no eureka moment, or maybe I am just overthinking here :P
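To make the trade-off above concrete, here is a minimal sketch in plain Python (not the mia or HUMAnN API). It assumes stratified rows are named `feature|taxon`, as in HUMAnN's stratified output; the pathway IDs, taxa, and abundance values are invented for illustration, and all function names are hypothetical.

```python
from collections import defaultdict

# Toy stratified table (all values invented for illustration).
# Row names follow the "feature|taxon" convention; inner dicts map sample -> abundance.
stratified = {
    "PWY-A|g__Bacteroides.s__Bacteroides_vulgatus": {"S1": 10.0, "S2": 0.0},
    "PWY-A|g__Escherichia.s__Escherichia_coli": {"S1": 5.0, "S2": 8.0},
    "PWY-B|g__Escherichia.s__Escherichia_coli": {"S1": 2.0, "S2": 3.0},
}


def split_feature(feature_id):
    """Split 'pathway|taxon' into its two parts; rows without '|' are unstratified."""
    pathway, _, taxon = feature_id.partition("|")
    return pathway, taxon or "unstratified"


def to_long(table):
    """Flatten the table into (pathway, taxon, sample, value) records."""
    records = []
    for feature_id, samples in table.items():
        pathway, taxon = split_feature(feature_id)
        for sample, value in samples.items():
            records.append((pathway, taxon, sample, value))
    return records


def sum_by_pathway(records):
    """Collapse over taxa: the usual per-sample pathway totals."""
    totals = defaultdict(float)
    for pathway, _taxon, sample, value in records:
        totals[(pathway, sample)] += value
    return dict(totals)


def contributors(records, pathway, sample):
    """Recover which taxa carry a pathway in one sample (lost after summing)."""
    return {
        taxon: value
        for p, taxon, s, value in records
        if p == pathway and s == sample and value > 0
    }


long_records = to_long(stratified)
pathway_totals = sum_by_pathway(long_records)
```

Once `pathway_totals` is computed, the taxon dimension is gone; `contributors()` only works because the long-form records were kept alongside it.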

@antagomir
Member Author

It is important, and we must learn as we go. I have not seen comprehensive R-based solutions that bring these levels together, and SE/MAE is a promising framework, albeit not necessarily the final one. The MAE container does not require that features are matched. Additional information linking the features (rows), i.e. genes, pathways, and taxa, between MAE experiments is needed in many analyses; could it be added through rowData, or in the experiment metadata?

The sampleMap mechanism allows more complex matchings between colData and the individual experiments in MAE, but an equivalent for features might be missing.
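As a rough illustration of what a feature-level counterpart to sampleMap could look like, here is a sketch in plain Python. It is purely hypothetical: `feature_links` and `build_index` are made-up names, not part of MAE or mia, and the taxa and pathway IDs are invented.

```python
from collections import defaultdict

# Hypothetical link table between rows of two experiments:
# (row in the taxonomic experiment, row in the functional experiment).
feature_links = [
    ("s__Bacteroides_vulgatus", "PWY-A"),
    ("s__Escherichia_coli", "PWY-A"),
    ("s__Escherichia_coli", "PWY-B"),
]


def build_index(links):
    """Build lookups in both directions from a many-to-many link table."""
    by_taxon = defaultdict(set)
    by_function = defaultdict(set)
    for taxon, function in links:
        by_taxon[taxon].add(function)
        by_function[function].add(taxon)
    return dict(by_taxon), dict(by_function)


by_taxon, by_function = build_index(feature_links)
```

The link table is many-to-many by design, so a taxon without predicted functions, or a function without an assigned taxon, simply has no entry.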

@FelixErnst
Contributor

This requires an additional class to be defined, if such a class is not already available in BioC, since MAE links samples, not features, as @antagomir pointed out.

The requirements would be as follows:

  • To be compatible with MAEs, it would need to extend TSE
  • A hard-coded alternative TSE slot to hold the "mirror" data, also as a TSE
  • A hard-coded slot for the linking data (also allowing for non-linked data?)
  • An invert function to switch between species and gene data
  • A getter/setter pair for the alternative data slot
  • Reimplementation of all the necessary functions from the TSE, SCE and SE universe (this is not hard, but probably a bit of work: each call would need to be applied twice, to the data and to the alternative data, and the results recombined)

The downside is that it would allow only two types of data mirroring each other, unlike MAE, which allows a large number of data types.

However, I think this can be justified in this instance, since the number of samples has to be equal in both cases (a limitation not imposed by MAE) and the type of data is very specific to microbiome data analysis. I would call the class MicrobiomeExperiment 😆 🤣
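The two-view idea with an invert operation can be sketched roughly as follows, in plain Python rather than as an S4 class extending TSE. `LinkedExperiment` and its fields are hypothetical names used only for illustration; nothing here reflects an existing Bioconductor class.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedExperiment:
    """Two mirrored tables over the same samples plus row-level links (hypothetical)."""

    primary: dict  # row name -> {sample: value}, e.g. species data
    mirror: dict   # row name -> {sample: value}, e.g. gene/pathway data
    links: list = field(default_factory=list)  # (primary_row, mirror_row) pairs

    def __post_init__(self):
        # Both views must cover the same samples (the constraint MAE does not impose).
        if self._samples(self.primary) != self._samples(self.mirror):
            raise ValueError("primary and mirror data must share the same samples")

    @staticmethod
    def _samples(table):
        return {sample for row in table.values() for sample in row}

    def invert(self):
        """Swap the two views and flip the direction of every link."""
        flipped = [(m, p) for p, m in self.links]
        return LinkedExperiment(self.mirror, self.primary, flipped)


species = {"s__Escherichia_coli": {"S1": 0.4, "S2": 0.6}}
pathways = {"PWY-A": {"S1": 5.0, "S2": 8.0}}
exp = LinkedExperiment(species, pathways, [("s__Escherichia_coli", "PWY-A")])
inverted = exp.invert()
```

Inverting twice returns an object equal to the original, which is a cheap sanity check for any real implementation of the invert function.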

@antagomir
Member Author

antagomir commented Dec 31, 2021

Whoa! Well this could be useful and valuable. It is also some work. Let's see how we get there.. PRs welcome! :-)

One thing to consider more carefully before jumping in: whether there are alternative (completely different?) solutions for operating in this space, or whether the broader SE community is already working on this.

@antagomir
Member Author

Related to #383

@antagomir
Member Author

Also related to #306 #308

@antagomir
Member Author

Does mia::importHUMAnN() solve this one already (can we close)?

@TuomasBorman
Contributor

It imports a single HUMAnN file into a TreeSE. That might be the best solution currently. The HUMAnN output has species information, which is stored in rowData, but single HUMAnN/MetaPhlAn files are not linked, if that was the idea.

@antagomir
Member Author

Yes, two different issues:

  1. importing functional predictions with importHUMAnN() into TreeSE or similar
  2. linking taxonomic and functional data via MAE

It seems that we have solved (1) satisfactorily now.

The second issue remains open. Not sure if it is feasible to provide a general solution.

However, we could transfer the issue to OMA and demonstrate how to use MAE (or altExp, which might work even better since the samples match one-to-one) to link the two types of data.
