agglomerate* functions: behavior when NA #411

TuomasBorman · 2023-08-02T08:34:29Z

Leo;

1. agglomerateByPrevalence: Your analysis seems correct: the "Other" group contains only 0s and 1s, so the function cannot tell whether these are actual counts that are sensible to sum, and it throws a warning. In principle this works as expected. In practice, we already know from context that the data are counts: we could test that the original mae[[1]] is count data, and then any subset of it (including the rows merged under the "Other" category) is count data as well. This deserves a small fix that checks the "count" status only for the original input in such cases. It will require some thought about the logic of the method.
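The idea of checking the "count" status once, on the original input, can be sketched in base R. This is only an illustration under stated assumptions: is_binary_like() is a hypothetical helper and not part of mia, the matrix is made-up toy data, and the actual check and warning inside mia may be implemented differently.

```r
# Hypothetical helper (not the mia API): TRUE when every value is 0 or 1,
# i.e. the matrix could be presence/absence data rather than counts.
is_binary_like <- function(mat) {
  all(mat %in% c(0, 1))
}

# Toy count matrix: 4 taxa x 3 samples (made-up numbers).
counts <- matrix(
  c(10, 0, 1,
     1, 1, 0,
     0, 1, 1,
    25, 7, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:3))
)

# Rows that would be merged under "Other" (here: the two rare taxa).
rare <- c("taxon2", "taxon3")

# Checking the subset alone gives a false alarm, because the rare rows
# happen to contain only 0s and 1s ...
subset_flag <- is_binary_like(counts[rare, , drop = FALSE])
# ... whereas the original input is clearly not binary.
input_flag <- is_binary_like(counts)

# Decide once on the original input, then sum the rare rows without
# checking them again.
if (input_flag) {
  warning("Input looks binary; summing it may give meaningless values.")
}
other <- colSums(counts[rare, , drop = FALSE])
```

Here the subset on its own would trigger the warning, while the check on the full input does not, so the "Other" row is summed silently.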

I also noticed another point: agglomerating by rank and by prevalence gives different total read counts per sample, although the totals would be expected to be identical (only the grouping of the rows differs).

colSums(assay(agglomerateByRank(mae[[1]], rank = "Phylum")))
colSums(assay(agglomerateByPrevalence(mae[[1]], rank = "Phylum")))

This is because the Phylum rank is NA for some rows: sum(is.na(rowData(mae[[1]])$Phylum)) yields 93. These rows are omitted by agglomerateByPrevalence but not by agglomerateByRank, which includes them as an NA row. It would be most logical to include the NA row also in the data that is agglomerated by prevalence; the user can then choose whether to merge the NA row further. One problem with the NA row is that its members may come from different phyla, so grouping them together in a phylum-level agglomeration is potentially misleading. I would solve this by providing a binary argument that excludes the NA phyla by default in all agglomerations (by rank, prevalence, or another grouping variable), while letting the user keep them by switching the argument. Users who do so are aware of the issue and can preserve the original read counts, which might be relevant in some cases.
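A minimal base R sketch of the proposed behavior, on made-up toy data, shows how dropping the NA rows changes the per-sample totals and how a switch restores them. agglomerate_toy() and its keep_na argument are hypothetical names used only for illustration; they are not the mia functions or their arguments.

```r
# Toy data: 5 taxa x 2 samples; two rows have no phylum assignment.
counts <- matrix(
  c(5, 1,
    3, 2,
    4, 0,
    2, 6,
    1, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:5), c("s1", "s2"))
)
phylum <- c("Firmicutes", "Firmicutes", NA, "Bacteroidetes", NA)

# Hypothetical helper (not the mia API): sum rows by group. Rows with a
# missing label are dropped by default, or collected into a single "NA"
# row when keep_na = TRUE.
agglomerate_toy <- function(mat, group, keep_na = FALSE) {
  if (keep_na) {
    group <- ifelse(is.na(group), "NA", group)
  } else {
    keep <- !is.na(group)
    mat <- mat[keep, , drop = FALSE]
    group <- group[keep]
  }
  rowsum(mat, group)
}

dropped <- agglomerate_toy(counts, phylum)                 # NA rows excluded
kept <- agglomerate_toy(counts, phylum, keep_na = TRUE)    # NA rows kept

colSums(counts)   # original totals per sample
colSums(dropped)  # smaller, because the NA rows are gone
colSums(kept)     # same as the original totals
```

With the default, the totals shrink by exactly the reads in the unassigned rows; with keep_na = TRUE the per-sample totals match the input, at the cost of an "NA" row that may mix several phyla.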

TuomasBorman · 2023-09-06T17:13:40Z

Opened a new PR, so closing this one: #438

TuomasBorman and others added 8 commits August 1, 2023 15:35

up

2f5613a

Merge branch 'master' into agglomerate_NAs

4556f83

up

24d17b9

up

a958195

Merge branch 'master' into agglomerate_NAs

5fac39c

up

b1c10fa

Merge branch 'agglomerate_NAs' of github.com:microbiome/mia into agglomerate_NAs # Conflicts: # R/merge.R

4efa8a6

up

03871d9

TuomasBorman marked this pull request as draft September 6, 2023 16:39

TuomasBorman added 2 commits September 6, 2023 19:56

up

dd7c966

Merge branch 'agglomerate_NAs' of github.com:microbiome/mia into agglomerate_NAs

9641981

TuomasBorman mentioned this pull request Sep 6, 2023

agglomerateFunction when NAs #438

Merged

TuomasBorman closed this Sep 6, 2023

Daenarys8 deleted the agglomerate_NAs branch August 6, 2024 09:54
