Add wrapper for agglomerateByRank/mergeRows #389

Daenarys8 · 2023-07-13T06:51:10Z

There are two different methods that do similar things (merging/grouping rows/features): mergeRows and agglomerateByRank.

This PR adds a wrapper method that combines the two.
It will be used internally by other functions, so it will not be directly available to users.

Signed-off-by: Daenarys8 <[email protected]>

TuomasBorman

Neat!

Add a couple of checks: .merge_features vs agglomerateByRank and .merge_features vs mergeRows

R/utils.R

Signed-off-by: Daenarys8 <[email protected]>

TuomasBorman

A couple of small things, looks good!

R/utils.R

tests/testthat/test-2merge.R

tests/testthat/test-3agglomerate.R

Signed-off-by: Daenarys8 <[email protected]>

TuomasBorman

very nice

antagomir · 2023-07-18T20:59:12Z

Status of this?

Daenarys8 · 2023-07-19T13:24:47Z

> Status of this?

Complete for now.

antagomir · 2023-07-19T13:46:04Z

Great!

We had a discussion elsewhere about function naming.

I would suggest updating as follows:

* mergeRows -> mergeFeatures (see issue Deprecate mergeRows/mergeCols #392)
* agglomerateByRank -> mergeFeaturesByRank (logical to use similar naming in both functions)

The downside of the term "merge" is that it is also used with another meaning: combining full SE objects, data.frames, matrices, etc. (rather than features within an object, as here). Another option would be agglomerateFeatures and agglomerateFeaturesByRank, but those are slower to type; or combineFeature / combineFeaturesByRank (but "combine" is also used in the same way as "merge", between distinct DataFrames).

Shall we deal with these issues here in this closely related PR, or should we open a new separate issue & PR?

TuomasBorman · 2023-07-20T06:51:53Z

> Great!
>
> We had a discussion elsewhere about function naming.
>
> I would suggest updating as follows:
>
> * mergeRows -> mergeFeatures (see issue [Deprecate mergeRows/mergeCols #392](https://github.com/microbiome/mia/issues/392))
> * agglomerateByRank -> mergeFeaturesByRank (logical to use similar naming in both functions)
>
> The downside of the term "merge" is that it is also used with another meaning: combining full SE objects, data.frames, matrices, etc. (rather than features within an object, as here). Another option would be agglomerateFeatures and agglomerateFeaturesByRank, but those are slower to type; or combineFeature / combineFeaturesByRank (but "combine" is also used in the same way as "merge", between distinct DataFrames).
>
> Shall we deal with these issues here in this closely related PR, or should we open a new separate issue & PR?

I also think mergeFeatures and mergeFeaturesByRank are better than agglomerate*, and it is good to have the same naming convention.

We also discussed enabling rowData variables in mergeRows. Currently, the groups must be supplied as a vector (the f parameter), but I think it would be nice to be able to specify the grouping variable directly from rowData as a column name. (Same for mergeCols.)
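The idea of accepting either a label vector or a column name can be sketched as follows. This is an illustration only, written in Python rather than R and not taken from mia: `resolve_groups`, `sum_rows_by_group`, and the dict standing in for rowData are hypothetical names, and summing counts within each group is an assumption of the sketch.

```python
def resolve_groups(row_data, f, n_rows):
    """Return one group label per row.

    row_data: dict of column name -> per-row values (hypothetical stand-in for rowData)
    f: a column name (str) or a sequence of labels, one per row
    """
    if isinstance(f, str):
        # f names a metadata column: look the labels up by name
        if f not in row_data:
            raise KeyError(f"'{f}' is not a column of the row metadata")
        labels = list(row_data[f])
    else:
        # f is already a vector of labels, as in the current behaviour
        labels = list(f)
    if len(labels) != n_rows:
        raise ValueError("need exactly one group label per row")
    return labels


def sum_rows_by_group(counts, labels):
    """Sum the rows of `counts` that share a label (assumed merge rule)."""
    merged = {}
    for row, label in zip(counts, labels):
        acc = merged.setdefault(label, [0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return merged


# Toy data: three features (rows), two samples (columns)
row_data = {"Genus": ["A", "A", "B"]}
counts = [[1, 2], [3, 4], [5, 6]]

by_name = sum_rows_by_group(counts, resolve_groups(row_data, "Genus", len(counts)))
by_vector = sum_rows_by_group(counts, resolve_groups(row_data, ["x", "y", "y"], len(counts)))
```

Both call styles end up in the same merge step; only the way the labels are obtained differs.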

There must be some reason why we didn't implement this before. Do you remember, @antagomir?

I think this PR can be merged, and these new modifications can be done in a different PR. @Daenarys8, can you implement these?

antagomir · 2023-07-20T09:45:27Z

I agree. I don't think there is a specific reason; the rowData variables could be referred to by name as well.

If @Daenarys8 can open a new issue and PR about that, it would be great.

You can close this issue when you have checked that it is ready.

Daenarys8 · 2023-07-20T13:17:19Z

> > Great!
> >
> > We had a discussion elsewhere about function naming.
> >
> > I would suggest updating as follows:
> >
> > * mergeRows -> mergeFeatures (see issue [Deprecate mergeRows/mergeCols #392](https://github.com/microbiome/mia/issues/392))
> > * agglomerateByRank -> mergeFeaturesByRank (logical to use similar naming in both functions)
> >
> > The downside of the term "merge" is that it is also used with another meaning: combining full SE objects, data.frames, matrices, etc. (rather than features within an object, as here). Another option would be agglomerateFeatures and agglomerateFeaturesByRank, but those are slower to type; or combineFeature / combineFeaturesByRank (but "combine" is also used in the same way as "merge", between distinct DataFrames).
> >
> > Shall we deal with these issues here in this closely related PR, or should we open a new separate issue & PR?
>
> I also think mergeFeatures and mergeFeaturesByRank are better than agglomerate*, and it is good to have the same naming convention.
>
> We also discussed enabling rowData variables in mergeRows. Currently, the groups must be supplied as a vector (the f parameter), but I think it would be nice to be able to specify the grouping variable directly from rowData as a column name. (Same for mergeCols.)
>
> There must be some reason why we didn't implement this before. Do you remember, @antagomir?
>
> I think this PR can be merged, and these new modifications can be done in a different PR. @Daenarys8, can you implement these?

Sure, will give it a shot

antagomir · 2023-07-21T10:15:12Z

Can someone close this PR when it is confirmed to be ready?

antagomir · 2023-07-24T09:23:59Z

Was this ever merged? If I read the above correctly, it was just closed without merging? The idea was to merge, I guess?

antagomir · 2023-07-24T22:32:04Z

Hmm, OK, now I noticed that the renaming scheme discussed and agreed above has not made it into this PR yet.

Can we add it (@Daenarys8)?

More specifically, the idea was to rename "Rows" to "Features" and "Cols" to "Samples".

In addition, the idea was to harmonize terminology (use "merge" instead of "agglomerate").

However, one last thing to discuss first: the meaning of "to agglomerate" is a somewhat better fit for our case (in terms of language and meaning), in particular when the phylogenetic tree is involved in the process. The phyloseq equivalent of the TreeSE agglomerateByRank is tax_glom (e.g. https://mikemc.github.io/speedyseq/reference/tax_glom.html).

We could use "glom" instead of "merge". That would be as fast to type, and the meaning would be more specific ("merge" is easier to confuse with merging of matrices). However, "glom" might be a somewhat odd term to introduce right now, and it is possible to re-evaluate that later as well.

Hence, in summary, I suggest renaming as follows:

* mergeCols -> mergeSamples (or glomSamples... perhaps not?)
* mergeRows -> mergeFeatures
* agglomerateByRank -> mergeFeaturesByRank
* agglomerateByPrevalence -> mergeFeaturesByPrevalence

@Daenarys8, could you add that to this PR as discussed above, OR open a new issue proposing this change and then merge and close the current PR? Also, please confirm that the other points discussed and agreed above have now been addressed.

antagomir · 2023-07-25T07:23:55Z

OK, it is clearer to do the other discussed changes in a separate PR. I am merging and closing this one.

antagomir · 2023-07-26T14:14:58Z

OK, now that this is merged:

The original motivation was ANCOMBC issue #174, where we would like to let users specify a taxonomic rank or another rowData variable for merging rows before running ANCOMBC: FrederickHuangLin/ANCOMBC#174

Now, ideally this new wrapper will help there: it would group by taxonomic rank if this is available in rowData, and otherwise use the more general grouping.
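The dispatch described here might look roughly like the sketch below. It is written in Python for illustration and is not the mia implementation: every function name, the rank list, and the dict standing in for rowData are hypothetical. Two further assumptions are labeled in the comments: the rank path groups on the chosen rank together with all higher ranks present, and merged rows are summed.

```python
# Assumed set of taxonomy ranks, from highest to lowest
TAXONOMY_RANKS = ("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")


def _sum_by(counts, labels):
    """Sum rows of `counts` sharing a label (assumed merge rule)."""
    merged = {}
    for row, label in zip(counts, labels):
        acc = merged.setdefault(label, [0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return merged


def agglomerate_by_rank(row_data, counts, rank):
    """Rank path: group on the rank plus all higher ranks present (assumption)."""
    upto = TAXONOMY_RANKS[: TAXONOMY_RANKS.index(rank) + 1]
    cols = [row_data[r] for r in upto if r in row_data]
    labels = ["_".join(parts) for parts in zip(*cols)]
    return _sum_by(counts, labels)


def merge_rows(row_data, counts, f):
    """General path: f is a metadata column name or a vector of labels."""
    labels = row_data[f] if isinstance(f, str) else list(f)
    return _sum_by(counts, labels)


def merge_features(row_data, counts, f):
    """Wrapper: use the rank path when f is a taxonomy rank found in row_data."""
    if isinstance(f, str) and f in TAXONOMY_RANKS and f in row_data:
        return agglomerate_by_rank(row_data, counts, f)
    return merge_rows(row_data, counts, f)


# Toy data: three features (rows), two samples (columns)
row_data = {
    "Phylum": ["P1", "P1", "P2"],
    "Genus": ["G1", "G1", "G1"],
    "cluster": ["c1", "c2", "c2"],
}
counts = [[1, 0], [2, 1], [4, 4]]

by_rank = merge_features(row_data, counts, "Genus")       # rank path
by_column = merge_features(row_data, counts, "cluster")   # general path
```

In the toy data the two Genus "G1" entries under different phyla stay separate on the rank path, which is the practical difference from grouping on the Genus column alone.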

Add wrapper for agglomerateByRank/mergeRows

c586ee8

Signed-off-by: Daenarys8 <[email protected]>

TuomasBorman approved these changes Jul 13, 2023

R/utils.R (outdated, resolved)

Up

a80ce25

Signed-off-by: Daenarys8 <[email protected]>

TuomasBorman requested changes Jul 15, 2023

R/utils.R (outdated, resolved)

R/utils.R (outdated, resolved)

R/utils.R (outdated, resolved)

tests/testthat/test-2merge.R (outdated, resolved)

tests/testthat/test-3agglomerate.R (resolved)

Daenarys8 added 2 commits July 15, 2023 19:26

Up

358eec4

Signed-off-by: Daenarys8 <[email protected]>

Up

c36d9ef

Signed-off-by: Daenarys8 <[email protected]>

TuomasBorman approved these changes Jul 16, 2023

Daenarys8 mentioned this pull request Jul 21, 2023

Enabling rowData variables in mergeRow #397

Closed

Merge branch 'master' into mergeby

891d849

Daenarys8 closed this Jul 24, 2023

antagomir reopened this Jul 24, 2023

Merge branch 'master' into mergeby

a8aea71

antagomir merged commit 8685bda into microbiome:master Jul 25, 2023
1 check passed

This was referenced Jul 25, 2023

merge by variable name #400

Closed

Deprecate mergeRows/mergeCols #392

Closed

antagomir assigned TuomasBorman and Daenarys8 Jul 26, 2023

antagomir mentioned this pull request Aug 24, 2023

Issue #174: TreeSE support non-taxonomic ranks FrederickHuangLin/ANCOMBC#201

Merged
