upgrade to adonis2 #328

Open
nbokulich opened this issue May 24, 2022 · 7 comments

@nbokulich
Member

Improvement Description
Upgrade to the vegan function adonis2 to access additional functionality unavailable in adonis1.

To quote @mestaki :

The most important benefit would be that adonis2 allows marginal testing of variables using the by="margin" parameter, and the output gives a partial R-squared for each variable in the model.

Current Behavior
Currently, QIIME 2's adonis uses adonis1 (the original adonis() function) in vegan. This lacks some functionality of adonis2.

To quote @mestaki :

The current QIIME 2 version of adonis can only perform a sequential test of terms (equivalent to by="terms" in adonis2()), meaning the order of the variables in the formula is important, with the residuals from the first term being passed on to the second term, then the residuals from the second to the third, and so on. I suspect most users are not aware of this, and it is easy to miss an important biological signal in your model if a term of interest is unintentionally shoved to the end of the formula.
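
To make the order dependence concrete, here is a minimal sketch using vegan's built-in dune data. It fits the same two terms in both orders under by="terms" and by="margin" and collects the R2 values side by side. The helper name term_r2 is made up for this illustration, and this is not how the QIIME 2 plugin calls vegan.

```r
library(vegan)
data(dune)
data(dune.env)

# Sequential (by = "terms") tests: same two terms, both orders.
set.seed(2022)
seq_mgmt_first <- adonis2(dune ~ Management + A1, data = dune.env, by = "terms")
set.seed(2022)
seq_a1_first <- adonis2(dune ~ A1 + Management, data = dune.env, by = "terms")

# Marginal (by = "margin") tests: each term is assessed after all the others.
set.seed(2022)
mar_mgmt_first <- adonis2(dune ~ Management + A1, data = dune.env, by = "margin")
set.seed(2022)
mar_a1_first <- adonis2(dune ~ A1 + Management, data = dune.env, by = "margin")

# Hypothetical helper: pull the R2 of the two model terms into a named vector.
term_r2 <- function(fit) {
  tab <- as.data.frame(fit)  # drop the anova classes, keep rows named by term
  setNames(tab[c("Management", "A1"), "R2"], c("Management", "A1"))
}

r2 <- rbind(
  terms_mgmt_first  = term_r2(seq_mgmt_first),
  terms_a1_first    = term_r2(seq_a1_first),
  margin_mgmt_first = term_r2(mar_mgmt_first),
  margin_a1_first   = term_r2(mar_a1_first)
)
print(round(r2, 3))
```

With by="terms" the R2 credited to a term depends on where it sits in the formula, while the two by="margin" rows should agree with each other regardless of order.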

Proposed Behavior
Swap the functions, but keep the defaults to match the current adonis1 functionality (this should be possible with adonis2 surely).

Also expose marginal testing and other options available in adonis2 (see related open issues about adonis parameters, e.g., #242)

References
Raised by @mestaki on the forum

@jwdebelius

I would love it if this refactor (which I'm happy to help with) output an artifact rather than a visualization. The results could then be passed into the current visualizer, but it would make adonis/adonis2 easier to use in the Python interface for multiple testing or for comparing models.

Again, happy to help!

@ebolyen
Member

ebolyen commented May 24, 2022

Hey @jwdebelius, @lizgehret and I have been working on reporting statistical outputs as artifacts (in no small part because of your suggestions at the last workshop). If you'd like to see how we've been doing it, you can find it here: https://github.com/qiime2/q2-fmt. We'd be super interested in any feedback you can give.

We're using Tabular Data Resources as the backing format because it makes the data a bit more standardized for alternate environments and lets us attach metadata to the columns, so we can preserve things like labels and other info (ideally mirroring the convenience of R a little bit).

Maybe these can serve as a template for the Adonis refactor? It may be worth having an offline conversation about what our near-term plans are for this stuff. I think it would be really great to have you on board! (cc @gregcaporaso and @nbokulich)

@jwdebelius

Thanks @ebolyen,

I will check it out! Thank you. And I'm happy to talk offline and contribute, if I can. I'm not too familiar with the tabular data resource.

@mestaki

mestaki commented May 24, 2022

Thanks @nbokulich for starting this and @jwdebelius for offering to help out!
I'm happy to help any way I can as well, but will be nowhere near as effective as Justine lol.

Proposed Behavior
Swap the functions, but keep the defaults to match the current adonis1 functionality (this should be possible with adonis2 surely).

Yes, totally, there are a few options here. The updated vegan package comes with both adonis() and adonis2().

Option 1: Keep adonis() as the default to avoid any backward compatibility issues, and add adonis2 as a separate plugin.
Option 2: Use adonis() as the default and switch to adonis2() if the user adds the by="margin" parameter.
Option 3: Just use adonis2() with by="terms" as the default, which is identical to the current adonis(), as shown below, and expose the by parameter so users can switch when needed.

My vote is Option 3 by far.

with adonis()

library(tidyverse)
library(vegan)
library(broom)
data(dune)
data(dune.env)

set.seed(2022)
adonis(dune ~ Management + A1, data = dune.env) %>% # build model
  .$aov.tab %>% # grab summary table
  broom::tidy() # broom's tidy cleans up results in a tibble. The warnings can be ignored.

'adonis' will be deprecated: use 'adonis2' instead
# A tibble: 4 × 7
  term          df SumsOfSqs MeanSqs F.Model    R2 p.value
  <chr>      <dbl>     <dbl>   <dbl>   <dbl> <dbl>   <dbl>
1 Management     3     1.47    0.490    3.07 0.342   0.004
2 A1             1     0.441   0.441    2.77 0.103   0.023
3 Residuals     15     2.39    0.159   NA    0.556  NA    
4 Total         19     4.30   NA       NA    1      NA    
Warning message:
In tidy.anova(.) :
  The following column names in ANOVA output were not recognized or transformed: SumsOfSqs, MeanSqs, F.Model, R2

with adonis2, by="terms"

set.seed(2022)
adonis2(dune ~ Management + A1, data = dune.env, by = "terms") %>%
  broom::tidy(.)

# A tibble: 4 × 6
  term          df SumOfSqs    R2 statistic p.value
  <chr>      <dbl>    <dbl> <dbl>     <dbl>   <dbl>
1 Management     3    1.47  0.342      3.07   0.004
2 A1             1    0.441 0.103      2.77   0.023
3 Residual      15    2.39  0.556     NA     NA    
4 Total         19    4.30  1         NA     NA    
Warning message:
In tidy.anova(.) :
  The following column names in ANOVA output were not recognized or transformed: SumOfSqs, R2

And finally with adonis2, by="margin"

set.seed(2022)
adonis2(dune ~ Management+A1, data = dune.env, by="margin") %>% 
broom::tidy(.)

# A tibble: 4 × 6
  term          df SumOfSqs    R2 statistic p.value
  <chr>      <dbl>    <dbl> <dbl>     <dbl>   <dbl>
1 Management     3    1.19  0.276      2.48   0.005
2 A1             1    0.441 0.103      2.77   0.023
3 Residual      15    2.39  0.556     NA     NA    
4 Total         19    4.30  1         NA     NA    
Warning message:
In tidy.anova(.) :
  The following column names in ANOVA output were not recognized or transformed: SumOfSqs, R2

A couple of notes here. The warnings are related to the tidy function I used. I'm piping the adonis output into broom::tidy() because it gives nice tibbles that are convenient to work with downstream. broom::tidy() is a very useful way to get various stats objects into tables. It isn't usually used with adonis outputs, thus the warnings, though they are not important imo. I still find the output very useful, and it may actually help with Justine's hope of getting this into an artifact format. adonis2() also doesn't require fetching the aov.tab object, so that's one less thing to worry about.
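
If the broom warnings are a concern, a plain table can also be built with base R alone. The sketch below is only one possible approach, assuming the adonis2() result can be treated as a data frame with the terms as row names; the temporary TSV file is purely illustrative and is not the format q2-fmt uses.

```r
library(vegan)
data(dune)
data(dune.env)

set.seed(2022)
fit <- adonis2(dune ~ Management + A1, data = dune.env, by = "margin")

# Flatten the anova-style result: move the row names into a "term" column.
tab <- as.data.frame(fit)
tab <- cbind(term = rownames(tab), tab)  # cbind on a data frame keeps "Pr(>F)" as is
rownames(tab) <- NULL

# Write to a tab-separated file (temporary path, for illustration only).
out_file <- tempfile(fileext = ".tsv")
write.table(tab, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the table round-trips with its column names intact.
back <- read.delim(out_file, check.names = FALSE)
print(back)
```

This keeps vegan's own column names rather than broom's renamed ones, so any downstream code would need to expect those.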

@ebolyen I have some minor suggestions about those awesome plots over on the q2-fmt page you linked. Should I comment there?

@mestaki

mestaki commented May 27, 2022

Looks like I was wrong about adonis2() only being introduced to vegan in 2.5-7 and later. As Jari Oksanen mentioned in the Q2 forum post linked above, it has been around since 2.4-0, so updating this will be even simpler: it's just a matter of updating the existing syntax to adonis2 and exposing one or two additional parameters, which we can discuss here.

@colinbrislawn

colinbrislawn commented May 27, 2022

broom::tidy() does not support any vegan objects, but ggvegan does!

Thank you all for working on this problem. I'm excited for q2-fmt

@gregcaporaso
Member

gregcaporaso commented Oct 26, 2023

Other adonis related feature requests that should be addressed at the same time as this one:
#303
#242
#243
