
add method to "bin" continuous metadata and generate a new metadata column #8

Open
nbokulich opened this issue Sep 8, 2017 · 1 comment


@nbokulich
Member

nbokulich commented Sep 8, 2017

Proposed Behavior
The question is: how to bin? The user could define:

  1. An explicit number of bins to create; the range of values is sliced at even intervals.
  2. A "step" size that explicitly defines the bin width instead of the number of bins. To follow the examples in the Comments below: 1) if the unit is 1 day, a step of 30 would be roughly 1 month; 2) if the unit is 1 meter, a step of 100 would be 100 meters.
  3. A list of bin cutoffs. E.g., [100, 1000, 10000] would generate 4 bins: all samples with x < 100, 100 ≤ x < 1000, 1000 ≤ x < 10000, and x ≥ 10000. This would be useful for explicitly defining uneven bin sizes. A tangible example of where this would be used is if samples were collected from patients at many different ages, and an investigator wants to compare the microbiome at [3, 12, 24, 72, 144] months of age.
  4. A very cool "some day in the future" enhancement would be to add a function for auto-binning, by looking at the distribution and finding sensible divisions for creating bins.
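A minimal sketch of options 1 to 3, assuming the values arrive as a plain list of numbers and that bins are half-open (lower bound inclusive, upper bound exclusive). The function names `bins_by_count`, `bins_by_step`, and `assign_bins` are hypothetical, not part of any QIIME 2 API. Options 1 and 2 are written so they reduce to option 3: each returns a list of interior cutoffs that `assign_bins` then applies.

```python
import bisect
import math


def bins_by_count(values, n_bins):
    """Option 1 (hypothetical helper): n_bins equal-width bins over [min, max].

    Returns the n_bins - 1 interior cutoffs for use with assign_bins().
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def bins_by_step(values, step, start=0):
    """Option 2 (hypothetical helper): bins of width `step` aligned to `start`."""
    if step <= 0:
        raise ValueError("step must be positive")
    lo, hi = min(values), max(values)
    # Largest aligned edge at or below the smallest value.
    first = start + step * math.floor((lo - start) / step)
    cutoffs = []
    k = 1
    # Multiply rather than accumulate to avoid floating-point drift.
    while first + k * step <= hi:
        cutoffs.append(first + k * step)
        k += 1
    return cutoffs


def assign_bins(values, cutoffs):
    """Option 3 (hypothetical helper): k ascending cutoffs give k + 1 bins.

    Bin 0 is x < cutoffs[0], bin i is cutoffs[i-1] <= x < cutoffs[i],
    and bin k is x >= cutoffs[-1].
    """
    return [bisect.bisect_right(cutoffs, x) for x in values]
```

For example, `assign_bins([5, 150, 999, 1000, 20000], [100, 1000, 10000])` returns `[0, 1, 1, 2, 3]`: three cutoffs give four bins, and a value equal to a cutoff falls into the bin above it.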

This method should require a user-defined name for the new column.
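A minimal sketch of how the required name could be enforced, assuming metadata is held as a dict mapping sample IDs to `{column: value}` dicts and that bins are defined by ascending cutoffs as in option 3. `add_binned_column` and `bin_label` are hypothetical helpers. The sketch refuses an empty name or one that collides with an existing column, and stores interval labels so the new column reads as categorical.

```python
import bisect


def bin_label(index, cutoffs):
    """Half-open interval label for bin `index` given ascending `cutoffs`."""
    if index == 0:
        return f"< {cutoffs[0]}"
    if index == len(cutoffs):
        return f">= {cutoffs[-1]}"
    return f"[{cutoffs[index - 1]}, {cutoffs[index]})"


def add_binned_column(metadata, source_column, new_column, cutoffs):
    """Return a copy of `metadata` with a categorical `new_column` added.

    `metadata` maps sample IDs to {column name: value} dicts (an assumption
    for this sketch). Missing values (None) stay missing in the new column.
    """
    if not new_column:
        raise ValueError("a name for the new column is required")
    if any(new_column in row for row in metadata.values()):
        raise ValueError(f"column {new_column!r} already exists")
    if not cutoffs or list(cutoffs) != sorted(cutoffs):
        raise ValueError("cutoffs must be a non-empty ascending list")
    binned = {}
    for sample_id, row in metadata.items():
        value = row.get(source_column)
        label = None if value is None else bin_label(
            bisect.bisect_right(cutoffs, value), cutoffs)
        # Copy the row so the caller's metadata is left untouched.
        binned[sample_id] = {**row, new_column: label}
    return binned
```

With cutoffs `[3, 12, 24, 72, 144]`, a sample at 6 months would be labelled `[3, 12)` and one at 200 months `>= 144`.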

Comments

  1. This would be useful for treating a continuous metadata column as a pseudo-categorical grouping when performing statistical tests.
  2. For example, a researcher might collect samples from infants at different days of life and choose to bin those samples into months of life, aggregating them into larger groups for statistical comparison. Or they might collect soil samples at different elevations (in meters) and put them into 100 m bins for comparison. I could give many other examples.
  3. You are probably asking: "Why don't users just create these categories manually from the start?" Sometimes this is not easy to do, and sometimes the need only comes up later during analysis.
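A minimal sketch of the aggregation step in the infant example, assuming each sample has a single numeric value (days of life) and fixed-width bins starting at 0. `group_by_step` is a hypothetical helper; it maps each bin index to the samples that fall in it, and those groups of samples are what a between-group statistical test would then compare.

```python
from collections import defaultdict


def group_by_step(values, step):
    """Group sample IDs into bins [k*step, (k+1)*step), keyed by k."""
    if step <= 0:
        raise ValueError("step must be positive")
    groups = defaultdict(list)
    for sample_id, value in values.items():
        # Floor division maps e.g. day 45 with step 30 to bin 1, i.e. [30, 60).
        groups[int(value // step)].append(sample_id)
    return dict(sorted(groups.items()))


days_of_life = {"A": 3, "B": 28, "C": 31, "D": 65, "E": 59}
print(group_by_step(days_of_life, 30))  # {0: ['A', 'B'], 1: ['C', 'E'], 2: ['D']}
```

The soil example works the same way with `step=100` applied to elevations in meters.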
@jairideout
Member

This is an interesting idea and definitely worth exploring (though honestly not a high priority for us in 2017 unless someone wants to implement it!). Right now QIIME 2 can't output or create metadata files, so we'd need to add support for that in the framework. It sounds like there are a few cases where allowing QIIME 2 to write out metadata would be useful.
