Merge pull request #7 from cuttlefishh/manuscript
Manuscript
Showing 222 changed files with 561,979 additions and 2,997 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
Copyright (c) 2017, Earth Microbiome Project analysis team | ||
All rights reserved. | ||
|
||
Redistribution and use in source and binary forms, with or without modification, | ||
are permitted provided that the following conditions are met: | ||
|
||
Redistributions of source code must retain the above copyright notice, this | ||
list of conditions and the following disclaimer. | ||
|
||
Redistributions in binary form must reproduce the above copyright notice, this | ||
list of conditions and the following disclaimer in the documentation and/or | ||
other materials provided with the distribution. | ||
|
||
Neither the name of the {organization} nor the names of its | ||
contributors may be used to endorse or promote products derived from | ||
this software without specific prior written permission. | ||
|
||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | ||
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | ||
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR | ||
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES | ||
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; | ||
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON | ||
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT | ||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,23 +5,23 @@ Earth Microbiome Project | |
|
||
The Earth Microbiome Project (EMP) is a systematic attempt to characterize global microbial taxonomic and functional diversity for the benefit of the planet and humankind. Most of the data generated to this point are from 16S rRNA amplicon surveys, but the project also includes data from 18S and ITS amplicons, metagenomics, and metabolomics. For more information about the EMP -- people, publications, news, protocols and standards, and more -- please see the [EMP website](http://www.earthmicrobiome.org/). | ||
|
||
This GitHub repository describes the EMP catalogue and how to use it. The EMP dataset is generated from samples that individual researchers have compiled and contributed to the EMP. Samples from each group of researchers represent individual EMP studies. In addition to analyses being done by contributing researchers on the individual studies, we are performing cross-study meta-analyses. A meta-analysis of the first 97 16S rRNA amplicon studies in the EMP (Release 1) is currently under review. | ||
This GitHub repository describes the EMP catalogue and how to use it. The EMP dataset is generated from samples that individual researchers have compiled and contributed to the EMP. Samples from each group of researchers represent individual EMP studies. In addition to analyses being done by contributing researchers on the individual studies, we are performing cross-study meta-analyses. A meta-analysis of the first 97 16S rRNA amplicon studies in the EMP (Release 1) is currently in press (Thompson et al., *Nature*, 2017, [doi:10.1038/nature24621](http://doi.org/10.1038/nature24621)). | ||
|
||
Getting involved | ||
---------------- | ||
|
||
There are several ways to get involved with the EMP: | ||
|
||
* **Use the EMP catalogue in your own research.** Download the whole catalogue or just a few studies, merge and analyze them with your own data, or query the catalogue. Please skip to the next section for detailed instructions. | ||
* **Join the analysis team.** If you are interested in getting involved with EMP meta-analyses, you can begin by reviewing the open [issues](https://github.com/EarthMicrobiomeProject/emp/issues) on this GitHub page. You can add comments to an existing issue to propose your ideas, or create a new issue entirely. Note that the initial meta-analysis of the EMP has been completed and is currently under review. You can view the existing meta-analysis [code](https://github.com/biocore/emp/tree/master/ipynb) and [results](https://github.com/biocore/emp/tree/master/results). | ||
* **Join the analysis team.** If you are interested in getting involved with EMP meta-analyses, you can begin by reviewing the open [issues](https://github.com/biocore/emp/issues) on this GitHub page. You can add comments to an existing issue to propose your ideas, or create a new issue entirely. Note that the initial meta-analysis of the EMP has been accepted for publication. You can view the existing [code](https://github.com/biocore/emp/tree/master/code) for generating [figures](https://github.com/biocore/emp/tree/master/figures) for the meta-analysis. | ||
* **Contribute samples.** We are not currently soliciting samples for the EMP. If you have an idea for samples you might like to submit in the future, you may [email](mailto:[email protected]) the project leader for the EMP, Dr. Luke Thompson. | ||
|
||
Using the EMP catalogue | ||
----------------------- | ||
|
||
The EMP catalogue is a diverse and standardized set of thousands of microbiomes for use by the public. Here are some of the ways you can use this resource: | ||
|
||
* **Download EMP Release 1 from our FTP site.** EMP Release 1 contains merged and quality-filtered mapping files, BIOM tables, OTU/sequence information, and alpha/beta-diversity results for ~25,000 samples in 97 studies of the initial meta-analysis of the EMP. The [FTP site](ftp://ftp.microbio.me/emp/release1) contains README files about its contents, and the individual files are listed [here](https://github.com/biocore/emp/blob/master/data/data_locations.txt). | ||
* **Download EMP Release 1 from our FTP site.** EMP Release 1 contains merged and quality-filtered mapping files, BIOM tables, OTU/sequence information, and alpha/beta-diversity results for ~25,000 samples in 97 studies of the initial meta-analysis of the EMP. The [FTP site](ftp://ftp.microbio.me/emp/release1) contains README files about its contents, and the individual files are listed [here](https://github.com/biocore/emp/blob/master/data/ftp_contents.txt). | ||
* **Download individual studies from the Qiita EMP Portal.** For each study, you can download metadata (mapping file), feature tables (BIOM file), and demultiplexed raw sequence files. Like the rest of Qiita, the [EMP Portal](https://qiita.ucsd.edu/emp/) requires the Google Chrome browser. | ||
* **Merge your data with all or part of the EMP dataset.** If you sequenced your sample using the [EMP 16S rRNA primers](http://www.earthmicrobiome.org/protocols-and-standards/16s/) and picked OTUs using either [Deblur](http://msystems.asm.org/content/2/2/e00191-16) or closed-reference against Greengenes 13.8 or Silva 123, you can merge your BIOM table with the relevant merged EMP Release 1 BIOM table or one of the individual per-study BIOM tables from Qiita. Basic instructions for [initial processing](http://www.earthmicrobiome.org/protocols-and-standards/initial-qiime-processing/) of your data are provided. You can then use [QIIME1](http://qiime.org/) or [QIIME2](https://qiime2.org/) to merge the BIOM tables and mapping files. | ||
* **Query the EMP catalogue using Redbiom.** [Redbiom](https://github.com/biocore/redbiom) is a command-line tool that allows users to query the Qiita database, including EMP studies. It allows you to find samples based on the sequences or taxa they contain or on sample metadata, and to export selected sample data and metadata. Once you have Redbiom [installed](https://github.com/biocore/redbiom#installation), you can carry out queries such as those described here: | ||
|
@@ -64,45 +64,23 @@ Organization of this repository | |
|
||
This repository contains the following directories: | ||
|
||
* `data/` data files used for downstream analysis (biom tables, trees, mapping files, etc) | ||
- `data_locations.txt` links to where large data files can be found (e.g., BIOM and tree files) | ||
- `MIxS/` Excel files describing MIxS, EBI, and Qiita metadata standard requirements; used to generate metadata templates | ||
- `sequence-lookup/` files used for the EMP Trading Cards (sequence lookup) notebooks (e.g., RDP taxonomy files) | ||
|
||
* `ipynb/` IPython notebooks and scripts (Python, Java, R, Bash) developed for meta-analysis of EMP data (Thompson et al., in prep.) | ||
- `01-metadata-processing/` | ||
- `02-sequence-processing/` | ||
- `03-otu-picking/` | ||
- `04-rarefaction-and-subsets/` | ||
- `05-alpha-diversity/` | ||
- `06-beta-diversity/` | ||
- `07-environmental-covariation/` | ||
- `08-cooccurrence-and-nestedness/` | ||
- `09-sequence-lookup/` | ||
|
||
* `legacy/` code, results, and website documents from the early phase of the EMP (2010-2013) | ||
|
||
* `presentations/` collection of presentations on the EMP | ||
|
||
* `results/` diversity analyses and high-level results (e.g., figures and tables that are useful for presentations) | ||
- `results_locations.txt` links to where large results files can be found (e.g., alpha- and beta-diversity results) | ||
|
||
* `scripts/` utility scripts and code not specific to particular analyses | ||
- `01-metadata-templates/` | ||
- `02-colors-and-styles/` | ||
- `03-phylogenetic-placement/` | ||
* `code` IPython notebooks and scripts (Python, Java, R, Bash) developed for meta-analysis of EMP data; this code is used in the top-level directory `figures`. | ||
* `data` Data files used for processing and downstream analysis. | ||
* `figures` Instructions to generate the figures in "A communal catalogue reveals Earth’s multiscale microbial diversity", Thompson et al., *Nature* (2017). | ||
* `legacy` Early code, results, and website documents from the initial phase of the EMP (2010-2013). | ||
* `presentations` Collection of presentations on the EMP. | ||
|
||
File name abbreviation conventions | ||
---------------------------------- | ||
|
||
Some abbreviations used in this repository: | ||
|
||
* `demux` is shorthand for "demultiplexed", which describes the fastq data after it is split into per-sample fastq files using barcodes | ||
* `deblur` refers to the exact-sequence de novo OTU picking method [Deblur](https://github.com/cuttlefishh/deblur) | ||
* `cr` refers to [closed-reference OTU picking](http://qiime.org/tutorials/otu_picking.html#closed-reference-otu-picking) | ||
* `or` refers to [open-reference OTU picking](http://qiime.org/tutorials/otu_picking.html#open-reference-otu-picking) | ||
* `refseqs` refers to reference sequence collections that could be used in reference-based OTU picking | ||
* `mc2` means that an OTU must contain at least 2 sequences to be included | ||
* `demux` is shorthand for "demultiplexed", which describes the fastq data after it is split into per-sample fastq files using barcodes. | ||
* `deblur` refers to the exact-sequence de novo OTU picking method [Deblur](https://github.com/cuttlefishh/deblur). | ||
* `cr` refers to [closed-reference OTU picking](http://qiime.org/tutorials/otu_picking.html#closed-reference-otu-picking). | ||
* `or` refers to [open-reference OTU picking](http://qiime.org/tutorials/otu_picking.html#open-reference-otu-picking). | ||
* `refseqs` refers to reference sequence collections that could be used in reference-based OTU picking. | ||
* `mc2` means that an OTU must contain at least 2 sequences to be included. | ||
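
The abbreviation conventions above can be applied mechanically when scanning file names. The following is a minimal sketch, not part of the repository: the function name `describe_filename`, the choice of separators (`.`, `_`, `-`), and the example file name are all assumptions made for illustration.

```python
import re

# Abbreviations as documented in the "File name abbreviation conventions" list.
ABBREVIATIONS = {
    "demux": "demultiplexed per-sample fastq data",
    "deblur": "Deblur exact-sequence de novo OTU picking",
    "cr": "closed-reference OTU picking",
    "or": "open-reference OTU picking",
    "refseqs": "reference sequence collection",
    "mc2": "minimum sequence count per OTU of 2",
}


def describe_filename(filename):
    """Return (token, meaning) pairs for known abbreviations in a file name."""
    # Assumption: tokens are separated by '.', '_' or '-'.
    tokens = re.split(r"[._-]+", filename.lower())
    return [(token, ABBREVIATIONS[token]) for token in tokens if token in ABBREVIATIONS]


# Hypothetical file name, not taken from the repository.
for token, meaning in describe_filename("example_cr_refseqs.mc2.biom"):
    print(f"{token}: {meaning}")
```

Tokens that are not in the list, such as `example` and `biom` here, are ignored, so the lookup only reports abbreviations the README defines.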
|
||
Finding older data | ||
------------------ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
## code/01-metadata | ||
|
||
The code in this directory is used in the following sections of `figures/README.md`. | ||
|
||
**2:** | ||
|
||
* `metadata_refine_step1_studies.ipynb` | ||
* `metadata_refine_step2_samples.ipynb` | ||
* `metadata_refine_step3_qiita.ipynb` | ||
* `envo_hierarchy_lookup.ipynb` | ||
* `metadata_template_generator.py` | ||
* `metadata_template_generator.md` | ||
|
||
**3.1.2:** | ||
|
||
* `map_samples_by_empo.ipynb` | ||
|
||
**3.5:** | ||
|
||
* `physicochemical_pairplot.ipynb` |