Switch from sphinx to use mkdocs #209

Merged: 15 commits (Nov 15, 2023)
@@ -1,6 +1,6 @@
name: Documentation_deploy
run-name: ${{ github.actor }} triggered doc generation
on:
name: Documentation_deploy_mkdocs
run-name: ${{ github.actor }} triggered mkdocs generation
on:
pull_request:
types:
- closed
@@ -16,9 +16,9 @@ on:
- 'docs/**'
permissions:
contents: write

jobs:
Sphinx_Doc_generation:
Mkdocs_Doc_generation:
if: github.event.pull_request.merged == true
runs-on: ubuntu-latest
steps:
@@ -34,19 +34,30 @@ jobs:
cache-dependency-path: '**/pip'
run: echo '${{ steps.cp38.outputs.cache-hit }}'

- name: Set pip cache directory path
id: pip-cache-dir-path
run: |
echo "PIPCACHE=$(pip cache dir)" >> "$GITHUB_OUTPUT"

- name: Get pip cache dir
env:
PIPCACHE: ${{ steps.pip-cache-dir-path.outputs.PIPCACHE }}
run: echo "The pip cache dir located is $PIPCACHE"

- name: Install Dependencies
run: |
pip install -e .[doc]

- name: Sphinx Build
- name: mkdocs deploy
run: |
sh scripts/setup/docs/build_sphinx_docs.sh
mkdocs build

- name: Deploy to GitHub Pages
uses: peaceiris/actions-gh-pages@v3
if: ${{ github.event_name == 'pull_request' && github.ref == 'refs/heads/main' }}
with:
publish_branch: gh-pages
publish_branch: mkdocs
github_token: ${{ secrets.GITHUB_TOKEN }}
publish_dir: docs/build/html
publish_dir: ./site
force_orphan: true
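The two pip-cache steps above hand a value between steps via the `$GITHUB_OUTPUT` key=value protocol. A minimal sketch of that hand-off, with a temp file standing in for the runner's output file and a fixed path standing in for the `pip cache dir` result (both assumptions, so the sketch runs anywhere):

```python
import tempfile

# Stand-ins: a temp file plays the role of $GITHUB_OUTPUT, and a fixed
# path plays the role of the `pip cache dir` command's output.
pip_cache = "/tmp/pip-cache"

with tempfile.NamedTemporaryFile("w+", suffix=".out") as github_output:
    # Step "Set pip cache directory path": append a KEY=value line.
    github_output.write(f"PIPCACHE={pip_cache}\n")
    github_output.flush()

    # Step "Get pip cache dir": a later step reads the value back by key.
    github_output.seek(0)
    outputs = dict(line.rstrip("\n").split("=", 1) for line in github_output)

print(f"The pip cache dir located is {outputs['PIPCACHE']}")
```

Note that the command substitution must be `$(pip cache dir)` with no extra parentheses, otherwise the literal `(` and `)` end up in the stored value.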

32 changes: 32 additions & 0 deletions docs/gen_ref_pages.py
@@ -0,0 +1,32 @@
"""Generate the code reference pages for mkdocs."""

from pathlib import Path

import mkdocs_gen_files

nav = mkdocs_gen_files.Nav()

for path in sorted(Path("src").rglob("*.py")):
    module_path = path.with_suffix("")
    doc_path = path.relative_to("src").with_suffix(".md")
    full_doc_path = Path("reference", doc_path)

    parts = tuple(module_path.parts)

    if parts[-1] == "__init__":
        parts = parts[:-1]
        doc_path = doc_path.with_name("index.md")
        full_doc_path = full_doc_path.with_name("index.md")
    elif parts[-1] == "__main__":
        continue

    nav[parts] = doc_path.as_posix()

    with mkdocs_gen_files.open(full_doc_path, "w") as fd:
        identifier = ".".join(parts)
        print("::: " + identifier, file=fd)

    mkdocs_gen_files.set_edit_path(full_doc_path, Path("../") / path)

with mkdocs_gen_files.open("reference/SUMMARY.md", "w") as nav_file:
nav_file.writelines(nav.build_literate_nav())
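The path mapping in the loop above can be checked in isolation. A standalone sketch of the same logic using pure `pathlib`, with no `mkdocs_gen_files` dependency; the input paths are illustrative, taken from (or modeled on) this repository's layout:

```python
from pathlib import Path


def doc_paths(src_file: str):
    """Mirror the script's mapping: source file -> (identifier, doc page path)."""
    path = Path(src_file)
    module_path = path.with_suffix("")
    doc_path = path.relative_to("src").with_suffix(".md")
    full_doc_path = Path("reference", doc_path)

    parts = tuple(module_path.parts)
    if parts[-1] == "__init__":
        parts = parts[:-1]
        full_doc_path = full_doc_path.with_name("index.md")
    elif parts[-1] == "__main__":
        return None  # skipped, as in the script

    return ".".join(parts), full_doc_path.as_posix()


# A real module from the project layout, and a hypothetical __main__:
print(doc_paths("src/ensembl/io/genomio/fasta/process.py"))
print(doc_paths("src/ensembl/__main__.py"))
```

One observable quirk: the identifier keeps the leading `src.` segment, because the script builds `parts` from `module_path` without first making it relative to `src`.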
Binary file added docs/img/logo.png
Binary file added docs/img/metazoa_logo.png
94 changes: 94 additions & 0 deletions docs/index.md
@@ -0,0 +1,94 @@
# [Ensembl GenomIO](https://github.com/Ensembl/ensembl-genomio)

*Ensembl-genomIO Base Library Documentation*

A repository dedicated to pipelines that turn basic genomic data into formatted
Ensembl core databases. It also allows users to dump core databases into various formats.

File formats handled: FASTA, GFF3 and JSON (*following BRC4 specifications*).

Contents
--------
Check out the [installation](install.md) section for further information on how
to install the project.

1. [Usage](usage.md)
2. [Install](install.md)

Ehive pipelines
-------------------------------------------
Check out the [usage](usage.md) section for further information on the requirements to
run ensembl-genomio pipelines.

1. __Genome loader__: Creates an Ensembl core database from a set of flat files.
2. __Genome dumper__: Dumps flat files from an Ensembl core database.

Nextflow pipelines
-------------------------------------------
1. __Additional seq prepare__: BRC/Ensembl metazoa pipeline. Prepares genome data loading files for adding new sequence(s) to existing species databases.
2. __Genome prepare__: BRC/Ensembl metazoa pipeline. Retrieves data for genome(s) from INSDC and RefSeq, then validates and prepares GFF3, FASTA and JSON files for each genome accession.


## Project layout
```
src/ensembl/
├── brc4
│   └── runnable
│       ├── compare_fasta.py
│       ├── compare_report.py
│       ├── core_server.py
│       ├── download_genbank.py
│       ├── dump_stable_ids.py
│       ├── extract_from_gb.py
│       ├── fill_metadata.py
│       ├── gff3_specifier.py
│       ├── integrity.py
│       ├── json_schema_factory.py
│       ├── load_sequence_data.py
│       ├── manifest.py
│       ├── manifest_stats.py
│       ├── prepare_genome.py
│       ├── read_json.py
│       ├── say_accession.py
│       └── seqregion_parser.py
└── io
    └── genomio
        ├── assembly
        │   └── download.py
        ├── database
        │   └── factory.py
        ├── events
        │   ├── dump.py
        │   ├── format.py
        │   └── load.py
        ├── fasta
        │   └── process.py
        ├── genbank
        │   ├── download.py
        │   └── extract_data.py
        ├── genome_metadata
        │   ├── dump.py
        │   ├── extend.py
        │   └── prepare.py
        ├── genome_stats
        │   ├── compare.py
        │   └── dump.py
        ├── gff3
        │   ├── extract_annotation.py
        │   └── process.py
        ├── manifest
        │   ├── check_integrity.py
        │   ├── compute_stats.py
        │   └── generate.py
        ├── schemas
        │   └── json
        │       ├── factory.py
        │       └── validate.py
        ├── seq_region
        │   ├── dump.py
        │   └── prepare.py
        └── utils
            ├── archive_utils.py
            └── json_utils.py
```

## License
Software as part of [Ensembl GenomIO](https://github.com/Ensembl/ensembl-genomio) is distributed under the [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
53 changes: 53 additions & 0 deletions docs/install.md
@@ -0,0 +1,53 @@
API Setup and installation
===========================

Requirements
--------------

An Ensembl API checkout including:

- [ensembl-genomio](https://github.com/Ensembl/ensembl-genomio) (export /src/perl into PERL5LIB)
- [ensembl-hive](https://github.com/Ensembl/ensembl-hive)
- [ensembl-production](https://github.com/Ensembl/ensembl-production)
- [ensembl-analysis](https://github.com/Ensembl/ensembl-analysis/tree/dev/hive_master) (on dev/hive_master branch)
- [ensembl-taxonomy](https://github.com/Ensembl/ensembl-taxonomy)
- [ensembl-orm](https://github.com/Ensembl/ensembl-orm)

Software
--------------

- Python 3.8+
- Perl 5.26
- Bioperl 1.6.9+

Python Modules
--------------
- bcbio-gff
- biopython
- jsonschema
- intervaltree
- mysql-connector-python
- python-redmine
- requests


Installation
--------------
### Directly from GitHub:
```
git clone https://github.com/Ensembl/ensembl-genomio
git clone https://github.com/Ensembl/ensembl-analysis -b dev/hive_master
git clone https://github.com/Ensembl/ensembl-production
git clone https://github.com/Ensembl/ensembl-hive
git clone https://github.com/Ensembl/ensembl-taxonomy
git clone https://github.com/Ensembl/ensembl-orm
```


### Documentation
Documentation for Ensembl-genomio is generated using _mkdocs_. For full information visit [mkdocs.org](https://www.mkdocs.org).
#### Commands
* `mkdocs new [dir-name]` - Create a new project.
* `mkdocs serve` - Start the live-reloading docs server.
* `mkdocs build` - Build the documentation site.
* `mkdocs -h` - Print help message and exit.
47 changes: 47 additions & 0 deletions docs/pipelines.md
@@ -0,0 +1,47 @@
# Ensembl Genomio Pipelines

## Genomio prepare pipeline
_Module [Bio::EnsEMBL::Pipeline::PipeConfig::BRC4_genome_prepare_conf]_

**Genome prepare pipeline for BRC/Metazoa**

#### Description
Retrieve data for a genome from INSDC and prepare the following files in a separate folder
for each genome:

- FASTA for DNA sequences
- FASTA for protein sequences
- GFF gene models
- JSON functional annotation
- JSON seq_region
- JSON genome
- JSON manifest

The JSON files follow the schemas defined in the /schemas folder.

These files can then be fed to the Genome loader pipeline.

### How to run

```
init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::BRC4_genome_prepare_conf \
--host $HOST --port $PORT --user $USER --pass $PASS \
--hive_force_init 1 \
--pipeline_dir temp/prepare \
--data_dir $INPUT \
--output_dir $OUTPUT \
${OTHER_OPTIONS}
```

### Parameters

| option | default value | meaning |
| - | - | - |
| `--pipeline_name` | brc4_genome_prepare | name of the hive pipeline
| `--pipeline_dir` | | temp directory for this pipeline run
| `--data_dir` | | directory with json files for each genome to prepare, following the format set by schemas/genome_schema.json
| `--output_dir` | | directory where the prepared files are to be stored
| `--merge_split_genes` | 0 | Sometimes gene features are split in a GFF file. Ensembl expects genes to be contiguous, so this option merges the parts into one.
| `--exclude_seq_regions` | | Exclude these seq_regions (applies to all genomes; should be seldom used)
| `--validate_gene_id` | 0 | Enforce a strong gene ID pattern (replaced by GeneID if available)
| `--ensembl_mode` | 0 | By default, set additional metadata for BRC genomes. With this parameter, use vanilla Ensembl metadata instead.