Commit

Delete mitogenome files that contain more than one sequence.
charles-plessy committed Sep 24, 2024
1 parent 44220a8 commit d0f133f
Showing 6 changed files with 9 additions and 26 deletions.
17 changes: 5 additions & 12 deletions CHANGELOG.md
@@ -3,22 +3,15 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## unreleased
+## v2.0.0 - September 24th, 2024 (Lama glama)

-Allow TSV format and change column names to `id` and `file`.
+- Allow TSV format and change column names to `id` and `file`.
+- Delete mitogenome files that contain more than one sequence.

 ## v1.1.0 - September 24th, 2024 (Mus caroli)

-Expanded the pattern matching chromosome contigs to `^(CM|CP|FR|L[R-T]|O[U-Z])`.
+- Expanded the pattern matching chromosome contigs to `^(CM|CP|FR|L[R-T]|O[U-Z])`.

 ## v1.0.0 - September 20th, 2024 (Orang Outan)

-Initial release of oist/LuscombeU_stlpreprocess, created with the [nf-core](https://nf-co.re/) template.
-
-### `Added`
-
-### `Fixed`
-
-### `Dependencies`
-
-### `Deprecated`
+- Initial release of oist/LuscombeU_stlpreprocess, created with the [nf-core](https://nf-co.re/) template.
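Note on the v1.1.0 entry: the changelog only quotes the pattern, so here is a minimal sketch of what `^(CM|CP|FR|L[R-T]|O[U-Z])` accepts and rejects. The identifiers in the loop are made-up placeholders, and the `is_chromosome_id` helper is hypothetical. How the pipeline itself applies the pattern is not shown in this commit.

```shell
#!/usr/bin/env bash
# Illustrative sketch: the helper name and the identifiers are hypothetical.
set -euo pipefail

# Pattern quoted from the changelog entry.
pattern='^(CM|CP|FR|L[R-T]|O[U-Z])'

# Print "match" if the identifier starts with one of the accepted prefixes.
is_chromosome_id() {
    if printf '%s\n' "$1" | grep -Eq "$pattern"; then
        echo "match"
    else
        echo "no match"
    fi
}

# Placeholder identifiers, chosen only to exercise the prefix ranges.
for id in CM000001.1 LR000001.1 OX000001.1 JA000001.1 LQ000001.1; do
    printf '%s\t%s\n' "$id" "$(is_chromosome_id "$id")"
done
```

The bracket ranges are inclusive, so `L[R-T]` covers `LR`, `LS` and `LT`, and `O[U-Z]` covers `OU` through `OZ`.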
11 changes: 0 additions & 11 deletions README.md
@@ -9,17 +9,6 @@
 3. Extract mitochondrial genomes from the assembly file (they might be useful later as an internal control).
 4. Summarises the occurence of the first two letters of the accession numbers, to ease future changes of the grepping pattern for whole-chromosome scaffolds.

-## TODO
-
-- Some assemblies only contain fragmented condigs of the mitochondrial genome.
-Filter them out.
-- In `GCA_000146795` they match `Contig` and `>AD` while
-complete Primate mitogenomes do not (`>CM`, `>CP`, `>J0`, `>KT`, `>LN`).
-- In `GCA_015711505` they match `HiC_scaffold` and `>JA` while
-complete Glire mitogenomes do not (`>AA`, `>AY`, `>CM`, `>JA`, `>LR`, `>OR`, `>OW`, `>OX`, `>OY`, `>OZ`).
-- In `GCA_019903745` they match `>JA`, while no other Artiodactyla does.
-This said, maybe it will be easier to just delete files that contain more than one sequence?
-
 ## Usage

 > [!NOTE]
Binary file added assets/GRCh38.mito1.fa.gz
Binary file not shown.
File renamed without changes.
3 changes: 2 additions & 1 deletion assets/samplesheet.tsv
@@ -1,3 +1,4 @@
 id file other
 GRCh38 assets/GRCh38.head.fa.gz unused column
-mitoch assets/GRCh38.mito.fa.gz added for tests
+mito_1 assets/GRCh38.mito1.fa.gz added for tests
+mito_2 assets/GRCh38.mito2.fa.gz
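Note on the samplesheet layout: a minimal sketch of reading the `id` and `file` columns from a tab-separated sheet. It assumes a header row and tab delimiters, as the `.tsv` extension suggests. The `list_samples` helper is hypothetical and is not how the pipeline parses its input. Extra columns, and rows without a third field, are ignored.

```shell
#!/usr/bin/env bash
# Illustrative sketch: list_samples is a hypothetical helper, not pipeline code.
set -euo pipefail

# Build a small tab-separated sheet modelled on assets/samplesheet.tsv.
sheet=$(mktemp)
printf 'id\tfile\tother\n' > "$sheet"
printf 'GRCh38\tassets/GRCh38.head.fa.gz\tunused column\n' >> "$sheet"
printf 'mito_2\tassets/GRCh38.mito2.fa.gz\n' >> "$sheet"

# Skip the header row, then print "id -> file" for every data row.
# Anything after the second column is collected in _rest and dropped.
list_samples() {
    tail -n +2 "$1" | while IFS=$'\t' read -r id file _rest; do
        printf '%s -> %s\n' "$id" "$file"
    done
}

list_samples "$sheet"
```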
4 changes: 2 additions & 2 deletions modules/local/mitogenome.nf
@@ -31,8 +31,8 @@ process MITOGENOME {
 ${sequence} \\
 -o ${prefix}.mitogenome.${suffix}.gz \\
-# Remove output if empty
-[ -z "\$(zcat ${prefix}.mitogenome.${suffix}.gz | head)" ] && rm ${prefix}.mitogenome.${suffix}.gz
+# Remove if containing less or more than one sequence
+[ \$(zcat ${prefix}.mitogenome.${suffix}.gz | grep -c '>') -ne 1 ] && rm ${prefix}.mitogenome.${suffix}.gz
 cat <<-END_VERSIONS > versions.yml
 "${task.process}":
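Note on the new check: the same "keep only files with exactly one sequence" rule can be tried outside Nextflow with the standalone sketch below. The function names and test files are hypothetical. Unlike the committed line, the sketch counts headers with `^>` (anchored to line start) and decompresses with `gzip -dc` rather than `zcat`; both are choices made for this illustration.

```shell
#!/usr/bin/env bash
# Illustrative sketch: function and file names are hypothetical, not pipeline code.
set -euo pipefail

# Count FASTA records in a gzipped file; header lines start with '>'.
count_sequences() {
    # grep -c exits with status 1 when the count is 0; `|| true` stops that
    # from aborting the script under `set -e` and `pipefail`.
    gzip -dc "$1" | grep -c '^>' || true
}

# Delete the file unless it contains exactly one sequence.
keep_single_sequence() {
    if [ "$(count_sequences "$1")" -ne 1 ]; then
        rm -f "$1"
    fi
}

workdir=$(mktemp -d)
printf '>chrM\nACGT\n' | gzip > "$workdir/one.fa.gz"
printf '>frag1\nACGT\n>frag2\nTTGA\n' | gzip > "$workdir/two.fa.gz"
printf '' | gzip > "$workdir/empty.fa.gz"

keep_single_sequence "$workdir/one.fa.gz"    # one record: kept
keep_single_sequence "$workdir/two.fa.gz"    # two records: deleted
keep_single_sequence "$workdir/empty.fa.gz"  # no record: deleted

ls "$workdir"   # lists only one.fa.gz
```

An empty output has a count of zero, so this single test also removes empty files, as the replaced "Remove output if empty" line did.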
