Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SortMeRNA: update version 4.3.6 #1316

Merged
merged 42 commits into from
Oct 18, 2023

Conversation

gallardoalba
Copy link
Contributor

@gallardoalba gallardoalba commented Jun 27, 2023

Main changes:

  • Log parameter removed (it is not available in the new version)
  • Merge/unmerge bash scripts replaced by equivaled functionalities (--out2)
  • New option included: --sout (separate paired and singleton aligned reads)
  • Indexdb_rna removed (included in the main function)
  • Include the possibility of using interleaved reads (-paired option)

@bernt-matthias
Copy link
Collaborator

Cool. One of my users just asked for an update. Can I help here?

@gallardoalba
Copy link
Contributor Author

Cool. One of my users just asked for an update. Can I help here?

Tomorrow I'll continue working on it; I'll ping you if I find any problem.

@gallardoalba
Copy link
Contributor Author

gallardoalba commented Aug 23, 2023

This is the problem that I found and that temporarily paralyzed the PR @bernt-matthias ; apparently Sortmerna generates an alignment in the temporary folder, and galaxy tries to index it without success, generating this error:

Screenshot from 2023-08-23 15-36-13

I tried to specify the path of this folder in order to provide an adequate extension, but I think it is not possible.

@bernt-matthias
Copy link
Collaborator

Can you check if the file is empty?

@gallardoalba
Copy link
Contributor Author

Can you check if the file is empty?

You are right, this is indeed the problem. I'll try to find a better input file.

@gallardoalba gallardoalba marked this pull request as ready for review August 30, 2023 17:11
@bgruening
Copy link
Owner

. ERROR: Test 1: Found output tag with unknown name [output_fastx], valid names ['aligned', 'aligned_forward', 'aligned_reverse', 'aligned_forward_singleton', 'aligned_reverse_singleton', 'unaligned', 'unaligned_forward', 'unaligned_reverse', 'unaligned_forward_singleton', 'unaligned_reverse_singleton', 'output_bam', 'output_blast', 'output_biom', 'output_de_novo']
.. ERROR: Test 2: Found output tag with unknown name [output_fastx], valid names ['aligned', 'aligned_forward', 'aligned_reverse', 'aligned_forward_singleton', 'aligned_reverse_singleton', 'unaligned', 'unaligned_forward', 'unaligned_reverse', 'unaligned_forward_singleton', 'unaligned_reverse_singleton', 'output_bam', 'output_blast', 'output_biom', 'output_de_novo']
.. ERROR: Test 5: Found output tag with unknown name [aligned_paired], valid names ['aligned', 'aligned_forward', 'aligned_reverse', 'aligned_forward_singleton', 'aligned_reverse_singleton', 'unaligned', 'unaligned_forward', 'unaligned_reverse', 'unaligned_forward_singleton', 'unaligned_reverse_singleton', 'output_bam', 'output_blast', 'output_biom', 'output_de_novo']
.. ERROR: Test 5: Found output tag with unknown name [unaligned_paired], valid names ['aligned', 'aligned_forward', 'aligned_reverse', 'aligned_forward_singleton', 'aligned_reverse_singleton', 'unaligned', 'unaligned_forward', 'unaligned_reverse', 'unaligned_forward_singleton', 'unaligned_reverse_singleton', 'output_bam', 'output_blast', 'output_biom', 'output_de_novo']
.. CHECK: 9 test(s) found.
Applying linter output... CHECK
.. INFO: 14 outputs found.
Applying linter inputs... WARNING
.. WARNING: Param input [num_alignments] 'name' attribute is redundant if argument implies the same name.

For the linting. Sorry for such a messy tool :(

@gallardoalba
Copy link
Contributor Author

. ERROR: Test 1: Found output tag with unknown name [output_fastx], valid names ['aligned', 'aligned_forward', 'aligned_reverse', 'aligned_forward_singleton', 'aligned_reverse_singleton', 'unaligned', 'unaligned_forward', 'unaligned_reverse', 'unaligned_forward_singleton', 'unaligned_reverse_singleton', 'output_bam', 'output_blast', 'output_biom', 'output_de_novo']
.. ERROR: Test 2: Found output tag with unknown name [output_fastx], valid names ['aligned', 'aligned_forward', 'aligned_reverse', 'aligned_forward_singleton', 'aligned_reverse_singleton', 'unaligned', 'unaligned_forward', 'unaligned_reverse', 'unaligned_forward_singleton', 'unaligned_reverse_singleton', 'output_bam', 'output_blast', 'output_biom', 'output_de_novo']
.. ERROR: Test 5: Found output tag with unknown name [aligned_paired], valid names ['aligned', 'aligned_forward', 'aligned_reverse', 'aligned_forward_singleton', 'aligned_reverse_singleton', 'unaligned', 'unaligned_forward', 'unaligned_reverse', 'unaligned_forward_singleton', 'unaligned_reverse_singleton', 'output_bam', 'output_blast', 'output_biom', 'output_de_novo']
.. ERROR: Test 5: Found output tag with unknown name [unaligned_paired], valid names ['aligned', 'aligned_forward', 'aligned_reverse', 'aligned_forward_singleton', 'aligned_reverse_singleton', 'unaligned', 'unaligned_forward', 'unaligned_reverse', 'unaligned_forward_singleton', 'unaligned_reverse_singleton', 'output_bam', 'output_blast', 'output_biom', 'output_de_novo']
.. CHECK: 9 test(s) found.
Applying linter output... CHECK
.. INFO: 14 outputs found.
Applying linter inputs... WARNING
.. WARNING: Param input [num_alignments] 'name' attribute is redundant if argument implies the same name.

For the linting. Sorry for such a messy tool :(

Now should be fine; some scripts were removed (e.g. merge-paired-reads.sh and unmerge-paired-reads.sh), and replaced by equivalent functionalities.

@bgruening
Copy link
Owner

@gallardoalba a profile version enables an own HOME dir for every job.

@gallardoalba
Copy link
Contributor Author

@gallardoalba a profile version enables an own HOME dir for every job.

Perfect, thanks for including it.

Copy link
Collaborator

@bernt-matthias bernt-matthias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a large update. Good work. Here a few comments from my side.

tools/rna_tools/sortmerna/sortmerna.xml Outdated Show resolved Hide resolved
tools/rna_tools/sortmerna/sortmerna.xml Show resolved Hide resolved
</conditional>
<param name="strand_search" value="" />
<conditional name="databases_type">
<param name="databases_selector" value="history" />
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to have a test for the cached case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I need to review it, because neither in the previous version nor in the update the test datatables seem to be available.

tools/rna_tools/sortmerna/macros.xml Outdated Show resolved Hide resolved
tools/rna_tools/sortmerna/macros.xml Show resolved Hide resolved
$ref.append('%s' % $db )
#end for
#else
#for $db in $databases_type.input_databases.fields.path.split(",")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea is here that a single entry is selected and split by comma? How does this look like in practice?

Wondering if making the cached case multiple="true" would be an option? Might be more flexible (backward compatibility might be a bit tricky .. but not very)?

tools/rna_tools/sortmerna/sortmerna.xml Show resolved Hide resolved
#for $i, $reference in enumerate($ref)
--ref '$reference'
#end for
#if str( $databases_type.databases_selector ) != 'cached'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the cached / cached_to_index question. This seems to happen now in sortmerna.

But its still not clear how this works in practice, because for both options the user selects from the same data table?

Copy link
Contributor Author

@gallardoalba gallardoalba Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I cannot completely understand. Reference FASTA and indices can be provided in that way ./sortmerna --ref ./rRNA_databases/silva-bac-16s-id90.fasta,./index/silva-bac-16s-db

Perhaps @bebatut could have a look, since she wrote the datamanager. Otherwise I would propose to keep this section without changes. I think it should still work properly.

tools/rna_tools/sortmerna/sortmerna.xml Show resolved Hide resolved
@gallardoalba
Copy link
Contributor Author

Do you think it could be merged @bernt-matthias? I would like to test if it works with the installed indexed genomes.

@bernt-matthias
Copy link
Collaborator

I would like to test if it works with the installed indexed genomes.

Would be cool to have a test. Let me know if the PR is ready from your side and I will review and merge.

@gallardoalba
Copy link
Contributor Author

I would like to test if it works with the installed indexed genomes.

Would be cool to have a test. Let me know if the PR is ready from your side and I will review and merge.

Hi @bernt-matthias, I'm trying to create the test for the database, but I'm not sure how to create the file structure. According this https://github.com/bgruening/galaxytools/blob/master/data_managers/data_manager_sortmerna_database_downloader/data_manager/data_manager_sortmerna_download.py#L122 it seems to be fine, but don't know why the tool is not able to recognize it. Would you mind to have a look? Thanks a lot!

@bernt-matthias
Copy link
Collaborator

Hi @gallardoalba what is the state here?

Would you mind to have a look?

What exactly should I look at? Is there a failing test that I could examine?

@bernt-matthias
Copy link
Collaborator

Will add a test for cached data. Wondering if the loops are correct, i.e. in

#for $db in $databases_type.input_databases.fields.path.split(",")
we loop over a list derived from a comma separated string. But actually we have a select with multiple="true".

@bernt-matthias
Copy link
Collaborator

I get the impression that the use of (multiple?) cached references was already wrong in 2.1. But I guess most of the time a single one is used. The docs state

      --ref             STRING,STRING   FASTA reference file, index file                               mandatory
                                         (ex. --ref /path/to/file1.fasta,/path/to/index1)
                                         If passing multiple reference files, separate 
                                         them using the delimiter ':',
                                         (ex. --ref /path/to/file1.fasta,/path/to/index1:/path/to/file2.fasta,path/to/index2)

But Galaxy just executes with --ref REF1,REF2,REF3,....

Also the indexdb (actually indexdb_rna) executed with the datamanager is not used anymore. I guess we can / should ignore the indexes created by the data manager.

@bgruening bgruening merged commit 0e5cc6f into bgruening:master Oct 18, 2023
9 of 11 checks passed
@bernt-matthias
Copy link
Collaborator

Hi @bgruening .. I was still fixing bugs and adding tests wrt refereces. I stopped CI, but feel free to restart if you need the current state.

I will open a followup PR.

bernt-matthias added a commit to bernt-matthias/galaxytools that referenced this pull request Oct 19, 2023
followup on bgruening#1316 which was
not deployed

- use the same chached data for chached and cached_to_index
  i.e. now they differ only in that the later uses the dbprep macro

  if I get it right previously the datamanager precomputed indexes
  which could be used. this seems not possible anymore

  I suggest to leave the dm untouched (than the provided data will also
  work for old sortmerna versions)

- fix usage of cached data (did not work for multiple provided values)
  and add tests
bernt-matthias added a commit to bernt-matthias/galaxytools that referenced this pull request Oct 19, 2023
followup on bgruening#1316 which was
not deployed

- use the same chached data for chached and cached_to_index
  i.e. now they differ only in that the later uses the dbprep macro

  if I get it right previously the datamanager precomputed indexes
  which could be used. this seems not possible anymore

  I suggest to leave the dm untouched (than the provided data will also
  work for old sortmerna versions)

- fix usage of cached data (did not work for multiple provided values)
  and add tests
bgruening pushed a commit that referenced this pull request Oct 20, 2023
* sortmerna: finish update

followup on #1316 which was
not deployed

- use the same chached data for chached and cached_to_index
  i.e. now they differ only in that the later uses the dbprep macro

  if I get it right previously the datamanager precomputed indexes
  which could be used. this seems not possible anymore

  I suggest to leave the dm untouched (than the provided data will also
  work for old sortmerna versions)

- fix usage of cached data (did not work for multiple provided values)
  and add tests

* add missing test file

* also chached references are not optional

selects default to optional="true" which should not apply here.
also checkboxes do not work therefore.

* eliminate cached_to_index option
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants