Difficulty aligning raw signals with Kit14 chemistry #3

Open
Cuypers-Wim opened this issue Jun 28, 2024 · 1 comment

@Cuypers-Wim

Dear RawAlign Team,

I am facing challenges in aligning raw signals generated using Kit14 chemistry. I ensured that the correct pore model was specified during the indexing process with the following command:

rawalign \
-d PlasmoDB-58_Pfalciparum3D7_Genome.ind \
-p /extern/local_kmer_models/r10_180mv_450bps_9mer/template_r10_9mer.model -t 32 PlasmoDB-58_Pfalciparum3D7_Genome.fasta

Then, I executed this command:

rawalign --dtw-evaluate-chains \
-t 32 -x sensitive PlasmoDB-58_Pfalciparum3D7_Genome.ind *.fast5 > mapping_plasmo.paf

However, I am encountering very low mapping rates: only 30% of reads from my Plasmodium dataset and just 1% from a virus dataset (both in-house datasets) align.

In contrast, when aligning subsets of R9 data from your pre-print included in the repository, at least 80% of reads map, which seems satisfactory (considering not all reads will match the reference genome).

I consider a read ‘unaligned’ if the line for that read in the PAF file contains only '*'.
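For reference, the mapping rate under that definition can be computed with a short script. This is only a minimal sketch: it assumes the PAF output is tab-separated and that an unaligned read carries '*' in the target-name column (column 6), which may not match every version of the output. The function name `mapping_rate` and the toy records are made up for illustration.

```python
import io


def mapping_rate(paf_lines, target_col=5):
    """Return (mapped_reads, total_reads, fraction) over distinct read names.

    Assumption: a read counts as unaligned when the target-name field
    (0-based column 5 in standard PAF) is '*'.
    """
    mapped, seen = set(), set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= target_col:
            continue  # skip blank or truncated lines
        name = fields[0]
        seen.add(name)
        if fields[target_col] != "*":
            mapped.add(name)
    total = len(seen)
    fraction = len(mapped) / total if total else 0.0
    return len(mapped), total, fraction


# Toy input with one aligned and one unaligned record (hypothetical values).
demo = io.StringIO(
    "read1\t1000\t10\t900\t+\tchr1\t5000\t100\t990\t50\t890\t60\n"
    "read2\t800\t*\t*\t*\t*\t*\t*\t*\t*\t*\t0\n"
)
mapped, total, fraction = mapping_rate(demo)
print(f"{mapped}/{total} reads mapped ({fraction:.1%})")
```

On the real output this would be called as `mapping_rate(open("mapping_plasmo.paf"))`. Counting distinct read names rather than lines avoids inflating the rate if a read is reported more than once.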

Is it possible that I am doing something wrong with the commands as outlined above? In case you could examine my data (https://drive.google.com/drive/folders/1bRj_gOfOACqkQADOoJ6y5tAY-wJDAtqW?usp=sharing), I have included a subset of our in-house generated Plasmodium dataset (from our publication: https://journals.asm.org/doi/full/10.1128/mbio.01967-23). I included the RawAlign index file of the Plasmodium reference genome, the output paf files, and some reads in FAST5 format. The reads were originally POD5 files that I converted to FAST5 using ONT’s POD5 toolkit (https://github.com/nanoporetech/pod5-file-format):

pod5 convert to_fast5 "$pod5_files" --output pod5_to_fast5/

It would be extremely helpful if you could help me determine if the issue lies with the commands, or rather the datasets. Additionally, do you have access to any reference dataset known to work well with the latest nanopore chemistry that I could use for comparison?

Thank you for your assistance!

Best regards,

Wim

@joellindegger
Collaborator

Hi Wim,

We developed RawAlign and optimized its default parameters on R9.4.
Other pore models require a parameter sweep. In particular, the match bonus and minimum score thresholds likely need to be chosen differently. Copying from the documentation:

--dtw-match-bonus FLOAT     | DTW bonus score per aligned read event (default: 0.4)
--dtw-min-score FLOAT       | DTW minimum alignment score for a candidate to be considered mapped (default: 20.0)
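A sweep over these two options could be set up roughly as follows. This is a sketch under assumptions, not a tested recipe: the grid values are arbitrary placeholders around the defaults, the output file naming is invented, and it assumes each option takes its value as a separate space-delimited argument. The other flags are copied from the command in the original post. The script only builds the command lines; it does not run them.

```python
import itertools
import shlex


def sweep_commands(index, reads_glob, bonuses, min_scores, threads=32):
    """Build one rawalign command line per (match bonus, min score) pair."""
    commands = []
    for bonus, min_score in itertools.product(bonuses, min_scores):
        # Hypothetical naming scheme so each run writes its own PAF file.
        out = f"sweep_bonus{bonus}_min{min_score}.paf"
        args = [
            "rawalign", "--dtw-evaluate-chains",
            "--dtw-match-bonus", str(bonus),
            "--dtw-min-score", str(min_score),
            "-t", str(threads), "-x", "sensitive", index,
        ]
        quoted = " ".join(shlex.quote(a) for a in args)
        # The glob is left unquoted on purpose so the shell expands it.
        commands.append(f"{quoted} {reads_glob} > {shlex.quote(out)}")
    return commands


# Placeholder grid around the documented defaults (0.4 and 20.0).
for cmd in sweep_commands(
    "PlasmoDB-58_Pfalciparum3D7_Genome.ind",
    "*.fast5",
    bonuses=[0.2, 0.4, 0.6],
    min_scores=[10.0, 20.0, 30.0],
):
    print(cmd)
```

Each resulting PAF file can then be compared by mapping rate, ideally on a small subset of reads first to keep the sweep cheap.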

RawHash2 is better optimized for R10 data. It now includes most of RawAlign's options, along with several improvements to the seeding and chaining stages, so we recommend using it instead.

Best,
Joel
