Issue with Alignment Results Using Soft-Masked Genome in Bismark #705

Open
Chanspace opened this issue Oct 14, 2024 · 2 comments

Comments

@Chanspace

I am currently analyzing whole-genome bisulfite sequencing (WGBS) data with Bismark and plan to use a soft-masked genome, in which all repetitive and low-complexity regions are marked in lowercase letters.

During the index generation step, I observed that the index built from the soft-masked genome is identical to the one built from the unmasked genome. However, I noticed a significant difference in the alignment results, specifically in the number of uniquely aligned reads. It appears that aligners such as Bowtie2 ignore soft-masking and treat lowercase letters as uppercase during alignment.
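One way to check the premise that both references should yield the same index is to confirm that the soft-masked and unmasked FASTA files differ only in letter case. The sketch below is a minimal, hypothetical helper (the names `read_fasta` and `compare_masking` are illustrative and not part of Bismark or Bowtie2). It assumes plain, uncompressed FASTA files with matching sequence names and loads everything into memory. It also reports what fraction of bases are soft-masked.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {sequence_name: sequence}, preserving letter case."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sequence name.
            name = (line[1:].split() or [""])[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def compare_masking(soft: Dict[str, str], unmasked: Dict[str, str]) -> Tuple[bool, float]:
    """Return (identical_ignoring_case, fraction_of_lowercase_bases_in_soft)."""
    same = soft.keys() == unmasked.keys() and all(
        soft[k].upper() == unmasked[k].upper() for k in soft
    )
    total = sum(len(s) for s in soft.values())
    lower = sum(c.islower() for s in soft.values() for c in s)
    return same, (lower / total if total else 0.0)


# Small in-memory demo; for real data, pass open file handles instead.
soft_demo = read_fasta([">chr1", "ACGTacgtNN", ">chr2", "ttGG"])
unmasked_demo = read_fasta([">chr1", "ACGTACGTNN", ">chr2", "TTGG"])
same, frac = compare_masking(soft_demo, unmasked_demo)
print(same, round(frac, 3))  # prints: True 0.429
```

For real files, something like `read_fasta(open("genome_softmasked.fa"))` would work (the file name is hypothetical). If the comparison returns `True`, the two references contain the same sequence apart from case.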

Is there a specific parameter or approach in Bismark that would allow me to achieve alignment results with the soft-masked genome that are comparable to those obtained with the unmasked genome? Any guidance or advice would be greatly appreciated!

Thank you!

@FelixKrueger
Owner

FelixKrueger commented Oct 15, 2024

To be perfectly honest, I don't know for sure whether Bowtie2 treats soft-masked genomes differently from unmasked genomes, but I don't think it does (a Google search for "how does Bowtie2 treat soft-masked index" didn't yield any useful insights either).

What would you like to achieve by soft-masking repeats?

@Chanspace
Author

I'm sorry, I may not have expressed myself clearly. What I actually want to know is how to get consistent detection rates in Bismark regardless of whether I use the unmasked or the soft-masked genome. We have used soft-masked genomes in other omics analyses, so we would like to stay consistent. However, when we compared the unmasked and soft-masked genomes in our WGBS analysis with Bismark, the resulting methylation detection rates still differed, even though the generated indexes were identical.
