Calculation of λ for no crontrol samples (ATAC seq) #553
tahirsajid
started this conversation in
Ideas
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Recently, I started using MACS2 for peak calling in my ATAC seq data analysis. I was trying to understand how MACS2 calculates lambda for peak calling in ATAC seq samples where no control sample is included. Although, I was not able to completely understand it; however, to satisfy my brain, I developed the concept that probably the lambda (λ) is also calculated in ATAC seq samples as a measure of randomly expected mappability of any nucleotide of a read to any nucleotide in the mappable genome. This λ through poison distribution is used to call peaks in comparison to observed reads in the sample. I developed this concept from the following formula provided in a MACS2-based analysis workshop:
λ = (Read Length * Total Number of Reads) / Mappable Genome Length
As per my understanding, the lambda above takes into account the effect of the number of all reads aligned to a region (local λ) or to whole genome (BG λ). All this practice is done to remove the noise in the data to avoid false positive peaks. In theory, we generally believe that TN5 cuts the open chromatin randomly; therefore, ideally, we expect that every cut fragment should be from the open chromatin that we are interested to investigate. Keeping in mind this scenario, the noise in the case of ATAC seq data can only be due to non-specific mapping of some fragments, and we don't want to call these signals as peaks. Other than that every high-quality mapping should be considered as a signal for peak.
My reservation is that instead of using λ that carries the effect of total number of reads mapped to genome or local regions, why the λ that carries only the effect of non-specific mapped reads (reads with low mapping quality score) is not used?
If my above explanation makes sense, is it possible to model the number of reads with low-quality mapping quality score in the calculation of λ? In my understanding, it will help in reducing unnecessary removal of true signals (false negative) while calling peaks. The options to control false positive peaks will be still there through multiple testing and using sample replicates.
I am just a beginning user of peak callers. I am aware that my knowledge can be limited. I just wanted to express my observations and a suggestion. If you find these observations and suggestion not valid, I would highly appreciate if you advise me some better logics to correct my concepts.
Kind regards,
Sajid
Beta Was this translation helpful? Give feedback.
All reactions