Genotyping Many SNPs with multidog()
-David Gerard
+David +Gerard
Source:vignettes/multidog.Rmd
@@ -93,28 +95,40 @@ David Gerard
Abstract
-multidog()
provides support for genotyping many SNPs by iterating flexdog()
over the SNPs. Support is provided for parallel computing through the future
package. The genotyping method is described in Gerard et al. (2018) and Gerard and Ferrão (2020).
multidog()
provides support for genotyping many SNPs by
+iterating flexdog()
over the SNPs. Support is provided for
+parallel computing through the future
+package. The genotyping method is described in Gerard et al. (2018) and
+Gerard and Ferrão (2020).
Fit multidog()
-Let’s load updog
, future
, and the data from Uitdewilligen et al. (2013).
Let’s load updog
, future
, and the data from
+Uitdewilligen et al. (2013).
uitdewilligen$refmat
is a matrix of reference counts while uitdewilligen$sizemat
is a matrix of total read counts. In these data, the rows index the individuals and the columns index the loci. But for insertion into multidog()
we need it the other way around (individuals in the columns and loci in the rows). So we will transpose these matrices.
uitdewilligen$refmat
is a matrix of reference counts
+while uitdewilligen$sizemat
is a matrix of total read
+counts. In these data, the rows index the individuals and the columns
+index the loci. But for insertion into multidog()
we need
+it the other way around (individuals in the columns and loci in the
+rows). So we will transpose these matrices.
refmat <- t(uitdewilligen$refmat)
sizemat <- t(uitdewilligen$sizemat)
ploidy <- uitdewilligen$ploidy
sizemat
and refmat
should have the same row and column names. These names identify the loci and the individuals.
sizemat
and refmat
should have the same row
+and column names. These names identify the loci and the individuals.
setdiff(colnames(sizemat), colnames(refmat))
#> character(0)
setdiff(rownames(sizemat), rownames(refmat))
#> character(0)
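Note that setdiff() only reports names that are missing from one matrix; it does not detect names that appear in a different order. A minimal base-R sketch of a stricter check follows. The toy matrices and the `*_toy` / `*_raw` names are hypothetical and are not part of the uitdewilligen data.

```r
# Toy count matrices (made-up values): rows = individuals, columns = loci
ref_raw <- matrix(c(3, 10, 7, 0, 5, 12), nrow = 2,
                  dimnames = list(c("ind1", "ind2"),
                                  c("snpA", "snpB", "snpC")))
size_raw <- matrix(c(10, 20, 15, 8, 9, 12), nrow = 2,
                   dimnames = dimnames(ref_raw))

# Transpose so that loci are in the rows and individuals in the columns
refmat_toy <- t(ref_raw)
sizemat_toy <- t(size_raw)

# Names must agree in content and in order
names_ok <- identical(dimnames(refmat_toy), dimnames(sizemat_toy))

# Reference counts can never exceed total read counts
counts_ok <- all(refmat_toy <= sizemat_toy, na.rm = TRUE)
```

If either flag is FALSE, it is probably worth reordering or re-deriving the matrices before genotyping.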
If we want to do parallel computing, we should check that we have the proper number of cores:
+If we want to do parallel computing, we should check that we have the
+proper number of cores:
future::availableCores()
#> system
@@ -142,7 +156,14 @@ Fit multidog()
future::plan(future::multisession, workers = nc)
if nc
is greater than 1. You can choose your own evaluation strategy by running future::plan()
prior to running multidog()
, and then setting nc = NA
. This should be particularly useful in higher performance computing environments that use schedulers, where you can control the evaluation strategy through the future.batchtools
package. For example, the following will run multidog()
using forked R processes:
if nc
is greater than 1. You can choose your own
+evaluation strategy by running future::plan()
prior to
+running multidog()
, and then setting nc = NA
.
+This should be particularly useful in higher performance computing
+environments that use schedulers, where you can control the evaluation
+strategy through the future.batchtools
+package. For example, the following will run multidog()
+using forked R processes:
future::plan(future::multicore, workers = 2)
mout <- multidog(refmat = refmat,
@@ -169,7 +190,9 @@
#>
#> [[3]]
-The output of multidog contains two data frames. The first contains properties of the SNPs, such as estimated allele bias and estimated sequencing error rate.
+The output of multidog contains two data frames. The first contains
+properties of the SNPs, such as estimated allele bias and estimated
+sequencing error rate.
str(mout$snpdf)
#> 'data.frame': 100 obs. of 20 variables:
@@ -193,7 +216,10 @@
#> $ Pr_4 : num 0.592065 0.002423 0.000482 0.381024 0.149179 ...
#> $ mu : num 4.18 1.01 -1 3.75 2.29 ...
#> $ sigma : num 1.067 0.925 1.289 1.481 1.433 ...
-The second data frame contains properties of each individual at each SNP, such as the estimated genotypes (geno
) and the posterior probability of being genotyped correctly (maxpostprob
).
+The second data frame contains properties of each individual at each
+SNP, such as the estimated genotypes (geno
) and the
+posterior probability of being genotyped correctly
+(maxpostprob
).
str(mout$inddf)
#> 'data.frame': 1000 obs. of 17 variables:
@@ -214,7 +240,8 @@
#> $ logL_2 : num -13.27 -6.95 -13.93 -29.29 -25.69 ...
#> $ logL_3 : num -2.55 -4 -2.79 -11.49 -10.06 ...
#> $ logL_4 : num -25.804 -38.999 -15.935 -0.181 -0.158 ...
-You can obtain the columns in inddf
in matrix form with format_multidog()
.
+You can obtain the columns in inddf
in matrix form with
+format_multidog()
.
genomat <- format_multidog(mout, varname = "geno")
head(genomat)
@@ -232,7 +259,14 @@
#> PotVar0066020 4 2
#> PotVar0003381 4 3
#> PotVar0131622 3 3
-To filter SNPs based on quality metrics (bias, sequencing error rate, overdispersion, etc.), you can use filter_snp()
, which uses the same non-standard evaluation you are used to from dplyr::filter()
. That is, you can define predicates in terms of the variable names in the snpdf
data frame from the output of multidog()
. It then keeps rows in both snpdf
and inddf
where the predicate for a SNP evaluates to TRUE
.
+To filter SNPs based on quality metrics (bias, sequencing error rate,
+overdispersion, etc.), you can use filter_snp()
, which uses
+the same non-standard evaluation you are used to from
+dplyr::filter()
. That is, you can define predicates in
+terms of the variable names in the snpdf
data frame from
+the output of multidog()
. It then keeps rows in both
+snpdf
and inddf
where the predicate for a SNP
+evaluates to TRUE
.
dim(mout$snpdf)
#> [1] 100 20
@@ -247,9 +281,16 @@
References
-Gerard, David, and Luís Felipe Ventorim Ferrão. “Priors for genotyping polyploids.” Bioinformatics 36, no. 6 (2020): 1795-1800. https://doi.org/10.1093/bioinformatics/btz852.
-Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco Garcia, and Matthew Stephens. 2018. “Genotyping Polyploids from Messy Sequencing Data.” Genetics 210 (3). Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468.
-Uitdewilligen, Anne-Marie A. AND D’hoop, Jan G. A. M. L. AND Wolters. 2013. “A Next-Generation Sequencing Method for Genotyping-by-Sequencing of Highly Heterozygous Autotetraploid Potato.” PLOS ONE 8 (5). Public Library of Science: 1–14. https://doi.org/10.1371/journal.pone.0062355.
+Gerard, David, and Luís Felipe Ventorim Ferrão. “Priors for
+genotyping polyploids.” Bioinformatics 36, no. 6 (2020):
+1795-1800. https://doi.org/10.1093/bioinformatics/btz852.
+Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco
+Garcia, and Matthew Stephens. 2018. “Genotyping Polyploids from Messy
+Sequencing Data.” Genetics 210 (3). Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468.
+Uitdewilligen, Anne-Marie A. AND D’hoop, Jan G. A. M. L. AND Wolters.
+2013. “A Next-Generation Sequencing Method for Genotyping-by-Sequencing
+of Highly Heterozygous Autotetraploid Potato.” PLOS ONE 8 (5).
+Public Library of Science: 1–14. https://doi.org/10.1371/journal.pone.0062355.
diff --git a/docs/articles/oracle_calculations.html b/docs/articles/oracle_calculations.html
index e776cb8..99b7887 100644
--- a/docs/articles/oracle_calculations.html
+++ b/docs/articles/oracle_calculations.html
@@ -78,10 +78,12 @@
-
+
+
Oracle Calculations
- David Gerard
+ David
+Gerard
Source: vignettes/oracle_calculations.Rmd
@@ -93,25 +95,44 @@ David Gerard
Abstract
-We provide some example usage of the oracle calculations available in updog
. These are particularly useful for read-depth determination. These calculations are described in detail in Gerard et al. (2018).
+We provide some example usage of the oracle calculations available in
+updog
. These are particularly useful for read-depth
+determination. These calculations are described in detail in Gerard et
+al. (2018).
Controlling Misclassification Error
-Suppose we have a sample of tetraploid individuals derived from an S1 cross (a single generation of selfing). Using domain expertise (either from previous studies or a pilot analysis), we’ve determined that our sequencing technology will produce relatively clean data. That is, the sequencing error rate will not be too large (say, ~0.001), the bias will be moderate (say, ~0.7 at the most extreme), and the majority of SNPs will have reasonable levels of overdispersion (say, less than 0.01). We want to know how deep we need to sequence.
-Using oracle_mis
, we can see how deep we need to sequence under the worst-case scenario we want to control (sequencing error rate = 0.001, bias = 0.7, overdispersion = 0.01) in order to obtain a misclassification error rate of at most, say, 0.05.
+Suppose we have a sample of tetraploid individuals derived from an S1
+cross (a single generation of selfing). Using domain expertise (either
+from previous studies or a pilot analysis), we’ve determined that our
+sequencing technology will produce relatively clean data. That is, the
+sequencing error rate will not be too large (say, ~0.001), the bias will
+be moderate (say, ~0.7 at the most extreme), and the majority of SNPs
+will have reasonable levels of overdispersion (say, less than 0.01). We
+want to know how deep we need to sequence.
+Using oracle_mis
, we can see how deep we need to
+sequence under the worst-case scenario we want to control (sequencing
+error rate = 0.001, bias = 0.7, overdispersion = 0.01) in order to
+obtain a misclassification error rate of at most, say, 0.05.
bias <- 0.7
od <- 0.01
seq <- 0.001
maxerr <- 0.05
-Before we do this, we also need the distribution of the offspring genotypes. We can get this distribution assuming various parental genotypes using the get_q_array
function. Typically, error rates will be larger when the allele-frequency is closer to 0.5. So we’ll start in the worst-case scenario of assuming that the parent has 2 copies of the reference allele.
+Before we do this, we also need the distribution of the offspring
+genotypes. We can get this distribution assuming various parental
+genotypes using the get_q_array
function. Typically, error
+rates will be larger when the allele-frequency is closer to 0.5. So
+we’ll start in the worst-case scenario of assuming that the parent has 2
+copies of the reference allele.
library(updog)
ploidy <- 4
pgeno <- 2
gene_dist <- get_q_array(ploidy = ploidy)[pgeno + 1, pgeno + 1, ]
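For intuition, the same kind of offspring distribution can be worked out by hand in base R. This is an independent sketch, not a call into updog: it assumes ordinary polysomic segregation with no double reduction and no preferential pairing, and whether it reproduces the get_q_array() output exactly is an assumption. The names `gamete_dist` and `offspring_dist` are illustrative.

```r
ploidy <- 4
pgeno <- 2

# Each gamete draws ploidy/2 of the parent's alleles without replacement,
# so the number of reference alleles in a gamete is hypergeometric.
gamete_dosage <- 0:(ploidy / 2)
gamete_dist <- stats::dhyper(x = gamete_dosage,
                             m = pgeno,           # reference alleles
                             n = ploidy - pgeno,  # alternative alleles
                             k = ploidy / 2)      # alleles per gamete

# Under selfing, offspring dosage is the sum of two independent gametes
# from the same parent: convolve the gamete distribution with itself.
offspring_dist <- rep(0, ploidy + 1)
for (i in seq_along(gamete_dist)) {
  for (j in seq_along(gamete_dist)) {
    idx <- i + j - 1
    offspring_dist[idx] <- offspring_dist[idx] +
      gamete_dist[i] * gamete_dist[j]
  }
}
names(offspring_dist) <- 0:ploidy
```

Under these assumptions the probabilities for dosages 0 through 4 are 1/36, 8/36, 18/36, 8/36, and 1/36, so half of the offspring are expected to share the parental dosage of 2.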
-This is what the genotype distribution for the offspring looks like:
+This is what the genotype distribution for the offspring looks
+like:
library(ggplot2)
distdf <- data.frame(x = 0:ploidy, y = 0, yend = gene_dist)
@@ -121,7 +142,8 @@ Controlling Misclassification Error
xlab("Allele Dosage") +
ylab("Probability")
-Now, we are ready to iterate through read depths until we reach one with an error rate less than 0.05.
+Now, we are ready to iterate through read depths until we reach one
+with an error rate less than 0.05.
err <- Inf
depth <- 0
@@ -136,13 +158,24 @@ Controlling Misclassification Error
}
depth
#> [1] 90
-Looks like we need a depth of 90 in order to get a misclassification error rate under 0.05.
-Note that oracle_mis
returns the best misclassification error rate possible under these conditions (ploidy
= 4, bias
= 0.7, seq
= 0.001, od
= 0.01, and pgeno
= 2). In your actual analysis, you will have a worse misclassification error rate than that returned by oracle_mis
. However, if you have a lot of individuals in your sample, then this will act as a reasonable approximation to the error rate. In general though, you should sequence a little deeper than suggested by oracle_mis
.
+Looks like we need a depth of 90 in order to get a misclassification
+error rate under 0.05.
+Note that oracle_mis
returns the best
+misclassification error rate possible under these conditions
+(ploidy
= 4, bias
= 0.7, seq
=
+0.001, od
= 0.01, and pgeno
= 2). In your
+actual analysis, you will have a worse misclassification error rate than
+that returned by oracle_mis
. However, if you have a lot of
+individuals in your sample, then this will act as a reasonable
+approximation to the error rate. In general though, you should sequence
+a little deeper than suggested by oracle_mis
.
Visualizing the Joint Distribution
-Suppose we only have a budget to sequence to a depth of 30. Then what errors can we expect? We can use oracle_joint
and oracle_plot
to visualize the errors we can expect.
+Suppose we only have a budget to sequence to a depth of 30. Then what
+errors can we expect? We can use oracle_joint
and
+oracle_plot
to visualize the errors we can expect.
depth <- 30
jd <- oracle_joint(n = depth,
@@ -153,8 +186,12 @@ Visualizing the Joint Distribution dist = gene_dist)
oracle_plot(jd)
-Most of the errors will be mistakes between genotypes 2/3 and mistakes between genotypes 1/2.
-Even though the misclassification error rate is pretty high (0.14), the correlation of the oracle estimator with the true genotype is pretty reasonable (0.89). You can obtain this using the oracle_cor
function.
+Most of the errors will be mistakes between genotypes 2/3 and
+mistakes between genotypes 1/2.
+Even though the misclassification error rate is pretty high (0.14),
+the correlation of the oracle estimator with the true genotype is pretty
+reasonable (0.89). You can obtain this using the oracle_cor
+function.
ocorr <- oracle_cor(n = depth,
ploidy = ploidy,
@@ -168,7 +205,9 @@ Visualizing the Joint Distribution
References
-Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco Garcia, and Matthew Stephens. 2018. “Genotyping Polyploids from Messy Sequencing Data.” Genetics 210 (3). Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468.
+Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco
+Garcia, and Matthew Stephens. 2018. “Genotyping Polyploids from Messy
+Sequencing Data.” Genetics 210 (3). Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468.
diff --git a/docs/articles/simulate_ngs.html b/docs/articles/simulate_ngs.html
index 7bd7740..d63e2a2 100644
--- a/docs/articles/simulate_ngs.html
+++ b/docs/articles/simulate_ngs.html
@@ -78,10 +78,12 @@
-
+
+
Simulate Next-Generation Sequencing Data
- David Gerard
+ David
+Gerard
Source: vignettes/simulate_ngs.Rmd
@@ -93,12 +95,15 @@ David Gerard
Abstract
-We demonstrate how to simulate NGS data under various genotype distributions, then fit these data using flexdog
. The genotyping methods are described in Gerard et al. (2018).
+We demonstrate how to simulate NGS data under various genotype
+distributions, then fit these data using flexdog
. The
+genotyping methods are described in Gerard et al. (2018).
Analysis
-Let’s suppose that we have 100 hexaploid individuals, with varying levels of read-depth.
+Let’s suppose that we have 100 hexaploid individuals, with varying
+levels of read-depth.
set.seed(1)
library(updog)
@@ -107,18 +112,29 @@ Analysissizevec <- round(stats::runif(n = nind,
min = 50,
max = 200))
-We can simulate their read-counts under various genotype distributions, allele biases, overdispersions, and sequencing error rates using the rgeno
and rflexdog
functions.
+We can simulate their read-counts under various genotype
+distributions, allele biases, overdispersions, and sequencing error
+rates using the rgeno
and rflexdog
+functions.
F1 Population
-Suppose these individuals are all siblings where the first parent has 4 copies of the reference allele and the second parent has 5 copies of the reference allele. Then the following code, using rgeno
, will simulate the individuals’ genotypes.
+Suppose these individuals are all siblings where the first parent has
+4 copies of the reference allele and the second parent has 5 copies of
+the reference allele. Then the following code, using rgeno
,
+will simulate the individuals’ genotypes.
true_geno <- rgeno(n = nind,
ploidy = ploidy,
model = "f1",
p1geno = 4,
p2geno = 5)
-Once we have their genotypes, we can simulate their read-counts using rflexdog
. Let’s suppose that there is a moderate level of allelic bias (0.7) and a small level of overdispersion (0.005). Generally, in the real data that I’ve seen, the bias will range between 0.5 and 2 and the overdispersion will range between 0 and 0.02, with only a few extremely overdispersed SNPs above 0.02.
+Once we have their genotypes, we can simulate their read-counts using
+rflexdog
. Let’s suppose that there is a moderate level of
+allelic bias (0.7) and a small level of overdispersion (0.005).
+Generally, in the real data that I’ve seen, the bias will range between
+0.5 and 2 and the overdispersion will range between 0 and 0.02, with
+only a few extremely overdispersed SNPs above 0.02.
refvec <- rflexdog(sizevec = sizevec,
geno = true_geno,
@@ -167,7 +183,8 @@ F1 Population#> Keeping old fit.
#>
#> Done!
-flexdog
gives us reasonable genotyping, and it accurately estimates the proportion of individuals mis-genotyped.
+flexdog
gives us reasonable genotyping, and it
+accurately estimates the proportion of individuals mis-genotyped.
plot(fout)
@@ -184,7 +201,8 @@ F1 Population
HWE Population
-Now run the same simulations assuming the individuals are in a Hardy-Weinberg population with an allele frequency of 0.75.
+Now run the same simulations assuming the individuals are in a
+Hardy-Weinberg population with an allele frequency of 0.75.
true_geno <- rgeno(n = nind,
ploidy = ploidy,
@@ -246,7 +264,9 @@ HWE Population
References
-Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco Garcia, and Matthew Stephens. 2018. “Genotyping Polyploids from Messy Sequencing Data.” Genetics 210 (3). Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468.
+Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco
+Garcia, and Matthew Stephens. 2018. “Genotyping Polyploids from Messy
+Sequencing Data.” Genetics 210 (3). Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468.
diff --git a/docs/articles/smells_like_updog.html b/docs/articles/smells_like_updog.html
index 1706681..bf51035 100644
--- a/docs/articles/smells_like_updog.html
+++ b/docs/articles/smells_like_updog.html
@@ -78,10 +78,12 @@
-
+
+
Example Use of Updog
- David Gerard
+ David
+Gerard
Source: vignettes/smells_like_updog.Rmd
@@ -93,11 +95,23 @@ David Gerard
What’s Updog?
-Updog is a package containing empirical Bayes approaches to genotype individuals (particularly polyploids) from next generation sequencing (NGS) data. We had in mind NGS data that results from a reduced representation library, such as “genotyping-by-sequencing” (GBS) (Elshire et al., 2011) or “restriction site-associated DNA sequencing” (RAD-seq) (Baird et al., 2008).
-Updog wields the power of hierarchical modeling to account for some key features of NGS data overlooked in most other analyses, particularly allelic bias and overdispersion. Updog will also automatically account for sequencing errors.
-To efficiently account for these features, updog needs to know the distribution of the individual genotypes in the population. The function flexdog
can accurately estimate this distribution under a wide variety of situations.
+Updog is a package containing empirical Bayes approaches to genotype
+individuals (particularly polyploids) from next generation sequencing
+(NGS) data. We had in mind NGS data that results from a reduced
+representation library, such as “genotyping-by-sequencing” (GBS)
+(Elshire et al., 2011) or “restriction site-associated DNA sequencing”
+(RAD-seq) (Baird et al., 2008).
+Updog wields the power of hierarchical modeling to account for some
+key features of NGS data overlooked in most other analyses, particularly
+allelic bias and overdispersion. Updog will also automatically account
+for sequencing errors.
+To efficiently account for these features, updog needs to know the
+distribution of the individual genotypes in the population. The function
+flexdog
can accurately estimate this distribution under a
+wide variety of situations.
You can read more about the updog method in Gerard et al. (2018).
-In this vignette, we will go through one example of fitting flexdog
on an S1 population of individuals.
+In this vignette, we will go through one example of fitting
+flexdog
on an S1 population of individuals.
Example from an S1 Population
@@ -105,7 +119,12 @@ Example from an S1 Population
Fit updog
-Load updog
and the snpdat
dataset. The data frame snpdat
contains three example SNPs (single nucleotide polymorphisms) from the study of Shirasawa et al. (2017). The individuals in this dataset resulted from a single generation of selfing (an S1 population). You can read more about it by typing ?snpdat
.
+Load updog
and the snpdat
dataset. The data
+frame snpdat
contains three example SNPs (single nucleotide
+polymorphisms) from the study of Shirasawa et al. (2017). The
+individuals in this dataset resulted from a single generation of selfing
+(an S1 population). You can read more about it by typing
+?snpdat
.
set.seed(1)
library(updog)
@@ -123,18 +142,24 @@ Fit updog#> 4 157 184 Xushu18S1-003
#> 5 175 215 Xushu18S1-004
#> 6 283 283 Xushu18S1-005
-We will separate the counts between the children and the parent (the first individual). Note that you do not need the parental counts to fit updog, but they can help improve estimates of the parameters in the updog model.
+We will separate the counts between the children and the parent (the
+first individual). Note that you do not need the parental
+counts to fit updog, but they can help improve estimates of the
+parameters in the updog model.
pref <- smalldat$counts[1]
psize <- smalldat$size[1]
oref <- smalldat$counts[-1]
osize <- smalldat$size[-1]
ploidy <- 6 # sweet potatoes are hexaploid
-We can first use plot_geno
to visualize the raw data.
+We can first use plot_geno
to visualize the raw
+data.
plot_geno(refvec = oref, sizevec = osize, ploidy = ploidy)
-Now we use the flexdog
function to fit the model. We use model = "s1"
because the individuals resulted from one generation of selfing of the same parent.
+Now we use the flexdog
function to fit the model. We use
+model = "s1"
because the individuals resulted from one
+generation of selfing of the same parent.
uout <- flexdog(refvec = oref,
sizevec = osize,
@@ -172,7 +197,15 @@ Fit updog
Analyze Output
-We use plot.flexdog
to visualize the fit. Points are color coded according to the genotype with the highest posterior probability. For example, a genotype of “4” represents four copies of the reference allele and two copies of the alternative allele (AAAAaa). The level of transparency is proportional to the maximum posterior probability. This is equivalent to the posterior probability that this genotype estimate is correct. The lines represent the mean counts at a given genotype. The “+” symbol with a black dot is the location of the parent.
+We use plot.flexdog
to visualize the fit. Points are
+color coded according to the genotype with the highest posterior
+probability. For example, a genotype of “4” represents four copies of
+the reference allele and two copies of the alternative allele (AAAAaa).
+The level of transparency is proportional to the maximum posterior
+probability. This is equivalent to the posterior probability that this
+genotype estimate is correct. The lines represent the mean counts at a
+given genotype. The “+” symbol with a black dot is the location of the
+parent.
plot(uout)
@@ -180,22 +213,45 @@ Analyze Output
Filtering SNPs
-For downstream analyses, you might want to filter out poorly behaved SNPs. These SNPs might be poorly behaved for a variety of reasons (they might not be real SNPs, it might be much more difficult to map one allele to the correct location relative to the other allele, etc.). Updog gives you some measures to filter out these SNPs.
-The most intuitive measure would be the (posterior) proportion of individuals mis-genotyped:
+For downstream analyses, you might want to filter out poorly behaved
+SNPs. These SNPs might be poorly behaved for a variety of reasons (they
+might not be real SNPs, it might be much more difficult to map one
+allele to the correct location relative to the other allele, etc.). Updog
+gives you some measures to filter out these SNPs.
+The most intuitive measure would be the (posterior) proportion of
+individuals mis-genotyped:
uout$prop_mis
#> [1] 0.04216399
-For this SNP, we expect about 4.22 percent of the individuals to be mis-genotyped. The specific cutoff you use is context and data dependent. But as a starting point, you could try a loose cutoff by keeping SNPs only if they have a prop_mis
of less than 0.2.
-From our simulation studies, we also generally get rid of SNPs with overdispersion parameters greater than 0.05 or SNPs with bias parameters either less than 0.5 or greater than 2. However, if you have higher or lower read depths than what we looked at in our simulations, you should adjust these levels accordingly.
+For this SNP, we expect about 4.22 percent of the individuals to be
+mis-genotyped. The specific cutoff you use is context and data
+dependent. But as a starting point, you could try a loose cutoff by
+keeping SNPs only if they have a prop_mis
of less than
+0.2.
+From our simulation studies, we also generally get rid of SNPs with
+overdispersion parameters greater than 0.05 or SNPs with bias parameters
+either less than 0.5 or greater than 2. However, if you have higher or
+lower read depths than what we looked at in our simulations, you should
+adjust these levels accordingly.
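The starting cutoffs above can be combined into one logical filter. A minimal base-R sketch on a made-up table of per-SNP summaries follows; the data frame `snp_toy`, its values, and its column names are hypothetical, and the thresholds are only the loose defaults mentioned in the text, which should be adjusted to your read depth.

```r
# Hypothetical per-SNP summaries (values invented for illustration)
snp_toy <- data.frame(
  snp      = c("snp1", "snp2", "snp3", "snp4"),
  prop_mis = c(0.04, 0.25, 0.10, 0.02),
  od       = c(0.004, 0.010, 0.080, 0.001),
  bias     = c(0.9, 1.1, 1.0, 2.5)
)

# Keep a SNP only if it passes every cutoff:
#   prop_mis below 0.2, overdispersion below 0.05, bias between 0.5 and 2
keep <- with(snp_toy,
             prop_mis < 0.2 & od < 0.05 & bias > 0.5 & bias < 2)
snp_kept <- snp_toy[keep, ]
```

In this toy table only snp1 survives: snp2 fails on prop_mis, snp3 on overdispersion, and snp4 on bias.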
References
-Baird, Paul D. AND Atwood, Nathan A. AND Etter. 2008. “Rapid SNP Discovery and Genetic Mapping Using Sequenced RAD Markers.” PLOS ONE 3 (10). Public Library of Science: 1–7. https://doi.org/10.1371/journal.pone.0003376.
-Elshire, Jeffrey C. AND Sun, Robert J. AND Glaubitz. 2011. “A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species.” PLOS ONE 6 (5). Public Library of Science: 1–10. https://doi.org/10.1371/journal.pone.0019379.
-Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco Garcia, and Matthew Stephens. 2018. “Genotyping Polyploids from Messy Sequencing Data.” Genetics 210 (3). Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468.
-Shirasawa, Kenta, Masaru Tanaka, Yasuhiro Takahata, Daifu Ma, Qinghe Cao, Qingchang Liu, Hong Zhai, et al. 2017. “A High-Density SNP Genetic Map Consisting of a Complete Set of Homologous Groups in Autohexaploid Sweetpotato (Ipomoea batatas).” Scientific Reports 7. Nature Publishing Group. https://doi.org/10.1038/srep44207.
+Baird, Paul D. AND Atwood, Nathan A. AND Etter. 2008. “Rapid SNP
+Discovery and Genetic Mapping Using Sequenced RAD Markers.” PLOS
+ONE 3 (10). Public Library of Science: 1–7. https://doi.org/10.1371/journal.pone.0003376.
+Elshire, Jeffrey C. AND Sun, Robert J. AND Glaubitz. 2011. “A Robust,
+Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity
+Species.” PLOS ONE 6 (5). Public Library of Science: 1–10. https://doi.org/10.1371/journal.pone.0019379.
+Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco
+Garcia, and Matthew Stephens. 2018. “Genotyping Polyploids from Messy
+Sequencing Data.” Genetics 210 (3). Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468.
+Shirasawa, Kenta, Masaru Tanaka, Yasuhiro Takahata, Daifu Ma, Qinghe
+Cao, Qingchang Liu, Hong Zhai, et al. 2017. “A High-Density SNP Genetic
+Map Consisting of a Complete Set of Homologous Groups in Autohexaploid
+Sweetpotato (Ipomoea batatas).” Scientific Reports 7.
+Nature Publishing Group. https://doi.org/10.1038/srep44207.
diff --git a/docs/index.html b/docs/index.html
index 18075c9..9d1bb32 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -148,36 +148,40 @@ InstallationHow to Cite
Please cite
-Gerard, D., Ferrão, L. F. V., Garcia, A. A. F., & Stephens, M. (2018). Genotyping Polyploids from Messy Sequencing Data. Genetics, 210(3), 789-807. doi: 10.1534/genetics.118.301468.
+
+Gerard, D., Ferrão, L. F. V., Garcia, A. A. F., & Stephens, M. (2018). Genotyping Polyploids from Messy Sequencing Data. Genetics, 210(3), 789-807. doi: 10.1534/genetics.118.301468.
+
Or, using BibTeX:
-
- @article {gerard2018genotyping,
- author = {Gerard, David and Ferr{\~a}o, Lu{\'i}s Felipe Ventorim and Garcia, Antonio Augusto Franco and Stephens, Matthew},
- title = {Genotyping Polyploids from Messy Sequencing Data},
- volume = {210},
- number = {3},
- pages = {789--807},
- year = {2018},
- doi = {10.1534/genetics.118.301468},
- publisher = {Genetics},
- issn = {0016-6731},
- URL = {https://doi.org/10.1534/genetics.118.301468},
- journal = {Genetics} }
+@article {gerard2018genotyping,
+ author = {Gerard, David and Ferr{\~a}o, Lu{\'i}s Felipe Ventorim and Garcia, Antonio Augusto Franco and Stephens, Matthew},
+ title = {Genotyping Polyploids from Messy Sequencing Data},
+ volume = {210},
+ number = {3},
+ pages = {789--807},
+ year = {2018},
+ doi = {10.1534/genetics.118.301468},
+ publisher = {Genetics},
+ issn = {0016-6731},
+ URL = {https://doi.org/10.1534/genetics.118.301468},
+ journal = {Genetics}
+}
If you are using the proportional normal prior class (model = "norm"
), which is also the default prior, then please also cite:
-Gerard D, Ferrão L (2020). “Priors for Genotyping Polyploids.” Bioinformatics, 36(6), 1795-1800. ISSN 1367-4803, doi: 10.1093/bioinformatics/btz852.
+
+Gerard D, Ferrão L (2020). “Priors for Genotyping Polyploids.” Bioinformatics, 36(6), 1795-1800. ISSN 1367-4803, doi: 10.1093/bioinformatics/btz852.
+
Or, using BibTeX:
-
- @article{gerard2020priors,
- title = {Priors for Genotyping Polyploids},
- year = {2020},
- journal = {Bioinformatics},
- publisher = {Oxford University Press},
- volume = {36},
- number = {6},
- pages = {1795--1800},
- issn = {1367-4803},
- doi = {10.1093/bioinformatics/btz852},
- author = {David Gerard and Lu{\'i}s Felipe Ventorim Ferr{\~a}o}, }
+@article{gerard2020priors,
+ title = {Priors for Genotyping Polyploids},
+ year = {2020},
+ journal = {Bioinformatics},
+ publisher = {Oxford University Press},
+ volume = {36},
+ number = {6},
+ pages = {1795--1800},
+ issn = {1367-4803},
+ doi = {10.1093/bioinformatics/btz852},
+ author = {David Gerard and Lu{\'i}s Felipe Ventorim Ferr{\~a}o},
+ }
Code of Conduct
diff --git a/docs/news/index.html b/docs/news/index.html
index 203455b..468cc9f 100644
--- a/docs/news/index.html
+++ b/docs/news/index.html
@@ -53,6 +53,10 @@
Source: NEWS.md
+
+- Bug fix: Use
&&
instead of &
in C++.
+
updog 2.1.2
CRAN release: 2022-01-24
-
Fixed a bug in my use of assertthat::are_equal()
and testthat::expect_equal()
. See 21 Jan 2022 R-devel/NEWS where it states:
-all.equal.numeric()
gains a sanity check on its tolerance
argument - calling all.equal(a, b, c)
for three numeric vectors is a surprisingly common error.
-
+
+all.equal.numeric()
gains a sanity check on its tolerance
argument - calling all.equal(a, b, c)
for three numeric vectors is a surprisingly common error.
+
+
+
updog 2.1.1
CRAN release: 2021-10-25
- Added an upper bound to the sequencing error rate in
flexdog_full()
(and, hence, flexdog()
and multidog()
). This protects against some poor behavior observed in a corner case, specifically F1 populations where the offspring all have the same genotype and are sequenced at moderate to low depth.
@@ -133,9 +141,13 @@ updog 1.0.1Fixes a bug with option model = "s1pp"
in flexdog()
. I was originally not constraining the levels of preferential pairing to be the same in both segregations of the same parent. This is now fixed. But the downside is that model = "s1pp"
is now only supported for ploidy = 4
or ploidy = 6
. This is because the optimization becomes more difficult for larger ploidy levels.
-
I fixed some documentation. Perhaps the biggest error comes from this snippet from the original documentation of flexdog
:
-The value of prop_mis
is a very intuitive measure for the quality of the SNP. prop_mis
is the posterior proportion of individuals mis-genotyped. So if you want only SNPS that accurately genotype, say, 95% of the individuals, you could discard all SNPs with a prop_mis
under 0.95.
+
+The value of prop_mis is a very intuitive measure for the quality of the SNP. prop_mis is the posterior proportion of individuals mis-genotyped. So if you want only SNPS that accurately genotype, say, 95% of the individuals, you could discard all SNPs with a prop_mis under 0.95.
+
This now says
-The value of prop_mis is a very intuitive measure for the quality of the SNP. prop_mis is the posterior proportion of individuals mis-genotyped. So if you want only SNPS that accurately genotype, say, 95% of the individuals, you could discard all SNPs with a prop_mis over 0.05.
+
+The value of prop_mis is a very intuitive measure for the quality of the SNP. prop_mis is the posterior proportion of individuals mis-genotyped. So if you want only SNPS that accurately genotype, say, 95% of the individuals, you could discard all SNPs with a prop_mis over 0.05.
+
I’ve now exported some C++ functions that I think are useful. You can call them in the usual way.
diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml
index 2055ed9..172781c 100644
--- a/docs/pkgdown.yml
+++ b/docs/pkgdown.yml
@@ -1,4 +1,4 @@
-pandoc: 2.9.2.1
+pandoc: 3.1.1
pkgdown: 2.0.7
pkgdown_sha: ~
articles:
@@ -6,5 +6,5 @@ articles:
oracle_calculations: oracle_calculations.html
simulate_ngs: simulate_ngs.html
smells_like_updog: smells_like_updog.html
-last_built: 2023-11-28T16:57Z
+last_built: 2023-11-28T17:46Z
diff --git a/docs/search.json b/docs/search.json
index e5ca42c..7866a3e 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -1 +1 @@
-[{"path":"/CONDUCT.html","id":null,"dir":"","previous_headings":"","what":"Contributor Code of Conduct","title":"Contributor Code of Conduct","text":"contributors maintainers project, pledge respect people contribute reporting issues, posting feature requests, updating documentation, submitting pull requests patches, activities. committed making participation project harassment-free experience everyone, regardless level experience, gender, gender identity expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion. Examples unacceptable behavior participants include use sexual language imagery, derogatory comments personal attacks, trolling, public private harassment, insults, unprofessional conduct. Project maintainers right responsibility remove, edit, reject comments, commits, code, wiki edits, issues, contributions aligned Code Conduct. Project maintainers follow Code Conduct may removed project team. Instances abusive, harassing, otherwise unacceptable behavior may reported opening issue contacting one project maintainers. Code Conduct adapted Contributor Covenant (https:contributor-covenant.org), version 1.0.0, available https://contributor-covenant.org/version/1/0/0/","code":""},{"path":"/articles/multidog.html","id":"abstract","dir":"Articles","previous_headings":"","what":"Abstract","title":"Genotyping Many SNPs with multidog()","text":"multidog() provides support genotyping many SNPs iterating flexdog() SNPs. Support provided parallel computing future package. genotyping method described Gerard et al. (2018) Gerard Ferrão (2020).","code":""},{"path":"/articles/multidog.html","id":"fit-multidog","dir":"Articles","previous_headings":"","what":"Fit multidog()","title":"Genotyping Many SNPs with multidog()","text":"Let’s load updog, future, data Uitdewilligen et al. (2013). uitdewilligen$refmat matrix reference counts uitdewilligen$sizemat matrix total read counts. 
data, rows index individuals columns index loci. insertion multidog() need way around (individuals columns loci rows). transpose matrices. sizemat refmat row column names. names identify loci individuals. want parallel computing, check proper number cores: Now let’s run multidog(): default, parallelization run using nc greater 1. can choose evaluation strategy running future::plan() prior running multidog(), setting nc = NA. particularly useful higher performance computing environments use schedulers, can control evaluation strategy future.batchtools package. example, following run multidog() using forked R processes:","code":"library(future) library(updog) data(\"uitdewilligen\") refmat <- t(uitdewilligen$refmat) sizemat <- t(uitdewilligen$sizemat) ploidy <- uitdewilligen$ploidy setdiff(colnames(sizemat), colnames(refmat)) #> character(0) setdiff(rownames(sizemat), rownames(refmat)) #> character(0) future::availableCores() #> system #> 16 mout <- multidog(refmat = refmat, sizemat = sizemat, ploidy = ploidy, model = \"norm\", nc = 2) #> | *.#,% #> ||| *******/ #> ||||||| (**..#**. */ **/ #> ||||||||| */****************************/*% #> ||| &****..,*.************************/ #> ||| (....,,,*,...****%********/(****** #> ||| ,,****%////,,,,./.****/ #> ||| /**// .*///.... #> ||| .*/*/%# .,/ ., #> ||| , **/ #% .* .. #> ||| ,,,* #> #> Working on it...done! future::plan(future::multisession, workers = nc) future::plan(future::multicore, workers = 2) mout <- multidog(refmat = refmat, sizemat = sizemat, ploidy = ploidy, model = \"norm\", nc = NA) ## Shut down parallel workers future::plan(future::sequential)"},{"path":"/articles/multidog.html","id":"multidog-output","dir":"Articles","previous_headings":"","what":"multidog() Output","title":"Genotyping Many SNPs with multidog()","text":"plot method output multidog(). output multidog contains two data frame. first contains properties SNPs, estimated allele bias estimated sequencing error rate. 
second data frame contains properties individual SNP, estimated genotypes (geno) posterior probability genotyping correctly (maxpostprob). can obtain columns inddf matrix form format_multidog(). filter SNPs based quality metrics (bias, sequencing error rate, overdispersion, etc), can use filter_snp(), uses non-standard evaluation used dplyr::filter(). , can define predicates terms variable names snpdf data frame output mupdog(). keeps rows snpdf inddf predicate SNP evaluates TRUE.","code":"plot(mout, indices = c(1, 5, 100)) #> [[1]] #> #> [[2]] #> #> [[3]] str(mout$snpdf) #> 'data.frame': 100 obs. of 20 variables: #> $ snp : chr \"PotVar0089524\" \"PotVar0052647\" \"PotVar0120897\" \"PotVar0066020\" ... #> $ bias : num 0.519 1.026 0.929 1.221 0.847 ... #> $ seq : num 0.00485 0.00221 0.002 0.0039 0.00206 ... #> $ od : num 0.00304 0.00295 0.00337 0.00275 0.00335 ... #> $ prop_mis: num 0.004926 0.002274 0.000626 0.002718 0.003 ... #> $ num_iter: num 6 3 3 5 7 7 4 8 8 4 ... #> $ llike : num -14.7 -25.3 -10.4 -22.7 -32 ... #> $ ploidy : num 4 4 4 4 4 4 4 4 4 4 ... #> $ model : chr \"norm\" \"norm\" \"norm\" \"norm\" ... #> $ p1ref : num NA NA NA NA NA NA NA NA NA NA ... #> $ p1size : num NA NA NA NA NA NA NA NA NA NA ... #> $ p2ref : num NA NA NA NA NA NA NA NA NA NA ... #> $ p2size : num NA NA NA NA NA NA NA NA NA NA ... #> $ Pr_0 : num 0.000279 0.248211 0.66369 0.015803 0.08409 ... #> $ Pr_1 : num 0.00707 0.45067 0.26892 0.06938 0.20154 ... #> $ Pr_2 : num 0.0745 0.2542 0.0597 0.1931 0.2968 ... #> $ Pr_3 : num 0.32604 0.04452 0.00725 0.34069 0.26844 ... #> $ Pr_4 : num 0.592065 0.002423 0.000482 0.381024 0.149179 ... #> $ mu : num 4.18 1.01 -1 3.75 2.29 ... #> $ sigma : num 1.067 0.925 1.289 1.481 1.433 ... str(mout$inddf) #> 'data.frame': 1000 obs. of 17 variables: #> $ snp : chr \"PotVar0089524\" \"PotVar0089524\" \"PotVar0089524\" \"PotVar0089524\" ... #> $ ind : chr \"P5PEM08\" \"P3PEM05\" \"P2PEM10\" \"P7PEM09\" ... 
#> $ ref : num 122 113 86 80 69 85 130 228 60 211 ... #> $ size : num 142 143 96 80 69 86 130 228 86 212 ... #> $ geno : num 3 3 3 4 4 4 4 4 2 4 ... #> $ postmean : num 3 2.99 3 4 4 ... #> $ maxpostprob: num 1 0.988 1 1 1 ... #> $ Pr_0 : num 3.74e-90 1.03e-78 2.21e-77 1.06e-86 8.21e-79 ... #> $ Pr_1 : num 7.97e-23 3.86e-16 2.61e-20 6.80e-30 1.21e-26 ... #> $ Pr_2 : num 4.94e-06 1.17e-02 3.27e-06 2.82e-14 1.01e-12 ... #> $ Pr_3 : num 1.00 9.88e-01 1.00 6.74e-06 2.75e-05 ... #> $ Pr_4 : num 1.45e-10 1.14e-15 3.56e-06 1.00 1.00 ... #> $ logL_0 : num -201 -176 -172 -190 -172 ... #> $ logL_1 : num -49.6 -35.6 -44 -62.9 -55.4 ... #> $ logL_2 : num -13.27 -6.95 -13.93 -29.29 -25.69 ... #> $ logL_3 : num -2.55 -4 -2.79 -11.49 -10.06 ... #> $ logL_4 : num -25.804 -38.999 -15.935 -0.181 -0.158 ... genomat <- format_multidog(mout, varname = \"geno\") head(genomat) #> P1PEM10 P2PEM05 P2PEM10 P3PEM05 P4PEM01 P4PEM09 P5PEM04 P5PEM08 #> PotVar0089524 4 4 3 3 4 4 4 3 #> PotVar0052647 3 1 0 1 1 2 0 1 #> PotVar0120897 0 0 0 0 0 0 0 1 #> PotVar0066020 3 2 3 4 4 3 1 4 #> PotVar0003381 3 1 2 0 2 3 3 1 #> PotVar0131622 2 4 1 2 2 3 4 3 #> P6PEM11 P7PEM09 #> PotVar0089524 2 4 #> PotVar0052647 1 1 #> PotVar0120897 2 1 #> PotVar0066020 4 2 #> PotVar0003381 4 3 #> PotVar0131622 3 3 dim(mout$snpdf) #> [1] 100 20 dim(mout$inddf) #> [1] 1000 17 mout_cleaned <- filter_snp(mout, prop_mis < 0.05 & bias > exp(-1) & bias < exp(1)) dim(mout_cleaned$snpdf) #> [1] 97 20 dim(mout_cleaned$inddf) #> [1] 970 17"},{"path":"/articles/multidog.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"Genotyping Many SNPs with multidog()","text":"Gerard, David, Luís Felipe Ventorim Ferrão. “Priors genotyping polyploids.” Bioinformatics 36, . 6 (2020): 1795-1800. https://doi.org/10.1093/bioinformatics/btz852. Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco Garcia, Matthew Stephens. 2018. “Genotyping Polyploids Messy Sequencing Data.” Genetics 210 (3). 
Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468. Uitdewilligen, Anne-Marie . D’hoop, Jan G. . M. L. Wolters. 2013. “Next-Generation Sequencing Method Genotyping--Sequencing Highly Heterozygous Autotetraploid Potato.” PLOS ONE 8 (5). Public Library Science: 1–14. https://doi.org/10.1371/journal.pone.0062355.","code":""},{"path":"/articles/oracle_calculations.html","id":"abstract","dir":"Articles","previous_headings":"","what":"Abstract","title":"Oracle Calculations","text":"provide example usage oracle calculations available updog. particularly useful read-depth determination. calculations described detail Gerard et al. (2018).","code":""},{"path":"/articles/oracle_calculations.html","id":"controlling-misclassification-error","dir":"Articles","previous_headings":"","what":"Controlling Misclassification Error","title":"Oracle Calculations","text":"Suppose sample tetraploid individuals derived S1 cross (single generation selfing). Using domain expertise (either previous studies pilot analysis), ’ve determined sequencing technology produce relatively clean data. , sequencing error rate large (say, ~0.001), bias moderate (say, ~0.7 extreme), majority SNPs reasonable levels overdispersion (say, less 0.01). want know deep need sequence. Using oracle_mis, can see deep need sequence worst-case scenario want control (sequencing error rate = 0.001, bias = 0.7, overdispersion = 0.01) order obtain misclassification error rate , say, 0.05. , also need distribution offspring genotypes. can get distribution assuming various parental genotypes using get_q_array function. Typically, error rates larger allele-frequency closer 0.5. ’ll start worst-case scenario assuming parent 2 copies reference allele. genotype distribution offspring looks like: Now, ready iterate read-depth’s reach one error rate less 0.05. Looks like need depth 90 order get misclassification error rate 0.05. 
Note oracle_mis returns best misclassification error rate possible conditions (ploidy = 4, bias = 0.7, seq = 0.001, od = 0.01, pgeno = 2). actual analysis, worse misclassification error rate returned oracle_mis. However, lot individuals sample, act reasonable approximation error rate. general though, sequence little deeper suggested oracle_mis.","code":"bias <- 0.7 od <- 0.01 seq <- 0.001 maxerr <- 0.05 library(updog) ploidy <- 4 pgeno <- 2 gene_dist <- get_q_array(ploidy = ploidy)[pgeno + 1, pgeno + 1, ] library(ggplot2) distdf <- data.frame(x = 0:ploidy, y = 0, yend = gene_dist) ggplot(distdf, mapping = aes(x = x, y = y, xend = x, yend = yend)) + geom_segment(lineend = \"round\", lwd = 2) + theme_bw() + xlab(\"Allele Dosage\") + ylab(\"Probability\") err <- Inf depth <- 0 while(err > maxerr) { depth <- depth + 1 err <- oracle_mis(n = depth, ploidy = ploidy, seq = seq, bias = bias, od = od, dist = gene_dist) } depth #> [1] 90"},{"path":"/articles/oracle_calculations.html","id":"visualizing-the-joint-distribution","dir":"Articles","previous_headings":"","what":"Visualizing the Joint Distribution","title":"Oracle Calculations","text":"Suppose budget sequence depth 30. errors can expect? can use oracle_joint oracle_plot visualize errors can expect. errors mistakes genotypes 2/3 mistakes genotypes 1/2. Even though misclassification error rate pretty high (0.14), correlation oracle estimator true genotype pretty reasonable (0.89). 
can obtain using oracle_cor function.","code":"depth <- 30 jd <- oracle_joint(n = depth, ploidy = ploidy, seq = seq, bias = bias, od = od, dist = gene_dist) oracle_plot(jd) ocorr <- oracle_cor(n = depth, ploidy = ploidy, seq = seq, bias = bias, od = od, dist = gene_dist) ocorr #> [1] 0.8935101"},{"path":"/articles/oracle_calculations.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"Oracle Calculations","text":"Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco Garcia, Matthew Stephens. 2018. “Genotyping Polyploids Messy Sequencing Data.” Genetics 210 (3). Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468.","code":""},{"path":"/articles/simulate_ngs.html","id":"abstract","dir":"Articles","previous_headings":"","what":"Abstract","title":"Simulate Next-Generation Sequencing Data","text":"demonstrate simulate NGS data various genotype distributions, fit data using flexdog. genotyping methods described Gerard et al. (2018).","code":""},{"path":"/articles/simulate_ngs.html","id":"analysis","dir":"Articles","previous_headings":"","what":"Analysis","title":"Simulate Next-Generation Sequencing Data","text":"Let’s suppose 100 hexaploid individuals, varying levels read-depth. can simulate read-counts various genotype distributions, allele biases, overdispersions, sequencing error rates using rgeno rflexdog functions.","code":"set.seed(1) library(updog) nind <- 100 ploidy <- 6 sizevec <- round(stats::runif(n = nind, min = 50, max = 200))"},{"path":"/articles/simulate_ngs.html","id":"f1-population","dir":"Articles","previous_headings":"Analysis","what":"F1 Population","title":"Simulate Next-Generation Sequencing Data","text":"Suppose individuals siblings first parent 4 copies reference allele second parent 5 copies reference allele. following code, using rgeno, simulate individuals’ genotypes. genotypes, can simulate read-counts using rflexdog. 
Let’s suppose moderate level allelic bias (0.7) small level overdispersion (0.005). Generally, real data ’ve seen, bias range 0.5 2 overdispersion range 0 0.02, extremely overdispersed SNPs 0.02. plot data, looks realistic can test flexdog data flexdog gives us reasonable genotyping, accurately estimates proportion individuals mis-genotyped.","code":"true_geno <- rgeno(n = nind, ploidy = ploidy, model = \"f1\", p1geno = 4, p2geno = 5) refvec <- rflexdog(sizevec = sizevec, geno = true_geno, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.005) plot_geno(refvec = refvec, sizevec = sizevec, ploidy = ploidy, bias = 0.7, seq = 0.001, geno = true_geno) fout <- flexdog(refvec = refvec, sizevec = sizevec, ploidy = ploidy, model = \"f1\") #> Fit: 1 of 5 #> Initial Bias: 0.3678794 #> Log-Likelihood: -363.9369 #> Keeping new fit. #> #> Fit: 2 of 5 #> Initial Bias: 0.6065307 #> Log-Likelihood: -363.937 #> Keeping old fit. #> #> Fit: 3 of 5 #> Initial Bias: 1 #> Log-Likelihood: -363.9369 #> Keeping new fit. #> #> Fit: 4 of 5 #> Initial Bias: 1.648721 #> Log-Likelihood: -381.6123 #> Keeping old fit. #> #> Fit: 5 of 5 #> Initial Bias: 2.718282 #> Log-Likelihood: -412.8604 #> Keeping old fit. #> #> Done! 
plot(fout) ## Estimated proportion misgenotyped fout$prop_mis #> [1] 0.07011089 ## Actual proportion misgenotyped mean(fout$geno != true_geno) #> [1] 0.05"},{"path":"/articles/simulate_ngs.html","id":"hwe-population","dir":"Articles","previous_headings":"Analysis","what":"HWE Population","title":"Simulate Next-Generation Sequencing Data","text":"Now run simulations assuming individuals Hardy-Weinberg population allele frequency 0.75.","code":"true_geno <- rgeno(n = nind, ploidy = ploidy, model = \"hw\", allele_freq = 0.75) refvec <- rflexdog(sizevec = sizevec, geno = true_geno, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.005) fout <- flexdog(refvec = refvec, sizevec = sizevec, ploidy = ploidy, model = \"hw\") #> Fit: 1 of 5 #> Initial Bias: 0.3678794 #> Log-Likelihood: -377.9226 #> Keeping new fit. #> #> Fit: 2 of 5 #> Initial Bias: 0.6065307 #> Log-Likelihood: -377.9226 #> Keeping old fit. #> #> Fit: 3 of 5 #> Initial Bias: 1 #> Log-Likelihood: -377.9226 #> Keeping old fit. #> #> Fit: 4 of 5 #> Initial Bias: 1.648721 #> Log-Likelihood: -377.9226 #> Keeping new fit. #> #> Fit: 5 of 5 #> Initial Bias: 2.718282 #> Log-Likelihood: -377.9226 #> Keeping old fit. #> #> Done! plot(fout) ## Estimated proportion misgenotyped fout$prop_mis #> [1] 0.07625987 ## Actual proportion misgenotyped mean(fout$geno != true_geno) #> [1] 0.07 ## Estimated allele frequency close to true allele frequency fout$par$alpha #> [1] 0.7473264"},{"path":"/articles/simulate_ngs.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"Simulate Next-Generation Sequencing Data","text":"Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco Garcia, Matthew Stephens. 2018. “Genotyping Polyploids Messy Sequencing Data.” Genetics 210 (3). Genetics: 789–807. 
https://doi.org/10.1534/genetics.118.301468.","code":""},{"path":"/articles/smells_like_updog.html","id":"whats-updog","dir":"Articles","previous_headings":"","what":"What’s Updog?","title":"Example Use of Updog","text":"Updog package containing empirical Bayes approaches genotype individuals (particularly polyploids) next generation sequencing (NGS) data. mind NGS data results reduced representation library, “genotyping--sequencing” (GBS) (Elshire et al., 2011) “restriction site-associated DNA sequencing” (RAD-seq) (Baird et al., 2008). Updog wields power hierarchical modeling account key features NGS data overlooked analyses, particularly allelic bias overdispersion. Updog also automatically account sequencing errors. efficiently account features, updog needs know distribution individual genotypes population. function flexdog can accurately estimate distribution wide variety situations. can read updog method Gerard et al. (2018). vignette, go one example fitting flexdog S1 population individuals.","code":""},{"path":[]},{"path":"/articles/smells_like_updog.html","id":"fit-updog","dir":"Articles","previous_headings":"Example from an S1 Population","what":"Fit updog","title":"Example Use of Updog","text":"Load updog snpdat dataset. data frame snpdat contains three example SNPs (single nucleotide polymorphisms) study Shirasawa et al. (2017). individuals dataset resulted single generation selfing (S1 population). can read typing ?snpdat. ’ll extract First SNP. separate counts children parent (first individual). Note need parental counts fit updog, can help improve estimates parameters updog model. can first use plot_geno visualize raw data. Now use flexdog function fit model. 
use model = \"s1\" individuals resulted one generation selfing parent.","code":"set.seed(1) library(updog) data(\"snpdat\") smalldat <- snpdat[snpdat$snp == \"SNP1\", c(\"counts\", \"size\", \"id\")] head(smalldat) #> # A tibble: 6 × 3 #> counts size id #> #> 1 298 354 Xushu18 #> 2 187 187 Xushu18S1-001 #> 3 201 201 Xushu18S1-002 #> 4 157 184 Xushu18S1-003 #> 5 175 215 Xushu18S1-004 #> 6 283 283 Xushu18S1-005 pref <- smalldat$counts[1] psize <- smalldat$size[1] oref <- smalldat$counts[-1] osize <- smalldat$size[-1] ploidy <- 6 # sweet potatoes are hexaploid plot_geno(refvec = oref, sizevec = osize, ploidy = ploidy) uout <- flexdog(refvec = oref, sizevec = osize, ploidy = ploidy, model = \"s1\", p1ref = pref, p1size = psize) #> Fit: 1 of 5 #> Initial Bias: 0.3678794 #> Log-Likelihood: -592.9506 #> Keeping new fit. #> #> Fit: 2 of 5 #> Initial Bias: 0.6065307 #> Log-Likelihood: -592.9506 #> Keeping old fit. #> #> Fit: 3 of 5 #> Initial Bias: 1 #> Log-Likelihood: -538.1967 #> Keeping new fit. #> #> Fit: 4 of 5 #> Initial Bias: 1.648721 #> Log-Likelihood: -538.1963 #> Keeping new fit. #> #> Fit: 5 of 5 #> Initial Bias: 2.718282 #> Log-Likelihood: -538.1963 #> Keeping old fit. #> #> Done!"},{"path":"/articles/smells_like_updog.html","id":"analyze-output","dir":"Articles","previous_headings":"Example from an S1 Population","what":"Analyze Output","title":"Example Use of Updog","text":"use plot.flexdog visualize fit. Points color coded according genotype highest posterior probability. example, genotype “4” represents four copies reference allele two copies alternative allele (AAAAaa). level transparency proportional maximum posterior probability. equivalent posterior probability genotype estimate correct. lines represent mean counts given genotype. 
“+” symbol black dot location parent.","code":"plot(uout)"},{"path":"/articles/smells_like_updog.html","id":"filtering-snps","dir":"Articles","previous_headings":"Example from an S1 Population","what":"Filtering SNPs","title":"Example Use of Updog","text":"downstream analyses, might want filter poorly behaved SNPs. SNPs might poorly behaved variety reasons (might real SNPs, might much difficult map one allele correct location relative allele, etc). Updog gives measures filter SNPs. intuitive measure (posterior) proportion individuals mis-genotyped: SNP, expect 4.22 percent individuals mis-genotyped. specific cutoff use context data dependent. starting point, try loose cutoff keeping SNPs prop_mis less 0.2. simulation studies, also generally get rid SNPs overdispersion parameters greater 0.05 SNPs bias parameters either less 0.5 greater 2. However, higher lower read depths looked simulations, adjust levels accordingly.","code":"uout$prop_mis #> [1] 0.04216399"},{"path":"/articles/smells_like_updog.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"Example Use of Updog","text":"Baird, Paul D. Atwood, Nathan . Etter. 2008. “Rapid SNP Discovery Genetic Mapping Using Sequenced RAD Markers.” PLOS ONE 3 (10). Public Library Science: 1–7. https://doi.org/10.1371/journal.pone.0003376. Elshire, Jeffrey C. Sun, Robert J. Glaubitz. 2011. “Robust, Simple Genotyping--Sequencing (GBS) Approach High Diversity Species.” PLOS ONE 6 (5). Public Library Science: 1–10. https://doi.org/10.1371/journal.pone.0019379. Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco Garcia, Matthew Stephens. 2018. “Genotyping Polyploids Messy Sequencing Data.” Genetics 210 (3). Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468. Shirasawa, Kenta, Masaru Tanaka, Yasuhiro Takahata, Daifu Ma, Qinghe Cao, Qingchang Liu, Hong Zhai, et al. 2017. 
“High-Density SNP Genetic Map Consisting Complete Set Homologous Groups Autohexaploid Sweetpotato (Ipomoea batatas).” Scientific Reports 7. Nature Publishing Group. https://doi.org/10.1038/srep44207.","code":""},{"path":"/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"David Gerard. Author, maintainer.","code":""},{"path":"/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Gerard D, Ferr~ao L, Garcia , Stephens M (2018). “Genotyping Polyploids Messy Sequencing Data.” Genetics, 210(3), 789–807. ISSN 0016-6731, doi:10.1534/genetics.118.301468. Gerard D, Ferr~ao L (2020). “Priors Genotyping Polyploids.” Bioinformatics, 36(6), 1795–1800. ISSN 1367-4803, doi:10.1093/bioinformatics/btz852.","code":"@Article{, title = {Genotyping Polyploids from Messy Sequencing Data}, year = {2018}, journal = {Genetics}, publisher = {Genetics}, volume = {210}, number = {3}, pages = {789--807}, issn = {0016-6731}, doi = {10.1534/genetics.118.301468}, author = {David Gerard and Lu{\\'i}s Felipe Ventorim Ferr{\\~a}o and Antonio Augusto Franco Garcia and Matthew Stephens}, } @Article{, title = {Priors for Genotyping Polyploids}, year = {2020}, journal = {Bioinformatics}, publisher = {Oxford University Press}, volume = {36}, number = {6}, pages = {1795--1800}, issn = {1367-4803}, doi = {10.1093/bioinformatics/btz852}, author = {David Gerard and Lu{\\'i}s Felipe Ventorim Ferr{\\~a}o}, }"},{"path":"/index.html","id":"updog-","dir":"","previous_headings":"","what":"Flexible Genotyping for Polyploids","title":"Flexible Genotyping for Polyploids","text":"Updog provides suite methods genotyping polyploids next-generation sequencing (NGS) data. accounting many common features NGS data: allele bias, overdispersion, sequencing error. named updog “Using Parental Data Offspring Genotyping” originally developed method full-sib populations, works now general populations. 
method described detail Gerard et. al. (2018) . Additional details concerning prior specification described Gerard Ferrão (2020) . main functions flexdog() multidog(), provide many options distribution genotypes sample. novel genotype distribution included class proportional normal distributions (model = \"norm\"). default prior distribution robust varying genotype distributions, feel free use specialized priors information data. Also provided : filter_snp(): filter SNPs based output multidog(). format_multidog(): format output multidog() terms multidimensional array. Plot methods. flexdog() multidog() plot methods. See help files plot.flexdog() plot.multidog() details. Functions simulate genotypes (rgeno()) read-counts (rflexdog()). support models available flexdog(). Functions evaluate oracle genotyping performance: oracle_joint(), oracle_mis(), oracle_mis_vec(), oracle_cor(). mean “oracle” sense assume entire data generation process known (.e. genotype distribution, sequencing error rate, allele bias, overdispersion known). good approximations lot individuals (necessarily large read-depth). original updog package now named updogAlpha may found . See also ebg, fitPoly, polyRAD. best “competitor” probably fitPoly, though polyRAD nice ideas utilizing population structure linkage disequilibrium. 
See NEWS latest updates package.","code":""},{"path":"/index.html","id":"vignettes","dir":"","previous_headings":"","what":"Vignettes","title":"Flexible Genotyping for Polyploids","text":"’ve included many vignettes updog, can access online .","code":""},{"path":"/index.html","id":"bug-reports","dir":"","previous_headings":"","what":"Bug Reports","title":"Flexible Genotyping for Polyploids","text":"find bug want enhancement, please submit issue .","code":""},{"path":"/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Flexible Genotyping for Polyploids","text":"can install updog CRAN usual way: can install current (unstable) version updog GitHub :","code":"install.packages(\"updog\") # install.packages(\"devtools\") devtools::install_github(\"dcgerard/updog\")"},{"path":"/index.html","id":"how-to-cite","dir":"","previous_headings":"","what":"How to Cite","title":"Flexible Genotyping for Polyploids","text":"Please cite Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi: 10.1534/genetics.118.301468. , using BibTex: using proportional normal prior class (model = \"norm\"), also default prior, please also cite: Gerard D, Ferrão L (2020). “Priors Genotyping Polyploids.” Bioinformatics, 36(6), 1795-1800. ISSN 1367-4803, doi: 10.1093/bioinformatics/btz852. 
, using BibTex:","code":"@article {gerard2018genotyping, author = {Gerard, David and Ferr{\\~a}o, Lu{\\'i}s Felipe Ventorim and Garcia, Antonio Augusto Franco and Stephens, Matthew}, title = {Genotyping Polyploids from Messy Sequencing Data}, volume = {210}, number = {3}, pages = {789--807}, year = {2018}, doi = {10.1534/genetics.118.301468}, publisher = {Genetics}, issn = {0016-6731}, URL = {https://doi.org/10.1534/genetics.118.301468}, journal = {Genetics} } @article{gerard2020priors, title = {Priors for Genotyping Polyploids}, year = {2020}, journal = {Bioinformatics}, publisher = {Oxford University Press}, volume = {36}, number = {6}, pages = {1795--1800}, issn = {1367-4803}, doi = {10.1093/bioinformatics/btz852}, author = {David Gerard and Lu{\\'i}s Felipe Ventorim Ferr{\\~a}o}, }"},{"path":"/index.html","id":"code-of-conduct","dir":"","previous_headings":"","what":"Code of Conduct","title":"Flexible Genotyping for Polyploids","text":"Please note project released Contributor Code Conduct. participating project agree abide terms.","code":""},{"path":"/reference/betabinom.html","id":null,"dir":"Reference","previous_headings":"","what":"The Beta-Binomial Distribution — dbetabinom","title":"The Beta-Binomial Distribution — dbetabinom","text":"Density, distribution function, quantile function random generation beta-binomial distribution parameterized mean mu overdispersion parameter rho rather typical shape parameters.","code":""},{"path":"/reference/betabinom.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"The Beta-Binomial Distribution — dbetabinom","text":"","code":"dbetabinom(x, size, mu, rho, log) pbetabinom(q, size, mu, rho, log_p) qbetabinom(p, size, mu, rho) rbetabinom(n, size, mu, rho)"},{"path":"/reference/betabinom.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"The Beta-Binomial Distribution — dbetabinom","text":"x, q vector quantiles. size vector sizes. 
mu Either scalar mean observation, vector means observation, thus length x size. must 0 1. rho Either scalar overdispersion parameter observation, vector overdispersion parameters observation, thus length x size. must 0 1. log, log_p logical vector either length 1 length x size. determines whether return log probabilities observations (case length 1) observation (case length x size). p vector probabilities. n number observations.","code":""},{"path":"/reference/betabinom.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"The Beta-Binomial Distribution — dbetabinom","text":"Either random sample (rbetabinom), density (dbetabinom), tail probability (pbetabinom), quantile (qbetabinom) beta-binomial distribution.","code":""},{"path":"/reference/betabinom.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"The Beta-Binomial Distribution — dbetabinom","text":"Let \\(\\mu\\) \\(\\rho\\) mean overdispersion parameters. Let \\(\\alpha\\) \\(\\beta\\) usual shape parameters beta distribution. relation $$\\mu = \\alpha/(\\alpha + \\beta),$$ $$\\rho = 1/(1 + \\alpha + \\beta).$$ necessarily means $$\\alpha = \\mu (1 - \\rho)/\\rho,$$ $$\\beta = (1 - \\mu) (1 - \\rho)/\\rho.$$","code":""},{"path":"/reference/betabinom.html","id":"functions","dir":"Reference","previous_headings":"","what":"Functions","title":"The Beta-Binomial Distribution — dbetabinom","text":"dbetabinom(): Density function. pbetabinom(): Distribution function. qbetabinom(): Quantile function. 
rbetabinom(): Random generation.","code":""},{"path":"/reference/betabinom.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"The Beta-Binomial Distribution — dbetabinom","text":"David Gerard","code":""},{"path":"/reference/betabinom.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"The Beta-Binomial Distribution — dbetabinom","text":"","code":"x <- rbetabinom(n = 10, size = 10, mu = 0.1, rho = 0.01) dbetabinom(x = 1, size = 10, mu = 0.1, rho = 0.01, log = FALSE) #> [1] 0.3689335 pbetabinom(q = 1, size = 10, mu = 0.1, rho = 0.01, log_p = FALSE) #> [1] 0.7345131 qbetabinom(p = 0.6, size = 10, mu = 0.1, rho = 0.01) #> [1] 1"},{"path":"/reference/filter_snp.html","id":null,"dir":"Reference","previous_headings":"","what":"Filter SNPs based on the output of multidog(). — filter_snp","title":"Filter SNPs based on the output of multidog(). — filter_snp","text":"Filter based provided logical predicates terms variable names x$snpdf. function filters x$snpdf x$inddf.","code":""},{"path":"/reference/filter_snp.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Filter SNPs based on the output of multidog(). — filter_snp","text":"","code":"filter_snp(x, expr)"},{"path":"/reference/filter_snp.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Filter SNPs based on the output of multidog(). — filter_snp","text":"x output multidog. expr Logical predicate expression defined terms variables x$snpdf. SNPs condition evaluates TRUE kept.","code":""},{"path":[]},{"path":"/reference/filter_snp.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Filter SNPs based on the output of multidog(). 
— filter_snp","text":"David Gerard","code":""},{"path":"/reference/filter_snp.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Filter SNPs based on the output of multidog(). — filter_snp","text":"","code":"if (FALSE) { data(\"uitdewilligen\") mout <- multidog(refmat = t(uitdewilligen$refmat), sizemat = t(uitdewilligen$sizemat), ploidy = uitdewilligen$ploidy, nc = 2) ## The following filters are for educational purposes only and should ## not be taken as a default filter: mout2 <- filter_snp(mout, bias < 0.8 & od < 0.003) }"},{"path":"/reference/flexdog.html","id":null,"dir":"Reference","previous_headings":"","what":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","text":"Genotype polyploid individuals next generation sequencing (NGS) data assuming genotype distribution one several forms. flexdog accounting allele bias, overdispersion, sequencing error. method described detail Gerard et. al. (2018) Gerard Ferrão (2020). See multidog() running flexdog multiple SNPs parallel.","code":""},{"path":"/reference/flexdog.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","text":"","code":"flexdog( refvec, sizevec, ploidy, model = c(\"norm\", \"hw\", \"bb\", \"s1\", \"s1pp\", \"f1\", \"f1pp\", \"flex\", \"uniform\", \"custom\"), p1ref = NULL, p1size = NULL, p2ref = NULL, p2size = NULL, snpname = NULL, bias_init = exp(c(-1, -0.5, 0, 0.5, 1)), verbose = TRUE, prior_vec = NULL, ... )"},{"path":"/reference/flexdog.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","text":"refvec vector counts reads reference allele. sizevec vector total counts. ploidy ploidy species. 
Assumed individual. model form prior (genotype distribution) take? See Details possible values. p1ref reference counts first parent model = \"f1\" model = \"f1pp\", parent model = \"s1\" model = \"s1pp\". p1size total counts first parent model = \"f1\" model = \"f1pp\", parent model = \"s1\" model = \"s1pp\". p2ref reference counts second parent model = \"f1\" model = \"f1pp\". p2size total counts second parent model = \"f1\" model = \"f1pp\". snpname string. name SNP consideration. just returned input list reference. bias_init vector initial values bias parameter multiple runs flexdog_full(). verbose output (TRUE) less (FALSE)? prior_vec pre-specified genotype distribution. used model = \"custom\" must otherwise NULL. specified, vector length ploidy + 1 non-negative elements sum 1. ... Additional parameters pass flexdog_full().","code":""},{"path":"/reference/flexdog.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","text":"object class flexdog, consists list following elements: bias estimated bias parameter. seq estimated sequencing error rate. od estimated overdispersion parameter. num_iter number EM iterations ran. wary equals itermax. llike maximum marginal log-likelihood. postmat matrix posterior probabilities genotype individual. rows index individuals columns index allele dosage. genologlike matrix genotype log-likelihoods genotype individual. rows index individuals columns index allele dosage. gene_dist estimated genotype distribution. ith element proportion individuals genotype -1. par list final estimates parameters genotype distribution. elements included par depends value model: model = \"norm\": mu: normal mean. sigma: normal standard deviation (variance). model = \"hw\": alpha: major allele frequency. model = \"bb\": alpha: major allele frequency. tau: overdispersion parameter. See description rho Details betabinom(). 
model = \"s1\": pgeno: allele dosage parent. alpha: mixture proportion discrete uniform (included fixed small value mostly numerical stability reasons). See description fs1_alpha flexdog_full(). model = \"f1\": p1geno: allele dosage first parent. p2geno: allele dosage second parent. alpha: mixture proportion discrete uniform (included fixed small value mostly numerical stability reasons). See description fs1_alpha flexdog_full(). model = \"s1pp\": ell1: estimated dosage parent. tau1: estimated double reduction parameter parent. Available ell1 1, 2, 3. Identified ell1 1 3. gamma1: estimated preferential pairing parameter. Available ell1 2. However, returned identified form. alpha: mixture proportion discrete uniform (included fixed small value mostly numerical stability reasons). See description fs1_alpha flexdog_full(). model = \"f1pp\": ell1: estimated dosage parent 1. ell2: estimated dosage parent 2. tau1: estimated double reduction parameter parent 1. Available ell1 1, 2, 3. Identified ell1 1 3. tau2: estimated double reduction parameter parent 2. Available ell2 1, 2, 3. Identified ell2 1 3. gamma1: estimated preferential pairing parameter parent 1. Available ell1 2. However, returned identified form. gamma2: estimated preferential pairing parameter parent 2. Available ell2 2. However, returned identified form. alpha: mixture proportion discrete uniform (included fixed small value mostly numerical stability reasons). See description fs1_alpha flexdog_full(). model = \"flex\": par empty list. model = \"uniform\": par empty list. model = \"custom\": par empty list. geno posterior mode genotype. genotype estimates. maxpostprob maximum posterior probability. equivalent posterior probability correctly genotyping individual. postmean posterior mean genotype. downstream association studies, might want consider using estimates. input$refvec value refvec provided user. input$sizevec value sizevec provided user. input$ploidy value ploidy provided user. 
input$model value model provided user. input$p1ref value p1ref provided user. input$p1size value p1size provided user. input$p2ref value p2ref provided user. input$p2size value p2size provided user. input$snpname value snpname provided user. prop_mis posterior proportion individuals genotyped incorrectly.","code":""},{"path":"/reference/flexdog.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","text":"Possible values genotype distribution (values model) : \"norm\" distribution whose genotype frequencies proportional density value normal mean standard deviation. Unlike \"bb\" \"hw\" options, allow distributions less dispersed binomial. seems robust violations modeling assumptions, default. prior class developed Gerard Ferrão (2020). \"hw\" binomial distribution results assuming population Hardy-Weinberg equilibrium (HWE). actually pretty well even minor moderate deviations HWE. Though perform well `\"norm\"` option severe deviations HWE. \"bb\" beta-binomial distribution. overdispersed version \"hw\" can derived special case Balding-Nichols model. \"s1\" prior assumes individuals full-siblings resulting one generation selfing. .e. one parent. model assumes particular type meiotic behavior: polysomic inheritance bivalent, non-preferential pairing. \"f1\" prior assumes individuals full-siblings resulting one generation bi-parental cross. model assumes particular type meiotic behavior: polysomic inheritance bivalent, non-preferential pairing. \"f1pp\" prior allows double reduction preferential pairing F1 population tretraploids. \"s1pp\" prior allows double reduction preferential pairing S1 population tretraploids. \"flex\" Generically categorical distribution. Theoretically, works well lot individuals. practice, seems much less robust violations modeling assumptions. \"uniform\" discrete uniform distribution. never used practice. 
\"custom\" pre-specified prior distribution. specify using prior_vec argument. almost never use option practice. might think good default model = \"uniform\" somehow \"uninformative prior.\" informative tends work horribly practice. intuition estimate allele bias sequencing error rates estimated genotypes approximately uniform (since assuming approximately uniform). usually result unintuitive genotyping since populations uniform genotype distribution. include option completeness. Please use . value prop_mis intuitive measure quality SNP. prop_mis posterior proportion individuals mis-genotyped. want SNPS accurately genotype, say, 95% individuals, discard SNPs prop_mis 0.05. value maxpostprob intuitive measure quality genotype estimate individual. posterior probability correctly genotyping individual using geno (posterior mode) genotype estimate. want correctly genotype, say, 95% individuals, discard individuals maxpostprob 0.95. However, just going impute missing genotypes later, might consider discarding individuals flexdog's genotype estimates probably accurate naive approaches, imputing using grand mean. datasets examined, allelic bias major issue. However, may fit model assuming allelic bias setting update_bias = FALSE bias_init = 1. Prior using flexdog, read-mapping step, try get rid allelic bias using WASP (doi:10.1101/011221 ). successful removing allelic bias (source read-mapping step), setting update_bias = FALSE bias_init = 1 reasonable. can visually inspect SNPs bias using plot_geno(). flexdog(), like methods, invariant allele label \"reference\" label \"alternative\". , set refvec number alternative read-counts, resulting genotype estimates estimated allele dosage alternative allele.","code":""},{"path":"/reference/flexdog.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . 
F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 . Gerard, David, Luís Felipe Ventorim Ferrão. \"Priors genotyping polyploids.\" Bioinformatics 36, . 6 (2020): 1795-1800. doi:10.1093/bioinformatics/btz852 .","code":""},{"path":[]},{"path":"/reference/flexdog.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","text":"David Gerard","code":""},{"path":"/reference/flexdog.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","text":"","code":"# \\donttest{ ## An S1 population where the first individual ## is the parent. data(\"snpdat\") ploidy <- 6 refvec <- snpdat$counts[snpdat$snp == \"SNP2\"] sizevec <- snpdat$size[snpdat$snp == \"SNP2\"] fout <- flexdog(refvec = refvec[-1], sizevec = sizevec[-1], ploidy = ploidy, model = \"s1\", p1ref = refvec[1], p1size = sizevec[1]) #> Fit: 1 of 5 #> Initial Bias: 0.3678794 #> Log-Likelihood: -557.6433 #> Keeping new fit. #> #> Fit: 2 of 5 #> Initial Bias: 0.6065307 #> Log-Likelihood: -519.2793 #> Keeping new fit. #> #> Fit: 3 of 5 #> Initial Bias: 1 #> Log-Likelihood: -519.2793 #> Keeping old fit. #> #> Fit: 4 of 5 #> Initial Bias: 1.648721 #> Log-Likelihood: -519.2793 #> Keeping new fit. #> #> Fit: 5 of 5 #> Initial Bias: 2.718282 #> Log-Likelihood: -519.2793 #> Keeping new fit. #> #> Done! plot(fout) #> Warning: Removed 1 rows containing missing values (`geom_point()`). # } ## A natural population. We will assume a ## normal prior since there are so few ## individuals. 
data(\"uitdewilligen\") ploidy <- 4 refvec <- uitdewilligen$refmat[, 1] sizevec <- uitdewilligen$sizemat[, 1] fout <- flexdog(refvec = refvec, sizevec = sizevec, ploidy = ploidy, model = \"norm\") #> Fit: 1 of 5 #> Initial Bias: 0.3678794 #> Log-Likelihood: -14.66905 #> Keeping new fit. #> #> Fit: 2 of 5 #> Initial Bias: 0.6065307 #> Log-Likelihood: -14.66905 #> Keeping new fit. #> #> Fit: 3 of 5 #> Initial Bias: 1 #> Log-Likelihood: -15.44144 #> Keeping old fit. #> #> Fit: 4 of 5 #> Initial Bias: 1.648721 #> Log-Likelihood: -15.44141 #> Keeping old fit. #> #> Fit: 5 of 5 #> Initial Bias: 2.718282 #> Log-Likelihood: -15.44141 #> Keeping old fit. #> #> Done! plot(fout)"},{"path":"/reference/flexdog_full.html","id":null,"dir":"Reference","previous_headings":"","what":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog_full","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog_full","text":"Genotype polyploid individuals next generation sequencing (NGS) data assuming genotype distribution one several forms. flexdog_full() accounting allele bias, overdispersion, sequencing error. function options flexdog meant expert users. method described detail Gerard et. al. (2018) Gerard Ferrão (2020).","code":""},{"path":"/reference/flexdog_full.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Flexible genotyping for polyploids from next-generation sequencing data. 
— flexdog_full","text":"","code":"flexdog_full( refvec, sizevec, ploidy, model = c(\"norm\", \"hw\", \"bb\", \"s1\", \"s1pp\", \"f1\", \"f1pp\", \"flex\", \"uniform\", \"custom\"), verbose = TRUE, mean_bias = 0, var_bias = 0.7^2, mean_seq = -4.7, var_seq = 1, mean_od = -5.5, var_od = 0.5^2, seq = 0.005, bias = 1, od = 0.001, update_bias = TRUE, update_seq = TRUE, update_od = TRUE, itermax = 200, tol = 10^-4, fs1_alpha = 10^-3, p1ref = NULL, p1size = NULL, p2ref = NULL, p2size = NULL, snpname = NULL, prior_vec = NULL, seq_upper = 0.05 )"},{"path":"/reference/flexdog_full.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog_full","text":"refvec vector counts reads reference allele. sizevec vector total counts. ploidy ploidy species. Assumed individual. model form prior (genotype distribution) take? See Details possible values. verbose output (TRUE) less (FALSE)? mean_bias prior mean log-bias. var_bias prior variance log-bias. mean_seq prior mean logit sequencing error rate. var_seq prior variance logit sequencing error rate. mean_od prior mean logit overdispersion parameter. var_od prior variance logit overdispersion parameter. seq starting value sequencing error rate. bias starting value bias. od starting value overdispersion parameter. update_bias logical. update bias (TRUE), (FALSE)? update_seq logical. update seq (TRUE), (FALSE)? update_od logical. update od (TRUE), (FALSE)? itermax total number EM iterations run. tol tolerance stopping criterion. EM algorithm stop difference log-likelihoods two consecutive iterations less tol. fs1_alpha value fix mixing proportion uniform component model = \"f1\", model = \"s1\", model = \"f1pp\", model = \"s1pp\". recommend small value 10^-3. p1ref reference counts first parent model = \"f1\" model = \"f1pp\", parent model = \"s1\" model = \"s1pp\". 
p1size total counts first parent model = \"f1\" model = \"f1pp\", parent model = \"s1\" model = \"s1pp\". p2ref reference counts second parent model = \"f1\" model = \"f1pp\". p2size total counts second parent model = \"f1\" model = \"f1pp\". snpname string. name SNP consideration. just returned input list reference. prior_vec pre-specified genotype distribution. used model = \"custom\" must otherwise NULL. specified, vector length ploidy + 1 non-negative elements sum 1. seq_upper upper bound possible sequencing error rate. Default 0.05, adjust prior knowledge sequencing error rate sequencing technology.","code":""},{"path":"/reference/flexdog_full.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog_full","text":"object class flexdog, consists list following elements: bias estimated bias parameter. seq estimated sequencing error rate. od estimated overdispersion parameter. num_iter number EM iterations ran. wary equals itermax. llike maximum marginal log-likelihood. postmat matrix posterior probabilities genotype individual. rows index individuals columns index allele dosage. genologlike matrix genotype log-likelihoods genotype individual. rows index individuals columns index allele dosage. gene_dist estimated genotype distribution. ith element proportion individuals genotype -1. par list final estimates parameters genotype distribution. elements included par depends value model: model = \"norm\": mu: normal mean. sigma: normal standard deviation (variance). model = \"hw\": alpha: major allele frequency. model = \"bb\": alpha: major allele frequency. tau: overdispersion parameter. See description rho Details betabinom(). model = \"s1\": pgeno: allele dosage parent. alpha: mixture proportion discrete uniform (included fixed small value mostly numerical stability reasons). See description fs1_alpha flexdog_full(). 
model = \"f1\": p1geno: allele dosage first parent. p2geno: allele dosage second parent. alpha: mixture proportion discrete uniform (included fixed small value mostly numerical stability reasons). See description fs1_alpha flexdog_full(). model = \"s1pp\": ell1: estimated dosage parent. tau1: estimated double reduction parameter parent. Available ell1 1, 2, 3. Identified ell1 1 3. gamma1: estimated preferential pairing parameter. Available ell1 2. However, returned identified form. alpha: mixture proportion discrete uniform (included fixed small value mostly numerical stability reasons). See description fs1_alpha flexdog_full(). model = \"f1pp\": ell1: estimated dosage parent 1. ell2: estimated dosage parent 2. tau1: estimated double reduction parameter parent 1. Available ell1 1, 2, 3. Identified ell1 1 3. tau2: estimated double reduction parameter parent 2. Available ell2 1, 2, 3. Identified ell2 1 3. gamma1: estimated preferential pairing parameter parent 1. Available ell1 2. However, returned identified form. gamma2: estimated preferential pairing parameter parent 2. Available ell2 2. However, returned identified form. alpha: mixture proportion discrete uniform (included fixed small value mostly numerical stability reasons). See description fs1_alpha flexdog_full(). model = \"flex\": par empty list. model = \"uniform\": par empty list. model = \"custom\": par empty list. geno posterior mode genotype. genotype estimates. maxpostprob maximum posterior probability. equivalent posterior probability correctly genotyping individual. postmean posterior mean genotype. downstream association studies, might want consider using estimates. input$refvec value refvec provided user. input$sizevec value sizevec provided user. input$ploidy value ploidy provided user. input$model value model provided user. input$p1ref value p1ref provided user. input$p1size value p1size provided user. input$p2ref value p2ref provided user. input$p2size value p2size provided user. 
input$snpname value snpname provided user. prop_mis posterior proportion individuals genotyped incorrectly.","code":""},{"path":"/reference/flexdog_full.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog_full","text":"Possible values genotype distribution (values model) : \"norm\" distribution whose genotype frequencies proportional density value normal mean standard deviation. Unlike \"bb\" \"hw\" options, allow distributions less dispersed binomial. seems robust violations modeling assumptions, default. prior class developed Gerard Ferrão (2020). \"hw\" binomial distribution results assuming population Hardy-Weinberg equilibrium (HWE). actually pretty well even minor moderate deviations HWE. Though perform well `\"norm\"` option severe deviations HWE. \"bb\" beta-binomial distribution. overdispersed version \"hw\" can derived special case Balding-Nichols model. \"s1\" prior assumes individuals full-siblings resulting one generation selfing. .e. one parent. model assumes particular type meiotic behavior: polysomic inheritance bivalent, non-preferential pairing. \"f1\" prior assumes individuals full-siblings resulting one generation bi-parental cross. model assumes particular type meiotic behavior: polysomic inheritance bivalent, non-preferential pairing. \"f1pp\" prior allows double reduction preferential pairing F1 population tretraploids. \"s1pp\" prior allows double reduction preferential pairing S1 population tretraploids. \"flex\" Generically categorical distribution. Theoretically, works well lot individuals. practice, seems much less robust violations modeling assumptions. \"uniform\" discrete uniform distribution. never used practice. \"custom\" pre-specified prior distribution. specify using prior_vec argument. almost never use option practice. 
might think good default model = \"uniform\" somehow \"uninformative prior.\" informative tends work horribly practice. intuition estimate allele bias sequencing error rates estimated genotypes approximately uniform (since assuming approximately uniform). usually result unintuitive genotyping since populations uniform genotype distribution. include option completeness. Please use . value prop_mis intuitive measure quality SNP. prop_mis posterior proportion individuals mis-genotyped. want SNPS accurately genotype, say, 95% individuals, discard SNPs prop_mis 0.05. value maxpostprob intuitive measure quality genotype estimate individual. posterior probability correctly genotyping individual using geno (posterior mode) genotype estimate. want correctly genotype, say, 95% individuals, discard individuals maxpostprob 0.95. However, just going impute missing genotypes later, might consider discarding individuals flexdog's genotype estimates probably accurate naive approaches, imputing using grand mean. datasets examined, allelic bias major issue. However, may fit model assuming allelic bias setting update_bias = FALSE bias_init = 1. Prior using flexdog, read-mapping step, try get rid allelic bias using WASP (doi:10.1101/011221 ). successful removing allelic bias (source read-mapping step), setting update_bias = FALSE bias_init = 1 reasonable. can visually inspect SNPs bias using plot_geno(). flexdog(), like methods, invariant allele label \"reference\" label \"alternative\". , set refvec number alternative read-counts, resulting genotype estimates estimated allele dosage alternative allele.","code":""},{"path":"/reference/flexdog_full.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog_full","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. 
doi:10.1534/genetics.118.301468 . Gerard, David, and Luís Felipe Ventorim Ferrão. \"Priors for genotyping polyploids.\" Bioinformatics 36, no. 6 (2020): 1795-1800. doi:10.1093/bioinformatics/btz852 .","code":""},{"path":[]},{"path":"/reference/flexdog_full.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog_full","text":"David Gerard","code":""},{"path":"/reference/flexdog_full.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog_full","text":"","code":"## A natural population. We will assume a ## normal prior since there are so few ## individuals. data(\"uitdewilligen\") ploidy <- 4 refvec <- uitdewilligen$refmat[, 1] sizevec <- uitdewilligen$sizemat[, 1] fout <- flexdog_full(refvec = refvec, sizevec = sizevec, ploidy = ploidy, model = \"norm\") plot(fout)"},{"path":"/reference/format_multidog.html","id":null,"dir":"Reference","previous_headings":"","what":"Return arrayicized elements from the output of multidog. — format_multidog","title":"Return arrayicized elements from the output of multidog. — format_multidog","text":"function allow genotype estimates, maximum posterior probability, values form matrix/array. multiple variable names provided, data formatted 3-dimensional array dimensions corresponding (individuals, SNPs, variables).","code":""},{"path":"/reference/format_multidog.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Return arrayicized elements from the output of multidog. — format_multidog","text":"","code":"format_multidog(x, varname = \"geno\")"},{"path":"/reference/format_multidog.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Return arrayicized elements from the output of multidog. — format_multidog","text":"x output multidog. 
varname character vector variable names whose values populate cells. column names x$inddf.","code":""},{"path":"/reference/format_multidog.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Return arrayicized elements from the output of multidog. — format_multidog","text":"Note order individuals reshuffled. order SNPs x$snpdf.","code":""},{"path":"/reference/format_multidog.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Return arrayicized elements from the output of multidog. — format_multidog","text":"David Gerard","code":""},{"path":"/reference/get_q_array.html","id":null,"dir":"Reference","previous_headings":"","what":"Return the probabilities of an offspring's genotype given its\nparental genotypes for all possible combinations of parental and\noffspring genotypes. This is for species with polysomal inheritance\nand bivalent, non-preferential pairing. — get_q_array","title":"Return the probabilities of an offspring's genotype given its\nparental genotypes for all possible combinations of parental and\noffspring genotypes. This is for species with polysomal inheritance\nand bivalent, non-preferential pairing. — get_q_array","text":"Return probabilities offspring's genotype given parental genotypes possible combinations parental offspring genotypes. species polysomal inheritance bivalent, non-preferential pairing.","code":""},{"path":"/reference/get_q_array.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Return the probabilities of an offspring's genotype given its\nparental genotypes for all possible combinations of parental and\noffspring genotypes. This is for species with polysomal inheritance\nand bivalent, non-preferential pairing. 
— get_q_array","text":"","code":"get_q_array(ploidy)"},{"path":"/reference/get_q_array.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Return the probabilities of an offspring's genotype given its\nparental genotypes for all possible combinations of parental and\noffspring genotypes. This is for species with polysomal inheritance\nand bivalent, non-preferential pairing. — get_q_array","text":"ploidy positive integer. ploidy species.","code":""},{"path":"/reference/get_q_array.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Return the probabilities of an offspring's genotype given its\nparental genotypes for all possible combinations of parental and\noffspring genotypes. This is for species with polysomal inheritance\nand bivalent, non-preferential pairing. — get_q_array","text":"three-way array proportions. (i, j, k)th element probability offspring k - 1 reference alleles given parent 1 i - 1 reference alleles parent 2 j - 1 reference alleles. dimension array ploidy + 1. dimension names, \"A\" stands reference allele \"a\" stands alternative allele.","code":""},{"path":"/reference/get_q_array.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Return the probabilities of an offspring's genotype given its\nparental genotypes for all possible combinations of parental and\noffspring genotypes. This is for species with polysomal inheritance\nand bivalent, non-preferential pairing. — get_q_array","text":"David Gerard","code":""},{"path":"/reference/get_q_array.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Return the probabilities of an offspring's genotype given its\nparental genotypes for all possible combinations of parental and\noffspring genotypes. This is for species with polysomal inheritance\nand bivalent, non-preferential pairing. 
— get_q_array","text":"","code":"qarray <- get_q_array(6) apply(qarray, c(1, 2), sum) ## should all be 1's. #> parent2 #> parent1 aaaaaa Aaaaaa AAaaaa AAAaaa AAAAaa AAAAAa AAAAAA #> aaaaaa 1 1 1 1 1 1 1 #> Aaaaaa 1 1 1 1 1 1 1 #> AAaaaa 1 1 1 1 1 1 1 #> AAAaaa 1 1 1 1 1 1 1 #> AAAAaa 1 1 1 1 1 1 1 #> AAAAAa 1 1 1 1 1 1 1 #> AAAAAA 1 1 1 1 1 1 1"},{"path":"/reference/is.flexdog.html","id":null,"dir":"Reference","previous_headings":"","what":"Tests if an argument is a flexdog object. — is.flexdog","title":"Tests if an argument is a flexdog object. — is.flexdog","text":"Tests argument flexdog object.","code":""},{"path":"/reference/is.flexdog.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tests if an argument is a flexdog object. — is.flexdog","text":"","code":"is.flexdog(x)"},{"path":"/reference/is.flexdog.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tests if an argument is a flexdog object. — is.flexdog","text":"x Anything.","code":""},{"path":"/reference/is.flexdog.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Tests if an argument is a flexdog object. — is.flexdog","text":"logical. TRUE x flexdog object, FALSE otherwise.","code":""},{"path":"/reference/is.flexdog.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Tests if an argument is a flexdog object. — is.flexdog","text":"David Gerard","code":""},{"path":"/reference/is.flexdog.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tests if an argument is a flexdog object. — is.flexdog","text":"","code":"is.flexdog(\"anything\") #> [1] FALSE # FALSE"},{"path":"/reference/is.multidog.html","id":null,"dir":"Reference","previous_headings":"","what":"Tests if an argument is a multidog object. — is.multidog","title":"Tests if an argument is a multidog object. 
— is.multidog","text":"Tests argument multidog object.","code":""},{"path":"/reference/is.multidog.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tests if an argument is a multidog object. — is.multidog","text":"","code":"is.multidog(x)"},{"path":"/reference/is.multidog.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tests if an argument is a multidog object. — is.multidog","text":"x Anything.","code":""},{"path":"/reference/is.multidog.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Tests if an argument is a multidog object. — is.multidog","text":"logical. TRUE x multidog object, FALSE otherwise.","code":""},{"path":"/reference/is.multidog.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Tests if an argument is a multidog object. — is.multidog","text":"David Gerard","code":""},{"path":"/reference/is.multidog.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tests if an argument is a multidog object. — is.multidog","text":"","code":"is.multidog(\"anything\") #> [1] FALSE # FALSE"},{"path":"/reference/log_sum_exp.html","id":null,"dir":"Reference","previous_headings":"","what":"Log-sum-exponential trick. — log_sum_exp","title":"Log-sum-exponential trick. — log_sum_exp","text":"Log-sum-exponential trick.","code":""},{"path":"/reference/log_sum_exp.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Log-sum-exponential trick. — log_sum_exp","text":"","code":"log_sum_exp(x)"},{"path":"/reference/log_sum_exp.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Log-sum-exponential trick. — log_sum_exp","text":"x vector log-sum-exp.","code":""},{"path":"/reference/log_sum_exp.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Log-sum-exponential trick. 
— log_sum_exp","text":"log sum exponential elements x.","code":""},{"path":"/reference/log_sum_exp.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Log-sum-exponential trick. — log_sum_exp","text":"David Gerard","code":""},{"path":"/reference/log_sum_exp_2.html","id":null,"dir":"Reference","previous_headings":"","what":"Log-sum-exponential trick using just two doubles. — log_sum_exp_2","title":"Log-sum-exponential trick using just two doubles. — log_sum_exp_2","text":"Log-sum-exponential trick using just two doubles.","code":""},{"path":"/reference/log_sum_exp_2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Log-sum-exponential trick using just two doubles. — log_sum_exp_2","text":"","code":"log_sum_exp_2(x, y)"},{"path":"/reference/log_sum_exp_2.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Log-sum-exponential trick using just two doubles. — log_sum_exp_2","text":"x double. y Another double.","code":""},{"path":"/reference/log_sum_exp_2.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Log-sum-exponential trick using just two doubles. — log_sum_exp_2","text":"log sum exponential x y.","code":""},{"path":"/reference/log_sum_exp_2.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Log-sum-exponential trick using just two doubles. — log_sum_exp_2","text":"David Gerard","code":""},{"path":"/reference/multidog.html","id":null,"dir":"Reference","previous_headings":"","what":"Fit flexdog to multiple SNPs. — multidog","title":"Fit flexdog to multiple SNPs. — multidog","text":"convenience function run flexdog many SNPs. Support provided parallel computing future package. function not extensively tested. 
Please report bugs https://github.com/dcgerard/updog/issues.","code":""},{"path":"/reference/multidog.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fit flexdog to multiple SNPs. — multidog","text":"","code":"multidog( refmat, sizemat, ploidy, model = c(\"norm\", \"hw\", \"bb\", \"s1\", \"s1pp\", \"f1\", \"f1pp\", \"flex\", \"uniform\", \"custom\"), nc = 1, p1_id = NULL, p2_id = NULL, bias_init = exp(c(-1, -0.5, 0, 0.5, 1)), prior_vec = NULL, ... )"},{"path":"/reference/multidog.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Fit flexdog to multiple SNPs. — multidog","text":"refmat matrix reference read counts. columns index individuals rows index markers (SNPs). matrix must rownames (names markers) column names (names individuals). names must match names sizemat. sizemat matrix total read counts. columns index individuals rows index markers (SNPs). matrix must rownames (names markers) column names (names individuals). names must match names refmat. ploidy ploidy species. Assumed individual. model form prior (genotype distribution) take? See Details possible values. nc number computing cores use parallelization local machine. See section \"Parallel Computation\" implement complicated evaluation strategies using future package. specifying evaluation strategies using future package, also set nc = NA. value nc never number cores available computing environment. can determine maximum number available cores running future::availableCores() R. p1_id ID first parent. character length 1. correspond single column name refmat sizemat. p2_id ID second parent. character length 1. correspond single column name refmat sizemat. bias_init vector initial values bias parameter multiple runs flexdog_full(). prior_vec pre-specified genotype distribution. used model = \"custom\" must otherwise NULL. specified, vector length ploidy + 1 non-negative elements sum 1. ... 
Additional parameters pass flexdog_full().","code":""},{"path":"/reference/multidog.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Fit flexdog to multiple SNPs. — multidog","text":"list-like object two data frames. snpdf data frame containing properties SNPs (markers). rows index SNPs. variables include: snp name SNP (marker). bias estimated allele bias SNP. seq estimated sequencing error rate SNP. od estimated overdispersion parameter SNP. prop_mis estimated proportion individuals misclassified SNP. num_iter number iterations performed EM algorithm SNP. llike maximum marginal likelihood SNP. ploidy provided ploidy species. model provided model prior genotype distribution. p1ref user-provided reference read counts parent 1. p1size user-provided total read counts parent 1. p2ref user-provided reference read counts parent 2. p2size user-provided total read counts parent 2. Pr_k estimated frequency individuals genotype k, k can integer 0 ploidy level. Model specific parameter estimates See return value par help page flexdog. inddf data frame containing properties individuals SNP. variables include: snp name SNP (marker). ind name individual. ref provided reference counts individual SNP. size provided total counts individual SNP. geno posterior mode genotype individual SNP. estimated reference allele dosage given individual given SNP. postmean posterior mean genotype individual SNP. continuous genotype estimate reference allele dosage given individual given SNP. maxpostprob maximum posterior probability. posterior probability individual genotyped correctly. Pr_k posterior probability given individual given SNP genotype k, k can vary 0 ploidy level species. logL_k genotype log-likelihoods dosage k given individual given SNP, k can vary from 0 ploidy level species.","code":""},{"path":"/reference/multidog.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Fit flexdog to multiple SNPs. 
— multidog","text":"format reference counts total read counts two separate matrices. rows index markers (SNPs) columns index individuals. Row names ID SNPs column names ID individuals, required attributes. data VCF files, recommend importing using VariantAnnotation package Bioconductor https://bioconductor.org/packages/VariantAnnotation/. great VCF parser. See details flexdog possible values model. model = \"f1\", model = \"s1\", model = \"f1pp\" model = \"s1pp\" user may provide individual ID parent(s) via p1_id p2_id arguments. output list containing two data frames. first data frame, called snpdf, contains information SNP, allele bias sequencing error rate. second data frame, called inddf, contains information individual SNP, estimated genotype posterior probability classified correctly. SNPs contain 0 reads (missing data) entirely removed.","code":""},{"path":"/reference/multidog.html","id":"parallel-computation","dir":"Reference","previous_headings":"","what":"Parallel Computation","title":"Fit flexdog to multiple SNPs. — multidog","text":"multidog() function supports parallel computing. future package. just running multidog() local machine, can use nc argument specify parallelization. value nc greater 1 result multiple background R sessions genotype SNPs. maximum value nc try can found running future::availableCores(). Running multidog() using nc equivalent setting future plan future::plan(future::multisession, workers = nc). Using future package means different evaluation strategies possible. particular, using high performance machine, can explore using future.batchtools package evaluate multidog() using schedulers like Slurm TORQUE/PBS. use different strategy, set nc = NA run future::plan() prior running multidog(). example, set forked R processes current machine (instead using background R sessions), run (work Windows): future::plan(future::multicore), followed running multidog() nc = NA. 
See examples .","code":""},{"path":[]},{"path":"/reference/multidog.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Fit flexdog to multiple SNPs. — multidog","text":"David Gerard","code":""},{"path":"/reference/multidog.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Fit flexdog to multiple SNPs. — multidog","text":"","code":"if (FALSE) { data(\"uitdewilligen\") ## Run multiple R sessions using the `nc` variable. mout <- multidog(refmat = t(uitdewilligen$refmat), sizemat = t(uitdewilligen$sizemat), ploidy = uitdewilligen$ploidy, nc = 2) mout$inddf mout$snpdf ## Run multiple external R sessions on the local machine. ## Note that we set `nc = NA`. cl <- parallel::makeCluster(2, timeout = 60) future::plan(future::cluster, workers = cl) mout <- multidog(refmat = t(uitdewilligen$refmat), sizemat = t(uitdewilligen$sizemat), ploidy = uitdewilligen$ploidy, nc = NA) mout$inddf mout$snpdf ## Close cluster and reset future to current R process parallel::stopCluster(cl) future::plan(future::sequential) }"},{"path":"/reference/oracle_cor.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculates the correlation between the true genotype and an\noracle estimator. — oracle_cor","title":"Calculates the correlation between the true genotype and an\noracle estimator. — oracle_cor","text":"Calculates correlation oracle MAP estimator (perfect knowledge data generation process) true genotype. useful approximation lot individuals.","code":""},{"path":"/reference/oracle_cor.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculates the correlation between the true genotype and an\noracle estimator. 
— oracle_cor","text":"","code":"oracle_cor(n, ploidy, seq, bias, od, dist)"},{"path":"/reference/oracle_cor.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculates the correlation between the true genotype and an\noracle estimator. — oracle_cor","text":"n read-depth. ploidy ploidy individual. seq sequencing error rate. bias allele-bias. od overdispersion parameter. dist distribution alleles.","code":""},{"path":"/reference/oracle_cor.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculates the correlation between the true genotype and an\noracle estimator. — oracle_cor","text":"Pearson correlation true genotype oracle estimator.","code":""},{"path":"/reference/oracle_cor.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculates the correlation between the true genotype and an\noracle estimator. — oracle_cor","text":"come dist, need additional assumptions. example, population Hardy-Weinberg equilibrium allele frequency alpha calculate dist using R code: dbinom(x = 0:ploidy, size = ploidy, prob = alpha). Alternatively, know genotypes individual's two parents, say, ref_count1 ref_count2, use get_q_array function updog package: get_q_array(ploidy)[ref_count1 + 1, ref_count2 + 1, ].","code":""},{"path":"/reference/oracle_cor.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Calculates the correlation between the true genotype and an\noracle estimator. — oracle_cor","text":"Gerard, D., Ferrão, L. F. V., Garcia, A. A. F., & Stephens, M. (2018). Genotyping Polyploids from Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468.","code":""},{"path":"/reference/oracle_cor.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Calculates the correlation between the true genotype and an\noracle estimator. 
— oracle_cor","text":"David Gerard","code":""},{"path":"/reference/oracle_cor.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculates the correlation between the true genotype and an\noracle estimator. — oracle_cor","text":"","code":"## Hardy-Weinberg population with allele-frequency of 0.75. ## Moderate bias and moderate overdispersion. ## See how correlation decreases as we ## increase the ploidy. ploidy <- 2 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) oracle_cor(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.9999983 ploidy <- 4 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) oracle_cor(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.9803195 ploidy <- 6 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) oracle_cor(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.940216"},{"path":"/reference/oracle_cor_from_joint.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate the correlation of the oracle estimator with the true\ngenotype from the joint distribution matrix. — oracle_cor_from_joint","title":"Calculate the correlation of the oracle estimator with the true\ngenotype from the joint distribution matrix. — oracle_cor_from_joint","text":"Calculates correlation oracle MAP estimator (perfect knowledge data generation process) true genotype. useful approximation lot individuals.","code":""},{"path":"/reference/oracle_cor_from_joint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate the correlation of the oracle estimator with the true\ngenotype from the joint distribution matrix. 
— oracle_cor_from_joint","text":"","code":"oracle_cor_from_joint(jd)"},{"path":"/reference/oracle_cor_from_joint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate the correlation of the oracle estimator with the true\ngenotype from the joint distribution matrix. — oracle_cor_from_joint","text":"jd matrix numerics. Element (i, j) probability genotype i - 1 estimated genotype j - 1. usually obtained oracle_joint.","code":""},{"path":"/reference/oracle_cor_from_joint.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate the correlation of the oracle estimator with the true\ngenotype from the joint distribution matrix. — oracle_cor_from_joint","text":"Pearson correlation true genotype oracle estimator.","code":""},{"path":"/reference/oracle_cor_from_joint.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Calculate the correlation of the oracle estimator with the true\ngenotype from the joint distribution matrix. — oracle_cor_from_joint","text":"Gerard, D., Ferrão, L. F. V., Garcia, A. A. F., & Stephens, M. (2018). Genotyping Polyploids from Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468.","code":""},{"path":[]},{"path":"/reference/oracle_cor_from_joint.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Calculate the correlation of the oracle estimator with the true\ngenotype from the joint distribution matrix. — oracle_cor_from_joint","text":"David Gerard","code":""},{"path":"/reference/oracle_cor_from_joint.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate the correlation of the oracle estimator with the true\ngenotype from the joint distribution matrix. — oracle_cor_from_joint","text":"","code":"## Hardy-Weinberg population with allele-frequency of 0.75. ## Moderate bias and moderate overdispersion. 
ploidy <- 6 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) jd <- oracle_joint(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) oracle_cor_from_joint(jd = jd) #> [1] 0.940216 ## Compare to oracle_cor oracle_cor(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.940216"},{"path":"/reference/oracle_joint.html","id":null,"dir":"Reference","previous_headings":"","what":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","title":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","text":"returns joint distribution true genotypes oracle estimator given perfect knowledge data generating process. useful approximation lot individuals.","code":""},{"path":"/reference/oracle_joint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","text":"","code":"oracle_joint(n, ploidy, seq, bias, od, dist)"},{"path":"/reference/oracle_joint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","text":"n read-depth. ploidy ploidy individual. seq sequencing error rate. bias allele-bias. od overdispersion parameter. dist distribution alleles.","code":""},{"path":"/reference/oracle_joint.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","text":"matrix. Element (, j) joint probability estimating genotype +1 true genotype j+1. , estimated genotype indexes rows true genotype indexes columns. 
using oracle estimator.","code":""},{"path":"/reference/oracle_joint.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","text":"come dist, need additional assumptions. example, population Hardy-Weinberg equilibrium allele frequency alpha calculate dist using R code: dbinom(x = 0:ploidy, size = ploidy, prob = alpha). Alternatively, know genotypes individual's two parents, say, ref_count1 ref_count2, use get_q_array function updog package: get_q_array(ploidy)[ref_count1 + 1, ref_count2 + 1, ]. See Examples see reconcile output oracle_joint oracle_mis oracle_mis_vec.","code":""},{"path":"/reference/oracle_joint.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","text":"Gerard, D., Ferrão, L. F. V., Garcia, A. A. F., & Stephens, M. (2018). Genotyping Polyploids from Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468.","code":""},{"path":[]},{"path":"/reference/oracle_joint.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","text":"David Gerard","code":""},{"path":"/reference/oracle_joint.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","text":"","code":"## Hardy-Weinberg population with allele-frequency of 0.75. ## Moderate bias and moderate overdispersion. 
ploidy <- 4 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) jd <- oracle_joint(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) jd #> [,1] [,2] [,3] [,4] [,5] #> [1,] 3.905665e-03 1.759022e-07 8.379767e-17 1.784566e-29 3.601827e-52 #> [2,] 5.849980e-07 4.379346e-02 2.180335e-03 2.159655e-09 3.235599e-26 #> [3,] 1.897235e-20 3.081362e-03 1.961102e-01 1.099225e-02 6.173803e-14 #> [4,] 1.314974e-34 2.427440e-09 1.264700e-02 4.105964e-01 2.601245e-04 #> [5,] 2.284980e-57 7.543647e-25 9.090651e-12 2.863373e-04 3.161461e-01 ## Get same output as oracle_mis this way: 1 - sum(diag(jd)) #> [1] 0.02944818 oracle_mis(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.02944818 ## Get same output as oracle_mis_vec this way: 1 - diag(sweep(x = jd, MARGIN = 2, STATS = colSums(jd), FUN = \"/\")) #> [1] 0.0001497595 0.0657395175 0.0702925658 0.0267344300 0.0008221220 oracle_mis_vec(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.0001497595 0.0657395175 0.0702925658 0.0267344300 0.0008221220"},{"path":"/reference/oracle_mis.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate oracle misclassification error rate. — oracle_mis","title":"Calculate oracle misclassification error rate. — oracle_mis","text":"Given perfect knowledge data generating parameters, oracle_mis calculates misclassification error rate, error rate taken data generation process allele-distribution. ideal level misclassification error rate real method larger rate . useful approximation lot individuals.","code":""},{"path":"/reference/oracle_mis.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate oracle misclassification error rate. 
— oracle_mis","text":"","code":"oracle_mis(n, ploidy, seq, bias, od, dist)"},{"path":"/reference/oracle_mis.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate oracle misclassification error rate. — oracle_mis","text":"n read-depth. ploidy ploidy individual. seq sequencing error rate. bias allele-bias. od overdispersion parameter. dist distribution alleles.","code":""},{"path":"/reference/oracle_mis.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate oracle misclassification error rate. — oracle_mis","text":"double. oracle misclassification error rate.","code":""},{"path":"/reference/oracle_mis.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculate oracle misclassification error rate. — oracle_mis","text":"come dist, need additional assumptions. example, population Hardy-Weinberg equilibrium allele frequency alpha calculate dist using R code: dbinom(x = 0:ploidy, size = ploidy, prob = alpha). Alternatively, know genotypes individual's two parents, say, ref_count1 ref_count2, use get_q_array function updog package: get_q_array(ploidy)[ref_count1 + 1, ref_count2 + 1, ].","code":""},{"path":"/reference/oracle_mis.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Calculate oracle misclassification error rate. — oracle_mis","text":"Gerard, D., Ferrão, L. F. V., Garcia, A. A. F., & Stephens, M. (2018). Genotyping Polyploids from Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468.","code":""},{"path":"/reference/oracle_mis.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Calculate oracle misclassification error rate. — oracle_mis","text":"David Gerard","code":""},{"path":"/reference/oracle_mis.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate oracle misclassification error rate. 
— oracle_mis","text":"","code":"## Hardy-Weinberg population with allele-frequency of 0.75. ## Moderate bias and moderate overdispersion. ## See how oracle misclassification error rates change as we ## increase the ploidy. ploidy <- 2 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) oracle_mis(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 1.262647e-06 ploidy <- 4 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) oracle_mis(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.02944818 ploidy <- 6 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) oracle_mis(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.1329197"},{"path":"/reference/oracle_mis_from_joint.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the oracle misclassification error rate directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_from_joint","title":"Get the oracle misclassification error rate directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_from_joint","text":"Get oracle misclassification error rate directly joint distribution genotype oracle estimator.","code":""},{"path":"/reference/oracle_mis_from_joint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the oracle misclassification error rate directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_from_joint","text":"","code":"oracle_mis_from_joint(jd)"},{"path":"/reference/oracle_mis_from_joint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the oracle misclassification error rate directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_from_joint","text":"jd matrix numerics. Element (, j) probability genotype - 1 estimated genotype j - 1. 
usually obtained oracle_joint.","code":""},{"path":"/reference/oracle_mis_from_joint.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the oracle misclassification error rate directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_from_joint","text":"double. oracle misclassification error rate.","code":""},{"path":"/reference/oracle_mis_from_joint.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Get the oracle misclassification error rate directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_from_joint","text":"Gerard, D., Ferrão, L. F. V., Garcia, A. A. F., & Stephens, M. (2018). Genotyping Polyploids from Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468.","code":""},{"path":[]},{"path":"/reference/oracle_mis_from_joint.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Get the oracle misclassification error rate directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_from_joint","text":"David Gerard","code":""},{"path":"/reference/oracle_mis_from_joint.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the oracle misclassification error rate directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_from_joint","text":"","code":"## Hardy-Weinberg population with allele-frequency of 0.75. ## Moderate bias and moderate overdispersion. 
ploidy <- 6 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) jd <- oracle_joint(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) oracle_mis_from_joint(jd = jd) #> [1] 0.1329197 ## Compare to oracle_mis oracle_mis(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.1329197"},{"path":"/reference/oracle_mis_vec.html","id":null,"dir":"Reference","previous_headings":"","what":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","title":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","text":"Given perfect knowledge data generating parameters, oracle_mis_vec calculates misclassification error rate genotype. differs oracle_mis average genotype distribution get overall misclassification error rate. , oracle_mis_vec returns vector misclassification error rates conditional genotype.","code":""},{"path":"/reference/oracle_mis_vec.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","text":"","code":"oracle_mis_vec(n, ploidy, seq, bias, od, dist)"},{"path":"/reference/oracle_mis_vec.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","text":"n read-depth. ploidy ploidy individual. seq sequencing error rate. bias allele-bias. od overdispersion parameter. dist distribution alleles.","code":""},{"path":"/reference/oracle_mis_vec.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","text":"vector numerics. 
Element i oracle misclassification error rate genotyping individual actual genotype i + 1.","code":""},{"path":"/reference/oracle_mis_vec.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","text":"ideal level misclassification error rate real method larger rate. useful approximation lot individuals. come dist, need additional assumptions. example, population Hardy-Weinberg equilibrium allele frequency alpha calculate dist using R code: dbinom(x = 0:ploidy, size = ploidy, prob = alpha). Alternatively, know genotypes individual's two parents, say, ref_count1 ref_count2, use get_q_array function updog package: get_q_array(ploidy)[ref_count1 + 1, ref_count2 + 1, ].","code":""},{"path":"/reference/oracle_mis_vec.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","text":"Gerard, D., Ferrão, L. F. V., Garcia, A. A. F., & Stephens, M. (2018). Genotyping Polyploids from Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468.","code":""},{"path":"/reference/oracle_mis_vec.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","text":"David Gerard","code":""},{"path":"/reference/oracle_mis_vec.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","text":"","code":"## Hardy-Weinberg population with allele-frequency of 0.75. ## Moderate bias and moderate overdispersion. 
ploidy <- 4 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) om <- oracle_mis_vec(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) om #> [1] 0.0001497595 0.0657395175 0.0702925658 0.0267344300 0.0008221220 ## Get same output as oracle_mis this way: sum(dist * om) #> [1] 0.02944818 oracle_mis(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.02944818"},{"path":"/reference/oracle_mis_vec_from_joint.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the oracle misclassification error rates (conditional on\ntrue genotype) directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_vec_from_joint","title":"Get the oracle misclassification error rates (conditional on\ntrue genotype) directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_vec_from_joint","text":"Get oracle misclassification error rates (conditional true genotype) directly joint distribution genotype oracle estimator.","code":""},{"path":"/reference/oracle_mis_vec_from_joint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the oracle misclassification error rates (conditional on\ntrue genotype) directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_vec_from_joint","text":"","code":"oracle_mis_vec_from_joint(jd)"},{"path":"/reference/oracle_mis_vec_from_joint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the oracle misclassification error rates (conditional on\ntrue genotype) directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_vec_from_joint","text":"jd matrix numerics. Element (, j) probability genotype - 1 estimated genotype j - 1. 
usually obtained oracle_joint.","code":""},{"path":"/reference/oracle_mis_vec_from_joint.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the oracle misclassification error rates (conditional on\ntrue genotype) directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_vec_from_joint","text":"vector numerics. Element i oracle misclassification error rate genotyping individual actual genotype i + 1.","code":""},{"path":"/reference/oracle_mis_vec_from_joint.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Get the oracle misclassification error rates (conditional on\ntrue genotype) directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_vec_from_joint","text":"Gerard, D., Ferrão, L. F. V., Garcia, A. A. F., & Stephens, M. (2018). Genotyping Polyploids from Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468.","code":""},{"path":[]},{"path":"/reference/oracle_mis_vec_from_joint.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Get the oracle misclassification error rates (conditional on\ntrue genotype) directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_vec_from_joint","text":"David Gerard","code":""},{"path":"/reference/oracle_mis_vec_from_joint.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the oracle misclassification error rates (conditional on\ntrue genotype) directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_vec_from_joint","text":"","code":"## Hardy-Weinberg population with allele-frequency of 0.75. ## Moderate bias and moderate overdispersion. 
ploidy <- 6 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) jd <- oracle_joint(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) oracle_mis_vec_from_joint(jd = jd) #> [1] 0.001855178 0.186231038 0.262779904 0.249400633 0.177957888 0.103565813 #> [7] 0.005097110 ## Compare to oracle_mis_vec oracle_mis_vec(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.001855178 0.186231038 0.262779904 0.249400633 0.177957888 0.103565813 #> [7] 0.005097110"},{"path":"/reference/oracle_plot.html","id":null,"dir":"Reference","previous_headings":"","what":"Construct an oracle plot from the output of oracle_joint. — oracle_plot","title":"Construct an oracle plot from the output of oracle_joint. — oracle_plot","text":"obtaining joint distribution true genotype estimated genotype oracle estimator using oracle_joint, can use oracle_plot visualize joint distribution.","code":""},{"path":"/reference/oracle_plot.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Construct an oracle plot from the output of oracle_joint. — oracle_plot","text":"","code":"oracle_plot(jd)"},{"path":"/reference/oracle_plot.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Construct an oracle plot from the output of oracle_joint. — oracle_plot","text":"jd matrix containing joint distribution true genotype oracle estimator. Usually, obtained call oracle_joint.","code":""},{"path":"/reference/oracle_plot.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Construct an oracle plot from the output of oracle_joint. — oracle_plot","text":"ggplot object containing oracle plot. x-axis indexes possible values estimated genotype. y-axis indexes possible values true genotype. number cell (, j) probability individual true genotype estimated genotype j. using oracle estimator. cells also color-coded size probability cell. 
top listed oracle misclassification error rate correlation true genotype estimated genotype. quantities may derived joint distribution.","code":""},{"path":"/reference/oracle_plot.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Construct an oracle plot from the output of oracle_joint. — oracle_plot","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 .","code":""},{"path":[]},{"path":"/reference/oracle_plot.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Construct an oracle plot from the output of oracle_joint. — oracle_plot","text":"David Gerard","code":""},{"path":"/reference/oracle_plot.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Construct an oracle plot from the output of oracle_joint. — oracle_plot","text":"","code":"ploidy <- 6 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) jd <- oracle_joint(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) pl <- oracle_plot(jd = jd) print(pl)"},{"path":"/reference/plot.flexdog.html","id":null,"dir":"Reference","previous_headings":"","what":"Draw a genotype plot from the output of flexdog. — plot.flexdog","title":"Draw a genotype plot from the output of flexdog. — plot.flexdog","text":"wrapper plot_geno. create genotype plot single SNP.","code":""},{"path":"/reference/plot.flexdog.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Draw a genotype plot from the output of flexdog. — plot.flexdog","text":"","code":"# S3 method for flexdog plot(x, use_colorblind = TRUE, ...)"},{"path":"/reference/plot.flexdog.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Draw a genotype plot from the output of flexdog. — plot.flexdog","text":"x flexdog object. 
use_colorblind use colorblind-safe palette (TRUE) (FALSE)? TRUE allowed ploidy less equal 6. ... used.","code":""},{"path":"/reference/plot.flexdog.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Draw a genotype plot from the output of flexdog. — plot.flexdog","text":"ggplot object genotype plot.","code":""},{"path":"/reference/plot.flexdog.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Draw a genotype plot from the output of flexdog. — plot.flexdog","text":"genotype plot, x-axis contains counts non-reference allele y-axis contains counts reference allele. dashed lines expected counts (reference alternative) given sequencing error rate allele-bias. plots color-coded maximum--posterior genotypes. Transparency proportional maximum posterior probability individual's genotype. Thus, less certain genotype transparent individuals. types plots used Gerard et. al. (2018) Gerard Ferrão (2020).","code":""},{"path":"/reference/plot.flexdog.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Draw a genotype plot from the output of flexdog. — plot.flexdog","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 . Gerard, David, Luís Felipe Ventorim Ferrão. \"Priors genotyping polyploids.\" Bioinformatics 36, . 6 (2020): 1795-1800. doi:10.1093/bioinformatics/btz852 .","code":""},{"path":[]},{"path":"/reference/plot.flexdog.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Draw a genotype plot from the output of flexdog. — plot.flexdog","text":"David Gerard","code":""},{"path":"/reference/plot.multidog.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot the output of multidog. — plot.multidog","title":"Plot the output of multidog. 
— plot.multidog","text":"Produce genotype plots output multidog. may select SNPs plot.","code":""},{"path":"/reference/plot.multidog.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot the output of multidog. — plot.multidog","text":"","code":"# S3 method for multidog plot(x, indices = seq(1, min(5, nrow(x$snpdf))), ...)"},{"path":"/reference/plot.multidog.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot the output of multidog. — plot.multidog","text":"x output multidog. indices vector integers. indices SNPs plot. ... used.","code":""},{"path":"/reference/plot.multidog.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Plot the output of multidog. — plot.multidog","text":"genotype plot, x-axis contains counts non-reference allele y-axis contains counts reference allele. dashed lines expected counts (reference alternative) given sequencing error rate allele-bias. plots color-coded maximum--posterior genotypes. Transparency proportional maximum posterior probability individual's genotype. Thus, less certain genotype transparent individuals. types plots used Gerard et. al. (2018) Gerard Ferrão (2020).","code":""},{"path":"/reference/plot.multidog.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Plot the output of multidog. — plot.multidog","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 . Gerard, David, Luís Felipe Ventorim Ferrão. \"Priors genotyping polyploids.\" Bioinformatics 36, . 6 (2020): 1795-1800. doi:10.1093/bioinformatics/btz852 .","code":""},{"path":[]},{"path":"/reference/plot.multidog.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Plot the output of multidog. 
— plot.multidog","text":"David Gerard","code":""},{"path":"/reference/plot_geno.html","id":null,"dir":"Reference","previous_headings":"","what":"Make a genotype plot. — plot_geno","title":"Make a genotype plot. — plot_geno","text":"x-axis counts non-reference allele, y-axis counts reference allele. Transparency controlled maxpostprob vector. types plots used Gerard et. al. (2018) Gerard Ferrão (2020).","code":""},{"path":"/reference/plot_geno.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Make a genotype plot. — plot_geno","text":"","code":"plot_geno( refvec, sizevec, ploidy, p1ref = NULL, p1size = NULL, p2ref = NULL, p2size = NULL, geno = NULL, seq = 0, bias = 1, maxpostprob = NULL, p1geno = NULL, p2geno = NULL, use_colorblind = TRUE )"},{"path":"/reference/plot_geno.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Make a genotype plot. — plot_geno","text":"refvec vector non-negative integers. number reference reads observed individuals sizevec vector positive integers. total number reads individuals. ploidy non-negative integer. ploidy species. p1ref vector non-negative integers. number reference reads observed parent 1 (individuals siblings). p1size vector positive integers. total number reads parent 1 (individuals siblings). p2ref vector non-negative integers. number reference reads observed parent 2 (individuals siblings). p2size vector positive integers. total number reads parent 2 (individuals siblings). geno individual genotypes. seq sequencing error rate. bias bias parameter. maxpostprob vector posterior probabilities modal genotype. p1geno Parent 1's genotype. p2geno Parent 2's genotype. use_colorblind logical. use colorblind safe palette (TRUE), (FALSE)? allowed ploidy <= 6.","code":""},{"path":"/reference/plot_geno.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Make a genotype plot. 
— plot_geno","text":"ggplot object genotype plot.","code":""},{"path":"/reference/plot_geno.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Make a genotype plot. — plot_geno","text":"parental genotypes provided (p1geno p2geno) colored offspring. Since often hard see, small black dot also indicate position.","code":""},{"path":"/reference/plot_geno.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Make a genotype plot. — plot_geno","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 . Gerard, David, Luís Felipe Ventorim Ferrão. \"Priors genotyping polyploids.\" Bioinformatics 36, . 6 (2020): 1795-1800. doi:10.1093/bioinformatics/btz852 .","code":""},{"path":"/reference/plot_geno.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Make a genotype plot. — plot_geno","text":"David Gerard","code":""},{"path":"/reference/plot_geno.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Make a genotype plot. — plot_geno","text":"","code":"data(\"snpdat\") refvec <- snpdat$counts[snpdat$snp == \"SNP1\"] sizevec <- snpdat$size[snpdat$snp == \"SNP1\"] ploidy <- 6 plot_geno(refvec = refvec, sizevec = sizevec, ploidy = ploidy)"},{"path":"/reference/rflexdog.html","id":null,"dir":"Reference","previous_headings":"","what":"Simulate GBS data from the flexdog likelihood. — rflexdog","title":"Simulate GBS data from the flexdog likelihood. — rflexdog","text":"take vector genotypes vector total read-counts, generate vector reference counts. get genotypes, use rgeno. likelihood used generate read-counts described detail Gerard et. al. 
(2018).","code":""},{"path":"/reference/rflexdog.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Simulate GBS data from the flexdog likelihood. — rflexdog","text":"","code":"rflexdog(sizevec, geno, ploidy, seq = 0.005, bias = 1, od = 0.001)"},{"path":"/reference/rflexdog.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Simulate GBS data from the flexdog likelihood. — rflexdog","text":"sizevec vector total read-counts individuals. geno vector genotypes individuals. .e. number reference alleles individual . ploidy ploidy species. seq sequencing error rate. bias bias parameter. Pr(read selected) / Pr(read selected). od overdispersion parameter. See Details rho variable betabinom.","code":""},{"path":"/reference/rflexdog.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Simulate GBS data from the flexdog likelihood. — rflexdog","text":"vector length sizevec. ith element number reference counts individual .","code":""},{"path":"/reference/rflexdog.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Simulate GBS data from the flexdog likelihood. — rflexdog","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 . Gerard, David, Luís Felipe Ventorim Ferrão. \"Priors genotyping polyploids.\" Bioinformatics 36, . 6 (2020): 1795-1800. doi:10.1093/bioinformatics/btz852 .","code":""},{"path":[]},{"path":"/reference/rflexdog.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Simulate GBS data from the flexdog likelihood. — rflexdog","text":"David Gerard","code":""},{"path":"/reference/rflexdog.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Simulate GBS data from the flexdog likelihood. 
— rflexdog","text":"","code":"set.seed(1) n <- 100 ploidy <- 6 ## Generate the genotypes of individuals from an F1 population, ## where the first parent has 1 copy of the reference allele ## and the second parent has two copies of the reference ## allele. genovec <- rgeno(n = n, ploidy = ploidy, model = \"f1\", p1geno = 1, p2geno = 2) ## Get the total number of read-counts for each individual. ## Ideally, you would take this from real data as the total ## read-counts are definitely not Poisson. sizevec <- stats::rpois(n = n, lambda = 200) ## Generate the counts of reads with the reference allele ## when there is a strong bias for the reference allele ## and there is no overdispersion. refvec <- rflexdog(sizevec = sizevec, geno = genovec, ploidy = ploidy, seq = 0.001, bias = 0.5, od = 0) ## Plot the simulated data using plot_geno. plot_geno(refvec = refvec, sizevec = sizevec, ploidy = ploidy, seq = 0.001, bias = 0.5)"},{"path":"/reference/rgeno.html","id":null,"dir":"Reference","previous_headings":"","what":"Simulate individual genotypes from one of the supported flexdog models. — rgeno","title":"Simulate individual genotypes from one of the supported flexdog models. — rgeno","text":"simulate genotypes sample individuals drawn one populations supported flexdog. See details flexdog models allowed. genotype distributions described detail Gerard Ferrão (2020).","code":""},{"path":"/reference/rgeno.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Simulate individual genotypes from one of the supported flexdog models. — rgeno","text":"","code":"rgeno( n, ploidy, model = c(\"hw\", \"bb\", \"norm\", \"f1\", \"s1\", \"flex\", \"uniform\"), allele_freq = NULL, od = NULL, p1geno = NULL, p2geno = NULL, pivec = NULL, mu = NULL, sigma = NULL )"},{"path":"/reference/rgeno.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Simulate individual genotypes from one of the supported flexdog models. 
— rgeno","text":"n number observations. ploidy ploidy species. model form prior take? See Details flexdog. allele_freq model = \"hw\", allele frequency population. model, NULL. od model = \"bb\", overdispersion parameter beta-binomial distribution. See betabinom details. model, NULL. p1geno Either first parent's genotype model = \"f1\", parent's genotype model = \"s1\". model, NULL. p2geno second parent's genotype model = \"f1\". model, NULL. pivec vector probabilities. model = \"ash\", represents mixing proportions discrete uniforms. model = \"flex\", element probability genotype - 1. model, NULL. mu model = \"norm\", mean normal. model, NULL. sigma model = \"norm\", standard deviation normal. model, NULL.","code":""},{"path":"/reference/rgeno.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Simulate individual genotypes from one of the supported flexdog models. — rgeno","text":"vector length n genotypes sampled individuals.","code":""},{"path":"/reference/rgeno.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Simulate individual genotypes from one of the supported flexdog models. — rgeno","text":"List non-NULL arguments: model = \"flex\": pivec model = \"hw\": allele_freq model = \"f1\": p1geno p2geno model = \"s1\": p1geno model = \"uniform\": non-NULL arguments model = \"bb\": allele_freq od model == \"norm\": mu sigma","code":""},{"path":"/reference/rgeno.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Simulate individual genotypes from one of the supported flexdog models. — rgeno","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 . Gerard, David, Luís Felipe Ventorim Ferrão. \"Priors genotyping polyploids.\" Bioinformatics 36, . 6 (2020): 1795-1800. 
doi:10.1093/bioinformatics/btz852 .","code":""},{"path":"/reference/rgeno.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Simulate individual genotypes from one of the supported flexdog models. — rgeno","text":"David Gerard","code":""},{"path":"/reference/rgeno.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Simulate individual genotypes from one of the supported flexdog models. — rgeno","text":"","code":"## F1 Population where parent 1 has 1 copy of the reference allele ## and parent 2 has 4 copies of the reference allele. ploidy <- 6 rgeno(n = 10, ploidy = ploidy, model = \"f1\", p1geno = 1, p2geno = 4) #> [1] 3 3 3 2 2 4 1 2 2 2 ## A population in Hardy-Weinberg equilibrium with an ## allele frequency of 0.75 rgeno(n = 10, ploidy = ploidy, model = \"hw\", allele_freq = 0.75) #> [1] 5 3 3 4 3 5 5 4 4 4"},{"path":"/reference/snpdat.html","id":null,"dir":"Reference","previous_headings":"","what":"GBS data from Shirasawa et al (2017) — snpdat","title":"GBS data from Shirasawa et al (2017) — snpdat","text":"Contains counts reference alleles total read counts GBS data Shirasawa et al (2017) three SNPs used examples Gerard et al. (2018).","code":""},{"path":"/reference/snpdat.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"GBS data from Shirasawa et al (2017) — snpdat","text":"","code":"snpdat"},{"path":"/reference/snpdat.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"GBS data from Shirasawa et al (2017) — snpdat","text":"tibble 419 rows 4 columns: id identification label individuals. snp SNP label. counts number read-counts support reference allele. 
size total number read-counts given SNP.","code":""},{"path":"/reference/snpdat.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"GBS data from Shirasawa et al (2017) — snpdat","text":"doi:10.1038/srep44207","code":""},{"path":"/reference/snpdat.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"GBS data from Shirasawa et al (2017) — snpdat","text":"tibble. See Format Section.","code":""},{"path":"/reference/snpdat.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"GBS data from Shirasawa et al (2017) — snpdat","text":"Shirasawa, Kenta, Masaru Tanaka, Yasuhiro Takahata, Daifu Ma, Qinghe Cao, Qingchang Liu, Hong Zhai, Sang-Soo Kwak, Jae Cheol Jeong, Ung-Han Yoon, Hyeong-Un Lee, Hideki Hirakawa, Sahiko Isobe \"high-density SNP genetic map consisting complete set homologous groups autohexaploid sweetpotato (Ipomoea batatas).\" Scientific Reports 7 (2017). doi:10.1038/srep44207 Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 .","code":""},{"path":"/reference/uitdewilligen.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset of individuals and SNPs from Uitdewilligen et al (2013). — uitdewilligen","title":"Subset of individuals and SNPs from Uitdewilligen et al (2013). — uitdewilligen","text":"list containing matrix reference counts, matrix total counts, ploidy level (4) species. subset data Uitdewilligen et al (2013).","code":""},{"path":"/reference/uitdewilligen.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset of individuals and SNPs from Uitdewilligen et al (2013). 
— uitdewilligen","text":"","code":"uitdewilligen"},{"path":"/reference/uitdewilligen.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Subset of individuals and SNPs from Uitdewilligen et al (2013). — uitdewilligen","text":"list containing three objects. Two matrices numeric scalar: refmat matrix read counts containing reference allele. rows index individuals columns index SNPs. sizemat matrix total number read counts. rows index individuals columns index SNPs. ploidy ploidy level species (just 4).","code":""},{"path":"/reference/uitdewilligen.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Subset of individuals and SNPs from Uitdewilligen et al (2013). — uitdewilligen","text":"doi:10.1371/journal.pone.0062355","code":""},{"path":"/reference/uitdewilligen.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Subset of individuals and SNPs from Uitdewilligen et al (2013). — uitdewilligen","text":"list. See Format Section.","code":""},{"path":"/reference/uitdewilligen.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Subset of individuals and SNPs from Uitdewilligen et al (2013). — uitdewilligen","text":"Uitdewilligen, J. G., Wolters, . M. ., Bjorn, B., Borm, T. J., Visser, R. G., & van Eck, H. J. (2013). next-generation sequencing method genotyping--sequencing highly heterozygous autotetraploid potato. PLoS One, 8(5), e62355. doi:10.1371/journal.pone.0062355","code":""},{"path":"/reference/updog-package.html","id":null,"dir":"Reference","previous_headings":"","what":"updog Flexible Genotyping for Polyploids — updog-package","title":"updog Flexible Genotyping for Polyploids — updog-package","text":"Implements empirical Bayes approaches genotype polyploids next generation sequencing data accounting allele bias, overdispersion, sequencing error. 
main functions flexdog() multidog(), allow specification many different genotype distributions. Also provided functions simulate genotypes, rgeno(), read-counts, rflexdog(), well functions calculate oracle genotyping error rates, oracle_mis(), correlation true genotypes, oracle_cor(). latter two functions useful read depth calculations. Run browseVignettes(package = \"updog\") R example usage. See Gerard et al. (2018) Gerard Ferrao (2020) details implemented methods.","code":""},{"path":"/reference/updog-package.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"updog Flexible Genotyping for Polyploids — updog-package","text":"package named updog \"Using Parental Data Offspring Genotyping\" originally developed method full-sib populations, works now general populations. best competitor probably fitPoly package, can check https://cran.r-project.org/package=fitPoly. Though, think updog returns better calibrated measures uncertainty next-generation sequencing data. find bug want enhancement, please submit issue https://github.com/dcgerard/updog/issues.","code":""},{"path":"/reference/updog-package.html","id":"updog-functions","dir":"Reference","previous_headings":"","what":"updog Functions","title":"updog Flexible Genotyping for Polyploids — updog-package","text":"flexdog() main function fits empirical Bayes approach genotype polyploids next generation sequencing data. multidog() convenience function running flexdog() many SNPs. function provides support parallel computing. format_multidog() Return arrayicized elements output multidog(). filter_snp() Filter SNPs based output multidog() rgeno() simulate genotypes sample one models allowed flexdog(). rflexdog() Simulate read-counts flexdog() model. plot.flexdog() Plotting output flexdog(). plot.multidog() Plotting output multidog(). oracle_joint() joint distribution true genotype oracle estimator. oracle_plot() Visualize output oracle_joint(). 
oracle_mis() oracle misclassification error rate (Bayes rate). oracle_cor() Correlation true genotype oracle estimated genotype.","code":""},{"path":"/reference/updog-package.html","id":"updog-datasets","dir":"Reference","previous_headings":"","what":"updog Datasets","title":"updog Flexible Genotyping for Polyploids — updog-package","text":"snpdat small example dataset using flexdog. uitdewilligen small example dataset","code":""},{"path":"/reference/updog-package.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"updog Flexible Genotyping for Polyploids — updog-package","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 . Gerard, David, Luís Felipe Ventorim Ferrão. \"Priors genotyping polyploids.\" Bioinformatics 36, . 6 (2020): 1795-1800. doi:10.1093/bioinformatics/btz852 .","code":""},{"path":"/reference/updog-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"updog Flexible Genotyping for Polyploids — updog-package","text":"David Gerard","code":""},{"path":"/reference/wem.html","id":null,"dir":"Reference","previous_headings":"","what":"EM algorithm to fit weighted ash objective. — wem","title":"EM algorithm to fit weighted ash objective. — wem","text":"Solves following optimization problem $$\\max_{\\pi} \\sum_k w_k \\log(\\sum_j \\pi_j \\ell_jk).$$ using weighted EM algorithm.","code":""},{"path":"/reference/wem.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"EM algorithm to fit weighted ash objective. — wem","text":"","code":"wem(weight_vec, lmat, pi_init, lambda, itermax, obj_tol)"},{"path":"/reference/wem.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"EM algorithm to fit weighted ash objective. — wem","text":"weight_vec vector weights. 
element weight_vec corresponds column lmat. lmat matrix inner weights. columns \"individuals\" rows \"classes.\" pi_init initial values pivec. element pi_init corresponds row lmat. lambda penalty pi's. greater 0 really really small. itermax maximum number EM iterations take. obj_tol objective stopping criterion.","code":""},{"path":"/reference/wem.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"EM algorithm to fit weighted ash objective. — wem","text":"vector numerics.","code":""},{"path":"/reference/wem.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"EM algorithm to fit weighted ash objective. — wem","text":"David Gerard","code":""},{"path":"/reference/wem.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"EM algorithm to fit weighted ash objective. — wem","text":"","code":"set.seed(2) n <- 3 p <- 5 lmat <- matrix(stats::runif(n * p), nrow = n) weight_vec <- seq_len(p) pi_init <- stats::runif(n) pi_init <- pi_init / sum(pi_init) wem(weight_vec = weight_vec, lmat = lmat, pi_init = pi_init, lambda = 0, itermax = 100, obj_tol = 10^-6) #> [,1] #> [1,] 3.830930e-01 #> [2,] 6.169070e-01 #> [3,] 3.041614e-09"},{"path":"/news/index.html","id":"updog-214","dir":"Changelog","previous_headings":"","what":"updog 2.1.4","title":"updog 2.1.4","text":"CRAN release: 2023-11-17 Removed ggthemes dependency. Removed usage ggplot2::aes_string(), since deprecated, replaced tidy evaluation idioms.","code":""},{"path":"/news/index.html","id":"updog-213","dir":"Changelog","previous_headings":"","what":"updog 2.1.3","title":"updog 2.1.3","text":"CRAN release: 2022-10-18 Bug fix: Use && instead & C++.","code":""},{"path":"/news/index.html","id":"updog-212","dir":"Changelog","previous_headings":"","what":"updog 2.1.2","title":"updog 2.1.2","text":"CRAN release: 2022-01-24 Fixed bug use assertthat::are_equal() testthat::expect_equal(). 
See 21 Jan 2022 R-devel/NEWS states: .equal.numeric() gains sanity check tolerance argument - calling .equal(, b, c) three numeric vectors surprisingly common error.","code":""},{"path":"/news/index.html","id":"updog-211","dir":"Changelog","previous_headings":"","what":"updog 2.1.1","title":"updog 2.1.1","text":"CRAN release: 2021-10-25 Added upper bound sequencing error rate flexdog_full() (, hence, flexdog() multidog()). protects poor behavior observed corner case. Specifically, F1 populations offspring genotype sequenced moderate low depth. Fixed stale URLs, fixed style issues found lintr.","code":""},{"path":"/news/index.html","id":"updog-210","dir":"Changelog","previous_headings":"","what":"updog 2.1.0","title":"updog 2.1.0","text":"parallel backend multidog() now handled future package. use nc argument multidog(), still run parallel using multiple R sessions local machine. However, can now use functionality future choose evaluation strategy, setting nc = NA. also allow use schedulers high performance computing environments future.batchtools package. See multidog() function documentation details. vignette “Genotyping Many SNPs multidog()” goes example using future package. new experimental function, export_vcf(), works export multidog objects VCF file. yet exported still bugs fix. plot.multidog() now plot parent read-counts F1 S1 populations. Internally, multidog() now uses iterators iterators package send subsets data R process. new internal .combine function used foreach() call multidog() order decrease memory usage multidog().","code":""},{"path":"/news/index.html","id":"updog-202","dir":"Changelog","previous_headings":"","what":"updog 2.0.2","title":"updog 2.0.2","text":"CRAN release: 2020-07-21 massive edit updog software. Major changes include: support model = \"ash\". seemed model = \"norm\" always better faster, just got rid \"ash\" option. also extremely simplified code. Removal mupdog(). think good idea, computation way slow usable. 
Revision model = \"f1pp\" model = \"s1pp\". now include interpretable parameterizations meant identified via another R package. support tetraploids right now. multidog() now prints nice ASCII art ’s run. format_multidog() now allows format multiple variables terms multidimensional array. Fixes bug format_multidog() reordering SNP dimensions. fine long folks used dimnames properly, now allow folks also use dim positions. Updog now returns genotype log-likelihoods.","code":""},{"path":"/news/index.html","id":"updog-121","dir":"Changelog","previous_headings":"","what":"updog 1.2.1","title":"updog 1.2.1","text":"Adds filter_snp() filtering output multidog() based predicates terms variables snpdf. Removes stringr Imports. using one place replaced code base R code. Removes Rmpfr Suggests. longer needed since CVXR longer suggested.","code":""},{"path":"/news/index.html","id":"updog-120","dir":"Changelog","previous_headings":"","what":"updog 1.2.0","title":"updog 1.2.0","text":"CRAN release: 2020-01-28 Adds multidog() genotyping multiple SNPs using parallel computing. Adds plot.multidog() plotting output multidog(). Adds format_multidog() formatting output multidog() matrix. Removes dependency CVXR. makes install maintenance little easier. defaults specific problem little faster anyway. longer changes color scale plot_geno() based genotypes present. .cpp files, now coerce objects unsigned comparing. gets rid warnings install.","code":""},{"path":"/news/index.html","id":"updog-113","dir":"Changelog","previous_headings":"","what":"updog 1.1.3","title":"updog 1.1.3","text":"CRAN release: 2019-11-21 Updates documentation include Bioinformatics publication, Gerard Ferrão (2020) . Adds “internal” keyword functions users don’t need. Removes tidyverse Suggests field. 
using vignettes, changed base R (except ggplot2).","code":""},{"path":"/news/index.html","id":"updog-111","dir":"Changelog","previous_headings":"","what":"updog 1.1.1","title":"updog 1.1.1","text":"CRAN release: 2019-09-09 Updates documentation include Gerard Ferrão (2020) reference. Minor fixes documentation.","code":""},{"path":"/news/index.html","id":"updog-110","dir":"Changelog","previous_headings":"","what":"updog 1.1.0","title":"updog 1.1.0","text":"CRAN release: 2019-07-31 Introduces flexible priors general populations. Places normal prior distribution logit overdispersion parameter. might change genotype calls previous versions updog. reproduce genotype calls previous versions updog, simply set mean_od = 0 var_od = Inf flexdog(). Adds method = \"custom\" option flexdog(). lets users choose genotype distribution completely known priori. Documentation updates.","code":""},{"path":"/news/index.html","id":"updog-101","dir":"Changelog","previous_headings":"","what":"updog 1.0.1","title":"updog 1.0.1","text":"CRAN release: 2018-07-27 Fixes bug option model = \"s1pp\" flexdog(). originally constraining levels preferential pairing segregations parent. now fixed. downside model = \"s1pp\" now supported ploidy = 4 ploidy = 6. optimization becomes difficult larger ploidy levels. fixed documentation. Perhaps biggest error comes snippet original documentation flexdog: value prop_mis intuitive measure quality SNP. prop_mis posterior proportion individuals mis-genotyped. want SNPS accurately genotype, say, 95% individuals, discard SNPs prop_mis 0.95. now says value prop_mis intuitive measure quality SNP. prop_mis posterior proportion individuals mis-genotyped. want SNPS accurately genotype, say, 95% individuals, discard SNPs prop_mis 0.05. ’ve now exported C++ functions think useful. 
can call usual way.","code":""},{"path":"/news/index.html","id":"updog-0990","dir":"Changelog","previous_headings":"","what":"updog 0.99.0","title":"updog 0.99.0","text":"complete re-working code updog. old version may found updogAlpha package. main function now flexdog(). experimental approach mupdog() now live. provide guarantees mupdog()’s performance. Oracle misclassification error rates may calculated oracle_mis(). Genotypes can simulated using rgeno(). Next-generation sequencing data can simulated using rflexdog().","code":""}]
+[{"path":"/CONDUCT.html","id":null,"dir":"","previous_headings":"","what":"Contributor Code of Conduct","title":"Contributor Code of Conduct","text":"contributors maintainers project, pledge respect people contribute reporting issues, posting feature requests, updating documentation, submitting pull requests patches, activities. committed making participation project harassment-free experience everyone, regardless level experience, gender, gender identity expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion. Examples unacceptable behavior participants include use sexual language imagery, derogatory comments personal attacks, trolling, public private harassment, insults, unprofessional conduct. Project maintainers right responsibility remove, edit, reject comments, commits, code, wiki edits, issues, contributions aligned Code Conduct. Project maintainers follow Code Conduct may removed project team. Instances abusive, harassing, otherwise unacceptable behavior may reported opening issue contacting one project maintainers. Code Conduct adapted Contributor Covenant (https:contributor-covenant.org), version 1.0.0, available https://contributor-covenant.org/version/1/0/0/","code":""},{"path":"/articles/multidog.html","id":"abstract","dir":"Articles","previous_headings":"","what":"Abstract","title":"Genotyping Many SNPs with multidog()","text":"multidog() provides support genotyping many SNPs iterating flexdog() SNPs. Support provided parallel computing future package. genotyping method described Gerard et al. (2018) Gerard Ferrão (2020).","code":""},{"path":"/articles/multidog.html","id":"fit-multidog","dir":"Articles","previous_headings":"","what":"Fit multidog()","title":"Genotyping Many SNPs with multidog()","text":"Let’s load updog, future, data Uitdewilligen et al. (2013). uitdewilligen$refmat matrix reference counts uitdewilligen$sizemat matrix total read counts. 
data, rows index individuals columns index loci. insertion multidog() need way around (individuals columns loci rows). transpose matrices. sizemat refmat row column names. names identify loci individuals. want parallel computing, check proper number cores: Now let’s run multidog(): default, parallelization run using nc greater 1. can choose evaluation strategy running future::plan() prior running multidog(), setting nc = NA. particularly useful higher performance computing environments use schedulers, can control evaluation strategy future.batchtools package. example, following run multidog() using forked R processes:","code":"library(future) library(updog) data(\"uitdewilligen\") refmat <- t(uitdewilligen$refmat) sizemat <- t(uitdewilligen$sizemat) ploidy <- uitdewilligen$ploidy setdiff(colnames(sizemat), colnames(refmat)) #> character(0) setdiff(rownames(sizemat), rownames(refmat)) #> character(0) future::availableCores() #> system #> 16 mout <- multidog(refmat = refmat, sizemat = sizemat, ploidy = ploidy, model = \"norm\", nc = 2) #> | *.#,% #> ||| *******/ #> ||||||| (**..#**. */ **/ #> ||||||||| */****************************/*% #> ||| &****..,*.************************/ #> ||| (....,,,*,...****%********/(****** #> ||| ,,****%////,,,,./.****/ #> ||| /**// .*///.... #> ||| .*/*/%# .,/ ., #> ||| , **/ #% .* .. #> ||| ,,,* #> #> Working on it...done! future::plan(future::multisession, workers = nc) future::plan(future::multicore, workers = 2) mout <- multidog(refmat = refmat, sizemat = sizemat, ploidy = ploidy, model = \"norm\", nc = NA) ## Shut down parallel workers future::plan(future::sequential)"},{"path":"/articles/multidog.html","id":"multidog-output","dir":"Articles","previous_headings":"","what":"multidog() Output","title":"Genotyping Many SNPs with multidog()","text":"plot method output multidog(). output multidog contains two data frame. first contains properties SNPs, estimated allele bias estimated sequencing error rate. 
second data frame contains properties individual SNP, estimated genotypes (geno) posterior probability genotyping correctly (maxpostprob). can obtain columns inddf matrix form format_multidog(). filter SNPs based quality metrics (bias, sequencing error rate, overdispersion, etc), can use filter_snp(), uses non-standard evaluation used dplyr::filter(). , can define predicates terms variable names snpdf data frame output mupdog(). keeps rows snpdf inddf predicate SNP evaluates TRUE.","code":"plot(mout, indices = c(1, 5, 100)) #> [[1]] #> #> [[2]] #> #> [[3]] str(mout$snpdf) #> 'data.frame': 100 obs. of 20 variables: #> $ snp : chr \"PotVar0089524\" \"PotVar0052647\" \"PotVar0120897\" \"PotVar0066020\" ... #> $ bias : num 0.519 1.026 0.929 1.221 0.847 ... #> $ seq : num 0.00485 0.00221 0.002 0.0039 0.00206 ... #> $ od : num 0.00304 0.00295 0.00337 0.00275 0.00335 ... #> $ prop_mis: num 0.004926 0.002274 0.000626 0.002718 0.003 ... #> $ num_iter: num 6 3 3 5 7 7 4 8 8 4 ... #> $ llike : num -14.7 -25.3 -10.4 -22.7 -32 ... #> $ ploidy : num 4 4 4 4 4 4 4 4 4 4 ... #> $ model : chr \"norm\" \"norm\" \"norm\" \"norm\" ... #> $ p1ref : num NA NA NA NA NA NA NA NA NA NA ... #> $ p1size : num NA NA NA NA NA NA NA NA NA NA ... #> $ p2ref : num NA NA NA NA NA NA NA NA NA NA ... #> $ p2size : num NA NA NA NA NA NA NA NA NA NA ... #> $ Pr_0 : num 0.000279 0.248211 0.66369 0.015803 0.08409 ... #> $ Pr_1 : num 0.00707 0.45067 0.26892 0.06938 0.20154 ... #> $ Pr_2 : num 0.0745 0.2542 0.0597 0.1931 0.2968 ... #> $ Pr_3 : num 0.32604 0.04452 0.00725 0.34069 0.26844 ... #> $ Pr_4 : num 0.592065 0.002423 0.000482 0.381024 0.149179 ... #> $ mu : num 4.18 1.01 -1 3.75 2.29 ... #> $ sigma : num 1.067 0.925 1.289 1.481 1.433 ... str(mout$inddf) #> 'data.frame': 1000 obs. of 17 variables: #> $ snp : chr \"PotVar0089524\" \"PotVar0089524\" \"PotVar0089524\" \"PotVar0089524\" ... #> $ ind : chr \"P5PEM08\" \"P3PEM05\" \"P2PEM10\" \"P7PEM09\" ... 
#> $ ref : num 122 113 86 80 69 85 130 228 60 211 ... #> $ size : num 142 143 96 80 69 86 130 228 86 212 ... #> $ geno : num 3 3 3 4 4 4 4 4 2 4 ... #> $ postmean : num 3 2.99 3 4 4 ... #> $ maxpostprob: num 1 0.988 1 1 1 ... #> $ Pr_0 : num 3.74e-90 1.03e-78 2.21e-77 1.06e-86 8.21e-79 ... #> $ Pr_1 : num 7.97e-23 3.86e-16 2.61e-20 6.80e-30 1.21e-26 ... #> $ Pr_2 : num 4.94e-06 1.17e-02 3.27e-06 2.82e-14 1.01e-12 ... #> $ Pr_3 : num 1.00 9.88e-01 1.00 6.74e-06 2.75e-05 ... #> $ Pr_4 : num 1.45e-10 1.14e-15 3.56e-06 1.00 1.00 ... #> $ logL_0 : num -201 -176 -172 -190 -172 ... #> $ logL_1 : num -49.6 -35.6 -44 -62.9 -55.4 ... #> $ logL_2 : num -13.27 -6.95 -13.93 -29.29 -25.69 ... #> $ logL_3 : num -2.55 -4 -2.79 -11.49 -10.06 ... #> $ logL_4 : num -25.804 -38.999 -15.935 -0.181 -0.158 ... genomat <- format_multidog(mout, varname = \"geno\") head(genomat) #> P1PEM10 P2PEM05 P2PEM10 P3PEM05 P4PEM01 P4PEM09 P5PEM04 P5PEM08 #> PotVar0089524 4 4 3 3 4 4 4 3 #> PotVar0052647 3 1 0 1 1 2 0 1 #> PotVar0120897 0 0 0 0 0 0 0 1 #> PotVar0066020 3 2 3 4 4 3 1 4 #> PotVar0003381 3 1 2 0 2 3 3 1 #> PotVar0131622 2 4 1 2 2 3 4 3 #> P6PEM11 P7PEM09 #> PotVar0089524 2 4 #> PotVar0052647 1 1 #> PotVar0120897 2 1 #> PotVar0066020 4 2 #> PotVar0003381 4 3 #> PotVar0131622 3 3 dim(mout$snpdf) #> [1] 100 20 dim(mout$inddf) #> [1] 1000 17 mout_cleaned <- filter_snp(mout, prop_mis < 0.05 & bias > exp(-1) & bias < exp(1)) dim(mout_cleaned$snpdf) #> [1] 97 20 dim(mout_cleaned$inddf) #> [1] 970 17"},{"path":"/articles/multidog.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"Genotyping Many SNPs with multidog()","text":"Gerard, David, Luís Felipe Ventorim Ferrão. “Priors genotyping polyploids.” Bioinformatics 36, . 6 (2020): 1795-1800. https://doi.org/10.1093/bioinformatics/btz852. Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco Garcia, Matthew Stephens. 2018. “Genotyping Polyploids Messy Sequencing Data.” Genetics 210 (3). 
Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468. Uitdewilligen, Anne-Marie . D’hoop, Jan G. . M. L. Wolters. 2013. “Next-Generation Sequencing Method Genotyping--Sequencing Highly Heterozygous Autotetraploid Potato.” PLOS ONE 8 (5). Public Library Science: 1–14. https://doi.org/10.1371/journal.pone.0062355.","code":""},{"path":"/articles/oracle_calculations.html","id":"abstract","dir":"Articles","previous_headings":"","what":"Abstract","title":"Oracle Calculations","text":"provide example usage oracle calculations available updog. particularly useful read-depth determination. calculations described detail Gerard et al. (2018).","code":""},{"path":"/articles/oracle_calculations.html","id":"controlling-misclassification-error","dir":"Articles","previous_headings":"","what":"Controlling Misclassification Error","title":"Oracle Calculations","text":"Suppose sample tetraploid individuals derived S1 cross (single generation selfing). Using domain expertise (either previous studies pilot analysis), ’ve determined sequencing technology produce relatively clean data. , sequencing error rate large (say, ~0.001), bias moderate (say, ~0.7 extreme), majority SNPs reasonable levels overdispersion (say, less 0.01). want know deep need sequence. Using oracle_mis, can see deep need sequence worst-case scenario want control (sequencing error rate = 0.001, bias = 0.7, overdispersion = 0.01) order obtain misclassification error rate , say, 0.05. , also need distribution offspring genotypes. can get distribution assuming various parental genotypes using get_q_array function. Typically, error rates larger allele-frequency closer 0.5. ’ll start worst-case scenario assuming parent 2 copies reference allele. genotype distribution offspring looks like: Now, ready iterate read-depth’s reach one error rate less 0.05. Looks like need depth 90 order get misclassification error rate 0.05. 
Note oracle_mis returns best misclassification error rate possible conditions (ploidy = 4, bias = 0.7, seq = 0.001, od = 0.01, pgeno = 2). actual analysis, worse misclassification error rate returned oracle_mis. However, lot individuals sample, act reasonable approximation error rate. general though, sequence little deeper suggested oracle_mis.","code":"bias <- 0.7 od <- 0.01 seq <- 0.001 maxerr <- 0.05 library(updog) ploidy <- 4 pgeno <- 2 gene_dist <- get_q_array(ploidy = ploidy)[pgeno + 1, pgeno + 1, ] library(ggplot2) distdf <- data.frame(x = 0:ploidy, y = 0, yend = gene_dist) ggplot(distdf, mapping = aes(x = x, y = y, xend = x, yend = yend)) + geom_segment(lineend = \"round\", lwd = 2) + theme_bw() + xlab(\"Allele Dosage\") + ylab(\"Probability\") err <- Inf depth <- 0 while(err > maxerr) { depth <- depth + 1 err <- oracle_mis(n = depth, ploidy = ploidy, seq = seq, bias = bias, od = od, dist = gene_dist) } depth #> [1] 90"},{"path":"/articles/oracle_calculations.html","id":"visualizing-the-joint-distribution","dir":"Articles","previous_headings":"","what":"Visualizing the Joint Distribution","title":"Oracle Calculations","text":"Suppose budget sequence depth 30. errors can expect? can use oracle_joint oracle_plot visualize errors can expect. errors mistakes genotypes 2/3 mistakes genotypes 1/2. Even though misclassification error rate pretty high (0.14), correlation oracle estimator true genotype pretty reasonable (0.89). 
can obtain using oracle_cor function.","code":"depth <- 30 jd <- oracle_joint(n = depth, ploidy = ploidy, seq = seq, bias = bias, od = od, dist = gene_dist) oracle_plot(jd) ocorr <- oracle_cor(n = depth, ploidy = ploidy, seq = seq, bias = bias, od = od, dist = gene_dist) ocorr #> [1] 0.8935101"},{"path":"/articles/oracle_calculations.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"Oracle Calculations","text":"Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco Garcia, Matthew Stephens. 2018. “Genotyping Polyploids Messy Sequencing Data.” Genetics 210 (3). Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468.","code":""},{"path":"/articles/simulate_ngs.html","id":"abstract","dir":"Articles","previous_headings":"","what":"Abstract","title":"Simulate Next-Generation Sequencing Data","text":"demonstrate simulate NGS data various genotype distributions, fit data using flexdog. genotyping methods described Gerard et al. (2018).","code":""},{"path":"/articles/simulate_ngs.html","id":"analysis","dir":"Articles","previous_headings":"","what":"Analysis","title":"Simulate Next-Generation Sequencing Data","text":"Let’s suppose 100 hexaploid individuals, varying levels read-depth. can simulate read-counts various genotype distributions, allele biases, overdispersions, sequencing error rates using rgeno rflexdog functions.","code":"set.seed(1) library(updog) nind <- 100 ploidy <- 6 sizevec <- round(stats::runif(n = nind, min = 50, max = 200))"},{"path":"/articles/simulate_ngs.html","id":"f1-population","dir":"Articles","previous_headings":"Analysis","what":"F1 Population","title":"Simulate Next-Generation Sequencing Data","text":"Suppose individuals siblings first parent 4 copies reference allele second parent 5 copies reference allele. following code, using rgeno, simulate individuals’ genotypes. genotypes, can simulate read-counts using rflexdog. 
Let’s suppose moderate level allelic bias (0.7) small level overdispersion (0.005). Generally, real data ’ve seen, bias range 0.5 2 overdispersion range 0 0.02, extremely overdispersed SNPs 0.02. plot data, looks realistic can test flexdog data flexdog gives us reasonable genotyping, accurately estimates proportion individuals mis-genotyped.","code":"true_geno <- rgeno(n = nind, ploidy = ploidy, model = \"f1\", p1geno = 4, p2geno = 5) refvec <- rflexdog(sizevec = sizevec, geno = true_geno, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.005) plot_geno(refvec = refvec, sizevec = sizevec, ploidy = ploidy, bias = 0.7, seq = 0.001, geno = true_geno) fout <- flexdog(refvec = refvec, sizevec = sizevec, ploidy = ploidy, model = \"f1\") #> Fit: 1 of 5 #> Initial Bias: 0.3678794 #> Log-Likelihood: -363.9369 #> Keeping new fit. #> #> Fit: 2 of 5 #> Initial Bias: 0.6065307 #> Log-Likelihood: -363.937 #> Keeping old fit. #> #> Fit: 3 of 5 #> Initial Bias: 1 #> Log-Likelihood: -363.9369 #> Keeping new fit. #> #> Fit: 4 of 5 #> Initial Bias: 1.648721 #> Log-Likelihood: -381.6123 #> Keeping old fit. #> #> Fit: 5 of 5 #> Initial Bias: 2.718282 #> Log-Likelihood: -412.8604 #> Keeping old fit. #> #> Done! 
plot(fout) ## Estimated proportion misgenotyped fout$prop_mis #> [1] 0.07011089 ## Actual proportion misgenotyped mean(fout$geno != true_geno) #> [1] 0.05"},{"path":"/articles/simulate_ngs.html","id":"hwe-population","dir":"Articles","previous_headings":"Analysis","what":"HWE Population","title":"Simulate Next-Generation Sequencing Data","text":"Now run simulations assuming individuals Hardy-Weinberg population allele frequency 0.75.","code":"true_geno <- rgeno(n = nind, ploidy = ploidy, model = \"hw\", allele_freq = 0.75) refvec <- rflexdog(sizevec = sizevec, geno = true_geno, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.005) fout <- flexdog(refvec = refvec, sizevec = sizevec, ploidy = ploidy, model = \"hw\") #> Fit: 1 of 5 #> Initial Bias: 0.3678794 #> Log-Likelihood: -377.9226 #> Keeping new fit. #> #> Fit: 2 of 5 #> Initial Bias: 0.6065307 #> Log-Likelihood: -377.9226 #> Keeping old fit. #> #> Fit: 3 of 5 #> Initial Bias: 1 #> Log-Likelihood: -377.9226 #> Keeping old fit. #> #> Fit: 4 of 5 #> Initial Bias: 1.648721 #> Log-Likelihood: -377.9226 #> Keeping new fit. #> #> Fit: 5 of 5 #> Initial Bias: 2.718282 #> Log-Likelihood: -377.9226 #> Keeping old fit. #> #> Done! plot(fout) ## Estimated proportion misgenotyped fout$prop_mis #> [1] 0.07625987 ## Actual proportion misgenotyped mean(fout$geno != true_geno) #> [1] 0.07 ## Estimated allele frequency close to true allele frequency fout$par$alpha #> [1] 0.7473264"},{"path":"/articles/simulate_ngs.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"Simulate Next-Generation Sequencing Data","text":"Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco Garcia, Matthew Stephens. 2018. “Genotyping Polyploids Messy Sequencing Data.” Genetics 210 (3). Genetics: 789–807. 
https://doi.org/10.1534/genetics.118.301468.","code":""},{"path":"/articles/smells_like_updog.html","id":"whats-updog","dir":"Articles","previous_headings":"","what":"What’s Updog?","title":"Example Use of Updog","text":"Updog package containing empirical Bayes approaches genotype individuals (particularly polyploids) next generation sequencing (NGS) data. mind NGS data results reduced representation library, “genotyping--sequencing” (GBS) (Elshire et al., 2011) “restriction site-associated DNA sequencing” (RAD-seq) (Baird et al., 2008). Updog wields power hierarchical modeling account key features NGS data overlooked analyses, particularly allelic bias overdispersion. Updog also automatically account sequencing errors. efficiently account features, updog needs know distribution individual genotypes population. function flexdog can accurately estimate distribution wide variety situations. can read updog method Gerard et al. (2018). vignette, go one example fitting flexdog S1 population individuals.","code":""},{"path":[]},{"path":"/articles/smells_like_updog.html","id":"fit-updog","dir":"Articles","previous_headings":"Example from an S1 Population","what":"Fit updog","title":"Example Use of Updog","text":"Load updog snpdat dataset. data frame snpdat contains three example SNPs (single nucleotide polymorphisms) study Shirasawa et al. (2017). individuals dataset resulted single generation selfing (S1 population). can read typing ?snpdat. ’ll extract First SNP. separate counts children parent (first individual). Note need parental counts fit updog, can help improve estimates parameters updog model. can first use plot_geno visualize raw data. Now use flexdog function fit model. 
use model = \"s1\" individuals resulted one generation selfing parent.","code":"set.seed(1) library(updog) data(\"snpdat\") smalldat <- snpdat[snpdat$snp == \"SNP1\", c(\"counts\", \"size\", \"id\")] head(smalldat) #> # A tibble: 6 × 3 #> counts size id #> #> 1 298 354 Xushu18 #> 2 187 187 Xushu18S1-001 #> 3 201 201 Xushu18S1-002 #> 4 157 184 Xushu18S1-003 #> 5 175 215 Xushu18S1-004 #> 6 283 283 Xushu18S1-005 pref <- smalldat$counts[1] psize <- smalldat$size[1] oref <- smalldat$counts[-1] osize <- smalldat$size[-1] ploidy <- 6 # sweet potatoes are hexaploid plot_geno(refvec = oref, sizevec = osize, ploidy = ploidy) uout <- flexdog(refvec = oref, sizevec = osize, ploidy = ploidy, model = \"s1\", p1ref = pref, p1size = psize) #> Fit: 1 of 5 #> Initial Bias: 0.3678794 #> Log-Likelihood: -592.9506 #> Keeping new fit. #> #> Fit: 2 of 5 #> Initial Bias: 0.6065307 #> Log-Likelihood: -592.9506 #> Keeping old fit. #> #> Fit: 3 of 5 #> Initial Bias: 1 #> Log-Likelihood: -538.1967 #> Keeping new fit. #> #> Fit: 4 of 5 #> Initial Bias: 1.648721 #> Log-Likelihood: -538.1963 #> Keeping new fit. #> #> Fit: 5 of 5 #> Initial Bias: 2.718282 #> Log-Likelihood: -538.1963 #> Keeping old fit. #> #> Done!"},{"path":"/articles/smells_like_updog.html","id":"analyze-output","dir":"Articles","previous_headings":"Example from an S1 Population","what":"Analyze Output","title":"Example Use of Updog","text":"use plot.flexdog visualize fit. Points color coded according genotype highest posterior probability. example, genotype “4” represents four copies reference allele two copies alternative allele (AAAAaa). level transparency proportional maximum posterior probability. equivalent posterior probability genotype estimate correct. lines represent mean counts given genotype. 
“+” symbol black dot location parent.","code":"plot(uout)"},{"path":"/articles/smells_like_updog.html","id":"filtering-snps","dir":"Articles","previous_headings":"Example from an S1 Population","what":"Filtering SNPs","title":"Example Use of Updog","text":"downstream analyses, might want filter poorly behaved SNPs. SNPs might poorly behaved variety reasons (might real SNPs, might much difficult map one allele correct location relative allele, etc). Updog gives measures filter SNPs. intuitive measure (posterior) proportion individuals mis-genotyped: SNP, expect 4.22 percent individuals mis-genotyped. specific cutoff use context data dependent. starting point, try loose cutoff keeping SNPs prop_mis less 0.2. simulation studies, also generally get rid SNPs overdispersion parameters greater 0.05 SNPs bias parameters either less 0.5 greater 2. However, higher lower read depths looked simulations, adjust levels accordingly.","code":"uout$prop_mis #> [1] 0.04216399"},{"path":"/articles/smells_like_updog.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"Example Use of Updog","text":"Baird, Paul D. Atwood, Nathan . Etter. 2008. “Rapid SNP Discovery Genetic Mapping Using Sequenced RAD Markers.” PLOS ONE 3 (10). Public Library Science: 1–7. https://doi.org/10.1371/journal.pone.0003376. Elshire, Jeffrey C. Sun, Robert J. Glaubitz. 2011. “Robust, Simple Genotyping--Sequencing (GBS) Approach High Diversity Species.” PLOS ONE 6 (5). Public Library Science: 1–10. https://doi.org/10.1371/journal.pone.0019379. Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco Garcia, Matthew Stephens. 2018. “Genotyping Polyploids Messy Sequencing Data.” Genetics 210 (3). Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468. Shirasawa, Kenta, Masaru Tanaka, Yasuhiro Takahata, Daifu Ma, Qinghe Cao, Qingchang Liu, Hong Zhai, et al. 2017. 
“High-Density SNP Genetic Map Consisting Complete Set Homologous Groups Autohexaploid Sweetpotato (Ipomoea batatas).” Scientific Reports 7. Nature Publishing Group. https://doi.org/10.1038/srep44207.","code":""},{"path":"/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"David Gerard. Author, maintainer.","code":""},{"path":"/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Gerard D, Ferr~ao L, Garcia , Stephens M (2018). “Genotyping Polyploids Messy Sequencing Data.” Genetics, 210(3), 789–807. ISSN 0016-6731, doi:10.1534/genetics.118.301468. Gerard D, Ferr~ao L (2020). “Priors Genotyping Polyploids.” Bioinformatics, 36(6), 1795–1800. ISSN 1367-4803, doi:10.1093/bioinformatics/btz852.","code":"@Article{, title = {Genotyping Polyploids from Messy Sequencing Data}, year = {2018}, journal = {Genetics}, publisher = {Genetics}, volume = {210}, number = {3}, pages = {789--807}, issn = {0016-6731}, doi = {10.1534/genetics.118.301468}, author = {David Gerard and Lu{\\'i}s Felipe Ventorim Ferr{\\~a}o and Antonio Augusto Franco Garcia and Matthew Stephens}, } @Article{, title = {Priors for Genotyping Polyploids}, year = {2020}, journal = {Bioinformatics}, publisher = {Oxford University Press}, volume = {36}, number = {6}, pages = {1795--1800}, issn = {1367-4803}, doi = {10.1093/bioinformatics/btz852}, author = {David Gerard and Lu{\\'i}s Felipe Ventorim Ferr{\\~a}o}, }"},{"path":"/index.html","id":"updog-","dir":"","previous_headings":"","what":"Flexible Genotyping for Polyploids","title":"Flexible Genotyping for Polyploids","text":"Updog provides suite methods genotyping polyploids next-generation sequencing (NGS) data. accounting many common features NGS data: allele bias, overdispersion, sequencing error. named updog “Using Parental Data Offspring Genotyping” originally developed method full-sib populations, works now general populations. 
method described detail Gerard et. al. (2018) . Additional details concerning prior specification described Gerard Ferrão (2020) . main functions flexdog() multidog(), provide many options distribution genotypes sample. novel genotype distribution included class proportional normal distributions (model = \"norm\"). default prior distribution robust varying genotype distributions, feel free use specialized priors information data. Also provided : filter_snp(): filter SNPs based output multidog(). format_multidog(): format output multidog() terms multidimensional array. Plot methods. flexdog() multidog() plot methods. See help files plot.flexdog() plot.multidog() details. Functions simulate genotypes (rgeno()) read-counts (rflexdog()). support models available flexdog(). Functions evaluate oracle genotyping performance: oracle_joint(), oracle_mis(), oracle_mis_vec(), oracle_cor(). mean “oracle” sense assume entire data generation process known (.e. genotype distribution, sequencing error rate, allele bias, overdispersion known). good approximations lot individuals (necessarily large read-depth). original updog package now named updogAlpha may found . See also ebg, fitPoly, polyRAD. best “competitor” probably fitPoly, though polyRAD nice ideas utilizing population structure linkage disequilibrium. 
See NEWS latest updates package.","code":""},{"path":"/index.html","id":"vignettes","dir":"","previous_headings":"","what":"Vignettes","title":"Flexible Genotyping for Polyploids","text":"’ve included many vignettes updog, can access online .","code":""},{"path":"/index.html","id":"bug-reports","dir":"","previous_headings":"","what":"Bug Reports","title":"Flexible Genotyping for Polyploids","text":"find bug want enhancement, please submit issue .","code":""},{"path":"/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Flexible Genotyping for Polyploids","text":"can install updog CRAN usual way: can install current (unstable) version updog GitHub :","code":"install.packages(\"updog\") # install.packages(\"devtools\") devtools::install_github(\"dcgerard/updog\")"},{"path":"/index.html","id":"how-to-cite","dir":"","previous_headings":"","what":"How to Cite","title":"Flexible Genotyping for Polyploids","text":"Please cite Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi: 10.1534/genetics.118.301468. , using BibTex: using proportional normal prior class (model = \"norm\"), also default prior, please also cite: Gerard D, Ferrão L (2020). “Priors Genotyping Polyploids.” Bioinformatics, 36(6), 1795-1800. ISSN 1367-4803, doi: 10.1093/bioinformatics/btz852. 
, using BibTex:","code":"@article {gerard2018genotyping, author = {Gerard, David and Ferr{\\~a}o, Lu{\\'i}s Felipe Ventorim and Garcia, Antonio Augusto Franco and Stephens, Matthew}, title = {Genotyping Polyploids from Messy Sequencing Data}, volume = {210}, number = {3}, pages = {789--807}, year = {2018}, doi = {10.1534/genetics.118.301468}, publisher = {Genetics}, issn = {0016-6731}, URL = {https://doi.org/10.1534/genetics.118.301468}, journal = {Genetics} } @article{gerard2020priors, title = {Priors for Genotyping Polyploids}, year = {2020}, journal = {Bioinformatics}, publisher = {Oxford University Press}, volume = {36}, number = {6}, pages = {1795--1800}, issn = {1367-4803}, doi = {10.1093/bioinformatics/btz852}, author = {David Gerard and Lu{\\'i}s Felipe Ventorim Ferr{\\~a}o}, }"},{"path":"/index.html","id":"code-of-conduct","dir":"","previous_headings":"","what":"Code of Conduct","title":"Flexible Genotyping for Polyploids","text":"Please note project released Contributor Code Conduct. participating project agree abide terms.","code":""},{"path":"/reference/betabinom.html","id":null,"dir":"Reference","previous_headings":"","what":"The Beta-Binomial Distribution — dbetabinom","title":"The Beta-Binomial Distribution — dbetabinom","text":"Density, distribution function, quantile function random generation beta-binomial distribution parameterized mean mu overdispersion parameter rho rather typical shape parameters.","code":""},{"path":"/reference/betabinom.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"The Beta-Binomial Distribution — dbetabinom","text":"","code":"dbetabinom(x, size, mu, rho, log) pbetabinom(q, size, mu, rho, log_p) qbetabinom(p, size, mu, rho) rbetabinom(n, size, mu, rho)"},{"path":"/reference/betabinom.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"The Beta-Binomial Distribution — dbetabinom","text":"x, q vector quantiles. size vector sizes. 
mu Either scalar mean observation, vector means observation, thus length x size. must 0 1. rho Either scalar overdispersion parameter observation, vector overdispersion parameters observation, thus length x size. must 0 1. log, log_p logical vector either length 1 length x size. determines whether return log probabilities observations (case length 1) observation (case length x size). p vector probabilities. n number observations.","code":""},{"path":"/reference/betabinom.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"The Beta-Binomial Distribution — dbetabinom","text":"Either random sample (rbetabinom), density (dbetabinom), tail probability (pbetabinom), quantile (qbetabinom) beta-binomial distribution.","code":""},{"path":"/reference/betabinom.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"The Beta-Binomial Distribution — dbetabinom","text":"Let \\(\\mu\\) \\(\\rho\\) mean overdispersion parameters. Let \\(\\alpha\\) \\(\\beta\\) usual shape parameters beta distribution. relation $$\\mu = \\alpha/(\\alpha + \\beta),$$ $$\\rho = 1/(1 + \\alpha + \\beta).$$ necessarily means $$\\alpha = \\mu (1 - \\rho)/\\rho,$$ $$\\beta = (1 - \\mu) (1 - \\rho)/\\rho.$$","code":""},{"path":"/reference/betabinom.html","id":"functions","dir":"Reference","previous_headings":"","what":"Functions","title":"The Beta-Binomial Distribution — dbetabinom","text":"dbetabinom(): Density function. pbetabinom(): Distribution function. qbetabinom(): Quantile function. 
rbetabinom(): Random generation.","code":""},{"path":"/reference/betabinom.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"The Beta-Binomial Distribution — dbetabinom","text":"David Gerard","code":""},{"path":"/reference/betabinom.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"The Beta-Binomial Distribution — dbetabinom","text":"","code":"x <- rbetabinom(n = 10, size = 10, mu = 0.1, rho = 0.01) dbetabinom(x = 1, size = 10, mu = 0.1, rho = 0.01, log = FALSE) #> [1] 0.3689335 pbetabinom(q = 1, size = 10, mu = 0.1, rho = 0.01, log_p = FALSE) #> [1] 0.7345131 qbetabinom(p = 0.6, size = 10, mu = 0.1, rho = 0.01) #> [1] 1"},{"path":"/reference/filter_snp.html","id":null,"dir":"Reference","previous_headings":"","what":"Filter SNPs based on the output of multidog(). — filter_snp","title":"Filter SNPs based on the output of multidog(). — filter_snp","text":"Filter based provided logical predicates terms variable names x$snpdf. function filters x$snpdf x$inddf.","code":""},{"path":"/reference/filter_snp.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Filter SNPs based on the output of multidog(). — filter_snp","text":"","code":"filter_snp(x, expr)"},{"path":"/reference/filter_snp.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Filter SNPs based on the output of multidog(). — filter_snp","text":"x output multidog. expr Logical predicate expression defined terms variables x$snpdf. SNPs condition evaluates TRUE kept.","code":""},{"path":[]},{"path":"/reference/filter_snp.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Filter SNPs based on the output of multidog(). 
— filter_snp","text":"David Gerard","code":""},{"path":"/reference/filter_snp.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Filter SNPs based on the output of multidog(). — filter_snp","text":"","code":"if (FALSE) { data(\"uitdewilligen\") mout <- multidog(refmat = t(uitdewilligen$refmat), sizemat = t(uitdewilligen$sizemat), ploidy = uitdewilligen$ploidy, nc = 2) ## The following filters are for educational purposes only and should ## not be taken as a default filter: mout2 <- filter_snp(mout, bias < 0.8 & od < 0.003) }"},{"path":"/reference/flexdog.html","id":null,"dir":"Reference","previous_headings":"","what":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","text":"Genotype polyploid individuals next generation sequencing (NGS) data assuming genotype distribution one several forms. flexdog accounting allele bias, overdispersion, sequencing error. method described detail Gerard et al. (2018) Gerard Ferrão (2020). See multidog() running flexdog multiple SNPs parallel.","code":""},{"path":"/reference/flexdog.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","text":"","code":"flexdog( refvec, sizevec, ploidy, model = c(\"norm\", \"hw\", \"bb\", \"s1\", \"s1pp\", \"f1\", \"f1pp\", \"flex\", \"uniform\", \"custom\"), p1ref = NULL, p1size = NULL, p2ref = NULL, p2size = NULL, snpname = NULL, bias_init = exp(c(-1, -0.5, 0, 0.5, 1)), verbose = TRUE, prior_vec = NULL, ... )"},{"path":"/reference/flexdog.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","text":"refvec vector counts reads reference allele. sizevec vector total counts. ploidy ploidy species. 
Assumed individual. model form prior (genotype distribution) take? See Details possible values. p1ref reference counts first parent model = \"f1\" model = \"f1pp\", parent model = \"s1\" model = \"s1pp\". p1size total counts first parent model = \"f1\" model = \"f1pp\", parent model = \"s1\" model = \"s1pp\". p2ref reference counts second parent model = \"f1\" model = \"f1pp\". p2size total counts second parent model = \"f1\" model = \"f1pp\". snpname string. name SNP consideration. just returned input list reference. bias_init vector initial values bias parameter multiple runs flexdog_full(). verbose output (TRUE) less (FALSE)? prior_vec pre-specified genotype distribution. used model = \"custom\" must otherwise NULL. specified, vector length ploidy + 1 non-negative elements sum 1. ... Additional parameters pass flexdog_full().","code":""},{"path":"/reference/flexdog.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","text":"object class flexdog, consists list following elements: bias estimated bias parameter. seq estimated sequencing error rate. od estimated overdispersion parameter. num_iter number EM iterations ran. wary equals itermax. llike maximum marginal log-likelihood. postmat matrix posterior probabilities genotype individual. rows index individuals columns index allele dosage. genologlike matrix genotype log-likelihoods genotype individual. rows index individuals columns index allele dosage. gene_dist estimated genotype distribution. ith element proportion individuals genotype -1. par list final estimates parameters genotype distribution. elements included par depends value model: model = \"norm\": mu: normal mean. sigma: normal standard deviation (variance). model = \"hw\": alpha: major allele frequency. model = \"bb\": alpha: major allele frequency. tau: overdispersion parameter. See description rho Details betabinom(). 
model = \"s1\": pgeno: allele dosage parent. alpha: mixture proportion discrete uniform (included fixed small value mostly numerical stability reasons). See description fs1_alpha flexdog_full(). model = \"f1\": p1geno: allele dosage first parent. p2geno: allele dosage second parent. alpha: mixture proportion discrete uniform (included fixed small value mostly numerical stability reasons). See description fs1_alpha flexdog_full(). model = \"s1pp\": ell1: estimated dosage parent. tau1: estimated double reduction parameter parent. Available ell1 1, 2, 3. Identified ell1 1 3. gamma1: estimated preferential pairing parameter. Available ell1 2. However, returned identified form. alpha: mixture proportion discrete uniform (included fixed small value mostly numerical stability reasons). See description fs1_alpha flexdog_full(). model = \"f1pp\": ell1: estimated dosage parent 1. ell2: estimated dosage parent 2. tau1: estimated double reduction parameter parent 1. Available ell1 1, 2, 3. Identified ell1 1 3. tau2: estimated double reduction parameter parent 2. Available ell2 1, 2, 3. Identified ell2 1 3. gamma1: estimated preferential pairing parameter parent 1. Available ell1 2. However, returned identified form. gamma2: estimated preferential pairing parameter parent 2. Available ell2 2. However, returned identified form. alpha: mixture proportion discrete uniform (included fixed small value mostly numerical stability reasons). See description fs1_alpha flexdog_full(). model = \"flex\": par empty list. model = \"uniform\": par empty list. model = \"custom\": par empty list. geno posterior mode genotype. genotype estimates. maxpostprob maximum posterior probability. equivalent posterior probability correctly genotyping individual. postmean posterior mean genotype. downstream association studies, might want consider using estimates. input$refvec value refvec provided user. input$sizevec value sizevec provided user. input$ploidy value ploidy provided user. 
input$model value model provided user. input$p1ref value p1ref provided user. input$p1size value p1size provided user. input$p2ref value p2ref provided user. input$p2size value p2size provided user. input$snpname value snpname provided user. prop_mis posterior proportion individuals genotyped incorrectly.","code":""},{"path":"/reference/flexdog.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","text":"Possible values genotype distribution (values model) : \"norm\" distribution whose genotype frequencies proportional density value normal mean standard deviation. Unlike \"bb\" \"hw\" options, allow distributions less dispersed binomial. seems robust violations modeling assumptions, default. prior class developed Gerard Ferrão (2020). \"hw\" binomial distribution results assuming population Hardy-Weinberg equilibrium (HWE). actually pretty well even minor moderate deviations HWE. Though perform well `\"norm\"` option severe deviations HWE. \"bb\" beta-binomial distribution. overdispersed version \"hw\" can derived special case Balding-Nichols model. \"s1\" prior assumes individuals full-siblings resulting one generation selfing. .e. one parent. model assumes particular type meiotic behavior: polysomic inheritance bivalent, non-preferential pairing. \"f1\" prior assumes individuals full-siblings resulting one generation bi-parental cross. model assumes particular type meiotic behavior: polysomic inheritance bivalent, non-preferential pairing. \"f1pp\" prior allows double reduction preferential pairing F1 population tetraploids. \"s1pp\" prior allows double reduction preferential pairing S1 population tetraploids. \"flex\" Generically categorical distribution. Theoretically, works well lot individuals. practice, seems much less robust violations modeling assumptions. \"uniform\" discrete uniform distribution. never used practice. 
\"custom\" pre-specified prior distribution. specify using prior_vec argument. almost never use option practice. might think good default model = \"uniform\" somehow \"uninformative prior.\" informative tends work horribly practice. intuition estimate allele bias sequencing error rates estimated genotypes approximately uniform (since assuming approximately uniform). usually result unintuitive genotyping since populations uniform genotype distribution. include option completeness. Please use . value prop_mis intuitive measure quality SNP. prop_mis posterior proportion individuals mis-genotyped. want SNPS accurately genotype, say, 95% individuals, discard SNPs prop_mis 0.05. value maxpostprob intuitive measure quality genotype estimate individual. posterior probability correctly genotyping individual using geno (posterior mode) genotype estimate. want correctly genotype, say, 95% individuals, discard individuals maxpostprob 0.95. However, just going impute missing genotypes later, might consider discarding individuals flexdog's genotype estimates probably accurate naive approaches, imputing using grand mean. datasets examined, allelic bias major issue. However, may fit model assuming allelic bias setting update_bias = FALSE bias_init = 1. Prior using flexdog, read-mapping step, try get rid allelic bias using WASP (doi:10.1101/011221 ). successful removing allelic bias (source read-mapping step), setting update_bias = FALSE bias_init = 1 reasonable. can visually inspect SNPs bias using plot_geno(). flexdog(), like methods, invariant allele label \"reference\" label \"alternative\". , set refvec number alternative read-counts, resulting genotype estimates estimated allele dosage alternative allele.","code":""},{"path":"/reference/flexdog.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . 
F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 . Gerard, David, Luís Felipe Ventorim Ferrão. \"Priors genotyping polyploids.\" Bioinformatics 36, . 6 (2020): 1795-1800. doi:10.1093/bioinformatics/btz852 .","code":""},{"path":[]},{"path":"/reference/flexdog.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","text":"David Gerard","code":""},{"path":"/reference/flexdog.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog","text":"","code":"# \\donttest{ ## An S1 population where the first individual ## is the parent. data(\"snpdat\") ploidy <- 6 refvec <- snpdat$counts[snpdat$snp == \"SNP2\"] sizevec <- snpdat$size[snpdat$snp == \"SNP2\"] fout <- flexdog(refvec = refvec[-1], sizevec = sizevec[-1], ploidy = ploidy, model = \"s1\", p1ref = refvec[1], p1size = sizevec[1]) #> Fit: 1 of 5 #> Initial Bias: 0.3678794 #> Log-Likelihood: -557.6433 #> Keeping new fit. #> #> Fit: 2 of 5 #> Initial Bias: 0.6065307 #> Log-Likelihood: -519.2793 #> Keeping new fit. #> #> Fit: 3 of 5 #> Initial Bias: 1 #> Log-Likelihood: -519.2793 #> Keeping old fit. #> #> Fit: 4 of 5 #> Initial Bias: 1.648721 #> Log-Likelihood: -519.2793 #> Keeping new fit. #> #> Fit: 5 of 5 #> Initial Bias: 2.718282 #> Log-Likelihood: -519.2793 #> Keeping new fit. #> #> Done! plot(fout) #> Warning: Removed 1 rows containing missing values (`geom_point()`). # } ## A natural population. We will assume a ## normal prior since there are so few ## individuals. 
data(\"uitdewilligen\") ploidy <- 4 refvec <- uitdewilligen$refmat[, 1] sizevec <- uitdewilligen$sizemat[, 1] fout <- flexdog(refvec = refvec, sizevec = sizevec, ploidy = ploidy, model = \"norm\") #> Fit: 1 of 5 #> Initial Bias: 0.3678794 #> Log-Likelihood: -14.66905 #> Keeping new fit. #> #> Fit: 2 of 5 #> Initial Bias: 0.6065307 #> Log-Likelihood: -14.66905 #> Keeping new fit. #> #> Fit: 3 of 5 #> Initial Bias: 1 #> Log-Likelihood: -15.44144 #> Keeping old fit. #> #> Fit: 4 of 5 #> Initial Bias: 1.648721 #> Log-Likelihood: -15.44141 #> Keeping old fit. #> #> Fit: 5 of 5 #> Initial Bias: 2.718282 #> Log-Likelihood: -15.44141 #> Keeping old fit. #> #> Done! plot(fout)"},{"path":"/reference/flexdog_full.html","id":null,"dir":"Reference","previous_headings":"","what":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog_full","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog_full","text":"Genotype polyploid individuals next generation sequencing (NGS) data assuming genotype distribution one several forms. flexdog_full() accounting allele bias, overdispersion, sequencing error. function options flexdog meant expert users. method described detail Gerard et al. (2018) Gerard Ferrão (2020).","code":""},{"path":"/reference/flexdog_full.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Flexible genotyping for polyploids from next-generation sequencing data. 
— flexdog_full","text":"","code":"flexdog_full( refvec, sizevec, ploidy, model = c(\"norm\", \"hw\", \"bb\", \"s1\", \"s1pp\", \"f1\", \"f1pp\", \"flex\", \"uniform\", \"custom\"), verbose = TRUE, mean_bias = 0, var_bias = 0.7^2, mean_seq = -4.7, var_seq = 1, mean_od = -5.5, var_od = 0.5^2, seq = 0.005, bias = 1, od = 0.001, update_bias = TRUE, update_seq = TRUE, update_od = TRUE, itermax = 200, tol = 10^-4, fs1_alpha = 10^-3, p1ref = NULL, p1size = NULL, p2ref = NULL, p2size = NULL, snpname = NULL, prior_vec = NULL, seq_upper = 0.05 )"},{"path":"/reference/flexdog_full.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog_full","text":"refvec vector counts reads reference allele. sizevec vector total counts. ploidy ploidy species. Assumed individual. model form prior (genotype distribution) take? See Details possible values. verbose output (TRUE) less (FALSE)? mean_bias prior mean log-bias. var_bias prior variance log-bias. mean_seq prior mean logit sequencing error rate. var_seq prior variance logit sequencing error rate. mean_od prior mean logit overdispersion parameter. var_od prior variance logit overdispersion parameter. seq starting value sequencing error rate. bias starting value bias. od starting value overdispersion parameter. update_bias logical. update bias (TRUE), (FALSE)? update_seq logical. update seq (TRUE), (FALSE)? update_od logical. update od (TRUE), (FALSE)? itermax total number EM iterations run. tol tolerance stopping criterion. EM algorithm stop difference log-likelihoods two consecutive iterations less tol. fs1_alpha value fix mixing proportion uniform component model = \"f1\", model = \"s1\", model = \"f1pp\", model = \"s1pp\". recommend small value 10^-3. p1ref reference counts first parent model = \"f1\" model = \"f1pp\", parent model = \"s1\" model = \"s1pp\". 
p1size total counts first parent model = \"f1\" model = \"f1pp\", parent model = \"s1\" model = \"s1pp\". p2ref reference counts second parent model = \"f1\" model = \"f1pp\". p2size total counts second parent model = \"f1\" model = \"f1pp\". snpname string. name SNP consideration. just returned input list reference. prior_vec pre-specified genotype distribution. used model = \"custom\" must otherwise NULL. specified, vector length ploidy + 1 non-negative elements sum 1. seq_upper upper bound possible sequencing error rate. Default 0.05, adjust prior knowledge sequencing error rate sequencing technology.","code":""},{"path":"/reference/flexdog_full.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog_full","text":"object class flexdog, consists list following elements: bias estimated bias parameter. seq estimated sequencing error rate. od estimated overdispersion parameter. num_iter number EM iterations ran. wary equals itermax. llike maximum marginal log-likelihood. postmat matrix posterior probabilities genotype individual. rows index individuals columns index allele dosage. genologlike matrix genotype log-likelihoods genotype individual. rows index individuals columns index allele dosage. gene_dist estimated genotype distribution. ith element proportion individuals genotype -1. par list final estimates parameters genotype distribution. elements included par depends value model: model = \"norm\": mu: normal mean. sigma: normal standard deviation (variance). model = \"hw\": alpha: major allele frequency. model = \"bb\": alpha: major allele frequency. tau: overdispersion parameter. See description rho Details betabinom(). model = \"s1\": pgeno: allele dosage parent. alpha: mixture proportion discrete uniform (included fixed small value mostly numerical stability reasons). See description fs1_alpha flexdog_full(). 
model = \"f1\": p1geno: allele dosage first parent. p2geno: allele dosage second parent. alpha: mixture proportion discrete uniform (included fixed small value mostly numerical stability reasons). See description fs1_alpha flexdog_full(). model = \"s1pp\": ell1: estimated dosage parent. tau1: estimated double reduction parameter parent. Available ell1 1, 2, 3. Identified ell1 1 3. gamma1: estimated preferential pairing parameter. Available ell1 2. However, returned identified form. alpha: mixture proportion discrete uniform (included fixed small value mostly numerical stability reasons). See description fs1_alpha flexdog_full(). model = \"f1pp\": ell1: estimated dosage parent 1. ell2: estimated dosage parent 2. tau1: estimated double reduction parameter parent 1. Available ell1 1, 2, 3. Identified ell1 1 3. tau2: estimated double reduction parameter parent 2. Available ell2 1, 2, 3. Identified ell2 1 3. gamma1: estimated preferential pairing parameter parent 1. Available ell1 2. However, returned identified form. gamma2: estimated preferential pairing parameter parent 2. Available ell2 2. However, returned identified form. alpha: mixture proportion discrete uniform (included fixed small value mostly numerical stability reasons). See description fs1_alpha flexdog_full(). model = \"flex\": par empty list. model = \"uniform\": par empty list. model = \"custom\": par empty list. geno posterior mode genotype. genotype estimates. maxpostprob maximum posterior probability. equivalent posterior probability correctly genotyping individual. postmean posterior mean genotype. downstream association studies, might want consider using estimates. input$refvec value refvec provided user. input$sizevec value sizevec provided user. input$ploidy value ploidy provided user. input$model value model provided user. input$p1ref value p1ref provided user. input$p1size value p1size provided user. input$p2ref value p2ref provided user. input$p2size value p2size provided user. 
input$snpname value snpname provided user. prop_mis posterior proportion individuals genotyped incorrectly.","code":""},{"path":"/reference/flexdog_full.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog_full","text":"Possible values genotype distribution (values model) : \"norm\" distribution whose genotype frequencies proportional density value normal mean standard deviation. Unlike \"bb\" \"hw\" options, allow distributions less dispersed binomial. seems robust violations modeling assumptions, default. prior class developed Gerard Ferrão (2020). \"hw\" binomial distribution results assuming population Hardy-Weinberg equilibrium (HWE). actually pretty well even minor moderate deviations HWE. Though perform well `\"norm\"` option severe deviations HWE. \"bb\" beta-binomial distribution. overdispersed version \"hw\" can derived special case Balding-Nichols model. \"s1\" prior assumes individuals full-siblings resulting one generation selfing. .e. one parent. model assumes particular type meiotic behavior: polysomic inheritance bivalent, non-preferential pairing. \"f1\" prior assumes individuals full-siblings resulting one generation bi-parental cross. model assumes particular type meiotic behavior: polysomic inheritance bivalent, non-preferential pairing. \"f1pp\" prior allows double reduction preferential pairing F1 population tetraploids. \"s1pp\" prior allows double reduction preferential pairing S1 population tetraploids. \"flex\" Generically categorical distribution. Theoretically, works well lot individuals. practice, seems much less robust violations modeling assumptions. \"uniform\" discrete uniform distribution. never used practice. \"custom\" pre-specified prior distribution. specify using prior_vec argument. almost never use option practice. 
might think good default model = \"uniform\" somehow \"uninformative prior.\" informative tends work horribly practice. intuition estimate allele bias sequencing error rates estimated genotypes approximately uniform (since assuming approximately uniform). usually result unintuitive genotyping since populations uniform genotype distribution. include option completeness. Please use . value prop_mis intuitive measure quality SNP. prop_mis posterior proportion individuals mis-genotyped. want SNPS accurately genotype, say, 95% individuals, discard SNPs prop_mis 0.05. value maxpostprob intuitive measure quality genotype estimate individual. posterior probability correctly genotyping individual using geno (posterior mode) genotype estimate. want correctly genotype, say, 95% individuals, discard individuals maxpostprob 0.95. However, just going impute missing genotypes later, might consider discarding individuals flexdog's genotype estimates probably accurate naive approaches, imputing using grand mean. datasets examined, allelic bias major issue. However, may fit model assuming allelic bias setting update_bias = FALSE bias_init = 1. Prior using flexdog, read-mapping step, try get rid allelic bias using WASP (doi:10.1101/011221 ). successful removing allelic bias (source read-mapping step), setting update_bias = FALSE bias_init = 1 reasonable. can visually inspect SNPs bias using plot_geno(). flexdog(), like methods, invariant allele label \"reference\" label \"alternative\". , set refvec number alternative read-counts, resulting genotype estimates estimated allele dosage alternative allele.","code":""},{"path":"/reference/flexdog_full.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog_full","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. 
doi:10.1534/genetics.118.301468 . Gerard, David, Luís Felipe Ventorim Ferrão. \"Priors genotyping polyploids.\" Bioinformatics 36, . 6 (2020): 1795-1800. doi:10.1093/bioinformatics/btz852 .","code":""},{"path":[]},{"path":"/reference/flexdog_full.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog_full","text":"David Gerard","code":""},{"path":"/reference/flexdog_full.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Flexible genotyping for polyploids from next-generation sequencing data. — flexdog_full","text":"","code":"## A natural population. We will assume a ## normal prior since there are so few ## individuals. data(\"uitdewilligen\") ploidy <- 4 refvec <- uitdewilligen$refmat[, 1] sizevec <- uitdewilligen$sizemat[, 1] fout <- flexdog_full(refvec = refvec, sizevec = sizevec, ploidy = ploidy, model = \"norm\") plot(fout)"},{"path":"/reference/format_multidog.html","id":null,"dir":"Reference","previous_headings":"","what":"Return arrayicized elements from the output of multidog. — format_multidog","title":"Return arrayicized elements from the output of multidog. — format_multidog","text":"function allow genotype estimates, maximum posterior probability, values form matrix/array. multiple variable names provided, data formatted 3-dimensional array dimensions corresponding (individuals, SNPs, variables).","code":""},{"path":"/reference/format_multidog.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Return arrayicized elements from the output of multidog. — format_multidog","text":"","code":"format_multidog(x, varname = \"geno\")"},{"path":"/reference/format_multidog.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Return arrayicized elements from the output of multidog. — format_multidog","text":"x output multidog. 
varname character vector variable names whose values populate cells. column names x$inddf.","code":""},{"path":"/reference/format_multidog.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Return arrayicized elements from the output of multidog. — format_multidog","text":"Note order individuals reshuffled. order SNPs x$snpdf.","code":""},{"path":"/reference/format_multidog.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Return arrayicized elements from the output of multidog. — format_multidog","text":"David Gerard","code":""},{"path":"/reference/get_q_array.html","id":null,"dir":"Reference","previous_headings":"","what":"Return the probabilities of an offspring's genotype given its\nparental genotypes for all possible combinations of parental and\noffspring genotypes. This is for species with polysomal inheritance\nand bivalent, non-preferential pairing. — get_q_array","title":"Return the probabilities of an offspring's genotype given its\nparental genotypes for all possible combinations of parental and\noffspring genotypes. This is for species with polysomal inheritance\nand bivalent, non-preferential pairing. — get_q_array","text":"Return probabilities offspring's genotype given parental genotypes possible combinations parental offspring genotypes. species polysomal inheritance bivalent, non-preferential pairing.","code":""},{"path":"/reference/get_q_array.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Return the probabilities of an offspring's genotype given its\nparental genotypes for all possible combinations of parental and\noffspring genotypes. This is for species with polysomal inheritance\nand bivalent, non-preferential pairing. 
— get_q_array","text":"","code":"get_q_array(ploidy)"},{"path":"/reference/get_q_array.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Return the probabilities of an offspring's genotype given its\nparental genotypes for all possible combinations of parental and\noffspring genotypes. This is for species with polysomal inheritance\nand bivalent, non-preferential pairing. — get_q_array","text":"ploidy positive integer. ploidy species.","code":""},{"path":"/reference/get_q_array.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Return the probabilities of an offspring's genotype given its\nparental genotypes for all possible combinations of parental and\noffspring genotypes. This is for species with polysomal inheritance\nand bivalent, non-preferential pairing. — get_q_array","text":"three-way array proportions. (, j, k)th element probability offspring k - 1 reference alleles given parent 1 - 1 reference alleles parent 2 j - 1 reference alleles. dimension array ploidy + 1. dimension names, \"\" stands reference allele \"\" stands alternative allele.","code":""},{"path":"/reference/get_q_array.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Return the probabilities of an offspring's genotype given its\nparental genotypes for all possible combinations of parental and\noffspring genotypes. This is for species with polysomal inheritance\nand bivalent, non-preferential pairing. — get_q_array","text":"David Gerard","code":""},{"path":"/reference/get_q_array.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Return the probabilities of an offspring's genotype given its\nparental genotypes for all possible combinations of parental and\noffspring genotypes. This is for species with polysomal inheritance\nand bivalent, non-preferential pairing. 
— get_q_array","text":"","code":"qarray <- get_q_array(6) apply(qarray, c(1, 2), sum) ## should all be 1's. #> parent2 #> parent1 aaaaaa Aaaaaa AAaaaa AAAaaa AAAAaa AAAAAa AAAAAA #> aaaaaa 1 1 1 1 1 1 1 #> Aaaaaa 1 1 1 1 1 1 1 #> AAaaaa 1 1 1 1 1 1 1 #> AAAaaa 1 1 1 1 1 1 1 #> AAAAaa 1 1 1 1 1 1 1 #> AAAAAa 1 1 1 1 1 1 1 #> AAAAAA 1 1 1 1 1 1 1"},{"path":"/reference/is.flexdog.html","id":null,"dir":"Reference","previous_headings":"","what":"Tests if an argument is a flexdog object. — is.flexdog","title":"Tests if an argument is a flexdog object. — is.flexdog","text":"Tests argument flexdog object.","code":""},{"path":"/reference/is.flexdog.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tests if an argument is a flexdog object. — is.flexdog","text":"","code":"is.flexdog(x)"},{"path":"/reference/is.flexdog.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tests if an argument is a flexdog object. — is.flexdog","text":"x Anything.","code":""},{"path":"/reference/is.flexdog.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Tests if an argument is a flexdog object. — is.flexdog","text":"logical. TRUE x flexdog object, FALSE otherwise.","code":""},{"path":"/reference/is.flexdog.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Tests if an argument is a flexdog object. — is.flexdog","text":"David Gerard","code":""},{"path":"/reference/is.flexdog.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tests if an argument is a flexdog object. — is.flexdog","text":"","code":"is.flexdog(\"anything\") #> [1] FALSE # FALSE"},{"path":"/reference/is.multidog.html","id":null,"dir":"Reference","previous_headings":"","what":"Tests if an argument is a multidog object. — is.multidog","title":"Tests if an argument is a multidog object. 
— is.multidog","text":"Tests argument multidog object.","code":""},{"path":"/reference/is.multidog.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tests if an argument is a multidog object. — is.multidog","text":"","code":"is.multidog(x)"},{"path":"/reference/is.multidog.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tests if an argument is a multidog object. — is.multidog","text":"x Anything.","code":""},{"path":"/reference/is.multidog.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Tests if an argument is a multidog object. — is.multidog","text":"logical. TRUE x multidog object, FALSE otherwise.","code":""},{"path":"/reference/is.multidog.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Tests if an argument is a multidog object. — is.multidog","text":"David Gerard","code":""},{"path":"/reference/is.multidog.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tests if an argument is a multidog object. — is.multidog","text":"","code":"is.multidog(\"anything\") #> [1] FALSE # FALSE"},{"path":"/reference/log_sum_exp.html","id":null,"dir":"Reference","previous_headings":"","what":"Log-sum-exponential trick. — log_sum_exp","title":"Log-sum-exponential trick. — log_sum_exp","text":"Log-sum-exponential trick.","code":""},{"path":"/reference/log_sum_exp.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Log-sum-exponential trick. — log_sum_exp","text":"","code":"log_sum_exp(x)"},{"path":"/reference/log_sum_exp.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Log-sum-exponential trick. — log_sum_exp","text":"x vector log-sum-exp.","code":""},{"path":"/reference/log_sum_exp.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Log-sum-exponential trick. 
— log_sum_exp","text":"log sum exponential elements x.","code":""},{"path":"/reference/log_sum_exp.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Log-sum-exponential trick. — log_sum_exp","text":"David Gerard","code":""},{"path":"/reference/log_sum_exp_2.html","id":null,"dir":"Reference","previous_headings":"","what":"Log-sum-exponential trick using just two doubles. — log_sum_exp_2","title":"Log-sum-exponential trick using just two doubles. — log_sum_exp_2","text":"Log-sum-exponential trick using just two doubles.","code":""},{"path":"/reference/log_sum_exp_2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Log-sum-exponential trick using just two doubles. — log_sum_exp_2","text":"","code":"log_sum_exp_2(x, y)"},{"path":"/reference/log_sum_exp_2.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Log-sum-exponential trick using just two doubles. — log_sum_exp_2","text":"x double. y Another double.","code":""},{"path":"/reference/log_sum_exp_2.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Log-sum-exponential trick using just two doubles. — log_sum_exp_2","text":"log sum exponential x y.","code":""},{"path":"/reference/log_sum_exp_2.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Log-sum-exponential trick using just two doubles. — log_sum_exp_2","text":"David Gerard","code":""},{"path":"/reference/multidog.html","id":null,"dir":"Reference","previous_headings":"","what":"Fit flexdog to multiple SNPs. — multidog","title":"Fit flexdog to multiple SNPs. — multidog","text":"convenience function run flexdog many SNPs. Support provided parallel computing future package. function extensively tested. 
Please report bugs https://github.com/dcgerard/updog/issues.","code":""},{"path":"/reference/multidog.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fit flexdog to multiple SNPs. — multidog","text":"","code":"multidog( refmat, sizemat, ploidy, model = c(\"norm\", \"hw\", \"bb\", \"s1\", \"s1pp\", \"f1\", \"f1pp\", \"flex\", \"uniform\", \"custom\"), nc = 1, p1_id = NULL, p2_id = NULL, bias_init = exp(c(-1, -0.5, 0, 0.5, 1)), prior_vec = NULL, ... )"},{"path":"/reference/multidog.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Fit flexdog to multiple SNPs. — multidog","text":"refmat matrix reference read counts. columns index individuals rows index markers (SNPs). matrix must rownames (names markers) column names (names individuals). names must match names sizemat. sizemat matrix total read counts. columns index individuals rows index markers (SNPs). matrix must rownames (names markers) column names (names individuals). names must match names refmat. ploidy ploidy species. Assumed individual. model form prior (genotype distribution) take? See Details possible values. nc number computing cores use parallelization local machine. See section \"Parallel Computation\" implement complicated evaluation strategies using future package. specifying evaluation strategies using future package, also set nc = NA. value nc never number cores available computing environment. can determine maximum number available cores running future::availableCores() R. p1_id ID first parent. character length 1. correspond single column name refmat sizemat. p2_id ID second parent. character length 1. correspond single column name refmat sizemat. bias_init vector initial values bias parameter multiple runs flexdog_full(). prior_vec pre-specified genotype distribution. used model = \"custom\" must otherwise NULL. specified, vector length ploidy + 1 non-negative elements sum 1. ... 
Additional parameters pass flexdog_full().","code":""},{"path":"/reference/multidog.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Fit flexdog to multiple SNPs. — multidog","text":"list-like object two data frames. snpdf data frame containing properties SNPs (markers). rows index SNPs. variables include: snp name SNP (marker). bias estimated allele bias SNP. seq estimated sequencing error rate SNP. od estimated overdispersion parameter SNP. prop_mis estimated proportion individuals misclassified SNP. num_iter number iterations performed EM algorithm SNP. llike maximum marginal likelihood SNP. ploidy provided ploidy species. model provided model prior genotype distribution. p1ref user-provided reference read counts parent 1. p1size user-provided total read counts parent 1. p2ref user-provided reference read counts parent 2. p2size user-provided total read counts parent 2. Pr_k estimated frequency individuals genotype k, k can integer 0 ploidy level. Model specific parameter estimates See return value par help page flexdog. inddf data frame containing properties individuals SNP. variables include: snp name SNP (marker). ind name individual. ref provided reference counts individual SNP. size provided total counts individual SNP. geno posterior mode genotype individual SNP. estimated reference allele dosage given individual given SNP. postmean posterior mean genotype individual SNP. continuous genotype estimate reference allele dosage given individual given SNP. maxpostprob maximum posterior probability. posterior probability individual genotyped correctly. Pr_k posterior probability given individual given SNP genotype k, k can vary 0 ploidy level species. logL_k genotype log-likelihoods dosage k given individual given SNP, k can vary from 0 ploidy level species.","code":""},{"path":"/reference/multidog.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Fit flexdog to multiple SNPs. 
— multidog","text":"format reference counts total read counts two separate matrices. rows index markers (SNPs) columns index individuals. Row names ID SNPs column names ID individuals, required attributes. data VCF files, recommend importing using VariantAnnotation package Bioconductor https://bioconductor.org/packages/VariantAnnotation/. great VCF parser. See details flexdog possible values model. model = \"f1\", model = \"s1\", model = \"f1pp\" model = \"s1pp\" user may provide individual ID parent(s) via p1_id p2_id arguments. output list containing two data frames. first data frame, called snpdf, contains information SNP, allele bias sequencing error rate. second data frame, called inddf, contains information individual SNP, estimated genotype posterior probability classified correctly. SNPs contain 0 reads (missing data) entirely removed.","code":""},{"path":"/reference/multidog.html","id":"parallel-computation","dir":"Reference","previous_headings":"","what":"Parallel Computation","title":"Fit flexdog to multiple SNPs. — multidog","text":"multidog() function supports parallel computing. future package. just running multidog() local machine, can use nc argument specify parallelization. value nc greater 1 result multiple background R sessions genotype SNPs. maximum value nc try can found running future::availableCores(). Running multidog() using nc equivalent setting future plan future::plan(future::multisession, workers = nc). Using future package means different evaluation strategies possible. particular, using high performance machine, can explore using future.batchtools package evaluate multidog() using schedulers like Slurm TORQUE/PBS. use different strategy, set nc = NA run future::plan() prior running multidog(). example, set forked R processes current machine (instead using background R sessions), run (work Windows): future::plan(future::multicore), followed running multidog() nc = NA. 
See examples .","code":""},{"path":[]},{"path":"/reference/multidog.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Fit flexdog to multiple SNPs. — multidog","text":"David Gerard","code":""},{"path":"/reference/multidog.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Fit flexdog to multiple SNPs. — multidog","text":"","code":"if (FALSE) { data(\"uitdewilligen\") ## Run multiple R sessions using the `nc` variable. mout <- multidog(refmat = t(uitdewilligen$refmat), sizemat = t(uitdewilligen$sizemat), ploidy = uitdewilligen$ploidy, nc = 2) mout$inddf mout$snpdf ## Run multiple external R sessions on the local machine. ## Note that we set `nc = NA`. cl <- parallel::makeCluster(2, timeout = 60) future::plan(future::cluster, workers = cl) mout <- multidog(refmat = t(uitdewilligen$refmat), sizemat = t(uitdewilligen$sizemat), ploidy = uitdewilligen$ploidy, nc = NA) mout$inddf mout$snpdf ## Close cluster and reset future to current R process parallel::stopCluster(cl) future::plan(future::sequential) }"},{"path":"/reference/oracle_cor.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculates the correlation between the true genotype and an\noracle estimator. — oracle_cor","title":"Calculates the correlation between the true genotype and an\noracle estimator. — oracle_cor","text":"Calculates correlation oracle MAP estimator (perfect knowledge data generation process) true genotype. useful approximation lot individuals.","code":""},{"path":"/reference/oracle_cor.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculates the correlation between the true genotype and an\noracle estimator. 
— oracle_cor","text":"","code":"oracle_cor(n, ploidy, seq, bias, od, dist)"},{"path":"/reference/oracle_cor.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculates the correlation between the true genotype and an\noracle estimator. — oracle_cor","text":"n read-depth. ploidy ploidy individual. seq sequencing error rate. bias allele-bias. od overdispersion parameter. dist distribution alleles.","code":""},{"path":"/reference/oracle_cor.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculates the correlation between the true genotype and an\noracle estimator. — oracle_cor","text":"Pearson correlation true genotype oracle estimator.","code":""},{"path":"/reference/oracle_cor.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculates the correlation between the true genotype and an\noracle estimator. — oracle_cor","text":"come dist, need additional assumptions. example, population Hardy-Weinberg equilibrium allele frequency alpha calculate dist using R code: dbinom(x = 0:ploidy, size = ploidy, prob = alpha). Alternatively, know genotypes individual's two parents , say, ref_count1 ref_count2, use get_q_array function updog package: get_q_array(ploidy)[ref_count1 + 1, ref_count2 + 1, ].","code":""},{"path":"/reference/oracle_cor.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Calculates the correlation between the true genotype and an\noracle estimator. — oracle_cor","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 .","code":""},{"path":"/reference/oracle_cor.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Calculates the correlation between the true genotype and an\noracle estimator. 
— oracle_cor","text":"David Gerard","code":""},{"path":"/reference/oracle_cor.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculates the correlation between the true genotype and an\noracle estimator. — oracle_cor","text":"","code":"## Hardy-Weinberg population with allele-frequency of 0.75. ## Moderate bias and moderate overdispersion. ## See how correlation decreases as we ## increase the ploidy. ploidy <- 2 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) oracle_cor(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.9999983 ploidy <- 4 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) oracle_cor(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.9803195 ploidy <- 6 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) oracle_cor(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.940216"},{"path":"/reference/oracle_cor_from_joint.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate the correlation of the oracle estimator with the true\ngenotype from the joint distribution matrix. — oracle_cor_from_joint","title":"Calculate the correlation of the oracle estimator with the true\ngenotype from the joint distribution matrix. — oracle_cor_from_joint","text":"Calculates correlation oracle MAP estimator (perfect knowledge data generation process) true genotype. useful approximation lot individuals.","code":""},{"path":"/reference/oracle_cor_from_joint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate the correlation of the oracle estimator with the true\ngenotype from the joint distribution matrix. 
— oracle_cor_from_joint","text":"","code":"oracle_cor_from_joint(jd)"},{"path":"/reference/oracle_cor_from_joint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate the correlation of the oracle estimator with the true\ngenotype from the joint distribution matrix. — oracle_cor_from_joint","text":"jd matrix numerics. Element (, j) probability genotype - 1 estimated genotype j - 1. usually obtained oracle_joint.","code":""},{"path":"/reference/oracle_cor_from_joint.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate the correlation of the oracle estimator with the true\ngenotype from the joint distribution matrix. — oracle_cor_from_joint","text":"Pearson correlation true genotype oracle estimator.","code":""},{"path":"/reference/oracle_cor_from_joint.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Calculate the correlation of the oracle estimator with the true\ngenotype from the joint distribution matrix. — oracle_cor_from_joint","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 .","code":""},{"path":[]},{"path":"/reference/oracle_cor_from_joint.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Calculate the correlation of the oracle estimator with the true\ngenotype from the joint distribution matrix. — oracle_cor_from_joint","text":"David Gerard","code":""},{"path":"/reference/oracle_cor_from_joint.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate the correlation of the oracle estimator with the true\ngenotype from the joint distribution matrix. — oracle_cor_from_joint","text":"","code":"## Hardy-Weinberg population with allele-frequency of 0.75. ## Moderate bias and moderate overdispersion. 
ploidy <- 6 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) jd <- oracle_joint(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) oracle_cor_from_joint(jd = jd) #> [1] 0.940216 ## Compare to oracle_cor oracle_cor(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.940216"},{"path":"/reference/oracle_joint.html","id":null,"dir":"Reference","previous_headings":"","what":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","title":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","text":"returns joint distribution true genotypes oracle estimator given perfect knowledge data generating process. useful approximation lot individuals.","code":""},{"path":"/reference/oracle_joint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","text":"","code":"oracle_joint(n, ploidy, seq, bias, od, dist)"},{"path":"/reference/oracle_joint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","text":"n read-depth. ploidy ploidy individual. seq sequencing error rate. bias allele-bias. od overdispersion parameter. dist distribution alleles.","code":""},{"path":"/reference/oracle_joint.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","text":"matrix. Element (, j) joint probability estimating genotype +1 true genotype j+1. , estimated genotype indexes rows true genotype indexes columns. 
using oracle estimator.","code":""},{"path":"/reference/oracle_joint.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","text":"come dist, need additional assumptions. example, population Hardy-Weinberg equilibrium allele frequency alpha calculate dist using R code: dbinom(x = 0:ploidy, size = ploidy, prob = alpha). Alternatively, know genotypes individual's two parents , say, ref_count1 ref_count2, use get_q_array function updog package: get_q_array(ploidy)[ref_count1 + 1, ref_count2 + 1, ]. See Examples see reconcile output oracle_joint oracle_mis oracle_mis_vec.","code":""},{"path":"/reference/oracle_joint.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 .","code":""},{"path":[]},{"path":"/reference/oracle_joint.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","text":"David Gerard","code":""},{"path":"/reference/oracle_joint.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"The joint probability of the genotype and the genotype estimate\nof an oracle estimator. — oracle_joint","text":"","code":"## Hardy-Weinberg population with allele-frequency of 0.75. ## Moderate bias and moderate overdispersion. 
ploidy <- 4 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) jd <- oracle_joint(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) jd #> [,1] [,2] [,3] [,4] [,5] #> [1,] 3.905665e-03 1.759022e-07 8.379767e-17 1.784566e-29 3.601827e-52 #> [2,] 5.849980e-07 4.379346e-02 2.180335e-03 2.159655e-09 3.235599e-26 #> [3,] 1.897235e-20 3.081362e-03 1.961102e-01 1.099225e-02 6.173803e-14 #> [4,] 1.314974e-34 2.427440e-09 1.264700e-02 4.105964e-01 2.601245e-04 #> [5,] 2.284980e-57 7.543647e-25 9.090651e-12 2.863373e-04 3.161461e-01 ## Get same output as oracle_mis this way: 1 - sum(diag(jd)) #> [1] 0.02944818 oracle_mis(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.02944818 ## Get same output as oracle_mis_vec this way: 1 - diag(sweep(x = jd, MARGIN = 2, STATS = colSums(jd), FUN = \"/\")) #> [1] 0.0001497595 0.0657395175 0.0702925658 0.0267344300 0.0008221220 oracle_mis_vec(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.0001497595 0.0657395175 0.0702925658 0.0267344300 0.0008221220"},{"path":"/reference/oracle_mis.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate oracle misclassification error rate. — oracle_mis","title":"Calculate oracle misclassification error rate. — oracle_mis","text":"Given perfect knowledge data generating parameters, oracle_mis calculates misclassification error rate, error rate taken data generation process allele-distribution. ideal level misclassification error rate real method larger rate . useful approximation lot individuals.","code":""},{"path":"/reference/oracle_mis.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate oracle misclassification error rate. 
— oracle_mis","text":"","code":"oracle_mis(n, ploidy, seq, bias, od, dist)"},{"path":"/reference/oracle_mis.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate oracle misclassification error rate. — oracle_mis","text":"n read-depth. ploidy ploidy individual. seq sequencing error rate. bias allele-bias. od overdispersion parameter. dist distribution alleles.","code":""},{"path":"/reference/oracle_mis.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate oracle misclassification error rate. — oracle_mis","text":"double. oracle misclassification error rate.","code":""},{"path":"/reference/oracle_mis.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculate oracle misclassification error rate. — oracle_mis","text":"come dist, need additional assumptions. example, population Hardy-Weinberg equilibrium allele frequency alpha calculate dist using R code: dbinom(x = 0:ploidy, size = ploidy, prob = alpha). Alternatively, know genotypes individual's two parents , say, ref_count1 ref_count2, use get_q_array function updog package: get_q_array(ploidy)[ref_count1 + 1, ref_count2 + 1, ].","code":""},{"path":"/reference/oracle_mis.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Calculate oracle misclassification error rate. — oracle_mis","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 .","code":""},{"path":"/reference/oracle_mis.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Calculate oracle misclassification error rate. — oracle_mis","text":"David Gerard","code":""},{"path":"/reference/oracle_mis.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate oracle misclassification error rate. 
— oracle_mis","text":"","code":"## Hardy-Weinberg population with allele-frequency of 0.75. ## Moderate bias and moderate overdispersion. ## See how oracle misclassification error rates change as we ## increase the ploidy. ploidy <- 2 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) oracle_mis(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 1.262647e-06 ploidy <- 4 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) oracle_mis(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.02944818 ploidy <- 6 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) oracle_mis(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.1329197"},{"path":"/reference/oracle_mis_from_joint.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the oracle misclassification error rate directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_from_joint","title":"Get the oracle misclassification error rate directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_from_joint","text":"Get oracle misclassification error rate directly joint distribution genotype oracle estimator.","code":""},{"path":"/reference/oracle_mis_from_joint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the oracle misclassification error rate directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_from_joint","text":"","code":"oracle_mis_from_joint(jd)"},{"path":"/reference/oracle_mis_from_joint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the oracle misclassification error rate directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_from_joint","text":"jd matrix numerics. Element (, j) probability genotype - 1 estimated genotype j - 1. 
usually obtained oracle_joint.","code":""},{"path":"/reference/oracle_mis_from_joint.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the oracle misclassification error rate directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_from_joint","text":"double. oracle misclassification error rate.","code":""},{"path":"/reference/oracle_mis_from_joint.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Get the oracle misclassification error rate directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_from_joint","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 .","code":""},{"path":[]},{"path":"/reference/oracle_mis_from_joint.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Get the oracle misclassification error rate directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_from_joint","text":"David Gerard","code":""},{"path":"/reference/oracle_mis_from_joint.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the oracle misclassification error rate directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_from_joint","text":"","code":"## Hardy-Weinberg population with allele-frequency of 0.75. ## Moderate bias and moderate overdispersion. 
ploidy <- 6 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) jd <- oracle_joint(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) oracle_mis_from_joint(jd = jd) #> [1] 0.1329197 ## Compare to oracle_mis oracle_mis(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.1329197"},{"path":"/reference/oracle_mis_vec.html","id":null,"dir":"Reference","previous_headings":"","what":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","title":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","text":"Given perfect knowledge data generating parameters, oracle_mis_vec calculates misclassification error rate genotype. differs oracle_mis average genotype distribution get overall misclassification error rate. , oracle_mis_vec returns vector misclassification error rates conditional genotype.","code":""},{"path":"/reference/oracle_mis_vec.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","text":"","code":"oracle_mis_vec(n, ploidy, seq, bias, od, dist)"},{"path":"/reference/oracle_mis_vec.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","text":"n read-depth. ploidy ploidy individual. seq sequencing error rate. bias allele-bias. od overdispersion parameter. dist distribution alleles.","code":""},{"path":"/reference/oracle_mis_vec.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","text":"vector numerics. 
Element oracle misclassification error rate genotyping individual actual genotype + 1.","code":""},{"path":"/reference/oracle_mis_vec.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","text":"ideal level misclassification error rate real method larger rate . useful approximation lot individuals. come dist, need additional assumptions. example, population Hardy-Weinberg equilibrium allele frequency alpha calculate dist using R code: dbinom(x = 0:ploidy, size = ploidy, prob = alpha). Alternatively, know genotypes individual's two parents , say, ref_count1 ref_count2, use get_q_array function updog package: get_q_array(ploidy)[ref_count1 + 1, ref_count2 + 1, ].","code":""},{"path":"/reference/oracle_mis_vec.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 .","code":""},{"path":"/reference/oracle_mis_vec.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","text":"David Gerard","code":""},{"path":"/reference/oracle_mis_vec.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Returns the oracle misclassification rates for each genotype. — oracle_mis_vec","text":"","code":"## Hardy-Weinberg population with allele-frequency of 0.75. ## Moderate bias and moderate overdispersion. 
ploidy <- 4 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) om <- oracle_mis_vec(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) om #> [1] 0.0001497595 0.0657395175 0.0702925658 0.0267344300 0.0008221220 ## Get same output as oracle_mis this way: sum(dist * om) #> [1] 0.02944818 oracle_mis(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.02944818"},{"path":"/reference/oracle_mis_vec_from_joint.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the oracle misclassification error rates (conditional on\ntrue genotype) directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_vec_from_joint","title":"Get the oracle misclassification error rates (conditional on\ntrue genotype) directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_vec_from_joint","text":"Get oracle misclassification error rates (conditional true genotype) directly joint distribution genotype oracle estimator.","code":""},{"path":"/reference/oracle_mis_vec_from_joint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the oracle misclassification error rates (conditional on\ntrue genotype) directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_vec_from_joint","text":"","code":"oracle_mis_vec_from_joint(jd)"},{"path":"/reference/oracle_mis_vec_from_joint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the oracle misclassification error rates (conditional on\ntrue genotype) directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_vec_from_joint","text":"jd matrix numerics. Element (, j) probability genotype - 1 estimated genotype j - 1. 
usually obtained oracle_joint.","code":""},{"path":"/reference/oracle_mis_vec_from_joint.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the oracle misclassification error rates (conditional on\ntrue genotype) directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_vec_from_joint","text":"vector numerics. Element oracle misclassification error rate genotyping individual actual genotype + 1.","code":""},{"path":"/reference/oracle_mis_vec_from_joint.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Get the oracle misclassification error rates (conditional on\ntrue genotype) directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_vec_from_joint","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 .","code":""},{"path":[]},{"path":"/reference/oracle_mis_vec_from_joint.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Get the oracle misclassification error rates (conditional on\ntrue genotype) directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_vec_from_joint","text":"David Gerard","code":""},{"path":"/reference/oracle_mis_vec_from_joint.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the oracle misclassification error rates (conditional on\ntrue genotype) directly from the\njoint distribution of the genotype and the oracle estimator. — oracle_mis_vec_from_joint","text":"","code":"## Hardy-Weinberg population with allele-frequency of 0.75. ## Moderate bias and moderate overdispersion. 
ploidy <- 6 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) jd <- oracle_joint(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) oracle_mis_vec_from_joint(jd = jd) #> [1] 0.001855178 0.186231038 0.262779904 0.249400633 0.177957888 0.103565813 #> [7] 0.005097110 ## Compare to oracle_mis_vec oracle_mis_vec(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) #> [1] 0.001855178 0.186231038 0.262779904 0.249400633 0.177957888 0.103565813 #> [7] 0.005097110"},{"path":"/reference/oracle_plot.html","id":null,"dir":"Reference","previous_headings":"","what":"Construct an oracle plot from the output of oracle_joint. — oracle_plot","title":"Construct an oracle plot from the output of oracle_joint. — oracle_plot","text":"obtaining joint distribution true genotype estimated genotype oracle estimator using oracle_joint, can use oracle_plot visualize joint distribution.","code":""},{"path":"/reference/oracle_plot.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Construct an oracle plot from the output of oracle_joint. — oracle_plot","text":"","code":"oracle_plot(jd)"},{"path":"/reference/oracle_plot.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Construct an oracle plot from the output of oracle_joint. — oracle_plot","text":"jd matrix containing joint distribution true genotype oracle estimator. Usually, obtained call oracle_joint.","code":""},{"path":"/reference/oracle_plot.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Construct an oracle plot from the output of oracle_joint. — oracle_plot","text":"ggplot object containing oracle plot. x-axis indexes possible values estimated genotype. y-axis indexes possible values true genotype. number cell (, j) probability individual true genotype estimated genotype j. using oracle estimator. cells also color-coded size probability cell. 
top listed oracle misclassification error rate correlation true genotype estimated genotype. quantities may derived joint distribution.","code":""},{"path":"/reference/oracle_plot.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Construct an oracle plot from the output of oracle_joint. — oracle_plot","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 .","code":""},{"path":[]},{"path":"/reference/oracle_plot.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Construct an oracle plot from the output of oracle_joint. — oracle_plot","text":"David Gerard","code":""},{"path":"/reference/oracle_plot.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Construct an oracle plot from the output of oracle_joint. — oracle_plot","text":"","code":"ploidy <- 6 dist <- stats::dbinom(0:ploidy, ploidy, 0.75) jd <- oracle_joint(n = 100, ploidy = ploidy, seq = 0.001, bias = 0.7, od = 0.01, dist = dist) pl <- oracle_plot(jd = jd) print(pl)"},{"path":"/reference/plot.flexdog.html","id":null,"dir":"Reference","previous_headings":"","what":"Draw a genotype plot from the output of flexdog. — plot.flexdog","title":"Draw a genotype plot from the output of flexdog. — plot.flexdog","text":"wrapper plot_geno. create genotype plot single SNP.","code":""},{"path":"/reference/plot.flexdog.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Draw a genotype plot from the output of flexdog. — plot.flexdog","text":"","code":"# S3 method for flexdog plot(x, use_colorblind = TRUE, ...)"},{"path":"/reference/plot.flexdog.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Draw a genotype plot from the output of flexdog. — plot.flexdog","text":"x flexdog object. 
use_colorblind use colorblind-safe palette (TRUE) (FALSE)? TRUE allowed ploidy less equal 6. ... used.","code":""},{"path":"/reference/plot.flexdog.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Draw a genotype plot from the output of flexdog. — plot.flexdog","text":"ggplot object genotype plot.","code":""},{"path":"/reference/plot.flexdog.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Draw a genotype plot from the output of flexdog. — plot.flexdog","text":"genotype plot, x-axis contains counts non-reference allele y-axis contains counts reference allele. dashed lines expected counts (reference alternative) given sequencing error rate allele-bias. plots color-coded maximum--posterior genotypes. Transparency proportional maximum posterior probability individual's genotype. Thus, less certain genotype transparent individuals. types plots used Gerard et. al. (2018) Gerard Ferrão (2020).","code":""},{"path":"/reference/plot.flexdog.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Draw a genotype plot from the output of flexdog. — plot.flexdog","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 . Gerard, David, Luís Felipe Ventorim Ferrão. \"Priors genotyping polyploids.\" Bioinformatics 36, . 6 (2020): 1795-1800. doi:10.1093/bioinformatics/btz852 .","code":""},{"path":[]},{"path":"/reference/plot.flexdog.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Draw a genotype plot from the output of flexdog. — plot.flexdog","text":"David Gerard","code":""},{"path":"/reference/plot.multidog.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot the output of multidog. — plot.multidog","title":"Plot the output of multidog. 
— plot.multidog","text":"Produce genotype plots output multidog. may select SNPs plot.","code":""},{"path":"/reference/plot.multidog.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot the output of multidog. — plot.multidog","text":"","code":"# S3 method for multidog plot(x, indices = seq(1, min(5, nrow(x$snpdf))), ...)"},{"path":"/reference/plot.multidog.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot the output of multidog. — plot.multidog","text":"x output multidog. indices vector integers. indices SNPs plot. ... used.","code":""},{"path":"/reference/plot.multidog.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Plot the output of multidog. — plot.multidog","text":"genotype plot, x-axis contains counts non-reference allele y-axis contains counts reference allele. dashed lines expected counts (reference alternative) given sequencing error rate allele-bias. plots color-coded maximum--posterior genotypes. Transparency proportional maximum posterior probability individual's genotype. Thus, less certain genotype transparent individuals. types plots used Gerard et. al. (2018) Gerard Ferrão (2020).","code":""},{"path":"/reference/plot.multidog.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Plot the output of multidog. — plot.multidog","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 . Gerard, David, Luís Felipe Ventorim Ferrão. \"Priors genotyping polyploids.\" Bioinformatics 36, . 6 (2020): 1795-1800. doi:10.1093/bioinformatics/btz852 .","code":""},{"path":[]},{"path":"/reference/plot.multidog.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Plot the output of multidog. 
— plot.multidog","text":"David Gerard","code":""},{"path":"/reference/plot_geno.html","id":null,"dir":"Reference","previous_headings":"","what":"Make a genotype plot. — plot_geno","title":"Make a genotype plot. — plot_geno","text":"x-axis counts non-reference allele, y-axis counts reference allele. Transparency controlled maxpostprob vector. types plots used Gerard et. al. (2018) Gerard Ferrão (2020).","code":""},{"path":"/reference/plot_geno.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Make a genotype plot. — plot_geno","text":"","code":"plot_geno( refvec, sizevec, ploidy, p1ref = NULL, p1size = NULL, p2ref = NULL, p2size = NULL, geno = NULL, seq = 0, bias = 1, maxpostprob = NULL, p1geno = NULL, p2geno = NULL, use_colorblind = TRUE )"},{"path":"/reference/plot_geno.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Make a genotype plot. — plot_geno","text":"refvec vector non-negative integers. number reference reads observed individuals sizevec vector positive integers. total number reads individuals. ploidy non-negative integer. ploidy species. p1ref vector non-negative integers. number reference reads observed parent 1 (individuals siblings). p1size vector positive integers. total number reads parent 1 (individuals siblings). p2ref vector non-negative integers. number reference reads observed parent 2 (individuals siblings). p2size vector positive integers. total number reads parent 2 (individuals siblings). geno individual genotypes. seq sequencing error rate. bias bias parameter. maxpostprob vector posterior probabilities modal genotype. p1geno Parent 1's genotype. p2geno Parent 2's genotype. use_colorblind logical. use colorblind safe palette (TRUE), (FALSE)? allowed ploidy <= 6.","code":""},{"path":"/reference/plot_geno.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Make a genotype plot. 
— plot_geno","text":"ggplot object genotype plot.","code":""},{"path":"/reference/plot_geno.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Make a genotype plot. — plot_geno","text":"parental genotypes provided (p1geno p2geno) colored offspring. Since often hard see, small black dot also indicate position.","code":""},{"path":"/reference/plot_geno.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Make a genotype plot. — plot_geno","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 . Gerard, David, Luís Felipe Ventorim Ferrão. \"Priors genotyping polyploids.\" Bioinformatics 36, . 6 (2020): 1795-1800. doi:10.1093/bioinformatics/btz852 .","code":""},{"path":"/reference/plot_geno.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Make a genotype plot. — plot_geno","text":"David Gerard","code":""},{"path":"/reference/plot_geno.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Make a genotype plot. — plot_geno","text":"","code":"data(\"snpdat\") refvec <- snpdat$counts[snpdat$snp == \"SNP1\"] sizevec <- snpdat$size[snpdat$snp == \"SNP1\"] ploidy <- 6 plot_geno(refvec = refvec, sizevec = sizevec, ploidy = ploidy)"},{"path":"/reference/rflexdog.html","id":null,"dir":"Reference","previous_headings":"","what":"Simulate GBS data from the flexdog likelihood. — rflexdog","title":"Simulate GBS data from the flexdog likelihood. — rflexdog","text":"take vector genotypes vector total read-counts, generate vector reference counts. get genotypes, use rgeno. likelihood used generate read-counts described detail Gerard et. al. 
(2018).","code":""},{"path":"/reference/rflexdog.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Simulate GBS data from the flexdog likelihood. — rflexdog","text":"","code":"rflexdog(sizevec, geno, ploidy, seq = 0.005, bias = 1, od = 0.001)"},{"path":"/reference/rflexdog.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Simulate GBS data from the flexdog likelihood. — rflexdog","text":"sizevec vector total read-counts individuals. geno vector genotypes individuals. .e. number reference alleles individual . ploidy ploidy species. seq sequencing error rate. bias bias parameter. Pr(read selected) / Pr(read selected). od overdispersion parameter. See Details rho variable betabinom.","code":""},{"path":"/reference/rflexdog.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Simulate GBS data from the flexdog likelihood. — rflexdog","text":"vector length sizevec. ith element number reference counts individual .","code":""},{"path":"/reference/rflexdog.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Simulate GBS data from the flexdog likelihood. — rflexdog","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 . Gerard, David, Luís Felipe Ventorim Ferrão. \"Priors genotyping polyploids.\" Bioinformatics 36, . 6 (2020): 1795-1800. doi:10.1093/bioinformatics/btz852 .","code":""},{"path":[]},{"path":"/reference/rflexdog.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Simulate GBS data from the flexdog likelihood. — rflexdog","text":"David Gerard","code":""},{"path":"/reference/rflexdog.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Simulate GBS data from the flexdog likelihood. 
— rflexdog","text":"","code":"set.seed(1) n <- 100 ploidy <- 6 ## Generate the genotypes of individuals from an F1 population, ## where the first parent has 1 copy of the reference allele ## and the second parent has two copies of the reference ## allele. genovec <- rgeno(n = n, ploidy = ploidy, model = \"f1\", p1geno = 1, p2geno = 2) ## Get the total number of read-counts for each individual. ## Ideally, you would take this from real data as the total ## read-counts are definitely not Poisson. sizevec <- stats::rpois(n = n, lambda = 200) ## Generate the counts of reads with the reference allele ## when there is a strong bias for the reference allele ## and there is no overdispersion. refvec <- rflexdog(sizevec = sizevec, geno = genovec, ploidy = ploidy, seq = 0.001, bias = 0.5, od = 0) ## Plot the simulated data using plot_geno. plot_geno(refvec = refvec, sizevec = sizevec, ploidy = ploidy, seq = 0.001, bias = 0.5)"},{"path":"/reference/rgeno.html","id":null,"dir":"Reference","previous_headings":"","what":"Simulate individual genotypes from one of the supported flexdog models. — rgeno","title":"Simulate individual genotypes from one of the supported flexdog models. — rgeno","text":"simulate genotypes sample individuals drawn one populations supported flexdog. See details flexdog models allowed. genotype distributions described detail Gerard Ferrão (2020).","code":""},{"path":"/reference/rgeno.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Simulate individual genotypes from one of the supported flexdog models. — rgeno","text":"","code":"rgeno( n, ploidy, model = c(\"hw\", \"bb\", \"norm\", \"f1\", \"s1\", \"flex\", \"uniform\"), allele_freq = NULL, od = NULL, p1geno = NULL, p2geno = NULL, pivec = NULL, mu = NULL, sigma = NULL )"},{"path":"/reference/rgeno.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Simulate individual genotypes from one of the supported flexdog models. 
— rgeno","text":"n number observations. ploidy ploidy species. model form prior take? See Details flexdog. allele_freq model = \"hw\", allele frequency population. model, NULL. od model = \"bb\", overdispersion parameter beta-binomial distribution. See betabinom details. model, NULL. p1geno Either first parent's genotype model = \"f1\", parent's genotype model = \"s1\". model, NULL. p2geno second parent's genotype model = \"f1\". model, NULL. pivec vector probabilities. model = \"ash\", represents mixing proportions discrete uniforms. model = \"flex\", element probability genotype - 1. model, NULL. mu model = \"norm\", mean normal. model, NULL. sigma model = \"norm\", standard deviation normal. model, NULL.","code":""},{"path":"/reference/rgeno.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Simulate individual genotypes from one of the supported flexdog models. — rgeno","text":"vector length n genotypes sampled individuals.","code":""},{"path":"/reference/rgeno.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Simulate individual genotypes from one of the supported flexdog models. — rgeno","text":"List non-NULL arguments: model = \"flex\": pivec model = \"hw\": allele_freq model = \"f1\": p1geno p2geno model = \"s1\": p1geno model = \"uniform\": non-NULL arguments model = \"bb\": allele_freq od model == \"norm\": mu sigma","code":""},{"path":"/reference/rgeno.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Simulate individual genotypes from one of the supported flexdog models. — rgeno","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 . Gerard, David, Luís Felipe Ventorim Ferrão. \"Priors genotyping polyploids.\" Bioinformatics 36, . 6 (2020): 1795-1800. 
doi:10.1093/bioinformatics/btz852 .","code":""},{"path":"/reference/rgeno.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Simulate individual genotypes from one of the supported flexdog models. — rgeno","text":"David Gerard","code":""},{"path":"/reference/rgeno.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Simulate individual genotypes from one of the supported flexdog models. — rgeno","text":"","code":"## F1 Population where parent 1 has 1 copy of the reference allele ## and parent 2 has 4 copies of the reference allele. ploidy <- 6 rgeno(n = 10, ploidy = ploidy, model = \"f1\", p1geno = 1, p2geno = 4) #> [1] 3 3 3 2 2 4 1 2 2 2 ## A population in Hardy-Weinberg equilibrium with an ## allele frequency of 0.75 rgeno(n = 10, ploidy = ploidy, model = \"hw\", allele_freq = 0.75) #> [1] 5 3 3 4 3 5 5 4 4 4"},{"path":"/reference/snpdat.html","id":null,"dir":"Reference","previous_headings":"","what":"GBS data from Shirasawa et al (2017) — snpdat","title":"GBS data from Shirasawa et al (2017) — snpdat","text":"Contains counts reference alleles total read counts GBS data Shirasawa et al (2017) three SNPs used examples Gerard et al. (2018).","code":""},{"path":"/reference/snpdat.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"GBS data from Shirasawa et al (2017) — snpdat","text":"","code":"snpdat"},{"path":"/reference/snpdat.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"GBS data from Shirasawa et al (2017) — snpdat","text":"tibble 419 rows 4 columns: id identification label individuals. snp SNP label. counts number read-counts support reference allele. 
size total number read-counts given SNP.","code":""},{"path":"/reference/snpdat.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"GBS data from Shirasawa et al (2017) — snpdat","text":"doi:10.1038/srep44207","code":""},{"path":"/reference/snpdat.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"GBS data from Shirasawa et al (2017) — snpdat","text":"tibble. See Format Section.","code":""},{"path":"/reference/snpdat.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"GBS data from Shirasawa et al (2017) — snpdat","text":"Shirasawa, Kenta, Masaru Tanaka, Yasuhiro Takahata, Daifu Ma, Qinghe Cao, Qingchang Liu, Hong Zhai, Sang-Soo Kwak, Jae Cheol Jeong, Ung-Han Yoon, Hyeong-Un Lee, Hideki Hirakawa, Sahiko Isobe \"high-density SNP genetic map consisting complete set homologous groups autohexaploid sweetpotato (Ipomoea batatas).\" Scientific Reports 7 (2017). doi:10.1038/srep44207 Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 .","code":""},{"path":"/reference/uitdewilligen.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset of individuals and SNPs from Uitdewilligen et al (2013). — uitdewilligen","title":"Subset of individuals and SNPs from Uitdewilligen et al (2013). — uitdewilligen","text":"list containing matrix reference counts, matrix total counts, ploidy level (4) species. subset data Uitdewilligen et al (2013).","code":""},{"path":"/reference/uitdewilligen.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset of individuals and SNPs from Uitdewilligen et al (2013). 
— uitdewilligen","text":"","code":"uitdewilligen"},{"path":"/reference/uitdewilligen.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Subset of individuals and SNPs from Uitdewilligen et al (2013). — uitdewilligen","text":"list containing three objects. Two matrices numeric scalar: refmat matrix read counts containing reference allele. rows index individuals columns index SNPs. sizemat matrix total number read counts. rows index individuals columns index SNPs. ploidy ploidy level species (just 4).","code":""},{"path":"/reference/uitdewilligen.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Subset of individuals and SNPs from Uitdewilligen et al (2013). — uitdewilligen","text":"doi:10.1371/journal.pone.0062355","code":""},{"path":"/reference/uitdewilligen.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Subset of individuals and SNPs from Uitdewilligen et al (2013). — uitdewilligen","text":"list. See Format Section.","code":""},{"path":"/reference/uitdewilligen.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Subset of individuals and SNPs from Uitdewilligen et al (2013). — uitdewilligen","text":"Uitdewilligen, J. G., Wolters, . M. ., Bjorn, B., Borm, T. J., Visser, R. G., & van Eck, H. J. (2013). next-generation sequencing method genotyping--sequencing highly heterozygous autotetraploid potato. PLoS One, 8(5), e62355. doi:10.1371/journal.pone.0062355","code":""},{"path":"/reference/updog-package.html","id":null,"dir":"Reference","previous_headings":"","what":"updog Flexible Genotyping for Polyploids — updog-package","title":"updog Flexible Genotyping for Polyploids — updog-package","text":"Implements empirical Bayes approaches genotype polyploids next generation sequencing data accounting allele bias, overdispersion, sequencing error. 
main functions flexdog() multidog(), allow specification many different genotype distributions. Also provided functions simulate genotypes, rgeno(), read-counts, rflexdog(), well functions calculate oracle genotyping error rates, oracle_mis(), correlation true genotypes, oracle_cor(). latter two functions useful read depth calculations. Run browseVignettes(package = \"updog\") R example usage. See Gerard et al. (2018) Gerard Ferrao (2020) details implemented methods.","code":""},{"path":"/reference/updog-package.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"updog Flexible Genotyping for Polyploids — updog-package","text":"package named updog \"Using Parental Data Offspring Genotyping\" originally developed method full-sib populations, works now general populations. best competitor probably fitPoly package, can check https://cran.r-project.org/package=fitPoly. Though, think updog returns better calibrated measures uncertainty next-generation sequencing data. find bug want enhancement, please submit issue https://github.com/dcgerard/updog/issues.","code":""},{"path":"/reference/updog-package.html","id":"updog-functions","dir":"Reference","previous_headings":"","what":"updog Functions","title":"updog Flexible Genotyping for Polyploids — updog-package","text":"flexdog() main function fits empirical Bayes approach genotype polyploids next generation sequencing data. multidog() convenience function running flexdog() many SNPs. function provides support parallel computing. format_multidog() Return arrayicized elements output multidog(). filter_snp() Filter SNPs based output multidog() rgeno() simulate genotypes sample one models allowed flexdog(). rflexdog() Simulate read-counts flexdog() model. plot.flexdog() Plotting output flexdog(). plot.multidog() Plotting output multidog(). oracle_joint() joint distribution true genotype oracle estimator. oracle_plot() Visualize output oracle_joint(). 
oracle_mis() oracle misclassification error rate (Bayes rate). oracle_cor() Correlation true genotype oracle estimated genotype.","code":""},{"path":"/reference/updog-package.html","id":"updog-datasets","dir":"Reference","previous_headings":"","what":"updog Datasets","title":"updog Flexible Genotyping for Polyploids — updog-package","text":"snpdat small example dataset using flexdog. uitdewilligen small example dataset","code":""},{"path":"/reference/updog-package.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"updog Flexible Genotyping for Polyploids — updog-package","text":"Gerard, D., Ferrão, L. F. V., Garcia, . . F., & Stephens, M. (2018). Genotyping Polyploids Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468 . Gerard, David, Luís Felipe Ventorim Ferrão. \"Priors genotyping polyploids.\" Bioinformatics 36, . 6 (2020): 1795-1800. doi:10.1093/bioinformatics/btz852 .","code":""},{"path":"/reference/updog-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"updog Flexible Genotyping for Polyploids — updog-package","text":"David Gerard","code":""},{"path":"/reference/wem.html","id":null,"dir":"Reference","previous_headings":"","what":"EM algorithm to fit weighted ash objective. — wem","title":"EM algorithm to fit weighted ash objective. — wem","text":"Solves following optimization problem $$\\max_{\\pi} \\sum_k w_k \\log(\\sum_j \\pi_j \\ell_jk).$$ using weighted EM algorithm.","code":""},{"path":"/reference/wem.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"EM algorithm to fit weighted ash objective. — wem","text":"","code":"wem(weight_vec, lmat, pi_init, lambda, itermax, obj_tol)"},{"path":"/reference/wem.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"EM algorithm to fit weighted ash objective. — wem","text":"weight_vec vector weights. 
element weight_vec corresponds column lmat. lmat matrix inner weights. columns \"individuals\" rows \"classes.\" pi_init initial values pivec. element pi_init corresponds row lmat. lambda penalty pi's. greater 0 really really small. itermax maximum number EM iterations take. obj_tol objective stopping criterion.","code":""},{"path":"/reference/wem.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"EM algorithm to fit weighted ash objective. — wem","text":"vector numerics.","code":""},{"path":"/reference/wem.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"EM algorithm to fit weighted ash objective. — wem","text":"David Gerard","code":""},{"path":"/reference/wem.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"EM algorithm to fit weighted ash objective. — wem","text":"","code":"set.seed(2) n <- 3 p <- 5 lmat <- matrix(stats::runif(n * p), nrow = n) weight_vec <- seq_len(p) pi_init <- stats::runif(n) pi_init <- pi_init / sum(pi_init) wem(weight_vec = weight_vec, lmat = lmat, pi_init = pi_init, lambda = 0, itermax = 100, obj_tol = 10^-6) #> [,1] #> [1,] 3.830930e-01 #> [2,] 6.169070e-01 #> [3,] 3.041614e-09"},{"path":"/news/index.html","id":"updog-215","dir":"Changelog","previous_headings":"","what":"updog 2.1.5","title":"updog 2.1.5","text":"Used updated Rcpp generate RcppExports fix CRAN warning.","code":""},{"path":"/news/index.html","id":"updog-214","dir":"Changelog","previous_headings":"","what":"updog 2.1.4","title":"updog 2.1.4","text":"CRAN release: 2023-11-17 Removed ggthemes dependency. 
Removed usage ggplot2::aes_string(), since deprecated, replaced tidy evaluation idioms.","code":""},{"path":"/news/index.html","id":"updog-213","dir":"Changelog","previous_headings":"","what":"updog 2.1.3","title":"updog 2.1.3","text":"CRAN release: 2022-10-18 Bug fix: Use && instead & C++.","code":""},{"path":"/news/index.html","id":"updog-212","dir":"Changelog","previous_headings":"","what":"updog 2.1.2","title":"updog 2.1.2","text":"CRAN release: 2022-01-24 Fixed bug use assertthat::are_equal() testthat::expect_equal(). See 21 Jan 2022 R-devel/NEWS states: .equal.numeric() gains sanity check tolerance argument - calling .equal(, b, c) three numeric vectors surprisingly common error.","code":""},{"path":"/news/index.html","id":"updog-211","dir":"Changelog","previous_headings":"","what":"updog 2.1.1","title":"updog 2.1.1","text":"CRAN release: 2021-10-25 Added upper bound sequencing error rate flexdog_full() (, hence, flexdog() multidog()). protects poor behavior observed corner case. Specifically, F1 populations offspring genotype sequenced moderate low depth. Fixed stale URLs, fixed style issues found lintr.","code":""},{"path":"/news/index.html","id":"updog-210","dir":"Changelog","previous_headings":"","what":"updog 2.1.0","title":"updog 2.1.0","text":"parallel backend multidog() now handled future package. use nc argument multidog(), still run parallel using multiple R sessions local machine. However, can now use functionality future choose evaluation strategy, setting nc = NA. also allow use schedulers high performance computing environments future.batchtools package. See multidog() function documentation details. vignette “Genotyping Many SNPs multidog()” goes example using future package. new experimental function, export_vcf(), works export multidog objects VCF file. yet exported still bugs fix. plot.multidog() now plot parent read-counts F1 S1 populations. Internally, multidog() now uses iterators iterators package send subsets data R process. 
new internal .combine function used foreach() call multidog() order decrease memory usage multidog().","code":""},{"path":"/news/index.html","id":"updog-202","dir":"Changelog","previous_headings":"","what":"updog 2.0.2","title":"updog 2.0.2","text":"CRAN release: 2020-07-21 massive edit updog software. Major changes include: support model = \"ash\". seemed model = \"norm\" always better faster, just got rid \"ash\" option. also extremely simplified code. Removal mupdog(). think good idea, computation way slow usable. Revision model = \"f1pp\" model = \"s1pp\". now include interpretable parameterizations meant identified via another R package. support tetraploids right now. multidog() now prints nice ASCII art ’s run. format_multidog() now allows format multiple variables terms multidimensional array. Fixes bug format_multidog() reordering SNP dimensions. fine long folks used dimnames properly, now allow folks also use dim positions. Updog now returns genotype log-likelihoods.","code":""},{"path":"/news/index.html","id":"updog-121","dir":"Changelog","previous_headings":"","what":"updog 1.2.1","title":"updog 1.2.1","text":"Adds filter_snp() filtering output multidog() based predicates terms variables snpdf. Removes stringr Imports. using one place replaced code base R code. Removes Rmpfr Suggests. longer needed since CVXR longer suggested.","code":""},{"path":"/news/index.html","id":"updog-120","dir":"Changelog","previous_headings":"","what":"updog 1.2.0","title":"updog 1.2.0","text":"CRAN release: 2020-01-28 Adds multidog() genotyping multiple SNPs using parallel computing. Adds plot.multidog() plotting output multidog(). Adds format_multidog() formatting output multidog() matrix. Removes dependency CVXR. makes install maintenance little easier. defaults specific problem little faster anyway. longer changes color scale plot_geno() based genotypes present. .cpp files, now coerce objects unsigned comparing. 
gets rid warnings install.","code":""},{"path":"/news/index.html","id":"updog-113","dir":"Changelog","previous_headings":"","what":"updog 1.1.3","title":"updog 1.1.3","text":"CRAN release: 2019-11-21 Updates documentation include Bioinformatics publication, Gerard Ferrão (2020) . Adds “internal” keyword functions users don’t need. Removes tidyverse Suggests field. using vignettes, changed base R (except ggplot2).","code":""},{"path":"/news/index.html","id":"updog-111","dir":"Changelog","previous_headings":"","what":"updog 1.1.1","title":"updog 1.1.1","text":"CRAN release: 2019-09-09 Updates documentation include Gerard Ferrão (2020) reference. Minor fixes documentation.","code":""},{"path":"/news/index.html","id":"updog-110","dir":"Changelog","previous_headings":"","what":"updog 1.1.0","title":"updog 1.1.0","text":"CRAN release: 2019-07-31 Introduces flexible priors general populations. Places normal prior distribution logit overdispersion parameter. might change genotype calls previous versions updog. reproduce genotype calls previous versions updog, simply set mean_od = 0 var_od = Inf flexdog(). Adds method = \"custom\" option flexdog(). lets users choose genotype distribution completely known priori. Documentation updates.","code":""},{"path":"/news/index.html","id":"updog-101","dir":"Changelog","previous_headings":"","what":"updog 1.0.1","title":"updog 1.0.1","text":"CRAN release: 2018-07-27 Fixes bug option model = \"s1pp\" flexdog(). originally constraining levels preferential pairing segregations parent. now fixed. downside model = \"s1pp\" now supported ploidy = 4 ploidy = 6. optimization becomes difficult larger ploidy levels. fixed documentation. Perhaps biggest error comes snippet original documentation flexdog: value prop_mis intuitive measure quality SNP. prop_mis posterior proportion individuals mis-genotyped. want SNPS accurately genotype, say, 95% individuals, discard SNPs prop_mis 0.95. now says value prop_mis intuitive measure quality SNP. 
prop_mis posterior proportion individuals mis-genotyped. want SNPs accurately genotype, say, 95% individuals, discard SNPs prop_mis 0.05. ’ve now exported C++ functions think useful. can call usual way.","code":""},{"path":"/news/index.html","id":"updog-0990","dir":"Changelog","previous_headings":"","what":"updog 0.99.0","title":"updog 0.99.0","text":"complete re-working code updog. old version may found updogAlpha package. main function now flexdog(). experimental approach mupdog() now live. provide guarantees mupdog()’s performance. Oracle misclassification error rates may calculated oracle_mis(). Genotypes can simulated using rgeno(). Next-generation sequencing data can simulated using rflexdog().","code":""}]