% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Dropout.R
\name{CorrectDropout}
\alias{CorrectDropout}
\title{Correct for experimental/bioinformatic dropout of labeled RNA.}
\usage{
CorrectDropout(
  obj,
  strategy = c("grandR", "bakR"),
  grouping_factors = NULL,
  features = NULL,
  populations = NULL,
  fraction_design = NULL,
  repeatID = NULL,
  exactMatch = TRUE,
  read_cutoff = 25,
  dropout_cutoff = 5,
  ...
)
}
\arguments{
\item{obj}{An EZbakRFractions object, which is an EZbakRData object on which
you have run \code{EstimateFractions()}.}

\item{strategy}{Which dropout correction strategy to use. Options are:
\itemize{
\item grandR: Described \href{https://pubmed.ncbi.nlm.nih.gov/38381903/}{here}.
If you use this strategy, please cite that work and \href{https://www.nature.com/articles/s41467-023-39163-4}{grandR}. A quasi-non-parametric strategy
that finds the dropout rate that eliminates any linear correlation
between a feature's newness (i.e., its fraction of new RNA) and the difference between its +s4U and -s4U normalized
read counts.
\item bakR: Described \href{https://simonlabcode.github.io/bakR/articles/Dropout.html}{here}.
Uses a simple generative model of dropout to derive a likelihood function, and
estimates the dropout rate by maximum likelihood.
}
The "bakR" strategy has the advantage of being model-derived, which makes it possible
to assess model fit and thus whether the simple assumptions shared by the "bakR"
and "grandR" dropout models are met. The "grandR" strategy has the advantage of
being more robust, which is why it is currently used by default.}

\item{grouping_factors}{Sample-detail columns in the metadf by which to group
-s4U samples when calculating the average -s4U RPM. The default value of
\code{NULL} will cause all sample-detail columns to be used.}

\item{features}{Character vector of the set of features by which reads were stratified
when estimating the proportion of each RNA population; used to identify the fractions table to correct. The default of \code{NULL}
assumes that there is only one fractions table in the EZbakRFractions object.}

\item{populations}{Mutational populations that were analyzed to generate the
fractions table to use. For example, this would be "TC" for a standard
s4U-based nucleotide recoding experiment.}

\item{fraction_design}{"Design matrix" specifying which RNA populations exist
in your samples. By default, this will be created automatically and will assume
that all combinations of the \code{populations} you have requested to analyze are
present in your data. If this is not the case for your data, then you will have
to create one manually. See the documentation for \code{\link{EstimateFractions}} (run \code{?EstimateFractions}) for more details.}

\item{repeatID}{If multiple \code{fractions} tables exist with the same metadata,
then this is the numerical index by which they are distinguished.}

\item{exactMatch}{If TRUE, then \code{features} must exactly match the \code{features}
metadata of a given fractions table for that table to be used. This means that, by default,
you cannot specify a subset of features. Set this to FALSE if you would like
to specify a feature subset.}

\item{read_cutoff}{Minimum number of reads for a feature to be used to fit
the dropout model.}

\item{dropout_cutoff}{Maximum ratio of -s4U:+s4U RPMs for a feature to be
used to fit the dropout model (i.e., a simple outlier-filtering cutoff).}

\item{...}{Additional parameters passed to the internal \code{calculate_dropout()} function,
namely \code{dropout_cutoff_min}, which sets the minimum dropout value used when
fitting the dropout model.}
}
\value{
An \code{EZbakRData} object with the specified "fractions" table replaced
with a dropout-corrected table.
}
\description{
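Estimates and corrects for dropout using either the strategy presented in
\href{https://pubmed.ncbi.nlm.nih.gov/38381903/}{Berg et al. 2024} (\code{strategy = "grandR"}, the default)
or the similar, model-based strategy described \href{https://simonlabcode.github.io/bakR/articles/Dropout.html}{here} (\code{strategy = "bakR"}).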
}
\details{
Dropout, described independently \href{https://pubmed.ncbi.nlm.nih.gov/38381903/}{here}
and \href{https://pubmed.ncbi.nlm.nih.gov/37292657/}{here}, is the disproportionate loss of
labeled RNA, or of reads from labeled RNA. It can originate from a combination of
bioinformatic (loss of reads with high mutation content due to alignment problems),
technical (loss of labeled RNA during RNA extraction), and biological (in rare cases,
transcriptional shutoff caused by metabolic label toxicity) sources.
\code{CorrectDropout()} compares label-fed (+s4U) samples with label-free (-s4U) controls
from the same experimental conditions to estimate and correct for this dropout. It assumes
that a single number (referred to as the dropout rate, or pdo) describes the rate at
which labeled RNA is lost relative to unlabeled RNA.
pdo ranges from 0 (no dropout) to 1 (complete loss of all labeled RNA), and
is thus interpreted as the fraction of labeled RNA (or of reads from labeled RNA)
that is disproportionately lost, relative to the equivalent unlabeled species.
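
The following is a minimal sketch of what the single-rate assumption implies. It assumes
that dropout removes a fraction \eqn{p_{do}}{pdo} of labeled RNA (or of reads from it)
and leaves unlabeled RNA untouched. Let \eqn{\theta}{theta} be a feature's true fraction
of labeled (new) RNA and \eqn{\theta_{obs}}{theta_obs} the fraction observed after dropout.
This notation is illustrative and need not match the internal implementation of
\code{CorrectDropout()}. Out of \eqn{N}{N} molecules, \eqn{\theta N (1 - p_{do})}{theta*N*(1 - pdo)}
labeled and \eqn{(1 - \theta) N}{(1 - theta)*N} unlabeled molecules survive, so
\deqn{\theta_{obs} = \frac{\theta (1 - p_{do})}{1 - \theta p_{do}},}{theta_obs = theta*(1 - pdo) / (1 - theta*pdo),}
which can be inverted to recover the dropout-corrected fraction:
\deqn{\theta = \frac{\theta_{obs}}{1 - p_{do} + \theta_{obs} p_{do}}.}{theta = theta_obs / (1 - pdo + theta_obs*pdo).}
Under the same sketch, a feature's +s4U read count (before library-size normalization) is
reduced by a factor of \eqn{1 - \theta p_{do}}{1 - theta*pdo} relative to its dropout-free value,
which is what the matched -s4U controls are used to estimate. Features with higher
\eqn{\theta}{theta} are therefore depleted more strongly. For example, with \eqn{\theta = 0.5}{theta = 0.5}
and \eqn{p_{do} = 0.2}{pdo = 0.2}, the observed fraction is \eqn{\theta_{obs} = 0.4/0.9 \approx 0.44}{theta_obs = 0.4/0.9, approximately 0.44}.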
}
\examples{

# Simulate data to analyze
simdata <- EZSimulate(30)

# Create EZbakR input
ezbdo <- EZbakRData(simdata$cB, simdata$metadf)

# Estimate Fractions
ezbdo <- EstimateFractions(ezbdo)

# Correct for dropout
ezbdo <- CorrectDropout(ezbdo)

}
