---
title: "Calculating LOD from Olink® Explore data" 
output: 
  html_vignette:
    toc: true
    toc_depth: 3
    includes:
      in_header: ../man/figures/logo.html
vignette: >
  %\VignetteIndexEntry{Calculating LOD from Olink® Explore data}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
date: 'Compiled: `r format(Sys.Date(), "%B %d, %Y")`'
editor_options: 
  markdown: 
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  tidy = FALSE,
  tidy.opts = list(width.cutoff = 95),
  fig.width = 6,
  fig.height = 3,
  message = FALSE,
  warning = FALSE,
  time_it = TRUE,
  fig.align = "center"
)
```

## Introduction

This tutorial describes how to use Olink^®^ Analyze to integrate Limit of
Detection (LOD) into Olink^®^ Explore HT, Olink^®^ Reveal, and Olink^®^ Explore
384/3072 datasets. Although it is recommended to use all Olink Explore and Olink
Reveal data in downstream analyses, LOD information can be useful when 
performing technical evaluations of a dataset.

In this tutorial, you will learn how to use `olink_lod()` to add LOD information
to your Olink Explore or Olink Reveal dataset. Note that Olink Analyze does not 
contain example Olink Explore HT, Olink Reveal, or Olink Explore 384/3072 
datasets within the package, so external data will be necessary for the code 
below to work. The external data should contain internal and external controls 
for proper calculation and normalization. All file paths should be replaced 
with a path to your data and fixed LOD reference file (if applicable).

## Integrating LOD

Limit of Detection (LOD) is a metric that indicates the lowest measurable value
of a protein. LOD can be helpful when performing technical evaluations of NPX™
datasets, such as calculating CVs. LOD is less important in downstream
statistical analyses, because values below LOD typically converge across
groups. As such, including data below LOD is unlikely to increase the risk of
false positive discoveries. Furthermore, data below LOD can be instrumental in
downstream analyses such as biomarker discovery: a protein may be well
expressed in one group and not measured in another, in which case it can be a
strong biomarker candidate for specific groups.
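
As one example of such a technical evaluation, the share of datapoints above
LOD can be summarized per assay. The sketch below is a minimal illustration on
a made-up data frame (`df_toy` and all of its values are hypothetical). It only
assumes that the dataset already contains `NPX` and `LOD` columns, for example
after running `olink_lod()`.

```r
# Hypothetical toy data: two assays measured in three samples each
df_toy <- data.frame(
  Assay = rep(c("AssayA", "AssayB"), each = 3L),
  NPX = c(2.1, 0.4, 1.8, -0.5, -0.2, 0.9),
  LOD = rep(c(0.5, 0.0), each = 3L)
)

# Proportion of datapoints above LOD for each assay
prop_above_lod <- tapply(
  X = df_toy$NPX > df_toy$LOD,
  INDEX = df_toy$Assay,
  FUN = mean
)
prop_above_lod
```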

LOD can be added to Olink Explore or Olink Reveal NPX datasets using
`olink_lod()`. This function can calculate LOD from an NPX dataset using the
dataset's negative controls or a list of predetermined fixed LOD values
(available in the Document Download Center at
[olink.com](https://olink.com/knowledge/documents)). As the default setting,
`olink_lod()` will calculate LOD using a dataset's negative controls.

Olink Explore and Olink Reveal data are delivered as either plate control (PC)
normalized or intensity normalized; the normalization type is indicated in the
Normalization column of the NPX file. Intensity normalization relies on the
assumption that the analyzed samples are randomized. The values are reported
in two columns, NPX and PCNormalizedNPX. For PC normalized datasets the content
of these two columns is identical, while for intensity normalized datasets the
NPX column contains the intensity normalized values. Similarly, the
`olink_lod()` function adds two columns to your dataset: LOD and
PCNormalizedLOD. For a PC normalized dataset, the content of these two columns
is identical, while for an intensity normalized dataset the LOD column contains
intensity normalized LOD values. Examples of results for plate control and
intensity normalization are shown in the tables below.

```{r, echo = FALSE}
set.seed(1234)

# Load example dataset (npx_data1)
npx_data1 <- OlinkAnalyze::npx_data1

# NPX file preprocessing
## Generate check log
check_log_npx_data1 <- OlinkAnalyze::check_npx(
  df = npx_data1
)

## Clean NPX
npx_data1_clean <- OlinkAnalyze::clean_npx(
  df = npx_data1,
  check_log = check_log_npx_data1
)

## Generate check log on cleaned data
check_log_npx_data1_clean <- OlinkAnalyze::check_npx(
  df = npx_data1_clean
)

table1 <- npx_data1_clean |>
  dplyr::slice_head(
    n = 6L
  ) |>
  dplyr::select(
    -dplyr::all_of(
      c("Index", "MissingFreq", "Panel_Version", "QC_Warning", "Subject",
        "Treatment", "Site", "Time", "Project", "Panel", "PlateID")
    )
  ) |>
  dplyr::mutate(
    Count = round(
      x = .data[["NPX"]] * (100L + sample(x = seq(from = -5L, to = 15L),
                                          size = 1L))
    )
  ) |>
  dplyr::mutate(
    SampleType = "SAMPLE"
  ) |>
  dplyr::mutate(
    Normalization = "Plate control"
  ) |>
  dplyr::mutate(
    NPX = round(x = .data[["NPX"]], digits = 2L)
  ) |>
  dplyr::mutate(
    LOD = round(x = .data[["LOD"]], digits = 2L)
  ) |>
  dplyr::mutate(
    PCNormalizedNPX = .data[["NPX"]]
  ) |>
  dplyr::mutate(
    PCNormalizedLOD = .data[["LOD"]]
  ) |>
  dplyr::select(
    dplyr::all_of(
      c("SampleID", "SampleType", "OlinkID", "UniProt", "Assay", "Count", "NPX",
        "PCNormalizedNPX", "Normalization", "LOD", "PCNormalizedLOD")
    )
  )

table1 |>
  knitr::kable(
    caption = "Example results from Plate Control Normalized Project"
  ) |>
  kableExtra::kable_styling(
    font_size = 10L
  )

table1 |>
  dplyr::mutate(
    Normalization = "Intensity"
  ) |>
  dplyr::mutate(
    NPX = round(x = .data[["NPX"]] + 4.16, digits = 2L)
  ) |>
  dplyr::mutate(
    LOD = round(x = .data[["LOD"]] + 4.16, digits = 2L)
  ) |>
  dplyr::select(
    dplyr::all_of(
      c("SampleID", "SampleType", "OlinkID", "UniProt", "Assay", "Count", "NPX",
        "PCNormalizedNPX", "Normalization", "LOD", "PCNormalizedLOD")
    )
  ) |>
  knitr::kable(
    caption = "Example results from Intensity Normalized Project"
  ) |>
  kableExtra::kable_styling(
    font_size = 10
  )
```
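
The relationship between these columns can be checked directly on a dataset.
The following is a minimal sketch on a made-up data frame: the column names
`NPX`, `PCNormalizedNPX` and `Normalization` follow the description above,
while `df_toy` and its values are hypothetical.

```r
# Hypothetical toy data with both normalization types
df_toy <- data.frame(
  Normalization = c("Plate control", "Plate control", "Intensity", "Intensity"),
  NPX = c(1.25, 3.10, 5.41, 7.26),
  PCNormalizedNPX = c(1.25, 3.10, 1.25, 3.10)
)

# TRUE where NPX and PCNormalizedNPX agree for every datapoint of that type
same_values <- df_toy$NPX == df_toy$PCNormalizedNPX
cols_identical <- tapply(
  X = same_values,
  INDEX = df_toy$Normalization,
  FUN = all
)
cols_identical
```

For the plate control rows the two columns agree, while for the intensity
normalized rows they differ.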

## Import Olink NGS datasets

Olink Next Generation Sequencing (NGS) datasets are standard Olink Explore HT,
Olink Reveal, and Olink Explore 384/3072 NPX tables. The `read_NPX()` function
can be used to import an NPX file in parquet format as generated by Olink
Software. More information on using `read_NPX()` can be found in the
[Olink Analyze Overview tutorial](https://CRAN.R-project.org/package=OlinkAnalyze).

```{r npx_data_for_lod, eval = FALSE, message = FALSE, warning = FALSE}
# Preprocessing steps for both fixed LOD and Negative Control LOD integration

## Load NPX file
df_npx <- OlinkAnalyze::read_NPX(
  filename = "Path_to/Explore_NPX_file.parquet"
)

## Check NPX data
check_log_df_npx <- OlinkAnalyze::check_npx(
  df = df_npx
)

## Clean NPX data
df_npx_clean <- OlinkAnalyze::clean_npx(
  df = df_npx,
  check_log = check_log_df_npx,
  # do not remove controls or warnings to ensure LOD can be calculated correctly
  remove_control_assay = FALSE,
  remove_control_sample = FALSE,
  remove_assay_warning = FALSE,
  remove_qc_warning = FALSE
)

## Generate check log on cleaned data
check_log_df_npx_clean <- OlinkAnalyze::check_npx(
  df = df_npx_clean
)

# Cleanup intermediate objects
rm(
  df_npx,
  check_log_df_npx
)
```

------------------------------------------------------------------------

#### *A note on calculating LOD with clean_npx()*

*When the function `clean_npx()` is used before calculating LOD, one should
disable exclusion of internal controls, external controls, assay warnings and
sample warnings to ensure that the LOD values are calculated correctly. This can
be done by setting the following parameters to FALSE in the `clean_npx()`
function: `remove_control_assay`, `remove_control_sample`,
`remove_assay_warning` and `remove_qc_warning`. This should be done regardless
of the LOD method that is used (fixed LOD or Negative Control LOD).*

*By default, `clean_npx()` removes both internal and external controls, which 
are required to calculate count-based LOD values. If `olink_lod()` is run 
without the internal or external controls present, the function will still 
execute, but all count-based LODs will be assigned `NA` values. In addition,
the `olink_lod()` function contains code to exclude only the relevant assay and
QC warnings for the LOD calculation, so these warnings do not need to be removed
for LOD to be calculated.*

*The function `clean_npx()` can be applied with default settings after LOD has
been calculated to remove controls and warnings from the dataset and prepare it
for downstream analyses.*

*More information on `clean_npx()` can be found in the
[Olink Analyze Overview tutorial](https://CRAN.R-project.org/package=OlinkAnalyze).*

------------------------------------------------------------------------

## Integrating Negative Control LOD

The negative control (NC) LOD method requires at least 10 negative controls in a
dataset. Negative control data is available in the standard exported Olink NGS
NPX parquet files. NCs can be identified through the SampleID and SampleType
columns.

A negative control will not contribute to the minimum number of required NCs if
the negative control does not pass sample QC criteria (sample QC failure or
warning) in all of the data (i.e. all datapoints measured for that sample).
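
Before choosing the NC LOD method, it can be useful to count how many negative
controls would qualify. The sketch below is an illustration only, not the check
performed inside `olink_lod()`. It assumes that negative controls are labelled
`"NEGATIVE_CONTROL"` in SampleType and that passing datapoints are labelled
`"PASS"` in a SampleQC column; both labels and the toy data are assumptions and
should be compared against your own file.

```r
# Hypothetical toy data: two negative controls and one sample
df_toy <- data.frame(
  SampleID = c("NC1", "NC1", "NC2", "NC2", "S1", "S1"),
  SampleType = c(rep("NEGATIVE_CONTROL", 4L), rep("SAMPLE", 2L)),
  SampleQC = c("PASS", "WARN", "FAIL", "WARN", "PASS", "PASS")
)

# Keep negative control datapoints that pass sample QC (assumed labels)
nc_pass <- df_toy[
  df_toy$SampleType == "NEGATIVE_CONTROL" & df_toy$SampleQC == "PASS",
]

# An NC drops out only if none of its datapoints pass sample QC
n_usable_nc <- length(unique(nc_pass$SampleID))
n_usable_nc >= 10L
```

In this toy example only `NC1` has a passing datapoint, so the dataset would
fall well short of the 10 negative controls required.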

Negative controls are used to calculate LOD from either PC normalized NPX or
counts. For assays with more than 150 counts in one of the negative controls,
LOD is calculated as the median PC normalized NPX of the negative controls plus
3 standard deviations or 0.2 NPX, whichever is larger. For assays with fewer
than 150 counts in all negative controls, LOD is calculated using the count
values, which are then converted into PC normalized NPX.
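
The NPX-based rule can be written out for a single assay as follows. This is a
simplified sketch of the rule as described above, not the implementation used
by `olink_lod()`; the helper `nc_lod_npx()` and the negative control values are
hypothetical.

```r
# Hypothetical PC normalized NPX values of 10 negative controls for one assay
nc_npx <- c(-1.2, -0.9, -1.1, -1.0, -1.3, -0.8, -1.0, -1.1, -0.9, -1.2)

# Median plus the larger of 3 standard deviations and 0.2 NPX
nc_lod_npx <- function(npx) {
  stats::median(npx) + max(0.2, 3 * stats::sd(npx))
}

nc_lod_npx(npx = nc_npx)
```

The 0.2 NPX floor matters when the negative controls show almost no variation:
in that case 3 standard deviations would be close to zero and the LOD would sit
directly on the median.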

------------------------------------------------------------------------

#### *A note on calculating LOD from counts*

*Some assays will use count values as the LOD because the assay receives very
few counts in the negative controls. For the convenience of data processing,
LOD values in counts are converted to NPX values in the `olink_lod()` function.
The LOD value for such an assay (in counts) will become many LOD values in NPX
(as extension control counts will vary across all samples). In addition, minor
changes on the counts scale can result in significant changes on the NPX scale
when working with small counts. The reason for this is that NPX is a relative
log2 scale, which is based on the ratio between the counts of the assay and the
counts of the extension control. For example, given that the extension control
values remain constant, if a count value were to change from 1 count to 2
counts, this would be a change of 1 NPX, while a change from 1000 counts to 1001
counts would be negligible on the NPX scale.*

*Furthermore, due to the low number of counts, the NPX values calculated from
these counts do not correlate to true background levels. The converted NPX
values should not be used as LOD values for these assays.*
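
The count example in this note can be reproduced numerically. The sketch below
uses a simplified log2 ratio of assay counts to extension control counts; the
helper `log2_ratio()` is hypothetical and leaves out the further normalization
steps that go into real NPX values.

```r
# Simplified relative scale: log2 of assay counts over extension control counts
log2_ratio <- function(assay_count, ext_count) {
  log2(assay_count / ext_count)
}

# Hypothetical, constant extension control count
ext_count <- 1000

# Change from 1 to 2 counts: a full unit on the log2 scale
diff_low <- log2_ratio(2, ext_count) - log2_ratio(1, ext_count)

# Change from 1000 to 1001 counts: close to zero on the log2 scale
diff_high <- log2_ratio(1001, ext_count) - log2_ratio(1000, ext_count)

c(low_counts = diff_low, high_counts = diff_high)
```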

------------------------------------------------------------------------

The resulting LOD is the PC normalized negative control LOD. In the event that
the Olink NGS dataset is intensity normalized, an intensity normalization 
adjustment factor is applied and the resulting intensity normalized LOD is 
reported in the LOD column and the PC normalized LOD is reported in the 
PCNormalizedLOD column.

```{r NCLOD_example, eval = FALSE, message = FALSE, warning = FALSE}
# Calculate LOD from negative controls

## Calculate LOD
df_npx_nc_lod <- OlinkAnalyze::olink_lod(
  data = df_npx_clean,
  lod_method = "NCLOD",
  check_log = check_log_df_npx_clean
)

## Generate check log on data with LOD
check_log_df_npx_nc_lod <- OlinkAnalyze::check_npx(
  df = df_npx_nc_lod
)

## Clean NPX data with LOD
df_npx_nc_lod_clean <- OlinkAnalyze::clean_npx(
  df = df_npx_nc_lod,
  check_log = check_log_df_npx_nc_lod
)

## Generate check log on cleaned data with LOD
check_log_df_npx_nc_lod_clean <- OlinkAnalyze::check_npx(
  df = df_npx_nc_lod_clean
)

# Cleanup intermediate objects
rm(
  df_npx_nc_lod,
  check_log_df_npx_nc_lod
)

# Rename final cleaned data with LOD for clarity
df_npx_nc_lod <- df_npx_nc_lod_clean
check_log_df_npx_nc_lod <- check_log_df_npx_nc_lod_clean
rm(
  df_npx_nc_lod_clean,
  check_log_df_npx_nc_lod_clean
)
```

## Integrating Fixed LOD

The fixed LOD method uses fixed LOD values that have been calculated on negative
controls used in Olink reference runs using the method described above for
negative control LOD. These values are specific to the Data Analysis Reference
ID, which can be found in your dataset. The fixed LOD data is available in an
external CSV file which can be downloaded from the Document Download Center at
[olink.com](https://olink.com/knowledge/documents). The fixed LOD values
reported in this CSV file are the PC normalized LODs.

The fixed LOD file is read into the `olink_lod()` function to be integrated into
an Olink NGS dataset. In the event that the NGS dataset is intensity
normalized, an intensity normalization adjustment factor is applied and the
resulting intensity normalized LOD is reported in the LOD column and the PC
normalized LOD is reported in the PCNormalizedLOD column.

```{r FixedLOD, eval = FALSE, message = FALSE, warning = FALSE}
# Integrating fixed LOD

## Fixed LOD file path
fixedlod_filepath <- "Path_to/ExploreHT_fixedLOD.csv"

## Calculate LOD
df_npx_fixed_lod <- OlinkAnalyze::olink_lod(
  data = df_npx_clean,
  lod_method = "FixedLOD",
  lod_file_path = fixedlod_filepath,
  check_log = check_log_df_npx_clean
)

## Generate check log on data with LOD
check_log_df_npx_fixed_lod <- OlinkAnalyze::check_npx(
  df = df_npx_fixed_lod
)

## Clean NPX data with LOD
df_npx_fixed_lod_clean <- OlinkAnalyze::clean_npx(
  df = df_npx_fixed_lod,
  check_log = check_log_df_npx_fixed_lod
)

## Generate check log on cleaned data with LOD
check_log_npx_fixed_lod_clean <- OlinkAnalyze::check_npx(
  df = df_npx_fixed_lod_clean
)

# Cleanup intermediate objects
rm(
  df_npx_fixed_lod,
  check_log_df_npx_fixed_lod
)

# Rename final cleaned data with LOD for clarity
df_npx_fixed_lod <- df_npx_fixed_lod_clean
check_log_df_npx_fixed_lod <- check_log_npx_fixed_lod_clean
rm(
  df_npx_fixed_lod_clean,
  check_log_npx_fixed_lod_clean
)
```

## When to Use Fixed LOD vs NC LOD

For smaller studies (\<10 NCs) we recommend using fixed LOD to integrate LOD
values into your NPX dataset, as LOD calculations on fewer NCs may provide
inaccurate values. However, it is important to keep in mind that fixed LOD
values are not specific to your project; rather, these values are generated by
Olink when a new lot of reagents is released.

For larger projects we recommend calculating LOD from NC to obtain LOD values
that are specific to your project. However, this requires that the dataset has
at least 10 NCs with passing SampleQC.
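
This recommendation can be expressed as a small helper that picks the value for
the `lod_method` argument. The function `choose_lod_method()` is hypothetical
and not part of Olink Analyze; it only encodes the threshold of 10 negative
controls described above.

```r
# Pick an lod_method value from the number of NCs with passing sample QC
choose_lod_method <- function(n_passing_nc, min_nc = 10L) {
  if (n_passing_nc >= min_nc) {
    "NCLOD"
  } else {
    "FixedLOD"
  }
}

choose_lod_method(n_passing_nc = 4L)
choose_lod_method(n_passing_nc = 16L)
```

The returned string can then be passed to `olink_lod()` together with
`lod_file_path` when the fixed LOD method is selected.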

## Integrating Both NC LOD and Fixed LOD

There is also the option to calculate both NC LOD and fixed LOD for a data file
by setting `lod_method` to `"Both"`. The resulting data will have 4 additional
columns, starting with NC or Fixed to indicate the method used to calculate LOD,
followed by LOD or PCNormalizedLOD as explained above. An example of the file
format is shown below. Note that these columns will not automatically be
recognized by other functions within Olink Analyze that use LOD (for example
`olink_bridgeselector()`). To use these functions, the LOD value to be used
should have "LOD" as the column name.

```{r, echo = FALSE}
table1 |>
  dplyr::mutate(
    Normalization = "Intensity"
  ) |>
  dplyr::mutate(
    PCNormalizedNPX = round(x = .data[["NPX"]], digits = 2L)
  ) |>
  dplyr::mutate(
    PCNormalizedLOD = round(x = .data[["LOD"]], digits = 2L)
  ) |>
  dplyr::mutate(
    NPX = round(.data[["NPX"]] + 4.16, digits = 2L)
  ) |>
  dplyr::mutate(
    LOD = round(.data[["LOD"]] + 4.16, digits = 2L)
  ) |>
  dplyr::rename(
    "FixedLOD" = "LOD",
    "FixedPCNormalizedLOD" = "PCNormalizedLOD"
  ) |>
  dplyr::mutate(
    NCLOD = .data[["FixedLOD"]] - 2.34,
    NCPCNormalizedLOD = .data[["FixedPCNormalizedLOD"]] - 2.34
  ) |>
  dplyr::select(
    dplyr::all_of(
      c("SampleID", "SampleType", "OlinkID", "UniProt", "Assay", "Count", "NPX",
        "Normalization", "PCNormalizedNPX", "FixedLOD", "FixedPCNormalizedLOD",
        "NCLOD", "NCPCNormalizedLOD")
    )
  ) |>
  knitr::kable(
    caption = "Example results using both LOD calculation methods"
  ) |>
  kableExtra::kable_styling(
    font_size = 10L
  )
```
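
To make one of the two sets of LOD values available under the column name
"LOD", the chosen columns can be copied. The sketch below uses a made-up data
frame (`df_toy` and its values are hypothetical) and picks the NC LOD columns;
the fixed LOD columns could be copied in the same way.

```r
# Hypothetical toy data with the four LOD columns described above
df_toy <- data.frame(
  OlinkID = c("OID00001", "OID00002"),
  FixedLOD = c(1.10, 0.85),
  FixedPCNormalizedLOD = c(-3.06, -3.31),
  NCLOD = c(0.90, 0.70),
  NCPCNormalizedLOD = c(-3.26, -3.46)
)

# Copy the NC LOD values into the column names that other functions expect
df_toy$LOD <- df_toy$NCLOD
df_toy$PCNormalizedLOD <- df_toy$NCPCNormalizedLOD

df_toy
```

Copying rather than renaming keeps the method-specific columns in place, so it
stays visible which method the values in `LOD` came from.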

## Adjusting LOD for Intensity Normalized Data

If an Olink NGS dataset is intensity normalized, a normalization 
adjustment factor is applied to the PC normalized LOD within the `olink_lod()` 
function.

For each assay, this adjustment factor is calculated as the median NPX of all
samples (excluding Olink's external controls) within each plate. For Olink
Explore 3072, overlapping assays are assessed separately, within their
respective panels. The intensity normalized negative control LOD is calculated
by subtracting this adjustment factor from the PC normalized negative control
LOD.

The intensity normalization LOD adjustment is applied to both the negative
control and fixed LOD methods.
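
The adjustment can be illustrated on a small example. This is a sketch of the
calculation as described above, not the code inside `olink_lod()`. The toy data
are hypothetical, and it is an assumption here that the plate median is taken
over `PCNormalizedNPX` and that samples are labelled `"SAMPLE"` in SampleType.

```r
# Hypothetical toy data: one assay on two plates, three samples and one NC each
df_toy <- data.frame(
  PlateID = rep(c("P1", "P2"), each = 4L),
  OlinkID = "OID00001",
  SampleType = rep(c("SAMPLE", "SAMPLE", "SAMPLE", "NEGATIVE_CONTROL"), 2L),
  PCNormalizedNPX = c(1, 2, 3, -4, 3, 4, 5, -4),
  PCNormalizedLOD = 0.5
)

# Adjustment factor: median NPX of the samples per plate and assay
adj <- stats::aggregate(
  PCNormalizedNPX ~ PlateID + OlinkID,
  data = df_toy[df_toy$SampleType == "SAMPLE", ],
  FUN = stats::median
)
names(adj)[names(adj) == "PCNormalizedNPX"] <- "adj_factor"

# Intensity normalized LOD: PC normalized LOD minus the adjustment factor
df_toy <- merge(x = df_toy, y = adj, by = c("PlateID", "OlinkID"))
df_toy$LOD <- df_toy$PCNormalizedLOD - df_toy$adj_factor

unique(df_toy[, c("PlateID", "OlinkID", "adj_factor", "LOD")])
```

Because the factor is plate specific, the same PC normalized LOD results in a
different intensity normalized LOD on each plate.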

## Handling LOD in Bridge-Normalized Data

When using the `olink_normalization_bridge()` function to bridge two datasets,
the reference project remains unchanged throughout the bridging procedure,
including the NC LOD (calculated from negative controls in the reference
project) and the fixed LOD. In contrast, the LOD values (both NC LOD and fixed
LOD) for the non-reference project are adjusted using the same adjustment factor
that is applied to all other samples for the corresponding assay. This
adjustment factor is the median of the paired NPX differences per assay between
the bridging samples.
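
The bridging adjustment can be sketched for one assay as follows. The values
are hypothetical, and the direction of the difference (reference minus
non-reference, added to the non-reference values) is an assumption made for
this illustration rather than a statement about the internals of
`olink_normalization_bridge()`.

```r
# Hypothetical NPX of three bridging samples for one assay in each project
npx_ref <- c(5.0, 6.2, 4.8)
npx_non_ref <- c(4.5, 5.9, 4.1)

# Adjustment factor: median of the paired NPX differences
adj_factor <- stats::median(npx_ref - npx_non_ref)

# The same factor shifts both NPX and LOD of the non-reference project
lod_non_ref <- 1.0
lod_non_ref_adjusted <- lod_non_ref + adj_factor

c(adj_factor = adj_factor, lod_adjusted = lod_non_ref_adjusted)
```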

Consequently, for **within-product bridging**, LOD values from both the
reference and non-reference projects can be used for downstream analysis. In
contrast, for **between-product bridging**, differences in assay bridgeability
between products and the use of distinct normalization methods (median centering
versus quantile smoothing) should be taken into account. Therefore, we recommend
applying the LOD values from the reference project to the non-reference project
in between-product bridging.

## Export Olink NGS Data with LOD

Olink NGS data with LOD information can be exported as a parquet file in long
format using `arrow::write_parquet()`.

```{r explore_npx_export, eval = FALSE, message = FALSE, warning = FALSE}
# Exporting Olink NGS data with LOD information as a parquet file

## Integrate both Negative Control LOD and fixed LOD before NPX preprocessing
df_npx_both_lod <- OlinkAnalyze::olink_lod(
  data = df_npx_clean,
  lod_file_path = fixedlod_filepath,
  lod_method = "Both",
  check_log = check_log_df_npx_clean
)

## Generate check log
check_log_df_npx_both_lod <- OlinkAnalyze::check_npx(
  df = df_npx_both_lod
)

## Clean NPX
df_npx_both_lod_clean <- OlinkAnalyze::clean_npx(
  df = df_npx_both_lod,
  check_log = check_log_df_npx_both_lod
)

## Generate check log on cleaned data
check_log_npx_both_lod_clean <- OlinkAnalyze::check_npx(
  df = df_npx_both_lod_clean
)

# Cleanup intermediate objects
rm(
  df_npx_both_lod,
  check_log_df_npx_both_lod
)

# Rename final cleaned data with both LOD for clarity
df_npx_both_lod <- df_npx_both_lod_clean
check_log_df_npx_both_lod <- check_log_npx_both_lod_clean
rm(
  df_npx_both_lod_clean,
  check_log_npx_both_lod_clean
)

# Add metadata for export
df_npx_both_lod_arrow <- df_npx_both_lod |>
  arrow::as_arrow_table()

df_npx_both_lod_arrow$metadata$FileVersion <- "NA"
df_npx_both_lod_arrow$metadata$ExploreVersion <- "NA"
df_npx_both_lod_arrow$metadata$ProjectName <- "NA"
df_npx_both_lod_arrow$metadata$SampleMatrix <- "NA"
df_npx_both_lod_arrow$metadata$DataFileType <- "Olink Analyze Export File"
# One of "ExploreHT", "Explore3072", or "Reveal"
df_npx_both_lod_arrow$metadata$ProductType <- "ExploreHT"
# One of "ExploreHT", "Explore3072", or "Reveal"
df_npx_both_lod_arrow$metadata$Product <- "ExploreHT"

arrow::write_parquet(
  x = df_npx_both_lod_arrow,
  sink = "path_to_output.parquet"
)
```

## Contact Us

We are always happy to help. Email us with any questions:

-   biostat\@olink.com for statistical services and general stats
    questions

-   support\@olink.com for Olink lab product and technical support

-   info\@olink.com for more information

## Legal Disclaimer

© `r format(Sys.Date(), "%Y")` Olink Proteomics AB, part of Thermo Fisher
Scientific.

Olink products and services are For Research Use Only. Not for use in diagnostic
procedures.

All information in this document is subject to change without notice. This
document is not intended to convey any warranties, representations and/or
recommendations of any kind, unless such warranties, representations and/or
recommendations are explicitly stated.

Olink assumes no liability arising from a prospective reader’s actions based on
this document.

OLINK, NPX, PEA, PROXIMITY EXTENSION, INSIGHT and the Olink logotype are
trademarks registered, or pending registration, by Olink Proteomics AB. All
third-party trademarks are the property of their respective owners.

Olink products and assay methods are covered by several patents and patent
applications: [https://olink.com/patents/](https://olink.com/patents/).