Olink® NPX datasets are normalized using either plate control normalization or intensity normalization. Intensity normalization assumes that all samples within a project are fully randomized.
The joint analysis of two or more Olink® NPX datasets often requires an additional batch correction step, referred to as bridging, to remove technical variation between the datasets.
Bridging is needed if the Olink® NPX datasets are:
plate control normalized only, and run conditions (e.g., lab or reagent lots) have changed.
intensity normalized, but from two different sample populations.
To bridge two or more Olink® NPX datasets, bridging samples are needed to calculate adjustment factors between the datasets. Bridging samples are samples that are shared between the datasets. For Explore 1536 datasets, the recommended number of bridging samples is 8 to 16. Olink® NPX datasets without overlapping samples cannot be combined for joint analysis using the bridging approach described below.
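Before bridging, it can help to confirm that the two datasets actually share enough samples. The sketch below uses base R and made-up sample IDs (`ids_1` and `ids_2` are hypothetical stand-ins for the SampleID columns of two NPX tables). It assumes that control samples carry "CONTROL_SAMPLE" in their ID, as in the example data, and it applies the 8 to 16 range quoted above for Explore 1536, which may not suit other products.

```r
# Hypothetical sample IDs from two datasets (illustration only)
ids_1 <- c(paste0("A", 1:10), "CONTROL_SAMPLE_AS 1")
ids_2 <- c(paste0("A", 3:12), "CONTROL_SAMPLE_AS 1")

# Candidate bridging samples: IDs present in both datasets, controls excluded
bridge_ids <- intersect(ids_1, ids_2)
bridge_ids <- bridge_ids[!grepl("CONTROL_SAMPLE", bridge_ids, fixed = TRUE)]
n_bridge <- length(bridge_ids)

# Without any overlap, the bridging approach cannot be used at all
if (n_bridge == 0) {
  stop("No overlapping samples: the datasets cannot be bridged.")
}

# Assumed range: the recommendation given for Explore 1536 datasets
if (n_bridge < 8 || n_bridge > 16) {
  warning("Found ", n_bridge, " bridging samples; 8 to 16 are recommended.")
}

print(bridge_ids)
```

With real data, `ids_1` and `ids_2` would be replaced by the SampleID columns of the two NPX tables.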
The following tutorial is designed to give you an overview of the kinds of data combining methods that are possible using the Olink® bridging procedure.
library(OlinkAnalyze)
library(dplyr)
library(stringr)

The bridging objects are standard Olink® NPX tables. They can be loaded using the read_NPX() function with an NPX Manager output file as input.
data1 <- read_NPX("~/NPX_file1_location.xlsx")
data2 <- read_NPX("~/NPX_file2_location.xlsx")

To demonstrate how bridging works, we will use the example datasets (npx_data1 and npx_data2) from the OlinkAnalyze package.
We can use the olink_normalization() function to bridge two datasets. The bridging procedure first calculates, for each assay, the median of the paired NPX differences between the bridging samples; this median is the adjustment factor. The adjustment factors are then used to adjust the NPX values between the two datasets. The output of olink_normalization() is an NPX table with the adjusted NPX values in the column NPX.
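The adjustment-factor calculation described above can be sketched on toy data. This is a minimal base R illustration, not the internals of olink_normalization(): the data frames `ref` and `new`, the assay IDs, and all NPX values are made up, and the sign convention (reference minus non-reference, added to the non-reference dataset) is an assumption.

```r
# Toy NPX values for three bridging samples and two assays (made-up numbers)
ref <- data.frame(
  SampleID = c("S1", "S2", "S3", "S1", "S2", "S3"),
  OlinkID  = c("OID1", "OID1", "OID1", "OID2", "OID2", "OID2"),
  NPX      = c(5.0, 6.0, 7.0, 2.0, 3.0, 4.0)
)
new <- data.frame(
  SampleID = c("S1", "S2", "S3", "S1", "S2", "S3"),
  OlinkID  = c("OID1", "OID1", "OID1", "OID2", "OID2", "OID2"),
  NPX      = c(4.5, 5.0, 6.5, 2.5, 3.5, 4.25)
)

# Pair each bridging sample's measurements across the two datasets
paired <- merge(ref, new, by = c("SampleID", "OlinkID"),
                suffixes = c("_ref", "_new"))
paired$diff <- paired$NPX_ref - paired$NPX_new

# Per-assay adjustment factor: median of the paired differences
adj <- aggregate(diff ~ OlinkID, data = paired, FUN = median)
names(adj)[names(adj) == "diff"] <- "Adj_factor"
print(adj)  # OID1 -> 0.5, OID2 -> -0.5

# Shift the non-reference dataset; the reference dataset stays unchanged
new_adj <- merge(new, adj, by = "OlinkID")
new_adj$NPX_adj <- new_adj$NPX + new_adj$Adj_factor
```

The median is used rather than the mean, so a single bridging sample with an unusual difference has little influence on the adjustment factor of an assay.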
# Find overlapping samples
npx_1 <- npx_data1 %>%
  mutate(dataset = "data1")
npx_2 <- npx_data2 %>%
  mutate(dataset = "data2")
# Sample IDs present in both datasets, with control samples removed
overlap_samples <- intersect(npx_1$SampleID, npx_2$SampleID)
overlap_samples <- overlap_samples[!str_detect(overlap_samples, "CONTROL_SAMPLE")]
# Perform Bridging normalization
npx_br_data <- olink_normalization(df1 = npx_1,
                                   df2 = npx_2,
                                   overlapping_samples_df1 = overlap_samples,
                                   df1_project_nr = '20200001',
                                   df2_project_nr = '20200002',
                                   reference_project = '20200001')
glimpse(npx_br_data)
#> Rows: 61,824
#> Columns: 19
#> $ SampleID <chr> "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "CONTROL…
#> $ Index <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
#> $ OlinkID <chr> "OID01216", "OID01216", "OID01216", "OID01216", "OID0121…
#> $ UniProt <chr> "O00533", "O00533", "O00533", "O00533", "O00533", "O0053…
#> $ Assay <chr> "CHL1", "CHL1", "CHL1", "CHL1", "CHL1", "CHL1", "CHL1", …
#> $ MissingFreq <dbl> 0.01875, 0.01875, 0.01875, 0.01875, 0.01875, 0.01875, 0.…
#> $ Panel_Version <chr> "v.1201", "v.1201", "v.1201", "v.1201", "v.1201", "v.120…
#> $ PlateID <chr> "Example_Data_1_CAM.csv", "Example_Data_1_CAM.csv", "Exa…
#> $ QC_Warning <chr> "Pass", "Pass", "Pass", "Pass", "Pass", "Pass", "Pass", …
#> $ LOD <dbl> 2.368467, 2.368467, 2.368467, 2.368467, 2.368467, 2.3684…
#> $ NPX <dbl> 12.956143, 11.269477, 25.451070, 14.453038, 7.628712, 6.…
#> $ Subject <chr> "ID1", "ID1", "ID1", "ID2", "ID2", "ID2", "ID3", "ID3", …
#> $ Treatment <chr> "Untreated", "Untreated", "Untreated", "Untreated", "Unt…
#> $ Site <chr> "Site_D", "Site_D", "Site_D", "Site_C", "Site_C", "Site_…
#> $ Time <chr> "Baseline", "Week.6", "Week.12", "Baseline", "Week.6", "…
#> $ Project <chr> "20200001", "20200001", "20200001", "20200001", "2020000…
#> $ Panel <chr> "Olink Cardiometabolic", "Olink Cardiometabolic", "Olink…
#> $ dataset <chr> "data1", "data1", "data1", "data1", "data1", "data1", "d…
#> $ Adj_factor <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…

A PCA plot is used to visualize sample-to-sample distances before and after bridging.
## Before bridging
### Generate unique SampleIDs
npx_1 <- npx_data1 %>%
  mutate(dataset = "data1") %>%
  mutate(SampleID = paste0(dataset, PlateID, SampleID))
npx_2 <- npx_data2 %>%
  mutate(dataset = "data2") %>%
  mutate(SampleID = paste0(dataset, PlateID, SampleID))
npx_before_br <- rbind(npx_1, npx_2)
### PCA plot
OlinkAnalyze::olink_pca_plot(df = npx_before_br,
                             color_g = "dataset",
                             byPanel = TRUE,
                             coloroption = c("orange", "darkblue"))

PCA plot of combined datasets without bridging
## After bridging
### Generate unique SampleIDs
npx_after_br <- npx_br_data %>%
  mutate(SampleID = paste0(dataset, PlateID, SampleID))
### PCA plot
OlinkAnalyze::olink_pca_plot(df = npx_after_br,
                             color_g = "dataset",
                             byPanel = TRUE,
                             coloroption = c("orange", "darkblue"))

PCA plot of combined datasets after bridging