Holomics

Introduction

Holomics is an R Shiny application that allows its users to perform single- and multi-omics analyses by providing a user-friendly interface to upload the different omics datasets, select and run the implemented algorithms and finally visualize the generated results.

Holomics is mainly built on the R package mixOmics (Rohart et al. 2017), which offers numerous algorithms for the integrative analysis of omics datasets. From this repertoire, the single-omics algorithms “Principal Component Analysis” (PCA) and “Partial Least Squares Discriminant Analysis” (PLS-DA), the pairwise-omics analysis “sparse Partial Least Squares” (sPLS) and the multi-omics framework DIABLO (“Data Integration Analysis for Biomarker discovery using Latent variable approaches for Omics studies”) have been implemented in Holomics.

Getting started

Installation

For the current Holomics version, it is very important that you use R 4.2 and check that mixOmics is installed in version 6.22.0.

CRAN

install.packages("Holomics")

GitHub

# Install devtools if it is not already installed
install.packages("devtools")
library(devtools)

# Install Holomics package 
install_github("MolinLab/Holomics")

Additional packages

You need to install the Bioconductor packages separately.

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("mixOmics")
BiocManager::install("BiocParallel")

It could be that you need to install some further Bioconductor packages separately. In that case, just use the code snippet above with the corresponding package name.
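The version requirement mentioned at the start of this section can be checked after installation with a few lines of base R. This is only a minimal sketch: `report_versions()` is a hypothetical helper and not part of Holomics, and it reports the versions without judging them.

```r
# Report the versions relevant for Holomics without assuming that
# mixOmics is already installed (hypothetical helper, not part of Holomics).
report_versions <- function() {
  mixomics_version <- if (requireNamespace("mixOmics", quietly = TRUE)) {
    as.character(utils::packageVersion("mixOmics"))
  } else {
    "not installed"
  }
  c(R = as.character(getRversion()), mixOmics = mixomics_version)
}

versions <- report_versions()
print(versions)
```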

Start application

Within the RStudio environment, start the application either with

library(Holomics)
run_app()

or

Holomics::run_app()

Data preparation

Holomics requires two input file types:

- the dataset(s) with the measured values of the performed omics analysis (e.g. transcriptomics, metabolomics)
- a metadata file containing the label or class information associated with the samples

Omics dataset

Omics datasets can contain molecular features measured on a continuous scale (e.g., microarray, mass spectrometry-based metabolomics) or sequence-based count data (e.g., RNA-seq, 16S rRNA amplicon sequencing), which become continuous after pre-processing and normalization.

In general, the omics dataset must be a numeric matrix (essentially a count table) in either .xlsx, .csv or .txt format, where rows represent samples and columns represent measured features (see Table 1). The first column has to contain the sample names and the first row the feature names. There are no strict restrictions on the characters and symbols used in names, but it is recommended to minimize the use of special characters.

It is important that all omics datasets used together in an analysis share the same sample names and follow the same sample order.
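The requirement of identical sample names and sample order can be checked before uploading. The following is a minimal sketch in base R. `samples_aligned()` and the example matrices are hypothetical and only illustrate the idea; they are not part of Holomics.

```r
# Hypothetical helper: TRUE only if every dataset has the same sample
# names (row names) in the same order as the first dataset.
samples_aligned <- function(datasets) {
  reference <- rownames(datasets[[1]])
  all(vapply(datasets, function(d) identical(rownames(d), reference), logical(1)))
}

# Made-up example data with samples in rows and features in columns.
samples <- paste0("Sample", 1:4)
transcriptomics <- matrix(1:8, nrow = 4,
                          dimnames = list(samples, c("gene_a", "gene_b")))
metabolomics <- matrix(8:1, nrow = 4,
                       dimnames = list(samples, c("met_a", "met_b")))
shuffled <- metabolomics[c(2, 1, 3, 4), ]

samples_aligned(list(transcriptomics, metabolomics))  # TRUE
samples_aligned(list(transcriptomics, shuffled))      # FALSE

# Reorder a dataset to match the reference order before uploading.
reordered <- shuffled[rownames(transcriptomics), , drop = FALSE]
```

Reordering by row name, as in the last line, only works if the sample names themselves are identical across datasets.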

If the dataset contains more features (columns) than Excel allows, the matrix can also be uploaded in transposed format.

Additionally, if an omics dataset contains more than 10,000 features, it will be pre-filtered to 10,000 or fewer features, as mixOmics can process a maximum of 10,000 features per dataset (Lê Cao 2023).
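To illustrate what such a pre-filtering step can look like, the sketch below keeps the features with the highest variance. The variance criterion and the helper `prefilter_by_variance()` are assumptions for illustration; the text above does not state which criterion Holomics applies internally.

```r
# Hypothetical pre-filter: keep the `max_features` columns with the
# highest variance. The criterion is an assumption, not Holomics' own.
prefilter_by_variance <- function(x, max_features = 10000) {
  if (ncol(x) <= max_features) {
    return(x)
  }
  feature_variance <- apply(x, 2, var)
  keep <- order(feature_variance, decreasing = TRUE)[seq_len(max_features)]
  x[, sort(keep), drop = FALSE]  # sort() preserves the original column order
}

# Made-up data: 8 samples, 50 features, reduced to 10 features.
set.seed(1)
x <- matrix(rnorm(8 * 50), nrow = 8,
            dimnames = list(paste0("Sample", 1:8), paste0("feature_", 1:50)))
filtered <- prefilter_by_variance(x, max_features = 10)
dim(filtered)
```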

Any desired or required normalization must be performed separately on each omics dataset before using Holomics.

Table 1: Example of an omics dataset with sample names in the first column and feature names in the first row.
	ASV_1855	ASV_258	ASV_1015	ASV_1018	ASV_103	ASV_1089	ASV_1101
Sample1	0	0	24	21	120	0	0
Sample2	0	90	25	0	241	8	0
Sample3	0	0	0	16	150	0	8
Sample4	0	0	14	7	307	0	0
Sample5	16	0	29	0	91	8	0
Sample6	14	0	45	20	94	0	13
Sample7	14	0	0	20	63	0	27
Sample8	12	0	0	17	95	14	9
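A table in the layout of Table 1 can be inspected in R before it is uploaded. The following sketch reads a shortened, tab-separated copy of the table from a string and checks that it is a numeric matrix with sample names as row names. The object names are chosen for illustration only.

```r
# Shortened copy of Table 1 (tab-separated, empty first header cell).
raw_text <- "\tASV_1855\tASV_258\tASV_1015
Sample1\t0\t0\t24
Sample2\t0\t90\t25
Sample3\t0\t0\t0"

# The first column holds the sample names, the first row the feature names.
# check.names = FALSE keeps the feature names exactly as written.
omics <- as.matrix(read.delim(text = raw_text, row.names = 1, check.names = FALSE))

is.numeric(omics)  # all entries must be numeric
dim(omics)         # samples x features
anyNA(omics)       # missing values should be handled before the upload
```

For a real file, `text = raw_text` would be replaced by the file path.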

Metadata file

The metadata file has to be in .xlsx or .csv format with at least two columns (see Table 2 for an example):

- The first column contains the sample names, which have to be exactly the same and in the same order as in the associated omics datasets.
- The second column contains the classes or labels of the samples (e.g. control, treatment1, …). The column name of the second column has to be the name of the attribute the classes describe (e.g. Quality, Treatment, …).
- Optional: in the third column, a color code (HEX code or ASCII name) per class can be added, which will be used later in most of the plots. An online color picker can help to find suitable HEX codes.

Table 2: Example of a valid metadata file with the mandatory first two columns (sample name and label information) and the optional column with the color codes.
Samples	Class	Colorcode
Sample1	control	#b3ca81
Sample2	control	#b3ca82
Sample3	control	#b3ca83
Sample4	control	#b3ca84
Sample5	treatment1	red
Sample6	treatment1	red
Sample7	treatment1	red
Sample8	treatment1	red
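A metadata file in this layout can also be created directly in R. The sketch below builds a small table, checks that every color code is one that R can interpret, and writes it as a .csv file. The helper `is_valid_color()` and the file name are hypothetical; only base R functions are used.

```r
metadata <- data.frame(
  Samples = paste0("Sample", 1:8),
  Class = rep(c("control", "treatment1"), each = 4),
  Colorcode = rep(c("#b3ca81", "red"), each = 4)
)

# col2rgb() throws an error for strings that R cannot interpret as a
# color, which is used here as a simple validity check.
is_valid_color <- function(color) {
  !inherits(try(grDevices::col2rgb(color), silent = TRUE), "try-error")
}
all(vapply(metadata$Colorcode, is_valid_color, logical(1)))

# Write the file without R's row numbers, so that the sample names
# stay in the first column.
metadata_file <- file.path(tempdir(), "metadata.csv")
write.csv(metadata, metadata_file, row.names = FALSE)
```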

Workflow overview

To make use of all the functionality provided by Holomics, the workflow described below should be followed. First, datasets are uploaded, during which any necessary pre-filtering or transformation steps can be applied. Next, single-omics analysis is performed to identify key features and reduce the datasets accordingly. Using these reduced datasets, multi-omics analysis is then conducted to identify correlations between two or more datasets.

Figure 1: Visualisation of the Holomics workflow, which goes from uploading the input datasets to performing single-omics analyses including feature selection and finally going into the multi-omics analyses. Alternatively, it is possible to go directly into the multi-omics analyses.

NOTE: If pre-filtered datasets (ideally processed earlier using Holomics) are uploaded, the dataset reduction step can be skipped, and the workflow can proceed directly to the multi-omics analysis. This shortcut can also be used if the original, unfiltered datasets are to be used in the multi-omics analysis. In either case, the “Was previously filtered” checkbox must be selected to enable this route, effectively bypassing the reduction step.

Data upload

On the data upload page, the omics datasets (Figure 2) and the related metadata (including sample names and labels, see Figure 3) are uploaded. Both sub-pages provide a collapsible “General information” box, which describes what the different input files should look like. Additionally, question marks next to some form fields open a tooltip with a quick explanation.

Upload an omics dataset

To upload an omics dataset (e.g., a transcriptomics read count table or a metabolomics table), the “Omics data” data type must be selected.

Then the following fields must or can be filled in:

- File upload: Dataset as .xlsx, .csv or .txt file
- Separator: Character that separates the data entries (for an .xlsx file, any separator can be selected)
- Data name: Name for the dataset, which is used inside the Holomics app to differentiate between the datasets
- Was previously filtered: Check if the dataset has already been filtered at an earlier stage (preferably using Holomics); the filtering steps are then skipped
- Is microbiome data: Check if the dataset originates from a microbial community analysis (e.g., an OTU/ASV table). If checked, the mixMC pipeline for microbiome data (Le Cao et al. 2016) is triggered to preprocess the data.
- Has transposed format: Check if the samples are in columns and the features in rows
- Use for … omics analysis: The dataset can be used for single-omics analysis, multi-omics analysis, or both. If “multi” is selected, “Was previously filtered” also needs to be selected.
- Name for plots: Choose or manually enter the plot name to be used for the dataset in visualizations. This plot name does not need to match the earlier file name and can even be the same for multiple datasets.
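For the “Is microbiome data” option, the following sketch gives an idea of the kind of transformation used for compositional count data. It applies a centered log-ratio (CLR) transformation with an offset of 1 in base R. CLR is one of the log-ratio transformations described for mixMC, but whether Holomics runs exactly these steps is an assumption, and `clr_transform()` is a hypothetical helper.

```r
# Centered log-ratio (CLR) transformation per sample (row), with an
# offset of 1 so that zero counts can be log-transformed.
clr_transform <- function(counts, offset = 1) {
  log_counts <- log(counts + offset)
  log_counts - rowMeans(log_counts)
}

# First two samples and five features of Table 1.
counts <- matrix(c(0,  0, 24, 21, 120,
                   0, 90, 25,  0, 241),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Sample1", "Sample2"),
                                 c("ASV_1855", "ASV_258", "ASV_1015",
                                   "ASV_1018", "ASV_103")))
clr <- clr_transform(counts)
round(rowSums(clr), 10)  # each row sums to zero after CLR
```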

By clicking the Save button, the dataset is stored within the Holomics application, and a summary of the uploaded information is displayed in the table on the right side of the page.

Figure 2: Visualization of the upload page for omics datasets. On the left is an example of a completed form for a new dataset before saving, and on the right is a summary of an already uploaded dataset.

Upload the metadata file

To upload the metadata file containing the labels or class information for each sample, the “Labels/Classes” data type must be selected.

Then the following fields must or can be filled in:

- File upload: Labels file as .xlsx or .csv file
- Data name: Name for the metadata, which is used inside the Holomics app to differentiate between the uploaded metadata files
- Includes color code: Check this box if the file includes an optional third column with color codes (either HEX codes or ASCII color names) for each sample. This custom color scheme will then be applied in most of the plots to visually distinguish the different classes

After saving, a summary of the uploaded metadata is displayed in the table on the right side of the page.

Figure 3 shows a summary of an already uploaded metadata file on the right side, and the form to upload a new or additional file on the left.

Figure 3: Visualization of the upload page for the metadata file. On the left side, the form for uploading new metadata files is shown, and on the right side, a summary of an already uploaded file is presented.

Single-omics analyses

Principal Component Analysis

For a first glimpse into the individual omics data and for the necessary feature selection, a Principal Component Analysis (PCA) can be done. This page (see Figure 4) is separated into two parts:

On the left side, the original—but already pre-processed and potentially pre-filtered—data are displayed. At the top left of the page, the omics dataset to be analyzed is selected, along with the corresponding metadata file, via the drop-down menus. If the selected files are not compatible, an error message will be displayed.

The “General Information” box briefly explains the key concepts of PCA and provides links to additional learning resources. In the “Analysis Parameters” box, the number of components to use for the PCA (a value between 2 and 15) can be selected. By default, the “Scaling” checkbox is activated, but it can be deactivated if the data do not require scaling.

Below, several plots display the PCA results based on the parameters selected above.

After triggering the feature selection process by clicking the “Perform feature selection” button at the top-middle of the page, the reduced dataset is displayed on the right side of the page. The number of components needed to explain at least 80% of the variance in the dataset is computed. Once this number is set, the algorithm determines how many features per component should be selected to optimize the model’s performance.

All features not used in any component are removed from the dataset (note that the original dataset remains unchanged), and the reduced dataset is then used to calculate the PCA results shown in the plots.
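The 80% rule can be illustrated with base R. The sketch below uses `prcomp()` on made-up data and determines the smallest number of components whose cumulative explained variance reaches 80%. It is not the Holomics implementation, which builds on mixOmics and additionally selects features per component; the upper limit of 15 components is taken from the analysis parameters described above.

```r
# Made-up data: 20 samples, 30 features.
set.seed(42)
x <- matrix(rnorm(20 * 30), nrow = 20,
            dimnames = list(paste0("Sample", 1:20), paste0("feature_", 1:30)))

# scale. = TRUE corresponds to an activated "Scaling" checkbox.
pca <- prcomp(x, center = TRUE, scale. = TRUE)
explained <- pca$sdev^2 / sum(pca$sdev^2)
cumulative <- cumsum(explained)

# Smallest number of components with cumulative explained variance >= 80%.
n_components <- which(cumulative >= 0.8)[1]

max_components <- 15  # upper limit for the number of components
if (n_components > max_components) {
  message("More than ", max_components, " components would be needed.")
}
```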

The “General Information” and “Reduced Dataset Parameters” boxes above the plots provide additional details about the reduced dataset and the overall feature selection process.

Additionally, the reduced dataset can be downloaded in the .xlsx file format through the “Save reduced data” button, which will appear below the “Perform feature selection” button after the feature selection process has finished. Also, the reduced dataset will automatically be saved in the running Holomics application to be used later for the multi-omics analysis.

NOTE: if the feature selection process calculates that the PCA needs more than 15 components to reach the minimum of 80% of the explained variance, the feature selection process will be aborted, as the calculation would take too long. It is then recommended to use the PLS-DA function instead.

Figure 4: Visualization of the PCA page. On the left side the original dataset can be inspected. On the right side the results are presented after the feature selection process.

Partial Least Squares Discriminant Analysis

The Partial Least Squares Discriminant Analysis (PLS-DA) can also be used for single-omics datasets, but in comparison to PCA, PLS-DA is a supervised method where the information of the corresponding class (or label) is included in its computation.

At the top left of the page (Figure 5), the omics dataset to be analyzed and the corresponding metadata file are selected. If the two are incompatible, an error message will be displayed. The “General Information” box briefly explains the concepts behind PLS-DA and provides links to additional resources. In the “Analysis Parameters” box, users can adjust the number of components (between 2 and 15) and choose whether to scale the dataset (scaling is enabled by default). The resulting plots are displayed below. These plots are similar to those in the PCA analysis, except that PLS-DA does not produce a scree plot.

To perform feature selection, click the “Perform feature selection” button located in the middle of the page. The results will appear on the right side. Unlike the PCA feature selection process, the number of components selected on the left influences the algorithm here. The algorithm tests models with 1 up to n components (where n is the selected number of components) and chooses the number that yields the lowest balanced error rate (BER). It then calculates the number of features per component and reduces the dataset accordingly, keeping only these features. Note that after feature selection, the number of components and the scaling option cannot be changed for the resulting plots.
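The balanced error rate is the mean of the per-class error rates, so a small class counts as much as a large one. The sketch below computes it for made-up predictions and then picks the number of components with the lowest BER. `balanced_error_rate()` and all values are illustrative assumptions, not output of Holomics or mixOmics.

```r
# BER = mean of the per-class error rates.
balanced_error_rate <- function(truth, predicted) {
  truth <- factor(truth)
  per_class_error <- vapply(levels(truth), function(cl) {
    mean(predicted[truth == cl] != cl)
  }, numeric(1))
  mean(per_class_error)
}

truth     <- c("control", "control", "control", "control", "treatment1", "treatment1")
predicted <- c("control", "control", "control", "treatment1", "control", "treatment1")
balanced_error_rate(truth, predicted)  # (1/4 + 1/2) / 2 = 0.375

# Made-up BER values for models with 1 to 3 components.
ber_per_ncomp <- c(comp1 = 0.40, comp2 = 0.25, comp3 = 0.30)
best_ncomp <- which.min(ber_per_ncomp)
```

In this example the overall error rate would be 2/6, while the BER is higher because the smaller class is misclassified more often.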

Additionally, the reduced dataset can be downloaded as an .xlsx file by clicking the “Save reduced data” button, which appears after feature selection is complete. The reduced dataset is also automatically saved within the running Holomics application for later use in multi-omics analysis.

Figure 5: Visualization of the PLS-DA page. On the left side the original dataset can be examined. On the right side the results after feature selection are presented. In this case, three components were calculated (the number was set on the left side) and the error rates plot shows the performance of the models using different numbers of components.

Pairwise omics analysis

For the pairwise omics analysis (integration of two different omics datasets), the sparse Partial Least Squares (sPLS) method is applied. The structure of the page (Figure 6) is the same as previously described in the single-omics analysis pages. At the top, the two datasets X and Y and the corresponding metadata file are selected.

On the left side of the page, general information about sPLS is provided, along with options to adjust the analysis parameters. Users can also choose between two algorithm modes: “regression” and “canonical.” In regression mode, dataset X is used to predict dataset Y, so switching the roles of the datasets (X becomes Y and vice versa) will produce different results. In contrast, canonical mode treats the datasets as interchangeable. This mode is recommended when there is no known directional dependency between the datasets. More details about these modes can be found in the mixOmics documentation.

Instead of the “Perform feature selection” button used in the PCA and PLS-DA sections, sPLS uses a “Tune parameters” button located in the middle of the page. The tuning process is similar to feature selection but includes additional steps. Based on the user-defined number of components, cross-validation is performed to calculate the Q² score for each component. The tuning step assesses the correlation between actual and predicted components by varying the number of selected features for each dataset.

The algorithm then identifies:
- The optimal number of components, defined as the last component with a total Q² greater than 0.0975.
- The optimal number of features, based on the configuration that produces the highest correlation.

The tuning process tests all components from 1 to n (as set on the left) and selects one as the “ideal” configuration.
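The Q² rule from the list above can be written in a few lines of base R. The threshold 0.0975 corresponds to 1 - 0.95². `select_ncomp_by_q2()` is a hypothetical helper, the Q² values are made up, and falling back to one component when no component passes the threshold is an assumption of this sketch.

```r
q2_threshold <- 1 - 0.95^2  # = 0.0975

# Hypothetical helper: last component whose total Q2 exceeds the threshold.
# Falling back to 1 when no component passes is an assumption of this sketch.
select_ncomp_by_q2 <- function(q2_total, threshold = 0.0975) {
  passing <- which(q2_total > threshold)
  if (length(passing) == 0) 1L else max(passing)
}

# Made-up total Q2 values for components 1 to 4.
q2_total <- c(0.42, 0.18, 0.03, -0.05)
select_ncomp_by_q2(q2_total)  # 2
```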

Once tuning is complete, the results are displayed on the right side of the page using the tuned parameters. Additional general information and the final analysis parameters are shown in the boxes above the plots.

Figure 6: Visualization of the sPLS page depicting two omics datasets. On the left side, the datasets out of the single-omics analysis are used. On the right side the results of tuned datasets are presented.

Multi-omics analysis

The multi-omics analysis is done by applying the DIABLO framework of mixOmics, which takes two or more datasets as input and tries to find correlations between them. The structure of the page (Figure 7) is the same as described in the “Pairwise omics analysis” chapter.

At the top of the page, the datasets are added to the “Select the datasets” field, and the corresponding metadata file is selected. On the left side of the page, general information is provided, along with the option to adjust analysis parameters in the “Analysis Parameters” box.

Here, users can modify the value for the design matrix, which indicates whether known or calculated correlations between the datasets should be taken into account. Currently, Holomics only supports setting a single correlation value for the entire matrix, which is then applied uniformly to all pairwise dataset correlations. This field is pre-filled with the lowest calculated correlation value but can be manually adjusted by the user.
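A design matrix with a single, uniformly applied value can be sketched as follows. The layout, with one row and column per dataset and zeros on the diagonal, follows the usual DIABLO convention. `build_design()`, the dataset names and the value 0.1 are assumptions for illustration, and how Holomics constructs the matrix internally is not shown here.

```r
# Hypothetical helper: one value for all pairwise dataset links,
# zeros on the diagonal (a dataset is not linked to itself).
build_design <- function(dataset_names, value) {
  n <- length(dataset_names)
  design <- matrix(value, nrow = n, ncol = n,
                   dimnames = list(dataset_names, dataset_names))
  diag(design) <- 0
  design
}

design <- build_design(c("transcriptomics", "metabolomics", "microbiomics"), 0.1)
design
```

A value close to 0 puts more weight on discriminating the classes, while a value close to 1 puts more weight on the correlation between the datasets.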

Again, the “Tune parameters” button in the middle of the page initiates the tuning process. This process functions identically to the one described earlier for sPLS: it calculates the ideal number of components and the optimal number of features per dataset. The only difference here is that the best number of components is selected based on the overall Balanced Error Rate (BER) using the centroids.dist metric. For more information about this metric, please have a look at the Distance Metrics post on the mixOmics website.

After tuning, the tuned parameters are visualized on the right side of the page. Some general information and the resulting tuned parameters are provided in the boxes above the plots.

Figure 7: Visualization of the multi-omics page. On the left side, the datasets out of the single-omics analysis are used. On the right side the results of tuned datasets are presented.

Help pages

The help pages that can be found in the application provide short descriptions of the used plots as well as the feature selection and tuning processes. Additionally, there are several links to the mixOmics website or to other papers, where even more detailed information is provided, if desired.

Known issues

During the development phase and thanks to our (test) users, we were able to identify some issues that can occur but unfortunately cannot be fixed from the Holomics side. Still, there are some workarounds to bypass these issues.

“Not enough margin” error

When you encounter an error message that contains “… margin errors. Ensure feature names are not too long …” please make sure that in RStudio the plotting area (“Plots” tab in (right) side menu) is wide enough.

Increase upload size

To increase the maximum upload size for files, run the following command:

options(shiny.maxRequestSize = X*1024^2)

and replace X by the number of MB you want to upload. Afterwards you can start the Holomics application as usual.
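As a concrete example, the following lines set the limit to 100 MB. The value 100 is only an example; Shiny's default limit is 5 MB.

```r
upload_limit_mb <- 100  # example value, adjust as needed

# Shiny expects the limit in bytes.
options(shiny.maxRequestSize = upload_limit_mb * 1024^2)
getOption("shiny.maxRequestSize")  # 104857600 bytes
```

The option only applies to the current R session and has to be set again after restarting R.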

Test datasets

Examples of transcriptomics, metabolomics and microbiomics data, as well as a file with the labels and class information, can be found as additional files in our paper Munk et al. (2024). These omics data can be uploaded directly into the application after removing the first line with the table title in each case. In addition, exactly the same datasets were processed with Holomics in the case study described in the paper.

License

The Holomics package is distributed under GPL-3 (GNU GENERAL PUBLIC LICENSE version 3).

Cite

Munk, K., Ilina, D., Ziemba, L., Brader, G. & Molin, E.M. (2024). Holomics - a user-friendly R shiny application for multi-omics data integration and analysis.
R package version 1.1.0 https://CRAN.R-project.org/package=Holomics

Acknowledgements

Holomics has been developed at the AIT Austrian Institute of Technology GmbH within the research project OMICS 4.0, which is funded by the Federal State of Lower Austria as part of the FTI-Strategy Lower Austria. We also would like to thank all beta testers for their valuable input and advice.

Session info

#> R version 4.5.0 (2025-04-11 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26200)
#> 
#> Matrix products: default
#>   LAPACK version 3.12.1
#> 
#> locale:
#> [1] LC_COLLATE=C                    LC_CTYPE=German_Austria.utf8   
#> [3] LC_MONETARY=German_Austria.utf8 LC_NUMERIC=C                   
#> [5] LC_TIME=German_Austria.utf8    
#> 
#> time zone: Europe/Vienna
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] Holomics_1.2.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] jsonlite_2.0.0      dplyr_1.1.4         compiler_4.5.0     
#>  [4] promises_1.3.2      tidyselect_1.2.1    Rcpp_1.0.14        
#>  [7] shinyvalidate_0.1.3 later_1.4.2         jquerylib_0.1.4    
#> [10] yaml_2.3.10         fastmap_1.2.0       readxl_1.4.5       
#> [13] mime_0.13           R6_2.6.1            generics_0.1.4     
#> [16] knitr_1.50          htmlwidgets_1.6.4   visNetwork_2.1.2   
#> [19] tibble_3.2.1        bookdown_0.43       shiny_1.10.0       
#> [22] bslib_0.9.0         pillar_1.10.2       rlang_1.1.6        
#> [25] cachem_1.1.0        httpuv_1.6.16       xfun_0.52          
#> [28] config_0.3.2        sass_0.4.10         cli_3.6.5          
#> [31] magrittr_2.0.3      shinyWidgets_0.9.0  attempt_0.3.1      
#> [34] digest_0.6.37       rstudioapi_0.17.1   xtable_1.8-4       
#> [37] lifecycle_1.0.4     vctrs_0.6.5         evaluate_1.0.3     
#> [40] glue_1.8.0          cellranger_1.1.0    golem_0.5.1        
#> [43] rmarkdown_2.29      tools_4.5.0         pkgconfig_2.0.3    
#> [46] htmltools_0.5.8.1

References

Le Cao, Kim-Anh, Mary-Ellen Costello, Vanessa Anne Lakis, Francois Bartolo, Xin-Yi Chua, Remi Brazeilles, and Pascale Rondeau. 2016. “MixMC: A Multivariate Statistical Framework to Gain Insight into Microbial Communities.” PLoS ONE 11 (8): e0160169.

Lê Cao, Kim-Anh. 2023. “mixOmics Vignette.” https://mixomicsteam.github.io/mixOmics-Vignette/.

Rohart, Florian, Benoit Gautier, Amrit Singh, and Kim-Anh Lê Cao. 2017. “mixOmics: An R Package for ‘Omics Feature Selection and Multiple Data Integration.” PLoS Computational Biology 13 (11): e1005752. https://mixomics.org/.