| Type: | Package |
| Title: | Transcriptional Disruption Score for Gene Sets |
| Version: | 1.3.1 |
| Description: | Computes pathway-level transcriptional disruption scores from differential expression p-values using permutation tests with Generalized Pareto Distribution (GPD) tail extrapolation and false discovery rate (FDR) correction. Reference: Guo P, Li H (2026) DSGE: Disruption Score of Gene Expression for Gene-Set Enrichment Analysis. https://github.com/LHJLab/DSGE. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| Imports: | data.table, evd, methods, POT, Rcpp |
| LinkingTo: | Rcpp |
| Suggests: | AnnotationDbi, GO.db |
| Enhances: | KEGGREST, org.Hs.eg.db, reactome.db |
| Config/roxygen2/version: | 8.0.0 |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | yes |
| Packaged: | 2026-06-24 03:19:16 UTC; Lenovo |
| Author: | Pingwei Guo [aut, cre], Huijuan Li [aut] |
| Maintainer: | Pingwei Guo <1489413126@qq.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-30 11:00:21 UTC |
Compute DSGE (Disruption Score of Gene Expression)
Description
Computes the mean absolute z-score for all genes passing filters from differential expression analysis results. Higher DSGE indicates stronger global transcriptional perturbation.
Usage
calc_dsge(pvalue, base_mean = NULL, base_mean_cutoff = 0.1)
Arguments
pvalue |
Numeric vector of p-values from differential expression analysis (e.g., DESeq2, edgeR, Seurat FindMarkers). |
base_mean |
Numeric vector of mean expression values (same length as pvalue), e.g. DESeq2 baseMean or Seurat avg_log2FC corresponding expression level. If NULL, no expression filtering is applied. |
base_mean_cutoff |
Expression filter threshold, default 0.1. Genes with mean expression at or below this value are excluded as lowly expressed. Ignored when base_mean = NULL. |
Value
A list with elements:
dsge |
scalar, genome-wide DSGE |
n_genes |
integer, number of genes passing filters |
z_scores |
named numeric vector of per-gene raw z-scores |
Examples
# Generate random p-values (simulating differential expression results)
set.seed(42)
pvals <- runif(1000)
z <- calc_dsge(pvals)
head(z)
# With baseMean filtering
base_mean <- rexp(1000)
z <- calc_dsge(pvals, base_mean)
head(z)
Permutation test for DSGE enrichment of a single gene set
Description
Tests whether a target gene set has a significantly higher DSGE than expected by chance. Generates a null distribution via sampling without replacement and computes an empirical right-tail p-value.
Usage
dsge_perm_test(
gene_list,
pvalue,
base_mean,
gene_names,
base_mean_cutoff = 0.1,
n_perm = 10000L,
seed = NULL,
progress = TRUE,
heterogeneity = FALSE,
directional = FALSE,
direction_vec = NULL,
use_std = TRUE,
use_gpd = TRUE,
gpd_threshold = 0.99,
gpd_method = "mle",
safety_margin = 1.6,
nds_top_frac = 0.25
)
Arguments
gene_list |
Character vector of target gene identifiers (must match values in gene_names). |
pvalue |
Numeric vector of p-values from differential expression analysis (e.g., DESeq2, edgeR, Seurat FindMarkers) for all genes in the background pool. |
base_mean |
Numeric vector of mean expression values, same length as pvalue. Pass NULL to skip expression-level filtering. |
gene_names |
Character vector of gene names, same length as pvalue, must be unique. |
base_mean_cutoff |
baseMean filter threshold, default 0.1. |
n_perm |
Number of permutations, default 10000. |
seed |
Optional integer random seed for reproducibility. |
progress |
Whether to show a progress bar, default TRUE. |
heterogeneity |
Whether to compute perturbation heterogeneity
indices (Gini, CV) and two-sided p-values. Default |
directional |
Whether to compute Normalized Direction Score
(NDS) for each pathway. Default |
direction_vec |
Numeric vector of direction indicators
(e.g., log fold changes), same length as |
use_std |
Whether to compute and use standardised DSGE.
Default
When |
use_gpd |
Whether to use GPD tail extrapolation for extreme
p-values. Default |
gpd_threshold |
Tail quantile threshold for GPD fitting, between 0 and 1. Default 0.99. Lower = more tail samples (lower variance, higher bias); higher = fewer samples (higher variance, lower bias). |
gpd_method |
GPD estimation method passed to
|
safety_margin |
Safety margin for GPD support-constrained adjustment.
Default |
nds_top_frac |
Fraction of most-perturbed genes retained for
NDS calculation. Default |
Details
Algorithm steps:
Filter gene pool: baseMean > cutoff, non-missing p-value.
Convert p-values to per-gene raw z-scores.
Match
gene_listto the filtered gene pool; compute observed DSGE (mean z-score).Generate null distribution: randomly sample (without replacement) equally-sized gene sets from the pool, computing random DSGE via the same mean z-score formula, repeated n_perm times.
Empirical right-tail p-value: count(DSGE_null > DSGE_obs) / n_perm.
Value
A list with elements:
observed |
observed DSGE value |
n_genes |
number of target genes matched |
null |
permutation null distribution vector |
p_value |
right-tail p-value (GPD tail extrapolation when observed falls above the 'gpd_threshold' quantile (default 99th); empirical ECDF otherwise) |
ecdf |
empirical cumulative distribution function of the null |
dsge_std |
(only when |
nds |
(only when |
gini_observed |
(only when |
cv_observed |
(only when |
null_gini |
(only when |
null_cv |
(only when |
het_p_value |
(only when |
Examples
# Toy example with simulated data
set.seed(42)
pvals <- runif(500)
base_mean <- rexp(500)
gene_names <- paste0("gene", 1:500)
forebrain_like <- paste0("gene", 1:30)
dsge_perm_test(forebrain_like, pvals, base_mean, gene_names,
n_perm = 100, seed = 42, progress = FALSE)
Extract header lines from a GAF file
Description
Lines starting with ! in GAF files contain metadata (database
version, date, column definitions, etc.). This function extracts them
for inspecting file version and provenance.
Usage
get_gaf_header(file)
Arguments
file |
Path to the GAF file. |
Value
Character vector, one string per header line (with leading
! removed).
Examples
# Create a temporary GAF file for demonstration
gaf_file <- tempfile(fileext = ".gaf")
writeLines(c(
"!gaf-version: 2.2",
"!generated-by: R example",
paste0("UniProtKB\tP12345\tGENE_A\t\tGO:0003674\t",
"PMID:123456\tIDA\t\tF\tGene A\t\tprotein\t",
"taxon:9606\t20240101\tGO\t\t"),
paste0("UniProtKB\tP67890\tGENE_B\t\tGO:0005575\t",
"PMID:789012\tIBA\t\tC\tGene B\t\tprotein\t",
"taxon:9606\t20240101\tGO\t\t")
), gaf_file)
get_gaf_header(gaf_file)
Extract gene-pathway (GO term) associations from GAF data
Description
Splits the parsed GAF data.frame returned by
read_gaf() into a named list by GO term. Supports
filtering by qualifier, evidence code, ontology aspect, and minimum
pathway size. Optionally attaches GO term names from an OBO file.
Usage
get_pathway_genes(
gaf_data,
genes = c("db_object_id", "db_object_symbol"),
unique = TRUE,
min_size = 5L,
qualifier = NULL,
evidence = NULL,
aspect = NULL,
go_names = NULL
)
Arguments
gaf_data |
A |
genes |
Character vector of column names identifying genes.
Default |
unique |
Logical. Whether to remove duplicate gene entries within
a GO term. Default |
min_size |
Minimum gene count threshold; pathways with fewer
genes are discarded. Default |
qualifier |
Qualifier filter, e.g. |
evidence |
Evidence code filter, e.g. |
aspect |
Ontology aspect filter: |
go_names |
A |
Value
A named list where each element name is a GO term ID
(e.g. "GO:0005515") and the value is a data.frame of
associated genes. The list is sorted alphabetically by GO ID.
Examples
# Create a minimal GAF data.frame for demonstration
toy_gaf <- data.frame(
db_object_id = c("1", "2", "3", "1", "2"),
db_object_symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_A", "GENE_D"),
go_id = c("GO:0003674", "GO:0003674", "GO:0005575",
"GO:0005575", "GO:0005575"),
aspect = c("F", "F", "C", "C", "C"),
evidence_code = c("IDA", "IBA", "IDA", "IDA", "IEA"),
stringsAsFactors = FALSE
)
pathway_genes <- get_pathway_genes(toy_gaf, min_size = 1)
pathway_genes[["GO:0003674"]]
# Create a temporary GAF file for demonstration
gaf_file <- tempfile(fileext = ".gaf")
writeLines(c(
"!gaf-version: 2.2",
paste0("UniProtKB\tP12345\tGENE_A\t\tGO:0003674\t",
"PMID:123456\tIDA\t\tF\tGene A\t\tprotein\t",
"taxon:9606\t20240101\tGO\t\t"),
paste0("UniProtKB\tP67890\tGENE_B\t\tGO:0005575\t",
"PMID:789012\tIBA\t\tC\tGene B\t\tprotein\t",
"taxon:9606\t20240101\tGO\t\t")
), gaf_file)
gaf <- read_gaf(gaf_file)
# Create a temporary OBO file
obo_file <- tempfile(fileext = ".obo")
writeLines(c(
"format-version: 1.2",
"",
"[Term]",
"id: GO:0003674",
"name: molecular_function",
"namespace: molecular_function",
"",
"[Term]",
"id: GO:0005575",
"name: cellular_component",
"namespace: cellular_component"
), obo_file)
go <- read_obo(obo_file)
# Build pathway-gene mapping with GO names
pathway_genes <- get_pathway_genes(gaf, go_names = go, min_size = 1)
pathway_genes[["GO:0003674"]]
Build pathway-gene mapping from a Bioconductor OrgDb object
Description
Extracts GO term to gene mappings from an OrgDb annotation
package (e.g. org.Hs.eg.db) or an AnnotationHub record.
Returns a named list in the same format as
get_pathway_genes(), ready to pass to
pathway_dsge().
Usage
get_pathway_genes_db(
orgdb,
keytype = "ENTREZID",
gene_id_col = "db_object_id",
gene_symbol_col = "db_object_symbol",
min_size = 5L,
aspect = NULL,
evidence = NULL,
attach_go_names = TRUE,
use_goall = FALSE
)
Arguments
orgdb |
An |
keytype |
Key type used to query the |
gene_id_col |
Name for the gene ID column in the output
data.frames. Default |
gene_symbol_col |
Name for the gene symbol column in the output
data.frames. Default |
min_size |
Minimum gene count per pathway; pathways below this
are discarded. Default |
aspect |
Ontology aspect filter. |
evidence |
Evidence code filter (e.g. |
attach_go_names |
Logical. Whether to fetch GO term names and
ontology classifications via |
use_goall |
Logical. If |
Details
This function provides an alternative to get_pathway_genes()
for users who prefer Bioconductor's annotation infrastructure over
GAF + OBO files.
Value
A named list where each element is a data.frame with
columns go_name (if attached), go_namespace
(if attached), gene_id_col, and gene_symbol_col.
The list names are GO term IDs. This matches the output format of
get_pathway_genes() and can be passed directly to
pathway_dsge().
Note
This function requires the AnnotationDbi package. The
GO.db package is required when attach_go_names = TRUE
(the default). Both are Bioconductor packages; install with
BiocManager::install(c("AnnotationDbi", "GO.db")).
See Also
get_pathway_genes for the GAF-based equivalent.
Examples
# Requires additional Bioconductor database packages
if (require("org.Hs.eg.db", quietly = TRUE)) {
pw <- get_pathway_genes_db(org.Hs.eg.db)
str(pw[1:2])
}
Build KEGG pathway-gene mapping from a Bioconductor OrgDb object
Description
Extracts KEGG pathway to gene mappings from an OrgDb annotation
package (e.g. org.Hs.eg.db) or an AnnotationHub record.
Returns a named list in the same format as
get_pathway_genes(), ready to pass to
pathway_dsge().
Usage
get_pathway_genes_kegg(
orgdb,
keytype = "ENTREZID",
gene_id_col = "kegg_gene_id",
gene_symbol_col = "kegg_gene_symbol",
min_size = 5L,
attach_path_names = TRUE
)
Arguments
orgdb |
An |
keytype |
Key type used to query the |
gene_id_col |
Name for the gene ID column in the output
data.frames. Default |
gene_symbol_col |
Name for the gene symbol column in the output
data.frames. Default |
min_size |
Minimum gene count per pathway; pathways below this
are discarded. Default |
attach_path_names |
Logical. Whether to fetch pathway names via
|
Details
KEGG pathway IDs are stored in the PATH column of OrgDb objects.
IDs include an organism prefix (e.g. "hsa00010" for human,
"mmu00010" for mouse). The organism code is auto-detected from
the OrgDb species and a built-in lookup table of 15 common model
organisms.
Pathway names are fetched online via KEGGREST::keggList().
When the KEGGREST package is not available or network access
fails, names are set to NA with a warning.
Value
A named list where each element is a data.frame with
columns kegg_name (if attached), organism_code,
gene_id_col, and gene_symbol_col. The list names are
full KEGG pathway IDs including the organism prefix (e.g.
"hsa00010"). S3 class "kegg_pathway" is set on list
elements for source auto-detection by pathway_dsge().
Note
Requires the AnnotationDbi package. The KEGGREST
package is required only when attach_path_names = TRUE.
Install with:
BiocManager::install(c("AnnotationDbi", "KEGGREST")).
See Also
get_pathway_genes_db for the GO-based equivalent.
get_pathway_genes_reactome for the Reactome equivalent.
Examples
# Requires additional Bioconductor database packages
if (require("org.Hs.eg.db", quietly = TRUE)) {
pw <- get_pathway_genes_kegg(org.Hs.eg.db)
str(pw[1:2])
}
Build Reactome pathway-gene mapping from a Bioconductor OrgDb object
Description
Extracts Reactome pathway to gene mappings, using reactome.db
for pathway-to-gene associations and an OrgDb for gene symbol
resolution. Returns a named list ready to pass to
pathway_dsge().
Usage
get_pathway_genes_reactome(
orgdb,
keytype = "ENTREZID",
gene_id_col = "reactome_gene_id",
gene_symbol_col = "reactome_gene_symbol",
min_size = 5L,
attach_path_names = TRUE,
species_prefix = "R-HSA"
)
Arguments
orgdb |
An |
keytype |
Key type used to query the |
gene_id_col |
Name for the gene ID column in the output
data.frames. Default |
gene_symbol_col |
Name for the gene symbol column in the output
data.frames. Default |
min_size |
Minimum gene count per pathway; pathways below this
are discarded. Default |
attach_path_names |
Logical. Whether to fetch pathway names via
|
species_prefix |
Reactome species prefix for filtering pathways.
Default |
Details
Reactome pathway IDs follow the pattern R-HSA-<number>
(e.g. R-HSA-177929). Pathway names are fetched locally from
the reactome.db Bioconductor package in a single batched query.
Value
A named list where each element is a data.frame with
columns reactome_name (if attached), gene_id_col, and
gene_symbol_col. The list names are Reactome pathway IDs
(e.g. "R-HSA-177929"). S3 class "reactome_pathway"
is set on list elements for source auto-detection by
pathway_dsge().
Note
Requires reactome.db and AnnotationDbi. Install
with: BiocManager::install(c("AnnotationDbi", "reactome.db")).
See Also
get_pathway_genes_db for the GO-based equivalent.
get_pathway_genes_kegg for the KEGG equivalent.
Examples
# Requires additional Bioconductor database packages
if (require("org.Hs.eg.db", quietly = TRUE) &&
require("reactome.db", quietly = TRUE)) {
pw <- get_pathway_genes_reactome(org.Hs.eg.db)
str(pw[1:2])
}
Pathway-level DSGE permutation test with FDR correction
Description
Computes DSGE for each GO pathway, generates permutation null distributions grouped by number of matched genes, computes p-values using GPD tail extrapolation, and applies Benjamini-Hochberg FDR correction.
Usage
pathway_dsge(
pathway_genes,
pvalue,
base_mean = NULL,
gene_names,
gene_id_col = NULL,
source = NULL,
base_mean_cutoff = 0.1,
min_size = 5L,
max_size = 500L,
n_perm = 10000L,
seed = NULL,
return_null = FALSE,
progress = TRUE,
heterogeneity = FALSE,
directional = FALSE,
direction_vec = NULL,
use_std = TRUE,
use_gpd = TRUE,
gpd_threshold = 0.99,
gpd_method = "mle",
safety_margin = 1.6,
n_cores = 1L,
dsge_std = NULL,
p_adjust_method = "BY",
nds_top_frac = 0.25
)
Arguments
pathway_genes |
Named list returned by
|
pvalue |
Numeric vector of p-values from differential expression analysis (e.g., DESeq2, edgeR, Seurat FindMarkers) for all genes in the background pool. |
base_mean |
Numeric vector of mean expression values, same length as pvalue. Pass NULL to skip expression-level filtering. |
gene_names |
Character vector of gene names (same length as pvalue), must be unique. |
gene_id_col |
Column name in pathway_genes data.frames used to
match gene names. When |
source |
Pathway source. One of |
base_mean_cutoff |
baseMean filter threshold, default 0.1. |
min_size |
Minimum number of matched genes; pathways below this are not tested. Default 5. |
max_size |
Maximum number of matched genes; pathways above this are excluded. Default 500. Set to Inf to retain all. |
n_perm |
Number of permutations per size group, default 10000. |
seed |
Optional integer random seed. |
return_null |
Whether to include null distribution data in the
result (for plotting). Default |
progress |
Whether to show a progress bar, default TRUE. |
heterogeneity |
Whether to compute perturbation heterogeneity
indices (Gini, CV) and two-sided p-values. Default |
directional |
Whether to compute Normalized Direction Score
(NDS) for each pathway. Default |
direction_vec |
Numeric vector of direction indicators
(e.g., log fold changes), same length as |
use_std |
Whether to compute and use standardised DSGE.
Default
When |
use_gpd |
Whether to use GPD tail extrapolation for extreme
p-values. Default |
gpd_threshold |
Tail quantile threshold for GPD fitting, between 0 and 1. Default 0.99. Lower = more tail samples (lower variance, higher bias); higher = fewer samples (higher variance, lower bias). |
gpd_method |
GPD estimation method passed to
|
safety_margin |
Safety margin for GPD support-constrained adjustment.
Default |
n_cores |
Number of CPU cores for parallel null distribution
generation. Default |
dsge_std |
[Deprecated]. Use |
p_adjust_method |
Multiple testing correction method. Passed to
|
nds_top_frac |
Fraction of most-perturbed genes retained for
NDS calculation. Default |
Details
Pathways sharing the same number of matched genes reuse the same null distribution, greatly reducing computation.
**Algorithm steps (9 steps)**
-
Build gene pool: filter DESeq2 results (baseMean > cutoff), compute per-gene raw z-scores.
-
Match pathway genes: match each pathway's genes to the gene pool via the
gene_id_colcolumn. -
Filter pathways: retain pathways with
min_size <= n_matched <= max_size. -
Compute observed DSGE: per pathway as the mean of member gene z-scores: DSGE = mean(z_i).
-
Generate size-grouped null distributions: each unique matched-gene count gets its own permutation run (vectorized batch sampling), with simultaneous GPD tail fitting.
-
Compute standardised DSGE (step 6): when
use_std = TRUE, computedsge_std = (observed - mean(null)) / sd(null)for each pathway, using the size-grouped null distribution. -
Compute p-values: when
use_std = TRUE, empirical ECDF of the standardised null vs. dsge_std; whenuse_std = FALSE, GPD tail extrapolation (above thegpd_thresholdquantile, default 99th) + empirical ECDF on the raw null distribution. -
BH FDR correction: Benjamini-Hochberg multiple testing correction on all pathway p-values.
-
Sort and return: ordered by p_adj ascending.
Value
By default, a data.frame sorted by p_adj
ascending. The pathway ID and name columns depend on the source:
For GO:
go_id,go_name,aspectFor KEGG:
kegg_id,kegg_nameFor REACTOME:
reactome_id,reactome_name
All sources include: n_pathway, n_matched,
dsge, p_value, p_adj.
For GO sources, aspect is the ontology classification:
"BP" (Biological Process), "MF" (Molecular Function),
"CC" (Cellular Component).
When heterogeneity = TRUE, additionally includes:
gini, cv, het_p_value, het_p_adj.
When directional = TRUE, additionally includes:
nds (Normalized Direction Score, ranging from -1 to +1).
When use_std = TRUE (default), additionally includes:
dsge_std = (observed DSGE - mean(null)) / sd(null),
standardised using the size-grouped null distribution.
When return_null = TRUE, returns a list with:
table |
the data.frame described above |
null_raw |
named list, keys are pathway sizes (as character), values are DSGE null distribution vectors |
null_gpd |
named list, keys are pathway sizes, values are GPD fit parameters (or NULL) |
When both heterogeneity = TRUE and return_null = TRUE,
the list also includes:
null_gini_raw |
named list, keys are pathway sizes, values are Gini null distribution vectors |
null_cv_raw |
named list, keys are pathway sizes, values are CV null distribution vectors |
Examples
# Toy example with simulated data
set.seed(42)
pvals <- runif(500)
base_mean <- rexp(500)
gene_names <- paste0("gene", 1:500)
pw <- list(
pathway_A = data.frame(db_object_symbol = paste0("gene", 1:20),
go_name = "Pathway A",
stringsAsFactors = FALSE),
pathway_B = data.frame(db_object_symbol = paste0("gene", 21:40),
go_name = "Pathway B",
stringsAsFactors = FALSE)
)
result <- pathway_dsge(pw, pvals, base_mean, gene_names,
min_size = 1, n_perm = 100, seed = 42)
head(result)
Plot null distribution vs. observed DSGE for selected pathways
Description
For pathways in a pathway_dsge() result, plots the
density of the permutation null distribution with a dashed red line
marking the observed DSGE. If GPD tail fitting is available for the
size group, the tail region is highlighted in semi-transparent orange.
Usage
plot_dsge(
result,
n = 9L,
pathway_ids = NULL,
go_ids = NULL,
col_null = "#2166AC",
col_tail = "#E41A1C",
col_obs = "#D73027",
col_thr = "#999999",
safety_margin = NULL,
cex_main = 0.85,
use_std = NULL
)
Arguments
result |
List returned by |
n |
When |
pathway_ids |
Optional character vector of pathway IDs to plot
(e.g. GO IDs, KEGG IDs, or Reactome IDs depending on source). When
provided, directly plots these pathways, overriding |
go_ids |
[Deprecated]. Use |
col_null |
Color of the null distribution density curve.
Default |
col_tail |
Color of the GPD tail region highlight.
Default |
col_obs |
Color of the observed DSGE vertical line.
Default |
col_thr |
Color of the GPD threshold vertical line.
Default |
safety_margin |
Safety margin for visual GPD tail extension
in the plot. Defaults to the value from |
cex_main |
Title font scaling. Default |
use_std |
Whether to plot standardised DSGE. Defaults to
|
Value
No return value; called for its side effect (plotting).
Examples
# Build a result object with simulated data for demonstration
set.seed(42)
pvals <- runif(500)
base_mean <- rexp(500)
gene_names <- paste0("gene", 1:500)
pw <- list(
pw1 = data.frame(db_object_symbol = paste0("gene", 1:30),
go_name = "Pathway 1",
stringsAsFactors = FALSE),
pw2 = data.frame(db_object_symbol = paste0("gene", 31:60),
go_name = "Pathway 2",
stringsAsFactors = FALSE)
)
result <- pathway_dsge(pw, pvals, base_mean, gene_names,
min_size = 1, n_perm = 100, seed = 42,
return_null = TRUE)
plot_dsge(result, n = 2)
Gene-level volcano plot for a specific pathway
Description
Plots a focused volcano plot showing only the genes belonging to a specified GO pathway. Each point is one gene from the pathway, with log2 fold change on the x-axis and statistical significance (-log10 p-value) on the y-axis. This allows visual inspection of the direction, magnitude, and distribution of perturbation within a single pathway.
Usage
plot_dsge_volcano(
de_results,
dsge_result = NULL,
pathway_genes,
go_id,
logFC_col = "log2FoldChange",
pval_col = "pvalue",
padj_col = NULL,
gene_col = "geneName",
gene_id_col = "db_object_symbol",
threshold = 0.05,
padj_threshold = 0.05,
lfc_threshold = NULL,
color_up = "#CC3333",
color_down = "#3366CC",
color_nom = "#CC9933",
color_ns = "#AAAAAA",
alpha_sig = 0.9,
alpha_ns = 0.5,
label = NULL,
label_genes = NULL,
label_sig = FALSE,
cex_label = 0.7,
cex_point = 1.6,
xlab = NULL,
ylab = NULL,
main = NULL,
go_name = NULL,
...
)
Arguments
de_results |
Data.frame of differential expression results. Must contain at least log2FC, p-value, and gene identifier columns. |
dsge_result |
Optional result from |
pathway_genes |
A named list of pathway-gene mappings, as used in
|
go_id |
A single character string specifying which pathway (GO ID)
to plot. Must be a name in |
logFC_col |
Column name in |
pval_col |
Column name in |
padj_col |
Optional column name in |
gene_col |
Column name in |
gene_id_col |
Column name in |
threshold |
p-value significance threshold for the horizontal
reference line. Default |
padj_threshold |
Adjusted p-value threshold for significance
classification. Default |
lfc_threshold |
Numeric vector of logFC thresholds for vertical
reference lines (e.g. |
color_up |
Color for significantly up-regulated genes.
Default |
color_down |
Color for significantly down-regulated genes.
Default |
color_nom |
Color for nominally significant genes (raw p-value
below |
color_ns |
Color for non-significant genes.
Default |
alpha_sig |
Transparency for significant points.
Default |
alpha_ns |
Transparency for non-significant points.
Default |
label |
Whether to label genes with their names. Auto-set to
|
label_genes |
Optional character vector of specific gene names
within the pathway to label. Overrides |
label_sig |
When |
cex_label |
Text size for gene labels. Default |
cex_point |
Point size for significant genes. Non-significant
genes are plotted at |
xlab, ylab |
Axis labels. Auto-generated when |
main |
Plot title. Default |
go_name |
Optional GO term name for the title. When |
... |
Additional arguments passed to |
Value
Invisibly returns the subset of de_results for the
pathway genes (data.frame).
Examples
# Build input objects with simulated data for demonstration
set.seed(42)
de_results <- data.frame(
log2FoldChange = rnorm(100, mean = 0, sd = 1.5),
pvalue = runif(100),
geneName = paste0("GENE", 1:100),
stringsAsFactors = FALSE
)
pw <- list(
"GO:0042101" = data.frame(
db_object_symbol = paste0("GENE", 1:15),
go_name = "T cell receptor complex",
stringsAsFactors = FALSE
)
)
dsge <- pathway_dsge(pw, de_results$pvalue, de_results$pvalue,
de_results$geneName, min_size = 1, n_perm = 100,
seed = 42)
plot_dsge_volcano(de_results, dsge, pw, go_id = "GO:0042101",
logFC_col = "log2FoldChange", pval_col = "pvalue", gene_col = "geneName")
Read a Gene Ontology Annotation File (GAF)
Description
Efficiently reads a GAF 2.x format gene ontology annotation file using
data.table::fread(). Comment header lines starting with
! are automatically skipped. Returns a data.frame with all 17
standard GAF columns.
Usage
read_gaf(file, col_names = GAF_COLUMNS, ...)
Arguments
file |
Path to the GAF file. |
col_names |
Character vector of column names. Defaults to the
standard 17 GAF 2.2 column names. Pass |
... |
Additional arguments passed to |
Value
A data.frame containing the GAF annotation data.
Examples
# Create a temporary GAF file for demonstration
gaf_file <- tempfile(fileext = ".gaf")
writeLines(c(
"!gaf-version: 2.2",
paste0("UniProtKB\tP12345\tGENE_A\t\tGO:0003674\t",
"PMID:123456\tIDA\t\tF\tGene A\t\tprotein\t",
"taxon:9606\t20240101\tGO\t\t"),
paste0("UniProtKB\tP67890\tGENE_B\t\tGO:0005575\t",
"PMID:789012\tIBA\t\tC\tGene B\t\tprotein\t",
"taxon:9606\t20240101\tGO\t\t")
), gaf_file)
gaf <- read_gaf(gaf_file)
head(gaf)
Read GO term names from an OBO file
Description
Parses a Gene Ontology OBO format file (version 1.2 / 1.4), extracting
the id, name, and namespace from each [Term]
stanza. [Typedef] stanzas are skipped.
Usage
read_obo(file)
Arguments
file |
Path to the OBO file (e.g. |
Value
A data.frame with columns:
id |
GO identifier (e.g. |
name |
Human-readable term name (e.g. |
namespace |
Ontology classification: |
Examples
# Create a temporary OBO file for demonstration
obo_file <- tempfile(fileext = ".obo")
writeLines(c(
"format-version: 1.2",
"",
"[Term]",
"id: GO:0003674",
"name: molecular_function",
"namespace: molecular_function",
"",
"[Term]",
"id: GO:0005575",
"name: cellular_component",
"namespace: cellular_component",
"",
"[Term]",
"id: GO:0008150",
"name: biological_process",
"namespace: biological_process"
), obo_file)
go_names <- read_obo(obo_file)
head(go_names)