This function evaluates the association between gene signature scores and sample metadata variables. It fits linear models to obtain Cohen's f for each variable. For categorical variables, it also uses contrast-based comparisons between levels to compute Cohen's d. The function generates plots summarizing the results.
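A minimal base-R sketch of the overall effect size follows. It assumes that Cohen's f for a single variable is derived from the R² of a linear model of the score on that variable, f = sqrt(R² / (1 - R²)), and that the p-value comes from the model's omnibus F-test. The helper name `cohens_f_sketch` and the toy data are hypothetical. The package may fit its models differently, for example with additional terms.

```r
# Hypothetical helper, not part of the package: Cohen's f for one variable,
# assuming f^2 = R^2 / (1 - R^2) from a linear model of the score on it.
cohens_f_sketch <- function(score, variable) {
  s <- summary(lm(score ~ variable))
  r2 <- s$r.squared
  fstat <- s$fstatistic  # named vector: value, numdf, dendf
  # Omnibus F-test p-value for the variable
  p <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
          lower.tail = FALSE)
  list(cohens_f = sqrt(r2 / (1 - r2)), p_value = p)
}

# Toy data (invented): a score measured in 12 samples from three groups
group <- factor(rep(c("A", "B", "C"), each = 4))
score <- c(1, 2, 3, 4,  3, 4, 5, 6,  6, 7, 8, 9)
res <- cohens_f_sketch(score, group)
res
```

The same helper also accepts a numeric `variable`. In that case, `lm()` fits a slope instead of group means, so continuous and categorical metadata share one effect-size scale.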
Usage
Score_VariableAssociation(
  data,
  metadata,
  cols,
  method = c("logmedian", "ssGSEA", "ranking"),
  gene_set,
  mode = c("simple", "medium", "extensive"),
  nonsignif_color = "grey",
  signif_color = "red",
  saturation_value = NULL,
  sig_threshold = 0.05,
  widthlabels = 18,
  labsize = 10,
  title = NULL,
  titlesize = 14,
  pointSize = 5,
  discrete_colors = NULL,
  continuous_color = "#8C6D03",
  color_palette = "Set2",
  printplt = TRUE
)
Arguments
- data
A data frame or matrix containing gene expression data.
- metadata
A data frame containing sample metadata with at least one column corresponding to the variables of interest.
- cols
A character vector specifying metadata columns to analyse.
- method
A character string specifying the scoring method ("logmedian", "ssGSEA", or "ranking"). Three methods are available:
ssGSEA: Uses the single-sample Gene Set Enrichment Analysis (ssGSEA) method to compute an enrichment score for each signature in each sample using an adaptation of the gsva() function from the GSVA package.
logmedian: Computes the score as the sum of the normalized (log2-median-centered) expression values of the signature genes divided by the number of genes in the signature.
ranking: Computes gene signature scores for each sample by ranking the expression of signature genes in the dataset and normalizing the score based on the total number of genes.
- gene_set
A named list containing one gene set for scoring.
- mode
A character string specifying the contrast generation method ("simple", "medium", or "extensive"). It determines the resolution of the contrasts computed between levels of categorical variables.
- nonsignif_color
A string specifying the color for non-significant results. Default: "grey".
- signif_color
A string specifying the color for significant results. Default: "red".
- saturation_value
A numeric value for the color saturation threshold. Default: NULL (determined automatically).
- sig_threshold
A numeric value specifying the significance threshold. Default: 0.05.
- widthlabels
An integer controlling the width at which contrast labels are wrapped. Default: 18.
- labsize
An integer controlling axis text size. Default: 10.
- title
A character string specifying the plot title. Default: NULL.
- titlesize
An integer specifying the title size. Default: 14.
- pointSize
A numeric value for point size in plots. Default: 5.
- discrete_colors
A named list mapping categorical variable levels to colors. Each element should be a named vector whose names correspond to factor levels. Default: NULL.
- continuous_color
A string specifying the color for continuous variables. Default: "#8C6D03".
- color_palette
A string specifying the color palette for discrete variables. Default: "Set2".
- printplt
A logical value specifying whether the plot is printed. Default: TRUE.
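The "logmedian" and "ranking" scoring methods can be sketched in base R. This sketch makes several assumptions. It assumes a numeric matrix with genes in rows and samples in columns, with expression values already on a log2 scale. It assumes that "ranking" means ranking all genes within each sample, then dividing the mean rank of the signature genes by the total number of genes. The function names and the toy data are hypothetical, and the package's own implementation may differ in details such as the handling of genes that are missing from the data. The "ssGSEA" method relies on the GSVA package and is not reproduced here.

```r
# Hypothetical sketches, not the package's code.
# Assumes `expr`: numeric matrix, genes in rows, samples in columns (log2 scale).

logmedian_score_sketch <- function(expr, genes) {
  genes <- intersect(genes, rownames(expr))
  # Median-center each gene across samples (vector recycles down the rows)
  centered <- expr - apply(expr, 1, median)
  # Sum of centered values of signature genes divided by the number of genes
  colSums(centered[genes, , drop = FALSE]) / length(genes)
}

ranking_score_sketch <- function(expr, genes) {
  genes <- intersect(genes, rownames(expr))
  # Rank all genes within each sample (higher expression -> higher rank)
  ranks <- apply(expr, 2, rank)
  dimnames(ranks) <- dimnames(expr)
  # Mean rank of signature genes, normalized by the total number of genes
  colMeans(ranks[genes, , drop = FALSE]) / nrow(expr)
}

# Toy data (invented): 4 genes x 3 samples
expr <- matrix(c(5, 6, 7,
                 1, 2, 3,
                 8, 8, 8,
                 2, 4, 9),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("g", 1:4), paste0("s", 1:3)))
gene_set <- list(mySig = c("g1", "g4"))  # hypothetical one-signature gene set

lm_scores <- logmedian_score_sketch(expr, gene_set$mySig)
rk_scores <- ranking_score_sketch(expr, gene_set$mySig)
lm_scores
rk_scores
```

Dividing the centered sum by the number of signature genes makes the logmedian score a per-gene average. This keeps scores for signatures of different sizes on a comparable scale.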
Value
A list with:
- Overall: Data frame of effect sizes (Cohen's f) and p-values for each phenotypic variable analysed.
- Contrasts: Data frame of Cohen's d and adjusted p-values for contrasts between levels of categorical variables, with the resolution of contrasts determined by the mode parameter.
- plot: A combined visualization with three main panels: (1) lollipop plots of Cohen's f for each variable of interest, (2) distribution plots of the score by variable (density or scatter, depending on variable type), and (3, if applicable) lollipop plots of Cohen's d for contrasts in categorical variables.
- plot_contrasts: Lollipop plots of Cohen's d effect sizes for contrasts between levels of non-numerical variables (if applicable), colored by BH-adjusted p-value.
- plot_overall: Lollipop plot showing Cohen's f effect sizes for each variable, colored by p-value.
- plot_distributions: List of density or scatter plots of the score across variable levels, depending on variable type.
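The Contrasts output can be illustrated with a sketch of pairwise contrasts between levels of a categorical variable. It computes Cohen's d with a pooled standard deviation for each contrast. It then applies Benjamini-Hochberg (BH) adjustment to the p-values using `p.adjust()`. Several details are assumptions and do not come from this page: the helper name, the use of all pairwise contrasts (which may not match any particular `mode` setting), and the use of Welch t-test p-values. The package may instead derive its contrasts from the fitted linear model.

```r
# Hypothetical helper, not the package's implementation: pairwise Cohen's d
# (pooled SD) between levels, with BH-adjusted Welch t-test p-values.
pairwise_cohens_d_sketch <- function(score, variable) {
  variable <- factor(variable)
  pairs <- combn(levels(variable), 2, simplify = FALSE)
  rows <- lapply(pairs, function(p) {
    x <- score[variable == p[1]]
    y <- score[variable == p[2]]
    # Pooled standard deviation of the two groups
    sp <- sqrt(((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
                 (length(x) + length(y) - 2))
    data.frame(contrast = paste(p[1], "-", p[2]),
               cohens_d = (mean(x) - mean(y)) / sp,
               p_value = t.test(x, y)$p.value)
  })
  out <- do.call(rbind, rows)
  # Benjamini-Hochberg adjustment across all contrasts
  out$p_adj <- p.adjust(out$p_value, method = "BH")
  out
}

# Toy data (invented)
group <- factor(rep(c("A", "B", "C"), each = 4))
score <- c(1, 2, 3, 4,  3, 4, 5, 6,  6, 7, 8, 9)
res_d <- pairwise_cohens_d_sketch(score, group)
res_d
```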
