
Calculate Gene Signature Scores using Score-Based Approaches
Source: R/CalculateScores.R (CalculateScores.Rd)
This function calculates a gene signature score for each sample based on one or more predefined gene sets (signatures).
Usage
CalculateScores(
  data,
  metadata,
  gene_sets,
  method = c("ssGSEA", "logmedian", "ranking", "all")
)
Arguments
- data
A data frame of normalized (non-transformed) counts where each row is a gene and each column is a sample. The row names should contain gene names, and the column names should contain sample identifiers. (Required)
- metadata
A data frame describing the attributes of each sample. Each row corresponds to a sample and each column to an attribute. The first column of metadata should contain the sample identifiers (i.e., the column names of data). Defaults to NULL if no metadata is provided.
- gene_sets
Gene set input. (Required)
If using unidirectional gene sets, provide a named list where each element is a vector of gene names representing a gene signature. The names of the list elements should correspond to the labels for each signature.
If using bidirectional gene sets, provide a named list where each element is a data frame. The names of the list elements should correspond to the labels for each signature, and each data frame should have the following structure:
  - The first column contains the gene names.
  - The second column indicates the expected direction of enrichment (1 for upregulated genes, -1 for downregulated genes).
- method
A character string indicating the scoring method to use. Options are "ssGSEA", "logmedian", "ranking", or "all" (to compute scores using all methods). Defaults to "logmedian".
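The argument layout described above can be illustrated with small toy objects. This is a sketch under stated assumptions: the object names, gene names, sample names, and values are all hypothetical, and the stopifnot() checks only mirror the documented requirements; they are not the validation performed by CalculateScores() itself.

```r
# Hypothetical toy inputs illustrating the documented layout of each argument.
set.seed(1)

# data: normalized, non-log-transformed counts; genes in rows, samples in columns
gene_data <- as.data.frame(matrix(
  rpois(6 * 3, lambda = 100), nrow = 6,
  dimnames = list(paste0("Gene", 1:6), paste0("Sample", 1:3))
))

# metadata: one row per sample; the first column holds the sample identifiers
sample_metadata <- data.frame(
  sample = colnames(gene_data),
  condition = c("control", "treated", "treated"),
  stringsAsFactors = FALSE
)

# Unidirectional gene sets: named list of character vectors of gene names
uni_sets <- list(
  Signature_A = c("Gene1", "Gene2", "Gene3"),
  Signature_B = c("Gene4", "Gene5")
)

# Bidirectional gene sets: named list of data frames where
# column 1 = gene names and column 2 = expected direction (1 up, -1 down)
bi_sets <- list(
  Signature_C = data.frame(
    gene = c("Gene1", "Gene2", "Gene6"),
    direction = c(1, 1, -1),
    stringsAsFactors = FALSE
  )
)

# Sanity checks mirroring the documented requirements
stopifnot(
  identical(sample_metadata[[1]], colnames(gene_data)),
  all(unlist(uni_sets) %in% rownames(gene_data)),
  all(bi_sets$Signature_C[[1]] %in% rownames(gene_data)),
  all(bi_sets$Signature_C[[2]] %in% c(1, -1))
)
```

Objects shaped like these would then be passed as, for example, CalculateScores(data = gene_data, metadata = sample_metadata, gene_sets = bi_sets, method = "logmedian").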
Value
If a single method is chosen, a data frame containing the calculated scores for each gene signature, including metadata if provided.
If method = "all", a list in which each element corresponds to a scoring method and contains the respective data frame of scores.
Each data frame of scores contains the following columns:
- sample
The sample identifier (matching the column names of the input data).
- score
The calculated gene signature score for the corresponding sample.
- (metadata)
Any additional columns from the metadata data frame provided by the user, if available.
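As a hedged illustration of working with the list returned by method = "all", the sketch below mocks such a list and stacks it into one long data frame with a method column. The mock structure (one data frame per method, each with sample and score columns) is an assumption based on the description above, and the score values are placeholders, not real output of CalculateScores().

```r
# Hypothetical mock of the list returned when method = "all": one data frame
# of scores per method. Values are placeholders, not real output.
scores_all <- list(
  ssGSEA    = data.frame(sample = c("S1", "S2"), score = c(0.42, 0.17)),
  logmedian = data.frame(sample = c("S1", "S2"), score = c(0.31, -0.08)),
  ranking   = data.frame(sample = c("S1", "S2"), score = c(0.55, 0.48))
)

# Prefix each per-method data frame with a 'method' column, then stack them
scores_long <- do.call(rbind, lapply(names(scores_all), function(m) {
  data.frame(method = m, scores_all[[m]], stringsAsFactors = FALSE)
}))
rownames(scores_long) <- NULL
```

A long table like this is convenient for comparing methods side by side, for example when plotting score against method for each sample.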
Details
Three scoring methods are available, plus an option that runs all of them:
- ssGSEA
Uses the single-sample Gene Set Enrichment Analysis (ssGSEA) method to compute an enrichment score for each signature in each sample. This method uses an adaptation of the gsva() function from the GSVA package; the resulting score represents the absolute enrichment of each gene set in each sample.
- logmedian
Computes, for each sample, the score as the sum of the normalized (log2-median-centered) expression values of the signature genes divided by the number of genes in the signature.
- ranking
Computes gene signature scores for each sample by ranking the expression of signature genes in the dataset and normalizing the score based on the total number of genes.
- all
Computes gene signature scores using all three methods (ssGSEA, logmedian, and ranking) and returns a list containing the results of each method.
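The logmedian description can be sketched in a few lines of base R. This is a minimal illustration, not the package's implementation: the helper name logmedian_score(), the pseudocount of 1 added before the log2 transform, and centering each gene on its median across samples are all assumptions, and bidirectional gene sets are not handled.

```r
# Hypothetical helper: log2-median-centered signature score (unidirectional only).
# Assumes a pseudocount of 1 and per-gene median centering across samples.
logmedian_score <- function(counts, genes) {
  log_expr <- log2(as.matrix(counts) + 1)
  # Subtract each gene's (row's) median across samples
  centered <- sweep(log_expr, 1, apply(log_expr, 1, median), "-")
  # Keep only signature genes present in the data
  genes <- intersect(genes, rownames(centered))
  # Sum of centered values over signature genes divided by the number of genes
  colSums(centered[genes, , drop = FALSE]) / length(genes)
}

counts <- data.frame(
  S1 = c(1, 3, 7),
  S2 = c(3, 7, 15),
  S3 = c(7, 15, 31),
  row.names = c("GeneA", "GeneB", "GeneC")
)
logmedian_score(counts, c("GeneA", "GeneB"))
# returns c(S1 = -1, S2 = 0, S3 = 1)
```

The toy counts are chosen so that log2(count + 1) gives whole numbers, which makes the centering easy to check by hand: GeneA becomes 1, 2, 3 and GeneB becomes 2, 3, 4, so both center to -1, 0, 1.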
Examples
if (FALSE) { # \dontrun{
# Assume 'gene_data' is your gene expression data frame and 'sample_metadata'
# is your metadata. Define a list of gene signatures as follows:
gene_sets <- list(
  "Signature_A" = c("Gene1", "Gene5", "Gene10", "Gene20"),
  "Signature_B" = c("Gene2", "Gene6", "Gene15", "Gene30")
)
# Using the ssGSEA method:
scores_ssgsea <- CalculateScores(data = gene_data,
                                 metadata = sample_metadata,
                                 gene_sets = gene_sets,
                                 method = "ssGSEA")
# Using the logmedian method (documented as the default), without metadata:
scores_logmedian <- CalculateScores(data = gene_data,
                                    metadata = NULL,
                                    gene_sets = gene_sets,
                                    method = "logmedian")
# Using all methods:
scores_all <- CalculateScores(data = gene_data,
                              metadata = sample_metadata,
                              gene_sets = gene_sets,
                              method = "all")
} # }