This function calculates a gene signature score for each sample based on one or more predefined gene sets (signatures).

Usage

CalculateScores(
  data,
  metadata,
  gene_sets,
  method = c("ssGSEA", "logmedian", "ranking", "all")
)

Arguments

data

A data frame of normalized (non-transformed) counts where each row is a gene and each column is a sample. The row names should contain gene names, and the column names should contain sample identifiers. (Required)

metadata

A data frame describing the attributes of each sample. Each row corresponds to a sample and each column to an attribute. The first column of metadata should be the sample identifiers (i.e., the column names of data). Defaults to NULL if no metadata is provided.
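As a rough illustration of the expected shapes of data and metadata, the sketch below builds two toy data frames and checks that the sample identifiers line up. The object names (gene_data, sample_metadata), the column names, and all values are hypothetical and are not taken from the package.

```r
# Hypothetical toy inputs; all names and values are illustrative only.
gene_data <- data.frame(
  Sample1 = c(120, 35, 560),
  Sample2 = c(98, 41, 610),
  Sample3 = c(150, 12, 480),
  row.names = c("Gene1", "Gene2", "Gene3")  # genes as row names
)

sample_metadata <- data.frame(
  sample = c("Sample1", "Sample2", "Sample3"),  # first column: sample IDs
  condition = c("control", "treated", "treated")
)

# The first metadata column should hold the same identifiers
# as the column names of the expression data.
ids_match <- setequal(colnames(gene_data), sample_metadata[[1]])
```

Running a check like this before scoring can catch mismatched or misspelled sample identifiers early.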

gene_sets

Gene set input. (Required)

If using unidirectional gene sets, provide a named list where each element is a vector of gene names representing a gene signature. The names of the list elements should correspond to the labels for each signature.

If using bidirectional gene sets, provide a named list where each element is a data frame. The names of the list elements should correspond to the labels for each signature, and each data frame should contain the following structure:

  • The first column should contain gene names.

  • The second column should indicate the expected direction of enrichment (1 for upregulated genes, -1 for downregulated genes).
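A bidirectional input following the structure above might look like the sketch below. The signature labels, gene names, and column names (gene, direction) are placeholders chosen for illustration; the description above only fixes the column order, not the column names.

```r
# Hypothetical bidirectional signatures; gene names are placeholders.
bidirectional_gene_sets <- list(
  Signature_C = data.frame(
    gene = c("Gene1", "Gene5", "Gene10", "Gene20"),
    direction = c(1, 1, -1, -1)  # 1 = upregulated, -1 = downregulated
  ),
  Signature_D = data.frame(
    gene = c("Gene2", "Gene6", "Gene15"),
    direction = c(1, -1, -1)
  )
)

# Structural checks: named list, data frame elements, directions in {1, -1}.
structure_ok <- !is.null(names(bidirectional_gene_sets)) &&
  all(vapply(bidirectional_gene_sets, is.data.frame, logical(1))) &&
  all(vapply(
    bidirectional_gene_sets,
    function(gs) all(gs[[2]] %in% c(1, -1)),
    logical(1)
  ))
```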

method

A character string indicating the scoring method to use. Options are "ssGSEA", "logmedian", "ranking", or "all" (to compute scores using all methods). Defaults to "logmedian".

Value

If method = "all", a list is returned where each element corresponds to a scoring method and contains the respective data frame of scores. If a single method is chosen, a data frame is returned containing the calculated scores for each gene signature, including metadata if provided. The data frame includes the columns described below.

sample

The sample identifier (matching the column names of the input data).

score

The calculated gene signature score for the corresponding sample.

(metadata)

Any additional columns from the metadata data frame provided by the user, if available.

Details

This function calculates a gene signature score for each sample based on one or more predefined gene sets (signatures). Three scoring methods are available, plus an "all" option that runs all three:

ssGSEA

Uses single-sample Gene Set Enrichment Analysis (ssGSEA) to compute an enrichment score for each signature in each sample. The implementation is adapted from the gsva() function of the GSVA package, and the resulting score represents the absolute enrichment of each gene set in each sample.

logmedian

Computes, for each sample, the score as the sum of the normalized (log2-median-centered) expression values of the signature genes divided by the number of genes in the signature.
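The arithmetic described above can be sketched in base R as follows. This is not the package's internal code: it assumes a plain log2 transform with no pseudocount and median-centering of each gene across samples, and it uses made-up counts without zeros.

```r
# Toy normalized counts: 4 genes x 3 samples (made-up values, no zeros).
toy_counts <- data.frame(
  S1 = c(8, 16, 4, 2),
  S2 = c(16, 16, 8, 2),
  S3 = c(32, 16, 16, 2),
  row.names = c("Gene1", "Gene5", "Gene10", "Gene20")
)
signature <- c("Gene1", "Gene10")

# Assumption: log2 transform, then subtract each gene's median across samples.
log_expr <- log2(as.matrix(toy_counts))
centered <- sweep(log_expr, 1, apply(log_expr, 1, median), "-")

# Score = sum of centered values of the signature genes / number of signature genes.
logmedian_score <- colSums(centered[signature, , drop = FALSE]) / length(signature)
```

With these toy values, both signature genes double from S1 to S2 and again from S2 to S3, so the scores come out as -1, 0, and 1.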

ranking

Computes gene signature scores for each sample by ranking the expression of signature genes in the dataset and normalizing the score based on the total number of genes.
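One possible reading of this description is sketched below in base R. The exact formula used by the package is not stated here, so the choices in the sketch (ranking genes within each sample, averaging the ranks of the signature genes, dividing by the total number of genes) are assumptions made for illustration only.

```r
# Toy expression values: 4 genes x 2 samples (made-up values).
toy_counts <- data.frame(
  S1 = c(10, 40, 20, 30),
  S2 = c(40, 10, 30, 20),
  row.names = c("Gene1", "Gene5", "Gene10", "Gene20")
)
signature <- c("Gene5", "Gene20")

# Rank genes within each sample (1 = lowest expression).
gene_ranks <- apply(as.matrix(toy_counts), 2, rank)

# Assumption: mean rank of the signature genes, scaled by the total gene count.
ranking_score <- colMeans(gene_ranks[signature, , drop = FALSE]) / nrow(toy_counts)
```

Under these assumptions a score close to 1 means the signature genes sit near the top of the expression ranking in that sample; here S1 scores 0.875 and S2 scores 0.375.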

all

Computes gene signature scores using all three methods (ssGSEA, logmedian, and ranking). The function returns a list containing the results of each method.

Examples

if (FALSE) { # \dontrun{
  # Assume 'gene_data' is your gene expression data frame and 'sample_metadata'
  # is your metadata. Define a list of gene signatures as follows:
  gene_sets <- list(
    "Signature_A" = c("Gene1", "Gene5", "Gene10", "Gene20"),
    "Signature_B" = c("Gene2", "Gene6", "Gene15", "Gene30")
  )

  # Using the ssGSEA method:
  scores_ssgsea <- CalculateScores(data = gene_data,
                                   metadata = sample_metadata,
                                   gene_sets = gene_sets,
                                   method = "ssGSEA")

  # Using the logmedian method (default):
  scores_logmedian <- CalculateScores(data = gene_data,
                                      gene_sets = gene_sets)

  # Using all methods:
  scores_all <- CalculateScores(data = gene_data,
                                metadata = sample_metadata,
                                gene_sets = gene_sets,
                                method = "all")
} # }