
This function performs principal component analysis (PCA) on a gene-by-sample dataset and visualizes the results with ggplot2. It lets users restrict the analysis to genes of interest, control centering and scaling, and color points by a metadata variable.

Usage

plotPCA(
  data,
  metadata = NULL,
  genes = NULL,
  scale = FALSE,
  center = TRUE,
  PCs = list(c(1, 2)),
  ColorVariable = NULL,
  ColorValues = NULL,
  pointSize = 5,
  legend_nrow = 2,
  legend_position = c("bottom", "top", "right", "left"),
  ncol = NULL,
  nrow = NULL
)

Arguments

data

A numeric matrix or data frame where rows represent genes and columns represent samples.

metadata

A data frame containing sample metadata. The first column should contain sample names matching the column names of data. Default is NULL.

genes

A character vector of gene names (row names of data) to include in the PCA. Default is NULL (uses all genes).
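Restricting the PCA to a gene subset amounts to selecting rows of data before the decomposition. The base R sketch below assumes genes are matched by name against rownames(data). The helper subset_genes() is hypothetical and not part of this package, and because the page does not say how unknown gene names are handled, this sketch simply drops them with a warning.

```r
# Hypothetical helper (not the package's code): keep only the requested
# genes (rows) of a genes-by-samples matrix.
subset_genes <- function(x, genes = NULL) {
  if (is.null(genes)) return(x)            # NULL means "use all genes"
  found <- intersect(genes, rownames(x))   # keeps the order given in `genes`
  missing <- setdiff(genes, found)
  if (length(missing) > 0) {
    warning("Genes not found in data: ", paste(missing, collapse = ", "))
  }
  x[found, , drop = FALSE]                 # drop = FALSE keeps a matrix even for one gene
}

set.seed(1)
m <- matrix(rnorm(20), nrow = 5,
            dimnames = list(paste0("Gene", 1:5), paste0("Sample", 1:4)))
sub <- subset_genes(m, c("Gene2", "Gene4"))
dim(sub)  # 2 genes x 4 samples
```

Using drop = FALSE matters here: selecting a single gene would otherwise collapse the result to a plain vector.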

scale

Logical; if TRUE, each variable is scaled to unit variance before PCA. Default is FALSE.

center

Logical; if TRUE, each variable is centered to zero mean before PCA. Default is TRUE.
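To make the center and scale options concrete, the sketch below uses prcomp() directly. It assumes, without confirmation from this page, that samples are treated as observations and genes as variables, so the genes-by-samples matrix is transposed first. Centering subtracts each gene's mean; scaling additionally divides by its standard deviation, which is the same as standardizing the genes yourself with scale().

```r
# Assumption: samples are observations and genes are variables, so the
# genes-by-samples matrix is transposed before calling prcomp().
set.seed(42)
expr <- matrix(rnorm(200, mean = 5), nrow = 20,
               dimnames = list(paste0("Gene", 1:20), paste0("Sample", 1:10)))
x <- t(expr)  # 10 samples x 20 genes

pca_centered <- prcomp(x, center = TRUE, scale. = FALSE)  # subtract gene means only
pca_scaled   <- prcomp(x, center = TRUE, scale. = TRUE)   # also divide by gene SDs

# Standardizing manually and disabling both options gives the same components.
pca_manual <- prcomp(scale(x, center = TRUE, scale = TRUE),
                     center = FALSE, scale. = FALSE)
all.equal(pca_scaled$sdev, pca_manual$sdev)  # TRUE

# With scaling, every gene has variance 1, so total variance equals the gene count.
sum(pca_scaled$sdev^2)  # 20

# Centering makes the sample scores on every PC average to (numerically) zero.
round(colMeans(pca_centered$x), 10)
```

Scaling is usually worth enabling when genes are on very different expression ranges; otherwise the most highly expressed, most variable genes tend to dominate the leading components.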

PCs

A list of length-2 numeric vectors, each giving a pair of principal components (PCs) to plot against each other. Default is list(c(1, 2)).
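Each element of PCs names one pair of components, so a list such as list(c(1, 2), c(2, 3)) describes two scatter plots. The base R sketch below shows what each pair selects from a prcomp() result and how percent-variance axis labels could be built. It is illustrative only: the label format and the objects score_pairs and axis_labels are assumptions, not documented package behavior.

```r
set.seed(7)
expr <- matrix(rnorm(300), nrow = 30,
               dimnames = list(paste0("Gene", 1:30), paste0("Sample", 1:10)))
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)  # samples as observations

# Percentage of total variance captured by each component.
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)

PCs <- list(c(1, 2), c(2, 3))

# One two-column score matrix (x axis, y axis) per requested pair.
score_pairs <- lapply(PCs, function(p) pca$x[, p, drop = FALSE])

# Illustrative axis labels of the form "PCk (xx.x%)" for each pair.
axis_labels <- lapply(PCs, function(p) sprintf("PC%d (%.1f%%)", p, pct_var[p]))
```

With several pairs, the resulting plots are arranged in a grid whose shape ncol and nrow control.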

ColorVariable

A character string specifying the metadata column used for coloring points. Default is NULL.

ColorValues

A vector specifying custom colors for groups in ColorVariable. Default is NULL.

pointSize

Numeric; sets the size of points in the plot. Default is 5.

legend_nrow

Integer; number of rows in the legend. Default is 2.

legend_position

Character; position of the legend ("bottom", "top", "right", "left"). Default is "bottom".

ncol

Integer; number of columns used to arrange the plots when several PC pairs are requested. Default is determined automatically.

nrow

Integer; number of rows used to arrange the plots when several PC pairs are requested. Default is determined automatically.

Value

An invisible list containing:

plt

A ggplot object, or a ggarrange object (ggpubr) combining the plots, displaying the PCA results.

data

A data frame of PCA scores for each sample, combined with the sample metadata when metadata is not NULL.

Details

The function performs PCA using prcomp() and visualizes the results using ggplot2. If a metadata data frame is provided, it ensures the sample order matches between data and metadata.
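The core computation can be sketched in base R as below. This is a hypothetical reconstruction, not the package's source: pca_scores_with_metadata() is an illustrative name, and it assumes samples are matched by name between colnames(data) and the first column of metadata, stopping if any sample is missing. The result resembles the data element of the returned list: one row per sample with its PC scores and metadata.

```r
# Hypothetical sketch of the core computation; not the package's source.
pca_scores_with_metadata <- function(data, metadata = NULL,
                                     center = TRUE, scale = FALSE) {
  # Samples are columns of `data`; prcomp() expects observations in rows.
  pca <- prcomp(t(as.matrix(data)), center = center, scale. = scale)
  scores <- as.data.frame(pca$x)
  scores$Sample <- rownames(pca$x)
  if (!is.null(metadata)) {
    # Reorder metadata rows so its first column follows colnames(data).
    idx <- match(colnames(data), metadata[[1]])
    if (anyNA(idx)) stop("Some samples in data are missing from metadata")
    scores <- cbind(scores, metadata[idx, -1, drop = FALSE])
  }
  rownames(scores) <- NULL
  scores
}

set.seed(123)
data <- matrix(rnorm(1000), nrow = 50, ncol = 20,
               dimnames = list(paste0("Gene", 1:50), paste0("Sample", 1:20)))
metadata <- data.frame(Sample = colnames(data),
                       Group = rep(c("A", "B"), each = 10))
# Shuffle metadata rows: alignment relies on sample names, not row order.
metadata <- metadata[sample(nrow(metadata)), ]

res <- pca_scores_with_metadata(data, metadata)
head(res[, c("Sample", "PC1", "PC2", "Group")])
```

Matching by name rather than position guards against silently mislabeled points when metadata rows are stored in a different order from the columns of data.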

Examples

if (FALSE) { # \dontrun{
# Example dataset
set.seed(123)
data <- matrix(rnorm(1000), nrow = 50, ncol = 20)
colnames(data) <- paste0("Sample", 1:20)
rownames(data) <- paste0("Gene", 1:50)

metadata <- data.frame(Sample = colnames(data),
                       Group = rep(c("A", "B"), each = 10))

# Basic PCA plot
plotPCA(data, metadata, ColorVariable = "Group")
} # }