betAS is a user-friendly R package that allows intuitive analysis and visualisation of differential alternative splicing (AS) based on beta distributions.
Beta distributions are suitable to quantify inclusion proportions of alternative sequences, using RNA sequencing reads supporting their inclusion and exclusion as surrogates for the two distribution shape parameters. Each such beta distribution has the inclusion proportion as mean value and is narrower when the read coverage is higher, facilitating the interpretability of its precision when plotted. betAS uses beta distributions to accurately model PSI values and their precision, to quantitatively and visually compare AS between groups of samples.
betAS allows the analysis of user-provided tables with AS quantifications, such as those obtained by vast-tools, rMATS or Whippet, ranking differentially spliced events by a significance metric that incorporates the compromise between the uncertainty in individual sample estimates and the variability among replicates.
Please note that this tutorial will provide an overview on how to use betAS R package for alternative splicing analyses. The cases explored in this tutorial were adapted from betAS’ original article:
Mariana Ascensão-Ferreira, Rita Martins-Silva, Nuno Saraiva-Agostinho and Nuno L. Barbosa-Morais (2023). betAS: intuitive analysis and visualisation of differential alternative splicing using beta distributions. bioRxiv
The results herein presented don’t correspond necessarily to the results presented in the manuscript. For a more detailed explanation on the methods used, please check the manuscript.
Install betAS by typing the following in an R console:
devtools::install_github("DiseaseTranscriptomicsLab/betAS@dev")After the installation, load betAS and other required packages by typing:
library(betAS)Consider also loading the following libraries, for visualization purposes, by typing:
if(!require(ggplot2)) install.packages("ggplot2")
library(ggplot2)
if(!require(plotly)) install.packages("plotly")
library(plotly)
if(!require(dplyr)) install.packages("dplyr")
library(dplyr)
if(!require(cowplot)) install.packages("cowplot")
library(cowplot)To load data for differential alternative splicing analyses, we can use the getDataset function. betAS accepts files sourced from vast-tools, rMATS, or whippet. These files should contain junction reads information (e.g. results obtained from the vast-tools’ tidy module are not supported).
Based on the value assigned to the tool parameter, the pathTables parameter can take the following input types:
tool="vast-tools": Should be a path pointing to a *INCLUSION_LEVELS_FULL*.tab file.tool="rMATS": Should be a path pointing to a *MATS.JC.txt file.tool="whippet": Should be a list of paths (minimum of 2) pointing to *.psi.gz files.# Example to load data from vast-tools
dataset <- getDataset(pathTables="path/to/dataset/*INCLUSION_LEVELS_FULL*.tab", tool="vast-tools")
# Example to load data from rMATS
dataset <- getDataset(pathTables="path/to/dataset/*MATS.JC.txt", tool="rMATS")
# Example to load data from whippet
dataset <- getDataset(pathTables=list("path/to/sample1/*.psi.gz",
"path/to/sample2/*.psi.gz",
"path/to/sample3/*.psi.gz"), tool="whippet")The getDataset function retrieves the inclusion tables specific to the chosen tool for further analyses. However, we also need to use the getEvents function in order to format inclusion tables into a standard format that is compatible with betAS workflow.
The output from getEvents is an object with 4 attributes:
PSI: A data frame with information regarding the splicing events and their inclusion levels.Qual: A data frame detailing the events and information related to the coverage associated with each event. This representation follows the vast-tools representation of inclusion (inc) and exclusion (exc) reads.EventsPerType: A named table summarising the number of events per type.Samples: Vector containing the names of the samples considered.tool <- "vast-tools" # example tool, can be one of the following: "vast-tools", "rMATS", "whippet"
dataset <- getEvents(dataset, tool = tool)For users who may not have files to upload and just want to explore betAS functionalities, there’s also an option to work with built-in datasets. This dataset is a subset of the Deep transcriptional profiling of longitudinal changes during neurogenesis and network maturation in vivo public mouse dataset:
| Run | Cell Type | Div |
|---|---|---|
| SRR645824 | ESC | DIV-8 |
| SRR645826 | ESC | DIV-8 |
| SRR645828 | ESC | DIV-8 |
| SRR645830 | ESC | DIV-8 |
| SRR645872 | Neuron | DIV28 |
| SRR645875 | Neuron | DIV28 |
| SRR645877 | Neuron | DIV28 |
| SRR645879 | Neuron | DIV28 |
To use the pre-built datasets, simply don’t specify the file and choose the tool you prefer.
Take into account that the dataset provided by the three different tools might not be totally the same. The events provided greatly depend on the tool itself, the type of events considered (e.g. for rMATS only exon-skipping events were considered), among others. These datasets should be used for exploration of betAS features only.
# Example to load data from vast-tools
dataset <- getDataset(pathTables=NULL, tool="vast-tools")
# Example to load data from rMATS
dataset <- getDataset(pathTables=NULL, tool="rMATS")
# Example to load data from whippet
dataset <- getDataset(pathTables=NULL, tool="whippet")For simplicity, in this tutorial we will be using a default dataset from vast-tools to illustrate such features. At this stage, the user can explore the raw / non-filtered data by browsing in the four objects inside dataset. It might be useful to check at this stage the overall results for a given gene / event of interest.
dataset <- getDataset(pathTables=NULL, tool="vast-tools")
dataset <- getEvents(dataset, tool = "vast-tools")gene <- "Actn4"
head(dataset$PSI[dataset$PSI$GENE==gene,])| GENE | EVENT | COORD | LENGTH | FullCO | COMPLEX | SRR645824 | SRR645826 | SRR645828 | SRR645830 | SRR645872 | SRR645875 | SRR645877 | SRR645879 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13530 | Actn4 | MmuEX6009510 | chr7:28901877-28902027 | 151 | chr7:28904541,28901877-28902027,28898728 | S | 100.00 | 99.80 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| 45339 | Actn4 | MmuEX0003637 | chr7:28895432-28895512 | 81 | chr7:28896089,28895432-28895512,28894659 | C3 | 90.99 | 90.44 | 89.74 | 96.25 | 19.44 | 33.33 | 19.15 | 30.51 |
| 45340 | Actn4 | MmuEX0003638 | chr7:28895121-28895180 | 60 | chr7:28896089,28894911+28895121-28895180+28895186,28894659 | C3 | 9.19 | 8.67 | 6.80 | 3.54 | 82.09 | 60.53 | 75.61 | 59.66 |
| 45341 | Actn4 | MmuEX0003639 | chr7:28911490-28911575 | 86 | chr7:28912240,28911490-28911575,28907064 | C3 | 85.63 | 85.26 | 86.64 | 80.03 | 36.13 | 23.28 | 25.78 | 31.03 |
| 45342 | Actn4 | MmuEX0003640 | chr7:28909903-28909988 | 86 | chr7:28912240,28909903-28909988,28907064 | C3 | 17.62 | 18.01 | 18.31 | 21.92 | 65.48 | 72.69 | 84.48 | 79.66 |
| 119010 | Actn4 | MmuEX6009503 | chr7:28918672-28918791 | 120 | chr7:28918954,28918672-28918791,28916246 | S | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
event_of_interest <- "MmuEX0003638"
dataset$PSI[dataset$PSI$EVENT==event_of_interest,]| GENE | EVENT | COORD | LENGTH | FullCO | COMPLEX | SRR645824 | SRR645826 | SRR645828 | SRR645830 | SRR645872 | SRR645875 | SRR645877 | SRR645879 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 45340 | Actn4 | MmuEX0003638 | chr7:28895121-28895180 | 60 | chr7:28896089,28894911+28895121-28895180+28895186,28894659 | C3 | 9.19 | 8.67 | 6.8 | 3.54 | 82.09 | 60.53 | 75.61 | 59.66 |
betAS allows to filter the dataset based on alternative splicing event types, PSI value range to consider and minimum number of supporting reads.
betAS allows to filter event types. The supported event types are in accordance to the ones documented in the alternative splicing quantification tools supported by betAS:
types include Alt3, Alt5, ANN, C1, C2, C3, IR-C, IR-S, MIC and S as described in vast-tools documentation.types include CE, AA, AD, IR, TS, TE, AF, AL and BS as described in whippet documentation.types parameter should be set to NULLTo explore the type of events in a dataset and their respective number, we can access the EventsPerType field in the datasetvariable.
dataset$EventsPerType##
## Alt3 Alt5 ANN C1 C2 C3 IR MIC S
## 85993 61115 24750 5699 5179 5896 97042 1038 63792
betAS also allows to filter events with less than a given number of junction reads (sum of inclusion and exclusion reads) in at least one sample. To do so, the user can specify the minimum number of reads to consider in the Nparameter.
In this tutorial, we will be considering only exon skipping events (C1, C2, C3, S & MIC events from vast-tools) with at least 10 junction read counts in all samples.
dataset_filtered <- filterEvents(dataset, types=c("C1", "C2", "C3", "S", "MIC"), N=10)
cat(paste0("Alternative events: ", nrow(dataset_filtered$PSI)))## Alternative events: 195636
betAS allows to filter events based on their PSI value range (in percentage, i.e. PSI ∈ [0,100]). We will be keeping only alternative events, i.e. ~ 0 < PSI < 100 in all samples. If you wish to keep all of the events and don’t filter by PSI range, skip this step or put minPsi=0 and maxPsi=100.
dataset_filtered <- alternativeEvents(dataset_filtered, minPsi=1, maxPsi=99)
cat(paste0("Alternative events: ", nrow(dataset_filtered$PSI)))## Alternative events: 1404
After filtering the data to only keep events of interest, we can have an overview of the PSIs on the samples being analysed.
bigPicturePlot <- bigPicturePlot(table = dataset_filtered$PSI)
bigPicturePlot + theme_minimal()To allow for alternative splicing analyses between groups of interest, the user must firstly specify to what groups the samples under analyses belong to.
To check the names of the samples being used, we can access the Samples attribute inside the dataset object.
dataset_filtered$Samples## [1] "SRR645824" "SRR645826" "SRR645828" "SRR645830" "SRR645872" "SRR645875" "SRR645877" "SRR645879"
In order to ease group visualization, we can assign a color to each group. In this section we showcase how to assign a group to a given color, based on a manually defined palette. To automatically define color palettes of N colors the user might want to check Custom color palette for group definition section of this tutorial.
We can define sample groups based on the name of the samples being used, assuming that the sample naming has some characteristics that allow to easily identify the group to which they belong to.
samples <- dataset_filtered$Samples
random_colors <- c("#FF9AA2", "#FFB7B2", "#FFDAC1", "#E2F0CB", "#B5EAD7", "#C7CEEA", "#FBE2FD", "#D9ECFE")# Automatic group definition
not_grouped <- samples
checked <- c()
groups <- LETTERS[seq(1, length = min(length(names), 26))]
used_groups <- groups
groupList <- list()
while((length(checked) < length(names)) & length(used_groups) <= 26){
groupNames <- agrep(pattern = not_grouped[1], x = not_grouped, value = TRUE)
checked <- c(checked, groupNames)
not_grouped <- not_grouped[-c(match(groupNames, not_grouped))]
# Assign new group
currentNames <- names(groupList)
groupList[[length(groupList)+1]] <- list(name = used_groups[1],
samples = groupNames,
color = random_colors[1])
names(groupList) <- make.unique(c(currentNames, used_groups[1]))
random_colors <- random_colors[-1]
used_groups <- used_groups[-1]
}
groupList## $A
## $A$name
## [1] "A"
##
## $A$samples
## [1] "SRR645824" "SRR645826" "SRR645828"
##
## $A$color
## [1] "#FF9AA2"
# Visualize colors being used
slices <- rep(1, length(groupList)) # Equal-sized slices for each color
# Display the pie chart with colors
pie(slices, col = random_colors[1:length(groupList)], border = "black", labels=names(groupList), main = "Color palette for group definition")We can also define groups of samples based on a given column of the metadata. We will be using betAS metadata for vast-tools built-in example to ease group creation:
data("VT2_metadata_mouse")
(metadata <- as.data.frame(VT2_metadata_mouse))| Run | CellType | Div |
|---|---|---|
| SRR645824 | ESC | DIV-8 |
| SRR645826 | ESC | DIV-8 |
| SRR645828 | ESC | DIV-8 |
| SRR645830 | ESC | DIV-8 |
| SRR645872 | Neuron | DIV28 |
| SRR645875 | Neuron | DIV28 |
| SRR645877 | Neuron | DIV28 |
| SRR645879 | Neuron | DIV28 |
For this example we will use the cell type as a grouping variable, i.e. Neurons or Embrionic Stem Cell (ESC). This information can be accessed through the CellType column in the example metadata. The name of the samples to be used can be retrieved by accessing the Run column in the example metadata.
If using user-provided data, the user must also provide a metadata table, or ensure that the final result corresponds to groupList presented in this section.
# Define variable of the metadata table to be used as a grouping variable
groupingVariable <- "CellType"
# to get the unique values of the grouping variable. In this example, will be Neuron and ESC
groups <- unique(metadata[,groupingVariable])
# Define vector of sample names based on the example metadata
samples <- metadata$Run
# Define colors for the two groups
random_colors <- c("#FF9AA2", "#FFB7B2")groupList <- list()
for(i in 1:length(groups)){
groupNames <- samples[which(metadata[,groupingVariable] == groups[i])]
# Assign new group
currentNames <- names(groupList)
groupList[[length(groupList)+1]] <- list(name = groups[i],
samples = groupNames,
color = random_colors[i])
names(groupList) <- make.unique(c(currentNames, groups[i]))
}
groupList## $ESC
## $ESC$name
## [1] "ESC"
##
## $ESC$samples
## [1] "SRR645824" "SRR645826" "SRR645828" "SRR645830"
##
## $ESC$color
## [1] "#FF9AA2"
##
##
## $Neuron
## $Neuron$name
## [1] "Neuron"
##
## $Neuron$samples
## [1] "SRR645872" "SRR645875" "SRR645877" "SRR645879"
##
## $Neuron$color
## [1] "#FFB7B2"
# Visualize colors being used
slices <- rep(1, length(groups)) # Equal-sized slices for each color
# Display the pie chart with colors
pie(slices, col = random_colors[1:length(groups)], border = "black", labels=groups, main = "Color palette for group definition")betAS approach for alternative splicing analyses takes into account two metrics for each alternative splicing event under analyses: the magnitude of the effect (𝚫PSI = PSIbetAS(group A) - PSIbetAS(group B)) and the statistical significance of such changes. For the latter, betAS introduces three approaches: probability of differential splicing (Pdiff), F-statistic or false positive rate (FPR).
Pdiff: This approach takes the two sets of random points per condition and calculates, for each AS event’s estimated ∆PSI, the proportion of differences between these that are greater than zero, which has the same interpretation as asking what proportion of beta distribution-emitted values for one condition are higher than those emitted for the other, thus reflecting the probability of differential AS of PSIbetAS(group A) being greater than PSIbetAS(group B).
F-statistic: betAS also enables an ANOVA-like analysis of variance, comparing inter- and intra-group variabilities. For each event, within is considered the set of differences between each pair of samples that are part of the same group and between the set of differences between each pair of groups. The ratio of the median absolute values of between and within therefore provides an “F-like” statistic. This metric provides a compromise between the effect size of AS differences and their significance.
FPR: Random generation of points from a beta distribution is used to estimate the null distribution’s PSI and its precision. Then, one point is randomly selected from each of the sample’s null distribution and, keeping the samples’ group assignment (i.e., which samples belong to each group), the ∆PSI between groups under the null hypothesis is calculated. The process is repeated many times (10 000 by default) and the FPR is the proportion of ∆PSI random simulations that are larger than (i.e., more extreme) or equal to the empirical ∆PSI.
Firstly, we need to define the groups we will be analysing. This step is particularly important when we have more than one group defined, but want to compare only two of them.
# Define groups
groupA <- "Neuron"
groupB <- "ESC"
# Define samples inside each group
samplesA <- groupList[[groupA]]$samples
samplesB <- groupList[[groupB]]$samples
# Convert samples into indexes
colsGroupA <- convertCols(dataset_filtered$PSI, samplesA)
colsGroupB <- convertCols(dataset_filtered$PSI, samplesB)betAS utilizes R’s rbeta and sample functions, introducing inherent uncertainty. Running betAS twice may yield different results, particularly near significance thresholds. Variability in outcomes can be informative, signaling cases where an event’s significance is on the cusp. However, to ensure reproducibility, users can set a seed, avoiding randomization of the emitted points. By default, the seed is set to TRUE, and will be used as such for the tutorial’s reproducibility.
Moreover, the user can choose to weight the number of points emitted for beta distributions based on the coverage of each sample. With large sample sizes, uncertainty and significance of differential splicing are mostly determined by biological variability among replicate samples. However, as stated in betAS manuscript, working with small sample sizes implies a good compromise between modelling the uncertainty in the estimation of inclusion levels of alternative splicing events in individual samples and accounting for that biological variability among replicates. As such, we consider that for not-so-small sample sizes and/or a knowledgeable user intending to tune the aforementioned compromise differently, size factors proportional to coverage can be useful. This parameter should be used with care, and by default it is set to FALSE.
To calculate the Pdiff for all events in our dataset, we will use the prepareTableVolcano function. To visualize the results in a volcano plot, we will use the plotvolcanoPdiff function.
options(error=recover)
volcanoTable_Pdiff <- prepareTableVolcano(psitable = dataset_filtered$PSI,
qualtable = dataset_filtered$Qual,
npoints = 500,
colsA = colsGroupA,
colsB = colsGroupB,
labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C",
maxDevTable = maxDevSimulationN100,
seed=TRUE,
CoverageWeight = FALSE)
volcano_Pdiff <- plotVolcano(betasTable = volcanoTable_Pdiff,
labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C")
volcano_PdiffThe above plot colors in pink and labels by default the events with a ∆PSI>0.1. However, if these are in high number, the labels might not appear. To make the plot interactive, we can use the ggplotly function.
# change size points manually
volcano_Pdiff$layers[[1]]$aes_params$size <- 2
volcano_Pdiff$layers[[2]]$aes_params$size <- 1
# Plot interactively
plotly_volcano_Pdiff <- ggplotly(volcano_Pdiff, width = 700, height = 500) %>%
layout(
font = list(size = 10),
xaxis = list(
title = list(font = list(size = 14)), # Adjust as necessary
tickfont = list(size = 10) # Adjust the tick font size here
),
yaxis = list(
title = list(font = list(size = 14)), # Adjust as necessary
tickfont = list(size = 10) # Adjust the tick font size here
)
)
hover_text <- paste(
"Event: ", volcanoTable_Pdiff$EVENT,
"<br>Pdiff: ", round(volcanoTable_Pdiff$Pdiff,3),
"<br>Deltapsi: ", round(volcanoTable_Pdiff$deltapsi,3),
sep = ""
)
plotly_volcano_Pdiff$x$data[[1]]$text <- hover_text # all points
plotly_volcano_Pdiff$x$data[[2]]$text <- NULL
plotly_volcano_PdiffThe results can also be manually inspected by browsing the volcanoTable_Pdiff table
head(volcanoTable_Pdiff[,c("GENE","EVENT","COORD","Pdiff","deltapsi")])| GENE | EVENT | COORD | Pdiff | deltapsi | |
|---|---|---|---|---|---|
| 37 | Gmpr | MmuEX0021295 | chr13:45521127-45521219 | 0.9845 | 0.1392815 |
| 98 | Hnrnpd | MmuEX0023099 | chr5:99962134-99962204 | 0.9050 | -0.0222313 |
| 99 | Hnrnpd | MmuEX0023100 | chr5:99963731-99963877 | 0.8880 | -0.0442848 |
| 144 | Fmr1 | MmuEX0019414 | chrX:68705861-68705923 | 1.0000 | 0.6229724 |
| 161 | Chtop | MmuEX0001048 | chr3:90505395-90505478 | 0.5615 | 0.0032424 |
| 175 | Cnnm3 | MmuEX0011891 | chr1:36524034-36524132 | 1.0000 | -0.5776761 |
To calculate the F-statistic for all events in our dataset, we will use the prepareTableVolcanoFstat function. To visualize the results in a volcano plot, we will use the plotVolcanoFstat function. The user can also use the approach presented above to make the plot interactive.
volcanoTable_Fstat <- prepareTableVolcanoFstat(psitable = dataset_filtered$PSI,
qualtable = dataset_filtered$Qual,
npoints = 500,
colsA = colsGroupA,
colsB = colsGroupB,
labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C",
maxDevTable = maxDevSimulationN100,
seed=TRUE,
CoverageWeight=F )
volcano_Fstat <- plotVolcanoFstat(betasTable = volcanoTable_Fstat,
labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C")
volcano_Fstathead(volcanoTable_Fstat[,c("GENE","EVENT","COORD","Fstat","deltapsi")])| GENE | EVENT | COORD | Fstat | deltapsi | |
|---|---|---|---|---|---|
| 37 | Gmpr | MmuEX0021295 | chr13:45521127-45521219 | 4.599441 | 0.1392815 |
| 98 | Hnrnpd | MmuEX0023099 | chr5:99962134-99962204 | 2.554947 | -0.0222313 |
| 99 | Hnrnpd | MmuEX0023100 | chr5:99963731-99963877 | 2.120957 | -0.0442848 |
| 144 | Fmr1 | MmuEX0019414 | chrX:68705861-68705923 | 10.682045 | 0.6229724 |
| 161 | Chtop | MmuEX0001048 | chr3:90505395-90505478 | 0.849660 | 0.0032424 |
| 175 | Cnnm3 | MmuEX0011891 | chr1:36524034-36524132 | 10.675033 | -0.5776761 |
To calculate the FPR for all events in our dataset, we will use the prepareTableVolcanoFDR function. To visualize the results in a volcano plot, we will use the plotVolcanoFDR function. The user can also use the approach presented above to make the plot interactive.
volcanoTable_FDR <- prepareTableVolcanoFDR(psitable = dataset_filtered$PSI,
qualtable = dataset_filtered$Qual,
npoints = 500,
colsA = colsGroupA,
colsB = colsGroupB,
labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C",
maxDevTable = maxDevSimulationN100,
nsim = 100,
seed=TRUE,
CoverageWeight = FALSE)
volcano_FDR <- plotVolcanoFDR(betasTable = volcanoTable_FDR,
labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C")
volcano_FDRhead(volcanoTable_FDR[,c("GENE","EVENT","COORD","FDR","deltapsi")])| GENE | EVENT | COORD | FDR | deltapsi | |
|---|---|---|---|---|---|
| 37 | Gmpr | MmuEX0021295 | chr13:45521127-45521219 | 0.00 | 0.1392815 |
| 98 | Hnrnpd | MmuEX0023099 | chr5:99962134-99962204 | 0.02 | -0.0222313 |
| 99 | Hnrnpd | MmuEX0023100 | chr5:99963731-99963877 | 0.20 | -0.0442848 |
| 144 | Fmr1 | MmuEX0019414 | chrX:68705861-68705923 | 0.00 | 0.6229724 |
| 161 | Chtop | MmuEX0001048 | chr3:90505395-90505478 | 0.83 | 0.0032424 |
| 175 | Cnnm3 | MmuEX0011891 | chr1:36524034-36524132 | 0.00 | -0.5776761 |
To illustrate betAS approach for an individual event with differential alternative splicing between Neurons and ESC, we will be using the event MmuEX0003638. This is an exon skipping event in the Actn4 gene.
eventID <- "MmuEX0003638"We can visualize the beta distributions for this event for each sample.
# Auxiliary data to add increment based on the event coverage to avoid dividing by zero
data("maxDevSimulationN100")
tdensities <- plotIndividualDensitiesList(eventID = eventID,
npoints = 500,
psitable = dataset$PSI,
qualtable = dataset$Qual,
groupList = groupList,
maxDevTable = maxDevSimulationN100,
seed=TRUE,
CoverageWeight = FALSE)
tdensities + theme_minimal() + ggtitle(eventID)We can also inspect manually the different statistics for this event, and plot such results.
subset(volcanoTable_Pdiff[,c("GENE","EVENT","COORD","Pdiff","deltapsi")], EVENT==eventID)| GENE | EVENT | COORD | Pdiff | deltapsi | |
|---|---|---|---|---|---|
| 45340 | Actn4 | MmuEX0003638 | chr7:28895121-28895180 | 1 | -0.618374 |
subset(volcanoTable_Fstat[,c("GENE","EVENT","COORD","Fstat","deltapsi")], EVENT==eventID)| GENE | EVENT | COORD | Fstat | deltapsi | |
|---|---|---|---|---|---|
| 45340 | Actn4 | MmuEX0003638 | chr7:28895121-28895180 | 11.61165 | -0.618374 |
subset(volcanoTable_FDR[,c("GENE","EVENT","COORD","FDR","deltapsi")], EVENT==eventID)| GENE | EVENT | COORD | FDR | deltapsi | |
|---|---|---|---|---|---|
| 45340 | Actn4 | MmuEX0003638 | chr7:28895121-28895180 | 0 | -0.618374 |
plotPdiff <- prepareTableEvent(eventID = eventID,
psitable = dataset$PSI,
qualtable = dataset$Qual,
npoints = 500,
colsA = colsGroupA,
colsB = colsGroupB,
labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C",
maxDevTable = maxDevSimulationN100,
nsim = 1000,
seed=T,
CoverageWeight=F) %>%
plotPDiffFromEventObjList()
plotFstat <- prepareTableEvent(eventID = eventID,
psitable = dataset$PSI,
qualtable = dataset$Qual,
npoints = 500,
colsA = colsGroupA,
colsB = colsGroupB,
labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C",
maxDevTable = maxDevSimulationN100,
nsim = 1000,
seed=T,
CoverageWeight=F) %>%
plotFstatFromEventObjList()
plotFDR <- prepareTableEvent(eventID = eventID,
psitable = dataset$PSI,
qualtable = dataset$Qual,
npoints = 500,
colsA = colsGroupA,
colsB = colsGroupB,
labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C",
maxDevTable = maxDevSimulationN100,
nsim = 1000,
seed=T,
CoverageWeight=F) %>%
plotFDRFromEventObjList()
plot_grid(plotPdiff,plotFstat,plotFDR, ncol=3)The differential AS approach implemented by betAS can be applied to multiple (i.e., more than two) groups in a novel ANOVA-inspired way that extends the Pdiff definition to the comparison of the differences between samples belonging to different biological conditions to those found between replicates.
To illustrate this, we applied multiple-group betAS to the analysis of AS differences in a subset of human transcriptomes of forebrain, hindbrain, heart, kidney, liver and testis. This dataset is a subset of the Human RNA-seq time-series of the development of seven major organs public human dataset.
# load data and prepare it for betAS
data("VT1_data_human")
dataset_multgroups <- getEvents(VT1_data_human, tool = "vast-tools")
# load metadata
data("VT1_metadata_human")
metadata_multgroups <- as.data.frame(VT1_metadata_human)
metadata_multgroups| Run | age | developmental_stage | organism_part | sex |
|---|---|---|---|---|
| ERR2598266 | 14 | adolescent | forebrain | male |
| ERR2598267 | 16 | adolescent | forebrain | male |
| ERR2598268 | 17 | adolescent | forebrain | male |
| ERR2598269 | 13 | adolescent | forebrain | male |
| ERR2598270 | 13 | adolescent | heart | male |
| ERR2598271 | 13 | adolescent | hindbrain | male |
| ERR2598272 | 16 | adolescent | hindbrain | male |
| ERR2598273 | 14 | adolescent | hindbrain | male |
| ERR2598274 | 17 | adolescent | hindbrain | male |
| ERR2598275 | 17 | adolescent | liver | male |
| ERR2598276 | 16 | adolescent | testis | male |
| ERR2598277 | 17 | adolescent | testis | male |
| ERR2598278 | 14 | adolescent | testis | male |
| ERR2598279 | 14 | adolescent | testis | male |
| ERR2598280 | 58 | elderly | forebrain | male |
| ERR2598281 | 58 | elderly | forebrain | male |
| ERR2598282 | 58 | elderly | hindbrain | male |
| ERR2598283 | 58 | elderly | hindbrain | male |
| ERR2598284 | 58 | elderly | liver | male |
| ERR2598285 | 58 | elderly | liver | male |
| ERR2598286 | 55 | elderly | testis | male |
| ERR2598305 | 50 | middle adult | forebrain | male |
| ERR2598306 | 53 | middle adult | forebrain | male |
| ERR2598307 | 54 | middle adult | heart | male |
| ERR2598308 | 53 | middle adult | hindbrain | male |
| ERR2598309 | 50 | middle adult | hindbrain | male |
| ERR2598310 | 55 | middle adult | liver | male |
| ERR2598311 | 46 | middle adult | testis | male |
| ERR2598346 | 29 | young adult | forebrain | male |
| ERR2598347 | 29 | young adult | forebrain | male |
| ERR2598348 | 28 | young adult | forebrain | male |
| ERR2598349 | 39 | young adult | forebrain | male |
| ERR2598350 | 39 | young adult | forebrain | male |
| ERR2598351 | 25 | young adult | heart | male |
| ERR2598352 | 29 | young adult | hindbrain | male |
| ERR2598353 | 29 | young adult | hindbrain | male |
| ERR2598354 | 32 | young adult | hindbrain | female |
| ERR2598355 | 39 | young adult | hindbrain | male |
| ERR2598356 | 39 | young adult | hindbrain | male |
| ERR2598357 | 29 | young adult | liver | male |
| ERR2598358 | 39 | young adult | liver | male |
| ERR2598359 | 39 | young adult | liver | male |
| ERR2598360 | 29 | young adult | testis | male |
| ERR2598361 | 29 | young adult | testis | male |
| ERR2598362 | 28 | young adult | testis | male |
| ERR2598363 | 39 | young adult | testis | male |
For this dataset, we are considering a reasonable number of samples (46). As such, in cases like this, it might be interesting to be less conservative on the filtering step and choose to keep all events irrespectively of their PSI value. However, for the purpose of this tutorial, we will keep the criteria of filtering discussed above, for computational ease of this exercise. For simplicity, we will consider only exon-skipping events.
cat(paste0("Initial number of events: ", nrow(dataset_multgroups$PSI)))## Initial number of events: 2231
dataset_multgroups_filtered <- filterEvents(dataset_multgroups, types=c("C1", "C2", "C3", "S", "MIC"), N=10)
cat(paste0("Alternative events (after event type and coverage filtering): ", nrow(dataset_multgroups_filtered$PSI)))## Alternative events (after event type and coverage filtering): 2197
dataset_multgroups_filtered <- alternativeEvents(dataset_multgroups_filtered, minPsi=1, maxPsi=99)
cat(paste0("Alternative events (after PSI range filtering): ", nrow(dataset_multgroups_filtered$PSI)))## Alternative events (after PSI range filtering): 117
We will create groups based on the samples’ tissue of origin, encoded in the organism_part metadata’s column.
# Define variable of the metadata table to be used as a grouping variable
groupingVariable <- "organism_part"
# to get the unique values of the grouping variable. In this example, will be Neuron and ESC
groups <- unique(metadata_multgroups[,groupingVariable])
# Define vector of sample names based on the example metadata
samples <- metadata_multgroups$Run
# Define colors for the two groups
random_colors <- c("#FF9AA2", "#FFB7B2", "#FFDAC1", "#E2F0CB", "#B5EAD7")groupList <- list()
for(i in 1:length(groups)){
groupNames <- samples[which(metadata_multgroups[,groupingVariable] == groups[i])]
# Assign new group
currentNames <- names(groupList)
groupList[[length(groupList)+1]] <- list(name = groups[i],
samples = groupNames,
color = random_colors[i])
names(groupList) <- make.unique(c(currentNames, groups[i]))
}
groupList## $forebrain
## $forebrain$name
## [1] "forebrain"
##
## $forebrain$samples
## [1] "ERR2598266" "ERR2598267" "ERR2598268" "ERR2598269" "ERR2598280" "ERR2598281" "ERR2598305" "ERR2598306" "ERR2598346" "ERR2598347"
## [11] "ERR2598348" "ERR2598349" "ERR2598350"
##
## $forebrain$color
## [1] "#FF9AA2"
##
##
## $heart
## $heart$name
## [1] "heart"
##
## $heart$samples
## [1] "ERR2598270" "ERR2598307" "ERR2598351"
##
## $heart$color
## [1] "#FFB7B2"
##
##
## $hindbrain
## $hindbrain$name
## [1] "hindbrain"
##
## $hindbrain$samples
## [1] "ERR2598271" "ERR2598272" "ERR2598273" "ERR2598274" "ERR2598282" "ERR2598283" "ERR2598308" "ERR2598309" "ERR2598352" "ERR2598353"
## [11] "ERR2598354" "ERR2598355" "ERR2598356"
##
## $hindbrain$color
## [1] "#FFDAC1"
##
##
## $liver
## $liver$name
## [1] "liver"
##
## $liver$samples
## [1] "ERR2598275" "ERR2598284" "ERR2598285" "ERR2598310" "ERR2598357" "ERR2598358" "ERR2598359"
##
## $liver$color
## [1] "#E2F0CB"
##
##
## $testis
## $testis$name
## [1] "testis"
##
## $testis$samples
## [1] "ERR2598276" "ERR2598277" "ERR2598278" "ERR2598279" "ERR2598286" "ERR2598311" "ERR2598360" "ERR2598361" "ERR2598362" "ERR2598363"
##
## $testis$color
## [1] "#B5EAD7"
# Visualize colors being used
slices <- rep(1, length(groups)) # Equal-sized slices for each color
# Display the pie chart with colors
pie(slices, col = random_colors[1:length(groups)], border = "black", labels=groups, main = "Color palette for group definition")options(error=recover)
volcanoTable_multgroups <- prepareTableVolcanoMultipleGroups(psitable = dataset_multgroups_filtered$PSI,
qualtable = dataset_multgroups_filtered$Qual,
groupList = groupList,
npoints = 500,
maxDevTable = maxDevSimulationN100, seed=TRUE, CoverageWeight = F)
head(volcanoTable_multgroups[,c("GENE","EVENT","Fstat","Pdiff","Pzero","medianBetweens","deltaAbsolute")])| GENE | EVENT | Fstat | Pdiff | Pzero | medianBetweens | deltaAbsolute | |
|---|---|---|---|---|---|---|---|
| 1197 | DAZAP1 | HsaEX0018545 | 1.531845 | 0.6245004 | 0.6245004 | 0.0319597 | 0.0149702 |
| 1565 | SMARCA2 | HsaEX0060271 | 1.393718 | 0.5721423 | 0.5721423 | 0.0168873 | 0.0140413 |
| 1784 | GNAS | HsaEX0027841 | 7.605087 | 0.8578009 | 0.8578009 | -0.0528636 | 0.2314300 |
| 2649 | DNAJA3 | HsaEX0020236 | 1.270586 | 0.5736325 | 0.5736325 | 0.0334410 | 0.0178975 |
| 2771 | INTS10 | HsaEX0031861 | 9.684621 | 0.8501499 | 0.8501499 | -0.2672253 | 0.4973357 |
| 2790 | SNRNP70 | HsaEX0060707 | 2.220629 | 0.6369095 | 0.6369095 | -0.0153788 | 0.1056033 |
plotVolcano_MultipleGroups_Pdiff(betasTable = volcanoTable_multgroups)plotVolcano_MultipleGroups_Fstat(betasTable = volcanoTable_multgroups)eventID_mult <- "HsaEX0031861"mult_violinplots <- plotIndividualViolinsList(eventID = eventID_mult,
npoints = 500,
psitable = dataset_multgroups_filtered$PSI,
qualtable = dataset_multgroups_filtered$Qual,
groupList = groupList,
maxDevTable = maxDevSimulationN100,
seed=TRUE,
CoverageWeight = F)
eventList <- prepareTableEventMultiple(eventID = eventID_mult,
psitable = dataset_multgroups_filtered$PSI,
qualtable = dataset_multgroups_filtered$Qual,
groupList = groupList,
npoints = 500,
maxDevTable = maxDevSimulationN100,
seed=TRUE,
CoverageWeight = F)
fstat_mult <- plotFstatFromEventObjListMultiple(eventObjList = eventList)
plot_grid(fstat_mult,mult_violinplots, ncol=2, rel_widths = c(0.3, 1))In the provided examples, the color palette used for group definition was manually defined. If the user wishes to automatically generate a palette of Ncolors, the following code is advisable:
#install.packages("grDevices")
library(grDevices)
generate_pastel_colors <- function(N) {
pastel_colors <- vector("list", length = N)
for (i in 1:N) {
# Generate random values for hue, chroma, and luminance
hue <- runif(1, min = 0, max = 360) # Adjust the range for a different initial hue
chroma <- runif(1, min = 50, max = 80) # Adjust these values for your desired pastel range
luminance <- runif(1, min = 80, max = 90) # Adjust these values for your desired pastel range
# Create a pastel color using hcl() with the random values
pastel_color <- hcl(hue, chroma, luminance)
pastel_colors[[i]] <- pastel_color
}
return(pastel_colors)
}
# Generate 10 random pastel colors
N <- 10
random_pastel_colors <- generate_pastel_colors(N)
cat("HEX codes of generated pastel colors: ")## HEX codes of generated pastel colors:
cat(unlist(random_pastel_colors))## #60F6B9 #FFCF8C #CFCFFF #D3D182 #FFC5B8 #FFCE71 #6FECFB #DCE67C #FFB8CA #FFBDFF
# Prepare data for the pie chart
slices <- rep(1, N) # Equal-sized slices for each color
# Display the pie chart with pastel colors
pie(slices, col = unlist(random_pastel_colors), border = "black", labels=unlist(random_pastel_colors), main = "Automatically generated color palette for group definition")