betAS command-line interface (CLI) tutorial

Mariana Ascensão-Ferreira

2024-01-15

betAS is a user-friendly R package that allows intuitive analysis and visualisation of differential alternative splicing (AS) based on beta distributions.

Beta distributions are suitable to quantify inclusion proportions of alternative sequences, using RNA sequencing reads supporting their inclusion and exclusion as surrogates for the two distribution shape parameters. Each such beta distribution has the inclusion proportion as mean value and is narrower when the read coverage is higher, facilitating the interpretability of its precision when plotted. betAS uses beta distributions to accurately model PSI values and their precision, to quantitatively and visually compare AS between groups of samples.

betAS allows the analysis of user-provided tables with AS quantifications, such as those obtained by vast-tools, rMATS or Whippet, ranking differentially spliced events by a significance metric that incorporates the compromise between the uncertainty in individual sample estimates and the variability among replicates.

Please note that this tutorial will provide an overview on how to use betAS R package for alternative splicing analyses. The cases explored in this tutorial were adapted from betAS’ original article:

Mariana Ascensão-Ferreira, Rita Martins-Silva, Nuno Saraiva-Agostinho and Nuno L. Barbosa-Morais (2023). betAS: intuitive analysis and visualisation of differential alternative splicing using beta distributionsbioRxiv

The results herein presented don’t correspond necessarily to the results presented in the manuscript. For a more detailed explanation on the methods used, please check the manuscript.

1 Installing and starting the program

Install betAS by typing the following in an R console:

devtools::install_github("DiseaseTranscriptomicsLab/betAS@dev")

After the installation, load betAS and other required packages by typing:

library(betAS)

Consider also loading the following libraries, for visualization purposes, by typing:

if(!require(ggplot2)) install.packages("ggplot2")
library(ggplot2)

if(!require(plotly)) install.packages("plotly")
library(plotly)

if(!require(dplyr)) install.packages("dplyr")
library(dplyr)

if(!require(cowplot)) install.packages("cowplot")  
library(cowplot)

2 Loading alternative splicing quantification data

2.1 Load user-provided files

To load data for differential alternative splicing analyses, we can use the getDataset function. betAS accepts files sourced from vast-tools, rMATS, or whippet. These files should contain junction reads information (e.g. results obtained from the vast-tools’ tidy module are not supported).

Based on the value assigned to the tool parameter, the pathTables parameter can take the following input types:

# Example to load data from vast-tools
dataset <- getDataset(pathTables="path/to/dataset/*INCLUSION_LEVELS_FULL*.tab", tool="vast-tools")

# Example to load data from rMATS
dataset <- getDataset(pathTables="path/to/dataset/*MATS.JC.txt", tool="rMATS")

# Example to load data from whippet
dataset <- getDataset(pathTables=list("path/to/sample1/*.psi.gz",
                                      "path/to/sample2/*.psi.gz",
                                      "path/to/sample3/*.psi.gz"), tool="whippet")

The getDataset function retrieves the inclusion tables specific to the chosen tool for further analyses. However, we also need to use the getEvents function in order to format inclusion tables into a standard format that is compatible with betAS workflow.

The output from getEvents is an object with 4 attributes:

tool <- "vast-tools" # example tool, can be one of the following: "vast-tools", "rMATS", "whippet"
dataset <- getEvents(dataset, tool = tool)

2.2 Load betAS default datasets

For users who may not have files to upload and just want to explore betAS functionalities, there’s also an option to work with built-in datasets. This dataset is a subset of the Deep transcriptional profiling of longitudinal changes during neurogenesis and network maturation in vivo public mouse dataset:

Run Cell Type Div
SRR645824 ESC DIV-8
SRR645826 ESC DIV-8
SRR645828 ESC DIV-8
SRR645830 ESC DIV-8
SRR645872 Neuron DIV28
SRR645875 Neuron DIV28
SRR645877 Neuron DIV28
SRR645879 Neuron DIV28

To use the pre-built datasets, simply don’t specify the file and choose the tool you prefer.

Take into account that the dataset provided by the three different tools might not be totally the same. The events provided greatly depend on the tool itself, the type of events considered (e.g. for rMATS only exon-skipping events were considered), among others. These datasets should be used for exploration of betAS features only.

# Example to load data from vast-tools
dataset <- getDataset(pathTables=NULL, tool="vast-tools")

# Example to load data from rMATS
dataset <- getDataset(pathTables=NULL, tool="rMATS")

# Example to load data from whippet
dataset <- getDataset(pathTables=NULL, tool="whippet")

For simplicity, in this tutorial we will be using a default dataset from vast-tools to illustrate such features. At this stage, the user can explore the raw / non-filtered data by browsing in the four objects inside dataset. It might be useful to check at this stage the overall results for a given gene / event of interest.

dataset <- getDataset(pathTables=NULL, tool="vast-tools")
dataset <- getEvents(dataset, tool = "vast-tools")
gene <- "Actn4"
head(dataset$PSI[dataset$PSI$GENE==gene,])
GENE EVENT COORD LENGTH FullCO COMPLEX SRR645824 SRR645826 SRR645828 SRR645830 SRR645872 SRR645875 SRR645877 SRR645879
13530 Actn4 MmuEX6009510 chr7:28901877-28902027 151 chr7:28904541,28901877-28902027,28898728 S 100.00 99.80 100.00 100.00 100.00 100.00 100.00 100.00
45339 Actn4 MmuEX0003637 chr7:28895432-28895512 81 chr7:28896089,28895432-28895512,28894659 C3 90.99 90.44 89.74 96.25 19.44 33.33 19.15 30.51
45340 Actn4 MmuEX0003638 chr7:28895121-28895180 60 chr7:28896089,28894911+28895121-28895180+28895186,28894659 C3 9.19 8.67 6.80 3.54 82.09 60.53 75.61 59.66
45341 Actn4 MmuEX0003639 chr7:28911490-28911575 86 chr7:28912240,28911490-28911575,28907064 C3 85.63 85.26 86.64 80.03 36.13 23.28 25.78 31.03
45342 Actn4 MmuEX0003640 chr7:28909903-28909988 86 chr7:28912240,28909903-28909988,28907064 C3 17.62 18.01 18.31 21.92 65.48 72.69 84.48 79.66
119010 Actn4 MmuEX6009503 chr7:28918672-28918791 120 chr7:28918954,28918672-28918791,28916246 S 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
event_of_interest <- "MmuEX0003638"
dataset$PSI[dataset$PSI$EVENT==event_of_interest,]
GENE EVENT COORD LENGTH FullCO COMPLEX SRR645824 SRR645826 SRR645828 SRR645830 SRR645872 SRR645875 SRR645877 SRR645879
45340 Actn4 MmuEX0003638 chr7:28895121-28895180 60 chr7:28896089,28894911+28895121-28895180+28895186,28894659 C3 9.19 8.67 6.8 3.54 82.09 60.53 75.61 59.66

3 Filtering datasets

betAS allows to filter the dataset based on alternative splicing event types, PSI value range to consider and minimum number of supporting reads.

3.1 Filter dataset based on alternative splicing event types and minimum number of supporting reads

betAS allows to filter event types. The supported event types are in accordance to the ones documented in the alternative splicing quantification tools supported by betAS:

To explore the type of events in a dataset and their respective number, we can access the EventsPerType field in the datasetvariable.

dataset$EventsPerType
## 
##  Alt3  Alt5   ANN    C1    C2    C3    IR   MIC     S 
## 85993 61115 24750  5699  5179  5896 97042  1038 63792

betAS also allows to filter events with less than a given number of junction reads (sum of inclusion and exclusion reads) in at least one sample. To do so, the user can specify the minimum number of reads to consider in the Nparameter.

In this tutorial, we will be considering only exon skipping events (C1, C2, C3, S & MIC events from vast-tools) with at least 10 junction read counts in all samples.

dataset_filtered <- filterEvents(dataset, types=c("C1", "C2", "C3", "S", "MIC"), N=10)

cat(paste0("Alternative events: ", nrow(dataset_filtered$PSI)))
## Alternative events: 195636

3.2 Filter dataset based on PSI value range

betAS allows to filter events based on their PSI value range (in percentage, i.e. PSI ∈ [0,100]). We will be keeping only alternative events, i.e. ~ 0 < PSI < 100 in all samples. If you wish to keep all of the events and don’t filter by PSI range, skip this step or put minPsi=0 and maxPsi=100.

dataset_filtered <- alternativeEvents(dataset_filtered, minPsi=1, maxPsi=99)

cat(paste0("Alternative events: ", nrow(dataset_filtered$PSI)))
## Alternative events: 1404

3.3 Plotting PSI distribution across samples

After filtering the data to only keep events of interest, we can have an overview of the PSIs on the samples being analysed.

bigPicturePlot <- bigPicturePlot(table = dataset_filtered$PSI)
bigPicturePlot + theme_minimal()

4 Group definition and management from automatically detected or user-defined sample list

To allow for alternative splicing analyses between groups of interest, the user must firstly specify to what groups the samples under analyses belong to.

To check the names of the samples being used, we can access the Samples attribute inside the dataset object.

dataset_filtered$Samples
## [1] "SRR645824" "SRR645826" "SRR645828" "SRR645830" "SRR645872" "SRR645875" "SRR645877" "SRR645879"

In order to ease group visualization, we can assign a color to each group. In this section we showcase how to assign a group to a given color, based on a manually defined palette. To automatically define color palettes of N colors the user might want to check Custom color palette for group definition section of this tutorial.

4.1 Automatic group definition based on sample name

We can define sample groups based on the name of the samples being used, assuming that the sample naming has some characteristics that allow to easily identify the group to which they belong to.

samples <- dataset_filtered$Samples 
random_colors <- c("#FF9AA2", "#FFB7B2", "#FFDAC1", "#E2F0CB", "#B5EAD7", "#C7CEEA", "#FBE2FD", "#D9ECFE")
# Automatic group definition
not_grouped   <- samples
checked       <- c()
groups        <- LETTERS[seq(1, length = min(length(names), 26))]
used_groups   <- groups
groupList <- list()

while((length(checked) < length(names)) & length(used_groups) <= 26){
  
  groupNames  <- agrep(pattern = not_grouped[1], x = not_grouped, value = TRUE)
  checked     <- c(checked, groupNames)
  not_grouped <- not_grouped[-c(match(groupNames, not_grouped))]
  
  # Assign new group
  currentNames <- names(groupList)
  groupList[[length(groupList)+1]] <- list(name = used_groups[1],
                                                   samples = groupNames,
                                                   color = random_colors[1])
  names(groupList) <- make.unique(c(currentNames, used_groups[1]))
  
  random_colors <- random_colors[-1]
  used_groups   <- used_groups[-1]
  
}

groupList
## $A
## $A$name
## [1] "A"
## 
## $A$samples
## [1] "SRR645824" "SRR645826" "SRR645828"
## 
## $A$color
## [1] "#FF9AA2"
# Visualize colors being used
slices <- rep(1, length(groupList))  # Equal-sized slices for each color
# Display the pie chart with colors
pie(slices, col = random_colors[1:length(groupList)], border = "black", labels=names(groupList), main = "Color palette for group definition")

4.2 Group definition based on user-defined list

We can also define groups of samples based on a given column of the metadata. We will be using betAS metadata for vast-tools built-in example to ease group creation:

data("VT2_metadata_mouse")
(metadata <- as.data.frame(VT2_metadata_mouse))
Run CellType Div
SRR645824 ESC DIV-8
SRR645826 ESC DIV-8
SRR645828 ESC DIV-8
SRR645830 ESC DIV-8
SRR645872 Neuron DIV28
SRR645875 Neuron DIV28
SRR645877 Neuron DIV28
SRR645879 Neuron DIV28

For this example we will use the cell type as a grouping variable, i.e. Neurons or Embrionic Stem Cell (ESC). This information can be accessed through the CellType column in the example metadata. The name of the samples to be used can be retrieved by accessing the Run column in the example metadata.

If using user-provided data, the user must also provide a metadata table, or ensure that the final result corresponds to groupList presented in this section.

# Define variable of the metadata table to be used as a grouping variable
groupingVariable <- "CellType"
# to get the unique values of the grouping variable. In this example, will be Neuron and ESC
groups <- unique(metadata[,groupingVariable])
# Define vector of sample names based on the example metadata 
samples <- metadata$Run
# Define colors for the two groups
random_colors <- c("#FF9AA2", "#FFB7B2")
groupList <- list()

for(i in 1:length(groups)){

  groupNames <- samples[which(metadata[,groupingVariable] == groups[i])]

  # Assign new group
  currentNames <- names(groupList)
  groupList[[length(groupList)+1]] <- list(name = groups[i],
                                           samples = groupNames,
                                           color = random_colors[i])
  names(groupList) <- make.unique(c(currentNames, groups[i]))

}

groupList
## $ESC
## $ESC$name
## [1] "ESC"
## 
## $ESC$samples
## [1] "SRR645824" "SRR645826" "SRR645828" "SRR645830"
## 
## $ESC$color
## [1] "#FF9AA2"
## 
## 
## $Neuron
## $Neuron$name
## [1] "Neuron"
## 
## $Neuron$samples
## [1] "SRR645872" "SRR645875" "SRR645877" "SRR645879"
## 
## $Neuron$color
## [1] "#FFB7B2"
# Visualize colors being used
slices <- rep(1, length(groups))  # Equal-sized slices for each color
# Display the pie chart with colors
pie(slices, col = random_colors[1:length(groups)], border = "black", labels=groups, main = "Color palette for group definition")

5 Differential alternative splicing between two groups

betAS approach for alternative splicing analyses takes into account two metrics for each alternative splicing event under analyses: the magnitude of the effect (𝚫PSI = PSIbetAS(group A) - PSIbetAS(group B)) and the statistical significance of such changes. For the latter, betAS introduces three approaches: probability of differential splicing (Pdiff), F-statistic or false positive rate (FPR).

5.1 Calculate statistical metrics for differential alternative splicing & visualize results

Firstly, we need to define the groups we will be analysing. This step is particularly important when we have more than one group defined, but want to compare only two of them.

# Define groups
groupA    <- "Neuron"
groupB    <- "ESC"

# Define samples inside each group
samplesA    <- groupList[[groupA]]$samples
samplesB    <- groupList[[groupB]]$samples

# Convert samples into indexes
colsGroupA    <- convertCols(dataset_filtered$PSI, samplesA)
colsGroupB    <- convertCols(dataset_filtered$PSI, samplesB)

betAS utilizes R’s rbeta and sample functions, introducing inherent uncertainty. Running betAS twice may yield different results, particularly near significance thresholds. Variability in outcomes can be informative, signaling cases where an event’s significance is on the cusp. However, to ensure reproducibility, users can set a seed, avoiding randomization of the emitted points. By default, the seed is set to TRUE, and will be used as such for the tutorial’s reproducibility.

Moreover, the user can choose to weight the number of points emitted for beta distributions based on the coverage of each sample. With large sample sizes, uncertainty and significance of differential splicing are mostly determined by biological variability among replicate samples. However, as stated in betAS manuscript, working with small sample sizes implies a good compromise between modelling the uncertainty in the estimation of inclusion levels of alternative splicing events in individual samples and accounting for that biological variability among replicates. As such, we consider that for not-so-small sample sizes and/or a knowledgeable user intending to tune the aforementioned compromise differently, size factors proportional to coverage can be useful. This parameter should be used with care, and by default it is set to FALSE.

5.1.1 Probability of differential splicing (Pdiff)

To calculate the Pdiff for all events in our dataset, we will use the prepareTableVolcano function. To visualize the results in a volcano plot, we will use the plotvolcanoPdiff function.

options(error=recover)
volcanoTable_Pdiff <- prepareTableVolcano(psitable = dataset_filtered$PSI,
                                    qualtable = dataset_filtered$Qual,
                                    npoints = 500,
                                    colsA = colsGroupA,
                                    colsB = colsGroupB,
                                    labA = groupA,
                                    labB = groupB,
                                    basalColor = "#89C0AE",
                                    interestColor = "#E69A9C",
                                    maxDevTable = maxDevSimulationN100, 
                                    seed=TRUE, 
                                    CoverageWeight = FALSE)

volcano_Pdiff <- plotVolcano(betasTable = volcanoTable_Pdiff,
                            labA = groupA,
                            labB = groupB,
                            basalColor = "#89C0AE",
                            interestColor = "#E69A9C") 

volcano_Pdiff

The above plot colors in pink and labels by default the events with a ∆PSI>0.1. However, if these are in high number, the labels might not appear. To make the plot interactive, we can use the ggplotly function.

# change size points manually
volcano_Pdiff$layers[[1]]$aes_params$size <- 2
volcano_Pdiff$layers[[2]]$aes_params$size <- 1

# Plot interactively
plotly_volcano_Pdiff <- ggplotly(volcano_Pdiff, width = 700, height = 500) %>%
                          layout(
                            font = list(size = 10),
                            xaxis = list(
                              title = list(font = list(size = 14)),  # Adjust as necessary
                              tickfont = list(size = 10)  # Adjust the tick font size here
                            ),
                            yaxis = list(
                              title = list(font = list(size = 14)),  # Adjust as necessary
                              tickfont = list(size = 10)  # Adjust the tick font size here
                            )
                          )

 
hover_text  <- paste(
  "Event: ", volcanoTable_Pdiff$EVENT,
  "<br>Pdiff: ", round(volcanoTable_Pdiff$Pdiff,3),
  "<br>Deltapsi: ", round(volcanoTable_Pdiff$deltapsi,3),
  sep = ""
)

 

plotly_volcano_Pdiff$x$data[[1]]$text <- hover_text # all points
plotly_volcano_Pdiff$x$data[[2]]$text <- NULL 
  
plotly_volcano_Pdiff

The results can also be manually inspected by browsing the volcanoTable_Pdiff table

head(volcanoTable_Pdiff[,c("GENE","EVENT","COORD","Pdiff","deltapsi")])
GENE EVENT COORD Pdiff deltapsi
37 Gmpr MmuEX0021295 chr13:45521127-45521219 0.9845 0.1392815
98 Hnrnpd MmuEX0023099 chr5:99962134-99962204 0.9050 -0.0222313
99 Hnrnpd MmuEX0023100 chr5:99963731-99963877 0.8880 -0.0442848
144 Fmr1 MmuEX0019414 chrX:68705861-68705923 1.0000 0.6229724
161 Chtop MmuEX0001048 chr3:90505395-90505478 0.5615 0.0032424
175 Cnnm3 MmuEX0011891 chr1:36524034-36524132 1.0000 -0.5776761

5.1.2 F-statistic

To calculate the F-statistic for all events in our dataset, we will use the prepareTableVolcanoFstat function. To visualize the results in a volcano plot, we will use the plotVolcanoFstat function. The user can also use the approach presented above to make the plot interactive.

volcanoTable_Fstat <- prepareTableVolcanoFstat(psitable = dataset_filtered$PSI,
                                    qualtable = dataset_filtered$Qual,
                                    npoints = 500,
                                    colsA = colsGroupA,
                                    colsB = colsGroupB,
                                    labA = groupA,
                                    labB = groupB,
                                    basalColor = "#89C0AE",
                                    interestColor = "#E69A9C",
                                    maxDevTable = maxDevSimulationN100, 
                                    seed=TRUE,
                                   CoverageWeight=F )

volcano_Fstat <- plotVolcanoFstat(betasTable = volcanoTable_Fstat,
                            labA = groupA,
                            labB = groupB,
                            basalColor = "#89C0AE",
                            interestColor = "#E69A9C") 

volcano_Fstat

head(volcanoTable_Fstat[,c("GENE","EVENT","COORD","Fstat","deltapsi")])
GENE EVENT COORD Fstat deltapsi
37 Gmpr MmuEX0021295 chr13:45521127-45521219 4.599441 0.1392815
98 Hnrnpd MmuEX0023099 chr5:99962134-99962204 2.554947 -0.0222313
99 Hnrnpd MmuEX0023100 chr5:99963731-99963877 2.120957 -0.0442848
144 Fmr1 MmuEX0019414 chrX:68705861-68705923 10.682045 0.6229724
161 Chtop MmuEX0001048 chr3:90505395-90505478 0.849660 0.0032424
175 Cnnm3 MmuEX0011891 chr1:36524034-36524132 10.675033 -0.5776761

5.1.3 False positive rate (FPR)

To calculate the FPR for all events in our dataset, we will use the prepareTableVolcanoFDR function. To visualize the results in a volcano plot, we will use the plotVolcanoFDR function. The user can also use the approach presented above to make the plot interactive.

volcanoTable_FDR <- prepareTableVolcanoFDR(psitable = dataset_filtered$PSI,
                                    qualtable = dataset_filtered$Qual,
                                    npoints = 500,
                                    colsA = colsGroupA,
                                    colsB = colsGroupB,
                                    labA = groupA,
                                    labB = groupB,
                                    basalColor = "#89C0AE",
                                    interestColor = "#E69A9C",
                                    maxDevTable = maxDevSimulationN100,
                                    nsim = 100, 
                                    seed=TRUE, 
                                    CoverageWeight = FALSE) 
        
volcano_FDR <- plotVolcanoFDR(betasTable = volcanoTable_FDR,
                            labA = groupA,
                            labB = groupB,
                            basalColor = "#89C0AE",
                            interestColor = "#E69A9C") 

volcano_FDR

head(volcanoTable_FDR[,c("GENE","EVENT","COORD","FDR","deltapsi")])
GENE EVENT COORD FDR deltapsi
37 Gmpr MmuEX0021295 chr13:45521127-45521219 0.00 0.1392815
98 Hnrnpd MmuEX0023099 chr5:99962134-99962204 0.02 -0.0222313
99 Hnrnpd MmuEX0023100 chr5:99963731-99963877 0.20 -0.0442848
144 Fmr1 MmuEX0019414 chrX:68705861-68705923 0.00 0.6229724
161 Chtop MmuEX0001048 chr3:90505395-90505478 0.83 0.0032424
175 Cnnm3 MmuEX0011891 chr1:36524034-36524132 0.00 -0.5776761

5.2 Visualize individual events

To illustrate betAS approach for an individual event with differential alternative splicing between Neurons and ESC, we will be using the event MmuEX0003638. This is an exon skipping event in the Actn4 gene.

eventID <- "MmuEX0003638"

We can visualize the beta distributions for this event for each sample.

# Auxiliary data to add increment based on the event coverage to avoid dividing by zero 
data("maxDevSimulationN100")

tdensities <- plotIndividualDensitiesList(eventID = eventID,
                                          npoints = 500,
                                          psitable = dataset$PSI,
                                          qualtable = dataset$Qual,
                                          groupList = groupList,
                                          maxDevTable = maxDevSimulationN100, 
                                          seed=TRUE, 
                                          CoverageWeight = FALSE)

tdensities + theme_minimal() + ggtitle(eventID)

We can also inspect manually the different statistics for this event, and plot such results.

subset(volcanoTable_Pdiff[,c("GENE","EVENT","COORD","Pdiff","deltapsi")], EVENT==eventID)
GENE EVENT COORD Pdiff deltapsi
45340 Actn4 MmuEX0003638 chr7:28895121-28895180 1 -0.618374
subset(volcanoTable_Fstat[,c("GENE","EVENT","COORD","Fstat","deltapsi")], EVENT==eventID)
GENE EVENT COORD Fstat deltapsi
45340 Actn4 MmuEX0003638 chr7:28895121-28895180 11.61165 -0.618374
subset(volcanoTable_FDR[,c("GENE","EVENT","COORD","FDR","deltapsi")], EVENT==eventID)
GENE EVENT COORD FDR deltapsi
45340 Actn4 MmuEX0003638 chr7:28895121-28895180 0 -0.618374
plotPdiff <- prepareTableEvent(eventID = eventID,
                               psitable = dataset$PSI,
                               qualtable = dataset$Qual,
                               npoints = 500,
                               colsA = colsGroupA,
                               colsB = colsGroupB,
                               labA = groupA,
                               labB = groupB,
                               basalColor = "#89C0AE",
                               interestColor = "#E69A9C",
                               maxDevTable = maxDevSimulationN100,
                               nsim = 1000, 
                               seed=T,
                                CoverageWeight=F) %>% 
  plotPDiffFromEventObjList()

plotFstat <- prepareTableEvent(eventID = eventID,
                               psitable = dataset$PSI,
                               qualtable = dataset$Qual,
                               npoints = 500,
                               colsA = colsGroupA,
                               colsB = colsGroupB,
                               labA = groupA,
                               labB = groupB,
                               basalColor = "#89C0AE",
                               interestColor = "#E69A9C",
                               maxDevTable = maxDevSimulationN100,
                               nsim = 1000,
                               seed=T,
                             CoverageWeight=F) %>% 
  plotFstatFromEventObjList()
 
plotFDR <- prepareTableEvent(eventID = eventID,
                               psitable = dataset$PSI,
                               qualtable = dataset$Qual,
                               npoints = 500,
                               colsA = colsGroupA,
                               colsB = colsGroupB,
                               labA = groupA,
                               labB = groupB,
                               basalColor = "#89C0AE",
                               interestColor = "#E69A9C",
                               maxDevTable = maxDevSimulationN100,
                               nsim = 1000,
                               seed=T,
                             CoverageWeight=F) %>% 
  plotFDRFromEventObjList()

plot_grid(plotPdiff,plotFstat,plotFDR, ncol=3)

6 Differential alternative splicing between multiple groups

The differential AS approach implemented by betAS can be applied to multiple (i.e., more than two) groups in a novel ANOVA-inspired way that extends the Pdiff definition to the comparison of the differences between samples belonging to different biological conditions to those found between replicates.

6.1 Dataset loading and filtering

To illustrate this, we applied multiple-group betAS to the analysis of AS differences in a subset of human transcriptomes of forebrain, hindbrain, heart, kidney, liver and testis. This dataset is a subset of the Human RNA-seq time-series of the development of seven major organs public human dataset.

# load data and prepare it for betAS
data("VT1_data_human")
dataset_multgroups <- getEvents(VT1_data_human, tool = "vast-tools")

# load metadata
data("VT1_metadata_human")
metadata_multgroups <- as.data.frame(VT1_metadata_human)
metadata_multgroups
Run age developmental_stage organism_part sex
ERR2598266 14 adolescent forebrain male
ERR2598267 16 adolescent forebrain male
ERR2598268 17 adolescent forebrain male
ERR2598269 13 adolescent forebrain male
ERR2598270 13 adolescent heart male
ERR2598271 13 adolescent hindbrain male
ERR2598272 16 adolescent hindbrain male
ERR2598273 14 adolescent hindbrain male
ERR2598274 17 adolescent hindbrain male
ERR2598275 17 adolescent liver male
ERR2598276 16 adolescent testis male
ERR2598277 17 adolescent testis male
ERR2598278 14 adolescent testis male
ERR2598279 14 adolescent testis male
ERR2598280 58 elderly forebrain male
ERR2598281 58 elderly forebrain male
ERR2598282 58 elderly hindbrain male
ERR2598283 58 elderly hindbrain male
ERR2598284 58 elderly liver male
ERR2598285 58 elderly liver male
ERR2598286 55 elderly testis male
ERR2598305 50 middle adult forebrain male
ERR2598306 53 middle adult forebrain male
ERR2598307 54 middle adult heart male
ERR2598308 53 middle adult hindbrain male
ERR2598309 50 middle adult hindbrain male
ERR2598310 55 middle adult liver male
ERR2598311 46 middle adult testis male
ERR2598346 29 young adult forebrain male
ERR2598347 29 young adult forebrain male
ERR2598348 28 young adult forebrain male
ERR2598349 39 young adult forebrain male
ERR2598350 39 young adult forebrain male
ERR2598351 25 young adult heart male
ERR2598352 29 young adult hindbrain male
ERR2598353 29 young adult hindbrain male
ERR2598354 32 young adult hindbrain female
ERR2598355 39 young adult hindbrain male
ERR2598356 39 young adult hindbrain male
ERR2598357 29 young adult liver male
ERR2598358 39 young adult liver male
ERR2598359 39 young adult liver male
ERR2598360 29 young adult testis male
ERR2598361 29 young adult testis male
ERR2598362 28 young adult testis male
ERR2598363 39 young adult testis male

For this dataset, we are considering a reasonable number of samples (46). As such, in cases like this, it might be interesting to be less conservative on the filtering step and choose to keep all events irrespectively of their PSI value. However, for the purpose of this tutorial, we will keep the criteria of filtering discussed above, for computational ease of this exercise. For simplicity, we will consider only exon-skipping events.

cat(paste0("Initial number of events: ", nrow(dataset_multgroups$PSI)))
## Initial number of events: 2231
dataset_multgroups_filtered <- filterEvents(dataset_multgroups, types=c("C1", "C2", "C3", "S", "MIC"), N=10)
cat(paste0("Alternative events (after event type and coverage filtering): ", nrow(dataset_multgroups_filtered$PSI)))
## Alternative events (after event type and coverage filtering): 2197
dataset_multgroups_filtered <- alternativeEvents(dataset_multgroups_filtered, minPsi=1, maxPsi=99)
cat(paste0("Alternative events (after PSI range filtering): ", nrow(dataset_multgroups_filtered$PSI)))
## Alternative events (after PSI range filtering): 117

6.2 Group definition

We will create groups based on the samples’ tissue of origin, encoded in the organism_part metadata’s column.

# Define variable of the metadata table to be used as a grouping variable
groupingVariable <- "organism_part"
# to get the unique values of the grouping variable. In this example, will be Neuron and ESC
groups <- unique(metadata_multgroups[,groupingVariable])
# Define vector of sample names based on the example metadata 
samples <- metadata_multgroups$Run
# Define colors for the two groups
random_colors <- c("#FF9AA2", "#FFB7B2", "#FFDAC1", "#E2F0CB", "#B5EAD7")
groupList <- list()

for(i in 1:length(groups)){

  groupNames <- samples[which(metadata_multgroups[,groupingVariable] == groups[i])]

  # Assign new group
  currentNames <- names(groupList)
  groupList[[length(groupList)+1]] <- list(name = groups[i],
                                           samples = groupNames,
                                           color = random_colors[i])
  names(groupList) <- make.unique(c(currentNames, groups[i]))

}

groupList
## $forebrain
## $forebrain$name
## [1] "forebrain"
## 
## $forebrain$samples
##  [1] "ERR2598266" "ERR2598267" "ERR2598268" "ERR2598269" "ERR2598280" "ERR2598281" "ERR2598305" "ERR2598306" "ERR2598346" "ERR2598347"
## [11] "ERR2598348" "ERR2598349" "ERR2598350"
## 
## $forebrain$color
## [1] "#FF9AA2"
## 
## 
## $heart
## $heart$name
## [1] "heart"
## 
## $heart$samples
## [1] "ERR2598270" "ERR2598307" "ERR2598351"
## 
## $heart$color
## [1] "#FFB7B2"
## 
## 
## $hindbrain
## $hindbrain$name
## [1] "hindbrain"
## 
## $hindbrain$samples
##  [1] "ERR2598271" "ERR2598272" "ERR2598273" "ERR2598274" "ERR2598282" "ERR2598283" "ERR2598308" "ERR2598309" "ERR2598352" "ERR2598353"
## [11] "ERR2598354" "ERR2598355" "ERR2598356"
## 
## $hindbrain$color
## [1] "#FFDAC1"
## 
## 
## $liver
## $liver$name
## [1] "liver"
## 
## $liver$samples
## [1] "ERR2598275" "ERR2598284" "ERR2598285" "ERR2598310" "ERR2598357" "ERR2598358" "ERR2598359"
## 
## $liver$color
## [1] "#E2F0CB"
## 
## 
## $testis
## $testis$name
## [1] "testis"
## 
## $testis$samples
##  [1] "ERR2598276" "ERR2598277" "ERR2598278" "ERR2598279" "ERR2598286" "ERR2598311" "ERR2598360" "ERR2598361" "ERR2598362" "ERR2598363"
## 
## $testis$color
## [1] "#B5EAD7"
# Visualize colors being used
slices <- rep(1, length(groups))  # Equal-sized slices for each color
# Display the pie chart with colors
pie(slices, col = random_colors[1:length(groups)], border = "black", labels=groups, main = "Color palette for group definition")

6.3 Differential alternative splicing across multiple groups/conditions

options(error=recover)
volcanoTable_multgroups <- prepareTableVolcanoMultipleGroups(psitable = dataset_multgroups_filtered$PSI,
                                                 qualtable = dataset_multgroups_filtered$Qual,
                                                 groupList = groupList,
                                                 npoints = 500,
                                                 maxDevTable = maxDevSimulationN100, seed=TRUE, CoverageWeight = F)

head(volcanoTable_multgroups[,c("GENE","EVENT","Fstat","Pdiff","Pzero","medianBetweens","deltaAbsolute")])
GENE EVENT Fstat Pdiff Pzero medianBetweens deltaAbsolute
1197 DAZAP1 HsaEX0018545 1.531845 0.6245004 0.6245004 0.0319597 0.0149702
1565 SMARCA2 HsaEX0060271 1.393718 0.5721423 0.5721423 0.0168873 0.0140413
1784 GNAS HsaEX0027841 7.605087 0.8578009 0.8578009 -0.0528636 0.2314300
2649 DNAJA3 HsaEX0020236 1.270586 0.5736325 0.5736325 0.0334410 0.0178975
2771 INTS10 HsaEX0031861 9.684621 0.8501499 0.8501499 -0.2672253 0.4973357
2790 SNRNP70 HsaEX0060707 2.220629 0.6369095 0.6369095 -0.0153788 0.1056033

6.3.1 Probability of differential splicing (Pdiff)

plotVolcano_MultipleGroups_Pdiff(betasTable = volcanoTable_multgroups)

6.3.2 F-statistic

plotVolcano_MultipleGroups_Fstat(betasTable = volcanoTable_multgroups)

6.4 Visualize individual events

eventID_mult <- "HsaEX0031861"
mult_violinplots <- plotIndividualViolinsList(eventID = eventID_mult,
                          npoints = 500,
                          psitable = dataset_multgroups_filtered$PSI,
                          qualtable = dataset_multgroups_filtered$Qual,
                          groupList = groupList,
                          maxDevTable = maxDevSimulationN100,
                          seed=TRUE,
                          CoverageWeight = F)

eventList <- prepareTableEventMultiple(eventID = eventID_mult,
                                       psitable = dataset_multgroups_filtered$PSI,
                                       qualtable = dataset_multgroups_filtered$Qual,
                                       groupList = groupList,
                                       npoints = 500,
                                       maxDevTable = maxDevSimulationN100,
                                       seed=TRUE,
                                        CoverageWeight = F)

fstat_mult <- plotFstatFromEventObjListMultiple(eventObjList = eventList)


plot_grid(fstat_mult,mult_violinplots, ncol=2, rel_widths = c(0.3, 1))

6.5 The monotony coefficient between multiple sequential groups

7 Supplementary Information

7.1 Custom color palette for group definition

In the provided examples, the color palette used for group definition was manually defined. If the user wishes to automatically generate a palette of Ncolors, the following code is advisable:

#install.packages("grDevices")
library(grDevices)

generate_pastel_colors <- function(N) {
  pastel_colors <- vector("list", length = N)
  
  for (i in 1:N) {
    # Generate random values for hue, chroma, and luminance
    hue <- runif(1, min = 0, max = 360)  # Adjust the range for a different initial hue
    chroma <- runif(1, min = 50, max = 80)  # Adjust these values for your desired pastel range
    luminance <- runif(1, min = 80, max = 90)  # Adjust these values for your desired pastel range
    
    # Create a pastel color using hcl() with the random values
    pastel_color <- hcl(hue, chroma, luminance)
    
    pastel_colors[[i]] <- pastel_color
  }
  
  return(pastel_colors)
}


# Generate 10 random pastel colors
N <- 10
random_pastel_colors <- generate_pastel_colors(N)

cat("HEX codes of generated pastel colors: ")
## HEX codes of generated pastel colors:
cat(unlist(random_pastel_colors))
## #60F6B9 #FFCF8C #CFCFFF #D3D182 #FFC5B8 #FFCE71 #6FECFB #DCE67C #FFB8CA #FFBDFF
# Prepare data for the pie chart
slices <- rep(1, N)  # Equal-sized slices for each color
# Display the pie chart with pastel colors
pie(slices, col = unlist(random_pastel_colors), border = "black", labels=unlist(random_pastel_colors), main = "Automatically generated color palette for group definition")