betAS is a user-friendly R package that allows intuitive analysis and visualisation of differential alternative splicing (AS) based on beta distributions.

Beta distributions are suitable to quantify inclusion proportions of alternative sequences, using RNA sequencing reads supporting their inclusion and exclusion as surrogates for the two distribution shape parameters. Each such beta distribution has the inclusion proportion as mean value and is narrower when the read coverage is higher, facilitating the interpretability of its precision when plotted. betAS uses beta distributions to accurately model PSI values and their precision, to quantitatively and visually compare AS between groups of samples.

betAS allows the analysis of user-provided tables with AS quantifications, such as those obtained by vast-tools, rMATS or Whippet, ranking differentially spliced events by a significance metric that incorporates the compromise between the uncertainty in individual sample estimates and the variability among replicates.

1 Installing and starting the program

Install betAS by typing the following in an R console:

devtools::install_github("DiseaseTranscriptomicsLab/betAS@dev")

After the installation, load betAS and other required packages by typing:

library(betAS)

Consider also loading the following libraries, for visualization purposes, by typing:

if(!require(ggplot2)) install.packages("ggplot2")
library(ggplot2)

if(!require(plotly)) install.packages("plotly")
library(plotly)

if(!require(dplyr)) install.packages("dplyr")
library(dplyr)

if(!require(cowplot)) install.packages("cowplot")  
library(cowplot)

2 Loading alternative splicing quantification data

2.1 Load user-provided files

To load data for differential alternative splicing analyses, we can use the getDataset function. betAS accepts files sourced from vast-tools, rMATS, or whippet. These files should contain junction reads information (e.g. results obtained from the vast-tools’ tidy module are not supported).

Based on the value assigned to the tool parameter, the pathTables parameter can take the following input types:

tool="vast-tools": Should be a path pointing to a *INCLUSION_LEVELS_FULL*.tab file.
tool="rMATS": Should be a path pointing to a *MATS.JC.txt file.
tool="whippet": Should be a list of paths (minimum of 2) pointing to *.psi.gz files.

# Example to load data from vast-tools
dataset <- getDataset(pathTables="path/to/dataset/*INCLUSION_LEVELS_FULL*.tab", tool="vast-tools")

# Example to load data from rMATS
dataset <- getDataset(pathTables="path/to/dataset/*MATS.JC.txt", tool="rMATS")

# Example to load data from whippet
dataset <- getDataset(pathTables=list("path/to/sample1/*.psi.gz",
                                      "path/to/sample2/*.psi.gz",
                                      "path/to/sample3/*.psi.gz"), tool="whippet")

The getDataset function retrieves the inclusion tables specific to the chosen tool for further analyses. However, we also need to use the getEvents function in order to format inclusion tables into a standard format that is compatible with betAS workflow.

The output from getEvents is an object with 4 attributes:

PSI: A data frame with information regarding the splicing events and their inclusion levels.
Qual: A data frame detailing the events and information related to the coverage associated with each event. This representation follows the vast-tools representation of inclusion (inc) and exclusion (exc) reads.
EventsPerType: A named table summarising the number of events per type.
Samples: Vector containing the names of the samples considered.

tool <- "vast-tools" # example tool, can be one of the following: "vast-tools", "rMATS", "whippet"
dataset <- getEvents(dataset, tool = tool)

2.2 Load betAS default datasets

For users who may not have files to upload and just want to explore betAS functionalities, there’s also an option to work with built-in datasets. This dataset is a subset of the Deep transcriptional profiling of longitudinal changes during neurogenesis and network maturation in vivo public mouse dataset:

Run	Cell Type	Div
SRR645824	ESC	DIV-8
SRR645826	ESC	DIV-8
SRR645828	ESC	DIV-8
SRR645830	ESC	DIV-8
SRR645872	Neuron	DIV28
SRR645875	Neuron	DIV28
SRR645877	Neuron	DIV28
SRR645879	Neuron	DIV28

To use the pre-built datasets, simply don’t specify the file and choose the tool you prefer.

Take into account that the dataset provided by the three different tools might not be totally the same. The events provided greatly depend on the tool itself, the type of events considered (e.g. for rMATS only exon-skipping events were considered), among others. These datasets should be used for exploration of betAS features only.

# Example to load data from vast-tools
dataset <- getDataset(pathTables=NULL, tool="vast-tools")

# Example to load data from rMATS
dataset <- getDataset(pathTables=NULL, tool="rMATS")

# Example to load data from whippet
dataset <- getDataset(pathTables=NULL, tool="whippet")

For simplicity, in this tutorial we will be using a default dataset from vast-tools to illustrate such features. At this stage, the user can explore the raw / non-filtered data by browsing in the four objects inside dataset. It might be useful to check at this stage the overall results for a given gene / event of interest.

dataset <- getDataset(pathTables=NULL, tool="vast-tools")
dataset <- getEvents(dataset, tool = "vast-tools")

gene <- "Actn4"
head(dataset$PSI[dataset$PSI$GENE==gene,])

	GENE	EVENT	COORD	LENGTH	FullCO	COMPLEX	SRR645824	SRR645826	SRR645828	SRR645830	SRR645872	SRR645875	SRR645877	SRR645879
13530	Actn4	MmuEX6009510	chr7:28901877-28902027	151	chr7:28904541,28901877-28902027,28898728	S	100.00	99.80	100.00	100.00	100.00	100.00	100.00	100.00
45339	Actn4	MmuEX0003637	chr7:28895432-28895512	81	chr7:28896089,28895432-28895512,28894659	C3	90.99	90.44	89.74	96.25	19.44	33.33	19.15	30.51
45340	Actn4	MmuEX0003638	chr7:28895121-28895180	60	chr7:28896089,28894911+28895121-28895180+28895186,28894659	C3	9.19	8.67	6.80	3.54	82.09	60.53	75.61	59.66
45341	Actn4	MmuEX0003639	chr7:28911490-28911575	86	chr7:28912240,28911490-28911575,28907064	C3	85.63	85.26	86.64	80.03	36.13	23.28	25.78	31.03
45342	Actn4	MmuEX0003640	chr7:28909903-28909988	86	chr7:28912240,28909903-28909988,28907064	C3	17.62	18.01	18.31	21.92	65.48	72.69	84.48	79.66
119010	Actn4	MmuEX6009503	chr7:28918672-28918791	120	chr7:28918954,28918672-28918791,28916246	S	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00

event_of_interest <- "MmuEX0003638"
dataset$PSI[dataset$PSI$EVENT==event_of_interest,]

	GENE	EVENT	COORD	LENGTH	FullCO	COMPLEX	SRR645824	SRR645826	SRR645828	SRR645830	SRR645872	SRR645875	SRR645877	SRR645879
45340	Actn4	MmuEX0003638	chr7:28895121-28895180	60	chr7:28896089,28894911+28895121-28895180+28895186,28894659	C3	9.19	8.67	6.8	3.54	82.09	60.53	75.61	59.66

3 Filtering datasets

betAS allows to filter the dataset based on alternative splicing event types, PSI value range to consider and minimum number of supporting reads.

3.1 Filter dataset based on alternative splicing event types and minimum number of supporting reads

betAS allows to filter event types. The supported event types are in accordance to the ones documented in the alternative splicing quantification tools supported by betAS:

For vast-tools, valid types include Alt3, Alt5, ANN, C1, C2, C3, IR-C, IR-S, MIC and S as described in vast-tools documentation.
For Whippet, valid types include CE, AA, AD, IR, TS, TE, AF, AL and BS as described in whippet documentation.
For rMATS, as each output file from this tool considers only one event, the types parameter should be set to NULL

To explore the type of events in a dataset and their respective number, we can access the EventsPerType field in the datasetvariable.

dataset$EventsPerType

## 
##  Alt3  Alt5   ANN    C1    C2    C3    IR   MIC     S 
## 85993 61115 24750  5699  5179  5896 97042  1038 63792

betAS also allows to filter events with less than a given number of junction reads (sum of inclusion and exclusion reads) in at least one sample. To do so, the user can specify the minimum number of reads to consider in the Nparameter.

In this tutorial, we will be considering only exon skipping events (C1, C2, C3, S & MIC events from vast-tools) with at least 10 junction read counts in all samples.

dataset_filtered <- filterEvents(dataset, types=c("C1", "C2", "C3", "S", "MIC"), N=10)

cat(paste0("Alternative events: ", nrow(dataset_filtered$PSI)))

## Alternative events: 195636

3.2 Filter dataset based on PSI value range

betAS allows to filter events based on their PSI value range (in percentage, i.e. PSI ∈ [0,100]). We will be keeping only alternative events, i.e. ~ 0 < PSI < 100 in all samples. If you wish to keep all of the events and don’t filter by PSI range, skip this step or put minPsi=0 and maxPsi=100.

dataset_filtered <- alternativeEvents(dataset_filtered, minPsi=1, maxPsi=99)

cat(paste0("Alternative events: ", nrow(dataset_filtered$PSI)))

## Alternative events: 1404

3.3 Plotting PSI distribution across samples

After filtering the data to only keep events of interest, we can have an overview of the PSIs on the samples being analysed.

bigPicturePlot <- bigPicturePlot(table = dataset_filtered$PSI)
bigPicturePlot + theme_minimal()

4 Group definition and management from automatically detected or user-defined sample list

To allow for alternative splicing analyses between groups of interest, the user must firstly specify to what groups the samples under analyses belong to.

To check the names of the samples being used, we can access the Samples attribute inside the dataset object.

dataset_filtered$Samples

## [1] "SRR645824" "SRR645826" "SRR645828" "SRR645830" "SRR645872" "SRR645875" "SRR645877" "SRR645879"

In order to ease group visualization, we can assign a color to each group. In this section we showcase how to assign a group to a given color, based on a manually defined palette. To automatically define color palettes of N colors the user might want to check Custom color palette for group definition section of this tutorial.

4.1 Automatic group definition based on sample name

We can define sample groups based on the name of the samples being used, assuming that the sample naming has some characteristics that allow to easily identify the group to which they belong to.

samples <- dataset_filtered$Samples 
random_colors <- c("#FF9AA2", "#FFB7B2", "#FFDAC1", "#E2F0CB", "#B5EAD7", "#C7CEEA", "#FBE2FD", "#D9ECFE")

# Automatic group definition
not_grouped   <- samples
checked       <- c()
groups        <- LETTERS[seq(1, length = min(length(names), 26))]
used_groups   <- groups
groupList <- list()

while((length(checked) < length(names)) & length(used_groups) <= 26){
  
  groupNames  <- agrep(pattern = not_grouped[1], x = not_grouped, value = TRUE)
  checked     <- c(checked, groupNames)
  not_grouped <- not_grouped[-c(match(groupNames, not_grouped))]
  
  # Assign new group
  currentNames <- names(groupList)
  groupList[[length(groupList)+1]] <- list(name = used_groups[1],
                                                   samples = groupNames,
                                                   color = random_colors[1])
  names(groupList) <- make.unique(c(currentNames, used_groups[1]))
  
  random_colors <- random_colors[-1]
  used_groups   <- used_groups[-1]
  
}

groupList

## $A
## $A$name
## [1] "A"
## 
## $A$samples
## [1] "SRR645824" "SRR645826" "SRR645828"
## 
## $A$color
## [1] "#FF9AA2"

# Visualize colors being used
slices <- rep(1, length(groupList))  # Equal-sized slices for each color
# Display the pie chart with colors
pie(slices, col = random_colors[1:length(groupList)], border = "black", labels=names(groupList), main = "Color palette for group definition")

4.2 Group definition based on user-defined list

We can also define groups of samples based on a given column of the metadata. We will be using betAS metadata for vast-tools built-in example to ease group creation:

data("VT2_metadata_mouse")
(metadata <- as.data.frame(VT2_metadata_mouse))

Run	CellType	Div
SRR645824	ESC	DIV-8
SRR645826	ESC	DIV-8
SRR645828	ESC	DIV-8
SRR645830	ESC	DIV-8
SRR645872	Neuron	DIV28
SRR645875	Neuron	DIV28
SRR645877	Neuron	DIV28
SRR645879	Neuron	DIV28

For this example we will use the cell type as a grouping variable, i.e. Neurons or Embrionic Stem Cell (ESC). This information can be accessed through the CellType column in the example metadata. The name of the samples to be used can be retrieved by accessing the Run column in the example metadata.

If using user-provided data, the user must also provide a metadata table, or ensure that the final result corresponds to groupList presented in this section.

# Define variable of the metadata table to be used as a grouping variable
groupingVariable <- "CellType"
# to get the unique values of the grouping variable. In this example, will be Neuron and ESC
groups <- unique(metadata[,groupingVariable])
# Define vector of sample names based on the example metadata 
samples <- metadata$Run
# Define colors for the two groups
random_colors <- c("#FF9AA2", "#FFB7B2")

groupList <- list()

for(i in 1:length(groups)){

  groupNames <- samples[which(metadata[,groupingVariable] == groups[i])]

  # Assign new group
  currentNames <- names(groupList)
  groupList[[length(groupList)+1]] <- list(name = groups[i],
                                           samples = groupNames,
                                           color = random_colors[i])
  names(groupList) <- make.unique(c(currentNames, groups[i]))

}

groupList

## $ESC
## $ESC$name
## [1] "ESC"
## 
## $ESC$samples
## [1] "SRR645824" "SRR645826" "SRR645828" "SRR645830"
## 
## $ESC$color
## [1] "#FF9AA2"
## 
## 
## $Neuron
## $Neuron$name
## [1] "Neuron"
## 
## $Neuron$samples
## [1] "SRR645872" "SRR645875" "SRR645877" "SRR645879"
## 
## $Neuron$color
## [1] "#FFB7B2"

# Visualize colors being used
slices <- rep(1, length(groups))  # Equal-sized slices for each color
# Display the pie chart with colors
pie(slices, col = random_colors[1:length(groups)], border = "black", labels=groups, main = "Color palette for group definition")

5 Differential alternative splicing between two groups

betAS approach for alternative splicing analyses takes into account two metrics for each alternative splicing event under analyses: the magnitude of the effect (𝚫PSI = PSI_betAS(group A) - PSI_betAS(group B)) and the statistical significance of such changes. For the latter, betAS introduces three approaches: probability of differential splicing (Pdiff), F-statistic or false positive rate (FPR).

Pdiff: This approach takes the two sets of random points per condition and calculates, for each AS event’s estimated ∆PSI, the proportion of differences between these that are greater than zero, which has the same interpretation as asking what proportion of beta distribution-emitted values for one condition are higher than those emitted for the other, thus reflecting the probability of differential AS of PSI_betAS(group A) being greater than PSI_betAS(group B).
F-statistic: betAS also enables an ANOVA-like analysis of variance, comparing inter- and intra-group variabilities. For each event, within is considered the set of differences between each pair of samples that are part of the same group and between the set of differences between each pair of groups. The ratio of the median absolute values of between and within therefore provides an “F-like” statistic. This metric provides a compromise between the effect size of AS differences and their significance.
FPR: Random generation of points from a beta distribution is used to estimate the null distribution’s PSI and its precision. Then, one point is randomly selected from each of the sample’s null distribution and, keeping the samples’ group assignment (i.e., which samples belong to each group), the ∆PSI between groups under the null hypothesis is calculated. The process is repeated many times (10 000 by default) and the FPR is the proportion of ∆PSI random simulations that are larger than (i.e., more extreme) or equal to the empirical ∆PSI.

5.1 Calculate statistical metrics for differential alternative splicing & visualize results

Firstly, we need to define the groups we will be analysing. This step is particularly important when we have more than one group defined, but want to compare only two of them.

# Define groups
groupA    <- "Neuron"
groupB    <- "ESC"

# Define samples inside each group
samplesA    <- groupList[[groupA]]$samples
samplesB    <- groupList[[groupB]]$samples

# Convert samples into indexes
colsGroupA    <- convertCols(dataset_filtered$PSI, samplesA)
colsGroupB    <- convertCols(dataset_filtered$PSI, samplesB)

betAS utilizes R’s rbeta and sample functions, introducing inherent uncertainty. Running betAS twice may yield different results, particularly near significance thresholds. Variability in outcomes can be informative, signaling cases where an event’s significance is on the cusp. However, to ensure reproducibility, users can set a seed, avoiding randomization of the emitted points. By default, the seed is set to TRUE, and will be used as such for the tutorial’s reproducibility.

Moreover, the user can choose to weight the number of points emitted for beta distributions based on the coverage of each sample. With large sample sizes, uncertainty and significance of differential splicing are mostly determined by biological variability among replicate samples. However, as stated in betAS manuscript, working with small sample sizes implies a good compromise between modelling the uncertainty in the estimation of inclusion levels of alternative splicing events in individual samples and accounting for that biological variability among replicates. As such, we consider that for not-so-small sample sizes and/or a knowledgeable user intending to tune the aforementioned compromise differently, size factors proportional to coverage can be useful. This parameter should be used with care, and by default it is set to FALSE.

5.1.1 Probability of differential splicing (Pdiff)

To calculate the Pdiff for all events in our dataset, we will use the prepareTableVolcano function. To visualize the results in a volcano plot, we will use the plotvolcanoPdiff function.

options(error=recover)
volcanoTable_Pdiff <- prepareTableVolcano(psitable = dataset_filtered$PSI,
                                    qualtable = dataset_filtered$Qual,
                                    npoints = 500,
                                    colsA = colsGroupA,
                                    colsB = colsGroupB,
                                    labA = groupA,
                                    labB = groupB,
                                    basalColor = "#89C0AE",
                                    interestColor = "#E69A9C",
                                    maxDevTable = maxDevSimulationN100, 
                                    seed=TRUE, 
                                    CoverageWeight = FALSE)

volcano_Pdiff <- plotVolcano(betasTable = volcanoTable_Pdiff,
                            labA = groupA,
                            labB = groupB,
                            basalColor = "#89C0AE",
                            interestColor = "#E69A9C") 

volcano_Pdiff

The above plot colors in pink and labels by default the events with a ∆PSI>0.1. However, if these are in high number, the labels might not appear. To make the plot interactive, we can use the ggplotly function.

# change size points manually
volcano_Pdiff$layers[[1]]$aes_params$size <- 2
volcano_Pdiff$layers[[2]]$aes_params$size <- 1

# Plot interactively
plotly_volcano_Pdiff <- ggplotly(volcano_Pdiff, width = 700, height = 500) %>%
                          layout(
                            font = list(size = 10),
                            xaxis = list(
                              title = list(font = list(size = 14)),  # Adjust as necessary
                              tickfont = list(size = 10)  # Adjust the tick font size here
                            ),
                            yaxis = list(
                              title = list(font = list(size = 14)),  # Adjust as necessary
                              tickfont = list(size = 10)  # Adjust the tick font size here
                            )
                          )

 
hover_text  <- paste(
  "Event: ", volcanoTable_Pdiff$EVENT,
  "<br>Pdiff: ", round(volcanoTable_Pdiff$Pdiff,3),
  "<br>Deltapsi: ", round(volcanoTable_Pdiff$deltapsi,3),
  sep = ""
)

 

plotly_volcano_Pdiff$x$data[[1]]$text <- hover_text # all points
plotly_volcano_Pdiff$x$data[[2]]$text <- NULL 
  
plotly_volcano_Pdiff

The results can also be manually inspected by browsing the volcanoTable_Pdiff table

head(volcanoTable_Pdiff[,c("GENE","EVENT","COORD","Pdiff","deltapsi")])

	GENE	EVENT	COORD	Pdiff	deltapsi
37	Gmpr	MmuEX0021295	chr13:45521127-45521219	0.9845	0.1392815
98	Hnrnpd	MmuEX0023099	chr5:99962134-99962204	0.9050	-0.0222313
99	Hnrnpd	MmuEX0023100	chr5:99963731-99963877	0.8880	-0.0442848
144	Fmr1	MmuEX0019414	chrX:68705861-68705923	1.0000	0.6229724
161	Chtop	MmuEX0001048	chr3:90505395-90505478	0.5615	0.0032424
175	Cnnm3	MmuEX0011891	chr1:36524034-36524132	1.0000	-0.5776761

5.1.2 F-statistic

To calculate the F-statistic for all events in our dataset, we will use the prepareTableVolcanoFstat function. To visualize the results in a volcano plot, we will use the plotVolcanoFstat function. The user can also use the approach presented above to make the plot interactive.

volcanoTable_Fstat <- prepareTableVolcanoFstat(psitable = dataset_filtered$PSI,
                                    qualtable = dataset_filtered$Qual,
                                    npoints = 500,
                                    colsA = colsGroupA,
                                    colsB = colsGroupB,
                                    labA = groupA,
                                    labB = groupB,
                                    basalColor = "#89C0AE",
                                    interestColor = "#E69A9C",
                                    maxDevTable = maxDevSimulationN100, 
                                    seed=TRUE,
                                   CoverageWeight=F )

volcano_Fstat <- plotVolcanoFstat(betasTable = volcanoTable_Fstat,
                            labA = groupA,
                            labB = groupB,
                            basalColor = "#89C0AE",
                            interestColor = "#E69A9C") 

volcano_Fstat

head(volcanoTable_Fstat[,c("GENE","EVENT","COORD","Fstat","deltapsi")])

	GENE	EVENT	COORD	Fstat	deltapsi
37	Gmpr	MmuEX0021295	chr13:45521127-45521219	4.599441	0.1392815
98	Hnrnpd	MmuEX0023099	chr5:99962134-99962204	2.554947	-0.0222313
99	Hnrnpd	MmuEX0023100	chr5:99963731-99963877	2.120957	-0.0442848
144	Fmr1	MmuEX0019414	chrX:68705861-68705923	10.682045	0.6229724
161	Chtop	MmuEX0001048	chr3:90505395-90505478	0.849660	0.0032424
175	Cnnm3	MmuEX0011891	chr1:36524034-36524132	10.675033	-0.5776761

5.1.3 False positive rate (FPR)

To calculate the FPR for all events in our dataset, we will use the prepareTableVolcanoFDR function. To visualize the results in a volcano plot, we will use the plotVolcanoFDR function. The user can also use the approach presented above to make the plot interactive.

volcanoTable_FDR <- prepareTableVolcanoFDR(psitable = dataset_filtered$PSI,
                                    qualtable = dataset_filtered$Qual,
                                    npoints = 500,
                                    colsA = colsGroupA,
                                    colsB = colsGroupB,
                                    labA = groupA,
                                    labB = groupB,
                                    basalColor = "#89C0AE",
                                    interestColor = "#E69A9C",
                                    maxDevTable = maxDevSimulationN100,
                                    nsim = 100, 
                                    seed=TRUE, 
                                    CoverageWeight = FALSE) 
        
volcano_FDR <- plotVolcanoFDR(betasTable = volcanoTable_FDR,
                            labA = groupA,
                            labB = groupB,
                            basalColor = "#89C0AE",
                            interestColor = "#E69A9C") 

volcano_FDR

head(volcanoTable_FDR[,c("GENE","EVENT","COORD","FDR","deltapsi")])

	GENE	EVENT	COORD	FDR	deltapsi
37	Gmpr	MmuEX0021295	chr13:45521127-45521219	0.00	0.1392815
98	Hnrnpd	MmuEX0023099	chr5:99962134-99962204	0.02	-0.0222313
99	Hnrnpd	MmuEX0023100	chr5:99963731-99963877	0.20	-0.0442848
144	Fmr1	MmuEX0019414	chrX:68705861-68705923	0.00	0.6229724
161	Chtop	MmuEX0001048	chr3:90505395-90505478	0.83	0.0032424
175	Cnnm3	MmuEX0011891	chr1:36524034-36524132	0.00	-0.5776761

5.2 Visualize individual events

To illustrate betAS approach for an individual event with differential alternative splicing between Neurons and ESC, we will be using the event MmuEX0003638. This is an exon skipping event in the Actn4 gene.

eventID <- "MmuEX0003638"

We can visualize the beta distributions for this event for each sample.

# Auxiliary data to add increment based on the event coverage to avoid dividing by zero 
data("maxDevSimulationN100")

tdensities <- plotIndividualDensitiesList(eventID = eventID,
                                          npoints = 500,
                                          psitable = dataset$PSI,
                                          qualtable = dataset$Qual,
                                          groupList = groupList,
                                          maxDevTable = maxDevSimulationN100, 
                                          seed=TRUE, 
                                          CoverageWeight = FALSE)

tdensities + theme_minimal() + ggtitle(eventID)

We can also inspect manually the different statistics for this event, and plot such results.

subset(volcanoTable_Pdiff[,c("GENE","EVENT","COORD","Pdiff","deltapsi")], EVENT==eventID)

	GENE	EVENT	COORD	Pdiff	deltapsi
45340	Actn4	MmuEX0003638	chr7:28895121-28895180	1	-0.618374

subset(volcanoTable_Fstat[,c("GENE","EVENT","COORD","Fstat","deltapsi")], EVENT==eventID)

	GENE	EVENT	COORD	Fstat	deltapsi
45340	Actn4	MmuEX0003638	chr7:28895121-28895180	11.61165	-0.618374

subset(volcanoTable_FDR[,c("GENE","EVENT","COORD","FDR","deltapsi")], EVENT==eventID)

	GENE	EVENT	COORD	FDR	deltapsi
45340	Actn4	MmuEX0003638	chr7:28895121-28895180	0	-0.618374

plotPdiff <- prepareTableEvent(eventID = eventID,
                               psitable = dataset$PSI,
                               qualtable = dataset$Qual,
                               npoints = 500,
                               colsA = colsGroupA,
                               colsB = colsGroupB,
                               labA = groupA,
                               labB = groupB,
                               basalColor = "#89C0AE",
                               interestColor = "#E69A9C",
                               maxDevTable = maxDevSimulationN100,
                               nsim = 1000, 
                               seed=T,
                                CoverageWeight=F) %>% 
  plotPDiffFromEventObjList()

plotFstat <- prepareTableEvent(eventID = eventID,
                               psitable = dataset$PSI,
                               qualtable = dataset$Qual,
                               npoints = 500,
                               colsA = colsGroupA,
                               colsB = colsGroupB,
                               labA = groupA,
                               labB = groupB,
                               basalColor = "#89C0AE",
                               interestColor = "#E69A9C",
                               maxDevTable = maxDevSimulationN100,
                               nsim = 1000,
                               seed=T,
                             CoverageWeight=F) %>% 
  plotFstatFromEventObjList()
 
plotFDR <- prepareTableEvent(eventID = eventID,
                               psitable = dataset$PSI,
                               qualtable = dataset$Qual,
                               npoints = 500,
                               colsA = colsGroupA,
                               colsB = colsGroupB,
                               labA = groupA,
                               labB = groupB,
                               basalColor = "#89C0AE",
                               interestColor = "#E69A9C",
                               maxDevTable = maxDevSimulationN100,
                               nsim = 1000,
                               seed=T,
                             CoverageWeight=F) %>% 
  plotFDRFromEventObjList()

plot_grid(plotPdiff,plotFstat,plotFDR, ncol=3)

6 Differential alternative splicing between multiple groups

The differential AS approach implemented by betAS can be applied to multiple (i.e., more than two) groups in a novel ANOVA-inspired way that extends the Pdiff definition to the comparison of the differences between samples belonging to different biological conditions to those found between replicates.

6.1 Dataset loading and filtering

To illustrate this, we applied multiple-group betAS to the analysis of AS differences in a subset of human transcriptomes of forebrain, hindbrain, heart, kidney, liver and testis. This dataset is a subset of the Human RNA-seq time-series of the development of seven major organs public human dataset.

# load data and prepare it for betAS
data("VT1_data_human")
dataset_multgroups <- getEvents(VT1_data_human, tool = "vast-tools")

# load metadata
data("VT1_metadata_human")
metadata_multgroups <- as.data.frame(VT1_metadata_human)
metadata_multgroups

Run	age	developmental_stage	organism_part	sex
ERR2598266	14	adolescent	forebrain	male
ERR2598267	16	adolescent	forebrain	male
ERR2598268	17	adolescent	forebrain	male
ERR2598269	13	adolescent	forebrain	male
ERR2598270	13	adolescent	heart	male
ERR2598271	13	adolescent	hindbrain	male
ERR2598272	16	adolescent	hindbrain	male
ERR2598273	14	adolescent	hindbrain	male
ERR2598274	17	adolescent	hindbrain	male
ERR2598275	17	adolescent	liver	male
ERR2598276	16	adolescent	testis	male
ERR2598277	17	adolescent	testis	male
ERR2598278	14	adolescent	testis	male
ERR2598279	14	adolescent	testis	male
ERR2598280	58	elderly	forebrain	male
ERR2598281	58	elderly	forebrain	male
ERR2598282	58	elderly	hindbrain	male
ERR2598283	58	elderly	hindbrain	male
ERR2598284	58	elderly	liver	male
ERR2598285	58	elderly	liver	male
ERR2598286	55	elderly	testis	male
ERR2598305	50	middle adult	forebrain	male
ERR2598306	53	middle adult	forebrain	male
ERR2598307	54	middle adult	heart	male
ERR2598308	53	middle adult	hindbrain	male
ERR2598309	50	middle adult	hindbrain	male
ERR2598310	55	middle adult	liver	male
ERR2598311	46	middle adult	testis	male
ERR2598346	29	young adult	forebrain	male
ERR2598347	29	young adult	forebrain	male
ERR2598348	28	young adult	forebrain	male
ERR2598349	39	young adult	forebrain	male
ERR2598350	39	young adult	forebrain	male
ERR2598351	25	young adult	heart	male
ERR2598352	29	young adult	hindbrain	male
ERR2598353	29	young adult	hindbrain	male
ERR2598354	32	young adult	hindbrain	female
ERR2598355	39	young adult	hindbrain	male
ERR2598356	39	young adult	hindbrain	male
ERR2598357	29	young adult	liver	male
ERR2598358	39	young adult	liver	male
ERR2598359	39	young adult	liver	male
ERR2598360	29	young adult	testis	male
ERR2598361	29	young adult	testis	male
ERR2598362	28	young adult	testis	male
ERR2598363	39	young adult	testis	male

For this dataset, we are considering a reasonable number of samples (46). As such, in cases like this, it might be interesting to be less conservative on the filtering step and choose to keep all events irrespectively of their PSI value. However, for the purpose of this tutorial, we will keep the criteria of filtering discussed above, for computational ease of this exercise. For simplicity, we will consider only exon-skipping events.

cat(paste0("Initial number of events: ", nrow(dataset_multgroups$PSI)))

## Initial number of events: 2231

dataset_multgroups_filtered <- filterEvents(dataset_multgroups, types=c("C1", "C2", "C3", "S", "MIC"), N=10)
cat(paste0("Alternative events (after event type and coverage filtering): ", nrow(dataset_multgroups_filtered$PSI)))

## Alternative events (after event type and coverage filtering): 2197

dataset_multgroups_filtered <- alternativeEvents(dataset_multgroups_filtered, minPsi=1, maxPsi=99)
cat(paste0("Alternative events (after PSI range filtering): ", nrow(dataset_multgroups_filtered$PSI)))

## Alternative events (after PSI range filtering): 117

6.2 Group definition

We will create groups based on the samples’ tissue of origin, encoded in the organism_part metadata’s column.

# Define variable of the metadata table to be used as a grouping variable
groupingVariable <- "organism_part"
# to get the unique values of the grouping variable. In this example, will be Neuron and ESC
groups <- unique(metadata_multgroups[,groupingVariable])
# Define vector of sample names based on the example metadata 
samples <- metadata_multgroups$Run
# Define colors for the two groups
random_colors <- c("#FF9AA2", "#FFB7B2", "#FFDAC1", "#E2F0CB", "#B5EAD7")

groupList <- list()

for(i in 1:length(groups)){

  groupNames <- samples[which(metadata_multgroups[,groupingVariable] == groups[i])]

  # Assign new group
  currentNames <- names(groupList)
  groupList[[length(groupList)+1]] <- list(name = groups[i],
                                           samples = groupNames,
                                           color = random_colors[i])
  names(groupList) <- make.unique(c(currentNames, groups[i]))

}

groupList

## $forebrain
## $forebrain$name
## [1] "forebrain"
## 
## $forebrain$samples
##  [1] "ERR2598266" "ERR2598267" "ERR2598268" "ERR2598269" "ERR2598280" "ERR2598281" "ERR2598305" "ERR2598306" "ERR2598346" "ERR2598347"
## [11] "ERR2598348" "ERR2598349" "ERR2598350"
## 
## $forebrain$color
## [1] "#FF9AA2"
## 
## 
## $heart
## $heart$name
## [1] "heart"
## 
## $heart$samples
## [1] "ERR2598270" "ERR2598307" "ERR2598351"
## 
## $heart$color
## [1] "#FFB7B2"
## 
## 
## $hindbrain
## $hindbrain$name
## [1] "hindbrain"
## 
## $hindbrain$samples
##  [1] "ERR2598271" "ERR2598272" "ERR2598273" "ERR2598274" "ERR2598282" "ERR2598283" "ERR2598308" "ERR2598309" "ERR2598352" "ERR2598353"
## [11] "ERR2598354" "ERR2598355" "ERR2598356"
## 
## $hindbrain$color
## [1] "#FFDAC1"
## 
## 
## $liver
## $liver$name
## [1] "liver"
## 
## $liver$samples
## [1] "ERR2598275" "ERR2598284" "ERR2598285" "ERR2598310" "ERR2598357" "ERR2598358" "ERR2598359"
## 
## $liver$color
## [1] "#E2F0CB"
## 
## 
## $testis
## $testis$name
## [1] "testis"
## 
## $testis$samples
##  [1] "ERR2598276" "ERR2598277" "ERR2598278" "ERR2598279" "ERR2598286" "ERR2598311" "ERR2598360" "ERR2598361" "ERR2598362" "ERR2598363"
## 
## $testis$color
## [1] "#B5EAD7"

# Visualize colors being used
slices <- rep(1, length(groups))  # Equal-sized slices for each color
# Display the pie chart with colors
pie(slices, col = random_colors[1:length(groups)], border = "black", labels=groups, main = "Color palette for group definition")

6.3 Differential alternative splicing across multiple groups/conditions

options(error=recover)
volcanoTable_multgroups <- prepareTableVolcanoMultipleGroups(psitable = dataset_multgroups_filtered$PSI,
                                                 qualtable = dataset_multgroups_filtered$Qual,
                                                 groupList = groupList,
                                                 npoints = 500,
                                                 maxDevTable = maxDevSimulationN100, seed=TRUE, CoverageWeight = F)

head(volcanoTable_multgroups[,c("GENE","EVENT","Fstat","Pdiff","Pzero","medianBetweens","deltaAbsolute")])

	GENE	EVENT	Fstat	Pdiff	Pzero	medianBetweens	deltaAbsolute
1197	DAZAP1	HsaEX0018545	1.531845	0.6245004	0.6245004	0.0319597	0.0149702
1565	SMARCA2	HsaEX0060271	1.393718	0.5721423	0.5721423	0.0168873	0.0140413
1784	GNAS	HsaEX0027841	7.605087	0.8578009	0.8578009	-0.0528636	0.2314300
2649	DNAJA3	HsaEX0020236	1.270586	0.5736325	0.5736325	0.0334410	0.0178975
2771	INTS10	HsaEX0031861	9.684621	0.8501499	0.8501499	-0.2672253	0.4973357
2790	SNRNP70	HsaEX0060707	2.220629	0.6369095	0.6369095	-0.0153788	0.1056033

6.3.1 Probability of differential splicing (Pdiff)

plotVolcano_MultipleGroups_Pdiff(betasTable = volcanoTable_multgroups)

6.3.2 F-statistic

plotVolcano_MultipleGroups_Fstat(betasTable = volcanoTable_multgroups)

6.4 Visualize individual events

eventID_mult <- "HsaEX0031861"

mult_violinplots <- plotIndividualViolinsList(eventID = eventID_mult,
                          npoints = 500,
                          psitable = dataset_multgroups_filtered$PSI,
                          qualtable = dataset_multgroups_filtered$Qual,
                          groupList = groupList,
                          maxDevTable = maxDevSimulationN100,
                          seed=TRUE,
                          CoverageWeight = F)

eventList <- prepareTableEventMultiple(eventID = eventID_mult,
                                       psitable = dataset_multgroups_filtered$PSI,
                                       qualtable = dataset_multgroups_filtered$Qual,
                                       groupList = groupList,
                                       npoints = 500,
                                       maxDevTable = maxDevSimulationN100,
                                       seed=TRUE,
                                        CoverageWeight = F)

fstat_mult <- plotFstatFromEventObjListMultiple(eventObjList = eventList)


plot_grid(fstat_mult,mult_violinplots, ncol=2, rel_widths = c(0.3, 1))

betAS command-line interface (CLI) tutorial

Mariana Ascensão-Ferreira

2024-01-15