betAS is a user-friendly R package that allows intuitive analysis and visualisation of differential alternative splicing (AS) based on beta distributions.
Beta distributions are suitable to quantify inclusion proportions of alternative sequences, using RNA sequencing reads supporting their inclusion and exclusion as surrogates for the two distribution shape parameters. Each such beta distribution has the inclusion proportion as mean value and is narrower when the read coverage is higher, facilitating the interpretability of its precision when plotted. betAS uses beta distributions to accurately model PSI values and their precision, to quantitatively and visually compare AS between groups of samples.
betAS allows the analysis of user-provided tables with AS quantifications, such as those obtained by vast-tools, rMATS or Whippet, ranking differentially spliced events by a significance metric that incorporates the compromise between the uncertainty in individual sample estimates and the variability among replicates.
Please note that this tutorial will provide an overview on how to use betAS R package for alternative splicing analyses. The cases explored in this tutorial were adapted from betAS’ original article:
Mariana Ascensão-Ferreira, Rita Martins-Silva, Nuno Saraiva-Agostinho and Nuno L. Barbosa-Morais (2023). betAS: intuitive analysis and visualisation of differential alternative splicing using beta distributions. bioRxiv
The results herein presented don’t correspond necessarily to the results presented in the manuscript. For a more detailed explanation on the methods used, please check the manuscript.
Install betAS by typing the following in an R console:
::install_github("DiseaseTranscriptomicsLab/betAS@dev") devtools
After the installation, load betAS and other required packages by typing:
library(betAS)
Consider also loading the following libraries, for visualization purposes, by typing:
if(!require(ggplot2)) install.packages("ggplot2")
library(ggplot2)
if(!require(plotly)) install.packages("plotly")
library(plotly)
if(!require(dplyr)) install.packages("dplyr")
library(dplyr)
if(!require(cowplot)) install.packages("cowplot")
library(cowplot)
To load data for differential alternative splicing analyses, we can use the getDataset
function. betAS accepts files sourced from vast-tools
, rMATS
, or whippet
. These files should contain junction reads information (e.g. results obtained from the vast-tools’ tidy
module are not supported).
Based on the value assigned to the tool
parameter, the pathTables
parameter can take the following input types:
tool="vast-tools"
: Should be a path pointing to a *INCLUSION_LEVELS_FULL*.tab
file.tool="rMATS"
: Should be a path pointing to a *MATS.JC.txt
file.tool="whippet"
: Should be a list of paths (minimum of 2) pointing to *.psi.gz
files.# Example to load data from vast-tools
<- getDataset(pathTables="path/to/dataset/*INCLUSION_LEVELS_FULL*.tab", tool="vast-tools")
dataset
# Example to load data from rMATS
<- getDataset(pathTables="path/to/dataset/*MATS.JC.txt", tool="rMATS")
dataset
# Example to load data from whippet
<- getDataset(pathTables=list("path/to/sample1/*.psi.gz",
dataset "path/to/sample2/*.psi.gz",
"path/to/sample3/*.psi.gz"), tool="whippet")
The getDataset
function retrieves the inclusion tables specific to the chosen tool for further analyses. However, we also need to use the getEvents
function in order to format inclusion tables into a standard format that is compatible with betAS workflow.
The output from getEvents
is an object with 4 attributes:
PSI
: A data frame with information regarding the splicing events and their inclusion levels.Qual
: A data frame detailing the events and information related to the coverage associated with each event. This representation follows the vast-tools representation of inclusion (inc) and exclusion (exc) reads.EventsPerType
: A named table summarising the number of events per type.Samples
: Vector containing the names of the samples considered.<- "vast-tools" # example tool, can be one of the following: "vast-tools", "rMATS", "whippet"
tool <- getEvents(dataset, tool = tool) dataset
For users who may not have files to upload and just want to explore betAS functionalities, there’s also an option to work with built-in datasets. This dataset is a subset of the Deep transcriptional profiling of longitudinal changes during neurogenesis and network maturation in vivo public mouse dataset:
Run | Cell Type | Div |
---|---|---|
SRR645824 | ESC | DIV-8 |
SRR645826 | ESC | DIV-8 |
SRR645828 | ESC | DIV-8 |
SRR645830 | ESC | DIV-8 |
SRR645872 | Neuron | DIV28 |
SRR645875 | Neuron | DIV28 |
SRR645877 | Neuron | DIV28 |
SRR645879 | Neuron | DIV28 |
To use the pre-built datasets, simply don’t specify the file and choose the tool you prefer.
Take into account that the dataset provided by the three different tools might not be totally the same. The events provided greatly depend on the tool itself, the type of events considered (e.g. for rMATS only exon-skipping events were considered), among others. These datasets should be used for exploration of betAS features only.
# Example to load data from vast-tools
<- getDataset(pathTables=NULL, tool="vast-tools")
dataset
# Example to load data from rMATS
<- getDataset(pathTables=NULL, tool="rMATS")
dataset
# Example to load data from whippet
<- getDataset(pathTables=NULL, tool="whippet") dataset
For simplicity, in this tutorial we will be using a default dataset from vast-tools to illustrate such features. At this stage, the user can explore the raw / non-filtered data by browsing in the four objects inside dataset
. It might be useful to check at this stage the overall results for a given gene / event of interest.
<- getDataset(pathTables=NULL, tool="vast-tools")
dataset <- getEvents(dataset, tool = "vast-tools") dataset
<- "Actn4"
gene head(dataset$PSI[dataset$PSI$GENE==gene,])
GENE | EVENT | COORD | LENGTH | FullCO | COMPLEX | SRR645824 | SRR645826 | SRR645828 | SRR645830 | SRR645872 | SRR645875 | SRR645877 | SRR645879 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13530 | Actn4 | MmuEX6009510 | chr7:28901877-28902027 | 151 | chr7:28904541,28901877-28902027,28898728 | S | 100.00 | 99.80 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
45339 | Actn4 | MmuEX0003637 | chr7:28895432-28895512 | 81 | chr7:28896089,28895432-28895512,28894659 | C3 | 90.99 | 90.44 | 89.74 | 96.25 | 19.44 | 33.33 | 19.15 | 30.51 |
45340 | Actn4 | MmuEX0003638 | chr7:28895121-28895180 | 60 | chr7:28896089,28894911+28895121-28895180+28895186,28894659 | C3 | 9.19 | 8.67 | 6.80 | 3.54 | 82.09 | 60.53 | 75.61 | 59.66 |
45341 | Actn4 | MmuEX0003639 | chr7:28911490-28911575 | 86 | chr7:28912240,28911490-28911575,28907064 | C3 | 85.63 | 85.26 | 86.64 | 80.03 | 36.13 | 23.28 | 25.78 | 31.03 |
45342 | Actn4 | MmuEX0003640 | chr7:28909903-28909988 | 86 | chr7:28912240,28909903-28909988,28907064 | C3 | 17.62 | 18.01 | 18.31 | 21.92 | 65.48 | 72.69 | 84.48 | 79.66 |
119010 | Actn4 | MmuEX6009503 | chr7:28918672-28918791 | 120 | chr7:28918954,28918672-28918791,28916246 | S | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
<- "MmuEX0003638"
event_of_interest $PSI[dataset$PSI$EVENT==event_of_interest,] dataset
GENE | EVENT | COORD | LENGTH | FullCO | COMPLEX | SRR645824 | SRR645826 | SRR645828 | SRR645830 | SRR645872 | SRR645875 | SRR645877 | SRR645879 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
45340 | Actn4 | MmuEX0003638 | chr7:28895121-28895180 | 60 | chr7:28896089,28894911+28895121-28895180+28895186,28894659 | C3 | 9.19 | 8.67 | 6.8 | 3.54 | 82.09 | 60.53 | 75.61 | 59.66 |
betAS allows to filter the dataset based on alternative splicing event types, PSI value range to consider and minimum number of supporting reads.
betAS allows to filter event types. The supported event types are in accordance to the ones documented in the alternative splicing quantification tools supported by betAS:
types
include Alt3, Alt5, ANN, C1, C2, C3, IR-C, IR-S, MIC and S as described in vast-tools documentation.types
include CE, AA, AD, IR, TS, TE, AF, AL and BS as described in whippet documentation.types
parameter should be set to NULL
To explore the type of events in a dataset and their respective number, we can access the EventsPerType
field in the dataset
variable.
$EventsPerType dataset
##
## Alt3 Alt5 ANN C1 C2 C3 IR MIC S
## 85993 61115 24750 5699 5179 5896 97042 1038 63792
betAS also allows to filter events with less than a given number of junction reads (sum of inclusion and exclusion reads) in at least one sample. To do so, the user can specify the minimum number of reads to consider in the N
parameter.
In this tutorial, we will be considering only exon skipping events (C1, C2, C3, S & MIC events from vast-tools
) with at least 10 junction read counts in all samples.
<- filterEvents(dataset, types=c("C1", "C2", "C3", "S", "MIC"), N=10)
dataset_filtered
cat(paste0("Alternative events: ", nrow(dataset_filtered$PSI)))
## Alternative events: 195636
betAS allows to filter events based on their PSI value range (in percentage, i.e. PSI ∈ [0,100]). We will be keeping only alternative events, i.e. ~ 0 < PSI < 100 in all samples. If you wish to keep all of the events and don’t filter by PSI range, skip this step or put minPsi
=0 and maxPsi
=100.
<- alternativeEvents(dataset_filtered, minPsi=1, maxPsi=99)
dataset_filtered
cat(paste0("Alternative events: ", nrow(dataset_filtered$PSI)))
## Alternative events: 1404
After filtering the data to only keep events of interest, we can have an overview of the PSIs on the samples being analysed.
<- bigPicturePlot(table = dataset_filtered$PSI)
bigPicturePlot + theme_minimal() bigPicturePlot
To allow for alternative splicing analyses between groups of interest, the user must firstly specify to what groups the samples under analyses belong to.
To check the names of the samples being used, we can access the Samples
attribute inside the dataset
object.
$Samples dataset_filtered
## [1] "SRR645824" "SRR645826" "SRR645828" "SRR645830" "SRR645872" "SRR645875" "SRR645877" "SRR645879"
In order to ease group visualization, we can assign a color to each group. In this section we showcase how to assign a group to a given color, based on a manually defined palette. To automatically define color palettes of N colors the user might want to check Custom color palette for group definition section of this tutorial.
We can define sample groups based on the name of the samples being used, assuming that the sample naming has some characteristics that allow to easily identify the group to which they belong to.
<- dataset_filtered$Samples
samples <- c("#FF9AA2", "#FFB7B2", "#FFDAC1", "#E2F0CB", "#B5EAD7", "#C7CEEA", "#FBE2FD", "#D9ECFE") random_colors
# Automatic group definition
<- samples
not_grouped <- c()
checked <- LETTERS[seq(1, length = min(length(names), 26))]
groups <- groups
used_groups <- list()
groupList
while((length(checked) < length(names)) & length(used_groups) <= 26){
<- agrep(pattern = not_grouped[1], x = not_grouped, value = TRUE)
groupNames <- c(checked, groupNames)
checked <- not_grouped[-c(match(groupNames, not_grouped))]
not_grouped
# Assign new group
<- names(groupList)
currentNames length(groupList)+1]] <- list(name = used_groups[1],
groupList[[samples = groupNames,
color = random_colors[1])
names(groupList) <- make.unique(c(currentNames, used_groups[1]))
<- random_colors[-1]
random_colors <- used_groups[-1]
used_groups
}
groupList
## $A
## $A$name
## [1] "A"
##
## $A$samples
## [1] "SRR645824" "SRR645826" "SRR645828"
##
## $A$color
## [1] "#FF9AA2"
# Visualize colors being used
<- rep(1, length(groupList)) # Equal-sized slices for each color
slices # Display the pie chart with colors
pie(slices, col = random_colors[1:length(groupList)], border = "black", labels=names(groupList), main = "Color palette for group definition")
We can also define groups of samples based on a given column of the metadata. We will be using betAS metadata for vast-tools built-in example to ease group creation:
data("VT2_metadata_mouse")
<- as.data.frame(VT2_metadata_mouse)) (metadata
Run | CellType | Div |
---|---|---|
SRR645824 | ESC | DIV-8 |
SRR645826 | ESC | DIV-8 |
SRR645828 | ESC | DIV-8 |
SRR645830 | ESC | DIV-8 |
SRR645872 | Neuron | DIV28 |
SRR645875 | Neuron | DIV28 |
SRR645877 | Neuron | DIV28 |
SRR645879 | Neuron | DIV28 |
For this example we will use the cell type as a grouping variable, i.e. Neurons or Embrionic Stem Cell (ESC). This information can be accessed through the CellType
column in the example metadata. The name of the samples to be used can be retrieved by accessing the Run
column in the example metadata.
If using user-provided data, the user must also provide a metadata table, or ensure that the final result corresponds to groupList
presented in this section.
# Define variable of the metadata table to be used as a grouping variable
<- "CellType"
groupingVariable # to get the unique values of the grouping variable. In this example, will be Neuron and ESC
<- unique(metadata[,groupingVariable])
groups # Define vector of sample names based on the example metadata
<- metadata$Run
samples # Define colors for the two groups
<- c("#FF9AA2", "#FFB7B2") random_colors
<- list()
groupList
for(i in 1:length(groups)){
<- samples[which(metadata[,groupingVariable] == groups[i])]
groupNames
# Assign new group
<- names(groupList)
currentNames length(groupList)+1]] <- list(name = groups[i],
groupList[[samples = groupNames,
color = random_colors[i])
names(groupList) <- make.unique(c(currentNames, groups[i]))
}
groupList
## $ESC
## $ESC$name
## [1] "ESC"
##
## $ESC$samples
## [1] "SRR645824" "SRR645826" "SRR645828" "SRR645830"
##
## $ESC$color
## [1] "#FF9AA2"
##
##
## $Neuron
## $Neuron$name
## [1] "Neuron"
##
## $Neuron$samples
## [1] "SRR645872" "SRR645875" "SRR645877" "SRR645879"
##
## $Neuron$color
## [1] "#FFB7B2"
# Visualize colors being used
<- rep(1, length(groups)) # Equal-sized slices for each color
slices # Display the pie chart with colors
pie(slices, col = random_colors[1:length(groups)], border = "black", labels=groups, main = "Color palette for group definition")
betAS approach for alternative splicing analyses takes into account two metrics for each alternative splicing event under analyses: the magnitude of the effect (𝚫PSI = PSIbetAS(group A) - PSIbetAS(group B)) and the statistical significance of such changes. For the latter, betAS introduces three approaches: probability of differential splicing (Pdiff), F-statistic or false positive rate (FPR).
Pdiff: This approach takes the two sets of random points per condition and calculates, for each AS event’s estimated ∆PSI, the proportion of differences between these that are greater than zero, which has the same interpretation as asking what proportion of beta distribution-emitted values for one condition are higher than those emitted for the other, thus reflecting the probability of differential AS of PSIbetAS(group A) being greater than PSIbetAS(group B).
F-statistic: betAS also enables an ANOVA-like analysis of variance, comparing inter- and intra-group variabilities. For each event, within is considered the set of differences between each pair of samples that are part of the same group and between the set of differences between each pair of groups. The ratio of the median absolute values of between and within therefore provides an “F-like” statistic. This metric provides a compromise between the effect size of AS differences and their significance.
FPR: Random generation of points from a beta distribution is used to estimate the null distribution’s PSI and its precision. Then, one point is randomly selected from each of the sample’s null distribution and, keeping the samples’ group assignment (i.e., which samples belong to each group), the ∆PSI between groups under the null hypothesis is calculated. The process is repeated many times (10 000 by default) and the FPR is the proportion of ∆PSI random simulations that are larger than (i.e., more extreme) or equal to the empirical ∆PSI.
Firstly, we need to define the groups we will be analysing. This step is particularly important when we have more than one group defined, but want to compare only two of them.
# Define groups
<- "Neuron"
groupA <- "ESC"
groupB
# Define samples inside each group
<- groupList[[groupA]]$samples
samplesA <- groupList[[groupB]]$samples
samplesB
# Convert samples into indexes
<- convertCols(dataset_filtered$PSI, samplesA)
colsGroupA <- convertCols(dataset_filtered$PSI, samplesB) colsGroupB
betAS utilizes R’s rbeta
and sample
functions, introducing inherent uncertainty. Running betAS twice may yield different results, particularly near significance thresholds. Variability in outcomes can be informative, signaling cases where an event’s significance is on the cusp. However, to ensure reproducibility, users can set a seed, avoiding randomization of the emitted points. By default, the seed is set to TRUE, and will be used as such for the tutorial’s reproducibility.
Moreover, the user can choose to weight the number of points emitted for beta distributions based on the coverage of each sample. With large sample sizes, uncertainty and significance of differential splicing are mostly determined by biological variability among replicate samples. However, as stated in betAS manuscript, working with small sample sizes implies a good compromise between modelling the uncertainty in the estimation of inclusion levels of alternative splicing events in individual samples and accounting for that biological variability among replicates. As such, we consider that for not-so-small sample sizes and/or a knowledgeable user intending to tune the aforementioned compromise differently, size factors proportional to coverage can be useful. This parameter should be used with care, and by default it is set to FALSE.
To calculate the Pdiff for all events in our dataset, we will use the prepareTableVolcano
function. To visualize the results in a volcano plot, we will use the plotvolcanoPdiff
function.
options(error=recover)
<- prepareTableVolcano(psitable = dataset_filtered$PSI,
volcanoTable_Pdiff qualtable = dataset_filtered$Qual,
npoints = 500,
colsA = colsGroupA,
colsB = colsGroupB,
labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C",
maxDevTable = maxDevSimulationN100,
seed=TRUE,
CoverageWeight = FALSE)
<- plotVolcano(betasTable = volcanoTable_Pdiff,
volcano_Pdiff labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C")
volcano_Pdiff
The above plot colors in pink and labels by default the events with a ∆PSI>0.1. However, if these are in high number, the labels might not appear. To make the plot interactive, we can use the ggplotly
function.
# change size points manually
$layers[[1]]$aes_params$size <- 2
volcano_Pdiff$layers[[2]]$aes_params$size <- 1
volcano_Pdiff
# Plot interactively
<- ggplotly(volcano_Pdiff, width = 700, height = 500) %>%
plotly_volcano_Pdiff layout(
font = list(size = 10),
xaxis = list(
title = list(font = list(size = 14)), # Adjust as necessary
tickfont = list(size = 10) # Adjust the tick font size here
),yaxis = list(
title = list(font = list(size = 14)), # Adjust as necessary
tickfont = list(size = 10) # Adjust the tick font size here
)
)
<- paste(
hover_text "Event: ", volcanoTable_Pdiff$EVENT,
"<br>Pdiff: ", round(volcanoTable_Pdiff$Pdiff,3),
"<br>Deltapsi: ", round(volcanoTable_Pdiff$deltapsi,3),
sep = ""
)
$x$data[[1]]$text <- hover_text # all points
plotly_volcano_Pdiff$x$data[[2]]$text <- NULL
plotly_volcano_Pdiff
plotly_volcano_Pdiff
The results can also be manually inspected by browsing the volcanoTable_Pdiff
table
head(volcanoTable_Pdiff[,c("GENE","EVENT","COORD","Pdiff","deltapsi")])
GENE | EVENT | COORD | Pdiff | deltapsi | |
---|---|---|---|---|---|
37 | Gmpr | MmuEX0021295 | chr13:45521127-45521219 | 0.9845 | 0.1392815 |
98 | Hnrnpd | MmuEX0023099 | chr5:99962134-99962204 | 0.9050 | -0.0222313 |
99 | Hnrnpd | MmuEX0023100 | chr5:99963731-99963877 | 0.8880 | -0.0442848 |
144 | Fmr1 | MmuEX0019414 | chrX:68705861-68705923 | 1.0000 | 0.6229724 |
161 | Chtop | MmuEX0001048 | chr3:90505395-90505478 | 0.5615 | 0.0032424 |
175 | Cnnm3 | MmuEX0011891 | chr1:36524034-36524132 | 1.0000 | -0.5776761 |
To calculate the F-statistic for all events in our dataset, we will use the prepareTableVolcanoFstat
function. To visualize the results in a volcano plot, we will use the plotVolcanoFstat
function. The user can also use the approach presented above to make the plot interactive.
<- prepareTableVolcanoFstat(psitable = dataset_filtered$PSI,
volcanoTable_Fstat qualtable = dataset_filtered$Qual,
npoints = 500,
colsA = colsGroupA,
colsB = colsGroupB,
labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C",
maxDevTable = maxDevSimulationN100,
seed=TRUE,
CoverageWeight=F )
<- plotVolcanoFstat(betasTable = volcanoTable_Fstat,
volcano_Fstat labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C")
volcano_Fstat
head(volcanoTable_Fstat[,c("GENE","EVENT","COORD","Fstat","deltapsi")])
GENE | EVENT | COORD | Fstat | deltapsi | |
---|---|---|---|---|---|
37 | Gmpr | MmuEX0021295 | chr13:45521127-45521219 | 4.599441 | 0.1392815 |
98 | Hnrnpd | MmuEX0023099 | chr5:99962134-99962204 | 2.554947 | -0.0222313 |
99 | Hnrnpd | MmuEX0023100 | chr5:99963731-99963877 | 2.120957 | -0.0442848 |
144 | Fmr1 | MmuEX0019414 | chrX:68705861-68705923 | 10.682045 | 0.6229724 |
161 | Chtop | MmuEX0001048 | chr3:90505395-90505478 | 0.849660 | 0.0032424 |
175 | Cnnm3 | MmuEX0011891 | chr1:36524034-36524132 | 10.675033 | -0.5776761 |
To calculate the FPR for all events in our dataset, we will use the prepareTableVolcanoFDR
function. To visualize the results in a volcano plot, we will use the plotVolcanoFDR
function. The user can also use the approach presented above to make the plot interactive.
<- prepareTableVolcanoFDR(psitable = dataset_filtered$PSI,
volcanoTable_FDR qualtable = dataset_filtered$Qual,
npoints = 500,
colsA = colsGroupA,
colsB = colsGroupB,
labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C",
maxDevTable = maxDevSimulationN100,
nsim = 100,
seed=TRUE,
CoverageWeight = FALSE)
<- plotVolcanoFDR(betasTable = volcanoTable_FDR,
volcano_FDR labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C")
volcano_FDR
head(volcanoTable_FDR[,c("GENE","EVENT","COORD","FDR","deltapsi")])
GENE | EVENT | COORD | FDR | deltapsi | |
---|---|---|---|---|---|
37 | Gmpr | MmuEX0021295 | chr13:45521127-45521219 | 0.00 | 0.1392815 |
98 | Hnrnpd | MmuEX0023099 | chr5:99962134-99962204 | 0.02 | -0.0222313 |
99 | Hnrnpd | MmuEX0023100 | chr5:99963731-99963877 | 0.20 | -0.0442848 |
144 | Fmr1 | MmuEX0019414 | chrX:68705861-68705923 | 0.00 | 0.6229724 |
161 | Chtop | MmuEX0001048 | chr3:90505395-90505478 | 0.83 | 0.0032424 |
175 | Cnnm3 | MmuEX0011891 | chr1:36524034-36524132 | 0.00 | -0.5776761 |
To illustrate betAS approach for an individual event with differential alternative splicing between Neurons and ESC, we will be using the event MmuEX0003638
. This is an exon skipping event in the Actn4 gene.
<- "MmuEX0003638" eventID
We can visualize the beta distributions for this event for each sample.
# Auxiliary data to add increment based on the event coverage to avoid dividing by zero
data("maxDevSimulationN100")
<- plotIndividualDensitiesList(eventID = eventID,
tdensities npoints = 500,
psitable = dataset$PSI,
qualtable = dataset$Qual,
groupList = groupList,
maxDevTable = maxDevSimulationN100,
seed=TRUE,
CoverageWeight = FALSE)
+ theme_minimal() + ggtitle(eventID) tdensities
We can also inspect manually the different statistics for this event, and plot such results.
subset(volcanoTable_Pdiff[,c("GENE","EVENT","COORD","Pdiff","deltapsi")], EVENT==eventID)
GENE | EVENT | COORD | Pdiff | deltapsi | |
---|---|---|---|---|---|
45340 | Actn4 | MmuEX0003638 | chr7:28895121-28895180 | 1 | -0.618374 |
subset(volcanoTable_Fstat[,c("GENE","EVENT","COORD","Fstat","deltapsi")], EVENT==eventID)
GENE | EVENT | COORD | Fstat | deltapsi | |
---|---|---|---|---|---|
45340 | Actn4 | MmuEX0003638 | chr7:28895121-28895180 | 11.61165 | -0.618374 |
subset(volcanoTable_FDR[,c("GENE","EVENT","COORD","FDR","deltapsi")], EVENT==eventID)
GENE | EVENT | COORD | FDR | deltapsi | |
---|---|---|---|---|---|
45340 | Actn4 | MmuEX0003638 | chr7:28895121-28895180 | 0 | -0.618374 |
<- prepareTableEvent(eventID = eventID,
plotPdiff psitable = dataset$PSI,
qualtable = dataset$Qual,
npoints = 500,
colsA = colsGroupA,
colsB = colsGroupB,
labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C",
maxDevTable = maxDevSimulationN100,
nsim = 1000,
seed=T,
CoverageWeight=F) %>%
plotPDiffFromEventObjList()
<- prepareTableEvent(eventID = eventID,
plotFstat psitable = dataset$PSI,
qualtable = dataset$Qual,
npoints = 500,
colsA = colsGroupA,
colsB = colsGroupB,
labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C",
maxDevTable = maxDevSimulationN100,
nsim = 1000,
seed=T,
CoverageWeight=F) %>%
plotFstatFromEventObjList()
<- prepareTableEvent(eventID = eventID,
plotFDR psitable = dataset$PSI,
qualtable = dataset$Qual,
npoints = 500,
colsA = colsGroupA,
colsB = colsGroupB,
labA = groupA,
labB = groupB,
basalColor = "#89C0AE",
interestColor = "#E69A9C",
maxDevTable = maxDevSimulationN100,
nsim = 1000,
seed=T,
CoverageWeight=F) %>%
plotFDRFromEventObjList()
plot_grid(plotPdiff,plotFstat,plotFDR, ncol=3)
The differential AS approach implemented by betAS can be applied to multiple (i.e., more than two) groups in a novel ANOVA-inspired way that extends the Pdiff definition to the comparison of the differences between samples belonging to different biological conditions to those found between replicates.
To illustrate this, we applied multiple-group betAS to the analysis of AS differences in a subset of human transcriptomes of forebrain, hindbrain, heart, kidney, liver and testis. This dataset is a subset of the Human RNA-seq time-series of the development of seven major organs public human dataset.
# load data and prepare it for betAS
data("VT1_data_human")
<- getEvents(VT1_data_human, tool = "vast-tools")
dataset_multgroups
# load metadata
data("VT1_metadata_human")
<- as.data.frame(VT1_metadata_human)
metadata_multgroups metadata_multgroups
Run | age | developmental_stage | organism_part | sex |
---|---|---|---|---|
ERR2598266 | 14 | adolescent | forebrain | male |
ERR2598267 | 16 | adolescent | forebrain | male |
ERR2598268 | 17 | adolescent | forebrain | male |
ERR2598269 | 13 | adolescent | forebrain | male |
ERR2598270 | 13 | adolescent | heart | male |
ERR2598271 | 13 | adolescent | hindbrain | male |
ERR2598272 | 16 | adolescent | hindbrain | male |
ERR2598273 | 14 | adolescent | hindbrain | male |
ERR2598274 | 17 | adolescent | hindbrain | male |
ERR2598275 | 17 | adolescent | liver | male |
ERR2598276 | 16 | adolescent | testis | male |
ERR2598277 | 17 | adolescent | testis | male |
ERR2598278 | 14 | adolescent | testis | male |
ERR2598279 | 14 | adolescent | testis | male |
ERR2598280 | 58 | elderly | forebrain | male |
ERR2598281 | 58 | elderly | forebrain | male |
ERR2598282 | 58 | elderly | hindbrain | male |
ERR2598283 | 58 | elderly | hindbrain | male |
ERR2598284 | 58 | elderly | liver | male |
ERR2598285 | 58 | elderly | liver | male |
ERR2598286 | 55 | elderly | testis | male |
ERR2598305 | 50 | middle adult | forebrain | male |
ERR2598306 | 53 | middle adult | forebrain | male |
ERR2598307 | 54 | middle adult | heart | male |
ERR2598308 | 53 | middle adult | hindbrain | male |
ERR2598309 | 50 | middle adult | hindbrain | male |
ERR2598310 | 55 | middle adult | liver | male |
ERR2598311 | 46 | middle adult | testis | male |
ERR2598346 | 29 | young adult | forebrain | male |
ERR2598347 | 29 | young adult | forebrain | male |
ERR2598348 | 28 | young adult | forebrain | male |
ERR2598349 | 39 | young adult | forebrain | male |
ERR2598350 | 39 | young adult | forebrain | male |
ERR2598351 | 25 | young adult | heart | male |
ERR2598352 | 29 | young adult | hindbrain | male |
ERR2598353 | 29 | young adult | hindbrain | male |
ERR2598354 | 32 | young adult | hindbrain | female |
ERR2598355 | 39 | young adult | hindbrain | male |
ERR2598356 | 39 | young adult | hindbrain | male |
ERR2598357 | 29 | young adult | liver | male |
ERR2598358 | 39 | young adult | liver | male |
ERR2598359 | 39 | young adult | liver | male |
ERR2598360 | 29 | young adult | testis | male |
ERR2598361 | 29 | young adult | testis | male |
ERR2598362 | 28 | young adult | testis | male |
ERR2598363 | 39 | young adult | testis | male |
For this dataset, we are considering a reasonable number of samples (46). As such, in cases like this, it might be interesting to be less conservative on the filtering step and choose to keep all events irrespectively of their PSI value. However, for the purpose of this tutorial, we will keep the criteria of filtering discussed above, for computational ease of this exercise. For simplicity, we will consider only exon-skipping events.
cat(paste0("Initial number of events: ", nrow(dataset_multgroups$PSI)))
## Initial number of events: 2231
<- filterEvents(dataset_multgroups, types=c("C1", "C2", "C3", "S", "MIC"), N=10)
dataset_multgroups_filtered cat(paste0("Alternative events (after event type and coverage filtering): ", nrow(dataset_multgroups_filtered$PSI)))
## Alternative events (after event type and coverage filtering): 2197
<- alternativeEvents(dataset_multgroups_filtered, minPsi=1, maxPsi=99)
dataset_multgroups_filtered cat(paste0("Alternative events (after PSI range filtering): ", nrow(dataset_multgroups_filtered$PSI)))
## Alternative events (after PSI range filtering): 117
We will create groups based on the samples’ tissue of origin, encoded in the organism_part
metadata’s column.
# Define variable of the metadata table to be used as a grouping variable
<- "organism_part"
groupingVariable # to get the unique values of the grouping variable. In this example, will be Neuron and ESC
<- unique(metadata_multgroups[,groupingVariable])
groups # Define vector of sample names based on the example metadata
<- metadata_multgroups$Run
samples # Define colors for the two groups
<- c("#FF9AA2", "#FFB7B2", "#FFDAC1", "#E2F0CB", "#B5EAD7") random_colors
<- list()
groupList
for(i in 1:length(groups)){
<- samples[which(metadata_multgroups[,groupingVariable] == groups[i])]
groupNames
# Assign new group
<- names(groupList)
currentNames length(groupList)+1]] <- list(name = groups[i],
groupList[[samples = groupNames,
color = random_colors[i])
names(groupList) <- make.unique(c(currentNames, groups[i]))
}
groupList
## $forebrain
## $forebrain$name
## [1] "forebrain"
##
## $forebrain$samples
## [1] "ERR2598266" "ERR2598267" "ERR2598268" "ERR2598269" "ERR2598280" "ERR2598281" "ERR2598305" "ERR2598306" "ERR2598346" "ERR2598347"
## [11] "ERR2598348" "ERR2598349" "ERR2598350"
##
## $forebrain$color
## [1] "#FF9AA2"
##
##
## $heart
## $heart$name
## [1] "heart"
##
## $heart$samples
## [1] "ERR2598270" "ERR2598307" "ERR2598351"
##
## $heart$color
## [1] "#FFB7B2"
##
##
## $hindbrain
## $hindbrain$name
## [1] "hindbrain"
##
## $hindbrain$samples
## [1] "ERR2598271" "ERR2598272" "ERR2598273" "ERR2598274" "ERR2598282" "ERR2598283" "ERR2598308" "ERR2598309" "ERR2598352" "ERR2598353"
## [11] "ERR2598354" "ERR2598355" "ERR2598356"
##
## $hindbrain$color
## [1] "#FFDAC1"
##
##
## $liver
## $liver$name
## [1] "liver"
##
## $liver$samples
## [1] "ERR2598275" "ERR2598284" "ERR2598285" "ERR2598310" "ERR2598357" "ERR2598358" "ERR2598359"
##
## $liver$color
## [1] "#E2F0CB"
##
##
## $testis
## $testis$name
## [1] "testis"
##
## $testis$samples
## [1] "ERR2598276" "ERR2598277" "ERR2598278" "ERR2598279" "ERR2598286" "ERR2598311" "ERR2598360" "ERR2598361" "ERR2598362" "ERR2598363"
##
## $testis$color
## [1] "#B5EAD7"
# Visualize colors being used
<- rep(1, length(groups)) # Equal-sized slices for each color
slices # Display the pie chart with colors
pie(slices, col = random_colors[1:length(groups)], border = "black", labels=groups, main = "Color palette for group definition")
options(error=recover)
<- prepareTableVolcanoMultipleGroups(psitable = dataset_multgroups_filtered$PSI,
volcanoTable_multgroups qualtable = dataset_multgroups_filtered$Qual,
groupList = groupList,
npoints = 500,
maxDevTable = maxDevSimulationN100, seed=TRUE, CoverageWeight = F)
head(volcanoTable_multgroups[,c("GENE","EVENT","Fstat","Pdiff","Pzero","medianBetweens","deltaAbsolute")])
GENE | EVENT | Fstat | Pdiff | Pzero | medianBetweens | deltaAbsolute | |
---|---|---|---|---|---|---|---|
1197 | DAZAP1 | HsaEX0018545 | 1.531845 | 0.6245004 | 0.6245004 | 0.0319597 | 0.0149702 |
1565 | SMARCA2 | HsaEX0060271 | 1.393718 | 0.5721423 | 0.5721423 | 0.0168873 | 0.0140413 |
1784 | GNAS | HsaEX0027841 | 7.605087 | 0.8578009 | 0.8578009 | -0.0528636 | 0.2314300 |
2649 | DNAJA3 | HsaEX0020236 | 1.270586 | 0.5736325 | 0.5736325 | 0.0334410 | 0.0178975 |
2771 | INTS10 | HsaEX0031861 | 9.684621 | 0.8501499 | 0.8501499 | -0.2672253 | 0.4973357 |
2790 | SNRNP70 | HsaEX0060707 | 2.220629 | 0.6369095 | 0.6369095 | -0.0153788 | 0.1056033 |
plotVolcano_MultipleGroups_Pdiff(betasTable = volcanoTable_multgroups)
plotVolcano_MultipleGroups_Fstat(betasTable = volcanoTable_multgroups)
<- "HsaEX0031861" eventID_mult
<- plotIndividualViolinsList(eventID = eventID_mult,
mult_violinplots npoints = 500,
psitable = dataset_multgroups_filtered$PSI,
qualtable = dataset_multgroups_filtered$Qual,
groupList = groupList,
maxDevTable = maxDevSimulationN100,
seed=TRUE,
CoverageWeight = F)
<- prepareTableEventMultiple(eventID = eventID_mult,
eventList psitable = dataset_multgroups_filtered$PSI,
qualtable = dataset_multgroups_filtered$Qual,
groupList = groupList,
npoints = 500,
maxDevTable = maxDevSimulationN100,
seed=TRUE,
CoverageWeight = F)
<- plotFstatFromEventObjListMultiple(eventObjList = eventList)
fstat_mult
plot_grid(fstat_mult,mult_violinplots, ncol=2, rel_widths = c(0.3, 1))
In the provided examples, the color palette used for group definition was manually defined. If the user wishes to automatically generate a palette of N
colors, the following code is advisable:
#install.packages("grDevices")
library(grDevices)
<- function(N) {
generate_pastel_colors <- vector("list", length = N)
pastel_colors
for (i in 1:N) {
# Generate random values for hue, chroma, and luminance
<- runif(1, min = 0, max = 360) # Adjust the range for a different initial hue
hue <- runif(1, min = 50, max = 80) # Adjust these values for your desired pastel range
chroma <- runif(1, min = 80, max = 90) # Adjust these values for your desired pastel range
luminance
# Create a pastel color using hcl() with the random values
<- hcl(hue, chroma, luminance)
pastel_color
<- pastel_color
pastel_colors[[i]]
}
return(pastel_colors)
}
# Generate 10 random pastel colors
<- 10
N <- generate_pastel_colors(N)
random_pastel_colors
cat("HEX codes of generated pastel colors: ")
## HEX codes of generated pastel colors:
cat(unlist(random_pastel_colors))
## #60F6B9 #FFCF8C #CFCFFF #D3D182 #FFC5B8 #FFCE71 #6FECFB #DCE67C #FFB8CA #FFBDFF
# Prepare data for the pie chart
<- rep(1, N) # Equal-sized slices for each color
slices # Display the pie chart with pastel colors
pie(slices, col = unlist(random_pastel_colors), border = "black", labels=unlist(random_pastel_colors), main = "Automatically generated color palette for group definition")