if (!require("remotes", quietly = TRUE))
install.packages("remotes")
::install_github("bartongroup/dexdash")
remotes
# Examples in this document require additional data
::install_github("bartongroup/dexdata") remotes
Dexdash: Differential EXpression Downstream Analysis SHiny app
This R package is designed for interactive exploration and analysis of differential expression (DE) results. It combines Shiny-based visualization tools with rapid functional enrichment analysis. Users can select genes from a volcano or MA plot to find their names, descriptions and plot expression patterns. The built-in fast functional enrichment (based on the fenr package) allows users to instantly see which biological pathways and processes those genes are involved in, based on enrichment analysis of Gene Ontology (GO), KEGG, and Reactome databases. The input for the app consists of five data frames containing DE results, expression or abundance data, experimental design, gene names and descriptions. A build-in function allows for easy downloading of functional term data. Once started, the app is fast and easy to use.
dexdash version 0.2.7.
Overview
Imagine you’ve carried out an RNA-seq experiment, a proteomics study, or another type of -omics research. You’ve done read mapping or protein quantifications and now have a set of data in hand. This data capture expression or abundance (we will call it ‘expression’ for simplicity) for each gene, protein, or other entity (hereinafter referred to as ‘feature’). These expressions are recorded across various biological conditions (which we’ll call ‘groups’) and their replicates. Following this, you’ve conducted a differential expression analysis using tools like limma, edgeR, DESeq2, or similar, ending up with a data frame that includes log-fold change, p-value, among other metrics.
Now, you’re ready to investigate the results. You want to generate a volcano or MA plot, quickly identify features of interest, and explore the functional similarities among groups of features that behave similarly. This is exactly what dexdash is for. Simply feed it your expression data, differential expression results, and some details about your experimental design and gene names, and voilà! You’re equipped with a web-based interactive tool to explore your data. A screenshot of the dexdash app is shown in Figure 1.
Installation
dexdash can be installed from GitHub by using:
Quick example
library(dexdash)
library(dexdata)
# Data examples available in the package
data(yeast_de, yeast_data, yeast_metadata)
# Create the dexdash data set
<- dexdash_set(yeast_de, yeast_data, yeast_metadata, name = "Yeast")
yeast_dexset
# The slow bit: download feature information and functional term data
<- download_feature_information(species = "yeast")
yeast_features <- download_functional_terms(species = "yeast")
yeast_terms <- prepare_functional_terms(yeast_terms, feature_name = "gene_id")
yeast_fterms
# The fast bit: interactive app
run_app(yeast_dexset, yeast_features, yeast_fterms)
Usage
The dexdash Shiny app is launched with a command:
run_app(dexset, features, fterms)
There are five mandatory arguments:
dexset
- adexdash_set
ordexdash_list
object containing differential expression results, expression of abundance data and the design of the experiment.dexdash_set
contains data from one experiment,dexdash_list
combines data from multiple experiments,features
- data frame with feature names and descriptions,fterms
- an object containing functional terms data.
User data: experimental results
The dexset
argument for the app should be created with dexdash_set()
or dexdash_list()
function, for single- and multiple-experiment data, respectively. For a single experiment, the dexdash_set
object is created using
<- dexdash_set(yeast_de, yeast_data, yeast_metadata, name = "Yeast") yeast_dexset
The inputs for this function are three data frames containing experimental results, provided by the user. As differential expression output differs from tool to tool, it would be difficult to feed them directly into dexdash. Therefore, users need to rename data frame columns as shown in Table 1. A fully worked example below demonstrates how to prepare the required data, step by step.
Data frame | Column | Type | Description |
---|---|---|---|
de | id | character | feature identifier |
log_fc | numeric | log-fold change; use logFC column from edgeR and limma output, log2FoldChange from DESeq2 |
|
expr | numeric | mean expression or abundance, typically in logarithmic scale; use logCPM from edgeR, log10(baseMean) from DESeq2, AveExpr from limma |
|
p_value | numeric | p-value from statistical tests; use PValue from edgeR and limma, pvalue from DESeq2 |
|
fdr | numeric | (optional) false discovery rate to assess statistical significance; if not provided, FDR will be calculated using the Benjamini-Hochberg multiple test corrections from the p-values | |
contrast | factor | name of the contrast; this data frame can contain results from multiple contrast | |
data | id | character | feature identifier |
sample | character | sample identifier | |
value | numeric | the expression, or abundance of the gene/protein for this sample | |
metadata | sample | character | sample identifier |
… | character/factor | at least one additional column with information that will be used as the x-axis or to colour the points in the feature plot |
Feature information
argument | column | type | description |
---|---|---|---|
features | id | character | feature identifier |
name | character | a human-readable name of the feature, typically a gene symbol | |
description | character | a brief description of the feature |
Feature (gene or protein) information should be organised in a data frame with columns id
, name
and description
, as explained in Table 2.
This data frame can be prepared by the user, but dexdash includes a helper function to download these data from Ensembl:
<- download_feature_information(species = "yeast") yeast_features
The necessary information to connect to the correct Ensembl dataset is stored in a JSON file attached to the package (see Section 4.4 for details).
Functional term data
For fast interactive functional enrichment, dexdash requires data downloaded and formatted appropriately. This can be achieved with the following built-in functions:
<- download_functional_terms(species = "yeast")
yeast_terms <- prepare_functional_terms(terms, feature_name = "gene_id") yeast_fterms
The first function, download_functional_terms()
, retrieves functional term data and term-feature mappings from online databases, organizing them into two user-readable data frames (tibbles). This format allows users to verify that the data adheres to the necessary specifications. The species
argument corresponds to an entry in the JSON file included with the package, which is detailed further in Section 4.4.
The second function, prepare_functional_terms()
, transforms these data frames into an object optimized for fast functional enrichment within dexdash, based on Bioconductor’s fenr package. The feature_name
argument designates which column in the mapping tibble acts as the feature identifier. For more details, refer to the fenr package vignette. Depending on your dataset, feature_name
should be set to "gene_symbol"
if dealing with gene symbols like “BRCA1” or “FOXP1”, or "gene_id"
for Ensembl identifiers such as “ENSG00000012048” or “ENSG00000114861”.
Species JSON file
Online databases such as GO, Reactome, KEGG, and Ensembl require specific species designations and dataset names for access, which are often not intuitive. For example, the designations for yeast are “sgd” in GO, “Saccharomyces cerevisiae” in Reactome, and “sce in KEGG. To make access easier, dexdash includes a JSON file containing these necessary identifiers along with Ensembl BioMart host and dataset details for various species. This file is located locally at:
<- system.file("extdata", "species.json", package = "dexdash") species_file
To check the species supported by dexdash, you can use:
list_species()
[1] "human" "mouse" "rat" "yeast" "roundworm" "slime mold"
[7] "fruit fly" "xenopus" "arabidopsis"
If your species is not listed, you need to create a JSON file in the same format, as the included file, and provide its location by argument species_file
in the download_functional_terms()
function.
Here is the entry for yeast in the JSON file:
{
"yeast": {
"ontology": {
"go": "sgd",
"reactome": "Saccharomyces cerevisiae",
"kegg": "sce"
},
"ensembl": {
"biomart": "ensembl",
"host": "https://www.ensembl.org",
"dataset": "scerevisiae_gene_ensembl"
}
}
}
It is divided in two sections:
"ontology"
: containing species designations needed to fetch data from GO, Reactome, and KEGG. These can be found by functions fenr::fetch_go_species(), fenr::fetch_reactome_species(), and fenr::fetch_kegg_species().
"ensembl"
: detailing the BioMart, host, and dataset information required by biomaRt::useMart()
, which varies for different organism groups such as plants and protists. These can be found using biomaRt::listMarts()
and biomaRt::listDatasets()
. For instance, this is the entry for Dictyostelium discoideum (commonly known as slime mold):
{
"slime mold": {
"ontology": {
"go": "dictybase",
"reactome": "Dictyostelium discoideum",
"kegg": "ddi"
},
"ensembl": {
"biomart": "protists_mart",
"host": "https://protists.ensembl.org",
"dataset": "ddiscoideum_eg_gene"
}
}
}
Simple example
In this example we are given only one data set containing read counts from an RNA-seq experiment (Gierlinski et al. 2015). It is a matrix yeast_mtx
, with rows corresponding to genes and columns corresponding to samples. It contains the same data as the yeast_data
object, but in wide, matrix format. We will demonstrate how to prepare all data required for dexdash from this one matrix.
We start with the packages needed for this example:
library(dexdash)
library(dexdata)
library(tibble)
library(dplyr)
library(tidyr)
library(forcats)
library(purrr)
library(stringr)
library(edgeR)
Count data
Here is the count matrix. This is the starting point for most of RNA-seq data analysis.
data(yeast_mtx)
dim(yeast_mtx)
[1] 6298 12
head(yeast_mtx)
Snf2-07 Snf2-21 Snf2-26 Snf2-27 Snf2-39 Snf2-40 WT-01 WT-10 WT-18 WT-29 WT-42 WT-43
YDL246C 7 4 5 0 0 3 1 1 2 2 1 3
YDL243C 411 273 356 151 423 250 192 191 183 160 154 206
YDR387C 1118 709 875 472 1242 706 377 555 576 497 469 523
YDL094C 62 62 17 22 56 22 3 22 29 19 26 18
YDR438W 488 296 408 248 537 267 153 239 211 272 208 199
YDR523C 64 38 52 43 78 48 17 88 63 90 38 20
Metadata
First, we create metadata with the design of the experiment. We extract strain designation (WT or Snf2) from sample names (column names of yeast_mtx
) and convert them into factors, with the first level WT
. As per dexdash requirements, metadata
contains a column called sample
and an additional column strain
:
<- tibble(sample = colnames(yeast_mtx)) |>
yeast_metadata separate_wider_delim(sample, names = c("strain", "replicate"), delim = "-", cols_remove = FALSE) |>
mutate(strain = fct_relevel(strain, "WT"))
yeast_metadata
# A tibble: 12 × 3
strain replicate sample
<fct> <chr> <chr>
1 Snf2 07 Snf2-07
2 Snf2 21 Snf2-21
3 Snf2 26 Snf2-26
4 Snf2 27 Snf2-27
5 Snf2 39 Snf2-39
6 Snf2 40 Snf2-40
7 WT 01 WT-01
8 WT 10 WT-10
9 WT 18 WT-18
10 WT 29 WT-29
11 WT 42 WT-42
12 WT 43 WT-43
We also create an additional column called replicate
. It can be used in the app to colour points in the feature plot.
Differential expression
Next, we carry out differential expression using edgeR. This is a simple case of two conditions (strains), each in six replicates. Again, we need specific column names for dexdash. If you use a different differential expression tool, the default column names would be different. In any case, they need to be id
, log_fc
, expr
, p_value
, and contrast
(see Table 1). Whichever DE tool you use, make sure to rename the columns before passing the result to dexdash
.
<- model.matrix(~strain, yeast_metadata)
design_mat
<- DGEList(yeast_mtx)
dge <- filterByExpr(dge, design_mat)
keep
<- dge[keep, ] |>
yeast_de normLibSizes() |>
glmQLFit(design = design_mat) |>
glmQLFTest(coef = "strainSnf2") |>
topTags(n = 1e16, sort.by = "none") |>
pluck("table") |>
as_tibble(rownames = "id") |>
add_column(contrast = "Snf2-WT") |>
rename(
log_fc = logFC,
expr = logCPM,
p_value = PValue
)
yeast_de
# A tibble: 5,999 × 7
id log_fc expr F p_value FDR contrast
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 YDL243C -0.0118 4.88 0.00723 0.933 0.952 Snf2-WT
2 YDR387C 0.00810 6.32 0.0143 0.907 0.934 Snf2-WT
3 YDL094C 0.295 1.87 0.782 0.391 0.490 Snf2-WT
4 YDR438W 0.0594 5.12 0.529 0.479 0.574 Snf2-WT
5 YDR523C -0.665 2.76 5.42 0.0350 0.0699 Snf2-WT
6 YDR492W -0.636 6.78 11.2 0.00468 0.0135 Snf2-WT
7 YDR018C 1.19 4.09 50.3 0.00000469 0.0000627 Snf2-WT
8 YDL189W 0.0568 7.16 0.608 0.448 0.545 Snf2-WT
9 YDR508C -0.0831 8.13 0.439 0.518 0.610 Snf2-WT
10 YDR462W -0.171 5.66 4.05 0.0632 0.112 Snf2-WT
# ℹ 5,989 more rows
Count data in long format
dexdash
requires count data in long format, with column names id
, sample
and value
. We create the appropriate data frame:
<- yeast_mtx |>
yeast_data as_tibble(rownames = "id") |>
pivot_longer(-id, names_to = "sample", values_to = "value")
yeast_data
# A tibble: 75,576 × 3
id sample value
<chr> <chr> <int>
1 YDL246C Snf2-07 7
2 YDL246C Snf2-21 4
3 YDL246C Snf2-26 5
4 YDL246C Snf2-27 0
5 YDL246C Snf2-39 0
6 YDL246C Snf2-40 3
7 YDL246C WT-01 1
8 YDL246C WT-10 1
9 YDL246C WT-18 2
10 YDL246C WT-29 2
# ℹ 75,566 more rows
Complete data set
Now the function dexdash_set()
can be used to create the full data set for dexdash:
<- dexdash_set(yeast_de, yeast_data, yeast_metadata, name = "Yeast") yeast_dexset
The name, "Yeast"
, will be shown in the data set drop-down menu in the app. In this case we have just one experiment, so the drop-down menu will contain only one element.
Feature names and descriptions
Next, we need a table converting gene identifiers into human-readable gene names and descriptions. This can be done with the helper function download_feature_information()
:
<- download_feature_information(species = "yeast") yeast_features
|>
yeast_features arrange(name)
# A tibble: 7,127 × 3
id name description
<chr> <chr> <chr>
1 Q0020 15S_RRNA Ribosomal RNA of the small mitochondrial ribosomal subunit
2 Q0158 21S_RRNA Mitochondrial 21S rRNA
3 YMR056C AAC1 Mitochondrial inner membrane ADP/ATP translocator
4 YBR085W AAC3 Mitochondrial inner membrane ADP/ATP translocator
5 YJR155W AAD10 Putative aryl-alcohol dehydrogenase
6 YNL331C AAD14 Putative aryl-alcohol dehydrogenase
7 YOL165C AAD15 Putative aryl-alcohol dehydrogenase
8 YCR107W AAD3 Putative aryl-alcohol dehydrogenase
9 YDL243C AAD4 Putative aryl-alcohol dehydrogenase
10 YFL056C AAD6 Putative aryl-alcohol dehydrogenase
# ℹ 7,117 more rows
Functional enrichment data
Finally, as indicated earlier, we need the functional enrichment data. This is done in two steps:
<- download_functional_terms("yeast")
yeast_terms <- prepare_functional_terms(yeast_terms, feature_name = "gene_id") yeast_fterms
Run dexdash
Now, we have all necessary data to launch dexdash
:
run_app(yeast_dexset, yeast_features, yeast_fterms)
By default, the x-axis varable in the feature plot is “sample”. This can be changed in the app by clicking on the gear icon and selecting a different x-axis variable. Alternatively, we can specify which of the metadata
columns we want to use when starting the app from the console:
run_app(yeast_dexset, yeast_features, yeast_fterms, x_variable = "strain")
Advanced example
Here we show an example of data, where feature identifiers do not correspond directly to those in the online databases. The data comes from a mouse RNA-seq experiment (originally from Blümli et al. (2021). Here we use only three time points.
Again, we need the following packages loaded:
library(dexdash)
library(dexdata)
library(tibble)
library(dplyr)
library(tidyr)
library(forcats)
library(purrr)
library(stringr)
library(edgeR)
Messing with identifiers
Struggling with identifiers is all in a day’s work of a bioinformatician. Alas, there are no unified gene of protein identifiers in this imperfect word, and getting data from various sources inevitable leads to inconsistencies. This is our read count matrix:
data(mouse_mtx)
head(mouse_mtx)
Samples
Tags 0h-1 0h-2 0h-3 0h-4 2h-1 2h-2 2h-3 2h-4 18h-1 18h-2 18h-3 18h-4
ENSMUSG00000051951.5 2 4 0 10 4 3 2 3 3 4 5 6
ENSMUSG00000103161.1 0 0 1 1 4 0 0 3 2 1 1 4
ENSMUSG00000102331.1 3 7 2 6 2 1 1 0 4 1 0 3
ENSMUSG00000025902.13 16 15 15 31 17 16 13 10 29 17 34 11
ENSMUSG00000098104.1 0 3 0 1 1 2 1 1 4 6 1 2
ENSMUSG00000103922.1 114 82 42 67 89 85 47 65 162 91 80 73
Genes are identified by Ensemble gene IDs. Let us have a look at the mouse functional term information, downloaded from GO, Reactome and KEGG. First, we download terms and mappings:
<- download_functional_terms(species = "mouse") mouse_terms
It is a nested list, the first level corresponds to the “ontology”, that is go
, reactome
and kegg
. Each ontology consists of two data frames, mapping
- containing feature-term mapping and terms
- containing term information. Let’s look at a random selection of rows in feature-to-term mapping tables:
set.seed(42)
$go$mapping |> slice_sample(n = 10) mouse_terms
# A tibble: 10 × 4
gene_symbol gene_id db_id term_id
<chr> <chr> <chr> <chr>
1 EHBP1 Flj21950 MGI:2667252 GO:0005654
2 DHX30 2810477H02Rik MGI:1920081 GO:0035770
3 ITGAV 1110004F14Rik MGI:96608 GO:0034684
4 FZD1 Fz1 MGI:1196625 GO:0030165
5 CSNK1G1 9130020E21Rik MGI:2660884 GO:0000166
6 EI24 PIG8 MGI:108090 GO:0005794
7 TMEM40 9030407H20Rik MGI:2137870 GO:0008150
8 KCND2 Kv4.2 MGI:102663 GO:0006813
9 EIF3E 48kDa MGI:99257 GO:0006412
10 RBBP5 4933411J24Rik MGI:1918367 GO:0042393
$reactome$mapping |> slice_sample(n = 10) mouse_terms
# A tibble: 10 × 3
gene_id gene_symbol term_id
<chr> <chr> <chr>
1 ENSMUSG00000074886 GRK6 R-MMU-418555
2 ENSMUST00000057580 MTOR R-MMU-380972
3 ENSMUSP00000128608 UNC13B R-MMU-181430
4 ENSMUSG00000042243.11 BC051665 R-MMU-8939242
5 ENSMUST00000212214 CSNK2A2 R-MMU-2514853
6 ENSMUSG00000056220 PLA2G4A R-MMU-1482922
7 ENSMUSP00000024810 FGD2 R-MMU-9013148
8 ENSMUSG00000032310 CYP1A2 R-MMU-5423646
9 ENSMUST00000111751 MYL2 R-MMU-390522
10 ENSMUST00000007757 TGFBR1 R-MMU-2173789
$kegg$mapping |> slice_sample(n = 10) mouse_terms
# A tibble: 10 × 3
gene_id gene_symbol term_id
<chr> <chr> <chr>
1 66171 PGLS mmu00030
2 16456 F11R mmu04514
3 14388 GAB1 mmu04014
4 12289 CACNA1D mmu04728
5 79459 ALDOART2 mmu00010
6 18125 NOS1 mmu05022
7 69146 GSDMD mmu04613
8 22408 WNT1 mmu05225
9 100727 UGT2B34 mmu00040
10 242519 IFNA12 mmu05165
We have a problem. Our data uses Ensembl IDs, but out of the three databases, only Reactome provides with Ensembl IDs. GO returns gene aliases and KEGG uses NCBI IDs. We need put all these data together. The only common identifier is the gene symbol returned by all three databases. To make it consistent with our data, we need to convert Ensembl IDs into gene symbols. For this, we use the function download_feature_information()
:
<- download_feature_information(species = "mouse") mouse_genes
mouse_genes
# A tibble: 57,180 × 3
id name description
<chr> <chr> <chr>
1 ENSMUSG00000000001 Gnai3 guanine nucleotide binding protein (G protein), alpha inhibiting 3
2 ENSMUSG00000000003 Pbsn probasin
3 ENSMUSG00000000028 Cdc45 cell division cycle 45
4 ENSMUSG00000000031 H19 H19, imprinted maternally expressed transcript
5 ENSMUSG00000000037 Scml2 Scm polycomb group protein like 2
6 ENSMUSG00000000049 Apoh apolipoprotein H
7 ENSMUSG00000000056 Narf nuclear prelamin A recognition factor
8 ENSMUSG00000000058 Cav2 caveolin 2
9 ENSMUSG00000000078 Klf6 Kruppel-like transcription factor 6
10 ENSMUSG00000000085 Scmh1 sex comb on midleg homolog 1
# ℹ 57,170 more rows
It returns a data frame linking Ensembl IDs and gene symbols. It also provides with gene descriptions, which we are going to need later. From this data frame we create a new one:
<- tibble(
id2symbol rowname = rownames(mouse_mtx),
id = rowname |>
str_remove("\\.\\d+$")
|>
) inner_join(mouse_genes, by = "id") |>
mutate(gene_id = toupper(name))
id2symbol
# A tibble: 28,133 × 5
rowname id name description gene_id
<chr> <chr> <chr> <chr> <chr>
1 ENSMUSG00000051951.5 ENSMUSG00000051951 Xkr4 X-linked Kx blood group related 4 XKR4
2 ENSMUSG00000103161.1 ENSMUSG00000103161 Gm38148 predicted gene, 38148 GM38148
3 ENSMUSG00000102331.1 ENSMUSG00000102331 Gm19938 predicted gene, 19938 GM19938
4 ENSMUSG00000025902.13 ENSMUSG00000025902 Sox17 SRY (sex determining region Y)-box … SOX17
5 ENSMUSG00000098104.1 ENSMUSG00000098104 Gm6085 predicted gene 6085 GM6085
6 ENSMUSG00000103922.1 ENSMUSG00000103922 Gm6123 predicted gene 6123 GM6123
7 ENSMUSG00000033845.13 ENSMUSG00000033845 Mrpl15 mitochondrial ribosomal protein L15 MRPL15
8 ENSMUSG00000102275.1 ENSMUSG00000102275 Gm37144 predicted gene, 37144 GM37144
9 ENSMUSG00000025903.14 ENSMUSG00000025903 Lypla1 lysophospholipase 1 LYPLA1
10 ENSMUSG00000033813.15 ENSMUSG00000033813 Tcea1 transcription elongation factor A (… TCEA1
# ℹ 28,123 more rows
It contains rownames from the count table (rowname
), the extracted Ensembl ID (id
), with gene annotation version numbers (e.g. .1
) removed. This data frame is joined with our gene data frame. In addition, we converted gene symbols into upper case (column gene_id
). This is another issue - different sources provide gene symbols in different capitalisation. Sometimes, the first letter is in capitals, sometimes the entire gene symbol is capitalised. To circumvent this issue our function download_functional_terms()
automatically capitalises all gene symbols, hence, we need to do the same in the id2symbol
structure.
Having prepared all of this, we can rename rows in the count table:
<- mouse_mtx[id2symbol$rowname, ]
renamed_mtx rownames(renamed_mtx) <- id2symbol$gene_id
head(renamed_mtx)
Samples
Tags 0h-1 0h-2 0h-3 0h-4 2h-1 2h-2 2h-3 2h-4 18h-1 18h-2 18h-3 18h-4
XKR4 2 4 0 10 4 3 2 3 3 4 5 6
GM38148 0 0 1 1 4 0 0 3 2 1 1 4
GM19938 3 7 2 6 2 1 1 0 4 1 0 3
SOX17 16 15 15 31 17 16 13 10 29 17 34 11
GM6085 0 3 0 1 1 2 1 1 4 6 1 2
GM6123 114 82 42 67 89 85 47 65 162 91 80 73
Now, we have consistent identifers in count data and functional enrichment data. We can now proceed.
Metadata
Like before, we create a metadata data frame with experiment design:
<- tibble(sample = colnames(renamed_mtx)) |>
mouse_metadata separate_wider_delim(sample, names = c("time_point", "replicate"), delim = "-", cols_remove = FALSE) |>
mutate(time_point = fct_relevel(time_point, c("0h", "2h", "18h")))
mouse_metadata
# A tibble: 12 × 3
time_point replicate sample
<fct> <chr> <chr>
1 0h 1 0h-1
2 0h 2 0h-2
3 0h 3 0h-3
4 0h 4 0h-4
5 2h 1 2h-1
6 2h 2 2h-2
7 2h 3 2h-3
8 2h 4 2h-4
9 18h 1 18h-1
10 18h 2 18h-2
11 18h 3 18h-3
12 18h 4 18h-4
Differential expression
Differential expression if carried out with edgeR, like for the yeast data, but this time we have more than one contrast. We select timepoint 0h
as the baseline and find contrasts between this and other time points.
<- model.matrix(~time_point, mouse_metadata)
design_mat <- colnames(design_mat)[-1]
coefficients
<- DGEList(renamed_mtx)
dge <- filterByExpr(dge, design_mat)
keep <- dge[keep, ] |>
fit normLibSizes() |>
glmQLFit(design = design_mat)
<- map(coefficients, function(coef) {
mouse_de |>
fit glmQLFTest(coef = coef) |>
topTags(n = 1e16, sort.by = "none") |>
pluck("table") |>
as_tibble(rownames = "id") |>
add_column(contrast = coef)
|>
}) list_rbind() |>
rename(
log_fc = logFC,
expr = logCPM,
p_value = PValue
)
mouse_de
# A tibble: 38,954 × 7
id log_fc expr F p_value FDR contrast
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 SOX17 -0.330 1.45 0.719 0.410 1.00 time_point2h
2 GM6123 0.0253 3.47 0.0121 0.914 1.00 time_point2h
3 MRPL15 -0.00572 7.54 0.00165 0.968 1.00 time_point2h
4 GM37144 0.549 3.32 3.32 0.0881 1.00 time_point2h
5 LYPLA1 0.181 6.63 0.783 0.390 1.00 time_point2h
6 TCEA1 0.00297 6.54 0.000521 0.982 1.00 time_point2h
7 GM6104 0.00878 2.12 0.000346 0.985 1.00 time_point2h
8 GM37277 0.00513 2.69 0.000125 0.991 1.00 time_point2h
9 RGS20 -0.344 2.49 0.635 0.438 1.00 time_point2h
10 GM16041 -0.452 1.15 0.925 0.352 1.00 time_point2h
# ℹ 38,944 more rows
Count data in long format
We need data in long format:
<- renamed_mtx |>
mouse_data as_tibble(rownames = "id") |>
pivot_longer(-id, names_to = "sample", values_to = "value")
mouse_data
# A tibble: 337,596 × 3
id sample value
<chr> <chr> <dbl>
1 XKR4 0h-1 2
2 XKR4 0h-2 4
3 XKR4 0h-3 0
4 XKR4 0h-4 10
5 XKR4 2h-1 4
6 XKR4 2h-2 3
7 XKR4 2h-3 2
8 XKR4 2h-4 3
9 XKR4 18h-1 3
10 XKR4 18h-2 4
# ℹ 337,586 more rows
Dexdash data set
<- dexdash_set(mouse_de, mouse_data, mouse_metadata, name = "Mouse") mouse_dexdash
Feature names and descriptions
For the Shiny app, a feature information table is needed to link feature identifiers (capitalised gene symbols), feature names (original gene symbols) and descriptions. The columns needed are described in Table 2. Here is how we prepare the required tibble:
<- mouse_genes |>
mouse_features mutate(id = toupper(name))
mouse_features
# A tibble: 57,180 × 3
id name description
<chr> <chr> <chr>
1 GNAI3 Gnai3 guanine nucleotide binding protein (G protein), alpha inhibiting 3
2 PBSN Pbsn probasin
3 CDC45 Cdc45 cell division cycle 45
4 H19 H19 H19, imprinted maternally expressed transcript
5 SCML2 Scml2 Scm polycomb group protein like 2
6 APOH Apoh apolipoprotein H
7 NARF Narf nuclear prelamin A recognition factor
8 CAV2 Cav2 caveolin 2
9 KLF6 Klf6 Kruppel-like transcription factor 6
10 SCMH1 Scmh1 sex comb on midleg homolog 1
# ℹ 57,170 more rows
Functional enrichment data
Finally, we need to prepare functional enrichment data in the fast enrichment format. We specify feature_name = "gene_symbol"
to inform the function which column should be used to identify features.
<- prepare_functional_terms(mouse_terms, feature_name = "gene_symbol") mouse_fterms
Run dexdash
Now, we have all necessary data to launch dexdash
:
run_app(mouse_dexdash, mouse_features, mouse_fterms)
Multiple data sets
The function dexdash_list()
can be used to combine a few data sets together, for example:
<- dexdash_set(de1, data1, metadata1, "Experiment 1")
set1 <- dexdash_set(de2, data2, metadata2, "Experiment 2")
set2
<- dexdash_list(se1, set2)
dexlist run_app(declist, features, fterms)
Here set1
and set2
contain data and differential expression results from two separate expreiments. They are, however, based on the same organism, so they share features
and fterms
.
Comparison with Other Tools
- iDEP is a flexible web-based tool that supports comprehensive data analysis, starting from the count matrix. It includes preprocessing, clustering, PCA, differential expression, KEGG pathway analysis, and more. While it generates volcano and MA plots, these lack fast interactivity. Specifically, enrichment analysis in iDEP is only performed on down- or up-regulated genes. In contrast, dexdash is smaller and more focused on interactivity, visualization, and fast functional enrichment on any arbitrary gene selection.
- ShinyGO is a web-based Shiny app for comprehensive pathway enrichment analysis. Users can submit a list of gene names for analysis, which then provides various outputs. In contrast, dexdash works with differential expression result sets and creates interactive volcano/MA plots for fast repeated gene selection and simple GO, KEGG, and Reactome enrichment analysis.
- Glimma is a Bioconductor R package designed to create interactive MDS, MA, and volcano plots. Users can explore gene expression patterns by selecting individual genes. In contrast, dexdash allows for multiple gene selection, functional enrichment analysis, and reverse selection of functional terms to find functionally related genes.
- ShinyNGS is a Shiny-based exploratory tool for RNA-seq data analysis. It includes a variety of interactive plots for quality control, differential expression, and gene-set enrichment. While dexdash shares some features, it offers fast interactive enrichment analysis, which is not provided by ShinyNGS.
- Other packages. VolcaNoseR is a simple customizable volcano plot with no additional functionality. Degust is a web-based tool that includes interactive volcano and MA plots offering KEGG pathway visualization but lacks functional enrichment. R packages clusterProfiler, ReactomePA, GOplot, and topGO provide robust tools for enrichment analysis but do not offer the interactive, user-driven experience found in dexdash.
Session info
R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] dexdata_0.1.0 dexdash_0.2.7 edgeR_4.2.1 limma_3.60.4 stringr_1.5.1 forcats_1.0.0
[7] purrr_1.0.2 tidyr_1.3.1 dplyr_1.1.4 tibble_3.2.1
loaded via a namespace (and not attached):
[1] utf8_1.2.4 generics_0.1.3 renv_1.0.7 stringi_1.8.3 lattice_0.22-6
[6] digest_0.6.35 magrittr_2.0.3 evaluate_0.23 grid_4.4.1 pkgload_1.4.0
[11] fastmap_1.2.0 jsonlite_1.8.8 processx_3.8.4 sessioninfo_1.2.2 pkgbuild_1.4.4
[16] ps_1.7.7 urlchecker_1.0.1 promises_1.3.0 fansi_1.0.6 cli_3.6.2
[21] shiny_1.9.1 rlang_1.1.3 splines_4.4.1 ellipsis_0.3.2 withr_3.0.0
[26] remotes_2.5.0 cachem_1.1.0 yaml_2.3.10 devtools_2.4.5 tools_4.4.1
[31] memoise_2.0.1 locfit_1.5-9.10 httpuv_1.6.15 assertthat_0.2.1 vctrs_0.6.5
[36] R6_2.5.1 mime_0.12 lifecycle_1.0.4 fs_1.6.3 htmlwidgets_1.6.4
[41] usethis_3.0.0 miniUI_0.1.1.1 fenr_1.2.1 callr_3.7.6 pkgconfig_2.0.3
[46] pillar_1.9.0 later_1.3.2 profvis_0.3.8 glue_1.7.0 Rcpp_1.0.13
[51] statmod_1.5.0 xfun_0.43 tidyselect_1.2.1 knitr_1.46 xtable_1.8-4
[56] htmltools_0.5.8.1 rmarkdown_2.27 compiler_4.4.1