library(dexdash)
This R package is designed for interactive exploration and analysis of differential expression (DE) results. It combines Shiny-based visualization tools with rapid functional enrichment analysis. Users can select genes from a volcano or MA plot to find their names, descriptions and plot expression patterns. The build-in fast functional enrichment (based on fenr
package) allows to see instantly which biological pathways and processes those genes are involved in, based on enrichment analysis of Gene Ontology (GO), KEGG, and Reactome databases. The input for the app consists of five data frames containing DE results, expression or abundance data, experimental design, gene names and descriptions. A build-in function allowes for easy downloading of functional term data. Once started, the app is fast and easy to use.
library(dexdash)
Imagine you’ve carried out an RNA-seq experiment, a proteomics study, or maybe another type of -omics research. You’ve done read mapping or protein quantifications and now have a set of data in hand. This data captures expression or abundance (we will call it ‘expression’ for simplicity) for each gene, protein, or other entity (hereinafter referred to as ‘feature’). These expressions are recorded across various biological conditions (which we’ll call ‘groups’) and their replicates. Following this, you’ve conducted a differential expression analysis using tools like limma, edgeR, DESeq2, or similar, ending up with a data frame that includes log-fold change, p-value, among other metrics.
Now, you’re ready to investigate the results. You want to generate a volcano or MA plot, quickly identify features of interest, and explore the functional similarities among groups of features that behave similarly. This is exactly what dexdash is for. Simply feed it your expression data, differential expression results, and some details about your experimental design and gene names, and voila! You’re equipped with a web-based interactive tool to explore your data.
dexdash can be installed from GitHub by using:
if (!require("remotes", quietly = TRUE))
install.packages("remotes")
::install_github("bartongroup/dexdash") remotes
library(dexdash)
# Data examples available in the package
data(yeast_de, yeast_data, yeast_metadata)
# The slow bit: download feature information and functional term data
<- download_feature_information(species = "yeast")
yeast_features <- download_functional_terms(species = "yeast")
yeast_terms <- prepare_functional_terms(terms, feature_name = "gene_id")
yeast_fterms
# The fast bit: interactive app
run_app(yeast_de, yeast_data, yeast_metadata, yeast_features, yeast_fterms)
The dexdash Shiny app is launched with a command:
run_app(de, data, metadata, features, fterms)
There are five mandatory arguments:
de
- data frame with differential expression results,data
- data frame with expression or abundance data,metadata
- data frame with the design of the experiment,features
- data frame with feature names and descriptions,fterms
- an object containing functional terms data.There are three data frames containing experimental results, provided by the user. As differential expression output differs from tool to tool, it would be difficult to feed them directly into dexdash. Therefore, the user needs to rename data frame columns as shown in Table 1. A fully worked example below demonstrates how to prepare the required data, step by step.
argument | column | type | description |
---|---|---|---|
de | id | character | feature identifier |
log_fc | numeric | log-fold change; use logFC from edgeR and limma, log2FoldChange from DESeq2 | |
expr | numeric | mean expression or abundance, typically in logarithmic scale; use logCPM from edgeR, log10(baseMean) from DESeq2, AveExpr from limma | |
p_value | numeric | p-value from statistical tests; use PValue from edgeR and limma, pvalue from DESeq2 | |
contrast | factor | name of the contrast; this data frame can contain results from multiple contrast | |
data | id | character | feature identifier |
sample | character | sample identifer | |
value | numeric | the expression, or abundance of the gene/protein for this sample | |
metadata | sample | character | sample identifer |
group | character/factor | a grouping variable, for example condition or treatment |
Feature (gene or protein) information should be prepared in a data frame with columns id
, name
and description
, as explained in Table 2.
argument | column | type | description |
---|---|---|---|
features | id | character | feature identifier |
name | character | a human-readable name of the feature, typically a gene symbol | |
decription | character | a brief description of the feature |
This data frame can be prepared by the user, but dexdash includes a helper function to download these data from Ensembl:
<- download_feature_information(species = "yeast") yeast_features
The necessary information to connect to the correct Ensembl dataset is stored in a JSON file attached to the package (see Section 4.4 for details).
To perform fast interactive functional enrichment, dexdash requires data downloaded from on-line databases and prepared for fast enrichment. This can be achieved with the following functions:
<- download_functional_terms(species = "yeast")
yeast_terms <- prepare_functional_terms(terms, feature_name = "gene_id") yeast_fterms
The first function downloads functional term data and term-feature mapping from on-line databases and returns them in a format of two data frame (tibbles). These are user-readable and can be accessed to check if data conforms to the required format. The argument species = "yeast"
refers to the entry in the JSON file attached to the package (see Section 4.4 for details).
The second function converts data frames into an object required by dexdash for fast functional enrichment (based on Bioconductor’s fenr package). The second argument, feature_name
, specifies the name of the column in the mapping
tibble to be used as the feature identifier. See the vignette of fenr
package for more details. feature_name
can be either "gene_symbol"
or "gene_id"
. If your data contain gene symbols (e.g. "BRCA1"
or "FOXP1"
), use feature_name = "gene_symbol"
. If your data contain Ensembl identifiers (e.g. "ENSG00000012048"
or "ENSG00000114861"
), use feature_name = "gene_id"
.
On-line databases (GO, Reactome, KEGG and Ensembl) can be accessed by specifying a species designation (functional terms data) and dataset (Ensembl). In many cases these names are not obvious to the user. For example, species designations for yeast are “sgd”, “Saccharomyces cerevisiae”, and “sce” for GO, Reactome and KEGG databases. There is a small JSON file attached to the package that contains these designations and Ensembl biomart, host and dataset names for a selection of species. The file can be found at this path on the local machine:
<- system.file("extdata", "species.json", package = "dexdash") species_file
You can quickly check which species are available using a function
list_species()
#> [1] "human" "mouse" "rat" "yeast"
#> [5] "roundworm" "slime mold" "fruit fly" "xenopus"
#> [9] "arabidopsis"
If your species is not included, you need to create a JSON file in the same format, as the included file, and provide its location by argument species_file
in the download_functional_terms()
function.
Here is the entry for yeast:
{
"yeast": {
"ontology": {
"go": "sgd",
"reactome": "Saccharomyces cerevisiae",
"kegg": "sce"
},
"ensembl": {
"biomart": "ensembl",
"host": "https://www.ensembl.org",
"dataset": "scerevisiae_gene_ensembl"
}
}
}
The “ontology” field contains species designations for on-line databases. The species designations can be found using fenr::fetch_go_species()
, fenr::fetch_reactome_species()
and fenr::fetch_kegg_species()
. These three functions return data frames, where column designation
contains the species designation required.
The “ensembl” field contains the biomart, the host and the dataset used by the BiomaRt::useMart()
function. These are different for plants and protists. For example, this is the entry for Dictyostelium discoideum (we call it “slime mold”):
{
"slime mold": {
"ontology": {
"go": "dictybase",
"reactome": "Dictyostelium discoideum",
"kegg": "ddi"
},
"ensembl": {
"biomart": "protists_mart",
"host": "https://protists.ensembl.org",
"dataset": "ddiscoideum_eg_gene"
}
}
}
We start with the count data from an RNA-seq experiment. Let’s call it yeast_mtx
. It is a matrix with rows corresponding to genes and columns corresponding to samples. It contains the same data as the yeast_data
object, but in wide, matrix format.
data(yeast_mtx)
dim(yeast_mtx)
#> [1] 6298 12
head(yeast_mtx)
#> Snf2-07 Snf2-21 Snf2-26 Snf2-27 Snf2-39 Snf2-40 WT-01 WT-10
#> YDL246C 7 4 5 0 0 3 1 1
#> YDL243C 411 273 356 151 423 250 192 191
#> YDR387C 1118 709 875 472 1242 706 377 555
#> YDL094C 62 62 17 22 56 22 3 22
#> YDR438W 488 296 408 248 537 267 153 239
#> YDR523C 64 38 52 43 78 48 17 88
#> WT-18 WT-29 WT-42 WT-43
#> YDL246C 2 2 1 3
#> YDL243C 183 160 154 206
#> YDR387C 576 497 469 523
#> YDL094C 29 19 26 18
#> YDR438W 211 272 208 199
#> YDR523C 63 90 38 20
First, we create a metadata data frame with grouping information. We extract group designation (WT or Snf2) form sample names and convert them into factors, with the first level WT
. As per dexdash requirements, the column names of this data frame are sample
and group
:
library(tibble)
library(dplyr)
library(tidyr)
library(forcats)
<- tibble(sample = colnames(yeast_mtx)) |>
yeast_metadata mutate(group = str_extract(sample, "^.+(?=-)")) |>
mutate(group = fct_relevel(group, "WT"))
yeast_metadata#> # A tibble: 12 × 2
#> sample group
#> <chr> <fct>
#> 1 Snf2-07 Snf2
#> 2 Snf2-21 Snf2
#> 3 Snf2-26 Snf2
#> 4 Snf2-27 Snf2
#> 5 Snf2-39 Snf2
#> 6 Snf2-40 Snf2
#> 7 WT-01 WT
#> 8 WT-10 WT
#> 9 WT-18 WT
#> 10 WT-29 WT
#> 11 WT-42 WT
#> 12 WT-43 WT
Next, we carry out differential expression using edgeR. This is a simple case of two conditions (groups), each in six replicates. Again, we need specific column names for dexdash. If you use a different differential expression tool, the default column names would be different. In any case, they need to be id
, log_fc
, expr
, p_value
, and contrast
. Whichever DE tool you use, make sure to rename the columns before passing the result to dexdash
.
library(edgeR)
library(purrr)
<- model.matrix(~group, yeast_metadata)
design_mat <- yeast_mtx |>
yeast_de DGEList() |>
calcNormFactors() |>
estimateDisp(design = design_mat) |>
glmQLFit(design = design_mat) |>
glmQLFTest(coef = "groupSnf2") |>
topTags(n = 1e16, sort.by = "none") |>
pluck("table") |>
as_tibble(rownames = "id") |>
add_column(contrast = "Snf2-WT") |>
rename(
log_fc = logFC,
expr = logCPM,
p_value = PValue
)
yeast_de#> # A tibble: 6,298 × 7
#> id log_fc expr F p_value FDR contrast
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 YDL246C 0.172 -0.937 0.0645 0.803 0.855 Snf2-WT
#> 2 YDL243C -0.0107 4.88 0.00599 0.939 0.958 Snf2-WT
#> 3 YDR387C 0.00915 6.32 0.0183 0.894 0.924 Snf2-WT
#> 4 YDL094C 0.301 1.89 0.791 0.388 0.493 Snf2-WT
#> 5 YDR438W 0.0602 5.12 0.541 0.474 0.574 Snf2-WT
#> 6 YDR523C -0.650 2.75 5.00 0.0414 0.0824 Snf2-WT
#> 7 YDR492W -0.636 6.78 11.3 0.00445 0.0134 Snf2-WT
#> 8 YDR018C 1.19 4.11 49.6 0.00000465 0.0000660 Snf2-WT
#> 9 YDL189W 0.0578 7.16 0.617 0.445 0.547 Snf2-WT
#> 10 YDR508C -0.0823 8.13 0.426 0.524 0.621 Snf2-WT
#> # ℹ 6,288 more rows
dexdash
requires count data in long format, with column names id
, sample
and value
. We create the appropriate data frame:
<- yeast_mtx |>
yeast_data as_tibble(rownames = "id") |>
pivot_longer(-id, names_to = "sample", values_to = "value")
yeast_data#> # A tibble: 75,576 × 3
#> id sample value
#> <chr> <chr> <int>
#> 1 YDL246C Snf2-07 7
#> 2 YDL246C Snf2-21 4
#> 3 YDL246C Snf2-26 5
#> 4 YDL246C Snf2-27 0
#> 5 YDL246C Snf2-39 0
#> 6 YDL246C Snf2-40 3
#> 7 YDL246C WT-01 1
#> 8 YDL246C WT-10 1
#> 9 YDL246C WT-18 2
#> 10 YDL246C WT-29 2
#> # ℹ 75,566 more rows
Next, we need a table convertinggene identifiers into human-readable gene names and descriptions. This can be done with the helper function download_feature_information()
:
<- download_feature_information(species = "yeast") yeast_features
|>
yeast_features arrange(name)
#> # A tibble: 7,127 × 3
#> id name description
#> <chr> <chr> <chr>
#> 1 Q0020 15S_RRNA Ribosomal RNA of the small mitochondrial ribosomal …
#> 2 Q0158 21S_RRNA Mitochondrial 21S rRNA
#> 3 YMR056C AAC1 Mitochondrial inner membrane ADP/ATP translocator
#> 4 YBR085W AAC3 Mitochondrial inner membrane ADP/ATP translocator
#> 5 YJR155W AAD10 Putative aryl-alcohol dehydrogenase
#> 6 YNL331C AAD14 Putative aryl-alcohol dehydrogenase
#> 7 YOL165C AAD15 Putative aryl-alcohol dehydrogenase
#> 8 YCR107W AAD3 Putative aryl-alcohol dehydrogenase
#> 9 YDL243C AAD4 Putative aryl-alcohol dehydrogenase
#> 10 YFL056C AAD6 Putative aryl-alcohol dehydrogenase
#> # ℹ 7,117 more rows
Finally, as indicated earlier, we need the functional enrichment data. This is done in two steps:
<- download_functional_terms("yeast")
yeast_terms <- prepare_functional_terms(yeast_terms, feature_name = "gene_id") yeast_fterms
Now, we have all necessary data to launch dexdash
:
run_app(yeast_de, yeast_data, yeast_metadata, yeast_features, yeast_fterms)
Struggling with identifiers is all in a day’s work of a bioinformatician. Alas, there are no unified gene of protein identifiers in this imperfect word, and getting data from various sources inevitable leads to inconsistencies. Below, we show an example of a typical dexdash workflow, which requires a little bit more effort.
Consider a count matrix from a mouse RNA-seq experiment (originally from Blümli et al. (2021), here we use only three time points and KO):
data(mouse_mtx)
head(mouse_mtx)
#> Samples
#> Tags 0h-1 0h-2 0h-3 0h-4 2h-1 2h-2 2h-3 2h-4 18h-1
#> ENSMUSG00000064842.1 0 0 0 0 0 0 0 0 0
#> ENSMUSG00000051951.5 2 4 0 10 4 3 2 3 3
#> ENSMUSG00000102851.1 0 0 0 0 0 0 0 0 0
#> ENSMUSG00000103377.1 0 0 2 0 0 0 2 1 0
#> ENSMUSG00000104017.1 0 0 0 3 0 0 0 1 0
#> ENSMUSG00000103025.1 1 0 0 1 0 0 0 1 0
#> Samples
#> Tags 18h-2 18h-3 18h-4 KO-1 KO-2 KO-3 KO-4
#> ENSMUSG00000064842.1 0 0 0 0 0 0 0
#> ENSMUSG00000051951.5 4 5 6 3 5 1 8
#> ENSMUSG00000102851.1 0 0 0 0 0 0 0
#> ENSMUSG00000103377.1 1 0 1 0 1 0 0
#> ENSMUSG00000104017.1 2 0 0 0 0 1 1
#> ENSMUSG00000103025.1 0 0 1 0 3 0 1
Genes are identified by Ensemble gene IDs. Let us have a look at the mouse functional term information, downloaded from GO, Reactome and KEGG. First, we download terms and mappings:
<- download_functional_terms(species = "mouse") mouse_terms
Now, we look at a random selection of rows in feature-to-term mapping tables:
set.seed(42)
$go$mapping |> sample_n(10)
mouse_terms#> # A tibble: 10 × 4
#> gene_symbol gene_id db_id term_id
#> <chr> <chr> <chr> <chr>
#> 1 EHBP1 Flj21950 MGI:2667252 GO:0005654
#> 2 DHX30 2810477H02Rik MGI:1920081 GO:0035770
#> 3 ITGAV 1110004F14Rik MGI:96608 GO:0034684
#> 4 FZD1 Fz1 MGI:1196625 GO:0030165
#> 5 CSNK1G1 9130020E21Rik MGI:2660884 GO:0000166
#> 6 EI24 PIG8 MGI:108090 GO:0005794
#> 7 TMEM40 9030407H20Rik MGI:2137870 GO:0008150
#> 8 KCND2 Kv4.2 MGI:102663 GO:0006813
#> 9 EIF3E 48kDa MGI:99257 GO:0006412
#> 10 RBBP5 4933411J24Rik MGI:1918367 GO:0042393
$reactome$mapping |> sample_n(10)
mouse_terms#> # A tibble: 10 × 3
#> gene_id gene_symbol term_id
#> <chr> <chr> <chr>
#> 1 ENSMUSG00000074886 GRK6 R-MMU-418555
#> 2 ENSMUST00000057580 MTOR R-MMU-380972
#> 3 ENSMUSP00000128608 UNC13B R-MMU-181430
#> 4 ENSMUSG00000042243.11 BC051665 R-MMU-8939242
#> 5 ENSMUST00000212214 CSNK2A2 R-MMU-2514853
#> 6 ENSMUSG00000056220 PLA2G4A R-MMU-1482922
#> 7 ENSMUSP00000024810 FGD2 R-MMU-9013148
#> 8 ENSMUSG00000032310 CYP1A2 R-MMU-5423646
#> 9 ENSMUST00000111751 MYL2 R-MMU-390522
#> 10 ENSMUST00000007757 TGFBR1 R-MMU-2173789
$kegg$mapping |> sample_n(10)
mouse_terms#> # A tibble: 10 × 3
#> gene_id gene_symbol term_id
#> <chr> <chr> <chr>
#> 1 66171 PGLS mmu00030
#> 2 16456 F11R mmu04514
#> 3 14388 GAB1 mmu04014
#> 4 12289 CACNA1D mmu04728
#> 5 79459 ALDOART2 mmu00010
#> 6 18125 NOS1 mmu05022
#> 7 69146 GSDMD mmu04613
#> 8 22408 WNT1 mmu05225
#> 9 100727 UGT2B34 mmu00040
#> 10 242519 IFNA12 mmu05165
We have a problem. Our data uses Ensembl IDs, but out of the three databases, only Reactome provides with Ensembl IDs. GO returns gene aliases and KEGG uses NCBI IDs. We need put all these data together. The only common identifier is the gene symbol, returned by all three databases. To make it consistent with our data, we need to convert Ensembl IDs into gene symbols. For this, we use the function download_feature_information()
:
<- download_feature_information(species = "mouse") mouse_genes
mouse_genes#> # A tibble: 57,180 × 3
#> id name description
#> <chr> <chr> <chr>
#> 1 ENSMUSG00000000001 Gnai3 guanine nucleotide binding protein (G prote…
#> 2 ENSMUSG00000000003 Pbsn probasin
#> 3 ENSMUSG00000000028 Cdc45 cell division cycle 45
#> 4 ENSMUSG00000000031 H19 H19, imprinted maternally expressed transcr…
#> 5 ENSMUSG00000000037 Scml2 Scm polycomb group protein like 2
#> 6 ENSMUSG00000000049 Apoh apolipoprotein H
#> 7 ENSMUSG00000000056 Narf nuclear prelamin A recognition factor
#> 8 ENSMUSG00000000058 Cav2 caveolin 2
#> 9 ENSMUSG00000000078 Klf6 Kruppel-like transcription factor 6
#> 10 ENSMUSG00000000085 Scmh1 sex comb on midleg homolog 1
#> # ℹ 57,170 more rows
It returns a data frame linking Ensembl IDs and gene symbols. It also provides with gene descriptions, which we are going to need later. From this data frame we create a new one:
<- tibble(
id2symbol rowname = rownames(mouse_mtx),
id = rowname |>
str_remove("\\.\\d+$")
|>
) inner_join(mouse_genes, by = "id") |>
mutate(gene_id = toupper(name))
id2symbol#> # A tibble: 50,892 × 5
#> rowname id name description gene_id
#> <chr> <chr> <chr> <chr> <chr>
#> 1 ENSMUSG00000064842.1 ENSMUSG00000064842 Gm26206 predicted ge… GM26206
#> 2 ENSMUSG00000051951.5 ENSMUSG00000051951 Xkr4 X-linked Kx … XKR4
#> 3 ENSMUSG00000102851.1 ENSMUSG00000102851 Gm18956 predicted ge… GM18956
#> 4 ENSMUSG00000103377.1 ENSMUSG00000103377 Gm37180 predicted ge… GM37180
#> 5 ENSMUSG00000104017.1 ENSMUSG00000104017 Gm37363 predicted ge… GM37363
#> 6 ENSMUSG00000103025.1 ENSMUSG00000103025 Gm37686 predicted ge… GM37686
#> 7 ENSMUSG00000089699.1 ENSMUSG00000089699 Gm1992 predicted ge… GM1992
#> 8 ENSMUSG00000103201.1 ENSMUSG00000103201 Gm37329 predicted ge… GM37329
#> 9 ENSMUSG00000103147.1 ENSMUSG00000103147 Gm7341 predicted ge… GM7341
#> 10 ENSMUSG00000103161.1 ENSMUSG00000103161 Gm38148 predicted ge… GM38148
#> # ℹ 50,882 more rows
It contains the exact rowname from the count table (rowname
), the extracted Ensembl ID (id
), with version number (e.g. .1
) removed, which is joined with our gene data frame. In addition, we converted gene symbols into upper case (column gene_id
). This is another issue - different sources provide gene symbols in different capitalisation. Sometimes, the first letter is in capitals, sometimes the entire gene symbol is capitalised. To circumvent this issue our function download_functional_terms()
automatically capitalises all gene symbols, hence, we need to do the same in the id2symbol
structure.
Having prepared all of this, we can rename rows in the count table:
<- mouse_mtx[id2symbol$rowname, ]
renamed_mtx rownames(renamed_mtx) <- id2symbol$gene_id
head(renamed_mtx)
#> Samples
#> Tags 0h-1 0h-2 0h-3 0h-4 2h-1 2h-2 2h-3 2h-4 18h-1 18h-2 18h-3
#> GM26206 0 0 0 0 0 0 0 0 0 0 0
#> XKR4 2 4 0 10 4 3 2 3 3 4 5
#> GM18956 0 0 0 0 0 0 0 0 0 0 0
#> GM37180 0 0 2 0 0 0 2 1 0 1 0
#> GM37363 0 0 0 3 0 0 0 1 0 2 0
#> GM37686 1 0 0 1 0 0 0 1 0 0 0
#> Samples
#> Tags 18h-4 KO-1 KO-2 KO-3 KO-4
#> GM26206 0 0 0 0 0
#> XKR4 6 3 5 1 8
#> GM18956 0 0 0 0 0
#> GM37180 1 0 1 0 0
#> GM37363 0 0 0 1 1
#> GM37686 1 0 3 0 1
Now, we have consistent identifers in count data and functional enrichment data. We can now proceed.
Like before, we create a metadata data frame with experiment grouping:
<- tibble(sample = colnames(renamed_mtx)) |>
mouse_metadata mutate(group = str_extract(sample, "^.+(?=-)")) |>
mutate(group = fct_relevel(group, c("0h", "2h", "18h")))
mouse_metadata#> # A tibble: 16 × 2
#> sample group
#> <chr> <fct>
#> 1 0h-1 0h
#> 2 0h-2 0h
#> 3 0h-3 0h
#> 4 0h-4 0h
#> 5 2h-1 2h
#> 6 2h-2 2h
#> 7 2h-3 2h
#> 8 2h-4 2h
#> 9 18h-1 18h
#> 10 18h-2 18h
#> 11 18h-3 18h
#> 12 18h-4 18h
#> 13 KO-1 KO
#> 14 KO-2 KO
#> 15 KO-3 KO
#> 16 KO-4 KO
Differential expression if carried out with edgeR, like for the yeast data, but this time we have more than one contrast. We select timepoint 0h
as the baseline and find contrasts between this and other time points.
library(edgeR)
library(purrr)
<- model.matrix(~group, mouse_metadata)
design_mat <- colnames(design_mat)[-1]
coefficients <- renamed_mtx |>
fit DGEList() |>
calcNormFactors() |>
estimateDisp(design = design_mat) |>
glmQLFit(design = design_mat)
<- map(coefficients, function(coef) {
mouse_de |>
fit glmQLFTest(coef = coef) |>
topTags(n = 1e16, sort.by = "none") |>
pluck("table") |>
as_tibble(rownames = "id") |>
add_column(contrast = coef)
|>
}) list_rbind() |>
rename(
log_fc = logFC,
expr = logCPM,
p_value = PValue
)
mouse_de#> # A tibble: 152,676 × 7
#> id log_fc expr F p_value FDR contrast
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 GM26206 0 -1.93 0 1 1 group2h
#> 2 XKR4 -0.221 -0.367 0.106 0.746 1 group2h
#> 3 GM18956 0 -1.93 0 1 1 group2h
#> 4 GM37180 0.578 -1.61 0.188 0.666 1 group2h
#> 5 GM37363 -1.10 -1.61 0.591 0.445 1 group2h
#> 6 GM37686 -0.644 -1.61 0.196 0.660 1 group2h
#> 7 GM1992 0 -1.93 0 1 1 group2h
#> 8 GM37329 1.31 -1.58 0.881 0.352 1 group2h
#> 9 GM7341 0.772 -1.68 0.283 0.597 1 group2h
#> 10 GM38148 1.63 -1.29 2.04 0.159 1 group2h
#> # ℹ 152,666 more rows
We need data in long format:
<- renamed_mtx |>
mouse_data as_tibble(rownames = "id") |>
pivot_longer(-id, names_to = "sample", values_to = "value")
mouse_data#> # A tibble: 814,272 × 3
#> id sample value
#> <chr> <chr> <dbl>
#> 1 GM26206 0h-1 0
#> 2 GM26206 0h-2 0
#> 3 GM26206 0h-3 0
#> 4 GM26206 0h-4 0
#> 5 GM26206 2h-1 0
#> 6 GM26206 2h-2 0
#> 7 GM26206 2h-3 0
#> 8 GM26206 2h-4 0
#> 9 GM26206 18h-1 0
#> 10 GM26206 18h-2 0
#> # ℹ 814,262 more rows
For the Shiny app, a feature information table is needed to link feature identifiers (capitalised gene symbols), feature names (original gene symbols) and descriptions. The columns needed are described in Table 2. Here is how we prepare the required tibble:
<- mouse_genes |>
mouse_features mutate(id = toupper(name))
mouse_features#> # A tibble: 57,180 × 3
#> id name description
#> <chr> <chr> <chr>
#> 1 GNAI3 Gnai3 guanine nucleotide binding protein (G protein), alpha in…
#> 2 PBSN Pbsn probasin
#> 3 CDC45 Cdc45 cell division cycle 45
#> 4 H19 H19 H19, imprinted maternally expressed transcript
#> 5 SCML2 Scml2 Scm polycomb group protein like 2
#> 6 APOH Apoh apolipoprotein H
#> 7 NARF Narf nuclear prelamin A recognition factor
#> 8 CAV2 Cav2 caveolin 2
#> 9 KLF6 Klf6 Kruppel-like transcription factor 6
#> 10 SCMH1 Scmh1 sex comb on midleg homolog 1
#> # ℹ 57,170 more rows
Finally, we need to prepare functional enrichment data in the fast enrichment format. We specify feature_name = "gene_symbol"
to inform the function which column should be used to identify features.
<- prepare_functional_terms(mouse_terms, feature_name = "gene_symbol") mouse_fterms
Now, we have all necessary data to launch dexdash
:
run_app(mouse_de, mouse_data, mouse_metadata, mouse_features, mouse_fterms)
iDEP is a flexible web-based tool that supports full data analysis, starting from the count matrix. It includes pre-processing, clustering, PCA, differential expression, KEGG pathway analysis and more. While it generates volcano and MA plots, they lack fast interactivity. In particular, pathway analysis in iDEP if only performed on down- or up-regulated genes. In contrast, dexdash is smaller and more focussed on interactivity, visualisation and fast functional enrichment on any arbitrary gene selection. Additionally, dexdash suports GO and Reactome functional terms, alongside with KEGG.
ShinyGO is a web-base Shiny app for comprehensive KEGG pathway enrichment analysis. Users can submit a list of gene names for analysis, which then provides various outputs. In contrast, dexdash works on differential expression result set and creates interactive volcano/MA plots for fast repeated gene selection and simple GO, KEGG and Reactome enrichment analysis.
Glimma is a Bioconductor R package designed to create interactive MDS, MA and volcano plots. Users can explore gene expression patterns by selecting individual genes. By contrast, dexdash allows for multiple gene selection, functional enrichment analysis and reverse selection of functional terms to find functionally related genes.
VolcaNoseR is a simple customisable volcano plot with no additional functionality. clusterProfiler, ReactomePA, GOplot and topGO provide robust tools for enrichment analysis but do not offer the interactive, user-driven experience found in dexdash.
#> R version 4.4.0 (2024-04-24)
#> Platform: x86_64-apple-darwin20
#> Running under: macOS Sonoma 14.4.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Europe/London
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods
#> [7] base
#>
#> other attached packages:
#> [1] dexdash_0.1.2 edgeR_4.0.16 limma_3.58.1 stringr_1.5.1
#> [5] forcats_1.0.0 purrr_1.0.2 tidyr_1.3.1 dplyr_1.1.4
#> [9] tibble_3.2.1
#>
#> loaded via a namespace (and not attached):
#> [1] jsonlite_1.8.8 compiler_4.4.0 tidyselect_1.2.1
#> [4] Rcpp_1.0.12 assertthat_0.2.1 splines_4.4.0
#> [7] yaml_2.3.8 fastmap_1.1.1 statmod_1.5.0
#> [10] lattice_0.22-6 R6_2.5.1 generics_0.1.3
#> [13] knitr_1.46 pillar_1.9.0 rlang_1.1.3
#> [16] utf8_1.2.4 stringi_1.8.3 xfun_0.43
#> [19] cli_3.6.2 withr_3.0.0 magrittr_2.0.3
#> [22] digest_0.6.35 grid_4.4.0 locfit_1.5-9.9
#> [25] rstudioapi_0.16.0 fenr_1.1.11 lifecycle_1.0.4
#> [28] vctrs_0.6.5 evaluate_0.23 glue_1.7.0
#> [31] fansi_1.0.6 rmarkdown_2.26 tools_4.4.0
#> [34] pkgconfig_2.0.3 htmltools_0.5.8.1