Dexdash: Differential EXpression Downstream Analysis SHiny app

Abstract

This R package is designed for interactive exploration and analysis of differential expression (DE) results. It combines Shiny-based visualization tools with rapid functional enrichment analysis. Users can select genes from a volcano or MA plot to find their names, descriptions and plot expression patterns. The build-in fast functional enrichment (based on fenr package) allows to see instantly which biological pathways and processes those genes are involved in, based on enrichment analysis of Gene Ontology (GO), KEGG, and Reactome databases. The input for the app consists of five data frames containing DE results, expression or abundance data, experimental design, gene names and descriptions. A build-in function allowes for easy downloading of functional term data. Once started, the app is fast and easy to use.

library(dexdash)

Overview

Imagine you’ve carried out an RNA-seq experiment, a proteomics study, or maybe another type of -omics research. You’ve done read mapping or protein quantifications and now have a set of data in hand. This data captures expression or abundance (we will call it ‘expression’ for simplicity) for each gene, protein, or other entity (hereinafter referred to as ‘feature’). These expressions are recorded across various biological conditions (which we’ll call ‘groups’) and their replicates. Following this, you’ve conducted a differential expression analysis using tools like limma, edgeR, DESeq2, or similar, ending up with a data frame that includes log-fold change, p-value, among other metrics.

Now, you’re ready to investigate the results. You want to generate a volcano or MA plot, quickly identify features of interest, and explore the functional similarities among groups of features that behave similarly. This is exactly what dexdash is for. Simply feed it your expression data, differential expression results, and some details about your experimental design and gene names, and voila! You’re equipped with a web-based interactive tool to explore your data.

Screenshot of dexdash Shiny app.

Installation

dexdash can be installed from GitHub by using:

if (!require("remotes", quietly = TRUE))
  install.packages("remotes")

remotes::install_github("bartongroup/dexdash")

Quick example

library(dexdash)

# Data examples available in the package
data(yeast_de, yeast_data, yeast_metadata)

# The slow bit: download feature information and functional term data
yeast_features <- download_feature_information(species = "yeast")
yeast_terms <- download_functional_terms(species = "yeast")
yeast_fterms <- prepare_functional_terms(terms, feature_name = "gene_id")

# The fast bit: interactive app
run_app(yeast_de, yeast_data, yeast_metadata, yeast_features, yeast_fterms)

Usage

The dexdash Shiny app is launched with a command:

run_app(de, data, metadata, features, fterms)

There are five mandatory arguments:

User data: experimental results

There are three data frames containing experimental results, provided by the user. As differential expression output differs from tool to tool, it would be difficult to feed them directly into dexdash. Therefore, the user needs to rename data frame columns as shown in Table 1. A fully worked example below demonstrates how to prepare the required data, step by step.

Table 1: Column names, types and descriptions for data frames with differential expression results (de), expression data (data) and experimental design (metadata).
argument column type description
de id character feature identifier
log_fc numeric log-fold change; use logFC from edgeR and limma, log2FoldChange from DESeq2
expr numeric mean expression or abundance, typically in logarithmic scale; use logCPM from edgeR, log10(baseMean) from DESeq2, AveExpr from limma
p_value numeric p-value from statistical tests; use PValue from edgeR and limma, pvalue from DESeq2
contrast factor name of the contrast; this data frame can contain results from multiple contrast
data id character feature identifier
sample character sample identifer
value numeric the expression, or abundance of the gene/protein for this sample
metadata sample character sample identifer
group character/factor a grouping variable, for example condition or treatment

Feature information

Feature (gene or protein) information should be prepared in a data frame with columns id, name and description, as explained in Table 2.

Table 2: Column names, types and descriptions for data frame with feature (gene or protein) information.
argument column type description
features id character feature identifier
name character a human-readable name of the feature, typically a gene symbol
decription character a brief description of the feature

This data frame can be prepared by the user, but dexdash includes a helper function to download these data from Ensembl:

yeast_features <- download_feature_information(species = "yeast")

The necessary information to connect to the correct Ensembl dataset is stored in a JSON file attached to the package (see Section 4.4 for details).

Functional term data

To perform fast interactive functional enrichment, dexdash requires data downloaded from on-line databases and prepared for fast enrichment. This can be achieved with the following functions:

yeast_terms <- download_functional_terms(species = "yeast")
yeast_fterms <- prepare_functional_terms(terms, feature_name = "gene_id")

The first function downloads functional term data and term-feature mapping from on-line databases and returns them in a format of two data frame (tibbles). These are user-readable and can be accessed to check if data conforms to the required format. The argument species = "yeast" refers to the entry in the JSON file attached to the package (see Section 4.4 for details).

The second function converts data frames into an object required by dexdash for fast functional enrichment (based on Bioconductor’s fenr package). The second argument, feature_name, specifies the name of the column in the mapping tibble to be used as the feature identifier. See the vignette of fenr package for more details. feature_name can be either "gene_symbol" or "gene_id". If your data contain gene symbols (e.g. "BRCA1" or "FOXP1"), use feature_name = "gene_symbol". If your data contain Ensembl identifiers (e.g. "ENSG00000012048" or "ENSG00000114861"), use feature_name = "gene_id".

Species JSON file

On-line databases (GO, Reactome, KEGG and Ensembl) can be accessed by specifying a species designation (functional terms data) and dataset (Ensembl). In many cases these names are not obvious to the user. For example, species designations for yeast are “sgd”, “Saccharomyces cerevisiae”, and “sce” for GO, Reactome and KEGG databases. There is a small JSON file attached to the package that contains these designations and Ensembl biomart, host and dataset names for a selection of species. The file can be found at this path on the local machine:

species_file <- system.file("extdata", "species.json", package = "dexdash")

You can quickly check which species are available using a function

list_species()
#> [1] "human"       "mouse"       "rat"         "yeast"      
#> [5] "roundworm"   "slime mold"  "fruit fly"   "xenopus"    
#> [9] "arabidopsis"

If your species is not included, you need to create a JSON file in the same format, as the included file, and provide its location by argument species_file in the download_functional_terms() function.

Here is the entry for yeast:

{
  "yeast": {
    "ontology": {
      "go": "sgd",
      "reactome": "Saccharomyces cerevisiae",
      "kegg": "sce"
    },
    "ensembl": {
      "biomart": "ensembl",
      "host": "https://www.ensembl.org",
      "dataset": "scerevisiae_gene_ensembl"
    }
  }
} 

The “ontology” field contains species designations for on-line databases. The species designations can be found using fenr::fetch_go_species(), fenr::fetch_reactome_species() and fenr::fetch_kegg_species(). These three functions return data frames, where column designation contains the species designation required.

The “ensembl” field contains the biomart, the host and the dataset used by the BiomaRt::useMart() function. These are different for plants and protists. For example, this is the entry for Dictyostelium discoideum (we call it “slime mold”):

{
  "slime mold": {
    "ontology": {
      "go": "dictybase",
      "reactome": "Dictyostelium discoideum",
      "kegg": "ddi"
    },
    "ensembl": {
      "biomart": "protists_mart",
      "host": "https://protists.ensembl.org",
      "dataset": "ddiscoideum_eg_gene"
    }
  }
} 

Simple example

Count data

We start with the count data from an RNA-seq experiment. Let’s call it yeast_mtx. It is a matrix with rows corresponding to genes and columns corresponding to samples. It contains the same data as the yeast_data object, but in wide, matrix format.

data(yeast_mtx)
dim(yeast_mtx)
#> [1] 6298   12
head(yeast_mtx)
#>         Snf2-07 Snf2-21 Snf2-26 Snf2-27 Snf2-39 Snf2-40 WT-01 WT-10
#> YDL246C       7       4       5       0       0       3     1     1
#> YDL243C     411     273     356     151     423     250   192   191
#> YDR387C    1118     709     875     472    1242     706   377   555
#> YDL094C      62      62      17      22      56      22     3    22
#> YDR438W     488     296     408     248     537     267   153   239
#> YDR523C      64      38      52      43      78      48    17    88
#>         WT-18 WT-29 WT-42 WT-43
#> YDL246C     2     2     1     3
#> YDL243C   183   160   154   206
#> YDR387C   576   497   469   523
#> YDL094C    29    19    26    18
#> YDR438W   211   272   208   199
#> YDR523C    63    90    38    20

Metadata

First, we create a metadata data frame with grouping information. We extract group designation (WT or Snf2) form sample names and convert them into factors, with the first level WT. As per dexdash requirements, the column names of this data frame are sample and group:

library(tibble)
library(dplyr)
library(tidyr)
library(forcats)

yeast_metadata <- tibble(sample = colnames(yeast_mtx)) |> 
  mutate(group = str_extract(sample, "^.+(?=-)")) |> 
  mutate(group = fct_relevel(group, "WT"))

yeast_metadata
#> # A tibble: 12 × 2
#>    sample  group
#>    <chr>   <fct>
#>  1 Snf2-07 Snf2 
#>  2 Snf2-21 Snf2 
#>  3 Snf2-26 Snf2 
#>  4 Snf2-27 Snf2 
#>  5 Snf2-39 Snf2 
#>  6 Snf2-40 Snf2 
#>  7 WT-01   WT   
#>  8 WT-10   WT   
#>  9 WT-18   WT   
#> 10 WT-29   WT   
#> 11 WT-42   WT   
#> 12 WT-43   WT

Differential expression

Next, we carry out differential expression using edgeR. This is a simple case of two conditions (groups), each in six replicates. Again, we need specific column names for dexdash. If you use a different differential expression tool, the default column names would be different. In any case, they need to be id, log_fc, expr, p_value, and contrast. Whichever DE tool you use, make sure to rename the columns before passing the result to dexdash.

library(edgeR)
library(purrr)

design_mat <- model.matrix(~group, yeast_metadata)
yeast_de <- yeast_mtx |> 
  DGEList() |>
  calcNormFactors() |>
  estimateDisp(design = design_mat) |>
  glmQLFit(design = design_mat) |>
  glmQLFTest(coef = "groupSnf2") |>
  topTags(n = 1e16, sort.by = "none") |>
  pluck("table") |>
  as_tibble(rownames = "id") |>
  add_column(contrast = "Snf2-WT") |> 
  rename(
    log_fc = logFC,
    expr = logCPM,
    p_value = PValue
  )

yeast_de
#> # A tibble: 6,298 × 7
#>    id        log_fc   expr        F    p_value       FDR contrast
#>    <chr>      <dbl>  <dbl>    <dbl>      <dbl>     <dbl> <chr>   
#>  1 YDL246C  0.172   -0.937  0.0645  0.803      0.855     Snf2-WT 
#>  2 YDL243C -0.0107   4.88   0.00599 0.939      0.958     Snf2-WT 
#>  3 YDR387C  0.00915  6.32   0.0183  0.894      0.924     Snf2-WT 
#>  4 YDL094C  0.301    1.89   0.791   0.388      0.493     Snf2-WT 
#>  5 YDR438W  0.0602   5.12   0.541   0.474      0.574     Snf2-WT 
#>  6 YDR523C -0.650    2.75   5.00    0.0414     0.0824    Snf2-WT 
#>  7 YDR492W -0.636    6.78  11.3     0.00445    0.0134    Snf2-WT 
#>  8 YDR018C  1.19     4.11  49.6     0.00000465 0.0000660 Snf2-WT 
#>  9 YDL189W  0.0578   7.16   0.617   0.445      0.547     Snf2-WT 
#> 10 YDR508C -0.0823   8.13   0.426   0.524      0.621     Snf2-WT 
#> # ℹ 6,288 more rows

Count data in long format

dexdash requires count data in long format, with column names id, sample and value. We create the appropriate data frame:

yeast_data <- yeast_mtx |> 
  as_tibble(rownames = "id") |> 
  pivot_longer(-id, names_to = "sample", values_to = "value")

yeast_data
#> # A tibble: 75,576 × 3
#>    id      sample  value
#>    <chr>   <chr>   <int>
#>  1 YDL246C Snf2-07     7
#>  2 YDL246C Snf2-21     4
#>  3 YDL246C Snf2-26     5
#>  4 YDL246C Snf2-27     0
#>  5 YDL246C Snf2-39     0
#>  6 YDL246C Snf2-40     3
#>  7 YDL246C WT-01       1
#>  8 YDL246C WT-10       1
#>  9 YDL246C WT-18       2
#> 10 YDL246C WT-29       2
#> # ℹ 75,566 more rows

Feature names and descriptions

Next, we need a table convertinggene identifiers into human-readable gene names and descriptions. This can be done with the helper function download_feature_information():

yeast_features <- download_feature_information(species = "yeast")
yeast_features |>
  arrange(name)
#> # A tibble: 7,127 × 3
#>    id      name     description                                         
#>    <chr>   <chr>    <chr>                                               
#>  1 Q0020   15S_RRNA Ribosomal RNA of the small mitochondrial ribosomal …
#>  2 Q0158   21S_RRNA Mitochondrial 21S rRNA                              
#>  3 YMR056C AAC1     Mitochondrial inner membrane ADP/ATP translocator   
#>  4 YBR085W AAC3     Mitochondrial inner membrane ADP/ATP translocator   
#>  5 YJR155W AAD10    Putative aryl-alcohol dehydrogenase                 
#>  6 YNL331C AAD14    Putative aryl-alcohol dehydrogenase                 
#>  7 YOL165C AAD15    Putative aryl-alcohol dehydrogenase                 
#>  8 YCR107W AAD3     Putative aryl-alcohol dehydrogenase                 
#>  9 YDL243C AAD4     Putative aryl-alcohol dehydrogenase                 
#> 10 YFL056C AAD6     Putative aryl-alcohol dehydrogenase                 
#> # ℹ 7,117 more rows

Functional enrichment data

Finally, as indicated earlier, we need the functional enrichment data. This is done in two steps:

yeast_terms <- download_functional_terms("yeast")
yeast_fterms <- prepare_functional_terms(yeast_terms, feature_name = "gene_id")

Run dexdash

Now, we have all necessary data to launch dexdash:

run_app(yeast_de, yeast_data, yeast_metadata, yeast_features, yeast_fterms)

Advanced example

Messing with identifiers

Struggling with identifiers is all in a day’s work of a bioinformatician. Alas, there are no unified gene of protein identifiers in this imperfect word, and getting data from various sources inevitable leads to inconsistencies. Below, we show an example of a typical dexdash workflow, which requires a little bit more effort.

Consider a count matrix from a mouse RNA-seq experiment (originally from Blümli et al. (2021), here we use only three time points and KO):

data(mouse_mtx)
head(mouse_mtx)
#>                       Samples
#> Tags                   0h-1 0h-2 0h-3 0h-4 2h-1 2h-2 2h-3 2h-4 18h-1
#>   ENSMUSG00000064842.1    0    0    0    0    0    0    0    0     0
#>   ENSMUSG00000051951.5    2    4    0   10    4    3    2    3     3
#>   ENSMUSG00000102851.1    0    0    0    0    0    0    0    0     0
#>   ENSMUSG00000103377.1    0    0    2    0    0    0    2    1     0
#>   ENSMUSG00000104017.1    0    0    0    3    0    0    0    1     0
#>   ENSMUSG00000103025.1    1    0    0    1    0    0    0    1     0
#>                       Samples
#> Tags                   18h-2 18h-3 18h-4 KO-1 KO-2 KO-3 KO-4
#>   ENSMUSG00000064842.1     0     0     0    0    0    0    0
#>   ENSMUSG00000051951.5     4     5     6    3    5    1    8
#>   ENSMUSG00000102851.1     0     0     0    0    0    0    0
#>   ENSMUSG00000103377.1     1     0     1    0    1    0    0
#>   ENSMUSG00000104017.1     2     0     0    0    0    1    1
#>   ENSMUSG00000103025.1     0     0     1    0    3    0    1

Genes are identified by Ensemble gene IDs. Let us have a look at the mouse functional term information, downloaded from GO, Reactome and KEGG. First, we download terms and mappings:

mouse_terms <- download_functional_terms(species = "mouse")

Now, we look at a random selection of rows in feature-to-term mapping tables:

set.seed(42)
mouse_terms$go$mapping |> sample_n(10)
#> # A tibble: 10 × 4
#>    gene_symbol gene_id       db_id       term_id   
#>    <chr>       <chr>         <chr>       <chr>     
#>  1 EHBP1       Flj21950      MGI:2667252 GO:0005654
#>  2 DHX30       2810477H02Rik MGI:1920081 GO:0035770
#>  3 ITGAV       1110004F14Rik MGI:96608   GO:0034684
#>  4 FZD1        Fz1           MGI:1196625 GO:0030165
#>  5 CSNK1G1     9130020E21Rik MGI:2660884 GO:0000166
#>  6 EI24        PIG8          MGI:108090  GO:0005794
#>  7 TMEM40      9030407H20Rik MGI:2137870 GO:0008150
#>  8 KCND2       Kv4.2         MGI:102663  GO:0006813
#>  9 EIF3E       48kDa         MGI:99257   GO:0006412
#> 10 RBBP5       4933411J24Rik MGI:1918367 GO:0042393

mouse_terms$reactome$mapping |> sample_n(10)
#> # A tibble: 10 × 3
#>    gene_id               gene_symbol term_id      
#>    <chr>                 <chr>       <chr>        
#>  1 ENSMUSG00000074886    GRK6        R-MMU-418555 
#>  2 ENSMUST00000057580    MTOR        R-MMU-380972 
#>  3 ENSMUSP00000128608    UNC13B      R-MMU-181430 
#>  4 ENSMUSG00000042243.11 BC051665    R-MMU-8939242
#>  5 ENSMUST00000212214    CSNK2A2     R-MMU-2514853
#>  6 ENSMUSG00000056220    PLA2G4A     R-MMU-1482922
#>  7 ENSMUSP00000024810    FGD2        R-MMU-9013148
#>  8 ENSMUSG00000032310    CYP1A2      R-MMU-5423646
#>  9 ENSMUST00000111751    MYL2        R-MMU-390522 
#> 10 ENSMUST00000007757    TGFBR1      R-MMU-2173789

mouse_terms$kegg$mapping |> sample_n(10)
#> # A tibble: 10 × 3
#>    gene_id gene_symbol term_id 
#>    <chr>   <chr>       <chr>   
#>  1 66171   PGLS        mmu00030
#>  2 16456   F11R        mmu04514
#>  3 14388   GAB1        mmu04014
#>  4 12289   CACNA1D     mmu04728
#>  5 79459   ALDOART2    mmu00010
#>  6 18125   NOS1        mmu05022
#>  7 69146   GSDMD       mmu04613
#>  8 22408   WNT1        mmu05225
#>  9 100727  UGT2B34     mmu00040
#> 10 242519  IFNA12      mmu05165

We have a problem. Our data uses Ensembl IDs, but out of the three databases, only Reactome provides with Ensembl IDs. GO returns gene aliases and KEGG uses NCBI IDs. We need put all these data together. The only common identifier is the gene symbol, returned by all three databases. To make it consistent with our data, we need to convert Ensembl IDs into gene symbols. For this, we use the function download_feature_information():

mouse_genes <- download_feature_information(species = "mouse")
mouse_genes
#> # A tibble: 57,180 × 3
#>    id                 name  description                                 
#>    <chr>              <chr> <chr>                                       
#>  1 ENSMUSG00000000001 Gnai3 guanine nucleotide binding protein (G prote…
#>  2 ENSMUSG00000000003 Pbsn  probasin                                    
#>  3 ENSMUSG00000000028 Cdc45 cell division cycle 45                      
#>  4 ENSMUSG00000000031 H19   H19, imprinted maternally expressed transcr…
#>  5 ENSMUSG00000000037 Scml2 Scm polycomb group protein like 2           
#>  6 ENSMUSG00000000049 Apoh  apolipoprotein H                            
#>  7 ENSMUSG00000000056 Narf  nuclear prelamin A recognition factor       
#>  8 ENSMUSG00000000058 Cav2  caveolin 2                                  
#>  9 ENSMUSG00000000078 Klf6  Kruppel-like transcription factor 6         
#> 10 ENSMUSG00000000085 Scmh1 sex comb on midleg homolog 1                
#> # ℹ 57,170 more rows

It returns a data frame linking Ensembl IDs and gene symbols. It also provides with gene descriptions, which we are going to need later. From this data frame we create a new one:

id2symbol <- tibble(
  rowname = rownames(mouse_mtx),
  id = rowname |> 
     str_remove("\\.\\d+$")
) |> 
  inner_join(mouse_genes, by = "id") |> 
  mutate(gene_id = toupper(name))

id2symbol
#> # A tibble: 50,892 × 5
#>    rowname              id                 name    description   gene_id
#>    <chr>                <chr>              <chr>   <chr>         <chr>  
#>  1 ENSMUSG00000064842.1 ENSMUSG00000064842 Gm26206 predicted ge… GM26206
#>  2 ENSMUSG00000051951.5 ENSMUSG00000051951 Xkr4    X-linked Kx … XKR4   
#>  3 ENSMUSG00000102851.1 ENSMUSG00000102851 Gm18956 predicted ge… GM18956
#>  4 ENSMUSG00000103377.1 ENSMUSG00000103377 Gm37180 predicted ge… GM37180
#>  5 ENSMUSG00000104017.1 ENSMUSG00000104017 Gm37363 predicted ge… GM37363
#>  6 ENSMUSG00000103025.1 ENSMUSG00000103025 Gm37686 predicted ge… GM37686
#>  7 ENSMUSG00000089699.1 ENSMUSG00000089699 Gm1992  predicted ge… GM1992 
#>  8 ENSMUSG00000103201.1 ENSMUSG00000103201 Gm37329 predicted ge… GM37329
#>  9 ENSMUSG00000103147.1 ENSMUSG00000103147 Gm7341  predicted ge… GM7341 
#> 10 ENSMUSG00000103161.1 ENSMUSG00000103161 Gm38148 predicted ge… GM38148
#> # ℹ 50,882 more rows

It contains the exact rowname from the count table (rowname), the extracted Ensembl ID (id), with version number (e.g. .1) removed, which is joined with our gene data frame. In addition, we converted gene symbols into upper case (column gene_id). This is another issue - different sources provide gene symbols in different capitalisation. Sometimes, the first letter is in capitals, sometimes the entire gene symbol is capitalised. To circumvent this issue our function download_functional_terms() automatically capitalises all gene symbols, hence, we need to do the same in the id2symbol structure.

Having prepared all of this, we can rename rows in the count table:

renamed_mtx <- mouse_mtx[id2symbol$rowname, ]
rownames(renamed_mtx) <- id2symbol$gene_id

head(renamed_mtx)
#>          Samples
#> Tags      0h-1 0h-2 0h-3 0h-4 2h-1 2h-2 2h-3 2h-4 18h-1 18h-2 18h-3
#>   GM26206    0    0    0    0    0    0    0    0     0     0     0
#>   XKR4       2    4    0   10    4    3    2    3     3     4     5
#>   GM18956    0    0    0    0    0    0    0    0     0     0     0
#>   GM37180    0    0    2    0    0    0    2    1     0     1     0
#>   GM37363    0    0    0    3    0    0    0    1     0     2     0
#>   GM37686    1    0    0    1    0    0    0    1     0     0     0
#>          Samples
#> Tags      18h-4 KO-1 KO-2 KO-3 KO-4
#>   GM26206     0    0    0    0    0
#>   XKR4        6    3    5    1    8
#>   GM18956     0    0    0    0    0
#>   GM37180     1    0    1    0    0
#>   GM37363     0    0    0    1    1
#>   GM37686     1    0    3    0    1

Now, we have consistent identifers in count data and functional enrichment data. We can now proceed.

Metadata

Like before, we create a metadata data frame with experiment grouping:

mouse_metadata <- tibble(sample = colnames(renamed_mtx)) |> 
  mutate(group = str_extract(sample, "^.+(?=-)")) |> 
  mutate(group = fct_relevel(group, c("0h", "2h", "18h")))

mouse_metadata
#> # A tibble: 16 × 2
#>    sample group
#>    <chr>  <fct>
#>  1 0h-1   0h   
#>  2 0h-2   0h   
#>  3 0h-3   0h   
#>  4 0h-4   0h   
#>  5 2h-1   2h   
#>  6 2h-2   2h   
#>  7 2h-3   2h   
#>  8 2h-4   2h   
#>  9 18h-1  18h  
#> 10 18h-2  18h  
#> 11 18h-3  18h  
#> 12 18h-4  18h  
#> 13 KO-1   KO   
#> 14 KO-2   KO   
#> 15 KO-3   KO   
#> 16 KO-4   KO

Differential expression

Differential expression if carried out with edgeR, like for the yeast data, but this time we have more than one contrast. We select timepoint 0h as the baseline and find contrasts between this and other time points.

library(edgeR)
library(purrr)

design_mat <- model.matrix(~group, mouse_metadata)
coefficients <- colnames(design_mat)[-1]
fit <- renamed_mtx |> 
  DGEList() |>
  calcNormFactors() |>
  estimateDisp(design = design_mat) |>
  glmQLFit(design = design_mat)

mouse_de <- map(coefficients, function(coef) {
  fit |> 
    glmQLFTest(coef = coef) |> 
    topTags(n = 1e16, sort.by = "none") |>
    pluck("table") |>
    as_tibble(rownames = "id") |>
    add_column(contrast = coef)
}) |> 
  list_rbind() |> 
  rename(
    log_fc = logFC,
    expr = logCPM,
    p_value = PValue
  )

mouse_de
#> # A tibble: 152,676 × 7
#>    id      log_fc   expr     F p_value   FDR contrast
#>    <chr>    <dbl>  <dbl> <dbl>   <dbl> <dbl> <chr>   
#>  1 GM26206  0     -1.93  0       1         1 group2h 
#>  2 XKR4    -0.221 -0.367 0.106   0.746     1 group2h 
#>  3 GM18956  0     -1.93  0       1         1 group2h 
#>  4 GM37180  0.578 -1.61  0.188   0.666     1 group2h 
#>  5 GM37363 -1.10  -1.61  0.591   0.445     1 group2h 
#>  6 GM37686 -0.644 -1.61  0.196   0.660     1 group2h 
#>  7 GM1992   0     -1.93  0       1         1 group2h 
#>  8 GM37329  1.31  -1.58  0.881   0.352     1 group2h 
#>  9 GM7341   0.772 -1.68  0.283   0.597     1 group2h 
#> 10 GM38148  1.63  -1.29  2.04    0.159     1 group2h 
#> # ℹ 152,666 more rows

Count data in long format

We need data in long format:

mouse_data <- renamed_mtx |> 
  as_tibble(rownames = "id") |> 
  pivot_longer(-id, names_to = "sample", values_to = "value")

mouse_data
#> # A tibble: 814,272 × 3
#>    id      sample value
#>    <chr>   <chr>  <dbl>
#>  1 GM26206 0h-1       0
#>  2 GM26206 0h-2       0
#>  3 GM26206 0h-3       0
#>  4 GM26206 0h-4       0
#>  5 GM26206 2h-1       0
#>  6 GM26206 2h-2       0
#>  7 GM26206 2h-3       0
#>  8 GM26206 2h-4       0
#>  9 GM26206 18h-1      0
#> 10 GM26206 18h-2      0
#> # ℹ 814,262 more rows

Feature names and descriptions

For the Shiny app, a feature information table is needed to link feature identifiers (capitalised gene symbols), feature names (original gene symbols) and descriptions. The columns needed are described in Table 2. Here is how we prepare the required tibble:

mouse_features <- mouse_genes |> 
  mutate(id = toupper(name))

mouse_features
#> # A tibble: 57,180 × 3
#>    id    name  description                                              
#>    <chr> <chr> <chr>                                                    
#>  1 GNAI3 Gnai3 guanine nucleotide binding protein (G protein), alpha in…
#>  2 PBSN  Pbsn  probasin                                                 
#>  3 CDC45 Cdc45 cell division cycle 45                                   
#>  4 H19   H19   H19, imprinted maternally expressed transcript           
#>  5 SCML2 Scml2 Scm polycomb group protein like 2                        
#>  6 APOH  Apoh  apolipoprotein H                                         
#>  7 NARF  Narf  nuclear prelamin A recognition factor                    
#>  8 CAV2  Cav2  caveolin 2                                               
#>  9 KLF6  Klf6  Kruppel-like transcription factor 6                      
#> 10 SCMH1 Scmh1 sex comb on midleg homolog 1                             
#> # ℹ 57,170 more rows

Functional enrichment data

Finally, we need to prepare functional enrichment data in the fast enrichment format. We specify feature_name = "gene_symbol" to inform the function which column should be used to identify features.

mouse_fterms <- prepare_functional_terms(mouse_terms, feature_name = "gene_symbol")

Run dexdash

Now, we have all necessary data to launch dexdash:

run_app(mouse_de, mouse_data, mouse_metadata, mouse_features, mouse_fterms)

Comparison with other tools

iDEP

iDEP is a flexible web-based tool that supports full data analysis, starting from the count matrix. It includes pre-processing, clustering, PCA, differential expression, KEGG pathway analysis and more. While it generates volcano and MA plots, they lack fast interactivity. In particular, pathway analysis in iDEP if only performed on down- or up-regulated genes. In contrast, dexdash is smaller and more focussed on interactivity, visualisation and fast functional enrichment on any arbitrary gene selection. Additionally, dexdash suports GO and Reactome functional terms, alongside with KEGG.

ShinyGO

ShinyGO is a web-base Shiny app for comprehensive KEGG pathway enrichment analysis. Users can submit a list of gene names for analysis, which then provides various outputs. In contrast, dexdash works on differential expression result set and creates interactive volcano/MA plots for fast repeated gene selection and simple GO, KEGG and Reactome enrichment analysis.

Glimma

Glimma is a Bioconductor R package designed to create interactive MDS, MA and volcano plots. Users can explore gene expression patterns by selecting individual genes. By contrast, dexdash allows for multiple gene selection, functional enrichment analysis and reverse selection of functional terms to find functionally related genes.

Other packages

VolcaNoseR is a simple customisable volcano plot with no additional functionality. clusterProfiler, ReactomePA, GOplot and topGO provide robust tools for enrichment analysis but do not offer the interactive, user-driven experience found in dexdash.

Session info

#> R version 4.4.0 (2024-04-24)
#> Platform: x86_64-apple-darwin20
#> Running under: macOS Sonoma 14.4.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/London
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods  
#> [7] base     
#> 
#> other attached packages:
#> [1] dexdash_0.1.2 edgeR_4.0.16  limma_3.58.1  stringr_1.5.1
#> [5] forcats_1.0.0 purrr_1.0.2   tidyr_1.3.1   dplyr_1.1.4  
#> [9] tibble_3.2.1 
#> 
#> loaded via a namespace (and not attached):
#>  [1] jsonlite_1.8.8    compiler_4.4.0    tidyselect_1.2.1 
#>  [4] Rcpp_1.0.12       assertthat_0.2.1  splines_4.4.0    
#>  [7] yaml_2.3.8        fastmap_1.1.1     statmod_1.5.0    
#> [10] lattice_0.22-6    R6_2.5.1          generics_0.1.3   
#> [13] knitr_1.46        pillar_1.9.0      rlang_1.1.3      
#> [16] utf8_1.2.4        stringi_1.8.3     xfun_0.43        
#> [19] cli_3.6.2         withr_3.0.0       magrittr_2.0.3   
#> [22] digest_0.6.35     grid_4.4.0        locfit_1.5-9.9   
#> [25] rstudioapi_0.16.0 fenr_1.1.11       lifecycle_1.0.4  
#> [28] vctrs_0.6.5       evaluate_0.23     glue_1.7.0       
#> [31] fansi_1.0.6       rmarkdown_2.26    tools_4.4.0      
#> [34] pkgconfig_2.0.3   htmltools_0.5.8.1