Dexdash: Differential EXpression Downstream Analysis SHiny app

Abstract

This R package is designed for interactive exploration and analysis of differential expression (DE) results. It combines Shiny-based visualization tools with rapid functional enrichment analysis. Users can select genes from a volcano or MA plot to find their names, descriptions and plot expression patterns. The built-in fast functional enrichment (based on the fenr package) allows users to instantly see which biological pathways and processes those genes are involved in, based on enrichment analysis of Gene Ontology (GO), KEGG, and Reactome databases. The input for the app consists of five data frames containing DE results, expression or abundance data, experimental design, gene names and descriptions. A build-in function allows for easy downloading of functional term data. Once started, the app is fast and easy to use.

dexdash version 0.2.7.

Overview

Imagine you’ve carried out an RNA-seq experiment, a proteomics study, or another type of -omics research. You’ve done read mapping or protein quantifications and now have a set of data in hand. This data capture expression or abundance (we will call it ‘expression’ for simplicity) for each gene, protein, or other entity (hereinafter referred to as ‘feature’). These expressions are recorded across various biological conditions (which we’ll call ‘groups’) and their replicates. Following this, you’ve conducted a differential expression analysis using tools like limma, edgeR, DESeq2, or similar, ending up with a data frame that includes log-fold change, p-value, among other metrics.

Now, you’re ready to investigate the results. You want to generate a volcano or MA plot, quickly identify features of interest, and explore the functional similarities among groups of features that behave similarly. This is exactly what dexdash is for. Simply feed it your expression data, differential expression results, and some details about your experimental design and gene names, and voilà! You’re equipped with a web-based interactive tool to explore your data. A screenshot of the dexdash app is shown in Figure 1.

Figure 1: Screenshot of dexdash Shiny app.

Installation

dexdash can be installed from GitHub by using:

if (!require("remotes", quietly = TRUE))
  install.packages("remotes")

remotes::install_github("bartongroup/dexdash")

# Examples in this document require additional data
remotes::install_github("bartongroup/dexdata")

Quick example

library(dexdash)
library(dexdata)

# Data examples available in the package
data(yeast_de, yeast_data, yeast_metadata)

# Create the dexdash data set
yeast_dexset <- dexdash_set(yeast_de, yeast_data, yeast_metadata, name = "Yeast")

# The slow bit: download feature information and functional term data
yeast_features <- download_feature_information(species = "yeast")
yeast_terms <- download_functional_terms(species = "yeast")
yeast_fterms <- prepare_functional_terms(yeast_terms, feature_name = "gene_id")

# The fast bit: interactive app
run_app(yeast_dexset, yeast_features, yeast_fterms)

Usage

The dexdash Shiny app is launched with a command:

run_app(dexset, features, fterms)

There are five mandatory arguments:

  • dexset - a dexdash_set or dexdash_list object containing differential expression results, expression of abundance data and the design of the experiment. dexdash_set contains data from one experiment, dexdash_list combines data from multiple experiments,
  • features - data frame with feature names and descriptions,
  • fterms - an object containing functional terms data.

User data: experimental results

The dexset argument for the app should be created with dexdash_set() or dexdash_list() function, for single- and multiple-experiment data, respectively. For a single experiment, the dexdash_set object is created using

yeast_dexset <- dexdash_set(yeast_de, yeast_data, yeast_metadata, name = "Yeast")

The inputs for this function are three data frames containing experimental results, provided by the user. As differential expression output differs from tool to tool, it would be difficult to feed them directly into dexdash. Therefore, users need to rename data frame columns as shown in Table 1. A fully worked example below demonstrates how to prepare the required data, step by step.

Table 1: Column names, types and descriptions for data frames with differential expression results (de), expression data (data) and experimental design (metadata).
Data frame Column Type Description
de id character feature identifier
log_fc numeric log-fold change; use logFC column from edgeR and limma output, log2FoldChange from DESeq2
expr numeric mean expression or abundance, typically in logarithmic scale; use logCPM from edgeR, log10(baseMean) from DESeq2, AveExpr from limma
p_value numeric p-value from statistical tests; use PValue from edgeR and limma, pvalue from DESeq2
fdr numeric (optional) false discovery rate to assess statistical significance; if not provided, FDR will be calculated using the Benjamini-Hochberg multiple test corrections from the p-values
contrast factor name of the contrast; this data frame can contain results from multiple contrast
data id character feature identifier
sample character sample identifier
value numeric the expression, or abundance of the gene/protein for this sample
metadata sample character sample identifier
character/factor at least one additional column with information that will be used as the x-axis or to colour the points in the feature plot

Feature information

Table 2: Column names, types and descriptions for data frame with feature (gene or protein) information.
argument column type description
features id character feature identifier
name character a human-readable name of the feature, typically a gene symbol
description character a brief description of the feature

Feature (gene or protein) information should be organised in a data frame with columns id, name and description, as explained in Table 2.

This data frame can be prepared by the user, but dexdash includes a helper function to download these data from Ensembl:

yeast_features <- download_feature_information(species = "yeast")

The necessary information to connect to the correct Ensembl dataset is stored in a JSON file attached to the package (see Section 4.4 for details).

Functional term data

For fast interactive functional enrichment, dexdash requires data downloaded and formatted appropriately. This can be achieved with the following built-in functions:

yeast_terms <- download_functional_terms(species = "yeast")
yeast_fterms <- prepare_functional_terms(terms, feature_name = "gene_id")

The first function, download_functional_terms(), retrieves functional term data and term-feature mappings from online databases, organizing them into two user-readable data frames (tibbles). This format allows users to verify that the data adheres to the necessary specifications. The species argument corresponds to an entry in the JSON file included with the package, which is detailed further in Section 4.4.

The second function, prepare_functional_terms(), transforms these data frames into an object optimized for fast functional enrichment within dexdash, based on Bioconductor’s fenr package. The feature_name argument designates which column in the mapping tibble acts as the feature identifier. For more details, refer to the fenr package vignette. Depending on your dataset, feature_name should be set to "gene_symbol" if dealing with gene symbols like “BRCA1” or “FOXP1”, or "gene_id" for Ensembl identifiers such as “ENSG00000012048” or “ENSG00000114861”.

Species JSON file

Online databases such as GO, Reactome, KEGG, and Ensembl require specific species designations and dataset names for access, which are often not intuitive. For example, the designations for yeast are “sgd” in GO, “Saccharomyces cerevisiae” in Reactome, and “sce in KEGG. To make access easier, dexdash includes a JSON file containing these necessary identifiers along with Ensembl BioMart host and dataset details for various species. This file is located locally at:

species_file <- system.file("extdata", "species.json", package = "dexdash")

To check the species supported by dexdash, you can use:

list_species()
[1] "human"       "mouse"       "rat"         "yeast"       "roundworm"   "slime mold" 
[7] "fruit fly"   "xenopus"     "arabidopsis"

If your species is not listed, you need to create a JSON file in the same format, as the included file, and provide its location by argument species_file in the download_functional_terms() function.

Here is the entry for yeast in the JSON file:

{
  "yeast": {
    "ontology": {
      "go": "sgd",
      "reactome": "Saccharomyces cerevisiae",
      "kegg": "sce"
    },
    "ensembl": {
      "biomart": "ensembl",
      "host": "https://www.ensembl.org",
      "dataset": "scerevisiae_gene_ensembl"
    }
  }
} 

It is divided in two sections:

"ontology": containing species designations needed to fetch data from GO, Reactome, and KEGG. These can be found by functions fenr::fetch_go_species(), fenr::fetch_reactome_species(), and fenr::fetch_kegg_species().

"ensembl": detailing the BioMart, host, and dataset information required by biomaRt::useMart(), which varies for different organism groups such as plants and protists. These can be found using biomaRt::listMarts() and biomaRt::listDatasets(). For instance, this is the entry for Dictyostelium discoideum (commonly known as slime mold):

{
  "slime mold": {
    "ontology": {
      "go": "dictybase",
      "reactome": "Dictyostelium discoideum",
      "kegg": "ddi"
    },
    "ensembl": {
      "biomart": "protists_mart",
      "host": "https://protists.ensembl.org",
      "dataset": "ddiscoideum_eg_gene"
    }
  }
} 

Simple example

In this example we are given only one data set containing read counts from an RNA-seq experiment (Gierlinski et al. 2015). It is a matrix yeast_mtx, with rows corresponding to genes and columns corresponding to samples. It contains the same data as the yeast_data object, but in wide, matrix format. We will demonstrate how to prepare all data required for dexdash from this one matrix.

We start with the packages needed for this example:

library(dexdash)
library(dexdata)
library(tibble)
library(dplyr)
library(tidyr)
library(forcats)
library(purrr)
library(stringr)
library(edgeR)

Count data

Here is the count matrix. This is the starting point for most of RNA-seq data analysis.

data(yeast_mtx)
dim(yeast_mtx)
[1] 6298   12
head(yeast_mtx)
        Snf2-07 Snf2-21 Snf2-26 Snf2-27 Snf2-39 Snf2-40 WT-01 WT-10 WT-18 WT-29 WT-42 WT-43
YDL246C       7       4       5       0       0       3     1     1     2     2     1     3
YDL243C     411     273     356     151     423     250   192   191   183   160   154   206
YDR387C    1118     709     875     472    1242     706   377   555   576   497   469   523
YDL094C      62      62      17      22      56      22     3    22    29    19    26    18
YDR438W     488     296     408     248     537     267   153   239   211   272   208   199
YDR523C      64      38      52      43      78      48    17    88    63    90    38    20

Metadata

First, we create metadata with the design of the experiment. We extract strain designation (WT or Snf2) from sample names (column names of yeast_mtx) and convert them into factors, with the first level WT. As per dexdash requirements, metadata contains a column called sample and an additional column strain:

yeast_metadata <- tibble(sample = colnames(yeast_mtx)) |>
  separate_wider_delim(sample, names = c("strain", "replicate"), delim = "-", cols_remove = FALSE) |> 
  mutate(strain = fct_relevel(strain, "WT"))

yeast_metadata
# A tibble: 12 × 3
   strain replicate sample 
   <fct>  <chr>     <chr>  
 1 Snf2   07        Snf2-07
 2 Snf2   21        Snf2-21
 3 Snf2   26        Snf2-26
 4 Snf2   27        Snf2-27
 5 Snf2   39        Snf2-39
 6 Snf2   40        Snf2-40
 7 WT     01        WT-01  
 8 WT     10        WT-10  
 9 WT     18        WT-18  
10 WT     29        WT-29  
11 WT     42        WT-42  
12 WT     43        WT-43  

We also create an additional column called replicate. It can be used in the app to colour points in the feature plot.

Differential expression

Next, we carry out differential expression using edgeR. This is a simple case of two conditions (strains), each in six replicates. Again, we need specific column names for dexdash. If you use a different differential expression tool, the default column names would be different. In any case, they need to be id, log_fc, expr, p_value, and contrast (see Table 1). Whichever DE tool you use, make sure to rename the columns before passing the result to dexdash.

design_mat <- model.matrix(~strain, yeast_metadata)

dge <- DGEList(yeast_mtx)
keep <- filterByExpr(dge, design_mat)

yeast_de <- dge[keep, ] |> 
  normLibSizes() |>
  glmQLFit(design = design_mat) |>
  glmQLFTest(coef = "strainSnf2") |>
  topTags(n = 1e16, sort.by = "none") |>
  pluck("table") |>
  as_tibble(rownames = "id") |>
  add_column(contrast = "Snf2-WT") |> 
  rename(
    log_fc = logFC,
    expr = logCPM,
    p_value = PValue
  )

yeast_de
# A tibble: 5,999 × 7
   id        log_fc  expr        F    p_value       FDR contrast
   <chr>      <dbl> <dbl>    <dbl>      <dbl>     <dbl> <chr>   
 1 YDL243C -0.0118   4.88  0.00723 0.933      0.952     Snf2-WT 
 2 YDR387C  0.00810  6.32  0.0143  0.907      0.934     Snf2-WT 
 3 YDL094C  0.295    1.87  0.782   0.391      0.490     Snf2-WT 
 4 YDR438W  0.0594   5.12  0.529   0.479      0.574     Snf2-WT 
 5 YDR523C -0.665    2.76  5.42    0.0350     0.0699    Snf2-WT 
 6 YDR492W -0.636    6.78 11.2     0.00468    0.0135    Snf2-WT 
 7 YDR018C  1.19     4.09 50.3     0.00000469 0.0000627 Snf2-WT 
 8 YDL189W  0.0568   7.16  0.608   0.448      0.545     Snf2-WT 
 9 YDR508C -0.0831   8.13  0.439   0.518      0.610     Snf2-WT 
10 YDR462W -0.171    5.66  4.05    0.0632     0.112     Snf2-WT 
# ℹ 5,989 more rows

Count data in long format

dexdash requires count data in long format, with column names id, sample and value. We create the appropriate data frame:

yeast_data <- yeast_mtx |> 
  as_tibble(rownames = "id") |> 
  pivot_longer(-id, names_to = "sample", values_to = "value")

yeast_data
# A tibble: 75,576 × 3
   id      sample  value
   <chr>   <chr>   <int>
 1 YDL246C Snf2-07     7
 2 YDL246C Snf2-21     4
 3 YDL246C Snf2-26     5
 4 YDL246C Snf2-27     0
 5 YDL246C Snf2-39     0
 6 YDL246C Snf2-40     3
 7 YDL246C WT-01       1
 8 YDL246C WT-10       1
 9 YDL246C WT-18       2
10 YDL246C WT-29       2
# ℹ 75,566 more rows

Complete data set

Now the function dexdash_set() can be used to create the full data set for dexdash:

yeast_dexset <- dexdash_set(yeast_de, yeast_data, yeast_metadata, name = "Yeast")

The name, "Yeast", will be shown in the data set drop-down menu in the app. In this case we have just one experiment, so the drop-down menu will contain only one element.

Feature names and descriptions

Next, we need a table converting gene identifiers into human-readable gene names and descriptions. This can be done with the helper function download_feature_information():

yeast_features <- download_feature_information(species = "yeast")
yeast_features |>
  arrange(name)
# A tibble: 7,127 × 3
   id      name     description                                               
   <chr>   <chr>    <chr>                                                     
 1 Q0020   15S_RRNA Ribosomal RNA of the small mitochondrial ribosomal subunit
 2 Q0158   21S_RRNA Mitochondrial 21S rRNA                                    
 3 YMR056C AAC1     Mitochondrial inner membrane ADP/ATP translocator         
 4 YBR085W AAC3     Mitochondrial inner membrane ADP/ATP translocator         
 5 YJR155W AAD10    Putative aryl-alcohol dehydrogenase                       
 6 YNL331C AAD14    Putative aryl-alcohol dehydrogenase                       
 7 YOL165C AAD15    Putative aryl-alcohol dehydrogenase                       
 8 YCR107W AAD3     Putative aryl-alcohol dehydrogenase                       
 9 YDL243C AAD4     Putative aryl-alcohol dehydrogenase                       
10 YFL056C AAD6     Putative aryl-alcohol dehydrogenase                       
# ℹ 7,117 more rows

Functional enrichment data

Finally, as indicated earlier, we need the functional enrichment data. This is done in two steps:

yeast_terms <- download_functional_terms("yeast")
yeast_fterms <- prepare_functional_terms(yeast_terms, feature_name = "gene_id")

Run dexdash

Now, we have all necessary data to launch dexdash:

run_app(yeast_dexset, yeast_features, yeast_fterms)

By default, the x-axis varable in the feature plot is “sample”. This can be changed in the app by clicking on the gear icon and selecting a different x-axis variable. Alternatively, we can specify which of the metadata columns we want to use when starting the app from the console:

run_app(yeast_dexset, yeast_features, yeast_fterms, x_variable = "strain")

Advanced example

Here we show an example of data, where feature identifiers do not correspond directly to those in the online databases. The data comes from a mouse RNA-seq experiment (originally from Blümli et al. (2021). Here we use only three time points.

Again, we need the following packages loaded:

library(dexdash)
library(dexdata)
library(tibble)
library(dplyr)
library(tidyr)
library(forcats)
library(purrr)
library(stringr)
library(edgeR)

Messing with identifiers

Struggling with identifiers is all in a day’s work of a bioinformatician. Alas, there are no unified gene of protein identifiers in this imperfect word, and getting data from various sources inevitable leads to inconsistencies. This is our read count matrix:

data(mouse_mtx)
head(mouse_mtx)
                       Samples
Tags                    0h-1 0h-2 0h-3 0h-4 2h-1 2h-2 2h-3 2h-4 18h-1 18h-2 18h-3 18h-4
  ENSMUSG00000051951.5     2    4    0   10    4    3    2    3     3     4     5     6
  ENSMUSG00000103161.1     0    0    1    1    4    0    0    3     2     1     1     4
  ENSMUSG00000102331.1     3    7    2    6    2    1    1    0     4     1     0     3
  ENSMUSG00000025902.13   16   15   15   31   17   16   13   10    29    17    34    11
  ENSMUSG00000098104.1     0    3    0    1    1    2    1    1     4     6     1     2
  ENSMUSG00000103922.1   114   82   42   67   89   85   47   65   162    91    80    73

Genes are identified by Ensemble gene IDs. Let us have a look at the mouse functional term information, downloaded from GO, Reactome and KEGG. First, we download terms and mappings:

mouse_terms <- download_functional_terms(species = "mouse")

It is a nested list, the first level corresponds to the “ontology”, that is go, reactome and kegg. Each ontology consists of two data frames, mapping - containing feature-term mapping and terms - containing term information. Let’s look at a random selection of rows in feature-to-term mapping tables:

set.seed(42)
mouse_terms$go$mapping |> slice_sample(n = 10)
# A tibble: 10 × 4
   gene_symbol gene_id       db_id       term_id   
   <chr>       <chr>         <chr>       <chr>     
 1 EHBP1       Flj21950      MGI:2667252 GO:0005654
 2 DHX30       2810477H02Rik MGI:1920081 GO:0035770
 3 ITGAV       1110004F14Rik MGI:96608   GO:0034684
 4 FZD1        Fz1           MGI:1196625 GO:0030165
 5 CSNK1G1     9130020E21Rik MGI:2660884 GO:0000166
 6 EI24        PIG8          MGI:108090  GO:0005794
 7 TMEM40      9030407H20Rik MGI:2137870 GO:0008150
 8 KCND2       Kv4.2         MGI:102663  GO:0006813
 9 EIF3E       48kDa         MGI:99257   GO:0006412
10 RBBP5       4933411J24Rik MGI:1918367 GO:0042393
mouse_terms$reactome$mapping |> slice_sample(n = 10)
# A tibble: 10 × 3
   gene_id               gene_symbol term_id      
   <chr>                 <chr>       <chr>        
 1 ENSMUSG00000074886    GRK6        R-MMU-418555 
 2 ENSMUST00000057580    MTOR        R-MMU-380972 
 3 ENSMUSP00000128608    UNC13B      R-MMU-181430 
 4 ENSMUSG00000042243.11 BC051665    R-MMU-8939242
 5 ENSMUST00000212214    CSNK2A2     R-MMU-2514853
 6 ENSMUSG00000056220    PLA2G4A     R-MMU-1482922
 7 ENSMUSP00000024810    FGD2        R-MMU-9013148
 8 ENSMUSG00000032310    CYP1A2      R-MMU-5423646
 9 ENSMUST00000111751    MYL2        R-MMU-390522 
10 ENSMUST00000007757    TGFBR1      R-MMU-2173789
mouse_terms$kegg$mapping |> slice_sample(n = 10)
# A tibble: 10 × 3
   gene_id gene_symbol term_id 
   <chr>   <chr>       <chr>   
 1 66171   PGLS        mmu00030
 2 16456   F11R        mmu04514
 3 14388   GAB1        mmu04014
 4 12289   CACNA1D     mmu04728
 5 79459   ALDOART2    mmu00010
 6 18125   NOS1        mmu05022
 7 69146   GSDMD       mmu04613
 8 22408   WNT1        mmu05225
 9 100727  UGT2B34     mmu00040
10 242519  IFNA12      mmu05165

We have a problem. Our data uses Ensembl IDs, but out of the three databases, only Reactome provides with Ensembl IDs. GO returns gene aliases and KEGG uses NCBI IDs. We need put all these data together. The only common identifier is the gene symbol returned by all three databases. To make it consistent with our data, we need to convert Ensembl IDs into gene symbols. For this, we use the function download_feature_information():

mouse_genes <- download_feature_information(species = "mouse")
mouse_genes
# A tibble: 57,180 × 3
   id                 name  description                                                       
   <chr>              <chr> <chr>                                                             
 1 ENSMUSG00000000001 Gnai3 guanine nucleotide binding protein (G protein), alpha inhibiting 3
 2 ENSMUSG00000000003 Pbsn  probasin                                                          
 3 ENSMUSG00000000028 Cdc45 cell division cycle 45                                            
 4 ENSMUSG00000000031 H19   H19, imprinted maternally expressed transcript                    
 5 ENSMUSG00000000037 Scml2 Scm polycomb group protein like 2                                 
 6 ENSMUSG00000000049 Apoh  apolipoprotein H                                                  
 7 ENSMUSG00000000056 Narf  nuclear prelamin A recognition factor                             
 8 ENSMUSG00000000058 Cav2  caveolin 2                                                        
 9 ENSMUSG00000000078 Klf6  Kruppel-like transcription factor 6                               
10 ENSMUSG00000000085 Scmh1 sex comb on midleg homolog 1                                      
# ℹ 57,170 more rows

It returns a data frame linking Ensembl IDs and gene symbols. It also provides with gene descriptions, which we are going to need later. From this data frame we create a new one:

id2symbol <- tibble(
  rowname = rownames(mouse_mtx),
  id = rowname |> 
     str_remove("\\.\\d+$")
) |> 
  inner_join(mouse_genes, by = "id") |> 
  mutate(gene_id = toupper(name))

id2symbol
# A tibble: 28,133 × 5
   rowname               id                 name    description                          gene_id
   <chr>                 <chr>              <chr>   <chr>                                <chr>  
 1 ENSMUSG00000051951.5  ENSMUSG00000051951 Xkr4    X-linked Kx blood group related 4    XKR4   
 2 ENSMUSG00000103161.1  ENSMUSG00000103161 Gm38148 predicted gene, 38148                GM38148
 3 ENSMUSG00000102331.1  ENSMUSG00000102331 Gm19938 predicted gene, 19938                GM19938
 4 ENSMUSG00000025902.13 ENSMUSG00000025902 Sox17   SRY (sex determining region Y)-box … SOX17  
 5 ENSMUSG00000098104.1  ENSMUSG00000098104 Gm6085  predicted gene 6085                  GM6085 
 6 ENSMUSG00000103922.1  ENSMUSG00000103922 Gm6123  predicted gene 6123                  GM6123 
 7 ENSMUSG00000033845.13 ENSMUSG00000033845 Mrpl15  mitochondrial ribosomal protein L15  MRPL15 
 8 ENSMUSG00000102275.1  ENSMUSG00000102275 Gm37144 predicted gene, 37144                GM37144
 9 ENSMUSG00000025903.14 ENSMUSG00000025903 Lypla1  lysophospholipase 1                  LYPLA1 
10 ENSMUSG00000033813.15 ENSMUSG00000033813 Tcea1   transcription elongation factor A (… TCEA1  
# ℹ 28,123 more rows

It contains rownames from the count table (rowname), the extracted Ensembl ID (id), with gene annotation version numbers (e.g. .1) removed. This data frame is joined with our gene data frame. In addition, we converted gene symbols into upper case (column gene_id). This is another issue - different sources provide gene symbols in different capitalisation. Sometimes, the first letter is in capitals, sometimes the entire gene symbol is capitalised. To circumvent this issue our function download_functional_terms() automatically capitalises all gene symbols, hence, we need to do the same in the id2symbol structure.

Having prepared all of this, we can rename rows in the count table:

renamed_mtx <- mouse_mtx[id2symbol$rowname, ]
rownames(renamed_mtx) <- id2symbol$gene_id

head(renamed_mtx)
         Samples
Tags      0h-1 0h-2 0h-3 0h-4 2h-1 2h-2 2h-3 2h-4 18h-1 18h-2 18h-3 18h-4
  XKR4       2    4    0   10    4    3    2    3     3     4     5     6
  GM38148    0    0    1    1    4    0    0    3     2     1     1     4
  GM19938    3    7    2    6    2    1    1    0     4     1     0     3
  SOX17     16   15   15   31   17   16   13   10    29    17    34    11
  GM6085     0    3    0    1    1    2    1    1     4     6     1     2
  GM6123   114   82   42   67   89   85   47   65   162    91    80    73

Now, we have consistent identifers in count data and functional enrichment data. We can now proceed.

Metadata

Like before, we create a metadata data frame with experiment design:

mouse_metadata <- tibble(sample = colnames(renamed_mtx)) |> 
  separate_wider_delim(sample, names = c("time_point", "replicate"), delim = "-", cols_remove = FALSE) |> 
  mutate(time_point = fct_relevel(time_point, c("0h", "2h", "18h")))

mouse_metadata
# A tibble: 12 × 3
   time_point replicate sample
   <fct>      <chr>     <chr> 
 1 0h         1         0h-1  
 2 0h         2         0h-2  
 3 0h         3         0h-3  
 4 0h         4         0h-4  
 5 2h         1         2h-1  
 6 2h         2         2h-2  
 7 2h         3         2h-3  
 8 2h         4         2h-4  
 9 18h        1         18h-1 
10 18h        2         18h-2 
11 18h        3         18h-3 
12 18h        4         18h-4 

Differential expression

Differential expression if carried out with edgeR, like for the yeast data, but this time we have more than one contrast. We select timepoint 0h as the baseline and find contrasts between this and other time points.

design_mat <- model.matrix(~time_point, mouse_metadata)
coefficients <- colnames(design_mat)[-1]

dge <- DGEList(renamed_mtx)
keep <- filterByExpr(dge, design_mat)
fit <- dge[keep, ] |> 
  normLibSizes() |>
  glmQLFit(design = design_mat)

mouse_de <- map(coefficients, function(coef) {
  fit |> 
    glmQLFTest(coef = coef) |> 
    topTags(n = 1e16, sort.by = "none") |>
    pluck("table") |>
    as_tibble(rownames = "id") |>
    add_column(contrast = coef)
}) |> 
  list_rbind() |> 
  rename(
    log_fc = logFC,
    expr = logCPM,
    p_value = PValue
  )

mouse_de
# A tibble: 38,954 × 7
   id        log_fc  expr        F p_value   FDR contrast    
   <chr>      <dbl> <dbl>    <dbl>   <dbl> <dbl> <chr>       
 1 SOX17   -0.330    1.45 0.719     0.410   1.00 time_point2h
 2 GM6123   0.0253   3.47 0.0121    0.914   1.00 time_point2h
 3 MRPL15  -0.00572  7.54 0.00165   0.968   1.00 time_point2h
 4 GM37144  0.549    3.32 3.32      0.0881  1.00 time_point2h
 5 LYPLA1   0.181    6.63 0.783     0.390   1.00 time_point2h
 6 TCEA1    0.00297  6.54 0.000521  0.982   1.00 time_point2h
 7 GM6104   0.00878  2.12 0.000346  0.985   1.00 time_point2h
 8 GM37277  0.00513  2.69 0.000125  0.991   1.00 time_point2h
 9 RGS20   -0.344    2.49 0.635     0.438   1.00 time_point2h
10 GM16041 -0.452    1.15 0.925     0.352   1.00 time_point2h
# ℹ 38,944 more rows

Count data in long format

We need data in long format:

mouse_data <- renamed_mtx |> 
  as_tibble(rownames = "id") |> 
  pivot_longer(-id, names_to = "sample", values_to = "value")

mouse_data
# A tibble: 337,596 × 3
   id    sample value
   <chr> <chr>  <dbl>
 1 XKR4  0h-1       2
 2 XKR4  0h-2       4
 3 XKR4  0h-3       0
 4 XKR4  0h-4      10
 5 XKR4  2h-1       4
 6 XKR4  2h-2       3
 7 XKR4  2h-3       2
 8 XKR4  2h-4       3
 9 XKR4  18h-1      3
10 XKR4  18h-2      4
# ℹ 337,586 more rows

Dexdash data set

mouse_dexdash <- dexdash_set(mouse_de, mouse_data, mouse_metadata, name = "Mouse")

Feature names and descriptions

For the Shiny app, a feature information table is needed to link feature identifiers (capitalised gene symbols), feature names (original gene symbols) and descriptions. The columns needed are described in Table 2. Here is how we prepare the required tibble:

mouse_features <- mouse_genes |> 
  mutate(id = toupper(name))

mouse_features
# A tibble: 57,180 × 3
   id    name  description                                                       
   <chr> <chr> <chr>                                                             
 1 GNAI3 Gnai3 guanine nucleotide binding protein (G protein), alpha inhibiting 3
 2 PBSN  Pbsn  probasin                                                          
 3 CDC45 Cdc45 cell division cycle 45                                            
 4 H19   H19   H19, imprinted maternally expressed transcript                    
 5 SCML2 Scml2 Scm polycomb group protein like 2                                 
 6 APOH  Apoh  apolipoprotein H                                                  
 7 NARF  Narf  nuclear prelamin A recognition factor                             
 8 CAV2  Cav2  caveolin 2                                                        
 9 KLF6  Klf6  Kruppel-like transcription factor 6                               
10 SCMH1 Scmh1 sex comb on midleg homolog 1                                      
# ℹ 57,170 more rows

Functional enrichment data

Finally, we need to prepare functional enrichment data in the fast enrichment format. We specify feature_name = "gene_symbol" to inform the function which column should be used to identify features.

mouse_fterms <- prepare_functional_terms(mouse_terms, feature_name = "gene_symbol")

Run dexdash

Now, we have all necessary data to launch dexdash:

run_app(mouse_dexdash, mouse_features, mouse_fterms)

Multiple data sets

The function dexdash_list() can be used to combine a few data sets together, for example:

set1 <- dexdash_set(de1, data1, metadata1, "Experiment 1")
set2 <- dexdash_set(de2, data2, metadata2, "Experiment 2")

dexlist <- dexdash_list(se1, set2)
run_app(declist, features, fterms)

Here set1 and set2 contain data and differential expression results from two separate expreiments. They are, however, based on the same organism, so they share features and fterms.

Comparison with Other Tools

  • iDEP is a flexible web-based tool that supports comprehensive data analysis, starting from the count matrix. It includes preprocessing, clustering, PCA, differential expression, KEGG pathway analysis, and more. While it generates volcano and MA plots, these lack fast interactivity. Specifically, enrichment analysis in iDEP is only performed on down- or up-regulated genes. In contrast, dexdash is smaller and more focused on interactivity, visualization, and fast functional enrichment on any arbitrary gene selection.
  • ShinyGO is a web-based Shiny app for comprehensive pathway enrichment analysis. Users can submit a list of gene names for analysis, which then provides various outputs. In contrast, dexdash works with differential expression result sets and creates interactive volcano/MA plots for fast repeated gene selection and simple GO, KEGG, and Reactome enrichment analysis.
  • Glimma is a Bioconductor R package designed to create interactive MDS, MA, and volcano plots. Users can explore gene expression patterns by selecting individual genes. In contrast, dexdash allows for multiple gene selection, functional enrichment analysis, and reverse selection of functional terms to find functionally related genes.
  • ShinyNGS is a Shiny-based exploratory tool for RNA-seq data analysis. It includes a variety of interactive plots for quality control, differential expression, and gene-set enrichment. While dexdash shares some features, it offers fast interactive enrichment analysis, which is not provided by ShinyNGS.
  • Other packages. VolcaNoseR is a simple customizable volcano plot with no additional functionality. Degust is a web-based tool that includes interactive volcano and MA plots offering KEGG pathway visualization but lacks functional enrichment. R packages clusterProfiler, ReactomePA, GOplot, and topGO provide robust tools for enrichment analysis but do not offer the interactive, user-driven experience found in dexdash.

Session info

R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] dexdata_0.1.0 dexdash_0.2.7 edgeR_4.2.1   limma_3.60.4  stringr_1.5.1 forcats_1.0.0
 [7] purrr_1.0.2   tidyr_1.3.1   dplyr_1.1.4   tibble_3.2.1 

loaded via a namespace (and not attached):
 [1] utf8_1.2.4        generics_0.1.3    renv_1.0.7        stringi_1.8.3     lattice_0.22-6   
 [6] digest_0.6.35     magrittr_2.0.3    evaluate_0.23     grid_4.4.1        pkgload_1.4.0    
[11] fastmap_1.2.0     jsonlite_1.8.8    processx_3.8.4    sessioninfo_1.2.2 pkgbuild_1.4.4   
[16] ps_1.7.7          urlchecker_1.0.1  promises_1.3.0    fansi_1.0.6       cli_3.6.2        
[21] shiny_1.9.1       rlang_1.1.3       splines_4.4.1     ellipsis_0.3.2    withr_3.0.0      
[26] remotes_2.5.0     cachem_1.1.0      yaml_2.3.10       devtools_2.4.5    tools_4.4.1      
[31] memoise_2.0.1     locfit_1.5-9.10   httpuv_1.6.15     assertthat_0.2.1  vctrs_0.6.5      
[36] R6_2.5.1          mime_0.12         lifecycle_1.0.4   fs_1.6.3          htmlwidgets_1.6.4
[41] usethis_3.0.0     miniUI_0.1.1.1    fenr_1.2.1        callr_3.7.6       pkgconfig_2.0.3  
[46] pillar_1.9.0      later_1.3.2       profvis_0.3.8     glue_1.7.0        Rcpp_1.0.13      
[51] statmod_1.5.0     xfun_0.43         tidyselect_1.2.1  knitr_1.46        xtable_1.8-4     
[56] htmltools_0.5.8.1 rmarkdown_2.27    compiler_4.4.1