Using proteus R package: TMT data

Marek Gierlinski

This tutorial demonstrates how to analyse data from TMT MS/MS experiment in Proteus. We strongly suggest the user should read the main tutorial first and familiarize themselves with the package.

vignette("proteus")

Here, we are only going to point out the differences between label-free and TMT analysis. For this, we use data from Crozier et al. 2018, Pride accession PXD008741. These data are distributed in a separate package proteusTMT, which needs loading first:

library(proteusTMT)
data(proteusTMT)

1 Input data

1.1 Columns in evidence file

The default measure.cols object is designed for label-free data. For TMT data we need to specify all reporter intensity columns. In out example we have 10 reporter columns, numbered from 0 to 9:

measCols <- paste0("Reporter intensity ", 0:9)
names(measCols) <- paste0("reporter_", 0:9)

The resulting object is

str(as.list(measCols))
## List of 10
##  $ reporter_0: chr "Reporter intensity 0"
##  $ reporter_1: chr "Reporter intensity 1"
##  $ reporter_2: chr "Reporter intensity 2"
##  $ reporter_3: chr "Reporter intensity 3"
##  $ reporter_4: chr "Reporter intensity 4"
##  $ reporter_5: chr "Reporter intensity 5"
##  $ reporter_6: chr "Reporter intensity 6"
##  $ reporter_7: chr "Reporter intensity 7"
##  $ reporter_8: chr "Reporter intensity 8"
##  $ reporter_9: chr "Reporter intensity 9"

We are going to use the default evidenceColumns object, but the user should carefully check that the column names correspond to those in the evidence file.

1.2 Read evidence file

Due to its large size we do not attach the TMT evidence file to this package. The command to read the file would be:

evi <- readEvidenceFile(evidenceFile, measure.cols=measCols)

We do attach an example processed evidence data with the package, it is called evi. It contains all reported intensity columns:

head(evi)
##        sequence modified_sequence protein_group      proteins
## 1 AAAAAASVGSSNK   _AAAAAASVGSSNK_    Unmodified   Tb927.1.840
## 2  AAAAALLSALLR    _AAAAALLSALLR_    Unmodified Tb927.10.1310
## 3  AAAAALLSALLR    _AAAAALLSALLR_    Unmodified Tb927.10.1310
## 4  AAAAALLSALLR    _AAAAALLSALLR_    Unmodified Tb927.10.1310
## 5  AAAAALLSALLR    _AAAAALLSALLR_    Unmodified Tb927.10.1310
## 6  AAAAALLSALLR    _AAAAALLSALLR_    Unmodified Tb927.10.1310
##         protein experiment charge reverse contaminant reporter_0
## 1   Tb927.1.840      E5016      2                             NA
## 2 Tb927.10.1310      E5015      2                        4079.90
## 3 Tb927.10.1310      E5015      3                        1201.50
## 4 Tb927.10.1310      E5016      2                        1075.00
## 5 Tb927.10.1310      E5016      3                         594.79
## 6 Tb927.10.1310      E5014      2                         961.38
##   reporter_1 reporter_2 reporter_3 reporter_4 reporter_5 reporter_6
## 1         NA     45.638     190.24     83.692     55.347         NA
## 2     5038.5   8193.700   10170.00   8853.500   7637.100    7555.20
## 3     1824.5   3049.600    4098.00   3595.300   2940.000    3060.60
## 4     3102.3   2391.800    2790.30   3421.500   2874.200    3057.50
## 5     1756.6   1663.700    1874.20   1809.300   1547.000    1719.60
## 6     1533.8   1059.900    1097.10    929.030   1036.400     729.57
##   reporter_7 reporter_8 reporter_9
## 1     117.83     109.15      69.12
## 2    5632.60    5961.00    6240.70
## 3    2111.50    2433.60    2748.20
## 4    2392.30    3070.20    3067.40
## 5    1443.40    1905.20    1884.00
## 6    1179.10     872.27     557.23

1.3 Metadata

Metadata for this experiment needs to specify all the reporter columns, experiments and corresponding condition and replicates:

metadataFile <- system.file("extdata", "metadata.txt", package="proteusTMT")
meta <- read.delim(metadataFile, header=TRUE, sep="\t")
meta
##    experiment              measure sample condition replicate
## 1       E5014 Reporter intensity 0 CTRL-1      CTRL         1
## 2       E5015 Reporter intensity 0 CTRL-2      CTRL         2
## 3       E5016 Reporter intensity 0 CTRL-3      CTRL         3
## 4       E5014 Reporter intensity 1 T0.5-1      T0.5         1
## 5       E5015 Reporter intensity 1 T0.5-2      T0.5         2
## 6       E5016 Reporter intensity 1 T0.5-3      T0.5         3
## 7       E5014 Reporter intensity 2   T3-1        T3         1
## 8       E5015 Reporter intensity 2   T3-2        T3         2
## 9       E5016 Reporter intensity 2   T3-3        T3         3
## 10      E5014 Reporter intensity 3   T5-1        T5         1
## 11      E5015 Reporter intensity 3   T5-2        T5         2
## 12      E5016 Reporter intensity 3   T5-3        T5         3
## 13      E5014 Reporter intensity 4   T6-1        T6         1
## 14      E5015 Reporter intensity 4   T6-2        T6         2
## 15      E5016 Reporter intensity 4   T6-3        T6         3
## 16      E5014 Reporter intensity 5   T7-1        T7         1
## 17      E5015 Reporter intensity 5   T7-2        T7         2
## 18      E5016 Reporter intensity 5   T7-3        T7         3
## 19      E5014 Reporter intensity 6   T8-1        T8         1
## 20      E5015 Reporter intensity 6   T8-2        T8         2
## 21      E5016 Reporter intensity 6   T8-3        T8         3
## 22      E5014 Reporter intensity 7   T9-1        T9         1
## 23      E5015 Reporter intensity 7   T9-2        T9         2
## 24      E5016 Reporter intensity 7   T9-3        T9         3
## 25      E5014 Reporter intensity 8  T10-1       T10         1
## 26      E5015 Reporter intensity 8  T10-2       T10         2
## 27      E5016 Reporter intensity 8  T10-3       T10         3
## 28      E5014 Reporter intensity 9  T11-1       T11         1
## 29      E5015 Reporter intensity 9  T11-2       T11         2
## 30      E5016 Reporter intensity 9  T11-3       T11         3

1.4 No ‘Experiment’ column?

It is possible to run MaxQuant for one experiment where conditions are encoded in reporter columns only. In this case the evidence file will not contain the column ‘Experiment’. Proteus can deal with such configuration. All you need is to remove the ‘experiment’ field from the default column list:

eviCols <- evidenceColumns
eviCols$experiment <- NULL

and read evidence file with an additional argument:

evi <- readEvidenceFile(evidenceFile, measure.cols=measCols, data.cols = eviCols)

The ‘Experiment’ column will be skipped. Obviously, metadata should not contain ‘Experiment’ column either.

Please note that in such case peptide and protein objects created by Proteus will contain a metadata data frame (metadata field in the object, see, e.g., pepdat$metadata) with column experiment filled with dummy values. This is done for consistent internal processing. OK, we are lazy and it was easier to do it this way.

2 Peptide data

2.1 Create a peptide dataset

We chose the median to aggregate peptide intensities from multiple entries in the evidence data:

pepdat <- makePeptideTable(evi, meta, measure.cols=measCols, aggregate.fun=aggregateMedian, experiment.type="TMT")

We can use generic summary function to see more information about pepdat.

summary(pepdat)
## 
## *** Basic statistics ***
## 
##   content = peptide
##   experiment type = TMT
##   number of samples = 30
##   number of conditions = 10
##   number of peptides = 42381
##   samples = CTRL-1, CTRL-2, CTRL-3, T0.5-1, T0.5-2, T0.5-3, T3-1, T3-2, T3-3, T5-1, T5-2, T5-3, T6-1, T6-2, T6-3, T7-1, T7-2, T7-3, T8-1, T8-2, T8-3, T9-1, T9-2, T9-3, T10-1, T10-2, T10-3, T11-1, T11-2, T11-3
##   conditions = CTRL, T0.5, T10, T11, T3, T5, T6, T7, T8, T9
## 
## *** Data processing ***
## 
##   evidence columns used = Reporter intensity 0, Reporter intensity 1, Reporter intensity 2, Reporter intensity 3, Reporter intensity 4, Reporter intensity 5, Reporter intensity 6, Reporter intensity 7, Reporter intensity 8, Reporter intensity 9
##   sequence = 'Sequence'
##   protein = 'Leading razor protein'
##   normalization = identity

2.2 Number of peptides

This is the number of non-zero peptide intensities per sample.

plotCount(pepdat)

3 Protein data

3.1 Create protein dataset

We create protein data using the high-flyer method.

prodat <- makeProteinTable(pepdat, aggregate.fun=aggregateHifly, hifly=3)

Again, we can use a generic summary function to see its properties.

summary(prodat)
## 
## *** Basic statistics ***
## 
##   content = protein
##   experiment type = TMT
##   number of samples = 30
##   number of conditions = 10
##   number of proteins = 5772
##   samples = CTRL-1, CTRL-2, CTRL-3, T0.5-1, T0.5-2, T0.5-3, T3-1, T3-2, T3-3, T5-1, T5-2, T5-3, T6-1, T6-2, T6-3, T7-1, T7-2, T7-3, T8-1, T8-2, T8-3, T9-1, T9-2, T9-3, T10-1, T10-2, T10-3, T11-1, T11-2, T11-3
##   conditions = CTRL, T0.5, T10, T11, T3, T5, T6, T7, T8, T9
## 
## *** Data processing ***
## 
##   evidence columns used = Reporter intensity 0, Reporter intensity 1, Reporter intensity 2, Reporter intensity 3, Reporter intensity 4, Reporter intensity 5, Reporter intensity 6, Reporter intensity 7, Reporter intensity 8, Reporter intensity 9
##   sequence = 'Sequence'
##   protein = 'Leading razor protein'
##   normalization = identity
## 
## *** Protein data ***
## 
##   quantification method = 
##   minimum number of peptides per protein = 2

3.2 Normalization

For TMT data we recommend using CONSTANd normalization Maes et al. 2016.

prodat.norm <- normalizeTMT(prodat)

These two figures show reporter intensity distributions before and after normalization.

plotSampleDistributions(prodat, fill="replicate") + labs(title="Before")

plotSampleDistributions(prodat.norm, log.scale=FALSE, fill="replicate") + labs(title="After")

3.3 Protein clustering

We can use the same function plotClustering() to see the dendrogram for the proteins.

plotClustering(prodat.norm)

4 Differential expression

These particular data come from a time-course experiment that would require analysis beyond Proteus package. We can however show how differential expression can be used to identify proteins changing between time points 0.5h and 8h.

res <- limmaDE(prodat.norm, conditions=c("T0.5", "T8"))

The distribution of p-values looks fine.

plotPdist(res)
## Warning: Removed 805 rows containing non-finite values (stat_bin).

The top of the table of differentially expressed proteins:

res <- res[order(res$adj.P.Val),]
head(res)
##             protein     logFC   AveExpr         t      P.Value
## 1417  Tb927.11.1030  2.830422 -4.129244  74.00862 7.726029e-11
## 1583 Tb927.11.12410  1.749682 -3.787771  38.09474 5.981584e-09
## 1130  Tb927.10.6330  2.367067 -3.989116  32.95518 1.541802e-08
## 1659  Tb927.11.1340  1.526537 -3.807683  30.91149 2.341458e-08
## 3650   Tb927.5.285b -1.499136 -3.068029 -31.22542 2.192074e-08
## 3898   Tb927.6.2070 -1.687678 -3.597221 -26.04447 7.150507e-08
##         adj.P.Val         B significant mean_T0.5   mean_T8 ngood_T0.5
## 1417 3.837519e-07 12.482381        TRUE -5.544455 -2.714033          3
## 1583 1.485526e-05 10.651477        TRUE -4.662612 -2.912930          3
## 1130 2.326005e-05 10.042550        TRUE -5.172649 -2.805583          3
## 1659 2.326005e-05  9.749585        TRUE -4.570952 -3.044415          3
## 3650 2.326005e-05  9.796774        TRUE -2.318461 -3.817597          3
## 3898 5.073795e-05  8.897958        TRUE -2.753382 -4.441060          3
##      ngood_T8
## 1417        3
## 1583        3
## 1130        3
## 1659        3
## 3650        3
## 3898        3

And the top protein from the table:

prot <- as.character(res$protein[1])
plotIntensities(prodat.norm, id=prot)