This tutorial demonstrates how to analyse data from TMT MS/MS experiment in Proteus. We strongly suggest the user should read the main tutorial first and familiarize themselves with the package.
vignette("proteus")
Here, we are only going to point out the differences between label-free and TMT analysis. For this, we use data from Crozier et al. 2018, Pride accession PXD008741. These data are distributed in a separate package proteusTMT, which needs loading first:
library(proteusTMT)
data(proteusTMT)
The default measure.cols
object is designed for label-free data. For TMT data we need to specify all reporter intensity columns. In out example we have 10 reporter columns, numbered from 0 to 9:
measCols <- paste0("Reporter intensity ", 0:9)
names(measCols) <- paste0("reporter_", 0:9)
The resulting object is
str(as.list(measCols))
## List of 10
## $ reporter_0: chr "Reporter intensity 0"
## $ reporter_1: chr "Reporter intensity 1"
## $ reporter_2: chr "Reporter intensity 2"
## $ reporter_3: chr "Reporter intensity 3"
## $ reporter_4: chr "Reporter intensity 4"
## $ reporter_5: chr "Reporter intensity 5"
## $ reporter_6: chr "Reporter intensity 6"
## $ reporter_7: chr "Reporter intensity 7"
## $ reporter_8: chr "Reporter intensity 8"
## $ reporter_9: chr "Reporter intensity 9"
We are going to use the default evidenceColumns
object, but the user should carefully check that the column names correspond to those in the evidence file.
Due to its large size we do not attach the TMT evidence file to this package. The command to read the file would be:
evi <- readEvidenceFile(evidenceFile, measure.cols=measCols)
We do attach an example processed evidence data with the package, it is called evi
. It contains all reported intensity columns:
head(evi)
## sequence modified_sequence protein_group proteins
## 1 AAAAAASVGSSNK _AAAAAASVGSSNK_ Unmodified Tb927.1.840
## 2 AAAAALLSALLR _AAAAALLSALLR_ Unmodified Tb927.10.1310
## 3 AAAAALLSALLR _AAAAALLSALLR_ Unmodified Tb927.10.1310
## 4 AAAAALLSALLR _AAAAALLSALLR_ Unmodified Tb927.10.1310
## 5 AAAAALLSALLR _AAAAALLSALLR_ Unmodified Tb927.10.1310
## 6 AAAAALLSALLR _AAAAALLSALLR_ Unmodified Tb927.10.1310
## protein experiment charge reverse contaminant reporter_0
## 1 Tb927.1.840 E5016 2 NA
## 2 Tb927.10.1310 E5015 2 4079.90
## 3 Tb927.10.1310 E5015 3 1201.50
## 4 Tb927.10.1310 E5016 2 1075.00
## 5 Tb927.10.1310 E5016 3 594.79
## 6 Tb927.10.1310 E5014 2 961.38
## reporter_1 reporter_2 reporter_3 reporter_4 reporter_5 reporter_6
## 1 NA 45.638 190.24 83.692 55.347 NA
## 2 5038.5 8193.700 10170.00 8853.500 7637.100 7555.20
## 3 1824.5 3049.600 4098.00 3595.300 2940.000 3060.60
## 4 3102.3 2391.800 2790.30 3421.500 2874.200 3057.50
## 5 1756.6 1663.700 1874.20 1809.300 1547.000 1719.60
## 6 1533.8 1059.900 1097.10 929.030 1036.400 729.57
## reporter_7 reporter_8 reporter_9
## 1 117.83 109.15 69.12
## 2 5632.60 5961.00 6240.70
## 3 2111.50 2433.60 2748.20
## 4 2392.30 3070.20 3067.40
## 5 1443.40 1905.20 1884.00
## 6 1179.10 872.27 557.23
Metadata for this experiment needs to specify all the reporter columns, experiments and corresponding condition and replicates:
metadataFile <- system.file("extdata", "metadata.txt", package="proteusTMT")
meta <- read.delim(metadataFile, header=TRUE, sep="\t")
meta
## experiment measure sample condition replicate
## 1 E5014 Reporter intensity 0 CTRL-1 CTRL 1
## 2 E5015 Reporter intensity 0 CTRL-2 CTRL 2
## 3 E5016 Reporter intensity 0 CTRL-3 CTRL 3
## 4 E5014 Reporter intensity 1 T0.5-1 T0.5 1
## 5 E5015 Reporter intensity 1 T0.5-2 T0.5 2
## 6 E5016 Reporter intensity 1 T0.5-3 T0.5 3
## 7 E5014 Reporter intensity 2 T3-1 T3 1
## 8 E5015 Reporter intensity 2 T3-2 T3 2
## 9 E5016 Reporter intensity 2 T3-3 T3 3
## 10 E5014 Reporter intensity 3 T5-1 T5 1
## 11 E5015 Reporter intensity 3 T5-2 T5 2
## 12 E5016 Reporter intensity 3 T5-3 T5 3
## 13 E5014 Reporter intensity 4 T6-1 T6 1
## 14 E5015 Reporter intensity 4 T6-2 T6 2
## 15 E5016 Reporter intensity 4 T6-3 T6 3
## 16 E5014 Reporter intensity 5 T7-1 T7 1
## 17 E5015 Reporter intensity 5 T7-2 T7 2
## 18 E5016 Reporter intensity 5 T7-3 T7 3
## 19 E5014 Reporter intensity 6 T8-1 T8 1
## 20 E5015 Reporter intensity 6 T8-2 T8 2
## 21 E5016 Reporter intensity 6 T8-3 T8 3
## 22 E5014 Reporter intensity 7 T9-1 T9 1
## 23 E5015 Reporter intensity 7 T9-2 T9 2
## 24 E5016 Reporter intensity 7 T9-3 T9 3
## 25 E5014 Reporter intensity 8 T10-1 T10 1
## 26 E5015 Reporter intensity 8 T10-2 T10 2
## 27 E5016 Reporter intensity 8 T10-3 T10 3
## 28 E5014 Reporter intensity 9 T11-1 T11 1
## 29 E5015 Reporter intensity 9 T11-2 T11 2
## 30 E5016 Reporter intensity 9 T11-3 T11 3
It is possible to run MaxQuant for one experiment where conditions are encoded in reporter columns only. In this case the evidence file will not contain the column ‘Experiment’. Proteus can deal with such configuration. All you need is to remove the ‘experiment’ field from the default column list:
eviCols <- evidenceColumns
eviCols$experiment <- NULL
and read evidence file with an additional argument:
evi <- readEvidenceFile(evidenceFile, measure.cols=measCols, data.cols = eviCols)
The ‘Experiment’ column will be skipped. Obviously, metadata should not contain ‘Experiment’ column either.
Please note that in such case peptide and protein objects created by Proteus will contain a metadata data frame (metadata
field in the object, see, e.g., pepdat$metadata
) with column experiment
filled with dummy values. This is done for consistent internal processing. OK, we are lazy and it was easier to do it this way.
We chose the median to aggregate peptide intensities from multiple entries in the evidence data:
pepdat <- makePeptideTable(evi, meta, measure.cols=measCols, aggregate.fun=aggregateMedian, experiment.type="TMT")
We can use generic summary
function to see more information about pepdat
.
summary(pepdat)
##
## *** Basic statistics ***
##
## content = peptide
## experiment type = TMT
## number of samples = 30
## number of conditions = 10
## number of peptides = 42381
## samples = CTRL-1, CTRL-2, CTRL-3, T0.5-1, T0.5-2, T0.5-3, T3-1, T3-2, T3-3, T5-1, T5-2, T5-3, T6-1, T6-2, T6-3, T7-1, T7-2, T7-3, T8-1, T8-2, T8-3, T9-1, T9-2, T9-3, T10-1, T10-2, T10-3, T11-1, T11-2, T11-3
## conditions = CTRL, T0.5, T10, T11, T3, T5, T6, T7, T8, T9
##
## *** Data processing ***
##
## evidence columns used = Reporter intensity 0, Reporter intensity 1, Reporter intensity 2, Reporter intensity 3, Reporter intensity 4, Reporter intensity 5, Reporter intensity 6, Reporter intensity 7, Reporter intensity 8, Reporter intensity 9
## sequence = 'Sequence'
## protein = 'Leading razor protein'
## normalization = identity
This is the number of non-zero peptide intensities per sample.
plotCount(pepdat)
We create protein data using the high-flyer method.
prodat <- makeProteinTable(pepdat, aggregate.fun=aggregateHifly, hifly=3)
Again, we can use a generic summary
function to see its properties.
summary(prodat)
##
## *** Basic statistics ***
##
## content = protein
## experiment type = TMT
## number of samples = 30
## number of conditions = 10
## number of proteins = 5772
## samples = CTRL-1, CTRL-2, CTRL-3, T0.5-1, T0.5-2, T0.5-3, T3-1, T3-2, T3-3, T5-1, T5-2, T5-3, T6-1, T6-2, T6-3, T7-1, T7-2, T7-3, T8-1, T8-2, T8-3, T9-1, T9-2, T9-3, T10-1, T10-2, T10-3, T11-1, T11-2, T11-3
## conditions = CTRL, T0.5, T10, T11, T3, T5, T6, T7, T8, T9
##
## *** Data processing ***
##
## evidence columns used = Reporter intensity 0, Reporter intensity 1, Reporter intensity 2, Reporter intensity 3, Reporter intensity 4, Reporter intensity 5, Reporter intensity 6, Reporter intensity 7, Reporter intensity 8, Reporter intensity 9
## sequence = 'Sequence'
## protein = 'Leading razor protein'
## normalization = identity
##
## *** Protein data ***
##
## quantification method =
## minimum number of peptides per protein = 2
For TMT data we recommend using CONSTANd normalization Maes et al. 2016.
prodat.norm <- normalizeTMT(prodat)
These two figures show reporter intensity distributions before and after normalization.
plotSampleDistributions(prodat, fill="replicate") + labs(title="Before")
plotSampleDistributions(prodat.norm, log.scale=FALSE, fill="replicate") + labs(title="After")
We can use the same function plotClustering()
to see the dendrogram for the proteins.
plotClustering(prodat.norm)
These particular data come from a time-course experiment that would require analysis beyond Proteus package. We can however show how differential expression can be used to identify proteins changing between time points 0.5h and 8h.
res <- limmaDE(prodat.norm, conditions=c("T0.5", "T8"))
The distribution of p-values looks fine.
plotPdist(res)
## Warning: Removed 805 rows containing non-finite values (stat_bin).
The top of the table of differentially expressed proteins:
res <- res[order(res$adj.P.Val),]
head(res)
## protein logFC AveExpr t P.Value
## 1417 Tb927.11.1030 2.830422 -4.129244 74.00862 7.726029e-11
## 1583 Tb927.11.12410 1.749682 -3.787771 38.09474 5.981584e-09
## 1130 Tb927.10.6330 2.367067 -3.989116 32.95518 1.541802e-08
## 1659 Tb927.11.1340 1.526537 -3.807683 30.91149 2.341458e-08
## 3650 Tb927.5.285b -1.499136 -3.068029 -31.22542 2.192074e-08
## 3898 Tb927.6.2070 -1.687678 -3.597221 -26.04447 7.150507e-08
## adj.P.Val B significant mean_T0.5 mean_T8 ngood_T0.5
## 1417 3.837519e-07 12.482381 TRUE -5.544455 -2.714033 3
## 1583 1.485526e-05 10.651477 TRUE -4.662612 -2.912930 3
## 1130 2.326005e-05 10.042550 TRUE -5.172649 -2.805583 3
## 1659 2.326005e-05 9.749585 TRUE -4.570952 -3.044415 3
## 3650 2.326005e-05 9.796774 TRUE -2.318461 -3.817597 3
## 3898 5.073795e-05 8.897958 TRUE -2.753382 -4.441060 3
## ngood_T8
## 1417 3
## 1583 3
## 1130 3
## 1659 3
## 3650 3
## 3898 3
And the top protein from the table:
prot <- as.character(res$protein[1])
plotIntensities(prodat.norm, id=prot)