Collaborators: Greg Findlay
One of our major interests is in functions of a protein kinase called SRSF protein kinase (SRPK), which is well known to play a role in splicing regulation but also has emerging functions in stem cells and during development. SRPK family genes are mutated in patients with intellectual disability disorders, suggesting that SRPK activity may be required for neurological development and/or function. In this project, we seek to identify new SRPK substrates that are important for regulation of stem cells and neurodevelopment. To this end, we have performed a phosphoproteomic analysis to identify substrates of SRPK that may suggest new molecular functions.
Phosphoproteomics analysis was done with Houjiang Zhou in the MRC-PPU proteomics facility. This involves a very simple experimental set-up in which we have used mouse embryonic stem cells in biological triplicate treated with DMSO control or 2uM or 10uM of a potent and selective SRPK inhibitor, SRPKIN-1. We then performed phosphopeptide enrichment, TMT labelling and fractionation prior to MS analysis on the Oribitrap fusion (?). Phosphopeptide identification was performed using Proteome Discoverer (?). We hope that this experimental design should enable identification of phosphopeptides whose abundance is altered by SRPK kinase activity, which we hypothesis to be SRPK substrates.
Statistical analysis of SRPK phosphoproteomic data to identify phosphopeptides that are significantly different between control ES cells and those treated with the SRPKIN-1 inhibitor.
Data came in form of Excel sheet - output from Proteome Discoverer (PD). As the differential expression in PD is done by a simple t-test, we ignore p-values and process data here. We start from normalised abundances. There are three conditions (hereafter DMOS, Inh2uM and Inh10uM) in three replicates each.
Peptide modifications are encoded in a string in one column of the input file. We parse this modifications extracting number of modifications per peptide (n_mod
), modified residue, position and probability of being correct. In some peptides PD returns multiple residues and no modification position. Here is an example:
sequence | modifications | residues | positions |
---|---|---|---|
SNTPMGDKDDDDDDDADEK | 2xTMT6plex [K8; K19]; 1xTMT6plex [N-Term]; 1xPhospho [T3(100)] | T | 3 |
SNTPMGDKDDDDDDDADEK | 1xOxidation [M5]; 2xTMT6plex [K8; K19]; 1xTMT6plex [N-Term]; 1xPhospho [T3(100)] | T | 3 |
GRQEHYEEEEDEEDGAAVAEK | 1xTMT6plex [K21]; 1xTMT6plex [N-Term]; 1xPhospho [Y6(100)] | Y | 6 |
GCSSPPPPEPNPQPPDGPSLQLPP | 1xCarbamidomethyl [C2]; 1xTMT6plex [N-Term]; 1xPhospho [S] | S | NA |
RSSFLNAK | 1xTMT6plex [K8]; 1xTMT6plex [N-Term]; 1xPhospho [S3(99.4)] | S | 3 |
ALTPPADPPR | 1xTMT6plex [N-Term]; 1xPhospho [T3(100)] | T | 3 |
LSTTPSPTNSLHEDGVDDFRR | 1xTMT6plex [N-Term]; 2xPhospho [S6(99.9); S10(100)] | S,S | 6,10 |
LSTTPSPTNSLHEDGVDDFRR | 1xTMT6plex [N-Term]; 3xPhospho [T3(97.6); S10(99.9); T/S] | T,S,T/S | 3,10,NA |
YSQSAPGSPVSAQPVIMAVPPRPSNLVAK | 1xOxidation [M17]; 1xTMT6plex [K29]; 1xTMT6plex [N-Term]; 1xPhospho [S8(100)] | S | 8 |
LSTTPSPTNSLHEDGVDDFRR | 1xTMT6plex [N-Term]; 1xPhospho [S6(98.8)] | S | 6 |
GGCSSGNSQRRSPPTTK | 1xCarbamidomethyl [C3]; 1xTMT6plex [K17]; 1xTMT6plex [N-Term]; 1xPhospho [S12(100)] | S | 12 |
YSQSAPGSPVSAQPVIMAVPPRPSNLVAK | 1xTMT6plex [K29]; 1xTMT6plex [N-Term]; 1xPhospho [S8(100)] | S | 8 |
YSQSAPGSPVSAQPVIMAVPPRPSNLVAK | 1xTMT6plex [K29]; 1xTMT6plex [N-Term]; 2xPhospho [S4(99.9); S8(100)] | S,S | 4,8 |
The plot shows distribution of normalised abundance across samples.
Here is pairwise comparison of all replicates within each condition.
Let’s have a look at Inh2uM_2 vs Inh2uM_3
I marked outstanding peptides with internal identifiers (sequences with modifications are too long). Have a look at a few of them at the bottom of the plot:
peptide_id | sequence | modifications | n_proteins | n_groups | pep | protein_start | protein_end | uniprot | description | gene_name | positions | residues | seqmod | length |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PEP3 | SGVTNMSSPHK | 1xOxidation [M6]; 1xTMT6plex [K11]; 1xTMT6plex [N-Term]; 1xPhospho [S8(99)] | 1 | 1 | 0 | 177 | 187 | Q8R3D1 | TBC1 domain family member 13 | Tbc1d13 | 8 | S | sgvtnmsSphk | 400 |
PEP4266 | AGDSDEESRTDDK | 1xTMT6plex [K13]; 1xTMT6plex [N-Term]; 1xPhospho [S4(100)] | 1 | 1 | 0 | 328 | 340 | O88939 | Zinc finger and BTB domain-containing protein 7A | Zbtb7a | 4 | S | agdSdeesrtddk | 569 |
PEP27531 | GREGSLTGTK | 1xTMT6plex [K10]; 1xTMT6plex [N-Term]; 1xPhospho [S5(100)] | 1 | 1 | 0 | 1072 | 1081 | G3X9K3 | Brefeldin A-inhibited guanine nucleotide-exchange protein 1 | Arfgef1 | 5 | S | gregSltgtk | 1846 |
PEP29175 | DGGQTESNEEGK | 1xTMT6plex [K12]; 1xTMT6plex [N-Term]; 1xPhospho [S7(100)] | 1 | 1 | 0 | 110 | 121 | Q80X50 | Ubiquitin-associated protein 2-like | Ubap2l | 7 | S | dggqteSneegk | 1107 |
Normalised abundance figure shows that the difference seems to be consistent between replicates. I’m not sure how to interpret it.
For differential expression we use limma
. It used an empirical Bayes method to squeeze the peptide-wise residual variances towards a common value (or towards a global trend) (Smyth, 2004; Phipson et al, 2016). The degrees of freedom for the individual variances are increased to reflect the extra information gained from the empirical Bayes moderation, resulting in increased statistical power to detect differential expression.
For more information see Ritchie et al. 2015.
limma
uses a t-test, which is moderated by borrowing data across all peptides. A global variance model is built and moderation squeezes variances towards this global trend. Extreme variances (either large or small) become less extreme, which makes the test more robust to random outliers.
Below, we set a limit of FDR < 0.05 to call a change statistically significant.
We notice that the inhibitor results are very similar at both concentrations. In particular, there are no significantly DE phosphosites between 2uM and 10uM. The Venn diagram below shows the comparison between DE peptides from 2uM and 10uM versus DMSO.
Due to large overlap and 2uM being essentially a subset of 10uM, we focus our attention on 10uM.
Here are all DE sites for 10uM vs DMSO.
The DE phosphosites are located in a rather small number of proteins.
We use STRING database of known and predicted protein-protein interactions to identify clusters of interacting proteins among up- and down-regulated genes.
We use STRING database of known and predicted protein-protein interactions to identify clusters of interacting proteins among up- and down-regulated genes.
Here we have a quick look at potential functional enrichment of genes with phosphosites going up and down. Warning: a complex pattern of phosphorylation is ignored here, we only look which genes/proteins contain up- and down-regulated phosphosites. We simply look at what these proteins have in common in terms of functionality.
Here we see GO terms related to splicing, which is expected.
term_id | term_name | term_namespace | tot | sel | expect | enrich | ids | P |
---|---|---|---|---|---|---|---|---|
GO:0050733 | RS domain binding | molecular_function | 4 | 2 | 0.02 | 124.9 | Son,Srsf5 | 0.00 |
GO:0036002 | pre-mRNA binding | molecular_function | 11 | 2 | 0.04 | 45.4 | Srsf2,Srsf6 | 0.01 |
GO:0043488 | regulation of mRNA stability | biological_process | 22 | 3 | 0.09 | 34.1 | Zc3h14,Fxr1,Carhsp1 | 0.00 |
GO:0034063 | stress granule assembly | biological_process | 18 | 2 | 0.07 | 27.8 | Ubap2l,Prrc2c | 0.03 |
GO:0007517 | muscle organ development | biological_process | 19 | 2 | 0.08 | 26.3 | Chd2,Fxr1 | 0.03 |
GO:0000381 | regulation of alternative mRNA splicing, via spliceosome | biological_process | 36 | 3 | 0.14 | 20.8 | Srsf2,Srsf6,Fxr1 | 0.01 |
GO:0000398 | mRNA splicing, via spliceosome | biological_process | 84 | 4 | 0.34 | 11.9 | Tra2a,Srsf2,Srsf6,Srsf5 | 0.01 |
GO:0003729 | mRNA binding | molecular_function | 134 | 4 | 0.54 | 7.5 | Fxr2,Srsf6,Fxr1,Srsf5 | 0.02 |
GO:0008380 | RNA splicing | biological_process | 168 | 5 | 0.67 | 7.4 | Son,Srsf2,Scaf1,Srsf6,Srsf5 | 0.01 |
GO:0003676 | nucleic acid binding | molecular_function | 321 | 9 | 1.28 | 7.0 | Son,Matr3,Tra2a,Srsf2,Fxr2,Srsf6,Fxr1,Carhsp1,Srsf5 | 0.00 |
GO:0016607 | nuclear speck | cellular_component | 246 | 6 | 0.98 | 6.1 | Son,Zc3h14,Srsf2,Zc3h18,Srsf6,Srsf5 | 0.01 |
GO:0006397 | mRNA processing | biological_process | 219 | 5 | 0.88 | 5.7 | Son,Srsf2,Scaf1,Srsf6,Srsf5 | 0.02 |
GO:0003723 | RNA binding | molecular_function | 515 | 11 | 2.06 | 5.3 | Son,Matr3,Zc3h14,Tra2a,Srsf2,Scaf1,Fxr2,Srsf6,Fxr1,Carhsp1,Srsf5 | 0.00 |
This is also confirmed by Reactome and KEGG pathways enrichment
term_id | term_name | tot | sel | expect | enrich | ids | P |
---|---|---|---|---|---|---|---|
R-MMU-72165 | mRNA Splicing - Minor Pathway | 19 | 2 | 0.08 | 26.3 | Srsf2,Srsf6 | 0.01 |
R-MMU-72187 | mRNA 3’-end processing | 36 | 3 | 0.14 | 20.8 | Srsf2,Srsf6,Srsf5 | 0.01 |
R-MMU-73856 | RNA Polymerase II Transcription Termination | 38 | 3 | 0.15 | 19.7 | Srsf2,Srsf6,Srsf5 | 0.01 |
R-MMU-159236 | Transport of Mature mRNA derived from an Intron-Containing Transcript | 46 | 3 | 0.18 | 16.3 | Srsf2,Srsf6,Srsf5 | 0.01 |
R-MMU-72202 | Transport of Mature Transcript to Cytoplasm | 54 | 3 | 0.22 | 13.9 | Srsf2,Srsf6,Srsf5 | 0.01 |
R-MMU-72163 | mRNA Splicing - Major Pathway | 106 | 3 | 0.42 | 7.1 | Srsf2,Srsf6,Srsf5 | 0.04 |
R-MMU-72172 | mRNA Splicing | 108 | 3 | 0.43 | 6.9 | Srsf2,Srsf6,Srsf5 | 0.04 |
term_id | term_name | tot | sel | expect | enrich | ids | P |
---|---|---|---|---|---|---|---|
mmu03040 | Spliceosome - Mus musculus (mouse) | 77 | 4 | 0.31 | 13.0 | Tra2a,Srsf5,Srsf2,Srsf6 | 0 |
mmu05168 | Herpes simplex virus 1 infection - Mus musculus (mouse) | 60 | 3 | 0.24 | 12.5 | Srsf5,Srsf2,Srsf6 | 0 |
There is no obvious enrichment here, it seems that down-regulated sites are more heterogeneous.
Source code is available from GitHub (private until publication).
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] DT_0.20 eulerr_6.1.1 kableExtra_1.3.4 forcats_0.5.1 stringr_1.4.0
## [6] dplyr_1.0.7 purrr_0.3.4 readr_2.1.1 tidyr_1.1.4 tibble_3.1.6
## [11] ggplot2_3.3.5 tidyverse_1.3.1 targets_0.9.0
##
## loaded via a namespace (and not attached):
## [1] fs_1.5.2 lubridate_1.8.0 webshot_0.5.2 RColorBrewer_1.1-2
## [5] httr_1.4.2 tools_4.1.2 backports_1.4.1 bslib_0.3.1
## [9] utf8_1.2.2 R6_2.5.1 DBI_1.1.2 colorspace_2.0-2
## [13] withr_2.4.3 tidyselect_1.1.1 GGally_2.1.2 processx_3.5.2
## [17] compiler_4.1.2 cli_3.1.0 rvest_1.0.2 xml2_1.3.3
## [21] labeling_0.4.2 stringfish_0.15.5 sass_0.4.0 scales_1.1.1
## [25] callr_3.7.0 systemfonts_1.0.3 digest_0.6.29 rmarkdown_2.11
## [29] svglite_2.0.0 pkgconfig_2.0.3 htmltools_0.5.2 dbplyr_2.1.1
## [33] fastmap_1.1.0 highr_0.9 htmlwidgets_1.5.4 rlang_0.4.12
## [37] readxl_1.3.1 rstudioapi_0.13 jquerylib_0.1.4 generics_0.1.1
## [41] farver_2.1.0 RApiSerialize_0.1.0 jsonlite_1.7.2 crosstalk_1.2.0
## [45] magrittr_2.0.1 Rcpp_1.0.7 munsell_0.5.0 fansi_0.5.0
## [49] lifecycle_1.0.1 stringi_1.7.6 yaml_2.2.1 plyr_1.8.6
## [53] grid_4.1.2 ggrepel_0.9.1 crayon_1.4.2 haven_2.4.3
## [57] hms_1.1.1 polylabelr_0.2.0 knitr_1.37 ps_1.6.0
## [61] pillar_1.6.4 igraph_1.2.10 codetools_0.2-18 reprex_2.0.1
## [65] glue_1.6.0 evaluate_0.14 data.table_1.14.2 RcppParallel_5.1.4
## [69] modelr_0.1.8 vctrs_0.3.8 tzdb_0.2.0 cellranger_1.1.0
## [73] gtable_0.3.0 polyclip_1.10-0 reshape_0.8.8 qs_0.25.2
## [77] assertthat_0.2.1 xfun_0.29 broom_0.7.11 viridisLite_0.4.0
## [81] ellipsis_0.3.2