Phosphoproteomics of SRPK

Proposal

Background

One of our major interests is the function of SRSF protein kinase (SRPK), which is well known for its role in splicing regulation but also has emerging functions in stem cells and during development. SRPK family genes are mutated in patients with intellectual disability disorders, suggesting that SRPK activity may be required for neurological development and/or function. In this project, we seek to identify new SRPK substrates that are important for the regulation of stem cells and neurodevelopment. To this end, we have performed a phosphoproteomic analysis to identify SRPK substrates that may suggest new molecular functions.

Phosphoproteomics analysis was done with Houjiang Zhou in the MRC-PPU proteomics facility. The experimental set-up is simple: mouse embryonic stem cells, in biological triplicate, were treated with DMSO control or with 2uM or 10uM of a potent and selective SRPK inhibitor, SRPKIN-1. We then performed phosphopeptide enrichment, TMT labelling and fractionation prior to MS analysis on the Orbitrap Fusion (?). Phosphopeptide identification was performed using Proteome Discoverer (?). This experimental design should enable identification of phosphopeptides whose abundance is altered by SRPK kinase activity, which we hypothesise to be SRPK substrates.

Description of Proposed Work

Statistical analysis of SRPK phosphoproteomic data to identify phosphopeptides that are significantly different between control ES cells and those treated with the SRPKIN-1 inhibitor.

Data

The data came as an Excel sheet, the output from Proteome Discoverer (PD). Because differential expression in PD is done with a simple t-test, we ignore the PD p-values and analyse the data here, starting from normalised abundances. There are three conditions (hereafter DMSO, Inh2uM and Inh10uM) with three replicates each.

Peptide modifications are encoded as a string in one column of the input file. We parse this string, extracting the number of modifications per peptide (n_mod), the modified residue, its position and the probability that the assignment is correct. For some peptides PD returns multiple candidate residues and no modification position. Here is an example:

sequence	modifications	residues	positions
SNTPMGDKDDDDDDDADEK	2xTMT6plex [K8; K19]; 1xTMT6plex [N-Term]; 1xPhospho [T3(100)]	T	3
SNTPMGDKDDDDDDDADEK	1xOxidation [M5]; 2xTMT6plex [K8; K19]; 1xTMT6plex [N-Term]; 1xPhospho [T3(100)]	T	3
GRQEHYEEEEDEEDGAAVAEK	1xTMT6plex [K21]; 1xTMT6plex [N-Term]; 1xPhospho [Y6(100)]	Y	6
GCSSPPPPEPNPQPPDGPSLQLPP	1xCarbamidomethyl [C2]; 1xTMT6plex [N-Term]; 1xPhospho [S]	S	NA
RSSFLNAK	1xTMT6plex [K8]; 1xTMT6plex [N-Term]; 1xPhospho [S3(99.4)]	S	3
ALTPPADPPR	1xTMT6plex [N-Term]; 1xPhospho [T3(100)]	T	3
LSTTPSPTNSLHEDGVDDFRR	1xTMT6plex [N-Term]; 2xPhospho [S6(99.9); S10(100)]	S,S	6,10
LSTTPSPTNSLHEDGVDDFRR	1xTMT6plex [N-Term]; 3xPhospho [T3(97.6); S10(99.9); T/S]	T,S,T/S	3,10,NA
YSQSAPGSPVSAQPVIMAVPPRPSNLVAK	1xOxidation [M17]; 1xTMT6plex [K29]; 1xTMT6plex [N-Term]; 1xPhospho [S8(100)]	S	8
LSTTPSPTNSLHEDGVDDFRR	1xTMT6plex [N-Term]; 1xPhospho [S6(98.8)]	S	6
GGCSSGNSQRRSPPTTK	1xCarbamidomethyl [C3]; 1xTMT6plex [K17]; 1xTMT6plex [N-Term]; 1xPhospho [S12(100)]	S	12
YSQSAPGSPVSAQPVIMAVPPRPSNLVAK	1xTMT6plex [K29]; 1xTMT6plex [N-Term]; 1xPhospho [S8(100)]	S	8
YSQSAPGSPVSAQPVIMAVPPRPSNLVAK	1xTMT6plex [K29]; 1xTMT6plex [N-Term]; 2xPhospho [S4(99.9); S8(100)]	S,S	4,8
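
The parsing step can be illustrated with a minimal Python sketch. This is not the code used for this report (the analysis itself was done in R); the function name `parse_phospho` and the regular expressions are hypothetical, and the sketch assumes that n_mod counts phospho modifications only and that every site is written either as residue, position and probability, e.g. `T3(97.6)`, or as a bare residue such as `S` or `T/S`.

```python
import re

# "3xPhospho [T3(97.6); S10(99.9); T/S]" -> count and bracketed site list
PHOSPHO_RE = re.compile(r"(\d+)xPhospho \[([^\]]*)\]")
# "T3(97.6)" -> residue, position, probability; "S" or "T/S" -> residue only
SITE_RE = re.compile(r"^([A-Z](?:/[A-Z])*)(?:(\d+)\(([\d.]+)\))?$")


def parse_phospho(modifications):
    """Extract phospho sites from a PD-style modification string (hypothetical helper)."""
    result = {"n_mod": 0, "residues": [], "positions": [], "probabilities": []}
    match = PHOSPHO_RE.search(modifications)
    if match is None:
        return result  # no phospho entry in this string
    result["n_mod"] = int(match.group(1))
    for site in match.group(2).split(";"):
        site_match = SITE_RE.match(site.strip())
        if site_match is None:
            raise ValueError(f"Unrecognised phospho site: {site!r}")
        residue, position, probability = site_match.groups()
        result["residues"].append(residue)
        # Position and probability are missing when PD could not localise the site
        result["positions"].append(int(position) if position else None)
        result["probabilities"].append(float(probability) if probability else None)
    return result


example = "1xTMT6plex [N-Term]; 3xPhospho [T3(97.6); S10(99.9); T/S]"
parsed = parse_phospho(example)
print(parsed["residues"], parsed["positions"])
```

Unlocalised sites are kept with a missing position, matching the NA entries in the table above.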

Overview

Abundance distribution

The plot shows distribution of normalised abundance across samples.

Clustering

PCA

UMAP

Replicate comparison

Here is pairwise comparison of all replicates within each condition.

Let’s have a look at Inh2uM_2 vs Inh2uM_3

I marked the peptides that stand out with internal identifiers (sequences with modifications are too long to use as labels). Here are a few of them from the bottom of the plot:

peptide_id	sequence	modifications	n_proteins	n_groups	protein_start	protein_end	uniprot	description	gene_name	positions	residues	seqmod	length
PEP3	SGVTNMSSPHK	1xOxidation [M6]; 1xTMT6plex [K11]; 1xTMT6plex [N-Term]; 1xPhospho [S8(99)]	1	1	177	187	Q8R3D1	TBC1 domain family member 13	Tbc1d13	8	S	sgvtnmsSphk	400
PEP4266	AGDSDEESRTDDK	1xTMT6plex [K13]; 1xTMT6plex [N-Term]; 1xPhospho [S4(100)]	1	1	328	340	O88939	Zinc finger and BTB domain-containing protein 7A	Zbtb7a	4	S	agdSdeesrtddk	569
PEP27531	GREGSLTGTK	1xTMT6plex [K10]; 1xTMT6plex [N-Term]; 1xPhospho [S5(100)]	1	1	1072	1081	G3X9K3	Brefeldin A-inhibited guanine nucleotide-exchange protein 1	Arfgef1	5	S	gregSltgtk	1846
PEP29175	DGGQTESNEEGK	1xTMT6plex [K12]; 1xTMT6plex [N-Term]; 1xPhospho [S7(100)]	1	1	110	121	Q80X50	Ubiquitin-associated protein 2-like	Ubap2l	7	S	dggqteSneegk	1107

The normalised abundance figure shows that the difference appears to be consistent between replicates. I’m not sure how to interpret it.

Differential expression

Description

From limma R vignette

For differential expression we use limma. It uses an empirical Bayes method to squeeze the peptide-wise residual variances towards a common value (or towards a global trend) (Smyth, 2004; Phipson et al., 2016). The degrees of freedom for the individual variances are increased to reflect the extra information gained from the empirical Bayes moderation, resulting in increased statistical power to detect differential expression.

For more information see Ritchie et al. 2015.

In simpler words

limma uses a t-test, which is moderated by borrowing information across all peptides. A global variance model is built and moderation squeezes the individual variances towards this global trend. Extreme variances (either large or small) become less extreme, which makes the test more robust to random outliers.
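
The squeezing can be shown with a short Python sketch of the posterior variance from Smyth (2004): a weighted average of the peptide's own variance and the prior variance, weighted by their degrees of freedom. This is an illustration only, not the limma implementation. limma estimates the prior variance and prior degrees of freedom from the data; the values used here (`S2_PRIOR`, `DF_PRIOR`) are made up, and the residual degrees of freedom assume a simple three-group design with no missing values.

```python
import math


def moderated_variance(s2, df, s2_prior, df_prior):
    """Posterior variance: df-weighted average of prior and peptide-wise variance."""
    return (df_prior * s2_prior + df * s2) / (df_prior + df)


def moderated_t(log_fc, s2, df, s2_prior, df_prior, unscaled_sd):
    """Moderated t-statistic and its increased degrees of freedom."""
    s2_post = moderated_variance(s2, df, s2_prior, df_prior)
    t = log_fc / (math.sqrt(s2_post) * unscaled_sd)
    return t, df + df_prior


# Hypothetical prior; limma would estimate these from all peptides
S2_PRIOR, DF_PRIOR = 0.04, 4.0
# Assumed design: 9 samples minus 3 condition means
DF_RESIDUAL = 6
# Two groups of three replicates: sqrt(1/3 + 1/3)
UNSCALED_SD = math.sqrt(1 / 3 + 1 / 3)

for s2 in (0.001, 0.04, 1.0):
    s2_post = moderated_variance(s2, DF_RESIDUAL, S2_PRIOR, DF_PRIOR)
    t, df_total = moderated_t(1.0, s2, DF_RESIDUAL, S2_PRIOR, DF_PRIOR, UNSCALED_SD)
    print(f"s2={s2:<6} moderated s2={s2_post:.4f} t={t:.2f} df={df_total}")
```

A very small variance is pulled up and a very large one is pulled down, so a peptide with an accidentally tiny variance no longer produces an inflated t-statistic.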

Results

Below, we set a limit of FDR < 0.05 to call a change statistically significant.
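
For reference, here is a minimal Python sketch of how FDR-adjusted p-values can be obtained with the Benjamini-Hochberg procedure, which is the default adjustment in limma's `topTable`. The report does not state which adjustment was used, so treating it as Benjamini-Hochberg is an assumption, and the p-values below are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Indices of p-values sorted from smallest to largest
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


p_raw = [0.01, 0.04, 0.03, 0.005]  # made-up raw p-values
p_adj = benjamini_hochberg(p_raw)
significant = [p < 0.05 for p in p_adj]
print(p_adj, significant)
```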

We notice that the inhibitor results are very similar at both concentrations. In particular, there are no significantly DE phosphosites between 2uM and 10uM. The Venn diagram below shows the comparison between DE peptides from 2uM and 10uM versus DMSO.

Because of the large overlap, with 2uM being essentially a subset of 10uM, we focus our attention on 10uM.

DE phosphosites

Here are all DE sites for 10uM vs DMSO.

Proteins with DE sites

The DE phosphosites are located in a rather small number of proteins.

We use the STRING database of known and predicted protein-protein interactions to identify clusters of interacting proteins among up- and down-regulated genes.

Proteins with upregulated phosphosites

Protein-protein interaction

Proteins with downregulated phosphosites

Protein-protein interaction

Functional enrichment

Here we have a quick look at potential functional enrichment of genes with phosphosites going up and down. Warning: any complex pattern of phosphorylation is ignored here; we only look at which genes/proteins contain up- and down-regulated phosphosites and at what these proteins have in common in terms of functionality.
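
The columns in the enrichment tables below can be read as follows: tot is the number of background proteins annotated with a term, sel is the number of those in the selection, expect is the number expected by chance and enrich is sel divided by expect. The Python sketch below shows one common way to compute these quantities, with a hypergeometric upper-tail p-value. It is an assumption that the P column was obtained this way (it may also be corrected for multiple tests), and the background and selection sizes used here are hypothetical, not taken from the data.

```python
from math import comb


def enrichment(tot, sel, n_sel, n_bg):
    """Expected count, enrichment ratio and hypergeometric P(X >= sel).

    tot   - background proteins annotated with the term
    sel   - selected proteins annotated with the term
    n_sel - size of the selection
    n_bg  - size of the background
    """
    expect = tot * n_sel / n_bg
    enrich = sel / expect
    # Upper tail of the hypergeometric distribution
    upper = min(tot, n_sel)
    tail = sum(comb(tot, k) * comb(n_bg - tot, n_sel - k) for k in range(sel, upper + 1))
    p_value = tail / comb(n_bg, n_sel)
    return expect, enrich, p_value


# Hypothetical sizes: 20 selected proteins out of a background of 5000
expect, enrich, p_value = enrichment(tot=4, sel=2, n_sel=20, n_bg=5000)
print(f"expect={expect:.3f} enrich={enrich:.1f} P={p_value:.2g}")
```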

Proteins with upregulated sites

Here we see GO terms related to splicing, which is expected.

term_id	term_name	term_namespace	tot	sel	expect	enrich	ids	P
GO:0050733	RS domain binding	molecular_function	4	2	0.02	124.9	Son,Srsf5	0.00
GO:0036002	pre-mRNA binding	molecular_function	11	2	0.04	45.4	Srsf2,Srsf6	0.01
GO:0043488	regulation of mRNA stability	biological_process	22	3	0.09	34.1	Zc3h14,Fxr1,Carhsp1	0.00
GO:0034063	stress granule assembly	biological_process	18	2	0.07	27.8	Ubap2l,Prrc2c	0.03
GO:0007517	muscle organ development	biological_process	19	2	0.08	26.3	Chd2,Fxr1	0.03
GO:0000381	regulation of alternative mRNA splicing, via spliceosome	biological_process	36	3	0.14	20.8	Srsf2,Srsf6,Fxr1	0.01
GO:0000398	mRNA splicing, via spliceosome	biological_process	84	4	0.34	11.9	Tra2a,Srsf2,Srsf6,Srsf5	0.01
GO:0003729	mRNA binding	molecular_function	134	4	0.54	7.5	Fxr2,Srsf6,Fxr1,Srsf5	0.02
GO:0008380	RNA splicing	biological_process	168	5	0.67	7.4	Son,Srsf2,Scaf1,Srsf6,Srsf5	0.01
GO:0003676	nucleic acid binding	molecular_function	321	9	1.28	7.0	Son,Matr3,Tra2a,Srsf2,Fxr2,Srsf6,Fxr1,Carhsp1,Srsf5	0.00
GO:0016607	nuclear speck	cellular_component	246	6	0.98	6.1	Son,Zc3h14,Srsf2,Zc3h18,Srsf6,Srsf5	0.01
GO:0006397	mRNA processing	biological_process	219	5	0.88	5.7	Son,Srsf2,Scaf1,Srsf6,Srsf5	0.02
GO:0003723	RNA binding	molecular_function	515	11	2.06	5.3	Son,Matr3,Zc3h14,Tra2a,Srsf2,Scaf1,Fxr2,Srsf6,Fxr1,Carhsp1,Srsf5	0.00

This is also confirmed by Reactome and KEGG pathway enrichment:

term_id	term_name	tot	sel	expect	enrich	ids	P
R-MMU-72165	mRNA Splicing - Minor Pathway	19	2	0.08	26.3	Srsf2,Srsf6	0.01
R-MMU-72187	mRNA 3’-end processing	36	3	0.14	20.8	Srsf2,Srsf6,Srsf5	0.01
R-MMU-73856	RNA Polymerase II Transcription Termination	38	3	0.15	19.7	Srsf2,Srsf6,Srsf5	0.01
R-MMU-159236	Transport of Mature mRNA derived from an Intron-Containing Transcript	46	3	0.18	16.3	Srsf2,Srsf6,Srsf5	0.01
R-MMU-72202	Transport of Mature Transcript to Cytoplasm	54	3	0.22	13.9	Srsf2,Srsf6,Srsf5	0.01
R-MMU-72163	mRNA Splicing - Major Pathway	106	3	0.42	7.1	Srsf2,Srsf6,Srsf5	0.04
R-MMU-72172	mRNA Splicing	108	3	0.43	6.9	Srsf2,Srsf6,Srsf5	0.04

term_id	term_name	tot	sel	expect	enrich	ids	P
mmu03040	Spliceosome - Mus musculus (mouse)	77	4	0.31	13.0	Tra2a,Srsf5,Srsf2,Srsf6	0
mmu05168	Herpes simplex virus 1 infection - Mus musculus (mouse)	60	3	0.24	12.5	Srsf5,Srsf2,Srsf6	0

Proteins with downregulated sites

There is no obvious enrichment here. It seems that the down-regulated sites are more heterogeneous.

Resources

Downloads

Differential expression results for 10uM/DMSO

Interactive data

Phospho sites explorer

Source code

Source code is available from GitHub (private until publication).

Session info

## R version 4.1.2 (2021-11-01)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] DT_0.20          eulerr_6.1.1     kableExtra_1.3.4 forcats_0.5.1    stringr_1.4.0   
##  [6] dplyr_1.0.7      purrr_0.3.4      readr_2.1.1      tidyr_1.1.4      tibble_3.1.6    
## [11] ggplot2_3.3.5    tidyverse_1.3.1  targets_0.9.0   
## 
## loaded via a namespace (and not attached):
##  [1] fs_1.5.2            lubridate_1.8.0     webshot_0.5.2       RColorBrewer_1.1-2 
##  [5] httr_1.4.2          tools_4.1.2         backports_1.4.1     bslib_0.3.1        
##  [9] utf8_1.2.2          R6_2.5.1            DBI_1.1.2           colorspace_2.0-2   
## [13] withr_2.4.3         tidyselect_1.1.1    GGally_2.1.2        processx_3.5.2     
## [17] compiler_4.1.2      cli_3.1.0           rvest_1.0.2         xml2_1.3.3         
## [21] labeling_0.4.2      stringfish_0.15.5   sass_0.4.0          scales_1.1.1       
## [25] callr_3.7.0         systemfonts_1.0.3   digest_0.6.29       rmarkdown_2.11     
## [29] svglite_2.0.0       pkgconfig_2.0.3     htmltools_0.5.2     dbplyr_2.1.1       
## [33] fastmap_1.1.0       highr_0.9           htmlwidgets_1.5.4   rlang_0.4.12       
## [37] readxl_1.3.1        rstudioapi_0.13     jquerylib_0.1.4     generics_0.1.1     
## [41] farver_2.1.0        RApiSerialize_0.1.0 jsonlite_1.7.2      crosstalk_1.2.0    
## [45] magrittr_2.0.1      Rcpp_1.0.7          munsell_0.5.0       fansi_0.5.0        
## [49] lifecycle_1.0.1     stringi_1.7.6       yaml_2.2.1          plyr_1.8.6         
## [53] grid_4.1.2          ggrepel_0.9.1       crayon_1.4.2        haven_2.4.3        
## [57] hms_1.1.1           polylabelr_0.2.0    knitr_1.37          ps_1.6.0           
## [61] pillar_1.6.4        igraph_1.2.10       codetools_0.2-18    reprex_2.0.1       
## [65] glue_1.6.0          evaluate_0.14       data.table_1.14.2   RcppParallel_5.1.4 
## [69] modelr_0.1.8        vctrs_0.3.8         tzdb_0.2.0          cellranger_1.1.0   
## [73] gtable_0.3.0        polyclip_1.10-0     reshape_0.8.8       qs_0.25.2          
## [77] assertthat_0.2.1    xfun_0.29           broom_0.7.11        viridisLite_0.4.0  
## [81] ellipsis_0.3.2

Phosphoproteomics of SRPK

Marek Gierlinski

13 January 2022
