Collaborators: Greg Findlay

Proposal

Background

One of our major interests is in functions of a protein kinase called SRSF protein kinase (SRPK), which is well known to play a role in splicing regulation but also has emerging functions in stem cells and during development. SRPK family genes are mutated in patients with intellectual disability disorders, suggesting that SRPK activity may be required for neurological development and/or function. In this project, we seek to identify new SRPK substrates that are important for regulation of stem cells and neurodevelopment. To this end, we have performed a phosphoproteomic analysis to identify substrates of SRPK that may suggest new molecular functions.

Phosphoproteomics analysis was done with Houjiang Zhou in the MRC-PPU proteomics facility. This involves a very simple experimental set-up in which we have used mouse embryonic stem cells in biological triplicate treated with DMSO control or 2uM or 10uM of a potent and selective SRPK inhibitor, SRPKIN-1. We then performed phosphopeptide enrichment, TMT labelling and fractionation prior to MS analysis on the Oribitrap fusion (?). Phosphopeptide identification was performed using Proteome Discoverer (?). We hope that this experimental design should enable identification of phosphopeptides whose abundance is altered by SRPK kinase activity, which we hypothesis to be SRPK substrates.

Description of Proposed Work

Statistical analysis of SRPK phosphoproteomic data to identify phosphopeptides that are significantly different between control ES cells and those treated with the SRPKIN-1 inhibitor.

Data

Data came in form of Excel sheet - output from Proteome Discoverer (PD). As the differential expression in PD is done by a simple t-test, we ignore p-values and process data here. We start from normalised abundances. There are three conditions (hereafter DMOS, Inh2uM and Inh10uM) in three replicates each.

Peptide modifications are encoded in a string in one column of the input file. We parse this modifications extracting number of modifications per peptide (n_mod), modified residue, position and probability of being correct. In some peptides PD returns multiple residues and no modification position. Here is an example:

sequence modifications residues positions
SNTPMGDKDDDDDDDADEK 2xTMT6plex [K8; K19]; 1xTMT6plex [N-Term]; 1xPhospho [T3(100)] T 3
SNTPMGDKDDDDDDDADEK 1xOxidation [M5]; 2xTMT6plex [K8; K19]; 1xTMT6plex [N-Term]; 1xPhospho [T3(100)] T 3
GRQEHYEEEEDEEDGAAVAEK 1xTMT6plex [K21]; 1xTMT6plex [N-Term]; 1xPhospho [Y6(100)] Y 6
GCSSPPPPEPNPQPPDGPSLQLPP 1xCarbamidomethyl [C2]; 1xTMT6plex [N-Term]; 1xPhospho [S] S NA
RSSFLNAK 1xTMT6plex [K8]; 1xTMT6plex [N-Term]; 1xPhospho [S3(99.4)] S 3
ALTPPADPPR 1xTMT6plex [N-Term]; 1xPhospho [T3(100)] T 3
LSTTPSPTNSLHEDGVDDFRR 1xTMT6plex [N-Term]; 2xPhospho [S6(99.9); S10(100)] S,S 6,10
LSTTPSPTNSLHEDGVDDFRR 1xTMT6plex [N-Term]; 3xPhospho [T3(97.6); S10(99.9); T/S] T,S,T/S 3,10,NA
YSQSAPGSPVSAQPVIMAVPPRPSNLVAK 1xOxidation [M17]; 1xTMT6plex [K29]; 1xTMT6plex [N-Term]; 1xPhospho [S8(100)] S 8
LSTTPSPTNSLHEDGVDDFRR 1xTMT6plex [N-Term]; 1xPhospho [S6(98.8)] S 6
GGCSSGNSQRRSPPTTK 1xCarbamidomethyl [C3]; 1xTMT6plex [K17]; 1xTMT6plex [N-Term]; 1xPhospho [S12(100)] S 12
YSQSAPGSPVSAQPVIMAVPPRPSNLVAK 1xTMT6plex [K29]; 1xTMT6plex [N-Term]; 1xPhospho [S8(100)] S 8
YSQSAPGSPVSAQPVIMAVPPRPSNLVAK 1xTMT6plex [K29]; 1xTMT6plex [N-Term]; 2xPhospho [S4(99.9); S8(100)] S,S 4,8

Overview

Abundance distribution

The plot shows distribution of normalised abundance across samples.

Clustering

PCA

UMAP

Replicate comparison

Here is pairwise comparison of all replicates within each condition.

Let’s have a look at Inh2uM_2 vs Inh2uM_3

I marked outstanding peptides with internal identifiers (sequences with modifications are too long). Have a look at a few of them at the bottom of the plot:

peptide_id sequence modifications n_proteins n_groups pep protein_start protein_end uniprot description gene_name positions residues seqmod length
PEP3 SGVTNMSSPHK 1xOxidation [M6]; 1xTMT6plex [K11]; 1xTMT6plex [N-Term]; 1xPhospho [S8(99)] 1 1 0 177 187 Q8R3D1 TBC1 domain family member 13 Tbc1d13 8 S sgvtnmsSphk 400
PEP4266 AGDSDEESRTDDK 1xTMT6plex [K13]; 1xTMT6plex [N-Term]; 1xPhospho [S4(100)] 1 1 0 328 340 O88939 Zinc finger and BTB domain-containing protein 7A Zbtb7a 4 S agdSdeesrtddk 569
PEP27531 GREGSLTGTK 1xTMT6plex [K10]; 1xTMT6plex [N-Term]; 1xPhospho [S5(100)] 1 1 0 1072 1081 G3X9K3 Brefeldin A-inhibited guanine nucleotide-exchange protein 1 Arfgef1 5 S gregSltgtk 1846
PEP29175 DGGQTESNEEGK 1xTMT6plex [K12]; 1xTMT6plex [N-Term]; 1xPhospho [S7(100)] 1 1 0 110 121 Q80X50 Ubiquitin-associated protein 2-like Ubap2l 7 S dggqteSneegk 1107

Normalised abundance figure shows that the difference seems to be consistent between replicates. I’m not sure how to interpret it.

Differential expression

Description

From limma R vignette

For differential expression we use limma. It used an empirical Bayes method to squeeze the peptide-wise residual variances towards a common value (or towards a global trend) (Smyth, 2004; Phipson et al, 2016). The degrees of freedom for the individual variances are increased to reflect the extra information gained from the empirical Bayes moderation, resulting in increased statistical power to detect differential expression.

For more information see Ritchie et al. 2015.

In simpler words

limma uses a t-test, which is moderated by borrowing data across all peptides. A global variance model is built and moderation squeezes variances towards this global trend. Extreme variances (either large or small) become less extreme, which makes the test more robust to random outliers.

Results

Below, we set a limit of FDR < 0.05 to call a change statistically significant.

We notice that the inhibitor results are very similar at both concentrations. In particular, there are no significantly DE phosphosites between 2uM and 10uM. The Venn diagram below shows the comparison between DE peptides from 2uM and 10uM versus DMSO.

Due to large overlap and 2uM being essentially a subset of 10uM, we focus our attention on 10uM.

DE phosphosites

Here are all DE sites for 10uM vs DMSO.

Proteins with DE sites

The DE phosphosites are located in a rather small number of proteins.

We use STRING database of known and predicted protein-protein interactions to identify clusters of interacting proteins among up- and down-regulated genes.

Proteins with upregulated phosphosites

Protein-protein interaction

We use STRING database of known and predicted protein-protein interactions to identify clusters of interacting proteins among up- and down-regulated genes.

Proteins with downregulated phosphosites

Protein-protein interaction

Functional enrichment

Here we have a quick look at potential functional enrichment of genes with phosphosites going up and down. Warning: a complex pattern of phosphorylation is ignored here, we only look which genes/proteins contain up- and down-regulated phosphosites. We simply look at what these proteins have in common in terms of functionality.

Proteins with upregulated sites

Here we see GO terms related to splicing, which is expected.

term_id term_name term_namespace tot sel expect enrich ids P
GO:0050733 RS domain binding molecular_function 4 2 0.02 124.9 Son,Srsf5 0.00
GO:0036002 pre-mRNA binding molecular_function 11 2 0.04 45.4 Srsf2,Srsf6 0.01
GO:0043488 regulation of mRNA stability biological_process 22 3 0.09 34.1 Zc3h14,Fxr1,Carhsp1 0.00
GO:0034063 stress granule assembly biological_process 18 2 0.07 27.8 Ubap2l,Prrc2c 0.03
GO:0007517 muscle organ development biological_process 19 2 0.08 26.3 Chd2,Fxr1 0.03
GO:0000381 regulation of alternative mRNA splicing, via spliceosome biological_process 36 3 0.14 20.8 Srsf2,Srsf6,Fxr1 0.01
GO:0000398 mRNA splicing, via spliceosome biological_process 84 4 0.34 11.9 Tra2a,Srsf2,Srsf6,Srsf5 0.01
GO:0003729 mRNA binding molecular_function 134 4 0.54 7.5 Fxr2,Srsf6,Fxr1,Srsf5 0.02
GO:0008380 RNA splicing biological_process 168 5 0.67 7.4 Son,Srsf2,Scaf1,Srsf6,Srsf5 0.01
GO:0003676 nucleic acid binding molecular_function 321 9 1.28 7.0 Son,Matr3,Tra2a,Srsf2,Fxr2,Srsf6,Fxr1,Carhsp1,Srsf5 0.00
GO:0016607 nuclear speck cellular_component 246 6 0.98 6.1 Son,Zc3h14,Srsf2,Zc3h18,Srsf6,Srsf5 0.01
GO:0006397 mRNA processing biological_process 219 5 0.88 5.7 Son,Srsf2,Scaf1,Srsf6,Srsf5 0.02
GO:0003723 RNA binding molecular_function 515 11 2.06 5.3 Son,Matr3,Zc3h14,Tra2a,Srsf2,Scaf1,Fxr2,Srsf6,Fxr1,Carhsp1,Srsf5 0.00

This is also confirmed by Reactome and KEGG pathways enrichment

term_id term_name tot sel expect enrich ids P
R-MMU-72165 mRNA Splicing - Minor Pathway 19 2 0.08 26.3 Srsf2,Srsf6 0.01
R-MMU-72187 mRNA 3’-end processing 36 3 0.14 20.8 Srsf2,Srsf6,Srsf5 0.01
R-MMU-73856 RNA Polymerase II Transcription Termination 38 3 0.15 19.7 Srsf2,Srsf6,Srsf5 0.01
R-MMU-159236 Transport of Mature mRNA derived from an Intron-Containing Transcript 46 3 0.18 16.3 Srsf2,Srsf6,Srsf5 0.01
R-MMU-72202 Transport of Mature Transcript to Cytoplasm 54 3 0.22 13.9 Srsf2,Srsf6,Srsf5 0.01
R-MMU-72163 mRNA Splicing - Major Pathway 106 3 0.42 7.1 Srsf2,Srsf6,Srsf5 0.04
R-MMU-72172 mRNA Splicing 108 3 0.43 6.9 Srsf2,Srsf6,Srsf5 0.04
term_id term_name tot sel expect enrich ids P
mmu03040 Spliceosome - Mus musculus (mouse) 77 4 0.31 13.0 Tra2a,Srsf5,Srsf2,Srsf6 0
mmu05168 Herpes simplex virus 1 infection - Mus musculus (mouse) 60 3 0.24 12.5 Srsf5,Srsf2,Srsf6 0

Proteins with downregulated sites

There is no obvious enrichment here, it seems that down-regulated sites are more heterogeneous.

Resources

Interactive data

Source code

Source code is available from GitHub (private until publication).

Session info

## R version 4.1.2 (2021-11-01)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] DT_0.20          eulerr_6.1.1     kableExtra_1.3.4 forcats_0.5.1    stringr_1.4.0   
##  [6] dplyr_1.0.7      purrr_0.3.4      readr_2.1.1      tidyr_1.1.4      tibble_3.1.6    
## [11] ggplot2_3.3.5    tidyverse_1.3.1  targets_0.9.0   
## 
## loaded via a namespace (and not attached):
##  [1] fs_1.5.2            lubridate_1.8.0     webshot_0.5.2       RColorBrewer_1.1-2 
##  [5] httr_1.4.2          tools_4.1.2         backports_1.4.1     bslib_0.3.1        
##  [9] utf8_1.2.2          R6_2.5.1            DBI_1.1.2           colorspace_2.0-2   
## [13] withr_2.4.3         tidyselect_1.1.1    GGally_2.1.2        processx_3.5.2     
## [17] compiler_4.1.2      cli_3.1.0           rvest_1.0.2         xml2_1.3.3         
## [21] labeling_0.4.2      stringfish_0.15.5   sass_0.4.0          scales_1.1.1       
## [25] callr_3.7.0         systemfonts_1.0.3   digest_0.6.29       rmarkdown_2.11     
## [29] svglite_2.0.0       pkgconfig_2.0.3     htmltools_0.5.2     dbplyr_2.1.1       
## [33] fastmap_1.1.0       highr_0.9           htmlwidgets_1.5.4   rlang_0.4.12       
## [37] readxl_1.3.1        rstudioapi_0.13     jquerylib_0.1.4     generics_0.1.1     
## [41] farver_2.1.0        RApiSerialize_0.1.0 jsonlite_1.7.2      crosstalk_1.2.0    
## [45] magrittr_2.0.1      Rcpp_1.0.7          munsell_0.5.0       fansi_0.5.0        
## [49] lifecycle_1.0.1     stringi_1.7.6       yaml_2.2.1          plyr_1.8.6         
## [53] grid_4.1.2          ggrepel_0.9.1       crayon_1.4.2        haven_2.4.3        
## [57] hms_1.1.1           polylabelr_0.2.0    knitr_1.37          ps_1.6.0           
## [61] pillar_1.6.4        igraph_1.2.10       codetools_0.2-18    reprex_2.0.1       
## [65] glue_1.6.0          evaluate_0.14       data.table_1.14.2   RcppParallel_5.1.4 
## [69] modelr_0.1.8        vctrs_0.3.8         tzdb_0.2.0          cellranger_1.1.0   
## [73] gtable_0.3.0        polyclip_1.10-0     reshape_0.8.8       qs_0.25.2          
## [77] assertthat_0.2.1    xfun_0.29           broom_0.7.11        viridisLite_0.4.0  
## [81] ellipsis_0.3.2