Fitting protein degradation or synthesis curves using the R package "proturn"
Abstract
Pulsed SILAC (pSILAC) has been used to analyze protein turnover on a system-wide scale. This package provides convenient functions to fit protein degradation and synthesis curves to pulsed SILAC data. In addition, a graphical user interface is provided.
Introduction
Pulsed (or time-resolved) SILAC experiments have been successfully applied to
proteome-wide studies of protein turnover rates [1].
In a simple experimental setup,
all proteins in a cell population are initially fully heavy labeled. The
culture medium is then changed from heavy to light (unlabeled), so newly synthesized
proteins contain only light (unlabeled) amino acids. After a long period of
culturing, all proteins in the population become light labeled. To quantify the
protein degradation rate, the amount of heavy-labeled protein at each time point
is compared with the amount of heavy-labeled protein at time point 0 (before the medium change).
This gives the proportion of undegraded protein at each time point, which
can be used to fit models that estimate the protein degradation constant and other parameters.
This package provides functions for fitting protein turnover curves using nonlinear least squares (NLS)
methods. For the degradation curve, the following model is fitted:
$$f(t) = (A - B) \cdot e^{-\left(k_d + \frac{\ln 2}{t_{cc}}\right) \cdot t} + B$$
For the synthesis curve, the fitted model is:
$$f(t) = (B - A) \cdot e^{-\left(k_s + \frac{\ln 2}{t_{cc}}\right) \cdot t} + A$$
where $k_d$ and $k_s$ are the degradation and synthesis constants, respectively, which are to be estimated. $t$ is the time in hours and $t_{cc}$ is the cell doubling time. Under the assumption of steady state (no proliferation), $t_{cc}$ is set to infinity (`Inf`). In addition, two other parameters are included: $A$ is the (normalized) amplitude and $B$ accounts for the offset seen in the data, which could be attributed to the recycling of amino acids in protein synthesis.
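The two models can be sketched as plain R functions. This is an illustration, not the package's own code: the names `deg_model` and `syn_model`, the defaults `A = 1` and `B = 0`, the rate of 0.05 per hour and the time points are all assumptions chosen for the example, as is the exact exponential form with a dilution term `log(2) / tcc`.

```r
# Degradation curve: fraction of pre-existing (heavy) protein remaining at time t.
# tcc = Inf corresponds to steady state, because log(2) / Inf is 0.
deg_model <- function(t, kd, A = 1, B = 0, tcc = Inf) {
  (A - B) * exp(-(kd + log(2) / tcc) * t) + B
}

# Synthesis curve: fraction of newly synthesized (light) protein at time t.
syn_model <- function(t, ks, A = 1, B = 0, tcc = Inf) {
  (B - A) * exp(-(ks + log(2) / tcc) * t) + A
}

tp <- c(0, 1, 2, 4, 8, 16, 32, 64)  # hypothetical time points in hours

deg_curve <- deg_model(tp, kd = 0.05)
syn_curve <- syn_model(tp, ks = 0.05)

# With A = 1, B = 0 and tcc = Inf, the half-life is log(2) / k,
# so the degradation curve is approximately 0.5 at that time.
half_life <- log(2) / 0.05
deg_model(half_life, kd = 0.05)

# A finite doubling time adds dilution by cell division on top of degradation.
deg_model(8, kd = 0.05, tcc = 24)
```

With these defaults the two curves are mirror images and sum to 1 at every time point. A finite `tcc` makes the heavy signal drop faster than degradation alone would.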
Data preparation
The data are assumed to be properly normalized and transformed to ratios, that is, at a specific time point, the proportion of newly synthesized protein (for synthesis curve fitting) or remaining protein (for degradation curve fitting). The package is primarily designed for pSILAC data, but it can also be applied to any data transformed to this format.
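As a rough sketch of what "transformed to ratios" could look like, the example below converts made-up heavy and light intensities into the two kinds of ratio. The intensities, the peptide names and the simple formulas are assumptions for illustration. Real pSILAC normalization is usually more involved and is not part of this sketch.

```r
tp <- c(0, 4, 16, 64)  # hypothetical time points in hours

# Hypothetical heavy- and light-channel intensities, one row per peptide.
heavy <- matrix(c(1000,  800,  500,  250,
                  2000, 1800, 1500, 1000),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("pep1", "pep2"), paste0("h", tp)))
light <- matrix(c(0, 200, 500,  750,
                  0, 200, 500, 1000),
                nrow = 2, byrow = TRUE,
                dimnames = dimnames(heavy))

# Degradation input: heavy signal at each time point relative to time point 0.
deg_ratio <- sweep(heavy, 1, heavy[, 1], "/")

# Synthesis input (assumed definition): light share of the total signal.
syn_ratio <- light / (light + heavy)
```

Both matrices have one row per peptide and one column per time point, which is the shape the later fitting steps work on.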
Workflow
Installation
You can install the package by:
Loading data
Then, load the library and read example data:
`dat` is a data.frame with 100 rows, each representing a peptide. Columns 1 to 10 contain annotation information for the peptides, including sequence, gene name, intensity measured by mass spectrometry, etc. The subsequent columns are the ratios of remaining peptide from degradation and the ratios of newly synthesized peptide, indicated by “deg” and “syn” at the end of the column headers. The degradation and synthesis ratios are measured at the same time points, from 0 hours to 64 hours.
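One way to pull the two blocks of ratio columns out of such a table is to match on the header suffix. The sketch below uses a small mock data.frame rather than the real example data. Every column name in it, and the convention of encoding the time point in the header, is hypothetical.

```r
# Mock table standing in for `dat`: two annotation columns plus ratio columns.
dat_mock <- data.frame(
  Sequence = c("PEPTIDEA", "PEPTIDEB"),
  Gene     = c("GENE1", "GENE2"),
  h0_deg  = c(1.00, 1.00), h4_deg  = c(0.80, 0.90),
  h16_deg = c(0.45, 0.70), h64_deg = c(0.10, 0.30),
  h0_syn  = c(0.00, 0.00), h4_syn  = c(0.20, 0.10),
  h16_syn = c(0.55, 0.30), h64_syn = c(0.90, 0.70)
)

# Select columns whose header ends in "deg" or "syn".
deg_cols <- grep("deg$", colnames(dat_mock), value = TRUE)
syn_cols <- grep("syn$", colnames(dat_mock), value = TRUE)

deg <- as.matrix(dat_mock[, deg_cols])
syn <- as.matrix(dat_mock[, syn_cols])

# Recover the time points (hours) by stripping everything except digits.
tp <- as.numeric(gsub("[^0-9]", "", deg_cols))
```

For the real data, the headers and the time points should be read from the table itself rather than taken from this mock.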
Fitting a single degradation/synthesis curve
The package provides an easy way to fit a single degradation or synthesis curve and to visualize the fitted curve. The following code shows how to fit a synthesis curve for the peptide in the first row of `dat`:
The horizontal error bar indicates the 95% confidence interval of the peptide’s half-life.
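To make the link between the fitted constant and the half-life interval concrete, here is a minimal sketch using base R's `nls()` on simulated data. It does not call the package. The simulated parameters, the starting values, the steady-state assumption and the Wald-type interval are all assumptions of this example, and the package may compute its interval differently.

```r
set.seed(1)
tp <- c(0, 1, 2, 4, 8, 16, 32, 64)  # hypothetical time points in hours

# Simulated synthesis ratios: true ks = 0.08, A = 0.95, B = 0.05, small noise.
d <- data.frame(
  time  = tp,
  ratio = (0.05 - 0.95) * exp(-0.08 * tp) + 0.95 + rnorm(length(tp), sd = 0.01)
)

# Synthesis model at steady state (no cell division term).
fit <- nls(ratio ~ (B - A) * exp(-ks * time) + A, data = d,
           start = list(ks = 0.1, A = 1, B = 0))

est    <- summary(fit)$coefficients
ks_hat <- unname(est["ks", "Estimate"])
ks_se  <- unname(est["ks", "Std. Error"])

# Wald-type 95% interval for ks, then transformed to the half-life scale.
ci_ks        <- ks_hat + c(-1, 1) * qt(0.975, df.residual(fit)) * ks_se
half_life    <- log(2) / ks_hat
ci_half_life <- rev(log(2) / ci_ks)  # a larger ks means a shorter half-life
```

Because the half-life is `log(2) / ks`, the upper bound for `ks` gives the lower bound for the half-life, which is why the interval is reversed.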
Fitting a protein degradation/synthesis curve using multiple peptides
In a pSILAC experiment, we usually identify multiple peptides from the same protein. Do they have the same degradation/synthesis rate? If so, combining them to fit a single model should give a narrower confidence interval for $k_d$/$k_s$. In this case, we can use a matrix as input, where each row represents a single peptide from the protein. The example data contain several peptides from the same protein; we will use one such protein to show how to fit a single model using multiple peptides.
In this plot, you can see three fitted lines. The black line is the model fitted using both peptides. Where do the other two lines come from? Note the argument `fitIndividual = TRUE`: in addition to fitting one model using all the data points, models are also fitted to each individual row of the matrix. So in this plot, the green and red lines are the models fitted to the individual peptides. For this protein, the two peptides have fairly similar turnover rates, so fitting a single model using both is probably reasonable. However, peptides from the same protein do not always have similar turnover rates [2]. Why? One reason is isoforms: peptides from a specific isoform may turn over at a different rate, in which case you may find two clusters of curves. That is why we also want to fit curves at the peptide level.
Also note that the only requirement for fitting a combined curve is that `x` is a matrix. Its rows can be multiple peptides from the same protein (as in this example), but they could also be multiple proteins from the same complex, or multiple identifications of the same peptide by mass spectrometry. The function is general enough to fit combined models at any level.
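The idea of a combined fit next to individual fits can be sketched with base R's `nls()`. This illustrates the concept on simulated data and is not the package's implementation. The helper `fit_deg`, the peptide names and all parameter values are made up for the example.

```r
set.seed(2)
tp <- c(0, 1, 2, 4, 8, 16, 32, 64)  # hypothetical time points in hours

# Simulate a degradation curve with A = 1, B = 0.05 and small noise.
sim <- function(kd) (1 - 0.05) * exp(-kd * tp) + 0.05 + rnorm(length(tp), sd = 0.02)

# Two peptides (rows) from the same hypothetical protein, same true kd.
x <- rbind(pepA = sim(0.06), pepB = sim(0.06))

# Hypothetical helper: fit the degradation model to paired ratio/time vectors.
fit_deg <- function(ratio, time) {
  d <- data.frame(ratio = ratio, time = time)
  nls(ratio ~ (A - B) * exp(-kd * time) + B, data = d,
      start = list(kd = 0.1, A = 1, B = 0))
}

# Combined fit: pool the points of all rows. as.vector(t(x)) lists row 1
# first, then row 2, which matches the time points repeated once per row.
combined <- fit_deg(as.vector(t(x)), rep(tp, nrow(x)))

# Individual fits: one model per row.
individual <- lapply(seq_len(nrow(x)), function(i) fit_deg(x[i, ], tp))

kd_combined   <- unname(coef(combined)["kd"])
kd_individual <- sapply(individual, function(m) unname(coef(m)["kd"]))
```

When the rows really share one rate, pooling typically tightens the estimate. When they do not, the individual estimates drift apart, which is the pattern to look for when checking for isoform effects.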
Fitting curves for all peptides/proteins in an experiment
Mass spectrometry-based pSILAC measures protein turnover system-wide. Accordingly, this package provides a convenient function to fit curves for all peptides/proteins in an experiment. Here, we will fit the degradation curves for all peptides (and proteins) in the example data:
In the `fitNLSModels` function, `f` is the collapse factor specifying which rows should be combined to fit a single model. In this example, each row is a peptide and we want to fit the combined models at the protein level, so `f` is a factor (or a character vector) indicating the protein to which each peptide belongs.
In addition to the list of fitted models stored in `fits$list`, the fitted parameters are also available in matrix format:
where the columns with a header starting with “comb” are the model parameters for the combined fittings, the rest are for individual fittings.
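The role of a collapse factor can be sketched in base R as "split the row indices by group, fit one pooled model per group, collect the coefficients in a matrix". This is a conceptual illustration on simulated data, not the internals of `fitNLSModels`. The helper `fit_rows`, the protein names and all parameter values are hypothetical.

```r
set.seed(3)
tp <- c(0, 1, 2, 4, 8, 16, 32, 64)  # hypothetical time points in hours
sim <- function(kd) 0.95 * exp(-kd * tp) + 0.05 + rnorm(length(tp), sd = 0.02)

# Four peptides (rows); the collapse factor assigns each row to a protein.
x <- rbind(sim(0.05), sim(0.05), sim(0.15), sim(0.15))
f <- c("protA", "protA", "protB", "protB")

# Hypothetical helper: pooled degradation fit for a set of rows.
# Returns NA values instead of stopping if nls() fails to converge.
fit_rows <- function(rows) {
  d <- data.frame(ratio = as.vector(t(x[rows, , drop = FALSE])),
                  time  = rep(tp, length(rows)))
  fit <- tryCatch(
    nls(ratio ~ (A - B) * exp(-kd * time) + B, data = d,
        start = list(kd = 0.1, A = 1, B = 0)),
    error = function(e) NULL
  )
  if (is.null(fit)) c(kd = NA_real_, A = NA_real_, B = NA_real_) else coef(fit)
}

# One row of parameters per level of the collapse factor.
params <- t(sapply(split(seq_len(nrow(x)), f), fit_rows))
```

`params` then has one row per protein and one column per model parameter, a reduced version of the kind of parameter matrix described above.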
Shiny application
The package also includes a Shiny application. To start the app:
Two arguments could be specified:
- `maxFileSize`: The maximum file size that is allowed to be uploaded to the app. The default is 100*1024^2, corresponding to ~100MB; adjust it according to your own needs.
- `figureFolder`: If you want to visually check all the curves, it is more convenient to save them to disk. This argument controls where the figures are saved.
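For orientation, the sketch below shows what values for these two arguments might look like when prepared in R. The variable names are made up. The use of the Shiny option `shiny.maxRequestSize` reflects how Shiny apps generally raise their upload limit, and whether this package sets it the same way internally is an assumption.

```r
# Upload limit in bytes: 100 * 1024^2 is 104857600 bytes (100 MiB).
max_file_size <- 100 * 1024^2

# Shiny reads its upload limit from this global option.
options(shiny.maxRequestSize = max_file_size)

# A folder for saved figures; a temporary directory is used here so the
# example does not write into the working directory.
figure_folder <- file.path(tempdir(), "proturn_figures")
dir.create(figure_folder, showWarnings = FALSE)
```

In practice, a permanent folder is the better choice for `figureFolder`, since a temporary directory is removed when the R session ends.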
Instructions on how to use the Shiny app can be found here.
Session info
Here is the output of `sessionInfo()` on the system on which this
document was compiled:
References
[1] François-Michel Boisvert, Yasmeen Ahmad, Marek Gierliński, Fabien Charrière, Douglas Lamont, Michelle Scott, Geoff Barton and Angus I. Lamond. 2012. “A Quantitative Spatial Proteomics Analysis of Proteome Turnover in Human Cells.” Molecular & Cellular Proteomics: MCP 11 (3): M111.011429.
[2] Jana Zecha, Chen Meng, Daniel Paul Zolg, Patroklos Samaras, Mathias Wilhelm and Bernhard Kuster. 2018. “Peptide level turnover measurements enable the study of proteoform dynamics.” Molecular & Cellular Proteomics: MCP