Understanding the relative importance of signaling pathway components which regulate a specific cellular response is a major focus of current efforts in biology. This interest, along with the inherit complexity of these systems, is driving the development of approaches capable of providing both quantitative predictions as well as guiding the design of future experiments. Of particular interest is the establishment of methods for the analysis of cellularlevel inputoutput signaling relationships that have been characterized over time.
Work by the Alliance for Cellular Signaling (AfCS) has provided an extensive profile
of ligandinduced changes in protein phosphorylation state and cytokine output
response in macrophagelike RAW 264.7 cells. Using model averaging with partial least
squares (PLS) or principal components regression (PCR), we compared multivariate
models quantitatively predicting cytokine release and identifying key regulatory
components of the underlying signaling pathways. We paid particular attention to the
effect of metrics extracted from the experimentally derived signaling time courses so
as to determine whether the inclusion of such temporal information improved model
predictions. Results indicate that we were able to determine the key biological
predictors responsible for generating a specific cytokine response, with model
The use of multivariate approaches and model averaging provides a valuable predictive framework for quantitative studies of these complex biological processes. Results of this work also point to several issues for consideration in the design of similar largescale interrogations.
A continuing challenge in biology today is the need to integrate large quantities of
experimental data into quantitative and testable descriptions of system behavior. Such a
challenge is particularly relevant at the cellular level, where recent technological
advancements have made the generation of largescale and comprehensive data sets feasible.
Given such data, the opportunity arises to greatly improve our understanding of the
overall dynamics of cellular behavior and its relevance to cellular dysfunction. Due to
their size and complexity, it is generally recognized that the data generated through
largescale interrogations are largely uninterpretable without the use of computational
methods for data reduction, analysis and modeling. As a result, a number of methods have
been adopted from fields such as engineering, computer science and statistics, which are
particularly well suited for dealing with such systemsscale biological data [
Recent work by the Alliance for Cellular Signaling (AfCS) has led to the generation of an
extensive, openly accessible, profile of the systemwide response of macrophagelike RAW
264.7 cells to over 200 input stimuli. These stimuli were applied to cells either alone or
simultaneously as a paired combination, with the resulting changes in cytokine output
responses quantified over time. In addition to the cytokine outputs, the phosphorylation
states of 21 signaling proteins were also characterized over time. Overall, such data
presents a largescale picture of cell system dynamics that is still relatively rare in
the literature. Herein, we model the input/output response of RAW 264.7 cells based on the
studies performed by the AfCS. Due to possible advantages of the method (described below),
we use partial least squares regression for modeling input/output responses, and compare
these results with identical analyses using principal components regression (PCR). We were
particularly interested in the temporal aspects of the data as recent work by Janes and
colleagues [
Here we show that for this data, the predictive capability of PLS and PCR were generally
equivalent. However, there was a significant benefit of PLS over PCR in the significant
reduction in the number of variables that must be used within the models to accurately
describe variation within the data. In addition, the ability to generate variable
importance in projection (VIP) scores with PLS provides the ability to readily determine
important variables (e.g. specific signaling molecules) that drive cellular output
response. The generation and interpretation of these VIP scores is much simpler than
methods developed to adapt PCR to this task [
As discussed in greater detail in Materials and Methods, the AfCS data was derived from
RAW 264.7 macrophagelike cells and consisted of phosphorylation state time courses for
21 signaling proteins and the resulting release of seven cytokines including GCSF,
IL1
A method common in the field of chemometrics, PLS is an extension to the multiple
linear regression model and thus related to other methods including principal components
regression [
Both PLS and PCR produce factor scores
We first wanted to look at what effect using temporal information had on prediction
accuracy for both PLS and PCR. In this case, we extracted timedependent signaling
metrics from the time curves that describe the phosphorylation state of the 21
intracellular signaling proteins (see Table
Briefly, the data was split into ten equally sized contiguous blocks. All but one of
the blocks were used to train the model (calibration stage), with the resulting model
then being used to predict the withheld block (test stage). To choose an optimized
number of latent variables (LVs) or principal components (PCs), we examined the
rootmeansquare error (RMSE) between the measured and the predicted responses with
increasing numbers of LVs or PCs for each cytokine. As LVs or PCs, which describe large
amounts of systematic variance (i.e. variables of predictive value), are added to the
model, the crossvalidation RMSE (RMSECV) should decrease. On the other hand, when LVs
or PCs describing only small noise variance are added (i.e. variables that are largely
noise), the RMSECV should increase. For example, when timedependent signaling metrics
were used to predict the output response of TNF
Metrics extracted from protein phosphorylation state time courses.
Metric class  Metrics generated 
Temporal measurements  1 min 
3 min  
10 min  
30 min  


Instantaneous derivatives  1 min 
3 min  
10 min  
30 min  


Summary metrics  area under the curve (AUC) 
Maximum signal  
Mean signal 
After selecting the optimized number of LVs or PCs for each model of cytokine output
response, we examined the squared Pearson correlation coefficient,
Prediction accuracy as measured by squared Pearson correlation coefficient
GCSF  IL1 
IL6  IL10  MIP1 
RANTES  TNF 

PLS (timederived metrics)  0.93 (5)  0.49 (1)  0.63 (5)  0.62 (4)  0.49 (6)  0.65 (2)  0.75 (6) 
PCR (timederived metrics)  0.92 (2)  0.51 (2)  0.64 (21)  0.62 (16)  0.48 (10)  0.64 (2)  0.72 (15) 
PLS (timeaveraged data)  0.88 (5)  0.83 (6)  0.58 (6)  0.64 (5)  0.62 (5)  0.73 (5)  0.83 (5) 
PCR (timeaveraged data)  0.89 (10)  0.85 (10)  0.58 (13)  0.68 (10)  0.59 (9)  0.77 (9)  0.84 (13) 
Number of principal components (for PCR) or latent variables (for PLS) are listed in parentheses.
The results of Table
A benefit of the PLS approach is the ability to readily determine the important/highly
predictive variables within a model. We do this by calculating the weighted VIP score
for each cytokine (see Materials and Methods). An example of this is shown in Figure
Top 10% and 20% most significant timedependent signaling metrics as identified via PLS
Cytokine  Top 10% metrics  Top 20% metrics (not including those in top 10%) 
GCSF  JNK lg: AUC, maximum  JNK lg: mean, @ 30 min, derivative @ 10 min, @ 10 min 
JNK sh: AUC  JNK sh: derivative @ 10 min, maximum, mean, @ 30 min, @ 10 min  
P38: @ 30 min, @ 10 min  
ERK1: derivative @ 10 min  
ERK2: derivative @ 10 min  
RSK: derivative @ 10 min, @ 30 min  
NF 

PKCM: derivative @ 30 min  


IL1 
JNK lg: maximum  JNK lg: AUC, @ 30 min, mean, derivative @ 10 min, @ 10 min 
JNK sh: AUC  JNK sh: maximum, mean, derivative @ 10 min, @ 30 min, @ 10 min  
ERK1: derivative @ 10 min  
ERK2: derivative @ 10 min  
P38: @ 30 min  
RSK: derivative @ 10 min  
PKCM: derivative @ 30 min  
NF 



IL6  STAT3: mean, AUC, derivative @ 3 min, @ 3 min, maximum, @ 1 min, derivative @ 1 min, @ 10 min  STAT3: @ 30 min 
STAT1 

STAT1 



IL10  RSK: derivative @ 10 min  
ERK2: derivative @ 10 min  
ERK1: derivative @ 10 min  
JNK sh: derivative @ 10 min, @ 30 min, AUC, @ 10 min  JNKsh: maximum, mean  
JNK lg: @ 30 min, @ 10 min, derivative @ 10 min  JNK lg: AUC, maximum, mean  
P38: derivative @ 10 min  P38: @ 30 min  
NF 
NF 

GSK3A: derivative @ 10 min  


MIP1 
JNK lg: mean, maximum, AUC  
NF 
NF 

JNK sh: maximum, AUC  JNK sh: mean, derivative @ 10 min  
STAT5: 1 min, derivative @ 1 min  
ERK2: derivative @ 10 min  
ERK1: derivative @ 10 min  
PKCM: maximum, derivative @ 30 min  
P38 @ 30 min, maximum, @ 10 min  
STAT1 



RANTES  JNK lg: maximum, AUC, mean  JNK lg: derivative @ 10 min, @ 30 min, @ 10 min 
JNK sh: maximum, AUC, derivative @ 10 min  JNK sh: mean, @ 10 min, @ 30 min  
ERK2: derivative @ 10 min  
ERK1: derivative @ 10 min  
RSK: derivative @ 10 min  
PKCM: derivative @ 30 min, maximum  
P38: derivative @ 10 min, @ 30 min  
NF 



TNF 
NF 

JNK lg: mean, maximum, AUC  
PKCM: derivative @ 30 min, @ 30 min, maximum  
P38: @ 30 min, @ 10 min, AUC  
JNK sh: maximum, AUC, mean  
RSK: @ 30 min  
ERK1: @ 30 min  
Rps6: derivative @ 30 min  
ERK2: @ 30 min 
To examine the redundancy in the signaling information contained within the original
231metric model, we generated PLS models using only the reduced set of metrics with VIP
scores greater than 1. We found that for each cytokine, a PLS model containing from 49
up to 97 most informative signaling metrics was as predictive as the complete one that
used all 231 metrics (Table
Prediction results of PLS regression using all vital signaling metrics
GCSF  IL1 
IL6  IL10  MIP1 
RANTES  TNF 

number of vital metrics  74  70  49  85  97  76  78 

0.90  0.49  0.72  0.66  0.54  0.69  0.76 
For each cytokine, a PLS regression model using from 49 to 97 most informative (vital) signaling metrics was as predictive as the complete model that used all 231 metrics.
Prediction results of PLS regression using top vital signaling metrics.
GCSF  IL1 
IL6  IL10  MIP1 
RANTES  TNF 

Numb. of metrics  1  1  14  34  46  1  38 

88.0 ± 0.27%  51.4 ± 0.55%  64.2 ± 2.41%  56.6 ± 2.61%  49.4 ± 2.42%  75.0 ± 0.60%  66.3 ± 1.84% 
For each cytokine, a PLS regression model using from 1 to 46 of the most
informative signaling metrics can achieve an average
A benefit of dimension reduction and regression models such as PLS is that they provide
the capability to identify key modulators behind a specific signaling response. For
example, model predictions for GCSF were the most accurate of all cytokines, with
Our analysis found that we were able to predict TNF
As a final example, model predictions for the chemokine RANTES ('Regulated upon
Activation, Normal Tcell Expressed and Secreted' or CCL5) were analyzed. RANTES is
chemotactic for T cells, eosinophils and basophils, and is needed for the maintenance of
allergic inflammation [
Introduced earlier, Figure
While significant work lies ahead, continued efforts in systems approaches to understanding cellular function hold considerable promise. Key to this success is the development and application of computational methods capable of synthesizing predictive models from large and complex data sets. Significant progress is being made in this area, with the application of multivariate approaches such as the PLS method described here being just one of many. Part of the usefulness of this approach for such highlevel or "topdown" modeling, lies in its ability to decrease model complexity by reducing the number of problem dimensions to the smallest, most informative set. Greater reduction in model dimensions and the ability to rank and extract the most important model variables through VIP scores present tangible advantages over PCR.
It should be noted that these highlevel systems models determine the important
modulators of system response by fitting model variables across all experimental
conditions. As a result, some key proteins may be missed under certain circumstances. For
example, protein activities that are primary drivers of a specific output in only a small
number of experimental conditions may not be characterized as "significant." This enforces
the concept that these models must be continually refined so as to address
conditionspecific details and be used to supplement more thorough investigations of
pathways of interest. As described earlier, a major aim of this work was to determine if
the addition of temporal information/metrics into the model would help to improve
predictions. This was especially of interest as this data consisted of relatively few (4)
time points. Results indicate that the use of this limited temporal information provided
generally poor results, improving predictions in only 2 of 7 cases when compared to
timeaveraged data. We found that for many of the time courses, the response curves appear
to have been just initiated, show relatively little dynamics, and/or are far from
returning to poststimulus levels. An example experimental time course is shown in Figure
Previous work using PLS with nearly identical data types has been shown to be highly
effective, with model predictions having 90% correlation with measured outputs [
Together, these results emphasize that similar, systemslevel studies should carefully consider the minimum number of time points that must be sampled in order to appropriately monitor system dynamics. Sampling need not be uniform and/or the same for all variables, but should rather be chosen so as to capture the desired dynamic properties of each variable (e.g. an initial rapid rise in protein phosphorylation state). Such experimental design decisions would be expected to significantly improve prediction accuracy as well as model interpretation, even with sparsely sampled protein signaling curves.
For systems biology approaches to succeed, modeling and experimental approaches must be
highly integrated, and for models to provide worthwhile information, experiments must be
designed so as to maximize the capabilities of the computational methods. The RAW 264.7
single and doubleligand AfCS data used in this study presents two issues of note in this
regard. One issue is the use of completely different cell cultures for the collection of
signaling protein phosphorylation state and cytokine secretion data. While the effects of
this can presumably be monitored with a sufficient number of samples, this may raise
questions in downstream analysis. Perhaps of even greater concern is the finding that in
approximately 80% of experiments, the concentration of an applied stimulant used for the
induction of protein phosphorylation was not the same concentration of stimulant used to
induce a cytokine response. Thus stimuli conditions across experiments cannot be matched,
leading to a significant reduction in the total amount of useable data. One would also
expect that the lack of matching stimuli concentrations would generate significant errors
in downstream predictions. As evidence of this, when we ignore differences in input
concentrations and the complete set of data is used to develop a model for the prediction
of RANTES output, we find that the quality of predictions is quite poor with an
RAW 264.7 single and doubleligand screen data were obtained from the AfCS Data Center.
After applying single or doubleligand stimuli, phosphorylation changes of 21 signaling
proteins were measured at 1, 3, 10 and 30 minutes, intracellular cAMP concentrations
were measured at 20, 40, 90, 300 and 1200 seconds, and extracellular cytokine
concentrations were measured at 2, 3 and 4 hours after initial stimulation [
Since the concentrations for many ligands were different between the protein
phosphorylation and cytokine secretion experiments, the AfCS data were first filtered to
select matched ligandstimulus conditions. From a total of 253 stimulant conditions,
including both single and doubleligand stimuli, this filtering resulted in 55
conditions where the input stimuli concentration was identical for both the
phosphorylation and cytokine experiments. These 55 conditions were then used in
subsequent analyses [see Additional file
Protein phosphorylation data measuring a total of 55 stimulant conditions (including
both single and doubleligand stimulus) was used to construct a predictor matrix
(independent block). Note that cAMP data was not used in this analysis as it was only
measured under a highly limited set of stimulant conditions (35 out of the original 253
stimulating conditions). For phosphorylation data, a fold change over baseline was first
calculated [
Seven cytokines (GCSF, IL1
We also looked at whether the greater number of time points/metrics was responsible for
the improved performance shown in [
PCR models were constructed in Matlab (Mathworks Inc.) with PLS models being
constructed via the SIMPLS algorithm [
Tenfold crossvalidation was performed to select the optimum number of latent
variables used in the regression models. For each cytokine output response, the dataset
was split into ten equally sized folds/subsets. A regression model was then constructed
using all but one of the subsets (calibrationstep) using up to 100 model components.
This model was then used to estimate the samples in the leftout fold [
When assigning significance to each explanatory predictor (e.g., AUC for STAT3) in the
model, the VIP score of each predictor is usually computed from the PLS regression
model. These VIP scores estimate the importance of each predictor variable used in the
PLS model and are often used to select those predictors that are most influential in a
given output response [
To reduce the prediction bias and the likelihood of overfitting, we again used
crossvalidation to obtain test samples that were different from training samples. By
doing this, a more realistic estimate of the prediction error can be obtained. In this
work,
Assuming all models have normally distributed residual errors with a constant variance,
an adjusted small sample Akaike's Information Criterion (AIC) score was determined for
each model from least squares regression statistics [
where
To allow a quick comparison and ranking of candidate models, the difference between AIC scores was computed over all candidate models and for each cytokine as
To better interpret the relative likelihood of each candidate model, the Akaike weight
for each model is determined by [
where the Akaike weight
For a given PLS regression model, the VIP score for the
where
Since the average of the squared weighted VIP scores equals one, important metrics were defined as any signaling metric with a weighted VIP score greater than 1.
YW, GLJ, and SMG conceived the project, YW and SMG performed the research, YW, GLJ and SMG analyzed the results. All authors read and approved the final manuscript.
This work was supported by NIH grants DK37871 and GM30324.