
Mass spectrometry-based serum proteome pattern analysis in molecular diagnostics of early stage breast cancer



Mass spectrometric analysis of the blood proteome is an emerging method of clinical proteomics. The approach exploiting multi-protein/peptide sets (fingerprints) detected by mass spectrometry that reflect overall features of a specimen's proteome, termed proteome pattern analysis, has already been shown in several studies to be applicable in cancer diagnostics. We aimed to identify serum proteome patterns specific for early stage breast cancer patients using MALDI-ToF mass spectrometry.


Blood samples were collected before the start of therapy in a group of 92 patients diagnosed at stages I and II of the disease, and in a group of age-matched healthy controls (104 women). Serum specimens were purified and the low-molecular-weight proteome fraction was examined using MALDI-ToF mass spectrometry after removal of albumin and other high-molecular-weight serum proteins. Protein ions registered in a mass range between 2,000 and 10,000 Da were analyzed using a new bioinformatic tool created in our group, which included modeling spectra as a sum of Gaussian bell-shaped curves.


We have identified features of serum proteome patterns that were significantly different between blood samples of healthy individuals and early stage breast cancer patients. The classifier built of three spectral components that differentiated controls and cancer patients had 83% sensitivity and 85% specificity. Spectral components (i.e., protein ions) that were the most frequent in such classifiers had approximate m/z values of 2303, 2866 and 3579 Da (a biomarker built from these three components showed 88% sensitivity and 78% specificity). Of note, we did not find a significant correlation between features of serum proteome patterns and established prognostic or predictive factors like tumor size, nodal involvement, histopathological grade, estrogen and progesterone receptor expression. In addition, we observed a significantly (p = 0.0003) increased level of osteopontin in blood of the group of cancer patients studied (however, the plasma level of osteopontin classified cancer samples with 88% sensitivity but only 28% specificity).


MALDI-ToF spectrometry of serum has an obvious potential to differentiate samples between early breast cancer patients and healthy controls. Importantly, a classifier built on MS-based serum proteome patterns outperforms available protein biomarkers analyzed in blood by immunoassays.


In recent years cancer diagnostics has been taking enormous advantage of genomics and proteomics, novel fields of modern biology. Proteomics is the study of the proteome, the complete protein components of the cell, tissue or organism, which in contrast to the genome is dynamic and fluctuates depending on a combination of numerous internal and external factors (e.g., physiological status, dietary behavior, stress, disease and medical treatment). Identifying and understanding changes in the proteome related to disease development and therapy progression is the subject of clinical/disease proteomics [1, 2]. It is currently well appreciated that because of the complexity of molecular processes involved in cancer no particular molecular feature alone, neither gene nor protein, could be a reliable biomarker in cancer diagnosis. Instead, multi-component molecular classifiers, exemplified by multi-gene cancer signatures implemented in the functional genomics field, are built and successfully applied. Multi-gene signatures identified for breast cancer have proved their diagnostic power even though detailed knowledge about the function of particular genes that build such signatures may not be available at present [3, 4].

The low molecular weight (<10 kDa) component of the blood proteome is a promising source of previously undiscovered biomarkers. Since this protein fraction is below the limit of effective resolution of conventional gel electrophoresis, mass spectrometric analysis appears to be a method of choice [5], and consequently is an emerging method of clinical proteomics and cancer diagnostics [rev. in: [6–9]]. The milestone paper in this field was published in 2002 by the group of Petricoin and Liotta, who showed that components of the serum proteome identified by mass spectrometry differentiate patients with ovarian cancer from healthy individuals [10]. Since that time, in spite of a certain controversy regarding this pioneering work [11], numerous papers have been published that aimed to verify the applicability of mass spectrometric analyses of the serum (or plasma) proteome for cancer diagnostics. Although no single peptide could be expected to be a reliable biomarker in such analyses, multi-peptide sets of markers selected in numerical tests have already been shown in a few studies to have potential prognostic and predictive value for cancer diagnostics [rev. in: [12–16]]. The approach that takes into consideration features of the whole proteome, e.g. protein fingerprints given by mass spectra or 2D gel electrophoresis, but does not rely on particular identified protein(s), could be called proteome pattern analysis or proteome profiling. In this approach, whose strategy is similar to the search for multi-gene signatures in functional genomics, multi-component sets of peptides/proteins (which are exemplified by ions registered at defined m/z values in the mass spectrum) define specific proteomic patterns (or profiles), allowing one to classify samples even though their particular components lack differentiating power when analyzed separately. 
Importantly, such a pattern/profile reflects features of the specimen's proteome and allows its classification even without detailed knowledge about particular elements [17–19]. Mass spectrometry methods particularly suitable for proteome pattern analysis are Matrix-Assisted Laser Desorption-Ionization spectrometry (MALDI) and its derivative Surface-Enhanced Laser Desorption/Ionization spectrometry (SELDI) coupled to a Time-of-Flight (ToF) analyzer, which combine high throughput, fair sensitivity and accuracy of annotation of m/z values of ions in recorded mass spectra of complex protein mixtures such as biological specimens [20, 21]. The relevance of mass spectrometry-based serum (or plasma) proteome pattern analysis has already been tested for several types of human malignancies, though none of the identified peptide signatures has yet been approved for diagnostics in clinical practice [15, 22–26].

Breast cancer is the most common malignancy in women, comprising about 18% of all female cancers, and 1 million new cases occur worldwide each year. In Western countries the disease is the single commonest cause of death among women aged 40–50, accounting for about a fifth of all deaths in this age group [27]. The most important tools in screening and early detection of breast cancer are imaging techniques: mammography, ultrasonography and magnetic resonance imaging. Unfortunately, up to 20% of new breast cancer cases cannot be detected by these methods [28], indicating a constant need for novel molecular markers suitable for screening and early detection of this cancer. Several studies have already addressed the possibility of applying SELDI or MALDI mass spectrometric analyses of the blood proteome in diagnostics of breast cancer, and elicited serum (or plasma) proteome patterns specific for patients with breast cancer at either early or late clinical stages [29–38]. Among the peptides identified in such differentiating patterns were fragments of C3a [33] and of FPA, fibrinogen, C3f, C4a, ITIH4, apoA-IV, bradykinin, factor XIIIa and transthyretin [35]. In addition, mass spectrometry analyses of the blood proteome allowed the identification of patterns specific for breast cancer patients with different outcome and response to therapy [39–43]. Different methodological approaches, both experimental and computational, have been implemented in such studies, and the proposed proteome patterns specific for breast cancer consisted of different peptide sets. However, several peptides that differentiated cancer and control samples appeared reproducibly when comparative analysis across different studies was performed [44], demonstrating the high potential of mass spectrometry-based analyses of the blood proteome pattern in diagnostics of breast cancer once problems with standardization of experimental and computational design are solved.

Here we examined the potential applicability of the serum proteome pattern identified by MALDI-ToF mass spectrometry, either alone or in combination with protein biomarkers analyzed by immunoassays, in early detection of breast cancer. The spectral components that were annotated on the basis of recorded mass spectra were successfully used to build classifiers that allowed reliable identification of early stage breast cancer patients. Importantly, the classifier based on serum proteome pattern outperformed available biomarkers analyzed in blood by immunoassays.


Characteristics of patient and control groups

The clinical part of the study was carried out at the Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Gliwice Branch, between May 2006 and January 2008. Ninety-two patients diagnosed with clinical stage I or II breast cancer were included in the study, of average age 58.5 years (range 31–74 years). Patients were classified according to the TNM scale; the majority were scored as T1 and T2 (47% and 45%, respectively) as well as N0 and N1 (75% and 24%, respectively), and none had diagnosed metastases (all M0). Biopsy material was used to assess for histopathological tumor grade (27% G1, 45% G2, 28% G3), as well as for expression of estrogen receptor (63% ER+) and progesterone receptor (60% PR+) by immunohistochemistry. Serum samples were collected before the start of therapy. One hundred and four female volunteers were included as a control group; they were required to be free of any known acute or chronic illness and were not treated with any anticancer therapy in the past. The average age in this group was 54 years (range 32–77 years). The study was approved by the appropriate Ethics Committee and all participants provided informed consent indicating their voluntary participation.

Preparation of serum samples

Samples were collected and processed following a standardized protocol. Blood was collected in a 5 ml Vacutainer Tube (Becton Dickinson), incubated for 30 min. at room temperature to allow clotting, and then centrifuged at 1000 g for 10 min. to remove the clot. The serum was aliquoted and stored at -70°C. Directly before analysis, samples were diluted 1:5 with 20% acetonitrile (ACN) in water, then applied onto an Amicon Ultra-4 membrane (50 kDa cut-off) in a spin column and centrifuged at 3000 g for 30 min. This removed the majority (up to 80%) of albumin and other abundant high-molecular weight proteins from the serum samples (not shown).

Mass spectrometry

Samples were analyzed using an Autoflex MALDI-ToF mass spectrometer (Bruker Daltonics, Bremen, Germany); the analyzer worked in the linear mode and positive ions were recorded in the mass range between 2,000 and 10,000 Da. Mass calibration was performed after every four samples using standards in the range of 5000 to 17,500 Da (Protein Calibration Standard I, Bruker Daltonics). Prior to analysis each sample was loaded onto a ZipTip C18 tip-microcolumn by passing it through 10 times; the column was washed with water and then eluted with 1 μl of matrix solution (30 mg/ml sinapinic acid in 50% ACN/H2O and 0.1% TFA with addition of 1 mM n-octyl glucopyranoside) directly onto the 600 μm AnchorChip (Bruker Daltonics) plate. ZipTip extraction/loading was repeated twice for each sample and for each spot on the plate two spectra were acquired after 120 laser shots (i.e. four spectra were recorded for each sample). Spectra were exported from the Bruker FlexAnalysis 2.2 software in standard 8-bit binary ASCII format; they consisted of approximately 45,400 measurement points describing mass to charge ratios (m/z) for consecutive [M+H]+ ions and the corresponding signal abundances, covering the range of analyzed m/z values.

Analysis of protein tumor markers in plasma

Plasma samples were obtained after centrifugation of blood on a Ficoll gradient (Lymphoprep™, ICN), and then levels of selected markers were quantified using standard methods of immuno-diagnostics. Enzyme-Linked Immunosorbent Assay (ELISA) was used for assessment of leptin (DRG Diagnostics) and osteopontin (R&D Systems), Chemiluminescent Microparticle Immunoassay (CMIA) for assessment of CEA (Abbott), Trace Resolved Amplified Cryptate Emission (TRACE) for assessment of CYFRA 21.1 (Brahms), and Microparticle Enzyme Immunoassay (MEIA) for assessment of CA15.3 (Abbott). In addition, the level of osteopontin was analyzed in serum samples as described above.

Data Processing and Statistical Analysis

The preprocessing of data, which included averaging of technical repeats, interpolation of missing or non-aligned points, binning of neighboring points to reduce data complexity, removal of the spectral area below baseline and total ion current (TIC) normalization, was performed according to procedures considered to be standard in the field [45, 46]. In the second step the spectral components, which reflected [M+H]+ ions recorded at defined m/z values, were identified using decomposition of mass spectra into their Gaussian components. The spectra were modeled as a sum of Gaussian bell-shaped curves, then models were fitted to the experimental data by a variant of the expectation maximization (EM) algorithm [47]. In a few cases when the standard deviation of a Gaussian exceeded a value of 50 the corresponding spectral component was excluded from further, more detailed analyses. Based on the decomposition of the average mass spectrum into the Gaussian components, the classifier features were computed by the scalar product with the Gaussian curves treated as kernel functions. The classification used a version of the Support Vector Machine (SVM) algorithm described by Schölkopf and coworkers [48]. The size of the training sample was changed from 20% to 90% of the whole dataset, and for each size the two-step training/validation procedure was repeated 1000 times to estimate the average error rate and its 95% confidence interval, which characterized the accuracy of classification. In order to further characterize the quality of classification, receiver operating characteristic (ROC) curves were computed by changing the value of the classification threshold in the SVM classifiers, and averaging the obtained specificity/sensitivity proportions over 1000 random validation experiments. 
We tested the performance of classification with classifiers built of different numbers of spectral components by estimating the level of total errors, as well as the number of false positive and false negative classifications. Construction and validation of a classifier is a statistical process, i.e. many different classifiers built of a given number of spectral components were tested (1000 random splits of the dataset), and those which passed the quality threshold could be built of different spectral components. Thus, to identify the components that are the best determinants of a specific proteome pattern we looked for the most frequent components in classifiers that correctly classified samples. The performance of classifiers built of optimized components was assessed by standard logistic regression (1000 iterations with a 50/50 split of the training/validation set).
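Two of the steps described above, TIC normalization and the computation of classifier features as scalar products with Gaussian kernels, can be illustrated with a minimal sketch. This is not the bioinformatic tool used in the study: the function names, the toy spectrum and the example component parameters are assumptions made for illustration only, and the EM fitting of the Gaussian components is not reproduced.

```python
import math

def tic_normalize(intensities):
    """Scale a spectrum so that its total ion current (sum of intensities) equals 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum has no positive signal")
    return [value / total for value in intensities]

def gaussian(mz, mean, sd):
    """Value of a unit-area Gaussian curve at a given m/z."""
    z = (mz - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def component_features(mz_axis, intensities, components, max_sd=50.0):
    """Scalar product of the spectrum with each Gaussian kernel.

    components is a list of (mean, sd) pairs; components whose standard
    deviation exceeds max_sd are skipped, mirroring the exclusion rule
    described in the text.
    """
    features = []
    for mean, sd in components:
        if sd > max_sd:
            continue
        features.append(
            sum(i * gaussian(m, mean, sd) for m, i in zip(mz_axis, intensities))
        )
    return features

# Toy spectrum (illustrative values only): flat background plus one peak near m/z 2865.5
mz_axis = [2860.0 + 0.5 * k for k in range(23)]
raw = [1.0 + 20.0 * math.exp(-0.5 * ((m - 2865.5) / 1.0) ** 2) for m in mz_axis]
spectrum = tic_normalize(raw)

# Hypothetical components: one on the peak, one off the peak, one too broad to keep
features = component_features(
    mz_axis, spectrum, [(2865.5, 1.0), (2869.0, 1.0), (2865.0, 80.0)]
)
```

In this toy case the feature for the component centred on the peak is larger than the feature for the off-peak component, and the overly broad component yields no feature at all.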

Results and discussion

Classifiers built on spectral components that determine proteome patterns

The low-molecular-weight fraction of the blood serum proteome consists of numerous peptides, proteins and their fragments. Some of these interact with each other, and a substantial fraction of this blood proteome compartment is carried by albumin as cargo peptides [49, 50]. For this reason we implemented dilution of serum samples with a denaturing organic solvent (acetonitrile) that destroyed the majority of protein interactions and allowed analysis of individual peptides dissociated from (not interacting with) other proteins (e.g., albumin). Characteristic features of MALDI ionization are that most ions created during laser irradiation are singly charged (multiply charged ions, especially those with low m/z values, have very low abundances and can be neglected), and that these ions are not fragmented under the ionization conditions applied. In other words, peaks registered in a MALDI mass spectrum correspond to mono-protonated peptide/protein molecular ions [M+H]+ described by m/z values that reflect actual molecular weights increased by the mass of the proton. However, when MALDI mass spectra are recorded over a wide range of m/z values (like the 2–10 kDa range in this study) the expected mass accuracy is relatively low and reaches 0.01–0.1% of the analyte's molecular mass, which corresponds to a few Daltons in the range of m/z values analyzed. In consequence, the relative broadening of spectral peaks recorded for the [M+H]+ ions could reflect the low resolution of the analyzer operating in the linear mode or might result from overlapping of ions originating from proteins/peptides of very similar molecular masses. In addition, because of technological imperfections there might be some shift in the positions of peptide ions between measurements, which adds more complexity to analyses of large datasets. 
For this reason, some approaches used for analysis of large datasets rely on alignment of identified spectral peaks [45], which requires numerical "stretching" of spectra before further analyses.
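The stated mass accuracy of 0.01–0.1% can be converted to absolute values at the limits of the analyzed range with simple arithmetic. The short sketch below does this; the function name is an illustrative assumption.

```python
def mass_tolerance_da(mz, relative_accuracy_percent):
    """Absolute mass uncertainty in Da for a given m/z and relative accuracy in percent."""
    return mz * relative_accuracy_percent / 100.0

# Limits of the analyzed m/z range and of the stated accuracy range
for mz in (2000.0, 10000.0):
    best = mass_tolerance_da(mz, 0.01)
    worst = mass_tolerance_da(mz, 0.1)
    print(f"m/z {mz:.0f}: {best:.1f} to {worst:.1f} Da")
```

The uncertainty thus spans roughly 0.2 to 2 Da at m/z 2,000 and 1 to 10 Da at m/z 10,000, which is consistent with the "few Daltons" quoted above.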

Here we decided to implement an original mathematical procedure based on modeling average spectra and then fitting actual experimental spectra into such a model. Averaging was performed over either the whole dataset or data for cancer patients only, depending on whether the model was used to discriminate cancer and normal samples or different clinical outcomes of patients. We tested models with different numbers of components, and found that for the mass spectra analyzed in the present work 300 components ensured both sufficient fidelity of the model and its efficient computation (not shown). As a result of computation an "average" spectrum was decomposed into spectral components characterized by the exact molecular weight (m/z values of recorded [M+H]+ ions) and the interval into which the corresponding peaks fit in at least 95% of the actual spectra in the dataset (± 95% CI). The resulting spectral components reflect peaks recorded in multiple samples during mass spectrometric analysis, which contained either single peptide/protein ions or a combination of a few ions of very similar m/z values. This approach allowed us to avoid artifacts resulting from peak alignment and facilitated quantitative analysis of data by simple assessment of signal volumes that fitted to a given component within its 95% CI. Having identified and quantified spectral components, one could find certain components whose abundances were significantly different between groups of samples (e.g. between cancer patient and healthy samples), and which could be defined as "differentiating". However, to obtain more reliable classification of samples we used spectral components to build multi-component classifiers that determined proteome patterns characteristic for groups, and looked for the most frequent components in classifiers that classified samples correctly.
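The idea of a spectrum modeled as a sum of Gaussian curves, with each component quantified by the signal falling inside its interval, can be sketched as follows. The sketch assumes that the interval of a component is its mean ± 1.96 standard deviations; the original text does not specify how the 95% interval was computed, so this choice, the function names and the two example components are illustrative assumptions only.

```python
import math

def gaussian(mz, mean, sd):
    """Value of a unit-area Gaussian curve at a given m/z."""
    z = (mz - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def model_spectrum(mz_axis, components):
    """Sum-of-Gaussians model; components is a list of (weight, mean, sd)."""
    return [
        sum(w * gaussian(m, mean, sd) for w, mean, sd in components)
        for m in mz_axis
    ]

def interval_95(mean, sd):
    """Assumed 95% interval of a component: mean +/- 1.96 standard deviations."""
    half_width = 1.96 * sd
    return mean - half_width, mean + half_width

def signal_in_interval(mz_axis, intensities, mean, sd):
    """Sum of the signal whose m/z falls inside the component's interval."""
    low, high = interval_95(mean, sd)
    return sum(i for m, i in zip(mz_axis, intensities) if low <= m <= high)

# Two hypothetical components of different abundance (weights 5 and 2)
components = [(5.0, 2303.48, 1.5), (2.0, 2310.0, 1.5)]
mz_axis = [2295.0 + 0.1 * k for k in range(251)]
spectrum = model_spectrum(mz_axis, components)

abundance_first = signal_in_interval(mz_axis, spectrum, 2303.48, 1.5)
abundance_second = signal_in_interval(mz_axis, spectrum, 2310.0, 1.5)
```

Because each component is quantified inside its own interval, no prior alignment of individual peaks is needed, which is the design advantage argued for above.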

Identification of components that determine proteome patterns specific for healthy persons and breast cancer patients

At first we compared the serum proteome patterns of 104 healthy women and 92 early stage breast cancer patients. Spectral components corresponding to protein/peptide [M+H]+ ions recorded in MALDI mass spectra were used to build classifiers to perform cancer/healthy control classifications as described above. The best classification performance was obtained with classifiers built of 2–5 features, i.e. spectral components (Fig. 1A). To estimate the sensitivity and specificity of classification, ROC curves [51] were computed for classifiers built of 3 or 4 spectral components. According to our estimations these classifiers allowed classification of cancer patients with 85% specificity and 82–83% sensitivity (Fig. 1B).
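For readers less familiar with ROC analysis, the relation between a classification threshold and the resulting sensitivity and specificity can be sketched as below. The scores and labels are made-up toy values, not data from this study, and the averaging over 1000 random validation experiments is omitted.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when samples with score >= threshold are called cancer.

    labels: 1 = cancer, 0 = healthy control.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def roc_points(scores, labels, steps=10):
    """Sweep the threshold from 0.0 to 1.0 and collect (threshold, sensitivity, specificity)."""
    points = []
    for k in range(steps + 1):
        threshold = k / steps
        sens, spec = sensitivity_specificity(scores, labels, threshold)
        points.append((threshold, sens, spec))
    return points

# Toy classifier outputs (probabilities) for four cancer and four control samples
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
curve = roc_points(scores, labels)
```

At a threshold of 0.5 the toy data give 75% sensitivity and 75% specificity; lowering the threshold raises sensitivity at the cost of specificity, which is the trade-off traced by a ROC curve.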

Figure 1

Estimation of the performance of classification of breast cancer samples. A – The total error rate was plotted against the number of features (i.e. spectral components) in the classifier. Shown are average error rates and 95% confidence intervals calculated based on 1000 random validation experiments with 50:50 training/validation split of data. B – Estimation of the sensitivity and specificity of the classification for classifiers built of three or four spectral components. The ROC curve was computed by changing the value of the probability threshold in the SVM classifier from 0.0 to 1.0, and averaging the specificity obtained versus sensitivity rate over 1000 random repeats of training and validation.

In further analyses we looked for the most frequent spectral components in classifiers that correctly classified breast cancer samples. The three most important components corresponded to the following [M+H]+ peptide ions: m/z = 2865.54, m/z = 3578.73, and m/z = 2303.48 (Fig. 2A). Most interestingly, two of these (m/z = 2865.54 and m/z = 3578.73) were present in nearly all well-performing classifiers, while the third (m/z = 2303.48) was present in 78% of classifiers; it was noteworthy that all other spectral components appeared in classifiers less frequently (<50%; Table 1). Importantly, these most frequent components of cancer classifiers had a very high potential to differentiate control and cancer samples by themselves; the statistical significance of differences obtained in univariate analyses for these three peaks was at the level of p-values from 10⁻²⁰ to 10⁻¹⁴ (they remained highly significant after application of the Bonferroni correction for multiple testing; Table 1). Fig. 2B shows fragments of mass spectra in the near vicinity of the components that were the most frequent features of these breast cancer classifiers; the actual spectral lines for samples from all 196 individuals are shown together with the model component. The levels of such components in samples from individual breast cancer patients and healthy controls were quantified and are shown as box-plots (Fig. 2B).

Figure 2

Characterization of spectral components essential for cancer classification. A – The three most frequent differentiating components are marked with arrows along the mass spectra of serum samples of cancer patients (red lines) and healthy controls (green lines). B – Actual spectral plots of three selected components for cancer patients (red lines) and healthy controls (green lines), as well as modeled Gaussian kernels (blue curves); X-axes represent the m/z values, Y-axes represent intensities. Box-plots on the right represent quantification of the abundance of spectral components in samples from cancer patients (red) and healthy controls (green) (shown are minimum, lower quartile, median, upper quartile and maximum values; outliers are marked by asterisks).

Table 1 Characteristics of spectral components that differentiated samples from breast cancer patients and healthy controls.

We also found that 49 out of 300 modeled spectral components (i.e., 16%) had themselves a high potential to differentiate control and cancer samples in univariate analyses (p-value < 0.05 after the Bonferroni correction). Furthermore, all 14 spectral components that appeared in at least 1% of classifiers built of 4 features retained a very high differentiation potential in univariate analyses (p-value < 0.0002 after the Bonferroni correction; Table 1). In addition, we cross-compared spectral components that showed some differentiating power in our study (90 spectral components with uncorrected p-value < 0.005) with spectral peaks that were reported in some other published studies to differentiate breast cancer from healthy control samples (uncorrected p-value < 0.005). The correspondence of [M+H]+ ions was based on ± 0.2% of the m/z values. We found that at least 15 of these spectral components had a corresponding differentiating peak in comparable studies (although not always showing the same tendency; Table 2). This reproducibility, observed in spite of large differences in experimental and computational design, indicates a potential for convergence toward a common proteome pattern specific for breast cancer samples. Interestingly, two spectral components that appeared the most important for cancer classification in our study (i.e., m/z = 2865.54 and m/z = 3578.73) were not reported as differentiating peaks in other studies. We note, however, that in our study serum was analyzed after removal of albumin and components bound to it, which apparently influenced the pattern of mass spectra of the low-molecular-weight fraction of the blood proteome. We observed markedly increased levels of some spectral components in albumin-depleted samples as compared to those analyzed directly (not shown), which could possibly be explained by a reduced efficiency of ionization and detection of certain less abundant peptides in the presence of albumin [49].
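The two numerical criteria used in this paragraph, the Bonferroni correction over 300 tested components and the ± 0.2% m/z window for matching peaks across studies, can be sketched as follows. The m/z values and p-values in the example are hypothetical, and taking the tolerance relative to the first m/z value is an assumption, since the text does not state which value served as the reference.

```python
def bonferroni(p_values, n_tests=None):
    """Bonferroni-corrected p-values: each p multiplied by the number of tests, capped at 1."""
    n = n_tests if n_tests is not None else len(p_values)
    return [min(1.0, p * n) for p in p_values]

def within_tolerance(mz_own, mz_reported, tolerance_percent=0.2):
    """True if the reported peak lies within the relative window around our component."""
    return abs(mz_own - mz_reported) <= mz_own * tolerance_percent / 100.0

def match_peaks(own_components, reported_peaks, tolerance_percent=0.2):
    """All (own, reported) pairs whose m/z values agree within the tolerance."""
    return [
        (own, other)
        for own in own_components
        for other in reported_peaks
        if within_tolerance(own, other, tolerance_percent)
    ]

# Hypothetical uncorrected p-values corrected for 300 tested components
corrected = bonferroni([1e-20, 1e-4, 0.01], n_tests=300)

# Hypothetical m/z values: 4006.0 is within 0.2% of 4000.0 (8 Da window),
# whereas 5020.0 is outside 0.2% of 5000.0 (10 Da window)
pairs = match_peaks([4000.0, 5000.0], [4006.0, 5020.0])
```

With 300 tests, an uncorrected p-value of 0.01 no longer passes a 0.05 threshold after correction, while the very small p-values reported for the leading components remain far below it.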

Table 2 Comparison of discriminating spectral components/peptide peaks found in this study and in other published work.

Serum proteome patterns identified by MALDI-ToF analyses are similar for different sub-groups of early stage breast cancer patients

Having established that MALDI-ToF analysis of serum peptides identified proteome patterns characteristic for cancer patients, we next examined whether features of peptide profiles would differentiate specific subgroups of patients. First, the group of patients was divided into two equal subgroups according to their age (younger or older than 56.5 years, which was the median), and then spectral classifiers were built according to the methodology described above. In this particular case the performance of classification was about 50% independently of the number of spectral components (features) in classifiers (Fig. 3A), and consequently the classifier had about 50% specificity and 50% sensitivity as shown on the corresponding ROC curve (Fig. 3B). This indicated that there was no real difference in serum proteome patterns between subgroups of patients divided according to their age. This result could be expected because in the whole group there was only 1 patient younger than 35 years, which is normally considered an early appearance of cancer, and thus our two age-related subgroups most probably reflect a random division of the group. Having this "negative control" classification, we next aimed to identify serum proteome patterns specific for subgroups of patients with different clinical and molecular outcomes. We compared patients with different primary tumor size (T1 vs. T2), lymph node status (N0 vs. N1), histopathological grade (G1 and G2 vs. poorly differentiated G3), and also two well-established breast cancer prognostic and predictive molecular markers, expression of estrogen receptor or progesterone receptor [rev. in: [52–54]]. For each comparison the performance of classification (total error of classifiers built of 1 to 20 features) and the corresponding ROC curves for classifiers built of 15 spectral components (these were representative of ROC curves computed for classifiers built of 1 to 20 features) are shown in Fig. 3. 
Most importantly, we observed a low performance of putative classification with a high level of errors for all analyses carried out. Although analyses based on the nodal status and the histopathological grade showed relatively moderate levels of total error (Fig. 3A), they had a very high level of false negative classifications (not shown), which was related to the unbalanced sizes of the subgroups compared (see Table 3); the shape of the corresponding ROC curves also reflects this imbalance (Fig. 3B).
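The effect of subgroup imbalance on the total error can be illustrated with a simple worked example. The sketch below considers a trivial rule that assigns every sample to the larger subgroup; it uses the N0/N1 proportions given in the patient characteristics (75% and 24%) purely as an illustration and is not a reconstruction of the classifiers evaluated in Fig. 3.

```python
def majority_baseline(n_majority, n_minority):
    """Error profile of a rule that assigns every sample to the majority class.

    Returns (total error rate, false negative rate for the minority class).
    """
    total_error = n_minority / (n_majority + n_minority)
    false_negative_rate = 1.0  # every minority-class sample is missed
    return total_error, false_negative_rate

# Proportions of N0 and N1 patients taken from the patient characteristics
total_error, false_negative_rate = majority_baseline(75, 24)
print(f"total error: {total_error:.2f}, false negative rate: {false_negative_rate:.2f}")
```

Such a rule reaches a total error of only about 24% while missing every N1 patient, which shows why a moderate total error alone does not indicate a useful classifier when subgroups are unbalanced.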

Figure 3

Estimation of differences of serum proteome patterns between sub-groups of breast cancer patients. Patients were differentiated by age, primary tumor size (T), lymph nodal status (N), histopathological grade (G), and estrogen (ER) and progesterone (PR) receptor expression. A – The total error rates of classification plotted against the number of features in the classifiers as in Fig. 1A; the actual line width corresponded to 95% confidence intervals. B – ROC curves computed for classifiers built of 15 spectral components for each comparison (computation was done as described in Fig. 1B).

Table 3 Comparison of serum proteome patterns among different sub-groups of breast cancer patients.

The spectral components identified by Gaussian model decomposition were also used for univariate analyses of differences between the subgroups described above. Table 3 presents examples of the top five spectral components with the lowest p-values identified for each of such comparisons. Most importantly, although in standard analyses the levels of some components were different between the subgroups compared, none of these differences appeared significant after application of the Bonferroni correction for multiple testing (not shown). This result was in complete agreement with results of classification by multi-component classifiers (Fig. 3), which clearly showed similar serum proteome patterns identified by MALDI-ToF analyses in different sub-groups of the early stage breast cancer group. This finding suggested that the multi-component cancer classifier described above might be potentially applicable for early detection of breast cancer, independent of further more detailed clinical and pathological features.

A classifier built on MS-based serum proteome pattern outperforms available protein biomarkers analyzed in blood by immunoassays

To further assess the potential diagnostic power of the multi-component classifier described above, we compared the reliability of classification based on the biomarker identified by mass spectrometry with that based on available protein biomarkers analyzed in blood by immunoassays. Five markers were selected: carcinoembryonic antigen (CEA), carbohydrate antigen CA15.3, cytokeratin fragment CYFRA-21.1, leptin and osteopontin, which had putative diagnostic value for breast cancer, especially at advanced clinical stages, yet none of them was routinely used for early diagnostics of breast cancer [55–60]. The plasma levels of these biomarkers were quantified in a group of early stage breast cancer patients (which largely overlapped with the group examined using MALDI-ToF mass spectrometry) and compared with corresponding levels in a group of healthy donors (Table 4). We observed that the level of osteopontin was markedly increased in plasma of cancer patients, and the difference had a high level of statistical significance (p = 0.0003). The differences were much less significant for the four other markers, and therefore osteopontin alone was used in further analyses. The anti-osteopontin antibody used for ELISA recognized all four isoforms (OPN-a, OPN-b, OPN-c, OPN-d) and their different proteolytic fragments present in blood, and thus direct correlation of the ELISA results with MALDI-ToF analyses was not possible. When the plasma level of osteopontin was used for cancer classification it showed 88% sensitivity but only 28% specificity (as tested by the standard logistic regression method).

Table 4 Levels of tumor markers in plasma of breast cancer patients and healthy controls.

With the aim of constructing a putative marker useful in early diagnosis of breast cancer, we decided to combine features of the serum proteome pattern identified by MALDI-ToF MS analysis and the level of osteopontin measured by ELISA. Three spectral components, m/z = 2865.54, m/z = 3578.73, and m/z = 2303.48 Da, which were the most frequent components of the cancer classifier described above, were selected for these analyses. The marker built of these three spectral components showed 78% specificity and 88% sensitivity when tested by the standard logistic regression method. Then, the level of osteopontin was re-tested in serum samples from the cancer patients and healthy individuals subjected to the MS-based study. In this case, however, the average concentration of osteopontin in serum was about two-fold lower as compared to that in plasma, and the difference between cancer patients and healthy persons was much less pronounced. The biomarker built of the serum level of osteopontin alone showed 84% specificity but only 12% sensitivity when tested by the standard logistic regression method. Finally we tested the performance of a marker built of four features, the three most frequent spectral components (m/z = 2303.48, 2865.54, and 3578.73) and osteopontin. This combined marker showed 78% specificity and 88% sensitivity, the same as the marker built of three spectral peaks alone.
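The validation scheme referred to above, logistic regression assessed over repeated random 50/50 training/validation splits, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the logistic model is fitted by plain gradient descent, the data are synthetic two-feature samples, and only 20 repeats are run instead of 1000 to keep the example fast. All function names are hypothetical.

```python
import math
import random

def sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def fit_logistic(rows, labels, epochs=200, rate=0.1):
    """Logistic regression fitted by batch gradient descent; returns (weights, bias)."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / len(rows)
    return weights, bias

def predict(model, x):
    """Class 1 (cancer) if the predicted probability is at least 0.5, else class 0."""
    weights, bias = model
    return 1 if sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) >= 0.5 else 0

def repeated_split_validation(rows, labels, repeats=1000, seed=0):
    """Average sensitivity and specificity over random 50/50 train/validation splits."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    sens_sum = spec_sum = 0.0
    used = 0
    for _ in range(repeats):
        rng.shuffle(indices)
        half = len(indices) // 2
        train, valid = indices[:half], indices[half:]
        model = fit_logistic([rows[i] for i in train], [labels[i] for i in train])
        pos = [i for i in valid if labels[i] == 1]
        neg = [i for i in valid if labels[i] == 0]
        if not pos or not neg:
            continue  # skip splits whose validation half lacks one of the classes
        sens_sum += sum(predict(model, rows[i]) for i in pos) / len(pos)
        spec_sum += sum(1 - predict(model, rows[i]) for i in neg) / len(neg)
        used += 1
    return sens_sum / used, spec_sum / used

# Synthetic data: 40 "controls" centred at -1 and 40 "cancer" samples centred at +1
data_rng = random.Random(1)
rows = [[data_rng.gauss(-1.0, 1.0), data_rng.gauss(-1.0, 1.0)] for _ in range(40)]
rows += [[data_rng.gauss(1.0, 1.0), data_rng.gauss(1.0, 1.0)] for _ in range(40)]
labels = [0] * 40 + [1] * 40

sensitivity, specificity = repeated_split_validation(rows, labels, repeats=20)
```

Averaging over many random splits, rather than relying on a single split, reduces the dependence of the reported sensitivity and specificity on one particular division of a relatively small dataset.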


Here we confirmed the high potential of serum proteome pattern analysis by MALDI-ToF mass spectrometry for differentiating early stage breast cancer patients from healthy controls. Most importantly, a classifier built on this analysis outperformed those based on available protein biomarkers analyzed in blood by immunoassays. Further combination of MS-based serum proteome pattern analysis with traditional cancer markers might nevertheless yield a biomarker reliable enough for practical implementation in the early detection and diagnostics of breast cancer.


  1. Hanash S: Disease proteomics. Nature. 2003, 422: 226-232.

  2. Wulfkuhle JD, Liotta LA, Petricoin EF: Proteomic applications for the early detection of cancer. Nat Rev Cancer. 2003, 3 (4): 267-275.

  3. van't Veer L, Dai H, Vijver van de M, He YD, Hart AAM, Mao M, Peterse HL, Kooy van der K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH: Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002, 415: 530-536.

  4. Buyse M, Loi S, van't Veer L, Viale G, Delorenzi M, Glas AM, Saghatchian d'Assignies M, Bergh L, Lidereau R, Ellis P, Harris A, Bogaerts J, Therasse P, Floore A, Amakrane M, Piette F, Rutgers E, Sotiriou C, Cardoso F, Piccard MJ: Validation and clinical utility of a 70-gene prognostic signature for women with node-negative breast cancer. J Natl Cancer Inst. 2006, 98 (17): 1183-1192.

  5. Aebersold R, Mann M: Mass spectrometry-based proteomics. Nature. 2003, 422: 198-207.

  6. Liotta LA, Ferrari M, Petricoin EF: Clinical proteomics: written in blood. Nature. 2003, 425: 905-

  7. Rosenblatt KP, Bryant-Greenwood P, Killian JK, Mehta A, Geho D, Espina V, Petricoin EF, Liotta LA: Serum proteomics in cancer diagnosis and management. Annu Rev Med. 2004, 55: 97-112.

  8. Liotta LA, Petricoin EF: Serum peptidome for cancer detection: spinning biological trash into diagnostic gold. J Clin Invest. 2006, 116: 26-30.

  9. Rodland KD: Proteomics and cancer diagnosis: the potential of mass spectrometry. Clin Biochem. 2004, 37: 579-583.

  10. Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM, Mills GB, Simone C, Fishman DA, Kohn EC, Liotta LA: Use of proteomic patterns in serum to identify ovarian cancer. Lancet. 2002, 359: 572-577.

  11. Ransohoff DF: Lessons from controversy: ovarian cancer screening and serum proteomics. J Natl Cancer Inst. 2005, 97: 315-319.

  12. Posadas EM, Simpkins F, Liotta LA, MacDonald C, Kohn EC: Proteomic analysis for the early detection and rational treatment of cancer-realistic hope?. Ann Oncol. 2005, 16: 16-22.

  13. Azad NS, Rasool N, Annunziata CM, Minasian L, Whiteley G, Kohn EC: Proteomics in clinical trials and practice. Mol Cell Proteomics. 2006, 5: 1819-1829.

  14. Cho WCS: Contribution of oncoproteomics to cancer biomarker discovery. Mol Cancer. 2007, 6: 25-

  15. Conrads TP, Hood BL, Issaq HJ, Veenstra TD: Proteomic patterns as a diagnostic tool for early-stage cancer: a review of its progress to a clinically relevant tool. Mol Diagn. 2004, 8: 77-85.

  16. Solassol J, Jacot W, Lhermitte L, Boulle N, Maudelonde T, Mangé A: Clinical proteomics and mass spectrometry profiling for cancer detection. Expert Rev Proteomics. 2006, 3: 311-320.

  17. Dworzanski JP, Snyder AP: Classification and identification of bacteria using mass spectrometry-based proteomics. Expert Rev Proteomics. 2005, 2: 863-878.

  18. Somorjai RL: Pattern recognition approaches for classifying proteomic mass spectra of biofluids. Methods Mol Biol. 2008, 428: 383-396.

  19. Li L, Tang H, Wu Z, Gong J, Gruidl M, Zou J, Tockman M, Clark RA: Data mining techniques for cancer detection using serum proteomic profiling. Artif Intell Med. 2004, 32: 71-83.

  20. Hutchens TW, Yip TT: New desorption strategies for the mass spectrometric analysis of macromolecules. Rapid Commun Mass Spectrom. 1993, 7: 576-80.

  21. Petricoin EF, Liotta LA: SELDI-TOF-based serum proteomic pattern diagnostics for early detection of cancer. Curr Opin Biotech. 2004, 15: 24-30.

  22. Yang SY, Xiao XY, Zhang WG, Zhang LJ, Zhang W, Zhou B, Chen G, He DC: Application of serum SELDI proteomic patterns in diagnosis of lung cancer. BMC Cancer. 2005, 5: 83-

  23. Liu XP, Shen J, Li ZF, Yan L, Gu J: A serum proteomic pattern for the detection of colorectal adenocarcinoma using surface enhanced laser desorption and ionization mass spectrometry. Cancer Invest. 2006, 24: 747-753.

  24. Lin YW, Lin CY, Lai HC, Chiou JY, Chang CC, Yu MH, Chu TY: Plasma proteomic pattern as biomarkers for ovarian cancer. Int J Gynecol Cancer. 2006, 16 (Suppl 1): 139-146.

  25. Lim JY, Cho JY, Paik YH, Chang YS, Kim HG: Diagnostic application of serum proteomic patterns in gastric cancer patients by ProteinChip surface-enhanced laser desorption/ionization time-of-flight mass spectrometry. Int J Biol Markers. 2007, 22: 281-286.

  26. Palmblad M, Tiss A, Cramer R: Mass spectrometry in clinical proteomics – from the present to the future. Proteomics Clin Appl. 2009, 3: 6-17.

  27. McPherson K, Steel CM, Dixon JM: Breast cancer – epidemiology, risk factors, and genetics. BMJ. 2000, 321: 624-628.

  28. Astley SM: Computer-based detection and prompting of mammographic abnormalities. Br J Radiol. 2004, 77: S194-S200.

  29. Li J, Zhang Z, Rosenzweig J, Wang YY, Chan DW: Proteomics and bioinformatics approaches for identification of serum biomarkers to detect breast cancer. Clin Chem. 2002, 48: 1296-1304.

  30. Laronga C, Becker S, Watson P, Gregory B, Cazares L, Lynch H, Perry RR, Wright GL, Drake RR, Semmes OJ: SELDI-TOF serum profiling for prognostic and diagnostic classification of breast cancers. Dis Markers. 2003, 19: 229-238.

  31. Vlahou A, Laronga C, Wilson L, Gregory B, Fournier K, McGaughey D, Perry RR, Wright GL, Semmes OJ: A novel approach toward development of a rapid blood test for breast cancer. Clin Breast Cancer. 2003, 4: 203-209.

  32. Caputo E, Lombardi ML, Luongo V, Moharram R, Tornatore P, Pirozzi G, Guardiola J, Martin BM: Peptide profiling in epithelial tumor plasma by the emerging proteomic techniques. J Chromatogr B Analyt Technol Biomed Life Sci. 2005, 819 (1): 59-66.

  33. Li J, Orlandi R, White CN, Rosenzweig J, Zhao J, Seregni E, Morelli D, Yu Y, Meng X-Y, Zhang Z, Davidson NE, Fung ET, Chan DW: Independent validation of candidate breast cancer serum biomarkers identified by mass spectrometry. Clin Chem. 2005, 51: 2229-2235.

  34. Mathelin C, Cromer A, Wendling C, Tomasetto C, Rio MC: Serum biomarkers for detection of breast cancers: a prospective study. Breast Cancer Res Treat. 2006, 96: 83-90.

  35. Villanueva J, Shaffer DR, Philip J, Chaparro CA, Erdjument-Bromage H, Olshen AB, Fleisher M, Lilja H, Brogi E, Boyd J, Sanchez-Carbayo M, Holland EC, Cordon-Cardo C, Scher HI, Tempst P: Differential exoprotease activities confer tumor-specific serum peptidome patterns. J Clin Invest. 2006, 116: 271-284.

  36. de Noo ME, Deelder A, Werff van der M, Özalp A, Mertens B, Tollenaar R: MALDI-TOF serum protein profiling for the detection of breast cancer. Onkologie. 2006, 29: 501-506.

  37. Belluco C, Petricoin EF, Mammano E, Facchiano F, Ross-Rucker S, Nitti D, Di Maggio C, Liu C, Lise M, Liotta LA, Whiteley G: Serum proteomic analysis identifies a highly sensitive and specific discriminatory pattern in stage 1 breast cancer. Ann Surg Oncol. 2007, 14 (9): 2470-2476.

  38. Callesen AK, Vach W, Jørgensen PE, Cold S, Tan Q, dePont Christensen R, Mogensen O, Kruse TA, Jensen ON, Madsen JS: Combined experimental and statistical strategy for mass spectrometry based serum protein profiling for diagnosis of breast cancer: a case-control study. J Proteome Res. 2008, 7: 1419-1426.

  39. Pusztai L, Gregory BW, Baggerly KA, Peng B, Koomen J, Kuerer HM, Esteva FJ, Symmans WF, Wagner P, Hortobagyi GN, Laronga C, Semmes OJ, Wright GL, Drake RR, Vlahou A: Pharmacoproteomic analysis of prechemotherapy and postchemotherapy plasma samples from patients receiving neoadjuvant or adjuvant chemotherapy for breast carcinoma. Cancer. 2004, 100: 1814-1822.

  40. Goncalves A, Esterni B, Bertucci F, Sauvan R, Chabannon C, Cubizolles M, Bardou VJ, Houvenaegel G, Jacquemier J, Granjeaud S, Meng XY, Fung ET, Birnbaum D, Maraninchi D, Viens P, Borg JP: Postoperative serum proteomic profiles may predict metastatic relapse in high-risk primary breast cancer patients receiving adjuvant chemotherapy. Oncogene. 2006, 25: 981-989.

  41. Heike Y, Hosokawa M, Osumi S, Fujii D, Aogi K, Takigawa N, Ida M, Tajiri H, Eguchi K, Shiwa M, Wakatabe R, Arikuni H, Takaue Y, Takashima S: Identification of serum proteins related to adverse effects induced by docetaxel infusion from protein expression profiles of serum using SELDI ProteinChip system. Anticancer Res. 2005, 25: 1197-1203.

  42. Becker S, Cazares LH, Watson P, Lynch H, Semmes OJ, Drake RR, Laronga C: Surfaced-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) differentiation of serum protein profiles of BRCA-1 and sporadic breast cancer. Ann Surg Oncol. 2004, 11: 907-914.

  43. Pawlik TM, Fritsche H, Coombes KR, Xiao L, Krishnamurthy S, Hunt KK, Pusztai L, Chen JN, Clarke CH, Arun B, Hung MC, Kuerer HM: Significant differences in nipple aspirate fluid protein expression between healthy women and those with breast cancer demonstrated by time-of-flight mass spectrometry. Breast Cancer Res Treat. 2005, 89: 149-157.

  44. Callesen AK, Vach W, Jørgensen PE, Cold S, Mogensen O, Kruse TA, Jensen ON, Madsen JS: Reproducibility of mass spectrometry based protein profiles for diagnosis of breast cancer across clinical studies: a systematic review. J Proteome Res. 2008, 7: 1395-1402.

  45. Karpievitch YV, Hill EG, Smolka AJ, Morris JS, Coombes KR, Baggerly KA, Almeida JS: PrepMS: TOF MS Data Graphical Preprocessing Tool. Bioinformatics. 2007, 23: 264-265.

  46. Hilario M, Kalousis A, Pellegrini C, Müller M: Processing and classification of protein mass spectra. Mass Spectrom Rev. 2006, 25: 409-449.

  47. Hastie T, Tibshirani R, Friedman JH: The Elements of Statistical Learning. 2001, Springer Verlag

  48. Schölkopf B, Smola AJ, Williamson RC, Bartlett PL: New Support Vector algorithms. Neural Comput. 2000, 12: 1207-1245.

  49. Tirumalai RS, Chan KC, Prieto DRA, Issaq HJ, Conrads TP, Veenstra TD: Characterization of the low molecular weight human serum proteome. Mol Cell Proteomics. 2003, 2: 1096-1103.

  50. Mehta AI, Ross S, Lowenthal MS, Fusaro V, Fishman DA, Petricoin EF, Liotta LA: Biomarker amplification by serum carrier protein binding. Dis Markers. 2003, 19: 1-10.

  51. Zweig MH, Campbell G: ROC plots: a fundamental evaluation tool in clinical medicine. Clin Chem. 1993, 39: 561-577.

  52. Dowsett M, Dunbier AK: Emerging biomarkers and new understanding of traditional markers in personalized therapy for breast cancer. Clin Cancer Res. 2008, 14: 8019-8026.

  53. Hayashi S, Yamaguchi Y: Estrogen signaling pathway and hormonal therapy. Breast Cancer. 2008, 15: 256-261.

  54. Payne SJ, Bowen RL, Jones JL, Wells CA: Predictive markers in breast cancer-the present. Histopathology. 2008, 52: 82-90.

  55. Giovanella L, Ceriani L, Giardina G, Bardelli D, Tanzi F, Garancini S: Serum cytokeratin fragment 21.1 (CYFRA 21.1) as tumour marker for breast cancer: comparison with carbohydrate antigen 15.3 (CA 15.3) and carcinoembryonic antigen (CEA). Clin Chem Lab Med. 2002, 40: 298-303.

  56. Nakata B, Ogawa Y, Ishikawa T, Ikeda K, Kato Y, Nishino H, Hirakawa K: Serum CYFRA 21-1 is one of the most reliable tumor markers for breast carcinoma. Cancer. 2000, 89: 1285-1290.

  57. Furger KA, Menon RK, Tuck AB, Bramwell VH, Chambers AF: The functional and clinical roles of osteopontin in cancer and metastasis. Curr Mol Med. 2001, 1: 621-632.

  58. Tuck AB, Chambers AF, Allan AL: Osteopontin overexpression in breast cancer: knowledge gained and possible implications for clinical management. J Cell Biochem. 2007, 102: 859-868.

  59. Mantzoros CS, Bolhke K, Moschos S, Cramer DW: Leptin in relation to carcinoma in situ of the breast: a study of pre-menopausal cases and controls. Int J Cancer. 1999, 80: 523-526.

  60. Garofalo C, Koda M, Cascio S, Sulkowska M, Kanczuga-Koda L, Golaszewska J, Russo A, Sulkowski S, Surmacz E: Increased expression of leptin and the leptin receptor as a marker of breast cancer progression: possible role of obesity-related stimuli. Clin Cancer Res. 2006, 12: 1447-1453.



We thank Prof. Ronald Hancock for help in preparation of the manuscript. This work was supported by the Polish Ministry of Science and Higher Education, grant 2 P05E 067 30.

Author information



Corresponding author

Correspondence to Piotr Widlak.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

MP – performed experiments, interpreted results; LM – performed experiments, interpreted results; JP – performed mathematical modeling and statistical analyses; KB – collected and interpreted clinical data; EN – collected and interpreted clinical data; AW – performed experiments; AC – performed immunoassays; RD – performed immunoassays; MS – designed and interpreted MS data, drafted manuscript; AP – designed mathematical modeling, drafted manuscript; RT – designed and interpreted clinical part of the study, drafted manuscript; PW – designed and interpreted experiment, prepared final manuscript. All authors read and approved the final manuscript.

Monika Pietrowska, Lukasz Marczak, Joanna Polanska contributed equally to this work.


Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Pietrowska, M., Marczak, L., Polanska, J. et al. Mass spectrometry-based serum proteome pattern analysis in molecular diagnostics of early stage breast cancer. J Transl Med 7, 60 (2009).



  • Breast Cancer
  • Spectral Component
  • MALDI Mass Spectrum
  • Cancer Diagnostics
  • Early Stage Breast Cancer Patient