De novo sequencing of circulating miRNAs identifies novel markers predicting clinical outcome of locally advanced breast cancer

Background MicroRNAs (miRNAs) have been recently detected in the circulation of cancer patients, where they are associated with clinical parameters. Discovery profiling of circulating small RNAs has not been reported in breast cancer (BC), and was carried out in this study to identify blood-based small RNA markers of BC clinical outcome. Methods The pre-treatment sera of 42 stage II-III locally advanced and inflammatory BC patients who received neoadjuvant chemotherapy (NCT) followed by surgical tumor resection were analyzed for marker identification by deep sequencing all circulating small RNAs. An independent validation cohort of 26 stage II-III BC patients was used to assess the power of identified miRNA markers. Results More than 800 miRNA species were detected in the circulation, and observed patterns showed association with histopathological profiles of BC. Groups of circulating miRNAs differentially associated with ER/PR/HER2 status and inflammatory BC were identified. The relative levels of selected miRNAs measured by PCR showed consistency with their abundance determined by deep sequencing. Two circulating miRNAs, miR-375 and miR-122, exhibited strong correlations with clinical outcomes, including NCT response and relapse with metastatic disease. In the validation cohort, higher levels of circulating miR-122 specifically predicted metastatic recurrence in stage II-III BC patients. Conclusions Our study indicates that certain miRNAs can serve as potential blood-based biomarkers for NCT response, and that miR-122 prevalence in the circulation predicts BC metastasis in early-stage patients. These results may allow optimized chemotherapy treatments and preventive anti-metastasis interventions in future clinical applications.


Background
Breast cancer (BC) is the most common cancer among females and a leading cause of cancer death worldwide [1]. Current clinical decision-making for BC treatment relies on tumor characteristics and therapeutic targets including the estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2). Chemotherapy is given in the neoadjuvant and adjuvant settings to patients with locally advanced/high-risk primary BC as treatment for metastatic BC leading to life-threatening parenchymal disease, or to treat endocrine resistant (ER/PR-negative) metastatic BC [2,3]. Neoadjuvant chemotherapy (NCT) is increasingly being used for initial treatment of locally advanced and inflammatory BCs. Pathologic complete response (pCR), defined as the preoperative eradication of tumors from the breast and axillary lymph nodes [4], is associated with optimal clinical outcome, including improved disease-free and overall survival [5,6]. However, conventional NCT regimens result in pCR in only 10-30% of treated BC patients [6]. In patients with residual invasive disease after NCT, a substantial proportion experience disease progression to metastatic stage within a few years after surgical resection. Patients with both de novo and recurrent metastatic BC face poor prognosis, with a median survival of 1-2 years and a 5-year survival rate < 20% [7][8][9]. Development of early diagnostic markers capable of predicting a patient's response to therapy and recurrence with metastatic BC is therefore critical to advancing more effective, personalized treatment. In this study, we explored the use of circulating miRNAs as blood-based, minimally invasive biomarkers for NCT response and disease relapse in locally advanced and inflammatory BC patients.
MiRNAs are naturally occurring, non-coding small RNA molecules of 21-24 nucleotides (nts) that form partially complementary base pairs within the 3' untranslated regions of protein-encoding mRNAs, resulting in mRNA destabilization and/or translational inhibition [10]. To date, approximately 1000 miRNAs have been identified in humans. Compared to the large number of mRNA genes (~30,000 mRNAs per cell), this relatively small number of miRNAs allows for large-scale evaluation for individualized diagnosis and treatment with higher efficiency and lower cost. Increasingly, reports demonstrate applications of miRNAs as tissue-based markers for the classification and prognosis of several human cancers, including BC [11][12][13][14]. A number of miRNAs have been found differentially expressed between BC and normal tissues, with significant up-regulation (e.g., miR-21 and miR-155) or down-regulation (e.g., miR-10b and miR-145) in cancerous tissues [12,15,16]. Expression of certain BC-related miRNAs has been correlated with specific biopathologic features, such as ER and PR expression, tumor stage, vascular invasion, and proliferation index [12,14,15,16,17,18].
MiRNAs are stably present in whole blood, serum, and plasma [19,20]. Therefore, assessment of circulating miRNA profiles from cancer patients, allowing for correlations between tumor traits (e.g., treatment response and metastatic potential) and cancer-released miRNAs, is of great clinical interest. Using PCR assessments of selected miRNAs that are reportedly dysregulated in BC, several recent studies indicate associations of different circulating miRNAs with primary BC, metastatic disease, and ER status [21][22][23][24][25]. Microarray-based profiling has also been carried out in pilot studies, in which certain circulating miRNAs exhibit a promising role as BC diagnostic markers [26,27]. Results from these previous studies, however, share limited consistency, possibly due to the different sensitivity and specificity of the detection approaches, as well as different patient numbers and compositions in the study cohorts.
Because some miRNAs may exclusively exist in the circulation with low or undetectable levels in cancer cells, powerful discovery approaches, such as deep sequencing analysis, may be more likely to identify diagnostic miRNA markers in the blood. Accordingly, we set out to explore the potential of deep sequencing in the current study to comprehensively analyze miRNAs that can serve as blood-based markers for NCT response and relapse with metastasis. As the first exploration to profile circulating miRNAs in BC patients using a de novo sequencing strategy, our study revealed global patterns of circulating miRNAs, and led to the identification of miRNA markers that can predict clinical outcome in primary stage II-III BC.

Study cohort and validation cohort
The clinical and histopathological factors of the study (training) and validation (testing) cohorts are summarized in Additional file 1: Table S1. Patients at City of Hope Medical Center (COH) had given informed consent before blood sera were collected under Institutional Review Board (IRB)-approved protocols, aliquoted and stored at -80°C until use.
Serum specimens of the training cohort were obtained as part of an NCT clinical trial conducted at COH. Specimens from 42 female stage II-III BC patients deemed appropriate for NCT at the time of diagnosis were collected between December 2005 and April 2009. Upon diagnosis, all patients received conventional chemotherapy lasting 4-6 months followed by surgical resection of the tumor. Among the 42 patients, 7 received the docetaxel/doxorubicin/cyclophosphamide (TAC) treatment regimen (group A), and 12 received doxorubicin/cyclophosphamide (AC) treatment followed by carboplatin and nab-paclitaxel (group B). The other 23 patients had HER2+ BC, and were given the same regimen as for group B but with the addition of trastuzumab (group C). One of the HER2+ patients was likely stage IV based on the presence of pleural effusion at diagnosis, which was later documented to be malignant. Biopsies of the primary tumor were analyzed for pathological classification. Upon completion of NCT, patients with a Symmans residual cancer burden (RCB) score [4] of 0 were defined as having a pathologic complete response (pCR), whereas those with an RCB score of ≥ 1 were defined as non-pCR. In the training cohort, 11 patients from all 3 treatment groups relapsed with metastatic disease within 1.5 years after serum collection, whereas the other 31 patients have not had disease progression to date during 2-5 years of follow-up. Patients with and without metastatic progression were balanced for age, tumor subtype, sample collection time and treatment regimen.
For the testing cohort, patients were selected from the COH Circulating Breast Cancer Tumor Marker Registry, an observational cohort study that recruits women with a variety of breast tumor histologies at the time of diagnosis, collects pretreatment biospecimens, and follows them throughout their clinical course, recording patient and tumor characteristics, treatments delivered, and clinical outcomes. Patients who had stage II-III BC at the time of registration and who developed systemic recurrence while on study were defined as cases (N = 9). Eight of these had sufficient serum RNA for inclusion in the study. Controls were matched 2:1 (N = 18) from the remaining stage II-III BC patients who had not developed systemic relapse on study and who had been followed at least as long as the matched case. All controls had sufficient serum RNA for the study. Other matching characteristics included age at diagnosis, hormone receptor and HER2 expression, and lymph node involvement. These patients were recruited onto the parent study between July 2006 and May 2010, a similar era to the training cohort; however, as an observational cohort, their systemic therapy regimen was determined by their primary oncologist. Half of the patients received a similar regimen to the training cohort, including a taxane with either doxorubicin, cyclophosphamide, and/or carboplatin, plus trastuzumab if their tumor was HER2+. The remaining individuals received a hormonal regimen for ER+ HER2- BC. All therapies were delivered in the adjuvant rather than neoadjuvant setting; therefore, metastatic relapse was the only measurable clinical outcome in this group. Mean follow-up for the testing cohort was 5.8 years.

RNA purification from serum
TRIZOL LS reagent (Invitrogen) was used to extract total RNA from ~1.5 ml of serum, as described in the manufacturer's protocol. The RNA pellet was dissolved in 10 μl of RNase-free water, and subjected to deep sequencing and qRT-PCR as described below.

Solexa deep sequencing for small RNAs
Each serum sample was independently subjected to library preparation and deep sequencing. All small RNAs of 15-52 nts were selected and sequenced using the Solexa system, following the manufacturer's protocol (Illumina Inc., San Diego, CA). Library preparation, as well as cluster generation and deep sequencing, was performed according to the 5' ligation-dependent (5' monophosphate-dependent) manufacturer's protocol (Digital Gene Expression for small RNA; Illumina). For each sample, 5 μl of total RNA extracted from serum was used for small RNA library preparation. Small RNAs were size-selected between 17 and 52 nt according to the single-stranded DNA marker in the small RNA sequencing kit (Illumina). The library was quantified using picoGreen and qPCR. Sequencing was performed on a Genome Analyzer IIx (Illumina), and image processing and base calling were conducted using Illumina's pipeline.

MiRNA-directed and genome-wide interrogation of identified sequences
Sequenced reads from Solexa were first mapped onto human genome version hg18 using Novoalign software, and the expression level of mature miRNAs in the miRBase human miRNA database v15 was summarized as described before [28]. Normalization and identification of differentially expressed miRNAs between groups were carried out using the Bioconductor package "edgeR" [29].

Reverse transcription (RT) and real time quantitative PCR (qPCR)
For qRT-PCR assays, 5 μl of total RNA extracted from serum was used as input into the RT reaction. RT was carried out using the miScript Reverse Transcription Kit (Qiagen) according to the manufacturer's protocol. For qPCR amplification, RT product was combined with PCR assay reagents containing miScript Primer Assay, Universal Primer, and SYBR Green PCR Master Mix (Qiagen). Real-time qPCR was carried out on a BioRad iQ5 thermocycler.

Statistical analyses
Sequence data analysis and statistical comparisons were carried out using Bioconductor packages and an in-house analysis pipeline developed in the R statistical environment. After mapping the deep sequencing data onto the human genome and counting the reads for the mature miRNAs in the miRBase database, raw miRNA expression data were quantile normalized and log2 transformed with an offset of 1. Predictive miRNA classifiers for clinical outcome (the miR-375/miR-122 two-gene signature, and each gene individually) in the NCT training cohort were evaluated using univariate logistic regression and leave-one-out cross-validation. Briefly, one sample was left out as the test sample, and the remaining 31 samples served as the training set and were used to train a univariate logistic regression model using the two-gene signature, which was then used to predict the status of the left-out sample. If the predicted probability from the logistic regression model was greater than a cutoff determined by minimizing the prediction error rate of the training samples, the predicted status of that sample was assigned as "relapsed", and as "nonrelapsed" otherwise. The predicted classification was then compared to the observed relapse status using 2-by-2 tables, from which sensitivity and specificity were calculated. This procedure was then repeated for each of the two single-gene markers. To evaluate the performance of the two-gene signature in predicting the independent validation cohort, the entire NCT cohort (32 samples with RT-PCR data) was used as a training set. Odds ratios and 95% confidence intervals were calculated using univariate unconditional logistic regression to determine whether the histopathological parameters and circulating miRNAs were associated with NCT response.
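The normalization step described above (quantile normalization followed by a log2 transform with an offset of 1) can be illustrated with a minimal sketch. This is not the authors' R pipeline: the function names, the toy count matrix, and the tie handling (ties broken by row order rather than averaged) are assumptions made for illustration only.

```python
import math

def quantile_normalize(matrix):
    """Quantile-normalize the columns (samples) of a rows-by-columns matrix.

    Rows are miRNAs, columns are samples. Ties are broken by row order,
    which is a simplification of what dedicated packages do.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # For each sample, the row indices ordered from lowest to highest count.
    order = [sorted(range(n_rows), key=lambda r: matrix[r][c])
             for c in range(n_cols)]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(matrix[order[c][k]][c] for c in range(n_cols)) / n_cols
                 for k in range(n_rows)]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        for k, r in enumerate(order[c]):
            normalized[r][c] = reference[k]
    return normalized

def log2_offset(matrix, offset=1.0):
    """Apply log2(value + offset) to every entry."""
    return [[math.log2(v + offset) for v in row] for row in matrix]

# Hypothetical raw counts: 4 miRNAs (rows) x 3 samples (columns).
raw_counts = [
    [5, 4, 3],
    [2, 1, 4],
    [3, 4, 6],
    [4, 2, 8],
]
normalized = quantile_normalize(raw_counts)
log_expr = log2_offset(normalized)
```

After this step every sample shares the same set of values, so differences between samples reflect only the rank of each miRNA within its sample.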

Study design, deep sequencing and annotation strategy
To comprehensively profile all small RNA species in the circulation, we isolated total RNA from ~1.5 ml serum collected from BC patients at initial diagnosis (prior to NCT), and selected small RNAs of 15-52 nts for library preparation and deep sequencing. The profiling study involved 42 stage II-III BC patients who participated in a clinical trial comparing docetaxel, doxorubicin, cyclophosphamide versus doxorubicin and cyclophosphamide, followed by nab-paclitaxel and carboplatin. All patients signed voluntary informed, IRB-approved consent forms, and were treated with NCT at the City of Hope Medical Center (ClinicalTrials.gov; NCT00295893). Among them, 11 relapsed with metastatic disease progression to stage IV during the follow-up, and the other 31 nonprogressive control cases had matched ages, tumor subtypes and follow-up time, thus comprising the study cohort for metastatic relapse. The 23 HER2+ patients received the trastuzumab-containing NCT regimen, upon which comparable numbers of pCR (12 cases; 52%) and non-pCR (11 cases; 48%) were observed. Correlation of miRNAs to NCT response and metastatic relapse was examined (see Methods and Additional file 1: Table S1 for details).

Identification of circulating miRNAs associated with clinical parameters
MiRNAs were deemed detectable if seen in at least 2 patients. This resulted in detection of more than 800 miRNAs, including 373 miRNAs with sequence counts of > 50 in at least 10% of the samples (Additional file 2: Table S2). Unsupervised hierarchical clustering of these miRNAs detected in the circulation separated ER+ from ER- cases (Figure 1A). No perfect separation was observed between relapsed and nonrelapsed cases at the global level. The top 100 circulating miRNAs with the highest average counts among all tested patients are indicated in Figure 1B (see Additional file 2: Table S2 for a full list of all detected miRNAs). These include several miRNAs that had been measured by PCR in previous studies, such as miR-21, let-7a, miR-155, and miR-10b, as well as miR-375 and miR-122, the two genes we subsequently identified as outcome-associated genes. We next analyzed the association of each miRNA with a given clinical parameter, i.e., status of relapse, NCT response, ER/PR/HER2 expression or inflammatory disease. Multivariate comparison, taking into account the different NCT regimens the patients received, led to the identification of two miRNAs, miR-375 and miR-122, that were differentially expressed between patients who later developed metastatic relapse and those who did not, with P < 0.005 and false discovery rate (FDR) < 0.1 (Table 1). Because only 2 HER2- patients in the whole 42-patient cohort had pCR to NCT, the comparison of treatment response was only carried out among HER2+ patients, all of whom received the same trastuzumab-containing NCT regimen. We identified 7 miRNAs that were associated with NCT response (pCR vs. non-pCR) in HER2+ patients with FDR < 0.1 (Table 1). Of note, miR-375 was identified in both analyses as the most significantly different miRNA, whose higher abundance in the circulation appeared to reflect better clinical outcome, i.e., pCR to NCT and absence of relapse.
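The two count-based filters mentioned at the start of this section (detectable in at least 2 patients; counts of > 50 in at least 10% of samples) can be sketched as follows. The sketch assumes that "seen" means a read count of at least 1, which the text does not state explicitly; the miRNA names and counts are hypothetical.

```python
def is_detectable(counts, min_patients=2):
    """True if the miRNA has at least one read in min_patients or more samples.

    Treating 'seen' as count > 0 is an assumption made for this sketch.
    """
    return sum(1 for c in counts if c > 0) >= min_patients

def is_abundant(counts, min_count=50, min_fraction=0.10):
    """True if the count exceeds min_count in at least min_fraction of samples."""
    n_above = sum(1 for c in counts if c > min_count)
    return n_above >= min_fraction * len(counts)

# Hypothetical per-sample read counts for three made-up miRNAs (10 samples).
toy_counts = {
    "miR-A": [0, 0, 0, 0, 0, 0, 0, 0, 0, 7],
    "miR-B": [3, 0, 0, 12, 0, 0, 0, 0, 0, 0],
    "miR-C": [120, 4, 0, 9, 15, 0, 2, 1, 0, 30],
}
detectable = [name for name, c in toy_counts.items() if is_detectable(c)]
abundant = [name for name in detectable if is_abundant(toy_counts[name])]
```

In this toy example, miR-A is dropped because it appears in a single sample, while only miR-C passes the stricter abundance filter.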
In addition to the clinical outcome, miRNAs associated with the biopathological characteristics of primary BC were also identified. A partial list (FDR < 0.05) of miRNAs significantly correlated with the status of ER, PR, HER2 and inflammatory BC is indicated in Table 2 (see Additional file 3: Table S3 for a full list of all miRNAs with P < 0.05). Interestingly, higher levels of circulating miR-375 were linked to negative ER/PR status, positive HER2 status, and inflammatory BC, whereas higher levels of circulating miR-122 were associated with HER2-negative and non-inflammatory tumors (Table 2 and Additional file 3: Table S3).
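The FDR thresholds used in these comparisons depend on a multiple-testing adjustment of the per-miRNA P values. The text does not name the adjustment method; the sketch below assumes the Benjamini-Hochberg procedure, which is a common choice for this kind of analysis. The P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P values for five miRNAs.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
fdr = benjamini_hochberg(raw_p)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
```

With these toy values, only the first two miRNAs would pass an FDR < 0.05 cutoff, even though four of the raw P values are below 0.05.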

PCR validation of selected miRNAs
We next selected several miRNAs, including miR-375, miR-122, miR-184, miR-196a, miR-1, miR-410, miR-432, and miR-16, for qRT-PCR-based validation using the same total RNA extracts used for deep sequencing (32 out of the 42 sequenced samples had sufficient amounts left for PCR assays). The sequencing-determined abundance of these miRNAs ranged from very high (e.g., miR-122, ranged from 4,683-1,094,999 counts with an average of 16,456) to very low (e.g., miR-184, ranged from 6-1,097 counts with an average of 61, and miR-410, ranged from 1-1,620 counts with an average of 220). For PCR analyses, levels of miR-16, which were consistent among all samples (Additional file 2: Table S2) and reportedly used as the internal control for circulating miRNAs in previous PCR-based studies [22,24], were used as the reference for data normalization. Our results indicated that gene-specific PCR could detect miRNA with as few as 20 counts in a sample (data not shown), and the low-abundance miR-184 and miR-410 could be detected in ~90% of all tested samples (29 out of 32). Pairwise Pearson correlation was calculated to determine the consistency of the miRNA levels determined by the two methods (normalized log2 counts from deep sequencing and PCR-determined levels relative to miR-16) in each sample. For all tested miRNAs, significant (P < 0.05) correlations were observed between the two methods (Additional file 1: Figure S2). We further focused on miR-375 and miR-122, two miRNAs with the most significant fold differences in the associations with metastatic relapse and NCT response (Table 1). Consistent with the deep sequencing results, lower levels of miR-375 and higher levels of miR-122 detected by PCR both significantly correlated with disease relapse in all patients and with resistance to NCT (non-pCR) in HER2+ patients (Figure 2 and Additional file 1: Figure S3).
These data indicated the feasibility of cost-efficient PCR assays of these two genes to potentially predict clinical outcome of locally advanced BCs. Levels of other miRNAs examined by PCR did not show significant differences in the comparisons for disease relapse and NCT response (data not shown).
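The comparison between PCR and sequencing in this section can be sketched as two steps: expressing each miRNA relative to miR-16, then computing a Pearson correlation against the normalized log2 counts. The exact PCR formula is not given in the text; the sketch assumes the common 2^-ΔCt form with roughly 100% amplification efficiency, and all Ct values and counts below are hypothetical.

```python
import math

def relative_level(ct_target, ct_reference):
    """Level of a target miRNA relative to the reference (here miR-16).

    Assumes the 2^-(delta Ct) form, i.e. ~100% PCR efficiency.
    """
    return 2.0 ** -(ct_target - ct_reference)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for one miRNA across four samples.
log2_counts = [6.0, 8.0, 10.0, 12.0]   # from sequencing, after normalization
ct_mirna = [30.0, 28.0, 26.0, 24.0]    # qPCR Ct of the target miRNA
ct_mir16 = [20.0, 20.0, 20.0, 20.0]    # qPCR Ct of the miR-16 reference

log2_pcr = [math.log2(relative_level(t, r))
            for t, r in zip(ct_mirna, ct_mir16)]
r = pearson_r(log2_counts, log2_pcr)
```

Comparing on the log2 scale keeps both measurements linear in cycle number; whether the original analysis log-transformed the PCR values is not stated, so that choice is also an assumption.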

Prediction of relapse by circulating miR-375 and miR-122
We next evaluated the sensitivity and specificity of the two circulating miRNA markers we identified, i.e., miR-375 and miR-122, in predicting metastatic relapse in our NCT study cohort using leave-one-out cross-validation (see Methods for details). Results indicated that both circulating miR-375 and miR-122 could predict metastasis with relatively high sensitivity and specificity, and the miR-375/miR-122 two-gene signature demonstrated the best predictive performance, with a sensitivity of 80% and specificity of 100% (Table 3).
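The leave-one-out procedure can be sketched in a few steps: fit a univariate logistic regression on all samples but one, choose the probability cutoff that minimizes training error, classify the held-out sample, and tabulate sensitivity and specificity. The sketch below is an illustration, not the authors' R code: the model is fitted by plain gradient ascent, the cutoff candidates are midpoints between fitted probabilities, and the scores are hypothetical. How the two-gene signature is combined into a single predictor is not specified in the text, so the function simply takes one score per sample.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.1, n_iter=2000):
    """Fit p = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def choose_cutoff(probs, ys):
    """Cutoff with the fewest training misclassifications (first one on ties)."""
    uniq = sorted(set(probs))
    candidates = [(a + b) / 2 for a, b in zip(uniq, uniq[1:])] or [0.5]

    def n_errors(cutoff):
        return sum((p > cutoff) != (y == 1) for p, y in zip(probs, ys))

    return min(candidates, key=n_errors)

def loocv_predict(xs, ys):
    """Predict each sample from a model trained on all the other samples."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        b0, b1 = fit_logistic(train_x, train_y)
        train_p = [sigmoid(b0 + b1 * x) for x in train_x]
        cutoff = choose_cutoff(train_p, train_y)
        preds.append(1 if sigmoid(b0 + b1 * xs[i]) > cutoff else 0)
    return preds

def sensitivity_specificity(preds, ys):
    """Sensitivity and specificity from predicted and observed labels (1 = relapsed)."""
    tp = sum(1 for p, y in zip(preds, ys) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, ys) if p == 0 and y == 0)
    n_pos = sum(ys)
    return tp / n_pos, tn / (len(ys) - n_pos)

# Hypothetical, well-separated marker scores (1 = relapsed, 0 = nonrelapsed).
scores = [-2.0, -1.5, -1.2, -1.0, -0.8, 0.9, 1.1, 1.4, 1.8, 2.2]
relapsed = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
predictions = loocv_predict(scores, relapsed)
sens, spec = sensitivity_specificity(predictions, relapsed)
```

Because the cutoff is re-estimated inside every fold, the held-out sample never influences its own classification, which is the point of the cross-validation design.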

Circulating miR-122 predicts metastasis in an independent cohort of early-stage BCs
We further assembled an independent validation cohort of 26 stage II-III BC patients, including 8 patients with metastatic recurrence within 2 years after initial diagnosis, as well as appropriate controls with matched clinical parameters but without disease recurrence (Additional file 1: Table S1). Serum RNA was isolated and subjected to qRT-PCR to detect levels of miR-375, miR-122 and miR-16. Upon normalization to miR-16, circulating miR-122 levels were significantly higher in the group with relapse (P = 0.0294), and could predict metastasis at a sensitivity of 88% and specificity of 78% in this cohort (Figure 3 and Table 3). Levels of miR-375, however, were not significantly different between the relapsed and non-relapsed groups (Table 3 and data not shown), possibly due to the fact that patients in the validation cohort were generally lower stage at diagnosis, with more hormone receptor positive disease, and more frequently overexpressed HER2 (Additional file 1: Table S1). In addition, they received diverse therapies seen in a general oncology practice. Because the status of ER/PR and HER2 in primary BCs has been historically linked to clinical outcomes [30,31], we also computed the sensitivity and specificity of each histopathological parameter in association with development of metastatic relapse in the testing cohort. Results indicated that, in comparison to the histopathological parameters, circulating miR-122 served as a better predictor of metastasis, regardless of the heterogeneity of this cohort (Odds Ratio 24.5; P < 0.01; Additional file 1: Table S4).
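As a worked check of how sensitivity, specificity and the odds ratio relate, the sketch below computes all three from a 2-by-2 table. The cell counts are not given in the text: they are inferred here from the reported percentages and cohort sizes (8 relapsed, 18 non-relapsed), and the actual counts behind Table 3 and Additional file 1: Table S4 may differ. The confidence interval uses the log-based (Woolf) approximation rather than the logistic regression described in the Methods.

```python
import math

def two_by_two_summary(tp, fn, fp, tn):
    """Sensitivity, specificity, odds ratio and an approximate 95% CI.

    The CI uses the Woolf (log odds ratio) approximation, which is an
    assumption of this sketch, not the method stated in the paper.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    odds_ratio = (tp * tn) / (fn * fp)
    se_log_or = math.sqrt(1 / tp + 1 / fn + 1 / fp + 1 / tn)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return sensitivity, specificity, odds_ratio, (ci_low, ci_high)

# Inferred (not reported) counts: 7 of 8 relapsed patients called positive,
# 14 of 18 non-relapsed patients called negative.
sens, spec, odds_ratio, ci = two_by_two_summary(tp=7, fn=1, fp=4, tn=14)
```

With these inferred counts, sensitivity is 7/8 and specificity is 14/18, which round to the reported 88% and 78%, and the odds ratio is (7 × 14)/(1 × 4) = 24.5, matching the reported value. The wide interval produced by such small cell counts is one reason validation in a larger cohort matters.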

Discussion
Using a comprehensive de novo sequencing approach, we identified sets of circulating miRNAs that were associated with various clinicopathological parameters and clinical outcome in stage II-III BC patients. Previous studies have linked higher circulating levels of miR-10b and miR-21 to negative ER status [22], and higher circulating levels of miR-155 to positive PR status [25]. None of these miRNAs exhibited correlations with ER/PR status in our study, possibly due to differences in the size and composition of the study cohorts and/or in treatment regimens. We observed that two miRNA clusters, miR-200b-200a-429 and miR-200c-141, were significantly associated with negative ER status and inflammatory BC (Table 2 and Additional file 3: Table S3). In addition, we found that several miRNAs, including miR-375, miR-429, miR-196a, miR-370, miR-125a-5p, and miR-224, simultaneously correlated with both ER and PR status in the same direction (Table 2). Among these miRNAs, miR-375 and miR-429 also correlated with HER2 status but in the opposite direction as compared to their correlations with ER/PR status (Table 2). Expression of these miRNAs in primary BCs and their functional links with ER/PR/HER2 merit additional investigation, and may further elucidate the pathogenic mechanisms of these long-known receptors in BC. We also identified a two-gene signature consisting of miR-375 and miR-122 with the capacity to predict disease relapse in our study cohort of stage II-III BC patients who received identical NCT regimens. In a heterogeneous validation set derived from an observational cohort, circulating miR-122, but not miR-375, remained a predictor of metastasis (Figure 3). Interestingly, both miR-375 and miR-122 correlated with HER2 status (Table 2), with higher levels of miR-375 and lower levels of miR-122 associating with positive HER2 status (Table 2), pCR to NCT (Figure 2B), and absence of relapse (Figure 2A).
Consistent with previous observations that HER2+ BCs have higher rates of pCR to NCT [30], these results raise interesting questions to be addressed in future studies. For example, the origin of circulating miRNA is still unclear. It has been proposed that tumor-associated miRNAs can be released into the bloodstream when tumor cells are dying and being lysed [22,32], or through active secretion of miRNA-loaded exosomes by tumor cells [20]. Furthermore, the cellular source of circulating miR-375 and miR-122 remains unknown; their presence may reflect expression in the primary tumor or in other cell types, such as immune cells.
MiR-122, the circulating miRNA that consistently predicted metastasis in both our study cohort and validation cohort, is the most frequent miRNA isolated in the liver and is involved in the regulation of lipid metabolism [33]. Downregulation of miR-122 has been reported in hepatocarcinoma (HCC) [34]. In contrast, higher levels of miR-122 in circulation correlate with HCC [35] and liver injury [36]. Expression of miR-122 has also been reported in primary fibroblasts, where the miRNA is involved in p53 mRNA polyadenylation/translation by targeting the cytoplasmic polyadenylation element binding protein (CPEB) [37]. Expression and function of miR-122 have not yet been reported in BC. Our results here, however, strongly suggest a role of miR-122 in BC progression, an area of study currently under investigation in our laboratory.
Roth et al. report that circulating levels of miR-10b, miR-34a and miR-155 correlate with the presence of overt metastases in BC patients [24]. These miRNAs, however, did not significantly correlate with metastatic relapse in our retrospective study in stage II-III patients. It is possible that the change of circulating miR-10b, miR-34a and miR-155 levels occurs after cancer dissemination, whereas levels of circulating miR-122 and miR-375, as identified herein, start to change and reflect metastatic potential at an earlier stage of disease. Validation using a larger patient cohort will be necessary in future studies. Findings will allow us to further refine a circulating miRNA signature that can predict metastasis and guide individualized adjuvant and neoadjuvant therapy to minimize risk of systemic relapse.