- Research
- Open access
Identification of potential biomarkers to differentially diagnose solid pseudopapillary tumors and pancreatic malignancies via a gene regulatory network
Journal of Translational Medicine volume 13, Article number: 361 (2015)
Abstract
Background
Solid pseudopapillary neoplasms (SPN) are pancreatic tumors with low malignant potential and a good prognosis. However, the differential diagnosis between SPN and pancreatic malignancies, including pancreatic neuroendocrine tumor (PanNET) and pancreatic ductal adenocarcinoma (PDAC), is difficult. This study aimed to identify candidate biomarkers for distinguishing SPN from these two malignant pancreatic tumors by examining the gene regulatory network of SPN.
Methods
The gene regulatory network for SPN was constructed with a co-expression model. Genes previously reported to be associated with SPN were used as seeds to find additional SPN-related genes in the network by a shortest-path approach. Gene sets that distinguish SPN from malignant pancreatic tumors were then determined with a K-nearest neighbor (KNN) classifier evaluated by the jackknife test.
Results
We took a new strategy to identify candidate biomarkers for differentiating SPN from the two malignant pancreatic tumors, PanNET and PDAC, by analyzing the shortest paths among SPN-related genes in the gene regulatory network. Forty-three new SPN-relevant genes were discovered. Among them, hsa-miR-194 and hsa-miR-7, together with 7 transcription factors (TFs) such as SOX11, SMAD3 and SOX4, correctly differentiated SPN from PanNET, while hsa-miR-204 and 4 TFs such as SOX9, TCF7 and PPARD were identified as potential markers for SPN versus PDAC. When PanNET and PDAC were considered together as malignant pancreatic tumors, 14 genes were identified as candidate biomarkers for distinguishing them from SPN.
Conclusion
This study provides new candidate genes related to SPN and potential biomarkers to differentiate SPN from PanNET and PDAC, which may help to diagnose patients with SPN in the clinical setting. Furthermore, candidate biomarkers such as SOX11 and hsa-miR-204, which may promote cell proliferation but inhibit invasion or metastasis, may be important for understanding the molecular mechanism of pancreatic oncogenesis and could be possible therapeutic targets for malignant pancreatic tumors.
Background
Solid pseudopapillary neoplasms (SPN) [1] are uncommon tumors that account for 0.2–2.7 % of all pancreatic tumors and are predominantly seen in young female patients for as-yet-unknown reasons [2]. Because SPNs occur predominantly in young women, several authors have studied sex hormone receptors, but no evidence has been found that estrogen receptors play a role in the pathogenesis of the tumor [3]. Most patients are asymptomatic at diagnosis, and abdominal pain is the most common symptom [3]. SPN shows low-grade malignancy, and local surgical excision usually achieves a cure rate greater than 95 % [4, 5]. It is important to distinguish SPN from pancreatic neuroendocrine tumor (PanNET) and pancreatic ductal adenocarcinoma (PDAC) because they are treated differently: PanNET is usually treated with surgery, hormone therapy, radiation therapy, chemotherapy, or a combination of these. Likewise, fewer than 20 % of PDAC patients are candidates for surgery, so gemcitabine, alone or in combination with other chemotherapy agents, is the main therapeutic measure for PDAC patients [6]. Preoperative diagnosis of SPN can minimize unnecessary treatment compared with that required for more malignant pancreatic lesions [7, 8]. However, correct diagnosis of SPN is challenging because many of its features resemble those of other pancreatic malignancies. For example, SPNs are most commonly confused with PanNETs, which, like SPNs, can occur in the tail, body and head of the pancreas. The difficulty lies in the histological features the two tumors share, including small- to medium-sized uniform cells with scanty cytoplasm, indiscernible nucleoli, hyaline globules, and numerous small blood vessels with hyalinized walls [9]; morphologically, both show monomorphous growth and rosette-like structures [7, 10]. Moreover, both can express some neuroendocrine markers, such as CDH1, MME, VIM and CD56 [11].
PDAC, which occurs mainly in the pancreatic head (67 %), with the remaining 33.3 % occurring in the body, is another pancreatic malignancy whose radiological features [7, 12] and immunophenotypes [13] are similar to those of SPN.
Recent efforts have been devoted to distinguishing SPN from other pancreatic tumors at the molecular level, and several SPN-related transcription factors (TFs) and other protein-coding genes have been identified. For example, nuclear accumulation of CTNNB1 and loss of CDH1 were found to be characteristic features of SPN, so immunohistochemical staining of the two proteins could be useful for differentiating SPN from PanNET and PDAC [14, 15]. However, the aberrant behavior of CTNNB1 and CDH1 seen in SPN was also observed in some PanNET cases [15, 16]. Similarly, nuclear staining of CTNNB1 and reduced staining of CDH1 can also occur in some patients with PDAC [13, 17]. Although nearly 30 genes have been reported to be SPN-related, no gene has yet served as a gold standard to distinguish SPN from malignant tumors effectively in the clinical setting.
MicroRNAs (miRNAs) have emerged in recent years as a new type of potential biomarker and therapeutic target for diseases [18]. For example, miR-10b was proposed as a good diagnostic biomarker for PDAC because it is overexpressed in the cancer cells [19]. Similarly, upregulation of miR-21 and downregulation of both miR-148a and miR-375 were observed in PDAC relative to adjacent normal tissue, and that study therefore proposed these miRNAs as biomarkers for detecting pancreatic cancer [20]. These small genome-encoded RNAs are about 21 nt long; they negatively regulate gene expression by binding to the 3′ UTR of target mRNAs and are involved in diverse biological processes, such as differentiation, proliferation and apoptosis [21]. Several cancers, including colon cancer [22], aggressive B cell lymphoma [23], gastric cancer and invasive endometrial cancer [24], have been reported to be associated with specific miRNAs. Park et al. [25] were the first to integrate mRNA and miRNA expression profiles to study the pathogenesis of SPN. Both mRNAs and miRNAs were shown to be differentially expressed between SPN and PanNET/PDAC. In addition, they found that SPN was characterized by three activated pathways (Wnt/β-catenin, Hedgehog and androgen receptor signaling), and target prediction linked 17 differentially expressed miRNAs closely to these pathways. However, that work was mainly concerned with the correlation between miRNAs and the pathogenesis of SPN; it did not address whether miRNAs could serve as biomarkers for discriminating SPN from the two other pancreatic tumors mentioned above.
Gene regulation is a biological process in which TFs, miRNAs and their target genes are intertwined, and it is vital for controlling gene expression. An abnormal state of certain regulators may affect downstream regulatory events and thus lead to aberrant cell behavior. For such a complex system, networks have been used successfully as a universal framework for modeling the biological process when searching for genes that contribute to the pathogenesis of cancers [26, 27].
In this study, we aimed to discover high-quality candidate genes (protein-coding genes and miRNAs) that could distinguish SPN from other malignant pancreatic tumors with the help of a gene regulatory network (GRN). The GRN defined here contains three kinds of nodes: TFs, non-TF genes and miRNAs. TFs regulate the expression of protein-coding genes and miRNAs at the transcriptional level, whereas miRNAs normally act as negative regulators at the posttranscriptional level by base pairing with the 3′ UTR of target mRNAs, which results in cleavage of the target mRNAs or inhibition of their translation. We first constructed a GRN using a method that we previously developed and applied to hepatocellular carcinoma [28]. We then collected candidate SPN-related genes by searching for shortest paths between every pair of known SPN-related genes in the GRN, based on the idea that interacting genes tend to perform similar functions [29]. The candidate biomarkers that distinguish SPN from malignant pancreatic tumors were then selected from the set of candidate genes by applying the K-nearest neighbor algorithm (KNN) to the expression profiles of patient samples, and the predictions were evaluated by the jackknife test. Fourteen genes, including TFs and miRNAs, separated SPN from PanNET and PDAC samples well. The expression patterns of other gene sets could specifically distinguish SPN from PanNET or SPN from PDAC. These genes could therefore serve as potential biomarkers for clinical application. Because they characterize the differences between SPN and PanNET or PDAC, these potential biomarkers may also provide clues to the molecular bases of pancreatic tumorigenesis and development.
Methods
Microarray data
The mRNA and miRNA expression data were from the SPN study of Park et al. [25], which contained 14 SPN, 6 PanNET, 6 PDAC and 5 non-neoplastic pancreatic samples. We retrieved the data from the GEO database [30] under accession number GSE43797. The mRNA expression profiles were obtained with the Illumina HumanHT-12 V4.0 expression beadchip (San Diego, CA), and the miRNA data were generated with the Agilent-031181 Unrestricted_Human_miRNA_V16.0_Microarray 030840. Both the mRNA and miRNA datasets were log2-transformed and quantile-normalized using Bioconductor packages in R.
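The preprocessing step can be illustrated with a small, dependency-free Python sketch. It assumes a genes × samples matrix of strictly positive raw intensities and applies the textbook quantile-normalization rule: each value is replaced by the mean, across samples, of the values sharing its rank. The function names and toy matrix are hypothetical, and ties are broken by row order, which may differ from how Bioconductor implementations such as preprocessCore's `normalize.quantiles` handle them; the source does not state which package was used.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = probes/genes, columns = samples


def log2_transform(matrix: Matrix) -> Matrix:
    """Log2-transform raw intensities (all values must be > 0)."""
    return [[math.log2(v) for v in row] for row in matrix]


def quantile_normalize(matrix: Matrix) -> Matrix:
    """Force every sample (column) to share the same empirical distribution.

    Each value is replaced by the mean, across samples, of the values holding
    the same rank. Ties are broken by row order (a simplification).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[matrix[i][j] for i in range(n_rows)] for j in range(n_cols)]
    # For each column, row indices ordered from smallest to largest value.
    orders = [sorted(range(n_rows), key=col.__getitem__) for col in columns]
    # Mean of the r-th smallest value over all columns.
    rank_means = [
        sum(columns[j][orders[j][r]] for j in range(n_cols)) / n_cols
        for r in range(n_rows)
    ]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for r, i in enumerate(orders[j]):
            normalized[i][j] = rank_means[r]
    return normalized


# Hypothetical raw intensities: 4 probes x 3 samples.
raw = [[32.0, 16.0, 8.0], [4.0, 2.0, 16.0], [8.0, 16.0, 64.0], [16.0, 4.0, 256.0]]
print(quantile_normalize(log2_transform(raw)))
```

After normalization every column contains the same multiset of values, so between-array differences in overall intensity distribution are removed before co-expression is computed.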
Network construction
We applied a two-step integration method [28], previously reported for hepatocellular carcinoma network analysis, to construct the GRN for SPN. The procedure is briefly as follows. First, a candidate network was obtained by (1) predicting target genes of TFs and miRNAs with bioinformatics algorithms, i.e. MATCH [31] for TFs and TargetScan [32] for miRNAs; (2) obtaining experimentally validated regulations of targets by TFs from ChEA [33] and TransmiR 1.2 [34], and by miRNAs from TarBase 7.0 [35]; and (3) integrating all the predicted and experimentally validated regulations. Because this step produces considerable noise and is not specific to any tissue, a co-expression model based on the gene expression profiles was then used to select, from the candidate network, the regulatory relationships active in the tissue of interest. The Pearson correlation coefficient was calculated for each regulator–target pair, and the thresholds (cut-offs) were chosen according to how well the degree distribution of the resulting network fits a power law, i.e. so that the network is scale-free [36]. In a scale-free network, the fraction p(k) of nodes having k connections to other nodes decays as a power of k, as shown in formula 1, so a minority of nodes accounts for most of the connections. The rationale for this choice of threshold is that many studies have shown that biological networks, including protein–protein interaction networks [37], metabolic networks [38] and regulatory networks [39], are hierarchical and scale-free. The network obtained through this filtration is therefore expected to be biologically meaningful.
$$p(k) \propto k^{-\lambda} \qquad (1)$$
where λ is a parameter that typically takes a value in the range 2 < λ < 3.
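A minimal sketch of the co-expression filter described above, assuming each candidate regulation is a (regulator, target) pair and expression is stored as gene → list of log2 values over the same samples. The absolute value of r is used so that strong negative correlations (expected for miRNA repression) are retained, in line with the β = 1 − |α| weighting used for the shortest-path step; the helper names and toy data are hypothetical, not the authors' code.

```python
import math
from typing import Dict, List, Tuple

Expression = Dict[str, List[float]]  # gene -> log2 expression across the same samples


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a flat profile carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)


def coexpression_filter(
    candidates: List[Tuple[str, str]], expr: Expression, cutoff: float
) -> List[Tuple[str, str, float]]:
    """Keep candidate regulations whose regulator and target are co-expressed.

    Returns (regulator, target, r) for every candidate edge with |r| >= cutoff;
    edges involving genes without expression data are dropped.
    """
    kept = []
    for regulator, target in candidates:
        if regulator not in expr or target not in expr:
            continue
        r = pearson(expr[regulator], expr[target])
        if abs(r) >= cutoff:
            kept.append((regulator, target, r))
    return kept


# Hypothetical toy data: TF1 tracks G1, miR1 anti-tracks G1, TF1-G2 is weaker.
expr = {
    "TF1": [1.0, 2.0, 3.0, 4.0],
    "G1": [2.0, 4.0, 6.0, 8.0],
    "miR1": [4.0, 3.0, 2.0, 1.0],
    "G2": [1.0, 3.0, 2.0, 4.0],
}
candidates = [("TF1", "G1"), ("miR1", "G1"), ("TF1", "G2"), ("TF1", "NOT_MEASURED")]
print(coexpression_filter(candidates, expr, cutoff=0.9))
```

Raising `cutoff` prunes more edges; how the cut-off itself was chosen (power-law fitness versus network size) is a separate step.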
Identifying differentially expressed genes
Two statistical criteria, (1) Student’s t-test and (2) the median-ratio fold change, were combined to identify genes differentially expressed in SPN, PanNET and PDAC, each versus non-neoplastic pancreas samples, as well as in SPN versus PanNET and SPN versus PDAC. Genes with a t-test P value < 0.01 and a fold change ≥ 2 or ≤ 0.5 were considered differentially expressed.
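The two criteria can be sketched in plain Python. Assumptions: inputs are log2 expression values for two groups; the t-test is the classic two-sided Student's test with pooled variance; and the median-ratio fold change is computed from log2 data as 2^(median(x) − median(y)), which equals the ratio of group medians on the linear scale for odd group sizes and approximates it otherwise. That reading and all function names are assumptions, not the authors' code. The P value uses the standard identity p = I_{ν/(ν+t²)}(ν/2, 1/2), with a continued-fraction evaluation of the regularized incomplete beta function; in practice `scipy.stats.ttest_ind` would normally be used instead.

```python
import math
import statistics
from typing import List, Tuple


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Each iteration applies the even and then the odd continued-fraction term.
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_incomplete_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def student_t_test(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Two-sided Student's t-test with pooled variance; returns (t, p)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / df
    if pooled == 0.0:
        raise ValueError("both groups are constant; t is undefined")
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    p = reg_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, p


def median_ratio_fold_change(x_log2: List[float], y_log2: List[float]) -> float:
    """Fold change of group x over group y, computed from log2 data."""
    return 2.0 ** (statistics.median(x_log2) - statistics.median(y_log2))


def is_differentially_expressed(x_log2, y_log2, p_cut=0.01, up=2.0, down=0.5) -> bool:
    """P < p_cut (t-test) and fold change >= up or <= down, as in the text."""
    _, p = student_t_test(x_log2, y_log2)
    fc = median_ratio_fold_change(x_log2, y_log2)
    return p < p_cut and (fc >= up or fc <= down)
```

Requiring both criteria guards against genes that are statistically significant but change only slightly, and against large but noisy changes.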
Previously reported SPN genes and shortest path calculation
Based on previous immunohistochemical staining studies, we manually collected genes that have been shown to behave abnormally in SPN, using Polysearch [40], a biomedical text-mining tool for extracting relationships between human diseases and genes. After identifying the SPN genes, we calculated the shortest path between each pair of them in the GRN using Dijkstra’s algorithm [41], which finds shortest paths in networks with non-negative edge weights. In our study, the weight of each edge was derived from the Pearson correlation coefficient α between its two genes as β = 1 − |α|, so that the smaller the β, the stronger the regulation between the two genes.
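The shortest-path step can be sketched with a binary-heap implementation of Dijkstra's algorithm. This is an illustrative sketch, not the authors' code: edges are given as (gene, gene, α) triples with α the Pearson correlation, converted to β = 1 − |α| as described above. Because β is never negative, Dijkstra's algorithm applies. Whether the authors treated the GRN as directed is not stated, so the sketch defaults to an undirected graph and exposes a hypothetical `directed` flag.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # gene -> [(neighbor, beta weight)]


def build_graph(edges: List[Tuple[str, str, float]], directed: bool = False) -> Graph:
    """Adjacency list from (u, v, alpha) edges, weighting each edge by beta = 1 - |alpha|."""
    graph: Graph = {}
    for u, v, alpha in edges:
        beta = 1.0 - abs(alpha)
        graph.setdefault(u, []).append((v, beta))
        graph.setdefault(v, [])
        if not directed:
            graph[v].append((u, beta))
    return graph


def dijkstra_path(graph: Graph, source: str, target: str) -> Optional[Tuple[float, List[str]]]:
    """Return (total beta, path) of a shortest source->target path, or None if unreachable."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    settled = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue  # stale heap entry
        settled.add(node)
        if node == target:
            path = [node]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for neighbor, weight in graph.get(node, []):
            new_d = d + weight
            if new_d < dist.get(neighbor, math.inf):
                dist[neighbor] = new_d
                prev[neighbor] = node
                heapq.heappush(heap, (new_d, neighbor))
    return None


# Hypothetical correlations: the strong chain A-B-C-D beats the weaker shortcut A-C.
g = build_graph([("A", "B", 0.9), ("B", "C", 0.9), ("A", "C", 0.5), ("C", "D", -0.8)])
print(dijkstra_path(g, "A", "D"))
```

Note how the transformation turns "strongly co-expressed" into "short": the three-edge chain with |α| ≥ 0.8 has a smaller total β than the two-edge route through the weak A–C link.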
Prediction algorithm
The KNN [42, 43] is widely used in computational biology and bioinformatics because it is efficient and easy to use [44, 45]. In the KNN classifier, a query sample is assigned to the subset represented by the majority of its K nearest neighbors. In this study, the KNN classifier was used to assess how well the shortest-path genes classify the three kinds of pancreatic tumors mentioned above, and Euclidean distance between gene expression profiles was used to measure the proximity of samples. The KNN classifier contains a parameter K that can affect the prediction result; that is, different values of K may assign the query sample to different subsets. The one-dimensional method proposed by Kuo-Chen Chou [46] was used to solve this problem. In this method, all samples are classified into M subsets, where each subset Sm (m = 1, 2, …, M) consists of samples of the same attribute category and has size (number of samples) Nm. A query sample S is predicted to belong to the subset Sm for which its score in the following equation is highest.
where
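For orientation, a plain majority-vote KNN with Euclidean distance is sketched below. It does not reproduce Chou's scoring scheme for handling the choice of K, whose equation is not recoverable from this text; instead it uses a single fixed K and breaks voting ties in favor of the tied class with the closest member. Function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (expression vector, class label)


def knn_predict(training: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Majority-vote KNN with Euclidean distance and a fixed k.

    Voting ties are broken in favor of the tied class whose member is nearest.
    """
    neighbors = sorted(training, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    # neighbors are in distance order, so the first tied label found is the nearest.
    return next(label for _, label in neighbors if votes[label] == top)


# Hypothetical 2-gene profiles for two tumor classes.
training = [
    ([0.0, 0.0], "SPN"), ([0.0, 1.0], "SPN"), ([1.0, 0.0], "SPN"),
    ([5.0, 5.0], "PDAC"), ([5.0, 6.0], "PDAC"), ([6.0, 5.0], "PDAC"),
]
print(knn_predict(training, [0.2, 0.3]))
print(knn_predict(training, [5.4, 5.1]))
```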
Performance validation
In statistical prediction, three kinds of cross-validation methods are widely used to examine the effectiveness of a classifier: the independent dataset test, the sub-sampling test (such as fivefold or tenfold cross-validation), and the jackknife test (also called leave-one-out cross-validation) [47]. As demonstrated by Chou et al. [47], the jackknife test is the least arbitrary, and it is therefore widely used to examine the performance of various predictors [48–50]. The jackknife test was used to evaluate the quality of the prediction model in our study. The prediction accuracy is defined as the number of correct predictions over all classes divided by the total number of predictions, expressed as a percentage:
$$\text{Accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions}} \times 100\,\%$$
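The jackknife (leave-one-out) evaluation and the accuracy definition given above can be sketched as follows: each sample is held out in turn, classified by KNN trained on the remaining samples, and accuracy is the percentage of correct held-out predictions. A plain fixed-K majority-vote KNN stands in for Chou's K-handling scheme, and all names and data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (expression vector, class label)


def knn_predict(training: List[Sample], query: Sequence[float], k: int) -> str:
    """Fixed-k majority-vote KNN (Euclidean); ties go to the nearest tied class."""
    neighbors = sorted(training, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    return next(label for _, label in neighbors if votes[label] == top)


def jackknife_accuracy(samples: List[Sample], k: int = 3) -> float:
    """Leave-one-out accuracy in percent: correct predictions / total predictions * 100."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]  # hold out sample i
        if knn_predict(training, features, k) == label:
            correct += 1
    return 100.0 * correct / len(samples)


samples = [
    ([0.0, 0.0], "SPN"), ([0.0, 1.0], "SPN"), ([1.0, 0.0], "SPN"),
    ([5.0, 5.0], "PanNET"), ([5.0, 6.0], "PanNET"), ([6.0, 5.0], "PanNET"),
]
print(jackknife_accuracy(samples, k=3))
```

Because every sample is tested exactly once and no random split is involved, repeated runs give the same accuracy, which is the sense in which the jackknife is the least arbitrary of the three tests.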
Results
To identify high-quality candidate biomarkers that could distinguish SPN from PanNET and PDAC, we first constructed a candidate regulatory network by integrating all the predicted and experimentally validated regulations with TFs and miRNAs as regulators. A tissue-specific GRN for SPN was then constructed by applying a co-expression model to the gene expression profile data. In parallel, we obtained a list of previously reported SPN genes (protein-coding genes and miRNAs) with the Polysearch tool [40] and mapped its members onto the GRN. By linking these members with the shortest-path method, we obtained new candidate SPN genes lying on the shortest paths. Finally, the candidate biomarkers that could distinguish SPN from malignant pancreatic tumors were selected from the set of candidate genes by applying the KNN classifier to the expression profiles of patient samples. The workflow is shown in Fig. 1.
Regulatory network for SPN
To uncover new genes implicated in the pathogenesis of SPN, we constructed the GRN for SPN [28]. The candidate network was first constructed by collecting all the regulations of TFs and miRNAs predicted by bioinformatics tools, and the co-expression model was then applied to this network. In short, the GRN for SPN consists of the regulatory interactions between gene pairs that are co-expressed in SPN (see “Methods” section). The cut-off used to decide whether a co-expression relationship exists between a gene pair was chosen mainly according to whether the GRN shows good power-law behavior in its degree distribution (Additional file 1). The power-law fitness (R² in Fig. 2a, e) of the in-degree distribution for either miRNAs or TFs in the GRN of SPN rises as the co-expression cut-off increases, which means that the GRN of SPN approaches a scale-free network as the co-expression criterion becomes stricter. However, a stricter criterion also shrinks the GRN: as seen in Fig. 2b, f, the size of the GRN decreases as the co-expression cut-off increases. To avoid losing too many effective regulatory interactions, we set the cut-off for both TF and miRNA regulation at 0.8 as a compromise between power-law fitness and GRN size. Notably, the GRN-based SPN analyses were largely robust to cut-offs between 0.6 and 0.8. The final GRN contained 7215 nodes (including 180 TFs and 164 miRNAs) and 86,084 interactions, comprising 11,351 regulations from miRNAs to protein-coding genes, 1013 regulations from TFs to miRNAs and 73,720 regulations from TFs to protein-coding genes. Of these, 1182 regulations were experimentally validated. The entire GRN is detailed in Additional file 2.
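The cut-off trade-off described here can be explored with a small sketch that, for each candidate cut-off, keeps the edges with |r| ≥ cut-off, builds the in-degree distribution p(k), and fits log p(k) = c − λ log k by least squares, reporting the number of edges kept, λ and the fit's R². A log–log least-squares fit is one common way to quantify power-law fitness, but the source does not specify the exact fitting procedure; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def degree_distribution(edges: List[Tuple[str, str]]) -> Dict[int, float]:
    """Fraction p(k) of target nodes with in-degree k (nodes with k = 0 are not counted)."""
    in_degree = Counter(target for _, target in edges)
    counts = Counter(in_degree.values())
    n_nodes = sum(counts.values())
    return {k: c / n_nodes for k, c in counts.items()}


def power_law_fit(dist: Dict[int, float]) -> Tuple[float, float]:
    """Least-squares fit of log p(k) = c - lam * log k; returns (lam, R^2)."""
    xs = [math.log(k) for k in dist]
    ys = [math.log(p) for p in dist.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    ss_res = sum((y - (my + slope * (x - mx))) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0  # flat data: the fit is exact
    return -slope, r2


def sweep_cutoffs(scored_edges: List[Tuple[str, str, float]], cutoffs: List[float]):
    """For each cut-off, report (cutoff, edges kept, lam, R^2) of the in-degree fit."""
    report = []
    for cut in cutoffs:
        kept = [(u, v) for u, v, r in scored_edges if abs(r) >= cut]
        dist = degree_distribution(kept)
        if len(dist) < 2:
            continue  # fewer than two distinct degrees cannot be fitted
        lam, r2 = power_law_fit(dist)
        report.append((cut, len(kept), lam, r2))
    return report


# An exact k^-2 distribution recovers lam = 2 with R^2 = 1.
print(power_law_fit({1: 0.64, 2: 0.16, 4: 0.04}))
```

Plotting R² and edge count from `sweep_cutoffs` against the cut-off reproduces the kind of trade-off shown in Fig. 2: higher cut-offs tend to improve the fit but keep fewer edges.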
Candidate genes that are closely related to SPN
We first collected, by text mining, genes that have been reported to be deregulated in SPN. Previous studies of SPN, conducted mainly by immunohistochemical staining, have identified various SPN-relevant genes such as FLI1 and CCND1 [51], LEF1 [52] and CTNND1 [53], as well as CTNNB1 and CDH1 in association with the Wnt signaling pathway [25, 54]. A list of 26 previously reported genes, including 4 TFs and 7 miRNAs (Additional file 3: Table S1), was manually extracted with the Polysearch tool [40].
To discover more candidate genes involved in SPN, we searched the GRN of SPN according to the “guilt-by-association” rule [29], which has been widely used to predict gene functions in many biological networks [55, 56]. The rule assumes that the neighbors of a given gene have similar biological functions. It can therefore be inferred that genes on the shortest path [57] between two known SPN genes (i.e. the path of minimal length between them) may share features with SPN genes. The shortest paths between each pair of the 26 known SPN-related genes were calculated with Dijkstra’s algorithm [41]. A total of 216 shortest paths were obtained (Additional file 3: Table S2), and 43 genes (33 TFs and 10 miRNAs) in addition to the 26 known SPN genes were found to lie on these paths (Additional file 3: Table S3). The 216 shortest paths formed a sub-network (Fig. 3) through which transcriptional information is transmitted among the known SPN-related regulators and the 43 path genes; only 25 known genes appear in the figure because there was no shortest path between CCND1 and the other known genes. According to the “guilt-by-association” rule, these 43 genes are new candidates implicated in the tumorigenesis of SPN.
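The search for path genes can be sketched as follows: compute a shortest path between every pair of seed genes and collect the genes on those paths that are not seeds themselves. This sketch assumes an undirected graph with β = 1 − |α| edge weights and returns one shortest path per pair (ties resolved arbitrarily); seed pairs with no connecting path are skipped. Function names and the toy network are hypothetical.

```python
import heapq
import math
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # gene -> [(neighbor, beta weight)]


def undirected_graph(edges: List[Tuple[str, str, float]]) -> Graph:
    """Adjacency list with beta = 1 - |alpha| weights on each (u, v, alpha) edge."""
    graph: Graph = {}
    for u, v, alpha in edges:
        beta = 1.0 - abs(alpha)
        graph.setdefault(u, []).append((v, beta))
        graph.setdefault(v, []).append((u, beta))
    return graph


def shortest_path(graph: Graph, source: str, target: str) -> Optional[List[str]]:
    """Dijkstra's algorithm; returns one minimum-weight path or None."""
    dist: Dict[str, float] = {source: 0.0}
    prev: Dict[str, str] = {}
    heap, settled = [(0.0, source)], set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        if node == target:
            path = [node]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return path[::-1]
        for nbr, w in graph.get(node, []):
            if d + w < dist.get(nbr, math.inf):
                dist[nbr], prev[nbr] = d + w, node
                heapq.heappush(heap, (d + w, nbr))
    return None


def path_genes(graph: Graph, seeds: List[str]) -> Tuple[List[List[str]], Set[str]]:
    """Shortest paths between all seed pairs, plus the non-seed genes lying on them."""
    seed_set = set(seeds)
    paths: List[List[str]] = []
    new_genes: Set[str] = set()
    for a, b in combinations(sorted(seed_set), 2):
        path = shortest_path(graph, a, b)
        if path is None:
            continue  # no connecting path, e.g. a seed absent from the network
        paths.append(path)
        new_genes.update(g for g in path if g not in seed_set)
    return paths, new_genes


# Hypothetical network: S1-S4 are known genes; T1 and M1 are candidate path genes.
grn = undirected_graph([
    ("S1", "T1", 0.9), ("T1", "S2", 0.9), ("S1", "S2", 0.5),
    ("S2", "M1", 0.8), ("M1", "S3", -0.8), ("S3", "X", 0.95),
])
paths, candidates = path_genes(grn, ["S1", "S2", "S3", "S4"])
print(len(paths), sorted(candidates))
```

In the toy network, X is strongly linked to a seed but lies on no seed-to-seed path, so it is not picked up; this is what distinguishes the shortest-path criterion from simply taking every neighbor of a known gene.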
Functional analysis for candidate SPN genes
Functional analyses were performed to test whether the new candidate genes are truly associated with SPN. First, we performed function enrichment analysis on the candidate TFs. KEGG pathway enrichment analysis showed that all of the candidate TFs were involved in classic cancer-related pathways, such as non-small cell lung cancer and colorectal cancer (Table 1). In particular, some of the genes were enriched in the Wnt signaling pathway, whose activation has been reported to be an essential characteristic of SPN pathogenesis [13, 25, 51]. Figure 4 illustrates the genes deregulated in the Wnt signaling pathway; SMAD3, c-MYC and c-JUN, which participate in the cell cycle, were aberrantly expressed in SPN compared with non-neoplastic pancreas samples.
In addition, the functions of the 10 candidate miRNAs were annotated with TAM [58], a web-accessible tool that mines the biological processes a set of miRNAs may be involved in. The miRNA-associated functions were enriched in apoptosis, cell differentiation and epithelial-mesenchymal transition (EMT) (P < 0.05), all of which are closely related to tumorigenesis.
Candidate genes were enriched in differentially expressed genes
Genes that contribute to the pathogenesis and development of a disease tend to be differentially expressed in the diseased state compared with the normal state. We checked the expression values of all the candidate genes in SPN and normal samples and found that most of the candidate TFs and miRNAs were differentially expressed (20 of 33 TFs and 6 of 10 miRNAs; Tables 2, 3). Taking the 48 TFs and 41 miRNAs that are differentially expressed in SPN (see “Methods” section and Additional file 4) as background, Fisher’s exact test showed that both the candidate TFs and the candidate miRNAs are significantly enriched in differentially expressed genes (P < 0.05). This suggests that the candidate genes identified by the shortest-path method are generally important in the pathogenesis of SPN.
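An enrichment test of this kind can be written as a one-sided Fisher's exact test, which is equivalent to an upper-tail hypergeometric probability. The sketch below assumes a background of N genes of which K are differentially expressed, and asks how surprising it is to see at least k differentially expressed genes among n candidates. The exact 2 × 2 table used in the study is not fully specified here, so no numbers from the paper are plugged in; the function name and counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    N: genes in the background; K: background genes that are differentially expressed (DE);
    n: candidate genes drawn from the background; k: candidates that are DE.
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts, not the study's: 8 of 10 candidates DE, 30 of 100 background genes DE.
print(fisher_enrichment_p(k=8, n=10, K=30, N=100))
```

Using exact binomial coefficients avoids the large-sample approximations of a chi-squared test, which matters for candidate sets as small as the 10 miRNAs here.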
Candidate biomarkers to differentially diagnose SPN and malignant pancreatic tumors
Candidate biomarkers that could separate SPN from malignant pancreatic tumors (PanNET and PDAC) were searched for among the 69 SPN-related genes, i.e. the 26 known genes collected from the literature and the 43 new candidate genes predicted in this study. The procedure was as follows: (1) the KNN classifier was applied to the expression profiles of the 69 SPN-related genes to find the gene set that could independently distinguish SPN from malignant pancreatic tumors; (2) the quality of the KNN classifier was evaluated by the jackknife test (see “Methods” section). When PanNET and PDAC were taken as a whole, 14 genes discriminated SPN from pancreatic malignancies with 100 % accuracy (Fig. 5a, b); 8 of them were downregulated and 6 upregulated in SPN compared with both PanNET and PDAC. Three of these genes, TCF7, PPARD and miR-194, were newly found in this study (Additional file 5: Table S6). The expression patterns of these genes reflect the differences between SPN and the two malignant tumors and may help to clarify the molecular mechanisms that separate benign neoplasms like SPN from malignant tumors.
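The text does not say how the gene set was searched among the 69 genes. One possible strategy, shown here purely as an assumption and not as the authors' procedure, is greedy forward selection: starting from an empty set, repeatedly add the gene whose inclusion gives the highest jackknife KNN accuracy, stopping when accuracy stops improving or reaches 100 %. All names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def knn_predict(training: List[Tuple[Sequence[float], str]], query: Sequence[float], k: int) -> str:
    """Fixed-k majority-vote KNN with Euclidean distance; ties go to the nearest tied class."""
    neighbors = sorted(training, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    return next(label for _, label in neighbors if votes[label] == top)


def jackknife_accuracy(X: List[Dict[str, float]], y: List[str], genes: List[str], k: int) -> float:
    """Leave-one-out accuracy (%) of KNN using only the given genes."""
    samples = [([row[g] for g in genes], label) for row, label in zip(X, y)]
    correct = sum(
        knn_predict(samples[:i] + samples[i + 1:], samples[i][0], k) == samples[i][1]
        for i in range(len(samples))
    )
    return 100.0 * correct / len(samples)


def greedy_forward_selection(X, y, candidate_genes, k=3, max_genes=20):
    """Hypothetical search: add, one at a time, the gene that most raises jackknife accuracy."""
    selected: List[str] = []
    remaining = list(candidate_genes)
    best_acc = 0.0
    while remaining and len(selected) < max_genes:
        scored = [(jackknife_accuracy(X, y, selected + [g], k), g) for g in remaining]
        acc, gene = max(scored, key=lambda t: t[0])  # earliest gene wins ties
        if acc <= best_acc:
            break  # no remaining gene improves accuracy
        selected.append(gene)
        remaining.remove(gene)
        best_acc = acc
        if best_acc == 100.0:
            break
    return selected, best_acc


# Hypothetical profiles: g1 separates the classes, g2 does not.
X = [
    {"g1": 0.0, "g2": 5.0}, {"g1": 0.1, "g2": 1.0}, {"g1": 0.2, "g2": 3.0},
    {"g1": 5.0, "g2": 2.0}, {"g1": 5.1, "g2": 4.0}, {"g1": 5.2, "g2": 0.5},
]
y = ["SPN", "SPN", "SPN", "PDAC", "PDAC", "PDAC"]
print(greedy_forward_selection(X, y, ["g2", "g1"], k=1))
```

When genes are selected by the same jackknife accuracy that is then reported, the estimate tends to be optimistic, especially with few samples; an outer validation loop or an independent cohort would give a less biased figure.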
We also searched the 69 SPN-related genes, following the same procedure, for candidate biomarkers that discriminate SPN from PanNET and from PDAC separately. Seventeen genes separated SPN from PanNET with 100 % accuracy (Fig. 6a, b); 7 of them were decreased and 10 increased in SPN compared with PanNET. The same number of genes and the same accuracy were obtained for SPN versus PDAC (Fig. 7a, b), with 8 genes decreased and 9 increased in SPN compared with PDAC. Nine of the genes distinguishing SPN from PanNET and five of those distinguishing SPN from PDAC were newly predicted (Additional file 5: Table S7/S8). Comparing the two sets of 17 candidate biomarkers, TCF7 and PPARD are common to both (Figs. 6b, 7b), and both were upregulated in SPN compared with PanNET and PDAC (Additional file 5).
Discussion
In this study, we made a first attempt to identify potential biomarkers for differentially diagnosing SPN and the malignant pancreatic tumors PanNET and PDAC with a network approach. The shortest paths among 26 previously reported SPN-related genes in the network were calculated, and 43 new candidate genes were identified. These genes, together with the previously reported SPN-related genes, may contribute to the pathogenesis of SPN and help in its precise diagnosis, which is important for improving prognosis. We therefore explored candidate biomarkers for differentiating SPN from malignant pancreatic tumors with a nearest neighbor algorithm evaluated by the jackknife test.
When PanNET and PDAC were considered collectively as malignant pancreatic cancers, PPARD, TCF7 and miR-194 showed excellent ability to separate SPN from malignant cases. Notably, miR-194 was down-regulated in SPN (−2.76-fold, P < 0.01) but up-regulated in both PanNET (2.06-fold, P < 0.01) and PDAC (2.97-fold, P < 0.01) when each class was compared with non-neoplastic pancreatic cases. A recent study revealed that up-regulation of miR-194 in PDAC was correlated with increased tumor growth and progression [59], which supports our results.
For SPN versus PanNET, we uncovered a set of 17 genes, including 9 new candidate genes and 8 previously reported genes, that could correctly separate the two groups of samples (14 SPN and 6 PanNET). Of the path genes, sex-determining region Y-box 11 (SOX11; 5.62-fold versus normal, 5.02-fold versus PanNET, P < 0.01), which has an out-degree of 690 in the network, has been reported to be implicated in embryonic development and tissue remodeling [60]. In this study, SOX11 emerged as a candidate biomarker for the differential diagnosis of SPN and PanNET. When SPN, PanNET and PDAC were each compared with non-neoplastic samples, SOX11 was overexpressed exclusively in SPN. Interestingly, in epithelial ovarian cancer (EOC), SOX11 was overexpressed compared with normal ovarian tissues, but loss of SOX11 protein expression was associated with a more aggressive phenotype [61]. That report therefore postulated that SOX11 expression in EOC may lead to aberrant regulation of genes associated with cell survival/death, promoting a pro-apoptotic and less aggressive phenotype. Overexpression of SOX11 has also been reported to strongly suppress cell migration/invasion in vitro and in vivo without inhibiting cell proliferation in gastric cancer [62]. Given these findings, we speculate that overexpression of SOX11 may contribute to the tumorigenesis but also to the less malignant behavior of SPN, although the detailed mechanisms need further investigation.
Another set of 17 genes, consisting of 5 new candidate genes and 12 reported genes, successfully differentiated SPN from PDAC. Of the path genes, sex-determining region Y (SRY) box 9 (SOX9) has an out-degree of 2030 in the network. SOX9 is an important transcription factor required for development and has been implicated in several types of cancer. SOX9 was decreased in SPN compared with PDAC (−12.62-fold, P < 0.01) and non-neoplastic pancreatic tissues (−10.73-fold, P < 0.01), consistent with previous immunohistochemistry data in which SOX9 was detected in the majority (89 %) of PDAC samples but in no SPN samples (0 %) [63]. Furthermore, SOX9 was found to be critical for PDAC initiation and involved in its tumorigenesis by regulating the ERBB pathway [64]. These facts imply that SOX9 could act as a candidate biomarker for differentiating SPN from PDAC.
Another path gene that successfully differentiated SPN from PDAC was miR-204, which was increased in SPN compared with PDAC (10.86-fold, P < 0.01) and non-neoplastic pancreatic tissues (5.65-fold, P < 0.01). MiR-204 has been reported to be down-regulated in several human cancers, including gastric, ovarian, breast and endometrial cancers and malignant peripheral nerve sheath tumors, and its loss has been associated with the promotion of tumor invasion and metastasis [65]. Additionally, overexpression of miR-204 dramatically suppressed intrahepatic cholangiocarcinoma cell migration and invasion, as well as the EMT process [66]. Aberrant EMT activation is an important step towards tumor cell invasion and metastasis [67, 68]. From these findings, we infer that miR-204 may be a vital factor in maintaining SPN in a low-malignancy state, although this mechanism requires further study. Since miR-204 was down-regulated in PDAC compared with non-neoplastic pancreatic tissues (−2.01-fold, P < 0.01), we speculate that miR-204 could be of therapeutic use in suppressing tumor metastasis in PDAC. Indeed, overexpression of miR-204 was found to cause cell death in malignant pancreatic cancer [69].
The strategy we proposed for discovering candidate biomarkers that discriminate SPN from PanNET or PDAC was first to search for candidate SPN-related genes through the SPN gene regulatory network, and then to analyze the expression profiles of only the resulting SPN genes, in order to reduce the noise that most current large-scale gene expression analyses face. The strategy achieved 100 % accuracy on the dataset available so far. However, more samples should be included in the future to further quantify the importance of each marker gene. Despite the small sample size, the effectiveness of this approach is supported by the biological relevance of the candidate biomarkers and by experimental evidence from the literature for certain genes. The potential biomarkers, including miRNAs and TFs, proposed here need further validation by qRT-PCR, immunohistochemical staining, Western blotting, etc. In fact, the mRNA level of one candidate biomarker, androgen receptor (AR), has been verified by qRT-PCR in patient samples [25]. Moreover, Park et al. [25] showed by both Western blotting and immunohistochemical staining that AR was increased in SPN compared with PanNET and PDAC. Once the potential biomarkers are confirmed, they may be adapted to the clinical setting for differential diagnosis between SPN and PanNET/PDAC, by checking the expression levels of the marker genes in tumor tissue sections.
Conclusions
In conclusion, this study provides new insights into the identification of potential biomarkers to differentiate SPN from PanNET as well as PDAC through a network-based approach. Forty-three new candidate genes involved in the tumorigenesis of SPN were found by shortest-path analysis among 26 reported SPN-related genes. With the KNN classifier and the jackknife test, 14 genes were found to discriminate SPN from pancreatic malignancies with 100 % accuracy, three of which (TCF7, PPARD and miR-194) were newly found in this study. For SPN versus PanNET, 17 genes, including SOX11, SMAD3 and miR-194, were identified to separate the two diseases, of which nine were newly discovered. The same number of genes and accuracy were obtained for distinguishing SPN from PDAC, with five genes, including SOX9 and miR-204, newly predicted. The genes obtained in this study may provide clues for further understanding the gene regulation mechanisms of SPN as well as PanNET and PDAC. Some potential biomarkers, e.g. SOX11 and miR-204, which may promote cell proliferation but inhibit invasion or metastasis, could be therapeutic targets for reducing the malignancy of malignant pancreatic tumors.
References
Frantz VK. Tumors of the pancreas. In: Atlas of tumor pathology, 1st series. Washington DC: Armed Forces Institute of Pathology; 1959.
Mortenson MM, Katz MHG, Tamm EP, Bhutani MS, Wang HM, Evans DB, Fleming JB. Current diagnosis and management of unusual pancreatic tumors. Am J Surg. 2008;196:100–13.
Papavramidis T, Papavramidis S. Solid pseudopapillary tumors of the pancreas: review of 718 patients reported in English literature. J Am Coll Surg. 2005;200:965–72.
Klimstra DS, Wenig BM, Heffess CS. Solid-pseudopapillary tumor of the pancreas: a typically cystic carcinoma of low malignant potential. Semin Diagn Pathol. 2000;17:66–80.
Cai Y, Ran X, Xie S, Wang X, Peng B, Mai G, Liu X. Surgical management and long-term follow-up of solid pseudopapillary tumor of pancreas: a large series from a single institution. J Gastrointest Surg. 2014;18:935–40.
Burris HA, Moore MJ, Andersen J, Green MR, Rothenberg ML, Modiano MR, Cripps MC, Portenoy RK, Storniolo AM, Tarassoff P, et al. Improvements in survival and clinical benefit with gemcitabine as first-line therapy for patients with advanced pancreas cancer: a randomized trial. J Clin Oncol. 1997;15:2403–13.
Baek JH, Lee JM, Kim SH, Kim SJ, Kim SH, Lee JY, Han JK, Choi BI. Small (≤3 cm) solid pseudopapillary tumors of the pancreas at multiphasic multidetector CT. Radiology. 2010;257:97–106.
Reddy S, Cameron JL, Scudiere J, Hruban RH, Fishman EK, Ahuja N, Pawlik TM, Edil BH, Schulick RD, Wolfgang CL. Surgical management of solid-pseudopapillary neoplasms of the pancreas (Franz or Hamoudi tumors): a large single-institutional series. J Am Coll Surg. 2009;208:950–9.
Kim SA, Kim MS, Kim MS, Kim SC, Choi J, Yu E, Hong SM. Pleomorphic solid pseudopapillary neoplasm of the pancreas: degenerative change rather than high-grade malignant potential. Hum Pathol. 2014;45:166–74.
Takahashi Y, Hiraoka N, Onozato K, Shibata T, Kosuge T, Nimura Y, Kanai Y, Hirohashi S. Solid-pseudopapillary neoplasms of the pancreas in men and women: do they differ? Virchows Arch. 2006;448(5):561–9.
Liu BA, Li ZM, Su ZS, She XL. Pathological differential diagnosis of solid-pseudopapillary neoplasm and endocrine tumors of the pancreas. World J Gastroenterol. 2010;16:1025–30.
Coleman KM, Doherty MC, Bigler SA. Solid-pseudopapillary tumor of the pancreas. Radiographics. 2003;23:1644–8.
Hong SM, Li A, Olino K, Wolfgang CL, Herman JM, Schulick RD, Iacobuzio-Donahue C, Hruban RH, Goggins M. Loss of E-cadherin expression and outcome among patients with resectable pancreatic adenocarcinomas. Mod Pathol. 2011;24:1237–47.
Chetty R, Serra S. Membrane loss and aberrant nuclear localization of E-cadherin are consistent features of solid pseudopapillary tumour of the pancreas. An immunohistochemical study using two antibodies recognizing different domains of the E-cadherin molecule. Histopathology. 2008;52:325–30.
Kim MJ, Jang SJ, Yu E. Loss of E-cadherin and cytoplasmic-nuclear expression of beta-catenin are the most useful immunoprofiles in the diagnosis of solid-pseudopapillary neoplasm of the pancreas. Hum Pathol. 2008;39:251–8.
Chetty R, Serra S, Asa SL. Loss of membrane localization and aberrant nuclear E-cadherin expression correlates with invasion in pancreatic endocrine tumors. Am J Surg Pathol. 2008;32:413–9.
Li L, Li JS, Hao CY, Zhang CJ, Mu K, Wang Y, Zhang TG. Immunohistochemical evaluation of solid pseudopapillary tumors of the pancreas: the expression pattern of CD99 is highly unique. Cancer Lett. 2011;310:9–14.
Mishra PJ. Non-coding RNAs as clinical biomarkers for cancer diagnosis and prognosis. Expert Rev Mol Diagn. 2014;14:917–9.
Preis M, Gardner TB, Gordon SR, Pipas JM, Mackenzie TA, Klein EE, Longnecker DS, Gutmann EJ, Sempere LF, Korc M. MicroRNA-10b expression correlates with response to neoadjuvant therapy and survival in pancreatic ductal adenocarcinoma. Clin Cancer Res. 2011;17:5812–21.
Bhatti I, Lee A, James V, Hall RI, Lund JN, Tufarelli C, Lobo DN, Larvin M. Knockdown of microRNA-21 inhibits proliferation and increases cell death by targeting programmed cell death 4 (PDCD4) in pancreatic ductal adenocarcinoma. J Gastrointest Surg. 2011;15:199–208.
Moreno-Moya JM, Vilella F, Simon C. MicroRNA: key gene expression regulators. Fertil Steril. 2014;101:1516–23.
Zhai H, Song B, Xu X, Zhu W, Ju J. Inhibition of autophagy and tumor growth in colon cancer by miR-502. Oncogene. 2013;32:1570–9.
Iqbal J, Shen Y, Huang X, Liu Y, Wake L, Liu C, Deffenbacher K, Lachel CM, Wang C, Rohr J, et al. Global microRNA expression profiling uncovers molecular markers for classification and prognosis in aggressive B-cell lymphoma. Blood. 2015;125:1137–45.
Dong P, Kaneuchi M, Watari H, Sudo S, Sakuragi N. MicroRNA-106b modulates epithelial-mesenchymal transition by targeting TWIST1 in invasive endometrial cancer cell lines. Mol Carcinog. 2013;53:349–59.
Park M, Kim M, Hwang D, Kim WK, Kim SK, Shin J, Park ES, Kang CM, Paik YK, Kim H. Characterization of gene expression and activated signaling pathways in solid-pseudopapillary neoplasm of pancreas. Mod Pathol. 2014;27:580–93.
Ye H, Liu X, Lv M, Wu Y, Kuang S, Gong J, Yuan P, Zhong Z, Li Q, Jia H, et al. MicroRNA and transcription factor co-regulatory network analysis reveals miR-19 inhibits CYLD in T-cell acute lymphoblastic leukemia. Nucleic Acids Res. 2012;40:5201–14.
Poos K, Smida J, Nathrath M, Maugg D, Baumhoer D, Korsching E, et al. How microRNA and transcription factor co-regulatory networks affect osteosarcoma cell proliferation. PLoS Comput Biol. 2013;9(8):e1003210.
Gu ZG, Zhang CY, Wang J. Gene regulation is governed by a core network in hepatocellular carcinoma. BMC Syst Biol. 2012;6:32.
Nabieva E, Jim K, Agarwal A, Chazelle B, Singh M. Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics. 2005;21:i302–10.
Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, Yefanov A, Lee H, Zhang N, Robertson CL, et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 2013;41:D991–5.
Kel AE, Gossling E, Reuter I, Cheremushkin E, Kel-Margoulis OV, Wingender E. MATCH: a tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res. 2003;31:3576–9.
Agarwal V, Bell GW, Nam JW, Bartel DP. Predicting effective microRNA target sites in mammalian mRNAs. Elife. 2015;4:e05005.
Lachmann A, Xu H, Krishnan J, Berger SI, Mazloom AR, Ma’ayan A. ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics. 2010;26:2438–44.
Wang J, Lu M, Qiu C, Cui Q. TransmiR: a transcription factor-microRNA regulation database. Nucleic Acids Res. 2010;38:D119–22.
Vlachos IS, Paraskevopoulou MD, Karagkouni D, Georgakilas G, Vergoulis T, Kanellos I, Anastasopoulos IL, Maniou S, Karathanou K, Kalfakakou D, et al. DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions. Nucleic Acids Res. 2015;43:D153–9.
Barabasi AL, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–12.
Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, Dupuy D, Walhout AJ, Cusick ME, Roth FP, Vidal M. Evidence for dynamically organized modularity in the yeast protein–protein interaction network. Nature. 2004;430:88–93.
Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL. The large-scale organization of metabolic networks. Nature. 2000;407:651–4.
Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A. Reverse engineering of regulatory networks in human B cells. Nat Genet. 2005;37:382–90.
Cheng D, Knox C, Young N, Stothard P, Damaraju S, Wishart DS. PolySearch: a web-based text mining system for extracting relationships between human diseases, genes, mutations, drugs and metabolites. Nucleic Acids Res. 2008;36:W399–405.
Dijkstra EW. A note on two problems in connection with graphs. Numerische Mathematik. 1959;1:269–71.
Ruiz EV. An algorithm for finding nearest neighbors in (approximately) constant average time. Pattern Recogn Lett. 1986;4:145–57.
Denoeux T. A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Trans Syst Man Cybern. 1995;25:804–13.
Cai Y, Huang T, Hu L, Shi X, Xie L, Li Y. Prediction of lysine ubiquitination with mRMR feature selection and analysis. Amino Acids. 2012;42:1387–95.
Huang T, Chen L, Cai YD, Chou KC. Classification and analysis of regulatory pathways using graph property, biochemical and physicochemical property, and functional property. PLoS One. 2011;6:e25297.
Chou KC. Some remarks on protein attribute prediction and pseudo amino acid composition. J Theor Biol. 2011;273:236–47.
Chou KC, Zhang CT. Prediction of protein structural classes. Crit Rev Biochem Mol Biol. 1995;30:275–349.
Anand A, Suganthan PN. Multiclass cancer classification by support vector machines with class-wise optimized genes and probability estimates. J Theor Biol. 2009;259:533–40.
Nanni L, Lumini A. A further step toward an optimal ensemble of classifiers for peptide classification, a case study: HIV protease. Protein Pept Lett. 2009;16:163–7.
Vilar S, Gonzalez-Diaz H, Santana L, Uriarte E. A network-QSAR model for prediction of genetic-component biomarkers in human colorectal cancer. J Theor Biol. 2009;261:449–58.
Tiemann K, Heitling U, Kosmahl M, Kloppel G. Solid pseudopapillary neoplasms of the pancreas show an interruption of the Wnt-signaling pathway and express gene products of 11q. Mod Pathol. 2007;20:955–60.
Singhi AD, Lilo M, Hruban RH, Cressman KL, Fuhrer K, Seethala RR. Overexpression of lymphoid enhancer-binding factor 1 (LEF1) in solid-pseudopapillary neoplasms of the pancreas. Mod Pathol. 2014;27:1355–63.
Chetty R, Jain D, Serra S. p120 catenin reduction and cytoplasmic relocalization leads to dysregulation of E-cadherin in solid pseudopapillary tumors of the pancreas. Am J Clin Pathol. 2008;130:71–6.
Burford H, Baloch Z, Liu X, Jhala D, Siegal GP, Jhala N. E-cadherin/beta-catenin and CD10: a limited immunohistochemical panel to distinguish pancreatic endocrine neoplasm from solid pseudopapillary neoplasm of the pancreas on endoscopic ultrasound-guided fine-needle aspirates of the pancreas. Am J Clin Pathol. 2009;132:831–9.
Song JM, Singh M. How and when should interactome-derived clusters be used to predict functional modules and protein function? Bioinformatics. 2009;25:3143–50.
Saha S, Chatterjee P, Basu S, Kundu M, Nasipuri M. FunPred-1: protein function prediction from a protein interaction network using neighborhood analysis. Cell Mol Biol Lett. 2014;19:675–91.
Weiss M, Hultsch H, Adam I, Scharff C, Kipper S. The use of network analysis to study complex animal communication systems: a study on nightingale song. Proc Roy Soc B. 2014;281:20140460.
Lu M, Shi B, Wang JA, Cao Q, Cui QH. TAM: a method for enrichment and depletion analysis of a microRNA category in a list of microRNAs. BMC Bioinformatics. 2010;11:419.
Zhang J, Zhao CY, Zhang SH, Dang-Hui DH, Chen Y, Liu QH, Shi M, Ni CR, Zhu MH. Upregulation of miR-194 contributes to tumor growth and progression in pancreatic ductal adenocarcinoma. Oncol Rep. 2014;31:1157–64.
Sock E, Rettig SD, Enderich J, Bosl MR, Tamm ER, Wegner M. Gene targeting reveals a widespread role for the high-mobility-group transcription factor Sox11 in tissue remodeling. Mol Cell Biol. 2004;24:6635–44.
Brennan DJ, Ek S, Doyle E, Drew T, Foley M, Flannelly G, O’Connor DP, Gallagher WM, Kilpinen S, Kallioniemi OP, et al. The transcription factor Sox11 is a prognostic factor for improved recurrence-free survival in epithelial ovarian cancer. Eur J Cancer. 2009;45:1510–7.
Qu Y, Zhou CF, Zhang JN, Cai Q, Li JF, Du T, Zhu ZG, Cui XJ, Liu BY. The metastasis suppressor SOX11 is an independent prognostic factor for improved survival in gastric cancer. Int J Oncol. 2014;44:1512–20.
Shroff S, Rashid A, Wang H, Katz MH, Abbruzzese JL, Fleming JB, Wang H. SOX9: a useful marker for pancreatic ductal lineage of pancreatic neoplasms. Hum Pathol. 2014;45:456–63.
Grimont A, Pinho AV, Cowley MJ, Augereau C, Mawson A, Giry-Laterriere M, Van den Steen G, Waddell N, Pajic M, Sempoux C, et al. Sox9 regulates Erbb signalling in pancreatic cancer development. Pancreas. 2014;43:1361.
Imam JS, Plyler JR, Bansal H, Prajapati S, Bansal S, Rebeles J, Chen HI, Chang YF, Panneerdoss S, Zoghi B, et al. Genomic loss of tumor suppressor miRNA-204 promotes cancer cell migration and invasion by activating AKT/mTOR/Rac1 signaling and actin reorganization. PLoS One. 2012;7:e52397.
Qiu YH, Wei YP, Shen NJ, Wang ZC, Kan T, Yu WL, Yi B, Zhang YJ. miR-204 inhibits epithelial to mesenchymal transition by targeting slug in intrahepatic cholangiocarcinoma cells. Cell Physiol Biochem. 2013;32:1331–41.
Nieto MA. Epithelial-Mesenchymal Transitions in development and disease: old views and new perspectives. Int J Dev Biol. 2009;53:1541–7.
Wang B, Lindley LE, Fernandez-Vega V, Rieger ME, Sims AH, Briegel KJ. The T box transcription factor TBX2 promotes epithelial-mesenchymal transition and invasion of normal and malignant breast epithelial cells. PLoS One. 2012;7:e41355.
Chen Z, Sangwan V, Banerjee S, Mackenzie T, Dudeja V, Li X, Wang H, Vickers SM, Saluja AK. miR-204 mediated loss of Myeloid cell leukemia-1 results in pancreatic cancer cell death. Mol Cancer. 2013;12:105.
Authors’ contributions
PL and JW conceived the study; PL, YH and JY carried out the analysis and wrote the manuscript. JW, JL and JY supervised the work. All authors read and approved the final manuscript.
Acknowledgements
This work was supported by the National Science Foundation of China (No. 81250044 and No. 31500674), the National Basic Research Program of China (973 Program) (No. 2014CB542300) and funding from the NJU-Yangzhou Institute of Optoelectronics. The authors acknowledge the Center of High Performance Computation of Nanjing University for providing computational resources.
Competing interests
The authors declare that they have no competing interests.
Additional files
12967_2015_718_MOESM1_ESM.pdf
Additional file 1: In-degree distribution for the GRN. The X-axis represents the in-degree of a node; a node with in-degree x is regulated by a total of x other nodes. The Y-axis represents the total number of network nodes that have an in-degree of x. The red curve shows the fit to a power-law distribution. (A) In-degree distribution for the sub-GRN in which only miRNAs are included as regulators; the in-degree of each node (miRNAs and protein-coding genes) was calculated within this sub-GRN. The in-degree ranges from 0 to 27. (B) In-degree distribution for the sub-GRN in which only TFs are included as regulators.
12967_2015_718_MOESM2_ESM.txt
Additional file 2: Gene regulatory network of SPN. The first column represents regulators, while the second column represents corresponding targets.
12967_2015_718_MOESM3_ESM.xlsx
Additional file 3: Previously reported SPN-relevant genes and shortest paths among path genes. Table S1 lists the 26 genes that have been reported to be related to SPN. Table S2 shows in detail the shortest paths for each pair of the 26 SPN-related genes listed in Table S1. Table S3 lists the 43 new candidate genes on the shortest paths, together with the corresponding gene type.
12967_2015_718_MOESM4_ESM.xlsx
Additional file 4: Differentially expressed genes between SPN and non-neoplastic samples. Table S4 lists differentially expressed TFs with the fold change and t-test p-value for each TF. Table S5 lists differentially expressed miRNAs with the fold change and t-test p-value for each miRNA.
12967_2015_718_MOESM5_ESM.xlsx
Additional file 5: Biomarkers for the differential diagnosis of SPN and two malignant pancreatic tumors. Table S6 lists the fold changes of 14 biomarkers for SPN, PanNET and PDAC versus normal pancreatic tissues, as well as SPN versus PanNET and PDAC. Table S7 lists the fold changes of 17 biomarkers for SPN and PanNET versus normal pancreatic tissues, as well as SPN versus PanNET. Table S8 lists the fold changes of 17 biomarkers for SPN and PDAC versus normal pancreatic tissues, as well as SPN versus PDAC.
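The in-degree distribution shown in Additional file 1 can be reproduced in outline from a two-column edge list such as Additional file 2. The sketch below assumes whitespace-separated "regulator target" lines, counts distinct regulators per node (ignoring self-loops), assigns in-degree 0 to nodes that appear only as regulators, and fits the power law by ordinary least squares on log-log axes over in-degrees of at least 1. The fitting procedure behind the red curve is not described here, so this log-log regression is an assumed stand-in, and the names `in_degree_distribution` and `fit_power_law` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def in_degree_distribution(lines):
    """Map in-degree x -> number of nodes with in-degree x.

    Each line is assumed to hold 'regulator target' separated by whitespace.
    """
    regulators_of = defaultdict(set)
    nodes = set()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        src, dst = parts[0], parts[1]
        nodes.update((src, dst))
        if src != dst:  # a node is counted as regulated by *other* nodes only
            regulators_of[dst].add(src)
    return Counter(len(regulators_of.get(n, ())) for n in nodes)


def fit_power_law(dist):
    """Least-squares fit of log(count) = log(C) - gamma * log(x) for x >= 1.

    Returns (gamma, C).
    """
    pts = [(math.log(x), math.log(c)) for x, c in dist.items() if x >= 1 and c > 0]
    if len({px for px, _ in pts}) < 2:
        raise ValueError("need at least two distinct in-degrees >= 1")
    n = len(pts)
    mx = sum(px for px, _ in pts) / n
    my = sum(py for _, py in pts) / n
    sxx = sum((px - mx) ** 2 for px, _ in pts)
    sxy = sum((px - mx) * (py - my) for px, py in pts)
    slope = sxy / sxx
    return -slope, math.exp(my - slope * mx)
```

A log-log regression is easy to read against a plotted fit, but maximum-likelihood estimators are generally more robust for degree data, so the exponent from this sketch should be treated as indicative only.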
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Li, P., Hu, Y., Yi, J. et al. Identification of potential biomarkers to differentially diagnose solid pseudopapillary tumors and pancreatic malignancies via a gene regulatory network. J Transl Med 13, 361 (2015). https://doi.org/10.1186/s12967-015-0718-3
DOI: https://doi.org/10.1186/s12967-015-0718-3