Gene expression profiling for molecular distinction and characterization of laser captured primary lung cancers

Methods We examined gene expression profiles of tumor cells from 29 untreated patients with lung cancer (10 adenocarcinomas (AC), 10 squamous cell carcinomas (SCC), and 9 small cell lung cancer (SCLC)) in comparison to 5 samples of normal lung tissue (NT). The European and American methodological quality guidelines for microarray experiments were followed, including the stipulated use of laser capture microdissection for separation and purification of the lung cancer tumor cells from surrounding tissue. Results Based on differentially expressed genes, different lung cancer samples could be distinguished from each other and from normal lung tissue using hierarchical clustering. Comparing AC, SCC and SCLC with NT, we found 205, 335 and 404 genes, respectively, that were at least 2-fold differentially expressed (estimated false discovery rate: < 2.6%). Different lung cancer subtypes had distinct molecular phenotypes, which also reflected their biological characteristics. Differentially expressed genes in human lung tumors which may be of relevance in the respective lung cancer subtypes were corroborated by quantitative real-time PCR. Genetic programming (GP) was performed to construct a classifier for distinguishing between AC, SCC, SCLC, and NT. Forty genes, that could be used to correctly classify the tumor or NT samples, have been identified. In addition, all samples from an independent test set of 13 further tumors (AC or SCC) were also correctly classified. Conclusion The data from this research identified potential candidate genes which could be used as the basis for the development of diagnostic tools and lung tumor type-specific targeted therapies.


Background
Lung cancer represents a heterogeneous group of diseases in terms of their biology and the clinical course. The diagnosis and classification of lung cancers are primarily based on the histological morphology and immunohistological methods for distinguishing between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) [1]. The molecular pathogenesis of lung cancer, as far as it has been deciphered, consists of genetic and epigenetic alterations, including the activation of protooncogenes and inactivation of tumor suppressor genes [2][3][4]. This leads to a malignant phenotype, resulting in changes in cell structure, adhesion and cell proliferation [5].
Oligonucleotide microarray studies are commonly used to extend the knowledge of the differences in the biology of lung tumors and to identify new candidate genes with diagnostic, prognostic and therapeutic value [6][7][8][9]. Several gene expression profiling studies in lung cancer have been published, however, it is still difficult to compare these studies due to the differences in methodologies, array platforms, normalization of the data and biostatistical analyses approaches, which may influence the reproducibility and comparability [10][11][12]. Such differences could have led to divergent results, with limited overlap of described genes.
Another crucial step in the field of oligonucleotide microarray studies is the preparation of the solid tumor sample itself. It contains a variable amount of mesenchymal stroma cells, blood vessels, fibroblasts, tumor-invading lymphocytes and necrotic areas next to the tumor cells themselves. Analyzing the complete tumor sample without efficient separation of the tumor cell confounds the true gene expression profile of the tumor.
In order to overcome these methodological limitations, we followed the guidelines from the Microarray Gene Expression Data Society [13] and the MicroArray Quality Control (MAQC) Consortium [14,15], the External RNA Controls Consortium (ERCC) [16] as well as the European consensus guidelines for gene expression experiments [17]. The purification of the tumor cells was carried out by laser capture microdissection (LCM), which has been shown to greatly improve the sample preparation for microarray expression analysis [18]. Few reports on LCM and microarray gene expression analysis have been published to date, comparing all distinct lung cancer subtypes to normal lung tissue [19][20][21].
In this report, we performed a comparison of gene expression profiles, using microarray analysis and LCM, according to the methodological quality consensus guidelines for microarray experiments, with the aim of identifying genes that are differentially expressed in the major histological lung cancer subtypes, as compared to normal lung tissue. In addition, 14 differentially expressed genes in human lung tumors were corroborated by quantitative real-time PCR. Furthermore, using genetic programming, we found a subset of 40 genes, that could be utilized for the classification of different types of lung tumors.

Lung tumor samples
Samples of lung tumors were obtained using bronchoscopy or CT-guided needle aspiration from 29 patients, newly diagnosed patients with lung cancer. The samples that were immediately fixed in RNA-later consisted of 10 adenocarcinomas, 10 squamous cell carcinomas and 9 small cell lung carcinomas. Control samples of normal lung tissue were obtained from 5 patients with suspected tuberculosis or sarcoidosis, without presence of malignant lung tumors. The histopathological diagnosis was based on routinely processed hematoxylin-eosin stains and confirmed by immunohistochemical staining looking for pan-cytokeratin, cytokeratin 5 and 7, chromogranin A, synaptophysin and tissue-transcripion-factor-1. For validation of the classificator from genetic programming, 13 lung cancer samples were selected as a test-set from patients with advanced NSCLC lung cancers. All patients gave their informed consent and the study was approved by the ethics committee of the Heinrich-Heine University, Duesseldorf.

Laser capture microdissection
From each frozen tumor tissue, we prepared 8-μm thick sections. The sections were fixed in methanol/acetic acid and stained in hematoxylin. The tumor cells were identified and ascertained in the sample by an experienced pathologist using the Autopix 100 automated LCM system and collected on a CapSure HS LCM Cap (Arcturus, Mount View, CA). Following microdissection, total RNAextraction was performed with the RNeasy Micro Kit (QIAamp DNA MicroKit Qiagen, Santa Clarita, CA, USA), according to the manufacturer's instruction. A standard quality control of the total RNA was performed using the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, USA).

RNA isolation, cRNA labeling and hybridization to microarrays
The described procedures strictly adhered to the guidelines from the Microarray Gene Expression Data Society and the MicroArray Quality Control (MAQC) Consortium, the External RNA Controls Consortium (ERCC), as well as the European consensus guidelines for gene expression experiments [13][14][15][16][17]. The full description of the Extraction protocol, labeling and labeling protocol, hybridization protocol and data processing is obtainable in the GEO DATA base under http:// www.ncbi.nlm.nih.gov/geo/ (accession number GSE6044). Total RNA (median: 375 ng; range: 250 -500 ng) was used to generate biotin-labeled cRNA (median: 6,5 μg; range: 3-10 μg) by means of Message Amp aRNA Amplification Kit (Ambion, Austin, TX). Quality control of RNA and cRNA was performed using a bioanalyzer (Agilent 2001 Biosizing, Agilent Technologies). Following fragmentation, labeled cRNA of each individual patient sample was hybridized to Affymetrix HG-Focus Gene-Chips, covering 8793 genes, and stained according to the manufacturer's instructions.

Quantification, normalization and statistical analysis
The quality control, normalization and data analysis, were assured with the affy package of functions of statistical scripting language 'R' integrated into the Bioconductor project http://www.bioconductor.org/, as described previously [22]. Using histograms of perfect match intensities, 5' to 3' RNA degradation side-by-side plots, or scatter plots, we estimated the quality of samples and hybridizations. To normalize raw data, we used a method of variance stabilizing transformations (VSN) [23]. To compare the normalized data from AC, SCC, SCLC and normal lung tissue, we used the Significance Analysis of Microarrays (SAM) algorithm v2.23 http://www-stat.stan ford.edu/~tibs/SAM/ which contains a sliding scale for false discovery rate (FDR) of significantly up-and downregulated genes [24]. All data were permuted 1000 times by using the two classes, unpaired data mode of the algorithm. As a cut-off for significance, an estimated FDR of 2.6% was chosen by the tuning parameter delta of the software. The significance level of each gene was given by the q-value describing the lowest FDR in multiple testing [25], and a cut-off for fold-change of differential expression of 2 was used.
Hierarchical clustering analysis (HCA) was used to determine components of variation in the data in this study. For these analyses we used the unsupervised complete linkage algorithm.
The data points were organized in a phylogenetic tree with the branch lengths represent the degree of similarity between the values. Significantly expressed genes were uploaded to KEGG (Kyoto Encyclopedia of Genes and Genomes) and functional annotation was performed. Genes that were not listed or could be classified in more than one functional group were reviewed for the function based on the literature available using Pubmed, OMIM and GENE available in http://www.ncbi.nlm.nih.gov.

Quantitative real-time PCR
Corroboration of RNA expression data was performed by realtime PCR using the ABI PRISM 7900 HT Sequence Detection System Instrument (Applied Biosystems, Applera Deutschland GmbH, Darmstadt, Germany). Total RNA, ranging between 600 -1000 ng, underwent reverse transcription using a High capacity cDNA Archive Kit according to the manufacturer's instruction (Applied Biosystems, Applera Deutschland GmbH, Darmstadt, Germany). PCRs were performed according to the instructions of the manufacturer, using commercially available assays-on-demand (Applied Biosystems, Applera Deutschland GmbH, Darmstadt, Germany). Ct values were calculated by the ABI PRISM software, and relative gene expression levels were expressed as the difference in Ct values of the target gene and the control gene ribosomal protein S11(RPS11). RPS11 was selected as reference gene for the quantification analyses, because the expression levels of the gene were similar between the examined tumor samples and normal tissue.

Classification using genetic programming
In order to generate a classifier that distinguishes between AC, SCC and SCLC, as well as the normal lung tissue, a Genetic Programming (GP) approach was used. The software DISCIPULUS which implements GP [26] was utilized. A leave-one-out cross validation (LOOCV) was performed, whereby one sample is removed from the training set. The other samples are reduced to those 50 genes with the highest signal-to-noise ratio, which are used as a training set in a training series. A training series generates a number of classifiers. After each series, the 30 best resulting classifiers are applied to that sample removed before, and the number of exact predictions were counted. The procedure was iterated, so that every sample was outside the training set once. The percentages of exact predictions for all samples of a class using 1020 classifiers (34 tissue samples and 30 classifiers = 34 * 30 = 1020 classifiers) were calculated. Each classifier used 50 different genes of a sample, queried their expression values and made the decision of "part of the class" or "outside the class". For each classifier and LOOCV iteration, the frequency of a gene (how often a gene occurs as appropriate classifier) was determined. The frequency was used as a quality criterion. The 10 genes with the highest frequency in each of the four classes were chosen in order to generate a final classifier of 40 genes. The accuracy of correct classification of the tissue is calculated as percentage using 30 classifiers of all left-out samples.

Expression profiles and hierarchical cluster analysis
In this study, we examined gene expression profiles of untreated tumor cells from 29 patients with lung cancer (10 adenocarcinomas, 10 squamous cell carcinomas, 9 small cell lung cancer) in comparison to 5 normal lung tissues. The original data set and the patients characteristics are available in the GEO DATA base under http:// www.ncbi.nlm.nih.gov/geo/ (accession number GSE6044).
Comparing AC, SCC and SCLC to normal lung tissue using significance analysis of microarrays (SAM), we found 205, 335 and 404 genes with an at least 2-fold different expression level and an estimated false discovery rate (FDR) of <2.6%. For an overview, a Venn diagram shows the overlaps of the three among groups ( Figure 1) and the differentially expressed genes were further grouped in 14 functional classes (Table 1). Following SAM analysis, an unsupervised complete linkage clustering algorithm for cluster analyses was performed. The closest pair of the highest expression values of 198 differentially expressed genes was grouped together and a clear segregation of the analyzed groups (adenocarcinomas, squamous cell carcinomas, small cell lung cancer and normal lung tissue) was obtained ( Figure 2)

Adenocarcinomas
We found 205 deregulated genes in AC; 43 were upregulated and 162 were downregulated. Looking at oncogenes and tumor-associated genes, only the paraneoplastic anti-gen MA2 gene was upregulated. Focusing on genes involved in cell structure, 7 genes were upregulated 2 to 7.9-fold, compared to normal lung tissue. Next to the intermediary filament keratin 7 gene, 3 genes were involved in the actin metabolism such as thymosin beta-10, actin-related protein 2/3 complex subunit 1B and plastin 3. Four downregulated genes, involved in cell structure, were found in AC compared to normal lung tissue. These genes were tubulin alpha 3 and beta 2 involved in the assembly of microtubules and intermediary filaments, as well as keratin 5 and 15.
We also looked for genes involved in cell adhesion and migration and found integrin alpha 3, integrin beta 2 and intercellular adhesion molecule 1 to be upregulated in adenocarcinomas, while 6 genes, including the desmosomal cadherins desmoglein 3 and desmocollin 3, were significantly downregulated compared to normal lung tissue.
Examining the genes involved in cell cycle control and proliferation, we found only 2 genes differentially expressed. The chromosome condensation protein G was  (Table 2).

Squamous cell carcinomas
In SCC, we found 335 deregulated genes, including 172 upregulated and 163 downregulated genes. Looking at oncogenes and tumor-associated genes, 4 genes of the RAS associated gene family; the oncogenes v-myc myelocytomatosis viral oncogene homolog, v-maf musculoaponeurotic fibrosarcoma oncogene homolog and pituitary tumor-transforming 1 were upregulated. Examining genes involved in cell structure and cell adhesion, we found 5 types of collagen genes, in particular the genes encoding for collagen type I alpha-1 and 2, type V alpha-2, type VI alpha-3 and type XI alpha-1 to be upregulated in comparison to normal lung tissue. Further, gap junction protein alpha 1 (43 kDa), a member of the connexin gene family and neuronal cell adhesion molecule, a member of the immunoglobulin superfamily were upregulated, while 6 other genes involved in cell adhesion such as the tight junction protein 3 and claudin 10 were downregulated in comparison to normal lung tissue. In SCCs, 41 genes involved in cell cycle regulation were upregulated between 2 to 4.3-fold. Looking at key molecules for progression of cell cycle, the cyclines A2 and B2, cyclindependent kinase 4 and the cell division cycle 2 genes were upregulated. In the group of genes involved in DNA repair, we found genes with key functions for mismatch and double-strand DNA repair such as proliferating cell nuclear antigen, mutS homolog 6 replication factor C 4 and C5, RAD51 associated protein 1, which were overexpressed in comparison to normal lung tissue (Table 3).
Venn Diagramm of significantly regulated genes comparing adenocarcinomas, squamous cell carcinomas and small cell lung can-cer to normal lung tissue (NT) Figure 1 Venn Diagramm of significantly regulated genes comparing adenocarcinomas, squamous cell carcinomas and small cell lung cancer to normal lung tissue (NT

Small cell lung cancer
In SCLC, we found 404 differential expressed genes, including 223 upregulated and 181 downregulated genes. Looking at oncogenes and tumor-associated genes, 4 genes of the rat sarcoma viral oncogene homolog associ-ated gene family, FYN oncogene related to SRC and pituitary tumor-transforming 1 were upregulated, respectively. Of interest, the three tumor-related genes: tumor protein D52, melanoma antigen family D 4, stathmin 1/oncoprotein 18 and two oncogenes DEK oncogene and forkhead  box G1 were upregulated which has not been described in the context of lung cancer so far.
In comparison to normal lung tissue, a different pattern of cell adhesion molecules was found in SCLC, showing 8 genes up-and 7 genes downregulated between 2 to 12.8fold and 2.1 to 4.6-fold, respectively. In particular, the neural cell adhesion molecule 1 and the neuronal cell adhesion molecule, both members of the immunoglobulin superfamily, were overexpressed. Looking for genes involved in cell cycle regulation, we found 56 genes upregulated between 2.1 to 5.1-fold compared to normal lung tissue among them the key molecules for progression of cell cycle, the cyclines A2 and B2, cyclin-dependent kinase 4 and the cell division cycle 2 genes and cyclin E. The expression patterns of genes of the centromer/kinetochore complex and genes involved in DNA repair were similar to the expression patterns in SCC ( Table 4).

Corroboration of array data by quantitative real-time (RT) PCR
Quantitative RT-PCR was used to verify the microarray data for 13 genes found to be differentially regulated in the different histologic lung cancer subgroups as compared to normal lung tissue. The 13 tested genes that were selected from different functional classes, with focus on the genes presented in tables 2-4, were: CASK (calcium/ calmodulin-dependent serine protein kinase), CCNB2 and NRCAM (neuronal cell adhesion molecule). The expression data generated by the oligonucleotide array and RT-PCR were highly concordant, supporting the reliability of the array analysis ( Figure 3). Of interest, the pituitary tumor-transforming gene 1 was 2.56 and 2.49fold significantly differentially expressed in SCLC and SCC, respectively, in comparison to normal lung tissue using microarray analysis. In AC, the difference of expression was not significant in microarray analysis. However, using RT-PCR for corroboration, the pituitary tumortransforming gene 1 was 5.7, 8.0 and 8.3 overexpressed in SCLC, SCC and AC, respectively, in comparison to normal lung tissue. In a previously conducted immunohistochemical study, we have demonstrated a strong pituitary tumor-transforming gene 1 expression in SCLC, adenocarcinomas, as well as in SCC, whilst a weak expression was only found in the luminal layer of normal lung epithelia, thus supporting the data of RT-PCR [27].

Class prediction using genetic programming
In order to identify genes that enable accurate distinction between AC, SCC and SCLC, as well as normal lung tissue, a genetic programming data analysis was performed. The percentages of exact predictions for all samples of a class

Discussion
In this study, a comparison of the expression pattern of the 3 major histological lung cancer subtypes, as meas-ured by array analysis, is presented. In comparison to the normal lung tissue, 205, 335 and 404 genes in AC, SCC and SCLC were found to be at least 2-fold differentially expressed. Fourteen genes of different gene families were corroborated using RT-PCR.
In AC, we found an up-regulation of keratin 7, a characteristic finding for pathologists to diagnose this subtype of lung cancer. On the other hand, keratin 5 was downregulated in AC. The differential expression is already described as a separator between AC and SCC, in line with our results [28,29]. Looking at adhesion molecules in AC, a down-regulation of the desmosomes desmoglein 3 and desmocollin 3 was found. In this context, it was shown that the invasive behavior of cells is inhibited when transfected with desmosomal components [30], suggesting that down-regulation of the desmosomes in adenocarci-  Corroboration of the results from microarray analysis using RT-PCR The extracellular cell matrix receptors integrin alpha-3 and integrin beta-2 as well as the collagen binding protein-1 (SERPHINH1) were upregulated in AC. These genes have a high affinity to collagen IV and laminin, both essential components of the basement membrane [31], possibly mediating adhesion and invasion. Additionally, we found intercellular adhesion molecule 1 (ICAM1), a cell-adhesion molecule also binding to integrin beta-2 and promoting metastasis due to tumor cell adhesion to endothelium overexpressed in AC [32,33].
Looking at the oncogenes in SCC, we found genes of the RAS associated gene family, the myc myelocytomatosis viral oncogene homolog (MYC) and musculoaponeurotic fibrosarcoma oncogene (MAF) upregulated. MAF encodes for nuclear transcriptional regulating proteins with a leucine zipper motif, and was identified in the genome of the acute transforming avian retrovirus AS42, which induces fibrosarcomas and has the ability to transform chicken embryo fibroblasts [34].
It is noteworthy that in SCC 5 members of the collagen family type I, V, VI, and XI were upregulated. An increased collagen synthesis might be associated with carcinogenesis, as in patients with breast cancer the emerging fibrotic focus is regarded as an indicator of tumor angiogenesis and independent predictor of early metastasis [35].
SCLCs show an up-regulation of 3 proto-oncogenes, which have not been described in this context so far. The DEK oncogene encodes for a 375 amino acid chromatin binding protein, which introduces supercoiling in DNA. It has been described to be upregulated in other tumor types, such as bladder cancer, glioblastoma, melanoma and leukemia [36]. The Qin oncogene, originally isolated from avian sarcoma virus, causes oncogenic transformation. Qin is the avian orthologue of mammalian brain factor-1 or forkhead box G1 (FOXG1B), a gene which belongs to the human forkhead-box gene family [37]. Possibly related to the neuroendocrine differentiation of SCLC, forkehead box G1 is essential for the proliferation and survival of cerebro-cortical progenitor cells [38]. Further, we found the Fyn oncogene upregulated in SCLC.
Fyn is a member of the src family which is activated in colorectal cancer, and has also been identified in melanoma cells with elevated cell motility and spreading ability [39,40].
With regard to adhesion molecules, the overexpressed neural cell adhesion molecule 1 is useful for the diagnosis of SCLC [41,42]. Next to neural cell adhesion molecule 1 we found other genes significantly upregulated such as the Purkinje cell protein 4, secretory granule neuroendocrine protein 1, synaptotagmin 1 and the neuronal cell adhesion molecule (NRCAM) that seems to reflect the neuronal heritage of this particular lung tumor subtype. NRCAM belongs to the L1 family immunoglobulin-like CAMs, which are involved in the guidance, growth and fasciculation of neuronal cells [43]. Neuronal cell adhesion molecule has also been described in 2006 by Taniwaki and colleagues', who performed comprehensive gene expression profiles of pure SCLC cells derived from laser-microdissected tissue samples [44]. In order to confirm the overexpression of the neuronal cell adhesion molecule using a different technique, we corroborated the result of microarray analysis using RT-PCR, showing a 9.3fold overexpression of the neuronal cell adhesion molecule in SCLC in comparison to lung tissue.
The imbalance of activated oncogenes and lost tumor suppressor genes, found in different types of lung cancer, may be associated with the different tumor growth kinetics. SCLC is the fastest growing lung tumor with a median tumor doubling time of 50 days [45]. This is reflected by our data with regard to the number and strength of upregulated cell cycle genes affecting growth rate. Several cyclines, their associated cyclin-dependent kinases and cell division cycle (CDC) genes controlling cell cycle progression, such as cyclin A2, B2 and E2, and cyclin-dependent kinase 2 and 4, as well as cell division cycle 2, 20 and 25B were upregulated [46]. The activation level of different cell cycle genes may be relevant with regard to new antitumor agents, which selectively target cell cycle proteins. For example, flavopiridol has the ability to induce cell cycle arrest by binding and inhibiting different cyclindependent kinase such as 2 and 4 [47,48]. Both CDKs are significantly upregulated in SCLC. On the other hand, the upregulation of cyclin-dependent kinase 2, that is critical for cell entry and progression through S phase of the cell cycle, is missing in NSCLC. Preclinical data support this finding since most NSCLC cell lines are resistant to flavopiridol-induced apoptosis unless they were treated during S phase. Furthermore, the IC 50 of flavopiridol-treated cells in SCLC cell lines is three times lower compared to NSCLC cell lines [49]. Consequently, this drug might be more promising in patients with SCLC.We have further shown that genes involved in mismatch repair, such as mutS homolog 2 or 6, were upregulated in SCLCs, which  is in line with other reports showing that these gene transcripts and proteins are present [50,51], in contrast to NSCLCs, where high resolution deletion mapping reveals frequent allelic losses at the DNA mismatch repair loci mutS homolog 3 [52]. Similar to the latter report, we have observed a downregulation of mutS homolog 3 in ACs.
After outlining potentially important molecular differences in different subtypes of lung cancer to normal lung tissue, we were interested in defining how many and which genes are necessary for correct classification of the lung tumor subtype. Using genetic programming (GP), a training set of 34 tissue samples was applied. With an evolutionary algorithm of GP, 40 genes were sufficient for a correct discrimination between all lung tumor tissue types and normal lung tissue. The 40 selected genes, identified using GP, were a subset of the genes, which were previously identified to be differentially expressed using cluster analysis. Following identification of the 40 genes with GP, further 13 tissue samples of previously treated patients NSCLC lung cancers were correctly classified with 100% prediction accuracy. It is important to note that the samples of the training set were from treatment naïve patients, while the test set came from those that were previously treated for their cancer using platinum-based chemotherapy. Nevertheless, the presented genes for distinction seem to maintain their value, independent from whether or not the patient had been treated. However, caution must be applied, since in the test set did not contain additional SCLC samples and larger sample size is needed which includes samples from all lung cancer subtypes in order to confirm the predictor.

Conclusion
Our data show the different gene expression profiles in dependence from the histological type of lung cancer, which reflects the specific biological characteristics of the respective tumor subtype. These data may form the basis for a molecular classification system and allows a further insight into the altered genomic progress of the lung cancer cell, which may help to develop molecularly targeted drugs.