Deep sequencing of gastric carcinoma reveals somatic mutations relevant to personalized medicine

Background: Globally, gastric cancer is the second most common cause of cancer-related death, with the majority of the health burden borne by economically less-developed countries.
Methods: Here, we report a genetic characterization of 50 gastric adenocarcinoma samples, using Affymetrix SNP arrays and Illumina mRNA expression arrays as well as Illumina sequencing of the coding regions of 384 genes belonging to various pathways known to be altered in other cancers.
Results: Genetic alterations were observed in the WNT, Hedgehog, cell cycle, DNA damage and epithelial-to-mesenchymal-transition pathways.
Conclusions: The data suggests that targeted therapies approved or in clinical development for gastric carcinoma would be of benefit to ~22% of the patients studied. In addition, the novel mutations detected here are likely to influence clinical response and suggest new targets for drug discovery.


Background
Despite the recent decline of mortality rates from gastric cancer in North America and in most of Northern and Western Europe, stomach cancer remains one of the major causes of death worldwide and is common in Japan, Korea, Chile, Costa Rica, the Russian Federation and other countries of the former Soviet Union [1]. Despite improvements in treatment modalities and screening, the prognosis of patients with gastric adenocarcinoma remains poor [2]. To understand the pathogenesis and to develop new therapeutic strategies, it is essential to dissect the molecular mechanisms that regulate the progression of gastric cancer, in particular the oncogenic mechanisms that can be targeted by personalized medicine.
The term "oncogene addiction", describing cancer cells that are highly dependent on a given oncogene or oncogenic pathway, was introduced by Weinstein [3,4]. The concept underpins the development of targeted therapies, which attempt to inactivate an oncogene critical to the survival of cancer cells whilst sparing normal cells, which are not similarly addicted.
Several oncogenes activated at high frequency in other cancers have also been shown to be mutated in gastric cancer. It follows that marketed therapeutics targeting these oncogenes would effectively treat a proportion of gastric carcinomas, either as single agents or in combination. In January 2010, trastuzumab was approved in combination with chemotherapy for the first-line treatment of ERBB2-positive advanced and metastatic gastric cancer. Trastuzumab is the first targeted agent to be approved for the treatment of gastric carcinoma, and an increase of 12.8% in response rate was seen with the addition of trastuzumab to chemotherapy in ERBB2-positive gastric adenocarcinoma [5,6]. It has been estimated that 2-27% of gastric cancers harbour ERBB2 amplifications and may be treated with ERBB2 inhibitors [7,8]. Similarly, overexpression of another receptor tyrosine kinase (RTK), EGFR, has been noted in gastric cancer and multiple trials of EGFR inhibitors in this cancer type are ongoing (reviewed in [9,10]). Furthermore, some gastric cancers harbour DNA amplification or overexpression of the RTK MET [11,12] and its paralogue MST1R [13] and may be treated with MET or MST1R inhibitors [14][15][16][17][18][19][20]. Finally, FGFR2 overexpression and amplification have been observed in a small proportion of gastric cancers (scirrhous) [21] and inhibitors have shown some efficacy in the clinic [22].
Reports of the frequency of different types of oncogenic activation and their co-occurrence are limited. In contrast to gastrointestinal stromal tumours (GIST), which are characterized by a high frequency of KIT and PDGFRA activation [38] and hence are effectively treated in the majority of cases by imatinib and sunitinib [39,40], gastric adenocarcinoma appears to be a molecularly heterogeneous disease with no high-frequency oncogenic perturbation discovered thus far. This is illustrated by a recent survey of somatic mutation in kinase coding genes across 14 gastric cancer cell lines and three gastric cancer tissues, which discovered more than 300 novel kinase single nucleotide variations and kinase-related structural variants. However, no very frequently recurrent mutation or mutated kinase was uncovered [41].
With the aim of elucidating the potential for treatment of gastric carcinoma with targeted therapies either on the market, in development or to be discovered, we have characterized clinical gastric carcinoma samples to detect oncogene activation.
We took a global approach by assaying the samples on Affymetrix SNP arrays and Illumina mRNA expression arrays. These technologies are well validated for detection of genotype, DNA copy number variation and mRNA expression profile, and they are amenable to heterogeneous clinical samples. The samples were also interrogated by second generation (Illumina) sequencing. Relatively novel second generation sequencing technologies offer both increased throughput and deep sequencing capacity. The latter is especially important for characterizing cancer samples, which tend to include a mixture of cell types including infiltrating normal cells, vasculature and tumour cells of different genotypes. In this study we utilized target enrichment and Illumina sequencing technology to sequence the coding regions of 384 genes. We decided to favour depth of coverage over wider coverage in order to capture mutations present in subpopulations within the tumours. Recent studies have shown that cancers tend to harbour many mutations in a smaller number of signalling pathways [42,43]; therefore we concentrated on genes in these pathways. We also included genes coding for proteins previously shown to affect response to targeted therapies and more likely to be successfully targeted by small molecule intervention, as our aim is to find more effective and novel ways of treating gastric carcinoma.

Tissue samples
DNA and RNA samples were obtained from hospitals in Russia and Vietnam according to IRB-approved protocols and with IRB-approved consent forms for molecular and genetic analysis. The medical centres themselves also have internal ethical committees, which reviewed the protocol and informed consent forms (ICFs). The samples were sourced through Tissue Solutions Ltd http://www.tissue-solutions.com/. For sample characteristics see additional file 1 table S1.

Arrays
Genotypes and copy number profiles were generated for each sample using 1 μg of DNA run on Affymetrix SNP V6 arrays using Affymetrix protocols. Copy number variation data was analysed within the ArrayStudio software http://www.Omicsoft.com. Data was normalized using the Affymetrix algorithm and segmented using circular binary segmentation (CBS). A transcript profile was generated for each sample using 1 μg of total RNA run on Illumina HG-12 RNA expression arrays following the Illumina protocols. Data was analysed within the Illumina GenomeStudio software http://www.illumina.com/software/genomestudio_software.ilmn. As a data pre-processing procedure, a probe set was only retained if it had a "present" call (i.e. two standard deviations above background) in at least one of the samples. Signal values of the remaining probe sets were transformed to a base-2 logarithmic scale and quantile normalization was performed. DNA copy number and RNA expression levels were integrated at the gene level within the ArrayStudio software http://www.Omicsoft.com. Pathway enrichment analysis was performed within the GeneGO MetaCore analysis suite http://www.genego.com/. All array data from this study is available in GEO http://www.ncbi.nlm.nih.gov/geo/ under series accession number GSE29999.
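The expression pre-processing described above (probe filtering on "present" calls, log2 transformation, quantile normalization) can be sketched as follows. This is a minimal illustration and not the GenomeStudio or ArrayStudio implementation: the function name `preprocess`, the input layout (probes in rows, samples in columns) and the handling of ties (ignored) are assumptions made for the example.

```python
import numpy as np

def preprocess(signal, present):
    """Filter probes, log2-transform and quantile-normalize (illustrative sketch).

    signal  : probes x samples array of positive raw intensities
    present : boolean array of the same shape, True where the probe was
              called "present" in that sample
    """
    signal = np.asarray(signal, dtype=float)
    present = np.asarray(present, dtype=bool)

    # Retain a probe only if it is "present" in at least one sample.
    keep = present.any(axis=1)
    logged = np.log2(signal[keep])

    # Quantile normalization: each value is replaced by the mean, across
    # samples, of the values that share its within-sample rank.
    order = np.argsort(logged, axis=0)
    ranks = np.argsort(order, axis=0)
    mean_by_rank = np.sort(logged, axis=0).mean(axis=1)
    normalized = mean_by_rank[ranks]
    return keep, normalized
```

After this step every sample (column) has the same distribution of values, which is the property quantile normalization is designed to enforce.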

Targeted deep DNA sequencing
5 μg of DNA was PCR-enriched for the coding exons of any known transcript of 384 genes of interest (additional file 2 table S2) using the Raindance platform http://www.raindancetechnologies.com/.
The resulting target libraries were sequenced using the Illumina GAII at a read length of 54 nt. Sequence reads were mapped to the reference genome (hg18) using the BWA program [44]. Bases outside the targeted regions were ignored when summarizing coverage statistics and variant calls. SAMtools was used to parse the alignments and make genotype calls [45], and any call that deviated from the reference base was regarded as a potential variant. The SAMtools package generates consensus quality and variant quality estimates to characterize the genotype calls. Accuracy of genotype calls was estimated by concordance with genotype calls from the Affymetrix 6.0 SNP microarray. Concordance matrices of samples based on both SNP and sequence data were generated to check for sample mislabelling (additional file 3 figure S1). Concordance and quantity of genotype calls were tabulated for thresholds of consensus quality, variant quality, and depth. The final set of variant calls was identified using a consensus quality greater than or equal to 50 and a variant quality greater than 0. To exclusively identify somatic changes, only those mutations present in the cancer sample and not detected in any of the normal samples were retained. As an additional filter for germline variants, all variants present in the dbSNP and 1000 Genomes polymorphism datasets were removed.
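The filtering logic in this paragraph (quality thresholds, removal of anything seen in a normal sample, removal of known polymorphisms) can be illustrated with a short sketch. It assumes the calls have already been parsed into simple records; the `Call` class, its field names and the function `somatic_candidates` are hypothetical and are not part of SAMtools. Whether the normal-sample calls were themselves quality-filtered before comparison is not stated here, so the sketch uses them as given.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Call:
    """One non-reference genotype call (hypothetical record layout)."""
    sample: str
    chrom: str
    pos: int
    alt: str
    consensus_q: int
    variant_q: int

    @property
    def site(self):
        return (self.chrom, self.pos, self.alt)


def somatic_candidates(tumour_calls, normal_calls, known_sites,
                       min_consensus_q=50, min_variant_q=0):
    """Keep tumour calls with consensus quality >= 50 and variant quality > 0
    that are absent from every normal sample and from the known-polymorphism set."""
    normal_sites = {c.site for c in normal_calls}
    kept = []
    for c in tumour_calls:
        passes_quality = (c.consensus_q >= min_consensus_q
                          and c.variant_q > min_variant_q)
        if passes_quality and c.site not in normal_sites and c.site not in known_sites:
            kept.append(c)
    return kept
```

Note that the comparison is against all normal samples pooled together, not only the matched normal, which mirrors the wording "not detected in any of the normal samples".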

Q-PCR
Q-PCR was performed via standard protocol using a Fluidigm 48*48 dynamic array. Firstly, a validation run was conducted using pooled control RNA from three specimens. Four input RNA amounts were tested (125 ng, 250 ng, 375 ng and 500 ng). Triplicate data points were obtained for the subsequent 10-point serial dilution for each condition and each assay. The best overall results were at 250 or 500 ng, which yielded efficiency values 85%. Therefore a 250 ng input amount was used for the experimental samples. Data was produced in triplicate and combined as the mean. CT values were converted to abundance using the standard formula abundance = 10^((40 - CT)/3.5). Test data was normalised to housekeepers using the analysis of covariance method, whereby the two housekeepers (GAPDH and beta-actin) were used to compute a robust score and the score was used as a covariate to adjust the other genes. Data analysis was performed in the ArrayStudio software.
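As a worked illustration of the CT-to-abundance conversion, the sketch below reads the formula as abundance = 10^((40 - CT)/3.5); that grouping of terms is an assumption, since the exponent layout is ambiguous in the text. The function and parameter names are hypothetical.

```python
def ct_to_abundance(ct, ct_max=40.0, cycles_per_decade=3.5):
    """Convert a Q-PCR cycle threshold to a relative abundance.

    Assumes abundance = 10 ** ((ct_max - ct) / cycles_per_decade), so a CT
    of 40 maps to an abundance of 1 and every 3.5 cycles earlier
    corresponds to a ten-fold higher abundance.
    """
    return 10 ** ((ct_max - ct) / cycles_per_decade)


# Example: lower CT values correspond to higher abundance.
for ct in (40.0, 36.5, 33.0):
    print(ct, ct_to_abundance(ct))
```

Under this reading, a sample reaching threshold 7 cycles earlier than another would be reported as roughly 100-fold more abundant.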

Sanger Sequencing
Genomic DNA PCR primers were ordered from IDT (Integrated DNA Technologies Inc, Coralville, Iowa). PCR reactions were carried out using Invitrogen Platinum polymerase (Invitrogen, Carlsbad, CA). 50 ng of genomic DNA was amplified for 35 cycles at 94°C for 30 seconds, 58°C for 30 seconds and 68°C for 45 seconds. PCR products were purified using Agencourt AMPure (Agencourt Bioscience Corporation, Beverly, MA). Direct sequencing of purified PCR products with sequencing primers was performed with the AB v3.1 BigDye-terminator cycle sequencing kit (Applied Biosystems, Foster City, CA) and sequencing reactions were purified using Agencourt CleanSeq (Agencourt Bioscience Corporation, Beverly, MA). The sequencing reactions were analyzed using a Genetic Analyzer 3730XL (Applied Biosystems, Foster City, CA). All sequence data were assembled and analyzed using CodonCode Aligner (CodonCode Corporation, Dedham, MA).

DNA and RNA amplification patterns across samples are consistent with previous studies
Consistent with most other human cancers, copy number changes occurred across the genomes of the 50 gastric cancer samples compared to matched normal samples (Figure 1). Large regions of frequent amplification were found at chromosomal regions 8q, 13q, 20q, and 20p. Known oncogenes MYC and CCNE1 are located in the 8q and 20p amplicons, respectively, and likely contribute to a growth advantage conferred by the amplification. These amplifications have been seen in prior studies in gastric cancer along with amplification of 20p for which ZNF217 and TNFRSF6B have been suggested as candidate driver genes [46].
Concordance between DNA copy number gain and RNA expression among the cancer samples was evaluated. The top 200 genes that were contained within a region of frequent high DNA copy number in cancer samples and that had high mRNA levels (compared to matched normal tissue) are tabulated in additional file 4 table S3. Most of the genes on this list are from chromosomal regions 20q and 8q, suggesting that these amplifications have the most effect on mRNA levels; in the minority are genes from 20p, 3q, 7p, and 1q. Figure 2 shows the RNA profiles, measured by Q-PCR, of an exemplar gene from each region, showing general overexpression in gastric cancer, particularly in certain samples. Besides MYC and CCNE1, there are multiple genes in these regions which could contribute to a growth advantage for the cancer cell. The biological pathways most significantly enriched for amplified and overexpressed genes are involved in regulation of translation (p = 0.000015) and DNA damage repair (p = 0.003). Samples with amplifications in these genomic regions are annotated in Figure 3. There is no discernible tendency for amplifications in these regions to co-occur or to be mutually exclusive. In agreement with a previous study [47], the PERLD1 locus was amplified (within the ERBB2 amplicon) in sample 08280 and MMP9 was overexpressed but not discernibly amplified. Focal DNA amplifications with concordant RNA expression of genes likely to affect the response to targeted therapies are also denoted in Figure 3; for examples of the underlying data, see additional file 5 figure S2.

Sequencing data shows high concordance with genotyping
Sequencing library preparation failed for six of the original 50 cancer samples and fourteen of the original matched normal samples. Therefore two more matched pairs were added to the analysis, resulting in a dataset of 44 cancer samples, 36 with matched normal pairs (additional file 1 table S1). The targeted region included 3.28 Mb across 6,547 unique exons in 384 genes (additional file 2 table S2). Median coverage across all samples was 88.3% and dropped to 74% when requiring a minimum coverage depth of 20. All sequencing was carried out to a minimum of 110x average read coverage across the enriched genomic regions for each sample. The reads were aligned against the human genome and variants from the reference genome were called. As a control, genotype calls from the Affymetrix V6 SNP arrays were compared with those from the Illumina sequencing. The regions targeted for sequencing contained 1005 loci covered by the Affymetrix V6 SNP arrays. With no filtering of the sequencing variant calls for quality metrics, the median agreement between the genotyping and sequencing results was 97.8% with a range of 65-99% (additional file 6a, Figure S3a). The raw overall genotype call concordance was 96.8%. Quality metrics were chosen to maximize the agreement between the genotyping and the sequencing calls while minimizing false negatives. The most informative metric was consensus quality, and a cut-off of ≥ 50 resulted in loss of about 10% of the shared genotypes but an overall 2% increase in concordance to 98.7% (additional file 6b, Figure S3b). Variant genotype calls were isolated for further concordance analysis. In this set, a variant quality threshold of > 0 increased the accuracy of variant genotype calls to 98.9% (additional file 6c, Figure S3c). When both quality thresholds were applied, the median sample concordance was 99.5% (additional file 6d, Figure S3d), which is within the region of genotyping array error.
Six samples (08362T1, 08373T2, 336MHAXA, 08337T1, 89362T2, DV41BNOH) had a concordance of < 98% and two of these (08393T2 and DV41BNOH) had a concordance of 82% and 88%, respectively. Therefore, with a consensus quality ≥ 50 and a variant quality > 0, the false positive rate was 0.5% and 1.6% for reference genotypes and variant genotypes, respectively (additional file 6e Figure S3e).
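The concordance comparison described above can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis code used in the study: array genotypes are assumed to be available as a mapping from locus to genotype, sequencing calls as a mapping from locus to a (genotype, consensus quality) pair, and the function name `concordance` is hypothetical.

```python
def concordance(array_calls, seq_calls, min_consensus_q=0):
    """Fraction of shared loci where the sequencing genotype matches the array.

    array_calls : {locus: genotype}
    seq_calls   : {locus: (genotype, consensus_quality)}
    Returns (number of loci compared, concordance); concordance is NaN
    when no locus passes the filter.
    """
    compared = 0
    agreeing = 0
    for locus, array_gt in array_calls.items():
        if locus not in seq_calls:
            continue
        seq_gt, quality = seq_calls[locus]
        if quality < min_consensus_q:
            continue  # filtered out: counts as a lost genotype, not a mismatch
        compared += 1
        agreeing += (seq_gt == array_gt)
    return compared, (agreeing / compared if compared else float("nan"))


# Toy data: raising the quality threshold drops one locus and improves concordance.
array_calls = {"rs1": "AA", "rs2": "AG", "rs3": "GG", "rs4": "CT"}
seq_calls = {"rs1": ("AA", 90), "rs2": ("AG", 75), "rs3": ("AG", 20), "rs4": ("CT", 60)}
print(concordance(array_calls, seq_calls))                      # unfiltered
print(concordance(array_calls, seq_calls, min_consensus_q=50))  # filtered
```

The toy example shows the same trade-off reported in the text: a stricter threshold reduces the number of comparable genotypes while increasing agreement among those that remain.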
From all single nucleotide changes passing the above thresholds, all variants present in any of the normal samples or in the polymorphism databases of dbSNP (v130) or 1000 Genomes were assumed to be germline variants and discarded. Variants present only in the exons of cancer samples were assumed to be somatic and retained. In total, 18,549 somatic variants were detected across all 44 samples (additional file 7 Table S4), of which 3357 were predicted to be exonic and nonsynonymous. To prioritise mutations with functional impact, we concentrated all further analyses on nonsynonymous mutations and highlighted mutations leading to loss or gain of stop codons. We applied the SIFT algorithm [48] to predict amino acid changes that are not tolerated in evolution and so are more likely to affect the function of the protein; 1509 somatic nonsynonymous mutations had a SIFT score of < 0.05. The rate of mutations with SIFT score < 0.05 per gene, corrected for CDS length, was calculated (Figure 4). The genes with the highest concentration of low SIFT-scoring mutations were S1PR2, LPAR2, SSTR1, TP53, GPR78 and RET, with S1PR2 being the most extreme. There are fifteen mutations with SIFT score < 0.05 across the 353 aa CDS of S1PR2, concentrated in nine samples. S1PR2, also known as EDG5, codes for a G-protein coupled receptor of S1P and activates the RhoGEF LARG [49]. Little is known of its role in cancer and somatic mutations have not been observed in the 44 tissues sequenced for S1PR2 in the COSMIC database [50].
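The per-gene rate of low-SIFT mutations corrected for CDS length can be illustrated as a simple count divided by coding length. The exact normalization used for Figure 4 is not spelled out, so the sketch below assumes mutations per amino acid of CDS; the function name, the input layout and the gene "GENE_X" with its length are hypothetical, while the S1PR2 figures (fifteen mutations, 353 aa) are taken from the text.

```python
from collections import Counter

def low_sift_rate(mutations, cds_length_aa, cutoff=0.05):
    """Rate of mutations with SIFT score below `cutoff` per amino acid of CDS.

    mutations     : iterable of (gene, sift_score) pairs
    cds_length_aa : {gene: CDS length in amino acids}
    """
    counts = Counter(gene for gene, score in mutations if score < cutoff)
    return {gene: counts[gene] / length for gene, length in cds_length_aa.items()}


# S1PR2: fifteen low-SIFT mutations across a 353 aa CDS (from the text),
# plus one tolerated mutation that is not counted.
mutations = [("S1PR2", 0.01)] * 15 + [("S1PR2", 0.40), ("GENE_X", 0.02)]
lengths = {"S1PR2": 353, "GENE_X": 1000}  # GENE_X is a made-up comparator
rates = low_sift_rate(mutations, lengths)
ranked = sorted(rates, key=rates.get, reverse=True)
```

With these inputs S1PR2 carries roughly 0.042 low-SIFT mutations per residue, which illustrates why a short gene with many such mutations ranks at the top once length is taken into account.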

Sequencing data is confirmed by Sanger sequencing
Some nonsynonymous somatic mutations were selected to be confirmed by Sanger sequencing. All mutations reported in blue in Figure 3 were confirmed by Sanger sequencing and were also confirmed to be somatic by sequencing of the wildtype sequence in the matched normal tissue (see additional file 8 Figure S4 for example sequencing traces). Although 74% were confirmed, some mutations detected in the Illumina sequencing were not confirmed as somatic mutations by Sanger sequencing. Sixteen of the 68 (24%) mutations we attempted to confirm were present in both the normal and the cancer sample; these are germline mutations that were not detected in any of the normal samples by Illumina sequencing and are also not represented in the dbSNP or 1000 Genomes data. Five of the sixteen germline mutations were from cancer samples with no matched normal tissue included in the dataset; the other eleven came from cancer samples with matched normal tissue sequence included in the dataset. This evidences a rate of germline contamination not eliminated by the matched normal controls or by the comparison to known polymorphism databases. It may be that the coverage of the substitutions in the normal tissue happens to be lower than in the cancer sample, so that some germline mutations remain despite the somatic filters. Two of the 68 (3%) mutations we attempted to confirm were not present in the normal or the cancer sample by Sanger sequencing. One cause could be false positives in the Illumina data due to artefact; however, additional file 6 Figure S3 shows the false positive rate to be low, at least for those variants represented on the Affymetrix V6 arrays. Another possibility is that these mutations are present in a subset of the sample below the sensitivity of the Sanger methodology but are detected by the Illumina sequencing.
Therefore, mutations reported only in the Illumina sequencing are also shown, in purple, in Figure 3. Some caution is warranted when interpreting these results, as they may be germline polymorphisms or present only in a subset of the tumour sample.
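The validation percentages quoted above follow from simple arithmetic on the 68 attempted confirmations. The short check below assumes that the confirmed count is the remainder after subtracting the germline and undetected cases; that count of 50 is inferred here and is not stated explicitly in the text.

```python
attempted = 68   # mutations submitted for Sanger confirmation
germline = 16    # present in both normal and cancer sample
absent = 2       # not seen in either sample by Sanger sequencing

# Assumption: everything else was confirmed as somatic.
confirmed = attempted - germline - absent

def percent(part, whole):
    return round(100 * part / whole)

summary = {
    "confirmed": percent(confirmed, attempted),
    "germline": percent(germline, attempted),
    "absent": percent(absent, attempted),
}
print(confirmed, summary)
```

The rounded values agree with the 74%, 24% and 3% reported in the text.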

Alterations in the RAS/RAF/MEK/ERK pathway
Three tumour samples had KRAS genetic alterations (Figure 3), suggesting a therapeutic opportunity for treatment with MEK inhibitors. One of these alterations is a G12D mutation. KRAS G12D mutations have been shown to initiate carcinogenesis and maintain tumour survival [51]. Amplification and overexpression of wildtype KRAS were seen in the other two samples. KRAS amplification has been observed before in 5% of primary gastric cancers. Gastric cancer cell lines with wildtype KRAS amplification show constitutive KRAS activation and sensitivity to KRAS RNAi knockdown [24]. A novel mutation in KRAS was also observed (in sample 08393); its functional consequence is unknown.
The PIK3CA mutation co-occurring with KRAS G12D is known to affect sensitivity to MEK inhibitors [25]; in addition, novel mutations observed in this study may also have consequences for the same class of therapeutics. For instance, KSR2 functions as a molecular scaffold to promote ERK signalling [52,53]. Therefore, mutations in KSR2, such as those seen in seven samples, may affect sensitivity to MEK inhibitors. A second example is ULK1, which positively controls autophagy downstream of mTOR [54] and is mutated in fourteen samples. Autophagy is increased along with ERK phosphorylation when gastric cancer cells are treated with a proteasome inhibitor [55]; therefore mutations in ULK1 may affect sensitivity to proteasome inhibitor treatments such as bortezomib, as a single agent or in combination with MEK inhibitors.

Alterations in the PI3K/AKT pathway
There was substantial sequence disruption of the phosphoinositide-3-kinase (PI3K) pathway genes in the sample set. There are a number of PI3K/AKT/mTOR inhibitors in clinical development and patients with activating mutations in the pathway are candidates for treatment [56]. PIK3CA mutations of known oncogenicity were found in four samples. This results in a frequency of PIK3CA hotspot mutation of 9%, slightly higher than previous estimates of 6% (12/185) [27] and 4.3% (4/94) [57]. The common PIK3CA hotspot mutations of known oncogenicity (E545K and H1047R) [58] were observed twice each. Another PIK3CA mutation, K111E, which has been observed before in four samples in COSMIC, was observed once, and potentially novel somatic mutations were observed in two more samples.
Five nonsynonymous AKT1 mutations were observed. Although AKT1 mutations are found in about 2% of all cancers, they mainly occur at amino acid 17 and the functional importance of mutation at other sites is unknown. Another nonsynonymous mutation, in AKT2, was observed in sample 08407. AKT2 mutations are much rarer than AKT1 mutations, although an AKT2 mutation has been observed before in gastric carcinoma, at a 2% frequency [59]. Finally, mutation of PTEN or MTOR may affect response to pathway inhibitors. Several PTEN mutations are noted and MTOR mutations are frequent.

Alterations in Receptor Tyrosine Kinases
The receptor tyrosine kinases (RTKs) and drug targets EGFR, ERBB2 and MET were each amplified (log2 > 0.6) and overexpressed at the RNA level in one cancer sample. It follows that the tumours may be sensitive to the inhibitors of the amplified RTKs. In addition, multiple nonsynonymous mutations are observed in their coding regions. Downstream mutations would be expected to influence response. For instance, in the MET amplified sample a truncating mutation in AKT3 may affect sensitivity to MET inhibitors.
FGFR2 is amplified and overexpressed at the RNA level in two samples; there are also multiple mutations in FGFR1-4. Broad-range RTK inhibitors, which target FGFRs among other kinases, may be efficacious in these patients [60,61].

Alterations in Cell Cycle Proteins
The viral oncogene homolog SRC is mutated in four of the tumour samples; two of the mutations are predicted to have a deleterious effect, including the introduction of a stop codon. This may contraindicate SRC inhibitors. MET amplification is also a known resistance marker for anti-SRC therapeutics such as dasatinib [62,63]. The cell cycle related kinase AURKA was amplified and overexpressed in one sample. AURKA inhibitors are in development for solid tumours [37] and may be indicated in this case. CCNE1 was amplified in two samples (08390 and 08357). High levels of CCNE1 have been shown to be frequently associated with early gastric cancer and metastasis, but expression levels do not correlate with survival [64,65]. High CCNE1 levels have been suggested as a sensitivity marker for the gene-directed pro-drug enzyme-activated therapies [66].

Activation of wnt pathway is common in the carcinoma samples
Mutations were observed in the APC gene in 22 samples. APC is a tumour suppressor known to inhibit CTNNB1 and wnt pathway signalling, amongst other effects [67]. The wnt pathway has previously been found to be frequently activated in gastric cancer [68]. We used a transcriptional signature, generated from previous studies [69,70] and available at the Broad Institute MSigDB database, to classify the study samples by their wnt transcriptional signatures. Figure 5A shows a heat map of the transcriptional levels of the wnt signature genes in the dataset. Activation of this pathway is higher in nearly all the cancer samples compared to the normal samples. Wnt inhibitors are the subject of intense investigation in pharmaceutical and academic research [71][72][73]. These results suggest they will have an indication in gastric cancer as well as in many other cancers.

Activation of the hedgehog pathway is also common in the carcinoma samples
PTCH1 is a tumour suppressor that acts as a receptor for the hedgehog ligands and inhibits the function of smoothened. When smoothened is freed, it signals intracellularly, leading to the activation of the GLI transcription factors [74]. Multiple somatic mutations of PTCH1 are recorded in COSMIC, consistent with its tumour suppressor role. The D362Y mutation seen in this study in sample FICJG is in the fourth transmembrane domain of PTCH1 and has previously been seen as a loss-of-function germline mutation in a patient with Gorlin syndrome, predisposing to neoplasms (numbered D513Y due to a different transcript) [75]. Therefore, sample FICJG is very likely to have deregulated hedgehog signalling and does indeed have high levels of GLI target genes (as defined by [74] (Figure 5B)). Other samples also contain PTCH1 mutations in the Illumina sequence data, including a truncating stop codon (Y140X) in sample 08379, and have high levels of hedgehog signature genes. Hedgehog signalling has previously been shown to be frequently activated in gastric cancer [76], though no genetic cause has previously been implicated. Inhibitors of the hedgehog pathway are in clinical development [77,78].

Loss of Epithelial phenotype
Epithelial or mesenchymal status has been shown to affect response to multiple drugs [79] and samples may be more resistant due to loss of an epithelial phenotype. Both hedgehog and wnt signalling upregulate mesenchymal precursors such as BMP4, and mutations can lead directly to loss of epithelial phenotype. CDH1 is a marker of an epithelial phenotype; it is often lost in gastric tumours due to the process of epithelial to mesenchymal transition (EMT) and is a negative prognostic marker [80]. Mutations in CDH1 were observed in nine samples, including a D254G mutation detected in sample 08359. A mutation at the same site (D254Y) has been recorded in COSMIC in a breast tumour, and 211 somatic mutations have been observed in the 2732 samples sequenced for CDH1 in COSMIC. Mutation in SMAD4 is also likely to affect epithelial phenotype. Loss of SMAD4 function facilitates EMT and its re-expression reverses the process in cancer cell lines [81]. Mutations in the tumour suppressor SMAD4 were observed in ten samples.

Sensitivity to chemotherapy
Multiple substitutions in BRCA1 were observed in ten samples, including three cases of substitution to a stop codon. Germline mutations in BRCA1 predispose patients to breast and ovarian cancer, and multiple somatic mutations have been found in tumours [82]. BRCA1 expression levels and polymorphic status have been shown to correlate with sensitivity to chemotherapeutics in gastric cancer [83,84]. Therefore, the observed mutations of BRCA1 may affect sensitivity to chemotherapy.
Another commonly mutated gene which is linked to sensitivity to chemotherapy in gastric cancer is TP53 [85]. Eight examples of TP53 mutation including two stop codons are seen in the dataset.
Mutations in TRRAP were found in 22 samples, including one mutation to a stop codon. TRRAP is a component of histone acetyltransferase complexes and is implicated in oncogenic transformation and cell fate decisions through chromatin regulation [86]. Loss-of-function mutations of the Schizosaccharomyces pombe orthologue of TRRAP cause defects in G2/M cell cycle control and resistance to CHK1 overexpression [87]. Mutations in TRRAP are likely to affect response to HDAC and CHK1 inhibitors currently approved and in trials for use as anticancer agents [88][89][90][91][92].

Novel targets for therapies in gastric cancer
An additional aim of our study was to uncover novel drug targets for gastric cancer. Many novel perturbations were observed in tractable target genes; the following are three examples which warrant further investigation.
Thyrotropin receptor (TSHR) is mutant in four samples. The A553T mutation of TSHR found in sample 08360 has previously been observed in two siblings with congenital hypothyroidism and was found to be inactivating [93]. Both loss- and gain-of-function TSHR mutations are often found in thyroid cancer [94]. However, a role for TSHR in other cancers has not been elucidated, although infrequent mutations in lung cancer are recorded in COSMIC and TSHR has been shown to be lost at the DNA level in some gastric cancers [95]. Three of the four TSHR mutations found have very low SIFT scores and may suggest deregulation of this growth hormone pathway.
We used the COPA algorithm [96] to identify mRNAs with outlier expression in the cancer samples. The top gene identified was KLK6. KLK6 is not detected, or is detected at very low levels, in the normal samples, whilst its expression is very high in eleven of the cancer samples. Figure 6 shows the expression profile of KLK6 across the samples, confirmed by Q-PCR. KLK6 has previously been shown to be overexpressed in gastric cancer, and RNAi-mediated knockdown of KLK6 in gastric cancer cell lines has been shown to be anti-proliferative and anti-invasive [97,98].
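For readers unfamiliar with COPA (cancer outlier profile analysis), the general idea is to centre each gene on its median, scale by its median absolute deviation (MAD), and rank genes by a high percentile of the transformed values, so that genes strongly overexpressed in only a subset of samples score highly. The sketch below illustrates that idea; the percentile used in this study is not stated, so the default of 90 and the function name `copa_scores` are assumptions.

```python
import numpy as np

def copa_scores(expr, percentile=90.0):
    """COPA-style outlier score per gene (illustrative sketch).

    expr : genes x samples array of expression values.
    Each gene is median-centred and divided by its MAD; the score is the
    requested percentile of the transformed values. Genes with a MAD of
    zero are left as NaN.
    """
    expr = np.asarray(expr, dtype=float)
    centred = expr - np.median(expr, axis=1, keepdims=True)
    mad = np.median(np.abs(centred), axis=1)

    scores = np.full(expr.shape[0], np.nan)
    ok = mad > 0
    if ok.any():
        scaled = centred[ok] / mad[ok, None]
        scores[ok] = np.percentile(scaled, percentile, axis=1)
    return scores


# Toy data: gene 0 is very high in three of ten samples, gene 1 varies smoothly.
expr = [
    [1, 1, 2, 2, 3, 3, 3, 20, 25, 30],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
]
scores = copa_scores(expr)
top_gene = int(np.nanargmax(scores))
```

Using the median and MAD rather than the mean and standard deviation keeps the centring and scaling from being dominated by the outlier samples themselves.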
Finally, mutations in the Rho associated coiled-coil containing protein kinases (ROCK1 and ROCK2) are interesting in view of their role as effectors of RhoA GTPase and the recent finding that truncating mutations in ROCK1 (similar to the confirmed ROCK2 mutation in this study) are activating and lead to increased motility and adhesion in cancer cells [99].

Discussion
Gastric adenocarcinoma rates vary widely across geographical regions, gender, ethnicity and time [100]. Diet has been shown to significantly influence gastric cancer risk, as have tobacco smoking and obesity [101]. The infectious agent Helicobacter pylori is intimately associated with the development of the most common types of gastric adenocarcinoma [102]. H. pylori colonizes the stomach of at least half the world's population. Virtually all persons infected with H. pylori develop gastric inflammation, which confers an increased risk of developing gastric cancer; however, only a fraction of infected individuals develop the clinical disease [103]. H. pylori induces generalized mutation and genomic instability in host DNA [104], which, along with the complex risk profile, suggests diverse routes to oncogenesis in gastric adenocarcinoma.
Therefore, a personalized medicine approach, measuring molecular targets in tumours and suggesting treatment regimens based on the results, is attractive. A recent study using this approach across tumour types has reported improved outcomes [105]. The trial used IHC, FISH and microarray technologies to assay levels of molecular targets in tumours. As the authors mention, second generation sequencing techniques offer a more complete picture of the tumour mutagenic profile and will be even more informative in identifying sensitivity and resistance biomarkers.

Conclusions
This study evidences previously observed perturbations of the KRAS, ERBB2, EGFR, MET, PIK3CA, FGFR2 and AURKA genes in gastric cancer and suggests that some of the targeted therapies approved or in clinical development would be of benefit to 11 of the 50 patients studied. The data also suggests that agents targeting the wnt and hedgehog pathways would be of benefit to a majority of patients. The previously undocumented DNA mutations discovered are likely to affect clinical response to marketed therapeutics and may be good drug targets. Detection of these mutations was enabled by Illumina sequencing, and the concordance with genotyping arrays shows its suitability for heterogeneous cancer samples. These "nextgen sequencing" techniques are just at the beginning of expanding our abilities to detect genome-wide DNA mutation, DNA copy number, RNA levels and epigenetic changes in each patient's genome. However, it remains a challenge to filter germline from somatic mutations and to sort driver mutations with functional import from passenger mutations.
Whole genome studies using both Sanger and nextgen sequencing have revealed mutagenic profiles of other cancers in unprecedented completeness and detail [41][106][107][108][109][110][111][112]. Similar studies with large numbers of samples will be critical to fully appreciate the mutagenic diversity in gastric cancer and identify the important driver mutations. Bodies such as the ICGC (International Cancer Genome Consortium) are currently collecting gastric adenocarcinoma samples.
Translation of these findings to the clinic will require pinpointing of important mutations as well as easier access to broad diagnostic assays and clinical development of agents targeting low-frequency events [113]. Data such as that presented here is a necessary preliminary step in delivering the maximum benefit from the major advances of targeted therapies and personalized medicine to gastric cancer patients.

Additional material
Additional file 1: Table S1: Sample characteristics.
Additional file 2: Table S2: List of genes sequenced.
Additional file 3: Figure S1: Concordance matrices of samples based on array and sequence data.
Additional file 4: Table S3: Top 200 genes with amplification at the DNA level and concordant overexpression at the mRNA level.
Additional file 5: Figure S2: Array data evidencing focal amplifications. Top panels show mRNA expression data from arrays, bottom panels show log2 value for DNA abundance in genomic context as derived from SNP arrays.
Additional file 6: Figure S3: Comparison of genotyping calls with sequencing data. A total of 1005 common loci were mapped between the Affymetrix 6.0 SNP microarray and the targeted regions. Concordance of genotype calls between Affymetrix 6.0 SNP and SAMtools with no filters applied (top left). Application of a consensus quality filter (threshold values plotted as points) improves concordance (y-axis) but reduces the total number of calls (x-axis) (top right). A similar trend is observed for the variant quality thresholds, but at different threshold values (plotted points) (middle left). Sample concordance of genotype calls is improved with consensus quality filter >= 50 and variant quality > 0 (middle right). The total number of genotype calls stratified by reference or variant genotype, and concordance (bottom left).
Additional file 7: Table S4: All somatic variants detected.
Additional file 8: Figure S4: Sanger sequencing traces. Sanger sequencing traces for variants denoted by blue boxes in Figure 3 (i.e. confirmed in Illumina and Sanger) are provided.