A strategy for detection of known and unknown SNP using a minimum number of oligonucleotides applicable in the clinical settings
© Wang et al; licensee BioMed Central Ltd. 2003
Received: 02 July 2003
Accepted: 20 August 2003
Published: 20 August 2003
Detection of unknown single nucleotide polymorphisms (SNP) relies on large-scale sequencing of genomic fragments or on complex high-throughput chip technology. We describe a simplified strategy for fluorimetric detection of known and unknown SNP by proportional hybridization to oligonucleotide arrays. The strategy optimizes the established principle of signal loss or gain and requires a drastically reduced number of matched or mismatched probes. The array consists of two sets of 18-mer oligonucleotide probes. One set includes overlapping oligos with 4-nucleotide tiling representing an arbitrarily selected "consensus" sequence (consensus-oligos); the other includes oligos specific for known SNP within the same genomic region (variant-oligos). Fluorescence-labeled DNA amplified from a homozygous source identical to the consensus represents the reference target and is co-hybridized with a differentially labeled test sample. Lack of hybridization of the test sample to consensus-oligos with simultaneous hybridization to variant-oligos designates a known allele. Lack of hybridization to both consensus- and variant-oligos indicates a new allele. Detection of unknown variants in heterozygous samples depends upon fluorimetric analysis of signal intensity, based on the principle that homozygous samples generate twice the amount of signal. This method can identify unknown SNP in heterozygous conditions with a sensitivity of 82% and a specificity of 90%. This strategy should dramatically increase the efficiency of SNP detection throughout the human genome and decrease the cost and complexity of applying genome-wide analysis in the context of clinical trials.
The human genome project provides the first reference sequence of all human chromosomes, with the remaining challenge of characterizing the frequency of deviations from this reference among individuals. It is estimated that 1.42 million SNP are distributed throughout the human genome and that about 60,000 SNP fall within exons. Detection of SNP due to genetic variation in a given population (polymorphisms) or to changes acquired throughout life (mutations) is important since such variation often has functional implications. In fact, 25% of the known non-synonymous SNP could affect the function of the corresponding gene product [3–6]. Yet it is still unclear whether the prevalence of common diseases can be truly attributed, at least in part, to SNP, because the information available about SNP prevalence throughout the genome is incomplete. The completion of the human genome project could not provide comprehensive knowledge about sequence variations because the sequences are based on information derived from randomly chosen individuals [1, 2], and there are only a few examples of systematic searches for genetic variants within a specific genomic region. However, in the context of clinical research a large number of individuals may need to be screened when investigating associations between genetic variation and disease susceptibility. In such an endeavor, a tool capable of efficiently identifying known SNP and flagging unknown SNP could dramatically increase the efficiency of the study of human pathology through direct application of genome-derived information.
Known SNP can be readily detected using oligo-array-based techniques [9–12] or comparable high-throughput systems [14, 15]. Detection of unknown SNP, however, is not as readily achievable because most current methodologies rely on probes encompassing only known variant sequences [16, 17]. Thus, identification of unknown SNP has relied on high-throughput sequencing, which is burdened by high cost and demanding requirements for sample preparation. To improve the efficiency of SNP detection, high-density oligonucleotide arrays have been proposed that cover all possible sequence permutations of the genomic region of interest [7, 9, 18, 19]. These arrays are characterized by extreme accuracy, not only in detecting SNP but also in providing definitive sequence information about them. However, for each genomic region a complex SNP array needs to be assembled, as in the 4L design (L being the length of the region in nucleotides). These arrays are composed of oligomer probes that query sequential positions in the genome, each probe overlapping the previous one with an offset of one base. For each position a set of four oligos is prepared that are identical except at a single, generally central, position, which is systematically substituted with each of the four nucleotides. Thus, for a given genomic region, a number of oligos equal to four times the number of bp investigated is placed on the array. To query a 16,569-base pair (bp) sequence, 66,276 probes were necessary. Similarly, others have investigated the BRCA1 and ATM genes using 96,600 and >90,000 oligonucleotides for genomic regions encompassing 3,450 and 9,170 bp, respectively [10, 19]. Although this approach could potentially cover the full genome, it might not be justified for genomic areas with no polymorphism. In addition, preparation of these arrays would be disproportionate for genomic areas with a very low density of SNP.
In those cases it would be preferable to obtain more information about the location of highly polymorphic sites prior to the design of 4L tiled arrays or other comprehensive high-throughput systems. Finally, this approach would not be justifiable in situations where SNP occur extremely rarely in a given population. In those cases a tool that could identify the rare individuals carrying SNP could indicate the few instances where routine sequencing of a limited genomic region would be more appropriate than the preparation of complex high-density arrays. Thus, a simplified screening tool that could discriminate conserved from polymorphic genomic regions, or identify rare individuals carrying unusual SNP, could dramatically restrict the use of high-throughput sequencing or guide the production of high-density 4L tiled arrays.
We describe here a strategy for the screening of genomic regions that utilizes the well-established principle of loss or gain of hybridization signal [9, 19]. It requires ~250 overlapping oligos to cover a 1 kb consensus sequence (consensus oligos), or 4,142 rather than 66,276 oligos to cover a 16,569 bp genomic region as in the previous example. Thus, the proposed strategy should be considered a screening tool applicable to the investigation of unexplored areas of the human genome prior to extensive sequencing or the construction of high-density arrays. In addition, in situations where allelic variation is already known, the array can be complemented by a number of oligos equal to the number of known SNP within that region. This number, therefore, is proportional to the degree of pre-documented polymorphism of a given genomic fragment. This strategy proposes fluorimetric detection of SNP by proportional hybridization to oligonucleotide arrays. The reference sample, from a homozygous cell line identical to the consensus (a,a), and the test sample are amplified by PCR followed by in vitro transcription to generate single-stranded RNA. Array data generated from hybridization of fluorescence-labeled reference (e.g. Cy3, green) and test (e.g. Cy5, red) cDNA samples to consensus oligos and to additional oligos representing known SNP (variant oligos) are compared and represented as the natural log of the fluorescence intensity ratio (LogRatio). In diploid organisms, four types of combination can occur: I) homozygosity identical to the consensus (a,a); II) homozygosity different from the consensus (b,b); III) heterozygosity with one allele identical to and one different from the consensus (a,b); IV) heterozygosity with both alleles different from the consensus (b,c).
Although this conceptually applies to whole genes, in loci containing more than one polymorphic site, this distinction applies to regions investigated by individual oligos; while, for the whole gene various combinations can simultaneously occur. Thus, in this paper we will refer to the various combinations by specifying ad hoc whether we are referring to the whole gene or a specific region of the same gene.
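The probe counts quoted above can be reproduced with simple arithmetic. The following is a minimal sketch, assuming that the 4L design uses four probes per queried base and that the tiled design uses one consensus oligo every 4 nucleotides plus one variant oligo per known SNP. The function names are illustrative, and end effects related to probe length are ignored.

```python
def probes_4l(region_bp: int) -> int:
    """4L design: four probes (A, C, G or T at the query base) per position."""
    return 4 * region_bp


def probes_tiled(region_bp: int, step: int = 4, known_snps: int = 0) -> int:
    """Consensus oligos tiled every `step` nt, plus one variant oligo per known SNP.

    This is an approximation: it ignores the few positions lost at the
    end of the region because of the 18-nt probe length.
    """
    return region_bp // step + known_snps


print(probes_4l(16569))     # 66276 probes for the 4L array
print(probes_tiled(16569))  # 4142 consensus oligos with 4-nt tiling
print(probes_tiled(1000))   # 250 consensus oligos for a 1 kb region
```

Under these assumptions the tiled design needs roughly one sixteenth of the probes of a 4L array for the same region, before any variant oligos are added.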
In the case of a,a homozygosity, similar fluorescence intensity is expected in both channels, with a theoretical Cy5/Cy3 fluorescence intensity ratio of 1 (LogRatio = 0). This can be tested experimentally by selecting genomic fragments within the investigated region that, based on available information, are most likely conserved in every potential test sample (displayed in yellow in Figure 1). In this region, reference and test samples can be predicted to be a,a homozygous. Consistent deviation of the LogRatios of these oligos from 0 denotes a labeling bias of the test or reference target. Thus, the average LogRatio of these oligos is used as a normalization constant (k) to correct the bias between the two channels in the rest of the data set. The unlikely occurrence of a SNP within the constant region could still be detected, since in such cases the LogRatio of one oligo would diverge from those of the other constant-region oligos.
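The normalization step can be illustrated as follows. This is a minimal sketch, assuming that the correction is applied by subtracting k from every LogRatio (the text states only that k is used as a normalization factor). The oligo names and intensities are invented for illustration.

```python
import math
from statistics import mean


def log_ratio(cy5: float, cy3: float) -> float:
    """Natural log of the test (Cy5) over reference (Cy3) intensity."""
    return math.log(cy5 / cy3)


def normalize(log_ratios: dict, constant_oligos: list) -> dict:
    """Subtract k, the mean LogRatio of the constant-region oligos.

    Subtraction in log space is an assumption of this sketch.
    """
    k = mean(log_ratios[name] for name in constant_oligos)
    return {name: value - k for name, value in log_ratios.items()}


# Hypothetical intensities: the Cy5 channel reads 20% too bright overall.
raw = {
    "const_1": log_ratio(1200, 1000),
    "const_2": log_ratio(1200, 1000),
    "query_1": log_ratio(600, 1000),
}
corrected = normalize(raw, ["const_1", "const_2"])
print(corrected)
```

After correction the constant-region oligos sit at 0, and the query oligo reflects a true two-fold reduction in test signal rather than the apparent 0.6-fold ratio.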
Since human genomes are diploid, polymorphisms can occur in combinations and, therefore, detection of SNP should be possible in the context of heterozygosity. This discrimination can be achieved through fluorimetric assessment of the hybridization pattern, based on the general principle that a homozygous sample (two identical alleles) will generate twice as much signal for a given sequence as a heterozygous sample (Figure 3, rows III-VII). In the context of heterozygosity, hybridization of a single allele to variant oligos generates a weaker signal than in the homozygous condition, resulting in a lower LogRatio (specific hybridization over background; Figure 3, row III-V Va1 and row IV Va2). Competitive hybridization to consensus oligos will lead to two patterns. In a,b heterozygosity, one allele of the test sample will hybridize to the consensus, resulting in a LogRatio depression of lesser magnitude (portrayed as lighter green in the digital image) than in b,b homozygosity, where neither allele of the test sample hybridizes to the consensus (Figure 3, row III and VII). In b,c heterozygosity neither allele hybridizes to consensus oligos and, therefore, a situation similar to b,b homozygosity occurs in the consensus oligos, with strongly depressed LogRatios (Figure 3, row IV-VI).
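The expected behavior of a consensus oligo under the four combinations can be written down for an idealized case. The sketch below assumes equal amounts of labeled test and reference target, a reference that is homozygous for the consensus, and signal strictly proportional to the number of matching alleles. The `background` value is a hypothetical floor standing in for nonspecific signal and is not taken from the study.

```python
import math


def expected_consensus_logratio(matching_alleles: int, background: float = 0.05) -> float:
    """Idealized LogRatio (ln Cy5/Cy3) at a consensus oligo.

    matching_alleles: test-sample alleles identical to the consensus at
    this oligo (2 for a,a; 1 for a,b; 0 for b,b or b,c).
    background: hypothetical residual test/reference ratio when nothing
    hybridizes specifically.
    """
    if matching_alleles not in (0, 1, 2):
        raise ValueError("a diploid sample has 0, 1 or 2 matching alleles")
    ratio = max(matching_alleles / 2, background)
    return math.log(ratio)


for genotype, n in [("a,a", 2), ("a,b", 1), ("b,b or b,c", 0)]:
    print(genotype, round(expected_consensus_logratio(n), 2))
```

In this idealization a,a gives a LogRatio of 0 and a,b gives ln(0.5), about -0.69, while b,b and b,c fall to whatever the background allows. Real data will deviate from these values because mismatched targets can still hybridize partially.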
Occurrence of more than one SNP within an oligo could complicate the data analysis. For instance, in the case described in Figure 5d, two regions with a double mismatch are observed which are characterized by LogRatios disproportionately high for the HLA-A*0201/2901 heterozygous state (p and q depict oligos 13-SP-A-02 and 3-SP-A-2901 respectively, specific for the two alleles). In conditions of HLA-A*0201/2901 heterozygosity, consensus oligos corresponding to polymorphism p representing A*0201, strongly hybridized to the reference sample resulting in a b,b homozygosity hybridization pattern. In the case of polymorphism q representing A*2901 variant, the same A*0201 region is identical to the consensus and, therefore, a,b heterozygosity pattern is observed.
An unexplained finding was observed in which specific hybridization to a variant oligo was not associated with depressed LogRatios of the corresponding consensus oligo (z in Figure 5d). A purposeful mismatch, a C → A switch, may have fewer repercussions on the hybridization pattern of this oligo. Indeed, even in homozygous conditions (z in Figure 6) the LogRatios are only minimally depressed in this case. In these relatively rare occurrences (the only case in our study) an "unknown" polymorphism would not be detected.
We, therefore, describe a potentially powerful and efficient strategy for high-throughput screening of genes for which little is known about their polymorphism. This strategy could also be used to identify mutations in disease processes or for typing known allelic variants of well-characterized genes such as HLA. This, however, would require specialized design of numerous oligos encompassing known variants and supportive software for efficient data interpretation. Various scenarios are best exemplified by using as a model a highly polymorphic region of the human genome such as exon 2 of the HLA-A locus. The simplest case would be the investigation of a gene characterized by minimal or no polymorphism. In this case, screening of samples from different ethnic groups would yield a pattern described as a,a type homozygosity, similar to the one shown in Figure 8a. Another possibility would be the investigation of a gene with few but relatively common polymorphism(s). In this case the occurrence of b,b type homozygosity would be common as depicted in Figure 8b. In the same situation, a,b type heterozygosity would also frequently occur as shown in Figure 8c. Finally, a most complex scenario, likely to occur only for genes characterized by high polymorphic prevalence is portrayed in Figure 8d. In this case, b,c type heterozygosity should occur frequently as it might be expected for the HLA loci. However, SNP occur in the human genome on average every 600 – 2,000 bases [1, 2]. Therefore, most genes are characterized by a relatively narrow range of polymorphism that would allow a relatively simple design of oligo-array chips and interpretation of results. Independently of the genomic region investigated, this strategy can identify unknown variants through observation of disproportionably depressed LogRatios in consensus oligos.
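The interpretation logic for a single region can be sketched as a simple decision rule. This is an illustration only: the cutoff of -0.35 and the function name are hypothetical and are not the thresholds used in the study, and a practical implementation would need to account for replicate spots and neighboring overlapping oligos.

```python
def call_region(consensus_logratio: float, variant_hybridized: bool,
                depressed_cutoff: float = -0.35) -> str:
    """Classify one region from its normalized consensus LogRatio.

    depressed_cutoff is a hypothetical threshold below which the
    consensus signal is treated as depressed.
    """
    if variant_hybridized:
        # Test sample binds an oligo designed for a documented SNP.
        return "known variant"
    if consensus_logratio <= depressed_cutoff:
        # Signal lost on the consensus and not recovered on any variant oligo.
        return "candidate unknown variant"
    return "consensus-like"


print(call_region(0.02, False))   # consensus-like
print(call_region(-0.70, True))   # known variant
print(call_region(-0.70, False))  # candidate unknown variant
```

Regions flagged as candidate unknown variants would then be the ones referred for confirmation by sequencing.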
The SNP detection system described here may provide a great improvement in the ability to screen different genes for the frequency and location of polymorphic sites, which can be confirmed by site-directed sequencing limited to the region of interest. Thus, the best application of this strategy stems from the clinical need to rapidly segregate genes according to the presence or lack of polymorphisms in their coding or regulatory regions that may affect clinical behavior. A good example of such an application is the screening of cytokines, chemokines and their receptors, whose polymorphism(s) have been associated with individual predisposition to immune pathology, survival of transplanted organs and predisposition to cancer [16, 22–25].
Material and methods
Oligonucleotide probe design
The HLA-A locus exon 2 region from position 73–346, allele A*01011, was used for the design of consensus oligos (Figure 1). Allele-specific (variant) oligos were designed based on single or double nucleotide variants according to alignment to the arbitrarily selected consensus sequence (HLA-A*0201). Variant and consensus oligos consisted of 18-mers with a 5' amino-modifier having a six-carbon spacer for immobilization (Operon Technologies, Inc, Alameda, CA). The polymorphic site in the variant oligos was placed in the centermost position. Melting temperatures of oligo probes were kept as close as possible to a range of 56–60°C. Oligo probes at 350–400 pmol/μl, synthesized in 96-well format, were dried and re-suspended in 3 × SSC for printing.
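The probe layout described above can be sketched in a few lines. This is a minimal illustration, assuming 18-mers tiled with a 4-nucleotide step and a variant base placed at one of the two central positions of the 18-mer. The sequence used is a placeholder of the same length as positions 73–346 (274 nt), not the HLA-A sequence, and the function names are hypothetical.

```python
def tile_probes(consensus: str, length: int = 18, step: int = 4) -> list:
    """Return (start, sequence) pairs for overlapping consensus oligos."""
    last_start = len(consensus) - length
    return [(s, consensus[s:s + length]) for s in range(0, last_start + 1, step)]


def variant_probe(consensus: str, snp_index: int, alt_base: str, length: int = 18) -> str:
    """Build an oligo carrying alt_base at a central position (0-based index 9)."""
    start = snp_index - length // 2
    if start < 0 or start + length > len(consensus):
        raise ValueError("SNP too close to the end of the region")
    window = consensus[start:start + length]
    offset = snp_index - start
    return window[:offset] + alt_base + window[offset + 1:]


# Placeholder sequence with the length of positions 73-346 (274 nt).
placeholder = ("ACGT" * 69)[:274]
consensus_oligos = tile_probes(placeholder)
variant = variant_probe(placeholder, snp_index=100, alt_base="T")
print(len(consensus_oligos), consensus_oligos[0], variant)
```

With these settings a 274-nt region yields 65 consensus oligos, and each base is covered by up to five overlapping probes, so a single substitution is expected to affect several neighboring oligos.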
Oligonucleotide array printing and post process of slides
Probes were spotted onto a 3D-link activated slide for covalent immobilization (Motor roller) using OmniGrid robotic printer (Genemachine) with four printing pins picking up 0.25 μl of probe solution and depositing 0.6 nl per spot (TeleChem International, Inc.). Each spot was quadruplicated to minimize printing bias and test reproducibility. Spot diameter was 90–100 μm, spaced at 250 μm to prepare 4 × 16 × 6 spot/arrays. After printing, slides were kept in a sealed humidifier chamber at room temperature overnight and blocked with 50 mM ethanolamine, 0.1 M Tris, pH9 and 0.1% SDS at 50°C for 15 minutes followed with rinsing in water twice, washing with 4 × SSC/0.1% SDS 50°C for 45 minutes, rinsing with water briefly and centrifuging at 800 rpm for 3 minutes with micro-plate carriers. Arrays were then stored in a desiccator until use.
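The printing figures above imply the following back-of-the-envelope numbers. This sketch assumes that "4 × 16 × 6" denotes the total number of spots per array and that every probe is printed four times; both readings are interpretations of the text rather than stated facts.

```python
import math

pickup_nl = 0.25 * 1000   # 0.25 µl taken up per pin load, in nl
per_spot_nl = 0.6         # volume deposited per spot, in nl

# Upper bound on spots printed from one pin load (ignores dead volume).
spots_per_load = math.floor(pickup_nl / per_spot_nl)

spots_per_array = 4 * 16 * 6   # assumed to be the total spot count
replicates = 4                 # each probe spotted in quadruplicate
distinct_probes = spots_per_array // replicates

print(spots_per_load, spots_per_array, distinct_probes)
```

Under these assumptions one pin load covers just over 400 spots and the array holds 96 distinct probes, which would be consistent with probes synthesized in a 96-well format.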
Test and consensus reference samples
Genomic DNA was isolated from EBV-transformed B cell lines. Twelve heterozygous or homozygous samples were tested by oligonucleotide array hybridization and confirmed by sequence-based typing using the ABI Prism 3700-96 Analyzer.
Preparation of target nucleic acids
In order to generate single-stranded DNA, PCR products spanning Exon 1 to Exon 5 of the HLA-A locus were amplified with a T7 promoter attached to the 5' end, using the 5' primer T7-EX1A-6 (5'-AAACGACGGCCAGTGAATACGACTCACTATAGGCGCCAGACGCCGAGGATGGCC-3') and three 3' primers (3' EX 5-A 993-1 CAT TGC TGG CCT GGT TCT CC; 3' EX 5-A 993-2 CAT TGC TGG CCT GGT TCT CTT; 3' EX 5-A 993-3 CAT TGC TGG CCT AGT TCT CTT). Each PCR reaction contained 25 μl of HotStart PCR reagents (Qiagen, CA), 5 ng-0.5 μg of genomic DNA, 5 μl of 15 μM 5' primer, 5 μl of 15 μM 3' primer mix and H2O to a final volume of 50 μl. The reaction was cycled at 95°C for 10 minutes; then 4 cycles of 96°C for 35 seconds, 65°C for 45 seconds and 72°C for 3 minutes; 19 cycles of 96°C for 30 seconds, 60°C for 40 seconds and 72°C for 3 minutes; and 9 cycles of 96°C for 30 seconds, 55°C for 40 seconds and 72°C for 2 minutes. One μl of PCR product from each sample was analyzed using a Bioanalyzer on a DNA7500 chip (Agilent Biotechnology). Amplicons of approximately 2,000 bp were obtained from each sample. The PCR products were then precipitated with EtOH at room temperature and re-suspended in DEPC-treated H2O at a concentration of 0.1 μg/μl. In vitro transcription (IVT) was performed using an Ambion T7 Megascript Kit (Cat. #1334). For each sample, the following reaction mixture was made: 4 μl of each 75 mM NTP (ATP, GTP, CTP and UTP), 4 μl reaction buffer, 4 μl enzyme mix (RNase inhibitor and T7 phage polymerase) and 1 μg purified PCR product in 16 μl DEPC-treated H2O. The reactions were then incubated at 37°C for six hours to permit transcription. Amplified RNA was then purified using TRIzol reagent according to the manufacturer's instructions (GibcoBRL) and re-suspended in 40 μl of DEPC water. RNA concentrations were estimated using a Bioanalyzer on an RNA 6000 chip (Agilent Biotechnology).
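The reaction set-up above can be checked with a short calculation. This sketch uses only the volumes and cycle numbers stated in the text; the variable names are illustrative, and the totals are arithmetic consequences rather than values reported by the authors.

```python
# In vitro transcription mix, volumes in µl as listed in the text.
ivt_mix_ul = {
    "ATP (75 mM)": 4,
    "GTP (75 mM)": 4,
    "CTP (75 mM)": 4,
    "UTP (75 mM)": 4,
    "reaction buffer": 4,
    "enzyme mix": 4,
    "PCR product (1 µg) in DEPC-treated water": 16,
}

ivt_total_ul = sum(ivt_mix_ul.values())
ntp_final_mM = 75 * 4 / ivt_total_ul      # final concentration of each NTP
template_ug_per_ul = 1.0 / ivt_total_ul   # final template concentration

# PCR program: three blocks with decreasing annealing temperature.
pcr_total_cycles = 4 + 19 + 9

print(ivt_total_ul, ntp_final_mM, template_ug_per_ul, pcr_total_cycles)
```

The mix comes to 40 μl with each NTP at 7.5 mM, and the touchdown-style PCR program runs for 32 cycles in total.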
Target labeling and hybridization
Fluorescence-labeled single-stranded cDNA was generated by reverse transcription (RT) and used for hybridization. In the RT reaction, 4 μl of first strand buffer, 1 μl random hexamer (8 μg/μl; Boehringer Mannheim), 2 μl 10 × low T-dNTP (5 mM each of dATP, dCTP and dGTP, 2 mM dTTP), 2 μl 1 mM Cy-dUTP (Cy3 for reference sample or Cy5 for test sample unless otherwise specified), 2 μl 0.1 M DTT, 1 μl RNasin and 1.2 μg amplified RNA in 8 μl DEPC H2O were mixed and heated for five minutes at 65°C. This was followed by addition of 1 μl Superscript II (Life Technologies), 40 minutes of incubation at 42°C, addition of another 1 μl of Superscript II and 50 more minutes of incubation at 42°C. Reactions were stopped by addition of 2.5 μl 500 mM EDTA and 5 μl 1 M NaOH and heated to 65°C for 15 minutes to hydrolyze the RNA. Tris buffer (12.5 μl of 1 M) was added immediately to neutralize the pH, and the volume was raised to 70 μl by adding 35 μl of 1 × TE. The target solution was then applied to a Bio-6 column according to the manufacturer's instructions. The flow-through, mixed with 200 μl 1 × TE, was concentrated to 20–40 μl using a Microcon YM-30 column (Millipore) and further concentrated to 8 μl by vacuum centrifugation.
Cy3- and Cy5-labeled probes were combined (1:1 ratio), and 5 μl 20 × SSC, 0.5 μl 10% SDS and 0.5 μl of 4 mg/ml salmon sperm DNA were added to the probe for hybridization. The samples were then heated for two minutes at 99°C. The prepared probe mixture was applied to an array slide under a cover slip and hybridized at 47°C for different amounts of time as described in the text. Slides were washed sequentially with 4 × SSC, 2 × SSC with 0.1% SDS, 1 × SSC, 0.2 × SSC and 0.05 × SSC for one minute each and dried by centrifugation at 800 rpm for 3 minutes. The slides were then scanned for fluorescent signal using a GenePix 4000B scanner and the results analyzed using GenePix Pro3 software (Axon Instruments, Inc.).
- Wang DG, Fan J-B, Siao C-J, Berno A, Young P, Sapolsky R: Large-scale identification, mapping and genotyping of single-nucleotide polymorphisms in the human genome. Science. 1998, 280: 1077-1082. 10.1126/science.280.5366.1077.
- The International SNP Map Working Group: A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature. 2001, 409: 928-933. 10.1038/35057149.
- Cooper DN, Ball EV, Krawczak M: The human gene mutation database. Nucleic Acids Res. 1998, 26: 285-287. 10.1093/nar/26.1.285.
- Ng PC, Henikoff S: Accounting for human polymorphisms predicted to affect protein function. Genome Res. 2002, 12: 436-446. 10.1101/gr.212802.
- Collins FS, Brooks LD, Chakravarti A: A DNA polymorphism discovery resource for research on human genetic variation. Genome Res. 1998, 8: 1229-1231.
- Schafer AL, Hawkins JR: DNA variation and the future of human genetics. Nature Biotech. 1998, 16: 33-39. 10.1038/5412.
- Patil N, Berno AJ, Hinds DA, Barrett WA, Doshi JM, Hacker CR: Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science. 2001, 294: 1719-1723. 10.1126/science.1065573.
- Kwok PY: Genetic association by whole-genome analysis. Science. 2001, 294: 1669-1670. 10.1126/science.1066921.
- Chee M, Yang R, Hubbell E, Berno A, Xiaohua C, Stern D: Accessing genetic information with high-density DNA arrays. Science. 1996, 274: 610-614. 10.1126/science.274.5287.610.
- Hacia JG, Brody LC, Chee MS, Fodor SP, Collins FS: Detection of heterozygous mutations in BRCA1 using high density oligonucleotide arrays and two-colour fluorescence analysis [see comments]. Nat Genet. 1996, 14: 441-447.
- Saiki RK, Walsh PS, Levenson CH, Erlich HA: Genetic analysis of amplified DNA with immobilized sequence-specific oligonucleotide probes. Proc Natl Acad Sci U S A. 1989, 86: 6230-6234.
- Lockhart DJ, Dong H, Byrne MC, Folliette MT, Gallo MV, Chee MS: Expression monitoring of hybridization to high-density oligonucleotide arrays. Nature Biotechnol. 1996, 14: 1675-1680.
- Chen J, Iannone MA, Li M-S, Taylor JD, Rivers P, Nelsen AJ: A microsphere-based assay for multiplex single nucleotide polymorphism analysis using single base chain extension. Genome Res. 2000, 10: 549-557. 10.1101/gr.10.4.549.
- Tong AK, Ju J: Single nucleotide polymorphism detection by combinatorial fluorescence energy transfer tags and biotinylated dideoxynucleotides. Nucleic Acids Res. 2002, 30: e19. 10.1093/nar/30.5.e19.
- Kwok PY: High-throughput genotyping assay approaches. Pharmacogenomics. 2000, 1: 95-100.
- Turner D, Choudhury F, Reynard M, Railton D, Navarrete C: Typing of multiple single nucleotide polymorphisms in cytokine and receptor genes using SNaPshot. Hum Immunol. 2002, 63: 508-513. 10.1016/S0198-8859(02)00392-0.
- Guo Z, Gatterman MS, Hood L, Hansen JA, Petersdorf EW: Oligonucleotide arrays for high-throughput SNPs detection in the MHC class I genes: HLA-B as a model system. Genome Res. 2002, 12: 447-457. 10.1101/gr.206402. Article published online before print in February 2002.
- Hacia JG: Resequencing and mutational analysis using oligonucleotide microarrays. Nature Genetics. 1999, 21: 42-47. 10.1038/4469.
- Hacia JG, Sun B, Hunt N, Edgemon K, Mosbrook D, Robbins C: Strategies for mutational analysis of the large multiexon ATM gene using high-density oligonucleotide arrays. Genome Res. 1998, 8: 1245-1258.
- Adams SD, Barracchini KC, Simonis TB, Stroncek D, Marincola FM: High throughput HLA sequence-based typing utilizing the ABI prism 3700 analyzer. Tumori. 2001, 87: s41-s44.
- Swets JA: Measuring the accuracy of diagnostic systems. Science. 1988, 240: 1285-1293.
- Keen LJ: The extent and analysis of cytokine and cytokine receptor gene polymorphism. Transpl Immunol. 2002, 10: 143-146. 10.1016/S0966-3274(02)00061-8.
- McCarron SL, Edwards S, Evans PR, Gibbs R, Dearnaley DP, Dowe A: Influence of cytokine gene polymorphism on the development of prostate cancer. Cancer Res. 2002, 62: 3369-3372.
- Howell WM, Turner SJ, Bateman AC, Theaker JM: IL-10 promoter polymorphisms influence tumour development in cutaneous malignant melanoma. Genes Immun. 2001, 2: 25-31. 10.1038/sj.gene.6363726.
- Howell WM, Bateman AC, Turner SJ, Collins A, Theaker JM: Influence of vascular endothelial growth factor single nucleotide polymorphisms on tumour development in cutaneous malignant melanoma. Genes Immun. 2002, 3: 229-232. 10.1038/sj.gene.6363851.
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.