Pyrosequencing™ : A one-step method for high resolution HLA typing

While the use of high-resolution molecular typing in routine matching of human leukocyte antigens (HLA) is expected to improve unrelated donor selection and transplant outcome, the genetic complexity of HLA still makes the current methodology limited and laborious. Pyrosequencing™ is a gel-free, sequencing-by-synthesis method. In a Pyrosequencing reaction, nucleotide incorporation proceeds sequentially along each DNA template at a given nucleotide dispensation order (NDO) that is programmed into a pyrosequencer. Here we describe the design of an NDO that generates a pyrogram unique for any given allele or combination of alleles. We present examples of unique pyrograms generated from each of two heterozygous HLA templates, which would otherwise remain cis/trans ambiguous using standard sequencing-based typing (SBT) methods. In addition, we display representative data that demonstrate long read and linear signal generation. These features are prerequisites for high-resolution typing and automated data analysis. In conclusion, Pyrosequencing is a one-step method for high-resolution DNA typing.


Introduction
Solid organ transplantation and allogeneic stem cell transplantation currently represent a common treatment for end-stage organ failure and several hematological and non-hematological malignancies. Matching of patient and unrelated donor for human leukocyte antigen (HLA) molecules significantly decreases the probability of graft rejection, graft vs. host disease and transplant-related mortality [1]. However, the extensive diversity of the HLA genes makes the identification of matched donors extremely challenging. Although in several instances it might not be feasible to identify perfect matches, algorithms have been developed that allow identification of likely histocompatibility based on the molecular definition of individual alleles [2,3]. These algorithms grade mismatches according to the number of variant epitopes present between donor and recipient. As histocompatibility is inversely correlated with the number of mismatches, it is likely that sequence-based information that provides definitive information about HLA allele identity will become increasingly important in the future. High-resolution information about HLA allele identity is best achieved using sequencing-based methodology that can be performed using high-throughput automated systems [4]. Although significant advances have been made in resolution, automation, throughput and data analysis in DNA sequencing and other polymorphism analysis techniques, the search continues for more efficient methods that could resolve cis/trans ambiguities in highly polymorphic genetic systems such as the HLA genes. Currently, commonly used HLA molecular typing methods include sequence-specific oligonucleotide probes (SSOP), polymerase chain reaction (PCR) using sequence-specific primers (SSP) and sequence-based typing (SBT) [5]. Among them, SSOP solely exploits DNA hybridization and, therefore, results in the most cis/trans ambiguities.
SSP can solve ambiguous combinations if primers are designed to cover the genomic region where the ambiguity is present. In this case, amplification of the genomic region framed by two primers assures the occurrence in cis of these two regions. This strategy, however, requires a large number of primers to reach a desired resolution and cover various combinations of ambiguous sites within HLA loci. SBT provides by far the highest resolution and currently represents the gold standard for high-resolution DNA typing and novel allele discovery. In addition, recent advances have made it possible to perform SBT at a high-throughput level in routine HLA typing laboratories [4]. The biggest challenge that SBT of HLA alleles incurs is the resolution of intrinsic cis/trans ambiguities, which cannot be solved by SBT unless time-consuming cloning of individual genes is performed [6]. This is because nucleotide incorporation proceeds simultaneously along all DNA templates in an SBT reaction [7].
Pyrosequencing™ [9][10][11] is a real-time, sequencing-by-synthesis method catalyzed by four kinetically well-balanced enzymes: DNA polymerase, ATP sulfurylase, luciferase, and apyrase. It fundamentally differs from Sanger's sequencing method in the order of nucleotide incorporation. Each nucleotide is dispensed and tested individually for its incorporation into the nascent DNA strand. Each incorporation event is accompanied by release of pyrophosphate (PPi) in a quantity equimolar to the amount of nucleotide incorporated. ATP sulfurylase quantitatively converts PPi to ATP in the presence of adenosine 5' phosphosulfate. ATP then drives the luciferase-mediated conversion of luciferin to oxyluciferin, which generates visible light in amounts proportional to the amount of ATP. The light is detected by a charge-coupled device (CCD) camera and displayed as a peak in a pyrogram™. Each peak height is proportional to the number of nucleotides incorporated. Unincorporated dNTP and excess ATP are continuously degraded by apyrase. After the degradation is completed, the next dNTP is added and a new Pyrosequencing cycle is started. As the process continues, the complementary DNA strand is built up. To pyrosequence an unknown DNA sequence, a cyclic nucleotide dispensation order (NDO) is generally used. In each cycle of dATP, dGTP, dCTP and dTTP dispensation, at least one of the four dNTPs is incorporated into the growing strand while the non-complementary dNTPs are degraded by apyrase. When a DNA sequence is known, non-cyclic NDOs can be programmed with predictable pyrograms. The nucleotide sequence is determined from the order of nucleotide dispensation and the peak heights in the pyrogram.
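The relation between a dispensation order and its theoretical pyrogram can be illustrated with a minimal Python sketch. It is not the instrument software: the function names `theoretical_pyrogram` and `read_sequence` and the toy sequence `GGATTC` are assumptions introduced here for illustration only, and peak heights are idealized as integer counts of incorporated nucleotides.

```python
def theoretical_pyrogram(synth, ndo):
    """Idealized peak heights for one template.

    synth: the strand to be synthesized, written 5'->3' (toy input).
    ndo:   nucleotide dispensation order, one letter per dispensation.
    Each dispensed nucleotide is incorporated as long as it matches the
    next base(s) to be synthesized; otherwise the peak is zero.
    """
    pos = 0
    peaks = []
    for nt in ndo:
        run = 0
        while pos < len(synth) and synth[pos] == nt:
            run += 1
            pos += 1
        peaks.append(run)
    return peaks


def read_sequence(ndo, peaks):
    """Rebuild the synthesized sequence from the NDO and the peak heights."""
    return "".join(nt * height for nt, height in zip(ndo, peaks))


toy = "GGATTC"                      # hypothetical sequence
cyclic = "AGCT" * 3                 # cyclic NDO: dATP, dGTP, dCTP, dTTP
cyclic_peaks = theoretical_pyrogram(toy, cyclic)
known_peaks = theoretical_pyrogram(toy, "GATC")  # non-cyclic NDO for a known sequence
print(cyclic_peaks)
print(known_peaks)
print(read_sequence(cyclic, cyclic_peaks))
```

With the cyclic order, most dispensations are negative (zero peaks), whereas the non-cyclic order tailored to the known sequence reaches the same end point in four dispensations.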
Based on the programmable nucleotide incorporation feature of Pyrosequencing, we set out to optimize Pyrosequencing for high-resolution HLA DNA typing. Here we describe the design of an NDO that generates a pyrogram that is unique for any given allele or combination of alleles. We present unique pyrograms generated from each of the heterozygous HLA templates that would otherwise be cis/trans ambiguous using sequencing-based typing (SBT) methods. We also present representative data that demonstrate long read and linear signal generation. These features are prerequisites for high-resolution typing and automated data analysis. In conclusion, Pyrosequencing can be used as a one-step method for high-resolution DNA typing and could be applied in several settings, ranging from HLA typing in support of donor/recipient selection to complementing comprehensive immunogenetic profiling in clinical settings where other aspects of immune polymorphism need to be explored [8].

Design of a nucleotide dispensation order (NDO) that generates a unique pyrogram for any allele or combination of alleles
Two types of nucleotide dispensation can be used to pyrosequence a homozygous HLA template. An in-phase dispensation results in incorporation of nucleotides into all templates at the same base pair position(s). A negative dispensation results in no incorporation of any nucleotide, generating background signal (zero peak) only. Introducing negative dispensations at different positions results in different pyrograms from the same homozygous template [11]. In addition to in-phase and negative dispensations, it is possible to exploit out-of-phase dispensation to pyrosequence heterozygous DNA templates. Out-of-phase dispensation results in nucleotide incorporation along one allele only, which puts the sequencing reaction on that allele ahead of the other. Nucleotide incorporation can become in-phase again at various downstream positions, which can be controlled by the NDO. Figure 1 shows five NDOs designed to sequence heterozygous genomic regions of the HLA class II locus DRB1. In this case, the goal is to differentiate the DRB1*11011, 13011 combination (black bars) from the DRB1*0319, 1320 combination (red bars), whose sequences differ only at positions 5'-298-299-3'. All NDOs start with nucleotide incorporation at position 5'-286-3' and end at or after nucleotide incorporation at position 5'-299-3' of both alleles. Each pyrogram peak represents the sum of nucleotide incorporation at each nucleotide dispensation step into all DNA templates in the same reaction mixture, in either in-phase or out-of-phase fashion. NDO 1 requires the fewest nucleotide dispensations, but it generates the same theoretical pyrogram from both templates. NDO 2, which is a typical cyclic NDO, generates unique theoretical pyrograms from each template, but it requires more nucleotide dispensations than the other four NDOs, partly due to the inclusion of four negative dispensations. NDO

Figure 1 NDO 1 to NDO 5 are designed to pyrosequence these two pairs of heterozygous templates. The X-axis represents the NDO. G, A, T and C represent the dNTP that is dispensed at each step. The numbers represent the dispensation step; e.g., the first step dispenses dATP. The Y-axis represents the theoretical peak height, shown as the number of nucleotides incorporated into the two molecules of template alleles at each nucleotide dispensation step. Black peaks represent signals generated from DRB1*11011, 13011. Red peaks represent signals generated from DRB1*1111, 1306.
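The effect of in-phase versus out-of-phase dispensation on a heterozygous sample can be sketched in a few lines of Python. This is an illustrative model only: the function `combined_pyrogram`, the four toy 5-base sequences and both dispensation orders are assumptions made for this example and are not the DRB1 sequences or the NDOs of Figure 1. The two toy allele pairs carry the same bases at every position when read as a mixture, so they mimic a cis/trans ambiguity.

```python
def combined_pyrogram(alleles, ndo):
    """Sum of idealized incorporations over all templates in one well.

    alleles: list of strands to be synthesized (5'->3'), one per template.
    ndo:     nucleotide dispensation order.
    Each allele keeps its own position, so alleles can run out of phase.
    """
    positions = [0] * len(alleles)
    peaks = []
    for nt in ndo:
        total = 0
        for i, synth in enumerate(alleles):
            while positions[i] < len(synth) and synth[positions[i]] == nt:
                positions[i] += 1
                total += 1
        peaks.append(total)
    return peaks


# Two hypothetical genotypes: same base mixture at sites 1 and 5,
# but a different cis/trans arrangement of the two polymorphic sites.
pair_a = ["ACGTG", "TCGTA"]
pair_b = ["ACGTA", "TCGTG"]

in_phase_ndo = "ATCGTGA"          # both alleles are re-synchronized after step 2
out_of_phase_ndo = "ACGTGACGTGA"  # allele starting with A is driven ahead first

same = combined_pyrogram(pair_a, in_phase_ndo) == combined_pyrogram(pair_b, in_phase_ndo)
print("in-phase NDO distinguishes the pairs:", not same)
print(combined_pyrogram(pair_a, out_of_phase_ndo))
print(combined_pyrogram(pair_b, out_of_phase_ndo))
```

In this toy case the in-phase order yields identical pyrograms for both genotypes, while the out-of-phase order produces pyrograms that differ at several peaks, which is the behaviour the NDO design aims for.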

Pyrosequencing resolves intrinsic sequencing based typing (SBT) cis/trans ambiguity
Although high-resolution SBT of HLA alleles provides the highest resolution, it cannot effectively solve many intrinsic cis/trans ambiguities unless coupled with time-consuming cloning and sequencing of individual clones. The sequence difference between the two heterozygous templates at position 5'-298-299-3' described in Figure 1, for example, is a commonly encountered SBT ambiguity. In an effort to solve this SBT ambiguity, we tested whether the experimentally obtained pyrogram matched the theoretical pyrogram predicted by NDO 5. Figure 2 further illustrates NDO 5 step-wise. We chose to place the 3' end of the Pyrosequencing primer just upstream of another polymorphic site at 5'-286-3', designated the reference polymorphic site. An out-of-phase dispensation is designed at the very first nucleotide dispensation: T is incorporated into template DRB1*110101 but not DRB1*130101. The pyrogram output, as shown in Figure 3, demonstrates differential peak heights at all four theoretically different positions. Using the 11th peak, adjacent to and upstream of the first differential peak (the 12th peak), as a normalizer, we observed that the peak height ratios at peaks 12, 14, 16 and 18 closely correlated with the theoretical peak height ratios proposed in Figure 1 (NDO 5). The 12th peak deviates from the prediction by 18%.
The general principles for the design of an NDO can be summarized as follows: a primer is usually placed in proximity upstream of the reference polymorphic site, chosen to be the one closest to the ambiguous polymorphic site to be investigated. The first nucleotide dispensation is usually out-of-phase. As a result, an SBT ambiguity at one position is generally magnified into pyrogram differences at multiple peaks. This greatly enhances sensitivity and accuracy in detection of peak height differences. In our experience, ambiguities that cannot be solved by SBT within the HLA-DRB1 locus can be consistently solved by a unique Pyrosequencing NDO (Wang et al, unpublished results).
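The normalization step described above can be sketched as follows. The peak heights and the expected incorporation counts in this Python example are invented for illustration and are not the measurements of Figure 3; the helper names `normalized_ratios`, `percent_deviation` and `best_match` are likewise assumptions, and the least-squares comparison is one simple way to pick the better-fitting genotype, not necessarily the procedure used in this work.

```python
def normalized_ratios(heights, normalizer, targets):
    """Divide each target peak by the normalizer peak (keys are 1-based steps)."""
    ref = heights[normalizer]
    if ref <= 0:
        raise ValueError("normalizer peak must be positive")
    return {step: heights[step] / ref for step in targets}


def percent_deviation(observed, expected):
    """Absolute deviation of an observed ratio from its expected ratio, in percent."""
    return abs(observed - expected) / expected * 100.0


def best_match(observed, candidates):
    """Return the candidate whose expected ratios are closest (least squares)."""
    def distance(expected):
        return sum((observed[s] - expected[s]) ** 2 for s in observed)
    return min(candidates, key=lambda name: distance(candidates[name]))


# Hypothetical raw peak heights at the normalizer (11) and differential peaks.
heights = {11: 20.0, 12: 11.0, 14: 21.0, 16: 9.0, 18: 19.0}
targets = [12, 14, 16, 18]
observed = normalized_ratios(heights, 11, targets)

# Hypothetical expected ratios (incorporations relative to the normalizer peak).
candidates = {
    "genotype_A": {12: 0.5, 14: 1.0, 16: 0.5, 18: 1.0},
    "genotype_B": {12: 1.0, 14: 0.5, 16: 1.0, 18: 0.5},
}
call = best_match(observed, candidates)
deviation_12 = percent_deviation(observed[12], candidates[call][12])
print(call, round(deviation_12, 1))
```

Because several peaks differ between the candidate genotypes, a moderate deviation at any single peak does not change the call in this toy example.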

Long read and linear signal generation facilitates automated data analysis
The ability to perform long Pyrosequencing reads (length of the genomic region investigated) is often necessary for reasonable throughput. It is essential for achieving high resolution when the reference polymorphic site downstream of the Pyrosequencing primer is distant from the ambiguous site. In addition to the optimized NDO, the PCR amplicons are designed to prevent background generation that could occur during a long Pyrosequencing reaction. The pyrogram shown in Figure 5 is an example of a linear and predictable reduction in signal generation with low background signal generation through 72 nucleotide dispensations. The background signal ranges from 2% to 11%, with an average of 6%, of the signals immediately upstream and downstream (Figure 6, bottom panel). The low background signal makes possible the discrimination of linear sequence-specific signals. One trend line is plotted against the signals generated from dATP (Figure 6, top panel). A similar trend line is plotted against the signals generated from dGTP, dCTP and dTTP (Figure 6, middle panel). The dATP trend line is plotted separately because its kinetics are slightly faster than those of the other three dNTPs. Note that both trend lines indicate a high confidence level, with R² greater than 95%. This linearity allows the extrapolation of the actual peak height relative to the dispensation point. Combining the two trend lines, the actual peak height can be extrapolated using the formula: "Extrapolated peak = [Split Height + (Slope × Disp#)] × Nuc#". The extrapolated peak heights deviate from the theoretical peak heights by only 0% to 20%, averaging 4.3%. This algorithm offers a powerful aid to automated data analysis of Pyrosequencing results.
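The trend-line extrapolation can be sketched in Python under explicit assumptions. Here "Split Height" is read as the intercept of the fitted trend line, "Disp#" as the dispensation number and "Nuc#" as the number of nucleotides expected at that dispensation; this reading is an interpretation of the formula, not something the text states. The data points are invented, and `fit_trend_line` and `extrapolated_peak` are hypothetical helper names.

```python
def fit_trend_line(disp_numbers, single_heights):
    """Ordinary least-squares line through (dispensation number, peak height
    per incorporated nucleotide). Returns (slope, intercept, r_squared)."""
    n = len(disp_numbers)
    mean_x = sum(disp_numbers) / n
    mean_y = sum(single_heights) / n
    sxx = sum((x - mean_x) ** 2 for x in disp_numbers)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(disp_numbers, single_heights))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(disp_numbers, single_heights))
    ss_tot = sum((y - mean_y) ** 2 for y in single_heights)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


def extrapolated_peak(intercept, slope, disp, nuc):
    """Extrapolated peak = [Split Height + (Slope x Disp#)] x Nuc#,
    with Split Height assumed to be the trend-line intercept."""
    return (intercept + slope * disp) * nuc


# Invented, perfectly linear example data: signal per nucleotide decays
# by 0.5 units per dispensation from an intercept of 50.
disp = [2, 10, 30, 50, 70]
per_nt = [49.0, 45.0, 35.0, 25.0, 15.0]
slope, intercept, r2 = fit_trend_line(disp, per_nt)
peak_40 = extrapolated_peak(intercept, slope, disp=40, nuc=2)
print(round(slope, 3), round(intercept, 3), round(r2, 3), round(peak_40, 3))
```

In practice one such line would be fitted for dATP and another for the remaining three dNTPs, as described above, and the line matching the dispensed nucleotide would be used for each extrapolation.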

Discussion
Pyrosequencing offers a new approach to data acquisition, analysis and identification of known and unknown (new) alleles, in particular in heterozygous conditions. This method may represent a useful tool for the screening and characterization of polymorphic genetic markers in several clinical or experimental settings [12][13][14][15][16][17][18][19][20][21][22][23][24]. In addition, Pyrosequencing has been applied to the study of gene expression [23] and could be a useful complement to high-throughput single nucleotide polymorphism identification systems as a substitute for SBT [8,24]. Here we propose that Pyrosequencing may confront the most challenging task of solving ambiguities in HLA typing by SBT in heterozygous conditions. Although its reading length is currently shorter than that routinely covered by SBT, automated dNTP dispensation could compensate for this limitation by controlling simultaneous reactions in multiple wells using primers that anneal to different locations of the template DNA. In fact, a reading length of 70 to 100 nucleotides allows the high-resolution genotyping of Exon II of HLA-DRB1 (Wang et al, unpublished results). NDOs can also be designed to achieve higher throughput and lower genotyping resolution by introducing fewer out-of-phase dispensations (Wang et al, unpublished results). Without automation, it is possible to process 96 to 384 wells of PCR product by Pyrosequencing within 4 hours. With constant improvements in the chemistry for sample preparation for Pyrosequencing and for Pyrosequencing itself [25][26][27][28][29][30][31][32][33][34] and the implementation of automation devices (http://www.pyrosequencing.com), it may be possible in the future to apply this technology directly for routine typing of HLA and other immune-related genes characterized by extensive polymorphisms [8].

Figure 3 Pyrograms generated from two heterozygous DNA templates using the same NDO. Top pyrogram is generated from a Pyrosequencing reaction in one well of a 96-well plate. Bottom pyrogram is generated from a Pyrosequencing reaction in a separate well of a 96-well plate. The X-axis of each pyrogram, from left to right, indicates the order of reagent addition. E represents enzyme. S represents substrate. The expected number of nucleotides incorporated into each pair of heterozygous DNA templates and the actual peak heights are indicated for the normalizer peaks and differential peaks below each pyrogram. At the bottom of the figure are shown the expected ratio of peak heights and the ratio of normalized peak heights (in shadowed area).

DNA samples
Genomic DNA samples were locally available or obtained from the International Histocompatibility Workshops (IHW) cell lines panel, UCLA interchange panel and samples.

PCR amplification
Each 50 µl PCR amplification mixture contains 1 × PCR buffer (made in house), 2 mM MgCl2, 0.2 mM of each dNTP (purchased from Amersham Biosciences Inc.), 0.2 mM PCR primers, 2 U Taq DNA polymerase, and 250 ng genomic DNA. Either the forward or the reverse primer is biotinylated. The PCR reaction starts with denaturation at 95°C for 5 minutes, followed by 50 thermal cycles. Each cycle is programmed to include 30 seconds of denaturation at 95°C, 60 seconds of annealing at the appropriate temperature, and a 10-second final extension at 72°C. The PCR amplicon produced is enough for more than 8 Pyrosequencing reactions. The PCR amplicon used in this work is 286 bp long and contains Exon II and the flanking intron sequences.

Figure 4 The same NDO can generate unique pyrograms from DRB1*030101, 130101 vs. DRB1*0319, 1320. Top pyrogram is generated from a Pyrosequencing reaction in one well of a 96-well plate. Bottom pyrogram is generated from a Pyrosequencing reaction in a separate well of a 96-well plate. The X-axis of each pyrogram, from left to right, indicates the order of reagent addition. E represents enzyme. S represents substrate. The expected number of nucleotides incorporated into each pair of heterozygous DNA templates and the actual peak heights are indicated for the normalizer peaks and differential peaks below each pyrogram. At the bottom of the figure are shown the expected ratio of peak heights and the ratio of normalized peak heights (in shadowed area).

Figure 5 Linear signal generation through a 72 nucleotide dispensation Pyrosequencing run. Shown at the top in red is the Pyrosequencing primer sequence from 5' end to 3' end, in the direction the red arrow points. In this Pyrosequencing reaction, nucleotide incorporation into DRB1*1201, 1302 starts immediately downstream of the 3' end of the Pyrosequencing primer and ends at the 3' end of the template sequence shown. The polymorphic positions are shown and underlined in blue. The pyrogram is shown below the template sequence. The Y-axis represents peak heights. The X-axis represents the NDO.

DRB1*120101 5'-TGGAACAGCCAGAAGGACATCCTGGAAGACAG AGGCGCGCCGCGGTGGACACCTAT TTGCAGACACAACTAC-3'
DRB1*130201 5'-TGGAACAGCCAGAAGGACATCCTGGAAGACGA GAGCGGGCCGCGGTGGACACCTAC CTGCAGACACAACTAC-3'
Figure 6 The percentage of each background peak over sequence-specific signal is calculated by dividing each background peak height by the average peak height of the proximal upstream and downstream sequence-specific peaks. Top panel depicts a trend line generated from all the "A" peak heights of the pyrogram shown in Figure 5. Middle panel depicts a trend line generated from all the "G, C and T" peak heights of the pyrogram shown in Figure 5. R² values are shown in the upper right corners of both panels. Bottom panel depicts theoretical and extrapolated peak heights generated from the NDO and the peak heights shown in the top panel. The formula used is: Extrapolated peak = [Split Height + (Slope × Disp#)] × Nuc#. Average background is the average of all background signals. MAX background is the highest background signal over sequence-specific signal. MIN background is the lowest background signal over sequence-specific signal.
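The background calculation described in the Figure 6 legend can be sketched in Python as below. The peak heights and expected incorporation counts are invented for illustration, and `background_percentages` is a hypothetical helper name. A dispensation with an expected count of zero is treated as a background (negative) dispensation, which is an assumption of this sketch.

```python
def background_percentages(heights, expected):
    """Percent background at each negative dispensation.

    heights:  observed peak heights, one per dispensation.
    expected: theoretical number of nucleotides incorporated per
              dispensation (0 marks a negative dispensation).
    Each background peak is divided by the average of the nearest
    upstream and downstream sequence-specific peaks.
    """
    result = {}
    for i, count in enumerate(expected):
        if count != 0:
            continue
        upstream = next((heights[j] for j in range(i - 1, -1, -1)
                         if expected[j] > 0), None)
        downstream = next((heights[j] for j in range(i + 1, len(heights))
                           if expected[j] > 0), None)
        flanking = [h for h in (upstream, downstream) if h is not None]
        if not flanking:
            continue  # no sequence-specific peak available for comparison
        result[i] = 100.0 * heights[i] / (sum(flanking) / len(flanking))
    return result


# Invented example: dispensations 1 and 3 (0-based) are negative.
expected = [1, 0, 2, 0, 1]
heights = [40.0, 2.0, 80.0, 6.0, 40.0]
background = background_percentages(heights, expected)
values = list(background.values())
print(min(values), max(values), sum(values) / len(values))
```

The minimum, maximum and mean of the returned values correspond to the MIN, MAX and average background figures named in the legend.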