Ambiguous allele combinations in HLA Class I and Class II sequence-based typing: when precise nucleotide sequencing leads to imprecise allele identification

Sequence-based typing (SBT) is one of the most comprehensive methods used for HLA typing. However, an inherent problem with this method is the interpretation of ambiguous allele combinations, which occur when two or more different allele combinations produce identical sequences. The purpose of this study was to investigate how often this occurs. We performed HLA-A and -B SBT for Exons 2 and 3 on 676 donors. Samples were analyzed with a capillary sequencer. The racial distribution of the donors was as follows: 615 Caucasian, 13 Asian, 23 African American, 17 Hispanic and 8 Unknown. 672 donors were analyzed for HLA-A locus ambiguities and 666 donors were analyzed for HLA-B locus ambiguities. At the HLA-A locus, a total of 548 ambiguous allele combinations were identified (548/1344 = 41%). About half (278/548 = 51%) of these ambiguities arose because Exon 4 analysis was not performed. At the HLA-B locus, 322 ambiguous allele combinations were found (322/1332 = 24%). The HLA-B*07/08/15/27/35/44 antigens, common in Caucasians, produced a large portion of the ambiguities (279/322 = 87%). A large portion of HLA-A and -B ambiguous allele combinations can be addressed by using a group-specific primary amplification approach to produce an unambiguous homozygous sequence. Therefore, although the prevalence of ambiguous allele combinations is high, methods exist to compensate for this problem when resolution of these ambiguities is clinically warranted.


Introduction
The precise identification of HLA Class I and Class II alleles is critical for successful hematopoietic progenitor transplants, the development of peptide-based viral and cancer vaccines, and investigating immune response [1,2]. DNA sequencing is one of the most comprehensive methods available for HLA typing. Sequence-based typing (SBT) involves PCR amplification of specific coding regions of HLA genes and sequencing of the amplicons [3,4]. SBT allows for a detailed interpretation of HLA alleles by comparing nucleotide sequences of the polymorphic, and sometimes conserved, regions of the HLA gene to a database of possible allelic combinations. While SBT permits the highest resolution of genotypes, like all typing methods, it has limitations. One of the inherent problems when using SBT is the interpretation of ambiguous allele combinations, which can occur for several reasons [5][6][7].
There exist two main types of ambiguous typing results obtained with SBT. The first is when a heterozygous sequence can be explained by more than one possible pair of alleles within the region analyzed. The second exists when alleles are defined by a polymorphism outside the region analyzed. In addition to these two situations, a third type of ambiguity arises when an allele has an incomplete sequence in the region analyzed.
The prevalence of ambiguities in HLA typing relates to the nature of the polymorphisms that exist in the sequences of the major histocompatibility complex (MHC) Class I and Class II genes. Most polymorphisms that distinguish one MHC allele from another arise from gene conversion, recombination and exon shuffling events. As a result, polymorphic motifs at given positions are generally shared among several alleles.
Sequence-based typing involves PCR amplification and sequencing of specific HLA exons, which are known to be polymorphic, from genomic DNA. For each HLA locus both alleles are amplified and sequenced together; therefore, it is not always possible to determine exactly which two alleles were responsible for the sequence results. For example, two or more different allele combinations can produce identical sequences because of the heterozygous base pair combinations; this is the first type of ambiguity. More specifically, in the Class I region, HLA-B*070201, 3503 would have the same nucleotide sequence as HLA-B*0724, 3533 in positions 559 and 560 (Figure 1). In this example, SBT produces a heterozygous base pair combination at positions 559 and 560 with an International Union of Biochemistry (IUB) designation of K(G+C)W(A+T). Therefore, the interpretation cannot be made at the high resolution level because it is not known which allele combination is correct.
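The first type of ambiguity can be illustrated with a minimal Python sketch. The two-base motifs below are invented stand-ins and are not the actual sequences of B*070201, B*3503, B*0724 or B*3533; only the IUB codes themselves are standard. The function name `diploid_read` is likewise an assumption made for illustration. The sketch merges two aligned allele sequences into the single IUB-coded read that direct sequencing of both alleles would show, and confirms that two different allele pairs can give the same read.

```python
# Standard IUB codes for one or two bases observed at a position.
IUB = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def diploid_read(allele_1, allele_2):
    """Merge two aligned allele sequences into the IUB-coded sequence
    seen when both alleles are sequenced in the same reaction."""
    if len(allele_1) != len(allele_2):
        raise ValueError("alleles must be aligned to the same length")
    return "".join(IUB[frozenset((a, b))] for a, b in zip(allele_1, allele_2))


# Hypothetical two-base motifs (illustrative only, not real HLA sequences).
pair_1 = ("AC", "GT")  # first candidate allele pair
pair_2 = ("AT", "GC")  # same bases at each position, opposite phase

read_1 = diploid_read(*pair_1)
read_2 = diploid_read(*pair_2)
print(read_1, read_2, read_1 == read_2)  # RY RY True
```

Because the read records only which bases are present at each position, and not which bases sit together on the same chromosome, both pairs remain valid interpretations.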
The second type of ambiguity relates to defining a polymorphism outside the region analyzed. For example, many HLA-A alleles are defined by a polymorphism located in Exon 4 (Table 3). Traditionally, for Class I typing most laboratories only sequence Exon 2 and Exon 3; for Class II typing most laboratories only sequence Exon 2. This approach has been the standard due to the functional relevance of this region which defines the peptide groove of Class I and Class II molecules, respectively. However, some Class I alleles have identical sequences across Exons 2 and 3. To resolve these alleles it is necessary to analyze the gene at the region where they differ. As DNA sequencing has become easier and more widely applied to defining HLA alleles, additional polymorphisms have been found in other exons, and also in the introns.
Finally, an ambiguity may be due to incomplete sequence information, because not all alleles have been sequenced for the same exons. For some alleles the entire sequence is not known in the region that is amplified. For example, A*010101 has been sequenced from Exon 1 through Exon 8, but A*010102 has been sequenced only in Exon 2 and Exon 3. Numerous ambiguities arise due to an incomplete sequence in Exon 4. The minimum requirements for submission of new sequences into reference databases of HLA sequences are the sequencing of Exon 2 and Exon 3 for Class I and Exon 2 for Class II.
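The second and third types of ambiguity can be sketched in the same way. The allele names, sequences and the helper `compatible_alleles` below are all hypothetical, and the sample is treated as a single allele for simplicity. An allele stays on the candidate list when its known sequence matches the observed sequence in every analyzed exon, and an exon that was never sequenced for an allele cannot exclude it.

```python
# Hypothetical reference entries: exon number -> sequence, or None when
# that exon has not been sequenced for the allele. All names and
# sequences are invented for illustration.
REFERENCE = {
    "X*01": {2: "ACGT", 3: "GGCA", 4: "TTAC"},
    "X*02N": {2: "ACGT", 3: "GGCA", 4: "TTCAC"},  # differs only in Exon 4
    "X*03": {2: "ACGT", 3: "GGCA", 4: None},      # Exon 4 not sequenced
    "X*04": {2: "ACGA", 3: "GGCA", 4: "TTAC"},    # differs in Exon 2
}


def compatible_alleles(observed, reference, exons):
    """Return the alleles whose known sequence matches `observed` in
    every analyzed exon; an unsequenced exon (None) excludes nothing."""
    hits = []
    for name, seqs in reference.items():
        if all(seqs[e] is None or seqs[e] == observed[e] for e in exons):
            hits.append(name)
    return sorted(hits)


# Invented single-allele observation.
observed = {2: "ACGT", 3: "GGCA", 4: "TTAC"}

exons_2_3 = compatible_alleles(observed, REFERENCE, exons=(2, 3))
exons_2_3_4 = compatible_alleles(observed, REFERENCE, exons=(2, 3, 4))
print(exons_2_3)    # ['X*01', 'X*02N', 'X*03']
print(exons_2_3_4)  # ['X*01', 'X*03']
```

In this toy example, adding Exon 4 removes the allele that differs outside Exons 2 and 3, but the allele with an incomplete reference sequence remains a candidate until its Exon 4 sequence is determined.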
The relevance of completely identifying the polymorphisms found by SBT needs consideration. Clinically, it may not always be necessary to resolve ambiguities that involve a silent non-coding polymorphism and/or an intron polymorphism. Exceptions exist where the polymorphism negates or impairs expression (e.g. A*24020102L or B*15010102N; both are due to an intron polymorphism). However, for inves-
Figure 1. Two different HLA-B allele combinations that yield identical sequence-based typing results.
The purpose of this study was to summarize the incidence, nature, and cause of ambiguous HLA SBT results. This represents an important step toward developing strategies to reduce or eliminate this problem.

DNA Isolation
Genomic DNA was isolated from peripheral blood using the Gentra PUREGENE® isolation kit (Gentra Systems, Minneapolis, MN, U.S.A.). The DNA was resuspended in Tris HCl buffer (pH 8.5) and the concentration was measured using a Pharmacia Gene Quant II Spectrophotometer. The DNA was then stored at -70°C until testing.

Sequence-Based Typing
The primary PCR amplification reaction produced a 1.5 kb product encompassing Exon 1 through Intron 3 of the HLA region. All reagents necessary for primary amplification and sequencing were supplied in the HLA-A or HLA-B AlleleSEQR Sequenced Based Typing Kits (Forensic Analytical, Hayward, CA, U.S.A.). The primary amplification PCR products were purified from excess primers, dNTPs, and genomic DNA using ExoSAP-IT (Amersham Life Science, Cleveland, OH, U.S.A.). Each template was sequenced in the forward and reverse orientation for Exon 2 and Exon 3 according to protocols supplied with the SBT kit. Excess dye terminators were removed from the sequencing products by precipitation with absolute ethanol. The reaction products were reconstituted with 15 µl of Hi-Di™ Formamide (PE Applied Biosystems/Perkin-Elmer, Foster City, CA, U.S.A.) and analyzed on the ABI Prism® 3700 DNA Analyzer with Dye Set file: Z and mobility file: DT3700POP6 {ET}.

Results
Sequence-based typing analysis of HLA-A and -B alleles was performed on a population of 676 normal donors. The racial distribution of the subjects studied was: 615 Caucasian, 13 Asian, 23 African American, 17 Hispanic and 8 Unknown. 672 of the 676 subjects were analyzed for the presence of HLA-A locus ambiguities and 666 were analyzed for HLA-B locus ambiguities. Each allele was counted separately in this analysis in order to determine the total percentage of ambiguous allele combinations. Four new potential alleles were found.
At the HLA-A locus, a total of 548 ambiguous allele combinations were found. This represented 41% of all HLA-A alleles (548 of 1344) (Table 1). Approximately half (51%, 278 of 548) of these ambiguities arose because Exon 4 analysis was not performed (Table 2 and Table 3). HLA-A*01 and HLA-A*24 are very prevalent alleles, and most ambiguities involving these alleles could be resolved by performing Exon 4 analysis. For example, the most prevalent ambiguity for HLA-A*01 in this study was HLA-A*0101/0104N. The sequences of these two alleles are identical across Exons 2 and 3; the difference between them occurs at position 627insC, which is located in Exon 4.
A large portion of HLA-A locus sequence-based typing ambiguities involved HLA-A*02, 30% or 162 of 548. Some of the HLA-A*02 ambiguities can also be resolved via Exon 4 sequencing and most of the other HLA-A*02 ambiguities can be resolved with traditional A*02 molecular subtyping methodologies using sequence specific primers or sequence specific probes.
Not all HLA-A ambiguous allele combinations can be resolved as simply as those involving A*01, A*24 and A*02. Most of the remaining 19% (108 of 548) of the HLA-A ambiguities cannot be resolved with Exon 4 analysis (Table 3).
Review of the HLA-B locus results revealed 322 ambiguous allele combinations among the 1332 total HLA-B alleles (24%). Antigens HLA-B*07/08/15/27/35/44, common in Caucasians, produced the largest portion of the ambiguities (279 of 322 or 87%). Each of these ambiguities had an independent reason for occurring. Table 4 lists some of the more common B locus ambiguities seen in this study. The reason for each ambiguity varies; however, a large portion of the ambiguities are related to cis/trans allele combinations.
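The percentages reported above follow from counting two alleles per donor. The short Python check below reproduces that arithmetic from the counts given in the text; the function names are illustrative only.

```python
def per_allele_rate(ambiguous, donors):
    """Return (alleles counted, percent ambiguous), counting each of a
    donor's two alleles separately."""
    alleles = 2 * donors
    return alleles, round(100 * ambiguous / alleles)


def share(part, whole):
    """Return `part` as a rounded percentage of `whole`."""
    return round(100 * part / whole)


hla_a = per_allele_rate(548, 672)  # HLA-A: 672 donors
hla_b = per_allele_rate(322, 666)  # HLA-B: 666 donors
print(hla_a)  # (1344, 41)
print(hla_b)  # (1332, 24)

print(share(278, 548))  # 51, HLA-A ambiguities due to missing Exon 4 data
print(share(162, 548))  # 30, HLA-A ambiguities involving A*02
print(share(279, 322))  # 87, HLA-B ambiguities from B*07/08/15/27/35/44
```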

Discussion
While SBT provides the best available typing of HLA-A and -B antigens, it is limited by sequence results that do not allow the precise identification of alleles. We found that 41% of HLA-A alleles and 24% of HLA-B alleles were ambiguously typed. The ambiguities involve some of the most frequent HLA-A and HLA-B antigens: A*01, A*02, A*24, B*07, B*08, B*15, B*27, B*35, and B*44. However, ambiguous allele combinations occur in all HLA loci tested. The IMGT/HLA Sequence Database http://www.ebi.ac.uk/imgt/hla/ maintains an updated listing of all ambiguous possibilities http://www.ebi.ac.uk/imgt/hla/ambig.html [5,8].
The need to initiate additional testing to clarify ambiguous allele combinations must consider whether it is practical to obtain the information and if the information is useful and valuable. The clinical need for the highest resolution HLA typing possible is an important variable that must be considered. When typing is performed for cancer and viral vaccine development studies, high resolution allele data may be necessary to determine if a subject has an HLA type that is appropriate for a study. Utilization of high resolution data may also have implications for hematopoietic progenitor cell transplantation. Transplants involving partially mismatched or unrelated donor-recipient pairs require a higher resolution typing, but those involving HLA identical siblings may not.
If it is necessary to resolve an ambiguous typing, a variety of different methods can be used. If the ambiguity is due to an allele that has not been completely sequenced, or because the distinguishing polymorphism lies outside the region amplified by the SBT assay, the resolution depends on the nature and complexity of the ambiguity. Traditionally, for Class I sequencing most laboratories have performed Exon 2 and Exon 3 analysis alone, and for Class II sequencing only Exon 2 analysis. Many of the ambiguities can be resolved by sequencing Exon 4. In fact, in this study the largest portion of the typing ambiguities can be resolved by sequencing Exon 4. However, many polymorphisms in Exon 4 have no functional significance, so it may not be worthwhile to resolve most ambiguities involving Exon 4. The need for Exon 4 analysis to reduce the incidence of typing ambiguity has now been recognized by commercial kit manufacturers. Both Celera Diagnostics (Alameda, CA) and Forensic Analytical/Atria Genetics (South San Francisco, CA) now include reagents for analysis of Exon 4 in the HLA-A and -B kits. As the use of SBT increases, more data may become available from non-traditional exons in addition to those that have been traditionally sequenced, and the number of ambiguities due to unknown sequences will decrease.
If the ambiguity is due to an identical heterozygote sequence, as shown in Figure 1, the ambiguous allele combinations can sometimes be addressed by using a group-specific primary amplification approach. In this approach each allele is amplified separately by using group-specific primers for the alleles in question. A homozygous sequence for each separate allele can then be obtained by sequencing the product of the group-specific amplifications. Currently, there are commercially available kits (Forensic Analytical, Hayward, CA) for group-specific amplification of the B locus. These kits allow the primary amplification of a specific group. Upon discovery of a particular ambiguity, a group-specific amplification is done to separate out the allele pair. The resultant sequence will be homozygous for each allele in question. Another method, which reduces the number of ambiguities at the B locus, is a two-tube group amplification approach (DYNAL Biotech, Brown Deer, WI). This method resolves ambiguities that arise from a cis/trans situation, that is, from simultaneous nucleotide incorporation for the two DNA templates being sequenced. With this method, ambiguities are reduced by 56% (Table 4). Another method for separation of alleles is Haploprep™ (Genovision, Philadelphia, PA). Haploprep™ physically separates a diploid sample into its haploid components. Once the haplotypes are separated, routine HLA typing methods can be performed to determine the alleles. This laboratory is currently conducting studies to determine the efficacy of this product. Several of the remaining HLA loci ambiguities can be managed using in-house custom group-specific primary amplification mixes.
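The logic of resolving a cis/trans ambiguity by sequencing one allele on its own can be sketched in Python. This is a simplified illustration under stated assumptions: the allele names and two-base sequences are invented, and the group-specific amplification is idealized as returning the sequence of exactly one allele. The function names `merged`, `pairs_explaining` and `resolve` are hypothetical.

```python
from itertools import combinations

# Hypothetical allele sequences (names and bases invented).
ALLELES = {"X*07a": "AC", "X*35a": "GT", "X*07b": "AT", "X*35b": "GC"}

# IUB codes for the two heterozygous combinations that occur here.
IUB = {frozenset("AG"): "R", frozenset("CT"): "Y"}


def merged(seq_1, seq_2):
    """IUB-coded read obtained when two alleles are sequenced together."""
    return "".join(
        a if a == b else IUB[frozenset((a, b))] for a, b in zip(seq_1, seq_2)
    )


def pairs_explaining(read):
    """All allele pairs whose combined read equals the observed read."""
    return [
        pair
        for pair in combinations(sorted(ALLELES), 2)
        if merged(ALLELES[pair[0]], ALLELES[pair[1]]) == read
    ]


def resolve(read, single_allele_read):
    """Keep only the pairs containing an allele that matches the read
    obtained after amplifying one allele separately."""
    return [
        pair
        for pair in pairs_explaining(read)
        if any(ALLELES[name] == single_allele_read for name in pair)
    ]


ambiguous = pairs_explaining("RY")
resolved = resolve("RY", "AC")
print(ambiguous)  # [('X*07a', 'X*35a'), ('X*07b', 'X*35b')]
print(resolved)   # [('X*07a', 'X*35a')]
```

The heterozygous read alone is explained by two pairs; once one allele is read without its partner, the phase is known and only one pair remains.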
Other methods which have been used to produce a homozygous sequence include cloning, reference strand conformational analysis (RSCA), Pyrosequencing™ and denaturing high-performance liquid chromatography (DHPLC) [9][10][11]. Pyrosequencing™ is being explored by this laboratory and results at this time are preliminary. This method relies on the identification of a correct dispensation order of nucleotides during the Pyrosequencing™ process. Because each ambiguity is unique, a separate dispensation order would have to be determined for each one. The initial setup of this technology may be cumbersome; however, once established it may become very streamlined due to the availability of data on different ambiguous allele combinations. Each of these methods has advantages and concerns which must be thoroughly investigated by the laboratory. Some, but not all, ambiguous allele combinations produced by identical heterozygote combinations can be resolved using traditional sequence specific primers (SSP) or sequence specific oligonucleotide probes (SSOP). This may be a more viable approach for laboratories that are already performing one of these technologies.
In conclusion, although the prevalence of ambiguous allele combinations is high, methods exist to compensate for this problem. As the HLA field continues the discovery of new alleles, alternative approaches to discerning ambiguous allele combinations will need to be investigated in order to reduce the ever-growing number of ambiguities.