HCV Envelope protein 2 sequence comparison of Pakistani isolate and In-silico prediction of conserved epitopes for vaccine development
© Idrees et al.; licensee BioMed Central Ltd. 2013
Received: 31 January 2013
Accepted: 23 April 2013
Published: 30 April 2013
Hepatitis C virus (HCV) causes hundreds of new cases each year in Pakistan and has become a threat to the Pakistani population. The HCV E2 protein is a transmembrane protein involved in viral attachment and can therefore serve as an important target for vaccine development, but its variability makes vaccine development against it a challenge. This study was therefore designed to isolate the HCV E2 gene from Pakistani HCV-infected patients of genotype 3a, to perform in-silico analysis of the HCV E2 protein isolated in Pakistan, and to compare its sequence with other E2 proteins of genotypes 3a and 1a in order to find potentially conserved B-cell and T-cell epitopes that could be important in designing novel inhibitory compounds and peptide vaccines against genotypes 3a and 1a.
Patients and methods
Patients were selected on the basis of serum ALT and AST levels elevated for at least six months, histological examination, and detection of serum HCV RNA and anti-HCV antibodies (3rd generation ELISA). RNA isolation, cDNA synthesis, amplification, cloning and sequencing were performed on serum samples from 4 patients in order to obtain the HCV E2 sequence. The HCV E2 protein of Pakistani origin was analyzed using various bioinformatics tools, including sequence and structure tools.
HCV E2 protein modeling was performed with the I-TASSER online server, and the quality of the model was assessed with a Ramachandran plot and Z-score. A total of 3 B-cell and 3 T-cell epitopes were found to be highly conserved among HCV genotypes 3a and 1a.
The present study revealed potential conserved B-cell and T-cell epitopes of the HCV E2 protein along with 3D protein modeling. These conserved B-cell and T-cell epitopes can be helpful in developing effective vaccines against HCV and thus limiting threats of HCV infection in Pakistan.
Keywords: HCV E2 protein, Sequencing, 3D structure, Epitopes
Hepatitis C virus (HCV) is a global health problem and a significant risk factor for liver-associated diseases, including hepatocellular carcinoma. HCV has affected 270 million people worldwide, of whom 10 million are in Pakistan. Hundreds of HCV cases are reported each year in Pakistan, and prevalence analyses show that HCV genotype 3a is the most common in all provinces of Pakistan except Balochistan, where the most prevalent subtype is 1a. Because of the six genotypes and their variability, HCV vaccine development has always been a challenge, and both structural and non-structural proteins are being targeted to develop an effective vaccine.
HCV is a plus-strand RNA virus whose envelope proteins, E1 and E2, are anchored in a host-derived lipid membrane surrounding the nucleocapsid, which is composed of several copies of the core protein. E1 and E2 have molecular weights of 33–35 and 70–72 kDa, respectively [5–7]. E2 is highly glycosylated and contains up to 11 N-linked glycosylation sites, most of which are well conserved. In addition, E2 contains hypervariable regions whose amino acid sequences differ by up to 80% between HCV genotypes and between subtypes of the same genotype [8–10]. The E2 glycoprotein is a key molecule regulating the interaction of HCV with cell surface proteins. It binds to the major extracellular loop of human CD81, a tetraspanin expressed in various cell types including hepatocytes and B lymphocytes, and its truncated forms also interact with scavenger receptor class B type I (SR-BI) and the high density lipoprotein (HDL) binding molecule [12–15]. Mannose binding proteins (DC-SIGN and L-SIGN) have been suggested to interact with HCV E2, but their function in viral entry is unclear. HCV E2 possesses glycosylation sites which interact directly with cell surface receptors, enabling the virus to enter the cell [17–20]; it is therefore important to target this protein to stop viral entry.
To design effective inhibitors against envelope proteins, it is important to know the sequence and structure of the protein. Bioinformatics analysis has opened new vistas by providing more insight into protein sequence and structural features. This study was therefore designed to isolate the HCV E2 sequence from HCV-infected patients of genotype 3a and to analyze its conservation and variability in order to identify conserved B-cell and T-cell epitopes. B-cell and T-cell epitopes are important in raising the desired immune responses, and the number of epitopes and the immune recognition of antigens can be influenced by deglycosylation of viral glycoproteins. Because knowledge of the epitopic regions of a protein is important in designing effective inhibitors, both B-cell and T-cell epitopes that were well conserved in the HCV E2 protein of genotypes 3a and 1a were predicted.
Source of serum samples
Serum samples from 4 local HCV 3a patients were collected from the CAMB (Center for Applied Molecular Biology) diagnostic laboratory, Lahore, Pakistan, after clinical diagnosis, under the provision of the Institutional Review Board (IRB) of NCEMB (National Center of Excellence in Molecular Biology), University of the Punjab, Lahore, Pakistan. The participating subjects gave informed consent for the collection of blood samples for this study. Patients were selected on the basis of serum ALT and AST levels elevated for at least six months, histological examination, and detection of serum HCV RNA and anti-HCV antibodies (3rd generation ELISA).
RNA isolation, cDNA synthesis, amplification and cloning of the HCV E2 gene
Sequences of primers used for PCR amplification of the HCV E2 gene
PCR product size
Sequence analysis, homology modeling and stereochemical analysis
The HCV E2 sequence of genotype 3a (GQ355940.1) was used to develop a three-dimensional structure of the E2 protein through homology modeling, because no crystal or NMR structure of the HCV E2 protein was available in the Protein Data Bank (PDB) (http://www.rcsb.org/pdb/home/home.do). Different parameters of the primary structure were computed using the ProtParam online tool. The secondary structure of the protein was computed using different servers. The DiANNA tool was used to check the system classification and disulfide connectivity. This knowledge can be helpful in understanding the secondary structure of the protein, since disulfide bridges are important in stabilizing the protein fold. The 3D model was generated using the I-TASSER online server, which generates 3D models along with their confidence score (C-score). After the 3D model was generated, structure analysis and stereochemical analysis were performed using different evaluation and validation tools. The Psi/Phi Ramachandran plot was obtained using PROCHECK, which helped in evaluating the backbone conformation. The Ramachandran plot was also used to check for non-GLY residues in the disallowed regions. The quality of the model was assessed using Z-scores, which indicate overall model quality and show whether the predicted structure falls within the range of scores found for native proteins. The ProSA-web tool was used to determine the Z-scores. Furthermore, the generated model was submitted to the Protein Model Database (PMDB) (http://mi.caspur.it/PMDB/main.php) under the PMDB identifier PM0078776.
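A Ramachandran plot is a scatter of the backbone dihedral angles phi and psi for each residue. As a rough illustration of what lies behind such a plot (this is not PROCHECK's code, and the function names are hypothetical), the sketch below computes a signed dihedral angle from four atomic coordinates and applies it to the usual backbone atom quadruples, assuming coordinates are given as (x, y, z) tuples.

```python
import math


def _sub(a, b):
    # Component-wise difference a - b of two 3D points.
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)   # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)   # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone angles of one residue from five backbone atom positions.

    phi uses C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1).
    """
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

A validation tool would then compare each (phi, psi) pair against favoured, allowed and disallowed regions; those region boundaries are not reproduced here.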
T-cell epitope and B-cell epitope prediction
The transmembrane topology of the E2 protein was checked using the TMHMM online tool, and the antigenicity of the protein was checked using the VaxiJen v2.0 online antigen prediction server. T-cell epitopes were predicted using the EpiJen v1.0 online server with HLA alleles A*0101, A*0201, A*0202, A*0203, A*0206, A*0301, A*1101, B*07 and B*51. The proteasome cutoff was set to 0.1, the TAP prediction cutoff was set to 5, and the output cutoff threshold was set to 5%. The transmembrane localization of the epitopes with minimum IC50 values was checked, and epitopes located in the transmembrane/exo-membrane region were selected and checked for antigenicity. Only epitopes that were in the transmembrane/exo-membrane region and had a potential antigenicity score were subjected to conservancy analysis. Furthermore, B-cell epitopes were predicted using the BCPred online server with a 75% specificity criterion for epitope prediction. Epitopes exposed on the surface of the membrane were checked for their antigenicity using the VaxiJen v2.0 online server. Both T-cell and B-cell epitopes were analyzed for their conservancy among all retrieved E2 sequences belonging to genotypes 3a and 1a. For this purpose, the IEDB Epitope Conservancy Analysis server was utilized.
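The selection logic described above (keep epitopes in the transmembrane/exo-membrane region, keep those scored as antigenic, prefer low IC50) can be sketched as a simple filter over already-predicted records. This is only an illustration: the `Candidate` record, the `shortlist` function, the toy peptide names and the 0.4 antigenicity threshold are all assumptions made for the example and are not taken from the study or from any server's API.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str            # epitope label or sequence (toy values below)
    ic50_nm: float          # predicted IC50 in nM, lower means stronger binding
    region: str             # "outside", "transmembrane" or "inside"
    antigenic_score: float  # score from an antigenicity predictor


def shortlist(candidates, threshold,
              allowed_regions=("outside", "transmembrane")):
    """Keep antigenic candidates in the allowed regions, lowest IC50 first."""
    kept = [c for c in candidates
            if c.region in allowed_regions and c.antigenic_score >= threshold]
    return sorted(kept, key=lambda c: c.ic50_nm)


# Hypothetical records, for illustration only.
toy = [
    Candidate("PEPTIDE_A", 12.0, "outside", 0.9),
    Candidate("PEPTIDE_B", 3.5, "inside", 1.2),
    Candidate("PEPTIDE_C", 40.0, "transmembrane", 0.2),
    Candidate("PEPTIDE_D", 7.0, "outside", 0.6),
]
selected = [c.peptide for c in shortlist(toy, threshold=0.4)]
print(selected)
```

Only the candidates that survive this step would then go on to conservancy analysis.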
Conservation of epitopes
The degree of conservation of amino acids reflects their structural and functional importance. To predict effective and conserved peptides, E2 protein sequences belonging to HCV genotypes 3a and 1a were retrieved from the NCBI protein database (Additional file 1) and compared with the E2 sequence from Pakistan. Conservation and variation analysis of HCV E2 was carried out with the IEDB conservancy analysis tool. Because the HCV E2 protein is important in viral entry and highly variable, it is important to identify conserved epitopes that can serve as the best targets for potential inhibitors and vaccines.
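Conservancy is commonly expressed as the fraction of sequences in a set that contain a given epitope at or above some identity level. The sketch below is a simplified stand-in for that idea, not the IEDB tool's implementation: it scans each sequence with an ungapped window of the epitope's length, and the function names and toy sequences are hypothetical.

```python
def best_identity(epitope, sequence):
    """Highest fraction of matching residues over all ungapped windows."""
    k = len(epitope)
    best = 0.0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, matches / k)
    return best


def conservancy(epitope, sequences, min_identity=1.0):
    """Fraction of sequences containing the epitope at >= min_identity."""
    if not sequences:
        return 0.0
    hits = sum(best_identity(epitope, s) >= min_identity for s in sequences)
    return hits / len(sequences)


# Toy sequences, for illustration only.
toy_set = ["XXACDEXX", "XXACDFXX", "GGGG"]
strict = conservancy("ACDE", toy_set)                      # exact match only
relaxed = conservancy("ACDE", toy_set, min_identity=0.75)  # one mismatch allowed
```

With `min_identity=1.0` an epitope counts as conserved in a sequence only when it occurs there exactly; lowering the threshold tolerates substitutions.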
Cloning and confirmation of HCV 3a E2 in pCR2.1 vector
Structural description of the 3D model
Predicted disulfide bonds
Cysteine positions    Flanking sequences
46 - 170              RTARNCNESIK - GRWFGCTSMNS
69 - 126              FKLTGCPQRLS - CGPGYCFTPSP
76 - 208              QRLSSCKPITF - FCPTDCFRKHP
112 - 204             YAPRPCDTVKQ - GRELFCPTDCF
121 - 187             KQPTVCGPGYC - CGGPPCDIYGG
182 - 355             GFVKTCGGPPC - CHPRVCVALWL
220 - 300             ATYSRCGSGPW - LAILPCSFTPM
243 - 350             LWHYPCTVNFT - VFLLLCHPRVC
267 - 275             RFTAACNWTRG - TRGERCDIEDR
The TMHMM online server showed that residues 1–340 lie outside the membrane, residues 341–363 lie within the transmembrane region and residues 364–365 lie inside. VaxiJen v2.0 gave an overall antigenic score of 0.4653.
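Given these residue ranges, the membrane location of any predicted epitope follows from its start position and length. The sketch below encodes the three reported ranges in a small lookup table; the names `TOPOLOGY`, `region_of` and `regions_spanned` are hypothetical helpers for illustration and are not part of TMHMM.

```python
# Residue ranges as reported above (1-based, inclusive).
TOPOLOGY = [
    (1, 340, "outside"),
    (341, 363, "transmembrane"),
    (364, 365, "inside"),
]


def region_of(position):
    """Return the topology label for a 1-based residue position."""
    for first, last, label in TOPOLOGY:
        if first <= position <= last:
            return label
    raise ValueError(f"position {position} is outside the 1-365 range")


def regions_spanned(start, length):
    """Ordered list of distinct regions covered by an epitope."""
    seen = []
    for position in range(start, start + length):
        label = region_of(position)
        if label not in seen:
            seen.append(label)
    return seen
```

For example, a 9-mer starting at residue 335 covers residues 335 to 343 and therefore spans both the outside and the transmembrane regions.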
B-cell epitope prediction
B-cell epitopes with their antigenic score
T-cell epitope prediction
T-cell epitopes on the basis of minimum IC50 value and antigenic score
Predicted IC50 value (nM)
Conservation of epitopes
Conserved B-cell and T-cell epitopes
T-Cell (A*0201, A*0202)
T-Cell (A*0201, A*0202, A*0203)
Advances in biotechnology and in the understanding of immune responses have opened new doors for vaccine development and implementation. The discovery of vaccines from genetic information through an in-silico approach rather than in-vitro study is called reverse vaccinology. Reverse vaccinology takes advantage of the genome sequence of the pathogen. This approach helps in identifying all antigens of a pathogen and also allows the discovery of novel antigens. Gene sequences of viral pathogens have been used to develop synthetic peptides for vaccines against chronic infections such as Hepatitis B, Hepatitis C and HIV. Peptide-based vaccines have shown efficacy in clinical trials, and this efficacy correlates with the induction of T cell-specific immunity. Bioinformatics resources that store and organize immune reactivity and pathogen data, together with the availability of genomic sequences of pathogens, can provide new information on cancer-specific epitopes and increase our knowledge for designing novel peptide vaccines. Many vaccines that were impossible to develop have now become a reality. The most common HCV genotype in Pakistan is 3a, while 1a is common in Balochistan, and there is a strong correlation between chronic HCV infection (genotype 3a) and HCC in Pakistan.
This study was designed to perform in-silico analysis of the HCV E2 protein isolated in Pakistan. For this purpose, different sequence and structure analysis tools were used to gain insight into HCV E2 and to compare the HCV E2 sequence of Pakistan with other Pakistani E2 sequences. There is currently no high-resolution structure of the HCV E2 glycoprotein to further understand its mechanism of viral entry or immune evasion. We used a homology modeling approach to predict the 3D structure of the HCV E2 protein of Pakistan. The predicted 3D structure will provide more insight into the structure and function of the protein. Moreover, this structure can be used for drug design or for understanding interactions between proteins. As part of the present study, we predicted conserved T-cell and B-cell epitopes that can be used as targets for vaccine development against HCV genotypes 3a and 1a. Among all the predicted B-cell epitopes, 12 were found to be antigenically effective, and all of these were in the exo-membrane region of the protein. Conservation analysis showed that only 3 of these epitopes were conserved across the other E2 sequences of genotypes 3a and 1a. For T-cell epitope mapping, the EpiJen online server was used. A total of 25 epitopes with minimum IC50 values were selected and 9 were found to be antigenically effective, but only 3 T-cell epitopes were found to be well conserved in E2 sequences of genotypes 3a and 1a.
Multiple antigenic components of the virus can be important targets for developing effective vaccines that direct the immune system to protect the host from the virus. In Pakistan, genotype 3a is the most prevalent genotype, followed by 3b and 1a. Keeping this in mind, this study was conducted to perform sequence, structure, and conservation/variation analysis along with homology modeling of the HCV E2 protein of Pakistani origin. The study revealed B-cell and T-cell epitopes that are conserved in the E2 protein of HCV genotypes 3a and 1a. These conserved epitopes may be highly useful for diagnosing HCV genotypes 3a and 1a and may also help in developing a successful vaccine that targets both genotypes.
Sobia Idrees (MPhil student), Usman A Ashfaq (PhD Molecular Biology), Saba Khaliq (PhD Molecular Biology).
- Raja NS, Janjua KA: Epidemiology of hepatitis C virus infection in Pakistan. J Microbiol Immunol Infect. 2008, 41 (1): 4-8.
- Idrees M, Riazuddin S: Frequency distribution of hepatitis C virus genotypes in different geographical regions of Pakistan and their possible routes of transmission. BMC Infect Dis. 2008, 8: 69. 10.1186/1471-2334-8-69.
- Ashfaq UA, Javed T, Rehman S, Nawaz Z, Riazuddin S: An overview of HCV molecular biology, replication and immune responses. Virol J. 2011, 8: 161. 10.1186/1743-422X-8-161.
- Idrees S, Ashfaq UA, Idrees N: Development of global consensus sequence of HCV glycoproteins involved in viral entry. Theor Biol Med Model. 2013, 10 (1): 24. 10.1186/1742-4682-10-24.
- Bartosch B, Dubuisson J, Cosset FL: Infectious hepatitis C virus pseudo-particles containing functional E1-E2 envelope protein complexes. J Exp Med. 2003, 197 (5): 633-642. 10.1084/jem.20021756.
- Nielsen SU, Bassendine MF, Burt AD, Bevitt DJ, Toms GL: Characterization of the genome and structural proteins of hepatitis C virus resolved from infected human liver. J Gen Virol. 2004, 85 (Pt 6): 1497-1507.
- Deleersnyder V, Pillez A, Wychowski C, Blight K, Xu J, Hahn YS, Rice CM, Dubuisson J: Formation of native hepatitis C virus glycoprotein complexes. J Virol. 1997, 71 (1): 697-704.
- Weiner AJ, Christopherson C, Hall JE, Bonino F, Saracco G, Brunetto MR, Crawford K, Marion CD, Crawford KA, Venkatakrishna S: Sequence variation in hepatitis C viral isolates. J Hepatol. 1991, 13 (Suppl 4): S6-14.
- Goffard A, Callens N, Bartosch B, Wychowski C, Cosset FL, Montpellier C, Dubuisson J: Role of N-linked glycans in the functions of hepatitis C virus envelope glycoproteins. J Virol. 2005, 79 (13): 8400-8409. 10.1128/JVI.79.13.8400-8409.2005.
- Ashfaq UA, Masoud MS, Nawaz Z, Riazuddin S: Glycyrrhizin as antiviral agent against Hepatitis C Virus. J Transl Med. 2011, 9: 112. 10.1186/1479-5876-9-112.
- Ashfaq UA, Qasim M, Yousaf MZ, Awan MT, Jahan S: Inhibition of HCV 3a genotype entry through host CD81 and HCV E2 antibodies. J Transl Med. 2011, 9: 194. 10.1186/1479-5876-9-194.
- Flint M, McKeating JA: The role of the hepatitis C virus glycoproteins in infection. Rev Med Virol. 2000, 10 (2): 101-117. 10.1002/(SICI)1099-1654(200003/04)10:2<101::AID-RMV268>3.0.CO;2-W.
- Rosa D, Campagnoli S, Moretto C, Guenzi E, Cousens L, Chin M, Dong C, Weiner AJ, Lau JY, Choo QL: A quantitative test to estimate neutralizing antibodies to the hepatitis C virus: cytofluorimetric assessment of envelope glycoprotein 2 binding to target cells. Proc Natl Acad Sci U S A. 1996, 93 (5): 1759-1763. 10.1073/pnas.93.5.1759.
- Scarselli E, Ansuini H, Cerino R, Roccasecca RM, Acali S, Filocamo G, Traboni C, Nicosia A, Cortese R, Vitelli A: The human scavenger receptor class B type I is a novel candidate receptor for the hepatitis C virus. EMBO J. 2002, 21 (19): 5017-5025. 10.1093/emboj/cdf529.
- Mazzocca A, Sciammetta SC, Carloni V, Cosmi L, Annunziato F, Harada T, Abrignani S, Pinzani M: Binding of hepatitis C virus envelope protein E2 to CD81 up-regulates matrix metalloproteinase-2 in human hepatic stellate cells. J Biol Chem. 2005, 280 (12): 11329-11339. 10.1074/jbc.M410161200.
- Gardner JP, Durso RJ, Arrigale RR, Donovan GP, Maddon PJ, Dragic T, Olson WC: L-SIGN (CD 209L) is a liver-specific capture receptor for hepatitis C virus. Proc Natl Acad Sci U S A. 2003, 100 (8): 4498-4503. 10.1073/pnas.0831128100.
- Helle F, Dubuisson J: Hepatitis C virus entry into host cells. Cell Mol Life Sci. 2008, 65: 100-112. 10.1007/s00018-007-7291-8.
- Monazahian M, Böhme I, Bonk S, Koch A, Scholz C, Grethe S, Thomssen R: Low density lipoprotein receptor as a candidate receptor for hepatitis C virus. J Med Virol. 1999, 57 (3): 223-229.
- Pileri P, Uematsu Y, Campagnoli S, Galli G, Falugi F, Petracca R, Weiner A, Houghton M, Rosa D, Grandi G: Binding of hepatitis C virus to CD81. Science. 1998, 282: 938-941.
- Ashfaq UA, Khan SN, Nawaz Z, Riazuddin S: In-vitro model systems to study Hepatitis C Virus. Genet Vaccines Ther. 2011, 9: 1-7. 10.1186/1479-0556-9-1.
- Fournillier A, Wychowski C, Boucreux D, Baumert TF, Meunier JC, Jacobs D, Muguet S, Depla E, Inchauspe G: Induction of hepatitis C virus E1 envelope protein-specific immune response can be enhanced by mutation of N-glycosylation sites. J Virol. 2001, 75 (24): 12088-12097. 10.1128/JVI.75.24.12088-12097.2001.
- Idrees S, Ashfaq UA: Structural analysis and epitope prediction of HCV E1 protein isolated in Pakistan: an in-silico approach. Virol J. 2013, 10 (1): 113. 10.1186/1743-422X-10-113.
- Wilkins MR, Gasteiger E, Bairoch A, Sanchez JC, Williams KL, Appel RD, Hochstrasser DF: Protein identification and analysis tools in the ExPASy server. Methods Mol Biol. 1999, 112: 531-552.
- Ferre F, Clote P: DiANNA: a web server for disulfide connectivity prediction. Nucleic Acids Res. 2005, 33 (Web Server issue): W230-232.
- Zhang Y: I-TASSER server for protein 3D structure prediction. BMC Bioinformatics. 2008, 9: 40. 10.1186/1471-2105-9-40.
- Laskowski RA, MacArthur MW, Moss DS, Thornton JM: PROCHECK - a program to check the stereochemical quality of protein structures. J App Cryst. 1993, 26 (2): 283-291. 10.1107/S0021889892009944.
- Wiederstein M, Sippl MJ: ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res. 2007, 35 (Web Server issue): W407-410.
- Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001, 305 (3): 567-580. 10.1006/jmbi.2000.4315.
- Doytchinova IA, Flower DR: VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinformatics. 2007, 8: 4. 10.1186/1471-2105-8-4.
- Doytchinova IA, Guan P, Flower DR: EpiJen: a server for multistep T cell epitope prediction. BMC Bioinformatics. 2006, 7: 131. 10.1186/1471-2105-7-131.
- EL-Manzalawy Y, Dobbs D, Honavar V: Prediction of linear B-cell epitopes using string kernels. J Mol Recognit. 2008, 21 (4): 243-255. 10.1002/jmr.893.
- Vita R, Zarebski L, Greenbaum JA, Emami H, Hoof I, Salimi N, Damle R, Sette A, Peters B: The immune epitope database 2.0. Nucleic Acids Res. 2010, 38 (Database issue): D854-862.
- Rappuoli R: Reverse vaccinology. Curr Opin Microbiol. 2000, 3 (5): 445-450. 10.1016/S1369-5274(00)00119-3.
- Raju S, Rao UM: Current development strategies for vaccines and the role of reverse vaccinology. JPRHC. 2010, 2 (4): 339-346.
- Sette A, Rappuoli R: Reverse Vaccinology: Developing Vaccines in the Era of Genomics. Immunity. 2010, 33 (4): 530-541. 10.1016/j.immuni.2010.09.017.
- McCaffrey K: There is currently no high-resolution structure of the HCV E2 glycoprotein to further understand its mechanism of viral entry or immune evasion. 2010, Melbourne: The University of Melbourne.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.