Genetic variability of the core protein in hepatitis C virus genotype 4 in Saudi Arabian patients and its implication on pegylated interferon and ribavirin therapy

Background Hepatitis C virus (HCV) shows remarkable genetic diversity, which contributes to its high persistence and varied susceptibility to antiviral treatment. Previous studies have reported that amino acid substitutions in the HCV subgenotype 1b core protein are associated with a poor response to pegylated interferon and ribavirin (PEG-IFN/RBV) combined therapy. Objectives Because the role of the core protein in HCV genotype 4 infections is unclear, we aimed to compare the full-length core protein sequences of HCV genotype 4 between Saudi patients who responded (SVR) and those who did not respond (non-SVR) to PEG-IFN/RBV therapy. Study design Direct sequencing of the full-length core protein and bioinformatics sequence analysis were used. Results Our data revealed a significant association between core protein mutations, particularly at position 70 (Arg70Gln), and treatment outcome in HCV subgenotype 4d patients. However, HCV subgenotype 4a showed no significant association between core protein mutations and treatment outcome. In addition, the amino acid residue at position 91 was well conserved among the studied patients, with Cys91 as the dominant residue. Conclusions These findings provide new insight into HCV genotype 4 in the affected Saudi population, where knowledge of HCV core gene polymorphisms is inadequate.


Background
Hepatitis C virus (HCV) infects more than 170 million people worldwide, leading to chronic hepatitis, cirrhosis and hepatocellular carcinoma [1]. HCV belongs to the family Flaviviridae and is a member of the Hepacivirus genus. It is classified into seven genotypes and numerous subtypes [2,3]. HCV has a single-stranded RNA genome that encodes a polyprotein, which is subsequently cleaved into a number of structural and non-structural proteins. Although the function of each protein has been intensively studied, the point mutations that occur at various positions and cause antiviral drug resistance are largely unknown. Therefore, studying nucleotide sequence variation in HCV, in the core region in particular, from different geographical regions is important for understanding its worldwide prevalence and for its clinical management.
Recently, advances in HCV treatment have led to the development of many direct-acting antiviral (DAA) agents.
Early this year, the U.S. Food and Drug Administration (FDA) approved a new therapy (simeprevir) to treat chronic HCV infection [4]. However, the standard treatment for chronic hepatitis C infection in developing countries is pegylated interferon (PEG-IFN) plus ribavirin (RBV), where the expected outcome of treatment is a sustained virological response (SVR) [5]. PEG-IFN/RBV treatment is associated with serious side effects and high medical cost. As a result, it is important to predict the response to therapy for each individual patient beforehand. Previous studies have shown that sequence polymorphisms within viral proteins, such as the core protein, correlate with IFN-based treatment outcome. For example, substitutions of amino acid 70 and/or 91 in the HCV subgenotype 1b core protein are predictors of poor response to PEG-IFN/RBV treatment [6,7]. The clinical advantage of predicting SVR to PEG-IFN/RBV is that patients with Arg70/Leu91 residues ought to continue the treatment course with a predicted positive response, whereas patients who have mutated residues in the core region (Gln70/Met91) would be advised to withdraw from treatment to avoid unnecessary side effects. Indeed, if a correlation between HCV core gene mutation(s) and treatment outcome is established, then HCV sequencing can become a noninvasive and economical tool to assess an individual's status and response to treatment.
Although HCV genotype 4 is the cause of approximately 20% of HCV infections worldwide, it is poorly studied [8]. Furthermore, there are few studies and limited informative data on patients in Saudi Arabia who are infected with HCV genotype 4. The aim of this study is to analyze the core protein of HCV genotype 4 from Saudi patient isolates and to investigate the association between core protein sequence variations and treatment outcome.

Study patients and treatment regimens
The study protocol was approved by the local ethics committee at King Faisal Specialist Hospital and Research Center, and written informed consent was obtained from each patient. A total of 115 baseline (i.e., treatment-naïve) patients from three different hospitals (King Khalid University Hospital, King Faisal Specialist Hospital and Research Center, and Riyadh Military Hospital) in Riyadh, Saudi Arabia, were included in this study. Exclusion criteria included co-infection with hepatitis B or human immunodeficiency virus, co-existent autoimmune or metabolic liver disease, active drug-induced hepatitis, decompensated cirrhosis, evidence of severe retinopathy, neoplastic disease, coronary artery or cerebrovascular disease, and a history of clinically relevant psychiatric disease. The complete treatment protocol used for these patients was previously published [9]. HCV RNA extraction, genotyping and subgenotyping were performed using previously described methods [10]. Herein, we present the most dominant subgenotypes of HCV genotype 4, HCV-4d and HCV-4a, in each group (SVR and non-SVR). Due to the limited sample size, we excluded 4r, 4n and 4o from data analysis.

HCV sequence alignment and primer design
Complete genome sequences of HCV from different geographical regions were retrieved from the GenBank database (http://blast.ncbi.nlm.nih.gov/Blast.cgi). Multiple sequence alignment of the retrieved sequences was performed using the ClustalW module of MegAlign software (DNASTAR, Inc.), and the consensus sequence was used to design degenerate primers for the core region. Primer sequences and positions are as follows: Forward: 5' TGCTAGCCGAGTAGTGTTGG 3' (positions 246-268); Reverse: 5' CCARTTCATCATCATRTCCCA 3' (positions 1298-1318). The amplicon size is 1045 bp.
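The reverse primer above contains the IUPAC ambiguity code R (A or G) at two positions, so it represents a small pool of concrete oligonucleotides. As an illustration only, and not part of the original workflow, the following minimal Python sketch enumerates that pool; the function name `expand_degenerate` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Primer sequences as given in the text (5' to 3').
forward = "TGCTAGCCGAGTAGTGTTGG"
reverse = "CCARTTCATCATCATRTCCCA"

for variant in expand_degenerate(reverse):
    print(variant)
```

With two R positions, the reverse primer expands to four sequences, while the non-degenerate forward primer expands only to itself.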

Polymerase chain reaction (PCR)
All PCR mixtures had a total volume of 25 μl and contained 1 μl of HCV cDNA, 12.5 μl of GoTaq® Green Master Mix (Promega, Madison, USA), 1 μM each of forward and reverse primers, and sterile nuclease-free water. In addition, appropriate positive and negative controls were employed. PCR conditions were as follows: an initial denaturing step of 2 min at 95°C, followed by 35 cycles of a 30 sec denaturing step at 95°C, a 1 min annealing step at 56°C, and a 1 min extension step at 72°C. A final extension at 72°C for 5 min was performed. PCR amplicons were visualized on a 1.5% agarose gel stained with ethidium bromide. The positive amplicons were processed further for sequencing using an ABI3730XL sequencer (Applied Biosystems, Foster City, CA). To confirm positive results, nucleotide sequences were searched with BLAST against the NCBI database.
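For planning purposes, the cycling program described above can be written down as data and its total hold time summed. The sketch below is illustrative only: the variable and function names are hypothetical, and instrument ramp times between temperatures are not specified in the text and are therefore ignored.

```python
# Cycling program as stated in the text; all durations in seconds.
INITIAL_DENATURATION = 2 * 60      # 95 °C
CYCLES = 35
CYCLE_STEPS = [
    ("denature", 95, 30),
    ("anneal", 56, 60),
    ("extend", 72, 60),
]
FINAL_EXTENSION = 5 * 60           # 72 °C


def total_hold_seconds():
    """Sum of all programmed hold times, excluding ramping between steps."""
    per_cycle = sum(seconds for _name, _temp, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION + CYCLES * per_cycle + FINAL_EXTENSION


total = total_hold_seconds()
print(f"{total} s = {total / 60:.1f} min (ramp times not included)")
```

Under these assumptions the programmed hold time is 5670 s, or about 94.5 min per run.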

Data analysis and statistics
Sequence chromatograms of 115 full-length core gene sequences were aligned and edited using the Lasergene suite for sequence analysis (DNASTAR, Inc.) [11]. Nucleotide (573 bp) and amino acid (191 aa) sequences from different patient isolates were aligned using the ClustalX module (MegAlign, DNASTAR, Inc.). Full-length core gene sequences of HCV genotype 4 were retrieved from GenBank and used in this study as references. The BioEdit program was used to visually display the full-length core protein with the corresponding genotype references [12]. In addition, a phylogenetic tree was constructed using HCV genotype 4 patient sequences (all subgenotypes were included) and 20 random reference sequences. The neighbor-joining method with 1,000 bootstrap replications was employed in constructing the tree using Mega 5.0 software [13].
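The 573 bp core gene corresponds to 191 codons, so any residue position maps directly to a nucleotide triplet. The following minimal sketch shows that mapping; it assumes the sequence starts at the first nucleotide of the core open reading frame and contains no insertions or deletions, and the helper names `codon_range` and `codon_at` are hypothetical.

```python
CORE_NT_LENGTH = 573   # nucleotides, as stated in the text
CORE_AA_LENGTH = 191   # amino acids, as stated in the text


def codon_range(aa_position):
    """Return the 1-based inclusive nucleotide range of a 1-based residue."""
    if not 1 <= aa_position <= CORE_AA_LENGTH:
        raise ValueError("position outside the core protein")
    start = 3 * (aa_position - 1) + 1
    return start, start + 2


def codon_at(core_nt_sequence, aa_position):
    """Extract the codon for a residue from an ungapped core gene sequence."""
    start, end = codon_range(aa_position)
    return core_nt_sequence[start - 1:end]


for position in (70, 91):
    print(position, codon_range(position))
```

Under these assumptions, residue 70 is encoded by nucleotides 208-210 and residue 91 by nucleotides 271-273 of the core gene.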
Further, the most statistically significant differences between the responder and non-responder groups were detected using the Viral Epidemiology Signature Pattern Analysis (VESPA) tool, provided by the HCV sequence database [14]. Numerical data were analyzed by Student's t test using STATA IC/13 software (StataCorp LP, Houston, USA), where a P value of <0.05 was considered statistically significant.
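The idea behind a signature pattern comparison can be sketched in a few lines of Python. This is a simplified illustration, not the VESPA implementation: it assumes pre-aligned, equal-length amino acid sequences, reports positions where the most frequent residue differs between a query group and a background group, and performs no significance testing. The function names and the five-residue toy peptides are hypothetical and are not study data.

```python
from collections import Counter


def most_common_residue(sequences, index):
    """Return (residue, frequency) of the most frequent residue in a column."""
    counts = Counter(seq[index] for seq in sequences)
    residue, count = counts.most_common(1)[0]
    return residue, count / len(sequences)


def signature_positions(query, background):
    """List columns whose dominant residue differs between the two groups.

    Each hit is (1-based position, background residue, query residue,
    frequency of the query residue within the query group).
    """
    lengths = {len(seq) for seq in query + background}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    (length,) = lengths
    hits = []
    for i in range(length):
        q_res, q_freq = most_common_residue(query, i)
        b_res, _b_freq = most_common_residue(background, i)
        if q_res != b_res:
            hits.append((i + 1, b_res, q_res, q_freq))
    return hits


# Toy alignment (hypothetical): column 3 differs between the groups.
responders = ["MSRKP", "MSRKP", "MSQKP"]
non_responders = ["MSQKP", "MSQKP", "MSRKP"]
print(signature_positions(non_responders, responders))
```

In practice, a flagged position would still need a frequency-based test on the real residue counts before it is called significant.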

Results
Response to PEG-IFN/RBV therapy
One hundred and fifteen (115) patients with chronic HCV genotype 4 were enrolled in this study. The patients' clinical characteristics are presented in Table 1. Notably, there was no significant association between the response to treatment and age, gender, weight, liver enzymes, HCV viral load, disease stage, or grade. However, there was a significant association between subgenotype and treatment response: the SVR rate in HCV-4a was 58%, while the SVR rate in HCV-4d was lower (35%) (P value = 0.02). Twenty-four weeks after the completion of 48 weeks of PEG-IFN/RBV combined treatment, patients were tested and divided into responders (SVR, 48%) and non-responders (non-SVR, 51%). The mean genetic distance of HCV genotype 4 was calculated between SVR and non-SVR patients (Table 2). All sequences reported in this study were deposited in GenBank and were assigned the accession numbers KC143812-KC143908.
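To make the notion of a mean genetic distance between the SVR and non-SVR groups concrete, the sketch below averages pairwise p-distances (the proportion of differing sites) across the two groups. The distance model actually used for Table 2 is not stated in the text, so the p-distance is an assumption made for illustration; the function names and the four-nucleotide toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def mean_between_group_distance(group_a, group_b):
    """Average p-distance over all pairs drawn one from each group."""
    distances = [p_distance(a, b) for a in group_a for b in group_b]
    return sum(distances) / len(distances)


# Toy sequences (hypothetical, not study data).
svr = ["ACGT", "ACGA"]
non_svr = ["ACGT", "TCGT"]
print(mean_between_group_distance(svr, non_svr))
```

Model-corrected distances, as offered by phylogenetics packages, would generally be somewhat larger than the raw p-distance for divergent sequences.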

Phylogenetic analysis of SVR and non-SVR patients
Phylogenetic analysis of core sequences provides information about the overall relatedness of the core gene among HCV genotype 4 isolates. One hundred and fifteen sequences of the HCV-4 core gene from the SVR and non-SVR groups were used to construct the tree (Figure 1). HCV-4 sequences showed no clustering based on response to treatment; rather, they clustered correctly with their respective subgenotypes (i.e., HCV-4a and -4d).
Multiple sequence alignment of the core protein
There was a significant correlation between HCV-4d core

Patterns discovery and recognition
Positional variations of the core protein were compared using Viral Epidemiology Signature Pattern Analysis (VESPA). The results revealed that the variations between HCV-4a SVR and non-SVR patients were not statistically significant (Figure 4A), whereas the signature pattern analysis of HCV-4d SVR and non-SVR patients was statistically significant at position 70 (Arg70Gln) (P value < 0.05) (Figure 4B).

Discussion
The HCV core gene is the genetic region that encodes the viral nucleocapsid protein. The protein consists of 191 amino acid residues that are divided into three domains: an N-terminal hydrophilic domain (D1, residues 1-117), a C-terminal hydrophobic domain (D2, residues 118-170), and the last 21 amino acids, which serve as a signal peptide for the downstream envelope protein E1 [15,16]. It has been shown that the core protein is associated with a number of cellular proteins and pathways that have a direct effect on the HCV lifecycle and biology [17]. Also, the HCV core protein has been suggested to play a role in inhibiting the antiviral activity of IFN through interaction with the cellular protein STAT1 [18]. Therefore, mutations in this protein have the potential to alter the viral structure, leading to unexpected effects such as a poor response to PEG-IFN/RBV therapy. Previous studies have shown that there is a significant correlation between mutations in the core protein and poor treatment outcome. In particular, patients who had substitutions of Arg70 to Gln70 and/or Leu91 to Met91 showed a lower response to PEG-IFN/RBV combined therapy [19,20]. However, most of these studies were conducted on Asian populations, especially Japanese patients, who were diagnosed with HCV genotype 1b. Herein, we hypothesized that amino acid substitutions in the core region of HCV genotype 4 (subgenotypes 4a and 4d) could correlate with treatment outcome. HCV subgenotype 4d showed that there is a significant association between core protein mutations, particularly at position  can be classified into host and/or viral factors. Host factors include age, gender, patient body weight, ethnicity, alcohol consumption and host genetic variations. Several recent studies have shown that single nucleotide polymorphisms (SNPs) in the IL-28B gene region are associated with response to combination therapy with pegylated IFN-α and ribavirin [21].
On the other hand, virus genotype and viral load have been shown to modulate treatment outcome [6,22]. Based on previous studies, HCV genotype is the most significant factor affecting treatment response [23]. While HCV genotypes 2 and 3 have the highest rate of SVR to PEG-IFN/RBV treatment (80%), HCV genotypes 1 and 4 show more resistance to treatment (50-60%) [24,25]. Notably, the present study revealed that the SVR rate in HCV-4a is higher (58%) than in HCV-4d (35%), indicating a role of the subgenotype in treatment response. The differences in treatment response among genotypes and subgenotypes suggest a role of viral sequence variation. It is noteworthy that most previous studies were conducted on Asian populations. Thus, further investigations are needed to explore this phenomenon in different ethnic populations.

HCV-4d Isolates
In a recent study, El-Shamy et al. investigated 43 Egyptian patients who were infected with HCV genotype 4 (mostly subgenotype 4a) and found no significant correlation between core protein amino acid substitutions at position 70 and/or 91 and treatment outcome [26]. Our finding with regard to HCV-4a agrees with that report: substitutions at positions 70 and/or 91 are not associated with antiviral resistance.
However, in HCV-4d patient isolates, our data revealed a significant association between core amino acid substitutions, particularly at position 70, and treatment outcome. Phylogenetic analysis and sequence comparison showed no clustering based on treatment response; rather, the sequences grouped correctly with their corresponding subgenotypes (i.e., HCV-4a and -4d).

Conclusions
The present study revealed that HCV-4d has a point mutation at position 70 (Arg70Gln) that is significantly associated with treatment outcome. However, no evidence was found in HCV-4a for an association between core protein polymorphisms, at position 70 and/or 91, and treatment outcome. Instead, mutations were scattered over the full-length core region with no specific association with drug resistance. Although several possibilities have been proposed to explain the effect of amino acid substitutions in the core protein on treatment outcome, the exact mechanism has not been determined. Nonetheless, this study suggests that single nucleotide mutations in the core gene could prove helpful in predicting treatment outcome, at least in subgenotype 4d-infected patients.
All authors have read and approved the final manuscript.