Energetics based epitope screening in SARS CoV-2 (COVID 19) spike glycoprotein by Immuno-informatic analysis aiming to a suitable vaccine development

The recent outbreak by SARS-CoV-2 has generated a chaos in global health and economy and claimed/infected a large number of lives. Closely resembling with SARS CoV, the present strain has manifested exceptionally higher degree of spreadability, virulence and stability possibly due to some unidentified mutations. The viral spike glycoprotein is very likely to interact with host Angiotensin-Converting Enzyme 2 (ACE2) and transmits its genetic materials and hijacks host machinery with extreme fidelity for self propagation. Few attempts have been made to develop a suitable vaccine or ACE2 blocker or virus-receptor inhibitor within this short period of time. Here, attempt was taken to develop some therapeutic and vaccination strategies with a comparison of spike glycoproteins among SARS-CoV, MERS-CoV and the SARS-CoV-2. We verified their structure quality (SWISS-MODEL, Phyre2, Pymol) topology (ProFunc), motifs (MEME Suite, GLAM2Scan), gene ontology based conserved domain (InterPro database) and screened several epitopes (SVMTrip) of SARS CoV-2 based on their energetics, IC50 and antigenicity with regard to their possible glycosylation and MHC/paratopic binding (Vaxigen v2.0, HawkDock, ZDOCK Server) effects. We screened here few pairs of spike protein epitopic regions and selected their energetic, IC50, MHC II reactivity and found some of those to be very good target for vaccination. A possible role of glycosylation on epitopic region showed profound effects on epitopic recognition. The present work might be helpful for the urgent development of a suitable vaccination regimen against SARS CoV-2.


Introduction
An outbreak of a novel Coronavirus, Severe Acute Respiratory Syndrome CoV-2 (or SARS CoV-2 or COVID- 19) infection is threatening the humanity, globally occurring from last week of December 2019. As a result, a massive loss of human health status and global economy are becoming unaccountable. As of current situation, SARS CoV-2 claimed more than 40,000 lives from more than 800,000 infected persons globally [1]. The outbreak started from the Wuhan province of China and spread more than 195 countries with most adverse effects in China, Italy, Iran, Spain, the United States, France, Germany, Britain and several other countries.
Any type of therapeutic strategies starting from the blocking of viral entry, inhibition of spike proteins association with host ACE-2 (angiotensin converting enzyme type 2), modulations of interfering kinase activity, inactivation of viral genome expression-packaging and vaccination against this virus is the demand of the present situation. Regarding the vaccination strategies, it is assumed that frequent mutation results in anomalies in its surface/ spike proteins [2,3]. Mostly resembling the features of SARS CoV global outbreak (2003, https://www.who.int/csr/sars/en/) , this virus unlikely manifested it's extremely high grade of virulence, spreading capability and stability across the geographical barrier (or specifically colder place, aged persons or specific genders; yet to be clarified) [4].
The positive selective pressure could account for the stability and some clinical features of this virus compared with SARS and Bat SARS-like CoV [5]. Stabilizing mutation falling in the endosome-associated-protein-like domain of the nsp2 protein could account for COVID-2019 high ability of contagious, while the destabilizing mutation in nsp3 proteins could suggest a potential mechanism differentiating COVID-2019 from SARS CoV [5]. Nevertheless, nutritional and immunological statuses are also important factors for the screening of the therapeutic strategies for the affected and sensitive persons. Possible medications or immunizations from the existing drugs or infusion of convalescent plasma should be conducted with utmost care to the COVID 19 patients [6]. Advanced precautionary steps and therapeutic interventions should be formulated taking into account of several personal and community factors [7]. Development of a successful and reproducible vaccination protocol and its human trial may take longer time for the issues of mutation and large number glycan shield and epitope masking on the SARS CoV 2 proteins [8].
In a series of medication regimen, 1 (AT1R) blockers is used for reducing the severity and mortality from SARS-CoV-2 virus infections [9]. Chloroquine and Hydroxychloroquine are now being prescribed somewhere to fight COVID-19 for the time being [10,11]. Human coronaviruses and other influenza viruses resulted in epidemic in last 2 decade in different parts of the world. The anomalies between severity and spreading between the origin site, China and the other parts of the World (European and North America countries) might have some indication. Common human CoVs may have annual peaks of circulation in winter months in the US, and individual human CoVs may show variable circulation from year to year. [12] Colder climate and prior exposure to other human coronaviruses, or influenza or flu viruses or possible vaccination against those might develop antibody dependent enhancement (ADE) of immunological responses during recent SARS CoV-2 exposure. ADE might have modulated immune response and could elicit sustained inflammation, lymphopenia, and/or cytokine storm [13,14]. Possibly, that could be one of the reasons (more history of exposure with CoVs beside weaker immune system) for older people being more affected by the present SARS CoV-2. Moreover, both helper T cells and suppressor T cells in patients with COVID-19 were below normal levels. The novel coronavirus might mainly act on lymphocytes, especially T lymphocytes [15]. Strong inflammatory events could be the initiator of the collapsing environment during COVID-19 infection. In most of the death cases in COVID-19 infections, acute respiratory failure is followed by other organs like kidney anomalies. In these cases inflammatory outburst might have worsened the infection and post viral-incubation situations [16,17]. Recent studies in experimentally infected animal strongly suggest a crucial role for virus-induced immunopathological events in causing fatal pneumonia after human CoV infections [18]. So, combined anti-viral and anti-inflammatory treatment might be beneficial in these cases [19]. SARS-based available immune-therapeutic and prophylactic modalities revealed poor efficacy to neutralize and protect from infection by targeting the novel spike protein. [20].
In this background, critical screening of the spike sequence and structure from SARS CoV-2 by energetic and IC50 based immunoinfrmatics analysis may help to develop a suitable vaccine. So, in the current study we were intended to analyze the spike proteins of SARS CoV, MERS CoV and SARS CoV 2 and four other earlier out-breaking human corona virus strains.
We critically compared SARS CoV and SARS CoV 2 spike-proteins, domains, motifs and screened several epitopes based on their energetics, IC50 and antigenicity employing several bio/immuuno -informatics software with regard to their possible glycosylation and MHC/paratopic binding effects. The present work might be helpful for the urgent development of a suitable vaccination regimen.

Structure Prediction and Structure Quality Assessment
Tertiary structures of selected coronavirus (CoV) spike proteins were predicted/ validated using Phyre2, Protein Homology/analogy Recognition Engine V 2.0 [22] and SWISS-MODEL [23]. In Phyre2 structures were predicted against 100,000 experimentally designed protein folds.
Predicted structures were subjected to analysis in SWISS-MODEL for QMEAN Z-score calculation which includes cumulative Z-score of Cβ, All atoms, Solvation and Torsion values prediction. RAMPAGE: Ramachandran Plot Analysis server [24] was used for protein 3D structures quality assessment. The summation of number of residues in favored regions and in additionally allowed regions was considered for percent (%) quality assessment.

Protein Structural Alignment
Predicted tertiary structures were visualized and aligned using PyMol molecular visualization system. Pymol assigns the secondary structure using a secondary structure alignment algorithm called "dss", where the sequences of two structures were aligned first then the structures were aligned. For the visualization of molecules a high-speed ray-tracer molecular graphics system was used.

Secondary structure analysis
Secondary structural analysis and their 3D folding patterns were analyzed in the form of topology using ProFunc; a protein function predicting server using protein 3D structures [25]. In protein classification, topology analysis plays an independent and effective alternative to traditional structural prediction. Topological differences between two structures indicated differences in protein folding and flexibility.

Sequence Comparison
Sequence comparisons among selected CoV spike glycoproteins were conducted through multiple sequence alignment using Clustal X2 [26]. Conserved motifs were identified using MEME Suite (http://meme.sdsc.edu/meme/cgi-bin/mast.cgi) server. MEME Suite represents the ungapped conserved sequences which are frequently present in a group of related sequences. The 7 motif number has been defined in the current study for motif finding. Whereas, GLAM2Scan tools was used for the identification of gapped motifs within the related sequences. Conserved motifs were represented through LOGO using GLAM2Scan tools of MEME Suite server.

Epitope Designing
Conserved epitopes of SARS Cov-2 spike glycoprotein were identified using SVMTrip: A tool which predicts Linear Antigenic Epitopes [28]. SVMTrip predicts the linier antigenic epitopes by feeding Support Vector Machine with the Tri-peptide similarity and Propensity scores of different pre-analyzed epitope data. Annotation of predicted epitopes was performed through protein BLAST. SVMTrip have gained 80.1% sensitivity and 55.2% precision value with five fold cross-validation. For epitope prediction 20 amino acid lengths was selected.

Analysis for Epitopes binding efficiency to MHC Class II
The Major Histocompatibility Complex (MHC) binding efficiency of predicted epitops was performed using Immune Epitope Database (IEDB) and Analysis Resource [29]. A total of 5 DPA, 6 DQA and 662 DRB alleles from MHC class II were screened for the detection of best interactive alleles on the basis of highest consensus percentile rank and lowest IC50 value. All the analyses were performed on Human Class II allele, using frequently occurring alleles (frequency > 1%), peptide length of 9mers was selected; consensus percentile rank ≤ 1 was used for the selection of peptides.

Antigenecity Prediction
Antigenecity of predicted epitopes were determined using Vaxigen v2.0 protective antigen, tumour antigens and subunit vaccines prediction server [30]. Vaxigen v2.0 uses auto cross covariance (ACC) transformation of selected protein sequences based on unique amino acid properties. Each sequence was used to find out 100 known antigen and 100 non-antigens.
The identified sequences were tested for antigenecity by leave-one-out cross-validation and overall external validation. The prediction accuracy was up to 89%.

Molecular Docking
The structure of MHC class II HLA-DRA, DRB molecule (PDB ID: 2q6w, 5jlz) and fully glycosylated COVID 19 spike protein structure (PDB ID: 6svb) was retrieved from Protein Data Bank (PDB) and Docking was performed using HawkDock [31] and ZDOCK [32] Server generating 100 docking solutions. Among them best 10 were analyzed based on docking scores and binding free energy value calculation.

Structure Prediction and Structure Quality Assessment
The Higher degree of similarity between SARS CoV and SARS CoV-2 might indicate and help in the therapeutic and vaccination strategies with reference to the current global situation.
However, an absolute higher degree of virulence and spreading nature of SARS CoV-2 is of great concern in the present scenario.

Conserved Motif Identification
Based on the alignment pattern, selected sequences were subjected to analysis different conserved motifs in the protein sequences. A total of 7 conserved motifs were analyzed (

COVID 19 specific Epitope Designing
The epitope designing was conducted only with the COVID 19 spike glycoprotein sequence and structure. From the sequence analysis, 10 different locations were found which also showed similarity with SARS CoV and SARS COVID 2 spike glycoprotein in protein BLAST (Figure 4). Also the motif positions within the spike glycoprotein monomer were represented in figure 4. Epitopes 1, 4 and 5 were not represented in COVID 19 spike glycoproteins, as they were found to be embedded within the virus envelop. Among the others, motifs 2, 3, 6 and 7 were found at the interior location of spike glycoprotein monomer but motifs 8, 9 and 10 were found at the surface of the structure, which could be used as the immunological targets for the proper diagnosis and treatment of COVID 19. The important issue of epitope finalization could be confronted by the factor of possible transition between pre-fusion and postfusion spike structural distortion. Specific mutant structure has been designed and tested to be resistant to conformational change after ACE2 binding and protease cleavage at the S1/S2 site [34]. This may be indicative to searching suitable epitope which may remain unhindered from pre-to post-fusion state transition.

Analysis of epitope binding to specific MHC Class II
The proper type of Major Histocompatibility Complex (MHC) selection for identified COVID 19 epitope was performed and enlisted in  Table 3). The threshold value of highest consensus percentile rank was selected as 10 for all. As a whole, highest Consensus percentile rank value of 10 was observed for sequence QQLIRAAEIRASANL (epitope 3A) and lowest IC50 value of 7.11 was observed for sequence IIAYTMSLGAENSVA (epitope 8B).
The antigenic property of identified target sequences from epitopes was also predicted on the basis of threshold value of 0.4. Below the threshold value, the sequence has been considered as non-antigenic and sequences with above value were antigenic in nature. A total of 9 antigenic sequences were detected (Table 3), among them two sequences AAEIRASANLAATKM (epitope 3B) and ITPGTNTSNQVAVLY (epitope 10B) were found with higher threshold value of 0.7125 and 0.7193 respectively. As the location of 3B was more interior 10B could be used as potent antigen. According to epitope locations ( Figure 4) and antigenic nature, other sequences like 8A&B, 9A&B could be the target also.

Glycosylation and structural modification
Coronavirus spike proteins are glycosylated in nature where N-acetyl glucosamine (NAG) is the main component. Glycan shielding and possible epitope masking of an HCoV-NL63 has been observed which may be the barrier for proper immunogenic responses [8].
Comparative analysis between glycosylated and non-glycosylated protein revealed some structural modification at the epitope locations. Among the identified epitopes 10B with sequence ITPGTNTSNQVAVLY (598-612) was found with N linked glycosylation at 603 position. The structural modification of this epitope was analyzed using non-glycosylated protein structure of COVID 19 (Acc. No.: NC_045512.2:21563-25384) and gltcosylated COVID 19 protein (PDB ID: 6vsb). Effect of glycosylation on protein structures revealed that glycosylated conformation was more organised (Figure 5a) than non-glycosylated one (Figure 5b). Secondary structural comparison between two epitopes showed more organised structure with attached NAG residue (Figure 5d) whereas a shorter β-sheet structure was observed when NAG is removed from the structure (Figure 5c). The peptide interactive site of 10B epitope was blocked due to NAG attachment. As a result of which antibody binding to the antigen may hamper. The NAG residue directly binds with N or ASN amino acid residue (Figure 5e). So the removal of NAG from the spike glycoprotein structure is difficult. Structural distortion between glycosylated and non-glycosylated epitope 10B at tertiary level indicated that removal of NAG may distort the structure of epitope (Figure 5f). Again that may hamper the proper antigenantibody binding.

Effect of epitope glycosylation on MHC class IIepitope binding
In this section energetics of epitope attachment with MHC class II HLA-DRA, DRB was determined in presence and absence of NAG at the 10B epitope structure through molecular docking ( Figure 6). Docking results showed that without NAG, the binding efficiency of 10B Though, epitope 8 A & B were also present at the surface of the spike glycoprotein but was found to wrapped with a short segment IGAEHVNNSYECD (651-663) carrying a glycosylation at N residue position 657 (Figure 7a). As a result of which antibody accessibility to this epitope may also be difficult. Whereas, surface epitope 9 with sequence VRDPQTLEILDITPC (576-590) showed highest antigenecity of 1.1285 (Table 3)  can recombine to gain entry into human cells are the points also to be noted [35]. SARS CoV-2 induced severe and often lethal lung failure is caused due to its inhibition of ACE-2 expression [36]. So, keeping the ACE-2 normal functioning but blocking viral entry is the most challenging issue right now. Possible suitable epitope as screened in the current study may be helpful in this global pandemic situation. The history of last two decades' outbreak of these types of virus is very much evident. The present situation justifies further advanced studies with proper infrastructure and fund-resources facilities at a global scale to eradicate current or any possible future outbreak.         Position of Epitope 9 in COVID 19 spike protein with no direct or indirect NAG attachment, PDB ID: 6svb (b). Among 10 best docking positions, 8 were found in epitope presenting site of MHC II HLA-DRB1, PDB ID: 5jlz (c). The best docking posture of epitope 9 with MHC II HLA-DRB1 (d) and its specific interaction pattern (e).