Energetics and IC50 based epitope screening in SARS CoV-2 (COVID 19) spike protein by immunoinformatic analysis implicating for a suitable vaccine development

The recent outbreak by SARS-CoV-2 has generated a chaos in global health and economy and claimed/infected a large number of lives. Closely resembling with SARS CoV, the present strain has manifested exceptionally higher degree of spreadability, virulence and stability possibly due to some unidentified mutations. The viral spike glycoprotein is very likely to interact with host Angiotensin-Converting Enzyme 2 (ACE2) and transmits its genetic materials and hijacks host machinery with extreme fidelity for self propagation. Few attempts have been made to develop a suitable vaccine or ACE2 blocker or virus-receptor inhibitor within this short period of time. Here, attempt was taken to develop some therapeutic and vaccination strategies with a comparison of spike glycoproteins among SARS-CoV, MERS-CoV and the SARS-CoV-2. We verified their structure quality (SWISS-MODEL, Phyre2, and Pymol) topology (ProFunc), motifs (MEME Suite, GLAM2Scan), gene ontology based conserved domain (InterPro database) and screened several epitopes (SVMTrip) of SARS CoV-2 based on their energetics, IC50 and antigenicity with regard to their possible glycosylation and MHC/paratope binding (Vaxigen v2.0, HawkDock, ZDOCK Server) effects. We screened here few pairs of spike protein epitopic regions and selected their energetic, Inhibitory Concentration50 (IC50), MHC II reactivity and found some of those to be very good target for vaccination. A possible role of glycosylation on epitopic region showed profound effects on epitopic recognition. The present work might be helpful for the urgent development of a suitable vaccination regimen against SARS CoV-2.

Page 2 of 14 Banerjee et al. J Transl Med (2020) 18:281 lives from more than 60,57,853 infected persons globally [1]. The outbreak started from the Wuhan province of China and spread at about 216 countries with most adverse effects in China, Italy, Iran, Spain, the United States, France, Germany, Britain and several other countries. Any type of therapeutic strategies starting from the blocking of viral entry, inhibition of spike proteins association with host ACE-2 (angiotensin converting enzyme type 2), modulations of interfering kinase activity, inactivation of viral genome expression-packaging and vaccination against this virus is the demand of the present situation. Regarding the vaccination strategies, it is assumed that frequent mutation results in anomalies in its surface/spike proteins [2,3].
Mostly resembling the features of SARS CoV global outbreak (2003, https ://www.who.int/csr/sars/en/), this virus unlikely manifested it's extremely high grade of virulence, spreading capability and stability across the geographical barrier (or specifically colder place, aged persons or specific genders; yet to be clarified) [4]. The positive selective pressure could account for the stability and some clinical features of this virus compared with SARS and Bat SARS-like CoV [5]. Stabilizing mutation falling in the endosome-associated-protein-like domain of the nsp2 protein could account for COVID-2019 high ability of contagious, while the destabilizing mutation in nsp3 proteins could suggest a potential mechanism differentiating COVID-2019 from SARS CoV [5]. Nevertheless, nutritional and immunological statuses are also important factors for the screening of the therapeutic strategies for the affected and sensitive persons. Possible medications or immunizations from the existing drugs or infusion of convalescent plasma should be conducted with utmost care to the COVID 19 patients [6]. Advanced precautionary steps and therapeutic interventions should be formulated taking into account of several personal and community factors [7]. Development of a successful and reproducible vaccination protocol and its human trial may take longer time for the issues of mutation and large number glycan shield and epitope masking on the SARS CoV 2 proteins [8].
In a series of medication regimen, 1 (AT1R) blockers is used for reducing the severity and mortality from SARS-CoV-2 virus infections [9]. Chloroquine and Hydroxychloroquine are now being prescribed somewhere to fight COVID-19 for the time being [10,11]. Human coronaviruses and other influenza viruses resulted in epidemic in last 2 decade in different parts of the world. The anomalies between severity and spreading between the origin site, China and the other parts of the World (European and North America countries) might have some indication. Common human CoVs may have annual peaks of circulation in winter months in the US, and individual human CoVs may show variable circulation from year to year. [12].
Colder climate and prior exposure to other human coronaviruses, or influenza or flu viruses or possible vaccination against those might develop antibody dependent enhancement (ADE) of immunological responses during recent SARS CoV-2 exposure. ADE might have modulated immune response and could elicit sustained inflammation, lymphopenia, and/or cytokine storm [13,14]. Possibly, that could be one of the reasons (more history of exposure with CoVs beside weaker immune system) for older people being more affected by the present SARS CoV-2. Moreover, both helper T cells and suppressor T cells in patients with COVID-19 were below normal levels. The novel coronavirus might mainly act on lymphocytes, especially T lymphocytes [15]. Strong inflammatory events could be the initiator of the collapsing environment during COVID-19 infection. In most of the death cases in COVID-19 infections, acute respiratory failure is followed by other organs like kidney anomalies. In these cases inflammatory outburst might have worsened the infection and post viral-incubation situations [16,17]. Recent studies in experimentally infected animal strongly suggest a crucial role for virus-induced immune-pathological events in causing fatal pneumonia after human CoV infections [18]. So, combined anti-viral and anti-inflammatory treatment might be beneficial in these cases [19]. SARS-based available immune-therapeutic and prophylactic modalities revealed poor efficacy to neutralize and protect from infection by targeting the novel spike protein. [20].
In this background, critical screening of the spike sequence and structure from SARS CoV-2 by energetic and IC50 based immune-informatics analysis may help to develop a suitable vaccine. So, in the current study we were intended to analyze the spike proteins of SARS CoV, MERS CoV and SARS CoV 2 and four other earlier out-breaking human corona virus strains. We critically compared SARS CoV and SARS CoV 2 spike-proteins, domains, motifs and screened several epitopes based on their energetics, IC50 and antigenicity employing several bio/immuuno-informatics software with regard to their possible glycosylation and MHC/paratope binding effects. The present work might be helpful for the urgent development of a suitable vaccination regimen.

Structure prediction and structure quality assessment
Tertiary structures of selected coronavirus (CoV) spike proteins were predicted/validated using Phyre2, Protein Homology/analogy Recognition Engine V 2.0 [22] and SWISS-MODEL [23]. In Phyre2 structures were predicted against 100,000 experimentally designed protein folds. Predicted structures were subjected to analysis in SWISS-MODEL for QMEAN Z-score calculation which includes cumulative Z-score of Cβ, All atoms, Solvation and Torsion values prediction. RAMPAGE: Ramachandran Plot Analysis server [24] was used for protein 3D structures quality assessment. The summation of number of residues in favored regions and in additionally allowed regions was considered for percent (%) quality assessment.

Protein structural alignment
Predicted tertiary structures were visualized and aligned using PyMol molecular visualization system. Pymol assigns the secondary structure using a secondary structure alignment algorithm called "dss", where the sequences of two structures were aligned first then the structures were aligned. For the visualization of molecules a high-speed ray-tracer molecular graphics system was used.

Secondary structure analysis
Secondary structural analysis and their 3D folding patterns were analyzed in the form of topology using Pro-Func; a protein function predicting server using protein 3D structures [25]. In protein classification, topology analysis plays an independent and effective alternative to traditional structural prediction. Topological differences between two structures indicated differences in protein folding and flexibility.

Sequence comparison
Sequence comparisons among selected CoV spike glycoproteins were conducted through multiple sequence alignment using Clustal X2 [26]. Conserved motifs were identified using MEME Suite (http://meme.sdsc.edu/ meme/cgi-bin/mast.cgi) server. MEME Suite represents the ungapped conserved sequences which are frequently present in a group of related sequences. The 7 motif number has been defined in the current study for motif finding. Whereas, GLAM2Scan tools was used for the identification of gapped motifs within the related sequences. Conserved motifs were represented through LOGO using GLAM2Scan tools of MEME Suite server. Identified motifs were subjected to annotation using protein BLAST (https ://blast .ncbi.nlm.nih.gov/Blast .cgi?PROGR AM=blast p&PAGE_TYPE=Blast Searc h&LINK_LOC=blast home) and finally functional gene ontology based conserved domain identification was conducted using InterPro: Classification of protein families interactive database [27].

Epitope designing
Conserved epitopes of SARS Cov-2 spike glycoprotein were identified using SVMTrip: A tool which predicts Linear Antigenic Epitopes [28]. SVMTrip predicts the linier antigenic epitopes by feeding Support Vector Machine with the Tri-peptide similarity and Propensity scores of different pre-analyzed epitope data. Annotation of predicted epitopes was performed through protein BLAST. SVMTrip have gained 80.1% sensitivity and 55.2% precision value with five fold cross-validation. For epitope prediction 20 amino acid lengths was selected.

Analysis for epitopes binding efficiency to MHC class II
The Major Histocompatibility Complex (MHC) binding efficiency of predicted epitopes was performed using Immune Epitope Database (IEDB) and Analysis Resource [29]. A total of 5 DPA, 6 DQA and 662 DRB alleles from MHC class II were screened for the detection of best interactive alleles on the basis of highest consensus percentile rank and lowest IC50 value. All the analyses were performed on Human Class II allele, using frequently occurring alleles (frequency > 1%), peptide length of 9mers was selected; consensus percentile rank ≤ 1 was used for the selection of peptides.

Antigenecity prediction
Antigenecity of predicted epitopes were determined using Vaxigen v2.0 protective antigen, tumour antigens and subunit vaccines prediction server [30]. Vaxigen v2.0 uses auto cross covariance (ACC) transformation of selected protein sequences based on unique amino acid properties. Each sequence was used to find out 100 known antigen and 100 non-antigens. The identified sequences were tested for antigenecity by leave-one-out cross-validation and overall external validation. The prediction accuracy was up to 89%.

Molecular docking
The structure of MHC class II HLA-DRA, DRB molecule (PDB ID: 2q6w, 5jlz) and fully glycosylated COVID 19 spike protein structure (PDB ID: 6svb) was retrieved from Protein Data Bank (PDB) and Docking was performed using HawkDock [31] and ZDOCK [32] Server generating 100 docking solutions. Among them best 10 were analyzed based on docking scores and binding free energy value calculation.

Structure prediction and structure quality assessment
Initially the seven preselected spike glycoprotein sequences including the recent outbreaking strain SARS CoV-2 (Covid 19) were subjected to tertiary structure prediction ( Table 1). The Ramachandran plot data and structural alignment data suggests that SARS CoV and SARS CoV-2 (Covid-2) has higher degree of alignment (  (Figs. 1, 2). Position specific multiple sequence alignment also showed the highest similarity of COVID 19 with the SARS CoV (Fig. 1). Although having sequential diversity, all the selected spike glycoproteins showed some stretches of conserved sequences (Fig. 1). The position of N-terminal and C-terminal were found similar between SARS CoV and COVID-19 during topology analysis. On the other hand, a drastic difference was observed in MERS CoV in arrangement of secondary structures in the tertiary region (Fig. 2).

Conserved motif identification
Based on the alignment pattern, selected sequences were subjected to analysis different conserved motifs in the protein sequences. A total of 7 conserved motifs were analyzed (Fig. 3). Most remarkably all the selected spike glycoprotein sequences were shown to have each 7 motif sequences in similar pattern and some of those conserved motif position were represented in Fig. 3. All the identified conserved motifs were individually subjected to protein BLAST for functional annotation. Where motif 1, 5, 6 and 7 showed similarity with spike protein of Human and Bat coronavirus origin, motif 2 shared similarities with spike structural protein of mouse coronavirus origin, motif 3 showed similarity with spike glycoprotein S from SARS CoV and motif 4 with spike glycoprotein from MERS CoV ( Table 2). Highest percent identity of 91.84% was observed for motif 2. Functional gene ontology based conserved domain identification within 7 identified motifs were predicted against InterPro database. IPR002552 (CORONA_S2), PF01601 (CORONA_S2) domains were found within motif 1, 5 and 6. Similar domains were also observed in motif 2, 3 and 4 with an extra domain of SSF111474 (Coronavirus _S2 glycoprotein). No such domains were observed in motif 7. Identified domains were predicted with membrane fusion function (GO:0061025) and receptor-mediated virion attachment to host cell (GO:0046813). Whereas, those were detected as viral envelope (GO:0019031) and integral component of membrane (GO:0016021).

COVID 19 specific epitope probing
The epitope probing was conducted only with the COVID 19 spike glycoprotein sequence and structure. From the sequence analysis, 10 different locations were found which also showed similarity with SARS CoV and SARS COVID 2 spike glycoprotein in protein BLAST (Fig. 4). Also the motif positions within the spike glycoprotein monomer were represented in Fig. 4. Epitopes 1, 4 and 5 were not represented in COVID 19 spike glycoproteins, as they were found to be embedded within the virus envelop. Among the others, epitopes 2, 3, 6 and 7 were found at the interior location of spike glycoprotein monomer but epitopes 8, 9 and 10 were found at the surface of the structure.

Analysis of epitope binding to specific MHC class II
The proper type of Major Histocompatibility Complex (MHC) selection for identified COVID 19 epitope was performed and enlisted in Table 3. All the epitopes were individually screened against 5 DPA, 6 DQA and 662 DRB alleles from MHC class II for best fit analysis. As, we have analyzed the spike glycoprotein of COVID 19 which is an infectious particle, transmit from one infected individual to another, alleles of MHC class II were selected for viral epitope specificity analysis. Where HLA-DRB1*01:13 was observed to bind with 2B, 3B, 4B, 5B, 7B and 8B epitope sequences identified on the basis of IC50 value. Among them sequence IIAYTMSLGAENSVA (epitope 8B) was shown with lowest IC50 value of 7.11. On the other hand, HLA-DRB1*04:04 was found for both the sequence of TIMLCCMTSCCSCLK (epitope 5A) and SIIAYTMSLGAENSV (epitope 8A) on highest consensus percentile rank basis. Highest value of 9.5 was found for motif 5. Similarly HLA-DRB1*04:08 was observed for the sequences VRDPQTLEILDITPC (epitope 9A) with highest 9.50 and VSVITPGTNTSNQVA (epitope 10A) with 7.90 consensus percentile rank value. Individual MHC class II molecules were found for others ( Table 3). The threshold value of highest consensus percentile rank was selected as 10 for all. As a whole, highest Consensus percentile rank value of 10 was observed for sequence QQLIRAAEIRASANL (epitope 3A) and lowest IC50 value of 7.11 was observed for sequence IIAYT-MSLGAENSVA (epitope 8B). The antigenic property of identified target sequences from epitopes was also predicted on the basis of threshold value of 0.4. Below the threshold value, the sequence has been considered as non-antigenic and sequences with above value were antigenic in nature. A total of 9 antigenic sequences were detected (Table 3), among them two sequences AAEIRASANLAATKM (epitope 3B) and ITPGTNTSNQVAVLY (epitope 10B) were found with higher threshold value of 0.7125 and 0.7193 respectively.

Glycosylation and structural modification
Coronavirus spike protein has a masking of N-acetyl glucosamine (NAG) at different locations. Comparative analysis between glycosylated and non-glycosylated protein revealed some structural modification at the epitope locations. Among the identified epitopes 10B with sequence ITPGTNTSNQVAVLY (598-612) was found with N linked glycosylation at 603 position. The structural modification of this epitope was analyzed using non-glycosylated protein structure of COVID 19 (Acc. No.: NC_045512.2:21563-25384) and glycosylated COVID 19 protein (PDB ID: 6vsb). Effect of glycosylation on protein structures revealed that glycosylated conformation was more organized (Fig. 5a) than non-glycosylated one (Fig. 5b). Secondary structural comparison between two epitopes showed more organized structure with attached NAG residue (Fig. 5d) whereas a shorter β-sheet structure was observed when NAG is removed from the structure (Fig. 5c). The peptide interactive site of 10B epitope was blocked due to NAG attachment. As a result of which antibody binding to the antigen may hamper. The NAG residue directly binds with N or ASN amino acid residue (Fig. 5e). So the removal of NAG from the spike glycoprotein structure is difficult. Structural distortion between glycosylated and non-glycosylated epitope 10B at tertiary level indicated that removal of NAG may distort the structure of epitope (Fig. 5f ). Again that may hamper the proper antigen-antibody binding.

Effect of epitope glycosylation on MHC class II-epitope binding
In this section energetics of epitope attachment with MHC class II HLA-DRA, DRB was determined in presence and absence of NAG at the 10B epitope structure through molecular docking (Fig. 6) Fig. 6 where amino acid attachment differences were clearly indicated in Fig. 6b (Fig. 6c) where structural stabilization by hydrogen bond networking was noticed. But, the bindings were less when NAG residue was attached  . 6f ). This result indicated that the attachment of NAG with epitope also made it difficult for MHC class II molecules to proper representation of epitope. Though, epitope 8 A & B were also present at the surface of the spike glycoprotein but was found to wrapped with a short segment IGAEHVNNSYECD (651-663) carrying a glycosylation at N residue position 657 (Fig. 7a). As a result of which antibody accessibility to this epitope may also be difficult. Whereas, surface epitope 9 with sequence VRDPQTLEILDITPC (576-590) showed highest antigenecity of 1.1285 (Table 3) and highest consensus percentile rank of 9.50 and found free of any direct or indirect NAG attachment pattern (Fig. 7b). On that basis, it was further analyzed for MHC II HLA-DRB1 (PDB ID: 5jlz) binding through molecular docking. Among the best 10 docking posture, 8 th was found at the desired position of MHC molecule (Fig. 7c). The best position 1 was represented in Table 2

Structure prediction and structure quality assessment and conserved motif identification
An effort was made for epitope based peptide vaccine development by searching MHC-I and II classes compatible sites and the results yet to come [33]. In the current study, energetic and Inhibition Concentration50 based selection of SARS CoV-2 spike epitope and its possible glycosylation effect/structural-hindrance have been evaluated. This may help in urgent vaccination strategies in the current disastrous situation. Predicted structures were analyzed for quality assessment and according to QMEAN values of two SARS strains − 2.82 and − 3.63) for respective models (Table 1), good quality was observed. QMEAN values of This indicated the degree of native nature of the predicted structures in an universal scale [23]. Very Good quality structures were indicated with QMEAN Z-score closest to zero. QMEAN indicated the overall Z-score of Cβ, All atoms, Solvation and Torsion values.
But according to the Ramachandran plot (Table 1), the quality of all the predicted structures was above 98% indicating good structural prediction.
The structural alignment between each sets revealed that SARS CoV-2 was highly similar to SARS CoV rather than MERS. From both the sequential and structural point of view, higher degree of similarity between SARS CoV and SARS CoV-2 might indicate and help in the therapeutic and vaccination strategies with reference to the current global situation. However, an absolute higher degree of virulence and spreading nature of SARS CoV-2 is of great concern in the present scenario. Predicted conserved motifs showed functional similarity with different Coronavirus sequences. Above analysis indicated that identified motifs were specific for coronavirus and they could be used as the markers for common coronavirus infection detection irrespective of COVID 19.

COVID 19 specific epitope probing
Among all the selected epitopes, motifs 8, 9 and 10 were found at the surface of the structure, which could be used as the immunological targets for the proper diagnosis and treatment of COVID 19. The important issue of epitope finalization could be confronted by the factor of possible transition between pre-fusion and post-fusion spike structural distortion. Specific mutant structure has been designed and tested to be resistant to conformational change after ACE2 binding and protease cleavage at the S1/S2 site [34]. This may be indicative to searching suitable epitope which may remain unhindered from preto post-fusion state transition.

Epitope analysis for specific MHC class II binding beyond structural modification through glycosylation
During COVID-19 specific epitope designing, 10 different sequences were found at different structural location. Among them the location of 3B was more interior but 10B could be used as potent antigen. According to epitope locations (Fig. 4) and antigenic nature, other sequences like 8A&B, 9A&B could be the target also. Coronavirus spike proteins are glycosylated in nature where N-acetyl glucosamine (NAG) is the main component. Glycan shielding and possible epitope masking of an HCoV-NL63 has been observed which may be the barrier for proper immunogenic responses [8].
On the other hand, glycosylation showed a direct effect on proper structural packaging of viral spike glycoprotein (Fig. 5a). Whereas, non glycosylation or removal of glycosylation may distort the structure as proper MHC Class II binding and representation may hamper. Among the selected epitopes, epitope 9 with sequence VRDPQTLEILDITPC (576-590) showed highest antigenecity of 1.1285 and consensus percentile rank of 9.50 (Table 3) and found any direct or indirect glycosylation pattern (Fig. 7b). And epitope 9 was also formed rigid bonds with MHC II HLA-DRB1 (PDB ID: 5jlz). So, this could be a target for COVID-19 vaccine development.

Conclusions
Modifications in spike proteins structure during receptor mediated host cell entry and further prediction on post-fusion events may result in success in vaccination strategies or blocking entry. Host protease processing during viral entry and how different lineage B viruses can recombine to gain entry into human cells are the points also to be noted [35]. SARS CoV-2 induced severe and often lethal lung failure is caused due to its inhibition of ACE-2 expression [36]. So, keeping the ACE-2 normal functioning but blocking viral entry is the most challenging issue right now. Possible suitable epitope as screened in the current study may be helpful in this global pandemic situation. The history of last