Evolving geographic diversity in SARS-CoV2 and in silico analysis of replicating enzyme 3CLpro targeting repurposed drug candidates

Background: Severe acute respiratory syndrome (SARS) has been initiating pandemics since the beginning of the century. In December 2019, the world was hit again by a devastating SARS episode that has so far infected almost four million individuals worldwide, with over 200,000 fatalities having already occurred by mid-April 2020, and the infection rate continues to grow exponentially. SARS coronavirus 2 (SARS-CoV-2) is a single-stranded RNA pathogen characterised by a high mutation rate. It is vital to explore the mutagenic capability of the viral genome that enables SARS-CoV-2 to rapidly jump from one host immunity to another and adapt to the genetic pool of local populations.
Methods: For this study, we analysed 2301 complete viral sequences reported from SARS-CoV-2 infected patients. SARS-CoV-2 host genomes, comprising 9 genomes of pangolin-CoV origin and 3 genomes of bat-CoV origin, were collected from the Global Initiative on Sharing All Influenza Data (GISAID) database, and the Wuhan SARS-CoV-2 reference genome was collected from the GenBank database. The multiple sequence alignment tool Clustal Omega was used for genomic sequence alignment. The viral replicating enzyme 3-chymotrypsin-like cysteine protease (3CLpro), which plays a key role in viral pathogenicity, was used to assess affinity with pharmacological inhibitors and repurposed drugs such as anti-viral flavones, biflavonoids, anti-malarial drugs and vitamin supplements.
Results: Our results demonstrate that bat-CoV shares > 96% identity, while pangolin-CoV shares 85.98% identity, with the Wuhan SARS-CoV-2 genome. This in-depth analysis has identified 12 novel recurrent mutations in South American and African viral genomes, of which 3 were unique to South America, 4 were unique to Africa and 5 were present in patient isolates from both populations. Using state-of-the-art in silico approaches, this study further investigates the interaction of repurposed drugs with the SARS-CoV-2 3CLpro enzyme, which regulates the viral replication machinery.
Conclusions: Overall, this study provides insights into the evolving mutations, with implications for understanding viral pathogenicity and possible new strategies for repurposing compounds to combat the nCovid-19 pandemic.


Chitranshi et al. J Transl Med (2020) 18:278. Journal of Translational Medicine, Open Access.
*Correspondence: nitin.chitranshi@mq.edu.au; vivek.gupta@mq.edu.au. 1 Faculty of Medicine, Health and Human Sciences, Macquarie University, F10A, 2 Technology Place, North Ryde, NSW 2109, Australia. Full list of author information is available at the end of the article.

Background
In early January 2020, the World Health Organisation (WHO) reported cases of pneumonia of unknown cause in Wuhan City, Hubei Province of China, and by 30 January 2020, WHO escalated the warning to a public health emergency of international concern. By 12 March 2020, the novel coronavirus (nCoV) outbreak had achieved global pandemic status and was recognised as novel Covid-19 disease (nCovid-19) [1]. The present outbreak is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a name designated on the basis of phylogeny and taxonomy [2]. Worldometer reported 6,238,550 total SARS-CoV-2 infected cases and 374,374 deaths worldwide on 31 May 2020 (https://www.worldometers.info/coronavirus/#countries). The pathogen has been established to transmit through human-to-human contact and has quickly spread to more than 187 countries across the globe (https://gisanddata.maps.arcgis.com/). Coronaviruses are single-stranded, positive-sense RNA viruses belonging to the genus Coronavirus of the family Coronaviridae that can cause acute and chronic respiratory and central nervous system illnesses in animals, including humans [3,4]. The infection can also cause mild episodes of follicular conjunctivitis in certain patients. In animal models, the infection has been shown to induce symptoms resembling anterior uveitis, retinitis and optic neuritis [5]. A recent study has shown the formation of hyper-reflective lesions in the ganglion cell and inner plexiform layers of the retina, particularly around the papillomacular bundles [6]. The disease has also been shown to affect the sense of smell and taste bud sensitivity in patients [7].
All coronaviruses have a minimum of 3 basic viral proteins: (i) an envelope protein (E), a highly hydrophobic protein involved in several aspects of the virus life cycle such as assembly and envelope formation [8]; (ii) a spike protein (S), a glycoprotein involved in receptor recognition and membrane fusion [9]; and (iii) a membrane protein (M), which plays a key role in virion assembly [10] (Fig. 1). The viral genome also encodes two open reading frames (ORF), ORF1a and ORF1b, that activate intracellular pathways and trigger the host innate immune response [11]. The polyproteins encoded by the virus are initially processed by two main viral proteases, a papain-like cysteine protease (PLpro) and a chymotrypsin-like cysteine protease, known as 3C-like protease (3CLpro), into intermediate and mature nonstructural proteins [12]. The main proteinase, 3CLpro, is one of the primary targets for the development of antiviral drug therapies, as it plays a critical role in viral replication [13]. K11777, camostat and EST are cysteine protease inhibitors that have been shown to inhibit SARS-CoV 3CLpro replication in cell culture conditions [14,15].

Fig. 1 [...] 26,191) and Nucleocapsid (N, nt 28,274-29,533) proteins in green. The ORF1a gene encodes papain-like protease and 3CL protease; the ORF1b gene encodes RNA-dependent RNA polymerase, helicase and endoribonuclease; the S, E, M and N genes encode spike, envelope, membrane glycoprotein and nucleocapsid phosphoprotein respectively. Three-dimensional crystal structures of 3CL-protease, endoribonuclease and the SARS-CoV-2 spike protein receptor binding domain (RBD) engaged with the human angiotensin converting enzyme 2 (ACE2) receptor were collected from the Protein Data Bank.
The recent release of a high-resolution crystal structure of the main proteinase 3CLpro (Protein Data Bank, PDB ID: 6Y2G), which describes an α-ketoamide inhibitor whose amide bond is incorporated into a pyridone ring to enhance the half-life of the compound in plasma [16], is expected to accelerate targeted drug discovery efforts. Two HIV-1 proteinase inhibitors, lopinavir and ritonavir, have been considered to target SARS-CoV [17]. Interestingly, the substrate binding cleft is located between domains I and II of both the SARS-CoV 3CLpro and SARS-CoV-2 3CLpro enzymes [16,18].
Since the initial stages of the SARS-CoV-2 outbreak, laboratories and hospitals around the world have sequenced viral genomes with unprecedented speed, enabling real-time understanding of this novel disease process, which will hopefully contribute to the development of novel candidate drugs. Complete SARS-CoV-2 genomes from all over the world have been deposited in the Global Initiative on Sharing All Influenza Data (GISAID) [19] database, and more sequences continue to be deposited over time. A novel vaccine against SARS-CoV-2 so far remains elusive, and its development requires a thorough understanding of molecular changes in viral genetics. This may be attained by freely accessing the GISAID database and processing the data to enhance our understanding of the fine biochemical and genetic differences that distinguish this virus from previously known strains [20].
It is well known that viruses are non-living and require host cells to survive and reproduce, with the sole aim of perpetuating themselves. A virus that jumps from animals to humans is termed a zoonotic virus. This occurred during the SARS outbreak of 2002, when a new coronavirus spread around the world and resulted in the death of hundreds of people [21]. In 2012, another novel coronavirus outbreak, termed Middle East respiratory syndrome (MERS), caused over 400 fatalities and spread to over 20 different countries [22]. There are currently many circulating viruses, but why SARS-CoV-2 has achieved such a devastating pandemic status, and whether this pandemic will subside, remain unanswered questions.
The purpose of this study is to characterise known viral variants that have spread across different countries, especially hot-spot regions, with a focus on recurrent mutations in the South American and African geographical regions. We also focused on the SARS-CoV-2 main proteinase, 3CLpro, which is highly conserved in most coronaviruses and has been suggested as a potential drug target against nCovid-19. Repurposed drugs such as flavonoids and biflavonoids, known anti-malarial and anti-viral drugs, and vitamins could selectively inhibit this enzyme and could be used either alone or in combination with other disease management approaches to suppress the virulence of SARS-CoV-2. These bioinformatics, computational modelling and molecular docking approaches using repurposed drugs could be particularly useful in the current nCovid-19 outbreak.

Collection of SARS-CoV-2 genome
The Global Initiative on Sharing All Influenza Data (GISAID) is headquartered in Munich, Germany, and is a public-private partnership between the German government and a non-profit organization founded by leading medical researchers in 2006. Since December 2019, GISAID has become a repository for nCovid-19 genomes. The genome analysis was carried out on data deposited up to 31 May 2020 (https://www.gisaid.org/). The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Wuhan genome was collected from NCBI (NC_045512.2).

Multiple sequence alignment and Phylogenetic tree construction
Multiple sequence alignment (MSA) of all nucleotide sequences was carried out on the EMBL-EBI Clustal Omega server to investigate sequence conservation [23,24]. The Newick format output of the multiple sequence alignment was used to generate the phylogeny [25]. The phylogenetic tree was constructed in the Interactive Tree of Life (iTOL) online tool [26]. The iTOL server generates phylogenetic trees in circular (radial) and standard layouts. The circular trees can be rooted and displayed in different arc sizes [27-29].
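The similarity figures quoted throughout this study come from aligned sequences. As a minimal sketch of how a pairwise percent identity can be derived from two rows of an alignment, the function below counts matching columns; the function name, the toy sequences and the convention of skipping any column that contains a gap are illustrative assumptions, not the exact procedure used by the Clustal Omega server.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of an alignment.

    Assumed convention: columns where either row has a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real viral sequence)
row_a = "ATGC-TTA"
row_b = "ATGCGTCA"
pid = percent_identity(row_a, row_b)
print(f"{pid:.2f}% identity over ungapped columns")
```

Other gap conventions (for example, dividing by the full alignment length) give slightly different percentages, so the convention should be stated whenever identity values are compared across tools.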

Structure analysis of SARS and SARS-CoV-2 3CLpro
Crystal structures of SARS and SARS-CoV-2 3CLpro with bound inhibitors were collected from the Protein Data Bank (PDB) [30]. The SARS main protease (PDB ID: 3TNT) was selected as the reference to analyse the variants in SARS-CoV-2 3CLpro (PDB ID: 6Y2G). All PDB structures were visualised using UCSF Chimera software [31]. The multiple alignment, ribbon, surface and superimposition modules in Chimera were used for analysis and image generation [24,32].
The dataset comprises flavones and biflavonoids, anti-viral drugs, anti-malarial drugs and vitamins as SARS-CoV-2 3CLpro inhibitors [16]. In total, 17 repurposed drugs were collected from the PubChem database [33]. Two-dimensional (2D) structures were downloaded from the PubChem database in .sdf format. The inhibitor energies were minimized using the Austin Model-1 (AM1) until the root mean square (RMS) gradient value became smaller than 0.100 kcal/mol Å, and re-optimization was then done by the MOPAC (Molecular Orbital Package) method [34,35]. All the inhibitors were subsequently converted to .pdb format in Open Babel software [36] and submitted to molecular docking studies.

Selection and preparation of SARS-CoV-2 main protease protein (3CLpro)
The crystal structure of SARS-CoV-2 3CLpro was retrieved from the PDB (PDB ID: 6Y2G). Optimization of the protein macromolecule (SARS-CoV-2 3CLpro) was carried out in UCSF Chimera software [31,37,38] by adding polar hydrogen atoms, removing water molecules and applying Amber parameters, followed by minimization with the MMTK method in 500 steps with a step size of 0.02 Å. SARS-CoV-2 3CLpro contains chains A and B, each 306 amino acids in length. Chain A of PDB ID: 6Y2G, containing the alpha-ketoamide (O6K) inhibitor, was used for identification of the substrate binding site.

SARS-CoV-2 main protease protein (3CLpro) inhibitors docking studies
The docking of SARS-CoV-2 3CLpro specific pharmacological inhibitors into the catalytic site was performed with the AutoDock 4.2 program [39]. The alpha-ketoamide (O6K) inhibitor was extracted from the SARS-CoV-2 3CLpro protein. Polar hydrogen atoms were added, non-polar hydrogen atoms were merged, Gasteiger charges were assigned and solvation parameters were added to the SARS-CoV-2 3CLpro protease. The protonation states of all inhibitors and O6K were set to physiological pH, and the rotatable bonds of the ligands were set to be free. The AutoGrid program was used to generate grid maps. A grid box with dimensions of 40 × 40 × 40 Å was formed around the Cys145 residue of SARS-CoV-2 3CLpro, which is present in the substrate binding site. Docking with a rigid protein was performed using the empirical free energy function together with the Lamarckian genetic algorithm (LGA) [40]. LGA default parameters were used in each docking procedure and 10 different poses were calculated. Chimera and Discovery Studio (DS) Visualizer 2.5 [31] software were used for visualisation and calculation of protein-ligand interactions.
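The grid box described above can be pictured as an axis-aligned cube centred on the catalytic cysteine. The following is a minimal geometric sketch, assuming the box is given by a centre and edge lengths in Å; the `GridBox` class and all coordinates are hypothetical and for illustration only, and the sketch does not reproduce AutoGrid's own parameter file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridBox:
    """Axis-aligned docking box given by its centre and edge lengths (Å)."""
    center: tuple
    size: tuple = (40.0, 40.0, 40.0)  # 40 x 40 x 40 Å, as stated in the text

    def bounds(self):
        """Return the (min_corner, max_corner) of the box."""
        lo = tuple(c - s / 2 for c, s in zip(self.center, self.size))
        hi = tuple(c + s / 2 for c, s in zip(self.center, self.size))
        return lo, hi

    def contains(self, point):
        """True if the point lies inside or on the surface of the box."""
        lo, hi = self.bounds()
        return all(l <= p <= h for l, p, h in zip(lo, point, hi))


# Hypothetical coordinates: NOT taken from PDB entry 6Y2G
cys145_atom = (10.0, -5.0, 20.0)
box = GridBox(center=cys145_atom)

ligand_atoms = [(12.5, -3.0, 25.0), (35.0, -5.0, 20.0)]
inside = [box.contains(atom) for atom in ligand_atoms]
print(box.bounds(), inside)
```

A check of this kind is a quick way to confirm that every atom of a starting ligand pose falls within the region for which grid maps were computed.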
Sequences from bat-SARS-CoV and Pangolin-SARS-CoV were aligned and compared with Wuhan SARS-CoV-2 (NC_045512.2) as the reference genome. To determine the evolutionary relationship among bat-CoV, Pangolin-CoV and SARS-CoV-2, we estimated a phylogenetic tree based on whole-genome nucleotide sequences. Bat-SARS-CoV and SARS-CoV-2 were grouped together and were observed to share > 96% similarity, whereas Pangolin-SARS-CoV was the closest evolutionary ancestor (Fig. 2d). The human Wuhan SARS-CoV-2 isolate (NC_045512.2) shared 85.98% identity with Pangolin-SARS-CoV, which suggests that pangolins may be associated with SARS-CoV-2 evolution or the subsequent outbreak [41,42].

Identification of hotspot mutations in SARS-CoV-2 complete genome from South American and African regions and analysis of main protease (3CLpro) sequence
Recently, Pachetti et al. [43] reported eight novel recurrent mutations of SARS-CoV-2, identified at positions 1397, 2891, 14,408, 17,746, [...] same locus indicates the high susceptibility of these genetic regions to change as the virus evolves. The single-stranded SARS-CoV-2 RNA genome encodes two proteases: (i) the papain-like cysteine protease (PLpro) and (ii) the chymotrypsin-like cysteine protease known as 3C-like protease (3CLpro). 3CLpro is the main protease, and it is therefore important to examine the incidence of any mutation in SARS-CoV-2 3CLpro. Multiple sequence alignment of SARS-CoV-2 genomes collected from patients in six different geographical locations showed 100% similarity for this enzyme, with no discernible variation between sequences obtained from diverse geographical regions.
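Mutation frequencies in this kind of analysis are normalised per region: the number of genomes carrying a mutation is divided by the total number of genomes collected in that geographical area. A minimal sketch of that calculation is given below; the function name, the input layout and the toy records are assumptions made for illustration and are not the study's data.

```python
from collections import Counter, defaultdict


def mutation_frequencies(genomes):
    """Per-region mutation frequency in percent.

    genomes: iterable of (region, set_of_mutated_positions), one per genome.
    Returns {region: {position: percent_of_genomes_carrying_it}}.
    """
    totals = Counter()               # genomes collected per region
    carriers = defaultdict(Counter)  # genomes carrying each mutation per region
    for region, positions in genomes:
        totals[region] += 1
        for pos in set(positions):
            carriers[region][pos] += 1
    return {
        region: {pos: 100.0 * n / totals[region]
                 for pos, n in carriers[region].items()}
        for region in totals
    }


# Toy records (hypothetical counts; positions borrowed from the text only as labels)
genomes = [
    ("South America", {1059, 28881}),
    ("South America", {28881}),
    ("Africa", {1059}),
    ("Africa", set()),
    ("Africa", {1059, 15324}),
    ("Africa", set()),
]
frequencies = mutation_frequencies(genomes)
for region, freq in frequencies.items():
    print(region, dict(sorted(freq.items())))
```

Dividing by the per-region total rather than the global total keeps regions with few deposited genomes comparable to heavily sampled ones.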

Figure caption (mutation frequencies in South American and African genomes): Previously confirmed mutations at positions nt3036, nt8782, nt11083, nt14408, nt23403, nt28144 and nt28881 were also present in South American and African populations. We normalize the mutation frequency percentage by estimating the frequency of genomes carrying a mutation and comparing it with the overall number of collected genomes per geographical area. The graph shows the cumulative mutation frequency of all given mutations present in South American and African regions. Mutation localisation in viral genes is reported in the legend, as are the proteins (i.e. non-structural protein, nsp) presenting these mutations. b It is also evident that South American and African clusters show a differential pattern of novel mutations: mutations 1059 (black), 9477 (pink), 28,657 (green) and 28,878 (red) in South American patients, whereas mutations 1059 (black), 15,324 (orange), 28,878 (yellow) and 29,742 (magenta) are present with greater frequency in African patients.

SARS-CoV and SARS-CoV-2 similarity
SARS and SARS-CoV-2 complete genomes were collected from the NCBI GenBank database (NC_004718 and NC_045512). Protease nucleotide sequences were extracted from SARS (NC_004718) and aligned with SARS-CoV-2 (NC_045512). Clustal Omega alignment of 918 SARS nucleotides showed around 95% similarity with SARS-CoV-2 (Additional file 2: Table S2). High amino acid sequence identity was also observed between the SARS-CoV and SARS-CoV-2 main proteases (3CLpro) derived from Wuhan and US patients. SARS-CoV and SARS-CoV-2 3CLpro showed highly conserved regions in both the catalytic sites, His41 and Cys145 [44], and the substrate binding regions (Fig. 4a), indicating that these proteases are highly similar. Furthermore, 12 variant positions (Thr35Val, Ala46Ser, Ser65Asn, Leu86Val, Arg88Lys, Ser94Ala, His134Phe, Lys180Asn, Leu202Val, Ala267Ser, Thr285Ala and Ile286Leu) were observed in SARS-CoV-2 3CLpro (Fig. 4b, c).
The resulting amino acid substitutions are expected to conserve polarity and hydrophobicity in the SARS-CoV-2 3CLpro structure, except for the leucine at position 286. Importantly, none of these 12 variants lies in the catalytic or substrate binding regions, which are involved in the critical proteolytic activity of the SARS-CoV-2 protease.
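The statement that none of the 12 variant positions coincides with the catalytic residues can be checked mechanically. The sketch below parses the three-letter variant labels listed above and tests them against the catalytic dyad (His41 and Cys145) named in the text; the helper name and regular expression are illustrative assumptions, and the substrate binding residues are not checked because they are not enumerated here.

```python
import re

# Variant labels as listed in the text (first residue, position, second residue)
VARIANTS = [
    "Thr35Val", "Ala46Ser", "Ser65Asn", "Leu86Val", "Arg88Lys", "Ser94Ala",
    "His134Phe", "Lys180Asn", "Leu202Val", "Ala267Ser", "Thr285Ala", "Ile286Leu",
]
CATALYTIC_DYAD = {41, 145}  # His41 and Cys145

VARIANT_PATTERN = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_variant(label):
    """Split a label such as 'Thr35Val' into ('Thr', 35, 'Val')."""
    match = VARIANT_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognised variant label: {label!r}")
    first, position, second = match.groups()
    return first, int(position), second


positions = [parse_variant(label)[1] for label in VARIANTS]
overlap = CATALYTIC_DYAD.intersection(positions)
print("variant positions:", positions)
print("overlap with catalytic dyad:", sorted(overlap))
```

An empty overlap is consistent with the observation that the catalytic dyad is conserved between the two proteases.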

Docking study of SARS-CoV-2 3CLpro inhibitors
The SARS-CoV-2 3CLpro receptor binding pocket was determined by superimposing SARS and SARS-CoV-2 3CLpro with their respective inhibitors (Fig. 4). Interestingly, the Needleman-Wunsch alignment algorithm with the BLOSUM-62 matrix revealed 94.44% sequence identity between SARS 3CLpro (Fig. 5a, grey) and SARS-CoV-2 3CLpro (Fig. 5a, cyan). The Cys-His catalytic dyad (Cys145 and His41) comprises the active catalytic binding site in SARS-CoV-2 3CLpro (Fig. 5a', b), which indicates a strong possibility that pharmacological inhibitors intended for SARS-CoV 3CLpro may also suppress the activity of the SARS-CoV-2 3CLpro enzyme. The docking protocol for the AutoDock 4.2 program was optimized by extracting the alpha-ketoamide inhibitor O6K and re-docking it in the binding pocket of SARS-CoV-2 3CLpro. A lowest binding energy of −6.45 kcal/mol and an inhibitory constant (Ki) of 18.72 µM were predicted for the alpha-ketoamide inhibitor (Table 1). The re-docked O6K inhibitor occupied a docking pose in the catalytic dyad active site of SARS-CoV-2 3CLpro similar to that previously reported in the crystal structure (PDB ID: 6Y2G) (Fig. 5c, d).
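A predicted binding energy and its inhibitory constant are two views of the same quantity. As a minimal sketch, assuming the standard thermodynamic relation Ki = exp(ΔG / RT) at 298.15 K (the temperature and the function name are assumptions, not values stated in the text), the conversion approximately reproduces the O6K pair reported above (−6.45 kcal/mol, 18.72 µM).

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant in kcal/(mol*K)


def inhibition_constant_uM(delta_g_kcal_per_mol, temperature_k=298.15):
    """Estimate Ki in µM from a binding free energy via Ki = exp(dG / RT).

    The temperature default is an assumption, not a parameter from the study.
    """
    ki_molar = math.exp(delta_g_kcal_per_mol / (R_KCAL * temperature_k))
    return ki_molar * 1e6  # mol/L -> µmol/L


# Binding energies quoted in the text (kcal/mol)
for name, dg in [("O6K", -6.45), ("ritonavir", -7.45), ("vitamin D", -7.75)]:
    print(f"{name}: dG = {dg} kcal/mol, Ki ~ {inhibition_constant_uM(dg):.2f} µM")
```

Because the relation is exponential, a difference of about 1.4 kcal/mol in binding energy corresponds to roughly a tenfold change in Ki at room temperature, which is why small differences in docking scores translate into large differences in predicted inhibitory constants.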
Seven flavonoids and biflavonoids, three anti-malarial compounds, seven anti-viral drugs and three vitamin molecules were subjected to automated docking within the active site of the SARS-CoV-2 3CLpro catalytic dyad. The superimpositions of all docked flavones and biflavones (Fig. 6a), anti-malarial drugs (Fig. 6b), antiviral drugs (Fig. 6c) and vitamins (Fig. 6d) are shown in Fig. 6, and the various binding parameters are tabulated in detail in Table 2.
Amentoflavone, a biflavonoid, showed the most favourable (lowest) binding energy (−8.49 kcal/mol), implying a strong affinity for SARS-CoV-2 3CLpro. This corresponds with previously reported enzyme inhibitory assays, in which amentoflavone was the most potent inhibitor, with the lowest IC50 value (8.3 ± 1.2 µM) [46]. Bilobetin, however, was a weaker inhibitor in SARS-CoV enzyme activity assays, with a higher IC50 of 72.3 ± 4.5 µM [46]. In contrast, our docking studies predicted a binding energy for bilobetin (−8.29 kcal/mol) almost comparable to that of amentoflavone, suggesting that mutations in SARS-CoV-2 3CLpro could potentially disrupt hydrogen bonding or induce a conformational change that alters the binding site and thereby affects inhibitor interactions with the enzyme active site residues. Amentoflavone showed H-bond interactions with the catalytic dyad residues (Cys145 and His41) as well as noteworthy interactions with the SARS-CoV-2 3CLpro residues Thr26, Ser46, Ser144 and Glu166, whereas His164 and Gln189 contributed to the hydrophobic interactions with the SARS-CoV-2 3CLpro inhibitors (Fig. 7a).
Three anti-malarial drugs were then selected to study their inhibitory actions on SARS-CoV-2 3CLpro. We found that artemisinin, a natural compound derived from the Chinese herb Artemisia annua, produced the most favourable docking score among the anti-malarial molecules (−6.40 kcal/mol), comparable to that of O6K and better than those of chloroquine (−4.95 kcal/mol) and hydroxychloroquine (−5.77 kcal/mol). Importantly, artemisinin has demonstrated broad anti-viral activity against human cytomegalovirus, herpes simplex virus type 1, Epstein-Barr virus, hepatitis B virus, hepatitis C virus and bovine viral diarrhea virus [47]. Artemisinin exhibited hydrogen bonding with the His41, Leu141, Asn142, Gly143, Ser144 and Glu166 residues of SARS-CoV-2 3CLpro (Fig. 7b).
Amongst the seven antiviral drugs, ritonavir showed the most favourable (lowest) binding energy (−7.45 kcal/mol) and the lowest inhibitory constant (Ki = 3.49 µM). Ritonavir formed hydrogen bond interactions with the Thr26, His41 and Cys145 residues of SARS-CoV-2 3CLpro (Fig. 7c). A combination of two HIV-1 protease inhibitors, lopinavir and ritonavir, was given to critically ill SARS-CoV-2 infected patients [48]. However, this combination therapy was stopped early in 13 patients (of 99 recruited) due to associated gastrointestinal adverse events [48].
The severity of the adverse events associated with antiviral therapy has led researchers to explore the potential of macro-, micro- and phytonutrients that may promote an immune response and suppress virus-induced effects. Vitamins are known to modulate host immune functions by providing anti-oxidant and anti-inflammatory activity [49,50]. We therefore selected the vitamins ascorbic acid (vitamin C), cholecalciferol (vitamin D) and alpha-tocopherol (vitamin E) to investigate their potential interactions with the SARS-CoV-2 3CLpro enzyme. Interestingly, our docking results showed that vitamin D has the lowest binding energy and Ki (−7.75 kcal/mol and 2.08 µM respectively) compared with vitamin C and vitamin E. Amino acid residues Thr24, Thr26, His41 and Cys145 of SARS-CoV-2 3CLpro formed hydrogen bonds with vitamin D (Fig. 7d). Threonine is extensively involved in intracellular signalling through changes in its phosphorylation state; here we observed that cholecalciferol formed a strong hydrogen bond with Thr residues and could potentially block the phosphorylation of Thr residues in the SARS-CoV-2 3CLpro enzyme. There is evidence that severe vitamin D deficiency has been reported in serious SARS-CoV-2 cases, and therapeutic concentrations of this molecule could therefore potentially be used clinically in SARS-CoV-2 cases [51,52].

Discussions
The novel coronavirus disease termed "nCovid-19" is now known as the third large-scale coronavirus epidemic introduced into the human population in the twenty-first century. At the time of writing, more than 3.67 million confirmed cases and nearly 250,000 deaths had been reported globally by WHO. Clinically, nCovid-19 is similar to SARS in its presentation; however, the sheer capacity and speed with which nCovid-19 has spread to global pandemic levels have left researchers asking what makes this outbreak so similar in presentation, yet so different in its virulence, to previous coronaviruses. Genome sequence analysis has investigated similarities in the phylogeny of SARS-CoV-2 and has placed it, like SARS and MERS, in the betacoronavirus genus [53]. The severe and often fatal pathogenicity of betacoronaviruses was highlighted in these previous epidemics, which showed higher transmission and pathogenicity than the milder and lesser known α-CoVs, which are often compared to the common cold [54].
Our study further compares SARS-CoV and SARS-CoV-2 using Clustal Omega alignment, showing a similarity of approximately 95% across 918 SARS nucleotides. Furthermore, we report high amino acid sequence identity between the SARS-CoV and SARS-CoV-2 main protease 3CLpro, which regulates coronavirus replication complexes [55]. Such highly conserved regions in both the catalytic sites and the substrate binding regions of the enzymes have also been validated previously in studies by Huang et al. and Muramatsu et al. [44,45]. While this region provides an attractive target for anti-viral drug design, it can also help to elucidate viral origins and explain the ease of transmission. More recent virus genome sequencing results and evolutionary analyses have identified bats as the natural host from which the virus originated [42]. Studies earlier this year queried the unknown intermediate host between bats and humans, and recent studies have pointed to pangolins [41,42]. Regarding the evolutionary relationship between bat-CoV, Pangolin-CoV and SARS-CoV-2, we corroborate that, based on whole-genome nucleotide sequences, bat-SARS-CoV and SARS-CoV-2 group together and share > 96% similarity, with Pangolin-SARS-CoV as the closest evolutionary ancestor [41,42]. Furthermore, we report that isolates of human Wuhan SARS-CoV-2 share 85.98% identity with Pangolin-SARS-CoV, which suggests that pangolins may be associated with the evolution of subsequent outbreaks of COVID-19.
Regarding nCovid-19 and its similarity in transmission to SARS-CoV, recent studies have also demonstrated that transmission occurs via the receptor angiotensin-converting enzyme 2 (ACE2) [42]. This may indicate why SARS-CoV-2 has often led to severe and in many cases fatal respiratory tract infections, like its two SARS-CoV predecessors, since the SARS-CoV of the 2002 epidemic was also known to use the ACE2 receptor to infect humans [56]. Bronchoalveolar lavage fluid taken from nCovid-19 patients has shown that ACE2 is widely distributed in the lower respiratory tract of humans [42]. Furthermore, the virion S-glycoproteins expressed on the surface of coronaviruses adhere to ACE2 receptors on human cells [57]. This location provides a target for uncovering mechanistic insights into the severity of the disease and into how this region has assisted the zoonosis of SARS-CoV-2 specifically. Additionally, mutations in the genomic structure of SARS-CoV-2 might also shed light on the aggressiveness and pathogenicity of the virus, which may in turn help to explain why some strains are evolutionarily much more virulent and contagious. Angeletti et al. have described mutations in the endosome-associated-protein-like domain of the nsp2 and nsp3 proteins, the former possibly accounting for the high virulence and contagion, and the latter suggesting a mechanism that differentiates nCovid-19 from SARS-CoV [58]. Our study builds on this knowledge and begins to identify the sub-clinical causes of the virulence and unique pandemic pattern of this outbreak by identifying the mutations evolving from region to region. Additionally, Pachetti et al. have previously reported novel recurrent mutations of SARS-CoV-2, and our study corroborates these mutations in the South American and African regions [43].
Drug discovery and vaccine development against SARS-CoV-2 infection are lengthy processes; drug repurposing, however, represents an alternative strategy in the current scenario. Some antivirals are currently being used clinically in SARS-CoV-2 treatment, including lopinavir [59], ritonavir [60], remdesivir [61] and oseltamivir [62]. However, in the clinical setting, lopinavir/ritonavir, 3CLpro and RdRp inhibitors, showed no benefit in adult Covid-19 patients [48]. The double point mutation in the RdRp gene identified in our study can potentially lead to a drug-resistance event. Moreover, other classes of drugs, such as chloroquine and hydroxychloroquine, have shown antiviral properties by blocking viral entry into cells through inhibition of the glycosylation of host receptors [63]. We observed no differences among the SARS-CoV-2 main proteinase (3CLpro) genome sequences, but important differences between SARS-CoV-2 3CLpro and the SARS-CoV protein, underlining the extreme need for identification of inhibitors that target the viral life cycle. It is not known whether these mutations induce any alterations in gene transcription or in the localisation of the affected proteins; this can be investigated in the near future using biochemical and immunological approaches [64,65].

Conclusions
Various theories have been proposed regarding the origin of the highly virulent SARS-CoV-2 particle. Our analysis shows that Bat-SARS-CoV shares > 96% similarity with SARS-CoV-2; however, it is possible that the bat coronavirus infected another "intermediate host", such as the pangolin, which subsequently transmitted the virus to humans. Pangolin isolates do share sequence identity with SARS-CoV-2 genomes, and the pangolin could be an intermediate host. We identified novel mutation hotspot regions in South American and African isolates of SARS-CoV-2 genome sequences. Interestingly, double point mutations in RdRp at positions 14,805 and 14,808 and triple point mutations in the nucleocapsid protein at positions 28,881, 28,882 and 28,883 were identified in both South American and African genomic sequences, suggesting the vulnerability of these genetic loci to change. In addition, a novel mutation pattern specifically oriented towards the nucleocapsid phosphoprotein was noted in both South American and African sequences, while novel ORF3a and RdRp specific variants were observed particularly in African genomic sequences. The potential effects of double and triple point mutations on translated proteins and on the virulence of SARS-CoV-2 require further investigation. The SARS-CoV-2 main proteinase (3CLpro) sequence was observed to be conserved across all collected genomic sequences. Despite significant similarities between the SARS-CoV-2 3CLpro structure and the SARS-CoV protein, SARS-CoV-2 3CLpro revealed certain key differences, which highlight the extreme need for identification of novel mechanism-based drugs to target virus processing. Repurposed drugs including natural flavonoids and biflavonoids, and anti-malarial, antiviral and vitamin-based compounds, have previously been shown to be beneficial in several viral infections and outbreaks. The novel data generated from this study enhance our knowledge of the fine molecular differences that differentiate the SARS-CoV-2 virus from SARS-CoV. They also highlight the emerging variations in the viral genome across different populations as the virus adapts to local genetic and environmental factors. These findings will likely play a key role in the development of mechanism-based and targeted therapeutic strategies to treat SARS-CoV-2 infection and reduce its virulence.
Additional file 1: Table S1. Acknowledgement table containing information about authors, originating and submitting laboratories of the sequences deposited to the GISAID database.
Additional file 2: Table S2. SARS-CoV and SARS-CoV-2 sequence alignment of 3CLpro, which shares around 95% similarity.