Advanced bioinformatics rapidly identifies existing therapeutics for patients with coronavirus disease-2019 (COVID-19)

Background The recent global pandemic has placed a high priority on identifying drugs to prevent or lessen clinical infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19). Methods We applied two computational approaches to identify potential therapeutics. First, we sought to identify existing FDA approved drugs that could block coronaviruses from entering cells by binding to ACE2 or TMPRSS2 using a high-throughput AI-based binding affinity prediction platform. Second, we sought to identify FDA approved drugs that could attenuate the gene expression patterns induced by coronaviruses, using our Disease Cancelling Technology (DCT) platform. Results Top results for ACE2 binding included several ACE inhibitors, a beta-lactam antibiotic, two antiviral agents (Fosamprenavir and Emricasan) and glutathione. The platform also assessed specificity for ACE2 over ACE1, important for avoiding counterregulatory effects. Further studies are needed to weigh the benefit of blocking virus entry against potential counterregulatory effects and possible protective effects of ACE2. However, the data herein suggest readily available drugs that warrant experimental evaluation to assess potential benefit. DCT was run on an animal model of SARS-CoV, and ranked compounds by their ability to induce gene expression signals that counteract disease-associated signals. Top hits included Vitamin E, ruxolitinib, and glutamine. Glutathione and its precursor glutamine were highly ranked by two independent methods, suggesting both warrant further investigation for potential benefit against SARS-CoV-2. Conclusions While these findings are not yet ready for clinical translation, this report highlights the potential use of two bioinformatics technologies to rapidly discover existing therapeutic agents that warrant further investigation for established and emerging disease processes.

older individuals and immunosuppressed patients. In 2002, an outbreak of severe acute respiratory syndrome (SARS) in Guangdong, China was traced to SARS-CoV, a new beta-coronavirus. During this outbreak nearly 8100 patients were diagnosed, with an overall mortality of 9%, which increased to 50% in patients over 60 years of age [3]. The disease was thought to have originated from infected bats and was easily contained as transmission appeared to require direct contact with infected individuals. A distinct group 2c beta-coronavirus, genetically related to bat coronaviruses, was responsible for another outbreak in Saudi Arabia in 2012, and the disease was termed Middle East Respiratory Syndrome (MERS). This virus was associated with an initial 50% mortality but did not spread appreciably outside the region [4]. An outbreak of an unknown respiratory illness in Wuhan, China was reported in late December of 2019; the causative agent was identified as SARS coronavirus 2 (SARS-CoV-2) and the disease was called coronavirus disease 2019 (COVID-19) [5]. The disease has rapidly become a global pandemic and a major priority has been placed on finding drugs that prevent or limit viral propagation and infection.
Coronaviruses share a large genome, around 30 kb, express large replicase genes encoding non-structural proteins involving approximately 20 kb of the genome, undergo early transcription of the replicase gene, contain a viral envelope, and utilize ribosomal frameshifting for non-structural gene expression [6]. The viral genome is composed of a 5′-cap structure with a leader sequence and untranslated region (UTR) composed of multiple stem loop structures needed for RNA replication [7]. The 3′-end contains a UTR that has RNA structures necessary for viral RNA synthesis as well as a 3′-poly(A) tail that mimics mRNA, allowing translation of replicase-encoded non-structural proteins. Transcriptional regulatory sequences (TRSs) are found at the 5′-end of most structural and accessory genes, with most accessory genes being non-essential but modulating viral pathogenesis [8]. There are four main structural proteins, termed spike (S), membrane (M), envelope (E) and nucleocapsid (N). The S protein is about 150 kDa and is responsible for the "spike" on the viral surface; trimeric S protein is used for viral attachment to cell entry receptors [9].
The life cycle of human coronaviruses begins with viral attachment via the S protein to cell entry receptors, typically peptidases. The SARS-CoV virus uses the angiotensin converting enzyme 2 (ACE2) as the main cellular receptor with the membrane serine protease, TMPRSS2, acting as an accessory protein to stabilize cell entry and cleavage of the S protein following viral fusion with the cell membrane [10,11]. The virus enters and replicates within the cytoplasm starting with translation of the replicase gene and assembly of a viral replicase complex [12]. The complex and non-structural genes act to inhibit host cell translation while promoting host mRNA degradation and enhancing viral RNA synthesis and replication [12]. The process results in genomic and subgenomic RNA generated via negative-strand intermediates and the S, E, and M structural proteins enter the endoplasmic reticulum (ER) and move into the ER-Golgi intermediate compartment where viral genomic progeny are encapsidated by the N protein [13]. After assembly virions are transported to the cell surface and released by exocytosis. In some coronaviruses excess S protein can mediate cell fusion with neighboring cells, a process that may allow rapid viral transmission without detection by the host humoral immune response [14].
To accelerate pharma R&D across targets and disease areas, Immuneering developed Disease Cancelling Technology (DCT), to identify targets and drugs that reverse disease gene expression, and Fluency, a computational platform for large scale high throughput in silico screening. DCT quantifies the similarity of genome-wide signatures of disease to signatures of drug induced gene expression changes using cosine similarity. Uniquely relative to other methods, DCT quantifies the per-gene contribution to overall disease amplification or cancellation and is not biased to any specific targets or pathways. Fluency predicts quantitative binding affinity purely from sequence. Unlike other methods, Fluency is a single universal quantitative structure-activity relationship (QSAR) model able to accept any molecule and protein sequence as input. When trained on the over 2 million IC50 values from ChEMBL, Fluency achieves near experimental level binding prediction accuracy as well as generating predictions on the binding site. We applied these platforms to determine if repurposing of existing drugs may be helpful in COVID-19 infection, by: (1) assessing established drugs for binding to ACE2 and TMPRSS2, two proteins used by the virus to enter cells, and (2) scanning FDA approved compounds for transcriptomic disease cancellation of coronavirus associated gene expression changes.
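The cosine-similarity comparison described above can be illustrated with a minimal Python sketch. This is not the DCT implementation, which is not described in detail here; the function name, the gene labels, and the numeric values are hypothetical, and the convention that a negative score indicates cancellation is an assumption that follows from comparing a disease signature with an opposing drug signature.

```python
import math


def cosine_with_contributions(disease, drug):
    """Return (cosine similarity, per-gene contributions) for two signatures.

    Both arguments map gene symbol -> signed expression change (for example a
    log fold change). Only genes present in both signatures are compared.
    """
    genes = sorted(set(disease) & set(drug))
    norm_disease = math.sqrt(sum(disease[g] ** 2 for g in genes))
    norm_drug = math.sqrt(sum(drug[g] ** 2 for g in genes))
    if norm_disease == 0 or norm_drug == 0:
        raise ValueError("signatures need at least one shared non-zero gene")
    # Each gene's term of the dot product, scaled so the terms sum to the cosine.
    contributions = {
        g: disease[g] * drug[g] / (norm_disease * norm_drug) for g in genes
    }
    return sum(contributions.values()), contributions


# Toy, made-up values for illustration only (not data from the study).
disease_sig = {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 0.5}
drug_sig = {"GENE_A": -1.5, "GENE_B": 0.5, "GENE_C": 0.5}

score, per_gene = cosine_with_contributions(disease_sig, drug_sig)
# The gene with the most negative term opposes the disease signal most strongly.
most_cancelling = min(per_gene, key=per_gene.get)
```

In this toy case the score is negative because the drug signature mostly points in the opposite direction to the disease signature, and the per-gene terms show which genes drive that result.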

Results
Given that SARS-CoV-2 uses angiotensin converting enzyme 2 (ACE2) as the main cellular receptor to enter the cell, we ran two Fluency models with ACE2 as the target input. Two different Fluency models ("model a" and "model b") were run to predict binding of ACE2 to all chemicals in the Selleckchem FDA approved drug library. Initial ranking by performance in model a is shown in Table 1, which included multiple known ACE inhibitors scoring well (Enalaprilat, Ramipril, Lisinopril, Monopril, Captopril). Of these drugs, Enalaprilat has the best binding score from model a. Given reports of the possibility of ACE2 induction being driven by ACE1 inhibition [15] and multiple subsequent reports hinting at benefit from ACE inhibition [16][17][18][19], we were interested to observe ACE2 specificity in comparison to ACE1. For top hits, the binding of ACE2 and ACE1 was compared by calculating the difference in predicted binding (ACE2 binding minus ACE1 binding) using two Fluency models (Table 1). According to model a, Brigatinib, Tirofiban Hydrochloride, and Aleuritic Acid are top ranked by pBind, and Brigatinib is also highest ranked by model a as specific for ACE2 over ACE1. Glutathione was ranked in 7th place by model a for being more specific to ACE2 over ACE1. Next, a consensus ranking using the results of both models a and b was used to select top ACE2 binders (Table 2). Enalaprilat, Tirofiban Hydrochloride, and Sotagliflozin showed balanced performance in both models. In order to assess specificity, Fluency was run on top hits in reverse (predicting binding of a small molecule to the human proteome). By this metric, Ramipril, Piperacillin Sodium and Captopril had high ranking for ACE2 (Table 2). The worst score by far of top hits considered was R-406.
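The two derived quantities used above, the ACE2-minus-ACE1 difference in predicted binding and a consensus across models a and b, can be sketched in a few lines of Python. This is an illustrative sketch only: the compound names and pBind values are invented placeholders, the helper names are hypothetical, and averaging the two per-model ranks is assumed here as one simple way to form a consensus.

```python
def rank_descending(scores):
    """Map each name to its 1-based rank, where rank 1 is the highest score."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def consensus_order(pbind_a, pbind_b):
    """Order compounds by mean rank across two models (lower mean rank first)."""
    rank_a, rank_b = rank_descending(pbind_a), rank_descending(pbind_b)
    shared = set(rank_a) & set(rank_b)
    mean_rank = {name: (rank_a[name] + rank_b[name]) / 2 for name in shared}
    # Ties in mean rank are broken alphabetically to keep the output stable.
    return sorted(shared, key=lambda name: (mean_rank[name], name))


def specificity_delta(pbind_ace2, pbind_ace1):
    """Predicted ACE2 binding minus predicted ACE1 binding, per compound."""
    return {
        name: pbind_ace2[name] - pbind_ace1[name]
        for name in pbind_ace2
        if name in pbind_ace1
    }


# Placeholder predictions, invented for illustration (not values from Table 1).
model_a_ace2 = {"compound_1": 8.9, "compound_2": 8.4, "compound_3": 7.1}
model_b_ace2 = {"compound_1": 8.1, "compound_2": 8.6, "compound_3": 6.9}
model_a_ace1 = {"compound_1": 8.8, "compound_2": 6.9, "compound_3": 7.0}

consensus = consensus_order(model_a_ace2, model_b_ace2)
delta = specificity_delta(model_a_ace2, model_a_ace1)
most_specific = max(delta, key=delta.get)
```

A larger positive difference corresponds to higher predicted specificity for ACE2 over ACE1, matching the way the last two columns of Table 1 are described.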
To explore other potential COVID-19 associated hits, we ran both Fluency models with TMPRSS2 as the target on the Selleckchem FDA approved drug library, and ranked hits based on performance in model a. Ombitasvir, Elbasvir, and Capecitabine are the top predicted binding hits for TMPRSS2, and Cefotiam Hexetil Hydrochloride and Bictegravir are top 10 predicted hits by both models (Table 3). Interestingly, chloroquine diphosphate was predicted by model b to bind ACE2 with a pBind of 7.8 (ranked 290 out of the FDA approved drugs for predicted binding) and TMPRSS2 with a pBind of 7.5 (ranked 210), while hydroxychloroquine sulfate was predicted by model b to bind ACE2 with a pBind of 7.9 (rank 261) and TMPRSS2 with a pBind of 7.22 (rank 307) (results not shown).
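To give the pBind values above an intuitive scale, the following Python sketch converts them to concentrations. It assumes that pBind follows the pChEMBL convention, the negative base-10 logarithm of a molar potency such as IC50; the text does not define pBind explicitly, so this is an assumption, and the function name is hypothetical.

```python
def pbind_to_nanomolar(pbind):
    """Convert a -log10(molar) potency value to a concentration in nM.

    Assumes pBind is on the pChEMBL scale, so pBind = 9 corresponds to 1 nM.
    """
    return 10 ** (9 - pbind)


# pBind values quoted in the text for model b.
chloroquine_ace2_nm = pbind_to_nanomolar(7.8)
hydroxychloroquine_tmprss2_nm = pbind_to_nanomolar(7.22)
```

Under that assumption, a pBind of 7.8 corresponds to a predicted potency of roughly 16 nM, and each unit of pBind is a tenfold change in concentration.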
In order to confirm or deny findings from Fluency, we applied a disease cancelling technology approach, searching for FDA approved drugs which reverse Coronavirus associated gene expression changes. Unlike Fluency, DCT was applied in a target and pathway agnostic way, capturing the full gene expression change in a data driven way. Publicly available gene expression data were downloaded from GEO (GSE68820). Healthy mice (C57BL/6NJ) were infected with MA15 (mouse version of SARS-CoV) [20]. Lung tissue was collected for gene expression analysis. A robust differential expression signal was detected between infected and uninfected mice at day 2 (Fig. 1a) (Table 4). Genes changing in the opposite direction between MA15 infection and glutamine treatment are plotted in Fig. 1b. Interestingly, Glutamine is a precursor to Glutathione, which was ranked highly in Fluency results (Table 1). Thus, two orthogonal approaches (neural networks and cosine distance) used on two different data types (binding prediction and gene expression) both arrived at the same potential hit (Glutamine/Glutathione).

Table 1 Top ranked Fluency hits for binding to ACE2, ranked by pBind in model a
For each version of Fluency run (models a and b), the predicted binding and rank is reported. A higher "pBind" signifies a higher binding affinity. The difference in pBind between ACE2 and ACE is reported in the last two columns, with larger values reflecting increased predicted binding specificity for ACE2 over ACE

Table 2 Top ranked Fluency hits for binding to ACE2, based on a consensus ranking using the results of both models
For each version of Fluency run (models a and b), the predicted binding and rank is reported. A higher "pBind" signifies a higher binding affinity. A lower "Reverse Fluency" rank signifies a higher predicted specificity to the intended target

Table 3 Top ranked Fluency hits from both models for binding to TMPRSS2
The "rank" column indicates the ranked position for a given model by binding prediction

Discussion
First, we utilized an unbiased AI-based systems algorithm to interrogate 2657 FDA approved or repurposing drugs for binding to ACE2, the main SARS-CoV-2 human cell entry receptor. The rapid analysis of repurposing approved drugs for new indications allows for immediate access to potential agents that could be used for urgent emerging diseases, such as COVID-19. The ability to identify such drugs requires additional biologic validation through in vitro studies confirming receptor blockade and inhibition of SARS-CoV-2 cell entry and replication, and in vivo ideally through randomized, controlled clinical trials. During a global pandemic, however, time may not allow for usual drug development processes and repurposing of commonly available drugs may be critical. Indeed, anecdotal reports of hydroxychloroquine, azithromycin and anti-IL6 therapies have received attention [21,22]. While hydroxychloroquine was predicted to bind to ACE2 by model b, supporting the anecdotal reports, we did not detect azithromycin or anti-IL-6 agents as these would not be anticipated to mediate therapeutic activity through ACE2 modulation. Further validation will be needed to determine if unbiased AI-based systems approaches are superior to anecdotal observations. In the binding prediction analysis, multiple known drugs were identified as potential ACE2 inhibitors (Table 1). Not surprisingly, twelve were ACE inhibitors.
This adds some confirmation that the unbiased selection accurately identified drugs with high likelihood of receptor binding. ACE inhibitors are agents commonly used for the treatment of hypertension and heart failure. This family of drugs is based on various peptide compositions and was initially selected for binding to ACE1, which catalyzes the conversion of angiotensin I to angiotensin II. Inhibiting ACE1 thereby blocks the renin-angiotensin system (RAS), lowering systemic blood pressure and increasing sodium excretion and renal water output. ACE inhibitors are not known to bind to ACE2, which lacks the carboxypeptidase activity of ACE1, but does contain a zinc-binding domain, exhibits metallopeptidase activity and shares approximately 40% homology with ACE1 [23,24]. Our model selected for preferential ACE2 binding and agents with better predicted binding values were prioritized (see Table 1). Early studies largely used angiotensin catalysis as the major readout for inhibition, and whether current ACE inhibitors may block SARS-CoV-2 binding remains speculative [25]. In addition, due to the counterregulatory nature of ACE1 and ACE2 expression, it is possible that agents that downregulate ACE1 receptors may increase ACE2 receptor expression and could worsen coronavirus infection. Thus, we scanned for binding of both ACE1 and ACE2 for top hits, and ranked by predicted difference in binding. By this metric, Captopril, Enalaprilat and Monopril looked likely to inhibit both and potentially elicit this undesired feedback effect (Table 1). Ramipril is a long-acting ACE inhibitor prodrug that is converted to the active metabolite ramiprilat in the liver and may be associated with hepatic injury. Likewise, monopril is a prodrug that undergoes transformation in the liver to the active metabolite fosinoprilat.
In contrast, lisinopril is an orally active ACE inhibitor that does not undergo metabolic transformation, is excreted in the urine, and does not bind to other serum proteins. It may also be associated with hepatic toxicity, and these drugs need to be used cautiously in patients with underlying liver disease. Captopril is a sulfhydryl-containing proline analog with potent and specific activity in blocking ACE peptidyl-dipeptidase activity. Captopril may also have anti-tumor activity through inhibition of tumor angiogenesis and promotion of anti-tumor immunity [26].
The analysis also identified drugs involved in glucose homeostasis that are used in patients with diabetes mellitus as anti-hyperglycemic agents. Nateglinide (Table 1) is a derivative of phenylalanine that acts on the ATP-sensitive potassium channels of pancreatic islet beta cells and stimulates insulin secretion [27]. The drug has been used for treatment of type 2 diabetes mellitus. Sotagliflozin (Table 2) is an oral inhibitor of the sodium-glucose cotransporter subtype 1 (SGLT1), expressed in the gastrointestinal (GI) tract, and SGLT2, expressed in the kidneys [28]. To our knowledge, these agents have not been previously known to bind to ACE or ACE2. Glutathione is another interesting agent that was predicted by both binding AI and gene expression disease cancellation. It is an antioxidant demonstrating improved airway clearance and pulmonary function in cystic fibrosis [29]. Glutathione has also been evaluated as an adjunct in patients receiving certain chemotherapy agents, following lung transplantation, and for management of HIV and Parkinson's disease, with mixed results [30].
Fostamatinib (R-406, Table 2) is an oral inhibitor of the spleen tyrosine kinase (Syk) that is converted to the active metabolite, tamatinib, and has been approved for the treatment of chronic immune thrombocytopenic purpura and is being evaluated in other autoimmune disorders, such as rheumatoid arthritis [31]. R-406 may also mediate signal transduction downstream of classical immunoreceptors, including the B-cell receptor, explaining why it may be useful in treating autoimmune diseases and B cell hematologic malignancies [32]. Emricasan (Table 2), also called IDN-6556, is a thiol protease inhibitor that acts on caspase-3 and received orphan drug status from the U.S. FDA for treatment of liver disease, such as chronic hepatitis C, where it functions to protect against excessive hepatic cell apoptosis. Emricasan has been shown to decrease hepatic aminotransferases in patients with hepatitis C and other viral-induced and non-viral liver diseases [33]. The drug has also shown activity against Zika virus-mediated caspase-3 induction and blocked viral infection of neural cells in vitro [34]. The potential antiviral activity of emricasan was identified in a drug repurposing screen following the Zika virus outbreak in 2016 [34]. Fosamprenavir was identified (Table 2) and is a protease inhibitor prodrug of amprenavir, an anti-retroviral drug approved for the treatment of HIV disease. Agents with known antiviral activity against RNA viruses are especially interesting for evaluation against Coronaviruses. Orlistat (Table 2) is a carboxyl ester and reversible inhibitor of GI lipases [35]. Orlistat was initially isolated from Streptomyces toxytricini, a gram-positive bacterium; it blocks hydrolysis and absorption of dietary fats and was approved in the U.S. and U.K. for the treatment of obesity. Two of the drugs identified have activity as anticoagulants, tirofiban hydrochloride (Table 1) and argatroban (Table 2).
Tirofiban is a non-peptide tyrosine derivative and functions as an antagonist of the purinergic receptor, platelet glycoprotein IIb/IIIa [36]. The drug inhibits platelet aggregation, has been used for treating acute coronary syndrome, and is being studied for management of ischemic stroke [37]. In contrast, argatroban is a small molecule that directly inhibits thrombin and is used for management of heparin-induced thrombocytopenia [38]. Piperacillin (Table 2) is a broad spectrum, semi-synthetic, beta-lactam, ureidopenicillin antibiotic derived from ampicillin. Piperacillin is active against gram-negative bacteria and was initially used for treating Pseudomonas aeruginosa infections and later as part of combination antibiotics for more complex infectious indications [39]. In contrast to macrolide antibiotics such as azithromycin, which inhibit bacterial protein synthesis, piperacillin blocks bacterial cell wall synthesis. Since these are commonly used agents in the management of patients with pneumonia, they both merit further studies to understand their role in ACE2 modulation and potential role in management of COVID-19 infection.
To search for potential COVID-19 therapeutic approaches in an orthogonal and unbiased way, we applied our Disease Cancelling Technology (DCT) to gene expression data from an animal model of SARS-CoV and ranked compounds by their ability to induce gene expression signals that counteract disease-associated signals. By this gene expression method, glutamine was a top hit for reversing Coronavirus associated changes in gene expression. Glutathione was highly ranked by Fluency for ACE2 binding and its precursor glutamine was highly ranked by gene expression DCT, suggesting both deserve further testing to explore potential benefits against SARS-CoV-2. Both glutamine and glutathione have previously demonstrated antiviral activity against herpes simplex virus (HSV) infections [40].

Conclusion
In summary, we used a novel AI-based systems approach to identify currently available drugs that are predicted to bind to ACE2. These agents are readily available and could be rapidly assessed both in the laboratory and clinic for activity against SARS-CoV-2 infection and the clinical course of COVID-19 disease. Further studies of these agents may provide new clinical strategies for patients with coronavirus diseases. Under normal circumstances, we would conduct experimental validation prior to submitting this report for publication. Given the current public health emergency, we are publishing this work now in the event that others are set up to more quickly validate, assess, and build upon these findings. Although validation is still needed, this report highlights how AI-based systems may be utilized to rapidly identify drugs for repurposing against new and emerging human diseases.

Materials and methods
ACE2 (UNIPROT ID: Q9BYF1), ACE1 (UNIPROT ID: P12821), and TMPRSS2 (UNIPROT ID: O15393) were run separately as the protein target of Immuneering's Fluency query. Fluency is a single universal quantitative structure-activity relationship (QSAR) deep learning model, which takes protein amino acid sequence and small molecule SMILES as input. Fluency was trained on experimental binding data from ChEMBL 24 (model a) and ChEMBL 25 (model b). Fluency predictions have previously been experimentally validated for multiple targets. In this case, Fluency was used to predict binding of the Selleckchem FDA approved drug library (https://www.selleckchem.com/screening/fda-approved-drug-library.html) separately to ACE2, ACE1, and TMPRSS2. For top hits, Fluency was run in reverse (predicting binding of a single small molecule to 20,206 human proteins) to score specificity. Predicted binding scores for ACE1 and ACE2 were compared for top hits to assess predicted specificity for ACE2 over ACE1 in each model (as reflected in the "pBind_x_ACE2-pBind_x_ACE" columns). Similarity to known ACE2 binders (reported pChEMBL value greater than 7 in the ChEMBL database) was computed using Tanimoto distance of molecular fingerprints from RDKit in Python. Top ranked Fluency hits were filtered by evaluating individual rankings from model a and model b, as well as the average rank of predictions and the combined pBind scores of both models.
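The Tanimoto comparison mentioned above can be illustrated without any cheminformatics dependency. The sketch below is not the RDKit workflow used in the study: plain Python sets of on-bit indices stand in for molecular fingerprints, and the binder names, bit sets, and helper functions are hypothetical.

```python
def tanimoto_similarity(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here when both fingerprints are empty
    return len(a & b) / len(union)


def tanimoto_distance(bits_a, bits_b):
    """Tanimoto distance, defined as one minus the similarity."""
    return 1.0 - tanimoto_similarity(bits_a, bits_b)


def nearest_known_binder(query_bits, known_binders):
    """Return (name, similarity) for the known binder most similar to the query."""
    if not known_binders:
        raise ValueError("known_binders must not be empty")
    name = max(
        known_binders,
        key=lambda n: tanimoto_similarity(query_bits, known_binders[n]),
    )
    return name, tanimoto_similarity(query_bits, known_binders[name])


# Hypothetical on-bit sets standing in for real molecular fingerprints.
known = {"binder_1": {1, 2, 3, 8}, "binder_2": {4, 5, 6}}
query = {2, 3, 8, 9}

best_name, best_similarity = nearest_known_binder(query, known)
```

With real molecules, the bit sets would come from a fingerprinting routine; the ratio of shared bits to total bits is the same calculation either way.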
Gene expression data were downloaded from GEO (GSE68820). The processed data, which had been background corrected, quantile normalized, and summarized after outlier removal by the original authors, were used [20]. For each of the time points, differential expression was calculated between lung samples from wild type mice infected with the MA15 (SARS-CoV) virus and those from mock-inoculated wild type mice using the limma R package, version 3.40.6 [41]. Immuneering leveraged its previously described [42] and validated [43,44] DCT, and ran the SARS-CoV disease signature against the LINCS drug perturbation database [45]. Results were filtered for adjusted p-value significance and maximal disease cancellation score.
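The final filtering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the 0.05 significance threshold, the convention that a more negative score means stronger cancellation, the record layout, and the drug names and values are all hypothetical.

```python
from typing import NamedTuple


class DrugResult(NamedTuple):
    drug: str
    cancellation_score: float  # assumed convention: more negative = stronger cancellation
    adj_p_value: float


def top_cancelling(results, alpha=0.05, limit=10):
    """Keep significant, disease-opposing results and rank the strongest first."""
    kept = [
        r for r in results
        if r.adj_p_value < alpha and r.cancellation_score < 0
    ]
    return sorted(kept, key=lambda r: r.cancellation_score)[:limit]


# Invented records for illustration only (not results from the study).
results = [
    DrugResult("drug_1", -0.42, 0.001),
    DrugResult("drug_2", -0.55, 0.200),  # strong score but not significant
    DrugResult("drug_3", -0.10, 0.010),
    DrugResult("drug_4", 0.30, 0.004),   # significant but amplifies the disease signal
]

ranked = top_cancelling(results)
```

Filtering on significance before ranking keeps a large but unreliable score, such as the second record here, from reaching the top of the list.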