Although it is becoming evident that individual’s immune system has a decisive influence on SARS-CoV-2 disease progression, pathogenesis is largely unknown. In this study, we aimed to profile the host transcriptome of COVID-19 patients from nasopharyngeal samples along with virus genomic features isolated from respective host, and a comparative analyses of differential host responses in various SARS-CoV-2 infection systems.
Unique and rare missense mutations in 3C-like protease observed in all of our reported isolates. Functional enrichment analyses exhibited that the host induced responses are mediated by innate immunity, interferon, and cytokine stimulation. Surprisingly, induction of apoptosis, phagosome, antigen presentation, hypoxia response was lacking within these patients. Upregulation of immune and cytokine signaling genes such as CCL4, TNFA, IL6, IL1A, CCL2, CXCL2, IFN, and CCR1 were observed in lungs. Lungs lacked the overexpression of ACE2 as suspected, however, high ACE2 but low DPP4 expression was observed in nasopharyngeal cells. Interestingly, directly or indirectly, viral proteins specially non-structural protein mediated overexpression of integrins such as ITGAV, ITGA6, ITGB7, ITGB3, ITGA2B, ITGA5, ITGA6, ITGA9, ITGA4, ITGAE, and ITGA8 in lungs compared to nasopharyngeal samples suggesting the possible way of enhanced invasion. Furthermore, we found comparatively highly expressed transcription factors such as CBP, CEBP, NFAT, ATF3, GATA6, HDAC2, TCF12 which have pivotal roles in lung injury.
Even though this study incorporates a limited number of cases, our data will provide valuable insights in developing potential studies to elucidate the differential host responses on the viral pathogenesis in COVID-19, and incorporation of further data will enrich the search of an effective therapeutics.
Since the declaration of COVID-19 pandemic on 11 March, this Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) mediated infection has spread ~ 213 countries and territories . Approximately, 15 million individuals across the globe have fallen victim to this virus and the number is constantly increasing at an alarming rate, as of the writing of this manuscript . Though the initial fatality was as low as 3.5%, currently this value lies around ~ 6.66%  and it might be increased because of the withdrawal of earlier preventing measures taken throughout the world. Coronaviruses are not new to human civilization, as these viruses caused several earlier outbreaks during the past two decades. However, none of the earlier outbreaks spread as widely as the current ongoing pandemic. As the pandemic progresses, more researches on the molecular pathobiology of the COVID-19 are being rapidly carried out to search for effective therapeutic intervention.
Coronaviruses possess single-stranded RNA (positive sense) genomes lengthening approximately 30 Kb . Amongst the coronaviruses, SARS-CoV-2 is a member of the betacoronaviruses having a ~ 29.9 Kb genome which contains 11 functional genes . Though SARS-CoV-2 shows similar clinical characteristics as Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrome-related Coronavirus (MERS-CoV) viruses, it has only ~ 79% and ~ 50% genome sequence similarities with these viruses, respectively; whereas, the genome sequence of SARS-CoV-2 is ~ 90% identical to that of bat derived SARS-like coronavirus . Moreover, several key genomic variances between SARS-CoV-2 and SARS-CoV such as- 380 different amino acid substitutions, ORF8a deletion, ORF8b elongation, and ORF3b truncation were also reported .
The clinical characteristics of the COVID-19 range from mild fever to severe lung injury . Some of the commonly observed mild COVID-19 symptoms are fever, cough, and fatigue; however, complications such as- myalgia, shortness of breath, headache, diarrhea, and sore throat were also reported . Furthermore, severely affected patients had exhibited respiratory complications like moderate to severe pneumonia, acute respiratory distress syndrome (ARDS), sepsis, acute lung injury (ALI), and multiple organ dysfunction (MOD) . Primarily, the lungs of the COVID-19 patients are affected ; however, failures of other functional systems, namely cardiovascular system, and nervous system were also reported [9, 10].
Several features of the SARS-CoV-2 infection made it more complicated for effective clinical management. From the earlier studies, the incubation period of SARS-CoV-2 was reported to be around 4–5 days, however, some recent studies suggested a prolonged incubation period of 8–27 days . Additionally, several cases of viral latency within the host , and the recurrent presence of SARS-CoV-2 in clinically recovered patients were also recorded [13, 14]. However, the detailed molecular mechanism behind these phenomena is still elusive.
In COVID-19, an increased level of infection-associated pro-inflammatory cytokines were recorded , which thereby supports the term “Cytokine storm”, that was frequently used to describe the SARS-CoV and MERS-CoV disease pathobiology . This phenomenon causes the hyperactivation and recruitment of the inflammatory cells within the lungs and results in the acute lung injury of the infected patients . However, this illustrates one putative molecular mechanism of COVID-19, there are many other immune regulators and host genetic/epigenetic factors which can also play significant contribution towards the disease manifestation [18, 19]. This multifaceted regulation was also reported previously for other different coronavirus infections . Host–pathogen interactions in different coronavirus infections can function as a double-edged sword, as these could be beneficial not only to the hosts but also the viruses . Similar host-virus tug-of-war can also occur in COVID-19 which might be contributing towards the overcomplicated disease outcomes .
Collectively, more than 1.7 million (almost 9% of the total infections around the globe) people have been diagnosed with COVID-19 in the South-Asian region and the number is still increasing devastatingly . Recently, it has been speculated that South-Asian people might be possessing a genomic region acting as the risk factor for COVID-19 . Moreover, another study suggested some genomic variations in several Indian SARS-CoV-2 isolates that might be involved in the COVID-19 pathogenesis in Indian patients . However, any data suggesting the COVID-19 patients’ transcriptomic responses from this part of the globe are yet to be reported.
SARS-CoV-2 follows a highly variable course and it is becoming more evident that individual’s immune system has a decisive influence on the progression of the disease . However, the detailed underlying molecular mechanisms of the SARS-CoV-2 mediate disease pathogenesis are largely unknown. Even previously conducted studies using patient samples, animal models, and cell lines to explain the pathobiology of COVID-19 [24,25,26] lack a detailed comparison of the host transcriptional responses between different infection models as well as the different sites of the respiratory system that might provide valuable insights on the COVID-19 pathogenesis and disease severity. In this present study, we sought to discuss the host transcriptional responses observed in nasppharyangeal cells of COVID-19 patients. This trscriptional profile report is first such kind from South Asian region. Additionally, we reported the genome variations observed in the four SARS-CoV-2 isolates obtained from these patients. Finally, we illuminated the differences in host transcriptional responses in different COVID-19 infection models and further pursued to discover the putative effects of these altered responses (Fig. 1).
Our sequenced SARS-CoV-2 isolates showed a divergent variation pattern compared to the other worldwide isolates
We sought to find out the genome variations within the four SARS-CoV-2 isolates that we sequenced, and pursued the deviation of these genomes compared to the other isolates from this country. To accomplish these goals, we first identified and annotated the genome variations observed within our sequenced isolates. Then we produced informative statistics from these observed variations and compared the prevalence of those with the other isolates of Bangladesh and the rest of the world.
We mapped the RNA-seq reads of each of the samples and checked their distribution athwart the entire reference genome of SARS-CoV-2 (Fig. 2a). High coverages and read evidence were observed for all the isolates across the whole genome of the SARS-CoV-2 (Fig. 2a). This suggests that the sequenced genomes of these isolates are of high coverage and no such region is observed without the mapped reads.
We detected sixty different types of variations within these four analyzed SARS-CoV-2 isolates (Table 1). All the four different types of sequence variations were spotted in these isolates, however, single nucleotide polymorphisms (SNPs) were the most prominent (Fig. 2b). Among these variations, twelve variations were found in more than one isolate, whereas rest forty-eight variation occurred in only one isolate (Table 1, Fig. 2c). Among the isolates, isolate S3 contained the lowest number of variations, whereas isolate S4 has the highest number of variations (Fig. 2d). Many concepts are there correlating the probable roles of variations with the COVID-19 disease severity [27, 28]. We did not observe any such variations within the spike region of our reported isolates; however, we recorded an unusual amount of 3′-UTR and 5′-UTR variations within these four isolates (Fig. 2c, d). Surprisingly, out of all these variations, we found only one downstream gene variation on the 3′-UTRs of all the four isolates; this variation can potentially impact the regulation of the ORF10 gene (Fig. 2d). Most of the nucleic acid mutations were located on the 3′-UTR of the isolates, whereas the ORF1ab gene contained most of the amino acid mutations (Fig. 2e).
No highly severe mutation was identified amongst these variations, but we found nine moderately impacting, seven low impacting, and forty-seven modifier variations within these isolates (Fig. 2f, Additional file 1). As of 8th July, thirty-eight out of the sixty variations within our sequenced isolates were completely absent in all other SARS-CoV-2 isolates (Table 1). Strikingly, we observed that variation 10,329: A>G is present within three of our sequenced isolates, only one other Bangladeshi and one other USA isolate contain this variation (Fig. 2g). This variation is located within the 3C-like protease of SARS-CoV-2. Previously, the potential implication of the mutations of this protein was reported to alter its overall structure and functionality [29,30,31] in SARS-CoV. The only one deceased patient did not have this mutation in our samples. Also, few of our reported variations like 25,505: A>T and 29,392: G>T are not highly prevalent globally (Fig. 2g).
Exploring the Nextstrain portal , we noticed that our analyzed SARS-CoV-2 sequences are closely placed to the Saudi-Arabian isolates (Additional file 2: Figure S1A); although, most of the other isolates of this country were placed in the major European clusters (data not shown). Furthermore, these isolates analyzed in this study are distinctly placed in our constructed Neighbor-Joining phylogenetic tree (Additional file 2: Figure S1B), this also supports the differences between these isolates and other SARS-CoV-2 isolates of this country which might have been originated from the European nations. As a large number of people from Bangladesh recently immigrated to Middle-East (particularly Saudi Arabia) for work ; those immigrant people returning from the Middle-East during this pandemic might have brought these isolates into Bangladesh.
Stimulated antiviral immune responses are detected in the nasopharyngeal samples of COVID-19 patients
Our analyzed patients exhibited the commonly observed sign and symptoms of COVID-19 such as mild fever, sore throat, coughing, bodyache, fatigue, and dysosmia (Additional file 3). Patients were hospitalized but no intensive clinical interventions such as ICU support or ventilation support were needed. Male to female ratio of the patients were 1:1. The median age of the patients were ~ 45 years, only one patient was around 85 years old. This oldest patient had some additional clinical features such as preexisting asthma and diarrhea. All the patients recovered within one month of the initial diagnosis except patient S9, who died after COVID-19 infection.
Though initial researches suggested the potential implication of viral variations on the COVID-19 disease severity, one recent study indicated otherwise; Several host factors such as abnormal immune responses, and cytokine signaling might be influencing the overall disease outcomes more prominently compared to the viral mutations . Moreover, several data surmised that ethnicity might be a pivotal risk factor of being susceptible to COVID-19 .
In this context, we explored the transcriptome data obtained from the nasopharyngeal samples from COVID-19 patients to find out how these patients were responding against the invading SARS-CoV-2. We compared the RNA-seq data of these patients with some random normal individuals’ nasopharyngeal RNA-seq data to find out the differentially expressed genes within our analyzed samples. We observed a roughly constant standard deviation for the normalized reads suggesting a lesser amount of variation occurred during the normalization (Fig. 3a). Furthermore, we performed sample clustering to assess the quality of our generated normalized RNA-seq data. No anomalies were observed in the sample to sample distance matrix (Fig. 3b) and principal component analysis (PCA) (Fig. 3c) while comparing our samples with the used healthy individuals’ data. Moreover, the larger differences observed in the PCA plot (Fig. 3c) and clustered heatmap of the count matrix with the top 50 significant genes (Fig. 3d) suggest a significant transcriptomic response difference between our infected patients’ data and the normal individuals’ data. Likewise, the sample to sample distance plot suggested the similarities of samples of similar nature; the infected and healthy samples were clustered into two distinct groups (Fig. 3b).
Sungnak et al. described the significance of several viral entry associated host proteins in SARS-CoV-2 pathogenesis, namely ACE2, TMPRSS2, BSG, CTSL, DPP4 . We also investigated the expression of the associated transcripts of these proteins within our patients’ samples. We spotted that both the healthy and infected samples have expressed these genes except the DPP4 gene (Fig. 3e).
Genes that are deregulated due to SARS-CoV-2 initial infection site at nasopharyngeal region are not elucidated much so far. Here, we identified 1,614 differentially expressed genes within our reported four SARS-CoV-2 infected nasopharyngeal samples; among these differentially expressed genes, 558 genes were upregulated, and 1056 genes were downregulated (Additional file 4). Then we sought to discover the biological functions/pathways these deregulated genes might be involved in. To achieve this, we performed functional enrichment analyses with the observed deregulated genes using different ontology and pathway modules.
Several GOBP terms related to antiviral immune responses such as viral process, defense response to virus, innate immune response, inflammatory response, negative regulation of viral transcription, and negative regulation of viral genome replication were observed enriched for the upregulated genes (Fig. 3f, Additional file 5: Figure S2). Surprisingly, several other important antiviral defense related functions such as- apoptosis, and antigen processing and presentation were found enriched for downregulated genes (Fig. 3f).
Similarly, this pattern was also observed for the functional enrichment using KEGG and Bioplanet pathways modules. Upregulated genes are observed involved in signaling pathways such as innate immune system, antiviral mechanism by interferon-stimulated genes, interleukin-2 signaling, interferon-gamma signaling, interferon alpha–beta signaling, antiviral mechanism by interferon stimulated genes, IL-17 signaling pathway, Toll-like receptor signaling pathway, RIG-I like receptor signaling pathway, and MAPK signaling pathway (Fig. 3g, h, Additional file 5: Figure S2). Strikingly, several important antiviral signaling pathways such as antigen processing and presentation, apoptosis, HIF-1 signaling pathway, Natural killer cell mediated cytotoxicity, phagosome, PI3K-Akt signaling pathway, Interleukin-6 regulation of target genes, and Interleukin-10 inflammatory signaling pathway were enriched for the downregulated genes (Fig. 3g, h). This unusual observation made us curious to search for a similar pattern of deregulated host responses in several other COVID-19 disease models.
Host responses observed in nasopharyngeal samples are significantly different compared to the other SARS-CoV-2 infections models
We sought to compare the host responses of our analyzed samples with several other different SARS-CoV-2 infection models (two different experiments containing lung biopsy samples from COVID-19 patients and two different SARS-CoV-2 infected cell lines). We performed functional enrichment analyses using differentially expressed genes from four other SARS-CoV-2 infection systems, and compared the enriched terms of our samples with these four other samples. Moreover, how the host responds differently in different tissue types were also evaluated. To achieve these goals, we identified the differentially expressed genes across these different samples and systematically compared the enrichment results of those deregulated genes.
Using the similar parameterization of the differential gene expression analyses, we identified 6714 genes in lung cells (GSE147507), 232 genes in lung cells (GSE150316), 143 genes in NHBE cells (GSE147507), and 5637 genes in Calu-3 cells (GSE148729) as differentially expressed compared to their respective healthy controls (Additional file 6). Significant proportions of the deregulated genes detected in our nasopharyngeal samples are also found deregulated in lung (GSE147507) and Calu-3 cells (GSE148729) samples (Fig. 4a), while a small number of our samples’ deregulated genes were also observed deregulated in rest of the two samples used (Fig. 4a).
Enrichment analysis using these deregulated genes suggested the host response differences among the different infection systems used (Fig. 4b–d). Upon the analysis, only a few GOBP terms were found enriched for both our samples, lung (GSE147507), and Calu-3 cells (GSE148729) samples, such as viral process, immune response, innate immune response, defense response to virus, and interferon signaling (Fig. 4b). However, genes in many important antiviral immune response related functions were not significantly enriched in nasopharyngeal samples but were enriched for the lung (GSE147507), and Calu-3 cells (GSE148729) samples; these processes are autophagy, apoptotic signaling pathway, interleukin-6 mediated signaling pathway, interleukin-12 mediated signaling pathway, cytokine-mediated signaling pathway, and inflammatory response (Fig. 4b). Moreover, processes such as response to hypoxia, response to vitamin-D, and lung development were also not enriched for the deregulated genes in nasal samples (Fig. 4b).
We noticed several commonly enriched important immune signaling pathways for most of the samples used for the comparison (Fig. 4c, d), such as adaptive immune system, innate immune system, interferon signaling, apoptosis, Toll-like receptor signaling pathway regulation, antigen processing and presentation, integrin signaling pathway, RIG-I like receptor signaling pathway, and phagosomes (Fig. 4c, d, Additional file 7: Figure S3); however, pathways such as JAK-STAT signaling pathway, Natural killer cell mediated cytotoxicity, NF-κB signaling pathway, asthma, PI3K-Akt pathway, cellular response to hypoxia, inflammasomes, and inflammatory response pathway (Fig. 4c, d, Additional file 7: Figure S3) were not enriched for the deregulated genes of our nasopharyngeal samples. These results suggest that host responses observed in nasopharyngeal samples have a different host response compared to the other infection systems. However, the diffenences observed in the infected cell lines’ transcriptomes might be the resultant effects of the inherent variability of these cells compared to the nasal epithelial cells or the lung cells. Therefore, to unveil the mystery behind this observation, we further analyzed these data to compare the COVID-19 patients’ nasal and lung gene expression patterns for different specific functionalities.
Significant gene expression differences were spotted between the nasopharyngeal samples and lung biopsy samples
We compared the normalized read counts of each infected nasal and lung samples without integrating the respective controls to shed insights on the differences in gene expression patterns between the individual samples and tissues. A constant standard deviation was observed for the normalized read counts of the infected samples (Fig. 5a) indicating the acceptability of the normalized reads for analysis. From the sample to sample distance clustering, principal component analysis, and clustered heatmap of the count matrix with top 50 genes, we observed that gene expression profiles of our nasopharyngeal samples are more relevant to that of lung samples; whereas, high level of variance was observed between the gene expression counts of the cell lines and primary nasopharyngeal samples (Fig. 5b–d). PCA analysis (Fig. 5c) also suggests that cell line data are quite different than primary samples data. Furthermore, we had a similar observation from the clustered normalized read counts of the samples based on Pearson’s correlation distance with all genes that vary across samples (Fig. 5e). We then narrowed down our searches to the sample level gene expression profiles of several COVID-19 related important biological functions within these samples (Fig. 6), to understand the gene expression similarities and dissimilarities among these infections systems, specially comparing nasal and lung tissues.
Genes related to integrins and integrin signaling pathway are highly expressed in lung samples compared to the nasopharyngeal samples
Though some previous reports [37, 38] suggested an important aspect of integrins in SARS-CoV-2 pathogenesis, precise information on which particular integrins are deregulated and how virus interactions might modulate them remained unclear. Therefore, we sought to find out the expression profiles of integrin related genes in different COVID-19 infection models at sample level. RGD motif of the spike protein of SARS-CoV-2 can bind the integrins and this motif is placed near to the ACE2-receptor binding motif . Moreover, evidence of integrin domain binding was also reported for SARS-CoV . Therefore, we sought to discover the expression profiles of the integrin related genes in different SARS-CoV-2 infection models. To accomplish this, we filtered out the integrin and integrin signaling related genes (Additional file 8) within the terms of the GOBP, KEGG pathway, and Bioplanet pathway modules that we used for enrichment analysis. Intriguingly, we observed that the genes related to integrins and integrin signaling such as ITGAV, ITGA6, ITGB7, ITGB3, ITGA2B, ITGA5, ITGA6, ITGA9, ITGA4, ITGAE, and ITGA8 were highly expressed in analyzed lung samples, and the lowest number of these genes were expressed in the nasopharyngeal samples (Fig. 6a, b, Additional file 9: Figure S4A). Based on these observations, we can assume that overexpression of integrins and integrin signaling related genes in the lungs might provide the virus a competitive edge in invading the lung cells more efficiently compared to the cells of the nasopharynx and respiratory tracts.
Cytokine and inflammatory signaling genes are overexpressed in lung samples
Aberrant cytokine stimulation and inflammatory responses are thought to be the major contributor to pathogenic lung damages in severely affected COVID-19 patients [40, 41]. We wanted to find out whether the genes related to cytokine signaling and inflammation have differential expression profiles in lung cells compared to the other infection systems. We extracted and compared the gene expression values of the genes related to these two terms (Additional file 8). We are not surprised to observe that the genes of these two major contributing events of COVID-19 lung pathobiology are significantly overexpressed in lung samples compared to the rest of the SARS-CoV-2 infected cell types (Fig. 6c–f, Additional file 9: Figure S4B-C). Particularly, the analyzed nasopharyngeal samples have very low expression values for the cytokine and inflammatory signaling genes such as CCL4, TNFA, IL6, IL1A, CCL2, CXCL2, IFN, and CCR1 (Fig. 6c–f). Therefore, these observations are fueling the preexisting supposition of the roles of enhanced cytokine, and inflammatory signaling for worsening the disease condition in patients with SARS-CoV-2 infected lungs.
A differential gene expression profile was detected for the SARS-CoV-2 entry receptors/associated proteins in different infection models
Expression of receptor protein ACE2 and entry associated proteins such as TMPRSS2, BSG, CTSL, DPP4 on the cell surface of the host is essential for the invasion of SARS-CoV-2 . Moreover, ACE2 overexpression is thought to increase the infection potentiality of SARS-CoV-2 . Furthermore, Kuba et al. demonstrated the potential role of ACE2 in SARS-CoV induced lung injury . So, we ventured to check the gene expression levels of ACE2 and the other entry associated proteins in the different SARS-CoV-2 infected cells. Surprisingly, we observed that the ACE2 gene was not expressed in high levels in lung samples as speculated (Fig. 6g). However, gene expression levels of the other entry associated proteins were higher in lung samples (Fig. 6g). Nonetheless, in few of the lung samples, the TMPRSS2 gene was not expressed in higher amounts (Fig. 6g). Interestingly, we have not detected any expression of DPP4 gene within the reported nasopharyngeal samples (Fig. 6g).
Inflammatory immune responses were several folds higher in lungs than the nasopharynx of COVID-19 patients
From our previous observations, it was evident that COVID-19 patient’s lung responds to the viral infection differently compared to the epithelial cells of nasopharynx. We then sought to figure out the specific genes and biological functions/signaling pathways which have this differential pattern. We achieved this by designing a multifactorial differential gene expression analysis using a generalized linear model (GLM) ; in which we compared the fold changes of every differentially expressed gene in our nasopharyngeal samples and lung (GSE150316) samples, to discover how many folds lung is alternatively expressing the genes than nasopharynx in COVID-19.
Firstly, we analyzed the suitability of the data for this design and observed no irregularities between the data used (Fig. 7a–d). Moreover, upon this multifactorial differential gene expression analysis, we observed an acceptable common biological coefficient of variation; this variation decreases significantly as the expression values increases (Fig. 7e). From the MA plot, we observed a very high amount of the significantly (p-value < 0.05) several fold upregulated and downregulated genes in lungs compared to nasopharyngeal samples (Fig. 7f). We detected 807 upregulated and 298 downregulated genes in lungs compared to the nasopharyngeal samples (Additional file 10). Interestingly, we noticed the highly upregulated integrin and integrin signaling genes in lungs compared to the nasal samples (Fig. 7g) which are consistent with our previous observations. Modulatory roles of integrins are well established in acute lung damages . Similarly, aberrant expression of genes involved in integrin signaling can also provoke acute lung injuries, namely- ADAM15 , SDC1 , CD14 , CD47 , CD9 , HMGB1 , ITA6 , and ITAV . Therefore, SARS-CoV-2 infection induced deregulation of these genes might be contributing towards the worsening of the normal pathobiology and functionality of lungs in COVID-19.
We then performed functional enrichment analysis to hunt down the signaling pathways which are differentially expressed in lungs compared to the nasopharyngeal cells. These enrichment analyses revealed that biological functions such as viral process, and antigen processing and presentation were highly upregulated, function such as regulation of gene silencing by miRNA was found downregulated in lungs compared to the nasopharyngeal cells (Fig. 8a). Furthermore, pathways that provide antiviral immunity such as apoptosis, phagosome, antigen processing and presentation, adaptive immune system, innate immune system, interferon signaling, different interleukin signaling, and cytokine signaling in immune system were highly upregulated in lungs compared to the nasopharyngeal samples (Fig. 8b–d). Despite having the antiviral protective roles, hyperactivity from these pathways can significantly worsen the COVID-19 patient’s overall lung functionality which can be further complicated with progressive and permanent lung damage.
Previously, it was reported that transcription factors can contribute to many inflammatory lung diseases [54, 55] which have similar lung characteristics observed in COVID-19. In this context, we identified the highly expressed transcription factors in lungs by comparing their respective expression values in nasopharyngeal samples (Fig. 8e). Among these, transcription factors such as CBP , CEBP , NFAT , ATF3 , GATA6 , HDAC2 , and TCF12  have significant roles in lung’s overall functionality, acute lung injury and antiviral response mechanism in lungs.
SARS-CoV-2 integrates its proteins in regulating the host antiviral immune responses
As we have observed the differential host responses in COVID-19 nasopharyngeal samples, then we sought to interconnect the virus-host interplay in those host responses. We first analyzed how many of the virus interacting host proteins’ genes reported by Gordon et al.  are differentially expressed in our reported nasopharyngeal samples. Only 51 genes of those proteins are found deregulated in our nasopharyngeal samples (Fig. 9a). We then constructed a network interlinking the virus-host protein–protein interaction data from Gordon et al.  along with the deregulated genes from the nasopharyngeal samples (Fig. 9b). Strikingly, we observed that most of the immune-signaling-related downregulated genes are directly or indirectly connected to the viral proteins (Fig. 9b); this suggests the probable roles of the virus in the differential host responses in the COVID-19 affected patients.
Furthermore, to understand if there are any viral factor dependent enhancement of integrin expression, we sought to establish the links between the viral proteins with integrin signaling associated genes by constructing a functional network with the viral-host protein–protein interaction data with the highly upregulated genes observed in lungs (from the comparison analysis between the lung and nasopharyngeal samples) (Fig. 9c). From this constructed network, we observed that viral proteins such as ORF10, N, ORF9b, NSP7, NSP15, NSP5, M, NSP13, NSP2, NSP9, ORF8, ORF9c, NSP12, and NSP1 can directly or indirectly interact with the differentially expressed genes in lungs (Fig. 9c), suggesting the putative mechanism behind the deregulated integrin signaling to promote the viral invasion in lungs (Fig. 10).
For a better understanding of the host-virus interaction in the SARS-CoV-2 pathogenesis, transcriptional responses of hosts play an enormous role. In this context, we aimed to discover the host transcriptome response upon SARS-CoV-2 infection by performing and analyzing total RNA-seq from the nasopharyngeal samples of four COVID-19 positive individuals. Moreover, we compared the transcriptome from different SARS-CoV-2 infection models, particularly, we compared the differential gene expression of the lung biopsy samples with the nasopharyngeal samples of ours to illustrate the possible molecular mechanisms behind the lung damages in severe COVID-19 patients.
Previously, host transcriptional responses reported by Blanco-melo et al.  and Butler et al.  suggested a potential increase in the host antiviral immune responses such as interferon signaling, interferon stimulated gene signaling, chemokine signaling, and cytokine signaling; however, Blanco-melo et al.  also reported the presence of low IFN-I and IFN-III in COVID-19 patient’s lung cells. We observed similar host immune responses, interferon, and cytokine signaling in our reported COVID-19 patients too. Moreover, we also observed a stimulated innate immune response in our patients which was also reported for other COVID-19 patients .
Astoundingly, important signaling pathways those elicit antiviral immune responses such as apoptosis , phagosome formation , antigen processing and presentation , Natural killer cell mediated cytotoxicity , and Toll-like receptor signaling  were found downregulated in these COVID-19 patients. Also, pathways such as HIF-1 response , PI3K-Akt signaling , and IL-17 signaling  were also found deregulated, which could assist the COVID-19 patients suffering from hypoxia, lung injury, and inflammation of the respiratory tract.
All of our patients showed dysosmia which is also a commonly observed features in most other COVID-19 patients around the world. This might have occurred due to the hypothesized reasons reported by Breguglio et al. . Interestingly, our patients’ nasopharyngeal data also provides supportive clues such as overexpressing local cytokine signaling, inflammatory responses and accumulation of innate immune cells in the nasopharyngeal regions; all of which might contribute towards the destabilization of olfaction within these patients.
While we were comparing the nasopharyngeal cell’s transcriptional responses with other SARS-CoV-2 infection models, we observed that lung cells elicited the immense cytokine and inflammatory responses against the invading viral pathogen. These overstimulated responses sometimes can do irreversible damages to the lungs . This might shed insights into the COVID-19 disease severity when the viral infection progresses into the lungs.
Though an increased amount of ACE2 will facilitate the invasion of SARS-CoV-2, nonetheless, we observed a significant downregulation of ACE2 in lung cells; Hou et al. reported similar phenomenon in an earlier study . This phenomenon could backup the concept of ACE2 downregulation by SARS-CoV-2 itself after using it , thus reducing the organ protective roles of ACE2  and resulting in progressive lung damages.
Integrins were reported important for the entry of SARS-CoV into the host cells , so it was speculated similar phenomenon might also be present in SARS-CoV-2. This idea is further intensified after the study by Sigrist et al. , who suggested the presence of an integrin-binding RGD motif in the spike of SARS-CoV-2. Surprisingly, upon the gene expression comparison between the different SARS-CoV-2 infected cells, we observed several folds upregulated expressions of genes encoding integrins in lung cells. This observation could support the idea of increased viral infections in lungs might be happening due to the overexpression of these probable attachment proteins. Also, the network analysis suggests a probable mechanism of upregulation of these proteins by the virus itself by the putative interactions through its proteins. As our study is based on the data acquired from a limited number of samples, therefore, more targeted studies with a larger sample size should be undertaken for conclusive evidence supporting this phenomenon.
In this study, we present the very first report of the host transcriptional response data from COVID-19 patients of the South-Asian region along with the SARS-CoV-2 isolates obtained from these patients. This data might provide newer insights into the host responses against the virus in the different parts of the respiratory tract. However, a limited number of patient data is used here, but subsequent incorporation of more patient data from other parts of the world will significantly increase the understanding of this complex host-virus response in COVID-19, which will help in designing therapeutic interventions as well as in current clinical management of the patients.
Sample collection and virus detection by Real-time reverse transcription-quantitative PCR (RT-qPCR)
The nasopharyngeal swab samples were collected from patients suspicious of COVID-19 and placed in sample collection vial containing normal saline. Collected samples were preserved at − 20 °C until further use for RNA extraction and RT-qPCR assay. The RT-qPCR was performed for ORF1ab and N genes of SARS-CoV-2 using Novel Coronavirus (2019-nCoV) Nucleic Acid Diagnostic Kit (PCR-Fluorescence Probing) of Sansure Biotech Inc. according to the manufacturer’s instructions. RNA was extracted from a 20 µL swab sample through lysis with sample release reagent provided by the kit and then directly used for RT-qPCR. Thermal cycling was performed at 50 °C for 30 min for reverse transcription, followed by 95 °C for 1 min and then 45 cycles of 95 °C for 15 s, 60 °C for 30 s on an Analytik-Jena qTOWER instrument (Analytik Jena, Germany).
Total RNA was extracted from nasopharyngeal swab samples (labeled as S2, S3, S4, S9) collected from SARS-COV-2 infected COVID-19 patients using TRIzol (Invitrogen) reagent following the manufacturer’s protocol. RNA-seq libraries were prepared from total RNA using TruSeq Stranded Total RNA Library Prep kit (Illumina) according to the manufacturer’s instructions where the first-strand cDNA was synthesized using SuperScript II Reverse Transcriptase (Thermo Fisher) and random primers. Paired-end (150 bpreads) sequencing of the RNA library was performed on the Illumina NextSeq 500 platform.
Data processing and identification of the viral agent
Firstly, the sequencing reads were adapter and quality trimmed using the Trimmomatic program . The remaining reads were mapped against the SARS-CoV-2 reference sequence (NC_045512.2) using Bowtie 2 . Then the mapped reads were assembled de novo using Megahit (v.1.1.3) .
Mapping of the RNA-seq reads onto SARS-CoV-2 reference genome
We mapped the normalized (by count per million mapped reads-CPM) RNA-seq reads onto the SARS-CoV-2 genome track of the UCSC genome browser  using the “bamCoverage” feature of deepTools2 suite .
Identification of SARS-CoV-2 genome variations and variation annotation
We identified the variations within our sequenced SARS-CoV-2 genome using the “Variation Identification” (https://bigd.big.ac.cn/ncov/online/tool/variation) tool of “2019 Novel Coronavirus Resource (2019nCoVR)” portal of China National Center for Bioinformation . We then annotated the variations of the isolated SARS-CoV-2 isolates using the “Variation Annotation” (https://bigd.big.ac.cn/ncov/online/tool/annotation) tool from the same portal . We also gathered the global frequency of every identified variation using this same information portal . Different representations showing the information regarding the variations were produced using the Microsoft Excel program . The impacts of the variations were further characterized utilizing the Ensembl Variant Effect Predictor (VEP) tool .
Analysis of RNA-seq expression data
We analyzed both our RNA-seq and some publicly available RNA-seq data for COVID-19 host transcriptional profile analysis. Publicly available Illumina sequenced RNA-seq raw FastQ reads were extracted from the GEO database (accessions of the data used can be found in Additional file 11) . We have checked the raw sequence quality using FastQC program (v0.11.9)  and found that the "Per base sequence quality", and "Per sequence quality scores" were high over the threshold for all sequences (Additional file 12). The mapping of reads was done with TopHat (tophat v2.1.1 with Bowtie v2.4.1) . Short reads were uniquely aligned allowing at best two mismatches to the human reference genome from (GRCh38) as downloaded from the UCSC database . Sequence matched exactly more than one place with equally quality were discarded to avoid bias . The reads that were not mapped to the genome were utilized to map against the transcriptome (junctions mapping). Ensembl gene model  (version 99, as extracted from UCSC) was used for this process. After mapping, we used the SubRead package featureCount (v2.21)  to calculate absolute read abundance (read count, rc) for each transcript/gene associated to the Ensembl genes.
Differential gene expression analysis
To obtain the differential gene expression profile of our studied nasal samples, we utilized the the RNA-seq data recorded from nasal epithelial cells of 4 different non-asthmatic adult individuals as normal controls (Additional file 11); these cells were taken 7 days before the original infection analysis (GEO accession: GSE97668). For the differential gene expression analysis of COVID-19 affected lungs, we’ve taken the RNA-seq data from the lung biopsy of a deceased COVID-19 patient and the associated controls from the original study (GEO accession: GSE147507); and another set of data from five deceased COVID-19 patient’s (initially all of them were hospitalized) lung autopsy and associated controls from the original study (GEO accession: GSE150316) (Additional file 11). Moreover, for the differential transcriptome documented for the cell lines, we used the RNA-seq data from infected cell lines and associated controls from GEO datasets GSE148729 and GSE147507 (Additional file 11).
For differential expression (DE) analysis, we used DESeq2 (v1.26.0)  with R (v3.6.2; 2019–07-05) that uses a model based on the negative binomial distribution. To avoid false positive, we considered only those transcripts where at least 10 reads are annotated in at least one of the samples used in this study and also applied a minimum Log2 fold change of 0.5 for to be differentially apart from adjusted p-value cut-off of ≤ 0.05 by FDR. To assess the fidelity of the RNA-seq data used in this study and normalization method applied here, we checked the normalized Log2 expression data quality using R/Bioconductor package “arrayQualityMetrics (v3.44.0)” . From these analyses, no outlier was detected in our data by “Distance between arrays”, “Boxplots”, and “MA plots” methods and replicate samples are clustered together (data not shown). We considered the genes upregulated which have a positive Log2 fold change value higher than 0.5, and those with a Log2 fold change value lower than − 0.5 were considered downregulated.
We also performed a multifactorial differential gene expression analysis using the edgeR tool  following the generalized linear model (GLM) experimental design- log2 (lung samples/normal lung control samples)/ log2 (our studied Nasal samples/normal nasal control samples); we used the autopsy samples of COVID-19 patients and associated controls from (GEO accession: GSE150316) as lung sample & controls, and we used our analyzed nasal COVID-19 transcriptomes as nasal samples alongwith the RNA-seq data from (GEO accession: GSE97668) as normal controls.
Construction of phylogenetic tree
We constructed a Neighbour-Joining phylogenetic tree with all available 145 SARS-CoV-2 genomes of Bangladeshi isolates (retrieved on 6th May from GISAID ). Firstly, the genome sequences were aligned using MAFFT  tool using the auto-configuration. Then we used MEGA X  for constructing the phylogenetic tree utilizing 500 bootstrapping with substitution model/method: maximum composite likelihood, uniform rates of variation among sites, the partial deletion of gaps/missing data and site coverage cutoff 95%.
Functional enrichment analysis
We utilized Gitools (v1.8.4) for enrichment analysis and heatmap generation . We have utilized the Gene Ontology Biological Processes (GOBP) , Bioplanet pathways , KEGG pathway , and Reactome pathway  modules for the overrepresentation analysis. Resulting p-values were adjusted for multiple testing using the Benjamin and Hochberg's method of False Discovery Rate (FDR) .
Retrieval of the host proteins that interact with SARS-CoV-2
We have obtained the list of human proteins that form high confidence interactions with SARS-CoV-2 proteins from conducted previously study  and processed their provided protein names into the associated HGNC official gene symbols.
Construction of biological networks
Construction, visualization, and analysis of biological networks with differentially expressed genes, their associated transcription factors, and interacting viral proteins were executed in the Cytoscape software (v3.8.0) . We used the STRING  database to extract the highest confidences (0.9) edges only for the protein–protein interactions to reduce any false positive connection.
Availability of data and materials
SARS-CoV-2 genomes are deposited at GISAID initiatives portal (https://www.gisaid.org/) with accessions- EPI_ISL_450340, EPI_ISL_450341, EPI_ISL_450342, EPI_ISL_450345. Raw RNA-seq data are deposited at NCBI-GEO (https://www.ncbi.nlm.nih.gov/geo/) under the accession- GSM4667504, GSM4667505, GSM4667506, GSM4667507. Additionally, publicly available data were utilized (Additional file 11). Analyses generated data are deposited as Additional files.
Worldometer. Coronavirus Cases. New York: Worldometer ; 2020. p. 1–22.
Lu R, Zhao X, Li J, Niu P, Yang B, Wu H, et al. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet. 2020;395(10224):565–74.
Koh J, Shah SU, Chua PEY, Gui H, Pang J. Epidemiological and clinical characteristics of cases during the early phase of COVID-19 pandemic: a systematic review and meta-analysis. Front Med (Lausanne). 2020;7:295.
Lauer SA, Grantz KH, Bi Q, Jones FK, Zheng Q, Meredith HR, et al. The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Ann Intern Med. 2020;2:58.
Yoshikawa T, Hill TE, Yoshikawa N, Popov VL, Galindo CL, Garner HR, et al. Dynamic innate immune responses of human bronchial epithelial cells to severe acute respiratory syndrome-associated coronavirus infection. PloS ONE. 2010;5(1):e8729.
Xiong Y, Liu Y, Cao L, Wang D, Guo M, Jiang A, et al. Transcriptomic characteristics of bronchoalveolar lavage fluid and peripheral blood mononuclear cells in COVID-19 patients. Emerg Microb Infect. 2020;9(1):761–70.
Hu T, Zhang Y, Li L, Wang K, Chen S, Chen J, et al. Two adjacent mutations on the dimer interface of SARS coronavirus 3C-like protease cause different conformational changes in crystal structure. Virology. 2009;388(2):324–34.
Sungnak W, Huang N, Bécavin C, Berg M, Queen R, Litvinukova M, et al. SARS-CoV-2 entry factors are highly expressed in nasal epithelial cells together with innate immune genes. Nat Med. 2020;26(5):681–7.
Hänel K, Stangler T, Stoldt M, Willbold D. Solution structure of the X4 protein coded by the SARS related coronavirus reveals an immunoglobulin like fold and suggests a binding activity to integrin I domains. J Biomed Sci. 2006;13(3):281–93.
Luzina IG, Todd NW, Nacu N, Lockatell V, Choi J, Hummers LK, et al. Regulation of pulmonary inflammation and fibrosis through expression of integrins αVβ3 and αVβ5 on pulmonary T lymphocytes. Arthritis Rheum. 2009;60(5):1530–9.
Alexiou K, Wilbring M, Matschke K, Dschietzig T. Relaxin protects rat lungs from ischemia-reperfusion injury via inducible NO synthase: role of ERK-1/2, PI3K, and forkhead transcription factor FKHRL1. PloS one. 2013;8(9):e75592-e.
Gurczynski SJ, Moore BB. IL-17 in the lung: the good, the bad, and the ugly. Am J Physiol Lung Cell Mol Physiol. 2018;314(1):L6–16.
Briguglio M, Bona A, Porta M, Dell’Osso B, Pregliasco FE, Banfi G. Disentangling the hypothesis of host dysosmia and SARS-CoV-2: the bait symptom that hides neglected neurophysiological routes. Front Physiol. 2020;11:671.
Raney BJ, Dreszer TR, Barber GP, Clawson H, Fujita PA, Wang T, et al. Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics. 2013;30(7):1003–5.
Huang R, Grishagin I, Wang Y, Zhao T, Greene J, Obenauer JC, et al. The NCATS BioPlanet—an integrated platform for exploring the universe of cellular signaling pathways for toxicology, systems biology, and chemical genomics. Front Pharmacol. 2019;10:445.
We would like to thank all the authors who have kindly deposited and shared genome data on GISAID (https://www.gisaid.org/). We would also want to convey our gratitude to the developers maintaining the NextStrain novel coronavirus data portal (https://nextstrain.org/).
This project was not associated with any internal or external source of funding.
Abul Bashar Mir Md. Khademul
Islam and Md. Abdullah-Al-Kamran Khan contributed equally to this work.
Authors and Affiliations
Department of Genetic Engineering & Biotechnology, University of Dhaka, Dhaka, 1000, Bangladesh
Abul Bashar Mir Md. Khademul Islam
Department of Mathematics and Natural Sciences, BRAC University, Dhaka, Bangladesh
Md. Abdullah-Al-Kamran Khan
Basic and Applied Research On Jute Project, Bangladesh Jute Research Institute, Dhaka, Bangladesh
Rasel Ahmed, Md. Sabbir Hossain, Shah Md. Tamim Kabir & Md. Shahidul Islam
Department of Pathology and Parasitology, Chittagong Veterinary and Animal Sciences University (CVASU), Khulshi, Chittagong, Bangladesh
ABMMKI designed the workflow and conceived the project. AMAMZS performed sample collection and detection. RA, MSH, SMTK, and MSI performed the RNA sequencing and viral genome assembly. ABMMKI and MAAKK performed the RNA-seq data analysis, comparative genomics, and other bioinformatic analyses. MAAKK and ABMMKI wrote the manuscript. ABMMKI and MAAKK contributed equally to this work. All authors read and approved the final manuscript.
Patients’ written-consents were taken before the collection of the samples. An institutional ethical clearance (Ref. No: 100/Biol. Scs., dated: 23/08/2020) was obtained from the Ethical Review Committee of the Faculty of Biological Sciences, University of Dhaka.
Consent for publication
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A. Snapshot of Nextstrain data portal showing the phylogenetic relationship of two SARS-CoV-2 isolates used in this study. Isolates of this study are indicated using a red arrow. B. Phylogenetic tree of Bangladeshi SARS-CoV-2 isolates. Neighbor-joining tree using MEGA tools. Isolates reported in this study are indicated with a red arrow. The evolutionary history was inferred using the Neighbor-Joining method. The optimal tree with the sum of branch length = 0.01403419 is shown. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) are shown next to the branches. The evolutionary distances were computed using the Maximum Composite Likelihood method and are in the units of the number of base substitutions per site. This analysis involved 145 nucleotide sequences. Codon positions included were 1st + 2nd + 3rd + Noncoding. All positions with less than 95% site coverage were eliminated, i.e., fewer than 5% alignment gaps, missing data, and ambiguous bases were allowed at any position (partial deletion option). There was a total of 29827 positions in the final dataset. Values represent bootstrap numbers (%).
Hierarchically clustered heatmap representing the patient-wise complete expression profiles. Normalized Log2 fold changes compared to average normal expression values across the samples are represented in a color coded heatmap, and for one of the four samples only protein coding genes (with Log2 fold change > 0.5) are represented. Pearson correlation distance was utilized for this hierarchical clustering of the genes. B. Deregulated genes of selected terms from Fig. 3 in different SARS-CoV-2 infection systems. Genes of selected significant terms are represented here. For individual processes, blue means presence (differentially expressed gene of the module term) while grey means absence (not differentially expressed in the experimental condition in that module term). Processes in the green, blue, red color background represent KEGG, Bioplanet, GOBP enriched terms, respectively.
Deregulated genes of selected terms from Fig. 3 in different SARS-CoV-2 infection systems. For individual processes, blue means presence (differentially expressed gene of the module term) while grey means absence (not differentially expressed in the experimental condition in that module term). Processes in the green, blue, red color background represent KEGG, Bioplanet, GOBP enriched terms, respectively.
Per base sequence quality reports of the generated RNA-seq reads of the four COVID-19 infected nasal samples used in this study.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Islam, A.B.M.M.K., Khan, M.AAK., Ahmed, R. et al. Transcriptome of nasopharyngeal samples from COVID-19 patients and a comparative analysis with other SARS-CoV-2 infection models reveal disparate host responses against SARS-CoV-2.
J Transl Med19, 32 (2021). https://doi.org/10.1186/s12967-020-02695-0