The EoE TaMMA web app identifies EoE-specific markers
EoE aetiopathogenesis is not fully explained, although a major shift toward an antigen-mediated TH2 response is accepted as its most relevant characteristic [1]. Although some RNA-seq studies have been performed, a complete survey of the available transcriptomics collections to advance EoE-related research has not yet been compiled. For this purpose, we analyzed and batch-corrected a total of 18 different studies, including 660 samples from esophageal mucosa and blood, and combined them into the EoE TaMMA web app (Fig. 1A and B).
We included blood and esophageal tissues from EoE and GERD patients, as well as IBD-derived blood samples. Of note, we also included IBD samples because they helped to better correct the batch variability, a normal consequence of combining different studies that come from a variety of data sources generated by different operators, sequencers, and analytic platforms [9, 10]. The IBD characterization is not shown here but can be fully browsed at the dedicated platform (IBD TaMMA). After batch correction, esophagus- and blood-derived samples appeared as two distinct clusters (Fig. 1C), regardless of the study of origin (Fig. 1D), indicating that the correction approach was effective in rendering samples harmonized and comparable. Differential gene expression (DGE) analysis revealed 533 up-regulated and 504 down-regulated genes in the EoE esophagus by comparison with the control (Fig. 2A).
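To illustrate what removing a study-level offset means, the following is a minimal sketch of a location-only batch adjustment. It is not the correction method used for EoE TaMMA, which this section does not specify; the function name `center_batches` and the toy values are hypothetical. The sketch assumes log-scale expression values and batches with comparable biological composition, because centering a batch that is confounded with tissue or disease status would also remove the biological signal.

```python
from collections import defaultdict


def center_batches(expr, batches):
    """Shift each batch so its per-gene mean equals the overall per-gene mean.

    expr    -- list of samples, each a list of log-expression values (one per gene)
    batches -- list of batch (study) labels, one per sample
    """
    n_genes = len(expr[0])
    grand_mean = [sum(s[g] for s in expr) / len(expr) for g in range(n_genes)]

    # Group sample indices by batch label.
    by_batch = defaultdict(list)
    for idx, label in enumerate(batches):
        by_batch[label].append(idx)

    corrected = [list(s) for s in expr]
    for idxs in by_batch.values():
        for g in range(n_genes):
            batch_mean = sum(expr[i][g] for i in idxs) / len(idxs)
            for i in idxs:
                corrected[i][g] += grand_mean[g] - batch_mean
    return corrected


# Toy data (hypothetical): two studies, two genes; study "B" carries a +2 offset.
expr = [[1.0, 5.0], [3.0, 7.0], [3.0, 7.0], [5.0, 9.0]]
batches = ["A", "A", "B", "B"]
corrected = center_batches(expr, batches)
```

After the adjustment, both toy studies share the same per-gene mean, while the differences between samples within each study are preserved.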
Since the role of TH2 cytokines is key in EoE pathogenesis, we specifically evaluated the expression of interleukin (IL)13, IL4, IL5, and their receptors [26]. According to a previously published EoE single-cell (sc)RNA-seq dataset [21], IL13 and IL5 were broadly expressed by pathogenic effector GATA-3 TH2 cells, which are expanded in EoE tissue biopsies [21] (Additional file 1: Fig. S2A, and Additional file 1: Fig. S3M and 3P). IL4 was expressed exclusively by Treg cells (Additional file 1: Fig. S3D and 3P), while the IL4 receptor was expressed by both the stromal and immune compartments (Additional file 1: Fig. 3E, J, M–O). Additionally, IL5RA was expressed by all myeloid cells, including CLC-expressing eosinophils (Additional file 1: Fig. 3F, N, O).
In EoE TaMMA, IL13 was the only one of these genes confirmed as upregulated in the EoE esophagus as compared to the control, while IL5, IL4, and the IL13, IL4, and IL5 receptors were not significantly modulated, although a trend was observed (Fig. 2B). Additionally, IL13 did not emerge as an EoE-specific trait when compared with GERD-derived samples (Fig. 2B and Additional file 1: Fig. 4A). This evidence is in line with the difficulty of reaching a straightforward diagnosis in patients whose symptoms are shared between EoE and GERD [1].
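The distinction between a significant change and a mere trend can be sketched as follows. This is a minimal illustration, assuming a Benjamini-Hochberg correction and cutoffs of |log2 fold change| ≥ 1 and adjusted p < 0.05; the correction method and thresholds actually used in EoE TaMMA are not stated in this section, and the gene names and values below are hypothetical toy data.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def classify(log2fc, padj, lfc_cutoff=1.0, alpha=0.05):
    """Label a gene as 'up', 'down', or 'ns' (not significant)."""
    if padj < alpha and log2fc >= lfc_cutoff:
        return "up"
    if padj < alpha and log2fc <= -lfc_cutoff:
        return "down"
    return "ns"


# Hypothetical toy results: gene -> (log2 fold change, raw p-value).
results = {
    "gene_a": (2.1, 0.01),
    "gene_b": (1.4, 0.04),
    "gene_c": (-1.8, 0.03),
    "gene_d": (0.2, 0.50),
}
genes = list(results)
padj = benjamini_hochberg([results[g][1] for g in genes])
calls = {g: classify(results[g][0], q) for g, q in zip(genes, padj)}
```

In this toy example, `gene_b` and `gene_c` show sizeable fold changes and raw p-values below 0.05 but lose significance after adjustment, which is the kind of pattern a reader would describe as a trend.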
Nonetheless, we sought to further characterize and confirm EoE-related traits in our TaMMA platform. IL13 is known as a mediator of a series of processes in allergic diseases, such as eosinophil chemotaxis, epithelial (goblet) cell proliferation, collagen deposition, and smooth muscle contractility [26], thus prompting us to evaluate these features in EoE esophagi. Therefore, by gene ontology (GO) analysis, we observed biological processes related to epithelial cell proliferation, smooth muscle cell migration, proliferation, differentiation, extracellular matrix remodeling, and chemotaxis to be modulated in EoE by comparison with the control tissues (Fig. 2C).
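A GO over-representation test of this kind can be sketched with a one-sided hypergeometric test. This is an illustrative assumption: the enrichment method used for Fig. 2C is not described in this section. The background size, term size, and overlap below are hypothetical, while the 1037 selected genes correspond to the 533 up-regulated plus 504 down-regulated genes reported above.

```python
from math import comb


def enrichment_pvalue(n_background, n_in_term, n_selected, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_background -- number of genes tested overall
    n_in_term    -- background genes annotated to the GO term
    n_selected   -- differentially expressed genes
    n_overlap    -- differentially expressed genes annotated to the term
    """
    total = comb(n_background, n_selected)
    upper = min(n_in_term, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20000 background genes, a term with 200 genes,
# 1037 differentially expressed genes, 30 of which belong to the term.
p = enrichment_pvalue(20000, 200, 1037, 30)
```

With these numbers, about 10 overlapping genes would be expected by chance, so an overlap of 30 yields a small p-value. In practice, the p-values of all tested terms would also need a multiple-testing correction.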
Interestingly, we also found these biological signatures to be modulated when EoE tissues were compared with GERD tissues (Fig. 2D), indicating that EoE pathogenesis differs from that of GERD in these respects.
We then evaluated which genes were involved in the alteration of these biological processes and distinguished EoE from GERD in terms of expression levels. One is the already known factor CCL26 (Eotaxin-3), which is expressed by stromal and epithelial cells (Additional file 1: Fig. S3G, 3J and 3L) and is known to regulate eosinophil trafficking to the esophagus in patients with EoE and to discriminate between EoE and GERD [27]. Other markers also emerged, such as CXCL14, PDGFRA, CXCL12, ACVRL1, POSTN, NOX4 and LTBP4 (Fig. 2E). These results provide evidence that a composite panel of EoE-specific markers may be developed to make the diagnosis more accurate.
Furthermore, IL13 is known to induce calpain 14 (CAPN14) expression (Additional file 1: Fig. S3H and 3J), which affects the epithelial barrier through the degradation of the desmosomal protein desmoglein 1 (DSG1) [28]. Our analysis revealed increased CAPN14 expression in the EoE esophagus by comparison with the control, while DSG1 was down-regulated (Fig. 2F, G), supporting the inverse relationship between these two proteins in the epithelial barrier [28].
Although bulk sequencing data are key for understanding the molecular processes in a biological system, their great limitation remains the lack of information on the proportion of cell types within a sample. Nonetheless, in recent decades, approaches such as computational deconvolution based on single-cell RNA-seq data have been developed and optimized to obtain such information starting from whole-tissue expression profiling data [29]. Deconvolution is a time- and cost-efficient approach for obtaining cell type-specific information from bulk gene expression of heterogeneous tissues, providing an estimate of cell-type proportions or abundances in samples.
To this end, we exploited Tabula Sapiens, a multiple-organ, single-cell transcriptomic atlas of human tissues [30]. The analysis performed on the EoE TaMMA data confirmed the increased proportion of T helper cells, described as part of the EoE pathogenic process [31], by comparison with both the healthy and GERD tissues.
Furthermore, considering the role of invariant (i)NKT cells, also known as classical NKT cells, during EoE pathogenesis [32], we verified and confirmed their increased proportion specifically in EoE tissues (Additional file 1: Fig. S5A–F).
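The principle behind the deconvolution described above can be sketched as a non-negative least-squares problem: a bulk profile is modeled as a mixture of cell-type signature profiles, and the mixing weights are the estimated proportions. The example below is a minimal sketch, assuming a simple projected-gradient solver; it is not the deconvolution tool applied to EoE TaMMA, and the signature matrix, cell-type labels, and function name `estimate_proportions` are hypothetical.

```python
def estimate_proportions(signature, bulk, n_iter=2000):
    """Estimate cell-type proportions by projected-gradient non-negative least squares.

    signature -- genes x cell types matrix of reference expression (list of rows)
    bulk      -- bulk expression vector, one value per gene
    """
    n_genes, n_types = len(signature), len(signature[0])
    # The squared Frobenius norm bounds the largest eigenvalue of S^T S,
    # so its inverse is a step size that keeps gradient descent stable.
    step = 1.0 / sum(v * v for row in signature for v in row)
    weights = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(signature[g][t] * weights[t] for t in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        gradient = [
            sum(signature[g][t] * residual[g] for g in range(n_genes))
            for t in range(n_types)
        ]
        # Gradient step followed by projection onto non-negative weights.
        weights = [max(0.0, w - step * d) for w, d in zip(weights, gradient)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical signature: 3 genes x 2 cell types (e.g., T helper, epithelial).
signature = [[10.0, 1.0], [1.0, 10.0], [5.0, 5.0]]
# Bulk profile simulated as a 70% / 30% mixture of the two cell types.
bulk = [7.3, 3.7, 5.0]
proportions = estimate_proportions(signature, bulk)
```

On this noise-free toy mixture the solver recovers proportions close to 0.7 and 0.3. Real data contain noise and cell types missing from the reference, so estimates are best interpreted as relative differences between groups.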
Tissue remodeling through increased collagen deposition, matrix disassembly, and epithelial-to-mesenchymal transition (EMT) leads to the peculiar fibrostenotic aspect of an EoE esophagus [33]. During fibrotic complications, epithelia lose many characteristics, such as polarity, specific markers, and tight junctions, and acquire properties of mesenchymal cells, including motility, loose cell adhesion via N-cadherin, and de-polarized cytoskeletal arrangements such as vimentin [34]. Consistently, we observed an increased proportion of mesenchymal stem cells (MSC) in EoE compared to GERD and healthy samples (Fig. 3A–C), thus confirming the pro-fibrotic status of the esophagus in EoE conditions. This finding may have implications for developing prognostic molecular markers predicting the risk of fibrostenosis in EoE patients.
These data were paralleled by the dysregulation of biological processes related to transforming growth factor beta (TGFB), which were increased in EoE by comparison with both the healthy and the GERD tissues (Fig. 3D and E), with specific markers distinguishing EoE from GERD (Fig. 2E, specifically: WNT2, ACVRL1, POSTN, NOX4, LEFTY2, GDF5, and LTBP4). This supports the notion that tissue remodeling and fibrotic processes are associated with EoE pathogenesis [26].
Overall, these results indicate that the EoE TaMMA web app is a reliable tool that captures the main hallmarks of EoE, which often differ from those of GERD. It is therefore a powerful asset for expediting research, offering novel insights into both pathogenesis and approaches for a more accurate diagnosis of EoE.
EoE TaMMA reveals microbiota dysbiosis as a predominant characteristic during EoE pathogenesis
As mentioned above, EoE TaMMA provides a wide picture of the omics profiling of EoE tissues, not only confirming the already known molecular landscape associated with EoE but also pointing out new insights for further investigation of its complex pathogenesis. For instance, among all the markers identified as specifically characterizing EoE (Fig. 2 and Additional file 1: Fig. S1), CXCL14, a chemoattractant chemokine expressed by the epithelium, stromal cells, and monocytes (Additional file 1: Fig. S3I, 3J, 3L, and 3R), gained our attention because of its documented antimicrobial activity against pathogens [35] and its higher level in EoE compared with both control and GERD, suggesting possible EoE-specific microbial signatures distinct from those of GERD.
Thus, going deeper into the microbiota profiling, EoE TaMMA identified bacterial species as the most differentially abundant microbial entities among EoE, GERD, and control esophageal tissues (Fig. 4A and B and Additional file 1: Fig. S6A–6F).
We then intersected the bacterial species that were highly abundant in EoE by comparison with either healthy or GERD tissues and identified 9 candidates specifically characterizing the EoE esophagus (Additional file 1: Fig. S6G). The most abundant were Streptococcus mitis and Haemophilus parainfluenzae (Fig. 4C), which normally colonize the oropharyngeal tract and have already been reported as associated with EoE pathogenesis [36]. Moreover, increased bacterial diversity but no species dominance was found in EoE esophagi as compared to controls (Fig. 4D), although previous studies reported no differences between these experimental groups [7, 36, 37, 38]. Such a discrepancy might be explained by the small sample sizes of the former studies and the consequent lower statistical power, which might have contributed to the loss of significant signals that, by contrast, our analysis pointed out.
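The difference between diversity and dominance can be made concrete with two common indices, the Shannon diversity index and the Simpson dominance index. This is a minimal sketch: the specific indices used for Fig. 4D are not named in this section, so their choice here is an assumption, and the species counts are hypothetical.

```python
from math import log


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i); higher means more diverse."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def simpson_dominance(counts):
    """Simpson index D = sum(p_i ** 2); higher means a few species dominate."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


# Hypothetical read counts for four species in two samples.
even_sample = [25, 25, 25, 25]    # evenly spread community
skewed_sample = [97, 1, 1, 1]     # one dominant species

h_even, d_even = shannon_diversity(even_sample), simpson_dominance(even_sample)
h_skew, d_skew = shannon_diversity(skewed_sample), simpson_dominance(skewed_sample)
```

The evenly spread sample reaches the maximum Shannon value for four species (ln 4) with low dominance, whereas the skewed sample shows low diversity and high dominance. A group can therefore gain diversity without any single species becoming dominant.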
This is an example of how this computational approach may be exploited to highlight specific signatures. Nevertheless, dissecting each omic at a time would be a huge effort, and many statistically relevant details could be lost.
Multi-omics approaches, often supported by machine-learning algorithms [39], are facilitating the discovery of new molecular networks and hubs by comprehensively and simultaneously analyzing different data layers, such as the human transcriptome and metatranscriptome [40]. They can also help identify the origin of patient heterogeneity, ultimately stratifying patients based on their molecular characteristics. Indeed, this methodological approach mitigates intersubject variability by uncovering the principal sources of variation in multi-omics data sets. In this regard, such an analysis recently became feasible thanks to the machine learning-based tool Multi-Omics Factor Analysis (MOFA). MOFA infers a set of (hidden) factors that capture biological and technical sources of variability [25].
Therefore, we applied MOFA to process the six different types of omics data, encompassing the human transcriptome, virome, eukaryome (fungi and protists), bacteriome, and archaeome from EoE and healthy control samples (Additional file 1: Fig. S7A). The source of variation between the EoE and healthy (control) esophageal mucosa was identified mainly among the metatranscriptomics (microbiome) factors. Specifically, a subset of archaea, fungi, protozoa, and viral species and, to a lesser extent, some human transcripts allowed the development of four multi-layer molecular signatures able to distinguish EoE patients from controls (Fig. 5A), indicating that microbial dysbiosis may be a key player in EoE pathogenesis.
Going deeper into the analysis, the primary source of variance between EoE and control was found at the level of factor 1, mainly in the esophageal virome and archaeome composition and in the blood bacteriome and mycome (fungi, Fig. 5B–F). The top features within factor 1, showing a high impact (weight) in explaining the variance, were Staphylococcus virus Andhra (Fig. 5C and C′), Sulfodiicoccus acidophilus and Nitrosopumilus sp. K4 (Fig. 5D and D′), Staphylococcus aureus and Pasteurella multocida (Fig. 5E and E′), and Malassezia restricta (Fig. 5F and F′). Besides the factor 1-driven stratum of patients, factor 2 defined another subset of subjects with EoE in whom Plasmodium knowlesi explained the majority of the variance in the protozoa profiling of the blood (Additional file 1: Fig. S7B and 7B′), while factor 3 was characterized by Proteus virus Isfahan, which explained most of the variance in this stratum of patients (Additional file 1: Fig. S7C and 7C′).
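The notion of a factor explaining variance in one omics layer but not in another can be sketched as the R² of a rank-one reconstruction, where each feature is approximated by the factor value of a sample multiplied by a feature weight. This is a simplified illustration rather than MOFA itself: MOFA learns factors and weights jointly across all views, whereas here the factor is given and the weights are fitted by ordinary least squares. The function names, view labels, and values are hypothetical, and each feature is assumed to be centered.

```python
def fit_weights(view, factor):
    """Least-squares weight of each feature on a single factor.

    view   -- samples x features matrix (list of rows), features centered
    factor -- one factor value per sample
    """
    denom = sum(z * z for z in factor)
    n_features = len(view[0])
    return [
        sum(factor[i] * view[i][j] for i in range(len(view))) / denom
        for j in range(n_features)
    ]


def variance_explained(view, factor, weights):
    """R^2 of the reconstruction view[i][j] ~ factor[i] * weights[j]."""
    residual = sum(
        (view[i][j] - factor[i] * weights[j]) ** 2
        for i in range(len(view))
        for j in range(len(weights))
    )
    total = sum(v * v for row in view for v in row)
    return 1.0 - residual / total


# Hypothetical factor values for four samples.
factor = [-1.0, 1.0, -1.0, 1.0]
# view_a follows the factor exactly; view_b varies independently of it.
view_a = [[-2.0, -0.5], [2.0, 0.5], [-2.0, -0.5], [2.0, 0.5]]
view_b = [[1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, 1.0]]

r2_a = variance_explained(view_a, factor, fit_weights(view_a, factor))
r2_b = variance_explained(view_b, factor, fit_weights(view_b, factor))
```

Here the factor explains all of the variance in `view_a` and none in `view_b`, and the features with the largest absolute weights are the ones that drive the factor within a view.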
Of note, since Staphylococcus virus Andhra parasitizes Staphylococci, we checked the levels of these bacteria, but no differences were found in the esophagi (Fig. 6A).
Proteus species (i.e., Proteus mirabilis) are known to be parasitized by Proteus virus Isfahan [41]. Thus, we wondered whether the levels of some Proteus species could change according to the abundance of Proteus virus Isfahan. Interestingly, Proteus vulgaris was highly abundant in EoE blood by comparison with the control, while no differences were found in the esophagus (Fig. 6B).
Based on these pieces of evidence, we can speculate that the presence of defined classes of microbial entities in specific subsets of patients may participate in inducing the antigen-mediated response typical of EoE pathogenesis.