Fig. 1 | Journal of Translational Medicine

From: Harnessing large language models (LLMs) for candidate gene prioritization and selection

Fig. 1

Schematic overview of the targeted panel development strategy. This figure presents our novel workflow for candidate gene prioritization (C) within a broader omics data-driven strategy for developing targeted "transcriptome fingerprinting assays" (TFAs). The first component is the data-driven construction of a collection of co-expressed blood transcriptional modules (A). This "fixed transcriptional repertoire" provides a framework for data analysis and interpretation that remains stable over time. The BloodGen3 repertoire consists of 382 modules, grouped into 38 aggregates and representing 14,168 transcripts. It was constructed and characterized as described in the Methods and in a prior publication [17]. Applying BloodGen3 across multiple studies provided insight into the potential biological and clinical relevance of its modular signatures (B). One signature, corresponding to module aggregate A37, was associated with circulating erythroid cells, vaccine responses, and the severity of respiratory viral infection [15, 16], which led to its prioritization for inclusion in a generic Immune Profiling TFA panel (ImmP-TFA). Modules within this aggregate were selected to pilot the novel workflow for prioritizing candidate gene pools (C). In doing so, we investigated the versatility of LLMs across a range of tasks, from scoring candidate genes to selecting top candidates for more comprehensive characterization in a separate workflow (D) [30, 31].