Table 1 List of converging functional themes identified by different LLMs for a co-expression gene set (M9.2), from the BloodGen3 repertoire associated with circulating erythroid cells: ALAS2 BCL2L1 BPGM C14ORF45 C1ORF128 CA1 EPB42 FAM46C FECH GMPR GPR146 GYPB GYPE IFITL1 KRT1 MARCH8 NFIX OR2W3 OSBP2 PLEK2 SELENBP1 SLC14A1 SLC4A1 SNCA TMCC2 TMOD1 TNS1 TRIM10 TUBB2A XK

From: Harnessing large language models (LLMs) for candidate gene prioritization and selection

| Model | Functional Themes | Associated genes | LLMs’ Comments |
| --- | --- | --- | --- |
| GPT3.5 | Hemoglobin synthesis and red blood cell function | ALAS2, CA1, EPB42, FECH, GYPB, GYPE, SLC4A1 | These genes are involved in processes related to hemoglobin synthesis, red blood cell membrane stability, and carbon dioxide transport |
| | Enzymes and metabolic processes | BPGM, FECH, GMPR | These genes encode enzymes involved in glycolysis, heme biosynthesis, and purine nucleotide biosynthesis, respectively |
| | Cytoskeletal proteins | KRT1, TUBB2A | These genes encode proteins involved in maintaining the structure and function of the cytoskeleton |
| GPT-4 | Erythrocyte development and function | ALAS2, BPGM, EPB42, GYPB, GYPE, SLC4A1, XK, FECH | These genes are all involved in processes related to erythrocyte (red blood cell) development, function, or associated conditions. For example, ALAS2 and FECH are involved in heme biosynthesis, a crucial component of hemoglobin. BPGM, EPB42, SLC4A1, GYPB, GYPE, and XK are directly involved in erythrocyte structure or function, and mutations in these genes can lead to conditions like hereditary spherocytosis, elliptocytosis, or blood group antigen variations |
| | Cytoskeleton and cellular structure | TNS1, TMOD1, TUBB2A | These genes are all involved in cytoskeleton formation and function, which is essential for cell shape, division, and movement |
| | Ubiquitin–proteasome system | MARCH8, TRIM10 | These genes are involved in the ubiquitin–proteasome system, which is essential for protein degradation, cell cycle regulation, and other crucial cellular functions |
| Claude | Erythrocyte biology or iron/heme metabolism | ALAS2, FECH, SLC4A1 | ALAS2 and FECH are both involved in porphyrin/heme biosynthesis. SLC4A1 encodes a major anion exchanger in erythrocytes |
| | Cytoskeletal or cytoskeleton-associated genes | EPB42, KRT1, TMOD1, TNS1, TUBB2A | These genes encode cytoskeletal proteins like keratins, tubulins, actins |
| | Protein binding, degradation or trafficking | MARCH8, TRIM10 | MARCH8 is an E3 ubiquitin ligase involved in protein degradation. TRIM10 is a RING-finger E3 ubiquitin ligase as well that regulates trafficking of GLUT4 glucose transporters |
| | Interferon response | IFITL1, SELENBP1 | IFITL1 and SELENBP1 are both interferon-stimulated genes, though they have different functions |
| | Phospholipid metabolism or vesicle trafficking | OSBP2, PLEK2 | OSBP2 encodes an oxysterol binding protein that regulates lipid metabolism. PLEK2 is involved in regulating vesicle transport |
| | GPCR biology | GPR146, SNCA | GPR146 is an orphan GPCR. SNCA encodes alpha-synuclein, which is involved in dopamine neurotransmission and signaling |
| Bard | Production of Heme | ALAS2, FECH | ALAS2 and FECH are both involved in the production of heme, which is an essential molecule for many cellular processes |
| | Glutathione synthesis | BPGM, GYPB | BPGM and GYPB are both involved in the synthesis of glutathione, which is an antioxidant that protects cells from damage |
| | Cell signaling | GPR146, OR2W3 | GPR146 and OR2W3 are both G protein-coupled receptors that bind to specific ligands and signal to the cell |
| | Solute carrier genes | SLC14A1, SLC4A1 | SLC14A1 and SLC4A1 are both solute carrier genes that transport essential molecules across cell membranes |
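
The convergence across models can be made concrete by counting how many of the four LLMs assigned each gene to at least one theme. The following is a minimal sketch, not the procedure used in the study: the gene assignments are transcribed from Table 1, while the dictionary layout and the helper name `models_per_gene` are illustrative assumptions.

```python
from collections import defaultdict

# Theme-to-gene assignments transcribed from Table 1.
# The nested-dict layout is an assumption made for this illustration.
assignments = {
    "GPT3.5": {
        "Hemoglobin synthesis and red blood cell function":
            ["ALAS2", "CA1", "EPB42", "FECH", "GYPB", "GYPE", "SLC4A1"],
        "Enzymes and metabolic processes": ["BPGM", "FECH", "GMPR"],
        "Cytoskeletal proteins": ["KRT1", "TUBB2A"],
    },
    "GPT-4": {
        "Erythrocyte development and function":
            ["ALAS2", "BPGM", "EPB42", "GYPB", "GYPE", "SLC4A1", "XK", "FECH"],
        "Cytoskeleton and cellular structure": ["TNS1", "TMOD1", "TUBB2A"],
        "Ubiquitin-proteasome system": ["MARCH8", "TRIM10"],
    },
    "Claude": {
        "Erythrocyte biology or iron/heme metabolism": ["ALAS2", "FECH", "SLC4A1"],
        "Cytoskeletal or cytoskeleton-associated genes":
            ["EPB42", "KRT1", "TMOD1", "TNS1", "TUBB2A"],
        "Protein binding, degradation or trafficking": ["MARCH8", "TRIM10"],
        "Interferon response": ["IFITL1", "SELENBP1"],
        "Phospholipid metabolism or vesicle trafficking": ["OSBP2", "PLEK2"],
        "GPCR biology": ["GPR146", "SNCA"],
    },
    "Bard": {
        "Production of Heme": ["ALAS2", "FECH"],
        "Glutathione synthesis": ["BPGM", "GYPB"],
        "Cell signaling": ["GPR146", "OR2W3"],
        "Solute carrier genes": ["SLC14A1", "SLC4A1"],
    },
}


def models_per_gene(assignments):
    """Map each gene to the set of models that placed it in any theme."""
    support = defaultdict(set)
    for model, themes in assignments.items():
        for genes in themes.values():
            for gene in genes:
                support[gene].add(model)
    return support


support = models_per_gene(assignments)

# Genes that every model assigned to at least one theme.
all_models = sorted(g for g, m in support.items() if len(m) == len(assignments))
print(all_models)
```

On the data in Table 1, only ALAS2, FECH, and SLC4A1 are assigned to a theme by all four models, consistent with heme biosynthesis and erythrocyte function being the themes on which the models converge.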