Skip to main content

Identification of therapeutic targets in osteoarthritis by combining heterogeneous transcriptional datasets, drug-induced expression profiles, and known drug-target interactions



Osteoarthritis (OA) is a multifactorial, hypertrophic, and degenerative condition involving the whole joint and affecting a high percentage of middle-aged people. It is due to a combination of factors, although the pivotal mechanisms underlying the disease are still obscure. Moreover, current treatments are still poorly effective, and patients experience a painful and degenerative disease course.


We used an integrative approach that led us to extract a consensus signature from a meta-analysis of three different OA cohorts. We performed a network-based drug prioritization to detect the most relevant drugs targeting these genes and validated in vitro the most promising candidates. We also proposed a risk score based on a minimal set of genes to predict the OA clinical stage from RNA-Seq data.


We derived a consensus signature of 44 genes that we validated on an independent dataset. Using network analysis, we identified Resveratrol, Tenoxicam, Benzbromarone, Pirinixic Acid, and Mesalazine as putative drugs of interest for therapeutics in OA for anti-inflammatory properties. We also derived a list of seven gene-targets validated with functional RT-qPCR assays, confirming the in silico predictions. Finally, we identified a predictive subset of genes composed of DNER, TNFSF11, THBS3, LOXL3, TSPAN2, DYSF, ASPN and HTRA1 to compute the patient’s risk score. We validated this risk score on an independent dataset with a high AUC (0.875) and compared it with the same approach computed using the entire consensus signature (AUC 0.922).


The consensus signature highlights crucial mechanisms for disease progression. Moreover, these genes were associated with several candidate drugs that could represent potential innovative therapeutics. Furthermore, the patient’s risk scores can be used in clinical settings.


Osteoarthritis (OA) is the predominant joint disease in middle-aged and older people, and the risk of being affected dramatically increases with aging [1,2,3,4]. Often labeled “wear and tear” disorder, it is defined as a hypertrophic, complex, and multifactorial condition affecting the whole joint (cartilage, subchondral bone, synovial membrane and fluid, ligaments, adjacent joint muscles), characterized by stiffness, resulting in reduced function, pain on movements and crepitus [2, 4]: the most affected sites are knees, hips, hands, and spine likewise multiple joints are commonly involved [4]. Overall, recent studies show that 18% of women and 9.6% of men over 60 have symptomatic OA [2], where most people over 75 show radiographic evidence of the disease [3]. Obesity, female gender, prior joint trauma, performing specific activities, and acquired or congenital anatomic abnormalities represent additional factors of risk. However, the causes of the onset of the disease are complex and multifactorial, encompassing biochemical, environmental, genetic, and systemic factors [2]. Evidence of a genetic influence comes from several sources, including epidemiological studies of family history and exploration of rare genetic disorders related to OA, like chondrodysplasias: alterations in genes such as VDR, COL2A, AGC1, IGF-1 are supposed to be involved in the hereditability component of the disorder, which is overall estimated to be greater than \(50\%\) [5]. The impact of the environment and lifestyle can also be determinant. Indeed, diet, alcohol consumption, and smoking can also have a role, but additional studies are needed to assess the actual influence of these factors on the OA outbreak [2]. People affected have a painful and degenerative course in most cases. Currently, available treatments aim for pain control and improved quality of life, still not being entirely adequate [6]. This scenario forces the patient to experience an extremely disabling condition, both physically and psychologically. Indeed, OA represents a pathology with a high social impact, also burdening the healthcare system [1].

Research in the field of OA focuses on different topics: on the one hand, it is crucial to understand the fundamental mechanisms underlying the onset and evolution of OA. On the other hand, the research on new therapeutics and strategies for effective patient treatments remain urgent. To date, the use of different technologies, encompassing the exploration of various omics, has made it possible to shed some light on the pivotal processes involved. It has been established that inflammatory signaling and extracellular matrix (ECM) remodeling are the predominant activated mechanisms [1].

Numerous studies have demonstrated the presence of a considerable number of Single Nucleotide Polymorphisms (SNPs) associated with the onset of OA pathology, particularly in non-coding regions [7]: this represents evidence of the decisive part played by transcription dysregulation in the establishment and development of the OA phenotype, where modifications of the methylation pattern and chromatin accessibility of these specific loci are determinant [8, 9]. The role of non-coding transcripts is also relevant. Several studies confirmed miRNAs as modulators of cartilage homeostasis, down-regulated in the affected patients. Some of these, like miR-204, miR-211 miR-335-5, and miR-93 activate the autophagic process and inhibit the inflammatory response, interfering with the TNF-\(\alpha\) and interleukins mediated pathways [10,11,12,13]. Across the biological programs potentially altered in the OA patient, TGF\(\beta\) and WNT signaling are crucial: the TGF\(\beta\)-FOXO1 axis activated by TAK1 and the TGF\(\beta\)-pSMAD2/3-FBXO6 axis are involved in the development and homeostasis of healthy joint cartilage, regulating autophagy and ubiquitination [14,15,16] whereas hyperactivation of the WNT program has been associated with chondrocyte hypertrophy and OA [16, 17]. Regarding clinical research, in recent years, scientists have focused on regenerative therapy, where various studies report how the use of stem cells (both deriving from healthy tissue and mesenchyma) can bring the joint in the initial stage of OA to a good level of regeneration [18,19,20,21].

Despite the achievements and results, it has not been clarified yet which genes involved in the maintenance and progression of OA are the master regulators of a dysfunctional and degenerative phenotype and how they can be regulated using available drugs [22]. This gap can be due to the variability introduced by the various experimental strategies and the specificities of the different cohorts analyzed. Our study, using a meta-analysis of three available cohorts, identified a “consensus” signature of 44 differentially expressed genes in OA that recapitulate the main OA patient’s transcriptional profile. Based on such a signature, we performed two types of analysis: disease–centric and patient–centric. In the disease–centric approach, we built a Protein–Protein Interaction (PPI) network encompassing both consensus genes and relevant drug targets, revealing interesting drug-genes interactions and suggesting potential novel treatments. Based on the network, we also selected a list of genes for in vitro validation on human chondrocytes. The patient–centric strategy aimed to define a risk score using the transcriptional data to estimate the severity of the OA disease of each patient. Indeed, for this purpose, we identified a more incisive sub-signature of eight genes to predict the OA status effectively and rapidly, suggesting its potential use in clinical applications. Our results indicate that the selected genes retain the main features of the OA transcriptional profile. Moreover, further studies on the prioritized drugs can also help the clinical research for new therapeutics.

Materials and methods

Figure 1 illustrates the workflow proposed in this work. Starting from a meta-analysis of RNA-seq datasets from three different cohorts, we set up two pipelines for studying the profile of the OA patients from two points of view: a disease–centric approach, in which we prioritized potential drug targets among the differentially expressed genes, and a patient–centric strategy, aimed to propose a risk score associated with the severity of the disease based on the expression of few relevant genes. We confirmed the biological findings using an independent dataset and experimentally validated the expression of 7 of the 44 genes in vitro on human OA chondrocytes.

As a preliminary step, we collected data from the three cohorts in Table 1 (Dataset 1–3), independently pre-processed, and identified differentially expressed genes (DE) associated with OA. In the meta-analysis phase, we integrated the three lists of genes from the DE analysis to return a consensus signature of genes able to capture biological information about the OA transcriptomic profile. For reproducibility purposes, we have created a repository accessible at The repository contains code related to all the pre-processing steps (enriched by further exploratory analyses), the consensus signature extraction, enrichment analyses, and the patient-centric approach workflow. The user will find a dedicated vignette and related useful files for each section to support the analyses. We provided in the repository an R markdown and a Python notebook with the related code to reproduce the disease-centric approach. Further, we added details about the software for the network construction.

Fig. 1
figure 1

Pipeline workflow. The workflow is summarized in this figure

Table 1 Datasets used in this study

Single-cell RNA-Seq cohort (Dataset 1) data collection and pre-processing

We downloaded the raw gene expression count matrix, genes, and barcodes from Gene Expression Omnibus (GEO), id: GSE152805 [23]. This cohort includes data from three patients (one male and two females) suffering from knee OA, with an average age of 67.7 years. These patients had been previously extracted by chance from a cohort of 22 patients subjected to total knee replacement for Medial Compartment OA [23]. For this work, we considered only the transcriptional profiles of 26,228 chondrocytes. We grouped them into two conditions according to the available annotations: (i) derived from damaged tissue (medial region of the tibia, MT, 11,603 cells) or (ii) from minimally damaged tissue (outer lateral region of the tibia, OLT, 14,625 cells). After performing quality control steps using the standard Seurat pipeline, we retained 23,752 cells [27,28,29,30]. After that, we normalized and integrated the data to remove the potential batch effects by using the default procedure from the Seurat R package. For this aim, we extracted 8,183 features to use as anchors for the integration by using the SelectIntegrationFeatures and FindIntegrationAnchors functions, and we applied the IntegrateData function. Then, we applied scaling and dimension reduction. We used the pseudo-bulk approach in the R package muscat [31] with the raw RNA assay to identify DE genes between MT and OLT conditions; in particular, we aggregated the data for each patient using the aggregateData function, with “sum” as the summary aggregation parameter. We created a DESeq2 object and used the patient identifier to deal with the patient effect and perform a DE analysis [32]. We adjusted the p-value using the BH correction (FDR) [33] and defined DE those genes with \(\log _2\) Fold Change (FC) less than − 1.5 or greater than 1.5 and adjusted p-value \(\le 0.05\). Detailed information about the computational procedure used to analyze this cohort is available at The user could find additional explorative analyses here.

Bulk RNA-Seq data collection and pre-processing

We collected bulk RNA-Seq data from two distinct cohorts. The first consists of a paired cohort (i.e., with healthy and OA samples from the same individual), and the second includes healthy and OA samples from independent individuals. Additional details for each dataset are provided in the following subsections. For more details about the procedure employed to analyze these cohorts, see the vignette (, where one can access other explorative analyses (i.e., PCAs, Volcano plots, enrichment analyses) done on DE genes for both the cohorts.

Paired bulk RNA-Seq cohort (Dataset 2)

We downloaded the aligned (reference genome hg19) bam files from the European Genome-phenome Archive (EGA), id: EGAD00001001331 [24]. We quantified the gene expression raw counts using the featureCounts function from the R package Rsubread [34] using as annotation the hg19.ensGene.Chr.gtf.gz file downloaded from the UCSC genome browser.Footnote 1 This cohort profiled paired samples (healthy and damaged cartilage/endochondral bone) from 12 patients (2 females and 10 males) affected by hip and knee OA and underwent total joint replacement. According to the study by Steinberg and colleagues, these patients did not have any malignancies, infections, or knee injuries in the previous 5 years. Additionally, they had not used glucocorticoids in the last 6 months [24]. We filtered out low-expressed genes from the count matrix, retaining only genes with a minimum of 15 counts for at least 50% of the patients. Then, we retrieved the gene symbols using GenomicFeatures [35] and EnsDb.Hsapiens.v75 [36] as an annotation. To deal with gene symbol duplicates, we used the mean counts across the detected isoforms as gene counts for each sample. We accounted for the patient effect, including the factor variable Donor in the design matrix, and then we applied the DESeq2 approach to perform a DE analysis. We corrected the p-value for multiple tests [33] and declared DE those genes with \(\log _2FC \le -0.8\) or \(\log _2FC \ge 0.8\) and FDR \(\le 0.05\).

Unpaired bulk RNA-Seq cohort (Dataset 3)

We downloaded the raw-counts data related to cartilage tissues from the folder “data” in the GitHub repositoryFootnote 2 associated with the study of Soul and colleagues [25]. We used the txi.RData and patientDetails_all_withMed.csv files, i.e., the RNA-Seq data pre-processed with tximport R package [37] and associated patients’ meta-data. This cohort includes 60 Knee OA-affected patients (27 females and 33 males) and 10 non-OA-affected patients (9 males and 1 female) with peripheral vascular disease and no history or clinical sign of OA, joint disease, or joint trauma [25], with an overall average age of 70.7 years (55–86). All patients were subjected to total knee replacement and were diagnosed with predominant Medial compartment OA; as specified by Soul et al., affected samples were isolated from the posterior lateral condyle (PLC), whereas non-OA aliquots were extracted from amputation above the knee [25]. We extracted the integer counts by using the DESeqDataSetFromTximport function, then we filtered out the low-expressed genes, retaining those with a minimum of 5 counts in at least 50% of the samples. We retrieved the gene symbols using EnsDb.Hsapiens.v79 [38] as an annotation. We processed all the duplicated genes’ isoforms as done for Dataset 2. We removed the known batch effect by adding the Batches variable to the design matrix and used DESeq2 to apply a DE analysis; we corrected the p-values [33] and selected the DE genes using the same criteria as in Dataset 1, and then we removed 5 genes for which DESeq2 did not provided a p-value. For the patient-centric pipeline, we retrieved transcript per million (TPM) values provided [25]. For more details on all the pre-processing steps on TPMs, see the “Data pre-processing to perform the patient-centric strategy” paragraph.

Validation cohort data collection and pre-processing (Dataset 4)

As validation dataset, we downloaded the raw-counts as two spreadsheet files directly from GEO, id: GSE114007. This dataset, described by Fisch et al. [26], comprises knee cartilage tissues from 18 healthy samples (5 females and 13 males with an average age of 38, with no history of knee injuries or joint diseases) and 20 OA-affected patients (12 females, 8 males, average age of 66). In particular, normal samples were collected from tissue banks, where they were processed from 24 h to 48 h post-mortem. Affected aliquots were harvested during the knee replacement surgery procedure [26].

For this dataset, we collected the complete list of genes from the DE analysis (OA vs Normal) from the supplementary file NIHMS992829−supplement−Suppl_matl.pdf [26] considering DE those genes with \(\log _2FC \le -1.5\) or \(\log _2FC \ge 1.5\) and FDR \(\le 0.05.\) This list was crucial to validate the consensus signature we obtained (see next section) and also evaluate the robustness of the model implemented in the patient-centric approach (see “Models set-up for risk score computation” section).

Indeed, we used these data to compute a Venn plot between consensus signature and DE genes from this cohort by using the ggvenn function of the homonymous R package [39] and also to perform enrichment analysis (see “Enrichment analyses” section for details). Moreover, we validated the significance of the Venn plot results by the hypergeometric test [40]. For this dataset we retained genes with no less than 5 counts in at least the 50% of the samples, normalized for variance using the vst function from DESeq2 [32] and then batch-corrected these data for the condition variable using the removeBatchEffect function from the Limma R package [41].

To assess the validity of the consensus signature, we applied unsupervised clustering on the z-score transformed normalized data of the consensus genes available for this cohort. We plotted the heatmap using the heatmap3 R package [42], maintaining the default clustering parameters (distance calculation method = “euclidian” and hierarchical clustering method = “complete”). To estimate the validity of our risk score in this dataset, we also derived the TPM values using the raw-count data (see “Data pre-processing to perform the patient-centric strategy” section). More detailed information about these procedures and other explorative analyses are available in the vignette (

Meta-analysis and identification of a consensus signature

We performed a meta-analysis of the three cohorts (Dataset 1, Dataset 2, Dataset 3) by combining the p-values of the respective gene lists returned from the DE analysis using the Fisher approach [43]. Before combining the p-values, we set to 0 the \(\log _2FC\) of all the genes with a slight variation (i.e., \(-0.1 \le \log _2FC \le 0.1\)). We replaced with 0 the NA \(\log _2FCs\) of the genes missing in one of the three cohorts. This was done to compute a weighted combined median \(\log _2FC\) that would be smaller in absolute value than the same calculated by just averaging the only two \(\log _2FC\) values available. By doing this, we accounted for the absence of a gene in one of the datasets. We flagged those genes with a difference in sign on the \(\log _2FC\) across the case studies, and we removed those with no change for all three cohorts. We used Fisher’s combined probability test to combine the (not adjusted) p-values of all genes measured in at least two of the three datasets.

We set the combined p-value to 1 for those genes showing discordant signs of \(\log _2FC\) across the cohorts. Then, we assigned a combined \(\log _2FC\) value as the median of the available \(\log _2FC\) values from the respective DE analyses. Finally, we corrected for multiplicity using the BH approach [33] and obtained adjusted combined p-values. Genes with an adjusted combined p-value (FDR) \(\le 0.01\) and a median combined \(\log _2FC \le -1.5\) or \(\log _2FC \ge 1.5\) are selected as part of the consensus signature (and declared as down and up-regulated genes in OA samples, respectively).

We added a dedicated vignette showing in detail the code used for these analyses (see “Materials and methods” section).

Enrichment analyses

To understand the molecular mechanisms leading to OA, we used the Gene Set Enrichment Analysis (GSEA) approach [44] to retrieve the GO:BP (Gene Ontology–Biological Processes) terms and REACTOME pathways enriched for each of the three DE lists. To identify the biological programs enriched by the up-regulated consensus genes, we applied an over-representation analysis (ORA) using the Hypergeometric test [40]. We set a value of 0.15 as the FDR cutoff for GSEA and 0.05 for the ORA [33]. We used Enrichplot [45], msigdbr [46], and clusterProfiler [47] to perform these analyses.

We assessed the enrichment of the consensus signature in the Validation cohort by obtaining the Running (Enrichment Score) plot (FDR \(\le 0.01\)). We first performed a GSEA analysis of the Validation DE genes using the consensus signature as the “gene set”, then we computed the Running plot using the gseaplot2 function by clusterProfiler.

Disease-centric approach: drug prioritization

To perform drug prioritization, we scored drugs based on their ability to induce an opposite transcriptional response compared to that induced by the disease. This was achieved through a pathway-based approach, which enabled us to abstract the analysis from individual genes to molecular activities while also working around the problem of matching the gene expressions measured by different platforms. In particular, we adapted previously published related approaches that have been used to identify drugs inducing a desired transcriptional effect in a pathway-centric space instead of a gene-centric space. The performance of the pathway-centric approach has been previously assessed in diverse contexts, for example, in the identification of small molecules increasing the efficiency of stem cell differentiation protocols [48] or inducing partial rescue of mutated proteins in a cellular model of cystic fibrosis [49]. In further detail, this approach applies a GSEA-like algorithm to evaluate the enrichment of a large collection of pathways against a given gene expression profile. Once the enrichment scores are computed, they are used in place of the gene expression values to form a pathway-based expression profile (PEP). Techniques similar to those applied to gene-centric profiles can then be applied to the PEPs, including the GSEA, which can be renamed Pathway Set Enrichment Analysis (PSEA) in this context. The PSEA evaluates how much a set of pathways tends to fall towards the top or the bottom of a PEP. In this work, we adapted the PSEA methodology [49] to account for two sets of pathways simultaneously. We selected all the pathways from the GO:MF (Gene Ontology–Molecular Function) category involving any of the genes in the consensus signature. In particular, we performed this selection separately for the up-regulated (“up set”) and down-regulated (“down set”) genes. Next, using the Gep2Pep R package [50], we scored the drug-induced gene expression profiles from the Connectivity Map (CMap), applying PSEA separately against the up and down sets. We then ranked all the CMap drugs according to their PSEA scores so that the top drugs had positive (negative) enrichment values for the pathways involving down- (up-) regulated genes. Finally, we obtained a single score for each drug by computing the average of such ranks.

Network analysis

To elucidate the potential mechanisms of action of interest characterizing the top prioritized drugs, we connected the molecular context with the consensus signature. For this purpose, we extracted drug target information from the Therapeutic Target Database [51]. Next, we used the STRING [52] PPI network to find paths connecting each gene in the signature to each known gene target of the top 50 drugs. To have more reliable protein linkages, we considered only interactions whose combined score was greater than 500 in a range of values [0, 1000] representing the 9.6% of the full proteins interactions. Starting with the gene targets of the prioritized drugs and with the genes of the consensus signature, we find the intersections between these two lists and the genes in the PPI. The results of this intersection were plotted using Gephi (ver 0.10.1),Footnote 3a software for graph visualization. The codes for the drug prioritization and the construction of the network are available at the repository

Selection of the consensus genes for validation in vitro experiments

Starting from the 39 up-regulated genes of the consensus signature, we selected the 10 genes closest to the drug targets over the network (directly connected), and those still unknown in the literature for experimental validation on human articular chondrocytes.

Primary cell cultures

Human Cryopreserved chondrocytes, Osteoarthritis (#CDD-H-2610-OA, CliniSciences) and Human Cryopreserved chondrocytes, Normal (#CDD-H-2610-N, CliniSciences) were cultured in Chondrocyte Growth Medium with 10% Human Serum supplement (#M2600-10HS, CliniSciences) according to manufacture recommendations. The cells were maintained in an incubator at 37 °C and 5% CO2 in a fully humidified atmosphere.

RNA extraction and real-time PCR

Total RNA was isolated from primary cell cultures using Trizol Reagent (Invitrogen, Carlsbad, CA, USA) and quantified using NanoPhotometer NP80 (Implen, USA). One \(\upmu\)g of RNA was reverse-transcribed in cDNA using SuperScript VILO (#11754050, Invitrogen) according to the manufacturer’s protocol. The mRNA levels of the analyzed genes were measured by RT-qPCR amplification using iQ SYBR GREEN Supermix (#1708882, Bio-Rad Laboratories) according to the manufacturer’s instructions. We performed RT-qPCR experiments using C1000 Touch Thermal Cycler (Bio-Rad, Hercules, California, CA, USA). The reaction volume was 25 \(\upmu\)L. Each reaction was performed in duplicate. We quantified mRNA expression using the comparative \(\Delta \Delta\)Ct method, and we used the Ribosomal Protein S18 gene (RPS18) as a control to normalize the gene expressions. Two independent experiments were performed. Data were analyzed using Biorad CFX Maestro version 1.0 (Bio-Rad). For the complete list of the oligonucleotides used for the RT-qPCR, see Additional file 1.

Data pre-processing to perform the patient-centric strategy

The patient-centric approach aims to determine a risk score to distinguish between healthy and OA individuals and capture the disease severity for each patient with a value. We fitted a logistic regression model to reach this goal.

We chose Dataset 3 as the training set for fitting the model and Dataset 4 as test set to demonstrate the approach’s efficacy. We used the \(\log _2(TPM)\) normalized counts since they represent a measurable physical quantity readily available for each patient without the need for between-sample normalization. As a proxy for the TPM values related to the training set, we used the matrix returned by Soul and colleagues from the tximport processing, where the TPMs have been computed using the effective gene lengths [25, 37]; we selected genes with TPM > 0 for at least the 50% of all the samples, dealt for gene isoforms by computing the mean TPM values per-patient, and then applied \(\log _2\) transformation. For the validation dataset, we computed the TPMs starting from raw counts and collecting the theoretical gene lengths by the EDASeq R package [53], filtered out null values as performed in the training dataset, and \(\log _2\) transformed the data.

We performed the following pre-processing steps on the data prior to the model implementation and score extraction:

  1. 1.

    At first, we considered all the genes in common between the two datasets (the training set and the test set).

  2. 2.

    We applied the quantile normalization transformation to the training set.

  3. 3.

    Finally, we re-scaled each sample in the test set to follow the distribution of the quantile normalized \(\log _2(TPM)\) values in the training cohort.

All the details about the procedure used for this strategy and the corresponding R code are available at

Models set-up for risk score computation

We defined a reduced signature-based risk score \(s_R\) by using logistic regression with Elastic-Net penalization to achieve feature selection and identify a reduced subset of consensus genes retaining the maximum information associated with OA.

To determine whether using a small number of features might compromise the efficacy of the score, we also evaluated the total signature risk score \(s_T\), incorporating all of the consensus genes within a logistic regression with the Ridge penalization. Ultimately, we compared the two scores’ classification performances to examine how well they captured the OA condition. The user can reproduce the score computation on this data or other dataset by using the code available at the vignette (see subsection above).

Elastic net model to extract highly informative features

We fitted a logistic regression model with an Elastic-net penalty using the caret R package [54] with method =“glmnet” and \(\alpha\) = 0.5, to select the genes of the consensus signature that were most relevant in discriminating OA patients from healthy ones. Due to the class disbalance and the small number of samples available for the training, we applied a bootstrapping strategy with Leave-One-Out-Cross-Validation (LOOCV). We trained the algorithm on n = 100 sub-datasets, composed of the 10 healthy patients available from the cohort and as many affected patients randomly selected for each sub-dataset. The resulting model returned the vector of estimated regression coefficients for each run with the non-significant coefficients set to zero. As the reduced signature, we selected only the genes detected as significant in at least 50% of the runs. Then, we averaged the coefficients associated with this reduced signature across all the 100 models to use these mean values to compute the final score \(s_R\) for each patient i as shown in (1).

$$\displaystyle {s_{R}}(i)=\sum_{j=1}^{m}a_j*z_{j,i}$$

In this formula, \(a_j\) is the average value of j-th significant regression coefficients (i.e., j-th gene in the reduced signature of length m) computed across all the runs, and \(z_{j,i}\) is the \(\log _2(TPM)\) value of the corresponding gene j in patient i.

We used the Receiver Operating Characteristic (ROC) curve and related Area Under the Curve (AUC) to assess the discriminative capability of the reduced signature.

To further evaluate the \(s_R\) score, we computed the z-score transformed \(\log _2(TPM)\) values and performed an unsupervised clustering on the expression of the consensus genes available for both the training and test datasets. Since we added the class and the assigned \(s_R\) score annotations, we sorted all the samples by increasing assigned score values in the heatmaps to estimate the concordance with the corresponding class.

Ridge regression on all the features for assessing method reliability

We calculated the total risk score, \(s_T\), considering all the features of the consensus signature and applied a logistic regression with the Ridge penalization. We used the LOOCV as previously described. Since Ridge penalization did not perform a feature selection, the model estimated the vector of coefficients for all genes in the signature. Using these coefficients and the corresponding expression values, we derived \(s_T\) as

$$\displaystyle {s_{T}}(i) =\sum_{j=1}^{k}b_j*z_{j,i}$$

where \(b_j\) is the estimated regression coefficient for the gene j of the consensus signature of length k. We evaluated the ROC curves and AUC for the \(s_T\) risk score on the training and test sets. Finally, we used the DeLong test to assess if there was a significant difference between the two AUCs [55].


Differential analysis reveals common biological programs activated in OA samples across heterogeneous datasets

Investigating the DE analysis within each cohort, we obtained an overview of the biological processes and pathways characterizing OA patients compared to healthy controls accounting for the diverse experimental setups of each study. For example, we observed a different number of significantly dysregulated genes: 156 up, 79 down for Dataset 1, 190 up and 28 down for Dataset 2, 272 up, 170 down for Dataset 3, 315 up, 320 down for the Validation dataset (see Fig. 2 and Additional file 2).

Fig. 2
figure 2

Volcano plots for all the datasets considered in this study. DE genes identified in each cohort are respectively in red and green as up/down-regulated genes; the genes in the consensus signature are in blue; not DE genes in grey; the genes chosen for the risk score procedure are labeled and shaped with an asterisk

Nonetheless, the GSEA analysis showed substantial agreement in the altered biological processes among the different cohorts. For example, the GO terms “Extracellular Matrix Organization” and “Extracellular Structure Organization” were found to be significantly enriched in the OA profiles across the three signatures (Fig. 3). The analysis of the REACTOME pathways provided similar insights, including pathways related to collagen fibril synthesis and reorganization as the most enriched (Fig. 3, Additional file 4: Fig. S1).

Fig. 3
figure 3

Bubble chart of commonly enriched biological processes. The figure shows the significantly enriched GO:BP terms shared by all the up-regulated genes for the three cohorts. For each cohort, the size of the dots is the normalized enrichment score (NES) and the color is the statistical significance as \(-\log _{10}(FDR)\)

A consensus signature characterizing the OA phenotype

The meta-analysis of the three cohorts allowed us to identify a consensus signature including 44 genes (39 up, 5 down) that better recapitulate the coherent differences associated with the OA (Additional file 3: Table S1). Enrichment analysis of the consensus profile yielded extracellular matrix and bone reorganization-related terms as the most enriched by the up-regulated genes (Fig. 4). We assessed the consensus signature, evaluating its agreement with an independent cohort [26]. The Venn diagram in Fig. 5A, shows that 31 out of 44 consensus signature genes were also DE in the validation dataset. The hypergeometric test revealed that the overlap was statistically significant (p-value \(< 0.01\)). As further validation, the unsupervised clustering applied to the validation dataset and induced by the consensus genes confirmed their discriminative power (Fig. 5B, 3 out of 44 genes were excluded after the pre-processing). Also, the GSEA Running plot shows that consensus signature genes tend to appear toward the top of the validation dataset expression profile (Additional file 5: Fig. S2).

Fig. 4
figure 4

Consensus signature enrichment analysis. The CNET plot shows the top 6 GO:BP enriched by the up-regulated genes of the consensus signature (over-representation analysis with hypergeometric test); the category node size is the \(-log_{10}(FDR)\) returned from the analysis. ECM org.: extracellular matrix organization; ECM extracellular structure organization; external encapsulating structure organization; collagen fibril org.: collagen fibril organization

Fig. 5
figure 5

Consensus signature validation. A Venn plot representing the overlap between DE genes in the validation dataset and the consensus signature (P \(<<\) 0.01); B the heatmap showing the expression of the consensus genes (DE) across the samples of the validation cohort (z-score)

Drug prioritization and network analysis uncover potential pharmaceutical targets

In the disease-centric analysis, we scored a list of ~ 19,000 drug-induced gene expression profiles based on their predicted ability to counteract the transcriptional effects induced by OA. In particular, the top hits in this approach correspond to those drugs that can up-regulate the pathways involving genes that are down-regulated in the OA signature and vice-versa. This prioritization is entirely target-agnostic but still carries transcription-related information, which we sought to exploit in identifying potential clinically relevant genes. We also performed a PPI network analysis, including the signature genes and top drug molecular targets. In particular, we first noted that none of the molecules in The Library of Integrated Network-based Cellular Signatures (LINCS) databaseFootnote 4 had any of the OA signature genes as their targets. Then, we selected relevant genes from the consensus signature based on the assumption that those appearing along a short path to a drug target could have clinical significance (Fig. 6). Moreover, 5 genes (PPARA, SIRT1, PTGS1/2, ANO1) that are direct targets of several prioritized drugs showed a direct interaction with signature genes. Among the corresponding drugs, Resveratrol has been shown to exert positive effects on tissues’ homeostasis both in vitro and in vivo [56,57,58]; Tenoxicam is currently used as a treatment for pain and inflammation relief in various degenerative rheumatic diseases [59, 60]. Other relevant drugs revealed by the network analysis were Benzbromarone, Pirinixic Acid, and Mesalazine (see “Discussion” section).

Fig. 6
figure 6

Network construction. The figure shows a sub-graph extracted from the PPI graph filtered by considering only DE up consensus genes (blue) and the targeted genes (green) of the top 50 drugs (red, rank value reported). Those links represent the STRING combined score greater than 500. Single genes that had only low-scoring connections or none of them, and our Genes that did not match those of STRING, were not shown

Experimental validations

The reduced set of genes we extracted for the in vitro validation included TSPAN2, TNFSF11, GAS1, KCNN4 and CRABP2 as the closest in the network, and THY1, TGFBI, S100A4 HTRA1 as additional genes. Furthermore, we included COL3A1 as a positive control to assess the test validity of the experimental protocol and to confirm literature results on Primary Human Chondrocytes.

For details on the gene selection, see Additional file 1. Results from RT-qPCR experiments agreed with in silico predictions. Specifically, Fig. 7 shows that most of the tested genes had marked overexpression in OA cells compared to normal cells, except for GAS1 and KCNN4 (not shown because of undetected signal). Among the validated genes, TSPAN2, HTRA1 and TNFSF11 are also part of the sub-signature extracted for the risk-score computation (see next paragraph).

Fig. 7
figure 7

In vitro validation. mRNA fold expression was evaluated in normal and OA chondrocytes for 9 of the 10 selected consensus genes: TNFSF11, COL3A1, HTRA1, S100A4, TSPAN2, GAS1, THY1, CRABP2, TGFBI, using unpaired t-test (*P < 0.05, **P < 0.01, ***P < 0.001)

A single patient risk score to predict OA status

The patient-centric analysis allowed us to derive risk scores associated with OA. For what concerns reduced \(s_R\) score, the logistic regression with the Elastic Net penalty and the bootstrapping strategy allowed us to identify DNER, TNFSF11, THBS3, LOXL3, TSPAN2, DYSF, ASPN and HTRA1 as the features selected in at least the 50% of the runs (Additional file 6: Fig. S3). Such genes are up-regulated as consensus genes (Fig. 2). The weights associated with such genes are used to evaluate \(s_R\) (Fig. 8A). After evaluating \(s_{R}\) for each sample i in both the training and the test data, we compared the score distribution across the two classes of patients: Fig. 8B and C shows that Normal and OA samples have a significantly different \(s_R\) score distribution in both training and test cohorts (p-values \(<<\) 0.01, Wilcoxon test). Moreover, Fig. 9A, B shows that patients sorted by their \(s_R\) scores tend to segregate both for disease status and expression of the signature genes. While the segregation is perfect in the training set (panel A), it is also highly evident in the test set (panel B). Finally, we compared the classification performances of the \(s_R\) score against that of an analogous score (referred to as \(s_T\)) based on the consensus signature genes available in both datasets (43 out of 44). The corresponding ROC curves reported in Fig. 10 show that using a reduced set of genes does not significantly impact the prediction performance (AUC equal to 0.875 as compared to 0.922, the DeLong AUC test applied returned a p-value of 0.507).

Fig. 8
figure 8

Risk score computation based on the most discriminating genes of the consensus signature. In A the barplot shows final mean coefficients associated with their related genes, sorted by decreasing value; in B and C, boxplots report the \(s_R\) score distributions, associated with the training and test sets respectively

Fig. 9
figure 9

Validating risk score. In A and B, the heatmaps of the consensus genes expression (DE), where the patients are sorted by increasing risk score, respectively for training and test sets (z-score)

Fig. 10
figure 10

Performance comparison (Validation dataset) of the two models. In this plot we see that the AUC for the \(s_R\) score (orange) is not particularly lower than the AUC returned from the \(s_T\) score (turquoise); we confirm that reducing the number of features used for the score calculation does not affect the risk estimation of the score


The OA patient phenotype is very varied, although it shares a progressive and unrestrainable degeneration. While significant progress in treating clinical manifestations has been made, fundamental molecular mechanisms underlying the disease remain poorly understood, thus limiting the possibility of identifying effective drugs for the restoration of tissue homeostasis. Such a targeted approach would be especially practical if applied as early as possible, minimizing the damage caused by the degenerative and progressive nature of the disease. Our work aimed to contribute to this endeavor by extracting a reduced set of relevant genes from multiple heterogeneous OA-related datasets. Indeed, as expected, each cohort showed its transcriptional specificities. Such variability could be exacerbated by the degenerative nature of the disease, which progressively affects the extracellular matrix and collagen fibrils, depending on the anatomical structures involved. Nonetheless, the integration process allowed us to isolate 44 genes that appear significantly dysregulated across diverse samples. The panel includes previously known OA-related genes such as COL3A1, ASPN or ADAMTS6/S2, that are involved in cartilage remodeling and inflammation with a specific role in the pathology [61,62,63,64] or TSPAN2 that correlates with inflammation and immune-cell infiltration in affected cartilage [65] and is overexpressed in OA cartilage of murine models [66].

Intriguing is the putative role of the retinoic acid binding protein CRABP2 as a target: the expression of this gene was demonstrated to be sensibly higher in murine models with degenerative joint disorders [67]. Furthermore, fibroblast-like synoviocytes (FLS) derived from rheumatoid arthritis patients showed augmented apoptosis in response to gene inhibition [68]. TNFSF11 has been recently associated with the induction of the PI3K/AKT/mTOR signaling in affected tissues, supporting inflammation and structure destruction: in particular, its downregulation could enhance mitophagy and reduce cartilage degradation [69]. The crucial role that this gene holds in bone remodeling makes it a potential therapeutic target in the treatment of OA and other degenerative joint diseases [70, 71].

Although not known to be directly linked to the disease, other genes could be worthy of further investigation. For example, DNER has been primarily detected in OA-affected cartilage, and its role in the disease is currently under investigation [72, 73]. Indeed, we extracted novel genes such as S100A4, which is known to be involved in cancer metastatic progression but that results overexpressed in cartilage and synovium damaged by arthritis [74, 75]. HTRA1, which has a role in various biological processes, including cancer, is intriguing for its involvement in musculoskeletal diseases [76, 77].

Even if most of the genes in our panel are not known to be direct targets of any drug, our network-based analysis allowed us to identify the most proximal ones, which may provide clues into the therapeutic modes of action for the prioritized drugs. Among the genes that are close to the drug targets, PTGS1/2 are pivotal in the inflammatory process induction (involved in prostaglandin synthesis) and may have clear implications in the pathology progress [78, 79]. On the other hand, SIRT1 and PPARA have a protective and anti-inflammatory action when targeted by their activators and agonists [80,81,82,83]. Indeed, PPARA is a PI3K/AKT/mTOR signaling regulator, promoting autophagy by mTOR signaling pathway inhibition [80, 84, 85]. It was demonstrated that a hypoxic microenvironment can enhance mitophagy and ameliorate chondrocyte viability [85, 86]. Of interest is the role of HIF-1\(\alpha\) in maintaining hypoxia and slowing cell degeneration in mice OA models [87]. To date, non-steroidal anti-inflammatory drugs, glucocorticoids, opioids, chondroprotective agents and anti-cytokines are the classes of drugs currently used to treat OA accompanied by several side effects for the patient [88, 89]. Some of these, like the most common Prednisolone [90] or Paracetamol and Aspirin [91], induce anti-inflammatory and immunosuppressive effects that block the production of pro-inflammatory cytokines, leukocyte recruitment, and activation. Resveratrol, ranked 29 in our prioritization, could play a role in both prevention and treatment of OA, due to its anti-inflammatory, chondrogenic matrix-protective, or antiaging effect [56,57,58, 92,93,94]. Even an anti-gastric cancer effect of this molecule has been recently assessed [95]. Mesalazine (ranked 45) and Pirinixic Acid, a potent PPARA agonist, (ranked 4) have been shown to have anti-inflammatory potential in OA and may warrant further testing [96, 97]. Finally, Benzbromarone (ranked 19) is a medication used to treat gout [98, 99], but we do not have evidence for using this drug for OA treatment. The drug prioritization analysis helped identify a set of genes that may be clinically relevant as both part of the consensus signature and close to the top drugs in the PPI network. The RT-qPCR results confirmed a significant dysregulation of seven of the ten genes we selected for validation.

Although a reduced panel of 44 genes may represent a valuable reference for future mechanistic studies, a smaller number is required to achieve practical relevance in clinical settings. For this reason, in the patient–centric approach, we aimed at developing a score to assess the disease severity with a limited number of genes. The \(s_R\) score, based on only 8 genes, was proved effective and with statistically similar accuracy to \(s_T\) obtained with the larger panel of 43 genes. Together with additional clinical parameters, this score could be helpful in rapid preliminary checks and assessment of disease progression. This study has a few limitations. The reliability of the results could be affected by the limited number of samples available for the training phase of both models. However, we tried to assess this risk using bootstrapping steps and by including a test dataset. Moreover, the specificity of the selected samples, all from patients with damaged cartilage, may limit the possibility of generalizing the result to all OA phenotypes. While we used prioritized drugs as proxies to identify clinically relevant genes, their actual effectiveness as OA treatments would require additional efforts that fall out of the scope of this study. Nonetheless, our results show that some of them, such as Resveratrol, warrant further investigation.


Using a meta-analysis of several OA cohorts, we first identified a panel of 44 genes (39 up-regulated and 5 down-regulated) consistently dysregulated across the different studies, therefore carrying robust molecular features that could be critical for understanding the disease’s mechanisms and suggesting novel therapies. Therefore, starting from this panel and using a network-based approach, from one side, we identified a few drugs that could enlighten therapeutical approaches for OA. For such purpose, we prioritized drugs that can revert the profile and whose targets are close to the identified genes. With this approach, we found either drugs already tested in OA or potential novel therapeutic approaches. Conversely, we extracted a subset of eight genes and defined a patient-specific risk score with significant prediction power to distinguish OA from healthy individuals. The presented results could help understand OA mechanisms from multiple perspectives, including its pathophysiology, diagnosis, and treatment. However, we employed four cohorts related to a specific subtype, cartilage, with a limited number of patients due to the scarcity of publicly available data specific to OA (compared to other studies). It would be interesting to validate our findings’ generalizability more in-depth (for example, when varying clinical and demographic data) and enhance the signature and score obtained with additional data. An example could be the application of this framework on other tissue types like synovial fluid or subchondral bone [23, 100]. The proposed methodological approach can easily scale to additional cohorts of assessment. Therefore, future dataset availability could support the translational purpose of our work.

Availability of data and materials

All the analyses performed have been done using RStudio with R version 4.2.1 (2022–06–23) and Python version 3.9.7. The code of the implemented pipeline and supplementary files useful to reproduce the analyses can be found at: The vignette is related to the entire pre-processing workflow of the datasets, consensus signature extraction, enrichment analyses, and patient-centric strategy. We provided the files with the codes for the disease-centric approach in the repository, also including all the useful information related to the package versions used.







  1. Katz JN, Arant KR, Loeser RF. Diagnosis and treatment of hip and knee osteoarthritis. JAMA. 2021;325(6):568.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. Fu K, Robbins SR, McDougall JJ. Osteoarthritis: the genesis of pain. Rheumatology. 2017;57(suppl-4):43–50.

    Article  CAS  Google Scholar 

  3. Arden N, Cooper C. Osteoarthritis handbook. Oxon: Taylor and Francis; 2006.

    Google Scholar 

  4. Dieppe PA, Lohmander LS. Pathogenesis and management of pain in osteoarthritis. Lancet. 2005;365(9463):965–73.

    Article  CAS  PubMed  Google Scholar 

  5. Spector TD, MacGregor AJ. Risk factors for osteoarthritis: genetics. Osteoarthr Cartil. 2004;12:39–44.

    Article  Google Scholar 

  6. Onishi K, Utturkar A, Chang E, Panush R, Hata J, Perret-Karimi D. Osteoarthritis: a critical review. Crit Rev Phys Rehabil Med. 2012;24(3–4):251–64.

    Article  PubMed  PubMed Central  Google Scholar 

  7. Aubourg G, Rice SJ, Bruce-Wootton P, Loughlin J. Genetics of osteoarthritis. Osteoarthr Cartil. 2022;30(5):636–49.

    Article  CAS  Google Scholar 

  8. Rice SJ, Cheung K, Reynard LN, Loughlin J. Discovery and analysis of methylation quantitative trait loci (mQTLs) mapping to novel osteoarthritis genetic risk signals. Osteoarthr Cartil. 2019;27(10):1545–56.

    Article  CAS  Google Scholar 

  9. Liu Y, Chang J-C, Hon C-C, Fukui N, Tanaka N, Zhang Z, Lee MTM, Minoda A. Chromatin accessibility landscape of articular knee cartilage reveals aberrant enhancer regulation in osteoarthritis. Sci Rep. 2018;8(1):15499.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Huang J, Zhao L, Xing L, Chen D. MicroRNA-204 regulates runx2 protein expression and mesenchymal progenitor cell differentiation. Stem Cells. 2009;28(2):357–64.

    Article  CAS  Google Scholar 

  11. Huang J, Zhao L, Fan Y, Liao L, Ma PX, Xiao G, Chen D. The micrornas mir-204 and mir-211 maintain joint homeostasis and protect against osteoarthritis progression. Nat Commun. 2019.

    Article  PubMed  PubMed Central  Google Scholar 

  12. Zhong G, Long H, Ma S, Shunhan Y, Li J, Yao J. miRNA-335-5p relieves chondrocyte inflammation by activating autophagy in osteoarthritis. Life Sci. 2019;226:164–72.

    Article  CAS  PubMed  Google Scholar 

  13. Ding Y, Wang L, Zhao Q, Wu Z, Kong L. MicroRNA-93 inhibits chondrocyte apoptosis and inflammation in osteoarthritis by targeting the TLR4/NF-kb signaling pathway. Int J Mol Med. 2018.

    Article  PubMed  PubMed Central  Google Scholar 

  14. Wang C, Shen J, Ying J, Xiao D, O’Keefe RJ. FoxO1 is a crucial mediator of TGF-β/TAK1 signaling and protects against osteoarthritis by maintaining articular cartilage homeostasis. Proc Natl Acad Sci. 2020;117(48):30488–97.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Wang G, Chen S, Xie Z, Shen S, Xu W, Chen W, Li X, Wu Y, Li L, Liu B, Ding X, Qin A, Fan S. TGF-β attenuates cartilage extracellular matrix degradation via enhancing FBXO6-mediated MMP14 ubiquitination. Ann Rheum Dis. 2020;79(8):1111–20.

    Article  CAS  PubMed  Google Scholar 

  16. Jiang Y. Osteoarthritis year in review 2021: biology. Osteoarthr Cartil. 2022;30(2):207–15.

    Article  CAS  Google Scholar 

  17. Bertrand J, Kräft T, Gronau T, Sherwood J, Rutsch F, Lioté F, Dell’Accio F, Lohmann CH, Bollmann M, Held A, Pap T. BCP crystals promote chondrocyte hypertrophic differentiation in OA cartilage by sequestering wnt3a. Ann Rheum Dis. 2020;79(7):975–84.

    Article  CAS  PubMed  Google Scholar 

  18. Jiang Y, Cai Y, Zhang W, Yin Z, Hu C, Tong T, Lu P, Zhang S, Neculai D, Tuan RS, Ouyang HW. Human cartilage-derived progenitor cells from committed chondrocytes for efficient cartilage repair and regeneration. Stem Cells Transl Med. 2016;5(6):733–44.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Murphy MP, Koepke LS, Lopez MT, Tong X, Ambrosi TH, Gulati GS, Marecic O, Wang Y, Ransom RC, Hoover MY, Steininger H, Zhao L, Walkiewicz MP, Quarto N, Levi B, Wan DC, Weissman IL, Goodman SB, Yang F, Longaker MT, Chan CKF. Articular cartilage regeneration by activated skeletal stem cells. Nat Med. 2020;26(10):1583–92.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Gupta PK, Chullikana A, Rengasamy M, Shetty N, Pandey V, Agarwal V, Wagh SY, Vellotare PK, Damodaran D, Viswanathan P, Thej C, Balasubramanian S, Majumdar AS. Efficacy and safety of adult human bone marrow-derived, cultured, pooled, allogeneic mesenchymal stromal cells (stempeucel®): preclinical and clinical trial in osteoarthritis of the knee joint. Arthr Res Ther. 2016;18(1):1–18.

    Article  CAS  Google Scholar 

  21. Lu L, Dai C, Zhang Z, Du H, Li S, Ye P, Fu Q, Zhang L, Wu X, Dong Y, Song Y, Zhao D, Pang Y, Bao C. Treatment of knee osteoarthritis with intra-articular injection of autologous adipose-derived mesenchymal progenitor cells: a prospective, randomized, double-blind, active-controlled, phase IIb clinical trial. Stem Cell Res Ther. 2019;10(1):1–10.

    Article  CAS  Google Scholar 

  22. Fernandez-Moreno M, Rego I, Carreira-Garcia V, Blanco F. Genetics in osteoarthritis. Curr Genom. 2008;9(8):542–7.

    Article  CAS  Google Scholar 

  23. Chou C-H, Jain V, Gibson J, Attarian DE, Haraden CA, Yohn CB, Laberge R-M, Gregory S, Kraus VB. Synovial cell cross-talk with cartilage plays a major role in the pathogenesis of osteoarthritis. Sci Rep. 2020;10(1):10868.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Steinberg J, Ritchie GRS, Roumeliotis TI, Jayasuriya RL, Clark MJ, Brooks RA, Binch ALA, Shah KM, Coyle R, Pardo M, Maitre CLL, Ramos YFM, Nelissen RGHH, Meulenbelt I, McCaskie AW, Choudhary JS, Wilkinson JM, Zeggini E. Integrative epigenomics, transcriptomics and proteomics of patient chondrocytes reveal genes and pathways involved in osteoarthritis. Sci Rep. 2017;7(1):8935.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Soul J, Dunn SL, Anand S, Serracino-Inglott F, Schwartz J-M, Boot-Handford RP, Hardingham TE. Stratification of knee osteoarthritis: two major patient subgroups identified by genome-wide expression analysis of articular cartilage. Ann Rheum Dis. 2018;77(3):423.

    Article  CAS  PubMed  Google Scholar 

  26. Fisch KM, Gamini R, Alvarez-Garcia O, Akagi R, Saito M, Muramatsu Y, Sasho T, Koziol JA, Su AI, Lotz MK. Identification of transcription factors responsible for dysregulated networks in human osteoarthritis cartilage by global gene expression analysis. Osteoarthr Cartil. 2018;26(11):1531.

    Article  CAS  Google Scholar 

  27. Hao Y, Hao S, Andersen-Nissen E, Mauck WM 3rd, Zheng S, Butler A, Lee MJ, Wilk AJ, Darby C, Zager M, Hoffman P, Stoeckius M, Papalexi E, Mimitou EP, Jain J, Srivastava A, Stuart T, Fleming LM, Yeung B, Rogers AJ, McElrath JM, Blish CA, Gottardo R, Smibert P, Satija R. Integrated analysis of multimodal single-cell data. Cell. 2021;184(13):3573–3587. Accessed 06 Mar 2023.

  28. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM, Hao Y, Stoeckius M, Smibert P, Satija R. Comprehensive integration of single-cell data. Cell. 2019;177(7):1888–902.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018;36(5):411–20.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33(5):495–502.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Helena L, Soneson C, Germain PL, Mark D. Muscat. 2019. Accessed 06 Mar 2023.

  32. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodol). 1995;57(1):289–300.

    Article  Google Scholar 

  34. Liao Y, Smyth GK, Shi W. The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads. Nucleic Acids Res. 2019;47(8):e47.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, Morgan MT, Carey VJ. Software for computing and annotating genomic ranges. PLoS Comput Biol. 2013;9(8):1003118.

    Article  CAS  Google Scholar 

  36. Rainer J. EnsDb.Hsapiens.v79: Ensembl based annotation package. R package version 2.99.0. 2017.

  37. Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research. 2016;4:1521.

    Article  PubMed Central  Google Scholar 

  38. Rainer J. EnsDb.Hsapiens.v79: Ensembl based annotation package. R package version 2.99.0. 2017.

  39. Yan L. Ggvenn: draw venn diagram by ‘ggplot2’. R package version 0.1.10. 2023.

  40. Fisher RA. Statistical methods for research workers. Edinburgh: Oliver and Boyd; 1934.

    Google Scholar 

  41. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Zhao S, Guo Y, Sheng Q, Shyr Y. Heatmap3: an improved heatmap package with more powerful and convenient features. BMC Bioinform. 2014;15(S10):1–2.

    Article  Google Scholar 

  43. Fisher RA. Statistical methods for research workers. Edinburgh: Oliver and Boyd; 1950. p. 99–101.

    Google Scholar 

  44. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci. 2005;102(43):15545–50.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Yu G. Enrichplot: visualization of functional enrichment result. R package version 1.21.0. 2023.

  46. Dolgalev I. Msigdbr: MSigDB gene sets for multiple organisms in a tidy data format. R package version 7.5.1. 2022.

  47. Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, Feng T, Zhou L, Tang W, Zhan L, Fu X, Liu S, Bo X, Yu G. clusterprofiler 4.0: a universal enrichment tool for interpreting omics data. Innovation. 2021;2(3): 100141.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Napolitano F, Rapakoulia T, Annunziata P, Hasegawa A, Cardon M, Napolitano S, Vaccaro L, Iuliano A, Wanderlingh LG, Kasukawa T, Medina DL, Cacchiarelli D, Gao X, di Bernardo D, Arner E. Automatic identification of small molecules that promote cell conversion and reprogramming. Stem Cell Rep. 2021;16(5):1381–90.

    Article  CAS  Google Scholar 

  49. Napolitano F, Carrella D, Mandriani B, Pisonero-Vaquero S, Sirci F, Medina DL, Brunetti-Pierri N, di Bernardo D. Gene2drug: A computational tool for pathway-based rational drug repositioning. Bioinformatics. 2018;34(9):1498–505.

    Article  CAS  PubMed  Google Scholar 

  50. Napolitano F, Carrella D, Gao X, di Bernardo D. Gep2pep: A bioconductor package for the creation and analysis of pathway-based expression profiles. Bioinformatics. 2019.

    Article  PubMed  PubMed Central  Google Scholar 

  51. Zhou Y, Zhang Y, Lian X, Li F, Wang C, Zhu F, Qiu Y, Chen Y. Therapeutic target database update 2022: Facilitating drug discovery with enriched comparative data of targeted agents. Nucleic Acids Res. 2022;50(D1):1398–407.

    Article  CAS  Google Scholar 

  52. Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, Doncheva NT, Legeay M, Fang T, Bork P, Jensen LJ, von Mering C. Correction to ’The STRING database in 2021: Customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets’. Nucleic Acids Res. 2021;49(18):10800.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Risso D, Schwartz K, Sherlock G, Dudoit S. GC-content normalization for RNA-seq data. BMC Bioinform. 2011;12(1):1–17.

    Article  CAS  Google Scholar 

  54. Kuhn M. Building predictive models in r using the caret package. J Stat Softw. 2008;28(5):1–26.

    Article  Google Scholar 

  55. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837.

    Article  CAS  PubMed  Google Scholar 

  56. Liang C, Xing H, Wang C, Xu X, Hao Y, Qiu B. Resveratrol improves the progression of osteoarthritis by regulating the SIRT1-FoxO1 pathway-mediated cholesterol metabolism. Mediat Inflamm. 2023;2023:1–13.

    Article  CAS  Google Scholar 

  57. Yang S, Sun M, Zhang X. Protective effect of resveratrol on knee osteoarthritis and its molecular mechanisms: a recent review in preclinical and clinical trials. Front Pharmacol. 2022;13: 921003.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. Zhou M, Wang D, Tang J. Identification of the resveratrol potential targets in the treatment of osteoarthritis. Evid-Based Complement Altern Med. 2021;2021:1–12.

    Article  Google Scholar 

  59. Jawish R, Najdi H, Safi CA, Chameseddine A. The effect of intra-articular tenoxicam on knee effusion after arthroscopy. Int Orthop. 2015;39(7):1423–6.

    Article  PubMed  Google Scholar 

  60. Yilmaz E. The evaluation of the effectiveness of intra-articular steroid, tenoxicam, and combined steroid–tenoxicam injections in the treatment of patients with knee osteoarthritis. Clin Rheumatol. 2019;38(11):3243–52.

    Article  PubMed  Google Scholar 

  61. Liu R, Liu Q, Wang K, Dang X, Zhang F. Comparative analysis of gene expression profiles in normal hip human cartilage and cartilage from patients with necrosis of the femoral head. Arthr Res Ther. 2016;18(1):1–8.

    Article  CAS  Google Scholar 

  62. Li S, Wang H, Zhang Y, Qiao R, Xia P, Kong Z, Zhao H, Yin L. COL3A1 and MMP9 serve as potential diagnostic biomarkers of osteoarthritis and are associated with immune cell infiltration. Front Genet. 2021;12:721258.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Mishra A, Awasthi S, Raj S, Mishra P, Srivastava RN. Identifying the role of ASPN and COMP genes in knee osteoarthritis development. J Orthop Surg Res. 2019;14(1):1–9.

    Article  Google Scholar 

  64. Li T, Peng J, Li Q, Shu Y, Zhu P, Hao L. The mechanism and role of ADAMTS protein family in osteoarthritis. Biomolecules. 2022;12(7):959.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  65. Da Z, Guo R, Sun J, Wang A. Identification of osteoarthritis-characteristic genes and immunological micro-environment features through bioinformatics and machine learning-based approaches. BMC Med Genom. 2023;16(1):236.

    Article  CAS  Google Scholar 

  66. Hu, Y, Li K, Swahn H, Ordoukhanian P, Head SR, Natarajan P, Woods AK, Joseph SB, Johnson KA, Lotz MK. Transcriptomic analyses of joint tissues during osteoarthritis development in a rat model reveal dysregulated mechanotransduction and extracellular matrix pathways. Osteoarthritis and Cartilage. 2023;31(2):199–212.

    Article  CAS  Google Scholar 

  67. Welch ID, Cowan MF, Beier F, Underhill TM. The retinoic acid binding protein crabp2 is increased in murine models of degenerative joint disease. Arthr Res Ther. 2009;11(1):14.

    Article  CAS  Google Scholar 

  68. Mosquera N, Rodriguez-Trillo A, Mera-Varela A, Gonzalez A, Conde C. Uncovering cellular retinoic acid-binding protein 2 as a potential target for rheumatoid arthritis synovial hyperplasia. Sci Rep. 2018;8(1):8731.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Li J, Jiang M, Yu Z, Xiong C, Pan J, Cai Z, Xu N, Zhou X, Huang Y, Yang Z. Artemisinin relieves osteoarthritis by activating mitochondrial autophagy through reducing TNFSF11 expression and inhibiting PI3k/AKT/mTOR signaling in cartilage. Cell Mol Biol Lett. 2022;27(1):1–19.

    Article  CAS  Google Scholar 

  70. Tat SK, Pelletier J-P, Velasco CR, Padrines M, Martel-Pelletier J. New perspective in osteoarthritis: the OPG and RANKL system as a potential therapeutic target? Keio J Med. 2009;58(1):29–40.

    Article  CAS  PubMed  Google Scholar 

  71. Muratovic D, Atkins GJ, Findlay DM. Is RANKL a potential molecular target in osteoarthritis? Osteoarthr Cartil. 2023.

    Article  Google Scholar 

  72. Berninger L, Balkenhol A, Baier C, Müller-Ladner U, Neumann E, Geyer M. Investigation on the role of delta/notch like egf-related receptor in the pathogenesis of osteoarthritis. Osteoarthr Cartil. 2014;22:369.

    Article  Google Scholar 

  73. Geyer M, Grässel S, Straub RH, Schett G, Dinser R, Grifka J, Gay S, Neumann E, Müller-Ladner U. Differential transcriptome analysis of intraarticular lesional vs intact cartilage reveals new candidate genes in osteoarthritis pathophysiology. Osteoarthr Cartil. 2009;17(3):328–35.

    Article  CAS  Google Scholar 

  74. Šenolt L. The link between metastasis-associated protein s100a4 and rheumatoid arthritis. Joint Bone Spine. 2008;75(2):242–3.

    Article  Google Scholar 

  75. Yammani RR. S100 proteins in cartilage: role in arthritis. Biochim Biophys Acta (BBA) Mol Basis Dis. 2012;1822(4):600–6.

    Article  CAS  Google Scholar 

  76. Tossetta G, Fantone S, Licini C, Marzioni D, Mattioli-Belmonte M. The multifaced role of HtrA1 in the development of joint and skeletal disorders. Bone. 2022;157: 116350.

    Article  CAS  PubMed  Google Scholar 

  77. Grau S, Richards PJ, Kerr B, Hughes C, Caterson B, Williams AS, Junker U, Jones SA, Clausen T, Ehrmann M. The role of human HtrA1 in arthritic disease. J Biol Chem. 2006;281(10):6124–9.

    Article  CAS  PubMed  Google Scholar 

  78. Wang C, Wang F, Lin F, Duan X, Bi B. Naproxen attenuates osteoarthritis progression through inhibiting the expression of prostaglandinl-endoperoxide synthase 1. J Cell Physiol. 2018;234(8):12771–85.

    Article  CAS  PubMed  Google Scholar 

  79. Sun T-W, Wu Z-H, Weng X-S. Celecoxib can suppress expression of genes associated with pge2 pathway in chondrocytes under inflammatory conditions. Int J Clin Exp Med. 2015;8(7):10902.

    CAS  PubMed  PubMed Central  Google Scholar 

  80. Sheng W, Wang Q, Qin H, Cao S, Wei Y, Weng J, Yu F, Zeng H. Osteoarthritis: role of peroxisome proliferator-activated receptors. Int J Mol Sci. 2023;24(17):13137.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Park S, Baek I-J, Ryu JH, Chun C-H, Jin E-J. PPARa-ACOT12 axis is responsible for maintaining cartilage homeostasis through modulating de novo lipogenesis. Nature Commun. 2022;13(1):3.

    Article  CAS  Google Scholar 

  82. Deng Z, Li Y, Liu H, Xiao S, Li L, Tian J, Cheng C, Zhang G, Zhang F. The role of sirtuin 1 and its activator, resveratrol in osteoarthritis. Biosci Rep. 2019.

    Article  PubMed  PubMed Central  Google Scholar 

  83. Zhou Z, Deng Z, Liu Y, Zheng Y, Yang S, Lu W, Xiao D, Zhu W. Protective effect of SIRT1 activator on the knee with osteoarthritis. Front Physiol. 2021;12: 661852.

    Article  PubMed  PubMed Central  Google Scholar 

  84. Zhou Y, Li L, Chen X, Zhao Q, Qu N, Zhang B, Jin X, Xia C. Impaired autophagy contributes to the aggravated deterioration of osteoarthritis articular cartilage by peroxisome proliferator-activated receptor alpha deficiency, associated with decreased ERK and AKT activation. Eur J Med Res. 2023;28(1):332.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  85. Yao Q, Wu X, Tao C, Gong W, Chen M, Qu M, Zhong Y, He T, Chen S, Xiao G. Osteoarthritis: pathogenic signaling pathways and therapeutic targets. Signal Transduct Target Ther. 2023;8(1):56.

    Article  PubMed  PubMed Central  Google Scholar 

  86. Hu S, Zhang C, Ni L, Huang C, Chen D, Shi K, Jin H, Zhang K, Li Y, Xie L, Fang M, Xiang G, Wang X, Xiao J. Stabilization of HIF-1α alleviates osteoarthritis via enhancing mitophagy. Cell Death Dis. 2020;11(6):481.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Zhang H, Wang L, Cui J, Wang S, Han Y, Shao H, Wang C, Hu Y, Li X, Zhou Q, Guo J, Zhuang X, Sheng S, Zhang T, Zhou D, Chen J, Wang F, Gao Q, Jing Y, Chen X, Su J. Maintaining hypoxia environment of subchondral bone alleviates osteoarthritis progression. Sci Adv. 2023.

    Article  PubMed  PubMed Central  Google Scholar 

  88. Richard MJ, Driban JB, McAlindon TE. Pharmaceutical treatment of osteoarthritis. Osteoarthr Cartil. 2023;31(4):458–66.

    Article  CAS  Google Scholar 

  89. da Costa BR, Pereira TV, Saadat P, Rudnicki M, Iskander SM, Bodmer NS, Bobos P, Gao L, Kiyomoto HD, Montezuma T, Almeida MO, Cheng P-S, Hincapié CA, Hari R, Sutton AJ, Tugwell P, Hawker GA, Jüni P. Effectiveness and safety of non-steroidal anti-inflammatory drugs and opioid treatment for knee and hip osteoarthritis: network meta-analysis. BMJ. 2021.

    Article  PubMed  PubMed Central  Google Scholar 

  90. Abou-Raya A, Abou-Raya S, Khadrawi T, Helmii M. Effect of low-dose oral prednisolone on symptoms and systemic inflammation in older adults with moderate to severe knee osteoarthritis: a randomized placebo-controlled trial. J Rheumatol. 2013;41(1):53–9.

    Article  CAS  PubMed  Google Scholar 

  91. Brandt KD. The role of analgesics in the management of osteoarthritis pain. Am J Ther. 2000;7(2):75–90.

    Article  CAS  PubMed  Google Scholar 

  92. Marouf BH, Hussain SA, Ali ZS, Ahmmad RS. Resveratrol supplementation reduces pain and inflammation in knee osteoarthritis patients treated with meloxicam: a randomized placebo-controlled study. J Med Food. 2018;21(12):1253–9.

    Article  CAS  Google Scholar 

  93. Meng T, Xiao D, Muhammed A, Deng J, Chen L, He J. Anti-inflammatory action and mechanisms of resveratrol. Molecules. 2021;26(1):229.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  94. de Sá Coutinho D, Pacheco M, Frozza R, Bernardi A. Anti-inflammatory effects of resveratrol: mechanistic insights. Int J Mol Sci. 2018;19(6):1812.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  95. Ma Y-Q, Zhang M, Sun Z-H, Tang H-Y, Wang Y, Liu J-X, Zhang Z-X, Wang C. Identification of anti-gastric cancer effects and molecular mechanisms of resveratrol: from network pharmacology and bioinformatics to experimental validation. World J Gastrointest Oncol. 2024;16(2):493–513.

    Article  PubMed  PubMed Central  Google Scholar 

  96. Li Y, Mai Y, Cao P, Wen X, Fan T, Wang X, Ruan G, Tang S, Ding C, Zhu Z. Relative efficacy and safety of anti-inflammatory biologic agents for osteoarthritis: a conventional and network meta-analysis. J Clin Med. 2022;11(14):3958.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  97. Clockaerts S, Bastiaansen-Jenniskens YM, Feijt C, Verhaar JAN, Somville J, Clerck LSD, Osch GJVMV. Peroxisome proliferator activated receptor alpha activation decreases inflammatory and destructive responses in osteoarthritic cartilage. Osteoarthr Cartil. 2011;19(7):895–902.

    Article  CAS  Google Scholar 

  98. Lai S-W, Liao K-F, Kuo Y-H, Hwang B-F, Liu C-S. The risk of ischemic cerebrovascular disease associated with benzbromarone use in gout people: a retrospective cohort study in Taiwan. Medicine. 2023;102(5):32779.

    Article  Google Scholar 

  99. Halai K, Jordon KM. P150 benzbromarone treatment in complex gout: experience in university Sussex hospitals (east). Rheumatology. 2022;61(Supplement–1):96.

    Article  Google Scholar 

  100. Hu Y, Cui J, Liu H, Wang S, Zhou Q, Zhang H, Guo J, Cao L, Chen X, Xu K, Su J. Single-cell RNA-sequencing analysis reveals the molecular mechanism of subchondral bone cell heterogeneity in the development of osteoarthritis. RMD Open. 2022;8(2): 002314.

    Article  Google Scholar 

Download references


This work was supported by Ministero dell’Istruzione, dell’Università e della Ricerca (MIUR) through project MEDIA (PON_03PE_00060_5) and PRIN P2022FCBLM (Next Generation EU).

Author information

Authors and Affiliations


Corresponding author

Correspondence to Michele Ceccarelli.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

 Oligonucleotides designed as primers and information related to the genes selected from the network to set up the validation experiment.

Additional file 2.

 DE genes derived from each DEA related to the three datasets.

Additional file 3.

 The consensus signature of 44 genes.

Additional file 4: Figure S1.

Enrichment Pathway analysis from REACTOME. Significant pathways enriched by DE genes for each experiment.

Additional file 5: Figure S2.

GSEA running plot. The positions of the 44 genes in the ranked list of DE genes of the validation dataset are set on the abscissa axis, while on the ordinates is the calculated enrichment score. The consensus signature is significantly enriched (P << 0.01).

Additional file 6: Figure S3.

Feature selection by means Elastic net bootstrapping. In this plot, for each feature on the ordinate axis, we marked (red) each run in which it was selected by the model. We finally chose the features occurring in at least 50 runs.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Costa, M.C., Angelini, C., Franzese, M. et al. Identification of therapeutic targets in osteoarthritis by combining heterogeneous transcriptional datasets, drug-induced expression profiles, and known drug-target interactions. J Transl Med 22, 281 (2024).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: