Skip to main content

Discovery of novel JAK1 inhibitors through combining machine learning, structure-based pharmacophore modeling and bio-evaluation



Janus kinase 1 (JAK1) plays a critical role in most cytokine-mediated inflammatory, autoimmune responses and various cancers via the JAK/STAT signaling pathway. Inhibition of JAK1 is therefore an attractive therapeutic strategy for several diseases. Recently, high-performance machine learning techniques have been increasingly applied in virtual screening to develop new kinase inhibitors. Our study aimed to develop a novel layered virtual screening method based on machine learning (ML) and pharmacophore models to identify the potential JAK1 inhibitors.


Firstly, we constructed a high-quality dataset comprising 3834 JAK1 inhibitors and 12,230 decoys, followed by establishing a series of classification models based on a combination of three molecular descriptors and six ML algorithms. To further screen potential compounds, we constructed several pharmacophore models based on Hiphop and receptor-ligand algorithms. We then used molecular docking to filter the recognized compounds. Finally, the binding stability and enzyme inhibition activity of the identified compounds were assessed by molecular dynamics (MD) simulations and in vitro enzyme activity tests.


The best performance ML model DNN-ECFP4 and two pharmacophore models Hiphop3 and 6TPF 08 were utilized to screen the ZINC database. A total of 13 potentially active compounds were screened and the MD results demonstrated that all of the above molecules could bind with JAK1 stably in dynamic conditions. Among the shortlisted compounds, the four purchasable compounds demonstrated significant kinase inhibition activity, with Z-10 being the most active (IC50 = 194.9 nM).


The current study provides an efficient and accurate integrated model. The hit compounds were promising candidates for the further development of novel JAK1 inhibitors.


The Janus kinases (JAKs) family consists of four tyrosine kinases [JAK1, JAK2, JAK3, and tyrosine kinase 2 (TYK2)], a class of cytoplasmic tyrosine kinases associated with cytokine receptors [1]. After stimulated by cytokines, JAKs become enzymatically active and phosphorylate themselves, and then signal transducer and activator of transcription (STAT) [2, 3], which can act directly as transcription factors or trigger downstream signaling pathways [4]. JAK1 isoform is regulated by more than ten cytokine signals from interferon (IFN γ, α) receptor, gamma common (γc) subfamily and glycoprotein 130 (gp130) receptor families [5, 6]. It can phosphorylate any STAT protein (STAT 1–6) and is ubiquitously expressed in human tissues [7]. The JAK1/STAT signaling pathway dysregulated activity is mainly associated with autoimmune illnesses, acute lymphoblastic leukemia, acute myelogenous leukemia, and solid organ malignancies [811]. The critical role of JAK1 in the above diseases has emerged as an appealing therapeutic target and has inspired the pursuit of JAK1 inhibitors.

Currently, approved JAK1 inhibitors including Ruxolitinib, Tofacitinib, Upadacitinib, Abrocitinib, and numerous additional second-generation inhibitors are now under investigation to treat myelofibrosis, ulcerative colitis, atopic dermatitis, and autoimmune illnesses [1215]. However, challenges remain in developing JAK1 inhibitors, and safety and tolerability issues need to be urgently addressed [2]. Structurally, JAK1 consists of seven distinct structural domains (JH1-JH7) (Fig. 1A), with the C-terminal kinase (JH1) module having an ATP-binding site that is formed by the P-loop, A-loop, hinge region, DFG and αC-helix (Fig. 1B) [16]. All the JAK family members share a highly conserved kinase domain, particularly in the ATP-binding site, giving rise to off-target promiscuity of existing drugs and complicating the achievement of JAK isoform specificity [17, 18]. Nevertheless, conventional synthesis and screening methods are laborious, expensive, and time-consuming. Therefore, developing a robust strategy to screen novel JAK1 inhibitors with high potency is urgently needed.

Fig. 1
figure 1

Crystal structure and active pocket of JAK1 (PDB ID: 3EYG)

Pharmacophore models describe the vital molecular features and their spatial arrangement of ligand–protein interactions and are a fast and efficient method for virtual screening (VS) active drug molecules [19, 20]. Despite the significant advances, the pharmacophore approach still faces several challenges, such as the low efficiency of screening large chemical databases with flexible molecules and high false positive/negative rates due to model quality issues [21, 22]. Machine learning (ML) [e.g., random forest (RF) and support vector machine (SVM)], especially deep neural network (DNN), as one of deep learning, have become popular after giving rise to epochal developments in the fields of computer vision and natural language processing. Its advanced algorithms and inference techniques provide fresh opportunities in various fields of data science, including biomedicine [23, 24]. Compared to traditional ML methods with manually designed features, DNN facilitates the utilization of large data sets by automatically learning features from raw input data and having fewer generalization errors [25]. Recently, sophisticated deep learning methods have been applied in VS due to their high recall and low false-positive rates, and could be combined with other methods to develop more efficient and accurate VS methods to discover novel active molecules [2628]. However, as far as we know, research on ML predictive models for VS of kinase inhibitors was quite limited, and lacked bioactivity validation [5, 20, 29, 30]. Therefore, combining pharmacophore and ML models is necessary to build a powerful integrated model to screen potential JAK1 inhibitors.

In this study, we aimed to develop accurate integrated models to screen potential JAK1 inhibitors from an extensive compound database. To achieve this goal, we collected a highly diverse positive and negative dataset. Based on three molecular descriptors and six ML algorithms, a series of prediction ML models and pharmacophore models were constructed to screen the ZINC database. Additionally, the detailed binding modes structure–activity relationship among the hit molecules and JAK1 protein were elucidated via molecular docking and molecular dynamics (MD) simulation. Finally, some of the screened compounds were validated for in vitro biological activity.

Materials and methods

Data collection and preparation

The dataset of JAK1 inhibitors for VS was retrieved from the ChEMBL database ( (accessed Oct 2022) [31]. After removing the redundancy, 3834 inhibitors (IC50 ≤ 1000 nM) were collected as active set. Besides, 6590 decoys were retrieved from the DUD-E database ( (accessed Oct 2022) included in the inactive datasets [32]. To increase the structural diversity of inactive datasets, the Extended connectivity fingerprints 4 (ECFP4, radius = 2) of 10 million compounds from PubChem database ( (accessed Oct 2022) were calculated and clustered with Discovery Studio 2019 to retrieve 5640 compounds. The 5640 compounds were also included in the inactive set, resulting in total of 12,230 compounds in the inactive database. Finally, the active and inactive databases were randomly divided into test set and training set with ratio of 1:3 (Table 1).

Table 1 Number of compounds in datasets for ML models

Molecular fingerprint calculation

Numerous studies have revealed that the performance of prediction models are closely related to the representation which encode the molecular features for similarity assessment in medicinal chemistry [33]. Chemical fingerprints and structural keys are popular fingerprints for ML models and similarity searching, with RDKit topological fingerprinting (RDK) and ECFP4 classified as the former and Molecular Access Systems (MACCS) as the latter. This study calculated the RDK, ECFP4, and MACCS using the RDKit package in Python 3.7.3 ( A total of 1024 RDK, 1024 ECFP4 and 166 MACCS fingerprint descriptors were calculated for each compound in the datasets, which were used as input for the ML models.

ML model generation

The Tensorflow 2.10.0 and Scikit-learn 1.0.2 packages in Python 3.7.3 were used to implement the DNN and the other ML models [34]. In this part, compounds in the datasets were binary classified by numerous algorithms with different fingerprint descriptors as input (listed below) (Fig. 2).

Fig. 2
figure 2

The flowchart for ML model construction

DNN model

Deep learning model with artificial neural networks as the architecture is a type of ML. As a framework of deep learning, the DNN was implemented to learn the molecular features in our study. The algorithm achieves data classification by performing a series of feature extractions and non-linear transformations on the input. Each neuron in the DNN receives outputs from the previous layer of neurons, multiplies them by weights, and propagates the result to the next layer of neurons, eventually generating the results. DNN of our method consisted of five fully-connected layers: one input layer, three hidden layers, and one output layer. Additionally, dropout was used to constrain each hidden layer to reduce overfitting.

SVM model

As a commonly used ML model, the SVM can be implemented by the SVM module of Scikit-learn. Through this algorithm, each data is mapped in an n-dimensional space, and then a hyperplane is estimated to optimally separate the compounds into active and inactive [35].

KNN model

The k-Nearest Neighbor (KNN) is an instance-based classification algorithm studying the class association of a data point in the feature space [36]. The KNN algorithm transforms input training data into a set of n-dimensional vectors in a multidimensional feature space. When a test vector is fed into the model, it can be assigned to the class that its k-nearest neighbors belong to at most, through the reference of Euclidean distances between the train and test vectors in the feature space. In this study, the Scikit-learn library was also used to implement the algorithm.

LR model

Logistic regression (LR) is another widely used supervised ML method. The algorithm allows predicting the probability of each compound to belong to the active or inactive group by mapping values between 0 and 1 as outputs through an activation function [37]. As a core part of the LR, a sigmoid function is used to realize the above process. The formula for LR is as follows:

$$ {\text{log}}\left[ {\frac{{\text{y}}}{{1 - {\text{y}}}}} \right] = {\text{b}}_{0} + {\text{b}}_{1} {\text{x}}_{1} + {\text{b}}_{2} {\text{x}}_{2} + {\text{b}}_{3} {\text{x}}_{3} + \cdots + {\text{b}}_{{\text{n}}} {\text{x}}_{{\text{n}}} $$

DT model

Decision Tree (DT) is a popular supervised ML method commonly used in both dataset classification and regression [38]. Therefore, the algorithm can also be applied to identify active compounds. With the tree’s structure to separate data, a leaf node, an internal node, a root node, and branches are included in the DT. As one of the standard algorithms used to solve classification problems, the Classification and Regression Tree (CART) algorithm can be applied to construct the DT model by slitting the nodes into sub-nodes on the basis of threshold values of attributes.

RF model

RF is an ensemble method that integrates diverse classifiers to make predictions for problems. Based on RF, ensemble learning can combine decision trees to yield better predictive performance than the other constituent classifiers. The RF can also prevent overfitting by selecting random subsets of training data for each tree and considering random factors at each decision node [39].

ML model evaluation

Applicability domain

All models were developed on a limited number of compounds that do not cover the entire chemical space, and the applicability domain (AD) is the region of the chemical space where the models can accurately forecast new compounds. Principal component analysis (PCA) is a commonly utilized method for feature extraction through data dimensionality reduction. In this study, the optimal principal component (n) was determined by calculating the cumulative variance contribution. Subsequently, the value of n was employed for data dimensionality reduction, ultimately yielding the AD. The above process was implemented by the PCA function in Python’s Scikit-learn 1.0.2 package.

Evaluation metrics

To evaluate model quality, accuracy, precision, recall, F1 score, and Matthews correlation coefficient (Mcc) were calculated (Table 2) [40]. Moreover, AUC, an essential index for model quality assessment, was evaluated by the Python script.

Table 2 Description of the evaluation metrics


Y-randomization is a frequently used method to validate model robustness, which aims to test the random correlation between the dependent and independent variables. In this validation, the dependent variable Y is randomly ordered and a new model is built using the original independent variable matrix X. The process is repeated several times and each estimate of model accuracy and AUC are recorded. In total, 75% of the compounds in the training set were resampled and used for 500 runs of the Y-randomization test.

VS of ZINC database

In this work, 1.6 million compounds were downloaded from the ZINC database ( The RDKit package was used to calculate each compound’s molecular fingerprints. Finally, the ML model with optimal statistical parameters was selected for VS.

Pharmacophore models generation and validation

Hiphop pharmacophore model

The Hiphop algorithm mainly applies to constructing the critical common features from active ligands in the training set. Notably, the ‘Principle’ and ‘MaxOmitFeat’ values are important parameters to discriminate different inhibitors in the Hiphop model [41]. Both range from 0 to 2, the former with larger values correspond to a stronger activity of the inhibitor and the latter correspond to how many features can be missed. All the six JAK1 inhibitors included in the training set were obtained by literature search and energy minimized with Chemistry at Harvard Macromolecular Mechanics (CHARMm) force field [4245] (Fig. 3). The ‘MaxOmitFeat’ value of 0 and the ‘Principle’ value of 2 were assigned to all molecules in the training set [46]. With six inhibitors as input, ten Hiphop pharmacophore models, which include hydrogen bond acceptor (A), hydrogen bond donor (D), aromatic feature (R), and hydrophobic feature (H), could be generated by the ‘common feature pharmacophore model generation’ protocol in Discovery Studio 2019.

Fig. 3
figure 3

Chemical structures of the training set and their IC50 values (Hiphop)

Receptor-ligand pharmacophore model

Different with the Hiphop, the receptor-ligand pharmacophore models were constructed based on the description of receptor-ligand interactions [47]. By employing the ‘receptor-ligand pharmacophore generation’ module within Discovery Studio 2019, ten pharmacophore models, referred to as 6TPF 01–10, were derived from the crystal JAK1-ligand (PDB ID: 6TPF). In addition to the above pharmacophore feature, excluded volume spheres were also considered in the models to describe the interactions between ligands and receptors.

Pharmacophore model validation

To further assess the capability of models to discriminate inhibitors from database, a test database (including 3834 compounds from active dataset and 6590 compounds from DUD-E) was constructed to validate the pharmacophore models. We first examined the cutoff values by the SPSS 18.0 and then precision, recall, F1 Score, and Mcc were also calculated according to the formulas in Table 2.

The pharmacophore model with the optimal statistical parameters was implemented for further screening the hits obtained by the ML model.

Molecular docking

X-ray crystal structure JAK1 receptor (PDB ID: 3EYG, resolution of 1.90 Å) was imported into Discovery Studio 2019 and prepared using the ‘Prepared Protein’ module, including removal of all the water molecules, insertion of missing loops, and addition of hydrogen atoms. The active site was defined as 12.0 Å around the endogenous ligand. The hits were energy minimized by the CHARMm force field [48]. The ligands and prepared protein were imported to implement CDOCKER docking with the parameters set as default. The interaction energy (CDOCKER interaction energy) with higher values corresponds to stronger binding between the ligands and proteins [49].

MD simulation

MD simulations were performed in the GROMACS 2021.6. The software package and all simulation systems utilize the amber force field Amber99SB. Prior to that, the Multiwfn 3.8 (dev) and ORCA 5.0.2 software was used to calculate the resp charges of the ligands, and the Sobtop 1.0 (dev3.1) software was employed to convert it into a general Amber force field (GAFF) force field file that the GROMACS could recognize [5052]. Then each complex was immersed in a water box with a 12 Å buffer of TIP3P water molecules and neutralized with an appropriate number of counterions (Na+ or Cl). Firstly, energy minimization was performed with 1000 iterations of steepest descent and 5000 iterations of conjugate gradient algorithm. Thereafter, the NVT ensemble and NPT ensemble equilibration were carried out at 100 ps under 310 K [53]. In the simulation, the hydrogen bonds were constrained by the LINCS methods, the long-range electrostatic interactions were calculated using the PME method, and the short-range electrostatic and van der Waals interactions were truncated at a distance of 12.0 Å. MD simulations were performed for each of the systems for 50 ns at NPT conditions, and the integration time step was set to 2 fs and trajectories were recorded every 10 ps. Evaluation of molecular structure deviations and atomic flexibility during simulations were estimated using root mean square deviation (RMSD) and root mean square fluctuations (RMSF). Snapshots of these complexes were collected from the equilibrium region (30–50 ns) of the MD simulations. And the binding free energy was calculated by the Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) method. The specific formula is as follows:

$$ \begin{aligned} \Delta G_{bind} = & \,G_{complex} - \left( {G_{protein} + G_{ligand} } \right) \\ = & \,\Delta E_{MM} + \Delta G_{polar} + \Delta G_{nonpolar} - T\Delta S \\ = & \,\Delta H - T\Delta S \\ \end{aligned} $$

where Gcomplex, Gprotein, and Gligand, represent the free energy of the complex, receptor, and ligand, respectively. \({\Delta E}_{MM}\) represents the gas phase interaction energy including van der Waals (\({\Delta E}_{vdw}\)) and electrostatic energy (\({\Delta E}_{ele}\)). \({\Delta G}_{polar}\) and \({\Delta G}_{nonpolar}\) represent the polar and nonpolar solvation free energy. \(\Delta H\) corresponds to the enthalpy of bind, which is usually sufficient for comparing relative binding free energies of structurally similar ligands [54]. − T∆S is the entropy contribution, which was calculated using the interaction entropy (IE) method [55, 56].

JAK1 kinase inhibition assay

The HTRF-based biochemical binding assay was performed to evaluate the inhibitory activities of the obtained compounds against JAK1. Briefly, different potential inhibitors and Tofacitinib were diluted and transferred to Gerinier white assay plate by echo, to get different concentration points in duplicate, which was followed by co-incubation for 30 min at room temperature. Then the Xl665 and antibody detection reagent mixture was added to each well, and the assay plate was incubated for 60 min at room temperature. After incubation, read TR-FRET signal 665/612 on Envision.

Result and discussion

Generation and evaluation of ML models

Eighteen ensemble models were obtained by applying different molecular fingerprints to the corresponding machine learning algorithms.

Applicability domain analysis

Validation of Applicability domain (AD) was carried out on all 18 classification models mentioned above, with results depicted in Fig. 4A. The training set was denoted by blue, while the test set was represented by orange. It can be observed that these two sets exhibit a high degree of overlap in chemical space distribution. Furthermore, all the test set compounds were appropriately placed within the AD, signifying the performance of the prediction model was valid and worthy of trust.

Fig. 4
figure 4

A Applicability domain plot based on ECFP4, RDK, and MACCS. B Comparison of the F1 Score, Mcc, and AUC of the different models. C The accuracy and AUC of Y-randomization models

Evaluation metrics analysis

The F1-score, Mcc, and AUC are the commonly used metrics for the overall discrimination ability to compare models [57]. Specifically, the statistical parameters of all the models were summarized in Table 3 and Fig. 4B. The following points can be noted: (1) All the 18 models showed the satisfactory performance to solve the binary classification problems, which were manifested in accuracy, precision, recall, F1-score, Mcc, and AUC values above 0.92, 0.86, 0.80, 0.83, 0.78, and 0.88, respectively. (2) In terms of fingerprint descriptors, ECFP4 performed slightly better than RDK, which clearly outperforms the MACCS. (3) Among six algorithms evaluated, DNN, RF, and SVM had stronger generalization ability than others, while the DT approach performed worst.

Table 3 Performance of the models based on different combinations of fingerprint descriptors and ML algorithms

When all of the statistical values of the different ML models were compared, the DNN-ECFP4, RF-RDK, and SVM-ECFP4 performed well in our binary classification problem. Collectively, the DNN-ECFP4 model exhibited the best predictive properties (the training precision was 0.9983; the precision, recall, F1-score, Mcc, and AUC of the test set were 0.9764, 0.9937, 0.9850, 0.9803, and 0.9935, respectively) and can be further used for large-scale VS of the ZINC database.

Y-randomization analysis

Finally, additional internal validation with the Y-randomization test was employed to test whether the best model DNN-ECFP4 was correlated by chance. The accuracy and AUC distribution results from 500 iterations were presented in Fig. 4C. Obviously, the correlation coefficient of the optimal model was significantly larger than that of the stochastic model, which indicates that there was a genuine link between the molecular characteristics defined by ECFP4 and compound activity, and that the optimal model was not a result of chance.

DNN-based screening of the ZINC database

The ZINC database was used to identify potential inhibitors against the JAK1 receptor. For each compound, the ECFP4 fingerprint descriptor, as the input, was calculated and stored in CSV format using the RDKit package in Python. After creating a new data frame using the descriptor, potential hit compounds were screened using our DNN-ECFP4 model in the TensorFlow framework. The DNN-ECFP4 model could return the estimated probability (0 ≤ EstPGood ≤ 1) that a compound is in the active class, resulting in 13,976 molecules being identified to be potential JAK1 inhibitors (EstPGood > 0.5).

Pharmacophore models generation and validation

Hiphop pharmacophore model

Six compounds against the JAK1 present in training set were employed to generate qualitative top 10 hypotheses using the Hiphop algorithm. Based on the pharmacophore feature similarities, three clusters were generated in this paper: Cluster I include two models with the combination of six pharmacophore chemical features like 1R, 2H, 1D, and 2A; Cluster II also has two models with the combination of six pharmacophore chemical features like 2H, 2D, and 2A; Cluster III includes six models with the combination of five pharmacophore chemical features like 1R, 1H, 1D, and 2A (Table 4).

Table 4 Performance of the pharmacophore models based on hiphop algorithms

Cluster I-Hiphop1, Cluster II-Hiphop3, and Cluster III-Hiphop5, which was the highest rank score for each cluster, were employed as 3D queries to identify active compounds in the test database. As a result, each pharmacophore model captured partial molecules and assigned the ‘Fitvalue’ to each compound. To further analyze the quality of the pharmacophore model from a statistical perspective, cut-off values, as well as metrics such as precision, recall, F1 Score, and Mcc, were calculated and presented in Table 4. Based on the validation results, Hiphop3 (precision = 0.9020, F1 Score = 0.1929, and Mcc = 0.2377) was able to discriminate active molecules from inactive molecules more effectively using 2.2359 as cutoff values. The Hiphop3 model, including 2H, 2D, and 2A pharmacophore features, shows the best alignment with the compound 4 which has shown in Fig. 5A.

Fig. 5
figure 5

A The Hiphop3 pharmacophore model and it mapping with the compound 4. B The 6TPF 08 pharmacophore model was identified based on the 6TPF complex. Green color indicates A; Cyan and magenta indicate H and D, respectively; Gray color indicates excluded volume

Receptor-ligand pharmacophore model

From the Protein Data Bank (, the co-crystal structure (PDB ID: 6TPF) has been derived and checked to abstract and interpret the mutual interactions between receptor and ligand. As critical starting point, the active pocket of 6TPF was visualized and further analyzed. The ligand could be embedded well in the active site, while the pyrazolopyridinone docked the vicinity of the hinge region and formed hydrogen bonds with Glu 957 and Leu 959. In addition, the NH and difluorocyclopropyl group in the ligand docked near DFG and P-loop in the receptor. The ‘Receptor-Ligand pharmacophore generation’ protocol was used to derive ten pharmacophore models called 6TPF 01–10. The models generated contains D points towards Leu 959, Glu 957 and Asp 1021, A points towards Leu 959, some hydrophobic feature and excluded volume spheres, which recapitulates the mutual interactions between receptor and ligand well.

Similarly, the external test set was employed to identify the pharmacophore models, and the multiple statistical parameters of each model are manifested in Table 5. The 6TPF 08 model with a probability cutoff of 1.5159 has been selected for screening as it exhibits optimal statistical parameters, including precision of 0.8152, F1 score of 0.3547, and Mcc of 0.3131. In addition, the model consists of two D points towards leu 959 and Asp 1021, one A point towards Leu 959, one hydrophobic feature, and 23 excluded volume spheres (Fig. 5B).

Table 5 Performance of the different receptor-ligand pharmacophore models

Pharmacophore models-based screening

With the 13,976 hits as input, the ‘Build 3D database’ protocol generated different conformations of each hit in Discovery Studio 2019. Then the Hiphop3 and 6TPF 08 pharmacophore models were employed as 3D queries to screen above molecules further with the search method of best. According to the results, the screening process using the Hiphop3 model only yielded 254 molecules (Fitvalue > 2.2359). Additionally, 972 molecules matched all the chemical features of 6TPF 08 (Fitvalue > 1.5159). Overall, 113 compounds could simultaneously match the pharmacophore features of the two models. Further analysis revealed that all 113 compounds are also located within the AD of the DNN-ECFP4 model (Additional file 1: Fig. S1).

Docking and visual inspection

Molecular docking allows a visual understanding of protein–ligand interactions at the molecular level to evaluate the stability and binding affinity of their docking complexes. Prior to docking, the endogenous ligand Tofacitinib was extracted from JAK1 (PDB ID:3EYG; resolution 1.90 Å) crystals and redocked into the active pocket of this receptor. The RMSD value of 1.25 Å was calculated for the re-docked pose with respect to the co-crystallized ligand, which was below the 2.00 Å threshold, confirming the accuracy of the docking protocols and parameters. The 113 compounds obtained from the previous step were docked into the active pocket of the JAK1 protein [58]. To screen out molecules that fit into the protein active pocket well, the following criteria have been applied: (i) The shape between the protein’s active pocket and ligand is complementary; (ii) The main skeleton of the hit compound could dock at the vicinity of the hinge region and formed the hydrogen bonds with Glu957 and Leu959; (iii) Calculated CDOCKER interaction energy of hit compound was less than − 35 kcal mol−1. A total of 13 compounds (i.e., 11.5%) fulfilled the above criteria (Table 6). In general, the CDOCKER interaction energy of the hit compounds ranged from − 38.98 – − 71.20 kcal mol−1, mostly higher than that of the co-crystal ligand Tofacitinib (− 45.19 kcal mol−1), suggesting that they have a high binding affinity for JAK1.

Table 6 Chemical structures and CDOCKER interaction energy of the screened hits and Tofacitinib

Moreover, all the 13 hit compounds and Tofacitinib could dock into the JAK1 active site and mainly form the hydrogen bonds with Leu881, Glu957, Leu959, Ser963, Glu966, Arg1007 and Asn1008. The roles of most residues have been reported in previous studies [42, 59, 60]. Tofacitinib assumed a favorable conformation within the active pocket of JAK1, with its t-butyl group directed towards the P-loop and its pyrimidopyrrole moiety buried in the vicinity of hinge region (Fig. 6A). Z-01, exhibiting the lowest CDOCKER interaction energy, was sandwiched within the active site and oriented such that its main skeleton was buried in a deep hydrophobic pocket and its morpholine group pointed towards the P-loop (Fig. 6B). Additionally, the Z-01 could form four hydrogen bonds with Glu957, Leu959 of the hinge and Asn1008. The ligands Z-02 and Z-03 demonstrated binding modes similar to that of Z-01 (Fig. 6C-D). Compound Z-08 shared analogous binding modes with the Tofacinitb, Z-04, Z-05 and Z-07. Its pyrimidopyrrole core docked near the hinge region and formed stable hydrogen bonding with Glu957 and Leu959 at a bond distance of 2.1 Å and 2.8 Å, respectively (Fig. 6I). Furthermore, its methyl group pointed towards the P-loop and formed Pi-alkyl interactions with His885 at a bond distance of 4.9 Å. With the fluorobenzene ring oriented towards the solvent-exposed region, the ligand Z-10 could be inserted into the ATP-binding cleft of JAK1 and establish hydrogen bonding with Glu957, Leu959, Arg1007, and Glu883 at a bond distance of 2.1 Å, 2.1 Å, 2.5 Å, and 2.1 Å, respectively (Fig. 6K). Moreover, Z-06, Z-09, Z-11, Z-12, and Z-13 displayed the comparable binding modes to that of Z-10 (Fig. 6G, J, L-N). Finally, the swissADME ( was employed to calculate the ADME values of the screened compounds, and all 13 compounds meet Lipinsk’s rule (RO5 = 0) (Table 6, Additional file 1: Table S1).

Fig. 6
figure 6

AN Binding mode of the Tofacitinib and top 13 compounds in the active site of JAK1 (PDB ID: 3EYG)

MD simulations analysis

MD simulations and molecular docking have been considered two complementary strategies for understanding the mutual interactions between receptors and ligands [61]. MD simulations can verify the plausibility of docking results and monitor the time-resolved motion of macromolecules. We carried out MD simulations of the 13 ligand–protein docking complexes mentioned above and Tofacitinib-protein based on the same simulation parameters to evaluate the stability of the complex interactions.

The RMSD between initiated docked poses and ligands in the 14 systems was calculated and plotted against time (Fig. 7A–C). In the initial phase, marginal variations in most systems had been noticed. However, the fluctuations in the ligand trajectories smoothed out after 10 ns (20 ns for Z-01 and Z-03), and all complexes had RMSD values of less than 2.5 Å (Tofacitinib < 1.0 Å; Z-02, 04–08,12–13 < 2.0 Å), indicating that they were stably binding in the hydrophobic pocket of JAK1. Of these, compound Z-08 had the minimum fluctuation after converging, whereas Z-10 showed the maximum fluctuation. Further, trajectory analysis of the ligands was conducted to explore ligand conformation changes during MD simulation. After undergoing a certain degree of relative conformational change, the hydroxyl group on the furan ring of Z-08 could form a stable hydrogen bond with Asp1021 in the DFG, generating a stable pose of Z-08 in the active site with low fluctuation (Additional file 1: Fig. S2A). The binding core of Z-10 exhibited high stability in the hinge region, while the terminal phenyl ring displayed significant flexibility with conformational changes over time. Furthermore, Z-10 alternately approached Gly883 and Arg1007 during the process of conformational changes. Despite the considerable fluctuation in the ligand’s RMSD values, the Z-10 remained relatively stable in the protein’s active site during the simulation (Additional file 1: Fig. S2B).

Fig. 7
figure 7

AC RMSD plots of the Tofacitinib and top 13 compounds bound to the JAK1 protein. DG RMSF plots of the proteins in the 14 systems

The RMSF of the Cα atoms in 14 MD trajectories was computed to characterize the flexibility of the protein during the simulation. Generally, the flexibility of amino acid residues, especially those near the active pocket, could decrease after the formation of the ligand–protein complex, which in turn reflects the binding affinity between protein and ligand. The RMSF plots of the 14 complexes are given in Fig. 7D–G. Overall, the RMSF plots of 13 hits followed a similar trend to that of Tofacitinib. Besides, the residues around the active site (e.g., 956–964 and 1019–1024), especially the binding residues, demonstrated higher stability, whereas P-loop and other loops (895–903, 946–951, and 1094–1099) in which Gly898, Gly949 and Gly1097 are located experienced a dramatic tensile deformation. Specifically, the critical residues from the hinge region displayed low flexibility, with RMSF values of less than 1.0 Å. Compared to the other complexes, contraction at the P-loop made the region more stable in the Z-08-protein, Z-12-protein, and Tofacitinib-protein which could be explained by the result of the molecular docking study. Additionally, the Z-12-protein complex increased residues Gly898’s flexibility and experienced a dramatic stretching deformation in this region with the RMSF > 4.0 Å, while the other complexes’ RMSF < 3.0 Å. In contrast, the binding of Z-04 and Z-06 to the JAK1 protein induced the enhancement of molecular rigidity at the loop in which Gly949 is located. For the Z-06-protein complexes, the loop in which Gly1097 is located exhibited high flexibility with the RMSF values of more than 4.5 Å.

As hydrogen bonding interaction (H-bond) is an essential contributor to ligand–protein binding, the number of H-bond was also evaluated during the dynamical shift of the compound at the active site (Fig. 8). The average number of H-bond forming between Z-13 and JAK1 protein was more than that of other compounds with the value of four. Besides, on average, Z-01, Z-04, Z-05, Z-08, Z-10, and Z-12 formed three hydrogen bonds during the dynamical shift, while that of the other compounds was two.

Fig. 8
figure 8

AF H-bond plots of the Tofacitinib and top 13 compounds in the MD simulations

The above results fully confirmed that there were constantly stable hydrogen bonds and no major conformational changes in the ligands and key residues during the MD simulation, suggesting the reliability and stability of the docking study.

Binding free energy analysis

To analyze the binding affinities of 13 hits, the MM-PBSA method was employed to calculate the binding free energies (ΔGbind) of the systems. As shown in Table 7, the free binding energy of Tofacitinib was − 89.375 kJ mol-1, which was inferior to Z-1, Z-2, Z-13 and superior to the other compounds. Besides, the ΔGbind of compounds Z-01, Z-02, Z-08, Z-10, Z-13, and Tofacitinib in JAK1 were less than − 80 kJ mol−1, while that of the other compounds were less than − 40 kJ mol−1. The results implied that firmly binding complexes could be formed between each compound and JAK1, while Z-01, Z-02, Z-08, Z-10, and Z-13 might have a stronger binding ability than the other compounds. In all 14 systems, ΔEvdw, ΔEele, and ΔGnonpolar (negative values) contributes to binding affinities between ligand and protein, whereas the ΔGpolar provides an unfavourable contribution to the total binding free energy. Interestingly, the ΔGnonpolar of all the 14 systems were numerically similar. The polar solvation and electrostatic interactions can compensate for each other in a vacuum. Considering the sum of the ΔEele and ΔGpolar, the value of Z-07 was significantly higher than that of the other compounds, which is unfavorable for the total binding free energy. Furthermore, in absolute value the ΔEvdw of Z-03 and Z-09 was relatively low with respect to the other compounds, which therefore contributes less to ΔGbind. Hence, the van der Waals interaction, polar solvation interaction and electrostatic interaction might significantly contribute to the binding affinities of these compounds.

Table 7 Complex combined with free energy analysis (kJ mol−1)

Besides, the per-residue energy decomposition of the top 9 hits with higher binding free energy was presented in Additional file 1: Fig. S3. and the Val889, Glu957, Phe958, Leu959, and Leu1010 can be identified as critical residues by combining these data with the docking results.

Evaluation of JAK1 inhibitors’ kinase activity

To further verify the screening results, four purchasable compounds (Z-05, Z-08, Z-10, Z-12) were obtained from ChemBridge for the kinase assay (Additional file 1: Table S2). As presented in Fig. 9, the JAK1 kinase inhibition assay revealed the Z-10 as the best inhibition activity against JAK1 (IC50 = 194 nM) among all the tested compounds. The kinase assay results further strengthened these compounds’ ability to inhibit the JAK1 kinase activity and the reliability of our screening method.

Fig. 9
figure 9

AE IC50 of Z-05, Z-08, Z-10, Z-12, and Tofacitinib toward the JAK1


In our study, a dataset consisting of 3834 JAK1 inhibitors and 12,230 decoys was collected, and 18 classification models were constructed using a combination of three molecular descriptors and six ML algorithms. When comparing between the different descriptors and algorithms, the classification effect of ECFP4 and RDK was close and noticeably stronger than that of MACCS; DNN, RF, and SVM had the stronger generalization ability than the other algorithms. The best classifier DNN-ECFP4 based on DNN and ECFP4 achieved an accuracy of 0.9928 and an AUC of 0.9935 for the test set. Furthermore, two pharmacophore models were constructed and identified based on different algorithms. Combining the DNN-ECFP4 model and the pharmacophore models, the ZINC database was screened, followed by further selection based on CDOCKER to hit the top 13 compounds. The MD and free energy calculation were employed to further confirm the interaction strength and stability of all 13 hits and the receptor. Moreover, we demonstrated the enzyme inhibition activities of purchasable compounds in vitro. As a result, all purchasable compounds Z-05, Z-08, Z-10, and Z-12 exhibited more potent inhibitory activity against JAK1 (IC50 < 10,000 nM). Significantly, IC50 of the most active compound Z-10 against JAK1 was 194.9 nM. In this study, the DNN model exhibited notably higher screening efficiency compared to traditional pharmacophore and molecular docking screening methods. Incorporating the DNN model as the initial step of the screening program and combining it with subsequent screening steps can significantly enhance both the rate and scope of screening. Besides, the hit compounds can be further studied, including the in vitro/vivo studies and structural modification.

Availability of data and materials

The authors confirm that the data supporting the findings of this study are available within the article [and/or] its additional file.


  1. Schwartz DM, Kanno Y, Villarino A, Ward M, Gadina M, O’Shea JJ. JAK inhibition as a therapeutic strategy for immune and inflammatory diseases. Nat Rev Drug Discov. 2017;16(12):843–62.

    Article  CAS  PubMed  Google Scholar 

  2. Kavanagh ME, Horning BD, Khattri R, Roy N, Lu JP, Whitby LR, et al. Selective inhibitors of JAK1 targeting an isoform-restricted allosteric cysteine. Nat Chem Biol. 2022;18(12):1388–98.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Park E, Lee SJ, Moon H, Park J, Jeon H, Hwang JS, et al. Discovery and biological evaluation of N-methyl-pyrrolo[2,3-b]pyridine-5-carboxamide derivatives as JAK1-selective inhibitors. J Med Chem. 2021;64(2):958–79.

    Article  CAS  PubMed  Google Scholar 

  4. Spinelli FR, Colbert RA, Gadina M. JAK1: number one in the family; number one in inflammation? Rheumatology. 2021;60(Suppl 2):ii3–10.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Keretsu S, Ghosh S, Cho SJ. Computer aided designing of novel pyrrolopyridine derivatives as JAK1 inhibitors. Sci Rep. 2021;11(1):23051.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Meyer SC, Levine RL. Molecular pathways: molecular basis for sensitivity and resistance to JAK kinase inhibitors. Clin Cancer Res. 2014;20(8):2051–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Glassman CR, Tsutsumi N, Saxton RA, Lupardus PJ, Jude KM, Garcia KC. Structure of a Janus kinase cytokine receptor complex reveals the basis for dimeric activation. Science. 2022;376(6589):163–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Quintás-Cardama A, Vaddi K, Liu P, Manshouri T, Li J, Scherle PA, et al. Preclinical characterization of the selective JAK1/2 inhibitor INCB018424: therapeutic implications for the treatment of myeloproliferative neoplasms. Blood. 2010;115(15):3109–17.

    Article  PubMed  PubMed Central  Google Scholar 

  9. Gruber CN, Calis JJA, Buta S, Evrony G, Martin JC, Uhl SA, et al. Complex autoinflammatory syndrome unveils fundamental principles of JAK1 kinase transcriptional and biochemical function. Immunity. 2020;53(3):672-684.e611.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Pardanani A, Gotlib J, Roberts AW, Wadleigh M, Sirhan S, Kawashima J, et al. Long-term efficacy and safety of momelotinib, a JAK1 and JAK2 inhibitor, for the treatment of myelofibrosis. Leukemia. 2018;32(4):1035–8.

    Article  CAS  PubMed  Google Scholar 

  11. Kurtz SE, Eide CA, Kaempf A, Mori M, Tognon CE, Borate U, et al. Dual inhibition of JAK1/2 kinases and BCL2: a promising therapeutic strategy for acute myeloid leukemia. Leukemia. 2018;32(9):2025–8.

    Article  PubMed  PubMed Central  Google Scholar 

  12. Kawamura H, Matsumoto K, Shigemi A, Orita M, Nakagawa A, Nozima S, et al. A bundle that includes active surveillance, contact precaution for carriers, and cefazolin-based antimicrobial prophylaxis prevents methicillin-resistant Staphylococcus aureus infections in clean orthopedic surgery. Am J Infect Control. 2016;44(2):210–4.

    Article  PubMed  Google Scholar 

  13. Sandborn WJ, Su C, Panes J. Tofacitinib as induction and maintenance therapy for ulcerative colitis. N Engl J Med. 2017;377(5):496–7.

    Article  PubMed  Google Scholar 

  14. McInnes IB, Anderson JK, Magrey M, Merola JF, Liu Y, Kishimoto M, et al. Trial of upadacitinib and adalimumab for psoriatic arthritis. N Engl J Med. 2021;384(13):1227–39.

    Article  CAS  PubMed  Google Scholar 

  15. Reich K, Thyssen JP, Blauvelt A, Eyerich K, Soong W, Rice ZP, et al. Efficacy and safety of abrocitinib versus dupilumab in adults with moderate-to-severe atopic dermatitis: a randomised, double-blind, multicentre phase 3 trial. Lancet. 2022;400(10348):273–82.

    Article  CAS  PubMed  Google Scholar 

  16. Shawky AM, Almalki FA, Abdalla AN, Abdelazeem AH, Gouda AM. A comprehensive overview of globally approved JAK inhibitors. Pharmaceutics. 2022;14(5):1001.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. McLornan DP, Pope JE, Gotlib J, Harrison CN. Current and future status of JAK inhibitors. Lancet. 2021;398(10302):803–16.

    Article  PubMed  Google Scholar 

  18. Xin P, Xu X, Deng C, Liu S, Wang Y, Zhou X, et al. The role of JAK/STAT signaling pathway and its inhibitors in diseases. Int Immunopharmacol. 2020;80:106210.

    Article  CAS  PubMed  Google Scholar 

  19. Ghoshal N, Vijayan RS. Pharmacophore models for GABA(A) modulators: implications in CNS drug discovery. Expert Opin Drug Discov. 2010;5(5):441–60.

    Article  CAS  PubMed  Google Scholar 

  20. Heider J, Kilian J, Garifulina A, Hering S, Langer T, Seidel T. Apo2ph4: a versatile workflow for the generation of receptor-based pharmacophore models for virtual screening. J Chem Inf Model. 2023;63(1):101–10.

    Article  CAS  PubMed  Google Scholar 

  21. Yang SY. Pharmacophore modeling and applications in drug discovery: challenges and recent advances. Drug Discov Today. 2010;15(11–12):444–50.

    Article  CAS  PubMed  Google Scholar 

  22. Macalino SJ, Gosu V, Hong S, Choi S. Role of computer-aided drug design in modern drug discovery. Arch Pharm Res. 2015;38(9):1686–701.

    Article  CAS  PubMed  Google Scholar 

  23. Rifaioglu AS, Atas H, Martin MJ, Cetin-Atalay R, Atalay V, Doğan T. Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief Bioinform. 2019;20(5):1878–912.

    Article  CAS  PubMed  Google Scholar 

  24. Zou J, Huss M, Abid A, Mohammadi P, Torkamani A, Telenti A. A primer on deep learning in genomics. Nat Genet. 2019;51(1):12–8.

    Article  CAS  PubMed  Google Scholar 

  25. Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol. 2022;23(1):40–55.

    Article  CAS  PubMed  Google Scholar 

  26. Kimber TB, Chen Y, Volkamer A. Deep learning in virtual screening: recent applications and developments. Int J Mol Sci. 2021;22(9):4435.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Kumari M, Subbarao N. Deep learning model for virtual screening of novel 3C-like protease enzyme inhibitors against SARS coronavirus diseases. Comput Biol Med. 2021;132:104317.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Xiao T, Qi X, Chen Y, Jiang Y. Development of ligand-based big data deep neural network models for virtual screening of large compound libraries. Mol Inform. 2018;37(11):e1800031.

    Article  PubMed  Google Scholar 

  29. Lien ST, Lin TE, Hsieh JH, Sung TY, Chen JH, Hsu KC. Establishment of extensive artificial intelligence models for kinase inhibitor prediction: Identification of novel PDGFRB inhibitors. Comput Biol Med. 2023;156:106722.

    Article  CAS  PubMed  Google Scholar 

  30. Babu S, Nagarajan SK, Sathish S, Negi VS, Sohn H, Madhavan T. Identification of potent and selective JAK1 lead compounds through ligand-based drug design approaches. Front Pharmacol. 2022;13:837369.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, et al. The ChEMBL database in 2017. Nucleic Acids Res. 2016;45(D1):D945–54.

    Article  PubMed  PubMed Central  Google Scholar 

  32. Mysinger MM, Carchia M, Irwin JJ, Shoichet BK. Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J Med Chem. 2012;55(14):6582–94.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Maggiora G, Vogt M, Stumpfe D, Bajorath J. Molecular similarity in medicinal chemistry. J Med Chem. 2014;57(8):3186–204.

    Article  CAS  PubMed  Google Scholar 

  34. Bac J, Mirkes EM, Gorban AN, Tyukin I, Zinovyev A. Scikit-dimension: a python package for intrinsic dimension estimation. Entropy. 2021;23(10):1368.

    Article  PubMed  PubMed Central  Google Scholar 

  35. Zhang YH, Li H, Zeng T, Chen L, Li Z, Huang T, et al. Identifying transcriptomic signatures and rules for SARS-CoV-2 infection. Front Cell Dev Biol. 2020;8:627302.

    Article  PubMed  Google Scholar 

  36. Rezaie H, Asadollahi-Baboli M, Hassaninejad-Darzi SK. Hybrid consensus and k-nearest neighbours (kNN) strategies to classify dual BRD4/PLK1 inhibitors. SAR QSAR Environ Res. 2022;33(10):779–92.

    Article  CAS  PubMed  Google Scholar 

  37. Wu X, Kumar V, Ross Quinlan J, Ghosh J, Yang Q, Motoda H, et al. Top 10 algorithms in data mining. Knowl Inf Syst. 2008;14:1–37.

    Article  Google Scholar 

  38. Podgorelec V, Kokol P, Stiglic B, Rozman I. Decision trees: an overview and their use in medicine. J Med Syst. 2002;26(5):445–63.

    Article  PubMed  Google Scholar 

  39. Corps J, Rekik I. Morphological Brain Age prediction using multi-view brain networks derived from cortical morphology in healthy and disordered participants. Sci Rep. 2019;9(1):9676.

    Article  PubMed  PubMed Central  Google Scholar 

  40. Li X, Li Z, Wu X, Xiong Z, Yang T, Fu Z, et al. Deep learning enhancing kinome-wide polypharmacology profiling: model construction and experiment validation. J Med Chem. 2020;63(16):8723–37.

    Article  CAS  PubMed  Google Scholar 

  41. Lin W, Zheng X, Fang D, Zhou S, Wu W, Zheng K. Identifying hQC inhibitors of alzheimer’s disease by effective customized pharmacophore-based virtual screening, molecular dynamic simulation, and binding free energy analysis. Appl Biochem Biotechnol. 2019;187(4):1173–92.

    Article  CAS  PubMed  Google Scholar 

  42. Grimster NP, Anderson E, Alimzhanov M, Bebernitz G, Bell K, Chuaqui C, et al. Discovery and optimization of a novel series of highly selective JAK1 kinase inhibitors. J Med Chem. 2018;61(12):5235–44.

    Article  CAS  PubMed  Google Scholar 

  43. Su Q, Banks E, Bebernitz G, Bell K, Borenstein CF, Chen H, et al. Discovery of (2R)-N-[3-[2-[(3-Methoxy-1-methyl-pyrazol-4-yl)amino]pyrimidin-4-yl]-1H-indol-7-yl]-2-(4-methylpiperazin-1-yl)propenamide (AZD4205) as a potent and selective janus kinase 1 inhibitor. J Med Chem. 2020;63(9):4517–27.

    Article  CAS  PubMed  Google Scholar 

  44. Kulagowski JJ, Blair W, Bull RJ, Chang C, Deshmukh G, Dyke HJ, et al. Identification of imidazo-pyrrolopyridines as novel and potent JAK1 inhibitors. J Med Chem. 2012;55(12):5901–21.

    Article  CAS  PubMed  Google Scholar 

  45. Vazquez ML, Kaila N, Strohbach JW, Trzupek JD, Brown MF, Flanagan ME, et al. Identification of N-{cis-3-[Methyl(7H-pyrrolo[2,3-d]pyrimidin-4-yl)amino]cyclobutyl}propane-1-sulfonamide (PF-04965842): a selective JAK1 clinical candidate for the treatment of autoimmune diseases. J Med Chem. 2018;61(3):1130–52.

    Article  CAS  PubMed  Google Scholar 

  46. Sakkiah S, Thangapandian S, John S, Lee KW. Pharmacophore based virtual screening, molecular docking studies to design potent heat shock protein 90 inhibitors. Eur J Med Chem. 2011;46(7):2937–47.

    Article  CAS  PubMed  Google Scholar 

  47. Ren JX, Zhang RT, Zhang H. Identifying novel ATX inhibitors via combinatory virtual screening using crystallography-derived pharmacophore modelling, docking study, and QSAR analysis. Molecules. 2020;25(5):1107.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Al-Balas QA, Amawi HA, Hassan MA, Qandil AM, Almaaytah AM, Mhaidat NM. Virtual lead identification of farnesyltransferase inhibitors based on ligand and structure-based pharmacophore techniques. Pharmaceuticals. 2013;6(6):700–15.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Rampogu S, Zeb A, Baek A, Park C, Son M, Lee KW. Discovery of potential plant-derived peptide deformylase (PDF) inhibitors for multidrug-resistant bacteria using computational studies. J Clin Med. 2018;7(12):563.

    Article  PubMed  PubMed Central  Google Scholar 

  50. Lu T, Chen F. Multiwfn: a multifunctional wavefunction analyzer. J Comput Chem. 2012;33(5):580–92.

    Article  PubMed  Google Scholar 

  51. Neese F. Software update: The ORCA program system—Version 5.0. WIREs Comput Mol Sci. 2022;12(5):e1606.

    Article  Google Scholar 

  52. Lu T. Sobtop, Version [1.0(dev3.1)]. Available online: Accessed 10 Oct 2022.

  53. Sun L, Wang Z, Yang Z, Liu X, Dong H. Virtual screening and structure-activity relationship study of novel BTK inhibitors in traditional Chinese medicine for the treatment of rheumatoid arthritis. J Biomol Struct Dyn. 2023.

    Article  PubMed  Google Scholar 

  54. Wang E, Sun H, Wang J, Wang Z, Liu H, Zhang JZH, et al. End-point binding free energy calculation with MM/PBSA and MM/GBSA: strategies and applications in drug design. Chem Rev. 2019;119(16):9478–508.

    Article  CAS  PubMed  Google Scholar 

  55. Duan L, Liu X, Zhang JZ. Interaction entropy: a new paradigm for highly efficient and reliable computation of protein-ligand binding free energy. J Am Chem Soc. 2016;138(17):5722–8.

    Article  CAS  PubMed  Google Scholar 

  56. Li Y, Cong Y, Feng G, Zhong S, Zhang JZH, Sun H, et al. The impact of interior dielectric constant and entropic change on HIV-1 complex binding free energy prediction. Struct Dyn. 2018;5(6):064101.

    Article  PubMed  PubMed Central  Google Scholar 

  57. Zhao B, Zhai H, Shao H, Bi K, Zhu L. Potential of vibrational spectroscopy coupled with machine learning as a non-invasive diagnostic method for COVID-19. Comput Methods Programs Biomed. 2023;229:107295.

    Article  PubMed  Google Scholar 

  58. Williams NK, Bamert RS, Patel O, Wang C, Walden PM, Wilks AF, et al. Dissecting specificity in the Janus kinases: the structures of JAK-specific inhibitors complexed to the JAK1 and JAK2 protein tyrosine kinase domains. J Mol Biol. 2009;387(1):219–32.

    Article  CAS  PubMed  Google Scholar 

  59. Hansen BB, Jepsen TH, Larsen M, Sindet R, Vifian T, Burhardt MN, et al. Fragment-based discovery of pyrazolopyridones as JAK1 inhibitors with excellent subtype selectivity. J Med Chem. 2020;63(13):7008–32.

    Article  CAS  PubMed  Google Scholar 

  60. Zak M, Hanan EJ, Lupardus P, Brown DG, Robinson C, Siu M, et al. Discovery of a class of highly potent janus kinase 1/2 (JAK1/2) inhibitors demonstrating effective cell-based blockade of IL-13 signaling. Bioorg Med Chem Lett. 2019;29(12):1522–31.

    Article  CAS  PubMed  Google Scholar 

  61. Zhao Y, Yang H, Wu F, Luo X, Sun Q, Feng W, et al. Exploration of N-arylsulfonyl-indole-2-carboxamide derivatives as novel fructose-1,6-bisphosphatase inhibitors by molecular simulation. Int J Mol Sci. 2022;23(18):10259.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

Download references


We gratefully acknowledge the invaluable contributions of all individuals involved in this research.


This work was no external financial support was received.

Author information

Authors and Affiliations



ZW: conceptualization, methodology, data curation, and writing the original draft. LS: methodology, software, validation, and data curation. YX: supervision and software. PL: validation, and data acquisition. KX: software, analysis, and interpretation of data. HJ: supervision, methodology, writing and review. All authors have read and agreed to the submission of the manuscript.

Corresponding authors

Correspondence to Zixiao Wang or Jing Huang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1.

The ADMET parameters of the screened hits. Table S2. The information of four purchasable compounds. Figure S1. The AD of DNN-ECFP4 model and the chemical space of hits. Figure S2. The initial (pink sticks) and final (yellow sticks) conformations of compounds Z-08 (A) and Z-10 (B). Figure S3. The graph of the binding free energy decomposition per residue for JAK1-inhibitor complexes.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, Z., Sun, L., Xu, Y. et al. Discovery of novel JAK1 inhibitors through combining machine learning, structure-based pharmacophore modeling and bio-evaluation. J Transl Med 21, 579 (2023).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI:


  • Janus kinase 1
  • Machine learning
  • Pharmacophore
  • Molecular dynamics simulations
  • Virtual screening