Prediction experiments
Human drug-target interactions database
In this study, we use the DrugBank established by Wishart et al. as the benchmark dataset, which can be downloaded at https://www.drugbank.ca [20]. The chemical structure of each drug in SMILES format is extracted from and extracted from DrugBank. In the experiments, only those that satisfied the human target represented by a unique EnsemblProt login number were used. In detail, 904 drugs and 613 unique human targets (proteins) were linked to construct a DTI network \(A\) as positive samples, and a matching number of unknown drug-target pairs (by excluding all known DTIs) were randomly selected as negative samples. The labels of training set and testing set are binary label.
Feature representation
Gaussian interaction profile kernel similarity for drugs and targets
On the basis of previous work, drug similarity can be measured by calculating nuclear similarity through Gaussian interaction profile (GIP) kernel similarity [21, 22]. The GIP similarity between drug \({d}_{i}\) and drug \({d}_{j}\) is defined as follow:
$$ {\text{D}}_{sim} \left( {d_{i} ,d_{j} } \right) = exp\left( { - \tau_{d} *\left| {\left| {V\left( {d_{i} } \right) - V\left( {d_{j} } \right)} \right|} \right|^{2} } \right) $$
(1)
where the binary vector \(V\left({d}_{i}\right)\) and \(V\left({d}_{j}\right)\) is the i-th row vector and the j-th row vector of the drug-target interaction network \(A\). The parameter \({\tau }_{d}\) is the kernel bandwidth. It computes by normalizing original parameter \(\tau_{d} ^{\prime}\):
$$ \tau_{d} = \frac{{\tau^{\prime}_{d} }}{{\frac{1}{{n_{d} }}\mathop \sum \nolimits_{i = 1}^{{n_{d} }} \left| {\left| {V\left( {d_{i} } \right)} \right|} \right|^{2} }} $$
(2)
Similarly, the GIP similarity for targets can be defined as follows:
$$ {\text{D}}_{sim} \left( {d_{i} ,d_{j} } \right) = exp\left( { - \tau_{d} *\left| {\left| {V\left( {p_{i} } \right) - V\left( {p_{j} } \right)} \right|} \right|^{2} } \right) $$
(3)
where the binary vector \(V\left({p}_{i}\right)\) and \(V\left({p}_{j}\right)\) is the i-th row vector and the j-th column vector of the drug-target interaction network \(A\). The parameter \({\tau }_{p}\) is the kernel bandwidth. It computes by normalizing original parameter \(\tau_{p} ^{\prime}\):
$$ \tau_{p} = \frac{{\tau^{\prime}_{p} }}{{\frac{1}{{n_{p} }}\mathop \sum \nolimits_{i = 1}^{{n_{p} }} \left| {\left| {V\left( {p_{i} } \right)} \right|} \right|^{2} }} $$
(4)
Protein sequence feature
The sequences for drug targets (proteins) in Homo sapiens downloaded from the String database ( https://string-db.org/) [24]. The k-mer algorithm is used to count Subsequence information in protein sequences and uses it as a feature vector to solve the alignment problem posed by differences in sequence length [24].
Drug structure feature
The SMILES for drugs downloaded from the DrugBank database. We use Morgan fingerprint, a circular fingerprint, to map the structure information of drugs to feature vectors.
Graph embedding-based feature for drugs and targets
Graph data is rich in behavioral information about nodes, and behavioral information can be used as a descriptor to describe drugs and targets that can be more comprehensive description of the characteristics [25]. So how do we map a high-dimensional dense matrix like graph data to a low-density vector? Here we introduce the Graph Factorization algorithm [26]. Graph factorization (GF) is a method for graph embedding with time complexity O(|E|). To obtain the embedding, GF factorizes the adjacency matrix of the graph to minimize the loss functions as follow:
$$ \varepsilon \left( {P,Q,\lambda } \right) = \frac{1}{2}\mathop \sum \limits_{{\left( {i,j} \right) \in E}} \left( {P_{ij} - \left\langle {Q_{i} ,Q_{j} } \right\rangle } \right)^{2} + \frac{\lambda }{2}\mathop \sum \limits_{i} \left| {\left| {Q_{i} } \right|} \right|^{2} $$
(5)
where \(\lambda \) is the regularization coefficient. \(P\) and \(Q\) are the adjacency matrix with weights and factor matrix, respectively. \(E\) is the set of edges, which includes i and j.
The gradient of the function \(\varepsilon \) with respect to \({Q}_{i}\) is defined as follow:
$$ \frac{\partial \varepsilon }{{\partial Q_{i} }} = - \mathop \sum \limits_{{k \in N_{o} }} \left( {P_{ij} - \left\langle {Q_{i} ,Q_{j} } \right\rangle } \right)Q_{j} + \lambda Q_{i} $$
(6)
where \({N}_{o}\) is the set of neighbors of node \(o\). With the Graph Factorization algorithm, graph embeddings of drugs and targets in the drug-target interaction network can be obtained to describe their behavioral information.
Stacked autoencoder
As DLDTI integrates heterogeneous data from multiple sources, including protein sequence information, drug structure information, and drug-target interaction network information, the integrated biological data suffers from noise, incomplete and high-dimensional. Here, the stack autoencoder (SAE) is introduced to find the optimal mapping of drug space to target space to obtain low dimensional drug Feature vector [27, 28]. SAE can be defined as follows:
$$ y = f\left( x \right) = S_{e} \left( {W + b} \right) $$
(7)
$$ z = g\left( y \right) = S_{d} \left( {W^{\prime}y + b^{\prime}} \right) $$
(8)
where \(y\) and \(z\) are encoding function and decoding function respectively. \(W\) and \({W}^{^{\prime}}\) are the relational parameters between two layers. \(b\) and \({b}^{^{\prime}}\) are vectors of bias parameters. The activation function used is ReLU:
$$ S_{e} \left( t \right) = S_{d} \left( t \right) = max\left( {0,W^{T} + b} \right) $$
(9)
Convolutional neural network
Lecun et al. proposed convolutional neural networks in 1989 [29]. Subsequently, they have performed well in tasks such as image classification, sentence classification, and biological data analysis. Thus, in this study, convolutional neural networks were used to train supervised learning models to predict potential DTIs. In this work, convolutional neural networks were chosen as supervised learning models to learn deep features and predict potential DTIs. The model used includes convolutional and activation layers, a Maxpooling layer, a fully connected layer and a softmax layer. Their roles are, respectively, to extract depth features, down-sample, and classify samples. The convolutional layer is one of the most important parts of the CNN and aims to learn the deep characteristics of the input vectors, which is defined as follows
$$ C_{m} = \mathop \sum \limits_{i = 1}^{{N_{k} }} W_{i} X_{m + j} $$
(10)
where \(X\) is the input feature of length\(L\). \({N}_{k}\) is the number of kernels.\(m\in \{0, ... , L-N\}\), W is a weight vector of length\({N}_{k}\). Then, the feature map \({C}_{m}\) is put into the activation function ReLU, which is defined as follow:
$$ f\left( x \right) = max\left( {0,x} \right) $$
(11)
The role of the ReLU function is to increase the nonlinear relationship between the layers of the neural network, save computation, solve the gradient disappearance problem, and reduce the interdependence of parameters to mitigate the overfitting problem.
The convolutional and maximum pooling layers can extract important features from the input vectors. The output of all kernels is then concatenated into a vector and fed to the fully-connected layer \(f\left(W\cdot y\right)\). Where \(y\) is the output of Maxpooling layer and \(W\) is the weight matrix. Finally, the softmax layer scores the input vectors as a percentage.
Pathway analysis of predicted results from DLDTI
Atherosclerosis-related gene sets were collected from GeneCards (https://www.genecards.org/) [30]. After using retrieve tool on Uniprot database (https://www.uniprot.org/), different identifiers from Drug Bank and GeneCards were converted to UniProtKB. Based the intersection of potential targets of TMPZ from DLDTI model and confirmed target proteins of atherosclerosis, the matched targets were regarded as the predicted targets of TMPZ on atherosclerosis. The predicted targets were uploaded to the Search Tool for the Retrieval of Interacting Genes/Proteins database (STRING, Version 11) (https://string-db.org/) [23] for Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) biological process analysis.
Validation experiments
Ldlr−/− hamsters
This study was approved by the Animal Ethics Committee of Xiyuan Hospital and strictly adhered to the principles of laboratory animal care (NIH publication No.85Y23, revised 1996). Male, 8 week aged and low-density lipoprotein receptor knock-out (Ldlr−/−) hamsters were provided by the health science center, Peking University. The Ldlr−/− genotype was confirmed using polymerase chain reaction (PCR) analysis of DNA extracts from ears [31]. After 1 week of acclimatization, they were fed on high-cholesterol and high-fat (HCHF) diet containing 15% lard and 0.5% cholesterol (Biotech company, China) for 8 weeks. The Ldlr−/− hamsters were then randomly divided into three groups according to their weights (n = 8 per group) and orally administered with a mixture of volume vehicle (distilled water), TMPZ (32 mg/kg/d) and clopidogrel (32 mg/kg/d) drugs for 8 weeks. Wild type (WT) golden Syrian hamsters (n = 8) purchased from Vital River Laboratory (Charles River, Beijing, China) were fed on a standard chow diet as healthy control. All hamsters were maintained on a 12 h light/12 h dark cycle with free access to water.
Hamsters were fasted for 12 h and anesthetized by intraperitoneal injection of 1% sodium phenobarbital (70 mg/kg). Blood samples were taken from abdominal aortas and plasma was separated by centrifugation for 10 min at 2700 × g. TC, TG and HDL were determined using commercially available kits (BIOSINO, China).
Oil red O staining
As described previously [32], anesthetized hamsters were perfused with 0.01 M PBS through the left ventricle. In brief, hearts and whole aortas were placed in 4% paraformaldehyde solution overnight, transferred to 20% sucrose solution for 1 week. Hearts were then fixed into O.C.T compound and cross-sectioned (8 μm per slice). The atherosclerotic lesions in aortic root were stained with 0.3% Oil red O solution (Solarbio, China), rinsed with 60% isopropanol and distilled water and counterstained with hematoxylin. The results were represented by the percentage positive area of total area (en face analysis) and net lesion area (aortic root sections). Images were analyzed with Image J [33].
Histological analysis
Analysis of atherosclerotic plaque cell composition was determined by immunohistochemistry (IHC) analysis of the aortic root. Macrophages and smooth muscle cells (SMC) were stained with CD68 (BOSTER, BA36381:100) antibody and a-SMA antibody (BOSTER, A03744, 1:100) as reported previously in hamster researches [31]. Then biotinylated second antibody (Vector Laboratories, ABC Vectastain, 1:200) were used for incubation under 2% normal blocking serum. The cryosections were visualized using 3,3-diaminobenzidine (Vector Laboratories, DAB Vectastain). The results were represented by the percentage positive area of total cross-sectional vessel wall area in the aortic root sections and analyzed using Image J [33].
Washed platelet preparation
Blood per hamster, 3 to 4 mL was collected from abdominal aortas into a tube containing an acid-citrate-dextrose anticoagulant (83.2 mM D-glucose, 85 mM trisodium citrate dihydrate, 19 mM citric acid monohydrate, pH5.5). Platelet-rich plasma (PRP) was prepared after centrifugation at 300 × g for 10 min in room temperature. For washed platelet preparation, PRP was centrifuged at 1500 × g for 2 min. After collecting supernatant consisting of platelet-poor plasma into another centrifuge tube, the remaining PRP was washing three times, and the pellet was re-suspended in a modified Tyrode buffer (2.4 mM HEPES, 6.1 mM D-glucose, 137 mM NaCl, 12 mM HaHCO3, 2.6 mM KCl, pH7.4).
Assessment of platelet activity
Washed platelets were loaded with fura-2/AM(5 μM, Molecular Probe) in the presence of Pluronic F-127 (0.2 μg/mL, Molecular Probe) and then incubated at 37 °C for 1 h in the dark [34]. Platelets were washed and re-suspended in Tyrode buffer containing 1 mM calcium. After activation of ADP (20 μM, Sigma), intracellular calcium concentration was measured using a fluorescence mode of Synergy H1 microplate reader (Biotek, USA). Excitation wavelengths was alternated at 340 and 380 nm. Excitation was measured at 510 nm. TritonX-100 and EGTA were used for calibration of maximal and minimal calcium concentrations, respectively. Washed platelets were activated by ADP and then lysed by 0.1 M HCl on ice. According to the manufacturer’s instructions, the level of intracellular cAMP was determined by ELISA (Enzo Life Sciences, ADI-900-066).
Western blot analysis
Washed platelets from each group were lysed with radioimmunoprecipitation assay buffer with the presence of protease and phosphatase inhibitor mixtures on ice (Solarbio, China). Lysates were separated by 10,000 × g centrifugation for 10 min at 4 °C. Total protein concentrations were determined by BCA method. Equal amounts of total protein (40 μg) were resolved in SDS-PAGE and electroblotted. The nitrocellulose membranes were blocked with 5% skimmed milk at room temperature for 2 h and incubated with primary antibodies targeting PI3K(CST, 4257 T, 1:500), Akt(CST, 9272, 1:2000), p-Akt(CST,2965,1:1000) and GADPH (Abcam, ab8245, 1:5000) overnight at 4 °C. The membranes were then incubated with the HRP-conjugated anti-rabbit antibody for 1 h at 37 °C, followed by enhanced chemiluminescence detection.
Statistical analysis
All data were expressed as mean ± standard error. Shapiro-Wild test and Levene’s test were used for normality of data distribution and homogeneity of variances, respectively. An unpaired student’s t-test were used to compare data in different groups when data normally distributed and variances were equal among groups. Unpaired t test with Welch’s correction were used when unequal standard deviation among groups. Mann–Whitney test were used for nonparametric test. All p values less than 0.05 were considered statistically significant. All statistical analyses were performed using GraphPad Prism 8.0 (GraphPad, United states).