tRFTar: Identifying The Targets of tRNA-Derived Fragments

Background: tRNA-derived fragments (tRFs) are small non-coding RNAs of 14–40 nucleotides produced by specific tRNA cleavage, and they have key regulatory functions in many biological processes. Many studies have shown that tRFs associate with Argonaute complexes and inhibit gene expression in the same manner as miRNAs. However, there are currently no tools that accurately predict tRF target genes.
Methods: We used tRF-mRNA pairs identified by crosslinking, ligation, and sequencing of hybrids (CLASH) and covalent ligation of endogenous Argonaute-bound RNAs (CLEAR)-CLIP to assess features that may participate in tRF targeting, including the sequence context of each individual site and tRF-mRNA interactions. We applied a genetic algorithm (GA) to select key features and a support vector machine (SVM) to construct tRF target prediction models.
Results: We first identified features that globally influenced tRF targeting. Among them, the most significant were minimum free folding energy (MFE), position 8 match, number of bases paired in the tRF-mRNA duplex, and length of the tRF, which were consistent with previous findings. We built the model with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.980 (0.977-0.983) in the training process and 0.847 (0.83-0.861) in the test process. The model was applied to all sites with perfect Watson-Crick complementarity to the seed in the 3'-UTRs of the human genome. Seven of nine target / non-target genes of tRFs confirmed by reporter assay were predicted.
Conclusions: Predictions can be obtained online from tRFTar, freely available at http://trftar.cmuzhenninglab.org:3838/tar/, which is the first tool to predict targets of tRFs in human with a user-friendly interface.
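The candidate-site step described in the Results (sites with perfect Watson-Crick complementarity to the seed in a 3'-UTR) can be illustrated with a minimal sketch. This is not the tRFTar implementation. The seed is assumed here to be tRF positions 2-7, an assumption suggested only by "position 8 match" being listed as a separate feature, and the function names and example sequences are hypothetical.

```python
# Minimal sketch, not the tRFTar code. Assumption: the seed is tRF
# positions 2-7 (1-based, from the 5' end); the abstract does not define it.

COMPLEMENT = str.maketrans("ACGU", "UGCA")  # strict Watson-Crick, no G-U wobble


def normalize(seq: str) -> str:
    """Upper-case a sequence and convert DNA-style T to RNA U."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def find_seed_sites(trf: str, utr: str, seed_start: int = 2, seed_end: int = 7) -> list[int]:
    """Return 0-based UTR start indices of sites perfectly complementary
    to tRF positions seed_start..seed_end (1-based, inclusive)."""
    trf, utr = normalize(trf), normalize(utr)
    site = reverse_complement(trf[seed_start - 1:seed_end])
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)  # step by 1 so overlapping sites are kept
    return hits


def position8_match(trf: str, utr: str, site_start: int, seed_end: int = 7) -> bool:
    """Check whether the UTR base just 5' of the seed site pairs with the
    tRF base just 3' of the seed (position 8 when the seed is 2-7)."""
    trf, utr = normalize(trf), normalize(utr)
    if site_start == 0 or len(trf) <= seed_end:
        return False  # no flanking base available on one of the strands
    return utr[site_start - 1] == trf[seed_end].translate(COMPLEMENT)


if __name__ == "__main__":
    example_trf = "GCAUUGGUGGUUCAGUGG"    # hypothetical tRF sequence
    example_utr = "AAACCAAUGAAGCCAAUGUU"  # hypothetical 3'-UTR fragment
    for start in find_seed_sites(example_trf, example_utr):
        print(start, position8_match(example_trf, example_utr, start))
```

In a full pipeline, each site found this way would still need the remaining features named above (for example MFE and the number of paired bases in the duplex) before being scored by a classifier; those steps are outside this sketch.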


Full Text
Figure 1
The workflow of the target prediction pipeline of tRFTar.

Figure 2
Features that influence tRF targeting. Features most significantly different between the positive group and background are displayed. The size of the petal represents -log(P value), which indicates the degree of significance.

Comparison of tRF target predicting models. A. The receiver operating characteristic (ROC) curve for classification of the pairs for model establishment, including the SVM-GA model, conservation model, probabilistic model and the intersection of miRNA target predicting models (TargetScan and miRanda). B. The relationship of features for potential target sites with the probabilistic model or conservative model. The color of the boxes represents the correlation coefficient and * represents the significance [...] of tRFTar and miRNA target predicting models (TargetScan and miRanda). D. Ternary plot of the number of targets of each tRF. The value on each axis represents the proportion of targets predicted by the corresponding model relative to all potential targets. The node color represents the number of potential targets given by the intersection of the three models. As the number of targets increases, the node color changes from red to blue. The node size indicates the number of seed pairings in the whole 3'-UTR: the larger the node, the greater the number of seed matches the 3'-UTRs have.

Figure 5
Search the tRF targets in tRFTar with an example of 3001a. A. Overview of the Search page interface. B. Users can search for targets according to the tRFs in tRFdb or the anticodon/amino acid of the source tRNA. C. Users can input the corresponding name and rank the target candidates according to the positive probability assigned by the SVM-GA model, the conservative score or the sites overrepresented by the probabilistic model. D. The result panel of the target site of tRF 3001a.

Supplementary Files
This is a list of supplementary files associated with this preprint. Supplementarymaterial.pdf