Table 1 Ten-fold cross-validation performance of predictive models on the viral siRNA dataset of 1380 sequences (T1380) using SVM, ANN, KNN and REP Tree machine learning techniques

From: VIRsiRNApred: a web server for predicting inhibition efficacy of siRNAs targeting human viruses

The SVM, ANN, KNN and REP Tree columns give the Pearson correlation coefficient* on the training (T1380) dataset# during 10-fold cross validation.

| Predictive model no. | siRNA features | No. of siRNA features | SVM | ANN | KNN | REP Tree |
|---|---|---|---|---|---|---|
| 1 | Mononucleotide frequency | 4 | 0.19 | 0.10 | 0.11 | 0.10 |
| 2 | Dinucleotide frequency | 16 | 0.32 | 0.29 | 0.29 | 0.29 |
| 3 | Trinucleotide frequency | 64 | 0.42 | 0.28 | 0.30 | 0.28 |
| 4 | Tetranucleotide frequency | 256 | 0.43 | 0.28 | 0.30 | 0.30 |
| 5 | Pentanucleotide frequency | 1024 | 0.46 | 0.29 | 0.30 | 0.30 |
| 6 | Binary | 76 | 0.19 | 0.10 | 0.11 | 0.11 |
| 7 | Thermodynamic features | 21 | 0.26 | 0.22 | 0.21 | 0.20 |
| 8 | Secondary structure | 28 | 0.07 | 0.04 | 0.04 | 0.04 |
| 9 | 1 + 2 + 3 + 4 + 5 | 1364 | 0.48 | 0.30 | 0.31 | 0.31 |
| 10 | 6 + 9 | 1440 | 0.50 | 0.36 | 0.41 | 0.32 |
| 11 | 6 + 7 + 9 | 1461 | 0.55 | 0.46 | 0.48 | 0.45 |
| 12 | 6 + 7 + 8 + 9 | 1489 | 0.53 | 0.42 | 0.44 | 0.42 |

  1. *Pearson Correlation Coefficient (PCC) is the correlation between experimental and predicted viral siRNA efficacy.
  2. #T1380 is the training dataset of experimental viral siRNA. Predictive Models 1-8 were developed on individual siRNA features while models 9-12 were based on hybrid siRNA features.
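
As a rough illustration of the two quantities the table relies on, the sketch below computes a Pearson correlation coefficient between experimental and predicted efficacies and checks how the hybrid feature counts add up. It is not the authors' code: the function names `pearson` and `kmer_feature_count` are hypothetical, the efficacy values are made-up toy numbers, and the assumption that a k-nucleotide frequency vector has 4^k entries is inferred from the feature counts listed in the table (4, 16, 64, 256, 1024).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def kmer_feature_count(k):
    """Number of possible k-mers over the 4-letter nucleotide alphabet (assumed 4**k)."""
    return 4 ** k


# Feature counts taken from the table (models 6, 7 and 8).
BINARY, THERMODYNAMIC, SECONDARY_STRUCTURE = 76, 21, 28

# Model 9 combines mono- to pentanucleotide frequencies (models 1 to 5).
model_9 = sum(kmer_feature_count(k) for k in range(1, 6))
model_10 = BINARY + model_9
model_11 = BINARY + THERMODYNAMIC + model_9
model_12 = BINARY + THERMODYNAMIC + SECONDARY_STRUCTURE + model_9

# Toy values only, NOT data from the paper.
experimental = [10.0, 35.0, 50.0, 72.0, 90.0]
predicted = [20.0, 30.0, 55.0, 60.0, 85.0]
pcc = pearson(experimental, predicted)
print(model_9, model_10, model_11, model_12, round(pcc, 2))
```

In a 10-fold cross-validation setting, one plausible way to obtain a single coefficient is to collect the held-out predictions from all ten folds and pass them to `pearson` together with the matching experimental values; whether the authors pooled predictions or averaged per-fold coefficients is not stated here.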