Table 2. Initial number and types of features used to encode sequence fragments

Feature types | Features | Number
Physicochemical property-based | Amino acid composition, average flexibility indices, hydrophobicity indices, net charge, partition coefficient, residue volume and molecular weight | 147 (21 × 7)
Sequence-based | Binary encoding | 420 (21 × 20)
Structural level | Accessible surface area; secondary structure (coil, helix and strand); disordered regions | 105 (21 × 5)
Functional features | Gene Ontology (GO) terms: (1) biological process (BP), (2) molecular function (MF) and (3) cellular component (CC); protein domain; KEGG pathway | 555 GO, 177 domain, 114 KEGG pathway
Functional annotation | UP_SEQ_FEATURE and UP_KEYWORDS | 526
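
The per-position counts in Table 2 can be illustrated with a minimal sketch of the binary encoding, assuming a fragment length of 21 residues (inferred from the "21 ×" factors) and one 20-bit indicator vector per residue. The names AMINO_ACIDS, WINDOW, binary_encode and BLOCK_SIZES are hypothetical, the ordering of the amino acid alphabet is arbitrary, and the all-zero treatment of unknown or padding symbols is an assumption, not a detail taken from the table.

```python
# Hypothetical sketch: names and conventions are illustrative assumptions,
# not the authors' implementation.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; ordering is arbitrary
WINDOW = 21  # fragment length implied by the "21 x" factors in Table 2


def binary_encode(fragment: str) -> list[int]:
    """One-hot encode a fragment: 20 bits per position, flattened into one vector."""
    if len(fragment) != WINDOW:
        raise ValueError(f"expected {WINDOW} residues, got {len(fragment)}")
    vector = []
    for residue in fragment.upper():
        bits = [0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:  # unknown or padding symbols stay all-zero (assumption)
            bits[index] = 1
        vector.extend(bits)
    return vector


# Per-position feature counts as listed in Table 2.
PER_POSITION = {"physicochemical": 7, "binary": 20, "structural": 5}
BLOCK_SIZES = {name: WINDOW * count for name, count in PER_POSITION.items()}

if __name__ == "__main__":
    example = "MKTAYIAKQRXISFVKSHFSR"  # made-up 21-residue fragment with one unknown 'X'
    encoded = binary_encode(example)
    print(len(encoded), sum(encoded))
    print(BLOCK_SIZES)
```

Under these assumptions the encoded vector has 21 × 20 = 420 entries, matching the sequence-based row, and the same window length reproduces the 147 and 105 counts of the physicochemical and structural rows.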