Table 2. Initial number and types of different features used to encode sequence fragments

From: Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins

| Feature types | Features | Number |
| --- | --- | --- |
| Physicochemical property-based | Amino acid composition, average flexibility indices, hydrophobicity indices, net charge, partition coefficient, residue volume and molecular weight | 147 (21 × 7) |
| Sequence-based | Binary encoding | 420 (21 × 20) |
| Structural level | Accessible surface area; secondary structure (coil, helix and strand) and disordered regions | 105 (21 × 5) |
| Functional features | Gene ontology (GO) terms: (1) biological process (BP), (2) molecular function (MF) and (3) cellular component (CC); protein domain and KEGG pathway | 555 GO, 177 domain, 114 KEGG pathway |
| Functional annotation | UP_SEQ_FEATURE and UP_KEYWORDS | 526 |
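
The 420 (21 × 20) count for binary encoding can be illustrated with a minimal sketch. It assumes that 21 is the length of the sequence fragment centred on the candidate site and that 20 is the number of standard amino acids, which is what the products in the table suggest. The function names, the amino acid ordering, the `-` padding character and the all-zero vector for padded positions are illustrative assumptions, not details taken from the paper.

```python
# Assumed ordering of the 20 standard amino acids (illustrative only).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Fragment length implied by the "21 x k" counts in the table (assumption).
WINDOW = 21


def extract_fragment(sequence: str, site: int, pad: str = "-") -> str:
    """Return the WINDOW-long fragment centred on the 0-based index `site`.

    Positions that fall outside the protein are filled with `pad`.
    """
    if not 0 <= site < len(sequence):
        raise ValueError("site must be a valid index into sequence")
    half = WINDOW // 2
    padded = pad * half + sequence + pad * half
    # In `padded`, the original index `site` sits at `site + half`,
    # so the window starts at `site` and spans WINDOW characters.
    return padded[site:site + WINDOW]


def binary_encode(fragment: str) -> list[int]:
    """One-hot encode each residue: 21 positions x 20 residues = 420 values.

    Padding or non-standard characters are encoded as all zeros.
    """
    if len(fragment) != WINDOW:
        raise ValueError(f"fragment must have length {WINDOW}")
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    features: list[int] = []
    for residue in fragment:
        one_hot = [0] * len(AMINO_ACIDS)
        position = index.get(residue)
        if position is not None:
            one_hot[position] = 1
        features.extend(one_hot)
    return features


if __name__ == "__main__":
    protein = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up example sequence
    fragment = extract_fragment(protein, 12)  # index 12 is a serine
    vector = binary_encode(fragment)
    print(fragment, len(vector))
```

The per-residue rows for the physicochemical (21 × 7) and structural (21 × 5) features would follow the same pattern, with each position contributing 7 or 5 values instead of a 20-element one-hot vector.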