From: Encodings and models for antimicrobial peptide classification for multi-resistant pathogens
Encoding | Description | Summary | Used in | Used along with | Main Category |
---|---|---|---|---|---|
Sparse | each amino acid is represented as a one-hot vector of length 20, where each position, except one, is set to 0 | Density: - Information: + | | Substitution Matrix, Amino Acid Composition | Sparse encoding |
Amino Acid Composition | feature vector contains at each position the proportion of an amino acid in relation to the sequence length | Density: + Information: - | | Distance Frequency, Quantitative Matrix, Dipeptide Composition, PseAAC | Amino acid composition |
Distance Frequency | calculates the distance between amino acids of similar properties and bins the occurrence according to the gap length | Density: + Information: + | [22] | | Amino acid composition |
Quantitative Matrix | encodes the propensity of each amino acid at a position | Density: + Information: + | [23] | | Amino acid composition |
CTD | describes the composition (C), transition (T) and distribution (D) of similar amino acids along the peptide sequence | Density: + Information: + | [25] | | Amino acid composition |
Pseudo-amino Acid Composition (PseAAC) | computes the correlation between different ranges among a pair of amino acids | Density: + Information: + | | Dipeptide Composition | Pseudo amino acid composition |
Reduced Amino Acid Alphabet | similar amino acids are grouped together | Density: + Information: o | | N-gram Model, AAIndexLoc | Reduced amino acid alphabet |
N-gram Model | occurrences of n-mers for an alphabet of size m, leading to an m^n-dimensional, sparse representation of the initial sequence | Density: - Information: o | [9] | | Reduced amino acid alphabet |
AAIndexLoc | k-nearest neighbor clustering to aggregate amino acids into 5 classes using their amino acid index, i.e., amino acids with the respective highest (T), high (H), medium (M), low (L), and lowest (B) values of a particular physicochemical property are clustered together | Density: o Information: + | [37] | Dipeptide Composition | Reduced amino acid alphabet |
Physicochemical Properties | translation of an amino acid to a particular physicochemical property | Density: o Information: + | | z-descriptor, d-descriptor and many more | Physicochemical properties |
z-descriptor | derived from the principal components of physicochemical properties by means of partial least squares (PLS) projections; PLS leads to a subset of five final features, capable of describing the 20 proteinogenic as well as 67 additional amino acids | Density: + Information: + | | | Physicochemical properties |
d-descriptor | amino acid sequence is squeezed between the y- (N-terminus) and the x-axis (C-terminus) with gradual bending of the single amino acids and subsequent vector summation | Density: + Information: + | [54] | | Physicochemical properties |
Autocorrelation | interdependence between two distant amino acids in a peptide sequence | Density: + Information: + | | | Autocorrelation |
Substitution/Scoring Matrix | provide accepted mutations between amino acid pairs, i.e., sequence alterations with either no or positive impact in terms of the protein function | Density: + Information: + | | BLOMAP, Sparse, Amino Acid Composition, Dipeptide Composition, PseAAC, AAIndexLoc | Substitution and scoring matrix |
BLOMAP | uses the Sammon projection to map distances in a high-dimensional input space, i.e., the BLOSUM62 substitution matrix, to a lower dimension | Density: + Information: + | [65] | | Substitution and scoring matrix |
Fourier Transformation | detects underlying patterns in a time series by transforming the time signal to the frequency domain | Density: o Information: + | | | Fourier Transformation |
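
To make the first rows of the table more concrete, the following is a minimal Python sketch of three of the listed encodings: the sparse (one-hot) encoding, the amino acid composition, and an n-gram model over a reduced amino acid alphabet. It is an illustration under stated assumptions, not the implementation used in the cited works. The function names, the six-group reduced alphabet in `REDUCED_GROUPS`, and the example peptide are all hypothetical choices made for this sketch.

```python
from itertools import product

# The 20 proteinogenic amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical reduced alphabet (assumption for illustration only):
# six groups of roughly similar residues, each mapped to one symbol.
REDUCED_GROUPS = {
    "a": "AVLIMC",  # aliphatic / sulfur-containing
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "s": "GP",      # special backbone conformations
}


def sparse_encoding(seq):
    """One vector of length 20 per residue; exactly one position is 1."""
    return [[1 if aa == ref else 0 for ref in AMINO_ACIDS] for aa in seq]


def amino_acid_composition(seq):
    """Proportion of each amino acid relative to the sequence length."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def reduce_alphabet(seq):
    """Replace each residue by the symbol of its group."""
    lookup = {aa: sym for sym, members in REDUCED_GROUPS.items() for aa in members}
    return "".join(lookup[aa] for aa in seq)


def ngram_counts(seq, alphabet, n=2):
    """Count every possible n-mer over `alphabet`; the vector has m**n entries."""
    counts = {"".join(p): 0 for p in product(alphabet, repeat=n)}
    for i in range(len(seq) - n + 1):
        counts[seq[i:i + n]] += 1
    return list(counts.values())


# Arbitrary example peptide (assumption, chosen only for demonstration).
peptide = "GIGKFLHSAKKFGKAFVGEIMNS"

one_hot = sparse_encoding(peptide)
aac = amino_acid_composition(peptide)
reduced = reduce_alphabet(peptide)
bigrams = ngram_counts(reduced, "".join(REDUCED_GROUPS), n=2)

print(len(one_hot), len(one_hot[0]))  # residues x 20
print(len(aac))                       # 20 proportions
print(len(bigrams))                   # m**n = 6**2 entries
```

The sketch also shows the density trade-off noted in the Summary column: the sparse encoding grows with the sequence length and is mostly zeros, whereas the composition vector always has 20 entries but discards residue order. Reducing the alphabet before counting n-mers keeps the m^n-dimensional vector manageable (36 entries here instead of 400 for bigrams over all 20 amino acids).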