TY - JOUR
T1 - Random Forest classification based on star graph topological indices for antioxidant proteins
AU - Fernández-Blanco, Enrique
AU - Aguiar-Pulido, Vanessa
AU - Robert Munteanu, Cristian
AU - Dorado, Julian
N1 - Funding Information:
Vanessa Aguiar-Pulido and Cristian R. Munteanu acknowledge the funding support for a research position by the “Plan I2C” and an “Isidro Parga Pondal” Program, both from Xunta de Galicia, Spain (supported by the European Social Fund). The authors also want to thank the different projects that have funded part of this research (CN 2011/034, CN2012/127, 10SIN105004PR, O9SIN010105PR and TIN-2009-07707).
PY - 2013/1/21
Y1 - 2013/1/21
N2 - Aging and life quality are important research topics nowadays in areas such as life sciences, chemistry and pharmacology. People live longer and thus want to spend that extra time with a better quality of life. In this regard, there exists a tiny subset of molecules in nature, named antioxidant proteins, that may influence the aging process. However, testing every single protein in order to identify its properties is quite expensive and inefficient. For this reason, this work proposes a model in which the primary structure of the protein is represented using complex network graphs, and which can be used to reduce the number of proteins to be tested for antioxidant biological activity. The graph obtained as a representation helps describe the complex system by means of topological indices. More specifically, in this work, Randić's Star Networks have been used as well as the associated indices, calculated with the S2SNet tool. In order to simulate the existing proportion of antioxidant proteins in nature, a dataset containing 1999 proteins, of which 324 are antioxidant proteins, was created. Using this data as input, Star Graph Topological Indices were calculated with the S2SNet tool. These indices were then used as input to several classification techniques. Among the techniques utilised, the Random Forest has shown the best performance, correctly classifying 94% of the instances. Although the target class (antioxidant proteins) represents a tiny subset inside the dataset, the proposed model correctly classifies 81.8% of the instances of this class, with a precision of 81.3%.
AB - Aging and life quality are important research topics nowadays in areas such as life sciences, chemistry and pharmacology. People live longer and thus want to spend that extra time with a better quality of life. In this regard, there exists a tiny subset of molecules in nature, named antioxidant proteins, that may influence the aging process. However, testing every single protein in order to identify its properties is quite expensive and inefficient. For this reason, this work proposes a model in which the primary structure of the protein is represented using complex network graphs, and which can be used to reduce the number of proteins to be tested for antioxidant biological activity. The graph obtained as a representation helps describe the complex system by means of topological indices. More specifically, in this work, Randić's Star Networks have been used as well as the associated indices, calculated with the S2SNet tool. In order to simulate the existing proportion of antioxidant proteins in nature, a dataset containing 1999 proteins, of which 324 are antioxidant proteins, was created. Using this data as input, Star Graph Topological Indices were calculated with the S2SNet tool. These indices were then used as input to several classification techniques. Among the techniques utilised, the Random Forest has shown the best performance, correctly classifying 94% of the instances. Although the target class (antioxidant proteins) represents a tiny subset inside the dataset, the proposed model correctly classifies 81.8% of the instances of this class, with a precision of 81.3%.
KW - Antioxidant protein
KW - Multi-target QSAR
KW - Star Graph
KW - Topological indices
UR - http://www.scopus.com/inward/record.url?scp=84869432205&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84869432205&partnerID=8YFLogxK
U2 - 10.1016/j.jtbi.2012.10.006
DO - 10.1016/j.jtbi.2012.10.006
M3 - Article
C2 - 23116665
AN - SCOPUS:84869432205
VL - 317
SP - 331
EP - 337
JO - Journal of Theoretical Biology
JF - Journal of Theoretical Biology
SN - 0022-5193
ER -