Prediction of protein-glucose binding sites using support vector machines

Houssam Nassif, Hassan Al-Ali, Sawsan Khuri, Walid Keirouz

Research output: Contribution to journal › Article › peer-review

32 Scopus citations


Glucose is a simple sugar that plays an essential role in many basic metabolic and signaling pathways. Many proteins have binding sites that are highly specific to glucose. The exponential increase of genomic data has revealed the identity of many proteins that seem to be central to biological processes, but whose exact functions are unknown. Many of these proteins seem to be associated with disease processes. Being able to predict glucose-specific binding sites in these proteins will greatly enhance our ability to annotate protein function and may significantly contribute to drug design. Here we present the first glucose-binding site classifier algorithm. We consider the sugar-binding pocket as a spherical spatio-chemical environment and represent it as a vector of geometric and chemical features. We then perform Random Forests feature selection to identify key features and analyze them using support vector machine classification. Our work shows that glucose binding sites can be modeled effectively using a limited number of basic chemical and residue features. Using leave-one-out cross-validation, our classifier achieves an 8.11% error rate, 89.66% sensitivity and 93.33% specificity over our dataset. From a biochemical perspective, our results support the relevance of ordered water molecules and ions in determining glucose specificity. They also reveal the importance of carboxylate residues in glucose binding and the high concentration of negatively charged atoms in direct contact with the bound glucose molecule.
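The abstract describes two steps that are easy to misread without a concrete picture. The first is turning a spherical pocket into a fixed-length feature vector. The second is scoring leave-one-out predictions as error, sensitivity and specificity. Below is a minimal sketch of both steps in plain Python, under stated assumptions. The shell radii (`SHELL_RADII`), atom categories (`ATOM_TYPES`) and the helper names `shell_features` and `binary_metrics` are all hypothetical, chosen for illustration. The abstract does not give the paper's actual feature definitions. The Random Forests selection and SVM steps are not shown. In scikit-learn they would typically use `RandomForestClassifier` (its `feature_importances_` attribute), `SVC` and `LeaveOneOut`.

```python
import math
from collections import Counter

# Hypothetical shell boundaries (angstroms) around the pocket centre and
# hypothetical atom categories; the paper's own choices are not stated here.
SHELL_RADII = (3.0, 5.0, 7.0)
ATOM_TYPES = ("C", "N", "O", "S", "water", "ion")


def shell_features(center, atoms, radii=SHELL_RADII, types=ATOM_TYPES):
    """Count atoms of each type in concentric spherical shells around `center`.

    `atoms` is a list of (atom_type, (x, y, z)) tuples. Returns a flat list of
    len(radii) * len(types) counts, ordered shell by shell, then type by type.
    """
    counts = [Counter() for _ in radii]
    for atom_type, xyz in atoms:
        d = math.dist(center, xyz)
        # Place the atom in the innermost shell whose outer radius contains it;
        # atoms beyond the last radius are ignored.
        for i, r in enumerate(radii):
            if d <= r:
                counts[i][atom_type] += 1
                break
    # Types not listed in `types` are counted but not emitted.
    return [counts[i][t] for i in range(len(radii)) for t in types]


def binary_metrics(y_true, y_pred):
    """Error rate, sensitivity and specificity for binary labels (1 = glucose site).

    Assumes both classes are present in `y_true` (otherwise a ratio divides by zero).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    error = (fp + fn) / len(pairs)
    sensitivity = tp / (tp + fn)  # fraction of true sites recovered
    specificity = tn / (tn + fp)  # fraction of non-sites correctly rejected
    return error, sensitivity, specificity


if __name__ == "__main__":
    atoms = [("O", (1.0, 0.0, 0.0)), ("water", (0.0, 4.0, 0.0)),
             ("C", (6.0, 0.0, 0.0)), ("N", (10.0, 0.0, 0.0))]
    print(shell_features((0.0, 0.0, 0.0), atoms))
```

Consider a purely hypothetical split of 29 positive and 45 negative sites, where leave-one-out predictions yield 26 true positives and 42 true negatives. `binary_metrics` then returns about 8.11% error, 89.66% sensitivity and 93.33% specificity. This only shows that the reported percentages are arithmetically consistent with such counts. The abstract does not state the dataset's actual composition.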

Original language: English (US)
Pages (from-to): 121-132
Number of pages: 12
Journal: Proteins: Structure, Function and Bioinformatics
Issue number: 1
State: Published - Oct 1 2009


  • Binding site signature
  • Carbohydrate
  • Feature vector
  • Hexose
  • Protein-carbohydrate interaction
  • Random forests
  • SVM
  • Substrate recognition

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology


