Kinome-wide activity modeling from diverse public high-quality data sets

Stephan C Schuerer, Steven M. Muskal

Research output: Contribution to journalArticle

33 Citations (Scopus)

Abstract

Large corpora of kinase small molecule inhibitor data are accessible to public sector research from thousands of journal article and patent publications. These data have been generated employing a wide variety of assay methodologies and experimental procedures by numerous laboratories. Here we ask the question how applicable these heterogeneous data sets are to predict kinase activities and which characteristics of the data sets contribute to their utility. We accessed almost 500 000 molecules from the Kinase Knowledge Base (KKB) and after rigorous aggregation and standardization generated over 180 distinct data sets covering all major groups of the human kinome. To assess the value of the data sets, we generated hundreds of classification and regression models. Their rigorous cross-validation and characterization demonstrated highly predictive classification and quantitative models for the majority of kinase targets if a minimum required number of active compounds or structure-activity data points were available. We then applied the best classifiers to compounds most recently profiled in the NIH Library of Integrated Network-based Cellular Signatures (LINCS) program and found good agreement of profiling results with predicted activities. Our results indicate that, although heterogeneous in nature, the publically accessible data sets are exceedingly valuable and well suited to develop highly accurate predictors for practical Kinome-wide virtual screening applications and to complement experimental kinase profiling.

Original languageEnglish
Pages (from-to)27-38
Number of pages12
JournalJournal of Chemical Information and Modeling
Volume53
Issue number1
DOIs
StatePublished - Jan 28 2013

Fingerprint

data quality
Phosphotransferases
Molecules
Standardization
Assays
Screening
Classifiers
Agglomeration
activity structure
aggregation
patent
public sector
regression
methodology

ASJC Scopus subject areas

  • Chemistry(all)
  • Chemical Engineering(all)
  • Computer Science Applications
  • Library and Information Sciences

Cite this

Kinome-wide activity modeling from diverse public high-quality data sets. / Schuerer, Stephan C; Muskal, Steven M.

In: Journal of Chemical Information and Modeling, Vol. 53, No. 1, 28.01.2013, p. 27-38.

Research output: Contribution to journalArticle

@article{8b3cfd365a1b4c2089a342ddad7fe32f,
title = "Kinome-wide activity modeling from diverse public high-quality data sets",
abstract = "Large corpora of kinase small molecule inhibitor data are accessible to public sector research from thousands of journal article and patent publications. These data have been generated employing a wide variety of assay methodologies and experimental procedures by numerous laboratories. Here we ask the question how applicable these heterogeneous data sets are to predict kinase activities and which characteristics of the data sets contribute to their utility. We accessed almost 500 000 molecules from the Kinase Knowledge Base (KKB) and after rigorous aggregation and standardization generated over 180 distinct data sets covering all major groups of the human kinome. To assess the value of the data sets, we generated hundreds of classification and regression models. Their rigorous cross-validation and characterization demonstrated highly predictive classification and quantitative models for the majority of kinase targets if a minimum required number of active compounds or structure-activity data points were available. We then applied the best classifiers to compounds most recently profiled in the NIH Library of Integrated Network-based Cellular Signatures (LINCS) program and found good agreement of profiling results with predicted activities. Our results indicate that, although heterogeneous in nature, the publically accessible data sets are exceedingly valuable and well suited to develop highly accurate predictors for practical Kinome-wide virtual screening applications and to complement experimental kinase profiling.",
author = "Schuerer, {Stephan C} and Muskal, {Steven M.}",
year = "2013",
month = "1",
day = "28",
doi = "10.1021/ci300403k",
language = "English",
volume = "53",
pages = "27--38",
journal = "Journal of Chemical Information and Computer Sciences",
issn = "0095-2338",
publisher = "American Chemical Society",
number = "1",

}

TY - JOUR

T1 - Kinome-wide activity modeling from diverse public high-quality data sets

AU - Schuerer, Stephan C

AU - Muskal, Steven M.

PY - 2013/1/28

Y1 - 2013/1/28

N2 - Large corpora of kinase small molecule inhibitor data are accessible to public sector research from thousands of journal article and patent publications. These data have been generated employing a wide variety of assay methodologies and experimental procedures by numerous laboratories. Here we ask the question how applicable these heterogeneous data sets are to predict kinase activities and which characteristics of the data sets contribute to their utility. We accessed almost 500 000 molecules from the Kinase Knowledge Base (KKB) and after rigorous aggregation and standardization generated over 180 distinct data sets covering all major groups of the human kinome. To assess the value of the data sets, we generated hundreds of classification and regression models. Their rigorous cross-validation and characterization demonstrated highly predictive classification and quantitative models for the majority of kinase targets if a minimum required number of active compounds or structure-activity data points were available. We then applied the best classifiers to compounds most recently profiled in the NIH Library of Integrated Network-based Cellular Signatures (LINCS) program and found good agreement of profiling results with predicted activities. Our results indicate that, although heterogeneous in nature, the publically accessible data sets are exceedingly valuable and well suited to develop highly accurate predictors for practical Kinome-wide virtual screening applications and to complement experimental kinase profiling.

AB - Large corpora of kinase small molecule inhibitor data are accessible to public sector research from thousands of journal article and patent publications. These data have been generated employing a wide variety of assay methodologies and experimental procedures by numerous laboratories. Here we ask the question how applicable these heterogeneous data sets are to predict kinase activities and which characteristics of the data sets contribute to their utility. We accessed almost 500 000 molecules from the Kinase Knowledge Base (KKB) and after rigorous aggregation and standardization generated over 180 distinct data sets covering all major groups of the human kinome. To assess the value of the data sets, we generated hundreds of classification and regression models. Their rigorous cross-validation and characterization demonstrated highly predictive classification and quantitative models for the majority of kinase targets if a minimum required number of active compounds or structure-activity data points were available. We then applied the best classifiers to compounds most recently profiled in the NIH Library of Integrated Network-based Cellular Signatures (LINCS) program and found good agreement of profiling results with predicted activities. Our results indicate that, although heterogeneous in nature, the publically accessible data sets are exceedingly valuable and well suited to develop highly accurate predictors for practical Kinome-wide virtual screening applications and to complement experimental kinase profiling.

UR - http://www.scopus.com/inward/record.url?scp=84873024709&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84873024709&partnerID=8YFLogxK

U2 - 10.1021/ci300403k

DO - 10.1021/ci300403k

M3 - Article

C2 - 23259810

AN - SCOPUS:84873024709

VL - 53

SP - 27

EP - 38

JO - Journal of Chemical Information and Computer Sciences

JF - Journal of Chemical Information and Computer Sciences

SN - 0095-2338

IS - 1

ER -