Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation

Alex M. Clark, Barry A. Bunin, Nadia K. Litterman, Stephan C Schuerer, Ubbo E Visser

Research output: Contribution to journal (Article)

12 Citations (Scopus)

Abstract

Bioinformatics and computer-aided drug design rely on the curation of a large number of protocols for biological assays that measure the ability of potential drugs to achieve a therapeutic effect. These assay protocols are generally published by scientists in the form of plain text, which needs to be more precisely annotated in order to be useful to software methods. We have developed a pragmatic approach to describing assays according to the semantic definitions of the BioAssay Ontology (BAO) project, using a hybrid of machine learning based on natural language processing, and a simplified user interface designed to help scientists curate their data with minimum effort. We have carried out this work based on the premise that pure machine learning is insufficiently accurate, and that expecting scientists to find the time to annotate their protocols manually is unrealistic. By combining these approaches, we have created an effective prototype for which annotation of bioassay text within the domain of the training set can be accomplished very quickly. Well-trained annotations require single-click user approval, while annotations from outside the training set domain can be identified using the search feature of a well-designed user interface, and subsequently used to improve the underlying models. By drastically reducing the time required for scientists to annotate their assays, we can realistically advocate for semantic annotation to become a standard part of the publication process. Once even a small proportion of the public body of bioassay data is marked up, bioinformatics researchers can begin to construct sophisticated and useful searching and analysis algorithms that will provide a diverse and powerful set of tools for drug discovery researchers.
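The hybrid loop described in the abstract (a model proposes annotations, the user approves them, and approvals feed back into the model) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple Laplace-smoothed naive Bayes score over lowercase word tokens, and the class name `AnnotationSuggester`, its methods, and the annotation labels in the usage note are hypothetical placeholders rather than real BAO terms.

```python
import math
import re


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for real NLP features."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class AnnotationSuggester:
    """Hypothetical helper: ranks annotations with a naive Bayes log-odds score."""

    def __init__(self):
        # Each entry is (token set of the assay text, set of approved annotations).
        self.docs = []

    def confirm(self, text, annotations):
        """Record user-approved annotations so they become training data."""
        self.docs.append((tokenize(text), set(annotations)))

    def suggest(self, text, top=3):
        """Return up to `top` known annotations, best score first."""
        tokens = tokenize(text)
        labels = set()
        for _, approved in self.docs:
            labels |= approved
        scores = {}
        for label in labels:
            pos = [t for t, a in self.docs if label in a]
            neg = [t for t, a in self.docs if label not in a]
            # Smoothed prior odds of the annotation applying at all.
            score = math.log((len(pos) + 1) / (len(neg) + 1))
            for tok in tokens:
                # Smoothed rate of the token among texts with / without the label.
                p = (sum(tok in t for t in pos) + 1) / (len(pos) + 2)
                q = (sum(tok in t for t in neg) + 1) / (len(neg) + 2)
                score += math.log(p / q)
            scores[label] = score
        return sorted(scores, key=scores.get, reverse=True)[:top]
```

In use, a curator would call `suggest()` on a new protocol text, approve the relevant proposals, and pass the approved set to `confirm()`, so that each curated assay enlarges the training data. Annotations the model has never seen cannot be suggested, which corresponds to the case the abstract handles through the search feature of the user interface.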

Original language: English (US)
Article number: e524
Journal: PeerJ
Volume: 2014
Issue number: 1
DOI: 10.7717/peerj.524
State: Published - 2014

Fingerprint

Bioassay
artificial intelligence
Semantics
Biological Assay
Learning systems
Assays
bioassays
user interface
Bioinformatics
Computational Biology
Network protocols
bioinformatics
drugs
User interfaces
assays
researchers
Research Personnel
Natural Language Processing
Computer-Aided Design
Drug Design

Keywords

  • Bayesian
  • Bioassay
  • Machine learning
  • Natural language processing
  • Ontology
  • Semantic curation

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Medicine(all)
  • Neuroscience(all)

Cite this

Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation. / Clark, Alex M.; Bunin, Barry A.; Litterman, Nadia K.; Schuerer, Stephan C; Visser, Ubbo E.

In: PeerJ, Vol. 2014, No. 1, e524, 2014.

@article{aa5d5ac617f849e096356e1ec0ddd24a,
title = "Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation",
keywords = "Bayesian, Bioassay, Machine learning, Natural language processing, Ontology, Semantic curation",
author = "Clark, {Alex M.} and Bunin, {Barry A.} and Litterman, {Nadia K.} and Schuerer, {Stephan C} and Visser, {Ubbo E}",
year = "2014",
doi = "10.7717/peerj.524",
language = "English (US)",
volume = "2014",
journal = "PeerJ",
issn = "2167-8359",
publisher = "PeerJ",
number = "1",
}

TY - JOUR

T1 - Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation

AU - Clark, Alex M.

AU - Bunin, Barry A.

AU - Litterman, Nadia K.

AU - Schuerer, Stephan C

AU - Visser, Ubbo E

PY - 2014

Y1 - 2014

KW - Bayesian

KW - Bioassay

KW - Machine learning

KW - Natural language processing

KW - Ontology

KW - Semantic curation

UR - http://www.scopus.com/inward/record.url?scp=84922888863&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84922888863&partnerID=8YFLogxK

U2 - 10.7717/peerj.524

DO - 10.7717/peerj.524

M3 - Article

AN - SCOPUS:84922888863

VL - 2014

JO - PeerJ

JF - PeerJ

SN - 2167-8359

IS - 1

M1 - e524

ER -