Automated Molecular Identity Disambiguator (AutoMID)

Project: Research project

Project Details


PROJECT SUMMARY Small molecules are one of the most important classes of therapeutics alleviating suffering and in many cases death for hundreds of millions of people worldwide. Small molecules also serve as invaluable tools to study biology, often with the goal to validate novel targets for the development of future therapeutic drugs. Reproducibility of experimental results and the interoperability and reusability of resulting datasets depend on accurate descriptions of associated research objects, and most critically on correct representations of small molecules that are tested in biological assays. For example, it is not possible to develop predictive models of protein target - small molecule interactions if their chemical structure representations are not correct. Many factors contribute to errors in reported chemical structures in small molecule screening and omics reference databases, scientific publications, and many other web-based resources and documents. Because of the complexity of representing small molecules chemical structure graphs and the lack of thorough curation, errors are frequently introduced by non-experts and error propagation across different digital research assets is a pervasive problem. To address this challenging problem via a scalable approach, we propose the Automated Molecular Identity Disambiguator (AutoMID). AutoMID will be usable in batch mode at scale via an API, for example to assist chemical structure standardization and registration by maintainers of digital research assets, and also via interactive (UI) mode for everyday researchers to quickly and easily validate or correct their small molecule representations. AutoMID will leverage extensive highly standardized linked databases of chemical structures and associated information including names, synonyms, biological activity and physical properties and their sources / provenance and leverage expert rules and AI to enable reliable disambiguation of chemical structure identities at scale.
Effective start/end date3/1/212/28/22


  • U.S. National Library of Medicine: $265,972.00


Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.