Discordancy partitioning for validating potentially inconsistent pharmacogenomic studies

Jonnagadda S Rao, Hongmei Liu

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

The Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) are two major studies that can be mined for therapeutic biomarkers across a wide variety of cancers. Model validation using the two datasets, however, has proved challenging: neither predictions nor signatures validate consistently when models are built on one dataset and tested on the other. While the genomic profiling seems consistent, the drug response data is not. Efforts at harmonizing experimental designs have helped but have not entirely removed the model validation difficulties. In this paper, we present a partitioning strategy based on a data sharing concept that directly acknowledges a potential lack of concordance between the datasets and, in doing so, also allows for the extraction of reproducible novel gene-drug interaction signatures as well as accurate test set predictions. We demonstrate these properties in a re-analysis of the GDSC and CCLE datasets.

Original language: English (US)
Article number: 15169
Journal: Scientific Reports
Volume: 7
Issue number: 1
DOI: 10.1038/s41598-017-15590-4
State: Published - Dec 1 2017

Fingerprint

Drug interactions
Biomarkers
Design of experiments
Genes
Cells
Pharmacogenetics
Genomics

ASJC Scopus subject areas

  • General

Cite this

Discordancy partitioning for validating potentially inconsistent pharmacogenomic studies. / Rao, Jonnagadda S; Liu, Hongmei.

In: Scientific Reports, Vol. 7, No. 1, 15169, 01.12.2017.

@article{49dada0ea7a445628479f886451b264a,
title = "Discordancy partitioning for validating potentially inconsistent pharmacogenomic studies",
author = "Rao, {Jonnagadda S} and Hongmei Liu",
year = "2017",
month = "12",
day = "1",
doi = "10.1038/s41598-017-15590-4",
language = "English (US)",
volume = "7",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",
number = "1",
}

TY - JOUR

T1 - Discordancy partitioning for validating potentially inconsistent pharmacogenomic studies

AU - Rao, Jonnagadda S

AU - Liu, Hongmei

PY - 2017/12/1

Y1 - 2017/12/1

UR - http://www.scopus.com/inward/record.url?scp=85033687123&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85033687123&partnerID=8YFLogxK

U2 - 10.1038/s41598-017-15590-4

DO - 10.1038/s41598-017-15590-4

M3 - Article

C2 - 29123200

AN - SCOPUS:85033687123

VL - 7

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

IS - 1

M1 - 15169

ER -