Categorycompare, an analytical tool based on feature annotations

Robert M. Flight, Benjamin J. Harrison, Fahim Mohammad, Mary B Bunge, Lawrence D F Moon, Jeffrey C. Petruska, Eric C. Rouchka

Research output: Contribution to journal › Article

15 Citations (Scopus)

Abstract

Assessment of high-throughput -omics data initially focuses on relative or raw levels of a particular feature, such as an expression value for a transcript, protein, or metabolite. At a second level, analyses of annotations, including known or predicted functions and associations of each individual feature, attempt to distill biological context. Most currently available comparative and meta-analysis methods depend on the availability of identical features across data sets and concentrate on determining features that are differentially expressed across experiments, some of which may be considered "biomarkers." The heterogeneity of measurement platforms and the inherent variability of biological systems confound the search for robust biomarkers indicative of a particular condition. In many instances, however, multiple data sets show involvement of common biological processes or signaling pathways, even though individual features are not commonly measured or differentially expressed between them. We developed a methodology, CATEGORYCOMPARE, for cross-platform and cross-sample comparison of high-throughput data at the annotation level. We assessed the utility of the approach using hypothetical data and by determining similarities and differences in the set of processes in two instances: (1) denervated skin vs. denervated muscle, and (2) colon from Crohn's disease vs. colon from ulcerative colitis (UC). The hypothetical data showed that in many cases comparing annotations gave superior results to comparing only at the gene level. Improved analytical results also depended on the number of genes included in the annotation term, the amount of noise in relation to the number of genes expressing in unenriched annotation categories, and the specific method by which samples are combined. In the skin vs. muscle denervation comparison, the tissues demonstrated markedly different responses. The Crohn's vs. UC comparison showed gross similarities in inflammatory response in the two diseases, with particular processes specific to each disease.
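
The idea of comparing data sets at the annotation level rather than the gene level can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the hypergeometric over-representation test, the 0.05 cutoff, and the toy gene and annotation sets are all assumptions chosen for illustration, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(k, K, n, N):
    """P(X >= k) when n genes are drawn from a universe of N genes,
    K of which carry the annotation (one-sided over-representation)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def enriched_terms(gene_list, annotations, universe, alpha=0.05):
    """Return the annotation terms over-represented in one gene list."""
    genes = set(gene_list) & universe
    hits = set()
    for term, members in annotations.items():
        members = members & universe
        k = len(genes & members)
        if k and hypergeom_pvalue(k, len(members), len(genes), len(universe)) <= alpha:
            hits.add(term)
    return hits


def compare_annotations(lists, annotations, universe, alpha=0.05):
    """Map each enriched term to the set of gene lists in which it is enriched."""
    per_list = {name: enriched_terms(genes, annotations, universe, alpha)
                for name, genes in lists.items()}
    all_terms = set().union(*per_list.values())
    return {term: frozenset(name for name, terms in per_list.items() if term in terms)
            for term in sorted(all_terms)}


# Hypothetical toy data: 40 genes, two annotation terms, two gene lists
# that share no genes at all.
universe = {f"g{i}" for i in range(1, 41)}
annotations = {
    "inflammatory response": {f"g{i}" for i in range(1, 7)},
    "muscle contraction": {f"g{i}" for i in range(7, 13)},
}
lists = {
    "tissue_a": {"g1", "g2", "g3", "g7", "g8", "g9"},
    "tissue_b": {"g4", "g5", "g6", "g20"},
}

shared_genes = lists["tissue_a"] & lists["tissue_b"]
result = compare_annotations(lists, annotations, universe)
for term, members in result.items():
    print(term, sorted(members))
```

In this toy case the two lists have no gene in common, so a gene-level comparison finds nothing shared, while the annotation-level comparison reports one term enriched in both lists and one specific to a single list.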

Original language: English
Article number: Article 98
Journal: Frontiers in Genetics
Volume: 5
Issue number: APR
DOIs: https://doi.org/10.3389/fgene.2014.00098
State: Published - Jan 1 2014

Fingerprint

Ulcerative Colitis
Colon
Biomarkers
Muscle Denervation
Genes
Biological Phenomena
Skin
Crohn Disease
Noise
Meta-Analysis
Muscles
Proteins
Datasets
Data Curation

Keywords

  • Comparative analysis
  • Meta-analysis
  • Metabolomics
  • Proteomics
  • Transcriptomics

ASJC Scopus subject areas

  • Molecular Medicine
  • Genetics (clinical)
  • Genetics

Cite this

Flight, R. M., Harrison, B. J., Mohammad, F., Bunge, M. B., Moon, L. D. F., Petruska, J. C., & Rouchka, E. C. (2014). Categorycompare, an analytical tool based on feature annotations. Frontiers in Genetics, 5(APR), [Article 98]. https://doi.org/10.3389/fgene.2014.00098

@article{95b0ff0b982d426da3e66aacc5f3d68f,
title = "Categorycompare, an analytical tool based on feature annotations",
abstract = "Assessment of high-throughput-omics data initially focuses on relative or raw levels of a particular feature, such as an expression value for a transcript, protein, or metabolite. At a second level, analyses of annotations including known or predicted functions and associations of each individual feature, attempt to distill biological context. Most currently available comparative- and meta-analyses methods are dependent on the availability of identical features across data sets, and concentrate on determining features that are differentially expressed across experiments, some of which may be considered {"}biomarkers.{"} The heterogeneity of measurement platforms and inherent variability of biological systems confounds the search for robust biomarkers indicative of a particular condition. In many instances, however, multiple data sets show involvement of common biological processes or signaling pathways, even though individual features are not commonly measured or differentially expressed between them. We developed a methodology, CATEGORYCOMPARE, for cross-platform and cross-sample comparison of high-throughput data at the annotation level. We assessed the utility of the approach using hypothetical data, as well as determining similarities and differences in the set of processes in two instances: (1) denervated skin vs. denervated muscle, and (2) colon from Crohn's disease vs. colon from ulcerative colitis (UC). The hypothetical data showed that in many cases comparing annotations gave superior results to comparing only at the gene level. Improved analytical results depended as well on the number of genes included in the annotation term, the amount of noise in relation to the number of genes expressing in unenriched annotation categories, and the specific method in which samples are combined. In the skin vs. muscle denervation comparison, the tissues demonstrated markedly different responses. The Crohn's vs. UC comparison showed gross similarities in inflammatory response in the two diseases, with particular processes specific to each disease.",
keywords = "Comparative analysis, Meta-analysis, Metabolomics, Proteomics, Transcriptomics",
author = "Flight, {Robert M.} and Harrison, {Benjamin J.} and Fahim Mohammad and Bunge, {Mary B} and Moon, {Lawrence D F} and Petruska, {Jeffrey C.} and Rouchka, {Eric C.}",
year = "2014",
month = "1",
day = "1",
doi = "10.3389/fgene.2014.00098",
language = "English",
volume = "5",
journal = "Frontiers in Genetics",
issn = "1664-8021",
publisher = "Frontiers Media S. A.",
number = "APR",

}

TY - JOUR

T1 - Categorycompare, an analytical tool based on feature annotations

AU - Flight, Robert M.

AU - Harrison, Benjamin J.

AU - Mohammad, Fahim

AU - Bunge, Mary B

AU - Moon, Lawrence D F

AU - Petruska, Jeffrey C.

AU - Rouchka, Eric C.

PY - 2014/1/1

Y1 - 2014/1/1

AB - Assessment of high-throughput-omics data initially focuses on relative or raw levels of a particular feature, such as an expression value for a transcript, protein, or metabolite. At a second level, analyses of annotations including known or predicted functions and associations of each individual feature, attempt to distill biological context. Most currently available comparative- and meta-analyses methods are dependent on the availability of identical features across data sets, and concentrate on determining features that are differentially expressed across experiments, some of which may be considered "biomarkers." The heterogeneity of measurement platforms and inherent variability of biological systems confounds the search for robust biomarkers indicative of a particular condition. In many instances, however, multiple data sets show involvement of common biological processes or signaling pathways, even though individual features are not commonly measured or differentially expressed between them. We developed a methodology, CATEGORYCOMPARE, for cross-platform and cross-sample comparison of high-throughput data at the annotation level. We assessed the utility of the approach using hypothetical data, as well as determining similarities and differences in the set of processes in two instances: (1) denervated skin vs. denervated muscle, and (2) colon from Crohn's disease vs. colon from ulcerative colitis (UC). The hypothetical data showed that in many cases comparing annotations gave superior results to comparing only at the gene level. Improved analytical results depended as well on the number of genes included in the annotation term, the amount of noise in relation to the number of genes expressing in unenriched annotation categories, and the specific method in which samples are combined. In the skin vs. muscle denervation comparison, the tissues demonstrated markedly different responses. The Crohn's vs. UC comparison showed gross similarities in inflammatory response in the two diseases, with particular processes specific to each disease.

KW - Comparative analysis

KW - Meta-analysis

KW - Metabolomics

KW - Proteomics

KW - Transcriptomics

UR - http://www.scopus.com/inward/record.url?scp=84901064162&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84901064162&partnerID=8YFLogxK

U2 - 10.3389/fgene.2014.00098

DO - 10.3389/fgene.2014.00098

M3 - Article

AN - SCOPUS:84901064162

VL - 5

JO - Frontiers in Genetics

JF - Frontiers in Genetics

SN - 1664-8021

IS - APR

M1 - Article 98

ER -