Clustering gene expression profile data by selective shrinkage

Research output: Contribution to journal, Article

Abstract

Clustering of gene expression profiles is a widely used approach for finding macroscopic data structure. A complication in such analyses is that not all genes are informative for forming clusters and different clusters might have different transcription regulation. Driven by these considerations, we present a novel two-stage clustering approach. The first stage identifies informative genes by adaptive variable selection using pseudo-samples modeled by a high dimensional multigroup ANOVA model. Variables are selected using a rescaled spike and slab Bayesian hierarchical model having a special selective shrinkage property. The second stage uses output from the first stage for clustering. We demonstrate why selective shrinkage occurs, and by extension, why it is useful for the clustering paradigm. We analyze a human gene atlas expression dataset where the question of interest is to look for tissue-specific transcription regulation and investigate whether tissues can be grouped together due to similar genomic control.
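As an illustration only of the two-stage idea summarized above (first pick informative genes, then cluster using only those genes), here is a minimal Python sketch. It is not the authors' method: it swaps in a plain per-gene one-way ANOVA F-statistic threshold in place of the rescaled spike and slab model, and single-linkage clustering of tissue mean profiles for the second stage. The toy data, the threshold value, and all function names are hypothetical.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F statistic for one gene; `groups` is a list of replicate lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_genes(data, threshold):
    """Stage 1 (stand-in): keep genes whose F statistic exceeds `threshold`."""
    return [j for j, groups in enumerate(data) if f_statistic(groups) > threshold]

def group_profiles(data, genes):
    """Mean expression of each tissue over the selected genes only."""
    n_groups = len(data[0])
    return [[mean(data[j][g]) for j in genes] for g in range(n_groups)]

def single_linkage(profiles, n_clusters):
    """Stage 2 (stand-in): agglomerative single-linkage clustering of tissues."""
    clusters = [[i] for i in range(len(profiles))]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(profiles[a], profiles[b])) ** 0.5

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical toy data: data[gene][tissue] is a list of 3 replicate values.
# Genes 0 and 1 separate tissues {0, 1} from {2, 3}; genes 2 and 3 do not.
data = [
    [[4.9, 5.0, 5.1], [4.8, 4.9, 5.0], [-0.1, 0.0, 0.1], [-0.1, 0.0, 0.1]],
    [[2.9, 3.0, 3.1], [2.9, 3.0, 3.1], [-0.1, 0.0, 0.1], [0.0, 0.1, 0.2]],
    [[1.0, 1.3, 0.7], [1.1, 1.4, 0.8], [0.9, 1.2, 0.6], [1.0, 1.3, 0.7]],
    [[1.0, 1.2, 0.8], [1.1, 0.9, 1.0], [0.9, 1.0, 1.1], [1.2, 0.8, 1.0]],
]

selected = select_genes(data, threshold=10.0)
clusters = single_linkage(group_profiles(data, selected), n_clusters=2)
print(selected)
print(clusters)
```

The point of the sketch is the ordering: clustering runs on the reduced gene set from the first stage, so uninformative genes cannot blur the tissue groupings.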

Original language: English
Pages (from-to): 1490-1497
Number of pages: 8
Journal: Statistics and Probability Letters
Volume: 78
Issue number: 12
DOI: 10.1016/j.spl.2008.01.003
State: Published - Sep 1 2008
Externally published: Yes

Fingerprint

Gene Expression Profile
Shrinkage
Clustering
Gene
Transcription
Bayesian Hierarchical Model
Atlas
Variable Selection
Complications
Spike
Genomics
Data Structures
High-dimensional
Paradigm
Gene expression
Output
Demonstrate

ASJC Scopus subject areas

  • Statistics, Probability and Uncertainty
  • Statistics and Probability

Cite this

Clustering gene expression profile data by selective shrinkage. / Ishwaran, Hemant; Rao, Jonnagadda S.

In: Statistics and Probability Letters, Vol. 78, No. 12, 01.09.2008, p. 1490-1497.

@article{f35a399056c64dc9bbb9c38730b5954e,
title = "Clustering gene expression profile data by selective shrinkage",
abstract = "Clustering of gene expression profiles is a widely used approach for finding macroscopic data structure. A complication in such analyses is that not all genes are informative for forming clusters and different clusters might have different transcription regulation. Driven by these considerations, we present a novel two-stage clustering approach. The first stage identifies informative genes by adaptive variable selection using pseudo-samples modeled by a high dimensional multigroup ANOVA model. Variables are selected using a rescaled spike and slab Bayesian hierarchical model having a special selective shrinkage property. The second stage uses output from the first stage for clustering. We demonstrate why selective shrinkage occurs, and by extension, why it is useful for the clustering paradigm. We analyze a human gene atlas expression dataset where the question of interest is to look for tissue-specific transcription regulation and investigate whether tissues can be grouped together due to similar genomic control.",
author = "Ishwaran, Hemant and Rao, Jonnagadda S",
year = "2008",
month = "9",
day = "1",
doi = "10.1016/j.spl.2008.01.003",
language = "English",
volume = "78",
pages = "1490--1497",
journal = "Statistics and Probability Letters",
issn = "0167-7152",
publisher = "Elsevier",
number = "12",
}

TY - JOUR

T1 - Clustering gene expression profile data by selective shrinkage

AU - Ishwaran, Hemant

AU - Rao, Jonnagadda S

PY - 2008/9/1

Y1 - 2008/9/1

AB - Clustering of gene expression profiles is a widely used approach for finding macroscopic data structure. A complication in such analyses is that not all genes are informative for forming clusters and different clusters might have different transcription regulation. Driven by these considerations, we present a novel two-stage clustering approach. The first stage identifies informative genes by adaptive variable selection using pseudo-samples modeled by a high dimensional multigroup ANOVA model. Variables are selected using a rescaled spike and slab Bayesian hierarchical model having a special selective shrinkage property. The second stage uses output from the first stage for clustering. We demonstrate why selective shrinkage occurs, and by extension, why it is useful for the clustering paradigm. We analyze a human gene atlas expression dataset where the question of interest is to look for tissue-specific transcription regulation and investigate whether tissues can be grouped together due to similar genomic control.

UR - http://www.scopus.com/inward/record.url?scp=49349102807&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=49349102807&partnerID=8YFLogxK

DO - 10.1016/j.spl.2008.01.003

M3 - Article

VL - 78

SP - 1490

EP - 1497

JO - Statistics and Probability Letters

JF - Statistics and Probability Letters

SN - 0167-7152

IS - 12

ER -