Extreme value theory in some statistical analysis of genomic sequences

Lily Wang, Pranab K. Sen

Research output: Contribution to journal › Article

Abstract

Because similarities in biological sequences often suggest similarities in structure and function, profile searches using multiple alignments of families of related biological sequences provide useful starting points for experimental investigations in molecular biology. Strategies are formulated for determining the statistical significance of scores obtained by searching multiple alignment profiles against databanks, while accommodating gaps in the profile. The methodology is validated by deriving the asymptotic distribution of the maximum of profile scores, even under weak dependence conditions. Simulation studies show that the proposed method is adequate for moderate sample sizes. The methodology is illustrated with an immunoglobulin protein domain study.
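
As a rough illustration of the idea of assessing a maximum score against an extreme value distribution, the sketch below fits a Gumbel distribution to simulated null maxima and converts an observed maximum into a tail probability. It is not the authors' procedure: the abstract does not state the limiting distribution they derive, and the example assumes independent standard normal scores, which ignores the gaps and weak dependence the paper addresses. All function names and parameter values here are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(maxima):
    """Method-of-moments Gumbel fit: returns (location, scale)."""
    n = len(maxima)
    mean = sum(maxima) / n
    var = sum((m - mean) ** 2 for m in maxima) / (n - 1)
    # For a Gumbel law: variance = (pi * scale)^2 / 6, mean = loc + gamma * scale.
    scale = math.sqrt(6.0 * var) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale


def gumbel_p_value(score, loc, scale):
    """Upper-tail probability P(max >= score) under the fitted Gumbel law."""
    z = (score - loc) / scale
    # 1 - exp(-exp(-z)), written with expm1 for accuracy in the far tail.
    return -math.expm1(-math.exp(-z))


def simulate_null_maxima(n_searches, n_positions, rng):
    """Assumed null model: each search yields i.i.d. N(0, 1) scores."""
    return [
        max(rng.gauss(0.0, 1.0) for _ in range(n_positions))
        for _ in range(n_searches)
    ]


rng = random.Random(0)
null_maxima = simulate_null_maxima(n_searches=500, n_positions=200, rng=rng)
loc, scale = fit_gumbel_moments(null_maxima)
observed_max = 4.5  # hypothetical observed maximum profile score
print(f"loc={loc:.3f} scale={scale:.3f} p={gumbel_p_value(observed_max, loc, scale):.4g}")
```

A small tail probability would suggest that the observed maximum is unlikely under the assumed null model. With real profile scores, the null maxima would have to come from a model that reflects the dependence structure rather than from independent draws.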

Original language: English (US)
Pages (from-to): 295-310
Number of pages: 16
Journal: Extremes
Volume: 8
Issue number: 4
DOIs: 10.1007/s10687-006-0008-9
State: Published - Dec 1 2005
Externally published: Yes

Fingerprint

Extreme Value Theory
Statistical Analysis
Genomics
Statistical methods
Molecular biology
Alignment
Proteins
Immunoglobulin
Methodology
Statistical Significance
Experimental Investigation
Asymptotic distribution
Sample Size
Simulation Study
Protein
Profile
Similarity

Keywords

  • Maximum profile scores
  • Protein profile
  • Sequence alignment
  • Statistical significance
  • Weakly dependent

ASJC Scopus subject areas

  • Statistics and Probability
  • Engineering (miscellaneous)
  • Economics, Econometrics and Finance (miscellaneous)

Cite this

Extreme value theory in some statistical analysis of genomic sequences. / Wang, Lily; Sen, Pranab K.

In: Extremes, Vol. 8, No. 4, 01.12.2005, p. 295-310.

Wang, Lily ; Sen, Pranab K. / Extreme value theory in some statistical analysis of genomic sequences. In: Extremes. 2005 ; Vol. 8, No. 4. pp. 295-310.
@article{53c0b0a9e3c3476abef4849495d2ee7c,
title = "Extreme value theory in some statistical analysis of genomic sequences",
abstract = "Because similarities in biological sequences often suggest similarities in structure and function, profile searches using multiple alignments of families of related biological sequences provide useful starting points for experimental investigations in molecular biology. Strategies are formulated for determining the statistical significance of scores obtained by searching multiple alignment profiles against databanks, while accommodating gaps in the profile. The methodology is validated by deriving the asymptotic distribution of the maximum of profile scores, even under weak dependence conditions. Simulation studies show that the proposed method is adequate for moderate sample sizes. The methodology is illustrated with an immunoglobulin protein domain study.",
keywords = "Maximum profile scores, Protein profile, Sequence alignment, Statistical significance, Weakly dependent",
author = "Lily Wang and Sen, {Pranab K.}",
year = "2005",
month = "12",
day = "1",
doi = "10.1007/s10687-006-0008-9",
language = "English (US)",
volume = "8",
pages = "295--310",
journal = "Extremes",
issn = "1386-1999",
publisher = "Springer Netherlands",
number = "4",

}

TY  - JOUR
T1  - Extreme value theory in some statistical analysis of genomic sequences
AU  - Wang, Lily
AU  - Sen, Pranab K.
PY  - 2005/12/1
Y1  - 2005/12/1
AB  - Because similarities in biological sequences often suggest similarities in structure and function, profile searches using multiple alignments of families of related biological sequences provide useful starting points for experimental investigations in molecular biology. Strategies are formulated for determining the statistical significance of scores obtained by searching multiple alignment profiles against databanks, while accommodating gaps in the profile. The methodology is validated by deriving the asymptotic distribution of the maximum of profile scores, even under weak dependence conditions. Simulation studies show that the proposed method is adequate for moderate sample sizes. The methodology is illustrated with an immunoglobulin protein domain study.
KW  - Maximum profile scores
KW  - Protein profile
KW  - Sequence alignment
KW  - Statistical significance
KW  - Weakly dependent
UR  - http://www.scopus.com/inward/record.url?scp=33747743051&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33747743051&partnerID=8YFLogxK
U2  - 10.1007/s10687-006-0008-9
DO  - 10.1007/s10687-006-0008-9
M3  - Article
AN  - SCOPUS:33747743051
VL  - 8
SP  - 295
EP  - 310
JO  - Extremes
JF  - Extremes
SN  - 1386-1999
IS  - 4
ER  -