TADKB: Family classification and a knowledge base of topologically associating domains

Tong Liu, Jacob Porter, Chenguang Zhao, Hao Zhu, Nan Wang, Zheng Sun, Yin Yuan Mo, Zheng Wang

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Background: Topologically associating domains (TADs) are considered the structural and functional units of the genome. However, the literature lacks an integrated resource from which researchers can obtain family classifications and detailed information about TADs.

Results: We built TADKB, an online knowledge base that integrates information on TADs in eleven human and mouse cell types. For each TAD, TADKB provides the predicted three-dimensional (3D) structures of chromosomes and TADs, together with detailed annotations of the protein-coding genes and long non-coding RNAs (lncRNAs) located in the TAD. In addition to the 3D chromosomal structures inferred from population Hi-C, TADKB integrates the single-cell, haplotype-resolved 3D chromosomal structures of 17 GM12878 cells. Users can submit a gene or lncRNA ID or sequence as a query to find the TAD(s) that contain the query gene or lncRNA. We also classified TADs into families. To do so, we used the TM-scores between the reconstructed 3D structures of TADs as structural similarities, and Pearson's correlation coefficients between the fold enrichments of chromatin states as functional similarities. Within each cell type, all TADs were clustered separately by structural similarity and by functional similarity using the spectral clustering algorithm with various predefined numbers of clusters. Comparing the TADs shared between structural and functional clusters, we found that most TADs in the functional clusters with depleted chromatin states fall into one or two structural clusters. This novel finding indicates a connection between the 3D structures of TADs and their DNA functions in terms of chromatin states.

Conclusion: TADKB is available at http://dna.cs.miami.edu/TADKB/.
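The family-classification procedure described in the abstract (two similarity matrices, spectral clustering of each, then a comparison of cluster membership) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not TADKB's code. TM-scores are assumed to be precomputed by an external structure-comparison tool; Pearson correlations are mapped to non-negative affinities with (r + 1) / 2, an assumption made because spectral clustering needs non-negative weights and the abstract does not say how negative correlations were handled; and the spectral step follows the Ng-Jordan-Weiss normalized-affinity variant, using a hand-rolled subspace iteration and k-means so that only the standard library is needed. All function names (`functional_affinity`, `spectral_clustering`, `overlap_table`, etc.) and the toy numbers are hypothetical.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def functional_affinity(enrichment):
    """Pearson r between fold-enrichment vectors, mapped to (r + 1) / 2 >= 0 (assumption)."""
    n = len(enrichment)
    return [[(pearson(enrichment[i], enrichment[j]) + 1) / 2 for j in range(n)]
            for i in range(n)]


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gram_schmidt(vectors):
    """Orthonormalise a list of vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            dot = sum(x * y for x, y in zip(w, b))
            w = [x - dot * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    return basis


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def spectral_clustering(W, k, iters=300, seed=0):
    """Ng-Jordan-Weiss-style spectral clustering of a precomputed affinity matrix."""
    n = len(W)
    # TM-scores are asymmetric in general, so average both directions.
    W = [[(W[i][j] + W[j][i]) / 2 for j in range(n)] for i in range(n)]
    d = [sum(row) for row in W]
    # M = D^-1/2 W D^-1/2 + I: the shift makes all eigenvalues non-negative, so
    # subspace iteration converges to the eigenvectors of the k largest ones.
    M = [[W[i][j] / math.sqrt(d[i] * d[j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    V = gram_schmidt([[rng.random() for _ in range(n)] for _ in range(k)])
    for _ in range(iters):
        V = gram_schmidt([[sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
                          for v in V])
    # Embed TAD i as (V[0][i], ..., V[k-1][i]) scaled to unit length, then run k-means.
    rows = [[v[i] for v in V] for i in range(n)]
    rows = [[x / (math.sqrt(sum(y * y for y in r)) or 1.0) for x in r] for r in rows]
    return kmeans(rows, k)


def overlap_table(functional_labels, structural_labels):
    """For each functional cluster, count its TADs per structural cluster."""
    table = {}
    for f, s in zip(functional_labels, structural_labels):
        table.setdefault(f, Counter())[s] += 1
    return table


# Toy inputs for six hypothetical TADs (illustrative numbers, not TADKB data).
enrichment = [  # fold enrichment of four chromatin states per TAD
    [5.0, 3.0, 0.5, 0.2], [4.5, 2.8, 0.6, 0.3], [5.2, 3.1, 0.4, 0.1],
    [0.2, 0.4, 2.9, 4.8], [0.3, 0.5, 3.2, 5.1], [0.1, 0.6, 2.7, 4.9],
]
tm_scores = [  # pairwise TM-scores, assumed precomputed by an external tool
    [1.00, 0.62, 0.58, 0.21, 0.19, 0.23],
    [0.62, 1.00, 0.60, 0.18, 0.22, 0.20],
    [0.58, 0.60, 1.00, 0.24, 0.20, 0.17],
    [0.21, 0.18, 0.24, 1.00, 0.57, 0.61],
    [0.19, 0.22, 0.20, 0.57, 1.00, 0.59],
    [0.23, 0.20, 0.17, 0.61, 0.59, 1.00],
]
functional = spectral_clustering(functional_affinity(enrichment), k=2)
structural = spectral_clustering(tm_scores, k=2)
print(overlap_table(functional, structural))
```

The toy inputs were chosen so that both similarity types separate the same two groups; they illustrate the mechanics only and reproduce no result from the paper. Re-running with several values of `k` corresponds to the "various predefined numbers of clusters" in the abstract. With real data, a library routine such as scikit-learn's `SpectralClustering(n_clusters=k, affinity="precomputed")` would more likely replace the hand-rolled eigen-solver and k-means; the standard-library version is only meant to make each step visible.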

Original language: English (US)
Article number: 217
Journal: BMC Genomics
Volume: 20
Issue number: 1
DOI: 10.1186/s12864-019-5551-2
State: Published - Mar 14 2019

Keywords

  • Family classification
  • lncRNAs
  • Long non-coding RNAs
  • Single-cell 3D genome structures
  • TADs
  • Topologically associating domains

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

TADKB: Family classification and a knowledge base of topologically associating domains. / Liu, Tong; Porter, Jacob; Zhao, Chenguang; Zhu, Hao; Wang, Nan; Sun, Zheng; Mo, Yin Yuan; Wang, Zheng.

In: BMC Genomics, Vol. 20, No. 1, 217, 14.03.2019.

Liu, Tong ; Porter, Jacob ; Zhao, Chenguang ; Zhu, Hao ; Wang, Nan ; Sun, Zheng ; Mo, Yin Yuan ; Wang, Zheng. / TADKB: Family classification and a knowledge base of topologically associating domains. In: BMC Genomics. 2019 ; Vol. 20, No. 1.
@article{541f5aca35f0416a8289d8b340c77a8d,
title = "TADKB: Family classification and a knowledge base of topologically associating domains",
keywords = "Family classification, lncRNAs, Long non-coding RNAs, Single-cell 3D genome structures, TADs, Topologically associating domains",
author = "Tong Liu and Jacob Porter and Chenguang Zhao and Hao Zhu and Nan Wang and Zheng Sun and Mo, {Yin Yuan} and Zheng Wang",
year = "2019",
month = "3",
day = "14",
doi = "10.1186/s12864-019-5551-2",
language = "English (US)",
volume = "20",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - TADKB: Family classification and a knowledge base of topologically associating domains

AU - Liu, Tong

AU - Porter, Jacob

AU - Zhao, Chenguang

AU - Zhu, Hao

AU - Wang, Nan

AU - Sun, Zheng

AU - Mo, Yin Yuan

AU - Wang, Zheng

PY - 2019/3/14

Y1 - 2019/3/14

AB - Background: Topologically associating domains (TADs) are considered the structural and functional units of the genome. However, there is a lack of an integrated resource for TADs in the literature where researchers can obtain family classifications and detailed information about TADs. Results: We built an online knowledge base TADKB integrating knowledge for TADs in eleven cell types of human and mouse. For each TAD, TADKB provides the predicted three-dimensional (3D) structures of chromosomes and TADs, and detailed annotations about the protein-coding genes and long non-coding RNAs (lncRNAs) existent in each TAD. Besides the 3D chromosomal structures inferred by population Hi-C, the single-cell haplotype-resolved chromosomal 3D structures of 17 GM12878 cells are also integrated in TADKB. A user can submit query gene/lncRNA ID/sequence to search for the TAD(s) that contain(s) the query gene or lncRNA. We also classified TADs into families. To achieve that, we used the TM-scores between reconstructed 3D structures of TADs as structural similarities and the Pearson's correlation coefficients between the fold enrichment of chromatin states as functional similarities. All of the TADs in one cell type were clustered based on structural and functional similarities respectively using the spectral clustering algorithm with various predefined numbers of clusters. We have compared the overlapping TADs from structural and functional clusters and found that most of the TADs in the functional clusters with depleted chromatin states are clustered into one or two structural clusters. This novel finding indicates a connection between the 3D structures of TADs and their DNA functions in terms of chromatin states. Conclusion: TADKB is available at http://dna.cs.miami.edu/TADKB/.

KW - Family classification

KW - lncRNAs

KW - Long non-coding RNAs

KW - Single-cell 3D genome structures

KW - TADs

KW - Topologically associating domains

UR - http://www.scopus.com/inward/record.url?scp=85062942557&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062942557&partnerID=8YFLogxK

U2 - 10.1186/s12864-019-5551-2

DO - 10.1186/s12864-019-5551-2

M3 - Article

C2 - 30871473

AN - SCOPUS:85062942557

VL - 20

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 1

M1 - 217

ER -