HiCNN: A very deep convolutional neural network to better enhance the resolution of Hi-C data

Tong Liu, Zheng Wang, John Hancock

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Motivation: High-resolution Hi-C data are indispensable for studying three-dimensional (3D) genome organization at the kilobase level. However, generating high-resolution Hi-C data (e.g. 5 kb) experimentally requires millions of mammalian cells and may produce billions of paired-end reads at a high sequencing cost. It would therefore be valuable to enhance the resolution of Hi-C data computationally.

Results: We developed a new computational method named HiCNN that uses a 54-layer very deep convolutional neural network to enhance the resolution of Hi-C data. The network combines global and local residual learning with multiple speedup techniques, resulting in fast convergence. We evaluated the method using mean squared errors and Pearson's correlation coefficients between real and computationally predicted high-resolution Hi-C data. The evaluation results show that HiCNN consistently outperforms HiCPlus, the only existing tool in the literature, both when training and testing data come from the same cell type (i.e. GM12878) and when they come from two different cell types in the same or different species (i.e. GM12878 for training with K562 for testing, and GM12878 for training with CH12-LX for testing). We further found that HiCNN-enhanced high-resolution Hi-C data are more consistent with real experimental high-resolution Hi-C data than HiCPlus-enhanced data in terms of indicating statistically significant interactions. Moreover, HiCNN can efficiently enhance low-resolution Hi-C data, which helps recover two chromatin loops that were confirmed by 3D-FISH.

Availability and implementation: HiCNN is freely available at http://dna.cs.miami.edu/HiCNN/.

Supplementary information: Supplementary data are available at Bioinformatics online.
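The two evaluation metrics named in the abstract, mean squared error and Pearson's correlation coefficient, can be illustrated with a minimal sketch. This is not the HiCNN evaluation code: the function names and the tiny 3x3 contact matrices are hypothetical, and the metrics are computed over all matrix entries at once, which may differ from the protocol the authors actually used.

```python
import math

def mean_squared_error(real, predicted):
    """Average squared difference over all entries of two equally sized matrices."""
    pairs = [
        (r, p)
        for row_r, row_p in zip(real, predicted)
        for r, p in zip(row_r, row_p)
    ]
    return sum((r - p) ** 2 for r, p in pairs) / len(pairs)

def pearson_correlation(real, predicted):
    """Pearson's r between the flattened entries of two equally sized matrices."""
    xs = [v for row in real for v in row]
    ys = [v for row in predicted for v in row]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy contact matrices (illustrative values, not real Hi-C data).
real_hic = [
    [10.0, 4.0, 1.0],
    [4.0, 12.0, 5.0],
    [1.0, 5.0, 9.0],
]
predicted_hic = [
    [9.0, 5.0, 1.0],
    [5.0, 11.0, 4.0],
    [1.0, 4.0, 10.0],
]

mse = mean_squared_error(real_hic, predicted_hic)
pcc = pearson_correlation(real_hic, predicted_hic)
print(f"MSE = {mse:.4f}, Pearson r = {pcc:.4f}")
```

A lower mean squared error and a Pearson coefficient closer to 1 both indicate that the predicted matrix is closer to the real high-resolution matrix.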

Original language: English (US)
Pages (from-to): 4222-4228
Number of pages: 7
Journal: Bioinformatics
Volume: 35
Issue number: 21
DOI: 10.1093/bioinformatics/btz251
State: Published - Nov 1 2019

Fingerprint

Neural Networks
Computational methods
Testing
High Resolution
Bioinformatics
Computational Biology
Chromatin
Genes
Cells
Availability
Learning
Genome
Costs and Cost Analysis
Cell
Costs
Experiments
Pearson Correlation
Mean Squared Error

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

HiCNN: A very deep convolutional neural network to better enhance the resolution of Hi-C data. / Liu, Tong; Wang, Zheng; Hancock, John.

In: Bioinformatics, Vol. 35, No. 21, 01.11.2019, p. 4222-4228.


@article{a24b85317c3c4756bca99e17184193db,
title = "HiCNN: A very deep convolutional neural network to better enhance the resolution of Hi-C data",
abstract = "Motivation: High-resolution Hi-C data are indispensable for the studies of three-dimensional (3D) genome organization at kilobase level. However, generating high-resolution Hi-C data (e.g. 5 kb) by conducting Hi-C experiments needs millions of mammalian cells, which may eventually generate billions of paired-end reads with a high sequencing cost. Therefore, it will be important and helpful if we can enhance the resolutions of Hi-C data by computational methods. Results: We developed a new computational method named HiCNN that used a 54-layer very deep convolutional neural network to enhance the resolutions of Hi-C data. The network contains both global and local residual learning with multiple speedup techniques included resulting in fast convergence. We used mean squared errors and Pearson's correlation coefficients between real high-resolution and computationally predicted high-resolution Hi-C data to evaluate the method. The evaluation results show that HiCNN consistently outperforms HiCPlus, the only existing tool in the literature, when training and testing data are extracted from the same cell type (i.e. GM12878) and from two different cell types in the same or different species (i.e. GM12878 as training with K562 as testing, and GM12878 as training with CH12-LX as testing). We further found that the HiCNN-enhanced high-resolution Hi-C data are more consistent with real experimental high-resolution Hi-C data than HiCPlus-enhanced data in terms of indicating statistically significant interactions. Moreover, HiCNN can efficiently enhance low-resolution Hi-C data, which eventually helps recover two chromatin loops that were confirmed by 3D-FISH. Availability and implementation: HiCNN is freely available at http://dna.cs.miami.edu/HiCNN/. Supplementary information: Supplementary data are available at Bioinformatics online.",
author = "Tong Liu and Zheng Wang and John Hancock",
year = "2019",
month = "11",
day = "1",
doi = "10.1093/bioinformatics/btz251",
language = "English (US)",
volume = "35",
pages = "4222--4228",
journal = "Bioinformatics (Oxford, England)",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "21",
}

TY - JOUR

T1 - HiCNN: A very deep convolutional neural network to better enhance the resolution of Hi-C data

AU - Liu, Tong

AU - Wang, Zheng

AU - Hancock, John

PY - 2019/11/1

Y1 - 2019/11/1

N2 - Motivation: High-resolution Hi-C data are indispensable for the studies of three-dimensional (3D) genome organization at kilobase level. However, generating high-resolution Hi-C data (e.g. 5 kb) by conducting Hi-C experiments needs millions of mammalian cells, which may eventually generate billions of paired-end reads with a high sequencing cost. Therefore, it will be important and helpful if we can enhance the resolutions of Hi-C data by computational methods. Results: We developed a new computational method named HiCNN that used a 54-layer very deep convolutional neural network to enhance the resolutions of Hi-C data. The network contains both global and local residual learning with multiple speedup techniques included resulting in fast convergence. We used mean squared errors and Pearson's correlation coefficients between real high-resolution and computationally predicted high-resolution Hi-C data to evaluate the method. The evaluation results show that HiCNN consistently outperforms HiCPlus, the only existing tool in the literature, when training and testing data are extracted from the same cell type (i.e. GM12878) and from two different cell types in the same or different species (i.e. GM12878 as training with K562 as testing, and GM12878 as training with CH12-LX as testing). We further found that the HiCNN-enhanced high-resolution Hi-C data are more consistent with real experimental high-resolution Hi-C data than HiCPlus-enhanced data in terms of indicating statistically significant interactions. Moreover, HiCNN can efficiently enhance low-resolution Hi-C data, which eventually helps recover two chromatin loops that were confirmed by 3D-FISH. Availability and implementation: HiCNN is freely available at http://dna.cs.miami.edu/HiCNN/. Supplementary information: Supplementary data are available at Bioinformatics online.

UR - http://www.scopus.com/inward/record.url?scp=85072614470&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072614470&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btz251

DO - 10.1093/bioinformatics/btz251

M3 - Article

C2 - 31056636

AN - SCOPUS:85072614470

VL - 35

SP - 4222

EP - 4228

JO - Bioinformatics (Oxford, England)

JF - Bioinformatics (Oxford, England)

SN - 1367-4803

IS - 21

ER -