Advanced algorithms to infer and analyze 3D genome structures

Project: Research project

Project Details


Project Summary For the past decade, the population-cell Hi-C technique has significantly improved our ability to discover genome-wide DNA proximities. However, because population Hi-C is based on a pool of cells, it will not help us reveal each single cell's 3D genome structure or understand cell-to-cell variability in terms of 3D genome structure and gene regulation. It is also difficult to achieve a high resolution, such as 1 Kbp, with population Hi- C; therefore, when finding and analyzing the spatial interactions for the promoter or enhancer regions typically associated with biologically-important regulatory elements, population Hi-C data's resolution is too low to be useful. Moreover, while we know that the CTCF-cohesin complex plays a key role in the formation of genome 3D structures, the question is whether long non-coding RNAs (lncRNAs) are involved in the process since lncRNAs have been found to recruit proteins needed for chromatin remodeling, and our preliminary research has found that lncRNA LINC00346 directly interacts with CTCF. Finally, while members of the bioinformatics community, including the PI, have developed many algorithms to reconstruct 3D genome structures based on population Hi-C data, important questions still must be answered regarding how 3D genome structures are involved in gene regulation and whether there are relationships between 3D genome structures and genetic and epigenetic features. The PI proposes to conduct leading research to overcome these challenges and address these questions. During the next five years, the PI will develop algorithms to reconstruct the 3D whole- genome structures for single cells and analyze cell-to-cell variabilities in terms of 3D genome structure and gene regulation. The PI will develop a deep learning algorithm to enhance the resolution of population Hi-C data to that of Capture Hi-C data (1 Kbp) so that we can make good use of the large amount of Hi-C data accumulated in the past decade. An online database will be built to allow the community to access both population and single-cell 3D genome structures in an integrated way. The PI will work with a cancer biologist to discover any lncRNAs that function as a scaffold to fine-tune the CTCF-cohesin protein complex, as well as two neuron scientists to develop a more complete understanding of gene regulation while considering 3D genome and other genetic and epigenetic features. Given the PI's track record and productivity, having three computational goals and two collaborative goals is not only feasible but computationally and biologically rewarding. In five years, once the proposed studies are accomplished, the PI should have established a uniquely independent place in the field of 3D genome, maintaining leading positions in inferring single-cell 3D genome structures, enhancing Hi-C data resolution, and building 3D genome databases, while establishing similar positions in reconstructing high-resolution 3D genome structures, finding lncRNAs' roles in the formation of genome structures, and understanding how 3D genome structures are involved in gene regulation.
Effective start/end date9/1/218/31/22


  • National Institute of General Medical Sciences: $366,706.00


Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.