Sparse Structural Equation Models for Gene Networks and Chemical Genomics

  • Cai, Xiaodong (PI)

Project: Research project

Project Details


DESCRIPTION (provided by applicant): Structural equation modeling unifies regression, factor analysis, directed graphs and other (non)linear models into a powerful and flexible toolbox for statistical inference. It has well-documented merits in areas as diverse as biology, ecology, economics, psychology, and the social sciences. Despite the flexibility of structural equation models (SEMs), their ability to cope with the high-dimensional problems encountered in contemporary fields is limited by the lack of efficient and effective inference methods. A truly focused effort is required to make the necessary breakthroughs in high-dimensional SEMs and demonstrate their suitability in emerging research areas. The objective of this project is to develop efficient inference methods for high-dimensional SEMs tailored for inference of gene networks and optimized strategies for chemical genomics. A key enabler to this end is leveraging the sparsity attributes present in high-dimensional data. The proposed research themes are centered around two thrusts: (T1) Inference for sparse SEMs: A set of efficient and robust inference methods using novel algorithmic techniques and parallel computing will be developed for both linear and nonlinear high-dimensional SEMs; and (T2) SEM-based inference of gene regulatory networks and application to optimized chemical genomics: S. cerevisiae and human gene networks will be inferred by integrating multiple types of data under the SEM framework. The inferred networks will also be validated experimentally. A set of natural compounds will be profiled using SEM-based computational strategies to drive chemical genetic screens in S. cerevisiae and S. pombe. The proposed modeling framework will explicitly incorporate genetic variation across individuals in a population, and thus can directly use the wealth of sequencing data that is currently being generated to tackle the genotype-to-phenotype challenge. Furthermore, the proposed work will markedly enhance the throughput at which new bioactive compounds are characterized using chemical genomics-based approaches in yeast and in other model systems. It will also enable the application of high-dimensional SEMs in additional areas including economics, psychology, ecology, biobehavioral and other social sciences.

PUBLIC HEALTH RELEVANCE: Successful completion of the proposed project could have broad impact on human health, as it would help to understand the role of genes and their interactions in various diseases and enable the construction of more comprehensive small molecule libraries with well-defined molecular targets for use in new therapeutics.
Effective start/end date: 7/1/12 → 10/31/18


Funding

  • National Institutes of Health: $361,906.00
  • National Institutes of Health: $357,106.00
  • National Institutes of Health: $369,527.00
  • National Institutes of Health: $392,147.00
  • National Institutes of Health: $346,403.00


  • Medicine(all)
  • Biochemistry, Genetics and Molecular Biology(all)

