Sparse Structural Equation Models for Gene Networks and Chemical Genomics

Project: Research project

Project Details


Structural equation modeling unifies regression, factor analysis, directed graphs, and other (non)linear models into a powerful and flexible toolbox for statistical inference. It has well-documented merits in areas as diverse as biology, ecology, economics, psychology, and the social sciences. Despite this flexibility, the ability of structural equation models (SEMs) to cope with the high-dimensional problems encountered in contemporary fields is limited by the lack of efficient and effective inference methods. A focused effort is therefore required to make the necessary breakthroughs in high-dimensional SEMs and to demonstrate their suitability in emerging research areas.

The objective of this project is to develop efficient inference methods for high-dimensional SEMs, tailored to the inference of gene networks and to optimized strategies for chemical genomics. A key enabler to this end is leveraging the sparsity present in high-dimensional data. The proposed research is organized around two thrusts:

(T1) Inference for sparse SEMs: a set of efficient and robust inference methods, built on novel algorithmic techniques and parallel computing, will be developed for both linear and nonlinear high-dimensional SEMs.

(T2) SEM-based inference of gene regulatory networks and application to optimized chemical genomics: S. cerevisiae and human gene networks will be inferred by integrating multiple types of data under the SEM framework, and the inferred networks will also be validated experimentally. A set of natural compounds will be profiled using SEM-based computational strategies to drive chemical genetic screens in S. cerevisiae and S. pombe.

The proposed modeling framework will explicitly incorporate genetic variation across individuals in a population and can thus directly use the wealth of sequencing data currently being generated to tackle the genotype-to-phenotype challenge.
Furthermore, the proposed work will markedly increase the throughput at which new bioactive compounds are characterized using chemical genomics-based approaches in yeast and in other model systems. It will also enable the application of high-dimensional SEMs in additional areas, including economics, psychology, ecology, biobehavioral science, and other social sciences.
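To make the idea of a sparse SEM that incorporates genetic variation concrete, the sketch below assumes one common linear formulation, y = By + Fx + e, where y holds the expression levels of p genes, x holds one cis-eQTL genotype per gene, F is diagonal, and B is a sparse matrix of regulatory weights with a zero diagonal. The three-gene network, its weights, and the functions `simulate_sem` and `fit_sparse_sem` are hypothetical. The estimator is a plain row-wise lasso solved by coordinate descent, a simple stand-in chosen for illustration. It is not the project's inference method, and it behaves well here only because the simulated network is acyclic and the noise is small.

```python
import random


def soft_threshold(value, lam):
    """Proximal operator of lam * |b|: the lasso shrinkage step."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def simulate_sem(n, seed=0, noise=0.1):
    """Simulate a hypothetical 3-gene acyclic SEM y = B y + F x + e.

    Edges: gene 0 -> gene 1 (weight 0.8), gene 1 -> gene 2 (weight -0.7).
    F is the identity: each gene has one cis-eQTL with unit effect.
    Returns Y[gene][sample] and X[gene][sample].
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(3)]
    Y = [[0.0] * n for _ in range(3)]
    for k in range(n):
        Y[0][k] = X[0][k] + rng.gauss(0, noise)
        Y[1][k] = 0.8 * Y[0][k] + X[1][k] + rng.gauss(0, noise)
        Y[2][k] = -0.7 * Y[1][k] + X[2][k] + rng.gauss(0, noise)
    return Y, X


def fit_sparse_sem(Y, X, lam=0.05, sweeps=100):
    """Row-wise lasso estimate of B (zero diagonal) and the diagonal of F.

    For each gene i, minimizes
        (1 / 2n) * ||y_i - sum_j B[i][j] y_j - f_i x_i||^2 + lam * sum_j |B[i][j]|
    by cyclic coordinate descent. The eQTL effect f_i is not penalized.
    """
    p, n = len(Y), len(Y[0])
    B = [[0.0] * p for _ in range(p)]
    f = [0.0] * p
    for i in range(p):
        # Predictors: the other genes (penalized) and gene i's own eQTL (key None).
        cols = [(j, Y[j]) for j in range(p) if j != i] + [(None, X[i])]
        sq = {j: sum(v * v for v in z) / n for j, z in cols}
        coef = {j: 0.0 for j, _ in cols}
        resid = list(Y[i])  # all coefficients start at zero
        for _ in range(sweeps):
            for j, z in cols:
                # Correlation of predictor j with the partial residual (j's term added back).
                rho = sum(z[k] * (resid[k] + coef[j] * z[k]) for k in range(n)) / n
                new = rho / sq[j] if j is None else soft_threshold(rho, lam) / sq[j]
                delta = new - coef[j]
                if delta:
                    for k in range(n):
                        resid[k] -= delta * z[k]
                    coef[j] = new
        for j, _ in cols:
            if j is None:
                f[i] = coef[j]
            else:
                B[i][j] = coef[j]
    return B, f
```

For example, `B, f = fit_sparse_sem(*simulate_sem(300))` should give `B[1][0]` near 0.8 and `B[2][1]` near -0.7 (shrunk slightly toward zero by the penalty), while the other off-diagonal entries of `B` stay near zero and each `f[i]` stays near 1. The eQTL term is left unpenalized because, under the stated assumption, every gene has exactly one cis-eQTL; the penalty acts only on the gene-to-gene weights, which is where sparsity is assumed.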
Effective start/end date: 7/1/12 to 10/31/18


  • National Institute of General Medical Sciences: $357,106.00
  • National Institute of General Medical Sciences: $369,527.00
  • National Institute of General Medical Sciences: $361,906.00
  • National Institute of General Medical Sciences: $392,147.00
  • National Institute of General Medical Sciences: $346,403.00

