Learning the Network Structure of Heterogeneous Data via Pairwise Exponential Markov Random Fields

Y. Park, D. Hallac, S. Boyd, and J. Leskovec

Proceedings International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1302–1310, 2017.

Markov random fields (MRFs) are a useful tool for modeling relationships present in large and high-dimensional data. Often, this data comes from various sources and can have diverse distributions, for example a combination of numerical, binary, and categorical variables. Here, we define the pairwise exponential Markov random field (PE-MRF), an approach capable of modeling exponential family distributions in heterogeneous domains. We develop a scalable method of learning the graphical structure across the variables by solving a regularized approximated maximum likelihood problem. Specifically, we first derive a tractable upper bound on the log-partition function. We then use this upper bound to derive the group graphical lasso, a generalization of the classic graphical lasso problem to heterogeneous domains. To solve this problem, we develop a fast algorithm based on the alternating direction method of multipliers (ADMM). We also prove that our estimator is sparsistent, with guaranteed recovery of the true underlying graphical structure, and that it has a polynomially faster runtime than the current stateof-the-art method for learning such distributions. Experiments on synthetic and realworld examples demonstrate that our approach is both efficient and accurate at uncovering the structure of heterogeneous data.