***Updated: Nov. 2007, Added links to testdata for Bayesian Scoring comparisons
***Updated: Feb. 2008, Added data for number of statistical/scoring calls

The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm
Ioannis Tsamardinos*, Laura E. Brown*, Constantin F. Aliferis*
Department of Biomedical Informatics
Vanderbilt University
Nashville, TN 37232

*Current Contact: I. Tsamardinos, University of Crete; L.E. Brown, Michigan Technological University; C.F. Aliferis, New York University.
Also hosted at:http://www.dsl-lab.org/supplements/mmhc_paper/mmhc_index.html
Or: http://discover.mc.vanderbilt.edu/discover/public/supplements/mmhc_paper/mmhc_index.html

Abstract
We present a new algorithm for Bayesian network structure learning, called Max-Min Hill-Climbing (MMHC). The algorithm combines ideas from local learning, constraint-based, and search-and-score techniques in a principled and effective way. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. In our extensive empirical evaluation MMHC outperforms on average and in terms of various metrics several prototypical and state-of-the-art algorithms, namely the PC, Sparse Candidate, Three Phase Dependency Analysis, Optimal Reinsertion, Greedy Equivalence Search, and Greedy Search. These are the first empirical results simultaneously comparing most of the major Bayesian network algorithms against each other. MMHC offers certain theoretical advantages, specifically over the Sparse Candidate algorithm, corroborated by our experiments. MMHC and detailed results of our study are publicly available at http://www.cs.mtu.edu/~lebrown/supplements/mmhc_paper/ mmhc_index .html.

Paper (published in Machine Learning, 2006) [pdf]   [ps]     Supplemental Appendices [pdf]   [ps]



Algorithm
The Max-Min Hill-Climbing (MMHC) Algorithm is available in the Causal Explorer package. Implementations of Greedy Search (GS), PC, and Three Phase Dependency Analysis (TPDA) are also included in the Causal Explorer package.

Datasets
Datasets are listed by name, "data" links to a zip file of the datasets used in the paper, "link" directs the user to the networks entry in the Bayesian Network Repository . The Bayesian Network Repository contains the networks stored in multiple formats as well as citations to the original papers. Each zip file contains the 10 datasets used for learning at each sample size (500, 1000, 5000) and a file, Name_graph.txt, that contains the adjacency matrix of the true network (31 files in total). Note that for the larger datasets the zip file might be upwards of 15MB to download.

We also desired to experiment with larger networks than what is available in the public domain. In prior work, we have developed a method that generates large Bayesian Networks by tiling several copies of smaller Bayesian Networks until we reach a desired number of variables. The tiling is performed in a way that maintains the structural and probabilistic properties of the original network in the tiled network. Using the tiling algorithm we created versions of several of the smaller networks. The new Bayesian Networks contained 3, 5, and 10 copies of the originals and are indicated with the number of tiles next to the name of a network.




Supplemental Appendices
B. Computational Efficiency Results
C. Quality of Reconstruction Results
D. Results Across Common Datasets
In order to compare all algorithms to each other, the metrics are evaluated across common networks. In order to keep the number of common networks high we exclude GES from this analysis. Every algorithm (except for GES) runs across 10 networks at sample size 500 and 1000 and 12 networks at sample size 5000. With this analysis comparisons may be made between all algorithms and not just with MMHC.
E. Effects on time and quality of learning of increasing the number of variables



Complete Data Tables   The complete results tables are given below as Matlab data files. The results matrices are indexed by dataset (22) by algorithm (13) by sample size (3) by trial (5). Files giving the indexing for the dataset and algorithm are linked below. The sample size is indexed as follows: sample size 500 == 1, sample size 1000 == 2, sample size 5000 = 3.