MMHC - The Max-Min Hill-Climbing Algorithm

***Updated: Nov. 2007, Added links to testdata for Bayesian Scoring comparisons
***Updated: Feb. 2008, Added data for number of statistical/scoring calls

The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm
Ioannis Tsamardinos*, Laura E. Brown*, Constantin F. Aliferis*
Department of Biomedical Informatics
Vanderbilt University
Nashville, TN 37232

*Current Contact: I. Tsamardinos, University of Crete; L.E. Brown, Michigan Technological University; C.F. Aliferis, New York University.
Also hosted at:http://www.dsl-lab.org/supplements/mmhc_paper/mmhc_index.html
Or: http://discover.mc.vanderbilt.edu/discover/public/supplements/mmhc_paper/mmhc_index.html

Abstract
We present a new algorithm for Bayesian network structure learning, called Max-Min Hill-Climbing (MMHC). The algorithm combines ideas from local learning, constraint-based, and search-and-score techniques in a principled and effective way. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. In our extensive empirical evaluation MMHC outperforms on average and in terms of various metrics several prototypical and state-of-the-art algorithms, namely the PC, Sparse Candidate, Three Phase Dependency Analysis, Optimal Reinsertion, Greedy Equivalence Search, and Greedy Search. These are the first empirical results simultaneously comparing most of the major Bayesian network algorithms against each other. MMHC offers certain theoretical advantages, specifically over the Sparse Candidate algorithm, corroborated by our experiments. MMHC and detailed results of our study are publicly available at http://www.cs.mtu.edu/~lebrown/supplements/mmhc_paper/ mmhc_index .html.

Paper (published in Machine Learning, 2006) [pdf] [ps] Supplemental Appendices [pdf] [ps]

Algorithm
The Max-Min Hill-Climbing (MMHC) Algorithm is available in the Causal Explorer package. Implementations of Greedy Search (GS), PC, and Three Phase Dependency Analysis (TPDA) are also included in the Causal Explorer package.

Datasets
Datasets are listed by name, "data" links to a zip file of the datasets used in the paper, "link" directs the user to the networks entry in the Bayesian Network Repository . The Bayesian Network Repository contains the networks stored in multiple formats as well as citations to the original papers. Each zip file contains the 10 datasets used for learning at each sample size (500, 1000, 5000) and a file, Name_graph.txt, that contains the adjacency matrix of the true network (31 files in total). Note that for the larger datasets the zip file might be upwards of 15MB to download.

We also desired to experiment with larger networks than what is available in the public domain. In prior work, we have developed a method that generates large Bayesian Networks by tiling several copies of smaller Bayesian Networks until we reach a desired number of variables. The tiling is performed in a way that maintains the structural and probabilistic properties of the original network in the tiled network. Using the tiling algorithm we created versions of several of the smaller networks. The new Bayesian Networks contained 3, 5, and 10 copies of the originals and are indicated with the number of tiles next to the name of a network.

Alarm - [data] - [test_data] [link]
- Alarm3 - [data] - [test_data]
- Alarm5 - [data] - [test_data]
- Alarm10 - [data] - [test_data]
Child - [data] - [test_data]
- Child3 - [data] - [test_data]
- Child5 - [data] - [test_data]
- Child10 - [data] - [test_data]

Insurance - [data] - [test_data] [link]
- Insurance3 - [data] - [test_data]
- Insurance5 - [data] - [test_data]
- Insurance10 - [data] - [test_data]
Hailfinder - [data] - [test_data] [link]
- Hailfinder3 - [data] - [test_data]
- Hailfinder5 - [data] - [test_data]
- Hailfinder10 - [data] - [test_data]

Supplemental Appendices
B. Computational Efficiency Results

B.1 Timing Results
- B.1.1 Median Timing Results [pdf] [ps]
- B.1.2 Complete Timing Results
  The complete results tables are given below as Matlab data files and text files. The complete results are also visually shown in Figures 1 - 9. These figures are bar plots of the absolute timing results for each algorithm on a given network and sample size. The error bars plotted are at +/- standard deviation.
  A red bar indicates that that particular algorithm hit the 2-day time limit. A bar with pink edging indicates it was an extreme value and exceeded the limits of the graph (the true value is given for comparison). Finally, a light green bar indicates the minimum time.
  When a bar is missing for an algorithm then the algorithm was unable to run on that particular network and sample size. Also, recall that the OR algorithm was set to run at one and two times the amount of time it took MMHC to finish.
  - Figures 1-9 [pdf] [ps]
B.2 Number of Statistical Calls Performed Results
- B.2.1 Median Statistical Calls Performed Results [pdf] [ps]

C. Quality of Reconstruction Results

C.1 Bayesian Score Results
- C.1.1 Median Bayesian Score Results [pdf] [ps]
- C.1.2 Complete Bayesian Score Results
  The complete results tables are given below as Matlab data files and text files. The Matlab files are 4-D matrices are indexed by network (22), algorithm (13), sample size (3), and sampling (5). For each sample size there is a text document with the mean and standard deviation for all algorithms and networks.
  The complete Bayesian Score results are also visually shown in Figures 10- 18. These figures are bar plots of the absolute timing results for each algorithm on a given network and sample size. The error bars plotted are at +/- standard deviation.
  A red * indicates the algorithm hit a two-day time limit with the quality not-determined. A bar with pink edging indicates it was an extreme value and exceeded the limits of the plot; the value is given for comparison. The green bar indicates the minimum.
  When a bar is missing for an algorithm, indicated by an X, the algorithm was unable to run on that particular network and sample size.
  - Figures 10-18 [pdf] [ps]
C.2 Structural Hamming Distance Results
- C.2.1 Median Structural Hamming Distance Results [pdf] [ps]
- C.2.2 Complete Structural Hamming Distance Results
  The complete results tables are given below as Matlab data files and text files. The 4-D matrices are indexed by network (22), algorithm (13), sample size (3), and sampling (5). Also, for each sample size there is a text document with the mean and standard deviation for all algorithms and networks.
  The complete Structural Hamming Distance results are also visually shown in 19 - 27. These figures are bar plots of the absolute timing results for each algorithm on a given network and sample size. The error bars plotted are at +/- standard deviation.
  A red * indicates the algorithm hit a two-day time limit with the quality not-determined. A bar with pink edging indicates it was an extreme value and exceeded the limits of the plot; the value is given for comparison. The green bar indicates the minimum (for two cases CHILD and PIGS at sample size 5000 the minimum was MMHC with 0 errors in this case a green 0 is plotted instead).
  When a bar is missing for an algorithm, denoted with an X, the algorithm was unable to run on that particular network and sample size.
  - Figures 19-27 [pdf] [ps]
C.3 Kullback-Leibler Divergence Results
- C.3.1 Average Kullback-Leibler Divergence Results [pdf] [ps]
- C.3.2 Median Kullback-Leibler Divergence Results [pdf] [ps]

D. Results Across Common Datasets
In order to compare all algorithms to each other, the metrics are evaluated across common networks. In order to keep the number of common networks high we exclude GES from this analysis. Every algorithm (except for GES) runs across 10 networks at sample size 500 and 1000 and 12 networks at sample size 5000. With this analysis comparisons may be made between all algorithms and not just with MMHC.

D.1 Computational Efficiency Results
- D.1.1 Time Results
  Normalized time is the running time of each algorithm for a particular sample size and network divided by the corresponding running time of MMHC. A normalized time value smaller than one corresponds to an algorithm with faster running times than MMHC. The average normalized timing results and median normalized timing results are presented over the common networks in Tables 7 and 8. The term in parenthesis is the number of networks taken in the average (median) calculation; here those networks are common across all algorithms.
  - Tables 7-8 [pdf] [ps]
- D.1.2 Number of Statistical Calls Performed Results
  Normalized number of statistical calls performed is the number of statistical calls completed by each algorithm (number of tests of conditional independence, measures of association, and/or number of calls to the scoring function) for a dataset divided by MMHC's on the same dataset. Normalized values greater than one correspond to an algorithm performing a greater number of statistical functions than MMHC. The average and median normalized number of statistical calls performed are presented in Tables 9 and 10. The term in parentheses indicate the number of common networks included in the average (median) calculation.
  - Tables 9-10 [pdf] [ps]
D.2 Quality of Reconstruction Results
- D.2.1 Bayesian Score Results
  - Tables 11-12 [pdf] [ps]
- D.2.2 Structural Hamming Distance Results
  - Tables 13-14 [pdf] [ps]

E. Effects on time and quality of learning of increasing the number of variables

E.1 Effect on Greedy Search of Increasing the Number of Variables
- Figure 28-29 [pdf] [ps]
E.2 Effect on PC of Increasing the Number of Variables
- Figure 30-31 [pdf] [ps]
E.3 Effect on TPDA of Increasing the Number of Variables
- Figure 32-33 [pdf] [ps]
E.4 Effect on Sparse Candidate k=5 of Increasing the Number of Variables
- Figure 34 [pdf] [ps]
E.5 Effect on Sparse Candidate k=10 of Increasing the Number of Variables
- Figure 35 [pdf] [ps]
E.6 Effect on Optimal Reinsertion 1 and 2 k=5 of Increasing the Number of Variables
- Figure 36 [pdf] [ps]
E.7 Effect on Optimal Reinsertion 1 and 2 k=10 of Increasing the Number of Variables
- Figure 37 [pdf] [ps]
E.8 Effect on Optimal Reinsertion 1 and 2 k=20 of Increasing the Number of Variables
- Figure 38 [pdf] [ps]

Complete Data Tables The complete results tables are given below as Matlab data files. The results matrices are indexed by dataset (22) by algorithm (13) by sample size (3) by trial (5). Files giving the indexing for the dataset and algorithm are linked below. The sample size is indexed as follows: sample size 500 == 1, sample size 1000 == 2, sample size 5000 = 3.