***Updated: Nov. 2007, Added links to testdata for Bayesian Scoring comparisons
***Updated: Feb. 2008, Added data for number of statistical/scoring calls
The Max-Min Hill-Climbing Bayesian Network Structure Learning
Algorithm
Ioannis Tsamardinos*, Laura E. Brown*, Constantin F. Aliferis*
Department of Biomedical Informatics
Vanderbilt University
Nashville, TN 37232
*Current Contact: I. Tsamardinos, University of Crete; L.E. Brown, Michigan
Technological University; C.F. Aliferis, New York University.
Also hosted at:http://www.dsl-lab.org/supplements/mmhc_paper/mmhc_index.html
Or: http://discover.mc.vanderbilt.edu/discover/public/supplements/mmhc_paper/mmhc_index.html
Abstract
We present a new algorithm for Bayesian network structure learning,
called Max-Min Hill-Climbing (MMHC). The algorithm combines ideas
from local learning, constraint-based, and search-and-score
techniques in a principled and effective way. It first reconstructs
the skeleton of a Bayesian network and then performs a
Bayesian-scoring greedy hill-climbing search to orient the edges. In
our extensive empirical evaluation MMHC outperforms on average and
in terms of various metrics several prototypical and
state-of-the-art algorithms, namely the PC, Sparse Candidate, Three
Phase Dependency Analysis, Optimal Reinsertion, Greedy Equivalence
Search, and Greedy Search. These are the first empirical results
simultaneously comparing most of the major Bayesian network
algorithms against each other. MMHC offers certain theoretical
advantages, specifically over the Sparse Candidate algorithm,
corroborated by our experiments. MMHC and detailed results of our
study are publicly available at
http://www.cs.mtu.edu/~lebrown/supplements/mmhc_paper/
mmhc_index .html.
Paper (published in
Machine Learning, 2006)
[pdf]  
[ps]
    Supplemental Appendices
[pdf]  
[ps]
Algorithm
The Max-Min Hill-Climbing (MMHC) Algorithm is available in the
Causal
Explorer package. Implementations of Greedy Search (GS), PC, and Three Phase Dependency Analysis (TPDA) are also included in the Causal Explorer package.
Datasets
Datasets are listed by name, "data" links to a zip file of
the datasets used in the paper, "link" directs the user to
the networks entry in the Bayesian
Network Repository . The Bayesian Network Repository contains the
networks stored in multiple formats as well as citations to the
original papers. Each zip file contains the 10 datasets used for
learning at each sample size (500, 1000, 5000) and a file,
Name_graph.txt, that contains the adjacency matrix of the true network
(31 files in total). Note that for the larger datasets the zip file
might be upwards of 15MB to download.
We also desired to experiment with larger networks than what is
available in the public domain. In
prior
work, we have developed a method that
generates large Bayesian Networks by tiling several copies of
smaller Bayesian Networks until we reach a desired number of
variables. The tiling is performed in a way that maintains the
structural and probabilistic properties of the original network in
the tiled network. Using the tiling algorithm we created versions of
several of the
smaller networks. The new Bayesian Networks contained 3, 5, and 10
copies of the originals and are indicated with the number of tiles
next to the name of a network.
Supplemental Appendices
B. Computational Efficiency Results
- B.1 Timing Results
- B.1.1 Median Timing Results
[pdf]  
[ps]
- B.1.2 Complete Timing Results
The complete results tables are given below as Matlab data files and text files. The complete results are also visually shown in Figures 1 - 9. These figures are bar plots of the absolute timing results for each algorithm on a given network and sample size. The error bars plotted are at +/- standard deviation.
A red bar indicates that that particular algorithm hit the 2-day time limit. A bar with pink edging indicates it was an extreme value and exceeded the limits of the graph (the true value is given for comparison). Finally, a light green bar indicates the minimum time.
When a bar is missing for an algorithm then the algorithm was unable to run on that particular network and sample size. Also, recall that the OR algorithm was set to run at one and two times the amount of time it took MMHC to finish.
- B.2 Number of Statistical Calls Performed Results
- B.2.1 Median Statistical Calls Performed Results
[pdf]  
[ps]
C. Quality of Reconstruction Results
- C.1 Bayesian Score Results
- C.1.1 Median Bayesian Score Results
[pdf]  
[ps]
- C.1.2 Complete Bayesian Score Results
The complete results tables are given below as Matlab data files and text files. The Matlab files are 4-D matrices are indexed by network (22), algorithm (13), sample size (3), and sampling (5). For each sample size there is a text document with the mean and standard deviation for all algorithms and networks.
The complete Bayesian Score results are also visually shown in Figures 10- 18. These figures are bar plots of the absolute timing results for each algorithm on a given network and sample size. The error bars plotted are at +/- standard deviation.
A red * indicates the algorithm hit a two-day time limit with the quality not-determined. A bar with pink edging indicates it was an extreme value and exceeded the limits of the plot; the value is given for comparison. The green bar indicates the minimum.
When a bar is missing for an algorithm, indicated by an X, the algorithm was unable to run on that particular network and sample size.
- C.2 Structural Hamming Distance Results
- C.2.1 Median Structural Hamming Distance Results
[pdf]  
[ps]
- C.2.2 Complete Structural Hamming Distance Results
The complete results tables are given below as Matlab data files and text files. The 4-D matrices are indexed by network (22), algorithm (13), sample size (3), and sampling (5). Also, for each sample size there is a text document with the mean and standard deviation for all algorithms and networks.
The complete Structural Hamming Distance results are also visually shown in 19 - 27. These figures are bar plots of the absolute timing results for each algorithm on a given network and sample size. The error bars plotted are at +/- standard deviation.
A red * indicates the algorithm hit a two-day time limit with the quality not-determined. A bar with pink edging indicates it was an extreme value and exceeded the limits of the plot; the value is given for comparison. The green bar indicates the minimum (for two cases CHILD and PIGS at sample size 5000 the minimum was MMHC with 0 errors in this case a green 0 is plotted instead).
When a bar is missing for an algorithm, denoted with an X, the algorithm was unable to run on that particular network and sample size.
- C.3 Kullback-Leibler Divergence Results
- C.3.1 Average Kullback-Leibler Divergence Results
[pdf]  
[ps]
- C.3.2 Median Kullback-Leibler Divergence Results
[pdf]  
[ps]
D. Results Across Common Datasets
In order to compare all algorithms to each other, the metrics are
evaluated across common networks. In order to keep the number of
common networks high we exclude GES from this analysis. Every
algorithm (except for GES) runs across 10 networks at sample size
500 and 1000 and 12 networks at sample size 5000. With this
analysis comparisons may be made between all algorithms and not just
with MMHC.
- D.1 Computational Efficiency Results
- D.1.1 Time Results
Normalized time is the running time of each algorithm for a particular sample size and network divided by the corresponding running time of MMHC. A normalized time value smaller than one corresponds to an algorithm with faster running times than MMHC. The average normalized timing results and median normalized timing results are presented over the common networks in Tables 7 and 8. The term in parenthesis is the number of networks taken in the average (median) calculation; here those networks are common across all algorithms.
- D.1.2 Number of Statistical Calls Performed Results
Normalized number of statistical calls performed is the number of statistical calls completed by each algorithm (number of tests of conditional independence, measures of association, and/or number of calls to the scoring function) for a dataset divided by MMHC's on the same dataset. Normalized values greater than one correspond to an algorithm performing a greater number of statistical functions than MMHC. The average and median normalized number of statistical calls performed are presented in Tables 9 and 10. The term in parentheses indicate the number of common networks included in the average (median) calculation.
- D.2 Quality of Reconstruction Results
- D.2.1 Bayesian Score Results
- D.2.2 Structural Hamming Distance Results
E. Effects on time and quality of learning of increasing the number of variables
- E.1 Effect on Greedy Search of Increasing the Number of Variables
- E.2 Effect on PC of Increasing the Number of Variables
- E.3 Effect on TPDA of Increasing the Number of Variables
- E.4 Effect on Sparse Candidate k=5 of Increasing the Number of Variables
- E.5 Effect on Sparse Candidate k=10 of Increasing the Number of Variables
- E.6 Effect on Optimal Reinsertion 1 and 2 k=5 of Increasing the Number of Variables
- E.7 Effect on Optimal Reinsertion 1 and 2 k=10 of Increasing the Number of Variables
- E.8 Effect on Optimal Reinsertion 1 and 2 k=20 of Increasing the Number of Variables
Complete Data Tables  
The complete results tables are given below as Matlab data files. The results matrices are indexed by dataset (22) by algorithm (13) by sample size (3) by trial (5). Files giving the indexing for the dataset and algorithm are linked below. The sample size is indexed as follows: sample size 500 == 1, sample size 1000 == 2, sample size 5000 = 3.