performance occasionally perform exceptionally well.
Acknowledgments
We thank F. Provost and C. Perlich for the BACT,
COD, and CALHOUS data, C. Young for the SLAC
data, A. Gualtieri for the HS data, and B. Zadrozny
and C. Elkan for the Isotonic Regression code. This
work was supported by NSF Award 0412930.
References
Ayer, M., Brunk, H., Ewing, G., Reid, W., & Silverman, E. (1955). An empirical distribution function for sampling with incomplete information. Annals of Mathematical Statistics, 26, 641–647.
Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36, 105–139.
Blake, C., & Merz, C. (1998). UCI repository of machine learning databases.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123–140.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.
Buntine, W., & Caruana, R. (1991). Introduction to IND and recursive partitioning (Technical Report FIA-91-28). NASA Ames Research Center.
Caruana, R., & Niculescu-Mizil, A. (2004). Data mining in metric space: An empirical analysis of supervised learning performance criteria. Knowledge Discovery and Data Mining (KDD'04).
Cooper, G. F., Aliferis, C. F., Ambrosino, R., Aronis, J., Buchanan, B. G., Caruana, R., Fine, M. J., Glymour, C., Gordon, G., Hanusa, B. H., Janosky, J. E., Meek, C., Mitchell, T., Richardson, T., & Spirtes, P. (1997). An evaluation of machine learning methods for predicting pneumonia mortality. Artificial Intelligence in Medicine, 9, 107–138.
Giudici, P. (2003). Applied data mining. New York: John Wiley and Sons.
Gualtieri, A., Chettri, S. R., Cromp, R., & Johnson, L. (1999). Support vector machine classifiers as applied to AVIRIS data. Proc. Eighth JPL Airborne Geoscience Workshop.
Joachims, T. (1999). Making large-scale SVM learning practical. Advances in Kernel Methods.
King, R., Feng, C., & Sutherland, A. (1995). Statlog: Comparison of classification algorithms on large real-world problems. Applied Artificial Intelligence, 9, 289–333.
LeCun, Y., Jackel, L. D., Bottou, L., Brunot, A., Cortes, C., Denker, J. S., Drucker, H., Guyon, I., Muller, U. A., Sackinger, E., Simard, P., & Vapnik, V. (1995). Comparison of learning algorithms for handwritten digit recognition. International Conference on Artificial Neural Networks (pp. 53–60). Paris: EC2 & Cie.
Lim, T.-S., Loh, W.-Y., & Shih, Y.-S. (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203–228.
Niculescu-Mizil, A., & Caruana, R. (2005). Predicting good probabilities with supervised learning. Proc. 22nd International Conference on Machine Learning (ICML'05).
Perlich, C., Provost, F., & Simonoff, J. S. (2003). Tree induction vs. logistic regression: A learning-curve analysis. Journal of Machine Learning Research, 4, 211–255.
Platt, J. (1999). Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. Advances in Large Margin Classifiers.
Provost, F., & Domingos, P. (2003). Tree induction for probability-based ranking. Machine Learning, 52, 199–215.
Provost, F. J., & Fawcett, T. (1997). Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions. Knowledge Discovery and Data Mining (pp. 43–48).
Robertson, T., Wright, F., & Dykstra, R. (1988). Order restricted statistical inference. New York: John Wiley and Sons.
Schapire, R. (2001). The boosting approach to machine learning: An overview. MSRI Workshop on Nonlinear Estimation and Classification.
Vapnik, V. (1998). Statistical learning theory. New York: John Wiley and Sons.
Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques. Second edition. San Francisco: Morgan Kaufmann.
Zadrozny, B., & Elkan, C. (2001). Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. Proc. 18th International Conference on Machine Learning (ICML'01).
Zadrozny, B., & Elkan, C. (2002). Transforming classifier scores into accurate multiclass probability estimates. Knowledge Discovery and Data Mining (KDD'02).