Monday, May 7, 2012

Multiboost in C++; ARFF; Weka in Java

Last week I left off having obtained SIFT histograms but struggling with the slowness of the Matlab implementation of AdaBoost that I had been using.

I found an entire collection of boosting code in C++ [1], which includes an AdaBoost implementation for multiple classes. One of its input formats is ARFF (Attribute-Relation File Format), developed by the Computer Science department at the University of Waikato (in New Zealand) for use in their open-source, Java-based Weka software for machine learning. Conveniently, Matt Dunham has already shared a Matlab-Weka interface, which I was able to use to convert my .mat datasets to ARFF format, run them through the MultiBoost framework, and get weak learner results (training was much faster than with the Matlab implementation).
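For anyone who has not seen ARFF before, the sketch below shows roughly what such a file contains: a relation name, one `@attribute` line per feature, a nominal class attribute, and then comma-separated data rows. This is not the Matlab-Weka interface mentioned above, just a minimal illustration in Python. The function name `to_arff`, the attribute names `bin0`, `bin1`, ..., the class labels, and the toy histogram values are all made up for the example.

```python
def to_arff(relation, features, labels, class_names):
    """Return ARFF text for dense numeric features plus a nominal class attribute."""
    n_dims = len(features[0])
    lines = [f"@relation {relation}", ""]
    # One numeric attribute per histogram bin (names are hypothetical).
    for i in range(n_dims):
        lines.append(f"@attribute bin{i} numeric")
    # The class attribute lists every allowed label in braces.
    lines.append("@attribute class {" + ",".join(class_names) + "}")
    lines.append("")
    lines.append("@data")
    # Each data row is the feature values followed by the class label.
    for row, label in zip(features, labels):
        values = ",".join(repr(float(v)) for v in row)
        lines.append(f"{values},{label}")
    return "\n".join(lines) + "\n"


# Toy data, invented for illustration only.
histograms = [[0.25, 0.5, 0.25], [0.0, 0.75, 0.25]]
genres = ["genre_a", "genre_b"]
arff_text = to_arff("covers", histograms, genres, ["genre_a", "genre_b"])
print(arff_text)
```

The resulting text can be saved with a `.arff` extension. Real feature vectors would have many more bins, and sparse data would likely call for ARFF's sparse row syntax instead of the dense rows used here.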

This week I still need to 1) separate the covers into training and testing sets and run cross-validation for both genre and ratings, and 2) understand what's going on with the multi-class generalization of AdaBoost.
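The training/testing split in step 1 could be sketched as a plain k-fold partition like the one below. It is only an illustration under assumptions: the function name `k_fold_indices`, the choice of 5 folds, and the fixed seed are hypothetical, and the folds are not stratified by class, which may matter if some genres or ratings are rare.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Return a list of (train, test) index lists, one pair per fold."""
    indices = list(range(n_items))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        # Training set is every index not in the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# Example: 10 items split into 5 folds (sizes are made up).
splits = k_fold_indices(10, 5)
for train, test in splits:
    print(len(train), len(test))
```

Each item lands in exactly one test fold, so averaging the per-fold error gives a cross-validation estimate over the whole dataset.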

If you have time you should check out the Weka project. I really like their mission statement and open-source spirit.

[1] D. Benbouzid, R. Busa-Fekete, N. Casagrande, F.-D. Collin, and B. Kégl
MultiBoost: a multi-purpose boosting package
Journal of Machine Learning Research, 13:549–553, 2012. 
