This week I worked on loading the classifier definitions learned with Multiboost into MATLAB and writing code to generate ROC curves. They show, once again, that color histograms perform much better than SIFT features for genre classification.
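For reference, here is a minimal sketch of the ROC/AUC computation in MATLAB. The variable names are placeholders: scores are the real-valued classifier outputs for one genre versus the rest, and labels are the +1/-1 ground truth. (The Statistics Toolbox also has perfcurve for this, if available.)

% Sort examples by decreasing classifier score
[sorted_scores, order] = sort(scores, 'descend');
sorted_labels = labels(order);

P = sum(sorted_labels == 1);    % number of positives
N = sum(sorted_labels == -1);   % number of negatives

% True/false positive rates as the decision threshold is lowered
tpr = [0; cumsum(sorted_labels(:) == 1) / P];
fpr = [0; cumsum(sorted_labels(:) == -1) / N];

plot(fpr, tpr);
xlabel('False positive rate');
ylabel('True positive rate');

% AUC by trapezoidal integration under the ROC curve
auc = trapz(fpr, tpr);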
SIFT to classify genre
AUC:
Diet/Fitness: 0.6642
History: 0.6237
Romance: 0.5533
SciFi/Fantasy: 0.5821
Color to classify genre
The following AUC values come from ROC curves computed on two different cross-validation folds:

Fold 1 AUC:
Diet/Fitness: 0.7892
History: 0.7583
Romance: 0.7733
SciFi/Fantasy: 0.7433

Fold 2 AUC:
Diet/Fitness: 0.9258
History: 0.8067
Romance: 0.6750
SciFi/Fantasy: 0.6950
The classification performance clearly differs across train/test cross-validation folds. Should I average the results across folds to get final performance measures?
I've also been working on removing between-class redundant SIFT features prior to the LDA projection to see whether this improves classification, but I have run into the MATLAB 'for'-loop bottleneck. My goal is to write a MATLAB-interfaceable C (MEX) file to accomplish this for inclusion in the report.
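For context, the bottleneck is essentially a nested pairwise-distance loop like the sketch below. The variable names are hypothetical, and it assumes "between-class redundant" means a descriptor from one class lies within a Euclidean distance threshold of some descriptor from another class, which may not match the exact criterion I end up using. The planned C/MEX routine would replace this double loop with compiled code:

% descA, descB: 128-by-nA and 128-by-nB matrices of SIFT descriptors
% from two different classes; thresh: distance threshold (placeholder names)
nA = size(descA, 2);
nB = size(descB, 2);
redundant = false(1, nA);

for i = 1:nA
    for j = 1:nB
        if norm(descA(:, i) - descB(:, j)) < thresh
            redundant(i) = true;
            break;   % stop at the first close between-class match
        end
    end
end

descA_pruned = descA(:, ~redundant);   % keep only non-redundant descriptors

An alternative to the MEX route would be vectorizing the pairwise distances (e.g., with bsxfun or a precomputed distance matrix), at the cost of memory.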
I've also looked for relationships between the SIFT features and publication year, and between the SIFT features and the number of user reviews, with no positive results.


