Wednesday, April 25, 2012

SIFT -> Codebook -> Features -> Classification

Last time, my update was about moving forward with feature extraction and logistic regression classification after completing the installation of VLFeat.

vl_sift outputs X, Y frame center position coordinates, a frame scale, and a frame orientation in radians. When mapped onto an image, SIFT features look something like:

The yellow circles represent the output values mentioned above, and the green boxes represent descriptor vectors calculated using local gradient information.
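For reference, here is a minimal sketch of the VLFeat calls behind a figure like this (the filename is a placeholder; vl_sift expects a grayscale image in single form):

% Extract SIFT frames and descriptors with VLFeat.
I = im2single(rgb2gray(imread('cover.png')));   % hypothetical filename
[frames, descriptors] = vl_sift(I);
% frames(1:2,:) are the X,Y centers, frames(3,:) the scale, frames(4,:) the orientation

% Overlay frames (circles) and descriptor geometry (boxes) on the image.
imshow(I); hold on;
h1 = vl_plotframe(frames);
set(h1, 'color', 'y', 'linewidth', 2);
h2 = vl_plotsiftdescriptor(descriptors, frames);
set(h2, 'color', 'g');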

These raw position/scale/orientation values won't immediately work for logistic regression because each value on its own doesn't mean anything; I need a single-value representation of each SIFT frame. One way of doing this is to generate a "codebook" of similar features across the training data and use the codebook to build a histogram for each image for classification. This concept is visualized below, from Kinnunen et al. 2009:


A commonly used codebook creation method is to run k-means and use the cluster centers as codevectors (for SIFT, we could run k-means on the descriptor vectors). Having chosen a number of codevectors, one can then define similarity bins centered around each codevector.
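Sketching what that might look like with VLFeat's vl_kmeans (the codebook size and variable names are my own assumptions):

% Build a codebook from descriptors pooled across the training images.
% allDescriptors: 128 x N matrix of SIFT descriptors (assumed name).
numCodevectors = 200;                                % assumed codebook size
centers = vl_kmeans(single(allDescriptors), numCodevectors);

% Quantize one image's descriptors and build its codeword histogram.
dists = vl_alldist2(single(imageDescriptors), centers);  % pairwise distances
[~, nearest] = min(dists, [], 2);                    % nearest codevector per frame
h = histc(nearest, 1:numCodevectors);
h = h / sum(h);                                      % normalized histogram feature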

So for next week I'll focus on generating a codebook of SIFT features, creating feature histograms for each image, and running logistic regression on SIFT and HoC.
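When I get to the regression step, it could be as simple as this sketch using glmfit/glmval from the Statistics Toolbox (all variable names are placeholders):

% Logistic regression on per-image histograms.
% Xhist: numImages x numBins matrix of histogram features (assumed).
% y: 0/1 label vector, e.g. low-rated vs. high-rated (assumed).
b = glmfit(Xhist, y, 'binomial', 'link', 'logit');
p = glmval(b, XhistTest, 'logit');    % predicted probabilities for new images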

My MATLAB aside:

MATLAB can store images in 64-bit (double), 32-bit (single), 16-bit (uint16), or 8-bit (uint8) form. double is MATLAB's standard 64-bit representation of numeric data. Storing images as uint8 saves storage space, but many MATLAB operations require the double form. Note that the im2* conversion functions also rescale values to each type's expected range (e.g., uint8's 0-255 becomes double's [0, 1]), rather than simply casting.

To convert an image I to a double:

I = im2double(I)

The same can be carried out with any of the image types:

I = im2uint8(I)

I = im2uint16(I)

I = im2single(I)

Note that VLFeat asks for image input in single form.


Monday, April 16, 2012

Color Histograms + k-means

After initially attempting to write my own color histogram code (wouldn't recommend it), I found some effective code (getPatchHist.m) on a computer vision source code website; its method matched what I'd seen elsewhere. Essentially, the RGB values at each pixel are combined into a single value using a weighted sum, and these values are then counted into a series of bins for each image.
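A minimal sketch of the binning idea (my own reimplementation, not getPatchHist.m itself; the filename is a placeholder):

% Quantize each RGB channel to 16 levels, then combine the three bin
% indices into one value via a weighted sum, giving 16^3 possible bins.
nBins = 16;
I = im2double(imread('cover.png'));             % RGB values in [0, 1]
q = min(floor(I * nBins), nBins - 1);           % per-channel bin index, 0..15
idx = q(:,:,1) * nBins^2 + q(:,:,2) * nBins + q(:,:,3) + 1;
h = histc(idx(:), 1:nBins^3);                   % count pixels into 16^3 bins
h = h / sum(h);                                 % normalize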

Using 16^3 bins, the color histogram for Adora looks like:


The Histogram Intersection between each pair of images was then computed by taking, bin by bin, the minimum of the two color histograms, then summing these minima and normalizing.
Taking a look at the histogram intersections using imagesc, we can double-check that every histogram intersects completely with itself (the diagonal values are 1). We also see that most pairs of color histograms are only distantly related, with a few that are very similar.
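In code, the pairwise intersections might look like this (assuming the normalized histograms are stacked as the columns of a matrix H):

% H: numBins x numImages, each column a normalized color histogram.
numImages = size(H, 2);
S = zeros(numImages);
for i = 1:numImages
    for j = 1:numImages
        S(i,j) = sum(min(H(:,i), H(:,j)));      % histogram intersection
    end
end
imagesc(S); colorbar;                           % diagonal should be all 1s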
Running k-means on this intersection matrix, using 4 clusters and taking the 8 closest images to each cluster center, we get the following (a code sketch of this step follows the cluster summaries):

Cluster 1 (big black areas): 33% Romance, 37% History, 13% DietFitness, 17% SciFiFantasy

Cluster 2 (pastel colors): 24% Romance, 28% History, 36% DietFitness, 12% SciFiFantasy

Cluster 3 (white with font): 11% Romance, 28% History, 56% DietFitness, 1% SciFiFantasy

Cluster 4 (bright colors): 29% Romance, 24% History, 21% DietFitness, 27% SciFiFantasy
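Here is the promised sketch of the clustering step (MATLAB's kmeans from the Statistics Toolbox, treating each row of the intersection matrix S as one image's feature vector):

k = 4;
[labels, C] = kmeans(S, k);                     % cluster the rows of S
closest = cell(k, 1);
for c = 1:k
    d = sum(bsxfun(@minus, S, C(c,:)).^2, 2);   % squared distance to center c
    [~, order] = sort(d);
    closest{c} = order(1:8);                    % the 8 images nearest center c
end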
Cluster Visualization
Taking the first two principal components of the histogram intersection matrix and color-coding them according to cluster, we get:


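A sketch of this projection (princomp from the Statistics Toolbox; labels are the k-means cluster assignments from the sketch above):

[~, score] = princomp(S);                       % rows of S are observations
scatter(score(:,1), score(:,2), 30, labels, 'filled');
xlabel('First principal component'); ylabel('Second principal component');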
Goals for Wed
- get familiar with VLFeat for more features

Wednesday, April 11, 2012

More covers! Still working on color histograms.

Kyle helped once more with image collection by pointing me towards openlibrary.org, which has a huge collection of book cover images available for download (on the order of tens of thousands). These do not come with ratings, but even without ratings they can still be used for unsupervised exploration.

Monday, April 9, 2012

Image collection, pre-processing, and clustering

I was able to add 90 book covers from the health and fitness genre to the dataset, so I now have 60-100 book covers from each of 4 genres. Kyle generously shared a collection of .png files of philosophy book covers, grabbed from a collection of PDFs, but there are issues with finding corresponding ratings on Amazon (social commentary?). Rotten Tomatoes does not rate books, BUT I came across Goodreads just yesterday, and it seems promising as an alternate source for ratings and cover images.

I've written a script to load the images into MATLAB using eval and imread. From here I can directly access the RGB color values of each image, which I will use for my first set of image features: color histograms. I've chosen color as a first step because it's readily available and worth exploring.
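For anyone trying the same thing, a loop over dir with fullfile avoids the eval gymnastics (the folder name is a placeholder):

files = dir('covers/*.png');                    % assumed folder layout
images = cell(numel(files), 1);
for k = 1:numel(files)
    images{k} = imread(fullfile('covers', files(k).name));
end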

I ran across this related project by Dr. Sai Chaitanya Gaddam at Boston University, Judging a movie by its cover: A clustering analysis. The gist:
  • use the Netflix Prize data set of ~100 million ratings on 17,770 movies from 480,189 users
  • find movie similarity matrix 
  • use k-means to find movie clusters
  • find "average" movie poster images from exemplar images from each cluster
He came up with a few averaged poster images like the ones below:

Note: he did not use the poster image itself for clustering.

Along the way, to overcome the issue of a 2D representation of 3D data, he came up with a neat visualization of the color properties of an image:

As for using color histograms to capture similarities between pictures, a classic technique called Histogram Intersection comes from Swain and Ballard 1991: given an image histogram I and a model histogram M, each with n bins, their normalized intersection is:

        H(I, M) = Σ_{j=1..n} min(I_j, M_j) / Σ_{j=1..n} M_j
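In MATLAB, for two histogram vectors, this is a one-liner (hI and hM are placeholder names):

intersection = sum(min(hI, hM)) / sum(hM);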

Goals for Wednesday include:
  • Finish the dataset and load everything into MATLAB
  • Calculate color histogram intersections between images
Goals for the rest of the week include:
  • Find something meaningful to do with intersection values - run k-means?
  • Read more classification literature



Tuesday, April 3, 2012

Image Collection

I was initially optimistic about finding an automated way of harvesting book cover images from Amazon, but after a few attempts with website copying software (HTTrack, SiteSucker), difficulty with the website format (the covers aren't even directly downloadable files...?), and an approaching deadline, I resorted to using screenshots. For future reference, I would still be very interested in learning an automated way of doing this.

Finding a good variety of ratings for books has also been difficult. Books can only be sorted from highest-rated to lowest-rated, and I was not able to access search results past 100 pages.
Anyway, I at least have a range of ratings from 3-5 stars. I've been saving the images in PNG format to avoid loss of image quality. I use XnView, a free image editor that lets me crop and resize many images at a time (available at http://www.xnview.com/en/index.html). So far I have 66 SciFi/Fantasy, 98 Romance, and 85 History covers, with more being added every day. For next week, I'll need to finish image collection and have a solid idea of which features I am going to extract and how. Some of my favorite book covers so far:



If you're curious, here's a link to the full project proposal.

Related Work - Paper Gestalt

I realized that when Professor Belongie asked me about related work on Monday, he had (probably) meant this (humorous) paper he sent me about using computer vision to determine the quality of CVPR paper submissions.

This very witty paper, by "von Bearnensquash", can be found at http://vision.ucsd.edu/sites/default/files/gestalt.pdf. They used "standard computer vision features" (LUV histograms, HoG, and gradient magnitude) with AdaBoost classification, and found that "good" paper features include brightly colored graphs and math equations, while "bad" paper features include complicated tables and missing pages (illustrated below).


They found that allowing for a false positive rate of 15%, they could successfully reject half of the "bad" papers.

The problem I'm addressing is similar, but there's an important distinction: they use the content of the thing itself to evaluate its quality, so it is sensible for there to be a relationship, whereas book cover images are not necessarily related to the content of what I'm evaluating for quality (the book itself).

In any case, AdaBoost could be a good classification method to try as it is simple and doubles as a feature selection method. There is a nice overview at https://hpcrd.lbl.gov/~meza/projects/MachineLearning/EnsembleMethods/introBoosting.pdf
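To make the feature selection point concrete, here is a minimal decision-stump AdaBoost sketch (my own illustration, not code from the paper; X is numSamples x numFeatures, y is a +1/-1 label vector):

T = 25;                                         % number of boosting rounds (assumed)
[n, d] = size(X);
w = ones(n, 1) / n;                             % uniform sample weights
alpha = zeros(T, 1); feat = zeros(T, 1); thr = zeros(T, 1); pol = zeros(T, 1);
for t = 1:T
    bestErr = inf;
    for j = 1:d                                 % each stump uses ONE feature,
        for theta = unique(X(:, j))'            % which is why boosting doubles
            for s = [1 -1]                      % as feature selection
                pred = s * sign(X(:, j) - theta + eps);
                err = sum(w .* (pred ~= y));
                if err < bestErr
                    bestErr = err; feat(t) = j; thr(t) = theta; pol(t) = s;
                end
            end
        end
    end
    alpha(t) = 0.5 * log((1 - bestErr) / max(bestErr, eps));
    pred = pol(t) * sign(X(:, feat(t)) - thr(t) + eps);
    w = w .* exp(-alpha(t) * y .* pred);        % up-weight misclassified samples
    w = w / sum(w);
end
% Final classifier: sign of the alpha-weighted vote of the T selected stumps.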