## CS231A TA Session

### Problem set 4

Christopher B. Choy

## Hough Transformation

• Transform a point in Cartesian coordinates to the set of all lines (line parameters $(r, \theta)$) that pass through the point, expressed in polar parameters

$$H: (x, y) \rightarrow \{ (r, \theta) | r(\theta) = x \cos \theta + y \sin \theta \}$$

$$y = -\frac{\cos \theta}{\sin \theta} x + \frac{r}{\sin \theta} \rightarrow r(\theta) = x \cos \theta + y \sin \theta$$
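The voting scheme implied by the mapping above can be sketched in a few lines. This is a minimal Python illustration (the assignment itself uses MATLAB); the accumulator resolution and range are arbitrary choices for the example.

```python
import math

def hough_lines(points, n_theta=180, n_r=100, r_max=200.0):
    """Vote each (x, y) point into a discretized (r, theta) accumulator.

    For every theta bin, the point votes for the line with
    r = x*cos(theta) + y*sin(theta).
    """
    acc = [[0] * n_theta for _ in range(n_r)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta                  # theta in [0, pi)
            r = x * math.cos(theta) + y * math.sin(theta)
            r_bin = int((r + r_max) / (2 * r_max) * n_r)   # map [-r_max, r_max] -> [0, n_r)
            if 0 <= r_bin < n_r:
                acc[r_bin][t] += 1
    return acc

# Collinear points on y = x (normal form: theta = 3*pi/4, r = 0)
# all vote into the same accumulator cell.
points = [(i, i) for i in range(10)]
acc = hough_lines(points)
best = max((acc[r][t], r, t) for r in range(100) for t in range(180))
```

The peak of the accumulator recovers the line's $(r, \theta)$ parameters; in a real image, the input points would be edge pixels.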

## Generalized Hough Transformation

• Transform a feature $\Phi(x)$ to points in the 2D Hough space $S = \{ \vec{a} \}$, where $\vec{a}$ is the reference origin of the shape
• For points $\vec{x}$ on the boundary, store $\vec{r} = \vec{a} - \vec{x}$ in the R-Table, indexed by the gradient orientation $\Phi(\vec{x})$
• Handling rotation and scale requires a 4D Hough space

D.H. Ballard, "Generalizing the Hough Transform to Detect Arbitrary Shapes", Pattern Recognition, Vol.13, No.2, p.111-122, 1981
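The R-table construction and voting step can be sketched as follows. This is a hedged toy example (in Python, not the assignment's MATLAB): points carry a precomputed gradient orientation, and rotation/scale are held fixed so the Hough space stays 2D.

```python
import math
from collections import defaultdict

def build_r_table(boundary, center, n_bins=8):
    """R-table: for each boundary point x, store r = a - x,
    indexed by the quantized gradient orientation phi(x)."""
    table = defaultdict(list)
    for (x, y), phi in boundary:
        b = int(phi / (2 * math.pi) * n_bins) % n_bins
        table[b].append((center[0] - x, center[1] - y))
    return table

def vote(edges, table, n_bins=8):
    """Each edge point votes for candidate reference origins a = x + r."""
    acc = defaultdict(int)
    for (x, y), phi in edges:
        b = int(phi / (2 * math.pi) * n_bins) % n_bins
        for rx, ry in table[b]:
            acc[(x + rx, y + ry)] += 1
    return acc

# Toy model: three boundary points with orientations, reference origin (5, 5).
model = [((3, 5), 0.0), ((5, 3), math.pi / 2), ((7, 5), math.pi)]
table = build_r_table(model, (5, 5))
# The same shape translated by (+10, +2): all votes land on (15, 7).
scene = [((x + 10, y + 2), phi) for (x, y), phi in model]
acc = vote(scene, table)
peak = max(acc, key=acc.get)
```

The accumulator peak recovers the translated reference origin; adding rotation and scale would extend each vote over the two extra dimensions.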

## Implicit Shape Model

• Use the R-Table as a function of an image patch.
• Transform an image patch to object centers.

B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV 2004

## Bag of Visual Words

Feature Extraction → Clustering → Encoding → Spatial Binning → Kernel SVM

$$k(x,y) = \left< \Psi(x), \Psi(y) \right>$$

## Bag of Visual Words

• Feature Extraction
  • Local image features, robust to typical image transformations
  • Dense SIFT: SIFT at every location vs. at key points (interest points might not correspond to the foreground)
• Clustering (Dictionary)
  • Visual words should be distinctive and diverse
  • Common visual words (e.g. a wheel) will form a cluster
  • K-means clustering

## Bag of Visual Words

• Encoding
  • Maps local features to visual words
  • Hard quantization: assign each feature to its nearest visual word
• Spatial Pyramid
  • Encodes spatial information
  • Divides the image into subsections; for each subsection, create a visual-word histogram
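The encoding and spatial-binning steps can be sketched together. This is an illustrative Python snippet, not the assignment's MATLAB code: it assumes features have already been hard-quantized to word indices and carry normalized image coordinates.

```python
def encode_spatial_pyramid(features, n_words, levels=2):
    """Build one visual-word histogram per spatial cell at each pyramid level.

    `features` is a list of (x, y, word) with x, y in [0, 1) and `word`
    the index of the nearest cluster center (hard assignment).
    """
    hist = []
    for level in range(levels + 1):
        cells = 2 ** level                      # cells x cells grid at this level
        grid = [[0] * n_words for _ in range(cells * cells)]
        for x, y, word in features:
            cx, cy = int(x * cells), int(y * cells)
            grid[cy * cells + cx][word] += 1
        for cell in grid:
            hist.extend(cell)                   # concatenate all cell histograms
    return hist

# Two features with word ids 0 and 1 in opposite image quadrants.
feats = [(0.1, 0.1, 0), (0.9, 0.9, 1)]
h = encode_spatial_pyramid(feats, n_words=2, levels=1)
# Level 0 is the whole-image histogram; level 1 adds four quadrant histograms.
```

The concatenated histogram is what gets fed to the kernel SVM; a weighting per pyramid level (as in Lazebnik et al.'s spatial pyramid) is omitted here for brevity.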

## Similarity Metric for Distributions

To train a linear SVM on these features (distributions), which similarity metric should we use?

How similar is a distribution $P$ to a distribution $Q$?

$f$-divergence!

$f : (0, \infty) \rightarrow \mathbb{R}$, convex, with $f(1) = 0$

$$D_f(P||Q) = \int f\left( \frac{dP}{dQ}\right) dQ$$

• Kullback-Leibler Divergence $$f(x) = x \log x$$
• $\chi^2$ Divergence $$f(x) = (x - 1)^2$$

## For a discrete case

• Intersection kernel : $k(x,y) = \sum_i \min(x_i, y_i)$
• $\chi^2$ kernel : $k(x,y) = \sum_i \frac{2 x_i y_i}{x_i+y_i}$ — for normalized histograms this equals $1 - \frac{1}{2} \sum_i \frac{(x_i-y_i)^2}{x_i+y_i}$, i.e. one minus half the $\chi^2$ distance

In problem 2, we will use the $\chi^2$ kernel for the similarity metric.
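Both kernels are one-liners over histogram bins. A small Python sketch (the histograms are made-up 3-bin examples):

```python
def intersection_kernel(x, y):
    """Histogram intersection: k(x, y) = sum_i min(x_i, y_i)."""
    return sum(min(a, b) for a, b in zip(x, y))

def chi2_kernel(x, y):
    """Chi-squared kernel: k(x, y) = sum_i 2 x_i y_i / (x_i + y_i)."""
    return sum(2 * a * b / (a + b) for a, b in zip(x, y) if a + b > 0)

# Normalized 3-bin histograms.
x = [0.5, 0.3, 0.2]
y = [0.2, 0.3, 0.5]
k_int = intersection_kernel(x, y)    # 0.2 + 0.3 + 0.2 = 0.7
k_chi2 = chi2_kernel(x, y)
k_self = chi2_kernel(x, x)           # 1.0 for a normalized histogram
```

Note that for a normalized histogram $k(x,x) = \sum_i x_i = 1$, consistent with the χ² distance being zero between identical histograms.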

## Homogeneous Kernel and Finite Approximation

$K(\mathbf{x},\mathbf{y})$ is computationally expensive and nonlinear!
But in the feature space $\Psi(\cdot)$, it becomes linear (in $\infty$ dimensions).

• Compute the feature map $\Psi(x)$ using the Homogeneous Kernel
• A generic histogram kernel is $K(\mathbf{x}, \mathbf{y}) = \sum_i k(x_i, y_i)$
• It is homogeneous if $k(cx, cy) = c k(x,y)$
• intersection, $\chi^2$, Hellinger's, Jensen-Shannon's
• Find a finite feature map $\hat{\Psi}(\cdot)$ such that $$k(x,y) = \left< \Psi(x), \Psi(y) \right> \approx \left< \hat{\Psi}(x), \hat{\Psi}(y) \right>$$
• Map distributions using $\hat{\Psi}(\cdot)$. Then the problem becomes a linear classification in the feature space!

## Homogeneous Kernel

• Use the homogeneity, $k(x,y) = \sqrt{xy} k\left(\sqrt{x/y}, \sqrt{y/x}\right) = \sqrt{xy} \mathcal{K} \left( \log (y/x) \right)$
• Ex. intersection kernel
• $k(x,y) = \min(x,y) = \sqrt{xy} \mathcal{K}(\log \frac{y}{x})$
• $\mathcal{K}(w) = e^{- |w|/2}$
• Define $\kappa(\lambda)$ as the Fourier transform of $\mathcal{K}(w)$
• $\Psi(x)(\lambda) = e^{-i\lambda \log x} \sqrt{x \kappa(\lambda)}$
• Continuous, infinite-dimensional feature
• Sample at points $\lambda = -nL, (-n+1)L, \ldots, nL$
• $\hat{\Psi}(x) \in \mathbb{R}^{2n + 1}$
• $k(x,y) \approx \left< \hat{\Psi}(x), \hat{\Psi}(y) \right>$
• In problem 2, we use $n = 1$
• vl_homkermap()
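The sampled feature map can be written out directly for the χ² kernel, whose spectrum is $\kappa(\lambda) = \operatorname{sech}(\pi\lambda)$ (Vedaldi & Zisserman). Below is a hedged Python sketch of the idea behind `vl_homkermap()`, not its actual implementation; the choices $n = 3$ and $L = 0.5$ are illustrative (problem 2 uses $n = 1$), and $x > 0$ is assumed.

```python
import math

def hom_feature_map(x, n=3, L=0.5):
    """Finite feature map approximating the chi-squared kernel.

    Samples kappa(lambda) = sech(pi * lambda) at lambda = 0, L, ..., nL
    and packs the complex exponential e^{-i lambda log x} into cos/sin
    pairs, giving a real (2n + 1)-dimensional vector psi(x) with
    <psi(x), psi(y)> ~= k(x, y) = 2xy / (x + y).  Requires x > 0.
    """
    kappa = lambda lam: 1.0 / math.cosh(math.pi * lam)
    psi = [math.sqrt(x * L * kappa(0.0))]
    for j in range(1, n + 1):
        scale = math.sqrt(2 * x * L * kappa(j * L))
        psi.append(scale * math.cos(j * L * math.log(x)))
        psi.append(scale * math.sin(j * L * math.log(x)))
    return psi

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = 0.2, 0.5
exact = 2 * x * y / (x + y)
approx = dot(hom_feature_map(x), hom_feature_map(y))
```

After mapping each histogram bin this way, the χ² SVM reduces to an ordinary linear SVM on the concatenated features, which is the whole point of the approximation.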

## Bag of Visual Words

Feature Extraction → Clustering → Encoding → Spatial Binning → Kernel SVM

$$k(x,y) = \left< \Psi(x), \Psi(y) \right>$$

## Bag of Visual Words

• Install VLFeat, an extensive computer vision library.
• Vary the options and see how they affect performance.
• Code extract_dense_sift.m
  • Image feature extraction using dense SIFT features
  • Use vl_dsift()
  • Randomly select 10,000 features
• Code create_dictionary.m
  • Create the dictionary of visual words
  • Use k-means to find the cluster centroids
  • Use vl_kmeans()
• Code create_histograms.m
  • Represent images as histograms
  • Spatial pyramid
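The dictionary and histogram steps together amount to "cluster the descriptors, then count nearest-word assignments". A conceptual Python sketch of that pipeline (the assignment uses MATLAB's `vl_kmeans()`; here plain Lloyd's k-means stands in, and 2D Gaussian blobs stand in for 128-D dense SIFT descriptors):

```python
import random

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means: returns k centroids (the visual words)."""
    # Deterministic init: spread initial centers across the data.
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep old center if a cluster empties out
                centers[i] = tuple(sum(v) / len(cl) for v in zip(*cl))
    return centers

def histogram(features, centers):
    """Hard-assign each feature to its nearest word and count."""
    h = [0] * len(centers)
    for p in features:
        i = min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        h[i] += 1
    return h

random.seed(1)
# Two well-separated 2-D "descriptor" clusters standing in for dense SIFT.
feats = [(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(50)] + \
        [(random.gauss(5, 0.1), random.gauss(5, 0.1)) for _ in range(50)]
words = kmeans(feats, k=2)
h = histogram(feats, words)   # 50 features assigned to each visual word
```

In the assignment, the histograms would additionally be split per spatial-pyramid cell and passed through the homogeneous kernel map before SVM training.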