## CS231A TA Session

### Problem set 4

Christopher B. Choy

## Hough Transformation

• Transform a point in Cartesian coordinates to the set of all lines (line parameters $(r, \theta)$) that pass through the point, expressed in polar parameters

$$H: (x, y) \rightarrow \{ (r, \theta) | r(\theta) = x \cos \theta + y \sin \theta \}$$

$$y = -\frac{\cos \theta}{\sin \theta} x + \frac{r}{\sin \theta} \rightarrow r(\theta) = x \cos \theta + y \sin \theta$$
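The voting scheme implied by the mapping above can be sketched in a few lines. This is a minimal Python illustration (the assignment itself uses MATLAB); the accumulator resolution and range are arbitrary choices for the example.

```python
import math

def hough_lines(points, n_theta=180, n_r=100, r_max=200.0):
    """Vote each (x, y) point into a discretized (r, theta) accumulator.

    For every theta bin, the point votes for the line with
    r = x*cos(theta) + y*sin(theta).
    """
    acc = [[0] * n_theta for _ in range(n_r)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta                  # theta in [0, pi)
            r = x * math.cos(theta) + y * math.sin(theta)
            r_bin = int((r + r_max) / (2 * r_max) * n_r)   # map [-r_max, r_max] -> [0, n_r)
            if 0 <= r_bin < n_r:
                acc[r_bin][t] += 1
    return acc

# Collinear points on y = x (normal form: theta = 3*pi/4, r = 0)
# all vote into the same accumulator cell.
points = [(i, i) for i in range(10)]
acc = hough_lines(points)
best = max((acc[r][t], r, t) for r in range(100) for t in range(180))
```

The peak of the accumulator recovers the line's $(r, \theta)$ parameters; in a real image, the input points would be edge pixels.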

## Generalized Hough Transformation

• Transform a feature $\Phi(x)$ to points in the 2D Hough space $S = \{ \vec{a} \}$, where $\vec{a}$ is the reference origin of the shape
• For points $\vec{x}$ on the boundary, store $\vec{r} = \vec{a} - \vec{x}$ in the R-Table, indexed by the gradient orientation $\Phi(\vec{x})$
• Handling rotation and scale requires a 4D Hough space

D.H. Ballard, "Generalizing the Hough Transform to Detect Arbitrary Shapes", Pattern Recognition, Vol.13, No.2, p.111-122, 1981
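The R-table construction and voting step can be sketched as follows. This is a hedged toy example (in Python, not the assignment's MATLAB): points carry a precomputed gradient orientation, and rotation/scale are held fixed so the Hough space stays 2D.

```python
import math
from collections import defaultdict

def build_r_table(boundary, center, n_bins=8):
    """R-table: for each boundary point x, store r = a - x,
    indexed by the quantized gradient orientation phi(x)."""
    table = defaultdict(list)
    for (x, y), phi in boundary:
        b = int(phi / (2 * math.pi) * n_bins) % n_bins
        table[b].append((center[0] - x, center[1] - y))
    return table

def vote(edges, table, n_bins=8):
    """Each edge point votes for candidate reference origins a = x + r."""
    acc = defaultdict(int)
    for (x, y), phi in edges:
        b = int(phi / (2 * math.pi) * n_bins) % n_bins
        for rx, ry in table[b]:
            acc[(x + rx, y + ry)] += 1
    return acc

# Toy model: three boundary points with orientations, reference origin (5, 5).
model = [((3, 5), 0.0), ((5, 3), math.pi / 2), ((7, 5), math.pi)]
table = build_r_table(model, (5, 5))
# The same shape translated by (+10, +2): all votes land on (15, 7).
scene = [((x + 10, y + 2), phi) for (x, y), phi in model]
acc = vote(scene, table)
peak = max(acc, key=acc.get)
```

The accumulator peak recovers the translated reference origin; adding rotation and scale would extend each vote over the two extra dimensions.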

## Implicit Shape Model

• Use the R-Table as a function of an image patch.
• Transform an image patch to object centers.

B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV 2004

## Bag of Visual Words

Feature Extraction → Clustering → Encoding → Spatial Binning → Kernel SVM

$$k(x,y) = \left< \Psi(x), \Psi(y) \right>$$

## Bag of Visual Words

• Feature Extraction
  • Local image features, robust to typical image transformations
  • Dense SIFT: SIFT at every location vs. at key points (interest points might not correspond to the foreground)
• Clustering (Dictionary)
  • Visual words should be distinctive and diverse
  • Common visual words (e.g. a wheel) will form a cluster
  • K-means clustering

## Bag of Visual Words

• Encoding
  • Maps local features to visual words
  • Hard quantization: assign each feature to its nearest visual word
• Spatial Pyramid
  • Encodes spatial information
  • Divides the image into subsections; for each subsection, create a visual-word histogram
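The encoding and spatial-binning steps can be sketched together. This is an illustrative Python snippet, not the assignment's MATLAB code: it assumes features have already been hard-quantized to word indices and carry normalized image coordinates.

```python
def encode_spatial_pyramid(features, n_words, levels=2):
    """Build one visual-word histogram per spatial cell at each pyramid level.

    `features` is a list of (x, y, word) with x, y in [0, 1) and `word`
    the index of the nearest cluster center (hard assignment).
    """
    hist = []
    for level in range(levels + 1):
        cells = 2 ** level                      # cells x cells grid at this level
        grid = [[0] * n_words for _ in range(cells * cells)]
        for x, y, word in features:
            cx, cy = int(x * cells), int(y * cells)
            grid[cy * cells + cx][word] += 1
        for cell in grid:
            hist.extend(cell)                   # concatenate all cell histograms
    return hist

# Two features with word ids 0 and 1 in opposite image quadrants.
feats = [(0.1, 0.1, 0), (0.9, 0.9, 1)]
h = encode_spatial_pyramid(feats, n_words=2, levels=1)
# Level 0 is the whole-image histogram; level 1 adds four quadrant histograms.
```

The concatenated histogram is what gets fed to the kernel SVM; a weighting per pyramid level (as in Lazebnik et al.'s spatial pyramid) is omitted here for brevity.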

## Similarity Metric for Distributions

To train a linear SVM on these features (distributions), which similarity metric should we use?

How similar is a distribution $P$ to a distribution $Q$?

$f$-divergence!

$f : (0, \infty) \rightarrow \mathbb{R}$, convex, with $f(1) = 0$

$$D_f(P||Q) = \int f\left( \frac{dP}{dQ}\right) dQ$$

• Kullback-Leibler Divergence $$f(x) = x \log x$$
• $\chi^2$ Divergence $$f(x) = (x - 1)^2$$

## For a discrete case

• Intersection kernel : $k(x,y) = \sum_i \min(x_i, y_i)$
• $\chi^2$ kernel : $k(x,y) = \sum_i \frac{2 x_i y_i}{x_i+y_i}$ — for normalized histograms this equals $1 - \frac{1}{2} \sum_i \frac{(x_i-y_i)^2}{x_i+y_i}$, i.e. one minus half the $\chi^2$ distance

In problem 2, we will use the $\chi^2$ kernel for the similarity metric.
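Both kernels are one-liners over histogram bins. A small Python sketch (the histograms are made-up 3-bin examples):

```python
def intersection_kernel(x, y):
    """Histogram intersection: k(x, y) = sum_i min(x_i, y_i)."""
    return sum(min(a, b) for a, b in zip(x, y))

def chi2_kernel(x, y):
    """Chi-squared kernel: k(x, y) = sum_i 2 x_i y_i / (x_i + y_i)."""
    return sum(2 * a * b / (a + b) for a, b in zip(x, y) if a + b > 0)

# Normalized 3-bin histograms.
x = [0.5, 0.3, 0.2]
y = [0.2, 0.3, 0.5]
k_int = intersection_kernel(x, y)    # 0.2 + 0.3 + 0.2 = 0.7
k_chi2 = chi2_kernel(x, y)
k_self = chi2_kernel(x, x)           # 1.0 for a normalized histogram
```

Note that for a normalized histogram $k(x,x) = \sum_i x_i = 1$, consistent with the χ² distance being zero between identical histograms.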

## Homogeneous Kernel and Finite Approximation

$K(\mathbf{x},\mathbf{y})$ is computationally expensive and nonlinear!
But in the feature space $\Psi(\cdot)$, it becomes linear (in $\infty$ dimensions).

• Compute the feature map $\Psi(x)$ using the Homogeneous Kernel
• A generic histogram kernel is $K(\mathbf{x}, \mathbf{y}) = \sum_i k(x_i, y_i)$
• It is homogeneous if $k(cx, cy) = c k(x,y)$
• intersection, $\chi^2$, Hellinger's, Jensen-Shannon's
• Find a finite feature map $\hat{\Psi}(\cdot)$ such that $$k(x,y) = \left< \Psi(x), \Psi(y) \right> \approx \left< \hat{\Psi}(x), \hat{\Psi}(y) \right>$$
• Map distributions using $\hat{\Psi}(\cdot)$. Then the problem becomes a linear classification in the feature space!

## Homogeneous Kernel

• Use the homogeneity, $k(x,y) = \sqrt{xy} k\left(\sqrt{x/y}, \sqrt{y/x}\right) = \sqrt{xy} \mathcal{K} \left( \log (y/x) \right)$
• Ex. intersection kernel
• $k(x,y) = \min(x,y) = \sqrt{xy} \mathcal{K}(\log \frac{y}{x})$
• $\mathcal{K}(w) = e^{- |w|/2}$
• Define $\kappa(\lambda)$ as the Fourier transform of $\mathcal{K}(w)$
• $\Psi(x)(\lambda) = e^{-i\lambda \log x} \sqrt{x \kappa(\lambda)}$
• Continuous, infinite-dimensional feature
• Sample at points $\lambda = -nL, (-n+1)L, \ldots, nL$
• $\hat{\Psi}(x) \in \mathbb{R}^{2n + 1}$
• $k(x,y) \approx \left< \hat{\Psi}(x), \hat{\Psi}(y) \right>$
• In problem 2, we use $n = 1$
• vl_homkermap()
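The sampled feature map can be written out directly for the χ² kernel, whose spectrum is $\kappa(\lambda) = \operatorname{sech}(\pi\lambda)$ (Vedaldi & Zisserman). Below is a hedged Python sketch of the idea behind `vl_homkermap()`, not its actual implementation; the choices $n = 3$ and $L = 0.5$ are illustrative (problem 2 uses $n = 1$), and $x > 0$ is assumed.

```python
import math

def hom_feature_map(x, n=3, L=0.5):
    """Finite feature map approximating the chi-squared kernel.

    Samples kappa(lambda) = sech(pi * lambda) at lambda = 0, L, ..., nL
    and packs the complex exponential e^{-i lambda log x} into cos/sin
    pairs, giving a real (2n + 1)-dimensional vector psi(x) with
    <psi(x), psi(y)> ~= k(x, y) = 2xy / (x + y).  Requires x > 0.
    """
    kappa = lambda lam: 1.0 / math.cosh(math.pi * lam)
    psi = [math.sqrt(x * L * kappa(0.0))]
    for j in range(1, n + 1):
        scale = math.sqrt(2 * x * L * kappa(j * L))
        psi.append(scale * math.cos(j * L * math.log(x)))
        psi.append(scale * math.sin(j * L * math.log(x)))
    return psi

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = 0.2, 0.5
exact = 2 * x * y / (x + y)
approx = dot(hom_feature_map(x), hom_feature_map(y))
```

After mapping each histogram bin this way, the χ² SVM reduces to an ordinary linear SVM on the concatenated features, which is the whole point of the approximation.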

## Bag of Visual Words

Feature Extraction → Clustering → Encoding → Spatial Binning → Kernel SVM

$$k(x,y) = \left< \Psi(x), \Psi(y) \right>$$

## Bag of Visual Words

• Install VLFeat, an extensive computer vision library.
• Vary the options and see how they affect performance.
• Code extract_dense_sift.m
  • Image feature extraction using dense SIFT features
  • Use vl_dsift()
  • Randomly select 10,000 features
• Code create_dictionary.m
  • Create the dictionary of visual words
  • Use k-means to find the cluster centroids
  • Use vl_kmeans()
• Code create_histograms.m
  • Represent images as histograms
  • Spatial pyramid
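The dictionary and histogram steps together amount to "cluster the descriptors, then count nearest-word assignments". A conceptual Python sketch of that pipeline (the assignment uses MATLAB's `vl_kmeans()`; here plain Lloyd's k-means stands in, and 2D Gaussian blobs stand in for 128-D dense SIFT descriptors):

```python
import random

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means: returns k centroids (the visual words)."""
    # Deterministic init: spread initial centers across the data.
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep old center if a cluster empties out
                centers[i] = tuple(sum(v) / len(cl) for v in zip(*cl))
    return centers

def histogram(features, centers):
    """Hard-assign each feature to its nearest word and count."""
    h = [0] * len(centers)
    for p in features:
        i = min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        h[i] += 1
    return h

random.seed(1)
# Two well-separated 2-D "descriptor" clusters standing in for dense SIFT.
feats = [(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(50)] + \
        [(random.gauss(5, 0.1), random.gauss(5, 0.1)) for _ in range(50)]
words = kmeans(feats, k=2)
h = histogram(feats, words)   # 50 features assigned to each visual word
```

In the assignment, the histograms would additionally be split per spatial-pyramid cell and passed through the homogeneous kernel map before SVM training.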