## CS231A TA Session

### Problem set 4

Christopher B. Choy

## Hough Transformation

• Transform a point in Cartesian coordinates to the set of all lines (line parameters $(r, \theta)$) that pass through the point, expressed in polar parameters

$$H: (x, y) \rightarrow \{ (r, \theta) | r(\theta) = x \cos \theta + y \sin \theta \}$$

$$y = -\frac{\cos \theta}{\sin \theta} x + \frac{r}{\sin \theta} \rightarrow r(\theta) = x \cos \theta + y \sin \theta$$
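The voting scheme implied by the mapping above can be sketched in a few lines. This is a minimal Python illustration (the assignment itself uses MATLAB); the accumulator resolution and range are arbitrary choices for the example.

```python
import math

def hough_lines(points, n_theta=180, n_r=100, r_max=200.0):
    """Vote each (x, y) point into a discretized (r, theta) accumulator.

    For every theta bin, the point votes for the line with
    r = x*cos(theta) + y*sin(theta).
    """
    acc = [[0] * n_theta for _ in range(n_r)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta                  # theta in [0, pi)
            r = x * math.cos(theta) + y * math.sin(theta)
            r_bin = int((r + r_max) / (2 * r_max) * n_r)   # map [-r_max, r_max] -> [0, n_r)
            if 0 <= r_bin < n_r:
                acc[r_bin][t] += 1
    return acc

# Collinear points on y = x (normal form: theta = 3*pi/4, r = 0)
# all vote into the same accumulator cell.
points = [(i, i) for i in range(10)]
acc = hough_lines(points)
best = max((acc[r][t], r, t) for r in range(100) for t in range(180))
```

The peak of the accumulator recovers the line's $(r, \theta)$ parameters; in a real image, the input points would be edge pixels.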

## Generalized Hough Transformation

• Transform a feature $\Phi(x)$ to points in the 2D Hough space $S = \{ \vec{a} \}$, where $\vec{a}$ is the reference origin of the shape
• For points $\vec{x}$ on the boundary, store $\vec{r} = \vec{a} - \vec{x}$ in the R-Table, indexed by the gradient orientation $\Phi(\vec{x})$
• Handling rotation and scale requires a 4D Hough space

D.H. Ballard, "Generalizing the Hough Transform to Detect Arbitrary Shapes", Pattern Recognition, Vol.13, No.2, p.111-122, 1981
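The R-table construction and voting step can be sketched as follows. This is a hedged toy example (in Python, not the assignment's MATLAB): points carry a precomputed gradient orientation, and rotation/scale are held fixed so the Hough space stays 2D.

```python
import math
from collections import defaultdict

def build_r_table(boundary, center, n_bins=8):
    """R-table: for each boundary point x, store r = a - x,
    indexed by the quantized gradient orientation phi(x)."""
    table = defaultdict(list)
    for (x, y), phi in boundary:
        b = int(phi / (2 * math.pi) * n_bins) % n_bins
        table[b].append((center[0] - x, center[1] - y))
    return table

def vote(edges, table, n_bins=8):
    """Each edge point votes for candidate reference origins a = x + r."""
    acc = defaultdict(int)
    for (x, y), phi in edges:
        b = int(phi / (2 * math.pi) * n_bins) % n_bins
        for rx, ry in table[b]:
            acc[(x + rx, y + ry)] += 1
    return acc

# Toy model: three boundary points with orientations, reference origin (5, 5).
model = [((3, 5), 0.0), ((5, 3), math.pi / 2), ((7, 5), math.pi)]
table = build_r_table(model, (5, 5))
# The same shape translated by (+10, +2): all votes land on (15, 7).
scene = [((x + 10, y + 2), phi) for (x, y), phi in model]
acc = vote(scene, table)
peak = max(acc, key=acc.get)
```

The accumulator peak recovers the translated reference origin; adding rotation and scale would extend each vote over the two extra dimensions.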

## Implicit Shape Model

• Use the R-Table as a function of an image patch.
• Transform an image patch to object centers.

B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV 2004

## Bag of Visual Words

Feature Extraction → Clustering → Encoding → Spatial Binning → Kernel SVM

$$k(x,y) = \left< \Psi(x), \Psi(y) \right>$$

## Bag of Visual Words

• Feature Extraction
  • Local image features, robust to typical image transformations
  • Dense SIFT: SIFT at every location vs. at key points (interest points might not correspond to the foreground)
• Clustering (Dictionary)
  • Visual words should be distinctive and diverse
  • Common visual words (e.g. a wheel) will form a cluster
  • K-means clustering

## Bag of Visual Words

• Encoding
  • Maps local features to visual words
  • Hard quantization: assign each feature to its nearest visual word
• Spatial Pyramid
  • Encodes spatial information
  • Divides the image into subsections; for each subsection, create a visual-word histogram
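The encoding and spatial-binning steps can be sketched together. This is an illustrative Python snippet, not the assignment's MATLAB code: it assumes features have already been hard-quantized to word indices and carry normalized image coordinates.

```python
def encode_spatial_pyramid(features, n_words, levels=2):
    """Build one visual-word histogram per spatial cell at each pyramid level.

    `features` is a list of (x, y, word) with x, y in [0, 1) and `word`
    the index of the nearest cluster center (hard assignment).
    """
    hist = []
    for level in range(levels + 1):
        cells = 2 ** level                      # cells x cells grid at this level
        grid = [[0] * n_words for _ in range(cells * cells)]
        for x, y, word in features:
            cx, cy = int(x * cells), int(y * cells)
            grid[cy * cells + cx][word] += 1
        for cell in grid:
            hist.extend(cell)                   # concatenate all cell histograms
    return hist

# Two features with word ids 0 and 1 in opposite image quadrants.
feats = [(0.1, 0.1, 0), (0.9, 0.9, 1)]
h = encode_spatial_pyramid(feats, n_words=2, levels=1)
# Level 0 is the whole-image histogram; level 1 adds four quadrant histograms.
```

The concatenated histogram is what gets fed to the kernel SVM; a weighting per pyramid level (as in Lazebnik et al.'s spatial pyramid) is omitted here for brevity.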

## Similarity Metric for Distributions

To train a linear SVM on these features (distributions), which similarity metric should we use?

How similar is a distribution $P$ to a distribution $Q$?

$f$-divergence!

$f : (0, \infty) \rightarrow \mathbb{R}$, convex, with $f(1) = 0$

$$D_f(P||Q) = \int f\left( \frac{dP}{dQ}\right) dQ$$

• Kullback-Leibler Divergence $$f(x) = x \log x$$
• $\chi^2$ Divergence $$f(x) = (x - 1)^2$$

## For a discrete case

• Intersection kernel : $k(x,y) = \sum_i \min(x_i, y_i)$
• $\chi^2$ kernel : $k(x,y) = \sum_i \frac{2 x_i y_i}{x_i+y_i}$ — for normalized histograms this equals $1 - \frac{1}{2} \sum_i \frac{(x_i-y_i)^2}{x_i+y_i}$, i.e. one minus half the $\chi^2$ distance

In problem 2, we will use the $\chi^2$ kernel for the similarity metric.
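Both kernels are one-liners over histogram bins. A small Python sketch (the histograms are made-up 3-bin examples):

```python
def intersection_kernel(x, y):
    """Histogram intersection: k(x, y) = sum_i min(x_i, y_i)."""
    return sum(min(a, b) for a, b in zip(x, y))

def chi2_kernel(x, y):
    """Chi-squared kernel: k(x, y) = sum_i 2 x_i y_i / (x_i + y_i)."""
    return sum(2 * a * b / (a + b) for a, b in zip(x, y) if a + b > 0)

# Normalized 3-bin histograms.
x = [0.5, 0.3, 0.2]
y = [0.2, 0.3, 0.5]
k_int = intersection_kernel(x, y)    # 0.2 + 0.3 + 0.2 = 0.7
k_chi2 = chi2_kernel(x, y)
k_self = chi2_kernel(x, x)           # 1.0 for a normalized histogram
```

Note that for a normalized histogram $k(x,x) = \sum_i x_i = 1$, consistent with the χ² distance being zero between identical histograms.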

## Homogeneous Kernel and Finite Approximation

$K(\mathbf{x},\mathbf{y})$ is computationally expensive and nonlinear!
But in the feature space $\Psi(\cdot)$, it becomes linear (in $\infty$ dimensions).

• Compute the feature map $\Psi(x)$ using the Homogeneous Kernel
• A generic histogram kernel is $K(\mathbf{x}, \mathbf{y}) = \sum_i k(x_i, y_i)$
• It is homogeneous if $k(cx, cy) = c k(x,y)$
• intersection, $\chi^2$, Hellinger's, Jensen-Shannon's
• Find a finite feature map $\hat{\Psi}(\cdot)$ such that $$k(x,y) = \left< \Psi(x), \Psi(y) \right> \approx \left< \hat{\Psi}(x), \hat{\Psi}(y) \right>$$
• Map distributions using $\hat{\Psi}(\cdot)$. Then the problem becomes a linear classification in the feature space!

## Homogeneous Kernel

• Use the homogeneity, $k(x,y) = \sqrt{xy} k\left(\sqrt{x/y}, \sqrt{y/x}\right) = \sqrt{xy} \mathcal{K} \left( \log (y/x) \right)$
• Ex. intersection kernel
• $k(x,y) = \min(x,y) = \sqrt{xy} \mathcal{K}(\log \frac{y}{x})$
• $\mathcal{K}(w) = e^{- |w|/2}$
• Define $\kappa(\lambda)$ as the Fourier transform of $\mathcal{K}(w)$
• $\Psi(x)(\lambda) = e^{-i\lambda \log x} \sqrt{x \kappa(\lambda)}$
• Continuous, infinite-dimensional feature
• Sample at points $\lambda = -nL, (-n+1)L, \ldots, nL$
• $\hat{\Psi}(x) \in \mathbb{R}^{2n + 1}$
• $k(x,y) \approx \left< \hat{\Psi}(x), \hat{\Psi}(y) \right>$
• In problem 2, we use $n = 1$
• vl_homkermap()
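The sampled feature map can be written out directly for the χ² kernel, whose spectrum is $\kappa(\lambda) = \operatorname{sech}(\pi\lambda)$ (Vedaldi & Zisserman). Below is a hedged Python sketch of the idea behind `vl_homkermap()`, not its actual implementation; the choices $n = 3$ and $L = 0.5$ are illustrative (problem 2 uses $n = 1$), and $x > 0$ is assumed.

```python
import math

def hom_feature_map(x, n=3, L=0.5):
    """Finite feature map approximating the chi-squared kernel.

    Samples kappa(lambda) = sech(pi * lambda) at lambda = 0, L, ..., nL
    and packs the complex exponential e^{-i lambda log x} into cos/sin
    pairs, giving a real (2n + 1)-dimensional vector psi(x) with
    <psi(x), psi(y)> ~= k(x, y) = 2xy / (x + y).  Requires x > 0.
    """
    kappa = lambda lam: 1.0 / math.cosh(math.pi * lam)
    psi = [math.sqrt(x * L * kappa(0.0))]
    for j in range(1, n + 1):
        scale = math.sqrt(2 * x * L * kappa(j * L))
        psi.append(scale * math.cos(j * L * math.log(x)))
        psi.append(scale * math.sin(j * L * math.log(x)))
    return psi

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = 0.2, 0.5
exact = 2 * x * y / (x + y)
approx = dot(hom_feature_map(x), hom_feature_map(y))
```

After mapping each histogram bin this way, the χ² SVM reduces to an ordinary linear SVM on the concatenated features, which is the whole point of the approximation.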

## Bag of Visual Words

Feature Extraction → Clustering → Encoding → Spatial Binning → Kernel SVM

$$k(x,y) = \left< \Psi(x), \Psi(y) \right>$$

## Bag of Visual Words

• Install VLFeat, an extensive computer vision library.
• Vary the options and see how they affect performance.
• Code extract_dense_sift.m
  • Image feature extraction using dense SIFT features
  • Use vl_dsift()
  • Randomly select 10,000 features
• Code create_dictionary.m
  • Create the dictionary of visual words
  • Use k-means to find the cluster centroids
  • Use vl_kmeans()
• Code create_histograms.m
  • Represent images as histograms
  • Spatial pyramid
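The dictionary and histogram steps together amount to "cluster the descriptors, then count nearest-word assignments". A conceptual Python sketch of that pipeline (the assignment uses MATLAB's `vl_kmeans()`; here plain Lloyd's k-means stands in, and 2D Gaussian blobs stand in for 128-D dense SIFT descriptors):

```python
import random

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means: returns k centroids (the visual words)."""
    # Deterministic init: spread initial centers across the data.
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep old center if a cluster empties out
                centers[i] = tuple(sum(v) / len(cl) for v in zip(*cl))
    return centers

def histogram(features, centers):
    """Hard-assign each feature to its nearest word and count."""
    h = [0] * len(centers)
    for p in features:
        i = min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        h[i] += 1
    return h

random.seed(1)
# Two well-separated 2-D "descriptor" clusters standing in for dense SIFT.
feats = [(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(50)] + \
        [(random.gauss(5, 0.1), random.gauss(5, 0.1)) for _ in range(50)]
words = kmeans(feats, k=2)
h = histogram(feats, words)   # 50 features assigned to each visual word
```

In the assignment, the histograms would additionally be split per spatial-pyramid cell and passed through the homogeneous kernel map before SVM training.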