Making a Caffe Layer

Caffe is one of the most popular open-source neural network frameworks. It is modular, clean, and fast. Extending it is tricky but not as difficult as extending other frameworks.

Implementation Details

When you implement the layer functions, use the macros and helper functions provided by Caffe to minimize your workload.

• Blob offset

When you compute an offset into a blob's data, use the safe Blob::offset(n, c, h, w) function (the trailing arguments default to 0) instead of hand-rolled index arithmetic. An example follows.
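A minimal sketch of reading one value out of a bottom blob (the variable names are illustrative):

```cpp
// Read the value for sample n, channel c from the first bottom blob.
// Blob::offset() encapsulates the N x C x H x W index arithmetic and
// range-checks its arguments, which is why it is the safe choice.
const Dtype* bottom_data = bottom[0]->cpu_data();
const Dtype value = bottom_data[bottom[0]->offset(n, c)];
```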

• Basic Math Functions

caffe_[mul|add|sub|div|sqr|powx|exp|abs|sin|cos|copy|scal|cpu_axpby]

Basic element-wise functions and matrix multiplication (caffe_cpu_gemm) are provided in caffe/util/math_functions.hpp; most of them also have caffe_gpu_* counterparts.
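For instance, to compute an element-wise difference and then scale it (a sketch; the blobs a, b, and diff are hypothetical):

```cpp
#include "caffe/util/math_functions.hpp"

// diff = a - b, then diff *= 2, using Caffe's BLAS-backed helpers
// instead of hand-written loops.
const int count = a.count();
caffe_sub(count, a.cpu_data(), b.cpu_data(), diff.mutable_cpu_data());
caffe_scal(count, Dtype(2), diff.mutable_cpu_data());
```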

• CUDA Macros

There are several CUDA macros, such as CUDA_KERNEL_LOOP, CAFFE_GET_BLOCKS, CAFFE_CUDA_NUM_THREADS, and CUDA_POST_KERNEL_CHECK, that come in very handy when implementing Forward_gpu and Backward_gpu.
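The typical pattern looks like this (a sketch; the kernel name and the element-wise operation are illustrative):

```cpp
template <typename Dtype>
__global__ void ScaleForward(const int n, const Dtype alpha,
                             const Dtype* in, Dtype* out) {
  // CUDA_KERNEL_LOOP expands to a grid-stride loop over [0, n).
  CUDA_KERNEL_LOOP(index, n) {
    out[index] = alpha * in[index];
  }
}

// Launch with Caffe's standard grid/block configuration, then check
// for launch errors.
const int count = bottom[0]->count();
ScaleForward<Dtype><<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS>>>(
    count, alpha, bottom[0]->gpu_data(), top[0]->mutable_gpu_data());
CUDA_POST_KERNEL_CHECK;
```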

A Brief Summary of CUDA

Since the Caffe framework heavily relies on CUDA, I’ll briefly summarize the basics of CUDA.

• Function decorators

In CUDA terminology, the device refers to a CUDA-capable GPU and the host refers to the CPU side.

There are two main function decorators, __device__ and __global__. Putting either in front of a function makes it compile for the GPU. A __device__ function can only be called from device code (i.e., from within a kernel or another device function), whereas a __global__ function is a kernel that you can launch from the host.

A kernel function runs in parallel. There are two levels of parallelism: threads and blocks. A block consists of multiple threads, and a collection of blocks is called a grid. Under the hood, threads are divided into groups of 32 called warps, so it is best to use a multiple of 32 threads per block.

You specify the execution configuration on the host side when you launch a kernel. For example, to launch a kernel called kernel_function, simply write kernel_function<<<N_BLOCKS, N_THREADS>>>(arguments) in the host code. This launches N_BLOCKS blocks, each with N_THREADS threads.
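Putting the pieces together, here is a minimal self-contained CUDA program (independent of Caffe; all names are illustrative):

```cpp
#include <cstdio>

// A __device__ function: callable only from device code.
__device__ float square(float x) { return x * x; }

// A __global__ function: a kernel, launched from the host.
__global__ void square_all(const int n, const float* in, float* out) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = square(in[i]);
}

int main() {
  const int n = 1024;
  float h_in[1024], h_out[1024];
  for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

  float *d_in, *d_out;
  cudaMalloc(&d_in, n * sizeof(float));
  cudaMalloc(&d_out, n * sizeof(float));
  cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

  // 4 blocks of 256 threads each (256 is a multiple of the warp size, 32).
  square_all<<<4, 256>>>(n, d_in, d_out);

  cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
  printf("square of 3 = %f\n", h_out[3]);
  cudaFree(d_in);
  cudaFree(d_out);
  return 0;
}
```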

Angle To Sine Cosine Layer

The layer takes an $N \times C \times 1 \times 1$ Blob and produces an $N \times 2C \times 1 \times 1$ Blob. The angles must be in radians (which is none of our concern, since the network weights will adjust automatically).

For each input $x$ it produces the two values $\sin(x)$ and $\cos(x)$. Let's concatenate the $C$ sines with the $C$ cosines: if we define $y_i = \sin(x_i)$ and $y_{i+C} = \cos(x_i)$, the gradient is

\begin{align}
\frac{\partial E(y_i, y_{i+C}, \dots)}{\partial x_i} &= \frac{\partial E(y_i, \dots)}{\partial y_i} \frac{\partial y_i}{\partial x_i} + \frac{\partial E(y_{i+C}, \dots)}{\partial y_{i+C}} \frac{\partial y_{i+C}}{\partial x_i} \\
&= \frac{\partial E(y_i, \dots)}{\partial y_i} \frac{\partial \sin(x_i)}{\partial x_i} + \frac{\partial E(y_{i+C}, \dots)}{\partial y_{i+C}} \frac{\partial \cos(x_i)}{\partial x_i} \\
&= \frac{\partial E(y_i, \dots)}{\partial y_i} \cos(x_i) - \frac{\partial E(y_{i+C}, \dots)}{\partial y_{i+C}} \sin(x_i)
\end{align}

The term $\frac{\partial E(y_i, \dots)}{\partial y_i}$ is provided in top[0]->[c|g]pu_diff.
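A minimal CPU sketch of the forward and backward passes under these definitions (the class name AngleToSineCosineLayer is mine, and the usual Caffe layer boilerplate is assumed; Forward_gpu/Backward_gpu would follow the same pattern using the CUDA macros above):

```cpp
#include <cmath>
#include <vector>

template <typename Dtype>
void AngleToSineCosineLayer<Dtype>::Forward_cpu(
    const std::vector<Blob<Dtype>*>& bottom,
    const std::vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();
  const int num = bottom[0]->num();
  const int channels = bottom[0]->channels();  // C; top has 2C channels
  for (int n = 0; n < num; ++n) {
    for (int c = 0; c < channels; ++c) {
      const Dtype x = bottom_data[bottom[0]->offset(n, c)];
      top_data[top[0]->offset(n, c)] = std::sin(x);             // y_i
      top_data[top[0]->offset(n, c + channels)] = std::cos(x);  // y_{i+C}
    }
  }
}

template <typename Dtype>
void AngleToSineCosineLayer<Dtype>::Backward_cpu(
    const std::vector<Blob<Dtype>*>& top,
    const std::vector<bool>& propagate_down,
    const std::vector<Blob<Dtype>*>& bottom) {
  if (!propagate_down[0]) { return; }
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* top_diff = top[0]->cpu_diff();
  Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
  const int num = bottom[0]->num();
  const int channels = bottom[0]->channels();
  for (int n = 0; n < num; ++n) {
    for (int c = 0; c < channels; ++c) {
      const Dtype x = bottom_data[bottom[0]->offset(n, c)];
      // dE/dx_i = dE/dy_i * cos(x_i) - dE/dy_{i+C} * sin(x_i)
      bottom_diff[bottom[0]->offset(n, c)] =
          top_diff[top[0]->offset(n, c)] * std::cos(x) -
          top_diff[top[0]->offset(n, c + channels)] * std::sin(x);
    }
  }
}
```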

Loss Layer

Conceptually, a loss layer does not need any top outputs, since the loss is the final output. In Caffe, however, you can use the top blob to feed a scale factor to a specific loss layer.

The scale factor is fed into the loss layer through the diff of its top blob: during the backward pass, top[0]->cpu_diff()[0] contains the loss_weight specified in the network definition.

This is common practice and is used in many of the standard loss layers, including the Euclidean loss and the contrastive loss.
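For reference, this is roughly how the Euclidean loss layer applies that scale in its backward pass (abridged from Caffe's euclidean_loss_layer.cpp; diff_ holds the precomputed difference of the two bottoms):

```cpp
template <typename Dtype>
void EuclideanLossLayer<Dtype>::Backward_cpu(
    const std::vector<Blob<Dtype>*>& top,
    const std::vector<bool>& propagate_down,
    const std::vector<Blob<Dtype>*>& bottom) {
  for (int i = 0; i < 2; ++i) {
    if (propagate_down[i]) {
      const Dtype sign = (i == 0) ? 1 : -1;
      // top[0]->cpu_diff()[0] holds the loss_weight fed in via the top blob.
      const Dtype alpha = sign * top[0]->cpu_diff()[0] / bottom[i]->num();
      // bottom_diff = alpha * diff_
      caffe_cpu_axpby(bottom[i]->count(), alpha, diff_.cpu_data(),
                      Dtype(0), bottom[i]->mutable_cpu_diff());
    }
  }
}
```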
