# Making a Caffe Layer

Caffe is one of the most popular open-source neural network frameworks. It is modular, clean, and fast. Extending it is tricky but not as difficult as extending other frameworks.

## Files to modify or create

Paths are relative to `$(CAFFE_HOME)`:

- /src/caffe/proto/caffe.proto
- /include/caffe/common_layers.hpp or vision_layers.hpp
- /src/caffe/layer_factory.cpp
- /src/caffe/layers/new_layer.cpp
- /src/caffe/layers/new_layer.cu
- /src/caffe/test/test_new_layer.cpp

## File 1: caffe.proto

You have to give a new index to your new layer. Search `caffe.proto` for the phrase `next available ID`; it appears on two lines. Use the current value as your layer's ID, increment the `next available ID` comment, and define the new layer.
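For an old-style Caffe tree (the kind with a layer-type enum), the edit looks roughly like this. The enum name and IDs below are illustrative; take the actual next ID from the comment in your own `caffe.proto`:

```protobuf
// In caffe.proto, inside LayerParameter. If the comment reads
// "// LayerType next available ID: 39", take 39 for the new layer
// and bump the comment to 40.
enum LayerType {
  // ... existing layer types ...
  ANGLE_TO_TRIG = 39;  // hypothetical new layer
}
```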

## File 2: layer_factory.cpp

You have to add two lines to the switch statement that instantiates a layer from its type.
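In the Caffe version this guide targets, `layer_factory.cpp` has a `GetLayer` function with a `switch` over the layer type, so the two lines are a `case` and a `return`. The names below are illustrative and must match what you added to `caffe.proto`:

```cpp
// In GetLayer() in src/caffe/layer_factory.cpp:
case LayerParameter_LayerType_ANGLE_TO_TRIG:
  return new AngleToTrigLayer<Dtype>(param);
```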

## File 3: Layer Header

Define your layer in a common layer header file. Use either `common_layers.hpp` or `vision_layers.hpp`, depending on the type of the layer.
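As a sketch, the declaration might look like the following. Signatures differ between Caffe versions, so copy the exact ones from a neighboring layer in the same header; all names here are illustrative:

```cpp
// In include/caffe/common_layers.hpp
template <typename Dtype>
class AngleToTrigLayer : public Layer<Dtype> {
 public:
  explicit AngleToTrigLayer(const LayerParameter& param)
      : Layer<Dtype>(param) {}
  virtual void SetUp(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);

 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const bool propagate_down, vector<Blob<Dtype>*>* bottom);
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const bool propagate_down, vector<Blob<Dtype>*>* bottom);
};
```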

## Files 4 & 5: Defining a layer

The layer has to inherit from the virtual `Layer` class. The virtual functions that you have to implement are the ones declared pure virtual (`= 0`) in `layer.hpp`: typically the setup function plus `Forward_cpu` and `Backward_cpu` (the exact set varies between Caffe versions). `Forward_gpu` and `Backward_gpu` go in the `.cu` file if you provide a GPU implementation.
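A skeleton for the `.cpp` file might look like this (a sketch against the old Caffe API; `INSTANTIATE_CLASS` is Caffe's macro that instantiates the `float` and `double` templates):

```cpp
// src/caffe/layers/new_layer.cpp (names illustrative)
#include "caffe/common_layers.hpp"

namespace caffe {

template <typename Dtype>
void NewLayer<Dtype>::SetUp(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  // Check bottom/top blob counts and reshape the top blob here.
}

template <typename Dtype>
void NewLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  // Fill (*top)[0]->mutable_cpu_data() from bottom[0]->cpu_data().
}

template <typename Dtype>
void NewLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const bool propagate_down, vector<Blob<Dtype>*>* bottom) {
  // Fill (*bottom)[0]->mutable_cpu_diff() from top[0]->cpu_diff().
}

INSTANTIATE_CLASS(NewLayer);

}  // namespace caffe
```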

## File 6: Test File

All layers in Caffe must have a corresponding unit test file. The unit test must thoroughly check all the functionality implemented. Make a file `/src/caffe/test/test_new_layer.cpp` and use the provided Caffe unit test macros.

Finally, check backpropagation using the `GradientChecker`.
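A gradient test typically instantiates the layer and hands it to the checker. The sketch below follows the pattern of Caffe's existing test files; fixture members such as `blob_bottom_vec_` are set up in the test class, which is omitted here:

```cpp
// src/caffe/test/test_new_layer.cpp (sketch)
TYPED_TEST(NewLayerTest, TestGradient) {
  LayerParameter layer_param;
  NewLayer<TypeParam> layer(layer_param);
  // Arguments: finite-difference step size and error threshold.
  GradientChecker<TypeParam> checker(1e-2, 1e-3);
  checker.CheckGradientExhaustive(&layer, &(this->blob_bottom_vec_),
      &(this->blob_top_vec_));
}
```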

## Compile and Test

Run the build and the tests from `$CAFFE_HOME`.
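Assuming the standard Makefile build, the sequence is roughly:

```shell
cd $CAFFE_HOME
make -j8        # rebuild the library with the new layer
make test       # build the test binaries
make runtest    # run the unit tests, including the new one
```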

## Implementation Detail

When you implement the functions, try to use the macros and functions provided by Caffe to minimize your workload.

### Blob offset

When you compute an offset from a blob pointer, use the safe `offset(n, c)` function.

### Basic math functions

`caffe_[mul|add|sub|div|sqr|powx|exp|abs|sin|cos|copy|scal|cpu_axpby]`

Basic elementwise functions and matrix multiplication are provided in `/caffe/util/math_functions.hpp`.

### CUDA macros

There are several CUDA macros that come in very handy when implementing `Forward_gpu` and `Backward_gpu`.

## CUDA brief summary

Since the Caffe framework heavily relies on CUDA, I'll briefly summarize the basics of CUDA.

### Function decorators

In CUDA terminology, *device* refers to a CUDA-capable GPU and *host* refers to the CPU side. There are two function decorators, `__device__` and `__global__`. If you put either of them in front of a function, the function is compiled for the GPU. A `__global__` function is a kernel that you launch from the host, whereas a `__device__` function can only be called from within GPU code.

A kernel function runs in parallel. There are two levels of parallelism: threads and blocks. A block consists of multiple threads, and a collection of blocks is called a grid. Threads are executed in groups of 32 called warps, so it is best to use a multiple of 32 threads per block.

You specify the number of blocks and threads from the CPU side when you launch a kernel. For example, to launch a kernel called `kernel_function`, simply put the following in the CPU-side code: `kernel_function<<<N_BLOCKS, N_THREADS>>>(arguments)`. This will launch `N_BLOCKS` blocks with `N_THREADS` threads each.
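As a concrete sketch (not Caffe code), an elementwise kernel and its launch might look like this; each thread handles one element and guards against the final, partially filled block:

```cuda
// Kernel: y[i] = alpha * x[i] for i in [0, n)
__global__ void scale_kernel(const int n, const float alpha,
                             const float* x, float* y) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {  // the last block may be partially filled
    y[i] = alpha * x[i];
  }
}

// Host side: enough 512-thread blocks to cover n elements.
const int n_threads = 512;
const int n_blocks = (n + n_threads - 1) / n_threads;
scale_kernel<<<n_blocks, n_threads>>>(n, 2.0f, d_x, d_y);
```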

## Angle To Sine Cosine Layer

The layer takes an $N \times C \times 1 \times 1$ `Blob` and produces an $N \times 2C \times 1 \times 1$ `Blob`. The angle must be in radians (which is none of our concern, since the network weights will adjust automatically).

For each input the layer produces two values, $\sin(x)$ and $\cos(x)$. Let's concatenate the $C$ sines with the $C$ cosines. If we define $y_i = \sin(x_i)$ and $y_{i+C} = \cos(x_i)$, the gradient will be

$$
\begin{align}
\frac{\partial E(y_i, y_{i+C}, \dots)}{\partial x_i}
&= \frac{\partial E(y_i, \dots)}{\partial y_i} \frac{\partial y_i}{\partial x_i} + \frac{\partial E(y_{i+C}, \dots)}{\partial y_{i+C}} \frac{\partial y_{i+C}}{\partial x_i} \\
&= \frac{\partial E(y_i, \dots)}{\partial y_i} \frac{\partial \sin(x_i)}{\partial x_i} + \frac{\partial E(y_{i+C}, \dots)}{\partial y_{i+C}} \frac{\partial \cos(x_i)}{\partial x_i} \\
&= \frac{\partial E(y_i, \dots)}{\partial y_i} \cos(x_i) - \frac{\partial E(y_{i+C}, \dots)}{\partial y_{i+C}} \sin(x_i)
\end{align}
$$

The $\frac{\partial E(y_i, \dots)}{\partial y_i}$ terms are provided in `top[n]->[c|g]pu_diff`.

### angle_to_trig.cpp
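A CPU implementation following the derivation above might look like this (a sketch against the old Caffe API; names and signatures are illustrative):

```cpp
template <typename Dtype>
void AngleToTrigLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = (*top)[0]->mutable_cpu_data();
  const int num = bottom[0]->num();
  const int channels = bottom[0]->channels();
  for (int n = 0; n < num; ++n) {
    for (int c = 0; c < channels; ++c) {
      const Dtype x = bottom_data[bottom[0]->offset(n, c)];
      // Sines fill channels [0, C), cosines fill channels [C, 2C).
      top_data[(*top)[0]->offset(n, c)] = sin(x);
      top_data[(*top)[0]->offset(n, c + channels)] = cos(x);
    }
  }
}

template <typename Dtype>
void AngleToTrigLayer<Dtype>::Backward_cpu(
    const vector<Blob<Dtype>*>& top, const bool propagate_down,
    vector<Blob<Dtype>*>* bottom) {
  if (!propagate_down) { return; }
  const Dtype* bottom_data = (*bottom)[0]->cpu_data();
  const Dtype* top_diff = top[0]->cpu_diff();
  Dtype* bottom_diff = (*bottom)[0]->mutable_cpu_diff();
  const int num = (*bottom)[0]->num();
  const int channels = (*bottom)[0]->channels();
  for (int n = 0; n < num; ++n) {
    for (int c = 0; c < channels; ++c) {
      const Dtype x = bottom_data[(*bottom)[0]->offset(n, c)];
      // dE/dx = dE/dy_sin * cos(x) - dE/dy_cos * sin(x)
      bottom_diff[(*bottom)[0]->offset(n, c)] =
          top_diff[top[0]->offset(n, c)] * cos(x) -
          top_diff[top[0]->offset(n, c + channels)] * sin(x);
    }
  }
}
```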

### angle_to_trig.cu
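The GPU version parallelizes over the $N \times C$ input angles. A sketch (illustrative, with a Caffe-style bounds check in the kernel):

```cuda
template <typename Dtype>
__global__ void AngleToTrigForward(const int count, const int channels,
    const Dtype* bottom_data, Dtype* top_data) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < count) {  // count = N * C input angles
    const int n = i / channels;
    const int c = i % channels;
    top_data[n * 2 * channels + c] = sin(bottom_data[i]);
    top_data[n * 2 * channels + c + channels] = cos(bottom_data[i]);
  }
}

// Inside Forward_gpu, launch with Caffe's helper macros:
// AngleToTrigForward<Dtype><<<CAFFE_GET_BLOCKS(count),
//     CAFFE_CUDA_NUM_THREADS>>>(count, channels, bottom_data, top_data);
```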

# Loss Layer

A loss layer does not have any top outputs, since the loss is the final output. However, in Caffe you can use the top blobs to set the scale of a specific loss layer.

A scale factor is fed into the loss layer through the network definition.
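In the newer prototxt format this is done with the `loss_weight` field on the loss layer; the layer and blob names below are illustrative:

```protobuf
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "pred"
  bottom: "label"
  top: "loss"
  loss_weight: 0.5  # scales this loss's contribution to the total objective
}
```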

This is common practice and is used in many conventional loss layers, including the Euclidean loss, the contrastive loss, etc.
