Caffe is one of the most popular open-source neural network frameworks. It is modular, clean, and fast. Extending it is tricky but not as difficult as extending other frameworks.
Files to modify or create
All paths are relative to $(CAFFE_HOME):
/src/caffe/proto/caffe.proto
/include/caffe/common_layers.hpp or vision_layers.hpp
/src/caffe/layer_factory.cpp
/src/caffe/layers/new_layer.cpp
/src/caffe/layers/new_layer.cu
/src/caffe/test/test_new_layer.cpp
File 1: caffe.proto
You have to give your new layer a new index. Search caffe.proto for the phrase "next available ID"; there are two lines containing it. Increment the next available ID on each and define the new layer.
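In the switch-style Caffe this post targets, the edit looks roughly like the sketch below. The ID numbers and the ANGLE_TO_TRIG name are illustrative; use whatever the next available ID actually is in your checkout.

```protobuf
// Inside message LayerParameter in src/caffe/proto/caffe.proto.
// Both "next available ID" comment lines get bumped; the enum entry is new.
// LayerType next available ID: 39 (last added: ANGLE_TO_TRIG)
enum LayerType {
  // ... existing layer types ...
  ANGLE_TO_TRIG = 38;
}
```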
File 2: layer_factory.cpp
You have to add two lines to the switch statement that instantiates layers from their type, as sketched below.
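A sketch, assuming the ANGLE_TO_TRIG type from caffe.proto above and a layer class named AngleToTrigLayer:

```cpp
// In GetLayer() in src/caffe/layer_factory.cpp:
switch (type) {
  // ... existing cases ...
  case LayerParameter_LayerType_ANGLE_TO_TRIG:
    return new AngleToTrigLayer<Dtype>(param);
  // ...
}
```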
File 3: Layer Header
Declare your layer in a common layer header file: either common_layers.hpp or vision_layers.hpp, depending on the type of the layer.
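A minimal declaration might look like this. The class name is illustrative, and the virtual-function signatures follow recent BVLC Caffe; older revisions differ, so copy the signatures from an existing layer in your checkout.

```cpp
// In include/caffe/common_layers.hpp
template <typename Dtype>
class AngleToTrigLayer : public Layer<Dtype> {
 public:
  explicit AngleToTrigLayer(const LayerParameter& param)
      : Layer<Dtype>(param) {}
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);

 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom);
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom);
};
```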
File 4 & 5 : Defining a layer
The layer has to inherit the Layer virtual class. The virtual functions that you have to implement are the ones defined as = 0 which are
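For reference, the pure virtual declarations in recent BVLC Caffe's include/caffe/layer.hpp are the following (again, older revisions differ):

```cpp
virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) = 0;
virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) = 0;
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) = 0;
```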
File 6: Test File
Every layer in Caffe must have a corresponding unit test file. The unit test must thoroughly check all the functionality you implemented. Create /src/caffe/test/test_new_layer.cpp and use the unit test macros provided by Caffe.
Finally, check Backprop using the GradientChecker.
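A skeleton test might look like the following. The names are illustrative, and the setup follows recent BVLC Caffe's test conventions (MultiDeviceTest, TestDtypesAndDevices); older trees use slightly different helpers.

```cpp
#include <vector>

#include "gtest/gtest.h"

#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/common_layers.hpp"
#include "caffe/filler.hpp"
#include "caffe/test/test_caffe_main.hpp"
#include "caffe/test/test_gradient_check_util.hpp"

namespace caffe {

template <typename TypeParam>
class AngleToTrigLayerTest : public MultiDeviceTest<TypeParam> {
  typedef typename TypeParam::Dtype Dtype;

 protected:
  AngleToTrigLayerTest()
      : blob_bottom_(new Blob<Dtype>(2, 3, 1, 1)),
        blob_top_(new Blob<Dtype>()) {
    // Fill the bottom blob with random angles.
    FillerParameter filler_param;
    GaussianFiller<Dtype> filler(filler_param);
    filler.Fill(this->blob_bottom_);
    blob_bottom_vec_.push_back(blob_bottom_);
    blob_top_vec_.push_back(blob_top_);
  }
  virtual ~AngleToTrigLayerTest() { delete blob_bottom_; delete blob_top_; }

  Blob<Dtype>* const blob_bottom_;
  Blob<Dtype>* const blob_top_;
  vector<Blob<Dtype>*> blob_bottom_vec_;
  vector<Blob<Dtype>*> blob_top_vec_;
};

TYPED_TEST_CASE(AngleToTrigLayerTest, TestDtypesAndDevices);

TYPED_TEST(AngleToTrigLayerTest, TestGradient) {
  typedef typename TypeParam::Dtype Dtype;
  LayerParameter layer_param;
  AngleToTrigLayer<Dtype> layer(layer_param);
  // Numerically verify Backward against finite differences of Forward.
  GradientChecker<Dtype> checker(1e-2, 1e-3);
  checker.CheckGradientExhaustive(&layer, this->blob_bottom_vec_,
      this->blob_top_vec_);
}

}  // namespace caffe
```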
Compile and Test
Run the following lines from $CAFFE_HOME.
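With the stock Makefile build, the usual sequence is:

```bash
make -j8
make -j8 test
make runtest
```

The Makefile globs src/caffe/test/test_*.cpp, so your new test file is picked up automatically.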
Implementation Details
When you implement the functions, try to use the macros and functions provided by Caffe to minimize your workload.
Blob offset
When you compute an offset into a blob's data, use the safe offset(n, c, h, w) member function rather than hand-rolled pointer arithmetic.
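For example, inside Forward_cpu:

```cpp
// offset(n, c, h, w) computes ((n * channels + c) * height + h) * width + w
// and CHECKs that every index is within range.
const Dtype* bottom_data = bottom[0]->cpu_data();
const Dtype angle = bottom_data[bottom[0]->offset(n, c)];
```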
Basic elementwise functions and matrix multiplication wrappers are provided in include/caffe/util/math_functions.hpp.
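For instance, the CPU BLAS wrappers look like this (caffe_gpu_* counterparts exist for the GPU):

```cpp
#include "caffe/util/math_functions.hpp"

// Elementwise product: y[i] = a[i] * b[i] for i in [0, count).
caffe_mul<Dtype>(count, a, b, y);

// Matrix product: C = 1.0 * A * B + 0.0 * C, with A of size M x K
// and B of size K x N.
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M, N, K,
    Dtype(1), A, B, Dtype(0), C);
```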
CUDA Macros
There are several CUDA macros that come in very handy when implementing Forward_gpu and Backward_gpu.
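The most useful ones live in include/caffe/util/device_alternate.hpp: CUDA_KERNEL_LOOP, CAFFE_GET_BLOCKS, CAFFE_CUDA_NUM_THREADS, and CUDA_POST_KERNEL_CHECK. A typical pattern:

```cpp
#include "caffe/util/device_alternate.hpp"

// CUDA_KERNEL_LOOP expands to a grid-stride loop, so all count elements
// are covered no matter how many blocks were launched.
template <typename Dtype>
__global__ void SquareKernel(const int count, const Dtype* in, Dtype* out) {
  CUDA_KERNEL_LOOP(index, count) {
    out[index] = in[index] * in[index];
  }
}

template <typename Dtype>
void SquareGPU(const int count, const Dtype* in, Dtype* out) {
  // CAFFE_GET_BLOCKS picks a grid size for CAFFE_CUDA_NUM_THREADS threads.
  SquareKernel<Dtype><<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS>>>(
      count, in, out);
  CUDA_POST_KERNEL_CHECK;  // abort with the CUDA error string on failure
}
```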
CUDA brief summary
Since the Caffe framework heavily relies on CUDA, I’ll briefly summarize the basics of CUDA.
Function decorators
In CUDA terminology, device refers to a CUDA capable GPU and host refers to the CPU side.
There are two function decorators, __device__ and __global__. A function marked __global__ is compiled as a CUDA kernel: it runs on the device and is launched from the host. A __device__ function also runs on the device, but it can only be called from within a kernel or another device function, not from the host.
A kernel function runs in parallel. There are two levels of parallelism: threads and blocks. A block consists of multiple threads, and a collection of blocks is called a grid. Threads are executed in groups of 32 called warps, so it is best to use a multiple of 32 threads per block.
You specify the number of blocks and threads from the CPU side when you launch a kernel. For example, to launch a kernel called kernel_function, put the following in the host code: kernel_function<<<N_BLOCKS, N_THREADS>>>(arguments). This launches N_BLOCKS blocks with N_THREADS threads each.
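A self-contained illustration in plain CUDA (outside Caffe):

```cpp
#include <cstdio>

__global__ void add_one(const int n, float* x) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] += 1.0f;  // guard: the last block may be partially used
}

int main() {
  const int n = 1000;
  float* d_x;
  cudaMalloc(&d_x, n * sizeof(float));
  cudaMemset(d_x, 0, n * sizeof(float));

  const int threads = 256;  // a multiple of the 32-thread warp size
  const int blocks = (n + threads - 1) / threads;  // round up to cover all n
  add_one<<<blocks, threads>>>(n, d_x);
  cudaDeviceSynchronize();

  float first;
  cudaMemcpy(&first, d_x, sizeof(float), cudaMemcpyDeviceToHost);
  printf("x[0] = %f\n", first);  // prints 1.000000
  cudaFree(d_x);
  return 0;
}
```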
Angle To Sine Cosine Layer
The layer takes an $N \times C \times 1 \times 1$ blob and produces an $N \times 2C \times 1 \times 1$ blob. The angle must be in radians (which is not really our concern, since the network weights feeding the layer will adjust automatically).
For each input it produces two values, $\sin(x)$ and $\cos(x)$. Let's concatenate the $C$ sines with the $C$ cosines. If we define $y_i = \sin(x_i)$ and $y_{i+C} = \cos(x_i)$, the gradient will be

$$\frac{\partial E}{\partial x_i} = \frac{\partial E}{\partial y_i}\frac{\partial y_i}{\partial x_i} + \frac{\partial E}{\partial y_{i+C}}\frac{\partial y_{i+C}}{\partial x_i} = \frac{\partial E}{\partial y_i}\cos(x_i) - \frac{\partial E}{\partial y_{i+C}}\sin(x_i).$$

The upstream gradient $\frac{\partial E}{\partial y_i}$ is provided in top[0]->cpu_diff() (or top[0]->gpu_diff() on the GPU).
angle_to_trig.cpp
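The following is a sketch of the CPU implementation, matching the declaration above and recent BVLC Caffe conventions (the class, file, and header names are illustrative):

```cpp
#include <cmath>
#include <vector>

#include "caffe/common_layers.hpp"  // where AngleToTrigLayer was declared

namespace caffe {

template <typename Dtype>
void AngleToTrigLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  // N x C x 1 x 1 in, N x 2C x 1 x 1 out: C sines followed by C cosines.
  top[0]->Reshape(bottom[0]->num(), 2 * bottom[0]->channels(), 1, 1);
}

template <typename Dtype>
void AngleToTrigLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();
  const int num = bottom[0]->num();
  const int channels = bottom[0]->channels();
  for (int n = 0; n < num; ++n) {
    for (int c = 0; c < channels; ++c) {
      const Dtype x = bottom_data[bottom[0]->offset(n, c)];
      top_data[top[0]->offset(n, c)] = std::sin(x);
      top_data[top[0]->offset(n, c + channels)] = std::cos(x);
    }
  }
}

template <typename Dtype>
void AngleToTrigLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  if (!propagate_down[0]) { return; }
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* top_diff = top[0]->cpu_diff();
  Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
  const int num = bottom[0]->num();
  const int channels = bottom[0]->channels();
  for (int n = 0; n < num; ++n) {
    for (int c = 0; c < channels; ++c) {
      const Dtype x = bottom_data[bottom[0]->offset(n, c)];
      // dE/dx_i = dE/dy_i * cos(x_i) - dE/dy_{i+C} * sin(x_i)
      bottom_diff[bottom[0]->offset(n, c)] =
          top_diff[top[0]->offset(n, c)] * std::cos(x) -
          top_diff[top[0]->offset(n, c + channels)] * std::sin(x);
    }
  }
}

INSTANTIATE_CLASS(AngleToTrigLayer);
// Plus factory registration as appropriate to your Caffe version
// (the switch case above, or REGISTER_LAYER_CLASS in newer trees).

}  // namespace caffe
```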
angle_to_trig.cu
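And a sketch of the GPU counterpart, using the CUDA macros introduced above (same illustrative names and version caveats):

```cpp
#include <vector>

#include "caffe/common_layers.hpp"

namespace caffe {

template <typename Dtype>
__global__ void AngleToTrigForward(const int n, const int channels,
    const Dtype* in, Dtype* out) {
  CUDA_KERNEL_LOOP(index, n) {
    const int i = index / channels;  // sample index
    const int c = index % channels;  // channel within the sample
    out[(i * 2 + 0) * channels + c] = sin(in[index]);
    out[(i * 2 + 1) * channels + c] = cos(in[index]);
  }
}

template <typename Dtype>
__global__ void AngleToTrigBackward(const int n, const int channels,
    const Dtype* in, const Dtype* top_diff, Dtype* bottom_diff) {
  CUDA_KERNEL_LOOP(index, n) {
    const int i = index / channels;
    const int c = index % channels;
    // dE/dx_i = dE/dy_i * cos(x_i) - dE/dy_{i+C} * sin(x_i)
    bottom_diff[index] =
        top_diff[(i * 2 + 0) * channels + c] * cos(in[index]) -
        top_diff[(i * 2 + 1) * channels + c] * sin(in[index]);
  }
}

template <typename Dtype>
void AngleToTrigLayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const int count = bottom[0]->count();
  AngleToTrigForward<Dtype>
      <<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS>>>(
      count, bottom[0]->channels(), bottom[0]->gpu_data(),
      top[0]->mutable_gpu_data());
  CUDA_POST_KERNEL_CHECK;
}

template <typename Dtype>
void AngleToTrigLayer<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  if (!propagate_down[0]) { return; }
  const int count = bottom[0]->count();
  AngleToTrigBackward<Dtype>
      <<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS>>>(
      count, bottom[0]->channels(), bottom[0]->gpu_data(),
      top[0]->gpu_diff(), bottom[0]->mutable_gpu_diff());
  CUDA_POST_KERNEL_CHECK;
}

INSTANTIATE_LAYER_GPU_FUNCS(AngleToTrigLayer);

}  // namespace caffe
```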
Loss Layer
A loss layer traditionally does not have any top output, since the loss is the final output. However, in Caffe you can give a loss layer a top blob and use it to feed a scale factor into the loss.
The scale factor is fed into the loss layer through the diff of its top blob; inside Backward you read it as top[0]->cpu_diff()[0].
This is common practice and is used in many conventional loss layers, including the Euclidean loss and the contrastive loss.
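For example, the Euclidean loss scales its gradient this way; the fragment below condenses the Backward_cpu of BVLC Caffe's EuclideanLossLayer:

```cpp
// diff_ holds (bottom[0] - bottom[1]) computed during Forward_cpu.
for (int i = 0; i < 2; ++i) {
  if (propagate_down[i]) {
    const Dtype sign = (i == 0) ? 1 : -1;
    // top[0]->cpu_diff()[0] carries the scalar loss weight.
    const Dtype alpha = sign * top[0]->cpu_diff()[0] / bottom[i]->num();
    caffe_cpu_axpby(
        bottom[i]->count(),               // count
        alpha,                            // alpha
        diff_.cpu_data(),                 // x
        Dtype(0),                         // beta
        bottom[i]->mutable_cpu_diff());   // y = alpha * x + beta * y
  }
}
```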