Posts
Pinned
Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation
Authors Junha Lee (1,2,*), Chunghyun Park (1,2,*), Jaesung Choe (1), Yu-Chiang Frank Wang (1), Jan Kautz (1), Minsu Cho (2), Chris Choy (1). Affiliations: (1) NVIDIA, (2) POSTECH. * indicates equal contribution. Abstract We tackle open-...
PeRFception: Perception using Radiance Fields
Authors Yoonwoo Jeong, Seungjoo Shin, Junha Lee, Christopher Choy, Anima Anandkumar, Minsu Cho, Jaesik Park NeurIPS, 2022 Abstract The recent progress in implicit 3D representation, i.e., Neural ...
Deep Global Registration
We present Deep Global Registration, a differentiable framework for pairwise registration of real-world 3D scans. Deep global registration is based on three modules: a 6-dimensional convolutional ...
High dimensional Convolutional Neural Networks for 3D Perception
Abstract The automation of mechanical tasks brought the modern world unprecedented prosperity and comfort. However, the majority of automated tasks have been simple mechanical tasks that only requ...
Fully Convolutional Geometric Features
Authors Christopher Choy, Jaesik Park, Vladlen Koltun International Conference on Computer Vision (ICCV), 2019 Speed vs. Accuracy Pareto optimal frontier of previous methods and ours. Abstrac...
4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks
In many robotics and VR/AR applications, 3D-videos are readily-available sources of input (a continuous sequence of depth images, or LIDAR scans). However, those 3D-videos are processed frame-by-fr...
All Posts
CuTe DSL Basics: A Practical Introduction
CuTe DSL Basics: From Hello to Tiled Kernels. This tutorial turns the CuTe DSL script snippets into a connected story: we start with a first GPU kernel, learn how dynamic printing and data types w...
CUDA Memory Load/Store Performance: A Comprehensive Benchmark Analysis
GPU memory performance is often the bottleneck in high-performance computing applications. Understanding the nuances of diffe...
Monocular Dynamic View Synthesis: A Reality Check
Authors Hang Gao, Ruilong Li, Shubham Tulsiani, Bryan Russell, Angjoo Kanazawa, Christopher Choy NeurIPS, 2023 Abstract Indoor scene reconstruction from monocular images has long been sought af...
ACID: Action-Conditional Implicit Visual Dynamics for Deformable Object Manipulation
Authors Yufei Wang, Zhou Xian, Feng Chen, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Christopher Choy, Zackory Erickson, David Held RSS, 2022 Abstract Manipulating volumetric deformabl...
Self-Calibrating Neural Radiance Fields
Authors Yoonwoo Jeong, Seokjun Ahn, Christopher Choy, Animashree Anandkumar, Minsu Cho, Jaesik Park ICCV, 2021 Abstract In this work, we propose a camera self-calibration algorithm for generic ...
Learning 3D Representations of Dynamic Environments from a Single Camera
Authors Gengshan Yang, Minh Vo, Neverova Natalia, Deva Ramanan, Andrea Vedaldi, Christopher Choy CVPR, 2021 Abstract Learning 3D representations of dynamic environments from a single camera pre...
Ghost of 3D Perception: Permutation Invariance Matters? Convolutions are Permutation Invariant!
Are you familiar with the Python dictionary class? Let me give you a quick test to check your level of knowledge. a = dict(); a[1.1] = 1; a[2.1] = 2; b = dict(); b[2.1] = 2; b[1.1] = 1. Do you think t...
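Laid out as runnable Python, the quiz snippet from this excerpt looks like the sketch below. The two comparisons at the end are an addition for illustration and are not taken from the post, whose question is cut off here. They show that the two dictionaries hold the same key-value pairs but remember different insertion orders (assuming Python 3.7 or later, where insertion order is guaranteed).

```python
# Build two dictionaries with the same items, inserted in opposite order.
a = dict()
a[1.1] = 1
a[2.1] = 2

b = dict()
b[2.1] = 2
b[1.1] = 1

# Illustrative additions (not from the original post):
# equality ignores insertion order, iteration does not.
print(a == b)              # True
print(list(a) == list(b))  # False
```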
Faster Neural Radiance Fields Inference
Neural Radiance Fields (NeRF) proposed an interesting way to represent a 3D scene using an implicit network for high-fidelity volumetric rendering. Compared with traditional methods to generate...
Setting Class Attributes in Python
Setting class attributes in Python can be tedious. In this post, I want to summarize a trick that I’ve been using to simplify this process. Class Attributes in __init__ In many cases, we have to save...
Misconceptions about Memory and Good Documentation
Documentation is probably one of the most important tasks that no one has time for. I also overlook its importance as I get swept up in a series of projects and requests. Recently, however, I learn mo...
High-dimensional Convolutional Networks for Geometric Pattern Recognition
Many problems in science and engineering can be formulated in terms of geometric patterns in high-dimensional spaces. We present high-dimensional convolutional networks (ConvNets) for pattern recog...
PyTorch Extension with a Makefile
PyTorch is a great neural network library that has both flexibility and power. Personally, I think it is the best neural network library for prototyping (advanced) dynamic neural networks fast and ...
Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings
Abstract We present a method for generating colored 3D shapes from natural language. To this end, we first learn joint embeddings of freeform text descriptions and colored 3D shapes. Our model com...
Short Note on Matrix Differentials and Backpropagation
Mathematical notation is the convention that we all use to denote a concept in a concise mathematical formulation, yet sometimes there is more than one way to express the same equation. For example...
Regression vs. Classification: Distance and Divergence
In machine learning, supervised problems can be categorized into regression or classification problems. The categorization is quite intuitive, as the names indicate. For instance, if the output, or t...
Data Processing Inequality and Unsurprising Implications
We have heard enough about the great success of neural networks and how they are used in real problems. Today, I want to talk about how they became so successful (partially) from an information theoreti...
Learning Gaussian Process Covariances
A Gaussian process is a non-parametric model which can represent a complex function using a growing set of data. Unlike a neural network, which can also learn complex functions, a Gaussian proces...
DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image
3D reconstruction from a single image is a key problem in multiple applications ranging from robotic manipulation to augmented reality. Prior methods have tackled this problem through generative mo...
Weakly Supervised 3D Reconstruction with Manifold Constraint
Volumetric 3D reconstruction has witnessed significant progress in performance through the use of deep neural network based methods that address some of the limitations of traditional reconstruct...
Expectation Maximization and Variational Inference (Part 2)
In the previous post, we covered variational inference and how to derive update equations. In this post, we will go over a simple Gaussian Mixture Model with the Dirichlet prior distribution over t...
Scene Graph Generation by Iterative Message Passing
Understanding a visual scene goes beyond recognizing individual objects in isolation. Relationships between objects also constitute rich semantic information about the scene. In this work, we expli...
DESIRE: Deep Stochastic IOC RNN Encoder-decoder for Distant Future Prediction in Dynamic Scenes with Multiple Interacting Agents
We introduce a Deep Stochastic IOC RNN Encoder-decoder framework, DESIRE, with a conditional Variational Auto-Encoder and multiple RNNs for the task of future predictions of multiple interacting ...
SegCloud: Semantic Segmentation of 3D Point Clouds
Abstract 3D semantic scene labeling is fundamental to agents operating in the real world. In particular, labeling raw 3D point sets from sensors provides fine-grained semantics. Recent works lever...
Expectation Maximization and Variational Inference (Part 1)
Statistical inference involves finding the right model and parameters that represent the distribution of observations well. Let $\mathbf{x}$ be the observations and $\theta$ be the unknown paramete...
Dirichlet Process Mixtures and Inference (Part 1)
Statistical inference often requires modeling the distribution of data. There are two branches of statistical modeling: parametric and non-parametric methods. The former specifies the data dist...
Universal Correspondence Network
We present a deep learning framework for accurate visual correspondences and demonstrate its effectiveness for both geometric and semantic matching, spanning across rigid motions to intra-class sha...
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
Inspired by the recent success of methods that employ shape priors to achieve robust 3D reconstructions, we propose a novel recurrent neural network architecture that we call the 3D Recurrent Recon...
Caffe Python Layer
A Python layer in Caffe can speed up the development process (Issue1703). Compile with the WITH_PYTHON_LAYER option: First, you have to build Caffe with the WITH_PYTHON_LAYER option. 1. Run make clean to delete all the ...
Gentle Introduction to Gaussian Process Regression
Parametric regression uses a predefined functional form to fit the data best (i.e., we make an assumption about the distribution of data by implicitly modeling them as linear, quadratic, etc.). Howev...
Reading protobuf DB in Python
Caffe uses Google Protocol Buffers and LMDB or LevelDB to save data in a single unified database file. This allows faster data loading. Saving Database in LMDB: I will not cover this step. If you a...
Barycentric Coordinate for Surface Sampling
To convert a mesh into a point cloud, one has to sample points that can uniformly cover the surface. To do so, one must choose the number of samples proportional to the area of a face (polygon). F...
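As a rough illustration of the idea in this excerpt, the sketch below picks triangles with probability proportional to their area and then draws a point inside each chosen triangle using barycentric weights. It assumes a triangle mesh given as plain lists. The function names `triangle_area` and `sample_surface` are hypothetical, and the square-root trick for uniform barycentric weights is one common choice, not necessarily the one used in the post.

```python
import math
import random


def triangle_area(a, b, c):
    """Area of a 3D triangle: half the norm of the cross product of two edges."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def sample_surface(vertices, faces, num_samples, rng=None):
    """Sample points on a triangle mesh (hypothetical helper, not from the post).

    vertices: list of (x, y, z) tuples; faces: list of (i, j, k) vertex indices.
    """
    rng = rng or random.Random()
    # Faces are drawn with probability proportional to their area.
    areas = [triangle_area(*(vertices[i] for i in face)) for face in faces]
    chosen = rng.choices(faces, weights=areas, k=num_samples)

    points = []
    for i, j, k in chosen:
        r1, r2 = rng.random(), rng.random()
        s = math.sqrt(r1)
        # Barycentric weights; they are non-negative and sum to 1.
        w = (1.0 - s, s * (1.0 - r2), s * r2)
        a, b, c = vertices[i], vertices[j], vertices[k]
        points.append(tuple(w[0] * a[d] + w[1] * b[d] + w[2] * c[d] for d in range(3)))
    return points


# Usage: a unit square in the z = 0 plane, split into two triangles.
square_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
square_faces = [(0, 1, 2), (0, 2, 3)]
cloud = sample_surface(square_vertices, square_faces, 200, rng=random.Random(0))
```

Weighting the face choice by area is what keeps the sampling density even across large and small faces. Giving every face the same number of samples would over-sample small triangles.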
Making a Caffe Layer
Caffe is one of the most popular open-source neural network frameworks. It is modular, clean, and fast. Extending it is tricky but not as difficult as extending other frameworks. Files to modify o...
Computing Neural Network Gradients
Computing the neural network gradient requires very simple calculus, yet can be tedious. Affine Transformation (Fully Connected Layer) Gradients For a simple fully connected layer with batch size...
Interesting Properties of Matrix Norms and Singular Values
Matrix norms and singular values have special relationships. In this post, I’ll summarize a few interesting properties of matrix norms and singu...