Get Started with PointPillars

PointPillars is a method for 3-D object detection using 2-D convolutional layers. The PointPillars network has a learnable encoder that uses PointNets to learn a representation of point clouds organized in pillars (vertical columns). The network then runs a 2-D convolutional neural network (CNN) to produce network predictions, decodes the predictions, and generates 3-D bounding boxes for different object classes such as cars, trucks, and pedestrians.

Figure: PointPillars object detection

The PointPillars network has these main stages.

  1. Use a feature encoder to convert a point cloud to a sparse pseudoimage.

  2. Process the pseudoimage into a high-level representation using a 2-D convolution backbone.

  3. Detect and regress 3-D bounding boxes using detection heads.

PointPillars Network

A PointPillars network requires two inputs: pillar indices as a P-by-2 matrix and pillar features as a P-by-N-by-K array. P is the number of pillars in the network, N is the number of points per pillar, and K is the feature dimension.
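The two inputs described above can be illustrated with a minimal Python sketch. It is not the toolbox implementation: the function name `pillarize` and all of its parameters are hypothetical, the points are assumed to be (x, y, z, intensity) tuples, and crowded pillars are simply truncated. In this sketch P is the number of non-empty pillars found, capped at `max_pillars`.

```python
from collections import defaultdict

def pillarize(points, x_min, y_min, pillar_size, max_pillars, max_points):
    """Group points into vertical pillars on an x-y grid (illustrative only).

    Returns (indices, features):
      indices  - P [row, col] pairs                   (P-by-2)
      features - P pillars of N points with K values  (P-by-N-by-K), zero padded
    """
    groups = defaultdict(list)
    for p in points:
        if p[0] < x_min or p[1] < y_min:
            continue  # assumption: points below the grid origin are discarded
        col = int((p[0] - x_min) // pillar_size)
        row = int((p[1] - y_min) // pillar_size)
        groups[(row, col)].append(list(p))

    k = len(points[0]) if points else 0
    indices, features = [], []
    for (row, col), pts in sorted(groups.items())[:max_pillars]:
        pts = pts[:max_points]  # truncate pillars that hold too many points
        pts += [[0.0] * k for _ in range(max_points - len(pts))]  # pad the rest
        indices.append([row, col])
        features.append(pts)
    return indices, features
```

For example, three points on a 1-meter grid with `max_points=2` fall into two pillars, so `indices` has two rows and `features` has shape 2-by-2-by-4.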

The network begins with a feature encoder, which is a simplified PointNet. It contains a series of convolution, batch normalization, and ReLU layers followed by a max pooling layer. A scatter layer at the end maps the extracted features into a 2-D space using the pillar indices.
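The last two steps of the encoder, max pooling and scattering, can be sketched in plain Python. This is an illustration under assumptions, not the actual layer code: the learned convolution, batch normalization, and ReLU layers are omitted, and the function names `max_pool_pillars` and `scatter_to_pseudoimage` are hypothetical.

```python
def max_pool_pillars(features):
    """Reduce P-by-N-by-C point features to P-by-C: max over the N points."""
    return [[max(point[c] for point in pillar) for c in range(len(pillar[0]))]
            for pillar in features]

def scatter_to_pseudoimage(pillar_vectors, indices, height, width):
    """Write each pillar's C-vector to its (row, col) cell of a 2-D grid.

    Cells without a pillar stay zero, which is why the pseudoimage is sparse.
    Returns a height-by-width-by-C nested list.
    """
    channels = len(pillar_vectors[0]) if pillar_vectors else 0
    image = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    for vec, (row, col) in zip(pillar_vectors, indices):
        image[row][col] = list(vec)
    return image
```

Pooling collapses the per-point dimension so that each pillar is described by a single feature vector, and the scatter step uses the pillar indices to restore the spatial layout that the 2-D backbone expects.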

Next, the network has a 2-D CNN backbone that consists of encoder-decoder blocks. Each encoder block consists of convolution, batch normalization, and ReLU layers that extract features at different spatial resolutions. Each decoder block consists of transposed convolution, batch normalization, and ReLU layers.

The network then concatenates the output features from the end of each decoder block and passes them through six detection heads, built from convolutional and sigmoid layers, to predict occupancy, location, size, angle, heading, and class.
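As a rough illustration of how an occupancy prediction might be turned into candidate detections, the sketch below applies a sigmoid to per-cell scores and keeps the cells above a threshold. The input layout (a 2-D grid of raw scores), the function name `confident_cells`, and the default threshold are assumptions made for this example, not details of the toolbox.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def confident_cells(occupancy_logits, threshold=0.5):
    """Return (row, col, score) for grid cells whose score meets the threshold."""
    hits = []
    for r, row in enumerate(occupancy_logits):
        for c, logit in enumerate(row):
            score = sigmoid(logit)
            if score >= threshold:
                hits.append((r, c, score))
    return hits
```

In a full detector, the location, size, angle, heading, and class outputs at each retained cell would then be decoded into a 3-D bounding box; that step is not shown here.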

Figure: PointPillars network diagram

Create PointPillars Network

You can use the Deep Network Designer (Deep Learning Toolbox) app to interactively create a PointPillars deep learning network. To programmatically create a PointPillars network, use the pointPillarsObjectDetector object.

Transfer Learning

Transfer learning is a common deep learning technique in which you take a pretrained network as a starting point to train a network for a new task.

To perform transfer learning with a pretrained pointPillarsObjectDetector network, specify new object classes and their corresponding anchor boxes. Then, train the network on a new data set.

Anchor boxes capture the scale and aspect ratio of specific object classes you want to detect, and are typically chosen based on object sizes in your training data set. For more information on anchor boxes, see Anchor Boxes for Object Detection.
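One simple way to choose anchor sizes from training data is to average the labeled box dimensions for each class. The sketch below shows that heuristic only; the function name `estimate_anchor_sizes` and the input layout are hypothetical, and any further anchor parameters a particular detector expects are outside its scope.

```python
from collections import defaultdict
from statistics import mean

def estimate_anchor_sizes(labeled_boxes):
    """Return {class_name: (length, width, height)} averaged per class.

    labeled_boxes is an iterable of (class_name, length, width, height) tuples.
    """
    by_class = defaultdict(list)
    for name, length, width, height in labeled_boxes:
        by_class[name].append((length, width, height))
    # zip(*dims) regroups the boxes into all lengths, all widths, all heights
    return {name: tuple(mean(dim) for dim in zip(*dims))
            for name, dims in by_class.items()}
```

For instance, two car boxes of length 4.0 and 5.0 with identical width 2.0 and height 1.5 yield a car anchor size of (4.5, 2.0, 1.5).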

Train PointPillars Object Detector and Perform Object Detection

Use the trainPointPillarsObjectDetector function to train a PointPillars network. To perform object detection with a trained PointPillars network, use the detect function. For more information on how to train a PointPillars network, see Lidar 3-D Object Detection Using PointPillars Deep Learning.

Code Generation

To learn how to generate CUDA® code for a PointPillars network, see Code Generation for Lidar Object Detection Using PointPillars Deep Learning.


References

[1] Lang, Alex H., Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. “PointPillars: Fast Encoders for Object Detection From Point Clouds.” In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12689–97. Long Beach, CA, USA: IEEE, 2019.

[2] Hesai and Scale. PandaSet.
