papers_we_read

Center-based 3D Object Detection and Tracking

Tianwei Yin, Xingyi Zhou, Philipp Krahenbuhl (UT Austin), CVPR 2021

Summary

This paper argues that representing 3D objects by their centers, rather than by 3D bounding boxes, gives better detection and tracking, and that the resulting center-based pipeline is also simple. The method outperforms prior work on the Waymo Open Dataset and ranks first among all Lidar-only submissions.

Abstract

3D objects are commonly represented as 3D boxes in a point cloud, but this representation comes with several challenges: point clouds are sparse, and most regions of 3D space are without measurements; the resulting three-dimensional output box is often not well aligned with any global coordinate frame; and objects come in a wide range of sizes, shapes, and aspect ratios. In this paper, we instead represent, detect, and track 3D objects as points. This has two main advantages: unlike bounding boxes, points have no intrinsic orientation, and a center-based representation simplifies downstream tasks such as tracking. Our framework, CenterPoint, first detects the centers of objects using a keypoint detector and then regresses the other attributes, including 3D size, 3D orientation, and velocity. CenterPoint achieves state-of-the-art performance on the nuScenes benchmark for both 3D detection and tracking, with 65.5 NDS and 63.8 AMOTA for a single model.

Methodology

  1. First, a 3D point cloud is obtained from the Lidar sensors. CenterPoint uses a standard Lidar-based backbone network, i.e., VoxelNet or PointPillars, to build a flattened map-view representation of the input point cloud, i.e., the 3D data is projected onto a 2D plane. This map view is then treated like a regular 2D image.
  2. Next, a keypoint detector (here, CenterNet) takes this image-like input and predicts a w × h heatmap Ŷ ∈ [0, 1]^(w×h×K), with one channel for each of the K classes. Each local maximum in the output heatmap corresponds to the center of a detected object. The object size, rotation, and velocity are then regressed from the features at that center.
  3. Finally, 3D object tracking simplifies to greedy closest-point matching. The resulting detection and tracking algorithm is simple, efficient, and effective.
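Step 2 relies on finding local maxima in a class heatmap. The snippet below is a minimal sketch of that idea in plain Python, not the authors' implementation: the function name `extract_peaks`, the 3x3 neighbourhood, and the `threshold` value are assumptions made for illustration only.

```python
from typing import List, Tuple

# One class channel of the heatmap, indexed as heatmap[row][col].
Heatmap = List[List[float]]


def extract_peaks(heatmap: Heatmap, threshold: float = 0.3) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for every cell that is a local maximum of its
    3x3 neighbourhood and scores at least `threshold` (hypothetical value)."""
    h = len(heatmap)
    w = len(heatmap[0]) if h else 0
    peaks = []
    for r in range(h):
        for c in range(w):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Collect the (up to 8) neighbours that lie inside the map.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c)
            ]
            # Ties are kept, so a flat plateau yields several peaks.
            if all(score >= n for n in neighbours):
                peaks.append((r, c, score))
    # Highest-confidence centers first.
    peaks.sort(key=lambda p: p[2], reverse=True)
    return peaks


demo = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.4, 0.5],
]
print(extract_peaks(demo))  # two centers: (1, 1) and (2, 3)
```

In a full detector, the size, rotation, and velocity would then be read from the regression maps at each returned (row, col) location.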

Main Results

Table 1: State-of-the-art comparisons for 3D detection on nuScenes test set

| Method | mAP ↑ | NDS ↑ | PKL ↓ |
| --- | --- | --- | --- |
| PointPillars | 40.1 | 55.0 | 1.00 |
| CVCNet | 55.3 | 64.4 | 0.92 |
| CBGS | 52.8 | 63.3 | 0.77 |
| PointPainting | 46.4 | 58.1 | 0.89 |
| Ours | 58.0 | 65.5 | 0.69 |

Table 2: State-of-the-art comparisons for 3D tracking on Waymo test set

| Method | MOTA ↑ (Vehicle) | MOTA ↑ (Ped) | MOTP ↓ (Vehicle) | MOTP ↓ (Ped) |
| --- | --- | --- | --- | --- |
| AB3D | 42.5 | 38.9 | 18.6 | 34.0 |
| Ours | 62.6 | 58.3 | 16.3 | 31.1 |

Table 3: State-of-the-art comparisons for 3D tracking on nuScenes test set

| Method | AMOTA ↑ | FP ↓ | FN ↓ | IDS ↓ |
| --- | --- | --- | --- | --- |
| AB3D | 15.1 | 15088 | 75730 | 9027 |
| Chiu et al. | 55.0 | 17533 | 33216 | 950 |
| Ours | 63.8 | 18612 | 22928 | 760 |

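Table 4: Comparison between anchor-based and center-based methods for 3D detection on Waymo

| Encoder | Method | Vehicle | Pedestrian | mAPH |
| --- | --- | --- | --- | --- |
| VoxelNet | Anchor-based | 66.1 | 54.4 | 60.3 |
| VoxelNet | Center-based | 66.5 | 62.7 | 64.6 |
| PointPillars | Anchor-based | 64.1 | 50.8 | 57.5 |
| PointPillars | Center-based | 66.5 | 57.4 | 62.0 |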

Our two cents

Simple: It uses a standard 3D point cloud encoder with a few convolutional layers in the head to produce a bird's-eye-view heatmap and other dense regression outputs, including the offset to the object's center in the previous frame. Detection is a simple local peak extraction with refinement, and tracking is closest-distance matching.
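The closest-distance matching mentioned above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the released tracker: the function name `greedy_match`, the 2D centers, the frame gap `dt`, and the distance gate `max_dist` are all hypothetical, and the handling of new or lost tracks is left out.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def greedy_match(
    tracks: Dict[int, Point],
    detections: List[Tuple[Point, Point, float]],
    dt: float,
    max_dist: float,
) -> Dict[int, int]:
    """Map detection index -> track id by greedy closest-point matching.

    tracks:      track id -> last known center (x, y)
    detections:  (center, velocity, score) for the current frame
    dt:          time between frames (assumed known)
    max_dist:    largest distance accepted as a match (assumed gate)
    """
    assignment: Dict[int, int] = {}
    free = set(tracks)
    # Visit detections from most to least confident.
    order = sorted(range(len(detections)), key=lambda i: detections[i][2], reverse=True)
    for i in order:
        (x, y), (vx, vy), _ = detections[i]
        # Project the center back to where the object was one frame earlier.
        px, py = x - vx * dt, y - vy * dt
        best_id, best_d = None, max_dist
        for tid in sorted(free):  # sorted for deterministic tie-breaking
            tx, ty = tracks[tid]
            d = math.hypot(px - tx, py - ty)
            if d < best_d:
                best_id, best_d = tid, d
        if best_id is not None:
            assignment[i] = best_id
            free.discard(best_id)  # each track is matched at most once
    return assignment


tracks = {1: (0.0, 0.0), 2: (10.0, 0.0)}
detections = [
    ((1.0, 0.0), (2.0, 0.0), 0.9),    # moved 1 m at 2 m/s over 0.5 s
    ((10.5, 0.0), (0.0, 0.0), 0.8),   # nearly static object
    ((50.0, 50.0), (0.0, 0.0), 0.7),  # far from every track
]
print(greedy_match(tracks, detections, dt=0.5, max_dist=2.0))
```

Detections left out of the returned mapping would start new tracks in a complete tracker.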

Fast and Accurate: The best single model achieves 71.9 mAPH on Waymo and 65.5 NDS on nuScenes while running at 11+ FPS.

Resources

Paper: https://paperswithcode.com/paper/center-based-3d-object-detection-and-tracking

Code and pretrained models: https://github.com/tianweiy/CenterPoint