
Center-based 3D Object Detection and Tracking

Tianwei Yin, Xingyi Zhou, Philipp Krahenbuhl (UT Austin), CVPR 2021

Summary

This paper argues that representing, detecting, and tracking 3D objects by their centers works better than relying on 3D bounding-box representations, and shows that the resulting center-based detection and tracking pipeline is simple. The method outperforms previous single-model methods on the Waymo Open Dataset and ranks first among all Lidar-only submissions.

Abstract

3D objects are commonly represented as 3D boxes in a point cloud, but this representation brings several challenges: point clouds are sparse and most regions of 3D space contain no measurements; the output 3D box is often not well aligned with any global coordinate frame; and objects come in a wide range of sizes, shapes, and aspect ratios. This paper instead represents, detects, and tracks 3D objects as points. This has several advantages: unlike bounding boxes, points have no intrinsic orientation, and a center-based representation simplifies downstream tasks such as tracking. The framework, CenterPoint, first detects the centers of objects using a keypoint detector and then regresses their other attributes, including 3D size, 3D orientation, and velocity. CenterPoint achieved state-of-the-art performance on the nuScenes benchmark for both 3D detection and tracking, with 65.5 NDS and 63.8 AMOTA for a single model.
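To make the orientation problem concrete, this small sketch (not from the paper) computes the smallest axis-aligned box that encloses a rotated rectangle. The 4.5 m x 2.0 m car footprint is a hypothetical example. Rotating the object leaves its center unchanged, while the enclosing axis-aligned box grows and covers more empty space.

```python
import math

def enclosing_axis_aligned_box(length, width, yaw):
    """Size (dx, dy) of the smallest axis-aligned box that contains a
    length x width rectangle rotated by `yaw` radians about its center."""
    c, s = abs(math.cos(yaw)), abs(math.sin(yaw))
    return length * c + width * s, length * s + width * c

# Hypothetical car footprint: 4.5 m long, 2.0 m wide.
for deg in (0, 30, 45, 90):
    dx, dy = enclosing_axis_aligned_box(4.5, 2.0, math.radians(deg))
    fill = (4.5 * 2.0) / (dx * dy)  # fraction of the box actually covered by the car
    print(f"yaw={deg:>2} deg  box={dx:.2f} x {dy:.2f} m  fill={fill:.0%}")
```

At 45 degrees the enclosing box is about 4.60 m x 4.60 m, so the car fills only about 43% of it, whereas a center point needs no orientation at all.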

Methodology

First, 3D point-cloud data is collected from LiDAR sensors. CenterPoint uses a standard Lidar-based backbone network, i.e., VoxelNet or PointPillars, to build a flattened map-view (bird's-eye-view) representation of the input point cloud, i.e., the 3D data is projected onto a 2D plane. This map can then be processed like a regular 2D image.
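The map-view idea can be illustrated with a hand-crafted stand-in for the learned encoder: scatter each point into a 2D grid cell by its (x, y) coordinates and keep simple per-cell statistics. This is not how VoxelNet or PointPillars compute features (both learn them); the grid range, the 0.5 m cell size, the function name `points_to_bev`, and the max-height/point-count features are assumptions chosen for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in meters, sensor frame

def points_to_bev(points: List[Point], x_range=(-50.0, 50.0),
                  y_range=(-50.0, 50.0), cell=0.5):
    """Scatter points into a 2D bird's-eye-view grid (hypothetical helper).

    Returns two H x W grids indexed [row][col] = [y][x]: the maximum
    height per cell and the number of points per cell. Cells that receive
    no points keep height 0.0 and count 0, which is why such a map is
    mostly empty for a sparse point cloud."""
    w = int(round((x_range[1] - x_range[0]) / cell))
    h = int(round((y_range[1] - y_range[0]) / cell))
    height = [[0.0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]
    for x, y, z in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # drop points outside the grid
        col = min(int((x - x_range[0]) / cell), w - 1)
        row = min(int((y - y_range[0]) / cell), h - 1)
        if count[row][col] == 0 or z > height[row][col]:
            height[row][col] = z  # first point, or a taller one
        count[row][col] += 1
    return height, count

height, count = points_to_bev([(1.2, -3.4, 0.5), (1.3, -3.3, 1.7), (20.0, 5.0, -1.0)])
```

The two nearby points land in the same cell, so that cell stores a count of 2 and the taller height of 1.7 m; the resulting 2D grids can then be fed to ordinary 2D convolutions.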
Next, a keypoint detector, in this case a CenterNet-style head, takes this map-view "image" as input and predicts a $w \times h$ heatmap for each of $K$ classes, $\hat{Y} \in [0, 1]^{w \times h \times K}$. Each local maximum in the output heatmap corresponds to the center of a detected object, and the features at each center are used to regress the object's size, rotation, and velocity.
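A minimal sketch of this inference step in pure Python, for a single class channel: a cell counts as an object center if its score exceeds a threshold and it is the maximum of its 3x3 neighborhood (a max-pooling style peak extraction), and the regression maps are then read at that cell. The head names (`offset`, `height`, `log_size`, `rot`, `vel`), the log-scale size, the (sin, cos) yaw encoding, the 0.3 threshold, and the grid-to-meter conversion are all assumptions for illustration, not CenterPoint's actual code.

```python
import math
from typing import Dict, List

Grid = List[List[float]]

def find_peaks(heatmap: Grid, threshold: float = 0.3):
    """Return (row, col, score) for every cell that exceeds `threshold` and
    is the maximum of its 3x3 neighborhood."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = heatmap[r][c]
            if v < threshold:
                continue
            neighbors = [heatmap[rr][cc]
                         for rr in range(max(r - 1, 0), min(r + 2, h))
                         for cc in range(max(c - 1, 0), min(c + 2, w))]
            if v >= max(neighbors):
                peaks.append((r, c, v))
    return peaks

def decode_box(heads: Dict[str, List[Grid]], r: int, c: int,
               x_min: float, y_min: float, cell: float):
    """Read the (hypothetical) regression maps at peak (r, c) and convert
    them into a metric box. Assumes cell index i sits at x_min + i * cell."""
    off_x, off_y = heads["offset"][0][r][c], heads["offset"][1][r][c]
    return {
        "x": x_min + (c + off_x) * cell,          # sub-cell refined center
        "y": y_min + (r + off_y) * cell,
        "z": heads["height"][0][r][c],            # height above ground
        "size": tuple(math.exp(heads["log_size"][i][r][c]) for i in range(3)),
        # rot[0] holds sin(yaw), rot[1] holds cos(yaw)
        "yaw": math.atan2(heads["rot"][0][r][c], heads["rot"][1][r][c]),
        "velocity": (heads["vel"][0][r][c], heads["vel"][1][r][c]),
    }
```

Because every attribute is read at a single point, there is no anchor matching or orientation enumeration: one peak yields one box.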
With center-based detections, 3D object tracking simplifies to greedy closest-point matching. The resulting detection and tracking algorithm is simple, efficient, and effective.
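The greedy closest-point matching can be sketched as follows, under simplifying assumptions: each detection carries a predicted velocity and is moved back to the previous frame by -velocity x dt, detections are processed in descending score order, and a single distance threshold `max_dist` is used (real thresholds may differ per class). The paper keeps unmatched tracks alive for a few frames; here they are simply dropped. All class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import count
from typing import Iterator, List, Tuple

@dataclass
class Detection:
    x: float
    y: float
    vx: float     # predicted velocity (m/s)
    vy: float
    score: float

@dataclass
class Track:
    track_id: int
    x: float
    y: float

def greedy_match(dets: List[Detection], tracks: List[Track], dt: float,
                 max_dist: float) -> List[Tuple[int, int]]:
    """Return (detection index, track index) pairs from greedy matching."""
    matches, used = [], set()
    for di in sorted(range(len(dets)), key=lambda i: -dets[i].score):
        d = dets[di]
        # Project the center back to the previous frame with its velocity.
        px, py = d.x - d.vx * dt, d.y - d.vy * dt
        best, best_dist = None, max_dist
        for ti, t in enumerate(tracks):
            if ti in used:
                continue
            dist = math.hypot(px - t.x, py - t.y)
            if dist <= best_dist:
                best, best_dist = ti, dist
        if best is not None:
            used.add(best)
            matches.append((di, best))
    return matches

def assign_ids(dets: List[Detection], tracks: List[Track], dt: float,
               max_dist: float, next_id: Iterator[int]) -> List[Track]:
    """Matched detections inherit a track id; unmatched ones start new tracks."""
    matched = dict(greedy_match(dets, tracks, dt, max_dist))
    return [Track(tracks[matched[i]].track_id if i in matched else next(next_id), d.x, d.y)
            for i, d in enumerate(dets)]

tracks = [Track(0, 0.0, 0.0), Track(1, 3.0, 0.0)]
dets = [Detection(2.5, 0.0, 25.0, 0.0, 0.9),   # fast object that came from (0, 0)
        Detection(3.1, 0.0, 0.0, 0.0, 0.8),
        Detection(30.0, 0.0, 0.0, 0.0, 0.5)]   # new object
new_tracks = assign_ids(dets, tracks, dt=0.1, max_dist=1.0, next_id=count(2))
```

The velocity compensation matters here: without it, the fast detection at x = 2.5 m would be closest to track 1 instead of its true predecessor, track 0.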

Main Results

Table 1: State-of-the-art comparisons for 3D detection on nuScenes test set

Method	mAP ↑	NDS ↑	PKL ↓
PointPillars	40.1	55.0	1.00
CVCNet	55.3	64.4	0.92
CBGS	52.8	63.3	0.77
PointPainting	46.4	58.1	0.89
Ours	58.0	65.5	0.69

Table 2: State-of-the-art comparisons for 3D tracking on Waymo test set

Method	MOTA Vehicle ↑	MOTA Ped ↑	MOTP Vehicle ↓	MOTP Ped ↓
AB3D	42.5	38.9	18.6	34.0
Ours	62.6	58.3	16.3	31.1

Table 3: State-of-the-art comparisons for 3D tracking on nuScenes test set

Method	AMOTA ↑	FP ↓	FN ↓	IDS ↓
AB3D	15.1	15088	75730	9027
Chiu et al.	55.0	17533	33216	950
Ours	63.8	18612	22928	760

Table 4: Anchor-based vs. center-based 3D detection on Waymo (per-class and mean mAPH)

Encoder	Method	Vehicle ↑	Pedestrian ↑	mAPH ↑
VoxelNet	Anchor-based	66.1	54.4	60.3
VoxelNet	Center-based	66.5	62.7	64.6
PointPillars	Anchor-based	64.1	50.8	57.5
PointPillars	Center-based	66.5	57.4	62.0
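The gain from switching to a center-based head can be read directly off the anchor-based vs. center-based comparison in Table 4; this snippet simply does the subtraction for each encoder, using the numbers copied from the table.

```python
# (vehicle, pedestrian, mean) mAPH values copied from Table 4.
results = {
    "VoxelNet": {"anchor": (66.1, 54.4, 60.3), "center": (66.5, 62.7, 64.6)},
    "PointPillars": {"anchor": (64.1, 50.8, 57.5), "center": (66.5, 57.4, 62.0)},
}

def center_gain(encoder):
    """Center-based minus anchor-based score per column, rounded to 0.1."""
    a, c = results[encoder]["anchor"], results[encoder]["center"]
    return tuple(round(cv - av, 1) for av, cv in zip(a, c))

for enc in results:
    veh, ped, mean = center_gain(enc)
    print(f"{enc}: vehicle +{veh}, pedestrian +{ped}, mAPH +{mean}")
```

For both encoders the largest gain is in the pedestrian column (+8.3 and +6.6), and mean mAPH improves by 4.3 and 4.5 points.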

Our two cents

Simple: It uses a standard 3D point-cloud encoder with a few convolutional layers in the head to produce a bird's-eye-view heatmap and other dense regression outputs, including the offset to object centers in the previous frame. Detection is simple local peak extraction followed by refinement, and tracking is closest-distance matching.
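The refinement step is not detailed in this summary. As we read the paper's second stage, it samples backbone features at the predicted box center and at the centers of the four outward-facing box faces in bird's-eye view (the top and bottom face centers project onto the box center), using bilinear interpolation, and passes them to an MLP. The sketch below computes those five points and samples them from a feature map; the function names, the grid convention (cell index i at x_min + i * cell), and the omission of the MLP are assumptions.

```python
import math
from typing import List, Tuple

def refinement_points(x, y, length, width, yaw) -> List[Tuple[float, float]]:
    """Box center plus the centers of its four side faces, in BEV meters."""
    c, s = math.cos(yaw), math.sin(yaw)
    hl, hw = length / 2, width / 2
    pts = [(x, y)]
    for dx, dy in ((hl, 0.0), (-hl, 0.0), (0.0, hw), (0.0, -hw)):
        # Rotate the box-frame offset by yaw, then translate to the center.
        pts.append((x + dx * c - dy * s, y + dx * s + dy * c))
    return pts

def bilinear(grid, u, v):
    """Bilinearly interpolate grid[row][col] at continuous (col=u, row=v),
    clamping the coordinates to the grid."""
    h, w = len(grid), len(grid[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    u0, v0 = int(u), int(v)
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    fu, fv = u - u0, v - v0
    top = grid[v0][u0] * (1 - fu) + grid[v0][u1] * fu
    bot = grid[v1][u0] * (1 - fu) + grid[v1][u1] * fu
    return top * (1 - fv) + bot * fv

def sample_features(channels, pts, x_min, y_min, cell):
    """Concatenate every channel's value at every point into one vector,
    which a small MLP (not shown) would map to a score and box refinement."""
    feats = []
    for px, py in pts:
        u, v = (px - x_min) / cell, (py - y_min) / cell
        feats.extend(bilinear(ch, u, v) for ch in channels)
    return feats
```

With C feature channels the refinement input has 5 x C values per box, regardless of the box's orientation.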

Fast and accurate: CenterPoint's best single model achieves 71.9 mAPH on Waymo and 65.5 NDS on nuScenes while running at 11+ FPS.

Resources

Paper: https://paperswithcode.com/paper/center-based-3d-object-detection-and-tracking

Code and pretrained models: https://github.com/tianweiy/CenterPoint
