Tianwei Yin, Xingyi Zhou, Philipp Krahenbuhl (UT Austin), CVPR 2021

This paper argues that detecting and tracking 3D objects by their centers works better than the conventional 3D bounding-box representation, and that the resulting center-based pipeline is simple. On the Waymo Open Dataset, the method outperforms previous single-model methods and ranks first among all LiDAR-only submissions.

3D objects are commonly represented as 3D bounding boxes in a point cloud, but this representation comes with several challenges: point clouds are sparse, so most regions of 3D space contain no measurements; the output box is often not well aligned with any global coordinate frame; and objects vary widely in size, shape, and aspect ratio. In this paper, we instead represent, detect, and track 3D objects as points. This has several advantages. Unlike bounding boxes, points have no intrinsic orientation, which shrinks the detector's search space, and a center-based representation simplifies downstream tasks such as tracking.

Our framework, CenterPoint, first detects object centers with a keypoint detector and then regresses the other attributes, including 3D size, 3D orientation, and velocity. CenterPoint achieved state-of-the-art performance on the nuScenes benchmark for both 3D detection and tracking, with 65.5 NDS and 63.8 AMOTA for a single model.
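
The keypoint-then-regress idea above can be illustrated with a small pure-Python decoder. This is a minimal sketch under assumptions of our own, not the authors' implementation: a single-class bird's-eye-view (BEV) heatmap, per-cell regression maps for a sub-cell center offset, 3D size, yaw encoded as (sin, cos), and 2D velocity, and a grid whose origin sits at (0, 0). A cell counts as a center when it is a 3x3 local maximum above a score threshold, and the other attributes are read at that cell. `decode_centers`, `voxel_size`, `score_thresh`, and the `regs` key names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Grid = List[List[float]]


@dataclass
class Detection:
    x: float                           # BEV center x (grid origin assumed at 0)
    y: float                           # BEV center y
    score: float                       # heatmap value at the peak
    size: Tuple[float, float, float]   # regressed (length, width, height)
    yaw: float                         # radians, recovered from (sin, cos)
    velocity: Tuple[float, float]      # regressed (vx, vy)


def is_local_peak(heat: Grid, r: int, c: int) -> bool:
    """True if heat[r][c] is not smaller than any neighbour in its 3x3 window."""
    v = heat[r][c]
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < len(heat) and 0 <= cc < len(heat[0]) and heat[rr][cc] > v:
                return False
    return True


def decode_centers(heat: Grid, regs: Dict[str, list],
                   voxel_size: float = 0.5, score_thresh: float = 0.3) -> List[Detection]:
    """Hypothetical decoder: each heatmap peak above `score_thresh` becomes an
    object center, and the other attributes are read from the regression maps
    at that same cell."""
    dets = []
    for r, row in enumerate(heat):
        for c, score in enumerate(row):
            if score < score_thresh or not is_local_peak(heat, r, c):
                continue
            dx, dy = regs["offset"][r][c]     # sub-cell position of the center
            sin_a, cos_a = regs["rot"][r][c]  # orientation regressed as (sin, cos)
            dets.append(Detection(
                x=(c + dx) * voxel_size,
                y=(r + dy) * voxel_size,
                score=score,
                size=tuple(regs["size"][r][c]),
                yaw=math.atan2(sin_a, cos_a),
                velocity=tuple(regs["vel"][r][c]),
            ))
    return sorted(dets, key=lambda d: d.score, reverse=True)


# Toy 5x5 BEV grid with one object; the 0.5 cell next to the peak is suppressed.
heat = [[0.0] * 5 for _ in range(5)]
heat[2][3], heat[2][2] = 0.9, 0.5


def constant_map(value):
    return [[value] * 5 for _ in range(5)]


regs = {"offset": constant_map((0.5, 0.5)), "rot": constant_map((0.0, 1.0)),
        "size": constant_map((4.5, 1.9, 1.6)), "vel": constant_map((2.0, 0.0))}
dets = decode_centers(heat, regs)  # one Detection at x=1.75, y=1.25, yaw=0.0
```

Orientation enters only as a value read at the peak; the search over the heatmap itself is orientation-free, which is the sense in which points have no intrinsic orientation.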

3D detection on the nuScenes test set (PKL: planning KL-divergence, lower is better):

Method | mAP ↑ | NDS ↑ | PKL ↓ |
---|---|---|---|
PointPillars | 40.1 | 55.0 | 1.00 |
CVCNet | 55.3 | 64.4 | 0.92 |
CBGS | 52.8 | 63.3 | 0.77 |
PointPainting | 46.4 | 58.1 | 0.89 |
Ours | 58.0 | 65.5 | 0.69 |

3D tracking on the Waymo Open Dataset test set:

Method | MOTA ↑ (Vehicle) | MOTA ↑ (Pedestrian) | MOTP ↓ (Vehicle) | MOTP ↓ (Pedestrian) |
---|---|---|---|---|
AB3D | 42.5 | 38.9 | 18.6 | 34.0 |
Ours | 62.6 | 58.3 | 16.3 | 31.1 |
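
3D tracking on the nuScenes test set (FP: false positives, FN: false negatives, IDS: identity switches):

Method | AMOTA ↑ | FP ↓ | FN ↓ | IDS ↓ |
---|---|---|---|---|
AB3D | 15.1 | 15088 | 75730 | 9027 |
Chiu et al. | 55.0 | 17533 | 33216 | 950 |
Ours | 63.8 | 18612 | 22928 | 760 |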
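
Anchor-based vs. center-based detection on the Waymo Open Dataset with the same encoders (mAPH; the last column is the average over the two classes):

Encoder | Method | Vehicle | Pedestrian | mAPH |
---|---|---|---|---|
VoxelNet | Anchor-based | 66.1 | 54.4 | 60.3 |
VoxelNet | Center-based | 66.5 | 62.7 | 64.6 |
PointPillars | Anchor-based | 64.1 | 50.8 | 57.5 |
PointPillars | Center-based | 66.5 | 57.4 | 62.0 |
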
Simple: It uses a standard 3D point-cloud encoder with a few convolutional layers in the head to produce a bird's-eye-view heatmap and other dense regression outputs, including the offset to each object's center in the previous frame. Detection is simple local peak extraction with refinement, and tracking is closest-distance matching.

Fast and Accurate: Our best single model achieves 71.9 mAPH on Waymo and 65.5 NDS on nuScenes while running at more than 11 FPS.
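
The tracking step described above can be sketched as greedy closest-distance matching. In this minimal Python sketch, which is an illustration and not the CenterPoint code, each detection carries a predicted offset defined (by assumption) as its displacement since the previous frame, so subtracting it gives the estimated previous position. Detections are processed in descending score order, and each claims the nearest unclaimed track within a hypothetical `max_dist` threshold. The names `Det`, `GreedyTracker`, and `max_dist`, and the rule that unmatched tracks are dropped immediately, are our assumptions; the refinement stage is not shown.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Det:
    center: Point   # BEV center (x, y) in the current frame
    offset: Point   # assumed: displacement since the previous frame
    score: float


class GreedyTracker:
    """Closest-distance tracking sketch: move each detection back to the
    previous frame with its predicted offset, then greedily match it to the
    nearest unclaimed track within `max_dist` (an assumed threshold)."""

    def __init__(self, max_dist: float = 2.0):
        self.max_dist = max_dist
        self.tracks: Dict[int, Point] = {}  # track id -> last center
        self.next_id = 0

    def step(self, dets: List[Det]) -> List[int]:
        ids = [-1] * len(dets)
        unclaimed = dict(self.tracks)
        # Higher-scoring detections choose their match first.
        for i in sorted(range(len(dets)), key=lambda k: dets[k].score, reverse=True):
            d = dets[i]
            prev = (d.center[0] - d.offset[0], d.center[1] - d.offset[1])
            best_id: Optional[int] = None
            best_dist = self.max_dist
            for tid, pos in unclaimed.items():
                dist = math.hypot(prev[0] - pos[0], prev[1] - pos[1])
                if dist <= best_dist:
                    best_id, best_dist = tid, dist
            if best_id is None:  # nothing close enough: start a new track
                best_id = self.next_id
                self.next_id += 1
            else:
                del unclaimed[best_id]
            ids[i] = best_id
        # Simplification: tracks with no match in this frame are dropped.
        self.tracks = {ids[i]: d.center for i, d in enumerate(dets)}
        return ids


tracker = GreedyTracker(max_dist=2.0)
ids1 = tracker.step([Det((0.0, 0.0), (0.0, 0.0), 0.9),
                     Det((10.0, 0.0), (0.0, 0.0), 0.8)])   # [0, 1]
ids2 = tracker.step([Det((11.0, 0.0), (1.0, 0.0), 0.85),
                     Det((1.0, 0.5), (1.0, 0.5), 0.9),
                     Det((30.0, 0.0), (0.0, 0.0), 0.7)])   # [1, 0, 2]
```

Processing detections by score lets the most confident ones claim tracks first. A practical tracker would usually also keep unmatched tracks alive for a few frames, which this sketch omits for brevity.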

Paper: https://paperswithcode.com/paper/center-based-3d-object-detection-and-tracking

Code and pretrained models: https://github.com/tianweiy/CenterPoint