6D-VNet: End-to-end 6-DoF Vehicle Pose Estimation from Monocular RGB Images
We present a conceptually simple framework for 6-DoF object pose estimation, particularly in autonomous driving scenarios.
Our approach efficiently detects traffic participants in a monocular RGB image while simultaneously regressing their 3D translation and rotation vectors.
The method, called 6D-VNet, extends Mask R-CNN by adding customised heads that predict a vehicle's fine-grained class, rotation and translation.
Unlike previous methods, the proposed 6D-VNet is trained end-to-end.
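To illustrate how extra per-RoI heads can sit alongside Mask R-CNN's existing box and mask heads, here is a minimal pure-Python sketch. It assumes each RoI has already been pooled into a flat feature vector, that rotation is regressed as a quaternion, and that translation is a 3D vector (x, y, z). The names `PoseHeads`, `linear` and `init_layer` are hypothetical, and single linear layers stand in for whatever head architecture the paper actually uses.

```python
import math
import random

def linear(x, weights, bias):
    """Fully connected layer y = W x + b for a single feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def init_layer(in_dim, out_dim, rng):
    """Small random weights; illustrative only, not the paper's initialisation."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim

class PoseHeads:
    """Hypothetical per-RoI heads: fine class, rotation and translation."""

    def __init__(self, feat_dim, num_fine_classes, seed=0):
        rng = random.Random(seed)
        self.cls = init_layer(feat_dim, num_fine_classes, rng)  # fine-grained vehicle class logits
        self.rot = init_layer(feat_dim, 4, rng)                 # rotation as a quaternion (assumption)
        self.trans = init_layer(feat_dim, 3, rng)               # translation (x, y, z) (assumption)

    def __call__(self, roi_feature):
        logits = linear(roi_feature, *self.cls)
        q = linear(roi_feature, *self.rot)
        norm = math.sqrt(sum(v * v for v in q)) or 1.0
        quat = [v / norm for v in q]  # project the raw output onto unit quaternions
        t = linear(roi_feature, *self.trans)
        return logits, quat, t
```

For example, `PoseHeads(feat_dim=8, num_fine_classes=5)` applied to an 8-dimensional RoI feature returns five class logits, a unit quaternion and a 3-vector. Because all three heads share the same RoI feature, their losses can be summed and trained jointly with the rest of the detector.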
Furthermore, we show that including translational regression in the joint losses is crucial for the 6-DoF pose estimation task when object translation distance along the longitudinal axis varies significantly, e.g., in autonomous driving scenarios.
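One way to picture a joint loss that includes translational regression is a weighted sum of the usual Mask R-CNN losses, a rotation term and a translation term. The sketch below is a hedged illustration. The choice of an L1 quaternion loss with sign disambiguation, a smooth-L1 translation loss, and the weights `w_rot` and `w_trans` are all assumptions, not the paper's stated formulation. `mask_rcnn_loss` is a placeholder scalar for the combined classification, box and mask losses.

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 (Huber-style) loss summed over vector components."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total

def quaternion_l1(pred, target):
    """L1 distance between quaternions; q and -q encode the same rotation,
    so the closer sign is used (an assumption made for this sketch)."""
    direct = sum(abs(p - t) for p, t in zip(pred, target))
    flipped = sum(abs(p + t) for p, t in zip(pred, target))
    return min(direct, flipped)

def joint_loss(mask_rcnn_loss, rot_pred, rot_gt, trans_pred, trans_gt,
               w_rot=1.0, w_trans=1.0):
    """Hypothetical joint loss: detector losses + rotation term + translation term."""
    return (mask_rcnn_loss
            + w_rot * quaternion_l1(rot_pred, rot_gt)
            + w_trans * smooth_l1(trans_pred, trans_gt))
```

With a perfect rotation but a 10 m error along the longitudinal (z) axis, `joint_loss(0.0, [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 40], [0, 0, 50])` evaluates to 9.5. The translation term dominates, which is why a model trained without it has no direct signal for large depth errors.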
Additionally, we incorporate the mutual information between traffic participants via a modified non-local block.
As opposed to the original non-local block implementation, the proposed weighting modification takes the spatial neighbouring information into consideration whilst counteracting the effect of extreme gradient values.
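For reference, the sketch below shows the *original* embedded-Gaussian non-local block (Wang et al.), which the modified block builds on. Each output is a softmax-weighted sum of projected features from all other positions, added back to the input through a residual connection. The paper's weighting modification, which uses spatial neighbouring information and damps extreme gradient values, is not described in enough detail here to reproduce, so it is deliberately left out. The projection matrices `theta`, `phi`, `g` and `w_z` are illustrative parameters supplied by the caller.

```python
import math

def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def non_local_block(features, theta, phi, g, w_z):
    """Original embedded-Gaussian non-local block over N feature vectors.

    features: N vectors of dimension C (e.g. one per position or RoI)
    theta, phi, g: C' x C projections; w_z: C x C' output projection.
    """
    thetas = [matvec(theta, x) for x in features]
    phis = [matvec(phi, x) for x in features]
    gs = [matvec(g, x) for x in features]
    out = []
    for i, x in enumerate(features):
        # pairwise affinity exp(theta(x_i) . phi(x_j)), normalised over j by softmax
        weights = softmax([sum(a * b for a, b in zip(thetas[i], pj)) for pj in phis])
        # y_i = sum_j w_ij * g(x_j)
        y = [sum(w * gj[k] for w, gj in zip(weights, gs)) for k in range(len(gs[0]))]
        # residual connection: z_i = W_z y_i + x_i
        out.append([xi + zi for xi, zi in zip(x, matvec(w_z, y))])
    return out
```

Setting `w_z` to zeros makes the block an identity mapping, a common initialisation that lets it be inserted into a pretrained network without disturbing its outputs at the start of training.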
Di Wu, Zhaoyong Zhuang, Xiangqun Can, Wenbin Zou, Xia Li
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019