PADLoC

LiDAR-Based Deep Loop Closure Detection and Registration using Panoptic Attention

A key component of graph-based SLAM systems is the ability to detect loop closures in a trajectory to reduce the drift accumulated over time from the odometry. Most LiDAR-based methods achieve this goal by using only the geometric information, disregarding the semantics of the scene. In this work, we introduce PADLoC, a LiDAR-based loop closure detection and registration architecture comprising a shared 3D convolutional feature extraction backbone, a global descriptor head for loop closure detection, and a novel transformer-based head for point cloud matching and registration. We present multiple methods for estimating the point-wise matching confidence based on diversity indices. Additionally, to improve forward-backward consistency, we propose the use of two shared matching and registration heads with their source and target inputs swapped, exploiting the fact that the estimated relative transformations must be inverses of each other. Furthermore, we leverage panoptic information during training in the form of a novel loss function that reframes the matching problem as a classification task in the case of the semantic labels and as a graph connectivity assignment for the instance labels. We perform extensive evaluations of PADLoC on multiple real-world datasets, demonstrating that it achieves state-of-the-art performance.

Technical Approach

PADLoC Architecture

We introduce our novel PADLoC architecture for joint loop closure detection and point cloud registration. PADLoC builds upon our previously proposed LCDNet. Instead of using a differentiable approximation of the optimal transport to obtain point matches, we propose to leverage the cross-attention matrices of transformers. By using independent keys, queries, and values with their respective learnable weights, we propose a less complex transformer architecture that yields a better latent representation of the features, and thus more reliable matches. During training, we exploit information from the semantic and instance segmentation to further improve the consistency of the keypoint associations between two point clouds.
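To illustrate how point matches can be read off a cross-attention matrix, the sketch below computes single-head scaled dot-product attention between anchor and positive keypoint features and then projects each anchor keypoint to the attention-weighted mean of the positive coordinates. It is a simplified illustration in plain Python, not the PADLoC implementation: the learnable key, query, and value projections are omitted, and all function names and toy values are assumptions.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys):
    """Row-stochastic attention matrix softmax(Q K^T / sqrt(d)).

    queries: anchor keypoint features, keys: positive keypoint features.
    Learnable projection weights are left out (assumption for brevity).
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
        for q in queries
    ]


def project_keypoints(attention, target_xyz):
    """Soft correspondence: attention-weighted mean of the target coordinates."""
    return [
        [sum(w * p[axis] for w, p in zip(row, target_xyz)) for axis in range(3)]
        for row in attention
    ]


# Toy example: two anchor keypoints, three positive keypoints.
anchor_feats = [[4.0, 0.0], [0.0, 4.0]]
positive_feats = [[4.0, 0.0], [0.0, 4.0], [-4.0, -4.0]]
positive_xyz = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 5.0, 5.0]]

attn = cross_attention(anchor_feats, positive_feats)
matched_xyz = project_keypoints(attn, positive_xyz)
```

Each row of attn sums to one, so it can be read as a soft assignment of one anchor keypoint over all positive keypoints. In this toy case the first anchor keypoint is assigned almost entirely to the first positive keypoint.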

Our proposed PADLoC consists of three main modules:

A shared, convolutional feature extractor.
A global descriptor head used for loop closure detection.
A transformer-based registration and matching module to estimate the point correspondences and the relative 6-DoF transform between two point clouds.

Figure: Overview of our proposed PADLoC architecture for joint loop closure detection and point cloud registration.

Loss Functions

Our total loss function consists of a weighted sum of the triplet loss for loop closure detection as well as a geometric loss and the newly proposed panoptic loss for point cloud registration.
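Written as a formula, the weighted sum could take the following form. The weights \lambda are placeholder symbols introduced here for illustration; their names and values are not given in this text.

```latex
\mathcal{L}_{\text{total}}
  = \lambda_{\text{triplet}} \, \mathcal{L}_{\text{triplet}}
  + \lambda_{\text{geometric}} \, \mathcal{L}_{\text{geometric}}
  + \lambda_{\text{panoptic}} \, \mathcal{L}_{\text{panoptic}}
```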

Triplet Loss

We use the triplet loss to train the global descriptor head. It minimizes the distance between the descriptor of the current point cloud and that of a positive loop closure sample, while maximizing the distance between the descriptor of the current point cloud and those of negative samples.
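As a minimal sketch, the standard margin-based triplet loss on global descriptors can be written as follows. The exact variant, distance metric, and margin used in PADLoC may differ; the margin value and the toy descriptors below are placeholders.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors given as lists of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The margin of 0.5 is an arbitrary placeholder, not a value from the paper.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
positive = [0.3, 0.4]  # close to the anchor (distance 0.5)
negative = [3.0, 4.0]  # far from the anchor (distance 5.0)

# Well-separated triplet: the loss vanishes.
loss_easy = triplet_loss(anchor, positive, negative)
# Swapping the roles of positive and negative yields a large loss.
loss_hard = triplet_loss(anchor, negative, positive)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so training effort concentrates on triplets that still violate this ordering.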

Geometric Loss

In the pose loss, we compare the predicted relative transformation from the anchor to the positive sample with the ground truth transformation by applying both to the coordinates of the same sampled point cloud and evaluating the mean absolute error in Euclidean space. For the auxiliary matching loss, we evaluate the geometric correspondence between the sampled anchor and positive points, leveraging the predicted matching matrix.
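The pose loss can be sketched as below: both transformations are applied to the same points and the resulting positions are compared. This sketch interprets the error as the mean Euclidean distance between the two transformed copies of each point, which is an assumption; the helper names and the toy transforms are hypothetical.

```python
import math


def apply_transform(T, p):
    """Apply a 4x4 homogeneous rigid transform (nested lists) to a 3D point."""
    return [sum(T[r][c] * p[c] for c in range(3)) + T[r][3] for r in range(3)]


def pose_loss(T_pred, T_gt, points):
    """Mean Euclidean distance between points moved by T_pred and by T_gt."""
    total = 0.0
    for p in points:
        total += math.dist(apply_transform(T_pred, p), apply_transform(T_gt, p))
    return total / len(points)


def yaw_transform(yaw, tx, ty, tz):
    """Rigid transform with a rotation about the z-axis and a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


T_gt = yaw_transform(0.1, 1.0, 2.0, 0.0)
T_pred = yaw_transform(0.1, 1.0, 2.0, 0.5)  # same rotation, 0.5 m offset in z
points = [[1.0, 0.0, 0.0], [0.0, 2.0, 1.0], [-3.0, 1.0, 0.5]]

loss_perfect = pose_loss(T_gt, T_gt, points)
loss_offset = pose_loss(T_pred, T_gt, points)
```

Comparing transformed points rather than the transformation parameters themselves expresses rotation and translation errors in a single metric unit, without having to weight the two against each other.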

Panoptic Loss

Figure: Example of Multi-Matched Object Loss. Points from a single object in the anchor that are paired with points from multiple objects in the positive sample are penalized. Notice how the matching between a person and a bicycle is not penalized, since that case is already being dealt with by the semantic and meta-semantic losses.

We propose to leverage panoptic information to register two point clouds. In detail, we formulate the following losses. In the semantic loss, we treat the matching process as a classification task, where semantic class labels are predicted for the projected positive keypoints. We then take the mean absolute error of these predicted labels with respect to those of the corresponding anchor points. To account for data mislabeling and to allow some tolerance in the semantic classification loss, we group the semantic labels into super-classes such as "vehicles" or "structures". We then evaluate the mean absolute error of the predicted meta-class labels in a meta-semantic loss function that is penalized more heavily. Finally, in addition to penalizing matches between points belonging to different semantic classes, our novel multi-matched object loss further constrains the correspondences so that the points belonging to a single object in the anchor sample are matched to points from a single object in the positive sample. To avoid double-counting, this loss does not consider the semantic classes.
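The three panoptic terms can be illustrated with a simplified proxy that uses hard matches and plain counting. This is not the differentiable formulation used for training: the label grouping, function names, and toy data are assumptions. The multi-matched proxy also counts every stray match regardless of class, so it does not reproduce the exemption for cross-class matches described in the figure caption.

```python
from collections import Counter, defaultdict

# Hypothetical grouping of semantic labels into super-classes.
META_CLASS = {
    "car": "vehicles", "truck": "vehicles", "bicycle": "vehicles",
    "building": "structures", "fence": "structures",
    "person": "humans",
}


def mismatch_rate(anchor_labels, matched_labels):
    """Fraction of matches whose anchor and positive labels disagree."""
    pairs = list(zip(anchor_labels, matched_labels))
    return sum(a != b for a, b in pairs) / len(pairs)


def multi_matched_rate(anchor_instances, matched_instances):
    """Fraction of anchor points matched outside the positive instance that
    most points of their own anchor object were matched to."""
    groups = defaultdict(list)
    for a_id, m_id in zip(anchor_instances, matched_instances):
        groups[a_id].append(m_id)
    stray = sum(
        len(ids) - Counter(ids).most_common(1)[0][1] for ids in groups.values()
    )
    return stray / len(anchor_instances)


# Toy matches: one entry per anchor keypoint and its matched positive keypoint.
anchor_sem = ["car", "car", "car", "person"]
matched_sem = ["car", "truck", "building", "person"]
anchor_inst = [1, 1, 1, 2]
matched_inst = [7, 7, 8, 9]

semantic = mismatch_rate(anchor_sem, matched_sem)
meta_semantic = mismatch_rate(
    [META_CLASS[s] for s in anchor_sem], [META_CLASS[s] for s in matched_sem]
)
multi_matched = multi_matched_rate(anchor_inst, matched_inst)
```

In the toy data, the car-to-truck match counts as a semantic error but not as a meta-semantic one, since both labels fall into the same super-class. The car-to-building match counts in both terms.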

Code

A software implementation of this project based on PyTorch, including trained models, can be found in our GitHub repository. It is released under the GPLv3 license for academic usage. For any commercial purpose, please contact the authors.

Publication

José Arce, Niclas Vödisch, Daniele Cattaneo, Wolfram Burgard, and Abhinav Valada,
"PADLoC: LiDAR-Based Deep Loop Closure Detection and Registration using Panoptic Attention"
IEEE Robotics and Automation Letters (RA-L), vol. 8, issue 3, pp. 1319-1326, March 2023.
