360 Localization

An example showcase page


Mobile robots and self-driving cars have entered our daily lives in recent years, driven by the development of accurate localization against High-Definition maps. Cameras have huge potential to provide low-cost, compact, and self-contained visual localization against point cloud maps. However, visual methods are inherently limited by inconsistent environmental conditions in the real world, e.g., differences in illumination, weather, season, and viewpoint. Meanwhile, accurate matching on point cloud data can be challenging due to sensor sparsity and the lack of reliable texture features. Traditional geometry-based methods implicitly assume a static environment, such as stable lighting conditions, sunny weather, and fixed seasonal appearance. Recent learning-based visual localization methods are either constrained to limited environments (e.g., structured roads) or only suited to limited viewpoints (facing forwards or backwards along the street). Current image-to-range localization methods are therefore difficult to deploy in real-world applications, or can hardly address the above issues simultaneously.

The major contributions of 360Loc are:

  • We developed a low-cost 360° camera localization system for large-scale and long-term real-world navigation.
  • The proposed method helps robots localize against either 360° visual imagery or an existing LiDAR map under varying conditions.
  • The proposed method is invariant to orientation differences.
  • The proposed method provides fast global re-localization.

Cross Domain Image Transfer


Feature extraction networks for 360° camera localization.

One key piece of our secret sauce is the domain transfer module. To generate consistent geometric features from visual inputs under different environmental conditions, we construct a cross-domain transfer network between 2D imagery and 3D range projections. Before introducing the details, we first view visual features from the perspective of information entropy. Naturally, the condition-related features (illumination, weather, season) and the invariant features (geometry) in the image domain are tightly coupled. Our proposed solution extracts condition-free geometric features through a semi-supervised training pipeline without requiring expensive human labeling.
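The sketch below illustrates the disentanglement idea only; it is not the actual 360Loc/i3dLoc network. Layer sizes, module names (CrossDomainEncoder, RangeDecoder), and the range-projection resolution are all hypothetical placeholders chosen for a runnable PyTorch example.

```python
# Minimal sketch (assumed architecture, not the released model): an image encoder
# splits its embedding into a condition-invariant geometry code and a
# condition-related code; a small decoder maps the geometry code toward a
# range-projection-like target so it can be matched against LiDAR features.
import torch
import torch.nn as nn

class CrossDomainEncoder(nn.Module):
    def __init__(self, geo_dim=128, cond_dim=32):
        super().__init__()
        # Shared convolutional backbone over the (panoramic) image input.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Two heads: geometry (condition-invariant) vs. condition
        # (illumination, weather, season) codes.
        self.geo_head = nn.Linear(64, geo_dim)
        self.cond_head = nn.Linear(64, cond_dim)

    def forward(self, img):
        h = self.backbone(img)
        return self.geo_head(h), self.cond_head(h)

class RangeDecoder(nn.Module):
    """Maps the geometry code toward a range-projection-shaped output."""
    def __init__(self, geo_dim=128, out_hw=(32, 256)):
        super().__init__()
        self.out_hw = out_hw
        self.fc = nn.Linear(geo_dim, out_hw[0] * out_hw[1])

    def forward(self, z_geo):
        return self.fc(z_geo).view(-1, 1, *self.out_hw)

# Only the geometry code would be compared with features from LiDAR range
# projections; the condition code absorbs appearance changes.
enc, dec = CrossDomainEncoder(), RangeDecoder()
z_geo, z_cond = enc(torch.randn(2, 3, 64, 512))
range_pred = dec(z_geo)
print(z_geo.shape, z_cond.shape, range_pred.shape)
```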

Key Results

The training and evaluation datasets include:

  • Long-term Dataset: we collect 15 long-term trajectories between 2020/06 and 2021/01 with varying season and weather conditions. Each trajectory is around 200~400 m long. Trajectories 1~10 are used for training, and 11~15 for evaluation.
  • Large-scale Dataset: we create a large-scale outdoor dataset with 8 trajectories by traversing 1.5~2 km routes in structured and unstructured outdoor environments. Trajectories 1~6 are used for training, and trajectories 7~8 for evaluation.
  • Multistory Dataset: we create an indoor dataset by traversing 8 trajectories within a multi-floor area during daytime and nighttime. The average length of the indoor routes is 100~150 m. We use trajectories 1~6 for network training, and 7~8 for evaluation.
The datasets are collected with a LiDAR (VLP-16), a 360° camera, and an inertial measurement unit (IMU). We gathered the indoor and outdoor datasets by recording raw data from the above sensors. Since ground-truth positions are unavailable indoors and in other GPS-denied environments, we use LiDAR odometry outputs as the ground-truth estimates.
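For reference, the train/eval splits listed above could be encoded as below for a data loader. The directory layout, dataset keys, and the trajectory_paths helper are hypothetical illustrations, not part of the released dataset.

```python
# Minimal sketch of the train/eval splits described above (hypothetical paths).
LONG_TERM   = {"train": list(range(1, 11)), "eval": list(range(11, 16))}  # 15 trajectories, ~200-400 m each
LARGE_SCALE = {"train": list(range(1, 7)),  "eval": [7, 8]}               # 8 trajectories, 1.5-2 km each
MULTISTORY  = {"train": list(range(1, 7)),  "eval": [7, 8]}               # 8 indoor trajectories, ~100-150 m each

def trajectory_paths(root, dataset, split):
    """Return hypothetical per-trajectory directories for a given split."""
    splits = {"long_term": LONG_TERM, "large_scale": LARGE_SCALE, "multistory": MULTISTORY}
    return [f"{root}/{dataset}/trajectory_{i:02d}" for i in splits[dataset][split]]

print(trajectory_paths("/data/360loc", "long_term", "eval"))
```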


Publications

BibTeX:

@InProceedings{Yin:RSS2021,
  author = {Peng Yin and Lingyun Xu and Ji Zhang and Howie Choset and Sebastian Scherer},
  title = {i3dLoc: Image-to-range Cross-domain Localization Robust to Inconsistent Environmental Conditions},
  booktitle = {Proceedings of Robotics: Science and Systems (RSS '21)},
  year = {2021},
  month = {July},
  publisher = {Robotics: Science and Systems 2021},
  keywords = {Visual SLAM, Place Recognition, Condition Invariant, Viewpoint Invariant},
  url = {https://arxiv.org/abs/2105.12883},
  video     = {https://www.youtube.com/watch?v=ta1_CeJV5nI},
}

@ARTICLE{Yin2022Fast,
  author = {Yin, Peng and Wang, Fuying and Egorov, Anton and Hou, Jiafan and Jia, Zhenzhong and Han, Jianda},
  journal = {IEEE Transactions on Industrial Electronics},
  title = {Fast Sequence-Matching Enhanced Viewpoint-Invariant 3-D Place Recognition},
  year = {2022},
  volume = {69},
  number = {2},
  pages = {2127-2135},
  doi = {10.1109/TIE.2021.3057025},
}

@ARTICLE{zhao2022spherevlad++,
  title={SphereVLAD++: Attention-based and Signal-enhanced Viewpoint Invariant Descriptor},
  author={Zhao, Shiqi and Yin, Peng and Yi, Ge and Scherer, Sebastian},
  journal={arXiv preprint arXiv:2207.02958},
  year={2022},
}