Author: stachnis

2019-09: Olga Vysotska defended her PhD Thesis

Olga Vysotska successfully defended her PhD thesis entitled “Visual Place Recognition in Changing Environments” at the University of Bonn on the Photogrammetry & Robotics Lab.


Localization is an essential capability of mobile robots and place recognition is an important component of localization. Only having precise localization, robots can reliably plan, navigate and understand the environment around them. The main task of visual place recognition algorithms is to recognize based on the visual input if the robot has seen previously a given place in the environment. Cameras are one of the popular sensors robots get information from. They are lightweight, affordable, and provide detailed descriptions of the environment in the form of images. Cameras are shown to be useful for the vast variety of emerging applications, from virtual and augmented reality applications to autonomous cars or even fleets of autonomous cars. All these applications need precise localization. Nowadays, the state-of-the-art methods are able to reliably estimate the position of the robots using image streams. One of the big challenges still is the ability to localize a camera given an image stream in the presence of drastic visual appearance changes in the environment. Visual appearance changes may be caused by a variety of different reasons, starting from camera-related factors, such as changes in exposure time, camera position-related factors, e.g. the scene is observed from a different position or viewing angle, occlusions, as well as factors that stem from natural sources, for example seasonal changes, different weather conditions, illumination changes, etc. These effects change the way the same place in the environments appears in the image and can lead to situations where it becomes hard even for humans to recognize the places. Also, the performance of the traditional visual localization approaches, such as FABMAP or DBow, decreases dramatically in the presence of strong visual appearance changes.

The techniques presented in this thesis aim at improving visual place recognition capabilities for robotic systems in the presence of dramatic visual appearance changes. To reduce the effect of visual changes on image matching performance, we exploit sequences of images rather than individual images. This becomes possible as robotic systems collect data sequentially and not in random order. We formulate the visual place recognition problem under strong appearance changes as a problem of matching image sequences collected by a robotic system at different points in time. A key insight here is the fact that matching sequences reduces the ambiguities in the data associations. This allows us to establish image correspondences between different sequences and thus recognize if two images represent the same place in the environment. To perform a search for image correspondences, we construct a graph that encodes the potential matches between the sequences and at the same time preserves the sequentiality of the data. The shortest path through such a data association graph provides the valid image correspondences between the sequences.

Robots operating reliably in an environment should be able to recognize a place in an online manner and not after having recorded all data beforehand. As opposed to collecting image sequences and then determining the associations between the sequences offline, a real-world system should be able to make a decision for every incoming image. In this thesis, we therefore propose an algorithm that is able to perform visual place recognition in changing environments in an online fashion between the query and the previously recorded reference sequences. Then, for every incoming query image, our algorithm checks if the robot is in the previously seen environment, i.e. there exists a matching image in the reference sequence, as well as if the current measurement is consistent with previously obtained query images.

Additionally, to be able to recognize places in an online manner, a robot needs to recognize the fact that it has left the previously mapped area as well as relocalize when it re-enters environment covered by the reference sequence. Thus, we relax the assumption that the robot should always travel within the previously mapped area and propose an improved graph-based matching procedure that allows for visual place recognition in case of partially overlapping image sequences.

To achieve a long-term autonomy, we further increase the robustness of our place recognition algorithm by incorporating information from multiple image sequences, collected along different overlapping and non-overlapping routes. This allows us to grow the coverage of the environment in terms of area as well as various scene appearances. The reference dataset then contains more images to match against and this increases the probability of finding a matching image, which can lead to improved localization. To be able to deploy a robot that performs localization in large scaled environments over extended periods of time, however, collecting a reference dataset may be a tedious, resource consuming and in some cases intractable task. Avoiding an explicit map collection stage fosters faster deployment of robotic systems in the real world since no map has to be collected beforehand. By using our visual place recognition approach the map collection stage can be skipped, as we are able to incorporate the information from a publicly available source, e.g., from Google Street View, into our framework due to its general formulation. This automatically enables us to perform place recognition on already existing publicly available data and thus avoid costly mapping phase. In this thesis, we additionally show how to organize the images from the publicly available source into the sequences to perform out-of-the-box visual place recognition without previously collecting the otherwise required reference image sequences at city scale.

All approaches described in this thesis have been published in peer-reviewed conference papers and journal articles. In addition to that, most of the presented contributions have been released publicly as open source software.

2019-07: Data Available: SemanticKITTI — A Dataset for Semantic Scene Understanding of LiDAR Sequences

SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences


With SemanticKITTI, we release a large dataset to propel research on laser-based semantic segmentation. We annotated all sequences of the KITTI Vision Odometry Benchmark and provide dense point-wise annotations for the complete 360 deg field-of-view of the employed automotive LiDAR. We propose three benchmark tasks based on this dataset: (i) semantic segmentation of point clouds using a single scan, (ii) semantic segmentation using sequences comprised of multiple past scans, and (iii) semantic scene completion, which requires to anticipate the semantic scene in the future. We provide baseline experiments and show that there is a need for more sophisticated models to efficiently tackle these tasks.

J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall, “SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences,” in Proc. of the IEEE/CVF International Conf.~on Computer Vision (ICCV), 2019.

2019-07: Code Available: Bonnetal – an easy-to-use deep-learning training and deployment pipeline for robotics by Andres Milioto

We have recently open-sourced Bonnetal, an easy-to-use deep-learning training and deployment pipeline to do a suite of perception tasks, that we have developed for our robots’ perception systems.

Bonnetal can pre-train popular CNN backbones on ImageNet for transfer learning (popular model trained weights are downloaded by default from our server so the learning never happens from scratch) and it has fast decoders for real-time semantic segmentation. We have more applications in the internal pipeline that we will be open-sourcing within the framework as well, such as object detection, instance segmentation, keypoint/feature extraction, and more.

The key features of Bonnetal are:

  • The training interface is easy to use, even for a novice in machine learning,
  • The library of models for transfer learning requires significantly less training data and time for a new task and dataset, exploiting the knowledge that is already condensed in the pre-trained weights about low-level geometry and texture,
  • All architectures can be used with our C++ library, which also has a ROS wrapper so that you don’t have to code at all, and
  • All of the supported architectures are tested using NVIDIA’s TensorRT so that you can get that extra juice out of your Jetson or GPU, including fast inference tricks such as INT8 quantization and calibration (vs. standard, slower, floating point 32).

This video ( shows a person-vs-background segmentation network using a MobilenetsV2 architecture with a small Atrous Spatial Pyramid pooling module, running quantized to INT8 for fast inference, achieving 200FPS at VGA resolution on a single GPU.

Access to the code in our Lab’s GitHub:

2019-05: Code Available: ReFusion: 3D Reconstruction in Dynamic Environments for RGB-D Cameras by Emanuele Palazzolo

ReFusion – 3D Reconstruction in Dynamic Environments for RGB-D Cameras Exploiting Residuals

ReFusion on github

Mapping and localization are essential capabilities of robotic systems. Although the majority of mapping systems focus on static environments, the deployment in real-world situations requires them to handle dynamic objects. In this paper, we propose an approach for an RGB-D sensor that is able to consistently map scenes containing multiple dynamic elements. For localization and mapping, we employ an efficient direct tracking on the truncated signed distance function (TSDF) and leverage color information encoded in the TSDF to estimate the pose of the sensor. The TSDF is efficiently represented using voxel hashing, with most computations parallelized on a GPU. For detecting dynamics, we exploit the residuals obtained after an initial registration, together with the explicit modeling of free space in the model. We evaluate our approach on existing datasets, and provide a new dataset showing highly dynamic scenes. These experiments show that our approach often surpass other state-of-the-art dense SLAM methods. We make available our dataset with the ground truth for both the trajectory of the RGB-D sensor obtained by a motion capture system and the model of the static environment using a high-precision terrestrial laser scanner.

If you use our implementation in your academic work, please cite the corresponding paper: E. Palazzolo, J. Behley, P. Lottes, P. Giguère, C. Stachniss. ReFusion: 3D Reconstruction in Dynamic Environments for RGB-D Cameras Exploiting Residuals, Submitted to IROS, 2019 (arxiv paper).

This code is related to the following publications:
E. Palazzolo, J. Behley, P. Lottes, P. Giguère, C. Stachniss. ReFusion: 3D Reconstruction in Dynamic Environments for RGB-D Cameras Exploiting Residuals, Submitted to IROS, 2019 (arxiv paper).

2019-03-11: Cyrill Stachniss receives AMiner TOP 10 Most Influential Scholar Award (2007-2017)

Cyrill Stachniss received the 2018 AMiner TOP 10 Most Influential Scholar Award (2007-2017) in the area of robotics. The AMiner Most Influential Scholar Annual List names the world’s top-cited research scholars from the fields of AI/robotics. The list is conferred in recognition of outstanding technical achievements with lasting contribution and impact to the research community. In 2018, the winners are among the most-cited scholars whose paper was published in the top venues of their respective subject fields between 2007 and 2017. Recipients are automatically determined by a computer algorithm deployed in the AMiner system that tracks and ranks scholars based on citation counts collected by top-venue publications. In specific, the list of the field Robot answers the question of between 2007 and 2017, who are the most cited scholars in ICRA and IROS conferences, which are identified as the top venues of this field.

2018-12: Code Available: Release Surfel-based Mapping using 3D Laser Range Data by Jens Behley

SuMa – Surfel-based Mapping using 3D Laser Range Data

SuMa on github

Mapping of 3d laser range data from a rotating laser range scanner, e.g., the Velodyne HDL-64E. For representing the map, we use surfels that enables fast rendering of the map for point-to-plane ICP and loop closure detection. If you use our implementation in your academic work, please cite the corresponding paper: J. Behley, C. Stachniss. Efficient Surfel-Based SLAM using 3D Laser Range Data in Urban Environments, Proc. of Robotics: Science and Systems (RSS), 2018 (pdf).

This code is related to the following publications:
J. Behley, C. Stachniss. Efficient Surfel-Based SLAM using 3D Laser Range Data in Urban Environments, Proc. of Robotics: Science and Systems (RSS), 2018 (pdf).