Author: stachnis

2020-06: Lorenzo Nardi Defended His PhD Thesis


Over the last decade, the demand for autonomous mobile robots has been growing continuously. Applications range from mobile manipulators operating on factory floors to autonomous cars driving in urban environments. A common requirement for all these tasks is the capability to autonomously navigate by making sequences of decisions in environments that are complex, dynamic, and uncertain. Robots are often deployed in environments populated by humans or other moving objects and must navigate in a safe and compliant manner. Furthermore, real-world scenarios are typically characterized by uncertainty in the robot’s perception, action execution, and belief about the world. Traditional approaches to robot navigation plan and follow the shortest path on static geometric representations of the environment. Such systems are often not adequate to capture the characteristics of real-world environments and may lead robots to perform behaviors that are sub-optimal in practice.
In this thesis, we address robot navigation in different real-world scenarios and investigate a set of approaches that go beyond planning the shortest paths. We present solutions for robot navigation that are able to take into account and reason about the situation in which the robot navigates, the dynamics of the environment, and the uncertainty about the world by exploiting available background knowledge. For example, we use publicly available maps of urban environments to plan policies for robust navigation on road networks under position uncertainty. In addition, we exploit the paths experienced by the robot during navigation to generate safe and predictable behaviors that meet the user’s preferences. We also present solutions for navigating in partially unknown environments by actively gathering information and by exploiting this knowledge to automatically improve robot navigation over time. We use the onboard robot perception during navigation in outdoor environments to automatically discover paths along which the impact of detrimental factors due to the terrain is lower. Furthermore, we exploit the observations about the traversability changes in an environment to plan anticipatory behaviors that lead the robot to encounter a reduced number of unforeseen obstacles while navigating.

2020-03-19: Remote Teaching during the COVID-19 Pandemic

Dear Students,

The current COVID-19 situation is difficult for all of us. In order to allow you to attend as many courses as possible and complete your study plan with minimum delays, we will try hard and will offer all courses in an online fashion.

Our plan is to make all lectures that will be taught in the summer term 2020 available as video lectures on YouTube. In addition to that, we will have question sessions via a live video conferencing system with the lecturer and the same for the tutorials. This might not be the perfect setup but the closest thing we can do to teaching in the lecture room.

We will start our video lecture early and aim at making all the material available in the beginning of April 2020 on our website. This will allow you to start taking our classes around the original schedule of the term and thus during a time when most of you will have to stay at home. We will be flexible w.r.t. the submission date for homework assignments.

This will affect all our courses. For the summer term 2020, these are:

  • Photogrammetry 1
  • Sensors and State Estimation 2
  • Modern C++
  • Master project (Stachniss – Kuhlmann – McCool)

The Master project course will be done using 1:1 supervision with regular live video supervision and will probably be a bit more structured compared to previous years. All in all, we will try to do our best to allow you to study your mandatory and elective courses with our lab in the summer term 2020.

All the best and stay healthy,
Cyrill Stachniss

2019-12: Emanuele Palazzolo Defended His PhD Thesis


Mapping the environment with the purpose of building a 3D model that represents it is traditionally done by trained personnel using measuring equipment such as cameras or terrestrial laser scanners. This process is often expensive and time-consuming. The use of a robotic platform for such a purpose can simplify the process and enables the use of 3D models for consumer applications or in environments inaccessible to human operators. However, fully autonomous 3D reconstruction is a complex task and it is the focus of several open research topics.
In this thesis, we try to address some of the open problems in active 3D environment reconstruction. For solving such a task, a robot should autonomously determine the best positions to record measurements and integrate these measurements in a model while exploring the environment. In this thesis, we first address the task of integrating the measurements from a sensor in real-time into a dense 3D model. Second, we focus on where the sensor should be placed to explore an unknown environment by recording the necessary measurements as efficiently as possible. Third, we relax the assumption of a static environment, which is typically made in active 3D reconstruction. Specifically, we target long-term changes in the environment and we address the issue of how to identify them online with an exploring robot, to integrate them in an existing 3D model. Finally, we address the problem of identifying and dealing with dynamic elements in the environment, while recording the measurements.

In the first part of this thesis, we assume the environment to be static and we solve the first two problems. We propose an approach to 3D reconstruction in real-time using a consumer RGB-D sensor. A particular focus of our approach is its efficiency in terms of both execution time and memory consumption. Moreover, our method is particularly robust to situations where the structural cues are insufficient. Additionally, we propose an approach to iteratively compute the next best viewpoint for the sensor to maximize the information obtained from the measurements. Our algorithm is tailored for micro aerial vehicles (MAVs) and takes into account the specific limitations of this kind of robot.

In the second part of this work, we focus on non-static environments and we address the last two problems. We deal with long-term changes by proposing an approach that is able to identify the regions that changed on a 3D model, from a short sequence of images. Our method is fast enough to be suitable to run online on a mapping robot, which can direct its effort on the parts of the environment that have changed. Finally, we address the problem of mapping fully dynamic environments, by proposing an online 3D reconstruction approach that is able to identify and filter out dynamic elements in the measurements.

In sum, this thesis makes several contributions in the context of robotic map building and dealing with change. Compared to the current state of the art, the approaches presented in this thesis allow for a more robust real-time tracking of RGB-D sensors including the ability to deal with dynamic scenes. Moreover, this work provides a new, more efficient viewpoint selection technique for MAV exploration, and an efficient online change detection approach operating on 3D models from images that is substantially faster than comparable existing methods. Thus, we advanced the state of the art in the field with respect to robustness as well as efficiency.

2019-09: Olga Vysotska Defended Her PhD Thesis

Olga Vysotska successfully defended her PhD thesis entitled “Visual Place Recognition in Changing Environments” at the Photogrammetry & Robotics Lab of the University of Bonn.


Localization is an essential capability of mobile robots and place recognition is an important component of localization. Only with precise localization can robots reliably plan, navigate, and understand the environment around them. The main task of visual place recognition algorithms is to recognize, based on the visual input, whether the robot has previously seen a given place in the environment. Cameras are among the most popular sensors robots get information from. They are lightweight, affordable, and provide detailed descriptions of the environment in the form of images. Cameras have been shown to be useful for a vast variety of emerging applications, from virtual and augmented reality to autonomous cars or even fleets of autonomous cars. All these applications need precise localization. Nowadays, state-of-the-art methods are able to reliably estimate the position of a robot using image streams. One of the big remaining challenges is to localize a camera given an image stream in the presence of drastic visual appearance changes in the environment. Visual appearance changes may have a variety of causes: camera-related factors, such as changes in exposure time; camera position-related factors, e.g., the scene is observed from a different position or viewing angle; occlusions; and factors that stem from natural sources, for example seasonal changes, different weather conditions, or illumination changes. These effects change the way the same place in the environment appears in the image and can lead to situations where it becomes hard even for humans to recognize the places. The performance of traditional visual localization approaches, such as FABMAP or DBoW, also decreases dramatically in the presence of strong visual appearance changes.

The techniques presented in this thesis aim at improving visual place recognition capabilities for robotic systems in the presence of dramatic visual appearance changes. To reduce the effect of visual changes on image matching performance, we exploit sequences of images rather than individual images. This becomes possible as robotic systems collect data sequentially and not in random order. We formulate the visual place recognition problem under strong appearance changes as a problem of matching image sequences collected by a robotic system at different points in time. A key insight here is the fact that matching sequences reduces the ambiguities in the data associations. This allows us to establish image correspondences between different sequences and thus recognize if two images represent the same place in the environment. To perform a search for image correspondences, we construct a graph that encodes the potential matches between the sequences and at the same time preserves the sequentiality of the data. The shortest path through such a data association graph provides the valid image correspondences between the sequences.
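
As a rough illustration of the graph search described above, the following Python sketch finds the cheapest order-preserving path through a matrix of image matching costs. It is a simplified stand-in and not the implementation from the thesis: the cost matrix, the function name `match_sequences`, and the `fan_out` parameter are assumptions made for this example.

```python
def match_sequences(cost, fan_out=2):
    """Return one reference index per query image along the cheapest path.

    cost[i][j] is an assumed matching cost between query image i and
    reference image j (low cost = similar appearance). Each node (i, j)
    is connected to nodes (i + 1, j + k) for k in 0..fan_out, so a path
    can never move backwards in the reference sequence.
    """
    n_query, n_ref = len(cost), len(cost[0])
    inf = float("inf")
    # best[i][j]: cost of the cheapest path that ends at node (i, j)
    best = [[inf] * n_ref for _ in range(n_query)]
    parent = [[-1] * n_ref for _ in range(n_query)]
    best[0] = list(cost[0])  # a path may start at any reference image

    for i in range(1, n_query):
        for j in range(n_ref):
            for k in range(fan_out + 1):
                p = j - k  # candidate predecessor in the reference sequence
                if p >= 0 and best[i - 1][p] + cost[i][j] < best[i][j]:
                    best[i][j] = best[i - 1][p] + cost[i][j]
                    parent[i][j] = p

    # Backtrack from the cheapest node in the last query row.
    j = min(range(n_ref), key=lambda r: best[-1][r])
    path = [j]
    for i in range(n_query - 1, 0, -1):
        j = parent[i][j]
        path.append(j)
    path.reverse()
    return path


# Query image 1 looks most similar to reference image 0 when judged alone,
# but the sequence constraint selects reference image 2 instead.
example_cost = [
    [0.90, 0.10, 0.90, 0.90, 0.90],
    [0.05, 0.90, 0.20, 0.90, 0.90],
    [0.90, 0.90, 0.90, 0.10, 0.90],
    [0.90, 0.90, 0.90, 0.90, 0.10],
]
print(match_sequences(example_cost))
```

Because the graph is a directed acyclic grid, simple dynamic programming is enough in this sketch; a real system would also need to handle unmatched images and much larger reference sets.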

Robots operating reliably in an environment should be able to recognize a place in an online manner and not after having recorded all data beforehand. As opposed to collecting image sequences and then determining the associations between the sequences offline, a real-world system should be able to make a decision for every incoming image. In this thesis, we therefore propose an algorithm that is able to perform visual place recognition in changing environments in an online fashion between the query and the previously recorded reference sequences. Then, for every incoming query image, our algorithm checks if the robot is in the previously seen environment, i.e. there exists a matching image in the reference sequence, as well as if the current measurement is consistent with previously obtained query images.

Additionally, to be able to recognize places in an online manner, a robot needs to recognize the fact that it has left the previously mapped area as well as relocalize when it re-enters the environment covered by the reference sequence. Thus, we relax the assumption that the robot should always travel within the previously mapped area and propose an improved graph-based matching procedure that allows for visual place recognition in case of partially overlapping image sequences.

To achieve long-term autonomy, we further increase the robustness of our place recognition algorithm by incorporating information from multiple image sequences, collected along different overlapping and non-overlapping routes. This allows us to grow the coverage of the environment in terms of area as well as various scene appearances. The reference dataset then contains more images to match against and this increases the probability of finding a matching image, which can lead to improved localization. When deploying a robot that performs localization in large-scale environments over extended periods of time, however, collecting a reference dataset may be a tedious, resource-consuming, and in some cases intractable task. Avoiding an explicit map collection stage fosters faster deployment of robotic systems in the real world since no map has to be collected beforehand. By using our visual place recognition approach, the map collection stage can be skipped, as we are able to incorporate the information from a publicly available source, e.g., from Google Street View, into our framework due to its general formulation. This automatically enables us to perform place recognition on already existing publicly available data and thus avoid a costly mapping phase. In this thesis, we additionally show how to organize the images from the publicly available source into sequences to perform out-of-the-box visual place recognition at city scale without previously collecting the otherwise required reference image sequences.

All approaches described in this thesis have been published in peer-reviewed conference papers and journal articles. In addition to that, most of the presented contributions have been released publicly as open source software.

2019-07: Data Available: SemanticKITTI — A Dataset for Semantic Scene Understanding of LiDAR Sequences


With SemanticKITTI, we release a large dataset to propel research on laser-based semantic segmentation. We annotated all sequences of the KITTI Vision Odometry Benchmark and provide dense point-wise annotations for the complete 360-degree field-of-view of the employed automotive LiDAR. We propose three benchmark tasks based on this dataset: (i) semantic segmentation of point clouds using a single scan, (ii) semantic segmentation using sequences comprising multiple past scans, and (iii) semantic scene completion, which requires anticipating the semantic scene in the future. We provide baseline experiments and show that there is a need for more sophisticated models to efficiently tackle these tasks.

J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall, “SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences,” in Proc. of the IEEE/CVF International Conf. on Computer Vision (ICCV), 2019.

2019-07: Code Available: Bonnetal – an easy-to-use deep-learning training and deployment pipeline for robotics by Andres Milioto

We have recently open-sourced Bonnetal, an easy-to-use deep-learning training and deployment pipeline to do a suite of perception tasks, that we have developed for our robots’ perception systems.

Bonnetal can pre-train popular CNN backbones on ImageNet for transfer learning (trained weights for popular models are downloaded by default from our server, so the learning never happens from scratch) and it has fast decoders for real-time semantic segmentation. We have more applications in the internal pipeline that we will be open-sourcing within the framework as well, such as object detection, instance segmentation, keypoint/feature extraction, and more.

The key features of Bonnetal are:

  • The training interface is easy to use, even for a novice in machine learning,
  • The library of models for transfer learning requires significantly less training data and time for a new task and dataset, exploiting the knowledge that is already condensed in the pre-trained weights about low-level geometry and texture,
  • All architectures can be used with our C++ library, which also has a ROS wrapper so that you don’t have to code at all, and
  • All of the supported architectures are tested using NVIDIA’s TensorRT so that you can get that extra juice out of your Jetson or GPU, including fast inference tricks such as INT8 quantization and calibration (vs. standard, slower, floating point 32).
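
To give an idea of what INT8 quantization and calibration mean, here is a minimal Python sketch of symmetric quantization with a max-abs calibration step. This is an illustration only, not Bonnetal or TensorRT code; the function names are hypothetical, and the calibration used by TensorRT is more involved than the simple rule assumed here.

```python
def calibrate_scale(samples):
    """Pick a scale so that the largest observed magnitude maps to 127.

    'samples' stands for activations recorded on representative inputs
    (assumed calibration data for this sketch).
    """
    max_abs = max(abs(x) for x in samples)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Map floats to integers in [-127, 127], clipping out-of-range values."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    """Map INT8 values back to approximate floats."""
    return [q * scale for q in quantized]


scale = calibrate_scale([-1.0, 0.25, 0.5])
q = quantize([0.3, -0.7, 10.0], scale)  # 10.0 lies outside the calibrated range
print(q, dequantize(q, scale))
```

Values inside the calibrated range are recovered up to half a quantization step, while values outside it are clipped. This is why the calibration data should be representative of what the network sees at run time.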

This video shows a person-vs-background segmentation network using a MobileNetV2 architecture with a small Atrous Spatial Pyramid Pooling (ASPP) module, running quantized to INT8 for fast inference and achieving 200 FPS at VGA resolution on a single GPU.

Access to the code in our Lab’s GitHub: