Coherent Online Road Topology Estimation and Reasoning with Standard-Definition Maps

Khanh Son Pham*1,3, Christian Witte*1,2, Jens Behley2, Johannes Betz3,4, Cyrill Stachniss2,5,6
1 CARIAD SE; 2 Center for Robotics, University of Bonn, Germany; 3 Technical University Munich; 4 Munich Institute of Robotics and Machine Intelligence (MIRMI); 5 University of Oxford; 6 Lamarr Institute for Machine Learning and Artificial Intelligence
*Equal Contribution

Our approach, called Score, jointly predicts lane segments, road boundaries, and traffic elements such as traffic lights or traffic signs. Further, it derives an association between the traffic elements and the lane segments, as well as the topology among the lane segments.

Abstract

Most autonomous cars rely on the availability of high-definition (HD) maps. Current research aims to address this constraint by directly predicting HD map elements from onboard sensors and reasoning about the relationships between the predicted map and traffic elements. Despite these advancements, the coherent online construction of HD maps remains a challenging endeavor, as it necessitates modeling the high complexity of road topologies in a unified and consistent manner.

To address this challenge, we propose a coherent approach that predicts lane segments and their corresponding topology, as well as road boundaries, while leveraging prior map information from commonly available standard-definition (SD) maps. We propose a network architecture that leverages hybrid lane segment encodings comprising prior information and denoising techniques to enhance training stability and performance. Furthermore, we propose the use of past frames to facilitate temporal consistency.

Our experimental evaluation demonstrates that our approach outperforms previous methods by a significant margin, highlighting the benefits of our modeling scheme.

Architecture

Score Architecture
First, multi-view image features are computed and transformed into a BEV representation. Using SD map prior information, we enhance the learnable queries through positional sampling and additionally apply denoising to the lane queries. The Lane Segment Transformer refines the queries with the BEV features and outputs refined lane segment queries (blue). The BEV features are further employed to derive road boundary queries (yellow). To detect 2D traffic elements, the traffic element decoder outputs 2D traffic element queries (cyan). In the Topology Estimation, these queries interact with the lane segment queries to derive the traffic element-to-lane association; the same stage also reasons about the connectivity (i.e., topology) of the lane segments. Finally, simple MLPs decode the queries into the corresponding output formats: bounding boxes, lane segments, and road boundaries.
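To make the two query-preparation ideas in this caption more concrete, the following is a minimal sketch and not the Score implementation. It assumes that positional sampling means picking evenly spaced points along an SD map polyline, and that denoising queries are built by jittering ground-truth lane points with bounded noise. The function names `sample_polyline` and `add_denoising_noise`, the uniform noise model, and all parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_polyline(points, n):
    """Return n points spaced evenly by arc length along a 2D polyline.

    Assumption: this stands in for sampling positional priors from an
    SD map road polyline; the real sampling scheme may differ.
    """
    if n < 2 or len(points) < 2:
        raise ValueError("need at least two points and n >= 2")
    # Cumulative arc length at every vertex of the polyline.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    samples = []
    seg = 0  # index of the segment that contains the current target
    for i in range(n):
        target = total * i / (n - 1)
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return samples


def add_denoising_noise(lane, scale, rng):
    """Jitter each lane point by uniform noise in [-scale, scale].

    Assumption: a simple stand-in for building noisy copies of
    ground-truth lanes that a decoder learns to reconstruct.
    """
    return [
        (x + rng.uniform(-scale, scale), y + rng.uniform(-scale, scale))
        for x, y in lane
    ]


# Hypothetical SD map road: an L-shaped polyline, coordinates in meters.
sd_road = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
reference_points = sample_polyline(sd_road, 5)

# Hypothetical ground-truth lane centerline, perturbed for denoising.
rng = random.Random(0)
gt_lane = [(0.0, 1.5), (5.0, 1.5), (10.0, 1.5)]
noisy_lane = add_denoising_noise(gt_lane, 0.5, rng)
```

In a network of this kind, the sampled reference points would typically be embedded and added to the learnable queries, while the noisy lanes would be fed as extra queries during training only. Both details are assumptions here, since the caption does not specify them.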

Quantitative Results

Comparison on the OpenLane-V2 validation split (subset A).
Results overview for the OpenLane-V2 validation split.
Method | Backbone | Ensemble | TTA | Temporal Fusion | DET (LS) | DET (Area) | TOP (LL)
Score (Ours) R101 45.3% 43.0% 40.0%
MapVision InternImage-L 44.0% 40.0% 40.0%
LGMap ViT-L ✔✔ 57.1% 35.4% N/A

Additional Qualitative Results

Illustrative examples for different scenes highlighting the performance advantage of our proposed method.
These examples demonstrate the robustness of our approach to ambiguity in the SD map.

Geographically Disjoint Dataset Split

Results on the geographically disjoint data split.

Paper Video