Coherent Online Road Topology Estimation and Reasoning with Standard-Definition Maps

Khanh Son Pham*¹,³, Christian Witte*¹,², Jens Behley², Johannes Betz³,⁴, Cyrill Stachniss²,⁵,⁶

¹CARIAD SE ²Center for Robotics, University of Bonn, Germany ³Technical University Munich ⁴Munich Institute of Robotics and Machine Intelligence (MIRMI) ⁵University of Oxford ⁶Lamarr Institute for Machine Learning and Artificial Intelligence

*Equal contribution

Abstract

Most autonomous cars rely on the availability of high-definition (HD) maps. Current research aims to address this constraint by directly predicting HD map elements from onboard sensors and reasoning about the relationships between the predicted map and traffic elements. Despite these advancements, the coherent online construction of HD maps remains a challenging endeavor, as it necessitates modeling the high complexity of road topologies in a unified and consistent manner.

To address this challenge, we propose a coherent approach to predict lane segments and their corresponding topology, as well as road boundaries, all by leveraging prior map information represented by commonly available standard-definition (SD) maps. Our network architecture leverages hybrid lane segment encodings that comprise prior information, together with denoising techniques, to enhance training stability and performance. Furthermore, we use past frames to facilitate temporal consistency.

Our experimental evaluation demonstrates that our approach outperforms previous methods by a significant margin, highlighting the benefits of our modeling scheme.

Architecture

First, multi-view image features are computed and transformed into a BEV representation. Using SD map prior information, we enhance the learnable queries through positional sampling and additionally introduce denoising for the lane queries. The Lane Segment Transformer refines the queries with the BEV features and outputs refined lane segment queries (blue). The BEV features are also used to derive road boundary queries (yellow). To detect 2D traffic elements, the traffic element decoder outputs 2D traffic element queries (cyan). In the Topology Estimation module, these queries interact with the lane segment queries to derive the traffic element-to-lane association; the module also reasons about the connectivity (i.e., topology) of the lane segments. Finally, simple MLPs decode the queries into bounding boxes, lane segments, and road boundaries.
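To make the query preparation step more concrete, the following is a minimal sketch of two ideas mentioned above: sampling positions along an SD map polyline, and perturbing reference geometry to obtain noisy inputs for denoising-style training. It is not the authors' implementation. The function names, the 2D polyline format, the evenly spaced sampling, and the uniform noise model are all assumptions made for illustration only.

```python
import math
import random


def sample_along_polyline(polyline, num_points):
    """Return num_points evenly spaced (x, y) samples along a 2D polyline.

    Hypothetical helper: stands in for "positional sampling" on an SD map road.
    """
    if num_points < 2 or len(polyline) < 2:
        raise ValueError("need at least two vertices and two samples")
    # Cumulative arc length at each vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    samples = []
    seg = 0
    for i in range(num_points):
        target = total * i / (num_points - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(polyline) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = polyline[seg], polyline[seg + 1]
        samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return samples


def make_denoising_points(reference_points, noise_scale, rng):
    """Jitter each point with uniform noise in [-noise_scale, noise_scale].

    Assumed noise model: a decoder trained with denoising would be asked to
    recover the clean geometry from such perturbed inputs.
    """
    return [
        (x + rng.uniform(-noise_scale, noise_scale),
         y + rng.uniform(-noise_scale, noise_scale))
        for x, y in reference_points
    ]
```

For example, `sample_along_polyline([(0, 0), (10, 0), (10, 10)], 5)` yields five positions spaced 5 units apart along the road, which could then serve as positional priors for queries. How the actual model encodes these positions is not specified here.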

Quantitative Results

Comparison on the OpenLane-V2 validation split (subset A).

Results overview for the OpenLane-V2 validation split.
Method	Backbone	Ensemble	TTA	Temporal Fusion	DET (LS)	DET (Area)	TOP (LL)
Score (Ours)	R101	✗	✗	✔	45.3%	43.0%	40.0%
MapVision	InternImage-L	✔	✔	✗	44.0%	40.0%	40.0%
LGMap	ViT-L	✔✔	✔	✔	57.1%	35.4%	N/A

Our approach, called Score, jointly predicts lane segments, road boundaries, and traffic elements such as traffic lights or traffic signs. Further, it derives an association between the traffic elements and the lane segments, as well as the topology among the lane segments.
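The joint output described above can be pictured as a small container holding the predicted elements plus two score matrices, one for lane-to-lane topology and one for lane-to-traffic-element association. The sketch below is illustrative only: the class names, field names, the score convention (`lane_lane_scores[i][j]` as the confidence that lane `j` follows lane `i`), and the 0.5 threshold are assumptions, not the format used by the authors or by OpenLane-V2.

```python
from dataclasses import dataclass


@dataclass
class TrafficElement:
    bbox: tuple   # assumed (x_min, y_min, x_max, y_max) in image pixels
    label: str    # e.g. "traffic_light" or "traffic_sign"


@dataclass
class ScenePrediction:
    lane_segments: list        # one polyline per lane segment
    road_boundaries: list      # one polyline per road boundary
    traffic_elements: list     # list of TrafficElement
    lane_lane_scores: list     # N x N matrix, entry [i][j]: lane j follows lane i
    lane_traffic_scores: list  # N x M matrix, entry [i][k]: element k applies to lane i

    def successors(self, lane_index, threshold=0.5):
        """Indices of lanes predicted to follow the given lane."""
        row = self.lane_lane_scores[lane_index]
        return [j for j, score in enumerate(row) if score >= threshold]

    def traffic_elements_for(self, lane_index, threshold=0.5):
        """Indices of traffic elements associated with the given lane."""
        row = self.lane_traffic_scores[lane_index]
        return [k for k, score in enumerate(row) if score >= threshold]
```

Thresholding the score matrices in this way turns the soft predictions into a directed lane graph and a set of lane-to-element links that a downstream planner could query.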
