DigiForests: A Longitudinal LiDAR Dataset for Forestry Robotics

Overview

We collected our dataset over multiple recording sessions in a forest near Stein am Rhein, Switzerland. The data covers multiple plots, each denoted by a letter indicating the main type of trees in the plot, as shown in the map.

We provide LiDAR point clouds recorded with a backpack-carried automotive LiDAR sensor of the kind used in robotics applications, and additionally LiDAR point clouds from a UAV for the March recording sessions.

For the backpack data, we annotated all plots with semantics (tree, shrub, ground) and also provide instance annotations for trees. For each tree, we additionally distinguish between the stem and the foliage.

Additionally, we provide reference measurements of key forestry traits acquired by domain experts.

The following table provides an overview of the dataset contents and of how we split the data into training, validation, and test sets.

Plot	March 2023	October 2023	July 2024	Split
C1	G,A,L,R	G,L	G,L	Validation
D2	G,A,L,R	G,L	G,L	Training
M1	G,A,L,R	G	G	Training
M2	G,A	G	G	Test
M3	G,A,L,R	G	G	Training
M4	G,A,L,R	G	G	Training
M5	G,A,L,R	G	G	Training
"G" = Ground Data available, "A" = Aerial Data available, "L" = Labels available, "R" = Reference measurements available.

Data Format and Folder Structure

Besides the raw data as ROS bags, we also provide processed data, where all LiDAR scans have poses estimated via a LiDAR SLAM approach. For each plot, we have the following folder structure.

exp07-m1
    ground_clouds
        cloud_1679401800_628520000.pcd
        cloud_1679401805_227596000.pcd
        cloud_1679401806_228290000.pcd
        ⋮
    aerial_clouds
        cloud_1679401800_628520000.pcd
        cloud_1679401805_227596000.pcd
        cloud_1679401806_228290000.pcd
        ⋮
    labels
        cloud_1679401800_628520000.label
        cloud_1679401805_227596000.label
        cloud_1679401806_228290000.label
        ⋮
    rosbags
        frontier_2023-03-21-12-29-58_0.bag
        frontier_2023-03-21-12-34-14_1.bag
        frontier_2023-03-21-12-38-29_2.bag
        ⋮
    poses.txt
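
The listing above suggests that each label file shares its name with the corresponding point cloud. A minimal sketch of pairing the two could look as follows. The function names parse_stamp and pair_clouds_with_labels are hypothetical and not part of the development kit, and reading the two numbers in the file name as seconds and nanoseconds is an assumption based on the sec,nsec columns of poses.txt.

```python
from pathlib import Path


def parse_stamp(path):
    """Split 'cloud_<a>_<b>.<ext>' into two integers.

    Assumption: <a> is seconds and <b> is nanoseconds of the scan timestamp.
    """
    _, sec, nsec = Path(path).stem.split("_")
    return int(sec), int(nsec)


def pair_clouds_with_labels(plot_dir, cloud_dir="ground_clouds", label_dir="labels"):
    """Return (cloud_path, label_path_or_None) tuples sorted by timestamp.

    Assumption: a label file has the same stem as its cloud, as in the listing.
    """
    plot = Path(plot_dir)
    pairs = []
    for cloud in sorted((plot / cloud_dir).glob("cloud_*.pcd"), key=parse_stamp):
        label = plot / label_dir / (cloud.stem + ".label")
        # Not every session ships labels, so a missing file maps to None.
        pairs.append((cloud, label if label.exists() else None))
    return pairs
```

Calling pair_clouds_with_labels("exp07-m1") would then yield the scans in temporal order, with None in place of the label path for scans without annotations.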

Specifically, we have the following file formats:

ground_clouds and aerial_clouds: Contain the point clouds in the PCD format, where we have x,y,z coordinates in the local coordinate frame of the LiDAR sensor and the intensity. ground_clouds holds the under-canopy point clouds recorded with the mobile mapping system, and aerial_clouds holds the point clouds recorded from the UAV.
ground_labels (shown as labels in the folder structure above): Contains the semantic annotations for each point cloud in ground_clouds. The format follows the binary SemanticKITTI format, where the lower 16 bits contain the label and fine-grained label and the upper 16 bits contain the instance id. More specifically, we use a 32-bit unsigned int for each point, where bits 31-16 correspond to the instance id, bits 15-8 are the fine-grained semantics, and bits 7-0 are the semantic label.
poses.txt: Contains the per-scan poses, where poses are associated with individual scans via the timestamp. Each line of this comma-separated file contains the following columns: counter,sec,nsec,x,y,z,qx,qy,qz,qw. Here, x,y,z correspond to the translation $\mathbf{t}\in \mathbb{R}^3$ and qx,qy,qz,qw to a quaternion representing the rotational part $\mathbf{R} \in \mathbb{R}^{3\times 3}$. Together they transform every point $\mathbf{p}_{\text{local}}$ into a point $\mathbf{p}_{\text{global}}$ in the world coordinate frame: $$\mathbf{p}_{\text{global}} = \mathbf{R}\, \mathbf{p}_{\text{local}} + \mathbf{t}$$
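
To make the label bit layout and the pose transformation concrete, the following sketch decodes label words and applies a pose to a point using only the Python standard library. It is an illustration, not the development kit's implementation: the function names are hypothetical, little-endian byte order is an assumption (it is what SemanticKITTI-style files written on common hardware use), and the quaternion is normalized before use as a precaution.

```python
import csv
import math
import struct


def decode_label(raw):
    """Split one 32-bit label word into (instance_id, fine_grained, semantic)."""
    instance_id = (raw >> 16) & 0xFFFF  # bits 31-16
    fine_grained = (raw >> 8) & 0xFF    # bits 15-8
    semantic = raw & 0xFF               # bits 7-0
    return instance_id, fine_grained, semantic


def read_labels(path):
    """Read a .label file as a list of decoded tuples, one per point.

    Assumption: tightly packed little-endian uint32 values.
    """
    with open(path, "rb") as f:
        data = f.read()
    count = len(data) // 4
    return [decode_label(r) for r in struct.unpack(f"<{count}I", data[: count * 4])]


def quat_to_rot(qx, qy, qz, qw):
    """Convert a quaternion (qx, qy, qz, qw) into a 3x3 rotation matrix R."""
    n = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    x, y, z, w = qx / n, qy / n, qz / n, qw / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def to_global(p_local, R, t):
    """Apply p_global = R * p_local + t to a single (x, y, z) point."""
    return tuple(sum(R[i][j] * p_local[j] for j in range(3)) + t[i] for i in range(3))


def read_poses(path):
    """Map (sec, nsec) -> (R, t) from rows counter,sec,nsec,x,y,z,qx,qy,qz,qw."""
    poses = {}
    with open(path, newline="") as f:
        for row in csv.reader(f):
            try:
                sec, nsec = int(row[1]), int(row[2])
                x, y, z, qx, qy, qz, qw = map(float, row[3:10])
            except (ValueError, IndexError):
                continue  # skip header, blank, or malformed lines
            poses[(sec, nsec)] = (quat_to_rot(qx, qy, qz, qw), (x, y, z))
    return poses
```

With a pose (R, t) looked up by the timestamp of a scan, calling to_global on each point of that scan moves it into the world frame. For full point clouds, the same masks and the same matrix product would typically be applied to whole arrays with a numerical library.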

Download

Plot	Ground Data	Aerial Data	ROS bags
C1	(1.5 GB)	(9.3 GB)	(96.6 GB)
D2	(1.1 GB)	(11.8 GB)	(59.7 GB)
M1	(0.3 GB)	(12.9 GB)	(101.0 GB)
M2	(1.2 GB)		(78.5 GB)
M3	(0.4 GB)	(11.0 GB)	(101.4 GB)
M4	(0.3 GB)		(103.2 GB)
M5	(0.3 GB)	(7.5 GB)	(52.8 GB)

Development Kit and Baseline Code

To make the dataset more convenient to use, we offer a development kit with code for reading and processing the data. Additionally, we provide the code of our baselines for panoptic forestry segmentation, which is available at:

https://github.com/PRBonn/digiforests

Hidden Test Set Evaluation

We plan to provide a hidden test set evaluation via a CodaLab competition that provides unbiased and reproducible results evaluated on plot M2 data. (Coming soon)

Citation

If you use our dataset or the provided tools, please cite our paper (PDF):

@inproceedings{malladi2025icra,
  author    = {Meher V.R. Malladi and Nived Chebrolu and Irene Scacchetti and Luca Lobefaro and Tiziano Guadagnino and Beno\^{i}t Casseau
    and Haedam Oh and Leonard Frei{\ss}muth and Markus Karppinen and Janine Schweier and Stefan Leutenegger and Jens Behley
    and Cyrill Stachniss and Maurice Fallon},
  title     = {{DigiForests: A Longitudinal LiDAR Dataset for Forestry Robotics}},
  booktitle = {Proc.~of the IEEE Intl. Conf. on Robotics \& Automation (ICRA)},
  year      = {2025},
}