DigiForests: A Longitudinal LiDAR Dataset for Forestry Robotics

[Figure panels: Semantics (left), Tree Instances (middle), Tree Semantics (right)]

Our dataset, called DigiForests, provides LiDAR point clouds collected with a backpack-carried mobile mapping system and point clouds from an aerial scanning system. We provide semantic annotations of tree, shrub, and ground (left), tree instance annotations (middle), and fine-grained semantics for stem and crown of trees (right).

Overview

We collected our dataset over multiple recording sessions in a forest near Stein am Rhein, Switzerland. The data covers several plots, each denoted by a letter indicating the main type of trees in the plot, as shown in the map.

We provide LiDAR point clouds recorded with a backpack-carried automotive LiDAR sensor of the kind used in robotics applications, and additionally LiDAR point clouds from a UAV for the March 2023 recording sessions.

For the backpack data, we annotated all plots with semantics (tree, shrub, ground) and also provide instance annotations for trees. Within each tree, we further distinguish between the stem and the foliage.

We also provide reference measurements of key forestry traits acquired by domain experts.

The following table gives an overview of the dataset contents and of the split into training, validation, and test sets.

Plot  March 2023  October 2023  July 2024  Split
C1    G,A,L,R     G,L           G,L        Validation
D2    G,A,L,R     G,L           G,L        Training
M1    G,A,L,R     G             G          Training
M2    G,A         G             G          Test
M3    G,A,L,R     G             G          Training
M4    G,A,L,R     G             G          Training
M5    G,A,L,R     G             G          Training

"G" = Ground data available, "A" = Aerial data available, "L" = Labels available, "R" = Reference measurements available.

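The table above can be transcribed into a small lookup structure for scripting. The following is only a sketch: the dictionary layout and the helper names `plots_in_split` and `sessions_with` are illustrative assumptions and are not taken from the official development kit.

```python
# Transcription of the overview table. The layout and helper names are
# illustrative assumptions, not part of the official development kit.
SESSIONS = ("March 2023", "October 2023", "July 2024")

# plot -> (split, availability codes per session, in SESSIONS order)
# G = ground data, A = aerial data, L = labels, R = reference measurements
PLOTS = {
    "C1": ("validation", ("GALR", "GL", "GL")),
    "D2": ("training", ("GALR", "GL", "GL")),
    "M1": ("training", ("GALR", "G", "G")),
    "M2": ("test", ("GA", "G", "G")),
    "M3": ("training", ("GALR", "G", "G")),
    "M4": ("training", ("GALR", "G", "G")),
    "M5": ("training", ("GALR", "G", "G")),
}


def plots_in_split(split):
    """Return the sorted plot names assigned to the given split."""
    return sorted(p for p, (s, _) in PLOTS.items() if s == split)


def sessions_with(plot, code):
    """Return the sessions in which `plot` offers the data type `code`."""
    _, availability = PLOTS[plot]
    return [s for s, a in zip(SESSIONS, availability) if code in a]
```

For example, `sessions_with("C1", "L")` lists the sessions for which plot C1 has labels, and `plots_in_split("training")` lists the training plots.
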
Data Format and Folder Structure

Besides the raw data as ROS bags, we also provide processed data, where all LiDAR scans have poses estimated via a LiDAR SLAM approach. For each plot, we have the following folder structure.

  • exp07-m1
    • ground_clouds
      • cloud_1679401800_628520000.pcd
      • cloud_1679401805_227596000.pcd
      • cloud_1679401806_228290000.pcd
      •               ⋮
    • aerial_clouds
      • cloud_1679401800_628520000.pcd
      • cloud_1679401805_227596000.pcd
      • cloud_1679401806_228290000.pcd
      •               ⋮
    • labels
      • cloud_1679401800_628520000.label
      • cloud_1679401805_227596000.label
      • cloud_1679401806_228290000.label
      •               ⋮
    • rosbags
      • frontier_2023-03-21-12-29-58_0.bag
      • frontier_2023-03-21-12-34-14_1.bag
      • frontier_2023-03-21-12-38-29_2.bag
      •               ⋮
    • poses.txt
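
As an illustration of how this layout might be traversed, the sketch below pairs each ground cloud with its label file by shared file name. It assumes that file names follow the pattern `cloud_<sec>_<nsec>.<ext>` seen in the listing and that the label folder is called `labels`. The helper names `parse_stamp` and `pair_scans` are hypothetical.

```python
from pathlib import Path


def parse_stamp(filename):
    """Split 'cloud_<sec>_<nsec>.<ext>' into integers (sec, nsec).

    Assumption: the two numbers in the file name are the scan timestamp.
    """
    stem = Path(filename).stem  # e.g. 'cloud_1679401800_628520000'
    _, sec, nsec = stem.split("_")
    return int(sec), int(nsec)


def pair_scans(plot_dir, cloud_dir="ground_clouds", label_dir="labels"):
    """Return a list of (stamp, cloud_path, label_path_or_None) tuples.

    The folder names are parameters because they are assumptions taken
    from the listing above; adjust them to the data you downloaded.
    """
    plot = Path(plot_dir)
    pairs = []
    for cloud in sorted((plot / cloud_dir).glob("cloud_*.pcd")):
        label = plot / label_dir / (cloud.stem + ".label")
        pairs.append(
            (parse_stamp(cloud.name), cloud, label if label.exists() else None)
        )
    return pairs
```

Calling `pair_scans("exp07-m1")` would then yield one entry per ground cloud, with `None` in place of the label path for scans without annotations.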

Specifically, we have the following file formats:

  • ground_clouds and aerial_clouds: Contain the point clouds in the PCD format. ground_clouds holds the under-canopy point clouds recorded with the backpack-carried mobile mapping system, and aerial_clouds holds the point clouds from the UAV. Each point has x,y,z coordinates in the local coordinate frame of the LiDAR sensor and an intensity value.
  • ground_labels (shown as labels in the folder structure above): Contains the semantic annotations for each point cloud in ground_clouds. The format follows the binary SemanticKITTI format, where the lower 16 bits contain the label and the fine-grained label and the upper 16 bits contain the instance id. More specifically, we use a 32-bit unsigned int for each point, where bits 31-16 correspond to the instance id, bits 15-8 are the fine-grained semantics, and bits 7-0 are the semantic label.
  • poses.txt: Contains the per-scan poses, which are associated with individual scans via the timestamp. Each line of this CSV file contains the following columns: counter,sec,nsec,x,y,z,qx,qy,qz,qw. Here, x,y,z correspond to the translation \(\mathbf{t}\in \mathbb{R}^3\) and qx,qy,qz,qw to a quaternion representing the rotational part \(\mathbf{R} \in \mathbb{R}^{3\times 3}\). Together they transform every point \(\mathbf{p}_{\text{local}}\) into a point \(\mathbf{p}_{\text{global}}\) in the world coordinate frame: $$\mathbf{p}_{\text{global}} = \mathbf{R}\, \mathbf{p}_{\text{local}} + \mathbf{t}$$
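
The bit layout of the labels and the pose transformation described above can be sketched in plain Python. This is a minimal illustration and not the development kit's implementation. It assumes that label files store little-endian 32-bit values (as is usual for SemanticKITTI-style files), that poses.txt is comma-separated, and that the quaternion is a unit quaternion in the Hamilton convention. All function names are hypothetical.

```python
import math
import struct


def decode_label(value):
    """Split one 32-bit label into (semantic, fine_grained, instance)."""
    semantic = value & 0xFF             # bits 7-0
    fine_grained = (value >> 8) & 0xFF  # bits 15-8
    instance = (value >> 16) & 0xFFFF   # bits 31-16
    return semantic, fine_grained, instance


def read_labels(path):
    """Read a .label file; little-endian uint32 per point is an assumption."""
    with open(path, "rb") as f:
        data = f.read()
    return [decode_label(v) for (v,) in struct.iter_unpack("<I", data)]


def quat_to_rot(qx, qy, qz, qw):
    """Convert a quaternion to a 3x3 rotation matrix (nested lists)."""
    n = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    x, y, z, w = qx / n, qy / n, qz / n, qw / n  # normalize defensively
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def parse_pose_line(line):
    """Parse 'counter,sec,nsec,x,y,z,qx,qy,qz,qw' into (stamp, t, R).

    A header line, if the file has one, must be skipped by the caller.
    """
    fields = line.strip().split(",")
    sec, nsec = int(fields[1]), int(fields[2])
    x, y, z, qx, qy, qz, qw = (float(v) for v in fields[3:10])
    return (sec, nsec), (x, y, z), quat_to_rot(qx, qy, qz, qw)


def to_global(p_local, R, t):
    """Apply p_global = R * p_local + t to one 3D point."""
    return tuple(
        sum(R[i][j] * p_local[j] for j in range(3)) + t[i] for i in range(3)
    )
```

The `(sec, nsec)` pair returned by `parse_pose_line` can serve as the key for matching a pose to a scan, assuming the scan file names carry the same timestamp. For full point clouds, a vectorized version of the same arithmetic (for instance with NumPy) is preferable.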

Download

Plot Ground Data Aerial Data ROS bags
C1 (1.5 GB) (9.3 GB) (96.6 GB)
D2 (1.1 GB) (11.8 GB) (59.7 GB)
M1 (0.3 GB) (12.9 GB) (101.0 GB)
M2 (1.2 GB) (78.5 GB)
M3 (0.4 GB) (11.0 GB) (101.4 GB)
M4 (0.3 GB) (103.2 GB)
M5 (0.3 GB) (7.5 GB) (52.8 GB)

Development Kit and Baseline Code

To make the dataset easier to use, we provide a development kit with code for reading and processing the data. We also provide the code of our baselines for panoptic forestry segmentation. Both are available at:

https://github.com/PRBonn/digiforests

Hidden Test Set Evaluation

We plan to provide a hidden test set evaluation via a CodaLab competition, which will give unbiased and reproducible results evaluated on the plot M2 data. (Coming soon)

Citation

If you use our dataset or the provided tools, please cite our paper (PDF):

@inproceedings{malladi2025icra,
  author    = {Meher V.R. Malladi and Nived Chebrolu and Irene Scacchetti and Luca Lobefaro and Tiziano Guadagnino and Beno\^{i}t Casseau
    and Haedam Oh and Leonard Frei{\ss}muth and Markus Karppinen and Janine Schweier and Stefan Leutenegger and Jens Behley
    and Cyrill Stachniss and Maurice Fallon},
  title     = {{DigiForests: A Longitudinal LiDAR Dataset for Forestry Robotics}},
  booktitle = {Proc.~of the IEEE Intl. Conf. on Robotics \& Automation (ICRA)},
  year      = {2025},
}