Dataset Download

Extract the provided zip file into a folder, which results in the folder structure described under "Dataset Format".

Download (7.6 GB, md5: 5168bba762053725890478432cdbdb1d).
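Before extracting, the archive can be checked against the md5 checksum stated above. The following is a minimal sketch using only the Python standard library; the function names and the archive path you pass in are illustrative assumptions, not part of any official tooling.

```python
import hashlib

# Checksum stated next to the download link above.
EXPECTED_MD5 = "5168bba762053725890478432cdbdb1d"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so the 7.6 GB archive never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download` with the path of the downloaded zip file (whatever name it was saved under) returns `True` only if the file is complete and unmodified.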

Dataset Format

For each image of the training (1,407 images) and validation set (772 images), we provide the following annotations in the corresponding sub-folders:

  • semantics: Pixel-wise semantic masks, where label ids correspond to: background (0), crop (1), weed (2), partial-crop (3), partial-weed (4). Partial crops and weeds have less than 50% visible pixels.
  • plant_instances: Pixel-wise instance masks for crops and weeds, where ids > 0 correspond to distinct instances.
  • leaf_instances: Pixel-wise instance masks for leaves, where ids > 0 correspond to distinct instances.
  • plant_visibility: Pixel-wise visibility in range [0,255], where 255 means fully visible.
  • leaf_visibility: Pixel-wise visibility in range [0,255], where 255 means fully visible.

We store all annotations as 16-bit PNG files, as these provide decent lossless compression and can be read easily with off-the-shelf libraries such as Pillow or OpenCV, which minimizes the dependencies.
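As a minimal sketch of how the annotations described above might be read and interpreted, the snippet below loads one PNG with Pillow and provides small helpers for the label ids, instance ids, and visibility values. All helper names are illustrative assumptions and not part of the official devkit; the label table assumes that id 4 denotes partial weeds, and the remapping of partial plants to their full class is just one possible preprocessing choice.

```python
# Label ids of the "semantics" masks (id 4 is assumed to mean partial weed).
SEMANTIC_LABELS = {
    0: "background",
    1: "crop",
    2: "weed",
    3: "partial-crop",
    4: "partial-weed",
}

# Hypothetical remapping that folds partial plants into their full class.
MERGE_PARTIAL = {0: 0, 1: 1, 2: 2, 3: 1, 4: 2}


def load_annotation(path):
    """Read one annotation PNG as a NumPy array (needs numpy and Pillow)."""
    import numpy as np
    from PIL import Image

    with Image.open(path) as img:
        return np.array(img)


def merge_partial(label_id):
    """Map partial-crop to crop and partial-weed to weed; keep other ids."""
    return MERGE_PARTIAL[int(label_id)]


def instance_ids(mask_rows):
    """Return the sorted distinct instance ids (> 0) of a 2D instance mask."""
    return sorted({int(v) for row in mask_rows for v in row if v > 0})


def visibility_fraction(value):
    """Convert a visibility value in [0, 255] to a fraction in [0.0, 1.0]."""
    if not 0 <= value <= 255:
        raise ValueError(f"visibility out of range: {value}")
    return value / 255.0
```

For a whole semantic mask loaded as a NumPy array, the same remapping can be applied in one step with a lookup array, for example `np.array([0, 1, 2, 1, 2])[mask]`.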

All images and corresponding annotations have a size of 1024 by 1024 pixels. The size was chosen such that, even in later growth stages, multiple plants are completely inside the image.

We additionally provide the acquisition date, e.g., 05-15, 05-26, or 06-05, at the beginning of every filename, which allows separating the data based on the date of data acquisition.
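The date prefix makes it straightforward to split files by acquisition day. The sketch below groups filenames by that prefix; it assumes the prefix has the form MM-DD as in the examples above, and the function names and any filenames passed in are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed prefix format: two digits, a hyphen, two digits (e.g. "05-15").
DATE_PREFIX = re.compile(r"^(\d{2}-\d{2})")


def acquisition_date(filename):
    """Extract the MM-DD prefix from a filename (directories are ignored)."""
    name = Path(filename).name
    match = DATE_PREFIX.match(name)
    if match is None:
        raise ValueError(f"no MM-DD prefix in {name!r}")
    return match.group(1)


def group_by_date(filenames):
    """Group filenames by acquisition date, preserving the input order."""
    groups = defaultdict(list)
    for filename in filenames:
        groups[acquisition_date(filename)].append(filename)
    return dict(groups)
```

The resulting dictionary can be used, for instance, to hold out all images of one acquisition date for evaluation.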

Please see our devkit, which provides a PyTorch dataloader that is ready to use with the dataset. Baseline implementations and instructions to reproduce our experiments are available in a separate repository. See our Code page for more information.

Unlabeled Data

For self-supervised pre-training or unsupervised training, we additionally provide a large number of unlabeled images. We distinguish between patches extracted from the original images and augmented patches extracted from rotated versions of the images.

Subset          | Patches | Augmented Patches
----------------|---------|------------------
Example data    | 149 MB  | 152 MB
April 25, 2020  | 10 GB   | 35.3 GB
May 03, 2020    | 9.5 GB  | 33.2 GB
May 15, 2020    | 11.9 GB | 37.6 GB
May 26, 2020    | 10.3 GB | 36.1 GB
June 05, 2020   | 7.6 GB  | 26.5 GB
June 12, 2020   | 7.2 GB  | 26.4 GB
July 02, 2020   | 6.5 GB  | 20.1 GB

Dataset License

We distribute the data under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

This means that you must attribute the work in the manner specified by the authors and if you alter, transform, or build upon the material for any purpose, even commercially, you may distribute the resulting work only under the same license.

Specifically, you should cite our work (PDF):

  author = {Jan Weyler and Federico Magistri and Elias Marks and Yue Linn Chong and Matteo Sodano 
            and Gianmarco Roggiolani and Nived Chebrolu and Cyrill Stachniss and Jens Behley},
  title = {{PhenoBench --- A Large Dataset and Benchmarks for Semantic Image Interpretation 
            in the Agricultural Domain}},
  journal = {arXiv preprint},
  year = {2023}

We appreciate donations of any amount you feel appropriate from commercial users. Please contact us if you want more information.