Data labeling
Summary
Data labeling
S00290
Version: 1.10
- arable farming
- horticulture
- viticulture
- provision of datasets
- data; other (including: nothing)
- Location: remote
- Offered by: POLIMI; UNIMI
Description
This service covers the curation of ground-truth annotations for testing and experimentation data. It starts with the definition of the most adequate labeling procedure, based on the customer’s requirements and budget. In this phase, we select the most appropriate type of annotation to produce (e.g., pixel-level annotations of images vs. a single label for the whole image) and the granularity of the labels (e.g., plant colour, variety, crop class). We also agree with the customer on the expected annotation format (e.g., JSON, CSV) and on the annotation tool to be used (e.g., in case the customer wants to label additional data in the future). In addition, experts in agronomy can label domain-specific data attributes, such as disease indicators and their confounding factors. The labeled data are delivered with a report summarising quality metrics and statistics such as the number of data points annotated, the percentage of the full dataset covered by the annotations, the number of annotators, and the inter-annotator agreement.
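As an illustration of the deliverables described above, the Python sketch below shows one possible annotation record, its export to JSON and CSV, and the basic statistics listed in the quality report. The `Annotation` schema and the function names are hypothetical; the actual format and fields are agreed with each customer, and inter-annotator agreement requires a dedicated statistic that is not computed here.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class Annotation:
    # Hypothetical schema: one label given by one annotator to one data point.
    item_id: str
    annotator: str
    label: str


def summary_report(annotations: list[Annotation], dataset_size: int) -> dict:
    """Basic statistics for the quality report that accompanies the labels."""
    if dataset_size <= 0:
        raise ValueError("dataset_size must be positive")
    annotated_items = {a.item_id for a in annotations}
    annotators = {a.annotator for a in annotations}
    return {
        "items_annotated": len(annotated_items),
        # Share of the full dataset covered by at least one annotation.
        "coverage_pct": 100.0 * len(annotated_items) / dataset_size,
        "num_annotators": len(annotators),
    }


def to_json(annotations: list[Annotation]) -> str:
    return json.dumps([asdict(a) for a in annotations], indent=2)


def to_csv(annotations: list[Annotation]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Annotation)])
    writer.writeheader()
    for a in annotations:
        writer.writerow(asdict(a))
    return buf.getvalue()


if __name__ == "__main__":
    anns = [
        Annotation("img_001", "expert_a", "healthy"),
        Annotation("img_001", "expert_b", "healthy"),
        Annotation("img_002", "expert_a", "unhealthy"),
    ]
    print(summary_report(anns, dataset_size=10))
    # {'items_annotated': 2, 'coverage_pct': 20.0, 'num_annotators': 2}
    print(to_csv(anns))
```

Keeping one record per (item, annotator) pair, rather than one merged label per item, preserves the individual judgements needed later to measure agreement.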
Example service: The customer wants to promptly identify the emergence of Peronospora (downy mildew) in vineyards. Peronospora symptoms can be detected by inspecting changes on the leaf surface (appearance of small spots, gradual changes in leaf colour). The customer has already implemented a computer vision algorithm that classifies leaves as healthy or unhealthy from images. However, additional data (e.g., collected in the field via service S00113 - Collection of test data during physical testing) are required to improve the performance and robustness of the solution. Specifically, the customer requires the collected data to be thoroughly annotated by expert agronomists who can recognise Peronospora symptoms at the level of individual leaves. To fulfil this request, this service is executed by first identifying segmentation masks as the most suitable annotation format. Although more costly and onerous to produce, polygonal masks represent the leaf regions affected by the disease more precisely than, for example, rectangular bounding boxes, which also enclose healthy leaf regions. It is then agreed with the customer that 60% of the collected images will be annotated by 5 agronomy experts with: i) a segmentation mask for each region affected by the disease, if any is found; and ii) a label marking each region as either “healthy” or “unhealthy”. Moreover, Fleiss’ kappa will be used to verify that a sufficient level of agreement (a kappa of 0.60 or higher) has been reached among annotators when marking leaves as “healthy” or “unhealthy”.
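The agreement check in the example can be sketched with the standard Fleiss’ kappa formula. This minimal, stdlib-only version assumes that every item (image or leaf) is rated by the same number of experts (five here) and that the votes are already tallied into a count table with one row per item and one column per category. The function name and the sample votes are hypothetical, not part of the service’s tooling.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N items, each rated by the same number n of raters.

    counts[i][j] is the number of raters who assigned item i to category j.
    """
    if not counts:
        raise ValueError("need at least one item")
    n = sum(counts[0])  # raters per item
    num_items = len(counts)
    num_categories = len(counts[0])
    for row in counts:
        if len(row) != num_categories or sum(row) != n:
            raise ValueError("every item needs the same categories and number of raters")
    if n < 2:
        raise ValueError("at least two raters per item are required")

    # Per-item agreement P_i: fraction of rater pairs that agree on item i.
    p_items = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_items) / num_items

    # p_j: overall share of ratings in category j; P_e: agreement expected by chance.
    p_cats = [sum(row[j] for row in counts) / (num_items * n) for j in range(num_categories)]
    p_e = sum(p * p for p in p_cats)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical votes: 4 leaves rated by 5 experts; columns = (healthy, unhealthy).
    votes = [[5, 0], [0, 5], [5, 0], [1, 4]]
    kappa = fleiss_kappa(votes)
    print(f"kappa = {kappa:.2f}, meets 0.60 threshold: {kappa >= 0.60}")
    # kappa = 0.80, meets 0.60 threshold: True
```

Kappa corrects the raw agreement for the agreement expected by chance, which matters here because a heavily imbalanced healthy/unhealthy split would otherwise inflate the raw figure. Libraries such as statsmodels also ship an implementation in `statsmodels.stats.inter_rater`.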