Design of evaluation metrics for physical testing
Summary
S00108
Version: 1.11
- arable farming
- horticulture
- viticulture
- test design
- design/documentation; other (including: nothing)
- Location: at user’s premises; in Italy
- Offered by: POLIMI; UNIMI
Description
Any test activity involves three main components: the environment (where the tests take place), the protocol (defining which tests are executed and how), and the evaluation metrics (used to assess the test results).
This service designs the metrics best suited to evaluating the performance of a customer solution, taking into account the use cases specified by the customer and the test environment and protocol (e.g., as defined via services S00106 and S00107). Our team will work with the customer to identify and define the most appropriate set of quantitative (i.e., based on instrumental measurements) and/or qualitative (i.e., relying on expert human judgement) metrics to assess the system functionalities of interest.
Based on the evaluation metrics, the service also defines a set of requirements for data collection and ground-truth annotation. For instance, it may lay out the specifications for dedicated data collection campaigns (possibly executed via service S00113).
Analyses of environmental factors beyond those directly tracked by the designed metrics (e.g., seasonal effects, or the impact on results of how tests are distributed over time under the protocol defined, e.g., through service S00107) can also be provided on request and redefined based on initial test results.
Example service: The customer is interested in measuring the capability of a Computer Vision model to discriminate weeds from crops on RGB images. Consequently, a set of objective evaluation metrics is designed to describe the quality of predictions, including classification accuracy, precision, and recall scores for all “crop” and “weed” instances observed in the testing environment. The customer is also interested in measuring indicators such as “how often is Matricaria chamomilla mistaken for Bean crops?”. Thus, prediction-quality scores are computed not only at the macro level (“crop” vs. “weed”) but also for individual plant species (“Bean” vs. “Matricaria”). To support these metrics, a balanced collection of crop and weed images will need to be prepared (e.g., through a dedicated data collection campaign - via service S00113 - followed by suitable data annotation - via service S00115 - or by augmenting existing data with additional examples - via service S00115). The output of the service therefore includes detailed information about the necessary data collection activities.
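As an illustration of the metrics in this example, the following minimal sketch computes accuracy, per-class precision and recall, and a "how often is X mistaken for Y" rate, first at species level and then at the macro level. It is not the service's actual tooling: the function names, the `SPECIES_TO_GROUP` mapping, the extra "Maize" class, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical mapping from species label to macro class (illustrative only;
# "Maize" is an assumed extra crop added to make the two levels differ).
SPECIES_TO_GROUP = {"Bean": "crop", "Maize": "crop", "Matricaria": "weed"}


def confusion(y_true, y_pred):
    """Count occurrences of each (true_label, predicted_label) pair."""
    return Counter(zip(y_true, y_pred))


def per_class_scores(y_true, y_pred):
    """Return ({label: (precision, recall)}, overall accuracy)."""
    cm = confusion(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = cm[(c, c)]
        predicted = sum(n for (_, p), n in cm.items() if p == c)
        actual = sum(n for (t, _), n in cm.items() if t == c)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        scores[c] = (precision, recall)
    accuracy = sum(cm[(c, c)] for c in labels) / len(y_true)
    return scores, accuracy


def confusion_rate(y_true, y_pred, true_label, pred_label):
    """Fraction of `true_label` instances that were predicted as `pred_label`."""
    cm = confusion(y_true, y_pred)
    total = sum(n for (t, _), n in cm.items() if t == true_label)
    return cm[(true_label, pred_label)] / total if total else 0.0


# Toy ground truth and predictions (made up, not real test results).
y_true = ["Bean", "Bean", "Bean", "Maize", "Maize",
          "Matricaria", "Matricaria", "Matricaria"]
y_pred = ["Bean", "Maize", "Bean", "Maize", "Maize",
          "Bean", "Matricaria", "Matricaria"]

# Species-level scores ("Bean" vs. "Matricaria" vs. ...).
species_scores, species_acc = per_class_scores(y_true, y_pred)

# Macro-level scores ("crop" vs. "weed"): collapse species before scoring.
g_true = [SPECIES_TO_GROUP[s] for s in y_true]
g_pred = [SPECIES_TO_GROUP[s] for s in y_pred]
macro_scores, macro_acc = per_class_scores(g_true, g_pred)

# "How often is Matricaria mistaken for Bean?"
mistaken = confusion_rate(y_true, y_pred, "Matricaria", "Bean")

print("species accuracy:", species_acc)
print("macro accuracy:", macro_acc)
print("Matricaria predicted as Bean:", mistaken)
```

In this toy data a Bean predicted as Maize counts as an error at species level but not at the macro level, which is why the two levels are reported separately. Per-class scores of this kind are also sensitive to class imbalance, which is one reason a balanced image collection matters.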