Design of evaluation metrics for digital testing

Summary

S00178
Version: 1.10

  • arable farming
  • horticulture
  • viticulture
  • test design
  • design/documentation; other
  • Location: remote
  • Offered by: POLIMI; UNIMI

Description

To assess the outcomes of digital testing, specific evaluation metrics need to be defined. Our team will define (i.e., select where available, otherwise design) together with customers the most appropriate set of quantitative or qualitative metrics for assessing the system functionalities of interest.
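
As an illustration, the agreed metric set can be captured in a small machine-readable specification that both parties sign off on. The sketch below (Python; all field and metric names are hypothetical and not part of the service deliverables) shows one possible shape:

    from dataclasses import dataclass
    from enum import Enum

    class MetricKind(Enum):
        QUANTITATIVE = "quantitative"
        QUALITATIVE = "qualitative"

    @dataclass(frozen=True)
    class MetricSpec:
        """Agreed definition of a single evaluation metric."""
        name: str               # e.g. "waypoint_success_rate"
        kind: MetricKind
        unit: str               # e.g. "%" or "m"
        higher_is_better: bool  # optimisation direction
        description: str        # which system functionality it assesses

    # Hypothetical metric set for an intra-row navigation test
    METRICS = [
        MetricSpec("waypoint_success_rate", MetricKind.QUANTITATIVE, "%",
                   True, "Share of planned waypoints actually traversed"),
        MetricSpec("damaged_plant_rate", MetricKind.QUANTITATIVE, "%",
                   False, "Share of plants damaged during traversal"),
    ]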

Based on the defined evaluation metrics, we will provide customers with a set of requirements for selecting the most relevant datasets and models to be used for testing (including the possible setup of ad hoc simulation environments). Such requirements can, if desired, subsequently serve as input to Services S00176 and S00177, depending on the customer's priorities.
It is also possible to use the output of one or both of these services as input to S00178. The key point is that, before proceeding to the testing phase, the three main elements needed for testing (environment, protocol, and metrics) must all be defined. However, since these elements are interconnected, the order in which they are defined is not fixed.

Service S00178 will also provide insights into which model baselines to consider as benchmarks and which features should be modelled for conducting the test: e.g., a dataset collected in the context of intra-row navigation tasks providing annotations of optimal trajectories, a simulation environment replicating specific plant layouts, simulation of depth and inertial measurement unit (IMU) data, and so forth.
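
For instance, when a dataset annotates optimal trajectories, one candidate quantitative metric is the mean cross-track error of the executed path with respect to the annotated optimum. A minimal sketch (Python with NumPy; the function name and the assumption of 2-D poses in a shared map frame are ours, for illustration only):

    import numpy as np

    def cross_track_error(executed: np.ndarray, optimal: np.ndarray) -> float:
        """Mean distance (in metres) from each executed pose to the
        nearest point of the annotated optimal trajectory. Both inputs
        are (N, 2) arrays of x/y positions in the same map frame."""
        # Pairwise distances between executed poses and optimal points
        d = np.linalg.norm(executed[:, None, :] - optimal[None, :, :], axis=-1)
        # For each executed pose, keep the nearest optimal point, then average
        return float(d.min(axis=1).mean())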

Example service: The customer is interested in testing an intra-row navigation solution. To achieve this goal, we define the performance objectives with them: for instance, maximising the success rate in terms of traversed waypoints and minimising the percentage of damaged plants. Subsequently, suitable performance metrics are defined to evaluate how well the system attains the chosen objectives. Relevant datasets where these metrics have already been annotated are also identified. In the absence of gold-standard information about optimal metric values to use as ground truth, a dedicated ancillary data collection campaign can be defined, conducted either at physical facilities (e.g., via service S00113) or in simulation (e.g., via service S00183).
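
To make the two example objectives concrete, both can be expressed as simple ratios over a test run. A minimal sketch with illustrative numbers (all names and figures are hypothetical):

    def waypoint_success_rate(traversed: int, planned: int) -> float:
        """Percentage of planned waypoints the platform actually reached."""
        return 100.0 * traversed / planned

    def damaged_plant_rate(damaged: int, encountered: int) -> float:
        """Percentage of encountered plants damaged during the run."""
        return 100.0 * damaged / encountered

    # Example run: 18 of 20 waypoints reached, 3 of 240 plants damaged
    print(waypoint_success_rate(18, 20))  # 90.0
    print(damaged_plant_rate(3, 240))     # 1.25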