Design of testing protocols for digital testing
Summary
Design of testing protocols for digital testing
S00177
Version: 1.11
- arable farming
- horticulture
- viticulture
- test design
- design/documentation; other (including: nothing)
- Location: remote
- Offered by: POLIMI; UNIMI
Description
Any testing activity involves three main components: the environment (where the tests take place), the protocol (defining which tests are executed and how), and the evaluation metrics (used to assess the results of the tests).
In the context of testing customers’ solutions within digital environments (e.g., defined via Service S00176), this service is targeted at designing a suitable protocol for digital testing based on the considered use cases. Components of the testing protocol defined in this phase can include:
- selection of datasets to be used for testing
- selection of reference AI models to be used for testing (if needed)
- description of data formats and metadata standards
- definition of data pre-processing and preparation steps
- definition of parameter values and ranges
- complete description of protocol specifics to ensure reproducibility
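As an illustration, the protocol components listed above could be captured in a machine-readable specification, which also supports the reproducibility requirement. The sketch below is one possible encoding; all field names and values are hypothetical, not part of the service definition:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class TestingProtocol:
    # Hypothetical protocol specification; field names are illustrative only.
    datasets: list[str]                      # datasets selected for testing
    reference_models: list[str]              # reference AI models (if needed)
    data_format: str                         # data format / metadata standard
    preprocessing_steps: list[str]           # ordered pre-processing steps
    parameters: dict[str, list[float]] = field(default_factory=dict)  # parameter ranges
    repetitions: int = 10                    # repeated runs per method

# Example instance for a weed/crop discrimination use case (values invented).
protocol = TestingProtocol(
    datasets=["weed_crop_images_v1"],
    reference_models=["baseline_cnn"],
    data_format="COCO-style annotations",
    preprocessing_steps=["resize_512", "normalize"],
    parameters={"learning_rate": [1e-4, 1e-3]},
)
print(asdict(protocol))  # fully serialisable, so it can be archived with the test report
```

Encoding the protocol as plain data (rather than prose only) makes it easy to version, compare, and attach to test results.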
Example service: In the context of testing the capability of a Computer Vision model to discriminate weeds from crops, a list of candidate methods already widely applied on the market is defined, to serve as benchmarks against the customer’s existing solution. Incremental variations of these models are also identified (e.g., values and ranges of model parameters, isolation of individual sub-modules during training and fine-tuning, different optimization functions, etc.). Based on model size and performance on different datasets, estimates are made of the minimum number of examples required per weed and crop class for training the models. A list of comparable datasets available within the consortium and/or publicly is also compiled, to reduce model training costs and reuse existing datasets wherever possible. In the defined experimental protocol, each method is run 10 times to collect mean and standard deviation values for all evaluation metrics (e.g., defined via Service S00178).
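The repeated-runs step of the example protocol can be sketched as follows. The `evaluate_model` function is a stand-in for one complete test run of a candidate method, not an actual evaluation from the service:

```python
import random
import statistics

def evaluate_model(seed: int) -> float:
    # Stand-in for one full test run of a candidate method;
    # returns a single evaluation metric (e.g., accuracy).
    rng = random.Random(seed)
    return 0.90 + rng.uniform(-0.02, 0.02)

# Run the method 10 times, as the protocol prescribes, and report
# the mean and standard deviation of the collected metric values.
scores = [evaluate_model(seed) for seed in range(10)]
mean = statistics.mean(scores)
std = statistics.stdev(scores)
print(f"accuracy: {mean:.3f} +/- {std:.3f}")
```

Reporting mean and standard deviation over repeated runs, rather than a single score, is what makes comparisons between the customer's solution and the benchmark methods statistically meaningful.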