AI technology is revolutionizing industries and changing how we work and live. As AI systems become integrated into safety-critical industries such as aerospace, automotive, and healthcare (Figure 1), they are making decisions that directly impact human safety and welfare. This has created a growing need for rigorous processes to explain, verify, and validate model behavior.
In the context of AI certification, verification and validation (V&V) techniques help identify and mitigate risks by demonstrating that AI models and AI-driven systems adhere to industry standards and regulations.
Traditional V&V workflows, such as the V-cycle, may not be sufficient for ensuring the accuracy and reliability of AI models. Adaptations of these workflows emerged to better suit AI applications, such as the W-shaped development process (Figure 2).
Figure 2. W-shaped development process. Based on an original diagram published by the European Union Aviation Safety Agency (EASA). (Image credit: EASA)
The following sections will guide you through the V&V steps of the W-shaped development process.
To demonstrate a practical case for this process, this white paper will walk you through the development of a medical AI system designed to identify whether a patient is suffering from pneumonia by examining chest X-ray images. The following case study highlights the strengths and challenges of AI in safety-critical applications, showing why the image classification model must be both accurate and robust to prevent harmful misdiagnoses.
The first half of the W-shaped development process helps ensure that AI models meet required standards and perform reliably in real-world applications.
Requirements Allocated to ML Component Management
The first step in the W-cycle is collecting requirements specific to the machine learning component. Key considerations include implementation, testing, and explainability of the model. Requirements Toolbox™ facilitates requirement authoring, linking, and validation (Figure 3).
Figure 3. The Requirements Editor app captures requirements for the machine learning component.
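Requirement sets can also be authored programmatically with the Requirements Toolbox `slreq` API. The sketch below is illustrative only; the set name, IDs, and requirement text are hypothetical, not taken from the white paper.

```matlab
% Hypothetical sketch: authoring ML component requirements with the slreq API.
reqSet = slreq.new("PneumoniaDetectorReqs");   % create a new requirement set
add(reqSet, Id="ML-01", ...
    Summary="Classification accuracy", ...
    Description="The model shall achieve at least 90% accuracy on the test set.");
add(reqSet, Id="ML-02", ...
    Summary="Robustness", ...
    Description="Predictions shall be stable under small L-infinity input perturbations.");
save(reqSet);                                  % persist the requirement set file
```

Requirements created this way appear in the Requirements Editor app, where they can be linked to implementation and test artifacts later in the workflow.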
Data Management
The next step in the W-cycle is data management, which is crucial for supervised learning since it requires labeled data. MATLAB® provides labeling apps such as Image Labeler and Signal Labeler for interactive and automated labeling. By using imageDatastore, which organizes image files for scalable access, you can manage large image data sets, such as the chest X-ray training images for pneumonia detection:
trainingDataFolder = "pneumoniamnist\Train";
imdsTrain = imageDatastore(trainingDataFolder, ...
    IncludeSubfolders=true,LabelSource="foldernames");
countEachLabel(imdsTrain)
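A validation split is typically held out from the labeled training data before training begins. One possible way to do this (the 90/10 ratio is an illustrative choice, not from the paper):

```matlab
% Illustrative sketch: hold out 10% of each class for validation.
[imdsTrain,imdsVal] = splitEachLabel(imdsTrain,0.9,"randomized");
```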
Learning Process Management
Prior to training, it is essential to finalize the network architecture and training options, including the algorithm, loss function, and hyperparameters. The Deep Network Designer app allows interactive design and visualization of networks. The following code defines the architecture of a convolutional neural network (CNN) for image classification:
numClasses = numel(classNames);
layers = [
imageInputLayer(imageSize,Normalization="none")
convolution2dLayer(7,64,Padding=0)
batchNormalizationLayer()
reluLayer()
dropoutLayer(0.5)
averagePooling2dLayer(2,Stride=2)
convolution2dLayer(7,128,Padding=0)
batchNormalizationLayer()
reluLayer()
dropoutLayer(0.5)
averagePooling2dLayer(2,Stride=2)
fullyConnectedLayer(numClasses)
softmaxLayer];
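Alongside the architecture, the training options specify the algorithm, loss function, and hyperparameters. The values below are illustrative starting points only, not the tuned values found later with Experiment Manager, and a held-out validation datastore `imdsVal` is assumed to exist:

```matlab
% Illustrative training configuration; hyperparameter values are placeholders.
options = trainingOptions("adam", ...
    InitialLearnRate=1e-3, ...
    MaxEpochs=20, ...
    MiniBatchSize=128, ...
    ValidationData=imdsVal, ...        % assumes a held-out validation datastore
    Shuffle="every-epoch", ...
    Plots="training-progress");
net = trainnet(imdsTrain,layers,"crossentropy",options);
```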
Finding optimal hyperparameters can be complex, but the Experiment Manager app helps by exploring different values through sweeping or Bayesian optimization (Figure 4). Multiple training configurations can be tested in parallel, leveraging available hardware to streamline the process.
Figure 4. Setting up the problem in the Experiment Manager app to find an optimal set of hyperparameters from the exported architecture in Deep Network Designer.
Model Training and Initial Validation
The training phase begins by running experiments in the Experiment Manager app, yielding an initial model with promising accuracy (~96% on the validation set). However, it does not fully meet all predefined requirements, such as robustness. Since the W-cycle is iterative, further refinements are necessary.
Testing and Understanding Model Performance
The model was trained using fast gradient sign method (FGSM) adversarial training to enhance robustness against adversarial examples. It achieved over 90% accuracy, surpassing predefined requirements and benchmarks. To better understand its performance, a confusion matrix was used to analyze error patterns, while explainability techniques like Grad-CAM (Figure 6) provided visual insights that improve interpretability and trust in its decisions.
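The core of FGSM is a single gradient step on the input: the image is perturbed in the direction of the sign of the loss gradient. A minimal sketch of generating one FGSM example is shown below; the function names and the epsilon value are illustrative, not the paper's training code:

```matlab
% Hypothetical sketch of one FGSM adversarial example.
% X is a dlarray image, T the one-hot target, epsilon the perturbation budget.
function Xadv = fgsmExample(net,X,T,epsilon)
    [~,gradX] = dlfeval(@modelLoss,net,X,T);
    Xadv = X + epsilon*sign(gradX);      % FGSM step: move along the gradient sign
end

function [loss,gradX] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = crossentropy(Y,T);
    gradX = dlgradient(loss,X);          % gradient of the loss w.r.t. the input
end
```

During adversarial training, examples generated this way are mixed into each mini-batch so the network learns to classify them correctly.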
Adversarial Examples
Adversarial examples are small, imperceptible changes to inputs that can cause neural networks to misclassify, raising concerns about robustness in safety-critical tasks like medical imaging (Figure 7).
L-Infinity Norm
The L-infinity norm is used to understand and quantify adversarial perturbations (Figure 8). It bounds how much any individual pixel value can be altered. Even a small bound admits an enormous number of possible perturbed images, making it infeasible to test every scenario exhaustively.
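In code, the L-infinity norm of a perturbation is simply the largest absolute per-pixel change. The variable names below are illustrative:

```matlab
% Illustrative check that an adversarial image stays within an L-infinity budget.
perturbation = abs(Xadv - X);               % per-pixel change
linfNorm = max(perturbation,[],"all");      % largest single-pixel change
assert(linfNorm <= epsilon)                 % must stay within the budget
```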
Formal Verification of Robustness
Formal verification methods provide a systematic, mathematical way to assess the robustness of neural networks against entire sets of potential adversarial examples. The Deep Learning Toolbox™ Verification Library provides formal verification methods, such as abstract interpretation. Given an image from the test set, you can choose a perturbation that defines a large collection of perturbed versions of that image (Figure 9).
There are three potential outcomes for each of the images:
- Verified—The output label remains consistent.
- Violated—The output label changes.
- Unproven—Further verification or model improvement is needed.
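The three outcomes above correspond to the categories returned by the library's verifyNetworkRobustness function. A possible sketch, where XTest, TTest, and epsilon are placeholders:

```matlab
% Illustrative sketch of formal robustness verification with the
% Deep Learning Toolbox Verification Library.
XLower = XTest - epsilon;     % lower bound of the L-infinity perturbation set
XUpper = XTest + epsilon;     % upper bound of the L-infinity perturbation set
result = verifyNetworkRobustness(net,XLower,XUpper,TTest);
summary(result)               % counts of verified, violated, and unproven images
```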
Out-of-Distribution Detection
A trustworthy AI system should produce accurate predictions in a known context, but it should also identify inputs that are unknown to the model and reject them or defer them to a human expert for safe handling. With the Deep Learning Toolbox Verification Library, you can create an out-of-distribution (OOD) data discriminator to assign confidence to network predictions by computing a distribution confidence score for each observation (Figure 10). The discriminator also provides a threshold for separating the in-distribution data from the OOD data.
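A possible way to construct such a discriminator is sketched below; the "baseline" method choice and the variable names (XTrain, XNew) are illustrative placeholders:

```matlab
% Illustrative OOD discriminator built from the training data.
discriminator = networkDistributionDiscriminator(net,XTrain,[],"baseline");
scores = distributionScores(discriminator,XNew);    % per-observation confidence
tf = isInNetworkDistribution(discriminator,XNew);   % true if within distribution
```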
Once the learning process is verified, the focus shifts to adapting AI models for real-world applications. These last steps of the W-shaped development process involve preparing the model for deployment and ensuring it meets operational requirements.
Model Implementation and Code Generation
The transition from learning process verification to model implementation in the W-shaped development workflow is when an AI model moves from refinement to real-world application. Code generation with MATLAB and Simulink® automates the conversion of trained models into deployable code (e.g., C/C++ or CUDA®; see Figure 11), reducing manual coding effort and minimizing errors.
You can use the analyzeNetworkForCodegen function in MATLAB to verify if your deep learning model is ready for code generation. This ensures compatibility with target libraries and, for safety-critical applications, allows you to generate code without third-party dependencies. Automatic code generation simplifies certification, improves portability, and enables reliable deployment across diverse platforms.
analyzeNetworkForCodegen(net)

                   Supported
                   _________

    none             "Yes"
    arm-compute      "Yes"
    mkldnn           "Yes"
    cudnn            "Yes"
    tensorrt         "Yes"
When deployment requires optimizing memory, fixed-point arithmetic, or computational efficiency, the Deep Learning Toolbox Model Quantization Library is highly effective. Techniques such as quantization and pruning can significantly reduce model size and computational load—for example, compressing a model by 4x with only a 0.7% drop in accuracy when converting from floating point to int8 using the Deep Network Quantizer app (Figure 12).
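The quantization workflow can also be scripted with the dlquantizer object. The calibration datastore name below is an illustrative placeholder:

```matlab
% Illustrative sketch of int8 quantization with the Deep Learning Toolbox
% Model Quantization Library.
quantObj = dlquantizer(net,ExecutionEnvironment="GPU");
calibrate(quantObj,imdsCalibration);   % collect dynamic ranges on sample data
qNet = quantize(quantObj);             % int8 version of the network
save("quantObj.mat","quantObj")        % reused later during code generation
```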
With MATLAB Coder™ and GPU Coder™, you can generate C++ and CUDA code to deploy AI models on real-time systems where speed and low latency are critical. This involves configuring the target language and deep learning settings, such as using cuDNN for GPU acceleration.
cfg = coder.gpuConfig("mex");
cfg.TargetLang = "C++";
cfg.GpuConfig.ComputeCapability = "6.1";
cfg.DeepLearningConfig = coder.DeepLearningConfig("cudnn");
cfg.DeepLearningConfig.AutoTuning = true;
cfg.DeepLearningConfig.CalibrationResultFile = "quantObj.mat";
cfg.DeepLearningConfig.DataType = "int8";
input = ones(inputSize,"int8");
codegen -config cfg -args {input} predictCodegen -report
Inference Model Verification and Integration
The inference model verification and integration phase ensures that an AI model, such as one used for pneumonia detection, performs reliably on new, unseen data and integrates well into a larger healthcare system.
After converting the model to C++ and CUDA, this phase verifies its accuracy and embeds it within a comprehensive system, alongside components for runtime monitoring, data acquisition, and visualization. By simulating the system in Simulink, you can verify that the model operates effectively in real time and maintains performance within the broader system (Figure 13).
The run-time monitor can help distinguish between familiar and unfamiliar inputs (Figure 14). It signals confident predictions in green when data matches the training distribution and flags potential anomalies in red for OOD cases. This capability enhances the AI system’s safety and reliability by ensuring it not only makes accurate predictions but also identifies and appropriately handles unfamiliar data.
At this stage, implementing a comprehensive testing strategy is essential. Using MATLAB Test™ or Simulink Test™, you can create automated tests to thoroughly validate the AI model’s accuracy, performance, and integration within the overall system.
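Such tests are typically written as class-based unit tests. The sketch below is illustrative: the class name, the saved artifact files, and the 90% threshold are hypothetical placeholders, not the paper's actual test suite:

```matlab
% Hypothetical class-based test that could be run with MATLAB Test.
classdef PneumoniaModelTest < matlab.unittest.TestCase
    methods (Test)
        function accuracyMeetsRequirement(testCase)
            load("trainedNet.mat","net")              % hypothetical saved model
            load("testData.mat","XTest","TTest")      % hypothetical test data
            scores = minibatchpredict(net,XTest);
            YPred = scores2label(scores,categories(TTest));
            accuracy = mean(YPred == TTest);
            testCase.verifyGreaterThanOrEqual(accuracy,0.9);  % accuracy requirement
        end
    end
end
```

Each such test can then be linked back to the requirement it verifies, which supports the traceability described in the requirements verification phase.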
Independent Data and Learning Verification
The independent data and learning verification phase ensures that training, validation, and test data sets are properly managed, complete, and representative of the application’s input space. It involves an independent review after the inference model is verified on the target platform. This phase also confirms that learning verification, including coverage analysis, has been satisfactorily completed.
Requirements Verification
The requirements verification phase concludes the W-shaped development process by ensuring all requirements are fully implemented and tested. Using Requirements Toolbox™, functions and tests are linked to their corresponding requirements, closing the development loop. Running these tests from the Requirements Editor verifies that all requirements have been successfully met (Figure 15).
After requirements verification, the W-shaped development process is complete. In this medical device example, the thorough and meticulous steps of this process have ensured the AI model for pneumonia detection is accurate, robust, and ready for deployment. By linking requirements to specific functions and tests, you will have established clear traceability and systematically verified each requirement, confirming the model meets the stringent standards for healthcare applications. Now, a reliable tool can be deployed to support improved patient care.