Verification and Validation for AI with MATLAB and Simulink

V&V Workflows: The W-Shaped Development Process

AI technology is revolutionizing industries and changing how we work and live. As AI systems become integrated into safety-critical industries such as aerospace, automotive, and healthcare (Figure 1), they are making decisions that directly impact human safety and welfare. This has created a growing need for rigorous verification and validation processes to explain, verify, and validate model behavior.

Representative symbols of each industry: a stop sign for automotive, an airplane for aerospace, and a chest X-ray for medical.

Figure 1. Image classification networks are used in the automotive, aerospace, and medical industries.

In the context of AI certification, verification and validation (V&V) techniques help identify and mitigate risks by demonstrating that AI models and AI-driven systems adhere to industry standards and regulations.

Traditional V&V workflows, such as the V-cycle, may not be sufficient for ensuring the accuracy and reliability of AI models. Adaptations of these workflows have emerged to better suit AI applications, such as the W-shaped development process (Figure 2).

Adaptation to the V-model of AI/machine learning system development, showing stages from requirements to verification.

Figure 2. W-shaped development process. Based on an original diagram published by the European Union Aviation Safety Agency (EASA). (Image credit: EASA)

The following sections will guide you through the V&V steps of the W-shaped development process. For a closer look at this process, see the resources below.

Implementing the W-Shaped Process: A Medical Case Study

To demonstrate this process in practice, this white paper walks you through the development of a medical AI system designed to identify, from chest X-ray images, whether a patient has pneumonia. This case study highlights the strengths and challenges of AI in safety-critical applications, showing why the image classification model must be both accurate and robust to prevent harmful misdiagnoses.

From Requirements to Robust Modeling

The first half of the W-shaped development process helps ensure that AI models meet required standards and perform reliably in real-world applications.

Requirements Allocated to ML Component Management

The first step in the W-cycle is collecting requirements specific to the machine learning component. Key considerations include implementation, testing, and explainability of the model. Requirements Toolbox™ facilitates requirement authoring, linking, and validation (Figure 3).

The Requirements Editor app showing a test precision requirement for a machine learning model.

Figure 3. The Requirements Editor app captures requirements for the machine learning component.
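
Requirements can also be authored programmatically with the slreq API. The following is a minimal sketch; the requirement set name, identifier, and requirement text are hypothetical:

rs = slreq.new("PneumoniaMLRequirements");   % create a new requirement set
add(rs, Id="ML-REQ-1", Summary="Test precision", ...
    Description="The classifier shall achieve at least 90% precision on the test set.");
save(rs);                                    % write the requirement set to disk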

Data Management

The next step in the W-cycle is data management, which is crucial for supervised learning because it requires labeled data. MATLAB® provides labeling apps, such as Image Labeler and Signal Labeler, for interactive and automated labeling. You can manage large image data sets with imageDatastore, which organizes image files for scalable access. For example, the following code sets up the chest X-ray training data for pneumonia detection:

% Point a datastore at the training images; the folder names provide the labels.
trainingDataFolder = "pneumoniamnist\Train";
imdsTrain = imageDatastore(trainingDataFolder,IncludeSubfolders=true,LabelSource="foldernames");
countEachLabel(imdsTrain)   % inspect the class balance of the labeled data set

Learning Process Management

Prior to training, it is essential to finalize the network architecture and training options, including the algorithm, loss function, and hyperparameters. The Deep Network Designer app allows interactive design and visualization of networks. The following code defines the architecture of a convolutional neural network (CNN) for image classification:

classNames = categories(imdsTrain.Labels);   % class names from the labeled datastore
imageSize = [28 28 1];                       % assumes 28-by-28 grayscale PneumoniaMNIST images
numClasses = numel(classNames);
layers = [
      imageInputLayer(imageSize,Normalization="none")
      convolution2dLayer(7,64,Padding=0)     % 7-by-7 convolution with 64 filters
      batchNormalizationLayer()
      reluLayer()
      dropoutLayer(0.5)                      % regularization against overfitting
      averagePooling2dLayer(2,Stride=2)
      convolution2dLayer(7,128,Padding=0)
      batchNormalizationLayer()
      reluLayer()
      dropoutLayer(0.5)
      averagePooling2dLayer(2,Stride=2)
      fullyConnectedLayer(numClasses)        % one output per class
      softmaxLayer];
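
Once the options are chosen, training can be launched in a few lines. The following is a minimal sketch, assuming a held-out validation datastore named imdsValidation (a hypothetical name); the option values are illustrative placeholders rather than tuned settings:

options = trainingOptions("adam", ...
    InitialLearnRate=1e-3, ...
    MaxEpochs=20, ...
    ValidationData=imdsValidation, ...
    Plots="training-progress");
net = trainnet(imdsTrain,layers,"crossentropy",options);   % cross-entropy loss for classification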

Finding optimal hyperparameters can be complex, but the Experiment Manager app helps by exploring different values through sweeping or Bayesian optimization (Figure 4). Multiple training configurations can be tested in parallel, leveraging available hardware to streamline the process.

Screenshots of hyperparameter sweep setup and CNN layer configuration in the Experiment Manager app.

Figure 4. Setting up the problem in the Experiment Manager app to find an optimal set of hyperparameters from the exported architecture in Deep Network Designer.
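
Behind the scenes, each experiment is defined by a setup function that maps one row of the hyperparameter table to training inputs. The following is a hypothetical sketch for a sweep over a learning rate named LearnRate; the helper pneumoniaCNN stands in for the layer array defined earlier:

function [imdsTrain,layers,options] = PneumoniaExperiment_setup(params)
% Experiment Manager calls this once per trial; params holds that trial's
% hyperparameter values (here, a hypothetical LearnRate field).
imdsTrain = imageDatastore("pneumoniamnist\Train", ...
    IncludeSubfolders=true,LabelSource="foldernames");
layers = pneumoniaCNN(numel(categories(imdsTrain.Labels)));  % hypothetical helper building the CNN above
options = trainingOptions("adam", ...
    InitialLearnRate=params.LearnRate, ...   % value swept by Experiment Manager
    MaxEpochs=20, ...
    Verbose=false);
end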

Model Training and Initial Validation

The training phase begins by running experiments in the Experiment Manager app, yielding an initial model with promising accuracy (~96% on the validation set). However, it does not fully meet all predefined requirements, such as robustness. Since the W-cycle is iterative, further refinements are necessary.

Experiment Manager screenshot showing CNN hyperparameter results and a confusion matrix for pneumonia detection.

Figure 5. Finding an initial model with the Experiment Manager app.

Learning Process Verification

Ensuring AI models meet specified requirements is important, especially in safety-critical applications. The next steps of the W-shaped development process involve implementing verification techniques to confirm model performance aligns with expectations.

Testing and Understanding Model Performance

The model was trained using fast gradient sign method (FGSM) adversarial training to enhance robustness against adversarial examples. It achieved over 90% accuracy, surpassing the predefined requirements and benchmarks. To better understand its performance, a confusion matrix was used to analyze error patterns, while explainability techniques such as Grad-CAM (Figure 6) provided visual insights that improved interpretability and trust in its decisions.

A Grad-CAM heatmap highlights predicted pneumonia-relevant regions from a chest X-ray.

Figure 6. Understanding network predictions using gradient-weighted class activation mapping (Grad-CAM).
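
The heatmap in Figure 6 can be produced with the gradCAM function in Deep Learning Toolbox. The following is a minimal sketch, assuming net is the trained dlnetwork, img is a single test image, and classNames is the class list defined earlier:

scores = minibatchpredict(net,single(img));           % class scores for one image
predictedLabel = scores2label(scores,classNames);     % most likely class
scoreMap = gradCAM(net,single(img),predictedLabel);   % gradient-weighted class activation map
imshow(img)
hold on
imagesc(scoreMap,AlphaData=0.5)                       % overlay the heatmap on the X-ray
colormap jet
hold off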

Adversarial Examples

Adversarial examples are inputs altered by small, often imperceptible perturbations that cause a neural network to misclassify them, raising concerns about robustness in safety-critical tasks like medical imaging (Figure 7).

The original X-ray image of lungs with pneumonia is misclassified as normal after adding subtle adversarial noise.

Figure 7. Adversarial examples: The effect of input perturbation on image classification.
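
For instance, FGSM nudges each pixel in the direction of the sign of the loss gradient with respect to the input. The following is a minimal sketch, assuming net is a dlnetwork, img is a single normalized image, and T is its one-hot encoded true label; the epsilon value is a hypothetical perturbation budget:

epsilon = 0.05;                              % hypothetical L-infinity budget
X = dlarray(single(img),"SSCB");             % spatial-spatial-channel-batch format
[~,gradX] = dlfeval(@lossGradient,net,X,T);
XAdv = X + epsilon.*sign(gradX);             % FGSM perturbation step
XAdv = min(max(XAdv,0),1);                   % keep pixel values in the valid range

function [loss,gradX] = lossGradient(net,X,T)
    Y = forward(net,X);
    loss = crossentropy(Y,T);
    gradX = dlgradient(loss,X);              % gradient of the loss w.r.t. the input
end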

L-Infinity Norm

The L-infinity norm is used to understand and quantify adversarial perturbations (Figure 8). It bounds the largest change to any single pixel: a perturbation budget of ε means every pixel value may be shifted independently by up to ±ε. Even a small budget therefore defines an enormous set of possible perturbed images, making it infeasible to evaluate every scenario by testing alone.

Zoomed-in view of an X-ray with matrices showing stages of pixel perturbation in a neural network.

Figure 8. The L-infinity norm: Examples of possible input perturbations.

Formal Verification of Robustness

Formal verification methods offer a mathematical, systematic way to assess and ensure the robustness of neural networks against whole families of potential adversarial examples. The Deep Learning Toolbox™ Verification Library provides formal verification methods, such as abstract interpretation. Given an image from the test set, you can specify a perturbation bound that defines the entire collection of perturbed versions of that image (Figure 9).

Workflow showing an X-ray image input, perturbation set, model, and output labels: verified, unproven, or violated.

Figure 9. Formal verification using abstract interpretation.

There are three potential outcomes for each image, as the sketch after this list shows:

  • Verified—The output label remains consistent across the entire perturbation set.
  • Violated—The output label changes for at least one perturbed input.
  • Unproven—The result is inconclusive; further verification effort or model improvement is needed.
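
The following is a minimal sketch using the verifyNetworkRobustness function from the library; XTest is assumed to be a formatted dlarray of test images with true labels TTest, and the perturbation bound is a hypothetical value:

perturbation = 0.01;                                  % hypothetical L-infinity bound
XLower = XTest - perturbation;                        % lower bound of each perturbed set
XUpper = XTest + perturbation;                        % upper bound of each perturbed set
result = verifyNetworkRobustness(net,XLower,XUpper,TTest);
summary(result)                                       % counts of verified, violated, and unproven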

Out-of-Distribution Detection

A trustworthy AI system should produce accurate predictions in a known context, but it should also be able to identify inputs that are unknown to the model and reject them or defer them to a human expert for safe handling. With the Deep Learning Toolbox Verification Library, you can create an out-of-distribution (OOD) data discriminator that assigns confidence to network predictions by computing a distribution confidence score for each observation (Figure 10). The discriminator also provides a threshold for separating in-distribution data from OOD data.

Histogram showing confidence score distributions for training data and various perturbations.

Figure 10. Distribution of confidence scores for the original and derived data sets.
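
The following is a minimal sketch, assuming XTrain holds the training images and XTest a batch of incoming images; the "baseline" method scores each observation using the network's maximum softmax probability:

discriminator = networkDistributionDiscriminator(net,XTrain,[],"baseline");
scores = distributionScores(discriminator,XTest);     % confidence score per observation
tf = isInNetworkDistribution(discriminator,XTest);    % true where the score exceeds the threshold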

From Model Implementation to Requirements Validation

Once the learning process is verified, the focus shifts to adapting AI models for real-world applications. These last steps of the W-shaped development process involve preparing the model for deployment and ensuring it meets operational requirements.

Model Implementation and Code Generation

The transition from learning process verification to model implementation in the W-shaped development workflow is when an AI model moves from refinement to real-world application. Code generation with MATLAB and Simulink® automates the conversion of trained models into deployable code (e.g., C/C++ or CUDA®; see Figure 11), reducing manual coding effort and minimizing errors.

Diagram of code generation from MATLAB models to CPU, GPU, microcontroller, and FPGA targets.

Figure 11. MATLAB and Simulink code generation tools.

You can use the analyzeNetworkForCodegen function in MATLAB to check whether your deep learning model is ready for code generation. This confirms compatibility with target libraries and, for safety-critical applications, lets you generate code without third-party dependencies. Automatic code generation simplifies certification, improves portability, and enables reliable deployment across diverse platforms.

analyzeNetworkForCodegen(net)

                  Supported
                  _________
    none           "Yes"
    arm-compute    "Yes"
    mkldnn         "Yes"
    cudnn          "Yes"
    tensorrt       "Yes"

When deployment requires optimizing memory, fixed-point arithmetic, or computational efficiency, the Deep Learning Toolbox Model Quantization Library is highly effective. Techniques such as quantization and pruning can significantly reduce model size and computational load—for example, compressing a model by 4x with only a 0.7% drop in accuracy when converting from floating point to int8 using the Deep Network Quantizer app (Figure 12).

A screenshot of the Deep Network Quantizer app showing dynamic range stats and validation results for a neural network.

Figure 12. Quantizing a deep neural network using the Deep Network Quantizer app.
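
The same int8 workflow can be sketched programmatically with the dlquantizer object; the calibration and validation datastore names here are hypothetical:

quantObj = dlquantizer(net,ExecutionEnvironment="GPU");   % target int8 GPU inference
calibrate(quantObj,imdsCalibration);                      % collect dynamic ranges of weights and activations
valResults = validate(quantObj,imdsValidation);           % check accuracy after quantization
save("quantObj.mat","quantObj");                          % calibration results consumed at code generation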

With MATLAB Coder™ and GPU Coder™, you can generate C++ and CUDA code to deploy AI models on real-time systems where speed and low latency are critical. This involves configuring the target language and deep learning settings, such as using cuDNN for GPU acceleration.

cfg = coder.gpuConfig("mex");                             % generate a MEX function for in-MATLAB testing
cfg.TargetLang = "C++";
cfg.GpuConfig.ComputeCapability = "6.1";                  % compute capability of the target GPU
cfg.DeepLearningConfig = coder.DeepLearningConfig("cudnn");
cfg.DeepLearningConfig.AutoTuning = true;
cfg.DeepLearningConfig.CalibrationResultFile = "quantObj.mat";  % dlquantizer calibration results
cfg.DeepLearningConfig.DataType = "int8";                 % run inference in int8
input = ones(inputSize,"int8");                           % example input that fixes size and type
codegen -config cfg predictCodegen -args {input} -report

Inference Model Verification and Integration

The inference model verification and integration phase ensures that an AI model, such as one used for pneumonia detection, performs reliably on new, unseen data and integrates well into a larger healthcare system. 

After converting the model to C++ and CUDA, this phase verifies its accuracy and embeds it within a comprehensive system, alongside components for run-time monitoring, data acquisition, and visualization. By simulating the system in Simulink, you can verify that the model operates effectively in real time and maintains performance within the broader system (Figure 13).

A diagram of an AI model with run-time monitoring and visualization for trust assessment.

Figure 13. Simulink harness integrating the deep learning model.

The run-time monitor can help distinguish between familiar and unfamiliar inputs (Figure 14). It signals confident predictions in green when data matches the training distribution and flags potential anomalies in red for OOD cases. This capability enhances the AI system’s safety and reliability by ensuring it not only makes accurate predictions but also identifies and appropriately handles unfamiliar data.

Two X-rays showing correct and incorrect pneumonia predictions with confidence scores.

Figure 14. Examples of the output of the run-time monitor subsystem.

At this stage, implementing a comprehensive testing strategy is essential. Using MATLAB Test™ or Simulink Test™, you can create automated tests to thoroughly validate the AI model’s accuracy, performance, and integration within the overall system.
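
The following is a minimal sketch of one such test, written as a class-based MATLAB unit test; the class name, artifact names, and accuracy threshold are hypothetical:

classdef PneumoniaModelTest < matlab.unittest.TestCase
    methods (Test)
        function accuracyMeetsRequirement(testCase)
            load("trainedNet.mat","net");          % hypothetical trained-model artifact
            imdsTest = imageDatastore("pneumoniamnist\Test", ...
                IncludeSubfolders=true,LabelSource="foldernames");
            scores = minibatchpredict(net,imdsTest);
            predicted = scores2label(scores,categories(imdsTest.Labels));
            accuracy = mean(predicted == imdsTest.Labels);
            testCase.verifyGreaterThanOrEqual(accuracy,0.90);   % accuracy requirement
        end
    end
end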

Independent Data and Learning Verification

The independent data and learning verification phase ensures that training, validation, and test data sets are properly managed, complete, and representative of the application’s input space. It involves an independent review after the inference model is verified on the target platform. This phase also confirms that learning verification, including coverage analysis, has been satisfactorily completed.

Requirements Verification

The requirements verification phase concludes the W-shaped development process by ensuring all requirements are fully implemented and tested. Using Requirements Toolbox, functions and tests are linked to their corresponding requirements, closing the development loop. Running these tests from the Requirements Editor verifies that all requirements have been successfully met (Figure 15).

Figure 15. Running tests from within Requirements Editor.

Development Process Conclusion

After requirements verification, the W-shaped development process is complete. In this medical device example, the thorough steps of this process have ensured that the AI model for pneumonia detection is accurate, robust, and ready for deployment. By linking requirements to specific functions and tests, you establish clear traceability and systematically verify each requirement, confirming that the model meets the stringent standards for healthcare applications. The result is a reliable tool that can be deployed to support improved patient care.