White Paper

8 Steps for Analyzing Manufacturing Data for Better AI Outcomes


Manufacturing data comes in many forms. Some data is amenable to straightforward analysis, such as looking for statistical outliers in a washer hole’s diameter. Other data is more challenging to analyze, such as the recordings a human uses to categorize engines that create undesirable noise.

Machine learning and artificial intelligence (AI) models can help deal with this more challenging data. But sometimes projects end in disappointment, especially when trying to replace human interpretation and categorization. Initial AI results can look promising but don’t hold up over time. This often arises from a disconnect between data analytics and domain expertise, and from confounding signals, which together leave the data poorly understood.

These issues can be mitigated by asking questions such as:

  • Do confounding factors exist in the data?
  • Is the data in a form that is most amenable to the AI model?
  • Does the data contain the information needed to train a machine learning/AI model?
  • Does the input data contain signals that only intermittently correlate with the output data?
  • Does the training data fully cover the operating range of the measurement system?
  • Are there conditions where the training data does not cover the necessary variability?
  • How accurate is my current classification (or regression) process, and how accurate can I expect an AI model to be?

This white paper provides eight best-practice steps to help engineers with limited machine learning/AI experience answer the above questions, understand their raw data better, and thus improve the outcome. It covers the functionality in MATLAB® that can be used to investigate and remedy those issues. The best practices are illustrated in three scenarios featuring audio data, image data, and time-series data.

These eight steps provide approaches to understanding your data and getting the most out of it. The steps are organized into four phases:

  • Understanding the nature of the data
  • Understanding the limitations of collected data
  • Preprocessing data and training models
  • Evaluating results

Understanding the Nature of the Data

1. Gather Expert Opinion

Gather input on the physical mechanisms that lead to the measured characteristic that needs to be detected, as well as any empirically understood characteristics. This can help you devise strategies on how to preprocess the data and how to categorize the data by asking:

  • Does the data need to be normalized using the mean and standard deviation, or will such normalization degrade the signal? Is another normalization method more appropriate?
  • Will frequency analysis or transforming the data using other mathematical relations make it easier for a human to make the categorization, which may also help the AI?
  • Would a statistical method or graphical visualization make the class separation clear for a human, so as to guide the feature selection for training an AI model?
  • What kinds of signal behaviors and signal levels are expected? What assumptions lead to the expectation? How can signal levels be verified with an experiment?
Workflow diagram that links cause, physics, effect, and expected data.

In an ideal-world scenario, the cause of the signal physically manifests as an effect and is recorded in the data without the signal being corrupted by noise or distorted by other unwanted or unintended factors.
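As a concrete illustration of the normalization question above: z-score normalization removes absolute signal level, which may itself be the defect signature. A minimal sketch in plain Python with hypothetical sensor values (the data is invented for illustration):

```python
from statistics import mean, stdev

def zscore(x):
    """Center to zero mean and unit standard deviation."""
    m, s = mean(x), stdev(x)
    return [(v - m) / s for v in x]

def minmax(x, lo=0.0, hi=1.0):
    """Rescale into [lo, hi], preserving relative spacing."""
    xmin, xmax = min(x), max(x)
    return [lo + (hi - lo) * (v - xmin) / (xmax - xmin) for v in x]

# Two hypothetical sensor traces: same shape, different absolute level.
ok_part = [10.0, 10.2, 9.8, 10.1]
bad_part = [20.0, 20.4, 19.6, 20.2]   # defect shows up as doubled signal level

# After z-scoring, the level difference (the defect signature) is gone:
print(zscore(ok_part))
print(zscore(bad_part))   # same as the OK trace after normalization
```

If the defect lives in the absolute level, a normalization that preserves scale (or no normalization at all) may be the better choice; domain experts can usually say which.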

2. Understand Data Collection Assumptions

Understanding what factors may affect the data collection process can help you design data collection and preprocessing strategies that mitigate possible artifacts, which, if left uncontrolled, might make a machine learning/AI approach fail. Some example questions to ask when collecting data include:

  • Is there an assumption that a human operator is detecting defects by audio only when they might also be getting visual cues that you are unaware of?
  • Is there an assumption that a conveyor belt speed is constant, or that a machine calibration is always performed consistently?
  • Is there an assumption that two audio amplifiers will have the same frequency response, or that two cameras will have the same RGB response to the same image scene?
  • Is there an assumption that two operators are operating the machines in the same way, or that temperature has no effect on your data?
A workflow diagram with cause, physics, effect, and actual data blocks, adding confounding variability to physics and collection artifacts to actual data.

In a real-world scenario, confounding variables in the physics and data collection hardware impact how the cause of the signal physically manifests as an effect and then is recorded in the data.


Understanding the Limitations of Collected Data

3. Collect Reproducible Data

Collect data in a way likely to enhance detection of the physical mechanisms underlying the desired measured quantity, while mitigating possible confounding variability as far as practically possible by asking:

  • Is it possible to repeat measurements under the same conditions to verify that data is reproducible?
  • How does signal-level variability compare when a run is repeated?
  • If a process is changed to accommodate the machine learning/AI system, such as with a new data collection protocol, how can you check to make sure the change does not affect the information in the data that is needed to make the prediction?
MATLAB plot with time on the x-axis and signal on the y-axis. Multiple measurements overlap.

Example of multiple measurements taken of the same sample under the same conditions. In this case, the data appears repeatable, within some noise range.
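The repeatability check illustrated above can be quantified by comparing the run-to-run spread at each time point against the signal level. A hedged sketch in plain Python, using invented numbers for the repeated runs:

```python
from statistics import mean, stdev

# Three hypothetical repeated measurements of the same sample.
runs = [
    [1.00, 2.02, 3.01, 2.49, 1.52],
    [0.98, 1.99, 3.03, 2.51, 1.48],
    [1.02, 2.00, 2.98, 2.50, 1.50],
]

# Pointwise mean and standard deviation across the repeated runs.
mean_trace = [mean(col) for col in zip(*runs)]
spread = [stdev(col) for col in zip(*runs)]

# Relative spread: run-to-run variability compared with the signal itself.
rel = [s / abs(m) for s, m in zip(spread, mean_trace)]
print(max(rel))  # if this stays small (e.g., < 5%), the data looks repeatable
```

The 5% threshold is illustrative only; an acceptable spread depends on the size of the effect you need to detect.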

4. Experiment to Check Data

Perform experiments to assess the impact of uncontrolled data collection factors that could affect machine learning/AI training. Test what level of impact these otherwise uncontrolled factors could have, and answer the following:

  • If a variable is not controlled or represented in your data, under the assumption that it will not impact the data, how can you perform an experiment to check that assumption?
  • Is it possible to have a known physical standard sample that is stable to periodically measure on the system to check for system drift, which may impact the machine learning/AI accuracy?
  • How can you use unsupervised learning (clustering) to look for new clusters that appear over time to detect uncontrolled variability that may impact the final model?
  • How can you mitigate the outsized impact of using highly correlated signals by using principal components or other data reduction approaches for dimensionality reduction?
A MATLAB plot, which has time on the x-axis and signal on the y-axis, with greater deviation between measurements than the previous plot.

The same plot as above, but the measurement is repeated with an extremely out-of-range temperature. This extreme temperature data can help an engineer gauge the impact of temperature swings under normal operation. The same approach can be used for EM interference, noise, or other factors, which may combine to have a detrimental impact on a trained AI/machine learning model.
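One way to gauge the outsized impact of highly correlated signals is to compute principal components; for two channels this can be done in closed form from the 2×2 covariance matrix. A simplified sketch in plain Python (the channel data is hypothetical):

```python
import math
from statistics import mean

def pca_2d(x, y):
    """Principal component variances (eigenvalues) of two signals,
    computed in closed form from the 2x2 sample covariance matrix."""
    mx, my = mean(x), mean(y)
    n = len(x) - 1
    a = sum((v - mx) ** 2 for v in x) / n                    # var(x)
    c = sum((v - my) ** 2 for v in y) / n                    # var(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / n   # cov(x, y)
    d = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return (a + c) / 2 + d, (a + c) / 2 - d                  # descending

# Two hypothetical channels measuring nearly the same physical quantity.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.0, 8.1, 9.9]      # roughly y = 2x

l1, l2 = pca_2d(x, y)
print(l1 / (l1 + l2))  # fraction of variance in the first component, near 1.0
```

When one component carries nearly all the variance, the two channels are effectively redundant, and keeping both would give that one underlying signal double weight in training.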


Preprocessing Data and Training Models

5. Data Preprocessing

  1. Ideally, preprocess the data to a state where a human could perform the detection or classification; such preprocessed data is more likely to be suitable for training an accurate machine learning/AI model. Ask how you can use your learnings from steps 1–4 to inform your preprocessing strategy. The easier a trend is for a human to spot in the preprocessed data, the easier it will be for the machine learning/AI model.
  2. If it’s not possible to spot a signal when the experiment conditions are taken to extremes (such as extreme temperatures) where a signal should be evident, the data may not contain the information needed for a machine learning/AI detection or measurement.
  3. Use unsupervised learning to check for signs of expected clustering and to detect unexpected clustering, which may indicate factors that could confound the final model.
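The unsupervised clustering check in point 3 can be as simple as running k-means on a scalar feature and comparing the clusters found against expectation. A toy sketch in plain Python (the feature values are invented; real data would use richer features):

```python
import random

def kmeans_1d(values, k, iters=50, seed=0):
    """Tiny 1-D k-means: returns the sorted cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    for _ in range(iters):
        groups = {i: [] for i in range(k)}
        for v in values:
            i = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[i].append(v)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in groups.items()]
    return sorted(centers)

# Hypothetical feature values: two expected clusters (OK vs. NG parts)...
features = [1.0, 1.1, 0.9, 1.05, 5.0, 5.2, 4.9, 5.1]
print(kmeans_1d(features, 2))   # centers near 1.0 and 5.0, as expected

# ...a third cluster appearing over time would suggest an uncontrolled factor.
```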

6. Training

First, use exploratory training with simple models, using test data to gauge what level of performance is achievable; this sets a baseline. Then optimize by doing the following:

  1. Use quick training sessions to evaluate different machine learning/AI models to assess which models are the best. Use the simplest type of model to start out. Once a model type has been selected, optimize the training options, using the validation data to detect overfitting. The test set can be used to check that the model generalizes well on unseen data.
  2. Use insights from the results of simpler models to inform the best way to approach the implementation of more complex models.
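The baseline-first idea can be sketched with a deliberately simple model: split the data, fit a nearest-centroid classifier, and require any more complex model to beat its test-set score. A minimal plain-Python sketch with hypothetical one-feature data:

```python
import random

def split(data, frac_train=0.6, frac_val=0.2, seed=1):
    """Shuffle and split into train/validation/test sets."""
    d = data[:]
    random.Random(seed).shuffle(d)
    n = len(d)
    a = int(n * frac_train)
    b = a + int(n * frac_val)
    return d[:a], d[a:b], d[b:]

# Hypothetical (feature, label) pairs: OK parts near 1.0, NG parts near 5.0.
data = [(1.0 + 0.1 * i, "OK") for i in range(10)] + \
       [(5.0 + 0.1 * i, "NG") for i in range(10)]
train, val, test = split(data)

# Baseline model: classify by the nearest class centroid of the training data.
centroid = {lbl: sum(x for x, l in train if l == lbl) /
                 sum(1 for _, l in train if l == lbl) for lbl in ("OK", "NG")}
predict = lambda x: min(centroid, key=lambda lbl: abs(x - centroid[lbl]))

accuracy = sum(predict(x) == l for x, l in test) / len(test)
print(accuracy)
```

If a deep network cannot clearly beat a baseline like this on held-out data, the added complexity is not yet earning its keep.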

Evaluating Results

7. Blind Study

Perform a blind study to compare the current best practices for detection/categorization with the new machine learning/AI model.

  1. Use a blind study to compare the machine learning/AI behavior with the current best practice. Ensure that the study is blind for both the legacy approach and the machine learning/AI approach. If the machine learning/AI is taking the place of a human, make sure that the results are blind for the human too.
  2. For human comparisons, take care to control cues from other sources, such as an out-of-sequence serial number or marks or labels that might give extra clues to the human operator. Use appropriate metrics such as precision, recall, confusion matrices, etc.
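Precision, recall, and the confusion-matrix counts are straightforward to compute directly from the blind-study labels. A small plain-Python sketch, treating NG (defective) as the positive class, with invented outcomes:

```python
def confusion(actual, predicted, positive="NG"):
    """Return (tp, fp, fn, tn) counts, treating `positive` as the target class."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    return tp, fp, fn, tn

# Hypothetical blind-study outcomes.
actual    = ["NG", "NG", "NG", "OK", "OK", "OK", "OK", "NG"]
predicted = ["NG", "NG", "OK", "OK", "OK", "NG", "OK", "NG"]

tp, fp, fn, tn = confusion(actual, predicted)
precision = tp / (tp + fp)   # of the parts flagged NG, how many really were
recall    = tp / (tp + fn)   # of the truly NG parts, how many were caught
print(tp, fp, fn, tn, precision, recall)
```

Recall on the NG class is the metric that matters most when the business case demands that nearly all bad parts be caught.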

8. Review, Then Revisit, and Repeat as Needed

  1. Review the results of steps 5–6. If the data cannot be preprocessed to accentuate the features that need to be detected, and the training results of step 6 are inadequate, then:
    1. Revisit steps 1–4 to evaluate the signal levels and the factors masking the features that need to be detected, then develop a better data collection approach, or a better preprocessing approach, to accentuate those features.
    2. Repeat steps 5–7 after revisiting steps 1–4 to decide whether longer-term testing is warranted or further refinement is required.

Example Scenarios

In these three hypothetical scenarios covering audio data, image data, and time-series data, fictional users apply these best practices to their machine learning/AI projects. They use MATLAB in many steps.

Audio Data in Machine Production: Detecting Noisy Drills from a Production Line

Ken leads a team that tests drills at the end of a production line by plugging them in and powering them by hand to listen for unusual noises. Ken wants to automate this process and use an AI model to determine if there is an unusual noise. He wants to reduce costs but must capture >99% of bad drills to make an acceptable business case.

Ken follows the best practices checklist:

  1. Gather expert opinion: Ken consults a colleague who works on vibration minimization. The colleague advises that the noise from a drill will change depending on how it is held, and lends Ken a special rig to mitigate that potential issue and serve as a standardized platform.

  2. Understand data collection assumptions: Ken collects bad drill data during manufacturing line downtime. A colleague notes that the normal data is contaminated with manufacturing machine noise. Ken uses anomaly detection and confirms that machine noise is detectable as an anomaly and so may unduly influence the AI; he updates the data collection strategy to make sure machine noise is equally present in both the good (OK) data with no defects and the no-good (NG) data with defects. He can also see differences in the principal components of the good versus bad drills. To check whether this difference is real, he decides to test for data reproducibility.

  3. Collect reproducible data: Ken collects manufacturing noise to augment his NG data that has no background manufacturing noise. He collects fresh data from his set of bad drills with manufacturing noise present, using the drill rig based on the insights from steps 1 and 2. He also records one good drill and one bad drill multiple times to check for repeatability or changes in the drill noise, as NG drills are normally powered multiple times, whereas OK drills are powered only once.

  4. Experiment to check data: Ken uses the Wavelet Time-Frequency Analyzer app to verify that there is minimal difference between multiple recordings of the same drill, while the difference between good and bad drills remains visible. He tries different approaches for extracting audio features and trains a support vector machine (SVM) classifier that achieves good classification accuracy. He could try to improve the accuracy further by using fitcauto() or the Classification Learner app to evaluate different machine learning models and hyperparameters.

  5. Data preprocessing: Ken standardizes the data processing and uses audioDataAugmenter to add extra variation to his data. Then, he uses the Wavelet Time-Frequency Analyzer app to find the wavelet parameters that work best, and cwt() to process the data in code for training.

  6. Training: Ken uses cvpartition to split the data into training, validation, and test sets. Then, he uses the Experiment Manager app to train convolutional neural networks (CNNs) with different parameters. He experiments with the training parameters of the CNN and with the wavelet transform used to preprocess the data into images. He modifies a pretrained AI using transfer learning to train on the data. He can further verify and debug the AI model’s predictions using visualization and explainability methods.

  7. Blind study: Ken collects fresh data, anonymizes it, and has his team listen to it to classify each drill as good or bad. The team performs worse with the audio data alone, and Ken discovers that they normally also inspect the drills visually, which influences their good/bad decisions.

  8. Review, then revisit, and repeat as needed: Ken finds that the CNN performs as well as his team under the audio-only condition, although the team’s performance improves when they can also inspect the drills visually. Adding visual inspection to the AI is something Ken can consider for future improvements.

Image Data for Medical Devices: Detecting Contaminants in Preloaded Syringes

Jen is contracted to develop an AI to identify contaminants in preloaded insulin syringes. She receives images of defective syringes with notes on the defect type, but cannot identify some defects herself because she lacks the client’s inspection training. The company manufactures large numbers of syringes, so 100% manual inspection is not feasible; capturing such defects automatically will help reduce recall issues.

Jen follows the best practices checklist:

  1. Gather expert opinion: Jen gets the client to explicitly circle the defects on the images. The defects can be particles, scratches, or smears on either the inside or outside of the syringe, so she needs to include the entire needle in each image. Some images have lamp glare, and she advises the client that polarizers are the best way to mitigate it.

  2. Understand data collection assumptions: Jen standardizes the images with rotation, cropping, and normalization. She sends the preprocessed and anonymized images back to the customer. Feedback confirms that the preprocessing has not removed information a trained technician needs for classification.

  3. Collect reproducible data: Jen requests more examples of good images from the client to get a better sense of what is not a defect and to increase the variety of possible images.

  4. Experiment to check data: Jen uses imageDatastore to handle the images. To understand the differences, she uses the Registration Estimator app, image registration, and imsubtract() to overlay the syringes and look for differences. She trains an AI model to detect image anomalies; looking at the anomaly heat map helps her better understand where syringe defects might occur.

  5. Data preprocessing: Jen devises a preprocessing strategy based on the first three steps. She uses createMask() to remove the background (tabletop), which should be excluded when training the AI model. She uses the Image Labeler app to create a boxLabelDatastore for training the AI on the different defect types. She augments the training images using image augmentation to create a larger training set, and bboxwarp() to adjust the bounding boxes for the altered images.

  6. Training: Jen uses a YOLOX object detector to detect the defect types. Following preliminary classification, she observes that one of the classes is misclassified more frequently than the others. She adds more training data for that class and observes that the classification error decreases.

  7. Blind study: Jen packages the AI with App Designer and compiles it using MATLAB Compiler™ so the customer can test it. A MATLAB license is not necessary to run the compiled app.

  8. Review, then revisit, and repeat as needed: The client tests the AI app and sends back examples of misclassified images so Jen can perform a second round of training.

Predicting “Infant Mortality Failure” of Valves on a Gas Turbine

Ben is tasked to use machine learning/AI to predict if an infant mortality failure will occur after his company’s microturbines (MTs) are shipped, based on the preshipping rig test data. He has a lot of rig test data, but only one data set leading up to and during a failure event. Although failures are rare, they have a serious impact on Ben’s company’s customers.

Ben follows the best practices checklist:

  1. Gather expert opinion: Ben learns that the failure appears to be associated with debris events that cause bearing damage in the compressor between 100 and 200 hours of use, but the source of the debris is unknown.

  2. Understand data collection assumptions: A large effort has been made to ensure the rig test is consistent. The remaining variations during data collection, such as ambient temperature, pressure, oil and fuel composition, and the human operators, are less well controlled.

  3. Collect reproducible data: Ben has only one example of data covering a failure, so he creates a digital twin. He starts with the Simulink® gas turbine model, updates it to include bearing behavior with debris, and tunes it with the Parameter Estimator app to mimic the real gas turbine.

  4. Experiment to check data: Ben simulates damage to the bearings with the model, based on the good data and his one example of failure data. In the simulated data, the signals indicating failure 100 hours in advance are predicted to be smaller than the noise in the real data. Ben adds synthetic noise and finds that an extended Kalman filter can still detect the defect trends.

  5. Data preprocessing: Ben uses vibration signals and frequency-RPM map changes to examine the model data, to see how bearing damage will manifest in the overall system and to understand what to look for in the real data. The real data requires transforming some signals from time period (s) to frequency (Hz). Because the data has many collinear signals, Ben uses principal components for data reduction.

  6. Training: Since high-frequency noise over short time periods is expected to obscure the signal, while low-frequency drift over time will reveal the defect, Ben decides to try an LSTM, which can detect the longer-term trend. With extended testing periods, the LSTM appears able to detect the small drift (deterioration) that indicates an issue, based on the model data.

  7. Blind study: Ben trains an LSTM on a larger data set and sets up a blind study using the model data to assess the likely detection ranges on a real system.

  8. Review, then revisit, and repeat as needed: Ben finds he needs to increase the turbine rig test time to detect the small drift associated with failures, and he prepares a cost-benefit analysis to justify the longer test costs.

Ben also uses the anomaly detection tool on the raw good data and identifies that higher ambient temperatures and one particular operator may be more likely to be associated with outliers beyond the anomaly detector threshold, and thus linked to infant mortality events. This requires more investigation.
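Ben’s scenario uses an extended Kalman filter in MATLAB; the underlying idea, smoothing away high-frequency noise so that a slow drift becomes visible, can be illustrated with a basic scalar (linear) Kalman filter in plain Python on synthetic data. The noise levels and drift rate here are invented:

```python
import random

def kalman_1d(measurements, q=1e-4, r=0.25):
    """Scalar Kalman filter with a constant-level model: process noise
    variance q, measurement noise variance r. Returns the state estimates."""
    x, p = measurements[0], 1.0
    est = []
    for z in measurements:
        p = p + q               # predict: uncertainty grows
        k = p / (p + r)         # Kalman gain
        x = x + k * (z - x)     # update with measurement z
        p = (1 - k) * p
        est.append(x)
    return est

# Synthetic signal: slow drift (the "defect") buried in heavy noise.
rng = random.Random(42)
true_drift = [0.001 * t for t in range(2000)]
noisy = [d + rng.gauss(0, 0.5) for d in true_drift]

est = kalman_1d(noisy)
print(est[-1])  # filtered estimate approaches the true final drift (about 2)
```

The small process-noise variance q makes the filter trust its own slow-moving estimate over any single noisy sample, which is what lets the drift emerge; a full extended Kalman filter applies the same idea to a nonlinear system model.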

Learn more with tips that go beyond data cleaning and preprocessing for deep learning and machine learning for signal processing applications.



Machine learning/AI models can eliminate the tedious, error-prone task of manual testing and QA in production environments. Failures in production, while serious, are rare and so root causes may not be well understood.

This can make machine learning/AI solutions appear difficult to implement in a reliable way. However, with an understanding of the data, you can generate good data sets for training.

Using the eight steps above and the example scenarios, you can apply a more systematic approach to data quality and bridge the gap between a machine learning/AI concept and its successful implementation.

MATLAB can help you get there.

A screenshot of a selection of apps from the MATLAB toolstrip: Data Cleaner, Image Labeler, Classification Learner, Signal Analyzer, Wavelet Time-Frequency Analyzer, Audio Labeler, and Wavelet Analyzer.

MATLAB apps for machine learning and for working with audio, image, and time-series data.

About the Author

Mike Simcock is a senior consultant at MathWorks who works on projects with real-world data that require data processing for AI and other applications. Prior to joining MathWorks, Mike was a senior consultant at Altran and a principal R&D scientist at Malvern Instruments, Halliburton, and Ometric. He holds a BSc in chemistry and a Ph.D. in semiconductor materials from the University of Salford. He has numerous peer-reviewed publications involving experimental data and around 20 patents related to the fabrication of thin-film optics and the application of optical instrumentation. MATLAB use has been a common theme across these positions.