Keyword Spotting in Simulink
This example shows a Simulink® model that identifies a keyword in speech using a pretrained deep learning model. This model was trained to identify the keyword "yes". To learn about the model architecture and training, see Keyword Spotting in Noise Using MFCC and LSTM Networks.
Download Pretrained Keyword Spotting Network
Download and unzip the pretrained network and the standardization factors. The standardization factors are the global mean and standard deviation of the features used to train the model.
downloadFolder = matlab.internal.examples.downloadSupportFile("audio","KeywordSpotting.zip");
dataFolder = tempdir;
unzip(downloadFolder,dataFolder)
netFolder = fullfile(dataFolder,"KeywordSpotting");
addpath(netFolder)
The deep learning network was trained on mel-frequency cepstral coefficients (MFCC) computed using an audioFeatureExtractor object. The MFCC block in the model is configured to extract the same features that the network was trained on.
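As a rough offline equivalent of the feature extraction in the model, you can configure an audioFeatureExtractor with the window and overlap stated below. This is a sketch, not the example's actual configuration: the 16 kHz sample rate is an assumption, and the 39 features per vector are assumed to be 13 MFCC plus their delta and delta-delta coefficients.

```matlab
% Sketch of equivalent offline feature extraction (assumed parameters).
fs = 16000;                         % assumed sample rate
afe = audioFeatureExtractor( ...
    SampleRate=fs, ...
    Window=hann(512,"periodic"), ...
    OverlapLength=384, ...
    mfcc=true, ...                  % 13 coefficients (default)
    mfccDelta=true, ...             % + 13 delta coefficients
    mfccDeltaDelta=true);           % + 13 delta-delta coefficients

x = randn(fs,1);                    % one second of placeholder audio
features = extract(afe,x);          % numFrames-by-39 feature matrix
```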
The MFCC block extracts feature vectors from the audio stream using 512-point analysis windows with 384-point overlap, and then applies a buffer to output 16 feature vectors consisting of 39 features each. Buffering the feature vectors enables vectorized computations in the Stateful Classify block, which allows the system to keep pace with real time (given a short time delay).
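The bookkeeping implied by these parameters can be checked in a few lines: a 512-point window with 384-point overlap advances 128 samples per feature vector, so a buffer of 16 vectors spans 2048 new audio samples per classification batch.

```matlab
% Frame bookkeeping implied by the block parameters above.
winLen  = 512;
overlap = 384;
hop     = winLen - overlap;       % 128 samples between feature vectors
numVec  = 16;
samplesPerBatch = numVec*hop;     % 2048 audio samples per buffered batch
```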
After the MFCC block, the features are standardized using the precomputed coefficients and then transposed so that time is along the second dimension.
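The standardize-and-transpose step can be sketched as below. The variable names `M` and `S` for the global mean and standard deviation are hypothetical placeholders; in the example these values come from the downloaded standardization factors.

```matlab
% Sketch of the standardize-and-transpose step (placeholder data).
M = zeros(1,39);        % hypothetical precomputed global mean
S = ones(1,39);         % hypothetical precomputed standard deviation
F = rand(16,39);        % placeholder buffered features (time-by-feature)

Fstd = (F - M)./S;      % standardize each of the 39 features
Fnet = Fstd.';          % 39-by-16: time along the second dimension
```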
The Stateful Classify block outputs a binary decision for each feature vector. The decisions are converted to doubles and then upsampled to create a decision mask the same length as the corresponding audio.
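The upsampling step can be sketched with repelem, assuming a 128-sample hop between feature vectors (512-point window minus 384-point overlap); the decision values here are placeholders.

```matlab
% Sketch of converting per-vector decisions into a sample-length mask.
decisions = [0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0];  % placeholder, one per feature vector
hop  = 128;                                     % samples per feature vector
mask = repelem(double(decisions),hop);          % same length as the buffered audio
```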
Use the Manual Switch block to select either a live stream from your microphone or a test signal from an audio file.
Close the model and remove the path to the pretrained network.
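A possible cleanup sketch is shown below; `netFolder` matches the variable from the download step, while the model name passed to close_system is an assumption and not given in this example.

```matlab
% close_system("KeywordSpottingModel",0)  % model name is an assumption
rmpath(netFolder)                         % matches addpath in the download step
```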