Issues with LSTM prediction due to normalization Layer settings
Patrick Sontheimer
on 22 Sep 2023
Commented: Patrick Sontheimer
on 18 Oct 2023
Hello, I've recently tried to create an LSTM Seq2Seq model using multiple-input, multiple-output data. It's simulation data, so the timesteps are correlated, and I use 'sequence' as the output mode for all LSTM layers. I've had a look at the tutorial cases, and my situation most closely resembles the turbofan tutorial: https://www.mathworks.com/help/deeplearning/ug/sequence-to-sequence-regression-using-deep-learning.html
I tried both manual normalization and the sequenceInputLayer normalization options. In the latter case the predictions are wrong. I'll attach my code below (I left out the sequence sorting, which can be found in the turbofan tutorial). This code uses a noisy linear trend for training and validation instead of my real data; I'll attach some prediction plots using the actual data. I've confirmed that the example below reproduces the same issue as my real code.
The alternative to the code below is to do all steps the same, but instead use
net = trainNetwork(XTrain,YTrain,Layers,options);
for training and
sequenceInputLayer(numFeatures,"Normalization","rescale-symmetric")
as the input layer. Finally, the plots can be created with:
PYVal = predict(net,XVal,'MiniBatchSize',1);
% Which sequence to plot
i = 1;
% Plot validation targets and predictions
figure
tiledlayout(1,numResponses,"TileSpacing","tight");
for j = 1:numResponses
    nexttile
    TargetY = YVal{i}(j,:);
    PredictionY = PYVal{i}(j,:);
    plot(TargetY,'k')
    hold on
    plot(PredictionY,'b')
    hold off
    legend('YVal','PYVal')
    % Remove axis ticks to fit more plots:
    set(gca,'xtick',[],'ytick',[])
end
Can someone explain what is not working correctly with the sequenceInputLayer normalization options and how to fix it?
Now the code (it can't be run in the preview, so you may have to run it locally):
Note: I edited this post. It occurred to me that providing an example with completely randomly distributed training and validation data doesn't give the network any valuable patterns to learn.
% Define parameters
numSequences = 3;
numFeatures = 4;
numResponses = 5;
numTimesteps = 100;
Interval = [0.1 0.9];
StartTimesteps = round(numTimesteps*Interval(1),1)
EndTimesteps = round(numTimesteps*Interval(2),-1)
% Initialize cells
XTrain = cell(numSequences,1);
YTrain = cell(numSequences,1);
XVal = cell(numSequences,1);
YVal = cell(numSequences,1);
% Initialize normalized cells
NX_Train = cell([numSequences 1]);
NY_Train = cell([numSequences 1]);
NX_Val = cell([numSequences 1]);
NY_Val = cell([numSequences 1]);
% Initialize Help Variables
HelpXT = zeros(numFeatures,numTimesteps);
HelpXV = zeros(numFeatures,numTimesteps);
HelpYT = zeros(numResponses,numTimesteps);
HelpYV = zeros(numResponses,numTimesteps);
% Fill Input Data with a noisy function
for s = 1:numSequences
    for i = 1:numFeatures
        % flat segment with noise
        for j = 1:StartTimesteps
            HelpXT(i,j) = randn + i*5;
            HelpXV(i,j) = randn + i*5;
        end
        % noisy linear trend
        for j = StartTimesteps:EndTimesteps
            HelpXT(i,j) = randn + i*5 + 0.2*j;
            HelpXV(i,j) = randn + i*5 + 0.2*j;
        end
        % flat segment at the final trend level
        for j = EndTimesteps:numTimesteps
            k = 0.2*EndTimesteps;
            HelpXT(i,j) = randn + i*5 + k;
            HelpXV(i,j) = randn + i*5 + k;
        end
    end
    XTrain{s} = HelpXT;
    XVal{s} = HelpXV;
end
clear k
% Fill Output Data with noisy linear trend
for s = 1:numSequences
    for i = 1:numResponses
        % flat segment with noise
        for j = 1:StartTimesteps
            HelpYT(i,j) = randn + i*5;
            HelpYV(i,j) = randn + i*5;
        end
        % noisy linear trend
        for j = StartTimesteps:EndTimesteps
            HelpYT(i,j) = randn + i*5 + 0.2*j;
            HelpYV(i,j) = randn + i*5 + 0.2*j;
        end
        % flat segment at the final trend level
        for j = EndTimesteps:numTimesteps
            k = 0.2*EndTimesteps;
            HelpYT(i,j) = randn + i*5 + k;
            HelpYV(i,j) = randn + i*5 + k;
        end
    end
    YTrain{s} = HelpYT;
    YVal{s} = HelpYV;
end
clear k
% Normalize the first dataset
[NX_Train{1},SX_Train] = mapminmax(XTrain{1});
[NY_Train{1},SY_Train] = mapminmax(YTrain{1});
% Normalize all remaining datasets using the same settings
for i = 2:numel(XTrain)
    NX_Train{i} = mapminmax('apply',XTrain{i},SX_Train);
    NY_Train{i} = mapminmax('apply',YTrain{i},SY_Train);
end
for i = 1:numel(XVal)
    NX_Val{i} = mapminmax('apply',XVal{i},SX_Train);
    NY_Val{i} = mapminmax('apply',YVal{i},SY_Train);
end
% Define network options:
numHiddenUnits = 3;
miniBatchSize = 1;
% Define network architecture
Layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    dropoutLayer(0.5,'Name','dropout')
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    dropoutLayer(0.5,'Name','dropout_2')
    fullyConnectedLayer(numResponses)
    regressionLayer];
% Define training options
maxEpochs = 100;
InitialLearnRate = 1e-2;
Shuffle = 'every-epoch';
Plots = 'training-progress';
GradientThreshold = 1;
Verbose = 0;
ValidationData = {NX_Val, NY_Val};
ValidationFrequency = 1;
OutputNetwork = 'best-validation-loss';
L2Regularization = 0.05;
% Save training options
options = trainingOptions('adam', ...
    'MaxEpochs',maxEpochs, ...
    'MiniBatchSize',miniBatchSize, ...
    'InitialLearnRate',InitialLearnRate, ...
    'GradientThreshold',GradientThreshold, ...
    'Shuffle',Shuffle, ...
    'Plots',Plots, ...
    'Verbose',Verbose, ...
    'ValidationData',ValidationData, ...
    'ValidationFrequency',ValidationFrequency, ...
    'OutputNetwork',OutputNetwork, ...
    'L2Regularization',L2Regularization);
% Train the network
net = trainNetwork(NX_Train,NY_Train,Layers,options);
% Predict with the network on the validation data
PN_YVal = predict(net,NX_Val,'MiniBatchSize',1);
% initialize renormalized values
A = cell(size(XTrain,1),1); % XTrain
B = cell(size(XTrain,1),1); % YTrain
C = cell(size(XTrain,1),1); % XVal
D = cell(size(XTrain,1),1); % YVal
E = cell(size(XTrain,1),1); % PYVal
% renormalize data
% you can compare elements of A with XTrain, etc., as sanity check
for i = 1:size(XTrain,1)
    A{i} = mapminmax('reverse',NX_Train{i},SX_Train);
    B{i} = mapminmax('reverse',NY_Train{i},SY_Train);
    C{i} = mapminmax('reverse',NX_Val{i},SX_Train);
    D{i} = mapminmax('reverse',NY_Val{i},SY_Train);
    E{i} = mapminmax('reverse',PN_YVal{i},SY_Train);
end
% Which sequence to plot
i = 1;
% Plot validation targets and predictions
figure
tiledlayout(1,numResponses,"TileSpacing","tight");
for j = 1:numResponses
    nexttile
    TargetY = D{i}(j,:);
    PredictionY = E{i}(j,:);
    plot(TargetY,'k')
    hold on
    plot(PredictionY,'b')
    hold off
    legend('YVal','PYVal')
    % Remove axis ticks to fit more plots:
    set(gca,'xtick',[],'ytick',[])
end
Accepted Answer
Neha
on 12 Oct 2023
Hi Patrick,
I understand that you are facing issues normalizing the training data for an LSTM, and that you are not getting correct predictions when the sequenceInputLayer normalization options are used instead of mapminmax. When you were using mapminmax, you scaled both the X and Y data, but when you normalized at the input layer, you normalized only the input data.
You can instead normalize the output data using mapminmax and rescale the input data at the sequenceInputLayer:
[NY_Train{1},SY_Train] = mapminmax(YTrain{1});
for i = 2:numel(XTrain)
    NY_Train{i} = mapminmax('apply',YTrain{i},SY_Train);
end
for i = 1:numel(XVal)
    NY_Val{i} = mapminmax('apply',YVal{i},SY_Train);
end
By specifying Max and Min, the normalization is analogous to mapminmax, but this is not mandatory; specifying only the normalization type is sufficient:
sequenceInputLayer(numFeatures, "Normalization","rescale-symmetric", "Max", max(XTrain{1},[],2),"Min", min(XTrain{1},[],2))
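Putting this together with the architecture from the question, a minimal sketch of the modified layer array could look like this (same layer names and sizes as in the question; the Min/Max values, taken here from the first training sequence, are optional):

```matlab
% Sketch: input normalization moved into the input layer itself,
% so only the targets (Y) still need mapminmax.
Layers = [ ...
    sequenceInputLayer(numFeatures, ...
        "Normalization","rescale-symmetric", ...
        "Min",min(XTrain{1},[],2), ...
        "Max",max(XTrain{1},[],2))
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    dropoutLayer(0.5,'Name','dropout')
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    dropoutLayer(0.5,'Name','dropout_2')
    fullyConnectedLayer(numResponses)
    regressionLayer];
% Train on raw inputs but mapminmax-normalized targets:
% net = trainNetwork(XTrain,NY_Train,Layers,options);
```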
In general, it is not necessary to normalize the output data of an LSTM network. Whether to normalize the outputs depends on the specific task and the range of values the output can take. If the output values span a wide range or are continuous, normalization can be beneficial, particularly when the outputs have high variance or the loss is sensitive to scale differences.
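If you do normalize the targets with mapminmax, the predictions come back in the normalized scale and must be mapped back, as in the question's code. A short sketch, assuming the network was trained on mapminmax-normalized targets with settings SY_Train:

```matlab
% Predictions are in the normalized target scale; map each sequence
% back with the same settings (SY_Train) that normalized YTrain.
PN_YVal = predict(net,XVal,'MiniBatchSize',1);
PYVal = cellfun(@(p) mapminmax('reverse',p,SY_Train), PN_YVal, ...
    'UniformOutput',false);
```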
Hope this helps!