Using ensemble to forecast timeseries (machine learning)

Hi everyone. Can someone help fix this? I have a simple time series and am struggling to get a good forecast: I just get a flat prediction and forecast. The time series is attached. Here is my code:
clc
clear all
load("y.mat");
timeSeries = y';
% Normalize the time series
meanTS = mean(timeSeries);
stdTS = std(timeSeries);
timeSeriesNorm = (timeSeries - meanTS) / stdTS;
% Split the normalized data into training (70%) and validation (30%) sets
trainRatio = 0.7;
trainSize = floor(trainRatio * length(timeSeriesNorm));
trainData = timeSeriesNorm(1:trainSize);
validationData = timeSeriesNorm(trainSize+1:end);
% Create lagged features for training data
numLags = 5; % Number of lagged features to include
X_train = zeros(length(trainData) - numLags, numLags); % No trend feature
Y_train = trainData(numLags+1:end);
for i = 1:numLags
    X_train(:, i) = trainData(i:end - numLags + i - 1);
end
% Create lagged features for validation data
X_validation = zeros(length(validationData), numLags); % No trend feature
Y_validation = validationData(:); % column vector, matching the output of predict()
for i = 1:numLags
    % Lags for each validation row reach back into the tail of the training data
    X_validation(:, i) = timeSeriesNorm(trainSize - numLags + i:trainSize - numLags + i + length(validationData) - 1);
end
% Automatically tune hyperparameters using Bayesian optimization
ensembleModel = fitrensemble(X_train, Y_train, ...
    'Method', 'LSBoost', ...                % LSBoost for regression
    'OptimizeHyperparameters', 'auto', ...  % tune hyperparameters automatically
    'HyperparameterOptimizationOptions', struct( ...
        'AcquisitionFunctionName', 'expected-improvement-plus', ...
        'MaxObjectiveEvaluations', 30, ...  % number of optimization iterations
        'ShowPlots', false));               % set to true to see optimization progress
[Bayesian optimization trace omitted: 30 objective evaluations, each reporting the method, NumLearningCycles, LearnRate, MinLeafSize, and the resulting log(1+loss).]
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 103.2999 seconds
Total objective function evaluation time: 96.5115
Best observed feasible point:
    Method     NumLearningCycles    LearnRate    MinLeafSize
    _______    _________________    _________    ___________
    LSBoost    499                  0.24188      2
Observed objective function value = 0.00242
Estimated objective function value = 0.0024199
Function evaluation time = 6.0773
Best estimated feasible point (according to models):
    Method     NumLearningCycles    LearnRate    MinLeafSize
    _______    _________________    _________    ___________
    LSBoost    83                   0.92045      1
Estimated objective function value = 0.0024135
Estimated function evaluation time = 1.043
% Display the best hyperparameters
bestHyperparameters = ensembleModel.HyperparameterOptimizationResults.XAtMinObjective;
disp('Best Hyperparameters:');
Best Hyperparameters:
disp(bestHyperparameters);
    Method     NumLearningCycles    LearnRate    MinLeafSize
    _______    _________________    _________    ___________
    LSBoost    499                  0.24188      2
% Predict on the validation set
Y_validation_pred_norm = predict(ensembleModel, X_validation);
% Denormalize the validation predictions
Y_validation_pred = Y_validation_pred_norm * stdTS + meanTS;
% Calculate the error on the validation set
validationError = mean(((Y_validation * stdTS + meanTS) - Y_validation_pred).^2);
fprintf('Validation Error: %.4f\n', validationError);
Validation Error: 0.0064
% Forecast beyond the available data to a horizon of 100
forecastHorizon = 100;
Y_forecast_norm = zeros(forecastHorizon, 1);
X_forecast = X_validation(end, :); % start from the last validation input
for i = 1:forecastHorizon
    % Predict the next value
    Y_forecast_norm(i) = predict(ensembleModel, X_forecast);
    % Shift the lag window and append the prediction (recursive forecasting)
    X_forecast = [X_forecast(2:end), Y_forecast_norm(i)];
end
% Denormalize the forecast
Y_forecast = Y_forecast_norm * stdTS + meanTS;
% Plot the original data, validation predictions, and forecast
figure;
plot(1:length(timeSeries), timeSeries, 'b', 'LineWidth', 1.5); % Original data
hold on;
plot(trainSize+1:trainSize+length(validationData), Y_validation_pred, 'r', 'LineWidth', 1.5); % Validation predictions
plot(trainSize+length(validationData)+1:trainSize+length(validationData)+forecastHorizon, Y_forecast, 'g', 'LineWidth', 1.5); % Forecast
legend('Original Data', 'Validation Predictions', 'Forecast');
xlabel('Time');
ylabel('Value');
title('Time Series Forecasting with Normalization and Automatic Hyperparameter Tuning');
grid on;
hold off;

Answers (1)

Nithin on 25 Feb 2025
Hi @yamid,
The model outputs a flat line because the data lacks strong seasonality, which makes it hard for the model to find anything to extrapolate; as a result, it tends to predict values close to the average of the preceding data.
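This mean-reversion is a general property of tree ensembles: their predictions are averages of training targets, so they cannot produce values outside the training range, and a recursive forecast flattens out. A minimal standalone sketch (illustrative only, not from the original post):

```matlab
% Illustration (not from the original post): tree ensembles cannot extrapolate,
% so recursive forecasts of a trending series flatten out.
x = (1:100)';                              % a simple upward trend
mdl = fitrensemble(x(1:end-1), x(2:end), 'Method', 'Bag'); % learn x(t) -> x(t+1)
p = x(end);
for k = 1:20
    p = predict(mdl, p);                   % feed each prediction back in
end
% p stays near the top of the training range instead of continuing the trend:
% the trees can only return averages of target values they saw during training.
```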
The output indicates that the model is struggling to identify a pattern and is defaulting to the mean value. To verify the code, I repeated the input data multiple times by stacking it end-to-end and then predicted the output with the same model. The image below shows that the model then functions as expected and forecasts the data accurately.
Therefore, adding more data to the input resolves the issue: the model can detect a meaningful pattern and make accurate predictions.
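The stacking step I describe above can be sketched as follows (a hedged sketch: it assumes `y` is loaded as in the question, and uses `repmat` tiling as one way to repeat the series; the repeat count of 5 is arbitrary):

```matlab
% Sketch: tile the series end-to-end so its pattern repeats many times,
% then reuse the lag-feature construction and fitrensemble call from the
% original code unchanged. The repeat count (5) is an arbitrary choice.
load("y.mat");
yStacked = repmat(y(:), 5, 1);   % repeat the series 5 times end-to-end
timeSeries = yStacked';          % same orientation as in the original code
% ...rebuild the lagged features, retrain, and forecast exactly as before;
% with the pattern seen repeatedly, the recursive forecast is no longer flat.
```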
I hope this helps address your query.
