Data Handling, Regression Learner and Preprocessing

Hi all, I've been given a data set to use in the Regression Learner app. The goal is to predict the Load from the given data as accurately as possible, judged by R^2, MAE and MSE.
I can also select data from previous days: weather values and lagged load data. I have selected a week's worth as well as a day's worth of lags, and picked the most important previous weather information using the feature selection in the Regression Learner itself (simply running it with all possible choices and then cutting down).
I have applied a series of preprocessing techniques: z-score normalization, handling of missing values, and L2 regularization. When I save this new data and use the Regression Learner on it, the values and graphs I would use to judge the predictions have barely changed; the graphs are almost the same, and the metrics for the preprocessed data are basically the same as before. I'm wondering if I'm missing something, or if someone could point me in the right direction on how to approach this. Here are some snippets of the code I'm using:
% Preprocessing Data - Handling Missing Values
labFcastTrain = fillmissing(labFcastTrain, 'constant', 0, 'DataVariables', setdiff(labFcastTrain.Properties.VariableNames, excludeVars));
labFcastTest = fillmissing(labFcastTest, 'constant', 0, 'DataVariables', setdiff(labFcastTest.Properties.VariableNames, excludeVars));
% Feature Scaling (Z-score normalization)
labFcastTrain{:, setdiff(labFcastTrain.Properties.VariableNames, excludeVars)} = zscore(labFcastTrain{:, setdiff(labFcastTrain.Properties.VariableNames, excludeVars)});
labFcastTest{:, setdiff(labFcastTest.Properties.VariableNames, excludeVars)} = zscore(labFcastTest{:, setdiff(labFcastTest.Properties.VariableNames, excludeVars)});
% Fit regularized linear regression model (L2 regularization)
reg = fitrlinear(X_train, y_train, 'Regularization', 'ridge', 'Lambda', 0.1)
Any support and any ideas would be greatly appreciated, as I am simply stumped.
Thanks in advance :)
Ive J on 30 Nov 2023
There are a couple of things you might consider:
1- Instead of fillmissing, try removing the missing observations or imputing them (MICE, for instance).
2- Z-score transformation of the features is a good approach. Also remember that some methods rely on the distribution of the errors (e.g. in linear regression, a highly skewed outcome with a low sample size may cause a deviation from this assumption; you may try a log/sqrt transformation, but be cautious about the interpretation).
3- If you want to train a penalized regression model, you should consider hyperparameter tuning (lambda in this case).
4- Don't forget to split your data into training/test sets and avoid overfitting with approaches such as cross-validation (nested, for instance).
5- Try the Regression Learner with other model types; maybe the linearity assumption does not hold for your data (have you tried regression trees or GPR?).
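
To make points 2 to 4 concrete, here is a minimal sketch, not a definitive recipe. It runs on synthetic stand-in data, since the real tables are not shown, so nTrain, nTest, p, beta, the lambda grid and the 5 folds are all assumptions for illustration. It reuses the X_train and y_train names from the question and adds hypothetical X_test and y_test. The features are standardized with the training mean and standard deviation only, and lambda is chosen by cross-validation rather than being fixed at 0.1.

```matlab
% Synthetic stand-in data (hypothetical; replace with your own predictors/response)
rng(0);                                    % for reproducibility
nTrain = 200; nTest = 50; p = 5;
scales  = [1 10 100 0.1 5];                % features on very different scales
offsets = [0 50 -20 3 1];
X_train = randn(nTrain, p) .* scales + offsets;
X_test  = randn(nTest,  p) .* scales + offsets;
beta    = [2; -0.3; 0.01; 4; 0];           % made-up true coefficients
y_train = X_train*beta + 0.5*randn(nTrain, 1);
y_test  = X_test*beta  + 0.5*randn(nTest, 1);

% Standardize using the TRAINING statistics only, then reuse them on the test set
[Z_train, mu, sigma] = zscore(X_train);
Z_test = (X_test - mu) ./ sigma;

% Cross-validate ridge regression over a grid of candidate lambdas
lambdas = logspace(-4, 1, 12);             % assumed search grid
cvMdl = fitrlinear(Z_train, y_train, ...
    'Learner', 'leastsquares', 'Regularization', 'ridge', ...
    'Lambda', lambdas, 'KFold', 5);
cvMSE = kfoldLoss(cvMdl);                  % one cross-validated MSE per lambda
[~, best] = min(cvMSE);
bestLambda = lambdas(best);

% Refit on all training data with the selected lambda and score the held-out test set
mdl  = fitrlinear(Z_train, y_train, ...
    'Learner', 'leastsquares', 'Regularization', 'ridge', 'Lambda', bestLambda);
yhat = predict(mdl, Z_test);
testMSE = mean((y_test - yhat).^2);
testMAE = mean(abs(y_test - yhat));
testR2  = 1 - sum((y_test - yhat).^2) / sum((y_test - mean(y_test)).^2);
fprintf('lambda = %.4g, MSE = %.4g, MAE = %.4g, R^2 = %.4g\n', ...
    bestLambda, testMSE, testMAE, testR2);
```

Applying zscore to the test table separately, as in the question, scales the two sets with different means and standard deviations, so reusing mu and sigma keeps them consistent. The test set is only touched once, after lambda has been selected.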
