Real or categorical predictors: which one is faster?
In regression, is there a guideline for whether to treat predictors as real values or as categorical?
In a fitting problem with inputs X and y, where X contains the hour of the day (e.g., 1, 2, 3, ...), I tend to treat it as a categorical predictor because it has only a limited number of unique values (24). Surprisingly, fitting a Gaussian process with fitrgp seems slower when the predictor is categorical than when it is treated as real-valued.
My questions are:
- Why does fitting take longer with a categorical predictor?
- In a similar situation, is there a guideline for deciding whether to treat the predictors as real values or as categorical inputs?
Comments
Walter Roberson
on 17 Sep 2023
Have you experimented with passing uint8 data? I don't know whether that is permitted; if it is, it would signal that discrete algorithms should be used.
mono
on 17 Sep 2023
"why does it take longer with categorical predictor?"
I'd venture it's owing to the large number of dummy variables introduced when the 24 hour levels are modeled as categorical instead of continuous/discrete. You could test that hypothesis by artificially reducing the same data set to 24, 12, and 2 levels and comparing fit times.
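A minimal sketch of the dummy-variable hypothesis, written in Python with only the standard library. It is an illustration under stated assumptions, not a reproduction of what fitrgp does internally: it assumes full one-hot encoding (one indicator column per level; some schemes drop a reference level), and the helpers `dummy_encode`, `collapse_hours`, and `kernel_distance_ops` are hypothetical. It counts how many predictor columns the hour variable becomes for 24, 12, and 2 levels, plus a rough count of the per-coordinate terms in all pairwise squared distances a kernel would evaluate. With an ARD kernel (for example fitrgp's 'ardsquaredexponential'), each column would also get its own length-scale hyperparameter to optimize.

```python
# Hypothetical illustration: how many predictor columns an hour-of-day variable
# becomes under numeric vs. full dummy (one-hot) encoding.

def dummy_encode(labels):
    """Return (levels, rows): one 0/1 indicator column per distinct label."""
    levels = sorted(set(labels))
    position = {level: j for j, level in enumerate(levels)}
    rows = []
    for label in labels:
        row = [0] * len(levels)
        row[position[label]] = 1
        rows.append(row)
    return levels, rows


def collapse_hours(hours, n_levels):
    """Hypothetical helper: bin hours 1..24 into n_levels equal-width groups.

    n_levels must divide 24 (e.g. 24, 12, 2); labels run from 0 to n_levels - 1.
    """
    if 24 % n_levels != 0:
        raise ValueError("n_levels must divide 24")
    width = 24 // n_levels
    return [(h - 1) // width for h in hours]


def kernel_distance_ops(n_obs, n_cols):
    """Rough count of per-coordinate terms in all pairwise squared distances."""
    return n_obs * n_obs * n_cols


# Synthetic data: 10 days of hourly observations, hours 1..24.
hours = [h for _ in range(10) for h in range(1, 25)]
n = len(hours)

print(f"numeric: 1 column, {kernel_distance_ops(n, 1)} distance terms")
for n_levels in (24, 12, 2):
    _, X = dummy_encode(collapse_hours(hours, n_levels))
    n_cols = len(X[0])
    print(f"{n_levels:>2} levels: {n_cols} dummy columns, "
          f"{kernel_distance_ops(n, n_cols)} distance terms")
```

This only counts the distance part of a kernel evaluation; it is a rough proxy for the extra work, not a timing of fitrgp, whose overall cost also depends on the fit method and hyperparameter optimization.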
Regardless of whether that is true, it is the model definition and purpose that should drive decisions such as this, not compute time.
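To illustrate why the encoding is a modeling decision, here is a small Python sketch (standard library only; the function names are hypothetical and it does not reproduce fitrgp) of the squared distance that an isotropic squared-exponential kernel, k(x, x') = σ² exp(-‖x − x'‖² / (2ℓ²)), depends on. As a numeric predictor, hour 1 is closest to hour 2 and farthest from hour 24, even though 1 and 24 are adjacent on the clock. As a one-hot categorical predictor, every pair of distinct hours has the same squared distance (2), so the kernel treats neighboring hours as no more similar than distant ones.

```python
# Hypothetical illustration: squared Euclidean distance between two hours,
# which is what an isotropic squared-exponential kernel depends on.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def as_numeric(hour):
    """Treat the hour as a single real-valued predictor."""
    return [hour]


def as_one_hot(hour, n_levels=24):
    """Treat the hour (1..n_levels) as a categorical predictor via one-hot encoding."""
    row = [0] * n_levels
    row[hour - 1] = 1
    return row


for a, b in [(1, 2), (1, 12), (1, 24)]:
    print(f"hours {a:>2} vs {b:>2}: "
          f"numeric {sq_dist(as_numeric(a), as_numeric(b)):>3}, "
          f"one-hot {sq_dist(as_one_hot(a), as_one_hot(b))}")
```

Which behavior is appropriate depends on whether nearby hours are expected to produce similar responses, which is exactly the kind of model-definition question that should decide the encoding.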