How is root node value chosen in regression decision tree?
I understand the criteria for node splitting and how the root node variable is chosen, but I do not understand how the actual value for the inequality at the root node is chosen. Is it just local optimization of the numbers? For example, I have a variety of whole number values ranging from 3 to 25 and the root node is choosing 9.5. This is not the median or mean, so why is this number chosen? Is it because the decision tree analyzed all potential values to see what had the lowest MSE to start with? If so, why did it choose a decimal number when all my data points are whole numbers?
Thank you for your help!
Answers (1)
Ayush Aniket
on 4 Jun 2025
The split value at the root node in a decision tree is chosen based on optimization criteria, not necessarily the median or mean. Decision trees aim to minimize impurity (for classification) or reduce variance/MSE (for regression). The algorithm evaluates the candidate split points and selects the one that maximizes information gain or minimizes error.
Why a Decimal Value Instead of Whole Numbers?
- Even if your dataset contains only whole numbers, the tree considers midpoints between consecutive values as potential split points.
- For example, if your sorted values are {3, 5, 7, 9, 11, 13, ...}, the tree might evaluate splits at {4, 6, 8, 10, 12, ...}.
- A split at 9.5 is the midpoint between 9 and 10, so both values presumably appear in your data. It means the algorithm found that separating values below 9.5 from those above 9.5 gave the largest reduction in impurity or error.
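As a rough illustration of the midpoint search, here is a minimal sketch in base MATLAB. The data in x and y are made up for this example, and the loop is a simplified stand-in for the idea, not the toolbox's actual implementation:

```matlab
% Hypothetical toy data: whole-number predictor, numeric response.
x = [3 5 7 9 10 13 17 25]';
y = [2 2 2 2 8 8 8 8]';

% Candidate thresholds are midpoints between consecutive distinct values.
u = unique(x);                              % sorted distinct predictor values
candidates = (u(1:end-1) + u(2:end)) / 2;   % 4, 6, 8, 9.5, 11.5, 15, 21

% Score each candidate by the summed squared error around the two child means.
sse = zeros(size(candidates));
for k = 1:numel(candidates)
    left  = y(x <  candidates(k));
    right = y(x >= candidates(k));
    sse(k) = sum((left - mean(left)).^2) + sum((right - mean(right)).^2);
end

% Keep the threshold with the lowest error.
[~, best] = min(sse);
fprintf('Best split: x < %g\n', candidates(best));   % prints "Best split: x < 9.5"
```

In this toy data the response jumps between x = 9 and x = 10, so the midpoint 9.5 gives zero error on both sides and wins, even though 9.5 never appears in x.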
In MATLAB, you can visualize the tree using:
view(SVModelTree, 'Mode', 'graph');
Refer to the following documentation to learn more about the viewing options: https://www.mathworks.com/help/stats/view-decision-tree.html