Possible Incorrect Documentation on ksdensity
Answers (1)
Hi @David Gillcrist,
I went through your comments and the documentation at the link below:
https://www.mathworks.com/help/stats/ksdensity.html?s_tid=doc_ta#btpl6_1-1
To clarify your question about how MATLAB's bandwidth estimate for kernel density estimation (KDE) compares with the textbook rules, here is each component in turn.
Understanding Silverman’s and Scott’s Rules
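In one dimension, both rules have the form h = C * sigma * n^(-1/5) and come from minimizing the asymptotic mean integrated squared error with a normal density as the reference. They differ in the constant and in the spread estimate: the rule usually called Silverman's rule of thumb uses C = 0.9 with min(sigma, IQR/1.34), while the normal reference rule usually attributed to Scott uses C ≈ 1.06 with the sample standard deviation.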
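As a rough sketch, the univariate forms of the two rules, as they are commonly stated in textbooks, can be computed as follows. The sample `x` and the variable names are made up for illustration, `iqr` is assumed to be available (it ships with the Statistics and Machine Learning Toolbox that `ksdensity` already requires), and nothing here is taken from MATLAB's internal code.

```matlab
% Univariate rules of thumb in their common textbook form.
% Assumption: the sample x is hypothetical, and these are the textbook
% expressions, not a copy of what ksdensity does internally.
rng(0);                          % fix the seed so the sketch is repeatable
x = randn(200, 1);               % hypothetical data
n = numel(x);
sigma = std(x);

% Silverman's rule of thumb: constant 0.9 and a robust spread estimate.
hSilverman = 0.9 * min(sigma, iqr(x) / 1.34) * n^(-1/5);

% Normal reference rule (usually attributed to Scott): constant 1.06.
hScott = 1.06 * sigma * n^(-1/5);

fprintf('Silverman: %.4f, Scott: %.4f\n', hSilverman, hScott);
```

Because 0.9 * min(sigma, IQR/1.34) can never exceed 0.9 * sigma, the Silverman value is always the smaller of the two for the same sample.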
MATLAB's Bandwidth Calculation
In the snippet you quoted from matlab.internal.math.validateOrEstimateBW, the default bandwidth method is labeled "normal-approx", and the formula it evaluates lines up more closely with Scott's rule than with Silverman's rule of thumb:
bw = sigma * (4/((d+2)*N))^(1/(d+4));
For d = 1 the constant in this formula is (4/3)^(1/5) ≈ 1.06, which is the normal reference constant usually associated with Scott's rule, not the 0.9 of Silverman's rule of thumb.
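The constant can be checked directly. The sketch below evaluates it for d = 1 and confirms that the quoted expression factors into a constant times sigma times N^(-1/(d+4)). The values of `sigma` and `N` are arbitrary placeholders, not anything taken from your data.

```matlab
d = 1;                                   % one-dimensional data
c = (4 / (d + 2))^(1 / (d + 4));         % (4/3)^(1/5)
fprintf('constant for d = 1: %.4f\n', c);

% Hypothetical values, only to show that the quoted formula factors
% into c * sigma * N^(-1/(d+4)).
sigma = 2.5;
N = 500;
bw = sigma * (4 / ((d + 2) * N))^(1 / (d + 4));
bwFactored = c * sigma * N^(-1 / (d + 4));
fprintf('difference: %.2e\n', abs(bw - bwFactored));
```

The constant comes out at about 1.0592, that is, the familiar 1.06.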
Clarification on Literature References
The confusion often arises because both rules come from the same normal reference argument and differ only in their constants and spread estimates. Silverman lowers the constant from 1.06 to 0.9 and uses min(sigma, IQR/1.34) so that the rule copes better with skewed, heavy-tailed, or bimodal data, while Scott's rule keeps the value that is optimal for normally distributed data. To add to the confusion, the general-d expression above is often labeled Silverman's rule in multivariate settings. The UBC reference you mentioned likely conflates the two rules or presents them in a different context.
Practical Implications
Your personal experience resonates with common practice among statisticians. Many practitioners prefer adjusting bandwidth downwards (e.g., using factors like 0.5 or lower) to avoid over-smoothing, especially with smaller sample sizes where finer details are crucial.
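A minimal sketch of that adjustment with `ksdensity`: take the default bandwidth from the third output, then re-estimate with a scaled value through the 'Bandwidth' option. The bimodal sample and the factor 0.5 are only illustrative assumptions. A suitable factor depends on your data.

```matlab
rng(1);                                         % repeatable example
x = [randn(100, 1); 3 + 0.5 * randn(100, 1)];   % hypothetical bimodal sample

% Default estimate; the third output is the bandwidth ksdensity chose.
[f0, xi, bw0] = ksdensity(x);

% Re-estimate on the same grid with half the default bandwidth.
f1 = ksdensity(x, xi, 'Bandwidth', 0.5 * bw0);

plot(xi, f0, xi, f1);
legend('default bandwidth', '0.5 x default');
```

Plotting both curves on the same grid makes it easy to see whether the default is smoothing away structure that the narrower bandwidth recovers.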
Here are some additional insights I would like to share with you.
Depending on your data distribution characteristics (e.g., skewness or presence of outliers), you might want to explore robust bandwidth selectors beyond Silverman’s or Scott’s rules. For instance, adaptive methods can provide better performance in heterogeneous data contexts. Also, bear in mind that different statistical software packages may implement these rules with slight variations, leading to discrepancies in output. Therefore, when comparing results across platforms (e.g., R vs MATLAB), it's essential to understand these underlying implementations.
I also agree with @Torsten's comment: “You should address this question to the MATLAB development team, not to the forum members as poor end users.”
Hope this helps.