Possible Incorrect Documentation on ksdensity

Question

David Gillcrist on 8 Oct 2024

https://se.mathworks.com/matlabcentral/answers/2158420-possible-incorrect-documentation-on-ksdensity

Answered: Umar on 9 Oct 2024

I'm trying to implement a custom version of ksdensity. In the documentation the default way of calculating the bandwidth is said to be via Silverman's Rule-of-Thumb, i.e. for a bandwidth h this rule would give

This is according to the Wikipedia article on Kernel Density Estimation. However, after digging through the MATLAB files, I found that the default bandwidth is calculated in the MATLAB function matlab.internal.math.validateOrEstimateBW (run open matlab.internal.math.validateOrEstimateBW if you want to view it in its entirety). Lines 64–68 are shown below and are the relevant part:

if isequal(bw, 'normal-approx')
    if all(sigma>0)
        % Default window parameter is optimal for normal distribution
        % Scott's rule
        bw = sigma * (4/((d+2)*N))^(1/(d+4));
    else
        ... % Unimportant
    end
else
    ... % Unimportant
end

The 'normal-approx' setting is the default for bandwidth estimation, so it should implement the rule presented above; however, it is clearly different and is labelled "Scott's Rule". This could mean that Wikipedia references the wrong bandwidth calculation and that Scott's Rule is, in fact, the same as Silverman's Rule-of-Thumb, but it has been hard to find proper confirmation of this (for example, this presentation from UBC has a different rule labelled as Silverman's Rule-of-Thumb), and I cannot find Silverman's original paper where he purportedly first introduced this rule. If someone could confirm whether this is an error in the code or an error in my understanding of the bandwidth calculation, I would be greatly appreciative.
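For a custom implementation, the 'normal-approx' line from the snippet can be reproduced directly. The Python sketch below is an illustration, not MathWorks code: the function name normal_approx_bw is hypothetical, and sigma is passed in by the caller because the excerpt does not show how MATLAB estimates it.

```python
def normal_approx_bw(sigma: float, n: int, d: int = 1) -> float:
    """Hypothetical port of the 'normal-approx' line in the excerpt:
    bw = sigma * (4/((d+2)*N))^(1/(d+4)).

    sigma: spread estimate per dimension, supplied by the caller
           (the excerpt does not show how MATLAB computes it).
    n:     number of observations; d: number of dimensions.
    """
    if sigma <= 0 or n < 1 or d < 1:
        raise ValueError("sigma must be positive; n and d must be >= 1")
    return sigma * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))


# In 1-D the rule is sigma * (4/3)**(1/5) * n**(-1/5), i.e. about 1.06*sigma*n**(-1/5).
bw_1d = normal_approx_bw(sigma=1.0, n=100, d=1)
print(bw_1d, (4.0 / 3.0) ** 0.2 * 100 ** -0.2)
```

With d = 1 the constant (4/3)^(1/5) ≈ 1.059 is the familiar 1.06 normal-reference factor. Naming differs between sources: SciPy's scipy.stats.gaussian_kde, for example, calls the factor (n(d+2)/4)^(-1/(d+4)) 'silverman' and n^(-1/(d+4)) 'scott', which suggests the label in the MATLAB comment is not universal.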

2 Comments

Torsten on 8 Oct 2024

You should address this question to the MATLAB development team, not to the forum members as poor end users.

the cyclist on 9 Oct 2024

This question triggered a distant memory. I searched and found this question and answer from 8 years ago.

Spoiler: It's not going to help.


Answer 1

Umar on 9 Oct 2024

https://se.mathworks.com/matlabcentral/answers/2158420-possible-incorrect-documentation-on-ksdensity#answer_1529265

Hi @David Gillcrist,

After going through your comments and studying the documentation at the link below:

https://www.mathworks.com/help/stats/ksdensity.html?s_tid=doc_ta#btpl6_1-1

To clarify your inquiry regarding the bandwidth estimation for kernel density estimation (KDE) in MATLAB versus traditional statistical rules, let me delve into each component:

Understanding Silverman’s and Scott’s Rules

Both formulas aim to optimize density estimation under different distributional assumptions.
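For reference, the rules discussed here are commonly stated as follows (standard textbook forms, e.g. from Silverman's 1986 book Density Estimation for Statistics and Data Analysis and Scott's 1992 book Multivariate Density Estimation; these are summaries, not quotations from this thread). Here \hat\sigma denotes the spread estimate (the sample standard deviation in the textbook forms, per coordinate in the multivariate case), IQR the interquartile range, n the sample size and d the dimension.

```latex
\begin{aligned}
h_{\text{normal ref.}} &= \left(\frac{4}{3n}\right)^{1/5}\hat\sigma \approx 1.06\,\hat\sigma\,n^{-1/5} && (d = 1)\\
h_{\text{Silverman}} &= 0.9\,\min\!\left(\hat\sigma,\ \frac{\mathrm{IQR}}{1.34}\right) n^{-1/5} && (d = 1)\\
h_{\text{Scott},\,j} &= \hat\sigma_j\, n^{-1/(d+4)} && (\text{coordinate } j)\\
h_{\text{normal-approx}} &= \hat\sigma \left(\frac{4}{(d+2)\,n}\right)^{1/(d+4)} && (\text{MATLAB snippet})
\end{aligned}
```

For d = 1 the last line reduces to the first. For d = 2 the factor 4/(d+2) equals 1, so it coincides with Scott's rule; in general the two differ by the factor (4/(d+2))^{1/(d+4)}.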

MATLAB's Bandwidth Calculation

In the MATLAB snippet you provided from matlab.internal.math.validateOrEstimateBW, it appears that MATLAB defaults to a bandwidth estimation method labeled "normal-approx", which aligns more closely with Scott's Rule than with Silverman's:

bw = sigma * (4/((d+2)*N))^(1/(d+4));

This formula indeed suggests that it uses Scott’s approach by employing a constant derived from normal distribution assumptions.

Clarification on Literature References

The confusion often arises because Silverman and Scott both derive their estimates from similar principles, but their constants differ slightly because of their different derivations. For instance, Silverman adjusts his constant to perform reasonably across a range of distributions, while Scott focuses specifically on normal distributions. The UBC reference you mentioned likely conflates these methods or presents them in a different context.
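The practical effect of the different constants and spread estimates can be checked directly. This Python sketch (standard library only; statistics.quantiles needs Python 3.8+) computes the 1-D normal-reference bandwidth and the 0.9 * min(s, IQR/1.34) * n^(-1/5) variant commonly attributed to Silverman for the same sample. The function names and sample values are illustrative assumptions.

```python
import statistics


def normal_reference_bw(x):
    """1-D normal-reference rule: (4/(3n))**(1/5) * s, about 1.06 * s * n**(-1/5)."""
    n = len(x)
    s = statistics.stdev(x)  # sample standard deviation
    return (4.0 / (3.0 * n)) ** 0.2 * s


def silverman_robust_bw(x):
    """1-D rule of thumb: 0.9 * min(s, IQR/1.34) * n**(-1/5)."""
    n = len(x)
    s = statistics.stdev(x)
    # Quartiles via statistics.quantiles (default method='exclusive');
    # other packages may interpolate quantiles differently.
    q1, _, q3 = statistics.quantiles(x, n=4)
    spread = min(s, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** -0.2


# Hypothetical sample: tightly clustered values plus one outlier, so the
# IQR-based spread is much smaller than the standard deviation.
sample = [0.8, 0.9, 0.95, 1.0, 1.05, 1.1, 1.2, 10.0]
print(normal_reference_bw(sample), silverman_robust_bw(sample))
```

For roughly normal data the two spread estimates are similar, so the rules differ mainly by the constant (0.9 versus about 1.06); with outliers or heavy tails the robust variant can be much smaller.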

Practical Implications

Your personal experience resonates with common practice among statisticians. Many practitioners prefer adjusting bandwidth downwards (e.g., using factors like 0.5 or lower) to avoid over-smoothing, especially with smaller sample sizes where finer details are crucial.
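Since the original goal was a custom ksdensity, the effect of shrinking the bandwidth (the 0.5 factor mentioned above) can be seen with a minimal 1-D Gaussian KDE. This Python sketch is not MATLAB's implementation: the function names are hypothetical, the default bandwidth uses the plain sample standard deviation as an assumption, and ksdensity options such as 'Support' are ignored.

```python
import math
import statistics


def gaussian_kde_1d(data, points, bw):
    """Evaluate a 1-D Gaussian kernel density estimate with bandwidth bw."""
    n = len(data)
    norm = 1.0 / (n * bw * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((p - xi) / bw) ** 2) for xi in data)
            for p in points]


def default_bw(data):
    """Normal-reference bandwidth (4/(3n))**(1/5) * s.
    Assumption: plain sample std, not a robust spread estimate."""
    return (4.0 / (3.0 * len(data))) ** 0.2 * statistics.stdev(data)


# Hypothetical bimodal sample with clusters near -2 and +2.
data = [-2.1, -1.9, -2.0, -2.2, -1.8, 1.9, 2.0, 2.1, 2.2, 1.8]
grid = [i / 10 for i in range(-40, 41)]  # -4.0 to 4.0 in steps of 0.1
h = default_bw(data)
smooth = gaussian_kde_1d(data, grid, h)       # default bandwidth
sharp = gaussian_kde_1d(data, grid, 0.5 * h)  # half the default bandwidth
# grid[40] is x = 0, the gap between the clusters; the smaller bandwidth
# gives a deeper dip there.
print(smooth[40], sharp[40])
```

Halving the bandwidth keeps the two clusters visibly separate, at the cost of a noisier estimate when the sample is small.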

Here are some additional insights I would like to share with you.

Depending on your data distribution characteristics (e.g., skewness or presence of outliers), you might want to explore robust bandwidth selectors beyond Silverman’s or Scott’s rules. For instance, adaptive methods can provide better performance in heterogeneous data contexts. Also, bear in mind that different statistical software packages may implement these rules with slight variations, leading to discrepancies in output. Therefore, when comparing results across platforms (e.g., R vs MATLAB), it's essential to understand these underlying implementations.

I agree with @Torsten’s comment: “You should address this question to the MATLAB development team, not to the forum members as poor end users.”

Hope this helps.
