kstest2

Two-sample Kolmogorov-Smirnov test

Syntax

h = kstest2(x1,x2)

h = kstest2(x1,x2,Name,Value)

[h,p] = kstest2(___)

[h,p,ks2stat] = kstest2(___)

Description

h = kstest2(x1,x2) returns a test decision for the null hypothesis that the data in vectors x1 and x2 are from the same continuous distribution, using the two-sample Kolmogorov-Smirnov test. The alternative hypothesis is that x1 and x2 are from different continuous distributions. The result h is 1 if the test rejects the null hypothesis at the 5% significance level, and 0 otherwise.

h = kstest2(x1,x2,Name,Value) returns a test decision for a two-sample Kolmogorov-Smirnov test with additional options specified by one or more name-value pair arguments. For example, you can change the significance level or conduct a one-sided test.

[h,p] = kstest2(___) also returns the asymptotic p-value p, using any of the input arguments from the previous syntaxes.

[h,p,ks2stat] = kstest2(___) also returns the test statistic ks2stat.

Examples

Test Two Samples for the Same Distribution

Generate sample data from two different Weibull distributions.

rng(1);     % For reproducibility
x1 = wblrnd(1,1,1,50);
x2 = wblrnd(1.2,2,1,50);

Test the null hypothesis that data in vectors x1 and x2 comes from populations with the same distribution.

h = kstest2(x1,x2)

h = logical
   1

The returned value of h = 1 indicates that kstest2 rejects the null hypothesis at the default 5% significance level.

Test the Hypothesis at Different Significance Levels

Generate sample data from two different Weibull distributions.

rng(1);     % For reproducibility
x1 = wblrnd(1,1,1,50);
x2 = wblrnd(1.2,2,1,50);

Test the null hypothesis that data vectors x1 and x2 are from populations with the same distribution at the 1% significance level.

[h,p] = kstest2(x1,x2,'Alpha',0.01)

h = logical
   0

p = 
0.0317

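The returned value of h = 0 indicates that kstest2 does not reject the null hypothesis at the 1% significance level. The p-value of 0.0317 is larger than 0.01 but smaller than 0.05, which is why the same data lead to a rejection at the default 5% level.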

One-Sided Hypothesis Test

Generate sample data from two different Weibull distributions.

rng(1);     % For reproducibility
x1 = wblrnd(1,1,1,50);
x2 = wblrnd(1.2,2,1,50);

Test the null hypothesis that data in vectors x1 and x2 comes from populations with the same distribution, against the alternative hypothesis that the cdf of the distribution of x1 is larger than the cdf of the distribution of x2.

[h,p,k] = kstest2(x1,x2,'Tail','larger')

h = logical
   1

p = 
0.0158

k = 
0.2800

The returned value of h = 1 indicates that kstest2 rejects the null hypothesis, in favor of the alternative hypothesis that the cdf of the distribution of x1 is larger than the cdf of the distribution of x2, at the default 5% significance level. The returned value of k is the test statistic for the two-sample Kolmogorov-Smirnov test.

Input Arguments

`x1` — Sample data
vector

Sample data from the first sample, specified as a vector. Data vectors x1 and x2 do not need to be the same size.

Data Types: single | double

`x2` — Sample data
vector

Sample data from the second sample, specified as a vector. Data vectors x1 and x2 do not need to be the same size.

Data Types: single | double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'Tail','larger','Alpha',0.01 specifies a test using the alternative hypothesis that the cdf of x1 is larger than the cdf of x2, conducted at the 1% significance level.
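
The two calling conventions described above can be compared side by side. The following is a minimal sketch that reuses the sample data from the examples on this page. It assumes R2021a or later for the Name=Value form and that Statistics and Machine Learning Toolbox is installed; the variable names h1, p1, h2, and p2 are illustrative only.

```matlab
rng(1);     % For reproducibility
x1 = wblrnd(1,1,1,50);
x2 = wblrnd(1.2,2,1,50);

% Name=Value form (assumes R2021a or later)
[h1,p1] = kstest2(x1,x2,Tail='larger',Alpha=0.01);

% Comma-separated form with quoted names (works in earlier releases too)
[h2,p2] = kstest2(x1,x2,'Tail','larger','Alpha',0.01);

% Both forms pass the same options, so the results should agree
sameResult = isequal(h1,h2) && isequal(p1,p2);
```

Because the order of the pairs does not matter, swapping Tail and Alpha in either call should give the same outputs.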

`Alpha` — Significance level
`0.05` (default) | scalar value in the range (0,1)

Significance level of the hypothesis test, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range (0,1).

Example: 'Alpha',0.01

Data Types: single | double

`Tail` — Type of alternative hypothesis
`'unequal'` (default) | `'larger'` | `'smaller'`

Type of alternative hypothesis to evaluate, specified as the comma-separated pair consisting of 'Tail' and one of the following.

`'unequal'`	Test the alternative hypothesis that the cdf of `x1` is unequal to the cdf of `x2`.
`'larger'`	Test the alternative hypothesis that the cdf of `x1` is larger than the cdf of `x2`.
`'smaller'`	Test the alternative hypothesis that the cdf of `x1` is smaller than the cdf of `x2`.

If the data values in x1 tend to be larger than those in x2, the empirical distribution function of x1 tends to be smaller than that of x2, and vice versa.

Example: 'Tail','larger'
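
The relationship between larger data values and a smaller empirical distribution function can be checked by hand. The sketch below uses made-up values (not taken from the examples on this page) and base MATLAB only. The names pts, F1, and F2 are illustrative, and the implicit expansion in the comparison assumes R2016b or later.

```matlab
x2 = [1 2 3 4];
x1 = x2 + 1;            % every x1 value is larger than its x2 counterpart

% Evaluate both empirical cdfs at each distinct observed value
pts = unique([x1 x2])';             % column vector of evaluation points
F1 = mean(x1 <= pts, 2);            % proportion of x1 values <= each point
F2 = mean(x2 <= pts, 2);            % proportion of x2 values <= each point

% The empirical cdf of the larger-valued sample never exceeds the other one
x1CdfIsSmaller = all(F1 <= F2);
```

For data like this, 'Tail','smaller' is the one-sided alternative that matches the direction of the difference.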

Output Arguments

`h` — Hypothesis test result
`1` | `0`

Hypothesis test result, returned as a logical value.

If h = 1, this indicates the rejection of the null hypothesis at the Alpha significance level.
If h = 0, this indicates a failure to reject the null hypothesis at the Alpha significance level.

`p` — Asymptotic p-value
scalar value in the range (0,1)

Asymptotic p-value of the test, returned as a scalar value in the range (0,1). p is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. The asymptotic p-value becomes very accurate for large sample sizes, and is believed to be reasonably accurate for sample sizes n1 and n2, such that (n1*n2)/(n1 + n2) ≥ 4.
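
As a quick illustration of the sample-size guideline above, the following sketch evaluates (n1*n2)/(n1 + n2) for two pairs of sample sizes. The helper name nEff and the second pair of sizes are illustrative assumptions, not part of kstest2.

```matlab
% Effective sample size used in the accuracy guideline
nEff = @(n1,n2) (n1*n2)/(n1 + n2);

nExample = nEff(50,50);     % sizes used in the examples on this page
nSmall   = nEff(5,6);       % a hypothetical pair of small samples

exampleOk = nExample >= 4;  % guideline satisfied
smallOk   = nSmall >= 4;    % guideline not satisfied; treat p with caution
```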

`ks2stat` — Test statistic
nonnegative scalar value

Test statistic, returned as a nonnegative scalar value.

More About

Two-Sample Kolmogorov-Smirnov Test

The two-sample Kolmogorov-Smirnov test is a nonparametric hypothesis test that evaluates the difference between the cdfs of the distributions of the two sample data vectors over the range of x in each data set.

The two-sided test ('Tail','unequal') uses the maximum absolute difference between the empirical cdfs of the distributions of the two data vectors. The test statistic is

$D^{*} = \max_{x} (| {\hat{F}}_{1} (x) - {\hat{F}}_{2} (x) |),$

where ${\hat{F}}_{1} (x)$ is the proportion of x1 values less than or equal to x and ${\hat{F}}_{2} (x)$ is the proportion of x2 values less than or equal to x.

The one-sided test uses the actual value of the difference between the empirical cdfs of the distributions of the two data vectors rather than the absolute value. If you specify 'Tail','larger', the test statistic is

$D^{*} = \max_{x} ({\hat{F}}_{1} (x) - {\hat{F}}_{2} (x)) .$

If you specify 'Tail','smaller', the test statistic is

$D^{*} = \max_{x} ({\hat{F}}_{2} (x) - {\hat{F}}_{1} (x)) .$
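
The three statistics defined above can be computed directly from the definitions. This is a minimal sketch in base MATLAB with small made-up samples. It is not the implementation used inside kstest2, and the variable names are illustrative. The implicit expansion in the comparisons assumes R2016b or later.

```matlab
% Small hand-checkable samples (illustrative values)
x1 = [0.2 0.5 0.9 1.4];
x2 = [0.4 1.1 1.6 2.0 2.5];

% The empirical cdfs are step functions, so it is enough to evaluate
% them at every observed value
pts = sort([x1(:); x2(:)]);
F1 = mean(x1(:)' <= pts, 2);    % proportion of x1 values <= each point
F2 = mean(x2(:)' <= pts, 2);    % proportion of x2 values <= each point

dUnequal = max(abs(F1 - F2));   % statistic for 'Tail','unequal'
dLarger  = max(F1 - F2);        % statistic for 'Tail','larger'
dSmaller = max(F2 - F1);        % statistic for 'Tail','smaller'
```

For these samples the empirical cdf of x1 is never below that of x2, so dUnequal and dLarger coincide and dSmaller is zero.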

Algorithms

In kstest2, the decision to reject the null hypothesis is based on comparing the p-value p with the significance level Alpha, not on comparing the test statistic ks2stat with a critical value.
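
The decision rule can be reproduced from the returned p-value. The sketch below assumes that the null hypothesis is rejected when p is less than or equal to Alpha; the handling of the exact boundary case is an assumption rather than something stated on this page. It requires Statistics and Machine Learning Toolbox, and hManual is an illustrative name.

```matlab
rng(1);     % For reproducibility
x1 = wblrnd(1,1,1,50);
x2 = wblrnd(1.2,2,1,50);

alpha = 0.01;
[h,p] = kstest2(x1,x2,'Alpha',alpha);

% Assumed rule: reject when the p-value does not exceed the significance level
hManual = (p <= alpha);

decisionsAgree = isequal(h,hManual);
```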

Version History

Introduced before R2006a
