kstest2

Two-sample Kolmogorov-Smirnov test

Syntax

`h = kstest2(x1,x2)`
`h = kstest2(x1,x2,Name,Value)`
`[h,p] = kstest2(___)`
`[h,p,ks2stat] = kstest2(___)`

Description

`h = kstest2(x1,x2)` returns a test decision for the null hypothesis that the data in vectors `x1` and `x2` are from the same continuous distribution, using the two-sample Kolmogorov-Smirnov test. The alternative hypothesis is that `x1` and `x2` are from different continuous distributions. The result `h` is `1` if the test rejects the null hypothesis at the 5% significance level, and `0` otherwise.

`h = kstest2(x1,x2,Name,Value)` returns a test decision for a two-sample Kolmogorov-Smirnov test with additional options specified by one or more name-value pair arguments. For example, you can change the significance level or conduct a one-sided test.

`[h,p] = kstest2(___)` also returns the asymptotic p-value `p`, using any of the input arguments from the previous syntaxes.

`[h,p,ks2stat] = kstest2(___)` also returns the test statistic `ks2stat`.

Examples

Generate sample data from two different Weibull distributions.

```
rng(1);     % For reproducibility
x1 = wblrnd(1,1,1,50);
x2 = wblrnd(1.2,2,1,50);
```

Test the null hypothesis that data in vectors `x1` and `x2` comes from populations with the same distribution.

```
h = kstest2(x1,x2)
```

```
h = logical
   1
```

The returned value of `h = 1` indicates that `kstest2` rejects the null hypothesis at the default 5% significance level.

Generate sample data from two different Weibull distributions.

```
rng(1);     % For reproducibility
x1 = wblrnd(1,1,1,50);
x2 = wblrnd(1.2,2,1,50);
```

Test the null hypothesis that data vectors `x1` and `x2` are from populations with the same distribution at the 1% significance level.

```
[h,p] = kstest2(x1,x2,'Alpha',0.01)
```

```
h = logical
   0

p = 0.0317
```

The returned value of `h = 0` indicates that `kstest2` does not reject the null hypothesis at the 1% significance level.

Generate sample data from two different Weibull distributions.

```
rng(1);     % For reproducibility
x1 = wblrnd(1,1,1,50);
x2 = wblrnd(1.2,2,1,50);
```

Test the null hypothesis that data in vectors `x1` and `x2` comes from populations with the same distribution, against the alternative hypothesis that the cdf of the distribution of `x1` is larger than the cdf of the distribution of `x2`.

```
[h,p,k] = kstest2(x1,x2,'Tail','larger')
```

```
h = logical
   1

p = 0.0158

k = 0.2800
```

The returned value of `h = 1` indicates that `kstest2` rejects the null hypothesis, in favor of the alternative hypothesis that the cdf of the distribution of `x1` is larger than the cdf of the distribution of `x2`, at the default 5% significance level. The returned value of `k` is the test statistic for the two-sample Kolmogorov-Smirnov test.

Input Arguments

Sample data from the first sample, `x1`, specified as a vector. Data vectors `x1` and `x2` do not need to be the same size.

Data Types: `single` | `double`

Sample data from the second sample, `x2`, specified as a vector. Data vectors `x1` and `x2` do not need to be the same size.

Data Types: `single` | `double`

Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `'Tail','larger','Alpha',0.01` specifies a test using the alternative hypothesis that the empirical cdf of `x1` is larger than the empirical cdf of `x2`, conducted at the 1% significance level.
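As a brief illustration of the two calling conventions described above, the following sketch runs the same one-sided test with both syntaxes. The sample vectors are illustrative only (they reuse the Weibull data from the examples), and the `Name=Value` call assumes release R2021a or later.

```matlab
% Illustrative data; any two numeric vectors would work here
rng(1);                          % for reproducibility
x1 = wblrnd(1,1,1,50);
x2 = wblrnd(1.2,2,1,50);

% Name=Value syntax (assumes R2021a or later)
[hNew,pNew] = kstest2(x1,x2,Tail='larger',Alpha=0.01);

% Comma-separated syntax with quoted names (also valid before R2021a)
[hOld,pOld] = kstest2(x1,x2,'Tail','larger','Alpha',0.01);

% Both calls specify the same test, so the outputs should be identical
sameResult = isequal(hNew,hOld) && isequal(pNew,pOld)
```

The order of the pairs does not matter, so `Alpha=0.01,Tail='larger'` would specify the same test.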

Significance level of the hypothesis test, specified as the comma-separated pair consisting of `'Alpha'` and a scalar value in the range (0,1).

Example: `'Alpha',0.01`

Data Types: `single` | `double`

Type of alternative hypothesis to evaluate, specified as the comma-separated pair consisting of `'Tail'` and one of the following.

• `'unequal'`: Test the alternative hypothesis that the empirical cdf of `x1` is unequal to the empirical cdf of `x2`.

• `'larger'`: Test the alternative hypothesis that the empirical cdf of `x1` is larger than the empirical cdf of `x2`.

• `'smaller'`: Test the alternative hypothesis that the empirical cdf of `x1` is smaller than the empirical cdf of `x2`.

If the data values in `x1` tend to be larger than those in `x2`, the empirical distribution function of `x1` tends to be smaller than that of `x2`, and vice versa.
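The relationship between the size of the data values and the size of the empirical cdf can be counterintuitive, so a small sketch may help. The samples and the helper `ecdfAt` below are hypothetical and use only base MATLAB functions; they are not part of `kstest2`.

```matlab
% Hypothetical samples: every value in x1 lies above every value in x2
x2 = [1 2 3 4 5];
x1 = x2 + 10;

% Empirical cdf of sample s at point t: proportion of values in s that are <= t
ecdfAt = @(s,t) mean(s <= t);

t  = 7;                  % a point between the two samples
F1 = ecdfAt(x1,t)        % 0, because no x1 value is <= 7
F2 = ecdfAt(x2,t)        % 1, because every x2 value is <= 7
```

Here the larger data values in `x1` produce the smaller empirical cdf, so under these assumptions `'smaller'` would be the one-sided alternative that matches the suspicion that `x1` values tend to exceed `x2` values.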

Example: `'Tail','larger'`

Output Arguments

Hypothesis test result `h`, returned as a logical value.

• If `h = 1`, the test rejects the null hypothesis at the `Alpha` significance level.

• If `h = 0`, the test fails to reject the null hypothesis at the `Alpha` significance level.

Asymptotic p-value of the test, returned as a scalar value in the range (0,1). `p` is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. The asymptotic p-value becomes very accurate for large sample sizes, and is believed to be reasonably accurate for sample sizes `n1` and `n2` such that `(n1*n2)/(n1 + n2) ≥ 4`.
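The sample-size condition for the asymptotic p-value can be checked directly before interpreting `p`. The sketch below uses the sample sizes from the examples (50 observations each); the variable names `nEff` and `isReasonable` are illustrative.

```matlab
% Sample sizes as in the examples on this page
n1 = 50;
n2 = 50;

% Quantity that appears in the accuracy condition
nEff = (n1*n2)/(n1 + n2)         % 25

% The asymptotic p-value is believed to be reasonably accurate when nEff >= 4
isReasonable = nEff >= 4         % true
```

For very small samples, for example `n1 = 5` and `n2 = 6`, the same quantity is `30/11`, which is below 4, so the asymptotic p-value should be read with more caution.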

Test statistic `ks2stat`, returned as a nonnegative scalar value.

Two-Sample Kolmogorov-Smirnov Test

The two-sample Kolmogorov-Smirnov test is a nonparametric hypothesis test that evaluates the difference between the cdfs of the distributions of the two sample data vectors over the range of x in each data set.

The two-sided test uses the maximum absolute difference between the cdfs of the distributions of the two data vectors. The test statistic is

$$D^{*} = \max_{x} \left( \left| \hat{F}_{1}(x) - \hat{F}_{2}(x) \right| \right),$$

where $\hat{F}_{1}(x)$ is the proportion of `x1` values less than or equal to $x$ and $\hat{F}_{2}(x)$ is the proportion of `x2` values less than or equal to $x$.

The one-sided test uses the actual value of the difference between the cdfs of the distributions of the two data vectors rather than the absolute value. The test statistic is

$$D^{*} = \max_{x} \left( \hat{F}_{1}(x) - \hat{F}_{2}(x) \right).$$
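The two statistics above can be computed by hand from the empirical cdfs. The following is a minimal sketch in base MATLAB with small made-up samples; it illustrates the formulas and is not the internal implementation of `kstest2`. Because both empirical cdfs are step functions that change only at observed values, it is enough to evaluate them at the pooled data points.

```matlab
% Hypothetical small samples
x1 = [0.1 0.4 0.7 1.3];
x2 = [0.5 0.9 1.1 1.8 2.5];

% Evaluate both empirical cdfs at every observed value
xAll = sort([x1 x2]);
F1 = arrayfun(@(t) mean(x1 <= t), xAll);   % proportion of x1 values <= t
F2 = arrayfun(@(t) mean(x2 <= t), xAll);   % proportion of x2 values <= t

% Two-sided statistic: maximum absolute difference
Dtwo = max(abs(F1 - F2))                   % approximately 0.55

% One-sided statistic: maximum signed difference F1 - F2
Done = max(F1 - F2)                        % approximately 0.55
```

In this data set the largest gap occurs at `x = 0.7`, where the empirical cdf of `x1` is 0.75 and that of `x2` is 0.2. The two statistics coincide here because the largest absolute difference happens to be a positive one.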

Algorithms

In `kstest2`, the decision to reject the null hypothesis is based on comparing the p-value `p` with the significance level `Alpha`, not on comparing the test statistic `ks2stat` with a critical value.
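The decision rule can be reproduced from the returned p-value. The sketch below assumes that rejection corresponds to `p <= Alpha`; the handling of the boundary case `p == Alpha` is an assumption, not something this page states.

```matlab
% Same data as in the examples on this page
rng(1);                          % for reproducibility
x1 = wblrnd(1,1,1,50);
x2 = wblrnd(1.2,2,1,50);

alpha = 0.01;
[h,p] = kstest2(x1,x2,'Alpha',alpha);

% Recompute the decision from the p-value alone (assumed rule: reject if p <= alpha)
hManual = (p <= alpha);
agrees  = isequal(h,hManual)
```

With these inputs the examples report `p = 0.0317`, which exceeds 0.01, so both `h` and `hManual` are expected to be `0`.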

Version History

Introduced before R2006a