Kolmogorov–Smirnov goodness-of-fit test

This is a non-parametric test applicable to continuous random variables. It requires no distributional assumption beyond continuity, and the comparison is carried out on cumulative distribution functions (CDFs).

The test statistic is the maximum vertical separation between the theoretical and the observed (empirical) cumulative distribution functions. If x₁ ≤ x₂ ≤ … ≤ x_n are the order statistics of a sample of size n from a continuous random variable X, that is, the observations arranged in ascending order, the empirical cumulative distribution function F_n(x) is:

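$$
F_n(x) =
\begin{cases}
0, & x < x_1 \\
\dfrac{k}{n}, & x_k \le x < x_{k+1}, \quad k = 1, 2, \ldots, n-1 \\
1, & x \ge x_n
\end{cases}
$$

If F_0(x) denotes the theoretical (hypothesized) CDF, the test statistic is

$$
D_n = \max_x \left| F_n(x) - F_0(x) \right|
$$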
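As a minimal Python sketch, the statistic can be computed by assuming the hypothesized CDF is supplied as a plain function. The names `empirical_cdf` and `ks_statistic` are illustrative and do not come from any particular library. Because F_n is a step function, the largest gap to a continuous F_0 occurs at an order statistic, either just after the jump (k/n − F_0(x_k)) or just before it (F_0(x_k) − (k − 1)/n). This means only 2n comparisons are needed.

```python
from typing import Callable, Sequence


def empirical_cdf(sample: Sequence[float], x: float) -> float:
    """F_n(x): fraction of observations less than or equal to x."""
    return sum(1 for value in sample if value <= x) / len(sample)


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Maximum vertical separation between F_n and a hypothesized CDF.

    F_n is a step function, so the supremum of |F_n - F_0| is reached at an
    order statistic x_k: either just after the jump (k/n - F_0(x_k)) or just
    before it (F_0(x_k) - (k - 1)/n).
    """
    xs = sorted(sample)  # order statistics x_1 <= ... <= x_n
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs, start=1):
        f0 = cdf(x)
        d = max(d, k / n - f0, f0 - (k - 1) / n)
    return d
```

For example, `ks_statistic([0.1, 0.4, 0.7], lambda x: min(max(x, 0.0), 1.0))` compares three observations with a uniform(0, 1) CDF and returns about 0.3, which is the gap just after the jump at x = 0.7.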

The critical values depend on the significance level and on which of the following two situations applies:

1. The parameters of the theoretical distribution are not estimated from the sample data (for example, they may have been obtained from some other sample).
2. The parameters are estimated from the same sample data.

For the first case, with large samples (sample size n > 40), the critical values are tabulated by Miller (1956).
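For the first case with large n, critical values can be approximated from the limiting Kolmogorov distribution, K(c) = 1 − 2 Σ_{k≥1} (−1)^{k−1} e^{−2k²c²}, as c_α/√n, where K(c_α) = 1 − α. The sketch below implements only this asymptotic approximation. It stands in for the tabulated values and does not reproduce Miller's table, and the function names are hypothetical. It gives c_α ≈ 1.36 at the 5 % level and 1.63 at the 1 % level, which are the constants commonly quoted for large samples. It does not apply to the second case, where parameters are estimated from the data.

```python
import math


def kolmogorov_cdf(x: float, terms: int = 100) -> float:
    """Limiting CDF K(x) of sqrt(n) * D_n (the Kolmogorov distribution)."""
    if x <= 0.0:
        return 0.0
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
        for k in range(1, terms + 1)
    )
    return 1.0 - 2.0 * series


def asymptotic_critical_value(alpha: float, n: int) -> float:
    """Approximate large-sample critical value c_alpha / sqrt(n) (case 1 only).

    c_alpha solves K(c_alpha) = 1 - alpha and is found by bisection;
    K is increasing, K(0.1) is essentially 0 and K(5.0) is essentially 1.
    """
    lo, hi = 0.1, 5.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / math.sqrt(n)
```

For instance, `asymptotic_critical_value(0.05, 100)` returns about 0.136. An observed D_n larger than this would lead to rejection at the 5 % level.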

For the second case, the critical values are provided by Dallal and Wilkinson (1986).

For the headway problem discussed in the previous section, the results of the K-S test are as follows:

The CDFs of the observed data and the fitted Erlang distribution are shown in Figure 8.6. The maximum vertical separation between them, which is the test statistic, is 0.074. The null hypothesis is “the data follow the Erlang distribution” and the alternative hypothesis is “the data do not follow the Erlang distribution”. Because the Erlang parameters were estimated from the same data, the critical value is taken from the table in Dallal and Wilkinson (1986). At the 1 % significance level the critical value is 0.0603. Since the test statistic (0.074) exceeds this critical value, the null hypothesis is rejected.
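The following is a sketch of the same test against an Erlang hypothesis. It assumes an Erlang distribution with integer shape k ≥ 1 and rate λ, whose CDF is F(t) = 1 − Σ_{j=0}^{k−1} e^{−λt}(λt)^j / j!. The shape/rate parameterization and the function names are assumptions for illustration. The fitted headway parameters from the previous section are not reproduced here. The decision rule is applied to the two figures reported above.

```python
import math
from typing import Sequence


def erlang_cdf(t: float, shape: int, rate: float) -> float:
    """Erlang CDF (integer shape >= 1, rate > 0):
    1 - sum_{j=0}^{shape-1} exp(-rate*t) * (rate*t)**j / j!"""
    if t <= 0.0:
        return 0.0
    lt = rate * t
    tail = sum(math.exp(-lt) * lt ** j / math.factorial(j) for j in range(shape))
    return 1.0 - tail


def ks_statistic_erlang(headways: Sequence[float], shape: int, rate: float) -> float:
    """Maximum vertical separation between the empirical CDF and the Erlang CDF."""
    xs = sorted(headways)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs, start=1):
        f0 = erlang_cdf(x, shape, rate)
        d = max(d, k / n - f0, f0 - (k - 1) / n)
    return d


def reject_null(d_statistic: float, critical_value: float) -> bool:
    """Reject H0 when the observed separation exceeds the critical value."""
    return d_statistic > critical_value


# Figures reported in the text for the headway data (1 % level):
print(reject_null(0.074, 0.0603))  # True, so H0 is rejected
```

The comparison is strict: a statistic below the critical value, such as 0.05, would not lead to rejection at this level.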


Figure 8.6: Observed and Erlang cumulative distribution functions for the time-headway data