chi2gof
Chi-square goodness-of-fit test.
chi2gof
performs a chi-square goodness-of-fit test for discrete or
continuous distributions. The test is performed by grouping the data into
bins, calculating the observed and expected counts for those bins, and
computing the chi-square test statistic
$$ \chi ^ 2 = \sum_{i=1}^N \left (O_i - E_i \right) ^ 2 / E_i $$
where $O_i$ is the observed count and $E_i$ is the expected count in bin $i$,
and $N$ is the number of bins. This test statistic has an approximate
chi-square distribution when the counts are sufficiently large.
Bins in either tail with an expected count less than 5 are pooled with neighboring bins until the count in each extreme bin is at least 5. If bins remain in the interior with counts less than 5, chi2gof displays a warning. In that case, you should use fewer bins, or provide bin centers or bin edges, to increase the expected counts in all bins.
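The pooling rule and the statistic can be traced by hand. The following is a minimal sketch, not the package's implementation: it assumes the observed counts from the Poisson example on this page, computes the Poisson expected counts with core functions only, and pools tail bins with hand-written loops. The variable names and the merge order are illustrative assumptions.

```octave
% Observed counts for the values 0..5 (taken from the Poisson example).
bins = 0:5;
O = [6 16 10 12 4 2];
n = sum (O);

% Expected counts under a Poisson model with the estimated mean,
% written with core functions instead of poisspdf.
lambdaHat = sum (bins .* O) / n;
E = n * exp (-lambdaHat) .* lambdaHat .^ bins ./ factorial (bins);

emin = 5;   % same threshold as the documented "emin" default

% Pool the lower tail: merge the first bin into its neighbor
% until the first expected count reaches emin.
while (numel (E) > 1 && E(1) < emin)
  E(2) = E(2) + E(1);
  O(2) = O(2) + O(1);
  E(1) = [];
  O(1) = [];
endwhile

% Pool the upper tail in the same way.
while (numel (E) > 1 && E(end) < emin)
  E(end-1) = E(end-1) + E(end);
  O(end-1) = O(end-1) + O(end);
  E(end) = [];
  O(end) = [];
endwhile

% Chi-square statistic over the pooled bins.
chi2stat = sum ((O - E) .^ 2 ./ E);
printf ("%d bins after pooling, chi2stat = %.4f\n", numel (E), chi2stat);
```

With these inputs the last two bins are merged, which leaves five bins, and the statistic agrees with the chi2stat value of about 2.555 reported in the Poisson example.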
h = chi2gof (x)
performs a chi-square goodness-of-fit test
that the data in the vector x are a random sample from a normal distribution
with mean and variance estimated from x. The result is h = 0 if
the null hypothesis (that x is a random sample from a normal
distribution) cannot be rejected at the 5% significance level, or h = 1
if the null hypothesis can be rejected at the 5% level. By default, chi2gof
uses 10 bins ("nbins") and compares the test statistic to a chi-square
distribution with "nbins" - 3 degrees of freedom, to take into account that
two parameters were estimated.
[h, p] = chi2gof (x)
also returns the p-value p,
which is the probability of observing the given result, or one more extreme,
by chance if the null hypothesis is true. If there are not enough degrees of
freedom to carry out the test, p is NaN.
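As a rough illustration of how df, p, and h relate, the sketch below recomputes the p-value from the chi2stat value and the pooled bin count printed in the first example on this page. It assumes that the p-value is the upper tail of a chi-square distribution, evaluated here with the core function gammainc rather than a statistics-package call. The variable names are illustrative, and the handling of h when p is NaN is an assumption of this sketch only.

```octave
% Values read from the first example's output: 8 bins remain after
% pooling, and two parameters (mean and variance) were estimated.
nbins_pooled = 8;
nparams = 2;
df = nbins_pooled - 1 - nparams;

chi2stat = 4.0212;
alpha = 0.05;

if (df > 0)
  % Upper tail of the chi-square distribution with df degrees of
  % freedom, via the regularized upper incomplete gamma function.
  p = gammainc (chi2stat / 2, df / 2, "upper");
else
  % Not enough degrees of freedom to carry out the test.
  p = NaN;
endif

% Reject the null hypothesis when p < alpha. In this sketch a NaN
% p-value yields h = 0, because NaN < alpha is false.
h = double (p < alpha);
printf ("df = %d, p = %.4f, h = %d\n", df, p, h);
```

The result is df = 5 and a p-value of about 0.5464, in line with the example output, so the null hypothesis is not rejected at the 5% level.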
[h, p, stats] = chi2gof (x)
also returns a
stats structure with the following fields:
Field | Description |
---|---|
"chi2stat" | Chi-square statistic |
"df" | Degrees of freedom |
"edges" | Vector of bin edges after pooling |
"O" | Observed count in each bin |
"E" | Expected count in each bin |
[…] = chi2gof (x, name1, value1, …)
specifies optional argument name/value pairs chosen from the following list.
Name | Value |
---|---|
"nbins" | The number of bins to use. Default is 10. |
"binctrs" | A vector of bin centers. |
"binedges" | A vector of bin edges. |
"cdf" | A fully specified cumulative distribution function, given as a function handle. Alternatively, a cell array whose first element is a function handle and whose later elements are parameter values, one per cell. The function must take x values as its first argument and other parameters as later arguments. |
"expected" | A vector with one element per bin specifying the expected counts for each bin. |
"nparams" | The number of estimated parameters; used to adjust the degrees of freedom to "nbins" - 1 - "nparams", where "nbins" is the number of bins. |
"emin" | The minimum allowed expected value for a bin; any bin in either tail having an expected value less than this amount is pooled with a neighboring bin. Use the value 0 to prevent pooling. Default is 5. |
"frequency" | A vector of the same length as x containing the frequency of the corresponding x values. |
"alpha" | The significance level: the hypothesis is rejected if p < alpha. Default is 0.05. |
You should specify either the "cdf" or the "expected" parameter, but not both. If your "cdf" input contains extra parameters, these are accounted for automatically and there is no need to specify "nparams". If your "expected" input depends on estimated parameters, you should use the "nparams" parameter to ensure that the degrees of freedom for the test are correct.
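To make the cell-array form of "cdf" more concrete, the sketch below shows one plausible way to turn such a specification into expected counts per bin. This is an assumption for illustration, not the package's code: normal_cdf is a hypothetical stand-in for normcdf written with the core function erfc, the bin edges and sample size are made up, and counting the extra cells as parameters simply mirrors the statement above that they are accounted for automatically.

```octave
% Hypothetical normal CDF built from core erfc. It takes x values
% first and the parameters afterwards, as the "cdf" option requires.
normal_cdf = @(x, mu, sigma) 0.5 * erfc (-(x - mu) ./ (sigma * sqrt (2)));

% Cell-array form: function handle first, then one parameter per cell.
cdf_spec = {normal_cdf, 50, 5};

% Illustrative bin edges and sample size (not taken from the package).
edges = [-Inf, 45, 50, 55, Inf];
n = 100;

% Evaluate the CDF at the edges, passing the extra cells as parameters.
F = feval (cdf_spec{1}, edges, cdf_spec{2:end});

% Expected count per bin: n times the probability mass of each bin.
E = n * diff (F);

% Number of extra parameters carried by the cell array.
nparams = numel (cdf_spec) - 1;

disp (E);
printf ("nparams = %d\n", nparams);
```

A vector built this way could also be passed through the "expected" option, in which case "nparams" would have to be given explicitly if the parameters were estimated from the data.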
    x = normrnd (50, 5, 100, 1);
    [h, p, stats] = chi2gof (x)
    [h, p, stats] = chi2gof (x, "cdf", @(x)normcdf (x, mean(x), std(x)))
    [h, p, stats] = chi2gof (x, "cdf", {@normcdf, mean(x), std(x)})

Each of the three calls prints the same result:

    h = 0
    p = 0.5464
    stats =
      scalar structure containing the fields:
        chi2stat = 4.0212
        df = 5
        edges =
           38.399   42.726   44.890   47.053   49.217   51.380   53.544   55.708   60.035
        O =
            9    7    9   22   17   14   10   12
        E =
            6.8588    8.2228   13.0313   16.8721   17.8471   15.4236   10.8899   10.8544

    x = rand (100, 1);
    n = length (x);
    binedges = linspace (0, 1, 11);
    expectedCounts = n * diff (binedges);
    [h, p, stats] = chi2gof (x, "binedges", binedges, "expected", expectedCounts)

    h = 0
    p = 0.055361
    stats =
      scalar structure containing the fields:
        chi2stat = 16.600
        df = 9
        edges =
         Columns 1 through 8:
           6.6704e-03   1.0438e-01   2.0209e-01   2.9980e-01   3.9751e-01   4.9521e-01   5.9292e-01   6.9063e-01
         Columns 9 through 11:
           7.8834e-01   8.8605e-01   9.8376e-01
        O =
            8   14    5   13    2   13   11    7   12   15
        E =
           10   10   10   10   10   10   10   10   10   10

    bins = 0:5;
    obsCounts = [6 16 10 12 4 2];
    n = sum (obsCounts);
    lambdaHat = sum (bins .* obsCounts) / n;
    expCounts = n * poisspdf (bins, lambdaHat);
    [h, p, stats] = chi2gof (bins, "binctrs", bins, "frequency", obsCounts, ...
                             "expected", expCounts, "nparams", 1)

    h = 0
    p = 0.4654
    stats =
      scalar structure containing the fields:
        chi2stat = 2.5550
        df = 3
        edges =
           4.9407e-324   8.3333e-01   1.6667e+00   2.5000e+00   3.3333e+00   5.0000e+00
        O =
            6   16   10   12    6
        E =
            7.0429   13.8041   13.5280    8.8383    6.0284