- Create an x vector and a vector of bin edges, so that the count in each cell comes out to the values in your vector "Sample". Use those as inputs to chi2gof().
- Compute the chi-squared test statistic yourself and compare it to a critical value, using the correct degrees of freedom.

# using chi2gof to determine sample representativeness

Hello,

I am trying to use the chi2gof function to test whether the collected sample data is representative of the population data. Say we have 8 bins, with a population count and a sample count for each bin. Is this the correct way to do the test?

Population = [996, 749, 370, 53, 9, 3, 1, 0];

Sample = [647, 486, 100, 22, 0, 0, 0, 0];

[h,p,k]=chi2gof(Sample,'Expected',Population);

### Answers (1)

William Rose
on 26 Oct 2022

Your code (above) does not work because chi2gof expects a vector x containing the observed values of the variable, not the count of how many observations fall in each cell, which is what you have provided.

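There are (at least) two solutions: (1) give chi2gof() an x vector and bin edges that reproduce your counts, or (2) compute the chi-squared statistic yourself and compare it to a critical value.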

Furthermore: Cells with an expected count of 0 cause the calculation of the chi-squared statistic to blow up (division by zero). Cells with fewer than 4-5 expected counts should be combined as needed, until all cells have at least 4-5 expected. Therefore combine cells 4-8 into a single cell:

Population = [996, 749, 370, 53, 9, 3, 1, 0];

Sample = [647, 486, 100, 22, 0, 0, 0, 0];

pop2 = [996, 749, 370, sum(Population(4:8))]

sample2 = [647, 486, 100, sum(Sample(4:8))]

Now let's try method 1 above:

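% Build x: the cell index i, repeated sample2(i) times, for every cell i
x=[];

for i=1:length(sample2), x=[x,i*ones(1,sample2(i))]; end

% Edges at 0.5, 1.5, ... so that each integer index falls in its own cell
edges=.5+(0:length(sample2));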
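
As an aside, the same x vector can be built without a loop. This is only a sketch, and it assumes a reasonably recent MATLAB release that provides repelem and histcounts. It also re-checks that binning x with these edges gives back the counts in sample2 before anything is handed to chi2gof().

```matlab
% Counts per cell after pooling cells 4-8 (values taken from the post above)
sample2 = [647, 486, 100, 22];

% Repeat the cell index i exactly sample2(i) times: 1,1,...,2,2,...,4
x = repelem(1:numel(sample2), sample2);

% Edges at 0.5, 1.5, ..., 4.5 so that each integer index sits alone in its cell
edges = 0.5 + (0:numel(sample2));

% Sanity check: binning x with these edges should reproduce sample2
counts = histcounts(x, edges);
assert(isequal(counts, sample2), 'binned counts do not match sample2');
```

If the assert passes silently, x and edges encode the same counts as sample2.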

Now do the chi2 test using chi2gof(). The third output, k, is a structure with the test statistics, so we inspect it to make sure the observed counts (field "O") are what we want them to be.

[h,p,k]=chi2gof(x,'Expected',pop2,'Edges',edges)

The observed vector "O" has the values in the "sample2" vector. That means our x vector and the edges vector worked as desired.

h=1 means the null hypothesis (that the sample data matches the population) is rejected at the default 5% significance level.

The low p value means it would be highly improbable to get data like these from this population.
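
To check the result in code rather than by eye, something like the following should work. It is only a sketch: it assumes the Statistics and Machine Learning Toolbox is installed, and it relies on the O, E, chi2stat and df fields of the structure that chi2gof() returns.

```matlab
% Pooled counts from the post above (cells 4-8 combined into one cell)
pop2    = [996, 749, 370, 66];
sample2 = [647, 486, 100, 22];

% Expand the counts into raw "observations" plus matching bin edges
x     = repelem(1:numel(sample2), sample2);
edges = 0.5 + (0:numel(sample2));

% Run the test; k is the structure of test statistics
[h, p, k] = chi2gof(x, 'Expected', pop2, 'Edges', edges);

% The observed counts used by chi2gof should equal sample2
assert(isequal(k.O(:), sample2(:)), 'chi2gof binned x differently than intended');

% Show the decision, p value, test statistic and degrees of freedom
fprintf('h=%d, p=%.3g, chi2=%.2f, df=%d\n', h, p, k.chi2stat, k.df);

% Observed and expected counts side by side, one row per cell
disp([k.O(:), k.E(:)]);
```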

Method 2: Compute the chi2 test statistic ourselves, then compare it to the critical value with the correct degrees of freedom.

chi2stat=sum((sample2-pop2).^2./pop2)

df=length(pop2)-1; pcrit=.05; chi2crit=chi2inv(1-pcrit,df); % critical value is the upper-tail quantile, so use 1-pcrit

h2=chi2stat>chi2crit; p2=1-chi2cdf(chi2stat,df);

fprintf('h=%d, p=%.3f\n',h2,p2);

The chi squared statistic and h and p match the test statistic and h and p we found above with Method 1.
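
One more thing to check before trusting either method: pop2 holds raw population counts (2181 in total), while the sample has only 1255 observations, and Pearson's chi-squared statistic normally assumes that the expected counts add up to the sample size. The sketch below rescales the population counts to the sample total before computing the statistic. The variable names expScaled, chi2scaled and pScaled are mine, and whether this rescaling is appropriate depends on how the sample was drawn.

```matlab
% Pooled counts from the post above (cells 4-8 combined into one cell)
pop2    = [996, 749, 370, 66];
sample2 = [647, 486, 100, 22];

% Expected counts if the sample had exactly the population's proportions
expScaled = pop2 * sum(sample2) / sum(pop2);

% Pearson chi-squared statistic against the rescaled expected counts
chi2scaled = sum((sample2 - expScaled).^2 ./ expScaled);

% Degrees of freedom: number of cells minus 1 (no parameters estimated)
df = numel(sample2) - 1;

% Upper-tail p value and the 5% critical value
pScaled  = chi2cdf(chi2scaled, df, 'upper');
chi2crit = chi2inv(0.95, df);

fprintf('chi2=%.2f, crit=%.2f, h=%d, p=%.3g\n', ...
        chi2scaled, chi2crit, chi2scaled > chi2crit, pScaled);
```

With these data the rescaled statistic is still far above the 5% critical value, so the conclusion (reject the null hypothesis) does not change. Passing expScaled as the 'Expected' argument of chi2gof() should give the same statistic.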
