Here we investigate a case where under a null hypothesis we would expect data to be uniformly distributed among a number of available categories.
(we suspect not!)
In a survey of 256 chief executives of Fortune 500 companies, the numbers born under each of the 12 star signs were found to be:
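(If 256 executives were spread evenly across the signs, we would expect 256/12 ≈ 21.3 per sign. Note that the counts entered below sum to 246, so the test output shown here is based on an expected count of 246/12 = 20.5 per sign.)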
stars <- c(23, 20, 18, 23, 20, 19, 18, 21, 19, 22, 24, 19) # observed counts for the 12 star signs
A suitable null hypothesis here is (delete as applicable): H0: star sign plays a role / no role in success in life; hence the probability that a chief executive has any particular sign is (insert a value).
We can use a chi-square test to tell us how likely it is that we would have obtained counts at least as far from the expected counts as those above, if the null hypothesis were true.
chisq.test(stars)
##
## Chi-squared test for given probabilities
##
## data: stars
## X-squared = 2.2927, df = 11, p-value = 0.9972
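The statistic reported above can be reproduced by hand, which shows what chisq.test() is doing in the uniform case. This is only a sketch: the names k, expected, x2 and p_value are introduced here for illustration and are not part of the original exercise.

```r
# Observed counts, copied from the worksheet
stars <- c(23, 20, 18, 23, 20, 19, 18, 21, 19, 22, 24, 19)

# Under a uniform null, every category has the same expected count
k <- length(stars)
expected <- rep(sum(stars) / k, k)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
x2 <- sum((stars - expected)^2 / expected)

# Upper-tail probability from a chi-square distribution with k - 1 degrees of freedom
p_value <- pchisq(x2, df = k - 1, lower.tail = FALSE)

# Compare with the X-squared, df and p-value lines printed by chisq.test(stars)
round(c(statistic = x2, df = k - 1, p.value = p_value), 4)
```

The degrees of freedom are the number of categories minus one, because once 11 of the counts are known the twelfth is fixed by the total.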
Do we reject the null hypothesis or do we fail to reject it?
Now we investigate a case where under a null hypothesis we would not expect data to be uniformly distributed among a number of available categories, but according to some other known distribution.
This exercise is adapted from: Diez, D. M., Cetinkaya-Rundel, M. and Barr, C. D. (2019) OpenIntro Statistics. 4th edn. Available at: https://www.openintro.org/stat/textbook.php?stat_book=os.
In a certain US locality, the numbers of jurors of different ethnicities were counted. The numbers were found to be:
The question is: Do these numbers fairly reflect the proportions of the population from each of these ethnicities?
A suitable null hypothesis would be…
As proportions of the whole, the numbers are:
jurors <- c(205, 26, 25, 19) # observed juror counts in each category
p0 <- c(0.72, 0.07, 0.12, 0.09) # population proportions expected under the null
# We have to specify the p argument, because under the null we do not have the default case of uniform probabilities for each category
chisq.test(x = jurors, p = p0)
##
## Chi-squared test for given probabilities
##
## data: jurors
## X-squared = 5.8896, df = 3, p-value = 0.1171
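The same by-hand check works when the null proportions are not uniform: the expected counts are simply the total number of jurors multiplied by each null proportion. Again this is a sketch, and the names expected, x2, dof and p_value are illustrative rather than taken from the original exercise.

```r
# Observed counts and null proportions, copied from the worksheet
jurors <- c(205, 26, 25, 19)
p0 <- c(0.72, 0.07, 0.12, 0.09)

# The null proportions must sum to 1 (compared with a numerical tolerance)
stopifnot(isTRUE(all.equal(sum(p0), 1)))

# Expected counts under the null: total sample size times each proportion
expected <- sum(jurors) * p0

# Pearson chi-square statistic and its degrees of freedom
x2 <- sum((jurors - expected)^2 / expected)
dof <- length(jurors) - 1

# Upper-tail probability from the chi-square distribution
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)

# Expected counts, then the quantities to compare with the chisq.test() output
expected
round(c(statistic = x2, df = dof, p.value = p_value), 4)
```

Printing the expected counts is also a quick way to check the usual rule of thumb that each should be at least 5 for the chi-square approximation to be reasonable.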
Do we reject the null hypothesis or do we fail to reject it?