author: “Alexander Levakov”
date: “March, 2015”
Agents aren’t airplanes. They don’t have schedules.
John Le Carre©
The chi-square test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. Does the number of individuals or objects that fall in each category differ significantly from the number you would expect? Is this difference between the expected and observed due to sampling variation, or is it a real difference?
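For a cross tabulation with \(r\) rows and \(c\) columns, the test can be written out as follows. The symbols \(O_{ij}\), \(E_{ij}\), \(R_i\), \(C_j\) and \(N\) are notation assumed here for illustration (observed count, expected count, row total, column total and grand total); they do not come from the report.

\[
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r - 1)(c - 1)
\]

If the row and column variables are independent, \(\chi^2\) approximately follows a chi-square distribution with \(df\) degrees of freedom, and large values of the statistic count as evidence against independence.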
To demonstrate the power of the chi-square test, we use a cross tabulation (Table B.7) from an openly available book (technical report): http://fas.org/sgp/library/spies.pdf
We want to assess the association between the job held (civilian, military) and the source of recruitment (volunteer, intelligence, family), which the author of the book declares to be strong. The table could be used for a chi-square test as is, but we will use the chisq.test and barplot R functions to test this hypothesis.
recruit <- matrix(c(43,20,13,51,12,9),nrow=2,byrow=TRUE)
dimnames(recruit) <- list(job = c("civil", "mil"),source = c("volunt","intell", "family"))
recruit
##        source
## job     volunt intell family
##   civil     43     20     13
##   mil       51     12      9
The author claims that there is an association (\(H_1\)), while we test the null hypothesis \(H_0\) (there is no association) at a significance level of \(P_0=0.05\). A test cannot prove \(H_0\); it can only reject it or fail to reject it. Let’s start!
recruit.chi <- chisq.test(recruit)
recruit.chi
##
## Pearson's Chi-squared test
##
## data: recruit
## X-squared = 3.3024, df = 2, p-value = 0.1918
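x <- seq(from = 0, to = 10, by = .01)
plot(x,dchisq(x,2),main="Chi-square distribution with df=2",type="l",col="blue")
# critical value for the 0.05 level: the rejection region lies to its right
abline(v=qchisq(0.95,df=2),col="red")
# observed statistic returned by chisq.test
abline(v=recruit.chi$statistic,col="darkgreen",lty=2)
grid()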
As we can see (X-squared = 3.3024, df = 2, p-value = 0.1918), the p-value is well above 0.05, so we cannot reject \(H_0\): the data show no statistically significant association between job and source.
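The figures reported by chisq.test can be reproduced by hand, which makes it clearer where they come from. The following is a minimal sketch in base R; the variable names (observed, expected, stat, p_value, critical) are chosen here for illustration, and the counts are re-entered so that the block runs on its own.

```r
# Observed counts, re-entered so the block is self-contained
observed <- matrix(c(43, 20, 13, 51, 12, 9), nrow = 2, byrow = TRUE,
                   dimnames = list(job = c("civil", "mil"),
                                   source = c("volunt", "intell", "family")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum over all cells of (O - E)^2 / E
stat <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the statistic and the critical value at the 0.05 level
p_value  <- pchisq(stat, df = df, lower.tail = FALSE)
critical <- qchisq(0.95, df = df)

round(c(statistic = stat, df = df, p.value = p_value, critical = critical), 4)
```

The statistic falls below the critical value, which is the same decision as comparing the p-value with 0.05.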
Bar plots for observed and expected counts demonstrate the lack of association, too. C’est parfait!
barplot(recruit,legend.text=TRUE,col = c("lightcyan","lightgreen"),xlab="Source",ylab="Count",main="Job occupied by source of recruitment (observed counts)")
recruit2<-recruit.chi$expected
recruit2
##        source
## job       volunt   intell  family
##   civil 48.27027 16.43243 11.2973
##   mil   45.72973 15.56757 10.7027
barplot(recruit2,legend.text=TRUE,col = c("lightcyan","lightgreen"),xlab="Source",ylab="Count",main="Job occupied by source of recruitment (expected counts)")
A more informative chi-square analysis is available in MINITAB and gives the same result. See: http://www.minitab.com/
To be fair, we must quote the author’s own words: “The most striking finding in the table remains the higher proportion of military volunteers compared to civilians.” Well, the diagram at the bottom, % Difference between Observed and Expected Counts, needs no comment. But that chart creates a strong illusion of a difference, while the one at the bottom left shows no real difference at all. That is the point!
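The contrast between the two views can be sketched in R as well. The block below is an illustration under stated assumptions, not a reproduction of the MINITAB output: pct_diff is a name chosen here for the percentage difference between observed and expected counts, and the Pearson residuals, (O - E) / sqrt(E), are taken from the residuals component that chisq.test returns.

```r
# Observed counts, re-entered so the block is self-contained
observed <- matrix(c(43, 20, 13, 51, 12, 9), nrow = 2, byrow = TRUE,
                   dimnames = list(job = c("civil", "mil"),
                                   source = c("volunt", "intell", "family")))

fit <- chisq.test(observed)

# Percentage difference between observed and expected counts, cell by cell
pct_diff <- 100 * (observed - fit$expected) / fit$expected
round(pct_diff, 1)

# Pearson residuals: the same differences scaled by sqrt(expected),
# i.e. each cell's signed contribution to the chi-square statistic
pearson <- fit$residuals
round(pearson, 2)
```

Percentages of ten or twenty percent look dramatic on a chart, but every Pearson residual here stays below 1 in absolute value, which is consistent with the non-significant test result.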
Sometimes you need to test your hypothesis before you print the book!
“I mean, you’ve got to compare method with method, and ideal with ideal.”
John Le Carre©