Test of Independence (Chi-Squared Test)

Suppose we want to examine the association between liking statistics and liking mathematics, coding "dislike" as 0 and "like" as 1:

stat <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0)
math <- c(0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0)
# Cross-tabulate the two variables and append row/column totals
contingency_table <- addmargins(table(stat, math))
print(contingency_table)

##      math
## stat   0  1 Sum
##   0   10  2  12
##   1    4  4   8
##   Sum 14  6  20

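The significance of an association between two qualitative variables like these is assessed with the test of independence (chi-squared test). The test statistic (the \(\chi^2\) value) is \[ \chi^2 = \sum_{i=1}^{n} \frac{(O_i-E_i)^2}{E_i}, \quad O_i: \text{observed frequency},\ E_i: \text{expected frequency}, \] where the sum runs over all \(n\) cells of the table (four cells for a 2 × 2 table), and the expected frequency of a cell in the cross-tabulation is \[ E_i = \frac{(\text{marginal total of the cell's row}) \times (\text{marginal total of the cell's column})}{\text{grand total}}. \] To get a feel for the shape of the \(\chi^2\) distribution, plot its density for several degrees of freedom: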

# Chi-squared densities for 2, 1, 4 and 8 degrees of freedom on one plot
curve(dchisq(x, 2), 0, 20)
curve(dchisq(x, 1), 0, 20, add = TRUE)
curve(dchisq(x, 4), 0, 20, add = TRUE)
curve(dchisq(x, 8), 0, 20, add = TRUE)

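For the 2 × 2 table in this example the degrees of freedom are (2 - 1) × (2 - 1) = 1. As the degrees of freedom increase, the \(\chi^2\) distribution approaches a normal distribution: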

curve(dchisq(x, 40), 0, 100)
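To make the comparison with the normal distribution visible, the following sketch overlays a normal density with the same mean and variance (a \(\chi^2\) distribution with \(k\) degrees of freedom has mean \(k\) and variance \(2k\)). The helper `normal_gap` and the choice of grid are illustrative assumptions, not part of the original notes.

```r
# Degrees of freedom used for illustration
df_large <- 40

# Chi-squared density (solid) and normal density with matching mean and variance (dashed)
curve(dchisq(x, df_large), 0, 100, ylab = "density")
curve(dnorm(x, mean = df_large, sd = sqrt(2 * df_large)), add = TRUE, lty = 2)

# Hypothetical helper: largest pointwise gap between the two densities on a grid
normal_gap <- function(df, grid = seq(0.01, 100, by = 0.01)) {
  max(abs(dchisq(grid, df) - dnorm(grid, mean = df, sd = sqrt(2 * df))))
}

# The gap shrinks as the degrees of freedom grow
print(c(df4 = normal_gap(4), df40 = normal_gap(40)))
```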

Write the observed frequencies as

	0	1	Sum
0	\(n_{00}\)	\(n_{01}\)	\(n_{0*}\)
1	\(n_{10}\)	\(n_{11}\)	\(n_{1*}\)
Sum	\(n_{*0}\)	\(n_{*1}\)	\(n_{**}\)

and the expected frequencies as

	0	1	Sum
0	\(e_{00}\)	\(e_{01}\)	\(e_{0*}\)
1	\(e_{10}\)	\(e_{11}\)	\(e_{1*}\)
Sum	\(e_{*0}\)	\(e_{*1}\)	\(e_{**}\)

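With \(e_{ij} = n_{i*}n_{*j}/n_{**}\), substituting into the definition of the statistic gives, for a 2 × 2 table, \[ \chi^2 = n_{**}\frac{(n_{1*} n_{*1}-n_{**}n_{11})^2}{n_{1*}n_{0*}n_{*1}n_{*0}}. \]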
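As a check on the formulas, the sketch below computes the expected frequencies and the \(\chi^2\) value by hand, once from the general definition and once from the 2 × 2 formula written with marginal totals. The counts are copied from the contingency table printed earlier. The variable names are chosen here for illustration only.

```r
# Observed counts from the contingency table (rows: stat, columns: math)
observed <- matrix(c(10, 4, 2, 4), nrow = 2,
                   dimnames = list(stat = c("0", "1"), math = c("0", "1")))
n_total <- sum(observed)

# Expected count of each cell = row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / n_total

# General definition: sum of (O - E)^2 / E over all cells
chisq_general <- sum((observed - expected)^2 / expected)

# 2 x 2 formula written with the marginal totals
row0 <- sum(observed["0", ]); row1 <- sum(observed["1", ])
col0 <- sum(observed[, "0"]); col1 <- sum(observed[, "1"])
chisq_shortcut <- n_total * (row1 * col1 - n_total * observed["1", "1"])^2 /
  (row1 * row0 * col1 * col0)

print(expected)
print(c(general = chisq_general, shortcut = chisq_shortcut))
```

Both routes should give the same value, which can then be compared with the `X-squared` figure reported by `chisq.test` when the continuity correction is switched off.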

Null hypothesis \(H_0\): the two variables are independent.
Alternative hypothesis \(H_1\): the two variables are associated.

Set the significance level to 5%, that is, \(\alpha=0.05\), and carry out the test under this condition.

chisq.test(stat, math, correct = F)

## Warning in chisq.test(stat, math, correct = F): Chi-squared approximation
## may be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  stat and math
## X-squared = 2.5397, df = 1, p-value = 0.111
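The warning in the output is issued by `chisq.test` when some expected frequency is below 5, in which case the \(\chi^2\) approximation may be poor. A minimal sketch for inspecting the expected counts follows. It rebuilds the table from the printed counts, and `suppressWarnings()` is used only to keep the output quiet.

```r
# Observed counts from the contingency table (rows: stat, columns: math)
observed <- matrix(c(10, 4, 2, 4), nrow = 2,
                   dimnames = list(stat = c("0", "1"), math = c("0", "1")))

# Run the same uncorrected test on the table and keep the result object
test_result <- suppressWarnings(chisq.test(observed, correct = FALSE))

# Expected counts under independence, and how many of them fall below 5
print(test_result$expected)
small_cells <- sum(test_result$expected < 5)
print(small_cells)
```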

The critical value at the 5% significance level is

qchisq(0.95, 1)

## [1] 3.841459

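The rejection region is therefore \(\chi^2 > 3.841459\). The observed statistic, 2.5397, lies outside this region (equivalently, the p-value 0.111 exceeds 0.05), so the null hypothesis is not rejected. In other words, these data show no significant association between liking mathematics and liking statistics.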
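The decision can be reproduced in two equivalent ways: comparing the statistic with the critical value, or comparing the p-value with \(\alpha\). The sketch below takes the statistic 2.5397 from the test output. The variable names are illustrative.

```r
alpha <- 0.05
chisq_observed <- 2.5397  # statistic reported in the test output
dof <- 1                  # degrees of freedom for a 2 x 2 table

# Critical value and upper-tail p-value of the chi-squared distribution
critical_value <- qchisq(1 - alpha, dof)
p_value <- pchisq(chisq_observed, dof, lower.tail = FALSE)

# Both criteria lead to the same decision
reject_by_critical_value <- chisq_observed > critical_value
reject_by_p_value <- p_value < alpha
print(c(critical = critical_value, p = p_value))
print(c(reject_by_critical_value, reject_by_p_value))
```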