Deriving the phi coefficient

Consider the following \(2\times 2\) contingency table, with its row totals, column totals, and grand total defined as below.

\[ \begin{bmatrix} a & b \\ c & d \\ \end{bmatrix} \]

\[\begin{align} n_{1\cdot }&= a+b \\ n_{2\cdot }&= c+d \\ n_{\cdot 1}&= a+c \\ n_{\cdot 2}&= b+d \\ N&=a+b+c+d \end{align}\]

Now, since \[\begin{align} n_{1\cdot }n_{\cdot 1}&= (a+b)(a+c)=bc-ad+aN \\ n_{1\cdot }n_{\cdot 2}&= (a+b)(b+d)=ad-bc+bN \\ n_{2\cdot }n_{\cdot 1}&= (c+d)(a+c)=ad-bc+cN \\ n_{2\cdot }n_{\cdot 2}&= (c+d)(b+d)=bc-ad+dN \\ \end{align}\] the \(\pm(ad-bc)\) terms cancel when the four products are added, leaving \((a+b+c+d)N\), so \[n_{1\cdot }n_{\cdot 1}+n_{1\cdot }n_{\cdot 2}+n_{2\cdot }n_{\cdot 1}+n_{2\cdot }n_{\cdot 2}=N^2\] (this is used later).
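The four identities can be checked numerically. The following is a minimal sketch in R; the cell counts and the variable names (`tab`, `prods`, `rhs`, and so on) are chosen here purely for illustration.

```r
# Hypothetical 2x2 table; matrix() fills column by column, so a = 4, c = 2, b = 7, d = 7.
tab <- matrix(c(4, 2, 7, 7), nrow = 2)
N <- sum(tab)

# prods[i, j] = n_{i.} * n_{.j}, the product of a row total and a column total
prods <- outer(rowSums(tab), colSums(tab))

# ad - bc
ad_bc <- tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]

# Right-hand sides: -(ad - bc) + aN and -(ad - bc) + dN on the diagonal,
# +(ad - bc) + bN and +(ad - bc) + cN off the diagonal
signs <- matrix(c(-1, 1, 1, -1), nrow = 2)
rhs <- signs * ad_bc + tab * N

all(prods == rhs)     # the four identities hold cell by cell
sum(prods) == N^2     # and the products add up to N^2
```

Any other non-negative counts can be substituted for the four values in `tab`.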

We now compute the chi-squared statistic of this table. To do so, for each cell we compute \[r_{ij}=\frac{(\text{observed count}-\text{expected count})^2}{\text{expected count}}\] where the expected count of cell \((i,j)\) is \(n_{i\cdot }n_{\cdot j}/N\). \[\begin{align} r_{11}&=\frac{(a-n_{1\cdot }n_{\cdot 1}/N)^2}{n_{1\cdot }n_{\cdot 1}/N}=\frac{(ad-bc)^2/N}{n_{1\cdot }n_{\cdot 1}} \\ r_{12}&=\frac{(b-n_{1\cdot }n_{\cdot 2}/N)^2}{n_{1\cdot }n_{\cdot 2}/N}=\frac{(ad-bc)^2/N}{n_{1\cdot }n_{\cdot 2}} \\ r_{21}&=\frac{(c-n_{2\cdot }n_{\cdot 1}/N)^2}{n_{2\cdot }n_{\cdot 1}/N}=\frac{(ad-bc)^2/N}{n_{2\cdot }n_{\cdot 1}} \\ r_{22}&=\frac{(d-n_{2\cdot }n_{\cdot 2}/N)^2}{n_{2\cdot }n_{\cdot 2}/N}=\frac{(ad-bc)^2/N}{n_{2\cdot }n_{\cdot 2}} \end{align}\]
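The simplification on the right of each line relies on the products of marginal totals expanded earlier. Spelled out for the first cell, as one way to fill in the intermediate step:

\[\begin{align} a-\frac{n_{1\cdot }n_{\cdot 1}}{N}&=\frac{aN-n_{1\cdot }n_{\cdot 1}}{N}=\frac{aN-(bc-ad+aN)}{N}=\frac{ad-bc}{N} \\ r_{11}&=\frac{(ad-bc)^2}{N^2}\cdot \frac{N}{n_{1\cdot }n_{\cdot 1}}=\frac{(ad-bc)^2/N}{n_{1\cdot }n_{\cdot 1}} \end{align}\]

The other three cells work the same way: the deviation from the expected count is \(\pm (ad-bc)/N\), and the sign disappears on squaring.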

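Summing over the four cells and putting the terms over a common denominator, the chi-squared statistic is

\[\chi^2=\sum_i \sum_j r_{ij}= \frac{(n_{1\cdot }n_{\cdot 1}+n_{1\cdot }n_{\cdot 2}+n_{2\cdot }n_{\cdot 1}+n_{2\cdot }n_{\cdot 2})(ad-bc)^2/N}{n_{1\cdot }n_{\cdot 1}n_{\cdot 2}n_{2 \cdot }}=\frac{N(ad-bc)^2}{(a+b)(a+c)(b+d)(c+d)}\] where the last step uses the fact that the four products of marginal totals sum to \(N^2\).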
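As a sanity check on the closed form, it can be compared with the cell-by-cell sum and with R's own `chisq.test`. This is only a sketch: the counts and the names `chisq_closed`, `chisq_cells`, and `chisq_r` are illustrative, and `cc` stands for the cell \(c\) to avoid masking R's `c()`.

```r
# Hypothetical 2x2 table (filled column by column): a = 4, b = 7, c = 2, d = 7
tab <- matrix(c(4, 2, 7, 7), nrow = 2)
a <- tab[1, 1]; b <- tab[1, 2]; cc <- tab[2, 1]; d <- tab[2, 2]
N <- sum(tab)

# Closed form: N (ad - bc)^2 / ((a+b)(a+c)(b+d)(c+d))
chisq_closed <- N * (a * d - b * cc)^2 / ((a + b) * (a + cc) * (b + d) * (cc + d))

# Cell-by-cell sum of (observed - expected)^2 / expected
expected <- outer(rowSums(tab), colSums(tab)) / N
chisq_cells <- sum((tab - expected)^2 / expected)

# Pearson statistic from chisq.test without the continuity correction;
# the small-sample warning is suppressed because only the statistic is needed
chisq_r <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)

all.equal(chisq_closed, chisq_cells)
all.equal(chisq_closed, chisq_r)
```

Both comparisons use `all.equal` rather than `==` because the three values are computed along different floating-point paths.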

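For a \(2\times 2\) table, Cramér's V reduces to \(V=\sqrt{\chi^2/N}=|ad-bc|/\sqrt{(a+b)(a+c)(b+d)(c+d)}\). Dropping the absolute value gives the defining formula for the phi coefficient.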

\[\phi = \frac{ad-bc}{\sqrt{(a+b)(a+c)(b+d)(c+d)}}\]
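Unlike Cramér's V, the phi coefficient keeps the sign of \(ad-bc\). The sketch below illustrates this by exchanging the two rows of a table; `phi_coef` and `cramers_v` are hypothetical helper names introduced only for this illustration, and the counts are arbitrary.

```r
# phi for a 2x2 table: (ad - bc) / sqrt(product of the four marginal totals)
phi_coef <- function(x) {
  num <- x[1, 1] * x[2, 2] - x[1, 2] * x[2, 1]
  num / sqrt(prod(rowSums(x), colSums(x)))
}

# Cramer's V for a 2x2 table: sqrt(chi-squared / N), which is never negative
cramers_v <- function(x) {
  sqrt(unname(chisq.test(x, correct = FALSE)$statistic) / sum(x))
}

tab <- matrix(c(12, 13, 8, 42), nrow = 2)
swapped <- tab[2:1, ]   # the same counts with the two rows exchanged

phi_coef(tab)        # positive, because ad > bc
phi_coef(swapped)    # same magnitude, opposite sign
cramers_v(tab)       # equals abs(phi_coef(tab))
cramers_v(swapped)   # unchanged by the row swap
```

Whether a positive or negative phi is the "interesting" direction therefore depends on how the rows and columns are ordered.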

The following hypothetical examples give a feel for the phi coefficient, the relative risk, and the odds ratio.
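For reference, the relative risk and odds ratio computed by the `RR` and `OR` functions in the examples can be written in the notation of the table above. This reading (rows as the two groups, the first column as the outcome of interest) is inferred from the code rather than stated in the text:

\[\begin{align} \mathrm{RR}&=\frac{a/(a+b)}{c/(c+d)} \\ \mathrm{OR}&=\frac{a/b}{c/d}=\frac{ad}{bc} \end{align}\]

Under this reading, \(\mathrm{OR}>1\) exactly when \(ad>bc\), that is, exactly when \(\phi >0\).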

Example 1

mat<-matrix(c(4,2,7,7),2,2)
mat
##      [,1] [,2]
## [1,]    4    7
## [2,]    2    7
chisq.test(mat,correct = FALSE)
## Warning in chisq.test(mat, correct = FALSE): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  mat
## X-squared = 0.47138, df = 1, p-value = 0.4924
CramerV<- function(x) {## x should be a contingency table
  chisq<-unname(chisq.test(x,correct = FALSE)$statistic)
  sqrt(chisq/(sum(x)*(min(ncol(x),nrow(x))-1)))
}
CramerV(mat)
## Warning in chisq.test(x, correct = FALSE): Chi-squared approximation may be
## incorrect
## [1] 0.1535221
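Phi<- function (x) {## x should be a 2x2 contingency table
  ## numerator: ad - bc
  numer<-x[1,1]*x[2,2]-x[1,2]*x[2,1]
  ## logs of the two row totals (m1, m2) and the two column totals (m3, m4)
  m1<-log(sum(x[1,1:2]))
  m2<-log(sum(x[2,1:2]))
  m3<-log(sum(x[1:2,1]))
  m4<-log(sum(x[1:2,2]))
  ## square root of the product of the four totals, formed on the log scale
  denom<-sqrt(exp(m1+m2+m3+m4))
  numer/denom
}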
Phi(mat)
## [1] 0.1535221
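RR<- function (x) {## x should be a 2x2 contingency table
  ## proportion of each row that falls in column 1
  p1<-x[1,1]/sum(x[1,1:2])
  p2<-x[2,1]/sum(x[2,1:2])
  ## relative risk: row 1 proportion over row 2 proportion
  p1/p2
}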
RR(mat)
## [1] 1.636364
OR<- function (x) {## x should be a 2x2 contingency table
  ## odds of column 1 versus column 2 within each row
  odds1<-x[1,1]/x[1,2]
  odds2<-x[2,1]/x[2,2]
  ## odds ratio: row 1 odds over row 2 odds
  odds1/odds2
}
OR(mat)
## [1] 2
library(vcd)
## Loading required package: grid
assocstats(mat)
##                      X^2 df P(> X^2)
## Likelihood Ratio 0.47926  1  0.48876
## Pearson          0.47138  1  0.49235
## 
## Phi-Coefficient   : 0.154 
## Contingency Coeff.: 0.152 
## Cramer's V        : 0.154
exp(coef(loddsratio(mat))) # odds ratio
## / 
## 2
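The values reported for Example 1 can be reproduced by hand from the formulas derived above, with \(a=4\), \(b=7\), \(c=2\), \(d=7\) and \(N=20\). A worked version, rounded to four significant digits:

\[\begin{align} \chi^2&=\frac{20\,(4\cdot 7-7\cdot 2)^2}{11\cdot 6\cdot 14\cdot 9}=\frac{3920}{8316}\approx 0.4714 \\ \phi &=\frac{4\cdot 7-7\cdot 2}{\sqrt{11\cdot 6\cdot 14\cdot 9}}=\frac{14}{\sqrt{8316}}\approx 0.1535 \\ \mathrm{RR}&=\frac{4/11}{2/9}=\frac{18}{11}\approx 1.636 \\ \mathrm{OR}&=\frac{4\cdot 7}{7\cdot 2}=2 \end{align}\]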

Example 2

mat2<-matrix(c(12,13,8,42),2,2)
mat2
##      [,1] [,2]
## [1,]   12    8
## [2,]   13   42
chisq.test(mat2,correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  mat2
## X-squared = 8.7273, df = 1, p-value = 0.003135
CramerV(mat2)
## [1] 0.3411211
Phi(mat2)
## [1] 0.3411211
RR(mat2)
## [1] 2.538462
OR(mat2)
## [1] 4.846154
library(vcd)
assocstats(mat2)
##                     X^2 df  P(> X^2)
## Likelihood Ratio 8.4029  1 0.0037461
## Pearson          8.7273  1 0.0031349
## 
## Phi-Coefficient   : 0.341 
## Contingency Coeff.: 0.323 
## Cramer's V        : 0.341
loddsratio(mat2) # log odds ratio
## log odds ratios for and 
## 
## [1] 1.578185
log(12)-log(8)-log(13)+log(42)
## [1] 1.578185
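The log odds ratio and the odds ratio carry the same information, since exponentiating one gives the other. A small base-R sketch with the Example 2 counts (the names `tab2`, `log_or`, and `or_direct` are illustrative only):

```r
# Example 2 counts, filled column by column: a = 12, c = 13, b = 8, d = 42
tab2 <- matrix(c(12, 13, 8, 42), nrow = 2)

# Log odds ratio written as a sum and difference of logs: log(a) - log(b) - log(c) + log(d)
log_or <- log(tab2[1, 1]) - log(tab2[1, 2]) - log(tab2[2, 1]) + log(tab2[2, 2])

# Odds ratio computed directly as ad / (bc)
or_direct <- (tab2[1, 1] * tab2[2, 2]) / (tab2[1, 2] * tab2[2, 1])

exp(log_or)   # 4.846154, the same value as OR(mat2)
or_direct     # 504 / 104, the same value again
```

Working on the log scale makes the measure symmetric around zero: exchanging the two rows flips the sign of the log odds ratio, whereas the odds ratio itself is replaced by its reciprocal.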