neuro.xtab <- matrix(c(38,5,0,1,33,11,3,0,10,14,5,6,3,7,3,10), ncol=4, byrow=T)
std.r <- chisq.test(neuro.xtab)$stdres
## Warning in chisq.test(neuro.xtab): Chi-squared approximation may be
## incorrect
round(std.r,2)
## [,1] [,2] [,3] [,4]
## [1,] 4.78 -2.46 -2.23 -2.27
## [2,] 2.31 -0.27 -0.32 -2.97
## [3,] -3.79 2.37 1.79 1.22
## [4,] -4.56 0.68 1.13 5.26
We know that positive standardized residuals indicate that there are higher frequencies than expected under independence, which exists when there is no association between the findings. We would expect to see large positive standardized risuals along the along the main diagonal. Interestingly, one of the cells (n22) on our main diagonal has a negative standardized residual, albeit a small negative number. The expected frequency was 11.7 and we observed 11 in our dataset. The other 3 cells on the main diagonal (n11, n33, n44) all showed a high degree of agreement with n11 and n44 having the two largest positive SD in the 4x4 table. The most common disagreements were due to Neurologist B determining a diagnosis of certain multiple sclerosis.
I was expecting the largest negative std. residuals to be located on the bottom left and upper right corners of the chart, where the neurologists findings conflicted to an extreme degree (certain MS vs doubtful, unlikely, or definitely not MS.). N41 had a negative SR=-4.56, which indicates a high level of disagreement and is to be expected. I was surprised to see that cells n31, n12, and n24 all had larger negative SR than cell n14. T
library(psych)
neuro<-data.frame(a=(rep(1:4,each=4)),
b=(rep(1:4,4)),
count=c(38,5,0,1,33,11,3,0,10,14,5,6,3,7,3,10),
diag=c(1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1))
neuro2 <- data.frame(ms1=c(1,1,1,0,0,0),
ms2=c(-1,0,0,1,1,0),
ms3=c(0,-1,0,-1,0,1),
ms4=c(0,0,-1,0,-1,-1),
nij=c(5,0,1,3,0,6),
nji=c(33,10,3,14,7,3),
x=c(1,2,3,1,2,1))
# Quasi-Independence
QI<-glm(count~factor(a)+factor(b)+factor(diag), family=poisson, data=neuro)
summary(QI)
##
## Call:
## glm(formula = count ~ factor(a) + factor(b) + factor(diag), family = poisson,
## data = neuro)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.96051 -1.92253 0.07114 1.09564 2.49717
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.6556 0.2094 12.683 < 2e-16 ***
## factor(a)2 0.3723 0.2330 1.598 0.110
## factor(a)3 0.2577 0.2642 0.976 0.329
## factor(a)4 -0.2241 0.2836 -0.790 0.429
## factor(b)2 -0.9186 0.2100 -4.375 1.21e-05 ***
## factor(b)3 -2.0983 0.3295 -6.368 1.92e-10 ***
## factor(b)4 -1.5503 0.2717 -5.705 1.16e-08 ***
## factor(diag)1 0.8576 0.1936 4.429 9.47e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 161.740 on 15 degrees of freedom
## Residual deviance: 49.678 on 8 degrees of freedom
## AIC: 118.95
##
## Number of Fisher Scoring iterations: 5
1-pchisq(161.740,8)
## [1] 0
# Ordinal Quasi-Symmetry
OQS <- glm(nij/(nij+nji) ~ - 1 + x, family=binomial, weights=nij+nji, data=neuro2)
summary(OQS)
##
## Call:
## glm(formula = nij/(nij + nji) ~ -1 + x, family = binomial, data = neuro2,
## weights = nij + nji)
##
## Deviance Residuals:
## 1 2 3 4 5 6
## -1.4175 -1.2483 1.7952 -0.4599 -1.0444 2.8512
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## x -1.2565 0.2458 -5.111 3.2e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 56.442 on 6 degrees of freedom
## Residual deviance: 16.222 on 5 degrees of freedom
## AIC: 28.684
##
## Number of Fisher Scoring iterations: 5
1-pchisq(16.2,5)
## [1] 0.006295667
# Quasi-Symmetry Model
QS <- glm(nij/(nij+nji) ~ -1 + ms1 + ms2 + ms3 + ms4, family=binomial,weights=nij+nji, data=neuro2)
summary(QS)
##
## Call:
## glm(formula = nij/(nij + nji) ~ -1 + ms1 + ms2 + ms3 + ms4, family = binomial,
## data = neuro2, weights = nij + nji)
##
## Deviance Residuals:
## 1 2 3 4 5 6
## -0.2731 -0.7188 1.4863 0.5215 -1.7293 0.3481
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## ms1 -3.1930 0.7633 -4.183 2.87e-05 ***
## ms2 -1.4349 0.6583 -2.180 0.0293 *
## ms3 0.4501 0.5840 0.771 0.4409
## ms4 NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 56.442 on 6 degrees of freedom
## Residual deviance: 6.184 on 3 degrees of freedom
## AIC: 22.646
##
## Number of Fisher Scoring iterations: 5
1-pchisq(6.184,3)
## [1] 0.1029934
Quasi-Independent Model: Deviance=49.678 on 8 degrees of freedom AIC=118.95 and a goodness of fit test reports a very small p-value close to 0, therefore we can conclude this model fits the data poorly.
Ordinal Quasi-Symmetry Model: Deviance=16.222 on 5 degrees of freedom AIC=28.684 and GoF test reports p=.0006. This model fits better than the Quasi-Independent model, but there is still some lack of fit and there is likely a better model for this data.
Quasi-Symmetry Model: Deviance=6.184 on 3 degrees of freedom AIC=22.646 and GoF test reports p=.103. We can conclude that this model fits the data pretty well, and it certainly fits much better than a Quasi-Independent or Ordinal Quasi-Symmetry model.
summary(OQS)
##
## Call:
## glm(formula = nij/(nij + nji) ~ -1 + x, family = binomial, data = neuro2,
## weights = nij + nji)
##
## Deviance Residuals:
## 1 2 3 4 5 6
## -1.4175 -1.2483 1.7952 -0.4599 -1.0444 2.8512
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## x -1.2565 0.2458 -5.111 3.2e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 56.442 on 6 degrees of freedom
## Residual deviance: 16.222 on 5 degrees of freedom
## AIC: 28.684
##
## Number of Fisher Scoring iterations: 5
The estimated probability that Neurosurgeon B’s diagnosis is x categories higher than Neurosurgeon A’s diagnosis equals exp(-1.2565x) times the reverse probability.
cohen.kappa(neuro.xtab)
## Call: cohen.kappa1(x = x, w = w, n.obs = n.obs, alpha = alpha, levels = levels)
##
## Cohen Kappa and Weighted Kappa correlation coefficients and confidence boundaries
## lower estimate upper
## unweighted kappa 0.11 0.21 0.31
## weighted kappa 0.40 0.52 0.65
##
## Number of subjects = 149
We will use weighted Kappa given that the data is ordinal in it’s structure, going from certain MS to doubtful/unlikely/definitely not MS. The difference between the observed agreement and expected agreement under independence is roughly 52% of the maximum possible difference.
tennis <- data.frame( C=c(1,1,1,1,0,0,0,0,0,0),
D=c(-1,0,0,0,1,1,1,0,0,0),
P=c(0,-1,0,0,-1,0,0,1,1,0),
SW=c(0,0,-1,0,0,-1,0,-1,0,1),
VW=c(0,0,0,-1,0,0,-1,0,-1,-1),
nij=c(6,3,0,2,0,2,4,0,1,2),
nji=c(2,1,2,3,2,2,2,2,2,2))
tennis.fit <- glm(nij/(nij+nji) ~ -1 + C + D + P + VW + SW, family=binomial, weights=nij+nji, data=tennis)
summary(tennis.fit)
##
## Call:
## glm(formula = nij/(nij + nji) ~ -1 + C + D + P + VW + SW, family = binomial,
## data = tennis, weights = nij + nji)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7732 -0.9924 -0.3752 0.7266 1.1643
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## C -0.3918 0.7290 -0.537 0.591
## D -0.8388 0.6986 -1.201 0.230
## P -1.0167 0.8256 -1.232 0.218
## VW -0.5592 0.6957 -0.804 0.422
## SW NA NA NA NA
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 12.678 on 10 degrees of freedom
## Residual deviance: 10.117 on 6 degrees of freedom
## AIC: 32.068
##
## Number of Fisher Scoring iterations: 4
c<-coef(tennis.fit)
round(exp(c[1:5]),2)
## C D P VW SW
## 0.68 0.43 0.36 0.57 NA
1-pchisq(10.117,6)
## [1] 0.1198106
The model has a deviance of 10.117 on 6 degrees of freedom and is an adequate fit for the data. The players ranked from best to worst are as follows: (1) S. Williams (2) Clijsters (3) V. Williams (4) Davenport (5) Pierce
tennis.fitb <- glm(nij/(nij+nji) ~ -1 + C + D + P + SW + VW, family=binomial, weights=nij+nji, data=tennis)
summary(tennis.fitb)
##
## Call:
## glm(formula = nij/(nij + nji) ~ -1 + C + D + P + SW + VW, family = binomial,
## data = tennis, weights = nij + nji)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7732 -0.9924 -0.3752 0.7266 1.1643
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## C 0.1674 0.5960 0.281 0.779
## D -0.2795 0.5796 -0.482 0.630
## P -0.4575 0.7249 -0.631 0.528
## SW 0.5592 0.6957 0.804 0.422
## VW NA NA NA NA
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 12.678 on 10 degrees of freedom
## Residual deviance: 10.117 on 6 degrees of freedom
## AIC: 32.068
##
## Number of Fisher Scoring iterations: 4
c<-coef(tennis.fitb)
# Probability
exp(c[4])/(1+exp(c[4]))
## SW
## 0.6362715
The estimated probability that Serena Williams will win a tennis match against Venus Williams is = 0.6362715.
tennis.fitc <- glm(nij/(nij+nji) ~ -1 + C + P + SW + VW + D, family=binomial, weights=nij+nji, data=tennis)
summary(tennis.fitc)
##
## Call:
## glm(formula = nij/(nij + nji) ~ -1 + C + P + SW + VW + D, family = binomial,
## data = tennis, weights = nij + nji)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7732 -0.9924 -0.3752 0.7266 1.1643
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## C 0.4470 0.5560 0.804 0.421
## P -0.1780 0.7337 -0.243 0.808
## SW 0.8388 0.6986 1.201 0.230
## VW 0.2795 0.5796 0.482 0.630
## D NA NA NA NA
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 12.678 on 10 degrees of freedom
## Residual deviance: 10.117 on 6 degrees of freedom
## AIC: 32.068
##
## Number of Fisher Scoring iterations: 4
# Std. Error for Serena Williams vs. Davenport
SE<-0.6986
BiBj.diff=0.8387604
CI<-BiBj.diff+c(-1,1)*qnorm(.90)*SE; CI
## [1] -0.05653152 1.73405232
exp(CI)
## [1] 0.9450367 5.6635580
With 90% confidence, we infer the probability that Serena Williams will win the tennis match is between 0.9450 and 5.6636 times the probability that Davenport will win the match. 95% CI: (0.9450367, 5.6635580)
MBTI<-read.table("http://users.stat.ufl.edu/~aa/cat/data/MBTI.dat", header=T)
EI<-MBTI$EI
SN<-MBTI$SN
TF<-MBTI$TF
JP<-MBTI$JP
fit5 <- glm(n ~ EI + SN + TF + JP + EI:SN + EI:TF + EI:JP + SN:TF + SN:JP + TF:JP, family=poisson, data=MBTI)
summary(fit5)
##
## Call:
## glm(formula = n ~ EI + SN + TF + JP + EI:SN + EI:TF + EI:JP +
## SN:TF + SN:JP + TF:JP, family = poisson, data = MBTI)
##
## Deviance Residuals:
## 1 2 3 4 5 6 7
## -0.72826 1.00215 0.05168 -0.01429 1.49947 -1.29325 -0.07596
## 8 9 10 11 12 13 14
## 0.00231 0.56850 -0.82975 -0.04948 0.01728 -1.57051 1.09960
## 15 16
## 0.08587 -0.00804
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.44760 0.13793 24.994 < 2e-16 ***
## EIi -0.02907 0.15266 -0.190 0.848952
## SNs 1.21082 0.14552 8.320 < 2e-16 ***
## TFt -0.64194 0.16768 -3.828 0.000129 ***
## JPp 0.93417 0.14594 6.401 1.54e-10 ***
## EIi:SNs 0.30212 0.14233 2.123 0.033780 *
## EIi:TFt 0.19449 0.13121 1.482 0.138258
## EIi:JPp 0.01766 0.13160 0.134 0.893261
## SNs:TFt 0.40920 0.15243 2.684 0.007265 **
## SNs:JPp -1.22153 0.14547 -8.397 < 2e-16 ***
## TFt:JPp -0.55936 0.13512 -4.140 3.48e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 399.944 on 15 degrees of freedom
## Residual deviance: 10.162 on 5 degrees of freedom
## AIC: 125
##
## Number of Fisher Scoring iterations: 4
1-pchisq(10.162, 5)
## [1] 0.07077304
fit5b <- glm(n ~ EI + SN + TF + JP + EI:SN + SN:TF + SN:JP + TF:JP, family=poisson, data=MBTI)
summary(fit5b)
##
## Call:
## glm(formula = n ~ EI + SN + TF + JP + EI:SN + SN:TF + SN:JP +
## TF:JP, family = poisson, data = MBTI)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.65487 -0.46916 0.00529 0.54208 1.47431
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.41362 0.12930 26.402 < 2e-16 ***
## EIi 0.03871 0.11361 0.341 0.733287
## SNs 1.19414 0.14548 8.208 2.24e-16 ***
## TFt -0.54137 0.15282 -3.543 0.000396 ***
## JPp 0.94292 0.13064 7.218 5.28e-13 ***
## EIi:SNs 0.32190 0.13598 2.367 0.017922 *
## SNs:TFt 0.42366 0.15200 2.787 0.005318 **
## SNs:JPp -1.22021 0.14513 -8.408 < 2e-16 ***
## TFt:JPp -0.55853 0.13497 -4.138 3.50e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 399.944 on 15 degrees of freedom
## Residual deviance: 12.369 on 7 degrees of freedom
## AIC: 123.2
##
## Number of Fisher Scoring iterations: 4
exp(0.32190)
## [1] 1.379747
The odds of being introverted and and sensitive is 1.3797 times the odds of being extroverted and intuitive.
exp(0.19449)
## [1] 1.214691
See attachment for the loglinear model Conditionally independent: Y and X, Y and Z. The marginal association is equal to the conditional association for Y and X, Y and Z.
substance<-read.table("http://users.stat.ufl.edu/~aa/cat/data/Substance2.dat", header=T)
fit8 <- glm(count ~ A + C + M + R + G + A:C + A:M + C:M + A:G + A:R + G:M + G:R, family=poisson, data=substance)
summary(fit8)
##
## Call:
## glm(formula = count ~ A + C + M + R + G + A:C + A:M + C:M + A:G +
## A:R + G:M + G:R, family = poisson, data = substance)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5357 -0.5772 -0.0178 0.5134 2.4676
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 26.31264 1.04852 25.095 < 2e-16 ***
## A -11.69693 0.95958 -12.190 < 2e-16 ***
## C -7.91818 0.34762 -22.778 < 2e-16 ***
## M -5.97406 0.51168 -11.675 < 2e-16 ***
## R -3.38269 0.34182 -9.896 < 2e-16 ***
## G -0.01334 0.23569 -0.057 0.95486
## A:C 2.05453 0.17406 11.803 < 2e-16 ***
## A:M 3.00592 0.46484 6.467 1e-10 ***
## C:M 2.84789 0.16384 17.382 < 2e-16 ***
## A:G 0.29229 0.12768 2.289 0.02207 *
## A:R 0.59346 0.19268 3.080 0.00207 **
## M:G -0.26929 0.09039 -2.979 0.00289 **
## R:G 0.12619 0.15866 0.795 0.42644
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 4818.051 on 31 degrees of freedom
## Residual deviance: 19.909 on 19 degrees of freedom
## AIC: 182.15
##
## Number of Fisher Scoring iterations: 5
The odds ratio does not change because alcohol or marijuana are conditionally independent on race. Alcohol and marijuana are conditionally dependent on gender, which is why the odds ratio changes when collapsing over gender.
fit8b <- glm(count ~ A + C + M + R + G + A:C + A:M + C:M + A:G + A:R + G:R, family=poisson, data=substance)
summary(fit8b)
##
## Call:
## glm(formula = count ~ A + C + M + R + G + A:C + A:M + C:M + A:G +
## A:R + G:R, family = poisson, data = substance)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8342 -0.6160 -0.2801 0.4567 2.4638
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 26.6964 1.0373 25.737 < 2e-16 ***
## A -11.4674 0.9530 -12.033 < 2e-16 ***
## C -7.9182 0.3476 -22.778 < 2e-16 ***
## M -6.3588 0.4957 -12.827 < 2e-16 ***
## R -3.3827 0.3418 -9.896 < 2e-16 ***
## G -0.2922 0.2162 -1.351 0.17663
## A:C 2.0545 0.1741 11.803 < 2e-16 ***
## A:M 2.9860 0.4647 6.426 1.31e-10 ***
## C:M 2.8479 0.1638 17.382 < 2e-16 ***
## A:G 0.1644 0.1202 1.368 0.17133
## A:R 0.5935 0.1927 3.080 0.00207 **
## R:G 0.1262 0.1587 0.795 0.42644
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 4818.051 on 31 degrees of freedom
## Residual deviance: 28.805 on 20 degrees of freedom
## AIC: 189.04
##
## Number of Fisher Scoring iterations: 5