Exercise 4.2 The data set Abortion in vcdExtra gives a 2 × 2 × 2 table of opinions regarding abortion in relation to sex and status of the respondent. This table has the following structure:
First, we would like to see the structure of the data set install.packages(“vcdExtra”) library(vcdExtra)
library(vcdExtra)
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.4.3
## Loading required package: grid
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.4.4
data("Abortion", package="vcdExtra")
str(Abortion)
## table [1:2, 1:2, 1:2] 171 152 138 167 79 148 112 133
## - attr(*, "dimnames")=List of 3
## ..$ Sex : chr [1:2] "Female" "Male"
## ..$ Status : chr [1:2] "Lo" "Hi"
## ..$ Support_Abortion: chr [1:2] "Yes" "No"
fourfold(Abortion, c(3,1,2))
fourfold(aperm(Abortion,c(3,1,2)))
(c) For each of the problems above, use oddsratio () to calculate the numerical values of the odds ratio, as stratified in the question.
oddsratio(Abortion, log = FALSE)
## odds ratios for Sex and Status by Support_Abortion
##
## Yes No
## 1.3614130 0.6338682
calculate the Confidence Intervals For the dataset’s odd ratios’ Parameters:
confint(oddsratio(Abortion, log = FALSE))
## 2.5 % 97.5 %
## Yes 0.9945685 1.8635675
## No 0.4373246 0.9187431
oddsratio(Abortion,stratum="Sex",log="FALSE")
## odds ratios for Sex and Status by Support_Abortion
##
## Female Male
## 1.3614130 0.6338682
Summary: Based on the odd ratios calculated above for Sex and Status. Female tends to support abortion more than male.
Exercise 4.4 The Hospital data in vcd gives a 3 × 3 table relating the length of stay (in years) of 132 long-term schizophrenic patients in two London mental hospitals with the frequency of visits by family and friends.
First we want to see the dataset present as a table: install.packages(“vcd”) library(vcd)
data("Hospital", package="vcd")
str(Hospital)
## table [1:3, 1:3] 43 6 9 16 11 18 3 10 16
## - attr(*, "dimnames")=List of 2
## ..$ Visit frequency: chr [1:3] "Regular" "Less than monthly" "Never"
## ..$ Length of stay : chr [1:3] "2-9" "10-19" "20+"
tbl=table(Hospital)
tbl
## Hospital
## 3 6 9 10 11 16 18 43
## 1 1 1 1 1 2 1 1
(a)Carry out a χ2 test for association between the two variables.
chisq.test(tbl)
## Warning in chisq.test(tbl): Chi-squared approximation may be incorrect
##
## Chi-squared test for given probabilities
##
## data: tbl
## X-squared = 0.77778, df = 7, p-value = 0.9977
(b)Use assocstats () to compute association statistics. How would you describe the strength of association here?
assocstats(Hospital)
## X^2 df P(> X^2)
## Likelihood Ratio 38.353 4 9.4755e-08
## Pearson 35.171 4 4.2842e-07
##
## Phi-Coefficient : NA
## Contingency Coeff.: 0.459
## Cramer's V : 0.365
The p value for likelihood ration and pearson are both less than 0.05. In this case, we accept the null hypothesis and conclude that the association is highly significant.
(c)Produce an association plot for these data, with visit frequency as the vertical variable. Describe the pattern of the relation you see here.
Answer: The plot is shown as below. Cramer’s V is positively related with Chi-square statistics. The more close to 1, ther stronger evidence of association. Here Cramer’s V is around 0.3, hence weak association existed.
assocplot(Hospital, col = c("black", "red"), space = 0.3,
main = NULL, xlab = NULL, ylab = NULL)
(d)Both variables can be considered ordinal, so CMHtest () may be useful here. Carry out that analysis. Do any of the tests lead to different conclusions? install.packages(“samplesizeCMH”) library(samplesizeCMH)
library(vcdExtra)
CMHtest(Hospital)
## Cochran-Mantel-Haenszel Statistics for Visit frequency by Length of stay
##
## AltHypothesis Chisq Df Prob
## cor Nonzero correlation 29.138 1 6.7393e-08
## rmeans Row mean scores differ 34.391 2 3.4044e-08
## cmeans Col mean scores differ 29.607 2 3.7233e-07
## general General association 34.905 4 4.8596e-07
From the result, it is found that both row and column factors are ordered factors. Similar conclusion can be drawn for association holds.
Exercise 4.6 The two-way table Mammograms in vcdExtra gives ratings on the severity of diagnosis of 110 mammograms by two raters. (a)Assess the strength of agreement between the raters using Cohen’s κ, both unweighted and weighted. First, we want to look at the two-way table Mammograms
str(Mammograms)
## num [1:4, 1:4] 34 6 2 0 10 8 5 1 2 8 ...
## - attr(*, "dimnames")=List of 2
## ..$ Reader2: chr [1:4] "Absent" "Minimal" "Moderate" "Severe"
## ..$ Reader1: chr [1:4] "Absent" "Minimal" "Moderate" "Severe"
library(vcd)
table(Mammograms)
## Mammograms
## 0 1 2 4 5 6 8 10 12 14 34
## 2 1 4 1 1 1 2 1 1 1 1
(a)Assess the strength of agreement between the raters using Cohen’s κ, both unweighted and weighted.
Kappa(Mammograms)
## value ASE z Pr(>|z|)
## Unweighted 0.3713 0.06033 6.154 7.560e-10
## Weighted 0.5964 0.04923 12.114 8.901e-34
(b)Use agreementplot () for a graphical display of agreement here.
agreementplot(Mammograms, main= "Test Plot")
#Unweighted
agreementplot(Mammograms, main= "Unweighted", weights=1)
#Weighted
agreementplot(Mammograms, main= "Weighted")
(c)Compare the Kappa measures with the results from assocstats (). What is a reasonable interpretation of each of these measures
assocstats(Mammograms)
## X^2 df P(> X^2)
## Likelihood Ratio 92.619 9 4.4409e-16
## Pearson 83.516 9 3.2307e-14
##
## Phi-Coefficient : NA
## Contingency Coeff.: 0.657
## Cramer's V : 0.503
Cramer’s is larger than 0.3 and close to 1. The P value is less than 0.05, we accept the null. Compare to Kappa’s result, it is similar conclusion.
Exercise 4.7 Agresti and Winner (1997) gave the data in below on the ratings of 160 movies by the reviewers Gene Siskel and Roger Ebert for the period from April 1995 through September 1996. The rating categories were Con (“thumbs down”), Mixed, and Pro (“thumbs up”)’ (a) Assess the strength of agreement between the raters using Cohen’s κ, both unweighted and weighted. Before we do the agreement analysis, we have to build the table within r.
siskel = matrix(c(24,8,13,45,8,13,11,32,10,9,64,83,42,30,88,160),ncol=4,byrow=TRUE)
colnames(siskel) = c("Con","Mixed","Pro","Total")
rownames(siskel) = c("Con","Mixed","Pro","Total")
siskel = as.table(siskel)
siskel
## Con Mixed Pro Total
## Con 24 8 13 45
## Mixed 8 13 11 32
## Pro 10 9 64 83
## Total 42 30 88 160
Kappa(siskel)
## value ASE z Pr(>|z|)
## Unweighted 0.09012 0.02882 3.127 0.001767
## Weighted 0.08737 0.03372 2.591 0.009573
agreementplot(siskel,main="Unweighted", weights=1)
Weighted agreement plot is shown as below:
agreementplot(siskel,main="Weighted")