ANLY 545- Homework #3

Exercise 4.2 The data set Abortion in vcdExtra gives a 2 × 2 × 2 table of opinions regarding abortion in relation to sex and status of the respondent. This table has the following structure:

First, we would like to see the structure of the data set install.packages(“vcdExtra”) library(vcdExtra)

library(vcdExtra)
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.4.3
## Loading required package: grid
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.4.4
data("Abortion", package="vcdExtra")
str(Abortion)
##  table [1:2, 1:2, 1:2] 171 152 138 167 79 148 112 133
##  - attr(*, "dimnames")=List of 3
##   ..$ Sex             : chr [1:2] "Female" "Male"
##   ..$ Status          : chr [1:2] "Lo" "Hi"
##   ..$ Support_Abortion: chr [1:2] "Yes" "No"
  1. Taking support for abortion as the outcome variable, produce fourfold displays showing the association with sex, stratified by status.
fourfold(Abortion, c(3,1,2))

  1. Do the same for the association of support for abortion with status, stratified by sex.
fourfold(aperm(Abortion,c(3,1,2)))

(c) For each of the problems above, use oddsratio () to calculate the numerical values of the odds ratio, as stratified in the question.

oddsratio(Abortion, log = FALSE)
##  odds ratios for Sex and Status by Support_Abortion 
## 
##       Yes        No 
## 1.3614130 0.6338682

calculate the Confidence Intervals For the dataset’s odd ratios’ Parameters:

confint(oddsratio(Abortion, log = FALSE))
##         2.5 %    97.5 %
## Yes 0.9945685 1.8635675
## No  0.4373246 0.9187431
oddsratio(Abortion,stratum="Sex",log="FALSE")
##  odds ratios for Sex and Status by Support_Abortion 
## 
##    Female      Male 
## 1.3614130 0.6338682
  1. Write a brief summary of how support for abortion depends on sex and status.

Summary: Based on the odd ratios calculated above for Sex and Status. Female tends to support abortion more than male.

Exercise 4.4 The Hospital data in vcd gives a 3 × 3 table relating the length of stay (in years) of 132 long-term schizophrenic patients in two London mental hospitals with the frequency of visits by family and friends.

First we want to see the dataset present as a table: install.packages(“vcd”) library(vcd)

data("Hospital", package="vcd")
str(Hospital)
##  table [1:3, 1:3] 43 6 9 16 11 18 3 10 16
##  - attr(*, "dimnames")=List of 2
##   ..$ Visit frequency: chr [1:3] "Regular" "Less than monthly" "Never"
##   ..$ Length of stay : chr [1:3] "2-9" "10-19" "20+"
tbl=table(Hospital)
tbl
## Hospital
##  3  6  9 10 11 16 18 43 
##  1  1  1  1  1  2  1  1

(a)Carry out a χ2 test for association between the two variables.

chisq.test(tbl)
## Warning in chisq.test(tbl): Chi-squared approximation may be incorrect
## 
##  Chi-squared test for given probabilities
## 
## data:  tbl
## X-squared = 0.77778, df = 7, p-value = 0.9977

(b)Use assocstats () to compute association statistics. How would you describe the strength of association here?

assocstats(Hospital)
##                     X^2 df   P(> X^2)
## Likelihood Ratio 38.353  4 9.4755e-08
## Pearson          35.171  4 4.2842e-07
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.459 
## Cramer's V        : 0.365

The p value for likelihood ration and pearson are both less than 0.05. In this case, we accept the null hypothesis and conclude that the association is highly significant.

(c)Produce an association plot for these data, with visit frequency as the vertical variable. Describe the pattern of the relation you see here.

Answer: The plot is shown as below. Cramer’s V is positively related with Chi-square statistics. The more close to 1, ther stronger evidence of association. Here Cramer’s V is around 0.3, hence weak association existed.

assocplot(Hospital, col = c("black", "red"), space = 0.3,
          main = NULL, xlab = NULL, ylab = NULL)

(d)Both variables can be considered ordinal, so CMHtest () may be useful here. Carry out that analysis. Do any of the tests lead to different conclusions? install.packages(“samplesizeCMH”) library(samplesizeCMH)

library(vcdExtra)
CMHtest(Hospital)
## Cochran-Mantel-Haenszel Statistics for Visit frequency by Length of stay 
## 
##                  AltHypothesis  Chisq Df       Prob
## cor        Nonzero correlation 29.138  1 6.7393e-08
## rmeans  Row mean scores differ 34.391  2 3.4044e-08
## cmeans  Col mean scores differ 29.607  2 3.7233e-07
## general    General association 34.905  4 4.8596e-07

From the result, it is found that both row and column factors are ordered factors. Similar conclusion can be drawn for association holds.

Exercise 4.6 The two-way table Mammograms in vcdExtra gives ratings on the severity of diagnosis of 110 mammograms by two raters. (a)Assess the strength of agreement between the raters using Cohen’s κ, both unweighted and weighted. First, we want to look at the two-way table Mammograms

str(Mammograms)
##  num [1:4, 1:4] 34 6 2 0 10 8 5 1 2 8 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ Reader2: chr [1:4] "Absent" "Minimal" "Moderate" "Severe"
##   ..$ Reader1: chr [1:4] "Absent" "Minimal" "Moderate" "Severe"
library(vcd)
table(Mammograms)
## Mammograms
##  0  1  2  4  5  6  8 10 12 14 34 
##  2  1  4  1  1  1  2  1  1  1  1

(a)Assess the strength of agreement between the raters using Cohen’s κ, both unweighted and weighted.

Kappa(Mammograms)
##             value     ASE      z  Pr(>|z|)
## Unweighted 0.3713 0.06033  6.154 7.560e-10
## Weighted   0.5964 0.04923 12.114 8.901e-34

(b)Use agreementplot () for a graphical display of agreement here.

agreementplot(Mammograms, main= "Test Plot")

#Unweighted
agreementplot(Mammograms, main= "Unweighted", weights=1)

#Weighted
agreementplot(Mammograms, main= "Weighted")

(c)Compare the Kappa measures with the results from assocstats (). What is a reasonable interpretation of each of these measures

assocstats(Mammograms)
##                     X^2 df   P(> X^2)
## Likelihood Ratio 92.619  9 4.4409e-16
## Pearson          83.516  9 3.2307e-14
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.657 
## Cramer's V        : 0.503

Cramer’s is larger than 0.3 and close to 1. The P value is less than 0.05, we accept the null. Compare to Kappa’s result, it is similar conclusion.

Exercise 4.7 Agresti and Winner (1997) gave the data in below on the ratings of 160 movies by the reviewers Gene Siskel and Roger Ebert for the period from April 1995 through September 1996. The rating categories were Con (“thumbs down”), Mixed, and Pro (“thumbs up”)’ (a) Assess the strength of agreement between the raters using Cohen’s κ, both unweighted and weighted. Before we do the agreement analysis, we have to build the table within r.

siskel = matrix(c(24,8,13,45,8,13,11,32,10,9,64,83,42,30,88,160),ncol=4,byrow=TRUE)
colnames(siskel) = c("Con","Mixed","Pro","Total")
rownames(siskel) = c("Con","Mixed","Pro","Total")
siskel = as.table(siskel)
siskel
##       Con Mixed Pro Total
## Con    24     8  13    45
## Mixed   8    13  11    32
## Pro    10     9  64    83
## Total  42    30  88   160
Kappa(siskel)
##              value     ASE     z Pr(>|z|)
## Unweighted 0.09012 0.02882 3.127 0.001767
## Weighted   0.08737 0.03372 2.591 0.009573
  1. Use agreementplot () for a graphical display of agreement here. Unweighted agreement plot is shown as below:
agreementplot(siskel,main="Unweighted", weights=1)

Weighted agreement plot is shown as below:

agreementplot(siskel,main="Weighted")