Breathlessness and wheeze in coal miners

The various ways of standardizing a collection of \(2\times 2\) tables allow visualizing relations with different factors (row percentages, column percentages, strata totals) controlled. However, different kinds of graphs can speak more eloquently to other questions by focusing more directly on the odds ratio. Agresti (2002) cites data from Ashford and Sowden (1970) on the association between two pulmonary conditions, breathlessness and wheeze, in a large sample of coal miners. The miners are classified into age groups, and the question treated by Agresti is whether the association between these two symptoms is homogeneous over age. These data are available as the CoalMiners data set in the vcd package, a \(2\times 2\times 9\) frequency table. The first group, aged 20-24, has been omitted from these analyses.

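library(vcd)  # provides oddsratio() and woolf_test() used below
data("CoalMiners", package = "vcd")
CM <- CoalMiners[, , 2:9]  # drop the first age group (20-24)
ftable(CM, row.vars = 3)   # one row per age group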
##       Breathlessness    B       NoB     
##       Wheeze            W  NoW    W  NoW
## Age                                     
## 25-29                  23    9  105 1654
## 30-34                  54   19  177 1863
## 35-39                 121   48  257 2357
## 40-44                 169   54  273 1778
## 45-49                 269   88  324 1712
## 50-54                 404  117  245 1324
## 55-59                 406  152  225  967
## 60-64                 372  106  132  526
  1. The question of interest can be addressed by displaying the odds ratio in the \(2\times 2\) tables with the margins of breathlessness and wheeze equated. Draw a fourfold plot. What do you conclude?

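All eight panels show the same pattern: the diagonal quadrants (both symptoms present, or both absent) dominate, so breathlessness and wheeze are strongly positively associated in every age group. The contrast between the diagonal and off-diagonal quadrants is less extreme in the older groups, which suggests that the association weakens with age.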

fourfoldplot(CM)

  1. The oddsratio() function in vcd calculates odds ratios for \(2\times 2\times k\) tables. By default, it returns the log odds ratios. Use the option log=FALSE to get the odds ratios themselves. The estimates decline with age, from about 40 in the 25-29 group to about 14 in the 60-64 group, although the decline is not strictly monotone.
# Measure of association: odds ratio for each age group
or <- oddsratio(CM, log = FALSE)
summary(or)
## 
## z test of coefficients:
## 
##       Estimate Std. Error z value  Pr(>|z|)    
## 25-29  40.2561    16.3381  2.4639 0.0137420 *  
## 30-34  29.9144     8.3190  3.5959 0.0003233 ***
## 35-39  23.1191     4.2260  5.4707 4.483e-08 ***
## 40-44  20.3827     3.4507  5.9068 3.488e-09 ***
## 45-49  16.1521     2.2118  7.3026 2.822e-13 ***
## 50-54  18.6602     2.3499  7.9407 2.010e-15 ***
## 55-59  11.4796     1.3833  8.2987 < 2.2e-16 ***
## 60-64  13.9846     2.0553  6.8043 1.015e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
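As a check on what oddsratio() reports, the first row of the output can be reproduced by hand from the counts for the 25-29 group in the table above. This is a minimal sketch: the variable names are illustrative, and the standard error uses the usual large-sample formula for the log odds ratio followed by a delta-method step, which is an assumption about how the reported value is obtained rather than something stated in the source.

```r
# Counts for the 25-29 age group, copied from the ftable output
b_w     <- 23    # breathlessness, wheeze
b_now   <- 9     # breathlessness, no wheeze
nob_w   <- 105   # no breathlessness, wheeze
nob_now <- 1654  # no breathlessness, no wheeze

# Sample odds ratio: cross-product ratio of the 2 x 2 table
or_hat <- (b_w * nob_now) / (b_now * nob_w)

# Large-sample standard error of the log odds ratio
ase_log <- sqrt(1 / b_w + 1 / b_now + 1 / nob_w + 1 / nob_now)

# Delta-method standard error on the odds ratio scale (assumed, see text)
se_or <- or_hat * ase_log

round(c(odds_ratio = or_hat,
        log_odds_ratio = log(or_hat),
        se_odds_ratio = se_or), 4)
```

The smallest cell (9 miners with breathlessness but no wheeze) contributes most of the variance, which is why the estimate for the youngest group is the least precise.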
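  1. When the analysis goal is to understand how the odds ratio varies with a stratifying factor (which could be a quantitative variable), it is often better to plot the odds ratio directly. Plot the odds ratios and write a paragraph to explain your findings.

The odds ratio falls from about 40 in the youngest group (25-29) to roughly 11 to 14 in the two oldest groups. The association between breathlessness and wheeze is therefore strongly positive at every age but weakens as age increases. The decline is not strictly monotone: the estimates for the 50-54 and 60-64 groups are slightly higher than those of the groups just before them. The estimate for the youngest group is also the least precise, with a standard error of about 16.

plot(or)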
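Because age is quantitative, the visual impression of a decline can be summarized with a trend line. The sketch below is not part of the original exercise: it fits a weighted least-squares line to the log odds ratios, using the estimates and standard errors printed by summary(or) above. The age midpoints, the conversion of the standard errors to the log scale (standard error divided by estimate), and the linear form of the trend are all assumptions made for illustration.

```r
# Estimates and standard errors copied from the summary(or) output
or_hat <- c(40.2561, 29.9144, 23.1191, 20.3827,
            16.1521, 18.6602, 11.4796, 13.9846)
se_or  <- c(16.3381,  8.3190,  4.2260,  3.4507,
             2.2118,  2.3499,  1.3833,  2.0553)

# Assumed midpoints of the age groups 25-29, 30-34, ..., 60-64
age_mid <- seq(27, 62, by = 5)

# Work on the log scale; approximate the log-scale standard error
log_or  <- log(or_hat)
ase_log <- se_or / or_hat

# Weighted least squares: more precise strata receive more weight
fit <- lm(log_or ~ age_mid, weights = 1 / ase_log^2)
coef(fit)

# Log odds ratios against age, with the fitted line
plot(age_mid, log_or,
     xlab = "Age (group midpoint)", ylab = "Log odds ratio")
abline(fit)
```

A negative slope on the log scale corresponds to an odds ratio that shrinks by a roughly constant factor for each additional year of age.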

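  1. Do you think it is necessary to test for homogeneous association and/or test for conditional independence? If yes, how would you conduct the analysis?

Yes. The Woolf test, woolf_test() in vcd, tests the null hypothesis that the odds ratio is the same in every age group (homogeneous association, that is, no three-way association). The alternative hypothesis is that the odds ratios differ across age groups. Here the p-value is small (about 0.0006), so we reject the null hypothesis and conclude that the association between breathlessness and wheeze is not homogeneous over age. A separate test of conditional independence adds little in this case, because the estimated odds ratio is far above 1 in every age group.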

woolf_test(CM)
## 
##  Woolf-test on Homogeneity of Odds Ratios (no 3-Way assoc.)
## 
## data:  CM
## X-squared = 25.614, df = 7, p-value = 0.0005903
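The statistic reported by woolf_test() can be reconstructed from the cell counts, which makes clear what is being tested. The sketch below assumes the standard form of Woolf's test: the weighted sum of squared deviations of the stratum log odds ratios from their weighted mean, with inverse-variance weights, referred to a chi-squared distribution with \(k - 1\) degrees of freedom. The counts are copied from the table above and the variable names are illustrative.

```r
# Cell counts by age group (25-29, ..., 60-64), from the ftable output
b_w     <- c(  23,   54,  121,  169,  269,  404, 406, 372)
b_now   <- c(   9,   19,   48,   54,   88,  117, 152, 106)
nob_w   <- c( 105,  177,  257,  273,  324,  245, 225, 132)
nob_now <- c(1654, 1863, 2357, 1778, 1712, 1324, 967, 526)

# Stratum log odds ratios and inverse-variance weights
log_or <- log((b_w * nob_now) / (b_now * nob_w))
w      <- 1 / (1 / b_w + 1 / b_now + 1 / nob_w + 1 / nob_now)

# Weighted mean log odds ratio under the homogeneity hypothesis
log_or_bar <- weighted.mean(log_or, w)

# Woolf statistic, degrees of freedom, and p-value
x2   <- sum(w * (log_or - log_or_bar)^2)
df   <- length(log_or) - 1
pval <- pchisq(x2, df, lower.tail = FALSE)

c(X_squared = x2, df = df, p_value = pval)
```

If the assumed formula matches the implementation, the result should agree with the woolf_test() output above. The individual terms of the sum also show which age groups deviate most from the pooled log odds ratio.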