1. The pneumo data gives the number of coal miners classified by radiological examination into one of three categories of pneumonoconiosis and by the number of years spent working at the coal face divided into eight categories.
##   Freq status year
## 1   98 normal  5.8
## 2   51 normal 15.0
## 3   34 normal 21.5
## 4   35 normal 27.5
## 5   32 normal 33.5
## 6   23 normal 39.5
## [1] 24  3
## [1] "Freq"   "status" "year"
## [1] "mild"   "normal" "severe"

Variable Description

Freq: Number of coal miners classified at each status, divided into the different year categories
Status: radiological classification of pneumoconiosis: normal, mild, severe (response variable)
Year: number of years spent working at the coal face, 8 categories total

Questions

  1. Treating the pneumonoconiosis status as response variable as nominal, build a model for predicting the frequency of the three outcomes in terms of length of service and use it to predict the outcome for a miner with 25 years of service.

To start off, I visualized the change in case incidence as a funciton of years working

## # weights:  9 (4 variable)
## initial  value 407.585159 
## iter  10 value 208.724810
## final  value 208.724782 
## converged
## Call:
## multinom(formula = status ~ year, data = pneumo, weights = Freq)
## 
## Coefficients:
##        (Intercept)        year
## normal   4.2916723 -0.08356506
## severe  -0.7681706  0.02572027
## 
## Residual Deviance: 417.4496 
## AIC: 425.4496
## [1] normal
## Levels: mild normal severe
##       mild     normal     severe 
## 0.09148821 0.82778696 0.08072483

The nominal model predicts 25 years of exposure will most likely have no case, with p = 0.828.

  1. Repeat the analysis with the pneumonoconiosis status being treated as ordinal.
## Call:
## polr(formula = status ~ year, data = pneumo, weights = Freq)
## 
## Coefficients:
##       year 
## 0.09590405 
## 
## Intercepts:
## normal|mild mild|severe 
##    3.955843    4.869048 
## 
## Residual Deviance: 416.9188 
## AIC: 422.9188
## [1] normal
## Levels: normal mild severe
##     normal       mild     severe 
## 0.82610096 0.09601474 0.07788430

The ordinal model also predicts 25 years of exposure will most likely have no case, with p = 0.826.

  1. Now treat the response variable as hierarchical with top level indicating whether the miner has the disease and the second level indicating, given they have the disease, whether they have a moderate or severe case.
## 
## Call:
## glm(formula = cbind(case, no_case) ~ year, family = "binomial", 
##     data = pneumo_binom_df[pneumo_binom_df$status != "mild", 
##         ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.5267  -0.7054  -0.5013   0.8103   1.4294  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.966352   0.241869  -16.40   <2e-16 ***
## year         0.096269   0.007138   13.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 292.692  on 23  degrees of freedom
## Residual deviance:  34.697  on 22  degrees of freedom
## AIC: 116.23
## 
## Number of Fisher Scoring iterations: 5
##         1 
## -1.559621
##        1 
## 0.826299

The top layer predicts a normal case at year = 25, with p = 0.826.

## # weights:  3 (2 variable)
## initial  value 56.838069 
## final  value 55.444195 
## converged
## Call:
## multinom(formula = status ~ year, data = pneumo_case, weights = Freq)
## 
## Coefficients:
## (Intercept)        year 
## -1.11342251  0.03547178 
## 
## Residual Deviance: 110.8884 
## AIC: 114.8884
## [1] mild
## Levels: mild severe
##         1 
## 0.5564158

Given that there is a case at year = 25, it is most likely to be a mild case, with p = 0.556.

  1. Compare the three analyses.

The three analyses all appear to give similar results, with similar probabilities and all predicting the same normal status after 25 years of coal mining. However, this treatment of pneumo data has obvious limitations because of bias introduced from survivorship. In the most severe cases, miners may die and remain uncounted at the highest levels of exposure. Normal and mild cases may be overrepresented in the highest yearly quantiles for this reason.