library(ggplot2)
library(faraway)
  1. The salmonella data was collected in a salmonella reverse mutagenicity assay. The predictor is the dose level of quinoline and the response is the numbers of revertant colonies of TA98 salmonella observed on each of three replicate plates. Show that a Poisson GLM is inadequate and that some overdispersion must be allowed for. Do not forget to check out other reasons for a high deviance.
head(salmonella)
##   colonies dose
## 1       15    0
## 2       21    0
## 3       29    0
## 4       16   10
## 5       18   10
## 6       21   10
dim(salmonella)
## [1] 18  2
salmonella
##    colonies dose
## 1        15    0
## 2        21    0
## 3        29    0
## 4        16   10
## 5        18   10
## 6        21   10
## 7        16   33
## 8        26   33
## 9        33   33
## 10       27  100
## 11       41  100
## 12       60  100
## 13       33  333
## 14       38  333
## 15       41  333
## 16       20 1000
## 17       27 1000
## 18       42 1000
ggplot(salmonella, aes(dose, colonies)) +
  geom_point() +
  geom_smooth(method = 'glm', method.args = list(family = 'poisson'), se = FALSE) +
  labs(title = 'Quinoline dose versus Salmonella Colonies')
## `geom_smooth()` using formula 'y ~ x'

ggplot(salmonella, aes(log(dose+1), colonies)) + ## replotting the points after log-transforming the dose for visualization purposes
  geom_point() +
  geom_smooth(method = 'glm', method.args = list(family = 'poisson'), se = FALSE) +
  labs(title = 'Quinoline dose versus Salmonella Colonies')
## `geom_smooth()` using formula 'y ~ x'

salmonella_lm <- glm(data = salmonella, formula = colonies~., family = 'poisson')


summary(salmonella_lm)
## 
## Call:
## glm(formula = colonies ~ ., family = "poisson", data = salmonella)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6482  -1.8225  -0.2993   1.2917   5.1861  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 3.3219950  0.0540292  61.485   <2e-16 ***
## dose        0.0001901  0.0001172   1.622    0.105    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 78.358  on 17  degrees of freedom
## Residual deviance: 75.806  on 16  degrees of freedom
## AIC: 172.34
## 
## Number of Fisher Scoring iterations: 4

Plotting doses on the log-scale helps visualize the dose-response curve for mutagenicity. There appear to be changes in the variance between the three repetitions at each dose, as well as visual heteroscedasticity.

salmonella_dp <- sum(residuals(salmonella_lm, type = 'pearson')**2)/salmonella_lm$df.res #code taken from textbook

The dispersion parameter is >1, so these data are overdispersed.

drop1(salmonella_lm, test = 'F')
## Warning in drop1.glm(salmonella_lm, test = "F"): F test assumes 'quasipoisson'
## family
## Single term deletions
## 
## Model:
## colonies ~ dose
##        Df Deviance    AIC F value Pr(>F)
## <none>      75.806 172.34               
## dose    1   78.358 172.89  0.5386 0.4736
summary(salmonella_lm, dispersion = salmonella_dp)
## 
## Call:
## glm(formula = colonies ~ ., family = "poisson", data = salmonella)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6482  -1.8225  -0.2993   1.2917   5.1861  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 3.3219950  0.1218626  27.260   <2e-16 ***
## dose        0.0001901  0.0002644   0.719    0.472    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 5.087258)
## 
##     Null deviance: 78.358  on 17  degrees of freedom
## Residual deviance: 75.806  on 16  degrees of freedom
## AIC: 172.34
## 
## Number of Fisher Scoring iterations: 4

After manually specifying the dispersion parameter, the z-statistic through the ‘summary’ funciton closely matches the F-statistic calculated separately.

After reiewing the experimental design, there are some possibilities that explain deviance and particularly systematic differences in deviance. Three salmonella cultures were exposed to a mutagenic compound at different concentrations. There exist some variance at each dose that may indicate other factors may not be controlled. The description also mentions three separate plates are used, so it’s possible that five wafers with the mutagen were placed on the same plate and only the colonies surrounding the wafer were counted. In this case, there may be another variable to match the observations with: plate number. Under this design the drug wafers could have been misidentified, which would explain some of discrepancies in dose-response. Finally, since the experiment is measuring changes in genetic expression, it’s also possible the mutagen is impacting a gene responsible for regulating the rate of metabolism or division.