library(ggplot2)
library(faraway)
head(salmonella)
## colonies dose
## 1 15 0
## 2 21 0
## 3 29 0
## 4 16 10
## 5 18 10
## 6 21 10
dim(salmonella)
## [1] 18 2
salmonella
## colonies dose
## 1 15 0
## 2 21 0
## 3 29 0
## 4 16 10
## 5 18 10
## 6 21 10
## 7 16 33
## 8 26 33
## 9 33 33
## 10 27 100
## 11 41 100
## 12 60 100
## 13 33 333
## 14 38 333
## 15 41 333
## 16 20 1000
## 17 27 1000
## 18 42 1000
ggplot(salmonella, aes(dose, colonies)) +
geom_point() +
geom_smooth(method = 'glm', method.args = list(family = 'poisson'), se = FALSE) +
labs(title = 'Quinoline dose versus Salmonella Colonies')
## `geom_smooth()` using formula 'y ~ x'
ggplot(salmonella, aes(log(dose+1), colonies)) + ## replotting the points after log-transforming the dose for visualization purposes
geom_point() +
geom_smooth(method = 'glm', method.args = list(family = 'poisson'), se = FALSE) +
labs(title = 'Quinoline dose versus Salmonella Colonies')
## `geom_smooth()` using formula 'y ~ x'
salmonella_lm <- glm(data = salmonella, formula = colonies~., family = 'poisson')
summary(salmonella_lm)
##
## Call:
## glm(formula = colonies ~ ., family = "poisson", data = salmonella)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.6482 -1.8225 -0.2993 1.2917 5.1861
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.3219950 0.0540292 61.485 <2e-16 ***
## dose 0.0001901 0.0001172 1.622 0.105
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 78.358 on 17 degrees of freedom
## Residual deviance: 75.806 on 16 degrees of freedom
## AIC: 172.34
##
## Number of Fisher Scoring iterations: 4
Plotting doses on the log-scale helps visualize the dose-response curve for mutagenicity. There appear to be changes in the variance between the three repetitions at each dose, as well as visual heteroscedasticity.
salmonella_dp <- sum(residuals(salmonella_lm, type = 'pearson')**2)/salmonella_lm$df.res #code taken from textbook
The dispersion parameter is >1, so these data are overdispersed.
drop1(salmonella_lm, test = 'F')
## Warning in drop1.glm(salmonella_lm, test = "F"): F test assumes 'quasipoisson'
## family
## Single term deletions
##
## Model:
## colonies ~ dose
## Df Deviance AIC F value Pr(>F)
## <none> 75.806 172.34
## dose 1 78.358 172.89 0.5386 0.4736
summary(salmonella_lm, dispersion = salmonella_dp)
##
## Call:
## glm(formula = colonies ~ ., family = "poisson", data = salmonella)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.6482 -1.8225 -0.2993 1.2917 5.1861
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.3219950 0.1218626 27.260 <2e-16 ***
## dose 0.0001901 0.0002644 0.719 0.472
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 5.087258)
##
## Null deviance: 78.358 on 17 degrees of freedom
## Residual deviance: 75.806 on 16 degrees of freedom
## AIC: 172.34
##
## Number of Fisher Scoring iterations: 4
After manually specifying the dispersion parameter, the z-statistic through the ‘summary’ funciton closely matches the F-statistic calculated separately.
After reiewing the experimental design, there are some possibilities that explain deviance and particularly systematic differences in deviance. Three salmonella cultures were exposed to a mutagenic compound at different concentrations. There exist some variance at each dose that may indicate other factors may not be controlled. The description also mentions three separate plates are used, so it’s possible that five wafers with the mutagen were placed on the same plate and only the colonies surrounding the wafer were counted. In this case, there may be another variable to match the observations with: plate number. Under this design the drug wafers could have been misidentified, which would explain some of discrepancies in dose-response. Finally, since the experiment is measuring changes in genetic expression, it’s also possible the mutagen is impacting a gene responsible for regulating the rate of metabolism or division.