Question: What does 5 to 9 mean?
We're going to do the simplest case: two-level variables.
Just so that you'll see that it's easy to do, let's do a simple model: Can we predict the sex of one of the kids from the width and length of the foot?
mod = glm(sex ~ width + length, data = KidsFeet, family = "binomial")
summary(mod)
##
## Call:
## glm(formula = sex ~ width + length, family = "binomial", data = KidsFeet)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.606 -1.030 -0.563 1.017 1.656
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 18.139 8.155 2.22 0.026 *
## width -1.649 0.967 -1.71 0.088 .
## length -0.136 0.353 -0.39 0.699
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 54.040 on 38 degrees of freedom
## Residual deviance: 47.071 on 36 degrees of freedom
## AIC: 53.07
##
## Number of Fisher Scoring iterations: 4
head(fitted(mod)) # probabilities
## 1 2 3 4 5 6
## 0.7246 0.5429 0.2334 0.1901 0.5120 0.2054
It's not obvious from the report whether these are the probability of being a girl or the probability of being a boy. Here's a plot to help out:
bwplot(fitted(mod) ~ sex, data = KidsFeet)
It seems that the probability refers to being a girl, since the girls have a higher probability than the boys.
The functions themselves have a nice, graceful shape which you may recognize from Math 135 as being an S-function.
f = makeFun(mod)
plotFun(f(width = 10, length) ~ length, length.lim = range(0, 40))
plotFun(f(width = width, length = length) ~ width & length, length.lim = c(10,
30), width.lim = c(7, 12))
ACTIVITY: Try adding in an interaction term, etc.
mod2 = glm(sex ~ length * width, data = KidsFeet, family = "binomial")
f2 = makeFun(mod2)
plotFun(f2(width = width, length = length) ~ width & length, length.lim = c(20,
30), width.lim = c(7, 10))
plotPoints(length ~ width, data = KidsFeet, add = TRUE, col = KidsFeet$sex)
Look at the data in oring-damage.csv, which is about the space shuttle. The damage variable has been encoded as 0 or 1.
oring = fetchData("oring-damage.csv")
## Retrieving from http://www.mosaic-web.org/go/datasets/oring-damage.csv
mod = glm(damage ~ temp, data = oring, family = "binomial")
f = makeFun(mod)
plotFun(f(temp) ~ temp, temp.lim = range(0, 100))
Activity: Add in quadratic or cubic terms or stepwise terms and see how that changes the shape of the function. For instance:
mod2 = glm(damage ~ poly(temp, 2) + (temp < 70), data = oring, family = "binomial")
f2 = makeFun(mod2)
plotFun(f2(temp) ~ temp, temp.lim = range(0, 100))
The function doesn't need to be a simple step.
Since fitting and the report generation is trivial, and the report interpretation is superficially the same, we don't need to spend much time on that. Instead, we'll spend a bit of time on interpretation in a common setting: Case-Control studies.
We tend to think about three main kinds of studies:
But there is another kind of study that people don't usually think about but that can be very useful in certain circumstances.
There is a group that wants to do its survey about health outcomes for college students and the factors that determine them.