5E4. Suppose you have a single categorical predictor with 4 levels (unique values), labeled A, B, C and D. Let Ai be an indicator variable that is 1 where case i is in category A. Also suppose Bi, Ci, and Di for the other categories. Now which of the following linear models are inferentially equivalent ways to include the categorical variable in a regression? Models are inferentially equivalent when it’s possible to compute one posterior distribution from the posterior distribution of another model.
We struggled with this question and couldn't reach an answer. We would appreciate it if you could clarify how to read these regression models at the seminar.
5M3. It is sometimes observed that the best predictor of fire risk is the presence of firefighters. States and localities with many firefighters also have more fires. Presumably firefighters do not cause fires. Nevertheless, this is not a spurious correlation. Instead fires cause firefighters. Consider the same reversal of causal inference in the context of the divorce and marriage data. How might a high divorce rate cause a higher marriage rate? Can you think of a way to evaluate this relationship, using multiple regression?
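A higher divorce rate could cause a higher marriage rate through remarriage: people who divorce become available to marry again. Ideally, we would add one more variable to our data that captures the remarriage rate after divorce. If the association between divorce rate and marriage rate weakens once the remarriage rate is included, that would support remarriage as the mechanism.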
marriage rate ~ divorce rate + remarriage rate
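As a rough illustration of how this regression could be read, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the variable names, the effect sizes, and the causal chain divorce -> remarriage -> marriage. It uses base R `lm` instead of `quap` so that it runs without the rethinking package.

```r
set.seed(1)
n <- 1000

# Hypothetical causal chain (assumed): divorce -> remarriage -> marriage
divorce    <- rnorm(n)
remarriage <- 0.8 * divorce + rnorm(n, sd = 0.5)
marriage   <- 0.7 * remarriage + rnorm(n, sd = 0.5)

# Model 1: divorce alone picks up the whole path through remarriage
m_total <- lm(marriage ~ divorce)

# Model 2: with remarriage included, divorce has little left to explain
m_direct <- lm(marriage ~ divorce + remarriage)

b_total  <- coef(m_total)["divorce"]
b_direct <- coef(m_direct)["divorce"]
print(round(c(total = unname(b_total), direct = unname(b_direct)), 2))
```

In this simulated setting the divorce slope should be clearly positive in the first model and close to zero in the second, which is the pattern we would look for in real data if remarriage were the whole story.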
5M4. In the divorce data, States with high numbers of members of the Church of Jesus Christ of Latter-day Saints (LDS) have much lower divorce rates than the regression models expected. Find a list of LDS population by State and use those numbers as a predictor variable, predicting divorce rate using marriage rate, median age at marriage, and percent LDS population (possibly standardized). You may want to consider transformations of the raw percent LDS variable.
library(rethinking)
data(WaffleDivorce)
d <- WaffleDivorce
d$pct_LDS <- c(0.75, 4.53, 6.18, 1, 2.01, 2.82, 0.43, 0.55, 0.38,
0.75, 0.82, 5.18, 26.35, 0.44, 0.66, 0.87, 1.25, 0.77, 0.64, 0.81,
0.72, 0.39, 0.44, 0.58, 0.72, 1.14, 4.78, 1.29, 0.61, 0.37, 3.34,
0.41, 0.82, 1.48, 0.52, 1.2, 3.85, 0.4, 0.37, 0.83, 1.27, 0.75,
1.21, 67.97, 0.74, 1.13, 3.99, 0.92, 0.44, 11.5 )
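# Log-transform percent LDS: the raw values are strongly right-skewed (maximum 67.97)
d$pct_LDS <- log(d$pct_LDS)
# Standardize all three predictors
d$Marriage <- scale(d$Marriage)
d$pct_LDS <- scale(d$pct_LDS)
d$MedianAgeMarriage <- scale(d$MedianAgeMarriage)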
M <- quap(
  alist(
    Divorce ~ dnorm(mu,sigma),
    mu <- a + bR*Marriage + bA*MedianAgeMarriage + bM*pct_LDS,
    a ~ dnorm(0,100),
    c(bA,bR,bM) ~ dnorm(0,10),
    sigma ~ dunif(0,10)
  ),data=d )
precis( M )
##             mean        sd       5.5%      94.5%
## a      9.6879643 0.1946136  9.3769341  9.9989944
## bA    -1.4119277 0.2944285 -1.8824813 -0.9413740
## bR     0.1007897 0.3195464 -0.4099071  0.6114866
## bM    -0.6193699 0.2888301 -1.0809762 -0.1577637
## sigma  1.3761287 0.1376128  1.1561968  1.5960606
According to this model, the percentage of the population that are members of the LDS church is negatively associated with the divorce rate: the bM slope is about -0.62, and its 89% interval (-1.08 to -0.16) lies entirely below zero.
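The exercise suggests considering transformations of the raw percent LDS variable. The sketch below is one way to check whether the log transform helps, by comparing the skewness of the raw and logged values. The `skewness` helper is a hypothetical name defined here, not a function from the rethinking package, and the values are copied from the vector above.

```r
# Percent LDS by state, copied from the answer above
pct_LDS <- c(0.75, 4.53, 6.18, 1, 2.01, 2.82, 0.43, 0.55, 0.38,
  0.75, 0.82, 5.18, 26.35, 0.44, 0.66, 0.87, 1.25, 0.77, 0.64, 0.81,
  0.72, 0.39, 0.44, 0.58, 0.72, 1.14, 4.78, 1.29, 0.61, 0.37, 3.34,
  0.41, 0.82, 1.48, 0.52, 1.2, 3.85, 0.4, 0.37, 0.83, 1.27, 0.75,
  1.21, 67.97, 0.74, 1.13, 3.99, 0.92, 0.44, 11.5)

# Sample skewness as the mean cubed z-score (hypothetical helper)
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

skew_raw <- skewness(pct_LDS)
skew_log <- skewness(log(pct_LDS))
print(round(c(raw = skew_raw, log = skew_log), 2))
```

A smaller skewness after the log transform means that the few states with very large LDS populations pull less on the fitted slope.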
5M5. One way to reason through multiple causation hypotheses is to imagine detailed mechanisms through which predictor variables may influence outcomes. For example, it is sometimes argued that the price of gasoline (predictor variable) is positively associated with lower obesity rates (outcome variable). However, there are at least two important mechanisms by which the price of gas could reduce obesity. First, it could lead to less driving and therefore more exercise. Second, it could lead to less driving, which leads to less eating out, which leads to less consumption of huge restaurant meals. Can you outline one or more multiple regressions that address these two mechanisms? Assume you can have any predictor data you need.
obesity ~ priceofgasoline + stepsperday + restaurantvisits
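To make the two mechanisms concrete, here is a minimal sketch on simulated data. All effect sizes and the intermediate `driving` variable are assumptions made for illustration. It again uses base R `lm` in place of `quap` so that it is self-contained.

```r
set.seed(2)
n <- 2000

# Assumed mechanisms: price -> less driving -> more steps     -> less obesity
#                     price -> less driving -> fewer restaurant visits -> less obesity
priceofgasoline  <- rnorm(n)
driving          <- -0.7 * priceofgasoline + rnorm(n, sd = 0.5)
stepsperday      <- -0.6 * driving + rnorm(n, sd = 0.5)
restaurantvisits <-  0.6 * driving + rnorm(n, sd = 0.5)
obesity <- -0.5 * stepsperday + 0.5 * restaurantvisits + rnorm(n, sd = 0.5)

# Total association of gas price with obesity
m_total <- lm(obesity ~ priceofgasoline)

# Both mechanisms included: each mediator gets its own slope
m_full <- lm(obesity ~ priceofgasoline + stepsperday + restaurantvisits)

b_total <- coef(m_total)["priceofgasoline"]
b_full  <- coef(m_full)
print(round(b_full, 2))
```

In this simulation the price slope in the full model should be near zero while the slopes for `stepsperday` and `restaurantvisits` show how much each mechanism contributes. Dropping one mediator at a time would show how much of the price association each path carries.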