5E4. Suppose you have a single categorical predictor with 4 levels (unique values), labeled A, B, C and D. Let Ai be an indicator variable that is 1 where case i is in category A. Also suppose Bi, Ci, and Di for the other categories. Now which of the following linear models are inferentially equivalent ways to include the categorical variable in a regression? Models are inferentially equivalent when it’s possible to compute one posterior distribution from the posterior distribution of another model.

We struggled with this question and couldn't reach an answer. We would appreciate it if you could clarify how to read these regression models at the seminar.
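A small sketch that may help with reading the definition, not an answer to which of the listed models are equivalent. It uses simulated data and ordinary least squares (`lm`) as a stand-in for a posterior, and all numbers (`true_means`, `n`) are made up. It compares an intercept-plus-indicators coding with a one-parameter-per-category coding and shows that the category means of one can be computed from the coefficients of the other. We assume the same arithmetic could be applied draw by draw to posterior samples, ignoring any differences caused by priors.

```r
# Simulated data: one categorical predictor with 4 levels (values are invented)
set.seed(1)
n <- 200
group <- sample(c("A", "B", "C", "D"), n, replace = TRUE)
true_means <- c(A = 1, B = 2, C = 0, D = 3)
y <- rnorm(n, mean = true_means[group], sd = 1)

# Indicator variables: 1 where the case is in that category, 0 otherwise
A <- as.numeric(group == "A")
B <- as.numeric(group == "B")
C <- as.numeric(group == "C")
D <- as.numeric(group == "D")

# Coding 1: intercept is the mean of A, slopes are differences from A
m_intercept <- lm(y ~ B + C + D)

# Coding 2: no intercept, one mean per category
m_index <- lm(y ~ 0 + A + B + C + D)

# Recover the four category means from coding 1
b <- coef(m_intercept)
means_from_intercept <- c(
  A = b[["(Intercept)"]],
  B = b[["(Intercept)"]] + b[["B"]],
  C = b[["(Intercept)"]] + b[["C"]],
  D = b[["(Intercept)"]] + b[["D"]]
)
means_from_index <- coef(m_index)

# The two rows should agree up to numerical precision
print(round(rbind(means_from_intercept, means_from_index), 3))
```

A model that leaves out one of the categories without an intercept to absorb it could not be mapped back to all four means in this way.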

5M2. Invent your own example of a masked relationship. An outcome variable should be correlated with both predictor variables, but in opposite directions. And the two predictor variables should be correlated with one another.

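Outcome: time spent on TikTok

Age: negative relationship with TikTok use

Hip-to-waist ratio: positive relationship with TikTok use

Age and hip-to-waist ratio are positively correlated with each other, so in a regression with only one of them, each predictor partly hides the association of the other.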
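To check that this setup really produces masking, here is a minimal simulation sketch. The effect sizes (0.7, 1 and -1) and the variable names `age`, `hwr` and `tiktok` are invented for illustration, and `lm` is used instead of `quap` to keep the example in base R.

```r
# Simulated masked relationship (all numbers are invented for illustration)
set.seed(5)
n <- 1000
age <- rnorm(n)                       # standardized age
hwr <- rnorm(n, mean = 0.7 * age)     # hip-to-waist ratio, positively correlated with age
tiktok <- rnorm(n, mean = hwr - age)  # outcome: positive effect of hwr, negative effect of age

# Bivariate regressions: each slope is pulled toward zero by the omitted predictor
b_age_only <- coef(lm(tiktok ~ age))[["age"]]
b_hwr_only <- coef(lm(tiktok ~ hwr))[["hwr"]]

# Multiple regression: both slopes move further from zero
b_both <- coef(lm(tiktok ~ age + hwr))

print(round(c(age_only = b_age_only, hwr_only = b_hwr_only,
              age_both = b_both[["age"]], hwr_both = b_both[["hwr"]]), 2))
```

With these assumed effects, the bivariate slopes should come out noticeably closer to zero than the slopes from the model that includes both predictors.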

5M3. It is sometimes observed that the best predictor of fire risk is the presence of firefighters. States and localities with many firefighters also have more fires. Presumably firefighters do not cause fires. Nevertheless, this is not a spurious correlation. Instead fires cause firefighters. Consider the same reversal of causal inference in the context of the divorce and marriage data. How might a high divorce rate cause a higher marriage rate? Can you think of a way to evaluate this relationship, using multiple regression?

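A higher divorce rate could cause a higher marriage rate because divorced people remarry. Ideally, we would add one more variable to our data: the remarriage rate after divorce. We would then regress marriage rate on both divorce rate and remarriage rate. If the coefficient for divorce rate shrinks toward zero once remarriage rate is included, that supports the idea that divorce raises the marriage rate through remarriage.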

marriage rate ~ divorce rate + remarriage rate

5M4. In the divorce data, States with high numbers of members of the Church of Jesus Christ of Latter-day Saints (LDS) have much lower divorce rates than the regression models expected. Find a list of LDS population by State and use those numbers as a predictor variable, predicting divorce rate using marriage rate, median age at marriage, and percent LDS population (possibly standardized). You may want to consider transformations of the raw percent LDS variable.

library(rethinking)
data(WaffleDivorce)

d <- WaffleDivorce

d$pct_LDS <- c(0.75, 4.53, 6.18, 1, 2.01, 2.82, 0.43, 0.55, 0.38,
0.75, 0.82, 5.18, 26.35, 0.44, 0.66, 0.87, 1.25, 0.77, 0.64, 0.81,
0.72, 0.39, 0.44, 0.58, 0.72, 1.14, 4.78, 1.29, 0.61, 0.37, 3.34,
0.41, 0.82, 1.48, 0.52, 1.2, 3.85, 0.4, 0.37, 0.83, 1.27, 0.75,
1.21, 67.97, 0.74, 1.13, 3.99, 0.92, 0.44, 11.5 )
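# Log-transform percent LDS (strongly right-skewed), then standardize the predictors.
# as.numeric() drops the matrix attributes that scale() returns.
d$pct_LDS <- log(d$pct_LDS)
d$Marriage <- as.numeric(scale(d$Marriage))
d$pct_LDS <- as.numeric(scale(d$pct_LDS))
d$MedianAgeMarriage <- as.numeric(scale(d$MedianAgeMarriage))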


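# bR: slope for marriage rate, bA: slope for median age at marriage,
# bM: slope for (log, standardized) percent LDS
M <- quap(
  alist(
    Divorce ~ dnorm(mu, sigma),
    mu <- a + bR*Marriage + bA*MedianAgeMarriage + bM*pct_LDS,
    a ~ dnorm(0, 100),
    c(bA, bR, bM) ~ dnorm(0, 10),
    sigma ~ dunif(0, 10)
  ), data = d )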

precis( M )
##             mean        sd       5.5%      94.5%
## a      9.6879643 0.1946136  9.3769341  9.9989944
## bA    -1.4119277 0.2944285 -1.8824813 -0.9413740
## bR     0.1007897 0.3195464 -0.4099071  0.6114866
## bM    -0.6193699 0.2888301 -1.0809762 -0.1577637
## sigma  1.3761287 0.1376128  1.1561968  1.5960606

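According to this model, the percentage of the population that belongs to the LDS church is negatively associated with the divorce rate: the bM slope has a posterior mean of -0.62, and its 89% interval (-1.08 to -0.16) lies entirely below zero. Because pct_LDS was log-transformed and standardized, this means that a one standard deviation increase in log percent LDS is associated with a divorce rate about 0.62 lower, holding marriage rate and median age at marriage constant.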

5M5. One way to reason through multiple causation hypotheses is to imagine detailed mechanisms through which predictor variables may influence outcomes. For example, it is sometimes argued that the price of gasoline (predictor variable) is positively associated with lower obesity rates (outcome variable). However, there are at least two important mechanisms by which the price of gas could reduce obesity. First, it could lead to less driving and therefore more exercise. Second, it could lead to less driving, which leads to less eating out, which leads to less consumption of huge restaurant meals. Can you outline one or more multiple regressions that address these two mechanisms? Assume you can have any predictor data you need.

obesity ~ priceofgasoline + stepsperday + restaurantvisits
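A minimal simulation sketch of how this regression could be read. Everything here is assumed: the extra variable `driving`, the path strengths, and the sample size are invented, and `lm` stands in for `quap`. If gasoline price affects obesity only through exercise and eating out, its coefficient should be clearly negative on its own and close to zero once `stepsperday` and `restaurantvisits` are included.

```r
# Simulated data where gas price acts only through the two mechanisms
# (all path strengths are invented for illustration)
set.seed(11)
n <- 2000
priceofgasoline  <- rnorm(n)
driving          <- rnorm(n, mean = -0.8 * priceofgasoline)  # higher price, less driving
stepsperday      <- rnorm(n, mean = -0.7 * driving)          # less driving, more walking
restaurantvisits <- rnorm(n, mean =  0.7 * driving)          # less driving, less eating out
obesity          <- rnorm(n, mean = -0.5 * stepsperday + 0.5 * restaurantvisits)

# Total association of gas price with obesity
m_total <- lm(obesity ~ priceofgasoline)

# Gas price together with both proposed mechanisms
m_full <- lm(obesity ~ priceofgasoline + stepsperday + restaurantvisits)

print(round(coef(m_total), 2))
print(round(coef(m_full), 2))
```

Comparing the two fits shows which mechanism carries the association: a mechanism matters if its own slope is reliably nonzero and if including it shrinks the slope for gasoline price.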