Assignment #4

Questions

5-1. Which of the linear models below are multiple linear regressions? \[\begin{align} \mu_i &= \alpha + \beta x_i \tag{1} \\ \mu_i &= \beta_x x_i + \beta_z z_i \tag{2} \\ \mu_i &= \alpha + \beta (x_i - z_i) \tag{3} \\ \mu_i &= \alpha + \beta_x x_i + \beta_z z_i \tag{4} \end{align}\]

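print('Models 2 and 4 are multiple linear regressions. Model 3 has two variables but a single slope on their difference.')

## [1] "Models 2 and 4 are multiple linear regressions. Model 3 has two variables but a single slope on their difference."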
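
One way to check how many slope parameters each model carries is to fit all four with `lm()` and count the non-intercept coefficients. This is only a sketch: the simulated `x`, `z`, and `y` and their coefficients are made-up values for illustration, not part of the assignment.

```r
# Hypothetical simulated data; x, z, and y are illustrative names only.
set.seed(1)
n <- 200
x <- rnorm(n)
z <- rnorm(n)
y <- 1 + 2 * x - 0.5 * z + rnorm(n)

# One formula per model in the question.
fits <- list(
  model_1 = lm(y ~ x),          # mu = a + b * x
  model_2 = lm(y ~ 0 + x + z),  # mu = b_x * x + b_z * z (no intercept)
  model_3 = lm(y ~ I(x - z)),   # mu = a + b * (x - z)
  model_4 = lm(y ~ x + z)       # mu = a + b_x * x + b_z * z
)

# Count slope parameters: every coefficient except the intercept.
n_slopes <- sapply(fits, function(fit) sum(names(coef(fit)) != "(Intercept)"))
print(n_slopes)
```

Models 2 and 4 each estimate two slopes, while model 3 estimates one slope on the derived variable `x - z`.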

5-2. Write down a multiple regression to evaluate the claim: Neither amount of funding nor size of laboratory is by itself a good predictor of time to PhD degree; but together these variables are both positively associated with time to degree. Write down the model definition and indicate which side of zero each slope parameter should be on.

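# T_i ~ Normal(μ_i, σ)
# μ_i <- α + β_fund * fund_i + β_lab_size * lab_size_i
# Model definition: T_i is time to PhD degree, fund_i is the amount of funding, lab_size_i is the size of the laboratory.
# Both slopes should be on the positive side of zero: β_fund > 0 and β_lab_size > 0.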
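
The claim can be illustrated with a small simulation, sketched here under assumed numbers: funding and lab size are given a correlation of -0.9, and each is given a true slope of 1 on time to degree. None of these values come from the question. Under these assumptions the single-predictor slopes are about 1 - 0.9 = 0.1, while the joint model recovers slopes near 1.

```r
# Hypothetical simulation: all coefficients below are illustrative assumptions.
set.seed(2)
n <- 10000
fund <- rnorm(n)
# Assumed strong negative correlation (-0.9) between funding and lab size.
lab_size <- rnorm(n, mean = -0.9 * fund, sd = sqrt(1 - 0.9^2))
# Assumed data-generating process: both predictors lengthen time to degree.
time_to_phd <- rnorm(n, mean = fund + lab_size, sd = 1)

# Each predictor on its own, then both together.
slope_fund_alone <- coef(lm(time_to_phd ~ fund))[["fund"]]
slope_lab_alone  <- coef(lm(time_to_phd ~ lab_size))[["lab_size"]]
slopes_joint     <- coef(lm(time_to_phd ~ fund + lab_size))[c("fund", "lab_size")]

print(round(c(fund_alone = slope_fund_alone, lab_alone = slope_lab_alone), 2))
print(round(slopes_joint, 2))
```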

5-3. Invent your own example of a spurious correlation. An outcome variable should be correlated with both predictor variables. But when both predictors are entered in the same model, the correlation between the outcome and one of the predictors should mostly vanish (or at least be greatly reduced). Plot and explain the correlation.

library(tidyverse)
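# income is the common cause: it drives both spending variables
income <- rnorm(100, 0, 1)
# food_spend depends only on income (noisier, sd = 2)
food_spend <- rnorm(100, mean = income, sd = 2)
# restaurant_spend also depends only on income, not on food_spend
restaurant_spend <- rnorm(100, mean = income, sd = 1)
spurious_data <- data.frame(restaurant_spend, food_spend, income)
pairs(spurious_data)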

m1 <- rethinking::map(
  alist(
    restaurant_spend ~ dnorm(mu, sigma),
    mu <- a + b * food_spend,
    a ~ dnorm(0, 5),
    b ~ dnorm(0, 5),
    sigma ~ dunif(0, 5)
  ),
  data = spurious_data
)
rethinking::precis(m1)

##              mean         sd       5.5%     94.5%
## a     0.000652061 0.12578662 -0.2003793 0.2016834
## b     0.272240463 0.05413516  0.1857220 0.3587589
## sigma 1.258056138 0.08895783  1.1158843 1.4002279

m2 <- rethinking::map(
  alist(
    restaurant_spend ~ dnorm(mu, sigma),
    mu <- a + b * food_spend + c * income,
    a ~ dnorm(0, 5),
    b ~ dnorm(0, 5),
    c ~ dnorm(0, 5),
    sigma ~ dunif(0, 5)
  ),
  data = spurious_data
)
rethinking::precis(m2)

##              mean         sd        5.5%     94.5%
## a     0.009130612 0.10128956 -0.15274966 0.1710109
## b     0.070075751 0.05150919 -0.01224588 0.1523974
## c     0.856595635 0.11630197  0.67072263 1.0424686
## sigma 1.012870270 0.07162053  0.89840683 1.1273337

# Explanation: on its own, food_spend predicts restaurant_spend (b = 0.27, 89% interval 0.19 to 0.36).
# Once income enters the model, b shrinks to 0.07 with an interval that includes zero, while c = 0.86.
# The food_spend association is spurious: the two spending variables are related only through income.

5-4. Invent your own example of a masked relationship. An outcome variable should be correlated with both predictor variables, but in opposite directions. And the two predictor variables should be correlated with one another. Plot and explain the correlation.

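# income and education are positively correlated (correlation about 0.7)
income <- rnorm(100, 0, 1)
education <- rnorm(100, 0.7 * income, sqrt(1 - 0.7^2))
# unemployment rises with income and falls with education, so the two effects offset
unemployment <- rnorm(100, income - education, 1)
df <- data.frame(unemployment, income, education)
pairs(df)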
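
The slopes this data-generating process implies can be worked out and checked on a large sample. Because education has variance 1 and covariance 0.7 with income, the single-predictor slopes should be about 1 - 0.7 = 0.3 for income and 0.7 - 1 = -0.3 for education, while the joint slopes should be about 1 and -1. The sketch below uses `lm()` instead of `rethinking::map()` and separate `_sim` variables. Both choices are assumptions made here to keep the check fast and to leave `df` untouched.

```r
# Same data-generating process as the simulation above, with a large n
# (an assumption made here only to reduce sampling noise).
set.seed(3)
n_sim <- 1e5
income_sim <- rnorm(n_sim, 0, 1)
education_sim <- rnorm(n_sim, 0.7 * income_sim, sqrt(1 - 0.7^2))
unemployment_sim <- rnorm(n_sim, income_sim - education_sim, 1)

# Single-predictor slopes: each effect is partly hidden by the other predictor.
b_alone <- coef(lm(unemployment_sim ~ income_sim))[["income_sim"]]
c_alone <- coef(lm(unemployment_sim ~ education_sim))[["education_sim"]]
# Joint slopes: both effects are recovered at close to full size.
joint <- coef(lm(unemployment_sim ~ income_sim + education_sim))

print(round(c(b_alone = b_alone, c_alone = c_alone), 2))
print(round(joint[c("income_sim", "education_sim")], 2))
```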

m3 <- rethinking::map(
  alist(
    unemployment ~ dnorm(mu, sigma),
    mu <- a + b * income,
    a ~ dnorm(0, 5),
    b ~ dnorm(0, 5),
    sigma ~ dunif(0, 5)
  ),
  data = df
)
rethinking::precis(m3)

##             mean         sd       5.5%     94.5%
## a     -0.0374197 0.12635079 -0.2393527 0.1645133
## b      0.3901520 0.12156234  0.1958719 0.5844321
## sigma  1.2607618 0.08914914  1.1182842 1.4032393

m4 <- rethinking::map(
  alist(
    unemployment ~ dnorm(mu, sigma),
    mu <- a + c * education,
    a ~ dnorm(0, 5),
    c ~ dnorm(0, 5),
    sigma ~ dunif(0, 5)
  ),
  data = df
)
rethinking::precis(m4)

##              mean        sd       5.5%       94.5%
## a      0.01189948 0.1301841 -0.1961599  0.21995888
## c     -0.26610005 0.1324642 -0.4778034 -0.05439669
## sigma  1.29819132 0.0917962  1.1514833  1.44489938

m5 <- rethinking::map(
  alist(
    unemployment ~ dnorm(mu, sigma),
    mu <- a + b * income + c * education,
    a ~ dnorm(0, 5),
    b ~ dnorm(0, 5),
    c ~ dnorm(0, 5),
    sigma ~ dunif(0, 5)
  ),
  data = df
)
rethinking::precis(m5)

##               mean         sd       5.5%     94.5%
## a     -0.005244408 0.09973602 -0.1646418  0.154153
## b      1.138603011 0.13571089  0.9217108  1.355495
## c     -1.119018830 0.14362269 -1.3485556 -0.889482
## sigma  0.994215680 0.07030021  0.8818624  1.106569

# Explanation: alone, income (b = 0.39) and education (c = -0.27) show only weak associations with unemployment.
# In the joint model both strengthen sharply (b = 1.14, c = -1.12).
# Income and education are positively correlated and act in opposite directions, so each masks the other's effect in the single-predictor models.

5-5. In the divorce data, States with high numbers of members of the Church of Jesus Christ of Latter-day Saints (LDS) have much lower divorce rates than the regression models expected. Find a list of LDS population by State and use those numbers as a predictor variable, predicting divorce rate using marriage rate, median age at marriage, and percent LDS population (possibly standardized). You may want to consider transformations of the raw percent LDS variable.

library(rethinking)
data("WaffleDivorce")
set.seed(5)

d <- WaffleDivorce
# LDS share of each state's population as a proportion, in the row order of d
d$LDS <- c(
  0.0077, 0.0453, 0.0610, 0.0104, 0.0194, 0.0270, 0.0044, 0.0057, 0.0041, 0.0075,
  0.0082, 0.0520, 0.2623, 0.0045, 0.0067, 0.0090, 0.0130, 0.0079, 0.0064, 0.0082,
  0.0072, 0.0040, 0.0045, 0.0059, 0.0073, 0.0116, 0.0480, 0.0130, 0.0065, 0.0037,
  0.0333, 0.0041, 0.0084, 0.0149, 0.0053, 0.0122, 0.0372, 0.0040, 0.0039, 0.0081,
  0.0122, 0.0076, 0.0125, 0.6739, 0.0074, 0.0113, 0.0390, 0.0093, 0.0046, 0.1161
)
d$logLDS <- log(d$LDS)
d$logLDS.s <- (d$logLDS - mean(d$logLDS)) / sd(d$logLDS)
simplehist(d$LDS)

simplehist(d$logLDS)

simplehist(d$logLDS.s)
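
As a numeric companion to the histograms, the skewness of the raw and log-transformed LDS values can be compared. This is a sketch only: `skew_coef` is a hypothetical helper defined here (the moment-based sample skewness), and the values are copied from `d$LDS` so the block runs on its own. A smaller value after the transform indicates that a few states with very large LDS shares dominate the variable less.

```r
# Values copied from d$LDS above so that this block is self-contained.
lds <- c(
  0.0077, 0.0453, 0.0610, 0.0104, 0.0194, 0.0270, 0.0044, 0.0057, 0.0041, 0.0075,
  0.0082, 0.0520, 0.2623, 0.0045, 0.0067, 0.0090, 0.0130, 0.0079, 0.0064, 0.0082,
  0.0072, 0.0040, 0.0045, 0.0059, 0.0073, 0.0116, 0.0480, 0.0130, 0.0065, 0.0037,
  0.0333, 0.0041, 0.0084, 0.0149, 0.0053, 0.0122, 0.0372, 0.0040, 0.0039, 0.0081,
  0.0122, 0.0076, 0.0125, 0.6739, 0.0074, 0.0113, 0.0390, 0.0093, 0.0046, 0.1161
)

# Hypothetical helper: moment-based sample skewness (0 for a symmetric sample).
skew_coef <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_raw <- skew_coef(lds)
skew_log <- skew_coef(log(lds))
print(c(raw = skew_raw, log = skew_log))
```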

m <- rethinking::map(
  alist(
    Divorce ~ dnorm(mu, sigma),
    mu <- a + bm * Marriage + ba * MedianAgeMarriage + bl * logLDS.s,
    a ~ dnorm(10, 20),
    bm ~ dnorm(0, 10),
    ba ~ dnorm(0, 10),
    bl ~ dnorm(0, 10),
    sigma ~ dunif(0, 5)
  ),
  data = d
)
rethinking::precis(m)

##             mean         sd        5.5%      94.5%
## a     35.4464738 6.77473049 24.61914599 46.2738016
## bm     0.0534216 0.08261297 -0.07860988  0.1854531
## ba    -1.0299865 0.22467479 -1.38906026 -0.6709128
## bl    -0.6077923 0.29055028 -1.07214777 -0.1434368
## sigma  1.3786268 0.13836381  1.15749466  1.5997589

# bl is negative with an 89% interval entirely below zero (-1.07 to -0.14): states with a larger LDS share have lower divorce rates after accounting for marriage rate and median age at marriage.
# ba remains clearly negative, and the interval for bm spans zero.

5-6. In the divorce example, suppose the DAG is: M → A → D. What are the implied conditional independencies of the graph? Are the data consistent with it? (Hint: use the dagitty package)

library(dagitty)

mad_dag <- dagitty("dag{ M -> A -> D}")
impliedConditionalIndependencies(mad_dag)

## D _||_ M | A
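
What this implied independency means in practice can be sketched with a simulation: if the chain M → A → D holds, the slope on M should be near zero once A is in the regression, even though M predicts D on its own. The effect sizes below and the helper name `m_slope_interval` are assumptions for illustration, and `lm()` stands in for the Bayesian fits used elsewhere in this assignment.

```r
# Hypothetical helper: regress D on A and M and return the 89% interval for
# the M slope. An interval close to zero is in line with D _||_ M | A.
m_slope_interval <- function(data) {
  fit <- lm(D ~ A + M, data = data)
  confint(fit, parm = "M", level = 0.89)
}

# Simulated chain M -> A -> D with illustrative (assumed) effect sizes.
set.seed(4)
n_chain <- 5000
M <- rnorm(n_chain)
A <- rnorm(n_chain, mean = -0.7 * M)
D <- rnorm(n_chain, mean = 0.6 * A)
chain <- data.frame(M, A, D)

# M slope after conditioning on A.
print(m_slope_interval(chain))
# Marginal M slope, which is clearly nonzero because M acts through A.
print(coef(lm(D ~ M, data = chain))[["M"]])
```

The same helper could be applied to the divorce data after renaming the columns, for example `data.frame(M = d$Marriage, A = d$MedianAgeMarriage, D = d$Divorce)`.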

equivalentDAGs(mad_dag)

## [[1]]
## dag {
## A
## D
## M
## A -> D
## M -> A
## }
## 
## [[2]]
## dag {
## A
## D
## M
## A -> D
## A -> M
## }
## 
## [[3]]
## dag {
## A
## D
## M
## A -> M
## D -> A
## }

# The DAG implies a single conditional independency: D is independent of M after conditioning on A.
# In the 5-5 fit, the Marriage slope bm has an 89% interval (-0.08 to 0.19) that spans zero once MedianAgeMarriage is in the model, which is consistent with this implication (that model also includes logLDS.s).
# The three DAGs listed above form one Markov equivalence class: they imply the same independency, so the data alone cannot distinguish between them.


Yuman Liang

2021-08-24

Chapter 5 - Many Variables and Spurious Waffles
