Assignment #5

Questions

6E1. List three mechanisms by which multiple regression can produce false inferences about causal effects.

# Multicollinearity, post-treatment bias, and collider bias.
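Post-treatment bias is the least intuitive of the three. Below is a minimal sketch loosely modeled on the chapter's plant-and-fungus example. The sample size, effect sizes, seed, and variable names (`h0`, `h1`, `treatment`, `fungus`) are assumptions chosen for illustration. In this simulation the treatment helps only by reducing fungus, so adding fungus to the regression hides the treatment effect.

```r
# Post-treatment bias sketch (illustrative parameters, not from the source)
set.seed(71)
n <- 1000
h0 <- rnorm(n, 10, 2)                                        # initial plant height
treatment <- rep(0:1, each = n / 2)                          # half the plants are treated
fungus <- rbinom(n, size = 1, prob = 0.5 - treatment * 0.4)  # treatment reduces fungus
h1 <- h0 + rnorm(n, 5 - 3 * fungus)                          # fungus slows growth

# Model that conditions on the post-treatment variable (fungus)
b_with_fungus <- unname(coef(lm(h1 ~ h0 + treatment + fungus))["treatment"])
# Model that leaves fungus out
b_without_fungus <- unname(coef(lm(h1 ~ h0 + treatment))["treatment"])

round(c(with_fungus = b_with_fungus, without_fungus = b_without_fungus), 2)
```

With fungus in the model the treatment coefficient should sit near zero; without it, the coefficient should land near the simulated total effect of 3 × 0.4 = 1.2.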

6E2. For one of the mechanisms in the previous problem, provide an example of your choice, perhaps from your own research.

# Multicollinearity

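# Examples: divorce rate and marriage rate; education level and income. In each pair the two variables are correlated. When both enter the same regression as predictors and that correlation is strong, the model struggles to separate their effects: the posterior for each coefficient becomes wide even though the pair together predicts the outcome well.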
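A small simulation makes the mechanism concrete. This is a sketch with hypothetical, standardized `education` and `income` variables; the near-duplicate relationship between them, the outcome model, and the seed are all assumptions. It compares the standard error of the education coefficient when education is the only predictor versus when income is also included.

```r
# Multicollinearity sketch with hypothetical education/income data
set.seed(2020)
n <- 200
education <- rnorm(n)                                # standardized education (hypothetical)
income <- rnorm(n, education, 0.1)                  # income nearly duplicates education
outcome <- rnorm(n, 0.5 * education + 0.5 * income) # outcome depends on both

cor(education, income)                              # close to 1

# Standard error of the education coefficient alone vs. alongside income
se_alone <- summary(lm(outcome ~ education))$coefficients["education", "Std. Error"]
fit_both <- lm(outcome ~ education + income)
se_both <- summary(fit_both)$coefficients["education", "Std. Error"]
round(c(se_alone = se_alone, se_both = se_both), 3)

# The sum of the two coefficients is still estimated precisely
sum(coef(fit_both)[c("education", "income")])
```

The individual standard error grows by roughly an order of magnitude in the joint model, while the sum of the two coefficients stays close to its simulated value of 1.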

6E3. List the four elemental confounds. Can you explain the conditional dependencies of each?

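# Four elemental confounds: The Fork, The Pipe, The Collider, and The Descendant.
# The Confounding Fork (X ← Z → Y): Z is a common cause of X and Y. X and Y are associated, but they are independent conditional on Z.
# The Perplexing Pipe (X → Z → Y): X causes Z, which causes Y. X and Y are associated, but they are independent conditional on Z.
# The Explosive Collider (X → Z ← Y): X and Y jointly cause Z. X and Y are independent, but they become associated once we condition on Z.
# The Descendant (Z → A): A is caused by Z, so conditioning on A is like conditioning on Z, only weaker, because A carries partial information about Z.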
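The conditional dependencies above can be checked by simulation. This sketch uses standard-normal variables with unit effects (an assumption for simplicity) and measures the X-Y association as the `lm()` slope on `x`, first without and then with the conditioning variable in the model.

```r
# Simulate each elemental confound; compare the slope of y on x
# before and after conditioning (conditioning = adding the variable to lm()).
set.seed(6)
n <- 1000
x_coef <- function(fit) unname(coef(fit)["x"])

# Fork: X <- Z -> Y
z <- rnorm(n); x <- rnorm(n, z); y <- rnorm(n, z)
fork_marginal <- x_coef(lm(y ~ x))
fork_given_z  <- x_coef(lm(y ~ x + z))

# Pipe: X -> Z -> Y
x <- rnorm(n); z <- rnorm(n, x); y <- rnorm(n, z)
pipe_marginal <- x_coef(lm(y ~ x))
pipe_given_z  <- x_coef(lm(y ~ x + z))

# Collider: X -> Z <- Y
x <- rnorm(n); y <- rnorm(n); z <- rnorm(n, x + y)
collider_marginal <- x_coef(lm(y ~ x))
collider_given_z  <- x_coef(lm(y ~ x + z))

# Descendant of the collider: Z -> A
a <- rnorm(n, z)
collider_given_a <- x_coef(lm(y ~ x + a))

round(c(fork_marginal = fork_marginal, fork_given_z = fork_given_z,
        pipe_marginal = pipe_marginal, pipe_given_z = pipe_given_z,
        collider_marginal = collider_marginal, collider_given_z = collider_given_z,
        collider_given_a = collider_given_a), 2)
```

Under these assumptions the fork and pipe slopes should drop to about zero once Z is included, the collider slope should move from about zero to about -0.5, and conditioning on the descendant A should give a weaker version of that (about -0.33).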

6E4. How is a biased sample like conditioning on a collider? Think of the example at the open of the chapter.

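# A biased sample is like conditioning on a collider because selection into the sample depends on the variables being studied. If two variables both influence whether a case is selected, restricting attention to the selected cases induces an association between them, even when none exists in the full population.
# Example: grant proposals scored on newsworthiness and trustworthiness. If proposals are funded based on their combined score, newsworthiness and trustworthiness are negatively correlated among the funded proposals even though they are unrelated across all proposals.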
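A minimal sketch of this selection effect: the number of proposals, the funded fraction, and the seed are assumptions, and the two scores are simulated as independent standard normals.

```r
# Selection-distortion sketch: proposals are funded on the sum of two
# independent scores; among funded proposals the scores become correlated.
set.seed(1)
N <- 1000                    # number of proposals (assumed)
p <- 0.1                     # fraction funded (assumed)
nw <- rnorm(N)               # newsworthiness
tw <- rnorm(N)               # trustworthiness
total <- nw + tw             # selection score
funded <- total >= quantile(total, 1 - p)

cor_all <- cor(nw, tw)                     # near zero in the full population
cor_funded <- cor(nw[funded], tw[funded])  # strongly negative after selection
round(c(all = cor_all, funded = cor_funded), 2)
```

Selecting on the sum plays the role of conditioning on the collider: among funded proposals, a low score on one dimension must be offset by a high score on the other.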

6M1. Modify the DAG on page 186 to include the variable V, an unobserved cause of C and Y: C ← V → Y. Reanalyze the DAG. How many paths connect X to Y? Which must be closed? Which variables should you condition on now?

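# There are five paths connecting X to Y:
# 1. X → Y (the direct causal path)
# 2. X ← U ← A → C → Y
# 3. X ← U ← A → C ← V → Y
# 4. X ← U → B ← C → Y
# 5. X ← U → B ← C ← V → Y
#
# Paths 2 through 5 are backdoor paths. Paths 3, 4, and 5 each contain a collider (C on path 3, B on paths 4 and 5), so they are already closed as long as we do not condition on C or B.
# Path 2 is open and must be closed. Conditioning on C would close it but would open path 3, because C is a collider there. So condition on A only.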
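The path analysis can be checked with the dagitty R package, which the chapter also uses for `adjustmentSets()`. This is a sketch that assumes dagitty is installed; U and V are marked unobserved so they cannot be used for adjustment.

```r
library(dagitty)

# DAG from page 186 plus the unobserved V: C <- V -> Y
dag_6m1 <- dagitty("dag {
  U [unobserved]
  V [unobserved]
  X [exposure]
  Y [outcome]
  X -> Y
  X <- U <- A -> C -> Y
  U -> B <- C
  C <- V -> Y
}")

# Every path between X and Y, and whether it is open with no conditioning
p <- paths(dag_6m1, from = "X", to = "Y")
data.frame(path = p$paths, open = p$open)

# Minimal sufficient adjustment sets for the total effect of X on Y
adj <- adjustmentSets(dag_6m1, exposure = "X", outcome = "Y")
adj
```

If the analysis above is right, `paths()` should list five paths with two open (the direct path and X ← U ← A → C → Y), and `adjustmentSets()` should return only { A }.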

6M2. Sometimes, in order to avoid multicollinearity, people inspect pairwise correlations among predictors before including them in a model. This is a bad procedure, because what matters is the conditional association, not the association before the variables are included in the model. To highlight this, consider the DAG X → Z → Y. Simulate data from this DAG so that the correlation between X and Z is very large. Then include both in a model prediction Y. Do you observe any multicollinearity? Why or why not? What is different from the legs example in the chapter?

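# No obvious multicollinearity appears in this simulation. Although X and Z are strongly correlated (about 0.69 in the output below), the posterior standard deviations of b_xz and b_zy are both small (about 0.04 and 0.03), so the model has no trouble separating the two predictors. The estimate of b_xz is close to zero because X influences Y only through Z: once Z is in the model, X adds no further information about Y.
# This differs from the legs example, where the two leg lengths are nearly perfectly correlated and carry the same information about height, so only their sum is well identified and each individual coefficient has a very wide posterior.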

library(rethinking)

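n <- 1000      # number of simulated observations
b_xz <- 0.9    # effect of X on Z
b_zy <- 0.7    # effect of Z on Y

set.seed(100)
x <- rnorm(n)              # X ~ Normal(0, 1)
z <- rnorm(n, x * b_xz)    # Z depends only on X
y <- rnorm(n, z * b_zy)    # Y depends only on Z (X -> Z -> Y)

d <- data.frame(x, y, z)
cor(d)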

##           x         y         z
## x 1.0000000 0.4562717 0.6924074
## y 0.4562717 1.0000000 0.6351279
## z 0.6924074 0.6351279 1.0000000

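# Regress Y on both X and Z. Inside the model, b_xz and b_zy are parameters
# estimated from the data, not the simulation constants defined above.
m6m2 <- quap(alist(
  y ~ dnorm(mu, sigma),
  mu <- a + b_xz*x + b_zy*z,
  a ~ dnorm(0, 100),
  c(b_xz, b_zy) ~ dnorm(0, 100),
  sigma ~ dexp(1)),
  data = d)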

## Caution, model may not have converged.

## Code 1: Maximum iterations reached.

precis(m6m2)

##               mean         sd        5.5%      94.5%
## a     -0.008135237 0.03332760 -0.06139918 0.04512871
## b_xz   0.041459808 0.04483542 -0.03019585 0.11311547
## b_zy   0.619099380 0.03398903  0.56477834 0.67342042
## sigma  1.053730409 0.02383294  1.01564077 1.09182005
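As a quick cross-check, the same simulated data can be refit with base R's `lm()`, and a variance inflation factor (VIF) computed for x. The VIF is not part of the original analysis; it is a standard diagnostic, used here as an assumption-light summary of how much the X-Z correlation inflates the uncertainty in the coefficient on x. With priors this flat, the quap estimates should be close to the least-squares ones.

```r
# Re-simulate the same data as above (same seed and draw order)
set.seed(100)
n <- 1000
x <- rnorm(n)
z <- rnorm(n, x * 0.9)
y <- rnorm(n, z * 0.7)

# Ordinary least squares on the same linear model
fit <- lm(y ~ x + z)
coef_table <- summary(fit)$coefficients
round(coef_table[, c("Estimate", "Std. Error")], 3)

# Variance inflation factor for x: 1 / (1 - R^2) from regressing x on z
r2_xz <- summary(lm(x ~ z))$r.squared
vif_x <- 1 / (1 - r2_xz)
round(vif_x, 2)
```

Given the printed correlation of 0.69, the VIF should be about 1 / (1 - 0.69^2), roughly 1.9, well below the common rule-of-thumb thresholds of 5 or 10.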


JINGFU LI

2020-08-25

Chapter 6 - The Haunted DAG & The Causal Terror

Questions