Lecture 6: Good and Bad Confounds

library(rethinking)
library(tidyverse)
theme_set(theme_bw())

Take Aways of the week

Fork: X ← Z → Y

X and Y are associatied, but not directly each other.
Z is common cause of X and Y
If we stratify on Z, X and Y are independent.
Conditioning will block the path
Example: Marriage Rate and Divorce Rate influenced by Age of Marriage

Pipe: X → Z → Y

X and Y are associated.
Z transmitting causal relationship
If we stratify Z, no association.
Conditioning will block the path.
Example: Fungus treatment and plant growth example.

Collider: X → Z ← Y.

X and Y are not associated
They both influence Z
Stratifying Z will make X and Y associated
Conditioning on Z, opens the path, make it look like X and Y are correlated.
Good looking and talented actors, stratifying on success

Descendent:

Collider + a child of Z. Conditioning on the child will also act like collider.

Backdoor criterion:

List of all paths connecting X (var. interest) and Y (outcome)
Classify each path whether is open or closed.
- A path is open unless it contains a collider.
Check if there are backdoor paths (arrow entering X)
If there are open backdoor paths, try to close it by conditioning.

Question

The data in data(foxes) are 116 foxes from 30 different urban groups in England.

data(foxes)
head(foxes)

##   group avgfood groupsize area weight
## 1     1    0.37         2 1.09   5.02
## 2     1    0.37         2 1.09   2.84
## 3     2    0.53         2 2.05   5.33
## 4     2    0.53         2 2.05   6.07
## 5     3    0.49         2 2.12   5.85
## 6     3    0.49         2 2.12   3.25

foxes %>% 
  ggplot(aes(y=1, x=weight, color=as.factor(group))) + 
  geom_point() +
  facet_wrap(.~group) +
  guides(color="none")

These fox groups are like street gangs. Group size (groupsize) varies from 2 to 8 individuals. Each group maintains its own (almost exclusive) urban territory. Some ter- ritories are larger than others. The area variable encodes this information. Some territories also have more avgfood than others. And food influences the weight of each fox. Assume this DAG:

1. Use the backdoor criterion and estimate the total causal influence of A on

F. What effect would increasing the area of a territory have on the amount of food inside it?

library(rethinking)

d <- foxes

d$W <- standardize(d$weight)
d$A <- standardize(d$area)
d$F <- standardize(d$avgfood)
d$G <- standardize(d$groupsize)

model1 <- quap(
  alist(
  F ~ dnorm( mu , sigma ),
  mu <- a + bA*A,
  a ~ dnorm(0,10),
  bA ~ dnorm(0,10),
  sigma ~ dexp(1)
), data=d )

\[ Food_i = \alpha + \beta_A \times \text{Area}_i + \epsilon \]

precis(model1)

##               mean         sd        5.5%      94.5%
## a     1.823143e-06 0.04328723 -0.06917953 0.06918318
## bA    8.830868e-01 0.04347502  0.81360527 0.95256824
## sigma 4.662221e-01 0.03051719  0.41744974 0.51499446

coeftab_plot(coeftab(model1))

One SD of Area leads on average 0.88 SD in food availability.

2. Infer the total causal effect of adding food F to a territory on the weight

W of foxes. Can you calculate the causal effect by simulating an intervention on food? \[ Weight_i = \alpha + \beta_F \times \text{Food}_i + \epsilon \]

model2 <- quap(
            alist(
              W ~ dnorm( mu , sigma ),
              mu <- a + bF*F,
              a ~ dnorm(0,1),
              bF ~ dnorm(0,1),
              sigma ~ dexp(1)
              ), data=d )

precis(model2)

##                mean         sd       5.5%     94.5%
## a     -2.070640e-07 0.09163803 -0.1464555 0.1464551
## bF    -2.482672e-02 0.09203223 -0.1719120 0.1222586
## sigma  9.911421e-01 0.06465827  0.8878057 1.0944785

coeftab_plot(coeftab(model2))

3. Infer the direct causal effect of adding food F to a territory on the weight

W of foxes. In light of your estimates from this problem and the previous one, what do you think is going on with these foxes? \[ Weight_i = \alpha + \beta_F \times \text{Food}_i + \beta_G \times \text{Group Size}_i + \epsilon \]

model3 <- quap(
        alist(
          W ~ dnorm( mu , sigma ),
          mu <- a + bF*F + bG*G,
          a ~ dnorm(0,1),
          c(bF, bG) ~ dnorm(0,1),
          sigma ~ dexp(1)
          ), data=d )

precis(model3)

##                mean         sd       5.5%      94.5%
## a      6.951703e-07 0.08690214 -0.1388857  0.1388871
## bF     5.914092e-01 0.19541352  0.2791006  0.9037177
## bG    -6.888514e-01 0.19541511 -1.0011625 -0.3765403
## sigma  9.395190e-01 0.06134193  0.8414828  1.0375553

coeftab_plot(coeftab(model3))

model3b <- quap(
        alist(
          W ~ dnorm( mu , sigma ),
          mu <- a + bF*F + bG*G,
          a ~ dnorm(0,1),
          bF ~ dnorm(0,1),
          bG ~ dnorm(0,1),
          sigma ~ dexp(1)
          ), data=d )

coeftab_plot(coeftab(model3, model3b))