Stats 155 Class Notes 2012-11-30

In the News

Confounders as a diagram

Variable   --->  Outcome
   ^                 ^
   |                 |
   +--- Confounder --+

SAT versus per-student spending

Draw the causal diagram

Spending  ---------->      SAT    <-------|
      |                                   |
Focus on Educ. ---> fraction taking SAT  -|

Campaign Spending and the back-door network

Research in political science shows that higher campaign spending is associated with a lower vote for the incumbent. Yet common sense says that spending helps the candidate; that's why candidates spend.

  Polls <-----    Popularity ---> vote outcome
    |                                 ^
    v                                 |
  Spending ---------------------------

How to block a back-door?

So far we have blocked back doors by including the covariate in the model. But that is too crude an answer: which covariate to include depends on the structure of the causal network.

Example: Election Spending

fetchData("simulate.r")
## Retrieving from http://www.mosaic-web.org/go/datasets/simulate.r
## [1] TRUE
campaign.spending
## Causal Network with  4  vars:  popularity, polls, spending, vote 
## ===============================================
## popularity is exogenous
## polls <== popularity 
## spending <== polls 
## vote <== popularity & spending
equations(campaign.spending)
## popularity <== runif(nsamps, min=15,max=85) 
## polls <== popularity + rnorm(nsamps,sd=3) 
## spending <== 100 - polls + rnorm(nsamps,sd=10) 
## vote <== 0.75*popularity + 0.25*spending + rnorm(nsamps,sd=5)
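
For readers who cannot load simulate.r, the four equations above can be reproduced in base R. This is only a minimal sketch, not the code inside simulate.r: the function name simulate_campaign and the seed value are assumptions made for illustration.

```r
# Hypothetical helper (not part of simulate.r): draws n cases by following
# the campaign.spending equations printed above, in causal order.
simulate_campaign <- function(n) {
  popularity <- runif(n, min = 15, max = 85)                 # exogenous
  polls      <- popularity + rnorm(n, sd = 3)                # polls <== popularity
  spending   <- 100 - polls + rnorm(n, sd = 10)              # spending <== polls
  vote       <- 0.75 * popularity + 0.25 * spending +
                rnorm(n, sd = 5)                             # vote <== popularity & spending
  data.frame(popularity, polls, spending, vote)
}

set.seed(155)                  # arbitrary seed, only for reproducibility
sim <- simulate_campaign(435)
head(sim)

# Marginal association between spending and vote, ignoring everything else
cor(sim$spending, sim$vote)
```

Writing the equations out this way makes the ordering explicit: each variable is generated only from variables upstream of it in the network.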

You can see from the equations how spending is related to vote: each additional unit of spending raises vote by 0.25. Let's look at what a statistical model has to say:

d = run.sim(campaign.spending, 435)  # 435 cases: the number of seats in the U.S. House
summary(lm(vote ~ spending, data = d))
## 
## Call:
## lm(formula = vote ~ spending, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.946  -5.937  -0.267   6.062  24.206 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  64.1522     0.9386    68.3   <2e-16 ***
## spending     -0.2803     0.0176   -15.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 8.69 on 433 degrees of freedom
## Multiple R-squared: 0.369,   Adjusted R-squared: 0.368 
## F-statistic:  254 on 1 and 433 DF,  p-value: <2e-16

The problem: there is a back-door pathway from spending to vote via polls and popularity (spending <== polls <== popularity ==> vote). Block it by including a node on that pathway, here polls, as a covariate.

summary(lm(vote ~ spending + polls, data = d))
## 
## Call:
## lm(formula = vote ~ spending + polls, data = d)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -16.98  -3.79   0.49   4.13  18.88 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.5693     2.8298   -0.91     0.36    
## spending      0.2939     0.0264   11.13   <2e-16 ***
## polls         0.7598     0.0315   24.15   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 5.67 on 432 degrees of freedom
## Multiple R-squared: 0.732,   Adjusted R-squared: 0.73 
## F-statistic:  589 on 2 and 432 DF,  p-value: <2e-16
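
The two fits above can be put side by side with a third that blocks the back door at popularity, the other node on the pathway. The sketch below is an illustration in base R under stated assumptions: it re-creates the data from the printed equations rather than calling run.sim, and the seed and the name coef_spending are made up for the example. The value to compare against is the 0.25 coefficient on spending in the vote equation.

```r
set.seed(2012)  # arbitrary seed, only for reproducibility
n <- 435

# Data generated directly from the campaign.spending equations
popularity <- runif(n, min = 15, max = 85)
polls      <- popularity + rnorm(n, sd = 3)
spending   <- 100 - polls + rnorm(n, sd = 10)
vote       <- 0.75 * popularity + 0.25 * spending + rnorm(n, sd = 5)
sim <- data.frame(popularity, polls, spending, vote)

# Coefficient on spending under three choices of covariate
coef_spending <- c(
  unadjusted      = unname(coef(lm(vote ~ spending, data = sim))["spending"]),
  with_polls      = unname(coef(lm(vote ~ spending + polls, data = sim))["spending"]),
  with_popularity = unname(coef(lm(vote ~ spending + popularity, data = sim))["spending"])
)
round(coef_spending, 3)
```

In this network, either polls or popularity closes the back door, so both adjusted coefficients should land near 0.25 while the unadjusted one has the wrong sign.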

Work session on Barry Bonds at Bat

Work on the logistic regression model of Bonds's hitting.