Stats 155 Class Notes 2012-11-30

In the News

No one shot or stabbed in NY on Monday. We know the murder rate: about 400 per year, or 400/365 (about 1.1) per day. So there will be many days with no murders. What must the “stabbing” rate be so that a day with no murders or stabbings would be a surprise? Use a Poisson model.
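
As a rough sketch of the Poisson calculation, the R lines below compute the chance of a murder-free day at a rate of 400/365 per day, and then the combined daily rate of murders and stabbings at which an event-free day would have probability below 5%. The 5% threshold for "surprise" and the variable names are assumptions made for illustration, not part of the notes.

```r
# Daily murder rate implied by about 400 murders per year
murder_rate <- 400 / 365

# Poisson probability of a day with zero murders
p_no_murder <- dpois(0, lambda = murder_rate)

# Assumed threshold: call a day "surprising" if its probability is below 5%
surprise_level <- 0.05

# P(0 events) = exp(-lambda), so we need lambda > -log(surprise_level)
rate_needed <- -log(surprise_level)

round(c(p_no_murder = p_no_murder, rate_needed = rate_needed), 3)
```

Under these assumptions roughly one day in three has no murders, and murders plus stabbings would need to total about 3 per day before an event-free day falls below the 5% level.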

Confounders as a diagram

Variable   --->  Outcome
   ^                 ^
   |                 |
   ---- Confounder ---

SAT versus per-student spending

Draw the causal diagram

Spending  ---------->      SAT    <-------|
      |                                   |
Focus on Educ. ---> fraction taking SAT  -|

Campaign Spending and the back-door network

Research in political science shows that higher spending in campaigns is related to a lower vote for the incumbent. Yet it's common sense that higher spending improves things for the candidate; that's why they do it.

  Polls <-----    Popularity ---> vote outcome
    |                                 ^
    v                                 |
  Spending ---------------------------

How to block a back-door?

We've done it by including the covariate in the model. But this is too crude an answer.

Example: Election Spending

fetchData("simulate.r")

## Retrieving from http://www.mosaic-web.org/go/datasets/simulate.r

## [1] TRUE

campaign.spending

## Causal Network with  4  vars:  popularity, polls, spending, vote 
## ===============================================
## popularity is exogenous
## polls <== popularity 
## spending <== polls 
## vote <== popularity & spending

equations(campaign.spending)

## popularity <== runif(nsamps, min=15,max=85) 
## polls <== popularity + rnorm(nsamps,sd=3) 
## spending <== 100 - polls + rnorm(nsamps,sd=10) 
## vote <== 0.75*popularity + 0.25*spending + rnorm(nsamps,sd=5)
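
The equations can also be run directly in base R, which may help if simulate.r is unavailable. This is a sketch, not the run.sim implementation: it assumes nsamps is the number of simulated cases and that each equation is evaluated in the order listed. The seed and the data frame name sim are arbitrary choices.

```r
set.seed(155)   # arbitrary seed so the sketch is repeatable
nsamps <- 435   # assumed meaning: number of simulated cases

# Evaluate the four equations in causal order
popularity <- runif(nsamps, min = 15, max = 85)
polls      <- popularity + rnorm(nsamps, sd = 3)
spending   <- 100 - polls + rnorm(nsamps, sd = 10)
vote       <- 0.75 * popularity + 0.25 * spending + rnorm(nsamps, sd = 5)

sim <- data.frame(popularity, polls, spending, vote)

# Popular incumbents poll well and spend less, so these correlations
# should come out negative even though spending raises vote directly
cor(sim$spending, sim$popularity)
cor(sim$spending, sim$vote)
```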

You can see from the equations how spending is related to vote: each unit of spending raises the vote by 0.25, holding popularity constant. Let's look at what a statistical model has to say.

d = run.sim(campaign.spending, 435)  # number of congressmen
summary(lm(vote ~ spending, data = d))

## 
## Call:
## lm(formula = vote ~ spending, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.946  -5.937  -0.267   6.062  24.206 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  64.1522     0.9386    68.3   <2e-16 ***
## spending     -0.2803     0.0176   -15.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 8.69 on 433 degrees of freedom
## Multiple R-squared: 0.369,   Adjusted R-squared: 0.368 
## F-statistic:  254 on 1 and 433 DF,  p-value: <2e-16

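The problem: there is a back-door pathway from spending to vote via polls and popularity (spending <== polls <== popularity ==> vote). Popular incumbents poll well and therefore spend less, so the model without covariates gives spending a negative coefficient. Block the pathway by including a node on it, here polls, as a covariate.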

summary(lm(vote ~ spending + polls, data = d))

## 
## Call:
## lm(formula = vote ~ spending + polls, data = d)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -16.98  -3.79   0.49   4.13  18.88 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.5693     2.8298   -0.91     0.36    
## spending      0.2939     0.0264   11.13   <2e-16 ***
## polls         0.7598     0.0315   24.15   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 5.67 on 432 degrees of freedom
## Multiple R-squared: 0.732,   Adjusted R-squared: 0.73 
## F-statistic:  589 on 2 and 432 DF,  p-value: <2e-16
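
To check that either node on the back-door pathway will do, the sketch below re-creates the simulation in base R (an assumption: it mirrors the equations printed by equations(campaign.spending) rather than calling run.sim) and compares the spending coefficient with no covariate, with polls, and with popularity. The helper name spending_coef and the seed are hypothetical choices for illustration.

```r
set.seed(1130)  # arbitrary seed
nsamps <- 435
popularity <- runif(nsamps, min = 15, max = 85)
polls      <- popularity + rnorm(nsamps, sd = 3)
spending   <- 100 - polls + rnorm(nsamps, sd = 10)
vote       <- 0.75 * popularity + 0.25 * spending + rnorm(nsamps, sd = 5)
sim <- data.frame(popularity, polls, spending, vote)

# Hypothetical helper: fit a model and pull out the spending coefficient
spending_coef <- function(formula) {
  unname(coef(lm(formula, data = sim))["spending"])
}

estimates <- c(
  unadjusted      = spending_coef(vote ~ spending),
  with_polls      = spending_coef(vote ~ spending + polls),
  with_popularity = spending_coef(vote ~ spending + popularity)
)
round(estimates, 3)
```

The unadjusted estimate should come out negative, while both adjusted estimates should land near the 0.25 used in the equations, in line with the 0.29 reported in the output above.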

Work session on Barry Bonds at Bat

Work on the logistic regression model of Bonds's hitting.