Friday 26 April

Survey processing.

Example of survey processing commands here and the raw template.

Introduction to Causal Reasoning

Myopia

It turns out that the parents' myopia is the predisposing factor: shortsighted parents are more likely to leave the lights on at night, and their kids are shortsighted for genetic reasons. So night lighting and childhood myopia go together without one causing the other. A CNN news report is here. And here is the abstract of the rebuttal.
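A small simulation can make the confounding concrete. This is only a sketch: the probabilities are made up for illustration and do not come from the studies mentioned, and `lm` on a 0/1 outcome is used as a simple linear probability model. The night light has no effect on the child in the simulation, yet the unadjusted model suggests one.

```r
# Hypothetical simulation; all probabilities below are invented for illustration.
set.seed(1)
n <- 20000
parent_myopic <- rbinom(n, 1, 0.3)
# Assumption: myopic parents are more likely to leave a night light on.
night_light <- rbinom(n, 1, ifelse(parent_myopic == 1, 0.7, 0.3))
# Child myopia depends only on the parent (genetics), not on the light.
child_myopic <- rbinom(n, 1, ifelse(parent_myopic == 1, 0.5, 0.1))

# Unadjusted: the night light appears to raise the risk of myopia.
light_naive <- coef(lm(child_myopic ~ night_light))[["night_light"]]
# Adjusted: with the parent's myopia in the model, the apparent effect should vanish.
light_adjusted <- coef(lm(child_myopic ~ night_light + parent_myopic))[["night_light"]]
c(naive = light_naive, adjusted = light_adjusted)
```

The naive coefficient should be clearly positive and the adjusted one close to zero, which is the pattern a confounder produces.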

Confounders as a diagram

Funky tikz

SAT versus per-student spending

Draw the causal diagram

Funky tikz

Spending  ---------->      SAT    <-------|
      |                                   |
Focus on Educ. ---> fraction taking SAT  -|
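The diagram above can be turned into a toy simulation. This is a sketch under invented assumptions: the coefficients, the number of "states" and the variable names (`focus`, `fraction`, `sat`) are hypothetical, chosen only so that spending truly helps while the fraction of students taking the SAT lowers the average score.

```r
# Hypothetical state-level simulation; every coefficient is made up.
set.seed(2)
n <- 2000                                   # pretend "states" (far more than 50, for stability)
focus <- rnorm(n)                           # unobserved focus on education
spending <- 6 + focus + rnorm(n, sd = 0.5)  # per-student spending
fraction <- plogis(1.5 * focus + rnorm(n, sd = 0.5))  # share of students taking the SAT
# True effect of spending on SAT is +10; a larger share of test takers lowers the average.
sat <- 1000 + 10 * spending - 200 * fraction + rnorm(n, sd = 20)

sat_naive <- coef(lm(sat ~ spending))[["spending"]]
sat_adjusted <- coef(lm(sat ~ spending + fraction))[["spending"]]
c(naive = sat_naive, adjusted = sat_adjusted)
```

Under these assumptions the naive coefficient comes out negative, while including the fraction taking the SAT recovers a positive coefficient near the +10 built into the simulation.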

Campaign Spending and the back-door network

Research in political science shows that higher campaign spending is associated with a lower vote for the incumbent. Yet common sense says that higher spending helps the candidate; that's why candidates do it.

  Polls <-----    Popularity ---> vote outcome
    |                                 ^
    v                                 |
  Spending ---------------------------

How to block a back-door pathway?

We've done it by including the covariate in the model. But this is too crude an answer.

Example: Election Spending

fetchData("simulate.r")
## Retrieving from http://www.mosaic-web.org/go/datasets/simulate.r
## [1] TRUE
campaign.spending
## Causal Network with  4  vars:  popularity, polls, spending, vote 
## ===============================================
## popularity is exogenous
## polls <== popularity 
## spending <== polls 
## vote <== popularity & spending
equations(campaign.spending)
## popularity <== runif(nsamps, min=15,max=85) 
## polls <== popularity + rnorm(nsamps,sd=3) 
## spending <== 100 - polls + rnorm(nsamps,sd=10) 
## vote <== 0.75*popularity + 0.25*spending + rnorm(nsamps,sd=5)
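If `simulate.r` cannot be fetched, the four equations printed above can be written directly in base R. This is a minimal sketch, not the course software: `sim_campaign` and `d2` are hypothetical names introduced here, and the seed is arbitrary.

```r
# Base-R stand-in for the printed equations; sim_campaign is a hypothetical helper.
sim_campaign <- function(nsamps) {
  popularity <- runif(nsamps, min = 15, max = 85)
  polls <- popularity + rnorm(nsamps, sd = 3)
  spending <- 100 - polls + rnorm(nsamps, sd = 10)
  vote <- 0.75 * popularity + 0.25 * spending + rnorm(nsamps, sd = 5)
  data.frame(popularity, polls, spending, vote)
}

set.seed(26)          # arbitrary seed, for reproducibility only
d2 <- sim_campaign(435)
head(d2)
```

Because the random draws differ, numbers obtained from `d2` will not match the printed output exactly, but they should show the same pattern.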

You can see from the equations how spending is related to vote: with popularity held fixed, each unit of spending raises vote by 0.25. Let's look at what a statistical model has to say:

d = run.sim(campaign.spending, 435)  # number of congressmen
summary(lm(vote ~ spending, data = d))
## 
## Call:
## lm(formula = vote ~ spending, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -26.501  -5.679   0.634   5.800  22.371 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  65.8763     0.9874    66.7   <2e-16 ***
## spending     -0.3179     0.0175   -18.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 8.51 on 433 degrees of freedom
## Multiple R-squared: 0.432,   Adjusted R-squared: 0.431 
## F-statistic:  330 on 1 and 433 DF,  p-value: <2e-16

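The problem: a back-door pathway from spending to vote via popularity (Spending <--- Polls <--- Popularity ---> vote), which is why the spending coefficient comes out negative. Block it by including a node on the pathway, here polls, as a covariate.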

summary(lm(vote ~ spending + polls, data = d))
## 
## Call:
## lm(formula = vote ~ spending + polls, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.504  -3.582   0.091   3.271  15.348 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.3417     2.5714   -0.91     0.36    
## spending      0.2826     0.0244   11.56   <2e-16 ***
## polls         0.7558     0.0277   27.28   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 5.17 on 432 degrees of freedom
## Multiple R-squared: 0.792,   Adjusted R-squared: 0.791 
## F-statistic:  820 on 2 and 432 DF,  p-value: <2e-16
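With polls in the model, the spending coefficient in the output above is 0.28, close to the 0.25 in the vote equation. The sketch below checks this against the known truth using a much larger sample than 435, so that sampling noise is small. It re-creates the printed equations in base R rather than using `run.sim`; the sample size, seed and the names `b_naive`, `b_polls` and `b_pop` are assumptions made for illustration.

```r
# Re-create the printed equations in base R with a large, arbitrary sample size.
set.seed(435)
n <- 10000
popularity <- runif(n, min = 15, max = 85)
polls <- popularity + rnorm(n, sd = 3)
spending <- 100 - polls + rnorm(n, sd = 10)
vote <- 0.75 * popularity + 0.25 * spending + rnorm(n, sd = 5)

# Back door open: spending alone.
b_naive <- coef(lm(vote ~ spending))[["spending"]]
# Back door blocked at polls, a node on the pathway.
b_polls <- coef(lm(vote ~ spending + polls))[["spending"]]
# Back door blocked at popularity, the other node on the pathway.
b_pop <- coef(lm(vote ~ spending + popularity))[["spending"]]
round(c(naive = b_naive, with_polls = b_polls, with_popularity = b_pop), 3)
```

Either node on the pathway should do the job here: spending depends on popularity only through polls, so conditioning on polls or on popularity leaves a spending coefficient near 0.25, while the naive model stays negative.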