Question 2

Ed wants us to generate a 50-period random walk. Here is the outline:

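random_walk_1 = function(x){
  set.seed(1234)                       # fix the seed so the walk is reproducible
  vt=c()
  eps = rnorm(x, mean=0, sd= 1)        # shocks (eps[1] is never used)
  vt[1] = rnorm(1, mean = 0, sd = 1)   # starting value
  for (i in 2:x){                      # loop to x rather than a hard-coded 50
    vt[i] = vt[i-1] + eps[i]
  }
  return(vt)
}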

vec <- random_walk_1(x = 50)
head(vec)
## [1] -1.8060313 -1.5286020 -0.4441608 -2.7898585 -2.3607339 -1.8546780
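Since a random walk is just a running sum of shocks, the loop can also be written with `cumsum()`. The following is a minimal sketch, not Ed's outline: the helper name `random_walk_cumsum` and its `seed` argument are made up for illustration. It draws the shocks and the starting value in the same order as the function above, so it should reproduce the same series.

```r
# Hypothetical helper (not part of the lab): same walk, built with cumsum().
random_walk_cumsum <- function(x, seed) {
  set.seed(seed)
  eps   <- rnorm(x, mean = 0, sd = 1)   # shocks; eps[1] is dropped, as in the loop version
  start <- rnorm(1, mean = 0, sd = 1)   # starting value, drawn after the shocks
  cumsum(c(start, eps[-1]))             # v[t] = v[t-1] + eps[t]
}

vec_cumsum <- random_walk_cumsum(50, seed = 1234)
head(vec_cumsum)
```

The vectorised form avoids growing a vector inside a loop, which matters little at 50 periods but adds up for long series.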

Now we need to generate a separate 50-period random walk. Changing the seed generates a different series.

random_walk_2 = function(x){
  set.seed(456)
  w=c()
  eps = rnorm(x, mean=0, sd= 1)
  w[1] = rnorm(1,mean = 0, sd = 1)
  for (t in 2:x){
    w[t] = w[t-1]+eps[t]
  }
  return(w)
}
df1 <- random_walk_2(50)
head(df1)
## [1] -0.66495083 -0.04317528  0.75769939 -0.63119303 -1.34554989 -1.66961094

c: no. We shouldn’t! Why?

  1. Now Ed wants us to regress eta on gamma (in the code below, wt on vt)
regression_data = data.frame(
  wt = random_walk_2(50),
  vt = random_walk_1(50)
)

regression_data
##             wt          vt
## 1  -0.66495083  -1.8060313
## 2  -0.04317528  -1.5286020
## 3   0.75769939  -0.4441608
## 4  -0.63119303  -2.7898585
## 5  -1.34554989  -2.3607339
## 6  -1.66961094  -1.8546780
## 7  -0.97896794  -2.4294179
## 8  -0.72842004  -2.9760498
## 9   0.27893223  -3.5405018
## 10  0.85216691  -4.4305396
## 11 -0.06364361  -4.9077323
## 12  1.24745384  -5.9061187
## 13  2.23618016  -6.6823726
## 14  3.89010884  -6.6179138
## 15  2.44930361  -5.6584198
## 16  4.39666004  -5.7687053
## 17  6.13359622  -6.2797148
## 18  6.52107955  -7.1909102
## 19  8.80111352  -8.0280819
## 20 10.33899683  -5.6122467
## 21  9.86439302  -5.4781585
## 22  8.14708422  -5.9688444
## 23  6.72025391  -6.4093922
## 24  6.92848983  -5.9498028
## 25  6.89265365  -6.6435230
## 26  8.02693822  -8.0917280
## 27  7.56408325  -7.5169722
## 28  7.23569922  -8.5406280
## 29  8.72023869  -8.5557663
## 30  7.63086079  -9.4917149
## 31  7.10206658  -8.3894173
## 32  6.50827372  -8.8650104
## 33  4.50935807  -9.5744504
## 34  4.80551120 -10.0757085
## 35  4.97613645 -11.7048020
## 36  6.79178877 -12.8724212
## 37  6.13118567 -15.0524609
## 38  5.99093372 -16.3934541
## 39  5.56695462 -16.6877479
## 40  5.52821884 -17.1536455
## 41  5.49927692 -15.7041492
## 42  5.89231430 -16.7727919
## 43  5.64270038 -17.6281565
## 44  5.72615059 -17.9087796
## 45  7.80502521 -18.9031196
## 46  7.92587701 -19.8716339
## 47  8.04402645 -20.9789521
## 48  8.81408067 -22.2309380
## 49  7.63867826 -22.7547661
## 50  8.04771682 -23.2516161
silly_model = lm(regression_data$wt~regression_data$vt, data=regression_data)
summary(silly_model)
## 
## Call:
## lm(formula = regression_data$wt ~ regression_data$vt, data = regression_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.2385 -1.7885 -0.4711  2.4668  6.6124 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         1.99739    0.71937   2.777  0.00781 ** 
## regression_data$vt -0.30812    0.06247  -4.933 1.01e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.78 on 48 degrees of freedom
## Multiple R-squared:  0.3364, Adjusted R-squared:  0.3226 
## F-statistic: 24.33 on 1 and 48 DF,  p-value: 1.014e-05

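That’s no good! The two walks are independent by construction, yet the slope comes out highly significant (p ≈ 1e-05) with an R-squared of 0.34. This is a spurious regression.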
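To check that this is not a fluke of the two seeds, here is a small Monte Carlo sketch. The seed, the number of simulations, and the variable names are arbitrary choices for illustration, not part of the assignment. It regresses one independent random walk on another many times and records how often the slope looks significant at the 5% level.

```r
set.seed(2024)      # arbitrary seed, chosen only for reproducibility
n_sims    <- 500    # number of simulated pairs (arbitrary)
n_periods <- 50     # same length as the walks above

p_values <- replicate(n_sims, {
  w <- cumsum(rnorm(n_periods))   # independent random walk 1
  v <- cumsum(rnorm(n_periods))   # independent random walk 2
  summary(lm(w ~ v))$coefficients["v", "Pr(>|t|)"]   # p-value on the slope
})

# Share of simulations where the slope is "significant" at 5%
rejection_rate <- mean(p_values < 0.05)
rejection_rate
```

If the test behaved properly, this share would be close to 0.05. With random walks in levels it should come out far higher.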

Now we need to see what happens with differences…

regression_data = data.frame(
  wt = random_walk_2(50),
  vt= random_walk_1(50)
)

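# dplyr::lag shifts the values; base stats::lag only moves the time index of a plain vector
regression_data_diff = data.frame(
  w_diff = regression_data$wt - dplyr::lag(regression_data$wt),
  v_diff = regression_data$vt - dplyr::lag(regression_data$vt)
)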

# Alternatively, diff() from base R gives the same differences without a leading NA

regression_data_diff = data.frame(
  w_diff = diff(regression_data$wt),
  v_diff= diff(regression_data$vt)
)

silly_model_2 = lm(w_diff ~ v_diff, data=regression_data_diff)
summary(silly_model_2)
## 
## Call:
## lm(formula = w_diff ~ v_diff, data = regression_data_diff)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.16507 -0.68408 -0.03094  0.62717  2.11936 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.19658    0.16846   1.167    0.249
## v_diff       0.04289    0.17166   0.250    0.804
## 
## Residual standard error: 1.055 on 47 degrees of freedom
## Multiple R-squared:  0.001326,   Adjusted R-squared:  -0.01992 
## F-statistic: 0.06242 on 1 and 47 DF,  p-value: 0.8038

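Look at that! After differencing, the slope is nowhere near significant (p = 0.804) and the R-squared is essentially zero, which is what we should expect from two unrelated series.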
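The same kind of simulation can be run on first differences. Again, this is only a sketch with an arbitrary seed and made-up variable names. It differences two independent random walks before regressing and records how often the slope is significant at the 5% level.

```r
set.seed(99)             # arbitrary seed, chosen only for reproducibility
n_sims_diff    <- 500    # number of simulated pairs (arbitrary)
n_periods_diff <- 50     # same length as the walks above

diff_p_values <- replicate(n_sims_diff, {
  dw <- diff(cumsum(rnorm(n_periods_diff)))   # first difference of walk 1
  dv <- diff(cumsum(rnorm(n_periods_diff)))   # first difference of walk 2
  summary(lm(dw ~ dv))$coefficients["dv", "Pr(>|t|)"]   # p-value on the slope
})

# Share of simulations where the slope is "significant" at 5%
diff_rejection_rate <- mean(diff_p_values < 0.05)
diff_rejection_rate
```

Differencing a random walk leaves just the white-noise shocks, so the rejection share should sit near the nominal 0.05.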

Question 3

y_1 = c(25,15,11,13)
y_0 = c(17,11,3,9)

#to build a vector of ALL treatment effects...
teffect = y_1-y_0
teffect
## [1] 8 4 8 4
mean(teffect)
## [1] 6
  1. If we assume the treatment effect is constant across all individuals (\(treatment_i = treatment\)) then we can estimate the average treatment effect as:

\[Avg(y_i | D_i = 1) - Avg(y_i | D_i = 0)\]

You can do this.
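As a rough illustration of how the difference in group averages is computed, here is a sketch using the potential outcomes from above. The assignment vector `D` is hypothetical and is NOT taken from the problem set, so swap in the assignment you are actually given.

```r
y_1 <- c(25, 15, 11, 13)   # potential outcomes with treatment (from above)
y_0 <- c(17, 11, 3, 9)     # potential outcomes without treatment (from above)

# Hypothetical treatment assignment, made up for this sketch
D <- c(1, 0, 1, 0)

# In real data we only observe one potential outcome per person
y_obs <- ifelse(D == 1, y_1, y_0)

# Avg(y_i | D_i = 1) - Avg(y_i | D_i = 0)
diff_in_means <- mean(y_obs[D == 1]) - mean(y_obs[D == 0])
true_ate      <- mean(y_1 - y_0)

diff_in_means   # 8
true_ate        # 6
```

Under this made-up assignment the estimate (8) differs from the true average effect (6) because the two treated people happen to be the ones with the larger individual effects, so the constant-effect assumption does not hold in these data.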

  1. I’ll give you the opportunity to figure out how to solve it…

  2. It would require you to see the same person both with and without the treatment. You would need to visit parallel universes, where everyone does everything.

  3. I’ll let you guys do this one on your own.

Question 4

I think you guys can solve this one. If you need help, look at the last lab or Ed’s last lecture.