Homework Help

For now, I’m going to skip question 1. You can find good information on it in Luciana’s lecture notes.

Question 2

I am going to generate two 200-period random walks. Make sure to adjust this to your problem set.

random_walk = function(seed){
  set.seed(seed)
  #draw all 200 shocks up front (eps[1] ends up unused, since the
  #starting value gets its own draw below)
  eps = rnorm(200, mean=0, sd= 1)
  xt = numeric(200)
  xt[1] = rnorm(1, mean = 0, sd = 1)
  #each period is last period's value plus a fresh shock
  for (i in 2:200){
    xt[i] = xt[i-1] + eps[i]
  }
  return(xt)
}

vec <- random_walk(seed = 50)
head(vec)
## [1]  1.2954407  0.4538369  0.4868349  1.0109846 -0.7166195 -0.9944841

Now we need to generate a separate 200-period random walk. Changing the seed will generate a different series.

vec2 <- random_walk(seed = 434)
head(vec2)
## [1] -0.07093969  0.62114692 -0.73021853 -0.18797141  0.18225374  1.41825715
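Before regressing, it can help to eyeball the two series. Here’s an optional base-R sketch (my addition, not part of the assignment):

#optional: plot both walks to see that they wander around independently
plot(vec, type = "l", xlab = "period", ylab = "value")
lines(vec2, lty = 2)
legend("topleft", legend = c("vec", "vec2"), lty = c(1, 2))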

c: No, we shouldn’t! Why? Think about this a bit.

  1. Now Luciana wants us to regress m on n. We called those vec and vec2.

regression_data = data.frame(
  m = vec,
  n = vec2
)

head(regression_data)
##            m           n
## 1  1.2954407 -0.07093969
## 2  0.4538369  0.62114692
## 3  0.4868349 -0.73021853
## 4  1.0109846 -0.18797141
## 5 -0.7166195  0.18225374
## 6 -0.9944841  1.41825715
silly_model = lm(m~n, data=regression_data)

#we could also have simply run lm(vec ~ vec2)

summary(silly_model)
## 
## Call:
## lm(formula = m ~ n, data = regression_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.7786  -3.6223   0.2051   3.2045   9.3988 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -7.94205    0.47542  -16.70   <2e-16 ***
## n            1.02232    0.04759   21.48   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.967 on 198 degrees of freedom
## Multiple R-squared:  0.6997, Adjusted R-squared:  0.6982 
## F-statistic: 461.4 on 1 and 198 DF,  p-value: < 2.2e-16

That’s no good! Two independent random walks look strongly related: the slope on n is “significant” and the R-squared is about 0.7. This is the classic spurious-regression problem: both series wander persistently, so the usual t-statistics are badly misleading.
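If you’re curious how often this happens, here’s an optional sketch (my addition, not part of the assignment) that reruns the regression for 100 pairs of independent walks and reports how often the slope comes out “significant” at the 5% level. For independent stationary series you’d expect roughly 5%; for independent random walks it is far higher.

#repeat the spurious regression for 100 independent seed pairs and
#record how often the slope's p-value falls below 0.05
spurious_rate = mean(sapply(1:100, function(s){
  m = random_walk(seed = s)
  n = random_walk(seed = s + 1000)
  summary(lm(m ~ n))$coefficients["n", "Pr(>|t|)"] < 0.05
}))
spurious_rate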

Now we need to see what happens with differences…

regression_data = data.frame(
  wt = random_walk(seed = 50),
  vt = random_walk(seed = 434)
)

#note: lag() here must be dplyr::lag() (loaded with the tidyverse);
#base R's stats::lag() will NOT shift a plain vector. this version
#also leaves an NA in the first row, which lm() silently drops.
regression_data_diff = data.frame(
  w_diff = regression_data$wt - lag(regression_data$wt),
  v_diff = regression_data$vt - lag(regression_data$vt)
)

#Alternatively, diff() returns the 199 first differences directly
regression_data_diff = data.frame(
  w_diff = diff(regression_data$wt),
  v_diff = diff(regression_data$vt)
)

silly_model_2 = lm(w_diff ~ v_diff, data=regression_data_diff)
summary(silly_model_2)
## 
## Call:
## lm(formula = w_diff ~ v_diff, data = regression_data_diff)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.53413 -0.68309  0.07774  0.55817  2.71468 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.10579    0.06962  -1.520   0.1302  
## v_diff       0.12329    0.06918   1.782   0.0763 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9813 on 197 degrees of freedom
## Multiple R-squared:  0.01587,    Adjusted R-squared:  0.01087 
## F-statistic: 3.176 on 1 and 197 DF,  p-value: 0.07626

Look at that! Once we difference the series, the slope is no longer significant at the 5% level, which is what we should expect for two independent processes.
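One way to see why differencing works here (an optional check, my addition): for this data-generating process, the differenced series is exactly the sequence of white-noise shocks, which is stationary by construction.

#diff(vec) should exactly recover eps[2:200] from random_walk(seed = 50),
#since rnorm(200) is the first draw after set.seed(50)
set.seed(50)
eps = rnorm(200)
all.equal(diff(vec), eps[2:200]) #should return TRUE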

Question 3

Technically, this part doesn’t need R, but let me show you how R can help.

#potential outcomes: y_1 is each person's outcome if treated, y_0 if untreated
y_1 = c(21,19,7,9)
y_0 = c(11,17,3,7)
#treatment indicator: 1 = treated, 0 = not treated
trt = c(0,0,1,1)

#a helper variable you might want to use

not_treated = c(1,1,0,0)

#to build a vector of ALL treatment effects...
teffect = y_1-y_0
teffect
## [1] 10  2  4  2
mean(teffect)
## [1] 4.5

  1. If we assume the treatment effect is constant across all individuals (\(treatment_i = treatment\)), then we can estimate the average treatment effect as:

\[Avg(y_i \mid D_i = 1) - Avg(y_i \mid D_i = 0)\] You can do this. (Hint: you want to compare the outcomes you actually SEE when trt = 1 to those when trt = 0; see the sketch after this list.)

  1. I’ll give you the opportunity to figure out how to solve it…

  2. It would require you to see the same person do different things. You would need to visit parallel universes, where everyone does everything.

  3. I’ll let you guys do this one on your own.
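Here’s the sketch promised in the hint above: one way to set up that comparison in R, using the vectors already defined. (The names observed and naive_ate are my own; rename them as you like.)

#the outcome we actually SEE: y_1 for the treated, y_0 for the untreated
observed = ifelse(trt == 1, y_1, y_0)

#naive difference in mean observed outcomes, treated minus untreated
naive_ate = mean(observed[trt == 1]) - mean(observed[trt == 0])
naive_ate

Compare this to mean(teffect) from above and think about why the two numbers differ.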

Question 4

Step 1: Let’s load the data

library(pacman)
p_load(tidyverse)
#fp should hold the path to your problem-set CSV file
wgdta = read_csv(fp)
## Parsed with column specification:
## cols(
##   wage = col_double(),
##   education = col_double(),
##   n_kids = col_double()
## )
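Optionally, take a quick look at what you loaded (output omitted, since it depends on your copy of the data):

#a quick peek at the data
head(wgdta)
summary(wgdta)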

4b.) We went over this a TON in lab. Look at the last lab for help on this one.

4c.) This one takes some thinking - how would the number of kids impact education? Would those channels affect wage as well?

We can test for relevance using a regression. Let’s do that.

areg <- lm(data = wgdta, education ~ n_kids)
summary(areg)
## 
## Call:
## lm(formula = education ~ n_kids, data = wgdta)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.056  -1.056  -0.111   1.889   5.889 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.05582    0.15321  85.213  < 2e-16 ***
## n_kids      -0.47242    0.09361  -5.047 6.21e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.707 on 524 degrees of freedom
## Multiple R-squared:  0.04635,    Adjusted R-squared:  0.04453 
## F-statistic: 25.47 on 1 and 524 DF,  p-value: 6.213e-07

It looks like we pass the relevance sniff test: n_kids is a strong predictor of education, and the first-stage F-statistic of 25.47 is comfortably above the common rule-of-thumb threshold of 10.

But now we need to run IV (2SLS). We can do this with a manual two-stage regression, or by using iv_robust() from estimatr. Let’s load that package.

p_load(estimatr)

print(estimatr::iv_robust(data = wgdta, wage ~ education | n_kids))
##              Estimate Std. Error   t value  Pr(>|t|)   CI Lower  CI Upper
## (Intercept) 1.7122545   2.803071 0.6108494 0.5415642 -3.7943837 7.2188926
## education   0.3330363   0.222132 1.4992725 0.1344052 -0.1033422 0.7694149
##              DF
## (Intercept) 524
## education   524
#or

rg1 <- lm(education ~ n_kids, data = wgdta)
#despite the name, res holds the FITTED VALUES of rg1, not its residuals
res <- fitted.values(rg1)
#second stage: regress wage on those fitted values
rg2 <- lm(wage ~ res, data = wgdta)
summary(rg2)
## 
## Call:
## lm(formula = wage ~ res, data = wgdta)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.530 -2.552 -1.248  1.093 18.920 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   1.7123     3.3987   0.504    0.615
## res           0.3330     0.2702   1.232    0.218
## 
## Residual standard error: 3.691 on 524 degrees of freedom
## Multiple R-squared:  0.00289,    Adjusted R-squared:  0.0009872 
## F-statistic: 1.519 on 1 and 524 DF,  p-value: 0.2184

Notice that the coefficient estimate matches the one from iv_robust, but the standard errors from the manual two-stage approach are not quite right; iv_robust computes corrected ones for you. You guys should be able to interpret this output, but don’t forget to do that. How do I talk about a coefficient? It has an estimate and a significance level, and don’t forget all else equal.

4f.) This depends on what you said in 4c, but think hard on this one.

Good luck, guys! I hope you have an awesome Thanksgiving, and we’ll see you next week for in-class help.

Also, Connor will be gone the final week of the course, so you will have a review with Jenni instead. Those details will be passed along to you as we get closer.

Link to main notes page