For now, I’m going to skip question 1. You can find good information on it in Luciana’s lecture notes.
I am going to generate two 200-period random walks. Make sure to adjust this to your problem set.
random_walk = function(seed){
  set.seed(seed)
  xt = c()
  #innovations for periods 2 through 200
  eps = rnorm(200, mean = 0, sd = 1)
  #initialize the walk with a random starting value
  xt[1] = rnorm(1, mean = 0, sd = 1)
  #each period is last period's value plus a new shock
  for (i in 2:200){
    xt[i] = xt[i-1] + eps[i]
  }
  return(xt)
}
vec <- random_walk(seed = 50)
head(vec)
## [1] 1.2954407 0.4538369 0.4868349 1.0109846 -0.7166195 -0.9944841
Now we need to generate a separate 200-period random walk. Changing the seed will generate a different series.
vec2 <- random_walk(seed = 434)
head(vec2)
## [1] -0.07093969 0.62114692 -0.73021853 -0.18797141 0.18225374 1.41825715
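If you want to see what these series look like (purely optional), a quick base-R plot works:
plot(vec, type = "l", xlab = "period", ylab = "value",
     ylim = range(c(vec, vec2)))
lines(vec2, col = "red")
legend("topleft", legend = c("vec", "vec2"), col = c("black", "red"), lty = 1)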
c.) No, we shouldn’t regress one on the other! Why? Think about this a bit. But let’s run it anyway and see what goes wrong.
regression_data = data.frame(
  m = vec,
  n = vec2
)
head(regression_data)
## m n
## 1 1.2954407 -0.07093969
## 2 0.4538369 0.62114692
## 3 0.4868349 -0.73021853
## 4 1.0109846 -0.18797141
## 5 -0.7166195 0.18225374
## 6 -0.9944841 1.41825715
silly_model = lm(m ~ n, data = regression_data)
#we could also have simply run lm(vec ~ vec2)
summary(silly_model)
##
## Call:
## lm(formula = m ~ n, data = regression_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.7786 -3.6223 0.2051 3.2045 9.3988
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.94205 0.47542 -16.70 <2e-16 ***
## n 1.02232 0.04759 21.48 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.967 on 198 degrees of freedom
## Multiple R-squared: 0.6997, Adjusted R-squared: 0.6982
## F-statistic: 461.4 on 1 and 198 DF, p-value: < 2.2e-16
That’s no good! Both coefficients look wildly “significant” and the R-squared is about 0.70, even though the two series are independent by construction. This is the classic spurious regression problem with nonstationary series.
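If you’re curious just how often this happens, here’s a quick simulation sketch (my own illustration, not part of the problem set): re-run the levels regression across many independent seed pairs and count how often the slope comes out “significant” at the 5% level.
#simulation sketch: fraction of spurious "significant" slopes across seeds
n_sims = 200
pvals = numeric(n_sims)
for (s in 1:n_sims){
  a = random_walk(seed = s)
  b = random_walk(seed = s + 10000)
  pvals[s] = summary(lm(a ~ b))$coefficients["b", "Pr(>|t|)"]
}
#with truly unrelated data this should be about 0.05; it will be far higher
mean(pvals < 0.05)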
Now we need to see what happens with differences…
regression_data = data.frame(
  wt = random_walk(seed = 50),
  vt = random_walk(seed = 434)
)
#first differences; note this must be dplyr's lag(), not base stats::lag(),
#which will NOT shift the values of a plain numeric vector
regression_data_diff = data.frame(
  w_diff = regression_data$wt - dplyr::lag(regression_data$wt),
  v_diff = regression_data$vt - dplyr::lag(regression_data$vt)
)
#Alternatively, diff() does the same thing (minus the leading NA):
regression_data_diff = data.frame(
  w_diff = diff(regression_data$wt),
  v_diff = diff(regression_data$vt)
)
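A quick sanity check that the two constructions agree (assuming dplyr is installed; the lag() version just carries an extra leading NA, which lm() drops anyway):
#should return TRUE: diff() equals the lagged difference once the NA is dropped
all.equal(diff(regression_data$wt),
          (regression_data$wt - dplyr::lag(regression_data$wt))[-1])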
silly_model_2 = lm(w_diff ~ v_diff, data = regression_data_diff)
summary(silly_model_2)
##
## Call:
## lm(formula = w_diff ~ v_diff, data = regression_data_diff)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.53413 -0.68309 0.07774 0.55817 2.71468
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.10579 0.06962 -1.520 0.1302
## v_diff 0.12329 0.06918 1.782 0.0763 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9813 on 197 degrees of freedom
## Multiple R-squared: 0.01587, Adjusted R-squared: 0.01087
## F-statistic: 3.176 on 1 and 197 DF, p-value: 0.07626
Look at that! No results! Once we difference the series, we’re left with independent white noise, and the regression correctly finds no significant relationship.
Technically, this part doesn’t need R, but let me show you how R can help.
y_1 = c(21,19,7,9) #potential outcome for each person if treated
y_0 = c(11,17,3,7) #potential outcome for each person if untreated
trt = c(0,0,1,1)   #treatment indicator: 1 if treated, 0 if not
#a helper variable you might want to use
not_treated = c(1,1,0,0)
#to build a vector of ALL treatment effects...
teffect = y_1-y_0
teffect
## [1] 10 2 4 2
mean(teffect)
## [1] 4.5
To compute the naive difference in means, \[Avg(y_i|D_i = 1) - Avg(y_i|D_i = 0)\] you can do this in R. (Hint: you want to compare the SEEN outcomes when trt = 0 to the SEEN outcomes when trt = 1.)
I’ll give you the opportunity to figure out how to solve it…
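But as a nudge, here’s a minimal sketch of the first step. The key idea is that you only ever SEE y_1 for the treated and y_0 for the untreated, which is exactly what the not_treated helper is for:
#the outcomes we actually SEE: y_1 for the treated, y_0 for everyone else
y_seen = trt * y_1 + not_treated * y_0
y_seen
## [1] 11 17  7  9
From there, comparing the treated and untreated group means is one line of R, and you should find the naive comparison does NOT match the true average treatment effect of 4.5 from above.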
It would require you to see the same person do different things. You would need to visit parallel universes, where everyone does everything.
I’ll let you guys do this one on your own.
Step 1: Let’s load the data
library(pacman)
p_load(tidyverse)
#fp is the path to the problem-set data file
wgdta = read_csv(fp)
## Parsed with column specification:
## cols(
## wage = col_double(),
## education = col_double(),
## n_kids = col_double()
## )
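Before running anything, it’s worth a quick look at the data to confirm it loaded the way we expect:
head(wgdta)
summary(wgdta)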
4b.) We went over this a TON in lab. Look at the last lab for help on this one.
4c.) This one takes some thinking: how would the number of kids impact education? Would those channels affect wage as well?
We can test for relevance using a regression. Let’s do that.
areg <- lm(data = wgdta, education ~ n_kids)
summary(areg)
##
## Call:
## lm(formula = education ~ n_kids, data = wgdta)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.056 -1.056 -0.111 1.889 5.889
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.05582 0.15321 85.213 < 2e-16 ***
## n_kids -0.47242 0.09361 -5.047 6.21e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.707 on 524 degrees of freedom
## Multiple R-squared: 0.04635, Adjusted R-squared: 0.04453
## F-statistic: 25.47 on 1 and 524 DF, p-value: 6.213e-07
Looks like we at least pass the relevance sniff test: n_kids is a strongly significant predictor of education, and the first-stage F-statistic of 25.47 clears the usual rule-of-thumb threshold of 10.
But now we need to run IV (2SLS). We can do this manually with a two-stage regression, or by using iv_robust() from estimatr. Let’s load that package.
p_load(estimatr)
print(estimatr::iv_robust(data = wgdta, wage ~ education | n_kids))
## Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper
## (Intercept) 1.7122545 2.803071 0.6108494 0.5415642 -3.7943837 7.2188926
## education 0.3330363 0.222132 1.4992725 0.1344052 -0.1033422 0.7694149
## DF
## (Intercept) 524
## education 524
#or, manually:
#first stage: regress the endogenous variable (education) on the instrument
rg1 <- lm(education ~ n_kids, data = wgdta)
#grab the fitted values of rg1
res <- fitted.values(rg1)
#second stage: regress the outcome on the first-stage fitted values
rg2 <- lm(wage ~ res, data = wgdta)
summary(rg2)
##
## Call:
## lm(formula = wage ~ res, data = wgdta)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.530 -2.552 -1.248 1.093 18.920
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.7123 3.3987 0.504 0.615
## res 0.3330 0.2702 1.232 0.218
##
## Residual standard error: 3.691 on 524 degrees of freedom
## Multiple R-squared: 0.00289, Adjusted R-squared: 0.0009872
## F-statistic: 1.519 on 1 and 524 DF, p-value: 0.2184
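One caveat on the manual approach: the point estimate matches iv_robust (0.3330 in both), but the standard errors differ (0.2702 here vs. 0.2221 from iv_robust). That’s expected: the second-stage lm() doesn’t know that res was itself estimated, so its standard errors aren’t valid. Stick with iv_robust for inference. As a quick check that the coefficients line up (iv_mod is just a name I’m introducing for this check):
#the manual second-stage slope should equal the iv_robust estimate
iv_mod <- estimatr::iv_robust(data = wgdta, wage ~ education | n_kids)
c(manual = unname(coef(rg2)["res"]),
  iv_robust = unname(coef(iv_mod)["education"]))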
You guys should be able to interpret this output, but don’t forget to do that. How do I talk about a coefficient? It has an estimate and a significance level, and don’t forget “all else equal.”
4f.) This depends on what you said in 4c, but think hard on this.
Good luck guys! I hope you have an awesome Thanksgiving, and we’ll see you next week for in-class help.
Also, Connor will be gone the final week of the course, so you will have a review with Jenni instead. Those details will be passed along to you as we get closer.