Computer Problem Set 1

Exercise 1

1.1

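library(wooldridge)  # data sets used below: mroz, fertil3, kielmc
data(mroz)
# Linear probability model estimated by OLS
linprob <- lm(inlf~nwifeinc+educ+exper+I(exper^2)+age+kidslt6+kidsge6,data=mroz)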

Model 1 Linear Probability Model

\[inlf_{i} = \beta_{0} + \beta_{1}nwifeinc_{i}+ \beta_{2}educ_{i}+ \beta_{3}exper_{i}+ \beta_{4}expersq_{i}+ \beta_{5}age_{i}+ \beta_{6}kidslt6_{i}+ \beta_{7}kidsge6_{i}+ \epsilon_{i}\]

stargazer(linprob, summary = FALSE, type = "text", colnames=FALSE, rownames=FALSE)

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                inlf            
## -----------------------------------------------
## nwifeinc                     -0.003**          
##                               (0.001)          
##                                                
## educ                         0.038***          
##                               (0.007)          
##                                                
## exper                        0.039***          
##                               (0.006)          
##                                                
## I(exper2)                    -0.001***         
##                              (0.0002)          
##                                                
## age                          -0.016***         
##                               (0.002)          
##                                                
## kidslt6                      -0.262***         
##                               (0.034)          
##                                                
## kidsge6                        0.013           
##                               (0.013)          
##                                                
## Constant                     0.586***          
##                               (0.154)          
##                                                
## -----------------------------------------------
## Observations                    753            
## R2                             0.264           
## Adjusted R2                    0.257           
## Residual Std. Error      0.427 (df = 745)      
## F Statistic           38.218*** (df = 7; 745)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

The model shows that, holding the other regressors fixed, each additional small child (under age 6) lowers a woman's probability of participating in the labor force by 26.2 percentage points. The coefficient is statistically significant at the 1% level.
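As a worked reading of that coefficient (the notation follows Model 1; the hats and the \(\Delta\) symbol are added here for illustration), the LPM implies the same change in predicted probability for one additional young child regardless of the other regressors:

\[\Delta\widehat{P}(inlf_{i} = 1 \mid x_{i}) = \hat{\beta}_{6}\,\Delta kidslt6_{i} = -0.262 \times 1 = -0.262\]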

1.2

coeftest(linprob, vcov = hccm)

## 
## t test of coefficients:
## 
##                Estimate  Std. Error t value  Pr(>|t|)    
## (Intercept)  0.58551922  0.15358032  3.8125  0.000149 ***
## nwifeinc    -0.00340517  0.00155826 -2.1852  0.029182 *  
## educ         0.03799530  0.00733982  5.1766 2.909e-07 ***
## exper        0.03949239  0.00598359  6.6001 7.800e-11 ***
## I(exper^2)  -0.00059631  0.00019895 -2.9973  0.002814 ** 
## age         -0.01609081  0.00241459 -6.6640 5.183e-11 ***
## kidslt6     -0.26181047  0.03215160 -8.1430 1.621e-15 ***
## kidsge6      0.01301223  0.01366031  0.9526  0.341123    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The standard errors change only slightly, for example for kidslt6 (0.034 to 0.032) and kidsge6 (0.013 to 0.014). This makes sense: heteroskedasticity does not cause bias or inconsistency in the OLS coefficient estimates, which are unchanged. It affects only the estimators of their variances, on which the standard errors are based.
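To make the mechanics concrete, here is a minimal sketch of a heteroskedasticity-robust (HC3) variance computed by hand. It uses simulated data rather than the mroz sample, and every variable name in it is hypothetical. HC3 is chosen because it is the default type of `hccm()` in the car package. With a binary outcome, Var(y|x) = p(x)(1 - p(x)), so an LPM is heteroskedastic by construction.

```r
# Sketch on simulated data (NOT the mroz sample): HC3 robust standard errors by hand.
set.seed(1)
n <- 500
x <- rnorm(n)
p <- pnorm(0.8 * x)                     # assumed true response probability
y <- rbinom(n, 1, p)                    # binary outcome: Var(y|x) = p(1 - p) varies with x
fit <- lm(y ~ x)                        # linear probability model

X <- model.matrix(fit)
u <- resid(fit)
h <- hatvalues(fit)                     # leverage of each observation

bread <- solve(crossprod(X))            # (X'X)^{-1}
meat  <- crossprod(X * (u / (1 - h)))   # sum of x_i x_i' * u_i^2 / (1 - h_i)^2  (HC3)
V_hc3 <- bread %*% meat %*% bread       # sandwich estimator

se_classic <- sqrt(diag(vcov(fit)))     # usual OLS standard errors
se_hc3     <- sqrt(diag(V_hc3))         # robust standard errors
print(cbind(se_classic, se_hc3))
```

The coefficient estimates from `coef(fit)` are the same in both cases. Only the second column of standard errors changes.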

1.3

xpred <- list(nwifeinc=c(100,9),educ=c(5,17),exper=c(0,30),
              age=c(20,52),kidslt6=c(2,0),kidsge6=c(0,0))
predict(linprob,xpred)

##          1          2 
## -0.4104582  1.0121619

The two predictions correspond to the two extreme cases described. The first is a woman whose characteristics make her so unlikely to be in the labor force that the predicted probability is negative. The second is a woman whose characteristics make her so likely to be in the labor force that the predicted probability exceeds 100%. Neither makes logical sense. This is one of the downsides of estimating a binary response model by OLS: at extreme values of the regressors, predictions can fall outside the [0, 1] interval.
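The two predictions can be reproduced by hand as a check. The sketch below copies the point estimates from the coeftest() output in 1.2 (robust inference leaves the estimates unchanged) and takes the covariate values from xpred. The vector names are hypothetical.

```r
# Coefficients copied from the coeftest() output in 1.2 (same point estimates as lm()).
b <- c(intercept = 0.58551922, nwifeinc = -0.00340517, educ = 0.03799530,
       exper = 0.03949239, expersq = -0.00059631, age = -0.01609081,
       kidslt6 = -0.26181047, kidsge6 = 0.01301223)

# One vector per hypothetical woman, in the same order as b (leading 1 for the intercept).
x1 <- c(1, 100,  5,  0,  0^2, 20, 2, 0)
x2 <- c(1,   9, 17, 30, 30^2, 52, 0, 0)

# Linear prediction: sum of coefficient times regressor value
pred <- c(sum(b * x1), sum(b * x2))
print(round(pred, 4))   # -0.4105  1.0122
```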

1.4

#estimate by logit
logitres<-glm(inlf~nwifeinc+educ+exper+I(exper^2)+age+kidslt6+kidsge6,
                                family=binomial(link=logit),data=mroz)
#estimate by probit
probitres<-glm(inlf~nwifeinc+educ+exper+I(exper^2)+age+kidslt6+kidsge6,
                                family=binomial(link=probit),data=mroz)
#produce predictions for LPM, logit and probit
predict(linprob,  xpred,type = "response")

##          1          2 
## -0.4104582  1.0121619

predict(logitres, xpred,type = "response")

##           1           2 
## 0.005218002 0.940103220

predict(probitres,xpred,type = "response")

##           1           2 
## 0.001065043 0.949596693

The logit and probit models are designed to keep predicted probabilities within the 0-100% range, which makes much more intuitive sense than the LPM predictions. Rather than a negative probability, the first woman now has a 0.52% (logit) and 0.11% (probit) chance of being in the labor force. At the other end of the scale, rather than a probability above 100%, the second woman has a 94% (logit) and 95% (probit) chance of being in the labor force.
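The reason both models stay inside the unit interval is that each passes the linear index through a cumulative distribution function. The sketch below illustrates this with arbitrary index values. They are hypothetical inputs, not the fitted logit or probit indices for these two women, since the coefficients differ across the three models.

```r
# Hypothetical index values x'b, including -0.41 and 1.01, which are invalid as probabilities.
index <- c(-5, -2, -0.41, 0, 1.01, 2, 5)

p_logit  <- plogis(index)   # logistic CDF: exp(z) / (1 + exp(z))
p_probit <- pnorm(index)    # standard normal CDF

print(cbind(index, p_logit, p_probit))
```

Both columns stay strictly between 0 and 1 and increase with the index, which is the property the LPM lacks.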

1.5

#summary(logitres)
#summary(probitres)

logitmfx(inlf~nwifeinc+educ+exper+I(exper^2)+age+kidslt6+kidsge6, 
                                              data=mroz, atmean=FALSE)

## Call:
## logitmfx(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + 
##     age + kidslt6 + kidsge6, data = mroz, atmean = FALSE)
## 
## Marginal Effects:
##                  dF/dx   Std. Err.       z     P>|z|    
## nwifeinc   -0.00381181  0.00153898 -2.4769  0.013255 *  
## educ        0.03949652  0.00846811  4.6641 3.099e-06 ***
## exper       0.03676411  0.00655577  5.6079 2.048e-08 ***
## I(exper^2) -0.00056326  0.00018795 -2.9968  0.002728 ** 
## age        -0.01571936  0.00293269 -5.3600 8.320e-08 ***
## kidslt6    -0.25775366  0.04263493 -6.0456 1.489e-09 ***
## kidsge6     0.01073482  0.01339130  0.8016  0.422769    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Unlike the LPM, where each marginal effect is constant, the partial effects in the logit and probit models depend on the values of all the regressors and shrink in magnitude as the predicted probability approaches 0 or 1. The partial effects therefore have to be averaged over the sample to be comparable with the LPM coefficients. The numbers are very close, with kidslt6 showing the biggest difference: -0.262 in the LPM versus -0.258 for the logit average partial effect. It makes sense that the logit value is slightly smaller in magnitude, since it is an average that includes observations where the effect has diminished. The LPM comes very close in its coefficient estimates, so it may be the better option when one just wants a quick estimate of the average effect without extra steps. Even though the extreme predicted values from the LPM didn't make sense, its coefficients are close enough to the average partial effects that the LPM works reasonably well.
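For a continuous regressor, the average partial effect in a logit is the coefficient multiplied by the sample average of the logistic density evaluated at each fitted index. The sketch below shows that calculation on simulated data, not the mroz sample, with hypothetical variable names. It illustrates the idea and does not reproduce logitmfx() internals.

```r
# Sketch on simulated data (NOT the mroz sample): logit average partial effects by hand.
set.seed(42)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.5 + 1.0 * x1 - 0.7 * x2))

fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"))
b   <- coef(fit)
xb  <- predict(fit, type = "link")   # fitted index x_i'b for each observation

scale_i <- dlogis(xb)                # logistic density g(x_i'b) = p_i * (1 - p_i)
ape     <- mean(scale_i) * b[-1]     # average partial effect of each continuous regressor
print(rbind(coefficient = b[-1], ape = ape))
```

Because the logistic density never exceeds 0.25, each average partial effect is at most a quarter of the raw logit coefficient in magnitude. This is why raw logit coefficients cannot be compared with LPM coefficients directly.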

Exercise 2

2.1

# Load data using wooldridge
data(fertil3)

# Define Yearly time series beginning in 1913
tsdata <- ts(fertil3, start=1913)

# Linear regression of model with lags:
res <- dynlm(gfr ~ pe + L(pe) + L(pe,2) + ww2 + pill, data=tsdata)
coeftest(res)

## 
## t test of coefficients:
## 
##                Estimate  Std. Error t value  Pr(>|t|)    
## (Intercept)  95.8704975   3.2819571 29.2114 < 2.2e-16 ***
## pe            0.0726718   0.1255331  0.5789    0.5647    
## L(pe)        -0.0057796   0.1556629 -0.0371    0.9705    
## L(pe, 2)      0.0338268   0.1262574  0.2679    0.7896    
## ww2         -22.1264975  10.7319716 -2.0617    0.0433 *  
## pill        -31.3049888   3.9815591 -7.8625 5.634e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# F test. H0: all pe coefficients are=0
linearHypothesis(res, matchCoefs(res,"pe"))

## Res.Df      RSS Df Sum of Sq        F    Pr(>F)
##     67 15459.75 NA        NA       NA        NA
##     64 13032.64  3  2427.104 3.972964  0.011652

# Calculating the LRP
b<-coef(res)
b["pe"]+b["L(pe)"]+b["L(pe, 2)"]

##        pe 
## 0.1007191

# F test. H0: LRP=0
linearHypothesis(res,"pe + L(pe) + L(pe, 2) = 0")

## Res.Df      RSS Df Sum of Sq        F    Pr(>F)
##     65 15358.41 NA        NA       NA        NA
##     64 13032.64  1  2325.765 11.42124 0.0012408

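The impact propensity of pe is 0.073: a $1 increase in the real personal tax exemption is associated with 0.073 more births per 1,000 women of childbearing age in the same year. The three pe coefficients are individually insignificant but jointly significant (F = 3.97, p = 0.012). The other two regressors are 0/1 dummy variables: there are 22.13 fewer births per 1,000 women during the World War II years (1941-1945) and 31.3 fewer births per 1,000 women in the years after the birth control pill became available (1963 onward). The long-run propensity of pe is 0.101: a permanent $1 increase in pe raises gfr by about 0.101 births per 1,000 women of childbearing age, and the F test rejects LRP = 0 (p = 0.001).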
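The F statistics and the long-run propensity reported above can be recomputed from the printed output as a check. The sketch below copies the residual sums of squares, the degrees of freedom and the pe coefficients from the tables. The helper `f_stat` is a hypothetical name for the standard restricted versus unrestricted F formula.

```r
# Values copied from the linearHypothesis() tables above.
rss_ur <- 13032.64; df_ur <- 64        # unrestricted model
rss_r_joint <- 15459.75                # restricted: all three pe coefficients = 0
rss_r_lrp   <- 15358.41                # restricted: pe + L(pe) + L(pe, 2) = 0

# F = [(RSS_r - RSS_ur) / q] / [RSS_ur / df_ur], with q restrictions
f_stat <- function(rss_r, rss_ur, q, df_ur) ((rss_r - rss_ur) / q) / (rss_ur / df_ur)

F_joint <- f_stat(rss_r_joint, rss_ur, q = 3, df_ur = df_ur)
F_lrp   <- f_stat(rss_r_lrp,   rss_ur, q = 1, df_ur = df_ur)

p_joint <- pf(F_joint, 3, df_ur, lower.tail = FALSE)
p_lrp   <- pf(F_lrp,   1, df_ur, lower.tail = FALSE)

# Long-run propensity: sum of the three printed pe coefficients
lrp <- 0.0726718 + (-0.0057796) + 0.0338268

print(c(F_joint = F_joint, p_joint = p_joint, F_lrp = F_lrp, p_lrp = p_lrp, lrp = lrp))
```

Small differences from the printed values in the last digits come from using rounded inputs.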

2.2

res2 <- dynlm(gfr ~ pe + L(pe) + L(pe,2) + L(pe,3) + ww2 + pill, data=tsdata)

b<-coef(res2)
b["pe"]+b["L(pe)"]+b["L(pe, 2)"]+b["L(pe, 3)"]

##        pe 
## 0.1122418

The long-run propensity is slightly higher now, at 0.112 more births per 1,000 women of childbearing age for a permanent $1 increase in pe. This is a reasonable estimate: a longer lag allows more time for women to learn about and respond to the higher pe, and since pregnancy usually lasts nine months, there is a delay in the response.
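For reference, the model behind res2 can be written as a finite distributed lag of order three. The symbols \(\alpha_{0}\), \(\delta_{j}\), \(\gamma_{j}\) and \(u_{t}\) are notation assumed here for illustration and do not appear in the original code:

\[gfr_{t} = \alpha_{0} + \delta_{0}pe_{t} + \delta_{1}pe_{t-1} + \delta_{2}pe_{t-2} + \delta_{3}pe_{t-3} + \gamma_{1}ww2_{t} + \gamma_{2}pill_{t} + u_{t}\]

\[LRP = \delta_{0} + \delta_{1} + \delta_{2} + \delta_{3}\]

The impact propensity is \(\delta_{0}\), and the long-run propensity is the cumulative change in gfr after a permanent one-unit increase in pe has worked through all of the lags.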

2.3

Two lags seem to be enough: the effect of the third lag drops off sharply, and adding a fourth lag produces a negative sign. Logically, most pregnancies last nine months and it can take a while for a woman to become pregnant, so a lag of two years seems reasonable for capturing the effects of other factors on gfr.

2.4

There are 31.3 fewer births per 1,000 women of childbearing age in the years after the birth control pill was introduced to the public (1963 onward), holding the other regressors fixed.

Exercise 3

3.1

data("kielmc")
data1981 <- kielmc %>% filter(year == 1981)

equation1981 <- lm(rprice~nearinc, data=data1981)

stargazer(equation1981, summary = FALSE, type = "text", colnames=FALSE, rownames=FALSE)

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                               rprice           
## -----------------------------------------------
## nearinc                   -30,688.270***       
##                             (5,827.709)        
##                                                
## Constant                  101,307.500***       
##                             (3,093.027)        
##                                                
## -----------------------------------------------
## Observations                    142            
## R2                             0.165           
## Adjusted R2                    0.159           
## Residual Std. Error    31,238.040 (df = 140)   
## F Statistic           27.730*** (df = 1; 140)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

The results match equation 13.4 in the book: on average, a house sold in 1981 is worth $30,688.27 less if it is located near the incinerator.

3.2

data1978 <- kielmc %>% filter(year == 1978)
equation1978 <- lm(rprice~nearinc, data=data1978)
stargazer(equation1978, summary = FALSE, type = "text", colnames=FALSE, rownames=FALSE)

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                               rprice           
## -----------------------------------------------
## nearinc                   -18,824.370***       
##                             (4,744.594)        
##                                                
## Constant                   82,517.230***       
##                             (2,653.790)        
##                                                
## -----------------------------------------------
## Observations                    179            
## R2                             0.082           
## Adjusted R2                    0.076           
## Residual Std. Error    29,431.960 (df = 177)   
## F Statistic           15.741*** (df = 1; 177)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

Finding data from a year before the one I had looked at might have let me see what my coefficient was before a state law was put in place. In my case, I could have seen what e-cigarette usage was before a state adopted a smoking ban and compared that with usage after the ban was implemented. Rather than comparing states with and without smoking bans, I could see directly the impact of the ban on one state's e-cigarette usage, which may be a better indicator of whether smoking bans affect e-cigarette usage than a cross-state comparison.
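The same before-and-after logic can be applied to the two incinerator regressions in 3.1 and 3.2. The sketch below differences the two nearinc coefficients, copied from the stargazer tables above. The variable names are hypothetical.

```r
# nearinc coefficients copied from the 1981 and 1978 regressions above
b_1981 <- -30688.27   # price gap near the incinerator in 1981
b_1978 <- -18824.37   # price gap near the same site in 1978

# Difference-in-differences: change in the price gap between the two years
dd <- b_1981 - b_1978
print(dd)   # -11863.9
```

Houses near the site were already cheaper in 1978, so only the change in the gap, about $11,864, is a candidate for the effect of the incinerator itself.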

3.3

Continue reading Example 13.2 through equation 13.8. Estimate equation 13.8.

logprice <- lm(log(rprice)~y81 + nearinc + y81 * nearinc, data=kielmc)
stargazer(logprice, summary = FALSE, type = "text", colnames=FALSE, rownames=FALSE)

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                             log(rprice)        
## -----------------------------------------------
## y81                          0.193***          
##                               (0.045)          
##                                                
## nearinc                      -0.340***         
##                               (0.055)          
##                                                
## y81:nearinc                   -0.063           
##                               (0.083)          
##                                                
## Constant                     11.285***         
##                               (0.031)          
##                                                
## -----------------------------------------------
## Observations                    321            
## R2                             0.246           
## Adjusted R2                    0.239           
## Residual Std. Error      0.338 (df = 317)      
## F Statistic           34.470*** (df = 3; 317)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

From this output, find the entries for each cell of the difference in differences table like Table 13.3 in your text. Fill them into the code below that defines each column of the table.

# Replace the words in caps in the code below with the values, then run the code to produce a DD table
col1 <- c(" ", "Control", "Treatment", "Treatment-Control")
col2 <- c("Before", "11.285" , "10.945" , "-0.340" )
col3 <- c("After", "11.478" , "11.075" , "-0.403" )
col4 <- c("After-Before", "0.193" , "0.130" , "-0.063")
ddtable <- data.frame(col1, col2, col3, col4)
library(stargazer)
stargazer(ddtable, summary = FALSE, type = "text", colnames=FALSE, rownames=FALSE)
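Each cell of the table is a sum of coefficients from the log(rprice) regression, so the cells can be derived rather than typed in. The sketch below does this from the rounded coefficients printed above. The object names are hypothetical, and the cells inherit the three-decimal rounding of the inputs.

```r
# Rounded coefficients copied from the log(rprice) regression output
b0 <- 11.285; b_y81 <- 0.193; b_near <- -0.340; b_int <- -0.063

cells <- matrix(
  c(b0,          b0 + b_y81,                  b_y81,
    b0 + b_near, b0 + b_y81 + b_near + b_int, b_y81 + b_int,
    b_near,      b_near + b_int,              b_int),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Control", "Treatment", "Treatment-Control"),
                  c("Before", "After", "After-Before")))
print(round(cells, 3))

# Approximate percentage effect implied by the interaction term in a log model
pct_effect <- 100 * (exp(b_int) - 1)
print(round(pct_effect, 1))   # -6.1
```

The bottom-right cell is the difference-in-differences estimate, the y81:nearinc coefficient, which is not statistically significant in the output above.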

Mary Hamman

April 18, 2018
