Homework 1

Question 1

a)

Since Yit = β0 + β1Xit + γ1D1i + γ2D2i + γ3D3i + uit, each Di denotes a dummy equal to 1 for one specific case. Because the number of dummy variables equals the number of possible outcomes, the dummies sum to 1 for every observation, which is exactly the constant regressor X0,it = 1, so we have perfect multicollinearity:

D1i + D2i + D3i = X0,it = 1

b)

The result is the same as in (a) whenever the number of dummies equals the number of possible outcomes. The only way to avoid perfect multicollinearity is to drop one dummy, which becomes the base category: its parameter cannot be expressed as a combination of the others, so the regression can be computed.

D1i + D2i + · · · + DNi = X0,it = 1

c)

When there is perfect multicollinearity, OLS cannot be computed: the matrix XᵀX is not invertible, so the estimator

β̂ ∼ N(β, σ²(XᵀX)⁻¹)

does not exist. The full set of dummies carries no extra information (each one is implied by the others), so with D categories we should include only D − 1 dummies.
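
A minimal sketch with made-up data (not part of the assignment) illustrating the trap in R: with an intercept plus a full set of category dummies, the design matrix is rank deficient, XᵀX cannot be inverted, and lm() reports NA for the redundant column.

set.seed(1)
cat3 <- sample(c("a", "b", "c"), 100, replace = TRUE)  # three exhaustive categories
D1 <- as.numeric(cat3 == "a")
D2 <- as.numeric(cat3 == "b")
D3 <- as.numeric(cat3 == "c")                          # D1 + D2 + D3 = 1 for every observation
y  <- 1 + 2 * D1 + 3 * D2 + rnorm(100)

X <- cbind(1, D1, D2, D3)
qr(crossprod(X))$rank       # rank 3 < 4 columns: X'X is singular
coef(lm(y ~ D1 + D2 + D3))  # lm() returns NA for the redundant dummy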

Question 2

No. Everybody's age would change by the same number, 2, so the variable becomes a constant: it is perfectly collinear with the intercept, and its effect cannot be separately identified.
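
A small sketch of the point with made-up numbers: in a two-wave panel where every person is exactly 2 years older in the second wave, the differenced age variable is the same constant for everyone.

age_wave1 <- c(25, 40, 31, 58)
age_wave2 <- age_wave1 + 2   # everyone ages exactly 2 years between waves
age_wave2 - age_wave1        # the differenced regressor is 2 for every person: a constant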

Question 3

a)

Yi = β0 + β1Xi + γ1D1i + ui

Yi = 1 if arrested, 0 otherwise

Xi = 1 if the person had an open container, 0 otherwise (β1 is its coefficient)

D1i = 1 if the law is in place, 0 otherwise

b)

It would be important to control for many other factors, such as age, driving experience, and past criminal record, as well as more complex variables like the place where the person was arrested, since some areas (e.g., rural ones) are more likely to see these arrests than others. A sketch of the extended specification follows below.

We would like to control for these factors because otherwise we may suffer from omitted variable bias.
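
A minimal sketch of how the model from (a) plus the controls from (b) could be estimated; the data frame df and all variable names here are hypothetical, since the question provides no dataset.

# arrest, open_container, law, prior_record, rural are hypothetical 0/1 variables
fit.base <- lm(arrest ~ open_container + law, data = df)
fit.ctrl <- lm(arrest ~ open_container + law + age + experience + prior_record + rural, data = df)
summary(fit.ctrl)  # compare with fit.base to gauge the omitted variable bias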

Question 4

a)

hrsempit = β0 + δ1d88t + δ2d89t + β1grantit + β2granti,t−1 + β3log(employit) + ai + uit

library(wooldridge)
library(plm) 
library(lmtest)                 # for hypothesis tests with robust standard errors
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
data("jtrain")
str(jtrain)
## 'data.frame':    471 obs. of  30 variables:
##  $ year    : int  1987 1988 1989 1987 1988 1989 1987 1988 1989 1987 ...
##  $ fcode   : num  410032 410032 410032 410440 410440 ...
##  $ employ  : int  100 131 123 12 13 14 20 25 24 200 ...
##  $ sales   : num  47000000 43000000 49000000 1560000 1970000 ...
##  $ avgsal  : num  35000 37000 39000 10500 11000 ...
##  $ scrap   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rework  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ tothrs  : int  12 8 8 12 12 10 50 50 50 0 ...
##  $ union   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ grant   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ d89     : int  0 0 1 0 0 1 0 0 1 0 ...
##  $ d88     : int  0 1 0 0 1 0 0 1 0 0 ...
##  $ totrain : int  100 50 50 12 13 14 15 10 20 0 ...
##  $ hrsemp  : num  12 3.05 3.25 12 12 ...
##  $ lscrap  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ lemploy : num  4.61 4.88 4.81 2.48 2.56 ...
##  $ lsales  : num  17.7 17.6 17.7 14.3 14.5 ...
##  $ lrework : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ lhrsemp : num  2.56 1.4 1.45 2.56 2.56 ...
##  $ lscrap_1: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ grant_1 : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ clscrap : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ cgrant  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ clemploy: num  NA 0.27 -0.063 NA 0.08 ...
##  $ clsales : num  NA -0.0889 0.1306 NA 0.2333 ...
##  $ lavgsal : num  10.46 10.52 10.57 9.26 9.31 ...
##  $ clavgsal: num  NA 0.0556 0.0526 NA 0.0465 ...
##  $ cgrant_1: int  NA 0 0 NA 0 0 NA 0 0 NA ...
##  $ chrsemp : num  NA -8.947 0.199 NA 0 ...
##  $ clhrsemp: num  NA -1.1654 0.0478 NA 0 ...
##  - attr(*, "time.stamp")= chr "25 Jun 2011 23:03"
model <- hrsemp ~ d88 + d89 + grant + grant_1 + log(employ)

fit.1 <- plm(model, data = jtrain, index=c("fcode", "year"), model="within")  # fixed effects regression
coeftest(fit.1, vcovHC(fit.1, cluster="group", type="HC0"))
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value Pr(>|t|)    
## d88         -1.09868    1.24130 -0.8851   0.3770    
## d89          4.09005    2.78202  1.4702   0.1428    
## grant       34.22818    3.72128  9.1980   <2e-16 ***
## grant_1      0.50408    3.14295  0.1604   0.8727    
## log(employ) -0.17626    4.51164 -0.0391   0.9689    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The number of firms used was 157.

With 157 firms observed over 3 years, the panel has 157 × 3 = 471 firm-year observations; counting the 5 variables in the model as well gives 2,355 data points.

The count could be smaller if d88 and d89 were recorded only in their own years, but that is not the case here.

length(unique(jtrain$fcode))          # number of firms
## [1] 157
length(unique(jtrain$fcode)) * 3      # firm-year observations
## [1] 471
length(unique(jtrain$fcode)) * 5 * 3  # data points across the 5 variables
## [1] 2355

b)

The coefficient on grant is significant. It measures whether the firm received a job training grant in the current year; since it is positive, it shows that firms that received grants increased hrsemp, by about 34 hours of training per employee.

c)

I would have expected it to be significant with a small coefficient, but its insignificance also makes sense: the grant money received a year ago has already been spent, so it would not affect training today.

d)

From the regression I conclude that larger firms provide less training: log(employ) has a negative coefficient. To answer the 10% question we multiply the coefficient by the change in logs: −0.176 × log(1.10) ≈ −0.017 hours of training, a negligible difference (and the coefficient is not significant anyway).
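
A quick check of that arithmetic, using the coefficient from the output above:

b3 <- coef(fit.1)["log(employ)"]  # -0.176
b3 * log(1.10)                    # ≈ -0.017 hours of training for a 10% larger firm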

Question 5

a)

β1 should be negative: if the execution of a murderer deters others, then more executions should mean fewer murders. β2 should be positive: as unemployment grows, people tend to fight more, so there are more murders.

b)
data("murder")
str(murder)
## 'data.frame':    153 obs. of  13 variables:
##  $ id     : int  1 1 1 2 2 2 3 3 3 4 ...
##  $ state  : chr  "AL" "AL" "AL" "AK" ...
##  $ year   : int  87 90 93 87 90 93 87 90 93 87 ...
##  $ mrdrte : num  9.3 11.6 11.6 10.1 7.5 ...
##  $ exec   : int  2 5 2 0 0 0 0 0 3 0 ...
##  $ unem   : num  7.8 6.8 7.5 10.8 6.9 ...
##  $ d90    : int  0 1 0 0 1 0 0 1 0 0 ...
##  $ d93    : int  0 0 1 0 0 1 0 0 1 0 ...
##  $ cmrdrte: num  NA 2.3 0 NA -2.6 ...
##  $ cexec  : int  NA 3 -3 NA 0 0 NA 0 3 NA ...
##  $ cunem  : num  NA -1 0.7 NA -3.9 ...
##  $ cexec_1: int  NA NA 3 NA NA 0 NA NA 0 NA ...
##  $ cunem_1: num  NA NA -1 NA NA ...
##  - attr(*, "time.stamp")= chr "25 Jun 2011 23:03"
model.m <- mrdrte ~ exec + unem

murder_90 <- subset(murder, d90==1)
murder_93 <- subset(murder, d93==1)

fit.m <- plm(model.m, data = murder_90, index=c("id", "year"), effect="time", model="pooling")  # pooled OLS on the 1990 cross-section
summary(fit.m)
## Pooling Model
## 
## Call:
## plm(formula = model.m, data = murder_90, effect = "time", model = "pooling", 
##     index = c("id", "year"))
## 
## Balanced Panel: n = 51, T = 1, N = 51
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -10.17150  -3.17727  -1.90289   0.89743  66.44284 
## 
## Coefficients:
##             Estimate Std. Error t-value Pr(>|t|)  
## (Intercept) -6.16910    7.26073 -0.8497  0.39974  
## exec         0.35579    0.68514  0.5193  0.60595  
## unem         2.65549    1.34120  1.9799  0.05346 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    5655.8
## Residual Sum of Squares: 5142.6
## R-Squared:      0.090744
## Adj. R-Squared: 0.052858
## F-statistic: 2.3952 on 2 and 48 DF, p-value: 0.10197
coeftest(fit.m, vcovHC(fit.m, cluster="group", type="HC0")) # robust standard errors (part d)
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.16910    7.01472 -0.8795   0.3835
## exec         0.35579    0.44067  0.8074   0.4234
## unem         2.65549    1.60265  1.6569   0.1041
fit.m <- plm(model.m, data = murder_93, index=c("id", "year"), effect="twoways", model="pooling")  # pooled OLS on the 1993 cross-section
summary(fit.m)
## Pooling Model
## 
## Call:
## plm(formula = model.m, data = murder_93, effect = "twoways", 
##     model = "pooling", index = c("id", "year"))
## 
## Balanced Panel: n = 51, T = 1, N = 51
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -12.62232  -3.14800  -0.79467   1.39000  64.53409 
## 
## Coefficients:
##              Estimate Std. Error t-value Pr(>|t|)  
## (Intercept) -6.568645   6.314987 -1.0402  0.30347  
## exec         0.084923   0.287752  0.2951  0.76917  
## unem         2.415830   0.980977  2.4627  0.01743 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    5743.3
## Residual Sum of Squares: 5081.4
## R-Squared:      0.11525
## Adj. R-Squared: 0.078382
## F-statistic: 3.1262 on 2 and 48 DF, p-value: 0.052934
coeftest(fit.m, vcovHC(fit.m, cluster="group", type="HC0")) # robust standard errors (part d)
## 
## t test of coefficients:
## 
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.568645   7.693923 -0.8537   0.3975
## exec         0.084923   0.114641  0.7408   0.4624
## unem         2.415830   1.454009  1.6615   0.1031

Using only the cross-sectional data, β1 is positive for both years. That is not what we expect.

c)

murder_90_93 <- subset(murder, d93==1 | d90 ==1)

fit.m <- plm(model.m, data = murder_90_93, index=c("id", "year"), effect="twoways", model="within")  # fixed effects regression
coeftest(fit.m, vcovHC(fit.m, cluster="group", type="HC0")) 
## 
## t test of coefficients:
## 
##       Estimate Std. Error t value  Pr(>|t|)    
## exec -0.103840   0.016492 -6.2964 8.826e-08 ***
## unem -0.066591   0.142543 -0.4672    0.6425    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Yes, we see the deterrent effect, though it is not strong. However, β2 has the wrong sign.

d)

The code for this part (robust standard errors) was already executed in part (b), in the coeftest calls.

e)

# State with the most executions in 1993 and the runner-up
murder_93[order(-murder_93$exec), c("state", "exec")][1:2, ]

The highest is Texas with 34 executions and the next is Virginia with 11. That is 23 more executions.

f)

no_tx <- subset(murder, (d93==1 | d90==1) & state != "TX")  # parentheses needed so Texas is dropped from both years

fit.m <- plm(model.m, data = no_tx, index=c("id", "year"), effect="twoways", model="within")  # fixed effects regression
coeftest(fit.m, vcovHC(fit.m, cluster="group", type="HC0")) 
## 
## t test of coefficients:
## 
##       Estimate Std. Error t value Pr(>|t|)
## exec -0.067471   0.076690 -0.8798   0.3834
## unem -0.070032   0.141755 -0.4940   0.6236

The β1 on executions is much smaller and not significant. Texas is an outlier and pulls the regression in its direction.

fit.m <- plm(model.m, data = murder, index=c("id", "year"), effect="twoways", model="within")  # fixed effects regression
coeftest(fit.m, vcovHC(fit.m, cluster="group", type="HC0"))
## 
## t test of coefficients:
## 
##       Estimate Std. Error t value Pr(>|t|)  
## exec -0.138323   0.078723 -1.7571  0.08203 .
## unem  0.221316   0.366288  0.6042  0.54710  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this case we have a similar situation as before: β1 is negative and not significant at the 5% level (p ≈ 0.08), and unemployment is positive but also not significant.
