Computer Problem Set 1

1.1

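library(wooldridge); library(stargazer); data("rental")
library(lmtest); library(car); library(plm)  # coeftest(), hccm(), pdata.frame()/plm()
lgrent <- lm(lrent ~ y90 + lpop + lavginc + pctstu, data = rental)
stargazer(lgrent, summary = FALSE, type = "text", colnames = FALSE, rownames = FALSE)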

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                lrent           
## -----------------------------------------------
## y90                          0.262***          
##                               (0.035)          
##                                                
## lpop                          0.041*           
##                               (0.023)          
##                                                
## lavginc                      0.571***          
##                               (0.053)          
##                                                
## pctstu                       0.005***          
##                               (0.001)          
##                                                
## Constant                      -0.569           
##                               (0.535)          
##                                                
## -----------------------------------------------
## Observations                    128            
## R2                             0.861           
## Adjusted R2                    0.857           
## Residual Std. Error      0.126 (df = 123)      
## F Statistic          190.922*** (df = 4; 123)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

The y90 dummy shows that, holding population, average income, and the student share constant, rents were on average about 26.2% higher in 1990 than in 1980 (0.262 log points; the exact change implied by the log model is exp(0.262) - 1, about 30%). This is a large change and is statistically significant at the 1% level. The coefficient on pctstu (the percentage of the population that are students) is 0.005, so a one-percentage-point increase in the student share is associated with roughly a 0.5% increase in rent. This effect is also statistically significant at the 1% level.
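Because lrent is in logs, 100 times a coefficient is only an approximate percent change; the exact change is 100 * (exp(b) - 1). A minimal base-R check, using the two coefficients copied from the table above:

```r
# Coefficients copied from the pooled OLS table above (dependent variable: lrent)
b <- c(y90 = 0.262, pctstu = 0.005)

approx_pct <- 100 * b              # usual approximation: 100 * b
exact_pct  <- 100 * (exp(b) - 1)   # exact percent change in rent

round(rbind(approx = approx_pct, exact = exact_pct), 2)
```

For a small coefficient such as pctstu the two versions agree closely; for y90 the exact figure is noticeably larger than 26.2%.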

1.2

coeftest(lgrent, vcov = hccm)

## 
## t test of coefficients:
## 
##               Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) -0.5688069  0.9502566 -0.5986    0.5506    
## y90          0.2622267  0.0648191  4.0455 9.154e-05 ***
## lpop         0.0406863  0.0243291  1.6723    0.0970 .  
## lavginc      0.5714461  0.1120493  5.1000 1.249e-06 ***
## pctstu       0.0050436  0.0012401  4.0669 8.444e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

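The pooled OLS equation omits the unobserved city effect a_i, so it ends up in the composite error v_it = a_i + u_it. Because the same a_i appears in both years' errors for a given city, the 1980 and 1990 errors are likely positively correlated, and the usual OLS standard errors are not valid. Re-estimating the standard errors with hccm leaves the coefficient estimates unchanged (only the variance estimator changes), but the standard errors increase, roughly doubling for y90 (0.035 to 0.065) and lavginc (0.053 to 0.112). Heteroskedasticity occurs when the variance of the error differs across observations. Note that hccm corrects only for heteroskedasticity; it does not account for the within-city correlation created by a_i.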
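The difference between heteroskedasticity-robust and cluster-robust standard errors can be sketched in base R. This uses simulated data (hypothetical, not the rental data set) in which the unobserved city effect a_i enters both years' errors. HC0 treats all rows as independent (hccm() defaults to HC3, which adds a leverage adjustment to this formula), while the cluster-robust version shown here (CR0, no small-sample correction) sums the score contributions within each city before squaring. Packages such as sandwich (vcovCL()) provide corrected versions; this is only an illustration of the idea.

```r
# Hypothetical simulated panel: 64 "cities" observed in two years
set.seed(1)
n_city <- 64
city <- rep(seq_len(n_city), each = 2)
y90  <- rep(c(0, 1), times = n_city)
a    <- rep(rnorm(n_city, sd = 0.3), each = 2)            # unobserved city effect a_i
x    <- rep(rnorm(n_city), each = 2) + rnorm(2 * n_city)   # regressor with a persistent city component
y    <- 1 + 0.25 * y90 + 0.5 * x + a + rnorm(2 * n_city, sd = 0.1)

fit   <- lm(y ~ y90 + x)          # pooled OLS; a_i sits in the error
X     <- model.matrix(fit)
u     <- residuals(fit)
bread <- solve(crossprod(X))      # (X'X)^{-1}

# HC0: robust to heteroskedasticity, but treats all 128 rows as independent
se_hc0 <- sqrt(diag(bread %*% crossprod(X * u) %*% bread))

# CR0: sum the scores within each city first, so the two years' errors for
# the same city (which share a_i) are allowed to be correlated
scores <- rowsum(X * u, group = city)
se_cr0 <- sqrt(diag(bread %*% crossprod(scores) %*% bread))

round(cbind(usual = sqrt(diag(vcov(fit))), HC0 = se_hc0, cluster = se_cr0), 4)
```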

1.3

#This code declares the data to be panel with city defining the units i and year defining t.
rental.p <- pdata.frame(rental, index = c("city", "year"))

#This code estimates the first-differenced equation. plm(..., model = "fd") should be able to do this directly, but it appeared to have a bug when this was written, so we difference the data by hand and then use lm.
rental.p$dlrent <- diff(rental.p$lrent)
rental.p$dlpop <- diff(rental.p$lpop)
rental.p$dlavginc <- diff(rental.p$lavginc)
rental.p$dpctstu <- diff(rental.p$pctstu)
#The differenced y90 dummy equals 1 for every city, so the intercept captures the 1980-to-1990 time effect and y90 is left out.
e1.3 <- lm(dlrent ~ dlpop + dlavginc + dpctstu, data = rental.p)

#Compare pooled OLS (1) with the first-differenced estimates (2) side by side.
stargazer(lgrent, e1.3, type = "text")

## 
## ==================================================================
##                                  Dependent variable:              
##                     ----------------------------------------------
##                              lrent                  dlrent        
##                               (1)                     (2)         
## ------------------------------------------------------------------
## y90                         0.262***                              
##                             (0.035)                               
##                                                                   
## lpop                         0.041*                               
##                             (0.023)                               
##                                                                   
## lavginc                     0.571***                              
##                             (0.053)                               
##                                                                   
## pctstu                      0.005***                              
##                             (0.001)                               
##                                                                   
## dlpop                                                0.072        
##                                                     (0.088)       
##                                                                   
## dlavginc                                           0.310***       
##                                                     (0.066)       
##                                                                   
## dpctstu                                            0.011***       
##                                                     (0.004)       
##                                                                   
## Constant                     -0.569                0.386***       
##                             (0.535)                 (0.037)       
##                                                                   
## ------------------------------------------------------------------
## Observations                  128                     64          
## R2                           0.861                   0.322        
## Adjusted R2                  0.857                   0.288        
## Residual Std. Error     0.126 (df = 123)        0.090 (df = 60)   
## F Statistic         190.922*** (df = 4; 123) 9.510*** (df = 3; 60)
## ==================================================================
## Note:                                  *p<0.1; **p<0.05; ***p<0.01

1.4

The differenced model has exactly half as many observations as the pooled OLS model (64 versus 128) because each variable was differenced over time: what is left is the 1990-minus-1980 change for each city, one observation per city. Differencing also removes the need for a separate y90 dummy. Its difference equals 1 for every city, so the time effect is absorbed by the intercept (0.386), which is why y90 does not appear in the differenced model. The dpctstu coefficient (0.011) is more than twice the pooled OLS estimate (0.005), and its standard error is also larger (0.004 versus 0.001). A one-percentage-point increase in the student share is now associated with roughly a 1.1% increase in rent, rather than the 0.5% estimated by pooled OLS.
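A toy sketch of the differencing step in base R, with three hypothetical cities and made-up rent values, shows both points: the number of observations halves, and the differenced y90 dummy is a column of ones that duplicates the intercept.

```r
# Toy two-period panel (hypothetical values): 3 cities, years 1980 and 1990
panel <- data.frame(
  city  = rep(c("A", "B", "C"), each = 2),
  year  = rep(c(1980, 1990), times = 3),
  lrent = c(5.0, 5.4, 5.2, 5.5, 4.9, 5.3),
  y90   = rep(c(0, 1), times = 3)
)
panel <- panel[order(panel$city, panel$year), ]

# Difference within each city: 1990 value minus 1980 value
d_lrent <- tapply(panel$lrent, panel$city, diff)
d_y90   <- tapply(panel$y90,   panel$city, diff)
fd <- data.frame(city = names(d_lrent),
                 dlrent = as.numeric(d_lrent),
                 dy90 = as.numeric(d_y90))

nrow(panel)   # 6 city-years
nrow(fd)      # 3 differenced observations, one per city
fd$dy90       # 1 for every city, so it is perfectly collinear with the intercept
```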

1.5

coeftest(e1.3, vcov = hccm)

## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 0.3855214  0.0555902  6.9351 3.228e-09 ***
## dlpop       0.0722456  0.0737387  0.9798 0.3311435    
## dlavginc    0.3099605  0.1019380  3.0407 0.0034963 ** 
## dpctstu     0.0112033  0.0031104  3.6019 0.0006423 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

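Using heteroskedasticity-robust standard errors, the standard error for dpctstu is lower than the usual one (0.003 compared with 0.004), so the estimate is more significant (a t statistic of about 3.6 instead of about 2.7). Serial correlation should no longer be an issue: with only two years, differencing leaves a single observation per city, so there is no time series within a city whose errors could be correlated.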

Exercise 2

2.1

data(gpa3)
#?gpa3
gpareg <- lm(trmgpa~spring+sat+hsperc+female+black+white+frstsem+tothrs+crsgpa+season, data=gpa3)
stargazer(gpareg, summary = FALSE, type = "text", colnames=FALSE, rownames=FALSE)

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                               trmgpa           
## -----------------------------------------------
## spring                        -0.058           
##                               (0.048)          
##                                                
## sat                          0.002***          
##                              (0.0001)          
##                                                
## hsperc                       -0.009***         
##                               (0.001)          
##                                                
## female                       0.350***          
##                               (0.052)          
##                                                
## black                        -0.254**          
##                               (0.123)          
##                                                
## white                         -0.023           
##                               (0.117)          
##                                                
## frstsem                       -0.035           
##                               (0.076)          
##                                                
## tothrs                        -0.0003          
##                               (0.001)          
##                                                
## crsgpa                       1.048***          
##                               (0.104)          
##                                                
## season                        -0.027           
##                               (0.049)          
##                                                
## Constant                     -1.753***         
##                               (0.348)          
##                                                
## -----------------------------------------------
## Observations                    732            
## R2                             0.478           
## Adjusted R2                    0.470           
## Residual Std. Error      0.552 (df = 721)      
## F Statistic          65.907*** (df = 10; 721)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

2.2

Football players may have systematically lower academic ability than other athletes, largely because of how football players are recruited and admitted to universities compared with athletes in other sports. If sat and hsperc do not fully capture this difference, the omitted ability gap ends up in the error term. Because football players make up most of the athletes who are in season in the fall, that part of the error is correlated with season, so pooled OLS will attribute part of football players' lower GPAs to being in season. The result is a negative (downward) bias on the season coefficient. A bias could also show up in the spring coefficient, since the fall-to-spring comparison would partly reflect the type of sport rather than the semester if sport is not held constant.
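A small simulation can illustrate the direction of this bias. Everything here is hypothetical: the football indicator, the 0.4-point ability gap, and a true in-season effect of zero are assumptions chosen for illustration, not estimates from gpa3.

```r
# Hypothetical simulated student-terms; the true in-season effect is set to zero
set.seed(7)
n        <- 4000
football <- rbinom(n, 1, 0.3)                        # hypothetical football indicator
spring   <- rbinom(n, 1, 0.5)
season   <- as.integer(football == 1 & spring == 0)  # football is in season in the fall
ability  <- -0.4 * football + rnorm(n, sd = 0.3)     # assumed lower unobserved ability
trmgpa   <- 2.8 + 0 * season + ability + rnorm(n, sd = 0.4)

short <- lm(trmgpa ~ spring + season)              # football (ability gap) omitted
long  <- lm(trmgpa ~ spring + season + football)   # football held constant

c(omitted = coef(short)[["season"]], controlled = coef(long)[["season"]])
```

With football omitted, the season coefficient comes out clearly negative even though the true in-season effect is zero; controlling for football removes the bias.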

2.3

#This code declares the data to be panel with student id defining the units i and spring (the term) defining t.
gpadiffprep <- pdata.frame(gpa3, index = c("id", "spring"))

#Estimate the first-differenced equation by hand, as in 1.3. Time-constant variables (sat, hsperc, female, black, white) drop out after differencing, and the differenced spring dummy equals 1 for everyone, so it becomes the intercept.

gpadiffprep$dtrmgpa <- diff(gpadiffprep$trmgpa)
gpadiffprep$dcrsgpa <- diff(gpadiffprep$crsgpa)
gpadiffprep$dseason <- diff(gpadiffprep$season)
gpadiffprep$dfrstsem <- diff(gpadiffprep$frstsem)
gpadiffprep$dtothrs <- diff(gpadiffprep$tothrs)


e2.3 <- lm(dtrmgpa ~ dcrsgpa + dseason + dfrstsem + dtothrs, data = gpadiffprep)

#Compare pooled OLS (1) with the first-differenced estimates (2) side by side.
stargazer(gpareg, e2.3, type = "text")

## 
## ====================================================================
##                                   Dependent variable:               
##                     ------------------------------------------------
##                              trmgpa                  dtrmgpa        
##                               (1)                      (2)          
## --------------------------------------------------------------------
## spring                       -0.058                                 
##                             (0.048)                                 
##                                                                     
## sat                         0.002***                                
##                             (0.0001)                                
##                                                                     
## hsperc                     -0.009***                                
##                             (0.001)                                 
##                                                                     
## female                      0.350***                                
##                             (0.052)                                 
##                                                                     
## black                       -0.254**                                
##                             (0.123)                                 
##                                                                     
## white                        -0.023                                 
##                             (0.117)                                 
##                                                                     
## frstsem                      -0.035                                 
##                             (0.076)                                 
##                                                                     
## tothrs                      -0.0003                                 
##                             (0.001)                                 
##                                                                     
## crsgpa                      1.048***                                
##                             (0.104)                                 
##                                                                     
## season                       -0.027                                 
##                             (0.049)                                 
##                                                                     
## dcrsgpa                                             1.136***        
##                                                      (0.119)        
##                                                                     
## dseason                                              -0.065         
##                                                      (0.043)        
##                                                                     
## dfrstsem                                              0.019         
##                                                      (0.069)        
##                                                                     
## dtothrs                                               0.012         
##                                                      (0.014)        
##                                                                     
## Constant                   -1.753***                 -0.237         
##                             (0.348)                  (0.206)        
##                                                                     
## --------------------------------------------------------------------
## Observations                  732                      366          
## R2                           0.478                    0.208         
## Adjusted R2                  0.470                    0.199         
## Residual Std. Error     0.552 (df = 721)        0.578 (df = 361)    
## F Statistic         65.907*** (df = 10; 721) 23.703*** (df = 4; 361)
## ====================================================================
## Note:                                    *p<0.1; **p<0.05; ***p<0.01

2.4

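There does not appear to be a significant in-season effect. In the differenced model the dseason coefficient is -0.065 with a standard error of 0.043 (t of about -1.5), and in the pooled OLS model the season coefficient is -0.027 with a standard error of 0.049. In both cases the standard error is too large relative to the estimate for the effect to be significant at conventional levels.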
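The t statistic and two-sided p-value behind this conclusion can be checked from the reported dseason estimate, its (non-robust) standard error, and the residual degrees of freedom of the differenced model:

```r
# dseason estimate and standard error copied from the table above
b_season  <- -0.065
se_season <- 0.043
df_resid  <- 361                                  # residual df of the differenced model

t_stat <- b_season / se_season
p_val  <- 2 * pt(-abs(t_stat), df = df_resid)     # two-sided p-value

c(t = t_stat, p = p_val)
```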

2.5

Another source of bias could be the number of courses a student takes. If an athlete takes only 12 credits while in season but 15-18 credits out of season, their GPA might not change much over the year even though they are still accommodating their sport. The difficulty of courses could be another factor, if easier general-education courses are taken while in season and higher-level courses out of season. Focusing on the number of courses: a heavier course load likely lowers term GPA, and athletes likely take fewer courses while in season, so the omitted course load is negatively correlated with season. A negative effect on GPA combined with a negative correlation with season produces a positive (upward) bias on the season coefficient.
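The sign argument is the standard omitted-variable-bias formula. Here credits stands for the omitted course load (not a variable in the regression), β_credits is its effect on term GPA, and δ₁ is the slope from regressing credits on season, holding the other regressors fixed (in the differenced model the same logic applies to the changes):

```latex
\operatorname{plim}\,\hat\beta_{\text{season}}
  = \beta_{\text{season}}
  + \underbrace{\beta_{\text{credits}}}_{<\,0}\,\underbrace{\delta_1}_{<\,0},
\qquad \text{so} \qquad
\beta_{\text{credits}}\,\delta_1 > 0 .
```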

Exercise 3

3.1

e3.1 <- plm(lrent ~ y90 + lpop + lavginc + pctstu, data = rental.p, model = "within")
summary(e3.1)

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = lrent ~ y90 + lpop + lavginc + pctstu, data = rental.p, 
##     model = "within")
## 
## Balanced Panel: n = 64, T = 2, N = 128
## 
## Residuals:
##    Min. 1st Qu.  Median 3rd Qu.    Max. 
## -0.1190 -0.0296  0.0000  0.0296  0.1190 
## 
## Coefficients:
##          Estimate Std. Error t-value  Pr(>|t|)    
## y90     0.3855214  0.0368245 10.4692 3.661e-15 ***
## lpop    0.0722456  0.0883426  0.8178  0.416714    
## lavginc 0.3099605  0.0664771  4.6627 1.788e-05 ***
## pctstu  0.0112033  0.0041319  2.7114  0.008726 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    10.383
## Residual Sum of Squares: 0.24368
## R-Squared:      0.97653
## Adj. R-Squared: 0.95032
## F-statistic: 624.146 on 4 and 60 DF, p-value: < 2.22e-16

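The coefficient estimates and their standard errors are identical to the first-difference results in 1.3: the y90 coefficient (0.386) equals the differenced model's intercept, and lpop, lavginc, and pctstu match dlpop, dlavginc, and dpctstu. This is expected, because with only two time periods the fixed-effects (within) and first-difference estimators are numerically identical.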
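The T = 2 equivalence can be confirmed on simulated data (hypothetical values, not the rental data set) using only base R: fixed effects estimated with unit dummies (LSDV) and first differences estimated on the period-2 minus period-1 changes give the same slope and standard error, and the FE time-dummy coefficient equals the FD intercept.

```r
# Hypothetical simulated two-period panel (not the rental data)
set.seed(123)
n  <- 50
id <- factor(rep(seq_len(n), each = 2))
t2 <- rep(c(0, 1), times = n)              # second-period dummy
a  <- rep(rnorm(n), each = 2)              # unit fixed effect a_i
x  <- rnorm(2 * n) + a                     # regressor correlated with a_i
y  <- 2 + 0.4 * t2 + 1.5 * x + a + rnorm(2 * n)

# Fixed effects (within) estimated with unit dummies (LSDV)
fe <- lm(y ~ t2 + x + id)

# First differences: period-2 value minus period-1 value for each unit
dy <- y[t2 == 1] - y[t2 == 0]
dx <- x[t2 == 1] - x[t2 == 0]
fd <- lm(dy ~ dx)                          # intercept plays the role of t2

se_fe <- sqrt(diag(vcov(fe)))[c("t2", "x")]
se_fd <- sqrt(diag(vcov(fd)))

cbind(FE = coef(fe)[c("t2", "x")], FD = coef(fd), SE_FE = se_fe, SE_FD = se_fd)
```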

4.1

data("jtrain")
#?jtrain

jtrainols <- lm(hrsemp~d88+d89+grant+grant_1+lemploy, data=jtrain)

#This code declares the data to be panel with firm (fcode) defining the units i and year defining t.
jtrainprep <- pdata.frame(jtrain, index = c("fcode", "year"))

#Estimate the first-differenced equation by hand, as in 1.3, and then use lm.

jtrainprep$dhrsemp <- diff(jtrainprep$hrsemp)
jtrainprep$dd88 <- diff(jtrainprep$d88)
jtrainprep$dd89 <- diff(jtrainprep$d89)
jtrainprep$dgrant <- diff(jtrainprep$grant)
jtrainprep$dgrant_1 <- diff(jtrainprep$grant_1)
jtrainprep$dlemploy <- diff(jtrainprep$lemploy)



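#With three years, dd89 is perfectly collinear with the intercept and dd88, so lm drops it (NA); that is why its row is empty in the table below.
e4.1 <- lm(dhrsemp ~ dd88 + dd89 + dgrant + dgrant_1 + dlemploy, data = jtrainprep)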

#Compare pooled OLS (1) with the first-differenced estimates (2) side by side.
stargazer(jtrainols, e4.1, type = "text")

## 
## ===================================================================
##                                   Dependent variable:              
##                     -----------------------------------------------
##                             hrsemp                  dhrsemp        
##                               (1)                     (2)          
## -------------------------------------------------------------------
## d88                         -0.201                                 
##                             (2.908)                                
##                                                                    
## d89                         6.027*                                 
##                             (3.134)                                
##                                                                    
## grant                      31.971***                               
##                             (3.381)                                
##                                                                    
## grant_1                     -3.840                                 
##                             (4.489)                                
##                                                                    
## lemploy                    -4.737***                               
##                             (1.078)                                
##                                                                    
## dd88                                               -2.712**        
##                                                     (1.325)        
##                                                                    
## dd89                                                               
##                                                                    
##                                                                    
## dgrant                                             32.601***       
##                                                     (2.968)        
##                                                                    
## dgrant_1                                             1.997         
##                                                     (5.555)        
##                                                                    
## dlemploy                                             0.744         
##                                                     (4.868)        
##                                                                    
## Constant                   25.356***                 1.972         
##                             (4.230)                 (1.605)        
##                                                                    
## -------------------------------------------------------------------
## Observations                  390                     251          
## R2                           0.257                   0.476         
## Adjusted R2                  0.248                   0.467         
## Residual Std. Error    22.300 (df = 384)       19.430 (df = 246)   
## F Statistic         26.617*** (df = 5; 384) 55.800*** (df = 4; 246)
## ===================================================================
## Note:                                   *p<0.1; **p<0.05; ***p<0.01

4.2

In the pooled OLS model, receiving a job training grant is associated with a very large increase of 31.971 hours of training per employee. The differenced model gives a similar estimate of 32.601 hours. Both estimates are significant at the 1% level, and grant has by far the largest coefficient in either model.
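One way to make "very significant" concrete is a 95% confidence interval. A sketch using the differenced-model estimate, its non-robust standard error, and the residual degrees of freedom (246) reported above:

```r
# dgrant estimate and standard error copied from the table above
b_grant  <- 32.601
se_grant <- 2.968
df_resid <- 246

crit <- qt(0.975, df = df_resid)           # two-sided 95% critical value
ci   <- b_grant + c(-1, 1) * crit * se_grant
ci
```

The interval runs from roughly 27 to 38 hours, far from zero.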

4.3

No. A firm would likely use the extra training hours within the year it receives the grant, so there is little reason to expect a lagged effect. Consistent with this, grant_1 is small and insignificant in both models (-3.840 with a standard error of 4.489 in pooled OLS; 1.997 with a standard error of 5.555 in the differenced model). If anything, training hours in the following year would likely stay about the same or fall, since the firm would not need to keep increasing the hours of training per employee.

4.4

The differenced model does not show an effect of firm size (dlemploy is 0.744 with a standard error of 4.868), but the pooled OLS model does. Because lemploy is the log of employment, the OLS coefficient of -4.737 implies that a 1% increase in the number of employees is associated with about 0.047 fewer hours of training per employee (about 0.45 fewer hours for a 10% increase). So larger firms provide their employees with less training on average, which could reflect differences in the types of jobs at larger firms.
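Because hrsemp is in levels and lemploy is a log, the coefficient has to be divided by 100 to get the effect of a 1% change. A quick base-R check using the pooled OLS estimate:

```r
b_lemploy <- -4.737   # pooled OLS coefficient on lemploy (log of employment)

# Level-log model: a 1% increase in employment changes hrsemp by about b / 100
per_1pct  <- b_lemploy / 100
per_10pct <- b_lemploy * log(1.10)   # exact change in log(employ) for a 10% increase
doubling  <- b_lemploy * log(2)      # effect of doubling employment

round(c(per_1pct = per_1pct, per_10pct = per_10pct, doubling = doubling), 3)
```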

Mary Hamman

April 18, 2018