Lab 2

Q1-(i)

Find the average participation rate and the average match rate in the sample of plans.

setwd("C:/Users/19135/OneDrive/Documents/RStudio/ECON 526 Lab #2") #set up your working directory

library(readxl)
dataq1=read_excel("401k.xls")
#summary(dataq1)

Avg_prate=mean(dataq1$prate)
Avg_mrate=mean(dataq1$mrate)

cat("Average value of prate is", Avg_prate, "and the average value of mrate is", Avg_mrate, "\n")
## Average value of prate is 87.36291 and the average value of mrate is 0.7315124

(ii-iii)

lm.fit1=lm(prate~mrate, data=dataq1)
summary(lm.fit1)
## 
## Call:
## lm(formula = prate ~ mrate, data = dataq1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -82.303  -8.184   5.178  12.712  16.807 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  83.0755     0.5633  147.48   <2e-16 ***
## mrate         5.8611     0.5270   11.12   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.09 on 1532 degrees of freedom
## Multiple R-squared:  0.0747, Adjusted R-squared:  0.0741 
## F-statistic: 123.7 on 1 and 1532 DF,  p-value: < 2.2e-16
nrow(dataq1)
## [1] 1534
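
For reporting the estimates, sample size, and R-squared without reading them off the printed summary, the pieces can be pulled from the fitted object. This is a minimal sketch on simulated data: the `toy` data frame and all of its values are hypothetical and do not come from 401k.xls.

```r
# Hypothetical toy data standing in for 401k.xls (simulated values, not lab data)
set.seed(1)
toy <- data.frame(mrate = runif(50, min = 0, max = 4))
toy$prate <- 80 + 5 * toy$mrate + rnorm(50, sd = 10)

# Same model form as lm.fit1, fitted to the toy data
fit <- lm(prate ~ mrate, data = toy)

est <- coef(fit)                 # named vector: (Intercept), mrate
n   <- nobs(fit)                 # number of observations used in the fit
r2  <- summary(fit)$r.squared    # R-squared as a plain number

print(est)
print(n)
print(r2)
```

With the lab's own objects, the same three calls would be applied to lm.fit1 in place of fit.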

(iv)

Find the predicted prate when mrate = 3.5. Is this a reasonable prediction?
\[\widehat{prate}=83.0755+5.8611 mrate\]

predict(lm.fit1, data.frame(mrate=c(3.5)))
##        1 
## 103.5892
#test=data.frame(mrate=c(3.5, 2.5))
sum(dataq1$mrate>=3.5)
## [1] 34
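
As a check on the prediction, the fitted value can be worked out by hand from the reported coefficients. Reading prate as a percentage, so that it cannot exceed 100, is an assumption based on its sample mean of 87.36:

\[\widehat{prate}=83.0755+5.8611(3.5)=83.0755+20.5139\approx 103.59\]

A participation rate above 100 is impossible, so under that reading the prediction is not reasonable. Only 34 of the 1534 plans (about 2.2%) have mrate of at least 3.5, so the linear fit is being pushed to the edge of the data.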

Q2-(i)

dataq2=read_excel("sleep75-1.xls")

lm.fit2=lm(sleep~totwrk, data=dataq2)
summary(lm.fit2)
## 
## Call:
## lm(formula = sleep ~ totwrk, data = dataq2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2429.94  -240.25     4.91   250.53  1339.72 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3586.37695   38.91243  92.165   <2e-16 ***
## totwrk        -0.15075    0.01674  -9.005   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 421.1 on 704 degrees of freedom
## Multiple R-squared:  0.1033, Adjusted R-squared:  0.102 
## F-statistic: 81.09 on 1 and 704 DF,  p-value: < 2.2e-16
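
For the intercept, a rough unit conversion may help. It assumes sleep and totwrk are both measured in minutes per week: the 2*60 conversion used in part (ii) suggests minutes, while the weekly basis is an assumption not stated in this lab.

\[\frac{3586.37695}{60}\approx 59.77 \text{ hours per week},\qquad \frac{59.77}{7}\approx 8.54 \text{ hours per night}\]

Under that assumption, the intercept is the predicted sleep for someone who does not work at all (totwrk = 0).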

(ii)

If totwrk increases by 2 hours, by how much is sleep estimated to fall? Do you find this to be a large effect?
\[\widehat{sleep}=3586.37695-0.15075 totwrk\]

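#totwrk is in minutes, so 2 hours = 2*60 = 120; the difference in predictions isolates the slope effect
predict(lm.fit2, data.frame(totwrk=c(2*60)))-predict(lm.fit2, data.frame(totwrk=c(0)))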
##        1 
## -18.0895
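
The same figure follows directly from the slope, because the intercept cancels when the two predictions are subtracted. The sketch below assumes totwrk and sleep are in minutes per week (the weekly basis is not stated in this lab):

\[\Delta\widehat{sleep}=-0.15075\times 120\approx -18.09\]

Spread over seven nights, that is roughly 2.6 minutes less sleep per night, which looks like a small effect.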

Q3-(i)

Write down a model (not an estimated equation) that implies a constant elasticity between rd and sales. Which parameter is the elasticity?

\[\log(rd)=\beta_0+\beta_1 \log(sales)+u\]

The elasticity of rd with respect to sales is \(\beta_1\).

dataq3=read_excel("rdchem-1.xls")

lm.fit3=lm(log(rd)~log(sales), data=dataq3)
summary(lm.fit3)
## 
## Call:
## lm(formula = log(rd) ~ log(sales), data = dataq3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90406 -0.40086 -0.02178  0.40562  1.10438 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.10472    0.45277  -9.066 4.27e-10 ***
## log(sales)   1.07573    0.06183  17.399  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5294 on 30 degrees of freedom
## Multiple R-squared:  0.9098, Adjusted R-squared:  0.9068 
## F-statistic: 302.7 on 1 and 30 DF,  p-value: < 2.2e-16
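
The regression output can be written as an estimated equation, with n = 32 inferred from the 30 residual degrees of freedom plus the two estimated parameters. The derivative step is the standard reason a log-log slope is read as an elasticity:

\[\widehat{\log(rd)}=-4.10472+1.07573\,\log(sales),\qquad n=32,\quad R^2=0.9098\]

\[\frac{\partial \log(rd)}{\partial \log(sales)}=\beta_1,\qquad \hat{\beta}_1=1.07573\]

So a 1% increase in sales is associated with roughly a 1.08% increase in rd.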

Q4

#rm(list=ls()) #clean the environment

dataq4=read_excel("countymurders-1.xls") #this is the panel data

#test=data.frame(dataq4$countyid,dataq4$year)

data1996=dataq4[dataq4$year==1996,] #this is the cross-sectional data for year=1996

(i)

How many counties had zero murders in 1996? How many counties had at least one execution? What is the largest number of executions?

zero_murders=sum(data1996$murders==0)
print(zero_murders)
## [1] 1051
one_exec=sum(data1996$execs>=1)
one_exec
## [1] 31
max_exec=max(data1996$execs)
max_exec
## [1] 3

(ii-iii)

lm.fit4=lm(murders~execs, data=data1996)
summary(lm.fit4)
## 
## Call:
## lm(formula = murders ~ execs, data = data1996)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -149.12   -5.46   -4.46   -2.46 1338.99 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.4572     0.8348   6.537 7.79e-11 ***
## execs        58.5555     5.8333  10.038  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 38.89 on 2195 degrees of freedom
## Multiple R-squared:  0.04389,    Adjusted R-squared:  0.04346 
## F-statistic: 100.8 on 1 and 2195 DF,  p-value: < 2.2e-16
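
Written as an estimated equation, with n = 2197 inferred from the 2195 residual degrees of freedom plus the two estimated parameters, the output reads:

\[\widehat{murders}=5.4572+58.5555\,execs,\qquad n=2197,\quad R^2=0.0439\]

The slope says one more execution goes with about 58.6 more predicted murders, which is the opposite of a deterrent effect. One plausible reading, offered here as an assumption rather than something the regression shows, is that executions occur in counties that already have many murders, so the slope should not be taken as causal.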

(iv)

What is the smallest number of murders that can be predicted by the equation? What is the residual for a county with zero executions and zero murders?

predicted_value=predict(lm.fit4, data.frame(execs=c(0)))
actual=0
residual=actual-predicted_value
residual
##         1 
## -5.457241
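
Both answers can be written out from the fitted equation. Because the slope is positive and execs cannot be negative, the smallest prediction occurs at execs = 0:

\[\widehat{murders}\big|_{execs=0}=5.4572,\qquad \hat{u}=murders-\widehat{murders}=0-5.4572=-5.4572\]

So the equation never predicts fewer than about 5.46 murders, even though 1051 counties had none in 1996.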