Find the average participation rate and the average match rate in the sample of plans.
setwd("C:/Users/19135/OneDrive/Documents/RStudio/ECON 526 Lab #2") # set the working directory (adjust this path for your machine)
library(readxl)
dataq1=read_excel("401k.xls")
#summary(dataq1)
Avg_prate=mean(dataq1$prate)
Avg_mrate=mean(dataq1$mrate)
cat("Average value of prate is", Avg_prate, "and the average value of mrate is", Avg_mrate)
## Average value of prate is 87.36291 and the average value of mrate is 0.7315124
lm.fit1=lm(prate~mrate, data=dataq1)
summary(lm.fit1)
##
## Call:
## lm(formula = prate ~ mrate, data = dataq1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -82.303 -8.184 5.178 12.712 16.807
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 83.0755 0.5633 147.48 <2e-16 ***
## mrate 5.8611 0.5270 11.12 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.09 on 1532 degrees of freedom
## Multiple R-squared: 0.0747, Adjusted R-squared: 0.0741
## F-statistic: 123.7 on 1 and 1532 DF, p-value: < 2.2e-16
nrow(dataq1)
## [1] 1534
Find the predicted prate when mrate = 3.5. Is this a reasonable prediction?
\[\widehat{prate} = 83.0755 + 5.8611\, mrate\]
predict(lm.fit1, data.frame(mrate=c(3.5)))
## 1
## 103.5892
#test=data.frame(mrate=c(3.5, 2.5))
sum(dataq1$mrate>=3.5)
## [1] 34
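One way to judge the prediction is to rebuild it from the reported coefficients and compare it with the values a participation rate can take. The sketch below hard-codes the rounded estimates from the regression output and assumes that prate is measured in percent, so it cannot exceed 100. The variable names are illustrative only.

```r
# Rounded OLS estimates copied from the summary(lm.fit1) output
b0 <- 83.0755   # intercept
b1 <- 5.8611    # slope on mrate

# Fitted participation rate at a match rate of 3.5
prate_hat <- b0 + b1 * 3.5
prate_hat

# Assumption: prate is a percentage, so values above 100 are impossible
exceeds_bound <- prate_hat > 100
exceeds_bound

# Share of plans with mrate >= 3.5 (34 of 1534, from the counts above)
share_high_mrate <- 34 / 1534
share_high_mrate
```

The fitted value lies above 100, which a percentage cannot reach, and only about 2% of the plans have a match rate this high, so the linear fit is being stretched to the edge of the data.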
dataq2=read_excel("sleep75-1.xls")
lm.fit2=lm(sleep~totwrk, data=dataq2)
summary(lm.fit2)
##
## Call:
## lm(formula = sleep ~ totwrk, data = dataq2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2429.94 -240.25 4.91 250.53 1339.72
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3586.37695 38.91243 92.165 <2e-16 ***
## totwrk -0.15075 0.01674 -9.005 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 421.1 on 704 degrees of freedom
## Multiple R-squared: 0.1033, Adjusted R-squared: 0.102
## F-statistic: 81.09 on 1 and 704 DF, p-value: < 2.2e-16
If totwrk increases by 2 hours, by how much is sleep estimated to fall? Do you find this to be a large effect?
\[\widehat{sleep} = 3586.37695 - 0.15075\, totwrk\]
# totwrk is recorded in minutes, so a 2-hour increase is 2*60 = 120 minutes
predict(lm.fit2, data.frame(totwrk=c(2*60)))-predict(lm.fit2, data.frame(totwrk=c(0)))
## 1
## -18.0895
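Because the model is linear, the difference between the two predictions is simply the slope times the change in totwrk. The sketch below redoes that arithmetic with the rounded coefficients from the output. It assumes totwrk is recorded in minutes, which is consistent with the `2*60` used in the call above.

```r
# Rounded slope copied from the summary(lm.fit2) output
b1 <- -0.15075

# Assumption: totwrk is in minutes, so 2 hours = 120 minutes
delta_totwrk <- 2 * 60

# In a linear model the change in the fitted value is slope * change in x
delta_sleep <- b1 * delta_totwrk
delta_sleep   # about -18.09

# The same change relative to the intercept (fitted sleep at totwrk = 0)
relative_change <- delta_sleep / 3586.37695
relative_change
```

The estimated fall is roughly 18 units of sleep, which is about half a percent of the intercept of 3586.38, so the effect looks small next to the baseline level.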
Write down a model (not an estimated equation) that implies a constant elasticity between rd and sales. Which parameter is the elasticity?
\[\log(rd) = \beta_0 + \beta_1 \log(sales) + u\]
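To see why \(\beta_1\) is the elasticity, hold \(u\) fixed and differentiate the model with respect to \(\log(sales)\). The result is a ratio of proportional changes, which is the definition of an elasticity:
\[\beta_1 = \frac{\partial \log(rd)}{\partial \log(sales)} = \frac{\partial rd / rd}{\partial sales / sales} \approx \frac{\%\Delta rd}{\%\Delta sales}\]
Since \(\beta_1\) does not depend on the level of sales, the elasticity is constant.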
dataq3=read_excel("rdchem-1.xls")
lm.fit3=lm(log(rd)~log(sales), data=dataq3)
summary(lm.fit3)
##
## Call:
## lm(formula = log(rd) ~ log(sales), data = dataq3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.90406 -0.40086 -0.02178 0.40562 1.10438
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.10472 0.45277 -9.066 4.27e-10 ***
## log(sales) 1.07573 0.06183 17.399 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5294 on 30 degrees of freedom
## Multiple R-squared: 0.9098, Adjusted R-squared: 0.9068
## F-statistic: 302.7 on 1 and 30 DF, p-value: < 2.2e-16
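The coefficient on log(sales) can be read as an elasticity. The sketch below hard-codes the rounded estimate from the output and applies it to a hypothetical 10% rise in sales; that scenario is an assumption chosen only for illustration. It compares the usual approximation with the exact change implied by the log-log form.

```r
# Rounded elasticity estimate copied from the summary(lm.fit3) output
b1 <- 1.07573

# Hypothetical scenario: sales rise by 10%
sales_ratio <- 1.10

# Approximate percentage change in rd: elasticity * percentage change in sales
approx_pct <- b1 * 10

# Exact percentage change implied by the log-log form
exact_pct <- 100 * (sales_ratio^b1 - 1)

c(approx = approx_pct, exact = exact_pct)
```

Both figures come out slightly above 10%, in line with an estimated elasticity a little greater than one.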
#rm(list=ls()) #clean the environment
dataq4=read_excel("countymurders-1.xls") #this is the panel data
#test=data.frame(dataq4$countyid,dataq4$year)
data1996=dataq4[dataq4$year==1996,] #this is the cross-sectional data for year=1996
How many counties had zero murders in 1996? How many counties had at least one execution? What is the largest number of executions?
zero_murders=sum(data1996$murders==0)
print(zero_murders)
## [1] 1051
one_exec=sum(data1996$execs>=1)
one_exec
## [1] 31
max_exec=max(data1996$execs)
max_exec
## [1] 3
lm.fit4=lm(murders~execs, data=data1996)
summary(lm.fit4)
##
## Call:
## lm(formula = murders ~ execs, data = data1996)
##
## Residuals:
## Min 1Q Median 3Q Max
## -149.12 -5.46 -4.46 -2.46 1338.99
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.4572 0.8348 6.537 7.79e-11 ***
## execs 58.5555 5.8333 10.038 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 38.89 on 2195 degrees of freedom
## Multiple R-squared: 0.04389, Adjusted R-squared: 0.04346
## F-statistic: 100.8 on 1 and 2195 DF, p-value: < 2.2e-16
What is the smallest number of murders that can be predicted by the equation? What is the residual for a county with zero executions and zero murders?
predicted_value=predict(lm.fit4, data.frame(execs=c(0)))
actual=0
residual=actual-predicted_value
residual
## 1
## -5.457241
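Both answers can be checked by hand from the reported coefficients. The sketch below hard-codes the rounded estimates and evaluates the fitted line over the observed range of execs, which runs from 0 to the sample maximum of 3. The variable names are illustrative only.

```r
# Rounded estimates copied from the summary(lm.fit4) output
b0 <- 5.4572    # intercept
b1 <- 58.5555   # slope on execs

# execs is a count; the observed values run from 0 to 3 (the sample maximum)
execs_grid <- 0:3
murders_hat <- b0 + b1 * execs_grid

# With a positive slope the smallest fitted value occurs at execs = 0
min(murders_hat)

# Residual = actual - fitted for a county with zero executions and zero murders
resid_zero <- 0 - murders_hat[1]
resid_zero
```

Because the slope is positive and execs cannot be negative, the smallest possible prediction is the intercept, about 5.46 murders. The equation therefore over-predicts for every county with zero executions and zero murders, which matters given that 1051 counties reported zero murders in 1996.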