library(wooldridge)
data("k401k")
head(k401k)
## prate mrate totpart totelg age totemp sole ltotemp
## 1 26.1 0.21 1653 6322 8 8709 0 9.072112
## 2 100.0 1.42 262 262 6 315 1 5.752573
## 3 97.6 0.91 166 170 10 275 1 5.616771
## 4 100.0 0.42 257 257 7 500 0 6.214608
## 5 82.5 0.53 591 716 28 933 1 6.838405
## 6 100.0 1.82 92 92 7 143 1 4.962845
Explained var
• prate: participation rate, in percent. This is the percentage of eligible workers with an active account, and it is the variable we would like to explain.
Explanatory var
• mrate: 401k plan match rate, a measure of plan generosity. It gives the average amount the firm contributes to each worker’s plan for each 1 dollar contributed by the worker. If mrate = 0.50, a 1 dollar contribution by the worker is matched by a 50¢ contribution by the firm. We expect a higher match rate to go with a higher participation rate.
• totpart: total 401k participants
• totelg: total eligible for 401k plan
• age: age of 401k plan
• totemp: total number of firm employees
• sole: = 1 if 401k is firm’s sole plan
• ltotemp: log of totemp
#C1(i) mean values
library(table1)
##
## Attaching package: 'table1'
## The following objects are masked from 'package:base':
##
## units, units<-
table1(~prate + mrate, data=k401k)
| | Overall (N=1534) |
|---|---|
| prate | |
| Mean (SD) | 87.4 (16.7) |
| Median [Min, Max] | 95.7 [3.00, 100] |
| mrate | |
| Mean (SD) | 0.732 (0.780) |
| Median [Min, Max] | 0.460 [0.0100, 4.91] |
#C1(ii) regression
mod1 <- lm(prate ~ mrate, data=k401k)
summary(mod1)
##
## Call:
## lm(formula = prate ~ mrate, data = k401k)
##
## Residuals:
## Min 1Q Median 3Q Max
## -82.303 -8.184 5.178 12.712 16.807
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 83.0755 0.5633 147.48 <2e-16 ***
## mrate 5.8611 0.5270 11.12 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.09 on 1532 degrees of freedom
## Multiple R-squared: 0.0747, Adjusted R-squared: 0.0741
## F-statistic: 123.7 on 1 and 1532 DF, p-value: < 2.2e-16
R-squared = 0.0747 (about 7.5%), obs = 1534
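#C1(iii) interpretation
Intercept = 83.0755 (0.5633); coefficient on mrate = 5.8611 (0.5270), with standard errors in parentheses. The intercept is the predicted participation rate when mrate = 0, about 83.1%. The slope implies that a 1 dollar increase in the match rate is associated with a participation rate about 5.86 percentage points higher.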
#C1(iv) predicted
The predicted value of prate when mrate = 3.5 is 83.0755 + 5.8611 × 3.5 = 103.59, which is impossible because prate is a percentage and cannot exceed 100.
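The arithmetic can be checked directly. A minimal sketch, assuming the coefficients are copied by hand from the summary(mod1) output (the names b0, b1 and pred are illustrative and not part of the original analysis):

```r
# Coefficients copied from summary(mod1); hard-coded so the check runs on its own
b0 <- 83.0755   # intercept
b1 <- 5.8611    # slope on mrate

# Predicted participation rate at a match rate of 3.5
pred <- b0 + b1 * 3.5
pred

# prate is a percentage, so a prediction above 100 is outside the feasible range
pred > 100
```

With the fitted model still in memory, predict(mod1, newdata = data.frame(mrate = 3.5)) should give the same number up to rounding.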
#C1(v) fitness
mrate explains only about 7.5% of the variation in prate. This is very low, which implies that many other factors influence participation.
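In a regression with a single explanatory variable, R-squared can be recovered from the F-statistic and the residual degrees of freedom, which gives a quick consistency check on the printed output. A sketch, assuming the values are copied by hand from the summary (the variable names are illustrative):

```r
# Values copied from the summary(mod1) output
f_stat <- 123.7   # F-statistic on 1 and 1532 DF
df_res <- 1532    # residual degrees of freedom

# With one regressor, F = R2 / ((1 - R2) / df_res), so R2 = F / (F + df_res)
r2 <- f_stat / (f_stat + df_res)
round(r2, 4)
```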
#C2(i) mean values
The variable salary is annual compensation, in thousands of dollars, and ceoten is the number of prior years as company CEO.
data("ceosal2")
table1(~salary + ceoten, data=ceosal2)
| | Overall (N=177) |
|---|---|
| salary | |
| Mean (SD) | 866 (588) |
| Median [Min, Max] | 707 [100, 5300] |
| ceoten | |
| Mean (SD) | 7.95 (7.15) |
| Median [Min, Max] | 6.00 [0, 37.0] |
#C2(ii) term of ceo
ten0 <- subset(ceosal2, ceoten == 0)  # CEOs with no prior years as CEO (first year)
max(ceosal2$ceoten)
## [1] 37
Five CEOs are in their first year as CEO (ceoten = 0). The longest tenure is 37 years.
#C2(iii) regression
logsalary <- log(ceosal2$salary)
mod2 <- lm(logsalary ~ ceoten, data = ceosal2)
summary(mod2)
##
## Call:
## lm(formula = logsalary ~ ceoten, data = ceosal2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.15314 -0.38319 -0.02251 0.44439 1.94337
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.505498 0.067991 95.682 <2e-16 ***
## ceoten 0.009724 0.006364 1.528 0.128
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6038 on 175 degrees of freedom
## Multiple R-squared: 0.01316, Adjusted R-squared: 0.007523
## F-statistic: 2.334 on 1 and 175 DF, p-value: 0.1284
Intercept | 6.505498 (0.067991)
Coefficient | 0.009724 (0.006364)
obs = 177
R-squared: 0.01316 (very low)
If ceoten increases by one year, salary is expected to rise by about 0.97% (100 × 0.009724), roughly 1%.
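Because the dependent variable is log(salary), the percentage effect is usually read off as 100 times the slope; the exact figure uses the exponential. A short sketch comparing the two, assuming the slope is copied by hand from summary(mod2) (names are illustrative):

```r
# Slope on ceoten copied from summary(mod2)
b1 <- 0.009724

# Usual log-level approximation: percent change in salary per extra year
approx_pct <- 100 * b1

# Exact percent change implied by the log model
exact_pct <- 100 * (exp(b1) - 1)

c(approx = approx_pct, exact = exact_pct)
```

For a slope this small the two numbers are nearly identical, so the approximation is adequate here.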
#C3(i) regression
We ask whether there is a tradeoff between the time spent sleeping per week and the time spent in paid work. sleep is minutes spent sleeping at night per week and totwrk is total minutes worked during the week.
data("sleep75")
mod3 <- lm(sleep ~ totwrk, data = sleep75)
summary(mod3)
##
## Call:
## lm(formula = sleep ~ totwrk, data = sleep75)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2429.94 -240.25 4.91 250.53 1339.72
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3586.37695 38.91243 92.165 <2e-16 ***
## totwrk -0.15075 0.01674 -9.005 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 421.1 on 704 degrees of freedom
## Multiple R-squared: 0.1033, Adjusted R-squared: 0.102
## F-statistic: 81.09 on 1 and 704 DF, p-value: < 2.2e-16
Intercept | 3586.37695 (38.91243)
Coefficient | -0.15075 (0.01674)
obs = 706
R-squared: 0.1033 (very low)
The intercept means that a person who does not work is predicted to sleep about 3586 minutes per week, or about 8.54 hours per night.
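The conversion from minutes per week to hours per night is a two-step division. A minimal sketch, assuming the intercept is copied by hand from summary(mod3) (names are illustrative):

```r
# Intercept copied from summary(mod3): predicted weekly sleep (minutes) when totwrk = 0
b0 <- 3586.37695

hours_per_week  <- b0 / 60             # minutes -> hours
hours_per_night <- hours_per_week / 7  # week -> night

round(hours_per_night, 2)
```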
#C3(ii) analysis
If totwrk increases by 2 hours (120 minutes), sleep is estimated to fall by about 0.15 × 120 = 18 minutes per week. This effect is not that large.
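The same calculation with the unrounded slope, as a small sketch; the coefficient is copied by hand from summary(mod3) and the variable names are illustrative:

```r
# Slope on totwrk copied from summary(mod3): minutes of sleep per extra minute worked
b1 <- -0.15075

extra_work_min <- 2 * 60                # two additional hours of work per week, in minutes
sleep_change   <- b1 * extra_work_min   # predicted change in weekly sleep, in minutes

sleep_change
```

The result is negative because more work is associated with less sleep; its magnitude is the estimated weekly reduction in minutes.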
#C5(ii) regression
data("rdchem")
mod4 <- lm(lrd ~ lsales, data=rdchem)
summary(mod4)
##
## Call:
## lm(formula = lrd ~ lsales, data = rdchem)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.90406 -0.40086 -0.02178 0.40562 1.10439
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.10472 0.45277 -9.066 4.27e-10 ***
## lsales 1.07573 0.06183 17.399 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5294 on 30 degrees of freedom
## Multiple R-squared: 0.9098, Adjusted R-squared: 0.9068
## F-statistic: 302.7 on 1 and 30 DF, p-value: < 2.2e-16
Intercept | -4.10472 (0.45277)
Coefficient | 1.07573 (0.06183)
obs = 32
R-squared: 0.9098
Because both variables are in logs, the slope is an elasticity: a 1% increase in annual sales revenue is associated with an increase of about 1.08% in annual research and development expenditure.
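The elasticity reading is an approximation that works best for small changes. A sketch comparing it with the exact implied change for a hypothetical 10% rise in sales, assuming the slope is copied by hand from summary(mod4) (the 10% figure and the variable names are illustrative choices, not from the original):

```r
# Elasticity of R&D spending with respect to sales, copied from summary(mod4)
b1 <- 1.07573

pct_sales <- 10   # hypothetical percentage increase in sales

# Approximate percent change in R&D: elasticity times percent change in sales
approx_pct <- b1 * pct_sales

# Exact percent change implied by the log-log model
exact_pct <- 100 * ((1 + pct_sales / 100)^b1 - 1)

c(approx = approx_pct, exact = exact_pct)
```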