C1

library(wooldridge)
data("k401k")
head(k401k)
##   prate mrate totpart totelg age totemp sole  ltotemp
## 1  26.1  0.21    1653   6322   8   8709    0 9.072112
## 2 100.0  1.42     262    262   6    315    1 5.752573
## 3  97.6  0.91     166    170  10    275    1 5.616771
## 4 100.0  0.42     257    257   7    500    0 6.214608
## 5  82.5  0.53     591    716  28    933    1 6.838405
## 6 100.0  1.82      92     92   7    143    1 4.962845

Explained variable:
• prate: participation rate, in percent. prate is the percentage of eligible workers with an active 401k account; this is the variable we would like to explain.

Explanatory variable:
• mrate: 401k plan match rate. mrate measures the plan's generosity: it gives the average amount the firm contributes to each worker's plan for each 1 dollar of contribution by the worker. If mrate = 0.50, a 1 dollar contribution by the worker is matched by a 50¢ contribution by the firm. We therefore expect a higher mrate to be associated with a higher participation rate.

Other variables:
• totpart: total 401k participants
• totelg: total eligible for 401k plan
• age: age of 401k plan
• totemp: total number of firm employees
• sole: = 1 if 401k is the firm's sole plan
• ltotemp: log of totemp

#C1(i) mean values

library(table1)
## 
## Attaching package: 'table1'
## The following objects are masked from 'package:base':
## 
##     units, units<-
table1(~prate + mrate, data=k401k)

| | Overall (N=1534) |
|---|---|
| prate | |
| Mean (SD) | 87.4 (16.7) |
| Median [Min, Max] | 95.7 [3.00, 100] |
| mrate | |
| Mean (SD) | 0.732 (0.780) |
| Median [Min, Max] | 0.460 [0.0100, 4.91] |
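Results: the average participation rate (prate) is 87.4% (SD 16.7), and the average match rate (mrate) is 0.732 (SD 0.780), i.e. on average the firm contributes about 73 cents for each dollar the worker contributes.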
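If table1 is not available, the same means and standard deviations can be computed in base R. The sketch below is only an illustration: it uses the six rows printed by `head(k401k)`, typed in by hand (an assumption made so it runs without the wooldridge package), so its numbers differ from the full-sample values above. Applied to the full `k401k` data frame, the same `sapply()` call should reproduce the 87.4 and 0.732 reported by table1.

```r
# The six rows shown by head(k401k), copied by hand for illustration only
k401k_head <- data.frame(
  prate = c(26.1, 100.0, 97.6, 100.0, 82.5, 100.0),
  mrate = c(0.21, 1.42, 0.91, 0.42, 0.53, 1.82)
)

# Mean and standard deviation of each column, using base R only
stats <- sapply(k401k_head[c("prate", "mrate")],
                function(x) c(mean = mean(x), sd = sd(x)))
print(round(stats, 3))
```

On the full data the call would be `sapply(k401k[c("prate", "mrate")], function(x) c(mean = mean(x), sd = sd(x)))`.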

#C1(ii) regression

mod1 <- lm(prate ~ mrate, data=k401k)
summary(mod1)
## 
## Call:
## lm(formula = prate ~ mrate, data = k401k)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -82.303  -8.184   5.178  12.712  16.807 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  83.0755     0.5633  147.48   <2e-16 ***
## mrate         5.8611     0.5270   11.12   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.09 on 1532 degrees of freedom
## Multiple R-squared:  0.0747, Adjusted R-squared:  0.0741 
## F-statistic: 123.7 on 1 and 1532 DF,  p-value: < 2.2e-16

Sample size n = 1534; R-squared = 0.0747 (about 7.5%).
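The same estimates can be written in the usual equation form, with standard errors in parentheses under the coefficients (numbers rounded from the `summary(mod1)` output above):

$$
\widehat{prate} = \underset{(0.563)}{83.08} + \underset{(0.527)}{5.86}\, mrate, \qquad n = 1534, \quad R^2 = 0.0747
$$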

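#C1(iii) interpretation

• Intercept = 83.0755 (SE 0.5633): the predicted participation rate for a plan with mrate = 0 (no employer match) is about 83.1%.
• Coefficient on mrate = 5.8611 (SE 0.5270): increasing the match rate by 1 (one more dollar of employer contribution per dollar contributed by the worker) is associated with an increase in prate of about 5.86 percentage points.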

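#C1(iv) predicted value

The predicted value of prate when mrate = 3.5 is 83.0755 + 5.8611 × 3.5 ≈ 103.59. This is impossible, since prate is a percentage and cannot exceed 100. A match rate of 3.5 lies far above the sample mean of 0.732, and the linear model does not respect the upper bound of 100 when extrapolated to such values.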
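A quick check of this arithmetic in R. The coefficients are hard-coded from the output above (an assumption made so the snippet runs on its own); with `mod1` in memory, `predict(mod1, newdata = data.frame(mrate = 3.5))` should give the same value up to rounding.

```r
# Coefficients copied from summary(mod1)
b0 <- 83.0755   # intercept
b1 <- 5.8611    # slope on mrate

# Fitted participation rate at a match rate of 3.5
prate_hat <- b0 + b1 * 3.5
round(prate_hat, 2)   # 103.59

# A percentage cannot exceed 100, so this prediction is not feasible
prate_hat > 100       # TRUE
```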

#C1(v) goodness of fit

mrate explains only about 7.5% of the variation in prate (R-squared = 0.0747). This is low, which suggests that many other factors affect 401k participation rates.
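In a simple regression with an intercept, R-squared equals the squared sample correlation between y and x, so the low R-squared here reflects a weak linear association between prate and mrate. The sketch below checks this identity on simulated data; the data-generating values are arbitrary and are not taken from k401k. On the real data, `cor(k401k$prate, k401k$mrate)^2` should match the 0.0747 above.

```r
set.seed(1)
x <- runif(200, 0, 5)                   # arbitrary regressor
y <- 80 + 5 * x + rnorm(200, sd = 15)   # arbitrary linear model with noise

fit <- lm(y ~ x)
r2_lm  <- summary(fit)$r.squared        # R-squared reported by lm
r2_cor <- cor(y, x)^2                   # squared sample correlation

all.equal(r2_lm, r2_cor)                # TRUE: equal up to floating-point error
```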

#C2(i) mean values

The variable salary is annual compensation in thousands of dollars, and ceoten is the prior number of years as the company's CEO.

data("ceosal2")
table1(~salary + ceoten, data=ceosal2)

| | Overall (N=177) |
|---|---|
| salary | |
| Mean (SD) | 866 (588) |
| Median [Min, Max] | 707 [100, 5300] |
| ceoten | |
| Mean (SD) | 7.95 (7.15) |
| Median [Min, Max] | 6.00 [0, 37.0] |
Results: the average salary is 866 thousand dollars (SD 588), i.e. about $866,000 per year, and the average CEO tenure is 7.95 years (SD 7.15).

#C2(ii) term of ceo

ten0 <- subset(ceosal2, ceoten == 0)  # CEOs with ceoten = 0 (first year as CEO); nrow(ten0) gives their number
max(ceosal2$ceoten)
## [1] 37

Five CEOs have ceoten = 0, i.e. they are in their first year as CEO. The longest tenure is 37 years.

#C2(iii) regression

logsalary <- log(ceosal2$salary)
mod2 <- lm(logsalary ~ ceoten, data = ceosal2)
summary(mod2)
## 
## Call:
## lm(formula = logsalary ~ ceoten, data = ceosal2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.15314 -0.38319 -0.02251  0.44439  1.94337 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.505498   0.067991  95.682   <2e-16 ***
## ceoten      0.009724   0.006364   1.528    0.128    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6038 on 175 degrees of freedom
## Multiple R-squared:  0.01316,    Adjusted R-squared:  0.007523 
## F-statistic: 2.334 on 1 and 175 DF,  p-value: 0.1284

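• Intercept: 6.505498 (SE 0.067991)
• Coefficient on ceoten: 0.009724 (SE 0.006364)
• obs = 177, R-squared = 0.01316 (very low)

One more year as CEO is associated with an increase in salary of approximately 100 × 0.009724 ≈ 0.97%. However, the coefficient is not statistically significant at conventional levels (t = 1.53, p = 0.128).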
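In a log-level model, 100·β1 is only an approximation to the percentage change; the exact change implied by the model is 100·(exp(β1) − 1). With a coefficient this small the two are nearly identical. The snippet below also recomputes the t statistic and the two-sided p-value from the reported estimate and standard error; all inputs are hard-coded from the output above so it runs without the data.

```r
b1       <- 0.009724   # coefficient on ceoten
se1      <- 0.006364   # its standard error
df_resid <- 175        # residual degrees of freedom

approx_pct <- 100 * b1              # approximate % change in salary per year of tenure
exact_pct  <- 100 * (exp(b1) - 1)   # exact % change implied by the log model

t_stat <- b1 / se1
p_val  <- 2 * pt(-abs(t_stat), df_resid)   # two-sided p-value from the t distribution

round(c(approx = approx_pct, exact = exact_pct, t = t_stat, p = p_val), 4)
```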

#C3(i) regression

We want to know whether there is a tradeoff between the time spent sleeping per week and the time spent in paid work. sleep is minutes spent sleeping at night per week, and totwrk is total minutes worked during the week.

data("sleep75")
mod3 <- lm(sleep ~ totwrk, data = sleep75)
summary(mod3)
## 
## Call:
## lm(formula = sleep ~ totwrk, data = sleep75)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2429.94  -240.25     4.91   250.53  1339.72 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3586.37695   38.91243  92.165   <2e-16 ***
## totwrk        -0.15075    0.01674  -9.005   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 421.1 on 704 degrees of freedom
## Multiple R-squared:  0.1033, Adjusted R-squared:  0.102 
## F-statistic: 81.09 on 1 and 704 DF,  p-value: < 2.2e-16

• Intercept: 3586.37695 (SE 38.91243)
• Coefficient on totwrk: -0.15075 (SE 0.01674)
• obs = 706, R-squared = 0.1033 (low)

The intercept is the predicted amount of sleep for a person who does not work (totwrk = 0): about 3586 minutes per week, or roughly 8.54 hours per night.
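The conversion from minutes per week to hours per night, done in R with the intercept hard-coded from the output above:

```r
b0 <- 3586.37695                      # predicted weekly sleep (minutes) when totwrk = 0

min_per_night   <- b0 / 7             # minutes per night
hours_per_night <- min_per_night / 60 # hours per night
round(c(min_per_night = min_per_night, hours_per_night = hours_per_night), 2)
```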

#C3(ii) analysis

If totwrk increases by 2 hours (120 minutes), sleep is estimated to change by -0.15075 × 120 ≈ -18.1 minutes, i.e. to fall by about 18 minutes per week (roughly 2.6 minutes per night). This is not a large effect.
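Because the model is linear, the predicted change in sleep is the slope times the change in work, whatever the starting level of totwrk. A sketch with the coefficients hard-coded from the output above; the starting value of 2400 minutes is an arbitrary illustration.

```r
b0 <- 3586.37695               # intercept (minutes of sleep per week)
b1 <- -0.15075                 # minutes of sleep per extra minute of work

extra_work    <- 2 * 60        # 2 more hours of work per week, in minutes
d_sleep_week  <- b1 * extra_work
d_sleep_night <- d_sleep_week / 7
round(c(per_week = d_sleep_week, per_night = d_sleep_night), 2)

# Same change starting from an arbitrary workload of 2400 minutes per week
sleep_hat <- function(totwrk) b0 + b1 * totwrk
sleep_hat(2400 + extra_work) - sleep_hat(2400)
```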

#C5(ii) regression

data("rdchem")
mod4 <- lm(lrd ~ lsales, data=rdchem)
summary(mod4)
## 
## Call:
## lm(formula = lrd ~ lsales, data = rdchem)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90406 -0.40086 -0.02178  0.40562  1.10439 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.10472    0.45277  -9.066 4.27e-10 ***
## lsales       1.07573    0.06183  17.399  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5294 on 30 degrees of freedom
## Multiple R-squared:  0.9098, Adjusted R-squared:  0.9068 
## F-statistic: 302.7 on 1 and 30 DF,  p-value: < 2.2e-16

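• Intercept: -4.10472 (SE 0.45277)
• Coefficient on lsales: 1.07573 (SE 0.06183)
• obs = 32, R-squared = 0.9098

The coefficient on lsales is the estimated elasticity of R&D spending with respect to sales: a 1% increase in annual sales is associated with an increase of about 1.08% in annual research and development expenditure.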
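In the estimated model log(rd) = β0 + β1 log(sales), exponentiating the fitted log value gives exp(β0) · sales^β1, so raising sales by 1% multiplies fitted R&D by 1.01^β1 at every level of sales. A sketch with the coefficients hard-coded from the output above; the two sales levels are arbitrary values chosen for illustration.

```r
b0 <- -4.10472   # intercept
b1 <-  1.07573   # elasticity of rd with respect to sales

# Fitted R&D from the log-log model (exponentiated fitted log value)
rd_hat <- function(sales) exp(b0 + b1 * log(sales))

sales_levels <- c(500, 5000)   # arbitrary sales levels for illustration
pct_change <- 100 * (rd_hat(1.01 * sales_levels) / rd_hat(sales_levels) - 1)
round(pct_change, 3)           # the same at both levels: about 1.076
```

The exact change (about 1.076%) is close to the elasticity 1.07573 because a 1% change is small.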