##
## Proportion of Missing Across Variables:
## Proportion of Missing in Rate Health: 3 /6279
## Proportion of Missing in Heart Condition: 3 /6279
## Proportion of Missing in Activity Frequency: 7 /6279
## Proportion of Missing in Work Enjoyment: 248 /6279
## Proportion of Missing in Any Dependents: 297 /6279
## Proportion of Missing in Working Spouse: 2835 /6279
## Proportion of Missing in Weeks of Paid Vacation: 1327 /6279
## Proportion of Missing in Ability to Reduce Hours: 1319 /6279
## Proportion of Missing in Employment Status: 26 /6279
## Proportion of Missing in Medicare Coverage: 27 /6279
## Proportion of Missing in Medicaid Coverage: 42 /6279
##
## Proportion of Missing in Rate Health: 3 /6279
## Proportion of Excellent in Rate Health: 700 /6279
## Proportion of Very Good in Rate Health: 2254 /6279
## Proportion of Good in Rate Health: 2275 /6279
## Proportion of Fair in Rate Health: 923 /6279
## Proportion of Poor in Rate Health: 124 /6279
##
## Proportion of Missing in Activity Frequency: 7 /6279
## Proportion of Once Weekly in Activity Frequency: 1107 /6279
## Proportion of More than Once Weekly in Activity Frequency: 2881 /6279
## Proportion of Daily in Activity Frequency: 778 /6279
## Proportion of Inactive in Activity Frequency: 1513 /6279
##
## Proportion of Married: 5745 /6279
## Proportion of Divorced: 16 /6279
## Proportion of Divorced With New Partner: 487 /6279
## Proportion of Divorced Who Remarried Same Partner: 31 /6279
##
## Proportion of Employed: 487 /6279
## Proportion of Unemployed: 16 /6279
## Proportion of Recently Laid Off: 31 /6279
##
## Proportion of Those Who Really Like Working: 1725 /6279
## Proportion of Those Who Like Working: 3696 /6279
## Proportion of Those Who Dislike Working: 499 /6279
## Proportion of Those Who Really Dislike Working: 111 /6279
##
## Proportion of Missing Across Variables:
## Proportion of Missing in Work Enjoyment: 239 /6147
## Proportion of Missing in Any Dependents: 292 /6147
## Proportion of Missing in Working Spouse: 2760 /6147
## Proportion of Missing in Weeks of Paid Vacation: 1291 /6147
## Proportion of Missing in Ability to Reduce Hours: 1284 /6147
## Proportion of Missing in Employment Status: 127 /6147
##
## Proportion of Excellent in Rate Health: 693 /6147
## Proportion of Very Good in Rate Health: 2218 /6147
## Proportion of Good in Rate Health: 2218 /6147
## Proportion of Fair in Rate Health: 898 /6147
## Proportion of Poor in Rate Health: 0 /6147
##
## Proportion of Once Weekly in Activity Frequency: 1085 /6147
## Proportion of More than Once Weekly in Activity Frequency: 2833 /6147
## Proportion of Daily in Activity Frequency: 763 /6147
## Proportion of Inactive in Activity Frequency: 0 /6147
##
## Proportion of Married: 5666 /6147
## Proportion of Divorced With New Partner: 0 /6147
##
## Proportion of Employed: 2743 /6147
##
## Proportion of Those Who Really Like Working: 1689 /6147
## Proportion of Those Who Like Working: 3618 /6147
## Proportion of Those Who Dislike Working: 493 /6147
## Proportion of Those Who Really Dislike Working: 0 /6147
##
## Call:
## lm(formula = WEEKLY_WORK_HOURS ~ ., data = df_reg_no_singularities)
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.897 -4.924 -1.038 5.349 147.575
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.291e+01 3.443e+00 12.463 < 2e-16 ***
## AGE -1.466e-01 3.494e-02 -4.195 2.76e-05 ***
## HEART_CONDITION 2.812e-01 4.835e-01 0.582 0.56090
## ANY_DEPENDENTS 4.352e-01 4.236e-01 1.027 0.30428
## WORKING_SPOUSE -6.431e-01 5.429e-01 -1.185 0.23624
## WEEKS_PAID_VACATION 5.293e-01 6.153e-02 8.603 < 2e-16 ***
## REDUCE_PAID_WORK_HOURS -1.229e+00 3.835e-01 -3.205 0.00136 **
## MEDICARE -1.351e+00 5.814e-01 -2.324 0.02017 *
## MEDICAID -4.349e+00 5.726e-01 -7.596 3.51e-14 ***
## HOSPITAL_EXPENSES 3.103e-04 1.225e-04 2.533 0.01134 *
## RETIRED -1.384e+01 4.796e-01 -28.862 < 2e-16 ***
## MISSING_RATE_HEALTH 1.565e+01 7.308e+00 2.141 0.03230 *
## MISSING_HEART_CONDITION 1.861e+00 7.168e+00 0.260 0.79519
## MISSING_ACTIVITY_FREQUENCY 1.126e+00 4.701e+00 0.240 0.81064
## MISSING_WORK_ENJOYMENT -1.506e+01 1.655e+00 -9.100 < 2e-16 ***
## MISSING_ANY_DEPENDENTS 7.265e-01 7.514e-01 0.967 0.33362
## MISSING_WORKING_SPOUSE -1.798e+00 5.592e-01 -3.216 0.00131 **
## MISSING_WEEKS_PAID_VACATION 1.844e+00 8.576e-01 2.151 0.03155 *
## MISSING_REDUCE_PAID_WORK_HOURS -1.547e+00 8.885e-01 -1.741 0.08175 .
## MISSING_JOB_STATUS 6.748e+00 2.449e+00 2.756 0.00587 **
## MISSING_MEDICARE -1.519e+00 3.109e+00 -0.489 0.62521
## MISSING_MEDICAID -4.188e+00 2.468e+00 -1.697 0.08977 .
## MISSING_RETIRED 1.429e+01 1.510e+00 9.469 < 2e-16 ***
## ACTIVE_ONCE_WEEKLY 2.107e-01 4.940e-01 0.427 0.66968
## ACTIVE_MORE_THAN_ONCE_WEEKLY -5.971e-01 4.055e-01 -1.472 0.14097
## ACTIVE_DAILY 2.380e-01 5.574e-01 0.427 0.66935
## EXCELLENT_HEALTH 6.448e+00 1.229e+00 5.246 1.61e-07 ***
## VERY_GOOD_HEALTH 5.675e+00 1.163e+00 4.881 1.08e-06 ***
## GOOD_HEALTH 6.087e+00 1.154e+00 5.275 1.37e-07 ***
## FAIR_HEALTH 5.118e+00 1.190e+00 4.300 1.73e-05 ***
## MARRIED 1.228e+00 2.242e+00 0.548 0.58386
## DIVORCED_REMARIED_NEW_PARTNER 2.712e+00 2.311e+00 1.174 0.24049
## DIVORCED 8.269e-01 3.833e+00 0.216 0.82919
## REALLY_LIKE_WORKING 4.340e-01 1.221e+00 0.355 0.72238
## LIKE_WORKING 1.283e-01 1.198e+00 0.107 0.91471
## DISLIKE_WORKING -1.751e-01 1.303e+00 -0.134 0.89308
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.39 on 6243 degrees of freedom
## Multiple R-squared: 0.311, Adjusted R-squared: 0.3071
## F-statistic: 80.5 on 35 and 6243 DF, p-value: < 2.2e-16
## [1] 49467.57
##
## Call:
## lm(formula = WEEKLY_WORK_HOURS ~ ., data = df_reg_significant)
##
## Residuals:
## Min 1Q Median 3Q Max
## -45.909 -4.856 -1.045 5.377 147.628
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.649e+01 2.678e+00 17.360 < 2e-16 ***
## AGE -1.605e-01 3.534e-02 -4.541 5.70e-06 ***
## HEART_CONDITION 4.312e-01 4.877e-01 0.884 0.37659
## ANY_DEPENDENTS 5.314e-01 4.248e-01 1.251 0.21102
## WORKING_SPOUSE -6.795e-01 5.448e-01 -1.247 0.21228
## WEEKS_PAID_VACATION 5.221e-01 6.146e-02 8.495 < 2e-16 ***
## REDUCE_PAID_WORK_HOURS -1.179e+00 3.853e-01 -3.060 0.00222 **
## MEDICARE -1.151e+00 5.866e-01 -1.963 0.04971 *
## MEDICAID -4.544e+00 5.745e-01 -7.910 3.03e-15 ***
## HOSPITAL_EXPENSES 3.027e-04 1.277e-04 2.370 0.01782 *
## RETIRED -1.394e+01 4.830e-01 -28.864 < 2e-16 ***
## MISSING_WORK_ENJOYMENT -1.627e+01 1.691e+00 -9.619 < 2e-16 ***
## MISSING_ANY_DEPENDENTS 8.878e-01 7.533e-01 1.179 0.23862
## MISSING_WORKING_SPOUSE -1.777e+00 5.608e-01 -3.168 0.00154 **
## MISSING_WEEKS_PAID_VACATION 1.495e+00 8.600e-01 1.738 0.08225 .
## MISSING_REDUCE_PAID_WORK_HOURS -1.037e+00 8.914e-01 -1.163 0.24488
## MISSING_RETIRED 1.624e+01 1.576e+00 10.300 < 2e-16 ***
## ACTIVE_ONCE_WEEKLY 2.282e-01 4.968e-01 0.459 0.64603
## ACTIVE_MORE_THAN_ONCE_WEEKLY -5.044e-01 4.075e-01 -1.238 0.21581
## ACTIVE_DAILY 3.416e-01 5.601e-01 0.610 0.54200
## EXCELLENT_HEALTH 6.393e+00 1.240e+00 5.156 2.61e-07 ***
## VERY_GOOD_HEALTH 5.710e+00 1.174e+00 4.863 1.19e-06 ***
## GOOD_HEALTH 6.152e+00 1.165e+00 5.279 1.34e-07 ***
## FAIR_HEALTH 5.059e+00 1.202e+00 4.210 2.59e-05 ***
## MARRIED -1.605e+00 5.966e-01 -2.690 0.00716 **
## REALLY_LIKE_WORKING 3.997e-01 1.230e+00 0.325 0.74526
## LIKE_WORKING 4.480e-02 1.207e+00 0.037 0.97039
## DISLIKE_WORKING -1.931e-01 1.311e+00 -0.147 0.88288
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.32 on 6119 degrees of freedom
## Multiple R-squared: 0.3146, Adjusted R-squared: 0.3116
## F-statistic: 104 on 27 and 6119 DF, p-value: < 2.2e-16
##
## Model Comparison:
## [1] 49467.57
## [1] 48348.64
##
## Summary of Model Selection Results:
| | OLS | STEP_BOTH | OLS_REDUCED | STEP_BOTH_REDUCED |
|---|---|---|---|---|
| (Intercept) | 42.912 | 44.134 | 46.489 | 46.454 |
| | (3.443) | (2.231) | (2.678) | (2.338) |
| AGE | −0.147 | −0.148 | −0.160 | −0.163 |
| | (0.035) | (0.034) | (0.035) | (0.035) |
| HEART_CONDITION | 0.281 | | 0.431 | |
| | (0.484) | | (0.488) | |
| ANY_DEPENDENTS | 0.435 | | 0.531 | |
| | (0.424) | | (0.425) | |
| WORKING_SPOUSE | −0.643 | | −0.680 | |
| | (0.543) | | (0.545) | |
| WEEKS_PAID_VACATION | 0.529 | 0.532 | 0.522 | 0.527 |
| | (0.062) | (0.061) | (0.061) | (0.061) |
| REDUCE_PAID_WORK_HOURS | −1.229 | −1.210 | −1.179 | −1.081 |
| | (0.384) | (0.383) | (0.385) | (0.378) |
| MEDICARE | −1.351 | −1.296 | −1.151 | −1.104 |
| | (0.581) | (0.580) | (0.587) | (0.586) |
| MEDICAID | −4.349 | −4.317 | −4.544 | −4.540 |
| | (0.573) | (0.570) | (0.574) | (0.572) |
| HOSPITAL_EXPENSES | 0.0003 | 0.0003 | 0.0003 | 0.0003 |
| | (0.0001) | (0.0001) | (0.0001) | (0.0001) |
| RETIRED | −13.842 | −13.841 | −13.940 | −13.955 |
| | (0.480) | (0.478) | (0.483) | (0.481) |
| MISSING_RATE_HEALTH | 15.648 | 15.175 | | |
| | (7.308) | (7.279) | | |
| MISSING_HEART_CONDITION | 1.861 | | | |
| | (7.168) | | | |
| MISSING_ACTIVITY_FREQUENCY | 1.126 | | | |
| | (4.701) | | | |
| MISSING_WORK_ENJOYMENT | −15.061 | −15.248 | −16.268 | −16.766 |
| | (1.655) | (1.150) | (1.691) | (1.146) |
| MISSING_ANY_DEPENDENTS | 0.727 | | 0.888 | |
| | (0.751) | | (0.753) | |
| MISSING_WORKING_SPOUSE | −1.798 | −1.341 | −1.777 | −1.306 |
| | (0.559) | (0.352) | (0.561) | (0.354) |
| MISSING_WEEKS_PAID_VACATION | 1.844 | 1.826 | 1.495 | 0.707 |
| | (0.858) | (0.856) | (0.860) | (0.435) |
| MISSING_REDUCE_PAID_WORK_HOURS | −1.547 | −1.450 | −1.037 | |
| | (0.889) | (0.886) | (0.891) | |
| MISSING_JOB_STATUS | 6.748 | 6.849 | | |
| | (2.449) | (2.445) | | |
| MISSING_MEDICARE | −1.519 | | | |
| | (3.109) | | | |
| MISSING_MEDICAID | −4.188 | −4.942 | | |
| | (2.468) | (1.971) | | |
| MISSING_RETIRED | 14.294 | 14.192 | 16.236 | 16.497 |
| | (1.510) | (1.490) | (1.576) | (1.551) |
| ACTIVE_ONCE_WEEKLY | 0.211 | | 0.228 | |
| | (0.494) | | (0.497) | |
| ACTIVE_MORE_THAN_ONCE_WEEKLY | −0.597 | −0.726 | −0.504 | −0.664 |
| | (0.406) | (0.319) | (0.407) | (0.320) |
| ACTIVE_DAILY | 0.238 | | 0.342 | |
| | (0.557) | | (0.560) | |
| EXCELLENT_HEALTH | 6.448 | 6.513 | 6.393 | 6.466 |
| | (1.229) | (1.219) | (1.240) | (1.230) |
| VERY_GOOD_HEALTH | 5.675 | 5.724 | 5.710 | 5.789 |
| | (1.163) | (1.156) | (1.174) | (1.167) |
| GOOD_HEALTH | 6.087 | 6.112 | 6.152 | 6.205 |
| | (1.154) | (1.151) | (1.165) | (1.162) |
| FAIR_HEALTH | 5.118 | 5.153 | 5.059 | 5.134 |
| | (1.190) | (1.188) | (1.202) | (1.200) |
| MARRIED | 1.228 | | −1.605 | −1.632 |
| | (2.242) | | (0.597) | (0.596) |
| DIVORCED_REMARIED_NEW_PARTNER | 2.712 | 1.497 | | |
| | (2.311) | (0.595) | | |
| DIVORCED | 0.827 | | | |
| | (3.833) | | | |
| REALLY_LIKE_WORKING | 0.434 | | 0.400 | |
| | (1.221) | | (1.230) | |
| LIKE_WORKING | 0.128 | | 0.045 | |
| | (1.198) | | (1.207) | |
| DISLIKE_WORKING | −0.175 | | −0.193 | |
| | (1.303) | | (1.311) | |
| Num.Obs. | 6279 | 6279 | 6147 | 6147 |
| R2 | 0.311 | 0.310 | 0.315 | 0.314 |
| R2 Adj. | 0.307 | 0.308 | 0.312 | 0.312 |
| AIC | 49467.6 | 49445.4 | 48348.6 | 48336.6 |
| BIC | 49717.1 | 49600.5 | 48543.6 | 48464.4 |
| Log.Lik. | −24696.783 | −24699.679 | −24145.319 | −24149.324 |
| F | 80.498 | 134.066 | 104.047 | 164.836 |
| RMSE | 12.36 | 12.36 | 12.29 | 12.30 |
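The code behind the STEP_BOTH and STEP_BOTH_REDUCED columns is not shown in this report. The following is a minimal sketch of how such a column could be obtained, assuming bidirectional AIC selection with base R's `step()`. The `mtcars` data and the object names are stand-ins for illustration, not the author's data or code.

```r
# Sketch only: bidirectional stepwise selection by AIC with base R's step().
# mtcars stands in for the survey data; all object names are hypothetical.
full_model <- lm(mpg ~ ., data = mtcars)

# direction = "both" lets terms be dropped and re-added at each step;
# trace = 0 suppresses the per-step log.
step_model <- step(full_model, direction = "both", trace = 0)

# Compare the two fits on the same data.
comparison <- data.frame(
  model  = c("full", "stepwise"),
  n_coef = c(length(coef(full_model)), length(coef(step_model))),
  AIC    = c(AIC(full_model), AIC(step_model)),
  BIC    = c(BIC(full_model), BIC(step_model))
)
print(comparison)
```

Because both fits use the same rows, their AIC and BIC values can be compared directly, which is the same comparison the table makes within each pair of columns (6279 or 6147 observations).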
# Correlations among the response and the candidate endogenous and instrument variables
df_high_corr <- subset(df_reg_significant,
                       select = c(WEEKLY_WORK_HOURS, WORKING_SPOUSE, ANY_DEPENDENTS,
                                  RETIRED, AGE, MEDICARE, MEDICAID, HEART_CONDITION))
correlation_matrix <- round(cor(df_high_corr), 2)
# Call reshape2::melt directly: data.table's melt() only redirects matrices to
# reshape2, and that redirection is deprecated
melted_correlation_matrix <- reshape2::melt(correlation_matrix)
# Keep the lower triangle of the matrix (upper triangle set to NA)
get_lower_tri <- function(correlation_matrix) {
  correlation_matrix[upper.tri(correlation_matrix)] <- NA
  return(correlation_matrix)
}
# Keep the upper triangle of the matrix (lower triangle set to NA)
get_upper_tri <- function(correlation_matrix) {
  correlation_matrix[lower.tri(correlation_matrix)] <- NA
  return(correlation_matrix)
}
# Reorder rows and columns by hierarchical clustering so correlated variables sit together
reorder_correlation_matrix <- function(correlation_matrix) {
  dd <- as.dist((1 - correlation_matrix) / 2)
  hc <- hclust(dd)
  return(correlation_matrix[hc$order, hc$order])
}
correlation_matrix <- reorder_correlation_matrix(correlation_matrix)
upper_triangle <- get_upper_tri(correlation_matrix)
melted_correlation_matrix <- reshape2::melt(upper_triangle, na.rm = TRUE)
options(repr.plot.width = 20, repr.plot.height =20)
ggheatmap <- ggplot(melted_correlation_matrix, aes(Var2, Var1, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limits = c(-1, 1), space = "Lab",
                       name = "Pearson\nCorrelation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1,
                                   size = 8, hjust = 1)) +
  coord_fixed() +
  geom_text(aes(Var2, Var1, label = value), color = "black", size = 2) +
  theme(
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.major = element_blank(),
    panel.border = element_blank(),
    panel.background = element_blank(),
    axis.ticks = element_blank(),
    legend.justification = c(1, 0),
    legend.position = c(0.6, 0.7),
    legend.direction = "horizontal") +
  guides(fill = guide_colorbar(barwidth = 5, barheight = 1,
                               title.position = "top", title.hjust = 0.5))
# Print the heatmap
print(ggheatmap)
melted_correlation_matrix
## Var1 Var2 value
## 1 ANY_DEPENDENTS ANY_DEPENDENTS 1.00
## 9 ANY_DEPENDENTS WEEKLY_WORK_HOURS 0.08
## 10 WEEKLY_WORK_HOURS WEEKLY_WORK_HOURS 1.00
## 17 ANY_DEPENDENTS WORKING_SPOUSE 0.10
## 18 WEEKLY_WORK_HOURS WORKING_SPOUSE 0.17
## 19 WORKING_SPOUSE WORKING_SPOUSE 1.00
## 25 ANY_DEPENDENTS MEDICAID 0.02
## 26 WEEKLY_WORK_HOURS MEDICAID -0.11
## 27 WORKING_SPOUSE MEDICAID -0.10
## 28 MEDICAID MEDICAID 1.00
## 33 ANY_DEPENDENTS HEART_CONDITION -0.03
## 34 WEEKLY_WORK_HOURS HEART_CONDITION -0.08
## 35 WORKING_SPOUSE HEART_CONDITION -0.04
## 36 MEDICAID HEART_CONDITION -0.02
## 37 HEART_CONDITION HEART_CONDITION 1.00
## 41 ANY_DEPENDENTS RETIRED -0.10
## 42 WEEKLY_WORK_HOURS RETIRED -0.50
## 43 WORKING_SPOUSE RETIRED -0.21
## 44 MEDICAID RETIRED 0.02
## 45 HEART_CONDITION RETIRED 0.14
## 46 RETIRED RETIRED 1.00
## 49 ANY_DEPENDENTS AGE -0.17
## 50 WEEKLY_WORK_HOURS AGE -0.33
## 51 WORKING_SPOUSE AGE -0.34
## 52 MEDICAID AGE -0.05
## 53 HEART_CONDITION AGE 0.21
## 54 RETIRED AGE 0.50
## 55 AGE AGE 1.00
## 57 ANY_DEPENDENTS MEDICARE -0.13
## 58 WEEKLY_WORK_HOURS MEDICARE -0.33
## 59 WORKING_SPOUSE MEDICARE -0.33
## 60 MEDICAID MEDICARE 0.01
## 61 HEART_CONDITION MEDICARE 0.17
## 62 RETIRED MEDICARE 0.51
## 63 AGE MEDICARE 0.74
## 64 MEDICARE MEDICARE 1.00
Most correlated with Weekly Work Hours: Retired, Age, Medicare, Working Spouse
Most correlated with Age: Medicare, Working Spouse, Any Dependents
Most correlated with Retired: Age, Medicare, Working Spouse
Most correlated with Heart Condition: Age, Medicare
Best endogenous variables: Retired, Working Spouse
Best IVs: Age, Medicare, Medicaid, Heart Condition
# Creating OLS models for endogenous variables
retired.hat <- lm(RETIRED ~ AGE + MEDICARE + HEART_CONDITION + WEEKS_PAID_VACATION +
REDUCE_PAID_WORK_HOURS + HOSPITAL_EXPENSES + ANY_DEPENDENTS +
MISSING_WORK_ENJOYMENT + MISSING_ANY_DEPENDENTS +
MISSING_WORKING_SPOUSE + MISSING_WEEKS_PAID_VACATION +
MISSING_REDUCE_PAID_WORK_HOURS + MISSING_RETIRED +
ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY +
EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH +
MARRIED + REALLY_LIKE_WORKING + LIKE_WORKING + DISLIKE_WORKING,
data = df_reg_significant)
# First stage for WORKING_SPOUSE: the response is WORKING_SPOUSE, not RETIRED.
# The 2SLS AIC values printed further down come from an earlier run and should be regenerated.
working_spouse.hat <- lm(WORKING_SPOUSE ~ AGE + MEDICARE + HEART_CONDITION + WEEKS_PAID_VACATION +
                           REDUCE_PAID_WORK_HOURS + HOSPITAL_EXPENSES + ANY_DEPENDENTS +
                           MISSING_WORK_ENJOYMENT + MISSING_ANY_DEPENDENTS +
                           MISSING_WORKING_SPOUSE + MISSING_WEEKS_PAID_VACATION +
                           MISSING_REDUCE_PAID_WORK_HOURS + MISSING_RETIRED +
                           ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY +
                           EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH +
                           MARRIED + REALLY_LIKE_WORKING + LIKE_WORKING + DISLIKE_WORKING,
                         data = df_reg_significant)
df_2sls <- df_reg_significant
# Second-stage regressors are the first-stage fitted values (response minus residuals)
df_2sls$RETIRED_HAT <- fitted(retired.hat)
df_2sls$WORKING_SPOUSE_HAT <- fitted(working_spouse.hat)
# 2SLS Models
tsls.most.correlated <- lm(WEEKLY_WORK_HOURS ~ RETIRED_HAT + WORKING_SPOUSE_HAT + HEART_CONDITION + WEEKS_PAID_VACATION +
REDUCE_PAID_WORK_HOURS + HOSPITAL_EXPENSES + ANY_DEPENDENTS +
MISSING_WORK_ENJOYMENT + MISSING_ANY_DEPENDENTS +
MISSING_WORKING_SPOUSE + MISSING_WEEKS_PAID_VACATION +
MISSING_REDUCE_PAID_WORK_HOURS + MISSING_RETIRED +
ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY +
EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH +
MARRIED + REALLY_LIKE_WORKING + LIKE_WORKING + DISLIKE_WORKING,
data = df_2sls)
tsls.retired <- lm(WEEKLY_WORK_HOURS ~ RETIRED_HAT + WORKING_SPOUSE + HEART_CONDITION + WEEKS_PAID_VACATION +
REDUCE_PAID_WORK_HOURS + HOSPITAL_EXPENSES + ANY_DEPENDENTS +
MISSING_WORK_ENJOYMENT + MISSING_ANY_DEPENDENTS +
MISSING_WORKING_SPOUSE + MISSING_WEEKS_PAID_VACATION +
MISSING_REDUCE_PAID_WORK_HOURS + MISSING_RETIRED +
ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY +
EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH +
MARRIED + REALLY_LIKE_WORKING + LIKE_WORKING + DISLIKE_WORKING,
data = df_2sls)
tsls.working_spouse <- lm(WEEKLY_WORK_HOURS ~ RETIRED + WORKING_SPOUSE_HAT + HEART_CONDITION + WEEKS_PAID_VACATION +
REDUCE_PAID_WORK_HOURS + HOSPITAL_EXPENSES + ANY_DEPENDENTS +
MISSING_WORK_ENJOYMENT + MISSING_ANY_DEPENDENTS +
MISSING_WORKING_SPOUSE + MISSING_WEEKS_PAID_VACATION +
MISSING_REDUCE_PAID_WORK_HOURS + MISSING_RETIRED +
ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY +
EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH +
MARRIED + REALLY_LIKE_WORKING + LIKE_WORKING + DISLIKE_WORKING,
data = df_2sls)
#summary(tsls.most.correlated, vcov = sandwich, diagnostics = TRUE)
#summary(tsls.retired, vcov = sandwich, diagnostics = TRUE)
#summary(tsls.working_spouse, vcov = sandwich, diagnostics = TRUE)
AIC(lm.full.significant)
## [1] 48348.64
AIC(tsls.most.correlated)
## [1] 48594.19
AIC(tsls.retired)
## [1] 48606.3
AIC(tsls.working_spouse)
## [1] 48445.39
# None of these has a lower AIC than the original OLS model (48348.64)
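The two-stage fits above are assembled by hand from separate `lm()` calls. As a sketch of that construction on simulated data (every variable name below is hypothetical), the second stage takes the first-stage fitted values from `fitted()`. Standard errors reported by a hand-run second stage are not valid 2SLS standard errors, because its residuals are computed from the fitted regressor and not the observed one; its AIC plausibly deserves the same caution.

```r
# Sketch only: manual two-stage least squares on simulated data.
set.seed(1)
n <- 5000
z <- rnorm(n)                          # instrument: drives x, excluded from the outcome equation
w <- rnorm(n)                          # exogenous control, included in both stages
u <- rnorm(n)                          # unobserved confounder
x <- 0.8 * z + 0.5 * w + u + rnorm(n)  # endogenous regressor
y <- 2 * x + w - 1.5 * u + rnorm(n)    # true effect of x on y is 2

# Naive OLS is biased because x is correlated with the omitted u.
ols <- lm(y ~ x + w)

# Stage 1: regress the endogenous variable on the instrument and the controls.
stage1 <- lm(x ~ z + w)
x_hat <- fitted(stage1)

# Stage 2: replace x with its first-stage fitted values.
stage2 <- lm(y ~ x_hat + w)

estimates <- c(ols  = unname(coef(ols)["x"]),
               tsls = unname(coef(stage2)["x_hat"]))
print(estimates)
```

In this simulation the two-stage estimate should land near the true value of 2 while the OLS estimate is pulled away from it. Dedicated IV routines additionally report corrected standard errors and instrument diagnostics.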
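# Split the sample for the two-part models: rows with positive hours, and a binary indicator
df_nonzero_wwh <- subset(df_reg_significant, WEEKLY_WORK_HOURS > 0)
df_zero_wwh <- df_reg_significant
# Despite its name, ZERO_WWH is 1 when weekly hours are positive and 0 when they are zero
df_zero_wwh$ZERO_WWH <- ifelse(df_zero_wwh$WEEKLY_WORK_HOURS > 0, 1, 0)
df_zero_wwh <- subset(df_zero_wwh, select = -WEEKLY_WORK_HOURS)
# Log-transformed copy of the positive-hours rows (used for the histogram below)
df_logged_wwh <- df_nonzero_wwh
df_logged_wwh$LOG_WWH <- log(df_logged_wwh$WEEKLY_WORK_HOURS)
df_logged_wwh <- subset(df_logged_wwh, select = -WEEKLY_WORK_HOURS)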
hist(df_reg_significant$WEEKLY_WORK_HOURS)
hist(df_logged_wwh$LOG_WWH)
probit.model <- glm(ZERO_WWH ~ .,
family = binomial(link = "probit"),
data = df_zero_wwh)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
logit.model <- glm(ZERO_WWH ~ .,
family = "binomial",
data = df_zero_wwh)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
lm.nonzero <- lm(WEEKLY_WORK_HOURS ~ .,
data = df_nonzero_wwh)
log.nonzero <- lm(log(WEEKLY_WORK_HOURS) ~ .,
data = df_nonzero_wwh)
gamma.nonzero <- glm(WEEKLY_WORK_HOURS ~ .,
family = Gamma(link = "log"),
data = df_nonzero_wwh)
poisson.model <- glm(WEEKLY_WORK_HOURS ~ .,
family = "poisson",
data = df_reg_significant)
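# The ".truncated" fits use the positive-hours subset with the ordinary Poisson and
# negative binomial likelihoods; they are not zero-truncated count models.
poisson.model.truncated <- glm(WEEKLY_WORK_HOURS ~ .,
                               family = "poisson",
                               data = df_nonzero_wwh)
nb2.model <- glm.nb(WEEKLY_WORK_HOURS ~ .,
                    data = df_reg_significant)
nb2.model.truncated <- glm.nb(WEEKLY_WORK_HOURS ~ .,
                              data = df_nonzero_wwh)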
cat("\nProbit Model AIC: ", AIC(probit.model))
##
## Probit Model AIC: 131.8006
cat("\nLogit Model AIC: ", AIC(logit.model))
##
## Logit Model AIC: 131.5922
cat("\nNon-zero Linear Model AIC: ", AIC(lm.nonzero))
##
## Non-zero Linear Model AIC: 48263.71
cat("\nNon-zero Log Model AIC: ", AIC(log.nonzero))
##
## Non-zero Log Model AIC: 8758.231
cat("\nNon-zero Gamma Model AIC: ", AIC(gamma.nonzero))
##
## Non-zero Gamma Model AIC: 50250.06
cat("\nPoisson Model AIC: ", AIC(poisson.model))
##
## Poisson Model AIC: 61600.67
cat("\nTruncated Poisson Model AIC: ", AIC(poisson.model.truncated))
##
## Truncated Poisson Model AIC: 61285.99
cat("\nNegative Binomial 2 Model AIC: ", AIC(nb2.model))
##
## Negative Binomial 2 Model AIC: 49660.04
cat("\nTruncated Negative Binomial 2 Model AIC: ", AIC(nb2.model.truncated))
##
## Truncated Negative Binomial 2 Model AIC: 49509.87
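The AIC of the log model (8758.231) is computed for `log(WEEKLY_WORK_HOURS)`, so it is not on the same scale as the AIC values of the models fitted to the hours themselves. A common correction adds `2 * sum(log(y))`, the Jacobian term of the log transform, before comparing. The sketch below illustrates the adjustment on simulated data; all names are hypothetical and the numbers say nothing about the survey models.

```r
# Sketch only: putting the AIC of a log-response model on the scale of y.
set.seed(42)
n <- 2000
x <- rnorm(n)
y <- exp(3 + 0.3 * x + rnorm(n, sd = 0.4))  # positive, right-skewed response

fit_linear <- lm(y ~ x)
fit_log    <- lm(log(y) ~ x)

aic_linear       <- AIC(fit_linear)
aic_log_raw      <- AIC(fit_log)                    # likelihood of log(y), not of y
aic_log_adjusted <- aic_log_raw + 2 * sum(log(y))   # change of variables back to y

print(c(linear = aic_linear, log_raw = aic_log_raw, log_adjusted = aic_log_adjusted))
```

AIC values are also only comparable between models fitted to the same observations, so the binary models, the full-sample fits, and the positive-hours fits above form separate comparison groups.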
summary(logit.model)
##
## Call:
## glm(formula = ZERO_WWH ~ ., family = "binomial", data = df_zero_wwh)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.0812 0.0000 0.0000 0.0222 0.9516
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.970e+01 1.336e+04 0.003 0.998
## AGE 2.261e-02 7.093e-02 0.319 0.750
## HEART_CONDITION -6.084e-01 9.855e-01 -0.617 0.537
## ANY_DEPENDENTS -1.139e-01 1.267e+00 -0.090 0.928
## WORKING_SPOUSE 1.844e+01 1.956e+03 0.009 0.992
## WEEKS_PAID_VACATION 2.753e-01 5.107e-01 0.539 0.590
## REDUCE_PAID_WORK_HOURS -8.340e-01 1.333e+00 -0.626 0.532
## MEDICARE 8.492e-01 1.270e+00 0.669 0.504
## MEDICAID -7.363e-01 1.251e+00 -0.589 0.556
## HOSPITAL_EXPENSES -1.672e-04 2.389e-04 -0.700 0.484
## RETIRED -1.452e+00 1.221e+00 -1.189 0.234
## MISSING_WORK_ENJOYMENT -2.095e+01 9.878e+03 -0.002 0.998
## MISSING_ANY_DEPENDENTS 2.078e+01 4.690e+03 0.004 0.996
## MISSING_WORKING_SPOUSE 3.407e-01 1.046e+00 0.326 0.745
## MISSING_WEEKS_PAID_VACATION -4.641e-01 9.763e-01 -0.475 0.635
## MISSING_REDUCE_PAID_WORK_HOURS 3.784e-01 1.610e+00 0.235 0.814
## MISSING_RETIRED 1.962e+01 9.451e+03 0.002 0.998
## ACTIVE_ONCE_WEEKLY -9.583e-01 1.010e+00 -0.949 0.343
## ACTIVE_MORE_THAN_ONCE_WEEKLY 8.136e-01 1.019e+00 0.798 0.425
## ACTIVE_DAILY 1.782e+01 3.786e+03 0.005 0.996
## EXCELLENT_HEALTH -1.958e+01 8.989e+03 -0.002 0.998
## VERY_GOOD_HEALTH -1.950e+01 8.989e+03 -0.002 0.998
## GOOD_HEALTH -1.688e+01 8.989e+03 -0.002 0.999
## FAIR_HEALTH -1.817e+01 8.989e+03 -0.002 0.998
## MARRIED 1.223e+00 1.194e+00 1.024 0.306
## REALLY_LIKE_WORKING -1.609e+01 9.878e+03 -0.002 0.999
## LIKE_WORKING -1.637e+01 9.878e+03 -0.002 0.999
## DISLIKE_WORKING 3.736e-02 1.094e+04 0.000 1.000
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 135.464 on 6146 degrees of freedom
## Residual deviance: 75.592 on 6119 degrees of freedom
## AIC: 131.59
##
## Number of Fisher Scoring iterations: 23
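Coefficients near ±20 with standard errors in the thousands, together with the "fitted probabilities numerically 0 or 1" warning, are typical symptoms of complete or quasi-complete separation. That is plausible here: the logit uses 6147 rows while the positive-hours log model uses 6138 (6110 residual degrees of freedom plus 28 coefficients), which suggests only 9 zero-hour observations, assuming no rows were dropped for other reasons. The sketch below reproduces the symptom on simulated data with hypothetical variable names.

```r
# Sketch only: quasi-complete separation in a logistic regression.
set.seed(7)
n <- 1000
rare  <- rbinom(n, 1, 0.05)   # sparse binary predictor
other <- rnorm(n)             # unrelated continuous predictor

# The outcome is 1 for almost everyone, and every 0 falls in the rare == 0 group,
# so rare == 1 predicts the outcome perfectly.
works <- ifelse(rare == 0 & runif(n) < 0.02, 0, 1)

# An empty cell in the cross-tabulation flags the problem before fitting.
print(table(rare = rare, works = works))

fit <- suppressWarnings(glm(works ~ rare + other, family = binomial))
se  <- sqrt(diag(vcov(fit)))
print(cbind(estimate = coef(fit), std_error = se))
```

Cross-tabulating the outcome against each dummy variable is a quick way to find which predictors are involved. Penalized or bias-reduced logistic regression is a commonly suggested remedy when the rare class cannot be enlarged.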
summary(log.nonzero)
##
## Call:
## lm(formula = log(WEEKLY_WORK_HOURS) ~ ., data = df_nonzero_wwh)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6494 -0.0918 0.0336 0.2165 2.3048
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.992e+00 1.071e-01 37.269 < 2e-16 ***
## AGE -7.878e-03 1.414e-03 -5.573 2.61e-08 ***
## HEART_CONDITION 1.430e-02 1.952e-02 0.733 0.46385
## ANY_DEPENDENTS -7.294e-03 1.700e-02 -0.429 0.66779
## WORKING_SPOUSE -3.108e-02 2.181e-02 -1.425 0.15416
## WEEKS_PAID_VACATION 2.203e-02 2.458e-03 8.962 < 2e-16 ***
## REDUCE_PAID_WORK_HOURS -2.969e-02 1.541e-02 -1.927 0.05406 .
## MEDICARE -4.970e-02 2.347e-02 -2.118 0.03425 *
## MEDICAID -1.403e-01 2.299e-02 -6.106 1.09e-09 ***
## HOSPITAL_EXPENSES 7.212e-06 5.108e-06 1.412 0.15804
## RETIRED -5.912e-01 1.933e-02 -30.588 < 2e-16 ***
## MISSING_WORK_ENJOYMENT -1.195e+00 6.834e-02 -17.489 < 2e-16 ***
## MISSING_ANY_DEPENDENTS 9.479e-03 3.012e-02 0.315 0.75300
## MISSING_WORKING_SPOUSE -7.139e-02 2.245e-02 -3.179 0.00148 **
## MISSING_WEEKS_PAID_VACATION 8.348e-02 3.450e-02 2.420 0.01556 *
## MISSING_REDUCE_PAID_WORK_HOURS -1.302e-01 3.573e-02 -3.645 0.00027 ***
## MISSING_RETIRED 1.115e+00 6.373e-02 17.502 < 2e-16 ***
## ACTIVE_ONCE_WEEKLY 1.849e-02 1.989e-02 0.930 0.35262
## ACTIVE_MORE_THAN_ONCE_WEEKLY -3.278e-03 1.630e-02 -0.201 0.84063
## ACTIVE_DAILY -4.708e-03 2.240e-02 -0.210 0.83355
## EXCELLENT_HEALTH 2.526e-01 4.958e-02 5.095 3.60e-07 ***
## VERY_GOOD_HEALTH 2.364e-01 4.695e-02 5.035 4.93e-07 ***
## GOOD_HEALTH 2.517e-01 4.660e-02 5.403 6.82e-08 ***
## FAIR_HEALTH 2.187e-01 4.805e-02 4.551 5.44e-06 ***
## MARRIED -7.448e-02 2.388e-02 -3.119 0.00182 **
## REALLY_LIKE_WORKING -1.074e-02 4.920e-02 -0.218 0.82714
## LIKE_WORKING -1.460e-02 4.826e-02 -0.302 0.76234
## DISLIKE_WORKING -1.779e-02 5.241e-02 -0.339 0.73425
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4927 on 6110 degrees of freedom
## Multiple R-squared: 0.388, Adjusted R-squared: 0.3853
## F-statistic: 143.4 on 27 and 6110 DF, p-value: < 2.2e-16
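Because the response is `log(WEEKLY_WORK_HOURS)`, each coefficient acts multiplicatively on hours: a one-unit change in a predictor multiplies the fitted hours by `exp(beta)`, holding the other predictors fixed. The short sketch below converts a few of the estimates printed above into percent changes; the vector name is hypothetical and the values are copied from the summary output.

```r
# Coefficients copied from the summary(log.nonzero) output above.
log_coefs <- c(RETIRED = -0.5912,
               MEDICAID = -0.1403,
               WEEKS_PAID_VACATION = 0.02203,
               AGE = -0.007878)

# 100 * (exp(beta) - 1) is the percent change in hours per one-unit change.
percent_change <- 100 * (exp(log_coefs) - 1)
print(round(percent_change, 1))
```

For small coefficients the percent change is close to `100 * beta`, but for larger ones such as RETIRED the exponentiated form is the one to report.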