Data Summary Less Clean

## 
## Proportion of Missing Across Variables:
## Proportion of Missing in Rate Health:  3 /6279
## Proportion of Missing in Heart Condition:  3 /6279
## Proportion of Missing in Activity Frequency:  7 /6279
## Proportion of Missing in Work Enjoyment:  248 /6279
## Proportion of Missing in Any Dependents:  297 /6279
## Proportion of Missing in Working Spouse:  2835 /6279
## Proportion of Missing in Weeks of Paid Vacation:  1327 /6279
## Proportion of Missing in Ability to Reduce Hours:  1319 /6279
## Proportion of Missing in Employment Status:  26 /6279
## Proportion of Missing in Medicare Coverage:  27 /6279
## Proportion of Missing in Medicaid Coverage:  42 /6279
## 
## Proportion of Missing in Rate Health:  3 /6279
## Proportion of Excellent in Rate Health:  700 /6279
## Proportion of Very Good in Rate Health:  2254 /6279
## Proportion of Good in Rate Health:  2275 /6279
## Proportion of Fair in Rate Health:  923 /6279
## Proportion of Poor in Rate Health:  124 /6279
## 
## Proportion of Missing in Activity Frequency:  7 /6279
## Proportion of Once Weekly in Activity Frequency:  1107 /6279
## Proportion of More than Once Weekly in Activity Frequency:  2881 /6279
## Proportion of Daily in Activity Frequency:  778 /6279
## Proportion of Inactive in Activity Frequency:  1513 /6279
## 
## Proportion of Married:  5745 /6279
## Proportion of Divorced:  16 /6279
## Proportion of Divorced With New Partner:  487 /6279
## Proportion of Divorced Who Remarried Same Partner:  31 /6279
## 
## Proportion of Employed:  487 /6279
## Proportion of Unemployed:  16 /6279
## Proportion of Recently Laid Off:  31 /6279
## 
## Proportion of Those Who Really Like Working:  1725 /6279
## Proportion of Those Who Like Working:  3696 /6279
## Proportion of Those Who Dislike Working:  499 /6279
## Proportion of Those Who Really Dislike Working:  111 /6279
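
Counts like those above can be tallied with `colSums(is.na(...))`. The following is a minimal sketch on a hypothetical toy data frame; `df_demo` and its values are assumptions, since the code that produced this summary is not shown.

```r
# Hypothetical toy data; the real survey data frame is not part of this report.
df_demo <- data.frame(
  RATE_HEALTH    = c(1, NA, 3, 2, NA),
  WORKING_SPOUSE = c(NA, NA, 1, 0, NA)
)

# Number of missing values per column
missing_counts <- colSums(is.na(df_demo))

# Print each count against the total number of rows
for (name in names(missing_counts)) {
  cat("Proportion of Missing in", name, ": ",
      missing_counts[[name]], "/", nrow(df_demo), "\n")
}
```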

Data Summary Clean

## 
## Proportion of Missing Across Variables:
## Proportion of Missing in Work Enjoyment:  239 /6147
## Proportion of Missing in Any Dependents:  292 /6147
## Proportion of Missing in Working Spouse:  2760 /6147
## Proportion of Missing in Weeks of Paid Vacation:  1291 /6147
## Proportion of Missing in Ability to Reduce Hours:  1284 /6147
## Proportion of Missing in Employment Status:  127 /6147
## 
## Proportion of Excellent in Rate Health:  693 /6147
## Proportion of Very Good in Rate Health:  2218 /6147
## Proportion of Good in Rate Health:  2218 /6147
## Proportion of Fair in Rate Health:  898 /6147
## Proportion of Poor in Rate Health:  0 /6147
## 
## Proportion of Once Weekly in Activity Frequency:  1085 /6147
## Proportion of More than Once Weekly in Activity Frequency:  2833 /6147
## Proportion of Daily in Activity Frequency:  763 /6147
## Proportion of Inactive in Activity Frequency:  0 /6147
## 
## Proportion of Married:  5666 /6147
## Proportion of Divorced With New Partner:  0 /6147
## 
## Proportion of Employed:  2743 /6147
## 
## Proportion of Those Who Really Like Working:  1689 /6147
## Proportion of Those Who Like Working:  3618 /6147
## Proportion of Those Who Dislike Working:  493 /6147
## Proportion of Those Who Really Dislike Working:  0 /6147

OLS Models

## 
## Call:
## lm(formula = WEEKLY_WORK_HOURS ~ ., data = df_reg_no_singularities)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -43.897  -4.924  -1.038   5.349 147.575 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     4.291e+01  3.443e+00  12.463  < 2e-16 ***
## AGE                            -1.466e-01  3.494e-02  -4.195 2.76e-05 ***
## HEART_CONDITION                 2.812e-01  4.835e-01   0.582  0.56090    
## ANY_DEPENDENTS                  4.352e-01  4.236e-01   1.027  0.30428    
## WORKING_SPOUSE                 -6.431e-01  5.429e-01  -1.185  0.23624    
## WEEKS_PAID_VACATION             5.293e-01  6.153e-02   8.603  < 2e-16 ***
## REDUCE_PAID_WORK_HOURS         -1.229e+00  3.835e-01  -3.205  0.00136 ** 
## MEDICARE                       -1.351e+00  5.814e-01  -2.324  0.02017 *  
## MEDICAID                       -4.349e+00  5.726e-01  -7.596 3.51e-14 ***
## HOSPITAL_EXPENSES               3.103e-04  1.225e-04   2.533  0.01134 *  
## RETIRED                        -1.384e+01  4.796e-01 -28.862  < 2e-16 ***
## MISSING_RATE_HEALTH             1.565e+01  7.308e+00   2.141  0.03230 *  
## MISSING_HEART_CONDITION         1.861e+00  7.168e+00   0.260  0.79519    
## MISSING_ACTIVITY_FREQUENCY      1.126e+00  4.701e+00   0.240  0.81064    
## MISSING_WORK_ENJOYMENT         -1.506e+01  1.655e+00  -9.100  < 2e-16 ***
## MISSING_ANY_DEPENDENTS          7.265e-01  7.514e-01   0.967  0.33362    
## MISSING_WORKING_SPOUSE         -1.798e+00  5.592e-01  -3.216  0.00131 ** 
## MISSING_WEEKS_PAID_VACATION     1.844e+00  8.576e-01   2.151  0.03155 *  
## MISSING_REDUCE_PAID_WORK_HOURS -1.547e+00  8.885e-01  -1.741  0.08175 .  
## MISSING_JOB_STATUS              6.748e+00  2.449e+00   2.756  0.00587 ** 
## MISSING_MEDICARE               -1.519e+00  3.109e+00  -0.489  0.62521    
## MISSING_MEDICAID               -4.188e+00  2.468e+00  -1.697  0.08977 .  
## MISSING_RETIRED                 1.429e+01  1.510e+00   9.469  < 2e-16 ***
## ACTIVE_ONCE_WEEKLY              2.107e-01  4.940e-01   0.427  0.66968    
## ACTIVE_MORE_THAN_ONCE_WEEKLY   -5.971e-01  4.055e-01  -1.472  0.14097    
## ACTIVE_DAILY                    2.380e-01  5.574e-01   0.427  0.66935    
## EXCELLENT_HEALTH                6.448e+00  1.229e+00   5.246 1.61e-07 ***
## VERY_GOOD_HEALTH                5.675e+00  1.163e+00   4.881 1.08e-06 ***
## GOOD_HEALTH                     6.087e+00  1.154e+00   5.275 1.37e-07 ***
## FAIR_HEALTH                     5.118e+00  1.190e+00   4.300 1.73e-05 ***
## MARRIED                         1.228e+00  2.242e+00   0.548  0.58386    
## DIVORCED_REMARIED_NEW_PARTNER   2.712e+00  2.311e+00   1.174  0.24049    
## DIVORCED                        8.269e-01  3.833e+00   0.216  0.82919    
## REALLY_LIKE_WORKING             4.340e-01  1.221e+00   0.355  0.72238    
## LIKE_WORKING                    1.283e-01  1.198e+00   0.107  0.91471    
## DISLIKE_WORKING                -1.751e-01  1.303e+00  -0.134  0.89308    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.39 on 6243 degrees of freedom
## Multiple R-squared:  0.311,  Adjusted R-squared:  0.3071 
## F-statistic:  80.5 on 35 and 6243 DF,  p-value: < 2.2e-16
## [1] 49467.57
## 
## Call:
## lm(formula = WEEKLY_WORK_HOURS ~ ., data = df_reg_significant)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -45.909  -4.856  -1.045   5.377 147.628 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     4.649e+01  2.678e+00  17.360  < 2e-16 ***
## AGE                            -1.605e-01  3.534e-02  -4.541 5.70e-06 ***
## HEART_CONDITION                 4.312e-01  4.877e-01   0.884  0.37659    
## ANY_DEPENDENTS                  5.314e-01  4.248e-01   1.251  0.21102    
## WORKING_SPOUSE                 -6.795e-01  5.448e-01  -1.247  0.21228    
## WEEKS_PAID_VACATION             5.221e-01  6.146e-02   8.495  < 2e-16 ***
## REDUCE_PAID_WORK_HOURS         -1.179e+00  3.853e-01  -3.060  0.00222 ** 
## MEDICARE                       -1.151e+00  5.866e-01  -1.963  0.04971 *  
## MEDICAID                       -4.544e+00  5.745e-01  -7.910 3.03e-15 ***
## HOSPITAL_EXPENSES               3.027e-04  1.277e-04   2.370  0.01782 *  
## RETIRED                        -1.394e+01  4.830e-01 -28.864  < 2e-16 ***
## MISSING_WORK_ENJOYMENT         -1.627e+01  1.691e+00  -9.619  < 2e-16 ***
## MISSING_ANY_DEPENDENTS          8.878e-01  7.533e-01   1.179  0.23862    
## MISSING_WORKING_SPOUSE         -1.777e+00  5.608e-01  -3.168  0.00154 ** 
## MISSING_WEEKS_PAID_VACATION     1.495e+00  8.600e-01   1.738  0.08225 .  
## MISSING_REDUCE_PAID_WORK_HOURS -1.037e+00  8.914e-01  -1.163  0.24488    
## MISSING_RETIRED                 1.624e+01  1.576e+00  10.300  < 2e-16 ***
## ACTIVE_ONCE_WEEKLY              2.282e-01  4.968e-01   0.459  0.64603    
## ACTIVE_MORE_THAN_ONCE_WEEKLY   -5.044e-01  4.075e-01  -1.238  0.21581    
## ACTIVE_DAILY                    3.416e-01  5.601e-01   0.610  0.54200    
## EXCELLENT_HEALTH                6.393e+00  1.240e+00   5.156 2.61e-07 ***
## VERY_GOOD_HEALTH                5.710e+00  1.174e+00   4.863 1.19e-06 ***
## GOOD_HEALTH                     6.152e+00  1.165e+00   5.279 1.34e-07 ***
## FAIR_HEALTH                     5.059e+00  1.202e+00   4.210 2.59e-05 ***
## MARRIED                        -1.605e+00  5.966e-01  -2.690  0.00716 ** 
## REALLY_LIKE_WORKING             3.997e-01  1.230e+00   0.325  0.74526    
## LIKE_WORKING                    4.480e-02  1.207e+00   0.037  0.97039    
## DISLIKE_WORKING                -1.931e-01  1.311e+00  -0.147  0.88288    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.32 on 6119 degrees of freedom
## Multiple R-squared:  0.3146, Adjusted R-squared:  0.3116 
## F-statistic:   104 on 27 and 6119 DF,  p-value: < 2.2e-16
## 
## Model Comparison:
## [1] 49467.57
## [1] 48348.64
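
The two AIC values in the comparison come from models fitted to different samples (6279 and 6147 observations), so part of the drop reflects the smaller sample rather than a better fit. A minimal sketch on simulated data, with all names hypothetical, shows that the same model specification gets a lower AIC simply because rows were dropped:

```r
set.seed(1)
n <- 500
sim <- data.frame(x = rnorm(n))
sim$y <- 1 + 2 * sim$x + rnorm(n)

# Same specification, fitted once to all rows and once to a subset
fit_all    <- lm(y ~ x, data = sim)
fit_subset <- lm(y ~ x, data = sim[1:450, ])

aic_all    <- AIC(fit_all)
aic_subset <- AIC(fit_subset)

# AIC is only comparable when both models use the same observations
cat("n =", nobs(fit_all), "AIC =", aic_all, "\n")
cat("n =", nobs(fit_subset), "AIC =", aic_subset, "\n")
```

A like-for-like comparison would refit both specifications on the same rows before comparing AIC.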

Model Selection

## 
## Summary of Model Selection Results:
Estimates are shown with standard errors in parentheses.

| Term | OLS | STEP_BOTH | OLS_REDUCED | STEP_BOTH_REDUCED |
|---|---|---|---|---|
| (Intercept) | 42.912 (3.443) | 44.134 (2.231) | 46.489 (2.678) | 46.454 (2.338) |
| AGE | −0.147 (0.035) | −0.148 (0.034) | −0.160 (0.035) | −0.163 (0.035) |
| HEART_CONDITION | 0.281 (0.484) | | 0.431 (0.488) | |
| ANY_DEPENDENTS | 0.435 (0.424) | | 0.531 (0.425) | |
| WORKING_SPOUSE | −0.643 (0.543) | | −0.680 (0.545) | |
| WEEKS_PAID_VACATION | 0.529 (0.062) | 0.532 (0.061) | 0.522 (0.061) | 0.527 (0.061) |
| REDUCE_PAID_WORK_HOURS | −1.229 (0.384) | −1.210 (0.383) | −1.179 (0.385) | −1.081 (0.378) |
| MEDICARE | −1.351 (0.581) | −1.296 (0.580) | −1.151 (0.587) | −1.104 (0.586) |
| MEDICAID | −4.349 (0.573) | −4.317 (0.570) | −4.544 (0.574) | −4.540 (0.572) |
| HOSPITAL_EXPENSES | 0.0003 (0.0001) | 0.0003 (0.0001) | 0.0003 (0.0001) | 0.0003 (0.0001) |
| RETIRED | −13.842 (0.480) | −13.841 (0.478) | −13.940 (0.483) | −13.955 (0.481) |
| MISSING_RATE_HEALTH | 15.648 (7.308) | 15.175 (7.279) | | |
| MISSING_HEART_CONDITION | 1.861 (7.168) | | | |
| MISSING_ACTIVITY_FREQUENCY | 1.126 (4.701) | | | |
| MISSING_WORK_ENJOYMENT | −15.061 (1.655) | −15.248 (1.150) | −16.268 (1.691) | −16.766 (1.146) |
| MISSING_ANY_DEPENDENTS | 0.727 (0.751) | | 0.888 (0.753) | |
| MISSING_WORKING_SPOUSE | −1.798 (0.559) | −1.341 (0.352) | −1.777 (0.561) | −1.306 (0.354) |
| MISSING_WEEKS_PAID_VACATION | 1.844 (0.858) | 1.826 (0.856) | 1.495 (0.860) | 0.707 (0.435) |
| MISSING_REDUCE_PAID_WORK_HOURS | −1.547 (0.889) | −1.450 (0.886) | −1.037 (0.891) | |
| MISSING_JOB_STATUS | 6.748 (2.449) | 6.849 (2.445) | | |
| MISSING_MEDICARE | −1.519 (3.109) | | | |
| MISSING_MEDICAID | −4.188 (2.468) | −4.942 (1.971) | | |
| MISSING_RETIRED | 14.294 (1.510) | 14.192 (1.490) | 16.236 (1.576) | 16.497 (1.551) |
| ACTIVE_ONCE_WEEKLY | 0.211 (0.494) | | 0.228 (0.497) | |
| ACTIVE_MORE_THAN_ONCE_WEEKLY | −0.597 (0.406) | −0.726 (0.319) | −0.504 (0.407) | −0.664 (0.320) |
| ACTIVE_DAILY | 0.238 (0.557) | | 0.342 (0.560) | |
| EXCELLENT_HEALTH | 6.448 (1.229) | 6.513 (1.219) | 6.393 (1.240) | 6.466 (1.230) |
| VERY_GOOD_HEALTH | 5.675 (1.163) | 5.724 (1.156) | 5.710 (1.174) | 5.789 (1.167) |
| GOOD_HEALTH | 6.087 (1.154) | 6.112 (1.151) | 6.152 (1.165) | 6.205 (1.162) |
| FAIR_HEALTH | 5.118 (1.190) | 5.153 (1.188) | 5.059 (1.202) | 5.134 (1.200) |
| MARRIED | 1.228 (2.242) | | −1.605 (0.597) | −1.632 (0.596) |
| DIVORCED_REMARIED_NEW_PARTNER | 2.712 (2.311) | 1.497 (0.595) | | |
| DIVORCED | 0.827 (3.833) | | | |
| REALLY_LIKE_WORKING | 0.434 (1.221) | | 0.400 (1.230) | |
| LIKE_WORKING | 0.128 (1.198) | | 0.045 (1.207) | |
| DISLIKE_WORKING | −0.175 (1.303) | | −0.193 (1.311) | |
| Num.Obs. | 6279 | 6279 | 6147 | 6147 |
| R2 | 0.311 | 0.310 | 0.315 | 0.314 |
| R2 Adj. | 0.307 | 0.308 | 0.312 | 0.312 |
| AIC | 49467.6 | 49445.4 | 48348.6 | 48336.6 |
| BIC | 49717.1 | 49600.5 | 48543.6 | 48464.4 |
| Log.Lik. | −24696.783 | −24699.679 | −24145.319 | −24149.324 |
| F | 80.498 | 134.066 | 104.047 | 164.836 |
| RMSE | 12.36 | 12.36 | 12.29 | 12.30 |
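
The call that produced the STEP_BOTH columns is not shown. The sketch below assumes they come from base R's `step()` with `direction = "both"`, and illustrates that kind of AIC-based stepwise selection on simulated data (all variable names are hypothetical).

```r
set.seed(42)
n <- 300
sim <- data.frame(
  x1 = rnorm(n), x2 = rnorm(n),
  noise1 = rnorm(n), noise2 = rnorm(n)   # irrelevant predictors
)
sim$y <- 3 + 2 * sim$x1 - 1.5 * sim$x2 + rnorm(n)

# Start from the full model and let step() drop or re-add terms by AIC
full_fit <- lm(y ~ ., data = sim)
step_fit <- step(full_fit, direction = "both", trace = 0)

print(formula(step_fit))
cat("AIC full:", AIC(full_fit), " AIC stepwise:", AIC(step_fit), "\n")
```

Because `step()` only accepts moves that lower AIC, the selected model's AIC cannot exceed that of the starting model on the same data.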

Correlation Matrix

df_high_corr <- subset(df_reg_significant, 
                       select = c(WEEKLY_WORK_HOURS, WORKING_SPOUSE, ANY_DEPENDENTS,  
                                  RETIRED, AGE, MEDICARE, MEDICAID, HEART_CONDITION))
correlation_matrix <- round(cor(df_high_corr),2)
# Call reshape2::melt explicitly; data.table's melt generic no longer redirects matrices
melted_correlation_matrix <- reshape2::melt(correlation_matrix)
get_lower_tri <- function(correlation_matrix){
  correlation_matrix[upper.tri(correlation_matrix)] <- NA
  return(correlation_matrix)
}

get_upper_tri <- function(correlation_matrix){
  correlation_matrix[lower.tri(correlation_matrix)] <- NA
  return(correlation_matrix)
}

reorder_correlation_matrix <- function(correlation_matrix){
  # Use correlation as a distance so that hclust groups similar variables
  dd <- as.dist((1-correlation_matrix)/2)
  hc <- hclust(dd)
  return(correlation_matrix[hc$order, hc$order])
}

correlation_matrix <- reorder_correlation_matrix(correlation_matrix)
upper_triangle <- get_upper_tri(correlation_matrix)
melted_correlation_matrix <- reshape2::melt(upper_triangle, na.rm = TRUE)
options(repr.plot.width = 20, repr.plot.height = 20)
ggheatmap <- ggplot(melted_correlation_matrix, aes(Var2, Var1, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limits = c(-1, 1), space = "Lab",
                       name = "Pearson\nCorrelation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1,
                                   size = 8, hjust = 1)) +
  coord_fixed() +
  geom_text(aes(Var2, Var1, label = value), color = "black", size = 2) +
  theme(
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.major = element_blank(),
    panel.border = element_blank(),
    panel.background = element_blank(),
    axis.ticks = element_blank(),
    legend.justification = c(1, 0),
    legend.position = c(0.6, 0.7),
    legend.direction = "horizontal") +
  guides(fill = guide_colorbar(barwidth = 5, barheight = 1,
                               title.position = "top", title.hjust = 0.5))
# Print the heatmap
print(ggheatmap)

melted_correlation_matrix
##                 Var1              Var2 value
## 1     ANY_DEPENDENTS    ANY_DEPENDENTS  1.00
## 9     ANY_DEPENDENTS WEEKLY_WORK_HOURS  0.08
## 10 WEEKLY_WORK_HOURS WEEKLY_WORK_HOURS  1.00
## 17    ANY_DEPENDENTS    WORKING_SPOUSE  0.10
## 18 WEEKLY_WORK_HOURS    WORKING_SPOUSE  0.17
## 19    WORKING_SPOUSE    WORKING_SPOUSE  1.00
## 25    ANY_DEPENDENTS          MEDICAID  0.02
## 26 WEEKLY_WORK_HOURS          MEDICAID -0.11
## 27    WORKING_SPOUSE          MEDICAID -0.10
## 28          MEDICAID          MEDICAID  1.00
## 33    ANY_DEPENDENTS   HEART_CONDITION -0.03
## 34 WEEKLY_WORK_HOURS   HEART_CONDITION -0.08
## 35    WORKING_SPOUSE   HEART_CONDITION -0.04
## 36          MEDICAID   HEART_CONDITION -0.02
## 37   HEART_CONDITION   HEART_CONDITION  1.00
## 41    ANY_DEPENDENTS           RETIRED -0.10
## 42 WEEKLY_WORK_HOURS           RETIRED -0.50
## 43    WORKING_SPOUSE           RETIRED -0.21
## 44          MEDICAID           RETIRED  0.02
## 45   HEART_CONDITION           RETIRED  0.14
## 46           RETIRED           RETIRED  1.00
## 49    ANY_DEPENDENTS               AGE -0.17
## 50 WEEKLY_WORK_HOURS               AGE -0.33
## 51    WORKING_SPOUSE               AGE -0.34
## 52          MEDICAID               AGE -0.05
## 53   HEART_CONDITION               AGE  0.21
## 54           RETIRED               AGE  0.50
## 55               AGE               AGE  1.00
## 57    ANY_DEPENDENTS          MEDICARE -0.13
## 58 WEEKLY_WORK_HOURS          MEDICARE -0.33
## 59    WORKING_SPOUSE          MEDICARE -0.33
## 60          MEDICAID          MEDICARE  0.01
## 61   HEART_CONDITION          MEDICARE  0.17
## 62           RETIRED          MEDICARE  0.51
## 63               AGE          MEDICARE  0.74
## 64          MEDICARE          MEDICARE  1.00
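
The same long-format listing can be obtained without `reshape2`. The sketch below uses only base R (`as.table()` and `as.data.frame()`) on the built-in `mtcars` data, chosen here purely for illustration.

```r
# Correlation matrix for three illustrative columns of a built-in dataset
m <- round(cor(mtcars[, c("mpg", "wt", "hp")]), 2)

# Keep the upper triangle (including the diagonal), as in get_upper_tri()
m[lower.tri(m)] <- NA

# Convert to long format: one row per (Var1, Var2) pair
long <- as.data.frame(as.table(m), responseName = "value")
long <- long[!is.na(long$value), ]

print(long)
```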

2SLS

Most correlated with Weekly Work Hours: Retired, Age, Medicare, Working Spouse

Most Correlated with Age: Medicare, Working Spouse, Any Dependents

Most Correlated with Retired: Age, Medicare, Working Spouse

Most Correlated with Heart Condition: Age, Medicare

Best endogenous variables: Retired, Working Spouse

Best IV: Age, Medicare, Medicaid, Heart Condition

# Creating OLS models for endogenous variables

retired.hat <- lm(RETIRED ~ AGE + MEDICARE + HEART_CONDITION + WEEKS_PAID_VACATION +
                                REDUCE_PAID_WORK_HOURS + HOSPITAL_EXPENSES + ANY_DEPENDENTS + 
                                MISSING_WORK_ENJOYMENT + MISSING_ANY_DEPENDENTS + 
                                MISSING_WORKING_SPOUSE + MISSING_WEEKS_PAID_VACATION + 
                                MISSING_REDUCE_PAID_WORK_HOURS + MISSING_RETIRED + 
                                ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY + 
                                EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH + 
                                MARRIED + REALLY_LIKE_WORKING + LIKE_WORKING + DISLIKE_WORKING,
                   data = df_reg_significant)
  
working_spouse.hat <- lm(WORKING_SPOUSE ~ AGE + MEDICARE + HEART_CONDITION + WEEKS_PAID_VACATION +
                                REDUCE_PAID_WORK_HOURS + HOSPITAL_EXPENSES + ANY_DEPENDENTS + 
                                MISSING_WORK_ENJOYMENT + MISSING_ANY_DEPENDENTS + 
                                MISSING_WORKING_SPOUSE + MISSING_WEEKS_PAID_VACATION + 
                                MISSING_REDUCE_PAID_WORK_HOURS + MISSING_RETIRED + 
                                ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY + 
                                EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH + 
                                MARRIED + REALLY_LIKE_WORKING + LIKE_WORKING + DISLIKE_WORKING,
                   data = df_reg_significant)

df_2sls <- df_reg_significant
# The second stage uses the first-stage fitted values (observed value minus residual)
df_2sls$RETIRED_HAT <- fitted(retired.hat)
df_2sls$WORKING_SPOUSE_HAT <- fitted(working_spouse.hat)

# 2SLS Models
tsls.most.correlated <- lm(WEEKLY_WORK_HOURS ~ RETIRED_HAT + WORKING_SPOUSE_HAT +  HEART_CONDITION + WEEKS_PAID_VACATION +
                                REDUCE_PAID_WORK_HOURS + HOSPITAL_EXPENSES + ANY_DEPENDENTS + 
                                MISSING_WORK_ENJOYMENT + MISSING_ANY_DEPENDENTS + 
                                MISSING_WORKING_SPOUSE + MISSING_WEEKS_PAID_VACATION + 
                                MISSING_REDUCE_PAID_WORK_HOURS + MISSING_RETIRED + 
                                ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY + 
                                EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH + 
                                MARRIED + REALLY_LIKE_WORKING + LIKE_WORKING + DISLIKE_WORKING,
                              data = df_2sls)

tsls.retired <- lm(WEEKLY_WORK_HOURS ~ RETIRED_HAT + WORKING_SPOUSE +  HEART_CONDITION + WEEKS_PAID_VACATION +
                                REDUCE_PAID_WORK_HOURS + HOSPITAL_EXPENSES + ANY_DEPENDENTS + 
                                MISSING_WORK_ENJOYMENT + MISSING_ANY_DEPENDENTS + 
                                MISSING_WORKING_SPOUSE + MISSING_WEEKS_PAID_VACATION + 
                                MISSING_REDUCE_PAID_WORK_HOURS + MISSING_RETIRED + 
                                ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY + 
                                EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH + 
                                MARRIED + REALLY_LIKE_WORKING + LIKE_WORKING + DISLIKE_WORKING,
                              data = df_2sls)
tsls.working_spouse <- lm(WEEKLY_WORK_HOURS ~ RETIRED + WORKING_SPOUSE_HAT +  HEART_CONDITION + WEEKS_PAID_VACATION +
                                REDUCE_PAID_WORK_HOURS + HOSPITAL_EXPENSES + ANY_DEPENDENTS + 
                                MISSING_WORK_ENJOYMENT + MISSING_ANY_DEPENDENTS + 
                                MISSING_WORKING_SPOUSE + MISSING_WEEKS_PAID_VACATION + 
                                MISSING_REDUCE_PAID_WORK_HOURS + MISSING_RETIRED + 
                                ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY + 
                                EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH + 
                                MARRIED + REALLY_LIKE_WORKING + LIKE_WORKING + DISLIKE_WORKING,
                              data = df_2sls)
#summary(tsls.most.correlated, vcov = sandwich, diagnostics = TRUE)
#summary(tsls.retired, vcov = sandwich, diagnostics = TRUE)
#summary(tsls.working_spouse, vcov = sandwich, diagnostics = TRUE)
AIC(lm.full.significant)
## [1] 48348.64
AIC(tsls.most.correlated)
## [1] 48594.19
AIC(tsls.retired)
## [1] 48606.3
AIC(tsls.working_spouse)
## [1] 48445.39
# None of these models has a lower AIC than the original OLS model
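
For reference, a manual two-stage fit replaces the endogenous regressor with its first-stage fitted values. The sketch below is a minimal illustration on simulated data with one instrument; the variables `z`, `u`, `x`, and `y` are hypothetical and unrelated to the survey data.

```r
set.seed(123)
n <- 5000
z <- rnorm(n)                    # instrument: affects y only through x
u <- rnorm(n)                    # unobserved confounder
x <- 0.8 * z + u + rnorm(n)      # endogenous regressor
y <- 1 + 2 * x + u + rnorm(n)    # true effect of x on y is 2

# Naive OLS is biased because x is correlated with the error term (via u)
ols_slope <- coef(lm(y ~ x))[["x"]]

# Stage 1: regress the endogenous variable on the instrument
first_stage <- lm(x ~ z)
x_hat <- fitted(first_stage)

# Stage 2: regress the outcome on the stage-1 fitted values
second_stage <- lm(y ~ x_hat)
tsls_slope <- coef(second_stage)[["x_hat"]]

cat("OLS slope:", ols_slope, " 2SLS slope:", tsls_slope, "\n")
```

The point estimate from the second stage is the 2SLS estimate, but the standard errors that `lm()` reports for it are not valid 2SLS standard errors; a dedicated routine such as `ivreg()` from the AER package computes them correctly.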

2-Part Model

# Part 2 sample: respondents with positive weekly work hours
df_nonzero_wwh <- subset(df_reg_significant, WEEKLY_WORK_HOURS > 0)
# Part 1 sample: binary outcome. Despite its name, ZERO_WWH is 1 when hours are positive and 0 otherwise
df_zero_wwh <- df_reg_significant
df_zero_wwh$ZERO_WWH <- ifelse(df_zero_wwh$WEEKLY_WORK_HOURS > 0, 1, 0)
df_zero_wwh <- subset(df_zero_wwh, select = -WEEKLY_WORK_HOURS)
# Log-transformed outcome for the positive-hours sample
df_logged_wwh <- df_nonzero_wwh
df_logged_wwh$LOG_WWH <- log(df_logged_wwh$WEEKLY_WORK_HOURS)
df_logged_wwh <- subset(df_logged_wwh, select = -WEEKLY_WORK_HOURS)
hist(df_reg_significant$WEEKLY_WORK_HOURS)

hist(df_logged_wwh$LOG_WWH)

probit.model <- glm(ZERO_WWH ~ ., 
                    family = binomial(link = "probit"), 
                    data = df_zero_wwh)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
logit.model <- glm(ZERO_WWH ~ ., 
                   family = "binomial", 
                   data = df_zero_wwh)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
lm.nonzero <- lm(WEEKLY_WORK_HOURS ~ .,
                 data = df_nonzero_wwh)
log.nonzero <- lm(log(WEEKLY_WORK_HOURS) ~ .,
                  data = df_nonzero_wwh)
gamma.nonzero <- glm(WEEKLY_WORK_HOURS ~ .,
                     family = Gamma(link = "log"),
                     data = df_nonzero_wwh)
poisson.model <- glm(WEEKLY_WORK_HOURS ~ ., 
                     family = "poisson",
                     data = df_reg_significant)
poisson.model.truncated <- glm(WEEKLY_WORK_HOURS ~ .,
                               family = "poisson",
                               data = df_nonzero_wwh)
nb2.model <- glm.nb(WEEKLY_WORK_HOURS ~ ., 
                    data = df_reg_significant)
nb2.model.truncated <- glm.nb(WEEKLY_WORK_HOURS ~ .,
                              data = df_nonzero_wwh)

cat("\nProbit Model AIC: ", AIC(probit.model))
## 
## Probit Model AIC:  131.8006
cat("\nLogit Model AIC: ", AIC(logit.model))
## 
## Logit Model AIC:  131.5922
cat("\nNon-zero Linear Model AIC: ", AIC(lm.nonzero))
## 
## Non-zero Linear Model AIC:  48263.71
cat("\nNon-zero Log Model AIC: ", AIC(log.nonzero))
## 
## Non-zero Log Model AIC:  8758.231
cat("\nNon-zero Gamma Model AIC: ", AIC(gamma.nonzero))
## 
## Non-zero Gamma Model AIC:  50250.06
cat("\nPoisson Model AIC: ", AIC(poisson.model))
## 
## Poisson Model AIC:  61600.67
cat("\nTruncated Poisson Model AIC: ", AIC(poisson.model.truncated))
## 
## Truncated Poisson Model AIC:  61285.99
cat("\nNegative Binomial 2 Model AIC: ", AIC(nb2.model))
## 
## Negative Binomial 2 Model AIC:  49660.04
cat("\nTruncated Negative Binomial 2 Model AIC: ", AIC(nb2.model.truncated))
## 
## Truncated Negative Binomial 2 Model AIC:  49509.87
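
The log model's AIC (8758.231) is computed on the log scale, so it cannot be compared directly with the AIC values of models for the untransformed hours. Since the density of y equals the density of log(y) divided by y, the AIC on the original scale is the log-scale AIC plus 2 * sum(log(y)). A minimal sketch on simulated data (all names hypothetical):

```r
set.seed(7)
n <- 400
x <- rnorm(n)
y <- exp(3 + 0.3 * x + rnorm(n, sd = 0.4))   # positive, log-normal outcome

fit_level <- lm(y ~ x)        # model for y itself
fit_log   <- lm(log(y) ~ x)   # model for log(y)

aic_log_raw <- AIC(fit_log)   # on the log scale, not comparable with fit_level

# Jacobian adjustment: put the log model's AIC on the scale of y
aic_log_on_y_scale <- aic_log_raw + 2 * sum(log(y))

cat("AIC level model:          ", AIC(fit_level), "\n")
cat("AIC log model (raw):      ", aic_log_raw, "\n")
cat("AIC log model (y scale):  ", aic_log_on_y_scale, "\n")
```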
summary(logit.model)
## 
## Call:
## glm(formula = ZERO_WWH ~ ., family = "binomial", data = df_zero_wwh)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -4.0812   0.0000   0.0000   0.0222   0.9516  
## 
## Coefficients:
##                                  Estimate Std. Error z value Pr(>|z|)
## (Intercept)                     3.970e+01  1.336e+04   0.003    0.998
## AGE                             2.261e-02  7.093e-02   0.319    0.750
## HEART_CONDITION                -6.084e-01  9.855e-01  -0.617    0.537
## ANY_DEPENDENTS                 -1.139e-01  1.267e+00  -0.090    0.928
## WORKING_SPOUSE                  1.844e+01  1.956e+03   0.009    0.992
## WEEKS_PAID_VACATION             2.753e-01  5.107e-01   0.539    0.590
## REDUCE_PAID_WORK_HOURS         -8.340e-01  1.333e+00  -0.626    0.532
## MEDICARE                        8.492e-01  1.270e+00   0.669    0.504
## MEDICAID                       -7.363e-01  1.251e+00  -0.589    0.556
## HOSPITAL_EXPENSES              -1.672e-04  2.389e-04  -0.700    0.484
## RETIRED                        -1.452e+00  1.221e+00  -1.189    0.234
## MISSING_WORK_ENJOYMENT         -2.095e+01  9.878e+03  -0.002    0.998
## MISSING_ANY_DEPENDENTS          2.078e+01  4.690e+03   0.004    0.996
## MISSING_WORKING_SPOUSE          3.407e-01  1.046e+00   0.326    0.745
## MISSING_WEEKS_PAID_VACATION    -4.641e-01  9.763e-01  -0.475    0.635
## MISSING_REDUCE_PAID_WORK_HOURS  3.784e-01  1.610e+00   0.235    0.814
## MISSING_RETIRED                 1.962e+01  9.451e+03   0.002    0.998
## ACTIVE_ONCE_WEEKLY             -9.583e-01  1.010e+00  -0.949    0.343
## ACTIVE_MORE_THAN_ONCE_WEEKLY    8.136e-01  1.019e+00   0.798    0.425
## ACTIVE_DAILY                    1.782e+01  3.786e+03   0.005    0.996
## EXCELLENT_HEALTH               -1.958e+01  8.989e+03  -0.002    0.998
## VERY_GOOD_HEALTH               -1.950e+01  8.989e+03  -0.002    0.998
## GOOD_HEALTH                    -1.688e+01  8.989e+03  -0.002    0.999
## FAIR_HEALTH                    -1.817e+01  8.989e+03  -0.002    0.998
## MARRIED                         1.223e+00  1.194e+00   1.024    0.306
## REALLY_LIKE_WORKING            -1.609e+01  9.878e+03  -0.002    0.999
## LIKE_WORKING                   -1.637e+01  9.878e+03  -0.002    0.999
## DISLIKE_WORKING                 3.736e-02  1.094e+04   0.000    1.000
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 135.464  on 6146  degrees of freedom
## Residual deviance:  75.592  on 6119  degrees of freedom
## AIC: 131.59
## 
## Number of Fisher Scoring iterations: 23
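
The log model summary that follows has 6110 residual degrees of freedom and 28 coefficients, which implies 6138 positive-hours rows, so only 9 of the 6147 respondents report zero hours. With so few observations in one class and 28 parameters, the logit is close to separated, which is consistent with the warning above and with standard errors in the thousands. The sketch below reproduces that pattern on simulated data; the data frame `sim` and its columns are hypothetical.

```r
set.seed(11)
n <- 2000
sim <- data.frame(x = rnorm(n), flag = rbinom(n, 1, 0.3))

# Nearly everyone has a positive outcome; the 5 zeros occur only when flag == 0
sim$positive <- 1L
sim$positive[sample(which(sim$flag == 0), 5)] <- 0L

# Check class balance before fitting a binary model
class_counts <- table(sim$positive)
print(class_counts)

# flag == 1 predicts a positive outcome perfectly (quasi-complete separation)
sep_fit <- suppressWarnings(
  glm(positive ~ x + flag, family = binomial, data = sim)
)
se_flag <- summary(sep_fit)$coefficients["flag", "Std. Error"]
cat("Std. error of flag:", se_flag, "\n")
```

Tabulating the outcome before fitting is a cheap way to see whether the first part of a two-part model has enough zero observations to be estimated.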
summary(log.nonzero)
## 
## Call:
## lm(formula = log(WEEKLY_WORK_HOURS) ~ ., data = df_nonzero_wwh)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6494 -0.0918  0.0336  0.2165  2.3048 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     3.992e+00  1.071e-01  37.269  < 2e-16 ***
## AGE                            -7.878e-03  1.414e-03  -5.573 2.61e-08 ***
## HEART_CONDITION                 1.430e-02  1.952e-02   0.733  0.46385    
## ANY_DEPENDENTS                 -7.294e-03  1.700e-02  -0.429  0.66779    
## WORKING_SPOUSE                 -3.108e-02  2.181e-02  -1.425  0.15416    
## WEEKS_PAID_VACATION             2.203e-02  2.458e-03   8.962  < 2e-16 ***
## REDUCE_PAID_WORK_HOURS         -2.969e-02  1.541e-02  -1.927  0.05406 .  
## MEDICARE                       -4.970e-02  2.347e-02  -2.118  0.03425 *  
## MEDICAID                       -1.403e-01  2.299e-02  -6.106 1.09e-09 ***
## HOSPITAL_EXPENSES               7.212e-06  5.108e-06   1.412  0.15804    
## RETIRED                        -5.912e-01  1.933e-02 -30.588  < 2e-16 ***
## MISSING_WORK_ENJOYMENT         -1.195e+00  6.834e-02 -17.489  < 2e-16 ***
## MISSING_ANY_DEPENDENTS          9.479e-03  3.012e-02   0.315  0.75300    
## MISSING_WORKING_SPOUSE         -7.139e-02  2.245e-02  -3.179  0.00148 ** 
## MISSING_WEEKS_PAID_VACATION     8.348e-02  3.450e-02   2.420  0.01556 *  
## MISSING_REDUCE_PAID_WORK_HOURS -1.302e-01  3.573e-02  -3.645  0.00027 ***
## MISSING_RETIRED                 1.115e+00  6.373e-02  17.502  < 2e-16 ***
## ACTIVE_ONCE_WEEKLY              1.849e-02  1.989e-02   0.930  0.35262    
## ACTIVE_MORE_THAN_ONCE_WEEKLY   -3.278e-03  1.630e-02  -0.201  0.84063    
## ACTIVE_DAILY                   -4.708e-03  2.240e-02  -0.210  0.83355    
## EXCELLENT_HEALTH                2.526e-01  4.958e-02   5.095 3.60e-07 ***
## VERY_GOOD_HEALTH                2.364e-01  4.695e-02   5.035 4.93e-07 ***
## GOOD_HEALTH                     2.517e-01  4.660e-02   5.403 6.82e-08 ***
## FAIR_HEALTH                     2.187e-01  4.805e-02   4.551 5.44e-06 ***
## MARRIED                        -7.448e-02  2.388e-02  -3.119  0.00182 ** 
## REALLY_LIKE_WORKING            -1.074e-02  4.920e-02  -0.218  0.82714    
## LIKE_WORKING                   -1.460e-02  4.826e-02  -0.302  0.76234    
## DISLIKE_WORKING                -1.779e-02  5.241e-02  -0.339  0.73425    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4927 on 6110 degrees of freedom
## Multiple R-squared:  0.388,  Adjusted R-squared:  0.3853 
## F-statistic: 143.4 on 27 and 6110 DF,  p-value: < 2.2e-16
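
A two-part model combines its pieces as E[y | x] = P(y > 0 | x) * E[y | y > 0, x]. The sketch below shows one way to form that prediction on simulated data, assuming a logit first part, a log-linear second part, and Duan's smearing factor for the retransformation; none of these names or choices are taken from the analysis above.

```r
set.seed(3)
n <- 1000
sim <- data.frame(x = rnorm(n))
sim$works <- rbinom(n, 1, plogis(1 + sim$x))
sim$hours <- ifelse(sim$works == 1,
                    exp(3 + 0.2 * sim$x + rnorm(n, sd = 0.3)),
                    0)

# Part 1: probability of any positive hours
part1 <- glm(works ~ x, family = binomial, data = sim)

# Part 2: log hours among those with positive hours
part2 <- lm(log(hours) ~ x, data = subset(sim, hours > 0))

# Smearing factor corrects the bias from exponentiating a log-scale prediction
smear <- mean(exp(residuals(part2)))

p_positive        <- predict(part1, newdata = sim, type = "response")
hours_if_positive <- exp(predict(part2, newdata = sim)) * smear
expected_hours    <- p_positive * hours_if_positive

cat("Mean observed hours: ", mean(sim$hours), "\n")
cat("Mean predicted hours:", mean(expected_hours), "\n")
```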