Question 1

In the context of the following regression equation, define what it means for x2 to be an endogenous explanatory variable. If x2 is endogenous, what does this mean for estimation of B2? That is, how does endogeneity affect the point estimate and the standard error?

y = B0 + B1x1 + B2x2 + E

ANSWER:

If x2 is endogenous, it is correlated with the error term E, so that Cov(x2, E) ≠ 0. This typically arises from an omitted variable that affects both x2 and y, from simultaneity between x2 and y, or from measurement error in x2. The original equation then violates the OLS assumption that the regressors are uncorrelated with the unobserved determinants of y. As a result, the OLS estimate of B2 is biased and inconsistent: it does not converge to the true B2 even as the sample size grows.

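Endogeneity in x2 shifts the point estimate of B2 away from its true value, in a direction that depends on the sign of the correlation between x2 and E. Because x1 and x2 are generally correlated, the bias can also spread to the estimate of B1. If the endogeneity is not accounted for, the estimated effect of x2 confounds the true effect with the influence of whatever is captured in E.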
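The size of the inconsistency can be written out explicitly. The expression below is a sketch under two assumptions that are not part of the original question: x1 is exogenous, and r2 is notation introduced here for the part of x2 that is not linearly explained by x1 and a constant.

```latex
\operatorname{plim}\,\hat{B}_2^{\mathrm{OLS}}
  = B_2 + \frac{\operatorname{Cov}(x_2, E)}{\operatorname{Var}(r_2)},
\qquad
r_2 = x_2 - \operatorname{L}(x_2 \mid 1, x_1)
```

The second term is zero only when Cov(x2, E) = 0, and it does not shrink as the sample grows, which is why the problem is one of inconsistency rather than sampling noise.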

The standard errors that OLS reports when an endogenous explanatory variable is left unaccounted for are typically smaller than those from a model that corrects for it with Instrumental Variables (IVs), but they are misleading because they measure precision around an inconsistent estimate. An instrumental variable must be uncorrelated with the error term E, so that it affects the dependent variable only through the endogenous variable, and it must be strongly correlated with the endogenous variable. If the IV is only weakly correlated with the endogenous explanatory variable, it is a weak instrument, which produces imprecise estimates that are biased toward the OLS estimate.

IVs are used within Two-stage Least Squares (2SLS) and Two-stage Residual Inclusion (2SRI) estimation to account for endogenous explanatory variables and produce consistent estimates. 2SLS first regresses the endogenous explanatory variable on the instrument and the exogenous variables, then plugs the fitted values into the original equation for the dependent variable (in this case y). 2SRI runs the same first-stage regression but keeps its residuals, which are then added as an extra regressor to the original model predicting y. Both approaches help account for endogeneity, but 2SLS is more common because its standard errors are available in closed form, whereas 2SRI standard errors usually require bootstrapping.
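The two procedures can be sketched by hand on simulated data. Everything in the block below is an assumption made for illustration: the variables `x1`, `x2`, `z`, `u`, and `y` are generated rather than taken from the homework dataset, and the true value of B2 is set to 3.

```r
# Simulated data only; none of these variables come from the homework dataset.
set.seed(42)
n  <- 5000
x1 <- rnorm(n)                           # exogenous control
z  <- rnorm(n)                           # instrument: shifts x2, unrelated to the error
u  <- rnorm(n)                           # unobserved factor that makes x2 endogenous
x2 <- 0.5 * x1 + 1.0 * z + u + rnorm(n)
y  <- 1 + 2 * x1 + 3 * x2 - 2 * u + rnorm(n)   # true B2 = 3; the error term is -2u + noise

# Naive OLS: x2 is correlated with the error, so the estimate of B2 is inconsistent
ols <- lm(y ~ x1 + x2)

# First stage: regress the endogenous variable on the instrument and the control
stage1 <- lm(x2 ~ x1 + z)
x2_hat <- fitted(stage1)
v_hat  <- resid(stage1)

# 2SLS: replace x2 with its first-stage fitted values
tsls <- lm(y ~ x1 + x2_hat)

# 2SRI: keep x2 and add the first-stage residuals as an extra regressor
tsri <- lm(y ~ x1 + x2 + v_hat)

b2 <- c(OLS  = unname(coef(ols)["x2"]),
        TSLS = unname(coef(tsls)["x2_hat"]),
        TSRI = unname(coef(tsri)["x2"]))
print(round(b2, 3))
```

In a linear model like this one, the 2SLS and 2SRI point estimates of B2 coincide; the two estimators differ once the second stage is nonlinear. The standard errors printed by `summary()` for these hand-built second stages are not valid, which is one reason to use a dedicated IV routine or the bootstrap.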

Question 2

Find a variable in the dataset that you think may serve as a reasonable instrumental variable. Summarize the dependent variable, the endogenous explanatory variable, and the instrumental variable.

ANSWER: Picking the Endogenous-IV Variables

From the cleaned variables of my dataset, we will be looking at: Weekly Work Hours (WWH), Age, Health Rating (RH), any Heart Conditions (HC), Frequency of Activity (AF), Enjoyment of Work (WE), and Hospital Expenses (HE). AF and RH are both categorical variables that will be broken into dummy variables below.

After running a correlation matrix (1 page below), we see that the following pairs of variables have the highest correlations:

* Age and Weekly Work Hours: corr = -0.31
* Age and Heart Condition: corr = 0.2
* Rate Health and Activity Frequency: corr = -0.21

For the sake of this homework, we will focus on the relationship between Age and Heart Condition. Age will be our endogenous variable due to its high correlation with Weekly Work Hours (our dependent variable), while Heart Condition will be our instrumental variable because it is relatively uncorrelated with WWH (-0.07). In theory it does not make sense that Heart Condition is an input to Age; however, the presence of a Heart Condition is a good predictor of Age, so this relationship will work for the sake of this homework.

Alternatively, there is a comparatively strong correlation between Health Rating and Activity Frequency. In this case, both variables are relatively uncorrelated with WWH (AF: -0.1, RH: -0.05), so we can use theory to decide which one is endogenous. Activity Frequency will be treated as endogenous while Health Rating is the instrumental variable. This is because an individual's Health Rating is a direct input into how active they are on a monthly basis, and Activity Frequency is also less correlated with Age. However, this will not be the main endogenous-IV pair in this homework because both of these variables are categorical and will be split into dummies. My understanding of how the endogenous-IV relationship works does not extend to many indicator variables, but I do want to see whether it is significant in this case.

Summary Statistics on these variables can be found 2 pages below.

ANSWER: Dependent, Endogenous, IV Variable Summaries

In total, there are 6054 observations of all variables.

To begin, we look at the dependent variable, Weekly Work Hours (WWH). This variable has a min of 0, a max of 168, a median of 40, and a mean of 37.3. All of this makes sense: there are jobs, like being a lineman, that are paid for all hours of a week during peak season. The mean sits below the median, which suggests that low-hours observations pull the mean down, even though the maximum of 168 shows a long upper tail.

Our first endogenous variable is Age, which has a minimum of 50, a maximum of 90, a median of 59, and a mean of 60.6. Age is cut off at 50 because we are only considering those who are eligible for AARP benefits in our final study. The mean exceeds the median, which indicates some right skew: most of the mass is at the younger ages, with a long tail toward older ages.

Our first instrumental variable is Heart Condition, a binary variable that is true when an individual reports a current or past heart condition. 775 individuals (12.8%) report a presence or history of a heart condition, meaning there is a significant mass at 0 for this variable.

Our secondary Endogenous Variable is Activity Frequency. This is a categorical variable wherein 0 represents inactivity, 1 represents weekly activity, 2 represents multiple times weekly activity, 3 represents daily activity, and there is a paired indicator variable for missing records. When split into dummy variables, there are 7 missing observations, 1461 inactive individuals, 1059 individuals active weekly, 2778 individuals active more than once weekly, and 756 individuals active daily. This shows that there is a significant skew towards more active people in this set.

Finally, our secondary Instrument Variable is Health Rating. This is a categorical variable where 0 represents missing, 1 represents excellent health, 2 represents very good health, 3 represents good health, 4 represents fair health, and 5 represents poor health. When split into dummy variables, there are 3 missing observations, 670 individuals with excellent health, 2184 with very good health, 2192 with good health, 886 with fair health, and 119 with poor health. Overall this is a pretty normal distribution when looking at them in aggregate.
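The split of a categorical code into dummy variables can be sketched as follows. The `activity` vector is toy data invented for this example, and the column labels simply mirror the naming pattern used in this report.

```r
# Toy data only: a hypothetical 0-3 activity code, not the homework dataset
activity <- c(0, 2, 2, 1, 3, 2, 0, 1)
activity_f <- factor(activity, levels = 0:3,
                     labels = c("NOT_ACTIVE", "ACTIVE_ONCE_WEEKLY",
                                "ACTIVE_MORE_THAN_ONCE_WEEKLY", "ACTIVE_DAILY"))

# One 0/1 indicator column per category (no intercept, so no level is dropped)
dummies <- model.matrix(~ activity_f - 1)
colnames(dummies) <- levels(activity_f)

# Column sums of the indicators equal the category counts
counts <- colSums(dummies)
print(counts)

# In a regression one category is left out as the reference group
dummies_for_model <- dummies[, -1]
```

Leaving one category out avoids perfect collinearity with the intercept; the regressions in this report do the same by omitting the inactive and poor-health indicators.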

Creating a Correlation Matrix Visualization

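# Packages assumed by the code in this report (the setup chunk is not shown)
library(reshape2)  # melt()
library(ggplot2)   # ggplot()
library(AER)       # ivreg(); assumed, the ivreg package also provides it
library(lmtest)    # coeftest()
library(sandwich)  # sandwich(), vcovHC()

df <- read.csv("HSV2018_Hw3.csv")
df_corr <- read.csv("HSV2018_Hw3_core.csv")
correlation_matrix <- round(cor(df_corr[, 2:8]), 2)
melted_correlation_matrix <- melt(correlation_matrix)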

get_lower_tri <- function(correlation_matrix){
  correlation_matrix[upper.tri(correlation_matrix)] <- NA
  return(correlation_matrix)
}

get_upper_tri <- function(correlation_matrix){
  correlation_matrix[lower.tri(correlation_matrix)] <- NA
  return(correlation_matrix)
}

# Reorder rows and columns by hierarchical clustering so similar variables sit together
reorder_correlation_matrix <- function(correlation_matrix){
  dd <- as.dist((1-correlation_matrix)/2)
  hc <- hclust(dd)
  return(correlation_matrix[hc$order, hc$order])
}

correlation_matrix <- reorder_correlation_matrix(correlation_matrix)
upper_triangle <- get_upper_tri(correlation_matrix)
melted_correlation_matrix <- melt(upper_triangle, na.rm=TRUE)

ggheatmap <- ggplot(melted_correlation_matrix, aes(Var2, Var1, fill = value))+
 geom_tile(color = "white")+
 scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
   midpoint = 0, limit = c(-1,1), space = "Lab", 
    name="Pearson\nCorrelation") +
  theme_minimal()+ 
 theme(axis.text.x = element_text(angle = 45, vjust = 1, 
    size = 8, hjust = 1))+
 coord_fixed() + 
 geom_text(aes(Var2, Var1, label = value), color = "black", size = 4) +
  theme(
  axis.title.x = element_blank(),
  axis.title.y = element_blank(),
  panel.grid.major = element_blank(),
  panel.border = element_blank(),
  panel.background = element_blank(),
  axis.ticks = element_blank(),
  legend.justification = c(1, 0),
  legend.position = c(0.6, 0.7),
  legend.direction = "horizontal")+
  guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
                title.position = "top", title.hjust = 0.5))
# Print the heatmap
print(ggheatmap)

## Summary statistics for first group: Dependent, Endogenous, IV
##  WEEKLY_WORK_HOURS      AGE        HEART_CONDITION
##  Min.   :  0.00    Min.   :50.00   Min.   :0.000  
##  1st Qu.: 30.00    1st Qu.:55.00   1st Qu.:0.000  
##  Median : 40.00    Median :59.00   Median :0.000  
##  Mean   : 37.32    Mean   :60.59   Mean   :0.128  
##  3rd Qu.: 45.00    3rd Qu.:64.00   3rd Qu.:0.000  
##  Max.   :168.00    Max.   :90.00   Max.   :1.000
## Number of individuals Who have had a Heart Condition:  775 /6054
## 
## 
##  Summary statistics for second group: Dependent, Endogenous, IV
##       AGE         RATE_HEALTH    ACTIVITY_FREQUENCY
##  Min.   :50.00   Min.   :0.000   Min.   :0.000     
##  1st Qu.:55.00   1st Qu.:2.000   1st Qu.:1.000     
##  Median :59.00   Median :3.000   Median :2.000     
##  Mean   :60.59   Mean   :2.602   Mean   :1.467     
##  3rd Qu.:64.00   3rd Qu.:3.000   3rd Qu.:2.000     
##  Max.   :90.00   Max.   :5.000   Max.   :3.000
## 
##  Decomposed Rate of Health Statistics: Endogenous
##  MISSING_RATE_HEALTH EXCELLENT_HEALTH VERY_GOOD_HEALTH  GOOD_HEALTH    
##  Min.   :0.0000000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.0004955   Mean   :0.1107   Mean   :0.3608   Mean   :0.3621  
##  3rd Qu.:0.0000000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##   FAIR_HEALTH      POOR_HEALTH     
##  Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.0000   1st Qu.:0.00000  
##  Median :0.0000   Median :0.00000  
##  Mean   :0.1463   Mean   :0.01966  
##  3rd Qu.:0.0000   3rd Qu.:0.00000  
##  Max.   :1.0000   Max.   :1.00000
## 
## Number of Missing in Rate Health:  3 /6054
## Number of Excellent in Rate Health:  670 /6054
## Number of Very Good in Rate Health:  2184 /6054
## Number of Good in Rate Health:  2192 /6054
## Number of Fair in Rate Health:  886 /6054
## Number of Poor in Rate Health:  119 /6054
## 
##  Decomposed Activity Frequency Statistics: IV
##  MISSING_ACTIVITY_FREQUENCY ACTIVE_ONCE_WEEKLY ACTIVE_MORE_THAN_ONCE_WEEKLY
##  Min.   :0.000000           Min.   :0.0000     Min.   :0.0000              
##  1st Qu.:0.000000           1st Qu.:0.0000     1st Qu.:0.0000              
##  Median :0.000000           Median :0.0000     Median :0.0000              
##  Mean   :0.001156           Mean   :0.1749     Mean   :0.4589              
##  3rd Qu.:0.000000           3rd Qu.:0.0000     3rd Qu.:1.0000              
##  Max.   :1.000000           Max.   :1.0000     Max.   :1.0000              
##   ACTIVE_DAILY      NOT_ACTIVE    
##  Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000  
##  Mean   :0.1249   Mean   :0.2413  
##  3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000
## 
## Number of Missing in Activity Frequency:  7 /6054
## Number of Once Weekly in Activity Frequency:  1059 /6054
## Number of More than Once Weekly in Activity Frequency:  2778 /6054
## Number of Daily in Activity Frequency:  756 /6054
## Number of Inactive in Activity Frequency:  1461 /6054

Question 3

Run a regression that assumes that the endogenous variable is exogenous. Then, run a regression that corrects for the endogeneity of that variable. Perform tests to determine if the IV is sufficient and whether the explanatory variable is endogenous. Use both the 2SLS and 2SRI estimators that we discussed in class. State in words what you conclude from these analyses.

ANSWER:

All regression results can be found on the last page of this report.

The following regressions were run:

* OLS1: WWH on all variables
* OLS2: WWH on all but the endogenous explanatory variable (Age)
* OLS3: WWH on all but the instrumental variable (Heart Condition)
* OLS4: Age on all but WWH
* 2SLS1: WWH as dependent, Age as endogenous, Heart Condition as IV
* 2SLS2: WWH as dependent, Activity Frequency as endogenous, Health Rating as IV
* 2SLS3: WWH as dependent, Age and Activity Frequency as endogenous, Heart Condition and Health Rating as IV
* 2SRI: WWH as dependent, Age as endogenous, Heart Condition as IV

Looking at the F-statistics of the OLS estimators, we see that the best equation uses all variables but Heart Condition as inputs to WWH (OLS3). This resulted in an F-statistic of 53.063, an RMSE of 13.62, and an AIC of 48828.6. Meanwhile, OLS2 had an F-statistic of 7.017, an RMSE of 14.31, and an AIC of 49433.3, and OLS1 had an F-statistic of 49.517, an RMSE of 13.62, and an AIC of 48830.6. From these we see that removing Heart Condition results in the best OLS model, which is consistent with its coefficient in OLS1 being statistically insignificant (p = 0.95). OLS1 and OLS3 show nearly identical significance patterns across the variables they share.
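The fit measures compared above can be computed from any pair of `lm` fits. The sketch below uses the built-in `mtcars` data as a stand-in for the homework data, and the helper `rmse()` is a name introduced here; it may not match how the results table in this report computed RMSE.

```r
# Built-in mtcars data stands in for the homework data
full    <- lm(mpg ~ wt + hp + qsec, data = mtcars)
reduced <- lm(mpg ~ wt + hp, data = mtcars)

# RMSE: square root of the mean squared residual (no degrees-of-freedom adjustment)
rmse <- function(fit) sqrt(mean(resid(fit)^2))

comparison <- data.frame(
  model = c("full", "reduced"),
  rmse  = c(rmse(full), rmse(reduced)),
  aic   = c(AIC(full), AIC(reduced)),
  f     = c(unname(summary(full)$fstatistic["value"]),
            unname(summary(reduced)$fstatistic["value"]))
)
print(comparison)
```

For nested models the larger one never has a higher in-sample RMSE, so AIC, which adds a penalty of 2 per parameter, is the more informative of the two. An AIC gap of 2.0 with an unchanged RMSE, as between OLS1 and OLS3, is what one extra parameter with essentially no gain in fit would produce.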

Moving to the 2SLS models, we can see that accounting for different endogenous variables had different effects. In 2SLS1, Age was instrumented with Heart Condition, which resulted in an RMSE of 13.62 and an AIC of 48828.7. 2SLS2 instrumented Activity Frequency with Health Rating, resulting in an RMSE of 65.96 and an AIC of 67926.1. Finally, 2SLS3 instrumented both Age and Activity Frequency with Heart Condition and Health Rating, resulting in an RMSE of 45.24 and an AIC of 63356. From this we see that the best model uses Age as the endogenous explanatory variable and Heart Condition as its instrumental variable.

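A quick linear model with Age as the dependent variable and all other variables except Weekly Work Hours as regressors (OLS4) serves as the first stage. Heart Condition has a coefficient of 4.41 with a robust t-statistic of 13.72, so it is a strong predictor of Age. The fitted values match the sample mean of Age (60.59) and track the lower three quarters of the distribution more closely (a fitted third quartile of 60.84 against an observed 64), but the model falls apart in the upper quartile, where the fitted maximum is 68.97 against an observed maximum of 90.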
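Instrument relevance is usually judged with an F-test on the excluded instrument in the first stage. The block below is a minimal sketch on simulated data; the variables `x1`, `z`, and `x2` are placeholders, and the F-test shown assumes homoskedastic errors, unlike the robust standard errors used elsewhere in this report.

```r
# Simulated data only; variable names are placeholders
set.seed(1)
n  <- 2000
x1 <- rnorm(n)                             # exogenous control
z  <- rbinom(n, 1, 0.13)                   # binary instrument
x2 <- 60 + 4 * z - x1 + rnorm(n, sd = 6)   # endogenous regressor depends on z

# First stage with and without the excluded instrument
stage1_full       <- lm(x2 ~ x1 + z)
stage1_restricted <- lm(x2 ~ x1)

# F-test of the null that the instrument has no effect in the first stage
f_table <- anova(stage1_restricted, stage1_full)
first_stage_f <- f_table$F[2]
print(first_stage_f)

# With a single instrument this F equals the squared t-statistic on z
t_z <- summary(stage1_full)$coefficients["z", "t value"]
```

A common rule of thumb treats a first-stage F above 10 as evidence against a weak instrument. With one instrument the statistic is the square of its t-statistic, so a t-statistic of 13.72 corresponds to an F of roughly 188.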

Because the pairing of Age as the endogenous explanatory variable and Heart Condition as the instrumental variable showed the best results, the 2SRI model was run only with these variables. The summary output for this 2SRI model does not display RMSE or AIC. However, it does show that the model has a very significant J test and that all variable coefficients are significant except work enjoyment and hospital expenses. Even so, I would not use this model or its results empirically.

When comparing the best OLS model (OLS3) and the best 2SLS model (2SLS1), we see that the results are almost exactly the same, with a shared RMSE of 13.62 and an AIC difference of just 0.1. The Age coefficient barely moves (-0.651 in OLS3 versus -0.658 in 2SLS1), although its standard error rises from 0.025 to 0.129. This means that using Heart Condition as an instrumental variable for Age has little impact on the overall model and that, in this case, an OLS model omitting Heart Condition fits as well as a 2SLS model accounting for the endogeneity of Age with Heart Condition.
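Whether the explanatory variable is actually endogenous can be checked with a regression-based Hausman-type test: add the first-stage residuals to the outcome equation and test their coefficient. The sketch below uses simulated data in which endogeneity is built in; all variable names are placeholders rather than the homework variables.

```r
# Simulated data only; names are placeholders, not the homework variables
set.seed(7)
n  <- 5000
x1 <- rnorm(n)                  # exogenous control
z  <- rnorm(n)                  # instrument
u  <- rnorm(n)                  # unobserved confounder
x2 <- 0.5 * x1 + z + u + rnorm(n)
y  <- 1 + 2 * x1 + 3 * x2 - 2 * u + rnorm(n)

# Step 1: first-stage residuals hold the part of x2 not explained by z and x1
v_hat <- resid(lm(x2 ~ x1 + z))

# Step 2: add the residuals to the outcome equation
aug <- lm(y ~ x1 + x2 + v_hat)

# A significant coefficient on v_hat is evidence that x2 is endogenous
p_endog <- summary(aug)$coefficients["v_hat", "Pr(>|t|)"]
print(p_endog)
```

If the coefficient on `v_hat` is not significant, OLS and IV estimates should be close, which is the pattern seen for Age in the comparison of OLS3 and 2SLS1.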

lm1 <- lm(WEEKLY_WORK_HOURS ~ AGE + HEART_CONDITION + WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_RATE_HEALTH + 
            MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + MISSING_WORK_ENJOYMENT + 
            ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY + 
            EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH, data = df)
summary(lm1)
## 
## Call:
## lm(formula = WEEKLY_WORK_HOURS ~ AGE + HEART_CONDITION + WORK_ENJOYMENT + 
##     HOSPITAL_EXPENSES + MISSING_RATE_HEALTH + MISSING_HEART_CONDITION + 
##     MISSING_ACTIVITY_FREQUENCY + MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + 
##     ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY + EXCELLENT_HEALTH + 
##     VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -42.875  -6.174   0.374   6.799 139.105 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  68.8734263  2.0958671  32.862  < 2e-16 ***
## AGE                          -0.6505539  0.0258323 -25.184  < 2e-16 ***
## HEART_CONDITION              -0.0328882  0.5435213  -0.061  0.95175    
## WORK_ENJOYMENT                0.1156849  0.2764814   0.418  0.67566    
## HOSPITAL_EXPENSES             0.0004399  0.0001350   3.259  0.00112 ** 
## MISSING_RATE_HEALTH          17.4700715  7.9746424   2.191  0.02851 *  
## MISSING_HEART_CONDITION       4.6215157  7.8770942   0.587  0.55743    
## MISSING_ACTIVITY_FREQUENCY   -0.9435384  5.1667264  -0.183  0.85510    
## MISSING_WORK_ENJOYMENT       -5.5798776  2.8944104  -1.928  0.05393 .  
## ACTIVE_ONCE_WEEKLY            0.1041171  0.5534886   0.188  0.85080    
## ACTIVE_MORE_THAN_ONCE_WEEKLY -1.1992926  0.4524843  -2.650  0.00806 ** 
## ACTIVE_DAILY                  0.0451878  0.6196061   0.073  0.94186    
## EXCELLENT_HEALTH              9.0973890  1.3710495   6.635 3.52e-11 ***
## VERY_GOOD_HEALTH              8.4795382  1.2945612   6.550 6.22e-11 ***
## GOOD_HEALTH                   8.3960229  1.2881400   6.518 7.70e-11 ***
## FAIR_HEALTH                   6.8556517  1.3323007   5.146 2.75e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.63 on 6038 degrees of freedom
## Multiple R-squared:  0.1095, Adjusted R-squared:  0.1073 
## F-statistic: 49.52 on 15 and 6038 DF,  p-value: < 2.2e-16
lm_test_no_iv <- lm(WEEKLY_WORK_HOURS ~ AGE + WORK_ENJOYMENT + HOSPITAL_EXPENSES + 
                    MISSING_RATE_HEALTH + MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + 
                    MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + 
                    ACTIVE_DAILY + EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH, 
                    data = df)
summary(lm_test_no_iv)
## 
## Call:
## lm(formula = WEEKLY_WORK_HOURS ~ AGE + WORK_ENJOYMENT + HOSPITAL_EXPENSES + 
##     MISSING_RATE_HEALTH + MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + 
##     MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + 
##     ACTIVE_DAILY + EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + 
##     FAIR_HEALTH, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -42.864  -6.170   0.374   6.788 139.113 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  68.8860052  2.0853601  33.033  < 2e-16 ***
## AGE                          -0.6508818  0.0252552 -25.772  < 2e-16 ***
## WORK_ENJOYMENT                0.1160032  0.2764086   0.420  0.67473    
## HOSPITAL_EXPENSES             0.0004393  0.0001346   3.264  0.00110 ** 
## MISSING_RATE_HEALTH          17.4761965  7.9733422   2.192  0.02843 *  
## MISSING_HEART_CONDITION       4.6253226  7.8761932   0.587  0.55706    
## MISSING_ACTIVITY_FREQUENCY   -0.9428554  5.1662878  -0.183  0.85520    
## MISSING_WORK_ENJOYMENT       -5.5799631  2.8941713  -1.928  0.05390 .  
## ACTIVE_ONCE_WEEKLY            0.1040303  0.5534410   0.188  0.85091    
## ACTIVE_MORE_THAN_ONCE_WEEKLY -1.2000500  0.4522739  -2.653  0.00799 ** 
## ACTIVE_DAILY                  0.0444889  0.6194473   0.072  0.94275    
## EXCELLENT_HEALTH              9.1027989  1.3680185   6.654 3.10e-11 ***
## VERY_GOOD_HEALTH              8.4838785  1.2924658   6.564 5.67e-11 ***
## GOOD_HEALTH                   8.3983904  1.2874394   6.523 7.43e-11 ***
## FAIR_HEALTH                   6.8560419  1.3321752   5.147 2.74e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.63 on 6039 degrees of freedom
## Multiple R-squared:  0.1095, Adjusted R-squared:  0.1075 
## F-statistic: 53.06 on 14 and 6039 DF,  p-value: < 2.2e-16
lm_test_no_endg <- lm(WEEKLY_WORK_HOURS ~ HEART_CONDITION + WORK_ENJOYMENT + HOSPITAL_EXPENSES + 
                      MISSING_RATE_HEALTH + MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + 
                      MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + 
                      ACTIVE_DAILY + EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH,
                      data = df)
summary(lm_test_no_endg)
## 
## Call:
## lm(formula = WEEKLY_WORK_HOURS ~ HEART_CONDITION + WORK_ENJOYMENT + 
##     HOSPITAL_EXPENSES + MISSING_RATE_HEALTH + MISSING_HEART_CONDITION + 
##     MISSING_ACTIVITY_FREQUENCY + MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + 
##     ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY + EXCELLENT_HEALTH + 
##     VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.000  -7.418   2.077   6.546 134.129 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  29.1752638  1.4518447  20.095  < 2e-16 ***
## HEART_CONDITION              -2.9048319  0.5585898  -5.200 2.06e-07 ***
## WORK_ENJOYMENT                0.7004888  0.2895886   2.419 0.015596 *  
## HOSPITAL_EXPENSES             0.0005498  0.0001418   3.877 0.000107 ***
## MISSING_RATE_HEALTH          17.5396417  8.3823134   2.092 0.036439 *  
## MISSING_HEART_CONDITION       5.7306301  8.2796495   0.692 0.488880    
## MISSING_ACTIVITY_FREQUENCY   -2.0072555  5.4306729  -0.370 0.711683    
## MISSING_WORK_ENJOYMENT       -6.6849355  3.0420258  -2.198 0.028021 *  
## ACTIVE_ONCE_WEEKLY            0.8445981  0.5809619   1.454 0.146056    
## ACTIVE_MORE_THAN_ONCE_WEEKLY -0.4463680  0.4745764  -0.941 0.346968    
## ACTIVE_DAILY                  0.5382204  0.6509558   0.827 0.408374    
## EXCELLENT_HEALTH              8.0797975  1.4405129   5.609 2.13e-08 ***
## VERY_GOOD_HEALTH              7.2884802  1.3598320   5.360 8.64e-08 ***
## GOOD_HEALTH                   7.3465250  1.3532822   5.429 5.90e-08 ***
## FAIR_HEALTH                   6.4830826  1.4003229   4.630 3.74e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.33 on 6039 degrees of freedom
## Multiple R-squared:  0.01601,    Adjusted R-squared:  0.01373 
## F-statistic: 7.017 on 14 and 6039 DF,  p-value: 1.4e-14
#IV: Age as endogenous and Heart Condition as IV. 
iv1 <- ivreg(WEEKLY_WORK_HOURS ~ AGE + WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_RATE_HEALTH + 
              MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + MISSING_WORK_ENJOYMENT + 
              ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY + 
              EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH | HEART_CONDITION + 
              WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_RATE_HEALTH + 
              MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + MISSING_WORK_ENJOYMENT + 
              ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY + 
              EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH, 
             data = df)
summary(iv1, vcov = sandwich, diagnostics = TRUE)
## 
## Call:
## ivreg(formula = WEEKLY_WORK_HOURS ~ AGE + WORK_ENJOYMENT + HOSPITAL_EXPENSES + 
##     MISSING_RATE_HEALTH + MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + 
##     MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + 
##     ACTIVE_DAILY + EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + 
##     FAIR_HEALTH | HEART_CONDITION + WORK_ENJOYMENT + HOSPITAL_EXPENSES + 
##     MISSING_RATE_HEALTH + MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + 
##     MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + 
##     ACTIVE_DAILY + EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + 
##     FAIR_HEALTH, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -42.8620  -6.1907   0.3787   6.7892 139.2030 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  69.3280315  8.1228809   8.535  < 2e-16 ***
## AGE                          -0.6580037  0.1289197  -5.104 3.43e-07 ***
## WORK_ENJOYMENT                0.1089880  0.2892430   0.377  0.70633    
## HOSPITAL_EXPENSES             0.0004386  0.0001358   3.230  0.00124 ** 
## MISSING_RATE_HEALTH          17.4692748  8.2316992   2.122  0.03386 *  
## MISSING_HEART_CONDITION       4.6088146  5.5026876   0.838  0.40231    
## MISSING_ACTIVITY_FREQUENCY   -0.9313572  6.2093084  -0.150  0.88077    
## MISSING_WORK_ENJOYMENT       -5.5672230  3.2880644  -1.693  0.09048 .  
## ACTIVE_ONCE_WEEKLY            0.0956374  0.5570490   0.172  0.86369    
## ACTIVE_MORE_THAN_ONCE_WEEKLY -1.2079148  0.4812730  -2.510  0.01210 *  
## ACTIVE_DAILY                  0.0395418  0.6663192   0.059  0.95268    
## EXCELLENT_HEALTH              9.1090420  1.4300485   6.370 2.03e-10 ***
## VERY_GOOD_HEALTH              8.4931776  1.3585096   6.252 4.33e-10 ***
## GOOD_HEALTH                   8.4080412  1.3536999   6.211 5.61e-10 ***
## FAIR_HEALTH                   6.8599182  1.3978738   4.907 9.47e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.63 on 6039 degrees of freedom
## Multiple R-Squared: 0.1095,  Adjusted R-squared: 0.1075 
## Wald test: 6.931 on 14 and 6039 DF,  p-value: 2.354e-14
# summary() of an ivreg fit has no $fstatistic component; the overall test is stored in $waldtest
iv1_wald <- summary(iv1)$waldtest
iv1_wald
#IV: Activity Frequency as endogenous and Health Rating as IV. 
iv2 <- ivreg(WEEKLY_WORK_HOURS ~ AGE + HEART_CONDITION + WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_ACTIVITY_FREQUENCY + 
              MISSING_HEART_CONDITION + MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + 
              ACTIVE_DAILY | EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH + AGE + HEART_CONDITION + 
              WORK_ENJOYMENT + HOSPITAL_EXPENSES  + MISSING_HEART_CONDITION + MISSING_WORK_ENJOYMENT + 
              EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH + MISSING_RATE_HEALTH, 
             data = df)
summary(iv2, vcov = sandwich, diagnostics = TRUE)
## 
## Call:
## ivreg(formula = WEEKLY_WORK_HOURS ~ AGE + HEART_CONDITION + WORK_ENJOYMENT + 
##     HOSPITAL_EXPENSES + MISSING_ACTIVITY_FREQUENCY + MISSING_HEART_CONDITION + 
##     MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + 
##     ACTIVE_DAILY | EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + 
##     FAIR_HEALTH + AGE + HEART_CONDITION + WORK_ENJOYMENT + HOSPITAL_EXPENSES + 
##     MISSING_HEART_CONDITION + MISSING_WORK_ENJOYMENT + EXCELLENT_HEALTH + 
##     VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH + MISSING_RATE_HEALTH, 
##     data = df)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -1488.166   -31.891    -8.319    30.229   171.052 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)
## (Intercept)                   7.991e+01  3.658e+02   0.218    0.827
## AGE                          -6.552e-01  9.117e-01  -0.719    0.472
## HEART_CONDITION              -6.226e-01  8.371e+00  -0.074    0.941
## WORK_ENJOYMENT               -1.161e+00  2.303e+01  -0.050    0.960
## HOSPITAL_EXPENSES             4.990e-04  1.972e-03   0.253    0.800
## MISSING_ACTIVITY_FREQUENCY    1.453e+03  3.240e+03   0.449    0.654
## MISSING_HEART_CONDITION      -2.883e+01  3.285e+02  -0.088    0.930
## MISSING_WORK_ENJOYMENT       -1.696e+01  9.201e+01  -0.184    0.854
## ACTIVE_ONCE_WEEKLY           -3.309e+01  9.273e+02  -0.036    0.972
## ACTIVE_MORE_THAN_ONCE_WEEKLY  3.204e+01  5.463e+01   0.586    0.558
## ACTIVE_DAILY                 -9.048e+01  1.074e+03  -0.084    0.933
## 
## Residual standard error: 66.02 on 6043 degrees of freedom
## Multiple R-Squared: -19.9,   Adjusted R-squared: -19.94 
## Wald test: 1.991 on 10 and 6043 DF,  p-value: 0.0303
#IV: Age + Activity Frequency as endogenous and Heart Condition + Health Rating as IV. 
iv3 <- ivreg(WEEKLY_WORK_HOURS ~ AGE + WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_ACTIVITY_FREQUENCY + 
              MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + 
              ACTIVE_DAILY | HEART_CONDITION + EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH + AGE + 
              HEART_CONDITION + WORK_ENJOYMENT + HOSPITAL_EXPENSES  + MISSING_HEART_CONDITION + MISSING_WORK_ENJOYMENT + 
              EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH + MISSING_RATE_HEALTH, 
             data = df)
summary(iv3, vcov = sandwich, diagnostics = TRUE)
#Testing Instrument effects on endogenous variables
lm_age <- lm(AGE ~ HEART_CONDITION + WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_RATE_HEALTH + 
            MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + MISSING_WORK_ENJOYMENT + 
            ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY + 
            EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH, data = df)
coeftest(lm_age, vcov = vcovHC(lm_age, type = "HC0"))
## 
## t test of coefficients:
## 
##                                 Estimate  Std. Error t value  Pr(>|t|)    
## (Intercept)                   6.1022e+01  6.9381e-01 87.9525 < 2.2e-16 ***
## HEART_CONDITION               4.4146e+00  3.2171e-01 13.7222 < 2.2e-16 ***
## WORK_ENJOYMENT               -8.9893e-01  1.2563e-01 -7.1553 9.338e-13 ***
## HOSPITAL_EXPENSES            -1.6895e-04  5.2978e-05 -3.1890  0.001435 ** 
## MISSING_RATE_HEALTH          -1.0694e-01  8.1199e-01 -0.1317  0.895225    
## MISSING_HEART_CONDITION      -1.7049e+00  2.5189e+00 -0.6768  0.498544    
## MISSING_ACTIVITY_FREQUENCY    1.6351e+00  4.4997e+00  0.3634  0.716335    
## MISSING_WORK_ENJOYMENT        1.6986e+00  1.2146e+00  1.3985  0.162006    
## ACTIVE_ONCE_WEEKLY           -1.1382e+00  2.8064e-01 -4.0559 5.057e-05 ***
## ACTIVE_MORE_THAN_ONCE_WEEKLY -1.1574e+00  2.3463e-01 -4.9327 8.326e-07 ***
## ACTIVE_DAILY                 -7.5787e-01  3.2117e-01 -2.3597  0.018320 *  
## EXCELLENT_HEALTH              1.5642e+00  6.8462e-01  2.2847  0.022362 *  
## VERY_GOOD_HEALTH              1.8308e+00  6.5032e-01  2.8153  0.004889 ** 
## GOOD_HEALTH                   1.6132e+00  6.4804e-01  2.4894  0.012822 *  
## FAIR_HEALTH                   5.7270e-01  6.6256e-01  0.8644  0.387418    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
hat_age <- fitted(lm_age) #First-stage fitted values of AGE
summary(df$AGE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   50.00   55.00   59.00   60.59   64.00   90.00
summary(hat_age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   54.75   59.68   60.03   60.59   60.84   68.97
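The first-stage fit above gives a rough sense of instrument strength: HEART_CONDITION has a robust t value of about 13.7, yet the fitted values only span 54.75 to 68.97 while AGE itself spans 50 to 90. One common way to summarize relevance is a partial F-test on the excluded instrument. The sketch below runs on simulated data, so `z`, `w`, and `x` are hypothetical stand-ins and not columns of df. It also uses the classical (non-robust) F statistic, which is an assumption that differs from the HC0 errors used above.

```r
# Simulated data: every name here is hypothetical, not taken from df
set.seed(1)
n <- 2000
z <- rbinom(n, 1, 0.3)                      # candidate instrument (binary)
w <- rnorm(n)                               # exogenous control
x <- 60 + 4 * z - 1 * w + rnorm(n, sd = 7)  # endogenous regressor

first_full    <- lm(x ~ z + w)  # first stage including the instrument
first_reduced <- lm(x ~ w)      # same first stage without the instrument

# Partial F-test for the excluded instrument (classical, not robust)
f_tab <- anova(first_reduced, first_full)
first_stage_F <- f_tab$F[2]
t_on_z <- summary(first_full)$coefficients["z", "t value"]

print(f_tab)
cat("Partial F:", first_stage_F, " squared t on z:", t_on_z^2, "\n")
```

With a single excluded instrument the partial F equals the squared t statistic on that instrument, so a t value near 13.7 would correspond to an F of roughly 188, well above the usual rule of thumb of 10.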
#Testing lm with and without IVs
ct1 <- coeftest(lm1, vcov = vcovHC(lm1, type = "HC0")) #Pre-assigned LM on all variables
ct2 <- coeftest(lm_test_no_iv, vcov = vcovHC(lm_test_no_iv, type = "HC0"))
ct3 <- coeftest(lm_test_no_endg, vcov = vcovHC(lm_test_no_endg)) #No type given, so vcovHC uses its default (HC3) rather than HC0 as above
cat("OLS on all variables: \n")
## OLS on all variables: 
print(summary(ct1)) #summary() of the coefficient matrix gives column-wise quantiles; print() keeps the table layout that cat() flattens
##     Estimate          Std. Error           t value             Pr(>|t|)      
##  Min.   :-5.5799   Min.   :0.000136   Min.   :-23.33774   Min.   :0.00000  
##  1st Qu.:-0.1873   1st Qu.:0.516314   1st Qu.: -0.08017   1st Qu.:0.00000  
##  Median : 0.1099   Median :1.350176   Median :  0.63586   Median :0.02132  
##  Mean   : 7.2283   Mean   :2.092999   Mean   :  2.09276   Mean   :0.30160  
##  3rd Qu.: 8.4169   3rd Qu.:2.502722   3rd Qu.:  5.23635   3rd Qu.:0.71245  
##  Max.   :68.8734   Max.   :8.234344   Max.   : 30.69955   Max.   :0.95534  
cat("\n\nOLS on all variables but Heart Condition: \n")
## 
## 
## OLS on all variables but Heart Condition: 
print(summary(ct2)) #print() keeps the table layout that cat() flattens
##     Estimate          Std. Error           t value            Pr(>|t|)       
##  Min.   :-5.5800   Min.   :0.000135   Min.   :-24.0858   Min.   :0.000000  
##  1st Qu.:-0.3252   1st Qu.:0.496646   1st Qu.: -0.0425   1st Qu.:0.000000  
##  Median : 0.1160   Median :1.349764   Median :  0.8433   Median :0.008701  
##  Mean   : 7.7147   Mean   :2.191557   Mean   :  2.2077   Mean   :0.258024  
##  3rd Qu.: 8.4411   3rd Qu.:2.752492   3rd Qu.:  5.5719   3rd Qu.:0.533281  
##  Max.   :68.8860   Max.   :8.232735   Max.   : 30.9673   Max.   :0.946175  
cat("\n\nOLS on all variables but Age: \n")
## 
## 
## OLS on all variables but Age: 
print(summary(ct3)) #print() keeps the table layout that cat() flattens
##     Estimate          Std. Error            t value            Pr(>|t|)        
##  Min.   :-6.6849   Min.   : 0.000143   Min.   :-4.7044   Min.   :0.0000000  
##  1st Qu.:-0.2229   1st Qu.: 0.593379   1st Qu.: 0.2215   1st Qu.:0.0000015  
##  Median : 0.8446   Median : 1.425131   Median : 1.4836   Median :0.0125885  
##  Mean   : 4.7789   Mean   : 2.615873   Mean   : 2.8182   Mean   :0.1547654  
##  3rd Qu.: 7.3175   3rd Qu.: 2.678130   3rd Qu.: 4.7634   3rd Qu.:0.2597135  
##  Max.   :29.1753   Max.   :12.667362   Max.   :19.1062   Max.   :0.7316252  
#2SRI: Age as endogenous and Heart Condition as IV.
ri1 <- tsri(WEEKLY_WORK_HOURS ~ AGE + WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_RATE_HEALTH + 
              MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + MISSING_WORK_ENJOYMENT + 
              ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY + 
              EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH | HEART_CONDITION + 
              WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_RATE_HEALTH + 
              MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + MISSING_WORK_ENJOYMENT + 
              ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY + 
              EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH, 
             data = df)
summary(ri1)
## 
## GMM fit summary:
## 
## Call:
## gmm::gmm(g = tsriIdentityMoments, x = dat, t0 = t0, vcov = "iid")
## 
## 
## Method:  twoStep 
## 
## Coefficients:
##                                  Estimate     Std. Error   t value    
## Z(Intercept)                      6.1022e+01   6.9381e-01   8.7952e+01
## ZHEART_CONDITION                  4.4146e+00   3.2171e-01   1.3722e+01
## ZWORK_ENJOYMENT                  -8.9893e-01   1.2563e-01  -7.1553e+00
## ZHOSPITAL_EXPENSES               -1.6895e-04   5.2978e-05  -3.1890e+00
## ZMISSING_RATE_HEALTH             -1.0694e-01   8.1199e-01  -1.3170e-01
## ZMISSING_HEART_CONDITION         -1.7049e+00   2.5189e+00  -6.7682e-01
## ZMISSING_ACTIVITY_FREQUENCY       1.6351e+00   4.4997e+00   3.6338e-01
## ZMISSING_WORK_ENJOYMENT           1.6986e+00   1.2146e+00   1.3985e+00
## ZACTIVE_ONCE_WEEKLY              -1.1382e+00   2.8064e-01  -4.0559e+00
## ZACTIVE_MORE_THAN_ONCE_WEEKLY    -1.1574e+00   2.3463e-01  -4.9327e+00
## ZACTIVE_DAILY                    -7.5787e-01   3.2117e-01  -2.3597e+00
## ZEXCELLENT_HEALTH                 1.5642e+00   6.8462e-01   2.2847e+00
## ZVERY_GOOD_HEALTH                 1.8308e+00   6.5032e-01   2.8153e+00
## ZGOOD_HEALTH                      1.6132e+00   6.4804e-01   2.4894e+00
## ZFAIR_HEALTH                      5.7270e-01   6.6256e-01   8.6437e-01
## (Intercept)                       6.9328e+01   8.1229e+00   8.5349e+00
## AGE                              -6.5800e-01   1.2892e-01  -5.1040e+00
## resres                            7.4498e-03   1.3299e-01   5.6018e-02
## resWORK_ENJOYMENT                 1.0899e-01   2.8924e-01   3.7680e-01
## resHOSPITAL_EXPENSES              4.3862e-04   1.3578e-04   3.2305e+00
## resMISSING_RATE_HEALTH            1.7469e+01   8.2317e+00   2.1222e+00
## resMISSING_HEART_CONDITION        4.6088e+00   5.5027e+00   8.3756e-01
## resMISSING_ACTIVITY_FREQUENCY    -9.3136e-01   6.2093e+00  -1.4999e-01
## resMISSING_WORK_ENJOYMENT        -5.5672e+00   3.2881e+00  -1.6932e+00
## resACTIVE_ONCE_WEEKLY             9.5637e-02   5.5705e-01   1.7169e-01
## resACTIVE_MORE_THAN_ONCE_WEEKLY  -1.2079e+00   4.8127e-01  -2.5098e+00
## resACTIVE_DAILY                   3.9542e-02   6.6632e-01   5.9344e-02
## resEXCELLENT_HEALTH               9.1090e+00   1.4300e+00   6.3697e+00
## resVERY_GOOD_HEALTH               8.4932e+00   1.3585e+00   6.2518e+00
## resGOOD_HEALTH                    8.4080e+00   1.3537e+00   6.2112e+00
## resFAIR_HEALTH                    6.8599e+00   1.3979e+00   4.9074e+00
##                                  Pr(>|t|)   
## Z(Intercept)                      0.0000e+00
## ZHEART_CONDITION                  7.4795e-43
## ZWORK_ENJOYMENT                   8.3476e-13
## ZHOSPITAL_EXPENSES                1.4277e-03
## ZMISSING_RATE_HEALTH              8.9522e-01
## ZMISSING_HEART_CONDITION          4.9852e-01
## ZMISSING_ACTIVITY_FREQUENCY       7.1632e-01
## ZMISSING_WORK_ENJOYMENT           1.6195e-01
## ZACTIVE_ONCE_WEEKLY               4.9945e-05
## ZACTIVE_MORE_THAN_ONCE_WEEKLY     8.1092e-07
## ZACTIVE_DAILY                     1.8288e-02
## ZEXCELLENT_HEALTH                 2.2328e-02
## ZVERY_GOOD_HEALTH                 4.8732e-03
## ZGOOD_HEALTH                      1.2795e-02
## ZFAIR_HEALTH                      3.8738e-01
## (Intercept)                       1.4027e-17
## AGE                               3.3258e-07
## resres                            9.5533e-01
## resWORK_ENJOYMENT                 7.0632e-01
## resHOSPITAL_EXPENSES              1.2358e-03
## resMISSING_RATE_HEALTH            3.3821e-02
## resMISSING_HEART_CONDITION        4.0228e-01
## resMISSING_ACTIVITY_FREQUENCY     8.8077e-01
## resMISSING_WORK_ENJOYMENT         9.0425e-02
## resACTIVE_ONCE_WEEKLY             8.6368e-01
## resACTIVE_MORE_THAN_ONCE_WEEKLY   1.2079e-02
## resACTIVE_DAILY                   9.5268e-01
## resEXCELLENT_HEALTH               1.8934e-10
## resVERY_GOOD_HEALTH               4.0566e-10
## resGOOD_HEALTH                    5.2596e-10
## resFAIR_HEALTH                    9.2294e-07
## 
## J-Test: degrees of freedom is 0 
##                 J-test                P-value             
## Test E(g)=0:    1.14569283026132e-14  *******             
## 
## #############
## Information related to the numerical optimization
## Convergence code =  1 
## Function eval. =  502 
## Gradian eval. =  NA 
## 
## Estimates with 95% CI limits:
##                                   Estimate      0.025      0.975
## Z(Intercept)                    61.0220987  5.966e+01  6.238e+01
## ZHEART_CONDITION                 4.4146133  3.784e+00  5.045e+00
## ZWORK_ENJOYMENT                 -0.8989323 -1.145e+00 -6.527e-01
## ZHOSPITAL_EXPENSES              -0.0001689 -2.728e-04 -6.511e-05
## ZMISSING_RATE_HEALTH            -0.1069399 -1.698e+00  1.485e+00
## ZMISSING_HEART_CONDITION        -1.7048772 -6.642e+00  3.232e+00
## ZMISSING_ACTIVITY_FREQUENCY      1.6350946 -7.184e+00  1.045e+01
## ZMISSING_WORK_ENJOYMENT          1.6986416 -6.819e-01  4.079e+00
## ZACTIVE_ONCE_WEEKLY             -1.1382317 -1.688e+00 -5.882e-01
## ZACTIVE_MORE_THAN_ONCE_WEEKLY   -1.1573594 -1.617e+00 -6.975e-01
## ZACTIVE_DAILY                   -0.7578660 -1.387e+00 -1.284e-01
## ZEXCELLENT_HEALTH                1.5641925  2.224e-01  2.906e+00
## ZVERY_GOOD_HEALTH                1.8308368  5.562e-01  3.105e+00
## ZGOOD_HEALTH                     1.6132375  3.431e-01  2.883e+00
## ZFAIR_HEALTH                     0.5726951 -7.259e-01  1.871e+00
## (Intercept)                     69.3280315  5.341e+01  8.525e+01
## AGE                             -0.6580037 -9.107e-01 -4.053e-01
## resres                           0.0074498 -2.532e-01  2.681e-01
## resWORK_ENJOYMENT                0.1089880 -4.579e-01  6.759e-01
## resHOSPITAL_EXPENSES             0.0004386  1.725e-04  7.047e-04
## resMISSING_RATE_HEALTH          17.4692748  1.335e+00  3.360e+01
## resMISSING_HEART_CONDITION       4.6088146 -6.176e+00  1.539e+01
## resMISSING_ACTIVITY_FREQUENCY   -0.9313572 -1.310e+01  1.124e+01
## resMISSING_WORK_ENJOYMENT       -5.5672230 -1.201e+01  8.773e-01
## resACTIVE_ONCE_WEEKLY            0.0956374 -9.962e-01  1.187e+00
## resACTIVE_MORE_THAN_ONCE_WEEKLY -1.2079148 -2.151e+00 -2.646e-01
## resACTIVE_DAILY                  0.0395418 -1.266e+00  1.346e+00
## resEXCELLENT_HEALTH              9.1090420  6.306e+00  1.191e+01
## resVERY_GOOD_HEALTH              8.4931776  5.831e+00  1.116e+01
## resGOOD_HEALTH                   8.4080412  5.755e+00  1.106e+01
## resFAIR_HEALTH                   6.8599182  4.120e+00  9.600e+00
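
In a linear model, 2SRI and 2SLS should give the same point estimate for the endogenous regressor, which is consistent with AGE being about −0.658 both in the tsri output above and in the ENDOG_AGE 2SLS model. The sketch below runs the two procedures by hand in base R on simulated data. Every variable name is hypothetical, and the standard errors printed by these second-stage lm() calls would not be valid as they stand, because they ignore the first-stage estimation.

```r
# Simulated data: every name here is hypothetical, not taken from df
set.seed(2)
n <- 2000
z <- rnorm(n)                        # instrument
w <- rnorm(n)                        # exogenous control
u <- rnorm(n)                        # unobserved confounder
x <- 1 + 0.8 * z + 0.5 * w + u + rnorm(n)       # endogenous regressor
y <- 2 - 0.6 * x + 0.3 * w + 2 * u + rnorm(n)   # outcome

# Stage 1: regress the endogenous regressor on instrument and controls
stage1 <- lm(x ~ z + w)
v_hat  <- resid(stage1)    # first-stage residuals (used by 2SRI)
x_hat  <- fitted(stage1)   # first-stage fitted values (used by 2SLS)

# Stage 2, residual inclusion: keep x and add the first-stage residual
fit_2sri <- lm(y ~ x + w + v_hat)
# Stage 2, predictor substitution: replace x with its fitted value
fit_2sls <- lm(y ~ x_hat + w)

b_2sri <- unname(coef(fit_2sri)["x"])
b_2sls <- unname(coef(fit_2sls)["x_hat"])
cat("2SRI estimate:", b_2sri, " 2SLS estimate:", b_2sls, "\n")
```

The coefficient on `v_hat` gives a regression-based check for endogeneity. In the tsri output the row labelled resres appears to play that role, and its p-value of 0.955 would suggest little evidence that AGE is endogenous in this specification.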

Model Summary

#Easy model summary:
cat("Summary of 2SRI above")
## Summary of 2SRI above
cat("Summary of OLS models")
## Summary of OLS models
m_list_ols <- list(OLS_ALL = lm1, OLS_NO_ENDG = lm_test_no_endg, OLS_NO_IV = lm_test_no_iv)
msummary(m_list_ols)
                              OLS_ALL      OLS_NO_ENDG  OLS_NO_IV
(Intercept)                   68.873       29.175       68.886
                              (2.096)      (1.452)      (2.085)
AGE                           −0.651                    −0.651
                              (0.026)                   (0.025)
HEART_CONDITION               −0.033       −2.905
                              (0.544)      (0.559)
WORK_ENJOYMENT                0.116        0.700        0.116
                              (0.276)      (0.290)      (0.276)
HOSPITAL_EXPENSES             0.0004       0.0005       0.0004
                              (0.0001)     (0.0001)     (0.0001)
MISSING_RATE_HEALTH           17.470       17.540       17.476
                              (7.975)      (8.382)      (7.973)
MISSING_HEART_CONDITION       4.622        5.731        4.625
                              (7.877)      (8.280)      (7.876)
MISSING_ACTIVITY_FREQUENCY    −0.944       −2.007       −0.943
                              (5.167)      (5.431)      (5.166)
MISSING_WORK_ENJOYMENT        −5.580       −6.685       −5.580
                              (2.894)      (3.042)      (2.894)
ACTIVE_ONCE_WEEKLY            0.104        0.845        0.104
                              (0.553)      (0.581)      (0.553)
ACTIVE_MORE_THAN_ONCE_WEEKLY  −1.199       −0.446       −1.200
                              (0.452)      (0.475)      (0.452)
ACTIVE_DAILY                  0.045        0.538        0.044
                              (0.620)      (0.651)      (0.619)
EXCELLENT_HEALTH              9.097        8.080        9.103
                              (1.371)      (1.441)      (1.368)
VERY_GOOD_HEALTH              8.480        7.288        8.484
                              (1.295)      (1.360)      (1.292)
GOOD_HEALTH                   8.396        7.347        8.398
                              (1.288)      (1.353)      (1.287)
FAIR_HEALTH                   6.856        6.483        6.856
                              (1.332)      (1.400)      (1.332)
Num.Obs.                      6054         6054         6054
R2                            0.110        0.016        0.110
R2 Adj.                       0.107        0.014        0.107
AIC                           48830.6      49433.3      48828.6
BIC                           48944.6      49540.6      48935.9
Log.Lik.                      −24398.299   −24700.636   −24398.301
F                             49.517       7.017        53.063
RMSE                          13.62        14.31        13.62
cat("\nSummary of 2SLS models")
## 
## Summary of 2SLS models
m_list_iv <- list(ENDOG_AGE = iv1, ENDG_ACTIVITY_FREQ = iv2, ENDG_BOTH = iv3) #2SLS fits, kept separate from the OLS list
msummary(m_list_iv)
                              ENDOG_AGE   ENDG_ACTIVITY_FREQ  ENDG_BOTH
(Intercept)                   69.328      79.905              49.462
                              (7.596)     (370.634)           (31.043)
AGE                           −0.658      −0.655              −0.579
                              (0.120)     (0.934)             (0.138)
WORK_ENJOYMENT                0.109       −1.161              0.754
                              (0.300)     (23.316)            (2.127)
HOSPITAL_EXPENSES             0.0004      0.0005              0.0004
                              (0.0001)    (0.002)             (0.0005)
MISSING_RATE_HEALTH           17.469
                              (7.974)
MISSING_HEART_CONDITION       4.609       −28.830
                              (7.881)     (339.837)
MISSING_ACTIVITY_FREQUENCY    −0.931      1453.493            1191.521
                              (5.170)     (3470.524)          (1127.491)
MISSING_WORK_ENJOYMENT        −5.567      −16.957             −9.187
                              (2.902)     (95.110)            (12.706)
ACTIVE_ONCE_WEEKLY            0.096       −33.087             44.291
                              (0.571)     (941.944)           (74.296)
ACTIVE_MORE_THAN_ONCE_WEEKLY  −1.208      32.038              26.946
                              (0.471)     (62.785)            (13.493)
ACTIVE_DAILY                  0.040       −90.484             0.164
                              (0.625)     (1097.553)          (80.579)
EXCELLENT_HEALTH              9.109
                              (1.372)
VERY_GOOD_HEALTH              8.493
                              (1.302)
GOOD_HEALTH                   8.408
                              (1.297)
FAIR_HEALTH                   6.860
                              (1.334)
HEART_CONDITION                           −0.623
                                          (9.223)
Num.Obs.                      6054        6054                6054
R2                            0.110       −19.902             −8.832
R2 Adj.                       0.107       −19.936             −8.845
AIC                           48828.7     67926.1             63356.0
BIC                           48936.0     68006.6             63423.0
RMSE                          13.62       65.96               45.24
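
The ENDG_ACTIVITY_FREQ and ENDG_BOTH models report negative R2 values. This can happen with IV estimation because the residuals are computed from the actual regressors combined with the IV coefficients, which do not minimize the residual sum of squares, so R2 is no longer bounded below by zero. A minimal sketch on simulated data with a deliberately weak instrument follows; all names are hypothetical and not taken from df.

```r
# Simulated data: every name here is hypothetical, not taken from df
set.seed(3)
n <- 2000
z <- rnorm(n)                         # instrument, only weakly related to x
u <- rnorm(n)                         # unobserved confounder
x <- 0.05 * z + u + rnorm(n)          # endogenous regressor
y <- 1 + 0.5 * x + 2 * u + rnorm(n)   # outcome

# R2 from any set of fitted values: 1 - RSS / TSS
r2 <- function(y, fitted) 1 - sum((y - fitted)^2) / sum((y - mean(y))^2)

# Just-identified IV slope and intercept: cov(z, y) / cov(z, x)
b_iv <- cov(z, y) / cov(z, x)
a_iv <- mean(y) - b_iv * mean(x)

ols    <- lm(y ~ x)
r2_ols <- r2(y, fitted(ols))
r2_iv  <- r2(y, a_iv + b_iv * x)      # IV residuals use the actual x

cat("OLS R2:", r2_ols, " IV R2:", r2_iv, "\n")
```

Because OLS minimizes the residual sum of squares, the IV R2 computed this way can never exceed the OLS R2, and with a weak instrument it can fall far below zero. The very large standard errors in the two activity-frequency models point in the same direction, though this is not a formal weak-instrument test.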