In the context of the following regression equation, define what it means for x2 to be an endogenous explanatory variable. If x2 is endogenous, what does this mean for estimation of B2? That is, how does endogeneity affect the point estimate and the standard error?
y = B0 + B1x1 + B2x2 + E
If x2 is endogenous, it is correlated with the error term E, that is, Cov(x2, E) ≠ 0. This typically arises from an omitted variable that affects both x2 and y, from simultaneity (y also affects x2), or from measurement error in x2. Correlation between x2 and x1 alone does not make x2 endogenous. Endogeneity violates the OLS assumption that the regressors are uncorrelated with the error term, so the OLS estimator of B2 is biased and inconsistent: it does not converge to the true B2 even as the sample size grows.
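Endogeneity shifts the point estimate of B2 systematically, not only for observations that fall far from the mean. The estimate absorbs part of the effect of whatever drives the correlation between x2 and E, and the direction of the bias follows the sign of that correlation. If endogeneity is not accounted for, the estimate of B2 cannot be given a causal interpretation, and because x1 and x2 are generally correlated, the bias can also spill over into the estimate of B1.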
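The inconsistency can be written out explicitly. As a sketch, write $\beta_2$ and $\varepsilon$ for B2 and E, assume x1 is exogenous, and let $\tilde{x}_2$ denote the part of x2 left over after a linear projection on a constant and x1 (the tilde notation is introduced here for illustration only):

$$
\operatorname{plim}\,\hat{\beta}_2
  = \beta_2 + \frac{\operatorname{Cov}(\tilde{x}_2,\varepsilon)}{\operatorname{Var}(\tilde{x}_2)},
\qquad
\operatorname{Cov}(\tilde{x}_2,\varepsilon) = \operatorname{Cov}(x_2,\varepsilon) \neq 0 .
$$

The second term does not shrink as the sample grows, which is why a larger sample does not repair the estimate.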
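The OLS standard error for an endogenous explanatory variable is usually smaller than the standard error from an Instrumental Variables (IV) estimator, but it is centered on an inconsistent estimate, so confidence intervals and tests built from it are misleading. IV estimation uses only the variation in x2 that is driven by the instrument, which restores consistency at the cost of larger standard errors. An instrumental variable must be uncorrelated with the error term E (it affects y only through x2) and must be strongly correlated with the endogenous variable. If the IV is only weakly correlated with the endogenous explanatory variable, it is a weak instrument: the IV estimate is biased toward the OLS estimate in finite samples, its standard errors become very large, and even a slight correlation between the instrument and E produces a large inconsistency.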
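A small simulation can make the instrument conditions concrete. The sketch below uses made-up data; every variable name and coefficient in it is an illustrative assumption, not something taken from the homework dataset. It compares the OLS estimate of B2 with a simple IV estimate and computes the first-stage F statistic that is commonly used to judge instrument strength.

```r
set.seed(42)
n  <- 5000
z  <- rnorm(n)                    # instrument: shifts x2, unrelated to the error
u  <- rnorm(n)                    # unobserved factor that makes x2 endogenous
x1 <- rnorm(n)                    # exogenous control
x2 <- 0.8 * z + u + rnorm(n)      # endogenous regressor
e  <- u + rnorm(n)                # error term shares u with x2
y  <- 1 + 0.5 * x1 + 2 * x2 + e   # true B2 is 2 (assumed for the simulation)

# OLS ignores the endogeneity and overstates B2 here
b2_ols <- coef(lm(y ~ x1 + x2))[["x2"]]

# Simple IV estimate: partial x1 out of y, x2 and z, then take the covariance ratio
ry <- resid(lm(y ~ x1))
rx <- resid(lm(x2 ~ x1))
rz <- resid(lm(z ~ x1))
b2_iv <- cov(rz, ry) / cov(rz, rx)

# Relevance check: F statistic for adding the instrument to the first stage
f_stat <- anova(lm(x2 ~ x1), lm(x2 ~ x1 + z))$F[2]

c(ols = b2_ols, iv = b2_iv, first_stage_F = f_stat)
```

A first-stage F statistic well above 10 is a common rule of thumb for a strong instrument. No comparable statistic can test whether the instrument is uncorrelated with the error term when there is only one instrument; that has to be argued from theory.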
IVs are used within Two-Stage Least Squares (2SLS) and Two-Stage Residual Inclusion (2SRI) estimation to account for endogenous explanatory variables and produce consistent estimates. 2SLS first regresses the endogenous explanatory variable on the instrument(s) and the exogenous variables, then replaces the endogenous variable with its fitted values in the original equation for the dependent variable (in this case y). 2SRI runs the same first stage but keeps the residuals, then adds those residuals as an extra regressor alongside the original endogenous variable in the model predicting y. Both approaches account for endogeneity, but 2SLS is more common because its standard errors are available in closed form, whereas 2SRI standard errors are usually obtained by bootstrapping.
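The two procedures can be sketched by hand with `lm()`. The data below are simulated and all names and coefficients are assumptions made for illustration. In a linear model the two estimators give the same coefficient on the endogenous variable; they differ only in what enters the second stage.

```r
set.seed(1)
n  <- 5000
z  <- rnorm(n)                              # instrument
u  <- rnorm(n)                              # unobserved confounder
x1 <- rnorm(n)                              # exogenous control
x2 <- 0.8 * z + 0.3 * x1 + u + rnorm(n)     # endogenous regressor
y  <- 1 + 0.5 * x1 + 2 * x2 + u + rnorm(n)  # true coefficient on x2 is 2

# Stage 1 (shared): endogenous variable on the instrument and exogenous controls
stage1 <- lm(x2 ~ x1 + z)
x2_hat <- fitted(stage1)
v_hat  <- resid(stage1)

# 2SLS: replace x2 with its fitted values
fit_2sls <- lm(y ~ x1 + x2_hat)

# 2SRI: keep x2 and add the first-stage residual as an extra regressor
fit_2sri <- lm(y ~ x1 + x2 + v_hat)

b2_2sls <- coef(fit_2sls)[["x2_hat"]]
b2_2sri <- coef(fit_2sri)[["x2"]]
c(b2_2sls = b2_2sls, b2_2sri = b2_2sri)
```

The standard errors printed by `summary()` for these hand-built second stages are not valid, because they ignore that `x2_hat` and `v_hat` are themselves estimates. A dedicated 2SLS routine or a bootstrap is needed for inference.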
Find a variable in the dataset that you think may serve as a reasonable instrumental variable. Summarize the dependent variable, the endogenous explanatory variable, and the instrumental variable.
From the cleaned variables of my dataset, we will be looking at: Weekly Work Hours (WWH), Age, Health Rating (RH), any Heart Conditions (HC), Frequency of Activity (AF), Enjoyment of Work (WE), and Hospital Expenses (HE). AF and RH are both categorical variables that will be broken into dummy variables below.
After running a correlation matrix (1 page below) we see that the following variables have the highest correlation:
* Age and Weekly Work Hours: corr = -0.31
* Age and Heart Condition: corr = 0.2
* Rate Health and Activity Frequency: corr = -0.21
For the sake of this homework, we will be focusing on the correlation between Age and Heart Condition. Age will be our endogenous variable because of its high correlation with Weekly Work Hours (our dependent variable), while Heart Condition will be our instrumental variable because it is relatively uncorrelated with WWH (-0.07). In theory it does not make sense to treat Heart Condition as an input to Age; however, the presence of a heart condition is a good predictor of Age, so this relationship will work for the sake of this homework.
Alternatively, we see there is a strong correlation between Health Rating and Activity Frequency. In this case, both variables are relatively uncorrelated with WWH (AF: -0.1, RH: -0.05), so we can use theory to decide which variable is endogenous. Activity Frequency will be treated as endogenous while Health Rating is the instrumental variable. This is because an individual's Health Rating is a direct input into how active they are on a monthly basis; in addition, Activity Frequency is less correlated with Age. However, this will not be the main endogenous-IV pair in this homework because both of these variables are categorical and will be split into dummies. My understanding of how the endogenous-IV relationship works does not extend to sets of indicator variables, but I do want to see whether it is significant in this case.
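Screening candidate instruments with correlations can be sketched as follows. The data are simulated and the names `z_strong` and `z_weak` are hypothetical. One thing the sketch shows is that a relevant, valid instrument is generally still correlated with the outcome, because it moves the outcome through the endogenous variable. A low raw correlation with the outcome is therefore not evidence that an instrument is valid.

```r
set.seed(3)
n        <- 2000
z_strong <- rnorm(n)                                  # strongly related to x
z_weak   <- rnorm(n)                                  # barely related to x
u        <- rnorm(n)                                  # unobserved confounder
x        <- 0.9 * z_strong + 0.05 * z_weak + u + rnorm(n)
y        <- 1 + 2 * x + u + rnorm(n)
dat      <- data.frame(y, x, z_strong, z_weak)

# For each candidate, report its correlation with the endogenous variable
# and with the outcome
screen <- sapply(c("z_strong", "z_weak"), function(v) {
  c(cor_with_x = cor(dat[[v]], dat$x),
    cor_with_y = cor(dat[[v]], dat$y))
})
round(screen, 2)
```

Only the correlation with the endogenous variable (relevance) can be checked this way. Whether the instrument is uncorrelated with the error term cannot be read off a correlation matrix.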
Summary Statistics on these variables can be found 2 pages below.
In total, there are 6054 observations of all variables.
To begin, we look at the dependent variable, Weekly Work Hours (WWH). This variable has a min of 0, a max of 168, a median of 40, and a mean of 37.3. All of this makes sense: there are jobs, like being a lineman, that are paid for all hours of a week during peak season. The mean sits below the median, which points to some left skew from respondents reporting few or no hours, although the maximum of 168 shows there is also a long right tail.
Our first endogenous variable is Age, which has a minimum of 50, a maximum of 90, a median of 59, and a mean of 60.6. Age is cut off at 50 because we are only considering those who are eligible for AARP benefits in our final study. The mean sits above the median, which points to some right skew: most of the mass is at the younger ages, with a tail extending toward the older ages.
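Comparing the mean with the median is only a heuristic for the direction of skew: a mean above the median usually signals a right tail, and a mean below it a left tail. A minimal sketch, using a hypothetical helper `skew_direction()` and toy vectors rather than the homework data:

```r
# Heuristic: compare the mean with the median to guess the direction of skew.
skew_direction <- function(x) {
  m  <- mean(x, na.rm = TRUE)
  md <- median(x, na.rm = TRUE)
  if (m > md) {
    "right"
  } else if (m < md) {
    "left"
  } else {
    "none"
  }
}

long_right_tail <- c(1, 2, 2, 3, 3, 3, 4, 50)   # one large value pulls the mean up
long_left_tail  <- c(-50, 1, 2, 2, 3, 3, 3, 4)  # one small value pulls the mean down

skew_direction(long_right_tail)
skew_direction(long_left_tail)
```

The heuristic can mislead for distributions with tails on both sides, so a histogram is a useful companion check.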
Our first instrumental variable is Heart Condition, a binary variable that is true when individuals report the current or past presence of a heart condition. 775 of the 6054 individuals (12.8%) report a presence or history of a heart condition, meaning there is a significant mass at 0 for this variable.
Our secondary Endogenous Variable is Activity Frequency. This is a categorical variable wherein 0 represents inactivity, 1 represents weekly activity, 2 represents multiple times weekly activity, 3 represents daily activity, and there is a paired indicator variable for missing records. When split into dummy variables, there are 7 missing observations, 1461 inactive individuals, 1059 individuals active weekly, 2778 individuals active more than once weekly, and 756 individuals active daily. This shows that there is a significant skew towards more active people in this set.
Finally, our secondary instrumental variable is Health Rating. This is a categorical variable where 0 represents missing, 1 represents excellent health, 2 represents very good health, 3 represents good health, 4 represents fair health, and 5 represents poor health. When split into dummy variables, there are 3 missing observations, 670 individuals with excellent health, 2184 with very good health, 2192 with good health, 886 with fair health, and 119 with poor health. Overall, the counts are roughly bell-shaped across the ordered categories, peaking at very good and good health.
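Splitting a coded categorical variable into indicator columns can be sketched in base R as below. The column names follow the ones that appear in the summary output. The toy vector and the use of `NA` for missing values are assumptions, since the raw coding of missing records is not shown.

```r
# Hypothetical raw codes: 0 = inactive, 1 = weekly, 2 = several times a week,
# 3 = daily, NA = missing (how missing values are stored is an assumption).
activity <- c(0, 1, 2, 3, 2, NA, 1, 0)

# %in% returns FALSE for NA, so missing rows get 0 in every category column
activity_dummies <- data.frame(
  MISSING_ACTIVITY_FREQUENCY   = as.integer(is.na(activity)),
  NOT_ACTIVE                   = as.integer(activity %in% 0),
  ACTIVE_ONCE_WEEKLY           = as.integer(activity %in% 1),
  ACTIVE_MORE_THAN_ONCE_WEEKLY = as.integer(activity %in% 2),
  ACTIVE_DAILY                 = as.integer(activity %in% 3)
)

# Each row falls in exactly one category
stopifnot(all(rowSums(activity_dummies) == 1))

# Column totals give the category counts
colSums(activity_dummies)
```

In a regression with an intercept, one category has to be left out as the reference group. The regressions in this report leave out NOT_ACTIVE and POOR_HEALTH.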
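library(reshape2)     # melt()
library(ggplot2)      # ggplot() and the heatmap layers
library(AER)          # ivreg() (assumed source; the ivreg package also provides it)
library(sandwich)     # sandwich(), vcovHC()
library(lmtest)       # coeftest()
library(OneSampleMR)  # tsri()
df <- read.csv("HSV2018_Hw3.csv")
df_corr <- read.csv("HSV2018_Hw3_core.csv")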
correlation_matrix <- round(cor(df_corr[,c(2,3,4,5,6,7,8)]),2)
#head(correlation_matrix)
melted_correlation_matrix <- melt(correlation_matrix)
#head(melted_correlation_matrix)
get_lower_tri <- function(correlation_matrix){
correlation_matrix[upper.tri(correlation_matrix)] <- NA
return(correlation_matrix)
}
get_upper_tri <- function(correlation_matrix){
correlation_matrix[lower.tri(correlation_matrix)] <- NA
return(correlation_matrix)
}
reorder_correlation_matrix <- function(correlation_matrix){
dd <- as.dist((1-correlation_matrix)/2)
hc <- hclust(dd)
correlation_matrix <- correlation_matrix[hc$order, hc$order]
}
correlation_matrix <- reorder_correlation_matrix(correlation_matrix)
upper_triangle <- get_upper_tri(correlation_matrix)
melted_correlation_matrix <- melt(upper_triangle, na.rm=TRUE)
ggheatmap <- ggplot(melted_correlation_matrix, aes(Var2, Var1, fill = value))+
geom_tile(color = "white")+
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation") +
theme_minimal()+
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 8, hjust = 1))+
coord_fixed() +
geom_text(aes(Var2, Var1, label = value), color = "black", size = 4) +
theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.major = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.ticks = element_blank(),
legend.justification = c(1, 0),
legend.position = c(0.6, 0.7),
legend.direction = "horizontal")+
guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
title.position = "top", title.hjust = 0.5))
# Print the heatmap
print(ggheatmap)
## Summary statistics for first group: Dependent, Endogenous, IV
## WEEKLY_WORK_HOURS AGE HEART_CONDITION
## Min. : 0.00 Min. :50.00 Min. :0.000
## 1st Qu.: 30.00 1st Qu.:55.00 1st Qu.:0.000
## Median : 40.00 Median :59.00 Median :0.000
## Mean : 37.32 Mean :60.59 Mean :0.128
## 3rd Qu.: 45.00 3rd Qu.:64.00 3rd Qu.:0.000
## Max. :168.00 Max. :90.00 Max. :1.000
## Number of individuals Who have had a Heart Condition: 775 /6054
##
##
## Summary statistics for second group: Dependent, Endogenous, IV
## AGE RATE_HEALTH ACTIVITY_FREQUENCY
## Min. :50.00 Min. :0.000 Min. :0.000
## 1st Qu.:55.00 1st Qu.:2.000 1st Qu.:1.000
## Median :59.00 Median :3.000 Median :2.000
## Mean :60.59 Mean :2.602 Mean :1.467
## 3rd Qu.:64.00 3rd Qu.:3.000 3rd Qu.:2.000
## Max. :90.00 Max. :5.000 Max. :3.000
##
## Decomposed Rate of Health Statistics: Endogenous
## MISSING_RATE_HEALTH EXCELLENT_HEALTH VERY_GOOD_HEALTH GOOD_HEALTH
## Min. :0.0000000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.0004955 Mean :0.1107 Mean :0.3608 Mean :0.3621
## 3rd Qu.:0.0000000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## FAIR_HEALTH POOR_HEALTH
## Min. :0.0000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.0000 Median :0.00000
## Mean :0.1463 Mean :0.01966
## 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.00000
##
## Number of Missing in Rate Health: 3 /6054
## Number of Excellent in Rate Health: 670 /6054
## Number of Very Good in Rate Health: 2184 /6054
## Number of Good in Rate Health: 2192 /6054
## Number of Fair in Rate Health: 886 /6054
## Number of Poor in Rate Health: 119 /6054
##
## Decomposed Activity Frequency Statistics: IV
## MISSING_ACTIVITY_FREQUENCY ACTIVE_ONCE_WEEKLY ACTIVE_MORE_THAN_ONCE_WEEKLY
## Min. :0.000000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.000000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.000000 Median :0.0000 Median :0.0000
## Mean :0.001156 Mean :0.1749 Mean :0.4589
## 3rd Qu.:0.000000 3rd Qu.:0.0000 3rd Qu.:1.0000
## Max. :1.000000 Max. :1.0000 Max. :1.0000
## ACTIVE_DAILY NOT_ACTIVE
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000
## Mean :0.1249 Mean :0.2413
## 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000
##
## Number of Missing in Activity Frequency: 7 /6054
## Number of Once Weekly in Activity Frequency: 1059 /6054
## Number of More than Once Weekly in Activity Frequency: 2778 /6054
## Number of Daily in Activity Frequency: 756 /6054
## Number of Inactive in Activity Frequency: 1461 /6054
Run a regression that assumes that the endogenous variable is exogenous. Then, run a regression that corrects for the endogeneity of that variable. Perform tests to determine if the IV is sufficient and whether the explanatory variable is endogenous. Use both the 2SLS and 2SRI estimators that we discussed in class. State in words what you conclude from these analyses.
All regression results can be found on the last page of this report.
The following regressions were run:
* OLS1: WWH on all
* OLS2: WWH on all but the endogenous explanatory variable (Age)
* OLS3: WWH on all but the instrumental variable (Heart Condition)
* OLS4: Age on all but WWH
* 2SLS1: WWH as dependent, Age as endogenous, Heart Condition as IV
* 2SLS2: WWH as dependent, Activity Frequency as endogenous, Health Rating as IV
* 2SLS3: WWH as dependent, Age and Activity Frequency as endogenous, Heart Condition and Health Rating as IV
* 2SRI: WWH as dependent, Age as endogenous, Heart Condition as IV
Looking at the F statistics of the OLS estimators, we see that the best equation uses all variables but Heart Condition as inputs to WWH (OLS3). This resulted in an F-statistic of 53.063, an RMSE of 13.62, and an AIC of 48828.6. Meanwhile, OLS2 had an F-statistic of 7.017, an RMSE of 14.31, and an AIC of 49433.3, and OLS1 had an F-statistic of 49.517, an RMSE of 13.62, and an AIC of 48830.6. From these we see that removing Heart Condition results in the best OLS model, although only by a small margin over OLS1. OLS1 and OLS3 share the same pattern of significant coefficients; in OLS2, which omits Age, Heart Condition and Work Enjoyment become significant while Active More Than Once Weekly loses significance.
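The fit statistics quoted here can be computed from any `lm` object. The sketch below uses simulated data and one common definition of RMSE (dividing the residual sum of squares by n). Whether the reported RMSE values were computed in exactly this way is an assumption. The sketch also shows why RMSE comes out slightly below the "Residual standard error" printed by `summary()`, which divides by the residual degrees of freedom instead.

```r
set.seed(5)
n   <- 500
x   <- rnorm(n)
y   <- 3 + 1.5 * x + rnorm(n, sd = 2)
fit <- lm(y ~ x)

rss  <- sum(resid(fit)^2)
rmse <- sqrt(rss / n)   # divides by the number of observations
rse  <- sigma(fit)      # divides by the residual degrees of freedom

# AIC for a Gaussian linear model: -2 * logLik + 2 * (number of parameters),
# where the parameters are the coefficients plus the error variance
k           <- length(coef(fit)) + 1
aic_manual  <- n * (log(2 * pi) + log(rss / n) + 1) + 2 * k
aic_builtin <- AIC(fit)

c(rmse = rmse, rse = rse, aic_manual = aic_manual, aic_builtin = aic_builtin)
```

Because AIC penalizes each extra coefficient by 2, dropping a regressor that adds almost nothing to the fit lowers AIC by close to 2, which matches the size of the gap between OLS1 and OLS3.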
Moving to the 2SLS models, we can see that accounting for different endogenous variables had different effects. In 2SLS1, Age was instrumented with Heart Condition, which resulted in an RMSE of 13.62 and an AIC of 48828.7. 2SLS2 instrumented Activity Frequency with Health Rating, resulting in an RMSE of 65.96 and an AIC of 67926.1. Finally, 2SLS3 instrumented Age and Activity Frequency with Heart Condition and Health Rating, resulting in an RMSE of 45.24 and an AIC of 63356. From this we see that the best model uses Age as the endogenous explanatory variable and Heart Condition as its instrumental variable.
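A quick linear model with Age as the dependent variable and all other variables but Weekly Work Hours as explanatory variables shows that Heart Condition is a strong predictor of Age (coefficient 4.41, robust t value 13.72). The fitted values match the sample mean of Age (60.59) but are compressed, ranging from 54.75 to 68.97 against an observed range of 50 to 90, so the model tracks the bulk of the Age distribution reasonably well but falls apart in the upper quartile.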
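Fitted values from a linear regression with an intercept always have the same mean as the observed outcome but a smaller variance: the ratio of the two variances equals R-squared. A short sketch on simulated data (the variable names and numbers are illustrative assumptions) shows why predicted ages cannot reach the extremes of the observed range when the predictors explain only part of the variation.

```r
set.seed(11)
n   <- 1000
w   <- rnorm(n)                        # a single hypothetical predictor
age <- 60 + 2 * w + rnorm(n, sd = 6)   # outcome with a lot of unexplained variation
fit <- lm(age ~ w)

fitted_age <- fitted(fit)

# Same mean, narrower range
c(mean_observed = mean(age), mean_fitted = mean(fitted_age))
rbind(observed = range(age), fitted = range(fitted_age))

# The variance ratio equals R-squared
shrink <- var(fitted_age) / var(age)
r2     <- summary(fit)$r.squared
c(variance_ratio = shrink, r_squared = r2)
```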
Because the pairing of Age as the endogenous explanatory variable and Heart Condition as the instrumental variable showed the best results, the 2SRI model was only run using these variables. The summary for this 2SRI model is certainly interesting but does not display RMSE or AIC. However, it does show that this 2SRI model has a very significant J test and that Age remains significant in the second stage (p = 3.3e-07), while several other second-stage coefficients, including work enjoyment, are not significant. The coefficient labelled resres, which appears to be the first-stage residual, is 0.0074 with a p-value of 0.955, so this model gives no evidence that Age is endogenous. I would not use this model or its results empirically.
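The coefficient on the first-stage residual in a 2SRI fit doubles as a regression-based test of endogeneity: if the regressor is exogenous, that coefficient should be close to zero. The sketch below is a minimal illustration on simulated data. The helper names `simulate_data()` and `endogeneity_pvalue()` are hypothetical, and the plain `lm()` p-value assumes homoskedastic errors.

```r
# Simulate data where `strength` controls how strongly the unobserved factor u
# enters x2; strength = 0 makes x2 exogenous.
simulate_data <- function(n, strength) {
  z  <- rnorm(n)                          # instrument
  u  <- rnorm(n)                          # unobserved factor in the error
  x1 <- rnorm(n)                          # exogenous control
  x2 <- 0.8 * z + strength * u + rnorm(n)
  y  <- 1 + 0.5 * x1 + 2 * x2 + u + rnorm(n)
  data.frame(y, x1, x2, z)
}

# Regress x2 on the instrument and controls, add the residual to the outcome
# equation, and return the p-value of the residual's coefficient.
endogeneity_pvalue <- function(dat) {
  stage1    <- lm(x2 ~ x1 + z, data = dat)
  dat$v_hat <- resid(stage1)
  stage2    <- lm(y ~ x1 + x2 + v_hat, data = dat)
  summary(stage2)$coefficients["v_hat", "Pr(>|t|)"]
}

set.seed(7)
p_endogenous <- endogeneity_pvalue(simulate_data(5000, strength = 1))
p_exogenous  <- endogeneity_pvalue(simulate_data(5000, strength = 0))
c(endogenous_case = p_endogenous, exogenous_case = p_exogenous)
```

A small p-value rejects exogeneity. A large one, as in the exogenous case, means the data give no reason to prefer the IV estimate over OLS.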
When comparing the best OLS model (OLS3) and the best 2SLS model (2SLS1), we see that the results are almost exactly the same, with a shared RMSE of 13.62 and an AIC difference of just 0.1. The Age coefficient is also nearly unchanged (-0.651 in OLS3 versus -0.658 in 2SLS1), although its standard error rises from 0.025 to 0.129. This means that using Heart Condition as an instrumental variable for Age has little impact on the overall model and that, in this case, an OLS model omitting Heart Condition fits as well as a 2SLS model that accounts for the endogeneity of Age with Heart Condition.
lm1 <- lm(WEEKLY_WORK_HOURS ~ AGE + HEART_CONDITION + WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_RATE_HEALTH +
MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + MISSING_WORK_ENJOYMENT +
ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY +
EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH, data = df)
summary(lm1)
##
## Call:
## lm(formula = WEEKLY_WORK_HOURS ~ AGE + HEART_CONDITION + WORK_ENJOYMENT +
## HOSPITAL_EXPENSES + MISSING_RATE_HEALTH + MISSING_HEART_CONDITION +
## MISSING_ACTIVITY_FREQUENCY + MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY +
## ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY + EXCELLENT_HEALTH +
## VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.875 -6.174 0.374 6.799 139.105
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 68.8734263 2.0958671 32.862 < 2e-16 ***
## AGE -0.6505539 0.0258323 -25.184 < 2e-16 ***
## HEART_CONDITION -0.0328882 0.5435213 -0.061 0.95175
## WORK_ENJOYMENT 0.1156849 0.2764814 0.418 0.67566
## HOSPITAL_EXPENSES 0.0004399 0.0001350 3.259 0.00112 **
## MISSING_RATE_HEALTH 17.4700715 7.9746424 2.191 0.02851 *
## MISSING_HEART_CONDITION 4.6215157 7.8770942 0.587 0.55743
## MISSING_ACTIVITY_FREQUENCY -0.9435384 5.1667264 -0.183 0.85510
## MISSING_WORK_ENJOYMENT -5.5798776 2.8944104 -1.928 0.05393 .
## ACTIVE_ONCE_WEEKLY 0.1041171 0.5534886 0.188 0.85080
## ACTIVE_MORE_THAN_ONCE_WEEKLY -1.1992926 0.4524843 -2.650 0.00806 **
## ACTIVE_DAILY 0.0451878 0.6196061 0.073 0.94186
## EXCELLENT_HEALTH 9.0973890 1.3710495 6.635 3.52e-11 ***
## VERY_GOOD_HEALTH 8.4795382 1.2945612 6.550 6.22e-11 ***
## GOOD_HEALTH 8.3960229 1.2881400 6.518 7.70e-11 ***
## FAIR_HEALTH 6.8556517 1.3323007 5.146 2.75e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.63 on 6038 degrees of freedom
## Multiple R-squared: 0.1095, Adjusted R-squared: 0.1073
## F-statistic: 49.52 on 15 and 6038 DF, p-value: < 2.2e-16
lm_test_no_iv <- lm(WEEKLY_WORK_HOURS ~ AGE + WORK_ENJOYMENT + HOSPITAL_EXPENSES +
MISSING_RATE_HEALTH + MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY +
MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY +
ACTIVE_DAILY + EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH,
data = df)
summary(lm_test_no_iv)
##
## Call:
## lm(formula = WEEKLY_WORK_HOURS ~ AGE + WORK_ENJOYMENT + HOSPITAL_EXPENSES +
## MISSING_RATE_HEALTH + MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY +
## MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY +
## ACTIVE_DAILY + EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH +
## FAIR_HEALTH, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.864 -6.170 0.374 6.788 139.113
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 68.8860052 2.0853601 33.033 < 2e-16 ***
## AGE -0.6508818 0.0252552 -25.772 < 2e-16 ***
## WORK_ENJOYMENT 0.1160032 0.2764086 0.420 0.67473
## HOSPITAL_EXPENSES 0.0004393 0.0001346 3.264 0.00110 **
## MISSING_RATE_HEALTH 17.4761965 7.9733422 2.192 0.02843 *
## MISSING_HEART_CONDITION 4.6253226 7.8761932 0.587 0.55706
## MISSING_ACTIVITY_FREQUENCY -0.9428554 5.1662878 -0.183 0.85520
## MISSING_WORK_ENJOYMENT -5.5799631 2.8941713 -1.928 0.05390 .
## ACTIVE_ONCE_WEEKLY 0.1040303 0.5534410 0.188 0.85091
## ACTIVE_MORE_THAN_ONCE_WEEKLY -1.2000500 0.4522739 -2.653 0.00799 **
## ACTIVE_DAILY 0.0444889 0.6194473 0.072 0.94275
## EXCELLENT_HEALTH 9.1027989 1.3680185 6.654 3.10e-11 ***
## VERY_GOOD_HEALTH 8.4838785 1.2924658 6.564 5.67e-11 ***
## GOOD_HEALTH 8.3983904 1.2874394 6.523 7.43e-11 ***
## FAIR_HEALTH 6.8560419 1.3321752 5.147 2.74e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.63 on 6039 degrees of freedom
## Multiple R-squared: 0.1095, Adjusted R-squared: 0.1075
## F-statistic: 53.06 on 14 and 6039 DF, p-value: < 2.2e-16
lm_test_no_endg <- lm(WEEKLY_WORK_HOURS ~ HEART_CONDITION + WORK_ENJOYMENT + HOSPITAL_EXPENSES +
MISSING_RATE_HEALTH + MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY +
MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY +
ACTIVE_DAILY + EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH,
data = df)
summary(lm_test_no_endg)
##
## Call:
## lm(formula = WEEKLY_WORK_HOURS ~ HEART_CONDITION + WORK_ENJOYMENT +
## HOSPITAL_EXPENSES + MISSING_RATE_HEALTH + MISSING_HEART_CONDITION +
## MISSING_ACTIVITY_FREQUENCY + MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY +
## ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY + EXCELLENT_HEALTH +
## VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.000 -7.418 2.077 6.546 134.129
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.1752638 1.4518447 20.095 < 2e-16 ***
## HEART_CONDITION -2.9048319 0.5585898 -5.200 2.06e-07 ***
## WORK_ENJOYMENT 0.7004888 0.2895886 2.419 0.015596 *
## HOSPITAL_EXPENSES 0.0005498 0.0001418 3.877 0.000107 ***
## MISSING_RATE_HEALTH 17.5396417 8.3823134 2.092 0.036439 *
## MISSING_HEART_CONDITION 5.7306301 8.2796495 0.692 0.488880
## MISSING_ACTIVITY_FREQUENCY -2.0072555 5.4306729 -0.370 0.711683
## MISSING_WORK_ENJOYMENT -6.6849355 3.0420258 -2.198 0.028021 *
## ACTIVE_ONCE_WEEKLY 0.8445981 0.5809619 1.454 0.146056
## ACTIVE_MORE_THAN_ONCE_WEEKLY -0.4463680 0.4745764 -0.941 0.346968
## ACTIVE_DAILY 0.5382204 0.6509558 0.827 0.408374
## EXCELLENT_HEALTH 8.0797975 1.4405129 5.609 2.13e-08 ***
## VERY_GOOD_HEALTH 7.2884802 1.3598320 5.360 8.64e-08 ***
## GOOD_HEALTH 7.3465250 1.3532822 5.429 5.90e-08 ***
## FAIR_HEALTH 6.4830826 1.4003229 4.630 3.74e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.33 on 6039 degrees of freedom
## Multiple R-squared: 0.01601, Adjusted R-squared: 0.01373
## F-statistic: 7.017 on 14 and 6039 DF, p-value: 1.4e-14
#IV: Age as endogenous and Heart Condition as IV.
iv1 <- ivreg(WEEKLY_WORK_HOURS ~ AGE + WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_RATE_HEALTH +
MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + MISSING_WORK_ENJOYMENT +
ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY +
EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH | HEART_CONDITION +
WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_RATE_HEALTH +
MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + MISSING_WORK_ENJOYMENT +
ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY +
EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH,
data = df)
summary(iv1, vcov = sandwich, diagnostics = TRUE)
##
## Call:
## ivreg(formula = WEEKLY_WORK_HOURS ~ AGE + WORK_ENJOYMENT + HOSPITAL_EXPENSES +
## MISSING_RATE_HEALTH + MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY +
## MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY +
## ACTIVE_DAILY + EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH +
## FAIR_HEALTH | HEART_CONDITION + WORK_ENJOYMENT + HOSPITAL_EXPENSES +
## MISSING_RATE_HEALTH + MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY +
## MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY +
## ACTIVE_DAILY + EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH +
## FAIR_HEALTH, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.8620 -6.1907 0.3787 6.7892 139.2030
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 69.3280315 8.1228809 8.535 < 2e-16 ***
## AGE -0.6580037 0.1289197 -5.104 3.43e-07 ***
## WORK_ENJOYMENT 0.1089880 0.2892430 0.377 0.70633
## HOSPITAL_EXPENSES 0.0004386 0.0001358 3.230 0.00124 **
## MISSING_RATE_HEALTH 17.4692748 8.2316992 2.122 0.03386 *
## MISSING_HEART_CONDITION 4.6088146 5.5026876 0.838 0.40231
## MISSING_ACTIVITY_FREQUENCY -0.9313572 6.2093084 -0.150 0.88077
## MISSING_WORK_ENJOYMENT -5.5672230 3.2880644 -1.693 0.09048 .
## ACTIVE_ONCE_WEEKLY 0.0956374 0.5570490 0.172 0.86369
## ACTIVE_MORE_THAN_ONCE_WEEKLY -1.2079148 0.4812730 -2.510 0.01210 *
## ACTIVE_DAILY 0.0395418 0.6663192 0.059 0.95268
## EXCELLENT_HEALTH 9.1090420 1.4300485 6.370 2.03e-10 ***
## VERY_GOOD_HEALTH 8.4931776 1.3585096 6.252 4.33e-10 ***
## GOOD_HEALTH 8.4080412 1.3536999 6.211 5.61e-10 ***
## FAIR_HEALTH 6.8599182 1.3978738 4.907 9.47e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.63 on 6039 degrees of freedom
## Multiple R-Squared: 0.1095, Adjusted R-squared: 0.1075
## Wald test: 6.931 on 14 and 6039 DF, p-value: 2.354e-14
iv1_f = summary(iv1)$fstatistic
iv1_f
## NULL
#IV: Activity Frequency as endogenous and Health Rating as IV.
iv2 <- ivreg(WEEKLY_WORK_HOURS ~ AGE + HEART_CONDITION + WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_ACTIVITY_FREQUENCY +
MISSING_HEART_CONDITION + MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY +
ACTIVE_DAILY | EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH + AGE + HEART_CONDITION +
WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_HEART_CONDITION + MISSING_WORK_ENJOYMENT +
EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH + MISSING_RATE_HEALTH,
data = df)
summary(iv2, vcov = sandwich, diagnostics = TRUE)
##
## Call:
## ivreg(formula = WEEKLY_WORK_HOURS ~ AGE + HEART_CONDITION + WORK_ENJOYMENT +
## HOSPITAL_EXPENSES + MISSING_ACTIVITY_FREQUENCY + MISSING_HEART_CONDITION +
## MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY +
## ACTIVE_DAILY | EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH +
## FAIR_HEALTH + AGE + HEART_CONDITION + WORK_ENJOYMENT + HOSPITAL_EXPENSES +
## MISSING_HEART_CONDITION + MISSING_WORK_ENJOYMENT + EXCELLENT_HEALTH +
## VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH + MISSING_RATE_HEALTH,
## data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1488.166 -31.891 -8.319 30.229 171.052
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.991e+01 3.658e+02 0.218 0.827
## AGE -6.552e-01 9.117e-01 -0.719 0.472
## HEART_CONDITION -6.226e-01 8.371e+00 -0.074 0.941
## WORK_ENJOYMENT -1.161e+00 2.303e+01 -0.050 0.960
## HOSPITAL_EXPENSES 4.990e-04 1.972e-03 0.253 0.800
## MISSING_ACTIVITY_FREQUENCY 1.453e+03 3.240e+03 0.449 0.654
## MISSING_HEART_CONDITION -2.883e+01 3.285e+02 -0.088 0.930
## MISSING_WORK_ENJOYMENT -1.696e+01 9.201e+01 -0.184 0.854
## ACTIVE_ONCE_WEEKLY -3.309e+01 9.273e+02 -0.036 0.972
## ACTIVE_MORE_THAN_ONCE_WEEKLY 3.204e+01 5.463e+01 0.586 0.558
## ACTIVE_DAILY -9.048e+01 1.074e+03 -0.084 0.933
##
## Residual standard error: 66.02 on 6043 degrees of freedom
## Multiple R-Squared: -19.9, Adjusted R-squared: -19.94
## Wald test: 1.991 on 10 and 6043 DF, p-value: 0.0303
#IV: Age + Activity Frequency as endogenous and Heart Condition + Health Rating as IV.
#Note: AGE also appears to the right of `|`, so ivreg treats it as exogenous in this fit.
iv3 <- ivreg(WEEKLY_WORK_HOURS ~ AGE + WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_ACTIVITY_FREQUENCY +
MISSING_WORK_ENJOYMENT + ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY +
ACTIVE_DAILY | HEART_CONDITION + EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH + AGE +
HEART_CONDITION + WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_HEART_CONDITION + MISSING_WORK_ENJOYMENT +
EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH + MISSING_RATE_HEALTH,
data = df)
summary(iv3, vcov = sandwich, diagnostics = TRUE)
#Testing Instrument effects on endogenous variables
lm_age <- lm(AGE ~ HEART_CONDITION + WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_RATE_HEALTH +
MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + MISSING_WORK_ENJOYMENT +
ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY +
EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH, data = df)
coeftest(lm_age, vcov = vcovHC(lm_age, type = "HC0"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.1022e+01 6.9381e-01 87.9525 < 2.2e-16 ***
## HEART_CONDITION 4.4146e+00 3.2171e-01 13.7222 < 2.2e-16 ***
## WORK_ENJOYMENT -8.9893e-01 1.2563e-01 -7.1553 9.338e-13 ***
## HOSPITAL_EXPENSES -1.6895e-04 5.2978e-05 -3.1890 0.001435 **
## MISSING_RATE_HEALTH -1.0694e-01 8.1199e-01 -0.1317 0.895225
## MISSING_HEART_CONDITION -1.7049e+00 2.5189e+00 -0.6768 0.498544
## MISSING_ACTIVITY_FREQUENCY 1.6351e+00 4.4997e+00 0.3634 0.716335
## MISSING_WORK_ENJOYMENT 1.6986e+00 1.2146e+00 1.3985 0.162006
## ACTIVE_ONCE_WEEKLY -1.1382e+00 2.8064e-01 -4.0559 5.057e-05 ***
## ACTIVE_MORE_THAN_ONCE_WEEKLY -1.1574e+00 2.3463e-01 -4.9327 8.326e-07 ***
## ACTIVE_DAILY -7.5787e-01 3.2117e-01 -2.3597 0.018320 *
## EXCELLENT_HEALTH 1.5642e+00 6.8462e-01 2.2847 0.022362 *
## VERY_GOOD_HEALTH 1.8308e+00 6.5032e-01 2.8153 0.004889 **
## GOOD_HEALTH 1.6132e+00 6.4804e-01 2.4894 0.012822 *
## FAIR_HEALTH 5.7270e-01 6.6256e-01 0.8644 0.387418
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
hat_age = lm_age$fitted.values
summary(df$AGE)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 50.00 55.00 59.00 60.59 64.00 90.00
summary(hat_age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 54.75 59.68 60.03 60.59 60.84 68.97
#Testing lm with and without IVs
ct1 <- coeftest(lm1, vcov = vcovHC(lm1, type = "HC0")) #Pre-assigned LM on all variables
ct2 <- coeftest(lm_test_no_iv, vcov = vcovHC(lm_test_no_iv, type = "HC0"))
ct3 <- coeftest(lm_test_no_endg, vcov = vcovHC(lm_test_no_endg))
cat("OLS on all variables: \n", summary(ct1))
## OLS on all variables:
## Min. :-5.5799 1st Qu.:-0.1873 Median : 0.1099 Mean : 7.2283 3rd Qu.: 8.4169 Max. :68.8734 Min. :0.000136 1st Qu.:0.516314 Median :1.350176 Mean :2.092999 3rd Qu.:2.502722 Max. :8.234344 Min. :-23.33774 1st Qu.: -0.08017 Median : 0.63586 Mean : 2.09276 3rd Qu.: 5.23635 Max. : 30.69955 Min. :0.00000 1st Qu.:0.00000 Median :0.02132 Mean :0.30160 3rd Qu.:0.71245 Max. :0.95534
cat("\n\nOLS on all variables but Heart Condition: \n", summary(ct2))
##
##
## OLS on all variables but Heart Condition:
## Min. :-5.5800 1st Qu.:-0.3252 Median : 0.1160 Mean : 7.7147 3rd Qu.: 8.4411 Max. :68.8860 Min. :0.000135 1st Qu.:0.496646 Median :1.349764 Mean :2.191557 3rd Qu.:2.752492 Max. :8.232735 Min. :-24.0858 1st Qu.: -0.0425 Median : 0.8433 Mean : 2.2077 3rd Qu.: 5.5719 Max. : 30.9673 Min. :0.000000 1st Qu.:0.000000 Median :0.008701 Mean :0.258024 3rd Qu.:0.533281 Max. :0.946175
cat("\n\nOLS on all variables but Age: \n", summary(ct3))
##
##
## OLS on all variables but Age:
## Min. :-6.6849 1st Qu.:-0.2229 Median : 0.8446 Mean : 4.7789 3rd Qu.: 7.3175 Max. :29.1753 Min. : 0.000143 1st Qu.: 0.593379 Median : 1.425131 Mean : 2.615873 3rd Qu.: 2.678130 Max. :12.667362 Min. :-4.7044 1st Qu.: 0.2215 Median : 1.4836 Mean : 2.8182 3rd Qu.: 4.7634 Max. :19.1062 Min. :0.0000000 1st Qu.:0.0000015 Median :0.0125885 Mean :0.1547654 3rd Qu.:0.2597135 Max. :0.7316252
#2SRI: Age as endogenous and Heart Condition as IV.
ri1 <- tsri(WEEKLY_WORK_HOURS ~ AGE + WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_RATE_HEALTH +
MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + MISSING_WORK_ENJOYMENT +
ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY +
EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH | HEART_CONDITION +
WORK_ENJOYMENT + HOSPITAL_EXPENSES + MISSING_RATE_HEALTH +
MISSING_HEART_CONDITION + MISSING_ACTIVITY_FREQUENCY + MISSING_WORK_ENJOYMENT +
ACTIVE_ONCE_WEEKLY + ACTIVE_MORE_THAN_ONCE_WEEKLY + ACTIVE_DAILY +
EXCELLENT_HEALTH + VERY_GOOD_HEALTH + GOOD_HEALTH + FAIR_HEALTH,
data = df)
summary(ri1)
##
## GMM fit summary:
##
## Call:
## gmm::gmm(g = tsriIdentityMoments, x = dat, t0 = t0, vcov = "iid")
##
##
## Method: twoStep
##
## Coefficients:
## Estimate Std. Error t value
## Z(Intercept) 6.1022e+01 6.9381e-01 8.7952e+01
## ZHEART_CONDITION 4.4146e+00 3.2171e-01 1.3722e+01
## ZWORK_ENJOYMENT -8.9893e-01 1.2563e-01 -7.1553e+00
## ZHOSPITAL_EXPENSES -1.6895e-04 5.2978e-05 -3.1890e+00
## ZMISSING_RATE_HEALTH -1.0694e-01 8.1199e-01 -1.3170e-01
## ZMISSING_HEART_CONDITION -1.7049e+00 2.5189e+00 -6.7682e-01
## ZMISSING_ACTIVITY_FREQUENCY 1.6351e+00 4.4997e+00 3.6338e-01
## ZMISSING_WORK_ENJOYMENT 1.6986e+00 1.2146e+00 1.3985e+00
## ZACTIVE_ONCE_WEEKLY -1.1382e+00 2.8064e-01 -4.0559e+00
## ZACTIVE_MORE_THAN_ONCE_WEEKLY -1.1574e+00 2.3463e-01 -4.9327e+00
## ZACTIVE_DAILY -7.5787e-01 3.2117e-01 -2.3597e+00
## ZEXCELLENT_HEALTH 1.5642e+00 6.8462e-01 2.2847e+00
## ZVERY_GOOD_HEALTH 1.8308e+00 6.5032e-01 2.8153e+00
## ZGOOD_HEALTH 1.6132e+00 6.4804e-01 2.4894e+00
## ZFAIR_HEALTH 5.7270e-01 6.6256e-01 8.6437e-01
## (Intercept) 6.9328e+01 8.1229e+00 8.5349e+00
## AGE -6.5800e-01 1.2892e-01 -5.1040e+00
## resres 7.4498e-03 1.3299e-01 5.6018e-02
## resWORK_ENJOYMENT 1.0899e-01 2.8924e-01 3.7680e-01
## resHOSPITAL_EXPENSES 4.3862e-04 1.3578e-04 3.2305e+00
## resMISSING_RATE_HEALTH 1.7469e+01 8.2317e+00 2.1222e+00
## resMISSING_HEART_CONDITION 4.6088e+00 5.5027e+00 8.3756e-01
## resMISSING_ACTIVITY_FREQUENCY -9.3136e-01 6.2093e+00 -1.4999e-01
## resMISSING_WORK_ENJOYMENT -5.5672e+00 3.2881e+00 -1.6932e+00
## resACTIVE_ONCE_WEEKLY 9.5637e-02 5.5705e-01 1.7169e-01
## resACTIVE_MORE_THAN_ONCE_WEEKLY -1.2079e+00 4.8127e-01 -2.5098e+00
## resACTIVE_DAILY 3.9542e-02 6.6632e-01 5.9344e-02
## resEXCELLENT_HEALTH 9.1090e+00 1.4300e+00 6.3697e+00
## resVERY_GOOD_HEALTH 8.4932e+00 1.3585e+00 6.2518e+00
## resGOOD_HEALTH 8.4080e+00 1.3537e+00 6.2112e+00
## resFAIR_HEALTH 6.8599e+00 1.3979e+00 4.9074e+00
## Pr(>|t|)
## Z(Intercept) 0.0000e+00
## ZHEART_CONDITION 7.4795e-43
## ZWORK_ENJOYMENT 8.3476e-13
## ZHOSPITAL_EXPENSES 1.4277e-03
## ZMISSING_RATE_HEALTH 8.9522e-01
## ZMISSING_HEART_CONDITION 4.9852e-01
## ZMISSING_ACTIVITY_FREQUENCY 7.1632e-01
## ZMISSING_WORK_ENJOYMENT 1.6195e-01
## ZACTIVE_ONCE_WEEKLY 4.9945e-05
## ZACTIVE_MORE_THAN_ONCE_WEEKLY 8.1092e-07
## ZACTIVE_DAILY 1.8288e-02
## ZEXCELLENT_HEALTH 2.2328e-02
## ZVERY_GOOD_HEALTH 4.8732e-03
## ZGOOD_HEALTH 1.2795e-02
## ZFAIR_HEALTH 3.8738e-01
## (Intercept) 1.4027e-17
## AGE 3.3258e-07
## resres 9.5533e-01
## resWORK_ENJOYMENT 7.0632e-01
## resHOSPITAL_EXPENSES 1.2358e-03
## resMISSING_RATE_HEALTH 3.3821e-02
## resMISSING_HEART_CONDITION 4.0228e-01
## resMISSING_ACTIVITY_FREQUENCY 8.8077e-01
## resMISSING_WORK_ENJOYMENT 9.0425e-02
## resACTIVE_ONCE_WEEKLY 8.6368e-01
## resACTIVE_MORE_THAN_ONCE_WEEKLY 1.2079e-02
## resACTIVE_DAILY 9.5268e-01
## resEXCELLENT_HEALTH 1.8934e-10
## resVERY_GOOD_HEALTH 4.0566e-10
## resGOOD_HEALTH 5.2596e-10
## resFAIR_HEALTH 9.2294e-07
##
## J-Test: degrees of freedom is 0
## J-test P-value
## Test E(g)=0: 1.14569283026132e-14 *******
##
## #############
## Information related to the numerical optimization
## Convergence code = 1
## Function eval. = 502
## Gradian eval. = NA
##
## Estimates with 95% CI limits:
## Estimate 0.025 0.975
## Z(Intercept) 61.0220987 5.966e+01 6.238e+01
## ZHEART_CONDITION 4.4146133 3.784e+00 5.045e+00
## ZWORK_ENJOYMENT -0.8989323 -1.145e+00 -6.527e-01
## ZHOSPITAL_EXPENSES -0.0001689 -2.728e-04 -6.511e-05
## ZMISSING_RATE_HEALTH -0.1069399 -1.698e+00 1.485e+00
## ZMISSING_HEART_CONDITION -1.7048772 -6.642e+00 3.232e+00
## ZMISSING_ACTIVITY_FREQUENCY 1.6350946 -7.184e+00 1.045e+01
## ZMISSING_WORK_ENJOYMENT 1.6986416 -6.819e-01 4.079e+00
## ZACTIVE_ONCE_WEEKLY -1.1382317 -1.688e+00 -5.882e-01
## ZACTIVE_MORE_THAN_ONCE_WEEKLY -1.1573594 -1.617e+00 -6.975e-01
## ZACTIVE_DAILY -0.7578660 -1.387e+00 -1.284e-01
## ZEXCELLENT_HEALTH 1.5641925 2.224e-01 2.906e+00
## ZVERY_GOOD_HEALTH 1.8308368 5.562e-01 3.105e+00
## ZGOOD_HEALTH 1.6132375 3.431e-01 2.883e+00
## ZFAIR_HEALTH 0.5726951 -7.259e-01 1.871e+00
## (Intercept) 69.3280315 5.341e+01 8.525e+01
## AGE -0.6580037 -9.107e-01 -4.053e-01
## resres 0.0074498 -2.532e-01 2.681e-01
## resWORK_ENJOYMENT 0.1089880 -4.579e-01 6.759e-01
## resHOSPITAL_EXPENSES 0.0004386 1.725e-04 7.047e-04
## resMISSING_RATE_HEALTH 17.4692748 1.335e+00 3.360e+01
## resMISSING_HEART_CONDITION 4.6088146 -6.176e+00 1.539e+01
## resMISSING_ACTIVITY_FREQUENCY -0.9313572 -1.310e+01 1.124e+01
## resMISSING_WORK_ENJOYMENT -5.5672230 -1.201e+01 8.773e-01
## resACTIVE_ONCE_WEEKLY 0.0956374 -9.962e-01 1.187e+00
## resACTIVE_MORE_THAN_ONCE_WEEKLY -1.2079148 -2.151e+00 -2.646e-01
## resACTIVE_DAILY 0.0395418 -1.266e+00 1.346e+00
## resEXCELLENT_HEALTH 9.1090420 6.306e+00 1.191e+01
## resVERY_GOOD_HEALTH 8.4931776 5.831e+00 1.116e+01
## resGOOD_HEALTH 8.4080412 5.755e+00 1.106e+01
## resFAIR_HEALTH 6.8599182 4.120e+00 9.600e+00
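To make the mechanics of the 2SRI fit above concrete, the following is a minimal sketch of the two stages on simulated data, using only base R. The names `z`, `w`, `u`, `x`, `y`, `sim`, `stage1`, `stage2`, and `ols`, and the data-generating process itself, are invented for illustration and are not part of the assignment data. In a linear model like this one, the 2SRI coefficient on the endogenous regressor coincides with the 2SLS point estimate, and the t-test on the first-stage residual works as a regression-based check for endogeneity. Assuming `resres` in the output above is that residual term, its p-value of about 0.955 would suggest little evidence that AGE is endogenous given this instrument.

```r
# Manual 2SRI on simulated data (all names and values are hypothetical).
set.seed(42)
n <- 5000
z <- rnorm(n)                                   # instrument
u <- rnorm(n)                                   # unobserved confounder
w <- rnorm(n)                                   # exogenous covariate
x <- 0.8 * z + 0.5 * w + u + rnorm(n)           # endogenous regressor
y <- 1 + 2 * x + 0.5 * w - 1.5 * u + rnorm(n)   # true effect of x is 2

sim <- data.frame(y, x, z, w)

# Stage 1: regress the endogenous regressor on the instrument and covariates.
stage1 <- lm(x ~ z + w, data = sim)
sim$res <- residuals(stage1)

# Stage 2: original regressors plus the first-stage residual.
stage2 <- lm(y ~ x + w + res, data = sim)

# Naive OLS for comparison; it ignores the correlation between x and u.
ols <- lm(y ~ x + w, data = sim)

# Point estimates for x, and the row for the residual term.
print(round(c(OLS = coef(ols)[["x"]], TSRI = coef(stage2)[["x"]]), 3))
print(summary(stage2)$coefficients["res", ])
```

The standard errors reported by the second `lm()` call ignore the fact that `res` is itself estimated. That is one reason a stacked GMM fit (as in the call shown in the output above) or bootstrapping is used in practice.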
#Easy model summary:
cat("Summary of 2SRI above")
## Summary of 2SRI above
cat("Summary of OLS models")
## Summary of OLS models
m_list_ols <- list(OLS_ALL = lm1, OLS_NO_ENDG = lm_test_no_endg, OLS_NO_IV = lm_test_no_iv)
msummary(m_list_ols)
| | OLS_ALL | OLS_NO_ENDG | OLS_NO_IV |
|---|---|---|---|
| (Intercept) | 68.873 | 29.175 | 68.886 |
| | (2.096) | (1.452) | (2.085) |
| AGE | −0.651 | | −0.651 |
| | (0.026) | | (0.025) |
| HEART_CONDITION | −0.033 | −2.905 | |
| | (0.544) | (0.559) | |
| WORK_ENJOYMENT | 0.116 | 0.700 | 0.116 |
| | (0.276) | (0.290) | (0.276) |
| HOSPITAL_EXPENSES | 0.0004 | 0.0005 | 0.0004 |
| | (0.0001) | (0.0001) | (0.0001) |
| MISSING_RATE_HEALTH | 17.470 | 17.540 | 17.476 |
| | (7.975) | (8.382) | (7.973) |
| MISSING_HEART_CONDITION | 4.622 | 5.731 | 4.625 |
| | (7.877) | (8.280) | (7.876) |
| MISSING_ACTIVITY_FREQUENCY | −0.944 | −2.007 | −0.943 |
| | (5.167) | (5.431) | (5.166) |
| MISSING_WORK_ENJOYMENT | −5.580 | −6.685 | −5.580 |
| | (2.894) | (3.042) | (2.894) |
| ACTIVE_ONCE_WEEKLY | 0.104 | 0.845 | 0.104 |
| | (0.553) | (0.581) | (0.553) |
| ACTIVE_MORE_THAN_ONCE_WEEKLY | −1.199 | −0.446 | −1.200 |
| | (0.452) | (0.475) | (0.452) |
| ACTIVE_DAILY | 0.045 | 0.538 | 0.044 |
| | (0.620) | (0.651) | (0.619) |
| EXCELLENT_HEALTH | 9.097 | 8.080 | 9.103 |
| | (1.371) | (1.441) | (1.368) |
| VERY_GOOD_HEALTH | 8.480 | 7.288 | 8.484 |
| | (1.295) | (1.360) | (1.292) |
| GOOD_HEALTH | 8.396 | 7.347 | 8.398 |
| | (1.288) | (1.353) | (1.287) |
| FAIR_HEALTH | 6.856 | 6.483 | 6.856 |
| | (1.332) | (1.400) | (1.332) |
| Num.Obs. | 6054 | 6054 | 6054 |
| R2 | 0.110 | 0.016 | 0.110 |
| R2 Adj. | 0.107 | 0.014 | 0.107 |
| AIC | 48830.6 | 49433.3 | 48828.6 |
| BIC | 48944.6 | 49540.6 | 48935.9 |
| Log.Lik. | −24398.299 | −24700.636 | −24398.301 |
| F | 49.517 | 7.017 | 53.063 |
| RMSE | 13.62 | 14.31 | 13.62 |
cat("\nSummary of 2SLS models")
##
## Summary of 2SLS models
m_list_2sls <- list(ENDOG_AGE = iv1, ENDG_ACTIVITY_FREQ = iv2, ENDG_BOTH = iv3)
msummary(m_list_2sls)
| | ENDOG_AGE | ENDG_ACTIVITY_FREQ | ENDG_BOTH |
|---|---|---|---|
| (Intercept) | 69.328 | 79.905 | 49.462 |
| | (7.596) | (370.634) | (31.043) |
| AGE | −0.658 | −0.655 | −0.579 |
| | (0.120) | (0.934) | (0.138) |
| WORK_ENJOYMENT | 0.109 | −1.161 | 0.754 |
| | (0.300) | (23.316) | (2.127) |
| HOSPITAL_EXPENSES | 0.0004 | 0.0005 | 0.0004 |
| | (0.0001) | (0.002) | (0.0005) |
| MISSING_RATE_HEALTH | 17.469 | | |
| | (7.974) | | |
| MISSING_HEART_CONDITION | 4.609 | −28.830 | |
| | (7.881) | (339.837) | |
| MISSING_ACTIVITY_FREQUENCY | −0.931 | 1453.493 | 1191.521 |
| | (5.170) | (3470.524) | (1127.491) |
| MISSING_WORK_ENJOYMENT | −5.567 | −16.957 | −9.187 |
| | (2.902) | (95.110) | (12.706) |
| ACTIVE_ONCE_WEEKLY | 0.096 | −33.087 | 44.291 |
| | (0.571) | (941.944) | (74.296) |
| ACTIVE_MORE_THAN_ONCE_WEEKLY | −1.208 | 32.038 | 26.946 |
| | (0.471) | (62.785) | (13.493) |
| ACTIVE_DAILY | 0.040 | −90.484 | 0.164 |
| | (0.625) | (1097.553) | (80.579) |
| EXCELLENT_HEALTH | 9.109 | | |
| | (1.372) | | |
| VERY_GOOD_HEALTH | 8.493 | | |
| | (1.302) | | |
| GOOD_HEALTH | 8.408 | | |
| | (1.297) | | |
| FAIR_HEALTH | 6.860 | | |
| | (1.334) | | |
| HEART_CONDITION | | −0.623 | |
| | | (9.223) | |
| Num.Obs. | 6054 | 6054 | 6054 |
| R2 | 0.110 | −19.902 | −8.832 |
| R2 Adj. | 0.107 | −19.936 | −8.845 |
| AIC | 48828.7 | 67926.1 | 63356.0 |
| BIC | 48936.0 | 68006.6 | 63423.0 |
| RMSE | 13.62 | 65.96 | 45.24 |
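The ENDG_ACTIVITY_FREQ and ENDG_BOTH columns show very large standard errors and negative R2 values, a pattern often associated with weak instruments. One common check is the first-stage F statistic on the excluded instruments. The sketch below computes it on simulated data with base R only. The helper `first_stage_f`, the variables `z_strong`, `z_weak`, `x_strong`, `x_weak`, `w`, `u`, and the data frame `sim` are all hypothetical, and nothing here uses the assignment data. A first-stage F below roughly 10 is a conventional rule of thumb for a weak instrument, not a formal test.

```r
# First-stage F statistic for a strong and a weak instrument (simulated data).
set.seed(1)
n <- 2000
u <- rnorm(n)          # unobserved confounder
w <- rnorm(n)          # exogenous covariate
z_strong <- rnorm(n)   # instrument that moves x a lot
z_weak <- rnorm(n)     # instrument that barely moves x

sim <- data.frame(
  w = w, z_strong = z_strong, z_weak = z_weak,
  x_strong = 1.00 * z_strong + 0.5 * w + u + rnorm(n),
  x_weak   = 0.02 * z_weak + 0.5 * w + u + rnorm(n)
)

# Compare the first stage with and without the excluded instrument.
# `restricted` holds only the exogenous covariates; `full` adds the instrument.
first_stage_f <- function(restricted, full, data) {
  anova(lm(restricted, data = data), lm(full, data = data))$F[2]
}

f_strong <- first_stage_f(x_strong ~ w, x_strong ~ w + z_strong, sim)
f_weak   <- first_stage_f(x_weak ~ w, x_weak ~ w + z_weak, sim)

print(round(c(strong = f_strong, weak = f_weak), 2))
```

Applied to the models above, the restricted formula would contain the exogenous covariates and the full formula would add the instruments used for the activity-frequency variables.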