Introduction

In this short analysis, I investigate the impact of long working hours on mental health, using single-year data from the 2016 NHIS. Working hours have a pervasive influence on both physical and mental health. In 2012, Dr. Marianna Virtanen, a professor at the Finnish Institute of Occupational Health, found that people who work overtime have a higher chance of developing depression. The research also found that 3.5% of people who reported working eight hours developed depression, whereas 4.4% of people working more than 11 hours did (https://www.livescience.com).

As mental health is a critical part of a person's overall wellness, it is important to understand the factors that may lead to moderate or severe mental distress, especially in economically developed populations such as the United States. It is also important to understand other conditions that might increase the odds of developing severe mental distress, also called Nonspecific Psychological Distress. For instance, in recent years socioeconomic factors have also been associated with severe mental distress.

This stems from the idea that long working hours and their impact on health have been related to family financial stress and the breadwinner role: in situations of family financial stress, breadwinners are likely to be forced to work long hours in order to increase the family income. This indicates that, among the various other factors associated with nonspecific psychological distress (NPD), one potential factor may be poverty level (Scandinavian Journal of Work, Environment & Health).

In this study, long working hours simply correspond to overtime, that is, the number of hours worked beyond the standard time defined by the employer under the Fair Labor Standards Act, which in this case is 40 hours including breaks.

The term “mental health”, as used in this study, refers to the mental condition of an individual, and more explicitly to the frequency of experiencing symptoms of normal, moderate, or severe psychological distress in the 30 days before the interview. It is based on the “Kessler Psychological Distress Scale”, using the response options “none of the time”, “a little of the time”, “some of the time”, “most of the time”, and “all of the time”.

Information about the Data.

The data used in this analysis were collected through the “National Health Interview Survey” (NHIS). The NHIS is a large, population-based, representative survey of the US population that has been run annually since 1957. It uses a multi-stage, area-based probability sampling design. The survey has three main components: the family, the sample child, and the sample adult, the last of which is used in this study. The questions in the sample adult component are specifically intended to gather detailed information. The data analyzed in this study cover survey year 2016.

Statistical Analysis

For the analysis, we will use the ‘zelig’ package to create a logit-based simulation model, which will give us log-odds coefficients and odds ratios that can then be used to visualize the predicted probabilities of nonspecific psychological distress levels. To investigate the mental distress level, we first need to create a composite variable, “K6”. This variable is based on Kessler’s Scale of Nonspecific Psychological Distress. The K6 items assess the frequency of nonspecific psychological distress within a particular reference period. The responses range from “none of the time”, coded 0, to “all of the time”, coded 4. The six items are summed to yield a number between 0 and 24. In NHIS there are six questions related to K6, which ask whether the respondent felt hopeless, nervous, restless, sad, or worthless, or felt that everything was an effort, in the past thirty days. To protect the original data values, we created six new variables.
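The summing step can be sketched as follows. This is a minimal illustration on made-up responses, not the original recoding script: the column names follow the K6 item names used in this analysis, while the data values, the `clean` helper object, and the rule that only codes 0 to 4 are valid are assumptions.

```r
# Hypothetical responses for three respondents (values are made up).
nhis <- data.frame(
  aeffort    = c(0, 2, 4),
  ahopeless  = c(0, 1, 4),
  anervous   = c(1, 2, 3),
  arestless  = c(0, 2, NA),
  asad       = c(0, 1, 4),
  aworthless = c(0, 0, 4)
)

k6_items <- c("aeffort", "ahopeless", "anervous",
              "arestless", "asad", "aworthless")

# Work on copies so the original columns stay untouched.
# Assumed rule: only codes 0-4 are valid; anything else becomes NA.
clean <- as.data.frame(
  lapply(nhis[k6_items], function(x) ifelse(x %in% 0:4, x, NA))
)

# Sum the six items; a respondent with any missing item gets NA.
nhis$K6 <- rowSums(clean, na.rm = FALSE)
print(nhis$K6)
```

Keeping `na.rm = FALSE` means an incomplete set of answers never produces an artificially low score.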

Composite Score Variable “K6” (0-24)

The K6 consists of six questions that were asked in NHIS about the frequency of experiencing the symptoms of mental distress. The K6 nonspecific distress scale has been used extensively in epidemiological studies, and its ability to discriminate DSM-IV (Diagnostic and Statistical Manual of Mental Disorders) cases from non-cases makes the K6 and K10 attractive for use in general-purpose health surveys. According to Prochaska et al., “The K6 items assess the frequency of nonspecific psychological distress within a particular reference period. The responses range from ‘none of the time’ coded 0 to ‘all of the time’ coded 4. The six items are summed to yield a number between 0 and 24.”

Many research studies have tested the validity of the K6 scale, and their results show that the scale is consistent for nonspecific distress levels. Research has shown that a polychotomous classification of K6 yields more textured information than the dichotomous scoring of responses as ≥13 versus ≤12. That is why, in this study, a logistic regression is used as the primary mode of analysis. For this study, a score of ≤5 is classified as “Okay/No mental distress”, a score between 6 and 12 is classified as “Moderate Mental Distress”, and a score of ≥13 is classified as “Severe Mental Distress”.

The NHIS dataset contains six variables based on Kessler’s Nonspecific Psychological Distress Scale: i) aeffort, ii) ahopeless, iii) anervous, iv) arestless, v) asad, vi) aworthless. These variables were recoded for missing values and then added together to form a composite score with a range of 0 to 24. The variable “MD3” is the dependent variable in the study and is used for K6 score categorization. It has three distress level categories: None/Okay, Moderate, and Severe.

Cases with missing values, and cases where any of the six K6 questions were not asked, were treated like any other variable with missing values. A thorough review of the available guidelines on treating missing values in K6 scoring found no prescribed way of dealing with missing K6 cases. However, it is important to mention that the NHIS dataset is imputed using multiple-imputation methodology. If a dataset is not imputed, a dataset containing imputed values can be downloaded and merged with the existing dataset to create a complete dataset. For our analysis, values that were not in universe, unknown, or missing were coded as missing and were not used.
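The cut-points described above can be sketched with `cut()`. The scores below are made up, the object names `md3` and `severe` are illustrative, and treating the binary model outcome as "score of 13 or more" is an assumption rather than something the original code confirms.

```r
# Made-up composite scores, including boundary values and a missing case.
k6 <- c(0, 5, 6, 12, 13, 24, NA)

# Three-level categorization: 0-5, 6-12, 13-24.
# cut() uses right-closed intervals by default: (-Inf,5], (5,12], (12,Inf].
md3 <- cut(
  k6,
  breaks = c(-Inf, 5, 12, Inf),
  labels = c("None/Okay", "Moderate", "Severe")
)

# Assumed binary version: 1 = severe distress (K6 >= 13), 0 = otherwise.
severe <- as.integer(k6 >= 13)

print(table(md3, useNA = "ifany"))
print(severe)
```

Missing scores stay `NA` in both versions, so they drop out of the model rather than being counted as "no distress".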

table.K6.f
## 
##     0     1 
## 30642  1247

Long Working Hours

The variable “hourwrk” is the independent variable; it is used to estimate the effect of working hours on the dependent variable. For this study, long working hours correspond to more than 40 hours per week. It is important to mention that in the US the criteria for full-time employment are not standardized. The Fair Labor Standards Act of 1938 gives employers the right to define full-time employment and the number of hours required to qualify for full-time status. However, after the ‘forty hours a week’ policy was adopted at the 19th session of the International Labor Organization in 1935, it became standard practice, especially in developed nations, to treat 40 hours per week as the standard for full-time employment. Adults employed full time in the US report working an average of 47 hours per week, almost a full workday longer than what a standard five-day, 9-to-5 schedule entails. In fact, half of all full-time workers indicate they typically work more than 40 hours, and nearly four in 10 say they work at least 50 hours.

In this study, based on the standard employment practice of 35-40 hours per week for full-time work, we excluded respondents who worked fewer than 35 hours per week. The variable “hourwrk” in the data set is a metric variable holding the number of hours the respondent worked in the last week. The variable was recoded for missing values. A categorical variable (hourswork_cat1) was also formed for long working hours. The reference category is 35-40 hours per week, and the four other categories are: I) 41-49, II) 50-59, III) 60-69, and IV) 70+ hours.
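A minimal sketch of the banding step, assuming whole-number hours and made-up values. The band edges are an assumption chosen so that every value from 35 upward falls in exactly one category, and `hours_cat` is an illustrative name, not the original variable.

```r
# Made-up weekly hours, including a part-time case and a missing case.
hours <- c(20, 35, 40, 41, 49, 50, 59, 60, 69, 70, NA)

# Drop respondents below the 35-hour full-time cut-off (missing values kept).
hours_ft <- hours[is.na(hours) | hours >= 35]

# Band the remaining hours; include.lowest = TRUE keeps 35 in the first band.
hours_cat <- cut(
  hours_ft,
  breaks = c(35, 40, 49, 59, 69, Inf),
  labels = c("35-40", "41-49", "50-59", "60-69", "70+"),
  include.lowest = TRUE
)

# R treats the first factor level as the reference category in a model.
print(levels(hours_cat)[1])
print(table(hours_cat, useNA = "ifany"))
```

Because "35-40" is the first level, no explicit `relevel()` call is needed for it to serve as the reference category.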

Poverty Threshold

The variable “pthreshold1” is a recode of another variable, “POORYN”, which is a binary variable representing the poverty threshold level of the respondent. According to the NHIS codebook description, “POORYN” indicates whether family income was above or below the poverty level. The poverty status of a family group is assigned to each member of the family, thus making POORYN a person-level variable. Poverty status is also calculated for adults who live alone or with persons they are not related to; in such cases, POORYN is calculated based on the individual’s income.

Running the Model

For this analysis, we will use zelig5 package syntax.
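The Call line in the model output shows that the logit model is fitted through `stats::glm`. As a rough sketch of that fit, the block below runs the same formula with base R on simulated data. The data, sample size, and simulation coefficients are all made up, and the 1 to 4 coding of `hourswrk2` is an assumption; the Zelig calls themselves are not reproduced here.

```r
set.seed(2016)
n <- 5000

# Simulated stand-in for the analysis data (all values are made up).
sim <- data.frame(
  hourswrk1      = sample(35:80, n, replace = TRUE),
  female         = rbinom(n, 1, 0.5),
  poor_threshold = rbinom(n, 1, 0.1),
  age            = sample(18:70, n, replace = TRUE)
)

# Assumed coding of hourswrk2: 1 = 35-40, 2 = 41-49, 3 = 50-59, 4 = 60+.
sim$hourswrk2 <- findInterval(sim$hourswrk1, c(35, 41, 50, 60))

# Simulated binary outcome with arbitrary effects for female and poverty.
sim$K6.f <- rbinom(
  n, 1, plogis(-3 + 0.5 * sim$female + 1.3 * sim$poor_threshold)
)

# Same formula and family as in the reported Call.
fit <- glm(
  K6.f ~ hourswrk1 * female + poor_threshold + age + hourswrk2,
  family = binomial("logit"),
  data = sim
)

# Coefficients are on the log-odds scale; exponentiate for odds ratios.
print(round(exp(coef(fit)), 3))
```

The estimates from this sketch will not match the reported table, since the data are simulated; only the model structure is the same.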

Summarizing the Model in Odds Ratios.

From the table below, it can be seen that the association between working hours and mental distress level is not significant. However, people who are below the poverty threshold have 3.69 times the odds of severe mental distress compared with those above it.

Similarly, females have 1.62 times the odds of males of having mental distress, with the other variables in the model held constant, although this estimate is not statistically significant (p = 0.435).

## Model: 
## 
## Call:
## stats::glm(formula = K6.f ~ hourswrk1 * female + poor_threshold + 
##     age + hourswrk2, family = binomial("logit"), data = as.data.frame(.))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.4761  -0.2086  -0.1843  -0.1607   2.9737  
## 
## Coefficients:
##                  Estimate (OR) Std. Error (OR) z value Pr(>|z|)    
## (Intercept)           0.009869        0.006445  -7.072 1.53e-12 ***
## hourswrk1             1.002751        0.017860   0.154    0.877    
## female                1.616132        0.993629   0.781    0.435    
## poor_threshold        3.686797        0.654343   7.351 1.96e-13 ***
## age                   1.001624        0.004797   0.339    0.735    
## hourswrk2             1.076596        0.154857   0.513    0.608    
## hourswrk1:female      1.001263        0.013280   0.095    0.924    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2627.1  on 13390  degrees of freedom
## Residual deviance: 2563.0  on 13384  degrees of freedom
##   (83778 observations deleted due to missingness)
## AIC: 2577
## 
## Number of Fisher Scoring iterations: 7
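To connect the odds-ratio table to the usual log-odds scale, the sketch below converts two of the reported rows back. It assumes the "Std. Error (OR)" column was obtained with the delta method, SE(OR) = OR × SE(log-odds); the input numbers are copied from the table above.

```r
# Odds ratios and their standard errors, copied from the reported table.
or    <- c(poor_threshold = 3.686797, female = 1.616132)
se_or <- c(poor_threshold = 0.654343, female = 0.993629)

# Back to the log-odds scale.
log_odds <- log(or)

# Assumed delta-method relation: SE(OR) = OR * SE(log-odds).
se_log <- se_or / or

# Wald z statistic and two-sided p-value on the log-odds scale.
z <- log_odds / se_log
p <- 2 * pnorm(-abs(z))

print(round(cbind(log_odds, se_log, z, p), 3))
```

Under that assumption the recovered z values agree with the "z value" column (about 7.35 for poor_threshold and 0.78 for female), which suggests the significance tests were carried out on the log-odds scale.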

Hours Work and Mental Distress.

From our earlier model, we found that the association between long working hours and mental distress level is not significant. However, we can still use the model to examine why long working hours are such a weak predictor. The figure below shows a very weak association between mental distress and hours worked. Up to about 60 hours per week, the effect of long working hours on mental health is trivial.
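To give a feel for how flat this relationship is, the sketch below turns the reported odds ratios into predicted probabilities over a range of weekly hours. The respondent profile (male, above the poverty threshold, age 40), the 1 to 4 coding of `hourswrk2`, and the helper `predict_prob()` are assumptions for illustration; this is not the simulation that produced the figure.

```r
# Odds ratios copied from the reported model table.
or <- c(
  "(Intercept)"      = 0.009869,
  hourswrk1          = 1.002751,
  female             = 1.616132,
  poor_threshold     = 3.686797,
  age                = 1.001624,
  hourswrk2          = 1.076596,
  "hourswrk1:female" = 1.001263
)
beta <- log(or)  # log-odds coefficients

# Hypothetical helper: predicted probability for a given profile.
predict_prob <- function(hours, female = 0, poor = 0, age = 40) {
  # Assumed coding of hourswrk2: 1 = 35-40, 2 = 41-49, 3 = 50-59, 4 = 60+.
  band <- findInterval(hours, c(35, 41, 50, 60))
  eta <- beta["(Intercept)"] +
    beta["hourswrk1"] * hours +
    beta["female"] * female +
    beta["poor_threshold"] * poor +
    beta["age"] * age +
    beta["hourswrk2"] * band +
    beta["hourswrk1:female"] * hours * female
  unname(plogis(eta))
}

hours_grid <- seq(35, 80, by = 5)
print(data.frame(hours = hours_grid,
                 prob  = round(predict_prob(hours_grid), 4)))
```

For this profile the predicted probability stays below 0.05 across the whole grid, which is in line with the weak association described above.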

Age and Mental Distress Level

Age is an immutable characteristic. This means that, whether or not age is statistically significant, it is an important factor that cannot be modified the way other variables such as long working hours can. Again, our model found that the association between age and mental distress level is not significant, and the figure below gives a clearer picture of this weak association. The effect of age on mental distress level is very small. Even at the age of 70 years, the expected value of the mental distress level is 0.025.

Simulation for Gender Difference.

Statistical models

                      Model 1
(Intercept)           -4.62***
                      (0.65)
hourswrk1              0.00
                      (0.02)
female                 0.48
                      (0.61)
poor_threshold         1.30***
                      (0.18)
age                    0.00
                      (0.00)
hourswrk2              0.07
                      (0.14)
hourswrk1:female       0.00
                      (0.01)

AIC                    2577.02
BIC                    2629.54
Log Likelihood        -1281.51
Deviance               2563.02
Num. obs.              13391

*** p < 0.001, ** p < 0.01, * p < 0.05

First Differences for Variable “Female” (Where 1= Female and 0 = Male)

The first differences show that “x1”, which corresponds to female = 0 (male), has a predicted probability of severe mental distress about 0.01 lower than that of females (mean first difference of -0.0099).

##        V1           
##  Min.   :-0.017601  
##  1st Qu.:-0.011555  
##  Median :-0.009922  
##  Mean   :-0.009918  
##  3rd Qu.:-0.008258  
##  Max.   :-0.001353
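A first difference of this kind can be approximated by hand: draw coefficient vectors from their estimated sampling distribution, compute the predicted probability under two covariate profiles, and subtract. The sketch below does this on simulated data with a simplified model; the data, the variable `poor`, and the number of draws are assumptions, and it is not the Zelig code that produced the summary above.

```r
set.seed(42)
n <- 4000

# Simulated data with arbitrary effects (all values are made up).
d <- data.frame(
  female = rbinom(n, 1, 0.5),
  poor   = rbinom(n, 1, 0.15)
)
d$y <- rbinom(n, 1, plogis(-3 + 0.5 * d$female + 1.3 * d$poor))

fit <- glm(y ~ female + poor, family = binomial("logit"), data = d)

# Draw coefficient vectors from N(beta_hat, V_hat) via a Cholesky factor.
n_sims <- 1000
k      <- length(coef(fit))
draws  <- matrix(rnorm(n_sims * k), nrow = n_sims) %*% chol(vcov(fit))
draws  <- sweep(draws, 2, coef(fit), "+")

# Covariate profiles: intercept, female, poor (held at its sample mean).
x_female <- c(1, 1, mean(d$poor))
x_male   <- c(1, 0, mean(d$poor))

# First difference: P(y = 1 | male) - P(y = 1 | female), one per draw.
fd <- as.vector(plogis(draws %*% x_male) - plogis(draws %*% x_female))
print(summary(fd))
```

The spread of `fd` across draws reflects the estimation uncertainty in the coefficients, which is why the output is reported as a distribution (minimum, quartiles, mean, maximum) rather than a single number.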

Simulation for Poverty Threshold.

Statistical models

                              Model 1
(Intercept)                   -4.24***
                              (0.60)
hourswrk1                      0.00
                              (0.02)
poor_threshold                 1.30
                              (0.82)
age                            0.00
                              (0.00)
hourswrk2                      0.06
                              (0.14)
hourswrk1:poor_threshold       0.00
                              (0.02)

AIC                            2593.24
BIC                            2638.26
Log Likelihood                -1290.62
Deviance                       2581.24
Num. obs.                      13391

*** p < 0.001, ** p < 0.01, * p < 0.05

First Differences for Variable “Poor Threshold” (Where 1 = Below Poverty Threshold and 0 = Above Poverty Threshold)

The first difference tells us that people who are above the poverty threshold have a predicted probability of developing severe mental distress about 0.04 lower than those who are below the poverty threshold.

Testing for Gender Differences in Working Hours.

To find the specific working-hours category in which an individual has the highest probability of developing mental distress, we use the variable “hourswrk2”. The variable hourswrk2 is recoded into four categories: 1 = 35-40, 2 = 41-49, 3 = 50-59, 4 = 60+. We will use the last three categories to test for these differences using Zelig. We also want to see the individual effect of each category by gender, so we use the variable female, which is coded 1 = Females and 0 = Males.

The figure below shows that females who work between 41 and 49 hours are more likely to develop severe mental distress than men who work between 41 and 49 hours.

Interestingly, females who work between 50 and 59 hours have a lower probability of developing mental distress than men.

Similarly, men who work 60 hours or more per week have a slightly lower probability of developing mental distress than women.

Histogram for Predicted Probabilities of Mental Distress Levels.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusion

We used ggplot2 to construct a histogram based on the average predicted probabilities for the three categories of long working hours. It seems clear that very few people fall in the 60+ weekly working hours category, and those people also have elevated odds of developing severe mental distress. In conclusion, the results of this analysis show that the association between long working hours and mental distress levels is very weak and not statistically significant. It is important to mention that this single-year (2016) dataset does not give textured results. At least five years of data would be needed to gain more meaningful insights into the impact of long working hours on severe mental distress.