Analytics Final Project

Background

While brainstorming topics for our project, World Mental Health Day drew our attention.

World Mental Health Day

In the UK, 1 in 6 people report experiencing mental health problems in any given week, and 1 in 4 adults experience mental illness in their lifetime (here for more details). Sadly, the story does not end here: major depression is the second leading cause of disability worldwide, and poor mental health is a major contributor to suicide.
Although many people are content with their standards of living, there are also a vast number of people who suffer from mental illnesses, and unfortunately, those struggling with mental health choose far too often to take their own life (click here for further data on suicide and related factors).
The statistics speak for themselves. Suicide is the single biggest killer of men under the age of 45 in the UK, and worldwide one man dies by suicide every minute of every day (Movember, 2020).
In the face of these statistics, we naturally ask: how can suicide be prevented? Can contemporary economics explain these numbers? Can suicide rates be reduced through improved economic welfare and higher incomes? Is the suicide rate perhaps related to changes in GDP per capita? Can we see something in this graph?

To find out whether GDP per capita impacts suicide rate, we started our analysis.

Introduction

This study aims to explore the causal relationship between GDP per capita and suicide rates in the UK, using time series data between the years 1985 and 2010. Specifically, we explore the hypothesis that growth of GDP per capita reduces suicide rates. The supporting causal mechanism is that rising GDP per capita may result in improved living standards, an alleviation of extreme poverty, and improved economic development. Thus, an increase in GDP per capita may improve individuals’ welfare and reduce the rate of suicide. Although our initial findings confirm this hypothesis, the impact of changes in GDP per capita upon suicide rates is limited. Interestingly, controlling for time effects and the non-stationarity of log_GDPpc, the causal effect becomes positive, suggesting that an increase in the growth rate of GDP per capita increases suicide rates in the UK.

Methods and data

To explore the causal relationship between changes in GDP per capita and suicide rates, we first estimate the following simple OLS regression: \[Suicide~rate_t = \beta_0 + \beta_1log~GDP~per~capita_t + \epsilon\] where \(Suicide~rate_t\) is the number of suicides per 100 thousand people (suicide/100k) in year t and \(log~GDP~per~capita_t\) is the natural logarithm of GDP per capita in year t. Suicide rate data are compiled from a dataset backed by sources including the UNDP, World Bank and WHO (more details here).
However, omitted variable bias and endogeneity issues are likely to confound these results, and thus regressions are re-run with relevant control variables and instrumental variables to better account for these factors. After running an IV regression, we test and control for time effects and non-stationarity, in order to determine the true causal effect of changes in GDP per capita upon suicide rates in the UK.

Our first control is gender, through the dummy variable sexfemale, which takes the value of 1 if an individual is female, and 0 if male. We control for gender as we believe suicide rates may differ between genders, due to factors such as gender discrimination and differing/unequal societal expectations/norms for men and women.

Secondly, we include the unemployment rate, as increased unemployment may reduce incomes and lead to relative poverty and even homelessness, pushing up suicide rates. Moreover, unemployment may also negatively impact individuals’ sense of value, resulting in social alienation, health problems and mental illness (Effects of unemployment on health, US Library of Health), further increasing suicide rates. Therefore, we expect the exclusion of the unemployment rate from the initial simple OLS model to cause downward bias in the GDP per capita coefficient, overstating the negative impact of GDP per capita upon suicide rates, as rises in GDP per capita are likely negatively correlated with unemployment, and unemployment is likely positively correlated with suicide rates.
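
The direction of this bias can be illustrated with a small simulation. The sketch below does not use the study's data: the variable names and all coefficients (a true GDP effect of -1, an unemployment effect of +2, and a slope of -0.8 linking unemployment to GDP) are assumptions chosen only to mirror the argument above.

```r
# Simulated illustration of omitted variable bias (all numbers are made up).
set.seed(1)
n <- 5000
log_gdp <- rnorm(n)

# Unemployment falls when GDP per capita rises (negative correlation).
unemployment <- -0.8 * log_gdp + rnorm(n, sd = 0.6)

# Assumed true model: GDP effect is -1, unemployment effect is +2.
suicide_rate <- 10 - 1 * log_gdp + 2 * unemployment + rnorm(n)

short_fit <- lm(suicide_rate ~ log_gdp)                 # unemployment omitted
long_fit  <- lm(suicide_rate ~ log_gdp + unemployment)  # unemployment included

b_short <- unname(coef(short_fit)["log_gdp"])
b_long  <- unname(coef(long_fit)["log_gdp"])

# Expected bias = (effect of unemployment) x (slope of unemployment on log_gdp)
#               = 2 * (-0.8) = -1.6, so b_short should be close to -2.6.
print(c(short = b_short, long = b_long))
```

In this setup the short regression attributes the effect of unemployment to GDP, so its coefficient is more negative than the true effect, which is the downward bias described above.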

We also control for inequality, using income inequality as a proxy, by including Gini index. Robert Merton’s ‘Social Strain Theory’ posits that high concentrations of wealth in a country lead to unequal opportunities and enormous pressure on society’s poorest, resulting in mental strain and suicide (Merton, 1938). Mean household size is also included, as smaller households may result in increased loneliness of individuals as well as social alienation. Thus, we expect household sizes to be negatively associated with suicide rate, with lower average household size leading to higher suicide rates (Neumayer, 2003).

The final control variable is the secondary school enrolment rate, as a proxy for education. Education may be an important factor affecting suicide rates, as educated individuals may be more aware of the importance of mental health and take precautions to protect it, hence reducing the suicide rate.

Results

Looking at a scatter plot of suicide/100K and log GDP per capita, we see a clear negative relationship between the two variables.


【Simple Regression】
This relationship is also highlighted by the simple OLS regression:

## 
## Call:
## lm(formula = `suicide/100k pop` ~ log_GDPpc, data = master)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.729 -4.484 -2.458  5.486 15.464 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   28.540      7.862   3.630 0.000331 ***
## log_GDPpc     -2.054      0.771  -2.664 0.008135 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.107 on 310 degrees of freedom
## Multiple R-squared:  0.02237,    Adjusted R-squared:  0.01922 
## F-statistic: 7.095 on 1 and 310 DF,  p-value: 0.008135

The results suggest a significant negative relationship between GDP per capita and suicide rate, with a 1 percent rise in GDP per capita associated with a 0.0205 fall in suicide/100k (~2 fewer deaths per 10M people). As the p-value is 0.008135, we reject the null hypothesis that \(\beta_1=0\), and thus the coefficient is statistically significant at the 1% level.
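
As a short worked version of this interpretation, assuming the usual level-log approximation: a 1% rise in GDP per capita raises its natural logarithm by about 0.01 (since ln(1.01) is roughly 0.00995), so the implied change in the suicide rate is

\[\Delta Suicide~rate \approx \beta_1 \times \Delta log~GDP~per~capita \approx -2.054 \times 0.01 \approx -0.0205\]

Multiplying by 100 converts this from a rate per 100k to roughly 2 fewer suicides per 10M people.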

However, despite statistical significance, the coefficient is relatively small, and the model has a very low adjusted R-squared value, of only 0.0192. As a result, only 1.9% of variation in suicide rates is actually explained by changes in GDP per capita. Thus, from these findings changes in GDP per capita are unlikely to be a significant driver of suicide rates in the UK.
Moreover, these results may not indicate the true causal impact of changes in GDP per capita on the suicide rate, due to a number of confounding factors, including omitted variable bias. As a result, the low coefficient may be a result of upward bias, making the causal effect appear smaller than it actually is, or conversely downward bias, with the real impact of GDP per capita perhaps being positive. Consequently, we construct a multivariate regression model to better control for confounding factors.

【Multivariate Regression】


\[Suicide~rate = \beta_0 +\beta_1log~GDP~per~capita+\beta_2female+\beta_3unemployment+\beta_4Gini+\beta_5Average~household~size+\beta_6Secondary~School~enrolment+\epsilon\]

The results from the multivariate regression, including control variables, can be seen below.

## 
## Call:
## lm(formula = `suicide/100k pop` ~ log_GDPpc + sex + unemployment + 
##     Gini_index + Average_household_size + Secondary_School_enrolment_rate, 
##     data = master)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.3483  -0.9902   0.5394   1.6700   9.2838 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      -9.58045  104.60263  -0.092    0.927    
## log_GDPpc                        -0.85477    2.60418  -0.328    0.743    
## sexfemale                         7.79822    0.62419  12.493   <2e-16 ***
## unemployment                     -0.10819    0.33453  -0.323    0.747    
## Gini_index                       -0.04497    0.37068  -0.121    0.904    
## Average_household_size           10.90709   30.59480   0.357    0.722    
## Secondary_School_enrolment_rate  -0.02075    0.18254  -0.114    0.910    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.187 on 173 degrees of freedom
##   (132 observations deleted due to missingness)
## Multiple R-squared:  0.4769, Adjusted R-squared:  0.4588 
## F-statistic: 26.29 on 6 and 173 DF,  p-value: < 2.2e-16

Firstly, the coefficient of GDP per capita moves towards zero (to -0.85) and becomes statistically insignificant at all conventional levels (with a p-value of 0.743). Secondly, the adjusted R-squared value increases substantially compared with the simple OLS regression, from 0.01922 to 0.4588. Thus, the multivariate regression accounts for almost half of the variation in suicide rates. Thirdly, we find that, with the exception of the gender dummy variable, none of the additional control variables is statistically significant. As a result, it would appear that only gender has a significant effect on suicide, with the estimate implying almost 8 more suicides per 100k among women than among men, ceteris paribus. This may be indicative of gender inequality and societal pressures on women, such as gender discrimination, which may result in increased mental illness and consequently higher suicide rates compared with their male counterparts. However, the sign of this estimate is at odds with the statistics cited in the Background, which point to higher suicide rates among men, so the coding of the sex dummy should be verified before this coefficient is interpreted.

However, we cannot immediately conclude that changes in GDP per capita are not statistically significant in explaining suicide rates. Rather, this result may be due to multicollinearity between the explanatory variables, which inflates standard errors, and the variables may still be jointly significant. Consequently, we run several diagnostic tests, including an F-test for joint significance, and compute VIFs to determine whether multicollinearity plays a role.

【Multicollinearity check】

##                                  log_GDPpc unemployment Gini_index
## log_GDPpc                        1.0000000   -0.8193791  0.8653588
## unemployment                    -0.8193791    1.0000000 -0.7648058
## Gini_index                       0.8653588   -0.7648058  1.0000000
## Average_household_size                  NA           NA         NA
## Secondary_School_enrolment_rate  0.7337892   -0.5782068  0.4842689
##                                 Average_household_size
## log_GDPpc                                           NA
## unemployment                                        NA
## Gini_index                                          NA
## Average_household_size                               1
## Secondary_School_enrolment_rate                     NA
##                                 Secondary_School_enrolment_rate
## log_GDPpc                                             0.7337892
## unemployment                                         -0.5782068
## Gini_index                                            0.4842689
## Average_household_size                                       NA
## Secondary_School_enrolment_rate                       1.0000000

## Linear hypothesis test
## 
## Hypothesis:
## log_GDPpc = 0
## sexfemale = 0
## unemployment = 0
## Gini_index = 0
## Average_household_size = 0
## Secondary_School_enrolment_rate = 0
## 
## Model 1: restricted model
## Model 2: `suicide/100k pop` ~ log_GDPpc + sex + unemployment + Gini_index + 
##     Average_household_size + Secondary_School_enrolment_rate
## 
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    179 5798.6                                  
## 2    173 3033.1  6    2765.5 26.289 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##                       log_GDPpc                             sex 
##                        3.735700                        1.000000 
##                    unemployment                      Gini_index 
##                        1.381500                        2.838073 
##          Average_household_size Secondary_School_enrolment_rate 
##                        3.826946                        1.791867

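From the above tests, we find that the macroeconomic controls are highly correlated with log_GDPpc (absolute correlations between 0.73 and 0.87; the correlations involving Average_household_size are NA, most likely because that variable has missing values), so there may well be a multicollinearity issue. Moreover, the p-value of the joint hypothesis test is very small, so we reject the null hypothesis that all coefficients are zero and conclude that the coefficients are jointly significant. Since this test includes sexfemale, which is individually highly significant, it does not by itself show that the macroeconomic variables matter. However, despite the correlation between variables, all VIFs are below 5 (the largest being 3.83 for Average_household_size and 3.74 for log_GDPpc), so although multicollinearity may inflate the standard errors, it is unlikely to be severely problematic.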
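
For reference, a VIF can be computed by hand from auxiliary regressions as 1 / (1 - R²), where R² comes from regressing one explanatory variable on all the others. The sketch below uses simulated regressors (x1, x2 and x3 are hypothetical names, not the study's variables) rather than the packaged routine used for the output above.

```r
# VIF by hand on simulated regressors (illustrative data only).
set.seed(2)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)  # strongly correlated with x1
x3 <- rnorm(n)                       # unrelated to the others
X  <- data.frame(x1, x2, x3)

vif_by_hand <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  # Auxiliary regression of one regressor on all remaining regressors
  aux <- lm(reformulate(others, response = v), data = X)
  1 / (1 - summary(aux)$r.squared)
})

print(round(vif_by_hand, 2))
```

A VIF of 1 means a regressor is uncorrelated with the others, and larger values indicate that more of its variation is explained by the remaining regressors, which is what inflates the standard error of its coefficient.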

Despite this, including additional control variables does not always provide a better estimate of the causal relationship. In this case, there may be a causal channel from changes in GDP per capita via unemployment to suicide rate: an increase in GDP per capita increases the demand for labour, thus reducing unemployment, and therefore results in lower suicide rates. Consequently, by including unemployment as a control variable, we shut-down this causal channel and so the estimate of the coefficient shows only the impact of a change in GDP per capita whilst holding unemployment constant, and not the true causal effect on suicide rates. This may be particularly problematic for policymakers who wish to explore the full causal impact of changes in GDP per capita on suicide rates.

As a result, to further address this endogeneity problem, we include the inflation rate and the price of Brent crude oil as two instrumental variables for log GDP per capita and Gini index.
【IV Regression】

## 
## Call:
## ivreg(formula = `suicide/100k pop` ~ log_GDPpc + Gini_index + 
##     sex + unemployment + Average_household_size + Secondary_School_enrolment_rate | 
##     inflation_rate + oil_price + sex + unemployment + Average_household_size + 
##         Secondary_School_enrolment_rate, data = master)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.3709  -1.0246   0.5993   1.6956   9.3571 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -31.02378  398.50198  -0.078    0.938    
## log_GDPpc                        -0.97116    5.35803  -0.181    0.856    
## Gini_index                        0.11534    3.35336   0.034    0.973    
## sexfemale                         7.79822    0.62454  12.486   <2e-16 ***
## unemployment                     -0.07172    0.85050  -0.084    0.933    
## Average_household_size           15.88603   92.53431   0.172    0.864    
## Secondary_School_enrolment_rate   0.02855    0.99476   0.029    0.977    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.19 on 173 degrees of freedom
## Multiple R-Squared: 0.4763,  Adjusted R-squared: 0.4582 
## Wald test: 26.24 on 6 and 173 DF,  p-value: < 2.2e-16

The results of the instrumental variable regression are shown above; the first-stage regressions and instrument diagnostics follow.

【Instrument Tests】

## 
## Call:
## lm(formula = Gini_index ~ oil_price + inflation_rate, data = master)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4375 -0.9892  0.1335  0.9589  2.5507 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    33.307490   0.226029 147.359  < 2e-16 ***
## oil_price       0.047127   0.003778  12.474  < 2e-16 ***
## inflation_rate -0.257925   0.047086  -5.478 8.94e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.504 on 309 degrees of freedom
## Multiple R-squared:  0.4128, Adjusted R-squared:  0.409 
## F-statistic: 108.6 on 2 and 309 DF,  p-value: < 2.2e-16

## Linear hypothesis test
## 
## Hypothesis:
## oil_price = 0
## 
## Model 1: restricted model
## Model 2: Gini_index ~ oil_price + inflation_rate
## 
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1    310 1051.58                                  
## 2    309  699.41  1    352.18 155.59 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Call:
## lm(formula = log_GDPpc ~ oil_price + inflation_rate, data = master)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.67023 -0.11184  0.06906  0.13430  0.40419 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    10.1280262  0.0373532  271.14   <2e-16 ***
## oil_price       0.0122607  0.0006244   19.64   <2e-16 ***
## inflation_rate -0.1094714  0.0077813  -14.07   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2486 on 309 degrees of freedom
## Multiple R-squared:  0.6956, Adjusted R-squared:  0.6936 
## F-statistic:   353 on 2 and 309 DF,  p-value: < 2.2e-16

## Linear hypothesis test
## 
## Hypothesis:
## inflation_rate = 0
## 
## Model 1: restricted model
## Model 2: log_GDPpc ~ oil_price + inflation_rate
## 
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    310 31.336                                  
## 2    309 19.101  1    12.235 197.92 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Call:
## ivreg(formula = `suicide/100k pop` ~ log_GDPpc + Gini_index + 
##     sex + unemployment + Average_household_size + Secondary_School_enrolment_rate | 
##     inflation_rate + oil_price + sex + unemployment + Average_household_size + 
##         Secondary_School_enrolment_rate, data = master)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.3709  -1.0246   0.5993   1.6956   9.3571 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -31.02378  398.50198  -0.078    0.938    
## log_GDPpc                        -0.97116    5.35803  -0.181    0.856    
## Gini_index                        0.11534    3.35336   0.034    0.973    
## sexfemale                         7.79822    0.62454  12.486   <2e-16 ***
## unemployment                     -0.07172    0.85050  -0.084    0.933    
## Average_household_size           15.88603   92.53431   0.172    0.864    
## Secondary_School_enrolment_rate   0.02855    0.99476   0.029    0.977    
## 
## Diagnostic tests:
##                               df1 df2 statistic p-value    
## Weak instruments (log_GDPpc)    2 173   146.213  <2e-16 ***
## Weak instruments (Gini_index)   2 173     2.906  0.0574 .  
## Wu-Hausman                      2 171     0.002  0.9977    
## Sargan                          0  NA        NA      NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.19 on 173 degrees of freedom
## Multiple R-Squared: 0.4763,  Adjusted R-squared: 0.4582 
## Wald test: 26.24 on 6 and 173 DF,  p-value: < 2.2e-16

Interestingly, once the instruments are used, the estimated effect of GDP per capita becomes slightly stronger, with a coefficient of -0.97 on log_GDPpc. Taken at face value, this indicates that a 1% increase in GDP per capita results in a reduction of 0.0097 suicides per 100k, i.e. about 1 suicide per 10M population. However, the estimate remains statistically insignificant (p-value of 0.856) and its standard error roughly doubles relative to the OLS estimate, so the similarity between the IV and OLS coefficients offers at most weak evidence on whether endogeneity affects the multivariate OLS regression.

However, a variable can only be a good instrument if it satisfies the following three criteria. First, the instrument must be uncorrelated with the shocks captured in the error term. Second, it must be a driver of the variable of interest (relevance). Third, it must not affect the outcome variable other than through the variable of interest (the exclusion restriction). Criteria 1 and 3 can only be argued, whereas criterion 2 is tested through the weak instruments test (above). The results show that the instruments are strong for log_GDPpc (F-statistic of 146.2) but weak for Gini_index (F-statistic of 2.9, with a p-value of 0.057), so criterion 2 is clearly met only for log_GDPpc. Because the model has exactly as many instruments as endogenous regressors, the Sargan test of overidentifying restrictions cannot be computed, and the validity of the instruments cannot be tested further. In addition, the (Wu-)Hausman test for endogeneity has a large p-value (0.998). Thus, we cannot reject the null hypothesis that the variables of concern are uncorrelated with the error term, meaning that endogeneity might not actually be a significant issue in our original multivariate regression. Consequently, in this case the OLS regression may be justified as a good estimate of the causal relationship, and preferred over the instrumental variables regression.
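
The logic of the relevance check and of the IV estimate can be sketched with a manual two-stage least squares on simulated data. This is an illustration under assumed values (one instrument z, one endogenous regressor x, a true effect of 2, all names hypothetical), not a reproduction of the regression above.

```r
# Manual two-stage least squares on simulated data (illustrative only).
set.seed(3)
n <- 5000
z <- rnorm(n)                      # instrument
u <- rnorm(n)                      # unobserved confounder
x <- 0.8 * z + u + rnorm(n)        # endogenous regressor, driven by z and u
y <- 1 + 2 * x + 3 * u + rnorm(n)  # assumed true effect of x is 2

ols <- lm(y ~ x)                   # biased, because u is omitted

first_stage  <- lm(x ~ z)          # stage 1: instrument relevance
x_hat        <- fitted(first_stage)
second_stage <- lm(y ~ x_hat)      # stage 2: uses only variation driven by z

first_stage_F <- unname(summary(first_stage)$fstatistic["value"])
b_ols <- unname(coef(ols)["x"])
b_iv  <- unname(coef(second_stage)["x_hat"])

# Note: standard errors from this manual second stage are not valid;
# a dedicated IV routine corrects them.
print(c(first_stage_F = first_stage_F, ols = b_ols, iv = b_iv))
```

A common rule of thumb treats a first-stage F-statistic below 10 as a sign of a weak instrument, which is the benchmark against which the weak instruments statistics above can be read.
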
【Time Series】

However, as log GDP per capita and suicide rates may have grown or shrunk continuously over the time period, our estimates of the causal effect of changes in GDP per capita on suicide rates may be biased. To explore this further, we look at the time trend of suicide rate and log GDP per capita between 1985 and 2010.

The graph shows a clear negative trend for suicide rates over the period, and a positive trend for log_GDPpc. Thus, it is possible that the time trend is indeed a confounding factor, which may capture the impact of factors that change over time, such as increased awareness and treatment of mental health, social and demographic changes, and GDP growth from technological and productivity advancements. Consequently, this may bias our estimated causal effect. Therefore, we control for the time trend by including a linear time variable t and run the following regression:

\[Suicide~rate_t = \beta_0 +\beta_1logGDPpc_t+\beta_2t+\epsilon\]

## 
## Call:
## lm(formula = `suicide/100k pop` ~ log_GDPpc + t, data = master)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.345 -4.467 -2.292  5.351 14.253 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -33.94727   24.52622  -1.384  0.16732   
## log_GDPpc     4.61451    2.59674   1.777  0.07654 . 
## t            -0.03474    0.01293  -2.687  0.00761 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.047 on 309 degrees of freedom
## Multiple R-squared:  0.04469,    Adjusted R-squared:  0.03851 
## F-statistic: 7.228 on 2 and 309 DF,  p-value: 0.0008557

Surprisingly, the results indicate that for every 1 percent increase in GDP per capita, the suicide rate increases by about 0.046 per 100k people, i.e. roughly 4.6 more suicides per 10M people. Thus, when controlling for the time trend, the estimated relationship between GDP per capita and suicide rate becomes positive rather than negative, and larger in magnitude than in the previous regressions, although it is statistically significant only at the 10% level (p-value of 0.077); the time trend itself is significant at the 1% level. However, to check the validity of this relationship, we must use Dickey-Fuller tests to check whether the series are stationary.
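
How a time trend can drive the sign of a coefficient can be illustrated with simulated data. The sketch below is not based on the study's data: it generates two series (hypothetical names x and y) that trend in opposite directions but are otherwise unrelated, and compares the slope with and without a trend term.

```r
# Two unrelated series with opposite trends (simulated, illustrative only).
set.seed(4)
trend <- 1:200
x <- 0.05 * trend + rnorm(200)       # trends upwards
y <- 20 - 0.10 * trend + rnorm(200)  # trends downwards, not driven by x

b_no_trend   <- unname(coef(lm(y ~ x))["x"])          # picks up the common trend
b_with_trend <- unname(coef(lm(y ~ x + trend))["x"])  # trend controlled for

print(c(no_trend = b_no_trend, with_trend = b_with_trend))
```

Without the trend term the regression reports a strongly negative slope even though x has no effect on y by construction; once the trend is included the slope is close to zero.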

【Unit Root Tests & First Differences】

# Testing for unit roots (augmented Dickey-Fuller test)
library(urca)
library(magrittr)  # provides the %>% pipe
ur.df(master$log_GDPpc, type = "none", lags = 1) %>% summary()

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.203932 -0.004974 -0.004753 -0.004565  0.207565 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## z.lag.1     0.0004682  0.0001810   2.586   0.0102 *
## z.diff.lag -0.0217253  0.0569938  -0.381   0.7033  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03215 on 308 degrees of freedom
## Multiple R-squared:  0.02126,    Adjusted R-squared:  0.0149 
## F-statistic: 3.345 on 2 and 308 DF,  p-value: 0.03656
## 
## 
## Value of test-statistic is: 2.5864 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

ur.df(master$`suicide/100k pop`,type = "none",lags = 1)%>%summary()

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7112 -0.9534  0.0740  0.8879 23.9185 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -0.17665    0.03269  -5.403 1.31e-07 ***
## z.diff.lag  0.02649    0.05650   0.469    0.639    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.395 on 308 degrees of freedom
## Multiple R-squared:  0.08961,    Adjusted R-squared:  0.0837 
## F-statistic: 15.16 on 2 and 308 DF,  p-value: 5.263e-07
## 
## 
## Value of test-statistic is: -5.4032 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

# Getting rid of unit roots
ur.df(diff(master$log_GDPpc,1),type = "none",lags = 1)%>%summary()

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.1989  0.0000  0.0000  0.0000  0.2119 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -1.000e+00  8.071e-02  -12.39   <2e-16 ***
## z.diff.lag -4.887e-33  5.707e-02    0.00        1    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03255 on 307 degrees of freedom
## Multiple R-squared:    0.5,  Adjusted R-squared:  0.4967 
## F-statistic: 153.5 on 2 and 307 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -12.3895 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

ur.df(diff(master$`suicide/100k pop`,1),type = "none",lags = 1)%>%summary()

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.233 -2.285 -1.113 -0.372 23.660 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -1.17241    0.08249 -14.213   <2e-16 ***
## z.diff.lag  0.10846    0.05670   1.913   0.0567 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.621 on 307 degrees of freedom
## Multiple R-squared:  0.5344, Adjusted R-squared:  0.5314 
## F-statistic: 176.2 on 2 and 307 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -14.2127 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

## 
## Call:
## lm(formula = `suicide/100k pop` ~ Dlog_GDPpc + t, data = master)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.705 -4.229 -2.344  4.786 22.625 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.884911   0.671683  13.228  < 2e-16 ***
## Dlog_GDPpc  53.548093  10.341900   5.178 4.06e-07 ***
## t           -0.010004   0.003683  -2.716  0.00698 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.799 on 308 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1087, Adjusted R-squared:  0.1029 
## F-statistic: 18.78 on 2 and 308 DF,  p-value: 2.018e-08

Setting the null hypothesis that the coefficient on the lagged term equals 0 (a unit root), against the alternative hypothesis that the coefficient is less than 0, we find a test statistic of 2.586 for log_GDPpc and -5.403 for suicide/100K. As a result, although we reject the null hypothesis for suicide/100K, the test statistic for log_GDPpc is greater than the critical value at the 5% significance level (2.5864 > -1.95), and therefore we fail to reject the null hypothesis for log_GDPpc. Thus, we conclude that log_GDPpc has a unit root, i.e. it behaves like a random walk and has a stochastic trend. This is problematic because, when a regressor has a stochastic trend, the distribution of the OLS estimator and its t-statistic is non-normal, even asymptotically (Hanck et al., 2020). To address the unit root issue, we take first differences of the log_GDPpc variable, test whether the unit root has been removed after first-differencing (the test statistic of -12.39 is well below the critical values), and run the regression of suicide rate on the change in log_GDPpc.
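
The test statistic used here can be sketched by hand. The code below implements the simple Dickey-Fuller regression without intercept or lagged differences (a simplification of the augmented test with one lag reported above) on simulated series; the function name df_stat and the series are illustrative assumptions.

```r
# Simple Dickey-Fuller statistic: t-value on the lagged level in a
# regression of the first difference on the lagged level, no intercept.
df_stat <- function(z) {
  dz    <- diff(z)
  z_lag <- z[-length(z)]
  fit   <- lm(dz ~ 0 + z_lag)
  unname(coef(summary(fit))["z_lag", "t value"])
}

set.seed(5)
random_walk <- cumsum(rnorm(500))                                      # unit root
stationary  <- as.numeric(arima.sim(model = list(ar = 0.5), n = 500))  # no unit root

stat_rw      <- df_stat(random_walk)
stat_rw_diff <- df_stat(diff(random_walk))  # first difference of the random walk
stat_ar1     <- df_stat(stationary)
critical_5pct <- -1.95                      # tau1 critical value quoted above

print(c(random_walk = stat_rw, differenced = stat_rw_diff, stationary = stat_ar1))
```

The unit root null is rejected when the statistic falls below the critical value. The stationary series and the differenced random walk should give strongly negative statistics, whereas a random walk in levels typically does not.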

Interestingly, our revised regression (without unit root) indicates that a 1 percentage point increase in the growth rate of GDP per capita is associated with an increase in the suicide rate of 0.535 per 100K, i.e. roughly 1 more suicide per 200K population. This is also strongly statistically significant, and far larger than in previous estimations. As a result, it is likely that our previously obtained estimates were affected by time trend effects and non-stationarity, and thus, when controlling for these factors, the estimated relationship between growth in GDP per capita and the suicide rate is actually positive. Consequently, faster growth in GDP per capita relates to an increase in suicide/100k rather than a fall in the suicide rate as hypothesised. One explanation for this may be that a rise in the growth rate of GDP per capita results in an increase in individuals’ workloads and work pressure. This may consequently worsen individuals’ work-life balance and mental health, and thus result in higher suicide rates.
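
The conversion behind this figure can be written out explicitly, assuming Dlog_GDPpc is measured in log points so that a growth rate of 1 percentage point corresponds to a value of 0.01:

\[53.548 \times 0.01 \approx 0.535~suicides~per~100k \approx 1.07~suicides~per~200k\]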

Conclusion

As we have become more aware of the impact of mental health and the significance of suicide as a leading cause of death, the importance of reducing suicide rates has become ever more apparent. For policymakers around the world, and more specifically in the UK, the impact of economic instruments and the causal channels which affect suicide rates are thus of paramount importance. In the case of GDP per capita, it would appear that increased growth rates may have potentially deadly consequences.

In this study, we explored the link between GDP per capita and suicide per 100 thousand population in the UK, across the time period 1985-2010. Interestingly, we found a significant positive relationship between increases in the growth rate of GDP per capita and the suicide rate. If we take these numbers at face value, they imply that an increase in the growth rate of GDP per capita by 1 percentage point results in approximately 1 more suicide per 200 thousand population. This relates to an increase of around 7% from the mean suicide/100K.

Although this figure should be subject to further scrutiny, for example through more sophisticated empirical models, the impact of increased growth rates upon suicide is clearly a topic which merits further discussion, including further exploration of the impacts of age and gender, which is beyond the scope of this study. Nevertheless, these findings suggest that policymakers should aim to maintain steady growth rates in GDP per capita and to tackle the causal mechanisms running between GDP per capita and suicide rates, for example by implementing safeguards and regulation to better protect working conditions, work-life balance and mental health.

Appendix

Merton, R.K., 1938. Social Structure and Anomie. American Sociological Review, 3(5), pp.672-682. JSTOR, www.jstor.org/stable/2084686. Accessed 26 Nov. 2020.

Neumayer, E., 2003. Socioeconomic Factors and Suicide Rates at Large-unit Aggregate Levels: A Comment. Urban Studies, 40(13), pp.2769-2776.

Hanck et al., 2020. Introduction to Econometrics with R. Germany: University of Duisburg-Essen. Accessed 29 Nov. 2020.

——————————————————————————————

【Multivariate Regressions by Age groups】

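# Split the data into age groups and compare the differences in causal effects
library(dplyr)

# NOTE: `labels` are matched to the factor levels in their existing order. If
# `age` is a character column, the levels sort alphabetically ("15-24 years"
# first), so check levels(factor(master$age)) against this vector.
master1 <- master %>%
  mutate(age = factor(age, labels = c("75+ years", "55-74 years", "35-54 years",
                                      "25-34 years", "15-24 years", "5-14 years")))
master1$age <- as.character(master1$age)

ages <- names(table(master1$age))

# Fit the same multivariate regression within each age group and store the summaries
AG <- list()
for (i in ages) {
  year <- master1 %>% filter(age == i)  # rows for age group i
  r <- lm(`suicide/100k pop` ~ log_GDPpc + sex + unemployment + Gini_index +
            Average_household_size + Secondary_School_enrolment_rate, year)
  AG[[i]] <- summary(r)
}

AG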

## $`15-24 years`
## 
## Call:
## lm(formula = `suicide/100k pop` ~ log_GDPpc + sex + unemployment + 
##     Gini_index + Average_household_size + Secondary_School_enrolment_rate, 
##     data = year)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77209 -0.22960 -0.00992  0.23904  0.82983 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -10.604168  25.744942  -0.412    0.684    
## log_GDPpc                         0.016060   0.640944   0.025    0.980    
## sexfemale                         6.978000   0.153626  45.422   <2e-16 ***
## unemployment                      0.055232   0.082335   0.671    0.509    
## Gini_index                        0.059306   0.091233   0.650    0.522    
## Average_household_size            4.779799   7.530035   0.635    0.532    
## Secondary_School_enrolment_rate   0.004333   0.044926   0.096    0.924    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4207 on 23 degrees of freedom
##   (22 observations deleted due to missingness)
## Multiple R-squared:  0.989,  Adjusted R-squared:  0.9861 
## F-statistic: 344.2 on 6 and 23 DF,  p-value: < 2.2e-16
## 
## 
## $`25-34 years`
## 
## Call:
## lm(formula = `suicide/100k pop` ~ log_GDPpc + sex + unemployment + 
##     Gini_index + Average_household_size + Secondary_School_enrolment_rate, 
##     data = year)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.064595 -0.045739 -0.006506  0.031459  0.089651 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)                     -0.224570   3.028473  -0.074    0.942
## log_GDPpc                        0.039840   0.075397   0.528    0.602
## sexfemale                        0.024000   0.018072   1.328    0.197
## unemployment                    -0.006884   0.009685  -0.711    0.484
## Gini_index                       0.007187   0.010732   0.670    0.510
## Average_household_size           0.134207   0.885786   0.152    0.881
## Secondary_School_enrolment_rate -0.006282   0.005285  -1.189    0.247
## 
## Residual standard error: 0.04949 on 23 degrees of freedom
##   (22 observations deleted due to missingness)
## Multiple R-squared:  0.3244, Adjusted R-squared:  0.1482 
## F-statistic: 1.841 on 6 and 23 DF,  p-value: 0.1352
## 
## 
## $`35-54 years`
## 
## Call:
## lm(formula = `suicide/100k pop` ~ log_GDPpc + sex + unemployment + 
##     Gini_index + Average_household_size + Secondary_School_enrolment_rate, 
##     data = year)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.02225 -0.45863 -0.01513  0.30943  1.32638 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      25.698345  35.017811   0.734    0.470    
## log_GDPpc                         0.448191   0.871801   0.514    0.612    
## sexfemale                        11.764667   0.208959  56.301   <2e-16 ***
## unemployment                      0.108176   0.111991   0.966    0.344    
## Gini_index                       -0.003095   0.124093  -0.025    0.980    
## Average_household_size          -12.999469  10.242219  -1.269    0.217    
## Secondary_School_enrolment_rate   0.043819   0.061108   0.717    0.481    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5723 on 23 degrees of freedom
##   (22 observations deleted due to missingness)
## Multiple R-squared:  0.9928, Adjusted R-squared:  0.9909 
## F-statistic: 529.9 on 6 and 23 DF,  p-value: < 2.2e-16
## 
## 
## $`5-14 years`
## 
## Call:
## lm(formula = `suicide/100k pop` ~ log_GDPpc + sex + unemployment + 
##     Gini_index + Average_household_size + Secondary_School_enrolment_rate, 
##     data = year)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.82475 -0.49633 -0.00738  0.45892  2.26588 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -122.31284   59.65597  -2.050  0.05190 .  
## log_GDPpc                         -0.24281    1.48519  -0.163  0.87156    
## sexfemale                          9.10533    0.35598  25.578  < 2e-16 ***
## unemployment                      -0.27834    0.19079  -1.459  0.15811    
## Gini_index                        -0.05549    0.21140  -0.262  0.79529    
## Average_household_size            53.24608   17.44853   3.052  0.00566 ** 
## Secondary_School_enrolment_rate    0.05690    0.10410   0.547  0.58995    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9749 on 23 degrees of freedom
##   (22 observations deleted due to missingness)
## Multiple R-squared:  0.9678, Adjusted R-squared:  0.9593 
## F-statistic:   115 on 6 and 23 DF,  p-value: 5.598e-16
## 
## 
## $`55-74 years`
## 
## Call:
## lm(formula = `suicide/100k pop` ~ log_GDPpc + sex + unemployment + 
##     Gini_index + Average_household_size + Secondary_School_enrolment_rate, 
##     data = year)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.03843 -0.74893 -0.08662  0.82567  3.13903 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      22.4022    77.9391   0.287   0.7764    
## log_GDPpc                        -2.9806     1.9404  -1.536   0.1382    
## sexfemale                        12.3900     0.4651  26.641   <2e-16 ***
## unemployment                     -0.4562     0.2493  -1.830   0.0802 .  
## Gini_index                       -0.1691     0.2762  -0.612   0.5463    
## Average_household_size           15.5822    22.7961   0.684   0.5011    
## Secondary_School_enrolment_rate  -0.1556     0.1360  -1.144   0.2644    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.274 on 23 degrees of freedom
##   (22 observations deleted due to missingness)
## Multiple R-squared:  0.9694, Adjusted R-squared:  0.9614 
## F-statistic: 121.5 on 6 and 23 DF,  p-value: 3.063e-16
## 
## 
## $`75+ years`
## 
## Call:
## lm(formula = `suicide/100k pop` ~ log_GDPpc + sex + unemployment + 
##     Gini_index + Average_household_size + Secondary_School_enrolment_rate, 
##     data = year)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0012 -0.4932 -0.1408  0.4898  1.2717 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     27.55832   45.20080   0.610   0.5480    
## log_GDPpc                       -2.40929    1.12532  -2.141   0.0431 *  
## sexfemale                        6.52733    0.26972  24.200   <2e-16 ***
## unemployment                    -0.07111    0.14456  -0.492   0.6274    
## Gini_index                      -0.10862    0.16018  -0.678   0.5045    
## Average_household_size           4.69973   13.22060   0.355   0.7255    
## Secondary_School_enrolment_rate -0.06768    0.07888  -0.858   0.3997    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7387 on 23 degrees of freedom
##   (22 observations deleted due to missingness)
## Multiple R-squared:  0.9638, Adjusted R-squared:  0.9544 
## F-statistic: 102.1 on 6 and 23 DF,  p-value: 2.084e-15
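
The six summaries above are easier to compare side by side. The following is a minimal sketch of one way to do that, assuming a list shaped like `AG` (named `summary.lm` objects). The toy data frame `toy`, the list `fits` and the helper `coef_row()` are hypothetical stand-ins introduced for illustration, not part of the original analysis:

```r
# Toy stand-in for `master1`: two age groups, one regressor (hypothetical data).
set.seed(1)
toy <- data.frame(
  age       = rep(c("15-24 years", "75+ years"), each = 30),
  log_GDPpc = rep(seq(9.5, 10.5, length.out = 30), times = 2)
)
toy$rate <- 5 + 0.5 * toy$log_GDPpc + rnorm(nrow(toy), sd = 0.3)

# Fit one model per age group and keep the summaries, mirroring the loop above.
fits <- lapply(split(toy, toy$age),
               function(d) summary(lm(rate ~ log_GDPpc, data = d)))

# Pull one coefficient row (estimate, SE, t value, p-value) from a summary.lm.
coef_row <- function(s, term = "log_GDPpc") coef(s)[term, ]

# Stack the rows into a matrix with one row per age group.
gdp_effects <- t(sapply(fits, coef_row))
round(gdp_effects, 3)
```

Applied to the real object, `t(sapply(AG, coef_row))` should give one row per age group with the log_GDPpc estimate, standard error, t value and p-value, provided every model in `AG` contains a `log_GDPpc` term.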