Background

When we were first choosing a topic for our group project, World Mental Health Day came around.


Although many of us are content with our current quality of life, there are people who suffer from mental health problems and choose to end their lives. (Here are some data and graphs about suicide and its main properties: Suicide)
We then asked ourselves: Is the suicide rate related to GDP growth? Do gender differences or age groups matter? Can we see something in this graph?


To find out whether GDP is an indicator of the suicide rate, we start our analysis.

Introduction

Here we look into the causal relationship between GDP per capita and suicide rates using time series data for the United Kingdom. Specifically, we explore the hypothesis that a positive change in GDP per capita causes a fall in suicide rates. A mechanism that would support this is that a rise in GDP per capita may indicate higher living standards, economic development and the alleviation of extreme poverty, and may therefore reduce the rate of suicide. If so, we expect a negative correlation between the two variables.

Methods and data

To start with, we run a simple regression of the following form: \[Suicide~rate_t = \beta_0 + \beta_1log~GDP~per~capita_t + \epsilon_t\] where \(Suicide~rate_t\) is the suicide rate per 100k people and \(log~GDP~per~capita_t\) is the natural logarithm of GDP per capita in year \(t\). We are using data for United Kingdom and Thailand from 1985-2010. To measure the causal relationship, we first analyse the two countries separately and then combine the two datasets to make a more balanced measurement.

Results

We start by taking the logarithm of GDP per capita and look at a scatter plot of the suicide rate per 100k people against log GDP per capita.

The plot suggests a negative relationship, i.e. higher GDP per capita is associated with lower suicide rates. We can confirm this with a regression:

## 
## Call:
## lm(formula = `suicides/100k pop` ~ log_GDPpc, data = master)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.729 -4.484 -2.458  5.486 15.464 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   28.540      7.862   3.630 0.000331 ***
## log_GDPpc     -2.054      0.771  -2.664 0.008135 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.107 on 310 degrees of freedom
## Multiple R-squared:  0.02237,    Adjusted R-squared:  0.01922 
## F-statistic: 7.095 on 1 and 310 DF,  p-value: 0.008135
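
A level-log coefficient such as the log_GDPpc estimate in the output above is read by scaling it by the change in the logarithm. The short R sketch below only redoes that arithmetic with the printed estimate; the variable names are ours and are used purely for illustration.

```r
# Reading a level-log coefficient: y = b0 + b1 * log(x) + e.
# beta_1 is the log_GDPpc estimate copied from the regression output above.
beta_1 <- -2.054

# Exact change in y when x rises by 1 percent: b1 * log(1.01).
exact_change <- beta_1 * log(1.01)

# Common approximation: b1 / 100, because log(1.01) is roughly 0.01.
approx_change <- beta_1 / 100

print(round(c(exact = exact_change, approx = approx_change), 5))
```

Both numbers are close to -0.02, so under this model a 1 percent rise in GDP per capita goes with roughly 0.02 fewer suicides per 100k people.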

The regression result suggests that a 1 percent increase in GDP per capita is associated with about 0.0205 fewer suicides per 100k people in the UK (the coefficient on \(log~GDP~per~capita\), \(-2.054\), divided by 100). As the p-value is 0.008135, we can reject the null hypothesis \(\beta_1=0\) and say that the log_GDPpc coefficient is significantly different from 0 at the 5% level (and also at the 1% level). Note, however, that the R-squared is only about 0.02, so GDP per capita alone explains very little of the variation in suicide rates. To further look at the trend, we plot the graph below:
Both initial results match our expectation. However, there might be a number of confounding factors which bias our estimated causal relationship upwards or downwards. For instance, people's education level could be an important driver of both the suicide rate and GDP per capita. A better educated population may work more efficiently and produce higher GDP. People with a higher education level may also have better psychological knowledge and know more about how to achieve a work-life balance, resulting in a lower suicide rate. This would cause a downward bias in the coefficient estimate, implying that the effect of GDP on the suicide rate is overestimated because it also captures the effect of higher education. Alternatively, it might be the case that a higher unemployment rate reduces GDP per capita and increases the suicide rate, as people are more likely to feel depressed without a proper job. This would also result in a downward bias in the coefficient estimate, implying that the causal effect of GDP on the suicide rate is overestimated because it also captures the effect of lower unemployment.
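
The direction of the omitted-variable bias described above can be illustrated with a small simulation. This is a sketch on made-up data, not our dataset: the variable names, the coefficients (a true GDP effect of -1 and an education effect of -2) and the sample size are all assumptions chosen for illustration.

```r
# Omitted-variable bias on simulated data (all numbers are illustrative).
set.seed(42)
n <- 5000

education <- rnorm(n)                          # confounder
log_gdp   <- 0.8 * education + rnorm(n)        # education raises GDP
suicide   <- 20 - 1 * log_gdp - 2 * education + rnorm(n)  # true GDP effect is -1

# Short regression: education is omitted.
b_short <- unname(coef(lm(suicide ~ log_gdp))["log_gdp"])

# Long regression: education is controlled for.
b_long <- unname(coef(lm(suicide ~ log_gdp + education))["log_gdp"])

print(round(c(short = b_short, long = b_long), 3))
```

Because education is positively related to GDP and negatively related to the suicide rate, the short regression attributes part of the education effect to GDP and its coefficient is more negative than the true value of -1.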

Hence a good idea is to introduce the potential confounding factors, i.e. unemployment rate, Gini index, sex, average household size, and secondary education enrolment as control variables. We first merge all the relevant datasets into our original data and then run a multivariate regression of the following form: \[Suicide~rate = \beta_0 +\beta_1log~GDP~per~capita+\beta_2female+\beta_3unemployment+\beta_4Gini+\beta_5Average~household~size+\beta_6Secondary~School~enrolment+\epsilon\]

Here is our multivariate regression output:

## 
## Call:
## lm(formula = `suicides/100k pop` ~ log_GDPpc + sex + unemployment + 
##     Gini_index + Average_household_size + Secondary_School_enrolment_rate, 
##     data = master)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.3483  -0.9902   0.5394   1.6700   9.2838 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      -9.58045  104.60263  -0.092    0.927    
## log_GDPpc                        -0.85477    2.60418  -0.328    0.743    
## sexfemale                         7.79822    0.62419  12.493   <2e-16 ***
## unemployment                     -0.10819    0.33453  -0.323    0.747    
## Gini_index                       -0.04497    0.37068  -0.121    0.904    
## Average_household_size           10.90709   30.59480   0.357    0.722    
## Secondary_School_enrolment_rate  -0.02075    0.18254  -0.114    0.910    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.187 on 173 degrees of freedom
##   (132 observations deleted due to missingness)
## Multiple R-squared:  0.4769, Adjusted R-squared:  0.4588 
## F-statistic: 26.29 on 6 and 173 DF,  p-value: < 2.2e-16

As we include a wider set of control variables, we see two things in particular: 1) the previously significant negative relationship between \(log~GDP~per~capita\) and the suicide rate has become insignificant; 2) none of the control variables except sex is significant. We want to test whether GDP per capita is really unimportant in explaining the suicide rate and whether there is really no significant relationship between the suicide rate and the other control variables. We suspect multicollinearity among the explanatory variables. To check this, we first inspect the correlations between our control variables, then run a joint F-test and compute the VIF for every x-variable.

##                                  log_GDPpc unemployment Gini_index
## log_GDPpc                        1.0000000   -0.8193791  0.8653588
## unemployment                    -0.8193791    1.0000000 -0.7648058
## Gini_index                       0.8653588   -0.7648058  1.0000000
## Average_household_size                  NA           NA         NA
## Secondary_School_enrolment_rate  0.7337892   -0.5782068  0.4842689
##                                 Average_household_size
## log_GDPpc                                           NA
## unemployment                                        NA
## Gini_index                                          NA
## Average_household_size                               1
## Secondary_School_enrolment_rate                     NA
##                                 Secondary_School_enrolment_rate
## log_GDPpc                                             0.7337892
## unemployment                                         -0.5782068
## Gini_index                                            0.4842689
## Average_household_size                                       NA
## Secondary_School_enrolment_rate                       1.0000000
## Linear hypothesis test
## 
## Hypothesis:
## log_GDPpc = 0
## sexfemale = 0
## unemployment = 0
## Gini_index = 0
## Average_household_size = 0
## Secondary_School_enrolment_rate = 0
## 
## Model 1: restricted model
## Model 2: `suicides/100k pop` ~ log_GDPpc + sex + unemployment + Gini_index + 
##     Average_household_size + Secondary_School_enrolment_rate
## 
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    179 5798.6                                  
## 2    173 3033.1  6    2765.5 26.289 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##                       log_GDPpc                             sex 
##                        3.735700                        1.000000 
##                    unemployment                      Gini_index 
##                        1.381500                        2.838073 
##          Average_household_size Secondary_School_enrolment_rate 
##                        3.826946                        1.791867

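We see that most of the control variables are highly correlated with one another (for example, the correlation between log_GDPpc and Gini_index is 0.87), so there might be a multicollinearity issue. The correlations involving Average_household_size are NA, most likely because that variable contains missing values. The joint hypothesis test returns a very small p-value, so we reject the null that all coefficients are zero: the regressors are jointly significant even though, apart from sex, none is individually significant. This pattern is typical of multicollinearity. However, all VIFs are below 5 (they range from 1.00 to 3.83), which tells us that multicollinearity may be present but is not severe in this case.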
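
For readers unfamiliar with the VIF, it is defined for each regressor as 1 / (1 - R²), where R² comes from regressing that regressor on all the others. The sketch below computes it by hand in base R on simulated data; `vif_manual` and the variables `x1`, `x2`, `x3` are hypothetical names introduced only for this illustration and are not part of our analysis code.

```r
# Manual variance inflation factors on simulated data (illustrative only).
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- x1 + 0.5 * rnorm(n)   # strongly correlated with x1
x3 <- rnorm(n)              # unrelated to the others
X  <- data.frame(x1, x2, x3)

# For each column: regress it on the remaining columns, then 1 / (1 - R^2).
vif_manual <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(X)
print(round(vifs, 2))
```

The two correlated regressors get a clearly inflated VIF, while the unrelated one stays close to 1, which is the lower bound of the statistic.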

There are clearly further concerns to explore. For instance, including additional control variables does not always give a better estimate of a causal relationship. There could be a causal channel that runs from GDP growth via unemployment to the suicide rate: i.e. higher GDP growth increases the demand for labour, reduces unemployment, and therefore results in a lower suicide rate. By including unemployment we would shut down this part of the causal effect of GDP growth.
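
The point about shutting down a causal channel can be made concrete with simulated data. In the sketch below, GDP growth affects the suicide rate directly (effect -1) and indirectly through unemployment, so the total effect is -2 by construction; every name and number is an assumption for illustration, not an estimate from our data.

```r
# Controlling for a mediator removes the indirect effect (simulated data).
set.seed(7)
n <- 5000

gdp_growth   <- rnorm(n)
unemployment <- -0.5 * gdp_growth + rnorm(n)             # growth lowers unemployment
suicide      <- 15 - 1 * gdp_growth + 2 * unemployment + rnorm(n)

# Total effect: direct (-1) plus indirect (2 * -0.5 = -1).
total_effect <- unname(coef(lm(suicide ~ gdp_growth))["gdp_growth"])

# Adding the mediator as a control leaves only the direct effect.
direct_effect <- unname(
  coef(lm(suicide ~ gdp_growth + unemployment))["gdp_growth"]
)

print(round(c(total = total_effect, direct = direct_effect), 3))
```

Which of the two numbers is the right target depends on the question being asked; if we care about the overall consequence of growth, the regression without the mediator is the relevant one.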

To further address the endogeneity problem, we use the inflation rate and the Brent crude oil price as two instrumental variables for \(log~GDP~per~capita\) and \(Gini~index\) and run the following regressions:

## 
## Call:
## ivreg(formula = `suicides/100k pop` ~ log_GDPpc + Gini_index + 
##     sex + unemployment + Average_household_size + Secondary_School_enrolment_rate | 
##     inflation_rate + oil_price + sex + unemployment + Average_household_size + 
##         Secondary_School_enrolment_rate, data = master)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.3709  -1.0246   0.5993   1.6956   9.3571 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -31.02378  398.50198  -0.078    0.938    
## log_GDPpc                        -0.97116    5.35803  -0.181    0.856    
## Gini_index                        0.11534    3.35336   0.034    0.973    
## sexfemale                         7.79822    0.62454  12.486   <2e-16 ***
## unemployment                     -0.07172    0.85050  -0.084    0.933    
## Average_household_size           15.88603   92.53431   0.172    0.864    
## Secondary_School_enrolment_rate   0.02855    0.99476   0.029    0.977    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.19 on 173 degrees of freedom
## Multiple R-Squared: 0.4763,  Adjusted R-squared: 0.4582 
## Wald test: 26.24 on 6 and 173 DF,  p-value: < 2.2e-16

Then we check the relevance of our two instruments with first-stage regressions:

## 
## Call:
## lm(formula = Gini_index ~ oil_price + inflation_rate, data = master)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4375 -0.9892  0.1335  0.9589  2.5507 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    33.307490   0.226029 147.359  < 2e-16 ***
## oil_price       0.047127   0.003778  12.474  < 2e-16 ***
## inflation_rate -0.257925   0.047086  -5.478 8.94e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.504 on 309 degrees of freedom
## Multiple R-squared:  0.4128, Adjusted R-squared:  0.409 
## F-statistic: 108.6 on 2 and 309 DF,  p-value: < 2.2e-16
## Linear hypothesis test
## 
## Hypothesis:
## oil_price = 0
## 
## Model 1: restricted model
## Model 2: Gini_index ~ oil_price + inflation_rate
## 
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1    310 1051.58                                  
## 2    309  699.41  1    352.18 155.59 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Call:
## lm(formula = log_GDPpc ~ oil_price + inflation_rate, data = master)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.67023 -0.11184  0.06906  0.13430  0.40419 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    10.1280262  0.0373532  271.14   <2e-16 ***
## oil_price       0.0122607  0.0006244   19.64   <2e-16 ***
## inflation_rate -0.1094714  0.0077813  -14.07   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2486 on 309 degrees of freedom
## Multiple R-squared:  0.6956, Adjusted R-squared:  0.6936 
## F-statistic:   353 on 2 and 309 DF,  p-value: < 2.2e-16
## Linear hypothesis test
## 
## Hypothesis:
## inflation_rate = 0
## 
## Model 1: restricted model
## Model 2: log_GDPpc ~ oil_price + inflation_rate
## 
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    310 31.336                                  
## 2    309 19.101  1    12.235 197.92 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Call:
## ivreg(formula = `suicides/100k pop` ~ log_GDPpc + Gini_index + 
##     sex + unemployment + Average_household_size + Secondary_School_enrolment_rate | 
##     inflation_rate + oil_price + sex + unemployment + Average_household_size + 
##         Secondary_School_enrolment_rate, data = master)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.3709  -1.0246   0.5993   1.6956   9.3571 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -31.02378  398.50198  -0.078    0.938    
## log_GDPpc                        -0.97116    5.35803  -0.181    0.856    
## Gini_index                        0.11534    3.35336   0.034    0.973    
## sexfemale                         7.79822    0.62454  12.486   <2e-16 ***
## unemployment                     -0.07172    0.85050  -0.084    0.933    
## Average_household_size           15.88603   92.53431   0.172    0.864    
## Secondary_School_enrolment_rate   0.02855    0.99476   0.029    0.977    
## 
## Diagnostic tests:
##                               df1 df2 statistic p-value    
## Weak instruments (log_GDPpc)    2 173   146.213  <2e-16 ***
## Weak instruments (Gini_index)   2 173     2.906  0.0574 .  
## Wu-Hausman                      2 171     0.002  0.9977    
## Sargan                          0  NA        NA      NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.19 on 173 degrees of freedom
## Multiple R-Squared: 0.4763,  Adjusted R-squared: 0.4582 
## Wald test: 26.24 on 6 and 173 DF,  p-value: < 2.2e-16

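Our IV regression gives an estimated coefficient of -0.97 for \(log~GDP~per~capita\), compared with -0.85 in the multivariate OLS regression. The point estimate is thus slightly larger in magnitude when using the instruments, although the difference is small relative to the standard error (5.36) and neither estimate is significantly different from zero. Taken at face value, this would suggest that the instruments address an endogeneity issue arising from a positive correlation between unobserved heterogeneity and the endogenous variable; e.g. it could be the case that the suicide rate is higher when the inflation rate is higher, while higher inflation also signals a better-performing economy and therefore higher GDP growth.

The weak instruments test rejects the null of weak instruments for log_GDPpc (F = 146.2), and both instruments are highly significant in the first-stage regressions above. For Gini_index, however, the statistic is only 2.906 (p = 0.0574), so we cannot reject the null at the 5% level: once the other controls are included, the instruments are weak for Gini_index, and the IV estimates should be read with caution. The Sargan test is not available because, with two instruments for two endogenous variables, the model is exactly identified.

Moreover, the (Wu-)Hausman test for endogeneity is not significant (p = 0.9977). We cannot reject the null that the variables of concern are uncorrelated with the error term, indicating that log_GDPpc and Gini_index may not be endogenous. So there may not be a significant issue of endogeneity in our original multivariate regression.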
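
The two-stage logic behind the IV estimates above can be sketched in base R on simulated data. This is not the `ivreg` call we used: it is a hand-rolled two-stage least squares with one endogenous regressor `x`, one instrument `z` and an unobserved confounder `u`, all of which are hypothetical names with made-up coefficients (the true effect of `x` is -1).

```r
# Manual two-stage least squares on simulated data (illustrative only).
set.seed(123)
n <- 5000

u <- rnorm(n)                       # unobserved confounder
z <- rnorm(n)                       # instrument: affects x, not y directly
x <- z + u + rnorm(n)               # endogenous regressor
y <- 2 - 1 * x + 2 * u + rnorm(n)   # true effect of x is -1

# Naive OLS is biased because x is correlated with u.
ols <- unname(coef(lm(y ~ x))["x"])

# Stage 1: project x on the instrument and check its strength.
first_stage   <- lm(x ~ z)
first_stage_F <- unname(summary(first_stage)$fstatistic["value"])
x_hat         <- fitted(first_stage)

# Stage 2: regress y on the fitted values from stage 1.
iv <- unname(coef(lm(y ~ x_hat))["x_hat"])

print(round(c(ols = ols, iv = iv, first_stage_F = first_stage_F), 3))
```

The second-stage point estimate matches what a dedicated IV routine would return, but the standard errors printed by the second `lm` are not valid, which is one reason to use a function such as `ivreg` in practice.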

Looking further at our dataset, we noticed that age is divided into groups. We think it is worth comparing the effect of GDP per capita on the suicide rate across age groups. We first observe the trend by plotting the suicide rate against log GDP per capita for each age group.

To check the validity of the trends observed from our plot, we run multivariate regressions for different age groups, respectively.

## $`15-24 years`
## 
## Call:
## lm(formula = `suicides/100k pop` ~ log_GDPpc + sex + unemployment + 
##     Gini_index + Average_household_size + Secondary_School_enrolment_rate, 
##     data = year)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77209 -0.22960 -0.00992  0.23904  0.82983 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -10.604168  25.744942  -0.412    0.684    
## log_GDPpc                         0.016060   0.640944   0.025    0.980    
## sexfemale                         6.978000   0.153626  45.422   <2e-16 ***
## unemployment                      0.055232   0.082335   0.671    0.509    
## Gini_index                        0.059306   0.091233   0.650    0.522    
## Average_household_size            4.779799   7.530035   0.635    0.532    
## Secondary_School_enrolment_rate   0.004333   0.044926   0.096    0.924    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4207 on 23 degrees of freedom
##   (22 observations deleted due to missingness)
## Multiple R-squared:  0.989,  Adjusted R-squared:  0.9861 
## F-statistic: 344.2 on 6 and 23 DF,  p-value: < 2.2e-16
## 
## 
## $`25-34 years`
## 
## Call:
## lm(formula = `suicides/100k pop` ~ log_GDPpc + sex + unemployment + 
##     Gini_index + Average_household_size + Secondary_School_enrolment_rate, 
##     data = year)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.064595 -0.045739 -0.006506  0.031459  0.089651 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)                     -0.224570   3.028473  -0.074    0.942
## log_GDPpc                        0.039840   0.075397   0.528    0.602
## sexfemale                        0.024000   0.018072   1.328    0.197
## unemployment                    -0.006884   0.009685  -0.711    0.484
## Gini_index                       0.007187   0.010732   0.670    0.510
## Average_household_size           0.134207   0.885786   0.152    0.881
## Secondary_School_enrolment_rate -0.006282   0.005285  -1.189    0.247
## 
## Residual standard error: 0.04949 on 23 degrees of freedom
##   (22 observations deleted due to missingness)
## Multiple R-squared:  0.3244, Adjusted R-squared:  0.1482 
## F-statistic: 1.841 on 6 and 23 DF,  p-value: 0.1352
## 
## 
## $`35-54 years`
## 
## Call:
## lm(formula = `suicides/100k pop` ~ log_GDPpc + sex + unemployment + 
##     Gini_index + Average_household_size + Secondary_School_enrolment_rate, 
##     data = year)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.02225 -0.45863 -0.01513  0.30943  1.32638 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      25.698345  35.017811   0.734    0.470    
## log_GDPpc                         0.448191   0.871801   0.514    0.612    
## sexfemale                        11.764667   0.208959  56.301   <2e-16 ***
## unemployment                      0.108176   0.111991   0.966    0.344    
## Gini_index                       -0.003095   0.124093  -0.025    0.980    
## Average_household_size          -12.999469  10.242219  -1.269    0.217    
## Secondary_School_enrolment_rate   0.043819   0.061108   0.717    0.481    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5723 on 23 degrees of freedom
##   (22 observations deleted due to missingness)
## Multiple R-squared:  0.9928, Adjusted R-squared:  0.9909 
## F-statistic: 529.9 on 6 and 23 DF,  p-value: < 2.2e-16
## 
## 
## $`5-14 years`
## 
## Call:
## lm(formula = `suicides/100k pop` ~ log_GDPpc + sex + unemployment + 
##     Gini_index + Average_household_size + Secondary_School_enrolment_rate, 
##     data = year)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.82475 -0.49633 -0.00738  0.45892  2.26588 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -122.31284   59.65597  -2.050  0.05190 .  
## log_GDPpc                         -0.24281    1.48519  -0.163  0.87156    
## sexfemale                          9.10533    0.35598  25.578  < 2e-16 ***
## unemployment                      -0.27834    0.19079  -1.459  0.15811    
## Gini_index                        -0.05549    0.21140  -0.262  0.79529    
## Average_household_size            53.24608   17.44853   3.052  0.00566 ** 
## Secondary_School_enrolment_rate    0.05690    0.10410   0.547  0.58995    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9749 on 23 degrees of freedom
##   (22 observations deleted due to missingness)
## Multiple R-squared:  0.9678, Adjusted R-squared:  0.9593 
## F-statistic:   115 on 6 and 23 DF,  p-value: 5.598e-16
## 
## 
## $`55-74 years`
## 
## Call:
## lm(formula = `suicides/100k pop` ~ log_GDPpc + sex + unemployment + 
##     Gini_index + Average_household_size + Secondary_School_enrolment_rate, 
##     data = year)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.03843 -0.74893 -0.08662  0.82567  3.13903 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      22.4022    77.9391   0.287   0.7764    
## log_GDPpc                        -2.9806     1.9404  -1.536   0.1382    
## sexfemale                        12.3900     0.4651  26.641   <2e-16 ***
## unemployment                     -0.4562     0.2493  -1.830   0.0802 .  
## Gini_index                       -0.1691     0.2762  -0.612   0.5463    
## Average_household_size           15.5822    22.7961   0.684   0.5011    
## Secondary_School_enrolment_rate  -0.1556     0.1360  -1.144   0.2644    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.274 on 23 degrees of freedom
##   (22 observations deleted due to missingness)
## Multiple R-squared:  0.9694, Adjusted R-squared:  0.9614 
## F-statistic: 121.5 on 6 and 23 DF,  p-value: 3.063e-16
## 
## 
## $`75+ years`
## 
## Call:
## lm(formula = `suicides/100k pop` ~ log_GDPpc + sex + unemployment + 
##     Gini_index + Average_household_size + Secondary_School_enrolment_rate, 
##     data = year)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0012 -0.4932 -0.1408  0.4898  1.2717 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     27.55832   45.20080   0.610   0.5480    
## log_GDPpc                       -2.40929    1.12532  -2.141   0.0431 *  
## sexfemale                        6.52733    0.26972  24.200   <2e-16 ***
## unemployment                    -0.07111    0.14456  -0.492   0.6274    
## Gini_index                      -0.10862    0.16018  -0.678   0.5045    
## Average_household_size           4.69973   13.22060   0.355   0.7255    
## Secondary_School_enrolment_rate -0.06768    0.07888  -0.858   0.3997    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7387 on 23 degrees of freedom
##   (22 observations deleted due to missingness)
## Multiple R-squared:  0.9638, Adjusted R-squared:  0.9544 
## F-statistic: 102.1 on 6 and 23 DF,  p-value: 2.084e-15

Comparing the regression output across age groups, we see that the relationship between GDP per capita and the suicide rate is more negative for older people: the log_GDPpc coefficient is -2.98 for the 55-74 group and -2.41 for the 75+ group, but only the latter is significant at the 5% level (p = 0.043). For the younger age groups the coefficient is close to zero and insignificant. This may be explained by the fact that older people are more likely to depend on the state pension. A higher-GDP economy may imply better government finances and therefore more generous provision of pensions and national healthcare. Consequently, older people may become more content with their quality of life as the economy grows and hence less likely to die by suicide. Younger people are more likely to have a regular income, so their suicide rate may be less affected by changes in GDP per capita. Note, however, that each regression uses only 30 observations, and that sex is the only variable that is significant in nearly every age group (all except 25-34 years).
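
Running the same regression separately for each age group can be done by splitting the data frame and applying `lm` to each piece. The sketch below shows the pattern on a simulated data frame; the column names (`suicide_rate`, `log_GDPpc`, `sex`, `age`), the three age labels and all numbers are assumptions for illustration and do not reproduce our `master` dataset.

```r
# One regression per age group via split() + lapply() (simulated data).
set.seed(2024)
age_groups <- c("15-24 years", "35-54 years", "75+ years")

toy <- expand.grid(
  year = 1985:2010,
  sex  = c("male", "female"),
  age  = age_groups,
  stringsAsFactors = FALSE
)
toy$log_GDPpc    <- 9.5 + 0.03 * (toy$year - 1985)
toy$suicide_rate <- 10 - 0.5 * toy$log_GDPpc +
  ifelse(toy$sex == "male", 5, 0) + rnorm(nrow(toy))

# split() returns a named list of data frames, one per age group.
fits <- lapply(split(toy, toy$age), function(d) {
  lm(suicide_rate ~ log_GDPpc + sex, data = d)
})

# Collect the log_GDPpc coefficient from each group's model.
gdp_coefs <- sapply(fits, function(f) unname(coef(f)["log_GDPpc"]))
print(round(gdp_coefs, 3))
```

Calling `lapply(fits, summary)` on the resulting list prints one regression summary per group, headed by the group name.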

As the results obtained so far are generally not very significant, we suspect that time itself may be a confounding factor that biases the causal effect we are estimating. To see whether this is the case, we first look at the time trend of the suicide rate from 1985-2010. The graph shows a clear negative trend for the suicide rate and a positive one for log_GDPpc. The time trend is indeed a possible confounding factor which may create a negative bias in our causal effect estimate. Therefore, we now control for time by including a time trend \(t\) as a control variable and run the regression below:

\[Suicide~rate_t = \beta_0 +\beta_1logGDPpc_t+\beta_2t+\epsilon_t\]

## 
## Call:
## lm(formula = `suicides/100k pop` ~ log_GDPpc + t, data = master)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.345 -4.467 -2.292  5.351 14.253 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -33.94727   24.52622  -1.384  0.16732   
## log_GDPpc     4.61451    2.59674   1.777  0.07654 . 
## t            -0.03474    0.01293  -2.687  0.00761 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.047 on 309 degrees of freedom
## Multiple R-squared:  0.04469,    Adjusted R-squared:  0.03851 
## F-statistic: 7.228 on 2 and 309 DF,  p-value: 0.0008557

The results show that a 1 percent increase in GDP per capita is associated with an increase in the suicide rate of about 0.046 per 100k people, i.e. about 4.6 more suicides per 10M people, although the coefficient is significant only at the 10% level (p = 0.077). The estimated relationship is now positive, which is the opposite of what we found previously. To check the validity of this positive relationship, we use augmented Dickey-Fuller tests to see whether the series are stationary.

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.203932 -0.004974 -0.004753 -0.004565  0.207565 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## z.lag.1     0.0004682  0.0001810   2.586   0.0102 *
## z.diff.lag -0.0217253  0.0569938  -0.381   0.7033  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03215 on 308 degrees of freedom
## Multiple R-squared:  0.02126,    Adjusted R-squared:  0.0149 
## F-statistic: 3.345 on 2 and 308 DF,  p-value: 0.03656
## 
## 
## Value of test-statistic is: 2.5864 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7112 -0.9534  0.0740  0.8879 23.9185 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -0.17665    0.03269  -5.403 1.31e-07 ***
## z.diff.lag  0.02649    0.05650   0.469    0.639    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.395 on 308 degrees of freedom
## Multiple R-squared:  0.08961,    Adjusted R-squared:  0.0837 
## F-statistic: 15.16 on 2 and 308 DF,  p-value: 5.263e-07
## 
## 
## Value of test-statistic is: -5.4032 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.1989  0.0000  0.0000  0.0000  0.2119 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -1.000e+00  8.071e-02  -12.39   <2e-16 ***
## z.diff.lag -4.887e-33  5.707e-02    0.00        1    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03255 on 307 degrees of freedom
## Multiple R-squared:    0.5,  Adjusted R-squared:  0.4967 
## F-statistic: 153.5 on 2 and 307 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -12.3895 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.233 -2.285 -1.113 -0.372 23.660 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -1.17241    0.08249 -14.213   <2e-16 ***
## z.diff.lag  0.10846    0.05670   1.913   0.0567 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.621 on 307 degrees of freedom
## Multiple R-squared:  0.5344, Adjusted R-squared:  0.5314 
## F-statistic: 176.2 on 2 and 307 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -14.2127 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62
## 
## Call:
## lm(formula = Dsuicides_100kpop ~ Dlog_GDPpc + t, data = master)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.122 -1.758 -0.488  0.245 36.440 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.141734   0.556220  -2.053   0.0409 *  
## Dlog_GDPpc  93.694301   8.564123  10.940   <2e-16 ***
## t            0.003925   0.003050   1.287   0.1991    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.802 on 308 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.2799, Adjusted R-squared:  0.2752 
## F-statistic: 59.85 on 2 and 308 DF,  p-value: < 2.2e-16
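
The test regressions printed above have the form `z.diff ~ z.lag.1 - 1 + z.diff.lag`. As a rough sketch of what is being computed, the base R function below builds the same regression by hand (no constant, one lagged difference) and returns the t-statistic on the lagged level. `adf_tau` and the simulated series are hypothetical and only meant to illustrate the mechanics; the statistic must be compared with Dickey-Fuller critical values such as those printed above, not with normal ones.

```r
# Hand-built ADF regression (type "none", one lagged difference).
adf_tau <- function(z) {
  dz <- diff(z)                 # dz[i] = z[i + 1] - z[i]
  n  <- length(dz)
  d  <- data.frame(
    z_diff     = dz[2:n],       # current first difference
    z_lag1     = z[2:n],        # lagged level
    z_diff_lag = dz[1:(n - 1)]  # lagged first difference
  )
  fit <- lm(z_diff ~ z_lag1 - 1 + z_diff_lag, data = d)
  summary(fit)$coefficients["z_lag1", "t value"]
}

set.seed(99)
stationary  <- rnorm(300)           # white noise: no unit root
random_walk <- cumsum(rnorm(300))   # unit root by construction

tau_stationary  <- adf_tau(stationary)
tau_random_walk <- adf_tau(random_walk)
print(round(c(stationary = tau_stationary, random_walk = tau_random_walk), 3))
```

A statistic far below the critical value, as for the white-noise series, rejects the unit root; for a true random walk the statistic usually stays above it.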

The null hypothesis of each test is that the coefficient on the lagged level (z.lag.1) is zero, i.e. that the series has a unit root. The test statistic for log_GDPpc is 2.5864, which is above the 5% critical value of -1.95, while the test statistic for the suicide rate is -5.4032 < -1.95. Therefore, we cannot reject the null for log_GDPpc and conclude that a unit root is present, i.e. the series has a stochastic trend and behaves like a random walk, whereas for the suicide rate we reject the unit root.

OLS estimation of the coefficients on regressors that have a stochastic trend is problematic because the distribution of the estimator and its t-statistic is non-normal, even asymptotically. One consequence is that, when two stochastically trending time series are regressed on each other, the estimated relationship may appear highly significant using conventional normal critical values although the series are unrelated, resulting in a spurious relationship.

To address the unit root issue, we take first differences, test whether the differenced series have a unit root (the test statistics of -12.3895 and -14.2127 reject it), and run the regression again. Our revised regression in differences shows that a 1 percentage point increase in the growth rate of GDP per capita is associated with nearly 1 more suicide per 100k people (the Dlog_GDPpc coefficient of 93.69 divided by 100), and the relationship is highly significant. After differencing and controlling for the time trend, the relationship between log_GDPpc and the suicide rate thus turns significantly positive.

One possible explanation is that over time people have become better at finding a work-life balance and more content with their quality of life, so they are less likely to die by suicide, while GDP grows as technology develops and UK economic performance keeps improving. The negative relationship in levels may therefore reflect these common time trends, and once we difference the data across time periods, the estimated relationship between GDP per capita and the suicide rate is positive. A possible interpretation of this result is that the higher the production level, the heavier the work pressure and hence the higher the suicide rate.
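
The spurious regression problem mentioned above is easy to reproduce. The Monte Carlo sketch below, on simulated data only, regresses pairs of independent random walks on each other, first in levels and then in first differences, and records how often the slope looks significant at the 5% level. The helper `p_slope`, the series length and the number of replications are assumptions chosen for illustration.

```r
# Spurious regression: independent random walks in levels vs. differences.
set.seed(1)
n_obs <- 100   # length of each simulated series
n_rep <- 200   # number of simulated pairs

# p-value of the slope in a simple regression of y on x.
p_slope <- function(y, x) {
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
}

p_values <- replicate(n_rep, {
  x <- cumsum(rnorm(n_obs))   # random walk 1
  y <- cumsum(rnorm(n_obs))   # random walk 2, independent of x
  c(levels = p_slope(y, x), diffs = p_slope(diff(y), diff(x)))
})

# Share of replications with a "significant" slope at the 5% level.
rejection_rate <- rowMeans(p_values < 0.05)
print(rejection_rate)
```

Although the two series are unrelated by construction, the regression in levels rejects far more often than the nominal 5%, whereas the regression in first differences rejects at roughly the nominal rate. This is why we difference the data before interpreting the coefficient.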

Conclusion