A paper published in 1974 called “An Economic Theory of Suicide” argued that when economic conditions get tough, the number of suicides increases because many people “will believe future prospects to have diminished.” As a quick test of this theory, we can write down the following linear model between \(S_{t}\), the number of suicides per 100,000 people in year t, and \(U_{t}\), the unemployment rate in year t:

\[S_{t}=\beta_0+\beta_1 U_{t} + \epsilon_{t}\] \[E(\epsilon_{t}|U_{t})=0\]

The data for this problem are available on the website (see the link). Save the CSV file somewhere on your computer and then import it into R. Since this is a CSV file, type:

mydata <- read.csv("C:/…/Book1.csv")

where the dots represent the location of your file.

Most of the R commands from the last homework will be helpful here, so look over the last R assignment to figure out which commands you need.

Type mydata at the R prompt; you should see three columns: yeart, St, and Ut.

  1. What could \(\epsilon_{t}\) represent? That is, what factors other than unemployment could have affected suicides in year t? Based on your answer, do you think \(E(\epsilon_{t}|U_{t})=0\) is likely to hold?

\(\epsilon_t\) captures every factor other than unemployment that influences the suicide rate, e.g. depression, anxiety, or other economic variables such as bankruptcies. The assumption \(E(\epsilon_{t}|U_{t})=0\) is unlikely to hold, because an underlying factor may drive both unemployment and, say, depression (which is part of \(\epsilon_t\)). For example, someone suffering from clinical depression may be more likely to become unemployed, so \(\epsilon_t\), which now includes depression, is correlated with \(U_t\).
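To see why a correlation between \(\epsilon_t\) and \(U_t\) matters, here is a small simulation under made-up assumptions (it does not use the homework data): a hypothetical unobserved factor D, think of it as depression, raises both unemployment and suicides, and the true effect of U on S is set to 0.4. Leaving D out puts it in the error term, and OLS then overstates \(\beta_1\): with these parameters the short regression's slope should come out near \(0.4 + 0.5 \cdot \mathrm{Cov}(U,D)/\mathrm{Var}(U) = 0.4 + 0.5 \cdot 1/2 = 0.65\), while controlling for D recovers roughly 0.4.

```r
# Hypothetical simulation (not the homework data): an omitted factor D
# raises both unemployment U and suicides S, so it ends up in the error term.
set.seed(42)
n <- 10000
D <- rnorm(n)                                     # omitted factor, e.g. depression
U <- 5 + D + rnorm(n)                             # unemployment depends on D
S <- 10 + 0.4 * U + 0.5 * D + rnorm(n, sd = 0.3)  # true effect of U on S is 0.4

short_fit <- lm(S ~ U)      # D omitted: error = 0.5 * D + noise, correlated with U
long_fit  <- lm(S ~ U + D)  # D included as a control

b1_short <- unname(coef(short_fit)["U"])
b1_long  <- unname(coef(long_fit)["U"])
c(short = b1_short, long = b1_long)  # short is near 0.65, long is near 0.4
```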

  1. Make a scatter plot of \(S_{t}\) (y-axis) against \(U_{t}\) (x-axis) using the plot command, and comment on it. Is the relationship between the two variables positive or negative? Does it look linear?
plot(mydata$Ut, mydata$St, xlab = "Unemployment rate (Ut)", ylab = "Suicides per 100,000 (St)")  ## St against Ut
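The sign and strength of the relationship in the scatter plot can also be summarized by the sample correlation. In a regression with one regressor and an intercept, \(R^2\) equals the squared correlation and the correlation has the same sign as the slope. The sketch below checks this on hypothetical toy data; r2_from_cor is an illustrative helper, not part of the assignment.

```r
# Hypothetical helper: squared sample correlation between x and y
r2_from_cor <- function(x, y) cor(x, y)^2

# Hypothetical toy data (not the homework data set)
x <- c(3, 4, 5, 6, 7, 8)
y <- c(10.9, 11.5, 11.4, 12.3, 12.6, 13.4)
fit <- lm(y ~ x)

c(cor    = cor(x, y),              # positive: y tends to rise with x
  r2_cor = r2_from_cor(x, y),      # squared correlation
  r2_lm  = summary(fit)$r.squared) # R^2 reported by lm: the same number
```

Applied to the homework data, cor(mydata$Ut, mydata$St) should be positive and equal to the square root of the regression \(R^2\) reported in part (d).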

  1. Compute the OLS estimators for \(\beta_0\) and \(\beta_1\). Use the lm command.

  2. What is the \(R^2\) for this regression, and what does it mean? What is the 95% confidence interval for \(\beta_1\)? Use confint to find this interval. Can you reject the hypothesis that unemployment has no effect on the suicide rate? Why or why not?

regSt <- lm(St~Ut, data=mydata)  ## regress St on Ut using mydata
summary(regSt)                   ## regression output
## 
## Call:
## lm(formula = St ~ Ut, data = mydata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4367 -0.3379  0.0040  0.2424  0.7808 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.67277    0.49740  19.447 1.16e-08 ***
## Ut           0.40664    0.08383   4.851 0.000907 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4123 on 9 degrees of freedom
## Multiple R-squared:  0.7233, Adjusted R-squared:  0.6926 
## F-statistic: 23.53 on 1 and 9 DF,  p-value: 0.0009071
confint(regSt, level = 0.95)     ## 95% confidence intervals for the parameters
##                 2.5 %     97.5 %
## (Intercept) 8.5475775 10.7979686
## Ut          0.2170104  0.5962648
confint(regSt, level = 0.99)     ## 99% confidence intervals for the parameters
##                 0.5 %     99.5 %
## (Intercept) 8.0563069 11.2892392
## Ut          0.1342175  0.6790577
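The estimates reported by lm come from the usual OLS formulas for a simple regression, \(\hat\beta_1 = \sum (U_t - \bar U)(S_t - \bar S) / \sum (U_t - \bar U)^2\) and \(\hat\beta_0 = \bar S - \hat\beta_1 \bar U\). A minimal sketch of those formulas follows; ols_by_hand is a hypothetical helper, and it is checked here against lm on toy data rather than on the homework data.

```r
# Hypothetical helper: OLS intercept and slope for y = b0 + b1 * x + e
ols_by_hand <- function(x, y) {
  b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
  b0 <- mean(y) - b1 * mean(x)                                     # intercept
  c(b0 = b0, b1 = b1)
}

# Hypothetical toy data, used only to compare with lm()
x <- c(3, 4, 5, 6, 7, 8)
y <- c(10.9, 11.5, 11.4, 12.3, 12.6, 13.4)
ols_by_hand(x, y)
coef(lm(y ~ x))  # same two numbers
```

With the homework data, ols_by_hand(mydata$Ut, mydata$St) should reproduce the intercept 9.67277 and slope 0.40664 shown in the summary output.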

The estimated regression, with standard errors in parentheses, is \[\hat{S_t} = \underset{(0.497)}{9.673} + \underset{(0.084)}{0.407} U_t, \quad R^2 = 0.723\]

An \(R^2\) of 0.723 means that the unemployment rate explains about 72% of the variation in the suicide rate over these 11 years, which is a relatively high share.
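The \(R^2\) in the summary output is \(1 - SSR/SST\): one minus the share of the total variation in the dependent variable (around its mean) that the residuals leave unexplained. The sketch below computes it directly; r_squared is a hypothetical helper, checked on toy data.

```r
# Hypothetical helper: R^2 = 1 - SSR / SST for a fitted lm object
r_squared <- function(fit) {
  e   <- residuals(fit)
  y   <- fitted(fit) + e         # recover the dependent variable
  ssr <- sum(e^2)                # unexplained variation
  sst <- sum((y - mean(y))^2)    # total variation around the mean
  1 - ssr / sst
}

# Hypothetical toy data, used only to compare with summary()
x <- c(3, 4, 5, 6, 7, 8)
y <- c(10.9, 11.5, 11.4, 12.3, 12.6, 13.4)
fit <- lm(y ~ x)
r_squared(fit)
summary(fit)$r.squared  # same value
```

Applied to the homework regression, r_squared(regSt) should match the Multiple R-squared of 0.7233.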

Testing whether unemployment has no effect on suicide means testing the null hypothesis \(H_0: \beta_1 = 0\) against the alternative \(H_1: \beta_1 \neq 0\). The t statistic is \(0.40664/0.08383 \approx 4.85\), with a p-value of 0.000907, so the unemployment rate is statistically significant at the 1% level. Equivalently, 0 lies in neither the 95% confidence interval \([0.217, 0.596]\) nor the 99% confidence interval \([0.134, 0.679]\), so we reject the null hypothesis at both the 5% and 1% significance levels. We can therefore reject the hypothesis that unemployment has no effect on suicide rates.
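The t statistic, p-value, and confidence intervals can be reproduced from three numbers in the summary output: the estimate 0.40664, its standard error 0.08383, and the 9 residual degrees of freedom. The interval is \(\hat\beta_1 \pm t_{1-\alpha/2,\,9} \cdot SE(\hat\beta_1)\). This is a sketch using the printed (rounded) values, so the last digits can differ slightly from confint.

```r
# Values copied from summary(regSt)
b1       <- 0.40664  # estimate for Ut
se_b1    <- 0.08383  # standard error for Ut
resid_df <- 9        # 11 observations minus 2 estimated parameters

t_stat <- b1 / se_b1                           # t statistic for H0: beta_1 = 0
p_val  <- 2 * pt(-abs(t_stat), df = resid_df)  # two-sided p-value

# Confidence interval: estimate +/- critical value from t(9) times the SE
ci <- function(level) {
  crit <- qt(1 - (1 - level) / 2, df = resid_df)
  c(lower = b1 - crit * se_b1, upper = b1 + crit * se_b1)
}

t_stat    # about 4.85
p_val     # about 0.0009
ci(0.95)  # about (0.217, 0.596)
ci(0.99)  # about (0.134, 0.679)
```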

  1. How would your answer to (c) change if suicide rates were reported per \(1000\) people (instead of \(100,000\))?

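A rate per 1,000 people is the rate per 100,000 people divided by 100 (e.g. 12 per 100,000 is 0.12 per 1,000), so both coefficients and their standard errors are divided by 100. The new coefficients would be \(\frac{10^3}{10^5} \beta_0 = \beta_0/100\) and \(\frac{10^3}{10^5} \beta_1 = \beta_1/100\), i.e. estimates of about 0.0967 and 0.00407. The t statistics, p-values, and \(R^2\) are unchanged, so the conclusion of the hypothesis test is the same.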
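A quick way to convince yourself of the rescaling result is to fit the same regression twice, once with the dependent variable in the original units and once divided by 100. The sketch uses hypothetical toy data, not the homework data.

```r
# Hypothetical toy data: y measured per 100,000 people
x      <- c(3, 4, 5, 6, 7, 8)
y_100k <- c(10.9, 11.5, 11.4, 12.3, 12.6, 13.4)
y_1k   <- y_100k / 100  # the same rates expressed per 1,000 people

fit_100k <- lm(y_100k ~ x)
fit_1k   <- lm(y_1k ~ x)

coef(fit_1k) / coef(fit_100k)                # both ratios equal 0.01
summary(fit_1k)$coefficients[, "t value"]    # same t values as fit_100k
c(summary(fit_1k)$r.squared, summary(fit_100k)$r.squared)  # same R^2
```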

  1. Calculate the predicted suicide rate for each year, i.e. \(\hat{S_{t}}\). Use the predict command. Hint: this is like number 13 on Assignment 3R from last week, where the new values of the independent variable that you use for forecasting are

newUt <- data.frame(Ut = mydata$Ut)

(newUt is like the newAds variable from last week’s assignment).

newUt  <- data.frame(Ut = mydata$Ut)
preds  <- predict(regSt, newdata = newUt)
preds
##        1        2        3        4        5        6        7        8 
## 11.13667 11.09600 11.66530 12.07193 11.94994 11.66530 11.94994 13.12919 
##        9       10       11 
## 12.80388 12.51924 12.11260
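When newdata contains the same regressor values used in the estimation, predict returns exactly the fitted values \(\hat\beta_0 + \hat\beta_1 U_t\), which lm also stores and which fitted() extracts. The sketch checks the three routes against each other on hypothetical toy data.

```r
# Hypothetical toy data (not the homework data set)
x <- c(3, 4, 5, 6, 7, 8)
y <- c(10.9, 11.5, 11.4, 12.3, 12.6, 13.4)
fit <- lm(y ~ x)

preds_predict <- predict(fit, newdata = data.frame(x = x))  # as in the homework
preds_fitted  <- fitted(fit)                                # stored fitted values
preds_formula <- coef(fit)[["(Intercept)"]] + coef(fit)[["x"]] * x  # b0 + b1 * x

cbind(preds_predict, preds_fitted, preds_formula)  # three identical columns
```

For the homework, fitted(regSt) should therefore reproduce the preds vector printed above.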
  1. Calculate the estimated error (residual) for each year, i.e. \(\hat{\epsilon_{t}}=S_{t}-\hat{S_{t}}\). Then plot the residuals.
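errorhat <- mydata$St - preds  ## residuals: actual minus predicted suicide rate
plot(mydata$Ut, errorhat)      ## residuals against the unemployment rate
abline(h = 0)                  ## reference line at zero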

Look at this website to see what residuals should look like when the model is well specified, and then compare your residuals to the ones on that page. Do your residuals appear to follow a pattern?

The residuals do not look like random noise around zero; they appear to follow a pattern, which suggests serial correlation.
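Serial correlation is easiest to see when the residuals are ordered by time. Two simple summaries are the lag-1 correlation of the residuals and the Durbin-Watson statistic \(DW = \sum_t (\hat\epsilon_t - \hat\epsilon_{t-1})^2 / \sum_t \hat\epsilon_t^2 \approx 2(1 - \hat\rho)\), which is near 2 with no first-order serial correlation and well below 2 when neighboring residuals tend to share the same sign. The helpers below are hypothetical and are shown on toy residuals; the lmtest package's dwtest() offers a formal test.

```r
# Hypothetical helpers for residuals e ordered by time (e.g. by yeart)
lag1_cor <- function(e) cor(e[-1], e[-length(e)])  # correlation of e_t with e_{t-1}
dw_stat  <- function(e) sum(diff(e)^2) / sum(e^2)  # Durbin-Watson statistic

# Toy residuals with long runs of the same sign (not the homework residuals)
e <- c(1, 1, 1, -1, -1, -1)
lag1_cor(e)  # 0.6666667
dw_stat(e)   # 0.6666667

# With the homework data, assuming the rows of mydata are sorted by yeart:
# plot(mydata$yeart, errorhat, type = "b"); abline(h = 0)
# lag1_cor(errorhat); dw_stat(errorhat)
```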

  1. Calculate the average estimated error, \[\frac{1}{11} \sum_{t=1968}^{1978} \hat{\epsilon_{t}}\] using the mean function. Is the result surprising? See p. 144 of your textbook.
mean(errorhat)
## [1] -4.521643e-15

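This number is essentially zero (\(-4.5 \times 10^{-15}\) is floating-point rounding error), which holds by construction: when the regression includes an intercept, the OLS first-order condition forces the residuals (not the squared residuals) to sum to zero, so their average is zero as well. The result is therefore not surprising.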
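The zero-sum property comes from the intercept. The OLS first-order conditions are \(\sum_t \hat\epsilon_t = 0\) (from \(\beta_0\)) and \(\sum_t U_t \hat\epsilon_t = 0\) (from \(\beta_1\)); drop the intercept and the first condition disappears. The sketch checks this on hypothetical toy data.

```r
# Hypothetical toy data (not the homework data set)
x <- c(3, 4, 5, 6, 7, 8)
y <- c(10.9, 11.5, 11.4, 12.3, 12.6, 13.4)

with_int <- lm(y ~ x)      # intercept included
no_int   <- lm(y ~ x - 1)  # regression through the origin

sum(residuals(with_int))      # zero up to rounding error
sum(x * residuals(with_int))  # also zero: residuals are orthogonal to x
sum(residuals(no_int))        # not zero without an intercept
```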

  1. The unemployment rate in 1979 was 5.8. Based on your estimates from (c), what suicide rate would you have expected in 1979? You can either compute it by hand, showing the formula, or use predict.
newUt  <- data.frame(Ut = 5.8)
preds  <- predict(regSt, newdata = newUt)
preds
##        1 
## 12.03127

\[\hat{S}_{1979} = 9.67277 + (0.40664)(5.8) \approx 12.031\]
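The hand calculation can be written in R using the coefficients printed by summary(regSt). Because those printed values are rounded, the result agrees with predict only to about four decimal places; inside the same session, sum(coef(regSt) * c(1, 5.8)) would use the unrounded estimates.

```r
b0     <- 9.67277  # (Intercept) estimate from summary(regSt)
b1     <- 0.40664  # Ut estimate from summary(regSt)
U_1979 <- 5.8      # unemployment rate in 1979

S_hat_1979 <- b0 + b1 * U_1979  # predicted suicides per 100,000 in 1979
S_hat_1979                      # about 12.031, matching predict()'s 12.03127
```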