A paper published in 1974 called “An Economic Theory of Suicide” argued that when economic conditions get tough, the number of suicides increases because many people “will believe future prospects to have diminished.” As a quick test of this theory, we can write down the following linear model relating \(S_{t}\), the number of suicides per 100,000 people in year \(t\), to \(U_{t}\), the unemployment rate in year \(t\):
\[S_{t}=\beta_0+\beta_1 U_{t} + \epsilon_{t}\] \[E(\epsilon_{t}|U_{t})=0\]
The data for this problem are on the website at link. Save the csv file somewhere on your computer, and then import the data into R. Since this is a csv file, you have to type:
mydata <- read.csv("C:/…/Book1.csv")
where the dots represent the location of your file.
Most of the R commands from the last homework will be helpful here, so please look over the last R assignment to figure out which commands you need.
Type “mydata” – you should see three columns: yeart, St, and Ut.
The error term \(\epsilon_t\) contains any factors other than unemployment that influence suicide rates, e.g. depression, anxiety, or other economic variables such as bankruptcy. The assumption \(E(\epsilon_{t}|U_{t})=0\) is unlikely to hold, since a factor contained in \(\epsilon_t\) may also be related to unemployment. For example, an individual suffering from clinical depression may be more likely to become unemployed, so depression, which is part of \(\epsilon_t\), is correlated with \(U_t\).
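The argument above can be illustrated with a small simulation. This is only a sketch on synthetic data: the variable `depression`, the sample size, and every coefficient below are assumptions made for illustration and have nothing to do with the homework data set. When a factor left in the error term is correlated with the regressor, the OLS slope drifts away from the true value (0.4 here).

```r
# Synthetic illustration only: none of these numbers come from the homework data.
set.seed(42)
n <- 10000
depression <- rnorm(n)                      # hypothetical unobserved factor
U <- 6 + depression + rnorm(n)              # unemployment is correlated with it
S <- 10 + 0.4 * U + depression + rnorm(n)   # true slope on U is 0.4

fit_short <- lm(S ~ U)                # depression is left in the error term
fit_long  <- lm(S ~ U + depression)   # depression is controlled for

slope_short <- unname(coef(fit_short)["U"])
slope_long  <- unname(coef(fit_long)["U"])
print(c(short = slope_short, long = slope_long))
```

Under these assumed parameters the short regression should give a slope near 0.9 rather than 0.4, while the long regression should recover a value close to 0.4.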
plot(mydata$Ut, mydata$St)
Compute the OLS estimators for \(\beta_0\) and \(\beta_1\). Use the lm command.
What is the \(R^2\) for this regression? What does it mean? What is the 95% confidence interval for \(\beta_1\)? Use confint to find this interval. Can you reject the hypothesis that unemployment has no effect on the suicide rate? Why or why not?
regSt <- lm(St~Ut, data=mydata) ## regress St on Ut using mydata
summary(regSt) ## regression output
##
## Call:
## lm(formula = St ~ Ut, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4367 -0.3379 0.0040 0.2424 0.7808
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.67277 0.49740 19.447 1.16e-08 ***
## Ut 0.40664 0.08383 4.851 0.000907 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4123 on 9 degrees of freedom
## Multiple R-squared: 0.7233, Adjusted R-squared: 0.6926
## F-statistic: 23.53 on 1 and 9 DF, p-value: 0.0009071
confint(regSt, level = 0.95) ## 95% confidence intervals for the parameters
## 2.5 % 97.5 %
## (Intercept) 8.5475775 10.7979686
## Ut 0.2170104 0.5962648
confint(regSt, level = 0.99) ## 99% confidence intervals for the parameters
## 0.5 % 99.5 %
## (Intercept) 8.0563069 11.2892392
## Ut 0.1342175 0.6790577
The output of the regression is \[\hat{S_t} = \underset{(0.497)}{9.673} + \underset{(0.084)}{0.407} U_t, \quad R^2 = 0.723\]
The \(R^2\) is relatively high: variation in the unemployment rate explains about 72% of the sample variation in suicide rates.
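As a quick consistency check, the reported \(R^2\) can be tied back to the F-statistic in the regression output. The sketch below uses only numbers copied from the summary output; the variable names are made up for illustration. With a single regressor, \(F = \frac{R^2}{1-R^2}\cdot df\) and \(F = t^2\).

```r
# Values copied from the summary(regSt) output
r2   <- 0.7233   # Multiple R-squared
df   <- 9        # residual degrees of freedom
tval <- 4.851    # t value for Ut

f_from_r2 <- (r2 / (1 - r2)) * df   # F statistic implied by R-squared
f_from_t  <- tval^2                 # with one regressor, F equals t squared
print(c(f_from_r2 = f_from_r2, f_from_t = f_from_t))
```

Both values should agree with the reported F-statistic of 23.53 up to rounding of the inputs.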
Testing whether unemployment has no effect on suicide means that the null hypothesis is \(H_0: \quad \beta_1 = 0\). The alternative hypothesis is \(H_1: \quad \beta_1 \neq 0\). The p-value for \(\beta_1\) is about 0.0009, so the unemployment rate is statistically significant at the 1% level. Equivalently, \(0\) does not lie in either the \(99\%\) or the \(95\%\) confidence interval, so we can reject the null hypothesis at both the \(1\%\) and \(5\%\) significance levels. This means that we can reject the hypothesis that unemployment has no effect on suicide rates.
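The test statistic, p-value, and confidence intervals can also be rebuilt by hand from the estimate and standard error in the regression output. This is a minimal sketch; the variable names are illustrative, and the inputs are the rounded values printed by `summary(regSt)`, so the results match `confint` only up to rounding.

```r
b1  <- 0.40664   # estimate for Ut, copied from the output
se1 <- 0.08383   # its standard error
df  <- 9         # residual degrees of freedom

t_stat <- b1 / se1                             # t statistic for H0: beta1 = 0
p_val  <- 2 * pt(-abs(t_stat), df)             # two-sided p-value
ci95   <- b1 + c(-1, 1) * qt(0.975, df) * se1  # 95% confidence interval
ci99   <- b1 + c(-1, 1) * qt(0.995, df) * se1  # 99% confidence interval

print(t_stat)
print(p_val)
print(rbind(ci95, ci99))
```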
The new coefficients would be \(\frac{10^5}{10^3} \beta_0\) and \(\frac{10^5}{10^3} \beta_1\).
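The general rule behind this is that multiplying the dependent variable by a constant \(k\) multiplies both OLS coefficients by \(k\). The sketch below checks this on synthetic data; `x`, `y`, and the numbers generating them are assumptions for illustration, and `k` is simply set to the factor quoted above.

```r
# Synthetic data for illustration only
set.seed(1)
x <- runif(11, min = 3, max = 9)
y <- 9.7 + 0.4 * x + rnorm(11, sd = 0.4)

k <- 10^5 / 10^3                 # rescaling factor for the dependent variable
y_scaled <- k * y

fit_orig   <- lm(y ~ x)
fit_scaled <- lm(y_scaled ~ x)

# Ratio of rescaled to original coefficients (intercept, slope)
ratio <- unname(coef(fit_scaled) / coef(fit_orig))
print(ratio)
```

Both ratios should equal `k`, while the t statistics and \(R^2\) are unaffected by the change of units.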
newUt <- data.frame(Ut = mydata$Ut)
(newUt is like the newAds variable from last week’s assignment).
newUt <- data.frame(Ut = mydata$Ut)
preds <- predict(regSt,newUt)
preds
## 1 2 3 4 5 6 7 8
## 11.13667 11.09600 11.66530 12.07193 11.94994 11.66530 11.94994 13.12919
## 9 10 11
## 12.80388 12.51924 12.11260
errorhat <- mydata$St - preds
plot(mydata$Ut,errorhat)
Take a look at this website to understand what the residuals should look like, and then compare your residuals to the ones on the webpage. Does it look as if your residuals have a pattern?
It looks as if there is serial correlation.
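One common numerical summary of serial correlation is the Durbin-Watson statistic, which is not part of the original assignment and is shown here only as a sketch. The helper `dw_stat` and the two residual vectors are hypothetical. Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.

```r
# Durbin-Watson statistic: sum of squared changes between consecutive
# residuals divided by the sum of squared residuals.
# The residuals must be supplied in time order.
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# Hypothetical residual vectors, for illustration only
e_persistent  <- c(1, 1, 1, -1, -1, -1)   # neighbours tend to share a sign
e_alternating <- c(1, -1, 1, -1, 1, -1)   # neighbours flip sign every period

print(dw_stat(e_persistent))    # well below 2
print(dw_stat(e_alternating))   # well above 2
```

Assuming the rows of mydata are sorted by year, `dw_stat(errorhat)` would apply the same calculation to the residuals computed above.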
mean(errorhat)
## [1] -4.521643e-15
This number is zero up to floating-point rounding error, which is true by construction: OLS with an intercept forces the residuals to sum to zero, so their mean is zero as well. It is not surprising.
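The zero mean of the residuals comes from the intercept: the first-order condition for \(\hat{\beta}_0\) requires the residuals to sum to zero. A minimal sketch on synthetic data (the vectors `x` and `y` are assumptions, not the homework data) shows that the property disappears once the intercept is dropped.

```r
# Synthetic data for illustration only
set.seed(1)
x <- runif(11, min = 3, max = 9)
y <- 9.7 + 0.4 * x + rnorm(11, sd = 0.4)

fit_with    <- lm(y ~ x)       # model with an intercept
fit_without <- lm(y ~ 0 + x)   # model forced through the origin

mean_with    <- mean(resid(fit_with))      # zero up to rounding error
mean_without <- mean(resid(fit_without))   # generally not zero
print(c(with_intercept = mean_with, without_intercept = mean_without))
```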
newUt <- data.frame(Ut = 5.8)
preds <- predict(regSt,newUt)
preds
## 1
## 12.03127
\[\hat{S_t} = 9.673 + (0.407)(5.8) \approx 12.03\]
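The same prediction can be computed directly from the coefficients. The sketch below, with illustrative variable names, contrasts the coefficients as printed in the regression output with the three-decimal values used in the formula, which is why the hand calculation and `predict` can differ in the third decimal place.

```r
u_new <- 5.8   # unemployment rate used for the prediction

# Coefficients as printed by summary(regSt)
pred_full <- 9.67277 + 0.40664 * u_new

# Coefficients rounded to three decimals, as in the displayed equation
pred_rounded <- 9.673 + 0.407 * u_new

print(c(full = pred_full, rounded = pred_rounded))
```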