Imagine that you are asked to conduct a small study of the relationship between the share of retired people in a region and the turnout in the 2011 elections to the Russian State Duma. It has been observed that in Russia voters of retirement age tend to participate in elections more actively than the younger generation. The question is: does the share of pensioners positively affect the turnout in elections?
To answer this question, you are provided with a dataset containing the variables you need for the analysis. The share of pensioners (%) is the variable ret, and the turnout (%) is the variable turnout. Use a least squares regression model in this task.
Load data:
dat <- read.csv("http://math-info.hse.ru/f/2016-17/ps-pep-quant/datareg2011.csv")
1.1. What is the dependent variable in your model?
The dependent variable is the turnout rate (turnout).
1.2. What is the independent variable in your model?
The independent variable is the share of retired people (ret).
1.3. Formulate the null hypothesis you are going to test.
\[ H_0: \beta_1 = 0 \quad \text{(no effect of the share of retired people on the turnout)} \]
1.4. Formulate the alternative hypothesis.
\[ H_1: \beta_1 \ne 0 \quad \text{(the share of retired people affects the turnout)} \]
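In both hypotheses, \(\beta_1\) is the slope in the simple linear regression of the turnout on the share of retired people. As a sketch in notation that the task itself does not fix (the region index \(i\) and the error term \(\varepsilon_i\) are assumed here), the model can be written as:
\[ \text{turnout}_i = \beta_0 + \beta_1 \cdot \text{ret}_i + \varepsilon_i \]
Under \(H_0\) the regression line is flat, so the expected turnout does not vary with ret.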
1.5. Run R commands to perform the simple least squares regression. Provide the code you use to do it.
model <- lm(turnout ~ ret, data = dat)
summary(model)
##
## Call:
## lm(formula = turnout ~ ret, data = dat)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -15.410  -8.223  -2.364   6.760  35.070
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  102.962     11.436   9.004 1.18e-13 ***
## ret           -1.489      0.395  -3.768  0.00032 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.43 on 77 degrees of freedom
## Multiple R-squared: 0.1557, Adjusted R-squared: 0.1447
## F-statistic: 14.2 on 1 and 77 DF, p-value: 0.0003199
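To see where the t value and the p-value for ret come from, they can be recomputed by hand from the rounded numbers printed in the output. This is only an illustrative sketch: the variable names (estimate, std_error, df_resid and so on) are introduced here and are not part of the task, and because the inputs are rounded, the results match the printed ones only approximately.

```r
# Values copied from the summary(model) output (rounded as printed)
estimate  <- -1.489   # slope for ret
std_error <- 0.395    # standard error of the slope
df_resid  <- 77       # residual degrees of freedom

# t statistic: the estimate divided by its standard error
t_stat <- estimate / std_error

# Two-sided p-value from the t distribution with 77 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# Approximate 95% confidence interval for the slope
t_crit <- qt(0.975, df = df_resid)
ci_95  <- estimate + c(-1, 1) * t_crit * std_error

print(round(t_stat, 3))
print(signif(p_value, 3))
print(round(ci_95, 3))
```

With the fitted object available, coef(summary(model)) and confint(model) return the same quantities computed from the unrounded estimates.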
1.6. Interpret the output you get.
Your interpretation should include the answers for the following questions:
How does the turnout rate change when the share of pensioners in a region increases by one percentage point?
Can we conclude that the share of pensioners has a statistically significant association with the turnout rate? Do not forget to indicate the level of significance you make your conclusions at.
Is the model you performed plausible? Consider the output you got and use your own perception of the research described in this task.
Answers:
When the share of pensioners increases by one percentage point, the turnout rate decreases by \(1.489\) percentage points on average.
Yes, we can: the coefficient of ret is statistically significant at the 0.1% level of significance (‘***’ 0.001 in the significance codes; the p-value is 0.00032), and hence it is also significant at any conventional significance level above 0.1% (1%, 5% and 10%).
The model is not plausible since the \(R^2\) is low (R-squared: 0.1557): the share of pensioners explains only about 16% of the variation in the turnout. Besides, substantively, it is not enough to assume that the turnout rate depends on the share of retired people only (this model has no additional predictors).
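The answers above can be checked with a little arithmetic on the printed output. The sketch below uses the rounded coefficients from the summary; the two values of ret (25 and 26) are hypothetical and chosen only for illustration, and all variable names are introduced here. It also recovers \(R^2\) from the F statistic, using the identity \(R^2 = F / (F + \text{df})\) that holds for a model with one predictor.

```r
# Coefficients copied from the summary(model) output (rounded as printed)
intercept <- 102.962
slope     <- -1.489

# Hypothetical shares of pensioners (%), used only for illustration
ret_values <- c(25, 26)

# Predicted turnout (%) on the fitted line
predicted <- intercept + slope * ret_values
print(predicted)

# The difference between the two predictions equals the slope
print(diff(predicted))

# R-squared recovered from the F statistic and the residual degrees of freedom
f_stat    <- 14.2
df_resid  <- 77
r_squared <- f_stat / (f_stat + df_resid)
print(round(r_squared, 4))
```

A one-point difference in ret always shifts the prediction by the slope, whichever starting value is chosen, which is what the first answer states.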