Imagine that you are to conduct a small study of the relationship between the share of retired people in a region and turnout in the 2011 elections to the Russian State Duma. It has been observed that in Russia voters of retirement age tend to participate in elections more actively than the younger generation. The question is: does the share of pensioners positively affect turnout in elections?
To answer this question, you are provided with a dataset containing the variables you need for the analysis. The share of pensioners (%) is the variable ret, and the turnout (%) is the variable turnout. Use a least squares regression model in this task.
1.1. What is the dependent variable in your model?
1.2. What is the independent variable in your model?
1.3. Formulate the null hypothesis you are going to test.
1.4. Formulate the alternative hypothesis.
1.5. Run R commands to perform the simple least squares regression. Provide the code you use to do it.
1.6. Interpret the output you get. Your interpretation should include answers to the following questions:
How does the turnout rate change when the share of pensioners in a region increases by one percentage point?
Can we conclude that the share of pensioners has a statistically significant association with the turnout rate? Do not forget to indicate the significance level at which you draw your conclusions.
Is the model you estimated plausible? Consider the output you obtained and your own understanding of the research problem described in this task.
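As a hint for 1.5 and 1.6, here is a minimal sketch of how a simple least squares regression can be run and inspected in R. It does not use the course dataset: the data frame name elect and all numbers in it are simulated assumptions, chosen only so that the code runs on its own. With the real data, load the provided file instead of simulating.

```r
# Simulated stand-in for the course dataset (hypothetical values, NOT real data)
set.seed(1)
n <- 80
elect <- data.frame(ret = runif(n, min = 15, max = 35))        # share of pensioners, %
elect$turnout <- 40 + 0.8 * elect$ret + rnorm(n, sd = 5)       # turnout, %

# Simple least squares regression: turnout explained by ret
fit <- lm(turnout ~ ret, data = elect)

summary(fit)                 # estimates, standard errors, t-tests, R-squared
coef(fit)["ret"]             # estimated change in turnout (pp) per 1 pp increase in ret
confint(fit, level = 0.95)   # 95% confidence intervals for the coefficients
```

The p-value in the ret row of summary(fit) refers to the test of the null hypothesis that the slope equals zero.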
D. Acemoglu, S. Johnson, and J. A. Robinson in the paper “The Colonial Origins of Comparative Development: An Empirical Investigation” (2001) evaluated the effect of institutions on economic performance. According to their theory, the current economic performance of former colonies depends on the type of institutions Europeans introduced during the process of colonization. The type of institutions, in turn, depends on natural conditions in a colony.
If natural conditions in a colony were bad and caused diseases and higher mortality, Europeans tended to set up ‘extractive states’ that were used only to transfer resources to the metropolitan country. In such cases colonizers did not develop high-quality institutions and thus provided no protection for private property (Congo). In colonies with good conditions Europeans tried to settle more thoroughly and replicated European institutions with a stronger emphasis on private property and a system of checks and balances against government expropriation (Canada, New Zealand).
To find support for their theory, the researchers used advanced methods (instrumental variables and two-stage regression), but at some steps they used ordinary least squares (OLS) regressions, which were discussed in this course. In this practical task you are asked to replicate one OLS model from the paper.
You are provided with a dataset that contains the following variables:
shortnam : abbreviation for country name
logpgp95 : logarithm of GDP per capita in 1995
avexpr : average protection against expropriation risk, i.e. against the risk of expropriation of private foreign investment by government, averaged over all years from 1985 to 1995; takes values from 0 to 10, where a higher score means less risk
lat_abst : absolute value of the latitude of the country (a measure of distance from the equator), scaled to take values between 0 and 1, where 0 is the equator
africa : equals 1 if a country is situated in Africa, 0 – otherwise
asia : equals 1 if a country is situated in Asia, 0 – otherwise
america : equals 1 if a country is situated in America, 0 – otherwise
other : equals 1 if a country is situated on another continent (not Africa, not Asia, not America), 0 – otherwise
As a first step of the research, the authors evaluate the effect of protection against the risk of expropriation of private foreign investment by government (avexpr) on the logarithm of GDP per capita, taking into account some geographical factors: latitude and the continent where a country is situated. There are four dummies for continents, and the researchers take America as the base category. The researchers then estimate a regression.
2.1. What is the dependent variable in the model?
2.2. What are the independent variables in the model?
2.3. Reproduce the model proposed by Acemoglu et al. Provide the R code you used to perform the model.
2.4. All else equal (ceteris paribus), how does the logarithm of GDP per capita change (on average) when the indicator of protection against expropriation risk (avexpr) increases by 1 unit?
2.5. All else equal (ceteris paribus), how does the logarithm of GDP per capita differ between African and American countries?
2.6. Which of the factors significantly affect GDP per capita? At what level of significance?
2.7. How would you assess the quality of the estimated model? Provide your comment.
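As a hint for 2.3, the sketch below shows one way to specify an OLS model with continent dummies in R so that America is the base category: the america dummy is simply left out of the formula. The data frame colony and every value in it are simulated assumptions used only to make the example self-contained; they are not the Acemoglu et al. data, and the estimates have no substantive meaning.

```r
# Simulated stand-in for the course dataset (hypothetical values, NOT real data)
set.seed(2)
continent <- rep(c("america", "africa", "asia", "other"), each = 16)
n <- length(continent)
colony <- data.frame(
  avexpr   = runif(n, min = 3, max = 10),
  lat_abst = runif(n, min = 0, max = 0.7),
  africa   = as.integer(continent == "africa"),
  asia     = as.integer(continent == "asia"),
  america  = as.integer(continent == "america"),
  other    = as.integer(continent == "other")
)
colony$logpgp95 <- 5 + 0.4 * colony$avexpr + 0.5 * colony$lat_abst -
  0.9 * colony$africa - 0.5 * colony$asia + rnorm(n, sd = 0.6)

# Omitting `america` from the formula makes it the base (reference) category;
# including all four dummies together with the intercept would be perfectly collinear.
fit2 <- lm(logpgp95 ~ avexpr + lat_abst + africa + asia + other, data = colony)

summary(fit2)   # each continent coefficient is a difference from America, ceteris paribus
```

With this specification the coefficient of africa is the expected difference in logpgp95 between an African and an American country with the same avexpr and lat_abst.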
You are asked to conduct a small study of the political self-identification of Spanish people. Your question of interest is the following: which factors affect people’s propensity to identify themselves as advocates of right-wing policy? You are provided with a dataset with the results of a survey conducted in 2014 in Spain. It contains the following variables:
ideolog : respondent’s position on the ideological spectrum, 1 – right, 0 – left
age : respondent’s age
ident_reg : equals 1 if a respondent identifies themselves with a region (province in Spain)
ident_cntr : equals 1 if a respondent identifies themselves with a country (Spain as a whole political unit)
male : equals 1 if a respondent is male, 0 – female
educ : respondent’s level of education (1 – primary school, 4 – higher education)
unempl : equals 1 if a respondent is unemployed, 0 – otherwise
Build a regression model that helps you decide which of the factors listed above affect people’s position on the ideological spectrum.
3.1. What is the dependent variable in your model?
3.2. What are the independent variables in your model?
3.3. What type of regression are you going to use? Explain your choice.
3.4. Estimate the model. Provide the R code you used to estimate it.
3.5. Which of the factors significantly affect people’s position on the ideological spectrum? At what level of significance?
3.6. Interpret the coefficient of the variable age, i.e. explain what happens when the age of a respondent increases by one year.
3.7. Interpret the coefficient of the variable male, i.e. explain what happens when we move from a female respondent to a male one.
3.8. How does the logarithm of the odds differ between unemployed and employed people?
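As a hint for 3.4, here is a minimal sketch of fitting a regression for a binary outcome in R with glm(). It assumes a logit link and treats educ as a numeric score from 1 to 4; both are choices you should justify yourself in 3.3. The data frame survey is simulated, and all of its values are assumptions made only so that the code runs without the course dataset.

```r
# Simulated stand-in for the survey data (hypothetical values, NOT real data)
set.seed(3)
n <- 500
survey <- data.frame(
  age        = sample(18:85, n, replace = TRUE),
  ident_reg  = rbinom(n, 1, 0.3),
  ident_cntr = rbinom(n, 1, 0.4),
  male       = rbinom(n, 1, 0.5),
  educ       = sample(1:4, n, replace = TRUE),
  unempl     = rbinom(n, 1, 0.2)
)
# Made-up data-generating process, used only to produce a 0/1 outcome
eta <- -1.5 + 0.02 * survey$age + 0.3 * survey$male - 0.4 * survey$unempl
survey$ideolog <- rbinom(n, 1, plogis(eta))

# Binary logistic regression: coefficients are on the log-odds scale
fit3 <- glm(ideolog ~ age + ident_reg + ident_cntr + male + educ + unempl,
            data = survey, family = binomial(link = "logit"))

summary(fit3)     # estimates (log-odds), standard errors, z-tests
exp(coef(fit3))   # the same estimates expressed as odds ratios
```

A coefficient in summary(fit3) is the change in the logarithm of the odds of ideolog = 1 for a one-unit increase in the predictor, holding the other predictors fixed; exponentiating it gives the corresponding odds ratio.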
Modify the model from the previous task (Practice 3) so that it captures differences between men and women in the effect of education on political identification on the left–right scale. In other words, use the same model as before, but consider including some additional term(s) in your model.
4.1. Write the equation of the new model.
4.2. Explain in what way the new model is different from the model from Practice 3.
4.3. Estimate the model. Provide the R code you used to estimate it.
4.4. Does the level of education affect political self-identification (left or right) differently for men and women? Explain your answer.
4.5. On average, by how much does the effect of education on the logarithm of the odds of right-wing self-identification differ between men and women?
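As a hint for 4.3, one common way to let the effect of one predictor depend on another is an interaction term. The sketch below assumes that this is the kind of term the task has in mind and that educ is treated as numeric. It uses R's formula shorthand male * educ, which expands to male + educ + male:educ. The data frame survey is again simulated; none of its values come from the course dataset.

```r
# Simulated stand-in for the survey data (hypothetical values, NOT real data)
set.seed(4)
n <- 500
survey <- data.frame(
  age        = sample(18:85, n, replace = TRUE),
  ident_reg  = rbinom(n, 1, 0.3),
  ident_cntr = rbinom(n, 1, 0.4),
  male       = rbinom(n, 1, 0.5),
  educ       = sample(1:4, n, replace = TRUE),
  unempl     = rbinom(n, 1, 0.2)
)
# Made-up data-generating process with a different education slope for men
eta <- -1 + 0.01 * survey$age + 0.2 * survey$educ - 0.3 * survey$male * survey$educ
survey$ideolog <- rbinom(n, 1, plogis(eta))

# Model without the interaction (same structure as in Practice 3)
base_fit <- glm(ideolog ~ age + ident_reg + ident_cntr + male + educ + unempl,
                data = survey, family = binomial)

# Model with the interaction: male * educ = male + educ + male:educ
int_fit <- glm(ideolog ~ age + ident_reg + ident_cntr + unempl + male * educ,
               data = survey, family = binomial)

summary(int_fit)                           # the male:educ row is the interaction term
anova(base_fit, int_fit, test = "Chisq")   # likelihood ratio test of the added term
```

In int_fit the coefficient of educ is the education effect on the log-odds for women (male = 0), and the coefficient of male:educ is how much that effect differs for men.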