Consider the following graph showing the relationship between \(X\) and \(Y\).
Write the equation of this line. Interpret the slope and the intercept from this equation.
Figures A-D show the association between two variables \(X\) and \(Y\).
Match each figure with the following Pearson correlation coefficients: \(-0.9\), \(-0.45\), \(0.55\), \(0.95\). Explain your answer.
Find the Pearson correlation coefficient between \(X\) and \(Y\) based on the graph below. Explain your answer.
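For reference, the Pearson coefficient used in these exercises can be computed directly from its definition. The sketch below is a minimal Python illustration; the function name `pearson_r` and the data points are hypothetical and are not read from any of the figures.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations scaled by both spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical points lying exactly on a rising and on a falling line.
xs = [1, 2, 3, 4]
r_rising = pearson_r(xs, [2, 4, 6, 8])
r_falling = pearson_r(xs, [8, 6, 4, 2])
print(r_rising, r_falling)
```

Points that lie exactly on a non-horizontal straight line give a coefficient of 1 or -1, depending on the sign of the slope.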
A researcher conducted a preliminary analysis of the relationship between freedom of the press and control of corruption. Freedom of the press is operationalized as the corresponding index provided by Freedom House (FPI), and control of corruption is operationalized as the corresponding index from the Worldwide Governance Indicators (CC). The Freedom of the Press Index ranges from 0 to 100, and higher values correspond to less freedom. The Control of Corruption indicator ranges from -2.5 to 2.5, and higher values correspond to better control.
He got the following results:
##
## Pearson's product-moment correlation
##
## data: FPI and CC
## t = -14.226, df = 188, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.7821839 -0.6436134
## sample estimates:
## cor
## -0.72
Mark all the correct statements. Explain why the other statements are incorrect.
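As a reading aid for the output above, the reported test statistic and confidence interval can be recovered approximately from the correlation and the degrees of freedom. The sketch below assumes the standard t-test for a zero correlation and a Fisher z-transformation interval with the normal quantile 1.96; the variable names are illustrative, and small differences from the output are expected because the printed correlation is rounded.

```python
import math

r = -0.72   # sample correlation from the output
df = 188    # degrees of freedom from the output, so n = df + 2

# t statistic for the null hypothesis that the true correlation is 0
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Approximate 95% confidence interval via the Fisher z-transformation
n = df + 2
z = math.atanh(r)
half_width = 1.96 / math.sqrt(n - 3)
ci_low = math.tanh(z - half_width)
ci_high = math.tanh(z + half_width)

print(round(t_stat, 3), round(ci_low, 4), round(ci_high, 4))
```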
The Russian party “Yabloko” describes itself as a social liberal party. Its electorate is mainly concentrated in big cities. A student decided to check statistically whether the share of urban population in a region positively influenced the party’s vote share in the 2011 parliamentary elections. He ran a bivariate linear regression and got the following output:
##
## Call:
## lm(formula = Yabloko ~ urban, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.4835 -0.6980 -0.1776  0.4906  6.7773
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.33607    0.87116  -4.977 3.84e-06 ***
## urban        0.10489    0.01233   8.505 1.08e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.394 on 77 degrees of freedom
## Multiple R-squared: 0.4844, Adjusted R-squared: 0.4777
## F-statistic: 72.34 on 1 and 77 DF, p-value: 1.078e-12
5.1. How does the vote share of “Yabloko” change when the share of urban population increases by one percentage point?
5.2. Can we conclude that the share of urban population really affects support for the party “Yabloko”?
5.3. Interpret the coefficient of determination (R-squared) from this output, i.e., decide based on its value whether the fit of the model is acceptable.
5.4. What is the Pearson correlation coefficient between the share of urban population and the vote share of “Yabloko”?
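Two identities that hold in a regression with a single predictor may help when reading this output: the F-statistic equals the squared t value of the slope, and the absolute value of the Pearson correlation equals the square root of R-squared, with the sign of the slope. The sketch below checks both using the numbers printed above; the variable names are illustrative.

```python
import math

slope = 0.10489      # estimate for `urban`
t_slope = 8.505      # t value for `urban`
r_squared = 0.4844   # Multiple R-squared
f_stat = 72.34       # F-statistic

# With one predictor, the F statistic equals the squared t value of the slope.
f_from_t = t_slope ** 2

# With one predictor, |r| = sqrt(R-squared), and r takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(round(f_from_t, 2), round(r, 3))
```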
Consider the following situation. A student decides to evaluate the association between economic development and the type of political regime. As an indicator of economic development he uses GDP per capita. The type of political regime takes three values: ‘autocracy’, ‘hybrid regime’, and ‘democracy’. There are 191 countries in the sample.
To find the strength of the association, he wants to estimate the Pearson correlation coefficient between GDP per capita and the type of political regime. Is the student’s approach correct? Explain your answer.
A young researcher decided to study the relationship between the income of residents of district A (income) and the number of years they have lived in this district (years). He estimated the Pearson correlation coefficient and got the following results:
\(\mathrm{Corr}(income, years) = 0.3\), \(p\text{-value} = 0.7\).
Can we regard this result as reliable? In other words, can we draw any conclusions about the relationship between the income of residents and the number of years they have lived in the district? Explain your answer.
A student is going to conduct research on the effect of different factors on young people’s perception of the current political situation in a country. The student has the results of a survey (1000 respondents) that include various parameters: respondents’ age, number of years spent studying, number of hours per week spent reading news, the level of parents’ education, etc. The perception of the current political situation is a composite index that ranges from 0 to 100, where 100 stands for the highest satisfaction with the current political situation.
The student wants to perform a multiple least squares regression, and he is sure that his model should take into account the type of locality where respondents live. The type of locality takes five possible values: “city”, “town”, “village”, “urban-type settlement” and “rural-type settlement”.
How should the student include this information in the model? Suggest your solution.
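One common way to bring a categorical variable into a least squares model is dummy (indicator) coding with one category left out as the reference. The sketch below illustrates the idea in Python; the helper `dummy_code`, the sample respondents, and the choice of “city” as the reference category are all hypothetical.

```python
def dummy_code(values, reference):
    """Return one 0/1 column per category except the reference category."""
    categories = sorted(set(values) - {reference})
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories
    }

# Hypothetical respondents; "city" is an arbitrary choice of reference category.
locality = ["city", "town", "village", "urban-type settlement",
            "rural-type settlement", "city", "village"]
dummies = dummy_code(locality, reference="city")

for name, column in dummies.items():
    print(name, column)
```

Five categories produce four indicator columns. A respondent from the reference category has zeros in all of them, so each coefficient is read as a difference from the reference category.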
A researcher carried out a study of the salaries of residents of country A. He performed a multiple least squares regression, obtained the coefficients and wrote the following equation:
\[salary = 2.5 \times expr + 3 \times educ - 4 \times female + 1.5 \times age\] where salary is a person’s salary measured in thousands of units, expr is a person’s experience measured in years, educ is the number of years a person spent studying, age is a person’s age, and female is a dummy that takes the value 1 if a respondent is female and 0 otherwise.
9.1. All else equal, how do the salaries of males and females differ according to this equation? In other words, how does the salary differ between a man and a woman of the same age who have the same level of education and experience?
9.2. Suppose the researcher decided to add a new term to the model, \(expr \times female\). Write the new equation of the model if the estimated coefficient of the new term is \(-1.2\). How does the effect of experience on salary differ between this modified model and the initial model?
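The role of an interaction term can be explored numerically. The sketch below encodes the initial equation and a modified one with the \(expr \times female\) term; it assumes, purely for illustration, that the other coefficients stay unchanged after the term is added (in a real re-estimation they would generally change). The function names and the default values for `educ` and `age` are hypothetical.

```python
def salary_initial(expr, educ, female, age):
    """Initial equation from the exercise."""
    return 2.5 * expr + 3 * educ - 4 * female + 1.5 * age

def salary_interaction(expr, educ, female, age):
    """Assumption: original coefficients kept, interaction term added."""
    return (2.5 * expr + 3 * educ - 4 * female + 1.5 * age
            - 1.2 * expr * female)

def experience_effect(model, female, educ=12, age=30):
    """Change in predicted salary for one extra year of experience."""
    return model(11, educ, female, age) - model(10, educ, female, age)

for model in (salary_initial, salary_interaction):
    for female in (0, 1):
        print(model.__name__, female, round(experience_effect(model, female), 2))
```

In the initial equation the effect of experience is the same for both groups, while with the interaction term it depends on the value of the dummy.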
To improve his skills in building statistical models, a young researcher is taking an online course on econometrics. For a practical task he decided to work with the Titanic dataset, which contains information about passengers who did or did not survive the notorious shipwreck. The researcher wants to find out which ages of passengers were more likely to survive. He visualized the relationship between age and passengers’ survival and got the following graph:
Judging by the graph, the researcher concluded that age does not have a dramatic effect on passengers’ survival. However, he decided to check this statistically and run an ordinary least squares regression, where “Survival” is the dependent variable and “Age” is the independent one.
Comment on the researcher’s idea of performing such an analysis.
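One point relevant to this question is that a least squares line fitted to a 0/1 outcome is not constrained to the interval from 0 to 1, which is one reason models such as logistic regression are commonly used for binary outcomes. The sketch below shows this on a small hypothetical dataset; it is not the Titanic data, and the helper `ols_fit` is written only for this illustration.

```python
def ols_fit(xs, ys):
    """Intercept and slope of a simple least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical predictor and binary (0/1) outcome.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = ols_fit(xs, ys)
fitted = [intercept + slope * x for x in xs]
print(min(fitted), max(fitted))
```

On these data the smallest fitted value is below 0 and the largest is above 1, so the fitted values cannot all be read as probabilities.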
A student tossed a coin 10 times. He got heads 4 times and tails 6 times. Assuming that the probability of heads equals the relative frequency of heads in these data, calculate the odds of obtaining heads.
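The calculation rests on the definition of odds as the probability of an event divided by the probability of its complement. A minimal sketch with exact fractions follows; the function name `odds` is hypothetical.

```python
from fractions import Fraction

def odds(successes, trials):
    """Odds p / (1 - p), with p estimated as the relative frequency."""
    p = Fraction(successes, trials)
    return p / (1 - p)

# Counts from the exercise: 4 heads in 10 tosses.
heads_odds = odds(4, 10)
print(heads_odds)
```

Using `Fraction` keeps the result exact, which makes the difference between odds and probability easy to see.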