Research question: why are some countries democracies whereas others are not?
Theory: economic development causes democratization
Modernization theory (Lipset 1963; Przeworski et al. 2000)
Hypothesis: there is a positive relationship between economic development and democracy—more “developed” countries will be more democratic than less developed countries (and vice versa)
Pearson's product-moment correlation
data: ds$fh_index and ds$log_gdp
t = 8.9739, df = 186, p-value = 0.0000000000000003101
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.4412788 0.6422632
sample estimates:
cor
0.5496762
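As a check on the output above, the reported t statistic can be recovered from the sample correlation and the degrees of freedom using the standard t transformation for Pearson's r. The values plugged in are rounded from the output, so the result agrees only up to rounding.

```latex
\[
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
  = \frac{0.5497 \times \sqrt{186}}{\sqrt{1 - 0.5497^2}}
  \approx \frac{0.5497 \times 13.638}{0.8354}
  \approx 8.97
\]
```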
Development and Democracy
On average, there is a positive and statistically significant relationship between development and democracy—as development increases, democracy seems to increase.
By how much?
Bivariate Linear Regression
We use bivariate (simple) linear regression to estimate the slope (and intercept) of the line that “best fits” the data; we use this information to:
Identify the presence (or absence) of a relationship between variables
Identify the direction of the relationship (positive or negative)
Identify the strength of the relationship—how much does y change when x changes?
Make predictions—if x is 7, what should y be?
Bivariate Linear Regression
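Bivariate linear regression models take the following form: y_i=\alpha+\beta{x}_i+\varepsilon_i, where:
y_i is the value of the dependent variable for observation i
x_i is the value of the independent variable for observation i
\alpha is the intercept (the expected value of y when x is 0)
\beta is the slope (the expected change in y for a one-unit increase in x)
\varepsilon_i is the error term (the part of y_i that the line does not capture)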
Bivariate Linear Regression: Estimating the Slope and Intercept
The goal of bivariate linear regression is to estimate a line (slope and intercept) that minimizes the errors (residuals)
We accomplish this using the ordinary least squares (OLS) method to find the \hat{\alpha} and \hat{\beta} that minimize the sum of squared errors (SSE)
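A minimal sketch of what OLS does, assuming simulated data rather than the data used in these slides: the closed-form estimates for a bivariate model are computed by hand and compared with lm(). The names x, y, sse, and fit_sim are hypothetical and exist only for this illustration.

```r
# Simulated data for illustration only (not the data used in this section).
set.seed(42)
x <- runif(50, min = 0, max = 10)
y <- 1 + 0.9 * x + rnorm(50, sd = 1)

# Closed-form OLS estimates for a bivariate model.
beta_hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - beta_hat * mean(x)

# Sum of squared errors (SSE) for any candidate intercept a and slope b.
sse <- function(a, b) sum((y - (a + b * x))^2)

# lm() should return the same intercept and slope.
fit_sim <- lm(y ~ x)
print(c(alpha_hat, beta_hat))
print(coef(fit_sim))

# Moving away from the OLS line should increase the SSE.
print(sse(alpha_hat, beta_hat) < sse(alpha_hat + 0.1, beta_hat))
```

Any other intercept or slope passed to sse() should give a larger value than the OLS pair, which is the sense in which the line "best fits" the data.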
Call:
lm(formula = y ~ x, data = data)
Coefficients:
(Intercept) x
1.2650 0.8745
\hat{y} = 1.27 + 0.87x
If x is 7, what is the prediction for y?
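The prediction follows from plugging x = 7 into the fitted line. Using the rounded coefficients from the equation above, and then the unrounded coefficients from the lm() output:

```latex
\[
\hat{y} = 1.27 + 0.87(7) = 1.27 + 6.09 = 7.36
\]
\[
\hat{y} = 1.2650 + 0.8745(7) = 1.2650 + 6.1215 = 7.3865
\]
```

The small gap between the two answers is due only to rounding the coefficients.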
Development and Democracy
There is a positive relationship between development and democracy: the estimate indicates that a one-unit increase in log GDP corresponds to a 9.61-point increase on the Freedom House index.
If log_gdp is 5, what is the prediction for fh_index?
What about when log_gdp is 10?
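Both predictions can be worked out by hand from the coefficients in the regression output reported in this section (intercept -7.871, slope 9.607):

```latex
\[
\widehat{\text{fh\_index}} = -7.871 + 9.607(5) = -7.871 + 48.035 = 40.164
\]
\[
\widehat{\text{fh\_index}} = -7.871 + 9.607(10) = -7.871 + 96.070 = 88.199
\]
```

The two predictions differ by 9.607 × 5 = 48.035 points, which is the slope multiplied by the five-unit change in log_gdp.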
Bivariate Linear Regression: Model Fit (RSE)
How well does the line “fit” (describe) the data?
To answer this question, we analyze the error term (residuals)—how close are the values we observe to the values we predict (the regression line)?
Residual standard error (RSE): RSE=\sqrt{\frac{1}{n-2}\sum_{i=1}^n(y_i-\hat{y}_i)^2}
Bivariate Linear Regression: Model Fit (r^2)
Coefficient of determination: r^2=1-\frac{SS_{res}}{SS_{tot}}
Residual sum of squares: SS_{res}=\sum_{i=1}^n(y_i-\hat{y}_i)^2
Total sum of squares: SS_{tot}=\sum_{i=1}^n(y_i-\bar{y})^2
Measure of how well the regression line approximates the real data points
r^2 of 1 indicates that the regression line perfectly fits the data
Often interpreted as the proportion of variation in y that is “explained” by x; an r^2 of 1 indicates that x explains 100% of the variation in y; why?
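The two fit statistics can be computed by hand and checked against what R reports. This is a sketch on simulated data (not the data used in these slides); x, y, and fit_sim are hypothetical names chosen for the illustration.

```r
# Simulated data for illustration only.
set.seed(7)
x <- rnorm(100)
y <- 2 + 0.5 * x + rnorm(100)

fit_sim <- lm(y ~ x)
y_hat   <- fitted(fit_sim)   # predicted values on the regression line
n       <- length(y)

# Residual and total sums of squares.
ss_res <- sum((y - y_hat)^2)
ss_tot <- sum((y - mean(y))^2)

# Residual standard error and coefficient of determination, by hand.
rse <- sqrt(ss_res / (n - 2))
r2  <- 1 - ss_res / ss_tot

# Compare with the values stored in the model summary.
print(c(by_hand = rse, from_summary = summary(fit_sim)$sigma))
print(c(by_hand = r2, from_summary = summary(fit_sim)$r.squared))

# In a bivariate model, r^2 equals the squared Pearson correlation.
print(cor(x, y)^2)
```

If every observation sat exactly on the line, ss_res would be 0 and r2 would be 1, which is one way to answer the "why?" above.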
fit <- lm(fh_index ~ log_gdp, data = ds)
summary(fit)
Call:
lm(formula = fh_index ~ log_gdp, data = ds)
Residuals:
    Min      1Q  Median      3Q     Max
-64.778 -15.198   5.812  16.900  46.838
Coefficients:
            Estimate Std. Error t value            Pr(>|t|)
(Intercept)   -7.871      8.263  -0.953               0.342
log_gdp        9.607      1.071   8.974 0.00000000000000031 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 23.67 on 186 degrees of freedom
Multiple R-squared: 0.3021, Adjusted R-squared: 0.2984
F-statistic: 80.53 on 1 and 186 DF, p-value: 0.0000000000000003101
Development and Democracy
There is a positive relationship between development and democracy: a one-unit increase in log(GDP) corresponds to a 9.6-point increase in Freedom House score.
The model fits the data reasonably well: approximately 30% of the variation in Freedom House scores is “explained” by log(GDP). If the residuals are roughly normally distributed, the majority (~68%) of the Freedom House scores that we observe in the data fall within about 24 points (one residual standard error, 23.67) of the score that the model predicts.
Bivariate Linear Regression: Inference
Goal: estimate unknown population parameters using sample statistics as point estimates for the unknown population parameters
In bivariate linear regression, the intercept (\alpha) and slope (\beta) of the regression line are the unknown population parameters that we have to estimate; as always, we estimate these parameters in two steps:
Calculate point estimates for \alpha (\hat{\alpha}) and \beta (\hat{\beta})
Quantify the uncertainty around the point estimates by calculating standard errors for \hat{\alpha} and \hat{\beta}
With this information, we can calculate CIs, p-values, and test hypotheses
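As a worked illustration using the slope and standard error from the regression output reported in this section (9.607 and 1.071), the t statistic and an approximate 95% confidence interval for the slope can be computed by hand. The critical value of roughly 1.97 for 186 degrees of freedom is an assumption added here, not a number taken from the output.

```latex
\[
t = \frac{\hat{\beta}}{SE(\hat{\beta})} = \frac{9.607}{1.071} \approx 8.97
\]
\[
\hat{\beta} \pm t_{0.975,\,186} \times SE(\hat{\beta})
  \approx 9.607 \pm 1.97 \times 1.071
  \approx 9.607 \pm 2.11
  \approx (7.50,\ 11.72)
\]
```

The interval excludes 0, which matches the very small p-value reported for log_gdp.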
Consistent with H1, these findings indicate that there is a positive relationship between development and democracy. The relationship is both statistically and substantively significant. On average, countries with a low per capita GDP (x = 5, about $148) score relatively low on the Freedom House index (\hat{y} = 40.2), whereas countries with a high per capita GDP (x = 10, about $22,026) score substantially higher (\hat{y} = 88.2), a predicted difference of roughly 48 points on the 100-point scale of democracy. These results suggest that economic development (or lack thereof) may explain why some countries are democracies whereas others are not.