The data set ‘countries.csv’ contains information on 60 countries and their characteristics:
| variable | label | possible_range |
|---|---|---|
| country | country name | NA |
| GDP.ppp.pc.us | GDP PPP per capita in US dollars | 0+ |
| Int.users.per.100 | Internet users per 100 | 0-100 |
| Status | Freedom House status (free, partly free, or unfree) | Free, Partly free, Unfree |
| democracy | binary Przeworski index of democracy (democracy or dictatorship) | Yes, No |
| P4 | Polity IV index (ordinal scale from -10, autocracy, to +10, full democracy) | [-10; 10] |
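```r
# Read the data and treat the Polity IV score as a factor
df <- read.csv("countries.csv", header = TRUE)
df$P4 <- as.factor(df$P4)
# Drop columns 1, 8 and 9
df <- df[, c(-1, -8, -9)]

# Human-readable variable labels, in column order
labs <- c("Country name",
          "GDP per capita, PPP",
          "Internet per 100",
          "Freedom House Index",
          "Democracy-Dictatorship",
          "Polity IV")

library(sjlabelled)
df <- set_label(df, label = labs)

# Codebook-style overview of the labelled data frame
library(sjPlot)
view_df(df, verbose = FALSE)
```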
| ID | Name | Label | Values | Value Labels |
|---|---|---|---|---|
| 1 | country | Country name | Algeria Argentina Armenia Australia Azerbaijan Bahrain Belarus Brazil Chile China Colombia Cyprus Ecuador Egypt Estonia <… truncated> | |
| 2 | GDP.ppp.pc.us | GDP per capita, PPP | range: 1535.5-122609.4 | |
| 3 | Int.users.per.100 | Internet per 100 | range: 7.1-92.9 | |
| 4 | Status | Freedom House Index | Free Partly free Unfree | |
| 5 | democracy | Democracy-Dictatorship | No Yes | |
| 6 | P4 | Polity IV | -10 -9 -7 -6 -4 -3 -2 2 3 4 5 6 7 8 9 <… truncated> | |
```r
library(psych)
describe(df)
```

```
##                   vars  n     mean       sd   median  trimmed      mad
## country*             1 60    30.50    17.46    30.50    30.50    22.24
## GDP.ppp.pc.us        2 57 23234.14 21823.75 16457.09 19795.13 13988.32
## Int.users.per.100    3 58    49.26    23.42    48.70    49.17    28.32
## Status*              4 60     1.83     0.81     2.00     1.79     1.48
## democracy*           5 58     1.60     0.49     2.00     1.62     0.00
## P4*                  6 56    11.25     4.86    13.00    11.74     4.45
##                       min       max     range  skew kurtosis      se
## country*             1.00     60.00     59.00  0.00    -1.26    2.25
## GDP.ppp.pc.us     1535.48 122609.41 121073.93  2.18     6.18 2890.63
## Int.users.per.100    7.10     92.86     85.76  0.03    -0.96    3.08
## Status*              1.00      3.00      2.00  0.30    -1.43    0.10
## democracy*           1.00      2.00      1.00 -0.41    -1.86    0.06
## P4*                  1.00     16.00     15.00 -0.74    -0.89    0.65
```
There are two categorical indices in the data set: the Freedom House Index, with three levels, and the binary Democracy-Dictatorship index. Both measure the same concept, but their methodologies differ.
Explore their relationship with a cross-tab and then run a statistical test of association. If the test is statistically significant, report the categories that contribute most to the result.
Write down your conclusions.
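One possible way to approach this task is sketched below. It is not the required solution: the data frame `toy` is a hypothetical stand-in for `countries.csv` with invented counts, and the choice of a chi-squared test with standardized residuals (plus Fisher's exact test as a check) is an assumption about what fits here.

```r
# Hypothetical stand-in for countries.csv: the counts are invented for illustration.
toy <- data.frame(
  Status = factor(rep(c("Free", "Partly free", "Unfree"), times = c(20, 20, 20))),
  democracy = factor(c(rep(c("Yes", "No"), times = c(18, 2)),
                       rep(c("Yes", "No"), times = c(12, 8)),
                       rep(c("Yes", "No"), times = c(3, 17))))
)

# Cross-tab of the two regime classifications
tab <- table(toy$Status, toy$democracy)
print(tab)

# Chi-squared test of independence; check the expected counts before trusting it
chi <- chisq.test(tab)
print(chi$expected)
print(chi)

# Standardized residuals: cells with an absolute value above about 2
# are the ones that contribute most to a significant result
print(round(chi$stdres, 2))

# Fisher's exact test as a check when some expected counts are small
fisher <- fisher.test(tab)
print(fisher$p.value)
```

With the real data, replace `toy` with `df`; rows with missing `democracy` values are dropped by `table()` by default.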
Some analysts claim that Internet penetration and democracy are related. Others argue that Internet penetration is no longer indicative of a democratic political regime.
Explore the issue by showing the average number of Internet users per 100 people for each Freedom House status, both in a table and in a boxplot of the distributions.
Then test whether average Internet penetration differs across the three political regimes. If it does, test which regimes differ from one another.
Write down your conclusions.
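A minimal sketch of this workflow follows, assuming a one-way ANOVA followed by Tukey's HSD is an acceptable choice. The data frame `toy` is simulated and only mimics the column names of the real data; none of its values come from `countries.csv`.

```r
set.seed(1)
# Hypothetical stand-in for countries.csv: values are simulated, not real data.
toy <- data.frame(
  Status = factor(rep(c("Free", "Partly free", "Unfree"), each = 20)),
  Int.users.per.100 = pmin(100, pmax(0, c(rnorm(20, 70, 12),
                                          rnorm(20, 45, 12),
                                          rnorm(20, 40, 12))))
)

# Table of group means
group_means <- aggregate(Int.users.per.100 ~ Status, data = toy, FUN = mean)
print(group_means)

# Boxplot of the distributions by Freedom House status
boxplot(Int.users.per.100 ~ Status, data = toy,
        xlab = "Freedom House status", ylab = "Internet users per 100")

# One-way ANOVA: are the three group means equal?
fit <- aov(Int.users.per.100 ~ Status, data = toy)
print(summary(fit))

# Welch's version does not assume equal variances across groups
print(oneway.test(Int.users.per.100 ~ Status, data = toy))

# Post-hoc pairwise comparisons, only meaningful if the ANOVA is significant
tukey <- TukeyHSD(fit)
print(tukey)
```

If the group sizes are small or the residuals look far from normal, `kruskal.test()` is a rank-based alternative to the ANOVA step.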
There are rich and poor countries, and there are democracies and non-democracies. Adam Przeworski came up with a binary Democracy-Dictatorship index which crudely divides all countries into two categories. Are dictatorships richer than democracies?
Explore whether the means and standard deviations of GDP PPP per capita differ between democracies and dictatorships.
Then test whether the GDP PPP level is the same across the two groups of countries. Pay attention to the number of countries in each group and to the distribution of the GDP PPP values in both groups. If a group contains few countries or its values are far from normal, double-check your results with a test that requires neither a large sample nor a normal distribution.
Write down your results.
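The comparison could be sketched as follows, assuming Welch's t-test as the main test and the Wilcoxon rank-sum test as the check that needs neither large samples nor normality. The data frame `toy` is simulated (right-skewed, like GDP usually is) and is not the real data.

```r
set.seed(2)
# Hypothetical stand-in for countries.csv: values are simulated, not real data.
toy <- data.frame(
  democracy = factor(rep(c("No", "Yes"), times = c(12, 18))),
  GDP.ppp.pc.us = c(rlnorm(12, meanlog = 9.6, sdlog = 0.9),
                    rlnorm(18, meanlog = 9.8, sdlog = 0.8))
)

# Mean, standard deviation and group size for each regime type
desc <- aggregate(GDP.ppp.pc.us ~ democracy, data = toy,
                  FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x)))
print(desc)

# Shapiro-Wilk p-value per group as a rough normality check
normality <- tapply(toy$GDP.ppp.pc.us, toy$democracy,
                    function(x) shapiro.test(x)$p.value)
print(normality)

# Welch's t-test (does not assume equal variances)
welch <- t.test(GDP.ppp.pc.us ~ democracy, data = toy)
print(welch)

# Wilcoxon rank-sum test as a distribution-free double check
wilcox <- wilcox.test(GDP.ppp.pc.us ~ democracy, data = toy)
print(wilcox)
```

If the two tests disagree, the skewness of GDP is a likely reason, and the rank-based result is usually the safer one to report.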
Polity IV (variable P4) is one of the best-known scales of democracy. It is ordinal and results from subtracting the autocracy index from the democracy index, both of which are ordinal. Is the Polity IV scale of democracy correlated with GDP PPP?
Visualize the relationship, run the formal test(s), and interpret the results. If the coefficients are statistically significant, report whether the correlation is high, medium, or low.
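Because P4 is ordinal, a rank correlation is a natural candidate here. The sketch below assumes Spearman's rho and Kendall's tau as the formal tests and uses a simulated data frame `toy` rather than the real data. Note that P4 was converted to a factor when the data were loaded, so it has to be converted back to numbers through `as.character()` first.

```r
set.seed(3)
# Hypothetical stand-in for countries.csv: values are simulated, not real data.
p4_num <- sample(-10:10, 40, replace = TRUE)
toy <- data.frame(
  P4 = factor(p4_num),
  GDP.ppp.pc.us = exp(9.5 + 0.05 * p4_num + rnorm(40, sd = 0.7))
)

# as.numeric() on a factor returns level codes (1, 2, ...), not the scores,
# so convert through as.character() to recover the original -10..10 values
toy$P4.num <- as.numeric(as.character(toy$P4))

# Scatter plot of the relationship
plot(toy$P4.num, toy$GDP.ppp.pc.us,
     xlab = "Polity IV", ylab = "GDP per capita, PPP")

# Rank correlations; exact = FALSE avoids warnings caused by tied scores
spearman <- cor.test(toy$P4.num, toy$GDP.ppp.pc.us,
                     method = "spearman", exact = FALSE)
kendall <- cor.test(toy$P4.num, toy$GDP.ppp.pc.us,
                    method = "kendall", exact = FALSE)
print(spearman)
print(kendall)
```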
What about information technologies and economic development? Can every country afford Internet access for everyone and thereby foster faster growth of the information economy?
Let’s investigate the relationship between GDP PPP and the Internet users per 100 rate. Are those correlated?
Show a plot with both variables. Then calculate and report Pearson’s correlation coefficient, indicate whether it is statistically significant and, if significant, whether it is high, medium, or low.
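A minimal sketch of these steps is given below. The vectors `gdp` and `internet` are simulated stand-ins for the real columns, and the helper `label_strength()` is hypothetical: its cut-offs (0.1, 0.3, 0.5) follow one common convention and are an assumption, since the task does not define "high", "medium", and "low".

```r
set.seed(4)
# Hypothetical stand-ins for GDP.ppp.pc.us and Int.users.per.100 (simulated).
gdp <- rlnorm(50, meanlog = 9.7, sdlog = 0.8)
internet <- pmin(100, pmax(0, 15 * log(gdp) - 95 + rnorm(50, sd = 10)))

# Scatter plot of both variables
plot(gdp, internet,
     xlab = "GDP per capita, PPP", ylab = "Internet users per 100")

# Pearson's correlation coefficient with its significance test
pearson <- cor.test(gdp, internet, method = "pearson")
print(pearson)

# Hypothetical helper: verbal label for the size of a correlation,
# using assumed cut-offs of 0.1, 0.3 and 0.5
label_strength <- function(r) {
  a <- abs(unname(r))
  if (a >= 0.5) "high" else if (a >= 0.3) "medium" else if (a >= 0.1) "low" else "negligible"
}
print(label_strength(pearson$estimate))
```

With the real data, `cor.test()` drops incomplete pairs on its own, so missing GDP or Internet values need no extra handling.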