Problem set

The data set ‘countries.csv’ contains information on 60 countries and their characteristics:

Data set “Countries”
variable | label | possible_range
country | country name | NA
GDP.ppp.pc.us | GDP PPP per capita in US dollars | 0+
Int.users.per.100 | Internet users per 100 | 0-100
Status | Freedom House status (free, partly free, or unfree) | Free, Partly free, Unfree
democracy | binary Przeworski index of democracy (democracy or dictatorship) | Yes, No
P4 | Polity IV index (ordinal scale from -10, autocracy, to +10, full democracy) | [-10; 10]
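# Load the data and store the Polity IV score as a factor
df <- read.csv("countries.csv", header = TRUE)
df$P4 <- as.factor(df$P4)
# Drop columns 1, 8, and 9, keeping the six variables described above
df <- df[, c(-1, -8, -9)]
# Attach human-readable variable labels
labs <- c("Country name",
          "GDP per capita, PPP",
          "Internet per 100",
          "Freedom House Index",
          "Democracy-Dictatorship",
          "Polity IV")
library(sjlabelled)
df <- set_label(df, label = labs)
# Show an overview of the labelled data frame
library(sjPlot)
view_df(df, verbose = FALSE)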
Data frame: df
ID | Name | Label | Values / Value Labels
1 | country | Country name | Algeria, Argentina, Armenia, Australia, Azerbaijan, Bahrain, Belarus, Brazil, Chile, China, Colombia, Cyprus, Ecuador, Egypt, Estonia, <… truncated>
2 | GDP.ppp.pc.us | GDP per capita, PPP | range: 1535.5-122609.4
3 | Int.users.per.100 | Internet per 100 | range: 7.1-92.9
4 | Status | Freedom House Index | Free, Partly free, Unfree
5 | democracy | Democracy-Dictatorship | No, Yes
6 | P4 | Polity IV | -10, -9, -7, -6, -4, -3, -2, 2, 3, 4, 5, 6, 7, 8, 9, <… truncated>
library(psych)
describe(df)
##                   vars  n     mean       sd   median  trimmed      mad
## country*             1 60    30.50    17.46    30.50    30.50    22.24
## GDP.ppp.pc.us        2 57 23234.14 21823.75 16457.09 19795.13 13988.32
## Int.users.per.100    3 58    49.26    23.42    48.70    49.17    28.32
## Status*              4 60     1.83     0.81     2.00     1.79     1.48
## democracy*           5 58     1.60     0.49     2.00     1.62     0.00
## P4*                  6 56    11.25     4.86    13.00    11.74     4.45
##                       min       max     range  skew kurtosis      se
## country*             1.00     60.00     59.00  0.00    -1.26    2.25
## GDP.ppp.pc.us     1535.48 122609.41 121073.93  2.18     6.18 2890.63
## Int.users.per.100    7.10     92.86     85.76  0.03    -0.96    3.08
## Status*              1.00      3.00      2.00  0.30    -1.43    0.10
## democracy*           1.00      2.00      1.00 -0.41    -1.86    0.06
## P4*                  1.00     16.00     15.00 -0.74    -0.89    0.65

Task 1

There are two categorical indices in the data set: the Freedom House Index with three levels (variable Status) and the binary Democracy-Dictatorship index (variable democracy). Both of them measure the same concept, but their methodologies differ.

Explore their relationship with a cross-tab and then run a formal test of association. If the test is statistically significant, report the influential categories, that is, the cells that contribute most to the result.

Write down your conclusions.
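
One possible way to approach this task is sketched below, using a cross-tab, a chi-squared test of independence, and standardized residuals to spot influential cells. This is only an illustration: the `toy` data frame contains made-up values (not the real ‘countries.csv’ data), and the choice of test is an assumption rather than the required method. To apply it to the real data, replace `toy` with `df`.

```r
# Toy stand-in for df: made-up values, NOT the real countries.csv data
toy <- data.frame(
  Status    = factor(rep(c("Free", "Partly free", "Unfree"),
                         times = c(25, 18, 17))),
  democracy = factor(c(rep("Yes", 23), rep("No", 2),
                       rep("Yes", 9),  rep("No", 9),
                       rep("Yes", 2),  rep("No", 15)))
)

# Cross-tab of the two indices (rows: Status, columns: democracy)
tab <- table(toy$Status, toy$democracy)
print(tab)

# Chi-squared test of independence
res <- chisq.test(tab)
print(res)

# Standardized residuals: cells with an absolute value above roughly 2
# deviate most from what independence would predict
print(round(res$stdres, 2))
```

If some expected counts (`res$expected`) are small, `fisher.test(tab)` is a commonly used alternative.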

Task 2

Some analysts claim that Internet penetration and democracy are related. Others argue that nowadays Internet penetration is no longer indicative of a democratic political regime.

Explore the issue by showing the average number of Internet users per 100 people by country status according to the Freedom House index, both in a table and in a boxplot of the distributions of these values.

Then test whether the average Internet penetration differs across the three political regimes. If it does, test which regimes have different levels of Internet penetration.

Write down your conclusions.
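
A minimal sketch of these steps in base R follows: a table of group means, a boxplot, a one-way ANOVA, and Tukey's HSD for pairwise comparisons. The `toy` data are simulated (not the real ‘countries.csv’ values), and ANOVA with Tukey's HSD is just one common choice, assumed here for illustration. To apply it to the real data, replace `toy` with `df`.

```r
# Toy stand-in for df: simulated values, NOT the real countries.csv data
set.seed(1)
sim <- c(rnorm(20, 65, 15), rnorm(20, 45, 15), rnorm(20, 40, 15))
toy <- data.frame(
  Status            = factor(rep(c("Free", "Partly free", "Unfree"), each = 20)),
  Int.users.per.100 = pmin(pmax(sim, 0), 100)  # keep values inside 0-100
)

# Table of means and standard deviations by Freedom House status
print(aggregate(Int.users.per.100 ~ Status, data = toy,
                FUN = function(x) c(mean = mean(x), sd = sd(x))))

# Boxplot of the distributions by status
boxplot(Int.users.per.100 ~ Status, data = toy,
        xlab = "Freedom House status", ylab = "Internet users per 100")

# One-way ANOVA: are the group means equal?
fit <- aov(Int.users.per.100 ~ Status, data = toy)
print(summary(fit))

# Tukey's HSD: which pairs of regimes differ?
tk <- TukeyHSD(fit)
print(tk)
```

If the ANOVA assumptions look doubtful, `kruskal.test(Int.users.per.100 ~ Status, data = toy)` is a rank-based alternative.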

Task 3

There are rich and poor countries, there are democracies and non-democracies. Adam Przeworski came up with a binary Democracy-Dictatorship index which crudely divides all the countries into two categories. Are dictatorships richer than democracies?

Explore whether the means and standard deviations of GDP PPP are different in democracies and dictatorships.

Then test whether the GDP PPP level is the same across the two groups of countries. Pay attention to the number of countries in each group and to the distribution of the GDP PPP values in both groups. If a group contains few countries or its distribution is far from normal, double-check your results with a test that does not require large sample sizes or a normal distribution.

Write down your results.
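
The steps above could be sketched as follows: group sizes, means and standard deviations, a normality check per group, a two-sample t-test, and a rank-based test as the double-check. The `toy` data are simulated (not the real ‘countries.csv’ values), and the specific tests are assumptions chosen for illustration. To apply it to the real data, replace `toy` with `df`.

```r
# Toy stand-in for df: simulated values, NOT the real countries.csv data
set.seed(2)
toy <- data.frame(
  democracy     = factor(rep(c("No", "Yes"), times = c(23, 35))),
  GDP.ppp.pc.us = c(rlnorm(23, meanlog = 9.6, sdlog = 0.9),
                    rlnorm(35, meanlog = 9.8, sdlog = 0.8))
)

# Number of countries in each group
print(table(toy$democracy))

# Means and standard deviations by group
print(aggregate(GDP.ppp.pc.us ~ democracy, data = toy,
                FUN = function(x) c(mean = mean(x), sd = sd(x))))

# Shapiro-Wilk normality check within each group (p-values)
print(tapply(toy$GDP.ppp.pc.us, toy$democracy,
             function(x) shapiro.test(x)$p.value))

# Two-sample t-test (R uses the Welch version by default)
tt <- t.test(GDP.ppp.pc.us ~ democracy, data = toy)
print(tt)

# Wilcoxon rank-sum test: does not assume normality
wt <- wilcox.test(GDP.ppp.pc.us ~ democracy, data = toy)
print(wt)
```

Both formula interfaces drop rows with missing values by default, which matters for the real data because GDP and democracy have fewer than 60 observations.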

Task 4 (home)

Polity IV is one of the most famous scales for democracy (variable P4). It is ordinal and is obtained by subtracting one ordinal index, autocracy, from another, democracy. Is the Polity IV scale of democracy correlated with GDP PPP?

Visualize the relationship, run the formal test(s), and interpret the results. If the coefficients are statistically significant, report whether the correlation is high, medium, or low.
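
A hedged sketch of one way to do this is given below. Because P4 is stored as a factor in `df`, it has to be converted back to its numeric scores first; calling `as.numeric()` directly on a factor returns level codes (1, 2, ...) instead of the original scores. Rank-based coefficients (Spearman and Kendall) are assumed here because the scale is ordinal. The `toy` data are simulated, not the real ‘countries.csv’ values; replace `toy` with `df` for the real analysis.

```r
# Toy stand-in for df: simulated values, NOT the real countries.csv data
set.seed(3)
p4_scores <- sample(c(-10, -7, -3, 2, 5, 8, 10), 50, replace = TRUE)
toy <- data.frame(
  P4            = as.factor(p4_scores),
  GDP.ppp.pc.us = rlnorm(50, meanlog = 9.7, sdlog = 0.8)
)

# Recover the numeric scores: go through character, not straight to numeric
toy$P4.num <- as.numeric(as.character(toy$P4))

# Scatter plot of the relationship
plot(toy$P4.num, toy$GDP.ppp.pc.us,
     xlab = "Polity IV", ylab = "GDP per capita, PPP")

# Rank correlations; exact = FALSE because tied ranks rule out exact p-values
sp <- cor.test(toy$P4.num, toy$GDP.ppp.pc.us,
               method = "spearman", exact = FALSE)
kt <- cor.test(toy$P4.num, toy$GDP.ppp.pc.us,
               method = "kendall", exact = FALSE)
print(sp)
print(kt)
```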

Task 5 (home)

What about information technologies and economic development? Can every country afford to provide Internet access for everyone and thus foster faster growth of the information economy?

Let’s investigate the relationship between GDP PPP and the Internet users per 100 rate. Are the two correlated?

Show a plot with both variables. Then calculate and report Pearson’s correlation coefficient, indicate whether it is statistically significant and, if significant, whether it is high, medium, or low.
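
As an illustration, the plot and Pearson's test might look like the sketch below. The `toy` data are simulated (not the real ‘countries.csv’ values); replace `toy` with `df` for the real analysis. The cut-offs for "high, medium, or low" are not given in this problem set; one widely cited rule of thumb (Cohen's) treats an absolute value of r around 0.1 as small, 0.3 as medium, and 0.5 as large, but the course may use different thresholds.

```r
# Toy stand-in for df: simulated values, NOT the real countries.csv data
set.seed(4)
gdp <- rlnorm(55, meanlog = 9.7, sdlog = 0.8)
net <- 10 * log(gdp) - 50 + rnorm(55, mean = 0, sd = 10)
toy <- data.frame(
  GDP.ppp.pc.us     = gdp,
  Int.users.per.100 = pmin(pmax(net, 0), 100)  # keep values inside 0-100
)

# Scatter plot of both variables
plot(toy$GDP.ppp.pc.us, toy$Int.users.per.100,
     xlab = "GDP per capita, PPP", ylab = "Internet users per 100")

# Pearson's correlation with its significance test
# (cor.test uses only the complete pairs of observations)
pt <- cor.test(toy$GDP.ppp.pc.us, toy$Int.users.per.100, method = "pearson")
print(pt)

# Coefficient, p-value, and 95% confidence interval
print(pt$estimate)
print(pt$p.value)
print(pt$conf.int)
```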

The end.