In their 2009 article, “Parasites, democratization, and the liberalization of values across contemporary countries,” Randy Thornhill, Corey Fincher, and Devaraj Aran hypothesize that “…the variation in values pertaining to autocracy-democracy arises fundamentally out of human (Homo Sapiens) species-typical psychological adaptation that manifests contingently, producing values and associated behaviors that functioned adaptively in human evolutionary history to cope with local levels of infectious diseases.” (Thornhill, Fincher, and Aran, 2009)
In that article, the authors argue that “…the risk of infectious disease…is a cause affecting global variation in three central aspects of democratization: (1) the willingness of powerful people to extend economic and social resources and opportunities outside their own kin or ethnic group, and encourage political involvement of the populace; (2) the validity of rank/authority, as perceived by the general population, and thus the authoritarian—anti-authoritarian dimension; and (3) attitudes about non-traditional ideas and ways of life that determine whether innovation occurs as well as whether innovation diffuses within and across geopolitical boundaries. … the empirical implication is that the degree of democratization should increase as disease prevalence decreases across the countries of the world.” (Thornhill, Fincher, and Aran, 2009)
This hypothesis is controversial, as scholars have long posited that other factors, such as economic development, modernization, and the distribution of resources and political power (factors that Thornhill et al. argue are salient components of democratization, but not its causes), are the true determinants of political systems. In 2013, Damian Murray, Mark Schaller, and Peter Suedfeld attempted to test the parasite-stress hypothesis while “statistically controlling for other threats to human welfare.” (Murray, Schaller, and Suedfeld, 2013) They ran two studies. The first examined not just the relationship between state governance systems and infection rates, but also the relationship between those variables and the authoritarian attitudes of the people in each country. The second introduced an additional variable and a statistical mediation test to determine whether individuals’ attitudes influenced or were influenced by the system of government. They determined that “…the ecological prevalence of infectious diseases predicts the individual authoritarian personalities of people living within that ecological region, and these individual-level dispositions in turn give rise to (and sustain) authoritarian systems of government.” (Murray, Schaller, and Suedfeld, 2013)
Other scholars reject this hypothesis outright. In their 2018 article, “Parasites and politics: why cross-cultural studies must control for relatedness, proximity and covariation,” Lindell Bromham, Xia Hua, Marcel Cardillo, Hilde Schneemann, and Simon Greenhill argue that most analyses purporting to show that infection rate corresponds with political system “fail to account for one or more sources of statistical non-independence inherent in large observational datasets, which can lead to spurious relationships between traits and environments.” (Bromham et al., 2018) Thomas Currie and Ruth Mace succinctly point out the greatest flaw in the development of this hypothesis: “Because of their historical relationships, countries (F&T’s unit of analysis) cannot be considered as independent for the purposes of statistical analysis.” (Currie and Mace, 2012)
As a historian of Russia, I have long been fascinated by the question of what has led Russia to develop an autocratic system so at odds with most of the rest of Europe. There is no shortage of theories, including one tongue-in-cheek analysis of the relationship between the type of liquor a country consumes and the harshness of its government, but there are no definitive answers. I am intrigued by Thornhill’s hypothesis, but doubtful that the answer to what causes the formation of a particular system could be so simple.
Using a small dataset, it is possible to identify some of the weaknesses in the parasite-stress hypothesis.
This dataset, from the Global Infectious Diseases and Epidemiology Network (GIDEON), enables the examination of the parasite-stress hypothesis of democratic or authoritarian political development. It includes the country’s name, its income group, its democracy score, and its infection rate.
library(tidyverse) # load tidyverse to import and manipulate data
library(ggthemes) # to use themes
library(RColorBrewer) # to use ColorBrewer palettes
setwd("~/Desktop/DATA 110")
gideon_data <- read_csv("disease_democ.csv") # import data into variable "gideon_data"
head(gideon_data) # examine top 6 rows of data to make sure it loaded correctly
## # A tibble: 6 x 4
## country income_group democ_score infect_rate
## <chr> <chr> <dbl> <dbl>
## 1 Bahrain High income: non-OECD 45.6 23
## 2 Bahamas, The High income: non-OECD 48.4 24
## 3 Qatar High income: non-OECD 50.4 24
## 4 Latvia High income: non-OECD 52.8 25
## 5 Barbados High income: non-OECD 46 26
## 6 Singapore High income: non-OECD 64 26
The data is structured as follows:
str(gideon_data)
## tibble [168 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ country : chr [1:168] "Bahrain" "Bahamas, The" "Qatar" "Latvia" ...
## $ income_group: chr [1:168] "High income: non-OECD" "High income: non-OECD" "High income: non-OECD" "High income: non-OECD" ...
## $ democ_score : num [1:168] 45.6 48.4 50.4 52.8 46 64 65.8 70.6 57.6 40.6 ...
## $ infect_rate : num [1:168] 23 24 24 25 26 26 26 26 27 28 ...
## - attr(*, "spec")=
## .. cols(
## .. country = col_character(),
## .. income_group = col_character(),
## .. democ_score = col_double(),
## .. infect_rate = col_double()
## .. )
This data appears to be clean and tidy, with four variables and 168 observations (one per country). A check for NAs confirms that no variable has missing entries.
# get the total number of NAs in the data
sum(is.na(gideon_data))
## [1] 0
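The single total above would not show where missing values sit if there were any. A minimal sketch of a per-column check follows; the small data frame `toy` is made up for illustration and is not the GIDEON data.

```r
# Hypothetical toy data (NOT the GIDEON data): one score is missing on purpose
toy <- data.frame(
  country = c("A", "B", "C"),
  democ_score = c(45.6, NA, 50.4),
  infect_rate = c(23, 24, 24)
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
na_per_column <- colSums(is.na(toy))
print(na_per_column)
```

On the real data, `colSums(is.na(gideon_data))` should report zero for each of the four columns, consistent with the total of 0 above.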
The income group variable is divided into five categories: “High income: non-OECD”, “High income: OECD”, “Low income”, “Lower middle income”, “Upper middle income”.
(Note: the OECD is “The Organization for Economic Co-operation and Development…an intergovernmental economic organization with 37 member countries, founded in 1961 to stimulate economic progress and world trade. It is a forum of countries describing themselves as committed to democracy and the market economy, providing a platform to compare policy experiences, seek answers to common problems, identify good practices and coordinate domestic and international policies of its members. Generally, OECD members are high-income economies with a very high Human Development Index (HDI) and are regarded as developed countries. As of 2017, the OECD member countries collectively comprised 62.2% of global nominal GDP ($49.6 trillion) and 42.8% of global GDP ($54.2 trillion) at purchasing power parity. The OECD is an official United Nations observer.” Source: Wikipedia.)
# use unique to find the categories included under "income_group"
unique(gideon_data$income_group)
## [1] "High income: non-OECD" "High income: OECD" "Low income"
## [4] "Lower middle income" "Upper middle income"
Based upon the top six rows of the data (above), it appears that infection rate is a whole number, probably indicating the number of people infected per some fixed population (most likely per 100 or per 1,000). More generally, it seems that the higher the number, the greater the impact of disease on the population. Similarly, the democracy score is a decimal number, with a higher score reflecting a higher degree of democratization. The maximum and minimum of each variable are as follows:
# use max and min to identify highest and lowest democracy score and infection rate
max(gideon_data$democ_score)
## [1] 86.6
min(gideon_data$democ_score)
## [1] 15.8
max(gideon_data$infect_rate)
## [1] 48
min(gideon_data$infect_rate)
## [1] 23
Given these variables, it is possible to examine the relationship between democracy and infection rate, and democracy and income group, and infection rate and income group. This allows a very basic examination of the hypothesis that infection rate causes democratization or authoritarianism.
First, a numerical summary (the five-number summary plus the mean) of the democracy score and infection rate will reveal some basic information about those indicators.
democracy <- gideon_data$democ_score
infection <- gideon_data$infect_rate
summary(democracy)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 15.80 28.40 38.40 42.78 52.65 86.60
summary(infection)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 23.00 27.00 32.00 33.33 39.00 48.00
mediandemocracy <- median(democracy) # 38.4, computed from the data rather than typed in by hand
medianinfection <- median(infection) # 32
This gives a basic understanding of the shape of the data. A simple histogram will show the same information.
ggplot(gideon_data, aes(x = democ_score)) + # refer to columns by name instead of gideon_data$...
  geom_histogram(binwidth = 1, fill = "blue") +
  labs(title = "Distribution of Countries by Democracy Score", x = "Democracy Score", y = "Number of Countries") +
  theme_solarized()
ggplot(gideon_data, aes(x = infect_rate)) +
  geom_histogram(binwidth = 1, fill = "blue") +
  labs(title = "Distribution of Countries by Infection Rate", x = "Infection Rate", y = "Number of Countries") +
  theme_solarized()
We can also see how many countries are in each income group.
ggplot(gideon_data, aes(x = income_group)) +
  geom_bar(fill = "blue") +
  labs(title = "Distribution of Countries by Income Group", x = "Income Group", y = "Number of Countries") +
  theme_solarized()
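By default, ggplot2 places the values of a character variable in alphabetical order on the axis, so the two high-income groups appear before "Low income". One possible refinement, sketched here on a stand-alone vector, is to convert the variable to a factor with explicit levels. The low-to-high ordering below is an assumption about what reads best, not something specified in the data.

```r
# The five category labels reported by unique(gideon_data$income_group),
# arranged from lowest to highest income (ordering is an assumption)
income_levels <- c(
  "Low income",
  "Lower middle income",
  "Upper middle income",
  "High income: non-OECD",
  "High income: OECD"
)

# Stand-alone example vector (hypothetical values, not rows from the dataset)
income_example <- c("High income: OECD", "Low income", "Upper middle income", "Low income")

# factor() with explicit levels fixes the order used by tables and plots
income_factor <- factor(income_example, levels = income_levels)
print(table(income_factor))
```

Applied to the real data, the same idea would be `factor(gideon_data$income_group, levels = income_levels)` before plotting.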
A simple check of the parasite-stress hypothesis would be a scatterplot checking for a relationship between democracy score and infection rate.
ggplot(gideon_data, aes(x = democ_score, y = infect_rate)) +
  geom_point(color = "blue", alpha = 0.5) +
  geom_smooth(color = "red") + # default method is LOESS for fewer than 1,000 observations
  geom_vline(xintercept = mediandemocracy, size = 1, color = "black") +
  annotate("text", x = mediandemocracy + 12, y = 47, label = "Median Democracy Score\n (38.4)") + # annotate() draws the label once instead of once per row
  geom_hline(yintercept = medianinfection, size = 1, color = "black") +
  annotate("text", x = 70, y = 35, label = "Median Infection Rate\n (32.0)") +
  labs(title = "Relationship between Infection Rate and Democracy Score", x = "Democracy Score", y = "Infection Rate") +
  theme_solarized()
With a LOESS smoother added (in red), a relationship appears, but it is not strongly linear (it looks more like a curve). Nonetheless, the rough relationship is that the higher the infection rate, the lower the democracy score, and vice versa.
Indeed, the correlation coefficient confirms a negative association.
cor(gideon_data$democ_score, gideon_data$infect_rate) # check the correlation of democracy score and infection rate
## [1] -0.6664911
The correlation coefficient for democracy score and infection rate is -0.6664911, which indicates a moderate negative correlation between those two factors.
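Because the LOESS curve suggests the relationship may be curved rather than straight, a rank-based (Spearman) correlation could serve as a complementary check, since it measures monotonic rather than strictly linear association. The sketch below uses made-up numbers, not the GIDEON data, to show how the two measures can differ; `method = "spearman"` is a standard argument of base R's `cor()`.

```r
# Hypothetical data: y falls steadily as x rises, but along a curve
x <- 1:10
y <- 100 / x

pearson <- cor(x, y)                        # strength of linear association
spearman <- cor(x, y, method = "spearman")  # strength of monotonic (rank) association

print(c(pearson = pearson, spearman = spearman))
```

Here the Spearman coefficient is exactly -1 because the ranks are perfectly reversed, while the Pearson coefficient is closer to zero because the points do not lie on a straight line. On the real data, the equivalent call would be `cor(gideon_data$democ_score, gideon_data$infect_rate, method = "spearman")`.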
A linear regression model will provide more information.
fit1 <- lm(formula = gideon_data$democ_score~gideon_data$infect_rate)
summary(fit1)
##
## Call:
## lm(formula = gideon_data$democ_score ~ gideon_data$infect_rate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.838 -9.689 -1.512 7.775 31.763
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 104.4458 5.4627 19.12 <2e-16 ***
## gideon_data$infect_rate -1.8503 0.1606 -11.52 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.08 on 166 degrees of freedom
## Multiple R-squared: 0.4442, Adjusted R-squared: 0.4409
## F-statistic: 132.7 on 1 and 166 DF, p-value: < 2.2e-16
This model predicts that each one-unit increase in infection rate is associated with a drop of about 1.85 points in the democracy score. The p-value is very low, so the relationship is considered “statistically significant,” but the adjusted R-squared indicates that the model explains only about 44% of the variation in the democracy score in this data. That means about 56% of the variation is not explained by this model.
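To make the coefficients concrete, the fitted line can be evaluated by hand at the lowest and highest infection rates in the data. The sketch below copies the intercept, slope, and correlation coefficient from the output above; the function name `predict_democracy` is hypothetical, introduced only for illustration.

```r
# Coefficients copied from summary(fit1)
intercept <- 104.4458
slope <- -1.8503

# Hypothetical helper: predicted democracy score for a given infection rate
predict_democracy <- function(infect_rate) {
  intercept + slope * infect_rate
}

pred_low <- predict_democracy(23)   # minimum infection rate in the data
pred_high <- predict_democracy(48)  # maximum infection rate in the data
print(c(pred_low, pred_high))

# In a one-predictor linear regression, R-squared is the square of Pearson's r
r <- -0.6664911
r_squared <- r^2
print(round(r_squared, 4))
```

The line predicts a democracy score of about 61.9 at an infection rate of 23 and about 15.6 at a rate of 48, and the squared correlation rounds to 0.4442, matching the Multiple R-squared in the model summary.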
Another hypothesis is that income is a greater predictor of democracy. Examining the democracy score by income group reveals the following:
ggplot(data = gideon_data, mapping = aes(x = democ_score)) +
  geom_histogram(fill = "blue") +
  labs(title = "Democracy Score by Country and Income Group", x = "Democracy Score", y = "Number of Countries") +
  facet_wrap(~income_group) + # facet by the column name, not gideon_data$income_group
  theme_linedraw()
There definitely appears to be a relationship between democracy score and income level. The exception appears to be high income, non-OECD countries. By filtering we can discover what those countries are.
high_income_nonOECD <- gideon_data %>%
  filter(income_group == "High income: non-OECD") # filter() looks up column names in the piped data
high_income_nonOECD
## # A tibble: 16 x 4
## country income_group democ_score infect_rate
## <chr> <chr> <dbl> <dbl>
## 1 Bahrain High income: non-OECD 45.6 23
## 2 Bahamas, The High income: non-OECD 48.4 24
## 3 Qatar High income: non-OECD 50.4 24
## 4 Latvia High income: non-OECD 52.8 25
## 5 Barbados High income: non-OECD 46 26
## 6 Singapore High income: non-OECD 64 26
## 7 Cyprus High income: non-OECD 65.8 26
## 8 Malta High income: non-OECD 70.6 26
## 9 Croatia High income: non-OECD 57.6 27
## 10 United Arab Emirates High income: non-OECD 40.6 28
## 11 Trinidad and Tobago High income: non-OECD 46.6 28
## 12 Kuwait High income: non-OECD 49.6 28
## 13 Taiwan High income: non-OECD 77.6 29
## 14 Oman High income: non-OECD 33 35
## 15 Equatorial Guinea High income: non-OECD 28.4 36
## 16 Saudi Arabia High income: non-OECD 40 37
A quick scatterplot will reveal the relationship (or lack of relationship) between democracy and infection rate among these countries.
ggplot(high_income_nonOECD, aes(x = democ_score, y = infect_rate)) +
  geom_point(color = "red", alpha = 0.5) +
  geom_smooth(aes(color = "LOESS")) +
  geom_smooth(method = "lm", formula = y ~ x, aes(color = "Linear Regression")) +
  geom_vline(xintercept = mediandemocracy, size = 1, color = "black") +
  annotate("text", x = mediandemocracy + 12, y = 47, label = "Median Democracy Score\n (38.4)") +
  geom_hline(yintercept = medianinfection, size = 1, color = "black") +
  annotate("text", x = 70, y = 35, label = "Median Infection Rate\n (32.0)") +
  labs(title = "Infection and Democracy in High Income non-OECD Countries", x = "Democracy Score", y = "Infection Rate") +
  scale_colour_manual(name = "lines", values = c("LOESS" = "blue", "Linear Regression" = "red")) + # named values so colors do not depend on sort order
  theme_solarized()
The blue line represents the LOESS smoother, which is the curve of best fit without assuming the data has some particular shape. In contrast, the red line is the linear regression line. The contrast between the two suggests there is not a strong linear relationship between the two variables. The correlation coefficient confirms this.
cor(high_income_nonOECD$democ_score, high_income_nonOECD$infect_rate)
## [1] -0.5060138
At -0.5060138, the correlation is weaker than among all countries. Filtering by infection rate (selecting only those countries with an infection rate below the median) enables a closer examination of the democratization pattern of countries with a low infection rate.
high_income_low_infection <- high_income_nonOECD %>% filter(infect_rate < 32.0)
high_income_low_infection
## # A tibble: 13 x 4
## country income_group democ_score infect_rate
## <chr> <chr> <dbl> <dbl>
## 1 Bahrain High income: non-OECD 45.6 23
## 2 Bahamas, The High income: non-OECD 48.4 24
## 3 Qatar High income: non-OECD 50.4 24
## 4 Latvia High income: non-OECD 52.8 25
## 5 Barbados High income: non-OECD 46 26
## 6 Singapore High income: non-OECD 64 26
## 7 Cyprus High income: non-OECD 65.8 26
## 8 Malta High income: non-OECD 70.6 26
## 9 Croatia High income: non-OECD 57.6 27
## 10 United Arab Emirates High income: non-OECD 40.6 28
## 11 Trinidad and Tobago High income: non-OECD 46.6 28
## 12 Kuwait High income: non-OECD 49.6 28
## 13 Taiwan High income: non-OECD 77.6 29
To examine these countries more closely, use plotly to be able to hover over the points and see what they represent.
library(plotly)
interactive_plot1 <- ggplot(mapping = aes(x = high_income_low_infection$democ_score, y = high_income_low_infection$infect_rate, text = paste("country" = high_income_low_infection$country))) +
geom_point(color = "red")+
geom_smooth(color = "blue")+
geom_vline(xintercept = mediandemocracy, size = 1, color = "black")+
geom_text(aes(x = mediandemocracy + 8, y = 47, label = paste("Median Democracy Score\n (38.4)")))+
labs(title = "Infection and Democracy in High Income non-OECD Countries", x = "Democracy Score", y= "Infection Rate")+
theme_solarized()
interactive_plot1 <- ggplotly(interactive_plot1)
interactive_plot1
While all high income, non-OECD countries with below-median levels of infection are above the median for democracy score, there appears to be no relationship between the infection rate and the democracy score within this group. (The linear regression model below confirms this.) These represent only thirteen of the 168 countries in this dataset, but that is a non-trivial share (about 8%) of the total data.
fit3 <- lm(formula = high_income_low_infection$democ_score~high_income_low_infection$infect_rate)
summary(fit3)
##
## Call:
## lm(formula = high_income_low_infection$democ_score ~ high_income_low_infection$infect_rate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.549 -8.549 -1.026 9.212 17.770
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.091 46.809 0.237 0.817
## high_income_low_infection$infect_rate 1.681 1.786 0.941 0.367
##
## Residual standard error: 11.25 on 11 degrees of freedom
## Multiple R-squared: 0.07452, Adjusted R-squared: -0.009618
## F-statistic: 0.8857 on 1 and 11 DF, p-value: 0.3669
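The negative adjusted R-squared in this output is not an error. Adjusted R-squared penalizes R-squared for the number of predictors relative to the sample size, and with only 13 countries and a very small R-squared the penalty pushes the value below zero. A sketch of the standard formula, 1 - (1 - R²)(n - 1)/(n - p - 1), follows; the helper name `adjusted_r2` is hypothetical, and the sample sizes are taken as residual degrees of freedom plus two.

```r
# Hypothetical helper implementing the standard adjusted R-squared formula
# r2: multiple R-squared, n: number of observations, p: number of predictors
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# High income, non-OECD, low-infection model: 11 residual df, so n = 13
adj_small <- adjusted_r2(r2 = 0.07452, n = 13, p = 1)

# All-countries model: 166 residual df, so n = 168
adj_full <- adjusted_r2(r2 = 0.4442, n = 168, p = 1)

print(round(c(adj_small, adj_full), 4))
```

Both results agree, to rounding, with the adjusted R-squared values that `summary()` reports for the thirteen-country model and for the model of all 168 countries.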
Expanding this exploration to all countries with below-median levels of infection reveals the following relationship between infection rate and democracy score:
low_infection <- gideon_data %>% filter(infect_rate < 32.0) # filter for countries below median
interactive_plot2 <- ggplot(mapping = aes(x = low_infection$democ_score, y = low_infection$infect_rate, text = paste("country" = low_infection$country))) +
geom_point(color = "red", alpha = 0.5)+
geom_smooth(color = "blue")+
geom_vline(xintercept = mediandemocracy, size = 1, color = "black")+
geom_text(aes(x = mediandemocracy + 12, y = 47, label = paste("Median Democracy Score\n (38.4)")))+
labs(title = "Infection and Democracy Below-Median Infection Countries", x = "Democracy Score", y= "Infection Rate")+
theme_solarized()
interactive_plot2 <- ggplotly(interactive_plot2)
interactive_plot2
If parasite-stress causes lower democracy scores, we would expect to see these countries, with below-median parasite-stress, clustered toward the right, in the higher democracy score region of the plot. That is not what this plot shows. While most points are indeed above the median, there remain several that are below.
There appears to be only a very weak relationship between infection rate and democracy score in countries with low infection rates. Applying a LOESS smoother and linear regression line will make this clearer. In this plot, the countries, their income groups, and the relationship between their infection rate and democracy score are evident.
plot2 <- ggplot(low_infection, aes(x = democ_score, y = infect_rate)) +
  geom_point(aes(color = income_group)) + # color only the points by income group
  geom_smooth(color = "purple") +
  geom_smooth(method = "lm", formula = y ~ x, color = "red") +
  geom_vline(xintercept = mediandemocracy, size = 1, color = "black") +
  annotate("text", x = mediandemocracy + 15, y = 47, label = "Median Democracy Score\n (38.4)") +
  labs(title = "Infection and Democracy in Below-Median Infection Countries", x = "Democracy Score", y = "Infection Rate", color = "Income Group") +
  theme_solarized()
plot2
The correlation coefficient confirms this.
cor(low_infection$democ_score, low_infection$infect_rate)
## [1] -0.3867273
The magnitude of this correlation coefficient is below 0.5, so the correlation is weak.
fit2 <- lm(formula = low_infection$democ_score~low_infection$infect_rate)
summary(fit2)
##
## Call:
## lm(formula = low_infection$democ_score ~ low_infection$infect_rate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.25 -14.28 -1.59 14.68 31.62
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 149.2608 25.7501 5.797 1.46e-07 ***
## low_infection$infect_rate -3.4717 0.9496 -3.656 0.00047 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.22 on 76 degrees of freedom
## Multiple R-squared: 0.1496, Adjusted R-squared: 0.1384
## F-statistic: 13.37 on 1 and 76 DF, p-value: 0.0004697
The adjusted R-squared for a linear regression model of the relationship between infection rate and democracy score is a very low 0.1384. The model explains only about 14 percent of the variation in democracy score among these countries. Something else is probably responsible for the low democracy scores in some of them.
The study above uses only a tiny fraction of the data that Thornhill and others examined in order to develop and challenge the parasite-stress hypothesis. As a result, it is insufficient to challenge that thesis. Still, the lack of a clear, strong relationship between infection rate and democracy score when only one other variable (income) is considered does raise questions about the accuracy of the hypothesis. Given that an examination of so few variables could raise questions, it would be unsurprising if looking at more of the multitude of factors that make up a country’s nature raised even more. The states that Thornhill uses as his units of analysis are relatively recent creations. As a result, Currie and Mace’s observation that states cannot be treated as independent observations for the purpose of statistical analysis rings true. They are somewhat artificial and arbitrary units when considered within the history of human and disease evolution.
Nonetheless, the analysis of the dataset above was useful, particularly the creation of data visualizations. The initial visualizations lacked the median lines, and that led me to conclude mistakenly that things were distributed in a different way than they actually were. For example, I did not notice that all of the points in the high-income non-OECD, low-infection plot had above-median democracy scores. Placing those lines encouraged me to go back and look at the statistical models to double-check my first impressions. The data supported Thornhill’s conclusions more than I believed based upon my first-draft plots, but a closer look at the statistical analyses confirmed my doubts about his conclusions.
Finally, the dataset had some limitations that inhibited my analysis. Aside from having only four variables, one of those variables, the income group, was not as useful as it could have been had it been quantitative (actual GDP) rather than categorical. Because it was not quantitative, I could not run a simple linear regression with income as a continuous variable against infection rate or democracy score. (I could have used income group as a categorical predictor, or modeled it as an outcome with logistic regression, but I am still learning how to do those.)
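For what it is worth, `lm()` can take a categorical predictor directly: R dummy-codes a factor, so each coefficient becomes the difference in mean democracy score between one income group and a reference group. The sketch below uses a made-up six-row data frame (`toy`), not the GIDEON data, so the numbers mean nothing in themselves.

```r
# Hypothetical toy data (NOT the GIDEON data)
toy <- data.frame(
  income_group = factor(
    c("Low income", "Low income",
      "Upper middle income", "Upper middle income",
      "High income: OECD", "High income: OECD"),
    # The first level listed becomes the reference group
    levels = c("Low income", "Upper middle income", "High income: OECD")
  ),
  democ_score = c(20, 30, 40, 50, 70, 80)
)

# With a factor predictor, the intercept is the mean of the reference group
# and every other coefficient is a difference from that mean
fit_income <- lm(democ_score ~ income_group, data = toy)
income_coefs <- unname(coef(fit_income))
print(coef(fit_income))
```

In this toy example the intercept is 25 (the "Low income" mean) and the other two coefficients are 20 and 50. On the real data, a model such as `lm(democ_score ~ infect_rate + income_group, data = gideon_data)` would estimate the infection-rate slope while holding income group fixed; that model is not fitted here, so no result is claimed.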
In addition, I am still beset by the technical problem with plotly and smoother lines. I believe there is something with the text function that is causing the lines to disappear when I run plotly, but I am not certain what the problem is.
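One likely explanation, offered as a hypothesis rather than a confirmed diagnosis: when `text` is mapped in the top-level `aes()`, every layer inherits it, and ggplot2 uses discrete aesthetics to define groups. Each country then becomes its own one-point group, and `geom_smooth()` has nothing to fit. A sketch of a possible fix, using a made-up data frame (`toy`) and a linear smoother for simplicity, maps `text` only inside `geom_point()`:

```r
library(ggplot2)

# Hypothetical toy data (NOT the GIDEON data): 20 made-up countries
toy <- data.frame(
  country = paste("Country", 1:20),
  democ_score = seq(20, 77, by = 3),
  infect_rate = c(40, 39, 41, 37, 38, 35, 36, 33, 34, 31,
                  32, 30, 29, 30, 28, 27, 28, 26, 25, 24)
)

# text is mapped only in the point layer, so the smoother sees a single group.
# ggplot2 warns that "text" is an unknown aesthetic; plotly is what uses it.
fixed_plot <- ggplot(toy, aes(x = democ_score, y = infect_rate)) +
  geom_point(aes(text = country), color = "red") +
  geom_smooth(method = "lm", formula = y ~ x, color = "blue")

# Layer 2 is the smoother; a non-zero row count means a line was fitted
smooth_rows <- nrow(ggplot_build(fixed_plot)$data[[2]])
print(smooth_rows)

# Convert to an interactive plot only if plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
  interactive_fixed <- plotly::ggplotly(fixed_plot, tooltip = "text")
}
```

An alternative with the same intent is to keep `text` in the top-level mapping and add `group = 1` to it, which should also stop the smoother from being split by country.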
Bromham, L., Hua, X., Cardillo, M., Schneemann, H., & Greenhill, S. J. (2018). Parasites and politics: why cross-cultural studies must control for relatedness, proximity and covariation. Royal Society Open Science, 5(8), 181100. https://doi.org/10.1098/rsos.181100
Currie, T., & Mace, R. (2012). Analyses do not support the parasite-stress theory of human sociality. Behavioral and Brain Sciences, 35, 83-85. https://doi.org/10.1017/S0140525X11000963
Murray, D. R., Schaller, M., & Suedfeld, P. (2013). Pathogens and Politics: Further Evidence that Parasite Prevalence Predicts Authoritarianism. PLoS ONE, 8(5). https://doi.org/10.1371/journal.pone.0062275
Thornhill, R., Fincher, C. L., & Aran, D. (2009). Parasites, democratization, and the liberalization of values across contemporary countries. Biological Reviews, 84(1), 113-131. https://doi.org/10.1111/j.1469-185X.2008.00062.x