getwd()
## [1] "/Users/h0age/Documents/data110/Project 2"
setwd("/Users/h0age/Documents/data110/data_sets")
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.1.0 ✓ dplyr 1.0.4
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr)
library(RColorBrewer)
library(ggthemes)
library(ggrepel)
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
demosick <- read_csv("disease_democ.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## country = col_character(),
## income_group = col_character(),
## democ_score = col_double(),
## infect_rate = col_double()
## )
disease_democ.csv is a dataset of 168 observations from 2003 with four variables: 1) country name (country), 2) income group (income_group), 3) democracy score (democ_score), and 4) infectious disease rate (infect_rate). Democracy scores come from the work of Tatu Vanhanen in Democratization: A Comparative Analysis of 170 Countries (Routledge, 2003). The infectious disease rates come from the Global Infectious Diseases and Epidemiology Network. Country income group thresholds are set by the World Bank. The data can be accessed here: https://paldhous.github.io/ucb/2018/dataviz/datasets.html
The data is useful for exploring an argument made by evolutionary biologists Randy Thornhill and Corey Fincher that there is a negative correlation between rates of infectious disease and a society’s tendency to develop democratic political institutions. For an article-length introduction to their ideas, see ‘Healthy democracy’ by Jim Giles in New Scientist, 5/21/2011, Vol. 210, Issue 2813, pp. 34-37: https://www.newscientist.com/article/mg21028133-300-genes-germs-and-the-origins-of-politics/
Thornhill and Fincher see high prevalence of infectious disease as a fundamental barrier to development of democratic institutions. They argue that in an environment where infectious disease is widespread, outsiders will be more likely to be viewed with suspicion, which undermines the openness and social cohesion that are prerequisites to the formation of democratic societies.
I was drawn to this dataset and topic because my father was trained as an anthropologist, and the development of human societies was a topic we discussed quite frequently. From my non-academic understanding of the topic, the complexity and variability of societies make it difficult to point to any one factor as being determinative of the political direction that a society might take. This leads me to view Thornhill and Fincher’s argument with a certain level of skepticism.
In the plots below I will attempt to 1) use the disease_democ.csv dataset to assess the correlation between infectious disease and democracy, 2) view the distribution of the data in a scatterplot format, and 3) explore an alternative factor that might also influence the establishment of democracy.
Before looking at the distribution of the data in scatterplot form, we will sort the entries by infectious disease rate, in ascending order.
byinfection <- demosick %>%
arrange(infect_rate)
Next we set up the scatter plot and regression line to evaluate the connection between infection rate and democracy score. Note that the plot places democracy score on the x-axis and infection rate on the y-axis.
byinfection %>%
  ggplot(aes(x = democ_score, y = infect_rate, color = income_group)) +
  ggtitle("Is There a Relationship Between Disease and Democracy?") + ## plot title
  xlab("Democracy Score") + ## x-axis label
  ylab("Infectious Disease Score") + ## y-axis label
  geom_point(alpha = 0.4) + ## scatter plot, colored by income group
  ## setting color = "red" overrides the income_group mapping, so one line is fitted to all plotted points
  geom_smooth(color = "red", formula = y ~ x, method = "lm", se = FALSE, linetype = "solid") +
  labs(color = "Country Income Group") + ## legend title
  ## these limits drop (not just hide) points outside 30-50 before the line is fitted, hence the warnings;
  ## coord_cartesian(ylim = c(30, 50)) would zoom in without dropping data
  scale_y_continuous(limits = c(30, 50)) +
  theme_light() ## apply theme
## Warning: Removed 67 rows containing non-finite values (stat_smooth).
## Warning: Removed 67 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_smooth).
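The warnings above arise because limits set on a ggplot2 scale remove out-of-range rows before the smoother is fitted, so the red line is estimated from a truncated sample. The following is a minimal sketch of that effect using base R only. The data are synthetic (the names `full` and `kept` and the simulated values are assumptions for illustration, not the author's data), with coefficients chosen to loosely resemble this dataset.

```r
# Synthetic data: illustrative only, not the disease_democ.csv values
set.seed(110)
n <- 1000
democ_score <- runif(n, min = 0, max = 45)
infect_rate <- 43.6 - 0.24 * democ_score + rnorm(n, sd = 5)
full <- data.frame(democ_score, infect_rate)

# Mimic scale_y_continuous(limits = c(30, 50)): rows outside the range are dropped
kept <- subset(full, infect_rate >= 30 & infect_rate <= 50)

# Fit the same linear model to the full and the truncated data
slope_full <- coef(lm(infect_rate ~ democ_score, data = full))[["democ_score"]]
slope_kept <- coef(lm(infect_rate ~ democ_score, data = kept))[["democ_score"]]

cat("rows kept:", nrow(kept), "of", nrow(full), "\n")
cat("slope, all rows:      ", round(slope_full, 3), "\n")
cat("slope, truncated rows:", round(slope_kept, 3), "\n")

# In ggplot2, the two options are:
#   scale_y_continuous(limits = c(30, 50))  # drops rows, changes the fit
#   coord_cartesian(ylim = c(30, 50))       # zooms only, fit uses all rows
```

If the goal is only to zoom the y-axis, `coord_cartesian()` is the usual choice because the fitted line then reflects every observation.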
From Plot 1 we can see a clear negative correlation, consistent with the idea that the higher the prevalence of infectious disease, the lower the likelihood that a society will develop democratic institutions. Thornhill argues that the rate of infectious disease is the most important factor in determining the emergence of democratic institutions. Given the complexity of human societies and all the factors that influence their development, we might be skeptical of the idea that one factor such as infectious disease could be so determinative, and may want to investigate possible confounding factors.
The inclusion of country income categories in the data may provide insight into whether there is any connection between economic prosperity and the development of democratic institutions, which we will explore in Plot 2.
But before we do that, first let’s analyze the precise correlation and predictive value of the model using our democracy and infectious disease variables.
cor(byinfection$infect_rate, byinfection$democ_score) ## check correlation
## [1] -0.6664911
model <- lm(infect_rate ~ democ_score, data = byinfection) ## linear model: infect_rate is the response (y), democ_score is the predictor (x)
summary(model) ## view summary
##
## Call:
## lm(formula = infect_rate ~ democ_score, data = byinfection)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.6506 -3.7633 0.2188 3.6332 10.4621
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 43.59815 0.97374 44.77 <2e-16 ***
## democ_score -0.24008 0.02084 -11.52 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.071 on 166 degrees of freedom
## Multiple R-squared: 0.4442, Adjusted R-squared: 0.4409
## F-statistic: 132.7 on 1 and 166 DF, p-value: < 2.2e-16
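The correlation between infection rate and democracy score is -0.667, a moderately strong negative correlation. In the model summary, the Pr(>|t|) column gives the p-value for each coefficient. For democ_score it is below 2e-16, and the three asterisks mark it as significant at the 0.001 level. The slope of -0.24 means that each additional point of democracy score is associated with an infectious disease score about 0.24 points lower, and the multiple R-squared of 0.444 means the model accounts for roughly 44% of the variance in infection rate. Note that the model as fitted treats infection rate as the response and democracy score as the predictor; because correlation is symmetric, the sign and strength of the association are the same in either direction. This reinforces the conclusion from Plot 1 that there is a substantial negative association between infection rates and democracy, although an association alone does not establish which way any causal influence runs.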
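The two outputs above can be cross-checked against each other: in a simple linear regression with a single predictor, the multiple R-squared equals the square of the Pearson correlation, and the correlation is the same whichever variable is listed first. The sketch below checks the reported figures and then demonstrates both properties on synthetic data (the vectors `x` and `y` are assumptions for illustration only).

```r
# Figures reported in the output above
r_reported <- -0.6664911
r_squared_from_r <- round(r_reported^2, 4)
print(r_squared_from_r)  # compare with "Multiple R-squared: 0.4442"

# Synthetic data to demonstrate the general properties
set.seed(1)
x <- rnorm(100)
y <- 2 - 0.5 * x + rnorm(100)

r <- cor(x, y)
fit <- lm(y ~ x)

# Correlation is symmetric in its arguments
symmetric <- isTRUE(all.equal(cor(x, y), cor(y, x)))

# R-squared of the one-predictor model equals the squared correlation
matches <- isTRUE(all.equal(summary(fit)$r.squared, r^2))

print(c(symmetric = symmetric, r_squared_matches = matches))
```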
Now we will look at whether the economic development variable might also have any connection to the development of democracy.
# define color palette
cols <- brewer.pal(5, "Set1")
highchart() %>%
hc_add_series(data = byinfection, type = "scatter", hcaes(x = democ_score, y = infect_rate,
group = income_group)) %>%
## add regression line??
## lm(infect_rate ~ democ_score)
## hc_add_series(model, type = "line", color = "red") %>%
hc_title(text = "Is There a Relationship Between Disease and Democracy?") %>% ## set topline title
hc_xAxis(title = list(text="Democracy Score")) %>% ## x-axis title
hc_yAxis(title = list(text="Infectious Disease Score")) %>% ## y-axis title
## Legend code
hc_legend(align = "right",
verticalAlign = "top") %>% ## Legend alignment
## add a caption
hc_caption(text = "<b>Source:</b> Vanhanen, Tatu. 2003: Democratization: A Comparative Analysis of 170 Countries. London: Routledge Research in Comparative Politics, Routledge. Taylor & Francis Group, 302 pp.") %>%
## tool tip code
hc_tooltip(pointFormat = "Country: {point.country}<br>
Democracy Score: {point.democ_score}<br>
Infection Rate: {point.infect_rate}<br>") %>%
hc_colors(cols) %>% ## call color palette
hc_chart(style = list(fontFamily = "Georgia")) ## set new font
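The commented-out lines in the chunk above ask how to add a regression line to the Highcharter plot. One possible approach, sketched here under stated assumptions, is to fit the model with `lm()`, predict the fitted value at the two ends of the x range, and add those two points as a second series of type "line". The data frame `toy`, the column `fit_value`, and the series names are hypothetical; with the real data, `byinfection` would take the place of `toy`.

```r
# Synthetic stand-in for the real data (illustrative values only)
set.seed(7)
toy <- data.frame(democ_score = runif(50, min = 0, max = 45))
toy$infect_rate <- 43.6 - 0.24 * toy$democ_score + rnorm(50, sd = 5)

# Fit the same model form used in the report
fit <- lm(infect_rate ~ democ_score, data = toy)

# A straight line needs only its two end points
line_df <- data.frame(democ_score = range(toy$democ_score))
line_df$fit_value <- predict(fit, newdata = line_df)

# Build the chart only if highcharter is installed
if (requireNamespace("highcharter", quietly = TRUE)) {
  library(highcharter)
  hc <- highchart() %>%
    hc_add_series(data = toy, type = "scatter",
                  hcaes(x = democ_score, y = infect_rate),
                  name = "Countries") %>%
    hc_add_series(data = line_df, type = "line",
                  hcaes(x = democ_score, y = fit_value),
                  name = "Linear fit", color = "red")
  # Print `hc` in an interactive session or R Markdown chunk to render it
}
```

Because the scatter series in the report is grouped by income group, adding the line as a separate ungrouped series keeps it as a single overall fit rather than one line per group.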
Thanks to the Highcharter legend and tooltip, we can isolate countries on the graph by their income category and identify individual countries. We see that countries in the Low Income category are clustered at the low end of the democracy score range and the high end of the infectious disease range, while High Income OECD countries are clustered at the high end of the democracy score range and the low end of the infectious disease range; middle and upper middle income countries occupy the middle ground. It should be no surprise that high income countries are able to afford a higher standard of healthcare services and as a result have a lower prevalence of infectious disease. This raises the question: is it income level or infectious disease that is influencing the development of democracies, or a combination of the two? There are likely a host of other unknown factors that may also play a significant role in determining the formation of democracy in different societies.
Thornhill and Fincher’s argument is insightful and compelling, but I remain unconvinced that infectious disease rates are the paramount factor in developing democracy.
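One way to probe the income-versus-disease question is to add income group as a covariate and see how much the disease coefficient changes. The sketch below uses synthetic data in which income drives both variables and disease has no direct effect on democracy. That is a deliberate assumption for illustration, not a claim about the real data, and all names here (`toy`, `simple`, `adjusted`) are hypothetical.

```r
# Synthetic data: income influences both disease and democracy,
# while disease has NO direct effect on democracy (assumed for illustration)
set.seed(42)
n <- 200
income_group <- factor(sample(c("Low", "Middle", "High"), n, replace = TRUE),
                       levels = c("Low", "Middle", "High"))
income_level <- unname(c(Low = 0, Middle = 1, High = 2)[as.character(income_group)])

infect_rate <- 45 - 6 * income_level + rnorm(n, sd = 3)
democ_score <- 5 + 10 * income_level + rnorm(n, sd = 5)
toy <- data.frame(income_group, infect_rate, democ_score)

# Model 1: disease only
simple <- lm(democ_score ~ infect_rate, data = toy)

# Model 2: disease plus income group
adjusted <- lm(democ_score ~ infect_rate + income_group, data = toy)

slope_simple <- coef(simple)[["infect_rate"]]
slope_adjusted <- coef(adjusted)[["infect_rate"]]

cat("disease slope, unadjusted:         ", round(slope_simple, 3), "\n")
cat("disease slope, adjusted for income:", round(slope_adjusted, 3), "\n")

# With the real data, the analogous call would be:
#   lm(democ_score ~ infect_rate + income_group, data = demosick)
```

If the disease coefficient shrinks sharply once income group is included, income is a plausible confounder; if it stays about the same, the disease association is not explained by income group alone.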