In this article, I’m going to show you, step by step, how to recreate one of The Economist’s beautiful graphs with R’s ggplot2:
The source data comes from the Global Terrorism Database, maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START), and can be downloaded here.
The dataset contains information on more than 180,000 terrorist attacks from 1970 to 2017, with more than 100 variables on location, tactics, perpetrators, targets, and outcomes.
## [1] "eventid" "iyear" "imonth"
## [4] "iday" "approxdate" "extended"
## [7] "resolution" "country" "country_txt"
## [10] "region" "region_txt" "provstate"
## [13] "city" "latitude" "longitude"
## [16] "specificity" "vicinity" "location"
## [19] "summary" "crit1" "crit2"
## [22] "crit3" "doubtterr" "alternative"
## [25] "alternative_txt" "multiple" "success"
## [28] "suicide" "attacktype1" "attacktype1_txt"
## [31] "attacktype2" "attacktype2_txt" "attacktype3"
## [34] "attacktype3_txt" "targtype1" "targtype1_txt"
## [37] "targsubtype1" "targsubtype1_txt" "corp1"
## [40] "target1" "natlty1" "natlty1_txt"
## [43] "targtype2" "targtype2_txt" "targsubtype2"
## [46] "targsubtype2_txt" "corp2" "target2"
## [49] "natlty2" "natlty2_txt" "targtype3"
## [52] "targtype3_txt" "targsubtype3" "targsubtype3_txt"
## [55] "corp3" "target3" "natlty3"
## [58] "natlty3_txt" "gname" "gsubname"
## [61] "gname2" "gsubname2" "gname3"
## [64] "gsubname3" "motive" "guncertain1"
## [67] "guncertain2" "guncertain3" "individual"
## [70] "nperps" "nperpcap" "claimed"
## [73] "claimmode" "claimmode_txt" "claim2"
## [76] "claimmode2" "claimmode2_txt" "claim3"
## [79] "claimmode3" "claimmode3_txt" "compclaim"
## [82] "weaptype1" "weaptype1_txt" "weapsubtype1"
## [85] "weapsubtype1_txt" "weaptype2" "weaptype2_txt"
## [88] "weapsubtype2" "weapsubtype2_txt" "weaptype3"
## [91] "weaptype3_txt" "weapsubtype3" "weapsubtype3_txt"
## [94] "weaptype4" "weaptype4_txt" "weapsubtype4"
## [97] "weapsubtype4_txt" "weapdetail" "nkill"
## [100] "nkillus" "nkillter" "nwound"
## [103] "nwoundus" "nwoundte" "property"
## [106] "propextent" "propextent_txt" "propvalue"
## [109] "propcomment" "ishostkid" "nhostkid"
## [112] "nhostkidus" "nhours" "ndays"
## [115] "divert" "kidhijcountry" "ransom"
## [118] "ransomamt" "ransomamtus" "ransompaid"
## [121] "ransompaidus" "ransomnote" "hostkidoutcome"
## [124] "hostkidoutcome_txt" "nreleased" "addnotes"
## [127] "scite1" "scite2" "scite3"
## [130] "dbsource" "INT_LOG" "INT_IDEO"
## [133] "INT_MISC" "INT_ANY" "related"
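The column listing above is what `names(data_gtd)` prints once the file is loaded, but the loading step itself isn’t shown. Here is a minimal sketch. It assumes you downloaded the dataset as a CSV file and uses readr as the reader. The helper name `read_gtd()` and the file name in the usage comment are hypothetical. To keep the block self-contained, it checks itself on a tiny made-up file.

```r
library(readr)

# Hypothetical helper: read the GTD export into a tibble.
# guess_max is raised because many GTD columns are empty in the early rows,
# which can otherwise lead readr to guess the wrong column type.
read_gtd <- function(path) {
  read_csv(path, guess_max = 100000, show_col_types = FALSE)
}

# Usage with the real download (file name is an assumption):
# data_gtd <- read_gtd("globalterrorismdb.csv")
# names(data_gtd)

# Self-contained check on a made-up two-row file
tmp <- tempfile(fileext = ".csv")
writeLines(c("eventid,iyear,country_txt,nkill,nkillter",
             "1,2001,Iraq,3,0",
             "2,2014,Nigeria,,1"), tmp)
demo_gtd <- read_gtd(tmp)
names(demo_gtd)
```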
Of all these variables, we’ll keep only the ones relevant to our plot:

- iyear: the year in which the incident occurred.
- country_txt: the country or location where the incident occurred.
- nkill: the total number of confirmed fatalities for the incident.
- nkillter: the number of perpetrator fatalities.

Since the plot we’re recreating only shows the years 2000 to 2014, we’ll also drop all other years:
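The code for this selection and year filter isn’t shown. Here is a minimal dplyr sketch, run on a made-up stand-in for `data_gtd`; the toy rows and the extra `attacktype1_txt` column are illustrative only. With the real data you would run just the `%<>%` pipeline.

```r
library(dplyr)
library(magrittr)  # for the %<>% assignment pipe used throughout this tutorial

# Made-up stand-in for the GTD table; attacktype1_txt is an extra column to drop
data_gtd <- tibble(
  iyear           = c(1999, 2000, 2007, 2014, 2015),
  country_txt     = c("Iraq", "Nigeria", "France", "Iraq", "Syria"),
  attacktype1_txt = c("Bombing/Explosion", "Armed Assault", "Bombing/Explosion",
                      "Armed Assault", "Hostage Taking (Kidnapping)"),
  nkill           = c(3, NA, 1, 10, 2),
  nkillter        = c(0, 1, NA, 2, 0)
)

# Keep only the four variables we need and the years shown in the plot
data_gtd %<>%
  select(iyear, country_txt, nkill, nkillter) %>%
  filter(between(iyear, 2000, 2014))  # between() is inclusive on both ends

data_gtd
```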
We’ll then create a new variable, region, that assigns each country to one of the groups shown in the plot:
library(dplyr)
library(magrittr)  # for the %<>% assignment pipe
data_gtd %<>%
  mutate(region = case_when(
    country_txt == "Iraq" ~ "Iraq",
    country_txt == "Nigeria" ~ "Nigeria",
    country_txt %in% c("Syria", "Afghanistan", "Pakistan") ~ "Syria, Afghanistan & Pakistan",
    country_txt %in% c("Andorra", "Argentina", "Australia", "Austria", "Belgium",
                       "Canada", "Chile", "Croatia", "Czech Republic", "Denmark",
                       "Estonia", "Finland", "France", "Germany", "Greece",
                       "Hungary", "Iceland", "Ireland", "Israel", "Italy",
                       "Latvia", "Liechtenstein", "Lithuania", "Luxembourg", "Malta",
                       "Monaco", "Netherlands", "New Zealand", "Norway", "Poland",
                       "Portugal", "San Marino", "Slovakia", "Slovenia", "Spain",
                       "Sweden", "Switzerland", "United Kingdom", "United States",
                       "Vatican City") ~ "Western Countries",
    TRUE ~ "Others"  # every remaining country
  ))
Then we’ll calculate the total deaths for each region group. Note that I also rescale() the death values so they fit the plot’s y axis:
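library(tidyr)   # replace_na()
library(scales)  # rescale()
data_gtd %<>%
  # Treat missing fatality counts as zero.
  # Note: the GTD codebook describes nkill as already including attackers who died,
  # so adding nkillter may count perpetrator deaths twice; use ntotal = nkill to avoid that.
  mutate(nkill = replace_na(nkill, 0),
         nkillter = replace_na(nkillter, 0),
         ntotal = nkill + nkillter) %>%
  group_by(iyear, region) %>%
  summarise(deaths = sum(ntotal) / 1000) %>%  # deaths in thousands
  ungroup() %>%
  # Map all region-year values linearly onto 0-35 for plotting
  mutate(deaths = rescale(deaths, to = c(0, 35)))
Now, if we compare our data with the original Economist plot, there’s one group we’re still missing: “Rest of the World”, which contains the total deaths of all countries worldwide.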
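Before building that group, here is what the rescale() call from the scales package does in the summary step. It maps values linearly, so the smallest value becomes the lower bound of `to` and the largest becomes the upper bound. The numbers below are made up. Note that the minimum is mapped to 0 even when it isn’t 0, so rescaled values are not simply proportional to the original death counts.

```r
library(scales)

# Linear map: min -> 0, max -> 35, everything else in between
a <- rescale(c(0, 5, 10), to = c(0, 35))  # 0.0 17.5 35.0

# The minimum still lands on 0 even though it isn't 0
b <- rescale(c(2, 4, 6), to = c(0, 35))   # 0.0 17.5 35.0

print(a)
print(b)
```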
Here, I create a separate data frame, total, which contains only the “Rest of the World” group. I deliberately keep it separate from the other groups because it will be drawn as its own layer:
total <- data_gtd %>%
  mutate(region = "Rest of the World") %>%
  group_by(iyear, region) %>%
  summarise(deaths = sum(deaths)) %>%  # sum over all groups, including "Others"
  ungroup()
Lastly, we can remove the “Others” group, since its deaths are already counted in total:
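The code for this step isn’t shown. Here is a minimal sketch with `dplyr::filter()`, run on a made-up summary table shaped like `data_gtd` at this point (iyear, region, deaths). The values are illustrative only.

```r
library(dplyr)
library(magrittr)

# Made-up summary rows shaped like data_gtd after the summarise() step
data_gtd <- tibble(
  iyear  = c(2000, 2000, 2000, 2001, 2001, 2001),
  region = c("Iraq", "Others", "Western Countries",
             "Iraq", "Others", "Western Countries"),
  deaths = c(0.1, 2.5, 0.4, 0.2, 3.1, 1.0)
)

# Drop the "Others" rows; in the tutorial their deaths already live in `total`
data_gtd %<>% filter(region != "Others")

unique(data_gtd$region)
```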
Let’s begin by creating our ggplot “canvas”:
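The canvas code is missing at this step. A minimal sketch is just `ggplot()` with the data and the year/deaths mapping, the same call that starts the layered plot in the next step. It is shown here on made-up rows. On its own it draws only an empty panel with axes, because no geoms have been added yet.

```r
library(ggplot2)

# Made-up rows shaped like data_gtd (iyear, region, deaths)
data_gtd <- data.frame(
  iyear  = rep(2000:2002, times = 2),
  region = rep(c("Iraq", "Nigeria"), each = 3),
  deaths = c(0.1, 0.2, 0.5, 0.0, 0.1, 0.3)
)

# The "canvas": data and aesthetic mapping, but no geoms yet
canvas <- ggplot(data_gtd, aes(iyear, deaths))
canvas
```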
Then we’ll add the geometric elements. The reason I kept the “Rest of the World” data separate is so that I can draw it as its own geom, apart from the other groups.
If you look at the original plot, there’s a black line along the top edge of the “Rest of the World” area, and that’s the small detail we’re going to reproduce :D
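library(ggplot2)
ggplot(data_gtd, aes(iyear, deaths)) +
  # "Rest of the World" area first, so it sits behind the stacked regions
  geom_area(data = total, aes(fill = region)) +
  # black outline along the top edge of that area
  geom_line(data = total, aes(iyear, deaths)) +
  # the four regions, stacked and separated by white borders
  geom_area(aes(fill = region), color = "white") +
  labs(title = "Global deaths from terrorism",
       subtitle = "'000",  # values are in thousands
       caption = "Source: START, IEP",
       x = NULL,
       y = NULL)
We’ll then adjust the x and y axis properties and the area fill colors, and save our plot as p: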
p <- ggplot(data_gtd, aes(iyear, deaths)) +
  geom_area(data = total, aes(fill = region)) +
  geom_line(data = total, aes(iyear, deaths)) +
  geom_area(aes(fill = region), color = "white") +
  labs(title = "Global deaths from terrorism",
       subtitle = "'000",
       caption = "Source: START, IEP",
       x = NULL,
       y = NULL) +
  # One tick per year, labelled 2000, 01, ..., 09, 10, ..., 14.
  # expansion() replaces expand_scale(), deprecated since ggplot2 3.3.0.
  scale_x_continuous(breaks = seq(2000, 2014),
                     labels = c(2000, paste0(0, seq(1, 9)), seq(10, 14)),
                     expand = expansion(mult = c(0.03, 0.02))) +
  # Axis on the right; 8 evenly spaced breaks from 0 to 125 labelled 0, 5, ..., 35
  scale_y_continuous(position = "right",
                     breaks = seq(0, 125, 125 / 7),
                     labels = paste(seq(0, 35, 5)),
                     expand = expansion(mult = c(0, 0.1))) +
  # Colours follow the alphabetical order of the region levels:
  # Iraq, Nigeria, Rest of the World, Syria..., Western Countries
  scale_fill_manual(values = c("#7b2713", "#eb9e84", "#00a4dc", "#f15a40", "#00526d"))
p
Now we can adjust the plot’s appearance with theme():
p <- p +
  theme(aspect.ratio = 3.2/7,
        text = element_text(family = "Roboto Condensed"),
        plot.margin = margin(0, 15, 0, 15),
        panel.background = element_rect(fill = "white"),
        panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_line(color = "darkgrey"),
        legend.text = element_text(margin = margin(l = 3), size = 10),
        legend.title = element_blank(),
        # Legend inside the panel; ggplot2 >= 3.5.0 prefers
        # legend.position = "inside" plus legend.position.inside = c(0.16, 0.7)
        legend.position = c(0.16, 0.7),
        legend.key.width = unit(25, "pt"),
        legend.key.height = unit(15, "pt"),
        axis.text = element_text(size = rel(1), color = "gray8"),
        axis.line.x = element_line(color = "gray8"),
        axis.ticks.y = element_blank(),
        plot.title = element_text(size = rel(1.5), hjust = 0, face = "bold"),
        plot.caption = element_text(hjust = 0, size = 9))
p
We can now export our final ggplot as a .png image:
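The export code isn’t shown. A minimal sketch uses `ggplot2::ggsave()`. The magick step later reads the file as p.png, so that is the file name to use. The width, height and dpi below are my assumptions, chosen to roughly fit `aspect.ratio = 3.2/7` plus room for the title and caption. To keep the block self-contained, it saves a small made-up plot to a temporary file. For the real plot, call `ggsave("p.png", plot = p, ...)` with the same settings.

```r
library(ggplot2)

# Made-up stand-in for the finished plot `p`
p_demo <- ggplot(data.frame(iyear = 2000:2014, deaths = (0:14)^2 / 10),
                 aes(iyear, deaths)) +
  geom_area(fill = "#00a4dc")

# In the tutorial: ggsave("p.png", plot = p, width = 7, height = 4, dpi = 300)
# (width/height in inches and dpi are assumed values; adjust to taste)
out_file <- file.path(tempdir(), "p.png")
ggsave(out_file, plot = p_demo, width = 7, height = 4, units = "in", dpi = 300)

file.exists(out_file)
```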
A note on fonts: I’m using “Roboto Condensed” from Google Fonts for this plot.
If you don’t know how to import your system fonts into R, you can use the extrafont package. Read here for its documentation.
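Here is a minimal sketch of that extrafont workflow, assuming Roboto Condensed is already installed on your system as a TrueType font. `font_import()` scans matching system fonts and stores their metrics in extrafont’s database. This is a one-time step and can be slow. `loadfonts()` then registers the fonts with a graphics device for the current session. The helper name `use_roboto()` is hypothetical, and the `pattern` value is my assumption about how the font files are named.

```r
library(extrafont)

# Hypothetical helper: import Roboto fonts once, then register them for this session
use_roboto <- function(family = "Roboto Condensed", pattern = "Roboto") {
  # fonts() lists the font families extrafont has already imported
  if (!family %in% fonts()) {
    # prompt = FALSE skips the interactive confirmation
    font_import(pattern = pattern, prompt = FALSE)
  }
  # On Windows, "win" registers fonts for screen/bitmap output;
  # elsewhere this registers them for pdf output
  device <- if (.Platform$OS.type == "windows") "win" else "pdf"
  loadfonts(device = device, quiet = TRUE)
  invisible(family %in% fonts())
}

# Run once before building the plot:
# use_roboto()
```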
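Finally, I use the magick package to combine the exported plot with a separate header image, header.png, scaled to 180 px high:
library(magick)
plot_img <- image_read("p.png")
header_img <- image_read("header.png")
# image_flatten() composites the images as layers: the plot is the base,
# and the scaled header is drawn over it at the top-left corner
economist_recreated <- image_flatten(c(plot_img, image_scale(header_img, "x180")))
image_write(economist_recreated, "economist_recrtd.png")
Final Plot: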