The data set is from the Cato Institute. It measures human freedom around the world based on personal, economic, and civil freedom. The data includes the country, region, overall human freedom score, and various factors that could influence that score. The scores are out of ten, and the accompanying data columns hold the reported counts used to determine each score. I will be focusing on safety and security and on freedom of expression, religion, and movement. Safety and security is the right to life and safety from aggression. The data was collected by looking at disappearances, homicides, and killings (organized and not) for each country. The higher the count, the further below 10 the score will be. The organized conflict data came from the Economist Intelligence Unit. The data on fatalities and injuries from terrorism came from the University of Maryland’s Global Terrorism Database. For this category I want to compare the homicide data from different countries in Latin America. I would also like to plot the data over a span of 10 years to find any unusual changes (dips or spikes) and an explanation for them.
The next factor is freedom of movement, which according to the Cato Institute is the average of freedom of foreign movement (emigration) and freedom of movement for both men and women. It is the freedom to move within the country and the freedom to leave it. Freedom of religion is based on two components: the freedom to choose and exercise one’s religion and the degree to which the government suppresses this freedom. The last factor I will look at is freedom of expression. This is a broad measure covering personal expression, the press, and use of the internet. It also includes government censorship and self-censorship. With these last three factors I want to compare the most recent data available for all the countries to see how the factors relate to the overall freedom score. I will find out which country has the highest score given these factors and look for any outliers.
I decided to look into the Human Freedom Index data because it has an array of variables that influence the score, and I would like to see how they all work together. The variables can be used to create information-rich visuals by mapping them to shape, size, color, and position. I want to see which country or region has the highest score and which has the lowest, and I wonder which factors drive the score each country receives. I also think it is important to show the differences between regions because they reflect differences in culture and government.
Loading the libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.1 ✔ stringr 1.5.2
✔ ggplot2 4.0.0 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(ggfortify)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
library(highcharter)
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
Rows: 3630 Columns: 146
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): iso, countries, region, ef_government_tax_income_data, ef_governm...
dbl (141): year, hf_score, hf_rank, hf_quartile, pf_rol_procedural, pf_rol_c...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Deselecting the variables I won’t be using to simplify the data set
# I did not want to look at freedom of assembly so I removed variables related to it.
freedom_2 <- freedom_nona |>
  select(
    -hf_quartile, -pf_assembly, -pf_assembly_entry, -pf_assembly_civil,
    -pf_assembly_freedom_bti, -pf_assembly_freedom, -pf_assembly_freedom_cld,
    -pf_assembly_freedom_house, -pf_assembly_parties, -pf_assembly_parties_auton,
    -pf_assembly_parties_bans, -pf_assembly_parties_barriers
  )
head(freedom_2)
To find the significance of certain factors for the human freedom score, I first added year and country to the model. After seeing that year and country were significant, I added the disappearance data, homicide data, freedom of movement, freedom of religion, and freedom of expression. All of these variables are marked with asterisks in the model summary, which means they are statistically significant. With this information, I can write an equation for the human freedom index: Human Freedom Score = -4.466 + (0.004)year + (0.067)disappearances + (-0.004)homicide + (0.121)movement + (0.046)expression + (values)country. *Note: some specific countries that are not significant in the model are Croatia, Hungary, Mauritius, Mongolia, Panama, and Peru. This means that, holding the other variables constant, each additional year is associated with a 0.004-point increase in the predicted human freedom score, each one-unit increase in the disappearances measure with a 0.067-point increase, each one-unit increase in the homicide measure with a 0.004-point decrease, each one-point increase in movement with a 0.121-point increase, and each one-point increase in expression with a 0.046-point increase. Each country has its own value, which shifts the prediction up or down for that country.
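To make the coefficient reading concrete, here is a minimal sketch on made-up data (not the Cato data). It assumes the model was fit with `lm()`, which the write-up does not show, and the names `toy`, `score`, and the countries A, B, and C are hypothetical. It checks that raising one predictor by one unit while holding the others fixed changes the prediction by that predictor's coefficient.

```r
# Synthetic data for illustration only; these are NOT the Cato values.
set.seed(1)
toy <- data.frame(
  country  = factor(rep(c("A", "B", "C"), each = 10)),
  year     = rep(2011:2020, times = 3),
  movement = runif(30, 0, 10)
)
toy$score <- 2 + 0.5 * (toy$country == "B") + 1.0 * (toy$country == "C") +
  0.01 * (toy$year - 2010) + 0.12 * toy$movement + rnorm(30, sd = 0.1)

# Linear model with numeric predictors plus a country factor
fit <- lm(score ~ year + movement + country, data = toy)
coefs <- coef(fit)
slope_movement <- coefs[["movement"]]

# Two hypothetical rows that differ only by one unit of movement
base_row <- data.frame(
  country  = factor("A", levels = levels(toy$country)),
  year     = 2020,
  movement = 5
)
up_row <- transform(base_row, movement = movement + 1)

# The difference in predictions matches the movement coefficient
pred_diff <- predict(fit, up_row) - predict(fit, base_row)
print(round(c(coefficient = slope_movement, difference = unname(pred_diff)), 4))
```

Because `country` is a factor, `lm()` with its default contrasts treats the first level as the reference and reports a separate coefficient for each other country.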
The adjusted R-squared is 0.9851. This means that about 98.5% of the variation in the human freedom score is explained by the model. Next, I will focus on the first two diagnostic plots for the model: residuals vs. fitted and Q-Q residuals. In the residuals vs. fitted plot, the red line is relatively straight with no obvious departures from the dotted line. Most of the points are also scattered without a clear pattern, which suggests the model fits reasonably well. In the Q-Q residuals plot, it is a good sign when the observations fall on the line. Most of the observations do; however, observations 703, 1000, and 1076 may need a closer look. Overall, the residuals are approximately normally distributed.
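The checks described above can be sketched in base R as follows. This is only an illustration on synthetic data, assuming an `lm()` fit; the objects `toy`, `fit`, and `flagged` are hypothetical, and the report itself may have used `autoplot()` from ggfortify instead of base `plot()`.

```r
# Synthetic data for illustration only; these are NOT the Cato values.
set.seed(2)
toy <- data.frame(x = runif(100, 0, 10))
toy$y <- 1 + 0.5 * toy$x + rnorm(100, sd = 0.3)
fit <- lm(y ~ x, data = toy)

# Adjusted R-squared as reported by summary()
adj_r2 <- summary(fit)$adj.r.squared

# Draw to a null device so the sketch runs without a display;
# remove the pdf(NULL) and dev.off() lines to see the plots interactively.
pdf(NULL)
op <- par(mfrow = c(1, 2))
plot(fit, which = 1)  # residuals vs. fitted
plot(fit, which = 2)  # normal Q-Q plot of standardized residuals
par(op)
invisible(dev.off())

# Row indices of the three largest absolute standardized residuals
std_res <- rstandard(fit)
flagged <- order(abs(std_res), decreasing = TRUE)[1:3]
print(flagged)
```

By default these plots label the three most extreme points, which is where observation numbers like the ones mentioned above come from.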
Filtering for only the year 2020
# Some of the data needed was not available for 2021.
freedom_2020 <- freedom_2 |>
  filter(year == 2020)
Plot 1: (Static)
p1 <- ggplot(
  freedom_2020,
  aes(
    x = hf_score, y = pf_religion, color = pf_movement,
    # The text aesthetic is only used for the plotly tooltip
    text = paste(
      "Country:", countries,
      "\n Region:", region,
      "\n Human Freedom Score:", hf_score,
      "\n Religious Freedom:", pf_religion,
      "\n Freedom of Movement:", pf_movement,
      "\n Freedom of Expression:", pf_expression_direct
    )
  )
) +
  geom_point(aes(size = pf_expression_direct)) +
  scale_color_gradient(low = "#f5f2ab", high = "#f03050") +
  labs(
    title = "Human Freedom Score Based on Freedom\nof Movement, Religion, and Expression (2020)",
    caption = "Source: CATO Institute",
    x = "Human Freedom Score", y = "Freedom of Religion",
    color = "Freedom of Movement "
  ) +
  theme_minimal(base_family = "serif") +
  guides(size = "none")
p1
Plot 1: (Plotly)
p1 <- ggplotly(p1, tooltip = "text")
p1
Plot 1 Thoughts
In this visualization I mapped freedom of religion to the y-axis, freedom of movement to color, and freedom of expression to point size to show how these factors correlate with the human freedom score. I noticed that the countries with less freedom of movement (yellow) also had low religious freedom, and I think both freedoms being low contributed to a low overall human freedom score. The low-scoring countries come from many different regions rather than a few repeating ones, so I wonder what other factors led to their low scores. On the other hand, high freedom of movement and religion went along with a high overall freedom score. In the upper right, all three factors were high, and the countries there are mostly in Eastern Europe. One thing I would note is that freedom of expression does not seem to be very influential, because points of similar size are spread throughout the plot. Overall, higher freedom of movement and religion clearly goes along with a higher human freedom score.
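One way to back up these visual impressions with numbers is to compute the correlation of each factor with the overall score. The sketch below is only an illustration: it reuses the column names from the plot (`hf_score`, `pf_religion`, `pf_movement`, `pf_expression_direct`) but runs on synthetic values in a hypothetical `toy` data frame, so the printed correlations say nothing about the real 2020 data.

```r
# Synthetic data for illustration only; these are NOT the Cato values.
set.seed(3)
n <- 50
toy <- data.frame(
  pf_religion          = runif(n, 0, 10),
  pf_movement          = runif(n, 0, 10),
  pf_expression_direct = runif(n, 0, 10)
)
toy$hf_score <- 0.4 * toy$pf_religion + 0.4 * toy$pf_movement +
  0.05 * toy$pf_expression_direct + rnorm(n, sd = 0.5)

# Pearson correlation of each factor with the overall score,
# ignoring rows where either value is missing
factors <- c("pf_religion", "pf_movement", "pf_expression_direct")
cors <- sapply(toy[factors], function(col) {
  cor(toy$hf_score, col, use = "complete.obs")
})
print(round(cors, 2))
```

With the real data, the same call would use `freedom_2020` in place of `toy`.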
Filtering for my second plot which focuses on Latin America
freedom_south <- freedom_2 |>
  # I wanted to show the change in homicides over time so I used the years 2010-2020.
  filter(year %in% 2010:2020) |>
  # I only wanted to focus on Latin America so I filtered out the Caribbean
  filter(!grepl("Jamaica|Trinidad and Tobago|Dominican Republic|Haiti", countries)) |>
  # Removing countries with no significant changes after graphing
  filter(!grepl("Uruguay|Costa Rica|Peru|Argentina|Chile|Bolivia|Nicaragua", countries)) |>
  # Removing countries that do not have complete data spanning from 2010-2020
  filter(!grepl("Honduras|Paraguay", countries)) |>
  filter(region == "Latin America & the Caribbean")
After graphing all the Latin American countries, I decided to remove some of the countries that had little to no change because the plot was getting too crowded. I wanted to focus on the countries that had drastic changes over time, so I removed Chile, Peru, Argentina, Costa Rica, Uruguay, Bolivia, and Nicaragua. Next, I removed Honduras and Paraguay because they did not have complete data.
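A check for incomplete data like the one described here could be sketched as below. This is a hedged illustration, not the steps actually used: the data frame `toy`, the column `homicide`, and the countries A, B, and C are hypothetical, and it assumes at most one row per country and year.

```r
# Synthetic data for illustration only; these are NOT the Cato values.
toy <- data.frame(
  countries = rep(c("A", "B", "C"), each = 4),
  year      = rep(2017:2020, times = 3),
  homicide  = c(10, 11, 12, 13,  5, NA, 6, 7,  20, 21, 22, 23)
)
# Country C is missing the 2019 row entirely
toy <- toy[!(toy$countries == "C" & toy$year == 2019), ]

years_needed <- 2017:2020

# Count usable (non-missing) yearly values per country
usable <- !is.na(toy$homicide) & toy$year %in% years_needed
n_usable <- tapply(usable, toy$countries, sum)

# A country is complete only if every needed year has a value
complete   <- names(n_usable)[n_usable == length(years_needed)]
incomplete <- setdiff(names(n_usable), complete)
print(incomplete)
```

This catches both kinds of gap: a year that is present but `NA` (country B) and a year with no row at all (country C).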
For my second plot I decided to do a time series of homicides in Latin American countries. I wanted to compare the countries with each other and see their changes over time. I noticed that most of the countries in the bottom half of the plot had the same overall flat trend. However, Venezuela and El Salvador had an obvious increase in homicides around 2013-2014. Venezuela then sloped downward as the years went by, while El Salvador had the highest spike and only decreased after 2016. I decided to look into the reason for the sudden increase.
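Spikes like these can also be located numerically by computing the year-over-year change within each country. The following is a minimal sketch on synthetic values; the data frame `toy`, the column `homicide`, and the countries A and B are hypothetical and do not reproduce the real figures.

```r
# Synthetic data for illustration only; these are NOT the Cato values.
toy <- data.frame(
  countries = rep(c("A", "B"), each = 5),
  year      = rep(2012:2016, times = 2),
  homicide  = c(10, 10, 11, 10, 10,  40, 42, 60, 100, 80)
)

# Sort so that differences are taken between consecutive years
toy <- toy[order(toy$countries, toy$year), ]

# Year-over-year change within each country (NA for the first year)
toy$change <- ave(toy$homicide, toy$countries,
                  FUN = function(v) c(NA, diff(v)))

# Row with the largest single-year increase
biggest <- toy[which.max(toy$change), ]
print(biggest)
```

Sorting by `abs(change)` instead would surface sharp dips as well as spikes.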
The reason for El Salvador’s spike in homicides in the years 2013-2016 is tied to the 2012 gang truce. This truce was between MS-13 and Barrio 18, two of the largest gangs in El Salvador. After the truce was announced in March 2012, the homicide rate dropped by over 50% for more than a year. This explains why 2012 and 2013 had the lowest homicides for El Salvador. The two gangs agreed to reduce hostility and attacks on the armed forces, gangs, and civilians. The government was even involved, although it denied this several times. In the end, the truce did not last long. Multiple events eventually led to its collapse as gang members were being killed and the government was changing policies. The truce probably created more distrust, not only among the gangs but also toward the government, as the public did not fully agree that it would benefit them in any way. With this tension, the homicide rate rose higher than it was before the truce, which shows in the years after 2013. One thing I wish I could do as another visualization is gather monthly rather than yearly data and see how the timing of events matches up with the changes in homicides in El Salvador.
Works Cited
“Human Freedom Index.” Cato Institute, 2024, www.cato.org/human-freedom-index/2024.
Vásquez, Ian, et al. “The Human Freedom Index 2023.” Human Freedom Index, Cato Institute, 2023, www.cato.org/sites/cato.org/files/2023-12/human-freedom-index-2023-full-revised.pdf.
Wikipedia contributors. “2012–2014 Salvadoran gang truce.” Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 22 Nov. 2025. https://en.wikipedia.org/wiki/2012%E2%80%932014_Salvadoran_gang_truce.
Vuković, Siniša, and Ronald Alfieri. “Negotiating with Gangs: Lessons from the 2012 Truce in El Salvador.” The SAIS Review of International Affairs, 24 Apr. 2023, saisreview.sais.jhu.edu/negotiating-gangs-el-salvador-truce/.
Wikipedia contributors. “List of freedom indices.” Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 12 Dec. 2025. Web. 14 Dec. 2025. https://en.wikipedia.org/wiki/List_of_freedom_indices.