Introduction

This project uses data from the 2022 Human Freedom Index (HFI), an annual report that evaluates and ranks countries’ levels of human freedom based on personal and economic measures. The dataset covers the years 2000 to 2020 and includes the country name, year, region, and overall human freedom score (hf), as well as the overall rank and quartile. The remaining variables are split into two main categories, personal freedom (pf) and economic freedom (ef). Each of these categories has several subcategories, which are in turn broken down further. The data used in the HFI was collected from various international organizations, including the World Bank, the International Monetary Fund, and the United Nations, among other reputable sources (Vasquez et al.). The Cato Institute and the Fraser Institute, which developed and compiled the HFI, then analyze the data and assign each country a score for each aspect of freedom based on expert assessments and quantitative indicators. These scores are standardized to a common scale from 0 to 10 for consistency. With regard to prior cleaning, I did not have to clean the variable names because they were already formatted properly (all lowercase and no spaces); the only cleaning involved was excluding missing values during calculations. I chose this topic because many parts of the world are currently suffering from the fallout of international conflict, and the information in the Human Freedom Index may shed light on the factors contributing to these conflicts.

Questions to explore:

  1. How has overall human freedom changed globally from 2000 to 2020?

  2. Are personal or economic factors more influential on a country’s overall level of human freedom?

  3. How do scores for various aspects of personal freedom compare across different regions in 2020?

Background Information

Throughout history, diverse perspectives on the true essence of freedom have emerged. Philosophers such as Plato and Hobbes argued for a structured society with strict rules, believing that this was crucial to prevent chaos and maintain safety. In their view, limiting individual freedoms was a necessary sacrifice for the greater good of societal order (Vasquez et al.). Figures like Lao Tzu and John Locke offered a contrasting perspective. They proposed that genuine freedom means allowing individuals to make their own decisions without unwarranted interference. In their view, true freedom is more than a mere absence of regulations; it involves granting people the autonomy to lead their lives on their own terms. This ongoing debate has significantly influenced our modern understanding of freedom as something dynamic that goes beyond simply not having rules.

Load libraries and read in data

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
hfi <- read_csv("hfi_cc_2022.csv")
## Rows: 3465 Columns: 141
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (4): countries, region, ef_government_tax_income_data, ef_government_t...
## dbl (137): year, hf_score, hf_rank, hf_quartile, pf_rol_procedural, pf_rol_c...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Group by region and year and calculate the mean human freedom score for each group, excluding missing values

q1 <- hfi |>
  group_by(region, year) |>
  filter(!is.na(hf_score)) |>
  summarise(mean_freedom = mean(hf_score, na.rm = TRUE))
## `summarise()` has grouped output by 'region'. You can override using the
## `.groups` argument.
q1
## # A tibble: 208 × 3
## # Groups:   region [10]
##    region                   year mean_freedom
##    <chr>                   <dbl>        <dbl>
##  1 Caucasus & Central Asia  2000         7.15
##  2 Caucasus & Central Asia  2003         7.39
##  3 Caucasus & Central Asia  2004         7.12
##  4 Caucasus & Central Asia  2005         6.87
##  5 Caucasus & Central Asia  2006         6.83
##  6 Caucasus & Central Asia  2007         6.81
##  7 Caucasus & Central Asia  2008         6.75
##  8 Caucasus & Central Asia  2009         6.63
##  9 Caucasus & Central Asia  2010         6.54
## 10 Caucasus & Central Asia  2011         6.54
## # ℹ 198 more rows
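
As a side note on the grouping step, the following is a minimal sketch on a small made-up tibble (`toy` and its values are illustrative assumptions, not HFI data). It shows two things: `.groups = "drop"` returns an ungrouped result and silences the `summarise()` message, and filtering out missing scores first drops region-year groups that have no scores at all, whereas `na.rm = TRUE` on its own keeps those groups with a mean of `NaN`.

```r
library(dplyr)

# Hypothetical toy data standing in for hfi (values are made up)
toy <- tibble(
  region   = c("A", "A", "A", "B", "B"),
  year     = c(2000, 2000, 2001, 2000, 2000),
  hf_score = c(6, 8, NA, 5, NA)
)

# Filtering first removes the all-missing group (region A, year 2001)
with_filter <- toy |>
  filter(!is.na(hf_score)) |>
  group_by(region, year) |>
  summarise(mean_freedom = mean(hf_score), .groups = "drop")

# na.rm = TRUE alone keeps that group, with a mean of NaN
na_rm_only <- toy |>
  group_by(region, year) |>
  summarise(mean_freedom = mean(hf_score, na.rm = TRUE), .groups = "drop")

with_filter
na_rm_only
```

Which behavior is preferable depends on whether empty region-year combinations should appear in the plot data; the pipeline in this report drops them.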

Create a line chart showing mean human freedom scores over time by region

# Customizing aesthetics, scales, labels, and themes for the plot
viz1 <- q1 |> ggplot(aes(x = year, y = mean_freedom, color = region, text = paste("Region: ", region, "\nYear: ", year, "\nMean Human Freedom Score: ", mean_freedom))) +
  # `linewidth` replaces the `size` aesthetic for lines, which was deprecated in ggplot2 3.4.0
  geom_line(aes(group = region), linewidth = 0.8) +
  scale_color_manual(values = c("red2", "#FF7F00", "#FFD700", "#4DAF4A", "blue", "#984EA3", "pink", "#000000", "gray70", "skyblue")) +
  theme_minimal() +
  labs(
    x = "Year",
    y = "Mean Human Freedom Score",
    color = "Region",
    title = "Global Trends in Human Freedom (2000-2020)"
  ) +
  theme(plot.caption = element_text(hjust = 1))
# Converting the ggplot object to a plotly object for interactivity
viz1 <- ggplotly(viz1) |> 
  # Adding annotations to the plot for the caption and resizing the margins
  layout(
    annotations = list(
      text = "Source: The Human Freedom Index 2022",
      showarrow = FALSE,
      xref = "paper",
      yref = "paper",
      x = 1.55,
      y = -0.19,
      font = list(size = 12)),
    margin = list(l = 60, r = 50, b = 75, t = 60)
  ) |>
  # Adding a trace to the plotly object for custom hover text
  add_trace(
    data = q1,
    x = ~year,
    y = ~mean_freedom,
    type = "scatter",
    mode = "lines",
    line = list(width = 0.8),
    hoverinfo = "text",
    text = ~paste("Region: ", region, "\nYear: ", year, "\nMean Human Freedom Score: ", mean_freedom),
    showlegend = FALSE
  )
viz1
## Warning: 'scatter' objects don't have these attributes: 'colour'
## Valid attributes include:
## 'cliponaxis', 'connectgaps', 'customdata', 'customdatasrc', 'dx', 'dy', 'error_x', 'error_y', 'fill', 'fillcolor', 'fillpattern', 'groupnorm', 'hoverinfo', 'hoverinfosrc', 'hoverlabel', 'hoveron', 'hovertemplate', 'hovertemplatesrc', 'hovertext', 'hovertextsrc', 'ids', 'idssrc', 'legendgroup', 'legendgrouptitle', 'legendrank', 'line', 'marker', 'meta', 'metasrc', 'mode', 'name', 'opacity', 'orientation', 'selected', 'selectedpoints', 'showlegend', 'stackgaps', 'stackgroup', 'stream', 'text', 'textfont', 'textposition', 'textpositionsrc', 'textsrc', 'texttemplate', 'texttemplatesrc', 'transforms', 'type', 'uid', 'uirevision', 'unselected', 'visible', 'x', 'x0', 'xaxis', 'xcalendar', 'xhoverformat', 'xperiod', 'xperiod0', 'xperiodalignment', 'xsrc', 'y', 'y0', 'yaxis', 'ycalendar', 'yhoverformat', 'yperiod', 'yperiod0', 'yperiodalignment', 'ysrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
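
An alternative way to get custom hover text without adding a second trace is to let `ggplotly()` read the `text` aesthetic through its `tooltip` argument. The following is a sketch under stated assumptions rather than the approach used in this report: `toy_q1` is made-up data, and the rounding inside the hover text is an illustrative choice.

```r
library(ggplot2)
library(plotly)

# Hypothetical toy data (not HFI values)
toy_q1 <- data.frame(
  region = rep(c("A", "B"), each = 3),
  year = rep(2018:2020, times = 2),
  mean_freedom = c(7.1, 7.0, 6.8, 5.2, 5.3, 5.0)
)

p <- ggplot(
  toy_q1,
  aes(
    x = year, y = mean_freedom, color = region, group = region,
    # Custom hover text, built once per data point
    text = paste0(
      "Region: ", region,
      "\nYear: ", year,
      "\nMean Human Freedom Score: ", round(mean_freedom, 2)
    )
  )
) +
  geom_line(linewidth = 0.8) +
  theme_minimal()

# tooltip = "text" asks ggplotly() to show only the custom text on hover
p_interactive <- ggplotly(p, tooltip = "text")
p_interactive
```

Setting `group = region` explicitly keeps each region as a single line even though the `text` aesthetic differs for every point.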

Some key takeaways from this plot are that North America and Western Europe consistently ranked in the top two for average human freedom scores, while the Middle East & North Africa (MENA) region remained at the bottom. The MENA region and the Caucasus & Central Asia (CCA) region also appear to have experienced significant drops in their freedom scores around 2004. This could be due to the rise in terrorism in the MENA region following the 2003 Iraq War, which may have lowered freedom scores, especially in the Security and Safety category, in both MENA and the geographically adjacent CCA region (“Middle East”). Additionally, all regions appear to have seen drops between 2019 and 2020, during the outbreak of COVID-19, which suggests that the pandemic contributed to a worldwide decline in human freedom.

Create scatterplots with regression lines showing the correlation between scores for each main category of freedom (personal and economic) and overall human freedom scores

# Use `geom_point()` and `geom_smooth(method = "lm")` to show the points with the regression lines

pf_corr <- hfi |> ggplot(aes(x = pf_score, y = hf_score)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(
    x = "Personal Freedom Score",
    y = "Human Freedom Score",
    title = "Correlation Between Personal and Overall Human Freedom",
    caption = "Source: The Human Freedom Index 2022"
  ) +
  theme_minimal() +
  theme(plot.caption = element_text(hjust = 1))

ef_corr <- hfi |> ggplot(aes(x = ef_score, y = hf_score)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") + 
  labs(
    x = "Economic Freedom Score",
    y = "Human Freedom Score",
    title = "Correlation Between Economic and Overall Human Freedom",
    caption = "Source: The Human Freedom Index 2022"
  ) +
  theme_minimal() +
  theme(plot.caption = element_text(hjust = 1))

# Load the `gridExtra` package and use `grid.arrange()` to display both plots in the same output
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
grid.arrange(pf_corr, ef_corr)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 382 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 382 rows containing missing values (`geom_point()`).
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 382 rows containing non-finite values (`stat_smooth()`).
## Removed 382 rows containing missing values (`geom_point()`).

Looking at these scatterplots, both personal and economic freedom scores appear to have strong positive correlations with overall human freedom. This suggests that the level of freedom enjoyed by a country’s citizens is associated with both social and economic factors. However, because the slopes of the two regression lines look nearly identical in the plots, the following correlation coefficients and regression summaries will better illustrate which aspect of freedom is more influential.

Calculate the correlation coefficient and summary statistics for both linear regression models

Personal Freedom

cor(hfi$pf_score, hfi$hf_score, use = "complete.obs")
## [1] 0.9675184
summary(lm(hf_score ~ pf_score, data = hfi))
## 
## Call:
## lm(formula = hf_score ~ pf_score, data = hfi)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.3650 -0.1950  0.0371  0.2004  0.8874 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.566832   0.026815   58.43   <2e-16 ***
## pf_score    0.755423   0.003556  212.44   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3114 on 3081 degrees of freedom
##   (382 observations deleted due to missingness)
## Multiple R-squared:  0.9361, Adjusted R-squared:  0.9361 
## F-statistic: 4.513e+04 on 1 and 3081 DF,  p-value: < 2.2e-16

The correlation analysis between personal freedom and human freedom reveals a very strong positive correlation of 0.97, indicating that the two variables are closely related. The model’s R-squared value of 0.94 means that about 94% of the variation in human freedom scores can be explained by variation in personal freedom scores. This suggests that personal freedom has a strong influence on a country’s overall human freedom.

Economic Freedom

cor(hfi$ef_score, hfi$hf_score, use = "complete.obs")
## [1] 0.8283995
summary(lm(hf_score ~ ef_score, data = hfi))
## 
## Call:
## lm(formula = hf_score ~ ef_score, data = hfi)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5498 -0.4251  0.1568  0.4978  1.9800 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.13183    0.08624   1.529    0.126    
## ef_score     1.02921    0.01254  82.090   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.69 on 3081 degrees of freedom
##   (382 observations deleted due to missingness)
## Multiple R-squared:  0.6862, Adjusted R-squared:  0.6861 
## F-statistic:  6739 on 1 and 3081 DF,  p-value: < 2.2e-16

In this model, the correlation coefficient is also strong at 0.83, indicating a positive association between economic freedom and human freedom. The linear regression model is highly significant as well (p-value: < 2.2e-16). However, both the correlation coefficient and the R-squared value (0.69) are lower than in the personal freedom model. This suggests that economic factors, while influential, explain a noticeably smaller proportion of the variability in overall human freedom scores (about 69% versus 94%).
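
For a simple linear regression with a single predictor, R-squared equals the square of the Pearson correlation coefficient, so the two statistics reported for each model carry the same information. A quick check, using the correlation values copied from the `cor()` outputs above:

```r
# Correlation coefficients copied from the cor() outputs above
r_pf <- 0.9675184
r_ef <- 0.8283995

# In simple linear regression, R-squared is the squared correlation
r2_pf <- r_pf^2
r2_ef <- r_ef^2

round(c(personal = r2_pf, economic = r2_ef), 4)
```

The rounded results, 0.9361 and 0.6862, match the Multiple R-squared values in the two regression summaries.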

Filter for the year 2020, group by region, and summarize the mean scores for the subcategories of personal freedom.

q3 <- hfi |> filter(year == 2020) |> group_by(region) |> summarise(
  
  # After many attempts to get `plotly` to preserve my formatting for the legend items and hover information, I decided to format the names here, while creating the data frame for the plot, instead of leaving them in their cleaned (lowercase) form.
  
  "Rule of Law" = mean(pf_rol, na.rm = TRUE),
  "Security and Safety" = mean(pf_ss, na.rm = TRUE),
  "Freedom to Travel" = mean(pf_movement, na.rm = TRUE),
  "Freedom of Religion" = mean(pf_religion, na.rm = TRUE),
  "Freedom to Assemble Peacefully" = mean(pf_assembly, na.rm = TRUE),
  "Freedom of Expression" = mean(pf_expression, na.rm = TRUE),
  "Freedom to Choose Relationships" = mean(pf_identity, na.rm = TRUE)
) |>
# Use `pivot_longer()` to place all the subcategories of personal freedom into a single categorical column, with their corresponding mean scores in a separate column. Format the variable names as they should appear on the plot.

 pivot_longer(
  cols = c("Rule of Law", "Security and Safety", "Freedom to Travel", "Freedom of Religion", "Freedom to Assemble Peacefully", "Freedom of Expression", "Freedom to Choose Relationships"),
  names_to = "Component",
  values_to = "Mean Score"
) |>
  # Rename `region` to `Region` so that it is also formatted for the plot
  rename(Region = region)
q3
## # A tibble: 70 × 3
##    Region                  Component                       `Mean Score`
##    <chr>                   <chr>                                  <dbl>
##  1 Caucasus & Central Asia Rule of Law                             4.87
##  2 Caucasus & Central Asia Security and Safety                     8.11
##  3 Caucasus & Central Asia Freedom to Travel                       6.71
##  4 Caucasus & Central Asia Freedom of Religion                     5.83
##  5 Caucasus & Central Asia Freedom to Assemble Peacefully          5.58
##  6 Caucasus & Central Asia Freedom of Expression                   4.71
##  7 Caucasus & Central Asia Freedom to Choose Relationships         8.96
##  8 East Asia               Rule of Law                             6.53
##  9 East Asia               Security and Safety                     9.31
## 10 East Asia               Freedom to Travel                       6.64
## # ℹ 60 more rows
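
To make the reshaping step concrete, here is a minimal sketch of `pivot_longer()` on a tiny made-up table (`toy_wide` and its scores are illustrative assumptions, not HFI values). Each score column becomes its own row, with the former column name stored in `Component` and the value stored in `Mean Score`.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per region, one column per component
toy_wide <- tibble(
  Region = c("A", "B"),
  `Rule of Law` = c(4.9, 6.5),
  `Security and Safety` = c(8.1, 9.3)
)

# Pivot every column except Region into name/value pairs
toy_long <- toy_wide |>
  pivot_longer(
    cols = -Region,
    names_to = "Component",
    values_to = "Mean Score"
  )

toy_long
```

Two regions with two components each yield four rows, which is the same shape that ggplot2 needs to map `Component` to the fill color.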

Create a horizontal grouped bar chart showing mean scores for each component of personal freedom across all regions

# This is optional, but I changed the order of the levels of the components so it would match their order of reference in the Human Freedom Index report

q3$Component <- factor(q3$Component, levels = c("Rule of Law", "Security and Safety", "Freedom to Travel", "Freedom of Religion", "Freedom to Assemble Peacefully", "Freedom of Expression", "Freedom to Choose Relationships"))

viz2 <- q3 |> ggplot(aes(x = Region, y = `Mean Score`, fill = Component)) +
  
  # Use `geom_col()` and `coord_flip()` to create the horizontal bar chart, and set the position to `position_dodge()` to place the bars next to each other. Adjust the width of the bars and their margins if needed.
  
  geom_col(position = position_dodge(width = 0.82), width = 0.7) +
  coord_flip() +
  theme_minimal() +
  labs(
    x = "Region",
    y = "Mean Score",
    title = "Regional Averages of Personal Freedom Factors (2020)",
    fill = "Component of Personal Freedom",
    caption = "Source: The Human Freedom Index 2022"
  ) +
  scale_fill_manual(
    values = c("Rule of Law" = "red",
               "Security and Safety" = "orange2", 
               "Freedom to Travel" = "yellow2",
               "Freedom of Religion" = "green4",
               "Freedom to Assemble Peacefully" = "skyblue2",
               "Freedom of Expression" = "purple",
               "Freedom to Choose Relationships" = "black")
  )
# Convert the ggplot object to a plotly object for interactivity
viz2 <- ggplotly(viz2)
# Add annotations to the plot for the caption and resize the margins; adjust the position of caption
viz2 <- viz2 |> layout(
    annotations = list(
      text = "Source: The Human Freedom Index 2022",
      showarrow = FALSE,
      xref = "paper",
      yref = "paper",
      x = 2.3,
      y = -0.19,
      font = list(size = 11)),
    margin = list(l = 80, r = 50, b = 75, t = 60),
    yaxis = list(title = list(standoff = 10)),
    title = list(
      text = "Regional Averages of Personal Freedom Factors (2020)",
      x = 0.5
    )
  )
viz2

This horizontal bar chart shows mean scores for each subcategory (component) of personal freedom across global regions. Using plotly’s isolation feature to compare the regions’ scores one component at a time, the plot shows regional results similar to those in the first visualization: Western Europe and North America consistently rank among the top scorers in each category, while the Middle East & North Africa region scores the lowest. Comparing the distribution of mean scores across regions, the freedom to choose relationships, freedom of religion, and security and safety components mostly scored high throughout the regions, while the rule of law and freedom to travel components tended to see the lowest scores.

Conclusion

This exploration of the Human Freedom Index (HFI) from 2000 to 2020 yielded several insights into the global dynamics of freedom. The line chart revealed consistently high rankings for North America and Western Europe, contrasting with persistently low scores in the Middle East & North Africa (MENA) region. I inferred that the substantial drop in scores in the MENA and Caucasus & Central Asia (CCA) regions around 2004 aligned with the aftermath of the 2003 Iraq War, indicating a potential impact of geopolitical events on freedom. The global decline in scores between 2019 and 2020 suggested that the COVID-19 pandemic had a widespread influence on overall human freedom. The scatterplots and linear regression models showed strong positive correlations between each main category of freedom (personal and economic) and overall human freedom, with personal freedom showing the stronger association. The horizontal grouped bar chart further reinforced the regional trends and revealed generally high scores in components like freedom of religion and security, while highlighting lower scores in the rule of law and freedom to travel. For further exploration, I would want to run a multiple regression analysis to identify which of the 83 distinct indicators of personal and economic freedom contribute most to overall human freedom scores. Those findings would support a more detailed analysis of global freedom dynamics and could help make future policy recommendations and interventions more precise.

Bibliography