For my first IS460 Module Assignment, I explored a data set about Economic Freedom from the Fraser Institute. This data set caught my attention as a Finance major (and Data Science as well) because I have not taken many economics courses nor have learned about Economic Freedom. Now, reading this, you may be asking yourself, “what does data Economic Freedom contain”? This is the exact question that drew me to the data set. Variables describing to Economic Freedom in this data set are divvied into five areas: 1) governmental financials and taxes, 2) executive, judicial, and legislative effectiveness, 3) fiscal and currency conditions, 4) capital transfers, tariffs and trade information, and 5) financial and business market conditions. These five areas of data amount to more than 80 variables, 6 decades, and 165 countries. Before you continue reading my analysis of this Economic Freedom data set, please peruse this link and/or the data for yourself if curious: https://efotw.org/economic-freedom/dataset?geozone=world&page=dataset&min-year=2&max-year=0&filter=0
The variables I will be focusing on are: Year, Countries, EFW Rank, World Bank Region, World Bank Income Classification…
# Setting up my working directory
setwd("C:/Users/raymo/OneDrive/Documents/IS460")
# Calling in the data
efw <- fread("efotw-2025-master-index-data-for-researchers-iso-efw-index.csv")
names(efw)[5] <- "EFW_Rank" # Renaming a column; https://r-lang.com/names-in-r/
efw_lean <- efw %>%
select(Year,Countries, EFW_Rank) %>% # Selects essential columns to make visualization
filter(Year > 1999) %>% # Filters out years before 2000
data.frame()
efw_top10 <- efw_lean[(efw_lean$EFW_Rank %in% c(1:10)), ] # Selects rows only with top 10 ranking
efw_top10_count <- count(efw_top10, Countries) # New data frame with counts from top 10 rankings
max_y <- round_any(max(efw_top10_count$n), 4, ceiling) # Sets a max-y limit on the y-axis for the counts
#-------------------------------------------------------------------------------
# 'fig.width' and 'fig.height' argument from this StackOverflow article:
# https://stackoverflow.com/questions/43195871/adjusting-figure-margins-in-rmarkdown
ggplot(efw_top10_count, aes(x = reorder(Countries, n), # Identifies the data set and variables in graph
y = n,
fill = n)) +
geom_bar(stat="identity") + # Identifies what kind of graph the visualization
coord_flip() + # Flips the set up of the graph
theme_minimal() +
labs(title = "Country Count of EFW Top 10 Rankings (since 2000)", # Titles for the chart and axes
x = "Countries",
y = "Count") +
theme(plot.title = element_text(hjust = 0.5)) + # Centers the chart title
scale_y_continuous(labels = comma, # Scales the y-axis to the max count
breaks = seq(0, max_y, by=4),
limits = c(0, max_y))
This visualization was my first idea I crafted when exploring this data set, as I wanted to quickly see which countries have been recognized for having high levels of economic freedom. I was I was curious to see how the country rankings have changed in recent years, especially since the turn of the 21st century. I made a subset data frame that eliminated any data with a year before 2000. My findings are such: since 2000, six countries (the United States, Switzerland, Singapore, New Zealand, Hong Kong SAR China, and Australia) all have a top ten ranking each year since 2000. The United Kingdom, Ireland, Denmark, and Canada have all received at least a decade’s worth of top ten rankings, while Mauritius and Germany have both received more than five years of top ten rankings. Rounding out the list of the 19 nations with top ten rankings are the following: Finland, Estonia, Luxembourg, Iceland, and the Netherlands. Each of these nations have a few (all less than 4 specifically) years of top ten rankings.
names(efw)[22] <- "Area_1_Government_Size" # Renaming multiple columns for easier data manipulation
names(efw)[34] <- "Area_2_LSPRwGA"
names(efw)[43] <- "Area_3_Sound_Money"
names(efw)[61] <- "Area_4_International_Trading_Freedom"
names(efw)[84] <- "Area_5_Regulation"
area_years <- c(1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005, 2010, 2015, 2020, 2023) # Selecting a hanful of years to visualize
efw_areas <- efw %>%
select(Year, Countries, Area_1_Government_Size,Area_2_LSPRwGA, Area_3_Sound_Money, # Selecting required variables for visualizartion
Area_4_International_Trading_Freedom, Area_5_Regulation) %>%
filter(Year %in% area_years) %>% # Removing any years that are not assigned to 'area_years'
data.frame()
cleaned_efw_areas <- na.omit(efw_areas) # Cleaning 'efw_areas" df of any NA values
cleaned_efw_areas_avgs <- cleaned_efw_areas %>%
mutate(MeanAreaRatingScore = (Area_1_Government_Size + # creating the MARS metric for each row
Area_2_LSPRwGA +
Area_3_Sound_Money +
Area_4_International_Trading_Freedom +
Area_5_Regulation) / 5) %>%
data.frame()
top_ten_viz_one <- c("Australia", "Canada", "Denmark", "Hong Kong SAR, China", # Hardcoded countries from first visualization under focus here
"Ireland", "New Zealand", "Singapore", "Switzerland",
"United Kingdom", "United States")
mars_viz <- cleaned_efw_areas_avgs %>% # Filters out all countries that do not have at least 10 T10 rankings form first visualization
filter(Countries %in% top_ten_viz_one) %>%
data.frame()
# 'fig.width' and 'fig.height' argument from this StackOverflow article:
# https://stackoverflow.com/questions/43195871/adjusting-figure-margins-in-rmarkdown
ggplot(mars_viz, # Selects the data frame and subsequent variables utilized for the visualization
aes(x=Year,
y=MeanAreaRatingScore,
color = Countries)) +
geom_line(linewidth=1.5) + # A line - a different color - for each country...
geom_point(shape=21,
size=5,
color='black',
fill='white') + # ... that has white circle to lead the viewer's eyes
labs(x="Year",
y="MeanAreaRatingScore (MARS)",
title="MARS of Countries with at least 10 Top 10 Rankings by Year")+
theme(plot.title = element_text(hjust = 0.5))+ # Centers the chart title
theme_light() # Depicts a white backdrop onto the graph
My second visualization is a continuation of looking at the top countries that have been recognized with high economic freedom. I selected to visualize all of the countries from the first visualization that have a count greater than 10, which means they have earned a top ten ranking for at least 10 years since 2000. As mentioned in my introduction, this data set contains data on five areas: 1) governmental financials and taxes, 2) executive, judicial, and legislative grades, 3) fiscal and currency conditions, 4) capital transfers, tariffs and trade information, and 5) financial and business market conditions. Each country receives an annual ranking (among all countries) and a value (between 1 and 10; the greater the better) that portrays its strength in each of the five areas. After discovering these variables, I had the idea to create an average score to compare a number of countries over a given period of time. I call this average score MARS, which stands for ‘Mean Area Rating Score’. The EFW data set has data from 1970 to 2000, in five year increments, and every year since 2000. With this nuance in mind, I filtered out any year that was not a multiple of 5 except for 2023, in order to act as a ‘current rating’, as it is the most recent year.
The MARS trend line provide an interesting picture above: At 1970, there was a wide gap between the highest and lowest MARS, with Hong Kong SAR, China (which I refer to as China from now on) scored a hair under 9.0 and the United Kingdom scored around 6.5. As the decade progress, all MARS values increase for all countries, with the UK drastically improving to 7.5 in only 10 years. This upward trend, with the exception of China trending sideways and slightly declining, continued from 1980 to 2000 at variable rates for each country. 2000 is a very interesting year in this visualization, as it appears to be the year with the tightest interval of MARS values of any year. The difference from the highest to the lowest point is about 0.5 MARS. The second most tightest interval is in the year 2023, which I will discuss in a few sentences. I believe that this tight interval of MARS value in 2000 is due to the the turn of the new millennium and how it coincided with a few short years of rapid advancements from the Internet and all the fear and uncertainty surrounding Y2K.
From 2000 onward, most countries began to move slightly downwards in a more interconnected pattern. While I did not analyze correlations and covariance trends of this time series data, I assert that the rise of the strengthening global economy had an influence on the economic performance of these nations. This continued through the 2020s, with all countries experiencing a sharper decrease than most years from 2015 to 2020. As mentioned early, certain time periods in the new millennium have quite tight intervals; 2020 is the third tightest. This sharp decline from 2015 to 2020 makes sense, as 2020 was when the Coronavirus pandemic began. With 2023 being the most recent year of data available, I believe it makes sense that most of the countries experienced an increase in their MARS values, as 2023 was arguably the first full year where the world was back open for business. With 2023 being the first real year the global economy returned to pre-pandemic levels in terms of economical activity, the tight interval of MARS values represents a ‘soft reset’, like 2000, in maintaining the global economy.
#
names(efw)[24] <- "Judicial_Independence" #https://r-lang.com/names-in-r/
names(efw)[48] <- "Mean_Tariff_Rate_Data" #https://r-lang.com/names-in-r/
tariff_judicial_df <- efw %>%
select(Year, Countries, Mean_Tariff_Rate_Data, Judicial_Independence) %>%
filter(!is.na(Mean_Tariff_Rate_Data), !is.na(Judicial_Independence))%>%
mutate(Judicial_Strength = ifelse(Judicial_Independence <= 2.5, "Quite Weak",
ifelse(Judicial_Independence <= 5, "Weak",
ifelse(Judicial_Independence > 7.5, "Quite Strong", "Strong"))),
Tariff_Rate_Tier = ifelse(Mean_Tariff_Rate_Data <= 2.5, "Low",
ifelse(Mean_Tariff_Rate_Data <= 5, "Medium",
ifelse(Mean_Tariff_Rate_Data <= 7.5, "Moderate",
ifelse(Mean_Tariff_Rate_Data <= 12.5, "High",
ifelse(Mean_Tariff_Rate_Data <= 25, "Significant", "Extreme"))))))%>%
group_by(Judicial_Strength, Tariff_Rate_Tier ) %>%
summarise(n=length(Judicial_Strength), .groups='keep') %>%
data.frame()
missing_rows <- data.frame(Judicial_Strength = c("Quite Weak", "Quite Weak", "Quite Weak"),
Tariff_Rate_Tier = c("Low", "Medium", "Extreme"),
n = c(0, 0, 0))
tariff_judicial_df <- rbind(tariff_judicial_df, missing_rows)
judicial_levels <- c("Quite Weak", "Weak", "Strong", "Quite Strong")
tariff_judicial_df$Judicial_Strength <- factor(tariff_judicial_df$Judicial_Strength, levels = judicial_levels)
tariff_levels <- c("Low", "Medium", "Moderate", "High", "Significant", "Extreme")
tariff_judicial_df$Tariff_Rate_Tier <- factor(tariff_judicial_df$Tariff_Rate_Tier, levels = tariff_levels)
tar_jud_breaks <- c(seq(0,max(tariff_judicial_df$n), by=100))
ggplot(tariff_judicial_df,
aes(x=Tariff_Rate_Tier,
y=Judicial_Strength,
fill=n)) +
geom_tile(color="black") +
geom_text(aes(label=comma(n))) +
coord_equal(ratio=1) +
labs(title="Heatmap: Tariff Rate Tiers compared to Judicial Strength",
x="Tariff Rate Tiers",
y="Judicial Strength",
fill="Instance Count") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_discrete(limits = rev(levels(tariff_judicial_df$Judicial_Strength))) +
scale_fill_continuous(low="white", high="purple", breaks= tar_jud_breaks) +
guides(fill=guide_legend(reverse = TRUE, override.aes = list(colour="black")))
#### Tariff Rate Tiers and Judicial Strength Heatmap
Before analyzing the visualization above, I have included the breakdown of the categorical variables utilized. For the Tariff Rate Tiers variable, here are the intervals at which the data was categorized:
For the Judicial Independence variable, here are the intervals at which the data was categorized:
With the recent headlines surrounding the Supreme Court challenging the constitutionitality of the Trump Administration’s tariffs imposed last spring, I was curious to examine the relationship between the average tariff rate countries impose onto other countries and the historical judicial strength of observed countries. The total 3,910 observations are representative of all observations that are not missing any values for Tariff Rate Tiers and judicial strength, regardless of year. I wanted to incorporate the element of time - via the Year variable - into this visualization, however I feared that this would hinder the effectiveness of this visualization.
The majority of the data is concentrated within the six squares that are located between “Weak” to “Strong” quality of Judicial Strength and “High” to “Significant” for Tariff Rate Tiers. While the three highlighted cell in the “Strong” row has more instances than the “Weak”, this fact shows that there is a wide variability in governments of weak to strong judicial strength place moderately to significantly large tariffs onto other countries they trade with. Further, this indicates that nearly than half of the annual tariff rates countries have place on their trade partners have ranged between 7.50% and 12.50%. The three largest cells have either a “High” Tariff Rate Tier or a “Strong” quality of Judicial Strength; each have approximately an eighth of the data set. Governments that are classified as “Quite Strong” across all Tariff Rate Tiers account for roughly one sixth of the data in this heat map. Governments that scored in the “Quite Week” quality account for a sliver of the data set, with exactly 30 instances.
# Renaming two columns for simplfying data manipulation
names(efw)[41] <- "Inflation_Data"
names(efw)[86] <- "World_Bank_Region"
# Selecting the years I would like to visualize
inflation_world_years <- c(2000, 2005, 2010, 2015, 2020, 2023)
# Creating a data frame that looks at annual inflation based on 'inflation levels' and world bank region
world_inflation_df <- efw %>%
select(Year, World_Bank_Region, Inflation_Data) %>%
filter(!is.na(Year), Year %in% inflation_world_years) %>%
mutate(Inflation_Level = ifelse(Inflation_Data < 0, "Negative",
ifelse(Inflation_Data < 5, "Low",
ifelse(Inflation_Data < 10, "Medium",
ifelse(Inflation_Data < 15, "High",
ifelse((Inflation_Data >= 15) & (Inflation_Data < 50), "Extreme", "Hyperinflation")))))) %>%
group_by(Year, World_Bank_Region, Inflation_Level) %>%
summarise(n=length(Inflation_Level), .groups='keep') %>%
data.frame()
# Cleaning the new data frame of all NAs
cleaned_world_inflation_df <- na.omit(world_inflation_df)
# Factoring categorical variables:
years_levels <- c(2000, 2005, 2010, 2015, 2020, 2023)
years_order <- factor(cleaned_world_inflation_df$Year, level= years_levels)
worbk_levels <- c("East Asia & Pacific", "Europe & Central Asia", "Latin America & the Caribbean", "Middle East & North Africa", "North America", "South Asia", "Sub-Saharan Africa")
worbk_orders <- factor(cleaned_world_inflation_df$World_Bank_Region, level= worbk_levels)
w_inflation_levels <- c("Negative", "Low", "Medium", "High", "Extreme", "Hyperinflation")
cleaned_world_inflation_df$Inflation_Level <- factor(cleaned_world_inflation_df$Inflation_Level, levels = w_inflation_levels)
# Labeling data frame:
# Create a new data frame that captures the total height of all stacked bars in the chart
labels_c_w_inf_df <- cleaned_world_inflation_df %>%
select(!World_Bank_Region) %>% # "!" in this command removes the 'World_Bank_Region' from the data frame
group_by(Year, Inflation_Level) %>% # Groups by year, then inflation level to effectively visualize and label
summarise(Count=sum(n), .groups = 'keep') %>% # Collects the count for the stacked bar chart labels
data.frame()
max_y <- round(max(labels_c_w_inf_df$Count) * 1.025) # Sets a max y-value for all charts; real max y-value is 98
# FIX ME: 2000 only plots 162 countries, 2005 & 2010 only plots 164 countries; figure out this missing countries?!?!?!?!?!?
# 'fig.width' and 'fig.height' argument from this StackOverflow article:
# https://stackoverflow.com/questions/43195871/adjusting-figure-margins-in-rmarkdown
#Now for the big reveal: a trellis stacked bar chart
ggplot(cleaned_world_inflation_df,
aes(x=Inflation_Level,
y=n,
fill=World_Bank_Region)) +
geom_bar(stat = "identity",
position = position_stack(reverse = TRUE)) + # Sets the bars to be stacked
theme_light() + # Depicts a white backdrop onto the graph
theme(plot.title = element_text(hjust = 0.5))+ # Centers the plot title
scale_y_continuous(breaks = seq(0, max_y, by = 20), # Distinguishes how much the y-axis will increase by
limits = c(0, max_y))+
labs(title = "Trellis Stacked Bar Charts: Inflation Levels by World Bank Region by Year",
x = "Inflation Level ",
y = "Country Count",
fill = "World Bank Region") +
geom_text(data =labels_c_w_inf_df, # Adds text to the top of each bar that is ...
aes(x = Inflation_Level,
y = Count, # ... the total count of the stacked bar.
label = scales::comma(Count),
fill = NULL),
vjust = -0.3, # Positioned a touch above the top of the stacked bar
size = 4) +
scale_fill_brewer(palette = "Set2") + # Color Pallette for the different World Bank Regions
facet_wrap(~Year, ncol=3, nrow=2) + # Formats the six trellis panes to be a 2x3 (RxC)
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Utilize the article below to base my code off of
# https://stackoverflow.com/questions/1330989/rotating-and-spacing-axis-labels-in-ggplot2
This visualization is a combination of two visualization I learned to create: a trellis chart and a stacked bar chart. Three countries (Somalia, Brunei Darussalam, and Timor-Leste) were missing data in one or multiple time periods; I chose to forego visualizing them. Before talking about the process and the takeaways associated with this visualization, here are the intervals at which annual inflation percentages are categorized into:
The first macro-level observation is that all the trellis distributions above are right skewed to varying degrees; with the majority of the data lying typically in the Low category, then the Negative and Medium categories. A minority of the data lies on the right side of the distributions, in the High, Extreme, and Hyperinflation categories. Another insight that I realized was that the distributions 2015 and 2020 uniquely differ from all other years. These two distributions have Low inflation counts that are extremely large in comparison to all other categories, Negative and Medium inflation counts that are both in the high 20’s to low 30’s range, and all categories (High, Extreme, and Hyperinflation) all have single digit counts. Both of these highlighted years have an average of 27 countries experiencing deflation in comparison to an average of 7 countries experiencing deflation in the other for years (2000, 2005, 2010, 2023). While there is variety in how each of the regions make up each of the stacked bars, there are some noticeable regional trends.
The first regional trend that I have noticed is that in 2015, 2020, and 2023, (figure out the specific numbers) “a noticeable amount of countries in the Middle East & North Africa and Sub-Saharan Africa regions account for having either High, Extreme, or Hyperinflation. A third region, Latin America & the Carribbean, follows this similar trend to a lesser extent in terms of strength. This trend indicates that a handful countries in these World Bank Regions have struggles to curb excessive amounts in the bottom three years visualized. A second regional trend: the three previously mentioned regions join two more regions - East Asia & Pacific, Europe & Central Asia - contribute to the majority of regions that represent the less severe inflation levels (Negative, Low, and Medium).
viz_years <- c("2003", "2013", "2023") # Selecting three years for visualization
efw_income_classes <- efw %>% # Reducing data based on...
select("Year", "Countries", "World_Bank_Region", # ... these variables...
"World Bank Current Income Classification, 1990-Present") %>%
filter(Year %in% viz_years)%>% # ... and these years...
data.frame() # ... into a new data frame
efw_income_classes$Year <- as.factor(efw_income_classes$Year) # Factoring the 'Year' variable
names(efw_income_classes)[3] <- "WB_Region" # Renaming for ease
names(efw_income_classes)[4] <- "WB_Income_Classification" # Renaming for ease
yearcount <- data.frame(count(efw_income_classes, Year, # Getting yearly counts of different income class
efw_income_classes$WB_Income_Classification))
yearcount <- yearcount[2:13,] # Removes unnecessary columns
names(yearcount)[2] <- "WB_Income_Classification" # Renaming for ease
income_levels <- c('High Income', 'Upper-Middle Income', # Factoring income level variable
'Lower-Middle Income', 'Low Income')
yearcount$WB_Income_Classification <- factor(yearcount$WB_Income_Classification,# Factoring income level variable
levels = income_levels)
# Visualization Code:
plot_ly(hole=0.7) %>%
layout(title="Income Classifications (2003, 2013, 2023)") %>%
add_trace(data = yearcount[yearcount$Year==2023,],
labels = ~WB_Income_Classification,
values = ~yearcount[yearcount$Year==2023, "n"],
type="pie",
textposition="inside",
hovertemplate="Year: 2023<br>Classification:%{label}<br>Percent:%{percent}<br>Country Count: %{value}<>extra</extra>") %>%
add_trace(data = yearcount[yearcount$Year==2013,],
labels = ~WB_Income_Classification,
values = ~yearcount[yearcount$Year==2013, "n"],
type="pie",
textposition="inside",
hovertemplate="Year: 2013<br>Classification:%{label}<br>Percent:%{percent}<br>Country Count: %{value}<>extra</extra>",
domain=list(
x=c(0.17, 0.83),
y=c(0.17, 0.83))) %>%
add_trace(data = yearcount[yearcount$Year==2003,],
labels = ~WB_Income_Classification,
values = ~yearcount[yearcount$Year==2003, "n"],
type="pie",
textposition="inside",
hovertemplate="Year: 2003<br>Classification:%{label}<br>Percent:%{percent}<br>Country Count: %{value}<>extra</extra>",
domain=list(
x=c(0.29, 0.71),
y=c(0.29, 0.71)))
The visualization above contains three pie charts, with with pie chart shaped as a circular ring that represents the breakdown of proportions of countries classified by income classification. The inner most ring represents 2003; the middle ring represents 2013; the outermost ring represents 2023. The proportion of countries classified as ‘High Income’ has increased since 2003, with 22.7% of countries increasing to 35.6% in 2023. In contrast, the proportion of countries classified as ‘Low Income’ has decreased from 32.5% in 2003 increasing to 12.9% in 2023. Interestingly, the two middle-based classifications have differing trends that are opposite to each other. The proportion of ‘Upper-Middle Income’ countries has increased from 2003 (17.2%) to 2013 (26.7%), then slightly decreased after 2013, with 2023 reporting a proportion of 25.2%. ‘Lower-Middle Income’ mirrored a similar pattern to its counterpart category; 2003 saw 27.6% of countries in this proportion. Ten years later in 2013, this proportion decreased to 23%; ten more years later, this proportion climbed back up to 26.4%.
As seen in the visualizations above, my takeaways from my analysis of this data set are…
My recommendations are as such:
As for next steps in continuing to analyze this data set, there are multiple courses of action to consider.
With respect to my second visualization (the MARS countries trend lines), looking at how each of the five area scores impact the simple averaging of the MARS values may yield some more insights. Further, looking at the correlation and covariance between the area scores, the MARS trend lines, etc. may also be useful in confirming some of my assumptions related to the historical events that have impacted national and global economics. Additionally, I infer that MARS scores would have increased in 2024 and then trended sideways and/or down, after the United States unveiled historically high tariffs unto all of its trading partners.
Libraries utilized for this project are as followed: plyr, dplyr, tidyr, scales, plotly, ggplot2, ggrepel, cowplot, ggthemes, DescTools, lubridate, data.table, htmlwidgets, and RColorBrewer.