Below is a comprehensive examination of global inflation rates spanning from 1980 to 2024 across 196 countries. We will explore data tidying, thorough cleaning, and a nuanced analysis, with particular emphasis on distinguishing between hyperinflation and non-hyperinflation periods. The data was collected by the World Bank [https://data.worldbank.org/] and transformed on Kaggle by SAZIDUL ISLAM [https://www.kaggle.com/datasets/sazidthe1/global-inflation-data].
Functions such as pivot_longer in tidyr and filter from dplyr are used in this analysis.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
The data set is a CSV file read directly from a GitHub URL.
#reading in the data set to an object named after the inflation data set (infl)
infl <- read.csv("https://raw.githubusercontent.com/evanskaylie/DATA607/main/global_inflation_data%202.csv", sep = ',')
#preview data
head(infl)
## country_name indicator_name X1980
## 1 Afghanistan Annual average inflation (consumer prices) rate 13.4
## 2 Albania Annual average inflation (consumer prices) rate NA
## 3 Algeria Annual average inflation (consumer prices) rate 9.7
## 4 Andorra Annual average inflation (consumer prices) rate NA
## 5 Angola Annual average inflation (consumer prices) rate 46.7
## 6 Antigua and Barbuda Annual average inflation (consumer prices) rate 19.0
## X1981 X1982 X1983 X1984 X1985 X1986 X1987 X1988 X1989 X1990 X1991 X1992
## 1 22.2 18.2 15.9 20.4 8.7 -2.1 18.4 27.5 71.5 47.4 43.8 58.19
## 2 NA NA NA NA NA NA NA NA NA -0.2 35.7 226.00
## 3 14.6 6.6 7.8 6.3 10.4 14.0 5.9 5.9 9.2 9.3 25.9 31.70
## 4 NA NA NA NA NA NA NA NA NA NA NA NA
## 5 1.4 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8 85.3 299.10
## 6 11.5 4.2 2.3 3.8 1.0 0.5 3.6 6.8 4.4 6.6 4.5 3.00
## X1993 X1994 X1995 X1996 X1997 X1998 X1999 X2000 X2001 X2002 X2003
## 1 33.99 20.01 14.0 14.01 14.01 14.01 14.01 0.0 -43.4 51.93 35.66
## 2 85.00 22.60 7.8 12.70 33.20 20.60 0.40 0.0 3.1 5.20 2.40
## 3 20.50 29.00 29.8 18.70 5.70 5.00 2.60 0.3 4.2 1.40 4.30
## 4 NA NA NA NA NA NA NA NA NA 3.10 3.10
## 5 1379.50 949.80 2672.2 4146.00 221.50 107.40 248.20 325.0 152.6 108.90 98.20
## 6 3.10 6.50 2.7 3.00 0.40 3.30 1.10 -0.2 1.9 2.40 2.00
## X2004 X2005 X2006 X2007 X2008 X2009 X2010 X2011 X2012 X2013 X2014 X2015 X2016
## 1 16.36 10.57 6.78 8.68 26.42 -6.81 2.18 11.8 6.44 7.39 4.67 -0.66 4.38
## 2 2.90 2.40 2.40 3.00 3.30 2.20 3.60 3.4 2.00 1.90 1.60 1.90 1.30
## 3 4.00 1.40 2.30 3.70 4.90 5.70 3.90 4.5 8.90 3.30 2.90 4.80 6.40
## 4 2.90 3.50 3.70 2.70 4.30 -1.20 1.70 2.6 1.50 0.50 -0.10 -1.10 -0.40
## 5 43.50 23.00 13.30 12.20 12.50 13.70 14.50 13.5 10.30 8.80 7.30 9.20 30.70
## 6 2.00 2.10 1.80 1.40 5.30 -0.60 3.40 3.5 3.40 1.10 1.10 1.00 -0.50
## X2017 X2018 X2019 X2020 X2021 X2022 X2023 X2024
## 1 4.98 0.63 2.3 5.44 5.06 13.71 9.1 NA
## 2 2.00 2.00 1.4 1.60 2.00 6.70 4.8 4.0
## 3 5.60 4.30 2.0 2.40 7.20 9.30 9.0 6.8
## 4 2.60 1.00 0.5 0.10 1.70 6.20 5.2 3.5
## 5 29.80 19.60 17.1 22.30 25.80 21.40 13.1 22.3
## 6 2.40 1.20 1.4 1.10 1.60 7.50 5.0 2.9
The data set above stores each year as its own column. This is untidy: a year is a value of a variable, not a variable in its own right. Tidying the data replaces the 45 year columns (1980 through 2024) with a single year column, so that each row is a single observation for one country and one year.
#pivot the years to a single column
infl <- infl |>
  pivot_longer(
    cols = !c(country_name, indicator_name),
    names_to = "year",
    values_to = "annual_average_inflation_rate"
  )
#check the data
head(infl)
## # A tibble: 6 × 4
## country_name indicator_name year annual_average_infla…¹
## <chr> <chr> <chr> <dbl>
## 1 Afghanistan Annual average inflation (consumer … X1980 13.4
## 2 Afghanistan Annual average inflation (consumer … X1981 22.2
## 3 Afghanistan Annual average inflation (consumer … X1982 18.2
## 4 Afghanistan Annual average inflation (consumer … X1983 15.9
## 5 Afghanistan Annual average inflation (consumer … X1984 20.4
## 6 Afghanistan Annual average inflation (consumer … X1985 8.7
## # ℹ abbreviated name: ¹annual_average_inflation_rate
Not only was this data untidy, it also needs some cleaning. The year values carry an X in front of each number, which read.csv adds because syntactic column names cannot start with a digit. In addition, the indicator_name column no longer provides any meaningful information now that the pivot has named the value column annual_average_inflation_rate.
#keep only the digits in the year column (str_extract returns a character vector)
infl$year <- str_extract(infl$year, "[0-9]+")
#dropping the column that specifies annual average inflation (consumer prices) rate
infl <- infl |>
  select(country_name, year, annual_average_inflation_rate)
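As an alternative to cleaning the year column after the fact, the pivot itself can strip the prefix and convert the type. The following is a minimal sketch on a made-up three-year table (the `wide` and `long` names and the values for countries "A" and "B" are illustrative, not part of the data set), assuming tidyr 1.1.0 or later for `names_transform`. Note that it stores year as an integer, whereas the analysis in this document keeps year as a character string.

```r
library(tidyr)
library(tibble)

# Toy stand-in for the wide data: two countries, three year columns
wide <- tibble(
  country_name = c("A", "B"),
  X1980 = c(13.4, NA),
  X1981 = c(22.2, 3.1),
  X1982 = c(18.2, 2.4)
)

long <- wide |>
  pivot_longer(
    cols = starts_with("X"),
    names_to = "year",
    names_prefix = "X",                         # drop the leading X
    names_transform = list(year = as.integer),  # store year as an integer
    values_to = "annual_average_inflation_rate"
  )

long
```

Each country contributes one row per year, so the two-row toy table becomes six rows.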
The analysis below explores how to group and filter the data to yield meaningful insights.
First, we will take a look at the current state to guide the direction of the next steps.
#look at quantitative summary
summary(infl$annual_average_inflation_rate)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -72.70 2.10 4.70 42.07 10.20 65374.10 868
#how does the graph look
ggplot(infl, aes(x = year, y = annual_average_inflation_rate)) +
  geom_point(na.rm = TRUE)
What are the biggest issues with this graph? It does not seem to show us anything meaningful. Two things are clear:
1- The gap between the median and the maximum is so large that almost every point is compressed against the bottom of the plot, making the distribution functionally unreadable.
2- The x-axis has too many year values for the labels to be readable.
The first analysis removes hyperinflation. EconLib describes the usual threshold: “Although the threshold is arbitrary, economists generally reserve the term ‘hyperinflation’ to describe episodes when the monthly inflation rate is greater than 50 percent.” That definition refers to monthly rates, while this data set holds annual rates, so borrowing the 50 percent figure as an annual cutoff is a simplification that is far looser than the strict definition. With that caveat, we remove all annual inflation rates at or below -50 and at or above 50. The removed values will be explored later in our analysis.
#keep only rates strictly between -50 and 50 (filter also drops NA rows)
infl_no_hyp <- infl |>
  filter(
    annual_average_inflation_rate < 50,
    annual_average_inflation_rate > -50
  )
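To put the monthly definition quoted above on the same footing as the annual rates in this data set, a constant monthly rate can be compounded over twelve months. The sketch below is illustrative arithmetic only; the variable names are hypothetical and it assumes the monthly rate stays constant for the whole year.

```r
# Annualize a constant monthly inflation rate by compounding over 12 months
monthly_rate <- 50                                      # percent per month
annual_rate <- ((1 + monthly_rate / 100)^12 - 1) * 100  # percent per year
round(annual_rate, 1)
```

A sustained 50 percent monthly rate corresponds to an annual rate of roughly 12,875 percent, so the cutoff of 50 used here captures many episodes of high inflation that would not meet the strict monthly definition.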
#graph = too many years, let's slim it down
ggplot(infl_no_hyp, aes(x = year, y = annual_average_inflation_rate)) +
  geom_boxplot(aes(color = year), na.rm = TRUE)
The second issue with the graph is that there are too many years to give a readable visualization. Grouping the years by their decades will help solve this.
#label each year with its decade, e.g. "1987" becomes "80s"
infl_decades <- infl_no_hyp |>
  mutate(year = paste0(str_sub(year, start = 3, end = 3), "0s"))
decades_order <- c('80s','90s','00s','10s','20s')
#find the average of each decade for the graph
infl_decades <- infl_decades |>
  group_by(year) |>
  mutate(
    average_infl = mean(annual_average_inflation_rate)
  )
#graph
ggplot(infl_decades, aes(x = factor(year, levels = decades_order), y = annual_average_inflation_rate)) +
  geom_boxplot(aes(color = year), na.rm = TRUE) +
  geom_point(aes(y = average_infl), shape = 21)
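The decade labels can also be derived numerically rather than by slicing characters. The following is a minimal sketch on a made-up table (the `toy`, `toy_decades`, `decade_start`, and `decade` names are hypothetical), using integer division to floor each year to the start of its decade.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the filtered data, with year stored as a character string
toy <- tibble(
  year = c("1980", "1989", "1999", "2005", "2024"),
  rate = c(10, 20, 30, 5, 3)
)

toy_decades <- toy |>
  mutate(
    decade_start = as.integer(year) %/% 10 * 10,       # 1989 -> 1980
    decade = paste0(substr(decade_start, 3, 4), "s")   # 1980 -> "80s"
  )

toy_decades
```

One possible advantage of keeping a numeric `decade_start` column is that sorting on it gives chronological order directly, without maintaining a hand-written vector of decade labels.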
Now let’s take a look at the hyperinflation values that were removed.
#all hyperinflation: rates of 50 or higher
infl_hyp <- infl |>
  filter(
    annual_average_inflation_rate >= 50
  )
ggplot(infl_hyp, aes(x = annual_average_inflation_rate)) +
  ggtitle("All Hyperinflation") +
  geom_histogram(bins = 100)
#lower range of hyperinflation: rates from 50 to 600
infl_hyp <- infl |>
  filter(
    annual_average_inflation_rate >= 50,
    annual_average_inflation_rate <= 600
  )
ggplot(infl_hyp, aes(x = annual_average_inflation_rate)) +
  ggtitle("Majority Hyperinflation") +
  geom_histogram(bins = 15)
The All Hyperinflation graph shows that the distribution of hyperinflation values is heavily right-skewed: nearly all of the observations fall in the first bin. Because of this, we drill into that first bin, and the Majority Hyperinflation histogram shows those values, from 50 to 600, in more detail.
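Another way to read a distribution this skewed, offered here as an alternative technique rather than the approach used in this analysis, is to put the x-axis on a log scale so that the full range fits in a single histogram. The sketch below uses a small made-up data frame (`toy_hyp` is hypothetical; only 4146 and 65374.1 are values that appear in the output above) and assumes ggplot2 is installed.

```r
library(ggplot2)

# Mostly made-up rates standing in for the filtered hyperinflation values
toy_hyp <- data.frame(
  annual_average_inflation_rate = c(55, 60, 75, 90, 120, 130, 300, 600, 4146, 65374.1)
)

# Same histogram as before, but with a log10 x-axis to spread out the long tail
p <- ggplot(toy_hyp, aes(x = annual_average_inflation_rate)) +
  ggtitle("All Hyperinflation (log scale)") +
  geom_histogram(bins = 15) +
  scale_x_log10()

p
```

A log scale works here because every value in the hyperinflation subset is positive; it would not apply to the full data set, which contains zero and negative rates.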
The analysis performed tells a story. The box plot that excludes hyperinflation shows that the spread of annual average inflation rates has mostly narrowed over the decades. In addition, the average inflation rate decreased steadily from the 1980s to the 2010s and then increased in the 2020s. This makes inflation a useful indicator of global economic conditions. As EconLib states, “Most economists agree that inflation lowers economic welfare even when allowing for revenue from the inflation tax and the distortion that would be created by alternative taxes that raise the same revenue.” We can therefore read the box plot as one measure of global welfare, where a larger number of high inflation rates indicates worse welfare in some countries.
The hyperinflation graphs show how frequently different levels of hyperinflation have occurred globally over the past 45 years. Hyperinflation typically arises when a government faces obligations it cannot fund and responds by effectively printing more money. This devalues the currency, which in turn requires even more money to be printed. The cycle continues until the original currency approaches functional worthlessness. In our analysis above, cases of hyperinflation appear most common at rates between 50 and 130, with few cases above 600. This suggests that, once inflation reaches a rate of about 130, it is more common for it to level off or decrease than to keep rising.