Tidy01

Introduction

Below is an examination of global inflation rates from 1980 to 2024 across 196 countries. We tidy and clean the data, then analyze it with particular emphasis on separating hyperinflation from non-hyperinflation periods. The data was collected by the World Bank [https://data.worldbank.org/] and transformed on Kaggle by SAZIDUL ISLAM [https://www.kaggle.com/datasets/sazidthe1/global-inflation-data].

Import libraries

This analysis uses pivot_longer from tidyr, filter from dplyr, string helpers from stringr, and ggplot2 for plotting, all of which are attached by the tidyverse.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)

Import data

The data set is a CSV file read directly from a raw GitHub URL.

#reading in the data set to an object named after the inflation data set (infl)
infl <- read.csv("https://raw.githubusercontent.com/evanskaylie/DATA607/main/global_inflation_data%202.csv", sep = ',')

#preview data
head(infl)

##          country_name                                  indicator_name X1980
## 1         Afghanistan Annual average inflation (consumer prices) rate  13.4
## 2             Albania Annual average inflation (consumer prices) rate    NA
## 3             Algeria Annual average inflation (consumer prices) rate   9.7
## 4             Andorra Annual average inflation (consumer prices) rate    NA
## 5              Angola Annual average inflation (consumer prices) rate  46.7
## 6 Antigua and Barbuda Annual average inflation (consumer prices) rate  19.0
##   X1981 X1982 X1983 X1984 X1985 X1986 X1987 X1988 X1989 X1990 X1991  X1992
## 1  22.2  18.2  15.9  20.4   8.7  -2.1  18.4  27.5  71.5  47.4  43.8  58.19
## 2    NA    NA    NA    NA    NA    NA    NA    NA    NA  -0.2  35.7 226.00
## 3  14.6   6.6   7.8   6.3  10.4  14.0   5.9   5.9   9.2   9.3  25.9  31.70
## 4    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA
## 5   1.4   1.8   1.8   1.8   1.8   1.8   1.8   1.8   1.8   1.8  85.3 299.10
## 6  11.5   4.2   2.3   3.8   1.0   0.5   3.6   6.8   4.4   6.6   4.5   3.00
##     X1993  X1994  X1995   X1996  X1997  X1998  X1999 X2000 X2001  X2002 X2003
## 1   33.99  20.01   14.0   14.01  14.01  14.01  14.01   0.0 -43.4  51.93 35.66
## 2   85.00  22.60    7.8   12.70  33.20  20.60   0.40   0.0   3.1   5.20  2.40
## 3   20.50  29.00   29.8   18.70   5.70   5.00   2.60   0.3   4.2   1.40  4.30
## 4      NA     NA     NA      NA     NA     NA     NA    NA    NA   3.10  3.10
## 5 1379.50 949.80 2672.2 4146.00 221.50 107.40 248.20 325.0 152.6 108.90 98.20
## 6    3.10   6.50    2.7    3.00   0.40   3.30   1.10  -0.2   1.9   2.40  2.00
##   X2004 X2005 X2006 X2007 X2008 X2009 X2010 X2011 X2012 X2013 X2014 X2015 X2016
## 1 16.36 10.57  6.78  8.68 26.42 -6.81  2.18  11.8  6.44  7.39  4.67 -0.66  4.38
## 2  2.90  2.40  2.40  3.00  3.30  2.20  3.60   3.4  2.00  1.90  1.60  1.90  1.30
## 3  4.00  1.40  2.30  3.70  4.90  5.70  3.90   4.5  8.90  3.30  2.90  4.80  6.40
## 4  2.90  3.50  3.70  2.70  4.30 -1.20  1.70   2.6  1.50  0.50 -0.10 -1.10 -0.40
## 5 43.50 23.00 13.30 12.20 12.50 13.70 14.50  13.5 10.30  8.80  7.30  9.20 30.70
## 6  2.00  2.10  1.80  1.40  5.30 -0.60  3.40   3.5  3.40  1.10  1.10  1.00 -0.50
##   X2017 X2018 X2019 X2020 X2021 X2022 X2023 X2024
## 1  4.98  0.63   2.3  5.44  5.06 13.71   9.1    NA
## 2  2.00  2.00   1.4  1.60  2.00  6.70   4.8   4.0
## 3  5.60  4.30   2.0  2.40  7.20  9.30   9.0   6.8
## 4  2.60  1.00   0.5  0.10  1.70  6.20   5.2   3.5
## 5 29.80 19.60  17.1 22.30 25.80 21.40  13.1  22.3
## 6  2.40  1.20   1.4  1.10  1.60  7.50   5.0   2.9

Tidy the data set

The data set above stores each year in its own column. This is untidy: a year is a value of a variable, not a variable in its own right. Tidying replaces the 45 year columns (1980 through 2024) with a single year column, so that each row is one observation for one country in one year.

#pivot the years to a single column
infl <- infl |>
  pivot_longer(
    cols = !c(country_name, indicator_name),
    names_to = "year",
    values_to = "annual_average_inflation_rate"
  )

#check the data
head(infl)

## # A tibble: 6 × 4
##   country_name indicator_name                       year  annual_average_infla…¹
##   <chr>        <chr>                                <chr>                  <dbl>
## 1 Afghanistan  Annual average inflation (consumer … X1980                   13.4
## 2 Afghanistan  Annual average inflation (consumer … X1981                   22.2
## 3 Afghanistan  Annual average inflation (consumer … X1982                   18.2
## 4 Afghanistan  Annual average inflation (consumer … X1983                   15.9
## 5 Afghanistan  Annual average inflation (consumer … X1984                   20.4
## 6 Afghanistan  Annual average inflation (consumer … X1985                    8.7
## # ℹ abbreviated name: ¹annual_average_inflation_rate
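
As a quick sanity check on the pivot, the long table should hold one row per country and year. The sketch below illustrates the idea on a small hypothetical data frame (toy_wide, toy_long, and their values are made up for illustration and are not part of the data set). The same comparison could be run on infl, assuming every country has all of its year columns.

```r
library(tidyr)

# Hypothetical wide table: 2 countries, 3 year columns (values are made up)
toy_wide <- data.frame(
  country_name = c("A", "B"),
  X1980 = c(1.0, 2.0),
  X1981 = c(1.5, NA),
  X1982 = c(0.5, 3.0)
)

# Same pivot pattern as above: every column except the identifier becomes a row
toy_long <- pivot_longer(
  toy_wide,
  cols = !country_name,
  names_to = "year",
  values_to = "rate"
)

# One row per country-year pair: number of countries times number of year columns
n_expected <- nrow(toy_wide) * (ncol(toy_wide) - 1)
nrow(toy_long) == n_expected
```

Missing values are kept as rows by default, so the count holds even when a country has no reading for a given year.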

Clean the data set

Beyond being untidy, the data needs some cleaning. Each year value carries a leading X, which read.csv added because column names cannot start with a digit. In addition, the indicator_name column carries no additional information, especially now that the pivot has named the value column annual_average_inflation_rate.

#strip the leading X so that only the digits of each year remain
#str_extract returns a character vector directly (str_extract_all returns a list)
infl$year <- str_extract(infl$year, "[0-9]+")

#dropping the column that specifies annual average inflation (consumer prices) rate
infl <- infl |> 
  select(country_name, year, annual_average_inflation_rate)
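
An alternative worth noting is to strip the prefix during the pivot itself. The sketch below uses the names_prefix and names_transform arguments of pivot_longer on a hypothetical toy table (toy_wide and its values are made up). It is one possible variant, not the approach taken in this analysis.

```r
library(tidyr)

# Hypothetical wide table mimicking the X-prefixed year columns
toy_wide <- data.frame(
  country_name = c("A", "B"),
  X1980 = c(1.0, 2.0),
  X1981 = c(1.5, NA)
)

toy_long <- pivot_longer(
  toy_wide,
  cols = !country_name,
  names_to = "year",
  names_prefix = "X",                         # drop the leading X
  names_transform = list(year = as.integer),  # store years as integers
  values_to = "rate"
)

toy_long$year
```

Storing year as an integer makes arithmetic on years easier, but a per-year box plot would then need factor(year) on the x-axis, so the character version used here is a reasonable fit for the plots that follow.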

Analyze the data

The analysis below groups and filters the data to draw out some meaningful insights.

Current state

First, we will take a look at the current state to guide the direction of the next steps.

#look at quantitative summary
summary(infl$annual_average_inflation_rate)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##   -72.70     2.10     4.70    42.07    10.20 65374.10      868

#how does the graph look
ggplot(infl, aes(x = year, y = annual_average_inflation_rate)) +
  geom_point(na.rm = TRUE)

What are the biggest issues with this graph? It does not show anything meaningful. Two things are clear:

1- The gap between the median (4.70) and the maximum (65374.10) compresses almost every point against the bottom of the plot, making the distribution of values functionally unreadable.

2- The x-axis has too many year labels to read.
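
To put a number on the first issue, one option is to compare the median and upper quantiles with the maximum. The sketch below does this on a hypothetical vector (toy_rates is made up for illustration). The same calls could be applied to infl$annual_average_inflation_rate.

```r
# Hypothetical rates: mostly moderate values plus one extreme outlier
toy_rates <- c(2.1, 3.4, 4.7, 5.0, 8.9, 10.2, NA, 65374.1)

# Median, 90th percentile, and maximum, ignoring missing values
qs <- quantile(toy_rates, probs = c(0.5, 0.9, 1), na.rm = TRUE)
qs

# Share of non-missing values that fall below 50
share_below_50 <- mean(toy_rates < 50, na.rm = TRUE)
share_below_50
```

When nearly all values sit below a modest cutoff while the maximum is orders of magnitude larger, a linear axis will always flatten the bulk of the data.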

Remove hyperinflation

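The first analysis removes hyperinflation. EconLib describes the scale of hyperinflation: “Although the threshold is arbitrary, economists generally reserve the term ‘hyperinflation’ to describe episodes when the monthly inflation rate is greater than 50 percent.” That definition concerns monthly rates, while this data set reports annual averages, so we borrow the figure of 50 as a working cutoff rather than a strict definition. We keep only annual inflation rates strictly between -50 and 50; values at or beyond either bound are removed, and filter() also drops missing values. The removed values at or above 50 will be explored later in our analysis.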
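
For a sense of scale, the sketch below compounds a constant 50 percent monthly rate over twelve months. Holding the rate constant for a full year is an illustrative assumption, not part of the EconLib definition. The result is roughly 12,875 percent per year, so a cutoff of 50 on annual rates captures far milder episodes than the quoted monthly threshold does.

```r
# Monthly rate at the quoted hyperinflation threshold (50 percent)
monthly_rate <- 0.50

# Compound over 12 months to get the equivalent annual rate, in percent
annual_equivalent <- ((1 + monthly_rate)^12 - 1) * 100
round(annual_equivalent)
```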

#keep only rates strictly between -50 and 50 (filter also drops NA values)
infl_no_hyp <- infl |>
                  filter(
                    annual_average_inflation_rate < 50,
                    annual_average_inflation_rate > -50
                  )

#boxplot per year: still too many years to read, so we slim it down next
ggplot(infl_no_hyp, aes(x = year, y = annual_average_inflation_rate)) +
  geom_boxplot(aes(color = year), na.rm = TRUE)
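
Because filter() silently drops rows where the condition evaluates to NA, it can help to tally how many rows fall into each category before relying on the filtered table. The sketch below does this on a hypothetical toy table (toy, its values, and the status labels are made up for illustration).

```r
library(dplyr)

# Hypothetical long table (values are made up)
toy <- data.frame(
  country_name = c("A", "A", "B", "B", "C"),
  annual_average_inflation_rate = c(4.7, 120.0, NA, -72.7, 50.0)
)

# Label each row by how the -50 / 50 filter treats it, then count the labels
toy_counts <- toy |>
  mutate(
    status = case_when(
      is.na(annual_average_inflation_rate) ~ "missing",
      annual_average_inflation_rate >= 50  ~ "removed_high",
      annual_average_inflation_rate <= -50 ~ "removed_low",
      TRUE                                 ~ "kept"
    )
  ) |>
  count(status)

toy_counts
```

Note that a value of exactly 50 lands in the removed group, which matches the strict inequalities used in the filter.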

Rounding to decades

The second issue with the graph is that there are too many years for a readable visualization. Grouping years by their decades helps solve this.

#label each year with its decade, e.g. "1985" becomes "80s"
infl_decades <- infl_no_hyp |>
  mutate(year = paste0(str_sub(year, start = 3, end = 3), "0s"))
decades_order <- c('80s','90s','00s','10s','20s')

#find the average of each decade for the graph
infl_decades <- infl_decades |>
  group_by(year) |>
  mutate(
    average_infl = mean(annual_average_inflation_rate)
  ) |>
  ungroup()


#graph
ggplot(infl_decades, aes(x = factor(year, levels = decades_order), y = annual_average_inflation_rate)) +
  geom_boxplot(aes(color = year), na.rm = TRUE) + 
  geom_point(aes(y = average_infl), shape = 21)
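
For readers who prefer a table to a plot, the per-decade averages can also be summarised directly. The sketch below assumes numeric years and derives the decade arithmetically instead of from the third character of the year string. The toy table and the decade, average_infl, and n column names are hypothetical.

```r
library(dplyr)

# Hypothetical long table with numeric years (values are made up)
toy <- data.frame(
  year = c(1980, 1985, 1991, 1999, 2003, 2021),
  annual_average_inflation_rate = c(10, 20, 6, 8, 3, 7)
)

toy_summary <- toy |>
  mutate(decade = floor(year / 10) * 10) |>   # e.g. 1985 becomes 1980
  group_by(decade) |>
  summarise(
    average_infl = mean(annual_average_inflation_rate, na.rm = TRUE),
    n = n()
  )

toy_summary
```

Using summarise instead of mutate returns one row per decade, which avoids drawing the same average point once for every observation.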

Dive into hyperinflation

Now let’s take a look at the hyperinflation values that were removed.

#all hyperinflation
infl_hyp <- infl |>
  filter(
    annual_average_inflation_rate >= 50
  )

ggplot(infl_hyp, aes(x = annual_average_inflation_rate)) +
  ggtitle("All Hyperinflation") + 
  geom_histogram(bins = 100)

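#low hyp: zoom in on rates from 50 to 600
#stored under a new name so the full hyperinflation table is not overwritten
infl_hyp_low <- infl |>
  filter(
    annual_average_inflation_rate >= 50,
    annual_average_inflation_rate <= 600
  )

ggplot(infl_hyp_low, aes(x = annual_average_inflation_rate)) +
  ggtitle("Majority Hyperinflation") + 
  geom_histogram(bins = 15)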

The All Hyperinflation graph shows that the distribution of hyperinflation values is heavily right-skewed, with most values falling in the first bin. Because of this, we drill into that bin: the Majority Hyperinflation histogram shows only the values from 50 to 600.
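
The skew can also be checked numerically instead of by eye. The sketch below computes the share of cases at or below 600 and counts cases in ranges split at 130 and 600, the break points discussed in this write-up. The toy_hyp vector is hypothetical, so the counts are illustrative and say nothing about the real data.

```r
# Hypothetical hyperinflation values (made up, all at or above 50)
toy_hyp <- c(55, 80, 120, 300, 590, 2500, 65374.1)

# Share of cases at or below 600
share_low <- mean(toy_hyp <= 600)
share_low

# Counts per range: [50,130], (130,600], (600,Inf]
bins <- cut(toy_hyp, breaks = c(50, 130, 600, Inf), include.lowest = TRUE)
table(bins)
```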

Analysis summary and conclusions

Variance and averages over decades:

The analysis performed tells a story. The box plot that excludes hyperinflation shows that the spread of annual average inflation rates has mostly narrowed over the decades. The average inflation rate also decreased steadily from the 1980s to the 2010s and then increased in the 2020s. This is a useful indicator of global economic conditions. As EconLib states, “Most economists agree that inflation lowers economic welfare even when allowing for revenue from the inflation tax and the distortion that would be created by alternative taxes that raise the same revenue.” On that reading, the box plot can serve as one measure of global welfare, with a larger number of high-inflation observations suggesting worse welfare in some countries.

Hyperinflation:

With the hyperinflation graphs, we can see how often different levels of hyperinflation have occurred globally over the past 45 years. Hyperinflation arises when a governing power faces pressure to pay money it does not have and responds by effectively printing more. This devalues the currency, so still more money must be printed. The cycle continues until the original currency approaches functional worthlessness. In our analysis above, cases of hyperinflation appear most common at annual rates between 50 and 130 percent, with few cases above 600 percent. This suggests that annual rates far above 130 percent are comparatively rare, although a histogram of pooled values cannot by itself show how inflation in a given country evolves after it reaches that level.