Below I import the CSV file containing global inflation data from GitHub, where I saved the file.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
fileURL <- 'https://raw.githubusercontent.com/stoybis/DATA607Repo/main/global_inflation_data.csv'
inflationData <- read.csv(url(fileURL))
head(inflationData)
## country_name indicator_name X1980
## 1 Afghanistan Annual average inflation (consumer prices) rate 13.4
## 2 Albania Annual average inflation (consumer prices) rate NA
## 3 Algeria Annual average inflation (consumer prices) rate 9.7
## 4 Andorra Annual average inflation (consumer prices) rate NA
## 5 Angola Annual average inflation (consumer prices) rate 46.7
## 6 Antigua and Barbuda Annual average inflation (consumer prices) rate 19.0
## X1981 X1982 X1983 X1984 X1985 X1986 X1987 X1988 X1989 X1990 X1991 X1992
## 1 22.2 18.2 15.9 20.4 8.7 -2.1 18.4 27.5 71.5 47.4 43.8 58.19
## 2 NA NA NA NA NA NA NA NA NA -0.2 35.7 226.00
## 3 14.6 6.6 7.8 6.3 10.4 14.0 5.9 5.9 9.2 9.3 25.9 31.70
## 4 NA NA NA NA NA NA NA NA NA NA NA NA
## 5 1.4 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8 85.3 299.10
## 6 11.5 4.2 2.3 3.8 1.0 0.5 3.6 6.8 4.4 6.6 4.5 3.00
## X1993 X1994 X1995 X1996 X1997 X1998 X1999 X2000 X2001 X2002 X2003
## 1 33.99 20.01 14.0 14.01 14.01 14.01 14.01 0.0 -43.4 51.93 35.66
## 2 85.00 22.60 7.8 12.70 33.20 20.60 0.40 0.0 3.1 5.20 2.40
## 3 20.50 29.00 29.8 18.70 5.70 5.00 2.60 0.3 4.2 1.40 4.30
## 4 NA NA NA NA NA NA NA NA NA 3.10 3.10
## 5 1379.50 949.80 2672.2 4146.00 221.50 107.40 248.20 325.0 152.6 108.90 98.20
## 6 3.10 6.50 2.7 3.00 0.40 3.30 1.10 -0.2 1.9 2.40 2.00
## X2004 X2005 X2006 X2007 X2008 X2009 X2010 X2011 X2012 X2013 X2014 X2015 X2016
## 1 16.36 10.57 6.78 8.68 26.42 -6.81 2.18 11.8 6.44 7.39 4.67 -0.66 4.38
## 2 2.90 2.40 2.40 3.00 3.30 2.20 3.60 3.4 2.00 1.90 1.60 1.90 1.30
## 3 4.00 1.40 2.30 3.70 4.90 5.70 3.90 4.5 8.90 3.30 2.90 4.80 6.40
## 4 2.90 3.50 3.70 2.70 4.30 -1.20 1.70 2.6 1.50 0.50 -0.10 -1.10 -0.40
## 5 43.50 23.00 13.30 12.20 12.50 13.70 14.50 13.5 10.30 8.80 7.30 9.20 30.70
## 6 2.00 2.10 1.80 1.40 5.30 -0.60 3.40 3.5 3.40 1.10 1.10 1.00 -0.50
## X2017 X2018 X2019 X2020 X2021 X2022 X2023 X2024
## 1 4.98 0.63 2.3 5.44 5.06 13.71 9.1 NA
## 2 2.00 2.00 1.4 1.60 2.00 6.70 4.8 4.0
## 3 5.60 4.30 2.0 2.40 7.20 9.30 9.0 6.8
## 4 2.60 1.00 0.5 0.10 1.70 6.20 5.2 3.5
## 5 29.80 19.60 17.1 22.30 25.80 21.40 13.1 22.3
## 6 2.40 1.20 1.4 1.10 1.60 7.50 5.0 2.9
The data is not in a tidy format because each row holds multiple observations. Every year column (1980, 1981, 1982, etc.) is a separate observation of a country's average annual inflation, so each country-year pair should have its own row.
First, I remove the X from the column names.
yearCols <- 3:ncol(inflationData)
colnames(inflationData)[yearCols] <- str_replace(colnames(inflationData)[yearCols], "^X", "")
head(inflationData)
## country_name indicator_name 1980 1981
## 1 Afghanistan Annual average inflation (consumer prices) rate 13.4 22.2
## 2 Albania Annual average inflation (consumer prices) rate NA NA
## 3 Algeria Annual average inflation (consumer prices) rate 9.7 14.6
## 4 Andorra Annual average inflation (consumer prices) rate NA NA
## 5 Angola Annual average inflation (consumer prices) rate 46.7 1.4
## 6 Antigua and Barbuda Annual average inflation (consumer prices) rate 19.0 11.5
## 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994
## 1 18.2 15.9 20.4 8.7 -2.1 18.4 27.5 71.5 47.4 43.8 58.19 33.99 20.01
## 2 NA NA NA NA NA NA NA NA -0.2 35.7 226.00 85.00 22.60
## 3 6.6 7.8 6.3 10.4 14.0 5.9 5.9 9.2 9.3 25.9 31.70 20.50 29.00
## 4 NA NA NA NA NA NA NA NA NA NA NA NA NA
## 5 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8 85.3 299.10 1379.50 949.80
## 6 4.2 2.3 3.8 1.0 0.5 3.6 6.8 4.4 6.6 4.5 3.00 3.10 6.50
## 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005
## 1 14.0 14.01 14.01 14.01 14.01 0.0 -43.4 51.93 35.66 16.36 10.57
## 2 7.8 12.70 33.20 20.60 0.40 0.0 3.1 5.20 2.40 2.90 2.40
## 3 29.8 18.70 5.70 5.00 2.60 0.3 4.2 1.40 4.30 4.00 1.40
## 4 NA NA NA NA NA NA NA 3.10 3.10 2.90 3.50
## 5 2672.2 4146.00 221.50 107.40 248.20 325.0 152.6 108.90 98.20 43.50 23.00
## 6 2.7 3.00 0.40 3.30 1.10 -0.2 1.9 2.40 2.00 2.00 2.10
## 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
## 1 6.78 8.68 26.42 -6.81 2.18 11.8 6.44 7.39 4.67 -0.66 4.38 4.98 0.63
## 2 2.40 3.00 3.30 2.20 3.60 3.4 2.00 1.90 1.60 1.90 1.30 2.00 2.00
## 3 2.30 3.70 4.90 5.70 3.90 4.5 8.90 3.30 2.90 4.80 6.40 5.60 4.30
## 4 3.70 2.70 4.30 -1.20 1.70 2.6 1.50 0.50 -0.10 -1.10 -0.40 2.60 1.00
## 5 13.30 12.20 12.50 13.70 14.50 13.5 10.30 8.80 7.30 9.20 30.70 29.80 19.60
## 6 1.80 1.40 5.30 -0.60 3.40 3.5 3.40 1.10 1.10 1.00 -0.50 2.40 1.20
## 2019 2020 2021 2022 2023 2024
## 1 2.3 5.44 5.06 13.71 9.1 NA
## 2 1.4 1.60 2.00 6.70 4.8 4.0
## 3 2.0 2.40 7.20 9.30 9.0 6.8
## 4 0.5 0.10 1.70 6.20 5.2 3.5
## 5 17.1 22.30 25.80 21.40 13.1 22.3
## 6 1.4 1.10 1.60 7.50 5.0 2.9
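The X prefix appears because read.csv() passes column names through make.names() by default (check.names = TRUE), and syntactically valid R names cannot start with a digit. As an alternative to renaming afterwards, here is a minimal sketch on a small inline CSV. The two rows and the names `csvText`, `withPrefix`, and `noPrefix` are made up for illustration and are not taken from the dataset.

```r
# Made-up two-row CSV with the same header shape as the inflation file.
csvText <- "country_name,indicator_name,1980,1981
A,rate,1.5,2.5
B,rate,,3.5"

# Default behaviour: names go through make.names(), so "1980" becomes "X1980".
withPrefix <- read.csv(text = csvText)

# check.names = FALSE leaves the headers exactly as written in the file.
noPrefix <- read.csv(text = csvText, check.names = FALSE)

print(colnames(withPrefix))
print(colnames(noPrefix))
```

Non-syntactic names such as 1980 need backticks when referenced directly, which is one reason to pivot them into a year column soon after import.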
Then I pivot the data frame to a longer format so that each year is its own observation in a separate row. I also convert the years and values to type numeric and the country names to factors.
inflationDataTidy <- pivot_longer(inflationData,
  cols = !c('country_name', 'indicator_name'), names_to = 'year', values_to = 'value')
inflationDataTidy$country_name <- as.factor(inflationDataTidy$country_name)
inflationDataTidy$year <- as.numeric(inflationDataTidy$year)
inflationDataTidy$value <- as.numeric(inflationDataTidy$value)
head(inflationDataTidy, n = 10)
## # A tibble: 10 × 4
## country_name indicator_name year value
## <fct> <chr> <dbl> <dbl>
## 1 Afghanistan Annual average inflation (consumer prices) rate 1980 13.4
## 2 Afghanistan Annual average inflation (consumer prices) rate 1981 22.2
## 3 Afghanistan Annual average inflation (consumer prices) rate 1982 18.2
## 4 Afghanistan Annual average inflation (consumer prices) rate 1983 15.9
## 5 Afghanistan Annual average inflation (consumer prices) rate 1984 20.4
## 6 Afghanistan Annual average inflation (consumer prices) rate 1985 8.7
## 7 Afghanistan Annual average inflation (consumer prices) rate 1986 -2.1
## 8 Afghanistan Annual average inflation (consumer prices) rate 1987 18.4
## 9 Afghanistan Annual average inflation (consumer prices) rate 1988 27.5
## 10 Afghanistan Annual average inflation (consumer prices) rate 1989 71.5
The data frame is now tidy: each country-year observation is its own row.
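The prefix removal, pivot, and year conversion can also be folded into a single pivot_longer() call. The sketch below is only an illustration. It uses a made-up toy data frame (`wide`, with invented values) rather than the real file, assumes the year columns still carry the X prefix, and converts year to integer rather than numeric, which is a choice of this example.

```r
library(tidyr)

# Toy wide table mimicking the structure of inflationData (values are made up).
wide <- data.frame(
  country_name = c("A", "B"),
  indicator_name = "rate",
  X1980 = c(1.5, NA),
  X1981 = c(2.5, 3.5)
)

long <- pivot_longer(
  wide,
  cols = starts_with("X"),                    # the year columns
  names_to = "year",
  names_prefix = "X",                         # strip the leading X from the names
  names_transform = list(year = as.integer),  # store years as integers
  values_to = "value"
)

print(long)
```

Each country now contributes one row per year, and missing values stay as NA rather than being dropped.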
One of the questions is to compare average annual inflation for countries in the same region. I am curious to see how inflation compares across the US, Canada, and Mexico, the three largest countries in North America.
Below I filter the tidy data frame for these countries.
UsCanMex <- inflationDataTidy |> filter(country_name %in% c('Canada', 'Mexico', 'United States'))
head(UsCanMex, n = 10)
## # A tibble: 10 × 4
## country_name indicator_name year value
## <fct> <chr> <dbl> <dbl>
## 1 Canada Annual average inflation (consumer prices) rate 1980 10.2
## 2 Canada Annual average inflation (consumer prices) rate 1981 12.5
## 3 Canada Annual average inflation (consumer prices) rate 1982 10.8
## 4 Canada Annual average inflation (consumer prices) rate 1983 5.8
## 5 Canada Annual average inflation (consumer prices) rate 1984 4.3
## 6 Canada Annual average inflation (consumer prices) rate 1985 4
## 7 Canada Annual average inflation (consumer prices) rate 1986 4.2
## 8 Canada Annual average inflation (consumer prices) rate 1987 4.4
## 9 Canada Annual average inflation (consumer prices) rate 1988 4
## 10 Canada Annual average inflation (consumer prices) rate 1989 5
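Because the filter matches exact strings, a misspelled or differently formatted name (for example a hypothetical 'United States of America') would silently return no rows for that country. The sketch below checks the requested names against the factor levels before filtering and drops the unused levels afterwards. It runs on a made-up tibble, and the names `toy`, `wanted`, `notFound`, and `northAmerica` are illustrative.

```r
library(dplyr)

# Made-up data standing in for inflationDataTidy.
toy <- tibble(
  country_name = factor(c("Canada", "France", "Mexico", "United States")),
  year = 1980,
  value = c(1, 2, 3, 4)
)

wanted <- c("Canada", "Mexico", "United States")

# Names requested but absent from the data; ideally this is empty.
notFound <- setdiff(wanted, levels(toy$country_name))

# Keep the wanted countries and drop factor levels that no longer occur.
northAmerica <- toy |>
  filter(country_name %in% wanted) |>
  droplevels()

print(notFound)
print(levels(northAmerica$country_name))
```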
Below I graph the average annual inflation rates over time.
ggplot(UsCanMex, aes(x = year, y = value, color = country_name)) +
  geom_line() + ggtitle('Average annual inflation rate over time')
All three countries had higher inflation in the early 1980s than in the 1990s and 2000s, but Mexico’s inflation was meaningfully higher than that of the US and Canada. This may be because Mexico’s economy is still developing, whereas the US and Canada are developed economies.
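A visual comparison like this can be backed up with a numeric summary, for instance the mean inflation rate per country and decade. The following is a minimal sketch on made-up values for two hypothetical countries; `toy`, `decadeMeans`, and `mean_inflation` are names introduced here, and none of the numbers come from the dataset.

```r
library(dplyr)

# Made-up values for two hypothetical countries; not taken from the dataset.
toy <- tibble(
  country_name = rep(c("A", "B"), each = 4),
  year = rep(c(1980, 1985, 1990, 1995), times = 2),
  value = c(10, 20, 4, NA, 50, 70, 20, 10)
)

decadeMeans <- toy |>
  mutate(decade = floor(year / 10) * 10) |>   # 1985 -> 1980, 1995 -> 1990
  group_by(country_name, decade) |>
  summarise(mean_inflation = mean(value, na.rm = TRUE), .groups = "drop")

print(decadeMeans)
```

Applied to the filtered data frame, the same pipeline would give one row per country and decade, making differences between periods easier to state precisely.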
Below, I repeat the analysis for Spain, France, Germany, Italy, and Portugal, five of the larger countries in Western Europe.
filterList <- c('Spain', 'France', 'Germany', 'Italy', 'Portugal')
westernEurope <- inflationDataTidy |> filter(country_name %in% filterList)
head(westernEurope, n = 10)
## # A tibble: 10 × 4
## country_name indicator_name year value
## <fct> <chr> <dbl> <dbl>
## 1 France Annual average inflation (consumer prices) rate 1980 13.1
## 2 France Annual average inflation (consumer prices) rate 1981 13.3
## 3 France Annual average inflation (consumer prices) rate 1982 12
## 4 France Annual average inflation (consumer prices) rate 1983 9.5
## 5 France Annual average inflation (consumer prices) rate 1984 7.7
## 6 France Annual average inflation (consumer prices) rate 1985 5.8
## 7 France Annual average inflation (consumer prices) rate 1986 2.5
## 8 France Annual average inflation (consumer prices) rate 1987 3.3
## 9 France Annual average inflation (consumer prices) rate 1988 2.7
## 10 France Annual average inflation (consumer prices) rate 1989 6.6
Below I graph the average annual inflation rates over time.
ggplot(westernEurope, aes(x = year, y = value, color = country_name)) +
  geom_line() + ggtitle('Average annual inflation rate over time')
Similar to the North America graph, these countries saw high inflation in the 1980s. However, after 1980, the average annual inflation rates for these countries are higher than those of the US and Canada, even though all of these countries would be considered developed economies.
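Comparing the two regions is easier when both are drawn on a common scale. One possible approach, sketched here on made-up stand-ins for the two filtered data frames (`groupA` and `groupB` are hypothetical, with invented values), is to stack them with a region label and draw one panel per region:

```r
library(dplyr)
library(ggplot2)

# Made-up stand-ins for the two regional data frames.
groupA <- tibble(country_name = "A", year = c(1980, 1990, 2000), value = c(10, 5, 2))
groupB <- tibble(country_name = "B", year = c(1980, 1990, 2000), value = c(12, 6, 3))

# Stack the two groups and record which region each row came from.
combined <- bind_rows(
  `North America` = groupA,
  `Western Europe` = groupB,
  .id = "region"
)

# One panel per region; facet_wrap() shares the y axis across panels by default.
p <- ggplot(combined, aes(x = year, y = value, color = country_name)) +
  geom_line() +
  facet_wrap(~ region) +
  ggtitle("Average annual inflation rate over time")

print(p)
```

With a shared y axis, a single country with very high inflation can flatten the other lines, so it may be worth plotting with and without that country.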