Tidyverse Create

Author

Long Lin

Introduction

This vignette shows how to use Tidyverse packages such as dplyr and ggplot2 to organize data and display the findings in a chart. The data source is the Global Grocery Inflation data set from Kaggle.

Source: https://www.kaggle.com/datasets/waddahali/global-grocery-inflation-20252026/data?select=breakfast+basket.csv

Reading in the data with Tidyverse

We first upload the Kaggle data to GitHub and then read it with read_csv(), a readr function that is attached by library(tidyverse), pointing it at the raw GitHub link. Then we use glimpse(), available through dplyr, to inspect the columns of the data set and their types.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

url <- "https://raw.githubusercontent.com/longflin/DATA-607/refs/heads/main/TIDYVERSE2026/GlobalGroceryInflation.csv"

grocery_inflation <- read_csv(
  file = url,
  show_col_types = FALSE,
  progress = FALSE
)

glimpse(grocery_inflation)

Rows: 10,248
Columns: 27
$ City                       <chr> "New York", "New York", "New York", "New Yo…
$ Country                    <chr> "United States", "United States", "United S…
$ ISO_Country_Code           <chr> "USA", "USA", "USA", "USA", "USA", "USA", "…
$ Region                     <chr> "North America", "North America", "North Am…
$ Continent                  <chr> "North America", "North America", "North Am…
$ Month                      <chr> "2025-10", "2025-11", "2025-12", "2026-01",…
$ Month_Name                 <chr> "October 2025", "November 2025", "December …
$ Item                       <chr> "Milk (1 Liter)", "Milk (1 Liter)", "Milk (…
$ Item_Key                   <chr> "Milk_1L", "Milk_1L", "Milk_1L", "Milk_1L",…
$ Item_Category              <chr> "Dairy", "Dairy", "Dairy", "Dairy", "Dairy"…
$ Quantity                   <dbl> 1, 1, 1, 1, 1, 1, 500, 500, 500, 500, 500, …
$ Unit                       <chr> "liter", "liter", "liter", "liter", "liter"…
$ Price_Local                <dbl> 1.32, 1.33, 1.34, 1.30, 1.35, 1.33, 4.38, 4…
$ Currency_Local             <chr> "USD", "USD", "USD", "USD", "USD", "USD", "…
$ Price_USD                  <dbl> 1.32, 1.33, 1.34, 1.30, 1.35, 1.33, 4.38, 4…
$ Exchange_Rate              <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ YoY_Inflation_Estimate_Pct <dbl> 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3…
$ Inflation_Source           <chr> "USDA Food Price Outlook / IMF WEO 2026", "…
$ FAO_Index_Value            <dbl> 127.1, 126.2, 125.8, 124.2, 125.3, 126.0, 1…
$ FAO_Index_Date             <chr> "October 2025", "November 2025", "December …
$ FAO_YoY_Change_Pct         <dbl> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,…
$ USDA_All_Food_Forecast_Pct <dbl> 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1…
$ USDA_Food_At_Home_Pct      <dbl> 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5…
$ Data_Collection_Date       <date> 2026-03-20, 2026-03-20, 2026-03-20, 2026-0…
$ Source_URL                 <chr> "https://www.numbeo.com/food-prices/in/New-…
$ Population_Estimate        <dbl> 8336817, 8336817, 8336817, 8336817, 8336817…
$ Breakfast_Basket_USD       <dbl> 16.18, 16.40, 17.02, 16.80, 16.82, 16.91, 1…
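The read step can also be tried without a network connection. The sketch below is not part of the original workflow: it feeds read_csv() a small literal CSV wrapped in I() and declares the column types explicitly instead of relying on guessing. The three-row extract and the name sample_prices are hypothetical, and the London price is made up for illustration.

```r
library(readr)
library(dplyr)

# Hypothetical three-row extract; the London value is invented for illustration.
csv_text <- I("City,Month,Price_USD
New York,2025-10,1.32
New York,2025-11,1.33
London,2025-10,1.25")

# Declare each column type up front so a surprise in the file
# is reported as a parsing problem instead of changing a column type silently.
sample_prices <- read_csv(
  file = csv_text,
  col_types = cols(
    City = col_character(),
    Month = col_character(),
    Price_USD = col_double()
  )
)

glimpse(sample_prices)
```

Passing col_types also makes show_col_types = FALSE unnecessary, because readr only prints the column specification when it had to guess.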

Next, we use dplyr's select() to keep only the columns of interest. This vignette focuses on the year-over-year inflation estimate recorded for milk in six target cities: New York, Los Angeles, London, Paris, Beijing, and Tokyo. To get there, we use filter() to keep the rows where Item equals "Milk (1 Liter)" and City is one of the values in the target_cities vector.

# Cities to compare
target_cities <- c("New York", "Los Angeles", "London", "Paris", "Beijing", "Tokyo")

milk_inflation <- grocery_inflation %>%
  # Keep only the columns needed for the comparison
  select(City, Country, Item, Month, YoY_Inflation_Estimate_Pct) %>%
  # Keep the milk rows for the target cities
  filter(Item == "Milk (1 Liter)", City %in% target_cities)

head(milk_inflation)

# A tibble: 6 × 5
  City     Country       Item           Month   YoY_Inflation_Estimate_Pct
  <chr>    <chr>         <chr>          <chr>                        <dbl>
1 New York United States Milk (1 Liter) 2025-10                        4.3
2 New York United States Milk (1 Liter) 2025-11                        4.3
3 New York United States Milk (1 Liter) 2025-12                        4.3
4 New York United States Milk (1 Liter) 2026-01                        4.3
5 New York United States Milk (1 Liter) 2026-02                        4.3
6 New York United States Milk (1 Liter) 2026-03                        4.3
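Because head() only shows the first six rows, it cannot confirm that every target city survived the filter. One possible check, sketched here on a hypothetical toy tibble (the rows, the "Eggs (12)" item, and the names toy, wanted, rows_per_city, and missing_cities are all assumptions for illustration), is to count the rows per city and list any requested city that matched nothing.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for the full data set.
toy <- tribble(
  ~City,      ~Item,            ~Month,
  "New York", "Milk (1 Liter)", "2025-10",
  "New York", "Milk (1 Liter)", "2025-11",
  "Tokyo",    "Milk (1 Liter)", "2025-10",
  "Tokyo",    "Eggs (12)",      "2025-10",
  "Berlin",   "Milk (1 Liter)", "2025-10"
)

wanted <- c("New York", "Tokyo", "Paris")

# Same filter pattern as in the vignette: one item, a set of cities.
toy_milk <- toy %>%
  filter(Item == "Milk (1 Liter)", City %in% wanted)

# How many rows did each city contribute?
rows_per_city <- toy_milk %>%
  count(City, name = "Rows")

# Which requested cities matched nothing (for example, a spelling mismatch)?
missing_cities <- setdiff(wanted, toy_milk$City)

rows_per_city
missing_cities
```

In this toy case Paris is reported as missing because no row carries that city name. On the real data, an empty missing_cities vector would indicate that all six names matched.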

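Here, we use group_by() and summarize() to collapse the monthly rows into one row per city, with a new column named Average_YoY that holds the mean of that city's monthly estimates. Setting na.rm = TRUE drops missing values before averaging.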

milk_inflation_group <- milk_inflation %>%
  group_by(City) %>% 
  summarize(
    Average_YoY = mean(YoY_Inflation_Estimate_Pct, na.rm = TRUE)
  )

head(milk_inflation_group)

# A tibble: 6 × 2
  City        Average_YoY
  <chr>             <dbl>
1 Beijing             2.4
2 London              3.2
3 Los Angeles         4.3
4 New York            4.3
5 Paris               3.2
6 Tokyo               2.4
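As an alternative sketch, dplyr 1.1.0 and later accept a .by argument in summarize(), which groups for that single call and returns an ungrouped tibble. The example below uses a hypothetical toy tibble (the NA row and the Months_Used column are assumptions added for illustration) and also counts how many non-missing months went into each average.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; the NA is included to show the effect of na.rm.
toy_inflation <- tribble(
  ~City,    ~Month,    ~YoY_Inflation_Estimate_Pct,
  "Tokyo",  "2025-10", 2.4,
  "Tokyo",  "2025-11", 2.4,
  "London", "2025-10", 3.2,
  "London", "2025-11", NA
)

toy_summary <- toy_inflation %>%
  summarize(
    # Mean of the available monthly estimates
    Average_YoY = mean(YoY_Inflation_Estimate_Pct, na.rm = TRUE),
    # Number of non-missing months behind each mean
    Months_Used = sum(!is.na(YoY_Inflation_Estimate_Pct)),
    .by = City
  )

toy_summary
```

One difference to keep in mind: group_by() followed by summarize() sorts the result by the grouping column, while .by keeps the groups in order of first appearance, so Tokyo comes before London here.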

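To visualize the result, we use ggplot2 to draw a horizontal bar chart. reorder() sorts the cities by Average_YoY, coord_flip() turns the bars sideways, and geom_text() prints each value at the end of its bar.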

# ggplot2 is already attached by library(tidyverse); loading it again is harmless
library(ggplot2)

ggplot(milk_inflation_group, aes(x = reorder(City, Average_YoY),
                                 y = Average_YoY,
                                 fill = City)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = round(Average_YoY, 1)), 
            hjust = -0.2, 
            size = 3.5) + 
  coord_flip() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) + 
  labs(
    title = "Average Milk Inflation by City for October 2025 to March 2026",
    x = "City",
    y = "Average Year over Year Inflation (%)"
  ) +
  theme_minimal()
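A variation on the same chart, offered as a sketch and not as the author's method: ggplot2 3.3.0 and later can draw horizontal bars directly when the discrete variable is mapped to y, so coord_flip() is not needed. The data frame city_averages below re-enters the six averages from the summary table printed earlier so that the block runs on its own; the names city_averages and milk_plot are assumptions.

```r
library(ggplot2)

# The six city averages, copied from the summary table shown earlier.
city_averages <- data.frame(
  City = c("Beijing", "London", "Los Angeles", "New York", "Paris", "Tokyo"),
  Average_YoY = c(2.4, 3.2, 4.3, 4.3, 3.2, 2.4)
)

# Map the value to x and the city to y; geom_col() then draws horizontal bars.
milk_plot <- ggplot(city_averages,
                    aes(x = Average_YoY,
                        y = reorder(City, Average_YoY),
                        fill = City)) +
  geom_col(show.legend = FALSE) +
  # hjust = -0.2 places each label just past the end of its bar
  geom_text(aes(label = round(Average_YoY, 1)),
            hjust = -0.2,
            size = 3.5) +
  # Extra room on the right so the labels are not clipped
  scale_x_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(
    x = "Average Year over Year Inflation (%)",
    y = "City"
  ) +
  theme_minimal()
```

Typing milk_plot at the console draws the chart. With this mapping the axis labels in labs() refer to the axes as they appear on screen, which avoids the swapped x and y that coord_flip() introduces.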

Conclusion

The Tidyverse is a powerful toolkit for managing and transforming data and for turning the result into easy-to-read charts from a data source of your choice. In this example, dplyr and ggplot2 made it straightforward to reduce a 10,248-row data set to a six-row summary and display it as a bar chart.