Abstract

This analysis explores the Pizza Sales dataset, focusing on understanding customer preferences and identifying sales patterns for a restaurant chain. This involves cleaning and transforming raw data, generating new insights through calculated variables, and answering key business questions. The findings provide actionable recommendations to optimize sales performance and improve customer satisfaction.

Introduction

In the highly competitive food and beverage industry, understanding sales trends and customer preferences is important for business growth. This analysis examines a comprehensive pizza sales dataset from a restaurant chain. The aim is to uncover insights into customer behavior, sales patterns, and revenue optimization. By addressing business questions like identifying the busiest days, most popular pizza categories, and peak sales times, this analysis provides strategic recommendations for enhancing performance.

Intro to Data

The dataset, titled “Pizza Sales”, was sourced from Kaggle. This dataset contains detailed records of pizza sales transactions from a fictitious pizza chain, providing valuable insights into customer preferences and sales trends.

Key Details:

  • Last Updated: The dataset was last updated on Kaggle 1 year ago.
  • Time-Frame: The dataset captures transactions over one year.

Dataset Overview:

  • Rows and Columns: The dataset comprises 48,620 rows and 12 columns.
  • Granularity: Each row represents a single pizza sold during a transaction.

Key Variables:

  • Order Date and Order Time: Provide temporal details for analyzing sales patterns over time.
  • Pizza Name, Pizza Size, and Pizza Category: Contain descriptive information about the pizzas sold.
  • Quantity, Unit Price, and Total Price: Include transaction-specific financial data.
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.5.1     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
Data_Model_Pizza_Sales <- read_excel("Downloads/Data Model - Pizza Sales.xlsx")
glimpse(Data_Model_Pizza_Sales)
## Rows: 48,620
## Columns: 12
## $ order_details_id  <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1…
## $ order_id          <dbl> 1, 2, 2, 2, 2, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 9, 9, 9…
## $ pizza_id          <chr> "hawaiian_m", "classic_dlx_m", "five_cheese_l", "ita…
## $ quantity          <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ order_date        <dttm> 2015-01-01, 2015-01-01, 2015-01-01, 2015-01-01, 201…
## $ order_time        <dttm> 1899-12-31 11:38:36, 1899-12-31 11:57:40, 1899-12-3…
## $ unit_price        <dbl> 13.25, 16.00, 18.50, 20.75, 16.00, 20.75, 16.50, 20.…
## $ total_price       <dbl> 13.25, 16.00, 18.50, 20.75, 16.00, 20.75, 16.50, 20.…
## $ pizza_size        <chr> "M", "M", "L", "L", "M", "L", "M", "L", "M", "M", "S…
## $ pizza_category    <chr> "Classic", "Classic", "Veggie", "Supreme", "Veggie",…
## $ pizza_ingredients <chr> "Sliced Ham, Pineapple, Mozzarella Cheese", "Peppero…
## $ pizza_name        <chr> "The Hawaiian Pizza", "The Classic Deluxe Pizza", "T…

Data Cleaning

Change the format of Order Time Column
Data_Model_Pizza_Sales <- Data_Model_Pizza_Sales %>%
  mutate(
    sale_time = format(order_time, "%H:%M:%S")
  )
Create a new column called Day of Week using the Date column
Data_Model_Pizza_Sales <- Data_Model_Pizza_Sales %>%
  mutate(day_of_week = weekdays(order_date))
Remove Order Time column
Data_Model_Pizza_Sales <- Data_Model_Pizza_Sales %>%
  select(-order_time)
Create a new column called Total Revenue using the Quantity and Unit Price columns
Data_Model_Pizza_Sales <- Data_Model_Pizza_Sales %>%
  mutate(total_revenue = quantity * unit_price)
Create a new column called Ingredient Count using the Pizza Ingredients column
library(stringr)  
Data_Model_Pizza_Sales <- Data_Model_Pizza_Sales %>%
  mutate(ingredient_count = str_count(pizza_ingredients, ",") + 1)
Remove all unnecessary columns by selecting the most important ones
Data_Model_Pizza_Sales <- Data_Model_Pizza_Sales %>%
  select(
    sale_date = order_date,  
    sale_time,               
    day_of_week,             
    total_revenue,          
    pizza_name,             
    pizza_category,         
    pizza_size,              
    ingredient_count         
  )
glimpse(Data_Model_Pizza_Sales)
## Rows: 48,620
## Columns: 8
## $ sale_date        <dttm> 2015-01-01, 2015-01-01, 2015-01-01, 2015-01-01, 2015…
## $ sale_time        <chr> "11:38:36", "11:57:40", "11:57:40", "11:57:40", "11:5…
## $ day_of_week      <chr> "Thursday", "Thursday", "Thursday", "Thursday", "Thur…
## $ total_revenue    <dbl> 13.25, 16.00, 18.50, 20.75, 16.00, 20.75, 16.50, 20.7…
## $ pizza_name       <chr> "The Hawaiian Pizza", "The Classic Deluxe Pizza", "Th…
## $ pizza_category   <chr> "Classic", "Classic", "Veggie", "Supreme", "Veggie", …
## $ pizza_size       <chr> "M", "M", "L", "L", "M", "L", "M", "L", "M", "M", "S"…
## $ ingredient_count <dbl> 3, 5, 6, 6, 8, 5, 6, 3, 6, 6, 6, 6, 8, 8, 5, 5, 6, 6,…

Business Questions & Analysis

1. What are the busiest days for pizza sales?

sales_by_day <- Data_Model_Pizza_Sales %>%
  group_by(day_of_week) %>%
  summarise(total_revenue = sum(total_revenue)) %>%
  arrange(desc(total_revenue))
ggplot(sales_by_day, aes(x = reorder(day_of_week, total_revenue), y = total_revenue)) +
  geom_bar(stat = "identity") +
  labs(title = "Total Revenue by Day of the Week", x = "Day of the Week", y = "Total Revenue") +
  theme_minimal()

Observations:

  • Fridays generate the highest revenue, making them the busiest day for sales.
  • Mondays and Sundays experience lower sales compared to other days of the week.
  • Sales steadily increase from Monday to Friday, suggesting growing demand throughout the week.

Recommendations:

  • Introduce special deals or discounts on slower days like Mondays to increase foot traffic - and online orders.
  • Leverage Fridays by offering premium-priced combo meals or family packs to maximize revenue.

3. Which pizza size generates the most revenue?

sales_by_size <- Data_Model_Pizza_Sales %>%
  group_by(pizza_size) %>%
  summarise(total_revenue = sum(total_revenue)) %>%
  arrange(desc(total_revenue))
ggplot(sales_by_size, aes(x = reorder(pizza_size, total_revenue), y = total_revenue, fill = pizza_size)) +
  geom_bar(stat = "identity") +
  labs(title = "Total Revenue by Pizza Size", x = "Pizza Size", y = "Total Revenue") +
  theme_minimal()

Observations:

  • Large pizzas generate the most revenue, likely due to higher prices and popularity with groups and families.
  • Medium pizzas also perform well, appealing to smaller groups or individuals who want a balance of value and portion size.
  • Small pizzas contribute the least, potentially due to lower demand or limited use cases (e.g., solo dining).

Recommendations:

  • Highlight Large pizzas in group-focused promotions or discounts, especially during weekends and holidays.
  • Continue offering Medium pizzas as a versatile option for individual and group customers.
  • Introduce value deals for Small pizzas to boost their sales, such as pairing them with beverages or sides.

4. What time of day sees the highest sales?

Data_Model_Pizza_Sales <- Data_Model_Pizza_Sales %>%
  mutate(hour = hour(ymd_hms(paste(sale_date, sale_time))))

sales_by_hour <- Data_Model_Pizza_Sales %>%
  group_by(hour) %>%
  summarise(total_revenue = sum(total_revenue)) %>%
  arrange(desc(total_revenue))
ggplot(sales_by_hour, aes(x = hour, y = total_revenue)) +
  geom_line() +
  labs(title = "Total Revenue by Hour of Day", x = "Hour of Day", y = "Total Revenue") +
  theme_minimal()

Observations:

  • Peak sales occur during lunch hours (12 PM–2 PM) and dinner hours (5 PM–8 PM).
  • Minimal sales are observed during early morning and late-night hours, suggesting low customer demand.
  • The evening peak aligns with typical dining patterns, while lunch attracts working professionals and casual diners.

Recommendations:

  • Optimize staffing and inventory for dinner and lunch hours to handle peak demand efficiently.
  • Introduce late-night deals to attract customers during off-peak times.
  • Experiment with breakfast pizza options or morning coffee pairings to explore early-morning sales opportunities.

Conclusion

This analysis revealed significant patterns in pizza sales.

By leveraging these insights, the restaurant chain can improve sales strategies, enhance customer satisfaction, and maximize profitability.