Plato’s Pizza

Introduction

Plato Pizza is a fictional Greek-inspired pizza place in New Jersey, United states of America.Things are going OK here at Plato’s, but there’s room for improvement. We’ve been collecting transaction data for the past year, but really haven’t been able to put it to good use, Mario Maven (Manager, Plato’s Pizza) want the following business questions to be answered from the insight derived from their dataset

The data contains four different dataset that contains datas about their order history and pizza types, They are as follows

About the dataset

This dataset contains 4 tables in CSV format

load packages needed R packages for this analysis

library(tidyverse)
library(lubridate)
library(scales)

load the dataset

order_details <- read.csv("order_details.csv")
orders <- read.csv("orders.csv")
pizza_types <- read.csv("pizza_types.csv")
pizzas <- read.csv("pizzas.csv")

view the individual dataset columns name

colnames(order_details)
## [1] "order_details_id" "order_id"         "pizza_id"         "quantity"
colnames(orders)
## [1] "order_id" "date"     "time"
colnames(pizza_types)
## [1] "pizza_type_id" "name"          "category"      "ingredients"
colnames(pizzas)
## [1] "pizza_id"      "pizza_type_id" "size"          "price"

Preview individual dataset data type

str(order_details)
## 'data.frame':    48620 obs. of  4 variables:
##  $ order_details_id: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ order_id        : int  1 2 2 2 2 2 3 3 4 5 ...
##  $ pizza_id        : chr  "hawaiian_m" "classic_dlx_m" "five_cheese_l" "ital_supr_l" ...
##  $ quantity        : int  1 1 1 1 1 1 1 1 1 1 ...
str(orders)
## 'data.frame':    21350 obs. of  3 variables:
##  $ order_id: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ date    : chr  "2015-01-01" "2015-01-01" "2015-01-01" "2015-01-01" ...
##  $ time    : chr  "11:38:36" "11:57:40" "12:12:28" "12:16:31" ...
str(pizza_types)
## 'data.frame':    32 obs. of  4 variables:
##  $ pizza_type_id: chr  "bbq_ckn" "cali_ckn" "ckn_alfredo" "ckn_pesto" ...
##  $ name         : chr  "The Barbecue Chicken Pizza" "The California Chicken Pizza" "The Chicken Alfredo Pizza" "The Chicken Pesto Pizza" ...
##  $ category     : chr  "Chicken" "Chicken" "Chicken" "Chicken" ...
##  $ ingredients  : chr  "Barbecued Chicken, Red Peppers, Green Peppers, Tomatoes, Red Onions, Barbecue Sauce" "Chicken, Artichoke, Spinach, Garlic, Jalapeno Peppers, Fontina Cheese, Gouda Cheese" "Chicken, Red Onions, Red Peppers, Mushrooms, Asiago Cheese, Alfredo Sauce" "Chicken, Tomatoes, Red Peppers, Spinach, Garlic, Pesto Sauce" ...
str(pizzas)
## 'data.frame':    96 obs. of  4 variables:
##  $ pizza_id     : chr  "bbq_ckn_s" "bbq_ckn_m" "bbq_ckn_l" "cali_ckn_s" ...
##  $ pizza_type_id: chr  "bbq_ckn" "bbq_ckn" "bbq_ckn" "cali_ckn" ...
##  $ size         : chr  "S" "M" "L" "S" ...
##  $ price        : num  12.8 16.8 20.8 12.8 16.8 ...

Joining of individual dataset

Joining of orders table and order_details table using leftjoin function

order_details_v2 <- left_join(order_details,orders, by= c("order_id"))

joining of the pizzas table and pizza_type table using leftjoin function

pizzas_v2 <- left_join(pizzas,pizza_types, by = c("pizza_type_id"))

view the individual dataset columns name

colnames(pizzas_v2)
## [1] "pizza_id"      "pizza_type_id" "size"          "price"        
## [5] "name"          "category"      "ingredients"
colnames(order_details_v2)
## [1] "order_details_id" "order_id"         "pizza_id"         "quantity"        
## [5] "date"             "time"

Joining of the pizzas_v2 data and order_details_v2 with the pizza_id dataframe using the leftjoin function

maven_pizza <- left_join(order_details_v2,pizzas_v2, by = c("pizza_id"))

Dataset columns name

colnames(maven_pizza)
##  [1] "order_details_id" "order_id"         "pizza_id"         "quantity"        
##  [5] "date"             "time"             "pizza_type_id"    "size"            
##  [9] "price"            "name"             "category"         "ingredients"

All data sets have been combined and ready for processing, New dataframe for months, days of the week and days

maven_pizza$months <- format(as.Date(maven_pizza$date),"%B")
maven_pizza$days <- format(as.Date(maven_pizza$date), "%d")
maven_pizza$day_of_the_week <- format(as.Date(maven_pizza$date),"%A")

Dataset columns name

colnames(maven_pizza)
##  [1] "order_details_id" "order_id"         "pizza_id"         "quantity"        
##  [5] "date"             "time"             "pizza_type_id"    "size"            
##  [9] "price"            "name"             "category"         "ingredients"     
## [13] "months"           "days"             "day_of_the_week"

Verification for an potential outlier in the dataset

table(maven_pizza$price)
## 
##  9.75  10.5    11    12 12.25  12.5 12.75 13.25  14.5 14.75 15.25    16 16.25 
##   739  1001   570  5600   830  3328  2507   473   391   577   712  4474  1114 
##  16.5 16.75  17.5 17.95  18.5 20.25  20.5 20.75    21 23.65  25.5 35.95 
##  4040  4294   381  1273  1359  3032  1998  8685   190   480   544    28
table(maven_pizza$quantity)
## 
##     1     2     3     4 
## 47693   903    21     3

Analysis of data set to use our insight to gain business insight and improve productivity

Total revenue

maven_pizza %>% 
  summarise(Revenue = sum(quantity*price))
##   Revenue
## 1  817860

Total Order

maven_pizza %>% 
  summarise(Orders = sum(quantity))
##   Orders
## 1  49574

Average Order price

maven_pizza %>% 
  summarise(Avg_order_price = mean(quantity*price))
##   Avg_order_price
## 1        16.82147

Number of pizza types

maven_pizza %>% 
  summarise(types_of_pizza = n_distinct(name))
##   types_of_pizza
## 1             32

What days of the week do we tend to be busiest both for number of orders and revenue

  • Busiest day of the week for number of orders placed
maven_pizza %>% 
  mutate(week_days = wday(date, label = TRUE)) %>% 
  group_by(week_days,category) %>% 
  summarise(busiest_days_by_orders = sum(quantity)) %>% 
  arrange(-busiest_days_by_orders) %>% 
  ggplot(aes(x = week_days, y = busiest_days_by_orders, fill = category))+
  geom_col()+
  theme(text = element_text(size = 9),element_line(size =1))+
  labs(title = "Number of orders placed within the days of the Week")
## `summarise()` has grouped output by 'week_days'. You can override using the
## `.groups` argument.

From the insight, Thursday, Saturday and friday has the highest number of orders within the week days ranging from 7478,7493 and 8242 orders respectively. While on the other hands, Sunday and Monday has the lowest number of orders within the week days ranging from 6035 and 6485 respectively

  • Busiest day of the week for revenue generated
maven_pizza %>% 
  mutate(week_days = wday(date, label = TRUE)) %>% 
  group_by(week_days,category) %>% 
  summarise(busiest_days_by_revenue = sum(quantity*price)) %>% 
  arrange(-busiest_days_by_revenue) %>% 
  ggplot(aes(x = week_days, y = busiest_days_by_revenue, fill = category))+
  geom_col()+
    scale_y_continuous(labels = function(x) format(x, scientific = FALSE))+
  theme(text = element_text(size = 9),element_line(size =1))+
  labs(title = "Amount of revenue generated within the days of the Week")
## `summarise()` has grouped output by 'week_days'. You can override using the
## `.groups` argument.

From the insight, Thursday, Saturday and friday generated the highest number of revenue within the week days ranging from $123528,$123182 and $136074 orders respectively. While on the other hands, Sunday and Monday has the lowest number of orders within the week days ranging from $99204 and $107330 respectively

Percentage of Order by pizza categories

 maven_pizza %>%
  group_by(category) %>%
  summarize(Frequency = n()) %>%
  mutate(Percent = paste(round(Frequency / dims[1] * 100,0),"%"))

Graphical representation of number of pizza ordered by pizza category

ggplot(maven_pizza, aes(x = category)) +  
        geom_bar(aes(y = (..count..)/sum(..count..))) + 
        scale_y_continuous(labels=percent)+
        theme(legend.title = element_text(size = 12),
        legend.text = element_text(size = 12))+
        labs(title = "Percentage of pizza ordered by their category")

From our graph, classic pizza category has the most number of order placed which represent 14579(30%) of total order while chicken pizza category has the lowest number of order placed at 10815(22%) of the total orders

Graphical representation of number of pizza ordered by pizza size

ggplot(maven_pizza, aes(x = size)) +  
        geom_bar(aes(y = (..count..)/sum(..count..))) + 
        scale_y_continuous(labels=percent)+
        theme(legend.title = element_text(size = 12),
        legend.text = element_text(size = 12))+
        labs(title = "Percentage of pizza ordered by their size")

Best and worst selling pizza

  • Top 10 best selling pizza
maven_pizza %>% 
  group_by(name) %>% 
  summarise(total_price = sum(quantity*price)) %>% 
  arrange(-total_price) %>% 
  print(n = 10) 
## # A tibble: 32 × 2
##    name                         total_price
##    <chr>                              <dbl>
##  1 The Thai Chicken Pizza            43434.
##  2 The Barbecue Chicken Pizza        42768 
##  3 The California Chicken Pizza      41410.
##  4 The Classic Deluxe Pizza          38180.
##  5 The Spicy Italian Pizza           34831.
##  6 The Southwest Chicken Pizza       34706.
##  7 The Italian Supreme Pizza         33477.
##  8 The Hawaiian Pizza                32273.
##  9 The Four Cheese Pizza             32266.
## 10 The Sicilian Pizza                30940.
## # … with 22 more rows
  • Top 10 worst selling pizza
maven_pizza %>% 
  group_by(name) %>% 
  summarise(total_price = sum(quantity*price)) %>% 
  arrange(total_price) %>% 
  print(n = 10)
## # A tibble: 32 × 2
##    name                         total_price
##    <chr>                              <dbl>
##  1 The Brie Carre Pizza              11588.
##  2 The Green Garden Pizza            13956.
##  3 The Spinach Supreme Pizza         15278.
##  4 The Mediterranean Pizza           15360.
##  5 The Spinach Pesto Pizza           15596 
##  6 The Calabrese Pizza               15934.
##  7 The Italian Vegetables Pizza      16019.
##  8 The Soppressata Pizza             16426.
##  9 The Chicken Pesto Pizza           16702.
## 10 The Chicken Alfredo Pizza         16900.
## # … with 22 more rows

Visualization of pizza sales by their revenue

maven_pizza %>% 
  group_by(name) %>% 
  summarise(total_revenue = sum(quantity*price)) %>% 
  ggplot(aes(x = name, y = total_revenue))+
  geom_col()+
  coord_flip() +
  labs(title = "Pizza names and its total revenue")

Revenue by Months by category of pizza

maven_pizza %>% 
  group_by(months) %>% 
  summarise(monthly_revenue = sum(quantity*price)) %>% 
  arrange(-monthly_revenue) %>% 
  ggplot(aes(x = months, y = monthly_revenue ))+
  geom_col()+
   theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
  labs(title = "Monthly revenue by pizza category")

From the graph, revenue generation ranges from above 60k to 72k per month, The month of July recorded the highest amount in revenue at $72,558 while October recorded the lowest amount in revenue at $64028

How well are we utilizing our seating capacity

The seat available presently, seems to be enough from our analyses, we currently have 15 Tables and 60 Chairs . Averaging one order per table, we only have about 80% of the capacity only 2 to 7 times in a month, and likewise, Averaging one pizza per chair,we only recorded only once that the number of orders (64orders) was over the number of available seats (60 chairs)