pizza

About the Dataset

A year’s worth of sales from a fictitious pizza place, including the date and time of each order and the pizzas served, with additional details on the type, size, quantity, price, and ingredients

Table Field Description

Table Field Description
orders order_id Unique identifier for each order placed by a table
orders date Date the order was placed (entered into the system prior to cooking & serving)
orders time Time the order was placed (entered into the system prior to cooking & serving)
order_details order_details_id Unique identifier for each pizza placed within each order (pizzas of the same type and size are kept in the same row, and the quantity increases)
order_details order_id Foreign key that ties the details in each order to the order itself
order_details pizza_id Foreign key that ties the pizza ordered to its details, like size and price
order_details quantity Quantity ordered for each pizza of the same type and size
pizzas pizza_id Unique identifier for each pizza (constituted by its type and size)
pizzas pizza_type_id Foreign key that ties each pizza to its broader pizza type
pizzas size Size of the pizza (Small, Medium, Large, X Large, or XX Large)
pizzas price Price of the pizza in USD
pizza_types pizza_type_id Unique identifier for each pizza type
pizza_types name Name of the pizza as shown in the menu
pizza_types category Category that the pizza fall under in the menu (Classic, Chicken, Supreme, or Veggie)
pizza_types ingredients Comma-delimited ingredients used in the pizza as shown in the menu (they all include Mozzarella Cheese, even if not specified; and they all include Tomato Sauce, unless another sauce is specified)

Data Processing

  • Joining the tables
  • Changing the data format
  • Adding some new converted format column
  • Adding the Revenue column by multiplying the quantity by price
  • generate a data summary to conduct descriptive analysis and identify any potential outliers in the data
  • Remove a unique identifier from the merged tables as it does not add value to the data and may clutter it.
pizzas_order <- inner_join(order_details, orders, by = "order_id")
pizzas_order <- inner_join(pizzas_order, pizzas, by = "pizza_id")
pizzas_df <- inner_join(pizzas_order,pizza_types, by = "pizza_type_id")
pizzas_df<- pizzas_df %>% 
  mutate(revenue =  quantity * price, date_converted = ymd(date)) %>% 
  mutate(converted_month = format(ymd(date), format = "%B"),
         converted_day = format(ymd(date), format ="%A")) %>% 
  mutate(converted_hour = hour(hms(time))) %>% 
  select(- c(order_details_id, order_id, pizza_id, pizza_type_id))

head(pizzas_df)
##   quantity       date     time size price                      name category
## 1        1 2015-01-01 11:38:36    M 13.25        The Hawaiian Pizza  Classic
## 2        1 2015-01-01 11:57:40    M 16.00  The Classic Deluxe Pizza  Classic
## 3        1 2015-01-01 11:57:40    L 18.50     The Five Cheese Pizza   Veggie
## 4        1 2015-01-01 11:57:40    L 20.75 The Italian Supreme Pizza  Supreme
## 5        1 2015-01-01 11:57:40    M 16.00        The Mexicana Pizza   Veggie
## 6        1 2015-01-01 11:57:40    L 20.75    The Thai Chicken Pizza  Chicken
##                                                                                    ingredients
## 1                                                     Sliced Ham, Pineapple, Mozzarella Cheese
## 2                                         Pepperoni, Mushrooms, Red Onions, Red Peppers, Bacon
## 3 Mozzarella Cheese, Provolone Cheese, Smoked Gouda Cheese, Romano Cheese, Blue Cheese, Garlic
## 4                      Calabrese Salami, Capocollo, Tomatoes, Red Onions, Green Olives, Garlic
## 5  Tomatoes, Red Peppers, Jalapeno Peppers, Red Onions, Cilantro, Corn, Chipotle Sauce, Garlic
## 6                           Chicken, Pineapple, Tomatoes, Red Peppers, Thai Sweet Chilli Sauce
##   revenue date_converted converted_month converted_day converted_hour
## 1   13.25     2015-01-01         January      Thursday             11
## 2   16.00     2015-01-01         January      Thursday             11
## 3   18.50     2015-01-01         January      Thursday             11
## 4   20.75     2015-01-01         January      Thursday             11
## 5   16.00     2015-01-01         January      Thursday             11
## 6   20.75     2015-01-01         January      Thursday             11
tail(pizzas_df)
##       quantity       date     time size price                        name
## 48615        1 2015-12-31 21:14:37    L 20.75 The Southwest Chicken Pizza
## 48616        1 2015-12-31 21:23:10    M 16.75   The Chicken Alfredo Pizza
## 48617        1 2015-12-31 21:23:10    L 17.95       The Four Cheese Pizza
## 48618        1 2015-12-31 21:23:10    S 12.00        The Napolitana Pizza
## 48619        1 2015-12-31 22:09:54    L 20.25          The Mexicana Pizza
## 48620        1 2015-12-31 23:02:05    S 12.75  The Barbecue Chicken Pizza
##       category
## 48615  Chicken
## 48616  Chicken
## 48617   Veggie
## 48618  Classic
## 48619   Veggie
## 48620  Chicken
##                                                                                             ingredients
## 48615      Chicken, Tomatoes, Red Peppers, Red Onions, Jalapeno Peppers, Corn, Cilantro, Chipotle Sauce
## 48616                         Chicken, Red Onions, Red Peppers, Mushrooms, Asiago Cheese, Alfredo Sauce
## 48617 Ricotta Cheese, Gorgonzola Piccante Cheese, Mozzarella Cheese, Parmigiano Reggiano Cheese, Garlic
## 48618                                             Tomatoes, Anchovies, Green Olives, Red Onions, Garlic
## 48619       Tomatoes, Red Peppers, Jalapeno Peppers, Red Onions, Cilantro, Corn, Chipotle Sauce, Garlic
## 48620               Barbecued Chicken, Red Peppers, Green Peppers, Tomatoes, Red Onions, Barbecue Sauce
##       revenue date_converted converted_month converted_day converted_hour
## 48615   20.75     2015-12-31        December      Thursday             21
## 48616   16.75     2015-12-31        December      Thursday             21
## 48617   17.95     2015-12-31        December      Thursday             21
## 48618   12.00     2015-12-31        December      Thursday             21
## 48619   20.25     2015-12-31        December      Thursday             22
## 48620   12.75     2015-12-31        December      Thursday             23
skimr::skim(pizzas_df)
Data summary
Name pizzas_df
Number of rows 48620
Number of columns 13
_______________________
Column type frequency:
character 8
Date 1
numeric 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
date 0 1 10 10 0 358 0
time 0 1 8 8 0 16382 0
size 0 1 1 3 0 5 0
name 0 1 15 42 0 32 0
category 0 1 6 7 0 4 0
ingredients 0 1 28 97 0 32 0
converted_month 0 1 3 9 0 12 0
converted_day 0 1 6 9 0 7 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
date_converted 0 1 2015-01-01 2015-12-31 2015-06-28 358

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
quantity 0 1 1.02 0.14 1.00 1.00 1.0 1.00 4.00 ▇▁▁▁▁
price 0 1 16.49 3.62 9.75 12.75 16.5 20.25 35.95 ▇▇▆▁▁
revenue 0 1 16.82 4.44 9.75 12.75 16.5 20.50 83.00 ▇▁▁▁▁
converted_hour 0 1 15.91 3.15 9.00 13.00 16.0 18.00 23.00 ▁▇▆▆▂
Data Summary
  • The data summary show that the date have a total of 48620 observation and 13 variables with different data type.
  • All over the observation there is no missing value at all.
  • The “revenue” column has an average value of 16.82, and the data points in this column tend to deviate from the mean by approximately 4.44 units on average.
  • Based on the quartile values of the revenue, it appears that there may be some high-value orders that are significantly higher than the majority of orders. The maximum value of 83.00 seems to be much higher than the third quartile value of 20.50, indicating that there may be an order or orders in a day that stand out in terms of revenue.
pizzas_df %>% 
  filter(revenue == 83.00) %>% 
  select(date,time, name, size, quantity, price, revenue)
##         date     time                         name size quantity price revenue
## 1 2015-09-19 14:16:35 The California Chicken Pizza    L        4 20.75      83
  • Upon further investigation, it has been determined that the unusually high revenue observed on September 19, 2015, around 2:16 pm is due to a large order size of pizza with a quantity of 4 and a unit price of 20.75. This explains why the revenue for this particular order is significantly higher compared to others. By reviewing the data in the relevant columns, it has been confirmed that this outlier in the revenue column is not a mistake, but rather a valid data point.

  • This additional analysis has provided insight into the reason behind the high revenue value for this particular order, and it confirms that the outlier is not an error in the data. Understanding the factors contributing to outliers in the data, such as large order sizes or high-priced items, is important in gaining a comprehensive understanding of the data and making informed decisions based on accurate insights.

1. How many customers do we have each day? Are there any peak hours?

costumer_per_Day <- pizzas_df %>% 
  group_by(converted_day) %>% 
  summarise(total_costumer = n()) %>% 
  mutate(converted_day = fct_reorder(converted_day, total_costumer))

 costumer_per_Day_graph <-  costumer_per_Day %>% 
 ggplot(aes(x = converted_day, y = total_costumer, fill = total_costumer)) +
  geom_col(show.legend = F, width = 0.7) +
  geom_text(aes(label = total_costumer), nudge_y = 500)+
  scale_fill_distiller(palette = "BuGn", direction = 1) +
  coord_flip()+
  labs(
    title = "No. of Costumer per Day for Whole Year",
    x ="",
    y = ""
  )+
  theme_minimal()+
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold")
  )

costumer_per_Hour <- pizzas_df %>% 
  group_by(converted_hour) %>% 
  summarise(total_costumer = n())

costumer_per_Hour_graph <- ggplot(costumer_per_Hour) +
  aes(x = converted_hour, y = total_costumer) +
  geom_step(size = 0.8, colour = "#228B22") +
  labs(
    title = "No. of Costumer per Hour for Whole Year",
    x ="",
    y = ""
  )+
  theme_minimal()+
  scale_x_continuous(breaks = c(9,11,13,15,17,19,21,23), 
                     labels = c("9","11","13", "15", "17", "19", "21", "23"))+
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold") 
  )

 costumer_per_Day_graph | costumer_per_Hour_graph

1.1 Graph Explanation

  • According to the graph, sales are highest on Friday, followed by Saturday. It appears that sales increase as the weekend approaches. To make sure that actually, if the weekends are approaching, the customer is increasing, we will try to categorize by month to the following graph.

  • In a full-day sale, there are two periods when customer traffic is at its highest: from 12 a.m to 2 p.m and from 5 p.m to 7 p.m. Also, we will categorize the customer number by hour for each month to confirm that the peak is actually during the previously specified hours.

costumer_per_month_graph <- function(data, month){
  data %>% 
  filter(converted_month == month) %>% 
  group_by(converted_day) %>% 
  summarise(total_costumer = n()) %>% 
   mutate(converted_day = fct_reorder(converted_day, total_costumer)) %>% 
    
# Graph 
 ggplot(aes(x = converted_day, y = total_costumer, fill = total_costumer)) +
  geom_col(show.legend = F, width = 0.7) +
  geom_text(aes(label = total_costumer), nudge_y = -50, size = 3.5)+
  scale_fill_distiller(palette = "BuGn", direction = 1) +
  coord_flip()+
  labs(
    title = month,
    x ="",
    y = ""
  )+
  theme_minimal()+
  theme(
    plot.title = element_text(hjust = 0.5, size = 15)
  )
}

Jan <- costumer_per_month_graph(pizzas_df, "January")
Feb <- costumer_per_month_graph(pizzas_df, "February")
March <- costumer_per_month_graph(pizzas_df, "March")
April <- costumer_per_month_graph(pizzas_df, "April")
May <- costumer_per_month_graph(pizzas_df, "May")
June <- costumer_per_month_graph(pizzas_df, "June")
July <- costumer_per_month_graph(pizzas_df, "July")
Aug <- costumer_per_month_graph(pizzas_df, "August")
Sep <- costumer_per_month_graph(pizzas_df, "September")
Oct <- costumer_per_month_graph(pizzas_df, "October")
Nov <- costumer_per_month_graph(pizzas_df, "November")
Dec <- costumer_per_month_graph(pizzas_df, "December")

## Plot the Graph
(Jan | Feb | March | April ) / ( May | June | July | Aug ) / (Sep | Oct | Nov | Dec) +
   plot_annotation(title = "Monthly Days Costumer", 
                  theme = theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold")))

1.2 Graph Explanation

  • Upon aggregating the data by month, it appears that Fridays and Thursdays have the highest number of customers on most days with higher customer traffic in some months. This could indicate that these two weekdays are particularly popular among customers for visiting the establishment. Additionally, Mondays and Tuesdays also show some customer activity, albeit to a lesser extent with only one and two instances respectively, suggesting that there may be some customers who prefer to visit on the early weekdays.

  • Interestingly, despite the popularity of Fridays and Saturdays, the previous graph shows that weekdays still have a larger customer peak compared to weekends. This could be surprising, as it challenges the assumption that all customers prefer Fridays and Saturdays for their visits. This finding could indicate that there are a significant number of customers who prefer visiting on weekdays, possibly due to various factors such as work schedules, personal preferences, or other reasons.

  • Furthermore, based on the data, it is anticipated that around a month from now, Thursdays and the following days will continue to have the highest volume of customers. This trend could be attributed to the consistent popularity of Thursdays and the subsequent days in terms of customer traffic. On the other hand, Sundays are expected to have the lowest volume of customers, which could be due to lower customer demand or reduced business hours on weekends.

step_graph_customer_per_hour <- function(data, month){
  costumer_per_Hour <- data %>% 
  filter(converted_month == month) %>% 
  group_by(converted_hour) %>% 
  summarise(total_costumer = n())

ggplot(costumer_per_Hour) +
  aes(x = converted_hour, y = total_costumer) +
  geom_step(size = 0.8, colour = "#228B22") +
  labs(
    title = month,
    x = "Hour",
    y = ""
  )+
  theme_minimal()+
  theme(
        plot.title = element_text(hjust = 0.5, size = 15)
  )+
  scale_x_continuous(breaks = c(9,11,13,15,17,19,21,23), 
                     labels = c("9","11","13", "15", "17", "19", "21", "23"))
}

Jan <- step_graph_customer_per_hour(pizzas_df, "January")
Feb <- step_graph_customer_per_hour(pizzas_df, "February")
March <- step_graph_customer_per_hour(pizzas_df, "March")
April <- step_graph_customer_per_hour(pizzas_df, "April")
May <- step_graph_customer_per_hour(pizzas_df, "May")
June <- step_graph_customer_per_hour(pizzas_df, "June")
July <- step_graph_customer_per_hour(pizzas_df, "July")
Aug <- step_graph_customer_per_hour(pizzas_df, "August")
Sep <- step_graph_customer_per_hour(pizzas_df, "September")
Oct <- step_graph_customer_per_hour(pizzas_df, "October")
Nov <- step_graph_customer_per_hour(pizzas_df, "November")
Dec <- step_graph_customer_per_hour(pizzas_df, "December")

(Jan | Feb | March | April ) / ( May | June | July | Aug ) / (Sep | Oct | Nov | Dec) +
  plot_annotation(title = "Monthly Hours Costumer", 
                  theme = theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold")))

1.3 Graph Explanation

  • Upon categorizing the graph by month, it is observed that customers tend to have a higher peak in orders from around 12 a.m to 2 p.m and from 5 p.m to 7 p.m. This observation confirms that the previous graph indicating higher customer activity during certain time periods is accurate. This information could be valuable in understanding customer behavior patterns and planning operational strategies accordingly. Further analysis of customer preferences and trends during these peak hours could provide insights into optimizing sales and improving customer satisfaction.

2. How many pizzas are typically in an order? Do we have any bestsellers?

pizzas_df %>%
  group_by(converted_hour) %>%
  summarise(total_order = sum(quantity)) %>%
  ggplot(aes(x = converted_hour, y = total_order)) +
  geom_line(color = "grey") +
  geom_point(shape = 21, color = "black", fill = "219C4B", size = 6) +
  geom_text(aes(label = total_order), vjust = -1.5, size = 3) +  # Add labels for total_order
  scale_x_continuous(breaks = seq(9, 23, by = 1)) +
  labs(title = "Total Order per Hour",
       y = "Total Order",
       x = "Hour") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

2.1 Graph Explanation
  • The data shows two distinct peaks in customer activity and order placement. The first peak occurs between 12 a.m. and 1 p.m., while the second peak is observed between 5 p.m. and 6 p.m. These time periods correspond to the busiest hours for order placement and are characterized by a higher influx of customers.

  • During the early morning and lunchtime hours (12 a.m. to 1 p.m.), there is a significant surge in orders. This can be attributed to various factors such as breakfast orders, mid-morning snacks, and lunchtime deliveries. This period represents a key opportunity for businesses to cater to the needs of their customers by ensuring efficient service and timely deliveries.

  • The second peak in customer activity occurs in the late afternoon to early evening (5 p.m. to 6 p.m.). This time frame is commonly associated with the end of the workday and the start of the evening rush hour. Many people tend to place orders for dinner during this period, either for delivery or pickup on their way home. The convenience of ordering food during this time allows individuals to save time on meal preparation and enjoy a hassle-free dining experience.

# Define the desired order of months
month_order <- c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")

pizzas_df %>%
  mutate(converted_month = factor(converted_month, levels = month_order),
         converted_month = parse_date_time(paste0(converted_month, "2015"), orders = "B Y")) %>%
  group_by(converted_month) %>%
  summarise(total_order = sum(quantity)) %>%
  ggplot(aes(x = converted_month, y = total_order)) +
  geom_line(color = "grey") +
  geom_point(shape = 21, color = "black", fill = "219C4B", size = 6) +
  geom_text(aes(label = total_order), vjust = -1.5, size = 3) +
  labs(title = "Total Order per Month",
       y = "Total Order",
       x = "Month") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) 

2.2 Graph Explanation
  • The graph represents the monthly total number of orders over a given period. Analyzing the data, we observe that July has the highest number of orders, followed by May and November, while February, September, and October have the lowest number of orders.

  • Examining the trend from January to July, we notice a gradual decrease in the number of orders per month. This could be attributed to various factors such as seasonal variations, changes in consumer behavior, or specific events occurring during that period. Despite the overall decreasing trend, it’s important to note that there may be fluctuations within each month.

  • Moving on to the period from July to October, we observe a sudden decrease in the number of orders compared to July. This drop could be a result of specific factors influencing consumer demand, such as vacations, holidays, or other external circumstances that impact purchasing patterns. However, it is interesting to note that the number of orders increases again in November, suggesting a possible shift in consumer behavior or an event that drives higher order volumes during that month.

  • Finally, from November to December, there is another decrease in the number of orders. This decline could be associated with the holiday season, as people tend to focus more on family gatherings and celebrations rather than ordering food or other products.

pizzas_df %>% 
  group_by(name) %>% 
  summarise(total_count = sum(quantity)) %>%
  mutate(name = fct_reorder(name, total_count)) %>% 
  ggplot(aes(x = name, y = total_count)) +
  geom_segment(aes(xend = name, yend = 0), color = "green") +
  geom_point(size = 3, color = "darkgreen") +
  labs(
    title = "Total Order per Pizza Name",
    x = "",
    y = ""
  ) +
  theme_light() +
  coord_flip() +
  theme(
    panel.grid.major.y = element_blank(),
    panel.border = element_blank(),
    axis.ticks.y = element_blank(),
        axis.text.y = element_text(size = 12, face = "bold"),
    plot.title = element_text(hjust = 0.5, size = 20,  face = "bold")
  )

2.3 Graph Explanation
  • The data indicates the total number of orders for each pizza name, revealing that the Classic Deluxe pizza has the highest number of orders, while the Brie Carre pizza has the lowest number of orders.

  • The popularity of the Classic Deluxe pizza can be attributed to its appealing combination of ingredients, which likely appeals to a wide range of customers. This pizza may feature a variety of toppings, such as pepperoni, mushrooms, onions, and bell peppers, among others, which are commonly enjoyed by many pizza enthusiasts. The consistent high demand for the Classic Deluxe pizza suggests that it is a customer favorite and a reliable choice for those seeking a delicious and satisfying pizza experience.

  • On the other hand, the Brie Carre pizza has the lowest number of orders. This may be due to its unique combination of ingredients, which might cater to a more specific or niche market. The flavor profile of this pizza likely appeals to individuals who enjoy the distinct taste of Brie cheese and other complementary toppings. While the Brie Carre pizza may not have mass appeal compared to other options on the menu, it can still be a valuable addition for customers with more adventurous palates or a preference for gourmet ingredients.

pizza_order_month <- function(data, month){
  data %>% 
  filter(converted_month == month) %>% 
  group_by(name) %>% 
  summarise(total_count = sum(quantity)) %>%
  mutate(name = gsub("\\bThe\\b|\\bPizza\\b", "",name)) %>% 
  top_n(5) %>% 
  mutate(name = fct_reorder(name, total_count)) %>%
    
  # Graph 
 ggplot(aes(x = name, y = total_count, fill = total_count)) +
  geom_col(show.legend = F, width = 0.7) +
  geom_text(aes(label = total_count), nudge_y = -30, size = 5)+
  scale_fill_distiller(palette = "BuGn", direction = 1) +
  coord_flip()+
  labs(
    title = month,
    x ="",
    y = ""
  )+
  theme_minimal()+
  theme(
    plot.title = element_text(hjust = 0.5, size = 15),
    axis.text.y = element_text(size = 12, face = "bold")
  )
}

Jan <- pizza_order_month(pizzas_df, "January")
Feb <- pizza_order_month(pizzas_df, "February")
Mar <- pizza_order_month(pizzas_df, "March")
April <- pizza_order_month(pizzas_df, "April")
May <- pizza_order_month(pizzas_df, "May")
June <- pizza_order_month(pizzas_df, "June")
July <- pizza_order_month(pizzas_df, "July")
Aug <- pizza_order_month(pizzas_df, "August")
Sep <- pizza_order_month(pizzas_df, "September")
Oct <- pizza_order_month(pizzas_df, "October")
Nov <- pizza_order_month(pizzas_df, "November")
Dec <- pizza_order_month(pizzas_df, "December")

(Jan | Feb | Mar | April ) / ( May | June | July | Aug ) / (Sep | Oct | Nov | Dec) +
  plot_annotation(title = "Monthly Top 5 Total Order per Pizza Name", 
                  theme = theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold")))

2.4 Graph Explanation
  • Analyzing the data on monthly total orders per pizza name, we can observe that the Pepperoni pizza consistently has the highest number of orders throughout the entire month. This indicates that Pepperoni is a popular choice among customers and remains in high demand regardless of the specific month.

  • In addition to Pepperoni, there are several other pizza names that stand out for specific months. These include Barbecue Chicken, Hawaiian, Classic Deluxe, and California Chicken. These pizzas likely have their own unique combination of toppings and flavors that attract a significant number of customers during certain months. It’s worth noting that these pizzas might align with seasonal preferences, regional tastes, or promotional activities that drive their popularity during specific times of the year.

  • Furthermore, it is interesting to observe that some pizza names consistently appear as top orders across multiple months. This suggests that certain pizzas have a steady and loyal customer base, regardless of the specific time of year. These pizzas might have gained a reputation for their taste, quality, or value, leading customers to choose them consistently throughout the year.

3. How much money did we make this year? Can we identify any seasonality in the sales?

## revenue all over the year
sum(pizzas_df$revenue)
## [1] 817860

Total Revenue

  • Based on the data for the entire year, the revenue generated amounts to $817,860.
revenue_data <- pizzas_df %>% 
  group_by(date_converted, converted_month) %>% 
  summarise(revenue = sum(revenue))  %>% 
  mutate(season  = case_when(
    converted_month %in% c("January", "February", "March") ~ "1st quarter",
    converted_month %in% c("April", "May", "June" ) ~ "2nd quarter",
    converted_month %in% c("July","August", "September") ~ "3rd quarter",
    converted_month %in% c("October", "November","December") ~ "4th quarter"
  ))
## `summarise()` has grouped output by 'date_converted'. You can override using
## the `.groups` argument.
revenue_data %>% 
ggplot() +
  aes(x = date_converted, y = revenue) +
  geom_line(size = 0.5, colour = "#0C4C8A") +
  annotate(geom="text", x=as.Date("2015-11-27"), y=4700, size = 3, 
             label="The highest revenue recorded for\n September 27, 2015, was $4,422.45") +
  annotate(geom="point", x=as.Date("2015-11-27"), y=4422.45, size=10, shape=21, fill="transparent") +
  theme_minimal() +
  labs(
    x= "Date", title = "Revenue All Over the Year"
  ) +
  scale_y_continuous(limits = c(1000, 5000))+
    theme(
      plot.title = element_text(hjust = .5, size = 20L, face = "bold")
    )

3.1 Graph Explanation
  • The line graph represents the revenue generated over the course of a year. It appears that the revenue line does not exhibit a consistent upward or downward trend. Instead, it shows fluctuations with some days having lower revenue and others having higher revenue.

  • One notable point on the graph is September 27, 2015, which stands out as having the highest revenue of $4,422.45. This indicates a particularly successful day in terms of generating income.

  • To better understand the seasonality in the graph, it may be challenging to identify. In order to address this, categorizing the data into quarterly or monthly periods. This would involve grouping the revenue data into specific time intervals, such as three-month quarters or individual months. Analyzing the data in this way can potentially reveal patterns or trends that may not be immediately apparent when viewing the overall year-long graph.

revenue_month <- function(data, month){
  data %>% 
  filter(converted_month == month) %>% 
    
  # Graph 
 ggplot(aes(x = date_converted, y = revenue)) +
  geom_line(size = 1, colour = "#0C4C8A") + 
  labs(
    title = month,
    x ="",
    y = " "
  )+
  theme_minimal()+
  theme(
    plot.title = element_text(hjust = 0.5, size = 15)
  )
}
Jan <- revenue_month(revenue_data, "January")
Feb <- revenue_month(revenue_data, "February")
Mar <- revenue_month(revenue_data, "March")
April <- revenue_month(revenue_data, "April")
May <- revenue_month(revenue_data, "May")
June <- revenue_month(revenue_data, "June")
July <- revenue_month(revenue_data, "July")
Aug <- revenue_month(revenue_data, "August")
Sep <- revenue_month(revenue_data, "September")
Oct <- revenue_month(revenue_data, "October")
Nov <- revenue_month(revenue_data, "November")
Dec <- revenue_month(revenue_data, "December")

(Jan | Feb | Mar | April ) / ( May | June | July | Aug ) / (Sep | Oct | Nov | Dec) +
  plot_annotation(title = "Monthly Revenue", 
                  theme = theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold")))

##### 3.2 Graph Explanation

  • The line graph illustrates the monthly revenue generated over the course of a year. Each line represents the revenue pattern for a specific month, and it is evident that the lines do not follow a consistent or repetitive pattern throughout the year. This indicates that there is no clear seasonality in the revenue for the entire year.

  • The absence of seasonality implies that the revenue does not exhibit a regular cycle or predictable fluctuations tied to specific months or seasons. Instead, the revenue figures fluctuate independently from month to month, without any discernible pattern.

quarterly_revenue <- function(data, quarter){
  data %>% 
  filter(season == quarter) %>% 
  # Graph 
 ggplot(aes(x = date_converted, y = revenue)) +
  geom_line(size = 1, colour = "#0C4C8A") + 
  labs(
    title = quarter,
    x ="",
    y = ""
  )+
  theme_minimal()+
  theme(
    plot.title = element_text(hjust = 0.5, size = 15),
    axis.text.y = element_text(size = 12, face = "bold")
  )
}

first <- quarterly_revenue(revenue_data, "1st quarter")
second <- quarterly_revenue(revenue_data, "2nd quarter")
third <- quarterly_revenue(revenue_data, "3rd quarter")
fourth <- quarterly_revenue(revenue_data, "4th quarter")
 

(first | second) / (third | fourth) +
  plot_annotation(title = "Quarterly Revenue", 
                  theme = theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold")))

3.3 Graph Explanation
  • The line graph also reveals that the revenue pattern is not consistent when analyzed on a quarterly basis. This further confirms that there is no clear, repetitive pattern or seasonality in the revenue for the entire year, whether viewed monthly or quarterly.

  • Analyzing revenue trends on a quarterly basis provides a broader perspective by grouping data into three-month intervals. Despite this broader view, the graph still demonstrates that the revenue lines for each quarter do not exhibit a consistent pattern or follow a predictable trend throughout the year.

# Statistical Tests
kruskal_wallis <- kruskal.test(revenue_data$revenue, g = revenue_data$converted_month)

# Print the test results
print(kruskal_wallis)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  revenue_data$revenue and revenue_data$converted_month
## Kruskal-Wallis chi-squared = 5.6972, df = 11, p-value = 0.8928
3.4 Kruskal-Wallis rank sum test
  • The test result shows a Kruskal-Wallis chi-squared value of 5.6972, with 11 degrees of freedom. The corresponding p-value is 0.8928.

  • Interpreting the results, the p-value of 0.8928 indicates that there is no statistically significant difference in revenue among the months. This suggests that the revenue data does not vary significantly across different months and no seoanality at all.

# Statistical Tests
kruskal_wallis <- kruskal.test(revenue_data$revenue, g = revenue_data$season)

# Print the test results
print(kruskal_wallis)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  revenue_data$revenue and revenue_data$season
## Kruskal-Wallis chi-squared = 1.1837, df = 3, p-value = 0.7569
3.5 Kruskal-Wallis rank sum test
  • The test result indicates that the Kruskal-Wallis chi-squared value is 1.1837, with 3 degrees of freedom. The corresponding p-value is 0.7569.

  • In interpreting the results, the p-value is crucial. It represents the probability of obtaining the observed differences in revenue between seasons by chance alone.

  • In this case, the p-value of 0.7569 suggests that there is no statistically significant difference in revenue among the quarterly seasons. This means that there is no strong evidence to support the presence of seasonality in the revenue data based on the comparison of the seasons using the Kruskal-Wallis test.

4. Are there any pizzas we should take of the menu, or any promotions we could leverage?

pizzas_df %>%
  group_by(name) %>% 
  summarise(total_revenue = sum(revenue)) %>% 
  mutate(name = fct_reorder(name, total_revenue)) %>% 
  ggplot(aes(x = name, y = total_revenue, fill = total_revenue)) +
  geom_col(show.legend = F, width = 0.7) +
  geom_text(aes(label = dollar_format()(total_revenue)), nudge_y = -2300, size = 4)+
  scale_fill_distiller(palette = "BuGn", direction = 1) +
  coord_flip()+
  labs(
    title = "Pizza Name Revenue",
    x ="",
    y = ""
  )+
  theme_minimal()+
  theme(
    plot.title = element_text(hjust = 0.5, size = 15),
    axis.text.y = element_text(size = 12, face = "bold")
  )

4.1 Graph Explanation
  • The graph provided displays the revenue generated for each pizza name. Among the different pizza options, it is evident that the Thai Chicken pizza stands out as the highest revenue generator, accumulating an impressive total of $43,434.25. On the other end of the spectrum, we observe that the Brie Carre Pizza falls significantly short in terms of generated revenue when compared to the highest revenue achieved. It is important to note that the revenue generated by the Brie Carre Pizza is considerably low.

  • Furthermore, upon closer examination, we can identify an interesting trend within the graph. The top 10 lowest revenue figures are primarily dominated by pizzas belonging to the Veggie Category. This indicates that pizzas within this particular category have generally experienced lower levels of revenue generation compared to other pizza types.

pizzas_df %>%
  group_by(category) %>%
  summarise(total_revenue = sum(revenue)) %>%
  mutate(category = fct_reorder(category, total_revenue, .desc = TRUE)) %>%
  ggplot(aes(x = category, y = total_revenue, fill = total_revenue)) +
  geom_col(show.legend = FALSE, width = 0.7) +
  geom_text(aes(label = dollar_format()(total_revenue)), nudge_y = 10000, size = 3.5) +
  scale_fill_gradient(low = "#7b8a00", high = "#0c5c23") +
  labs(
    title = "Pizza Category Revenue",
    x = "",
    y = ""
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 15),
    axis.text.y = element_text(size = 12, face = "bold")
  )

4.2 Graph Explanation
  • The bar graph presented above illustrates the revenue generated for each pizza category. Among the various categories, the Classic category emerges as the highest revenue generator, accumulating an impressive total of $220,053. Following closely behind are the Supreme and Chicken categories, which also demonstrate substantial revenue figures.

  • On the other hand, we observe that the Veggie category lags behind in terms of revenue, with a total generated revenue of $193,692. This category experiences the lowest revenue when compared to the other categories depicted in the graph.

pizzas_df %>%
  group_by(size) %>%
  summarise(total_revenue = sum(revenue)) %>%
  mutate(size = fct_reorder(size, total_revenue, .desc = TRUE)) %>%
  ggplot(aes(x = size, y = total_revenue, fill = total_revenue)) +
  geom_col(show.legend = FALSE, width = 0.7) +
  geom_text(aes(label = dollar_format()(total_revenue)), nudge_y = 20000, size = 3.5) +
  scale_fill_gradient(low = "#7b8a00", high = "#0c5c23") +
  labs(
    title = "Pizza size Revenue",
    x = "",
    y = ""
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 15),
    axis.text.y = element_text(size = 12, face = "bold")
  )

4.3 Graph Explanation
  • The bar graph provided above displays the revenue generated for each pizza size. Among the different sizes, the large size stands out as the highest revenue generator, accumulating an impressive total of $357,319. Following closely behind are the medium and small sizes, which also demonstrate significant revenue figures.

However, it is worth noting that the XL and XXXL sizes fall considerably short in terms of generated revenue when compared to the other sizes depicted in the graph. The XL size generates a total revenue of $14,076, while the XXXL size only generates $1,007. These figures highlight a substantial difference in revenue compared to the other sizes, indicating that these two sizes have significantly lower sales performance.

pizzas_df %>% 
  group_by(ingredients) %>% 
  summarise(total_revenue = sum(revenue)) %>%
  arrange(total_revenue) %>% 
  mutate(ingredients = fct_reorder(ingredients, total_revenue, .desc = T)) %>% 
  ggplot(aes(x = ingredients, y = total_revenue, fill = total_revenue)) +
  geom_col(show.legend = F, width = 0.7) +
  geom_text(aes(label = dollar_format()(total_revenue)), nudge_y = - 5000, size = 5)+
  scale_fill_distiller(palette = "BuGn", direction = 1) +
  coord_flip()+
  labs(
    title = "Pizza Ingredients Revenue",
    x ="",
    y = ""
  )+
  theme_minimal()+
  theme(
    plot.title = element_text(hjust = -0.5, size = 20, face = "bold"),
    axis.text.y = element_text(size = 12, face = "bold")
  )

4.4 Graph Explanation
  • The bar graph provided above illustrates the revenue generated for different pizza ingredients. From the top of the lowest revenue ingredients to the highest, we observe a pattern where the majority of ingredients belong to the Veggie Category, indicating relatively lower revenue generation for pizzas with these ingredients.

  • On the other hand, the ingredients associated with the highest revenue are predominantly those that include chicken. This suggests that pizzas featuring chicken as an ingredient have been particularly successful in terms of revenue generation.

Summary

  • Sales are highest on Fridays, followed by Saturdays, indicating an increase in customer traffic as the weekend approaches.
  • The busiest periods during a full day of sales are from 12 a.m to 2 p.m and from 5 p.m to 7 p.m, confirming the previously specified peak hours.
  • When aggregating data by month, Fridays and Thursdays consistently have the highest number of customers, suggesting their popularity.
  • Mondays and Tuesdays also show some customer activity, albeit to a lesser extent.
  • Despite the popularity of Fridays and Saturdays, weekdays still have larger customer peaks, challenging the assumption that all customers prefer weekends.
  • Thursdays and the following days are expected to have the highest volume of customers in the upcoming month.
  • Sundays are anticipated to have the lowest volume of customers, potentially due to lower demand or reduced business hours on weekends.
  • Customers tend to have a higher peak in orders from around 12 a.m to 2 p.m and from 5 p.m to 7 p.m, confirming the observed time patterns.
  • The busiest hours for order placement are from 12 a.m. to 1 p.m. and from 5 p.m. to 6 p.m.
  • July has the highest number of orders, followed by May and November, while February, September, and October have the lowest.
  • There is a gradual decrease in the number of orders from January to July, with fluctuations within each month.
  • There is a significant decrease in orders from July to October, possibly due to vacations or holidays.
  • The Classic Deluxe pizza has the highest number of orders, while the Brie Carre pizza has the lowest.
  • Pepperoni consistently has the highest number of orders, indicating its popularity throughout the year.
  • Some pizza names, such as Barbecue Chicken, Hawaiian, Classic Deluxe, and California Chicken, stand out for specific months.
  • Certain pizzas have a steady and loyal customer base across multiple months.
  • The overall revenue generated for the year amounts to $817,860.
  • The line graph depicting revenue fluctuations throughout the year does not show a consistent upward or downward trend.
  • September 27, 2015, stands out as the day with the highest revenue of $4,422.45.
  • When categorizing the revenue data into quarterly or monthly intervals, no clear seasonality or predictable patterns emerge.
  • The revenue lines for each month and quarter do not exhibit consistent or repetitive patterns.
  • A Kruskal-Wallis test indicates that there is no statistically significant difference in revenue among the months or quarters, suggesting the absence of seasonality in the revenue data.
  • Thai Chicken pizza stands out as the highest revenue generator, while Brie Carre Pizza has considerably low revenue.
  • Veggie category pizzas generally experience lower revenue compared to other pizza types.
  • Classic category is the highest revenue generator among pizza categories, followed by Supreme and Chicken categories.
  • Veggie category has the lowest revenue compared to other categories.
  • Large size pizzas generate the highest revenue, while XL and XXXL sizes have considerably lower revenue.
  • Ingredients associated with the highest revenue are predominantly those that include chicken, while ingredients from the Veggie category have lower revenue.

Recommendation

  1. Weekend Promotions: Since sales are highest on Fridays and Saturdays, it would be beneficial to capitalize on the increased customer traffic during these days by offering special promotions or deals. This could include discounts on popular pizzas or limited-time offers to incentive customers to visit during the weekends.

  2. Lunch and Dinner Specials: Considering that customer traffic peaks from 12 p.m to 2 p.m and 5 p.m to 7 p.m, it would be advantageous to introduce lunch and dinner specials during these time periods. This could attract more customers during traditionally busy hours and encourage them to choose your establishment over competitors.

  3. Weekday Strategies: While Fridays and Thursdays have the highest customer traffic, it’s important not to overlook Mondays and Tuesdays, which still show some customer activity. Implementing targeted promotions or loyalty programs specifically for early weekdays could help attract more customers during these typically quieter days.

  4. Market Weekday Benefits: Since the data indicates that weekdays have a larger customer peak compared to weekends, emphasize the benefits of visiting on weekdays in your marketing efforts. Highlight factors such as shorter wait times, more personalized service, or exclusive weekday offers to encourage customers to choose weekdays for their visits.

  5. Customer Engagement during Peak Hours: During the identified peak hours, ensure that you have enough staff members available to handle the increased customer volume efficiently. Additionally, focus on delivering exceptional customer service during these periods to leave a positive impression and encourage repeat visits.

  6. Understanding Customer Preferences: Continuously analyze customer preferences and behaviors during the peak hours to identify patterns and trends. This data can help tailor your menu, promotions, and overall customer experience to better align with their preferences, further enhancing customer satisfaction and loyalty.

  7. Exploring Delivery and Takeout Options: Considering the popularity of specific time periods, explore the feasibility of offering delivery and takeout services during these peak hours. This can cater to customers who prefer enjoying their pizzas at home or on the go, expanding your reach and potentially increasing revenue.

  8. Optimize Staffing and Operations: Since the peak order placement hours are from 12 a.m. to 1 p.m. and 5 p.m. to 6 p.m., ensure that you have sufficient staff members during these periods to handle the increased demand. Streamline operations to ensure smooth order processing, timely deliveries, and efficient customer service during these busy hours.

  9. Marketing and Communication: Leverage the popularity of certain pizza names, such as the Classic Deluxe and Pepperoni, in your marketing efforts. Highlight customer favorites through attractive visuals, social media campaigns, and targeted advertisements. Encourage customers to share their positive experiences and reviews, further promoting these popular choices.

  10. Customization and Personalization: Offer customers the ability to customize their pizzas by choosing their preferred ingredients and toppings. This personalization option can cater to different tastes and dietary restrictions, attracting a wider range of customers. Additionally, consider introducing limited-time specialty pizzas or collaborations with local suppliers or influencers to create unique and exclusive offerings that drive excitement and engagement.

  11. Diversify Marketing Efforts: Since the revenue does not exhibit clear patterns tied to specific months or seasons, it’s important to maintain consistent marketing efforts throughout the year. Develop a well-rounded marketing strategy that includes various channels such as social media, email marketing, partnerships, and targeted advertisements. This will help maintain a steady flow of customers and revenue regardless of the time of year.

  12. Focus on Customer Retention: With revenue fluctuations occurring independently from month to month, it becomes essential to prioritize customer retention. Implement loyalty programs, personalized offers, and excellent customer service to encourage repeat business and build long-term relationships with customers. Engage with customers through surveys, feedback forms, and social media to understand their preferences and tailor offerings accordingly.

  13. Explore Seasonal Promotions: While the overall revenue does not exhibit consistent seasonality, it may still be worth exploring occasional seasonal promotions or limited-time offers. Identify specific periods throughout the year when there may be potential spikes in customer demand, such as holidays, local events, or special occasions. Launch targeted campaigns during these periods to attract customers and capitalize on the increased market activity.

  14. Maximize Operational Efficiency: Since the revenue data shows fluctuations throughout the year, focus on optimizing operational efficiency to accommodate varying demand levels. Implement robust inventory management systems, streamline order processing, and ensure smooth delivery or service operations. This will help you handle both peak and low-demand periods effectively, minimizing any negative impact on customer experience and revenue.

  15. Analyze Customer Behavior: Continuously monitor and analyze customer behavior patterns and preferences. Utilize customer data, including purchase history, demographics, and feedback, to gain insights into their preferences and tailor your offerings accordingly. This data-driven approach will help you make informed decisions about pricing, product offerings, and marketing strategies.

  16. Collaborate with Local Businesses: Explore partnerships with complementary local businesses to expand your customer base and generate revenue through cross-promotions and joint marketing efforts. For example, team up with nearby hotels, event organizers, or tourist attractions to offer exclusive deals or packages. This can help drive additional foot traffic and revenue during both peak and off-peak periods.

  17. Invest in Customer Experience: Providing an exceptional customer experience is vital for sustaining revenue throughout the year. Focus on delivering high-quality products, prompt customer service, and seamless online or in-store experiences. Pay attention to feedback and continually improve your processes to meet and exceed customer expectations. A positive and memorable customer experience can lead to repeat business and positive word-of-mouth, driving revenue growth over time.

  18. Promote Top Revenue Generators: The Thai Chicken pizza emerges as the highest revenue generator among the different pizza names. Capitalize on its popularity by highlighting it in promotional materials, menu displays, and online marketing campaigns. Consider offering special deals or discounts specifically for the Thai Chicken pizza to attract more customers and further boost its revenue contribution.

  19. Revamp Low Revenue Pizza Options: Pay attention to the Brie Carre pizza, which falls significantly short in terms of generated revenue. Evaluate its performance and consider making modifications to its recipe, presentation, or marketing strategy to increase its appeal to customers. Alternatively, you may consider phasing out low-revenue pizzas or introducing new options that have the potential to generate higher revenue.

  20. Optimize Veggie Category: Given that the Veggie category consistently demonstrates lower revenue figures, focus on enhancing the appeal of vegetarian pizza options. Explore new flavor combinations, seasonal ingredients, or innovative toppings to attract a wider range of customers. Effective marketing strategies that highlight the health benefits, sustainability, or unique taste of vegetarian pizzas can help increase their revenue potential.

  21. Maximize Classic Category: The Classic category emerges as the highest revenue generator among the different pizza categories. Leverage this popularity by expanding the Classic pizza range, introducing new variations or limited-time specials within this category. Highlight the familiarity, comfort, and broad appeal of Classic pizzas to entice customers and drive revenue growth.

  22. Review Size Performance: While the large size stands out as the highest revenue generator, there is an opportunity to improve sales performance for the XL and XXXL sizes. Evaluate the pricing, portion sizes, and customer preferences for these larger options. Consider adjusting pricing strategies or introducing promotions that encourage customers to choose these sizes, such as family meal deals or bulk order discounts.

  23. Chicken Ingredient Focus: Since pizzas featuring chicken as an ingredient demonstrate strong revenue performance, consider expanding the range of chicken-based options or introducing new flavors and combinations. Promote these pizzas as satisfying, protein-rich choices that cater to diverse taste preferences. Develop targeted marketing campaigns highlighting the deliciousness and versatility of chicken pizzas to attract both existing and new customers.

  24. Specialty Pizza Development: Explore the possibility of developing specialty pizzas that cater to specific dietary preferences or emerging food trends. This could include gluten-free, vegan, or keto-friendly options, as well as innovative flavor combinations inspired by international cuisines. By expanding your menu to accommodate a broader range of customer preferences, you can attract new customer segments and potentially increase revenue.