Plato’s Pizza

Authors: Melissa Oberg, Tammy Brockman, Courtney Lambert, Ryann Laky

Business Understanding

Objective

With the goal of improving restaurant operations and sales using peak periods and the top pizzas (including ingredients) sold during those periods, the following questions are answered in this analysis:

What days and times is Plato’s Pizza the busiest?
What are the most popular sizes and types of pizzas ordered?
What are the most used ingredients?
What’s the average order value?

Project Ideas

The end state of this project is to help the business and advise actions they should take, including recommendations on inventory management and potential advertising campaigns. Below are some of the methods used to meet this end state:

Decision Trees: Determine the pizza most likely to be chosen.
Linear Regression: Predict profit and determine if certain tables prove more profitable.
Leverage and Cook’s Distance: Identify outliers to determine if observations should be removed from the model.
Cluster: Identify different personas of orders to look for commonalities, and using this to inform other analyses.
Association and Market Basket Analysis: Determine the relationships between various items sold.

These are just the initial idea of methods for determining the business solution. As we dive deep into the data and its analysis, other analytic methods may be useful in finding the business solution and better serving Plato’s Pizza.

Data Selection and Overview

This data includes a year’s worth of sales data from Plato’s Pizza, including the date and time of each order and the pizzas served, with additional details such as the type/size of pizza, quantity of pizzas, pricing of the pizza, and the specific ingredients used.

Data was pulled from Maven: Plato’s Pizza Challenge.
This data set contains four tables in CSV format:
Orders Table (orders.csv): Date and Time of order by Order ID

Order Details Table (order_details.csv): Order ID, Pizza ID, and Quantity by Order Details ID

Pizza Table (pizzas.csv): Pizza Type ID, Size, and Price by Pizza ID

Pizza Type Table (pizza_type.csv): Name, Category, and Ingredients by Pizza Type ID

orders <- read.csv("/Users/ryannlaky/Documents/University of Indianapolis/MSDA 621/Project/Raw Data/orders.csv")

order_details <- read.csv("/Users/ryannlaky/Documents/University of Indianapolis/MSDA 621/Project/Raw Data/order_details.csv")

pizzas <- read.csv("/Users/ryannlaky/Documents/University of Indianapolis/MSDA 621/Project/Raw Data/pizzas.csv")

pizza_types <- read.csv("/Users/ryannlaky/Documents/University of Indianapolis/MSDA 621/Project/Raw Data/pizza_types.csv")

Data Cleaning and Preparation

Step 1) Left joining `pizzas` to `order_details` table by `pizza_id`.

The `pizzas` table has the price of each item, which is critical for further analysis. The `order_details` table is on the left and the `pizzas` table is on the right. Each row is then verified to ensure each row gets a price assigned as part of this join.

order_details %>% left_join(pizzas, key="pizza_id") -> orders_price
sum(is.na(orders_price$price)) #0

Looking into the sum of missing valus for the join by price, there are `0`. This indicates there are no missing prices for the output, and visually inspecting the data frame proves this to be true, to include the output table being the right size: `48,620` rows by `4` columns.

Step 2) Left joining `pizza_types` to `orders_price` by `pizza_type_id`, adding on information from the pizza types table about ingredients to product from above.

orders_price %>% left_join(pizza_types, key="pizza_type_id") -> pizza_complete
head(pizza_complete)
sum(is.na(pizza_complete$ingredients))

pizza_complete <- pizza_complete %>% mutate(full_price=quantity*price) #create a true price column for each order line using quantity * price

Before going any further with joining and summarizing, the ingredients column needs to be cleaned and put it in a more usable format. Below is a listing of the column names, where basic pizza information is included, and where an ingredient is used, a 1 would be marked under the ingredient for use in the pizza.

broken_out <- pizza_complete %>% 
  mutate(
      Pepperoni = case_when(str_detect(ingredients,"Pepperoni")==TRUE ~ 1,TRUE~0),
      Ham = case_when(str_detect(ingredients,"Sliced Ham")==TRUE ~ 1,TRUE~0),
      BBQChicken = case_when(str_detect(ingredients,"Barbecued Chicken")==TRUE ~ 1,TRUE~0),
      Chicken = case_when(str_detect(ingredients,"Chicken")==TRUE ~ 1,TRUE~0),
      Pineapple = case_when(str_detect(ingredients,"Pineapple")==TRUE ~ 1,TRUE~0),
      Mozzarella = case_when(str_detect(ingredients,"Mozzarella")==TRUE ~ 1,TRUE~0),
      Provolone = case_when(str_detect(ingredients,"Provolone")==TRUE ~ 1,TRUE~0),
      Jalapeno = case_when(str_detect(ingredients,"Jalapeno")==TRUE ~ 1,TRUE~0),
      Tomato = case_when(str_detect(ingredients,"Tomato")==TRUE ~ 1,TRUE~0),
      Bacon = case_when(str_detect(ingredients,"Bacon")==TRUE ~ 1,TRUE~0),
      Red_Pepper = case_when(str_detect(ingredients,"Red Peppers")==TRUE ~ 1,TRUE~0),
      Calabrese = case_when(str_detect(ingredients,"Calabrese")==TRUE ~ 1,TRUE~0),
      Red_Onion = case_when(str_detect(ingredients,"Red Onions")==TRUE ~ 1,TRUE~0),
      Thai_Sweet = case_when(str_detect(ingredients,"Thai")==TRUE ~ 1,TRUE~0),
      Chipotle = case_when(str_detect(ingredients,"Chipotle")==TRUE ~ 1,TRUE~0),
      Garlic = case_when(str_detect(ingredients,"Garlic")==TRUE ~ 1,TRUE~0),
      Gouda = case_when(str_detect(ingredients,"Gouda")==TRUE ~ 1,TRUE~0),
      Blue_Cheese = case_when(str_detect(ingredients,"Blue")==TRUE ~ 1,TRUE~0),
      Cilantro = case_when(str_detect(ingredients,"Cilantro")==TRUE ~ 1,TRUE~0),
      Corn = case_when(str_detect(ingredients,"Corn")==TRUE ~ 1,TRUE~0),
      Capocollo = case_when(str_detect(ingredients,"Capocollo")==TRUE ~ 1,TRUE~0),
      Romano = case_when(str_detect(ingredients,"Romano")==TRUE ~ 1,TRUE~0),
      Mushrooms = case_when(str_detect(ingredients,"Mushrooms")==TRUE ~ 1,TRUE~0),
      Green_Olives = case_when(str_detect(ingredients,"Green Olives")==TRUE ~ 1,TRUE~0),
      Green_Peppers = case_when(str_detect(ingredients,"Green Peppers")==TRUE ~ 1,TRUE~0),
      BBQ_Sauce = case_when(str_detect(ingredients,"Barbecue Sauce")==TRUE ~ 1,TRUE~0),
      Artichoke = case_when(str_detect(ingredients,"Artichokes")==TRUE ~ 1,TRUE~0),
      Fontina = case_when(str_detect(ingredients,"Fontina")==TRUE ~ 1,TRUE~0),
      Spinach = case_when(str_detect(ingredients,"Spinach")==TRUE ~ 1,TRUE~0),
      Asiago = case_when(str_detect(ingredients,"Asiago")==TRUE ~ 1,TRUE~0),
      Alfredo = case_when(str_detect(ingredients,"Alfredo")==TRUE ~ 1,TRUE~0),
      Pesto = case_when(str_detect(ingredients,"Pesto")==TRUE ~ 1,TRUE~0),
      Italian_Sausage = case_when(str_detect(ingredients,"Italian")==TRUE ~ 1,TRUE~0),
      Chorizo = case_when(str_detect(ingredients,"Chorizo")==TRUE ~ 1,TRUE~0),
      Goat_Cheese = case_when(str_detect(ingredients,"Goat")==TRUE ~ 1,TRUE~0),
      Oregano = case_when(str_detect(ingredients,"Oregano")==TRUE ~ 1,TRUE~0),
      Anchovies = case_when(str_detect(ingredients,"Anchovies")==TRUE ~ 1,TRUE~0),
      Kalamata = case_when(str_detect(ingredients,"Kalamata")==TRUE ~ 1,TRUE~0),
      Feta = case_when(str_detect(ingredients,"Feta")==TRUE ~ 1,TRUE~0),   
      Beef = case_when(str_detect(ingredients,"Beef")==TRUE ~ 1,TRUE~0),
      Brie = case_when(str_detect(ingredients,"Brie Carre")==TRUE ~ 1,TRUE~0),
      Prosciutto = case_when(str_detect(ingredients,"Prosciutto")==TRUE ~ 1,TRUE~0),
      Carm_Onion = case_when(str_detect(ingredients,"Caramelized Onions")==TRUE ~ 1,TRUE~0),
      Pear = case_when(str_detect(ingredients,"Pepperoni")==TRUE ~ 1,TRUE~0),
      Thyme = case_when(str_detect(ingredients,"Pepperoni")==TRUE ~ 1,TRUE~0),
      Nduja_Sal = case_when(str_detect(ingredients,"Nduja")==TRUE ~ 1,TRUE~0),
      Pancetta = case_when(str_detect(ingredients,"Pancetta")==TRUE ~ 1,TRUE~0),
      Friggitello = case_when(str_detect(ingredients,"Friggitello")==TRUE ~ 1,TRUE~0),
      Genoa_Sal = case_when(str_detect(ingredients,"Genoa")==TRUE ~ 1,TRUE~0),
      San_Daniele = case_when(str_detect(ingredients,"San Daniele")==TRUE ~ 1,TRUE~0),
      Coarse_Sicialian = case_when(str_detect(ingredients,"Coarse")==TRUE ~ 1,TRUE~0),
      Arugula = case_when(str_detect(ingredients,"Arugula")==TRUE ~ 1,TRUE~0),
      Soppressata = case_when(str_detect(ingredients,"Soppressata")==TRUE ~ 1,TRUE~0),
      Onion = case_when(str_detect(ingredients,"Onion")==TRUE ~ 1,TRUE~0),
      Luganega = case_when(str_detect(ingredients,"Luganega")==TRUE ~ 1,TRUE~0),
      Peperoncini = case_when(str_detect(ingredients,"Peperoncini verdi")==TRUE ~ 1,TRUE~0),
      Ricotta = case_when(str_detect(ingredients,"Ricotta")==TRUE ~ 1,TRUE~0),
      Gorgonzola = case_when(str_detect(ingredients,"Gorgonzola")==TRUE ~ 1,TRUE~0),
      Parmigiano  = case_when(str_detect(ingredients,"Parmigiano")==TRUE ~ 1,TRUE~0),
      Eggplant  = case_when(str_detect(ingredients,"Eggplant")==TRUE ~ 1,TRUE~0),
      Zucchini  = case_when(str_detect(ingredients,"Zucchini")==TRUE ~ 1,TRUE~0),
      Sun_Dried = case_when(str_detect(ingredients,"Sun-dried Tomatoes")==TRUE ~ 1,TRUE~0),
      Plum_Tom  = case_when(str_detect(ingredients,"Plum")==TRUE ~ 1,TRUE~0))  
colnames(broken_out)

##  [1] "order_details_id" "order_id"         "pizza_id"         "quantity"        
##  [5] "pizza_type_id"    "size"             "price"            "name"            
##  [9] "category"         "ingredients"      "full_price"       "Pepperoni"       
## [13] "Ham"              "BBQChicken"       "Chicken"          "Pineapple"       
## [17] "Mozzarella"       "Provolone"        "Jalapeno"         "Tomato"          
## [21] "Bacon"            "Red_Pepper"       "Calabrese"        "Red_Onion"       
## [25] "Thai_Sweet"       "Chipotle"         "Garlic"           "Gouda"           
## [29] "Blue_Cheese"      "Cilantro"         "Corn"             "Capocollo"       
## [33] "Romano"           "Mushrooms"        "Green_Olives"     "Green_Peppers"   
## [37] "BBQ_Sauce"        "Artichoke"        "Fontina"          "Spinach"         
## [41] "Asiago"           "Alfredo"          "Pesto"            "Italian_Sausage" 
## [45] "Chorizo"          "Goat_Cheese"      "Oregano"          "Anchovies"       
## [49] "Kalamata"         "Feta"             "Beef"             "Brie"            
## [53] "Prosciutto"       "Carm_Onion"       "Pear"             "Thyme"           
## [57] "Nduja_Sal"        "Pancetta"         "Friggitello"      "Genoa_Sal"       
## [61] "San_Daniele"      "Coarse_Sicialian" "Arugula"          "Soppressata"     
## [65] "Onion"            "Luganega"         "Peperoncini"      "Ricotta"         
## [69] "Gorgonzola"       "Parmigiano"       "Eggplant"         "Zucchini"        
## [73] "Sun_Dried"        "Plum_Tom"

broken_out$ingredients=NULL #removing old ingredients columns
by_order_id <- group_by(broken_out,order_id) #creating summary data set with ingredient counts/prices by order

pizza_summary <- summarize(by_order_id, 
                           TotalPrice=sum(full_price), #fixed 
                           TotalItems=n(),
                           Pepperoni=sum(Pepperoni),
                           Ham = sum(Ham),
                           BBQChicken = sum(BBQChicken),
                           Chicken = sum(Chicken),
                           Pineapple = sum(Pineapple),
                           Mozzarella = sum(Mozzarella),
                           Provolone = sum(Provolone),
                           Jalapeno = sum(Jalapeno),
                           Tomato = sum(Tomato),
                           Bacon = sum(Bacon),
                           Red_Pepper = sum(Red_Pepper),
                           Calabrese = sum(Calabrese),
                           Red_Onion = sum(Red_Onion),
                           Thai_Sweet = sum(Thai_Sweet),
                           Chipotle = sum(Chipotle),
                           Garlic = sum(Garlic),
                           Gouda = sum(Gouda),
                           Blue_Cheese = sum(Blue_Cheese),
                           Cilantro = sum(Cilantro),
                           Corn = sum(Corn),
                           Capocollo = sum(Capocollo),
                           Romano = sum(Romano),
                           Mushrooms = sum(Mushrooms),
                           Green_Olives = sum(Green_Olives),
                           Green_Peppers = sum(Green_Peppers),
                           BBQ_Sauce = sum(BBQ_Sauce),
                           Artichoke = sum(Artichoke),
                           Fontina = sum(Fontina),
                           Spinach = sum(Spinach),
                           Asiago = sum(Asiago),
                           Alfredo = sum(Alfredo),
                           Pesto = sum(Pesto),
                           Italian_Sausage = sum(Italian_Sausage),
                           Chorizo = sum(Chorizo),
                           Goat_Cheese = sum(Goat_Cheese),
                           Oregano = sum(Oregano),
                           Anchovies = sum(Anchovies),
                           Kalamata = sum(Kalamata),
                           Feta = sum(Feta),   
                           Beef = sum(Beef),
                           Brie = sum(Brie),
                           Prosciutto = sum(Prosciutto),
                           Carm_Onion = sum(Carm_Onion),
                           Pear = sum(Pear),
                           Thyme = sum(Thyme),
                           Nduja_Sal = sum(Nduja_Sal),
                           Pancetta = sum(Pancetta),
                           Friggitello = sum(Friggitello),
                           Genoa_Sal = sum(Genoa_Sal),
                           San_Daniele = sum(San_Daniele),
                           Coarse_Sicialian = sum(Coarse_Sicialian),
                           Arugula = sum(Arugula),
                           Soppressata = sum(Soppressata),
                           Onion = sum(Onion),
                           Luganega = sum(Luganega),
                           Peperoncini = sum(Peperoncini),
                           Ricotta = sum(Ricotta),
                           Gorgonzola = sum(Gorgonzola),
                           Parmigiano  = sum(Parmigiano),
                           Eggplant  = sum(Eggplant),
                           Zucchini  = sum(Zucchini),
                           Sun_Dried = sum(Sun_Dried),
                           Plum_Tom  = sum(Plum_Tom))

The output table pizza summary has one row per order. It has a total price column, total pizzas column, and a total count of ingredients across the order. Next I will join in the time information from the orders table.

Step 3) Left joining `pizza_types` to `orders_price` by `pizza_type_id`, adding on information from the pizza types table about ingredients to product from above.

pizza_summary_with_times <- pizza_summary %>% left_join(orders,key="order_id")
nrow(pizza_summary_with_times) #21,350

Step 4) Separating the columns with time/date information into individual units serves as useful in exploratory data analysis and allows further analysis into peak hours with a high number of orders.

pizza_summary_with_times[c('Year','Month', 'Day')]<-str_split_fixed(pizza_summary_with_times$date, "-", 3) #split date column into year, month, day

pizza_summary_with_times[c('Hour','Minute', 'Seconds')]<-str_split_fixed(pizza_summary_with_times$time, ":", 3) #split time column into hour, minute, second

pizza_summary_with_times$date=NULL #removing old date column
pizza_summary_with_times$time=NULL #removing old time column
pizza_summary_with_times$Seconds=NULL #removed seconds column (not useful)

pizza_summary_with_times$Month <- as.numeric(pizza_summary_with_times$Month) #month as numeric
pizza_summary_with_times$Year <- as.numeric(pizza_summary_with_times$Year) #year as numeric
pizza_summary_with_times$Day <- as.numeric(pizza_summary_with_times$Day) #day as numeric
pizza_summary_with_times$Hour <- as.numeric(pizza_summary_with_times$Hour) #hour as numeric
pizza_summary_with_times$Minute <- as.numeric(pizza_summary_with_times$Minute) #minute as numeric

The remaining product columns are listed below:

colnames(pizza_summary_with_times)

##  [1] "order_id"         "TotalPrice"       "TotalItems"       "Pepperoni"       
##  [5] "Ham"              "BBQChicken"       "Chicken"          "Pineapple"       
##  [9] "Mozzarella"       "Provolone"        "Jalapeno"         "Tomato"          
## [13] "Bacon"            "Red_Pepper"       "Calabrese"        "Red_Onion"       
## [17] "Thai_Sweet"       "Chipotle"         "Garlic"           "Gouda"           
## [21] "Blue_Cheese"      "Cilantro"         "Corn"             "Capocollo"       
## [25] "Romano"           "Mushrooms"        "Green_Olives"     "Green_Peppers"   
## [29] "BBQ_Sauce"        "Artichoke"        "Fontina"          "Spinach"         
## [33] "Asiago"           "Alfredo"          "Pesto"            "Italian_Sausage" 
## [37] "Chorizo"          "Goat_Cheese"      "Oregano"          "Anchovies"       
## [41] "Kalamata"         "Feta"             "Beef"             "Brie"            
## [45] "Prosciutto"       "Carm_Onion"       "Pear"             "Thyme"           
## [49] "Nduja_Sal"        "Pancetta"         "Friggitello"      "Genoa_Sal"       
## [53] "San_Daniele"      "Coarse_Sicialian" "Arugula"          "Soppressata"     
## [57] "Onion"            "Luganega"         "Peperoncini"      "Ricotta"         
## [61] "Gorgonzola"       "Parmigiano"       "Eggplant"         "Zucchini"        
## [65] "Sun_Dried"        "Plum_Tom"         "Year"             "Month"           
## [69] "Day"              "Hour"             "Minute"

Exploratory Data Visualization

Exploratory data visualization helps to shape how the data looks before getting further into data analysis. In the below visualizations, we can develop a shared understanding of the shape of the data and use it to guide further analysis.

Histograms

In developing some initial histograms, we can gain some insight into the data to better visualize aspects of the data prior to getting into further data analysis.

hist(pizza_summary_with_times$TotalPrice, main = "Histogram of Orders by Total Price", xlab = "Total Price ($)", col="red", breaks = 15)

Figure 1: Histogram of Total Orders by Price, where the majority of orders are under $50.

hist(pizza_summary_with_times$Month, main = "Histogram of Orders by Month", xlab = "Month", col="red", breaks = 13)

Figure 2: Histogram of Total Orders by Month, where the majority of orders seem to be sold in month 1, being January.

hist(pizza_summary_with_times$TotalItems, main = "Histogram of Orders by Total Items", xlab = "Total Item Count per Order", xlim = c(0,10), col="red", breaks = 10)

Figure 3: Histogram of Total Orders by Total Items, where the majority of orders have 5 items or less. While total item count per order exceeds 20, this histogram has been limited to show orders with up to 10 items in each.

In examining the above histograms, we know that customers typically spend under $50.00, that January has the highest number of orders, and that customers usually order 5 items or less. Knowing this information alone, we can make a few suggestions:

Plato’s should offer specials (such as a 15% discount on orders over $50.00) to increase the number of items and the price of items sold
Plato’s should offer bundled deals (such as deals on orders over 5 items) to increase the number of items sold
Plato’s should offer specials during certain months (such as a $2.00 discount per order during the months of August and September) to increase sales during these months

Bar Charts

Looking into bar charts also helps us visualize the spread of the data. Below are some good examples of how the data looks.

Figure 4: This shows an example of the total number of pizzas ordered by category, which can roughly tell which pizzas specifically (and by category) have been ordered the most.

Figure 5: This shows the most used ingredients, which can be useful in supply chain management and understanding which ingredients need to be kept in stock more often than others. While this isn’t a problem initially addressed, it can still shed insight into this product management.

ingredients1 <- apply(pizza_summary_with_times[,4:34], 2, max)
ingredients2 <- apply(pizza_summary_with_times[,35:66], 2, max)
par(mar=c(11,4,4,4))
barplot(ingredients1, main = "Ingredient Use by Ingredient, pt. 1", ylab = "Count", col = "red", las=2)
par(mar=c(11,4,4,4))
barplot(ingredients2, main = "Ingredient Use by Ingredient, pt. 2", ylab = "Count", col = "red", las=2)

Figure 6: Above are the total ingredients used where each time an ingredient is used in an order, it is counted.

Overall, the following ingredients have 5 or more occurances:

Capocollo
Chicken
Feta
Garlic
Jalapeno
Mozzarella
Mushrooms
Onion
Pear
Pepperoni
Red Pepper
Red Onion
Spinach
Tomato
Thyme

Next are some bar plots showing the distribution of pizzas among different sizes, categories, and types.

sizes <- table(pizza_complete$size) #Pizza Sizes
barplot(sizes, ylab= "Frequency", xlab="Sizes", main= "Size of Pizzas Ordered", col="red", ylim=c(0,20000))

Figure 7: Above is a barplot displaying the sizes of pizzas ordered.

categoriesofpizza <- table(pizza_complete$category) #Pizza Categories
barplot(categoriesofpizza, ylab="Frequency", main= "Categories of Pizza", las=2, col="red", ylim=c(0,17000))

Figure 8: Above is a bar plot displaying the frequency of categories of pizza ordered. These categories are pre-determined by Plato’s.

nameofpizza <- table(pizza_complete$name)
par(mar=c(15,4,4,4))
a <- barplot(nameofpizza, ylab="Frequency", main="Names of Pizza", las=2, col="red", ylim=c(0,3000))
text(a, nameofpizza + 125, nameofpizza, font=2, cex = 0.5)

Figure 9: Above are the frequencies of the names of pizzas ordered.

Box Plots

Box plots can also give us a good idea of the distribution of the data, especially when using these for comparisons within our data set.

par(mar=c(11,4,4,4))
boxplot(pizza_summary_with_times[,4:34], main = "Ingredient Box Plot, pt. 1", notch=FALSE, col=c("red", "blue", "yellow", "grey"), las=2)
par(mar=c(11,4,4,4))
boxplot(pizza_summary_with_times[,35:66], main = "Ingredient Box Plot, pt. 2", notch=FALSE, col=c("red", "blue", "yellow", "grey"), las=2)

Figure 10: Above are two boxplots displaying where ingredients are used. While these may not provide any obvious value, they do display that every ingredient used has outliers.

boxplot(pizza_summary_with_times$Month, main = "Total Pizzas by Month", notch=FALSE, col=c("red", "blue", "yellow", "grey"))
boxplot(pizza_summary_with_times$Hour, main = "Total Pizzas by Hour", notch=FALSE, col=c("red", "blue", "yellow", "grey"))

Figure 11: Above are two boxplots, one indicating the number of pizzas ordered per month and the other dictating the number of pizzas ordered by hour. There are no outliers in either. For the by-month summary, this tells us that although the month of January has higher sales, this still isn’t an outlier. For the by-hour summary, this is expected as Plato’s has limited hours of operation. An outlier of any sorts would be considered problematic and means sales are being conducted outside of traditional business hours. If this existed, this would require special examination.

boxplot(pizza_summary_with_times$TotalPrice, main = "Box Plot by Total Price", notch=T, col=c("red", "blue", "yellow", "grey"))
boxplot(pizza_summary_with_times$TotalItems, main = "Box Plot by Total Items", notch=T, col=c("red", "blue", "yellow", "grey"))

Figure 12: Above are two additional boxplots of dictating total price and total items for all orders. There are a handful of outliers in both.

Heat Maps

Heat maps can be useful in aiding more visual learners in expressing data in meainingful colors. Using heat maps also ads another dimension to visual depictions, something that most other 2-dimensional visualizations tend to lack.

Figure 13: This shows with color and further labeling the ingredients most commonly occurring by pizza category. Sum of pizzas ordered are detailed in the legend on the right.

Data Analysis

K-Means Clustering

K-Means Clustering aims to partition n observations into k clusters. Each nth observation belongs to the cluster “k” with the nearest cluster centers. This method of clustering minimizes within-cluster squared Euclidian distances, or variances. This kind of clustering maximizes the similarity of data points within the same clusters and minimizes the similarity of points in other clusters. First, we evaluated which number of cluster centers would be best for our data by calculating the sum of squares for different numbers of cluster centers. The chart below shows our results.

wss <- 0
kmeans_pizza <- pizza_summary %>%
  select(c("order_id","TotalPrice","TotalItems","Capocollo","Chicken",
           "Feta", "Garlic", "Jalapeno", "Mozzarella", "Mushrooms", "Onion",
           "Pear", "Pepperoni", "Red_Pepper", "Red_Onion", "Spinach", "Tomato",
           "Thyme"))

for (i in 1:15) {
  km.out <- kmeans(kmeans_pizza[,-1], centers = i, nstart=20)
  wss[i] <- km.out$tot.withinss
}

kclusters <- kmeans(kmeans_pizza[,-1], 4)

for (i in 1:15) {
  km.out <- kmeans(kmeans_pizza[,-1], centers = i, nstart=20)
  wss[i] <- km.out$tot.withinss
}

plot(1:15, wss, type = "b", xlab = "Number of Clusters", ylab = "Within Groups Sum of Squares")

Figure 14: In the above graph showing the number of clusters, the optimal number is around 4, where returns become diminished as clusters increase.

Figure 15: The differentiating features for the four order clusters appear to be the total price and total items.

Cluster Profiles:

The differentiating features for the four order clusters appear to be the total price and total items.

Cluster 1 – This is cluster with the highest number of observations. This cluster’s orders also have the lowest price on average and an average total item count of 1. This indicates that a significant portion of the Plato’s Pizza customer base orders just one item.
Cluster 2 – A significant portion of orders fall into cluster 2. Cluster 2 has a higher average price than Cluster 1 and 2 total items. Plato’s may consider further analysis to determine differences between Cluster 1 and Cluster 2 customers to determine how customers from Cluster 1 could be persuaded to purchase more items and become more like customers from Cluster 2.
Cluster 3 – A smaller but still significant portion of orders are classified into Cluster 3. Cluster 3’s differences from Cluster 2 are similar to those described above between Cluster 2 and Cluster 1; Cluster 3’s customers spend slightly more money and tend to order about 3 items. A similar approach could be taken to determine if it would be beneficial to upsell customer’s matching the Cluster 2 profile to become more like Cluster 3.
Cluster 4 – A small number of orders fall into Cluster 4. Cluster 4 orders have a much higher total price and total number of items than the other clusters. Plato’s might consider looking into net profit from large orders like those from Cluster 4 to determine if it would be a good idea to develop more business with the customer segment.

There are some differences in the values for ingredients between the clusters, these appear to be primarily driven by the total items on the order (e.g. if you are ordering more pizzas, you will naturally order more ingredients). Throughout all the clusters, tomato, onion, garlic, and red pepper appear to be popular. An additional recommendation, should Plato’s Pizza want to continue this type of analysis, would be to launch a loyalty program or find another similar mechanism to collect additional customer demographic information. With more customer information, which could be combined with existing order information, Plato’s will be able to create more detailed and usable customer profiles.

Hierarchical Clustering

As opposed to K-means clustering, this form of clustering that builds an heirarchy of clusters that can be broken up in very different leaves, separated visually in a dendrogram. However, much like k-means clustering, the goal of this form of clustering is to develop clusters that contain enough similarity within clusters and enough differences between clusters.

hclusters <- hclust(dist(pizza_summary[,-1]))
plot(hclusters, xlab = "Clusters", ylab = "Height")
rect.hclust(hclusters, k = 3)

Figure 16: Hierarchical clustering diagram of pizza_summary.

In working through several examples of the heirarchical clustering and associated dendrograms, these prove to be unreadable and not very useful for this analysis.

Association

Association analysis identifies whether different items are more or less likely to be purchased together on the same order. The bar chart below shows the frequency that different items appear in the dataset (limited to just the most commonly appearing items).

assoc_pizza <- pizza_complete %>% 
        mutate(bbq_ckn = case_when(str_detect(pizza_type_id,"bbq_ckn")==TRUE ~ 1,TRUE~0),
         cali_ckn = case_when(str_detect(pizza_type_id,"cali_ckn")==TRUE ~ 1,TRUE~0),  
         ckn_alfredo = case_when(str_detect(pizza_type_id,"ckn_alfredo")==TRUE ~ 1,TRUE~0),
         ckn_pesto = case_when(str_detect(pizza_type_id,"ckn_pesto")==TRUE ~ 1,TRUE~0),
         southw_ckn = case_when(str_detect(pizza_type_id,"southw_ckn")==TRUE ~ 1,TRUE~0),
         thai_ckn = case_when(str_detect(pizza_type_id,"thai_ckn")==TRUE ~ 1,TRUE~0),
         big_meat = case_when(str_detect(pizza_type_id,"big_meat")==TRUE ~ 1,TRUE~0),
         classic_dlx = case_when(str_detect(pizza_type_id,"classic_dlx")==TRUE ~ 1,TRUE~0),
         hawaiian = case_when(str_detect(pizza_type_id,"hawaiian")==TRUE ~ 1,TRUE~0),
         ital_cpcllo = case_when(str_detect(pizza_type_id,"ital_cpcllo")==TRUE ~ 1,TRUE~0),
         napolitana = case_when(str_detect(pizza_type_id,"napolitana")==TRUE ~ 1,TRUE~0),
         pep_msh_pep = case_when(str_detect(pizza_type_id,"pep_msh_pep")==TRUE ~ 1,TRUE~0),
         pepperoni = case_when(str_detect(pizza_type_id,"pepperoni")==TRUE ~ 1,TRUE~0),
         the_greek = case_when(str_detect(pizza_type_id,"the_greek")==TRUE ~ 1,TRUE~0),
         brie_carre = case_when(str_detect(pizza_type_id,"brie_carre")==TRUE ~ 1,TRUE~0),
         calabrese = case_when(str_detect(pizza_type_id,"calabrese")==TRUE ~ 1,TRUE~0),
         ital_supr = case_when(str_detect(pizza_type_id,"ital_supr")==TRUE ~ 1,TRUE~0),
         peppr_salami = case_when(str_detect(pizza_type_id,"peppr_salami")==TRUE ~ 1,TRUE~0),
         prsc_argla = case_when(str_detect(pizza_type_id,"prsc_argla")==TRUE ~ 1,TRUE~0),
         sicilian = case_when(str_detect(pizza_type_id,"sicilian")==TRUE ~ 1,TRUE~0),
         soppressata = case_when(str_detect(pizza_type_id,"soppressata")==TRUE ~ 1,TRUE~0),
         spicy_ital = case_when(str_detect(pizza_type_id,"spicy_ital")==TRUE ~ 1,TRUE~0),
         spinach_supr = case_when(str_detect(pizza_type_id,"spinach_supr")==TRUE ~ 1,TRUE~0),
         five_cheese = case_when(str_detect(pizza_type_id,"five_cheese")==TRUE ~ 1,TRUE~0),
         four_cheese = case_when(str_detect(pizza_type_id,"four_cheese")==TRUE ~ 1,TRUE~0),
         green_garden = case_when(str_detect(pizza_type_id,"green_garden")==TRUE ~ 1,TRUE~0),
         ital_veggie = case_when(str_detect(pizza_type_id,"ital_veggie")==TRUE ~ 1,TRUE~0),
         mediterraneo = case_when(str_detect(pizza_type_id,"mediterraneo")==TRUE ~ 1,TRUE~0),
         mexicana = case_when(str_detect(pizza_type_id,"mexicana")==TRUE ~ 1,TRUE~0),
         spin_pesto = case_when(str_detect(pizza_type_id,"spin_pesto")==TRUE ~ 1,TRUE~0),
         spinach_fet = case_when(str_detect(pizza_type_id,"spinach_fet")==TRUE ~ 1,TRUE~0),
         veggie_veg = case_when(str_detect(pizza_type_id,"veggie_veg")==TRUE ~ 1,TRUE~0))

assoc_pizza$order_details_id=NULL
assoc_pizza$pizza_id=NULL
assoc_pizza$quantity=NULL
assoc_pizza$price=NULL
assoc_pizza$name=NULL
assoc_pizza$full_price=NULL
assoc_pizza$pizza_type_id=NULL
assoc_pizza$size=NULL
assoc_pizza$ingredients=NULL
assoc_pizza$category=NULL

by_order_id <- group_by(assoc_pizza,order_id) #creating summary dataset with ingredient counts and prices by order

association_data <- summarize(by_order_id, 
bbq_ckn=sum(bbq_ckn), cali_ckn=sum(cali_ckn), ckn_alfredo=sum(ckn_alfredo), ckn_pesto=sum(ckn_pesto), southw_ckn=sum(southw_ckn), thai_ckn=sum(thai_ckn), big_meat=sum(big_meat), classic_dlx=sum(classic_dlx), hawaiian=sum(hawaiian), ital_cpcllo=sum(ital_cpcllo), napolitana=sum(napolitana), pep_msh_pep=sum(pep_msh_pep), pepperoni=sum(pepperoni), the_greek=sum(the_greek), brie_carre=sum(brie_carre), calabrese=sum(calabrese), ital_supr=sum(ital_supr), peppr_salami=sum(peppr_salami), prsc_argla=sum(prsc_argla), sicilian=sum(sicilian), soppressata=sum(soppressata), spicy_ital=sum(spicy_ital), spinach_supr=sum(spinach_supr), five_cheese=sum(five_cheese), four_cheese=sum(four_cheese), green_garden=sum(green_garden), ital_veggie=sum(ital_veggie), mediterraneo=sum(mediterraneo), mexicana=sum(mexicana), spin_pesto=sum(spin_pesto), spinach_fet=sum(spinach_fet), veggie_veg=sum(veggie_veg)) %>% mutate(
  bbq_ckn=case_when((bbq_ckn>0)==TRUE~1,TRUE~0),
  cali_ckn=case_when((cali_ckn>0)==TRUE~1,TRUE~0),
  ckn_alfredo=case_when((ckn_alfredo>0)==TRUE~1,TRUE~0),
  ckn_pesto=case_when((ckn_pesto>0)==TRUE~1,TRUE~0),
  southw_ckn=case_when((southw_ckn>0)==TRUE~1,TRUE~0),
  thai_ckn=case_when((thai_ckn>0)==TRUE~1,TRUE~0),
  big_meat=case_when((big_meat>0)==TRUE~1,TRUE~0),
  classic_dlx=case_when((classic_dlx>0)==TRUE~1,TRUE~0),
  hawaiian=case_when((hawaiian>0)==TRUE~1,TRUE~0),
  ital_cpcllo=case_when((ital_cpcllo>0)==TRUE~1,TRUE~0),
  napolitana=case_when((napolitana>0)==TRUE~1,TRUE~0),
  pep_msh_pep=case_when((pep_msh_pep>0)==TRUE~1,TRUE~0),
  pepperoni=case_when((pepperoni>0)==TRUE~1,TRUE~0),
  the_greek=case_when((the_greek>0)==TRUE~1,TRUE~0),
  brie_carre=case_when((brie_carre>0)==TRUE~1,TRUE~0),
  calabrese=case_when((calabrese>0)==TRUE~1,TRUE~0),
  ital_supr=case_when((ital_supr>0)==TRUE~1,TRUE~0),
  peppr_salami=case_when((peppr_salami>0)==TRUE~1,TRUE~0),
  prsc_argla=case_when((prsc_argla>0)==TRUE~1,TRUE~0),
  sicilian=case_when((sicilian>0)==TRUE~1,TRUE~0),
  soppressata=case_when((soppressata>0)==TRUE~1,TRUE~0),
  spicy_ital=case_when((spicy_ital>0)==TRUE~1,TRUE~0),
  spinach_supr=case_when((spinach_supr>0)==TRUE~1,TRUE~0),
  five_cheese=case_when((five_cheese>0)==TRUE~1,TRUE~0),
  four_cheese=case_when((four_cheese>0)==TRUE~1,TRUE~0),
  green_garden=case_when((green_garden>0)==TRUE~1,TRUE~0),
  ital_veggie=case_when((ital_veggie>0)==TRUE~1,TRUE~0),
  mediterraneo=case_when((mediterraneo>0)==TRUE~1,TRUE~0),
  mexicana=case_when((mexicana>0)==TRUE~1,TRUE~0),
  spin_pesto=case_when((spin_pesto>0)==TRUE~1,TRUE~0),
  spinach_fet=case_when((spinach_fet>0)==TRUE~1,TRUE~0),
  veggie_veg=case_when((veggie_veg>0)==TRUE~1,TRUE~0))

association_data$order_id=NULL

cleaned <- as(as.matrix(association_data), "transactions")
itemFrequencyPlot(cleaned, support=.1, xlab = "Pizza Type", ylab = "Relative Item Frequency", col="red")

Figure 17: Association of relative item frequency.

The selected association rules below describe relationships between the different pizzas Plato’s sells. For example, the top rule indicates that customers who have ordered the Italian Supreme are more likely to also order the Classic Deluxe. The next indicates customers who have ordered the Spicy Italian and more likely to also order the Pepperoni.

Figure 18: Association rules describing the relationship between different pizzas.

Plato’s Pizza may be able to use this information to move forward with strategies suggested above for encouraging one-item customers to order a second item, or for encouraging two-item customers to order a third. Insight from this association analysis can inform which items they recommended for additional purchase to different customers.

Decision Trees

Decision trees help identify relationships between different variables, and are often portrayed tiered to shape decisions in selecting variables for further analysis. In this case, we can assess the relationships.

The mean squard error, or MSE, is a way to determine the amount of error in a statistical model. While there is no correct value for MSE, the lower the value the better and 0 indicates the model is perfect. It is often used in model selection. For example, the lower MSE, the better the model (comparatively).

sample_index <- sample(nrow(pizza_summary_with_times), nrow(pizza_summary_with_times)*0.90)
pizza_train <- pizza_summary_with_times[sample_index,]
pizza_test <- pizza_summary_with_times[-sample_index,]
pizza_rpart <- rpart(formula = TotalPrice ~ ., data = pizza_summary_with_times)
prp(pizza_rpart, digits = 2, extra = 1)

Figure 19: This shows the separation of orders by total number of items according to the total price of the order. According to this, for example, a total of 8,201 orders have less than 2 items with an average price of $17.

pizza_pred <- predict(pizza_rpart, newdata = pizza_test)
print(paste("MSE:", mean((pizza_test$TotalPrice - pizza_pred)^2)))

## [1] "MSE: 63.8177880696272"

sample_index <- sample(nrow(pizza_complete), nrow(pizza_complete)*0.90)
pizzac_train <- pizza_complete[sample_index,]
pizzac_test <- pizza_complete[-sample_index,]
pizza_complete$category <- as.factor(pizza_complete$category)
pizza_complete$size <- as.factor(pizza_complete$size)
pizzac_rpart <- rpart(formula = price ~ category + size, data = pizza_complete)
prp(pizzac_rpart, digits = 2, extra = 1)

Figure 20: This shows the separation of sizes of pizza by total number of items according to the sizes of the pizza. According to this, for example, a total of 8,609 orders have small classic or veggie pizza classifications with an average price of $12.

pizzac_pred <- predict(pizzac_rpart, newdata = pizzac_test)
print(paste("MSE:", mean((pizzac_test$price - pizzac_pred)^2)))

## [1] "MSE: 1.91295280206912"

In this case, the model with the lower MSE indicates a better model. From the above decision trees, it can be determined that using the categories of pizza in estimating the prices of an order is a better indicator than the using the size. For what we know about pizza sales, this makes snese, as ingredients really determine the cost of a pizza and not just the size. However, both can be useful in shaping decisions made with these variables.

Regression

Generally speaking, regression models are used to determine an equation from the historical outcomes of y based on any number of variables x. These models are useful for predictions, and in many cases, involve more than one independent variable. This is useful for inventory purchasing decisions. Regression Analysis is used to determine which ingredients are used the most across all the pizzas sold. Three models were created. This is the final model with p-values for ingredients with significant relationship to `Total Items Sold`. Below is an initial regression detailing which variables have the greatest impact on total price.

summary(model_totalitems3)

## 
## Call:
## lm(formula = pizza_summary_reg$TotalItems ~ Pepperoni + Ham + 
##     BBQChicken + Chicken + Pineapple + Mozzarella + Provolone + 
##     Jalapeno + Tomato + Red_Pepper + Calabrese + Red_Onion + 
##     Chipotle + Capocollo + Mushrooms + Green_Olives + Green_Peppers + 
##     Artichoke + Fontina + Spinach + Pesto + Italian_Sausage + 
##     Goat_Cheese + Anchovies + Kalamata + Beef + Brie, data = pizza_summary_reg)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -1.464e-11 -5.000e-15 -1.000e-15  3.000e-15  6.971e-11 
## 
## Coefficients:
##                   Estimate Std. Error    t value Pr(>|t|)    
## (Intercept)      4.286e-13  5.732e-15  7.478e+01  < 2e-16 ***
## Pepperoni       -7.180e-14  9.993e-15 -7.184e+00 6.97e-13 ***
## Ham             -1.500e+00  3.784e-14 -3.964e+13  < 2e-16 ***
## BBQChicken       1.315e-13  2.851e-14  4.615e+00 3.96e-06 ***
## Chicken         -9.328e-14  1.180e-14 -7.904e+00 2.83e-15 ***
## Pineapple        1.500e+00  3.443e-14  4.357e+13  < 2e-16 ***
## Mozzarella       1.000e+00  7.914e-15  1.264e+14  < 2e-16 ***
## Provolone       -6.167e-14  1.616e-14 -3.815e+00 0.000136 ***
## Jalapeno         2.000e+00  4.759e-14  4.202e+13  < 2e-16 ***
## Tomato           1.297e-13  2.865e-14  4.526e+00 6.04e-06 ***
## Red_Pepper      -5.000e-01  1.462e-14 -3.419e+13  < 2e-16 ***
## Calabrese       -2.000e+00  6.037e-14 -3.313e+13  < 2e-16 ***
## Red_Onion        1.000e+00  2.333e-14  4.287e+13  < 2e-16 ***
## Chipotle        -1.500e+00  4.837e-14 -3.101e+13  < 2e-16 ***
## Capocollo        1.000e+00  3.384e-14  2.955e+13  < 2e-16 ***
## Mushrooms        5.000e-01  2.577e-14  1.940e+13  < 2e-16 ***
## Green_Olives     1.000e+00  3.507e-14  2.851e+13  < 2e-16 ***
## Green_Peppers    5.000e-01  2.455e-14  2.036e+13  < 2e-16 ***
## Artichoke       -5.000e-01  2.010e-14 -2.488e+13  < 2e-16 ***
## Fontina         -5.000e-01  3.256e-14 -1.536e+13  < 2e-16 ***
## Spinach         -5.000e-01  1.605e-14 -3.116e+13  < 2e-16 ***
## Pesto            2.000e+00  3.678e-14  5.438e+13  < 2e-16 ***
## Italian_Sausage  1.000e+00  1.593e-14  6.278e+13  < 2e-16 ***
## Goat_Cheese      5.000e-01  2.358e-14  2.120e+13  < 2e-16 ***
## Anchovies       -1.000e+00  3.367e-14 -2.970e+13  < 2e-16 ***
## Kalamata         1.000e+00  3.192e-14  3.133e+13  < 2e-16 ***
## Beef            -1.000e+00  3.873e-14 -2.582e+13  < 2e-16 ***
## Brie             1.000e+00  2.319e-14  4.313e+13  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.015e-13 on 21322 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 9.146e+27 on 27 and 21322 DF,  p-value: < 2.2e-16

This shows the significance (p-values) of all variables in determining which ingredients affect the total number of items sold, and they are all influential (some more than others) in determining the dependent variables.

qqnorm(model_totalitems3$residuals, main = "Total Ingredients Regression")
qqline(model_totalitems3$residuals, col = 'blue')

Figure 21: This shows the QQ plot of the regression of all ingredients used on pizzas sold.

Market Basket Analysis

Market basket analysis will show which ingredients are used the most frequently for all the pizzas sold. The market basket analysis created below uses a new data set removing columns 2-10, as they are not usable in this analysis.

Below are the top 5 ingredients that are most frequently used in all the pizzas sold:

Tomato - 27,635 times
Garlic - 27,422 times
Onion - 21,914 times
Red Onion - 19,547 times
Red Pepper - 16,284 times

broken_out_all_ing <- broken_out[,-2:-10]
broken_out_all_ing <- broken_out_all_ing[,-1]
broken_out_all_ing_tran <- as(as.matrix(broken_out_all_ing), "transactions")
itemFrequencyPlot(broken_out_all_ing_tran, support = 0.1, cex.names = 0.8, main = "Item Frequency Plot", xlab = "Frequency", topN = 15, col = 'red', type = "absolute", horiz = TRUE)

Figure 22: This is the frequency of the times each ingredient has been sold for the top 15 ingredients.

The below is a list of all 45 rules created, descending by lift. If the lift is greater than 1, it shows that there is a relationship between the ingredents. This is a view all 45 rules and their support, lift, confidence, and a count of how many times that rule is used. As an item of note, additional rules were created without a left-hand side of the rule, indicating that regardless of the left, the right would be plausible. However, because of the relationships we are looking to inspect, those were excluded.

options(width = 300)
basket_rules_all_ingred <- apriori(broken_out_all_ing_tran, parameter = list(sup = 0.1, conf = .9, target = "rules"))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target  ext
##         0.9    0.1    1 none FALSE            TRUE       5     0.1      1     10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 4862 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[63 item(s), 48620 transaction(s)] done [0.01s].
## sorting and recoding items ... [17 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [45 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

quality(basket_rules_all_ingred)<-round(quality(basket_rules_all_ingred),digits=3)
basket_rules_ai.sorted <- sort(basket_rules_all_ingred, by="lift")
options(width = 300)
inspect(basket_rules_ai.sorted)

##      lhs                                   rhs          support confidence coverage lift  count
## [1]  {Pepperoni}                        => {Pear}       0.212   1          0.212    4.720 10300
## [2]  {Pear}                             => {Pepperoni}  0.212   1          0.212    4.720 10300
## [3]  {Pepperoni}                        => {Thyme}      0.212   1          0.212    4.720 10300
## [4]  {Thyme}                            => {Pepperoni}  0.212   1          0.212    4.720 10300
## [5]  {Pear}                             => {Thyme}      0.212   1          0.212    4.720 10300
## [6]  {Thyme}                            => {Pear}       0.212   1          0.212    4.720 10300
## [7]  {Pepperoni, Pear}                  => {Thyme}      0.212   1          0.212    4.720 10300
## [8]  {Pepperoni, Thyme}                 => {Pear}       0.212   1          0.212    4.720 10300
## [9]  {Pear, Thyme}                      => {Pepperoni}  0.212   1          0.212    4.720 10300
## [10] {Chicken, Red_Onion}               => {Red_Pepper} 0.108   1          0.108    2.986  5237
## [11] {Chicken, Onion}                   => {Red_Pepper} 0.108   1          0.108    2.986  5237
## [12] {Chicken, Tomato}                  => {Red_Pepper} 0.155   1          0.155    2.986  7533
## [13] {Chicken, Red_Onion, Onion}        => {Red_Pepper} 0.108   1          0.108    2.986  5237
## [14] {Red_Pepper, Mushrooms}            => {Red_Onion}  0.101   1          0.101    2.487  4906
## [15] {Mushrooms, Onion}                 => {Red_Onion}  0.130   1          0.130    2.487  6338
## [16] {Chicken, Onion}                   => {Red_Onion}  0.108   1          0.108    2.487  5237
## [17] {Red_Pepper, Onion}                => {Red_Onion}  0.218   1          0.218    2.487 10619
## [18] {Red_Pepper, Mushrooms, Onion}     => {Red_Onion}  0.101   1          0.101    2.487  4906
## [19] {Chicken, Red_Pepper, Onion}       => {Red_Onion}  0.108   1          0.108    2.487  5237
## [20] {Tomato, Red_Pepper, Onion}        => {Red_Onion}  0.149   1          0.149    2.487  7223
## [21] {Red_Onion}                        => {Onion}      0.402   1          0.402    2.219 19547
## [22] {Garlic, Green_Olives}             => {Onion}      0.107   1          0.107    2.219  5187
## [23] {Red_Pepper, Mushrooms}            => {Onion}      0.101   1          0.101    2.219  4906
## [24] {Red_Onion, Mushrooms}             => {Onion}      0.130   1          0.130    2.219  6338
## [25] {Chicken, Red_Onion}               => {Onion}      0.108   1          0.108    2.219  5237
## [26] {Red_Pepper, Red_Onion}            => {Onion}      0.218   1          0.218    2.219 10619
## [27] {Red_Onion, Garlic}                => {Onion}      0.226   1          0.226    2.219 10971
## [28] {Tomato, Red_Onion}                => {Onion}      0.303   1          0.303    2.219 14719
## [29] {Tomato, Garlic, Green_Olives}     => {Onion}      0.107   1          0.107    2.219  5187
## [30] {Red_Pepper, Red_Onion, Mushrooms} => {Onion}      0.101   1          0.101    2.219  4906
## [31] {Chicken, Red_Pepper, Red_Onion}   => {Onion}      0.108   1          0.108    2.219  5237
## [32] {Tomato, Red_Pepper, Red_Onion}    => {Onion}      0.149   1          0.149    2.219  7223
## [33] {Tomato, Red_Onion, Garlic}        => {Onion}      0.196   1          0.196    2.219  9539
## [34] {Capocollo}                        => {Garlic}     0.135   1          0.135    1.773  6572
## [35] {Green_Olives, Onion}              => {Garlic}     0.107   1          0.107    1.773  5187
## [36] {Tomato, Capocollo}                => {Garlic}     0.135   1          0.135    1.773  6572
## [37] {Tomato, Green_Olives, Onion}      => {Garlic}     0.107   1          0.107    1.773  5187
## [38] {Green_Olives}                     => {Tomato}     0.127   1          0.127    1.766  6174
## [39] {Artichoke}                        => {Tomato}     0.117   1          0.117    1.766  5682
## [40] {Capocollo}                        => {Tomato}     0.135   1          0.135    1.766  6572
## [41] {Green_Olives, Onion}              => {Tomato}     0.107   1          0.107    1.766  5187
## [42] {Garlic, Green_Olives}             => {Tomato}     0.107   1          0.107    1.766  5187
## [43] {Garlic, Capocollo}                => {Tomato}     0.135   1          0.135    1.766  6572
## [44] {Red_Pepper, Garlic}               => {Tomato}     0.130   1          0.130    1.766  6316
## [45] {Garlic, Green_Olives, Onion}      => {Tomato}     0.107   1          0.107    1.766  5187

plot(basket_rules_ai.sorted, method = "graph", engine = "htmlwidget")

Figure 23: Above is an interactive graph showing all the rules and which ingredients are used, as well as their lift, support, confidence, and count. A blue arrow indicates that the starting ingredient is on the left side and the end of the arrow is the right side. A red arrow indicates that the starting ingredient is on the right side and the end of the arrow is the left side. This can also be clicked on to show related ingredients. It can also be zoomed in to see the details better.

plot(basket_rules_ai.sorted, main = "Confidence by Support and Lift")

Figure 24: Above is a scatter plot of all 45 basket rules.

Evaluation and Deployment

The original goal of this project was to answer the below questions to help Plato’s Pizza shape their business. With some initial exploratory analysis and additional analytics on the data sets, we have come to the following:

What days and times is Plato’s Pizza the busiest?

According to Figure 11, pizzas are sold primarily between the hours of 12:30 PM and 6:30 PM. According to Figure 2, pizzas are primarily sold during the first month. While both demonstrations are slightly off from expectations, this can help Plato’s shape their staffing strategy, as well as allow them to increase sales thorough various specials and other advertising during the less popular dates/times.

What are the most popular sizes and types of pizzas ordered?

The most common category of pizza sold is Classic according to Figure 8 and the most common size sold is Large according to Figure 7

What are the most used ingredients?

According to Figure 22, the following ingredients are used the most times:
- Tomato - 27,635 times
- Garlic - 27,422 times
- Onion - 21,914 times
- Red Onion - 19,547 times
- Red Pepper - 16,284 times

What’s the average order value?

Each order is generally under $50 according to Figure 1.

With some additional analysis, we also learned some additional items for explanation and exploration. For example, in K-means clustering, we know that most orders are in the first and second clusters, with 1-2 items per order, with a total order price of roughly $17-40. Within these two k-means clusters are a total of 16,358 orders, which makes up about 76% of the total orders. This, outlined in `Figure 15`, is consistent with the initial exploratory data. This is also consistent with the regression, as well. There are a few recommendations we can make from this:

Offering specials on larger orders (such as a 15% discount on orders over $50.00) to increase the number of items and the price of items sold
Offer bundled deals (such as deals on orders over 5 items) to increase the number of items sold
Offer specials during certain months (such as a $2.00 discount per order during the months of August and September) to increase sales during these months

It is also important to note that all rules established above in the market basket analysis relate primarily to the most common ingredients. This is critial in determining which ingredients should be in stock, but also important for shaping additional deals.