What days and times is Plato’s Pizza the busiest?
What are the most popular sizes and types of pizzas ordered?
What are the most used ingredients?
What’s the average order value?
Data was pulled from Maven: Plato’s Pizza Challenge.
This data set contains four tables in CSV format:
Orders Table (orders.csv): Date and
Time of order by Order ID
order_details.csv): Order ID, Pizza ID, and Quantity by
Order Details ID
pizzas.csv): Pizza Type
ID, Size, and Price by Pizza ID
pizza_type.csv):
Name, Category, and Ingredients by Pizza Type ID
orders <- read.csv("/Users/ryannlaky/Documents/University of Indianapolis/MSDA 621/Project/Raw Data/orders.csv")
order_details <- read.csv("/Users/ryannlaky/Documents/University of Indianapolis/MSDA 621/Project/Raw Data/order_details.csv")
pizzas <- read.csv("/Users/ryannlaky/Documents/University of Indianapolis/MSDA 621/Project/Raw Data/pizzas.csv")
pizza_types <- read.csv("/Users/ryannlaky/Documents/University of Indianapolis/MSDA 621/Project/Raw Data/pizza_types.csv")
pizzas to
order_details table by pizza_id.pizzas table has the price of each item, which is
critical for further analysis. The order_details table is
on the left and the pizzas table is on the right. Each row
is then verified to ensure each row gets a price assigned as part of
this join.order_details %>% left_join(pizzas, key="pizza_id") -> orders_price
sum(is.na(orders_price$price)) #0
0. This indicates there are no missing prices for the
output, and visually inspecting the data frame proves this to be true,
to include the output table being the right size: 48,620
rows by 4 columns.
pizza_types to
orders_price by pizza_type_id, adding on
information from the pizza types table about ingredients to product from
above.orders_price %>% left_join(pizza_types, key="pizza_type_id") -> pizza_complete
head(pizza_complete)
sum(is.na(pizza_complete$ingredients))
pizza_complete <- pizza_complete %>% mutate(full_price=quantity*price) #create a true price column for each order line using quantity * price
broken_out <- pizza_complete %>%
mutate(
Pepperoni = case_when(str_detect(ingredients,"Pepperoni")==TRUE ~ 1,TRUE~0),
Ham = case_when(str_detect(ingredients,"Sliced Ham")==TRUE ~ 1,TRUE~0),
BBQChicken = case_when(str_detect(ingredients,"Barbecued Chicken")==TRUE ~ 1,TRUE~0),
Chicken = case_when(str_detect(ingredients,"Chicken")==TRUE ~ 1,TRUE~0),
Pineapple = case_when(str_detect(ingredients,"Pineapple")==TRUE ~ 1,TRUE~0),
Mozzarella = case_when(str_detect(ingredients,"Mozzarella")==TRUE ~ 1,TRUE~0),
Provolone = case_when(str_detect(ingredients,"Provolone")==TRUE ~ 1,TRUE~0),
Jalapeno = case_when(str_detect(ingredients,"Jalapeno")==TRUE ~ 1,TRUE~0),
Tomato = case_when(str_detect(ingredients,"Tomato")==TRUE ~ 1,TRUE~0),
Bacon = case_when(str_detect(ingredients,"Bacon")==TRUE ~ 1,TRUE~0),
Red_Pepper = case_when(str_detect(ingredients,"Red Peppers")==TRUE ~ 1,TRUE~0),
Calabrese = case_when(str_detect(ingredients,"Calabrese")==TRUE ~ 1,TRUE~0),
Red_Onion = case_when(str_detect(ingredients,"Red Onions")==TRUE ~ 1,TRUE~0),
Thai_Sweet = case_when(str_detect(ingredients,"Thai")==TRUE ~ 1,TRUE~0),
Chipotle = case_when(str_detect(ingredients,"Chipotle")==TRUE ~ 1,TRUE~0),
Garlic = case_when(str_detect(ingredients,"Garlic")==TRUE ~ 1,TRUE~0),
Gouda = case_when(str_detect(ingredients,"Gouda")==TRUE ~ 1,TRUE~0),
Blue_Cheese = case_when(str_detect(ingredients,"Blue")==TRUE ~ 1,TRUE~0),
Cilantro = case_when(str_detect(ingredients,"Cilantro")==TRUE ~ 1,TRUE~0),
Corn = case_when(str_detect(ingredients,"Corn")==TRUE ~ 1,TRUE~0),
Capocollo = case_when(str_detect(ingredients,"Capocollo")==TRUE ~ 1,TRUE~0),
Romano = case_when(str_detect(ingredients,"Romano")==TRUE ~ 1,TRUE~0),
Mushrooms = case_when(str_detect(ingredients,"Mushrooms")==TRUE ~ 1,TRUE~0),
Green_Olives = case_when(str_detect(ingredients,"Green Olives")==TRUE ~ 1,TRUE~0),
Green_Peppers = case_when(str_detect(ingredients,"Green Peppers")==TRUE ~ 1,TRUE~0),
BBQ_Sauce = case_when(str_detect(ingredients,"Barbecue Sauce")==TRUE ~ 1,TRUE~0),
Artichoke = case_when(str_detect(ingredients,"Artichokes")==TRUE ~ 1,TRUE~0),
Fontina = case_when(str_detect(ingredients,"Fontina")==TRUE ~ 1,TRUE~0),
Spinach = case_when(str_detect(ingredients,"Spinach")==TRUE ~ 1,TRUE~0),
Asiago = case_when(str_detect(ingredients,"Asiago")==TRUE ~ 1,TRUE~0),
Alfredo = case_when(str_detect(ingredients,"Alfredo")==TRUE ~ 1,TRUE~0),
Pesto = case_when(str_detect(ingredients,"Pesto")==TRUE ~ 1,TRUE~0),
Italian_Sausage = case_when(str_detect(ingredients,"Italian")==TRUE ~ 1,TRUE~0),
Chorizo = case_when(str_detect(ingredients,"Chorizo")==TRUE ~ 1,TRUE~0),
Goat_Cheese = case_when(str_detect(ingredients,"Goat")==TRUE ~ 1,TRUE~0),
Oregano = case_when(str_detect(ingredients,"Oregano")==TRUE ~ 1,TRUE~0),
Anchovies = case_when(str_detect(ingredients,"Anchovies")==TRUE ~ 1,TRUE~0),
Kalamata = case_when(str_detect(ingredients,"Kalamata")==TRUE ~ 1,TRUE~0),
Feta = case_when(str_detect(ingredients,"Feta")==TRUE ~ 1,TRUE~0),
Beef = case_when(str_detect(ingredients,"Beef")==TRUE ~ 1,TRUE~0),
Brie = case_when(str_detect(ingredients,"Brie Carre")==TRUE ~ 1,TRUE~0),
Prosciutto = case_when(str_detect(ingredients,"Prosciutto")==TRUE ~ 1,TRUE~0),
Carm_Onion = case_when(str_detect(ingredients,"Caramelized Onions")==TRUE ~ 1,TRUE~0),
Pear = case_when(str_detect(ingredients,"Pepperoni")==TRUE ~ 1,TRUE~0),
Thyme = case_when(str_detect(ingredients,"Pepperoni")==TRUE ~ 1,TRUE~0),
Nduja_Sal = case_when(str_detect(ingredients,"Nduja")==TRUE ~ 1,TRUE~0),
Pancetta = case_when(str_detect(ingredients,"Pancetta")==TRUE ~ 1,TRUE~0),
Friggitello = case_when(str_detect(ingredients,"Friggitello")==TRUE ~ 1,TRUE~0),
Genoa_Sal = case_when(str_detect(ingredients,"Genoa")==TRUE ~ 1,TRUE~0),
San_Daniele = case_when(str_detect(ingredients,"San Daniele")==TRUE ~ 1,TRUE~0),
Coarse_Sicialian = case_when(str_detect(ingredients,"Coarse")==TRUE ~ 1,TRUE~0),
Arugula = case_when(str_detect(ingredients,"Arugula")==TRUE ~ 1,TRUE~0),
Soppressata = case_when(str_detect(ingredients,"Soppressata")==TRUE ~ 1,TRUE~0),
Onion = case_when(str_detect(ingredients,"Onion")==TRUE ~ 1,TRUE~0),
Luganega = case_when(str_detect(ingredients,"Luganega")==TRUE ~ 1,TRUE~0),
Peperoncini = case_when(str_detect(ingredients,"Peperoncini verdi")==TRUE ~ 1,TRUE~0),
Ricotta = case_when(str_detect(ingredients,"Ricotta")==TRUE ~ 1,TRUE~0),
Gorgonzola = case_when(str_detect(ingredients,"Gorgonzola")==TRUE ~ 1,TRUE~0),
Parmigiano = case_when(str_detect(ingredients,"Parmigiano")==TRUE ~ 1,TRUE~0),
Eggplant = case_when(str_detect(ingredients,"Eggplant")==TRUE ~ 1,TRUE~0),
Zucchini = case_when(str_detect(ingredients,"Zucchini")==TRUE ~ 1,TRUE~0),
Sun_Dried = case_when(str_detect(ingredients,"Sun-dried Tomatoes")==TRUE ~ 1,TRUE~0),
Plum_Tom = case_when(str_detect(ingredients,"Plum")==TRUE ~ 1,TRUE~0))
colnames(broken_out)
## [1] "order_details_id" "order_id" "pizza_id" "quantity"
## [5] "pizza_type_id" "size" "price" "name"
## [9] "category" "ingredients" "full_price" "Pepperoni"
## [13] "Ham" "BBQChicken" "Chicken" "Pineapple"
## [17] "Mozzarella" "Provolone" "Jalapeno" "Tomato"
## [21] "Bacon" "Red_Pepper" "Calabrese" "Red_Onion"
## [25] "Thai_Sweet" "Chipotle" "Garlic" "Gouda"
## [29] "Blue_Cheese" "Cilantro" "Corn" "Capocollo"
## [33] "Romano" "Mushrooms" "Green_Olives" "Green_Peppers"
## [37] "BBQ_Sauce" "Artichoke" "Fontina" "Spinach"
## [41] "Asiago" "Alfredo" "Pesto" "Italian_Sausage"
## [45] "Chorizo" "Goat_Cheese" "Oregano" "Anchovies"
## [49] "Kalamata" "Feta" "Beef" "Brie"
## [53] "Prosciutto" "Carm_Onion" "Pear" "Thyme"
## [57] "Nduja_Sal" "Pancetta" "Friggitello" "Genoa_Sal"
## [61] "San_Daniele" "Coarse_Sicialian" "Arugula" "Soppressata"
## [65] "Onion" "Luganega" "Peperoncini" "Ricotta"
## [69] "Gorgonzola" "Parmigiano" "Eggplant" "Zucchini"
## [73] "Sun_Dried" "Plum_Tom"
broken_out$ingredients=NULL #removing old ingredients columns
by_order_id <- group_by(broken_out,order_id) #creating summary data set with ingredient counts/prices by order
pizza_summary <- summarize(by_order_id,
TotalPrice=sum(full_price), #fixed
TotalItems=n(),
Pepperoni=sum(Pepperoni),
Ham = sum(Ham),
BBQChicken = sum(BBQChicken),
Chicken = sum(Chicken),
Pineapple = sum(Pineapple),
Mozzarella = sum(Mozzarella),
Provolone = sum(Provolone),
Jalapeno = sum(Jalapeno),
Tomato = sum(Tomato),
Bacon = sum(Bacon),
Red_Pepper = sum(Red_Pepper),
Calabrese = sum(Calabrese),
Red_Onion = sum(Red_Onion),
Thai_Sweet = sum(Thai_Sweet),
Chipotle = sum(Chipotle),
Garlic = sum(Garlic),
Gouda = sum(Gouda),
Blue_Cheese = sum(Blue_Cheese),
Cilantro = sum(Cilantro),
Corn = sum(Corn),
Capocollo = sum(Capocollo),
Romano = sum(Romano),
Mushrooms = sum(Mushrooms),
Green_Olives = sum(Green_Olives),
Green_Peppers = sum(Green_Peppers),
BBQ_Sauce = sum(BBQ_Sauce),
Artichoke = sum(Artichoke),
Fontina = sum(Fontina),
Spinach = sum(Spinach),
Asiago = sum(Asiago),
Alfredo = sum(Alfredo),
Pesto = sum(Pesto),
Italian_Sausage = sum(Italian_Sausage),
Chorizo = sum(Chorizo),
Goat_Cheese = sum(Goat_Cheese),
Oregano = sum(Oregano),
Anchovies = sum(Anchovies),
Kalamata = sum(Kalamata),
Feta = sum(Feta),
Beef = sum(Beef),
Brie = sum(Brie),
Prosciutto = sum(Prosciutto),
Carm_Onion = sum(Carm_Onion),
Pear = sum(Pear),
Thyme = sum(Thyme),
Nduja_Sal = sum(Nduja_Sal),
Pancetta = sum(Pancetta),
Friggitello = sum(Friggitello),
Genoa_Sal = sum(Genoa_Sal),
San_Daniele = sum(San_Daniele),
Coarse_Sicialian = sum(Coarse_Sicialian),
Arugula = sum(Arugula),
Soppressata = sum(Soppressata),
Onion = sum(Onion),
Luganega = sum(Luganega),
Peperoncini = sum(Peperoncini),
Ricotta = sum(Ricotta),
Gorgonzola = sum(Gorgonzola),
Parmigiano = sum(Parmigiano),
Eggplant = sum(Eggplant),
Zucchini = sum(Zucchini),
Sun_Dried = sum(Sun_Dried),
Plum_Tom = sum(Plum_Tom))
pizza_types to
orders_price by pizza_type_id, adding on
information from the pizza types table about ingredients to product from
above.pizza_summary_with_times <- pizza_summary %>% left_join(orders,key="order_id")
nrow(pizza_summary_with_times) #21,350
pizza_summary_with_times[c('Year','Month', 'Day')]<-str_split_fixed(pizza_summary_with_times$date, "-", 3) #split date column into year, month, day
pizza_summary_with_times[c('Hour','Minute', 'Seconds')]<-str_split_fixed(pizza_summary_with_times$time, ":", 3) #split time column into hour, minute, second
pizza_summary_with_times$date=NULL #removing old date column
pizza_summary_with_times$time=NULL #removing old time column
pizza_summary_with_times$Seconds=NULL #removed seconds column (not useful)
pizza_summary_with_times$Month <- as.numeric(pizza_summary_with_times$Month) #month as numeric
pizza_summary_with_times$Year <- as.numeric(pizza_summary_with_times$Year) #year as numeric
pizza_summary_with_times$Day <- as.numeric(pizza_summary_with_times$Day) #day as numeric
pizza_summary_with_times$Hour <- as.numeric(pizza_summary_with_times$Hour) #hour as numeric
pizza_summary_with_times$Minute <- as.numeric(pizza_summary_with_times$Minute) #minute as numeric
colnames(pizza_summary_with_times)
## [1] "order_id" "TotalPrice" "TotalItems" "Pepperoni"
## [5] "Ham" "BBQChicken" "Chicken" "Pineapple"
## [9] "Mozzarella" "Provolone" "Jalapeno" "Tomato"
## [13] "Bacon" "Red_Pepper" "Calabrese" "Red_Onion"
## [17] "Thai_Sweet" "Chipotle" "Garlic" "Gouda"
## [21] "Blue_Cheese" "Cilantro" "Corn" "Capocollo"
## [25] "Romano" "Mushrooms" "Green_Olives" "Green_Peppers"
## [29] "BBQ_Sauce" "Artichoke" "Fontina" "Spinach"
## [33] "Asiago" "Alfredo" "Pesto" "Italian_Sausage"
## [37] "Chorizo" "Goat_Cheese" "Oregano" "Anchovies"
## [41] "Kalamata" "Feta" "Beef" "Brie"
## [45] "Prosciutto" "Carm_Onion" "Pear" "Thyme"
## [49] "Nduja_Sal" "Pancetta" "Friggitello" "Genoa_Sal"
## [53] "San_Daniele" "Coarse_Sicialian" "Arugula" "Soppressata"
## [57] "Onion" "Luganega" "Peperoncini" "Ricotta"
## [61] "Gorgonzola" "Parmigiano" "Eggplant" "Zucchini"
## [65] "Sun_Dried" "Plum_Tom" "Year" "Month"
## [69] "Day" "Hour" "Minute"
hist(pizza_summary_with_times$TotalPrice, main = "Histogram of Orders by Total Price", xlab = "Total Price ($)", col="red", breaks = 15)
Figure 1: Histogram of Total Orders by Price, where the majority of orders are under $50.
hist(pizza_summary_with_times$Month, main = "Histogram of Orders by Month", xlab = "Month", col="red", breaks = 13)
Figure 2: Histogram of Total Orders by Month, where the majority of orders seem to be sold in month 1, being January.
hist(pizza_summary_with_times$TotalItems, main = "Histogram of Orders by Total Items", xlab = "Total Item Count per Order", xlim = c(0,10), col="red", breaks = 10)
Figure 3: Histogram of Total Orders by Total Items, where the majority of orders have 5 items or less. While total item count per order exceeds 20, this histogram has been limited to show orders with up to 10 items in each.
Figure 4: This shows an example of the total number of pizzas ordered by category, which can roughly tell which pizzas specifically (and by category) have been ordered the most.
Figure 5: This shows the most used ingredients, which can be useful in supply chain management and understanding which ingredients need to be kept in stock more often than others. While this isn’t a problem initially addressed, it can still shed insight into this product management.
ingredients1 <- apply(pizza_summary_with_times[,4:34], 2, max)
ingredients2 <- apply(pizza_summary_with_times[,35:66], 2, max)
par(mar=c(11,4,4,4))
barplot(ingredients1, main = "Ingredient Use by Ingredient, pt. 1", ylab = "Count", col = "red", las=2)
par(mar=c(11,4,4,4))
barplot(ingredients2, main = "Ingredient Use by Ingredient, pt. 2", ylab = "Count", col = "red", las=2)
Figure 6: Above are the total ingredients used where each time an ingredient is used in an order, it is counted.
sizes <- table(pizza_complete$size) #Pizza Sizes
barplot(sizes, ylab= "Frequency", xlab="Sizes", main= "Size of Pizzas Ordered", col="red", ylim=c(0,20000))
Figure 7: Above is a barplot displaying the sizes of pizzas ordered.
categoriesofpizza <- table(pizza_complete$category) #Pizza Categories
barplot(categoriesofpizza, ylab="Frequency", main= "Categories of Pizza", las=2, col="red", ylim=c(0,17000))
Figure 8: Above is a bar plot displaying the frequency of categories of pizza ordered. These categories are pre-determined by Plato’s.
nameofpizza <- table(pizza_complete$name)
par(mar=c(15,4,4,4))
a <- barplot(nameofpizza, ylab="Frequency", main="Names of Pizza", las=2, col="red", ylim=c(0,3000))
text(a, nameofpizza + 125, nameofpizza, font=2, cex = 0.5)
Figure 9: Above are the frequencies of the names of pizzas ordered.
par(mar=c(11,4,4,4))
boxplot(pizza_summary_with_times[,4:34], main = "Ingredient Box Plot, pt. 1", notch=FALSE, col=c("red", "blue", "yellow", "grey"), las=2)
par(mar=c(11,4,4,4))
boxplot(pizza_summary_with_times[,35:66], main = "Ingredient Box Plot, pt. 2", notch=FALSE, col=c("red", "blue", "yellow", "grey"), las=2)
Figure 10: Above are two boxplots displaying where ingredients are used. While these may not provide any obvious value, they do display that every ingredient used has outliers.
boxplot(pizza_summary_with_times$Month, main = "Total Pizzas by Month", notch=FALSE, col=c("red", "blue", "yellow", "grey"))
boxplot(pizza_summary_with_times$Hour, main = "Total Pizzas by Hour", notch=FALSE, col=c("red", "blue", "yellow", "grey"))
Figure 11: Above are two boxplots, one indicating the number of pizzas ordered per month and the other dictating the number of pizzas ordered by hour. There are no outliers in either. For the by-month summary, this tells us that although the month of January has higher sales, this still isn’t an outlier. For the by-hour summary, this is expected as Plato’s has limited hours of operation. An outlier of any sorts would be considered problematic and means sales are being conducted outside of traditional business hours. If this existed, this would require special examination.
boxplot(pizza_summary_with_times$TotalPrice, main = "Box Plot by Total Price", notch=T, col=c("red", "blue", "yellow", "grey"))
boxplot(pizza_summary_with_times$TotalItems, main = "Box Plot by Total Items", notch=T, col=c("red", "blue", "yellow", "grey"))
Figure 12: Above are two additional boxplots of dictating total price and total items for all orders. There are a handful of outliers in both.
Figure 13: This shows with color and further labeling the ingredients most commonly occurring by pizza category. Sum of pizzas ordered are detailed in the legend on the right.
wss <- 0
kmeans_pizza <- pizza_summary %>%
select(c("order_id","TotalPrice","TotalItems","Capocollo","Chicken",
"Feta", "Garlic", "Jalapeno", "Mozzarella", "Mushrooms", "Onion",
"Pear", "Pepperoni", "Red_Pepper", "Red_Onion", "Spinach", "Tomato",
"Thyme"))
for (i in 1:15) {
km.out <- kmeans(kmeans_pizza[,-1], centers = i, nstart=20)
wss[i] <- km.out$tot.withinss
}
kclusters <- kmeans(kmeans_pizza[,-1], 4)
for (i in 1:15) {
km.out <- kmeans(kmeans_pizza[,-1], centers = i, nstart=20)
wss[i] <- km.out$tot.withinss
}
plot(1:15, wss, type = "b", xlab = "Number of Clusters", ylab = "Within Groups Sum of Squares")
Figure 14: In the above graph showing the number of clusters, the optimal number is around 4, where returns become diminished as clusters increase.
Figure 15: The differentiating features for the four order clusters appear to be the total price and total items.
Cluster 1 – This is cluster with the highest number of observations. This cluster’s orders also have the lowest price on average and an average total item count of 1. This indicates that a significant portion of the Plato’s Pizza customer base orders just one item.
Cluster 2 – A significant portion of orders fall into cluster 2. Cluster 2 has a higher average price than Cluster 1 and 2 total items. Plato’s may consider further analysis to determine differences between Cluster 1 and Cluster 2 customers to determine how customers from Cluster 1 could be persuaded to purchase more items and become more like customers from Cluster 2.
Cluster 3 – A smaller but still significant portion of orders are classified into Cluster 3. Cluster 3’s differences from Cluster 2 are similar to those described above between Cluster 2 and Cluster 1; Cluster 3’s customers spend slightly more money and tend to order about 3 items. A similar approach could be taken to determine if it would be beneficial to upsell customer’s matching the Cluster 2 profile to become more like Cluster 3.
Cluster 4 – A small number of orders fall into Cluster 4. Cluster 4 orders have a much higher total price and total number of items than the other clusters. Plato’s might consider looking into net profit from large orders like those from Cluster 4 to determine if it would be a good idea to develop more business with the customer segment.
hclusters <- hclust(dist(pizza_summary[,-1]))
plot(hclusters, xlab = "Clusters", ylab = "Height")
rect.hclust(hclusters, k = 3)
Figure 16: Hierarchical clustering diagram of pizza_summary.
assoc_pizza <- pizza_complete %>%
mutate(bbq_ckn = case_when(str_detect(pizza_type_id,"bbq_ckn")==TRUE ~ 1,TRUE~0),
cali_ckn = case_when(str_detect(pizza_type_id,"cali_ckn")==TRUE ~ 1,TRUE~0),
ckn_alfredo = case_when(str_detect(pizza_type_id,"ckn_alfredo")==TRUE ~ 1,TRUE~0),
ckn_pesto = case_when(str_detect(pizza_type_id,"ckn_pesto")==TRUE ~ 1,TRUE~0),
southw_ckn = case_when(str_detect(pizza_type_id,"southw_ckn")==TRUE ~ 1,TRUE~0),
thai_ckn = case_when(str_detect(pizza_type_id,"thai_ckn")==TRUE ~ 1,TRUE~0),
big_meat = case_when(str_detect(pizza_type_id,"big_meat")==TRUE ~ 1,TRUE~0),
classic_dlx = case_when(str_detect(pizza_type_id,"classic_dlx")==TRUE ~ 1,TRUE~0),
hawaiian = case_when(str_detect(pizza_type_id,"hawaiian")==TRUE ~ 1,TRUE~0),
ital_cpcllo = case_when(str_detect(pizza_type_id,"ital_cpcllo")==TRUE ~ 1,TRUE~0),
napolitana = case_when(str_detect(pizza_type_id,"napolitana")==TRUE ~ 1,TRUE~0),
pep_msh_pep = case_when(str_detect(pizza_type_id,"pep_msh_pep")==TRUE ~ 1,TRUE~0),
pepperoni = case_when(str_detect(pizza_type_id,"pepperoni")==TRUE ~ 1,TRUE~0),
the_greek = case_when(str_detect(pizza_type_id,"the_greek")==TRUE ~ 1,TRUE~0),
brie_carre = case_when(str_detect(pizza_type_id,"brie_carre")==TRUE ~ 1,TRUE~0),
calabrese = case_when(str_detect(pizza_type_id,"calabrese")==TRUE ~ 1,TRUE~0),
ital_supr = case_when(str_detect(pizza_type_id,"ital_supr")==TRUE ~ 1,TRUE~0),
peppr_salami = case_when(str_detect(pizza_type_id,"peppr_salami")==TRUE ~ 1,TRUE~0),
prsc_argla = case_when(str_detect(pizza_type_id,"prsc_argla")==TRUE ~ 1,TRUE~0),
sicilian = case_when(str_detect(pizza_type_id,"sicilian")==TRUE ~ 1,TRUE~0),
soppressata = case_when(str_detect(pizza_type_id,"soppressata")==TRUE ~ 1,TRUE~0),
spicy_ital = case_when(str_detect(pizza_type_id,"spicy_ital")==TRUE ~ 1,TRUE~0),
spinach_supr = case_when(str_detect(pizza_type_id,"spinach_supr")==TRUE ~ 1,TRUE~0),
five_cheese = case_when(str_detect(pizza_type_id,"five_cheese")==TRUE ~ 1,TRUE~0),
four_cheese = case_when(str_detect(pizza_type_id,"four_cheese")==TRUE ~ 1,TRUE~0),
green_garden = case_when(str_detect(pizza_type_id,"green_garden")==TRUE ~ 1,TRUE~0),
ital_veggie = case_when(str_detect(pizza_type_id,"ital_veggie")==TRUE ~ 1,TRUE~0),
mediterraneo = case_when(str_detect(pizza_type_id,"mediterraneo")==TRUE ~ 1,TRUE~0),
mexicana = case_when(str_detect(pizza_type_id,"mexicana")==TRUE ~ 1,TRUE~0),
spin_pesto = case_when(str_detect(pizza_type_id,"spin_pesto")==TRUE ~ 1,TRUE~0),
spinach_fet = case_when(str_detect(pizza_type_id,"spinach_fet")==TRUE ~ 1,TRUE~0),
veggie_veg = case_when(str_detect(pizza_type_id,"veggie_veg")==TRUE ~ 1,TRUE~0))
assoc_pizza$order_details_id=NULL
assoc_pizza$pizza_id=NULL
assoc_pizza$quantity=NULL
assoc_pizza$price=NULL
assoc_pizza$name=NULL
assoc_pizza$full_price=NULL
assoc_pizza$pizza_type_id=NULL
assoc_pizza$size=NULL
assoc_pizza$ingredients=NULL
assoc_pizza$category=NULL
by_order_id <- group_by(assoc_pizza,order_id) #creating summary dataset with ingredient counts and prices by order
association_data <- summarize(by_order_id,
bbq_ckn=sum(bbq_ckn), cali_ckn=sum(cali_ckn), ckn_alfredo=sum(ckn_alfredo), ckn_pesto=sum(ckn_pesto), southw_ckn=sum(southw_ckn), thai_ckn=sum(thai_ckn), big_meat=sum(big_meat), classic_dlx=sum(classic_dlx), hawaiian=sum(hawaiian), ital_cpcllo=sum(ital_cpcllo), napolitana=sum(napolitana), pep_msh_pep=sum(pep_msh_pep), pepperoni=sum(pepperoni), the_greek=sum(the_greek), brie_carre=sum(brie_carre), calabrese=sum(calabrese), ital_supr=sum(ital_supr), peppr_salami=sum(peppr_salami), prsc_argla=sum(prsc_argla), sicilian=sum(sicilian), soppressata=sum(soppressata), spicy_ital=sum(spicy_ital), spinach_supr=sum(spinach_supr), five_cheese=sum(five_cheese), four_cheese=sum(four_cheese), green_garden=sum(green_garden), ital_veggie=sum(ital_veggie), mediterraneo=sum(mediterraneo), mexicana=sum(mexicana), spin_pesto=sum(spin_pesto), spinach_fet=sum(spinach_fet), veggie_veg=sum(veggie_veg)) %>% mutate(
bbq_ckn=case_when((bbq_ckn>0)==TRUE~1,TRUE~0),
cali_ckn=case_when((cali_ckn>0)==TRUE~1,TRUE~0),
ckn_alfredo=case_when((ckn_alfredo>0)==TRUE~1,TRUE~0),
ckn_pesto=case_when((ckn_pesto>0)==TRUE~1,TRUE~0),
southw_ckn=case_when((southw_ckn>0)==TRUE~1,TRUE~0),
thai_ckn=case_when((thai_ckn>0)==TRUE~1,TRUE~0),
big_meat=case_when((big_meat>0)==TRUE~1,TRUE~0),
classic_dlx=case_when((classic_dlx>0)==TRUE~1,TRUE~0),
hawaiian=case_when((hawaiian>0)==TRUE~1,TRUE~0),
ital_cpcllo=case_when((ital_cpcllo>0)==TRUE~1,TRUE~0),
napolitana=case_when((napolitana>0)==TRUE~1,TRUE~0),
pep_msh_pep=case_when((pep_msh_pep>0)==TRUE~1,TRUE~0),
pepperoni=case_when((pepperoni>0)==TRUE~1,TRUE~0),
the_greek=case_when((the_greek>0)==TRUE~1,TRUE~0),
brie_carre=case_when((brie_carre>0)==TRUE~1,TRUE~0),
calabrese=case_when((calabrese>0)==TRUE~1,TRUE~0),
ital_supr=case_when((ital_supr>0)==TRUE~1,TRUE~0),
peppr_salami=case_when((peppr_salami>0)==TRUE~1,TRUE~0),
prsc_argla=case_when((prsc_argla>0)==TRUE~1,TRUE~0),
sicilian=case_when((sicilian>0)==TRUE~1,TRUE~0),
soppressata=case_when((soppressata>0)==TRUE~1,TRUE~0),
spicy_ital=case_when((spicy_ital>0)==TRUE~1,TRUE~0),
spinach_supr=case_when((spinach_supr>0)==TRUE~1,TRUE~0),
five_cheese=case_when((five_cheese>0)==TRUE~1,TRUE~0),
four_cheese=case_when((four_cheese>0)==TRUE~1,TRUE~0),
green_garden=case_when((green_garden>0)==TRUE~1,TRUE~0),
ital_veggie=case_when((ital_veggie>0)==TRUE~1,TRUE~0),
mediterraneo=case_when((mediterraneo>0)==TRUE~1,TRUE~0),
mexicana=case_when((mexicana>0)==TRUE~1,TRUE~0),
spin_pesto=case_when((spin_pesto>0)==TRUE~1,TRUE~0),
spinach_fet=case_when((spinach_fet>0)==TRUE~1,TRUE~0),
veggie_veg=case_when((veggie_veg>0)==TRUE~1,TRUE~0))
association_data$order_id=NULL
cleaned <- as(as.matrix(association_data), "transactions")
itemFrequencyPlot(cleaned, support=.1, xlab = "Pizza Type", ylab = "Relative Item Frequency", col="red")
Figure 17: Association of relative item frequency.
Figure 18: Association rules describing the relationship between different pizzas.
sample_index <- sample(nrow(pizza_summary_with_times), nrow(pizza_summary_with_times)*0.90)
pizza_train <- pizza_summary_with_times[sample_index,]
pizza_test <- pizza_summary_with_times[-sample_index,]
pizza_rpart <- rpart(formula = TotalPrice ~ ., data = pizza_summary_with_times)
prp(pizza_rpart, digits = 2, extra = 1)
Figure 19: This shows the separation of orders by total number of items according to the total price of the order. According to this, for example, a total of 8,201 orders have less than 2 items with an average price of $17.
pizza_pred <- predict(pizza_rpart, newdata = pizza_test)
print(paste("MSE:", mean((pizza_test$TotalPrice - pizza_pred)^2)))
## [1] "MSE: 63.8177880696272"
sample_index <- sample(nrow(pizza_complete), nrow(pizza_complete)*0.90)
pizzac_train <- pizza_complete[sample_index,]
pizzac_test <- pizza_complete[-sample_index,]
pizza_complete$category <- as.factor(pizza_complete$category)
pizza_complete$size <- as.factor(pizza_complete$size)
pizzac_rpart <- rpart(formula = price ~ category + size, data = pizza_complete)
prp(pizzac_rpart, digits = 2, extra = 1)
Figure 20: This shows the separation of sizes of pizza by total number of items according to the sizes of the pizza. According to this, for example, a total of 8,609 orders have small classic or veggie pizza classifications with an average price of $12.
pizzac_pred <- predict(pizzac_rpart, newdata = pizzac_test)
print(paste("MSE:", mean((pizzac_test$price - pizzac_pred)^2)))
## [1] "MSE: 1.91295280206912"
Total Items Sold. Below is an initial regression detailing
which variables have the greatest impact on total price.
summary(model_totalitems3)
##
## Call:
## lm(formula = pizza_summary_reg$TotalItems ~ Pepperoni + Ham +
## BBQChicken + Chicken + Pineapple + Mozzarella + Provolone +
## Jalapeno + Tomato + Red_Pepper + Calabrese + Red_Onion +
## Chipotle + Capocollo + Mushrooms + Green_Olives + Green_Peppers +
## Artichoke + Fontina + Spinach + Pesto + Italian_Sausage +
## Goat_Cheese + Anchovies + Kalamata + Beef + Brie, data = pizza_summary_reg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.464e-11 -5.000e-15 -1.000e-15 3.000e-15 6.971e-11
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.286e-13 5.732e-15 7.478e+01 < 2e-16 ***
## Pepperoni -7.180e-14 9.993e-15 -7.184e+00 6.97e-13 ***
## Ham -1.500e+00 3.784e-14 -3.964e+13 < 2e-16 ***
## BBQChicken 1.315e-13 2.851e-14 4.615e+00 3.96e-06 ***
## Chicken -9.328e-14 1.180e-14 -7.904e+00 2.83e-15 ***
## Pineapple 1.500e+00 3.443e-14 4.357e+13 < 2e-16 ***
## Mozzarella 1.000e+00 7.914e-15 1.264e+14 < 2e-16 ***
## Provolone -6.167e-14 1.616e-14 -3.815e+00 0.000136 ***
## Jalapeno 2.000e+00 4.759e-14 4.202e+13 < 2e-16 ***
## Tomato 1.297e-13 2.865e-14 4.526e+00 6.04e-06 ***
## Red_Pepper -5.000e-01 1.462e-14 -3.419e+13 < 2e-16 ***
## Calabrese -2.000e+00 6.037e-14 -3.313e+13 < 2e-16 ***
## Red_Onion 1.000e+00 2.333e-14 4.287e+13 < 2e-16 ***
## Chipotle -1.500e+00 4.837e-14 -3.101e+13 < 2e-16 ***
## Capocollo 1.000e+00 3.384e-14 2.955e+13 < 2e-16 ***
## Mushrooms 5.000e-01 2.577e-14 1.940e+13 < 2e-16 ***
## Green_Olives 1.000e+00 3.507e-14 2.851e+13 < 2e-16 ***
## Green_Peppers 5.000e-01 2.455e-14 2.036e+13 < 2e-16 ***
## Artichoke -5.000e-01 2.010e-14 -2.488e+13 < 2e-16 ***
## Fontina -5.000e-01 3.256e-14 -1.536e+13 < 2e-16 ***
## Spinach -5.000e-01 1.605e-14 -3.116e+13 < 2e-16 ***
## Pesto 2.000e+00 3.678e-14 5.438e+13 < 2e-16 ***
## Italian_Sausage 1.000e+00 1.593e-14 6.278e+13 < 2e-16 ***
## Goat_Cheese 5.000e-01 2.358e-14 2.120e+13 < 2e-16 ***
## Anchovies -1.000e+00 3.367e-14 -2.970e+13 < 2e-16 ***
## Kalamata 1.000e+00 3.192e-14 3.133e+13 < 2e-16 ***
## Beef -1.000e+00 3.873e-14 -2.582e+13 < 2e-16 ***
## Brie 1.000e+00 2.319e-14 4.313e+13 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.015e-13 on 21322 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 9.146e+27 on 27 and 21322 DF, p-value: < 2.2e-16
qqnorm(model_totalitems3$residuals, main = "Total Ingredients Regression")
qqline(model_totalitems3$residuals, col = 'blue')
Figure 21: This shows the QQ plot of the regression of all ingredients used on pizzas sold.
broken_out_all_ing <- broken_out[,-2:-10]
broken_out_all_ing <- broken_out_all_ing[,-1]
broken_out_all_ing_tran <- as(as.matrix(broken_out_all_ing), "transactions")
itemFrequencyPlot(broken_out_all_ing_tran, support = 0.1, cex.names = 0.8, main = "Item Frequency Plot", xlab = "Frequency", topN = 15, col = 'red', type = "absolute", horiz = TRUE)
Figure 22: This is the frequency of the times each ingredient has been sold for the top 15 ingredients.
options(width = 300)
basket_rules_all_ingred <- apriori(broken_out_all_ing_tran, parameter = list(sup = 0.1, conf = .9, target = "rules"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen maxlen target ext
## 0.9 0.1 1 none FALSE TRUE 5 0.1 1 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 4862
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[63 item(s), 48620 transaction(s)] done [0.01s].
## sorting and recoding items ... [17 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [45 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
quality(basket_rules_all_ingred)<-round(quality(basket_rules_all_ingred),digits=3)
basket_rules_ai.sorted <- sort(basket_rules_all_ingred, by="lift")
options(width = 300)
inspect(basket_rules_ai.sorted)
## lhs rhs support confidence coverage lift count
## [1] {Pepperoni} => {Pear} 0.212 1 0.212 4.720 10300
## [2] {Pear} => {Pepperoni} 0.212 1 0.212 4.720 10300
## [3] {Pepperoni} => {Thyme} 0.212 1 0.212 4.720 10300
## [4] {Thyme} => {Pepperoni} 0.212 1 0.212 4.720 10300
## [5] {Pear} => {Thyme} 0.212 1 0.212 4.720 10300
## [6] {Thyme} => {Pear} 0.212 1 0.212 4.720 10300
## [7] {Pepperoni, Pear} => {Thyme} 0.212 1 0.212 4.720 10300
## [8] {Pepperoni, Thyme} => {Pear} 0.212 1 0.212 4.720 10300
## [9] {Pear, Thyme} => {Pepperoni} 0.212 1 0.212 4.720 10300
## [10] {Chicken, Red_Onion} => {Red_Pepper} 0.108 1 0.108 2.986 5237
## [11] {Chicken, Onion} => {Red_Pepper} 0.108 1 0.108 2.986 5237
## [12] {Chicken, Tomato} => {Red_Pepper} 0.155 1 0.155 2.986 7533
## [13] {Chicken, Red_Onion, Onion} => {Red_Pepper} 0.108 1 0.108 2.986 5237
## [14] {Red_Pepper, Mushrooms} => {Red_Onion} 0.101 1 0.101 2.487 4906
## [15] {Mushrooms, Onion} => {Red_Onion} 0.130 1 0.130 2.487 6338
## [16] {Chicken, Onion} => {Red_Onion} 0.108 1 0.108 2.487 5237
## [17] {Red_Pepper, Onion} => {Red_Onion} 0.218 1 0.218 2.487 10619
## [18] {Red_Pepper, Mushrooms, Onion} => {Red_Onion} 0.101 1 0.101 2.487 4906
## [19] {Chicken, Red_Pepper, Onion} => {Red_Onion} 0.108 1 0.108 2.487 5237
## [20] {Tomato, Red_Pepper, Onion} => {Red_Onion} 0.149 1 0.149 2.487 7223
## [21] {Red_Onion} => {Onion} 0.402 1 0.402 2.219 19547
## [22] {Garlic, Green_Olives} => {Onion} 0.107 1 0.107 2.219 5187
## [23] {Red_Pepper, Mushrooms} => {Onion} 0.101 1 0.101 2.219 4906
## [24] {Red_Onion, Mushrooms} => {Onion} 0.130 1 0.130 2.219 6338
## [25] {Chicken, Red_Onion} => {Onion} 0.108 1 0.108 2.219 5237
## [26] {Red_Pepper, Red_Onion} => {Onion} 0.218 1 0.218 2.219 10619
## [27] {Red_Onion, Garlic} => {Onion} 0.226 1 0.226 2.219 10971
## [28] {Tomato, Red_Onion} => {Onion} 0.303 1 0.303 2.219 14719
## [29] {Tomato, Garlic, Green_Olives} => {Onion} 0.107 1 0.107 2.219 5187
## [30] {Red_Pepper, Red_Onion, Mushrooms} => {Onion} 0.101 1 0.101 2.219 4906
## [31] {Chicken, Red_Pepper, Red_Onion} => {Onion} 0.108 1 0.108 2.219 5237
## [32] {Tomato, Red_Pepper, Red_Onion} => {Onion} 0.149 1 0.149 2.219 7223
## [33] {Tomato, Red_Onion, Garlic} => {Onion} 0.196 1 0.196 2.219 9539
## [34] {Capocollo} => {Garlic} 0.135 1 0.135 1.773 6572
## [35] {Green_Olives, Onion} => {Garlic} 0.107 1 0.107 1.773 5187
## [36] {Tomato, Capocollo} => {Garlic} 0.135 1 0.135 1.773 6572
## [37] {Tomato, Green_Olives, Onion} => {Garlic} 0.107 1 0.107 1.773 5187
## [38] {Green_Olives} => {Tomato} 0.127 1 0.127 1.766 6174
## [39] {Artichoke} => {Tomato} 0.117 1 0.117 1.766 5682
## [40] {Capocollo} => {Tomato} 0.135 1 0.135 1.766 6572
## [41] {Green_Olives, Onion} => {Tomato} 0.107 1 0.107 1.766 5187
## [42] {Garlic, Green_Olives} => {Tomato} 0.107 1 0.107 1.766 5187
## [43] {Garlic, Capocollo} => {Tomato} 0.135 1 0.135 1.766 6572
## [44] {Red_Pepper, Garlic} => {Tomato} 0.130 1 0.130 1.766 6316
## [45] {Garlic, Green_Olives, Onion} => {Tomato} 0.107 1 0.107 1.766 5187
plot(basket_rules_ai.sorted, method = "graph", engine = "htmlwidget")
Figure 23: Above is an interactive graph showing all the rules and which ingredients are used, as well as their lift, support, confidence, and count. A blue arrow indicates that the starting ingredient is on the left side and the end of the arrow is the right side. A red arrow indicates that the starting ingredient is on the right side and the end of the arrow is the left side. This can also be clicked on to show related ingredients. It can also be zoomed in to see the details better.
plot(basket_rules_ai.sorted, main = "Confidence by Support and Lift")
Figure 24: Above is a scatter plot of all 45 basket rules.
Figure 11, pizzas are sold primarily
between the hours of 12:30 PM and 6:30 PM. According to
Figure 2, pizzas are primarily sold during the first month.
While both demonstrations are slightly off from expectations, this can
help Plato’s shape their staffing strategy, as well as allow them to
increase sales thorough various specials and other advertising during
the less popular dates/times.
Figure 8 and the most common size sold is Large according
to Figure 7
Figure 22, the following ingredients are
used the most times:
Figure 1.
Figure 15,
is consistent with the initial exploratory data. This is also consistent
with the regression, as well. There are a few recommendations we can
make from this:Offering specials on larger orders (such as a 15% discount on orders over $50.00) to increase the number of items and the price of items sold
Offer bundled deals (such as deals on orders over 5 items) to increase the number of items sold
Offer specials during certain months (such as a $2.00 discount per order during the months of August and September) to increase sales during these months