Understanding the Database Structure
To get a good handle on the database, it’s important to know how the tables are structured and connected. The dataset has four tables: orders, order_details, pizzas, and pizza_types. Each of these tables is key to understanding pizza sales and customer habits.
Connections Between Tables The connections between these tables are crucial for accurate analysis. Here’s how they are linked:
These links allows tables tobe connected. Importing the library and Data
#Importing the library
suppressWarnings({library(sqldf)})
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
#Importing the data
orders <- read.csv("D:/SQL Practice/orders.csv", header = T)
order_details <- read.csv("D:/SQL Practice/order_details.csv", header = T)
pizzas <- read.csv("D:/SQL Practice/pizzas.csv", header = T)
pizza_types <- read.csv("D:/SQL Practice/pizza_types.csv", header = T)
#properties of the data
str(order)
## function (..., na.last = TRUE, decreasing = FALSE, method = c("auto", "shell",
## "radix"))
summary(orders)
## order_id date time
## Min. : 1 Length:21350 Length:21350
## 1st Qu.: 5338 Class :character Class :character
## Median :10676 Mode :character Mode :character
## Mean :10676
## 3rd Qu.:16013
## Max. :21350
str(order_details)
## 'data.frame': 48620 obs. of 4 variables:
## $ order_details_id: int 1 2 3 4 5 6 7 8 9 10 ...
## $ order_id : int 1 2 2 2 2 2 3 3 4 5 ...
## $ pizza_id : chr "hawaiian_m" "classic_dlx_m" "five_cheese_l" "ital_supr_l" ...
## $ quantity : int 1 1 1 1 1 1 1 1 1 1 ...
summary(order_details)
## order_details_id order_id pizza_id quantity
## Min. : 1 Min. : 1 Length:48620 Min. :1.00
## 1st Qu.:12156 1st Qu.: 5337 Class :character 1st Qu.:1.00
## Median :24311 Median :10682 Mode :character Median :1.00
## Mean :24311 Mean :10701 Mean :1.02
## 3rd Qu.:36465 3rd Qu.:16100 3rd Qu.:1.00
## Max. :48620 Max. :21350 Max. :4.00
str(pizzas)
## 'data.frame': 96 obs. of 4 variables:
## $ pizza_id : chr "bbq_ckn_s" "bbq_ckn_m" "bbq_ckn_l" "cali_ckn_s" ...
## $ pizza_type_id: chr "bbq_ckn" "bbq_ckn" "bbq_ckn" "cali_ckn" ...
## $ size : chr "S" "M" "L" "S" ...
## $ price : num 12.8 16.8 20.8 12.8 16.8 ...
summary(pizzas)
## pizza_id pizza_type_id size price
## Length:96 Length:96 Length:96 Min. : 9.75
## Class :character Class :character Class :character 1st Qu.:12.50
## Mode :character Mode :character Mode :character Median :16.25
## Mean :16.44
## 3rd Qu.:20.25
## Max. :35.95
str(pizza_types)
## 'data.frame': 32 obs. of 4 variables:
## $ pizza_type_id: chr "bbq_ckn" "cali_ckn" "ckn_alfredo" "ckn_pesto" ...
## $ name : chr "The Barbecue Chicken Pizza" "The California Chicken Pizza" "The Chicken Alfredo Pizza" "The Chicken Pesto Pizza" ...
## $ category : chr "Chicken" "Chicken" "Chicken" "Chicken" ...
## $ ingredients : chr "Barbecued Chicken, Red Peppers, Green Peppers, Tomatoes, Red Onions, Barbecue Sauce" "Chicken, Artichoke, Spinach, Garlic, Jalapeno Peppers, Fontina Cheese, Gouda Cheese" "Chicken, Red Onions, Red Peppers, Mushrooms, Asiago Cheese, Alfredo Sauce" "Chicken, Tomatoes, Red Peppers, Spinach, Garlic, Pesto Sauce" ...
summary(pizza_types)
## pizza_type_id name category ingredients
## Length:32 Length:32 Length:32 Length:32
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
To apply the correct joins just checking the nulls values in the data
sum(is.na(orders))
## [1] 0
sum(is.na(order_details))
## [1] 0
sum(is.na(pizzas))
## [1] 0
sum(is.na(pizza_types))
## [1] 0
#as there are no any null values so when we will need join we will just apply the inner joins
Now running the queries Q#1: How many orders are placed in total?
#The SQL query to determine the total number of orders placed is straightforward:
q1 <- sqldf("SELECT
COUNT(order_id) AS total_orders
FROM
orders;")
#output of q1
q1
## total_orders
## 1 21350
Q#2: Calculate the total revenue generated from pizza sales.
#Combining tables via joins is necessary to calculate the total revenue accurately.
q2 <- sqldf("SELECT
ROUND(SUM(order_details.quantity * pizzas.price),
2) AS total_revenue
FROM
order_details
JOIN
pizzas ON pizzas.pizza_id = order_details.pizza_id;")
#output of q2
q2
## total_revenue
## 1 817860.1
Q#3: Identify the highest-priced pizza.
#As identifying the highest-priced pizza helps in understanding the pricing strategy and the premium offerings.
q3 <- sqldf("SELECT
pizza_types.name, pizzas.price
FROM
pizza_types
JOIN
pizzas ON pizza_types.pizza_type_id = pizzas.pizza_type_id
ORDER BY 2 DESC
LIMIT 1;")
#output of q3
q3
## name price
## 1 The Greek Pizza 35.95
#The analysis reveals the highest-priced pizza is The Greek Pizza, which can be crucial for marketing and positioning strategies.
Q#4: Identify the most common pizza size ordered.
#Understanding customer preferences in terms of pizza size can help in inventory management and promotional strategies.
q4 <- sqldf("SELECT
pizzas.size,
COUNT(order_details.order_details_id) AS order_count
FROM
pizzas
JOIN
order_details ON pizzas.pizza_id = order_details.pizza_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1;")
#output of q4
q4
## size order_count
## 1 L 18526
#The most commonly ordered pizza size L indicates customer preferences and can guide future promotions and menu adjustments.
Q#5: Top 5 Most Ordered Pizza Types
q5 <- sqldf("SELECT
pizza_types.name, COUNT(order_details.quantity) AS quantity
FROM
pizza_types
JOIN
pizzas ON pizza_types.pizza_type_id = pizzas.pizza_type_id
JOIN
order_details ON order_details.pizza_id = pizzas.pizza_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;")
q5
## name quantity
## 1 The Classic Deluxe Pizza 2416
## 2 The Barbecue Chicken Pizza 2372
## 3 The Hawaiian Pizza 2370
## 4 The Pepperoni Pizza 2369
## 5 The Thai Chicken Pizza 2315
#This analysis provides insights into customer favorites, which can influence marketing campaigns and menu optimization.
Q#6: Quantity of Each Pizza Category Ordered
#Grouping the total quantities ordered by each pizza category helps in understanding the distribution of sales across different categories.
q6 <- sqldf("SELECT
pizza_types.category,
SUM(order_details.quantity) AS quantity
FROM
pizza_types
JOIN
pizzas ON pizza_types.pizza_type_id = pizzas.pizza_type_id
JOIN
order_details ON order_details.pizza_id = pizzas.pizza_id
GROUP BY 1
ORDER BY 2 DESC;")
q6
## category quantity
## 1 Classic 14888
## 2 Supreme 11987
## 3 Veggie 11649
## 4 Chicken 11050
#This metric highlights the popularity of different pizza categories, aiding in targeted marketing and inventory planning.
Q#7: Category-Wise Distribution of Pizzas
#Analyzing the distribution of pizzas by category helps in understanding which categories are more popular among customers.
q7 <- sqldf("SELECT
category AS category_name, COUNT(name) AS pizza_count
FROM
pizza_types
GROUP BY 1
ORDER BY 2 DESC;")
q7
## category_name pizza_count
## 1 Veggie 9
## 2 Supreme 9
## 3 Classic 8
## 4 Chicken 6
#Understanding peak order times can help optimize staffing, operations, and promotional activities.
Q#8: Category-Wise Distribution of Pizzas
#Analyzing the distribution of pizzas by category helps in understanding which categories are more popular among customers.
q8 <- sqldf("SELECT
category AS category_name, COUNT(name) AS pizza_count
FROM
pizza_types
GROUP BY 1
ORDER BY 2 DESC;")
q8
## category_name pizza_count
## 1 Veggie 9
## 2 Supreme 9
## 3 Classic 8
## 4 Chicken 6
#This information is valuable for menu planning and understanding customer preferences across different pizza types
Q#9: Average Number of Pizzas Ordered per Day
#Grouping orders by date and calculating the average number of pizzas ordered per day provides insights into daily sales trends.
q9 <- sqldf("SELECT round(avg(quantity), 0) as avg_pizza_ordered_per_day
FROM
(SELECT
orders.date, SUM(order_details.quantity) AS quantity
FROM
orders
JOIN
order_details ON orders.order_id = order_details.order_id
GROUP BY 1) AS order_quantity;")
q9
## avg_pizza_ordered_per_day
## 1 138
#This analysis helps in understanding daily sales fluctuations and planning for high-demand periods.
Q#10: Top 3 Most Ordered Pizza Types Based on Revenue
#Identifying the top 3 pizza types based on revenue helps in recognizing the most lucrative products.
q10 <- sqldf("SELECT
pizza_types.name AS pizza_name,
SUM(order_details.quantity * pizzas.price) AS revenue
FROM
pizza_types
JOIN
pizzas ON pizzas.pizza_type_id = pizza_types.pizza_type_id
JOIN
order_details ON order_details.pizza_id = pizzas.pizza_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 3;")
q10
## pizza_name revenue
## 1 The Thai Chicken Pizza 43434.25
## 2 The Barbecue Chicken Pizza 42768.00
## 3 The California Chicken Pizza 41409.50
#This information is crucial for focusing marketing efforts and optimizing the menu to maximize revenue.
Q#11: Percentage Contribution of Each Pizza Type to Total Revenue
#Calculating the percentage contribution of each pizza type to total revenue helps in understanding the revenue distribution among different products.
q11 <- sqldf("WITH total_revenue_cte AS (
SELECT
ROUND(SUM(od.quantity * p.price), 2) AS total_revenue
FROM
order_details od
JOIN pizzas p ON p.pizza_id = od.pizza_id
),
category_revenue_cte AS (
SELECT
pt.category AS pizza_category,
SUM(od.quantity * p.price) AS category_revenue
FROM
pizza_types pt
JOIN pizzas p ON p.pizza_type_id = pt.pizza_type_id
JOIN order_details od ON od.pizza_id = p.pizza_id
GROUP BY
pt.category
)
SELECT
cr.pizza_category,
ROUND((cr.category_revenue / tr.total_revenue) * 100, 2) AS revenue
FROM
category_revenue_cte cr
CROSS JOIN total_revenue_cte tr
ORDER BY
revenue DESC;")
q11
## pizza_category revenue
## 1 Classic 26.91
## 2 Supreme 25.46
## 3 Chicken 23.96
## 4 Veggie 23.68
#This analysis highlights the financial impact of each pizza type, aiding in strategic business decisions.
Q#12: Cumulative Revenue Over Time
q12 <- sqldf("SELECT date, SUM(revenue) OVER(ORDER BY date) AS cumulative_revenue
FROM
(SELECT
orders.date,
SUM(order_details.quantity * pizzas.price) AS revenue
FROM
order_details
JOIN
pizzas ON order_details.pizza_id = pizzas.pizza_id
JOIN
orders ON orders.order_id = order_details.order_id
GROUP BY 1) AS sales;")
Q#13: Top 3 Most Ordered Pizza Types by Revenue for Each Category
#Determining the top 3 most ordered pizza types by revenue for each category provides a granular view of product performance within categories.
q13 <- sqldf("SELECT name, revenue
FROM (
SELECT category, name, revenue,
RANK() OVER(PARTITION BY category ORDER BY revenue DESC) AS rn
FROM (
SELECT pizza_types.category, pizza_types.name,
SUM(order_details.quantity * pizzas.price) AS revenue
FROM pizza_types
JOIN pizzas ON pizza_types.pizza_type_id = pizzas.pizza_type_id
JOIN order_details ON order_details.pizza_id = pizzas.pizza_id
GROUP BY pizza_types.category, pizza_types.name) AS a ) AS b
WHERE rn <= 3 LIMIT 3;")
q13
## name revenue
## 1 The Thai Chicken Pizza 43434.25
## 2 The Barbecue Chicken Pizza 42768.00
## 3 The California Chicken Pizza 41409.50
#This detailed analysis helps in understanding the most profitable products within each category, guiding product development and marketing strategies.
These comprehensive analyses will provide valuable insights into the performance of a pizzeria. By understanding customer preferences, revenue patterns, and order distributions, businesses can make informed decisions to enhance their operations, marketing strategies, and overall profitability. Data-driven insights like these are essential for sustaining growth and achieving business success in a competitive market.