Understanding the Database Structure

To get a good handle on the database, it’s important to know how the tables are structured and connected. The dataset has four tables: orders, order_details, pizzas, and pizza_types. Each of these tables is key to understanding pizza sales and customer habits.

Connections Between Tables The connections between these tables are crucial for accurate analysis. Here’s how they are linked:

These links allows tables tobe connected. Importing the library and Data

#Importing the library
suppressWarnings({library(sqldf)})
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
#Importing the data
orders <- read.csv("D:/SQL Practice/orders.csv", header = T)
order_details <- read.csv("D:/SQL Practice/order_details.csv", header = T)
pizzas <- read.csv("D:/SQL Practice/pizzas.csv", header = T)
pizza_types <- read.csv("D:/SQL Practice/pizza_types.csv", header = T)
#properties of the data
str(order)
## function (..., na.last = TRUE, decreasing = FALSE, method = c("auto", "shell", 
##     "radix"))
summary(orders)
##     order_id         date               time          
##  Min.   :    1   Length:21350       Length:21350      
##  1st Qu.: 5338   Class :character   Class :character  
##  Median :10676   Mode  :character   Mode  :character  
##  Mean   :10676                                        
##  3rd Qu.:16013                                        
##  Max.   :21350
str(order_details)
## 'data.frame':    48620 obs. of  4 variables:
##  $ order_details_id: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ order_id        : int  1 2 2 2 2 2 3 3 4 5 ...
##  $ pizza_id        : chr  "hawaiian_m" "classic_dlx_m" "five_cheese_l" "ital_supr_l" ...
##  $ quantity        : int  1 1 1 1 1 1 1 1 1 1 ...
summary(order_details)
##  order_details_id    order_id       pizza_id            quantity   
##  Min.   :    1    Min.   :    1   Length:48620       Min.   :1.00  
##  1st Qu.:12156    1st Qu.: 5337   Class :character   1st Qu.:1.00  
##  Median :24311    Median :10682   Mode  :character   Median :1.00  
##  Mean   :24311    Mean   :10701                      Mean   :1.02  
##  3rd Qu.:36465    3rd Qu.:16100                      3rd Qu.:1.00  
##  Max.   :48620    Max.   :21350                      Max.   :4.00
str(pizzas)
## 'data.frame':    96 obs. of  4 variables:
##  $ pizza_id     : chr  "bbq_ckn_s" "bbq_ckn_m" "bbq_ckn_l" "cali_ckn_s" ...
##  $ pizza_type_id: chr  "bbq_ckn" "bbq_ckn" "bbq_ckn" "cali_ckn" ...
##  $ size         : chr  "S" "M" "L" "S" ...
##  $ price        : num  12.8 16.8 20.8 12.8 16.8 ...
summary(pizzas)
##    pizza_id         pizza_type_id          size               price      
##  Length:96          Length:96          Length:96          Min.   : 9.75  
##  Class :character   Class :character   Class :character   1st Qu.:12.50  
##  Mode  :character   Mode  :character   Mode  :character   Median :16.25  
##                                                           Mean   :16.44  
##                                                           3rd Qu.:20.25  
##                                                           Max.   :35.95
str(pizza_types)
## 'data.frame':    32 obs. of  4 variables:
##  $ pizza_type_id: chr  "bbq_ckn" "cali_ckn" "ckn_alfredo" "ckn_pesto" ...
##  $ name         : chr  "The Barbecue Chicken Pizza" "The California Chicken Pizza" "The Chicken Alfredo Pizza" "The Chicken Pesto Pizza" ...
##  $ category     : chr  "Chicken" "Chicken" "Chicken" "Chicken" ...
##  $ ingredients  : chr  "Barbecued Chicken, Red Peppers, Green Peppers, Tomatoes, Red Onions, Barbecue Sauce" "Chicken, Artichoke, Spinach, Garlic, Jalapeno Peppers, Fontina Cheese, Gouda Cheese" "Chicken, Red Onions, Red Peppers, Mushrooms, Asiago Cheese, Alfredo Sauce" "Chicken, Tomatoes, Red Peppers, Spinach, Garlic, Pesto Sauce" ...
summary(pizza_types)
##  pizza_type_id          name             category         ingredients       
##  Length:32          Length:32          Length:32          Length:32         
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character

To apply the correct joins just checking the nulls values in the data

sum(is.na(orders))
## [1] 0
sum(is.na(order_details))
## [1] 0
sum(is.na(pizzas))
## [1] 0
sum(is.na(pizza_types))
## [1] 0
#as there are no any null values so when we will need join we will just apply the inner joins

Now running the queries Q#1: How many orders are placed in total?

#The SQL query to determine the total number of orders placed is straightforward:
q1 <- sqldf("SELECT 
    COUNT(order_id) AS total_orders
FROM
    orders;")
#output of q1
q1
##   total_orders
## 1        21350

Q#2: Calculate the total revenue generated from pizza sales.

#Combining tables via joins is necessary to calculate the total revenue accurately.
q2 <- sqldf("SELECT 
    ROUND(SUM(order_details.quantity * pizzas.price),
            2) AS total_revenue
FROM
    order_details
        JOIN
    pizzas ON pizzas.pizza_id = order_details.pizza_id;")
#output of q2
q2
##   total_revenue
## 1      817860.1

Q#3: Identify the highest-priced pizza.

#As identifying the highest-priced pizza helps in understanding the pricing strategy and the premium offerings.
q3 <- sqldf("SELECT 
    pizza_types.name, pizzas.price
FROM
    pizza_types
        JOIN
    pizzas ON pizza_types.pizza_type_id = pizzas.pizza_type_id
ORDER BY 2 DESC
LIMIT 1;")
#output of q3
q3
##              name price
## 1 The Greek Pizza 35.95
#The analysis reveals the highest-priced pizza is The Greek Pizza, which can be crucial for marketing and positioning strategies.

Q#4: Identify the most common pizza size ordered.

#Understanding customer preferences in terms of pizza size can help in inventory management and promotional strategies.
q4 <- sqldf("SELECT 
    pizzas.size,
    COUNT(order_details.order_details_id) AS order_count
FROM
    pizzas
        JOIN
    order_details ON pizzas.pizza_id = order_details.pizza_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1;")
#output of q4
q4
##   size order_count
## 1    L       18526
#The most commonly ordered pizza size L indicates customer preferences and can guide future promotions and menu adjustments.

Q#5: Top 5 Most Ordered Pizza Types

q5 <- sqldf("SELECT 
    pizza_types.name, COUNT(order_details.quantity) AS quantity
FROM
    pizza_types
        JOIN
    pizzas ON pizza_types.pizza_type_id = pizzas.pizza_type_id
        JOIN
    order_details ON order_details.pizza_id = pizzas.pizza_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;")
q5
##                         name quantity
## 1   The Classic Deluxe Pizza     2416
## 2 The Barbecue Chicken Pizza     2372
## 3         The Hawaiian Pizza     2370
## 4        The Pepperoni Pizza     2369
## 5     The Thai Chicken Pizza     2315
#This analysis provides insights into customer favorites, which can influence marketing campaigns and menu optimization.

Q#6: Quantity of Each Pizza Category Ordered

#Grouping the total quantities ordered by each pizza category helps in understanding the distribution of sales across different categories.
q6 <- sqldf("SELECT 
    pizza_types.category,
    SUM(order_details.quantity) AS quantity
FROM
    pizza_types
        JOIN
    pizzas ON pizza_types.pizza_type_id = pizzas.pizza_type_id
        JOIN
    order_details ON order_details.pizza_id = pizzas.pizza_id
GROUP BY 1
ORDER BY 2 DESC;")
q6
##   category quantity
## 1  Classic    14888
## 2  Supreme    11987
## 3   Veggie    11649
## 4  Chicken    11050
#This metric highlights the popularity of different pizza categories, aiding in targeted marketing and inventory planning.

Q#7: Category-Wise Distribution of Pizzas

#Analyzing the distribution of pizzas by category helps in understanding which categories are more popular among customers.
q7 <- sqldf("SELECT 
    category AS category_name, COUNT(name) AS pizza_count
FROM
    pizza_types
GROUP BY 1
ORDER BY 2 DESC;")
q7
##   category_name pizza_count
## 1        Veggie           9
## 2       Supreme           9
## 3       Classic           8
## 4       Chicken           6
#Understanding peak order times can help optimize staffing, operations, and promotional activities.

Q#8: Category-Wise Distribution of Pizzas

#Analyzing the distribution of pizzas by category helps in understanding which categories are more popular among customers.
q8 <- sqldf("SELECT 
    category AS category_name, COUNT(name) AS pizza_count
FROM
    pizza_types
GROUP BY 1
ORDER BY 2 DESC;")
q8
##   category_name pizza_count
## 1        Veggie           9
## 2       Supreme           9
## 3       Classic           8
## 4       Chicken           6
#This information is valuable for menu planning and understanding customer preferences across different pizza types

Q#9: Average Number of Pizzas Ordered per Day

#Grouping orders by date and calculating the average number of pizzas ordered per day provides insights into daily sales trends.
q9 <- sqldf("SELECT round(avg(quantity), 0) as avg_pizza_ordered_per_day 
FROM 
(SELECT 
    orders.date, SUM(order_details.quantity) AS quantity
FROM
    orders
        JOIN
    order_details ON orders.order_id = order_details.order_id
GROUP BY 1) AS order_quantity;")
q9
##   avg_pizza_ordered_per_day
## 1                       138
#This analysis helps in understanding daily sales fluctuations and planning for high-demand periods.

Q#10: Top 3 Most Ordered Pizza Types Based on Revenue

#Identifying the top 3 pizza types based on revenue helps in recognizing the most lucrative products.
q10 <- sqldf("SELECT 
    pizza_types.name AS pizza_name,
    SUM(order_details.quantity * pizzas.price) AS revenue
FROM
    pizza_types
        JOIN
    pizzas ON pizzas.pizza_type_id = pizza_types.pizza_type_id
        JOIN
    order_details ON order_details.pizza_id = pizzas.pizza_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 3;")
q10
##                     pizza_name  revenue
## 1       The Thai Chicken Pizza 43434.25
## 2   The Barbecue Chicken Pizza 42768.00
## 3 The California Chicken Pizza 41409.50
#This information is crucial for focusing marketing efforts and optimizing the menu to maximize revenue.

Q#11: Percentage Contribution of Each Pizza Type to Total Revenue

#Calculating the percentage contribution of each pizza type to total revenue helps in understanding the revenue distribution among different products.
q11 <- sqldf("WITH total_revenue_cte AS (
    SELECT 
        ROUND(SUM(od.quantity * p.price), 2) AS total_revenue
    FROM
        order_details od
        JOIN pizzas p ON p.pizza_id = od.pizza_id
),
category_revenue_cte AS (
    SELECT 
        pt.category AS pizza_category,
        SUM(od.quantity * p.price) AS category_revenue
    FROM
        pizza_types pt
        JOIN pizzas p ON p.pizza_type_id = pt.pizza_type_id
        JOIN order_details od ON od.pizza_id = p.pizza_id
    GROUP BY 
        pt.category
)
SELECT
    cr.pizza_category,
    ROUND((cr.category_revenue / tr.total_revenue) * 100, 2) AS revenue
FROM
    category_revenue_cte cr
    CROSS JOIN total_revenue_cte tr
ORDER BY
    revenue DESC;")
q11
##   pizza_category revenue
## 1        Classic   26.91
## 2        Supreme   25.46
## 3        Chicken   23.96
## 4         Veggie   23.68
#This analysis highlights the financial impact of each pizza type, aiding in strategic business decisions.

Q#12: Cumulative Revenue Over Time

q12 <- sqldf("SELECT date, SUM(revenue) OVER(ORDER BY date) AS cumulative_revenue
FROM
(SELECT 
    orders.date,
    SUM(order_details.quantity * pizzas.price) AS revenue
FROM
    order_details
        JOIN
    pizzas ON order_details.pizza_id = pizzas.pizza_id
        JOIN
    orders ON orders.order_id = order_details.order_id
GROUP BY 1) AS sales;")

Q#13: Top 3 Most Ordered Pizza Types by Revenue for Each Category

#Determining the top 3 most ordered pizza types by revenue for each category provides a granular view of product performance within categories.
q13 <- sqldf("SELECT name, revenue
FROM (
    SELECT category, name, revenue,
           RANK() OVER(PARTITION BY category ORDER BY revenue DESC) AS rn
    FROM (
        SELECT pizza_types.category, pizza_types.name, 
               SUM(order_details.quantity * pizzas.price) AS revenue
        FROM pizza_types
        JOIN pizzas ON pizza_types.pizza_type_id = pizzas.pizza_type_id
        JOIN order_details ON order_details.pizza_id = pizzas.pizza_id
        GROUP BY pizza_types.category, pizza_types.name) AS a ) AS b
WHERE rn <= 3 LIMIT 3;")
q13
##                           name  revenue
## 1       The Thai Chicken Pizza 43434.25
## 2   The Barbecue Chicken Pizza 42768.00
## 3 The California Chicken Pizza 41409.50
#This detailed analysis helps in understanding the most profitable products within each category, guiding product development and marketing strategies.

These comprehensive analyses will provide valuable insights into the performance of a pizzeria. By understanding customer preferences, revenue patterns, and order distributions, businesses can make informed decisions to enhance their operations, marketing strategies, and overall profitability. Data-driven insights like these are essential for sustaining growth and achieving business success in a competitive market.