Setup & Data Engineering

Before analyzing the data, we must load the necessary R libraries and engineer our core business metric: Order Value (Price × Quantity).

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(reshape2)

# Load data (Ensure this path matches your computer)
df <- read.csv('/Users/harshdeepsingh/Downloads/Foodpanda Analysis Dataset.csv')

# Create Order Value metric
df <- df %>% mutate(order_value = price * quantity)

1. Platform Overview

To begin, we need a baseline understanding of the scale and scope of our Foodpanda dataset.

1.1 Total Order Volume

First, we look at the total number of transactions processed in this dataset to understand the scale of our analysis.

nrow(df)
## [1] 6000

1.2 Active Restaurant Partners

Next, we calculate the number of unique restaurant partners fulfilling these orders.

df %>% summarise(unique_restaurants = n_distinct(restaurant_name))
##   unique_restaurants
## 1                  5

1.3 Top 5 Cuisines

What are the most popular types of food being ordered? We identify the top 5 categories by total volume.

df %>% count(category, sort = TRUE) %>% head(5)
##      category    n
## 1     Italian 1236
## 2   Fast Food 1222
## 3 Continental 1211
## 4     Chinese 1198
## 5     Dessert 1133

1.4 Global Average Rating

We check the overall health of the platform by finding the average customer rating across all orders.

df %>% summarise(avg_rating = mean(rating, na.rm = TRUE))
##   avg_rating
## 1   2.996833

1.5 Order Distribution by City

This visualization shows us which cities are driving the most traffic and order volume on the platform.

city_orders <- df %>% count(city, sort = TRUE)

ggplot(city_orders, aes(x = reorder(city, n), y = n)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Total Orders per City", x = "City", y = "Orders")


2. Revenue & Performance Aggregation

We now look deeper to see which specific restaurants and categories are generating the most value.

2.1 Top 10 Restaurants by Rating

Identifying our top-performing partners based strictly on average customer satisfaction.

rating_rest <- df %>%
  group_by(restaurant_name) %>%
  summarise(avg_rating = mean(rating, na.rm = TRUE)) %>%
  arrange(desc(avg_rating)) %>%
  head(10)

ggplot(rating_rest, aes(x = reorder(restaurant_name, avg_rating), y = avg_rating)) +
  geom_bar(stat = "identity", fill = "lightgreen") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Top 10 Restaurants by Customer Rating", x = "Restaurant", y = "Average Rating")

2.2 Highest Average Order Value

Which single restaurant gets customers to spend the most money per order on average?

df %>%
  group_by(restaurant_name) %>%
  summarise(avg_value = mean(order_value, na.rm = TRUE)) %>%
  arrange(desc(avg_value)) %>%
  head(1)
## # A tibble: 1 × 2
##   restaurant_name avg_value
##   <chr>               <dbl>
## 1 Pizza Hut           2453.

2.3 Total Revenue per City

While some cities might have high order volume, this chart shows which cities actually generate the most total revenue.

city_revenue <- df %>%
  group_by(city) %>%
  summarise(revenue = sum(order_value, na.rm = TRUE))

ggplot(city_revenue, aes(x = reorder(city, revenue), y = revenue)) +
  geom_bar(stat = "identity", fill = "darkblue") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Total Revenue per City", x = "City", y = "Total Revenue")

2.4 Revenue per Cuisine Category

This identifies our most lucrative food categories, informing where marketing budgets should be allocated.

cat_revenue <- df %>%
  group_by(category) %>%
  summarise(revenue = sum(order_value, na.rm = TRUE))

ggplot(cat_revenue, aes(x = reorder(category, revenue), y = revenue)) +
  geom_bar(stat = "identity", fill = "darkgreen") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Total Revenue by Cuisine Category", x = "Category", y = "Revenue")


3. Customer Behavior (Proportions)

Let’s break down customer habits using proportion analysis.

6.1 Customer Satisfaction Tiers

Instead of a simple average, we categorize ratings into ‘Excellent’, ‘Average’, and ‘Poor’ to expose the true distribution of customer sentiment.

rating_tiers <- df %>%
  mutate(tier = case_when(
    rating >= 4.5 ~ "Excellent (4.5-5.0)",
    rating >= 3.0 ~ "Average (3.0-4.4)",
    TRUE ~ "Poor (< 3.0)"
  )) %>%
  count(tier)

ggplot(rating_tiers, aes(x = "", y = n, fill = tier)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar("y", start = 0) +
  scale_fill_brewer(palette = "Set2") +
  theme_void() +
  labs(title = "Proportion of Orders by Satisfaction Tier")

6.2 Revenue: Single vs. Multi-Item Orders

Does Foodpanda make more money from solo diners or group orders? This chart splits revenue by cart size.

order_size_rev <- df %>%
  mutate(size_type = ifelse(quantity == 1, "Single Item", "Multi-Item")) %>%
  group_by(size_type) %>%
  summarise(revenue = sum(order_value, na.rm = TRUE))

ggplot(order_size_rev, aes(x = "", y = revenue, fill = size_type)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar("y", start = 0) +
  scale_fill_manual(values = c("Single Item" = "#FF9999", "Multi-Item" = "#66B2FF")) +
  theme_void() +
  labs(title = "Revenue Share: Single vs. Multi-Item Orders")


4. Pricing & Value Distributions

Box plots allow us to see the spread of pricing data and identify extreme outliers.

7.1 Order Value Spread by Category

Using a 95th-percentile dynamic zoom, this plot shows the typical spending ranges for different cuisines, ignoring rare extreme luxury orders.

upper_limit <- quantile(df$order_value, 0.95, na.rm = TRUE)

ggplot(df, aes(x = reorder(category, order_value, FUN = median), y = order_value)) +
  geom_boxplot(fill = "lightblue", outlier.alpha = 0.1) + 
  coord_flip(ylim = c(0, upper_limit)) + 
  theme_minimal() +
  labs(title = "Order Value Spread by Category", x = "Category", y = "Order Value")

7.2 Premium vs. Budget Categories

This visualization isolates the base item price to determine which cuisines inherently act as premium offerings versus budget drivers.

price_upper_limit <- quantile(df$price, 0.95, na.rm = TRUE)

ggplot(df, aes(x = reorder(category, price, FUN = median, na.rm = TRUE), y = price)) +
  geom_boxplot(fill = "lightcoral", outlier.alpha = 0.3) +
  coord_flip(ylim = c(0, price_upper_limit)) + 
  theme_minimal() +
  labs(title = "Premium vs. Budget: Base Item Price Spread", x = "Cuisine Category", y = "Item Base Price")


5. Testing Business Assumptions (Regression)

We use linear regression models to test common business assumptions mathematically.

8.1 Do higher prices mean better ratings?

We aggregate to the restaurant level to see if customers are more satisfied when they pay premium prices, or if they become harsher critics.

rest_price_rating <- df %>%
  group_by(restaurant_name) %>%
  summarise(
    avg_item_price = mean(price, na.rm = TRUE),
    avg_rating = mean(rating, na.rm = TRUE),
    total_orders = n()
  ) %>%
  filter(total_orders > 5)

ggplot(rest_price_rating, aes(x = avg_item_price, y = avg_rating)) +
  geom_point(aes(size = total_orders), alpha = 0.5, color = "darkcyan") +
  geom_smooth(method = "lm", color = "darkred", se = TRUE) +
  theme_minimal() +
  labs(title = "Premium vs. Value: Do higher prices mean better ratings?", x = "Average Item Price", y = "Average Restaurant Rating", size = "Total Orders")
## `geom_smooth()` using formula = 'y ~ x'

8.2 Does popularity equal quality?

This model tests whether the restaurants receiving the highest volume of orders actually maintain the highest average customer ratings.

rest_performance <- df %>%
  group_by(restaurant_name) %>%
  summarise(total_orders = n(), avg_rating = mean(rating, na.rm = TRUE))

ggplot(rest_performance, aes(x = total_orders, y = avg_rating)) +
  geom_point(alpha = 0.5, color = "darkgreen") +
  geom_smooth(method = "lm", color = "orange", se = TRUE) +
  theme_minimal() +
  labs(title = "Popularity vs. Quality: Total Orders vs Average Rating", x = "Total Orders Received", y = "Average Rating")
## `geom_smooth()` using formula = 'y ~ x'


6. Final KPI Correlation Analysis

For our final analysis, we map how all major business metrics interact with one another.

9.1 KPI Aggregation

First, we calculate total revenue, volume, average price, and average rating for every single restaurant on the platform.

restaurant_kpis <- df %>%
  group_by(restaurant_name) %>%
  summarise(
    Total_Revenue = sum(order_value, na.rm = TRUE),
    Total_Orders = n(),
    Avg_Price = mean(price, na.rm = TRUE),
    Avg_Rating = mean(rating, na.rm = TRUE)
  ) %>%
  select(-restaurant_name) 

9.2 Correlation Math

We compute the mathematical correlation between these metrics to see which variables move together.

kpi_cor_matrix <- cor(restaurant_kpis, use = "complete.obs")
print(kpi_cor_matrix)
##               Total_Revenue Total_Orders  Avg_Price Avg_Rating
## Total_Revenue     1.0000000    0.9505147  0.9123879  0.1572310
## Total_Orders      0.9505147    1.0000000  0.8142577  0.4372608
## Avg_Price         0.9123879    0.8142577  1.0000000 -0.1399778
## Avg_Rating        0.1572310    0.4372608 -0.1399778  1.0000000

9.3 Heatmap Visualization

Finally, we visualize this matrix. This heatmap allows stakeholders to instantly identify the strongest positive (green) and negative (red) drivers of revenue and satisfaction.

kpi_melt <- melt(kpi_cor_matrix)

ggplot(kpi_melt, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile(color = "white") +
  geom_text(aes(label = round(value, 2)), color = "black", size = 4) +
  scale_fill_gradient2(low = "#d73027", high = "#1a9850", mid = "#ffffbf", 
                       midpoint = 0, limit = c(-1,1), name="Correlation") +
  theme_minimal() +
  theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
        panel.grid.major = element_blank(),
        axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Correlation of Restaurant Business KPIs")