Before analyzing the data, we must load the necessary R libraries and engineer our core business metric: Order Value (Price × Quantity).
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(reshape2)
# Load data (Ensure this path matches your computer)
df <- read.csv('/Users/harshdeepsingh/Downloads/Foodpanda Analysis Dataset.csv')
# Create Order Value metric
df <- df %>% mutate(order_value = price * quantity)
To begin, we need a baseline understanding of the scale and scope of our Foodpanda dataset.
First, we look at the total number of transactions processed in this dataset to understand the scale of our analysis.
nrow(df)
## [1] 6000
Next, we calculate the number of unique restaurant partners fulfilling these orders.
df %>% summarise(unique_restaurants = n_distinct(restaurant_name))
## unique_restaurants
## 1 5
What are the most popular types of food being ordered? We identify the top 5 categories by total volume.
df %>% count(category, sort = TRUE) %>% head(5)
## category n
## 1 Italian 1236
## 2 Fast Food 1222
## 3 Continental 1211
## 4 Chinese 1198
## 5 Dessert 1133
We check the overall health of the platform by finding the average customer rating across all orders.
df %>% summarise(avg_rating = mean(rating, na.rm = TRUE))
## avg_rating
## 1 2.996833
This visualization shows us which cities are driving the most traffic and order volume on the platform.
city_orders <- df %>% count(city, sort = TRUE)
ggplot(city_orders, aes(x = reorder(city, n), y = n)) +
geom_bar(stat = "identity", fill = "steelblue") +
coord_flip() +
theme_minimal() +
labs(title = "Total Orders per City", x = "City", y = "Orders")
We now look deeper to see which specific restaurants and categories are generating the most value.
Identifying our top-performing partners based strictly on average customer satisfaction.
rating_rest <- df %>%
group_by(restaurant_name) %>%
summarise(avg_rating = mean(rating, na.rm = TRUE)) %>%
arrange(desc(avg_rating)) %>%
head(10)
ggplot(rating_rest, aes(x = reorder(restaurant_name, avg_rating), y = avg_rating)) +
geom_bar(stat = "identity", fill = "lightgreen") +
coord_flip() +
theme_minimal() +
labs(title = "Top 10 Restaurants by Customer Rating", x = "Restaurant", y = "Average Rating")
Which single restaurant gets customers to spend the most money per order on average?
df %>%
group_by(restaurant_name) %>%
summarise(avg_value = mean(order_value, na.rm = TRUE)) %>%
arrange(desc(avg_value)) %>%
head(1)
## # A tibble: 1 × 2
## restaurant_name avg_value
## <chr> <dbl>
## 1 Pizza Hut 2453.
While some cities might have high order volume, this chart shows which cities actually generate the most total revenue.
city_revenue <- df %>%
group_by(city) %>%
summarise(revenue = sum(order_value, na.rm = TRUE))
ggplot(city_revenue, aes(x = reorder(city, revenue), y = revenue)) +
geom_bar(stat = "identity", fill = "darkblue") +
coord_flip() +
theme_minimal() +
labs(title = "Total Revenue per City", x = "City", y = "Total Revenue")
This identifies our most lucrative food categories, informing where marketing budgets should be allocated.
cat_revenue <- df %>%
group_by(category) %>%
summarise(revenue = sum(order_value, na.rm = TRUE))
ggplot(cat_revenue, aes(x = reorder(category, revenue), y = revenue)) +
geom_bar(stat = "identity", fill = "darkgreen") +
coord_flip() +
theme_minimal() +
labs(title = "Total Revenue by Cuisine Category", x = "Category", y = "Revenue")
Let’s break down customer habits using proportion analysis.
Instead of a simple average, we categorize ratings into ‘Excellent’, ‘Average’, and ‘Poor’ to expose the true distribution of customer sentiment.
rating_tiers <- df %>%
mutate(tier = case_when(
rating >= 4.5 ~ "Excellent (4.5-5.0)",
rating >= 3.0 ~ "Average (3.0-4.4)",
TRUE ~ "Poor (< 3.0)"
)) %>%
count(tier)
ggplot(rating_tiers, aes(x = "", y = n, fill = tier)) +
geom_bar(stat = "identity", width = 1, color = "white") +
coord_polar("y", start = 0) +
scale_fill_brewer(palette = "Set2") +
theme_void() +
labs(title = "Proportion of Orders by Satisfaction Tier")
Does Foodpanda make more money from solo diners or group orders? This chart splits revenue by cart size.
order_size_rev <- df %>%
mutate(size_type = ifelse(quantity == 1, "Single Item", "Multi-Item")) %>%
group_by(size_type) %>%
summarise(revenue = sum(order_value, na.rm = TRUE))
ggplot(order_size_rev, aes(x = "", y = revenue, fill = size_type)) +
geom_bar(stat = "identity", width = 1, color = "white") +
coord_polar("y", start = 0) +
scale_fill_manual(values = c("Single Item" = "#FF9999", "Multi-Item" = "#66B2FF")) +
theme_void() +
labs(title = "Revenue Share: Single vs. Multi-Item Orders")
Box plots allow us to see the spread of pricing data and identify extreme outliers.
Using a 95th-percentile dynamic zoom, this plot shows the typical spending ranges for different cuisines, ignoring rare extreme luxury orders.
upper_limit <- quantile(df$order_value, 0.95, na.rm = TRUE)
ggplot(df, aes(x = reorder(category, order_value, FUN = median), y = order_value)) +
geom_boxplot(fill = "lightblue", outlier.alpha = 0.1) +
coord_flip(ylim = c(0, upper_limit)) +
theme_minimal() +
labs(title = "Order Value Spread by Category", x = "Category", y = "Order Value")
We use linear regression models to test common business assumptions mathematically.
We aggregate to the restaurant level to see if customers are more satisfied when they pay premium prices, or if they become harsher critics.
rest_price_rating <- df %>%
group_by(restaurant_name) %>%
summarise(
avg_item_price = mean(price, na.rm = TRUE),
avg_rating = mean(rating, na.rm = TRUE),
total_orders = n()
) %>%
filter(total_orders > 5)
ggplot(rest_price_rating, aes(x = avg_item_price, y = avg_rating)) +
geom_point(aes(size = total_orders), alpha = 0.5, color = "darkcyan") +
geom_smooth(method = "lm", color = "darkred", se = TRUE) +
theme_minimal() +
labs(title = "Premium vs. Value: Do higher prices mean better ratings?", x = "Average Item Price", y = "Average Restaurant Rating", size = "Total Orders")
## `geom_smooth()` using formula = 'y ~ x'
This model tests whether the restaurants receiving the highest volume of orders actually maintain the highest average customer ratings.
rest_performance <- df %>%
group_by(restaurant_name) %>%
summarise(total_orders = n(), avg_rating = mean(rating, na.rm = TRUE))
ggplot(rest_performance, aes(x = total_orders, y = avg_rating)) +
geom_point(alpha = 0.5, color = "darkgreen") +
geom_smooth(method = "lm", color = "orange", se = TRUE) +
theme_minimal() +
labs(title = "Popularity vs. Quality: Total Orders vs Average Rating", x = "Total Orders Received", y = "Average Rating")
## `geom_smooth()` using formula = 'y ~ x'
For our final analysis, we map how all major business metrics interact with one another.
First, we calculate total revenue, volume, average price, and average rating for every single restaurant on the platform.
restaurant_kpis <- df %>%
group_by(restaurant_name) %>%
summarise(
Total_Revenue = sum(order_value, na.rm = TRUE),
Total_Orders = n(),
Avg_Price = mean(price, na.rm = TRUE),
Avg_Rating = mean(rating, na.rm = TRUE)
) %>%
select(-restaurant_name)
We compute the mathematical correlation between these metrics to see which variables move together.
kpi_cor_matrix <- cor(restaurant_kpis, use = "complete.obs")
print(kpi_cor_matrix)
## Total_Revenue Total_Orders Avg_Price Avg_Rating
## Total_Revenue 1.0000000 0.9505147 0.9123879 0.1572310
## Total_Orders 0.9505147 1.0000000 0.8142577 0.4372608
## Avg_Price 0.9123879 0.8142577 1.0000000 -0.1399778
## Avg_Rating 0.1572310 0.4372608 -0.1399778 1.0000000
Finally, we visualize this matrix. This heatmap allows stakeholders to instantly identify the strongest positive (green) and negative (red) drivers of revenue and satisfaction.
kpi_melt <- melt(kpi_cor_matrix)
ggplot(kpi_melt, aes(x = Var1, y = Var2, fill = value)) +
geom_tile(color = "white") +
geom_text(aes(label = round(value, 2)), color = "black", size = 4) +
scale_fill_gradient2(low = "#d73027", high = "#1a9850", mid = "#ffffbf",
midpoint = 0, limit = c(-1,1), name="Correlation") +
theme_minimal() +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
panel.grid.major = element_blank(),
axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Correlation of Restaurant Business KPIs")