2025-03-30

Introduction

Welcome to my DAT 301 midterm project.

In this presentation, I will analyze the Global Superstore dataset, which contains detailed information about customer orders, including sales, profit, discounts, categories, and regions.

The main goal of this analysis is to discover patterns and insights that can help businesses improve profitability.

Using R, I will create visualizations, explore relationships between variables, and fit a linear regression model to estimate how sales and discounts relate to profit.

Dataset Description

  • Dataset: Global Superstore
  • Source: Kaggle
  • Contains sales, profit, discount, customer segments, and regional data.

R Setup and Load Data

# Uncomment below if not installed:
# install.packages(c("tidyverse", "plotly", "readr", "lubridate"))

library(tidyverse)
library(plotly)
library(readr)
library(lubridate)

superstore <- read_csv("Global_Superstore2.csv")

head(superstore)

Total Sales by Category

ggplot(superstore, aes(x = Category, y = Sales, fill = Category)) +
  geom_bar(stat = "summary", fun = sum) +
  labs(title = "Total Sales by Category", x = "Category", y = "Total Sales") +
  theme_minimal()

Profit by Region

ggplot(superstore, aes(x = Region, y = Profit, fill = Region)) +
  geom_boxplot() +
  labs(title = "Profit Distribution by Region", x = "Region", 
       y = "Profit") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
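The recommendations and conclusion describe the West region as having the highest variation in profit. The dplyr sketch below puts a number on that spread. It assumes the `Region` and `Profit` columns loaded above. The helper name `profit_spread_by_region` is hypothetical, and standard deviation and interquartile range are just two reasonable measures of spread.

```r
library(dplyr)

# Spread of profit within each region, largest standard deviation first.
# Hypothetical helper; assumes columns Region and Profit.
profit_spread_by_region <- function(df) {
  df %>%
    group_by(Region) %>%
    summarise(
      ProfitSD  = sd(Profit),   # sensitive to extreme orders
      ProfitIQR = IQR(Profit),  # middle 50% of orders, robust to outliers
      .groups = "drop"
    ) %>%
    arrange(desc(ProfitSD))
}

# Usage: profit_spread_by_region(superstore)
```

The IQR is less affected than the standard deviation by the extreme outliers visible in the boxplot, so comparing both helps show whether a region's spread comes from a few large orders.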

Sales vs Profit (2D Scatter)

This plot shows how sales and profit are related.
Darker colors mean higher discounts, which often lead to lower profits.
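The ggplot2 sketch below draws a plot like this. It assumes the `Sales`, `Profit`, and `Discount` columns from the loaded data. The helper name `plot_sales_profit` and the colours are illustrative choices. The gradient is set explicitly because ggplot2's default continuous colour scale runs from dark (low values) to light (high values), which would reverse the "darker means higher discount" reading.

```r
library(ggplot2)

# Scatter plot of Sales vs Profit, coloured by Discount.
# Hypothetical helper; assumes columns Sales, Profit and Discount.
plot_sales_profit <- function(df) {
  ggplot(df, aes(x = Sales, y = Profit, color = Discount)) +
    geom_point(alpha = 0.4) +  # transparency reduces overplotting
    # Light colours for low discounts, dark colours for high discounts
    scale_color_gradient(low = "lightblue", high = "darkblue") +
    labs(title = "Sales vs Profit", x = "Sales", y = "Profit", color = "Discount") +
    theme_minimal()
}

# Usage: plot_sales_profit(superstore)
```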

3D Plot: Sales, Profit, Discount

This 3D plot shows how sales, profit, and discount are related.
Each color represents a different product category.
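Here is a minimal plotly sketch of such a 3D scatter. It assumes the `Sales`, `Profit`, `Discount`, and `Category` columns. `plot_3d` is a hypothetical helper name, and the small marker size is an arbitrary choice to limit overplotting.

```r
library(plotly)

# Interactive 3D scatter of Sales, Profit and Discount, one colour per Category.
# Hypothetical helper; assumes columns Sales, Profit, Discount and Category.
plot_3d <- function(df) {
  plot_ly(
    df,
    x = ~Sales, y = ~Profit, z = ~Discount,
    color = ~Category,                 # one trace (colour) per category
    type = "scatter3d", mode = "markers",
    marker = list(size = 2)
  ) %>%
    layout(title = "Sales, Profit and Discount by Category")
}

# Usage: plot_3d(superstore)
```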

Statistical Analysis - Linear Regression

model <- lm(Profit ~ Sales + Discount, data = superstore)
summary(model)$coefficients
##                 Estimate  Std. Error   t value      Pr(>|t|)
## (Intercept)   20.4383910 0.850991335  24.01716 9.258651e-127
## Sales          0.1648197 0.001315533 125.28741  0.000000e+00
## Discount    -227.0973084 3.021521241 -75.15992  0.000000e+00
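  • Sales have a positive relationship with profit: holding discount constant, each additional unit of sales is associated with about 0.16 more profit.
  • Discounts have a negative relationship with profit: holding sales constant, raising the discount by 0.1 (10 percentage points) is associated with about 22.7 less profit.
  • Both coefficients are highly statistically significant (p-values far below 0.001), so neither relationship is likely to be due to chance.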
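Plugging numbers into the fitted equation makes the coefficients concrete. The order below (500 in sales at a 20% discount) is a hypothetical example, not a row from the dataset. The code assumes `Discount` is stored as a fraction, so 0.20 means 20%. It also solves for the discount at which this order's predicted profit falls to zero.

```r
# Coefficients copied from the regression output above
b_intercept <- 20.4383910
b_sales     <- 0.1648197
b_discount  <- -227.0973084

# Hypothetical order: 500 in sales at a 20% discount
# (assumes Discount is a fraction, so 0.20 means 20%)
sales    <- 500
discount <- 0.20

# Predicted profit = intercept + b_sales * Sales + b_discount * Discount
predicted_profit <- b_intercept + b_sales * sales + b_discount * discount
round(predicted_profit, 2)  # 57.43

# Discount at which the predicted profit of this order reaches zero
break_even_discount <- -(b_intercept + b_sales * sales) / b_discount
round(break_even_discount, 3)  # 0.453
```

Under this model, the 20% discount lowers predicted profit by about 45.4 (227.1 × 0.2) compared with no discount. Predictions far outside the range of the observed data are extrapolations and should be treated with caution.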

Sales Trend Over Time

This slide shows how total sales changed year by year. It helps us understand whether the business is growing over time.
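The sketch below computes yearly totals with dplyr and lubridate and draws them as a line chart. It assumes `Order Date` is read as text in day-month-year order (for example "31-07-2012"). If the file uses another layout, `dmy()` should be swapped for `mdy()` or `ymd()`. The helper names are hypothetical.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Total sales per calendar year of the order date.
# Hypothetical helper; assumes `Order Date` is day-month-year text.
sales_by_year <- function(df) {
  df %>%
    mutate(Year = year(dmy(`Order Date`))) %>%
    group_by(Year) %>%
    summarise(TotalSales = sum(Sales), .groups = "drop")
}

# Line chart of the yearly totals
plot_sales_trend <- function(yearly) {
  ggplot(yearly, aes(x = Year, y = TotalSales)) +
    geom_line() +
    geom_point() +
    labs(title = "Total Sales by Year", x = "Year", y = "Total Sales") +
    theme_minimal()
}

# Usage: plot_sales_trend(sales_by_year(superstore))
```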

Profit by Customer Segment

This slide compares the total profit earned from different customer segments to identify which group contributes the most.
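A minimal dplyr and ggplot2 sketch of this comparison is below. It assumes the `Segment` and `Profit` columns, and both helper names are hypothetical. Bars are ordered from the most to the least profitable segment so the top contributor appears first.

```r
library(dplyr)
library(ggplot2)

# Total profit per customer segment, largest first.
# Hypothetical helper; assumes columns Segment and Profit.
profit_by_segment <- function(df) {
  df %>%
    group_by(Segment) %>%
    summarise(TotalProfit = sum(Profit), .groups = "drop") %>%
    arrange(desc(TotalProfit))
}

# Bar chart with bars ordered by total profit (highest on the left)
plot_profit_by_segment <- function(seg) {
  ggplot(seg, aes(x = reorder(Segment, -TotalProfit), y = TotalProfit, fill = Segment)) +
    geom_col() +
    labs(title = "Total Profit by Customer Segment", x = "Segment", y = "Total Profit") +
    theme_minimal()
}

# Usage: plot_profit_by_segment(profit_by_segment(superstore))
```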

Top 5 Sub-Categories by Sales

This table displays the top 5 product sub-categories that generated the highest sales.

Sub-Category    Total Sales
Phones            1,706,824
Copiers           1,509,436
Chairs            1,501,682
Bookcases         1,466,572
Storage           1,127,086
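A short dplyr pipeline can produce this table. The sketch assumes the column is named `Sub-Category` (hence the backticks) and rounds totals to whole units to match the values shown. `top_subcategories` is a hypothetical helper name.

```r
library(dplyr)

# Top n sub-categories by total sales, rounded to whole units.
# Hypothetical helper; assumes columns `Sub-Category` and Sales.
top_subcategories <- function(df, n = 5) {
  df %>%
    group_by(`Sub-Category`) %>%
    summarise(TotalSales = round(sum(Sales)), .groups = "drop") %>%
    arrange(desc(TotalSales)) %>%
    slice_head(n = n)
}

# Usage: top_subcategories(superstore)
```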

Business Recommendations

  • Focus more on Technology and Office Supplies, as they bring in the most profit.
  • Avoid offering high discounts on low-margin products, because discounts reduce overall profit.
  • Review profit trends in the West region, which shows the highest variation in profit.
  • Continue serving Corporate and Home Office segments, as they show strong performance.
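One way to check the discount recommendation against the data is to compare average profit across discount bands. This is a sketch that assumes `Discount` is stored as a fraction. The band edges are arbitrary illustrative choices, not values from the analysis, and `profit_by_discount_band` is a hypothetical helper name.

```r
library(dplyr)

# Average profit per row (order line) within discount bands.
# Hypothetical helper; the default band edges are illustrative only.
profit_by_discount_band <- function(df, breaks = c(0, 0.1, 0.2, 0.3, 0.5, 1)) {
  df %>%
    mutate(DiscountBand = cut(Discount, breaks = breaks, include.lowest = TRUE)) %>%
    group_by(DiscountBand) %>%  # empty bands are dropped by default
    summarise(Rows = n(), AvgProfit = mean(Profit), .groups = "drop")
}

# Usage: profit_by_discount_band(superstore)
```

If average profit turns negative in the higher bands, that supports capping discounts; grouping additionally by `Category` would show which product lines are most affected.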

Conclusion

  • Sales have a strong positive impact on profit.
  • Higher discounts reduce profit, even if they increase sales.
  • Technology and Office Supplies are the best-performing categories.
  • The West region shows the highest variation in profit.

This analysis can help businesses make smarter decisions about pricing, discounts, and product focus.