1 Introduction
- 1.1 What is Superstore?
- 1.2 Sample Superstore Analysis
2 Data Overview
3 Which product categories and sub-categories are the most and least profitable?
4 How do profit margins vary across different regions and states?
5 Are discounts driving more sales or reducing overall profitability?
- 5.1 Impact of Discounts on Profit
- 5.2 Profit Margin by Discount Level
6 Which customer segments and individual customers contribute the most to revenue and profit?
- 6.1 Top 10 Customers by Profit
7 Which customer segments or individual customers contribute the most to revenue and profit?
8 Trends in Total Orders by Shipping Mode
- 8.1 Profit Trends by Shipping Mode
9 Are there specific products or regions consistently generating losses?
- 9.1 Loss Making Product Sub-Categories by Profit Margin
10 Products Generating Losses by Region
11 Final Insights – Superstore Analysis

1 Introduction

1.1 What is Superstore?

Superstore is a retail business that sells furniture, office supplies, and technology products across the United States. The company serves different customer segments, including consumers, corporate clients, and home offices. With operations spanning in multiple regions, Superstore faces challenges in balancing sales growth, profitability, and discount strategies.

1.2 Sample Superstore Analysis

This report presents an analysis of the Sample Superstore dataset with the objective of addressing key business challenges and identifying actionable solutions. The analysis focuses on trends and patterns in sales, profit, and loss, as well as additional performance metrics relevant to strategic decision-making.

To establish a clear foundation, the analysis will cover the following key questions:

Which product categories and sub-categories are the most and least profitable?
How do profit margins vary across different regions and states?
Are discounts driving higher sales or reducing overall profitability?
Which customer segments or individual customers contribute the most to revenue and profit?
What are the trends in shipping modes, and are we spending efficiently on delivery?
Are there specific products or regions consistently generating losses?
Where are the biggest opportunities for growth — in new markets, product categories, or customer segments?

With reasonable insights from the data, we can formulate predictions, address key business challenges, and identify actionable solutions to these challenges.

2 Data Overview

The Sample Superstore dataset contains information about sales transactions, including product categories, sub-categories, sales, profit, discounts, shipping modes, regions, states, and customer segments. This section provides a brief overview of the dataset structure and key summary statistics.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(ggplot2)
library(plotly)

## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

library(ggimage)
library(tidyr)
library(ggiraph)
library(maps)

## 
## Attaching package: 'maps'
## 
## The following object is masked from 'package:purrr':
## 
##     map

library(scales)

## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor

library(lubridate)
library(maps)


Sample_Superstore <- read_csv("Sample_Superstore.csv")

## Rows: 9994 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (13): Order ID, Ship Mode, Customer ID, Customer Name, Segment, Country...
## dbl   (7): Row ID, Postal Code, Sales, Quantity, Discount, Profit, Order_Pro...
## dttm  (2): Order Date, Ship Date
## date  (3): Order_Date, Ship_Date, Order_Month
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

category_summary <- read_csv("category_summary.csv")

## Rows: 17 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Category, Sub-Category
## dbl (3): Total_Sales, Total_Profit, Avg_Profit_Per_Order
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

category_long <- read_csv("category_long.csv")

## Rows: 9 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Category, Metric, Value_abbr, Profit_Percent
## dbl (1): Value
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

profit_trend <- read_csv("profit_trend.csv")

## Rows: 48 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (1): Total_Profit
## dttm (1): Order_Month
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

category_long_export <- read_csv("category_long_export.csv")

## Rows: 9 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Category, Metric, Value_abbr, Profit_Percent
## dbl (1): Value
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

subcategory_summary <- read_csv("subcategory_summary.csv")

## Rows: 17 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Sub-Category, Profit_Percent_Label
## dbl (4): Total_Sales, Total_Profit, Total_Quantity, Profit_Percent
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

state_region <- read_csv("state_region.csv")

## Rows: 49 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): State, Region
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

profit_margin_trend <- read_csv("profit_margin_trend.csv")

## Rows: 6 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Discount_Bin
## dbl (1): Avg_Profit_Margin
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

loss_products <- read_csv("loss_products.csv")

## Rows: 3 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Category, Sub-Category
## dbl (2): Total_Sales, Total_Profit
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

profit_by_category <- read_csv("profit_by_category.csv")

## Rows: 3 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Category
## dbl (1): Total_Profit
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

profit_by_region <- read_csv("profit_by_region.csv")

## Rows: 192 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): Region
## dbl  (1): Total_Profit
## date (1): Order_Month
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

profit_by_subcategory <- read_csv("profit_by_category.csv")

## Rows: 3 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Category
## dbl (1): Total_Profit
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

profit_nested <- read_csv("profit_nested.csv")

## Rows: 17 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Category, Sub-Category
## dbl (1): Total_Profit
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

profit_data <- read_csv("profit_data.csv")

## Rows: 17 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Category, Sub-Category, tooltip, fill_color
## dbl (1): Total_Profit
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

region_centroids <- read_csv("region_centeroids.csv")

## Rows: 4 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Region, Profit_Margin_Label
## dbl (5): lon, lat, Total_Sales, Total_Profit, Profit_Margin
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

region_trends <- read_csv("region_trends.csv")

## Rows: 384 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): Region, Metric
## dbl  (1): Value
## date (1): Order_Month
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

region_state_summary <- read_csv("region_state_summary.csv")

## Rows: 49 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Region, State
## dbl (2): Total_Sales, Total_Profit
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

margin_extremes <- read_csv("margin_extremes.csv")

## Rows: 10 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): State, Profit_Margin_Label, Category
## dbl (3): Total_Sales, Total_Profit, Profit_Margin
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

top_customers <- read_csv("top_customers.csv")

## Rows: 10 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Customer Name
## dbl (3): Total_Sales, Total_Profit, Total_Orders
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

shipping_trends <- read_csv("shipping_trends.csv")

## Rows: 188 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): Ship Mode
## dbl  (3): Total_Sales, Total_Profit, Total_Orders
## date (1): Order_Month
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

3 Which product categories and sub-categories are the most and least profitable?

3.1 Category Analysis: Sales, Profit, and Quantity

library(dplyr)
library(ggplot2)
library(plotly)
library(scales)
library(readr)

# Custom function for thousands
label_thousands <- label_number(
  scale = 1/1000,
  suffix = "k",
  accuracy = 1
)

# Compute min and max for each facet (Metric)
tick_values <- category_long %>%
  group_by(Metric) %>%
  summarise(
    min_val = min(Value[Value > 0]),
    max_val = max(Value)
  )

# Function to pick min/max for breaks
breaks_fn <- function(x) c(min(x[x > 0]), max(x))

# Plot with only first and last tick per facet
p <- ggplot(category_long,
            aes(x = Value, y = Category, fill = Category,
                text = paste0(
                  Metric, ": ", label_thousands(Value),
                  "<br>Profit %: ", Profit_Percent
                ))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ Metric, scales = "free_x") +
  scale_x_continuous(
    labels = label_thousands,
    breaks = breaks_fn
  ) +
  labs(
    title = "Total Sales, Profit, and Quantity by Category",
    x = "Value (in Thousands)",
    y = "Category"
  ) +
  theme_minimal(base_size = 14) +
  theme(axis.text.x = element_text(angle = 0, hjust = 0.5))  # horizontal labels

# Make it interactive
ggplotly(p, tooltip = "text")

# Export the cleaned table with abbreviated values
category_long_export <- category_long %>%
  mutate(Value_abbr = label_thousands(Value)) %>%
  select(Category, Metric, Value, Value_abbr, Profit_Percent)

Insights - From the chart above, we can observe that Technology leads in total sales with $836,000 and a 17.40% profit, totaling $145,555, making it the most profitable category overall. - Interestingly, Technology achieved the highest sales despite having the fewest products sold (6,936 units), indicating a higher revenue per unit compared to other categories. - Looking at Furniture, this category produced $742,000 in sales but only managed a 2.49% profit, making it the least profitable category. - As for Office Supplies, this category achieved a 17.04% profit with total sales of $719,047. It is also the most sold category with 22,906 units sold.

3.2 Profitability Trend Over Time

library(dplyr)
library(ggplot2)
library(lubridate)
library(scales)
library(plotly)

# Aggregate profit by month
profit_trend <- Sample_Superstore %>%
  mutate(Order_Month = floor_date(`Order Date`, "month")) %>%  
  group_by(Order_Month) %>%
  summarise(
    Total_Profit = sum(Profit, na.rm = TRUE),
    .groups = "drop"
  )

# Base plot with explicit group = 1
p <- ggplot(profit_trend, aes(x = Order_Month, y = Total_Profit,
                              text = paste0("Month: ", Order_Month,
                                            "<br>Total Profit: ", dollar(Total_Profit)),
                              group = 1)) +  # this fixes ggplotly rendering
  geom_line(color = "steelblue", size = 1) +
  geom_point(color = "darkred", size = 2) +
  geom_smooth(method = "loess", se = FALSE, color = "darkred", linetype = "dashed") +
  scale_y_continuous(labels = dollar) +
  labs(
    title = "Profitability Trend Over Time",
    x = "Month",
    y = "Total Profit ($)"
  ) +
  theme_minimal(base_size = 14)

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

# Make interactive
ggplotly(p, tooltip = "text")

## `geom_smooth()` using formula = 'y ~ x'

## Warning: The following aesthetics were dropped during statistical transformation: text.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?

Insights - The plot above shows the monthly profitability of Superstore over the recorded period. Overall, the store exhibits a slightly growing trend in profit, as indicated by the red dashed smoothing line, although there are noticeable fluctuations month-to-month. - The highest profits occur in months such as November and December, reflecting the seasonal boost from holiday sales. For example, profits peak at around $70,000–$75,000 in these periods. - Conversely, some months, particularly in the first quarter of the year, show lower profitability, sometimes dipping to $10,000–$15,000, likely due to slower sales after the holiday season. - The trend line smooths out these fluctuations and suggests a gradual upward trajectory, indicating that overall profitability is improving over time.

Despite seasonal dips, the store is increasingly generating more profit per month, showing resilience and growth potential. The combination of monthly variation and the overall upward trend highlights areas of high performance and opportunities to optimize during lower-profit months.

3.3 Sub-Category Analysis: Sales, Profit, and Quantity

library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)
library(plotly)

# Summarise subcategory
subcategory_summary <- Sample_Superstore %>%
  group_by(`Sub-Category`) %>%
  summarise(
    Total_Sales = sum(Sales, na.rm = TRUE),
    Total_Profit = sum(Profit, na.rm = TRUE),
    Total_Quantity = sum(Quantity, na.rm = TRUE)
  ) %>%
  mutate(
    Profit_Percent = Total_Profit / Total_Sales,   # numeric for sorting
    Profit_Percent_Label = percent(Profit_Percent, accuracy = 0.01)
  )

# Reshape to long format
subcategory_long <- subcategory_summary %>%
  pivot_longer(
    cols = c(Total_Sales, Total_Profit, Total_Quantity),
    names_to = "Metric",
    values_to = "Value"
  )

# Custom function to display values in thousands
label_thousands <- label_number(
  scale = 1/1000,
  suffix = "k",
  accuracy = 1
)

# Base ggplot with horizontal bars
p <- ggplot(
  subcategory_long,
  aes(
    x = Value,
    y = `Sub-Category`,
    fill = `Sub-Category`,
    text = paste0(
      "Subcategory: ", `Sub-Category`, "<br>",
      "Metric: ", Metric, "<br>",
      "Value: ", label_thousands(Value), "<br>",
      "Profit %: ", Profit_Percent_Label
    )
  )
) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ Metric, scales = "free_x") +  # free_x because Value is on x-axis
  scale_x_continuous(labels = label_thousands) +
  labs(
    title = "Total Sales, Profit, and Quantity by Subcategory",
    x = "Value (in Thousands)",
    y = "Subcategory"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    axis.text.x = element_text(angle = 30, hjust = 1),
    axis.text.y = element_text(size = 8)
  )

# Make it interactive
ggplotly(p, tooltip = "text")

Insights The chart above illustrates the totals of sales, profits, and quantities across all subcategories.

Sales: The subcategories generating the highest sales value are Tables, Storage, Phones, Chairs, and Binders.
Quantity: In terms of units sold, Paper and Binders are the most frequently purchased items.
Profitability: Copiers stand out as the most profitable subcategory, achieving a profit value of $55,617 and a profit margin of 34%. On the other hand, Tables and Bookcases rank as the least profitable subcategories, with Tables showing a negative profit margin of nearly -9%, and Bookcases at a -3% profit margin.

While certain subcategories such as, Tables and Bookcases drive sales, their weak or negative margins indicate they may be eroding overall profitability. In contrast, high-margin subcategories like Copiers and consistently high-volume categories like Paper and Binders should be considered strategic priorities, as they combine strong demand with solid profitability.

3.4 Profit Percentage by Subcategory: Top 5 and Bottom 5

library(ggplot2)
library(plotly)
library(scales)

# Profit Percent Chart by Subcategory
p_profit <- ggplot(
  subcategory_summary,
  aes(
    x = reorder(`Sub-Category`, Profit_Percent),  # order by profit %
    y = Profit_Percent,
    fill = `Sub-Category`,
    text = paste0(
      "Subcategory: ", `Sub-Category`, "<br>",
      "Profit %: ", Profit_Percent_Label, "<br>",
      "Total Sales: ", comma(Total_Sales), "<br>",
      "Total Profit: ", comma(Total_Profit)
    )
  )
) +
  geom_col(show.legend = FALSE) +
  coord_flip() +  # flip for readability
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  labs(
    title = "Profit % by Subcategory",
    x = "Subcategory",
    y = "Profit %"
  ) +
  theme_minimal(base_size = 12)

# Interactive version
ggplotly(p_profit, tooltip = "text")

Insights Now, with the help of the chart above, we examine the profit percentage by subcategory, we identify the top five most profitable and the least profitable groups:

Top 5 Most Profitable Subcategories:

Labels — 44% profit margin, with $5,546 profit
Envelopes — 42% profit margin, with $6,964 profit
Copiers — 37% profit margin, with $55,518 profit
Paper — 33% profit margin, with $34,054 profit
Fasteners — 31% profit margin, with $950 profit

Top 5 Least Profitable Subcategories:

Tables — -9% profit margin, with a loss of $17,725
Bookcases — -3% profit margin, with a loss of $3,473
Supplies — -3% profit margin, with a loss of $1,189
Machines — 2% profit margin, with a small profit of $3,385
Chairs — 8% profit margin, with a profit of $26,590

While subcategories such as Labels, Envelopes, Copiers, and Paper deliver strong profit margins, it is worth noting that Tables, Bookcases, and Supplies are dragging profitability with negative or near-zero margins.

Actionable Insights This suggests that management should reassess pricing, discounts, or cost structures for under performing categories, while doubling down on high-margin subcategories that contribute more effectively to the bottom line.

3.5 Sales vs Profitability by Subcategory

library(ggplot2)
library(plotly)
library(scales)

# Scatter plot: Sales vs Profit %
p_scatter <- ggplot(
  subcategory_summary,
  aes(
    x = Total_Sales,
    y = Profit_Percent,
    size = Total_Quantity,
    color = `Sub-Category`,
    text = paste0(
      "Subcategory: ", `Sub-Category`, "<br>",
      "Total Sales: $", comma(Total_Sales), "<br>",
      "Total Profit: $", comma(Total_Profit), "<br>",
      "Profit %: ", Profit_Percent_Label, "<br>",
      "Quantity Sold: ", comma(Total_Quantity)
    )
  )
) +
  geom_point(alpha = 0.7) +
  scale_x_continuous(labels = dollar_format()) +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  labs(
    title = "Sales vs Profit % by Subcategory",
    x = "Total Sales",
    y = "Profit % (Profit/Sales)",
    size = "Quantity Sold"
  ) +
  theme_minimal(base_size = 13)

# Make interactive
ggplotly(p_scatter, tooltip = "text")

This bubble chart below provides a strategic view of how each subcategory in the Superstore performs in terms of sales revenue, profitability, and product volume:

X-Axis (Total Sales): Subcategories further to the right generate higher revenue. Y-Axis (Profit %): Subcategories higher up are more profitable relative to their sales. Bubble Size (Quantity Sold): Larger bubbles indicate subcategories that move higher product volumes.

Phones and Chairs Appear far to the right → strong sales performance. Moderate height = profitable, but not the top performers by margin. These drive significant revenue but not the highest efficiency.

Labels and Paper Positioned higher up on the Y-axis = highest profit percentages. Labels, though relatively smaller in sales, deliver exceptional margins, making them profit leaders. Paper combines both high profit % and high volume, making it a critical driver of both margin and market presence.

Tables and Bookcases Positioned to the right but lower on the Y-axis = high revenue but weak profitability. These categories drag down margins and may need pricing or cost-structure adjustments.

Binders and Paper Large bubbles = sold in very high quantities. These are workhorse categories, with Binders moving huge volumes but not offering the strongest profitability.

Insights From our analysis, we can state the following: Phones and Chairs drive the highest sales volume, but they are not the most profitable. The subcategories Labels and Paper stand out as the most profitable, generating a strong profit margin relative to their sales.On the other hand, Tables and Bookcases represent the least profitable subcategories, with profit margins below -3%. In terms of product movement, Binders and Paper are sold in the highest quantities, indicating their importance in driving overall sales volume.

This indicates that while some subcategories contribute heavily to revenue (like Phones), true profitability is concentrated in lower-sales but high-margin products (like Labels). Conversely, high-sales categories like Tables and Bookcases under perform in profit contribution, suggesting they may require a more thorough analysis.

4 How do profit margins vary across different regions and states?

4.1 Regional vs State Profitability

library(dplyr)
library(ggplot2)
library(plotly)
library(maps)
library(scales)

# --- Prepare State-level data ---
state_summary <- Sample_Superstore %>%
  group_by(State, Region) %>%
  summarise(
    Total_Sales = sum(Sales, na.rm = TRUE),
    Total_Profit = sum(Profit, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    Profit_Margin = Total_Profit / Total_Sales,
    Profit_Margin_Label = percent(Profit_Margin, accuracy = 0.1),
    Map_Type = "State"   # new column to identify plot type
  )

states_map <- map_data("state")
state_summary <- state_summary %>%
  mutate(State_lower = tolower(State))

map_data_states <- states_map %>%
  left_join(state_summary, by = c("region" = "State_lower"))

# --- Prepare Region-level data ---
region_summary <- Sample_Superstore %>%
  group_by(Region) %>%
  summarise(
    Total_Sales = sum(Sales, na.rm = TRUE),
    Total_Profit = sum(Profit, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    Profit_Margin = Total_Profit / Total_Sales,
    Map_Type = "Region"
  )

state_region <- Sample_Superstore %>%
  select(State, Region) %>%
  distinct() %>%
  mutate(State_lower = tolower(State))

map_data_region <- states_map %>%
  left_join(state_region, by = c("region" = "State_lower")) %>%
  left_join(region_summary, by = "Region") %>%
  mutate(State = NA,             # add missing State column for consistency
         Profit_Margin_Label = NA)  # optional placeholder

# --- Combine state and region data ---
map_data_combined <- bind_rows(
  map_data_states %>% select(long, lat, group, region, Profit_Margin, Profit_Margin_Label, Map_Type, State, Total_Sales, Region),
  map_data_region %>% select(long, lat, group, region, Profit_Margin, Profit_Margin_Label, Map_Type, State, Total_Sales, Region)
)

# --- Plot both maps side by side using facet_wrap ---
p <- ggplot(map_data_combined, aes(long, lat, group = group, fill = Profit_Margin,
                                   text = paste0(
                                     ifelse(Map_Type == "State", paste("State: ", State), paste("Region: ", Region)),
                                     "<br>Total Sales: $", comma(Total_Sales),
                                     "<br>Profit Margin: ", ifelse(Map_Type == "State", Profit_Margin_Label, percent(Profit_Margin, accuracy = 0.01))
                                   ))) +
  geom_polygon(color = "white") +
  coord_fixed(1.3) +
  scale_fill_gradient2(low = "red", mid = "orange", high = "green", midpoint = 0,
                       labels = percent) +
  facet_wrap(~ Map_Type, ncol = 2) +   # two plots side by side
  labs(title = "Profit Margin: States vs Regions", fill = "Profit Margin") +
  theme_void()

# Make interactive
ggplotly(p, tooltip = "text")

4.2 State Profitability

Insights This map provides a geographic perspective on profitability, helping identify where sales are effectively generating profit versus where they may be underperforming.

1. High Profit Margin States (Green)

These states are the most efficient, converting sales into strong profits.
Examples: California, Washington, New York.
Key drivers: High demand and profitable categories such as Technology and Office Supplies.

2. Low or Negative Profit Margin States (White to Red)

States shaded in white or red show low or negative margins.
Examples: Texas, Illinois, Pennsylvania.
Possible reasons: High sales volume but low profitability, deep discounting, unprofitable product mixes, or higher shipping costs.

3. Moderate Profit Margin States (Near Zero / White)

These states are breaking even; sales cover costs but contribute little to profit.
Opportunity: Improve profitability through strategic pricing or adjusting product mix.

Actionable Insights

Focus on High-Margin States: Consider expanding offerings or marketing campaigns in California, New York, and other profitable states.
Investigate Low-Margin States: Analyze factors in states like Texas and Illinois—review discount policies, return rates, and category performance.
Target Moderate-Margin States: Explore opportunities for profitable growth with targeted strategies.

4.3 Regional Profitability

Insights: This interactive map shows profit margin by region across the US for the Sample Superstore dataset. * Darker green regions → higher profit margins, indicating strong profitability relative to sales. * Lighter green regions → lower profit margins, signaling less efficient sales-to-profit conversion. * Hover over regions to see Total Sales and Profit Margin, allowing quick comparison of performance.

Actionable Insights

Focus on Top-Performing Regions: Prioritize regions with strong margins for marketing and expansion.
Improve Underperforming Regions: Investigate factors reducing margins and optimize operations or product strategies.
Compare Across Regions: Use the map to benchmark performance and identify areas for efficiency gains.

4.4 Total Profit Distribution by State

library(plotly)
library(dplyr)
library(scales)

donut_data <- state_summary %>%
  mutate(
    Fraction = Total_Profit / sum(Total_Profit, na.rm = TRUE),
    Label = paste0(State, ": ", dollar(Total_Profit))
  )

plot_ly(
  donut_data,
  labels = ~State,
  values = ~Total_Profit,
  type = "pie",
  textinfo = "none",   # no static labels
  hoverinfo = "text",  # keep hover tooltips
  text = ~paste0(
    "State: ", State,
    "<br>Total Profit: ", dollar(Total_Profit),
    "<br>Profit Margin: ", percent(Profit_Margin, accuracy = 0.1)
  ),
  hole = 0.4
) %>%
  layout(
    title = "Total Profit Distribution by State",
    legend = list(
      orientation = "h",  # 🔹 horizontal legend
      x = 0.5,            # center horizontally
      xanchor = "center", # anchor legend to the center
      y = -0.2            # move legend below the plot
    )
  )

The donuts plot shows Profit Value ($) and Profit Margin (%) for each state, allowing us to analyze both absolute profitability and efficiency of sales.

Profit Margin vs Profit Value:

Profit Value ($) is the total dollar profit a state generates. High sales can lead to high total profit, but it doesn’t account for how efficiently sales are converted into profit.
Profit Margin (%) shows the percentage of sales that turns into profit. A high margin indicates that a state is very efficient at generating profit relative to its sales.

Insights * Highest Profit Margin: Washington D.C. (34%) — this means that for every dollar of sales in D.C., the company earns 34 cents in profit, making it the most efficient state in terms of profitability. * Lowest Profit Margin: Ohio (-27.1%) — Ohio is losing money, with a negative margin indicating that for every dollar of sales, the company loses 27 cents.

This comparison highlights that high total sales do not always equal high profitability. Some states may have large sales but low or negative margins, while others with moderate sales can be highly profitable due to efficient operations.

4.5 Top 5 and Bottom 5 States by Profit Margin

library(ggplot2)
library(scales)

p <- ggplot(margin_extremes, aes(x = reorder(State, Profit_Margin), 
                                 y = Profit_Margin, fill = Category)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = paste0(Profit_Margin_Label, 
                               "\n$", comma(Total_Profit))),
            hjust = -0.1, size = 3.5) +
  facet_wrap(~ Category, scales = "free") +
  coord_flip() +
  scale_y_continuous(labels = percent, expand = expansion(mult = c(0, 0.15))) +
  labs(
    title = "Top 5 and Bottom 5 States by Profit Margin",
    x = "State",
    y = "Profit Margin (%)"
  ) +
  theme_minimal(base_size = 14)

p

Insights - From the chart above, we can identify the top and bottom five states by profit margin. The top-performing states demonstrate strong efficiency in converting sales into profit, even if their sales volumes are not the highest. This indicates advantages such as better pricing power, lower costs, or a favorable product mix. On the other hand, the bottom-performing states are struggling with weak or even negative margins, which may stem from excessive discounting, higher transportation costs, or low demand. - The high-margin states serve as models worth replicating across other markets, the low-margin states highlight areas where immediate strategic adjustments are necessary. For shareholders, this contrast underscores both the strengths and risks within the company’s operations and provides a clear roadmap for where management should focus improvement efforts.

4.6 Subcategory Performance Across States

library(dplyr)
library(ggplot2)
library(scales)
library(plotly)
 
# Summarize profit margin by state
state_summary <- Sample_Superstore %>%
  group_by(State) %>%
  summarise(
    Total_Sales = sum(Sales, na.rm = TRUE),
    Total_Profit = sum(Profit, na.rm = TRUE),
    .groups = "drop"
    ) %>%
  mutate(Profit_Margin = Total_Profit / Total_Sales)

# Identify Top 5 and Bottom 5 states by Profit Margin
top_bottom_states <- state_summary %>%
  arrange(desc(Profit_Margin)) %>%
  slice(c(1:5, (n()-4):n())) %>%
  pull(State)

# Filter original data to just those states + summarize by Sub-Category
state_sub_summary <- Sample_Superstore %>%
     filter(State %in% top_bottom_states) %>%
     group_by(State, `Sub-Category`) %>%
     summarise(
         Total_Sales = sum(Sales, na.rm = TRUE),
         Total_Profit = sum(Profit, na.rm = TRUE),
         .groups = "drop"
     ) %>%
     mutate(Profit_Margin = Total_Profit / Total_Sales)
 
# Keep ordering of states (Top 5 first, Bottom 5 after)
state_order <- state_summary %>%
     arrange(desc(Profit_Margin)) %>%
     slice(c(1:5, (n()-4):n())) %>%
     pull(State)
 
 state_sub_summary$State <- factor(state_sub_summary$State, levels = state_order)
 
# Plot stacked bar chart
p <- ggplot(state_sub_summary, aes(x = State, y = Profit_Margin,
                                    fill = `Sub-Category`,
                                    text = paste0(
                                        "State: ", State,
                                        "<br>Sub-Category: ", `Sub-Category`,
                                        "<br>Profit Margin: ", percent(Profit_Margin, accuracy = 0.1),
                                        "<br>Total Profit: ", dollar(Total_Profit)
                                    ))) +
     geom_col() +
     coord_flip() +
     scale_y_continuous(labels = percent) +
     labs(
         title = "Top 5 and Bottom 5 States by Profit Margin (Stacked by Sub-Category)",
         x = "State",
         y = "Profit Margin (%)",
         fill = "Sub-Category"
     ) +
     theme_minimal(base_size = 14)
 
  # Convert to interactive
ggplotly(p, tooltip = "text")

Insights - The analysis of profit margins by state with subcategory contributions reveals clear patterns in performance. States with the highest profit margins, such as the District of Columbia, benefit from strong and consistent contributions across multiple subcategories, particularly Copiers, Paper, and Labels. These states demonstrate balanced profitability, where no single subcategory dominates, reducing risk and ensuring stable performance. - Conversely, the states with the lowest profit margins, such as Ohio, are weighed down by substantial losses in specific subcategories, especially Tables, Bookcases, and Supplies. While certain products like Phones and Binders perform well even in these states, their contributions are insufficient to offset the heavy losses in furniture-related categories.
- This comparison highlights that state-level profitability is driven less by total sales volume and more by the composition of sales across subcategories. A diverse and balanced mix of profitable products leads to strong margins, whereas over-reliance on loss-making categories erodes profitability, regardless of sales performance.

5 Are discounts driving more sales or reducing overall profitability?

5.1 Impact of Discounts on Profit

library(ggplot2)
library(dplyr)
library(scales)
library(plotly)

# Base ggplot
p <- ggplot(Sample_Superstore, aes(
    x = Discount,
    y = Profit,
    text = paste0(
      "Sales: $", comma(Sales),
      "<br>Profit: $", comma(Profit),
      "<br>Discount: ", percent(Discount),
      "<br>Category: ", Category,
      "<br>Sub-Category: ", `Sub-Category`
    )
  )) +
  geom_point(alpha = 0.6, color = "darkblue") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  scale_y_continuous(labels = comma) +
  labs(
    title = "Impact of Discounts on Profit",
    x = "Discount",
    y = "Profit"
  ) +
  theme_minimal(base_size = 14)

# Convert to interactive
ggplotly(p, tooltip = "text")

## `geom_smooth()` using formula = 'y ~ x'

Understanding the Discount vs Profit Scatter Plot

Each Dot = One Transaction
- Every point on the plot represents a single order (or line item) in the Sample Superstore dataset.
- The x-axis shows the discount applied.
- The y-axis shows the profit for that order.
Clusters of Points
- Low/No discounts: Many points are profitable, clustered at the top left.
- High discounts: More points fall into negative profit territory, especially beyond 30–40% discount.
Interactive Hover Details
- Hovering reveals Sales, Profit, Discount, Category, and Sub-Category.
- This allows you to pinpoint which product categories suffer most under discounting.

Insights
- Discounts don’t always equal profitability — while they may boost sales, they often erode profit.
- High-discount transactions are risky, particularly in categories like Furniture, which already operate on thin margins.
- Controlled discounting works best — small, targeted promotions protect margins better than blanket heavy discounts.
- Category-specific strategy:
- Use discounts cautiously in low-margin categories (Furniture).
- Be more flexible in high-margin categories (Technology, Office Supplies).
- Monitor closely: Ensure discounts drive net profit gains, not just inflated sales at a loss.

5.2 Profit Margin by Discount Level

library(dplyr)
library(ggplot2)
library(scales)
library(plotly)

# Compute average profit margin by discount bin
profit_margin_trend <- Sample_Superstore %>%
  mutate(
    Discount_Bin = cut(
      Discount,
      breaks = c(0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0),
      labels = c("0-10%", "10-20%", "20-30%", "30-40%", "40-60%", "60-80%", "80-100%"),
      include.lowest = TRUE
    ),
    Profit_Margin = ifelse(Sales > 0, Profit / Sales, NA)
  ) %>%
  group_by(Discount_Bin) %>%
  summarise(
    Avg_Profit_Margin = mean(Profit_Margin, na.rm = TRUE),
    .groups = "drop"
  )

# Find break-even point (approximate)
break_even_idx <- which.min(abs(profit_margin_trend$Avg_Profit_Margin))  # closest to 0
break_even_bin <- profit_margin_trend$Discount_Bin[break_even_idx]
break_even_value <- profit_margin_trend$Avg_Profit_Margin[break_even_idx]

# Plot
p <- ggplot(profit_margin_trend, aes(x = Discount_Bin, y = Avg_Profit_Margin,
                                     group = 1,
                                     text = paste0(
                                       "Discount Range: ", Discount_Bin,
                                       "<br>Avg Profit Margin: ", percent(Avg_Profit_Margin, 0.1)
                                     ))) +
  geom_line(color = "steelblue", size = 1.2) +        # trend line
  geom_point(color = "darkred", size = 3) +           # points
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  geom_point(aes(x = break_even_bin, y = break_even_value),
             color = "red", size = 4) +               # mark break-even
  annotate("text", x = break_even_bin, y = break_even_value + 0.05,
           label = "Break-even", color = "red", angle = 0, vjust = 0) +
  scale_y_continuous(labels = percent) +
  labs(
    title = "Average Profit Margin by Discount Level",
    x = "Discount Range",
    y = "Average Profit Margin"
  ) +
  theme_minimal(base_size = 14)

# Interactive
ggplotly(p, tooltip = "text")

Insights Discount vs. Profitability 1. Trend Overview * The blue trend line shows average profit margin across different discount levels. * As discounts increase, profit margins consistently decline, highlighting the trade-off between sales growth and profitability. 2. Break-even Point * The red point marks the break-even discount level, where the average profit margin crosses 0%. * In the Sample Superstore data, this occurs around the 30–40% discount range. * Discounts below this threshold are generally profitable, while higher discounts push margins into negative territory.

Actionable Insights * Controlled Discounts: Keep discounts below the break-even threshold to ensure profitability. * High-Margin Categories: Can tolerate slightly higher discounts without going negative. * Low-Margin Categories: Even moderate discounts may turn sales unprofitable, so apply carefully. * Monitoring: Regularly track profit margins by discount level to adjust promotions dynamically.

Key Point: The plot visually emphasizes that more discount ≠ more profit. Understanding the break-even point allows the company to maximize sales while protecting overall profitability.

6 Which customer segments and individual customers contribute the most to revenue and profit?

6.1 Top 10 Customers by Profit

customer_summary <- Sample_Superstore %>%
  group_by(`Customer Name`) %>%
  summarise(
    Total_Sales = sum(Sales, na.rm = TRUE),
    Total_Profit = sum(Profit, na.rm = TRUE),
    Total_Orders = n(),
    .groups = "drop"
  ) %>%
  arrange(desc(Total_Profit))

library(plotly)

# Add interactive tooltips
p <- ggplot(top_customers, aes(x = reorder(`Customer Name`, Total_Profit), y = Total_Profit,
                               text = paste0(
                                 "Customer: ", `Customer Name`,
                                 "<br>Total Sales: $", scales::comma(Total_Sales),
                                 "<br>Total Profit: $", scales::comma(Total_Profit),
                                 "<br>Orders: ", Total_Orders
                               ))) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    title = "Top 10 Customers by Profit",
    x = "Customer",
    y = "Total Profit"
  ) +
  theme_minimal()

# Convert to interactive plot
ggplotly(p, tooltip = "text")

Insights The plot highlights our most profitable customers, showing that a small number of clients contribute disproportionately to total profit. While some high-sales customers generate modest profit, others with fewer orders deliver strong margins, indicating key opportunities for targeted engagement and risk management.

7 Which customer segments or individual customers contribute the most to revenue and profit?

library(dplyr)
library(ggplot2)
library(scales)
library(plotly)

# Summarize by customer segment
segment_summary <- Sample_Superstore %>%
  group_by(`Segment`) %>%
  summarise(
    Total_Sales = sum(Sales, na.rm = TRUE),
    Total_Profit = sum(Profit, na.rm = TRUE),
    Total_Orders = n(),
    .groups = "drop"
  ) %>%
  arrange(desc(Total_Profit))

p <- ggplot(segment_summary, aes(x = `Segment`, y = Total_Profit,
                                 text = paste0(
                                   "Segment: ", `Segment`,
                                   "<br>Total Sales: $", scales::comma(Total_Sales),
                                   "<br>Total Profit: $", scales::comma(Total_Profit),
                                   "<br>Orders: ", Total_Orders
                                 ))) +
  geom_col(fill = "darkgreen") +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    title = "Customer Segment Contribution to Profit",
    x = "Segment",
    y = "Total Profit"
  ) +
  theme_minimal()

ggplotly(p, tooltip = "text")

Insights - The plot shows which customer segments drive the most profit.
- Segments with high Total Profit and Total Sales are the most valuable and should be prioritized for marketing and retention.
- Segments with high sales but low profit may need pricing adjustments or targeted promotions to improve profitability.

Insights Lowest Profit Segment & Highest Profit Segment - Consumer is the highest profit segment, contributing the most to overall profit and revenue.
- Strong margins may come from larger orders, repeat business, and higher-margin product purchases.
- Actionable takeaway: Focus on retention strategies, loyalty programs, and upselling opportunities in this segment to further maximize revenue and profitability.

Home Office is the lowest profit segment, contributing the least to overall profit.
Despite any sales it generates, profit margins may be low due to heavy discounting, small order sizes, or unprofitable product mixes.
Actionable takeaway: Consider optimizing pricing, reducing discounting, or promoting higher-margin products in this segment to improve profitability.

8 Trends in Total Orders by Shipping Mode

library(plotly)

p_orders <- ggplot(shipping_trends, aes(x = Order_Month, y = Total_Orders, color = `Ship Mode`,
                                        text = paste0(
                                          "Shipping Mode: ", `Ship Mode`,
                                          "<br>Month: ", Order_Month,
                                          "<br>Total Orders: ", Total_Orders
                                        ))) +
  geom_line(size = 1.2) +
  geom_point() +
  labs(
    title = "Trends in Total Orders by Shipping Mode",
    x = "Month",
    y = "Total Orders",
    color = "Shipping Mode"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplotly(p_orders, tooltip = "text")

Insights The plot shows that Standard Class consistently handles the largest share of orders over time, making it the backbone of shipping operations. Other modes such as Second Class, First Class, and Same Day have much lower and relatively stable volumes, suggesting they are used selectively. This indicates that the company relies heavily on the most economical shipping option, while premium modes remain niche

8.1 Profit Trends by Shipping Mode

library(ggplot2)
library(plotly)
library(scales)

p_profit_facet <- ggplot(data = shipping_trends, 
                         aes(x = Order_Month, y = Total_Profit,
                             group = `Ship Mode`,
                             color = `Ship Mode`,
                             text = paste0(
                               "Shipping Mode: ", `Ship Mode`,
                               "<br>Month: ", Order_Month,
                               "<br>Total Profit: $", scales::comma(Total_Profit)
                             ))) +
  geom_line(size = 1.2) +
  geom_point(size = 2) +
  facet_wrap(~`Ship Mode`, scales = "free_y") +  # separate panel per shipping mode
  scale_y_continuous(labels = scales::dollar) +
  labs(
    title = "Total Profit Trends by Shipping Mode",
    x = "Month",
    y = "Total Profit"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none")   # legend not needed since each panel is labeled

ggplotly(p_profit_facet, tooltip = "text")

Insights The trend shows that Standard Class dominates order volumes consistently over time, indicating it is the most relied-on and cost-efficient shipping mode. Premium options such as First Class and Same Day remain low in usage, with only minor fluctuations, suggesting they are reserved for urgent or special cases rather than routine orders.

9 Are there specific products or regions consistently generating losses?

9.1 Loss Making Product Sub-Categories by Profit Margin

library(dplyr)
library(ggplot2)
library(plotly)
library(scales)

# Calculate loss-making sub-categories with profit margin
loss_products_margin <- Sample_Superstore %>%
  group_by(Category, `Sub-Category`) %>%
  summarise(
    Total_Sales  = sum(Sales, na.rm = TRUE),
    Total_Profit = sum(Profit, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(Profit_Margin = Total_Profit / Total_Sales) %>%
  filter(Profit_Margin < 0) %>%
  arrange(Profit_Margin)

# Plot Profit Margin for loss-making sub-categories
p_loss_products_margin <- ggplot(loss_products_margin, 
                                 aes(x = reorder(`Sub-Category`, Profit_Margin),
                                     y = Profit_Margin, fill = Category,
                                     text = paste0(
                                       "Sub-Category: ", `Sub-Category`,
                                       "<br>Category: ", Category,
                                       "<br>Total Sales: $", comma(Total_Sales),
                                       "<br>Total Profit: $", comma(Total_Profit),
                                       "<br>Profit Margin: ", percent(Profit_Margin, accuracy = 0.1)
                                     ))) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = percent) +
  labs(
    title = "Loss-Making Product Sub-Categories by Profit Margin",
    x = "Sub-Category",
    y = "Profit Margin (%)",
    fill = "Category"
  ) +
  theme_minimal()

# Interactive
ggplotly(p_loss_products_margin, tooltip = "text")

Insights The plot highlights loss-making sub-categories by profit margin (%), showing which products are most inefficient relative to their sales.

Furniture (e.g., Tables) and Office Supplies (e.g., Binders) consistently have the lowest profit margins.
Some sub-categories may have moderate sales but very negative margins, signaling structural inefficiencies such as high costs, deep discounts, or shipping expenses.
This view helps prioritize corrective actions: adjust pricing, limit discounts, renegotiate supplier costs, or phase out persistently unprofitable products.

10 Products Generating Losses by Region

library(dplyr)
library(ggplot2)
library(plotly)

# Group by Region + Sub-Category
loss_regions_products <- Sample_Superstore %>%
  group_by(Region, `Sub-Category`) %>%
  summarise(
    Total_Sales = sum(Sales, na.rm = TRUE),
    Total_Profit = sum(Profit, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(Total_Profit < 0) %>%   # only show losses
  arrange(Total_Profit)

# Plot
p_loss_regions_products <- ggplot(loss_regions_products,
                                  aes(x = reorder(`Sub-Category`, Total_Profit),
                                      y = Total_Profit,
                                      fill = Region,
                                      text = paste0(
                                        "Region: ", Region,
                                        "<br>Sub-Category: ", `Sub-Category`,
                                        "<br>Total Sales: $", scales::comma(Total_Sales),
                                        "<br>Total Profit: $", scales::comma(Total_Profit)
                                      ))) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    title = "Products Generating Losses by Region",
    x = "Sub-Category",
    y = "Total Profit",
    fill = "Region"
  ) +
  theme_minimal()

ggplotly(p_loss_regions_products, tooltip = "text")

Insights The plot identifies loss-making products across regions, showing which sub-categories consistently generate negative profits.

Furniture (e.g., Tables) and Office Supplies (e.g., Binders) are the largest contributors to losses.
Losses vary by region, suggesting that some regions struggle more with specific products.
This highlights opportunities to adjust pricing, discounts, or inventory regionally to reduce losses and protect profitability. ____________________________________________________________________________________________

11 Final Insights – Superstore Analysis

Product Profitability
- Technology emerges as the most profitable category, with sub-categories like Copiers and Phones consistently driving high margins.
- Conversely, Furniture (especially Tables) and Office Supplies (Binders) are the least profitable, with several sub-categories regularly generating losses.

Business Action

Category	Sub-Categories	Action Plan
Technology	Copiers, Phones	Expand stock, targeted marketing, cross-sell, maintain margins
Furniture	Tables	Reduce discounts, optimize shipping, renegotiate suppliers, consider phasing out
Office Supplies	Binders	Reprice, bundle with profitable items, limit unnecessary discounts

Regional and State-Level Margins
- Profit margins are strongest in the West and East regions, while the Central region shows weaker performance and several loss-generating states.

Business action

Region	Insight / Observation	Business Action
West	High-margin region, strong performance	Double down on high-margin products, increase marketing, expand inventory, upsell and cross-sell
East	High-margin region, strong performance	Double down on high-margin products, increase marketing, expand inventory, upsell and cross-sell
Central	Under-performing region, lower margins	Improve pricing strategy, reduce unnecessary discounts, focus on profitable product categories, optimize shipping and operations

Discount Impact
- Heavy discounting does increase sales volumes, but in many categories it erodes profitability, particularly in Furniture.

Business action

Focus Area	Insight / Observation	Business Action
Discounts	Blanket promotions erode profit margins, especially on low-margin categories	Apply discounts strategically at the breakpoint margin, prioritizing high-margin categories instead of blanket promotions

Customer Segmentation
- A small proportion of customers drive the bulk of revenue and profit. Corporate and Consumer segments perform better than Home Office, which often shows lower profitability.

Business action

Customer Segment	Insight / Observation	Business Action
High-Value Customers	Most profitable, drive significant revenue and margins	Prioritize retention and loyalty programs, offer personalized rewards, and targeted promotions
Lower-Tier Customers	Lower profitability, often reliant on discounts	Reassess pricing strategy, limit deep discounts, focus on profitable product offerings, optimize acquisition costs

Shipping Mode Trends
- Standard Class dominates in order volume, confirming its cost efficiency.
- Premium options (First Class and Same Day) show limited and stable usage, indicating selective adoption rather than growth opportunities.
Consistent Loss Areas
- Certain sub-categories (Tables, Bookcases, and Binders) and some Central region states repeatedly show negative profit contributions.

Business action

Focus Area	Insight / Observation	Business Action
Underperforming Products	Certain products consistently generate low or negative profit	Take corrective action: adjust pricing, control costs, or consider phasing out persistently unprofitable products
Discounts	Discounts over 30% erode margins	Limit deep discounts, implement discount caps, and use targeted promotions instead of blanket markdowns

Opportunities for Growth
- Technology offers the largest growth potential, particularly Phones, Copiers, and Accessories.

Business action

Focus Area	Insight / Observation	Business Action
Product Category	Technology products (Phones, Copiers, Accessories) drive highest margins	Double down on high-margin categories: increase inventory, marketing, and cross-sell opportunities
Region	West and East regions have strongest margins	Focus expansion and sales efforts in West and East to maximize returns

* Expanding in **West and East regions**, where margins are strongest, will maximize   return.

Business action

Area / Focus	Insight / Observation	Business Action
Discounts	Deep or unnecessary discounts erode profit margins	Implement targeted discounting, limit blanket discounts, monitor discount impact on margins
Shipping	High-cost shipping reduces profitability	Promote cost-efficient shipping methods, optimize logistics, charge for premium shipping when necessary
Loss-Making Products	Some sub-categories consistently generate losses	Focus on profitable products, reprice or renegotiate supplier contracts, consider phasing out persistently unprofitable items

Customer growth strategies should focus on high-value individual customers while reducing dependence on unprofitable discount-driven sales.

Business action

Current Strategy	Insight / Observation	Business Action
Volume-driven with heavy discounts	Discounts drive sales but erode profit margins	Shift to value-focused strategy: implement loyalty points program, reward repeat customers, encourage profitable purchases

11.0.1 Overall Conclusion:

The Superstore’s profitability is concentrated in specific products, regions, and customers, while other areas drain performance. By doubling down on Technology, profitable regions, and loyal high-value customers, while tightening discounts and addressing loss-making sub-categories, the company can drive sustainable growth and improve efficiency in both operations and profitability.

11.0.2 References

Tableau Software. (2020, June 30). Sample - Superstore Sales (Excel). Tableau Community Forums. https://community.tableau.com/s/question/0D54T00000CWeX8SAL/sample-superstore-sales-excelxls
Becker, R. A., Wilks, A. R., Brownrigg, R., Minka, T. P., & Deckmyn, A. (2018). maps: Draw geographical maps [Computer software]. CRAN. https://cran.r-project.org/package=maps
Gohel, D., & Skintzos, P. (2025). ggiraph: Make ‘ggplot’ graphics interactive [Computer software]. CRAN. https://cran.r-project.org/package=ggiraph
Plotly Technologies Inc. (2025). plotly: Create interactive web graphics via plotly.js [Computer software]. CRAN. https://cran.r-project.org/package=plotly
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer.
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2025). dplyr: A grammar of data manipulation [Computer software]. CRAN. https://cran.r-project.org/package=dplyr
Wickham, H., & Grolemund, G. (2017). R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media. (tidyverse and lubridate packages are part of this ecosystem).
Wickham, H., & Henry, L. (2019). tidyr: Tidy messy data [Computer software]. CRAN. https://cran.r-project.org/package=tidyr
Wickham, H., & Seidel, D. (2022). scales: Scale functions for visualization [Computer software]. CRAN. https://cran.r-project.org/package=scales
Yu, G., & Xu, S. (2025). ggimage: Use image in ‘ggplot2’ [Computer software]. CRAN. https://cran.r-project.org/package=ggimage

Sample Superstore Report

Gabriel Lamarre

2025-10-05

1 Introduction

1.1 What is Superstore?

1.2 Sample Superstore Analysis

2 Data Overview

3 Which product categories and sub-categories are the most and least profitable?

3.1 Category Analysis: Sales, Profit, and Quantity

3.2 Profitability Trend Over Time

3.3 Sub-Category Analysis: Sales, Profit, and Quantity

3.4 Profit Percentage by Subcategory: Top 5 and Bottom 5

3.5 Sales vs Profitability by Subcategory

4 How do profit margins vary across different regions and states?

4.1 Regional vs State Profitability

4.2 State Profitability

4.3 Regional Profitability

4.4 Total Profit Distribution by State

4.5 Top 5 and Bottom 5 States by Profit Margin

4.6 Subcategory Performance Across States

5 Are discounts driving more sales or reducing overall profitability?

5.1 Impact of Discounts on Profit

5.2 Profit Margin by Discount Level

6 Which customer segments and individual customers contribute the most to revenue and profit?

6.1 Top 10 Customers by Profit

7 Which customer segments or individual customers contribute the most to revenue and profit?

8 Trends in Total Orders by Shipping Mode

8.1 Profit Trends by Shipping Mode

9 Are there specific products or regions consistently generating losses?

9.1 Loss Making Product Sub-Categories by Profit Margin

10 Products Generating Losses by Region

11 Final Insights – Superstore Analysis

11.0.1 Overall Conclusion:

11.0.2 References