Superstore is a retail business that sells furniture, office supplies, and technology products across the United States. The company serves different customer segments, including consumers, corporate clients, and home offices. With operations spanning in multiple regions, Superstore faces challenges in balancing sales growth, profitability, and discount strategies.
This report presents an analysis of the Sample Superstore dataset with the objective of addressing key business challenges and identifying actionable solutions. The analysis focuses on trends and patterns in sales, profit, and loss, as well as additional performance metrics relevant to strategic decision-making.
To establish a clear foundation, the analysis will cover the following key questions:
With reasonable insights from the data, we can formulate predictions, address key business challenges, and identify actionable solutions to these challenges.
The Sample Superstore dataset contains information about sales transactions, including product categories, sub-categories, sales, profit, discounts, shipping modes, regions, states, and customer segments. This section provides a brief overview of the dataset structure and key summary statistics.
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
##
## Attaching package: 'maps'
##
## The following object is masked from 'package:purrr':
##
## map
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
## Rows: 9994 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Order ID, Ship Mode, Customer ID, Customer Name, Segment, Country...
## dbl (7): Row ID, Postal Code, Sales, Quantity, Discount, Profit, Order_Pro...
## dttm (2): Order Date, Ship Date
## date (3): Order_Date, Ship_Date, Order_Month
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 17 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Category, Sub-Category
## dbl (3): Total_Sales, Total_Profit, Avg_Profit_Per_Order
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 9 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Category, Metric, Value_abbr, Profit_Percent
## dbl (1): Value
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 48 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (1): Total_Profit
## dttm (1): Order_Month
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 9 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Category, Metric, Value_abbr, Profit_Percent
## dbl (1): Value
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 17 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Sub-Category, Profit_Percent_Label
## dbl (4): Total_Sales, Total_Profit, Total_Quantity, Profit_Percent
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 49 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): State, Region
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 6 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Discount_Bin
## dbl (1): Avg_Profit_Margin
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 3 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Category, Sub-Category
## dbl (2): Total_Sales, Total_Profit
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 3 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Category
## dbl (1): Total_Profit
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 192 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Region
## dbl (1): Total_Profit
## date (1): Order_Month
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 3 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Category
## dbl (1): Total_Profit
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 17 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Category, Sub-Category
## dbl (1): Total_Profit
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 17 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Category, Sub-Category, tooltip, fill_color
## dbl (1): Total_Profit
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 4 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Region, Profit_Margin_Label
## dbl (5): lon, lat, Total_Sales, Total_Profit, Profit_Margin
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 384 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Region, Metric
## dbl (1): Value
## date (1): Order_Month
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 49 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Region, State
## dbl (2): Total_Sales, Total_Profit
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 10 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): State, Profit_Margin_Label, Category
## dbl (3): Total_Sales, Total_Profit, Profit_Margin
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 10 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Customer Name
## dbl (3): Total_Sales, Total_Profit, Total_Orders
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 188 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Ship Mode
## dbl (3): Total_Sales, Total_Profit, Total_Orders
## date (1): Order_Month
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(dplyr)
library(ggplot2)
library(plotly)
library(scales)
library(readr)
# Custom function for thousands
label_thousands <- label_number(
scale = 1/1000,
suffix = "k",
accuracy = 1
)
# Compute min and max for each facet (Metric)
tick_values <- category_long %>%
group_by(Metric) %>%
summarise(
min_val = min(Value[Value > 0]),
max_val = max(Value)
)
# Function to pick min/max for breaks
breaks_fn <- function(x) c(min(x[x > 0]), max(x))
# Plot with only first and last tick per facet
p <- ggplot(category_long,
aes(x = Value, y = Category, fill = Category,
text = paste0(
Metric, ": ", label_thousands(Value),
"<br>Profit %: ", Profit_Percent
))) +
geom_col(show.legend = FALSE) +
facet_wrap(~ Metric, scales = "free_x") +
scale_x_continuous(
labels = label_thousands,
breaks = breaks_fn
) +
labs(
title = "Total Sales, Profit, and Quantity by Category",
x = "Value (in Thousands)",
y = "Category"
) +
theme_minimal(base_size = 14) +
theme(axis.text.x = element_text(angle = 0, hjust = 0.5)) # horizontal labels
# Make it interactive
ggplotly(p, tooltip = "text")# Export the cleaned table with abbreviated values
category_long_export <- category_long %>%
mutate(Value_abbr = label_thousands(Value)) %>%
select(Category, Metric, Value, Value_abbr, Profit_Percent)Insights - From the chart above, we can observe that Technology leads in total sales with $836,000 and a 17.40% profit, totaling $145,555, making it the most profitable category overall. - Interestingly, Technology achieved the highest sales despite having the fewest products sold (6,936 units), indicating a higher revenue per unit compared to other categories. - Looking at Furniture, this category produced $742,000 in sales but only managed a 2.49% profit, making it the least profitable category. - As for Office Supplies, this category achieved a 17.04% profit with total sales of $719,047. It is also the most sold category with 22,906 units sold.
library(dplyr)
library(ggplot2)
library(lubridate)
library(scales)
library(plotly)
# Aggregate profit by month
profit_trend <- Sample_Superstore %>%
mutate(Order_Month = floor_date(`Order Date`, "month")) %>%
group_by(Order_Month) %>%
summarise(
Total_Profit = sum(Profit, na.rm = TRUE),
.groups = "drop"
)
# Base plot with explicit group = 1
p <- ggplot(profit_trend, aes(x = Order_Month, y = Total_Profit,
text = paste0("Month: ", Order_Month,
"<br>Total Profit: ", dollar(Total_Profit)),
group = 1)) + # this fixes ggplotly rendering
geom_line(color = "steelblue", size = 1) +
geom_point(color = "darkred", size = 2) +
geom_smooth(method = "loess", se = FALSE, color = "darkred", linetype = "dashed") +
scale_y_continuous(labels = dollar) +
labs(
title = "Profitability Trend Over Time",
x = "Month",
y = "Total Profit ($)"
) +
theme_minimal(base_size = 14)## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `geom_smooth()` using formula = 'y ~ x'
## Warning: The following aesthetics were dropped during statistical transformation: text.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
Insights - The plot above shows the monthly profitability of Superstore over the recorded period. Overall, the store exhibits a slightly growing trend in profit, as indicated by the red dashed smoothing line, although there are noticeable fluctuations month-to-month. - The highest profits occur in months such as November and December, reflecting the seasonal boost from holiday sales. For example, profits peak at around $70,000–$75,000 in these periods. - Conversely, some months, particularly in the first quarter of the year, show lower profitability, sometimes dipping to $10,000–$15,000, likely due to slower sales after the holiday season. - The trend line smooths out these fluctuations and suggests a gradual upward trajectory, indicating that overall profitability is improving over time.
Despite seasonal dips, the store is increasingly generating more profit per month, showing resilience and growth potential. The combination of monthly variation and the overall upward trend highlights areas of high performance and opportunities to optimize during lower-profit months.
library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)
library(plotly)
# Summarise subcategory
subcategory_summary <- Sample_Superstore %>%
group_by(`Sub-Category`) %>%
summarise(
Total_Sales = sum(Sales, na.rm = TRUE),
Total_Profit = sum(Profit, na.rm = TRUE),
Total_Quantity = sum(Quantity, na.rm = TRUE)
) %>%
mutate(
Profit_Percent = Total_Profit / Total_Sales, # numeric for sorting
Profit_Percent_Label = percent(Profit_Percent, accuracy = 0.01)
)
# Reshape to long format
subcategory_long <- subcategory_summary %>%
pivot_longer(
cols = c(Total_Sales, Total_Profit, Total_Quantity),
names_to = "Metric",
values_to = "Value"
)
# Custom function to display values in thousands
label_thousands <- label_number(
scale = 1/1000,
suffix = "k",
accuracy = 1
)
# Base ggplot with horizontal bars
p <- ggplot(
subcategory_long,
aes(
x = Value,
y = `Sub-Category`,
fill = `Sub-Category`,
text = paste0(
"Subcategory: ", `Sub-Category`, "<br>",
"Metric: ", Metric, "<br>",
"Value: ", label_thousands(Value), "<br>",
"Profit %: ", Profit_Percent_Label
)
)
) +
geom_col(show.legend = FALSE) +
facet_wrap(~ Metric, scales = "free_x") + # free_x because Value is on x-axis
scale_x_continuous(labels = label_thousands) +
labs(
title = "Total Sales, Profit, and Quantity by Subcategory",
x = "Value (in Thousands)",
y = "Subcategory"
) +
theme_minimal(base_size = 12) +
theme(
axis.text.x = element_text(angle = 30, hjust = 1),
axis.text.y = element_text(size = 8)
)
# Make it interactive
ggplotly(p, tooltip = "text")Insights The chart above illustrates the totals of sales, profits, and quantities across all subcategories.
While certain subcategories such as, Tables and Bookcases drive sales, their weak or negative margins indicate they may be eroding overall profitability. In contrast, high-margin subcategories like Copiers and consistently high-volume categories like Paper and Binders should be considered strategic priorities, as they combine strong demand with solid profitability.
library(ggplot2)
library(plotly)
library(scales)
# Profit Percent Chart by Subcategory
p_profit <- ggplot(
subcategory_summary,
aes(
x = reorder(`Sub-Category`, Profit_Percent), # order by profit %
y = Profit_Percent,
fill = `Sub-Category`,
text = paste0(
"Subcategory: ", `Sub-Category`, "<br>",
"Profit %: ", Profit_Percent_Label, "<br>",
"Total Sales: ", comma(Total_Sales), "<br>",
"Total Profit: ", comma(Total_Profit)
)
)
) +
geom_col(show.legend = FALSE) +
coord_flip() + # flip for readability
scale_y_continuous(labels = percent_format(accuracy = 1)) +
labs(
title = "Profit % by Subcategory",
x = "Subcategory",
y = "Profit %"
) +
theme_minimal(base_size = 12)
# Interactive version
ggplotly(p_profit, tooltip = "text")Insights Now, with the help of the chart above, we examine the profit percentage by subcategory, we identify the top five most profitable and the least profitable groups:
Top 5 Most Profitable Subcategories:
Top 5 Least Profitable Subcategories:
While subcategories such as Labels, Envelopes, Copiers, and Paper deliver strong profit margins, it is worth noting that Tables, Bookcases, and Supplies are dragging profitability with negative or near-zero margins.
Actionable Insights This suggests that management should reassess pricing, discounts, or cost structures for under performing categories, while doubling down on high-margin subcategories that contribute more effectively to the bottom line.
library(ggplot2)
library(plotly)
library(scales)
# Scatter plot: Sales vs Profit %
p_scatter <- ggplot(
subcategory_summary,
aes(
x = Total_Sales,
y = Profit_Percent,
size = Total_Quantity,
color = `Sub-Category`,
text = paste0(
"Subcategory: ", `Sub-Category`, "<br>",
"Total Sales: $", comma(Total_Sales), "<br>",
"Total Profit: $", comma(Total_Profit), "<br>",
"Profit %: ", Profit_Percent_Label, "<br>",
"Quantity Sold: ", comma(Total_Quantity)
)
)
) +
geom_point(alpha = 0.7) +
scale_x_continuous(labels = dollar_format()) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
labs(
title = "Sales vs Profit % by Subcategory",
x = "Total Sales",
y = "Profit % (Profit/Sales)",
size = "Quantity Sold"
) +
theme_minimal(base_size = 13)
# Make interactive
ggplotly(p_scatter, tooltip = "text")This bubble chart below provides a strategic view of how each subcategory in the Superstore performs in terms of sales revenue, profitability, and product volume:
X-Axis (Total Sales): Subcategories further to the right generate higher revenue. Y-Axis (Profit %): Subcategories higher up are more profitable relative to their sales. Bubble Size (Quantity Sold): Larger bubbles indicate subcategories that move higher product volumes.
Phones and Chairs Appear far to the right → strong sales performance. Moderate height = profitable, but not the top performers by margin. These drive significant revenue but not the highest efficiency.
Labels and Paper Positioned higher up on the Y-axis = highest profit percentages. Labels, though relatively smaller in sales, deliver exceptional margins, making them profit leaders. Paper combines both high profit % and high volume, making it a critical driver of both margin and market presence.
Tables and Bookcases Positioned to the right but lower on the Y-axis = high revenue but weak profitability. These categories drag down margins and may need pricing or cost-structure adjustments.
Binders and Paper Large bubbles = sold in very high quantities. These are workhorse categories, with Binders moving huge volumes but not offering the strongest profitability.
Insights From our analysis, we can state the following: Phones and Chairs drive the highest sales volume, but they are not the most profitable. The subcategories Labels and Paper stand out as the most profitable, generating a strong profit margin relative to their sales.On the other hand, Tables and Bookcases represent the least profitable subcategories, with profit margins below -3%. In terms of product movement, Binders and Paper are sold in the highest quantities, indicating their importance in driving overall sales volume.
This indicates that while some subcategories contribute heavily to revenue (like Phones), true profitability is concentrated in lower-sales but high-margin products (like Labels). Conversely, high-sales categories like Tables and Bookcases under perform in profit contribution, suggesting they may require a more thorough analysis.
library(dplyr)
library(ggplot2)
library(plotly)
library(maps)
library(scales)
# --- Prepare State-level data ---
state_summary <- Sample_Superstore %>%
group_by(State, Region) %>%
summarise(
Total_Sales = sum(Sales, na.rm = TRUE),
Total_Profit = sum(Profit, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(
Profit_Margin = Total_Profit / Total_Sales,
Profit_Margin_Label = percent(Profit_Margin, accuracy = 0.1),
Map_Type = "State" # new column to identify plot type
)
states_map <- map_data("state")
state_summary <- state_summary %>%
mutate(State_lower = tolower(State))
map_data_states <- states_map %>%
left_join(state_summary, by = c("region" = "State_lower"))
# --- Prepare Region-level data ---
region_summary <- Sample_Superstore %>%
group_by(Region) %>%
summarise(
Total_Sales = sum(Sales, na.rm = TRUE),
Total_Profit = sum(Profit, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(
Profit_Margin = Total_Profit / Total_Sales,
Map_Type = "Region"
)
state_region <- Sample_Superstore %>%
select(State, Region) %>%
distinct() %>%
mutate(State_lower = tolower(State))
map_data_region <- states_map %>%
left_join(state_region, by = c("region" = "State_lower")) %>%
left_join(region_summary, by = "Region") %>%
mutate(State = NA, # add missing State column for consistency
Profit_Margin_Label = NA) # optional placeholder
# --- Combine state and region data ---
map_data_combined <- bind_rows(
map_data_states %>% select(long, lat, group, region, Profit_Margin, Profit_Margin_Label, Map_Type, State, Total_Sales, Region),
map_data_region %>% select(long, lat, group, region, Profit_Margin, Profit_Margin_Label, Map_Type, State, Total_Sales, Region)
)
# --- Plot both maps side by side using facet_wrap ---
p <- ggplot(map_data_combined, aes(long, lat, group = group, fill = Profit_Margin,
text = paste0(
ifelse(Map_Type == "State", paste("State: ", State), paste("Region: ", Region)),
"<br>Total Sales: $", comma(Total_Sales),
"<br>Profit Margin: ", ifelse(Map_Type == "State", Profit_Margin_Label, percent(Profit_Margin, accuracy = 0.01))
))) +
geom_polygon(color = "white") +
coord_fixed(1.3) +
scale_fill_gradient2(low = "red", mid = "orange", high = "green", midpoint = 0,
labels = percent) +
facet_wrap(~ Map_Type, ncol = 2) + # two plots side by side
labs(title = "Profit Margin: States vs Regions", fill = "Profit Margin") +
theme_void()
# Make interactive
ggplotly(p, tooltip = "text")Insights This map provides a geographic perspective on profitability, helping identify where sales are effectively generating profit versus where they may be underperforming.
1. High Profit Margin States (Green)
2. Low or Negative Profit Margin States (White to Red)
3. Moderate Profit Margin States (Near Zero / White)
Actionable Insights
Insights: This interactive map shows profit margin by region across the US for the Sample Superstore dataset. * Darker green regions → higher profit margins, indicating strong profitability relative to sales. * Lighter green regions → lower profit margins, signaling less efficient sales-to-profit conversion. * Hover over regions to see Total Sales and Profit Margin, allowing quick comparison of performance.
Actionable Insights
library(plotly)
library(dplyr)
library(scales)
donut_data <- state_summary %>%
mutate(
Fraction = Total_Profit / sum(Total_Profit, na.rm = TRUE),
Label = paste0(State, ": ", dollar(Total_Profit))
)
plot_ly(
donut_data,
labels = ~State,
values = ~Total_Profit,
type = "pie",
textinfo = "none", # no static labels
hoverinfo = "text", # keep hover tooltips
text = ~paste0(
"State: ", State,
"<br>Total Profit: ", dollar(Total_Profit),
"<br>Profit Margin: ", percent(Profit_Margin, accuracy = 0.1)
),
hole = 0.4
) %>%
layout(
title = "Total Profit Distribution by State",
legend = list(
orientation = "h", # 🔹 horizontal legend
x = 0.5, # center horizontally
xanchor = "center", # anchor legend to the center
y = -0.2 # move legend below the plot
)
)The donuts plot shows Profit Value ($) and Profit Margin (%) for each state, allowing us to analyze both absolute profitability and efficiency of sales.
Profit Margin vs Profit Value:
Insights * Highest Profit Margin: Washington D.C. (34%) — this means that for every dollar of sales in D.C., the company earns 34 cents in profit, making it the most efficient state in terms of profitability. * Lowest Profit Margin: Ohio (-27.1%) — Ohio is losing money, with a negative margin indicating that for every dollar of sales, the company loses 27 cents.
This comparison highlights that high total sales do not always equal high profitability. Some states may have large sales but low or negative margins, while others with moderate sales can be highly profitable due to efficient operations.
library(ggplot2)
library(scales)
p <- ggplot(margin_extremes, aes(x = reorder(State, Profit_Margin),
y = Profit_Margin, fill = Category)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = paste0(Profit_Margin_Label,
"\n$", comma(Total_Profit))),
hjust = -0.1, size = 3.5) +
facet_wrap(~ Category, scales = "free") +
coord_flip() +
scale_y_continuous(labels = percent, expand = expansion(mult = c(0, 0.15))) +
labs(
title = "Top 5 and Bottom 5 States by Profit Margin",
x = "State",
y = "Profit Margin (%)"
) +
theme_minimal(base_size = 14)
pInsights - From the chart above, we can identify the top and bottom five states by profit margin. The top-performing states demonstrate strong efficiency in converting sales into profit, even if their sales volumes are not the highest. This indicates advantages such as better pricing power, lower costs, or a favorable product mix. On the other hand, the bottom-performing states are struggling with weak or even negative margins, which may stem from excessive discounting, higher transportation costs, or low demand. - The high-margin states serve as models worth replicating across other markets, the low-margin states highlight areas where immediate strategic adjustments are necessary. For shareholders, this contrast underscores both the strengths and risks within the company’s operations and provides a clear roadmap for where management should focus improvement efforts.
library(dplyr)
library(ggplot2)
library(scales)
library(plotly)
# Summarize profit margin by state
state_summary <- Sample_Superstore %>%
group_by(State) %>%
summarise(
Total_Sales = sum(Sales, na.rm = TRUE),
Total_Profit = sum(Profit, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(Profit_Margin = Total_Profit / Total_Sales)
# Identify Top 5 and Bottom 5 states by Profit Margin
top_bottom_states <- state_summary %>%
arrange(desc(Profit_Margin)) %>%
slice(c(1:5, (n()-4):n())) %>%
pull(State)
# Filter original data to just those states + summarize by Sub-Category
state_sub_summary <- Sample_Superstore %>%
filter(State %in% top_bottom_states) %>%
group_by(State, `Sub-Category`) %>%
summarise(
Total_Sales = sum(Sales, na.rm = TRUE),
Total_Profit = sum(Profit, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(Profit_Margin = Total_Profit / Total_Sales)
# Keep ordering of states (Top 5 first, Bottom 5 after)
state_order <- state_summary %>%
arrange(desc(Profit_Margin)) %>%
slice(c(1:5, (n()-4):n())) %>%
pull(State)
state_sub_summary$State <- factor(state_sub_summary$State, levels = state_order)
# Plot stacked bar chart
p <- ggplot(state_sub_summary, aes(x = State, y = Profit_Margin,
fill = `Sub-Category`,
text = paste0(
"State: ", State,
"<br>Sub-Category: ", `Sub-Category`,
"<br>Profit Margin: ", percent(Profit_Margin, accuracy = 0.1),
"<br>Total Profit: ", dollar(Total_Profit)
))) +
geom_col() +
coord_flip() +
scale_y_continuous(labels = percent) +
labs(
title = "Top 5 and Bottom 5 States by Profit Margin (Stacked by Sub-Category)",
x = "State",
y = "Profit Margin (%)",
fill = "Sub-Category"
) +
theme_minimal(base_size = 14)
# Convert to interactive
ggplotly(p, tooltip = "text")Insights - The analysis of profit margins by state
with subcategory contributions reveals clear patterns in performance.
States with the highest profit margins, such as the District of
Columbia, benefit from strong and consistent contributions
across multiple subcategories, particularly Copiers,
Paper, and Labels. These states demonstrate balanced
profitability, where no single subcategory dominates, reducing risk and
ensuring stable performance. - Conversely, the states with the lowest
profit margins, such as Ohio, are weighed down by
substantial losses in specific subcategories, especially
Tables, Bookcases, and Supplies. While
certain products like Phones and Binders perform well
even in these states, their contributions are insufficient to offset the
heavy losses in furniture-related categories.
- This comparison highlights that state-level profitability is driven
less by total sales volume and more by the composition of sales across
subcategories. A diverse and balanced mix of profitable products leads
to strong margins, whereas over-reliance on loss-making categories
erodes profitability, regardless of sales performance.
library(ggplot2)
library(dplyr)
library(scales)
library(plotly)
# Base ggplot
p <- ggplot(Sample_Superstore, aes(
x = Discount,
y = Profit,
text = paste0(
"Sales: $", comma(Sales),
"<br>Profit: $", comma(Profit),
"<br>Discount: ", percent(Discount),
"<br>Category: ", Category,
"<br>Sub-Category: ", `Sub-Category`
)
)) +
geom_point(alpha = 0.6, color = "darkblue") +
geom_smooth(method = "lm", se = FALSE, color = "red") +
scale_y_continuous(labels = comma) +
labs(
title = "Impact of Discounts on Profit",
x = "Discount",
y = "Profit"
) +
theme_minimal(base_size = 14)
# Convert to interactive
ggplotly(p, tooltip = "text")## `geom_smooth()` using formula = 'y ~ x'
Understanding the Discount vs Profit Scatter Plot
Insights
- Discounts don’t always equal profitability — while
they may boost sales, they often erode profit.
- High-discount transactions are risky, particularly in
categories like Furniture, which already operate on
thin margins.
- Controlled discounting works best — small, targeted
promotions protect margins better than blanket heavy discounts.
- Category-specific strategy:
- Use discounts cautiously in low-margin categories
(Furniture).
- Be more flexible in high-margin categories
(Technology, Office Supplies).
- Monitor closely: Ensure discounts drive net
profit gains, not just inflated sales at a loss.
library(dplyr)
library(ggplot2)
library(scales)
library(plotly)
# Compute average profit margin by discount bin
profit_margin_trend <- Sample_Superstore %>%
mutate(
Discount_Bin = cut(
Discount,
breaks = c(0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0),
labels = c("0-10%", "10-20%", "20-30%", "30-40%", "40-60%", "60-80%", "80-100%"),
include.lowest = TRUE
),
Profit_Margin = ifelse(Sales > 0, Profit / Sales, NA)
) %>%
group_by(Discount_Bin) %>%
summarise(
Avg_Profit_Margin = mean(Profit_Margin, na.rm = TRUE),
.groups = "drop"
)
# Find break-even point (approximate)
break_even_idx <- which.min(abs(profit_margin_trend$Avg_Profit_Margin)) # closest to 0
break_even_bin <- profit_margin_trend$Discount_Bin[break_even_idx]
break_even_value <- profit_margin_trend$Avg_Profit_Margin[break_even_idx]
# Plot
p <- ggplot(profit_margin_trend, aes(x = Discount_Bin, y = Avg_Profit_Margin,
group = 1,
text = paste0(
"Discount Range: ", Discount_Bin,
"<br>Avg Profit Margin: ", percent(Avg_Profit_Margin, 0.1)
))) +
geom_line(color = "steelblue", size = 1.2) + # trend line
geom_point(color = "darkred", size = 3) + # points
geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
geom_point(aes(x = break_even_bin, y = break_even_value),
color = "red", size = 4) + # mark break-even
annotate("text", x = break_even_bin, y = break_even_value + 0.05,
label = "Break-even", color = "red", angle = 0, vjust = 0) +
scale_y_continuous(labels = percent) +
labs(
title = "Average Profit Margin by Discount Level",
x = "Discount Range",
y = "Average Profit Margin"
) +
theme_minimal(base_size = 14)
# Interactive
ggplotly(p, tooltip = "text")Insights Discount vs. Profitability 1. Trend Overview * The blue trend line shows average profit margin across different discount levels. * As discounts increase, profit margins consistently decline, highlighting the trade-off between sales growth and profitability. 2. Break-even Point * The red point marks the break-even discount level, where the average profit margin crosses 0%. * In the Sample Superstore data, this occurs around the 30–40% discount range. * Discounts below this threshold are generally profitable, while higher discounts push margins into negative territory.
Actionable Insights * Controlled Discounts: Keep discounts below the break-even threshold to ensure profitability. * High-Margin Categories: Can tolerate slightly higher discounts without going negative. * Low-Margin Categories: Even moderate discounts may turn sales unprofitable, so apply carefully. * Monitoring: Regularly track profit margins by discount level to adjust promotions dynamically.
Key Point: The plot visually emphasizes that more discount ≠ more profit. Understanding the break-even point allows the company to maximize sales while protecting overall profitability.
customer_summary <- Sample_Superstore %>%
group_by(`Customer Name`) %>%
summarise(
Total_Sales = sum(Sales, na.rm = TRUE),
Total_Profit = sum(Profit, na.rm = TRUE),
Total_Orders = n(),
.groups = "drop"
) %>%
arrange(desc(Total_Profit))
library(plotly)
# Add interactive tooltips
p <- ggplot(top_customers, aes(x = reorder(`Customer Name`, Total_Profit), y = Total_Profit,
text = paste0(
"Customer: ", `Customer Name`,
"<br>Total Sales: $", scales::comma(Total_Sales),
"<br>Total Profit: $", scales::comma(Total_Profit),
"<br>Orders: ", Total_Orders
))) +
geom_col(fill = "steelblue") +
coord_flip() +
scale_y_continuous(labels = scales::dollar) +
labs(
title = "Top 10 Customers by Profit",
x = "Customer",
y = "Total Profit"
) +
theme_minimal()
# Convert to interactive plot
ggplotly(p, tooltip = "text")Insights The plot highlights our most profitable customers, showing that a small number of clients contribute disproportionately to total profit. While some high-sales customers generate modest profit, others with fewer orders deliver strong margins, indicating key opportunities for targeted engagement and risk management.
library(dplyr)
library(ggplot2)
library(scales)
library(plotly)
# Summarize by customer segment
segment_summary <- Sample_Superstore %>%
group_by(`Segment`) %>%
summarise(
Total_Sales = sum(Sales, na.rm = TRUE),
Total_Profit = sum(Profit, na.rm = TRUE),
Total_Orders = n(),
.groups = "drop"
) %>%
arrange(desc(Total_Profit))
p <- ggplot(segment_summary, aes(x = `Segment`, y = Total_Profit,
text = paste0(
"Segment: ", `Segment`,
"<br>Total Sales: $", scales::comma(Total_Sales),
"<br>Total Profit: $", scales::comma(Total_Profit),
"<br>Orders: ", Total_Orders
))) +
geom_col(fill = "darkgreen") +
scale_y_continuous(labels = scales::dollar) +
labs(
title = "Customer Segment Contribution to Profit",
x = "Segment",
y = "Total Profit"
) +
theme_minimal()
ggplotly(p, tooltip = "text")Insights - The plot shows which customer
segments drive the most profit.
- Segments with high Total Profit and Total Sales are
the most valuable and should be prioritized for marketing and
retention.
- Segments with high sales but low profit may need pricing
adjustments or targeted promotions to improve
profitability.
Insights Lowest Profit Segment & Highest
Profit Segment - Consumer is the highest profit
segment, contributing the most to overall profit and revenue.
- Strong margins may come from larger orders, repeat business,
and higher-margin product purchases.
- Actionable takeaway: Focus on retention
strategies, loyalty programs, and upselling opportunities in
this segment to further maximize revenue and profitability.
library(plotly)
p_orders <- ggplot(shipping_trends, aes(x = Order_Month, y = Total_Orders, color = `Ship Mode`,
text = paste0(
"Shipping Mode: ", `Ship Mode`,
"<br>Month: ", Order_Month,
"<br>Total Orders: ", Total_Orders
))) +
geom_line(size = 1.2) +
geom_point() +
labs(
title = "Trends in Total Orders by Shipping Mode",
x = "Month",
y = "Total Orders",
color = "Shipping Mode"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplotly(p_orders, tooltip = "text")Insights The plot shows that Standard Class consistently handles the largest share of orders over time, making it the backbone of shipping operations. Other modes such as Second Class, First Class, and Same Day have much lower and relatively stable volumes, suggesting they are used selectively. This indicates that the company relies heavily on the most economical shipping option, while premium modes remain niche
library(ggplot2)
library(plotly)
library(scales)
p_profit_facet <- ggplot(data = shipping_trends,
aes(x = Order_Month, y = Total_Profit,
group = `Ship Mode`,
color = `Ship Mode`,
text = paste0(
"Shipping Mode: ", `Ship Mode`,
"<br>Month: ", Order_Month,
"<br>Total Profit: $", scales::comma(Total_Profit)
))) +
geom_line(size = 1.2) +
geom_point(size = 2) +
facet_wrap(~`Ship Mode`, scales = "free_y") + # separate panel per shipping mode
scale_y_continuous(labels = scales::dollar) +
labs(
title = "Total Profit Trends by Shipping Mode",
x = "Month",
y = "Total Profit"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none") # legend not needed since each panel is labeled
ggplotly(p_profit_facet, tooltip = "text")Insights The trend shows that Standard Class dominates order volumes consistently over time, indicating it is the most relied-on and cost-efficient shipping mode. Premium options such as First Class and Same Day remain low in usage, with only minor fluctuations, suggesting they are reserved for urgent or special cases rather than routine orders.
library(dplyr)
library(ggplot2)
library(plotly)
library(scales)
# Calculate loss-making sub-categories with profit margin
loss_products_margin <- Sample_Superstore %>%
group_by(Category, `Sub-Category`) %>%
summarise(
Total_Sales = sum(Sales, na.rm = TRUE),
Total_Profit = sum(Profit, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(Profit_Margin = Total_Profit / Total_Sales) %>%
filter(Profit_Margin < 0) %>%
arrange(Profit_Margin)
# Plot Profit Margin for loss-making sub-categories
p_loss_products_margin <- ggplot(loss_products_margin,
aes(x = reorder(`Sub-Category`, Profit_Margin),
y = Profit_Margin, fill = Category,
text = paste0(
"Sub-Category: ", `Sub-Category`,
"<br>Category: ", Category,
"<br>Total Sales: $", comma(Total_Sales),
"<br>Total Profit: $", comma(Total_Profit),
"<br>Profit Margin: ", percent(Profit_Margin, accuracy = 0.1)
))) +
geom_col() +
coord_flip() +
scale_y_continuous(labels = percent) +
labs(
title = "Loss-Making Product Sub-Categories by Profit Margin",
x = "Sub-Category",
y = "Profit Margin (%)",
fill = "Category"
) +
theme_minimal()
# Interactive
ggplotly(p_loss_products_margin, tooltip = "text")Insights The plot highlights loss-making sub-categories by profit margin (%), showing which products are most inefficient relative to their sales.
library(dplyr)
library(ggplot2)
library(plotly)
# Group by Region + Sub-Category
loss_regions_products <- Sample_Superstore %>%
group_by(Region, `Sub-Category`) %>%
summarise(
Total_Sales = sum(Sales, na.rm = TRUE),
Total_Profit = sum(Profit, na.rm = TRUE),
.groups = "drop"
) %>%
filter(Total_Profit < 0) %>% # only show losses
arrange(Total_Profit)
# Plot
p_loss_regions_products <- ggplot(loss_regions_products,
aes(x = reorder(`Sub-Category`, Total_Profit),
y = Total_Profit,
fill = Region,
text = paste0(
"Region: ", Region,
"<br>Sub-Category: ", `Sub-Category`,
"<br>Total Sales: $", scales::comma(Total_Sales),
"<br>Total Profit: $", scales::comma(Total_Profit)
))) +
geom_col() +
coord_flip() +
scale_y_continuous(labels = scales::dollar) +
labs(
title = "Products Generating Losses by Region",
x = "Sub-Category",
y = "Total Profit",
fill = "Region"
) +
theme_minimal()
ggplotly(p_loss_regions_products, tooltip = "text")Insights The plot identifies loss-making products across regions, showing which sub-categories consistently generate negative profits.
Product Profitability
Business Action
| Category | Sub-Categories | Action Plan |
|---|---|---|
| Technology | Copiers, Phones | Expand stock, targeted marketing, cross-sell, maintain margins |
| Furniture | Tables | Reduce discounts, optimize shipping, renegotiate suppliers, consider phasing out |
| Office Supplies | Binders | Reprice, bundle with profitable items, limit unnecessary discounts |
Regional and State-Level Margins
Business action
| Region | Insight / Observation | Business Action |
|---|---|---|
| West | High-margin region, strong performance | Double down on high-margin products, increase marketing, expand inventory, upsell and cross-sell |
| East | High-margin region, strong performance | Double down on high-margin products, increase marketing, expand inventory, upsell and cross-sell |
| Central | Under-performing region, lower margins | Improve pricing strategy, reduce unnecessary discounts, focus on profitable product categories, optimize shipping and operations |
Discount Impact
Business action
| Focus Area | Insight / Observation | Business Action |
|---|---|---|
| Discounts | Blanket promotions erode profit margins, especially on low-margin categories | Apply discounts strategically at the breakpoint margin, prioritizing high-margin categories instead of blanket promotions |
Customer Segmentation
Business action
| Customer Segment | Insight / Observation | Business Action |
|---|---|---|
| High-Value Customers | Most profitable, drive significant revenue and margins | Prioritize retention and loyalty programs, offer personalized rewards, and targeted promotions |
| Lower-Tier Customers | Lower profitability, often reliant on discounts | Reassess pricing strategy, limit deep discounts, focus on profitable product offerings, optimize acquisition costs |
Shipping Mode Trends
Consistent Loss Areas
Business action
| Focus Area | Insight / Observation | Business Action |
|---|---|---|
| Underperforming Products | Certain products consistently generate low or negative profit | Take corrective action: adjust pricing, control costs, or consider phasing out persistently unprofitable products |
| Discounts | Discounts over 30% erode margins | Limit deep discounts, implement discount caps, and use targeted promotions instead of blanket markdowns |
Opportunities for Growth
Business action
| Focus Area | Insight / Observation | Business Action |
|---|---|---|
| Product Category | Technology products (Phones, Copiers, Accessories) drive highest margins | Double down on high-margin categories: increase inventory, marketing, and cross-sell opportunities |
| Region | West and East regions have strongest margins | Focus expansion and sales efforts in West and East to maximize returns |
* Expanding in **West and East regions**, where margins are strongest, will maximize return.
Business action
| Area / Focus | Insight / Observation | Business Action |
|---|---|---|
| Discounts | Deep or unnecessary discounts erode profit margins | Implement targeted discounting, limit blanket discounts, monitor discount impact on margins |
| Shipping | High-cost shipping reduces profitability | Promote cost-efficient shipping methods, optimize logistics, charge for premium shipping when necessary |
| Loss-Making Products | Some sub-categories consistently generate losses | Focus on profitable products, reprice or renegotiate supplier contracts, consider phasing out persistently unprofitable items |
Business action
| Current Strategy | Insight / Observation | Business Action |
|---|---|---|
| Volume-driven with heavy discounts | Discounts drive sales but erode profit margins | Shift to value-focused strategy: implement loyalty points program, reward repeat customers, encourage profitable purchases |
The Superstore’s profitability is concentrated in specific products, regions, and customers, while other areas drain performance. By doubling down on Technology, profitable regions, and loyal high-value customers, while tightening discounts and addressing loss-making sub-categories, the company can drive sustainable growth and improve efficiency in both operations and profitability.
Tableau Software. (2020, June 30). Sample - Superstore Sales (Excel). Tableau Community Forums. https://community.tableau.com/s/question/0D54T00000CWeX8SAL/sample-superstore-sales-excelxls
Becker, R. A., Wilks, A. R., Brownrigg, R., Minka, T. P., & Deckmyn, A. (2018). maps: Draw geographical maps [Computer software]. CRAN. https://cran.r-project.org/package=maps
Gohel, D., & Skintzos, P. (2025). ggiraph: Make ‘ggplot’ graphics interactive [Computer software]. CRAN. https://cran.r-project.org/package=ggiraph
Plotly Technologies Inc. (2025). plotly: Create interactive web graphics via plotly.js [Computer software]. CRAN. https://cran.r-project.org/package=plotly
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer.
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2025). dplyr: A grammar of data manipulation [Computer software]. CRAN. https://cran.r-project.org/package=dplyr
Wickham, H., & Grolemund, G. (2017). R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media. (tidyverse and lubridate packages are part of this ecosystem).
Wickham, H., & Henry, L. (2019). tidyr: Tidy messy data [Computer software]. CRAN. https://cran.r-project.org/package=tidyr
Wickham, H., & Seidel, D. (2022). scales: Scale functions for visualization [Computer software]. CRAN. https://cran.r-project.org/package=scales
Yu, G., & Xu, S. (2025). ggimage: Use image in ‘ggplot2’ [Computer software]. CRAN. https://cran.r-project.org/package=ggimage