In this ‘Learning by Building’ project, I am working with data from the retail.csv file, which contains sales records for a company that sells furniture, office supplies, and technology products. The goal of this analysis is to support the company’s growth and long-term sustainability by identifying areas for improvement. This includes analyzing key aspects such as market segments, target audience, sales performance, profit, and more.
Setting working directory to the correct location:
Load the retail file
## [1] 9994 17
## [1] "Order.Date" "Ship.Date" "Ship.Mode" "Customer.ID"
## [5] "Segment" "Product.ID" "Category" "Sub.Category"
## [9] "Product.Name" "Sales" "Quantity" "Discount"
## [13] "Profit" "Ship.Duration" "Month" "Day"
## [17] "IsWeekend"
From the data inspection, we can find that the retail data has 9994 rows and 17 columns (as shown above).
## 'data.frame': 9994 obs. of 17 variables:
## $ Order.Date : chr "2016-11-08" "2016-11-08" "2016-06-12" "2015-10-11" ...
## $ Ship.Date : chr "2016-11-11" "2016-11-11" "2016-06-16" "2015-10-18" ...
## $ Ship.Mode : chr "Second Class" "Second Class" "Second Class" "Standard Class" ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : chr "Consumer" "Consumer" "Corporate" "Consumer" ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : chr "Furniture" "Furniture" "Office Supplies" "Furniture" ...
## $ Sub.Category : chr "Bookcases" "Chairs" "Labels" "Tables" ...
## $ Product.Name : chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
## $ Ship.Duration: int 3 3 4 7 7 5 5 5 5 5 ...
## $ Month : chr "November" "November" "June" "October" ...
## $ Day : chr "Tuesday" "Tuesday" "Sunday" "Sunday" ...
## $ IsWeekend : chr "Weekday" "Weekday" "Weekend" "Weekend" ...
Some of the data has invalid data type, therefore we should change them into the right datatypes first.
## Warning: package 'lubridate' was built under R version 4.3.3
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
retail$Order.Date <- ymd(retail$Order.Date)
retail$Ship.Date <- ymd(retail$Ship.Date)
retail$Ship.Mode <- factor(retail$Ship.Mode, levels = c("Same Day", "First Class", "Second Class", "Standard Class"))
retail$Segment <- factor(retail$Segment, levels = c("Consumer", "Corporate", "Home Office"))
retail$Category <- factor(retail$Category, levels = c("Furniture", "Office Supplies", "Technology"))
retail$Sub.Category <- factor(retail$Sub.Category, levels = c("Bookcases","Chairs","Labels","Tables", "Storage", "Furnishings", "Art", "Phones", "Binders", "Appliances", "Paper", "Accessories", "Envelopes", "Fasteners", "Supplies","Machines", "Copiers"))
str(retail)## 'data.frame': 9994 obs. of 17 variables:
## $ Order.Date : Date, format: "2016-11-08" "2016-11-08" ...
## $ Ship.Date : Date, format: "2016-11-11" "2016-11-11" ...
## $ Ship.Mode : Factor w/ 4 levels "Same Day","First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
## $ Sub.Category : Factor w/ 17 levels "Bookcases","Chairs",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Product.Name : chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
## $ Ship.Duration: int 3 3 4 7 7 5 5 5 5 5 ...
## $ Month : chr "November" "November" "June" "October" ...
## $ Day : chr "Tuesday" "Tuesday" "Sunday" "Sunday" ...
## $ IsWeekend : chr "Weekday" "Weekday" "Weekend" "Weekend" ...
Now each column has changed into the right data type.
## Order.Date Ship.Date Ship.Mode Customer.ID Segment
## 0 0 0 0 0
## Product.ID Category Sub.Category Product.Name Sales
## 0 0 0 0 0
## Quantity Discount Profit Ship.Duration Month
## 0 0 0 0 0
## Day IsWeekend
## 0 0
There is no missing value.
## Order.Date Ship.Date Ship.Mode
## Min. :2014-01-03 Min. :2014-01-07 Same Day : 543
## 1st Qu.:2015-05-23 1st Qu.:2015-05-27 First Class :1538
## Median :2016-06-26 Median :2016-06-29 Second Class :1945
## Mean :2016-04-30 Mean :2016-05-03 Standard Class:5968
## 3rd Qu.:2017-05-14 3rd Qu.:2017-05-18
## Max. :2017-12-30 Max. :2018-01-05
##
## Customer.ID Segment Product.ID
## Length:9994 Consumer :5191 Length:9994
## Class :character Corporate :3020 Class :character
## Mode :character Home Office:1783 Mode :character
##
##
##
##
## Category Sub.Category Product.Name
## Furniture :2121 Binders :1523 Length:9994
## Office Supplies:6026 Paper :1370 Class :character
## Technology :1847 Furnishings: 957 Mode :character
## Phones : 889
## Storage : 846
## Art : 796
## (Other) :3613
## Sales Quantity Discount Profit
## Min. : 0.444 Min. : 1.00 Min. :0.0000 Min. :-6599.978
## 1st Qu.: 17.280 1st Qu.: 2.00 1st Qu.:0.0000 1st Qu.: 1.729
## Median : 54.490 Median : 3.00 Median :0.2000 Median : 8.666
## Mean : 229.858 Mean : 3.79 Mean :0.1562 Mean : 28.657
## 3rd Qu.: 209.940 3rd Qu.: 5.00 3rd Qu.:0.2000 3rd Qu.: 29.364
## Max. :22638.480 Max. :14.00 Max. :0.8000 Max. : 8399.976
##
## Ship.Duration Month Day IsWeekend
## Min. :0.000 Length:9994 Length:9994 Length:9994
## 1st Qu.:3.000 Class :character Class :character Class :character
## Median :4.000 Mode :character Mode :character Mode :character
## Mean :3.958
## 3rd Qu.:5.000
## Max. :7.000
##
The information we got are:
Question 1: Who is our biggest customer? The biggest customer is direct consumer
##
## Consumer Corporate Home Office
## 5191 3020 1783
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
profit_by_segment <- retail %>%
group_by(Segment) %>%
summarise(Total_Profit = sum(Profit)) %>%
arrange(desc(Total_Profit)) # Arrange in descending order
ggplot(profit_by_segment, aes(x = reorder(Segment, Total_Profit), y = Total_Profit, fill = Segment)) +
geom_col() +
coord_flip() + # Flip for better readability
labs(title = "Total Profit by Segment",
x = "Segment",
y = "Total Profit ($)") +
theme_minimal() +
theme(legend.position = "none") # Hide legend as Segment is already on x-axis## Segment
## Sub.Category Consumer Corporate Home Office
## Bookcases -4435.6382 638.4502 324.6320
## Chairs 13235.3319 8344.6565 5010.1779
## Labels 3075.9884 1760.8273 709.4383
## Tables -9728.0378 -4906.4986 -3090.9447
## Storage 7104.2004 9131.0247 5043.6013
## Furnishings 7919.4227 3508.2077 1631.5132
## Art 3454.3011 2004.6477 1068.8382
## Phones 23837.1147 11766.2196 8912.3963
## Binders 17995.5972 6377.3201 5848.8460
## Appliances 6981.9282 7429.8952 3726.1820
## Paper 15534.6436 10361.5468 8157.3789
## Accessories 20735.9225 12707.4805 8493.2327
## Envelopes 3264.4126 2571.2290 1128.5351
## Fasteners 576.8008 251.9030 120.8144
## Supplies -1657.5513 338.9264 129.5254
## Machines 2141.0618 703.0190 540.6761
## Copiers 24083.7106 18990.2789 12543.8354
This visualization presents the profit generated by each segment, along with details of the sub-categories sold within each segment. The results can help the company strategically focus on selling specific products to targeted markets to ensure sustained business growth. For example, bookcases may be more profitable when sold exclusively to corporate and home office segments. Additionally, given that table sales have resulted in losses across all segments, the company should consider discontinuing them to minimize financial setbacks.
Question 2: Does the number of sales align with the profit?
ggplot(data= retail,
mapping= aes(x = Sales, y = Profit, fill = Category)) +
geom_boxplot(mapping = aes(colour = Category)) +
labs(title = "Profit Distribution by Category", x = "Sales", y = "Profit")Does Sales Align with Profit?
Question 3: Which product(s) generate most profit?
profit_contributor <- aggregate(x = Profit ~ Sub.Category,
data = retail,
FUN = 'sum')
profit_contributor[order(profit_contributor$Profit, decreasing= T),]#subsetting data to show products with profit > 0
profit_contributor <- filter(profit_contributor, Profit > 0)
#bar chart
ggplot(data = profit_contributor,
mapping = aes(y= reorder(Sub.Category, Profit), x= Profit))+
geom_col()Top three profit generated by these products: Copiers, Phones and Accessories.
Question 4: What sub-category contribute to negative profit?
library(dplyr)
# Filter only negative profit values
loss_data <- retail %>%
filter(Profit < 0) %>%
group_by(Sub.Category) %>%
summarise(Total_Loss = sum(Profit)) %>%
arrange(Total_Loss) # Arrange from highest loss to lowest
# Create the bar chart
ggplot(loss_data, aes(x = reorder(Sub.Category, Total_Loss), y = Total_Loss)) +
geom_col() +
coord_flip() + # Flips the chart for better readability
labs(title = "Total Profit Loss by Sub-Category",
x = "Sub-Category",
y = "Total Loss ($)") +
theme_minimal() +
theme(legend.position = "none") # Hide legend since fill is just for color variationThese products contribute to profit losses, and the company may need to reassess whether to continue selling them. While sub-categories like binders, phones, and accessories are among the top profit contributors, certain product IDs within these categories may still be generating losses.
Question 5: How much discount does the company gives? Did it affect the business?
disc_given <- as.data.frame(table(retail$Discount))
colnames(disc_given) <- c("Discount", "Frequency")
disc_given#analyzing discount above 40%
# Prepare data
discounted_sales <- retail %>%
filter(Discount > 0.40) %>%
group_by(Sub.Category) %>%
summarise(
Count = n(),
Total_Sales = sum(Sales, na.rm = TRUE),
Total_Profit = sum(Profit, na.rm = TRUE)
) %>%
arrange(desc(Count))
# Create bar chart
ggplot(discounted_sales, aes(x = reorder(Sub.Category, Count), y = Count, fill = Total_Profit > 0)) +
geom_col() +
coord_flip() + # Flip to make it more readable
labs(title = "Frequency of Discounts Above 40% by Sub-Category",
x = "Sub-Category",
y = "Count of Discounts Given",
fill = "Profitable?") +
scale_fill_manual(values = c("red", "green"), labels = c("Loss", "Profit")) +
theme_minimal()The bar chart shows which sub-categories receive discount above 40% most frequently. Moreover, high discounts on these products leads to financial loses. Although high discount might indicate slow-moving inventory, the company should analyze if these products should remain in stock or replacing these items with better-performing ones. This analysis also indicates that the company should re-evaluate pricing strategies.
Question 6: Does the company experiencing profit or loss?
## [1] 286397
Overall, the company remains profitable despite some losses (profit was at 286397). Moving forward, it may be necessary to reassess pricing strategies and profit margins for the least profitable or loss-generating subcategories. A potential solution could be to discontinue unprofitable products and focus on those that drive the highest sales and profitability.
Between 2014 and 2017, the company generated a total profit of $286,397 from selling products across three categories: Furniture, Office Supplies, and Technology. Among these, Technology contributed both the highest profit ($145,454.95) and the highest losses (over -$4,000), indicating the need for a closer evaluation of its pricing and cost structures.
While consumers accounted for the highest sales, the data reveals that higher sales do not always translate to higher profits. Across all three categories, there were instances of negative profit, suggesting inefficiencies in pricing, cost management, or discount strategies.
Key Recommendations:
Expand Target Segments – To drive sales growth, the company should consider targeting corporate and office segments, especially for high-demand items such as copiers, paper, phones, and accessories. Reassess Pricing and Discounts – Review selling prices and profit margins on products with negative profits. Additionally, analyze the impact of discounts to ensure they do not further contribute to losses. Optimize Inventory Management – Identify products that should remain in stock and those that should be discontinued to minimize unprofitable inventory and improve overall profitability.