Nice to meet you all! My name is Jose Octavian Leandro. In this session, I will conduct a retail data analysis using a dataset containing information about product sales.
The dataset used is retail data from a fictional company, which includes details about product sales. This dataset covers sales data from 2014 to 2017 and consists of 9,994 rows and 15 columns.
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
Displaying Data Structure
## 'data.frame': 9994 obs. of 15 variables:
## $ Row.ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
## $ Order.Date : chr "11/8/16" "11/8/16" "6/12/16" "10/11/15" ...
## $ Ship.Date : chr "11/11/16" "11/11/16" "6/16/16" "10/18/15" ...
## $ Ship.Mode : chr "Second Class" "Second Class" "Second Class" "Standard Class" ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : chr "Consumer" "Consumer" "Corporate" "Consumer" ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : chr "Furniture" "Furniture" "Office Supplies" "Furniture" ...
## $ Sub.Category: chr "Bookcases" "Chairs" "Labels" "Tables" ...
## $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
From the inspection we have conducted, there are 9,994 rows and 15 columns. These columns contain information about:
Row.ID: ID of the data row
Order.ID: Order ID
Order.Date: Order date
Ship.Date: Shipping date
Ship.Mode: Shipping mode
Customer.ID: Customer ID
Segment: Customer segmentation
Product.ID: Product ID
Category: Product category
Sub.Category: Product subcategory
Product.Name: Product name
Sales: Sales revenue
Quantity: Quantity of products
Discount: Discount applied
Profit: Profit earned
From the data inspection results, we can see that the Order.Date and Ship.Date columns have a character data type. We will change the data type of these columns to Date.
Additionally, we will convert the data type of the Ship.Mode, Segment, and Category columns to factor. Finally, we will modify the levels in the Ship.Mode column to “Standard Class”, “Second Class”, “First Class”, and “Same Day”.
# explicit coercion kolom Order.Date
retail$Order.Date <- as.Date(retail$Order.Date, format = "%m/%d/%y")
# explicit coercion kolom Ship.Date
retail$Ship.Date <- as.Date(retail$Ship.Date, format = "%m/%d/%y")
retail$Ship.Mode <- as.factor(retail$Ship.Mode)
retail$Segment <- as.factor(retail$Segment)
retail$Category <- as.factor(retail$Category)
levels(retail$Ship.Mode) <- c("Standard Class", "Second Class", "First Class", "Same Day")We will perform explicit coercion on the Order.Date and Ship.Date columns. We will change their data type to Date.
Additionally, we will also perform explicit coercion on the Ship.Mode, Segment, and Category columns, converting their data type to factor.
Finally, we will modify the levels in the Ship.Mode column to “Standard Class”, “Second Class”, “First Class”, and “Same Day”.
## Row.ID Order.ID Order.Date Ship.Date Ship.Mode Customer.ID
## 0 0 0 0 0 0
## Segment Product.ID Category Sub.Category Product.Name Sales
## 0 0 0 0 0 0
## Quantity Discount Profit
## 0 0 0
We will check for missing values using colSums on the dataset.
From the results above, we can see that there are no missing values in our dataset.
## Row.ID Order.ID Order.Date Ship.Date
## Min. : 1.00 Length:9994 Min. :2014-01-03 Min. :2014-01-07
## 1st Qu.:2499.25 Class :character 1st Qu.:2015-05-23 1st Qu.:2015-05-27
## Median :4997.50 Mode :character Median :2016-06-26 Median :2016-06-29
## Mean :4997.50 Mean :2016-04-30 Mean :2016-05-03
## 3rd Qu.:7495.75 3rd Qu.:2017-05-14 3rd Qu.:2017-05-18
## Max. :9994.00 Max. :2017-12-30 Max. :2018-01-05
## Ship.Mode Customer.ID Segment Product.ID
## Standard Class:1538 Length:9994 Consumer :5191 Length:9994
## Second Class : 543 Class :character Corporate :3020 Class :character
## First Class :1945 Mode :character Home Office:1783 Mode :character
## Same Day :5968
##
##
## Category Sub.Category Product.Name
## Furniture :2121 Length:9994 Length:9994
## Office Supplies:6026 Class :character Class :character
## Technology :1847 Mode :character Mode :character
##
##
##
## Sales Quantity Discount
## Min. : 0.44400000 Min. : 1.00000000000 Min. :0.000000000000
## 1st Qu.: 17.28000000 1st Qu.: 2.00000000000 1st Qu.:0.000000000000
## Median : 54.49000000 Median : 3.00000000000 Median :0.200000000000
## Mean : 229.85800083 Mean : 3.78957374425 Mean :0.156202721633
## 3rd Qu.: 209.94000000 3rd Qu.: 5.00000000000 3rd Qu.:0.200000000000
## Max. :22638.48000000 Max. :14.00000000000 Max. :0.800000000000
## Profit
## Min. :-6599.9780000000
## 1st Qu.: 1.7287500000
## Median : 8.6665000000
## Mean : 28.6568963078
## 3rd Qu.: 29.3640000000
## Max. : 8399.9760000000
From the summary above, we can see an overview of our dataset. We can observe information regarding:
Average sales (Sales): 229.858
Average quantity of products (Quantity): 3.789
Average discount (Discount): 0.156
Average profit (Profit): 28.656
Highest sales (Sales): 22,638.48
Highest quantity of products (Quantity): 14
Highest discount (Discount): 0.8
Highest profit (Profit): 8,399.98
First order date: 2014-01-03, last order date: 2017-12-30
Total Sales
## [1] 2297200.8603
Total Profit
## [1] 286397.0217
Total Quantity
## [1] 37873
Viewing the total sales for each product category.
sales_category <- aggregate(
Sales ~ Category
, data = retail
, FUN = sum
)
order_sales_category <- sales_category[order(sales_category$Sales, decreasing = TRUE),]
order_sales_categoryWe can also analyze the total sales for each product category per year.
sales_category_year <- aggregate(
Sales ~ Category + format(Order.Date, "%Y")
, data = retail
, FUN = sum
)
order_sales_category_year <- sales_category_year[order(sales_category_year$Sales, decreasing = TRUE),]
names(order_sales_category_year) <- c("Category", "Year", "Sales")
order_sales_category_yearThen, we visualize it using ggplot2.
ggplot(order_sales_category_year, aes(x = Year, y = Sales, fill = Category)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
title = "Total Sales for Each Product Category",
x = "Year",
y = "Total Sales" ,
fill = "Category") +
theme_minimal() +
scale_fill_manual(values = c("#E07A5F", "#F1E3D3", "#1B4D3E"))From the visualization, we can observe that:
Total sales in the Office Supplies category increased significantly in 2017.
The Technology category consistently had the highest sales each year.
Sales increased every year, except for Office Supplies in 2015.
The highest sales were recorded in the Technology category in 2017, reaching 271,730.81.
First, we will look at the total profit for each product category.
profit_category <- aggregate(
Profit ~ Category
, data = retail
, FUN = sum
)
order_profit_category <- profit_category[order(profit_category$Profit, decreasing = TRUE),]
order_profit_categoryNext, we will look at the total profit for each product category per year.
profit_category_year <- aggregate(
Profit ~ Category + format(Order.Date, "%Y")
, data = retail
, FUN = sum
)
order_profit_category_year <- profit_category_year[order(profit_category_year$Profit, decreasing = TRUE),]
names(order_profit_category_year) <- c("Category", "Year", "Profit")
order_profit_category_yearThen, we visualize the data using ggplot2.
ggplot(order_profit_category_year, aes(x = Year, y = Profit, fill = Category)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
title = "Total Profit for Each Product Category ",
x = "Year",
y = "Total Profit" ,
fill = "Category") +
theme_minimal() +
scale_fill_manual(values = c("#00879E", "#FFAB5B", "#FFF2DB"))
From the visualization, we can observe that:
The Furniture category has the lowest profit each year.
The lowest profit in the Furniture category was in 2015, reaching only 3,015.20.
The Technology category consistently has the highest profit each year.
The highest profit in the Technology category was in 2017, reaching 50,684.26.
Since we can see that Furniture has the lowest profit, we will now conduct a further analysis.
furniture <- retail[retail$Category == "Furniture",]
loss <- furniture[furniture$Profit < 0,]
order_loss <- loss[order(loss$Profit, decreasing = FALSE),]
order_loss[, c("Category", "Sub.Category", "Sales", "Quantity", "Discount", "Profit")]From the table above, we can see that the Tables subcategory has the lowest total profit compared to other subcategories.
From the visualization above, we can observe that profit and sales have a fairly strong positive relationship. This means that as sales increase, profit also tends to rise.
This can be confirmed by calculating the correlation between profit and sales.
## [1] 0.479064349737706
From the results above, we can see that the correlation between profit and sales is 0.48, which indicates a fairly strong positive relationship between profit and sales.
Next, we will look at the total sales for each customer segment.
sales_segment <- aggregate(
Sales ~ Segment
, data = retail
, FUN = sum
)
order_sales_segment <- sales_segment[order(sales_segment$Sales, decreasing = TRUE),]
order_sales_segmentThe Consumer segment, with total sales of 1,161,401.35, has the highest sales compared to Corporate and Home Office. Now, we will analyze the total profit for each customer segment.
profit_segment <- aggregate(
Profit ~ Segment
, data = retail
, FUN = sum
)
order_profit_segment <- profit_segment[order(profit_segment$Profit, decreasing = TRUE),]
order_profit_segmentThe Consumer segment, with a total profit of 134,119.21, has the highest profit compared to Corporate and Home Office. This further confirms that sales and profit have a fairly strong relationship.
This retail company has a total sales of 2,297,200.86 and a total profit of 286,397.02. Based on the analysis conducted, it can be concluded that sales in the Office Supplies category saw a significant increase in 2017, while the Technology category consistently had the highest total sales each year, peaking at 271,730.81 in 2017. Overall, sales increased every year.
From a profitability perspective, the Furniture category had the lowest profit each year, hitting its lowest point in 2015 at 3,015.20, whereas the Technology category consistently recorded the highest profit, with the highest profit in 2017 at 50,684.26. Additionally, the Tables subcategory had the lowest total profit compared to other subcategories.
Regarding customer segments, the Consumer segment dominated total sales with 1,161,401.35 in revenue and 134,119.21 in profit, outperforming the Corporate and Home Office segments. Furthermore, there is a positive correlation of 0.48 between profit and sales, indicating that an increase in sales tends to be accompanied by an increase in profit.
Recommendations
The company should focus on selling products from the Technology category, as it consistently has the highest sales and profit each year.
The company should evaluate products in the Furniture category, especially the Tables subcategory, which has the lowest total profit compared to other subcategories.
The company should prioritize the Consumer segment, as it generates the highest total sales and profit compared to Corporate and Home Office.