This dataset is the sales record from retail company (pict from google). Sales were recorded from 2014 to 2017. This retail company sells Office Supplies, Furniture and Technology stuff with three kind of consumer segmentaton: Individual Consumer, Corporate and Home Office. I am as a Market Researcher would like to provide some recommendation for this retail company to increase profit.
make sure the data in the same folder with R project
retail <- read.csv("retail.csv")
head(retail)
tail(retail)
dim(retail)
## [1] 9994 15
names(retail)
## [1] "Row.ID" "Order.ID" "Order.Date" "Ship.Date" "Ship.Mode"
## [6] "Customer.ID" "Segment" "Product.ID" "Category" "Sub.Category"
## [11] "Product.Name" "Sales" "Quantity" "Discount" "Profit"
The data have 9994 observations with 15 variables. The variables are:
Row.ID : Number of RowOrder.ID : ID of orderOrder.Date : Date of orderShip.Date : Date of shippingShip.Mode : Type of shipmentShip.Mode : ID of customerSegment : Customers segmentationProduct.ID : ID of productCategory : 3 kind of category “Furniture”,“Office Supplies”,“Technology”Sub.category: More specific categoriesProduct.Name: Name of productSales : How much earning from each saleQuantity : Quantity of item soldDiscount : How much Discount was given for each saleProfit : How much can a company earn from each salecheck type of data for each variables
str(retail)
## 'data.frame': 9994 obs. of 15 variables:
## $ Row.ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
## $ Order.Date : chr "11/8/16" "11/8/16" "6/12/16" "10/11/15" ...
## $ Ship.Date : chr "11/11/16" "11/11/16" "6/16/16" "10/18/15" ...
## $ Ship.Mode : chr "Second Class" "Second Class" "Second Class" "Standard Class" ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : chr "Consumer" "Consumer" "Corporate" "Consumer" ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : chr "Furniture" "Furniture" "Office Supplies" "Furniture" ...
## $ Sub.Category: chr "Bookcases" "Chairs" "Labels" "Tables" ...
## $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
From this result, some of data type not in the correct type. we need change the data type so the data suitable for analysis purposes
retail$Order.Date <- as.Date(retail$Order.Date, format = "%m/%d/%y")
retail$Ship.Date <- as.Date(retail$Ship.Date, format = "%m/%d/%y")
retail$Ship.Mode <- as.factor(retail$Ship.Mode)
retail$Category <- as.factor(retail$Category)
retail$Segment <- as.factor(retail$Segment)
retail$Ship.Mode <- as.factor(retail$Ship.Mode)
retail$Sub.Category <- as.factor(retail$Sub.Category)
str(retail)
## 'data.frame': 9994 obs. of 15 variables:
## $ Row.ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
## $ Order.Date : Date, format: "2016-11-08" "2016-11-08" ...
## $ Ship.Date : Date, format: "2016-11-11" "2016-11-11" ...
## $ Ship.Mode : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
## $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
## $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
Check missing values
colSums(is.na(retail))
## Row.ID Order.ID Order.Date Ship.Date Ship.Mode Customer.ID
## 0 0 0 0 0 0
## Segment Product.ID Category Sub.Category Product.Name Sales
## 0 0 0 0 0 0
## Quantity Discount Profit
## 0 0 0
Great! The data don’t have missing value.
Then, subset the first column because we don’t need the information.
retail <- retail [, -c(1)]
Add column for year of order
library(lubridate)
retail$year_order <- year(retail$Order.Date)
head(retail)
Now, the dataset is ready to be processed and analyzed
summary(retail)
## Order.ID Order.Date Ship.Date
## Length:9994 Min. :2014-01-03 Min. :2014-01-07
## Class :character 1st Qu.:2015-05-23 1st Qu.:2015-05-27
## Mode :character Median :2016-06-26 Median :2016-06-29
## Mean :2016-04-30 Mean :2016-05-03
## 3rd Qu.:2017-05-14 3rd Qu.:2017-05-18
## Max. :2017-12-30 Max. :2018-01-05
##
## Ship.Mode Customer.ID Segment Product.ID
## First Class :1538 Length:9994 Consumer :5191 Length:9994
## Same Day : 543 Class :character Corporate :3020 Class :character
## Second Class :1945 Mode :character Home Office:1783 Mode :character
## Standard Class:5968
##
##
##
## Category Sub.Category Product.Name
## Furniture :2121 Binders :1523 Length:9994
## Office Supplies:6026 Paper :1370 Class :character
## Technology :1847 Furnishings: 957 Mode :character
## Phones : 889
## Storage : 846
## Art : 796
## (Other) :3613
## Sales Quantity Discount Profit
## Min. : 0.444 Min. : 1.00 Min. :0.0000 Min. :-6599.978
## 1st Qu.: 17.280 1st Qu.: 2.00 1st Qu.:0.0000 1st Qu.: 1.729
## Median : 54.490 Median : 3.00 Median :0.2000 Median : 8.666
## Mean : 229.858 Mean : 3.79 Mean :0.1562 Mean : 28.657
## 3rd Qu.: 209.940 3rd Qu.: 5.00 3rd Qu.:0.2000 3rd Qu.: 29.364
## Max. :22638.480 Max. :14.00 Max. :0.8000 Max. : 8399.976
##
## year_order
## Min. :2014
## 1st Qu.:2015
## Median :2016
## Mean :2016
## 3rd Qu.:2017
## Max. :2017
##
Summary:
Check the outlier in sales
boxplot(retail$Sales)
Check the outlier in profit
boxplot(retail$Profit)
From boxplot, we find posibilities for the outliers. But, in my opinion we should not discard this outlier data because we can assumption that these outliers data can represents a success and a failure in sales record in this retail company. So, the process may continue.
profit_year <- aggregate(formula = Profit ~ year_order , data = retail , FUN = sum)
profit_year[order(profit_year$Profit , decreasing = F) ,]
Profit of retail company have increased every year
sum(retail$Profit)
## [1] 286397
Total Profit within 4 years 286397
sum(retail$Quantity)
## [1] 37873
From 2014 to 2017 the retail company was successful selling the product 37873 items
xtabs(Quantity ~ Category + Segment, retail)
## Segment
## Category Consumer Corporate Home Office
## Furniture 4166 2495 1367
## Office Supplies 11758 7018 4130
## Technology 3597 2095 1247
The most sold items were office supplies purchased by the consumer segment, totaling 11758 items
retail_agg <- aggregate(formula = Quantity ~ Sub.Category, data = retail, FUN = sum)
retail_agg[order(retail_agg$Quantity, decreasing = T) ,]
Binders, Paper, Furnishings, Phones, Storage
office <- retail[retail$Category == "Office Supplies" ,]
office_agg <- aggregate(formula = Quantity ~ Sub.Category, data = office, FUN = sum)
top3_office <- office_agg[order(office_agg$Quantity, decreasing = T) ,]
head(top3_office , 3)
Binders, Paper, Storage
xtabs(Sales ~ Category, retail)
## Category
## Furniture Office Supplies Technology
## 741999.8 719047.0 836154.0
The highest sales was techonolgy
loss <- retail[retail$Profit <= 0 ,]
table(loss$Category)
##
## Furniture Office Supplies Technology
## 747 915 274
Office supplies was the category that most often have losses
xtabs(Profit ~ year_order , loss)
## year_order
## 2014 2015 2016 2017
## -31892.77 -32529.39 -37872.93 -53836.19
The biggest loss happened in 2017
table(loss$Discount)
##
## 0 0.1 0.15 0.2 0.3 0.32 0.4 0.45 0.5 0.6 0.7 0.8
## 30 4 17 524 221 27 180 11 66 138 418 300
Even when not giving discounts, retail companies have ever losses
retail[retail$Profit == -6599.978 ,]
happened in December, 2nd 2016, retail company was selling 5 items of Cubify CubeX 3D Printer Double Head Print and give discount 70%
retail[retail$Profit == 8399.976 ,]
happened October, 2nd 2016, retail company was selling 5 items of Canon imageCLASS 2200 Advanced Copier and Purchased by corporate
retail[retail$Sales == 22638.480 ,]
The highest sales occurred on 23 March 2014 when home office segment purchased of Cisco TelePresence System EX90 Videoconferencing Unit. But at the same time, the retail company suffered a loss of -1811.078
xtabs(Profit ~ Category + Segment , retail)
## Segment
## Category Consumer Corporate Home Office
## Furniture 6991.079 7584.816 3875.378
## Office Supplies 56330.321 40227.320 25933.160
## Technology 70797.810 44166.998 30490.141
The highest profit 70797.810 were technology purchased by the consumer segment and the lowest profit 3875.378 were furniture purchased by home office
table(retail$Ship.Mode , retail$Segment)
##
## Consumer Corporate Home Office
## First Class 769 485 284
## Same Day 317 114 112
## Second Class 1020 609 316
## Standard Class 3085 1812 1071
Most of buyer was choosing Standard Class for shipping options
xtabs(Profit ~ Segment + Ship.Mode , retail)
## Ship.Mode
## Segment First Class Same Day Second Class Standard Class
## Consumer 21374.044 9874.205 24946.911 77924.049
## Corporate 14464.472 1818.142 18225.713 57470.807
## Home Office 13131.324 4199.412 14274.011 28693.932
The highest profit with standard class shipping mode and the lowest profit with same day shipping mode
In four years, from 2014 to 2017, retail companies was successful to sell 37873 items with total profit of 286397. The highest profit 93439.27 occurred in 2017. Buyers are divided into three segments, namely individual consumers, home office and corporate. While the categories of goods sold are also divided into three, office supplies such as binders, paper, etc. furniture like tables, chairs, etc and technology like phones, copiers, etc. It was noted that the biggest buyers came from individual consumers who bought the office supplies category. The top three most-selling sub categories are binders, paper, and furnishing.
The highest sales and profits were achieved by the technology category, but the highest loss also came from technology. The highest loss occurred in 2016 but within 4 years the highest total loss occurred in 2017. The company also gave several discounts to customers, and the biggest discount was 80%. After further analysis, even though the retail company did not provide any discount at all, the retail company had also 0 profit or loss.
For shipping options, customers can choose 4 types: First Class, Same day, Second class, and Standard class. Based on data, from all segment, most of them are choosing Standard Class as their shipping option.
Recommendation