1 Introduction

Management of PT. Elips (a retail company) want to know about their constumer and their current product sales. The management expect more sales if they know the demanded product and the effect about discount to their sales. Hopefully, there is any insight from discounted product so they know which product effected from discount. Hence, the management could give better product sales decision, which product sales will increased by discount and which are not

2 Data Import

elips <- read.csv("datainput/retail.csv")
head(elips)

3 Data Wrangling

tail(elips)
dim(elips)
## [1] 9994   15
names(elips)
##  [1] "Row.ID"       "Order.ID"     "Order.Date"   "Ship.Date"    "Ship.Mode"   
##  [6] "Customer.ID"  "Segment"      "Product.ID"   "Category"     "Sub.Category"
## [11] "Product.Name" "Sales"        "Quantity"     "Discount"     "Profit"
str(elips)
## 'data.frame':    9994 obs. of  15 variables:
##  $ Row.ID      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Order.ID    : chr  "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
##  $ Order.Date  : chr  "11/8/16" "11/8/16" "6/12/16" "10/11/15" ...
##  $ Ship.Date   : chr  "11/11/16" "11/11/16" "6/16/16" "10/18/15" ...
##  $ Ship.Mode   : chr  "Second Class" "Second Class" "Second Class" "Standard Class" ...
##  $ Customer.ID : chr  "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
##  $ Segment     : chr  "Consumer" "Consumer" "Corporate" "Consumer" ...
##  $ Product.ID  : chr  "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
##  $ Category    : chr  "Furniture" "Furniture" "Office Supplies" "Furniture" ...
##  $ Sub.Category: chr  "Bookcases" "Chairs" "Labels" "Tables" ...
##  $ Product.Name: chr  "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
##  $ Sales       : num  262 731.9 14.6 957.6 22.4 ...
##  $ Quantity    : int  2 3 2 5 2 7 4 6 3 5 ...
##  $ Discount    : num  0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
##  $ Profit      : num  41.91 219.58 6.87 -383.03 2.52 ...

4 Data Preparation

elips$Row.ID <- as.character(elips$Row.ID)
elips$Order.ID <- as.character(elips$Order.ID)
elips$Order.Date <- as.Date(elips$Order.Date,"%m/%d/%y")
elips$Ship.Date <- as.Date(elips$Ship.Date,"%m/%d/%y")
elips$Customer.ID <- as.character(elips$Customer.ID)
elips$Product.ID <- as.character(elips$Product.ID)
elips$Product.Name <- as.character(elips$Product.Name)
elips$Ship.Mode <- as.factor(elips$Ship.Mode)
elips$Segment <- as.factor(elips$Segment)
elips$Category <- as.factor(elips$Category)
elips$Sub.Category <- as.factor(elips$Sub.Category)

str(elips)
## 'data.frame':    9994 obs. of  15 variables:
##  $ Row.ID      : chr  "1" "2" "3" "4" ...
##  $ Order.ID    : chr  "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
##  $ Order.Date  : Date, format: "2016-11-08" "2016-11-08" ...
##  $ Ship.Date   : Date, format: "2016-11-11" "2016-11-11" ...
##  $ Ship.Mode   : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
##  $ Customer.ID : chr  "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
##  $ Segment     : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
##  $ Product.ID  : chr  "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
##  $ Category    : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
##  $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
##  $ Product.Name: chr  "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
##  $ Sales       : num  262 731.9 14.6 957.6 22.4 ...
##  $ Quantity    : int  2 3 2 5 2 7 4 6 3 5 ...
##  $ Discount    : num  0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
##  $ Profit      : num  41.91 219.58 6.87 -383.03 2.52 ...

5 Null and Empty cell removal

colSums(is.na(elips))
##       Row.ID     Order.ID   Order.Date    Ship.Date    Ship.Mode  Customer.ID 
##            0            0            0            0            0            0 
##      Segment   Product.ID     Category Sub.Category Product.Name        Sales 
##            0            0            0            0            0            0 
##     Quantity     Discount       Profit 
##            0            0            0
anyNA(elips)
## [1] FALSE
Elips <- elips[,c(3:15)]
head(Elips)

6 Brief statistical of data

summary(Elips)
##    Order.Date           Ship.Date                   Ship.Mode   
##  Min.   :2014-01-03   Min.   :2014-01-07   First Class   :1538  
##  1st Qu.:2015-05-23   1st Qu.:2015-05-27   Same Day      : 543  
##  Median :2016-06-26   Median :2016-06-29   Second Class  :1945  
##  Mean   :2016-04-30   Mean   :2016-05-03   Standard Class:5968  
##  3rd Qu.:2017-05-14   3rd Qu.:2017-05-18                        
##  Max.   :2017-12-30   Max.   :2018-01-05                        
##                                                                 
##  Customer.ID               Segment      Product.ID       
##  Length:9994        Consumer   :5191   Length:9994       
##  Class :character   Corporate  :3020   Class :character  
##  Mode  :character   Home Office:1783   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##             Category         Sub.Category  Product.Name      
##  Furniture      :2121   Binders    :1523   Length:9994       
##  Office Supplies:6026   Paper      :1370   Class :character  
##  Technology     :1847   Furnishings: 957   Mode  :character  
##                         Phones     : 889                     
##                         Storage    : 846                     
##                         Art        : 796                     
##                         (Other)    :3613                     
##      Sales              Quantity        Discount          Profit         
##  Min.   :    0.444   Min.   : 1.00   Min.   :0.0000   Min.   :-6599.978  
##  1st Qu.:   17.280   1st Qu.: 2.00   1st Qu.:0.0000   1st Qu.:    1.729  
##  Median :   54.490   Median : 3.00   Median :0.2000   Median :    8.666  
##  Mean   :  229.858   Mean   : 3.79   Mean   :0.1562   Mean   :   28.657  
##  3rd Qu.:  209.940   3rd Qu.: 5.00   3rd Qu.:0.2000   3rd Qu.:   29.364  
##  Max.   :22638.480   Max.   :14.00   Max.   :0.8000   Max.   : 8399.976  
## 
sum(Elips$Quantity)
## [1] 37873
sum(Elips$Profit)
## [1] 286397

Initial order date is on 3/1/2014 and the first delivery is on 7/1/2014. Classes with the highest to lowest number of delivery, consecutively, are Standard Class, Second Class, First Class, and Same Day. Product Segment with the highest number of orders/purchases is Consumer Goods, followed with Corporate Goods and Home Office Goods. Categories of goods most purchased from highest to lowest are Office Supply, Furniture, and Technology. The most purchased goods are of the sub-category of Binders, followed by Paper, and Furnishing. The total number of sales is at 37873. The maximum sales is at 22638.4; minimum at 0.4, with an average sales of 229.858. Maximum Quantity is at 14; minimum at 1, with an average of 3.79. Maximum discount rate is set at 80% with an average of 20%. Maximum profit reaches 8399.9 with a minimum profit at -6599.9, yielding an average profit of 28.65. The total profit sums up to 286397.

aggregate(Elips$Profit~Elips$Segment,Elips,mean)
aggregate(Elips$Profit~Elips$Segment,Elips,var)
aggregate(Elips$Profit~Elips$Segment,Elips,sd)
boxplot(Elips$Profit)

Elips[Elips$Sales== min(Elips$Sales),]

Lowest sales in the Consumer segment comes from Office Supplies in which the profit ratio is at 1.11.

os <- Elips[Elips$Category == "Office Supplies",]
round(prop.table(table(os$Segment))*100,2)
## 
##    Consumer   Corporate Home Office 
##       51.89       30.20       17.91

Highest number of buyers of Office Supplies come from the Consumer segment

dis <- Elips[Elips$Discount == max(Elips$Discount),]
dis_date <- sort(table(dis$Order.Date),decreasing = T)[1]
dis_date
## 2014-07-26 
##          4

An 80% Discount occurs 4 times in July 2014.

dis2 <- dis[dis$Order.Date == "2014-07-26",]
table(dis2$Segment)
## 
##    Consumer   Corporate Home Office 
##           4           0           0

Buyers of 80%-discounted goods are consumers.

xtabs(Quantity~Segment+Category,Elips)
##              Category
## Segment       Furniture Office Supplies Technology
##   Consumer         4166           11758       3597
##   Corporate        2495            7018       2095
##   Home Office      1367            4130       1247
heatmap(xtabs(Quantity~Segment+Category,Elips),cexRow = 0.8,cexCol = 0.8,scale = "column",Colv = NA, Rowv = NA)

aggregate(Quantity~Segment+Category,Elips,median)
sum(Elips$Quantity)
## [1] 37873
sum(Elips$Profit)
## [1] 286397
nol <-  Elips[Elips$Discount == 0,]
knitr::kable(table(nol$Discount))
Var1 Freq
0 4798
More0 <- Elips[Elips$Discount > 0 & Elips$Discount <=0.8,]
knitr::kable(table(More0$Discount))
Var1 Freq
0.1 94
0.15 52
0.2 3657
0.3 227
0.32 27
0.4 206
0.45 11
0.5 66
0.6 138
0.7 418
0.8 300
sort(xtabs(Profit~Category,Elips),decreasing = T)
## Category
##      Technology Office Supplies       Furniture 
##       145454.95       122490.80        18451.27

The most profitable category is Technology.

Elips[Elips$Profit== min(Elips$Profit),]

The biggest loss occurs on order date 2016-11-26 in which 5 items of Technology-Maxhines-Cubify CubeX 3D Printer Double Head Print are discounted 70%.

sort(xtabs(Profit ~ Segment, Elips),decreasing = T)
## Segment
##    Consumer   Corporate Home Office 
##   134119.21    91979.13    60298.68

Profit based on segment is consumer.

sort(xtabs(Sales ~ Category, Elips),decreasing = T)
## Category
##      Technology       Furniture Office Supplies 
##        836154.0        741999.8        719047.0

Highest selling price / highest selling margin of goods is Technology.

xtabs(Profit ~ Segment+Ship.Mode,Elips)
##              Ship.Mode
## Segment       First Class  Same Day Second Class Standard Class
##   Consumer      21374.044  9874.205    24946.911      77924.049
##   Corporate     14464.472  1818.142    18225.713      57470.807
##   Home Office   13131.324  4199.412    14274.011      28693.932
plot(xtabs(Profit ~ Segment+Ship.Mode,Elips))

Every segment prefers delivery by Standard Class the most, followed by Second Class, First Class, and Same Day, and consumer provides the highest profit.

sort(xtabs(Quantity ~ Category, Elips),decreasing=T)
## Category
## Office Supplies       Furniture      Technology 
##           22906            8028            6939

Based on item sold, office supply was the highest folloewd by furniture and technology

Tech_Category <- Elips[Elips$Category == "Technology",]
Off_Sup_Category <- Elips[Elips$Category == "Office Supplies",]
Furniture_Category <- Elips[Elips$Category == "Furniture",]
sum(Tech_Category$Profit)
## [1] 145454.9
sum(Off_Sup_Category$Profit)
## [1] 122490.8
sum(Furniture_Category$Profit)
## [1] 18451.27
cor(Tech_Category$Quantity,Tech_Category$Discount)
## [1] -0.01413123
cor(Tech_Category$Profit,Tech_Category$Discount)
## [1] -0.2688532

The number of purchases on goods of the Technology category has a contrast correlation toward discounts, which means that discounts do not contribute to sales of Technology goods.

The profit of Technology is inversely proportional to discounts, meaning that discounts contribute negatively towards profit of the sales of Technology goods.

7 Conslusion

since PT. Elips started their company to sell Furniture, Office supplies and Technology goods in 2014 to 2017, there were 9994 transaction recorded, sold 37873 items. Buyer from this company derived in three segment, they were consumer, Corporate and Home Office. Based on all transaction transaction we can conclude that product categorty that oftenly brought was consumer goods followed by Corporate goods then Home office, where Office supply goods found highly consumed then followed by Furniture and Technology goods. Binders, paper and furnishing are the top three item consumed from office supply. Office supply buyer mostly came from Consumer . Shipping mode that usually used for each segment was standard class followed by second class, first class then same day. Maximum sales achieved per single transaction was 22638.4 and maximum quantity sold per single transaction was 14 but oftenly 3 items per transaction. Margin of product profit maximum per transaction was 8399.9, and the average was 28.65. Among those three category, technoloy given high margin, but due to high discount about 80% from technoloy caused a big loss about -6599.9.

8 Way Forward

  1. high discount should suitable for consumer goods for consumer segment buyer, and not suitable or will give loss if given to technology product. Because mostly profit comes from consumer goods and home office goods, altough given small margin but consumer items sold in big scale.
  2. Dicount will improve sales for Corporate segment and home office segment market, to support that prepare extra supply to accomodate high sales.