Overview

This dataset is the sales record from retail company (pict from google). Sales were recorded from 2014 to 2017. This retail company sells Office Supplies, Furniture and Technology stuff with three kind of consumer segmentaton: Individual Consumer, Corporate and Home Office. I am as a Market Researcher would like to provide some recommendation for this retail company to increase profit.

Data Preparation

Input Data

make sure the data in the same folder with R project

retail <- read.csv("retail.csv")

Data Inspection

head(retail)
tail(retail)
dim(retail)
## [1] 9994   15
names(retail)
##  [1] "Row.ID"       "Order.ID"     "Order.Date"   "Ship.Date"    "Ship.Mode"   
##  [6] "Customer.ID"  "Segment"      "Product.ID"   "Category"     "Sub.Category"
## [11] "Product.Name" "Sales"        "Quantity"     "Discount"     "Profit"

The data have 9994 observations with 15 variables. The variables are:

  • Row.ID : Number of Row
  • Order.ID : ID of order
  • Order.Date : Date of order
  • Ship.Date : Date of shipping
  • Ship.Mode : Type of shipment
  • Ship.Mode : ID of customer
  • Segment : Customers segmentation
  • Product.ID : ID of product
  • Category : 3 kind of category “Furniture”,“Office Supplies”,“Technology”
  • Sub.category: More specific categories
  • Product.Name: Name of product
  • Sales : How much earning from each sale
  • Quantity : Quantity of item sold
  • Discount : How much Discount was given for each sale
  • Profit : How much can a company earn from each sale

Data Cleansing

check type of data for each variables

str(retail)
## 'data.frame':    9994 obs. of  15 variables:
##  $ Row.ID      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Order.ID    : chr  "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
##  $ Order.Date  : chr  "11/8/16" "11/8/16" "6/12/16" "10/11/15" ...
##  $ Ship.Date   : chr  "11/11/16" "11/11/16" "6/16/16" "10/18/15" ...
##  $ Ship.Mode   : chr  "Second Class" "Second Class" "Second Class" "Standard Class" ...
##  $ Customer.ID : chr  "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
##  $ Segment     : chr  "Consumer" "Consumer" "Corporate" "Consumer" ...
##  $ Product.ID  : chr  "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
##  $ Category    : chr  "Furniture" "Furniture" "Office Supplies" "Furniture" ...
##  $ Sub.Category: chr  "Bookcases" "Chairs" "Labels" "Tables" ...
##  $ Product.Name: chr  "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
##  $ Sales       : num  262 731.9 14.6 957.6 22.4 ...
##  $ Quantity    : int  2 3 2 5 2 7 4 6 3 5 ...
##  $ Discount    : num  0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
##  $ Profit      : num  41.91 219.58 6.87 -383.03 2.52 ...

From this result, some of data type not in the correct type. we need change the data type so the data suitable for analysis purposes

retail$Order.Date <- as.Date(retail$Order.Date, format = "%m/%d/%y")
retail$Ship.Date <- as.Date(retail$Ship.Date, format = "%m/%d/%y")
retail$Ship.Mode <- as.factor(retail$Ship.Mode)
retail$Category <- as.factor(retail$Category)
retail$Segment <- as.factor(retail$Segment)
retail$Ship.Mode <- as.factor(retail$Ship.Mode)
retail$Sub.Category <- as.factor(retail$Sub.Category)

str(retail)
## 'data.frame':    9994 obs. of  15 variables:
##  $ Row.ID      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Order.ID    : chr  "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
##  $ Order.Date  : Date, format: "2016-11-08" "2016-11-08" ...
##  $ Ship.Date   : Date, format: "2016-11-11" "2016-11-11" ...
##  $ Ship.Mode   : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
##  $ Customer.ID : chr  "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
##  $ Segment     : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
##  $ Product.ID  : chr  "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
##  $ Category    : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
##  $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
##  $ Product.Name: chr  "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
##  $ Sales       : num  262 731.9 14.6 957.6 22.4 ...
##  $ Quantity    : int  2 3 2 5 2 7 4 6 3 5 ...
##  $ Discount    : num  0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
##  $ Profit      : num  41.91 219.58 6.87 -383.03 2.52 ...

Check missing values

colSums(is.na(retail))
##       Row.ID     Order.ID   Order.Date    Ship.Date    Ship.Mode  Customer.ID 
##            0            0            0            0            0            0 
##      Segment   Product.ID     Category Sub.Category Product.Name        Sales 
##            0            0            0            0            0            0 
##     Quantity     Discount       Profit 
##            0            0            0

Great! The data don’t have missing value.

Then, subset the first column because we don’t need the information.

retail <- retail [, -c(1)]

Add column for year of order

library(lubridate)
retail$year_order <- year(retail$Order.Date)
head(retail)

Now, the dataset is ready to be processed and analyzed

Data Explanation

summary(retail)
##    Order.ID           Order.Date           Ship.Date         
##  Length:9994        Min.   :2014-01-03   Min.   :2014-01-07  
##  Class :character   1st Qu.:2015-05-23   1st Qu.:2015-05-27  
##  Mode  :character   Median :2016-06-26   Median :2016-06-29  
##                     Mean   :2016-04-30   Mean   :2016-05-03  
##                     3rd Qu.:2017-05-14   3rd Qu.:2017-05-18  
##                     Max.   :2017-12-30   Max.   :2018-01-05  
##                                                              
##           Ship.Mode    Customer.ID               Segment      Product.ID       
##  First Class   :1538   Length:9994        Consumer   :5191   Length:9994       
##  Same Day      : 543   Class :character   Corporate  :3020   Class :character  
##  Second Class  :1945   Mode  :character   Home Office:1783   Mode  :character  
##  Standard Class:5968                                                           
##                                                                                
##                                                                                
##                                                                                
##             Category         Sub.Category  Product.Name      
##  Furniture      :2121   Binders    :1523   Length:9994       
##  Office Supplies:6026   Paper      :1370   Class :character  
##  Technology     :1847   Furnishings: 957   Mode  :character  
##                         Phones     : 889                     
##                         Storage    : 846                     
##                         Art        : 796                     
##                         (Other)    :3613                     
##      Sales              Quantity        Discount          Profit         
##  Min.   :    0.444   Min.   : 1.00   Min.   :0.0000   Min.   :-6599.978  
##  1st Qu.:   17.280   1st Qu.: 2.00   1st Qu.:0.0000   1st Qu.:    1.729  
##  Median :   54.490   Median : 3.00   Median :0.2000   Median :    8.666  
##  Mean   :  229.858   Mean   : 3.79   Mean   :0.1562   Mean   :   28.657  
##  3rd Qu.:  209.940   3rd Qu.: 5.00   3rd Qu.:0.2000   3rd Qu.:   29.364  
##  Max.   :22638.480   Max.   :14.00   Max.   :0.8000   Max.   : 8399.976  
##                                                                          
##    year_order  
##  Min.   :2014  
##  1st Qu.:2015  
##  Median :2016  
##  Mean   :2016  
##  3rd Qu.:2017  
##  Max.   :2017  
## 

Summary:

  • The first order in 2014
  • The most of customer chose standard class for mode of shipping
  • The most buyers coming from individual customer segment
  • Office supplies was the best seller category
  • In Sub category, Other stuff was the most favorite one then followed by binders and paper.
  • Maximum sales value at 22638.480 and minimum sales value at 0.444 with average sales value at 229.858
  • The buyers purchases the most order for 14 pieces and at least 1 piece
  • The maximum discount that can be given by retail company was 80%
  • The average of profit was 28.657 with highest profit at 8399.976 and ever had loss at 6599.978

Check the outlier in sales

boxplot(retail$Sales)

Check the outlier in profit

boxplot(retail$Profit)

From boxplot, we find posibilities for the outliers. But, in my opinion we should not discard this outlier data because we can assumption that these outliers data can represents a success and a failure in sales record in this retail company. So, the process may continue.

Data Manipulation

  1. Profit in 2014 - 2018
profit_year <- aggregate(formula = Profit ~ year_order , data = retail , FUN = sum)
profit_year[order(profit_year$Profit , decreasing = F) ,]

Profit of retail company have increased every year

  1. Total Profit within 4 years
sum(retail$Profit)
## [1] 286397

Total Profit within 4 years 286397

  1. Total Quantity sold within 4 years
sum(retail$Quantity)
## [1] 37873

From 2014 to 2017 the retail company was successful selling the product 37873 items

  1. Total quantity for each category and segment
xtabs(Quantity ~ Category + Segment, retail)
##                  Segment
## Category          Consumer Corporate Home Office
##   Furniture           4166      2495        1367
##   Office Supplies    11758      7018        4130
##   Technology          3597      2095        1247

The most sold items were office supplies purchased by the consumer segment, totaling 11758 items

  1. five most popular subcategories
retail_agg <- aggregate(formula = Quantity ~ Sub.Category, data = retail, FUN = sum)
retail_agg[order(retail_agg$Quantity, decreasing = T) ,]

Binders, Paper, Furnishings, Phones, Storage

  1. three most popular subcategories within the Office supplies category
office <- retail[retail$Category == "Office Supplies" ,]
office_agg <- aggregate(formula = Quantity ~ Sub.Category, data = office, FUN = sum)
top3_office <- office_agg[order(office_agg$Quantity, decreasing = T) ,]
head(top3_office , 3)

Binders, Paper, Storage

  1. Total sales for each category
xtabs(Sales ~ Category, retail)
## Category
##       Furniture Office Supplies      Technology 
##        741999.8        719047.0        836154.0

The highest sales was techonolgy

  1. Total Quantity loss order
loss <- retail[retail$Profit <= 0 ,]
table(loss$Category)
## 
##       Furniture Office Supplies      Technology 
##             747             915             274

Office supplies was the category that most often have losses

  1. Loss in 2014 - 2017
xtabs(Profit ~ year_order , loss)
## year_order
##      2014      2015      2016      2017 
## -31892.77 -32529.39 -37872.93 -53836.19

The biggest loss happened in 2017

  1. How much discount given when the retail company losses
table(loss$Discount)
## 
##    0  0.1 0.15  0.2  0.3 0.32  0.4 0.45  0.5  0.6  0.7  0.8 
##   30    4   17  524  221   27  180   11   66  138  418  300

Even when not giving discounts, retail companies have ever losses

  1. Highest Loss in transaction
retail[retail$Profit == -6599.978 ,]

happened in December, 2nd 2016, retail company was selling 5 items of Cubify CubeX 3D Printer Double Head Print and give discount 70%

  1. Highest Profit in transaction
retail[retail$Profit == 8399.976 ,]

happened October, 2nd 2016, retail company was selling 5 items of Canon imageCLASS 2200 Advanced Copier and Purchased by corporate

  1. Highest sales in transaction
retail[retail$Sales == 22638.480 ,]

The highest sales occurred on 23 March 2014 when home office segment purchased of Cisco TelePresence System EX90 Videoconferencing Unit. But at the same time, the retail company suffered a loss of -1811.078

  1. Total Profit in Category and Segment
xtabs(Profit ~ Category + Segment , retail)
##                  Segment
## Category           Consumer Corporate Home Office
##   Furniture        6991.079  7584.816    3875.378
##   Office Supplies 56330.321 40227.320   25933.160
##   Technology      70797.810 44166.998   30490.141

The highest profit 70797.810 were technology purchased by the consumer segment and the lowest profit 3875.378 were furniture purchased by home office

  1. Ship mode for each segment
table(retail$Ship.Mode , retail$Segment)
##                 
##                  Consumer Corporate Home Office
##   First Class         769       485         284
##   Same Day            317       114         112
##   Second Class       1020       609         316
##   Standard Class     3085      1812        1071

Most of buyer was choosing Standard Class for shipping options

  1. Profit based on Ship Mode + segment
xtabs(Profit ~ Segment + Ship.Mode , retail)
##              Ship.Mode
## Segment       First Class  Same Day Second Class Standard Class
##   Consumer      21374.044  9874.205    24946.911      77924.049
##   Corporate     14464.472  1818.142    18225.713      57470.807
##   Home Office   13131.324  4199.412    14274.011      28693.932

The highest profit with standard class shipping mode and the lowest profit with same day shipping mode

Explanation and Recommendation

In four years, from 2014 to 2017, retail companies was successful to sell 37873 items with total profit of 286397. The highest profit 93439.27 occurred in 2017. Buyers are divided into three segments, namely individual consumers, home office and corporate. While the categories of goods sold are also divided into three, office supplies such as binders, paper, etc. furniture like tables, chairs, etc and technology like phones, copiers, etc. It was noted that the biggest buyers came from individual consumers who bought the office supplies category. The top three most-selling sub categories are binders, paper, and furnishing.

The highest sales and profits were achieved by the technology category, but the highest loss also came from technology. The highest loss occurred in 2016 but within 4 years the highest total loss occurred in 2017. The company also gave several discounts to customers, and the biggest discount was 80%. After further analysis, even though the retail company did not provide any discount at all, the retail company had also 0 profit or loss.

For shipping options, customers can choose 4 types: First Class, Same day, Second class, and Standard class. Based on data, from all segment, most of them are choosing Standard Class as their shipping option.

Recommendation

  • To avoid excessive losses, the selling price must be calculated correctly. don’t just give a discount. The fact is that discounts can attract a lot of customers. But, maybe for now should look for other strategies besides giving discounts like promos on social media, distributing brochures and so on.
  • For now, maybe the retail company only can provide a maximum discount of 50% for certain goods or purchases in large quantities.
  • Need to increase sales in the home office and corporate segments. maybe it could be increasing the types of sub-categories or by collaborating with the home office and corporate.
  • Collaborating with shipping companies to get maximum profit.