1 Explanation

1.1 Brief

Hi !! Welcome to my LBB :)
In this LBB, I will use data Retail.csv
I hope you enjoy it !

1.2 Data’s Point of View

This data is contains of sales record included the profit or loss from a retail company who sells Office Supplies, Furniture and Technology stuff. The buyers are coming from different segment such as Corporate, Home Office until individual consumer. This company also could provide the best option for your delivery which suitable with your needs.\

Note : Detail explanatory will be given at the end of content

2 Data

2.1 Input Data

Make sure our data placed within folder of our R project data.

data<-read.csv("data_input/retail.csv")

Input data is DONE ! then let’s get started

2.2 Data Inspection

head(data)
dim(data)
#> [1] 9994   15
names(data)
#>  [1] "Row.ID"       "Order.ID"     "Order.Date"   "Ship.Date"    "Ship.Mode"   
#>  [6] "Customer.ID"  "Segment"      "Product.ID"   "Category"     "Sub.Category"
#> [11] "Product.Name" "Sales"        "Quantity"     "Discount"     "Profit"

From our inspection we can conclude :
* Data contains 9994 of rows and 15 of coloumns
* Column Description

  • Row.ID = Row No
  • Order.ID = Unique ID order
  • Order.Date = Order Date
  • Ship.Date = Ship Date
  • Ship.Mode = Ship Mode
  • Customer.ID = Unique ID customer
  • Segment = Segmentation/category of Customer
  • Product.ID = Unique ID product
  • Category = product category
  • Sub.Category = product sub category
  • Product.Name = product name
  • Sales = Total sales of product that is bought by customer
  • Quantity = Total product that is bought by customer
  • Discount = Total discount that customer received
  • Profit = Total profit for company

2.3 Data Cleansing & Coertions

Check data type for each column

str(data)
#> 'data.frame':    9994 obs. of  15 variables:
#>  $ Row.ID      : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ Order.ID    : chr  "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
#>  $ Order.Date  : chr  "11/8/16" "11/8/16" "6/12/16" "10/11/15" ...
#>  $ Ship.Date   : chr  "11/11/16" "11/11/16" "6/16/16" "10/18/15" ...
#>  $ Ship.Mode   : chr  "Second Class" "Second Class" "Second Class" "Standard Class" ...
#>  $ Customer.ID : chr  "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
#>  $ Segment     : chr  "Consumer" "Consumer" "Corporate" "Consumer" ...
#>  $ Product.ID  : chr  "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
#>  $ Category    : chr  "Furniture" "Furniture" "Office Supplies" "Furniture" ...
#>  $ Sub.Category: chr  "Bookcases" "Chairs" "Labels" "Tables" ...
#>  $ Product.Name: chr  "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
#>  $ Sales       : num  262 731.9 14.6 957.6 22.4 ...
#>  $ Quantity    : int  2 3 2 5 2 7 4 6 3 5 ...
#>  $ Discount    : num  0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
#>  $ Profit      : num  41.91 219.58 6.87 -383.03 2.52 ...

From this result, we find some of data type not in the corect type. we need to convert it into corect type (data coertion)

data$Order.Date<- as.Date(data$Order.Date, "%m/%d/%y")
data$Ship.Date <- as.Date(data$Ship.Date, "%m/%d/%y")   
data$Ship.Mode <- as.factor(data$Ship.Mode)
data$Segment <- as.factor(data$Segment)
data$Category <- as.factor(data$Category)
data$Sub.Category <- as.factor(data$Sub.Category)

str(data)
#> 'data.frame':    9994 obs. of  15 variables:
#>  $ Row.ID      : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ Order.ID    : chr  "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
#>  $ Order.Date  : Date, format: "2016-11-08" "2016-11-08" ...
#>  $ Ship.Date   : Date, format: "2016-11-11" "2016-11-11" ...
#>  $ Ship.Mode   : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
#>  $ Customer.ID : chr  "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
#>  $ Segment     : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
#>  $ Product.ID  : chr  "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
#>  $ Category    : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
#>  $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
#>  $ Product.Name: chr  "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
#>  $ Sales       : num  262 731.9 14.6 957.6 22.4 ...
#>  $ Quantity    : int  2 3 2 5 2 7 4 6 3 5 ...
#>  $ Discount    : num  0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
#>  $ Profit      : num  41.91 219.58 6.87 -383.03 2.52 ...

Each of column already changed into desired data type

Check for missing value

anyNA(data)
#> [1] FALSE
colSums(is.na(data))
#>       Row.ID     Order.ID   Order.Date    Ship.Date    Ship.Mode  Customer.ID 
#>            0            0            0            0            0            0 
#>      Segment   Product.ID     Category Sub.Category Product.Name        Sales 
#>            0            0            0            0            0            0 
#>     Quantity     Discount       Profit 
#>            0            0            0

Good!! No missing value

2.4 Data Summary

Here is the brief summary of our data

summary(data)
#>      Row.ID       Order.ID           Order.Date           Ship.Date         
#>  Min.   :   1   Length:9994        Min.   :2014-01-03   Min.   :2014-01-07  
#>  1st Qu.:2499   Class :character   1st Qu.:2015-05-23   1st Qu.:2015-05-27  
#>  Median :4998   Mode  :character   Median :2016-06-26   Median :2016-06-29  
#>  Mean   :4998                      Mean   :2016-04-30   Mean   :2016-05-03  
#>  3rd Qu.:7496                      3rd Qu.:2017-05-14   3rd Qu.:2017-05-18  
#>  Max.   :9994                      Max.   :2017-12-30   Max.   :2018-01-05  
#>                                                                             
#>           Ship.Mode    Customer.ID               Segment      Product.ID       
#>  First Class   :1538   Length:9994        Consumer   :5191   Length:9994       
#>  Same Day      : 543   Class :character   Corporate  :3020   Class :character  
#>  Second Class  :1945   Mode  :character   Home Office:1783   Mode  :character  
#>  Standard Class:5968                                                           
#>                                                                                
#>                                                                                
#>                                                                                
#>             Category         Sub.Category  Product.Name      
#>  Furniture      :2121   Binders    :1523   Length:9994       
#>  Office Supplies:6026   Paper      :1370   Class :character  
#>  Technology     :1847   Furnishings: 957   Mode  :character  
#>                         Phones     : 889                     
#>                         Storage    : 846                     
#>                         Art        : 796                     
#>                         (Other)    :3613                     
#>      Sales              Quantity        Discount          Profit         
#>  Min.   :    0.444   Min.   : 1.00   Min.   :0.0000   Min.   :-6599.978  
#>  1st Qu.:   17.280   1st Qu.: 2.00   1st Qu.:0.0000   1st Qu.:    1.729  
#>  Median :   54.490   Median : 3.00   Median :0.2000   Median :    8.666  
#>  Mean   :  229.858   Mean   : 3.79   Mean   :0.1562   Mean   :   28.657  
#>  3rd Qu.:  209.940   3rd Qu.: 5.00   3rd Qu.:0.2000   3rd Qu.:   29.364  
#>  Max.   :22638.480   Max.   :14.00   Max.   :0.8000   Max.   : 8399.976  
#> 

Summary Insight

  • First order occured in Jan 2014
  • Standart Class was the most popular one for shipping option and Same day delivery was the lowest one
  • Most buyers coming from Consumer segment and Home Office segment was the lowest
  • Office Supply category was the best seller compare to others
  • In Sub category, Other stuff was the most favourite one then followed by Binders and paper. Art is the lowest
  • Average sales at 229.858; with max sales value at 22638.480 and minumum sales value at 0.444
  • Max quantity order was 14, but average at 3.79
  • Max amount of discount was 80%, but mean number of discount around 15%
  • Max profit is 8399 and minimum profit is -6599.978 (loss) with average profit at 28.657
  • 3rd quartile value of sales is less than mean of sales. The data is suspected of having outliers.

Check Outlier

boxplot(data$Sales)

From result above, we now know for sure that the dataset has outlier

3 Data Manipulation & Transformation

1. Which category has the highest sales number? How much profit?

xtabs(formula=Sales~Category, data = data)
#> Category
#>       Furniture Office Supplies      Technology 
#>        741999.8        719047.0        836154.0
xtabs(formula=Profit~Category, data = data)
#> Category
#>       Furniture Office Supplies      Technology 
#>        18451.27       122490.80       145454.95

Answer : Category with highest sales number is Technology and the profit is 145454.95

2. Which segment buying the most Technology suppies?

tech <- data[data$Category=="Office Supplies",]
prop.table(table(tech$Segment))*100
#> 
#>    Consumer   Corporate Home Office 
#>    51.89180    30.20246    17.90574

Answer : Consumer Segment with over 51%

3. How much is the highest discount given to the customer?

summary(data$Discount)*100
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    0.00    0.00   20.00   15.62   20.00   80.00

Answer : The highest discount is 80%

4. what category gives the highest discount?

highest.discount<-data[data$Discount>=0.8,] 
summary(highest.discount$Category)
#>       Furniture Office Supplies      Technology 
#>               0             300               0

Answer : The Office Supplies gives the biggest discount 80% for 300 transaction

5. What combination of segment and category gives the lowest profit?

agregasi_profit_category <-aggregate(x = Profit~Segment+Category, data = data, FUN = "sum")
agregasi_profit_category[order(agregasi_profit_category$Profit,decreasing=F),]

Answer : The Home Office and Furniture gives the lowest profit

6. Which transaction generated highest loss?

summary(data$Profit)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#> -6599.978     1.729     8.666    28.657    29.364  8399.976

The lowest profit is -6599.978

data[data$Profit==-6599.978,]

Answer : happened on 2016 Nov 25th, was selling 5 items of machines with discount 70%

7. Which category generated highest loss transaction?

highest.loss<-data[data$Profit<0,] 
xtabs(Profit~ Category, highest.loss)
#> Category
#>       Furniture Office Supplies      Technology 
#>       -60936.11       -56615.26       -38579.92

Answer : Category Furniture generates highest loss with 60936.11

8. Which category generated most profittable transaction?

highest.profit<-data[data$Profit>0,] 
xtabs(Profit~ Category, highest.profit)
#> Category
#>       Furniture Office Supplies      Technology 
#>        79387.38       179106.06       184034.87

Answer : Category Technology generates highest profittable transaction with profit is 184034.87

9. How much total of profit order from each segment and category, and which is the highest?

xtabs(Profit~Segment+Category,data=data)
#>              Category
#> Segment       Furniture Office Supplies Technology
#>   Consumer     6991.079       56330.321  70797.810
#>   Corporate    7584.816       40227.320  44166.998
#>   Home Office  3875.378       25933.160  30490.141
plot(xtabs(Profit~Segment+Category,data))

Answer : Based on result above: segment Consumer with category Technology make the highest quantity order with sum 70797.810

9. What is the median of profit for every segment and category?

aggregate(Profit~Segment+Category,data,median)

10. Which category has the biggest no discount transaction?

no.discount<- data[data$Discount==0,]
aggregate(Discount~Category,no.discount,length)

Answer : Based on result above: category Office Supplies has 3129 transaction with no discount

4 Explanatory Text & Business Recomendation

4.1 Explanatory Text

This company selling 3 main categories : Furniture, Office supplies and Technology. From these categories, Technology generates most profits with 145454.95 from all transaction and Furniture generates highest loss with 60936.11 from its loss transaction. Furniture also gives least profit from all segment of customer.

4.2 Recommendation

  1. From above business insight, the company must think about the possibility to close the Furniture category OR spend more (give more discount or send more for marketing budget) to this category to make this category more profittable. Try to approach Home Office segment.
  2. The Office Supplies generates sizable loss (-56615.26), but it can handles because the profit is great too (179106.06, almost same with technology). Please be aware of this, because this category has the biggest number of transaction and over 50% has no discount, but still generated that much loss.
  3. The Technology is the best category of all. It generates highest profit, even some transaction looks weird (with the biggest loss), and the number of transaction is low.