A template on how data analysis work

1 Intro

1.1 Data Overview

Explain the data in short paragraph

This data is contains of sales record included the profit or loss from a retail company (called PT. retail). PT.retail is a global retail company who sells Office Supplies, Furniture and Technology stuff.
The buyers are coming from different segment such as Corporate, Home Office until individual consumer. This company also could provide the best option for your delivery which suitable with your needs.

1.2 Problem Statement & Business Question

List severall business questions to be answered using the data

  • Which category gives the lowest sales number? how much profit?
  • Which segment buying the most office suppies?
  • How much total of quantity order from each segment and category, and which is the highest?
  • Median of quantity for every segment and category?
  • How much Total quantity order since PT. retail start selling the product?
  • How much profit in 4 years?
  • Category which generate the highest profit?
  • Which transation has the highest Loss in which transaction?
  • Which one has the highest Profit in Segment?
  • What is the highest sales number based on category ?
  • What is the highest profit based on Ship Mode + segment?
  • Which cateogry has the highest quantity of seller

2 Import Data

Make sure our data placed in the same folder our R project data.

2.1 Inspect Data

Check the structure and type of data

## 'data.frame':    9994 obs. of  15 variables:
##  $ Row.ID      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Order.ID    : Factor w/ 5009 levels "CA-2014-100006",..: 2501 2501 2297 4373 4373 202 202 202 202 202 ...
##  $ Order.Date  : Factor w/ 1237 levels "1/1/17","1/10/14",..: 305 305 836 94 94 922 922 922 922 922 ...
##  $ Ship.Date   : Factor w/ 1334 levels "1/1/15","1/1/16",..: 220 220 907 129 129 897 897 897 897 897 ...
##  $ Ship.Mode   : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
##  $ Customer.ID : Factor w/ 793 levels "AA-10315","AA-10375",..: 144 144 240 706 706 89 89 89 89 89 ...
##  $ Segment     : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
##  $ Product.ID  : Factor w/ 1862 levels "FUR-BO-10000112",..: 13 56 947 320 1317 186 563 1762 795 438 ...
##  $ Category    : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
##  $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
##  $ Product.Name: Factor w/ 1850 levels "\"While you Were Out\" Message Book, One Form per Page",..: 387 833 1440 367 574 570 1137 1099 535 295 ...
##  $ Sales       : num  262 731.9 14.6 957.6 22.4 ...
##  $ Quantity    : int  2 3 2 5 2 7 4 6 3 5 ...
##  $ Discount    : num  0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
##  $ Profit      : num  41.91 219.58 6.87 -383.03 2.52 ...

From our inspection we can conclude :
* retail data contain 9994 of rows and 15 of columns
* From this result, we find some of data type not in the corect type. we need to convert it into corect type (data coertion)

2.2 Data Cleansing

Change any variable that has improper data type

## 'data.frame':    9994 obs. of  15 variables:
##  $ Row.ID      : chr  "1" "2" "3" "4" ...
##  $ Order.ID    : chr  "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
##  $ Order.Date  : Factor w/ 1237 levels "1/1/17","1/10/14",..: 305 305 836 94 94 922 922 922 922 922 ...
##  $ Ship.Date   : Factor w/ 1334 levels "1/1/15","1/1/16",..: 220 220 907 129 129 897 897 897 897 897 ...
##  $ Ship.Mode   : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
##  $ Customer.ID : chr  "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
##  $ Segment     : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
##  $ Product.ID  : chr  "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
##  $ Category    : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
##  $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
##  $ Product.Name: chr  "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
##  $ Sales       : num  262 731.9 14.6 957.6 22.4 ...
##  $ Quantity    : int  2 3 2 5 2 7 4 6 3 5 ...
##  $ Discount    : num  0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
##  $ Profit      : num  41.91 219.58 6.87 -383.03 2.52 ...

Each of column already changed into desired data type

Do subset to delete some column (1&2 because we dont need the informations). then save it into ‘retail’ variable

Now, retail dataset is ready to be processed and analyzed

3 Data Summary

Let’s summarise the data

##     Order.Date      Ship.Date             Ship.Mode    Customer.ID       
##  9/5/16  :  38   12/16/15:  35   First Class   :1538   Length:9994       
##  9/2/17  :  36   9/26/17 :  34   Same Day      : 543   Class :character  
##  11/10/16:  35   11/21/17:  32   Second Class  :1945   Mode  :character  
##  12/1/17 :  34   12/6/17 :  32   Standard Class:5968                     
##  12/2/17 :  34   12/12/17:  30                                           
##  12/9/17 :  33   9/15/17 :  30                                           
##  (Other) :9784   (Other) :9801                                           
##         Segment      Product.ID                   Category   
##  Consumer   :5191   Length:9994        Furniture      :2121  
##  Corporate  :3020   Class :character   Office Supplies:6026  
##  Home Office:1783   Mode  :character   Technology     :1847  
##                                                              
##                                                              
##                                                              
##                                                              
##       Sub.Category  Product.Name           Sales              Quantity    
##  Binders    :1523   Length:9994        Min.   :    0.444   Min.   : 1.00  
##  Paper      :1370   Class :character   1st Qu.:   17.280   1st Qu.: 2.00  
##  Furnishings: 957   Mode  :character   Median :   54.490   Median : 3.00  
##  Phones     : 889                      Mean   :  229.858   Mean   : 3.79  
##  Storage    : 846                      3rd Qu.:  209.940   3rd Qu.: 5.00  
##  Art        : 796                      Max.   :22638.480   Max.   :14.00  
##  (Other)    :3613                                                         
##     Discount          Profit         
##  Min.   :0.0000   Min.   :-6599.978  
##  1st Qu.:0.0000   1st Qu.:    1.729  
##  Median :0.2000   Median :    8.666  
##  Mean   :0.1562   Mean   :   28.657  
##  3rd Qu.:0.2000   3rd Qu.:   29.364  
##  Max.   :0.8000   Max.   : 8399.976  
## 

Summary :

  • Standart Class was the most popular one for shipping option and Same day delivery was the lowest one
  • PT. retail has the most buyers coming from cunsumer segment and home office segment was the lowest
  • Office Supply category was the best seller compare to others
  • In Sub category, Other stuff was the most favourite one then followed by Binders and paper. Art is the lowest
  • PT. retail gained the average sales at 229.858; with max sales value at 22638.480 and minumum sales value at 0.444
  • Max quantity order was 14 within 4 years but average at 3.79
  • PT. retail max amount of discount was 80%, but mean number of discount around 15%
  • PT. retail ever had loss at 6599 and gained 8399 as their profit with average profit at 28.657

4 Data Transformation

  1. Which category gives the lowest sales number? how much profit?

Answer : Lowest sales comes from segment ‘consumer’, office supplies, and got loss 1.11

  1. Which segment buying the most office suppies?
## 
##    Consumer   Corporate Home Office 
##   0.5189180   0.3020246   0.1790574

Answer : consumer segment –> 51.89%

  1. How much total of quantity order from each segment and category, and which is the highest?
##              Category
## Segment       Furniture Office Supplies Technology
##   Consumer         4166           11758       3597
##   Corporate        2495            7018       2095
##   Home Office      1367            4130       1247

Answer : Based on result above: segmen consumer with category office supply make the highest quantity order with sum 11758

  1. Median of quantity for every segment and category?

Answer : Median of quantity = 3

  1. How much Total quantity order since PT. retail start selling the product?
## [1] 37873

Answer : 37873 items

  1. How much profit in 4 years?
## [1] 286397

Answer : 286397

  1. Category which generate the highest profit?
## Category
##       Furniture Office Supplies      Technology 
##        18451.27       122490.80       145454.95

Answer : Technology

  1. Highest Loss in which transaction?

Answer : happened in 2016, was selling 5 items of machines, discount 70%

  1. Highest Profit in Segment?
## Segment
##    Consumer   Corporate Home Office 
##   134119.21    91979.13    60298.68

Answer : Consumer was given the higest profit compare to other segment

  1. Sales number based on category ?
## Category
##       Furniture Office Supplies      Technology 
##        741999.8        719047.0        836154.0

Answer : Technology have the highest selling price compare to other

  1. Profit based on Ship Mode + segment?
##              Ship.Mode
## Segment       First Class  Same Day Second Class Standard Class
##   Consumer      21374.044  9874.205    24946.911      77924.049
##   Corporate     14464.472  1818.142    18225.713      57470.807
##   Home Office   13131.324  4199.412    14274.011      28693.932

Answer : Most of buyer was choosing Standar Class for shipping options. Among them consumer was the higest one

  1. The highest quantity seller based on Category
## Category
##       Furniture Office Supplies      Technology 
##            8028           22906            6939

Answer : Office Supply

5 Explanatory Text & Business Recomendation

In 4 years, since bigining of 2014-2017 PT. retail have sold total 37873 items with total profit 286397. This company selling 3 main category : Furniture, Office supplies and Technology. From this category, ‘Technology’ generate the highest profit with value 145454.95 compare to others but ironically, Highest loss with ammount 6599.987 happened in 2016 was coming from ‘Technology’ category as well. Apart from that, Technology category has the highest selling price.

Buyer who bought the product was coming from different segment, such as : Consumer, Corporate and Home Office. ‘Consumer’ was giving the higest ammount of profit compare to other with 134119.21. Shipping option consist of 4 types : First Class, Same day, Second class, and Standart class. Based on data, from all segment, most of them are choosing Standart Class as their shipping option. Consumer segment is the most highest segment who choosing Standart class mode.

Recomendations :

  1. From our calculation above, showing that Technology was selling with highest sales value but contradictory max loss was coming from Technology section as well. This is caused because discound given was too hight (80%) without think about the provit.Consider it, No discount given without making some profit!! Calculate twice before deciding how many percentage of discount would be given

  2. Need to increase market target at Corporate segment and home office segment, we may give them more discount with taking back the profit to company

  3. Make sure all category was safely supplied on the wareshouse specially for ffice supply items.Based on our calculation, buyers mostly buyed office supply category with 22906, it almost 2-3 times higer from other category