EXPLANATION

Hi…!welcome to my Rmd. in this LBB i will use previous data which is Retail.csv. i hope you enjoy it!

INPUT DATA

eceran <- read.csv("E:/Kerja/retail.csv")

input data is done! then let’s started

DATA INSPECTION

head(eceran)
##         Order.ID Order.Date Ship.Date      Ship.Mode Customer.ID   Segment
## 1 CA-2016-152156    11/8/16  11/11/16   Second Class    CG-12520  Consumer
## 2 CA-2016-152156    11/8/16  11/11/16   Second Class    CG-12520  Consumer
## 3 CA-2016-138688    6/12/16   6/16/16   Second Class    DV-13045 Corporate
## 4 US-2015-108966   10/11/15  10/18/15 Standard Class    SO-20335  Consumer
## 5 US-2015-108966   10/11/15  10/18/15 Standard Class    SO-20335  Consumer
## 6 CA-2014-115812     6/9/14   6/14/14 Standard Class    BH-11710  Consumer
##        Product.ID        Category Sub.Category
## 1 FUR-BO-10001798       Furniture    Bookcases
## 2 FUR-CH-10000454       Furniture       Chairs
## 3 OFF-LA-10000240 Office Supplies       Labels
## 4 FUR-TA-10000577       Furniture       Tables
## 5 OFF-ST-10000760 Office Supplies      Storage
## 6 FUR-FU-10001487       Furniture  Furnishings
##                                                       Product.Name    Sales
## 1                                Bush Somerset Collection Bookcase 261.9600
## 2      Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back 731.9400
## 3        Self-Adhesive Address Labels for Typewriters by Universal  14.6200
## 4                    Bretford CR4500 Series Slim Rectangular Table 957.5775
## 5                                   Eldon Fold 'N Roll Cart System  22.3680
## 6 Eldon Expressions Wood and Plastic Desk Accessories, Cherry Wood  48.8600
##   Quantity Discount    Profit
## 1        2     0.00   41.9136
## 2        3     0.00  219.5820
## 3        2     0.00    6.8714
## 4        5     0.45 -383.0310
## 5        2     0.20    2.5164
## 6        7     0.00   14.1694
tail(eceran)
##            Order.ID Order.Date Ship.Date      Ship.Mode Customer.ID   Segment
## 9989 CA-2017-163629   11/17/17  11/21/17 Standard Class    RA-19885 Corporate
## 9990 CA-2014-110422    1/21/14   1/23/14   Second Class    TB-21400  Consumer
## 9991 CA-2017-121258    2/26/17    3/3/17 Standard Class    DB-13060  Consumer
## 9992 CA-2017-121258    2/26/17    3/3/17 Standard Class    DB-13060  Consumer
## 9993 CA-2017-121258    2/26/17    3/3/17 Standard Class    DB-13060  Consumer
## 9994 CA-2017-119914     5/4/17    5/9/17   Second Class    CC-12220  Consumer
##           Product.ID        Category Sub.Category
## 9989 TEC-PH-10004006      Technology       Phones
## 9990 FUR-FU-10001889       Furniture  Furnishings
## 9991 FUR-FU-10000747       Furniture  Furnishings
## 9992 TEC-PH-10003645      Technology       Phones
## 9993 OFF-PA-10004041 Office Supplies        Paper
## 9994 OFF-AP-10002684 Office Supplies   Appliances
##                                                                   Product.Name
## 9989                                           Panasonic KX - TS880B Telephone
## 9990                                                    Ultra Door Pull Handle
## 9991                        Tenex B1-RE Series Chair Mats for Low Pile Carpets
## 9992                                                     Aastra 57i VoIP phone
## 9993                         It's Hot Message Books with Stickers, 2 3/4" x 5"
## 9994 Acco 7-Outlet Masterpiece Power Center, Wihtout Fax/Phone Line Protection
##        Sales Quantity Discount  Profit
## 9989 206.100        5      0.0 55.6470
## 9990  25.248        3      0.2  4.1028
## 9991  91.960        2      0.0 15.6332
## 9992 258.576        2      0.2 19.3932
## 9993  29.600        4      0.0 13.3200
## 9994 243.160        2      0.0 72.9480
dim(eceran)
## [1] 9994   14
names(eceran)
##  [1] "Order.ID"     "Order.Date"   "Ship.Date"    "Ship.Mode"    "Customer.ID" 
##  [6] "Segment"      "Product.ID"   "Category"     "Sub.Category" "Product.Name"
## [11] "Sales"        "Quantity"     "Discount"     "Profit"

from our inspection we can conclude: * toko eceran data contain 9994 of rows and 14 of columns * each of columns: “Row.ID”, “Order.ID”, “Order.Date”, “Ship.Date”, “Ship.Mode”, “Customer.ID”, “Segment”,“Product.ID”, “Category”, “Sub.Category”, “Product.Name”, “Sales”, “Quantity”, “Discount”, “Profit” * ## DATA CLEANSING

check data type for each columns

str(eceran)
## 'data.frame':    9994 obs. of  14 variables:
##  $ Order.ID    : chr  "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
##  $ Order.Date  : chr  "11/8/16" "11/8/16" "6/12/16" "10/11/15" ...
##  $ Ship.Date   : chr  "11/11/16" "11/11/16" "6/16/16" "10/18/15" ...
##  $ Ship.Mode   : chr  "Second Class" "Second Class" "Second Class" "Standard Class" ...
##  $ Customer.ID : chr  "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
##  $ Segment     : chr  "Consumer" "Consumer" "Corporate" "Consumer" ...
##  $ Product.ID  : chr  "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
##  $ Category    : chr  "Furniture" "Furniture" "Office Supplies" "Furniture" ...
##  $ Sub.Category: chr  "Bookcases" "Chairs" "Labels" "Tables" ...
##  $ Product.Name: chr  "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
##  $ Sales       : num  262 731.9 14.6 957.6 22.4 ...
##  $ Quantity    : int  2 3 2 5 2 7 4 6 3 5 ...
##  $ Discount    : num  0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
##  $ Profit      : num  41.91 219.58 6.87 -383.03 2.52 ...
eceran$Order.Date<- as.Date(eceran$Order.Date, "%m/%d/%y")
eceran$Ship.Date <- as.Date(eceran$Ship.Date, "%m/%d/%y")
eceran$Ship.Mode <- as.factor(eceran$Ship.Mode)
eceran$Segment<- as.factor(eceran$Segment)
eceran$Category <- as.factor(eceran$Category)
eceran$Sub.Category <- as.factor(eceran$Sub.Category)
str(eceran)
## 'data.frame':    9994 obs. of  14 variables:
##  $ Order.ID    : chr  "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
##  $ Order.Date  : Date, format: "2016-11-08" "2016-11-08" ...
##  $ Ship.Date   : Date, format: "2016-11-11" "2016-11-11" ...
##  $ Ship.Mode   : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
##  $ Customer.ID : chr  "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
##  $ Segment     : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
##  $ Product.ID  : chr  "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
##  $ Category    : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
##  $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
##  $ Product.Name: chr  "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
##  $ Sales       : num  262 731.9 14.6 957.6 22.4 ...
##  $ Quantity    : int  2 3 2 5 2 7 4 6 3 5 ...
##  $ Discount    : num  0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
##  $ Profit      : num  41.91 219.58 6.87 -383.03 2.52 ...

check for missing value

colSums(is.na(eceran))
##     Order.ID   Order.Date    Ship.Date    Ship.Mode  Customer.ID      Segment 
##            0            0            0            0            0            0 
##   Product.ID     Category Sub.Category Product.Name        Sales     Quantity 
##            0            0            0            0            0            0 
##     Discount       Profit 
##            0            0
anyNA(eceran)
## [1] FALSE

great! no missing value now toko eceran dataset is ready to be processed and analyzed

DATA EXPLANATION

summary(eceran)
##    Order.ID           Order.Date           Ship.Date         
##  Length:9994        Min.   :2014-01-03   Min.   :2014-01-07  
##  Class :character   1st Qu.:2015-05-23   1st Qu.:2015-05-27  
##  Mode  :character   Median :2016-06-26   Median :2016-06-29  
##                     Mean   :2016-04-30   Mean   :2016-05-03  
##                     3rd Qu.:2017-05-14   3rd Qu.:2017-05-18  
##                     Max.   :2017-12-30   Max.   :2018-01-05  
##                                                              
##           Ship.Mode    Customer.ID               Segment      Product.ID       
##  First Class   :1538   Length:9994        Consumer   :5191   Length:9994       
##  Same Day      : 543   Class :character   Corporate  :3020   Class :character  
##  Second Class  :1945   Mode  :character   Home Office:1783   Mode  :character  
##  Standard Class:5968                                                           
##                                                                                
##                                                                                
##                                                                                
##             Category         Sub.Category  Product.Name      
##  Furniture      :2121   Binders    :1523   Length:9994       
##  Office Supplies:6026   Paper      :1370   Class :character  
##  Technology     :1847   Furnishings: 957   Mode  :character  
##                         Phones     : 889                     
##                         Storage    : 846                     
##                         Art        : 796                     
##                         (Other)    :3613                     
##      Sales              Quantity        Discount          Profit         
##  Min.   :    0.444   Min.   : 1.00   Min.   :0.0000   Min.   :-6599.978  
##  1st Qu.:   17.280   1st Qu.: 2.00   1st Qu.:0.0000   1st Qu.:    1.729  
##  Median :   54.490   Median : 3.00   Median :0.2000   Median :    8.666  
##  Mean   :  229.858   Mean   : 3.79   Mean   :0.1562   Mean   :   28.657  
##  3rd Qu.:  209.940   3rd Qu.: 5.00   3rd Qu.:0.2000   3rd Qu.:   29.364  
##  Max.   :22638.480   Max.   :14.00   Max.   :0.8000   Max.   : 8399.976  
## 

Summary : 1. First order occured in Jan 2014 2. Standart Class was the most popular one for shipping option and Same day delivery was the lowest one 3. Toko Eceran has the most buyers coming from cunsumer segment and home office segment was the lowest 4. Office Supply category was the best seller compare to others 5. In Sub category, Other stuff was the most favourite one then followed by Binders and paper. Art is the lowest 6. Toko Ececran gained the average sales at 229.858; with max sales value at 22638.480 and minumum sales value at 0.444 7. Max quantity order was 14 within 4 years but average at 3.79 8. Toko Eceran max amount of discount was 80%, but mean number of discount around 15% 9. Toko Eceran ever had loss at 6599 and gained 8399 as their profit with average profit at 28.657

check the outlier within profit

aggregate(Profit~Segment,eceran,mean)
##       Segment   Profit
## 1    Consumer 25.83687
## 2   Corporate 30.45667
## 3 Home Office 33.81866
aggregate(Profit~Segment,eceran,var)
##       Segment   Profit
## 1    Consumer 58851.86
## 2   Corporate 53817.63
## 3 Home Office 45105.89
aggregate(Profit~Segment,eceran,sd)
##       Segment   Profit
## 1    Consumer 242.5940
## 2   Corporate 231.9863
## 3 Home Office 212.3815
boxplot(eceran$Profit)

DATA MANIPULATION AND TRANSFORMATION

  1. how much quantity is sold in each category?
aggregate(formula = Quantity~Category, data = eceran, FUN = sum)
##          Category Quantity
## 1       Furniture     8028
## 2 Office Supplies    22906
## 3      Technology     6939

furniture 8028 items, office supplies 22906 items, and technology 6939 items

  1. how many profit is generated from each customer segment?
aggregate(formula = Profit~Segment, data = eceran, FUN = mean)
##       Segment   Profit
## 1    Consumer 25.83687
## 2   Corporate 30.45667
## 3 Home Office 33.81866

from segmnet cunsomer we get 25.83687, corporate 30.45667, home office 33.81866

  1. Which category gives the lowest sales number? how much profit?
eceran[eceran$Sales == 0.444,]
##            Order.ID Order.Date  Ship.Date      Ship.Mode Customer.ID  Segment
## 4102 US-2017-102288 2017-06-19 2017-06-23 Standard Class    ZC-21910 Consumer
##           Product.ID        Category Sub.Category
## 4102 OFF-AP-10002906 Office Supplies   Appliances
##                                                                    Product.Name
## 4102 Hoover Replacement Belt for Commercial Guardsman Heavy-Duty Upright Vacuum
##      Sales Quantity Discount Profit
## 4102 0.444        1      0.8  -1.11

comes from segment consumer, office supplies

  1. How much Total quantity order since Toko eceran start selling the product?
sum(eceran$Quantity)
## [1] 37873

37873 items

  1. Category which generate the highest profit?
xtabs(Profit~ Category, eceran)
## Category
##       Furniture Office Supplies      Technology 
##        18451.27       122490.80       145454.95

technology