Hi…!welcome to my Rmd. in this LBB i will use previous data which is Retail.csv. i hope you enjoy it!
eceran <- read.csv("E:/Kerja/retail.csv")
input data is done! then let’s started
head(eceran)
## Order.ID Order.Date Ship.Date Ship.Mode Customer.ID Segment
## 1 CA-2016-152156 11/8/16 11/11/16 Second Class CG-12520 Consumer
## 2 CA-2016-152156 11/8/16 11/11/16 Second Class CG-12520 Consumer
## 3 CA-2016-138688 6/12/16 6/16/16 Second Class DV-13045 Corporate
## 4 US-2015-108966 10/11/15 10/18/15 Standard Class SO-20335 Consumer
## 5 US-2015-108966 10/11/15 10/18/15 Standard Class SO-20335 Consumer
## 6 CA-2014-115812 6/9/14 6/14/14 Standard Class BH-11710 Consumer
## Product.ID Category Sub.Category
## 1 FUR-BO-10001798 Furniture Bookcases
## 2 FUR-CH-10000454 Furniture Chairs
## 3 OFF-LA-10000240 Office Supplies Labels
## 4 FUR-TA-10000577 Furniture Tables
## 5 OFF-ST-10000760 Office Supplies Storage
## 6 FUR-FU-10001487 Furniture Furnishings
## Product.Name Sales
## 1 Bush Somerset Collection Bookcase 261.9600
## 2 Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back 731.9400
## 3 Self-Adhesive Address Labels for Typewriters by Universal 14.6200
## 4 Bretford CR4500 Series Slim Rectangular Table 957.5775
## 5 Eldon Fold 'N Roll Cart System 22.3680
## 6 Eldon Expressions Wood and Plastic Desk Accessories, Cherry Wood 48.8600
## Quantity Discount Profit
## 1 2 0.00 41.9136
## 2 3 0.00 219.5820
## 3 2 0.00 6.8714
## 4 5 0.45 -383.0310
## 5 2 0.20 2.5164
## 6 7 0.00 14.1694
tail(eceran)
## Order.ID Order.Date Ship.Date Ship.Mode Customer.ID Segment
## 9989 CA-2017-163629 11/17/17 11/21/17 Standard Class RA-19885 Corporate
## 9990 CA-2014-110422 1/21/14 1/23/14 Second Class TB-21400 Consumer
## 9991 CA-2017-121258 2/26/17 3/3/17 Standard Class DB-13060 Consumer
## 9992 CA-2017-121258 2/26/17 3/3/17 Standard Class DB-13060 Consumer
## 9993 CA-2017-121258 2/26/17 3/3/17 Standard Class DB-13060 Consumer
## 9994 CA-2017-119914 5/4/17 5/9/17 Second Class CC-12220 Consumer
## Product.ID Category Sub.Category
## 9989 TEC-PH-10004006 Technology Phones
## 9990 FUR-FU-10001889 Furniture Furnishings
## 9991 FUR-FU-10000747 Furniture Furnishings
## 9992 TEC-PH-10003645 Technology Phones
## 9993 OFF-PA-10004041 Office Supplies Paper
## 9994 OFF-AP-10002684 Office Supplies Appliances
## Product.Name
## 9989 Panasonic KX - TS880B Telephone
## 9990 Ultra Door Pull Handle
## 9991 Tenex B1-RE Series Chair Mats for Low Pile Carpets
## 9992 Aastra 57i VoIP phone
## 9993 It's Hot Message Books with Stickers, 2 3/4" x 5"
## 9994 Acco 7-Outlet Masterpiece Power Center, Wihtout Fax/Phone Line Protection
## Sales Quantity Discount Profit
## 9989 206.100 5 0.0 55.6470
## 9990 25.248 3 0.2 4.1028
## 9991 91.960 2 0.0 15.6332
## 9992 258.576 2 0.2 19.3932
## 9993 29.600 4 0.0 13.3200
## 9994 243.160 2 0.0 72.9480
dim(eceran)
## [1] 9994 14
names(eceran)
## [1] "Order.ID" "Order.Date" "Ship.Date" "Ship.Mode" "Customer.ID"
## [6] "Segment" "Product.ID" "Category" "Sub.Category" "Product.Name"
## [11] "Sales" "Quantity" "Discount" "Profit"
from our inspection we can conclude: * toko eceran data contain 9994 of rows and 14 of columns * each of columns: “Row.ID”, “Order.ID”, “Order.Date”, “Ship.Date”, “Ship.Mode”, “Customer.ID”, “Segment”,“Product.ID”, “Category”, “Sub.Category”, “Product.Name”, “Sales”, “Quantity”, “Discount”, “Profit” * ## DATA CLEANSING
check data type for each columns
str(eceran)
## 'data.frame': 9994 obs. of 14 variables:
## $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
## $ Order.Date : chr "11/8/16" "11/8/16" "6/12/16" "10/11/15" ...
## $ Ship.Date : chr "11/11/16" "11/11/16" "6/16/16" "10/18/15" ...
## $ Ship.Mode : chr "Second Class" "Second Class" "Second Class" "Standard Class" ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : chr "Consumer" "Consumer" "Corporate" "Consumer" ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : chr "Furniture" "Furniture" "Office Supplies" "Furniture" ...
## $ Sub.Category: chr "Bookcases" "Chairs" "Labels" "Tables" ...
## $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
eceran$Order.Date<- as.Date(eceran$Order.Date, "%m/%d/%y")
eceran$Ship.Date <- as.Date(eceran$Ship.Date, "%m/%d/%y")
eceran$Ship.Mode <- as.factor(eceran$Ship.Mode)
eceran$Segment<- as.factor(eceran$Segment)
eceran$Category <- as.factor(eceran$Category)
eceran$Sub.Category <- as.factor(eceran$Sub.Category)
str(eceran)
## 'data.frame': 9994 obs. of 14 variables:
## $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
## $ Order.Date : Date, format: "2016-11-08" "2016-11-08" ...
## $ Ship.Date : Date, format: "2016-11-11" "2016-11-11" ...
## $ Ship.Mode : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
## $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
## $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
check for missing value
colSums(is.na(eceran))
## Order.ID Order.Date Ship.Date Ship.Mode Customer.ID Segment
## 0 0 0 0 0 0
## Product.ID Category Sub.Category Product.Name Sales Quantity
## 0 0 0 0 0 0
## Discount Profit
## 0 0
anyNA(eceran)
## [1] FALSE
great! no missing value now toko eceran dataset is ready to be processed and analyzed
summary(eceran)
## Order.ID Order.Date Ship.Date
## Length:9994 Min. :2014-01-03 Min. :2014-01-07
## Class :character 1st Qu.:2015-05-23 1st Qu.:2015-05-27
## Mode :character Median :2016-06-26 Median :2016-06-29
## Mean :2016-04-30 Mean :2016-05-03
## 3rd Qu.:2017-05-14 3rd Qu.:2017-05-18
## Max. :2017-12-30 Max. :2018-01-05
##
## Ship.Mode Customer.ID Segment Product.ID
## First Class :1538 Length:9994 Consumer :5191 Length:9994
## Same Day : 543 Class :character Corporate :3020 Class :character
## Second Class :1945 Mode :character Home Office:1783 Mode :character
## Standard Class:5968
##
##
##
## Category Sub.Category Product.Name
## Furniture :2121 Binders :1523 Length:9994
## Office Supplies:6026 Paper :1370 Class :character
## Technology :1847 Furnishings: 957 Mode :character
## Phones : 889
## Storage : 846
## Art : 796
## (Other) :3613
## Sales Quantity Discount Profit
## Min. : 0.444 Min. : 1.00 Min. :0.0000 Min. :-6599.978
## 1st Qu.: 17.280 1st Qu.: 2.00 1st Qu.:0.0000 1st Qu.: 1.729
## Median : 54.490 Median : 3.00 Median :0.2000 Median : 8.666
## Mean : 229.858 Mean : 3.79 Mean :0.1562 Mean : 28.657
## 3rd Qu.: 209.940 3rd Qu.: 5.00 3rd Qu.:0.2000 3rd Qu.: 29.364
## Max. :22638.480 Max. :14.00 Max. :0.8000 Max. : 8399.976
##
Summary : 1. First order occured in Jan 2014 2. Standart Class was the most popular one for shipping option and Same day delivery was the lowest one 3. Toko Eceran has the most buyers coming from cunsumer segment and home office segment was the lowest 4. Office Supply category was the best seller compare to others 5. In Sub category, Other stuff was the most favourite one then followed by Binders and paper. Art is the lowest 6. Toko Ececran gained the average sales at 229.858; with max sales value at 22638.480 and minumum sales value at 0.444 7. Max quantity order was 14 within 4 years but average at 3.79 8. Toko Eceran max amount of discount was 80%, but mean number of discount around 15% 9. Toko Eceran ever had loss at 6599 and gained 8399 as their profit with average profit at 28.657
check the outlier within profit
aggregate(Profit~Segment,eceran,mean)
## Segment Profit
## 1 Consumer 25.83687
## 2 Corporate 30.45667
## 3 Home Office 33.81866
aggregate(Profit~Segment,eceran,var)
## Segment Profit
## 1 Consumer 58851.86
## 2 Corporate 53817.63
## 3 Home Office 45105.89
aggregate(Profit~Segment,eceran,sd)
## Segment Profit
## 1 Consumer 242.5940
## 2 Corporate 231.9863
## 3 Home Office 212.3815
boxplot(eceran$Profit)
aggregate(formula = Quantity~Category, data = eceran, FUN = sum)
## Category Quantity
## 1 Furniture 8028
## 2 Office Supplies 22906
## 3 Technology 6939
furniture 8028 items, office supplies 22906 items, and technology 6939 items
aggregate(formula = Profit~Segment, data = eceran, FUN = mean)
## Segment Profit
## 1 Consumer 25.83687
## 2 Corporate 30.45667
## 3 Home Office 33.81866
from segmnet cunsomer we get 25.83687, corporate 30.45667, home office 33.81866
eceran[eceran$Sales == 0.444,]
## Order.ID Order.Date Ship.Date Ship.Mode Customer.ID Segment
## 4102 US-2017-102288 2017-06-19 2017-06-23 Standard Class ZC-21910 Consumer
## Product.ID Category Sub.Category
## 4102 OFF-AP-10002906 Office Supplies Appliances
## Product.Name
## 4102 Hoover Replacement Belt for Commercial Guardsman Heavy-Duty Upright Vacuum
## Sales Quantity Discount Profit
## 4102 0.444 1 0.8 -1.11
comes from segment consumer, office supplies
sum(eceran$Quantity)
## [1] 37873
37873 items
xtabs(Profit~ Category, eceran)
## Category
## Furniture Office Supplies Technology
## 18451.27 122490.80 145454.95
technology