Hi !! Welcome to my Rmd :) in this LBB i will use previous data which is retail.csv
This data is contains of sales record included the profit or loss from a retail company (i named it PT. Elips). PT.Elips is a global retail company who sells Office Supplies, Furniture and Technology stuff. The buyers are coming from different segment such as Corporate, Home Office until individual consumer. This company also could provide the best option for your delivery which suitable with your needs.
Note : Detail explanatory will be given at the end of content
retail <- read.csv("retail.csv")
My Input Data is done! Now let’s check our data!
head(retail)
## Order.ID Order.Date Ship.Date Ship.Mode Customer.ID Segment
## 1 CA-2016-152156 11/8/16 11/11/16 Second Class CG-12520 Consumer
## 2 CA-2016-152156 11/8/16 11/11/16 Second Class CG-12520 Consumer
## 3 CA-2016-138688 6/12/16 6/16/16 Second Class DV-13045 Corporate
## 4 US-2015-108966 10/11/15 10/18/15 Standard Class SO-20335 Consumer
## 5 US-2015-108966 10/11/15 10/18/15 Standard Class SO-20335 Consumer
## 6 CA-2014-115812 6/9/14 6/14/14 Standard Class BH-11710 Consumer
## Product.ID Category Sub.Category
## 1 FUR-BO-10001798 Furniture Bookcases
## 2 FUR-CH-10000454 Furniture Chairs
## 3 OFF-LA-10000240 Office Supplies Labels
## 4 FUR-TA-10000577 Furniture Tables
## 5 OFF-ST-10000760 Office Supplies Storage
## 6 FUR-FU-10001487 Furniture Furnishings
## Product.Name Sales
## 1 Bush Somerset Collection Bookcase 261.9600
## 2 Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back 731.9400
## 3 Self-Adhesive Address Labels for Typewriters by Universal 14.6200
## 4 Bretford CR4500 Series Slim Rectangular Table 957.5775
## 5 Eldon Fold 'N Roll Cart System 22.3680
## 6 Eldon Expressions Wood and Plastic Desk Accessories, Cherry Wood 48.8600
## Quantity Discount Profit
## 1 2 0.00 41.9136
## 2 3 0.00 219.5820
## 3 2 0.00 6.8714
## 4 5 0.45 -383.0310
## 5 2 0.20 2.5164
## 6 7 0.00 14.1694
tail(retail)
## Order.ID Order.Date Ship.Date Ship.Mode Customer.ID Segment
## 9989 CA-2017-163629 11/17/17 11/21/17 Standard Class RA-19885 Corporate
## 9990 CA-2014-110422 1/21/14 1/23/14 Second Class TB-21400 Consumer
## 9991 CA-2017-121258 2/26/17 3/3/17 Standard Class DB-13060 Consumer
## 9992 CA-2017-121258 2/26/17 3/3/17 Standard Class DB-13060 Consumer
## 9993 CA-2017-121258 2/26/17 3/3/17 Standard Class DB-13060 Consumer
## 9994 CA-2017-119914 5/4/17 5/9/17 Second Class CC-12220 Consumer
## Product.ID Category Sub.Category
## 9989 TEC-PH-10004006 Technology Phones
## 9990 FUR-FU-10001889 Furniture Furnishings
## 9991 FUR-FU-10000747 Furniture Furnishings
## 9992 TEC-PH-10003645 Technology Phones
## 9993 OFF-PA-10004041 Office Supplies Paper
## 9994 OFF-AP-10002684 Office Supplies Appliances
## Product.Name
## 9989 Panasonic KX - TS880B Telephone
## 9990 Ultra Door Pull Handle
## 9991 Tenex B1-RE Series Chair Mats for Low Pile Carpets
## 9992 Aastra 57i VoIP phone
## 9993 It's Hot Message Books with Stickers, 2 3/4" x 5"
## 9994 Acco 7-Outlet Masterpiece Power Center, Wihtout Fax/Phone Line Protection
## Sales Quantity Discount Profit
## 9989 206.100 5 0.0 55.6470
## 9990 25.248 3 0.2 4.1028
## 9991 91.960 2 0.0 15.6332
## 9992 258.576 2 0.2 19.3932
## 9993 29.600 4 0.0 13.3200
## 9994 243.160 2 0.0 72.9480
dim(retail)
## [1] 9994 14
names(retail)
## [1] "Order.ID" "Order.Date" "Ship.Date" "Ship.Mode" "Customer.ID"
## [6] "Segment" "Product.ID" "Category" "Sub.Category" "Product.Name"
## [11] "Sales" "Quantity" "Discount" "Profit"
From our inspection we can conclude : * retail data contain 9994 of rows and 14 of coloumns * Each of column name : “Order.ID”, “Order.Date”, “Ship.Date”, “Ship.Mode”, “Customer.ID”, “Segment”,“Product.ID”, “Category”, “Sub.Category”, “Product.Name”, “Sales”, “Quantity”, “Discount”, “Profit”
Check data type for each column
str(retail)
## 'data.frame': 9994 obs. of 14 variables:
## $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
## $ Order.Date : chr "11/8/16" "11/8/16" "6/12/16" "10/11/15" ...
## $ Ship.Date : chr "11/11/16" "11/11/16" "6/16/16" "10/18/15" ...
## $ Ship.Mode : chr "Second Class" "Second Class" "Second Class" "Standard Class" ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : chr "Consumer" "Consumer" "Corporate" "Consumer" ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : chr "Furniture" "Furniture" "Office Supplies" "Furniture" ...
## $ Sub.Category: chr "Bookcases" "Chairs" "Labels" "Tables" ...
## $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
From this result, we find some of data type not in the corect type. we need to convert it into corect type (data coertion)
retail$Order.ID<- as.character (retail$Order.ID)
retail$Order.Date<- as.Date(retail$Order.Date, "%m/%d/%y")
retail$Ship.Date <- as.Date(retail$Ship.Date, "%m/%d/%y")
retail$Customer.ID<-as.character(retail$Customer.ID)
retail$Product.ID<- as.character(retail$Product.ID)
retail$Product.Name<-as.character(retail$Product.Name)
retail$Segment <- as.factor(retail$Segment)
retail$Ship.Mode <- as.factor(retail$Ship.Mode)
retail$Category <- as.factor(retail$Category)
retail$Sub.Category <- as.factor(retail$Sub.Category)
str(retail)
## 'data.frame': 9994 obs. of 14 variables:
## $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
## $ Order.Date : Date, format: "2016-11-08" "2016-11-08" ...
## $ Ship.Date : Date, format: "2016-11-11" "2016-11-11" ...
## $ Ship.Mode : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
## $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
## $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
Each of column has already changed into desired data type
Cek for missing value
colSums(is.na(retail))
## Order.ID Order.Date Ship.Date Ship.Mode Customer.ID Segment
## 0 0 0 0 0 0
## Product.ID Category Sub.Category Product.Name Sales Quantity
## 0 0 0 0 0 0
## Discount Profit
## 0 0
alright!! there are no missing values!!
Do subset do delete some column (1&2 because we dont need the informations). then save it into ‘Elips’ variable
retail_clean <- retail[,c(2:14)]
head(retail_clean)
## Order.Date Ship.Date Ship.Mode Customer.ID Segment Product.ID
## 1 2016-11-08 2016-11-11 Second Class CG-12520 Consumer FUR-BO-10001798
## 2 2016-11-08 2016-11-11 Second Class CG-12520 Consumer FUR-CH-10000454
## 3 2016-06-12 2016-06-16 Second Class DV-13045 Corporate OFF-LA-10000240
## 4 2015-10-11 2015-10-18 Standard Class SO-20335 Consumer FUR-TA-10000577
## 5 2015-10-11 2015-10-18 Standard Class SO-20335 Consumer OFF-ST-10000760
## 6 2014-06-09 2014-06-14 Standard Class BH-11710 Consumer FUR-FU-10001487
## Category Sub.Category
## 1 Furniture Bookcases
## 2 Furniture Chairs
## 3 Office Supplies Labels
## 4 Furniture Tables
## 5 Office Supplies Storage
## 6 Furniture Furnishings
## Product.Name Sales
## 1 Bush Somerset Collection Bookcase 261.9600
## 2 Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back 731.9400
## 3 Self-Adhesive Address Labels for Typewriters by Universal 14.6200
## 4 Bretford CR4500 Series Slim Rectangular Table 957.5775
## 5 Eldon Fold 'N Roll Cart System 22.3680
## 6 Eldon Expressions Wood and Plastic Desk Accessories, Cherry Wood 48.8600
## Quantity Discount Profit
## 1 2 0.00 41.9136
## 2 3 0.00 219.5820
## 3 2 0.00 6.8714
## 4 5 0.45 -383.0310
## 5 2 0.20 2.5164
## 6 7 0.00 14.1694
Now, retail dataset is ready to be processed and analyzed!
So here is the brief explanation of our data
summary(retail_clean)
## Order.Date Ship.Date Ship.Mode
## Min. :2014-01-03 Min. :2014-01-07 First Class :1538
## 1st Qu.:2015-05-23 1st Qu.:2015-05-27 Same Day : 543
## Median :2016-06-26 Median :2016-06-29 Second Class :1945
## Mean :2016-04-30 Mean :2016-05-03 Standard Class:5968
## 3rd Qu.:2017-05-14 3rd Qu.:2017-05-18
## Max. :2017-12-30 Max. :2018-01-05
##
## Customer.ID Segment Product.ID
## Length:9994 Consumer :5191 Length:9994
## Class :character Corporate :3020 Class :character
## Mode :character Home Office:1783 Mode :character
##
##
##
##
## Category Sub.Category Product.Name
## Furniture :2121 Binders :1523 Length:9994
## Office Supplies:6026 Paper :1370 Class :character
## Technology :1847 Furnishings: 957 Mode :character
## Phones : 889
## Storage : 846
## Art : 796
## (Other) :3613
## Sales Quantity Discount Profit
## Min. : 0.444 Min. : 1.00 Min. :0.0000 Min. :-6599.978
## 1st Qu.: 17.280 1st Qu.: 2.00 1st Qu.:0.0000 1st Qu.: 1.729
## Median : 54.490 Median : 3.00 Median :0.2000 Median : 8.666
## Mean : 229.858 Mean : 3.79 Mean :0.1562 Mean : 28.657
## 3rd Qu.: 209.940 3rd Qu.: 5.00 3rd Qu.:0.2000 3rd Qu.: 29.364
## Max. :22638.480 Max. :14.00 Max. :0.8000 Max. : 8399.976
##
Summary :
Check the Outlier within profit
aggregate(Profit~Segment,retail_clean,mean)
## Segment Profit
## 1 Consumer 25.83687
## 2 Corporate 30.45667
## 3 Home Office 33.81866
aggregate(Profit~Segment,retail_clean,var)
## Segment Profit
## 1 Consumer 58851.86
## 2 Corporate 53817.63
## 3 Home Office 45105.89
aggregate(Profit~Segment,retail_clean,sd)
## Segment Profit
## 1 Consumer 242.5940
## 2 Corporate 231.9863
## 3 Home Office 212.3815
boxplot(retail_clean$Profit)
From result above, we find posibilities for the outliers, but from our calculation, Sd value is around 200 and its still can be tolerated, so the process may continue.
retail_clean[retail_clean$Sales == 0.444,]
## Order.Date Ship.Date Ship.Mode Customer.ID Segment Product.ID
## 4102 2017-06-19 2017-06-23 Standard Class ZC-21910 Consumer OFF-AP-10002906
## Category Sub.Category
## 4102 Office Supplies Appliances
## Product.Name
## 4102 Hoover Replacement Belt for Commercial Guardsman Heavy-Duty Upright Vacuum
## Sales Quantity Discount Profit
## 4102 0.444 1 0.8 -1.11
Answer : Sales 0.444 comes from segment ‘consumer’, office supplies, and got loss 1.11
offsup <- retail_clean[retail_clean$Category=="Office Supplies",]
round(prop.table(table(offsup$Segment))*100,2)
##
## Consumer Corporate Home Office
## 51.89 30.20 17.91
Answer : consumer segment –> 51.89%
disc <- retail_clean[retail_clean$Discount == 0.8,]
(sort(table(disc$Order.Date),decreasing = T)[1])
## 2014-07-26
## 4
Answer : 80% discount was on July 2014, happened 4 times
disc_buyer <- disc[disc$Order.Date=="2014-07-26",]
(table(disc_buyer$Segment))
##
## Consumer Corporate Home Office
## 4 0 0
Answer : all buyer was coming from consumer segment
xtabs(Quantity~Segment+Category,retail_clean)
## Category
## Segment Furniture Office Supplies Technology
## Consumer 4166 11758 3597
## Corporate 2495 7018 2095
## Home Office 1367 4130 1247
plot(xtabs(Quantity~Segment+Category,retail_clean))
heatmap(xtabs(Quantity~Segment+Category, retail_clean), cexRow = 0.8, cexCol = 0.8, scale = "column", Colv = NA, Rowv = NA)
Answer : Based on result above: segmen consumer with category office supply make the highest quantity order with sum 11758
aggregate(Quantity~Segment+Category,retail_clean,median)
## Segment Category Quantity
## 1 Consumer Furniture 3
## 2 Corporate Furniture 3
## 3 Home Office Furniture 3
## 4 Consumer Office Supplies 3
## 5 Corporate Office Supplies 3
## 6 Home Office Office Supplies 3
## 7 Consumer Technology 3
## 8 Corporate Technology 3
## 9 Home Office Technology 3
Answer : Median of quantity = 3
sum(retail_clean$Quantity)
## [1] 37873
Answer : 37873 items
sum(retail_clean$Profit)
## [1] 286397
Answer : 286397
DisP<- retail_clean[retail_clean$Discount== 0,]
knitr::kable(table(DisP$Discount))
| Var1 | Freq |
|---|---|
| 0 | 4798 |
DisP_new <- retail_clean[retail_clean$Discount>0 & retail_clean$Discount<=0.8,]
knitr::kable(table(DisP_new$Discount))
| Var1 | Freq |
|---|---|
| 0.1 | 94 |
| 0.15 | 52 |
| 0.2 | 3657 |
| 0.3 | 227 |
| 0.32 | 27 |
| 0.4 | 206 |
| 0.45 | 11 |
| 0.5 | 66 |
| 0.6 | 138 |
| 0.7 | 418 |
| 0.8 | 300 |
Answer : based on this data, half ammount of data was seeling without any discount given
xtabs(Profit~ Category, retail_clean)
## Category
## Furniture Office Supplies Technology
## 18451.27 122490.80 145454.95
Answer : Technology
retail_clean[retail_clean$Profit==-6599.978,]
## Order.Date Ship.Date Ship.Mode Customer.ID Segment Product.ID
## 7773 2016-11-25 2016-12-02 Standard Class CS-12505 Consumer TEC-MA-10000418
## Category Sub.Category Product.Name Sales
## 7773 Technology Machines Cubify CubeX 3D Printer Double Head Print 4499.985
## Quantity Discount Profit
## 7773 5 0.7 -6599.978
Answer : happened in 2016, was selling 5 items of machines, discount 70%
xtabs(Profit~ Segment, retail_clean)
## Segment
## Consumer Corporate Home Office
## 134119.21 91979.13 60298.68
Answer : Consumer was given the higest profit compare to other segment
xtabs(Sales~ Category, retail_clean)
## Category
## Furniture Office Supplies Technology
## 741999.8 719047.0 836154.0
Answer : Technology have the highest selling price compare to other
xtabs(Profit~Segment+Ship.Mode,retail_clean)
## Ship.Mode
## Segment First Class Same Day Second Class Standard Class
## Consumer 21374.044 9874.205 24946.911 77924.049
## Corporate 14464.472 1818.142 18225.713 57470.807
## Home Office 13131.324 4199.412 14274.011 28693.932
plot(xtabs(Profit~Segment+Ship.Mode,retail_clean))
Answer : Most of buyer was choosing Standar Class for shipping options. Among them consumer was the higest one
xtabs(Quantity~Category,retail_clean)
## Category
## Furniture Office Supplies Technology
## 8028 22906 6939
Answer : Office Supply
In 4 years, since bigining of 2014-2017 retail. csv have sold total 37873 items with total profit 286397. This company selling 3 main category : Furniture, Office supplies and Technology. From this category, ‘Technology’ generate the highest profit with value 145454.95 compare to others but ironically, Highest loss with ammount 6599.987 happened in 2016 was coming from ‘Technology’ category as well. Apart from that, Technology categoru has the highest selling price.
Buyer who buying the product was coming from different segment, such as : Consumer, Corporate and Home Office. ‘Consumer’ was giving the higest ammount of profit compare to other with 134119.21. Shipping option consist of 4 types : First Class, Same day, Second class, and Standart class. Based on data, from all segment, most of them are choosing Standart Class as their shipping option. Consumer segment is the most highest segment who choosing Standart class mode.
Recomendations :
From our calculation above, showing that Technology was selling with highest sales value but contradictory max loss was coming from Technology section as well. This is caused because discound given was too hight (80%) without think about the provit.Consider it, No discount given without making some profit!! Calculate twice before deciding how many percentage of discount would be given Need to increase market target at Corporate segment and home office segment, we may give them more discount with taking back the profit to company Make sure all category was safely supplied on the wareshouse specially for ffice supply items.Based on our calculation, buyers mostly buyed office supply category with 22906, it almost 2-3 times higer from other category