Hi !! Welcome to my LBB :)
In this LBB, I will use data Retail.csv
I hope you enjoy it !
This data is contains of sales record included the profit or loss from a retail company who sells Office Supplies, Furniture and Technology stuff. The buyers are coming from different segment such as Corporate, Home Office until individual consumer. This company also could provide the best option for your delivery which suitable with your needs.\
Note : Detail explanatory will be given at the end of content
Make sure our data placed within folder of our R project data.
data<-read.csv("data_input/retail.csv")Input data is DONE ! then let’s get started
head(data)dim(data)#> [1] 9994 15
names(data)#> [1] "Row.ID" "Order.ID" "Order.Date" "Ship.Date" "Ship.Mode"
#> [6] "Customer.ID" "Segment" "Product.ID" "Category" "Sub.Category"
#> [11] "Product.Name" "Sales" "Quantity" "Discount" "Profit"
From our inspection we can conclude :
* Data contains 9994 of rows and 15 of coloumns
* Column Description
Row.ID = Row NoOrder.ID = Unique ID orderOrder.Date = Order DateShip.Date = Ship DateShip.Mode = Ship ModeCustomer.ID = Unique ID customerSegment = Segmentation/category of CustomerProduct.ID = Unique ID productCategory = product categorySub.Category = product sub categoryProduct.Name = product nameSales = Total sales of product that is bought by
customerQuantity = Total product that is bought by
customerDiscount = Total discount that customer receivedProfit = Total profit for companyCheck data type for each column
str(data)#> 'data.frame': 9994 obs. of 15 variables:
#> $ Row.ID : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
#> $ Order.Date : chr "11/8/16" "11/8/16" "6/12/16" "10/11/15" ...
#> $ Ship.Date : chr "11/11/16" "11/11/16" "6/16/16" "10/18/15" ...
#> $ Ship.Mode : chr "Second Class" "Second Class" "Second Class" "Standard Class" ...
#> $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
#> $ Segment : chr "Consumer" "Consumer" "Corporate" "Consumer" ...
#> $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
#> $ Category : chr "Furniture" "Furniture" "Office Supplies" "Furniture" ...
#> $ Sub.Category: chr "Bookcases" "Chairs" "Labels" "Tables" ...
#> $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
#> $ Sales : num 262 731.9 14.6 957.6 22.4 ...
#> $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
#> $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
#> $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
From this result, we find some of data type not in the corect type. we need to convert it into corect type (data coertion)
data$Order.Date<- as.Date(data$Order.Date, "%m/%d/%y")
data$Ship.Date <- as.Date(data$Ship.Date, "%m/%d/%y")
data$Ship.Mode <- as.factor(data$Ship.Mode)
data$Segment <- as.factor(data$Segment)
data$Category <- as.factor(data$Category)
data$Sub.Category <- as.factor(data$Sub.Category)
str(data)#> 'data.frame': 9994 obs. of 15 variables:
#> $ Row.ID : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
#> $ Order.Date : Date, format: "2016-11-08" "2016-11-08" ...
#> $ Ship.Date : Date, format: "2016-11-11" "2016-11-11" ...
#> $ Ship.Mode : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
#> $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
#> $ Segment : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
#> $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
#> $ Category : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
#> $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
#> $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
#> $ Sales : num 262 731.9 14.6 957.6 22.4 ...
#> $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
#> $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
#> $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
Each of column already changed into desired data type
Check for missing value
anyNA(data)#> [1] FALSE
colSums(is.na(data))#> Row.ID Order.ID Order.Date Ship.Date Ship.Mode Customer.ID
#> 0 0 0 0 0 0
#> Segment Product.ID Category Sub.Category Product.Name Sales
#> 0 0 0 0 0 0
#> Quantity Discount Profit
#> 0 0 0
Good!! No missing value
Here is the brief summary of our data
summary(data)#> Row.ID Order.ID Order.Date Ship.Date
#> Min. : 1 Length:9994 Min. :2014-01-03 Min. :2014-01-07
#> 1st Qu.:2499 Class :character 1st Qu.:2015-05-23 1st Qu.:2015-05-27
#> Median :4998 Mode :character Median :2016-06-26 Median :2016-06-29
#> Mean :4998 Mean :2016-04-30 Mean :2016-05-03
#> 3rd Qu.:7496 3rd Qu.:2017-05-14 3rd Qu.:2017-05-18
#> Max. :9994 Max. :2017-12-30 Max. :2018-01-05
#>
#> Ship.Mode Customer.ID Segment Product.ID
#> First Class :1538 Length:9994 Consumer :5191 Length:9994
#> Same Day : 543 Class :character Corporate :3020 Class :character
#> Second Class :1945 Mode :character Home Office:1783 Mode :character
#> Standard Class:5968
#>
#>
#>
#> Category Sub.Category Product.Name
#> Furniture :2121 Binders :1523 Length:9994
#> Office Supplies:6026 Paper :1370 Class :character
#> Technology :1847 Furnishings: 957 Mode :character
#> Phones : 889
#> Storage : 846
#> Art : 796
#> (Other) :3613
#> Sales Quantity Discount Profit
#> Min. : 0.444 Min. : 1.00 Min. :0.0000 Min. :-6599.978
#> 1st Qu.: 17.280 1st Qu.: 2.00 1st Qu.:0.0000 1st Qu.: 1.729
#> Median : 54.490 Median : 3.00 Median :0.2000 Median : 8.666
#> Mean : 229.858 Mean : 3.79 Mean :0.1562 Mean : 28.657
#> 3rd Qu.: 209.940 3rd Qu.: 5.00 3rd Qu.:0.2000 3rd Qu.: 29.364
#> Max. :22638.480 Max. :14.00 Max. :0.8000 Max. : 8399.976
#>
Summary Insight
Standart Class was the most popular one for shipping
option and Same day delivery was the lowest oneConsumer segment and
Home Office segment was the lowestOffice Supply category was the best seller compare to
othersOther stuff was the most favourite one
then followed by Binders and paper. Art is the
lowestoutliers.Check Outlier
boxplot(data$Sales)
From result above, we now know for sure that the dataset has
outlier
1. Which category has the highest sales number? How much profit?
xtabs(formula=Sales~Category, data = data)#> Category
#> Furniture Office Supplies Technology
#> 741999.8 719047.0 836154.0
xtabs(formula=Profit~Category, data = data)#> Category
#> Furniture Office Supplies Technology
#> 18451.27 122490.80 145454.95
Answer : Category with highest sales number is
Technology and the profit is 145454.95
2. Which segment buying the most Technology suppies?
tech <- data[data$Category=="Office Supplies",]
prop.table(table(tech$Segment))*100#>
#> Consumer Corporate Home Office
#> 51.89180 30.20246 17.90574
Answer : Consumer Segment with over 51%
3. How much is the highest discount given to the customer?
summary(data$Discount)*100#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.00 0.00 20.00 15.62 20.00 80.00
Answer : The highest discount is 80%
4. what category gives the highest discount?
highest.discount<-data[data$Discount>=0.8,]
summary(highest.discount$Category)#> Furniture Office Supplies Technology
#> 0 300 0
Answer : The Office Supplies gives the biggest
discount 80% for 300 transaction
5. What combination of segment and category gives the lowest profit?
agregasi_profit_category <-aggregate(x = Profit~Segment+Category, data = data, FUN = "sum")
agregasi_profit_category[order(agregasi_profit_category$Profit,decreasing=F),]Answer : The Home Office and Furniture
gives the lowest profit
6. Which transaction generated highest loss?
summary(data$Profit)#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -6599.978 1.729 8.666 28.657 29.364 8399.976
The lowest profit is -6599.978
data[data$Profit==-6599.978,]Answer : happened on 2016 Nov 25th, was selling 5 items of machines with discount 70%
7. Which category generated highest loss transaction?
highest.loss<-data[data$Profit<0,]
xtabs(Profit~ Category, highest.loss)#> Category
#> Furniture Office Supplies Technology
#> -60936.11 -56615.26 -38579.92
Answer : Category Furniture generates highest loss
with 60936.11
8. Which category generated most profittable transaction?
highest.profit<-data[data$Profit>0,]
xtabs(Profit~ Category, highest.profit)#> Category
#> Furniture Office Supplies Technology
#> 79387.38 179106.06 184034.87
Answer : Category Technology generates highest
profittable transaction with profit is 184034.87
9. How much total of profit order from each segment and category, and which is the highest?
xtabs(Profit~Segment+Category,data=data)#> Category
#> Segment Furniture Office Supplies Technology
#> Consumer 6991.079 56330.321 70797.810
#> Corporate 7584.816 40227.320 44166.998
#> Home Office 3875.378 25933.160 30490.141
plot(xtabs(Profit~Segment+Category,data))
Answer : Based on result above: segment
Consumer with
category Technology make the highest quantity order with
sum 70797.810
9. What is the median of profit for every segment and category?
aggregate(Profit~Segment+Category,data,median)10. Which category has the biggest no discount transaction?
no.discount<- data[data$Discount==0,]
aggregate(Discount~Category,no.discount,length)Answer : Based on result above: category
Office Supplies has 3129 transaction with no
discount
This company selling 3 main categories : Furniture,
Office supplies and Technology. From these
categories, Technology generates most profits with
145454.95 from all transaction and Furniture generates
highest loss with 60936.11 from its loss transaction.
Furniture also gives least profit from all segment of
customer.
Furniture category OR spend more
(give more discount or send more for marketing budget) to this category
to make this category more profittable. Try to approach
Home Office segment.Office Supplies generates sizable loss (-56615.26),
but it can handles because the profit is great too (179106.06, almost
same with technology). Please be aware of this, because this category
has the biggest number of transaction and over 50% has no discount, but
still generated that much loss.Technology is the best category of all. It
generates highest profit, even some transaction looks weird (with the
biggest loss), and the number of transaction is low.