Hello, welcome to my Rmd!
I’ll be using Retail.csv data in this LBB
Retail.csv data is a description of the data of retail companies engaged in one-stop shopping for office supplies, home appliances, and lifestyle products
Let’s dig deeper into the data and name the object AN_hardware!
First, we can input the data
AN_hardware <- read.csv("data_input/retail.csv")Then, using the functions head() and tail(), we will see the top data and the last data
head(AN_hardware)tail(AN_hardware)Check dimension data:
dim(AN_hardware)## [1] 9994 15
Check names each columns:
names(AN_hardware)## [1] "Row.ID" "Order.ID" "Order.Date" "Ship.Date" "Ship.Mode"
## [6] "Customer.ID" "Segment" "Product.ID" "Category" "Sub.Category"
## [11] "Product.Name" "Sales" "Quantity" "Discount" "Profit"
From the inspection of the data, we can conclude:
- AN_hardware data contain 9994 of rows and 15 of coloumns
- Column name of the data : “Row.ID”, “Order.ID”, “Order.Date”, “Ship.Date”, “Ship.Mode”, “Customer.ID”, “Segment”,“Product.ID”, “Category”, “Sub.Category”, “Product.Name”, “Sales”, “Quantity”, “Discount”, “Profit”
Check data types:
str(AN_hardware)## 'data.frame': 9994 obs. of 15 variables:
## $ Row.ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
## $ Order.Date : chr "11/8/16" "11/8/16" "6/12/16" "10/11/15" ...
## $ Ship.Date : chr "11/11/16" "11/11/16" "6/16/16" "10/18/15" ...
## $ Ship.Mode : chr "Second Class" "Second Class" "Second Class" "Standard Class" ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : chr "Consumer" "Consumer" "Corporate" "Consumer" ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : chr "Furniture" "Furniture" "Office Supplies" "Furniture" ...
## $ Sub.Category: chr "Bookcases" "Chairs" "Labels" "Tables" ...
## $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
We can see that there are some data types that do not match.
Then we can change the data type first!
AN_hardware$Segment <- as.factor(AN_hardware$Segment)
AN_hardware$Ship.Mode <- as.factor(AN_hardware$Ship.Mode)
AN_hardware$Category <- as.factor(AN_hardware$Category)
AN_hardware$Sub.Category <- as.factor(AN_hardware$Sub.Category)
AN_hardware$Ship.Date <- as.Date(AN_hardware$Ship.Date, format = "%M/%d/%y")
AN_hardware$Order.Date <- as.Date(AN_hardware$Order.Date, format = "%M/%d/%y")
str(AN_hardware)## 'data.frame': 9994 obs. of 15 variables:
## $ Row.ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
## $ Order.Date : Date, format: "2016-01-08" "2016-01-08" ...
## $ Ship.Date : Date, format: "2016-01-11" "2016-01-11" ...
## $ Ship.Mode : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
## $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
## $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
The data types change was successful! We can continue to look for missing values.
Check missing value on all data:
anyNA(AN_hardware)## [1] FALSE
Check missing value each of columns:
colSums(is.na(AN_hardware))## Row.ID Order.ID Order.Date Ship.Date Ship.Mode Customer.ID
## 0 0 0 0 0 0
## Segment Product.ID Category Sub.Category Product.Name Sales
## 0 0 0 0 0 0
## Quantity Discount Profit
## 0 0 0
Good! No missing value.
Now, AN_Hardware dataset is ready to be processed and analyzed.
summary(AN_hardware)## Row.ID Order.ID Order.Date Ship.Date
## Min. : 1 Length:9994 Min. :2014-01-01 Min. :2014-01-01
## 1st Qu.:2499 Class :character 1st Qu.:2015-01-08 1st Qu.:2015-01-09
## Median :4998 Mode :character Median :2016-01-11 Median :2016-01-12
## Mean :4998 Mean :2015-10-06 Mean :2015-10-12
## 3rd Qu.:7496 3rd Qu.:2017-01-08 3rd Qu.:2017-01-08
## Max. :9994 Max. :2017-01-31 Max. :2018-01-05
##
## Ship.Mode Customer.ID Segment Product.ID
## First Class :1538 Length:9994 Consumer :5191 Length:9994
## Same Day : 543 Class :character Corporate :3020 Class :character
## Second Class :1945 Mode :character Home Office:1783 Mode :character
## Standard Class:5968
##
##
##
## Category Sub.Category Product.Name
## Furniture :2121 Binders :1523 Length:9994
## Office Supplies:6026 Paper :1370 Class :character
## Technology :1847 Furnishings: 957 Mode :character
## Phones : 889
## Storage : 846
## Art : 796
## (Other) :3613
## Sales Quantity Discount Profit
## Min. : 0.444 Min. : 1.00 Min. :0.0000 Min. :-6599.978
## 1st Qu.: 17.280 1st Qu.: 2.00 1st Qu.:0.0000 1st Qu.: 1.729
## Median : 54.490 Median : 3.00 Median :0.2000 Median : 8.666
## Mean : 229.858 Mean : 3.79 Mean :0.1562 Mean : 28.657
## 3rd Qu.: 209.940 3rd Qu.: 5.00 3rd Qu.:0.2000 3rd Qu.: 29.364
## Max. :22638.480 Max. :14.00 Max. :0.8000 Max. : 8399.976
##
Summary :
1. The data’s first order and shipment occurred on January 1, 2014.
2. Standart Class delivery was the most popular for shipping option and Same day delivery was the lowest for shipping option
3. AN_hardware has the most buyers from the consumer segment and home office segment was the lowest
4. In Category, office supplies was the best seller compared to other categories
5. Other stuff was the most popular Sub.Category, while Art was the least popular
6. AN_hardware achieved a maximum sales value of 22638.480, a minimum sales value of 0.444, and an average sales value of 229.858
7. Within four years, the maximum order quantity received by AN_hardware was 14, with an average of 3.79
8. The maximum amount of discount for AN hardware was 80%, but the average amount of discount was around 15.62%
9. AN_hardware has ever had a loss of 6599.978 and a profit of 8399.976, with an average profit of 28.657.
sum(AN_hardware$Quantity)## [1] 37873
Answer: AN_hardware has sold a total of 37.873 items.
sum(AN_hardware$Profit)## [1] 286397
Answer: AN_hardward has a total profit in 4 years of 286.397.
xtabs(formula = Profit ~ Category, data = AN_hardware)## Category
## Furniture Office Supplies Technology
## 18451.27 122490.80 145454.95
Answer: Technology category has a highest profit.
plot(x = AN_hardware$Category, y = AN_hardware$Profit)sd(AN_hardware$Profit )## [1] 234.2601
As a result, we discovered the possibility of outliers, based on calculations that the value of Standard Deviation is around 200, indicating that it is still within the tolerated range, allowing the process to continue.
xtabs(Profit~ Segment, AN_hardware)## Segment
## Consumer Corporate Home Office
## 134119.21 91979.13 60298.68
Answer : The consumer segment has the highest profit.
xtabs(formula = Sales ~ Segment + Category, data = AN_hardware)## Category
## Segment Furniture Office Supplies Technology
## Consumer 391049.3 363952.1 406399.9
## Corporate 229019.8 230676.5 246450.1
## Home Office 121930.7 124418.4 183304.0
Answer : In terms of sales, it shows that the most sales in the technology category with buyers from the consumer segment.
AN_hardware[AN_hardware$Profit==8399.976,]Answer : The highest profit on this transaction can be shown from the corporate segment buying 5 items copiers at sales normal prices without discount.
AN_hardware[AN_hardware$Profit==-6599.978,]Answer : The lowest profit on this transaction can be shown that the consumer buys a machines as much as 5 items at a low price and gets a 70% discount.
catsegment <- aggregate (formula = Profit ~ Category + Segment,
data=AN_hardware,
FUN=mean )
catsegment[order(catsegment$Profit, decreasing= TRUE),]Answer: Technology has the highest average profit and in the home office segment.
sales_catsegment<- aggregate (formula = Sales ~ Category + Segment,
data=AN_hardware,
FUN=mean )
sales_catsegment[order(sales_catsegment$Sales, decreasing= TRUE),]Answer: Technology has the highest average sales and in the home office segment.
ship.mode_cat_subcat<- aggregate (formula = Profit ~ Category + Segment + Ship.Mode,
data=AN_hardware,
FUN=sum )
ship.mode_cat_subcat[order(ship.mode_cat_subcat$Profit, decreasing= TRUE),]Answer: Most buyers choose Standard Class for shipping options. And consumer was the higest using it.
In this study case, AN_hardware is a retail company engaged in one-stop shopping business for selling office supplies, home appliances and lifestyle products. The company sells 3 main categories : Furniture, Office Supplies and Technology.
In 4 years, AN_hardware has sold a total of 37.873 items with a total profit of 286.397. The highest profit in the category with a value of 145.454,95 generated from the ‘Technology’ category compared to others, but the highest loss with a total of 6.599,987 that occurred in 2016 also came from the ‘Technology’ category, And the most sales in the ‘Technology’ category with buyers from the ‘Consumer’ segment. In addition, other segments are buying the products, such as Corporate and Home Office. Based on the data, the ‘Home Office’ segment has the highest average profit and the highest average sales when compared to other segments.
There are four shipping options: First Class, Same Day, Second Class, and Standard Class. The majority of them choose Standard Class as their delivery option within all segments. The consumer segment is the highest segment that purchases technology products in the Standard class.
Based on calculations, it shows that Technology sells with the highest sales value and gets the highest profit. However, the highest losses were also recorded in the Technology category.
The highest loss that comes from the Technology category is because the discount given is too high (70%), so you should calculate it first before deciding what percentage of the discount will be given in order to still get the advantage.
Can increase the target market in the Corporate segment and the Home Office segment, by giving them more discounts and considering the benefits for the company.