Background

Hello, welcome to my Rmd!
I’ll be using Retail.csv data in this LBB
Retail.csv data is a description of the data of retail companies engaged in one-stop shopping for office supplies, home appliances, and lifestyle products
Let’s dig deeper into the data and name the object AN_hardware!

Import Data

First, we can input the data

AN_hardware <- read.csv("data_input/retail.csv")

Data Inspection

Then, using the functions head() and tail(), we will see the top data and the last data

head(AN_hardware)
tail(AN_hardware)

Check dimension data:

dim(AN_hardware)
## [1] 9994   15

Check names each columns:

names(AN_hardware)
##  [1] "Row.ID"       "Order.ID"     "Order.Date"   "Ship.Date"    "Ship.Mode"   
##  [6] "Customer.ID"  "Segment"      "Product.ID"   "Category"     "Sub.Category"
## [11] "Product.Name" "Sales"        "Quantity"     "Discount"     "Profit"

From the inspection of the data, we can conclude:
- AN_hardware data contain 9994 of rows and 15 of coloumns
- Column name of the data : “Row.ID”, “Order.ID”, “Order.Date”, “Ship.Date”, “Ship.Mode”, “Customer.ID”, “Segment”,“Product.ID”, “Category”, “Sub.Category”, “Product.Name”, “Sales”, “Quantity”, “Discount”, “Profit”

Data Cleansing

Check data types:

str(AN_hardware)
## 'data.frame':    9994 obs. of  15 variables:
##  $ Row.ID      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Order.ID    : chr  "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
##  $ Order.Date  : chr  "11/8/16" "11/8/16" "6/12/16" "10/11/15" ...
##  $ Ship.Date   : chr  "11/11/16" "11/11/16" "6/16/16" "10/18/15" ...
##  $ Ship.Mode   : chr  "Second Class" "Second Class" "Second Class" "Standard Class" ...
##  $ Customer.ID : chr  "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
##  $ Segment     : chr  "Consumer" "Consumer" "Corporate" "Consumer" ...
##  $ Product.ID  : chr  "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
##  $ Category    : chr  "Furniture" "Furniture" "Office Supplies" "Furniture" ...
##  $ Sub.Category: chr  "Bookcases" "Chairs" "Labels" "Tables" ...
##  $ Product.Name: chr  "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
##  $ Sales       : num  262 731.9 14.6 957.6 22.4 ...
##  $ Quantity    : int  2 3 2 5 2 7 4 6 3 5 ...
##  $ Discount    : num  0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
##  $ Profit      : num  41.91 219.58 6.87 -383.03 2.52 ...

We can see that there are some data types that do not match.
Then we can change the data type first!

AN_hardware$Segment <- as.factor(AN_hardware$Segment)
AN_hardware$Ship.Mode <- as.factor(AN_hardware$Ship.Mode)
AN_hardware$Category <- as.factor(AN_hardware$Category)
AN_hardware$Sub.Category <- as.factor(AN_hardware$Sub.Category)
AN_hardware$Ship.Date <- as.Date(AN_hardware$Ship.Date, format = "%M/%d/%y")
AN_hardware$Order.Date <- as.Date(AN_hardware$Order.Date, format = "%M/%d/%y")

str(AN_hardware)
## 'data.frame':    9994 obs. of  15 variables:
##  $ Row.ID      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Order.ID    : chr  "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
##  $ Order.Date  : Date, format: "2016-01-08" "2016-01-08" ...
##  $ Ship.Date   : Date, format: "2016-01-11" "2016-01-11" ...
##  $ Ship.Mode   : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
##  $ Customer.ID : chr  "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
##  $ Segment     : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
##  $ Product.ID  : chr  "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
##  $ Category    : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
##  $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
##  $ Product.Name: chr  "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
##  $ Sales       : num  262 731.9 14.6 957.6 22.4 ...
##  $ Quantity    : int  2 3 2 5 2 7 4 6 3 5 ...
##  $ Discount    : num  0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
##  $ Profit      : num  41.91 219.58 6.87 -383.03 2.52 ...

The data types change was successful! We can continue to look for missing values.

Check missing value on all data:

anyNA(AN_hardware)
## [1] FALSE

Check missing value each of columns:

colSums(is.na(AN_hardware))
##       Row.ID     Order.ID   Order.Date    Ship.Date    Ship.Mode  Customer.ID 
##            0            0            0            0            0            0 
##      Segment   Product.ID     Category Sub.Category Product.Name        Sales 
##            0            0            0            0            0            0 
##     Quantity     Discount       Profit 
##            0            0            0

Good! No missing value.
Now, AN_Hardware dataset is ready to be processed and analyzed.

Data Explanation

summary(AN_hardware)
##      Row.ID       Order.ID           Order.Date           Ship.Date         
##  Min.   :   1   Length:9994        Min.   :2014-01-01   Min.   :2014-01-01  
##  1st Qu.:2499   Class :character   1st Qu.:2015-01-08   1st Qu.:2015-01-09  
##  Median :4998   Mode  :character   Median :2016-01-11   Median :2016-01-12  
##  Mean   :4998                      Mean   :2015-10-06   Mean   :2015-10-12  
##  3rd Qu.:7496                      3rd Qu.:2017-01-08   3rd Qu.:2017-01-08  
##  Max.   :9994                      Max.   :2017-01-31   Max.   :2018-01-05  
##                                                                             
##           Ship.Mode    Customer.ID               Segment      Product.ID       
##  First Class   :1538   Length:9994        Consumer   :5191   Length:9994       
##  Same Day      : 543   Class :character   Corporate  :3020   Class :character  
##  Second Class  :1945   Mode  :character   Home Office:1783   Mode  :character  
##  Standard Class:5968                                                           
##                                                                                
##                                                                                
##                                                                                
##             Category         Sub.Category  Product.Name      
##  Furniture      :2121   Binders    :1523   Length:9994       
##  Office Supplies:6026   Paper      :1370   Class :character  
##  Technology     :1847   Furnishings: 957   Mode  :character  
##                         Phones     : 889                     
##                         Storage    : 846                     
##                         Art        : 796                     
##                         (Other)    :3613                     
##      Sales              Quantity        Discount          Profit         
##  Min.   :    0.444   Min.   : 1.00   Min.   :0.0000   Min.   :-6599.978  
##  1st Qu.:   17.280   1st Qu.: 2.00   1st Qu.:0.0000   1st Qu.:    1.729  
##  Median :   54.490   Median : 3.00   Median :0.2000   Median :    8.666  
##  Mean   :  229.858   Mean   : 3.79   Mean   :0.1562   Mean   :   28.657  
##  3rd Qu.:  209.940   3rd Qu.: 5.00   3rd Qu.:0.2000   3rd Qu.:   29.364  
##  Max.   :22638.480   Max.   :14.00   Max.   :0.8000   Max.   : 8399.976  
## 

Summary :
1. The data’s first order and shipment occurred on January 1, 2014.
2. Standart Class delivery was the most popular for shipping option and Same day delivery was the lowest for shipping option
3. AN_hardware has the most buyers from the consumer segment and home office segment was the lowest
4. In Category, office supplies was the best seller compared to other categories
5. Other stuff was the most popular Sub.Category, while Art was the least popular
6. AN_hardware achieved a maximum sales value of 22638.480, a minimum sales value of 0.444, and an average sales value of 229.858
7. Within four years, the maximum order quantity received by AN_hardware was 14, with an average of 3.79
8. The maximum amount of discount for AN hardware was 80%, but the average amount of discount was around 15.62%
9. AN_hardware has ever had a loss of 6599.978 and a profit of 8399.976, with an average profit of 28.657.

Data Manipulation & Transformation

  1. How much total quantity order since AN_hardware start selling the product?
sum(AN_hardware$Quantity)
## [1] 37873

Answer: AN_hardware has sold a total of 37.873 items.

  1. How much total profit?
sum(AN_hardware$Profit)
## [1] 286397

Answer: AN_hardward has a total profit in 4 years of 286.397.

  1. Which category has the highest profit?
xtabs(formula = Profit ~ Category, data = AN_hardware)
## Category
##       Furniture Office Supplies      Technology 
##        18451.27       122490.80       145454.95

Answer: Technology category has a highest profit.

plot(x = AN_hardware$Category, y = AN_hardware$Profit)

sd(AN_hardware$Profit )
## [1] 234.2601

As a result, we discovered the possibility of outliers, based on calculations that the value of Standard Deviation is around 200, indicating that it is still within the tolerated range, allowing the process to continue.

  1. Which segment has the highest profit?
xtabs(Profit~ Segment, AN_hardware)
## Segment
##    Consumer   Corporate Home Office 
##   134119.21    91979.13    60298.68

Answer : The consumer segment has the highest profit.

  1. What is the total Sales generated by each Segment & Category pair?
xtabs(formula = Sales ~ Segment + Category, data = AN_hardware)
##              Category
## Segment       Furniture Office Supplies Technology
##   Consumer     391049.3        363952.1   406399.9
##   Corporate    229019.8        230676.5   246450.1
##   Home Office  121930.7        124418.4   183304.0

Answer : In terms of sales, it shows that the most sales in the technology category with buyers from the consumer segment.

  1. Highest gain in which transaction?
AN_hardware[AN_hardware$Profit==8399.976,]

Answer : The highest profit on this transaction can be shown from the corporate segment buying 5 items copiers at sales normal prices without discount.

  1. Highest loss in which transaction?
AN_hardware[AN_hardware$Profit==-6599.978,]

Answer : The lowest profit on this transaction can be shown that the consumer buys a machines as much as 5 items at a low price and gets a 70% discount.

  1. Which has the highest average profit?
catsegment <- aggregate (formula = Profit ~ Category + Segment, 
           data=AN_hardware, 
           FUN=mean )
catsegment[order(catsegment$Profit, decreasing= TRUE),]

Answer: Technology has the highest average profit and in the home office segment.

  1. Which has the highest average Sales?
sales_catsegment<- aggregate (formula = Sales ~ Category + Segment, 
           data=AN_hardware, 
           FUN=mean )
sales_catsegment[order(sales_catsegment$Sales, decreasing= TRUE),]

Answer: Technology has the highest average sales and in the home office segment.

  1. Profit based on Category + Segment + Ship.Mode?
ship.mode_cat_subcat<- aggregate (formula = Profit ~ Category + Segment + Ship.Mode, 
           data=AN_hardware, 
           FUN=sum )
ship.mode_cat_subcat[order(ship.mode_cat_subcat$Profit, decreasing= TRUE),]

Answer: Most buyers choose Standard Class for shipping options. And consumer was the higest using it.

Explanatory Text & Business Recomendation

Explanatory

In this study case, AN_hardware is a retail company engaged in one-stop shopping business for selling office supplies, home appliances and lifestyle products. The company sells 3 main categories : Furniture, Office Supplies and Technology.

In 4 years, AN_hardware has sold a total of 37.873 items with a total profit of 286.397. The highest profit in the category with a value of 145.454,95 generated from the ‘Technology’ category compared to others, but the highest loss with a total of 6.599,987 that occurred in 2016 also came from the ‘Technology’ category, And the most sales in the ‘Technology’ category with buyers from the ‘Consumer’ segment. In addition, other segments are buying the products, such as Corporate and Home Office. Based on the data, the ‘Home Office’ segment has the highest average profit and the highest average sales when compared to other segments.

There are four shipping options: First Class, Same Day, Second Class, and Standard Class. The majority of them choose Standard Class as their delivery option within all segments. The consumer segment is the highest segment that purchases technology products in the Standard class.

Business Recomendations

  • Based on calculations, it shows that Technology sells with the highest sales value and gets the highest profit. However, the highest losses were also recorded in the Technology category.

  • The highest loss that comes from the Technology category is because the discount given is too high (70%), so you should calculate it first before deciding what percentage of the discount will be given in order to still get the advantage.

  • Can increase the target market in the Corporate segment and the Home Office segment, by giving them more discounts and considering the benefits for the company.