1 About

This is an R Markdown document that contains business recomendations based on retail data provided by PT. ABC’s Business Database. This Rmarkdown will describe :

1. Data Explanation
1. Data Reading into R
1. Data Transformation
1. Data Exploration

1.1 Data Explanation

This is retail dataset obtained from PT. ABC’s Business Database. The dataset includes their retail transactions in the past 4 years. Today, successful retail businesses rely on solid facts, metrics, and up-to-date data to make key business decisions. That is the reason why we need to analyze this data. If you want to maximize profits and gain customer loyalty then you must understand your customers better.

1.2 Data Reading into R

First make sure the data we’ll like to work with is also in our current directory, and use the read.csv() to read our csv file into our global environment.

retail <- read.csv("dataset/retail.csv")

We can use str() to inspect the structure of data. Call str() on our retail dataset. str() returns the structure of an R Object.

str(retail)

## 'data.frame':    9994 obs. of  15 variables:
##  $ Row.ID      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Order.ID    : Factor w/ 5009 levels "CA-2014-100006",..: 2501 2501 2297 4373 4373 202 202 202 202 202 ...
##  $ Order.Date  : Factor w/ 1237 levels "1/1/17","1/10/14",..: 305 305 836 94 94 922 922 922 922 922 ...
##  $ Ship.Date   : Factor w/ 1334 levels "1/1/15","1/1/16",..: 220 220 907 129 129 897 897 897 897 897 ...
##  $ Ship.Mode   : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
##  $ Customer.ID : Factor w/ 793 levels "AA-10315","AA-10375",..: 144 144 240 706 706 89 89 89 89 89 ...
##  $ Segment     : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
##  $ Product.ID  : Factor w/ 1862 levels "FUR-BO-10000112",..: 13 56 947 320 1317 186 563 1762 795 438 ...
##  $ Category    : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
##  $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
##  $ Product.Name: Factor w/ 1850 levels "\"While you Were Out\" Message Book, One Form per Page",..: 387 833 1440 367 574 570 1137 1099 535 295 ...
##  $ Sales       : num  262 731.9 14.6 957.6 22.4 ...
##  $ Quantity    : int  2 3 2 5 2 7 4 6 3 5 ...
##  $ Discount    : num  0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
##  $ Profit      : num  41.91 219.58 6.87 -383.03 2.52 ...

To learn the data we have, we can see our data attributes like names, length, colnames, dim etc:

length(retail)

## [1] 15

For small dataset, we can view the full data in its raw format through the View(retail) command. But, it is not recommended to use the View() command, because in real life we don’t always know beforehand the size of data.

We can take a peek of our data by using the head or tail command. And we can pass in an extra argument, n, so the function would return the first/last n number of rows instead of the default 6.

head(retail)

We can also inspect if there is any NA (not available) or missing value from the retail dataset by using anyNA() command or colSums(is.na()) for coloumn checking.

anyNA(retail)

## [1] FALSE

After Reading Data into R, we can see the information from raw dataset:

The number of retail transactions is 9994
Fortunately there is no missing value in this retail dataset

1.3 Data Transform

We can simplify the data we need to do a good analysis. First we can exclude the data that we don’t need, such as Row.ID and Order.ID.

retail <- retail[,-c(1,2)]

As we have seen in the results of str() command, we have to transform some types of data variables. We could explicitly coerce one class to another:

retail$Order.Date <- as.Date(retail$Order.Date, "%m/%d/%y")
retail$Ship.Date <- as.Date(retail$Ship.Date, "%m/%d/%y")
retail[,c("Customer.ID","Product.ID","Product.Name")] <- 
  lapply(retail[,c("Customer.ID","Product.ID","Product.Name")], as.character) 
#using "lapply()" command to convert/transform variables into one  same type of variable
str(retail)

## 'data.frame':    9994 obs. of  13 variables:
##  $ Order.Date  : Date, format: "2016-11-08" "2016-11-08" ...
##  $ Ship.Date   : Date, format: "2016-11-11" "2016-11-11" ...
##  $ Ship.Mode   : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
##  $ Customer.ID : chr  "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
##  $ Segment     : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
##  $ Product.ID  : chr  "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
##  $ Category    : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
##  $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
##  $ Product.Name: chr  "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
##  $ Sales       : num  262 731.9 14.6 957.6 22.4 ...
##  $ Quantity    : int  2 3 2 5 2 7 4 6 3 5 ...
##  $ Discount    : num  0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
##  $ Profit      : num  41.91 219.58 6.87 -383.03 2.52 ...

We can also remove the Product.ID and Discount variables by using subset() fucntion.

#another method to subset/exclude our data
retail <- subset(retail, select = - c(Product.ID, Discount))
str(retail) #take a review from our new retail dataset structure

## 'data.frame':    9994 obs. of  11 variables:
##  $ Order.Date  : Date, format: "2016-11-08" "2016-11-08" ...
##  $ Ship.Date   : Date, format: "2016-11-11" "2016-11-11" ...
##  $ Ship.Mode   : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
##  $ Customer.ID : chr  "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
##  $ Segment     : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
##  $ Category    : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
##  $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
##  $ Product.Name: chr  "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
##  $ Sales       : num  262 731.9 14.6 957.6 22.4 ...
##  $ Quantity    : int  2 3 2 5 2 7 4 6 3 5 ...
##  $ Profit      : num  41.91 219.58 6.87 -383.03 2.52 ...

1.4 Data Exploration

After we finish tidying up the data we need, we can summary data by using summary() function. This summary() function returns quick summary statistics on each of the variable in our new dataset.

summary(retail)

##    Order.Date           Ship.Date                   Ship.Mode   
##  Min.   :2014-01-03   Min.   :2014-01-07   First Class   :1538  
##  1st Qu.:2015-05-23   1st Qu.:2015-05-27   Same Day      : 543  
##  Median :2016-06-26   Median :2016-06-29   Second Class  :1945  
##  Mean   :2016-04-30   Mean   :2016-05-03   Standard Class:5968  
##  3rd Qu.:2017-05-14   3rd Qu.:2017-05-18                        
##  Max.   :2017-12-30   Max.   :2018-01-05                        
##                                                                 
##  Customer.ID               Segment                Category   
##  Length:9994        Consumer   :5191   Furniture      :2121  
##  Class :character   Corporate  :3020   Office Supplies:6026  
##  Mode  :character   Home Office:1783   Technology     :1847  
##                                                              
##                                                              
##                                                              
##                                                              
##       Sub.Category  Product.Name           Sales              Quantity    
##  Binders    :1523   Length:9994        Min.   :    0.444   Min.   : 1.00  
##  Paper      :1370   Class :character   1st Qu.:   17.280   1st Qu.: 2.00  
##  Furnishings: 957   Mode  :character   Median :   54.490   Median : 3.00  
##  Phones     : 889                      Mean   :  229.858   Mean   : 3.79  
##  Storage    : 846                      3rd Qu.:  209.940   3rd Qu.: 5.00  
##  Art        : 796                      Max.   :22638.480   Max.   :14.00  
##  (Other)    :3613                                                         
##      Profit         
##  Min.   :-6599.978  
##  1st Qu.:    1.729  
##  Median :    8.666  
##  Mean   :   28.657  
##  3rd Qu.:   29.364  
##  Max.   : 8399.976  
##

We can see a lot of information generated by summary() function:

All transactions are carried out within 4 years starting from 2014 to 2017
The Consumer segment made the most transactions while the product category is dominated by Office Supplies
The average sales transaction is 229.58
The average profit of a transaction is 28.657, where not all transactions prove to be profitable
The most unprofitable transaction recorded a 6599.978 in losses, and the most profitable transaction recorded a 8399.976 gain

And then we can see which category has the highest sales point and which one has the highest profit by using xtabs() function.

xtabs(Profit ~ Category, retail)

## Category
##       Furniture Office Supplies      Technology 
##        18451.27       122490.80       145454.95

xtabs(Sales ~ Category, retail)

## Category
##       Furniture Office Supplies      Technology 
##        741999.8        719047.0        836154.0

Based on the information from xtabs category tabulation on profits and sales, we can see even though Furniture Office Category sales is greater than Office Supplies, the profit made by Furniture Office is smaller than Office Supplies. We can presume that there is something wrong in the sales of Furniture product category. So, we can analyze the profit made by each Sub.Category as follows:

xtabs(Profit ~ Category + Sub.Category, retail)

##                  Sub.Category
## Category          Accessories  Appliances         Art     Binders
##   Furniture            0.0000      0.0000      0.0000      0.0000
##   Office Supplies      0.0000  18138.0054   6527.7870  30221.7633
##   Technology       41936.6357      0.0000      0.0000      0.0000
##                  Sub.Category
## Category            Bookcases      Chairs     Copiers   Envelopes
##   Furniture        -3472.5560  26590.1663      0.0000      0.0000
##   Office Supplies      0.0000      0.0000      0.0000   6964.1767
##   Technology           0.0000      0.0000  55617.8249      0.0000
##                  Sub.Category
## Category            Fasteners Furnishings      Labels    Machines
##   Furniture            0.0000  13059.1436      0.0000      0.0000
##   Office Supplies    949.5182      0.0000   5546.2540      0.0000
##   Technology           0.0000      0.0000      0.0000   3384.7569
##                  Sub.Category
## Category                Paper      Phones     Storage    Supplies
##   Furniture            0.0000      0.0000      0.0000      0.0000
##   Office Supplies  34053.5693      0.0000  21278.8264  -1189.0995
##   Technology           0.0000  44515.7306      0.0000      0.0000
##                  Sub.Category
## Category               Tables
##   Furniture       -17725.4811
##   Office Supplies      0.0000
##   Technology           0.0000

summary(retail[retail$Sub.Category == "Bookcases","Profit"])

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -1665.052   -42.631     4.133   -15.230    40.054  1013.127

plot(retail[retail$Sub.Category == "Bookcases","Profit"])

summary(retail[retail$Sub.Category == "Tables","Profit"])

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -1862.31  -122.83   -31.37   -55.57    33.85   629.01

plot(retail[retail$Sub.Category == "Tables","Profit"])

summary(retail[retail$Sub.Category == "Supplies","Profit"])

##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -1049.3406     0.4371     3.9593    -6.2584     9.0858   327.5060

plot(retail[retail$Sub.Category == "Supplies","Profit"])

We can see several number unprofitable Sub.Category of all transactions on Furniture category, such as :

Bookcases with 3472.5560 losses
Tables with 17725.4811 losses, and
Supplies sub.category with 1189.0995 losses (from Office Supplies)

And then we can analyze transactions that occur in those categories by subsetting the dataframe.

retail.Fur <- retail[retail$Category == "Furniture",]
retail.Off <- retail[retail$Category == "Office Supplies",]
retail.Tech <- retail[retail$Category == "Technology",]

Before we analyze the transactions contained in each new dataset we have, I want to show another method that is more flexible for aggregation. We can try to ensure that transactions that occur in the Furniture and Office Supplies categories made losses.

aggregate(Profit ~ Sub.Category, retail.Fur, mean)

aggregate(Profit ~ Sub.Category, retail.Off, mean)

With this aggregate() function we can also get the same information, Bookcases, Tables and Supplies sub.categories provide some losses on most transactions.

Next we want to observe at the distribution of transactions that exist in each category against our segments market.

prop.table(table(retail.Fur$Segment))

## 
##    Consumer   Corporate Home Office 
##   0.5247525   0.3045733   0.1706742

prop.table(table(retail.Off$Segment))

## 
##    Consumer   Corporate Home Office 
##   0.5189180   0.3020246   0.1790574

prop.table(table(retail.Fur$Segment))

## 
##    Consumer   Corporate Home Office 
##   0.5247525   0.3045733   0.1706742

By using those functions we can conclude that the transactions distribution from all three Categories are distributed fairly similiar, in which about 52% come from the Consumer market 30% from the Corporate market, and 18% from the Home Office market.

prop.table(table(retail$Category))

## 
##       Furniture Office Supplies      Technology 
##       0.2122273       0.6029618       0.1848109

What is more interesting is that the most profits from 4 years come from the Technology category. From what we can obtain from our summary, the Technology (18.5%) category has the lowest frequency of transactions compared to Furniture (21.2%) and Office Supplies (60.3%).

2 Business Recommendation

We can conclude Technology category is very potential for the retail business of PT ABC. With a relatively lowest frequency of transactions compared to others, Technology category product can make the highest profit 145454.95. And we must reconsider to make transaction in Bookcases, Supplies and Tables sub.category which made some losses.

PT. ABC Business Recommendation

Fran Sanjaya Lumbangaol