1. Background
PT Komoditi Ria Kencana is a retail commodity trading company. The company have worldwide activities in the sourcing, transporting, storage, and financing of some commodities products to retail market. Their success is built upon knowledge of the market, reliability and independence
They strive to be a reliable and competitive partner, working to contribute to their customers’ and suppliers’ expansion and development
This report contains of explanatory product data analysis from PT Komoditi Ria Kencana including sales records data, P&L data and other detail data.
2. Exploratory Data
2.1 Data Inspection
Before we do explanatory data analysis, we need to inspect our data and do some cleanings if it is necessary.
Our first step is inputting the data.
komoditi <- read.csv("data_input/retail.csv")Our next step is inspecting the data.
First, we have to check the dimension of the data by using function dim().
dim(komoditi)## [1] 9994 15
The data has dimension as follows:
- 9994 rows. It means that the company had 9994 transactions from 2014 to 2017
- 15 columns. It means that the data had recorded 15 variables, from Order ID to Product details.
Now we will cross check the dimension by calling the first 6 data and the last 6 data by using function head() and tail()
head(komoditi)tail(komoditi)So the dimension is matched by using function dim() and head()/tail(). Great!
The next inspection is to make sure the columns name of the data to ease our explanatory data analysis.
names(komoditi)## [1] "Row.ID" "Order.ID" "Order.Date" "Ship.Date" "Ship.Mode"
## [6] "Customer.ID" "Segment" "Product.ID" "Category" "Sub.Category"
## [11] "Product.Name" "Sales" "Quantity" "Discount" "Profit"
Each Collumns label is as follows: + “Row.ID”, “Order.ID”, “Order.Date”, “Ship.Date”, “Ship.Mode”, “Customer.ID”, “Segment”,“Product.ID”, “Category”, “Sub.Category”, “Product.Name”, “Sales”, “Quantity”, “Discount”, “Profit”
The last inspection is to check one by one the datatype of each columns. This inspection is needed to check if the datatype is in the correct form of the datatype required to do the explanatory data analysis. We use funcion str() to check the datatype of each columns.
str(komoditi)## 'data.frame': 9994 obs. of 15 variables:
## $ Row.ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
## $ Order.Date : chr "11/8/16" "11/8/16" "6/12/16" "10/11/15" ...
## $ Ship.Date : chr "11/11/16" "11/11/16" "6/16/16" "10/18/15" ...
## $ Ship.Mode : chr "Second Class" "Second Class" "Second Class" "Standard Class" ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : chr "Consumer" "Consumer" "Corporate" "Consumer" ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : chr "Furniture" "Furniture" "Office Supplies" "Furniture" ...
## $ Sub.Category: chr "Bookcases" "Chairs" "Labels" "Tables" ...
## $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
We found out that some of the columns are not in the correct datatype.
- Order Date and Ship Date columns should be in Datetime datatype
- Ship Mode, Segment, Category, and Sub.Category should be in Category/Factor datatype
In the next process (data cleansing and coertion), we will change the incorrect datatypes to the correct datatypes
2.2 Data Cleansing & Coertions
Following up the last chapter, we need to change some of the incorrect datatypes to the correct datatype.
komoditi$Segment <- as.factor(komoditi$Segment)
komoditi$Category <- as.factor(komoditi$Category)
komoditi$Order.Date <- as.Date(komoditi$Order.Date, "%m/%d/%y")
komoditi$Ship.Date <- as.Date(komoditi$Ship.Date, "%m/%d/%y")
komoditi$Ship.Mode <- as.factor(komoditi$Ship.Mode)
komoditi$Sub.Category <- as.factor(komoditi$Sub.Category)
str(komoditi)## 'data.frame': 9994 obs. of 15 variables:
## $ Row.ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
## $ Order.Date : Date, format: "2016-11-08" "2016-11-08" ...
## $ Ship.Date : Date, format: "2016-11-11" "2016-11-11" ...
## $ Ship.Mode : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
## $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
## $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
Now all the datatypes is already in the correct datatypes. The next process is to check if we have any missing values in the Data.
colSums(is.na(komoditi))## Row.ID Order.ID Order.Date Ship.Date Ship.Mode Customer.ID
## 0 0 0 0 0 0
## Segment Product.ID Category Sub.Category Product.Name Sales
## 0 0 0 0 0 0
## Quantity Discount Profit
## 0 0 0
anyNA(komoditi)## [1] FALSE
So there are no missing values in the dataset. It is a great dataset!
Next process is to subset some columns that has no value informations to the explanatory data analysis.
Komoditi<- komoditi[,c(3:15)]
head(komoditi)Now, the dataset is ready to be processed and analyzed.
3. Data Explanation
Now we will show details from each columns to gain insights from each columns
summary(Komoditi)## Order.Date Ship.Date Ship.Mode
## Min. :2014-01-03 Min. :2014-01-07 First Class :1538
## 1st Qu.:2015-05-23 1st Qu.:2015-05-27 Same Day : 543
## Median :2016-06-26 Median :2016-06-29 Second Class :1945
## Mean :2016-04-30 Mean :2016-05-03 Standard Class:5968
## 3rd Qu.:2017-05-14 3rd Qu.:2017-05-18
## Max. :2017-12-30 Max. :2018-01-05
##
## Customer.ID Segment Product.ID
## Length:9994 Consumer :5191 Length:9994
## Class :character Corporate :3020 Class :character
## Mode :character Home Office:1783 Mode :character
##
##
##
##
## Category Sub.Category Product.Name
## Furniture :2121 Binders :1523 Length:9994
## Office Supplies:6026 Paper :1370 Class :character
## Technology :1847 Furnishings: 957 Mode :character
## Phones : 889
## Storage : 846
## Art : 796
## (Other) :3613
## Sales Quantity Discount Profit
## Min. : 0.444 Min. : 1.00 Min. :0.0000 Min. :-6599.978
## 1st Qu.: 17.280 1st Qu.: 2.00 1st Qu.:0.0000 1st Qu.: 1.729
## Median : 54.490 Median : 3.00 Median :0.2000 Median : 8.666
## Mean : 229.858 Mean : 3.79 Mean :0.1562 Mean : 28.657
## 3rd Qu.: 209.940 3rd Qu.: 5.00 3rd Qu.:0.2000 3rd Qu.: 29.364
## Max. :22638.480 Max. :14.00 Max. :0.8000 Max. : 8399.976
##
Here are insights we can get after we summarize the data:
- The first order recorded in Jan 2014
- Standard Class was the most popular shipping mode chosen by the customers and Same-day-delivery was the least popular
- Customer’s segment from PT Komoditi Ria Kencana are divided into 3 segments. Consumer Segment is the biggest segment while home office segment is the lowest segment.
- Office Supply category was the best seller compare to others
- In Sub category, Other stuff was the most favourite one then followed by Binders and paper. Art is the lowest
- The company gained the average sales at 229.858; with max sales value at 22638.480 and minumum sales value at 0.444
- Max quantity order was 14 within 4 years with average at 3.79
- The company max amount of discount was 80%, but mean number of discount around 15%
- The company had recorded a loss at 6599 and a gain 8399 as their profit with average profit at 28.657
Now we check correlation between numerical data:
cor(Komoditi$Sales,Komoditi$Profit)## [1] 0.4790643
plot(Komoditi$Sales,Komoditi$Profit)
abline(lm(Komoditi$Profit~Komoditi$Sales),col='red')cor(Komoditi$Sales,Komoditi$Quantity)## [1] 0.2007948
plot(Komoditi$Quantity,Komoditi$Sales)
abline(lm(Komoditi$Sales~Komoditi$Quantity),col='red')cor(Komoditi$Profit,Komoditi$Quantity)## [1] 0.06625319
plot(Komoditi$Quantity,Komoditi$Profit)
abline(lm(Komoditi$Profit~Komoditi$Quantity),col='red')cor(Komoditi$Discount,Komoditi$Sales)## [1] -0.02819012
plot(Komoditi$Discount,Komoditi$Sales)
abline(lm(Komoditi$Sales~Komoditi$Discount),col='red')cor(Komoditi$Discount,Komoditi$Profit)## [1] -0.2194875
plot(Komoditi$Discount,Komoditi$Profit)
abline(lm(Komoditi$Profit~Komoditi$Discount),col='red')cor(Komoditi$Discount,Komoditi$Profit)## [1] -0.2194875
plot(Komoditi$Discount,Komoditi$Quantity)
abline(lm(Komoditi$Quantity~Komoditi$Discount),col='red')After we check the correlation between numerical data:
- There are positive correlation between Sales and Profit.It means that if the Sales is going up, the profit will also go up
- There are also positive correlation between Sales and Quantity however the correlation is not significant
- The correlation between Profit and Quantity is almost zero. We can conclude there are no correlation betwee these numerical data
- There are negative correlation between Sales and Discount. It means that if we give higher discount, it doesn’t mean that the Sales will go up.
- There are negative correlation between Profit and Discount. It means that if we give higher discount, it doesn’t mean that the Profit will go up.
- There are negative correlation between Quanity and Discount. It means that if we give higher discount, it doesn’t mean that the Quantity will go up.
4. Analysis
Now we will do analysis to some business questions related to this data.
- Which Ship Mode that have provided the lowest and the highest sum of Sales,Profit & Quantity.
library(ggplot2)
komss <- aggregate.data.frame(x=list(komoditi$Sales,komoditi$Profit,komoditi$Quantity),
by=list(komoditi$Ship.Mode),FUN=sum)
komss <- setNames(komss,c("Moda_Pengapalan","Penjualan","Keuntungan","Kuantiti"))
ggplot(data=komss,mapping=aes(x=reorder(Moda_Pengapalan,Penjualan),
y=Penjualan,fill=Moda_Pengapalan))+
geom_col(show.legend = F)+
geom_text(aes(label=round(Penjualan,0)),vjust=2)+
labs(title="Moda Pengapalan vs Penjualan",
subtitle="PT Komoditi Ria Kencana 2014-2017",
x="Moda Pengapalan",
y="Penjualan")ggplot(data=komss,mapping=aes(x=reorder(Moda_Pengapalan,Keuntungan),
y=Keuntungan,fill=Moda_Pengapalan))+
geom_col(show.legend = F)+
geom_text(aes(label=round(Keuntungan,0)),vjust=2)+
labs(title="Moda Pengapalan vs Keuntungan",
subtitle="PT Komoditi Ria Kencana 2014-2017",
x="Moda Pengapalan",
y="Keuntungan")ggplot(data=komss,mapping=aes(x=reorder(Moda_Pengapalan,Kuantiti),
y=Kuantiti,fill=Moda_Pengapalan))+
geom_col(show.legend = F)+
geom_text(aes(label=Kuantiti),vjust=2)+
labs(title="Moda Pengapalan vs Kuantiti",
subtitle="PT Komoditi Ria Kencana 2014-2017",
x="Moda Pengapalan",
y="Kuantiti")If we look at the plots above, we can analyse that in terms of Total Sales, Total Profit, and Total Quantity, most of buyers choose Standard Class as the shipping mode option while Same Day shipping mode option is the least favourite.
- Which Segment that have provided the lowest and the highest sum of Sales,Profit & Quantity.
komseg <- aggregate.data.frame(x=list(komoditi$Sales,komoditi$Profit,komoditi$Quantity),
by=list(komoditi$Segment),FUN=sum)
komseg <- setNames(komseg,c("Segmen","Penjualan","Keuntungan","Kuantiti"))
ggplot(data=komseg,mapping=aes(x=reorder(Segmen,Penjualan),
y=Penjualan,fill=Segmen))+
geom_col(show.legend = F)+
geom_text(aes(label=round(Penjualan,0)),vjust=2)+
labs(title="Segmen vs Penjualan",
subtitle="PT Komoditi Ria Kencana 2014-2017",
x="Segmen",
y="Penjualan")ggplot(data=komseg,mapping=aes(x=reorder(Segmen,Keuntungan),
y=Keuntungan,fill=Segmen))+
geom_col(show.legend = F)+
geom_text(aes(label=round(Keuntungan,0)),vjust=2)+
labs(title="Segmen vs Keuntungan",
subtitle="PT Komoditi Ria Kencana 2014-2017",
x="Segmen",
y="Keuntungan")ggplot(data=komseg,mapping=aes(x=reorder(Segmen,Kuantiti),
y=Kuantiti,fill=Segmen))+
geom_col(show.legend = F)+
geom_text(aes(label=round(Kuantiti,0)),vjust=2)+
labs(title="Segmen vs Kuantiti",
subtitle="PT Komoditi Ria Kencana 2014-2017",
x="Segmen",
y="Kuantiti")If we look at the plots above, we can analyse that in terms of Total Sales, Total Profit, and Total Quantity, most of buyers choose to buy products in Consumer Segment while products from Home Office Segment is the least favourite.
- Which Category products that have provided the lowest and the highest sum of Sales,Profit & Quantity
komcat <- aggregate.data.frame(x=list(komoditi$Sales,komoditi$Profit,komoditi$Quantity),
by=list(komoditi$Category),FUN=sum)
komcat <- setNames(komcat,c("Kategori","Penjualan","Keuntungan","Kuantiti"))
ggplot(data=komcat,mapping=aes(x=reorder(Kategori,Penjualan),
y=Penjualan,fill=Kategori))+
geom_col(show.legend = F)+
geom_text(aes(label=round(Penjualan,0)),vjust=2)+
labs(title="Kategori vs Penjualan",
subtitle="PT Komoditi Ria Kencana 2014-2017",
x="Kategori",
y="Penjualan") ggplot(data=komcat,mapping=aes(x=reorder(Kategori,Keuntungan),
y=Keuntungan,fill=Kategori))+
geom_col(show.legend = F)+
geom_text(aes(label=round(Keuntungan,0)),vjust=2)+
labs(title="Kategori vs Keuntungan",
subtitle="PT Komoditi Ria Kencana 2014-2017",
x="Kategori",
y="Keuntungan")ggplot(data=komcat,mapping=aes(x=reorder(Kategori,Kuantiti),
y=Kuantiti,fill=Kategori))+
geom_col(show.legend = F)+
geom_text(aes(label=round(Kuantiti,0)),vjust=2)+
labs(title="Kategori vs Kuantiti",
subtitle="PT Komoditi Ria Kencana 2014-2017",
x="Kategori",
y="Kuantiti")If we look at the plots above, we can analyse that in terms of Total Quantity, the Office Supplies category is the most quantity with value of 22,906 items. However The Office Supplies category provided the least in terms of Sales and Profitability. The highest Sales and Profitablity comes from Technology Category with lesser quantity.
- Which Sub-Category products that have provided the lowest and the highest sum of Sales,Profit & Quantity
komsub <- aggregate.data.frame(x=list(komoditi$Sales,komoditi$Profit,komoditi$Quantity),
by=list(komoditi$Sub.Category),FUN=sum)
komsub <- setNames(komsub,c("Sub_Kategori","Penjualan","Keuntungan","Kuantiti"))
ggplot(data=komsub,mapping=aes(x=reorder(Sub_Kategori,Penjualan),
y=Penjualan,fill=Sub_Kategori))+
geom_col(show.legend = F)+
geom_text(aes(label=round(Penjualan,0)),vjust=2)+
labs(title="Sub Kategori vs Penjualan",
subtitle="PT Komoditi Ria Kencana 2014-2017",
x="Sub Kategori",
y="Penjualan") komsub <- aggregate.data.frame(x=list(komoditi$Sales,komoditi$Profit,komoditi$Quantity),
by=list(komoditi$Sub.Category),FUN=sum)
komsub <- setNames(komsub,c("Sub_Kategori","Penjualan","Keuntungan","Kuantiti"))
ggplot(data=komsub,mapping=aes(x=reorder(Sub_Kategori,Keuntungan),
y=Keuntungan,fill=Sub_Kategori))+
geom_col(show.legend = F)+
geom_text(aes(label=round(Keuntungan,0)),vjust=2)+
labs(title="Sub Kategori vs Keuntungan",
subtitle="PT Komoditi Ria Kencana 2014-2017",
x="Sub Kategori",
y="Keuntungan") #There are products with negative profit. Now check the detail discount from these products
loss <- subset(komoditi,subset=komoditi$Sub.Category=="Tables"|
komoditi$Sub.Category=="Bookcases"|
komoditi$Sub.Category=="Supplies")
aggregate(loss$Discount~loss$Sub.Category+loss$Category+loss$Segment,FUN=mean)komsub <- aggregate.data.frame(x=list(komoditi$Sales,komoditi$Profit,komoditi$Quantity),
by=list(komoditi$Sub.Category),FUN=sum)
komsub <- setNames(komsub,c("Sub_Kategori","Penjualan","Keuntungan","Kuantiti"))
ggplot(data=komsub,mapping=aes(x=reorder(Sub_Kategori,Kuantiti),
y=Kuantiti,fill=Sub_Kategori))+
geom_col(show.legend = F)+
geom_text(aes(label=round(Kuantiti,0)),vjust=2)+
labs(title="Sub Kategori vs Kuantiti",
subtitle="PT Komoditi Ria Kencana 2014-2017",
x="Sub Kategori",
y="Kuantiti") If we look at the plots above, we can analyse as belows:
- In terms of Total Sales, Phones sub-category products provide the highest while Fasteners sub-category products provide the least
- In terms of Profitablity, Copiers sub-category products provide the highest profit. We found that some sub-category product happened to be negative profit. These sub-category are Tables, Bookcases and Supplies.
- In terms of Quantity, Binders sub-category products provide the highest while Copiers sub-category products provide the least
- Copiers products provides the most profitability with the least quantity compare to others.
5. Business Recommendation
PT Komoditi Ria kencana has been doing business since 2014. We have analysed all the commercial transaction from 2014-2017 specifically in products. We come with below recommendation for better future business:
- Standard Class Ship mode is the most preferred by buyers. PT Komoditi should keep this ship mode options since this ship mode provides the most sales, profitability and quantity. Please consider to delete Same-Day ship and ask buyers to move to First Class ship mode since First Class ship Mode provide the better sales, profitablility annd quantity compare to Same-Day ship mode.
- PT Komoditi do not need to change all the products Segments since all Segments provides positive sales, profit, and quantity.
- PT Komoditi do not need to change all the products Category since all Category provides positive sales, profit, and quantity.However there are some things that PT Komoditi needs to be considered. In Furniture Category, Although the sales is positively high, the profitability and the quantity sold is considerably low. PT Komoditi should consider to source these products from another vendors with lesser price or lower the discount.
- During Analysis of the sub-category products, we found that there are several products that cause negative profit (Bookcases, Tables, and Supplies). Please consider to lower the discount this sub-category products or source these products from another vendors with lesser price. We can consider also to delete “supplies” sub-category product since the sales is low, the quantity is low, negative profit with almost zero discount.
- PT Komoditi needs to sell more “Copiers” since this products provide the highest profitability with the least quantity. Please consider to give a little bit more discount to grow the quantity, sales, and profitability.