Supply Chain
Hello Everyone ! hope you guys in well. In this LBB it will continuing the previous LBB about P4DS with simply analytics statistic, but for this LBB it will be focus on interactive plotting visualization of data-set as our necessary, it talks about a data set of Supply Chains used by the company DataCo Global was used for the analysis.
source dataset : https://www.kaggle.com/datasets/shashwatwork/dataco-smart-supply-chain-for-big-data-analysis?select=DataCoSupplyChainDataset.csv
DataCo SMART SUPPLY CHAIN, one of the enormous logistic company moving in 3PL help many companies to fulfill their demands all world as DataCo’s Client.
Areas of important registered activities : Provisioning , Production , Sales , Commercial Distribution.It also allows the correlation of Structured Data with Unstructured Data for knowledge generation.
Types of Products : Clothing , Sports , and Electronic Supplies.
In this case, I will break down about sales progress every typed of products, how about customer satisfaction on progress shipping, and many more.
for the detail of information meaning of columns you all can read detail of description DataCo Supply Chain below :
df_detail <- read.csv('Dataset/DescriptionDataCoSupplyChain.csv')
knitr::kable(df_detail)| FIELDS | DESCRIPTION |
|---|---|
| Type | : Type of transaction made |
| Days for shipping (real) | : Actual shipping days of the purchased product |
| Days for shipment (scheduled) | : Days of scheduled delivery of the purchased product |
| Benefit per order | : Earnings per order placed |
| Sales per customer | : Total sales per customer made per customer |
| Delivery Status | : Delivery status of orders: Advance shipping , Late delivery , Shipping canceled , Shipping on time |
| Late_delivery_risk | : Categorical variable that indicates if sending is late (1), it is not late (0). |
| Category Id | : Product category code |
| Category Name | : Description of the product category |
| Customer City | : City where the customer made the purchase |
| Customer Country | : Country where the customer made the purchase |
| Customer Email | : Customer’s email |
| Customer Fname | : Customer name |
| Customer Id | : Customer ID |
| Customer Lname | : Customer lastname |
| Customer Password | : Masked customer key |
| Customer Segment | : Types of Customers: Consumer , Corporate , Home Office |
| Customer State | : State to which the store where the purchase is registered belongs |
| Customer Street | : Street to which the store where the purchase is registered belongs |
| Customer Zipcode | : Customer Zipcode |
| Department Id | : Department code of store |
| Department Name | : Department name of store |
| Latitude | : Latitude corresponding to location of store |
| Longitude | : Longitude corresponding to location of store |
| Market | : Market to where the order is delivered : Africa , Europe , LATAM , Pacific Asia , USCA |
| Order City | : Destination city of the order |
| Order Country | : Destination country of the order |
| Order Customer Id | : Customer order code |
| order date (DateOrders) | : Date on which the order is made |
| Order Id | : Order code |
| Order Item Cardprod Id | : Product code generated through the RFID reader |
| Order Item Discount | : Order item discount value |
| Order Item Discount Rate | : Order item discount percentage |
| Order Item Id | : Order item code |
| Order Item Product Price | : Price of products without discount |
| Order Item Profit Ratio | : Order Item Profit Ratio |
| Order Item Quantity | : Number of products per order |
| Sales | : Value in sales |
| Order Item Total | : Total amount per order |
| Order Profit Per Order | : Order Profit Per Order |
| Order Region | : Region of the world where the order is delivered : Southeast Asia ,South Asia ,Oceania ,Eastern Asia, West Asia , West of USA , US Center , West Africa, Central Africa ,North Africa ,Western Europe ,Northern , Caribbean , South America ,East Africa ,Southern Europe , East of USA ,Canada ,Southern Africa , Central Asia , Europe , Central America, Eastern Europe , South of USA |
| Order State | : State of the region where the order is delivered |
| Order Status | : Order Status : COMPLETE , PENDING , CLOSED , PENDING_PAYMENT ,CANCELED , PROCESSING ,SUSPECTED_FRAUD ,ON_HOLD ,PAYMENT_REVIEW |
| Product Card Id | : Product code |
| Product Category Id | : Product category code |
| Product Description | : Product Description |
| Product Image | : Link of visit and purchase of the product |
| Product Name | : Product Name |
| Product Price | : Product Price |
| Product Status | : Status of the product stock :If it is 1 not available , 0 the product is available |
| Shipping date (DateOrders) | : Exact date and time of shipment |
| Shipping Mode | : The following shipping modes are presented : Standard Class , First Class , Second Class , Same Day |
DataCo SMART SUPPLY CHAIN, one of the enormous logistic company moving in 3PL help many companies to fulfill their demands all world as DataCo’s Client.
Make sure our data placed in the same folder our R project data (Rdproj). in this case we will use first analyst in first file DataCoSupplyChainDataset.csv assign as df
# Data Input and Checking Data
df <- read.csv('Dataset/DataCoSupplyChainDataset.csv')Input Data is Done
In this sub section, I will see how does the data look like, we can see by this code
#Inspecting Data
head(df)#We can use this code to know specific of structure data
str(df)#> 'data.frame': 180519 obs. of 53 variables:
#> $ Type : chr "DEBIT" "TRANSFER" "CASH" "DEBIT" ...
#> $ Days.for.shipping..real. : int 3 5 4 3 2 6 2 2 3 2 ...
#> $ Days.for.shipment..scheduled.: int 4 4 4 4 4 4 1 1 2 1 ...
#> $ Benefit.per.order : num 91.2 -249.1 -247.8 22.9 134.2 ...
#> $ Sales.per.customer : num 315 311 310 305 298 ...
#> $ Delivery.Status : chr "Advance shipping" "Late delivery" "Shipping on time" "Advance shipping" ...
#> $ Late_delivery_risk : int 0 1 0 0 0 0 1 1 1 1 ...
#> $ Category.Id : int 73 73 73 73 73 73 73 73 73 73 ...
#> $ Category.Name : chr "Sporting Goods" "Sporting Goods" "Sporting Goods" "Sporting Goods" ...
#> $ Customer.City : chr "Caguas" "Caguas" "San Jose" "Los Angeles" ...
#> $ Customer.Country : chr "Puerto Rico" "Puerto Rico" "EE. UU." "EE. UU." ...
#> $ Customer.Email : chr "XXXXXXXXX" "XXXXXXXXX" "XXXXXXXXX" "XXXXXXXXX" ...
#> $ Customer.Fname : chr "Cally" "Irene" "Gillian" "Tana" ...
#> $ Customer.Id : int 20755 19492 19491 19490 19489 19488 19487 19486 19485 19484 ...
#> $ Customer.Lname : chr "Holloway" "Luna" "Maldonado" "Tate" ...
#> $ Customer.Password : chr "XXXXXXXXX" "XXXXXXXXX" "XXXXXXXXX" "XXXXXXXXX" ...
#> $ Customer.Segment : chr "Consumer" "Consumer" "Consumer" "Home Office" ...
#> $ Customer.State : chr "PR" "PR" "CA" "CA" ...
#> $ Customer.Street : chr "5365 Noble Nectar Island" "2679 Rustic Loop" "8510 Round Bear Gate" "3200 Amber Bend" ...
#> $ Customer.Zipcode : int 725 725 95125 90027 725 14150 725 33162 725 94583 ...
#> $ Department.Id : int 2 2 2 2 2 2 2 2 2 2 ...
#> $ Department.Name : chr "Fitness" "Fitness" "Fitness" "Fitness" ...
#> $ Latitude : num 18.3 18.3 37.3 34.1 18.3 ...
#> $ Longitude : num -66 -66 -122 -118 -66 ...
#> $ Market : chr "Pacific Asia" "Pacific Asia" "Pacific Asia" "Pacific Asia" ...
#> $ Order.City : chr "Bekasi" "Bikaner" "Bikaner" "Townsville" ...
#> $ Order.Country : chr "Indonesia" "India" "India" "Australia" ...
#> $ Order.Customer.Id : int 20755 19492 19491 19490 19489 19488 19487 19486 19485 19484 ...
#> $ order.date..DateOrders. : chr "1/31/2018 22:56" "1/13/2018 12:27" "1/13/2018 12:06" "1/13/2018 11:45" ...
#> $ Order.Id : int 77202 75939 75938 75937 75936 75935 75934 75933 75932 75931 ...
#> $ Order.Item.Cardprod.Id : int 1360 1360 1360 1360 1360 1360 1360 1360 1360 1360 ...
#> $ Order.Item.Discount : num 13.1 16.4 18 22.9 29.5 ...
#> $ Order.Item.Discount.Rate : num 0.04 0.05 0.06 0.07 0.09 ...
#> $ Order.Item.Id : int 180517 179254 179253 179252 179251 179250 179249 179248 179247 179246 ...
#> $ Order.Item.Product.Price : num 328 328 328 328 328 ...
#> $ Order.Item.Profit.Ratio : num 0.29 -0.8 -0.8 0.08 0.45 ...
#> $ Order.Item.Quantity : int 1 1 1 1 1 1 1 1 1 1 ...
#> $ Sales : num 328 328 328 328 328 ...
#> $ Order.Item.Total : num 315 311 310 305 298 ...
#> $ Order.Profit.Per.Order : num 91.2 -249.1 -247.8 22.9 134.2 ...
#> $ Order.Region : chr "Southeast Asia" "South Asia" "South Asia" "Oceania" ...
#> $ Order.State : chr "Java Occidental" "Rajast\xe1n" "Rajast\xe1n" "Queensland" ...
#> $ Order.Status : chr "COMPLETE" "PENDING" "CLOSED" "COMPLETE" ...
#> $ Order.Zipcode : int NA NA NA NA NA NA NA NA NA NA ...
#> $ Product.Card.Id : int 1360 1360 1360 1360 1360 1360 1360 1360 1360 1360 ...
#> $ Product.Category.Id : int 73 73 73 73 73 73 73 73 73 73 ...
#> $ Product.Description : logi NA NA NA NA NA NA ...
#> $ Product.Image : chr "http://images.acmesports.sports/Smart+watch " "http://images.acmesports.sports/Smart+watch " "http://images.acmesports.sports/Smart+watch " "http://images.acmesports.sports/Smart+watch " ...
#> $ Product.Name : chr "Smart watch " "Smart watch " "Smart watch " "Smart watch " ...
#> $ Product.Price : num 328 328 328 328 328 ...
#> $ Product.Status : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ shipping.date..DateOrders. : chr "2/3/2018 22:56" "1/18/2018 12:27" "1/17/2018 12:06" "1/16/2018 11:45" ...
#> $ Shipping.Mode : chr "Standard Class" "Standard Class" "Standard Class" "Standard Class" ...
📌 From this result, we find some of data type not in the correct type. we need to convert it into correct type (data coertion), there’s several type data should be corrected
Here the list of data should be corrected
Type chr -> fctDelivery.Status chr -> fctCategory.Id int -> fctCategory.Name chr -> fctCustomer.City chr -> fctCustomer.Country chr -> fctDepartment.Id int -> fctDepartment.Name chr -> fctMarket chr -> fctOrder.City chr -> fctOrder.Country chr -> fct + USCII & UTC8Order.Customer.Id int -> fctorder.date..DateOrders chr -> datetime (using
lubridate)Order.Region chr -> fctOrder.State chr -> fctOrder.Status chr -> fctProduct.Name chr -> fctshipping.date..DateOrders. chr -> datetime (using
lubridate)Shipping.Mode chr -> fct# explicit coercion
df[,c("Type", "Delivery.Status", "Category.Id","Category.Name","Customer.Segment","Customer.State","Customer.Street","Customer.Zipcode","Customer.City","Customer.Country","Department.Id","Department.Name","Market","Order.City","Order.Country","Customer.Id","Order.Region","Order.State","Order.Status","Product.Name","Shipping.Mode")]<-lapply(df[,c("Type","Delivery.Status","Category.Id","Category.Name","Customer.Segment","Customer.State","Customer.Street","Customer.Zipcode","Customer.City","Customer.Country","Department.Id","Department.Name","Market","Order.City","Order.Country","Customer.Id","Order.Region","Order.State","Order.Status","Product.Name","Shipping.Mode")],as.factor)# Check again df's table structure data
# now in the column with the data type factor the frequency appears
str(df)#> 'data.frame': 180519 obs. of 53 variables:
#> $ Type : Factor w/ 4 levels "CASH","DEBIT",..: 2 4 1 2 3 4 2 4 1 1 ...
#> $ Days.for.shipping..real. : int 3 5 4 3 2 6 2 2 3 2 ...
#> $ Days.for.shipment..scheduled.: int 4 4 4 4 4 4 1 1 2 1 ...
#> $ Benefit.per.order : num 91.2 -249.1 -247.8 22.9 134.2 ...
#> $ Sales.per.customer : num 315 311 310 305 298 ...
#> $ Delivery.Status : Factor w/ 4 levels "Advance shipping",..: 1 2 4 1 1 3 2 2 2 2 ...
#> $ Late_delivery_risk : int 0 1 0 0 0 0 1 1 1 1 ...
#> $ Category.Id : Factor w/ 51 levels "2","3","4","5",..: 48 48 48 48 48 48 48 48 48 48 ...
#> $ Category.Name : Factor w/ 50 levels "Accessories",..: 41 41 41 41 41 41 41 41 41 41 ...
#> $ Customer.City : Factor w/ 563 levels "Aguadilla","Alameda",..: 67 67 453 286 67 506 67 319 67 458 ...
#> $ Customer.Country : Factor w/ 2 levels "EE. UU.","Puerto Rico": 2 2 1 1 2 1 2 1 2 1 ...
#> $ Customer.Email : chr "XXXXXXXXX" "XXXXXXXXX" "XXXXXXXXX" "XXXXXXXXX" ...
#> $ Customer.Fname : chr "Cally" "Irene" "Gillian" "Tana" ...
#> $ Customer.Id : Factor w/ 20652 levels "1","2","3","4",..: 20650 19387 19386 19385 19384 19383 19382 19381 19380 19379 ...
#> $ Customer.Lname : chr "Holloway" "Luna" "Maldonado" "Tate" ...
#> $ Customer.Password : chr "XXXXXXXXX" "XXXXXXXXX" "XXXXXXXXX" "XXXXXXXXX" ...
#> $ Customer.Segment : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 1 3 2 1 3 2 2 2 ...
#> $ Customer.State : Factor w/ 46 levels "91732","95758",..: 37 37 6 6 37 32 37 11 37 6 ...
#> $ Customer.Street : Factor w/ 7458 levels "1 Bright Manor",..: 3684 1401 6218 1804 6346 967 767 5489 910 6741 ...
#> $ Customer.Zipcode : Factor w/ 995 levels "603","612","674",..: 8 8 942 721 8 174 8 341 8 923 ...
#> $ Department.Id : Factor w/ 11 levels "2","3","4","5",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ Department.Name : Factor w/ 11 levels "Apparel","Book Shop",..: 5 5 5 5 5 5 5 5 5 5 ...
#> $ Latitude : num 18.3 18.3 37.3 34.1 18.3 ...
#> $ Longitude : num -66 -66 -122 -118 -66 ...
#> $ Market : Factor w/ 5 levels "Africa","Europe",..: 4 4 4 4 4 4 4 4 4 4 ...
#> $ Order.City : Factor w/ 3597 levels "Zany\xe1n","S\xe3o Jos\xe9 dos Campos",..: 491 549 549 3190 3190 3172 1335 1335 1335 1335 ...
#> $ Order.Country : Factor w/ 164 levels "Camer\xfan","Ben\xedn",..: 90 89 89 41 41 41 57 57 57 57 ...
#> $ Order.Customer.Id : int 20755 19492 19491 19490 19489 19488 19487 19486 19485 19484 ...
#> $ order.date..DateOrders. : chr "1/31/2018 22:56" "1/13/2018 12:27" "1/13/2018 12:06" "1/13/2018 11:45" ...
#> $ Order.Id : int 77202 75939 75938 75937 75936 75935 75934 75933 75932 75931 ...
#> $ Order.Item.Cardprod.Id : int 1360 1360 1360 1360 1360 1360 1360 1360 1360 1360 ...
#> $ Order.Item.Discount : num 13.1 16.4 18 22.9 29.5 ...
#> $ Order.Item.Discount.Rate : num 0.04 0.05 0.06 0.07 0.09 ...
#> $ Order.Item.Id : int 180517 179254 179253 179252 179251 179250 179249 179248 179247 179246 ...
#> $ Order.Item.Product.Price : num 328 328 328 328 328 ...
#> $ Order.Item.Profit.Ratio : num 0.29 -0.8 -0.8 0.08 0.45 ...
#> $ Order.Item.Quantity : int 1 1 1 1 1 1 1 1 1 1 ...
#> $ Sales : num 328 328 328 328 328 ...
#> $ Order.Item.Total : num 315 311 310 305 298 ...
#> $ Order.Profit.Per.Order : num 91.2 -249.1 -247.8 22.9 134.2 ...
#> $ Order.Region : Factor w/ 23 levels "Canada","Caribbean",..: 16 14 14 12 12 12 8 8 8 8 ...
#> $ Order.State : Factor w/ 1089 levels "B\xedo-B\xedo",..: 500 62 62 800 800 800 423 423 423 423 ...
#> $ Order.Status : Factor w/ 9 levels "CANCELED","CLOSED",..: 3 6 2 3 7 1 3 8 2 2 ...
#> $ Order.Zipcode : int NA NA NA NA NA NA NA NA NA NA ...
#> $ Product.Card.Id : int 1360 1360 1360 1360 1360 1360 1360 1360 1360 1360 ...
#> $ Product.Category.Id : int 73 73 73 73 73 73 73 73 73 73 ...
#> $ Product.Description : logi NA NA NA NA NA NA ...
#> $ Product.Image : chr "http://images.acmesports.sports/Smart+watch " "http://images.acmesports.sports/Smart+watch " "http://images.acmesports.sports/Smart+watch " "http://images.acmesports.sports/Smart+watch " ...
#> $ Product.Name : Factor w/ 118 levels "adidas Brazuca 2014 Official Match Ball",..: 83 83 83 83 83 83 83 83 83 83 ...
#> $ Product.Price : num 328 328 328 328 328 ...
#> $ Product.Status : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ shipping.date..DateOrders. : chr "2/3/2018 22:56" "1/18/2018 12:27" "1/17/2018 12:06" "1/16/2018 11:45" ...
#> $ Shipping.Mode : Factor w/ 4 levels "First Class",..: 4 4 4 4 4 4 1 1 3 1 ...
#Applied Library Lubridate to change typical column date to correct form
library(lubridate)
df$order.date..DateOrders. <- mdy_hm(df$order.date..DateOrders.)
df$shipping.date..DateOrders. <- mdy_hm(df$shipping.date..DateOrders.)str(df$order.date..DateOrders.)#> POSIXct[1:180519], format: "2018-01-31 22:56:00" "2018-01-13 12:27:00" "2018-01-13 12:06:00" ...
str(df$shipping.date..DateOrders.)#> POSIXct[1:180519], format: "2018-02-03 22:56:00" "2018-01-18 12:27:00" "2018-01-17 12:06:00" ...
#Change date time to EST Zone
df$order.date..DateOrders. <- with_tz(df$order.date..DateOrders., tz="EST")
df$shipping.date..DateOrders. <- with_tz(df$shipping.date..DateOrders., tz="EST")
str(df$order.date..DateOrders.)#> POSIXct[1:180519], format: "2018-01-31 17:56:00" "2018-01-13 07:27:00" "2018-01-13 07:06:00" ...
str(df$shipping.date..DateOrders.)#> POSIXct[1:180519], format: "2018-02-03 17:56:00" "2018-01-18 07:27:00" "2018-01-17 07:06:00" ...
Name of Countries not in correct as usual, we need to change a correct form
#Due to our character names of countries is "non-ASCII" we are using iconv with id ASCII and UTF-8
df$Order.Country <- iconv(df$Order.Country,"ASCII","UTF-8") str(df)#> 'data.frame': 180519 obs. of 53 variables:
#> $ Type : Factor w/ 4 levels "CASH","DEBIT",..: 2 4 1 2 3 4 2 4 1 1 ...
#> $ Days.for.shipping..real. : int 3 5 4 3 2 6 2 2 3 2 ...
#> $ Days.for.shipment..scheduled.: int 4 4 4 4 4 4 1 1 2 1 ...
#> $ Benefit.per.order : num 91.2 -249.1 -247.8 22.9 134.2 ...
#> $ Sales.per.customer : num 315 311 310 305 298 ...
#> $ Delivery.Status : Factor w/ 4 levels "Advance shipping",..: 1 2 4 1 1 3 2 2 2 2 ...
#> $ Late_delivery_risk : int 0 1 0 0 0 0 1 1 1 1 ...
#> $ Category.Id : Factor w/ 51 levels "2","3","4","5",..: 48 48 48 48 48 48 48 48 48 48 ...
#> $ Category.Name : Factor w/ 50 levels "Accessories",..: 41 41 41 41 41 41 41 41 41 41 ...
#> $ Customer.City : Factor w/ 563 levels "Aguadilla","Alameda",..: 67 67 453 286 67 506 67 319 67 458 ...
#> $ Customer.Country : Factor w/ 2 levels "EE. UU.","Puerto Rico": 2 2 1 1 2 1 2 1 2 1 ...
#> $ Customer.Email : chr "XXXXXXXXX" "XXXXXXXXX" "XXXXXXXXX" "XXXXXXXXX" ...
#> $ Customer.Fname : chr "Cally" "Irene" "Gillian" "Tana" ...
#> $ Customer.Id : Factor w/ 20652 levels "1","2","3","4",..: 20650 19387 19386 19385 19384 19383 19382 19381 19380 19379 ...
#> $ Customer.Lname : chr "Holloway" "Luna" "Maldonado" "Tate" ...
#> $ Customer.Password : chr "XXXXXXXXX" "XXXXXXXXX" "XXXXXXXXX" "XXXXXXXXX" ...
#> $ Customer.Segment : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 1 3 2 1 3 2 2 2 ...
#> $ Customer.State : Factor w/ 46 levels "91732","95758",..: 37 37 6 6 37 32 37 11 37 6 ...
#> $ Customer.Street : Factor w/ 7458 levels "1 Bright Manor",..: 3684 1401 6218 1804 6346 967 767 5489 910 6741 ...
#> $ Customer.Zipcode : Factor w/ 995 levels "603","612","674",..: 8 8 942 721 8 174 8 341 8 923 ...
#> $ Department.Id : Factor w/ 11 levels "2","3","4","5",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ Department.Name : Factor w/ 11 levels "Apparel","Book Shop",..: 5 5 5 5 5 5 5 5 5 5 ...
#> $ Latitude : num 18.3 18.3 37.3 34.1 18.3 ...
#> $ Longitude : num -66 -66 -122 -118 -66 ...
#> $ Market : Factor w/ 5 levels "Africa","Europe",..: 4 4 4 4 4 4 4 4 4 4 ...
#> $ Order.City : Factor w/ 3597 levels "Zany\xe1n","S\xe3o Jos\xe9 dos Campos",..: 491 549 549 3190 3190 3172 1335 1335 1335 1335 ...
#> $ Order.Country : chr "Indonesia" "India" "India" "Australia" ...
#> $ Order.Customer.Id : int 20755 19492 19491 19490 19489 19488 19487 19486 19485 19484 ...
#> $ order.date..DateOrders. : POSIXct, format: "2018-01-31 17:56:00" "2018-01-13 07:27:00" ...
#> $ Order.Id : int 77202 75939 75938 75937 75936 75935 75934 75933 75932 75931 ...
#> $ Order.Item.Cardprod.Id : int 1360 1360 1360 1360 1360 1360 1360 1360 1360 1360 ...
#> $ Order.Item.Discount : num 13.1 16.4 18 22.9 29.5 ...
#> $ Order.Item.Discount.Rate : num 0.04 0.05 0.06 0.07 0.09 ...
#> $ Order.Item.Id : int 180517 179254 179253 179252 179251 179250 179249 179248 179247 179246 ...
#> $ Order.Item.Product.Price : num 328 328 328 328 328 ...
#> $ Order.Item.Profit.Ratio : num 0.29 -0.8 -0.8 0.08 0.45 ...
#> $ Order.Item.Quantity : int 1 1 1 1 1 1 1 1 1 1 ...
#> $ Sales : num 328 328 328 328 328 ...
#> $ Order.Item.Total : num 315 311 310 305 298 ...
#> $ Order.Profit.Per.Order : num 91.2 -249.1 -247.8 22.9 134.2 ...
#> $ Order.Region : Factor w/ 23 levels "Canada","Caribbean",..: 16 14 14 12 12 12 8 8 8 8 ...
#> $ Order.State : Factor w/ 1089 levels "B\xedo-B\xedo",..: 500 62 62 800 800 800 423 423 423 423 ...
#> $ Order.Status : Factor w/ 9 levels "CANCELED","CLOSED",..: 3 6 2 3 7 1 3 8 2 2 ...
#> $ Order.Zipcode : int NA NA NA NA NA NA NA NA NA NA ...
#> $ Product.Card.Id : int 1360 1360 1360 1360 1360 1360 1360 1360 1360 1360 ...
#> $ Product.Category.Id : int 73 73 73 73 73 73 73 73 73 73 ...
#> $ Product.Description : logi NA NA NA NA NA NA ...
#> $ Product.Image : chr "http://images.acmesports.sports/Smart+watch " "http://images.acmesports.sports/Smart+watch " "http://images.acmesports.sports/Smart+watch " "http://images.acmesports.sports/Smart+watch " ...
#> $ Product.Name : Factor w/ 118 levels "adidas Brazuca 2014 Official Match Ball",..: 83 83 83 83 83 83 83 83 83 83 ...
#> $ Product.Price : num 328 328 328 328 328 ...
#> $ Product.Status : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ shipping.date..DateOrders. : POSIXct, format: "2018-02-03 17:56:00" "2018-01-18 07:27:00" ...
#> $ Shipping.Mode : Factor w/ 4 levels "First Class",..: 4 4 4 4 4 4 1 1 3 1 ...
DataCo SMART SUPPLY CHAIN, is the one of the enormous logistic company moving in 3PL help many companies to fulfill their demands all world as DataCo’s Client.
We want to know based on data-set, how many transaction has been handled by DataCo
# Using Dimension of Table
nrow(df)#> [1] 180519
📈Insight :
Summary table of df.
summary(df)#> Type Days.for.shipping..real. Days.for.shipment..scheduled.
#> CASH :19616 Min. :0.000 Min. :0.000
#> DEBIT :69295 1st Qu.:2.000 1st Qu.:2.000
#> PAYMENT :41725 Median :3.000 Median :4.000
#> TRANSFER:49883 Mean :3.498 Mean :2.932
#> 3rd Qu.:5.000 3rd Qu.:4.000
#> Max. :6.000 Max. :4.000
#>
#> Benefit.per.order Sales.per.customer Delivery.Status
#> Min. :-4274.98 Min. : 7.49 Advance shipping :41592
#> 1st Qu.: 7.00 1st Qu.: 104.38 Late delivery :98977
#> Median : 31.52 Median : 163.99 Shipping canceled: 7754
#> Mean : 21.98 Mean : 183.11 Shipping on time :32196
#> 3rd Qu.: 64.80 3rd Qu.: 247.40
#> Max. : 911.80 Max. :1939.99
#>
#> Late_delivery_risk Category.Id Category.Name
#> Min. :0.0000 17 :24551 Cleats :24551
#> 1st Qu.:0.0000 18 :22246 Men's Footwear :22246
#> Median :1.0000 24 :21035 Women's Apparel :21035
#> Mean :0.5483 46 :19298 Indoor/Outdoor Games:19298
#> 3rd Qu.:1.0000 45 :17325 Fishing :17325
#> Max. :1.0000 48 :15540 Water Sports :15540
#> (Other):60524 (Other) :60524
#> Customer.City Customer.Country Customer.Email
#> Caguas :66770 EE. UU. :111146 Length:180519
#> Chicago : 3885 Puerto Rico: 69373 Class :character
#> Los Angeles : 3417 Mode :character
#> Brooklyn : 3412
#> New York : 1816
#> Philadelphia: 1577
#> (Other) :99642
#> Customer.Fname Customer.Id Customer.Lname Customer.Password
#> Length:180519 5654 : 47 Length:180519 Length:180519
#> Class :character 5004 : 45 Class :character Class :character
#> Mode :character 10591 : 45 Mode :character Mode :character
#> 3708 : 44
#> 5715 : 44
#> 9371 : 44
#> (Other):180250
#> Customer.Segment Customer.State Customer.Street
#> Consumer :93504 PR :69373 9126 Wishing Expressway : 122
#> Corporate :54789 CA :29223 4388 Burning Goose Ridge: 117
#> Home Office:32226 NY :11327 4720 Noble Hills Wynd : 116
#> TX : 9103 2878 Hazy Wagon Thicket: 113
#> IL : 7631 398 Emerald Grove : 109
#> FL : 5456 3593 Blue Brook Acres : 108
#> (Other):48406 (Other) :179834
#> Customer.Zipcode Department.Id Department.Name Latitude
#> 725 : 66770 7 :66861 Fan Shop:66861 Min. :-33.94
#> 921 : 337 4 :48998 Apparel :48998 1st Qu.: 18.27
#> 23455 : 334 5 :33220 Golf :33220 Median : 33.14
#> 957 : 297 3 :14525 Footwear:14525 Mean : 29.72
#> 79109 : 292 6 : 9686 Outdoors: 9686 3rd Qu.: 39.28
#> (Other):112486 2 : 2479 Fitness : 2479 Max. : 48.78
#> NA's : 3 (Other): 4750 (Other) : 4750
#> Longitude Market Order.City
#> Min. :-158.03 Africa :11614 Santo Domingo: 2211
#> 1st Qu.: -98.45 Europe :50252 New York City: 2202
#> Median : -76.85 LATAM :51594 Los Angeles : 1845
#> Mean : -84.92 Pacific Asia:41260 Tegucigalpa : 1783
#> 3rd Qu.: -66.37 USCA :25799 Managua : 1682
#> Max. : 115.26 Mexico City : 1484
#> (Other) :169312
#> Order.Country Order.Customer.Id order.date..DateOrders.
#> Length:180519 Min. : 1 Min. :2014-12-31 19:00:00.00
#> Class :character 1st Qu.: 3258 1st Qu.:2015-09-21 08:49:00.00
#> Mode :character Median : 6457 Median :2016-06-11 08:06:00.00
#> Mean : 6691 Mean :2016-06-12 12:47:04.67
#> 3rd Qu.: 9779 3rd Qu.:2017-03-01 03:42:00.00
#> Max. :20757 Max. :2018-01-31 18:38:00.00
#>
#> Order.Id Order.Item.Cardprod.Id Order.Item.Discount
#> Min. : 1 Min. : 19.0 Min. : 0.00
#> 1st Qu.:18057 1st Qu.: 403.0 1st Qu.: 5.40
#> Median :36140 Median : 627.0 Median : 14.00
#> Mean :36222 Mean : 692.5 Mean : 20.66
#> 3rd Qu.:54144 3rd Qu.:1004.0 3rd Qu.: 29.99
#> Max. :77204 Max. :1363.0 Max. :500.00
#>
#> Order.Item.Discount.Rate Order.Item.Id Order.Item.Product.Price
#> Min. :0.0000 Min. : 1 Min. : 9.99
#> 1st Qu.:0.0400 1st Qu.: 45131 1st Qu.: 50.00
#> Median :0.1000 Median : 90260 Median : 59.99
#> Mean :0.1017 Mean : 90260 Mean : 141.23
#> 3rd Qu.:0.1600 3rd Qu.:135390 3rd Qu.: 199.99
#> Max. :0.2500 Max. :180519 Max. :1999.99
#>
#> Order.Item.Profit.Ratio Order.Item.Quantity Sales
#> Min. :-2.7500 Min. :1.000 Min. : 9.99
#> 1st Qu.: 0.0800 1st Qu.:1.000 1st Qu.: 119.98
#> Median : 0.2700 Median :1.000 Median : 199.92
#> Mean : 0.1206 Mean :2.128 Mean : 203.77
#> 3rd Qu.: 0.3600 3rd Qu.:3.000 3rd Qu.: 299.95
#> Max. : 0.5000 Max. :5.000 Max. :1999.99
#>
#> Order.Item.Total Order.Profit.Per.Order Order.Region
#> Min. : 7.49 Min. :-4274.98 Central America:28341
#> 1st Qu.: 104.38 1st Qu.: 7.00 Western Europe :27109
#> Median : 163.99 Median : 31.52 South America :14935
#> Mean : 183.11 Mean : 21.98 Oceania :10148
#> 3rd Qu.: 247.40 3rd Qu.: 64.80 Northern Europe: 9792
#> Max. :1939.99 Max. : 911.80 Southeast Asia : 9539
#> (Other) :80655
#> Order.State Order.Status Order.Zipcode
#> Inglaterra : 6722 COMPLETE :59491 Min. : 1040
#> California : 4966 PENDING_PAYMENT:39832 1st Qu.:23464
#> Isla de Francia : 4580 PROCESSING :21902 Median :59405
#> Renania del Norte-Westfalia: 3303 PENDING :20227 Mean :55426
#> San Salvador : 3055 CLOSED :19616 3rd Qu.:90008
#> Nueva York : 2753 ON_HOLD : 9804 Max. :99301
#> (Other) :155140 (Other) : 9647 NA's :155679
#> Product.Card.Id Product.Category.Id Product.Description Product.Image
#> Min. : 19.0 Min. : 2.00 Mode:logical Length:180519
#> 1st Qu.: 403.0 1st Qu.:18.00 NA's:180519 Class :character
#> Median : 627.0 Median :29.00 Mode :character
#> Mean : 692.5 Mean :31.85
#> 3rd Qu.:1004.0 3rd Qu.:45.00
#> Max. :1363.0 Max. :76.00
#>
#> Product.Name Product.Price
#> Perfect Fitness Perfect Rip Deck :24515 Min. : 9.99
#> Nike Men's CJ Elite 2 TD Football Cleat :22246 1st Qu.: 50.00
#> Nike Men's Dri-FIT Victory Golf Polo :21035 Median : 59.99
#> O'Brien Men's Neoprene Life Vest :19298 Mean : 141.23
#> Field & Stream Sportsman 16 Gun Fire Safe:17325 3rd Qu.: 199.99
#> Pelican Sunstream 100 Kayak :15500 Max. :1999.99
#> (Other) :60600
#> Product.Status shipping.date..DateOrders. Shipping.Mode
#> Min. :0 Min. :2015-01-02 19:00:00.0 First Class : 27814
#> 1st Qu.:0 1st Qu.:2015-09-25 01:59:00.0 Same Day : 9737
#> Median :0 Median :2016-06-15 03:32:00.0 Second Class : 35216
#> Mean :0 Mean :2016-06-16 00:45:23.2 Standard Class:107752
#> 3rd Qu.:0 3rd Qu.:2017-03-04 16:29:00.0
#> Max. :0 Max. :2018-02-06 17:14:00.0
#>
Basic Thinking Way :
For this reports, I’m trying to analyze 180,519 transactions in DataCo which distributed in 164 different countries in 5 markets and 23 Regions. the insight will be divided into 3 types :
📌 General Knowledge Insight of business process :
📌 Negative Info of business process :
📌 Search for additional information in the data-set :
In this general knowledge, we want to focus every market country and their items
#Calculate Market for frequent
a <-as.data.frame(table(df$Market))library(dplyr)
library(glue)
library(plotly)fig <- plot_ly(a, labels = ~Var1, values = ~Freq, type = 'pie',
textposition = 'inside',
textinfo = 'label+percent',
insidetextfont = list(color = '#FFFFFF'),
hoverinfo = 'text',
text = ~paste('Order', Freq),
marker = list(colors = "",
line = list(color = '#FFFFFF', width = 1)),
#The 'pull' attribute can also be used to create space between the sectors
showlegend = FALSE)
fig <- fig %>% layout(title = 'The Most Order Market',
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
fig📈Insight :
#Calculate Region for frequent (Big 5 Regions)
b <- as.data.frame(table(df$Order.Region))
b1 <- head(b[order(b$Freq, decreasing=T),],5)
b1 <- b1 %>%
mutate(label=glue("Regions : {Var1}
Count Order : {Freq}"))
p2 <- ggplot(data= b1 , mapping = aes(y=reorder(Var1,Freq),x=Freq, text=label)) +
geom_col(aes(fill=Freq)) + scale_fill_gradient(low ='#90e0ef', high ='#415a77') +
theme(legend.position = 'none' )
p2 <- ggplotly(p2,tooltip="text")# Make it table as C contains frequency of country (Big 5 Country)
c <- as.data.frame(table(df$Order.Country))
c1<-head(c[order(c$Freq, decreasing=T),],n=5)
c1 <- c1 %>%
mutate(label=glue("Country : {Var1}
Count Order : {Freq}"))
p3 <- ggplot(data= c1 , mapping = aes(y=reorder(Var1,Freq),x=Freq, text=label)) +
geom_col(aes(fill=Freq)) + scale_fill_gradient(low ='#90e0ef', high ='#415a77') +
theme(legend.position = 'none' )
p3 <- ggplotly(p3,tooltip="text")subplot(p2,p3, nrows=2, margin = 0.08) %>%
layout(annotations = list(
list(x=0.2 , y = 1.0, text = "Big 5 Regions",xref = "paper",
yref = "paper",
xanchor = "center",
yanchor = "bottom",showarrow = FALSE),
list(x = 0.2 , y = 0.45, text = "Big 5 Countries",xref = "paper",
yref = "paper",
xanchor = "center",
yanchor = "bottom",showarrow = FALSE)))📈Insight :
In this Positive Info of business process, we want to focus to know well about :
We need to know first time of order and the last time, we can use this code
time1 <- sort(df$order.date..DateOrders.)
head(time1,n=1)#> [1] "2014-12-31 19:00:00 EST"
tail(time1,n=1)#> [1] "2018-01-31 18:38:00 EST"
📈Insight :
datacoprofit <- df %>%
mutate(year = year(order.date..DateOrders.)) %>%
mutate(month = month(order.date..DateOrders.)) %>%
mutate(month_year = my(paste(month,year, sep = " "))) %>%
group_by(month_year) %>%
summarise(total=sum(Order.Profit.Per.Order))
datacoprofitdatacoprofit <- datacoprofit %>%
mutate(label = glue("Period : {month_year}
Total Profit per Order : {total}")) %>%
ggplot() +
geom_line(mapping = aes(x=month_year,y = total)) +
geom_jitter(aes(x=month_year,y = total, text = label))+
labs(title= "Profit Data Co",y = "Total Profit per Order", x = "Year")
ggplotly(datacoprofit, tooltip="text")📈Insight :
datasalesvalue <- df %>%
mutate(year = year(order.date..DateOrders.)) %>%
mutate(month = month(order.date..DateOrders.)) %>%
mutate(month_year = my(paste(month,year, sep = " "))) %>%
group_by(month_year) %>%
summarise(total=sum(Sales))datasalesvalue <- datasalesvalue %>%
mutate(label = glue("Period : {month_year}
Total Profit per Order : {total}")) %>%
ggplot() +
geom_line(mapping = aes(x=month_year,y = total)) +
geom_jitter(aes(x=month_year,y = total, text = label)) +
labs(title= "Sales Value Data Co",y = "Total Sales Value", x = "Year")
ggplotly(datasalesvalue, tooltip="text")📈Insight :
del_status <- as.data.frame(table(df$Delivery.Status))
del_status <- del_status[order(del_status$Freq, decreasing=T),]
c3 <- del_status %>%
mutate(label=glue("Delivery Status : {Var1}
Count Order : {Freq}"))
c2 <- ggplot(data= c3 , mapping = aes(y=reorder(Var1,Freq),x=Freq, text=label)) +
geom_col(aes(fill=Freq)) + scale_fill_gradient(low ='#90e0ef', high ='#415a77') +
labs(y="Delivery Status")+
theme(legend.position = 'none')
ggplotly(c2,tooltip="text")📈Insight :
order_status <- as.data.frame(table(df$Order.Status))
order_status <- order_status[order(order_status$Freq, decreasing=T),]
c4 <- order_status %>%
mutate(label=glue("Status Payment : {Var1}
Count Order : {Freq}"))
c5 <- ggplot(data= c4 , mapping = aes(y=reorder(Var1,Freq),x=Freq, text=label)) +
geom_col(aes(fill=Freq)) + scale_fill_gradient(low ='#90e0ef', high ='#415a77') +
labs(y="Status Payment")+
theme(legend.position = 'none')
ggplotly(c5,tooltip="text")📈Insight :
#Make a new dataset of Indonesia
ind_city <- df %>%
group_by(Order.City, Order.Country) %>%
filter(Order.Country == "Indonesia") %>%
summarise(total = sum(Order.Item.Quantity)) %>%
mutate(label = glue ( "Order City = {Order.City}
Total Order = {total}"
)) %>%
arrange(-total)
ind_cityind_city <- ind_city %>%
ggplot()+
geom_col(mapping=aes(y=reorder(Order.City,total),x=total, fill=total, text = label))+
scale_fill_gradient(low ='#90e0ef', high ='#415a77') +
labs(y="Order City", x="Total Order")+
theme(legend.position = 'none')
ggplotly(ind_city, tooltip="text")Insight :
ind_prod <- df %>%
group_by(Product.Name, Category.Name) %>%
summarise(total = sum(Order.Item.Quantity)) %>%
arrange(- total)
ind_prod📈Insight :
ind <- df %>%
group_by(Order.City, Order.Country) %>%
filter(Order.Country == "Indonesia")
indplot(xtabs(ind$Sales ~ Customer.Segment + Shipping.Mode , ind))
📈Insight :
ind1 <- df %>%
group_by(Order.Country, Type) %>%
filter(Order.Country == "Indonesia") %>%
filter(Customer.Segment =="Consumer") %>%
filter(Type == "DEBIT") %>%
filter(Delivery.Status == "Late delivery") %>%
count() %>%
rename(total = n) %>%
summarise(sum = (total))
ind1plot(xtabs(ind$Sales ~ Customer.Segment + Shipping.Mode + Type , ind))
📈Insight :
datasetship <- df %>%
mutate(year = year(order.date..DateOrders.)) %>%
mutate(quarter = quarter(order.date..DateOrders.)) %>%
mutate(year_quarter = as.factor(paste(year, "Q", quarter))) %>%
group_by(Shipping.Mode,year_quarter, Delivery.Status,Order.Country) %>%
count(Shipping.Mode) %>%
rename(total = n) %>%
filter(Order.Country == "Indonesia") %>%
filter(Delivery.Status == "Late delivery") %>%
mutate(label=glue("Shipping Mode : {Shipping.Mode}
Total : {total}")) %>%
ggplot() +
geom_line(aes(x = year_quarter, y = total, color = Shipping.Mode, group = Delivery.Status)) +
geom_point(aes(x = year_quarter, y = total, color = Shipping.Mode, group = Delivery.Status))+
scale_fill_brewer(palette = "Set3") +
theme(legend.position = "none")
ggplotly(datasetship)📈Insight :
ind2 <- df %>%
mutate(year = year(order.date..DateOrders.)) %>%
mutate(quarter = quarter(order.date..DateOrders.)) %>%
mutate(year_quarter = as.factor(paste(year, "Q", quarter))) %>%
group_by(year_quarter, Order.Country) %>%
filter(Order.Country == "Indonesia") %>%
mutate(total=sum(Order.Item.Quantity)) %>%
count() %>%
rename(total = n) %>%
mutate(label=glue("Year Quarter : {year_quarter}
Total Order : {total}")) %>%
ggplot() +
geom_col(aes(x = year_quarter, y = total)) +
theme(legend.position = "none")
ggplotly(ind2)📈Insight :