E-commerce Shipping Analysis

Author

Joe Galuppo

Introduction

Today, we are going to take a deep dive into a data set that can help online entrepreneurs understand their supply chain in depth, using shipping data from an electronics e-commerce company. The data includes nominal and ordinal variables that offer insight into which factors are most strongly associated with an effective supply chain process. The variables and what each one means are listed below:

Variable              Description
ID                    Each specific event is given an ID number to track its process
Warehouse_block       Warehouse block at the electronic company (A, B, C, D, F)
Mode_of_Shipment      How each delivery's transportation is conducted
Customer_Care_Calls   Number of calls made to ensure the product was shipped safely
Customer_Rating       Rating given to the company for delivering its product (1, lowest; 5, highest)
Cost_of_Product       How much the product costs
Prior_Purchases       How many prior purchases the customer has made
Product_Importance    How the company rates its most important products (low - high)
Gender                Gender of the customer
Discount_Offered      Discount offered to a specific customer
Weight                Weight of the product in grams
Reached_on_Time       If the product reached its destination on time or not

Analysis

This data set gives us lots of interesting data that can help us answer the following questions:

What steps can we take to ensure the product arrives on time?
What can we do to earn a good rating?
Which factors correlate most strongly with importance?
How do our ratings compare to other companies?

I believe these questions are interesting because answering them helps determine what a great supply chain looks like. By looking at all the components, we can see what a company can do to make its supply chain great! I also find this data very rich, with lots of hidden patterns that we cannot see just by looking at it. Only an analysis will be able to tell us!

Process Optimization

To begin, we will look at the best way to deliver our products. We can identify the optimized path by finding which mode of shipment, which warehouse block, and which level of product importance most often results in on-time delivery. By comparing this to the most frequently used path, we can show the company that switching to the more effective path would let it pass value downstream to customers through cost savings. The first step in determining what a great supply chain looks like is to measure the effectiveness and frequency of each path. We start with the completion rate (the percentage of shipments that reached their destination on time) for each mode of shipment, to see which is the most reliable way to ship our products.

Mode_of_Shipment  Total_Shipping_Count  Reached.on.time.sum  completion_rate_mode
Flight            1777                  1069                 60.15757
Road              1760                  1035                 58.80682
Ship              7462                  4459                 59.75610

As we can see here, the three modes of shipment have very similar completion rates, all within about 1.4 percentage points of each other. Flight has the highest rate (60.2%), followed by ship (59.8%) and road (58.8%). Assuming that the cost of shipping is similar across the three modes, flight would be the best way to send our products.

However, according to the graph, ship is by far the most commonly used mode of shipment. Since we don’t know which company this is, we can only assume that this is either because it is not near an easily accessible airport or because its warehouses are located close to a harbor.
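The completion-rate table above can be sketched in a few lines of base R. The original CSV is not part of this report, so the snippet rebuilds row-level data from the summary counts in the table; the data frame names (`counts`, `shipments`, `summary_mode`) are illustrative, and it assumes, as the table does, that a value of 1 in `Reached.on.time` means the shipment arrived on time (worth confirming against the data dictionary).

```r
# Summary counts copied from the table above (stand-in for the raw file).
counts <- data.frame(
  Mode_of_Shipment = c("Flight", "Road", "Ship"),
  total            = c(1777, 1760, 7462),
  on_time          = c(1069, 1035, 4459),
  stringsAsFactors = FALSE
)

# Rebuild one row per shipment: 1 = reached on time (assumed), 0 = did not.
shipments <- data.frame(
  Mode_of_Shipment = rep(counts$Mode_of_Shipment, times = counts$total),
  Reached.on.time  = unlist(Map(
    function(n, k) c(rep(1, k), rep(0, n - k)),
    counts$total, counts$on_time
  )),
  stringsAsFactors = FALSE
)

# Group by mode: number of shipments, number on time, and rate in percent.
total   <- tapply(shipments$Reached.on.time, shipments$Mode_of_Shipment, length)
on_time <- tapply(shipments$Reached.on.time, shipments$Mode_of_Shipment, sum)

summary_mode <- data.frame(
  Mode_of_Shipment     = names(total),
  Total_Shipping_Count = as.vector(total),
  Reached.on.time.sum  = as.vector(on_time),
  completion_rate_mode = as.vector(100 * on_time / total),
  stringsAsFactors     = FALSE
)

print(summary_mode)
```

The same pattern applies to any other grouping column, such as the warehouse block or the product importance, by swapping the grouping vector passed to `tapply()`.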

To determine which warehouse block performs best, we can do a similar analysis.

Warehouse_block  Total_Shipping_Count  Reached.on.time.sum  completion_rate_wh
A                1833                  1075                 58.64703
B                1833                  1104                 60.22913
C                1833                  1094                 59.68358
D                1834                  1096                 59.76009
F                3666                  2194                 59.84725

Again, we can see similar results for each of the warehouse blocks. However, understanding why warehouse block F handles twice the volume of any other block, rather than the slightly more efficient warehouse block B, could help supply chain managers reduce costs for their downstream customers.
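Because the warehouse completion rates are so close, it may be worth checking whether the differences are larger than chance alone would produce. The sketch below is not part of the original analysis: it runs a chi-squared test of independence on the counts from the warehouse table, with the object names (`warehouse`, `counts`, `test`) chosen only for illustration and with "not on time" derived as total minus on-time.

```r
# Counts copied from the warehouse table above.
warehouse <- data.frame(
  block   = c("A", "B", "C", "D", "F"),
  total   = c(1833, 1833, 1833, 1834, 3666),
  on_time = c(1075, 1104, 1094, 1096, 2194),
  stringsAsFactors = FALSE
)

# 5 x 2 contingency table: on-time versus not-on-time shipments per block.
counts <- cbind(
  on_time     = warehouse$on_time,
  not_on_time = warehouse$total - warehouse$on_time
)
rownames(counts) <- warehouse$block

# Chi-squared test of independence between block and on-time outcome.
test <- chisq.test(counts)
print(test)
```

A large p-value here would suggest that the gap between block B and the other blocks is within normal sampling variation, which is useful context before rerouting volume on the strength of a fraction of a percentage point.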

This graph helps us understand the main process by which this electronics company supplies goods to its customers. By following the main source of volume, we can determine the path most often followed. Right now, most shipments go through warehouse block F as the main process flow. However, by routing more shipments through the more effective warehouse block B, the company could reduce costs, even if the difference is marginal.

Finally, I believe that the priority assigned to our products matters in our supply chain. Distinguishing between high-importance and low-importance packages could make a difference in what the optimal process would be.

We are going to do a similar analysis to determine the best process path for our products to take.

Product_importance  Total_Shipping_Count  Reached.on.time.sum  completion_rate_pi
high                948                   616                  64.97890
low                 5297                  3140                 59.27884
medium              4754                  2807                 59.04501

Here, we can see both the most effective route and the most often taken route. Products labeled as high importance reach their delivery point on time most often (about 65%). However, deliveries are far more commonly labeled as low importance, and those arrive on time less often (about 59%). Therefore, my recommendation to this company would be to start treating all deliveries as if they were high-importance items, to increase the completion rate and thus deliver downstream value to their customers.

We see how often each level of product importance is assigned and how often products at each level reached their destination on time. Again, instead of making a categorical distinction between products, the company should treat them more alike to improve process efficiency.

After finding the effectiveness and frequency of the different factors, we can come up with a plan for this company to use its most efficient options and achieve a higher on-time rate! Right now, the company sends the majority of its products by ship, through warehouse block F, and at low importance. However, by using flight, warehouse block B, and high importance for the majority of the products going through its process, it would create a more effective process, saving the company and its customers money.
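The comparison between the most frequently used path and the most effective path can be sketched directly from the three summary tables. The counts below are copied from those tables; the list and column names (`summaries`, `paths`, `rate_gap`) are illustrative choices, not names from the original analysis.

```r
# Summary counts copied from the three tables in this section.
summaries <- list(
  Mode_of_Shipment = data.frame(
    level = c("Flight", "Road", "Ship"),
    total = c(1777, 1760, 7462),
    on_time = c(1069, 1035, 4459),
    stringsAsFactors = FALSE
  ),
  Warehouse_block = data.frame(
    level = c("A", "B", "C", "D", "F"),
    total = c(1833, 1833, 1833, 1834, 3666),
    on_time = c(1075, 1104, 1094, 1096, 2194),
    stringsAsFactors = FALSE
  ),
  Product_importance = data.frame(
    level = c("high", "low", "medium"),
    total = c(948, 5297, 4754),
    on_time = c(616, 3140, 2807),
    stringsAsFactors = FALSE
  )
)

# For each variable, find the busiest level, the level with the highest
# completion rate, and the gap between the two in percentage points.
paths <- do.call(rbind, lapply(names(summaries), function(v) {
  s    <- summaries[[v]]
  rate <- 100 * s$on_time / s$total
  data.frame(
    variable       = v,
    most_frequent  = s$level[which.max(s$total)],
    most_effective = s$level[which.max(rate)],
    rate_gap       = max(rate) - rate[which.max(s$total)],
    stringsAsFactors = FALSE
  )
}))

print(paths)
```

The `rate_gap` column makes the size of each potential improvement explicit: it is well under one percentage point for the mode of shipment and the warehouse block, and several points for product importance.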

Rating Analysis

Next, we want to determine which factors contribute the most to customer rating. We will do a similar analysis to see how the number of customer care calls, the number of prior purchases, gender, and the discount offered affect the rating.

Our average customer rating follows. It is very low, and we need to identify where in the data we can improve.

[1] 2.990545
Correlation Matrix:
                     Customer_rating  Customer_care_calls  Prior_purchases  Discount_offered   Gender
Customer_rating               1.0000               0.0122           0.0132           -0.0031   0.0028
Customer_care_calls           0.0122               1.0000           0.1808           -0.1308   0.0025
Prior_purchases               0.0132               0.1808           1.0000           -0.0828  -0.0094
Discount_offered             -0.0031              -0.1308          -0.0828            1.0000  -0.0118
Gender                        0.0028               0.0025          -0.0094           -0.0118   1.0000

From these correlations we can see which variables are most correlated with customer rating. The number of prior purchases (0.0132) and the number of customer care calls (0.0122) have the largest correlations with customer rating, although every coefficient is very close to zero, so these are weak linear relationships at best. Improving on these factors could still help us improve our company’s low rating. Gender (0.0028) and discount offered (-0.0031) are essentially uncorrelated with rating, so we do not need to focus on them.
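As a rough sketch of how an average rating and a correlation matrix like the one above could be produced, the snippet below uses randomly generated toy data, not the report's data set, so its numbers will not match the output shown. The value ranges and the 0/1 encoding of `Gender` are assumptions made only so the example runs on its own.

```r
set.seed(42)  # toy data only; values are random, not from the real file
n <- 200
toy <- data.frame(
  Customer_rating     = sample(1:5, n, replace = TRUE),
  Customer_care_calls = sample(2:7, n, replace = TRUE),
  Prior_purchases     = sample(2:10, n, replace = TRUE),
  Discount_offered    = sample(1:65, n, replace = TRUE),
  Gender              = sample(c("F", "M"), n, replace = TRUE),
  stringsAsFactors    = FALSE
)

# cor() needs numeric columns, so encode Gender as 0/1 (assumed encoding).
toy$Gender <- as.integer(toy$Gender == "M")

# Average rating, then pairwise Pearson correlations rounded to 4 places.
avg_rating <- mean(toy$Customer_rating)
cor_matrix <- round(cor(toy), 4)

print(avg_rating)
print(cor_matrix)
```

Since the rating is an ordinal 1 to 5 score, a rank-based alternative such as `cor(toy, method = "spearman")` may be a reasonable cross-check on the Pearson values.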

From this graph we can see some interesting relationships between the number of customer care calls and the rating the company receives. Interestingly, customers with 3 calls are far more likely to rate the company a 1 than customers with any other number of calls. The company should investigate why this is the case and whether there is a real relationship behind it. It appears that at 2, 5, 6, and 7 calls, customers are more likely to rate the company higher than at 3 and 4 calls. Understanding why this is the case and breaking down the internal process may help the company earn a better rating.
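The relationship described above can also be read from a table of rating shares within each call count, which is one way the numbers behind such a graph could be tabulated. The sketch below uses random toy data (the columns `calls` and `rating` are stand-ins, not the real file), so it shows the technique only and will not reproduce the pattern in the graph.

```r
set.seed(1)  # toy data only; not the report's data set
n <- 500
toy <- data.frame(
  calls  = sample(2:7, n, replace = TRUE),
  rating = sample(1:5, n, replace = TRUE)
)

# Cross-tabulate call count against rating.
counts <- table(calls = toy$calls, rating = toy$rating)

# Convert each row to shares, so every call count sums to 1 across ratings.
shares <- prop.table(counts, margin = 1)

print(round(100 * shares, 1))
```

Reading across a row gives the percentage of customers with that many calls who left each rating, which makes call counts with very different volumes directly comparable.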

Next, we should determine whether prior purchases influence the rating.

Here we can see what kind of rating a customer is likely to leave based on how many purchases they have made in the past. Customers with 2 or 5 prior purchases more often than not leave a lower review. This could mean that our customer retention is not as strong after those first few purchases. We especially want to be strong at the beginning of our relationship with a customer, so that they continue to buy from us. Working on our process will help with customer satisfaction and increase our rating overall.

Working with these factors, we can see how small changes to the way we care for customers can help improve our rating.

Rating Comparison

In this section, I want to see how our company’s ratings compare to those of another company: Electronic Express. Electronic Express is an electronics shipping company similar to the one we have been analyzing. Using the website https://www.trustpilot.com/review/electronicexpress.com I was able to scrape 2000 reviews, each with a name, review text, review time, and the field we will be analyzing, the review rating. By comparing this rating with the rating in our data set, we can better understand how our company stands against the competition. As previously stated, our company has an average rating of 2.99. Below we can see Electronic Express’s average rating.

Average Rating for Electronic Express Reviews:
Mean Rating:  4.37 

Comparing the average ratings of the two companies, we can see that the average rating for Electronic Express is 1.38 points higher than ours (4.37 versus 2.99). Unfortunately, the data set we have been using does not have any time data, so we cannot see at what time of year our company does better or worse. However, this does tell us that our company is severely lacking in customer satisfaction.

Conclusion

The results of our analysis show us that our company is not very competitive right now. Our main process follows a path that is not the most effective way of delivering our products to our customers on time. Our ratings show us that we need to improve our customer care and customer retention in certain areas. And looking at Electronic Express’s ratings, we can see that our business will not last if we do not make a change. We already have nearly 11,000 purchases from customers, which tells us we have a solid product. By implementing a series of marginal improvements, we would be able to seriously improve our company and become competitive.