e-commerce (k-means)

By: Michelle Heywood

Objective

The objective is to gain customer insights of an e-commerce company that sells electronic products. BY clustering, the company may be able to see where problems arise with shipping.

The data comes from Kaggle titled: E-Commerce Shipping Data.
Data Source: https://www.kaggle.com/datasets/prachi13/customer-analytics
The data contains 10,999 observations and 12 variables of shipments to meet customer demand.

Data Dictionary

Variable	Type	Description
`ID`	Key	Unique customer ID
`Warehouse_block`	character	Warehouse divided into sections
`mode_of_shipment`	character	Ways of shipment
`customer_care_calls`	numeric	Number of inquires made about shipment
`customer_rating`	numeric	Company’s rating of customer 1 worse, 5 highest
`cost_of_the_product`	numeric	Product cost in U.S.D
`prior_purchases`	numeric	Number of prior purchases
`product_importance`	character	Importance level of product
`discounted_offer`	numeric	Discount on specific product
`weight_in_grams`	numeric	Shipment weight in grams
`reached on time`	binary	On time shipment, No/Yes

Get Data Ready for Clustering

cluster_df <- df %>%     #### get data ready for the clustering analysis 
  select(-ID, -Warehouse_block, -Mode_of_Shipment, -Product_importance, -Reached.on.Time_Y.N)
                 
# View the first 3 rows of the cluster_df
head(cluster_df, n = 3)

## # A tibble: 3 x 6
##   Customer_care_calls Customer_rating Cost_of_the_Product Prior_purchases
##                 <dbl>           <dbl>               <dbl>           <dbl>
## 1                   4               2                 177               3
## 2                   4               5                 216               2
## 3                   2               2                 183               4
## # ... with 2 more variables: Discount_offered <dbl>, Weight_in_gms <dbl>

Scale Data

The data must be standardized to make variables comparable. Standardization consists of transforming the variables such that they have mean zero and standard deviation one.

cluster_df <- scale(cluster_df) #Scale Data
head(cluster_df, n = 3)

##      Customer_care_calls Customer_rating Cost_of_the_Product Prior_purchases
## [1,]         -0.04770915      -0.7007232          -0.6906903      -0.3727178
## [2,]         -0.04770915       1.4215131           0.1207401      -1.0293770
## [3,]         -1.79980563      -0.7007232          -0.5658549       0.2839414
##      Discount_offered Weight_in_gms
## [1,]         1.889897    -1.4681730
## [2,]         2.815508    -0.3338781
## [3,]         2.136727    -0.1589950

Get the optimal number of clusters

I am using two methods for finding the optimal number of clusters
* Elbow
* Silhouhette

The Elbow method looks at the total WSS as a function of the number of clusters. Choosing a number of clusters so that adding another cluster doesn’t improve much better the total WSS.

# Elbow method
fviz_nbclust(cluster_df, kmeans, method = "wss") +
    geom_vline(xintercept = 3, linetype = 2)+
  labs(subtitle = "Elbow Method (k-means)")

# Silhouette method
fviz_nbclust(cluster_df, kmeans, method = "silhouette")+
  labs(subtitle = "Silhouette Method (k-means)")

The Elbow method suggests 3 clusters
The Silhouette method also suggest 3 clusters

k = 3 is chosen as the optimal number of clusters in the data.

k-means clustering plot

The classification into groups requires computing the distance between each pair of observations. I used the Euclidean distance measurement.

The goal is to define clusters so that the total within-cluster sum of square (WSS), is minimized

# K-means clustering
k3 <- eclust(cluster_df, "kmeans", k = 3, nstart = 25, graph = FALSE)
# Visualize k-means clusters
fviz_cluster(k3, geom = "point", ellipse.type = "norm",
             palette = "jco", ggtheme = theme_minimal())+
   labs(subtitle = "k-3")

Assessing the goodness of clustering

Silhouette Width

The silhouette analysis measures how well an observation is clustered and it estimates the average distance between clusters. The silhouette plot displays a measure of how close each point in one cluster is to points in the neighboring clusters.

1.) Observations with a large silhouette width (almost 1) are very well clustered.
2.) A small silhouette width (around 0) means that the observation lies between two clusters.
3.) Observations with a negative silhouette are probably placed in the wrong cluster.

Dunn Index

Another internal clustering validation measure which can be computed as follow:

1.) For each cluster, compute the distance between each of the objects in the cluster and the objects in the other clusters
2.) Use the minimum of this pairwise distance as the inter-cluster separation (min.separation)
3.) For each cluster, compute the distance between the objects in the same cluster.
4.) Use the maximal intra-cluster distance (i.e maximum diameter) as the intra-cluster compactness
5.) Calculate the Dunn index (D) as follow:

\[ {Dunn}=\frac{min.separation}{max.diameter}\]

If the data set contains compact and well-separated clusters, the diameter of the clusters is expected to be small and the distance between the clusters is expected to be large. Thus, Dunn index should be maximized.

View results

Cluster 1 has 6110 shipments
Cluster 2 has 2259 shipments
Clsuter 3 has 2630 shipments

\[ {WithinSS}=\frac{between SumSquares}{totalSumSquares}=36.6\%\]

Silhouette Plot

The silhouette coefficient measures how similar an object is to the the other objects in its own cluster versus those in the neighbor cluster. Values range from 1 to - 1:
***
A value of close to 1 indicates that the object is well clustered. In the other words, the object is similar to the other objects in its group.
***
A value of close to -1 indicates that the object is poorly clustered, and that assignment to some other cluster would probably improve the overall results.

fviz_silhouette(k3, palette = "jco", 
                ggtheme = theme_classic())+
   labs(subtitle = "sil.width 0.280457  k=3")

##   cluster size ave.sil.width
## 1       1 6110          0.31
## 2       2 2259          0.23
## 3       3 2630          0.26

# Silhouette information
silinfo <- k3$silinfo
names(silinfo)

## [1] "widths"          "clus.avg.widths" "avg.width"

# Silhouette widths of each observation
head(silinfo$widths[, 1:3], 10)  ##### Show the first 10 rows only ###############

##       cluster neighbor sil_width
## 3149        1        2 0.4950430
## 9117        1        2 0.4945994
## 7281        1        3 0.4915844
## 9817        1        2 0.4912080
## 6854        1        3 0.4909967
## 6720        1        3 0.4909031
## 9367        1        3 0.4907056
## 6151        1        3 0.4899460
## 10067       1        2 0.4895271
## 8961        1        2 0.4893568

# Average silhouette width of each cluster
silinfo$clus.avg.widths

## [1] 0.3099548 0.2261985 0.2585324

# The total average (mean of all individual silhouette widths)
silinfo$avg.width

## [1] 0.280457

# The size of each clusters
k3$size

## [1] 6110 2259 2630

# Mean of centers
k3$centers

##   Customer_care_calls Customer_rating Cost_of_the_Product Prior_purchases
## 1          -0.2901637     -0.00512063          -0.1854847      -0.3295137
## 2          -0.3467057     -0.02306065          -0.4462048      -0.3189409
## 3           0.9719044      0.03170382           0.8141781       1.0394739
##   Discount_offered Weight_in_gms
## 1       -0.4614149     0.6991846
## 2        1.6803274    -0.8671867
## 3       -0.3713364    -0.8794841

Are any obs. in the wrong cluster?

Yes, 290 observations in cluster #2 and #3 out of 10,999 obs. are likely in the wrong cluster because they have a negative sil.width.

# Silhouette width of observation
sil <- k3$silinfo$widths[, 1:3]
# Objects with negative silhouette
neg_sil_index <- which(sil[, 'sil_width'] < 0)
sil[neg_sil_index, , drop = FALSE]

##       cluster neighbor     sil_width
## 1096        2        1 -0.0005269036
## 479         2        3 -0.0005719781
## 2107        2        1 -0.0008795047
## 10797       2        1 -0.0011351006
## 2312        2        1 -0.0012314614
## 2068        2        3 -0.0012327547
## 973         2        1 -0.0013022129
## 2147        2        1 -0.0021617341
## 855         2        1 -0.0024570893
## 1720        2        1 -0.0027780989
## 326         2        1 -0.0029749893
## 1806        2        1 -0.0033112966
## 2631        2        1 -0.0033883830
## 2793        2        1 -0.0039388251
## 1581        2        1 -0.0042308013
## 17          2        1 -0.0042657699
## 2820        2        1 -0.0044883226
## 1290        2        1 -0.0045382113
## 2067        2        1 -0.0046196610
## 2836        2        1 -0.0050646137
## 814         2        1 -0.0054701567
## 1939        2        1 -0.0055492790
## 2187        2        1 -0.0056608534
## 1836        2        1 -0.0059714680
## 2900        2        1 -0.0069136894
## 1982        2        1 -0.0069551233
## 960         2        1 -0.0069694733
## 2800        2        1 -0.0070539126
## 193         2        1 -0.0071616628
## 580         2        1 -0.0072355736
## 5029        2        1 -0.0074578834
## 2956        2        1 -0.0075075628
## 6873        2        1 -0.0076513570
## 9634        2        1 -0.0077932370
## 129         2        1 -0.0078414068
## 1556        2        1 -0.0079162269
## 2124        2        1 -0.0081938390
## 3111        2        1 -0.0083545840
## 2370        2        1 -0.0086985335
## 1527        2        1 -0.0087975288
## 9435        2        1 -0.0088018507
## 787         2        1 -0.0090182702
## 238         2        1 -0.0090996012
## 2569        2        1 -0.0091637303
## 10418       2        1 -0.0091666325
## 1568        2        1 -0.0091733752
## 2196        2        1 -0.0096337081
## 2744        2        1 -0.0097368413
## 3133        2        1 -0.0099033854
## 370         2        1 -0.0109138422
## 1680        2        3 -0.0112531564
## 2848        2        1 -0.0118743262
## 99          2        1 -0.0124242226
## 1620        2        1 -0.0125314915
## 9           2        1 -0.0127605422
## 1022        2        1 -0.0128129862
## 661         2        1 -0.0128337787
## 2423        2        1 -0.0137349725
## 1674        2        1 -0.0137732600
## 635         2        1 -0.0137844845
## 430         2        1 -0.0139465851
## 941         2        1 -0.0145884317
## 930         2        1 -0.0149714531
## 3080        2        1 -0.0159077701
## 1458        2        1 -0.0160846247
## 2190        2        1 -0.0160990472
## 2403        2        1 -0.0162223582
## 2858        2        1 -0.0162361808
## 2417        2        1 -0.0164529129
## 3070        2        1 -0.0166753821
## 2764        2        1 -0.0173889665
## 2637        2        1 -0.0174855388
## 6015        2        1 -0.0180657734
## 847         2        1 -0.0182793571
## 921         2        1 -0.0184699709
## 502         2        1 -0.0186906239
## 93          2        1 -0.0188079291
## 2630        2        1 -0.0191393642
## 1642        2        1 -0.0192152027
## 2933        2        1 -0.0196003288
## 2495        2        1 -0.0201099888
## 1512        2        1 -0.0203238361
## 1392        2        1 -0.0203833186
## 2173        2        1 -0.0207615231
## 2155        2        1 -0.0209858840
## 1172        2        1 -0.0213854627
## 1238        2        1 -0.0214813444
## 9665        2        1 -0.0218133009
## 2443        2        1 -0.0221532193
## 25          2        1 -0.0230020943
## 10809       2        1 -0.0239815658
## 2743        2        1 -0.0240065037
## 2515        2        1 -0.0244767281
## 2479        2        1 -0.0247459901
## 10783       2        1 -0.0253935814
## 748         2        1 -0.0253942499
## 2846        2        1 -0.0255786743
## 438         2        1 -0.0261166793
## 2488        2        1 -0.0268266900
## 1511        2        1 -0.0284204983
## 860         2        1 -0.0287756320
## 2258        2        1 -0.0292228663
## 1518        2        1 -0.0317200434
## 2065        2        1 -0.0322611834
## 1002        2        1 -0.0325787929
## 3828        2        1 -0.0334662630
## 5520        2        1 -0.0337298421
## 2815        2        1 -0.0342673419
## 1959        2        1 -0.0347072750
## 1974        2        1 -0.0354738947
## 1036        2        1 -0.0367363420
## 6608        2        1 -0.0372143497
## 2728        2        1 -0.0373883755
## 1406        2        1 -0.0378728076
## 3090        2        1 -0.0387402278
## 582         2        1 -0.0389791999
## 799         2        1 -0.0404630270
## 1364        2        1 -0.0418164374
## 2794        2        1 -0.0420478611
## 2893        2        1 -0.0422478763
## 1241        2        1 -0.0423856511
## 712         2        1 -0.0433060516
## 2734        2        1 -0.0439719055
## 2462        2        1 -0.0457461281
## 614         2        1 -0.0460951563
## 1817        2        1 -0.0476833797
## 1935        2        1 -0.0480379266
## 3062        2        1 -0.0485713726
## 1408        2        1 -0.0486816787
## 2880        2        1 -0.0489163391
## 126         2        1 -0.0500652266
## 740         2        1 -0.0507149783
## 385         2        1 -0.0514337774
## 727         2        1 -0.0518145973
## 449         2        1 -0.0521921822
## 2174        2        1 -0.0526354227
## 1276        2        1 -0.0540903731
## 2359        2        1 -0.0593596541
## 1943        2        1 -0.0612332471
## 2313        2        1 -0.0615006387
## 10604       3        1 -0.0006942137
## 1417        3        1 -0.0009946183
## 1898        3        2 -0.0010815538
## 7954        3        1 -0.0012410138
## 6529        3        1 -0.0017249349
## 2902        3        1 -0.0019942618
## 6532        3        1 -0.0021891865
## 1380        3        1 -0.0022026573
## 3734        3        1 -0.0022326446
## 1658        3        2 -0.0024861684
## 8813        3        1 -0.0025145384
## 2549        3        1 -0.0025360470
## 4986        3        1 -0.0029482627
## 9121        3        1 -0.0032196408
## 6796        3        1 -0.0033592286
## 9727        3        1 -0.0033601384
## 7928        3        1 -0.0034443391
## 4052        3        1 -0.0037965742
## 1577        3        1 -0.0044440530
## 9799        3        1 -0.0052427785
## 6804        3        1 -0.0053976693
## 9127        3        1 -0.0054847337
## 4070        3        1 -0.0058174306
## 9715        3        1 -0.0059018811
## 2699        3        1 -0.0060580787
## 9234        3        1 -0.0061625210
## 7941        3        1 -0.0063737555
## 5199        3        1 -0.0068411079
## 10989       3        1 -0.0068938935
## 8065        3        1 -0.0069201572
## 4949        3        1 -0.0072259220
## 4535        3        1 -0.0072798171
## 5632        3        1 -0.0073350903
## 510         3        1 -0.0074914003
## 3560        3        1 -0.0077099106
## 7391        3        1 -0.0081422384
## 9763        3        1 -0.0082475336
## 2686        3        1 -0.0091630612
## 1776        3        1 -0.0096692775
## 6093        3        1 -0.0096919336
## 1370        3        1 -0.0097774285
## 10027       3        1 -0.0100013378
## 10002       3        1 -0.0100278847
## 4290        3        1 -0.0101038316
## 7443        3        1 -0.0108752768
## 6523        3        1 -0.0112455716
## 2329        3        1 -0.0114232118
## 8294        3        1 -0.0120609437
## 1215        3        1 -0.0120707317
## 781         3        1 -0.0120810468
## 3383        3        1 -0.0126660251
## 10554       3        1 -0.0129936385
## 10619       3        1 -0.0130025429
## 7564        3        1 -0.0135410213
## 5399        3        1 -0.0141935126
## 2722        3        1 -0.0149763051
## 9842        3        1 -0.0153023286
## 9328        3        1 -0.0159861524
## 2478        3        1 -0.0163124016
## 7719        3        1 -0.0164057108
## 5972        3        1 -0.0168496473
## 555         3        1 -0.0172244762
## 10699       3        1 -0.0173114533
## 3231        3        1 -0.0173455195
## 508         3        1 -0.0174889233
## 6964        3        1 -0.0176992760
## 9996        3        1 -0.0177226323
## 6067        3        1 -0.0177671015
## 887         3        1 -0.0184704061
## 7198        3        1 -0.0189258630
## 5623        3        1 -0.0204724945
## 10684       3        1 -0.0208751851
## 4983        3        1 -0.0209756924
## 3359        3        1 -0.0209893387
## 7125        3        1 -0.0213709010
## 7900        3        1 -0.0213780968
## 6765        3        1 -0.0214784118
## 8445        3        1 -0.0225360259
## 7300        3        1 -0.0233047806
## 72          3        1 -0.0234097741
## 3606        3        1 -0.0240939736
## 4527        3        1 -0.0241745736
## 8217        3        1 -0.0244194187
## 9620        3        1 -0.0246786602
## 1651        3        1 -0.0250882040
## 6487        3        1 -0.0253010184
## 10558       3        1 -0.0256688773
## 7789        3        1 -0.0258330302
## 9599        3        1 -0.0283922632
## 7974        3        1 -0.0289425029
## 6709        3        1 -0.0291033846
## 4882        3        1 -0.0291061000
## 10855       3        1 -0.0300823794
## 5795        3        1 -0.0303566875
## 1449        3        1 -0.0308323468
## 5449        3        1 -0.0311982359
## 375         3        1 -0.0312038656
## 10435       3        1 -0.0314755675
## 892         3        1 -0.0320920186
## 9619        3        1 -0.0321846683
## 388         3        1 -0.0322118531
## 1075        3        1 -0.0329209900
## 5402        3        1 -0.0330582699
## 7135        3        1 -0.0332851481
## 360         3        1 -0.0333873602
## 1931        3        1 -0.0336503071
## 1426        3        1 -0.0344522816
## 9646        3        1 -0.0351293674
## 794         3        1 -0.0354484610
## 2632        3        1 -0.0363982053
## 5686        3        1 -0.0376684702
## 8134        3        1 -0.0377870463
## 3338        3        1 -0.0388261170
## 4251        3        1 -0.0391779233
## 2172        3        1 -0.0403533142
## 6036        3        1 -0.0407007003
## 9325        3        1 -0.0415739430
## 5108        3        1 -0.0417472927
## 5385        3        1 -0.0419241739
## 7366        3        1 -0.0419762549
## 5110        3        1 -0.0426026291
## 2193        3        1 -0.0428453457
## 5337        3        1 -0.0428918426
## 9859        3        1 -0.0430869679
## 3050        3        1 -0.0435726941
## 9527        3        1 -0.0446878560
## 6118        3        1 -0.0454558905
## 8684        3        1 -0.0455120873
## 3309        3        1 -0.0458693399
## 1403        3        1 -0.0461221991
## 3878        3        1 -0.0462856553
## 6344        3        1 -0.0477136248
## 8409        3        1 -0.0488036209
## 9290        3        1 -0.0490358391
## 5937        3        1 -0.0500607041
## 4447        3        1 -0.0513398592
## 3391        3        1 -0.0519153485
## 5368        3        1 -0.0522132709
## 1371        3        1 -0.0556212586
## 8781        3        1 -0.0582108249
## 6695        3        1 -0.0601786352
## 6241        3        1 -0.0606194105
## 5587        3        1 -0.0607596215
## 6174        3        1 -0.0609305178
## 9991        3        1 -0.0617114630
## 8096        3        1 -0.0671009615
## 3039        3        1 -0.0710079135
## 7000        3        1 -0.0718982311
## 5751        3        1 -0.0743151009
## 9210        3        1 -0.0785673821

Dunn index

Evaluates the quality of clustering.

Dunn Index = (minimum distance between two clusters)/(maximum distance of points within the cluster)

The Dunn index should be as high as possible for the clusters to be stable.

The Dunn Index for k=3 is Dunn = 0.009534497

This model does not have a good Dunn Index and suggests the quality of the clustering is not ideal.

# Statistics for k-means clustering
k3_stats <- cluster.stats(dist(cluster_df),  k3$cluster)
# Dun index
k3_stats$dunn

## [1] 0.009534497

Overall k-3 stats:

k3_stats

## $n
## [1] 10999
## 
## $cluster.number
## [1] 3
## 
## $cluster.size
## [1] 6110 2259 2630
## 
## $min.cluster.size
## [1] 2259
## 
## $noisen
## [1] 0
## 
## $diameter
## [1] 7.528389 7.305424 7.807101
## 
## $average.distance
## [1] 2.437688 2.818444 2.740155
## 
## $median.distance
## [1] 2.403743 2.795968 2.590132
## 
## $separation
## [1] 0.07443678 0.07443678 0.13498955
## 
## $average.toother
## [1] 3.743315 3.875337 3.851397
## 
## $separation.matrix
##            [,1]       [,2]      [,3]
## [1,] 0.00000000 0.07443678 0.1349896
## [2,] 0.07443678 0.00000000 0.3643669
## [3,] 0.13498955 0.36436693 0.0000000
## 
## $ave.between.matrix
##          [,1]     [,2]     [,3]
## [1,] 0.000000 3.751563 3.736231
## [2,] 3.751563 0.000000 4.162890
## [3,] 3.736231 4.162890 0.000000
## 
## $average.between
## [1] 3.81292
## 
## $average.within
## [1] 2.588212
## 
## $n.between
## [1] 35812960
## 
## $n.within
## [1] 24670541
## 
## $max.diameter
## [1] 7.807101
## 
## $min.separation
## [1] 0.07443678
## 
## $within.cluster.ss
## [1] 41813.63
## 
## $clus.avg.silwidths
##         1         2         3 
## 0.3099548 0.2261985 0.2585324 
## 
## $avg.silwidth
## [1] 0.280457
## 
## $g2
## NULL
## 
## $g3
## NULL
## 
## $pearsongamma
## [1] 0.5786845
## 
## $dunn
## [1] 0.009534497
## 
## $dunn2
## [1] 1.325636
## 
## $entropy
## [1] 0.9937922
## 
## $wb.ratio
## [1] 0.6788005
## 
## $ch
## [1] 3178.646
## 
## $cwidegap
## [1] 1.748130 1.767741 1.897209
## 
## $widestgap
## [1] 1.897209
## 
## $sindex
## [1] 0.5388646
## 
## $corrected.rand
## NULL
## 
## $vi
## NULL

Add identifiers to the original data.

#add cluster assignment to original data
final_data <- cbind(df, cluster = k3$cluster)


#view final data 
 head(final_data, 20)

##    ID Warehouse_block Mode_of_Shipment Customer_care_calls Customer_rating
## 1   1               D           Flight                   4               2
## 2   2               F           Flight                   4               5
## 3   3               A           Flight                   2               2
## 4   4               B           Flight                   3               3
## 5   5               C           Flight                   2               2
## 6   6               F           Flight                   3               1
## 7   7               D           Flight                   3               4
## 8   8               F           Flight                   4               1
## 9   9               A           Flight                   3               4
## 10 10               B           Flight                   3               2
## 11 11               C           Flight                   3               4
## 12 12               F           Flight                   4               5
## 13 13               D           Flight                   3               5
## 14 14               F           Flight                   4               4
## 15 15               A           Flight                   4               3
## 16 16               B           Flight                   4               3
## 17 17               C           Flight                   3               4
## 18 18               F             Ship                   5               5
## 19 19               D             Ship                   5               5
## 20 20               F             Ship                   4               5
##    Cost_of_the_Product Prior_purchases Product_importance Discount_offered
## 1                  177               3                low               44
## 2                  216               2                low               59
## 3                  183               4                low               48
## 4                  176               4             medium               10
## 5                  184               3             medium               46
## 6                  162               3             medium               12
## 7                  250               3                low                3
## 8                  233               2                low               48
## 9                  150               3                low               11
## 10                 164               3             medium               29
## 11                 189               2             medium               12
## 12                 232               3             medium               32
## 13                 198               3             medium                1
## 14                 275               3               high               29
## 15                 152               3                low               43
## 16                 227               3                low               45
## 17                 143               2             medium                6
## 18                 227               3             medium               36
## 19                 239               3               high               18
## 20                 145               3             medium               45
##    Weight_in_gms Reached.on.Time_Y.N cluster
## 1           1233                   1       2
## 2           3088                   1       2
## 3           3374                   1       2
## 4           1177                   1       2
## 5           2484                   1       2
## 6           1417                   1       2
## 7           2371                   1       1
## 8           2804                   1       2
## 9           1861                   1       2
## 10          1187                   1       2
## 11          2888                   1       1
## 12          3253                   1       2
## 13          3667                   1       1
## 14          2602                   1       2
## 15          1009                   1       2
## 16          2707                   1       2
## 17          1194                   1       2
## 18          3952                   1       2
## 19          2495                   1       3
## 20          1059                   1       2

Conclusion

The e-commerce data tracks shipments of electronic products. The company has a large warehouse that is divided up into 5 blocks, depending on the product segment. The data also tracks mode of shipment, and the number of inquires about a shipment, and if a shipment is late or not.

By clustering the data, the company can assess if further investigation into the cluster with the most problems. Maybe it has to do with the warehouse block, or late shipments may be out of their control by the mode of shipment or weather.

For this model, the k-means method did not perform well with this data set. Likely because K-means does not handle outliers well. My suggestion is to segregate the data and drill down on warehouse blocks or drill down on all the observations that did not arrive on time. Or pick an supervised learning method.