Instacart Customer Segmentation
DATA 608 Spring 2025 - Story 5
This analysis segments Instacart customers using four variables per customer:
the total number of orders made for the year (total_orders)
the total number of items purchased for the year (total_items)
the average number of days between orders (mean_freq)
the minimum number of days between orders (min_freq)
Principal Component Analysis
Importance of components:
                          PC1    PC2     PC3     PC4
Standard deviation     1.6396 0.9536 0.47340 0.42214
Proportion of Variance 0.6721 0.2273 0.05603 0.04455
Cumulative Proportion  0.6721 0.8994 0.95545 1.00000

Loadings (weight of each variable on each component):
                    PC1        PC2        PC3        PC4
total_orders  0.5256406 -0.4130459 -0.2766375  0.6903381
min_freq     -0.4592567 -0.6149182 -0.5885584 -0.2540825
mean_freq    -0.5264728 -0.3841048  0.6293934  0.4232656
total_items   0.4853984 -0.5511190  0.4253641 -0.5288871
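To make the loadings concrete, here is a minimal sketch of how one customer's values turn into component scores. It assumes the four variables were centered and scaled before PCA (the component variances sum to about 4, which is consistent with that, but the source does not say so). The customer below is hypothetical and invented for illustration; `pc_scores` and `component_norm` are illustrative helper names, not part of the original analysis.

```python
import math

# Loadings copied from the table above; list positions are PC1..PC4.
LOADINGS = {
    "total_orders": [0.5256406, -0.4130459, -0.2766375, 0.6903381],
    "min_freq": [-0.4592567, -0.6149182, -0.5885584, -0.2540825],
    "mean_freq": [-0.5264728, -0.3841048, 0.6293934, 0.4232656],
    "total_items": [0.4853984, -0.5511190, 0.4253641, -0.5288871],
}


def pc_scores(z_scores):
    """Project one customer's standardized values onto PC1..PC4."""
    n_components = len(next(iter(LOADINGS.values())))
    return [
        sum(z_scores[var] * weights[k] for var, weights in LOADINGS.items())
        for k in range(n_components)
    ]


def component_norm(k):
    """Euclidean length of the k-th loading vector (0-based index)."""
    return math.sqrt(sum(weights[k] ** 2 for weights in LOADINGS.values()))


# Hypothetical customer expressed as z-scores (assumed values):
# orders often, buys many items, leaves short gaps between orders.
frequent_shopper = {
    "total_orders": 1.5,
    "min_freq": -1.0,
    "mean_freq": -1.0,
    "total_items": 1.2,
}

scores = pc_scores(frequent_shopper)
norms = [component_norm(k) for k in range(4)]

print("PC scores:", [round(s, 3) for s in scores])
print("Loading vector lengths:", [round(n, 4) for n in norms])
```

A customer like this one lands on the positive side of PC1, because every term in the PC1 sum is positive. Each loading vector has length close to 1, as expected for principal component directions.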
Principal Component Analysis
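On PC1, total_orders (0.53) and mean_freq (-0.53) have loadings of nearly equal magnitude but opposite sign, and the other two variables are not far behind (total_items 0.49, min_freq -0.46). PC1 therefore mainly contrasts frequent, high-volume customers with those who leave long gaps between orders.
On PC2, min_freq has the largest absolute loading (-0.61), followed by total_items (-0.55).
Together, PC1 and PC2 account for about 90% of the total variance (cumulative proportion 0.8994).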
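The variance figures can be reproduced from the standard deviations alone: each component's variance is its standard deviation squared, and its proportion is that variance divided by the total. A short sketch using the values reported in the summary table (the variable names are illustrative):

```python
from itertools import accumulate

# Standard deviations of PC1..PC4 as reported in the summary table.
sdev = [1.6396, 0.9536, 0.47340, 0.42214]

variances = [s ** 2 for s in sdev]          # variance of each component
total_variance = sum(variances)             # close to 4 for four scaled variables
proportion = [v / total_variance for v in variances]
cumulative = list(accumulate(proportion))   # running total of the proportions

for name, p, c in zip(["PC1", "PC2", "PC3", "PC4"], proportion, cumulative):
    print(f"{name}: proportion={p:.4f} cumulative={c:.4f}")
```

The first two cumulative values match the table to rounding, which is where the "about 90%" figure for PC1 and PC2 comes from.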
PCA Biplot
The biplot shows the data points projected onto the principal components. Each original variable is drawn as a scaled arrow whose direction and length indicate its loadings on those components.
PCA and K-Means Clusters
Here the data points are again projected onto the principal components, now colored by assigned cluster, with the cluster centers marked.
K-Means Clusters
Each customer's cluster assignment can also be plotted against the original variables, for example total orders versus minimum days between orders.
Customer Segmentation Analysis
Three clusters were chosen as a starting point. The number of clusters can be adjusted to examine the customer data further with PCA and K-Means clustering.
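As a rough illustration of the clustering step and of what changing the number of clusters does, here is a minimal K-Means (Lloyd's algorithm) sketch in plain Python. Several things are assumptions rather than details from the analysis: the nine (PC1, PC2) points are invented, the function names are illustrative, and the deterministic farthest-point initialization is used only to make the sketch reproducible. It is not necessarily how the original clusters were initialized.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points, k):
    """Deterministic start (an assumption of this sketch): take the first
    point, then repeatedly add the point farthest from the chosen centers."""
    centers = [tuple(points[0])]
    while len(centers) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centers)
        )
        centers.append(tuple(farthest))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centers, wcss), where wcss is the
    total within-cluster sum of squared distances."""
    centers = farthest_first_centers(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # centers stopped moving, so assignments are stable
        centers = new_centers
    wcss = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wcss


# Hypothetical (PC1, PC2) scores for nine customers, invented for illustration.
scores = [
    (-2.0, 0.1), (-1.8, -0.2), (-2.2, 0.3),
    (0.1, 1.0), (0.0, 1.2), (-0.1, 0.9),
    (2.5, -1.0), (2.7, -1.2), (2.4, -0.9),
]

labels, centers, wcss = kmeans(scores, k=3)
wcss_by_k = {k: kmeans(scores, k)[2] for k in (2, 3, 4)}

print("labels for k=3:", labels)
print("within-cluster sum of squares by k:", wcss_by_k)
```

Comparing the within-cluster sum of squares across several values of k is one common way to judge whether more or fewer clusters describe the data better. On real customer data the drop is usually less clear-cut than in this toy example.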