Instacart Customer Segmentation

DATA 608 Spring 2025 - Story 5

Stephanie Chiang

This is a customer segmentation analysis for Instacart users based on:

  • the total number of orders made for the year

  • the total number of items purchased for the year

  • the average number of days between orders

  • the minimum number of days between orders
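
The four features above can be derived from a per-order log. The sketch below is illustrative only: the record layout (`user_id`, days since the prior order, item count) and the sample rows are assumptions, not the data or code used for this analysis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical order log: (user_id, days_since_prior_order, items_in_order).
# The gap is None for a customer's first order.
orders = [
    ("u1", None, 5),
    ("u1", 7, 8),
    ("u1", 3, 4),
    ("u2", None, 12),
    ("u2", 30, 10),
]

def build_features(orders):
    """Aggregate an order log into the four per-customer features."""
    by_user = defaultdict(list)
    for user_id, gap, n_items in orders:
        by_user[user_id].append((gap, n_items))

    features = {}
    for user_id, rows in by_user.items():
        gaps = [gap for gap, _ in rows if gap is not None]
        features[user_id] = {
            "total_orders": len(rows),
            "total_items": sum(n for _, n in rows),
            # Gaps are undefined for customers with a single order.
            "mean_freq": mean(gaps) if gaps else None,
            "min_freq": min(gaps) if gaps else None,
        }
    return features

features = build_features(orders)
print(features["u1"])
```

Customers with only one order have no gaps, so they would need to be dropped or handled separately before PCA.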

Principal Component Analysis

Importance of components:
                          PC1    PC2     PC3     PC4
Standard deviation     1.6396 0.9536 0.47340 0.42214
Proportion of Variance 0.6721 0.2273 0.05603 0.04455
Cumulative Proportion  0.6721 0.8994 0.95545 1.00000

Loadings (variable weights on each component):
                    PC1        PC2        PC3        PC4
total_orders  0.5256406 -0.4130459 -0.2766375  0.6903381
min_freq     -0.4592567 -0.6149182 -0.5885584 -0.2540825
mean_freq    -0.5264728 -0.3841048  0.6293934  0.4232656
total_items   0.4853984 -0.5511190  0.4253641 -0.5288871
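
As a check on the summary table, the proportion of variance for each component is its squared standard deviation divided by the sum of all four. The short sketch below recomputes those rows from the standard deviations shown above. The variances sum to roughly 4, which is consistent with the four inputs having been standardized before PCA, though the source does not state this.

```python
# Standard deviations of the four components, copied from the summary table.
sdev = [1.6396, 0.9536, 0.47340, 0.42214]

variances = [s ** 2 for s in sdev]           # variance captured by each component
total = sum(variances)                       # close to 4 for four standardized inputs
proportion = [v / total for v in variances]

# Running total of the proportions.
cumulative = []
running = 0.0
for p in proportion:
    running += p
    cumulative.append(running)

for i, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"PC{i}: proportion={p:.4f} cumulative={c:.4f}")
```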

Principal Component Analysis

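  • On PC1, total number of orders (0.53) and mean days between orders (-0.53) have nearly equal weights in magnitude but opposite signs. The other 2 variables are not far off (0.49 for total items, -0.46 for minimum days between orders), so PC1 contrasts order and item volume with the time between orders.

  • For PC2, minimum number of days between orders has the largest absolute weight (-0.61), followed by total items (-0.55).

  • Together, PC1 and PC2 account for about 90% of the variance (0.6721 + 0.2273 = 0.8994).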
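
To make the loadings concrete, a customer's score on each component is the weighted sum of their standardized feature values. The sketch below uses the loadings from the table above. The example customer and the `project` helper are hypothetical, and it assumes the inputs are z-scores.

```python
# Loadings copied from the table: one row per variable, columns PC1..PC4.
loadings = {
    "total_orders": [0.5256406, -0.4130459, -0.2766375, 0.6903381],
    "min_freq": [-0.4592567, -0.6149182, -0.5885584, -0.2540825],
    "mean_freq": [-0.5264728, -0.3841048, 0.6293934, 0.4232656],
    "total_items": [0.4853984, -0.5511190, 0.4253641, -0.5288871],
}

def project(z_scores, n_components=2):
    """Return PC scores for one customer given standardized feature values."""
    return [
        sum(z_scores[name] * weights[k] for name, weights in loadings.items())
        for k in range(n_components)
    ]

# Hypothetical customer in z-score units: many orders and items, short gaps.
frequent = {"total_orders": 1.5, "min_freq": -1.0, "mean_freq": -1.2, "total_items": 1.3}
pc1, pc2 = project(frequent)
print(round(pc1, 3), round(pc2, 3))
```

A frequent, high-volume customer like this one lands far out on the positive side of PC1. The sign of a principal component is arbitrary, so only the contrast between the two groups of variables is meaningful.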

PCA Biplot

The biplot shows each customer projected onto the first two principal components. Scaled arrows show the direction and strength of each original variable on those components.

PCA and K-Means Clusters

Here the data points are again projected onto the principal components, now colored by assigned cluster, with each cluster center marked.
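
For readers unfamiliar with K-Means, the sketch below shows the assign-then-update loop on a handful of made-up (PC1, PC2) scores. It is not the code behind the plot. The data are hypothetical, and the farthest-point initialization is a simple deterministic stand-in for the random starts that library implementations typically use.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point start."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from the centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers

# Hypothetical (PC1, PC2) scores for nine customers in three loose groups.
scores = [
    (-2.1, 0.3), (-1.8, -0.2), (-2.4, 0.1),
    (0.1, 1.9), (0.4, 2.2), (-0.2, 2.0),
    (2.0, -1.0), (2.3, -0.7), (1.9, -1.2),
]
labels, centers = kmeans(scores, k=3)
print(labels)
```

Changing `k` in the call is all that is needed to try a different number of clusters on the same scores.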

K-Means Clusters

Cluster assignments can also be plotted against the original variables, for example total orders versus the minimum number of days between orders.

Customer Segmentation Analysis

Three clusters were used as a starting point. The number of clusters can be adjusted to examine the customer data further with PCA and K-Means clustering.