Introduction

Market segmentation is a strategy that divides a broad target market of customers into smaller, more homogeneous groups and then designs a marketing strategy specifically for each group. Clustering is a common technique for market segmentation because it automatically finds groups of similar observations in a data set.

In this problem, we’ll see how clustering can be used to find similar groups of customers who belong to an airline’s frequent flyer program. The airline is trying to learn more about its customers so that it can target different customer segments with different types of mileage offers.

The file AirlinesCluster.csv contains information on 3,999 members of the frequent flyer program. This data comes from the textbook “Data Mining for Business Intelligence,” by Galit Shmueli, Nitin R. Patel, and Peter C. Bruce. For more information, see the website for the book.

There are seven variables in the dataset, described below:

  • Balance = number of miles eligible for award travel
  • QualMiles = number of miles qualifying for TopFlight status
  • BonusMiles = number of miles earned from non-flight bonus transactions in the past 12 months
  • BonusTrans = number of non-flight bonus transactions in the past 12 months
  • FlightMiles = number of flight miles in the past 12 months
  • FlightTrans = number of flight transactions in the past 12 months
  • DaysSinceEnroll = number of days since enrolled in the frequent flyer program

Load Libraries

library(caret)
library(caTools)
library(flexclust)
library(readr)
library(dplyr)

Glimpse Dataset

airlines = read_csv("AirlinesCluster.csv")
knitr::kable(head(airlines))
Balance QualMiles BonusMiles BonusTrans FlightMiles FlightTrans DaysSinceEnroll
28143 0 174 1 0 0 7000
19244 0 215 2 0 0 6968
41354 0 4123 4 0 0 7034
14776 0 500 1 0 0 6952
97752 0 43300 26 2077 4 6935
16420 0 0 0 0 0 6942
knitr::kable(summary(airlines))
Statistic  Balance  QualMiles  BonusMiles  BonusTrans  FlightMiles  FlightTrans  DaysSinceEnroll
Min.       0        0.0        0           0.0         0.0          0.000        2
1st Qu.    18528    0.0        1250        3.0         0.0          0.000        2330
Median     43097    0.0        7171        12.0        0.0          0.000        4096
Mean       73601    144.1      17145       11.6        460.1        1.374        4119
3rd Qu.    92404    0.0        23801       17.0        311.0        1.000        5790
Max.       1704838  11148.0    263685      86.0        30817.0      53.000       8296
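The summary shows why the data should be normalized before clustering: Balance reaches 1,704,838 while FlightTrans never exceeds 53, so raw Euclidean distances are driven almost entirely by the large-scale columns. The sketch below illustrates this with the first two customers from the `head()` output above; the values are copied by hand, and the names `cust1`, `cust2`, and `balance_share` are illustrative only.

```r
# First two customers from head(airlines), in column order:
# Balance, QualMiles, BonusMiles, BonusTrans, FlightMiles, FlightTrans, DaysSinceEnroll
cust1 <- c(28143, 0, 174, 1, 0, 0, 7000)
cust2 <- c(19244, 0, 215, 2, 0, 0, 6968)

# Squared difference contributed by each variable
sq_diff <- (cust1 - cust2)^2

# Euclidean distance on the raw (unnormalized) values
raw_dist <- sqrt(sum(sq_diff))

# Fraction of the squared distance that comes from Balance alone
balance_share <- sq_diff[1] / sum(sq_diff)

print(round(raw_dist, 1))
print(round(balance_share, 5))
```

For these two customers, Balance contributes more than 99.99% of the squared distance, so without standardization the other six variables would barely influence `dist()` or `kmeans()`.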

Normalizing the Data

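# Learn each column's mean and standard deviation (caret's default: center and scale)
preproc = preProcess(airlines)
# Apply the transformation, giving every column mean 0 and standard deviation 1
airlinesNorm = predict(preproc, airlines)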
knitr::kable(summary(airlinesNorm))
Statistic  Balance  QualMiles  BonusMiles  BonusTrans  FlightMiles  FlightTrans  DaysSinceEnroll
Min.       -0.7303  -0.1863    -0.7099     -1.20805    -0.3286      -0.36212     -1.99336
1st Qu.    -0.5465  -0.1863    -0.6581     -0.89568    -0.3286      -0.36212     -0.86607
Median     -0.3027  -0.1863    -0.4130     0.04145     -0.3286      -0.36212     -0.01092
Mean       0.0000   0.0000     0.0000      0.00000     0.0000       0.00000      0.00000
3rd Qu.    0.1866   -0.1863    0.2756      0.56208     -0.1065      -0.09849     0.80960
Max.       16.1868  14.2231    10.2083     7.74673     21.6803      13.61035     2.02284
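With its default `method = c("center", "scale")`, caret's `preProcess()` turns each column into a z-score, (x - mean) / sd, which is why every Mean in the normalized summary is 0. A minimal base-R sketch of the same transformation, assuming as toy input three columns copied from the six rows printed by `head(airlines)` (QualMiles is left out because it is constant in those rows, and a zero standard deviation would produce NaN):

```r
# Toy data: three columns from the six rows printed by head(airlines)
toy <- data.frame(
  Balance         = c(28143, 19244, 41354, 14776, 97752, 16420),
  BonusMiles      = c(174, 215, 4123, 500, 43300, 0),
  DaysSinceEnroll = c(7000, 6968, 7034, 6952, 6935, 6942)
)

# Manual z-score: subtract the column mean, divide by the column standard deviation
manual_z <- as.data.frame(lapply(toy, function(x) (x - mean(x)) / sd(x)))

# Base R's scale() performs the same centering and scaling
print(all.equal(as.matrix(manual_z), scale(toy), check.attributes = FALSE))

# Every standardized column has mean 0 and standard deviation 1
print(round(colMeans(manual_z), 10))
print(sapply(manual_z, sd))
```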

Hierarchical Clustering

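# Euclidean distances between every pair of customers (normalized data)
airDist = dist(airlinesNorm, method = "euclidean")
# Agglomerative clustering with the "ward.D" linkage, cut into 5 groups
airHclust = hclust(airDist, method = "ward.D")
clusterGroups = cutree(airHclust, k = 5)

# Mean of each original (unnormalized) variable within each cluster;
# summarise(across()) replaces the deprecated summarise_each(funs())
summary_hc = airlines %>%
    mutate(clusterGroups = clusterGroups) %>%
    group_by(clusterGroups) %>%
    summarise(across(everything(), mean))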

knitr::kable(summary_hc)
clusterGroups Balance QualMiles BonusMiles BonusTrans FlightMiles FlightTrans DaysSinceEnroll
1 57866.90 0.6443299 10360.124 10.823454 83.18428 0.3028351 6235.365
2 110669.27 1065.9826590 22881.763 18.229287 2613.41811 7.4026975 4402.414
3 198191.57 30.3461538 55795.860 19.663968 327.67611 1.0688259 5615.709
4 52335.91 4.8479263 20788.766 17.087558 111.57373 0.3444700 2840.823
5 36255.91 2.5111773 2264.788 2.973174 119.32191 0.4388972 3060.081
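The code above uses `method = "ward.D"`. According to R's `?hclust` documentation, `"ward.D"` does not implement Ward's (1963) clustering criterion, whereas `"ward.D2"` does, by squaring the dissimilarities before clusters are updated. The sketch below, on simulated data rather than the airline file, checks that `"ward.D"` applied to squared distances reproduces the `"ward.D2"` tree. Whether switching methods would change the five airline segments is not tested here.

```r
set.seed(1)

# Simulated data: 30 points in 3 dimensions (a stand-in for airlinesNorm)
x <- matrix(rnorm(90), ncol = 3)
d <- dist(x, method = "euclidean")

# "ward.D2" squares the dissimilarities internally before merging
h_d2 <- hclust(d, method = "ward.D2")

# "ward.D" applied to explicitly squared distances
h_d_sq <- hclust(d^2, method = "ward.D")

# Same sequence of merges, hence the same partition for any k
print(identical(h_d2$merge, h_d_sq$merge))
print(identical(cutree(h_d2, k = 5), cutree(h_d_sq, k = 5)))

# Merge heights differ only by a square root
print(all.equal(h_d2$height, sqrt(h_d_sq$height)))
```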

K-Means Clustering

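set.seed(88)  # kmeans() picks random initial centers; fix the seed for reproducibility
airKmeans = kmeans(airlinesNorm, 5, iter.max = 1000)
airKclust = airKmeans$cluster

# Mean of each original variable within each k-means cluster
summary_km = airlines %>%
    mutate(airKclust = airKclust) %>%
    group_by(airKclust) %>%
    summarise(across(everything(), mean))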

knitr::kable(summary_km)
airKclust Balance QualMiles BonusMiles BonusTrans FlightMiles FlightTrans DaysSinceEnroll
1 219161.40 539.57843 62474.483 21.524510 623.8725 1.9215686 5605.051
2 174431.51 673.16312 31985.085 28.134752 5859.2340 17.0000000 4684.901
3 67977.44 34.99396 24490.019 18.429003 289.4713 0.8851964 3416.783
4 60166.18 55.20812 8709.712 8.362098 203.2589 0.6294416 6109.540
5 32706.67 126.46667 3097.478 4.284706 181.4698 0.5403922 2281.055
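The cluster numbers in the two summary tables are not directly comparable: both `cutree()` and `kmeans()` assign labels arbitrarily, so hierarchical cluster 1 need not correspond to k-means cluster 1. Cross-tabulating memberships is a common way to compare two partitions. A sketch on simulated, well-separated data (not the airline file); the variable names are illustrative, and `nstart = 20` is an assumption that the original code does not use. For the airline data, the analogous call would be `table(clusterGroups, airKclust)`, whose output is not shown here.

```r
set.seed(42)

# Simulated data: three well-separated 2-D blobs of 20 points each
centers <- rbind(c(0, 0), c(10, 0), c(0, 10))
x <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(20, centers[i, 1]), rnorm(20, centers[i, 2]))
}))

# Hierarchical clustering with the same linkage as above, cut into 3 groups
hc_groups <- cutree(hclust(dist(x), method = "ward.D"), k = 3)

# k-means with several random starts to reduce sensitivity to initialization
km_groups <- kmeans(x, centers = 3, nstart = 20)$cluster

# Cross-tabulate memberships: compare which points share a cluster, not label numbers
agreement <- table(hierarchical = hc_groups, kmeans = km_groups)
print(agreement)
```

If the two methods agree, each row of the table has exactly one non-zero cell, even when the label numbers differ.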