MARKET SEGMENTATION FOR AIRLINES

Introduction

Market segmentation is a strategy that divides a broad target market of customers into smaller, more similar groups, and then designs a marketing strategy specifically for each group. Clustering is a common technique for market segmentation since it automatically finds similar groups given a data set.

In this problem, we’ll see how clustering can be used to find similar groups of customers who belong to an airline’s frequent flyer program. The airline is trying to learn more about its customers so that it can target different customer segments with different types of mileage offers.

The file AirlinesCluster.csv contains information on 3,999 members of the frequent flyer program. This data comes from the textbook “Data Mining for Business Intelligence,” by Galit Shmueli, Nitin R. Patel, and Peter C. Bruce. For more information, see the website for the book.

There are seven different variables in the dataset, described below:

Balance = number of miles eligible for award travel
QualMiles = number of miles qualifying for TopFlight status
BonusMiles = number of miles earned from non-flight bonus transactions in the past 12 months
BonusTrans = number of non-flight bonus transactions in the past 12 months
FlightMiles = number of flight miles in the past 12 months
FlightTrans = number of flight transactions in the past 12 months
DaysSinceEnroll = number of days since enrolled in the frequent flyer program

Load Libraries

library(caret)
library(caTools)
library(flexclust)
library(readr)
library(dplyr)

Glimpse Dataset

airlines = read_csv("AirlinesCluster.csv")
knitr::kable(head(airlines))

Balance	BonusMiles	BonusTrans	FlightMiles	FlightTrans	DaysSinceEnroll
28143	174	1	0	0	7000
19244	215	2	0	0	6968
41354	4123	4	0	0	7034
14776	500	1	0	0	6952
97752	43300	26	2077	4	6935
16420	0	0	0	0	6942

knitr::kable(summary(airlines))

Balance	QualMiles	BonusMiles	BonusTrans	FlightMiles	FlightTrans	DaysSinceEnroll
Min. : 0	Min. : 0.0	Min. : 0	Min. : 0.0	Min. : 0.0	Min. : 0.000	Min. : 2
1st Qu.: 18528	1st Qu.: 0.0	1st Qu.: 1250	1st Qu.: 3.0	1st Qu.: 0.0	1st Qu.: 0.000	1st Qu.:2330
Median : 43097	Median : 0.0	Median : 7171	Median :12.0	Median : 0.0	Median : 0.000	Median :4096
Mean : 73601	Mean : 144.1	Mean : 17145	Mean :11.6	Mean : 460.1	Mean : 1.374	Mean :4119
3rd Qu.: 92404	3rd Qu.: 0.0	3rd Qu.: 23801	3rd Qu.:17.0	3rd Qu.: 311.0	3rd Qu.: 1.000	3rd Qu.:5790
Max. :1704838	Max. :11148.0	Max. :263685	Max. :86.0	Max. :30817.0	Max. :53.000	Max. :8296

Normalizing the Data

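# Center and scale every variable (preProcess defaults to method = c("center", "scale"))
preproc = preProcess(airlines)
# Apply the fitted transformation to obtain the normalized data
airlinesNorm = predict(preproc, airlines)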
knitr::kable(summary(airlinesNorm))

Balance	QualMiles	BonusMiles	BonusTrans	FlightMiles	FlightTrans	DaysSinceEnroll
Min. :-0.7303	Min. :-0.1863	Min. :-0.7099	Min. :-1.20805	Min. :-0.3286	Min. :-0.36212	Min. :-1.99336
1st Qu.:-0.5465	1st Qu.:-0.1863	1st Qu.:-0.6581	1st Qu.:-0.89568	1st Qu.:-0.3286	1st Qu.:-0.36212	1st Qu.:-0.86607
Median :-0.3027	Median :-0.1863	Median :-0.4130	Median : 0.04145	Median :-0.3286	Median :-0.36212	Median :-0.01092
Mean : 0.0000	Mean : 0.0000	Mean : 0.0000	Mean : 0.00000	Mean : 0.0000	Mean : 0.00000	Mean : 0.00000
3rd Qu.: 0.1866	3rd Qu.:-0.1863	3rd Qu.: 0.2756	3rd Qu.: 0.56208	3rd Qu.:-0.1065	3rd Qu.:-0.09849	3rd Qu.: 0.80960
Max. :16.1868	Max. :14.2231	Max. :10.2083	Max. : 7.74673	Max. :21.6803	Max. :13.61035	Max. : 2.02284
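
With its default settings, preProcess subtracts each column's mean and divides by its standard deviation, which is why every mean in the summary above is 0. The following is a minimal base-R sketch of that same transformation. The data frame `toy` and its values are made up for illustration and are not taken from AirlinesCluster.csv.

```r
# Toy data on very different scales (hypothetical values, for illustration only)
toy <- data.frame(
  Balance     = c(20000, 45000, 90000, 150000),
  FlightTrans = c(0, 1, 4, 12)
)

# Manual z-score: subtract each column's mean, divide by its standard deviation
toyNorm <- as.data.frame(lapply(toy, function(x) (x - mean(x)) / sd(x)))

# Base R's scale() performs the same centering and scaling
toyScaled <- as.data.frame(scale(toy))

colMeans(toyNorm)      # approximately 0 for every column
sapply(toyNorm, sd)    # 1 for every column

# Check that the manual and scale() results agree
max(abs(as.matrix(toyNorm) - as.matrix(toyScaled)))
```

Without this step, a variable such as Balance (maximum 1,704,838) would dominate the Euclidean distances used for clustering, and a variable such as FlightTrans (maximum 53) would have almost no influence.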

Hierarchical Clustering

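# Pairwise Euclidean distances between customers on the normalized variables
airDist = dist(airlinesNorm, method = "euclidean")
# Agglomerative clustering with Ward's linkage, then cut the tree into 5 clusters
airHclust = hclust(airDist, method = "ward.D")
clusterGroups = cutree(airHclust, k = 5)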

# Mean of each original (unnormalized) variable within each cluster
summary_hc = airlines %>% mutate(clusterGroups) %>% group_by(clusterGroups) %>% 
    summarise(across(everything(), mean))

knitr::kable(summary_hc)

clusterGroups	Balance	QualMiles	BonusMiles	BonusTrans	FlightMiles	FlightTrans	DaysSinceEnroll
1	57866.90	0.6443299	10360.124	10.823454	83.18428	0.3028351	6235.365
2	110669.27	1065.9826590	22881.763	18.229287	2613.41811	7.4026975	4402.414
3	198191.57	30.3461538	55795.860	19.663968	327.67611	1.0688259	5615.709
4	52335.91	4.8479263	20788.766	17.087558	111.57373	0.3444700	2840.823
5	36255.91	2.5111773	2264.788	2.973174	119.32191	0.4388972	3060.081
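
The table of means does not show how many customers fall into each cluster. One way to check this is to tabulate the output of cutree. The sketch below runs the same dist / hclust / cutree steps on a small synthetic data set. The object names (`toy`, `toyGroups`) and the data are hypothetical and chosen only to keep the example self-contained.

```r
set.seed(1)  # arbitrary seed so the toy data is reproducible

# Two well-separated blobs of 2-D points (made-up data, not the airline file)
toy <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2)
)

toyDist   <- dist(toy, method = "euclidean")     # pairwise distances
toyHclust <- hclust(toyDist, method = "ward.D")  # same linkage as above
toyGroups <- cutree(toyHclust, k = 2)            # cut the tree into 2 clusters

table(toyGroups)      # number of points assigned to each cluster
# plot(toyHclust)     # uncomment to draw the dendrogram
```

On the airline data, the analogous call would be table(clusterGroups), which gives the size of each of the five clusters.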

K-Means Clustering

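# k-means starts from random centers, so fix the seed for reproducibility
set.seed(88)
# Fit 5 clusters on the normalized data, allowing up to 1000 iterations
airKmeans = kmeans(airlinesNorm, 5, iter.max = 1000)
# Cluster label (1 to 5) assigned to each customer
airKclust = airKmeans$cluster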

# Mean of each original (unnormalized) variable within each k-means cluster
summary_km = airlines %>% mutate(airKclust) %>% group_by(airKclust) %>% summarise(across(everything(), mean))

knitr::kable(summary_km)

airKclust	Balance	QualMiles	BonusMiles	BonusTrans	FlightMiles	FlightTrans	DaysSinceEnroll
1	219161.40	539.57843	62474.483	21.524510	623.8725	1.9215686	5605.051
2	174431.51	673.16312	31985.085	28.134752	5859.2340	17.0000000	4684.901
3	67977.44	34.99396	24490.019	18.429003	289.4713	0.8851964	3416.783
4	60166.18	55.20812	8709.712	8.362098	203.2589	0.6294416	6109.540
5	32706.67	126.46667	3097.478	4.284706	181.4698	0.5403922	2281.055
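
Cluster labels are arbitrary, so cluster 1 from k-means need not correspond to cluster 1 from hierarchical clustering. A cross-tabulation of the two label vectors is one way to see how the two solutions line up. The sketch below illustrates this on synthetic data. The names `toy`, `toyKmeans`, and `toyHier` are hypothetical and the data is not from the airline file.

```r
set.seed(88)  # k-means starts from random centers, so fix the seed

# Made-up 2-D data: two separated blobs of 20 points each (illustrative only)
toy <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2)
)

toyKmeans <- kmeans(toy, centers = 2, iter.max = 1000)
toyHier   <- cutree(hclust(dist(toy), method = "ward.D"), k = 2)

toyKmeans$size                      # number of points per k-means cluster
table(toyHier, toyKmeans$cluster)   # rows: hierarchical, columns: k-means
```

On the airline data, the corresponding calls would be airKmeans$size and table(clusterGroups, airKclust). A row with most of its count in a single column indicates a hierarchical cluster that k-means largely reproduces.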