Unsupervised Learning: Airline Passenger Satisfaction

Introduction

In this project, we will explore the application of unsupervised learning techniques to analyze and understand airline passenger satisfaction. The data set we will be working with is the “Airplane” data set, which contains information about airline passengers, their demographics, flight details, and various aspects of their in-flight experience. The main objective of this project is to identify patterns and gain insights into factors that contribute to passenger satisfaction.

Unsupervised learning techniques, such as clustering and dimensionality reduction, will be employed to analyze the data. Clustering algorithms will help us identify groups or segments of passengers with similar characteristics, preferences, or levels of satisfaction. Dimensionality reduction, specifically Principal Component Analysis (PCA) , will be used to reduce the dimensionality of the dataset and uncover the most significant factors driving passenger satisfaction.

Project Steps

  1. Exploratory Data Analysis (EDA): We will begin by performing exploratory data analysis to gain a better understanding of the dataset. This will involve examining the structure of the data, checking for missing values, and exploring the distribution and relationships between different variables.

  2. Preprocessing: Before applying unsupervised learning techniques, we will preprocess the data by handling missing values, encoding categorical variables, and scaling numerical variables if necessary. This step is crucial to ensure the accuracy and effectiveness of the subsequent analysis.

  3. Clustering Analysis: Using appropriate clustering algorithms, such as k-means or hierarchical clustering, we will group passengers based on their attributes and behaviors. This will allow us to identify distinct segments of passengers and analyze the differences in their satisfaction levels.

  4. Dimensionality Reduction: We will apply Principal Component Analysis (PCA) to reduce the dimensionality of the dataset while retaining the most informative features. By visualizing the principal components and analyzing their corresponding loadings, we can interpret the underlying factors influencing passenger satisfaction.

  5. Interpretation and Insights: Finally, we will interpret the results of the clustering and dimensionality reduction analyses to gain insights into the key factors that drive passenger satisfaction. We will explore the characteristics of different passenger segments and identify any specific patterns or trends that contribute to higher satisfaction levels.

Expected Outcomes

By the end of this project, we aim to gain a deeper understanding of the factors that influence airline passenger satisfaction. The analysis will provide insights into the different passenger segments, their preferences, and the areas where airlines can focus to improve overall satisfaction. These findings can help airlines make data-driven decisions to enhance the travel experience and improve customer satisfaction.

Throughout the project, we will utilize R and various data science libraries, such as tidyverse, caret, and factoextra, to perform the necessary data manipulation, analysis, and visualization tasks.

Let’s get started with the exploratory data analysis phase and delve into the world of unsupervised learning to uncover valuable insights from the “Airplane” data set!

DataSet Overview

The airplane dataframe contains the following variables:

  1. X: This variable represents the index or ID of each observation in the dataset.

  2. id: This variable contains the unique ID assigned to each passenger.

  3. Gender: This variable indicates the gender of the passenger (Male or Female).

  4. Customer.Type: This variable specifies the type of customer, whether they are a loyal customer or a disloyal customer.

  5. Age: This variable represents the age of the passenger.

  6. Type.of.Travel: This variable indicates the purpose of the passenger’s travel, whether it is personal or business-related.

  7. Class: This variable represents the class of travel chosen by the passenger (Eco Plus, Business, or other classes).

  8. Flight.Distance: This variable denotes the distance of the flight in miles.

  9. Inflight.wifi.service: This variable represents the rating provided by the passenger for the in-flight Wi-Fi service on a scale of 1 to 5.

  10. Departure.Arrival.time.convenient: This variable indicates the convenience of the departure and arrival times as rated by the passenger on a scale of 1 to 5.

  11. Ease.of.Online.booking: This variable represents the passenger’s rating of the ease of online booking on a scale of 1 to 5.

  12. Gate.location: This variable denotes the rating given by the passenger for the gate location on a scale of 1 to 5.

  13. Food.and.drink: This variable represents the passenger’s rating of the food and drink service on a scale of 1 to 5.

  14. Online.boarding: This variable indicates the passenger’s rating of the online boarding experience on a scale of 1 to 5.

  15. Seat.comfort: This variable represents the passenger’s rating of the seat comfort on a scale of 1 to 5.

  16. Inflight.entertainment: This variable denotes the rating given by the passenger for the in-flight entertainment options on a scale of 1 to 5.

  17. On.board.service: This variable indicates the passenger’s rating of the on-board service provided on a scale of 1 to 5.

  18. Leg.room.service: This variable represents the passenger’s rating of the legroom service on a scale of 1 to 5.

  19. Baggage.handling: This variable denotes the rating given by the passenger for the baggage handling service on a scale of 1 to 5.

  20. Checkin.service: This variable indicates the passenger’s rating of the check-in service on a scale of 1 to 5.

  21. Inflight.service: This variable represents the passenger’s rating of the in-flight service provided on a scale of 1 to 5.

  22. Cleanliness: This variable denotes the rating given by the passenger for the cleanliness of the aircraft on a scale of 1 to 5.

  23. Departure.Delay.in.Minutes: This variable represents the delay in departure time in minutes for each flight.

  24. Arrival.Delay.in.Minutes: This variable represents the delay in arrival time in minutes for each flight.

  25. satisfaction: This variable indicates the level of passenger satisfaction, categorized as “neutral or dissatisfied” or “satisfied”.

These variables capture various aspects of the airline passenger experience, including demographics, flight details, and passenger ratings for different services provided during the flight.

Data Preparation

Import package & Dataset

Import Package :

library(dplyr) # for data wrangling
library(ggplot2) # to visualize data
library(gridExtra) # to display multiple graph
library(caret) # to pre-process data
library(tibble) # for creating and manipulating tabular data structures
library(animation) # for creating animated visualizations
library(GGally)      # Extension to ggplot2 for exploratory data analysis
library(gtools)      # Utility functions for data manipulation and statistical analysis
library(tidyr)       # Package for reshaping and tidying data
library(reshape2)    # Package for data reshaping and restructuring
library(plotly)      # Interactive plotting library
library(knitr)       # Package for dynamic report generation

# Cluster
library(factoextra)
library(FactoMineR)

Import dataSet

airplane <- read.csv("data_input/Airline_Passenge_Satisfaction.csv")
rmarkdown::paged_table(airplane)

Subsetting Dataset

Performing stratified subsetting can be a valuable approach in situations where the dataset is too large to handle directly or when memory limitations arise. In my case, My memory doesnt even stand a chance to handle this much data :(

So i Subsetting the data using a stratified approach can help mitigate this issue. Other than Memory Efficieny There’s other reason for me to for performing stratified subsetting :

  1. Representativeness: Stratified subsetting helps to maintain the representativeness of the original dataset. By preserving the proportion of observations from each category in a specific variable, such as satisfaction in my case, we ensure that the resulting subset reflects the same distribution and patterns present in the full dataset.

  2. Efficient Resource Utilization: Subsetting the data to a smaller representative sample can significantly reduce the computational resources required for the subsequent cluster analysis

# Stratified subsetting based on satisfaction variable
subset_airplane <- airplane %>% 
  group_by(satisfaction) %>%
  slice_sample(n = 5000, replace = FALSE)

Regarding why we only take 10% of the total dataframe, the decision to choose a specific subset size depends on various factors. Taking a 10% subset is a common practice in statistical sampling, as it provides a reasonable compromise between computational feasibility and maintaining a representative sample. By selecting a smaller subset, we reduce the computational load while still capturing the main patterns and characteristics of the original data.

Data Pre-processing

Data Wrangling

glimpse(subset_airplane)
#> Rows: 10,000
#> Columns: 25
#> Groups: satisfaction [2]
#> $ X                                 <int> 40209, 36950, 77829, 97678, 68824, 8…
#> $ id                                <int> 104473, 16249, 8626, 11114, 94100, 7…
#> $ Gender                            <chr> "Female", "Male", "Female", "Male", …
#> $ Customer.Type                     <chr> "disloyal Customer", "disloyal Custo…
#> $ Age                               <int> 25, 25, 64, 54, 55, 59, 63, 36, 26, …
#> $ Type.of.Travel                    <chr> "Business travel", "Business travel"…
#> $ Class                             <chr> "Business", "Eco", "Business", "Eco"…
#> $ Flight.Distance                   <int> 1671, 748, 1652, 216, 562, 1096, 719…
#> $ Inflight.wifi.service             <int> 4, 2, 3, 3, 3, 2, 2, 4, 4, 1, 3, 3, …
#> $ Departure.Arrival.time.convenient <int> 5, 2, 2, 4, 5, 1, 4, 4, 4, 4, 3, 5, …
#> $ Ease.of.Online.booking            <int> 3, 2, 2, 3, 3, 2, 2, 4, 3, 1, 3, 3, …
#> $ Gate.location                     <int> 3, 1, 2, 1, 3, 3, 5, 3, 4, 3, 4, 4, …
#> $ Food.and.drink                    <int> 5, 3, 4, 2, 4, 2, 3, 3, 3, 4, 2, 3, …
#> $ Online.boarding                   <int> 3, 2, 4, 3, 5, 2, 5, 4, 3, 1, 3, 3, …
#> $ Seat.comfort                      <int> 5, 3, 3, 2, 5, 2, 4, 3, 3, 2, 2, 3, …
#> $ Inflight.entertainment            <int> 5, 3, 3, 2, 5, 2, 1, 3, 3, 4, 2, 5, …
#> $ On.board.service                  <int> 4, 1, 3, 5, 5, 1, 1, 5, 4, 1, 3, 4, …
#> $ Leg.room.service                  <int> 5, 3, 3, 4, 3, 4, 2, 4, 4, 2, 2, 5, …
#> $ Baggage.handling                  <int> 5, 5, 3, 5, 2, 2, 1, 3, 3, 3, 4, 4, …
#> $ Checkin.service                   <int> 4, 2, 1, 5, 5, 3, 4, 4, 4, 1, 4, 3, …
#> $ Inflight.service                  <int> 4, 5, 3, 4, 5, 2, 1, 3, 4, 3, 3, 4, …
#> $ Cleanliness                       <int> 5, 3, 1, 2, 3, 2, 4, 3, 3, 4, 2, 3, …
#> $ Departure.Delay.in.Minutes        <int> 4, 100, 0, 0, 8, 67, 0, 1, 0, 18, 32…
#> $ Arrival.Delay.in.Minutes          <dbl> 10, 88, 0, 0, 8, 53, 0, 0, 0, 32, 20…
#> $ satisfaction                      <chr> "neutral or dissatisfied", "neutral …

Dropping Un-used Columns

let’s explain why we might want to drop these columns for clustering modeling:

  • Irrelevant for Clustering: The X and id columns typically serve as identification or indexing columns, providing no meaningful information about the underlying patterns or characteristics of the data points
  • Dimensionality Reduction: Removing irrelevant or redundant columns, such as X and id, helps reduce the dimensionality of the dataset.
# Drop 'X' and 'id' columns
airplane <- select(subset_airplane, -c(X, id))

Handling NA Value

# Calculate the total and percentage of missing values in each column
missing_data <- data.frame(
  Column = names(airplane),
  Total = colSums(is.na(airplane)),
  Percent = colMeans(is.na(airplane)) * 100
)

# Create a tibble with the missing data information
as_tibble(missing_data)
#> # A tibble: 23 × 3
#>    Column                            Total Percent
#>    <chr>                             <dbl>   <dbl>
#>  1 Gender                                0       0
#>  2 Customer.Type                         0       0
#>  3 Age                                   0       0
#>  4 Type.of.Travel                        0       0
#>  5 Class                                 0       0
#>  6 Flight.Distance                       0       0
#>  7 Inflight.wifi.service                 0       0
#>  8 Departure.Arrival.time.convenient     0       0
#>  9 Ease.of.Online.booking                0       0
#> 10 Gate.location                         0       0
#> # ℹ 13 more rows

In the “Airplane” dataset, we encountered missing values in the "Arrival.Delay.in.Minutes" variable. Missing values can occur due to various reasons, such as data collection errors or unrecorded information. It is essential to handle missing values appropriately to ensure the integrity and completeness of our analysis.

To address the missing values in the “Arrival.Delay.in.Minutes” variable, we employed mean imputation. Mean imputation is a common technique used to replace missing values with the mean value of the available observations in the same variable. By imputing missing values with the mean, we aim to preserve the overall distribution and statistical properties of the variable.

Using the mean() function in R, we calculated the mean of the "Arrival.Delay.in.Minutes" variable while excluding the NA values. This mean value was then used to replace the missing values in the variable. By imputing the mean for the missing values, we have ensured that the dataset is complete and ready for further analysis.

# Compute mean of "Arrival.Delay.in.Minutes" with NA value
airplane$Arrival.Delay.in.Minutes <- mean(airplane$Arrival.Delay.in.Minutes, na.rm = TRUE)

Encoding Categorical Variabels

Converting categorical variables into numeric columns for clustering modeling is important because it enables the use of distance-based metrics, ensures equal treatment of variables, and allows compatibility with a variety of clustering algorithms. This conversion allows us to effectively analyze and identify meaningful patterns and groups within the data.

airplane <- airplane %>%
  mutate(
    Gender = case_when(
      Gender == "Female" ~ 0,
      Gender == "Male" ~ 1
    ),
    Customer.Type = case_when(
      Customer.Type == "disloyal Customer" ~ 0,
      Customer.Type == "Loyal Customer" ~ 1
    ),
    Type.of.Travel = case_when(
      Type.of.Travel == "Personal Travel" ~ 0,
      Type.of.Travel == "Business travel" ~ 1
    ),
    Class = case_when(
      Class == "Eco" ~ 1,
      Class == "Eco Plus" ~ 2,
      Class == "Business" ~ 3
    ),
    satisfaction = case_when(
      satisfaction == "neutral or dissatisfied" ~ 0,
      satisfaction == "satisfied" ~ 1
    )
  )

Scalling Numerical Variabel

Scaling numerical variables is important for clustering modeling due to the following main reasons:

  1. Comparable Measurement Scales: Scaling numerical variables ensures that they have comparable measurement scales, facilitating meaningful distance calculations for clustering.

  2. Equal Weighting of Variables: Scaling ensures that each numerical variable contributes equally to the clustering analysis, preventing variables with larger scales from dominating the results.

  3. Improved Clustering Performance: Scaling enhances clustering algorithm performance by helping them converge faster and generate more accurate clusters.

  4. Enhanced Interpretability: Scaling allows for a more balanced consideration of all variables, leading to clearer interpretations of the resulting clusters and their underlying patterns.

Overall, scaling numerical variables in clustering modeling ensures that variables are on a comparable scale, that their contributions are equally weighted, and that the clustering algorithm can perform optimally.

# Select only numeric variables for scaling
airplane_scaled <- airplane[, sapply(airplane, is.numeric)]

# Scale the numeric variables
airplane_scaled <- scale(airplane_scaled)

Exploraty Data Analysis

Checking Clustering Possibility

Checking the scatter matrix in exploratory data analysis is essential when exploring clustering opportunities in your dataset. The scatter matrix is a matrix of scatter plots that allows you to visualize the relationships between multiple variables (in this case, Age and Flight.Distance) in a single view. Each scatter plot represents the relationship between two continuous variables, and when you use color or other visual cues to represent additional categorical variables (like “satisfaction” in this case), it becomes a powerful tool for understanding data patterns.

scatter_matrix <- ggplot(airplane, aes(x = Age, y = Flight.Distance)) +
  geom_point(alpha = 0.7, aes(color = satisfaction)) +
  facet_grid(.~satisfaction) +
  theme_bw() +
  labs(title = "Scatter Plot Matrix - Age vs. Flight Distance",
       x = "Age",
       y = "Flight Distance",
       color = "Satisfaction")

# Print the scatter plot matrix
print(scatter_matrix)

Checking Dimension Reduction Opportunities

Checking Correlation

Checking the correlation matrix in exploratory data analysis is crucial when exploring clustering opportunities in your dataset. The correlation matrix quantifies the relationships between all pairs of variables, showing how they are linearly related to each other

ggcorr(airplane, label = TRUE, 
       label_size = 2.9, 
       hjust = 1, 
       layout.exp = 2)

In our dataset, it appears that there is a high correlation among the variables. Given the presence of high correlation, it is advisable to perform Principal Component Analysis (PCA).

The high correlation in the dataset suggests that some variables may carry redundant or similar information, which can lead to multicollinearity issues during clustering or other analyses. By applying PCA, we can transform the original variables into a new set of uncorrelated components, known as principal components. These components retain the most significant variance in the data while removing the multicollinearity problem.

Other EDA

Checking Matrix Covariance in airplane :

Checking the covariance matrix in the “airplane” dataset is crucial for clustering and PCA analyses. It helps identify relevant variables for clustering decisions, measure similarity between variables, and select principal components for dimensionality reduction. Understanding the covariance structure enhances the interpretability and efficiency of both clustering and PCA tasks in the dataset.

plot(prcomp(airplane_scaled))

After scaling the variables in the airplane dataset, they now range between 0 and 4. Scaling is a preprocessing step commonly used in data analysis to bring all variables to a similar scale, allowing for fair comparisons and avoiding potential dominance by variables with large numerical values

Check distribution

checking the distribution of variables using summary() is an integral part of unsupervised learning. It enables us to handle outliers, apply data preprocessing techniques, and make informed decisions during the clustering and PCA processes. By understanding the data distribution, we can optimize the unsupervised learning algorithms to extract meaningful patterns and structures from the “airplane” dataset.

summary(airplane_scaled)
#>      Gender        Customer.Type          Age           Type.of.Travel   
#>  Min.   :-0.9704   Min.   :-2.1886   Min.   :-2.19218   Min.   :-1.5968  
#>  1st Qu.:-0.9704   1st Qu.: 0.4569   1st Qu.:-0.84495   1st Qu.:-1.5968  
#>  Median :-0.9704   Median : 0.4569   Median : 0.03075   Median : 0.6262  
#>  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.0000  
#>  3rd Qu.: 1.0304   3rd Qu.: 0.4569   3rd Qu.: 0.77173   3rd Qu.: 0.6262  
#>  Max.   : 1.0304   Max.   : 0.4569   Max.   : 3.06202   Max.   : 0.6262  
#>      Class         Flight.Distance   Inflight.wifi.service
#>  Min.   :-1.1234   Min.   :-1.1549   Min.   :-2.0410      
#>  1st Qu.:-1.1234   1st Qu.:-0.7936   1st Qu.:-0.5743      
#>  Median : 0.9525   Median :-0.3657   Median : 0.1591      
#>  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000      
#>  3rd Qu.: 0.9525   3rd Qu.: 0.5979   3rd Qu.: 0.8924      
#>  Max.   : 0.9525   Max.   : 3.6501   Max.   : 1.6258      
#>  Departure.Arrival.time.convenient Ease.of.Online.booking Gate.location     
#>  Min.   :-2.00013                  Min.   :-1.9468        Min.   :-1.52751  
#>  1st Qu.:-0.69183                  1st Qu.:-0.5478        1st Qu.:-0.75530  
#>  Median :-0.03768                  Median : 0.1517        Median : 0.01691  
#>  Mean   : 0.00000                  Mean   : 0.0000        Mean   : 0.00000  
#>  3rd Qu.: 0.61647                  3rd Qu.: 0.8512        3rd Qu.: 0.78912  
#>  Max.   : 1.27062                  Max.   : 1.5508        Max.   : 1.56133  
#>  Food.and.drink    Online.boarding    Seat.comfort     Inflight.entertainment
#>  Min.   :-2.4392   Min.   :-2.4620   Min.   :-1.9018   Min.   :-1.8328       
#>  1st Qu.:-0.9247   1st Qu.:-0.9865   1st Qu.:-0.3758   1st Qu.:-1.0728       
#>  Median :-0.1675   Median : 0.4889   Median : 0.3872   Median : 0.4472       
#>  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000       
#>  3rd Qu.: 0.5897   3rd Qu.: 0.4889   3rd Qu.: 1.1502   3rd Qu.: 0.4472       
#>  Max.   : 1.3469   Max.   : 1.2266   Max.   : 1.1502   Max.   : 1.2072       
#>  On.board.service  Leg.room.service  Baggage.handling  Checkin.service  
#>  Min.   :-1.9094   Min.   :-2.6228   Min.   :-2.2546   Min.   :-1.8930  
#>  1st Qu.:-0.3394   1st Qu.:-1.0846   1st Qu.:-0.5574   1st Qu.:-0.2913  
#>  Median : 0.4456   Median : 0.4536   Median : 0.2913   Median :-0.2913  
#>  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
#>  3rd Qu.: 0.4456   3rd Qu.: 0.4536   3rd Qu.: 1.1399   3rd Qu.: 0.5095  
#>  Max.   : 1.2306   Max.   : 1.2227   Max.   : 1.1399   Max.   : 1.3103  
#>  Inflight.service   Cleanliness      Departure.Delay.in.Minutes
#>  Min.   :-2.2799   Min.   :-1.7809   Min.   :-0.39139          
#>  1st Qu.:-0.5742   1st Qu.:-1.0150   1st Qu.:-0.39139          
#>  Median : 0.2786   Median :-0.2491   Median :-0.39139          
#>  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.00000          
#>  3rd Qu.: 1.1315   3rd Qu.: 0.5168   3rd Qu.:-0.09478          
#>  Max.   : 1.1315   Max.   : 1.2828   Max.   :19.13081          
#>  Arrival.Delay.in.Minutes  satisfaction
#>  Min.   :1                Min.   :-1   
#>  1st Qu.:1                1st Qu.:-1   
#>  Median :1                Median : 0   
#>  Mean   :1                Mean   : 0   
#>  3rd Qu.:1                3rd Qu.: 1   
#>  Max.   :1                Max.   : 1

The result from the summary() function indicates that after scaling, the variables in the “airplane” dataset now range between 0 and 4. Scaling is a crucial step in unsupervised learning, including clustering and PCA, as it ensures that all variables are on a standardized scale. This allows for fair comparisons and prevents certain variables from dominating the analysis due to their larger numerical values. By scaling the variables to the same range, we ensure that each variable contributes equally to the unsupervised learning process, leading to more meaningful and reliable results.


Clustering

Clustering is an unsupervised machine learning technique used to identify and group similar data points into clusters based on their intrinsic patterns and similarities. The primary goal of clustering is to partition the data in such a way that objects within the same cluster are more similar to each other than to those in other clusters.

In the context of our airplane dataframe, clustering can help us discover distinct groups of passengers based on their characteristics and experiences. For example, we can use clustering to identify segments of passengers who share similar travel preferences, satisfaction levels, or other attributes. By doing so, we can gain valuable insights into customer behavior, tailor services, and improve overall customer experience.

Finding Optimal Number of Cluster

Finding the optimal number of clusters is a critical step in clustering analysis, as it determines the most appropriate number of groups to partition the data. Three commonly used methods for this purpose are the Elbow Method, Silhouette Method.

Elbow Method

fviz_nbclust(
  x = airplane_scaled, #data untuk clustering
  FUNcluster = kmeans, #algoritma kmeans
  method = "wss" #berdasarkan wss
)

The Elbow Method suggests that the optimal number of clusters is 3. At this point, the explained variance starts to level off, and adding more clusters does not significantly improve the variance explained by the clustering algorithm.

Silhouette Method

fviz_nbclust(
  x = airplane_scaled, #data untuk clustering
  FUNcluster = kmeans, #algoritma kmeans
  method = "silhouette" #berdasarkan Sillhouete Methode
 )

The Silhouette Method yields an optimal number of 2 clusters. This is determined by the maximum average silhouette score achieved when the data points are divided into two groups. Higher average silhouette scores indicate well-defined and distinct clusters

In conclusion, based on the different clustering analysis methods applied to the “airplane” dataset, we obtain varying optimal numbers of clusters: 3 clusters from the Elbow Method, 2 clusters from the Silhouette Method, and 1 cluster from the Gap Statistic. Each method offers valuable insights into the dataset, and the choice of the optimal number of clusters depends on the specific context and objectives of the analysis.

K-Means Clustering

Right now we are performing K-Means Clustering on the airplane_scaled dataset. K-Means is an unsupervised machine learning algorithm that partitions the data into a specified number of clusters (in this case, 3 clusters) based on the similarity of data points to the cluster centers. The algorithm aims to minimize the within-cluster sum of squares, effectively grouping similar data points together while maximizing the separation between clusters.

library(animation)
# pakai 'interval' yang lebih tinggi bila animasi terlalu cepat
# jalankan command ini di console:

RNGkind(sample.kind = "Rounding")
set.seed(100)
ani.options(interval = 1)
par(mar = c(3, 3, 1, 1.5), mgp = c(1.5, 0.5, 0))
kmeans.ani()
RNGkind(sample.kind = "Rounding")
set.seed(100)

# k-means dengan k optimum
airplane_cluster <- kmeans(x = airplane_scaled,
                           centers = 3)

airplane_cluster
#> K-means clustering with 3 clusters of sizes 2652, 3392, 3956
#> 
#> Cluster means:
#>         Gender Customer.Type        Age Type.of.Travel      Class
#> 1 -0.029589517   -0.07581208 -0.1446386    -0.75685600 -0.7942162
#> 2  0.001108934   -0.27780536 -0.1781261    -0.09664744 -0.3014576
#> 3  0.018885211    0.28902159  0.2496929     0.59024526  0.7909013
#>   Flight.Distance Inflight.wifi.service Departure.Arrival.time.convenient
#> 1      -0.4985866           0.005037965                        0.21145103
#> 2      -0.2225662          -0.339491421                       -0.11096264
#> 3       0.5250749           0.287713402                       -0.04660841
#>   Ease.of.Online.booking Gate.location Food.and.drink Online.boarding
#> 1             -0.1236495    0.02186144      0.5917204      -0.1898569
#> 2             -0.1706043   -0.03704310     -0.8322952      -0.5970224
#> 3              0.2291730    0.01710658      0.3169623       0.6391811
#>   Seat.comfort Inflight.entertainment On.board.service Leg.room.service
#> 1    0.3834822              0.4119284       -0.1922423       -0.3125912
#> 2   -0.9253081             -1.0501691       -0.4622439       -0.3574390
#> 3    0.5363120              0.6243022        0.5252168        0.5160326
#>   Baggage.handling Checkin.service Inflight.service Cleanliness
#> 1       -0.1849048     -0.06788385       -0.1490822   0.5740314
#> 2       -0.3902565     -0.34446938       -0.3818722  -0.9725540
#> 3        0.4585737      0.34086656        0.4273702   0.4490829
#>   Departure.Delay.in.Minutes Arrival.Delay.in.Minutes satisfaction
#> 1                 0.01766059                  0.99995   -0.5384346
#> 2                 0.04675663                  0.99995   -0.6691882
#> 3                -0.05192983                  0.99995    0.9347359
#> 
#> Clustering vector:
#>     [1] 3 2 2 2 1 2 1 1 1 2 2 1 2 1 1 1 2 2 2 2 2 1 1 2 1 2 1 2 2 2 1 2 2 1 2 1
#>    [37] 2 1 1 1 1 1 2 2 2 2 2 2 2 2 1 2 1 2 2 2 2 2 1 1 2 1 1 1 2 2 2 2 1 1 1 2
#>    [73] 2 2 1 1 1 1 2 2 2 2 2 1 2 2 1 1 1 1 2 1 3 2 2 2 1 2 2 2 2 2 2 1 2 1 1 1
#>   [109] 1 2 1 1 1 2 2 2 2 1 1 2 2 2 2 1 2 2 1 2 1 1 1 1 2 2 1 2 2 1 2 2 2 1 2 1
#>   [145] 2 1 2 1 2 2 1 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 2 1 1 2 1 2 1 1
#>   [181] 2 1 2 2 2 1 1 2 1 1 2 2 2 3 1 2 2 1 2 2 2 2 1 1 2 2 2 2 1 2 2 1 2 3 2 1
#>   [217] 2 2 1 1 2 2 2 2 2 2 2 1 1 1 1 2 2 1 1 1 2 2 2 1 2 1 2 2 2 2 1 2 1 2 1 1
#>   [253] 1 1 1 2 1 2 2 2 1 2 1 2 2 1 2 2 2 1 1 2 2 1 2 1 1 2 2 1 1 1 1 1 2 2 2 1
#>   [289] 2 1 2 2 1 2 1 2 3 1 2 1 1 3 2 2 2 1 1 2 2 2 1 2 2 1 2 1 2 2 2 1 2 2 1 2
#>   [325] 2 2 2 2 2 1 2 2 2 1 2 1 1 2 1 2 2 2 1 2 2 1 2 1 2 2 2 2 2 1 2 1 2 1 1 1
#>   [361] 2 1 2 1 1 2 1 2 2 1 2 2 1 2 2 1 1 2 2 2 1 1 1 1 2 1 1 1 1 2 2 2 2 1 1 2
#>   [397] 1 2 1 2 2 2 1 2 1 2 1 2 2 2 2 2 2 2 2 2 1 1 2 1 2 2 2 1 1 2 2 2 2 2 1 1
#>   [433] 1 2 2 2 2 2 3 1 2 2 1 2 1 1 2 1 2 2 3 2 2 1 1 2 1 2 2 1 1 2 2 3 2 3 1 2
#>   [469] 1 2 2 2 2 1 2 1 2 2 1 1 1 1 2 2 1 2 1 2 2 2 2 1 2 2 2 2 2 2 1 1 2 1 1 2
#>   [505] 2 2 2 2 1 1 2 1 1 2 2 2 1 2 2 2 1 1 2 1 2 1 2 1 2 2 1 2 1 2 1 2 2 2 2 1
#>   [541] 2 2 1 2 3 2 1 2 1 2 2 2 1 1 2 2 1 2 2 3 2 1 2 2 1 2 2 2 1 1 1 1 2 1 1 1
#>   [577] 1 1 2 2 1 1 2 2 2 2 1 2 1 1 1 2 2 1 1 1 2 2 2 2 2 2 2 2 1 1 2 1 1 1 2 2
#>   [613] 1 2 1 1 2 1 1 1 1 2 2 1 2 1 2 2 2 1 1 2 1 1 1 2 1 2 2 1 1 2 1 1 2 2 1 1
#>   [649] 2 2 2 1 2 1 2 2 2 2 1 3 1 1 3 2 1 2 1 2 2 2 2 1 2 2 1 2 1 1 2 2 2 1 1 1
#>   [685] 1 2 1 1 1 1 2 1 1 1 2 2 2 3 1 2 2 2 2 1 2 2 2 1 1 2 3 1 2 1 2 2 2 1 1 2
#>   [721] 1 1 2 2 1 2 2 1 2 2 1 2 2 2 1 1 2 2 2 1 2 1 2 2 1 1 1 1 1 3 2 2 2 1 1 2
#>   [757] 2 2 3 2 2 2 2 2 2 2 1 1 2 1 1 2 1 1 2 2 2 2 1 1 2 3 2 2 2 2 2 1 2 1 2 1
#>   [793] 2 1 2 2 1 1 2 2 1 1 2 2 2 1 1 1 2 2 2 2 2 1 2 1 2 2 2 2 2 2 2 1 2 2 1 2
#>   [829] 2 2 1 1 1 2 2 1 2 2 1 2 2 2 1 2 1 2 2 1 2 2 1 1 2 2 2 1 3 2 1 1 2 2 2 1
#>   [865] 2 1 2 2 2 1 1 2 2 1 1 2 2 1 2 1 1 2 2 2 2 2 1 2 1 1 2 2 1 2 2 2 2 2 2 1
#>   [901] 2 2 2 2 2 2 1 2 2 2 2 1 2 1 1 2 1 2 1 2 1 1 2 2 2 1 1 2 2 2 1 2 2 1 2 1
#>   [937] 2 1 2 1 1 2 1 2 2 2 1 1 1 2 1 2 3 2 1 1 2 2 2 2 2 1 2 2 1 1 2 1 2 1 1 2
#>   [973] 2 1 2 2 1 1 2 1 1 2 2 2 2 1 1 2 1 1 1 1 2 2 1 1 2 1 2 1 2 2 2 2 1 2 2 2
#>  [1009] 1 2 2 2 2 1 2 2 2 1 2 1 2 2 1 2 2 1 2 2 2 1 1 2 2 1 1 2 2 2 2 1 2 2 2 1
#>  [1045] 2 1 2 1 2 1 2 3 2 1 2 2 1 1 1 2 2 1 2 2 1 2 2 2 1 2 1 3 2 2 1 1 2 2 1 2
#>  [1081] 3 1 2 2 3 2 2 2 2 1 1 2 2 1 2 1 3 2 2 2 2 1 1 2 2 2 2 1 2 2 1 1 1 1 2 2
#>  [1117] 2 2 2 2 2 1 2 2 2 2 2 2 2 1 1 3 2 1 1 1 1 2 2 2 2 2 1 2 2 2 2 1 1 2 1 1
#>  [1153] 1 2 1 2 2 2 2 2 2 2 2 1 1 2 1 2 2 2 1 2 1 2 2 2 1 1 1 2 2 1 2 1 1 2 2 2
#>  [1189] 1 1 2 1 2 2 1 2 2 2 2 2 2 1 1 1 2 2 2 2 1 2 1 2 2 1 1 2 2 2 2 1 1 2 2 2
#>  [1225] 1 2 1 2 1 2 2 1 1 2 1 1 2 2 2 3 2 2 2 2 2 1 1 2 2 1 1 1 1 1 2 2 2 1 2 1
#>  [1261] 1 2 2 2 1 2 2 2 1 2 1 2 1 2 2 2 1 1 1 2 1 2 1 1 1 2 2 1 1 1 2 2 3 2 2 2
#>  [1297] 2 1 2 1 1 1 2 2 2 1 2 2 1 1 2 1 1 1 1 2 1 1 2 2 2 1 1 2 1 2 2 2 2 2 2 2
#>  [1333] 2 2 2 1 2 2 2 2 3 2 1 2 1 2 2 1 2 1 2 1 1 1 1 2 2 2 2 2 1 1 2 2 2 1 1 2
#>  [1369] 2 1 2 2 2 1 2 2 2 2 2 1 1 1 2 2 2 2 1 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2
#>  [1405] 1 2 1 2 2 1 2 1 2 2 3 2 2 1 1 2 1 1 2 2 2 2 2 2 2 1 2 1 2 1 2 2 1 2 1 2
#>  [1441] 1 2 2 2 2 2 1 1 1 1 1 2 2 1 1 1 2 1 1 2 1 1 2 2 1 2 2 2 1 1 1 2 2 2 2 2
#>  [1477] 1 2 2 2 2 2 2 2 2 1 2 2 2 1 2 2 2 2 2 1 1 1 2 2 2 1 1 2 1 2 2 2 1 2 2 2
#>  [1513] 2 2 2 1 1 2 2 2 1 2 1 1 2 1 2 2 1 1 2 2 1 2 1 3 2 2 2 2 2 2 1 2 2 1 1 2
#>  [1549] 2 1 2 2 2 2 1 1 1 2 2 1 2 2 2 1 1 1 2 1 2 2 2 1 1 1 2 1 2 1 1 1 1 1 2 3
#>  [1585] 1 1 2 2 1 2 1 1 3 2 1 1 2 1 2 1 1 2 1 3 1 1 1 1 2 2 2 2 1 2 2 2 2 2 1 2
#>  [1621] 2 1 1 1 1 2 2 1 1 2 2 2 2 1 1 2 1 1 2 2 2 3 1 2 1 2 2 2 2 2 2 2 2 2 2 1
#>  [1657] 2 1 2 2 2 3 1 2 1 1 1 1 3 2 1 2 2 2 1 2 1 1 2 2 2 2 2 1 1 1 1 1 2 2 2 2
#>  [1693] 2 1 2 1 2 2 2 2 1 3 2 2 2 2 2 3 2 2 2 1 2 2 2 2 2 1 2 1 1 1 1 1 1 2 2 1
#>  [1729] 2 1 2 2 1 2 1 1 1 2 2 2 1 2 2 1 2 1 2 2 1 1 2 1 2 2 1 2 2 1 1 2 1 1 2 1
#>  [1765] 2 1 2 2 2 1 1 2 2 1 1 2 1 1 2 1 1 2 2 2 1 1 1 2 2 2 2 2 2 2 1 3 1 1 2 2
#>  [1801] 1 1 1 1 2 2 2 2 2 2 2 1 1 2 2 1 2 2 2 2 2 2 2 1 1 1 2 2 2 1 1 1 1 2 1 2
#>  [1837] 2 1 1 2 2 2 1 2 2 1 2 1 2 2 1 1 1 2 2 2 3 1 1 1 1 1 2 1 1 2 2 3 3 2 1 2
#>  [1873] 2 1 2 1 2 2 2 2 2 2 1 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2
#>  [1909] 2 1 2 1 2 1 2 2 1 2 1 2 1 2 2 2 2 1 1 2 2 1 1 2 2 2 3 2 2 1 1 1 1 2 2 1
#>  [1945] 2 2 2 2 2 1 1 2 2 1 1 2 1 2 2 2 1 2 2 1 2 2 2 3 2 1 1 2 2 2 2 2 2 1 2 1
#>  [1981] 2 2 2 1 1 2 1 2 2 2 2 2 1 2 2 1 1 2 2 1 2 1 1 1 2 2 2 1 2 2 1 2 1 1 1 2
#>  [2017] 2 1 2 2 2 2 1 1 1 2 2 2 2 2 2 1 1 2 2 1 1 2 1 2 2 2 2 2 2 2 2 2 2 3 1 1
#>  [2053] 2 2 2 2 1 2 2 1 2 2 2 2 1 1 2 1 1 2 1 2 2 2 1 2 2 1 1 1 1 2 2 2 2 1 2 2
#>  [2089] 2 2 1 2 1 2 2 2 2 2 3 2 2 2 1 2 2 2 2 1 1 1 1 2 2 2 1 2 2 1 1 2 2 2 2 3
#>  [2125] 1 2 2 2 2 1 2 2 2 2 1 2 2 2 1 1 2 2 2 1 2 1 2 1 1 1 1 1 1 2 2 2 1 2 2 2
#>  [2161] 2 2 2 2 2 2 1 2 2 2 2 1 2 1 2 1 1 1 1 2 1 1 2 1 3 1 2 2 1 1 1 2 3 2 1 3
#>  [2197] 1 1 2 2 2 2 1 2 1 2 1 1 2 2 1 2 2 1 2 2 1 1 1 1 1 1 2 2 1 1 3 2 3 2 2 2
#>  [2233] 2 1 2 1 1 1 2 1 1 1 2 2 2 2 2 1 1 2 2 2 2 2 2 2 1 2 1 2 2 2 1 2 2 1 1 2
#>  [2269] 1 2 1 1 1 2 2 2 1 2 1 2 2 2 1 1 1 2 2 1 1 1 2 1 2 2 3 2 2 1 1 2 2 2 3 1
#>  [2305] 2 2 1 2 2 2 2 1 2 2 2 2 2 1 2 1 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 1 2
#>  [2341] 2 1 1 2 1 2 1 1 1 2 1 1 1 2 2 2 1 1 2 2 1 1 2 2 2 1 1 2 2 1 1 1 2 1 2 1
#>  [2377] 2 1 2 2 2 1 2 2 1 1 2 2 2 2 1 2 2 2 1 2 1 2 2 2 2 1 3 1 3 1 2 2 2 3 1 2
#>  [2413] 1 1 1 2 1 2 1 1 2 1 1 2 1 2 1 2 1 2 1 2 1 2 1 3 1 2 1 1 1 1 2 2 2 1 2 2
#>  [2449] 2 2 1 1 2 1 2 1 2 2 1 1 1 1 2 1 1 3 2 2 1 2 1 2 2 2 1 2 2 1 1 2 2 3 3 2
#>  [2485] 2 3 2 1 2 1 1 3 1 1 1 2 1 2 2 2 3 1 2 1 2 1 2 1 2 1 1 1 2 1 1 2 2 1 2 1
#>  [2521] 1 2 2 2 2 2 2 2 1 1 1 1 2 2 2 2 1 2 2 2 1 2 2 2 2 2 1 2 2 2 1 2 1 2 2 1
#>  [2557] 2 1 3 2 1 1 2 3 1 1 2 2 2 2 2 2 1 2 1 2 1 2 2 1 1 2 1 1 1 2 1 1 1 1 1 2
#>  [2593] 1 2 2 1 1 3 1 2 1 1 1 1 2 3 2 2 2 2 1 1 2 1 3 2 2 2 1 2 2 2 1 1 1 1 2 2
#>  [2629] 1 1 1 2 1 1 2 1 2 1 2 2 2 1 1 1 2 1 2 2 2 2 1 2 2 1 2 2 1 2 2 2 2 1 2 1
#>  [2665] 2 2 1 1 1 1 1 2 1 1 2 1 2 2 2 1 2 1 2 2 1 2 1 2 1 1 1 2 2 2 1 2 1 1 1 2
#>  [2701] 2 3 1 1 2 2 2 2 1 2 1 2 1 1 1 1 1 2 1 2 2 2 1 1 2 2 1 1 1 2 2 2 2 2 2 2
#>  [2737] 2 1 1 2 2 1 2 1 2 2 2 2 2 1 2 2 2 2 2 2 1 2 1 1 2 1 1 1 1 2 2 2 1 1 3 2
#>  [2773] 2 2 2 2 2 1 2 2 1 2 3 2 2 2 2 1 1 2 1 1 1 2 2 1 1 1 1 1 2 1 2 2 1 2 1 1
#>  [2809] 1 2 1 1 2 2 1 2 2 2 2 2 1 1 2 1 1 1 1 2 1 2 2 2 2 2 1 1 2 3 2 2 2 2 2 2
#>  [2845] 1 1 2 2 2 1 1 1 2 2 1 1 2 1 3 2 2 1 1 1 2 2 2 1 2 1 1 2 2 1 1 1 1 2 2 2
#>  [2881] 1 1 1 1 2 2 1 2 2 2 2 2 2 1 2 1 2 2 2 2 2 1 2 1 2 1 2 2 2 3 2 2 2 2 2 2
#>  [2917] 1 2 2 1 1 2 2 1 1 2 2 1 1 1 1 2 1 2 1 1 1 2 1 1 2 1 1 2 2 2 2 2 2 1 2 2
#>  [2953] 2 2 2 2 2 2 2 1 1 2 1 2 1 2 1 1 2 2 2 2 3 2 1 1 2 2 2 1 2 1 2 2 2 2 2 2
#>  [2989] 2 2 2 2 1 1 2 2 1 1 2 1 3 2 2 2 2 2 1 1 1 2 3 1 1 2 2 2 1 2 2 2 2 1 1 2
#>  [3025] 2 2 3 2 1 1 2 2 1 2 1 1 1 2 1 2 2 2 1 1 2 2 1 2 2 2 2 2 1 1 1 1 2 1 1 2
#>  [3061] 1 1 2 1 2 2 1 2 1 1 2 2 2 1 2 2 1 1 2 1 2 1 2 2 1 1 2 2 1 2 2 2 1 1 2 2
#>  [3097] 1 1 1 1 2 1 2 1 2 1 1 3 1 2 1 1 2 1 2 2 1 2 2 2 1 1 2 1 3 1 2 2 2 1 2 1
#>  [3133] 1 2 2 2 1 2 1 2 2 1 1 1 1 2 2 2 1 1 2 2 1 2 1 1 2 2 2 2 2 2 3 1 2 2 2 1
#>  [3169] 2 1 1 1 1 2 2 2 2 1 2 1 2 2 1 1 2 1 2 1 2 2 2 1 1 2 2 1 2 3 2 2 1 1 1 2
#>  [3205] 1 1 2 2 2 1 1 2 2 2 1 2 1 2 1 2 1 2 1 1 2 2 2 1 2 1 2 2 2 1 2 3 1 1 2 2
#>  [3241] 1 2 2 2 1 1 2 1 2 3 1 2 1 2 2 2 2 2 1 2 1 2 2 2 1 2 2 1 2 2 2 1 2 2 1 1
#>  [3277] 1 2 2 2 1 2 2 2 1 2 2 2 1 1 2 1 1 2 1 2 2 1 1 1 2 2 2 2 2 2 3 2 1 1 1 2
#>  [3313] 2 2 2 2 2 1 2 2 2 2 1 1 1 2 1 1 2 3 1 2 1 2 1 2 2 2 1 1 1 2 1 1 2 1 1 2
#>  [3349] 2 1 1 1 2 2 2 2 1 1 2 2 2 1 2 1 2 1 1 2 2 2 1 2 1 3 1 2 2 2 2 1 1 2 1 1
#>  [3385] 2 2 1 3 2 2 1 2 1 2 2 1 1 1 1 2 2 2 2 2 1 2 2 1 1 2 2 1 2 2 1 2 1 1 1 2
#>  [3421] 2 2 2 1 1 1 2 2 1 2 1 2 3 2 1 1 1 2 2 1 2 2 1 1 2 2 1 1 1 2 1 2 2 2 1 1
#>  [3457] 2 2 1 2 1 1 2 1 1 2 2 1 1 1 2 1 1 2 1 1 2 2 2 2 1 1 1 1 2 2 2 2 2 2 1 1
#>  [3493] 1 1 1 2 1 2 2 2 1 2 1 2 1 1 1 1 2 2 1 2 2 1 2 1 2 1 2 2 2 1 1 2 1 1 2 2
#>  [3529] 2 1 1 2 2 1 2 1 1 2 2 1 1 2 1 1 2 2 2 1 1 1 2 2 2 1 1 1 2 1 1 2 2 2 2 1
#>  [3565] 2 3 2 2 2 1 2 1 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 1 2 2 1 1 2 2 2 2 2 1 2
#>  [3601] 2 1 2 2 1 1 2 2 1 2 3 1 2 1 1 1 2 1 1 2 1 2 1 2 2 1 1 2 1 1 2 1 1 2 2 2
#>  [3637] 2 1 2 2 2 1 1 1 1 1 2 2 2 3 2 2 1 2 1 1 2 1 1 2 1 1 1 1 3 2 1 1 1 1 2 2
#>  [3673] 2 2 2 1 1 2 2 1 2 1 1 2 2 2 2 2 3 2 1 1 2 2 2 1 2 2 2 2 1 2 1 2 2 2 1 2
#>  [3709] 2 2 2 1 1 2 1 2 2 2 1 2 1 2 1 2 2 1 2 2 2 1 3 2 1 1 1 2 1 2 2 1 2 2 1 2
#>  [3745] 2 1 1 1 1 1 1 2 2 2 3 2 1 1 2 1 2 1 1 2 2 1 1 2 1 1 2 2 1 2 2 2 2 2 1 2
#>  [3781] 1 2 2 1 2 1 2 1 2 2 2 1 2 1 2 1 2 2 2 1 2 2 2 2 2 1 2 2 1 2 2 2 1 2 2 2
#>  [3817] 2 1 2 2 1 1 1 2 1 2 2 2 3 2 2 1 1 2 2 2 1 2 1 2 2 2 1 1 2 2 1 2 2 1 1 1
#>  [3853] 2 1 2 2 3 1 2 1 2 2 2 1 1 1 2 2 3 2 1 1 1 2 2 1 1 2 1 2 2 2 1 1 2 1 1 3
#>  [3889] 1 1 1 2 2 2 2 1 1 1 1 2 2 2 1 2 2 2 1 2 2 2 2 2 2 1 2 2 1 2 1 2 1 2 2 1
#>  [3925] 1 2 1 1 2 1 3 1 2 1 2 2 1 2 2 2 2 1 1 2 1 2 1 2 1 3 2 2 1 2 1 1 1 2 2 2
#>  [3961] 1 1 2 1 2 2 2 2 2 1 1 1 2 1 1 2 1 2 2 2 2 2 2 2 1 1 2 2 1 2 2 2 2 1 1 2
#>  [3997] 1 1 2 2 2 2 1 1 2 1 2 2 1 1 1 2 2 1 2 2 1 2 2 3 1 1 2 2 1 1 2 1 1 2 2 2
#>  [4033] 1 2 1 1 1 2 1 1 2 2 1 2 2 2 1 2 1 1 2 2 1 1 1 1 1 2 2 2 1 1 1 1 2 2 1 1
#>  [4069] 2 2 2 1 1 2 2 1 2 2 2 1 1 1 2 1 1 2 1 2 1 2 1 1 2 2 2 1 2 2 2 2 1 2 1 1
#>  [4105] 1 2 1 2 2 2 1 1 2 1 2 2 1 3 2 1 1 2 2 1 1 2 1 2 3 2 2 1 2 2 1 2 1 2 2 1
#>  [4141] 2 1 2 2 1 1 2 1 2 2 1 1 2 2 1 2 1 1 1 1 2 2 1 2 2 1 1 1 2 2 2 1 2 1 1 1
#>  [4177] 1 1 2 1 1 2 1 2 1 2 1 2 1 1 1 2 2 2 2 1 2 2 2 2 1 2 2 2 1 3 1 2 2 2 1 2
#>  [4213] 1 1 2 2 2 2 1 1 1 1 1 1 2 1 1 1 2 1 2 1 2 2 1 1 1 2 2 2 2 2 2 2 1 1 1 2
#>  [4249] 1 2 2 2 2 3 1 2 2 1 2 2 2 2 1 1 2 2 1 2 2 2 2 2 2 1 2 1 1 1 2 2 2 1 2 2
#>  [4285] 1 1 1 2 2 1 1 1 2 2 1 2 2 1 2 2 2 1 2 2 2 1 1 3 2 1 2 2 1 1 1 1 2 1 2 2
#>  [4321] 2 2 1 2 1 2 2 2 1 2 2 1 1 2 2 1 1 2 2 2 1 2 1 2 2 1 2 1 1 1 1 1 2 2 1 2
#>  [4357] 2 2 1 1 2 2 2 2 1 1 1 2 2 1 2 2 2 1 2 2 1 1 1 1 3 2 1 2 2 2 2 1 1 2 1 2
#>  [4393] 2 2 1 2 2 1 2 2 2 1 3 2 1 1 2 1 1 1 2 2 2 1 2 2 1 1 2 2 2 2 2 1 1 2 1 1
#>  [4429] 1 1 3 2 2 2 2 2 2 2 2 2 1 2 1 2 1 1 2 2 2 1 1 1 1 2 2 2 1 2 2 2 2 2 1 2
#>  [4465] 2 2 2 1 2 2 1 2 2 1 2 2 1 2 2 2 2 2 1 1 3 2 2 1 1 2 2 1 2 1 2 2 1 2 1 2
#>  [4501] 1 2 2 1 1 2 1 2 2 1 1 2 1 1 1 1 2 2 2 1 2 2 1 2 3 2 1 2 1 1 2 1 1 2 2 2
#>  [4537] 1 2 2 1 1 2 1 2 1 2 2 2 2 2 1 1 2 1 1 1 2 1 2 2 1 2 2 1 2 1 2 2 1 2 1 2
#>  [4573] 1 1 1 1 1 1 2 1 2 2 1 2 1 2 1 2 2 2 2 2 2 1 1 2 2 2 1 2 2 2 2 1 1 2 2 1
#>  [4609] 2 1 2 2 2 2 2 1 2 1 1 1 3 2 2 2 2 2 2 1 1 2 1 2 2 2 2 2 1 2 2 2 1 2 2 2
#>  [4645] 2 1 3 2 1 2 1 1 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 1 1 2 2 3 1 3 1 1 2 2 1
#>  [4681] 1 1 1 1 2 2 2 2 1 2 2 2 1 1 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2
#>  [4717] 1 3 2 2 1 2 1 1 1 2 1 2 2 2 1 2 1 1 2 1 1 2 2 2 2 2 2 1 2 2 1 1 2 1 2 1
#>  [4753] 1 1 2 2 2 3 2 2 1 2 1 1 2 2 2 1 1 1 3 3 2 2 2 2 1 2 1 2 1 2 2 2 1 1 1 1
#>  [4789] 1 2 2 2 2 1 1 2 1 2 1 2 1 1 1 2 1 1 2 1 1 2 1 2 1 2 2 1 2 1 2 2 1 2 2 2
#>  [4825] 1 1 1 2 2 2 2 1 2 1 1 2 2 2 1 1 2 2 2 1 2 1 1 2 2 1 2 2 1 1 1 2 2 2 2 2
#>  [4861] 1 2 2 1 1 2 1 2 2 2 1 1 1 2 2 2 2 1 2 3 1 1 2 1 2 1 2 2 2 2 1 1 2 2 1 1
#>  [4897] 3 2 1 1 2 1 1 1 2 1 1 2 3 2 2 2 2 2 1 2 2 1 1 2 2 2 1 1 1 2 1 1 2 2 1 2
#>  [4933] 1 1 2 2 2 1 2 2 2 2 1 2 1 2 1 1 1 1 2 2 2 1 1 1 2 2 2 2 2 1 1 1 2 1 1 2
#>  [4969] 2 2 2 1 2 2 2 2 1 1 3 1 1 1 3 2 2 1 2 1 2 2 2 1 2 2 2 1 2 1 2 1 2 3 3 3
#>  [5005] 3 2 3 3 3 3 3 3 3 1 3 3 3 3 3 1 3 3 3 3 3 3 2 3 3 3 3 3 3 3 1 3 3 3 3 3
#>  [5041] 1 3 2 2 1 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#>  [5077] 3 3 2 3 3 2 3 3 1 2 3 3 3 3 2 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 1 3 3 3 3 3
#>  [5113] 2 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 1 3 3 3 2 3 3 3 1 3 3 3 3 3 2 1 3 3 3
#>  [5149] 3 1 3 3 3 3 3 3 1 3 3 3 2 3 3 3 3 3 3 3 1 3 3 3 3 3 3 2 3 3 3 3 3 1 3 3
#>  [5185] 3 3 3 3 3 3 3 3 3 1 3 1 2 3 1 3 1 3 3 3 3 3 3 2 2 1 3 3 3 3 3 2 3 3 3 3
#>  [5221] 3 2 3 3 3 3 3 3 3 3 3 3 1 1 3 3 2 3 3 3 3 3 3 1 3 1 3 3 3 3 3 3 3 3 3 2
#>  [5257] 3 3 2 1 3 3 2 3 3 3 1 3 3 1 3 3 3 3 3 2 3 3 3 3 3 3 3 3 1 3 2 3 3 3 1 3
#>  [5293] 1 3 2 1 3 3 3 3 1 2 3 2 1 3 2 3 3 1 2 3 3 3 3 2 3 3 3 2 3 3 3 3 3 3 3 2
#>  [5329] 3 2 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 3 3 3 3 2 3 3 2 2 3
#>  [5365] 3 3 2 3 3 2 3 2 3 2 3 3 3 3 3 2 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#>  [5401] 3 3 2 3 3 3 1 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 2 3 2 3 3 1 3 3 3 3 3 3 3 3
#>  [5437] 3 3 3 2 3 2 3 3 3 1 3 3 3 3 3 3 1 3 3 1 3 3 3 3 3 3 2 3 1 3 3 2 3 3 3 3
#>  [5473] 3 3 3 3 2 3 3 3 3 3 2 2 3 2 3 3 1 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3
#>  [5509] 3 3 3 2 3 3 3 3 3 3 3 3 2 3 3 3 1 3 2 3 3 3 3 3 1 3 3 3 3 3 3 3 1 3 3 3
#>  [5545] 3 3 3 2 3 3 3 3 3 1 3 3 3 3 1 2 3 3 3 2 3 2 3 2 3 3 3 3 3 3 2 3 3 3 1 3
#>  [5581] 3 3 3 3 1 3 3 3 3 2 3 3 3 3 2 3 3 3 1 3 3 3 3 2 3 2 3 3 3 2 3 3 3 3 3 2
#>  [5617] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 2 3 3 1 3 3 2 3 3 3 3 3 3 1 1 3 3 2 3
#>  [5653] 2 2 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3
#>  [5689] 3 3 1 3 3 3 3 2 3 3 3 1 3 2 3 3 2 2 3 3 1 2 3 3 3 3 3 3 1 3 3 3 3 3 3 3
#>  [5725] 3 3 3 1 3 3 3 3 3 2 3 3 3 3 2 3 3 2 2 2 3 3 2 3 3 3 2 3 3 3 3 2 3 3 3 3
#>  [5761] 3 3 3 3 3 3 3 3 1 1 3 1 2 3 3 3 3 3 3 1 3 3 3 1 3 3 2 3 1 3 3 3 2 3 3 3
#>  [5797] 3 3 3 3 3 3 3 2 1 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 2 3 3 3 3 3 2 1 3 3 2 1
#>  [5833] 3 3 3 3 1 3 3 3 3 3 2 3 3 3 3 3 3 3 2 3 3 3 3 3 3 1 1 3 3 3 3 3 3 1 3 3
#>  [5869] 3 3 3 1 1 1 3 3 3 3 3 3 3 2 2 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 2
#>  [5905] 3 1 3 1 3 2 3 2 1 3 3 3 3 3 2 3 3 1 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 1 3 3
#>  [5941] 1 2 1 3 3 3 3 1 1 3 3 1 3 3 3 3 1 2 2 3 3 3 3 3 3 3 1 3 3 2 2 3 3 3 1 3
#>  [5977] 3 3 2 3 1 3 3 2 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 2 3 3 3 3 3
#>  [6013] 3 3 3 1 3 3 3 1 1 3 3 3 3 3 3 1 3 3 3 3 3 3 1 3 3 3 3 2 3 1 3 3 3 1 3 3
#>  [6049] 3 2 3 3 3 3 3 2 1 3 3 1 1 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 2 3 3
#>  [6085] 3 3 3 3 3 2 1 3 3 2 3 2 3 3 3 3 3 2 3 2 3 2 2 3 3 3 3 3 3 2 3 3 3 3 3 3
#>  [6121] 3 3 3 2 3 3 3 3 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 3 2 3 2 3
#>  [6157] 3 3 3 3 3 3 2 3 3 3 3 3 3 3 1 3 3 3 3 1 3 3 3 3 1 2 3 1 3 3 2 2 3 3 2 3
#>  [6193] 3 1 1 3 3 2 3 2 1 3 1 1 3 3 3 3 3 2 2 3 1 3 1 3 3 1 3 3 1 3 3 3 3 3 3 3
#>  [6229] 3 3 3 1 3 3 3 3 3 3 3 1 1 1 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 3 2 1 3 3
#>  [6265] 3 3 3 2 1 3 3 3 3 1 3 2 3 1 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 2 3 3 1 3
#>  [6301] 3 3 3 3 3 3 3 2 3 1 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#>  [6337] 2 2 3 3 3 3 3 3 2 3 3 3 3 1 3 1 3 3 3 3 3 3 1 3 3 3 3 3 3 1 3 3 3 3 3 2
#>  [6373] 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 2 3
#>  [6409] 3 3 1 2 3 2 3 3 3 1 2 3 3 3 3 3 1 1 3 1 3 3 3 3 1 3 1 1 3 3 3 3 3 3 2 3
#>  [6445] 2 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 2 3 1 3 3 2 2 3 3 3 3 3 2 1 3
#>  [6481] 1 3 3 3 2 2 3 3 3 3 1 3 1 3 3 3 3 1 2 3 3 3 3 1 1 2 3 3 1 3 1 3 3 2 1 1
#>  [6517] 2 3 3 3 1 3 2 3 1 3 3 3 3 3 3 3 1 3 2 2 3 3 3 3 2 3 2 3 3 3 3 3 3 3 3 3
#>  [6553] 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 1 3 3 3 1 3 3 1 3 2 3 3 3 1 3 3 1 1 3
#>  [6589] 3 3 2 3 3 3 3 3 2 3 1 3 3 1 3 3 3 1 3 3 1 3 3 3 1 3 3 3 3 1 3 1 3 2 3 3
#>  [6625] 3 3 3 3 3 3 3 3 3 3 1 1 2 3 3 2 3 2 3 3 3 3 3 3 3 3 3 3 3 2 2 3 3 3 2 3
#>  [6661] 3 2 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 1 3 2 1 3 2 3 3 3 3 2 2 3
#>  [6697] 3 3 2 2 3 2 3 3 3 1 3 3 3 3 3 3 3 1 3 1 3 3 2 3 3 3 2 3 3 3 2 2 3 3 3 3
#>  [6733] 3 1 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3
#>  [6769] 3 1 3 3 3 3 3 1 3 3 3 3 3 2 3 3 1 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 2
#>  [6805] 1 3 3 3 3 3 1 3 3 3 2 3 2 3 3 1 3 2 3 1 3 1 3 3 3 3 3 3 3 3 3 2 3 2 3 3
#>  [6841] 3 2 3 2 3 2 3 1 2 3 3 3 1 1 3 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 2
#>  [6877] 3 3 3 3 1 2 3 3 3 2 3 3 2 3 3 3 3 3 1 3 3 3 1 2 3 1 3 3 3 3 2 3 1 3 3 3
#>  [6913] 3 3 3 3 3 3 3 1 1 1 3 3 1 3 3 3 3 2 3 3 3 3 3 3 3 3 3 2 2 3 1 3 3 3 1 3
#>  [6949] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 2 3 3 3 3 2 2 3 2 3 3 3 3 2 3 3 3 3 1
#>  [6985] 3 2 3 3 3 1 3 3 2 3 3 3 3 3 1 3 1 3 2 3 3 3 3 3 2 3 3 3 1 3 2 3 3 3 3 3
#>  [7021] 3 2 3 3 3 2 3 3 3 3 3 1 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3 1 3 3 3 3 3
#>  [7057] 3 1 3 3 3 1 3 3 1 1 3 1 1 3 1 3 3 3 2 3 3 2 3 1 3 3 3 1 3 1 3 3 3 3 1 3
#>  [7093] 1 2 3 1 2 2 3 3 3 3 3 2 3 3 3 2 3 3 3 3 3 3 3 2 2 3 3 3 1 3 3 3 3 3 3 3
#>  [7129] 3 3 1 3 3 3 3 3 2 3 3 3 1 3 3 1 3 3 3 3 3 2 3 1 3 1 1 1 1 3 1 3 3 1 3 1
#>  [7165] 3 3 3 3 3 3 1 3 3 3 3 2 1 3 3 1 3 1 3 3 3 3 1 3 3 3 3 3 3 3 3 2 1 3 1 3
#>  [7201] 3 3 1 2 3 3 3 3 3 3 3 3 2 3 2 3 3 3 3 3 2 3 3 3 3 3 3 3 3 1 2 3 3 3 1 1
#>  [7237] 3 1 1 3 3 3 3 3 3 2 3 1 3 3 3 1 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#>  [7273] 3 3 3 1 3 1 3 3 3 3 3 2 3 3 3 3 2 3 3 3 3 3 2 3 3 2 3 3 2 3 3 3 1 3 3 2
#>  [7309] 3 3 3 3 3 1 1 3 3 1 3 1 2 3 3 3 3 3 2 1 3 3 3 3 3 1 3 3 3 2 3 3 3 2 3 3
#>  [7345] 3 1 3 3 3 2 3 3 3 3 1 3 3 3 3 2 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3
#>  [7381] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 1 3 3 3 3 3 1 3 1 3 1 1 3 3 3 1 3 1 3 1
#>  [7417] 3 3 3 1 3 2 3 3 3 2 3 3 1 2 1 3 1 3 3 3 3 3 2 3 1 3 1 1 3 3 1 2 3 3 3 3
#>  [7453] 1 2 1 3 3 3 3 3 3 3 3 3 3 2 1 3 1 3 1 3 3 1 3 3 3 2 2 3 1 3 3 3 3 3 3 3
#>  [7489] 3 3 3 3 3 2 3 1 3 3 3 3 2 3 1 1 3 1 2 3 3 3 3 3 2 3 1 3 2 3 3 3 3 3 3 3
#>  [7525] 3 3 2 3 3 1 3 1 3 1 3 3 2 3 3 3 3 1 2 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 1
#>  [7561] 3 2 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 2 3 3 3 3 3 3
#>  [7597] 3 3 3 3 3 1 3 3 3 1 3 3 3 3 3 3 3 3 2 3 1 2 3 2 3 3 3 1 1 3 2 1 3 1 3 1
#>  [7633] 3 3 3 3 3 3 3 3 2 2 3 3 2 2 3 2 2 3 2 2 3 3 3 3 3 1 2 3 3 3 3 3 3 3 3 1
#>  [7669] 2 2 3 3 3 3 3 2 3 2 3 3 3 3 3 3 3 1 1 3 3 3 3 3 3 3 3 3 3 1 3 2 3 3 3 1
#>  [7705] 3 3 3 3 3 3 3 2 3 2 1 3 3 3 2 3 2 3 1 3 3 3 1 3 3 3 3 3 1 3 3 3 3 3 3 3
#>  [7741] 2 1 2 3 1 3 3 2 3 3 2 3 3 3 3 3 3 3 3 3 2 3 3 3 3 2 3 3 3 3 3 3 3 2 3 1
#>  [7777] 3 2 3 3 3 3 1 3 3 1 3 3 3 3 3 1 3 3 3 3 3 2 1 2 3 1 3 1 3 1 2 3 3 3 3 3
#>  [7813] 3 2 3 3 3 3 2 3 3 3 3 3 1 3 3 3 2 2 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 1
#>  [7849] 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 1 2 1 3 3 2 3 3 3 3 3 2 3 3 3 3 2 3
#>  [7885] 3 3 2 3 3 3 3 2 3 3 3 3 1 3 3 3 3 2 3 3 3 3 1 3 3 3 2 3 2 3 3 2 1 3 3 3
#>  [7921] 3 3 3 3 2 3 3 3 3 2 1 3 3 2 2 3 3 1 1 3 3 3 1 3 3 3 3 3 3 3 3 3 2 3 3 3
#>  [7957] 3 3 3 3 3 3 3 3 2 2 3 2 3 3 3 3 3 3 1 3 3 3 3 3 3 3 2 3 3 3 1 3 3 1 3 2
#>  [7993] 1 3 2 1 2 1 3 3 1 3 3 3 2 3 3 3 3 3 2 3 3 1 1 3 3 3 3 3 1 3 1 3 2 3 3 1
#>  [8029] 3 3 3 3 1 2 3 3 3 3 2 3 3 3 1 3 1 3 3 3 3 3 2 1 1 3 3 3 3 2 3 3 3 1 3 3
#>  [8065] 1 2 3 3 3 3 3 3 3 2 1 3 2 3 3 3 3 3 2 3 1 3 3 2 3 2 3 3 1 3 3 3 3 3 1 3
#>  [8101] 3 3 3 3 3 2 3 3 1 3 3 3 1 3 3 3 3 1 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 2 3 3
#>  [8137] 3 1 3 2 2 3 3 1 3 1 3 3 3 3 2 2 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 3 3 1 3 3
#>  [8173] 3 1 3 3 1 3 3 3 3 2 3 3 3 3 3 3 1 3 3 3 3 3 3 3 1 2 3 3 1 3 3 3 2 2 3 3
#>  [8209] 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 2 3 1 2 3 3 3 3 1 3 3 3 3 3 3 3 3 2 3
#>  [8245] 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 2 3 2 3 3 3 3 3 3 3 3 3 3 1 3 3 3 2 1 1 3
#>  [8281] 3 3 2 1 2 3 2 3 3 2 3 3 1 3 3 1 2 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 2 3 3 3
#>  [8317] 3 3 1 3 3 3 3 3 3 3 3 1 3 3 3 2 1 3 3 3 3 1 3 3 3 3 3 3 3 3 1 3 3 3 3 3
#>  [8353] 3 2 1 3 3 3 3 3 3 1 3 3 3 3 3 2 1 3 3 2 3 3 3 2 1 3 3 3 3 3 1 3 3 2 3 3
#>  [8389] 3 3 3 3 3 3 1 3 3 2 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3
#>  [8425] 2 3 3 3 3 3 3 3 3 3 2 3 1 3 3 3 3 3 3 1 3 3 3 1 3 2 3 3 3 3 2 2 3 1 3 3
#>  [8461] 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 1 3 1 2 3 3 1 3 1 3 1 3 3 3 2 2 3 3 3 3
#>  [8497] 3 3 3 3 3 3 3 1 2 2 1 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 1 3 3 1 2 3 3 1 3
#>  [8533] 3 2 3 3 3 3 3 2 3 3 3 3 3 3 3 2 3 3 3 3 3 3 1 3 3 3 3 3 3 2 3 3 1 3 1 3
#>  [8569] 3 3 3 3 3 3 3 1 1 3 1 3 2 3 3 3 3 2 3 3 3 3 3 3 2 3 2 1 3 2 3 3 3 3 1 2
#>  [8605] 3 3 3 3 3 3 3 3 3 3 2 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 1 2 2 3 1
#>  [8641] 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 2 3 1 2 3 3
#>  [8677] 1 3 2 3 3 3 3 3 3 3 3 1 3 3 1 2 2 1 3 1 2 3 3 2 3 3 3 3 3 3 2 3 3 3 3 1
#>  [8713] 1 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 1 2 3 2 3 1 2 3 1 2 1 3 3 2 3 3
#>  [8749] 1 2 3 3 1 3 3 3 3 1 3 3 3 3 1 3 3 3 2 3 1 3 3 3 3 3 3 3 3 3 3 2 3 2 3 1
#>  [8785] 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 1 2 3 3 1 3 3 1 3 3 3 1 3 3 3 3 3 3 3 1 3
#>  [8821] 3 3 3 2 3 3 3 3 1 3 2 3 1 3 2 3 3 3 3 1 3 3 3 1 3 3 2 2 3 3 3 1 2 3 3 3
#>  [8857] 3 1 3 2 3 3 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 2
#>  [8893] 2 3 1 3 3 2 2 3 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 1 3 3 2 2 3 3
#>  [8929] 3 3 3 3 3 3 3 2 3 3 3 3 3 3 1 3 3 3 1 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1
#>  [8965] 1 3 1 2 3 1 3 3 1 3 3 3 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#>  [9001] 3 3 3 3 3 3 1 3 1 3 3 3 3 1 3 3 3 3 2 3 3 3 1 3 3 1 2 3 1 2 3 3 3 3 3 3
#>  [9037] 3 3 3 2 1 3 1 3 1 3 3 3 3 3 3 3 2 3 1 3 3 3 3 3 3 3 3 1 3 1 3 3 3 3 3 3
#>  [9073] 1 1 3 1 3 3 3 3 3 3 1 3 3 3 3 3 2 3 3 3 2 3 3 3 3 3 3 1 3 3 3 3 3 1 3 3
#>  [9109] 2 1 3 3 3 3 3 3 1 3 3 3 3 3 2 3 1 3 3 3 3 1 3 2 3 3 3 3 3 3 3 2 2 3 3 3
#>  [9145] 3 3 3 3 3 3 3 3 3 1 3 3 1 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 1 3 2 3 3 3 3
#>  [9181] 3 3 3 2 3 2 3 3 3 3 3 1 3 3 3 3 3 2 3 2 3 1 3 3 3 3 3 3 3 3 3 3 3 3 1 3
#>  [9217] 3 3 3 3 1 3 3 3 3 3 3 3 3 1 3 1 3 3 3 2 3 3 1 2 3 3 3 2 1 1 2 3 3 3 3 3
#>  [9253] 3 3 3 2 3 3 3 3 3 3 3 3 3 2 1 3 3 3 1 3 3 1 3 3 3 3 3 3 1 3 3 3 3 3 2 1
#>  [9289] 3 3 2 3 1 3 1 3 1 3 3 3 3 3 3 2 1 3 2 3 3 1 3 1 1 3 3 3 3 3 3 3 3 3 3 1
#>  [9325] 3 3 1 3 3 1 3 3 2 3 2 2 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3
#>  [9361] 3 3 3 2 3 3 3 3 3 3 3 2 3 3 3 2 3 3 3 3 3 1 3 3 1 3 2 3 1 3 3 2 3 1 3 3
#>  [9397] 3 2 3 1 3 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 3 3 3 1 3 3 3 3 3 3 3 3 2 3
#>  [9433] 1 3 2 1 3 3 3 3 3 1 2 3 3 3 3 3 3 3 3 3 2 2 3 3 3 3 3 3 2 3 3 3 3 3 3 3
#>  [9469] 3 3 3 3 3 3 3 2 3 3 1 3 1 3 3 2 3 3 3 3 3 2 3 3 3 1 3 3 1 3 1 3 3 2 3 2
#>  [9505] 3 3 2 3 3 2 2 3 3 1 1 3 3 1 3 3 3 3 3 3 2 3 3 3 3 3 3 2 1 3 3 3 1 3 2 3
#>  [9541] 2 1 3 3 3 3 3 3 3 3 1 1 3 3 3 3 3 3 3 3 3 1 3 3 3 1 3 3 3 3 3 3 3 3 3 3
#>  [9577] 3 1 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 2 3 3 3 1 3 1 3 2 3 3 3 3 1 1 3 3 3 2
#>  [9613] 3 2 2 3 1 3 2 3 3 3 2 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 1 3 3
#>  [9649] 3 1 3 3 3 3 3 3 3 2 1 3 3 3 1 3 3 3 2 3 3 2 3 3 3 1 3 3 3 3 3 3 3 3 3 3
#>  [9685] 3 3 3 3 1 3 3 3 3 3 1 1 1 3 3 3 1 3 3 3 3 3 3 3 2 3 2 3 2 3 3 3 3 3 3 3
#>  [9721] 3 3 3 3 1 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 1 3 3
#>  [9757] 3 3 3 1 3 2 2 3 3 2 3 3 3 3 3 3 3 3 1 3 3 1 3 1 2 2 3 3 1 3 3 3 3 3 3 3
#>  [9793] 3 3 1 3 3 3 3 3 3 3 1 3 3 1 3 3 3 3 3 3 3 3 3 3 2 3 3 3 1 3 3 3 3 1 3 3
#>  [9829] 3 3 1 3 2 3 3 1 1 3 3 2 3 3 3 3 3 1 1 3 3 3 3 2 1 2 3 3 3 3 3 3 1 3 3 3
#>  [9865] 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 2 3 2 3 3 3 1 3 3 2 3 3 3 3 2 3 1
#>  [9901] 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 2 3 3 2 3 2 1 2 3 3 3 3 3 2 3 3 3 3 2 3 3
#>  [9937] 1 3 3 3 3 3 3 1 2 3 3 3 1 3 3 3 3 3 3 3 3 2 3 3 2 3 3 3 3 2 3 3 3 1 3 3
#>  [9973] 3 2 1 3 2 3 3 3 3 1 1 3 3 3 3 3 3 3 2 3 1 2 3 1 3 3 2 2
#> 
#> Within cluster sum of squares by cluster:
#> [1] 50249.74 66499.16 57009.60
#>  (between_SS / total_SS =  21.0 %)
#> 
#> Available components:
#> 
#> [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
#> [6] "betweenss"    "size"         "iter"         "ifault"
airplane$cluster <- as.factor(airplane_cluster$cluster)
head(airplane)
#> # A tibble: 6 × 24
#> # Groups:   satisfaction [1]
#>   Gender Customer.Type   Age Type.of.Travel Class Flight.Distance
#>    <dbl>         <dbl> <int>          <dbl> <dbl>           <int>
#> 1      0             0    25              1     3            1671
#> 2      1             0    25              1     1             748
#> 3      0             1    64              1     3            1652
#> 4      1             1    54              0     1             216
#> 5      0             1    55              0     3             562
#> 6      1             1    59              0     1            1096
#> # ℹ 18 more variables: Inflight.wifi.service <int>,
#> #   Departure.Arrival.time.convenient <int>, Ease.of.Online.booking <int>,
#> #   Gate.location <int>, Food.and.drink <int>, Online.boarding <int>,
#> #   Seat.comfort <int>, Inflight.entertainment <int>, On.board.service <int>,
#> #   Leg.room.service <int>, Baggage.handling <int>, Checkin.service <int>,
#> #   Inflight.service <int>, Cleanliness <int>,
#> #   Departure.Delay.in.Minutes <int>, Arrival.Delay.in.Minutes <dbl>, …

Evaluation

Clustering evaluation involves assessing the quality of the obtained clusters and the effectiveness of the clustering algorithm in partitioning the data. In this code, we use two evaluation metrics: Within-Cluster Sum of Squares (WSS) and Between-Cluster Sum of Squares (BSS) to understand the clustering performance.

# WSS
airplane_cluster$withinss
#> [1] 50249.74 66499.16 57009.60
airplane_cluster$tot.withinss 
#> [1] 173758.5
# BSS/TSS
airplane_cluster$betweenss/airplane_cluster$totss
#> [1] 0.2101097
  1. Within-Cluster Sum of Squares (WSS): The WSS measures the compactness of each cluster, quantifying the sum of squared distances between data points and their assigned cluster center. In this code, we have three WSS values for the three clusters obtained. Lower WSS values indicate more compact clusters, indicating that data points within the same cluster are closer together, and the clustering is effective in grouping similar points.

  2. Total Within-Cluster Sum of Squares (Total WSS or TSS): The Total WSS represents the sum of the WSS values for all clusters. It measures the overall compactness of the clustering solution. In this code, the Total WSS value is 174076.5, representing the cumulative compactness of the three clusters. The goal is to minimize the Total WSS to achieve more distinct clusters.

  3. Between-Cluster Sum of Squares (BSS) / Total Sum of Squares (TSS): The BSS measures the separation between clusters, representing the sum of squared distances between cluster centers and the overall data mean. The Total Sum of Squares (TSS) is the sum of squared distances between each data point and the overall data mean. The ratio BSS/TSS indicates the proportion of variance explained by the clustering solution. A higher BSS/TSS ratio (in this case, 0.2086638) indicates that the clusters are well-separated and the clustering solution is more effective in capturing the variance among data points.

By evaluating the WSS, Total WSS, and BSS/TSS ratio, we gain insights into the clustering quality and how well the clusters are separated. These evaluation metrics help assess the validity and appropriateness of the K-Means clustering solution and provide valuable information for further analysis and decision-making.

Grouping data based on cluster label

as.data.frame(airplane_cluster$centers)
#>         Gender Customer.Type        Age Type.of.Travel      Class
#> 1 -0.029589517   -0.07581208 -0.1446386    -0.75685600 -0.7942162
#> 2  0.001108934   -0.27780536 -0.1781261    -0.09664744 -0.3014576
#> 3  0.018885211    0.28902159  0.2496929     0.59024526  0.7909013
#>   Flight.Distance Inflight.wifi.service Departure.Arrival.time.convenient
#> 1      -0.4985866           0.005037965                        0.21145103
#> 2      -0.2225662          -0.339491421                       -0.11096264
#> 3       0.5250749           0.287713402                       -0.04660841
#>   Ease.of.Online.booking Gate.location Food.and.drink Online.boarding
#> 1             -0.1236495    0.02186144      0.5917204      -0.1898569
#> 2             -0.1706043   -0.03704310     -0.8322952      -0.5970224
#> 3              0.2291730    0.01710658      0.3169623       0.6391811
#>   Seat.comfort Inflight.entertainment On.board.service Leg.room.service
#> 1    0.3834822              0.4119284       -0.1922423       -0.3125912
#> 2   -0.9253081             -1.0501691       -0.4622439       -0.3574390
#> 3    0.5363120              0.6243022        0.5252168        0.5160326
#>   Baggage.handling Checkin.service Inflight.service Cleanliness
#> 1       -0.1849048     -0.06788385       -0.1490822   0.5740314
#> 2       -0.3902565     -0.34446938       -0.3818722  -0.9725540
#> 3        0.4585737      0.34086656        0.4273702   0.4490829
#>   Departure.Delay.in.Minutes Arrival.Delay.in.Minutes satisfaction
#> 1                 0.01766059                  0.99995   -0.5384346
#> 2                 0.04675663                  0.99995   -0.6691882
#> 3                -0.05192983                  0.99995    0.9347359

Cluster Analyst

Cluster Centeroid

airplane_centroid <- airplane %>% 
  group_by(cluster) %>% 
  summarise_all(mean) %>% 
  mutate_if(is.numeric, .funs = "round", digits = 2) 
airplane_centroid
#> # A tibble: 3 × 24
#>   cluster Gender Customer.Type   Age Type.of.Travel Class Flight.Distance
#>   <fct>    <dbl>         <dbl> <dbl>          <dbl> <dbl>           <dbl>
#> 1 1         0.47          0.8   37.4           0.38  1.32            726.
#> 2 2         0.49          0.72  36.9           0.67  1.79           1008.
#> 3 3         0.49          0.94  43.2           0.98  2.84           1772.
#> # ℹ 17 more variables: Inflight.wifi.service <dbl>,
#> #   Departure.Arrival.time.convenient <dbl>, Ease.of.Online.booking <dbl>,
#> #   Gate.location <dbl>, Food.and.drink <dbl>, Online.boarding <dbl>,
#> #   Seat.comfort <dbl>, Inflight.entertainment <dbl>, On.board.service <dbl>,
#> #   Leg.room.service <dbl>, Baggage.handling <dbl>, Checkin.service <dbl>,
#> #   Inflight.service <dbl>, Cleanliness <dbl>,
#> #   Departure.Delay.in.Minutes <dbl>, Arrival.Delay.in.Minutes <dbl>, …

General Intrepertation

  • Cluster 1 consists of customers who gave relatively low ratings across various aspects of the flight experience. They are mostlyregular customers, traveling for non-business purposes, and typically opt for economy class. Their overall satisfaction is relatively low.

  • Cluster 2 represents customers who are slightly older on average, tend to be regular customers, and prefer business class. They generally give higher ratings in all aspects of the flight experience compared to Cluster 1, resulting in moderate overall satisfaction.

  • Custer 3 consists of customers who are older than those in Cluster 2 and have longer flight distances. They give the highest ratings in all aspects of the flight experience and exhibit the highest overall satisfaction.

Visualization

fviz_cluster(object = airplane_cluster,
             data = airplane %>% select(-cluster),
             labelsize = 0) +theme_minimal()

Looks like our cluster models are overlapping and not well separated, it indicates that the clusters might not be distinct enough in the current feature space. To improve the separation between clusters, We can consider the following suggestions:

  1. Feature Selection: Review the features used for clustering and identify if there are other relevant features that can better differentiate the clusters. You can try selecting a subset of informative features or perform feature engineering to create new meaningful features.

  2. Feature Scaling: Check if the features used for clustering are on different scales. If there are significant differences in the ranges or units of different features, it’s recommended to scale the features appropriately. Common scaling techniques include standardization (subtracting the mean and dividing by the standard deviation) or normalization (scaling to a specific range, e.g., [0, 1]). Scaling the features can help avoid dominance of certain features and enable fair comparisons during clustering.

  3. Dimensionality Reduction: If the number of features is high, it might be beneficial to reduce the dimensionality of the dataset. Techniques like Principal Component Analysis (PCA) or t-SNE can help capture the most important information while reducing the dimensionality. By visualizing or clustering on the reduced-dimensional space, you might obtain better cluster separation.

  4. Adjust Clustering Algorithm: Consider trying different clustering algorithms that are suitable for your data. Algorithms such as hierarchical clustering, DBSCAN, or density-based clustering can handle different types of data distributions and cluster shapes. Experimenting with alternative algorithms might reveal more distinct clusters in your data.

Looks like we’re gonna used Dimensionality Reduction to improve seperation between clusters

Principal Component Analysis (PCA)

To improve the separation between clusters in the “airplane” dataset. The primary goal of PCA is to reduce the number of variables while preserving the most important information, allowing us to achieve a better representation of the data in lower-dimensional space.

pca <- prcomp(airplane %>% 
              select(-cluster), 
              scale = T)
summary(pca)
#> Importance of components:
#>                           PC1    PC2    PC3     PC4     PC5     PC6     PC7
#> Standard deviation     2.1522 1.5597 1.4844 1.35906 1.21171 1.02086 1.00314
#> Proportion of Variance 0.2014 0.1058 0.0958 0.08031 0.06384 0.04531 0.04375
#> Cumulative Proportion  0.2014 0.3072 0.4030 0.48327 0.54711 0.59242 0.63617
#>                            PC8     PC9    PC10    PC11   PC12    PC13    PC14
#> Standard deviation     1.00000 0.97962 0.96835 0.89452 0.8265 0.75851 0.69575
#> Proportion of Variance 0.04348 0.04172 0.04077 0.03479 0.0297 0.02501 0.02105
#> Cumulative Proportion  0.67965 0.72137 0.76214 0.79693 0.8266 0.85165 0.87270
#>                           PC15    PC16    PC17    PC18    PC19    PC20    PC21
#> Standard deviation     0.66743 0.66246 0.62701 0.60765 0.56485 0.54412 0.51386
#> Proportion of Variance 0.01937 0.01908 0.01709 0.01605 0.01387 0.01287 0.01148
#> Cumulative Proportion  0.89206 0.91115 0.92824 0.94429 0.95816 0.97104 0.98252
#>                           PC22    PC23
#> Standard deviation     0.47632 0.41862
#> Proportion of Variance 0.00986 0.00762
#> Cumulative Proportion  0.99238 1.00000
fviz_eig(pca, ncp = 23, 
         addlabels = T, 
         main = "Variance explained by each dimensions")

In this case, we want to keep 80% of the information, which corresponds to approximately the first 11 principal components. By retaining these 11 components, we achieve a good balance between reducing the dimensionality of the data while preserving a substantial amount of the original information. The reduced feature space can lead to improved separation between clusters, allowing for more distinct and meaningful groupings of data points in subsequent clustering analyses.

# mengambil PC hasil dimensionality reduction
pc_keep <- as.data.frame(pca$x[,1:11])
airplane %>% 
  select_if(~!is.numeric(.)) %>% 
  cbind(pc_keep)
#> # A tibble: 10,000 × 13
#> # Groups:   satisfaction [2]
#>    satisfaction cluster    PC1     PC2    PC3     PC4    PC5     PC6      PC7
#>           <dbl> <fct>    <dbl>   <dbl>  <dbl>   <dbl>  <dbl>   <dbl>    <dbl>
#>  1            0 3        2.20  -0.0985  0.272  1.21   -2.34   0.0294 -0.494  
#>  2            0 2       -2.09   1.53    0.525  0.504  -2.32  -1.65   -1.71   
#>  3            0 2       -0.998  0.482  -0.766 -1.76    0.819  0.474   0.522  
#>  4            0 2       -1.34  -0.0313  2.35   1.40    1.70   0.881   0.635  
#>  5            0 1        1.34  -0.676  -0.390  1.54    1.48   1.64   -0.197  
#>  6            0 2       -3.61   0.106  -0.878 -0.402   1.63  -0.961  -0.780  
#>  7            0 1       -2.55  -1.91   -3.27   0.0560  1.82   1.60   -0.00829
#>  8            0 1       -0.283 -1.17    0.734 -0.108  -1.96   1.52   -0.153  
#>  9            0 1       -0.838 -1.13    1.02   0.604  -2.23  -0.425   0.486  
#> 10            0 2       -3.08   0.688  -1.28   0.991  -2.79  -0.803  -0.290  
#> # ℹ 9,990 more rows
#> # ℹ 4 more variables: PC8 <dbl>, PC9 <dbl>, PC10 <dbl>, PC11 <dbl>

Visualization PCA

Individual Factor Map

# membuat biplot
biplot(x = pca,
       cex = 0.6,
       scale = FALSE)

Variabel Factor MAP

fviz_contrib(
  X = pca, 
  choice = "var",
  axes = 1 
)

fviz_contrib(
  X = pca, 
  choice = "var",
  axes = 2
)

fviz_contrib(
  X = pca, 
  choice = "var",
  axes = 3
)

Combining PCA and K Means

In this task, we are combining Principal Component Analysis (PCA) and K-Means clustering using the “FactorMiner” library. The primary reason for using the “FactorMiner” library is that it provides more interpretability for the PCA results, making it easier to understand the relationships between the original variables and the principal components. “FactorMiner” offers additional tools for visualizing PCA results and extracting meaningful insights from the reduced feature space.

Another reason for using the “FactorMiner” library could be that it supports a broader range of PCA variations and analyses. It may provide options for different types of rotations, allowing us to explore the underlying structure of the data in a more comprehensive manner. Moreover, “FactorMiner” might offer additional methods for assessing the quality of the PCA results and selecting the optimal number of principal components based on various criteria.

# numeric column name (quantitative)
quanti <- airplane %>%
   select_if(is.numeric) %>%
   colnames()

# numeric column index
quantivar <- which(colnames(airplane) %in% quanti)

# categorical (qualitative) column names
quali <- airplane %>%
   select_if(is.factor) %>%
   colnames()

# categorical column index
qualivar <- which(colnames(airplane) %in% quali)
# equivalent to prcomp(data, scale. = T)
loan_pca <- PCA(X = airplane, #data used
                 scale.unit = T, #scaling
                 quali.sup = qualivar, #tell me which column is categorical
                 graph = F, #don't want to show plot directly
                 ncp = 23) #the number of PCs corresponds to the number of variables
head(as.data.frame(loan_pca$ind$coord))
#>        Dim.1        Dim.2      Dim.3      Dim.4     Dim.5      Dim.6
#> 1  2.7474565 -0.033235247  0.2931977 -0.5804917 -2.334204 -0.2940107
#> 2 -1.8434329 -1.533539680  0.5301838 -0.4087503 -2.324506  1.5338587
#> 3 -0.8013831 -0.457054583 -0.7721550  1.9761053  0.827376 -0.6153768
#> 4 -1.0212404  0.001118096  2.3561120 -1.3799138  1.694425 -0.7137025
#> 5  1.8440320  0.563204071 -0.3718484 -1.1498570  1.479870 -1.7598218
#> 6 -3.4790110 -0.045046763 -0.8866077  0.2001117  1.630988  0.9828266
#>         Dim.7 Dim.8      Dim.9     Dim.10      Dim.11      Dim.12      Dim.13
#> 1  0.31074529    -1 -1.1197259  0.6674372  0.45565145 -0.90529691  0.61488054
#> 2  1.89186146    -1  1.0072143 -0.5890646 -0.30918191  0.62456860  0.78738031
#> 3 -0.70275441    -1 -0.4262429 -2.1964075  0.18307508  0.42150113  0.03119027
#> 4 -0.58611938    -1  2.2518687  0.3112638 -0.35974619  0.09064256  0.36381001
#> 5 -0.01697945    -1  0.2041006  0.4561327  0.63825977  0.52078152 -0.47077780
#> 6  0.95631710    -1  1.4433800 -0.6036251 -0.01984666 -1.36557622  0.63165453
#>        Dim.14     Dim.15     Dim.16     Dim.17      Dim.18      Dim.19
#> 1  0.17706969  0.7143774 -1.3738105 -0.6407311  0.01524533 -0.52549956
#> 2  2.18038774  0.5342693 -0.7567435  0.5052407  0.32206780  0.03100035
#> 3  0.03204121  1.0474684  0.1455162 -0.2294148 -0.90488196 -0.05861908
#> 4 -0.28478303  0.7059168 -0.2773889 -0.7662820  0.15287021 -0.19999587
#> 5 -1.17598124  0.3737339 -0.4912790  1.5685011 -1.01539735 -0.70073777
#> 6  0.51015191 -0.3141964  1.1187420  0.1560077 -0.19805218 -0.35140427
#>        Dim.20     Dim.21     Dim.22
#> 1 -0.17472056 -0.6713790 -0.1249614
#> 2  0.46999634  0.2075433  0.1854061
#> 3  0.69623022 -0.7830441 -0.2757045
#> 4 -0.09702522  0.2896449  0.3040273
#> 5  0.07278404 -0.5711699 -1.1881403
#> 6 -0.51848148 -0.1482935 -0.4989104

Visualization

plot.PCA(x = loan_pca,
         choix = "var")

fviz_pca_biplot(X = loan_pca,
                habillage = "cluster",
                geom.ind = "point",
                addEllipses = T,
                col.var = "navy")

However, same problem like the individual observations map happened: we have to little dimensions to represent our data. On the above visualization, our cluster looks like intersecting each other because we don’t have enough dimensions to represent them.

We may add 1 more dimensions using plotly to see if our clusters is still clumped together.

3D Visualization

# Load required library
library(plotly)

# Extract PCA scores for the first three principal components
pca_scores <- loan_pca$ind$coord[, 1:3]

# Create a data frame with the PCA scores and cluster information
pca_data <- cbind(pca_scores, Cluster = as.factor(airplane_cluster$cluster))
pca_data <- as.data.frame(pca_data) # Convert to data frame

# Plot the 3D scatter plot
plot_ly(data = pca_data, x = ~Dim.1, y = ~Dim.2, z = ~Dim.3, color = ~Cluster,
        type = "scatter3d", mode = "markers", marker = list(size = 5)) %>%
  layout(scene = list(xaxis = list(title = "PC1"),
                      yaxis = list(title = "PC2"),
                      zaxis = list(title = "PC3")))

Conclusion

The application of unsupervised learning techniques, specifically K-Means clustering and Principal Component Analysis (PCA), has provided valuable insights into customer segmentation and satisfaction levels in the “airplane” dataset.

Through K-Means clustering, we identified three distinct customer clusters based on their flight experience ratings.

  • Cluster 1 represents customers who expressed relatively low satisfaction, giving lower ratings across various aspects of the flight, and are primarily regular customers traveling for non-business purposes in the economy class.

  • Cluster 2 consists of customers who are slightly older on average, also regular customers, and tend to prefer the business class. They generally provided higher ratings for all flight aspects compared to Cluster 1, resulting in moderate overall satisfaction.

  • Cluster 3 includes older customers with longer flight distances, who gave the highest ratings in all aspects of the flight experience, leading to the highest overall satisfaction among the three clusters.

We further improved our clustering analysis by incorporating PCA using the “FactorMiner” library, which helped visualize the data in a higher-dimensional space and potentially enhance the separation between clusters. This combined approach allowed us to understand the underlying structure of the data and gain more meaningful insights into customer segments.

The findings from this analysis can be leveraged by the airline industry to tailor their services and marketing strategies to different customer segments. For example, Cluster 1 customers might benefit from targeted improvements to enhance their satisfaction, while Cluster 3 customers may appreciate specialized offerings to maintain their high levels of satisfaction. Additionally, the identified clusters can be used as a foundation for future predictive models or customer segmentation strategies to further enhance the overall customer experience.

Overall, the utilization of unsupervised learning techniques has proven to be a valuable approach in understanding customer preferences, segmenting customers, and guiding data-driven decision-making in the aviation industry. As the industry continues to collect more data, ongoing analysis and optimization of services based on customer preferences will be essential to remain competitive and deliver an exceptional flying experience.