Unsupervised Learning: Airline Passenger Satisfaction
Introduction
In this project, we will explore the application of unsupervised learning techniques to analyze and understand airline passenger satisfaction. The data set we will be working with is the “Airplane” data set, which contains information about airline passengers, their demographics, flight details, and various aspects of their in-flight experience. The main objective of this project is to identify patterns and gain insights into factors that contribute to passenger satisfaction.
Unsupervised learning techniques, such as
clustering and dimensionality reduction, will be
employed to analyze the data. Clustering algorithms will
help us identify groups or segments of passengers with similar
characteristics, preferences, or levels of satisfaction. Dimensionality
reduction, specifically Principal Component Analysis (PCA)
, will be used to reduce the dimensionality of the dataset and uncover
the most significant factors driving passenger satisfaction.
Project Steps
Exploratory Data Analysis (EDA): We will begin by performing exploratory data analysis to gain a better understanding of the dataset. This will involve examining the structure of the data, checking for missing values, and exploring the distribution and relationships between different variables.
Preprocessing: Before applying unsupervised learning techniques, we will preprocess the data by handling missing values, encoding categorical variables, and scaling numerical variables if necessary. This step is crucial to ensure the accuracy and effectiveness of the subsequent analysis.
Clustering Analysis: Using appropriate clustering algorithms, such as k-means or hierarchical clustering, we will group passengers based on their attributes and behaviors. This will allow us to identify distinct segments of passengers and analyze the differences in their satisfaction levels.
Dimensionality Reduction: We will apply Principal Component Analysis (PCA) to reduce the dimensionality of the dataset while retaining the most informative features. By visualizing the principal components and analyzing their corresponding loadings, we can interpret the underlying factors influencing passenger satisfaction.
Interpretation and Insights: Finally, we will interpret the results of the clustering and dimensionality reduction analyses to gain insights into the key factors that drive passenger satisfaction. We will explore the characteristics of different passenger segments and identify any specific patterns or trends that contribute to higher satisfaction levels.
Expected Outcomes
By the end of this project, we aim to gain a deeper understanding of the factors that influence airline passenger satisfaction. The analysis will provide insights into the different passenger segments, their preferences, and the areas where airlines can focus to improve overall satisfaction. These findings can help airlines make data-driven decisions to enhance the travel experience and improve customer satisfaction.
Throughout the project, we will utilize R and various data science
libraries, such as tidyverse, caret, and
factoextra, to perform the necessary data manipulation,
analysis, and visualization tasks.
Let’s get started with the exploratory data analysis phase and delve into the world of unsupervised learning to uncover valuable insights from the “Airplane” data set!
DataSet Overview
The airplane dataframe contains the following
variables:
X: This variable represents the index or ID of each observation in the dataset.id: This variable contains the unique ID assigned to each passenger.Gender: This variable indicates the gender of the passenger (Male or Female).Customer.Type: This variable specifies the type of customer, whether they are a loyal customer or a disloyal customer.Age: This variable represents the age of the passenger.Type.of.Travel: This variable indicates the purpose of the passenger’s travel, whether it is personal or business-related.Class: This variable represents the class of travel chosen by the passenger (Eco Plus, Business, or other classes).Flight.Distance: This variable denotes the distance of the flight in miles.Inflight.wifi.service: This variable represents the rating provided by the passenger for the in-flight Wi-Fi service on a scale of 1 to 5.Departure.Arrival.time.convenient: This variable indicates the convenience of the departure and arrival times as rated by the passenger on a scale of 1 to 5.Ease.of.Online.booking: This variable represents the passenger’s rating of the ease of online booking on a scale of 1 to 5.Gate.location: This variable denotes the rating given by the passenger for the gate location on a scale of 1 to 5.Food.and.drink: This variable represents the passenger’s rating of the food and drink service on a scale of 1 to 5.Online.boarding: This variable indicates the passenger’s rating of the online boarding experience on a scale of 1 to 5.Seat.comfort: This variable represents the passenger’s rating of the seat comfort on a scale of 1 to 5.Inflight.entertainment: This variable denotes the rating given by the passenger for the in-flight entertainment options on a scale of 1 to 5.On.board.service: This variable indicates the passenger’s rating of the on-board service provided on a scale of 1 to 5.Leg.room.service: This variable represents the passenger’s rating of the legroom service on a scale of 1 to 5.Baggage.handling: This variable denotes the rating given by the passenger for the baggage handling service on a scale of 1 to 5.Checkin.service: This variable indicates the passenger’s rating of the check-in service on a scale of 1 to 5.Inflight.service: This variable represents the passenger’s rating of the in-flight service provided on a scale of 1 to 5.Cleanliness: This variable denotes the rating given by the passenger for the cleanliness of the aircraft on a scale of 1 to 5.Departure.Delay.in.Minutes: This variable represents the delay in departure time in minutes for each flight.Arrival.Delay.in.Minutes: This variable represents the delay in arrival time in minutes for each flight.satisfaction: This variable indicates the level of passenger satisfaction, categorized as “neutral or dissatisfied” or “satisfied”.
These variables capture various aspects of the airline passenger experience, including demographics, flight details, and passenger ratings for different services provided during the flight.
Data Preparation
Import package & Dataset
Import Package :
library(dplyr) # for data wrangling
library(ggplot2) # to visualize data
library(gridExtra) # to display multiple graph
library(caret) # to pre-process data
library(tibble) # for creating and manipulating tabular data structures
library(animation) # for creating animated visualizations
library(GGally) # Extension to ggplot2 for exploratory data analysis
library(gtools) # Utility functions for data manipulation and statistical analysis
library(tidyr) # Package for reshaping and tidying data
library(reshape2) # Package for data reshaping and restructuring
library(plotly) # Interactive plotting library
library(knitr) # Package for dynamic report generation
# Cluster
library(factoextra)
library(FactoMineR)Import dataSet
airplane <- read.csv("data_input/Airline_Passenge_Satisfaction.csv")
rmarkdown::paged_table(airplane)Subsetting Dataset
Performing stratified subsetting can be a valuable approach in situations where the dataset is too large to handle directly or when memory limitations arise. In my case, My memory doesnt even stand a chance to handle this much data :(
So i Subsetting the data using a stratified approach can help mitigate this issue. Other than Memory Efficieny There’s other reason for me to for performing stratified subsetting :
Representativeness: Stratified subsetting helps to maintain the representativeness of the original dataset. By preserving the proportion of observations from each category in a specific variable, such as satisfaction in my case, we ensure that the resulting subset reflects the same distribution and patterns present in the full dataset.
Efficient Resource Utilization: Subsetting the data to a smaller representative sample can significantly reduce the computational resources required for the subsequent cluster analysis
# Stratified subsetting based on satisfaction variable
subset_airplane <- airplane %>%
group_by(satisfaction) %>%
slice_sample(n = 5000, replace = FALSE)Regarding why we only take 10% of the total dataframe, the decision to choose a specific subset size depends on various factors. Taking a 10% subset is a common practice in statistical sampling, as it provides a reasonable compromise between computational feasibility and maintaining a representative sample. By selecting a smaller subset, we reduce the computational load while still capturing the main patterns and characteristics of the original data.
Data Pre-processing
Data Wrangling
glimpse(subset_airplane)#> Rows: 10,000
#> Columns: 25
#> Groups: satisfaction [2]
#> $ X <int> 40209, 36950, 77829, 97678, 68824, 8…
#> $ id <int> 104473, 16249, 8626, 11114, 94100, 7…
#> $ Gender <chr> "Female", "Male", "Female", "Male", …
#> $ Customer.Type <chr> "disloyal Customer", "disloyal Custo…
#> $ Age <int> 25, 25, 64, 54, 55, 59, 63, 36, 26, …
#> $ Type.of.Travel <chr> "Business travel", "Business travel"…
#> $ Class <chr> "Business", "Eco", "Business", "Eco"…
#> $ Flight.Distance <int> 1671, 748, 1652, 216, 562, 1096, 719…
#> $ Inflight.wifi.service <int> 4, 2, 3, 3, 3, 2, 2, 4, 4, 1, 3, 3, …
#> $ Departure.Arrival.time.convenient <int> 5, 2, 2, 4, 5, 1, 4, 4, 4, 4, 3, 5, …
#> $ Ease.of.Online.booking <int> 3, 2, 2, 3, 3, 2, 2, 4, 3, 1, 3, 3, …
#> $ Gate.location <int> 3, 1, 2, 1, 3, 3, 5, 3, 4, 3, 4, 4, …
#> $ Food.and.drink <int> 5, 3, 4, 2, 4, 2, 3, 3, 3, 4, 2, 3, …
#> $ Online.boarding <int> 3, 2, 4, 3, 5, 2, 5, 4, 3, 1, 3, 3, …
#> $ Seat.comfort <int> 5, 3, 3, 2, 5, 2, 4, 3, 3, 2, 2, 3, …
#> $ Inflight.entertainment <int> 5, 3, 3, 2, 5, 2, 1, 3, 3, 4, 2, 5, …
#> $ On.board.service <int> 4, 1, 3, 5, 5, 1, 1, 5, 4, 1, 3, 4, …
#> $ Leg.room.service <int> 5, 3, 3, 4, 3, 4, 2, 4, 4, 2, 2, 5, …
#> $ Baggage.handling <int> 5, 5, 3, 5, 2, 2, 1, 3, 3, 3, 4, 4, …
#> $ Checkin.service <int> 4, 2, 1, 5, 5, 3, 4, 4, 4, 1, 4, 3, …
#> $ Inflight.service <int> 4, 5, 3, 4, 5, 2, 1, 3, 4, 3, 3, 4, …
#> $ Cleanliness <int> 5, 3, 1, 2, 3, 2, 4, 3, 3, 4, 2, 3, …
#> $ Departure.Delay.in.Minutes <int> 4, 100, 0, 0, 8, 67, 0, 1, 0, 18, 32…
#> $ Arrival.Delay.in.Minutes <dbl> 10, 88, 0, 0, 8, 53, 0, 0, 0, 32, 20…
#> $ satisfaction <chr> "neutral or dissatisfied", "neutral …
Dropping Un-used Columns
let’s explain why we might want to drop these columns for clustering modeling:
- Irrelevant for Clustering: The X and id columns typically serve as identification or indexing columns, providing no meaningful information about the underlying patterns or characteristics of the data points
- Dimensionality Reduction: Removing irrelevant or redundant columns, such as X and id, helps reduce the dimensionality of the dataset.
# Drop 'X' and 'id' columns
airplane <- select(subset_airplane, -c(X, id))Handling NA Value
# Calculate the total and percentage of missing values in each column
missing_data <- data.frame(
Column = names(airplane),
Total = colSums(is.na(airplane)),
Percent = colMeans(is.na(airplane)) * 100
)
# Create a tibble with the missing data information
as_tibble(missing_data)#> # A tibble: 23 × 3
#> Column Total Percent
#> <chr> <dbl> <dbl>
#> 1 Gender 0 0
#> 2 Customer.Type 0 0
#> 3 Age 0 0
#> 4 Type.of.Travel 0 0
#> 5 Class 0 0
#> 6 Flight.Distance 0 0
#> 7 Inflight.wifi.service 0 0
#> 8 Departure.Arrival.time.convenient 0 0
#> 9 Ease.of.Online.booking 0 0
#> 10 Gate.location 0 0
#> # ℹ 13 more rows
In the “Airplane” dataset, we encountered missing values in the
"Arrival.Delay.in.Minutes" variable. Missing values can
occur due to various reasons, such as data collection errors or
unrecorded information. It is essential to handle missing values
appropriately to ensure the integrity and completeness of our
analysis.
To address the missing values in the “Arrival.Delay.in.Minutes” variable, we employed mean imputation. Mean imputation is a common technique used to replace missing values with the mean value of the available observations in the same variable. By imputing missing values with the mean, we aim to preserve the overall distribution and statistical properties of the variable.
Using the mean() function in R, we calculated the mean of the
"Arrival.Delay.in.Minutes" variable while excluding the NA
values. This mean value was then used to replace the missing values in
the variable. By imputing the mean for the missing values, we have
ensured that the dataset is complete and ready for further analysis.
# Compute mean of "Arrival.Delay.in.Minutes" with NA value
airplane$Arrival.Delay.in.Minutes <- mean(airplane$Arrival.Delay.in.Minutes, na.rm = TRUE)Encoding Categorical Variabels
Converting categorical variables into numeric columns for clustering modeling is important because it enables the use of distance-based metrics, ensures equal treatment of variables, and allows compatibility with a variety of clustering algorithms. This conversion allows us to effectively analyze and identify meaningful patterns and groups within the data.
airplane <- airplane %>%
mutate(
Gender = case_when(
Gender == "Female" ~ 0,
Gender == "Male" ~ 1
),
Customer.Type = case_when(
Customer.Type == "disloyal Customer" ~ 0,
Customer.Type == "Loyal Customer" ~ 1
),
Type.of.Travel = case_when(
Type.of.Travel == "Personal Travel" ~ 0,
Type.of.Travel == "Business travel" ~ 1
),
Class = case_when(
Class == "Eco" ~ 1,
Class == "Eco Plus" ~ 2,
Class == "Business" ~ 3
),
satisfaction = case_when(
satisfaction == "neutral or dissatisfied" ~ 0,
satisfaction == "satisfied" ~ 1
)
)Scalling Numerical Variabel
Scaling numerical variables is important for clustering modeling due to the following main reasons:
Comparable Measurement Scales: Scaling numerical variables ensures that they have comparable measurement scales, facilitating meaningful distance calculations for clustering.
Equal Weighting of Variables: Scaling ensures that each numerical variable contributes equally to the clustering analysis, preventing variables with larger scales from dominating the results.
Improved Clustering Performance: Scaling enhances clustering algorithm performance by helping them converge faster and generate more accurate clusters.
Enhanced Interpretability: Scaling allows for a more balanced consideration of all variables, leading to clearer interpretations of the resulting clusters and their underlying patterns.
Overall, scaling numerical variables in clustering modeling ensures that variables are on a comparable scale, that their contributions are equally weighted, and that the clustering algorithm can perform optimally.
# Select only numeric variables for scaling
airplane_scaled <- airplane[, sapply(airplane, is.numeric)]
# Scale the numeric variables
airplane_scaled <- scale(airplane_scaled)Exploraty Data Analysis
Checking Clustering Possibility
Checking the scatter matrix in exploratory data analysis is essential
when exploring clustering opportunities in your dataset. The scatter
matrix is a matrix of scatter plots that allows you to visualize the
relationships between multiple variables (in this case, Age
and Flight.Distance) in a single view. Each scatter plot
represents the relationship between two continuous variables, and when
you use color or other visual cues to represent additional categorical
variables (like “satisfaction” in this case), it becomes a powerful tool
for understanding data patterns.
scatter_matrix <- ggplot(airplane, aes(x = Age, y = Flight.Distance)) +
geom_point(alpha = 0.7, aes(color = satisfaction)) +
facet_grid(.~satisfaction) +
theme_bw() +
labs(title = "Scatter Plot Matrix - Age vs. Flight Distance",
x = "Age",
y = "Flight Distance",
color = "Satisfaction")
# Print the scatter plot matrix
print(scatter_matrix)Checking Dimension Reduction Opportunities
Checking Correlation
Checking the correlation matrix in exploratory data analysis is crucial when exploring clustering opportunities in your dataset. The correlation matrix quantifies the relationships between all pairs of variables, showing how they are linearly related to each other
ggcorr(airplane, label = TRUE,
label_size = 2.9,
hjust = 1,
layout.exp = 2)In our dataset, it appears that there is a high correlation among the variables. Given the presence of high correlation, it is advisable to perform Principal Component Analysis (PCA).
The high correlation in the dataset suggests that some variables may carry redundant or similar information, which can lead to multicollinearity issues during clustering or other analyses. By applying PCA, we can transform the original variables into a new set of uncorrelated components, known as principal components. These components retain the most significant variance in the data while removing the multicollinearity problem.
Other EDA
Checking Matrix Covariance in airplane :
Checking the covariance matrix in the “airplane” dataset is crucial for clustering and PCA analyses. It helps identify relevant variables for clustering decisions, measure similarity between variables, and select principal components for dimensionality reduction. Understanding the covariance structure enhances the interpretability and efficiency of both clustering and PCA tasks in the dataset.
plot(prcomp(airplane_scaled))
After scaling the variables in the airplane dataset, they now
range between 0 and 4. Scaling is a preprocessing step commonly used in
data analysis to bring all variables to a similar scale, allowing for
fair comparisons and avoiding potential dominance by variables with
large numerical values
Check distribution
checking the distribution of variables using summary() is an integral part of unsupervised learning. It enables us to handle outliers, apply data preprocessing techniques, and make informed decisions during the clustering and PCA processes. By understanding the data distribution, we can optimize the unsupervised learning algorithms to extract meaningful patterns and structures from the “airplane” dataset.
summary(airplane_scaled)#> Gender Customer.Type Age Type.of.Travel
#> Min. :-0.9704 Min. :-2.1886 Min. :-2.19218 Min. :-1.5968
#> 1st Qu.:-0.9704 1st Qu.: 0.4569 1st Qu.:-0.84495 1st Qu.:-1.5968
#> Median :-0.9704 Median : 0.4569 Median : 0.03075 Median : 0.6262
#> Mean : 0.0000 Mean : 0.0000 Mean : 0.00000 Mean : 0.0000
#> 3rd Qu.: 1.0304 3rd Qu.: 0.4569 3rd Qu.: 0.77173 3rd Qu.: 0.6262
#> Max. : 1.0304 Max. : 0.4569 Max. : 3.06202 Max. : 0.6262
#> Class Flight.Distance Inflight.wifi.service
#> Min. :-1.1234 Min. :-1.1549 Min. :-2.0410
#> 1st Qu.:-1.1234 1st Qu.:-0.7936 1st Qu.:-0.5743
#> Median : 0.9525 Median :-0.3657 Median : 0.1591
#> Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
#> 3rd Qu.: 0.9525 3rd Qu.: 0.5979 3rd Qu.: 0.8924
#> Max. : 0.9525 Max. : 3.6501 Max. : 1.6258
#> Departure.Arrival.time.convenient Ease.of.Online.booking Gate.location
#> Min. :-2.00013 Min. :-1.9468 Min. :-1.52751
#> 1st Qu.:-0.69183 1st Qu.:-0.5478 1st Qu.:-0.75530
#> Median :-0.03768 Median : 0.1517 Median : 0.01691
#> Mean : 0.00000 Mean : 0.0000 Mean : 0.00000
#> 3rd Qu.: 0.61647 3rd Qu.: 0.8512 3rd Qu.: 0.78912
#> Max. : 1.27062 Max. : 1.5508 Max. : 1.56133
#> Food.and.drink Online.boarding Seat.comfort Inflight.entertainment
#> Min. :-2.4392 Min. :-2.4620 Min. :-1.9018 Min. :-1.8328
#> 1st Qu.:-0.9247 1st Qu.:-0.9865 1st Qu.:-0.3758 1st Qu.:-1.0728
#> Median :-0.1675 Median : 0.4889 Median : 0.3872 Median : 0.4472
#> Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
#> 3rd Qu.: 0.5897 3rd Qu.: 0.4889 3rd Qu.: 1.1502 3rd Qu.: 0.4472
#> Max. : 1.3469 Max. : 1.2266 Max. : 1.1502 Max. : 1.2072
#> On.board.service Leg.room.service Baggage.handling Checkin.service
#> Min. :-1.9094 Min. :-2.6228 Min. :-2.2546 Min. :-1.8930
#> 1st Qu.:-0.3394 1st Qu.:-1.0846 1st Qu.:-0.5574 1st Qu.:-0.2913
#> Median : 0.4456 Median : 0.4536 Median : 0.2913 Median :-0.2913
#> Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
#> 3rd Qu.: 0.4456 3rd Qu.: 0.4536 3rd Qu.: 1.1399 3rd Qu.: 0.5095
#> Max. : 1.2306 Max. : 1.2227 Max. : 1.1399 Max. : 1.3103
#> Inflight.service Cleanliness Departure.Delay.in.Minutes
#> Min. :-2.2799 Min. :-1.7809 Min. :-0.39139
#> 1st Qu.:-0.5742 1st Qu.:-1.0150 1st Qu.:-0.39139
#> Median : 0.2786 Median :-0.2491 Median :-0.39139
#> Mean : 0.0000 Mean : 0.0000 Mean : 0.00000
#> 3rd Qu.: 1.1315 3rd Qu.: 0.5168 3rd Qu.:-0.09478
#> Max. : 1.1315 Max. : 1.2828 Max. :19.13081
#> Arrival.Delay.in.Minutes satisfaction
#> Min. :1 Min. :-1
#> 1st Qu.:1 1st Qu.:-1
#> Median :1 Median : 0
#> Mean :1 Mean : 0
#> 3rd Qu.:1 3rd Qu.: 1
#> Max. :1 Max. : 1
The result from the summary() function indicates that after scaling, the variables in the “airplane” dataset now range between 0 and 4. Scaling is a crucial step in unsupervised learning, including clustering and PCA, as it ensures that all variables are on a standardized scale. This allows for fair comparisons and prevents certain variables from dominating the analysis due to their larger numerical values. By scaling the variables to the same range, we ensure that each variable contributes equally to the unsupervised learning process, leading to more meaningful and reliable results.
Clustering
Clustering is an unsupervised machine learning technique used to identify and group similar data points into clusters based on their intrinsic patterns and similarities. The primary goal of clustering is to partition the data in such a way that objects within the same cluster are more similar to each other than to those in other clusters.
In the context of our airplane dataframe, clustering can help us discover distinct groups of passengers based on their characteristics and experiences. For example, we can use clustering to identify segments of passengers who share similar travel preferences, satisfaction levels, or other attributes. By doing so, we can gain valuable insights into customer behavior, tailor services, and improve overall customer experience.
Finding Optimal Number of Cluster
Finding the optimal number of clusters is a critical step in clustering analysis, as it determines the most appropriate number of groups to partition the data. Three commonly used methods for this purpose are the Elbow Method, Silhouette Method.
Elbow Method
fviz_nbclust(
x = airplane_scaled, #data untuk clustering
FUNcluster = kmeans, #algoritma kmeans
method = "wss" #berdasarkan wss
)
The Elbow Method suggests that the optimal number of clusters is 3. At
this point, the explained variance starts to level off, and adding more
clusters does not significantly improve the variance explained by the
clustering algorithm.
Silhouette Method
fviz_nbclust(
x = airplane_scaled, #data untuk clustering
FUNcluster = kmeans, #algoritma kmeans
method = "silhouette" #berdasarkan Sillhouete Methode
)
The Silhouette Method yields an optimal number of 2 clusters. This is
determined by the maximum average silhouette score achieved when the
data points are divided into two groups. Higher average silhouette
scores indicate well-defined and distinct clusters
In conclusion, based on the different clustering analysis methods applied to the “airplane” dataset, we obtain varying optimal numbers of clusters: 3 clusters from the Elbow Method, 2 clusters from the Silhouette Method, and 1 cluster from the Gap Statistic. Each method offers valuable insights into the dataset, and the choice of the optimal number of clusters depends on the specific context and objectives of the analysis.
K-Means Clustering
Right now we are performing K-Means Clustering on the airplane_scaled dataset. K-Means is an unsupervised machine learning algorithm that partitions the data into a specified number of clusters (in this case, 3 clusters) based on the similarity of data points to the cluster centers. The algorithm aims to minimize the within-cluster sum of squares, effectively grouping similar data points together while maximizing the separation between clusters.
library(animation)
# pakai 'interval' yang lebih tinggi bila animasi terlalu cepat
# jalankan command ini di console:
RNGkind(sample.kind = "Rounding")
set.seed(100)
ani.options(interval = 1)
par(mar = c(3, 3, 1, 1.5), mgp = c(1.5, 0.5, 0))
kmeans.ani()RNGkind(sample.kind = "Rounding")
set.seed(100)
# k-means dengan k optimum
airplane_cluster <- kmeans(x = airplane_scaled,
centers = 3)
airplane_cluster#> K-means clustering with 3 clusters of sizes 2652, 3392, 3956
#>
#> Cluster means:
#> Gender Customer.Type Age Type.of.Travel Class
#> 1 -0.029589517 -0.07581208 -0.1446386 -0.75685600 -0.7942162
#> 2 0.001108934 -0.27780536 -0.1781261 -0.09664744 -0.3014576
#> 3 0.018885211 0.28902159 0.2496929 0.59024526 0.7909013
#> Flight.Distance Inflight.wifi.service Departure.Arrival.time.convenient
#> 1 -0.4985866 0.005037965 0.21145103
#> 2 -0.2225662 -0.339491421 -0.11096264
#> 3 0.5250749 0.287713402 -0.04660841
#> Ease.of.Online.booking Gate.location Food.and.drink Online.boarding
#> 1 -0.1236495 0.02186144 0.5917204 -0.1898569
#> 2 -0.1706043 -0.03704310 -0.8322952 -0.5970224
#> 3 0.2291730 0.01710658 0.3169623 0.6391811
#> Seat.comfort Inflight.entertainment On.board.service Leg.room.service
#> 1 0.3834822 0.4119284 -0.1922423 -0.3125912
#> 2 -0.9253081 -1.0501691 -0.4622439 -0.3574390
#> 3 0.5363120 0.6243022 0.5252168 0.5160326
#> Baggage.handling Checkin.service Inflight.service Cleanliness
#> 1 -0.1849048 -0.06788385 -0.1490822 0.5740314
#> 2 -0.3902565 -0.34446938 -0.3818722 -0.9725540
#> 3 0.4585737 0.34086656 0.4273702 0.4490829
#> Departure.Delay.in.Minutes Arrival.Delay.in.Minutes satisfaction
#> 1 0.01766059 0.99995 -0.5384346
#> 2 0.04675663 0.99995 -0.6691882
#> 3 -0.05192983 0.99995 0.9347359
#>
#> Clustering vector:
#> [1] 3 2 2 2 1 2 1 1 1 2 2 1 2 1 1 1 2 2 2 2 2 1 1 2 1 2 1 2 2 2 1 2 2 1 2 1
#> [37] 2 1 1 1 1 1 2 2 2 2 2 2 2 2 1 2 1 2 2 2 2 2 1 1 2 1 1 1 2 2 2 2 1 1 1 2
#> [73] 2 2 1 1 1 1 2 2 2 2 2 1 2 2 1 1 1 1 2 1 3 2 2 2 1 2 2 2 2 2 2 1 2 1 1 1
#> [109] 1 2 1 1 1 2 2 2 2 1 1 2 2 2 2 1 2 2 1 2 1 1 1 1 2 2 1 2 2 1 2 2 2 1 2 1
#> [145] 2 1 2 1 2 2 1 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 2 1 1 2 1 2 1 1
#> [181] 2 1 2 2 2 1 1 2 1 1 2 2 2 3 1 2 2 1 2 2 2 2 1 1 2 2 2 2 1 2 2 1 2 3 2 1
#> [217] 2 2 1 1 2 2 2 2 2 2 2 1 1 1 1 2 2 1 1 1 2 2 2 1 2 1 2 2 2 2 1 2 1 2 1 1
#> [253] 1 1 1 2 1 2 2 2 1 2 1 2 2 1 2 2 2 1 1 2 2 1 2 1 1 2 2 1 1 1 1 1 2 2 2 1
#> [289] 2 1 2 2 1 2 1 2 3 1 2 1 1 3 2 2 2 1 1 2 2 2 1 2 2 1 2 1 2 2 2 1 2 2 1 2
#> [325] 2 2 2 2 2 1 2 2 2 1 2 1 1 2 1 2 2 2 1 2 2 1 2 1 2 2 2 2 2 1 2 1 2 1 1 1
#> [361] 2 1 2 1 1 2 1 2 2 1 2 2 1 2 2 1 1 2 2 2 1 1 1 1 2 1 1 1 1 2 2 2 2 1 1 2
#> [397] 1 2 1 2 2 2 1 2 1 2 1 2 2 2 2 2 2 2 2 2 1 1 2 1 2 2 2 1 1 2 2 2 2 2 1 1
#> [433] 1 2 2 2 2 2 3 1 2 2 1 2 1 1 2 1 2 2 3 2 2 1 1 2 1 2 2 1 1 2 2 3 2 3 1 2
#> [469] 1 2 2 2 2 1 2 1 2 2 1 1 1 1 2 2 1 2 1 2 2 2 2 1 2 2 2 2 2 2 1 1 2 1 1 2
#> [505] 2 2 2 2 1 1 2 1 1 2 2 2 1 2 2 2 1 1 2 1 2 1 2 1 2 2 1 2 1 2 1 2 2 2 2 1
#> [541] 2 2 1 2 3 2 1 2 1 2 2 2 1 1 2 2 1 2 2 3 2 1 2 2 1 2 2 2 1 1 1 1 2 1 1 1
#> [577] 1 1 2 2 1 1 2 2 2 2 1 2 1 1 1 2 2 1 1 1 2 2 2 2 2 2 2 2 1 1 2 1 1 1 2 2
#> [613] 1 2 1 1 2 1 1 1 1 2 2 1 2 1 2 2 2 1 1 2 1 1 1 2 1 2 2 1 1 2 1 1 2 2 1 1
#> [649] 2 2 2 1 2 1 2 2 2 2 1 3 1 1 3 2 1 2 1 2 2 2 2 1 2 2 1 2 1 1 2 2 2 1 1 1
#> [685] 1 2 1 1 1 1 2 1 1 1 2 2 2 3 1 2 2 2 2 1 2 2 2 1 1 2 3 1 2 1 2 2 2 1 1 2
#> [721] 1 1 2 2 1 2 2 1 2 2 1 2 2 2 1 1 2 2 2 1 2 1 2 2 1 1 1 1 1 3 2 2 2 1 1 2
#> [757] 2 2 3 2 2 2 2 2 2 2 1 1 2 1 1 2 1 1 2 2 2 2 1 1 2 3 2 2 2 2 2 1 2 1 2 1
#> [793] 2 1 2 2 1 1 2 2 1 1 2 2 2 1 1 1 2 2 2 2 2 1 2 1 2 2 2 2 2 2 2 1 2 2 1 2
#> [829] 2 2 1 1 1 2 2 1 2 2 1 2 2 2 1 2 1 2 2 1 2 2 1 1 2 2 2 1 3 2 1 1 2 2 2 1
#> [865] 2 1 2 2 2 1 1 2 2 1 1 2 2 1 2 1 1 2 2 2 2 2 1 2 1 1 2 2 1 2 2 2 2 2 2 1
#> [901] 2 2 2 2 2 2 1 2 2 2 2 1 2 1 1 2 1 2 1 2 1 1 2 2 2 1 1 2 2 2 1 2 2 1 2 1
#> [937] 2 1 2 1 1 2 1 2 2 2 1 1 1 2 1 2 3 2 1 1 2 2 2 2 2 1 2 2 1 1 2 1 2 1 1 2
#> [973] 2 1 2 2 1 1 2 1 1 2 2 2 2 1 1 2 1 1 1 1 2 2 1 1 2 1 2 1 2 2 2 2 1 2 2 2
#> [1009] 1 2 2 2 2 1 2 2 2 1 2 1 2 2 1 2 2 1 2 2 2 1 1 2 2 1 1 2 2 2 2 1 2 2 2 1
#> [1045] 2 1 2 1 2 1 2 3 2 1 2 2 1 1 1 2 2 1 2 2 1 2 2 2 1 2 1 3 2 2 1 1 2 2 1 2
#> [1081] 3 1 2 2 3 2 2 2 2 1 1 2 2 1 2 1 3 2 2 2 2 1 1 2 2 2 2 1 2 2 1 1 1 1 2 2
#> [1117] 2 2 2 2 2 1 2 2 2 2 2 2 2 1 1 3 2 1 1 1 1 2 2 2 2 2 1 2 2 2 2 1 1 2 1 1
#> [1153] 1 2 1 2 2 2 2 2 2 2 2 1 1 2 1 2 2 2 1 2 1 2 2 2 1 1 1 2 2 1 2 1 1 2 2 2
#> [1189] 1 1 2 1 2 2 1 2 2 2 2 2 2 1 1 1 2 2 2 2 1 2 1 2 2 1 1 2 2 2 2 1 1 2 2 2
#> [1225] 1 2 1 2 1 2 2 1 1 2 1 1 2 2 2 3 2 2 2 2 2 1 1 2 2 1 1 1 1 1 2 2 2 1 2 1
#> [1261] 1 2 2 2 1 2 2 2 1 2 1 2 1 2 2 2 1 1 1 2 1 2 1 1 1 2 2 1 1 1 2 2 3 2 2 2
#> [1297] 2 1 2 1 1 1 2 2 2 1 2 2 1 1 2 1 1 1 1 2 1 1 2 2 2 1 1 2 1 2 2 2 2 2 2 2
#> [1333] 2 2 2 1 2 2 2 2 3 2 1 2 1 2 2 1 2 1 2 1 1 1 1 2 2 2 2 2 1 1 2 2 2 1 1 2
#> [1369] 2 1 2 2 2 1 2 2 2 2 2 1 1 1 2 2 2 2 1 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2
#> [1405] 1 2 1 2 2 1 2 1 2 2 3 2 2 1 1 2 1 1 2 2 2 2 2 2 2 1 2 1 2 1 2 2 1 2 1 2
#> [1441] 1 2 2 2 2 2 1 1 1 1 1 2 2 1 1 1 2 1 1 2 1 1 2 2 1 2 2 2 1 1 1 2 2 2 2 2
#> [1477] 1 2 2 2 2 2 2 2 2 1 2 2 2 1 2 2 2 2 2 1 1 1 2 2 2 1 1 2 1 2 2 2 1 2 2 2
#> [1513] 2 2 2 1 1 2 2 2 1 2 1 1 2 1 2 2 1 1 2 2 1 2 1 3 2 2 2 2 2 2 1 2 2 1 1 2
#> [1549] 2 1 2 2 2 2 1 1 1 2 2 1 2 2 2 1 1 1 2 1 2 2 2 1 1 1 2 1 2 1 1 1 1 1 2 3
#> [1585] 1 1 2 2 1 2 1 1 3 2 1 1 2 1 2 1 1 2 1 3 1 1 1 1 2 2 2 2 1 2 2 2 2 2 1 2
#> [1621] 2 1 1 1 1 2 2 1 1 2 2 2 2 1 1 2 1 1 2 2 2 3 1 2 1 2 2 2 2 2 2 2 2 2 2 1
#> [1657] 2 1 2 2 2 3 1 2 1 1 1 1 3 2 1 2 2 2 1 2 1 1 2 2 2 2 2 1 1 1 1 1 2 2 2 2
#> [1693] 2 1 2 1 2 2 2 2 1 3 2 2 2 2 2 3 2 2 2 1 2 2 2 2 2 1 2 1 1 1 1 1 1 2 2 1
#> [1729] 2 1 2 2 1 2 1 1 1 2 2 2 1 2 2 1 2 1 2 2 1 1 2 1 2 2 1 2 2 1 1 2 1 1 2 1
#> [1765] 2 1 2 2 2 1 1 2 2 1 1 2 1 1 2 1 1 2 2 2 1 1 1 2 2 2 2 2 2 2 1 3 1 1 2 2
#> [1801] 1 1 1 1 2 2 2 2 2 2 2 1 1 2 2 1 2 2 2 2 2 2 2 1 1 1 2 2 2 1 1 1 1 2 1 2
#> [1837] 2 1 1 2 2 2 1 2 2 1 2 1 2 2 1 1 1 2 2 2 3 1 1 1 1 1 2 1 1 2 2 3 3 2 1 2
#> [1873] 2 1 2 1 2 2 2 2 2 2 1 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2
#> [1909] 2 1 2 1 2 1 2 2 1 2 1 2 1 2 2 2 2 1 1 2 2 1 1 2 2 2 3 2 2 1 1 1 1 2 2 1
#> [1945] 2 2 2 2 2 1 1 2 2 1 1 2 1 2 2 2 1 2 2 1 2 2 2 3 2 1 1 2 2 2 2 2 2 1 2 1
#> [1981] 2 2 2 1 1 2 1 2 2 2 2 2 1 2 2 1 1 2 2 1 2 1 1 1 2 2 2 1 2 2 1 2 1 1 1 2
#> [2017] 2 1 2 2 2 2 1 1 1 2 2 2 2 2 2 1 1 2 2 1 1 2 1 2 2 2 2 2 2 2 2 2 2 3 1 1
#> [2053] 2 2 2 2 1 2 2 1 2 2 2 2 1 1 2 1 1 2 1 2 2 2 1 2 2 1 1 1 1 2 2 2 2 1 2 2
#> [2089] 2 2 1 2 1 2 2 2 2 2 3 2 2 2 1 2 2 2 2 1 1 1 1 2 2 2 1 2 2 1 1 2 2 2 2 3
#> [2125] 1 2 2 2 2 1 2 2 2 2 1 2 2 2 1 1 2 2 2 1 2 1 2 1 1 1 1 1 1 2 2 2 1 2 2 2
#> [2161] 2 2 2 2 2 2 1 2 2 2 2 1 2 1 2 1 1 1 1 2 1 1 2 1 3 1 2 2 1 1 1 2 3 2 1 3
#> [2197] 1 1 2 2 2 2 1 2 1 2 1 1 2 2 1 2 2 1 2 2 1 1 1 1 1 1 2 2 1 1 3 2 3 2 2 2
#> [2233] 2 1 2 1 1 1 2 1 1 1 2 2 2 2 2 1 1 2 2 2 2 2 2 2 1 2 1 2 2 2 1 2 2 1 1 2
#> [2269] 1 2 1 1 1 2 2 2 1 2 1 2 2 2 1 1 1 2 2 1 1 1 2 1 2 2 3 2 2 1 1 2 2 2 3 1
#> [2305] 2 2 1 2 2 2 2 1 2 2 2 2 2 1 2 1 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 1 2
#> [2341] 2 1 1 2 1 2 1 1 1 2 1 1 1 2 2 2 1 1 2 2 1 1 2 2 2 1 1 2 2 1 1 1 2 1 2 1
#> [2377] 2 1 2 2 2 1 2 2 1 1 2 2 2 2 1 2 2 2 1 2 1 2 2 2 2 1 3 1 3 1 2 2 2 3 1 2
#> [2413] 1 1 1 2 1 2 1 1 2 1 1 2 1 2 1 2 1 2 1 2 1 2 1 3 1 2 1 1 1 1 2 2 2 1 2 2
#> [2449] 2 2 1 1 2 1 2 1 2 2 1 1 1 1 2 1 1 3 2 2 1 2 1 2 2 2 1 2 2 1 1 2 2 3 3 2
#> [2485] 2 3 2 1 2 1 1 3 1 1 1 2 1 2 2 2 3 1 2 1 2 1 2 1 2 1 1 1 2 1 1 2 2 1 2 1
#> [2521] 1 2 2 2 2 2 2 2 1 1 1 1 2 2 2 2 1 2 2 2 1 2 2 2 2 2 1 2 2 2 1 2 1 2 2 1
#> [2557] 2 1 3 2 1 1 2 3 1 1 2 2 2 2 2 2 1 2 1 2 1 2 2 1 1 2 1 1 1 2 1 1 1 1 1 2
#> [2593] 1 2 2 1 1 3 1 2 1 1 1 1 2 3 2 2 2 2 1 1 2 1 3 2 2 2 1 2 2 2 1 1 1 1 2 2
#> [2629] 1 1 1 2 1 1 2 1 2 1 2 2 2 1 1 1 2 1 2 2 2 2 1 2 2 1 2 2 1 2 2 2 2 1 2 1
#> [2665] 2 2 1 1 1 1 1 2 1 1 2 1 2 2 2 1 2 1 2 2 1 2 1 2 1 1 1 2 2 2 1 2 1 1 1 2
#> [2701] 2 3 1 1 2 2 2 2 1 2 1 2 1 1 1 1 1 2 1 2 2 2 1 1 2 2 1 1 1 2 2 2 2 2 2 2
#> [2737] 2 1 1 2 2 1 2 1 2 2 2 2 2 1 2 2 2 2 2 2 1 2 1 1 2 1 1 1 1 2 2 2 1 1 3 2
#> [2773] 2 2 2 2 2 1 2 2 1 2 3 2 2 2 2 1 1 2 1 1 1 2 2 1 1 1 1 1 2 1 2 2 1 2 1 1
#> [2809] 1 2 1 1 2 2 1 2 2 2 2 2 1 1 2 1 1 1 1 2 1 2 2 2 2 2 1 1 2 3 2 2 2 2 2 2
#> [2845] 1 1 2 2 2 1 1 1 2 2 1 1 2 1 3 2 2 1 1 1 2 2 2 1 2 1 1 2 2 1 1 1 1 2 2 2
#> [2881] 1 1 1 1 2 2 1 2 2 2 2 2 2 1 2 1 2 2 2 2 2 1 2 1 2 1 2 2 2 3 2 2 2 2 2 2
#> [2917] 1 2 2 1 1 2 2 1 1 2 2 1 1 1 1 2 1 2 1 1 1 2 1 1 2 1 1 2 2 2 2 2 2 1 2 2
#> [2953] 2 2 2 2 2 2 2 1 1 2 1 2 1 2 1 1 2 2 2 2 3 2 1 1 2 2 2 1 2 1 2 2 2 2 2 2
#> [2989] 2 2 2 2 1 1 2 2 1 1 2 1 3 2 2 2 2 2 1 1 1 2 3 1 1 2 2 2 1 2 2 2 2 1 1 2
#> [3025] 2 2 3 2 1 1 2 2 1 2 1 1 1 2 1 2 2 2 1 1 2 2 1 2 2 2 2 2 1 1 1 1 2 1 1 2
#> [3061] 1 1 2 1 2 2 1 2 1 1 2 2 2 1 2 2 1 1 2 1 2 1 2 2 1 1 2 2 1 2 2 2 1 1 2 2
#> [3097] 1 1 1 1 2 1 2 1 2 1 1 3 1 2 1 1 2 1 2 2 1 2 2 2 1 1 2 1 3 1 2 2 2 1 2 1
#> [3133] 1 2 2 2 1 2 1 2 2 1 1 1 1 2 2 2 1 1 2 2 1 2 1 1 2 2 2 2 2 2 3 1 2 2 2 1
#> [3169] 2 1 1 1 1 2 2 2 2 1 2 1 2 2 1 1 2 1 2 1 2 2 2 1 1 2 2 1 2 3 2 2 1 1 1 2
#> [3205] 1 1 2 2 2 1 1 2 2 2 1 2 1 2 1 2 1 2 1 1 2 2 2 1 2 1 2 2 2 1 2 3 1 1 2 2
#> [3241] 1 2 2 2 1 1 2 1 2 3 1 2 1 2 2 2 2 2 1 2 1 2 2 2 1 2 2 1 2 2 2 1 2 2 1 1
#> [3277] 1 2 2 2 1 2 2 2 1 2 2 2 1 1 2 1 1 2 1 2 2 1 1 1 2 2 2 2 2 2 3 2 1 1 1 2
#> [3313] 2 2 2 2 2 1 2 2 2 2 1 1 1 2 1 1 2 3 1 2 1 2 1 2 2 2 1 1 1 2 1 1 2 1 1 2
#> [3349] 2 1 1 1 2 2 2 2 1 1 2 2 2 1 2 1 2 1 1 2 2 2 1 2 1 3 1 2 2 2 2 1 1 2 1 1
#> [3385] 2 2 1 3 2 2 1 2 1 2 2 1 1 1 1 2 2 2 2 2 1 2 2 1 1 2 2 1 2 2 1 2 1 1 1 2
#> [3421] 2 2 2 1 1 1 2 2 1 2 1 2 3 2 1 1 1 2 2 1 2 2 1 1 2 2 1 1 1 2 1 2 2 2 1 1
#> [3457] 2 2 1 2 1 1 2 1 1 2 2 1 1 1 2 1 1 2 1 1 2 2 2 2 1 1 1 1 2 2 2 2 2 2 1 1
#> [3493] 1 1 1 2 1 2 2 2 1 2 1 2 1 1 1 1 2 2 1 2 2 1 2 1 2 1 2 2 2 1 1 2 1 1 2 2
#> [3529] 2 1 1 2 2 1 2 1 1 2 2 1 1 2 1 1 2 2 2 1 1 1 2 2 2 1 1 1 2 1 1 2 2 2 2 1
#> [3565] 2 3 2 2 2 1 2 1 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 1 2 2 1 1 2 2 2 2 2 1 2
#> [3601] 2 1 2 2 1 1 2 2 1 2 3 1 2 1 1 1 2 1 1 2 1 2 1 2 2 1 1 2 1 1 2 1 1 2 2 2
#> [3637] 2 1 2 2 2 1 1 1 1 1 2 2 2 3 2 2 1 2 1 1 2 1 1 2 1 1 1 1 3 2 1 1 1 1 2 2
#> [3673] 2 2 2 1 1 2 2 1 2 1 1 2 2 2 2 2 3 2 1 1 2 2 2 1 2 2 2 2 1 2 1 2 2 2 1 2
#> [3709] 2 2 2 1 1 2 1 2 2 2 1 2 1 2 1 2 2 1 2 2 2 1 3 2 1 1 1 2 1 2 2 1 2 2 1 2
#> [3745] 2 1 1 1 1 1 1 2 2 2 3 2 1 1 2 1 2 1 1 2 2 1 1 2 1 1 2 2 1 2 2 2 2 2 1 2
#> [3781] 1 2 2 1 2 1 2 1 2 2 2 1 2 1 2 1 2 2 2 1 2 2 2 2 2 1 2 2 1 2 2 2 1 2 2 2
#> [3817] 2 1 2 2 1 1 1 2 1 2 2 2 3 2 2 1 1 2 2 2 1 2 1 2 2 2 1 1 2 2 1 2 2 1 1 1
#> [3853] 2 1 2 2 3 1 2 1 2 2 2 1 1 1 2 2 3 2 1 1 1 2 2 1 1 2 1 2 2 2 1 1 2 1 1 3
#> [3889] 1 1 1 2 2 2 2 1 1 1 1 2 2 2 1 2 2 2 1 2 2 2 2 2 2 1 2 2 1 2 1 2 1 2 2 1
#> [3925] 1 2 1 1 2 1 3 1 2 1 2 2 1 2 2 2 2 1 1 2 1 2 1 2 1 3 2 2 1 2 1 1 1 2 2 2
#> [3961] 1 1 2 1 2 2 2 2 2 1 1 1 2 1 1 2 1 2 2 2 2 2 2 2 1 1 2 2 1 2 2 2 2 1 1 2
#> [3997] 1 1 2 2 2 2 1 1 2 1 2 2 1 1 1 2 2 1 2 2 1 2 2 3 1 1 2 2 1 1 2 1 1 2 2 2
#> [4033] 1 2 1 1 1 2 1 1 2 2 1 2 2 2 1 2 1 1 2 2 1 1 1 1 1 2 2 2 1 1 1 1 2 2 1 1
#> [4069] 2 2 2 1 1 2 2 1 2 2 2 1 1 1 2 1 1 2 1 2 1 2 1 1 2 2 2 1 2 2 2 2 1 2 1 1
#> [4105] 1 2 1 2 2 2 1 1 2 1 2 2 1 3 2 1 1 2 2 1 1 2 1 2 3 2 2 1 2 2 1 2 1 2 2 1
#> [4141] 2 1 2 2 1 1 2 1 2 2 1 1 2 2 1 2 1 1 1 1 2 2 1 2 2 1 1 1 2 2 2 1 2 1 1 1
#> [4177] 1 1 2 1 1 2 1 2 1 2 1 2 1 1 1 2 2 2 2 1 2 2 2 2 1 2 2 2 1 3 1 2 2 2 1 2
#> [4213] 1 1 2 2 2 2 1 1 1 1 1 1 2 1 1 1 2 1 2 1 2 2 1 1 1 2 2 2 2 2 2 2 1 1 1 2
#> [4249] 1 2 2 2 2 3 1 2 2 1 2 2 2 2 1 1 2 2 1 2 2 2 2 2 2 1 2 1 1 1 2 2 2 1 2 2
#> [4285] 1 1 1 2 2 1 1 1 2 2 1 2 2 1 2 2 2 1 2 2 2 1 1 3 2 1 2 2 1 1 1 1 2 1 2 2
#> [4321] 2 2 1 2 1 2 2 2 1 2 2 1 1 2 2 1 1 2 2 2 1 2 1 2 2 1 2 1 1 1 1 1 2 2 1 2
#> [4357] 2 2 1 1 2 2 2 2 1 1 1 2 2 1 2 2 2 1 2 2 1 1 1 1 3 2 1 2 2 2 2 1 1 2 1 2
#> [4393] 2 2 1 2 2 1 2 2 2 1 3 2 1 1 2 1 1 1 2 2 2 1 2 2 1 1 2 2 2 2 2 1 1 2 1 1
#> [4429] 1 1 3 2 2 2 2 2 2 2 2 2 1 2 1 2 1 1 2 2 2 1 1 1 1 2 2 2 1 2 2 2 2 2 1 2
#> [4465] 2 2 2 1 2 2 1 2 2 1 2 2 1 2 2 2 2 2 1 1 3 2 2 1 1 2 2 1 2 1 2 2 1 2 1 2
#> [4501] 1 2 2 1 1 2 1 2 2 1 1 2 1 1 1 1 2 2 2 1 2 2 1 2 3 2 1 2 1 1 2 1 1 2 2 2
#> [4537] 1 2 2 1 1 2 1 2 1 2 2 2 2 2 1 1 2 1 1 1 2 1 2 2 1 2 2 1 2 1 2 2 1 2 1 2
#> [4573] 1 1 1 1 1 1 2 1 2 2 1 2 1 2 1 2 2 2 2 2 2 1 1 2 2 2 1 2 2 2 2 1 1 2 2 1
#> [4609] 2 1 2 2 2 2 2 1 2 1 1 1 3 2 2 2 2 2 2 1 1 2 1 2 2 2 2 2 1 2 2 2 1 2 2 2
#> [4645] 2 1 3 2 1 2 1 1 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 1 1 2 2 3 1 3 1 1 2 2 1
#> [4681] 1 1 1 1 2 2 2 2 1 2 2 2 1 1 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2
#> [4717] 1 3 2 2 1 2 1 1 1 2 1 2 2 2 1 2 1 1 2 1 1 2 2 2 2 2 2 1 2 2 1 1 2 1 2 1
#> [4753] 1 1 2 2 2 3 2 2 1 2 1 1 2 2 2 1 1 1 3 3 2 2 2 2 1 2 1 2 1 2 2 2 1 1 1 1
#> [4789] 1 2 2 2 2 1 1 2 1 2 1 2 1 1 1 2 1 1 2 1 1 2 1 2 1 2 2 1 2 1 2 2 1 2 2 2
#> [4825] 1 1 1 2 2 2 2 1 2 1 1 2 2 2 1 1 2 2 2 1 2 1 1 2 2 1 2 2 1 1 1 2 2 2 2 2
#> [4861] 1 2 2 1 1 2 1 2 2 2 1 1 1 2 2 2 2 1 2 3 1 1 2 1 2 1 2 2 2 2 1 1 2 2 1 1
#> [4897] 3 2 1 1 2 1 1 1 2 1 1 2 3 2 2 2 2 2 1 2 2 1 1 2 2 2 1 1 1 2 1 1 2 2 1 2
#> [4933] 1 1 2 2 2 1 2 2 2 2 1 2 1 2 1 1 1 1 2 2 2 1 1 1 2 2 2 2 2 1 1 1 2 1 1 2
#> [4969] 2 2 2 1 2 2 2 2 1 1 3 1 1 1 3 2 2 1 2 1 2 2 2 1 2 2 2 1 2 1 2 1 2 3 3 3
#> [5005] 3 2 3 3 3 3 3 3 3 1 3 3 3 3 3 1 3 3 3 3 3 3 2 3 3 3 3 3 3 3 1 3 3 3 3 3
#> [5041] 1 3 2 2 1 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#> [5077] 3 3 2 3 3 2 3 3 1 2 3 3 3 3 2 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 1 3 3 3 3 3
#> [5113] 2 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 1 3 3 3 2 3 3 3 1 3 3 3 3 3 2 1 3 3 3
#> [5149] 3 1 3 3 3 3 3 3 1 3 3 3 2 3 3 3 3 3 3 3 1 3 3 3 3 3 3 2 3 3 3 3 3 1 3 3
#> [5185] 3 3 3 3 3 3 3 3 3 1 3 1 2 3 1 3 1 3 3 3 3 3 3 2 2 1 3 3 3 3 3 2 3 3 3 3
#> [5221] 3 2 3 3 3 3 3 3 3 3 3 3 1 1 3 3 2 3 3 3 3 3 3 1 3 1 3 3 3 3 3 3 3 3 3 2
#> [5257] 3 3 2 1 3 3 2 3 3 3 1 3 3 1 3 3 3 3 3 2 3 3 3 3 3 3 3 3 1 3 2 3 3 3 1 3
#> [5293] 1 3 2 1 3 3 3 3 1 2 3 2 1 3 2 3 3 1 2 3 3 3 3 2 3 3 3 2 3 3 3 3 3 3 3 2
#> [5329] 3 2 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 3 3 3 3 2 3 3 2 2 3
#> [5365] 3 3 2 3 3 2 3 2 3 2 3 3 3 3 3 2 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#> [5401] 3 3 2 3 3 3 1 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 2 3 2 3 3 1 3 3 3 3 3 3 3 3
#> [5437] 3 3 3 2 3 2 3 3 3 1 3 3 3 3 3 3 1 3 3 1 3 3 3 3 3 3 2 3 1 3 3 2 3 3 3 3
#> [5473] 3 3 3 3 2 3 3 3 3 3 2 2 3 2 3 3 1 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3
#> [5509] 3 3 3 2 3 3 3 3 3 3 3 3 2 3 3 3 1 3 2 3 3 3 3 3 1 3 3 3 3 3 3 3 1 3 3 3
#> [5545] 3 3 3 2 3 3 3 3 3 1 3 3 3 3 1 2 3 3 3 2 3 2 3 2 3 3 3 3 3 3 2 3 3 3 1 3
#> [5581] 3 3 3 3 1 3 3 3 3 2 3 3 3 3 2 3 3 3 1 3 3 3 3 2 3 2 3 3 3 2 3 3 3 3 3 2
#> [5617] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 2 3 3 1 3 3 2 3 3 3 3 3 3 1 1 3 3 2 3
#> [5653] 2 2 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3
#> [5689] 3 3 1 3 3 3 3 2 3 3 3 1 3 2 3 3 2 2 3 3 1 2 3 3 3 3 3 3 1 3 3 3 3 3 3 3
#> [5725] 3 3 3 1 3 3 3 3 3 2 3 3 3 3 2 3 3 2 2 2 3 3 2 3 3 3 2 3 3 3 3 2 3 3 3 3
#> [5761] 3 3 3 3 3 3 3 3 1 1 3 1 2 3 3 3 3 3 3 1 3 3 3 1 3 3 2 3 1 3 3 3 2 3 3 3
#> [5797] 3 3 3 3 3 3 3 2 1 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 2 3 3 3 3 3 2 1 3 3 2 1
#> [5833] 3 3 3 3 1 3 3 3 3 3 2 3 3 3 3 3 3 3 2 3 3 3 3 3 3 1 1 3 3 3 3 3 3 1 3 3
#> [5869] 3 3 3 1 1 1 3 3 3 3 3 3 3 2 2 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 2
#> [5905] 3 1 3 1 3 2 3 2 1 3 3 3 3 3 2 3 3 1 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 1 3 3
#> [5941] 1 2 1 3 3 3 3 1 1 3 3 1 3 3 3 3 1 2 2 3 3 3 3 3 3 3 1 3 3 2 2 3 3 3 1 3
#> [5977] 3 3 2 3 1 3 3 2 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 2 3 3 3 3 3
#> [6013] 3 3 3 1 3 3 3 1 1 3 3 3 3 3 3 1 3 3 3 3 3 3 1 3 3 3 3 2 3 1 3 3 3 1 3 3
#> [6049] 3 2 3 3 3 3 3 2 1 3 3 1 1 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 2 3 3
#> [6085] 3 3 3 3 3 2 1 3 3 2 3 2 3 3 3 3 3 2 3 2 3 2 2 3 3 3 3 3 3 2 3 3 3 3 3 3
#> [6121] 3 3 3 2 3 3 3 3 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 3 2 3 2 3
#> [6157] 3 3 3 3 3 3 2 3 3 3 3 3 3 3 1 3 3 3 3 1 3 3 3 3 1 2 3 1 3 3 2 2 3 3 2 3
#> [6193] 3 1 1 3 3 2 3 2 1 3 1 1 3 3 3 3 3 2 2 3 1 3 1 3 3 1 3 3 1 3 3 3 3 3 3 3
#> [6229] 3 3 3 1 3 3 3 3 3 3 3 1 1 1 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 3 2 1 3 3
#> [6265] 3 3 3 2 1 3 3 3 3 1 3 2 3 1 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 2 3 3 1 3
#> [6301] 3 3 3 3 3 3 3 2 3 1 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#> [6337] 2 2 3 3 3 3 3 3 2 3 3 3 3 1 3 1 3 3 3 3 3 3 1 3 3 3 3 3 3 1 3 3 3 3 3 2
#> [6373] 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 2 3
#> [6409] 3 3 1 2 3 2 3 3 3 1 2 3 3 3 3 3 1 1 3 1 3 3 3 3 1 3 1 1 3 3 3 3 3 3 2 3
#> [6445] 2 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 2 3 1 3 3 2 2 3 3 3 3 3 2 1 3
#> [6481] 1 3 3 3 2 2 3 3 3 3 1 3 1 3 3 3 3 1 2 3 3 3 3 1 1 2 3 3 1 3 1 3 3 2 1 1
#> [6517] 2 3 3 3 1 3 2 3 1 3 3 3 3 3 3 3 1 3 2 2 3 3 3 3 2 3 2 3 3 3 3 3 3 3 3 3
#> [6553] 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 1 3 3 3 1 3 3 1 3 2 3 3 3 1 3 3 1 1 3
#> [6589] 3 3 2 3 3 3 3 3 2 3 1 3 3 1 3 3 3 1 3 3 1 3 3 3 1 3 3 3 3 1 3 1 3 2 3 3
#> [6625] 3 3 3 3 3 3 3 3 3 3 1 1 2 3 3 2 3 2 3 3 3 3 3 3 3 3 3 3 3 2 2 3 3 3 2 3
#> [6661] 3 2 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 1 3 2 1 3 2 3 3 3 3 2 2 3
#> [6697] 3 3 2 2 3 2 3 3 3 1 3 3 3 3 3 3 3 1 3 1 3 3 2 3 3 3 2 3 3 3 2 2 3 3 3 3
#> [6733] 3 1 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3
#> [6769] 3 1 3 3 3 3 3 1 3 3 3 3 3 2 3 3 1 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 2
#> [6805] 1 3 3 3 3 3 1 3 3 3 2 3 2 3 3 1 3 2 3 1 3 1 3 3 3 3 3 3 3 3 3 2 3 2 3 3
#> [6841] 3 2 3 2 3 2 3 1 2 3 3 3 1 1 3 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 2
#> [6877] 3 3 3 3 1 2 3 3 3 2 3 3 2 3 3 3 3 3 1 3 3 3 1 2 3 1 3 3 3 3 2 3 1 3 3 3
#> [6913] 3 3 3 3 3 3 3 1 1 1 3 3 1 3 3 3 3 2 3 3 3 3 3 3 3 3 3 2 2 3 1 3 3 3 1 3
#> [6949] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 2 3 3 3 3 2 2 3 2 3 3 3 3 2 3 3 3 3 1
#> [6985] 3 2 3 3 3 1 3 3 2 3 3 3 3 3 1 3 1 3 2 3 3 3 3 3 2 3 3 3 1 3 2 3 3 3 3 3
#> [7021] 3 2 3 3 3 2 3 3 3 3 3 1 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3 1 3 3 3 3 3
#> [7057] 3 1 3 3 3 1 3 3 1 1 3 1 1 3 1 3 3 3 2 3 3 2 3 1 3 3 3 1 3 1 3 3 3 3 1 3
#> [7093] 1 2 3 1 2 2 3 3 3 3 3 2 3 3 3 2 3 3 3 3 3 3 3 2 2 3 3 3 1 3 3 3 3 3 3 3
#> [7129] 3 3 1 3 3 3 3 3 2 3 3 3 1 3 3 1 3 3 3 3 3 2 3 1 3 1 1 1 1 3 1 3 3 1 3 1
#> [7165] 3 3 3 3 3 3 1 3 3 3 3 2 1 3 3 1 3 1 3 3 3 3 1 3 3 3 3 3 3 3 3 2 1 3 1 3
#> [7201] 3 3 1 2 3 3 3 3 3 3 3 3 2 3 2 3 3 3 3 3 2 3 3 3 3 3 3 3 3 1 2 3 3 3 1 1
#> [7237] 3 1 1 3 3 3 3 3 3 2 3 1 3 3 3 1 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#> [7273] 3 3 3 1 3 1 3 3 3 3 3 2 3 3 3 3 2 3 3 3 3 3 2 3 3 2 3 3 2 3 3 3 1 3 3 2
#> [7309] 3 3 3 3 3 1 1 3 3 1 3 1 2 3 3 3 3 3 2 1 3 3 3 3 3 1 3 3 3 2 3 3 3 2 3 3
#> [7345] 3 1 3 3 3 2 3 3 3 3 1 3 3 3 3 2 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3
#> [7381] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 1 3 3 3 3 3 1 3 1 3 1 1 3 3 3 1 3 1 3 1
#> [7417] 3 3 3 1 3 2 3 3 3 2 3 3 1 2 1 3 1 3 3 3 3 3 2 3 1 3 1 1 3 3 1 2 3 3 3 3
#> [7453] 1 2 1 3 3 3 3 3 3 3 3 3 3 2 1 3 1 3 1 3 3 1 3 3 3 2 2 3 1 3 3 3 3 3 3 3
#> [7489] 3 3 3 3 3 2 3 1 3 3 3 3 2 3 1 1 3 1 2 3 3 3 3 3 2 3 1 3 2 3 3 3 3 3 3 3
#> [7525] 3 3 2 3 3 1 3 1 3 1 3 3 2 3 3 3 3 1 2 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 1
#> [7561] 3 2 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 2 3 3 3 3 3 3
#> [7597] 3 3 3 3 3 1 3 3 3 1 3 3 3 3 3 3 3 3 2 3 1 2 3 2 3 3 3 1 1 3 2 1 3 1 3 1
#> [7633] 3 3 3 3 3 3 3 3 2 2 3 3 2 2 3 2 2 3 2 2 3 3 3 3 3 1 2 3 3 3 3 3 3 3 3 1
#> [7669] 2 2 3 3 3 3 3 2 3 2 3 3 3 3 3 3 3 1 1 3 3 3 3 3 3 3 3 3 3 1 3 2 3 3 3 1
#> [7705] 3 3 3 3 3 3 3 2 3 2 1 3 3 3 2 3 2 3 1 3 3 3 1 3 3 3 3 3 1 3 3 3 3 3 3 3
#> [7741] 2 1 2 3 1 3 3 2 3 3 2 3 3 3 3 3 3 3 3 3 2 3 3 3 3 2 3 3 3 3 3 3 3 2 3 1
#> [7777] 3 2 3 3 3 3 1 3 3 1 3 3 3 3 3 1 3 3 3 3 3 2 1 2 3 1 3 1 3 1 2 3 3 3 3 3
#> [7813] 3 2 3 3 3 3 2 3 3 3 3 3 1 3 3 3 2 2 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 1
#> [7849] 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 1 2 1 3 3 2 3 3 3 3 3 2 3 3 3 3 2 3
#> [7885] 3 3 2 3 3 3 3 2 3 3 3 3 1 3 3 3 3 2 3 3 3 3 1 3 3 3 2 3 2 3 3 2 1 3 3 3
#> [7921] 3 3 3 3 2 3 3 3 3 2 1 3 3 2 2 3 3 1 1 3 3 3 1 3 3 3 3 3 3 3 3 3 2 3 3 3
#> [7957] 3 3 3 3 3 3 3 3 2 2 3 2 3 3 3 3 3 3 1 3 3 3 3 3 3 3 2 3 3 3 1 3 3 1 3 2
#> [7993] 1 3 2 1 2 1 3 3 1 3 3 3 2 3 3 3 3 3 2 3 3 1 1 3 3 3 3 3 1 3 1 3 2 3 3 1
#> [8029] 3 3 3 3 1 2 3 3 3 3 2 3 3 3 1 3 1 3 3 3 3 3 2 1 1 3 3 3 3 2 3 3 3 1 3 3
#> [8065] 1 2 3 3 3 3 3 3 3 2 1 3 2 3 3 3 3 3 2 3 1 3 3 2 3 2 3 3 1 3 3 3 3 3 1 3
#> [8101] 3 3 3 3 3 2 3 3 1 3 3 3 1 3 3 3 3 1 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 2 3 3
#> [8137] 3 1 3 2 2 3 3 1 3 1 3 3 3 3 2 2 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 3 3 1 3 3
#> [8173] 3 1 3 3 1 3 3 3 3 2 3 3 3 3 3 3 1 3 3 3 3 3 3 3 1 2 3 3 1 3 3 3 2 2 3 3
#> [8209] 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 2 3 1 2 3 3 3 3 1 3 3 3 3 3 3 3 3 2 3
#> [8245] 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 2 3 2 3 3 3 3 3 3 3 3 3 3 1 3 3 3 2 1 1 3
#> [8281] 3 3 2 1 2 3 2 3 3 2 3 3 1 3 3 1 2 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 2 3 3 3
#> [8317] 3 3 1 3 3 3 3 3 3 3 3 1 3 3 3 2 1 3 3 3 3 1 3 3 3 3 3 3 3 3 1 3 3 3 3 3
#> [8353] 3 2 1 3 3 3 3 3 3 1 3 3 3 3 3 2 1 3 3 2 3 3 3 2 1 3 3 3 3 3 1 3 3 2 3 3
#> [8389] 3 3 3 3 3 3 1 3 3 2 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3
#> [8425] 2 3 3 3 3 3 3 3 3 3 2 3 1 3 3 3 3 3 3 1 3 3 3 1 3 2 3 3 3 3 2 2 3 1 3 3
#> [8461] 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 1 3 1 2 3 3 1 3 1 3 1 3 3 3 2 2 3 3 3 3
#> [8497] 3 3 3 3 3 3 3 1 2 2 1 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 1 3 3 1 2 3 3 1 3
#> [8533] 3 2 3 3 3 3 3 2 3 3 3 3 3 3 3 2 3 3 3 3 3 3 1 3 3 3 3 3 3 2 3 3 1 3 1 3
#> [8569] 3 3 3 3 3 3 3 1 1 3 1 3 2 3 3 3 3 2 3 3 3 3 3 3 2 3 2 1 3 2 3 3 3 3 1 2
#> [8605] 3 3 3 3 3 3 3 3 3 3 2 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 1 2 2 3 1
#> [8641] 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 2 3 1 2 3 3
#> [8677] 1 3 2 3 3 3 3 3 3 3 3 1 3 3 1 2 2 1 3 1 2 3 3 2 3 3 3 3 3 3 2 3 3 3 3 1
#> [8713] 1 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 1 2 3 2 3 1 2 3 1 2 1 3 3 2 3 3
#> [8749] 1 2 3 3 1 3 3 3 3 1 3 3 3 3 1 3 3 3 2 3 1 3 3 3 3 3 3 3 3 3 3 2 3 2 3 1
#> [8785] 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 1 2 3 3 1 3 3 1 3 3 3 1 3 3 3 3 3 3 3 1 3
#> [8821] 3 3 3 2 3 3 3 3 1 3 2 3 1 3 2 3 3 3 3 1 3 3 3 1 3 3 2 2 3 3 3 1 2 3 3 3
#> [8857] 3 1 3 2 3 3 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 2
#> [8893] 2 3 1 3 3 2 2 3 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 1 3 3 2 2 3 3
#> [8929] 3 3 3 3 3 3 3 2 3 3 3 3 3 3 1 3 3 3 1 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1
#> [8965] 1 3 1 2 3 1 3 3 1 3 3 3 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#> [9001] 3 3 3 3 3 3 1 3 1 3 3 3 3 1 3 3 3 3 2 3 3 3 1 3 3 1 2 3 1 2 3 3 3 3 3 3
#> [9037] 3 3 3 2 1 3 1 3 1 3 3 3 3 3 3 3 2 3 1 3 3 3 3 3 3 3 3 1 3 1 3 3 3 3 3 3
#> [9073] 1 1 3 1 3 3 3 3 3 3 1 3 3 3 3 3 2 3 3 3 2 3 3 3 3 3 3 1 3 3 3 3 3 1 3 3
#> [9109] 2 1 3 3 3 3 3 3 1 3 3 3 3 3 2 3 1 3 3 3 3 1 3 2 3 3 3 3 3 3 3 2 2 3 3 3
#> [9145] 3 3 3 3 3 3 3 3 3 1 3 3 1 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 1 3 2 3 3 3 3
#> [9181] 3 3 3 2 3 2 3 3 3 3 3 1 3 3 3 3 3 2 3 2 3 1 3 3 3 3 3 3 3 3 3 3 3 3 1 3
#> [9217] 3 3 3 3 1 3 3 3 3 3 3 3 3 1 3 1 3 3 3 2 3 3 1 2 3 3 3 2 1 1 2 3 3 3 3 3
#> [9253] 3 3 3 2 3 3 3 3 3 3 3 3 3 2 1 3 3 3 1 3 3 1 3 3 3 3 3 3 1 3 3 3 3 3 2 1
#> [9289] 3 3 2 3 1 3 1 3 1 3 3 3 3 3 3 2 1 3 2 3 3 1 3 1 1 3 3 3 3 3 3 3 3 3 3 1
#> [9325] 3 3 1 3 3 1 3 3 2 3 2 2 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3
#> [9361] 3 3 3 2 3 3 3 3 3 3 3 2 3 3 3 2 3 3 3 3 3 1 3 3 1 3 2 3 1 3 3 2 3 1 3 3
#> [9397] 3 2 3 1 3 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 3 3 3 1 3 3 3 3 3 3 3 3 2 3
#> [9433] 1 3 2 1 3 3 3 3 3 1 2 3 3 3 3 3 3 3 3 3 2 2 3 3 3 3 3 3 2 3 3 3 3 3 3 3
#> [9469] 3 3 3 3 3 3 3 2 3 3 1 3 1 3 3 2 3 3 3 3 3 2 3 3 3 1 3 3 1 3 1 3 3 2 3 2
#> [9505] 3 3 2 3 3 2 2 3 3 1 1 3 3 1 3 3 3 3 3 3 2 3 3 3 3 3 3 2 1 3 3 3 1 3 2 3
#> [9541] 2 1 3 3 3 3 3 3 3 3 1 1 3 3 3 3 3 3 3 3 3 1 3 3 3 1 3 3 3 3 3 3 3 3 3 3
#> [9577] 3 1 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 2 3 3 3 1 3 1 3 2 3 3 3 3 1 1 3 3 3 2
#> [9613] 3 2 2 3 1 3 2 3 3 3 2 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 1 3 3
#> [9649] 3 1 3 3 3 3 3 3 3 2 1 3 3 3 1 3 3 3 2 3 3 2 3 3 3 1 3 3 3 3 3 3 3 3 3 3
#> [9685] 3 3 3 3 1 3 3 3 3 3 1 1 1 3 3 3 1 3 3 3 3 3 3 3 2 3 2 3 2 3 3 3 3 3 3 3
#> [9721] 3 3 3 3 1 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 1 3 3
#> [9757] 3 3 3 1 3 2 2 3 3 2 3 3 3 3 3 3 3 3 1 3 3 1 3 1 2 2 3 3 1 3 3 3 3 3 3 3
#> [9793] 3 3 1 3 3 3 3 3 3 3 1 3 3 1 3 3 3 3 3 3 3 3 3 3 2 3 3 3 1 3 3 3 3 1 3 3
#> [9829] 3 3 1 3 2 3 3 1 1 3 3 2 3 3 3 3 3 1 1 3 3 3 3 2 1 2 3 3 3 3 3 3 1 3 3 3
#> [9865] 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 2 3 2 3 3 3 1 3 3 2 3 3 3 3 2 3 1
#> [9901] 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 2 3 3 2 3 2 1 2 3 3 3 3 3 2 3 3 3 3 2 3 3
#> [9937] 1 3 3 3 3 3 3 1 2 3 3 3 1 3 3 3 3 3 3 3 3 2 3 3 2 3 3 3 3 2 3 3 3 1 3 3
#> [9973] 3 2 1 3 2 3 3 3 3 1 1 3 3 3 3 3 3 3 2 3 1 2 3 1 3 3 2 2
#>
#> Within cluster sum of squares by cluster:
#> [1] 50249.74 66499.16 57009.60
#> (between_SS / total_SS = 21.0 %)
#>
#> Available components:
#>
#> [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
#> [6] "betweenss" "size" "iter" "ifault"
airplane$cluster <- as.factor(airplane_cluster$cluster)
head(airplane)#> # A tibble: 6 × 24
#> # Groups: satisfaction [1]
#> Gender Customer.Type Age Type.of.Travel Class Flight.Distance
#> <dbl> <dbl> <int> <dbl> <dbl> <int>
#> 1 0 0 25 1 3 1671
#> 2 1 0 25 1 1 748
#> 3 0 1 64 1 3 1652
#> 4 1 1 54 0 1 216
#> 5 0 1 55 0 3 562
#> 6 1 1 59 0 1 1096
#> # ℹ 18 more variables: Inflight.wifi.service <int>,
#> # Departure.Arrival.time.convenient <int>, Ease.of.Online.booking <int>,
#> # Gate.location <int>, Food.and.drink <int>, Online.boarding <int>,
#> # Seat.comfort <int>, Inflight.entertainment <int>, On.board.service <int>,
#> # Leg.room.service <int>, Baggage.handling <int>, Checkin.service <int>,
#> # Inflight.service <int>, Cleanliness <int>,
#> # Departure.Delay.in.Minutes <int>, Arrival.Delay.in.Minutes <dbl>, …
Evaluation
Clustering evaluation involves assessing the quality of the obtained clusters and the effectiveness of the clustering algorithm in partitioning the data. In this code, we use two evaluation metrics: Within-Cluster Sum of Squares (WSS) and Between-Cluster Sum of Squares (BSS) to understand the clustering performance.
# WSS
airplane_cluster$withinss#> [1] 50249.74 66499.16 57009.60
airplane_cluster$tot.withinss #> [1] 173758.5
# BSS/TSS
airplane_cluster$betweenss/airplane_cluster$totss#> [1] 0.2101097
Within-Cluster Sum of Squares (WSS): The WSS measures the compactness of each cluster, quantifying the sum of squared distances between data points and their assigned cluster center. In this code, we have three WSS values for the three clusters obtained. Lower WSS values indicate more compact clusters, indicating that data points within the same cluster are closer together, and the clustering is effective in grouping similar points.
Total Within-Cluster Sum of Squares (Total WSS or TSS): The Total WSS represents the sum of the WSS values for all clusters. It measures the overall compactness of the clustering solution. In this code, the Total WSS value is 174076.5, representing the cumulative compactness of the three clusters. The goal is to minimize the Total WSS to achieve more distinct clusters.
Between-Cluster Sum of Squares (BSS) / Total Sum of Squares (TSS): The BSS measures the separation between clusters, representing the sum of squared distances between cluster centers and the overall data mean. The Total Sum of Squares (TSS) is the sum of squared distances between each data point and the overall data mean. The ratio BSS/TSS indicates the proportion of variance explained by the clustering solution. A higher BSS/TSS ratio (in this case, 0.2086638) indicates that the clusters are well-separated and the clustering solution is more effective in capturing the variance among data points.
By evaluating the WSS, Total WSS, and BSS/TSS ratio, we gain insights into the clustering quality and how well the clusters are separated. These evaluation metrics help assess the validity and appropriateness of the K-Means clustering solution and provide valuable information for further analysis and decision-making.
Grouping data based on cluster label
as.data.frame(airplane_cluster$centers)#> Gender Customer.Type Age Type.of.Travel Class
#> 1 -0.029589517 -0.07581208 -0.1446386 -0.75685600 -0.7942162
#> 2 0.001108934 -0.27780536 -0.1781261 -0.09664744 -0.3014576
#> 3 0.018885211 0.28902159 0.2496929 0.59024526 0.7909013
#> Flight.Distance Inflight.wifi.service Departure.Arrival.time.convenient
#> 1 -0.4985866 0.005037965 0.21145103
#> 2 -0.2225662 -0.339491421 -0.11096264
#> 3 0.5250749 0.287713402 -0.04660841
#> Ease.of.Online.booking Gate.location Food.and.drink Online.boarding
#> 1 -0.1236495 0.02186144 0.5917204 -0.1898569
#> 2 -0.1706043 -0.03704310 -0.8322952 -0.5970224
#> 3 0.2291730 0.01710658 0.3169623 0.6391811
#> Seat.comfort Inflight.entertainment On.board.service Leg.room.service
#> 1 0.3834822 0.4119284 -0.1922423 -0.3125912
#> 2 -0.9253081 -1.0501691 -0.4622439 -0.3574390
#> 3 0.5363120 0.6243022 0.5252168 0.5160326
#> Baggage.handling Checkin.service Inflight.service Cleanliness
#> 1 -0.1849048 -0.06788385 -0.1490822 0.5740314
#> 2 -0.3902565 -0.34446938 -0.3818722 -0.9725540
#> 3 0.4585737 0.34086656 0.4273702 0.4490829
#> Departure.Delay.in.Minutes Arrival.Delay.in.Minutes satisfaction
#> 1 0.01766059 0.99995 -0.5384346
#> 2 0.04675663 0.99995 -0.6691882
#> 3 -0.05192983 0.99995 0.9347359
Cluster Analyst
Cluster Centeroid
airplane_centroid <- airplane %>%
group_by(cluster) %>%
summarise_all(mean) %>%
mutate_if(is.numeric, .funs = "round", digits = 2)
airplane_centroid#> # A tibble: 3 × 24
#> cluster Gender Customer.Type Age Type.of.Travel Class Flight.Distance
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.47 0.8 37.4 0.38 1.32 726.
#> 2 2 0.49 0.72 36.9 0.67 1.79 1008.
#> 3 3 0.49 0.94 43.2 0.98 2.84 1772.
#> # ℹ 17 more variables: Inflight.wifi.service <dbl>,
#> # Departure.Arrival.time.convenient <dbl>, Ease.of.Online.booking <dbl>,
#> # Gate.location <dbl>, Food.and.drink <dbl>, Online.boarding <dbl>,
#> # Seat.comfort <dbl>, Inflight.entertainment <dbl>, On.board.service <dbl>,
#> # Leg.room.service <dbl>, Baggage.handling <dbl>, Checkin.service <dbl>,
#> # Inflight.service <dbl>, Cleanliness <dbl>,
#> # Departure.Delay.in.Minutes <dbl>, Arrival.Delay.in.Minutes <dbl>, …
General Intrepertation
Cluster 1consists of customers who gave relatively low ratings across various aspects of the flight experience. They are mostlyregular customers, traveling fornon-business purposes, and typically opt foreconomy class. Their overall satisfaction is relatively low.Cluster 2represents customers who areslightly older on average, tend to beregular customers, andprefer business class. They generally give higher ratings in all aspects of the flight experience compared toCluster 1, resulting in moderateoverall satisfaction.Custer 3consists of customers who are older than those inCluster 2and havelonger flight distances. They give the highest ratings in all aspects of the flight experience and exhibit thehighest overall satisfaction.
Visualization
fviz_cluster(object = airplane_cluster,
data = airplane %>% select(-cluster),
labelsize = 0) +theme_minimal()Looks like our cluster models are overlapping and not well separated, it indicates that the clusters might not be distinct enough in the current feature space. To improve the separation between clusters, We can consider the following suggestions:
Feature Selection: Review the features used for clustering and identify if there are other relevant features that can better differentiate the clusters. You can try selecting a subset of informative features or perform feature engineering to create new meaningful features.
Feature Scaling: Check if the features used for clustering are on different scales. If there are significant differences in the ranges or units of different features, it’s recommended to scale the features appropriately. Common scaling techniques include standardization (subtracting the mean and dividing by the standard deviation) or normalization (scaling to a specific range, e.g., [0, 1]). Scaling the features can help avoid dominance of certain features and enable fair comparisons during clustering.
Dimensionality Reduction: If the number of features is high, it might be beneficial to reduce the dimensionality of the dataset. Techniques like Principal Component Analysis (PCA) or t-SNE can help capture the most important information while reducing the dimensionality. By visualizing or clustering on the reduced-dimensional space, you might obtain better cluster separation.
Adjust Clustering Algorithm: Consider trying different clustering algorithms that are suitable for your data. Algorithms such as hierarchical clustering, DBSCAN, or density-based clustering can handle different types of data distributions and cluster shapes. Experimenting with alternative algorithms might reveal more distinct clusters in your data.
Looks like we’re gonna used Dimensionality Reduction to improve seperation between clusters
Principal Component Analysis (PCA)
To improve the separation between clusters in the “airplane” dataset. The primary goal of PCA is to reduce the number of variables while preserving the most important information, allowing us to achieve a better representation of the data in lower-dimensional space.
pca <- prcomp(airplane %>%
select(-cluster),
scale = T)summary(pca)#> Importance of components:
#> PC1 PC2 PC3 PC4 PC5 PC6 PC7
#> Standard deviation 2.1522 1.5597 1.4844 1.35906 1.21171 1.02086 1.00314
#> Proportion of Variance 0.2014 0.1058 0.0958 0.08031 0.06384 0.04531 0.04375
#> Cumulative Proportion 0.2014 0.3072 0.4030 0.48327 0.54711 0.59242 0.63617
#> PC8 PC9 PC10 PC11 PC12 PC13 PC14
#> Standard deviation 1.00000 0.97962 0.96835 0.89452 0.8265 0.75851 0.69575
#> Proportion of Variance 0.04348 0.04172 0.04077 0.03479 0.0297 0.02501 0.02105
#> Cumulative Proportion 0.67965 0.72137 0.76214 0.79693 0.8266 0.85165 0.87270
#> PC15 PC16 PC17 PC18 PC19 PC20 PC21
#> Standard deviation 0.66743 0.66246 0.62701 0.60765 0.56485 0.54412 0.51386
#> Proportion of Variance 0.01937 0.01908 0.01709 0.01605 0.01387 0.01287 0.01148
#> Cumulative Proportion 0.89206 0.91115 0.92824 0.94429 0.95816 0.97104 0.98252
#> PC22 PC23
#> Standard deviation 0.47632 0.41862
#> Proportion of Variance 0.00986 0.00762
#> Cumulative Proportion 0.99238 1.00000
fviz_eig(pca, ncp = 23,
addlabels = T,
main = "Variance explained by each dimensions")In this case, we want to keep 80% of the information, which corresponds to approximately the first 11 principal components. By retaining these 11 components, we achieve a good balance between reducing the dimensionality of the data while preserving a substantial amount of the original information. The reduced feature space can lead to improved separation between clusters, allowing for more distinct and meaningful groupings of data points in subsequent clustering analyses.
# mengambil PC hasil dimensionality reduction
pc_keep <- as.data.frame(pca$x[,1:11])
airplane %>%
select_if(~!is.numeric(.)) %>%
cbind(pc_keep)#> # A tibble: 10,000 × 13
#> # Groups: satisfaction [2]
#> satisfaction cluster PC1 PC2 PC3 PC4 PC5 PC6 PC7
#> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 3 2.20 -0.0985 0.272 1.21 -2.34 0.0294 -0.494
#> 2 0 2 -2.09 1.53 0.525 0.504 -2.32 -1.65 -1.71
#> 3 0 2 -0.998 0.482 -0.766 -1.76 0.819 0.474 0.522
#> 4 0 2 -1.34 -0.0313 2.35 1.40 1.70 0.881 0.635
#> 5 0 1 1.34 -0.676 -0.390 1.54 1.48 1.64 -0.197
#> 6 0 2 -3.61 0.106 -0.878 -0.402 1.63 -0.961 -0.780
#> 7 0 1 -2.55 -1.91 -3.27 0.0560 1.82 1.60 -0.00829
#> 8 0 1 -0.283 -1.17 0.734 -0.108 -1.96 1.52 -0.153
#> 9 0 1 -0.838 -1.13 1.02 0.604 -2.23 -0.425 0.486
#> 10 0 2 -3.08 0.688 -1.28 0.991 -2.79 -0.803 -0.290
#> # ℹ 9,990 more rows
#> # ℹ 4 more variables: PC8 <dbl>, PC9 <dbl>, PC10 <dbl>, PC11 <dbl>
Visualization PCA
Individual Factor Map
# membuat biplot
biplot(x = pca,
cex = 0.6,
scale = FALSE)Variabel Factor MAP
fviz_contrib(
X = pca,
choice = "var",
axes = 1
)fviz_contrib(
X = pca,
choice = "var",
axes = 2
)fviz_contrib(
X = pca,
choice = "var",
axes = 3
)Combining PCA and K Means
In this task, we are combining Principal Component Analysis (PCA) and K-Means clustering using the “FactorMiner” library. The primary reason for using the “FactorMiner” library is that it provides more interpretability for the PCA results, making it easier to understand the relationships between the original variables and the principal components. “FactorMiner” offers additional tools for visualizing PCA results and extracting meaningful insights from the reduced feature space.
Another reason for using the “FactorMiner” library could be that it supports a broader range of PCA variations and analyses. It may provide options for different types of rotations, allowing us to explore the underlying structure of the data in a more comprehensive manner. Moreover, “FactorMiner” might offer additional methods for assessing the quality of the PCA results and selecting the optimal number of principal components based on various criteria.
# numeric column name (quantitative)
quanti <- airplane %>%
select_if(is.numeric) %>%
colnames()
# numeric column index
quantivar <- which(colnames(airplane) %in% quanti)
# categorical (qualitative) column names
quali <- airplane %>%
select_if(is.factor) %>%
colnames()
# categorical column index
qualivar <- which(colnames(airplane) %in% quali)# equivalent to prcomp(data, scale. = T)
loan_pca <- PCA(X = airplane, #data used
scale.unit = T, #scaling
quali.sup = qualivar, #tell me which column is categorical
graph = F, #don't want to show plot directly
ncp = 23) #the number of PCs corresponds to the number of variableshead(as.data.frame(loan_pca$ind$coord))#> Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
#> 1 2.7474565 -0.033235247 0.2931977 -0.5804917 -2.334204 -0.2940107
#> 2 -1.8434329 -1.533539680 0.5301838 -0.4087503 -2.324506 1.5338587
#> 3 -0.8013831 -0.457054583 -0.7721550 1.9761053 0.827376 -0.6153768
#> 4 -1.0212404 0.001118096 2.3561120 -1.3799138 1.694425 -0.7137025
#> 5 1.8440320 0.563204071 -0.3718484 -1.1498570 1.479870 -1.7598218
#> 6 -3.4790110 -0.045046763 -0.8866077 0.2001117 1.630988 0.9828266
#> Dim.7 Dim.8 Dim.9 Dim.10 Dim.11 Dim.12 Dim.13
#> 1 0.31074529 -1 -1.1197259 0.6674372 0.45565145 -0.90529691 0.61488054
#> 2 1.89186146 -1 1.0072143 -0.5890646 -0.30918191 0.62456860 0.78738031
#> 3 -0.70275441 -1 -0.4262429 -2.1964075 0.18307508 0.42150113 0.03119027
#> 4 -0.58611938 -1 2.2518687 0.3112638 -0.35974619 0.09064256 0.36381001
#> 5 -0.01697945 -1 0.2041006 0.4561327 0.63825977 0.52078152 -0.47077780
#> 6 0.95631710 -1 1.4433800 -0.6036251 -0.01984666 -1.36557622 0.63165453
#> Dim.14 Dim.15 Dim.16 Dim.17 Dim.18 Dim.19
#> 1 0.17706969 0.7143774 -1.3738105 -0.6407311 0.01524533 -0.52549956
#> 2 2.18038774 0.5342693 -0.7567435 0.5052407 0.32206780 0.03100035
#> 3 0.03204121 1.0474684 0.1455162 -0.2294148 -0.90488196 -0.05861908
#> 4 -0.28478303 0.7059168 -0.2773889 -0.7662820 0.15287021 -0.19999587
#> 5 -1.17598124 0.3737339 -0.4912790 1.5685011 -1.01539735 -0.70073777
#> 6 0.51015191 -0.3141964 1.1187420 0.1560077 -0.19805218 -0.35140427
#> Dim.20 Dim.21 Dim.22
#> 1 -0.17472056 -0.6713790 -0.1249614
#> 2 0.46999634 0.2075433 0.1854061
#> 3 0.69623022 -0.7830441 -0.2757045
#> 4 -0.09702522 0.2896449 0.3040273
#> 5 0.07278404 -0.5711699 -1.1881403
#> 6 -0.51848148 -0.1482935 -0.4989104
Visualization
plot.PCA(x = loan_pca,
choix = "var")fviz_pca_biplot(X = loan_pca,
habillage = "cluster",
geom.ind = "point",
addEllipses = T,
col.var = "navy")
However, same problem like the individual observations map happened: we
have to little dimensions to represent our data. On the above
visualization, our cluster looks like intersecting each other because we
don’t have enough dimensions to represent them.
We may add 1 more dimensions using plotly to see if our clusters is still clumped together.
3D Visualization
# Load required library
library(plotly)
# Extract PCA scores for the first three principal components
pca_scores <- loan_pca$ind$coord[, 1:3]
# Create a data frame with the PCA scores and cluster information
pca_data <- cbind(pca_scores, Cluster = as.factor(airplane_cluster$cluster))
pca_data <- as.data.frame(pca_data) # Convert to data frame
# Plot the 3D scatter plot
plot_ly(data = pca_data, x = ~Dim.1, y = ~Dim.2, z = ~Dim.3, color = ~Cluster,
type = "scatter3d", mode = "markers", marker = list(size = 5)) %>%
layout(scene = list(xaxis = list(title = "PC1"),
yaxis = list(title = "PC2"),
zaxis = list(title = "PC3")))Conclusion
The application of unsupervised learning techniques, specifically K-Means clustering and Principal Component Analysis (PCA), has provided valuable insights into customer segmentation and satisfaction levels in the “airplane” dataset.
Through K-Means clustering, we identified three distinct customer clusters based on their flight experience ratings.
Cluster 1 represents customers who expressed relatively low satisfaction, giving lower ratings across various aspects of the flight, and are primarily regular customers traveling for non-business purposes in the economy class.
Cluster 2 consists of customers who are slightly older on average, also regular customers, and tend to prefer the business class. They generally provided higher ratings for all flight aspects compared to Cluster 1, resulting in moderate overall satisfaction.
Cluster 3 includes older customers with longer flight distances, who gave the highest ratings in all aspects of the flight experience, leading to the highest overall satisfaction among the three clusters.
We further improved our clustering analysis by incorporating PCA using the “FactorMiner” library, which helped visualize the data in a higher-dimensional space and potentially enhance the separation between clusters. This combined approach allowed us to understand the underlying structure of the data and gain more meaningful insights into customer segments.
The findings from this analysis can be leveraged by the airline industry to tailor their services and marketing strategies to different customer segments. For example, Cluster 1 customers might benefit from targeted improvements to enhance their satisfaction, while Cluster 3 customers may appreciate specialized offerings to maintain their high levels of satisfaction. Additionally, the identified clusters can be used as a foundation for future predictive models or customer segmentation strategies to further enhance the overall customer experience.
Overall, the utilization of unsupervised learning techniques has proven to be a valuable approach in understanding customer preferences, segmenting customers, and guiding data-driven decision-making in the aviation industry. As the industry continues to collect more data, ongoing analysis and optimization of services based on customer preferences will be essential to remain competitive and deliver an exceptional flying experience.