library(ggplot2)
library(cluster)
library(factoextra)
library(tidyverse)
library(flexclust)
library(fpc)
library(clustertend)
library(ClusterR)
library(dbscan)
library(dplyr)
library(tidyr)

Source

This dataset is provided by the Incheon International Airport Corporation and is used to retrieve the latest departure and arrival information for passenger flights, covering approximately 3–6 days prior to the query date. Search conditions can be based on scheduled or updated times, and queries may be filtered by airport code, flight number (UNIQ code or three-digit flight number), IATA airport code, and language. The returned data include airline information, airport name and code, scheduled and updated times, codeshare details, flight number and master flight number, boarding gate, terminal type, flight status, and aircraft operation type.

airport_data <- read.csv("C:/Users/13640/Desktop/incheon airport.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
head(airport_data)

##   Year Month ICAO IATA           Airways International_domestic  Routes
## 1 2025     1  AAL   AA American Airlines          international America
## 2 2025     1  AAL   AA American Airlines          international America
## 3 2025     1  AAR   OZ   Asiana Airlines               domestic   Korea
## 4 2025     1  AAR   OZ   Asiana Airlines          international   Japan
## 5 2025     1  AAR   OZ   Asiana Airlines          international   Japan
## 6 2025     1  AAR   OZ   Asiana Airlines          international   Japan
##   Nation_Area Airport Arrival_Departure Scheduled_Unscheduled Passenger_Cargo
## 1      the US     DFW           Arrival             Scheduled       Passenger
## 2      the US     DFW         Departure             Scheduled       Passenger
## 3       Korea     CJU           Arrival           Unscheduled       Passenger
## 4       Japan     AKJ           Arrival             Scheduled       Passenger
## 5       Japan     AKJ         Departure             Scheduled       Passenger
## 6       Japan     CTS           Arrival             Scheduled       Passenger
##   Flight Farepaying_passenger Free_passenger Transfer_passenger Direct_cargo
## 1     29                 5374             15                959        17112
## 2     29                 4762             22               2938       226482
## 3      1                  232              0                  0            0
## 4     17                 2172              6                 85            0
## 5     17                 2436              5                100            0
## 6     35                 7836             15               1388            0
##   Transshipped_cargo. Mail_matter X.Luggage
## 1                   0       10757    151120
## 2                   0       20853    160667
## 3                   0           0      1305
## 4                   0           0     29890
## 5                   0           0     29428
## 6                   0           0    137087

airport_clean <- airport_data %>% drop_na()
head(airport_clean)

##   Year Month ICAO IATA           Airways International_domestic  Routes
## 1 2025     1  AAL   AA American Airlines          international America
## 2 2025     1  AAL   AA American Airlines          international America
## 3 2025     1  AAR   OZ   Asiana Airlines               domestic   Korea
## 4 2025     1  AAR   OZ   Asiana Airlines          international   Japan
## 5 2025     1  AAR   OZ   Asiana Airlines          international   Japan
## 6 2025     1  AAR   OZ   Asiana Airlines          international   Japan
##   Nation_Area Airport Arrival_Departure Scheduled_Unscheduled Passenger_Cargo
## 1      the US     DFW           Arrival             Scheduled       Passenger
## 2      the US     DFW         Departure             Scheduled       Passenger
## 3       Korea     CJU           Arrival           Unscheduled       Passenger
## 4       Japan     AKJ           Arrival             Scheduled       Passenger
## 5       Japan     AKJ         Departure             Scheduled       Passenger
## 6       Japan     CTS           Arrival             Scheduled       Passenger
##   Flight Farepaying_passenger Free_passenger Transfer_passenger Direct_cargo
## 1     29                 5374             15                959        17112
## 2     29                 4762             22               2938       226482
## 3      1                  232              0                  0            0
## 4     17                 2172              6                 85            0
## 5     17                 2436              5                100            0
## 6     35                 7836             15               1388            0
##   Transshipped_cargo. Mail_matter X.Luggage
## 1                   0       10757    151120
## 2                   0       20853    160667
## 3                   0           0      1305
## 4                   0           0     29890
## 5                   0           0     29428
## 6                   0           0    137087

Research Objectives

Incheon International Airport has long pursued the strategic objective of establishing itself as a major air transit hub in Northeast Asia and beyond. Its competitive advantage lies not only in the scale of its route network, but also in its transfer efficiency, route structure, and ability to attract regional passenger flows. Against this background, this study aims to conduct a systematic, data-driven analysis of Incheon Airport’s route and passenger flow structure in order to evaluate the division of roles between transfer-oriented and direct services across different market directions.

Specifically, this study is based on aviation transport data at the route or regional level and applies clustering analysis to classify passenger flow characteristics across different routes and market directions. The analysis focuses on identifying three types of markets: (1) mature route markets dominated by point-to-point direct demand; (2) routes or regional directions that rely heavily on transfers, with Incheon Airport functioning as a key hub node; and (3) markets with potential for transfer growth that remain underdeveloped. Through this classification, the study seeks to reveal functional differences in Incheon Airport’s position within the global air route network, as well as varying degrees of dependence on transfer services across regions.

Furthermore, by analyzing the clustering results, this study aims to examine the spatial distribution of Incheon Airport’s potential transfer passenger sources and to identify countries and regions that are more suitable for transfer via Incheon in terms of geographic location, route accessibility, and traffic rights structures. This analysis can help airport management better understand the formation mechanisms of transfer passenger flows and provide quantitative support for future decisions regarding route network optimization, flight schedule coordination, transfer product design, and strategic cooperation with airlines.

Overall, the objective of this study is not only to characterize the current structure of Incheon Airport’s routes and passenger flows, but also to evaluate its long-term potential and development direction as an international transfer hub. By doing so, the study aims to provide data-driven insights and decision support for Incheon Airport in formulating differentiated development strategies amid increasingly intense competition among global hub airports.

## Data Cleaning

Because the dataset records a large number of detailed indicators while some aggregate measures are not directly provided, we performed aggregation and logical calculations for selected variables. For example, free passengers and paid passengers were summed to obtain the total number of passengers, and the average number of passengers per flight was calculated as total passengers divided by the number of flights. In this way, the data were logically consolidated and transformed to construct meaningful analytical variables.

Given the large size of the dataset, observations with missing values were directly discarded to ensure computational efficiency and consistency in subsequent analyses.

airport_clean <- airport_clean %>%
  mutate(sum_passenger = Farepaying_passenger+Free_passenger)
head(airport_clean$sum_passenger)

## [1] 5389 4784  232 2178 2441 7851

airport_clean <- airport_clean %>%
  mutate(passenger_per_flight = round(sum_passenger/Flight))
head(airport_clean$passenger_per_flight)

## [1] 186 165 232 128 144 224

airport_clean <- airport_clean %>%
  mutate(passenger_transfer_rate = Transfer_passenger/sum_passenger)
airport_clean <- airport_clean %>%
  dplyr::filter(!is.na(Flight), Flight > 0)
head(airport_clean$passenger_transfer_rate)

## [1] 0.17795509 0.61413043 0.00000000 0.03902663 0.04096682 0.17679277

airport_clean <- airport_clean %>%
  mutate(passenger_direct = (sum_passenger-Transfer_passenger)/Flight)
head(airport_clean$passenger_direct)

## [1] 152.75862  63.65517 232.00000 123.11765 137.70588 184.65714

airport_clean <- airport_clean %>%
  mutate(passenger_transfer = Transfer_passenger/Flight)
head(airport_clean$passenger_transfer)

## [1]  33.068966 101.310345   0.000000   5.000000   5.882353  39.657143
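
The derived variables above all divide by either `Flight` or `sum_passenger`, so the order of filtering and division matters. The following is a minimal sketch, on made-up toy rows rather than the airport file, of filtering out zero-flight rows before any ratio is computed. The data frame `flights` and its values are illustrative assumptions; base R `transform()` is used here only to keep the example self-contained.

```r
# Toy rows mimicking the columns used above (values are illustrative only).
flights <- data.frame(
  Flight               = c(29, 17, 0),
  Farepaying_passenger = c(5374, 2172, 0),
  Free_passenger       = c(15, 6, 0),
  Transfer_passenger   = c(959, 85, 0)
)

# Drop zero-flight rows first so that no ratio divides by zero.
flights <- flights[!is.na(flights$Flight) & flights$Flight > 0, ]

# Total passengers must exist before the ratios that depend on it.
flights <- transform(
  flights,
  sum_passenger = Farepaying_passenger + Free_passenger
)

# Remaining derived variables, built in one call.
flights <- transform(
  flights,
  passenger_per_flight    = round(sum_passenger / Flight),
  passenger_transfer_rate = Transfer_passenger / sum_passenger,
  passenger_direct        = (sum_passenger - Transfer_passenger) / Flight,
  passenger_transfer      = Transfer_passenger / Flight
)
print(flights)
```

The first toy row reuses the figures of the first record shown earlier, so its derived values can be compared against the printed output.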

# cargo_per_flight is never created above, so this lookup returns NULL.
head(airport_clean$cargo_per_flight)

## NULL

airport_clu <- data.frame(
  transfer = airport_clean$passenger_transfer_rate,
  sum   = airport_clean$sum_passenger
)
airport_clustering <- airport_clu %>% drop_na()
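
The two clustering variables live on very different scales: `transfer` is a rate between 0 and 1, while `sum` is a passenger count in the thousands. Euclidean distance on the raw values is therefore driven almost entirely by `sum`. As a hedged illustration (toy data, not the airport records, and not the procedure used in this report), the sketch below standardizes both columns with `scale()` before clustering. The object names `toy`, `toy_scaled`, and `km_scaled` are assumptions made for this example.

```r
set.seed(42)  # arbitrary seed, only for reproducibility of the toy data

# Toy data on the same two scales as airport_clustering (illustrative values).
toy <- data.frame(
  transfer = runif(200, min = 0, max = 0.6),
  sum      = round(runif(200, min = 100, max = 20000))
)

# Standardize each column to mean 0 and standard deviation 1.
toy_scaled <- as.data.frame(scale(toy))

# On raw data, the spread of `sum` dwarfs that of `transfer`.
sapply(toy, sd)
# After scaling, both variables contribute comparably to Euclidean distance.
sapply(toy_scaled, sd)

# K-means on the standardized data, with the same nstart as in this report.
km_scaled <- kmeans(toy_scaled, centers = 2, nstart = 25)
km_scaled$size
```

Whether to standardize is a modelling choice: clustering on raw values effectively segments routes by passenger volume, whereas clustering on standardized values lets the transfer rate influence the partition as well.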

Comparing different methods of clustering

Kmeans

First, we applied the K-means clustering method. An elbow plot was used to determine the optimal number of clusters. A clear elbow is observed at K = 2, suggesting this as a suitable choice. We further validated this selection using silhouette analysis.

fviz_nbclust(
  airport_clustering,
  FUNcluster = kmeans,
  method = "wss",
  k.max = 10
) +
  theme_classic() +
  labs(title = "Elbow Method for Choosing k")

Based on the silhouette analysis, the clustering performance is optimal when K = 2. Therefore, we adopt a two-cluster solution for the subsequent analysis.

fviz_nbclust(
  airport_clustering,
  FUNcluster = kmeans,
  method = "silhouette",
  k.max = 10
) +
  theme_classic() +
  labs(title = "Silhouette Method for Choosing k")
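
For readers who want to see what the silhouette criterion computes, the following rough sketch reproduces it by hand: for each candidate k it runs K-means and averages the silhouette widths returned by `cluster::silhouette()`. It runs on synthetic two-group data, so the names `toy`, `avg_sil`, and `best_k` are illustrative assumptions and the numbers are unrelated to the airport results.

```r
library(cluster)  # provides silhouette()

set.seed(1)  # arbitrary seed for a reproducible toy example

# Two well-separated toy groups in two dimensions (illustrative data).
toy <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 5, sd = 0.5), ncol = 2)
)
d <- dist(toy)

# Average silhouette width for each candidate k.
ks <- 2:6
avg_sil <- sapply(ks, function(k) {
  km <- kmeans(toy, centers = k, nstart = 25)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})
names(avg_sil) <- ks
avg_sil

# The k with the largest average width is the silhouette-preferred choice.
best_k <- ks[which.max(avg_sil)]
best_k
```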

### Kmeans Clustering Result

Based on the clustering results, two distinct types of routes can be identified. Cluster 1 is characterized by a high total number of passengers and a relatively low transfer rate, while Cluster 2 exhibits a lower total passenger volume but a higher transfer rate.

These two patterns are well aligned with established commercial logic in the aviation industry. On popular routes, a large volume of point-to-point passengers naturally dilutes the proportion of transfer passengers. In contrast, on less popular routes, airlines tend to encourage transfer traffic in order to improve load factors and make better use of available capacity. From a business perspective, both clusters therefore reflect economically rational and internally consistent operational strategies.

airport_clusters <- kmeans(
  airport_clustering,
  centers = 2,
  nstart = 25
)
airport_clusters$size

## [1]  854 4320

airport_clusters$centers

##     transfer       sum
## 1 0.08745426 17367.903
## 2 0.12822219  4123.953

Based on the clustering visualization, we observe that some regions exhibit non-convex (concave) shapes, indicating that there is room for improvement in clustering performance. However, since the analysis is restricted to two clusters, the simplified clustering structure helps mitigate the well-known limitation of K-means in handling complex cluster shapes. As a result, despite this geometric constraint, the two-cluster solution remains reasonable and effective for the purposes of this analysis.

fviz_cluster(
  airport_clusters,
  data = airport_clustering,
  geom = "point",
  ellipse.type = "norm"
) +
  theme_classic()

### K-means Clustering Quality

Based on the silhouette analysis, the overall average silhouette coefficient is 0.73, indicating that the clustering results exhibit good quality and a clear cluster structure.

Examining the internal structure of each cluster, most observations have positive silhouette values concentrated in relatively high ranges, suggesting strong within-cluster similarity and good separation from other clusters. In particular, Cluster 2 shows generally higher and more stable silhouette values, indicating a compact internal structure with well-defined boundaries. In contrast, Cluster 1 has relatively lower silhouette values, but the majority remain positive, implying that its clustering assignment is still reasonable overall.

Taken together, the K-means clustering performs robustly on this dataset and effectively distinguishes different route types, providing a reliable basis for subsequent analysis.

sil <- silhouette(airport_clusters$cluster, dist(airport_clustering))
fviz_silhouette(sil)

## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## ℹ The deprecated feature was likely used in the factoextra package.
##   Please report the issue at <https://github.com/kassambara/factoextra/issues>.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

##   cluster size ave.sil.width
## 1       1  854          0.54
## 2       2 4320          0.77

## PAM

As with K-means, the optimal number of clusters must first be determined for the PAM method. Based on silhouette analysis, K = 2 is again identified as the optimal number of clusters, indicating strong agreement between the two clustering approaches.

fviz_nbclust(
  airport_clustering,
  FUNcluster = pam,
  method = "silhouette",
  k.max = 5
) +
  theme_classic() +
  labs(title = "Silhouette Method for Choosing k (PAM)")

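### PAM Clustering Result

PAM again separates the routes into a high-volume group and a lower-volume group, but the cluster labels are reversed relative to K-means. Here, Cluster 1 is the larger, lower-volume group (medoid at 3,958 passengers), and Cluster 2 is the smaller, high-volume group (medoid at 15,372 passengers). Both medoids have a transfer rate at or near zero, so the split is driven mainly by passenger volume. Apart from the label switch, these findings are consistent with the K-means results, in which the lower-volume cluster had the higher mean transfer rate, and they align well with established commercial logic in the aviation industry.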

P1 <- pam(airport_clustering, 2)
P1$medoids

##         transfer   sum
## [1,] 0.002779181  3958
## [2,] 0.000000000 15372

plot(P1)

Based on the visualization, the overall clustering quality is high and consistent with the K-means results; however, limitations in identifying concave-shaped cluster structures remain.

fviz_cluster(P1, geom = "point", ellipse.type = "norm")

### PAM Clustering Quality

Based on the silhouette analysis, the overall average silhouette coefficient is 0.72, indicating that the clustering solution achieves a good balance between within-cluster similarity and between-cluster separation. The overall cluster structure is relatively clear, with no evidence of large-scale misclassification. Most observations have positive silhouette values, suggesting that they are reasonably assigned to their respective clusters.

Although the average silhouette score is slightly lower than that of K-means (0.72 versus 0.73), PAM uses actual observations (medoids) as cluster centers and is therefore less sensitive to extreme values than K-means, whose centers are means. Given the wide dispersion in passenger volumes, this robustness makes the PAM solution attractive when result reliability is weighed alongside silhouette quality.

fviz_silhouette(P1)

##   cluster size ave.sil.width
## 1       1 4193          0.78
## 2       2  981          0.47
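
Because cluster numbers are arbitrary, "Cluster 1" in one method need not correspond to "Cluster 1" in another. A simple way to compare two partitions is to cross-tabulate their labels. The sketch below does this on synthetic data, where `toy`, `km`, `pm`, and `agree_share` are names assumed for the example. With the objects in this report, the analogous call would cross-tabulate `airport_clusters$cluster` against `P1$clustering`.

```r
library(cluster)  # provides pam()

set.seed(2)  # arbitrary seed for the toy data

# Two well-separated toy groups (illustrative data, not the airport records).
toy <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.4), ncol = 2),
  matrix(rnorm(60, mean = 4, sd = 0.4), ncol = 2)
)

km <- kmeans(toy, centers = 2, nstart = 25)
pm <- pam(toy, k = 2)

# Rows: K-means labels; columns: PAM labels.
# Agreement shows up as one large cell per row, whichever column it falls in.
agreement <- table(kmeans = km$cluster, pam = pm$clustering)
agreement

# Share of points on which the two partitions agree, up to label switching
# (valid for the two-cluster case only).
on_diag <- sum(diag(agreement))
agree_share <- max(on_diag, sum(agreement) - on_diag) / sum(agreement)
agree_share
```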

## Hierarchical Clustering

Based on hierarchical clustering, we first partitioned the data into two clusters. The resulting average silhouette coefficient is 0.73, indicating high clustering quality and a level of performance comparable to that of the K-means solution.

However, a small number of observations exhibit silhouette values close to zero, and some even show slightly negative values. This suggests that these observations lie near the boundary between clusters, where the distinction between neighboring clusters is not particularly clear.

Given this issue, we prefer the K-means solution over the hierarchical one, since K-means does not produce negative silhouette values and therefore provides a more clearly separated and interpretable clustering structure.

d <- dist(airport_clustering, method = "euclidean")
hc <- hclust(d, method = "ward.D2")
hc_k2 <- cutree(hc, k = 2)
plot(hc, labels = FALSE, hang = -1)
rect.hclust(hc, k = 2, border = c("red", "blue"))

sil_hc2 <- silhouette(hc_k2, d)
mean(sil_hc2[, 3])

## [1] 0.7302064

fviz_silhouette(sil_hc2)

##   cluster size ave.sil.width
## 1       1 4632          0.74
## 2       2  542          0.67
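
The overall average silhouette coefficient is the size-weighted mean of the per-cluster averages, which gives a quick consistency check on the figures quoted for the three two-cluster solutions. The short sketch below uses only the sizes and widths printed above. The helper `weighted_sil` is a name introduced here for illustration. Because the per-cluster widths are rounded to two decimals, the results can differ slightly from exact values such as 0.7302064.

```r
# Size-weighted mean of per-cluster average silhouette widths.
weighted_sil <- function(size, width) sum(size * width) / sum(size)

# Sizes and widths copied from the printed summaries above.
kmeans_avg <- weighted_sil(c(854, 4320), c(0.54, 0.77))
pam_avg    <- weighted_sil(c(4193, 981), c(0.78, 0.47))
hc_avg     <- weighted_sil(c(4632, 542), c(0.74, 0.67))

round(c(kmeans = kmeans_avg, pam = pam_avg, hierarchical = hc_avg), 2)
```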

We further explored a clustering solution with three clusters, but the results show a clear decline in the average silhouette coefficient, indicating a deterioration in clustering quality and structural clarity. Moreover, from a business interpretation perspective, the three-cluster solution fails to produce a clear and intuitive segmentation of market stages and lacks meaningful economic interpretation. Therefore, considering both quantitative clustering metrics and practical interpretability, the three-cluster solution is not adopted in this study.

plot(hc, labels = FALSE, hang = -1)
hc_k3 <- cutree(hc, k = 3)
rect.hclust(hc, k = 3, border = c("red","blue","green"))

sil_hc3 <- silhouette(hc_k3, d)
mean(sil_hc3[, 3])

## [1] 0.5865173

fviz_silhouette(sil_hc3)

##   cluster size ave.sil.width
## 1       1 3412          0.65
## 2       2 1220          0.43
## 3       3  542          0.51

## DBSCAN

The performance of density-based clustering is considerably weaker. With eps = 0.15 and MinPts = 5 on the unscaled data, DBSCAN labels 5,154 of the 5,174 observations as noise (cluster 0) and finds only four tiny clusters of five points each. One likely reason is that eps = 0.15 is very small relative to the scale of sum, which is measured in thousands of passengers, so almost no point has enough neighbours within that radius. In addition, aviation-related variables tend to exhibit smooth transitions rather than clear density separations, which makes it difficult to identify meaningful clusters based on density differences. The density-based result therefore provides little analytical or practical value for this study.

set.seed(123)
db <- fpc::dbscan(airport_clustering, eps = 0.15, MinPts = 5)
table(db$cluster)

## 
##    0    1    2    3    4 
## 5154    5    5    5    5
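
A common heuristic for choosing `eps` is to look at the distance from each point to its k-th nearest neighbour and pick a value near the bend of the sorted curve. The `dbscan` package offers `kNNdistplot()` for this. The sketch below computes the same quantity in base R on toy data so that it stays self-contained. The helper `knn_dist`, the toy data, and the choice of k = 5 (to match MinPts = 5) are assumptions for illustration, and conventions differ on whether the point itself is counted.

```r
set.seed(3)  # arbitrary seed for the toy data

# Toy data on the same two scales as airport_clustering (illustrative values).
toy <- data.frame(
  transfer = runif(300, min = 0, max = 0.6),
  sum      = round(runif(300, min = 100, max = 20000))
)

# Distance from each point to its k-th nearest neighbour.
knn_dist <- function(x, k) {
  dm <- as.matrix(dist(x))
  # The first element of each sorted row is the point itself (distance 0).
  apply(dm, 1, function(r) sort(r)[k + 1])
}

raw_knn    <- knn_dist(toy, k = 5)
scaled_knn <- knn_dist(scale(toy), k = 5)

# On raw data the typical neighbour distance is far above eps = 0.15.
summary(raw_knn)
summary(scaled_knn)

# Sorted k-NN distances; a sharp bend suggests a candidate eps.
plot(sort(scaled_knn), type = "l",
     xlab = "Points sorted by distance", ylab = "5-NN distance")
```

On data like this, an `eps` chosen from the standardized curve is on a scale comparable to both variables, whereas a fixed small `eps` on raw passenger counts leaves nearly every point without neighbours.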

plot(db, airport_clustering, main = "DBSCAN", frame = FALSE)

fviz_cluster(db, airport_clustering, stand = FALSE, ellipse = FALSE, geom = "point")

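In summary, we adopt the two-cluster PAM solution. Its average silhouette coefficient (0.72) is essentially on par with those of K-means and hierarchical clustering (both 0.73), while its medoid-based centers are less sensitive to extreme passenger volumes and correspond to actual observations, which aids interpretation.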

airport_clustering$cluster_pam <- factor(P1$clustering)
head(airport_clustering)

##     transfer  sum cluster_pam
## 1 0.17795509 5389           1
## 2 0.61413043 4784           1
## 3 0.00000000  232           1
## 4 0.03902663 2178           1
## 5 0.04096682 2441           1
## 6 0.17679277 7851           1

airport_clean$cluster_pam <- NA

airport_clean$cluster_pam[
  as.numeric(rownames(airport_clustering))
] <- P1$clustering
head(airport_clean)

##   Year Month ICAO IATA           Airways International_domestic  Routes
## 1 2025     1  AAL   AA American Airlines          international America
## 2 2025     1  AAL   AA American Airlines          international America
## 3 2025     1  AAR   OZ   Asiana Airlines               domestic   Korea
## 4 2025     1  AAR   OZ   Asiana Airlines          international   Japan
## 5 2025     1  AAR   OZ   Asiana Airlines          international   Japan
## 6 2025     1  AAR   OZ   Asiana Airlines          international   Japan
##   Nation_Area Airport Arrival_Departure Scheduled_Unscheduled Passenger_Cargo
## 1      the US     DFW           Arrival             Scheduled       Passenger
## 2      the US     DFW         Departure             Scheduled       Passenger
## 3       Korea     CJU           Arrival           Unscheduled       Passenger
## 4       Japan     AKJ           Arrival             Scheduled       Passenger
## 5       Japan     AKJ         Departure             Scheduled       Passenger
## 6       Japan     CTS           Arrival             Scheduled       Passenger
##   Flight Farepaying_passenger Free_passenger Transfer_passenger Direct_cargo
## 1     29                 5374             15                959        17112
## 2     29                 4762             22               2938       226482
## 3      1                  232              0                  0            0
## 4     17                 2172              6                 85            0
## 5     17                 2436              5                100            0
## 6     35                 7836             15               1388            0
##   Transshipped_cargo. Mail_matter X.Luggage sum_passenger passenger_per_flight
## 1                   0       10757    151120          5389                  186
## 2                   0       20853    160667          4784                  165
## 3                   0           0      1305           232                  232
## 4                   0           0     29890          2178                  128
## 5                   0           0     29428          2441                  144
## 6                   0           0    137087          7851                  224
##   passenger_transfer_rate passenger_direct passenger_transfer cluster_pam
## 1              0.17795509        152.75862          33.068966           1
## 2              0.61413043         63.65517         101.310345           1
## 3              0.00000000        232.00000           0.000000           1
## 4              0.03902663        123.11765           5.000000           1
## 5              0.04096682        137.70588           5.882353           1
## 6              0.17679277        184.65714          39.657143           1

sum(airport_clean$cluster_pam == 2, na.rm = TRUE)

## [1] 981

airport_clean %>%
  count(cluster_pam)

##   cluster_pam    n
## 1           1 4193
## 2           2  981
## 3          NA 1560
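
The assignment above relies on the row names of `airport_clustering` still matching row positions in `airport_clean` after `drop_na()`. Whether row names are preserved can depend on the package versions in use, so an explicit index is a safer way to write labels back. The following is a minimal sketch on toy data using `complete.cases()`. The frame `toy` and the vector `features` are illustrative assumptions, not objects from this report.

```r
library(cluster)  # provides pam()

# Toy frame with one incomplete row (illustrative values only).
toy <- data.frame(
  id       = 1:6,
  transfer = c(0.10, 0.12, NA, 0.50, 0.55, 0.52),
  sum      = c(9000, 9500, 100, 1000, 1200, 1100)
)

# Logical index of rows that have every clustering variable present.
features <- c("transfer", "sum")
ok <- complete.cases(toy[, features])

# Cluster only the complete rows.
fit <- pam(toy[ok, features], k = 2)

# Write labels back through the same index, so positions cannot drift.
toy$cluster_pam <- NA_integer_
toy$cluster_pam[ok] <- fit$clustering
toy
```

Rows excluded from clustering keep `NA` in `cluster_pam`, which mirrors the `NA` group in the count above.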

# Cluster Analysis Results

For most airports, a stable hinterland passenger base is the foundation for sustained operation and long-term viability. Even though Incheon International Airport has long pursued the strategy of developing into an international transfer hub, passenger flows dominated by local and direct travelers continue to constitute its primary demand base in the short to medium term. In this context, direct markets not only serve as the airport’s operational “core base,” but also provide the necessary scale to support the development of transfer services.

Accordingly, in this study, routes or market directions dominated by direct passengers (Cluster 2) are defined as the control group, while the more transfer-oriented routes or market directions (Cluster 1) are defined as the experimental group. This grouping strategy allows transfer-oriented markets to be analyzed more precisely while maintaining direct markets as a baseline reference. Such a control–experiment framework helps clarify the functional roles of different route types at Incheon Airport and further identify markets with potential for conversion toward, or deepening of, transfer-oriented operations. By comparing the structural characteristics of the two clusters, this study aims to provide data-driven support and strategic guidance for advancing transfer hub development while maintaining the stability of direct passenger demand.

China and Japan have long been Korea’s most important aviation markets and thus warrant separate, focused attention in the analysis. The data show that both arrival and departure passenger flows at Incheon Airport are highly concentrated in surrounding countries and regions, which is characteristic of a regional hub airport.

Specifically, Incheon Airport primarily serves passengers from East Asia (namely China and Japan), Northeast Asia (such as Mongolia), and multiple Southeast Asian countries. Routes from these regions together form a stable and extensive base of transfer passengers. Among them, travelers from Southeast Asia and Northeast Asia often use Incheon as a key transfer point for journeys to North America, which is consistent with Korea’s role as an important gateway for U.S.-bound routes. From the perspectives of route network structure, geography, and traffic rights policies, Incheon occupies a strategically central position in the “Asia–North America” aviation system.

It is important to note that within the core China–Japan–Korea market, the arrival and departure structures are not fully symmetric. The data show that the number of flights departing from Incheon to Japan is significantly higher than the number of flights arriving from Japan to Incheon. This phenomenon should not be interpreted simply as stable point-to-point direct demand, but rather as the outcome of price-driven transfer behavior. A large share of passengers do not regard Korea or Japan as their final destination; instead, they choose to transfer via Incheon in order to obtain more cost-efficient travel options.

The primary driver of this structure is the highly developed low-cost carrier (LCC) system in Korea. The Korea–Japan market is characterized by high flight density, intense competition, and fares that are substantially lower than alternative routing options. As a result, “transferring via Incheon to Japan” offers a clear economic advantage. Consequently, routes between Korea and Japan in this context function more as downstream legs within a transfer network, rather than as traditional point-to-point destination markets.

From the arrival and departure distributions of Cluster 2, it can be directly observed that this cluster exhibits a flight structure markedly different from the transfer-oriented Cluster 1, and is overall more consistent with direct-service-dominated operations. Nevertheless, Cluster 2 also shows substantial weight in the China–Japan–Korea market and does not represent a purely “direct-only” structure. Instead, both direct and transfer passenger flows are actively present.

This pattern aligns with Incheon Airport’s operational characteristics as a hub airport. The airport operates clear bank structures, with certain flights scheduled specifically to facilitate transfer connections. Through mechanisms such as through-check baggage, codesharing, and integrated ticketing across airlines, even on relatively short-haul China–Japan–Korea routes, some passengers may still be incorporated into the transfer system. At the same time, China, Japan, Korea, and Southeast Asian countries are all major tourism destinations, with frequent cross-border movements and strong direct travel demand. As a result, in Cluster 2, direct passengers still account for a relatively high proportion, producing a structure in which transfer traffic exists but direct demand is dominant.

Compared with Cluster 1, Cluster 2 exhibits significantly fewer flights in Northeast Asia (excluding China, Japan, and Korea), North America, and Europe, particularly on the arrival side. This indicates that Cluster 2 is more concentrated on regional and short-haul international routes, while long-haul intercontinental routes account for a smaller share. In addition, Korea itself appears much more prominently in Cluster 2 than in Cluster 1, suggesting that this cluster contains more routes for which Korea is either the origin or the destination, rather than merely a transit node.

This feature is closely related to Korea’s transportation and demographic structure. On the one hand, the country’s high-speed rail network is highly developed, limiting domestic air travel demand, which is largely concentrated at Gimpo Airport. On the other hand, Korea’s population of approximately 50 million—nearly half of whom reside in the Seoul metropolitan area—provides a highly concentrated and stable hinterland for international direct travel. Under these conditions, international direct demand with Korea as the origin or destination remains stable, while “domestic-to-international” transfer demand is relatively weak.

Overall, Cluster 2 exhibits a route structure primarily characterized by regional, symmetric, and direct demand, while still incorporating a limited degree of transfer activity. It therefore serves as an appropriate direct-route control group for comparison with the transfer-oriented Cluster 1, enabling a clearer understanding of Incheon Airport’s differentiated functional roles within its route network.

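# par(mfrow = ...) applies only to base graphics and has no effect on
# ggplot2 objects, so each plot below is drawn on its own.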
PAM1_ARV <- airport_clean %>%
  filter(
    cluster_pam == 1,
    Arrival_Departure == "Arrival",
    Passenger_Cargo == "Passenger"
  ) %>%
  group_by(Routes) %>%
  summarise(
    total_flights = sum(Flight, na.rm = TRUE)
  ) %>%
  ggplot(aes(x = reorder(Routes, total_flights), y = total_flights)) +
  geom_col(fill = "#4E79A7") +
  geom_text(
    aes(label = total_flights),
    hjust = -0.1,
    size = 3
  ) +
  coord_flip() +
  theme_classic() +
  labs(
    title = "Cluster 1 – Arrival Routes",
    x = "Routes",
    y = "Total Flights"
  )
PAM1_ARV

PAM1_DEP <- airport_clean %>%
  filter(
    cluster_pam == 1,
    Arrival_Departure == "Departure",
    Passenger_Cargo == "Passenger"
  ) %>%
  group_by(Routes) %>%
  summarise(
    total_flights = sum(Flight, na.rm = TRUE)
  ) %>%
  ggplot(aes(x = reorder(Routes, total_flights), y = total_flights)) +
  geom_col(fill = "#4E79A7") +
  geom_text(
    aes(label = total_flights),
    hjust = -0.1,
    size = 3
  ) +
  coord_flip() +
  theme_classic() +
  labs(
    title = "Cluster 1 – Departure Routes",
    x = "Routes",
    y = "Total Flights"
  )
PAM1_DEP

PAM2_ARV <- airport_clean %>%
  filter(cluster_pam == 2, Arrival_Departure == "Arrival",
         Passenger_Cargo == "Passenger") %>%
  group_by(Routes) %>%
  summarise(
    total_flights = sum(Flight, na.rm = TRUE)
  ) %>%
  ggplot(aes(x = reorder(Routes, total_flights), y = total_flights)) +
  geom_col(fill = "#4E79A7") +
  geom_text(
    aes(label = total_flights),
    hjust = -0.1,
    size = 3
  ) +
  coord_flip() +
  theme_classic() +
  labs(
    title = "Cluster 2 – Arrival Routes",
    x = "Routes",
    y = "Total Flights"
  )

plot(PAM2_ARV)

PAM2_DEP <- airport_clean %>%
  filter(cluster_pam == 2, Arrival_Departure == "Departure",
         Passenger_Cargo == "Passenger") %>%
  group_by(Routes) %>%
  summarise(
    total_flights = sum(Flight, na.rm = TRUE)
  ) %>%
  ggplot(aes(x = reorder(Routes, total_flights), y = total_flights)) +
  geom_col(fill = "#4E79A7") +
  geom_text(
    aes(label = total_flights),
    hjust = -0.1,
    size = 3
  ) +
  coord_flip() +
  theme_classic() +
  labs(
    title = "Cluster 2 – Departure Routes",
    x = "Routes",
    y = "Total Flights"
  )

plot(PAM2_DEP)
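
The bar charts above compare flight counts by route, but the interpretation of each cluster as transfer-oriented or direct-oriented rests on its transfer share. One way to check that reading is a per-cluster summary of pooled transfer share and passengers per flight. The sketch below shows the idea with base R `aggregate()` on made-up numbers. The frame `toy` and the columns `transfer_share` and `pax_per_flight` are assumptions for illustration and do not reproduce results from the airport data.

```r
# Toy stand-in for airport_clean with a cluster label (illustrative values).
toy <- data.frame(
  cluster_pam        = c(1, 1, 1, 2, 2, 2),
  sum_passenger      = c(4000, 3500, 4200, 15000, 16000, 14500),
  Transfer_passenger = c(800, 900, 700, 1200, 1500, 1000),
  Flight             = c(25, 22, 27, 70, 75, 68)
)

# Totals per cluster.
by_cluster <- aggregate(
  cbind(sum_passenger, Transfer_passenger, Flight) ~ cluster_pam,
  data = toy, FUN = sum
)

# Pooled transfer share and passengers per flight for each cluster.
by_cluster$transfer_share <- by_cluster$Transfer_passenger / by_cluster$sum_passenger
by_cluster$pax_per_flight <- by_cluster$sum_passenger / by_cluster$Flight
by_cluster
```

A pooled share (total transfers over total passengers) weights busy routes more heavily than a simple mean of per-row rates, so the two can rank clusters differently when route sizes vary widely.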

## Summary

Through clustering analysis of Incheon International Airport’s international routes, combined with a comparison of arrival and departure route structures, this study identifies two types of route groupings that differ significantly in both functional roles and spatial structure: a transfer-dominated route group (Cluster 1) and a direct-demand-dominated route group (Cluster 2). This finding indicates that Incheon Airport’s route system does not operate under a single hub model, but rather consists of multiple operational logics that coexist and overlap.

From a research perspective, the clustering results clearly demonstrate that reliance on a single indicator—such as flight volume or regional share—is insufficient to accurately capture the true operational structure of a hub airport. By grouping routes according to functional characteristics, it becomes possible to distinguish more effectively between transfer-oriented and direct-oriented routes, thereby avoiding misinterpretations caused by conflating fundamentally different traffic patterns. This study shows that data-driven clustering methods offer strong explanatory power in aviation network analysis, particularly for uncovering functional differences that are not immediately visible from aggregate flight volumes alone.

From an airport operations and strategic perspective, the results reveal that Incheon Airport successfully operates two distinct yet interconnected route systems within the same geographic and traffic-rights environment. On the one hand, the route structure represented by Cluster 1 reflects Incheon’s core role as a Northeast Asian transfer hub, with routes oriented toward cross-regional connectivity and serving as transfer channels for passengers from China, Northeast Asia, and Southeast Asia traveling to long-haul markets such as North America. On the other hand, Cluster 2 highlights Incheon’s stable and substantial direct-demand base, particularly along China–Japan–Korea and Southeast Asian routes, where high-frequency direct movements provide critical support for the airport’s long-term operations.

This coexistence of transfer and direct functions offers important strategic implications for Incheon Airport. First, transfer operations do not replace direct demand; rather, they are built upon and reinforced by a stable direct-market foundation. Second, direct routes enhance network density and scheduling flexibility for the transfer system, while transfer traffic increases overall network value and resilience. Together, these two components are mutually reinforcing and jointly constitute Incheon Airport’s core competitive advantage within the Northeast Asian aviation landscape.

Overall, this study suggests that Incheon Airport’s competitiveness does not stem solely from its identity as a transfer hub, but from its ability to flexibly allocate transfer and direct functions across different regions and distance scales. This structural diversity not only improves network efficiency and stability, but also provides greater strategic resilience in the face of external shocks and intensifying regional competition.

LUBOWEN Clustering

2026-02-01
