Dimensionality reduction

LU BOWEN 488222

library(factoextra)
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(gridExtra)

Research Objectives

This study aims to systematically analyze the overall evolution and temporal structure of the European aviation industry from 2013 to 2024 using dimensionality reduction techniques. Specifically, the objectives of this study are as follows:

To apply principal component analysis (PCA) to high-dimensional country–year aviation transport data in order to extract the main patterns of variation, identify the core common factors driving changes in the European aviation industry, and assess the relative roles and degree of synchronization among different countries within the overall market.

To characterize the temporal evolution of the European aviation industry, with a particular focus on the structural differences in the principal component space across the pre-pandemic period (2013–2019), the pandemic-induced shock period (2020–2021), and the post-pandemic recovery phase (2022–2024), thereby revealing the structural break caused by COVID-19 and the subsequent recovery process.

To complement the PCA results with multidimensional scaling (MDS), examining similarities between years from a distance-based perspective and comparing the results with those obtained from PCA, in order to assess the robustness of the identified temporal segmentation and its dependence on the choice of dimensionality reduction method.

To compare the aviation development patterns of EU and non-EU countries, exploring the high degree of synchronization within the aviation industry under the EU single market framework and the relative heterogeneity exhibited by non-EU countries in the overall structure.

Through these analyses, this study seeks to enhance the understanding of the evolution mechanisms of the European aviation market from both an overall and a structural perspective, and to provide quantitative evidence for subsequent policy analysis and industry-related research.

Data Source

The data are sourced from the Eurostat database, the official statistical database maintained by the European Statistical Office (Eurostat). It provides comprehensive statistics covering multiple domains, including the economy, society, population, employment, education, health, and the environment. The data in this database are highly standardized and comparable across countries, covering all EU Member States as well as several related countries, and are regularly updated. As a result, Eurostat data are widely used in academic research and policy analysis.

https://ec.europa.eu/eurostat/web/main/data/database

Within the Traffic subdirectory, annual air passenger figures for EU Member States and other European countries from 2013 to 2024 can be found. These data are used in this study to analyze annual changes in the aviation industry across European countries, both inside and outside the EU.

aireu<-read.csv("C:/Users/13640/Desktop/pass.csv", header=TRUE)
head(aireu)
##   freq.unit.tra_meas.tra_cov.schedule.geo.TIME_PERIOD     X2013     X2014
## 1                          A,PAS,PAS_CRD,TOTAL,TOT,AT 25749724  26378676 
## 2                          A,PAS,PAS_CRD,TOTAL,TOT,BA        :         : 
## 3                          A,PAS,PAS_CRD,TOTAL,TOT,BE 26389927  28776258 
## 4                          A,PAS,PAS_CRD,TOTAL,TOT,BG  7079292   7520697 
## 5                          A,PAS,PAS_CRD,TOTAL,TOT,CH 44217568  46127426 
## 6                          A,PAS,PAS_CRD,TOTAL,TOT,CY  7011437   7328546 
##       X2015     X2016     X2017     X2018     X2019     X2020     X2021
## 1 26754007  27181511  28327279  31138417  35644188   9168431  11105564 
## 2        :         :         :         :         :         :    987659 
## 3 30958841  30115832  33260493  34506309  35385188   9465828  13500020 
## 4  7610949   9324217  11092651  12137714  11713068   3729017   5047877 
## 5 48026375  50505492  53564943  56139549  57194328  16006811  19109708 
## 6  7590787   8961817  10238913  10927101  11261410   2270577   5099704 
##       X2022     X2023     X2024
## 1 26381180  33063166  35281811 
## 2  1769813   1917082         : 
## 3 27873892  32341221  34759837 
## 4  8807502  10561597  10961466 
## 5 42568368  52090531  56674840 
## 6  9200931  11616238  12264970
summary(aireu)
##  freq.unit.tra_meas.tra_cov.schedule.geo.TIME_PERIOD    X2013          
##  Length:37                                           Length:37         
##  Class :character                                    Class :character  
##  Mode  :character                                    Mode  :character  
##     X2014              X2015              X2016              X2017          
##  Length:37          Length:37          Length:37          Length:37         
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##     X2018              X2019              X2020              X2021          
##  Length:37          Length:37          Length:37          Length:37         
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##     X2022              X2023              X2024          
##  Length:37          Length:37          Length:37         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character

Data Cleaning

The data were preprocessed in three steps: the Eurostat missing-value marker ":" was replaced with NA, the year columns (X2013 to X2024) were identified by name, and these columns were converted from character to numeric so that subsequent numerical analyses are possible.

aireu[aireu == ":"] <- NA
year_cols <- grep("^X[0-9]{4}$", colnames(aireu))
colnames(aireu)[year_cols]
##  [1] "X2013" "X2014" "X2015" "X2016" "X2017" "X2018" "X2019" "X2020" "X2021"
## [10] "X2022" "X2023" "X2024"
aireu[year_cols] <- lapply(aireu[year_cols], as.numeric)
summary(aireu)
##  freq.unit.tra_meas.tra_cov.schedule.geo.TIME_PERIOD     X2013          
##  Length:37                                           Min.   :  1265766  
##  Class :character                                    1st Qu.:  5487400  
##  Mode  :character                                    Median : 23939062  
##                                                      Mean   : 62310055  
##                                                      3rd Qu.: 38569165  
##                                                      Max.   :746100398  
##                                                      NA's   :5          
##      X2014               X2015               X2016          
##  Min.   :  1307128   Min.   :  1436003   Min.   :  1404152  
##  1st Qu.:  5806168   1st Qu.:  5145856   1st Qu.:  5232303  
##  Median : 26012624   Median : 26754007   Median : 18099954  
##  Mean   : 65320751   Mean   : 66690435   Mean   : 67253019  
##  3rd Qu.: 40870231   3rd Qu.: 42096402   3rd Qu.: 43236708  
##  Max.   :781202599   Max.   :819698948   Max.   :871695782  
##  NA's   :5           NA's   :4           NA's   :2          
##      X2017               X2018               X2019          
##  Min.   :  1682133   Min.   :  1810567   Min.   :1.719e+06  
##  1st Qu.:  6042792   1st Qu.:  6921444   1st Qu.:7.451e+06  
##  Median : 20054947   Median : 22173530   Median :2.329e+07  
##  Mean   : 72534524   Mean   : 76774819   Mean   :7.948e+07  
##  3rd Qu.: 48921892   3rd Qu.: 52638712   3rd Qu.:5.555e+07  
##  Max.   :938854476   Max.   :996295411   Max.   :1.035e+09  
##  NA's   :2           NA's   :2           NA's   :2          
##      X2020               X2021               X2022          
##  Min.   :   287787   Min.   :   419346   Min.   :   968811  
##  1st Qu.:  1837992   1st Qu.:  2450871   1st Qu.:  5614983  
##  Median :  6031034   Median :  5099704   Median : 13812577  
##  Mean   : 19706028   Mean   : 26179610   Mean   : 57003915  
##  3rd Qu.: 15461473   3rd Qu.: 19001760   3rd Qu.: 40958032  
##  Max.   :276758108   Max.   :373809763   Max.   :816699952  
##  NA's   :3           NA's   :2           NA's   :2          
##      X2023               X2024          
##  Min.   :  1268352   Min.   :1.438e+06  
##  1st Qu.:  7514804   1st Qu.:8.691e+06  
##  Median : 19783568   Median :2.459e+07  
##  Mean   : 70582699   Mean   :7.877e+07  
##  3rd Qu.: 54344226   3rd Qu.:6.052e+07  
##  Max.   :972941917   Max.   :1.054e+09  
##  NA's   :1           NA's   :2
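
The cleaning step can be illustrated on a small made-up data frame. This is only a minimal sketch: the object `toy`, the region codes AA and BB, and all values are hypothetical and merely mimic the layout of the Eurostat export.

```r
# Hypothetical toy data mimicking the Eurostat layout (all values are invented)
toy <- data.frame(
  geo   = c("A,PAS,PAS_CRD,TOTAL,TOT,AA", "A,PAS,PAS_CRD,TOTAL,TOT,BB"),
  X2013 = c("100", ":"),
  X2014 = c("120", "80"),
  stringsAsFactors = FALSE
)

# Mark the ":" placeholder as missing
toy[toy == ":"] <- NA

# Locate the year columns (an "X" followed by four digits) and convert them
toy_year_cols <- grep("^X[0-9]{4}$", colnames(toy))
toy[toy_year_cols] <- lapply(toy[toy_year_cols], as.numeric)

str(toy)
```

Replacing ":" before the conversion keeps `as.numeric()` from having to coerce a non-numeric string, so the missing cells become NA explicitly rather than as a side effect of coercion.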

Subsequently, the data were transposed from a “region × year” structure to a “year × region” time-series matrix, and the column names were cleaned and standardized.

dim(aireu)
## [1] 37 13
year_cols <- grep("^X[0-9]{4}$", colnames(aireu))
air_mat <- as.matrix(aireu[, year_cols])
air_time <- t(air_mat)
summary(air_time)
##        V1                 V2                V3                 V4          
##  Min.   : 9168431   Min.   : 987659   Min.   : 9465828   Min.   : 3729017  
##  1st Qu.:26221438   1st Qu.:1378736   1st Qu.:27502901   1st Qu.: 7410346  
##  Median :26967759   Median :1769813   Median :30537336   Median : 9065860  
##  Mean   :26347830   Mean   :1558185   Mean   :28111137   Mean   : 8798837  
##  3rd Qu.:31619604   3rd Qu.:1843448   3rd Qu.:33571947   3rd Qu.:10994262  
##  Max.   :35644188   Max.   :1917082   Max.   :35385188   Max.   :12137714  
##                     NA's   :9                                              
##        V5                 V6                 V7                 V8           
##  Min.   :16006811   Min.   : 2270577   Min.   : 3821372   Min.   : 57795978  
##  1st Qu.:43805268   1st Qu.: 7249269   1st Qu.:11802022   1st Qu.:174413052  
##  Median :49265934   Median : 9081374   Median :13172183   Median :190191122  
##  Mean   :45185495   Mean   : 8647703   Mean   :12995590   Mean   :174579965  
##  3rd Qu.:54208594   3rd Qu.:11010678   3rd Qu.:16620692   3rd Qu.:203612806  
##  Max.   :57194328   Max.   :12264970   Max.   :18767088   Max.   :226764086  
##                                                                              
##        V9                V10               V11                V12           
##  Min.   : 8658654   Min.   : 857837   Min.   :17341192   Min.   : 57797305  
##  1st Qu.:27257110   1st Qu.:2004496   1st Qu.:37844358   1st Qu.:163448780  
##  Median :30916440   Median :2425067   Median :47857050   Median :197799836  
##  Mean   :27900418   Mean   :2378618   Mean   :46931793   Mean   :183126034  
##  3rd Qu.:33621195   3rd Qu.:2958663   3rd Qu.:56542888   3rd Qu.:222524165  
##  Max.   :34865711   Max.   :3471878   Max.   :71028749   Max.   :259739884  
##                                                                             
##       V13                 V14                V15                 V16          
##  Min.   :2.768e+08   Min.   : 4554497   Min.   : 50724011   Min.   : 1943547  
##  1st Qu.:7.724e+08   1st Qu.:15877188   1st Qu.:135461222   1st Qu.: 6036210  
##  Median :8.457e+08   Median :17325588   Median :143074086   Median : 8159258  
##  Mean   :8.069e+08   Mean   :16087157   Mean   :135071213   Mean   : 7863082  
##  3rd Qu.:9.788e+08   3rd Qu.:18588702   3rd Qu.:160545991   3rd Qu.: 9954280  
##  Max.   :1.054e+09   Max.   :23287929   Max.   :168726788   Max.   :12610863  
##                                                                               
##       V17                V18                V19                V20           
##  Min.   : 3962687   Min.   : 8268297   Min.   : 1527633   Min.   : 40405355  
##  1st Qu.: 8901466   1st Qu.:25884030   1st Qu.: 3690027   1st Qu.:119693216  
##  Median :12026939   Median :32500800   Median : 6632646   Median :133451750  
##  Mean   :11527551   Mean   :29272366   Mean   : 6032473   Mean   :127890384  
##  3rd Qu.:14980317   3rd Qu.:36745631   3rd Qu.: 8252340   3rd Qu.:155181318  
##  Max.   :17781961   Max.   :40837815   Max.   :10166386   Max.   :182370612  
##                                                                              
##       V21               V22               V23               V24         
##  Min.   :1804500   Min.   :1426310   Min.   :1995133   Min.   : 521959  
##  1st Qu.:3719172   1st Qu.:2367806   1st Qu.:4797276   1st Qu.:1845464  
##  Median :5016831   Median :3269486   Median :5376264   Median :2173494  
##  Mean   :4706512   Mean   :3297885   Mean   :5369451   Mean   :2024112  
##  3rd Qu.:6057558   3rd Qu.:4134328   3rd Qu.:6723326   3rd Qu.:2493209  
##  Max.   :6582749   Max.   :5147854   Max.   :7785726   Max.   :2871774  
##                                                        NA's   :3        
##       V25               V26               V27                V28          
##  Min.   : 709241   Min.   :1752445   Min.   :23594783   Min.   :13216883  
##  1st Qu.:1501623   1st Qu.:4225531   1st Qu.:60241570   1st Qu.:33059246  
##  Median :1998135   Median :5471022   Median :67444466   Median :37553124  
##  Mean   :1989149   Mean   :5424804   Mean   :62714618   Mean   :33338751  
##  3rd Qu.:2303182   3rd Qu.:6933952   3rd Qu.:76246802   3rd Qu.:38399377  
##  Max.   :3169418   Max.   :8968239   Max.   :81192507   Max.   :40348437  
##  NA's   :2                                                                
##       V29                V30                V31                V32         
##  Min.   :13825460   Min.   :16548993   Min.   : 6633447   Min.   :1938468  
##  1st Qu.:25104438   1st Qu.:31844002   1st Qu.:10776768   1st Qu.:4414858  
##  Median :34975764   Median :44301550   Median :16544246   Median :5521250  
##  Mean   :34819941   Mean   :42337867   Mean   :16006462   Mean   :5519979  
##  3rd Qu.:44561354   3rd Qu.:52126117   3rd Qu.:20248698   3rd Qu.:6450643  
##  Max.   :57045518   Max.   :63996712   Max.   :24590200   Max.   :8715276  
##                                                           NA's   :3        
##       V33                V34               V35               V36           
##  Min.   : 9317677   Min.   : 287787   Min.   : 500604   Min.   :168558965  
##  1st Qu.:28340082   1st Qu.:1191527   1st Qu.:1642755   1st Qu.:172087594  
##  Median :32104634   Median :1355640   Median :2050959   Median :175616222  
##  Mean   :29467972   Mean   :1250562   Mean   :1963682   Mean   :175616222  
##  3rd Qu.:36368109   3rd Qu.:1498780   3rd Qu.:2494402   3rd Qu.:179144851  
##  Max.   :38945096   Max.   :1810567   Max.   :2839787   Max.   :182673480  
##                                                         NA's   :10         
##       V37           
##  Min.   :210468980  
##  1st Qu.:226146280  
##  Median :248868873  
##  Mean   :246554629  
##  3rd Qu.:268409804  
##  Max.   :277432380  
##  NA's   :5
head(air_time)
##           [,1] [,2]     [,3]     [,4]     [,5]     [,6]     [,7]      [,8]
## X2013 25749724   NA 26389927  7079292 44217568  7011437 11891812 180783188
## X2014 26378676   NA 28776258  7520697 46127426  7328546 12079873 186445814
## X2015 26754007   NA 30958841  7610949 48026375  7590787 12672004 193936430
## X2016 27181511   NA 30115832  9324217 50505492  8961817 13672362 200687293
## X2017 28327279   NA 33260493 11092651 53564943 10238913 16245554 212389343
## X2018 31138417   NA 34506309 12137714 56139549 10927101 17838221 222422361
##           [,9]   [,10]    [,11]     [,12]     [,13]    [,14]     [,15]   [,16]
## X2013 27459623 1958565 34023934 157731973 746100398 16565391 132762875 5722447
## X2014 29015133 2019806 39117833 165354382 781202599 17171931 136360671 6140797
## X2015 30095505 2160978 42096402 174652503 819698948 17479246 140867569 6571698
## X2016 32763142 2214989 45543371 193872037 871695782 18099954 145280602 7475463
## X2017 33261214 2635145 50170728 209824089 938854476 20054947 154096485 8843053
## X2018 34701139 2995528 54258826 220611429 996295411 22173530 161991179 9731294
##          [,17]    [,18]    [,19]     [,20]   [,21]   [,22]   [,23]   [,24]
## X2013  8441319 24603640  3199266 115279105 3482358 2169327 4782257      NA
## X2014  9054848 26310826  3853614 121164587 3798110 2433966 4802282      NA
## X2015 10228352 29545020  4847288 127665221 4227389 2651751 5145856      NA
## X2016 11660366 32595709  6801814 134477781 4787561 2984242 5384160 1845464
## X2017 13350029 34271771  8728509 144306325 5246101 3554730 6077854 2173494
## X2018 15176493 36345005 10166386 153352444 6254178 3988804 7037070 2440486
##         [,25]   [,26]    [,27]    [,28]    [,29]    [,30]    [,31]   [,32]
## X2013      NA 4032029 58077271 36686364 23274484 29694146 10016933      NA
## X2014      NA 4290032 60963003 37603195 25714422 32560621 10907487      NA
## X2015 1452373 4619557 64570938 37503052 28907439 36005814 12580711      NA
## X2016 1649374 5080446 70317995 37727546 32266861 40930044 15153719 4414858
## X2017 1861282 6007731 76240304 38739778 37684668 47673057 17934774 4828171
## X2018 2152746 6805817 79644163 40030105 43767548 51018598 19809642 5521250
##          [,33]   [,34]   [,35] [,36]     [,37]
## X2013 31443225 1265766 1557149    NA 210468980
## X2014 32766043 1307128 1671290    NA 220022122
## X2015 34011263 1436003 1943656    NA 232270437
## X2016 35952558 1404152 2158261    NA 248868873
## X2017 38456213 1682133 2402651    NA 264629454
## X2018 38945096 1810567 2794094    NA 272190155
colnames(air_time)
## NULL
colnames(air_time) <- aireu[, 1]
head(colnames(air_time))
## [1] "A,PAS,PAS_CRD,TOTAL,TOT,AT" "A,PAS,PAS_CRD,TOTAL,TOT,BA"
## [3] "A,PAS,PAS_CRD,TOTAL,TOT,BE" "A,PAS,PAS_CRD,TOTAL,TOT,BG"
## [5] "A,PAS,PAS_CRD,TOTAL,TOT,CH" "A,PAS,PAS_CRD,TOTAL,TOT,CY"
colnames(air_time) <- sub(".*,", "", colnames(air_time))
head(colnames(air_time))
## [1] "AT" "BA" "BE" "BG" "CH" "CY"
summary(air_time)
##        AT                 BA                BE                 BG          
##  Min.   : 9168431   Min.   : 987659   Min.   : 9465828   Min.   : 3729017  
##  1st Qu.:26221438   1st Qu.:1378736   1st Qu.:27502901   1st Qu.: 7410346  
##  Median :26967759   Median :1769813   Median :30537336   Median : 9065860  
##  Mean   :26347830   Mean   :1558185   Mean   :28111137   Mean   : 8798837  
##  3rd Qu.:31619604   3rd Qu.:1843448   3rd Qu.:33571947   3rd Qu.:10994262  
##  Max.   :35644188   Max.   :1917082   Max.   :35385188   Max.   :12137714  
##                     NA's   :9                                              
##        CH                 CY                 CZ                 DE           
##  Min.   :16006811   Min.   : 2270577   Min.   : 3821372   Min.   : 57795978  
##  1st Qu.:43805268   1st Qu.: 7249269   1st Qu.:11802022   1st Qu.:174413052  
##  Median :49265934   Median : 9081374   Median :13172183   Median :190191122  
##  Mean   :45185495   Mean   : 8647703   Mean   :12995590   Mean   :174579965  
##  3rd Qu.:54208594   3rd Qu.:11010678   3rd Qu.:16620692   3rd Qu.:203612806  
##  Max.   :57194328   Max.   :12264970   Max.   :18767088   Max.   :226764086  
##                                                                              
##        DK                 EE                EL                 ES           
##  Min.   : 8658654   Min.   : 857837   Min.   :17341192   Min.   : 57797305  
##  1st Qu.:27257110   1st Qu.:2004496   1st Qu.:37844358   1st Qu.:163448780  
##  Median :30916440   Median :2425067   Median :47857050   Median :197799836  
##  Mean   :27900418   Mean   :2378618   Mean   :46931793   Mean   :183126034  
##  3rd Qu.:33621195   3rd Qu.:2958663   3rd Qu.:56542888   3rd Qu.:222524165  
##  Max.   :34865711   Max.   :3471878   Max.   :71028749   Max.   :259739884  
##                                                                             
##    EU27_2020               FI                 FR                  HR          
##  Min.   :2.768e+08   Min.   : 4554497   Min.   : 50724011   Min.   : 1943547  
##  1st Qu.:7.724e+08   1st Qu.:15877188   1st Qu.:135461222   1st Qu.: 6036210  
##  Median :8.457e+08   Median :17325588   Median :143074086   Median : 8159258  
##  Mean   :8.069e+08   Mean   :16087157   Mean   :135071213   Mean   : 7863082  
##  3rd Qu.:9.788e+08   3rd Qu.:18588702   3rd Qu.:160545991   3rd Qu.: 9954280  
##  Max.   :1.054e+09   Max.   :23287929   Max.   :168726788   Max.   :12610863  
##                                                                               
##        HU                 IE                 IS                 IT           
##  Min.   : 3962687   Min.   : 8268297   Min.   : 1527633   Min.   : 40405355  
##  1st Qu.: 8901466   1st Qu.:25884030   1st Qu.: 3690027   1st Qu.:119693216  
##  Median :12026939   Median :32500800   Median : 6632646   Median :133451750  
##  Mean   :11527551   Mean   :29272366   Mean   : 6032473   Mean   :127890384  
##  3rd Qu.:14980317   3rd Qu.:36745631   3rd Qu.: 8252340   3rd Qu.:155181318  
##  Max.   :17781961   Max.   :40837815   Max.   :10166386   Max.   :182370612  
##                                                                              
##        LT                LU                LV                ME         
##  Min.   :1804500   Min.   :1426310   Min.   :1995133   Min.   : 521959  
##  1st Qu.:3719172   1st Qu.:2367806   1st Qu.:4797276   1st Qu.:1845464  
##  Median :5016831   Median :3269486   Median :5376264   Median :2173494  
##  Mean   :4706512   Mean   :3297885   Mean   :5369451   Mean   :2024112  
##  3rd Qu.:6057558   3rd Qu.:4134328   3rd Qu.:6723326   3rd Qu.:2493209  
##  Max.   :6582749   Max.   :5147854   Max.   :7785726   Max.   :2871774  
##                                                        NA's   :3        
##        MK                MT                NL                 NO          
##  Min.   : 709241   Min.   :1752445   Min.   :23594783   Min.   :13216883  
##  1st Qu.:1501623   1st Qu.:4225531   1st Qu.:60241570   1st Qu.:33059246  
##  Median :1998135   Median :5471022   Median :67444466   Median :37553124  
##  Mean   :1989149   Mean   :5424804   Mean   :62714618   Mean   :33338751  
##  3rd Qu.:2303182   3rd Qu.:6933952   3rd Qu.:76246802   3rd Qu.:38399377  
##  Max.   :3169418   Max.   :8968239   Max.   :81192507   Max.   :40348437  
##  NA's   :2                                                                
##        PL                 PT                 RO                 RS         
##  Min.   :13825460   Min.   :16548993   Min.   : 6633447   Min.   :1938468  
##  1st Qu.:25104438   1st Qu.:31844002   1st Qu.:10776768   1st Qu.:4414858  
##  Median :34975764   Median :44301550   Median :16544246   Median :5521250  
##  Mean   :34819941   Mean   :42337867   Mean   :16006462   Mean   :5519979  
##  3rd Qu.:44561354   3rd Qu.:52126117   3rd Qu.:20248698   3rd Qu.:6450643  
##  Max.   :57045518   Max.   :63996712   Max.   :24590200   Max.   :8715276  
##                                                           NA's   :3        
##        SE                 SI                SK                TR           
##  Min.   : 9317677   Min.   : 287787   Min.   : 500604   Min.   :168558965  
##  1st Qu.:28340082   1st Qu.:1191527   1st Qu.:1642755   1st Qu.:172087594  
##  Median :32104634   Median :1355640   Median :2050959   Median :175616222  
##  Mean   :29467972   Mean   :1250562   Mean   :1963682   Mean   :175616222  
##  3rd Qu.:36368109   3rd Qu.:1498780   3rd Qu.:2494402   3rd Qu.:179144851  
##  Max.   :38945096   Max.   :1810567   Max.   :2839787   Max.   :182673480  
##                                                         NA's   :10         
##        UK           
##  Min.   :210468980  
##  1st Qu.:226146280  
##  Median :248868873  
##  Mean   :246554629  
##  3rd Qu.:268409804  
##  Max.   :277432380  
##  NA's   :5
dim(air_time)
## [1] 12 37
col_var <- apply(air_time, 2, var, na.rm = TRUE)
col_var
##           AT           BA           BE           BG           CH           CY 
## 6.971638e+13 2.495467e+11 6.901332e+13 7.157922e+12 1.900447e+14 8.695569e+12 
##           CZ           DE           DK           EE           EL           ES 
## 2.276762e+13 2.965730e+15 7.991660e+13 6.120454e+11 2.239351e+14 3.486241e+15 
##    EU27_2020           FI           FR           HR           HU           IE 
## 6.102672e+16 3.336933e+13 1.446896e+15 9.251603e+12 1.967285e+13 1.162854e+14 
##           IS           IT           LT           LU           LV           ME 
## 7.837456e+12 1.704888e+15 2.499450e+12 1.420915e+12 3.179985e+12 5.427816e+11 
##           MK           MT           NL           NO           PL           PT 
## 6.094041e+11 4.549305e+12 3.471177e+14 8.979898e+13 1.761256e+14 2.296188e+14 
##           RO           RS           SE           SI           SK           TR 
## 3.431209e+13 4.801279e+12 9.898155e+13 2.284819e+11 5.962417e+11 9.960977e+13 
##           UK 
## 6.921428e+14
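
The reshaping and renaming above can be sketched on a small example. The data frame `toy` and its region codes are hypothetical; only the sequence of steps (select year columns, transpose, keep the last comma-separated field as the column name) follows the analysis.

```r
# Hypothetical toy data: two regions (rows) observed in three years (columns)
toy <- data.frame(
  geo   = c("A,PAS,PAS_CRD,TOTAL,TOT,AA", "A,PAS,PAS_CRD,TOTAL,TOT,BB"),
  X2013 = c(100, 50),
  X2014 = c(120, 80),
  X2015 = c(90, 85),
  stringsAsFactors = FALSE
)

# Keep only the numeric year columns and transpose to "year x region"
toy_time <- t(as.matrix(toy[, -1]))

# The greedy pattern ".*," removes everything up to the last comma,
# leaving only the region code
colnames(toy_time) <- sub(".*,", "", toy$geo)

toy_time
dim(toy_time)  # 3 years in rows, 2 regions in columns
```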

The variance of each region over time was then used to identify variables that exhibit no variation across the entire period, since a constant column cannot be standardized and carries no information for PCA. No such column was found, so nothing was removed.

zero_var_cols <- which(col_var == 0)
zero_var_cols
## named integer(0)
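
For illustration, the same check on a hypothetical matrix that does contain a constant column might look as follows. The matrix `toy_time` and the guarded removal step are assumptions added for this sketch; the analysis itself found nothing to remove.

```r
# Hypothetical "year x region" matrix; region CC is constant over time
toy_time <- cbind(AA = c(100, 120, 90), BB = c(50, 80, 85), CC = c(7, 7, 7))

toy_var <- apply(toy_time, 2, var, na.rm = TRUE)
toy_zero <- which(toy_var == 0)
toy_zero  # named index of the constant column

# Drop constant columns only if any exist: indexing with -integer(0)
# would select no columns at all instead of keeping all of them
if (length(toy_zero) > 0) {
  toy_time <- toy_time[, -toy_zero, drop = FALSE]
}
colnames(toy_time)
```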

Missing values were imputed with the mean of the available years for the same country so that the computations could be carried out on a complete matrix. Finally, each country series was standardized to zero mean and unit variance to prevent countries with large passenger volumes from dominating the variance and unduly influencing the PCA results.

air_time <- apply(
  air_time,
  2,
  function(x) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
  }
)
air_scaled <- scale(air_time)
summary(air_scaled)
##        AT                 BA               BE                 BG          
##  Min.   :-2.05750   Min.   :-2.678   Min.   :-2.24441   Min.   :-1.89496  
##  1st Qu.:-0.01514   1st Qu.: 0.000   1st Qu.:-0.07322   1st Qu.:-0.51898  
##  Median : 0.07425   Median : 0.000   Median : 0.29205   Median : 0.09981  
##  Mean   : 0.00000   Mean   : 0.000   Mean   : 0.00000   Mean   : 0.00000  
##  3rd Qu.: 0.63138   3rd Qu.: 0.000   3rd Qu.: 0.65734   3rd Qu.: 0.82059  
##  Max.   : 1.11339   Max.   : 1.685   Max.   : 0.87561   Max.   : 1.24798  
##        CH                CY                CZ                 DE           
##  Min.   :-2.1166   Min.   :-2.1626   Min.   :-1.92269   Min.   :-2.144458  
##  1st Qu.:-0.1001   1st Qu.:-0.4742   1st Qu.:-0.25014   1st Qu.:-0.003065  
##  Median : 0.2960   Median : 0.1471   Median : 0.03701   Median : 0.286661  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.000000  
##  3rd Qu.: 0.6545   3rd Qu.: 0.8013   3rd Qu.: 0.75973   3rd Qu.: 0.533119  
##  Max.   : 0.8711   Max.   : 1.2267   Max.   : 1.20957   Max.   : 0.958236  
##        DK                 EE                 EL                 ES         
##  Min.   :-2.15242   Min.   :-1.94390   Min.   :-1.97739   Min.   :-2.1226  
##  1st Qu.:-0.07196   1st Qu.:-0.47821   1st Qu.:-0.60727   1st Qu.:-0.3333  
##  Median : 0.33738   Median : 0.05937   Median : 0.06183   Median : 0.2485  
##  Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.0000  
##  3rd Qu.: 0.63994   3rd Qu.: 0.74143   3rd Qu.: 0.64226   3rd Qu.: 0.6673  
##  Max.   : 0.77915   Max.   : 1.39744   Max.   : 1.61028   Max.   : 1.2976  
##    EU27_2020             FI                 FR                 HR          
##  Min.   :-2.1460   Min.   :-1.99644   Min.   :-2.21744   Min.   :-1.94616  
##  1st Qu.:-0.1396   1st Qu.:-0.03635   1st Qu.: 0.01025   1st Qu.:-0.60062  
##  Median : 0.1570   Median : 0.21439   Median : 0.21039   Median : 0.09737  
##  Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000  
##  3rd Qu.: 0.6957   3rd Qu.: 0.43305   3rd Qu.: 0.66972   3rd Qu.: 0.68752  
##  Max.   : 0.9991   Max.   : 1.24654   Max.   : 0.88479   Max.   : 1.56093  
##        HU                IE                IS                IT         
##  Min.   :-1.7056   Min.   :-1.9478   Min.   :-1.6091   Min.   :-2.1188  
##  1st Qu.:-0.5921   1st Qu.:-0.3142   1st Qu.:-0.8367   1st Qu.:-0.1985  
##  Median : 0.1126   Median : 0.2994   Median : 0.2144   Median : 0.1347  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.7785   3rd Qu.: 0.6930   3rd Qu.: 0.7929   3rd Qu.: 0.6610  
##  Max.   : 1.4101   Max.   : 1.0725   Max.   : 1.4766   Max.   : 1.3194  
##        LT                LU                 LV                  ME         
##  Min.   :-1.8356   Min.   :-1.57009   Min.   :-1.892228   Min.   :-2.3909  
##  1st Qu.:-0.6245   1st Qu.:-0.78025   1st Qu.:-0.320861   1st Qu.:-0.2090  
##  Median : 0.1963   Median :-0.02382   Median : 0.003821   Median : 0.0000  
##  Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.000000   Mean   : 0.0000  
##  3rd Qu.: 0.8546   3rd Qu.: 0.70170   3rd Qu.: 0.759217   3rd Qu.: 0.6837  
##  Max.   : 1.1868   Max.   : 1.55196   Max.   : 1.354983   Max.   : 1.3492  
##        MK                MT                 NL                NO         
##  Min.   :-1.8126   Min.   :-1.72176   Min.   :-2.0997   Min.   :-2.1234  
##  1st Qu.:-0.5509   1st Qu.:-0.56227   1st Qu.:-0.1327   1st Qu.:-0.0295  
##  Median : 0.0000   Median : 0.02167   Median : 0.2539   Median : 0.4447  
##  Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.3027   3rd Qu.: 0.70755   3rd Qu.: 0.7263   3rd Qu.: 0.5340  
##  Max.   : 1.6715   Max.   : 1.66131   Max.   : 0.9918   Max.   : 0.7397  
##        PL                 PT                RO                 RS         
##  Min.   :-1.58195   Min.   :-1.7019   Min.   :-1.60013   Min.   :-1.9166  
##  1st Qu.:-0.73207   1st Qu.:-0.6925   1st Qu.:-0.89280   1st Qu.:-0.4255  
##  Median : 0.01174   Median : 0.1296   Median : 0.09181   Median : 0.0000  
##  Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.0000  
##  3rd Qu.: 0.73402   3rd Qu.: 0.6460   3rd Qu.: 0.72422   3rd Qu.: 0.3138  
##  Max.   : 1.67472   Max.   : 1.4293   Max.   : 1.46539   Max.   : 1.7100  
##        SE                SI                SK                TR        
##  Min.   :-2.0254   Min.   :-2.0142   Min.   :-1.8948   Min.   :-2.345  
##  1st Qu.:-0.1134   1st Qu.:-0.1235   1st Qu.:-0.4156   1st Qu.: 0.000  
##  Median : 0.2650   Median : 0.2198   Median : 0.1130   Median : 0.000  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.000  
##  3rd Qu.: 0.6936   3rd Qu.: 0.5193   3rd Qu.: 0.6873   3rd Qu.: 0.000  
##  Max.   : 0.9526   Max.   : 1.1716   Max.   : 1.1346   Max.   : 2.345  
##        UK         
##  Min.   :-1.8572  
##  1st Qu.:-0.1838  
##  Median : 0.0000  
##  Mean   : 0.0000  
##  3rd Qu.: 0.3219  
##  Max.   : 1.5892
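
The effect of mean imputation followed by standardization can be verified on a small example. The matrix `toy_time` below is hypothetical; the imputation function is the same as the one used above.

```r
# Hypothetical "year x region" matrix with one missing value per column
toy_time <- cbind(AA = c(100, NA, 90, 110), BB = c(50, 80, 85, NA))

# Replace each missing value with the mean of the observed values in its column
toy_imputed <- apply(toy_time, 2, function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
})

# Center each column to mean 0 and rescale it to standard deviation 1
toy_scaled <- scale(toy_imputed)

round(colMeans(toy_scaled), 10)  # column means after scaling
apply(toy_scaled, 2, sd)         # column standard deviations after scaling
```

An imputed cell sits exactly at the column mean and therefore becomes 0 after scaling. This is why columns with many missing years, such as TR with 10 of 12 values missing, show quartiles of 0 in the summary above; such columns should be interpreted with caution.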

PCA

PCA results:

We then conducted principal component analysis (PCA) on the standardized data. The results show that the first principal component (PC1) explains approximately 83.0% of the total variance, indicating that the vast majority of the information in the data can be summarized by a dominant common pattern of variation. The second principal component (PC2) explains an additional 7.3% of the variance, and together the first two principal components account for about 90.3% of the total variance. This suggests that most of the information in the original data can be retained using only two principal components.

The remaining principal components contribute very little additional variance, so their marginal role in explaining the data structure is limited. The twelfth component has essentially zero variance (standard deviation 7.488e-16) because twelve centered yearly observations span at most eleven dimensions. The dominance of PC1 reflects the strong correlation among the original country series, which is what makes the reduction effective. Overall, retaining the first two principal components allows for efficient dimensionality reduction while keeping about 90% of the total variance.

pca_air <- prcomp(air_scaled, center = FALSE, scale. = FALSE)
summary(pca_air)
## Importance of components:
##                          PC1     PC2     PC3     PC4     PC5    PC6    PC7
## Standard deviation     5.542 1.64442 1.23790 1.12140 0.69030 0.4036 0.2854
## Proportion of Variance 0.830 0.07308 0.04142 0.03399 0.01288 0.0044 0.0022
## Cumulative Proportion  0.830 0.90310 0.94451 0.97850 0.99138 0.9958 0.9980
##                            PC8    PC9    PC10    PC11      PC12
## Standard deviation     0.18660 0.1614 0.09607 0.06734 7.488e-16
## Proportion of Variance 0.00094 0.0007 0.00025 0.00012 0.000e+00
## Cumulative Proportion  0.99892 0.9996 0.99988 1.00000 1.000e+00
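
The proportions in the summary above follow directly from the component standard deviations. As a minimal sketch on synthetic data (the matrix `X` below is made up and only mimics the assumed 12 × 37 shape of `air_scaled`), the same rows can be recomputed by hand. The sketch also illustrates why the twelfth component has essentially zero variance: 12 centered observations span at most 11 dimensions.

```r
# Synthetic stand-in for air_scaled: 12 "years" x 37 "countries" (assumed shape)
set.seed(1)
X <- scale(matrix(rnorm(12 * 37), nrow = 12))

pca <- prcomp(X, center = FALSE, scale. = FALSE)

eigenvalues <- pca$sdev^2                      # variance carried by each PC
prop_var    <- eigenvalues / sum(eigenvalues)  # "Proportion of Variance" row
cum_var     <- cumsum(prop_var)                # "Cumulative Proportion" row

round(prop_var, 4)
round(cum_var, 4)

# Centering removes one degree of freedom, so the 12th standard deviation
# is zero up to floating-point error.
pca$sdev[12]

# For the reported PCA: 37 standardized variables give a total variance of 37,
# so the share of PC1 is its squared standard deviation divided by 37.
5.542^2 / 37   # about 0.83
```
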

Visualisation of quality

The first principal component (Dim1) explains the vast majority of the common variation among variables (83%), while the second principal component (Dim2) contributes only a limited amount of additional information (7.3%). Most variables are strongly positively correlated with Dim1, indicating that the data are mainly driven by a dominant overall level or scale factor.

At the same time, all countries point in the same (rightward) direction along the first principal component, suggesting that their characteristics are highly similar on Dim1 and that this component has very strong explanatory power. Along the second principal component, the distribution of countries is relatively compact, largely within the range of −0.5 to 0.5. Although some countries differ in their directions on Dim2, the overall pattern indicates that the PCA performs very well: the first two principal components are sufficient to explain most of the features while preserving the essential structure of the data.

fviz_eig(pca_air, choice='eigenvalue') # eigenvalues on y-axis

fviz_eig(pca_air) # percentage of explained variance on y-axis

fviz_pca_var(pca_air, col.var="steelblue")

Loading calculation

We extracted and ranked the variable loadings of the first principal component (PC1) to identify which original variables contribute most strongly to it. The results show that the loadings of most countries are very similar, with a large number falling within the range of 0.16–0.18. This indicates that PC1 is not driven by a small subset of countries versus others, but rather reflects a pattern in which all countries tend to move together. In other words, although differences exist across national aviation industries, their dynamics over time exhibit a high degree of common fluctuation. The integrated nature of the European (and broader regional) aviation market leads EU member states and neighboring countries to share a common underlying trend. Moreover, almost all countries have positive loadings on PC1, implying that increases in PC1 are associated with simultaneous increases across nearly all country-level variables.

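Notably, Bosnia and Herzegovina (0.096), the United Kingdom (0.071), and Turkey (0.013) display much smaller loadings on PC1 than all other series. None of them is an EU member state today (the United Kingdom left the EU in 2020). This suggests that their aviation development patterns are less tightly coupled with the EU single market and therefore contribute less to the dominant EU-wide factor captured by PC1. Other non-EU countries, such as Switzerland, Montenegro, Iceland, and Norway, load about as strongly as EU members, so the weaker coupling applies to a subset of non-EU countries rather than to all of them. With that qualification, the finding supports the interpretation that the EU single market fosters greater similarity in aviation industry dynamics among member states, whereas countries outside it exhibit greater heterogeneity and uncertainty.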

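# Loadings of each country on PC1
loading_scores_PC_1 <- pca_air$rotation[, 1]
# Rank countries by absolute loading, largest first
fac_scores_PC_1 <- abs(loading_scores_PC_1)
fac_scores_PC_1_ranked <- names(sort(fac_scores_PC_1, decreasing = TRUE))
# Print the signed loadings in ranked order
pca_air$rotation[fac_scores_PC_1_ranked, 1]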
##         IT         ES  EU27_2020         IE         LV         SK         CY 
## 0.17911015 0.17899747 0.17896279 0.17863464 0.17829860 0.17817018 0.17726000 
##         AT         HU         EE         LT         FR         CZ         NL 
## 0.17704003 0.17682900 0.17649837 0.17632070 0.17627003 0.17591024 0.17543933 
##         BE         BG         CH         MT         ME         HR         DK 
## 0.17470083 0.17416380 0.17398518 0.17286873 0.17203878 0.17190225 0.17151484 
##         PT         PL         EL         IS         DE         LU         RO 
## 0.17011542 0.16772058 0.16571838 0.16458533 0.16375791 0.16300440 0.16221009 
##         FI         SI         NO         MK         RS         SE         BA 
## 0.16034850 0.15908578 0.15700788 0.15212015 0.15112119 0.14913563 0.09649906 
##         UK         TR 
## 0.07120124 0.01316652

Based on the contribution plots, it is also evident that the European aviation market is still largely dominated by the European Union, which constitutes the largest market in the region. In contrast, several non-EU countries, most clearly Turkey, the United Kingdom, and Bosnia and Herzegovina, have only a very limited influence on the first principal component of the European aviation market, regardless of their economic strength or geographic location.

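var <- get_pca_var(pca_air)  # coordinates, cos2 and contributions of the variables
a <- fviz_contrib(pca_air, "var", axes = 1, xtickslab.rt = 90)  # contributions to PC1; labels rotated 90° (default is 45°)
b <- fviz_contrib(pca_air, "var", axes = 2, xtickslab.rt = 90)  # contributions to PC2
grid.arrange(a, b, top = 'Contribution to the first two Principal Components')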
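
To make the link between loadings and contributions concrete, the sketch below reuses the PC1 loadings printed earlier. It assumes that, for a `prcomp` fit, a variable's percentage contribution to a component equals 100 times its squared loading, which should correspond to the bar heights in the PC1 plot. Because the loadings of one component have unit length, a series that weighs exactly as much as every other would have a loading of 1/sqrt(37) and a contribution of 100/37 percent.

```r
# PC1 loadings copied from the ranked output above
load_pc1 <- c(
  IT = 0.17911015, ES = 0.17899747, EU27_2020 = 0.17896279, IE = 0.17863464,
  LV = 0.17829860, SK = 0.17817018, CY = 0.17726000, AT = 0.17704003,
  HU = 0.17682900, EE = 0.17649837, LT = 0.17632070, FR = 0.17627003,
  CZ = 0.17591024, NL = 0.17543933, BE = 0.17470083, BG = 0.17416380,
  CH = 0.17398518, MT = 0.17286873, ME = 0.17203878, HR = 0.17190225,
  DK = 0.17151484, PT = 0.17011542, PL = 0.16772058, EL = 0.16571838,
  IS = 0.16458533, DE = 0.16375791, LU = 0.16300440, RO = 0.16221009,
  FI = 0.16034850, SI = 0.15908578, NO = 0.15700788, MK = 0.15212015,
  RS = 0.15112119, SE = 0.14913563, BA = 0.09649906, UK = 0.07120124,
  TR = 0.01316652
)
p <- length(load_pc1)

sum(load_pc1^2)                   # unit-length loading vector, so close to 1
uniform_loading <- 1 / sqrt(p)    # loading if every series weighed equally
contrib_pc1     <- 100 * load_pc1^2   # percentage contribution of each series
expected        <- 100 / p            # contribution under equal weighting

round(contrib_pc1[c("IT", "BA", "UK", "TR")], 3)
names(contrib_pc1)[contrib_pc1 > expected]   # series above the equal-weight share
```

Read this way, loadings in the 0.16–0.18 range are simply what near-equal weighting of 37 series looks like, while BA, UK, and TR fall far below that benchmark.
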

Individual Coordinates and Contributions

Individual Coordinates

Each year (2013–2024) is treated as an individual, and its position in the principal component space, as well as its contribution to each principal component, is calculated in order to analyze the temporal structure and evolution of the data. The individual coordinates reveal a very clear time-related structure in the distribution of years within the principal component space. The first principal component (Dim.1) explains the vast majority of the variance in the data, and its scores change in distinct stages over time. From 2013 to 2019, the scores on Dim.1 increase steadily from −2.30 to 5.45, reflecting an overall growth trend in aviation-related indicators during this period. In 2020 and 2021, the Dim.1 scores drop sharply to −11.31 and −9.12, deviating markedly from the previous trajectory and indicating a strong structural shock to the aviation industry. This abrupt break is highly consistent with the global impact of the COVID-19 pandemic on air transport activity. From 2022 onward, the Dim.1 scores recover rapidly (0.69 in 2022, 4.44 in 2023, and 6.34 in 2024), and by 2024 they exceed the 2019 pre-pandemic peak, suggesting a rebound of the aviation industry following the short-term disruption. Therefore, the first principal component can be reasonably interpreted as a composite indicator capturing changes in the overall scale or activity level of the aviation industry over time.

In contrast, the second principal component (Dim.2) explains a much smaller proportion of the total variance and mainly reflects secondary structural differences beyond the overall trend captured by Dim.1. All years from 2013 to 2019 take negative values on Dim.2, rising steadily from −2.05 in 2013 to about −0.04 in 2019, whereas all years from 2020 onward take positive values, reaching 2.21 in 2024. This indicates that Dim.2 distinguishes, to some extent, structural differences between the pre-pandemic and post-pandemic periods. However, given its limited explanatory power, this component should not be overinterpreted. The third and higher-order principal components display noticeable values only for a small number of years and account for very little variance overall, suggesting that they mainly capture localized fluctuations or random noise and contribute little to explaining the overall temporal structure.

ind<-get_pca_ind(pca_air)  
print(ind)
## Principal Component Analysis Results for individuals
##  ===================================================
##   Name       Description                       
## 1 "$coord"   "Coordinates for the individuals" 
## 2 "$cos2"    "Cos2 for the individuals"        
## 3 "$contrib" "contributions of the individuals"
ind$coord
##             Dim.1       Dim.2        Dim.3      Dim.4       Dim.5        Dim.6
## X2013  -2.2964177 -2.05493666  1.558274933  0.8587761 -0.24049236 -0.269035960
## X2014  -1.4952409 -1.89289428  1.178860944  0.6531005 -0.18740534 -0.079076811
## X2015  -0.6189945 -1.82087495  0.423267073  0.2602832  0.06130885  0.118584147
## X2016   0.6220097 -1.35264023 -0.547747668 -0.3574027  0.27697392  0.552906528
## X2017   2.7295112 -0.92222767 -1.295340671 -0.6405873  0.05913927  0.393696428
## X2018   4.5633485 -0.50252227 -1.598038327 -0.7070166 -0.09687421  0.021021491
## X2019   5.4476581 -0.03605946 -1.260883436 -0.4175117 -0.28192314 -0.913730091
## X2020 -11.3140346  0.93106732 -0.312627165 -0.8221830  1.19857114 -0.425177325
## X2021  -9.1175468  1.79602284 -1.063763079  0.7255411 -1.47569333  0.226928234
## X2022   0.6937917  1.63309164  0.679672586 -0.2710359  0.81981402  0.411574800
## X2023   4.4446719  2.01646261  2.246447245 -1.8607971 -0.65006043 -0.006651514
## X2024   6.3412433  2.20551111 -0.008122433  2.5788334  0.51664161 -0.031039928
##             Dim.7        Dim.8        Dim.9       Dim.10       Dim.11
## X2013  0.22421271  0.193095386  0.027456507 -0.025192179 -0.131735522
## X2014  0.06231824  0.119930745 -0.009108138  0.046518286  0.173370426
## X2015 -0.42368083 -0.419868985  0.178752926 -0.004497616 -0.019484567
## X2016 -0.09343430  0.006363235 -0.368876617 -0.125729954 -0.007703347
## X2017  0.21484363  0.015643722 -0.001611594  0.241299969 -0.034543730
## X2018  0.38811806  0.025142745  0.259684390 -0.155318040  0.026479676
## X2019 -0.40017391  0.090371194 -0.101728111  0.020041055 -0.004865412
## X2020  0.20163692 -0.102246045 -0.037352900  0.006416580  0.007038375
## X2021 -0.09603629  0.024828679  0.008984570 -0.006414667 -0.002839429
## X2022 -0.42919625  0.314654730  0.173210984 -0.005106588 -0.007023877
## X2023  0.15732365 -0.136733863 -0.064769040  0.002573454  0.000145308
## X2024  0.19406838 -0.131181543 -0.064642977  0.005409701  0.001162100
##              Dim.12
## X2013  4.440892e-16
## X2014  4.996004e-16
## X2015  4.996004e-16
## X2016  6.956241e-16
## X2017  9.853229e-16
## X2018  1.471046e-15
## X2019  3.885781e-16
## X2020 -1.665335e-16
## X2021  0.000000e+00
## X2022  5.282233e-16
## X2023  6.383782e-16
## X2024  1.026956e-15
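
The stage-wise pattern on Dim.1 can be summarized numerically. The sketch below copies the Dim.1 coordinates from the output above and averages them over the three periods used in this study; the period boundaries are those stated in the research objectives, and the variable names are illustrative only.

```r
# Dim.1 coordinates copied from ind$coord above
pc1 <- c(`2013` = -2.2964177, `2014` = -1.4952409, `2015` = -0.6189945,
         `2016` =  0.6220097, `2017` =  2.7295112, `2018` =  4.5633485,
         `2019` =  5.4476581, `2020` = -11.3140346, `2021` = -9.1175468,
         `2022` =  0.6937917, `2023` =  4.4446719, `2024` =  6.3412433)

# Assign each year to one of the three periods
period <- cut(as.integer(names(pc1)),
              breaks = c(2012, 2019, 2021, 2024),
              labels = c("pre-pandemic", "pandemic", "recovery"))

period_mean <- tapply(pc1, period, mean)   # average score per period
round(period_mean, 2)

sum(pc1)                 # scores of centered data sum to (almost) zero
yearly_change <- diff(pc1)
names(which.min(yearly_change))   # year with the largest drop relative to the previous year
```
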

Individual Contributions

Based on the individual contribution results, the first principal component is mainly defined by a small number of years: 2020 and 2021 contribute about 34.7% and 22.6% to PC1, far more than any other year. This indicates that the orientation of the principal component is largely driven by these extreme years and that the aviation industry experienced a strong structural shock during this period. The intermediate years contribute relatively little to the construction of the component and mainly reflect stable phases of the overall trend.

From 2013 to 2019, the observations move gradually from the lower left to the right in the PCA space, indicating a period of steady growth in the European aviation industry, which reached its peak in 2019. During 2020–2021, the data points shift abruptly to the far left, representing a strong negative deviation. This reflects the severe impact of the COVID-19 pandemic, during which changes in entry policies, widespread flight suspensions, and temporary airspace closures led to a rapid contraction of the aviation sector. From 2022 to 2024, the points move rapidly back to the right, showing a clear recovery that by 2024 exceeds the pre-pandemic level. This rebound may be attributed to pent-up travel demand following two to three years of restrictions, as well as policy measures implemented by the EU to revitalize the aviation industry.

Overall, the first principal component primarily captures the long-term temporal evolution of the aviation industry and clearly identifies the structural shock occurring in 2020–2021. The second principal component appears to describe a secondary growth-related pattern: values are negative before the pandemic, possibly reflecting a smoother development phase, and become positive from 2020 onward, potentially corresponding to a period of rapid contraction and subsequent expansion. However, this interpretation requires further validation and should be regarded only as a supplementary explanation.

Finally, the cos²-based coloring confirms that the pandemic years are well represented by the first two principal components: computed from the coordinates above, these two components account for about 98% of the squared distance from the origin for 2020 and about 96% for 2021. The recovery years are captured somewhat less completely (roughly 65% for 2022, 73% for 2023, and 87% for 2024), so the shock itself is the most reliably captured feature of the PCA map.

ind$contrib
##            Dim.1        Dim.2        Dim.3      Dim.4       Dim.5        Dim.6
## X2013  1.4309800 13.013357712 1.320485e+01  4.8871897  1.01145769  3.703697727
## X2014  0.6066723 11.041935577 7.557366e+00  2.8265673  0.61419898  0.319972873
## X2015  0.1039694 10.217690211 9.742586e-01  0.4489434  0.06573421  0.719561863
## X2016  0.1049847  5.638415679 1.631572e+00  0.8464768  1.34159958 15.642937334
## X2017  2.0216295  2.621010669 9.124589e+00  2.7192932  0.06116417  7.931180217
## X2018  5.6506586  0.778221312 1.388736e+01  3.3125201  0.16411999  0.022612152
## X2019  8.0528848  0.004007107 8.645601e+00  1.1551451  1.38997372 42.721916330
## X2020 34.7349604  2.671496795 5.314944e-01  4.4795699 25.12307424  9.250283877
## X2021 22.5573357  9.940682466 6.153683e+00  3.4883768 38.08354604  2.635068673
## X2022  0.1306141  8.218898167 2.512146e+00  0.4868026 11.75373346  8.667870666
## X2023  5.3605731 12.530628035 2.744339e+01 22.9454915  7.39013911  0.002263893
## X2024 10.9114041 14.990322938 3.587712e-04 44.0702900  4.66792546  0.049301061
##            Dim.7        Dim.8        Dim.9       Dim.10       Dim.11     Dim.12
## X2013  5.1440860  8.923317277  0.241201448  0.573036269 3.189460e+01  2.9308675
## X2014  0.3973911  3.442259464  0.026542909  1.953880822 5.524099e+01  3.7093792
## X2015 18.3681497 42.190059047 10.223407602  0.018264818 6.977389e-01  3.7093792
## X2016  0.8933065  0.009690329 43.536309026 14.273424115 1.090612e-01  7.1912602
## X2017  4.7231611  0.058568311  0.000830999 52.573293993 2.193057e+00 14.4282258
## X2018 15.4140046  0.151289203 21.576502228 21.781834792 1.288655e+00 32.1594020
## X2019 16.3864667  1.954531097  3.311092526  0.362653434 4.350615e-02  2.2439455
## X2020  4.1603309  2.501932426  0.446414216  0.037175586 9.104503e-02  0.4121532
## X2021  0.9437534  0.147533207  0.025827593  0.037153429 1.481745e-02  0.0000000
## X2022 18.8494906 23.694692151  9.599314370  0.023545741 9.067035e-02  4.1465877
## X2023  2.5326537  4.474399037  1.342221768  0.005979757 3.880523e-05  6.0563630
## X2024  3.8538724  4.118395119  1.337001981  0.026423912 2.481979e-03 15.6732721
fviz_pca_ind(pca_air, col.ind="cos2", geom="point", gradient.cols=c("white", "#2E9FDF", "#FC4E07" ))
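
The two individual-level diagnostics used here can be reproduced from the scores alone. The sketch below runs on synthetic data (the matrix `X` is made up). The contribution formula, 100 · coord² / (n · eigenvalue), is inferred from the printed numbers rather than taken from the factoextra source: it reproduces the reported 34.73 for 2020 on Dim.1, and it implies that each column of `ind$contrib` sums to 100 · (n − 1) / n rather than to 100. The cos² of an individual on a component is its squared coordinate divided by its squared distance from the origin.

```r
# Synthetic stand-in: 12 observations, 5 standardized variables
set.seed(42)
X <- scale(matrix(rnorm(12 * 5), nrow = 12))
pca <- prcomp(X, center = FALSE, scale. = FALSE)

coord <- pca$x        # individual coordinates (scores)
eig   <- pca$sdev^2   # eigenvalues (variance of each PC)
n     <- nrow(coord)

# Contribution of each row to each PC, in percent (inferred formula)
contrib <- 100 * sweep(coord^2, 2, n * eig, "/")
colSums(contrib)      # every column sums to 100 * (n - 1) / n

# cos2: share of a row's squared distance from the origin captured by each PC
cos2 <- coord^2 / rowSums(coord^2)
rowSums(cos2)         # every row sums to 1
rowSums(cos2[, 1:2])  # quality of representation on the first two PCs

# Check of the inferred formula against the reported value for 2020 on Dim.1
contrib_2020 <- 100 * (-11.3140346)^2 / (12 * 5.542^2)
contrib_2020          # about 34.73
```
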

Clustering

Clustering the years on their first two principal component scores (k = 3, as in the code below) did not perform satisfactorily. This is mainly because the clustering results are largely dominated by the first principal component. Although PC1 effectively captures the overall trend, it has limited ability to distinguish structural differences between distinct phases, which restricts the interpretability of the clustering results in a real-world context.

Given the relatively small number of years in the dataset, we therefore prefer an observation- and experience-based classification. Specifically, 2013–2019 are classified as years of smooth growth, 2020–2021 as years of pandemic-induced shock, and 2022–2024 as years of rapid rebound and expansion.

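# k-means (the eclust default) on the first two PC scores, with three clusters
km <- eclust(
  pca_air$x[, 1:2],        # PCA scores of the years on PC1 and PC2
  FUNcluster = "kmeans",
  k = 3
)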
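
One way to see how the algorithmic grouping relates to the manual periods is to cross-tabulate the two. The sketch below uses base `kmeans` instead of `eclust`, applied to the PC1 and PC2 coordinates printed earlier; the seed and `nstart` value are illustrative choices, not settings from the original analysis.

```r
# PC1 and PC2 coordinates of the years, copied from ind$coord above
scores <- cbind(
  PC1 = c(-2.2964177, -1.4952409, -0.6189945, 0.6220097, 2.7295112, 4.5633485,
          5.4476581, -11.3140346, -9.1175468, 0.6937917, 4.4446719, 6.3412433),
  PC2 = c(-2.05493666, -1.89289428, -1.82087495, -1.35264023, -0.92222767,
          -0.50252227, -0.03605946, 0.93106732, 1.79602284, 1.63309164,
          2.01646261, 2.20551111)
)
rownames(scores) <- as.character(2013:2024)

# Manual, experience-based periods described in the text
manual <- cut(2013:2024, breaks = c(2012, 2019, 2021, 2024),
              labels = c("growth", "shock", "rebound"))

set.seed(123)
km_base <- kmeans(scores, centers = 3, nstart = 50)  # several random restarts

# Rows: manual periods; columns: k-means cluster labels (arbitrary numbering)
table(manual, cluster = km_base$cluster)
```

Because PC1 has by far the largest spread, k-means splits the years mostly by their level on PC1, which is the limitation described above.
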

MDS

We also compared an alternative dimensionality reduction method, multidimensional scaling (MDS), which projects the high-dimensional data onto a two-dimensional plane based on the Euclidean distances between years.

dist_year <- dist(air_time, method = "euclidean")

The results of the classical multidimensional scaling (MDS) analysis show that the distribution of years in the two-dimensional space exhibits a clear temporal structure. The years 2013–2019 are highly clustered in the MDS space, indicating a strong similarity in the aviation industry during the pre-pandemic period. In contrast, 2020–2021 are clearly separated from the other years, reflecting the significant structural shock caused by the COVID-19 pandemic. The years 2022–2024 gradually move closer to the pre-pandemic period, indicating a recovery trend in the aviation industry.

These results are highly consistent with the findings from the PCA individual score analysis. Although PCA is based on the principle of variance maximization, whereas MDS directly relies on Euclidean distances between years, both methods identify the same temporal segmentation. This consistency suggests that the observed structure is not driven by the choice of dimensionality reduction method, but rather reflects intrinsic characteristics of the data itself. Overall, the high degree of integration within the EU aviation market leads to strong synchronization among national aviation industries under normal conditions, resulting in relatively small differences between years, with pronounced divergence emerging only under major external shocks such as the pandemic.

mds_res <- cmdscale(dist_year, k = 2, eig = TRUE)
Z_mds <- mds_res$points
plot(
  Z_mds[,1], Z_mds[,2],
  pch = 19,
  xlab = "MDS Dimension 1",
  ylab = "MDS Dimension 2",
  main = "MDS of Years"
)
text(
  Z_mds[,1], Z_mds[,2],
  labels = rownames(Z_mds),
  cex = 0.7,
  pos = 3
)
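
One reason the two methods agree so closely is structural: classical MDS applied to Euclidean distances recovers the same configuration as the PCA scores of the same column-centered data, up to the sign of each axis. The sketch below illustrates this on synthetic data (the matrix `X` is made up). Whether the equivalence holds exactly in this study depends on whether `air_time` and `air_scaled` contain the same values, which is not shown in this section.

```r
# Synthetic data: 12 observations, 6 variables
set.seed(7)
X <- matrix(rnorm(12 * 6), nrow = 12)

# PCA scores on the first two components (centered, not rescaled)
pca_scores <- prcomp(X, center = TRUE, scale. = FALSE)$x[, 1:2]

# Classical MDS on the Euclidean distances between the same rows
mds_points <- cmdscale(dist(X, method = "euclidean"), k = 2)

# Axes are only defined up to sign, so compare absolute values
max_diff <- max(abs(abs(pca_scores) - abs(mds_points)))
max_diff   # numerically zero
```
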

Correlation coefficients

By computing the correlation coefficients between each country’s aviation time series and the first MDS dimension, we find that MDS Dimension 1 is highly consistent with the overall evolution of the EU aviation industry: its correlation with the EU27_2020 aggregate is essentially 1, and most individual countries exceed 0.9. This indicates that the main structure identified by MDS is not driven by a small number of individual countries, but rather reflects a highly synchronized overall trend of the EU aviation market. In contrast, Turkey (0.07), the United Kingdom (0.32), and Bosnia and Herzegovina (0.53) exhibit much lower correlations with this dimension, suggesting that their aviation development paths differ from the overall EU structure. Other non-EU countries, such as Switzerland (0.99) and Norway (0.92), track the EU-wide trend closely.

dim1 <- Z_mds[,1]
dim2 <- Z_mds[,2]
cor_dim1 <- apply(air_time, 2, cor, y = dim1)
cor_dim2 <- apply(air_time, 2, cor, y = dim2)
sort(cor_dim1, decreasing = TRUE)[1:36]
## EU27_2020        FR        BE        NL        IT        CH        AT        IE 
## 0.9999967 0.9953865 0.9904314 0.9898371 0.9897729 0.9890947 0.9876944 0.9875429 
##        SK        DK        ES        CZ        LV        CY        ME        HU 
## 0.9824974 0.9817974 0.9808813 0.9795346 0.9790786 0.9595260 0.9504110 0.9500485 
##        DE        EE        BG        LT        FI        NO        SI        MT 
## 0.9500243 0.9486498 0.9482696 0.9458141 0.9260874 0.9235008 0.9209768 0.9179689 
##        HR        PT        SE        IS        PL        EL        LU        RO 
## 0.9121098 0.8978454 0.8846912 0.8809248 0.8774151 0.8752077 0.8447133 0.8389763 
##        RS        MK        BA        UK 
## 0.8121083 0.8050209 0.5326404 0.3237185
sort(cor_dim1)[1:36]
##         TR         UK         BA         MK         RS         RO         LU 
## 0.07035906 0.32371853 0.53264044 0.80502088 0.81210825 0.83897626 0.84471326 
##         EL         PL         IS         SE         PT         HR         MT 
## 0.87520774 0.87741512 0.88092481 0.88469120 0.89784536 0.91210979 0.91796892 
##         SI         NO         FI         LT         BG         EE         DE 
## 0.92097684 0.92350080 0.92608740 0.94581409 0.94826965 0.94864980 0.95002431 
##         HU         ME         CY         LV         CZ         ES         DK 
## 0.95004849 0.95041095 0.95952601 0.97907856 0.97953458 0.98088129 0.98179737 
##         SK         IE         AT         CH         IT         NL         BE 
## 0.98249739 0.98754287 0.98769439 0.98909470 0.98977293 0.98983707 0.99043137 
##         FR 
## 0.99538645
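
Because there are 37 series, the `[1:36]` index in the two listings above drops the last entry of each sort (TR from the descending list, EU27_2020 from the ascending one). A minimal sketch of an equivalent computation without a hard-coded length is given below, using a small made-up matrix with hypothetical series names `A` to `D`.

```r
# Hypothetical data: 12 years, 4 made-up series
set.seed(3)
X <- matrix(rnorm(12 * 4), nrow = 12,
            dimnames = list(as.character(2013:2024), c("A", "B", "C", "D")))
dim1 <- cmdscale(dist(X), k = 2)[, 1]    # first MDS dimension

cor_loop <- apply(X, 2, cor, y = dim1)   # column by column, as in the text
cor_vec  <- cor(X, dim1)[, 1]            # same values from a single call

ranked <- sort(cor_vec, decreasing = TRUE)   # full ranking, nothing dropped
ranked
rev(ranked)                                  # ascending order, if needed
```
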

The second MDS dimension does not represent the overall EU aviation trend; instead, it distinguishes secondary differences among countries in terms of aviation structure or recovery paths, and its explanatory power is clearly weaker than that of the first dimension.

sort(cor_dim2, decreasing = TRUE)[1:36]
##            RO            LU            PL            EL            UK 
##  0.5386647460  0.5303463664  0.4673689632  0.4579470824  0.4432114232 
##            PT            HR            MT            MK            RS 
##  0.4350933184  0.3987166238  0.3743986202  0.3718903761  0.2998165886 
##            LT            EE            HU            IS            CY 
##  0.2881653151  0.2785855707  0.2738939123  0.2729895829  0.2511898145 
##            ES            BG            IT            IE            LV 
##  0.1810506391  0.1414987517  0.1063854368  0.1057219262  0.0792326081 
##            BA            ME            SK            TR            AT 
##  0.0771344556  0.0670487723  0.0634026271  0.0500965430  0.0010886517 
##     EU27_2020            CZ            FR            NL            BE 
## -0.0004526982 -0.0335727519 -0.0786496378 -0.0919080765 -0.1109951960 
##            CH            DK            DE            FI            SI 
## -0.1412580205 -0.1755016350 -0.3019937952 -0.3124779500 -0.3187286902 
##            NO 
## -0.3635349164
sort(cor_dim2)[1:36]
##            SE            NO            SI            FI            DE 
## -0.4354981587 -0.3635349164 -0.3187286902 -0.3124779500 -0.3019937952 
##            DK            CH            BE            NL            FR 
## -0.1755016350 -0.1412580205 -0.1109951960 -0.0919080765 -0.0786496378 
##            CZ     EU27_2020            AT            TR            SK 
## -0.0335727519 -0.0004526982  0.0010886517  0.0500965430  0.0634026271 
##            ME            BA            LV            IE            IT 
##  0.0670487723  0.0771344556  0.0792326081  0.1057219262  0.1063854368 
##            BG            ES            CY            IS            HU 
##  0.1414987517  0.1810506391  0.2511898145  0.2729895829  0.2738939123 
##            EE            LT            RS            MK            MT 
##  0.2785855707  0.2881653151  0.2998165886  0.3718903761  0.3743986202 
##            HR            PT            UK            EL            PL 
##  0.3987166238  0.4350933184  0.4432114232  0.4579470824  0.4673689632 
##            LU 
##  0.5303463664

Proportion of variance explained (eigenvalues)

The first two dimensions explain 99.51% of the total distance variation, indicating that the two-dimensional MDS representation almost completely preserves the original distance structure.

eig <- mds_res$eig
eig_pos <- eig[eig > 0]
prop_2d <- sum(eig_pos[1:2]) / sum(eig_pos)
prop_2d
## [1] 0.995103
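
The eigenvalues used in this proportion come from the double-centered matrix of squared distances, B = −½ · J · D² · J with J = I − (1/n) · 11ᵀ. As a minimal sketch on synthetic data (the matrix `X` is made up), the same eigenvalues can be computed by hand and compared with those returned by `cmdscale`.

```r
# Synthetic data: 12 observations, 6 variables
set.seed(2)
X  <- matrix(rnorm(12 * 6), nrow = 12)
D2 <- as.matrix(dist(X))^2               # squared Euclidean distances
n  <- nrow(D2)

J  <- diag(n) - matrix(1 / n, n, n)      # centering matrix
B  <- -0.5 * J %*% D2 %*% J              # double-centered matrix
ev <- eigen(B, symmetric = TRUE)$values  # eigenvalues, largest first

ref <- cmdscale(dist(X), k = 2, eig = TRUE)$eig   # eigenvalues from cmdscale
max(abs(ev - ref))                       # numerically zero

ev_pos    <- ev[ev > 0]                  # keep positive eigenvalues only
prop_demo <- sum(ev_pos[1:2]) / sum(ev_pos)
prop_demo                                # share retained by two dimensions
```
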

Correlation between original distances and MDS distances

The correlation coefficient is approximately 0.9999, indicating that the pairwise distances between points in the MDS space are almost identical to those in the original high-dimensional space. This demonstrates that both the relative ordering of distances and the relative proximities are preserved nearly perfectly.

D_high <- dist(air_time)
D_mds  <- dist(Z_mds)
cor_dist <- cor(as.vector(D_high), as.vector(D_mds))
cor_dist
## [1] 0.9998819
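
The Pearson coefficient above measures linear agreement between the two sets of distances. The claim about the ordering of distances can be checked more directly with a rank-based coefficient. The sketch below shows both on synthetic data (the matrix `X` is made up); the Spearman variant is a complementary check and was not part of the original analysis.

```r
# Synthetic data: 12 observations, 6 variables
set.seed(5)
X <- matrix(rnorm(12 * 6), nrow = 12)

D_orig <- dist(X)                          # distances in the original space
D_low  <- dist(cmdscale(D_orig, k = 2))    # distances in the 2-D MDS map

r_pearson  <- cor(as.vector(D_orig), as.vector(D_low))
r_spearman <- cor(as.vector(D_orig), as.vector(D_low), method = "spearman")

r_pearson    # linear agreement of distance magnitudes
r_spearman   # agreement of the rank order of distances
```
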

Shepard Diagram

The points are distributed almost perfectly along a straight line, with virtually no systematic curvature or dispersion. A near-linear Shepard diagram indicates an excellent fit of the MDS representation.

plot(
  as.vector(D_high),
  as.vector(D_mds),
  pch = 19, col = rgb(0,0,0,0.4),
  xlab = "Original distances",
  ylab = "MDS distances",
  main = "Shepard Diagram"
)
abline(lm(as.vector(D_mds) ~ as.vector(D_high)), col = "red", lwd = 2)

Comparing PCA & MDS

Because the data are highly similar and exhibit an almost one-dimensional structure dominated by time, the resulting dimensionality reduction performs exceptionally well.

The analysis shows that principal component analysis (PCA) and classical multidimensional scaling (MDS) provide highly consistent conclusions in characterizing the structural relationships between years. PCA, based on the principle of variance maximization, identifies a time-evolution pattern dominated by the first principal component, clearly capturing the steady growth of the aviation industry before the pandemic, the sharp decline during the pandemic period, and the rapid recovery thereafter. In contrast, MDS directly relies on Euclidean distances between years to preserve similarity relationships in a low-dimensional space, and similarly separates the pandemic years (2020–2021) from other periods, yielding a temporal segmentation that closely aligns with the PCA individual score results.

Although the two methods differ in their theoretical foundations, with PCA focusing on variance structure and MDS emphasizing distance preservation, both reveal the same core pattern. This strong consistency indicates that the observed temporal structure is not an artifact introduced by a specific dimensionality reduction technique, but rather reflects intrinsic characteristics of the data itself. Furthermore, the near-perfect fit demonstrated by the MDS Shepard diagram and distance correlation diagnostics confirms that the two-dimensional representation preserves the original distance structure almost without loss, thereby further validating the robustness and reliability of the structure identified by PCA.

Summary

By synthesizing the results from PCA, MDS, and correlation analyses, a clear business-relevant conclusion can be drawn: under normal conditions, the EU aviation market exhibits a high degree of synchronization and integration, operating much more like a single unified market than a collection of independently driven national systems. During the period from 2013 to 2019, aviation indicators across countries are highly similar and differences between years are limited, suggesting that market demand, capacity allocation, and aviation activity levels are primarily shaped by EU-level macroeconomic conditions and the mechanisms of the single market.

However, the results for 2020–2021 clearly demonstrate that when confronted with a strong external shock such as the COVID-19 pandemic, the structure of the aviation industry undergoes a pronounced break, with all countries deviating simultaneously from their previous trajectories. This implies that in a highly integrated market, systemic risks are strongly synchronized and amplified, allowing shocks to propagate rapidly across the entire market. The post-2022 results show a rapid recovery and even an expansion beyond pre-pandemic levels, indicating that the EU aviation market possesses substantial resilience and recovery capacity supported by unified regulations, cross-border networks, and scale effects.

From a commercial and strategic perspective, these findings suggest that analyses and decisions concerning the EU aviation market gain limited additional insight from purely country-level distinctions. Instead, greater value lies in assessing overall market cycles and systemic risks. Airlines, airport operators, and related investors should therefore focus more on macroeconomic cycles, cross-border coordination, and the management of sudden external shocks, rather than overemphasizing short-term fluctuations in individual countries. In contrast, several non-EU countries, most clearly Turkey, the United Kingdom, and Bosnia and Herzegovina, exhibit weaker synchronization with the EU-wide trend, reflecting higher uncertainty and structural heterogeneity, which from a business standpoint implies higher risk premia and lower predictability.