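# Load packages: Amelia provides missmap(), dplyr provides select(), ggplot2 provides ggplot()
library(Amelia)
library(dplyr)
library(ggplot2)
# Load data
air_trafic <- read.csv("/Users/pin.lyu/Desktop/BC_Class_Folder/Predictive_Analytics/Data_Sets/air_traffic.csv")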
# Missing values check
missmap(air_trafic)
# Extract columns of interest
air <- air_trafic |>
  select('Year',
         'Month',
         'Dom_Pax', # Total number of domestic passengers
         'Int_Pax', # Total number of international passengers
         'Pax',     # Total number of passengers
         'Dom_Flt', # Total number of domestic flights
         'Int_Flt', # Total number of international flights
         'Flt'      # Total number of flights
  )
## Create a new time column for graphing
# Convert year and month columns to character
air$Year <- as.character(air$Year)
air$Month <- as.character(air$Month)
# Combine year and month into a Date set to the first day of each month
air$time <- as.Date(paste0(air$Year, '-', air$Month, '-01'))
# Remove thousands separators (commas) and convert the column to numeric
air$Int_Flt <- as.numeric(gsub(",", "", air$Int_Flt))
# Historical record of monthly international flights in the U.S. from 2003 to 2023
ggplot(air, aes(x = time, y = Int_Flt)) +
  geom_line() +
  labs(x = 'Time', y = 'Number of Flights', title = 'Total Number of Monthly International Flights In The U.S. (2003-2023)') +
  theme_classic() +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y-%m")
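# Create a monthly time series object. ts() does not interpret Date objects,
# so start is passed as c(year, month); this assumes rows are in chronological order.
first_obs <- min(air$time)
air_ts <- ts(air$Int_Flt,
             start = c(as.integer(format(first_obs, "%Y")),
                       as.integer(format(first_obs, "%m"))),
             frequency = 12)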
# Perform classical additive decomposition
decomp_add <- decompose(air_ts, type = 'additive')
# Perform classical multiplicative decomposition
decomp_mul <- decompose(air_ts, type = 'multiplicative')
The additive and multiplicative decompositions of this series look remarkably alike, so comparing the magnitude of variation in each component does not clearly favor either model. The choice therefore has to rest on how well each model's assumptions match the nature of the data.
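One way to put a number on how close the two decompositions are is to compare their remainders in the original units of the series. The following is a minimal sketch, not part of the original analysis: `compare_decomp()` is a hypothetical helper, and the built-in `AirPassengers` series stands in for `air_ts` so that the example runs on its own.

```r
# Hypothetical helper: fit both classical decompositions and report the
# remainder RMSE of each, expressed in the units of the series.
compare_decomp <- function(x) {
  add <- decompose(x, type = "additive")
  mul <- decompose(x, type = "multiplicative")

  # Fitted signal: trend plus seasonality (additive) or trend times seasonality (multiplicative).
  fit_add <- add$trend + add$seasonal
  fit_mul <- mul$trend * mul$seasonal

  # The trend is NA at both ends of a classical decomposition, so drop NAs.
  rmse <- function(e) sqrt(mean(e^2, na.rm = TRUE))
  c(additive = rmse(x - fit_add), multiplicative = rmse(x - fit_mul))
}

# Stand-in data (assumption); replace AirPassengers with air_ts for the flight series.
result <- compare_decomp(AirPassengers)
print(result)
```

If the two values are close for `air_ts`, that would support the reading that the fit alone cannot separate the models.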
The additive model suits series whose seasonal swings stay roughly constant in size regardless of the level of the series, while the multiplicative model is better suited to series whose swings scale with the level, as happens under exponential growth or decay. Our flight data show no discernible trend of exponential growth or decay: on a yearly level, the total number of international flights in the U.S. remains relatively consistent. Therefore, the additive model is better equipped to capture the nature of the data.
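A rough check of this distinction is to see whether the within-year spread of the series rises with its yearly level. The sketch below is an illustration under assumptions: `level_vs_spread()` is a hypothetical helper, `AirPassengers` is stand-in data, and a single correlation is only a heuristic, not a formal test.

```r
# Hypothetical helper: correlate each year's mean level with its within-year
# standard deviation. A strong positive correlation hints at multiplicative
# seasonality; a value near zero is more consistent with an additive structure.
level_vs_spread <- function(x) {
  # Calendar year of each observation (small offset guards against rounding).
  yr <- floor(as.numeric(time(x)) + 1e-8)
  level  <- tapply(as.numeric(x), yr, mean)
  spread <- tapply(as.numeric(x), yr, sd)
  # Years with a single observation have an NA spread and are dropped.
  cor(level, spread, use = "complete.obs")
}

# Stand-in data (assumption); replace AirPassengers with air_ts for the flight series.
r <- level_vs_spread(AirPassengers)
print(r)
```

For the flight series, the pandemic years would probably dominate the spread, so it may be worth running the check on the pre-2020 portion as well.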
Additionally, in a multiplicative model, extreme values can disproportionately affect the relative sizes of the seasonal and trend components, potentially producing a less stable decomposition. The additive model tends to be less sensitive to extreme values because it separates the components by addition rather than multiplication, which makes it the more robust of the two. In our data, the sudden drop in flight numbers from 2020 to 2022 due to the COVID-19 pandemic illustrates this point: the linear nature of the additive approach mitigates the impact of these rare extreme values.
Based on these arguments, I believe that additive decomposition will perform better because it is more closely aligned with the nature of the data. If I had to choose between the two models for forecasting, I would opt for the additive model. However, if other options such as X-11 were available, I would explore those as well. Classical decomposition has the drawback that it cannot estimate the trend for the first and last few observations (six months at each end for monthly data), because the trend comes from a centered moving average. Methods such as X-11 and STL also handle extreme values better, which could help mitigate the impact of the pandemic on international flight forecasts.
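As an illustration of the STL alternative, here is a minimal sketch using `stl()` from base R with its robust fitting option. It is not part of the original analysis: `AirPassengers` is stand-in data, the choice of `s.window = "periodic"` is an assumption, and the 0.5 weight cutoff is arbitrary. X-11 is not part of base R and is not shown.

```r
# Stand-in data (assumption); replace AirPassengers with air_ts for the flight series.
x <- AirPassengers

# Classical additive decomposition: the trend is undefined at both ends.
classical <- decompose(x, type = "additive")
n_missing_classical <- sum(is.na(classical$trend))

# STL with robust fitting: outlying observations are down-weighted, and the
# trend is estimated for every observation, including the two ends.
fit_stl <- stl(x, s.window = "periodic", robust = TRUE)
n_missing_stl <- sum(is.na(fit_stl$time.series[, "trend"]))

# Observations with low robustness weights are those STL treated as unusual.
low_weight <- which(fit_stl$weights < 0.5)

print(c(classical = n_missing_classical, stl = n_missing_stl))
print(time(x)[low_weight])
```

On the flight series, the months that receive low weights would be expected to cluster around the pandemic period, though that should be confirmed on the actual data.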