Introduction

The AirPassengers dataset built into R records the monthly number of international airline passengers from US airports from 1949 to 1960.

This project experiments with Prophet, the forecasting library created by Meta, to explore the AirPassengers dataset, forecast future passenger numbers, and observe seasonal variation and trends in air travel over the years.

Set up

1.1 AirPassengers dataset

The AirPassengers dataset has 144 data points, one per month over 12 years, each giving the total number of international airline passengers for that month in thousands.

Plotting the raw data, with years on the x-axis and number of passengers (in thousands) on the y-axis, yields the following graph:

par(mar = c(4, 4, 2, 1))
plot(AirPassengers, main = "Airline Passengers", 
     ylab = "Number of Passengers (thousands)", xlab = "Year", col = "blue", type = "o")

The number of airline passengers grows overall across the years, with regular fluctuations around that growth. This suggests a positive trend combined with some seasonality.

1.2 Running a linear regression

Here we’ll run a regression on the data and fit a regression line.

time_index <- as.numeric(time(AirPassengers))
linear_model <- lm(AirPassengers ~ time_index)
summary(linear_model)
## 
## Call:
## lm(formula = AirPassengers ~ time_index)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -93.858 -30.727  -5.757  24.489 164.999 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -62055.907   2166.077  -28.65   <2e-16 ***
## time_index      31.886      1.108   28.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 46.06 on 142 degrees of freedom
## Multiple R-squared:  0.8536, Adjusted R-squared:  0.8526 
## F-statistic: 828.2 on 1 and 142 DF,  p-value: < 2.2e-16
par(mar = c(4, 4, 2, 1))
plot(AirPassengers, main = "Airline Passengers with Linear Regression", 
     ylab = "Number of Passengers (thousands)", xlab = "Year", col = "blue", type = "o")
abline(linear_model, col = "red", lwd = 2)

The coefficient of time_index is +31.886, which, together with the plots above, leads us to expect a positive trend when fitting the Prophet model. Because time_index is measured in years and passengers in thousands, this estimates that the monthly number of passengers on international flights from the US rises by about 31,886 from one year to the next.
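As a quick check of the unit conversion behind that estimate, the sketch below refits the same regression and rescales the slope. It assumes only base R and the built-in dataset. The names slope_thousands and slope_passengers are illustrative and not part of the original analysis.

```r
# Refit the regression from section 1.2 (base R only).
time_index <- as.numeric(time(AirPassengers))  # time in decimal years
linear_model <- lm(AirPassengers ~ time_index)

# The slope is expressed in thousands of passengers per year.
slope_thousands <- unname(coef(linear_model)["time_index"])

# Rescale to individual passengers per year (illustrative name).
slope_passengers <- slope_thousands * 1000
round(slope_passengers)
```

The slope describes the average change in a typical month's total from one year to the next. It is not the change from one month to the next.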

Forecasting with Prophet

2.1 Loading into prophet

In order to use the Prophet library, we’ll convert the AirPassengers time series into a data frame with columns ds and y, corresponding to the month-year and the number of passengers on flights that month. Here zoo::as.yearmon() will be used to convert the time index, which is stored as decimal years, into year-month values suitable for Prophet.

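library(zoo)      # provides as.yearmon()
library(prophet)  # provides prophet(), make_future_dataframe(), dyplot.prophet()
AP.df = data.frame(ds=as.yearmon(time(AirPassengers)),y=AirPassengers)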
tail(AirPassengers)
## [1] 622 606 508 461 390 432
tail(AP.df)
##           ds   y
## 139 Jul 1960 622
## 140 Aug 1960 606
## 141 Sep 1960 508
## 142 Oct 1960 461
## 143 Nov 1960 390
## 144 Dec 1960 432

Notice how the last six rows of AP.df show each ds alongside its corresponding y, rather than just the last six monthly totals given by the original dataset.

We can now proceed to fit the Prophet model to the data frame:

AP_model = prophet(AP.df, yearly.seasonality = TRUE, weekly.seasonality = FALSE, daily.seasonality = FALSE)
forecast_periods = make_future_dataframe(AP_model, periods=24, freq="month")
head(forecast_periods)
##           ds
## 1 1949-01-01
## 2 1949-02-01
## 3 1949-03-01
## 4 1949-04-01
## 5 1949-05-01
## 6 1949-06-01
tail(forecast_periods)
##             ds
## 163 1962-07-01
## 164 1962-08-01
## 165 1962-09-01
## 166 1962-10-01
## 167 1962-11-01
## 168 1962-12-01

AP_model is the result of fitting Prophet to the data frame. The arguments specify yearly seasonality but no weekly or daily seasonality, since AirPassengers is monthly and no finer-grained seasonality could be estimated from it.

The function make_future_dataframe generates 24 additional monthly periods after the end of the original series and appends them to the original dates. forecast_periods now covers January 1949 to December 1962, rather than the January 1949 to December 1960 covered by AirPassengers.

Prophet can also plot the original time series together with the forecasts (the extra periods) and uncertainty intervals (shaded areas), which helps visualise what to expect based on previous years.
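To make the date arithmetic concrete, here is a minimal base R sketch of the 24 extra months. It does not call Prophet and is not how make_future_dataframe is implemented. It only reproduces the dates seen in the output above, with illustrative variable names.

```r
# Last month observed in AirPassengers, written as the first day of that month.
last_observed <- as.Date("1960-12-01")

# Step forward one month at a time; drop the first element (the observed month).
future_months <- seq(last_observed, by = "month", length.out = 25)[-1]

length(future_months)  # number of forecast periods
range(future_months)   # first and last forecast month
```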

predictions = predict(AP_model, forecast_periods)
plot(AP_model,predictions)

predict(AP_model, forecast_periods) generates fitted values for the historical months and predictions of the monthly number of airline passengers for the next 2 years, from Jan 1961 to Dec 1962.

2.2 Confidence levels

predictions also contains the bounds of the uncertainty interval, in the columns yhat_lower and yhat_upper.

tail(predictions[c('ds','yhat','yhat_lower','yhat_upper')])
##             ds     yhat yhat_lower yhat_upper
## 163 1962-07-01 613.8120   584.1430   640.1683
## 164 1962-08-01 613.7566   583.1485   643.8505
## 165 1962-09-01 565.8271   537.2291   594.5635
## 166 1962-10-01 530.1203   498.5226   559.2807
## 167 1962-11-01 497.2557   467.1521   526.6023
## 168 1962-12-01 526.7618   496.7381   556.6183

This displays the date, the estimated monthly total of passengers, and the lower and upper bounds of the uncertainty interval for July to December 1962.

We can inspect Prophet’s predictions more closely with an interactive plot using dyplot.prophet(). This displays the date, forecasts, actual data points and uncertainty intervals.
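To get a feel for how wide these intervals are, the sketch below copies the six rows printed above into a small data frame and subtracts the bounds. The data frame name shown and the width column are illustrative. The numbers are transcribed from the output rather than recomputed with Prophet.

```r
# Values transcribed from the tail(predictions[...]) output above.
shown <- data.frame(
  ds         = as.Date(c("1962-07-01", "1962-08-01", "1962-09-01",
                         "1962-10-01", "1962-11-01", "1962-12-01")),
  yhat       = c(613.8120, 613.7566, 565.8271, 530.1203, 497.2557, 526.7618),
  yhat_lower = c(584.1430, 583.1485, 537.2291, 498.5226, 467.1521, 496.7381),
  yhat_upper = c(640.1683, 643.8505, 594.5635, 559.2807, 526.6023, 556.6183)
)

# Width of the uncertainty interval, in thousands of passengers.
shown$width <- shown$yhat_upper - shown$yhat_lower
shown[c("ds", "yhat", "width")]
```

Each interval spans roughly 56 to 61 thousand passengers, which is about a tenth of the forecast value for these months.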

dyplot.prophet(AP_model, predictions)

The interval.width is set to 80% (0.8) by default in the prophet function. It could be increased to 95% (0.95), which makes the shaded region wider and each prediction less precise. A wider interval is more likely to contain the future values, which is a cautious choice given the large seasonal fluctuations in air travel. Setting 0.95 means the interval is intended to contain about 95% of future observations, so it is more lenient but less precise.

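AP_model = prophet::prophet(AP.df, yearly.seasonality = TRUE, weekly.seasonality = FALSE, daily.seasonality = FALSE, interval.width = 0.95)
# Recompute the predictions so the plotted intervals reflect the new interval.width
predictions = predict(AP_model, forecast_periods)
dyplot.prophet(AP_model, predictions)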
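As a rough analogy for why a higher interval.width widens the band, the sketch below compares 80% and 95% intervals for a normal distribution. This is an assumption made for illustration only. Prophet builds its intervals from simulated forecasts, so the exact ratio in the plots may differ.

```r
# Half-widths, in standard deviations, of central intervals under a normal model.
z80 <- qnorm(0.90)   # 80% interval leaves 10% in each tail
z95 <- qnorm(0.975)  # 95% interval leaves 2.5% in each tail

# How much wider the 95% interval is than the 80% one (illustrative only).
ratio <- z95 / z80
round(c(z80 = z80, z95 = z95, ratio = ratio), 3)
```

Under this simplified model the 95% band is roughly one and a half times as wide as the 80% band.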

2.3 Seasonality

The series appears to show both seasonality and a long-term trend, seen in the repeating fluctuations around the trend. To examine the components separately, we can plot them:

prophet_plot_components(AP_model,predictions)

This outputs two plots, one for the trend and one for the yearly seasonal component of the forecast. We can see a possible yearly seasonal pattern and the long-term increasing trend in passenger numbers. Looking at the yearly component, there are peaks in the warmer months of the year and sharp rises that likely coincide with holidays, suggesting there are more passengers on flights in holiday periods and warmer months.
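The seasonal pattern can be cross-checked without Prophet by averaging the raw data by calendar month. This is a simple base R summary and not Prophet's seasonal component. Because it ignores the trend, it only hints at which months tend to be busiest. The name monthly_mean is illustrative.

```r
# Average passengers (in thousands) for each calendar month across 1949 to 1960.
monthly_mean <- tapply(as.numeric(AirPassengers),
                       as.integer(cycle(AirPassengers)),
                       mean)
names(monthly_mean) <- month.abb

round(monthly_mean, 1)

# Busiest and quietest calendar months on average.
names(which.max(monthly_mean))
names(which.min(monthly_mean))
```

If the summer months come out on top here as well, that supports the reading of the yearly component above.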