The AirPassengers dataset built into R gives
information on the number of international airline passengers per month
from US airports in the years 1949 to 1960.
This project experiments with the Prophet library created by Meta
in order to explore the AirPassengers dataset,
forecast passenger numbers, and observe seasonal variations and trends
in air travel over the years.
The AirPassengers dataset has 144 data
points, each giving the total number of international airline passengers
(in thousands) for one month over the span of 12 years.
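The structure described above can be checked directly in base R. This is a quick sketch that uses only the built-in dataset; the variable names are chosen here for illustration.

```r
# AirPassengers ships with base R (package 'datasets'), so nothing is downloaded.
n_points  <- length(AirPassengers)     # number of monthly observations
first_obs <- start(AirPassengers)      # c(year, month) of the first observation
last_obs  <- end(AirPassengers)        # c(year, month) of the last observation
per_year  <- frequency(AirPassengers)  # observations per year

print(n_points)
print(first_obs)
print(last_obs)
print(per_year)
```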
Plotting the raw data, with years on the x-axis and the number of passengers (in thousands) on the y-axis, yields the following graph:
par(mar = c(4, 4, 2, 1))
plot(AirPassengers, main = "Airline Passengers",
     ylab = "Number of Passengers (thousands)", xlab = "Year", col = "blue", type = "o")
The plot shows overall growth in the number of airline passengers over the years, along with regular fluctuations around that growth, which suggests a positive trend with some seasonality.
Here we’ll run a regression on the data and fit a regression line.
time_index <- as.numeric(time(AirPassengers))
linear_model <- lm(AirPassengers ~ time_index)
summary(linear_model)
##
## Call:
## lm(formula = AirPassengers ~ time_index)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
##  -93.858  -30.727   -5.757   24.489  164.999
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -62055.907   2166.077  -28.65   <2e-16 ***
## time_index      31.886      1.108   28.78   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 46.06 on 142 degrees of freedom
## Multiple R-squared: 0.8536, Adjusted R-squared: 0.8526
## F-statistic: 828.2 on 1 and 142 DF, p-value: < 2.2e-16
par(mar = c(4, 4, 2, 1))
plot(AirPassengers, main = "Airline Passengers with Linear Regression",
ylab = "Number of Passengers", xlab = "Year", col = "blue", type = "o")
abline(linear_model, col = "red", lwd = 2)
The coefficient of time_index is +31.886 which, together with
the previous plot, suggests we can expect to see a positive trend over time
when fitting the Prophet model. Since passengers are measured in thousands
and time_index is in years, this estimates that the number of monthly
passengers on international flights from the US increases by about 31,886
each year.
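The unit conversion behind that figure can be spelled out in a few lines. This is a minimal sketch that refits the same regression; the names slope_thousands and slope_passengers are introduced here for illustration only.

```r
# Refit the regression of monthly passengers (thousands) on time (decimal years).
time_index   <- as.numeric(time(AirPassengers))
linear_model <- lm(AirPassengers ~ time_index)

# The slope is in thousands of monthly passengers per year.
slope_thousands <- unname(coef(linear_model)["time_index"])

# Multiply by 1000 to express it as passengers per year.
slope_passengers <- slope_thousands * 1000
print(round(slope_passengers))
```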
In order to use the Prophet library, we’ll convert the
AirPassengers time series into a data frame with columns
ds and y corresponding to the month-year and
the number of passengers on flights that month. Here
zoo::as.yearmon() will be used to convert the time index, which
is stored as decimal years, into year-month values which are suitable for
Prophet.
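The conversion code itself is not shown here. A minimal sketch of what it might look like follows, assuming the zoo package is installed and reusing the data frame name AP.df that appears later in this report; the exact calls used originally may differ.

```r
library(zoo)

# Build the two-column data frame Prophet expects: ds (date) and y (value).
# as.numeric() drops the time-series attributes so y is a plain numeric column.
AP.df <- data.frame(
  ds = zoo::as.yearmon(time(AirPassengers)),
  y  = as.numeric(AirPassengers)
)

# Last six monthly totals on their own, then the last six rows of the data frame.
tail(as.numeric(AirPassengers))
tail(AP.df)
```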
## [1] 622 606 508 461 390 432
## ds y
## 139 Jul 1960 622
## 140 Aug 1960 606
## 141 Sep 1960 508
## 142 Oct 1960 461
## 143 Nov 1960 390
## 144 Dec 1960 432
Notice that the last six rows of AP.df show each ds alongside
its corresponding y, rather than just the last six monthly
totals y as in the original dataset.
We can now proceed to fit the Prophet model to the data frame:
library(prophet)
AP_model <- prophet(AP.df, yearly.seasonality = TRUE, weekly.seasonality = FALSE, daily.seasonality = FALSE)
forecast_periods <- make_future_dataframe(AP_model, periods = 24, freq = "month")
head(forecast_periods)
##           ds
## 1 1949-01-01
## 2 1949-02-01
## 3 1949-03-01
## 4 1949-04-01
## 5 1949-05-01
## 6 1949-06-01
tail(forecast_periods)
##             ds
## 163 1962-07-01
## 164 1962-08-01
## 165 1962-09-01
## 166 1962-10-01
## 167 1962-11-01
## 168 1962-12-01
The prophet() call fits the model to the data frame, specifying
that there is yearly seasonality but no weekly or daily seasonality:
AirPassengers is monthly, so no such finer-grained seasonality
would be relevant.
The function make_future_dataframe generates 24
additional monthly periods that occur after the end of the original time
period and appends them to the original dates. Now
forecast_periods covers January 1949 to December
1962, rather than the January 1949 to December 1960 that
AirPassengers covers.
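To make the extended time frame concrete, the same dates can be generated in base R. This is only an illustrative sketch that mimics the result for monthly data; it is not how Prophet builds the data frame internally, and the variable names are hypothetical.

```r
# The 144 historical months, one date per month start.
history_dates <- seq(as.Date("1949-01-01"), by = "month", length.out = 144)

# 24 further months after the last historical date:
# generate 25 steps from the last date and drop the first (the last historical month).
future_dates <- seq(max(history_dates), by = "month", length.out = 25)[-1]

all_dates <- c(history_dates, future_dates)
print(range(future_dates))
print(length(all_dates))
```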
Prophet can also be used to plot the original time series together with forecasts (the extra periods) and uncertainty intervals (shaded areas), which helps visualise what to expect based on previous years.
predict(AP_model, forecast_periods) generates monthly predictions of
the number of airline passengers, including the next 2 years from Jan 1961
to Dec 1962.
predictions also gives uncertainty bounds, which are
stored in yhat_lower and yhat_upper.
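The prediction step can be sketched end to end as follows. This assumes the prophet package is installed; to keep the block runnable on its own, the input data frame is rebuilt with plain Date values for ds, which is an assumption made here rather than the conversion used above.

```r
library(prophet)

# Rebuild the input so this sketch runs on its own (plain Dates used for ds).
AP.df <- data.frame(
  ds = seq(as.Date("1949-01-01"), by = "month", length.out = 144),
  y  = as.numeric(AirPassengers)
)

AP_model <- prophet(AP.df, yearly.seasonality = TRUE,
                    weekly.seasonality = FALSE, daily.seasonality = FALSE)
forecast_periods <- make_future_dataframe(AP_model, periods = 24, freq = "month")

# predict() returns one row per date in forecast_periods (history plus future).
predictions <- predict(AP_model, forecast_periods)

# Point forecast and uncertainty bounds for the last six forecast months.
tail(predictions[c("ds", "yhat", "yhat_lower", "yhat_upper")])
```

The exact numbers may differ slightly between runs and package versions, because the uncertainty bounds are estimated by simulation.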
##             ds     yhat yhat_lower yhat_upper
## 163 1962-07-01 613.8120   584.1430   640.1683
## 164 1962-08-01 613.7566   583.1485   643.8505
## 165 1962-09-01 565.8271   537.2291   594.5635
## 166 1962-10-01 530.1203   498.5226   559.2807
## 167 1962-11-01 497.2557   467.1521   526.6023
## 168 1962-12-01 526.7618   496.7381   556.6183
This displays the date, the estimated monthly total of passengers, and the lower and upper uncertainty bounds for July to December 1962.
We can inspect closer what Prophet predicts by plotting an
interactive plot using dyplot.prophet(). This will display
the date, forecasts, actual data points and uncertainty intervals.
The interval.width is set to 80% (0.8) by default in the
prophet function. However, this could be increased to 95%
(0.95), which makes the shaded region wider: each interval is less
precise, but it is more likely to contain the actual value.
This is useful as air travel has significant seasonal fluctuations, which
a wider interval is more likely to cover. Setting 0.95 means the
interval is constructed so that about 95% of future values are expected
to fall inside it, hence it’s less precise and more lenient.
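A minimal sketch of fitting with a 95% interval is shown below, assuming the prophet package is installed. The names AP_model_95 and predictions_95 are hypothetical, and the input data frame is rebuilt with plain Date values so the block runs on its own.

```r
library(prophet)

# Rebuild the input so this sketch runs on its own (plain Dates used for ds).
AP.df <- data.frame(
  ds = seq(as.Date("1949-01-01"), by = "month", length.out = 144),
  y  = as.numeric(AirPassengers)
)

# Same model as before, but with a 95% uncertainty interval instead of the default 80%.
AP_model_95 <- prophet(AP.df, interval.width = 0.95,
                       yearly.seasonality = TRUE,
                       weekly.seasonality = FALSE, daily.seasonality = FALSE)
future_95      <- make_future_dataframe(AP_model_95, periods = 24, freq = "month")
predictions_95 <- predict(AP_model_95, future_95)

# Width of the shaded region at each date, in thousands of passengers.
interval_size <- predictions_95$yhat_upper - predictions_95$yhat_lower
summary(interval_size)

# Interactive plot (requires the dygraphs package); left commented out here.
# dyplot.prophet(AP_model_95, predictions_95)
```

Comparing summary(interval_size) with the same quantity from the default model is one way to see how much wider the 95% region is.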
There appears to be both seasonality and a long-term trend over time, as seen in the repeating fluctuations around the trend. To observe the different components we can inspect closer:
This outputs two plots for the trend and seasonal components of the forecast. We can see possible yearly seasonal patterns and the long-term increasing trend in passenger numbers. Looking at the yearly component, there are peaks in the warmer months of the year, and there are sharp rises that likely coincide with when holidays fall in the year, suggesting there are more passengers on flights in holiday periods and warmer months.
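As a rough cross-check of the seasonal pattern that does not rely on Prophet, the raw data can be averaged by calendar month. This is a crude sketch in base R: it ignores the trend and is not Prophet's decomposition, and the variable names are chosen here for illustration.

```r
# Average passengers (thousands) for each calendar month across all 12 years.
month_of_obs <- as.numeric(cycle(AirPassengers))  # 1 = January, ..., 12 = December
monthly_mean <- tapply(as.numeric(AirPassengers), month_of_obs, mean)
names(monthly_mean) <- month.abb

print(round(monthly_mean, 1))

# Calendar month with the highest average passenger count.
peak_month <- names(which.max(monthly_mean))
print(peak_month)
```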