1.1 AirPassengers Data Set

The chosen data set is AirPassengers, a time series of monthly totals of international airline passengers (in thousands) from 1949 to 1960.
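Before reshaping the data, it can help to confirm the structure of the series. The following is a minimal sketch in base R only; it assumes the built-in AirPassengers object from the datasets package and is not part of the original analysis.

```r
# AirPassengers ships with base R (package 'datasets')
data("AirPassengers")

# Basic structure of the monthly time series
class(AirPassengers)      # "ts"
start(AirPassengers)      # first observation: year 1949, month 1
end(AirPassengers)        # last observation: year 1960, month 12
frequency(AirPassengers)  # 12 observations per year
length(AirPassengers)     # 144 monthly values (12 years x 12 months)
```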

Here, we build a data frame from the AirPassengers series, using the column names ds (date) and y (value) that prophet expects.

AirPassengers.df = data.frame(
  ds=zoo::as.yearmon(time(AirPassengers)), 
  y=AirPassengers)

Now, we display the data frame created above:

##           ds   y
## 1   Jan 1949 112
## 2   Feb 1949 118
## 3   Mar 1949 132
## 4   Apr 1949 129
## 5   May 1949 121
## 6   Jun 1949 135
## 7   Jul 1949 148
## 8   Aug 1949 148
## 9   Sep 1949 136
## 10  Oct 1949 119
## 11  Nov 1949 104
## 12  Dec 1949 118
## 13  Jan 1950 115
## 14  Feb 1950 126
## 15  Mar 1950 141
## 16  Apr 1950 135
## 17  May 1950 125
## 18  Jun 1950 149
## 19  Jul 1950 170
## 20  Aug 1950 170
## 21  Sep 1950 158
## 22  Oct 1950 133
## 23  Nov 1950 114
## 24  Dec 1950 140
## 25  Jan 1951 145
## 26  Feb 1951 150
## 27  Mar 1951 178
## 28  Apr 1951 163
## 29  May 1951 172
## 30  Jun 1951 178
## 31  Jul 1951 199
## 32  Aug 1951 199
## 33  Sep 1951 184
## 34  Oct 1951 162
## 35  Nov 1951 146
## 36  Dec 1951 166
## 37  Jan 1952 171
## 38  Feb 1952 180
## 39  Mar 1952 193
## 40  Apr 1952 181
## 41  May 1952 183
## 42  Jun 1952 218
## 43  Jul 1952 230
## 44  Aug 1952 242
## 45  Sep 1952 209
## 46  Oct 1952 191
## 47  Nov 1952 172
## 48  Dec 1952 194
## 49  Jan 1953 196
## 50  Feb 1953 196
## 51  Mar 1953 236
## 52  Apr 1953 235
## 53  May 1953 229
## 54  Jun 1953 243
## 55  Jul 1953 264
## 56  Aug 1953 272
## 57  Sep 1953 237
## 58  Oct 1953 211
## 59  Nov 1953 180
## 60  Dec 1953 201
## 61  Jan 1954 204
## 62  Feb 1954 188
## 63  Mar 1954 235
## 64  Apr 1954 227
## 65  May 1954 234
## 66  Jun 1954 264
## 67  Jul 1954 302
## 68  Aug 1954 293
## 69  Sep 1954 259
## 70  Oct 1954 229
## 71  Nov 1954 203
## 72  Dec 1954 229
## 73  Jan 1955 242
## 74  Feb 1955 233
## 75  Mar 1955 267
## 76  Apr 1955 269
## 77  May 1955 270
## 78  Jun 1955 315
## 79  Jul 1955 364
## 80  Aug 1955 347
## 81  Sep 1955 312
## 82  Oct 1955 274
## 83  Nov 1955 237
## 84  Dec 1955 278
## 85  Jan 1956 284
## 86  Feb 1956 277
## 87  Mar 1956 317
## 88  Apr 1956 313
## 89  May 1956 318
## 90  Jun 1956 374
## 91  Jul 1956 413
## 92  Aug 1956 405
## 93  Sep 1956 355
## 94  Oct 1956 306
## 95  Nov 1956 271
## 96  Dec 1956 306
## 97  Jan 1957 315
## 98  Feb 1957 301
## 99  Mar 1957 356
## 100 Apr 1957 348
## 101 May 1957 355
## 102 Jun 1957 422
## 103 Jul 1957 465
## 104 Aug 1957 467
## 105 Sep 1957 404
## 106 Oct 1957 347
## 107 Nov 1957 305
## 108 Dec 1957 336
## 109 Jan 1958 340
## 110 Feb 1958 318
## 111 Mar 1958 362
## 112 Apr 1958 348
## 113 May 1958 363
## 114 Jun 1958 435
## 115 Jul 1958 491
## 116 Aug 1958 505
## 117 Sep 1958 404
## 118 Oct 1958 359
## 119 Nov 1958 310
## 120 Dec 1958 337
## 121 Jan 1959 360
## 122 Feb 1959 342
## 123 Mar 1959 406
## 124 Apr 1959 396
## 125 May 1959 420
## 126 Jun 1959 472
## 127 Jul 1959 548
## 128 Aug 1959 559
## 129 Sep 1959 463
## 130 Oct 1959 407
## 131 Nov 1959 362
## 132 Dec 1959 405
## 133 Jan 1960 417
## 134 Feb 1960 391
## 135 Mar 1960 419
## 136 Apr 1960 461
## 137 May 1960 472
## 138 Jun 1960 535
## 139 Jul 1960 622
## 140 Aug 1960 606
## 141 Sep 1960 508
## 142 Oct 1960 461
## 143 Nov 1960 390
## 144 Dec 1960 432

We fit a forecasting model to the data frame above. This helps us analyse any trend and seasonality, which is useful when generating future predictions.

We then extend the original time series by 24 months so that the model can predict future values.

The model generates a prediction for each date in the extended data frame, future_forecast.

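# Fit the prophet model (weekly and daily seasonality are switched on explicitly)
forecast_model = prophet::prophet(AirPassengers.df, weekly.seasonality = TRUE, daily.seasonality = TRUE)
# Extend the history by 24 monthly dates
future_forecast = prophet::make_future_dataframe(forecast_model, periods = 24, freq = "month")
# Predict over both the historical and the future dates
predicted = predict(forecast_model, future_forecast)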
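As a rough illustration of what the 24-month extension amounts to, the sketch below builds the new monthly dates in base R. It is not prophet's own code; the names last_date and future_dates are illustrative assumptions.

```r
# Last observed month in AirPassengers (December 1960)
last_date <- as.Date("1960-12-01")

# Generate 25 monthly dates starting at the last observation,
# then drop the first so that only the 24 new months remain
future_dates <- seq(last_date, by = "month", length.out = 25)[-1]

range(future_dates)   # January 1961 to December 1962
length(future_dates)  # 24
```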

The following graph illustrates the results:

plot(forecast_model,predicted)

We now create a more interactive graph using dyplot.prophet. It lets us compare the actual and predicted values of the time series over any interval we choose. Give it a try!

library(prophet)
dyplot.prophet(forecast_model,predicted)

1.2 Linear Regression

We now fit a linear regression model to our data. This model assumes a linear relationship between time and the number of passengers. It may give insight into the overall trend but will not fully capture the seasonal fluctuations in our data set. The following shows a summary of the linear regression.

The summary gives the estimated parameters of the model, the intercept \(\beta_0\) and the slope \(\beta_1\). From \(R^2\) we see that about 85% of the variability in passenger numbers is accounted for by the linear model.
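To make the roles of the two parameters concrete, the fitted line can be written out with the estimates reported in the summary below. Here \(t\) denotes time in years, which is an assumption about how Time is measured, though one consistent with the size of the intercept and slope:

\[ \hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 t = -62055.907 + 31.886\,t \]

For example, at \(t = 1955\) the line gives \(-62055.907 + 31.886 \times 1955 \approx 281.2\), and each additional year adds about 31.9 to the fitted monthly total.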

Summary Of Linear Regression Model:

## 
## Call:
## lm(formula = No_of_Passengers ~ Time)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -93.858 -30.727  -5.757  24.489 164.999 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -62055.907   2166.077  -28.65   <2e-16 ***
## Time            31.886      1.108   28.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 46.06 on 142 degrees of freedom
## Multiple R-squared:  0.8536, Adjusted R-squared:  0.8526 
## F-statistic: 828.2 on 1 and 142 DF,  p-value: < 2.2e-16
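The code that produced this summary is not shown, so the following is only a sketch of one way such a model could be fitted. The names No_of_Passengers, Time and linear_regression_model are taken from the output and the plotting code in this section; how Time and No_of_Passengers were actually constructed is an assumption.

```r
# Assumption: Time is the observation time in decimal years,
# No_of_Passengers the monthly totals as a plain numeric vector
Time <- as.numeric(time(AirPassengers))
No_of_Passengers <- as.numeric(AirPassengers)

# Simple linear regression of passengers on time
linear_regression_model <- lm(No_of_Passengers ~ Time)

summary(linear_regression_model)            # full summary table
coef(linear_regression_model)               # intercept and slope
summary(linear_regression_model)$r.squared  # share of variance explained
```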

Let’s have a look at the plot:

plot(AirPassengers,main="AirPassengers + Linear Regression Line", ylab="No. of Passengers", xlab="Year",col="black",type="o")

#Linear Regression Line 
abline(linear_regression_model,col="red",lwd=2.5)

We can see that the linear model fits the data quite well from around 1950 to 1954; later on, however, we observe increasingly large fluctuations in the number of passengers around the line.
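One way to put a number on this widening spread is to compare the residuals of the linear fit year by year. This is a minimal sketch, assuming the model is a regression of passengers on time in decimal years; the names fit, year and resid_sd_by_year are illustrative.

```r
# Refit the linear model so that this block is self-contained
Time <- as.numeric(time(AirPassengers))
No_of_Passengers <- as.numeric(AirPassengers)
fit <- lm(No_of_Passengers ~ Time)

# Calendar year of each observation (12 months per year, 1949 to 1960)
year <- rep(1949:1960, each = 12)

# Standard deviation of the residuals within each year
resid_sd_by_year <- tapply(residuals(fit), year, sd)
round(resid_sd_by_year, 1)
```

A larger value for a given year means the observations in that year scatter more widely around the straight line.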

1.3 Decomposition Of Time Series

We now decompose our Time Series into its three main components: Trend, Seasonality and Residual Noise.
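The decomposition code itself is not shown here. As a hedged illustration, the sketch below uses the classical additive decomposition from base R (stats::decompose), which may differ from the method behind the plots discussed in this section; the name parts is illustrative.

```r
# Classical additive decomposition with a 12-month period
parts <- decompose(AirPassengers)

# The series is split as: observed = trend + seasonal + random
head(parts$seasonal, 12)  # one full cycle of the seasonal component
plot(parts)               # panels: observed, trend, seasonal, random
```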

Interpretation:

-TREND:

Here, we see that the trend is increasing, reflecting a rise in the number of passengers.

One possible explanation: in the 1950s, knowledge of airplanes as a mode of transport was limited, hence fewer people were inclined to travel this way.

Later in the series, however, the trend increases more steeply, which is likely to represent the development of technology and the growing accessibility of air travel worldwide.

-SEASONALITY:

Here, the seasonality has a periodic shape, suggesting that passengers tend to prefer certain times of the year to travel.

For example, during the summer, there is a noticeable increase in the number of passengers (which is illustrated better in the latter plot) whereas in the winter, the number declines significantly.

The smaller peaks at the end of the year could result from holiday travel, such as Christmas and New Year's.

If we examine the weekly seasonality, it suggests that the end of the week is the most popular time for travel. This is likely because many people do not work at weekends, allowing them to use this time for leisure activities. Since the data are monthly totals, however, this weekly pattern should be interpreted with caution.

Overall, we see that after May the number of passengers starts to increase, indicating that the spring and summer seasons are the most popular for travel.
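The seasonal pattern described above can also be checked directly from the raw series by averaging each calendar month over all years. This is a minimal sketch in base R; the name monthly_mean is illustrative.

```r
# Average passenger count for each calendar month across all years
monthly_mean <- tapply(as.numeric(AirPassengers),
                       as.integer(cycle(AirPassengers)),
                       mean)
names(monthly_mean) <- month.abb
round(monthly_mean, 1)

# Months with the highest and lowest average
names(which.max(monthly_mean))
names(which.min(monthly_mean))
```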

-RESIDUAL NOISE/RANDOM:

Here, we see that the residual errors fluctuate from 1950 to 1953, which indicates that the model may not be a suitable fit for the data.

From 1954 to 1956, the errors are spread around zero, which is good, but they increase shortly afterwards, and after 1958 the fluctuations are even larger than before.