Author: Charles Ugiagbe

Date: 12/12/2023

library(tidyverse)
library(lubridate)
library(forecast)
class(AirPassengers)
## [1] "ts"
AirPassengers
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432
# Create a color palette for the box plot
my_colors <- rainbow(12)
 
# Box plot by month with customizations
boxplot(split(AirPassengers, cycle(AirPassengers)),
        xlab = "Month", ylab = "Number of Passengers",
        col = my_colors,  # Assign colors to each box
        border = "black",  # Set the border color
        main = "Air Passenger Counts by Month",
        names = month.abb,  # Use abbreviated month names as labels
        outline = FALSE)  # Do not draw outlier points (they stay in the data)

Plot the dataset to observe how the values have been changing from 1949 to 1960.

plot(AirPassengers)

Time series data are decomposed into three components:

Seasonal – A pattern that repeats over a fixed period of time. Example – A clothing e-commerce website will have heavy traffic during festive seasons and less traffic during normal times. This is a seasonal pattern because the values rise only during certain periods and the rise recurs every year.

Trend – The long-term direction in which the values are moving. For example, if a website is doing well overall, the trend in its traffic goes up; if not, the trend goes down.

Random – What remains of the time series after the seasonal and trend components are removed. This is also known as noise.

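# AirPassengers is already a monthly ts, so it can be decomposed directly;
# re-wrapping it with ts(..., frequency = 12) would reset the time index to start at 1
d <- decompose(AirPassengers, type = "multiplicative")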
plot(d)

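The type “multiplicative” is used because the size of the seasonal swings in this series grows as the trend rises. When the seasonal variation stays roughly constant regardless of the trend level, the series is called “additive” and the additive type should be used instead.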
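To make the difference between the two types concrete, the following is a minimal sketch (not part of the original analysis) that checks the multiplicative identity observed = trend × seasonal × random on the same dataset. The variable names `d_mult`, `rebuilt`, and `ok` are illustrative choices.

```r
# Multiplicative decomposition: observed = trend * seasonal * random
d_mult <- decompose(AirPassengers, type = "multiplicative")
rebuilt <- d_mult$trend * d_mult$seasonal * d_mult$random

# The trend is a centred moving average, so it is NA for the
# first and last six months of the series
ok <- !is.na(rebuilt)
sum(!ok)

# Where the trend is defined, the product reproduces the original values
all.equal(as.numeric(rebuilt[ok]), as.numeric(AirPassengers[ok]))

# The seasonal indices are multipliers around 1 (one per month)
round(d_mult$figure, 2)
```

For an additive series the same check would use `type = "additive"` and a sum of the three components instead of a product.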

Now we fit a model with the auto.arima() function and forecast 10 years of data with the forecast() function.

model<-auto.arima(AirPassengers)
summary(model)
## Series: AirPassengers 
## ARIMA(2,1,1)(0,1,0)[12] 
## 
## Coefficients:
##          ar1     ar2      ma1
##       0.5960  0.2143  -0.9819
## s.e.  0.0888  0.0880   0.0292
## 
## sigma^2 = 132.3:  log likelihood = -504.92
## AIC=1017.85   AICc=1018.17   BIC=1029.35
## 
## Training set error measures:
##                  ME     RMSE     MAE      MPE     MAPE     MASE        ACF1
## Training set 1.3423 10.84619 7.86754 0.420698 2.800458 0.245628 -0.00124847
# h = 10 * 12: forecast 10 years ahead, 12 months per year
f <- forecast(model, level = c(95), h = 10 * 12)
plot(f)

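The fitted ARIMA(2,1,1)(0,1,0)[12] model is a seasonal ARIMA model with a 12-month seasonal period. The non-seasonal part (2,1,1) includes a second-order autoregressive (AR) component, first-order differencing (I) to remove the trend, and a first-order moving average (MA) term. The seasonal part (0,1,0) applies one seasonal difference at lag 12 and has no seasonal AR or MA terms. Together, the two differences are intended to make the series stationary.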
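As an illustration of what the two differencing steps in this model order do, here is a small sketch using only base R. It applies one regular difference and one seasonal difference at lag 12, then refits the same order with `stats::arima()` as a stand-in for the model chosen by auto.arima(). The names `d1`, `d1_D1`, and `fit_manual` are hypothetical, and the estimates may differ slightly from the summary above.

```r
# Regular difference (d = 1): month-to-month changes, removes the trend
d1 <- diff(AirPassengers, differences = 1)

# Seasonal difference (D = 1, period 12): change relative to the same
# month one year earlier, removes the repeating yearly pattern
d1_D1 <- diff(d1, lag = 12)

# Each difference shortens the series: 144 - 1 - 12 observations remain
length(d1_D1)

# The differenced series should fluctuate around zero without a clear trend
plot(d1_D1, ylab = "Differenced passengers")

# Fit the same order directly with base R's arima()
fit_manual <- arima(AirPassengers,
                    order = c(2, 1, 1),
                    seasonal = list(order = c(0, 1, 0), period = 12))
coef(fit_manual)
```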

The summary reports the estimated coefficients with their standard errors, along with error measures. The AIC, AICc, and BIC values are used to compare candidate models fitted to the same data, with lower values indicating a better balance between fit and complexity. The error measures, including RMSE and MAPE, describe the model’s accuracy on the training data only, while the log likelihood measures how well the model fits that data. Further evaluation on new data is needed to confirm its forecasting performance.
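One common way to carry out that further evaluation is to hold back the end of the series as a test set. The sketch below is an assumption about how this could be done here, not a step from the original analysis: it trains on 1949 to 1958, forecasts the final two years, and compares the forecast with the held-out values. The split point and the names `train`, `test`, `fit_train`, `fc`, and `acc` are illustrative.

```r
library(forecast)

# Hold out the last two years (24 months) as a test set
train <- window(AirPassengers, end = c(1958, 12))
test  <- window(AirPassengers, start = c(1959, 1))

# Fit on the training period only; the selected order may differ
# from the model fitted on the full series
fit_train <- auto.arima(train)

# Forecast exactly as many months as were held out
fc <- forecast(fit_train, h = length(test))

# Error measures for both the training period and the held-out period
acc <- accuracy(fc, test)
acc

# Overlay the actual values on the forecast plot
plot(fc)
lines(test, col = "red")
```

The "Test set" row of `acc` is the one that reflects out-of-sample accuracy; it is usually worse than the "Training set" row.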

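The shaded region is the 95% prediction interval for the next 10 years: the range in which future values are expected to fall with 95% probability under the fitted model, not every value that could possibly occur. The blue line is the point forecast, that is, the forecast mean for each month. This is how we can forecast values from a time series dataset.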
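The numbers behind the plot can also be read directly from the forecast object. The following sketch refits the model so that it runs on its own and collects the point forecast and the 95% bounds in a data frame. The names `intervals` and `width` are illustrative.

```r
library(forecast)

model <- auto.arima(AirPassengers)
f <- forecast(model, level = 95, h = 10 * 12)

# f$mean holds the point forecasts; f$lower and f$upper hold one
# column per requested level (here a single 95% column)
intervals <- data.frame(
  point    = as.numeric(f$mean),
  lower_95 = as.numeric(f$lower[, 1]),
  upper_95 = as.numeric(f$upper[, 1])
)
head(intervals)

# Width of the interval at each horizon; uncertainty typically
# grows the further ahead the forecast goes
width <- intervals$upper_95 - intervals$lower_95
range(width)
```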