In today’s seminar, you are asked to perform a time series analysis covering material you have seen this week as well as material we will cover next week.
The first goal of the seminar is to understand how to create a time series object. This can be done with the function ts(). Download the AirPass.csv from Minerva. This file contains monthly totals of international airline passengers for the period from 01/1949 to 12/1960.
Import the data and store it as a data frame called dataset1. Alternatively, you can import the data with read.csv(file.choose()). The file.choose() call opens a window that lets you browse to the folder and select the file.
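A minimal import sketch, assuming AirPass.csv is saved in your working directory (the file name and location are assumptions; adjust the path or use file.choose() if needed):
dataset1 <- read.csv("AirPass.csv")   # or: dataset1 <- read.csv(file.choose())
str(dataset1)                         # check the column names and types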
The packages used for time series analysis require a time series object; therefore, we cannot use the data frame dataset1 directly.
To create a time series object you should use the function ts(). This function has the following structure: ts(vector, start=, end=, frequency=), where start and end are the time periods of the first and last observation, and frequency is the number of observations per unit of time (1 = annual, 4 = quarterly, 12 = monthly).
For example, for a monthly time series starting 01/2009 and ending 12/2012 the syntax would be as below:
myts <- ts(myvector, start=c(2009, 1), end=c(2012, 12), frequency=12)
Task: Create a time series object (“datasetTS”) for the period covered in your dataset. The input vector is the number of passengers (“Passengers”). You should be able to identify the start and end dates from the dataframe.
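One possible solution, assuming the passenger counts are stored in a column called Passengers and the data run from January 1949 to December 1960 as described above:
datasetTS <- ts(dataset1$Passengers, start = c(1949, 1), end = c(1960, 12), frequency = 12)
datasetTS   # print the series to check it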
If you have done it correctly, the time series object should have the following form.
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432
In dataset1 there is also a column named PassengersMissing. This column contains almost the same values as the original time series, but some values are missing (NA).
Task: Calculate the number of missing values in the PassengersMissing column. We have seen the relevant commands before.
sum(is.na(dataset1$PassengersMissing))
Task: Create a second time series object where the input vector is PassengersMissing. Call the new time series object datasetM.
datasetM<- ts(dataset1$PassengersMissing, start=c(1949,1), end=c(1960,12), frequency=12)
Task: Remove the original dataset (“dataset1”) from the R Environment. Use the function rm().
rm(dataset1)
In LUBS5308 you have seen data imputation methods. Let’s see how you can impute values in a time series and how well the methods perform. For time series imputation, you can use the methods provided by the imputeTS package. In this seminar we will use three functions: na_interpolation(), na_kalman() and na_ma(). na_interpolation() performs linear interpolation, the simplest form of imputation: missing values are estimated by drawing a straight line between the observations just before and just after the gap. na_ma() instead uses a moving-average approach. na_kalman() is the most suitable for time series with seasonality and trend; it fits either a structural time series model by maximum likelihood (the default) or an ARIMA model (we will cover ARIMA models next week).
Task: Complete the time series with imputed values using the four imputations below (na_kalman() is used twice: once with the default structural model and once with an ARIMA model). You can follow the code below, but you will first need to install and load the imputeTS package.
library(imputeTS)
datasetComplete1 <- na_interpolation(datasetM)                         # linear interpolation
datasetComplete2 <- na_ma(datasetM, k = 4, weighting = "exponential")  # exponentially weighted moving average
datasetComplete3 <- na_kalman(datasetM)                                # Kalman smoothing, structural model (default)
datasetComplete4 <- na_kalman(datasetM, model = "auto.arima")          # Kalman smoothing, ARIMA state space
Task: Create a new time series object by combining the complete time series (datasetTS), the time series with the missing data (datasetM) and the four imputed time series. You can use the function cbind(). Check the rows with the missing values and see which imputation method performed best. Call this object whatever name you like.
test <- cbind(datasetTS, datasetM, datasetComplete1, datasetComplete2, datasetComplete3, datasetComplete4)
Task: Remove all the time series objects and keep only the datasetTS.
One way to do this is to switch the Environment pane to Grid view, tick the objects you want to remove, and click the clear (broom) icon.
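Alternatively, you can do this from the console with rm(). A sketch, assuming you used the example object names above (adjust the list if you gave the combined object a name other than test):
rm(datasetM, datasetComplete1, datasetComplete2, datasetComplete3, datasetComplete4, test)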
We can now plot the time series to explore whether patterns such as trend, seasonality, or cyclicality exist.
Task: Plot the time series object using the plot() function.
plot(datasetTS)
Question: What do you observe? Is there any trend? Is there any seasonality? Does this time series call for an additive or a multiplicative decomposition?
To better assess the presence of seasonality, you can also use a seasonal plot from the forecast library. A seasonal plot is similar to a time plot, except that the data are plotted against the individual “seasons” in which they were observed. The function is seasonplot(). You can do this with the following code:
library(forecast)
seasonplot(datasetTS)
In this seminar we will rely on the forecast package. To better understand the patterns in the time series, you can use the simple moving average function, ma(), from the forecast package. To check the syntax of ma(), use the help function: simply type ? followed by the function name.
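For example, either of the following opens the help page for ma():
?ma
help("ma")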
Task: Calculate and plot the moving averages of orders k = 3, 7 and 13. Combine the three plots and the plot of the original series into one graph. We saw how to combine plots in lecture 1 with par(mfrow=...).
par(mfrow = c(2, 2))     # arrange the four plots in a 2 x 2 grid
plot(datasetTS)          # original series
plot(ma(datasetTS, 3))   # moving average of order 3
plot(ma(datasetTS, 7))   # moving average of order 7
plot(ma(datasetTS, 13))  # moving average of order 13
The next task is to decompose the time series using the loess method. From the plots above you may observe that a multiplicative decomposition is more appropriate for this time series, as the size of the seasonal fluctuations varies over time. The stl() function can only handle additive models, but this is not a serious limitation (remember that multiplicative models can be transformed into additive models using a log transformation).
Task: Log-transform the time series object. The function is log(). Call the new object logPass. Plot the new time series. What do you observe?
par(mfrow=c(1,1))
logPass<- log(datasetTS)
plot(logPass)
Task: Use the stl() function to decompose the log-transformed time series and save the result to an object called fit. The argument s.window should be set to “periodic”, as the seasonal pattern in the seasonal plot appears to be the same across all years. Then plot the fit object.
fit <- stl(logPass, s.window="periodic")
plot(fit)
Task: Remove all the objects and keep only the original time series (datasetTS).
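One quick way to do this from the console, as a sketch, is to remove everything in the workspace except datasetTS:
rm(list = setdiff(ls(), "datasetTS"))   # keep only datasetTS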
Now we will use simple exponential smoothing to forecast the next n periods. The function for this is ses(). We will discuss exponential smoothing in the next lecture. You can define the number of periods to forecast using two arguments, as follows: ses(timeseries name, h = number of periods).
Task: Forecast the original time series for the next 10 periods and store the result as an object called fc. Then plot the prediction object using the following syntax:
autoplot(datasetTS) + autolayer(fc)
fc <- ses(datasetTS, h = 10)
autoplot(datasetTS) + autolayer(fc)
Let’s create our training set using the subset() function. As we have discussed, it is very important to see how a model behaves out of sample. We will use all but the last 20 observations to construct the training set. You can do this with the code below:
train <- subset(datasetTS, end = length(datasetTS) - 20)
Task: Compute SES and naive forecasts for the next 20 periods and store them under the names fcses and fcnaive. The syntax of the naive-model function naive() is similar to ses(). Use the train subset.
Question: What do you expect the predicted value of the naive model to be?
fcses <- ses(train, h = 20)
fcnaive <- naive(train, h = 20)
To find the best-performing model we will use the accuracy() function. This function compares the forecasts with the actual values (the first argument should be the forecast object and the second argument the test set from the original data). Accuracy measures are part of the next lecture.
Task: Calculate forecast accuracy measures for the two sets of forecasts and see which one performed better.
accuracy(fcses, datasetTS)
accuracy(fcnaive,datasetTS)
Simple exponential smoothing works fine provided that your data have no trend or seasonality. When trend and seasonality are present, two alternative methods can be used: the Holt linear trend method, which accounts for trend, and the Holt-Winters method, which accounts for both trend and seasonality. Their functions have a syntax similar to ses() and naive(). For the Holt method use the function holt(); for Holt-Winters use the function hw().
Task: Compute Holt and Holt-Winters forecasts for the next 10 periods and save them as fcholt and fchw. Plot the forecasts to see how they behave.
fcholt <- holt(datasetTS, h=10)
summary(fcholt)
plot(fcholt)
fchw<- hw(datasetTS, h=10)
plot(fchw)
The hw() function also has an argument seasonal, which takes the value “additive” or “multiplicative” to select the additive or multiplicative treatment of seasonality. The default is the additive method, but for this particular time series the multiplicative method should be more suitable, as the seasonal fluctuations grow over time. The multiplicative syntax simply adds the argument seasonal=“multiplicative”.
Task: Compute Holt-Winters forecasts with multiplicative seasonality for the next 10 periods.
fchw1 <- hw(datasetTS, h=10, seasonal="multiplicative")
plot(fchw1)
Task: Finally, use the train subset as before. Forecast the last 20 observations using the Holt, Holt-Winters, and Holt-Winters multiplicative methods (you can assign the names fcholtT, fchwT, fchw1T). Compare the accuracy of the models and plot the original series (datasetTS), adding three autolayers for the forecasts of the three models (use the $mean component, for example + autolayer(fcholtT$mean)).
fcholtT <- holt(train, h=20)
fchwT<- hw(train, h=20)
fchw1T <- hw(train, h=20, seasonal="multiplicative")
accuracy(fcholtT, datasetTS)
accuracy(fchwT, datasetTS)
accuracy(fchw1T, datasetTS)
autoplot(datasetTS) + autolayer(fcholtT$mean) + autolayer(fchwT$mean) + autolayer(fchw1T$mean)
Well done, you have completed this week’s seminar!