I chose the U.S. Leading Economic Index (LEI) for this analysis. It is a monthly time series that can be used to forecast the U.S. economy, and because the economy can be affected by seasonal factors, it is a suitable subject for this week's discussion. First, we need to load the necessary packages.
library(lubridate)
library(ggplot2)
library(forecast)
library(dplyr)
library(readr)
Now let’s import the LEI data and check its structure.
USLEI <- read_csv("TheConferenceBoard_USLEI_Historical_Data.csv")
## Parsed with column specification:
## cols(
## Date = col_date(format = ""),
## CEI = col_double(),
## LEI = col_double(),
## LAG = col_double(),
## `LEI change` = col_character()
## )
str(USLEI)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 704 obs. of 5 variables:
## $ Date : Date, format: "1959-01-01" "1959-02-01" ...
## $ CEI : num 30.5 30.8 31 31.3 31.5 31.6 31.5 31.1 31.1 31.1 ...
## $ LEI : num 33.3 33.8 34.3 34.4 34.7 34.8 34.9 34.8 34.9 34.7 ...
## $ LAG : num 32.5 32.7 32.7 32.9 33.1 33.2 33.5 33.9 34.1 34.4 ...
## $ LEI change: chr NA "1.50%" "1.50%" "0.30%" ...
## - attr(*, "spec")=
## .. cols(
## .. Date = col_date(format = ""),
## .. CEI = col_double(),
## .. LEI = col_double(),
## .. LAG = col_double(),
## .. `LEI change` = col_character()
## .. )
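Next, I make sure the Date column is stored as an R Date so that R treats it as a calendar date (read_csv has already parsed it this way, so this step is only a safeguard). After that, we can convert the LEI column to a time series object.
USLEI$Date <- as.Date(USLEI$Date, format = '%Y-%m-%d')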
print(paste(
"Our data starts on",
head(USLEI$Date, 1),
", and ends on",
tail(USLEI$Date, 1),
sep = " ")
)
## [1] "Our data starts on 1959-01-01 , and ends on 2020-08-20"
LEI_ts <- ts(USLEI$LEI, start = c(1959, 1), frequency = 12)
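One thing worth checking at this point: ts() ignores the Date column and simply assumes the observations are evenly spaced from the given start. With 704 monthly observations starting in January 1959, the series would end in August 2017, while the last date printed above is 2020-08-20, which suggests the file may not be strictly monthly throughout. A minimal base-R sketch of such a check is shown here; the `dates` vector is a made-up stand-in for USLEI$Date, with one month deliberately removed.

```r
# Hypothetical stand-in for USLEI$Date: 24 month-start dates with one gap.
dates <- seq(as.Date("1959-01-01"), by = "month", length.out = 24)
dates <- dates[-10]  # drop October 1959 to simulate a missing month

# Convert each date to a running month index (year * 12 + month).
lt <- as.POSIXlt(dates)
month_index <- (lt$year + 1900) * 12 + lt$mon

# In a regular monthly series, every step between neighbours is exactly 1.
steps <- diff(month_index)
is_regular <- all(steps == 1)

# Dates that are followed by a gap (or by a duplicated month).
gap_after <- dates[which(steps != 1)]

print(is_regular)
print(gap_after)
```

If the real Date column turns out to have gaps, the start and frequency passed to ts() would no longer line up with the calendar, so the seasonal plots below should be read with that in mind.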
Let’s plot the LEI time series data.
autoplot(LEI_ts)
Overall, the series shows a clear positive long-term trend.
Now, let’s look at it from the seasonal and subseries perspectives:
ggseasonplot(LEI_ts,polar=TRUE)
ggsubseriesplot(LEI_ts)
The plots suggest that the seasonal effect is negligible: the polar seasonal plot shows almost perfect circles, and the subseries plot shows that every month follows a similar pattern with almost the same average LEI.
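The visual impression can be backed up with a number. The sketch below is one possible way to do that, not a formal test: it decomposes a series and compares the spread of the twelve seasonal indices with the overall level. The series `x` is simulated (trend plus noise, no seasonality) purely for illustration; with the real data one would pass LEI_ts instead.

```r
# Simulated monthly series with a trend and noise but no seasonal pattern.
# This is an assumption for illustration, not the LEI data.
set.seed(1)
n <- 240
x <- ts(30 + 0.1 * (1:n) + rnorm(n, sd = 0.5),
        start = c(1959, 1), frequency = 12)

# Classical additive decomposition from base R (stats package).
dec <- decompose(x, type = "additive")

# dec$figure holds the 12 estimated seasonal indices, one per month.
seasonal_range <- diff(range(dec$figure))

# Size of the seasonal swing relative to the average level of the series.
relative_swing <- seasonal_range / mean(x)

print(round(dec$figure, 3))
print(relative_swing)
```

A relative swing close to zero is consistent with the near-circular polar plot: the month of the year explains very little of the movement in the series.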
Then, I use both additive and multiplicative methods to decompose the LEI.
aLEI<-decompose(LEI_ts,type='additive')
autoplot(aLEI)
mLEI<-decompose(LEI_ts,type='multiplicative')
autoplot(mLEI)
Comparing the two plots, the random (remainder) component behaves better under the multiplicative decomposition. In the additive decomposition, the amplitude of the remainder grows over time, while in the multiplicative decomposition it stays roughly steady. Since the variation increases with the level of the series, the multiplicative model is the more useful one here.
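The comparison of the two remainder panels can also be made numerically. The following sketch, under the assumption of a simulated series whose noise is proportional to its level, measures how much the spread of the remainder grows from the first half of the sample to the second half for each decomposition. The helper `spread_ratio` is a hypothetical name introduced here.

```r
# Simulated series with exponential growth and proportional noise.
# This is an assumption for illustration, not the LEI data.
set.seed(42)
n <- 360
trend <- 30 * exp(0.005 * (1:n))
x <- ts(trend * exp(rnorm(n, sd = 0.01)),
        start = c(1959, 1), frequency = 12)

add <- decompose(x, type = "additive")
mul <- decompose(x, type = "multiplicative")

# Ratio of the remainder's standard deviation in the second half of the
# sample to that in the first half; a value near 1 means a steady spread.
spread_ratio <- function(r) {
  r <- as.numeric(r)
  r <- r[!is.na(r)]  # decompose() leaves NAs at both ends of the remainder
  half <- length(r) %/% 2
  sd(r[(half + 1):length(r)]) / sd(r[1:half])
}

add_ratio <- spread_ratio(add$random)
mul_ratio <- spread_ratio(mul$random)

print(c(additive = add_ratio, multiplicative = mul_ratio))
```

A ratio well above 1 for the additive remainder, together with a ratio near 1 for the multiplicative one, is the numerical counterpart of the pattern described above.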
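Finally, we can forecast with a regression model. Note that tslm() fits a linear regression on the original scale; the season * trend interaction gives each month its own trend slope, which lets the seasonal pattern change over time, but it is not the same as the multiplicative decomposition above:
mymodel <- tslm(LEI_ts ~ season * trend)
my_fc <- forecast(mymodel, h = 120)  # 120 months = 10 years ahead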
autoplot(my_fc)
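Because the regression above is fitted on the original scale, one way to carry the multiplicative structure into the forecast is to model the logarithm of the series and back-transform the predictions. The sketch below illustrates the idea with base R's lm() rather than tslm(), on a simulated series; the data, the 24-month horizon, and all variable names are assumptions for illustration only.

```r
# Simulated monthly series with multiplicative noise (not the LEI data).
set.seed(7)
n <- 240
x <- ts(30 * exp(0.004 * (1:n)) * exp(rnorm(n, sd = 0.01)),
        start = c(1959, 1), frequency = 12)

# Regressors: a linear time index and a month-of-year factor.
train <- data.frame(
  y = as.numeric(x),
  trend = 1:n,
  season = factor(as.integer(cycle(x)))
)

# Additive model on the log scale = multiplicative model on the original scale.
fit <- lm(log(y) ~ trend + season, data = train)

# Regressors for the next h months, continuing the index and month cycle.
h <- 24
future <- data.frame(
  trend = (n + 1):(n + h),
  season = factor(((n + seq_len(h) - 1) %% 12) + 1,
                  levels = levels(train$season))
)

# Predict on the log scale with 95% prediction intervals, then back-transform.
# exp() of a log-scale forecast estimates the median, not the mean.
pred <- predict(fit, newdata = future, interval = "prediction", level = 0.95)
fc <- ts(exp(pred), start = tsp(x)[2] + 1 / 12, frequency = 12)

print(head(fc))
```

On the original scale, the intervals from this kind of model widen in proportion to the forecast level, which matches the behaviour seen in the multiplicative decomposition.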