Time Series terminology

Time Series terminology

1 Dependence- It refers to the association of two observations of the same variable at prior time periods.

2 Stationarity- It shows the mean value of the series that remains constant over the time period. If past effects accumulate and the values increase towards infinity then stationarity is not met.

3 Differencing- Differencing is used to make the series stationary and to control the auto-correlations. There may be some cases in time series analyses where we do not require differencing and over-differenced series can produce wrong estimates.

4 Specification - It may involve the testing of the linear or non-linear relationships of dependent variables by using time series models such as ARIMA models.

5 Exponential Smoothing - Exponential smoothing in time series analysis predicts the one next period value based on the past and current value. It involves averaging of data such that the non-systematic components of each individual case or observation cancel out each other. The exponential smoothing method is used to predict the short term prediction.

ARIMA - ARIMA stands for Auto Regressive Integrated Moving Average.

some useful packages to do our analysis and forcasting

library(tidyverse)
library(fpp3)
library(lubridate)

loading data set…..

amzn1=read.csv("D:\\wallpapers and photos\\csv\\time series\\Amazon.csv")

This data set is about amazon stock market details from 1997-06-13 to 2001-05-02.

Converting the dataset into tsibble will help us to do further time series analysis.

amzn=amzn1 %>% 
  mutate(Date=ymd(Date)) %>% 
  as_tsibble(index = Date) %>% 
  fill_gaps(.full=TRUE) %>% 
  fill(everything(),.direction = "updown")
amzn
## # A tsibble: 8,479 x 7 [1D]
##    Date        Open  High   Low Close Adj.Close   Volume
##    <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <int>
##  1 1997-05-15  2.44  2.5   1.93  1.96      1.96 72156000
##  2 1997-05-16  1.97  1.98  1.71  1.73      1.73 14700000
##  3 1997-05-17  1.76  1.77  1.62  1.71      1.71  6106800
##  4 1997-05-18  1.76  1.77  1.62  1.71      1.71  6106800
##  5 1997-05-19  1.76  1.77  1.62  1.71      1.71  6106800
##  6 1997-05-20  1.73  1.75  1.64  1.64      1.64  5467200
##  7 1997-05-21  1.64  1.65  1.38  1.43      1.43 18853200
##  8 1997-05-22  1.44  1.45  1.31  1.40      1.40 11776800
##  9 1997-05-23  1.41  1.52  1.33  1.5       1.5  15937200
## 10 1997-05-24  1.51  1.65  1.46  1.58      1.58  8697600
## # … with 8,469 more rows

summary of the data

summary(amzn)
##       Date                 Open               High               Low          
##  Min.   :1997-05-15   Min.   :   1.406   Min.   :   1.448   Min.   :   1.312  
##  1st Qu.:2003-03-04   1st Qu.:  38.040   1st Qu.:  38.530   1st Qu.:  37.265  
##  Median :2008-12-22   Median :  83.160   Median :  84.800   Median :  81.250  
##  Mean   :2008-12-22   Mean   : 371.984   Mean   : 376.422   Mean   : 367.441  
##  3rd Qu.:2014-10-11   3rd Qu.: 359.890   3rd Qu.: 362.810   3rd Qu.: 356.175  
##  Max.   :2020-07-31   Max.   :3251.060   Max.   :3344.290   Max.   :3151.000  
##      Close            Adj.Close            Volume         
##  Min.   :   1.396   Min.   :   1.396   Min.   :   487200  
##  1st Qu.:  37.950   1st Qu.:  37.950   1st Qu.:  3653450  
##  Median :  83.050   Median :  83.050   Median :  5625700  
##  Mean   : 372.297   Mean   : 372.297   Mean   :  7348637  
##  3rd Qu.: 359.780   3rd Qu.: 359.780   3rd Qu.:  8407300  
##  Max.   :3200.000   Max.   :3200.000   Max.   :104329200
library(patchwork)
p1=ggplot(amzn,aes(x=Open))+
  geom_density(fill="violet")
p2=ggplot(amzn,aes(x=High))+
  geom_density(fill="purple")
p3=ggplot(amzn,aes(x=Low))+
  geom_density(fill="yellow")
p4=ggplot(amzn,aes(x=Close))+
  geom_density(fill="lightblue")
(p1+p2)/(p3+p4)

which shows the distribution of closing,opening,hightest stock, lowest stock in a day is pretty much same . Perhaps there is no outlier

library(plotly)
plot1=ggplot(amzn,aes(x=Date,y=Close))+
  geom_line() + theme(panel.grid.major = element_line(colour = "darkslategray2",
    linetype = "dashed"), panel.grid.minor = element_line(colour = "darkslategray2",
    linetype = "dashed"), panel.background = element_rect(colour = "white"),
    plot.background = element_rect(fill = "aliceblue",
        colour = "aliceblue")) +labs(title = "Date vs Closing of the day plot")
ggplotly(plot1)
amzn %>% 
  gg_season(Close,period = "month",labels = "both") 

above plot shows closing stock of the day is increasing throughout every year.

The common way to test for seasonality of a time series is to plot the series and check for repeatable patterns in fixed time intervals. So, the types of seasonality is determined by the clock or the calendar.

  1. Hour of day

  2. Day of month

  3. Weekly

  4. Monthly

  5. Yearly

However, if we want a more definitive inspection of the seasonality, use the Autocorrelation Function (ACF) plot. There is a strong seasonal pattern, the ACF plot usually reveals definitive repeated spikes at the multiples of the seasonal window

acf_plot=amzn %>% 
  ACF(Close,lag_max = 100) %>% 
  autoplot()
ggplotly(acf_plot)

Decomposition of a Time Series

Stl_graph=amzn %>% 
  model(STL(Close)) %>% 
  components() %>% 
  autoplot()
ggplotly(Stl_graph)
amzn %>% 
  model(STL(Close)) %>% 
  components()
## # A dable: 8,479 x 8 [1D]
## # Key:     .model [1]
## # :        Close = trend + season_year + season_week + remainder
##    .model     Date       Close trend season_week season_year remainder season_…¹
##    <chr>      <date>     <dbl> <dbl>       <dbl>       <dbl>     <dbl>     <dbl>
##  1 STL(Close) 1997-05-15  1.96 0.719     0.0350       0.223      0.981      1.70
##  2 STL(Close) 1997-05-16  1.73 0.741     0.00770      0.225      0.755      1.50
##  3 STL(Close) 1997-05-17  1.71 0.762     0.0359       0.407      0.503      1.27
##  4 STL(Close) 1997-05-18  1.71 0.784     0.0226       0.471      0.431      1.21
##  5 STL(Close) 1997-05-19  1.71 0.805     0.0210       0.0317     0.850      1.66
##  6 STL(Close) 1997-05-20  1.64 0.827    -0.0414      -0.223      1.07       1.90
##  7 STL(Close) 1997-05-21  1.43 0.848    -0.0786      -0.483      1.14       1.99
##  8 STL(Close) 1997-05-22  1.40 0.870     0.0364      -1.07       1.56       2.43
##  9 STL(Close) 1997-05-23  1.5  0.891     0.0111      -1.08       1.68       2.57
## 10 STL(Close) 1997-05-24  1.58 0.912     0.0303      -1.72       2.36       3.28
## # … with 8,469 more rows, and abbreviated variable name ¹​season_adjust

that means there is good trend in the data after 2015 and in augest it is picking highest seasonality

splitting the data into train data for modeling

train=amzn %>% 
  filter(Date <= "1999-09-17"   )

doing some simple model some times can be useful for some sanity check

train %>% 
  model(mean=MEAN(Close),
        snaive=SNAIVE(Close),
        rw=RW(Close~drift())) %>% 
  forecast(h="12 month") %>% 
  autoplot(train)

SNAIVE

  • The seasonal naive model takes the last observed value from a similar period in the past.
#SNAIVE
fit_snaive=train %>% 
  model(SNAIVE(Close)) %>% 
  forecast()

train %>% 
  model(SNAIVE(Close)) %>% 
  report()
## Series: Close 
## Model: SNAIVE 
## 
## sigma^2: 38.2706
accuracy(fit_snaive,amzn)
## # A tibble: 1 × 10
##   .model        .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>         <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(Close) Test   2.44  6.84  3.98  2.84  5.31  1.19  1.10 0.626
train %>% 
  model(SNAIVE(Close)) %>% 
  gg_tsresiduals()

as we can see visually those models are a bad fit, except SNAIVE

# STL DECOMPOSITION MODEL WITH SEASON ADJUST
fit_dcmp=train %>% 
  model(stlf=decomposition_model(STL(Close),NAIVE(season_adjust)))%>% 
  forecast(h=" 3 month")


fit_dcmp %>% 
  autoplot(train)

train %>% 
  model(stlf=decomposition_model(STL(Close),NAIVE(season_adjust))) %>%
  report()
## Series: Close 
## Model: STL decomposition model 
## Combination: season_adjust + season_year + season_week
## 
## ======================================================
## 
## Series: season_adjust + season_year 
## Model: COMBINATION 
## Combination: season_adjust + season_year
## 
## ========================================
## 
## Series: season_adjust 
## Model: NAIVE 
## 
## sigma^2: 2.0537 
## 
## Series: season_year 
## Model: SNAIVE 
## 
## sigma^2: 0.0065 
## 
## 
## Series: season_week 
## Model: SNAIVE 
## 
## sigma^2: 0.0216
fit_dcmp %>%
  accuracy(amzn)
## # A tibble: 1 × 10
##   .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 stlf   Test   14.9  17.6  15.1  17.4  17.8  4.51  2.84 0.896
#TSLM
fit_tslm=train %>% 
  model(tslm=TSLM(Close~trend()+season() )) %>% 
  forecast(h="1 year")


fit_tslm %>% 
  autoplot(train)

train %>% 
  model(tslm=TSLM(Close~trend()+season() )) %>% 
  report()
## Series: Close 
## Model: TSLM 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.362  -9.465  -2.723   8.456  50.872 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -14.021641   1.469354  -9.543   <2e-16 ***
## trend()         0.095587   0.001884  50.730   <2e-16 ***
## season()week2   0.440766   1.737036   0.254    0.800    
## season()week3   0.230994   1.740593   0.133    0.894    
## season()week4   0.135407   1.740591   0.078    0.938    
## season()week5   0.039821   1.740591   0.023    0.982    
## season()week6  -0.024484   1.740593  -0.014    0.989    
## season()week7   0.153314   1.740598   0.088    0.930    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.62 on 848 degrees of freedom
## Multiple R-squared: 0.7522,  Adjusted R-squared: 0.7501
## F-statistic: 367.7 on 7 and 848 DF, p-value: < 2.22e-16
fit_tslm %>%
  accuracy(amzn)
## # A tibble: 1 × 10
##   .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 tslm   Test  -24.0  36.2  29.2 -58.0  63.9  8.73  5.83 0.990

visually judging this is also a bad fit, because it picking seasonality very badly

ETS:

ETS computes a weighted average over all observations in the input time series dataset as its prediction. The weights are exponentially decreasing over time, rather than the constant weights in simple moving average methods. The weights are dependent on a constant parameter, which is known as the smoothing parameter

#ETS 
fit_ets=train %>% 
  model(ets=ETS(Close) ) %>% 
  forecast(h="6 month") 


 train %>% 
  model(ets=ETS(Close) ) %>% 
  report()
## Series: Close 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 0.004733842 
## 
##   Initial states:
##      l[0]       b[0]
##  1.940126 0.01257808
## 
##   sigma^2:  0.0027
## 
##      AIC     AICc      BIC 
## 5190.770 5190.841 5214.532
fit_ets %>% autoplot(train) 

accuracy(fit_ets,amzn)
## # A tibble: 1 × 10
##   .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ets    Test   5.69  13.6  11.0  5.52  13.8  3.30  2.19 0.948

STATIONARITY

  • now we will do arima test ,first we will check if the data is stationary if not we will convert the data. to check stationarity we will do kpss test
  • let,H0: stationary
#arima
amzn %>% 
  features(Close,unitroot_kpss)
## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1      39.6        0.01
  • thus p value<0.05 meaning the data is not stationary.
amzn %>% 
  features(Close %>% difference() %>% difference()  ,unitroot_ndiffs)
## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      0
amzn %>% 
  gg_tsdisplay(Close %>% difference() %>% difference() ,plot_type = "partial")

amzn %>% 
  features(Close %>% difference() %>% difference()  ,unitroot_kpss)
## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1    0.0238         0.1

thus p value>0.05 meaning the data is stationary.

ARIMA:

  • ARIMA models provide another approach to time series forecasting. Exponential smoothing and ARIMA models are the two most widely used approaches to time series forecasting, and provide complementary approaches to the problem. While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.
#arima

fit_arima=train %>% 
model(arima=ARIMA(Close,stepwise = F,approximation = F)) %>% 
  forecast(h="6 month")

fit_arima %>% 
  autoplot(train)

train %>% 
model(arima=ARIMA(Close,stepwise = F,approximation = F)) %>% 
  report()
## Series: Close 
## Model: ARIMA(2,1,2) 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2
##       1.0299  -0.7904  -1.0300  0.8676
## s.e.  0.0773   0.0827   0.0651  0.0658
## 
## sigma^2 estimated as 4.554:  log likelihood=-1859.36
## AIC=3728.72   AICc=3728.79   BIC=3752.47
accuracy(fit_arima,amzn)
## # A tibble: 1 × 10
##   .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima  Test   12.1  16.3  12.3  14.4  14.7  3.67  2.62 0.934

Difference between ETS and ARIMA time series

  • ETS models focus on the trend and seasonality in the data while ARIMA focuses on the autocorrelations in the data
train %>% 
  model(SNAIVE(Close)) %>% 
  gg_tsresiduals()

train %>% 
model(arima=ARIMA(Close,stepwise = F,approximation = F)) %>% 
  gg_tsresiduals()

LjungBox test

  • The Ljung-Box Q-test is a “portmanteau” test that assesses the null hypothesis that a series of residuals exhibits no autocorrelation for a fixed number of lags L (see Lags ), against the alternative that some autocorrelation coefficient ρ(k), k = 1, ..., L is nonzero

  • The LjungBox test is a type of statistical test of whether any of a group of autocorrelations of a time series are different from zero.

  • H0=model is fine

  • H1=model shows lack of fit

res1=train %>% 
  model(snaive=SNAIVE(Close)) %>% 
  augment() %>% 
  features(.innov,ljung_box,lag=10)
res2=train %>% 
model(arima=ARIMA(Close,stepwise = F,approximation = F)) %>% 
  augment() %>%
  features(.innov,ljung_box,lag=10)
res3=train %>% 
  model(ets=ETS(Close)) %>% 
  augment() %>% 
  features(.innov,ljung_box,lag=10)
res1
## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 snaive   1615.         0
res2
## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 arima     13.2     0.215
res3
## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 ets       11.7     0.308

meaning SNAIVE shows lack of fit. ETS and ARIMA passed the “portmanteau” test

CONCLUSION

  • according to the data or model fit RMSE of SNAIVE<ETS<ARIMA<STLF<TSLM.but ETS has higher AICc than arima . IF we have to choose our model then we will choose SNAVIE ,ETS and ARIMA , but ARIMA has much more white noise in the residual than SNAIVE.Thus ARIMA would be better