Introduction to Time Series
White noise and random walk
Correlation, independence, orthogonality and auto-correlation
Stationarity and non-stationarity
how well we understand the factors that contribute to it;
how much data are available;
whether the forecasts can affect the thing we are trying to forecast.
granularity: every product line, or for groups of products?
granularity: every sales outlet, or for outlets grouped by region, or only for total sales?
weekly data, monthly data or annual data?
Some people use the term “predict” for cross-sectional data and “forecast” for time series data
Time series forecasting uses only information on the variable to be forecast, and makes no attempt to discover the factors which affect its behavior. Therefore it will extrapolate trend and seasonal patterns, but it ignores all other information.
You can also set up time series models with predictors; this is where a lot of new techniques are being added.
\(ED=f(\text{current temperature}, \text{strength of economy}, \text{population}, \text{time of day}, \text{day of week}, \text{error}).\)
OR
\(ED_{t+1}=f(ED_{t},ED_{t−1},ED_{t−2},ED_{t−3},…,error)\)
Just as correlation measures the extent of a linear relationship between two variables, auto-correlation measures the linear relationship between lagged values of a time series.
There are several auto-correlation coefficients, depending on the lag length.
\(r_{k}= \dfrac{\sum_{t=k+1}^{T}(y_{t}-\bar{y})(y_{t-k}-\bar{y})}{\sum_{t=1}^{T} (y_{t}-\bar{y})^{2}}\)
T is the length of the time series.
If you plot the auto-correlation coefficients against the lags, you get the ACF plot.
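As a sanity check on this formula, here is a minimal sketch (using a short made-up vector y, not textbook data) that computes the lag-1 coefficient by hand and compares it with R's built-in acf():
y <- c(5, 7, 6, 8, 9, 7, 10, 11, 9, 12)   # made-up series for illustration
T <- length(y)
ybar <- mean(y)
k <- 1
r1_manual <- sum((y[(k+1):T] - ybar) * (y[1:(T-k)] - ybar)) / sum((y - ybar)^2)
r1_acf <- acf(y, plot=FALSE)$acf[k+1]     # acf() stores lag 0 in the first slot
c(manual=r1_manual, acf=r1_acf)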
Example from textbook
suppressWarnings(library(forecast))
suppressWarnings(library(fpp))
## Loading required package: fma
## Loading required package: expsmooth
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: tseries
beer2 <- window(ausbeer, start=1992, end=2006-.1)
lag.plot(beer2, lags=9, do.lines=FALSE)
head(beer2)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1992 443 410 420 532
## 1993 433 421
Acf(beer2)
Time series that show no auto-correlation are called “white noise” (all auto-correlations are close to zero). The sample auto-correlations are not exactly zero only because of randomness.
For a white noise series, we expect 95% of the spikes in the ACF to lie within \(\pm\dfrac{2}{\sqrt{T}}\), where T is the length of the time series. Those bounds are the blue lines on the ACF plot.
set.seed(60661)
x <- ts(rnorm(150))
plot(x, main="White noise")
acf(x)
A random walk is represented mathematically as \(Y_{t}=Y_{t-1} + w_{t}\), where \(w_{t}\) is white noise.
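To see what this equation produces, a quick sketch (not from the textbook) simulates a random walk by accumulating white noise:
set.seed(60661)
w <- rnorm(150)        # white noise innovations w_t
rw <- ts(cumsum(w))    # cumulative sum gives Y_t = Y_{t-1} + w_t
plot(rw, main="Random walk")
acf(rw)                # ACF decays very slowly, unlike the white noise ACF above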
Autocovariance measures the linear dependency between two points of the same time series observed at different times.
A stationary time series is one whose properties do not depend on the time at which the series is observed.
In mathematics and statistics, a stationary process (a.k.a. a strict(ly) stationary process or strong(ly) stationary process) is a stochastic process whose unconditional joint probability distribution does not change when shifted in time. Consequently, parameters such as mean and variance, if they are present, also do not change over time.
if \(y_{t}\) is a stationary time series, then for all \(s\), the distribution of \((y_{t},…,y_{t+s})\) does not depend on \(t\)
Basic idea is that the laws of probability that govern the process behavior do not change over time.
STRICTLY stationary: the probabilistic behavior of every collection of values \(\{y_{t_{1}},…,y_{t_{k}}\}\) of a time series is identical to that of the time-shifted collection \(\{y_{t_{1}+h},…,y_{t_{k}+h}\}\), for all shifts h.
WEAKLY stationary: a finite-variance process such that (i) the mean function is constant and does not depend on time t, and (ii) the autocovariance function depends on times s and t only through their difference \(|s-t|\).
So time series with trends, or with seasonality, are not stationary — the trend and seasonality will affect the value of the time series at different times.
White noise series is stationary — it does not matter when you observe it, it should look much the same at any period of time.
a stationary time series will have no predictable patterns in the long-term.
For a stationary time series, the ACF will drop to zero relatively quickly, while the ACF of non-stationary data decreases slowly.
For non-stationary data, the value of \(r_{1}\) is often large and positive.
The ADF and KPSS tests are commonly used tests for stationarity.
Differencing can help stabilize the mean of a time series by removing changes in the level of a time series, and so eliminating trend and seasonality.
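A minimal sketch of both ideas on a simulated random walk; adf.test() and kpss.test() come from the tseries package (loaded above via fpp):
library(tseries)
set.seed(60661)
rw <- ts(cumsum(rnorm(150)))   # simulated random walk (non-stationary)
adf.test(rw)                   # H0: non-stationary; a large p-value fails to reject
kpss.test(rw)                  # H0: stationary; a small p-value suggests non-stationarity
adf.test(diff(rw))             # first differencing recovers a stationary series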
Smoothing Time Series (Moving Average / Exponential)
Holt Winters
Regression analysis
Univariate and multivariate regression modeling – model assumptions & multicollinearity
The “error” term does not imply a mistake, but a deviation from the underlying straight-line model. It captures anything that may affect \(y_{i}\) other than \(x_{i}\). We assume that these errors:
have mean zero; otherwise the forecasts will be systematically biased.
are not auto-correlated; otherwise the forecasts will be inefficient as there is more information to be exploited in the data.
are unrelated to the predictor variable; otherwise there would be more information that should be included in the systematic part of the model.
It is also useful to have the errors normally distributed with constant variance in order to produce prediction intervals and to perform statistical inference. While these additional conditions make the calculations simpler, they are not necessary for forecasting.
The forecast values of y obtained from the observed x values are called “fitted values”.
Remember: the residuals play the role of the estimated errors.
\(r\) measures the strength and the direction (positive or negative) of the linear relationship between the two variables. The stronger the linear relationship, the closer the observed data points will cluster around a straight line.
\(\hat{\beta}_{1} = r \cdot s_{y}/s_{x}\)
where \(s_{y}\) and \(s_{x}\) are the standard deviations of y and x.
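A small sketch with made-up data, verifying that this formula reproduces the slope returned by lm():
set.seed(1)
x <- rnorm(50)
y <- 2 + 3*x + rnorm(50)                  # made-up linear relationship with noise
beta1_formula <- cor(x, y) * sd(y) / sd(x)
beta1_lm <- unname(coef(lm(y ~ x))[2])
c(formula=beta1_formula, lm=beta1_lm)     # the two estimates agree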
A non-random pattern may indicate that a non-linear relationship may be required
or some heteroscedasticity is present (i.e., the residuals show non-constant variance)
or there is some left over serial correlation (only when the data are time series).
\(log (y_{i})= \beta_{0} + \beta_{1}log (x_{i})+ \epsilon_{i}\)
In this model, the slope \(\beta_{1}\) can be interpreted as an elasticity: \(\beta_{1}\) is the average percentage change in \(y\) resulting from a 1% change in \(x\).
Example from textbook:
par(mfrow=c(1,2))
fit2 <- lm(log(Carbon) ~ log(City), data=fuel)
plot(jitter(Carbon) ~ jitter(City), xlab="City (mpg)",
ylab="Carbon footprint (tonnes per year)", data=fuel)
lines(1:50, exp(fit2$coef[1]+fit2$coef[2]*log(1:50)))
plot(log(jitter(Carbon)) ~ log(jitter(City)),
xlab="log City mpg", ylab="log carbon footprint", data=fuel)
abline(fit2)
fit.ex3 <- tslm(consumption ~ income, data=usconsumption)
plot(usconsumption, ylab="% change in consumption and income",
plot.type="single", col=1:2, xlab="Year")
legend("topright", legend=c("Consumption","Income"),
lty=1, col=c(1,2), cex=.9)
plot(consumption ~ income, data=usconsumption,
ylab="% change in consumption", xlab="% change in income")
abline(fit.ex3)
summary(fit.ex3)
##
## Call:
## tslm(formula = consumption ~ income, data = usconsumption)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.3681 -0.3237 0.0266 0.3436 1.5581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.52062 0.06231 8.356 2.79e-14 ***
## income 0.31866 0.05226 6.098 7.61e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6274 on 162 degrees of freedom
## Multiple R-squared: 0.1867, Adjusted R-squared: 0.1817
## F-statistic: 37.18 on 1 and 162 DF, p-value: 7.614e-09
Challenges:
Future values of the predictor variable (income in this case) need to be fed into the estimated model, but these are not known in advance; often you have to forecast them separately.
When fitting a regression model to time series data, it is very common to find autocorrelation in the residuals. The estimates are not wrong in the sense of being biased, but they are inefficient and the usual standard errors and prediction intervals can be misleading. A quick residual check is sketched after these notes.
Spurious regression: more often than not, time series data are “non-stationary”; that is, the values of the time series do not fluctuate around a constant mean or with a constant variance. A high \(R^{2}\) combined with high residual autocorrelation can be a sign of spurious regression, which often arises when two highly correlated series are in fact unrelated, such as “# of air passengers” and “rice production”. There is a great example in the textbook. Such regressions often give good short-term forecasts but fail in the long term.
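As flagged above, one quick check on the consumption regression (fit.ex3) is to look at the residual ACF and a Ljung-Box test; a sketch:
res <- residuals(fit.ex3)                      # residuals from the consumption regression
Acf(res, main="ACF of regression residuals")   # spikes beyond the bounds indicate autocorrelation
Box.test(res, lag=10, type="Ljung-Box")        # small p-value: residuals are not white noise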
beer2 <- window(ausbeer,start=1992,end=2006-.1)
fit <- tslm(beer2 ~ trend + season)
summary(fit)
##
## Call:
## tslm(formula = beer2 ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.024 -8.390 0.249 8.619 23.320
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 441.8141 4.5338 97.449 < 2e-16 ***
## trend -0.3820 0.1078 -3.544 0.000854 ***
## season2 -34.0466 4.9174 -6.924 7.18e-09 ***
## season3 -18.0931 4.9209 -3.677 0.000568 ***
## season4 76.0746 4.9268 15.441 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.01 on 51 degrees of freedom
## Multiple R-squared: 0.921, Adjusted R-squared: 0.9149
## F-statistic: 148.7 on 4 and 51 DF, p-value: < 2.2e-16
A closely related issue is multicollinearity which occurs when similar information is provided by two or more of the predictor variables in a multiple regression. It can occur in a number of ways.
Two predictors are highly correlated with each other
A linear combination of predictors is highly correlated with another linear combination of predictors
In most statistical software, if you are not interested in the specific contributions of each predictor, and if the future values of your predictor variables are within their historical ranges, there is nothing to worry about: multicollinearity is not a problem.
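One common diagnostic is the variance inflation factor (VIF); a sketch with made-up data, assuming the car package is installed (it is not loaded elsewhere in these notes):
library(car)
set.seed(2)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd=0.1)    # x2 carries almost the same information as x1
y  <- 1 + x1 + rnorm(100)
vif(lm(y ~ x1 + x2))             # VIFs far above ~10 flag serious multicollinearity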
A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. Sometimes we will refer to a trend “changing direction” when it might go from an increasing trend to a decreasing trend.
A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week). Seasonality is always of a fixed and known period.
A cyclic pattern exists when data exhibit rises and falls that are not of fixed period.
The additive model is most appropriate if the magnitude of the seasonal fluctuations, or the variation around the trend-cycle, does not vary with the level of the time series.
\(y_{t}=S_{t}+T_{t}+E_{t}\)
e.g. toy sales increase by $1 million every Dec.
where \(y_{t}\) is the data at period t, \(S_{t}\) is the seasonal component at period t, \(T_{t}\) is the trend-cycle component at period t, and \(E_{t}\) is the remainder (or irregular or error) component at period t.
When the variation in the seasonal pattern, or the variation around the trend-cycle, appears to be proportional to the level of the time series, then a multiplicative model is more appropriate.
\(y_{t}=S_{t}T_{t}E_{t}\)
OR
\(log(y_{t})=log(S_{t})+log(T_{t})+log(E_{t})\)
e.g. toy sales increase by 42% every Dec.
Example from the book:
fit <- stl(elecequip, s.window=5)
plot(elecequip, col="gray",
main="Electrical equipment manufacturing",
ylab="New orders index", xlab="")
lines(fit$time.series[,2],col="red",ylab="Trend")
plot(fit)
Forecasts of STL objects are obtained by applying a non-seasonal forecasting method to the seasonally adjusted data and re-seasonalizing using the last year of the seasonal component.
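In the forecast package this is done by calling forecast() directly on the stl object fitted above; a sketch using the naive method for the seasonally adjusted series:
fcast <- forecast(fit, method="naive", h=24)   # fit is the stl object fitted above
plot(fcast, ylab="New orders index")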
Moving average smoother:
\(\hat{T}_{t}= \dfrac{1}{m} \sum_{j=-k}^{k} y_{t+j}\)
where \(m=2k+1\). That is, the estimate of the trend-cycle at time t is obtained by averaging values of the time series within k periods of t.
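The ma() function in the forecast package computes this centred moving average; a sketch applying a 5-term moving average (k=2, so m=5) to the elecequip series used above:
ma5 <- ma(elecequip, order=5)    # 5-MA estimate of the trend-cycle
plot(elecequip, col="gray", main="Electrical equipment manufacturing")
lines(ma5, col="red")            # smoothed trend-cycle overlaid on the data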
Simple exponential smoothing: this method is suitable for forecasting data with no trend or seasonal pattern.
Weighted: as discussed in lecture, more recent observations are given more weight, with the weights decaying exponentially as the observations get older.
Holt's linear trend method extends simple exponential smoothing to allow forecasting of data with a trend.
Exponential smoothing with trend and seasonality: the Holt-Winters seasonal method comprises the forecast equation and three smoothing equations, one each for the level, trend, and seasonal components.
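The forecast package implements these as ses(), holt(), and hw(); a sketch applying them to the quarterly beer series used earlier:
fit.ses  <- ses(beer2, h=8)                      # no trend, no seasonality
fit.holt <- holt(beer2, h=8)                     # adds a trend component (Holt's linear trend)
fit.hw   <- hw(beer2, seasonal="additive", h=8)  # Holt-Winters: level, trend and seasonality
plot(fit.hw, main="Holt-Winters forecasts of quarterly beer production")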
Data transformations
Box-Jenkins ARMA models
Box-Jenkins ARIMA models
Stationarity and invertibility
Model specification
Exponential smoothing models are based on a description of trend and seasonality in the data, while ARIMA models aim to describe the autocorrelations in the data.
A stationary time series is one whose properties do not depend on the time at which the series is observed. More precisely, if \(y_{t}\) is a stationary time series, then for all \(s\), the distribution of \((y_{t},…,y_{t+s})\) does not depend on \(t\).
Time series with trends, or with seasonality, are not stationary
White noise series is stationary — it does not matter when you observe it
A time series with cyclic behaviour (but no trend or seasonality) is also stationary. That is because the cycles are not of fixed length, so before observing the series we cannot be sure where the peaks and troughs will be.
In general, a stationary time series will have no predictable patterns in the long-term.
How to determine stationarity?
One way to determine more objectively whether differencing is required is to use a unit root test, e.g. the ADF test (\(H_{0}\): non-stationarity) or the KPSS test (\(H_{0}\): stationarity).
Differencing is a way to make a time series stationary by computing the differences between consecutive observations.
Transformations such as logarithms can help to stabilize the variance of a time series.
Differencing can help stabilize the mean of a time series by removing changes in the level of a time series, and so eliminating trend and seasonality.
When the differenced series is white noise, the model can be written as:
\(y_{t}−y_{t-1}=e_{t}\)
where \(e_{t}\) is white noise.
Random walks typically have long periods of apparent trends up or down, and sudden, unpredictable changes in direction.
The forecasts from a random walk model are equal to the last observation, as future movements are unpredictable
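The naive() and rwf() functions in the forecast package implement exactly this, with rwf(..., drift=TRUE) giving the drift version; a sketch on the beer series:
fc.naive <- naive(beer2, h=8)              # forecasts equal the last observed value
fc.drift <- rwf(beer2, drift=TRUE, h=8)    # random walk with drift
plot(fc.drift, main="Random walk forecasts (with drift)")
lines(fc.naive$mean, col="red")            # flat naive forecasts for comparison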
In an autoregression model, we forecast the variable of interest using a linear combination of past values of the variable.
\(y_{t}=c+\phi_{1}y_{t-1}+\phi_{2}y_{t-2}+\cdots+\phi_{p}y_{t-p}+e_{t}\),
When \(\phi_{1}=0\), \(y_{t}\) is equivalent to white noise.
When \(\phi_{1}=1\) and \(c=0\), \(y_{t}\) is equivalent to a random walk.
When \(\phi_{1}=1\) and \(c\neq0\), \(y_{t}\) is equivalent to a random walk with drift.
Rather than use past values of the forecast variable in a regression, a moving average model uses past forecast errors in a regression-like model. \(y_{t}\) can be thought of as a weighted moving average of the past few forecast errors.
\(y_{t}=c+e_{t}+\theta_{1}e_{t-1}+\theta_{2}e_{t-2}+\cdots+\theta_{q}e_{t-q}\)
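A sketch contrasting the ACF signatures of the two model classes, using simulated AR(1) and MA(1) series (made up, not from the textbook):
set.seed(60661)
ar1 <- arima.sim(model=list(ar=0.8), n=200)   # AR(1) with phi_1 = 0.8
ma1 <- arima.sim(model=list(ma=0.8), n=200)   # MA(1) with theta_1 = 0.8
par(mfrow=c(1,2))
Acf(ar1, main="AR(1): ACF decays gradually")
Acf(ma1, main="MA(1): ACF cuts off after lag 1")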
If we combine differencing with autoregression and a moving average model, we obtain a non-seasonal ARIMA model. The “predictors” on the right hand side include both lagged values of \(y_{t}\) and lagged errors. We call this an ARIMA(p,d,q) model, where
p = order of the autoregressive part;
d = degree of first differencing involved for \(y_{t}\) ;
q = order of the moving average part.
Finding p, d, and q is not trivial, but auto.arima makes it fairly easy. With automation, however, you need to be careful with the order of differencing d and the constant c to make sure nothing breaks, and to make sure you are searching a large enough model space to find the AIC optimum.
White noise ARIMA(0,0,0)
Random walk ARIMA(0,1,0) with no constant
Random walk with drift ARIMA(0,1,0) with a constant
Autoregression ARIMA(p,0,0)
Moving average ARIMA(0,0,q)
It is sometimes possible to use the ACF plot, and the closely related PACF plot, to determine appropriate values for p and q as well.
par(mfrow=c(1,2))
Acf(usconsumption[,1],main="")
Pacf(usconsumption[,1],main="")
If the data are from an ARIMA(p,d,0) or ARIMA(0,d,q) model, then the ACF and PACF plots can be helpful in determining the value of p or q. If both p and q are positive, then the plots do not help in finding suitable values of p and q.
Plot the data. Identify any unusual observations.
If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
If the data are non-stationary: take first differences of the data until the data are stationary.
Examine the ACF/PACF: Is an AR(p) or MA(q) model appropriate?
Try your chosen model(s), and use the AICc to search for a better model.
Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
Once the residuals look like white noise, calculate forecasts.
auto.arima only takes care of steps 3 to 5.
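A sketch of how auto.arima might be used for those steps on the consumption series, with the residual check (step 6) done by hand; stepwise=FALSE and approximation=FALSE widen the search at the cost of speed:
fit.auto <- auto.arima(usconsumption[,1], stepwise=FALSE, approximation=FALSE)
summary(fit.auto)                              # chosen orders p, d, q and the AICc
Acf(residuals(fit.auto), main="ACF of ARIMA residuals")
Box.test(residuals(fit.auto), lag=10, type="Ljung-Box",
         fitdf=length(fit.auto$coef))          # account for the fitted parameters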
Why is \(r^2\) not valid for non-linear regression?
http://statisticsbyjim.com/regression/r-squared-invalid-nonlinear-regression/
What is the mean and variance of White Noise?
White noise has zero mean and finite variance.
How to deal with multicollinearity?
What is drift?
In probability theory, stochastic drift is the change of the average value of a stochastic (random) process.
Reference: