Introduction to Time Series
White noise and random walk
Correlation, independence, orthogonality and auto-correlation
Stationarity and non-stationarity
how well we understand the factors that contribute to it;
how much data are available;
whether the forecasts can affect the thing we are trying to forecast.
granularity: every product line, or for groups of products?
granularity: every sales outlet, or for outlets grouped by region, or only for total sales?
weekly data, monthly data or annual data?
Some people use the term “predict” for cross-sectional data and “forecast” for time series data
Time series forecasting uses only information on the variable to be forecast, and makes no attempt to discover the factors which affect its behavior. Therefore it will extrapolate trend and seasonal patterns, but it ignores all other information.
You can also set up time series models with predictors; this is where a lot of new techniques are being added.
\(ED=f(\text{current temperature}, \text{strength of economy}, \text{population}, \text{time of day}, \text{day of week}, \text{error}).\)
OR
\(ED_{t+1}=f(ED_{t},ED_{t−1},ED_{t−2},ED_{t−3},…,error)\)
Just as correlation measures the extent of a linear relationship between two variables, auto-correlation measures the linear relationship between lagged values of a time series.
There are several auto-correlation coefficients, depending on the lag length.
\(r_{k}= \dfrac{\sum_{t=k+1}^{T}(y_{t}-\bar{y})(y_{t-k}-\bar{y})}{\sum_{t=1}^{T} (y_{t}-\bar{y})^{2}}\)
T is the length of the time series.
If you plot the auto-correlation coefficients against the lags, you get the ACF plot.
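As a sanity check on this formula, here is a minimal sketch (using a short made-up vector y, not textbook data) that computes the lag-1 coefficient by hand and compares it with R's built-in acf():
y <- c(5, 7, 6, 8, 9, 7, 10, 11, 9, 12)   # made-up series for illustration
T <- length(y)
ybar <- mean(y)
k <- 1
r1_manual <- sum((y[(k+1):T] - ybar) * (y[1:(T-k)] - ybar)) / sum((y - ybar)^2)
r1_acf <- acf(y, plot=FALSE)$acf[k+1]     # acf() stores lag 0 in the first slot
c(manual=r1_manual, acf=r1_acf)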
Example from textbook
suppressWarnings(library(forecast))
suppressWarnings(library(fpp))
## Loading required package: fma
## Loading required package: expsmooth
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: tseries
beer2 <- window(ausbeer, start=1992, end=2006-.1)
lag.plot(beer2, lags=9, do.lines=FALSE)
head(beer2)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1992 443 410 420 532
## 1993 433 421
Acf(beer2)
Time series that show no auto-correlation are called “white noise” (all auto-correlations are close to zero). The sample auto-correlations are not exactly zero only because of randomness.
For a white noise series, we expect 95% of the spikes in the ACF to lie within \(\pm\dfrac{2}{\sqrt{T}}\), where T is the length of the time series. Those bounds are the blue lines on the ACF plot.
set.seed(60661)
x <- ts(rnorm(150))
plot(x, main="White noise")
acf(x)
A random walk is represented mathematically as \(Y_{t}=Y_{t-1} + w_{t}\), where \(w_{t}\) is white noise.
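To see what this equation produces, a quick sketch (not from the textbook) simulates a random walk by accumulating white noise:
set.seed(60661)
w <- rnorm(150)        # white noise innovations w_t
rw <- ts(cumsum(w))    # cumulative sum gives Y_t = Y_{t-1} + w_t
plot(rw, main="Random walk")
acf(rw)                # ACF decays very slowly, unlike the white noise ACF above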
Autocovariance measures the linear dependency between two points of the same time series observed at different times.
A stationary time series is one whose properties do not depend on the time at which the series is observed.
In mathematics and statistics, a stationary process (a.k.a. a strict(ly) stationary process or strong(ly) stationary process) is a stochastic process whose unconditional joint probability distribution does not change when shifted in time. Consequently, parameters such as mean and variance, if they are present, also do not change over time.
if \(y_{t}\) is a stationary time series, then for all \(s\), the distribution of \((y_{t},…,y_{t+s})\) does not depend on \(t\)
Basic idea is that the laws of probability that govern the process behavior do not change over time.
STRICTLY stationary: the probabilistic behavior of every collection of values \(\{y_{t_{1}},…,y_{t_{k}}\}\) of a time series is identical to that of the time-shifted collection \(\{y_{t_{1}+h},…,y_{t_{k}+h}\}\), for all shifts h.
WEAKLY stationary: a finite-variance process such that (i) the mean function is constant and does not depend on time t, and (ii) the autocovariance function depends on times s and t only through their difference \(|s-t|\).
So time series with trends, or with seasonality, are not stationary — the trend and seasonality will affect the value of the time series at different times.
White noise series is stationary — it does not matter when you observe it, it should look much the same at any period of time.
a stationary time series will have no predictable patterns in the long-term.
For a stationary time series, the ACF will drop to zero relatively quickly, while the ACF of non-stationary data decreases slowly.
For non-stationary data, the value of \(r_{1}\) is often large and positive.
The ADF and KPSS tests are commonly used tests for stationarity.
Differencing can help stabilize the mean of a time series by removing changes in the level of a time series, and so eliminating trend and seasonality.
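A minimal sketch of both ideas on a simulated random walk; adf.test() and kpss.test() come from the tseries package (loaded above via fpp):
library(tseries)
set.seed(60661)
rw <- ts(cumsum(rnorm(150)))   # simulated random walk (non-stationary)
adf.test(rw)                   # H0: non-stationary; a large p-value fails to reject
kpss.test(rw)                  # H0: stationary; a small p-value suggests non-stationarity
adf.test(diff(rw))             # first differencing recovers a stationary series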
Smoothing Time Series (Moving Average / Exponential)
Holt Winters
Regression analysis
Univariate and multivariate regression modeling – model assumptions & multicollinearity
The “error” term does not imply a mistake, but a deviation from the underlying straight-line model. It captures anything that may affect \(y_{i}\) other than \(x_{i}\). We assume that these errors:
have mean zero; otherwise the forecasts will be systematically biased.
are not auto-correlated; otherwise the forecasts will be inefficient as there is more information to be exploited in the data.
are unrelated to the predictor variable; otherwise there would be more information that should be included in the systematic part of the model.
It is also useful to have the errors normally distributed with constant variance in order to produce prediction intervals and to perform statistical inference. While these additional conditions make the calculations simpler, they are not necessary for forecasting.
The forecast values of y obtained from the observed x values are called “fitted values”.
Remember: the residuals play the role of the estimated errors.
\(r\) measures the strength and the direction (positive or negative) of the linear relationship between the two variables. The stronger the linear relationship, the closer the observed data points will cluster around a straight line.
\(\hat{\beta}_{1} = r \cdot s_{y}/s_{x}\)
where \(s_{y}\) and \(s_{x}\) are the standard deviations of y and x.
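A small sketch with made-up data, verifying that this formula reproduces the slope returned by lm():
set.seed(1)
x <- rnorm(50)
y <- 2 + 3*x + rnorm(50)                  # made-up linear relationship with noise
beta1_formula <- cor(x, y) * sd(y) / sd(x)
beta1_lm <- unname(coef(lm(y ~ x))[2])
c(formula=beta1_formula, lm=beta1_lm)     # the two estimates agree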
A non-random pattern may indicate that a non-linear relationship may be required
or some heteroscedasticity is present (i.e., the residuals show non-constant variance)
or there is some left over serial correlation (only when the data are time series).
\(log (y_{i})= \beta_{0} + \beta_{1}log (x_{i})+ \epsilon_{i}\)
In this model, the slope \(\beta_{1}\) can be interpreted as an elasticity: \(\beta_{1}\) is the average percentage change in \(y\) resulting from a 1% change in \(x\).
Example from textbook:
par(mfrow=c(1,2))
fit2 <- lm(log(Carbon) ~ log(City), data=fuel)
plot(jitter(Carbon) ~ jitter(City), xlab="City (mpg)",
ylab="Carbon footprint (tonnes per year)", data=fuel)
lines(1:50, exp(fit2$coef[1]+fit2$coef[2]*log(1:50)))
plot(log(jitter(Carbon)) ~ log(jitter(City)),
xlab="log City mpg", ylab="log carbon footprint", data=fuel)
abline(fit2)
fit.ex3 <- tslm(consumption ~ income, data=usconsumption)
plot(usconsumption, ylab="% change in consumption and income",
plot.type="single", col=1:2, xlab="Year")
legend("topright", legend=c("Consumption","Income"),
lty=1, col=c(1,2), cex=.9)
plot(consumption ~ income, data=usconsumption,
ylab="% change in consumption", xlab="% change in income")
abline(fit.ex3)
summary(fit.ex3)
##
## Call:
## tslm(formula = consumption ~ income, data = usconsumption)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.3681 -0.3237 0.0266 0.3436 1.5581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.52062 0.06231 8.356 2.79e-14 ***
## income 0.31866 0.05226 6.098 7.61e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6274 on 162 degrees of freedom
## Multiple R-squared: 0.1867, Adjusted R-squared: 0.1817
## F-statistic: 37.18 on 1 and 162 DF, p-value: 7.614e-09
Challenges:
Future values of the predictor variable (income in this case) need to be fed into the estimated model, but these are not known in advance; often you have to forecast them separately.
When fitting a regression model to time series data, it is very common to find autocorrelation in the residuals. The estimates are not wrong in the sense of being biased, but they are inefficient and the usual standard errors and prediction intervals can be misleading. A quick residual check is sketched after these notes.
Spurious regression: more often than not, time series data are “non-stationary”; that is, the values of the time series do not fluctuate around a constant mean or with a constant variance. A high \(R^{2}\) combined with high residual autocorrelation can be a sign of spurious regression, which often arises when two highly correlated series are in fact unrelated, such as “# of air passengers” and “rice production”. There is a great example in the textbook. Such regressions often give good short-term forecasts but fail in the long term.
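As flagged above, one quick check on the consumption regression (fit.ex3) is to look at the residual ACF and a Ljung-Box test; a sketch:
res <- residuals(fit.ex3)                      # residuals from the consumption regression
Acf(res, main="ACF of regression residuals")   # spikes beyond the bounds indicate autocorrelation
Box.test(res, lag=10, type="Ljung-Box")        # small p-value: residuals are not white noise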
beer2 <- window(ausbeer,start=1992,end=2006-.1)
fit <- tslm(beer2 ~ trend + season)
summary(fit)
##
## Call:
## tslm(formula = beer2 ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.024 -8.390 0.249 8.619 23.320
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 441.8141 4.5338 97.449 < 2e-16 ***
## trend -0.3820 0.1078 -3.544 0.000854 ***
## season2 -34.0466 4.9174 -6.924 7.18e-09 ***
## season3 -18.0931 4.9209 -3.677 0.000568 ***
## season4 76.0746 4.9268 15.441 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.01 on 51 degrees of freedom
## Multiple R-squared: 0.921, Adjusted R-squared: 0.9149
## F-statistic: 148.7 on 4 and 51 DF, p-value: < 2.2e-16
A closely related issue is multicollinearity which occurs when similar information is provided by two or more of the predictor variables in a multiple regression. It can occur in a number of ways.
Two predictors are highly correlated with each other
A linear combination of predictors is highly correlated with another linear combination of predictors
In most statistical software, if you are not interested in the specific contributions of each predictor, and if the future values of your predictor variables are within their historical ranges, there is nothing to worry about: multicollinearity is not a problem.
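One common diagnostic is the variance inflation factor (VIF); a sketch with made-up data, assuming the car package is installed (it is not loaded elsewhere in these notes):
library(car)
set.seed(2)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd=0.1)    # x2 carries almost the same information as x1
y  <- 1 + x1 + rnorm(100)
vif(lm(y ~ x1 + x2))             # VIFs far above ~10 flag serious multicollinearity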
A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. Sometimes we will refer to a trend “changing direction” when it might go from an increasing trend to a decreasing trend.
A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week). Seasonality is always of a fixed and known period.
A cyclic pattern exists when data exhibit rises and falls that are not of fixed period.
The additive model is most appropriate if the magnitude of the seasonal fluctuations, or the variation around the trend-cycle, does not vary with the level of the time series.
\(y_{t}=S_{t}+T_{t}+E_{t}\)
e.g. toy sales increase by $1 million every Dec.
where \(y_{t}\) is the data at period t, \(S_{t}\) is the seasonal component at period t, \(T_{t}\) is the trend-cycle component at period t, and \(E_{t}\) is the remainder (or irregular or error) component at period t.
When the variation in the seasonal pattern, or the variation around the trend-cycle, appears to be proportional to the level of the time series, then a multiplicative model is more appropriate.
\(y_{t}=S_{t}T_{t}E_{t}\)
OR
\(log(y_{t})=log(S_{t})+log(T_{t})+log(E_{t})\)
e.g. toy sales increase by 42% every Dec.
Example from the book:
fit <- stl(elecequip, s.window=5)
plot(elecequip, col="gray",
main="Electrical equipment manufacturing",
ylab="New orders index", xlab="")
lines(fit$time.series[,2],col="red",ylab="Trend")
plot(fit)
Forecasts of STL objects are obtained by applying a non-seasonal forecasting method to the seasonally adjusted data and re-seasonalizing using the last year of the seasonal component.
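In the forecast package this is done by calling forecast() directly on the stl object fitted above; a sketch using the naive method for the seasonally adjusted series:
fcast <- forecast(fit, method="naive", h=24)   # fit is the stl object fitted above
plot(fcast, ylab="New orders index")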
Moving average smoother:
\(\hat{T}_{t}= \dfrac{1}{m} \sum_{j=-k}^{k} y_{t+j}\)
where \(m=2k+1\). That is, the estimate of the trend-cycle at time t is obtained by averaging values of the time series within k periods of t.
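The ma() function in the forecast package computes this centred moving average; a sketch applying a 5-term moving average (k=2, so m=5) to the elecequip series used above:
ma5 <- ma(elecequip, order=5)    # 5-MA estimate of the trend-cycle
plot(elecequip, col="gray", main="Electrical equipment manufacturing")
lines(ma5, col="red")            # smoothed trend-cycle overlaid on the data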
Simple exponential smoothing: this method is suitable for forecasting data with no trend or seasonal pattern.
Weighted: as discussed in lecture, more recent observations are given more weight, with the weights decaying exponentially as the observations get older.
Holt's linear trend method extends simple exponential smoothing to allow forecasting of data with a trend.
Exponential smoothing with trend and seasonality: the Holt-Winters seasonal method comprises the forecast equation and three smoothing equations, one each for the level, trend, and seasonal components.
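The forecast package implements these as ses(), holt(), and hw(); a sketch applying them to the quarterly beer series used earlier:
fit.ses  <- ses(beer2, h=8)                      # no trend, no seasonality
fit.holt <- holt(beer2, h=8)                     # adds a trend component (Holt's linear trend)
fit.hw   <- hw(beer2, seasonal="additive", h=8)  # Holt-Winters: level, trend and seasonality
plot(fit.hw, main="Holt-Winters forecasts of quarterly beer production")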
Data transformations
Box-Jenkins ARMA models
Box-Jenkins ARIMA models
Stationarity and invertibility
Model specification
Exponential smoothing models are based on a description of trend and seasonality in the data, while ARIMA models aim to describe the autocorrelations in the data.
A stationary time series is one whose properties do not depend on the time at which the series is observed. More precisely, if \(y_{t}\) is a stationary time series, then for all \(s\), the distribution of \((y_{t},…,y_{t+s})\) does not depend on \(t\).
Time series with trends, or with seasonality, are not stationary
White noise series is stationary — it does not matter when you observe it
A time series with cyclic behaviour (but no trend or seasonality) is also stationary. That is because the cycles are not of fixed length, so before observing the series we cannot be sure where the peaks and troughs will be.
In general, a stationary time series will have no predictable patterns in the long-term.
How to determine stationarity?
One way to determine more objectively whether differencing is required is to use a unit root test, e.g. the ADF test (\(H_{0}\): non-stationarity) or the KPSS test (\(H_{0}\): stationarity).
Differencing is a way to make a time series stationary by computing the differences between consecutive observations.
Transformations such as logarithms can help to stabilize the variance of a time series.
Differencing can help stabilize the mean of a time series by removing changes in the level of a time series, and so eliminating trend and seasonality.
When the differenced series is white noise, the model can be written as:
\(y_{t}−y_{t-1}=e_{t}\)
where \(e_{t}\) is white noise.
Random walks typically have long periods of apparent trends up or down, and sudden, unpredictable changes in direction.
The forecasts from a random walk model are equal to the last observation, as future movements are unpredictable
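The naive() and rwf() functions in the forecast package implement exactly this, with rwf(..., drift=TRUE) giving the drift version; a sketch on the beer series:
fc.naive <- naive(beer2, h=8)              # forecasts equal the last observed value
fc.drift <- rwf(beer2, drift=TRUE, h=8)    # random walk with drift
plot(fc.drift, main="Random walk forecasts (with drift)")
lines(fc.naive$mean, col="red")            # flat naive forecasts for comparison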
In an autoregression model, we forecast the variable of interest using a linear combination of past values of the variable.
\(y_{t}=c+\phi_{1}y_{t-1}+\phi_{2}y_{t-2}+\cdots+\phi_{p}y_{t-p}+e_{t}\),
When \(\phi_{1}=0\), \(y_{t}\) is equivalent to white noise.
When \(\phi_{1}=1\) and \(c=0\), \(y_{t}\) is equivalent to a random walk.
When \(\phi_{1}=1\) and \(c\neq0\), \(y_{t}\) is equivalent to a random walk with drift.
Rather than use past values of the forecast variable in a regression, a moving average model uses past forecast errors in a regression-like model. \(y_{t}\) can be thought of as a weighted moving average of the past few forecast errors.
\(y_{t}=c+e_{t}+\theta_{1}e_{t-1}+\theta_{2}e_{t-2}+\cdots+\theta_{q}e_{t-q}\)
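A sketch contrasting the ACF signatures of the two model classes, using simulated AR(1) and MA(1) series (made up, not from the textbook):
set.seed(60661)
ar1 <- arima.sim(model=list(ar=0.8), n=200)   # AR(1) with phi_1 = 0.8
ma1 <- arima.sim(model=list(ma=0.8), n=200)   # MA(1) with theta_1 = 0.8
par(mfrow=c(1,2))
Acf(ar1, main="AR(1): ACF decays gradually")
Acf(ma1, main="MA(1): ACF cuts off after lag 1")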
If we combine differencing with autoregression and a moving average model, we obtain a non-seasonal ARIMA model. The “predictors” on the right hand side include both lagged values of \(y_{t}\) and lagged errors. We call this an ARIMA(p,d,q) model, where
p = order of the autoregressive part;
d = degree of first differencing involved for \(y_{t}\) ;
q = order of the moving average part.
Finding p, d, and q is not trivial, but auto.arima makes it fairly easy. With automation, however, you need to be careful with the order of differencing d and the constant c to make sure nothing breaks, and to make sure you are searching a large enough model space to find the AIC optimum.
White noise ARIMA(0,0,0)
Random walk ARIMA(0,1,0) with no constant
Random walk with drift ARIMA(0,1,0) with a constant
Autoregression ARIMA(p,0,0)
Moving average ARIMA(0,0,q)
It is sometimes possible to use the ACF plot, and the closely related PACF plot, to determine appropriate values for p and q as well.
par(mfrow=c(1,2))
Acf(usconsumption[,1],main="")
Pacf(usconsumption[,1],main="")
If the data are from an ARIMA(p,d,0) or ARIMA(0,d,q) model, then the ACF and PACF plots can be helpful in determining the value of p or q. If both p and q are positive, then the plots do not help in finding suitable values of p and q.
Plot the data. Identify any unusual observations.
If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
If the data are non-stationary: take first differences of the data until the data are stationary.
Examine the ACF/PACF: Is an AR(p) or MA(q) model appropriate?
Try your chosen model(s), and use the AICc to search for a better model.
Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
Once the residuals look like white noise, calculate forecasts.
auto.arima only takes care of steps 3 to 5.
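A sketch of how auto.arima might be used for those steps on the consumption series, with the residual check (step 6) done by hand; stepwise=FALSE and approximation=FALSE widen the search at the cost of speed:
fit.auto <- auto.arima(usconsumption[,1], stepwise=FALSE, approximation=FALSE)
summary(fit.auto)                              # chosen orders p, d, q and the AICc
Acf(residuals(fit.auto), main="ACF of ARIMA residuals")
Box.test(residuals(fit.auto), lag=10, type="Ljung-Box",
         fitdf=length(fit.auto$coef))          # account for the fitted parameters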
Why is \(r^2\) not valid for non-linear regression?
http://statisticsbyjim.com/regression/r-squared-invalid-nonlinear-regression/
What is the mean and variance of White Noise?
White noise has zero mean and finite variance.
How to deal with multicollinearity?
What is drift?
In probability theory, stochastic drift is the change of the average value of a stochastic (random) process.
Reference: