A very high-level overview. I have a whole course next semester, so until then, this knowledge should hopefully suffice.
Cross-sectional analysis is the analysis of various variables at one instant in time, used to compare the variable of our interest with other variables. Generally, time series analysis is accompanied by cross-sectional analysis.
Differences between TSA and regression:
Extrapolation (TSA) vs. interpolation (normal regression).
Prediction intervals (confidence intervals for predictions) grow wider with the forecast horizon, since uncertainty piles up over time.
Components of a time series:
Trend
Cyclical
Seasonal
Randomness
We’ll use the AirPassengers dataset (built into R), which records monthly international airline passenger totals from 1949 to 1960.
```r
library(tseries)
plot(AirPassengers)
```

Tests:
Visual inspection (normal plot)
Seasonal subseries plots (roughly, local averages within each season)
Box plots
Correlogram
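A minimal sketch of these checks in base R, assuming the AirPassengers series from above; monthplot() gives the seasonal subseries plot, and decompose() separates out the components listed earlier:

```r
plot(decompose(AirPassengers))               # trend + seasonal + random components
monthplot(AirPassengers)                     # seasonal subseries plot (per-month subseries)
boxplot(AirPassengers ~ cycle(AirPassengers),
        xlab = "Month", ylab = "Passengers") # one box per calendar month
```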
Randomness could be in the form of white noise: expectation 0, constant standard deviation, and zero correlation between different lags (so no seasonality either).
If your model residuals are white noise, good job. You’ve captured everything worth capturing in your model.
Data = Signal + Noise
To check whether something is white noise: inspect it visually, check local rolling (convolution-style) averages (they should be roughly constant), and check the ACFs (they should be insignificant).
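As a sanity check, here are the same inspections run on simulated white noise; Box.test() (the Ljung-Box portmanteau test) is an extra formal check, not mentioned above:

```r
set.seed(42)
wn <- rnorm(144)                             # simulated white noise, same length as our data
plot.ts(wn)                                  # visual: no trend, no seasonality
plot(filter(wn, rep(1 / 12, 12)))            # rolling 12-point mean: roughly constant
acf(wn)                                      # all lags should sit inside the bands
Box.test(wn, lag = 12, type = "Ljung-Box")   # large p-value => consistent with white noise
```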
Stationarity requires:
Constant mean and variance
No seasonality
Check visually, or compare several local means or variances (as sketched below).
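One rough way to compare local means and variances, splitting the series into its 12 calendar years:

```r
blocks <- matrix(AirPassengers, nrow = 12)   # one column per year (12 months x 12 years)
apply(blocks, 2, mean)                       # yearly means drift upward
apply(blocks, 2, var)                        # yearly variances grow as well
```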
For our current dataset, safe to say that it’s not stationary.
The maths, though, needs careful reading: adf.test() actually rejects its null hypothesis of a unit root (p < 0.01, alternative: stationary). Since tseries fits the ADF regression with a constant and a linear trend, this only suggests the series may be trend-stationary; it doesn’t contradict the obvious trend and growing variance.
```r
adf.test(AirPassengers)
```

Warning in adf.test(AirPassengers): p-value smaller than printed p-value

	Augmented Dickey-Fuller Test

data:  AirPassengers
Dickey-Fuller = -7.3186, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary
If the mean isn’t constant, take the first difference (\(y_t - y_{t-1}\)).
```r
plot(diff(AirPassengers))
```

Well, we tried. The variance isn’t constant, and there’s still seasonality.
Alternatively, just take the trend to be linear (or some other simple form) and model the residual.
Take the log or square root to stabilise the variance.
```r
# Log first, then difference: log(diff(...)) produces NaNs,
# since the differenced series has negative values.
plot(diff(log(AirPassengers)))
```
The ACF at lag \(p\) is the simple Pearson correlation coefficient between \(A_t\) and \(A_{t-p}\). (It may not reflect actual predictive value, because correlation gets passed on indirectly through intermediate lags.) The number of significant ACF lags is a useful estimate of the number of moving average (MA) coefficients in the model.
```r
acf(AirPassengers)
```

In our case, the data is highly regular, so the ACF is high at most lags. A slowly decaying ACF suggests an autoregressive process.
The PACF at lag \(p\) is the coefficient on \(A_{t-p}\) in a regression of \(A_t\) on all lags up to \(p\). It is more reflective of actual predictive utility, as it accounts for and removes redundant correlation (which is passed on indirectly through intermediate time steps). It helps you identify the number of autoregressive (AR) coefficients in an ARIMA model.
```r
pacf(AirPassengers)
```

Roughly 3 useful lags.
Autoregression (AR): a linear regression of the series on its own past values (see the equation below).
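For reference, the AR(p) model written out in standard notation:
\[ y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t \]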
```r
# 10 years of training data
modelo <- arma(AirPassengers[1:120], order = c(2, 0))
summary(modelo)
```

Call:
arma(x = AirPassengers[1:120], order = c(2, 0))

Model:
ARMA(2,0)

Residuals:
    Min      1Q  Median      3Q     Max
-90.334 -18.958  -5.639  14.965  73.367

Coefficient(s):
           Estimate  Std. Error  t value  Pr(>|t|)
ar1         1.24926     0.08706   14.349   < 2e-16 ***
ar2        -0.31255     0.08650   -3.613  0.000303 ***
intercept  16.91794     6.92338    2.444  0.014542 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Fit:
sigma^2 estimated as 732.4, Conditional Sum-of-Squares = 85686.21, AIC = 1138.1
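To actually forecast, here’s a sketch that refits the same AR(2) with base R’s arima() (a substitution on my part; it has a predict() method) and forecasts the held-out two years. Note how the standard errors grow with the horizon, matching the point about prediction intervals above:

```r
fit <- arima(AirPassengers[1:120], order = c(2, 0, 0))        # equivalent AR(2) fit
fc  <- predict(fit, n.ahead = 24)                             # forecast the held-out 2 years
ts.plot(ts(as.numeric(AirPassengers)), fc$pred, lty = c(1, 2))
fc$se                                                         # widens with the horizon
```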
Moving average (MA): we model the series in terms of moving averages of the “errors” left over from past forecasts (see the equation below).
An MA model of order n can only predict n periods ahead.
Not too useful on its own.
MA(1) <-> AR(∞)
MA(∞) <-> AR(1)
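For reference, the MA(q) model in the same notation (the MA(1) <-> AR(∞) duality falls out of recursively substituting for the lagged error terms):
\[ y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} \]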
In a moving average, all samples are smoothed with equal weight, whereas in exponential smoothing, the further back in the past a sample is, the less weight it gets.
\[ F_t = F_{t-1} + \alpha (A_{t-1} - F_{t-1}) \]
(\(F\) = forecast, \(A\) = actual value)
Double exponential smoothing (Holt): also accounts for trend.
Triple exponential smoothing (Holt-Winters): also accounts for seasonality.
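All three flavours live in base R’s HoltWinters(); a minimal sketch with default parameters (my own example, not from the source):

```r
ses  <- HoltWinters(AirPassengers, beta = FALSE, gamma = FALSE)  # simple: level only
holt <- HoltWinters(AirPassengers, gamma = FALSE)                # double: level + trend
hw   <- HoltWinters(AirPassengers)                               # triple: + seasonality
plot(hw)                                     # fitted vs observed
plot(predict(hw, n.ahead = 24))              # two-year forecast
```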
Just use multivariate Box-Jenkins, duh.
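A hypothetical sketch of that route via the vars package (VAR is one standard multivariate option; series1 and series2 are placeholder names, not from the source):

```r
library(vars)                    # assumes the 'vars' package is installed
y   <- cbind(series1, series2)   # hypothetical pair of related series
fit <- VAR(y, p = 2)             # vector autoregression with 2 lags
predict(fit, n.ahead = 12)       # joint 12-step forecast
```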
Time Series Analysis: YouTube playlist by ritvikmath (quite haphazardly ordered)