SESIÓN 3. Componentes y ARIMA en R

Autor/a

Gerson Rivera

Fecha de publicación

15 agosto 2024

Componentes de ST en R

Activar paquetes

library(tseries)
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
library(forecast)

Simular serie estacionaria

x <- rnorm(1000) 
#x

Test Dickey Fuller

adf.test(x) # Dickey Fuller Test

    Augmented Dickey-Fuller Test

data:  x
Dickey-Fuller = -11.575, Lag order = 9, p-value = 0.01
alternative hypothesis: stationary

Diferencias

ndiffs(x)
[1] 0
#nsdiffs(x)

Utilizando serie notten

  • Temperaturas mensuales promedio en Nottingham, en el periodo 1920–1939. Es un objeto de la clase serie temporal que contiene la temperatura media del aire en el castillo de Nottingham en grados Fahrenheit, durante 20 años.
#library(ggplot2)
plot(nottem)

autoplot(nottem)

Descomposición de componentes

  • Tendencia
  • Estacionalidad
  • Aleatoriedad
plot(decompose(nottem))

Aplicamos el test de Dickey Fuller

adf.test(nottem)

    Augmented Dickey-Fuller Test

data:  nottem
Dickey-Fuller = -12.998, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary

Otras exploraciones

BoxCox.lambda(nottem)
[1] 0.04214111
ndiffs(nottem)
[1] 0
acf(nottem)

Serie no estacionaria

y <- diffinv(x) 
plot(y)

adf.test(y)

    Augmented Dickey-Fuller Test

data:  y
Dickey-Fuller = -1.6065, Lag order = 9, p-value = 0.7449
alternative hypothesis: stationary
acf(y)

ACF y PACF

acf(nottem, lag.max = 20, plot = T)

pacf(nottem, lag.max =20, plot = T)

  • \underline{\text{Nota}}. Los lag.max hacen referencia al número máximo de retrasos y la estructura plot = F se utiliza cuando no se requiere el gráfico solo los valores.

ACF de ruido blanco (wn)

acf(x, plot = T)

Alternativa 1, para la descomposición

autoplot(decompose(nottem, type = "additive"))

Alternativa 2, para la descomposición

plot(stl(nottem, s.window="periodic")) 

#s.window: seasonal window

Extraer los componentes de ST

mynottem = decompose(nottem, "additive")
class(mynottem)
[1] "decomposed.ts"
plot(mynottem)

Arima en R

Activar paquetes

library(forecast)
library(tseries)

Base a utilizar (ST)

  • La base corresponde al número de linces atrapados entre 1821 a 1834 en Canadá.
data("lynx")
lynx
Time Series:
Start = 1821 
End = 1934 
Frequency = 1 
  [1]  269  321  585  871 1475 2821 3928 5943 4950 2577  523   98  184  279  409
 [16] 2285 2685 3409 1824  409  151   45   68  213  546 1033 2129 2536  957  361
 [31]  377  225  360  731 1638 2725 2871 2119  684  299  236  245  552 1623 3311
 [46] 6721 4254  687  255  473  358  784 1594 1676 2251 1426  756  299  201  229
 [61]  469  736 2042 2811 4431 2511  389   73   39   49   59  188  377 1292 4031
 [76] 3495  587  105  153  387  758 1307 3465 6991 6313 3794 1836  345  382  808
 [91] 1388 2713 3800 3091 2985 3790  674   81   80  108  229  399 1132 2432 3574
[106] 2935 1537  529  485  662 1000 1590 2657 3396

Comportamiento gráfico

plot(lynx)

tsdisplay(lynx) #autoregresión

acf(lynx)

Modelar AR(2)

adf.test(lynx)

    Augmented Dickey-Fuller Test

data:  lynx
Dickey-Fuller = -6.3068, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary
myarima1 = arima(lynx, order = c(2,0,0))
myarima1

Call:
arima(x = lynx, order = c(2, 0, 0))

Coefficients:
         ar1      ar2  intercept
      1.1474  -0.5997  1545.4458
s.e.  0.0742   0.0740   181.6736

sigma^2 estimated as 768159:  log likelihood = -935.02,  aic = 1878.03
model=Arima(lynx,order=c(2,0,0))
model
Series: lynx 
ARIMA(2,0,0) with non-zero mean 

Coefficients:
         ar1      ar2       mean
      1.1474  -0.5997  1545.4458
s.e.  0.0742   0.0740   181.6736

sigma^2 = 788920:  log likelihood = -935.02
AIC=1878.03   AICc=1878.4   BIC=1888.98
summary(model)
Series: lynx 
ARIMA(2,0,0) with non-zero mean 

Coefficients:
         ar1      ar2       mean
      1.1474  -0.5997  1545.4458
s.e.  0.0742   0.0740   181.6736

sigma^2 = 788920:  log likelihood = -935.02
AIC=1878.03   AICc=1878.4   BIC=1888.98

Training set error measures:
                    ME     RMSE      MAE       MPE     MAPE      MASE
Training set 0.3615075 876.4468 631.7405 -74.99125 153.9046 0.7603467
                     ACF1
Training set -0.008188618
coef(myarima1)
         ar1          ar2    intercept 
   1.1474360   -0.5997452 1545.4458439 
tail(lynx)
Time Series:
Start = 1929 
End = 1934 
Frequency = 1 
[1]  485  662 1000 1590 2657 3396

Análisis de los residuos

plot(residuals(myarima1))

Pronóstico

pred=forecast(myarima1,h=5)
plot(lynx)

#lines(pred)
plot(pred,col=c('red','green'))

Modelar MA(2)

myarima = arima(lynx, order = c(0,0,2))
summary(myarima)

Call:
arima(x = lynx, order = c(0, 0, 2))

Coefficients:
         ma1     ma2  intercept
      1.1407  0.4697  1545.3670
s.e.  0.0776  0.0721   224.5215

sigma^2 estimated as 855092:  log likelihood = -941.03,  aic = 1890.06

Training set error measures:
                   ME     RMSE      MAE       MPE     MAPE      MASE       ACF1
Training set 3.733296 924.7119 667.1134 -123.7496 169.1603 0.8029206 0.08937316
coefficients(myarima)
         ma1          ma2    intercept 
   1.1407405    0.4696918 1545.3670440 
plot(residuals(myarima))

Test DF

adf.test(lynx)
Warning in adf.test(lynx): p-value smaller than printed p-value

    Augmented Dickey-Fuller Test

data:  lynx
Dickey-Fuller = -6.3068, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary

ARIMA(p,d,q) del paquete forecast

myarima <- Arima(lynx, order = c(4,0,0))
myarima
Series: lynx 
ARIMA(4,0,0) with non-zero mean 

Coefficients:
         ar1      ar2     ar3      ar4       mean
      1.1246  -0.7174  0.2634  -0.2543  1547.3859
s.e.  0.0903   0.1367  0.1361   0.0897   136.8501

sigma^2 = 748457:  log likelihood = -931.11
AIC=1874.22   AICc=1875.01   BIC=1890.64
checkresiduals(myarima)


    Ljung-Box test

data:  Residuals from ARIMA(4,0,0) with non-zero mean
Q* = 13.201, df = 6, p-value = 0.03996

Model df: 4.   Total lags used: 10
Box.test(myarima$residuals)

    Box-Pierce test

data:  myarima$residuals
X-squared = 0.02872, df = 1, p-value = 0.8654

Forecast de 10 años

arimafore <- forecast(myarima, h = 10)
plot(arimafore)

Valores estimados del futuro

arimafore$mean
Time Series:
Start = 1935 
End = 1944 
Frequency = 1 
 [1] 2980.7782 2114.6447 1361.7211  839.0137  668.7873  874.3079 1281.3753
 [8] 1679.8363 1933.3503 1987.5494

Ultimas observaciones y forecast

plot(arimafore, xlim = c(1930, 1944))