TIME SERIES ANALYSIS

library(TSA)

## Warning: package 'TSA' was built under R version 3.5.1

## 
## Attaching package: 'TSA'

## The following objects are masked from 'package:stats':
## 
##     acf, arima

## The following object is masked from 'package:utils':
## 
##     tar

library(ggplot2)
library(timeSeries)

## Warning: package 'timeSeries' was built under R version 3.5.1

## Loading required package: timeDate

## Warning: package 'timeDate' was built under R version 3.5.1

## 
## Attaching package: 'timeDate'

## The following objects are masked from 'package:TSA':
## 
##     kurtosis, skewness

library(urca)

## Warning: package 'urca' was built under R version 3.5.1

library(uroot)

data("AirPassengers")
mydata <- AirPassengers
class(mydata)

## [1] "ts"

start(mydata)

## [1] 1949    1

end(mydata)

## [1] 1960   12

frequency(mydata)

## [1] 12

The data has a cycle of 12 years.

summary(AirPassengers)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   104.0   180.0   265.5   280.3   360.5   622.0

plot(mydata,xlab="Date", ylab = "No. of Passengers",main="TIME from 1949 to 1961")
abline(reg=lm(mydata~time(mydata)))

We can observe a trend in the data. This implies with time number of passengers have increased over the time.We have to check for the stationarity of the data. We can check for it in two ways, graphically and by Augmented Dickey Fuller test.

Before, formally testing for the stationarity we can look at some more of visualisation to check for presence of seasonality.

plot(aggregate(mydata,FUN=mean))

This figure where we plot year on year trend clearly shows presence of deterministic trend, that is the numer of passengers are clearly increasing over the years.

boxplot(mydata~cycle(mydata))

The X axis of the boxplot shows the months.The box plot can help us to check if any seasonality is present in the data, that is if there are any strong variations in the data. In the months from 6 to 9 there is higher mean and higher variances than the other months, indicating seasonality with an apparent cycle of 12 months. The data appears to be multiplicative time series as the passenger number increase. It appears so does the pattern of seasonality. There are very less outliers.

We decompose the time series for estimates of trend,seasonal and random components using the moving average method. The multiplicative model is Y(t) = T(t) + S(t) + e(t)

where Y(t) is the number of passengers at time t T(t) is the trend component at time t S(t) is the seasonal component at time t e(t) is the random error component at time t

decomposemydata <- decompose(mydata, "multiplicative")
plot(decomposemydata)

In the decomposition plot, we can clearly observe the presence of trend and seasonality. We can observe an pattern in the error component as well.

We shall now test for stationarity of the time series using both Augmented Dickey Fuller test and autocorrelation function. In order to fit an arima model it is required that the time series is stationary.

FOR AUGMENTED DICKEY FULLER TEST Null Hypothesis (H0)= time seris is non stationary Alternative Hypothesis (H1) = time series is stationary

We saw that the variances in the data are increasing( from the plots). We need to remove unequal variances and we do this by taking log of the series. And also we need to correct for the trend by differencing the series.

library(fUnitRoots)

## Warning: package 'fUnitRoots' was built under R version 3.5.1

## Loading required package: fBasics

## Warning: package 'fBasics' was built under R version 3.5.1

## 
## Attaching package: 'fUnitRoots'

## The following objects are masked from 'package:urca':
## 
##     punitroot, qunitroot, unitrootTable

newdata <- diff(log(mydata))
adfTest(newdata, lags = 0)

## Warning in adfTest(newdata, lags = 0): p-value smaller than printed p-value

## 
## Title:
##  Augmented Dickey-Fuller Test
## 
## Test Results:
##   PARAMETER:
##     Lag Order: 0
##   STATISTIC:
##     Dickey-Fuller: -9.6057
##   P VALUE:
##     0.01 
## 
## Description:
##  Mon Dec 03 02:18:15 2018 by user: Sameekhsya Behura

The p-value is smaller than 0.05 so we reject the null hypothesis at 95% confidence interval. So, the time series is stationary.

FOR AUTOCORRELATION FUNCTION

We check for stationarity by looking at the patten of autocorrelation and partial autocorrelation function in the

logdata <- log(mydata)
acf(logdata)

The decay in the autocorrelation clearly indicates non stationarity. So now we check for stationarity in difference of the log of the series.

acf(newdata)

pacf(newdata)

ACF is clearly cutting off after lag 1. We can use the auto arima function to determine the optimal number of lags in ARIMA model. The Auto Regressive Integrated Moving Average Model has parameters (p,d,q) where p denotes the number of lag in the autoregressive term, d denotes the number of differencing term and q denotes the number of lag in the moving average term. We fit the ARIMA model given the default parameter including seasonality as TRUE.

library(forecast)

## Warning: package 'forecast' was built under R version 3.5.1

estarima <- auto.arima(mydata)
estarima

## Series: mydata 
## ARIMA(2,1,1)(0,1,0)[12] 
## 
## Coefficients:
##          ar1     ar2      ma1
##       0.5960  0.2143  -0.9819
## s.e.  0.0888  0.0880   0.0292
## 
## sigma^2 estimated as 132.3:  log likelihood=-504.92
## AIC=1017.85   AICc=1018.17   BIC=1029.35

The ARIMA Model can be written as Y^=0.5960Yt???2+0.2143Yt???12???0.9819et???1+E To check for the fit of the arima model, we check the behaviour of the residual.

library(ggfortify)

## Warning: package 'ggfortify' was built under R version 3.5.1

ggtsdiag(estarima)

The residual plots appear to be with no pattern so the arima model is a good fit.

fcdata <- forecast(estarima, level = c(95), h = 36)
autoplot(fcdata)

The hsaded region represents significance with 95% level of confidence.

TIME SERIES ANALYSIS

SAMEEKSHYA BEHURA

30 November 2018