In the United States, electricity and gas are two of the most significant forms of energy delivered for public use, and the output of electric and gas utilities reflects residents' energy consumption. In this analysis, we examine US electric and gas utility production using time series methods.
This project uses the Industrial Production: Electric and Gas Utilities dataset [1] for the United States, covering January 1985 to January 2018 at a monthly frequency.
A forecast of industrial production of electric and gas utilities for the next 10 months will be produced using the model identified as the best from the analysis.
Below are the characteristics of the time series plot of our dataset:
Trend: The time series plot shows a clear upward trend, which suggests the series is non-stationary.
Changing variation: There appears to be a slight change in variation within the seasonal pattern, but this is hard to judge from the time series plot alone.
Seasonality: Seasonality is evident from the repeating pattern in the plot, as expected for monthly observations.
Autocorrelation structure: Successive observations stay close to one another, suggesting autocorrelation; this is examined with the lag-1 scatter plot below.
Intervention point: There is no intervention point, since the plot shows no sudden change in the pattern of the series.
## [1] 0.8684431
A strong correlation can be seen in the scatter plot of the series against its first lag (Figure 1).
This is supported by the sample lag-1 correlation of 0.87, which indicates strong positive autocorrelation.
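A minimal sketch of this check in R, using the `utilities.ts` series constructed in the appendix code:
library(TSA)                 # for zlag()
y <- utilities.ts
x <- zlag(utilities.ts)      # series lagged by one month
index <- 2:length(x)         # drop the leading NA created by the lag
cor(y[index], x[index])      # sample lag-1 correlation, about 0.87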
Observations from the autocorrelation structure of the time series:
The effect of the trend and the seasonality is visible in the ACF plot; the pattern of the ACF indicates seasonality.
This indicates that the seasonality needs to be removed before building the model. We will use the residual approach: fit a model for the seasonal component first and then examine its residuals for the remaining structure.
From the time series plot it is clear that the series has a stochastic trend with seasonality.
Since the seasonal trend is obvious, we start with seasonal differencing: a model with seasonal differencing (D = 1) and a white noise ordinary part is fitted.
The data are fitted to obtain the residuals.
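A minimal sketch of this step in R, using the `utilities.ts` series from the appendix code:
# Seasonal differencing only: D = 1 at lag 12, white noise ordinary part
m1 <- arima(utilities.ts, order = c(0, 0, 0),
            seasonal = list(order = c(0, 1, 0), period = 12))
res_m1 <- residuals(m1)   # the seasonally differenced series
plot(res_m1, type = 'l')  # check that the seasonality has been removed
acf(res_m1, lag.max = 36); pacf(res_m1, lag.max = 36)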
From the time series plot of the residuals, we can see that the seasonality is largely filtered out.
The P and Q values for the seasonal part are determined from the ACF and PACF plots of these residuals: three significant lags appear in the PACF and two in the ACF at the seasonal lags of 12, 24 and 36 months. Therefore P = 3 and Q = 2 are taken for the seasonal part.
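A sketch of the resulting fit for the seasonal part, whose residuals are examined next:
# Seasonal ARMA(3,2) with D = 1; the ordinary part is still left as white noise
m3 <- arima(utilities.ts, order = c(0, 0, 0),
            seasonal = list(order = c(3, 1, 2), period = 12))
res_m3 <- residuals(m3)
plot(res_m3, type = 'l'); acf(res_m3, lag.max = 36); pacf(res_m3, lag.max = 36)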
From the time series plot of these residuals, we can see that there is no remaining seasonality.
From the ACF and PACF plots we can see that the seasonal correlation has been removed: the residuals are centred around zero with almost constant variance and no significant seasonal lags. We can conclude that the seasonal part of the model is adequate.
Our specification with P = 3, D = 1, Q = 2 is adequate for the seasonal part.
For the seasonal part, our model will therefore be SARMA(3,2).
After applying a log transformation, the time series plot shows that the changing variance is stabilised.
The ACF plot no longer shows a seasonal pattern, but the lags decay only slowly, which indicates that the ordinary part is still non-stationary.
This can be corrected by differencing the ordinary (non-seasonal) part of the model.
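A minimal sketch of this step, continuing from the appendix code:
# Log transform to stabilise the variance, then difference the ordinary part (d = 1)
log.utilities.ts <- log(utilities.ts)
m5 <- arima(log.utilities.ts, order = c(0, 1, 0),
            seasonal = list(order = c(3, 1, 2), period = 12))
res_m5 <- residuals(m5)   # residuals used for the ADF test and ACF/PACF below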
##
## 0.01
The ADF test applied to the residuals after differencing with d = 1 gives a p-value below 0.01, so the null hypothesis of non-stationarity is rejected and the series can be treated as stationary.
The same can be seen in the ACF plot: the lags no longer decay slowly.
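The stationarity check sketched in R, using adfTest() from the fUnitRoots package:
library(fUnitRoots)
# Choose the lag order from an AR fit to the differenced residuals, then run the ADF test
k <- ar(diff(res_m5))$order
adfTest(res_m5, lags = k)@test$p.value   # about 0.01, so non-stationarity is rejected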
The ACF is related to the MA component: three significant lags cross the blue dotted confidence bounds.
The PACF is related to the AR component: four significant lags cross the blue dotted confidence bounds.
SARIMA(4,1,3)X(3,1,2)_12 is therefore the model suggested by the ACF and PACF plots.
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x x o o o o o o o x o o o o
## 1 x x o o o o o o o o o o o o
## 2 x x o o o o o o o o o o o o
## 3 x o x o o o o o o o o o o o
## 4 x x x x o o o o o o o o o o
## 5 x x x o x o o o o o o o o o
## 6 x o o x x x o o o o o o o o
## 7 x x o x x o o o o o o o o o
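Reading the EACF table, the vertex of the triangle of 'o' symbols sits at AR = 0, MA = 2, which suggests ARMA(0,2) for the ordinary part, with its neighbours ARMA(1,2) and ARMA(0,3) as further candidates.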
## Reordering variables and trying again:
The MA(4) and MA(5) components appear significant in the BIC plot, so the candidate ARMA specifications from BIC are ARMA(0,4) and ARMA(0,5) for the ordinary part.
We do not consider AR(10), following the principle of parsimony.
Combining the ACF, PACF, EACF and BIC results, our candidate models are SARIMA(0,1,2)X(3,1,2)_12, SARIMA(1,1,2)X(3,1,2)_12, SARIMA(0,1,3)X(3,1,2)_12, SARIMA(0,1,4)X(3,1,2)_12 and SARIMA(0,1,5)X(3,1,2)_12.
All parameters should be statistically significant at the 5% level of significance.
Parameters that are insignificant are candidates to be dropped from the model.
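Each candidate is fitted to the log-transformed series and its coefficients are tested with coeftest() from the lmtest package; a minimal sketch for one candidate:
library(lmtest)
# Fit SARIMA(0,1,3)X(3,1,2)_12 on the log-transformed series and test its coefficients
m013 <- arima(log.utilities.ts, order = c(0, 1, 3),
              seasonal = list(order = c(3, 1, 2), period = 12))
coeftest(m013)   # z test for each estimated parameter
The parameter estimates for SARIMA(0,1,2)X(3,1,2)_12 are shown first.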
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.430246 0.050921 -8.4492 < 2.2e-16 ***
## ma2 -0.332135 0.057116 -5.8151 6.059e-09 ***
## sar1 0.341335 0.636699 0.5361 0.5918888
## sar2 -0.256797 0.069326 -3.7042 0.0002121 ***
## sar3 0.023448 0.165915 0.1413 0.8876155
## sma1 -1.107127 0.634977 -1.7436 0.0812341 .
## sma2 0.310893 0.477178 0.6515 0.5147087
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
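Parameter estimates for SARIMA(1,1,2)X(3,1,2)_12: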
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.492009 0.122405 4.0195 5.832e-05 ***
## ma1 -0.908725 0.140302 -6.4769 9.362e-11 ***
## ma2 -0.016074 0.118620 -0.1355 0.8922099
## sar1 0.487073 0.353599 1.3775 0.1683658
## sar2 -0.261586 0.068549 -3.8160 0.0001356 ***
## sar3 0.047681 0.114671 0.4158 0.6775548
## sma1 -1.259078 0.348365 -3.6142 0.0003012 ***
## sma2 0.429343 0.263607 1.6287 0.1033724
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
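Parameter estimates for SARIMA(0,1,3)X(3,1,2)_12: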
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.416220 0.052475 -7.9317 2.161e-15 ***
## ma2 -0.289413 0.059922 -4.8298 1.366e-06 ***
## ma3 -0.113121 0.048058 -2.3539 0.0185799 *
## sar1 0.445792 0.419500 1.0627 0.2879300
## sar2 -0.258217 0.068797 -3.7533 0.0001745 ***
## sar3 0.054092 0.128605 0.4206 0.6740412
## sma1 -1.228258 0.416713 -2.9475 0.0032036 **
## sma2 0.398270 0.314299 1.2672 0.2050943
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
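Parameter estimates for SARIMA(0,1,4)X(3,1,2)_12: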
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.417338 0.051639 -8.0818 6.383e-16 ***
## ma2 -0.254462 0.057037 -4.4613 8.146e-06 ***
## ma3 -0.058738 0.053604 -1.0958 0.273178
## ma4 -0.108068 0.052755 -2.0485 0.040510 *
## sar1 -0.512741 NaN NaN NaN
## sar2 -0.248224 0.085551 -2.9015 0.003714 **
## sar3 -0.145795 NaN NaN NaN
## sma1 -0.251091 NaN NaN NaN
## sma2 -0.345411 NaN NaN NaN
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
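Parameter estimates for SARIMA(0,1,5)X(3,1,2)_12: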
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.402210 0.051860 -7.7557 8.788e-15 ***
## ma2 -0.258219 0.056701 -4.5541 5.262e-06 ***
## ma3 -0.034358 0.056770 -0.6052 0.545037
## ma4 -0.068483 0.055958 -1.2238 0.221016
## ma5 -0.088020 0.048894 -1.8002 0.071824 .
## sar1 -0.521673 NaN NaN NaN
## sar2 -0.248848 0.085293 -2.9176 0.003528 **
## sar3 -0.151252 NaN NaN NaN
## sma1 -0.230361 NaN NaN NaN
## sma2 -0.352123 NaN NaN NaN
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
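In SARIMA(1,1,2)X(3,1,2)_12 the ma2 term is clearly insignificant, and in SARIMA(0,1,4)X(3,1,2)_12 and SARIMA(0,1,5)X(3,1,2)_12 several seasonal coefficients return NaN standard errors, so these models are set aside. Residual analysis is therefore carried out for SARIMA(0,1,2)X(3,1,2)_12 and SARIMA(0,1,3)X(3,1,2)_12; the diagnostics for SARIMA(0,1,2)X(3,1,2)_12 follow.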
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.99419, p-value = 0.1358
##
##
## Box-Ljung test
##
## data: res.model
## X-squared = 17.842, df = 10, p-value = 0.05769
From the ACF plot we can see that two lags cross the blue dotted lines, which suggests the residuals are not quite white noise.
However, these lags do not extend far beyond the bounds, and the Ljung-Box p-value of 0.058 does not reject independence at the 5% level. Instead of rejecting this model, we will compare it with the residual analysis of the other candidate models.
The Shapiro-Wilk test indicates that the residuals are normally distributed, with a p-value of 0.1358 (> 0.05). This is also supported by the QQ plot, in which the points lie close to the reference line.
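The residual diagnostics for SARIMA(0,1,3)X(3,1,2)_12 follow.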
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.99407, p-value = 0.1252
##
##
## Box-Ljung test
##
## data: res.model
## X-squared = 12.023, df = 10, p-value = 0.2835
From the ACF plot we can see that only one lag crosses the blue dotted lines, which suggests the residuals are close to, but not exactly, white noise.
This lag is not far beyond the bounds, and the Ljung-Box p-value of 0.28 indicates no significant residual autocorrelation. Instead of rejecting this model, we will compare it with the residual analysis of the other candidate models.
The Shapiro-Wilk test indicates that the residuals are normally distributed, with a p-value of 0.1252 (> 0.05). This is also supported by the QQ plot, in which the points lie close to the reference line.
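A minimal sketch of the diagnostic checks applied to each candidate's residuals (the fuller residual.analysis() helper is listed in the appendix):
# Residual diagnostics for a fitted SARIMA model
res <- residuals(m013)
acf(res, lag.max = 36)                        # remaining autocorrelation
qqnorm(res); qqline(res, col = 2)             # normality check via QQ plot
shapiro.test(res)                             # Shapiro-Wilk normality test
Box.test(res, lag = 10, type = "Ljung-Box")   # Ljung-Box test for independence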
With only one lag crossing the blue dotted lines, the residual ACF of SARIMA(0,1,3)X(3,1,2)_12 looks better than that of SARIMA(0,1,2)X(3,1,2)_12.
With SARIMA(0,1,3)X(3,1,2)_12 as the most eligible model, SARIMA(0,1,4)X(3,1,2)_12 and SARIMA(1,1,3)X(3,1,2)_12 are fitted as its overfitted versions.
The analysis for SARIMA(0,1,4)X(3,1,2)_12 has already been done above; the analysis for SARIMA(1,1,3)X(3,1,2)_12 is given below.
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.6153409 0.1602113 3.8408 0.0001226 ***
## ma1 -1.0274595 0.1682366 -6.1072 1.014e-09 ***
## ma2 0.0078804 0.1026841 0.0767 0.9388270
## ma3 0.0746371 0.0868249 0.8596 0.3899943
## sar1 0.4724080 0.3801575 1.2427 0.2139917
## sar2 -0.2641270 0.0690086 -3.8275 0.0001295 ***
## sar3 0.0425761 0.1180051 0.3608 0.7182500
## sma1 -1.2328369 0.3759513 -3.2792 0.0010408 **
## sma2 0.4122796 0.2831803 1.4559 0.1454228
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.99435, p-value = 0.1499
##
##
## Box-Ljung test
##
## data: res.model
## X-squared = 8.6409, df = 10, p-value = 0.5665
From the ACF plot we can see two lags crossing the blue dotted lines, which suggests the residuals are not quite white noise.
Several coefficients of the SARIMA(1,1,3)X(3,1,2)_12 model are not significant (p > 0.05), so the overfitted models do not improve on SARIMA(0,1,3)X(3,1,2)_12.
## df AIC
## m013 9 -1729.342
## m012 8 -1726.043
## df BIC
## m012 8 -1694.438
## m013 9 -1693.787
We obtain different rankings from the AIC and BIC comparisons of the two shortlisted models.
We base the choice on AIC, which balances goodness of fit against model complexity.
According to AIC, SARIMA(0,1,3)X(3,1,2)_12 is the better model for forecasting our data.
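The comparison sketched in R, using the fitted objects m012 and m013 from the appendix code:
# Compare the two shortlisted models by information criteria (smaller is better)
AIC(m012, m013)
BIC(m012, m013)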
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Feb 2018 4.710677 4.679119 4.742234 4.662414 4.758940
## Mar 2018 4.629216 4.592674 4.665757 4.573330 4.685101
## Apr 2018 4.489901 4.452197 4.527605 4.432238 4.547564
## May 2018 4.514993 4.476858 4.553128 4.456670 4.573315
## Jun 2018 4.624210 4.585649 4.662772 4.565235 4.683185
## Jul 2018 4.713796 4.674813 4.752780 4.654176 4.773416
## Aug 2018 4.698027 4.658626 4.737428 4.637769 4.758286
## Sep 2018 4.599235 4.559421 4.639049 4.538344 4.660125
## Oct 2018 4.517877 4.477655 4.558100 4.456362 4.579393
## Nov 2018 4.563416 4.522789 4.604043 4.501282 4.625550
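Note that the model is fitted to the log-transformed series, so the point forecasts and prediction intervals above are on the log scale. Exponentiating returns them to the original production-index scale; a minimal sketch, using the finalModel object from the appendix code:
library(forecast)
# Back-transform the log-scale forecasts to the original scale of the index
pred <- forecast(finalModel, h = 10)
exp(pred$mean)                     # point forecasts on the original scale
exp(pred$lower); exp(pred$upper)   # corresponding prediction interval bounds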
Several candidate models emerged from the analysis. Residual analysis and parameter estimation showed that the residuals of SARIMA(0,1,3)X(3,1,2)_12 approximately follow white noise and are close to normally distributed, which indicates a good fit.
From the forecast we can see that electric and gas utility production over the next 10 months will not differ much from previous years.
[1] Data source: https://fred.stlouisfed.org/series/IPG2211A2N
[2] Class notes: https://rmit.instructure.com/courses/67182/files/12231943?module_item_id=2391148
# Load the required libraries
library(TSA)
library(readr)
library(tseries)
library(fUnitRoots)
library(lmtest)
library(FSAdata)
library(forecast)
#Read the dataset and convert the dataframe to time series data
utilities <- read.csv('C:/Users/DELL/Downloads/Electric_Production.csv', header = TRUE)
names(utilities) <- c("date", "production")
rownames(utilities) <- utilities[,1]
utilities <- utilities[,2]
utilities.ts <- ts(as.vector(utilities), start=c(1985,1), end=c(2018,1), frequency=12)
# plotting time series
plot(utilities.ts, type='l',ylab='Electricity Production' )
points(y=utilities.ts,x=time(utilities.ts))
# Scatter plot of the series against its first lag (Figure 1)
par(mfrow=c(1,1) , cex.lab = 0.7, cex.main=0.7, cex.axis = 0.7)
plot(y=utilities.ts,x=zlag(utilities.ts), ylab='Electricity Production', xlab = 'Year', main = 'Figure-1')
# Sample correlation between the series and its first lag
y = utilities.ts
x = zlag(utilities.ts) # Generate first lag
index = 2:length(x) # Index that drops the leading NA introduced by the lag
cor(y[index],x[index])
#ACF and PACF plots of original time series data
par(mfrow=c(2,1))
acf(utilities.ts, lag.max = 36,main="The sample ACF of production of electric and gas utilities series")
pacf(utilities.ts, lag.max = 36,main="The sample PACF of production of electric and gas utilities series")
#Seasonal differencing with D = 1 and fit a white noise model
m1 = arima(utilities.ts,order=c(0,0,0),seasonal=list(order=c(0,1,0), period=12))
res_m1 = residuals(m1)
#Plot Time series graph and ACF and PACF plot
par(mfrow=c(3,1))
plot(res_m1, type='l', ylab='Electric and gas utilities production', main = 'Time series graph of residuals' )
acf(res_m1,lag.max=36)
pacf(res_m1, lag.max=36)
par(mfrow=c(1,1))
#Adding SARMA(3,2) to the model
m3 = arima(utilities.ts,order=c(0,0,0),seasonal=list(order=c(3,1,2), period=12))
res_m3 = residuals(m3)
# plotting time series and ACF and PACF plot
par(mfrow=c(3,1))
plot(res_m3,type='l',ylab='Electric and gas utilities production',main = 'Time series graph of residuals' )
acf(res_m3,lag.max=36)
pacf(res_m3, lag.max=36)
par(mfrow=c(1,1))
#Log Transformation
log.utilities.ts = log(utilities.ts)
# Fit the seasonal SARMA(3,2) model (D = 1) to the log-transformed series
m4 = arima(log.utilities.ts,order=c(0,0,0),seasonal=list(order=c(3,1,2), period=12))
res_m4 = residuals(m4)
#Plotting ACF and PACF
par(mfrow=c(3,1))
plot(res_m4,type='l',ylab='Electric and gas utilities production',main = 'Time series graph of residuals' )
acf(res_m4,lag.max=36)
pacf(res_m4, lag.max=36)
par(mfrow=c(1,1))
# Differencing the ordinary (non-seasonal) part of the model (d = 1)
m5 = arima(log.utilities.ts,order=c(0,1,0),seasonal=list(order=c(3,1,2), period=12))
res_m5 = residuals(m5)
#ADF test for non-stationarity
order = ar(diff(res_m5))$order
adfTest(res_m5, lags = order, title = NULL,
description = NULL)@test$p.value
# ACF and PACF plots
par(mfrow=c(2,1))
acf(res_m5,lag.max=36)
pacf(res_m5, lag.max=36)
par(mfrow=c(1,1))
#Plot EACF graph
eacf(res_m5)
#Plotting BIC Graph
par(mfrow=c(1,1) , cex.lab = 0.7, cex.main=0.7, cex.axis = 0.7)
res4 = armasubsets(y=res_m5,nar=12,nma=12,y.name='test',ar.method='ols')
plot(res4)
mtext("BIC Table", outer=TRUE, cex=1, line=-1.25)
#SARIMA(0,1,2)X(3,1,2)_12
m012 = arima(log.utilities.ts,order=c(0,1,2),seasonal=list(order=c(3,1,2), period=12))
coeftest(m012)
#SARIMA(1,1,2)X(3,1,2)_12
m112 = arima(log.utilities.ts,order=c(1,1,2),seasonal=list(order=c(3,1,2), period=12))
coeftest(m112)
#SARIMA(0,1,3)X(3,1,2)_12
m013 = arima(log.utilities.ts,order=c(0,1,3),seasonal=list(order=c(3,1,2), period=12))
coeftest(m013)
#SARIMA(0,1,4)X(3,1,2)_12
m014 = arima(log.utilities.ts,order=c(0,1,4),seasonal=list(order=c(3,1,2), period=12))
coeftest(m014)
#SARIMA(0,1,5)X(3,1,2)_12
m015 = arima(log.utilities.ts,order=c(0,1,5),seasonal=list(order=c(3,1,2), period=12))
coeftest(m015)
# Function for residual analysis of a fitted model: diagnostic plots and tests
residual.analysis <- function(model, std = TRUE, start = 2,
                              class = c("ARIMA","GARCH","ARMA-GARCH")[1]){
  library(TSA)
  library(FitAR)
  # Extract (standardised) residuals according to the model class
  if (class == "ARIMA"){
    if (std == TRUE){
      res.model = rstandard(model)   # standardised residuals (TSA::rstandard.Arima)
    }else{
      res.model = residuals(model)
    }
  }else if (class == "GARCH"){
    res.model = model$residuals[start:model$n.used]
  }else if (class == "ARMA-GARCH"){
    res.model = model@fit$residuals
  }else {
    stop("The argument 'class' must be either 'ARIMA' or 'GARCH' ")
  }
  # Diagnostic plots: time series, histogram, ACF/PACF and QQ plot of residuals
  par(mfrow=c(3,2))
  plot(res.model, type='o', ylab='Standardised residuals',
       main="Time series plot of standardised residuals")
  abline(h=0)
  hist(res.model, main="Histogram of standardised residuals")
  acf(res.model, main="ACF of standardised residuals")
  pacf(res.model, main="PACF of standardised residuals")
  qqnorm(res.model, main="QQ plot of standardised residuals")
  qqline(res.model, col = 2)
  # Formal tests: Shapiro-Wilk for normality, Ljung-Box for independence
  print(shapiro.test(res.model))
  print(Box.test(res.model, lag=10, type="Ljung-Box"))
  k=0
  LBQPlot(res.model, lag.max = length(model$residuals)-1, StartLag = k + 1,
          k = 0, SquaredQ = FALSE)
}
#residual analysis of SARIMA(0,1,2)X(3,1,2)_12
residual.analysis(m012)
#residual analysis of SARIMA(0,1,3)X(3,1,2)_12
residual.analysis(m013)
#Parameter estimate and residual analysis of SARIMA(1,1,3)X(3,1,2)_12
m113 = arima(log.utilities.ts,order=c(1,1,3),seasonal=list(order=c(3,1,2), period=12))
coeftest(m113)
residual.analysis(m113)
# Function for sorting AIC and BIC scores
sort.score <- function(x, score = c("bic", "aic")){
  if (score == "aic"){
    x[with(x, order(AIC)),]
  } else if (score == "bic") {
    x[with(x, order(BIC)),]
  } else {
    warning('score = "x" only accepts valid arguments ("aic","bic")')
  }
}
# Calculate AIC and BIC scores of the model
sc.AIC=AIC(m012, m013)
sc.BIC=BIC(m012, m013)
#sort AIC and BIC scores of the candidate models
sort.score(sc.AIC, score = "aic")
sort.score(sc.BIC, score = "bic")
#Plotting Time series graph with forecasting for next 10 months
finalModel = Arima(log.utilities.ts,order=c(0,1,3),seasonal=list(order=c(3,1,2), period=12))
pred = forecast(finalModel, h=10)
plot(pred)
#Forecasted values for next 10 months
pred