22-10-2023
library(GGally) # for ggpairs()
library(TSA) # season(), prewhiten() and other functions
library(tseries) # adf.test()
library(forecast) # BoxCox.lambda()
library(dLagM) # For DLM modelling
library(car) # for vif()
library(tis) # for Lag()
library(dynlm) # for Dynamic linear modeling
library(stats) # for classical decomposition
library(x12) # X-12-ARIMA decomposition
library(lmtest) # for bgtest()
library(dplyr) # for arrange()
A significance level \(\alpha=5\%\) is used.
The dataset holds 6 columns and 508 weekly observations: an index column, the disease-specific average weekly mortality in Paris, France, the city’s local temperature (degrees Fahrenheit), the particle size of pollutants, and the levels of two noxious chemical emissions from cars and industry in the air. All series were measured at the same time points between 2010 and 2020.
Our aim for the mort dataset is to produce the best 4-weeks-ahead forecasts of the average weekly mortality in Paris by finding the most accurate and suitable regression model, judged by MASE, using multiple predictors. A descriptive analysis is conducted first. A model-building strategy is then applied to find the best-fitting model among time series regression methods (dLagM package), dynamic linear models (dynlm package), and exponential smoothing with the corresponding state-space models.
MASE
Among the many error measures used to assess forecast accuracy, the mean absolute scaled error (MASE) is a generally applicable one. It is obtained by scaling the errors by the in-sample MAE of the naive forecast method. Because it is scale-free, it can be used to compare forecast accuracy between series, and it remains well defined in most practical situations where percentage-based measures break down. A MASE below 1 means the errors are, on average, smaller than those of the in-sample naive forecast.
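As a rough illustration of this definition, MASE can be computed by hand in base R. This is a minimal sketch on made-up numbers; the helper name `mase_by_hand` and the toy vectors are assumptions for illustration and are not part of the analysis.

```r
# Hypothetical helper: MASE of fitted values against actual values.
mase_by_hand <- function(actual, fitted) {
  stopifnot(length(actual) == length(fitted))
  naive_mae <- mean(abs(diff(actual)))     # in-sample MAE of the one-step naive forecast
  mean(abs(actual - fitted)) / naive_mae   # model MAE scaled by the naive MAE
}

y     <- c(10, 12, 11, 15, 14)             # toy observations
y_hat <- c(11, 11, 12, 14, 15)             # toy fitted values, each off by exactly 1
mase_value <- mase_by_hand(y, y_hat)
print(mase_value)
```

Here the naive forecast has an in-sample MAE of 2 and the toy fit has an MAE of 1, so the ratio is 0.5, meaning the fit is twice as accurate as the naive method on this scale.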
Information Criteria (AIC and BIC)
Information criteria for model selection penalize the maximized likelihood for model complexity. With \(q\) estimated parameters and \(n\) observations, AIC adds a penalty of \(2q\) to \(-2\log L\), while BIC adds a penalty of \(q\log(n)\). In simple terms, the penalty discourages overfitting, which gives a better criterion for model selection than the likelihood alone. Smaller values are preferred.
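To make the two penalties concrete, the sketch below recomputes AIC and BIC from the log-likelihood of a simple linear model and compares them with R's built-in `AIC()` and `BIC()`. The data are simulated purely for illustration.

```r
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)                  # simulated data, not the mort dataset
fit <- lm(y ~ x)

ll <- logLik(fit)
q  <- attr(ll, "df")                       # estimated parameters: 2 coefficients + error variance
aic_manual <- -2 * as.numeric(ll) + 2 * q
bic_manual <- -2 * as.numeric(ll) + q * log(n)

print(c(manual = aic_manual, builtin = AIC(fit)))
print(c(manual = bic_manual, builtin = BIC(fit)))
```

Since \(\log(n) > 2\) once \(n \geq 8\), BIC penalizes each extra parameter more heavily than AIC for any realistic sample size, which is why it tends to pick smaller models.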
Adjusted R Squared
Comparing models by adjusted R squared gives a rough estimate of how well each model fits the data, expressed as the proportion of variance explained after adjusting for the number of predictors.
mort <- read.csv("C:/Users/admin/Downloads/mort.csv")
mort = mort[,2:6] # remove index column
head(mort)
## mortality temp chem1 chem2 particle.size
## 1 183.63 72.38 11.51 45.79 72.72
## 2 191.05 67.19 8.92 43.90 49.60
## 3 180.09 62.94 9.48 32.18 55.68
## 4 184.67 72.49 10.28 40.43 55.16
## 5 173.60 74.25 10.57 48.53 66.02
## 6 183.73 67.88 7.99 48.61 44.01
For fitting a regression model, the response is Mortality and the 4 regressor variables are the temperature, pollutants particle size, and the two chemical emissions (chem1, chem2).
All the 5 variables are continuous variables.
Let’s first convert the regressors and the response to ts objects,
Mortality = ts(mort[,1])
Temp = ts(mort[,2])
Chem1 = ts(mort[,3])
Chem2 = ts(mort[,4])
ParticleSize = ts(mort[,5])
data.ts = ts(mort) # Y and x in single dataframe
Let’s scale, center and plot all 5 variables together,
data.scale = scale(data.ts)
plot(data.scale, plot.type="s", col=c("black", "red", "blue", "green", "yellow"), main = "Mortality (Black - Response), Temperature (Red - X1),\n Chemical 1 (Blue - X2), Chemical 2 (Green - X3), Particle size (Yellow - X4)")
It is hard to read the correlations between the regressors and the response, and among the regressors themselves, from this plot. Still, it is fair to say the 5 variables show some correlation. Let’s check the pairwise correlations numerically using ggpairs(),
ggpairs(data = mort, columns = c(1,2,3,4,5), progress = FALSE) #library(GGally)
Hence, some correlation between the 4 regressors and the response is present, and we can build a regression model based on these correlations. First, let’s look at the descriptive statistics.
Since we are building a regression model that estimates the response, \(Mortality\), let’s focus on Mortality’s statistics.
summary(Mortality)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 142.1 159.6 166.7 169.0 176.4 231.7
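The mean (169.0) and median (166.7) of Mortality are fairly close, which suggests a roughly symmetric centre, although the maximum (231.7) lies much further from the median than the minimum (142.1), hinting at a longer right tail.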
The time series plot for our data is generated using the following code chunk,
plot(Mortality,ylab='Average weekly mortality in Paris',xlab='Weeks',
type='o', main="Average weekly mortality Trend (2010-2020/week1-week508)")
Plot Inference :
From Figure 1, we can comment on the following features of the time series,
Trend: The overall shape of the series seems to follow a downward trend, which suggests non-stationarity.
Seasonality: From the plot, seasonal behavior is quite evident every year. This needs to be confirmed using statistical tests.
Change in Variance: The variation does not show a clear pattern and needs to be checked statistically.
Behavior: We notice mixed AR and MA behavior. AR behavior is dominant, as we observe many runs of successive points that follow each other. MA behavior is evident from the up and down fluctuations in the data points.
Intervention/Change points: No clear intervention point is seen. Week 150 might be one, and we will check whether it caused a significant change in the mean.
acf(Mortality, main="ACF of Average weekly mortality")
pacf(Mortality, main ="PACF of Average weekly mortality")
ACF plot: Multiple autocorrelations are significant. A slowly decaying pattern suggests a non-stationary series. We do not see any wave-like form, so no significant seasonal behavior is observed in the ACF.
PACF plot: We see one very high spike at the first lag, which indicates strong lag-1 dependence and is consistent with the possible non-stationarity seen in the time series plot. The second partial autocorrelation is significant as well.
Many model estimation procedures assume normality of the residuals. If this assumption does not hold, the coefficient estimates may not be optimal. Let’s look at the Quantile-Quantile (QQ) plot to assess normality visually, and use the Shapiro-Wilk test to confirm the result statistically.
qqnorm(Mortality, main = "Normal Q-Q Plot of Average weekly mortality")
qqline(Mortality, col = 2)
We see deviations from normality. Both tails are clearly off the line, and much of the data in the middle is off the line as well. Let’s check statistically using the Shapiro-Wilk test. The hypotheses of this test are,
\(H_0\) : Time series is normally distributed
\(H_a\) : Time series is not normally distributed
shapiro.test(Mortality)
##
## Shapiro-Wilk normality test
##
## data: Mortality
## W = 0.94454, p-value = 7.548e-13
From the Shapiro-Wilk test, since p < 0.05 significance level, we reject the null hypothesis that states the data is normal. Thus, Mortality series is not normally distributed.
The PACF plot of the Mortality series at the descriptive analysis stage suggests possible non-stationarity. Let’s test this formally using the ADF and PP tests,
Using ADF (Augmented Dickey-Fuller) test :
Let’s check for non-stationarity using the Augmented Dickey-Fuller (ADF) test. The hypotheses are,
\(H_0\) : Time series is difference non-stationary
\(H_a\) : Time series is stationary
adf.test(Mortality) #library(tseries)
##
## Augmented Dickey-Fuller Test
##
## data: Mortality
## Dickey-Fuller = -5.4301, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
Since p-value < 0.05, we reject the null hypothesis of non-stationarity and conclude that the series is stationary at the 5% level of significance, despite the visual impression from the time series and PACF plots.
Using PP (Phillips-Perron) test :
The null and alternate hypothesis are same as ADF test.
PP.test(Mortality)
##
## Phillips-Perron Unit Root Test
##
## data: Mortality
## Dickey-Fuller = -9.9724, Truncation lag parameter = 6, p-value = 0.01
According to the PP test, the Mortality series is stationary at the 5% level.
To improve normality in our Mortality time series, let’s try a Box-Cox transformation of the series,
lambda = BoxCox.lambda(Mortality, method = "loglik") # library(forecast)
BC.Mortality = BoxCox(Mortality, lambda = lambda)
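For reference, the standard Box-Cox transformation for positive data is \((y^\lambda - 1)/\lambda\) for \(\lambda \neq 0\) and \(\log(y)\) for \(\lambda = 0\). The sketch below implements that formula directly in base R; the helper name `box_cox` and the toy vector are assumptions for illustration, and the forecast package's own implementation may handle edge cases (such as negative values) differently.

```r
# Hypothetical helper implementing the standard Box-Cox formula for positive data.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

y <- c(1, 4, 9, 16)
bc_half <- box_cox(y, 0.5)   # square-root type transform
bc_zero <- box_cox(y, 0)     # log transform
bc_one  <- box_cox(y, 1)     # only shifts the data by 1, shape unchanged
print(rbind(bc_half, bc_zero, bc_one))
```

A value of \(\lambda\) close to 1 therefore means the transformation changes little, which is one way to read a weak visual effect after transforming.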
Visually comparing the time series plots before and after box-cox transformation,
par(mfrow=c(2,1))
plot(BC.Mortality,ylab='Weekly Mortality',xlab='Time',
type='o', main="Box-Cox Transformed Mortality Time Series")
points(y=BC.Mortality,x=time(BC.Mortality))
plot(Mortality,ylab='Weekly Mortality',xlab='Time',
type='o', main="Original Mortality Time Series")
points(y=Mortality,x=time(Mortality))
par(mfrow=c(1,1))
From the plot, almost no change in the variance of the time series is visible after the Box-Cox transformation. Let’s check for normality using the Shapiro-Wilk test,
shapiro.test(BC.Mortality)
##
## Shapiro-Wilk normality test
##
## data: BC.Mortality
## W = 0.9854, p-value = 5.59e-05
From the Shapiro-Wilk test, since p < 0.05 significance level, we reject the null hypothesis that states the data is normal. Thus, BC Transformed Mortality is not normal.
The Mortality series is stationary but, even after the BC transformation, not normal, although the Shapiro-Wilk statistic improved from W = 0.945 to W = 0.985. The BC transformation was therefore not effective enough to restore normality.
To observe the individual effects of the components of the series, let’s decompose the Mortality time series into its seasonal, trend and remainder components. The STL decomposition method will be used.
Let’s set t.window to 15 and look at the STL decomposition plots.
We can adjust the series for seasonality by subtracting the seasonal component from the original series, and for trend by subtracting the trend component, using the following code chunk,
# Code gist - Apply STL decomposition to get seasonally adjusted and trend adjusted and visually compare w.r.t to original time series
MortalityX = ts(mort[,1], start = c(2010,1), frequency = 52) # set frequency
stl.Mortality <- stl(window(MortalityX, start=c(2010,1)), t.window=15, s.window="periodic", robust=TRUE)
par(mfrow=c(3,1))
plot(MortalityX,ylab='Mortality',xlab='Time',
type='o', main="Original Mortality Time Series")
plot(seasadj(stl.Mortality), ylab='Mortality',xlab='Time', main = "Seasonally adjusted Mortality")
stl.Mortality.trend = stl.Mortality$time.series[,"trend"] # Extract the trend component from the output
stl.Mortality.trend.adjusted = MortalityX - stl.Mortality.trend
plot(stl.Mortality.trend.adjusted, ylab='Mortality',xlab='Time', main = "Trend adjusted Mortality")
par(mfrow=c(1,1))
Not much change is visible in the trend-adjusted and seasonally adjusted series compared to the original series. Visually, this suggests that neither the trend nor the seasonal component dominates the Mortality time series.
Since neither a strong trend nor a strong seasonal component is found through decomposition, we do not expect the fitted model to need trend or seasonal components.
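The visual comparison can be complemented by a number. One commonly used summary (an addition here, not something computed in this analysis) is the "strength" of a component, \(\max(0, 1 - Var(R)/Var(C + R))\), where \(C\) is the trend or seasonal component and \(R\) the remainder. The sketch below applies it to a simulated weekly series with base R's `stl()`; the series and all parameter values are assumptions for illustration.

```r
set.seed(1)
n  <- 52 * 6
tt <- seq_len(n)
# Simulated weekly series: mild downward trend + yearly cycle + noise
y <- ts(170 - 0.01 * tt + 8 * sin(2 * pi * tt / 52) + rnorm(n, sd = 3),
        start = c(2010, 1), frequency = 52)

fit  <- stl(y, s.window = "periodic", t.window = 15, robust = TRUE)
comp <- fit$time.series                       # columns: seasonal, trend, remainder

seas_adj <- y - comp[, "seasonal"]            # seasonally adjusted series
rem <- comp[, "remainder"]
strength_trend    <- max(0, 1 - var(rem) / var(comp[, "trend"] + rem))
strength_seasonal <- max(0, 1 - var(rem) / var(comp[, "seasonal"] + rem))
print(c(trend = strength_trend, seasonal = strength_seasonal))
```

Values near 0 indicate a negligible component and values near 1 a dominant one, so running the same two lines on stl.Mortality$time.series would give a less subjective reading than the plots alone.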
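Based on whether the lag length is known (finite DLM) or undetermined (infinite DLM), 4 major time series regression methods will be tested, namely the finite DLM, the polynomial DLM, the Koyck (geometric) DLM and the autoregressive distributed lag (ARDL) model.
The response of a finite DLM model with 1 regressor is represented as shown below,
\(Y_t = \alpha + \sum_{s=0}^{q} \beta_s X_{t-s} + \epsilon_t\)
where \(\alpha\) is the intercept, \(\beta_s\) is the weight of lag \(s\) of the regressor, \(q\) is the finite lag length and \(\epsilon_t\) is the error term.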
In our dataset, we have 4 regressors, hence the model equation has X1, X2, X3 and X4, each with its own set of lag weights, instead of just one regressor.
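To see what a finite DLM amounts to in practice, the sketch below builds the lagged regressors with base R's `embed()` and fits the model by ordinary least squares. It uses one simulated regressor with assumed lag weights; it is an illustration of the model equation, not a reproduction of what dLagM does internally.

```r
set.seed(123)
n <- 500
q <- 2
x <- rnorm(n)

lagged <- embed(x, q + 1)                    # columns: X_t, X_{t-1}, X_{t-2}
colnames(lagged) <- paste0("x.", 0:q)
beta <- c(1, 0.5, 0.25)                      # assumed true lag weights
y <- 2 + as.vector(lagged %*% beta) + rnorm(n - q, sd = 0.1)

design <- data.frame(y = y, lagged)
fit <- lm(y ~ ., data = design)
print(round(coef(fit), 3))
print(nobs(fit))                             # n - q rows remain after lagging
```

Lagging by q periods costs q observations. With q = 12 and 508 weeks this leaves 508 - 12 = 496 rows, which is consistent with the n = 496 reported for the q = 12 models in this analysis.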
Now, lets use AIC and BIC score to find the best lag length for Finite DLM model,
finiteDLMauto(formula = mortality ~ temp + chem1 + chem2 + particle.size, data = mort, q.min = 1, q.max = 12,
model.type = "dlm", error.type = "AIC", trace = TRUE)
## q - k MASE AIC BIC GMRAE MBRAE R.Adj.Sq Ljung-Box
## 12 12 0.79668 3678.921 3906.076 0.71739 0.59279 0.56683 0
## 11 11 0.81097 3690.626 3901.055 0.77977 0.34928 0.55912 0
## 10 10 0.81455 3696.208 3889.895 0.78290 0.80140 0.55711 0
## 9 9 0.81741 3704.335 3881.265 0.77674 -0.53319 0.55240 0
## 8 8 0.82157 3710.060 3870.215 0.76386 2.44211 0.55022 0
## 7 7 0.82450 3713.873 3857.237 0.76895 -0.15585 0.54954 0
## 6 6 0.83830 3726.178 3852.736 0.79115 1.21454 0.54079 0
## 5 5 0.83876 3729.364 3839.099 0.77233 0.17266 0.54125 0
## 4 4 0.85441 3741.636 3834.533 0.80750 0.26095 0.53242 0
## 3 3 0.86883 3755.807 3831.850 0.83498 0.50969 0.52269 0
## 2 2 0.88537 3769.907 3829.078 0.81427 1.28602 0.51223 0
## 1 1 0.91963 3812.934 3855.219 0.81979 -0.26483 0.47418 0
Note - We are using Mortality and not the BC.Mortality (BC transformed Mortality series) as normality is violated in both of these.
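q = 12 has the smallest AIC and MASE, although BIC is smallest at q = 2 (3829.078). Since forecast accuracy in terms of MASE is our main criterion, we fit the model with q = 12,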
Since there are 4 predictors, there are 4C1 + 4C2 + 4C3 + 4C4 = 15 possible combinations of predictors and hence 15 models to compare. Lets fit all these 15 models and compare based on AIC, BIC and MASE scores.
DLM.model = dlm(formula = mortality ~ temp + chem1 + chem2 + particle.size, data = mort, q = 12)
DLM.model1 = dlm(formula = mortality ~ temp , data = mort, q = 12)
DLM.model2 = dlm(formula = mortality ~ chem1, data = mort, q = 12)
DLM.model3 = dlm(formula = mortality ~ chem2, data = mort, q = 12)
DLM.model4 = dlm(formula = mortality ~ particle.size, data = mort, q = 12)
DLM.model5 = dlm(formula = mortality ~ temp + chem1, data = mort, q = 12)
DLM.model6 = dlm(formula = mortality ~ temp + chem2, data = mort, q = 12)
DLM.model7 = dlm(formula = mortality ~ temp + particle.size, data = mort, q = 12)
DLM.model8 = dlm(formula = mortality ~ temp + chem1 + chem2, data = mort, q = 12)
DLM.model9 = dlm(formula = mortality ~ temp + chem1 + particle.size, data = mort, q = 12)
DLM.model10 = dlm(formula = mortality ~ temp + chem2 + particle.size, data = mort, q = 12)
DLM.model11 = dlm(formula = mortality ~ chem1 + chem2 + particle.size, data = mort, q = 12)
DLM.model12 = dlm(formula = mortality ~ chem1 + chem2 , data = mort, q = 12)
DLM.model13 = dlm(formula = mortality ~ chem1 + particle.size, data = mort, q = 12)
DLM.model14 = dlm(formula = mortality ~ chem2 + particle.size, data = mort, q = 12)
Model <- c("DLM.model", "DLM.model1", "DLM.model2", "DLM.model3", "DLM.model4", "DLM.model5", "DLM.model6", "DLM.model7", "DLM.model8", "DLM.model9", "DLM.model10", "DLM.model11", "DLM.model12", "DLM.model13", "DLM.model14")
AIC <- c(AIC(DLM.model), AIC(DLM.model1), AIC(DLM.model2), AIC(DLM.model3), AIC(DLM.model4),AIC(DLM.model5), AIC(DLM.model6), AIC(DLM.model7), AIC(DLM.model8), AIC(DLM.model9), AIC(DLM.model10), AIC(DLM.model11), AIC(DLM.model12), AIC(DLM.model13), AIC(DLM.model14))
BIC <- c(BIC(DLM.model), BIC(DLM.model1), BIC(DLM.model2), BIC(DLM.model3), BIC(DLM.model4),BIC(DLM.model5), BIC(DLM.model6), BIC(DLM.model7), BIC(DLM.model8), BIC(DLM.model9), BIC(DLM.model10), BIC(DLM.model11), BIC(DLM.model12), BIC(DLM.model13), BIC(DLM.model14))
MASE <- MASE(DLM.model, DLM.model1, DLM.model2, DLM.model3, DLM.model4, DLM.model5, DLM.model6, DLM.model7, DLM.model8, DLM.model9, DLM.model10, DLM.model11, DLM.model12, DLM.model13, DLM.model14)
data.frame(AIC, BIC, MASE) %>% arrange(MASE)
## AIC BIC n MASE
## DLM.model 3678.921 3906.076 496 0.7966763
## DLM.model8 3680.200 3852.670 496 0.8170975
## DLM.model9 3685.181 3857.651 496 0.8216239
## DLM.model11 3681.534 3854.004 496 0.8245264
## DLM.model12 3672.715 3790.500 496 0.8370629
## DLM.model5 3681.821 3799.605 496 0.8384156
## DLM.model13 3689.321 3807.105 496 0.8411857
## DLM.model10 3741.257 3913.726 496 0.8494305
## DLM.model2 3684.963 3748.062 496 0.8573333
## DLM.model14 3741.543 3859.327 496 0.8728312
## DLM.model7 3737.286 3855.070 496 0.8826570
## DLM.model4 3750.060 3813.158 496 0.9101003
## DLM.model6 3766.032 3883.816 496 0.9122243
## DLM.model3 3798.042 3861.140 496 0.9337064
## DLM.model1 3860.683 3923.782 496 1.0667651
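The best model as per MASE (best for forecasting) is the one with all 4 predictors, \(DLM.model\). Note that AIC is smallest for DLM.model12 (3672.715) and BIC is smallest for DLM.model2 (3748.062), so the information criteria favour smaller models.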
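The 15 predictor combinations were typed out by hand above. As a sketch of a less error-prone alternative, the formulas can be generated with `combn()` and `reformulate()` from base R. Only the formulas are built here; passing each one to a fitting function is left as a comment because it needs the mort data.

```r
predictors <- c("temp", "chem1", "chem2", "particle.size")

# All non-empty subsets of the predictors, ordered by subset size
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)
formulas <- lapply(subsets, function(v) reformulate(v, response = "mortality"))

print(length(formulas))                      # 4C1 + 4C2 + 4C3 + 4C4 combinations
for (f in formulas) print(f)

# Each formula f could then be fitted in a loop, for example with
# dlm(formula = f, data = mort, q = 12), and the scores collected in one table.
```

Generating the list this way also guards against accidentally skipping a combination when the models are written out one by one.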
We can apply a diagnostic check using checkresiduals() function from the forecast package.
checkresiduals(DLM.model$model$residuals) # forecast package
##
## Ljung-Box test
##
## data: Residuals
## Q* = 380.43, df = 10, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 10
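In this output, the Ljung-Box test rejects the null hypothesis of uncorrelated residuals (p-value < 2.2e-16), so significant serial correlation remains in the residuals of the model.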
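For readers unfamiliar with the Ljung-Box statistic, the sketch below computes \(Q^* = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)\) by hand on simulated autocorrelated "residuals" and compares it with `Box.test()` from the stats package. The AR(1) series is an assumption chosen so that the test clearly rejects.

```r
set.seed(7)
n <- 500
e <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))  # simulated autocorrelated residuals

h <- 10
r <- acf(e, lag.max = h, plot = FALSE)$acf[-1]             # sample autocorrelations at lags 1..h
q_manual <- n * (n + 2) * sum(r^2 / (n - seq_len(h)))
p_manual <- pchisq(q_manual, df = h, lower.tail = FALSE)

lb <- Box.test(e, lag = h, type = "Ljung-Box")
print(c(manual = q_manual, Box.test = unname(lb$statistic)))
print(c(manual = p_manual, Box.test = lb$p.value))
```

A large statistic relative to a chi-squared distribution with h degrees of freedom means the residual autocorrelations are jointly too large to be noise, which is the situation in the output above.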
Model Summary for Finite DLM model (DLM.model) :
summary(DLM.model)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.355 -5.502 -0.107 4.850 43.608
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.328e+02 1.063e+01 12.491 < 2e-16 ***
## temp.t 4.429e-01 1.168e-01 3.793 0.000169 ***
## temp.1 -2.417e-01 1.212e-01 -1.994 0.046782 *
## temp.2 -9.926e-02 1.245e-01 -0.797 0.425624
## temp.3 4.283e-02 1.262e-01 0.339 0.734566
## temp.4 1.211e-01 1.264e-01 0.958 0.338534
## temp.5 -4.649e-02 1.264e-01 -0.368 0.713112
## temp.6 3.780e-02 1.256e-01 0.301 0.763659
## temp.7 2.605e-02 1.253e-01 0.208 0.835350
## temp.8 5.145e-02 1.237e-01 0.416 0.677756
## temp.9 -9.944e-02 1.222e-01 -0.814 0.416054
## temp.10 -8.694e-02 1.222e-01 -0.712 0.477127
## temp.11 -4.108e-02 1.185e-01 -0.347 0.729099
## temp.12 -8.598e-02 1.144e-01 -0.751 0.452814
## chem1.t -6.862e-01 4.401e-01 -1.559 0.119674
## chem1.1 6.670e-01 4.462e-01 1.495 0.135727
## chem1.2 1.085e+00 4.718e-01 2.300 0.021893 *
## chem1.3 1.058e+00 4.738e-01 2.232 0.026091 *
## chem1.4 5.075e-01 4.816e-01 1.054 0.292529
## chem1.5 5.447e-01 4.828e-01 1.128 0.259857
## chem1.6 8.502e-01 4.830e-01 1.760 0.079060 .
## chem1.7 4.882e-01 4.801e-01 1.017 0.309748
## chem1.8 3.463e-02 4.769e-01 0.073 0.942154
## chem1.9 2.999e-01 4.762e-01 0.630 0.529222
## chem1.10 -3.240e-01 4.746e-01 -0.683 0.495139
## chem1.11 -7.156e-01 4.481e-01 -1.597 0.110992
## chem1.12 -9.069e-01 4.341e-01 -2.089 0.037243 *
## chem2.t -5.401e-04 8.872e-02 -0.006 0.995145
## chem2.1 -9.989e-02 8.968e-02 -1.114 0.265973
## chem2.2 -1.270e-01 9.249e-02 -1.373 0.170484
## chem2.3 -1.959e-01 9.288e-02 -2.110 0.035452 *
## chem2.4 -1.155e-02 9.352e-02 -0.124 0.901753
## chem2.5 -1.182e-01 9.359e-02 -1.263 0.207220
## chem2.6 -5.839e-02 9.279e-02 -0.629 0.529542
## chem2.7 1.589e-02 9.196e-02 0.173 0.862887
## chem2.8 1.753e-02 9.221e-02 0.190 0.849321
## chem2.9 4.621e-02 9.123e-02 0.507 0.612710
## chem2.10 5.557e-02 9.094e-02 0.611 0.541495
## chem2.11 1.141e-01 8.667e-02 1.316 0.188699
## chem2.12 2.230e-01 8.605e-02 2.592 0.009856 **
## particle.size.t 2.088e-01 8.054e-02 2.593 0.009838 **
## particle.size.1 9.288e-03 8.117e-02 0.114 0.908949
## particle.size.2 6.814e-03 8.312e-02 0.082 0.934703
## particle.size.3 -7.578e-02 8.382e-02 -0.904 0.366466
## particle.size.4 -6.728e-02 8.505e-02 -0.791 0.429335
## particle.size.5 5.676e-02 8.506e-02 0.667 0.504912
## particle.size.6 -9.976e-02 8.542e-02 -1.168 0.243486
## particle.size.7 7.766e-04 8.627e-02 0.009 0.992821
## particle.size.8 7.200e-03 8.643e-02 0.083 0.933642
## particle.size.9 2.201e-02 8.373e-02 0.263 0.792741
## particle.size.10 1.043e-01 8.319e-02 1.254 0.210571
## particle.size.11 1.156e-01 8.144e-02 1.420 0.156309
## particle.size.12 1.067e-01 8.105e-02 1.317 0.188594
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.368 on 443 degrees of freedom
## Multiple R-squared: 0.6123, Adjusted R-squared: 0.5668
## F-statistic: 13.46 on 52 and 443 DF, p-value: < 2.2e-16
##
## AIC and BIC values for the model:
## AIC BIC
## 1 3678.921 3906.076
Lets consider the effect of collinearity on these results. To inspect this issue, we will display variance inflation factors (VIFs). If the value of VIF is greater than 10, we can conclude that the effect of multicollinearity is high.
vif(DLM.model$model) # variance inflation factors #library(car)
## temp.t temp.1 temp.2 temp.3
## 6.330556 6.829527 7.202565 7.416043
## temp.4 temp.5 temp.6 temp.7
## 7.450336 7.450163 7.353137 7.311261
## temp.8 temp.9 temp.10 temp.11
## 7.087101 6.846589 6.822736 6.425389
## temp.12 chem1.t chem1.1 chem1.2
## 5.963463 15.717728 16.168511 18.096868
## chem1.3 chem1.4 chem1.5 chem1.6
## 18.249145 18.842660 18.923789 18.956352
## chem1.7 chem1.8 chem1.9 chem1.10
## 18.697202 18.462704 18.421260 18.289736
## chem1.11 chem1.12 chem2.t chem2.1
## 16.263125 15.281320 8.197008 8.403944
## chem2.2 chem2.3 chem2.4 chem2.5
## 8.938698 9.022675 9.177957 9.185801
## chem2.6 chem2.7 chem2.8 chem2.9
## 9.043460 8.870870 8.919021 8.731734
## chem2.10 chem2.11 chem2.12 particle.size.t
## 8.705336 7.888207 7.778710 8.444210
## particle.size.1 particle.size.2 particle.size.3 particle.size.4
## 8.576138 8.987483 9.099668 9.408277
## particle.size.5 particle.size.6 particle.size.7 particle.size.8
## 9.409970 9.449247 9.638768 9.699272
## particle.size.9 particle.size.10 particle.size.11 particle.size.12
## 9.099315 8.967741 8.589072 8.536916
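In the output above, all lags of chem1 have VIFs above 10, while the other regressors stay below that threshold. To show where such numbers come from, the sketch below computes VIFs by hand as \(1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on the remaining predictors. The three simulated predictors are assumptions for illustration.

```r
set.seed(99)
n  <- 300
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)     # nearly a copy of x1, so strongly collinear
x3 <- rnorm(n)                    # unrelated predictor
X  <- data.frame(x1, x2, x3)

vif_by_hand <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
vif_from_cor <- diag(solve(cor(X)))   # equivalent: diagonal of the inverse correlation matrix

print(round(rbind(vif_by_hand, vif_from_cor), 2))
```

Successive lags of a smooth, slowly changing series behave much like x1 and x2 here, which is one plausible reason the lagged chem1 terms inflate each other's variances.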
MASE(DLM.model)
## MASE
## DLM.model 0.7966763
ATTENTION - From here on, let’s summarise the models and not go into each model’s details, for simplicity.
A polynomial DLM helps reduce the effect of multicollinearity, and our data show substantial multicollinearity (several VIFs above 10). Let’s fit a polynomial DLM of order 2 and check whether it helps. We do this for each of the 4 regressors individually.
For Temperature regressor:
PolyDLM.Temp = polyDlm(x = as.vector(Temp), y = as.vector(Mortality), q = 12, k = 2, show.beta = FALSE)
summary(PolyDLM.Temp)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -31.210 -7.947 -2.086 5.734 53.184
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 222.488659 6.618124 33.618 < 2e-16 ***
## z.t0 -0.161012 0.054745 -2.941 0.00342 **
## z.t1 -0.019936 0.026643 -0.748 0.45465
## z.t2 0.004504 0.002210 2.038 0.04211 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.94 on 492 degrees of freedom
## Multiple R-squared: 0.3, Adjusted R-squared: 0.2957
## F-statistic: 70.29 on 3 and 492 DF, p-value: < 2.2e-16
Polynomial DLM model with Temperature as regressor variable is significant at 5% significance level.
For Chemical 1 regressor:
PolyDLM.Chem1 = polyDlm(x = as.vector(Chem1), y = as.vector(Mortality), q = 12, k = 2, show.beta = FALSE)
summary(PolyDLM.Chem1)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.817 -6.283 -0.574 4.845 48.223
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 141.458130 1.386081 102.056 <2e-16 ***
## z.t0 0.304811 0.122211 2.494 0.013 *
## z.t1 0.006495 0.060265 0.108 0.914
## z.t2 -0.001546 0.004990 -0.310 0.757
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.973 on 492 degrees of freedom
## Multiple R-squared: 0.5121, Adjusted R-squared: 0.5091
## F-statistic: 172.1 on 3 and 492 DF, p-value: < 2.2e-16
Polynomial DLM model with Chemical 1 as regressor variable is significant at 5% significance level.
For Chemical 2 regressor:
PolyDLM.Chem2 = polyDlm(x = as.vector(Chem2), y = as.vector(Mortality), q = 12, k = 2, show.beta = FALSE)
summary(PolyDLM.Chem2)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.756 -6.782 -1.705 5.040 58.318
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 117.413821 2.960269 39.663 < 2e-16 ***
## z.t0 0.115869 0.031277 3.705 0.000236 ***
## z.t1 -0.021075 0.014590 -1.444 0.149239
## z.t2 0.001775 0.001199 1.481 0.139289
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.15 on 492 degrees of freedom
## Multiple R-squared: 0.3902, Adjusted R-squared: 0.3865
## F-statistic: 105 on 3 and 492 DF, p-value: < 2.2e-16
Polynomial DLM model with Chemical 2 as regressor variable is significant at 5% significance level.
For Particle Size regressor:
PolyDLM.ParticleSize = polyDlm(x = as.vector(ParticleSize), y = as.vector(Mortality), q = 12, k = 2, show.beta = FALSE)
summary(PolyDLM.ParticleSize)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.813 -5.943 -1.099 5.128 48.100
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 125.724476 2.305488 54.533 < 2e-16 ***
## z.t0 0.119830 0.028886 4.148 3.94e-05 ***
## z.t1 -0.027004 0.013928 -1.939 0.0531 .
## z.t2 0.002247 0.001151 1.953 0.0514 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.61 on 492 degrees of freedom
## Multiple R-squared: 0.4475, Adjusted R-squared: 0.4441
## F-statistic: 132.8 on 3 and 492 DF, p-value: < 2.2e-16
Polynomial DLM model with Particle Size as regressor variable is significant at 5% significance level.
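Polynomial DLM models for each of the 4 regressors are overall significant (F-test p-value < 2.2e-16). The 0th order term (z.t0) is significant for all 4 regressors, the 1st order term (z.t1) is insignificant in all 4 models, and the 2nd order term (z.t2) is significant only for Temperature (p = 0.042).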
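The z.t0, z.t1 and z.t2 terms in these summaries can be understood through the Almon (polynomial) lag idea: the lag weights are restricted to a polynomial in the lag index, \(\beta_s = a_0 + a_1 s + a_2 s^2\), so only 3 coefficients are estimated instead of q + 1. The sketch below shows this construction on simulated data, under the assumption that polyDlm() builds its regressors in this standard way.

```r
set.seed(42)
n <- 400
q <- 4
s <- 0:q
x <- rnorm(n)

lagged <- embed(x, q + 1)                       # columns: X_t, ..., X_{t-q}
beta_true <- 1 - 0.3 * s + 0.02 * s^2           # assumed quadratic lag weights
y <- 2 + as.vector(lagged %*% beta_true) + rnorm(n - q, sd = 0.1)

# Almon variables: z_j = sum over s of s^j * X_{t-s}, for j = 0, 1, 2
z0 <- as.vector(lagged %*% s^0)
z1 <- as.vector(lagged %*% s^1)
z2 <- as.vector(lagged %*% s^2)
fit <- lm(y ~ z0 + z1 + z2)

a <- unname(coef(fit)[-1])
beta_hat <- a[1] + a[2] * s + a[3] * s^2        # implied weights for lags 0..q
print(round(cbind(lag = s, beta_true, beta_hat), 3))
```

Because the q + 1 correlated lags are replaced by 3 constructed regressors, the individual lag weights are estimated more stably, at the cost of assuming their shape.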
MASE(PolyDLM.Temp, PolyDLM.Chem1, PolyDLM.Chem2, PolyDLM.ParticleSize) %>% arrange(MASE)
## n MASE
## PolyDLM.Chem1 496 0.8944918
## PolyDLM.ParticleSize 496 0.9522845
## PolyDLM.Chem2 496 0.9713557
## PolyDLM.Temp 496 1.1042464
As per MASE measure, Polynomial DLM model with Chemical 1 as regressor is the best model for forecasting.
checkresiduals(PolyDLM.Chem1$model$residuals)
##
## Ljung-Box test
##
## data: Residuals
## Q* = 349.78, df = 10, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 10
In the Koyck model, the lag weights are positive and decline geometrically. This model is called the infinite geometric DLM, meaning there are infinitely many lag weights. The Koyck transformation is applied to implement this model by subtracting the first lag of the geometric DLM multiplied by \(\phi\). The Koyck transformed model is represented as,
\(Y_t = \delta_1 + \delta_2 Y_{t-1} + \delta_3 X_t + \nu_t\)
where \(\delta_1 = \alpha(1-\phi), \delta_2 = \phi, \delta_3 = \beta\) and the random error after the transformation is \(\nu_t = (\epsilon_t - \phi\epsilon_{t-1})\).
The koyckDlm() function implements a two-stage least squares method that first estimates \(\hat{Y}_{t-1}\) and then estimates \(Y_{t}\) through simple linear regression. Let’s fit Koyck geometric DLM models for each of the 4 regressors individually.
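The two stages can be sketched in base R on simulated data. The example below assumes that \(X_{t-1}\) serves as the instrument for \(Y_{t-1}\), which is a common choice for the Koyck model; the parameter values are made up, and the plain `lm()` standard errors of the second stage are not valid two-stage least squares standard errors, so only the point estimates are used.

```r
set.seed(2023)
n <- 5000
alpha <- 10; beta <- 2; phi <- 0.5             # assumed true parameters
x <- rnorm(n)
# Geometric lag: w_t = X_t + phi * X_{t-1} + phi^2 * X_{t-2} + ...
w <- as.numeric(stats::filter(x, filter = phi, method = "recursive"))
y <- alpha + beta * w + rnorm(n, sd = 0.5)

yt <- y[-1]; y1 <- y[-n]                       # Y_t and Y_{t-1}
xt <- x[-1]; x1 <- x[-n]                       # X_t and X_{t-1}

stage1 <- lm(y1 ~ x1 + xt)                     # stage 1: predict Y_{t-1} using the instrument X_{t-1}
y1_hat <- fitted(stage1)
stage2 <- lm(yt ~ y1_hat + xt)                 # stage 2: Koyck regression with fitted Y_{t-1}

delta     <- unname(coef(stage2))
phi_hat   <- delta[2]                          # delta_2 = phi
beta_hat  <- delta[3]                          # delta_3 = beta
alpha_hat <- delta[1] / (1 - phi_hat)          # delta_1 = alpha * (1 - phi)
print(round(c(alpha = alpha_hat, beta = beta_hat, phi = phi_hat), 3))
```

The instrument is needed because \(Y_{t-1}\) contains \(\epsilon_{t-1}\), which also appears in the transformed error \(\nu_t\), so regressing on \(Y_{t-1}\) directly would give biased estimates.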
For Temperature regressor:
Koyck.Temp = koyckDlm(x = as.vector(mort$temp) , y = as.vector(mort$mortality) )
summary(Koyck.Temp$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.8714 -8.4484 -0.5811 7.2446 43.9005
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 162.22228 17.58058 9.227 < 2e-16 ***
## Y.1 0.44475 0.05493 8.096 4.28e-15 ***
## X.t -0.92085 0.12974 -7.098 4.33e-12 ***
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 504 190.8 <2e-16 ***
## Wu-Hausman 1 503 129.1 <2e-16 ***
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.08 on 504 degrees of freedom
## Multiple R-Squared: 0.277, Adjusted R-squared: 0.2741
## Wald test: 210 on 2 and 504 DF, p-value: < 2.2e-16
Koyck DLM model with Temperature as regressor variable is significant at 5% significance level.
For Chemical 1 regressor:
Koyck.Chem1 = koyckDlm(x = as.vector(mort$chem1) , y = as.vector(mort$mortality) )
summary(Koyck.Chem1$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.82596 -5.89508 -0.06125 6.06967 32.82722
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 53.46578 5.34042 10.012 <2e-16 ***
## Y.1 0.65058 0.03738 17.407 <2e-16 ***
## X.t 0.70588 0.22498 3.138 0.0018 **
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 504 186.21 <2e-16 ***
## Wu-Hausman 1 503 5.97 0.0149 *
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.017 on 504 degrees of freedom
## Multiple R-Squared: 0.5974, Adjusted R-squared: 0.5958
## Wald test: 336.8 on 2 and 504 DF, p-value: < 2.2e-16
Koyck DLM model with Chemical 1 as regressor variable is significant at 5% significance level.
For Chemical 2 regressor:
Koyck.Chem2 = koyckDlm(x = as.vector(mort$chem2) , y = as.vector(mort$mortality) )
summary(Koyck.Chem2$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.80875 -6.85425 0.06398 7.01094 31.94255
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.84742 5.56124 8.424 3.82e-16 ***
## Y.1 0.75420 0.04405 17.122 < 2e-16 ***
## X.t -0.10536 0.11751 -0.897 0.37
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 504 49.67 6.01e-12 ***
## Wu-Hausman 1 503 15.89 7.70e-05 ***
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.34 on 504 degrees of freedom
## Multiple R-Squared: 0.4709, Adjusted R-squared: 0.4688
## Wald test: 252.9 on 2 and 504 DF, p-value: < 2.2e-16
Koyck DLM model with Chemical 2 as regressor variable is overall significant at 5% significance level (Wald test), although the coefficient of X.t itself is not significant (p = 0.37).
For Particle size regressor:
Koyck.ParticleSize = koyckDlm(x = as.vector(mort$particle.size) , y = as.vector(mort$mortality) )
summary(Koyck.ParticleSize$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.04298 -5.91345 -0.04809 6.25653 32.26785
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.19733 5.02197 9.398 <2e-16 ***
## Y.1 0.69461 0.03634 19.114 <2e-16 ***
## X.t 0.09294 0.06104 1.523 0.128
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 504 148.898 < 2e-16 ***
## Wu-Hausman 1 503 8.471 0.00377 **
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.348 on 504 degrees of freedom
## Multiple R-Squared: 0.5673, Adjusted R-squared: 0.5655
## Wald test: 309.9 on 2 and 504 DF, p-value: < 2.2e-16
Koyck DLM model with Particle size as regressor variable is overall significant at 5% significance level (Wald test), although the coefficient of X.t itself is not significant (p = 0.128).
Koyck DLM models for each of the 4 regressors are significant.
MASE(Koyck.Temp, Koyck.Chem1, Koyck.Chem2, Koyck.ParticleSize) %>% arrange(MASE)
## n MASE
## Koyck.Chem1 507 0.8530742
## Koyck.ParticleSize 507 0.8837852
## Koyck.Chem2 507 0.9927387
## Koyck.Temp 507 1.1396264
As per MASE measure, Koyck DLM model with Chemical 1 as regressor is the best model for forecasting.
checkresiduals(Koyck.Chem1$model$residuals)
##
## Ljung-Box test
##
## data: Residuals
## Q* = 112.71, df = 10, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 10
The autoregressive distributed lag (ARDL) model is a flexible and parsimonious infinite DLM. With one lag of the predictor and one lag of the response, the model is represented as,
\(Y_t = \mu + \beta_0 X_t + \beta_1 X_{t-1} + \gamma_1 Y_{t-1} + e_t\)
Similar to the Koyck DLM, it is possible to write this model as an infinite DLM, but with a lag distribution of any shape rather than a polynomial or geometric shape. The model is denoted ARDL(p,q), where, in the dLagM package, p is the lag length of the predictors and q is the lag length of the response. To fit the model, the ardlDlm() function is used. Let’s find the best lag lengths using AIC and BIC scores through an iteration, with the maximum lag length set to 12.
## Code gist to find the best ARDL(p,q) model as per AIC and BIC scores.
# First create an empty df. Iterate over 144 ARDL (since max lag for response and predictor of ARDL model is 12, i.e, p = q = 12 at max).
# Save the model's AIC and BIC scores through iteration and display the model with best AIC and BIC scores.
df = data.frame(matrix(
vector(), 0, 4, dimnames=list(c(), c("p","q","AIC","BIC"))),
stringsAsFactors=F) # create empty dataframe
for(i in 1:12){
for(j in 1:12){
model4.1 = ardlDlm(formula = mortality ~ temp + chem1 + chem2 + particle.size, data = mort, p = i, q = j)
new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
df[nrow(df) + 1, ] <- new
}
} # Iterate and save in df
head(df[order( df[,3] ),],1) # Best model as per AIC
## p q AIC BIC
## 60 5 12 3444.216 3604.066
head(df[order( df[,4] ),],1) # Best model as per BIC
## p q AIC BIC
## 12 1 12 3448.146 3540.691
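One point to keep in mind with such a grid search is that models with longer lags lose more initial observations, so their AIC and BIC values are computed on slightly different samples. The sketch below repeats the same kind of search in base R on simulated ARDL(1,1) data, fitting every candidate on a common sample. The helper `ardl_ic`, the data and the small 3 x 3 grid are assumptions for illustration; p and q follow the convention used above (p for predictor lags, q for response lags).

```r
set.seed(11)
n <- 600
x <- rnorm(n)
y <- numeric(n)
for (i in 2:n) {                                  # simulated ARDL(1,1) data, not the mort dataset
  y[i] <- 1 + 0.8 * x[i] + 0.4 * x[i - 1] + 0.5 * y[i - 1] + rnorm(1, sd = 0.5)
}

max_lag <- 3
idx <- (max_lag + 1):n                            # common estimation sample for every (p, q)

# Hypothetical helper: fit ARDL(p, q) by OLS and return its information criteria
ardl_ic <- function(p, q) {
  x_lags <- sapply(0:p, function(s) x[idx - s])   # X_t, ..., X_{t-p}
  y_lags <- sapply(1:q, function(s) y[idx - s])   # Y_{t-1}, ..., Y_{t-q}
  fit <- lm(y[idx] ~ x_lags + y_lags)
  c(p = p, q = q, AIC = AIC(fit), BIC = BIC(fit))
}

grid <- expand.grid(p = 1:max_lag, q = 1:max_lag)
results <- as.data.frame(do.call(rbind, Map(ardl_ic, grid$p, grid$q)))
print(results[order(results$AIC), ][1, ])         # best by AIC
print(results[order(results$BIC), ][1, ])         # best by BIC
```

Fixing the sample costs a few observations for the short-lag models but makes the criteria directly comparable across the grid.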
ARDL(5,12) and ARDL(1,12) are the best models as per AIC and BIC scores respectively. Now, let’s fit these 2 models,
1. ARDL(5,12) model (best ARDL model as per AIC)
ARDL.5x12 = ardlDlm(formula = mortality ~ temp + chem1 + chem2 + particle.size, data = mort, p = 5, q = 12)
summary(ARDL.5x12)
##
## Time series regression with "ts" data:
## Start = 13, End = 508
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.297 -4.707 -0.229 4.891 32.044
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 51.956115 11.346709 4.579 6.03e-06 ***
## temp.t 0.330981 0.086925 3.808 0.000159 ***
## temp.1 -0.391803 0.093837 -4.175 3.56e-05 ***
## temp.2 -0.176996 0.097987 -1.806 0.071522 .
## temp.3 0.173976 0.098493 1.766 0.077996 .
## temp.4 0.169554 0.093970 1.804 0.071834 .
## temp.5 -0.182285 0.087057 -2.094 0.036821 *
## chem1.t -0.562260 0.335931 -1.674 0.094864 .
## chem1.1 0.825981 0.347765 2.375 0.017954 *
## chem1.2 1.035985 0.373850 2.771 0.005813 **
## chem1.3 0.303235 0.376345 0.806 0.420811
## chem1.4 -0.327034 0.359288 -0.910 0.363180
## chem1.5 -0.218576 0.349796 -0.625 0.532369
## chem2.t 0.081020 0.068985 1.174 0.240823
## chem2.1 -0.038998 0.070020 -0.557 0.577823
## chem2.2 -0.088762 0.072207 -1.229 0.219601
## chem2.3 -0.084572 0.072486 -1.167 0.243920
## chem2.4 0.089458 0.069569 1.286 0.199127
## chem2.5 -0.037733 0.069050 -0.546 0.585014
## particle.size.t 0.148787 0.063509 2.343 0.019568 *
## particle.size.1 -0.120956 0.064361 -1.879 0.060832 .
## particle.size.2 -0.071128 0.063615 -1.118 0.264113
## particle.size.3 -0.062485 0.063657 -0.982 0.326819
## particle.size.4 0.010341 0.063776 0.162 0.871261
## particle.size.5 0.160916 0.063555 2.532 0.011676 *
## mortality.1 0.368259 0.046599 7.903 2.04e-14 ***
## mortality.2 0.384979 0.049186 7.827 3.48e-14 ***
## mortality.3 -0.001288 0.051757 -0.025 0.980155
## mortality.4 -0.071443 0.051184 -1.396 0.163446
## mortality.5 0.039420 0.049570 0.795 0.426885
## mortality.6 -0.058020 0.046371 -1.251 0.211492
## mortality.7 -0.052843 0.046106 -1.146 0.252344
## mortality.8 0.020605 0.046402 0.444 0.657212
## mortality.9 0.076066 0.046484 1.636 0.102439
## mortality.10 -0.055984 0.046805 -1.196 0.232276
## mortality.11 0.037981 0.043296 0.877 0.380817
## mortality.12 -0.005607 0.041084 -0.136 0.891496
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.502 on 459 degrees of freedom
## Multiple R-squared: 0.7424, Adjusted R-squared: 0.7222
## F-statistic: 36.74 on 36 and 459 DF, p-value: < 2.2e-16
checkresiduals(ARDL.5x12$model)
##
## Breusch-Godfrey test for serial correlation of order up to 40
##
## data: Residuals
## LM test = 35.675, df = 40, p-value = 0.6652
MASE(ARDL.5x12)
## MASE
## ARDL.5x12 0.6808706
Summary of ARDL(5x12) DLM model
2. ARDL(1,12) model
ARDL.1x12 = ardlDlm(formula = mortality ~ temp + chem1 + chem2 + particle.size, data = mort, p = 1, q = 12)
summary(ARDL.1x12)
##
## Time series regression with "ts" data:
## Start = 13, End = 508
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.125 -4.524 -0.314 4.910 30.287
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 55.669410 9.695803 5.742 1.67e-08 ***
## temp.t 0.299804 0.077392 3.874 0.000122 ***
## temp.1 -0.446996 0.076893 -5.813 1.13e-08 ***
## chem1.t -0.266850 0.287687 -0.928 0.354102
## chem1.1 1.063044 0.297810 3.570 0.000394 ***
## chem2.t 0.075833 0.062642 1.211 0.226664
## chem2.1 -0.095445 0.063014 -1.515 0.130523
## particle.size.t 0.109059 0.055098 1.979 0.048353 *
## particle.size.1 -0.069979 0.055129 -1.269 0.204935
## mortality.1 0.385514 0.044057 8.750 < 2e-16 ***
## mortality.2 0.343553 0.044331 7.750 5.63e-14 ***
## mortality.3 -0.025005 0.046485 -0.538 0.590890
## mortality.4 0.003468 0.046338 0.075 0.940364
## mortality.5 0.027723 0.046302 0.599 0.549632
## mortality.6 -0.045346 0.046334 -0.979 0.328243
## mortality.7 -0.050591 0.046366 -1.091 0.275778
## mortality.8 0.015998 0.046757 0.342 0.732385
## mortality.9 0.074476 0.046828 1.590 0.112406
## mortality.10 -0.054654 0.047043 -1.162 0.245904
## mortality.11 0.020304 0.043490 0.467 0.640814
## mortality.12 -0.002706 0.040974 -0.066 0.947379
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.647 on 475 degrees of freedom
## Multiple R-squared: 0.723, Adjusted R-squared: 0.7114
## F-statistic: 62 on 20 and 475 DF, p-value: < 2.2e-16
checkresiduals(ARDL.1x12$model)
##
## Breusch-Godfrey test for serial correlation of order up to 24
##
## data: Residuals
## LM test = 37.453, df = 24, p-value = 0.03941
MASE(ARDL.1x12)
## MASE
## ARDL.1x12 0.7024931
Summary of ARDL(1x12) DLM model
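ARDL(5,12) is the better of the two ARDL models, with the lower MASE (0.6808706 against 0.7024931) and the higher adjusted R-squared (0.7222 against 0.7114). Its residuals also show no significant serial correlation (Breusch-Godfrey p-value = 0.6652), whereas those of ARDL(1,12) do at the 5% level (p-value = 0.03941).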
The mean absolute scaled errors (MASE) of the 4 DLM models (finite DLM, polynomial DLM, Koyck DLM and ARDL) are,
MASE(DLM.model, PolyDLM.Chem1, Koyck.Chem1, ARDL.5x12) %>% arrange(MASE)
## n MASE
## ARDL.5x12 496 0.6808706
## DLM.model 496 0.7966763
## Koyck.Chem1 507 0.8530742
## PolyDLM.Chem1 496 0.8944918
The best DLM model for the Mortality response, i.e. the one with the lowest MASE, is the autoregressive distributed lag model ARDL.5x12, which uses temperature, Chemical 1, Chemical 2 and particle size as regressors and has a MASE of 0.6808706.
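To make the ranking criterion concrete, here is a minimal sketch of the MASE definition: the model's in-sample mean absolute error divided by the in-sample mean absolute error of the naive one-step forecast. The function name mase() and the toy numbers are assumptions for illustration; the MASE() function of dLagM may differ in details such as which observations enter the scaling term.

```r
# MASE = MAE(model) / MAE(naive one-step forecast), both in-sample
mase <- function(actual, fitted) {
  stopifnot(length(actual) == length(fitted), length(actual) > 1)
  mae.model <- mean(abs(actual - fitted))
  mae.naive <- mean(abs(diff(actual)))  # naive forecast: previous observation
  mae.model / mae.naive
}

y     <- c(10, 12, 11, 13, 14)             # toy observed series
y.hat <- c(10.5, 11.5, 11.5, 12.5, 14.5)   # toy fitted values
mase.value <- mase(y, y.hat)
print(mase.value)
```

A value below 1 means the model's average error is smaller than that of the naive forecast, which is the case for all four DLM models in the table above.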
Dynamic linear models are a general class of time series regression models which can account for trends, seasonality, serial correlation between the response and regressor variables, and most importantly the effect of intervention points.
The response of a general dynamic linear model is,
\(Y_t = \omega_2 Y_{t-1} + (\omega_0 + \omega_1) P_t - \omega_2 \omega_0 P_{t-1} + N_t\)
where \(P_t\) is the pulse function, equal to 1 at the intervention time \(T\) and 0 otherwise, and \(\omega_0\), \(\omega_1\) and \(\omega_2\) are coefficients.
Let's revisit the time series plot of the response, Mortality, to visualize possible intervention points.
plot(Mortality)
As mentioned at the descriptive analysis stage, no clear intervention can be identified visually. However, week 153 might be an intervention point because of its magnitude. Assuming this intervention point, let's fit a dynamic linear model and see whether the pulse function at week 153 is significant.
Now, let's fit the dynamic linear model using dynlm() as shown below (the potential intervention point is week 153).
MortalityX = ts(mort[,1], start = c(2010,1), frequency = 52) # set frequency
Y.t = MortalityX
T = 153 # The time point when the intervention occurred
P.t = 1*(seq(MortalityX) == T)
P.t.1 = Lag(P.t,+1) #library(tis)
Dyn.model = dynlm(Y.t ~ L(Y.t , k = 1) + L(Y.t , k = 2) + L(Y.t , k = 3) + P.t + trend(Y.t) + season(Y.t)) # library(dynlm)
Dyn.model1 = dynlm(Y.t ~ L(Y.t , k = 1) + P.t + trend(Y.t) + season(Y.t)) # library(dynlm)
Dyn.model2 = dynlm(Y.t ~ L(Y.t , k = 1) + P.t + P.t.1 + trend(Y.t) + season(Y.t)) # library(dynlm)
Dyn.model3 = dynlm(Y.t ~ L(Y.t , k = 1) + L(Y.t , k = 2) + P.t + P.t.1 + trend(Y.t) + season(Y.t)) # library(dynlm)
Dyn.model4 = dynlm(Y.t ~ L(Y.t , k = 1) + L(Y.t , k = 2) + L(Y.t , k = 3) + P.t + trend(Y.t) + season(Y.t)) # library(dynlm)
AIC(Dyn.model, Dyn.model1, Dyn.model2, Dyn.model3, Dyn.model4) %>% arrange(AIC)
## df AIC
## Dyn.model 58 3581.297
## Dyn.model4 58 3581.297
## Dyn.model3 57 3585.558
## Dyn.model1 56 3675.135
## Dyn.model2 56 3675.135
As per the AIC score, Dyn.model is the best dynamic linear model (Dyn.model4 has the same specification, hence the tied AIC). It has 3 lagged components of the response (Mortality), a pulse component at week T = 153, and trend and seasonal components of the Mortality series with a frequency of 52 weeks. Let's look at the summary statistics and check the residuals.
summary(Dyn.model)
##
## Time series regression with "ts" data:
## Start = 2010(4), End = 2019(40)
##
## Call:
## dynlm(formula = Y.t ~ L(Y.t, k = 1) + L(Y.t, k = 2) + L(Y.t,
## k = 3) + P.t + trend(Y.t) + season(Y.t))
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.4794 -5.0652 -0.0579 4.7323 28.8326
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 60.81090 9.82971 6.186 1.39e-09 ***
## L(Y.t, k = 1) 0.24542 0.04779 5.135 4.21e-07 ***
## L(Y.t, k = 2) 0.38445 0.04514 8.518 2.50e-16 ***
## L(Y.t, k = 3) 0.02813 0.04730 0.595 0.552361
## P.t 19.98213 8.64428 2.312 0.021252 *
## trend(Y.t) -0.55873 0.15121 -3.695 0.000247 ***
## season(Y.t)2 1.16618 3.75922 0.310 0.756539
## season(Y.t)3 -3.90147 3.76534 -1.036 0.300688
## season(Y.t)4 -0.21336 3.69209 -0.058 0.953942
## season(Y.t)5 4.08592 3.70091 1.104 0.270172
## season(Y.t)6 -2.43424 3.68742 -0.660 0.509498
## season(Y.t)7 -4.26038 3.70334 -1.150 0.250587
## season(Y.t)8 -0.69626 3.73511 -0.186 0.852208
## season(Y.t)9 -4.06793 3.73698 -1.089 0.276933
## season(Y.t)10 -1.16705 3.76150 -0.310 0.756506
## season(Y.t)11 -1.32840 3.76654 -0.353 0.724491
## season(Y.t)12 -4.89799 3.77185 -1.299 0.194761
## season(Y.t)13 0.80463 3.79204 0.212 0.832055
## season(Y.t)14 -6.62136 3.78599 -1.749 0.080991 .
## season(Y.t)15 0.22392 3.82142 0.059 0.953300
## season(Y.t)16 -4.59251 3.81011 -1.205 0.228706
## season(Y.t)17 -4.32770 3.83136 -1.130 0.259272
## season(Y.t)18 -3.84213 3.83334 -1.002 0.316743
## season(Y.t)19 -1.81591 3.84827 -0.472 0.637245
## season(Y.t)20 0.93120 3.84012 0.242 0.808510
## season(Y.t)21 -5.58830 3.82162 -1.462 0.144364
## season(Y.t)22 -5.39485 3.83070 -1.408 0.159730
## season(Y.t)23 -4.11372 3.85112 -1.068 0.286011
## season(Y.t)24 -2.38456 3.86640 -0.617 0.537720
## season(Y.t)25 2.70699 3.86080 0.701 0.483576
## season(Y.t)26 -5.68786 3.83237 -1.484 0.138470
## season(Y.t)27 -7.66912 3.83656 -1.999 0.046217 *
## season(Y.t)28 -5.62879 3.87081 -1.454 0.146601
## season(Y.t)29 -1.84193 3.89559 -0.473 0.636568
## season(Y.t)30 -0.98999 3.88793 -0.255 0.799125
## season(Y.t)31 -3.58448 3.86759 -0.927 0.354530
## season(Y.t)32 -0.69539 3.85448 -0.180 0.856912
## season(Y.t)33 0.23723 3.83915 0.062 0.950755
## season(Y.t)34 -1.28064 3.82116 -0.335 0.737672
## season(Y.t)35 -7.57592 3.80111 -1.993 0.046859 *
## season(Y.t)36 0.83909 3.83836 0.219 0.827056
## season(Y.t)37 -0.98780 3.82847 -0.258 0.796513
## season(Y.t)38 3.61908 3.82361 0.947 0.344399
## season(Y.t)39 0.25202 3.77680 0.067 0.946827
## season(Y.t)40 -0.30103 3.76586 -0.080 0.936323
## season(Y.t)41 0.57732 3.85049 0.150 0.880884
## season(Y.t)42 3.67096 3.84761 0.954 0.340554
## season(Y.t)43 5.43604 3.82854 1.420 0.156340
## season(Y.t)44 1.50648 3.80624 0.396 0.692446
## season(Y.t)45 9.57170 3.79071 2.525 0.011912 *
## season(Y.t)46 14.51684 3.77913 3.841 0.000140 ***
## season(Y.t)47 14.80036 3.78414 3.911 0.000106 ***
## season(Y.t)48 5.15147 3.76452 1.368 0.171864
## season(Y.t)49 5.18546 3.87420 1.338 0.181425
## season(Y.t)50 1.48511 3.76214 0.395 0.693214
## season(Y.t)51 3.51171 3.75971 0.934 0.350788
## season(Y.t)52 6.97193 3.76239 1.853 0.064531 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.94 on 448 degrees of freedom
## Multiple R-squared: 0.7208, Adjusted R-squared: 0.6859
## F-statistic: 20.65 on 56 and 448 DF, p-value: < 2.2e-16
checkresiduals(Dyn.model)
##
## Breusch-Godfrey test for serial correlation of order up to 101
##
## data: Residuals
## LM test = 123.45, df = 101, p-value = 0.06399
Summary of Dynamic linear model, Dyn.model
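The pulse (P.t) component at week 153 is significant at the 5% level (p-value = 0.021), but the third lag of the response and most of the seasonal terms are not, and the adjusted R-squared (0.6859) is lower than that of ARDL(5,12) (0.7222). Since no clear intervention was identified visually either, the dynamic linear model is not considered suitable or necessary for our Mortality time series.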
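To illustrate how a pulse regressor enters such a regression, independently of the fit above, the sketch below simulates an autoregressive series with a one-off jump at a chosen week and estimates the jump with lm(). All values (the jump of 25, the coefficients, the name T.int) are hypothetical and chosen only for illustration.

```r
# A pulse is 1 at the intervention time and 0 everywhere else
set.seed(42)
n <- 300
T.int <- 153                                # hypothetical intervention week
pulse <- as.numeric(seq_len(n) == T.int)

# Simulated AR(1) series with a one-off jump of 25 at the intervention
e <- rnorm(n, sd = 2)
y <- numeric(n)
y[1] <- 100
for (i in 2:n) {
  y[i] <- 40 + 0.6 * y[i - 1] + 25 * pulse[i] + e[i]
}

# Regress on the lagged response and the pulse dummy
d <- data.frame(y = y[-1], y.lag1 = y[-n], pulse = pulse[-1])
fit <- lm(y ~ y.lag1 + pulse, data = d)
pulse.est <- coef(fit)[["pulse"]]
print(summary(fit)$coefficients)
```

Because the dummy is non-zero at a single observation, its coefficient is determined by that one data point, so a significant pulse on its own says little about the overall usefulness of the model.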
Exponential smoothing methods, including their state-space formulations, take into consideration the error, trend and seasonal components of the time series. The trend and seasonal components can each be absent (N), additive (A) or multiplicative (M), while the error is either additive or multiplicative. Hence, these models are written as ETS(Z,Z,Z), with the three letters standing for the error, trend and seasonal components respectively.
The best exponential smoothing or state-space model for our Mortality time series can be identified by an automatic search, triggered by setting the argument model = "ZZZ" in ets() as shown below. We will also check whether a damped trend or a drift gives a better model.
Best Exponential Smoothing model -
autofit.ETS = ets(Mortality, model="ZZZ")
summary(autofit.ETS)
## ETS(M,N,N)
##
## Call:
## ets(y = Mortality, model = "ZZZ")
##
## Smoothing parameters:
## alpha = 0.4818
##
## Initial states:
## l = 184.0437
##
## sigma: 0.0526
##
## AIC AICc BIC
## 5386.809 5386.857 5399.500
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.06720288 9.061271 7.121783 -0.2494414 4.194287 0.8628215
## ACF1
## Training set -0.05776727
checkresiduals(autofit.ETS)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,N,N)
## Q* = 37.641, df = 10, p-value = 4.381e-05
##
## Model df: 0. Total lags used: 10
The automatic search chooses simple exponential smoothing with multiplicative errors, ETS(M,N,N). Its MASE is 0.8628215.
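For intuition, the level recursion behind simple exponential smoothing can be written in a few lines of base R. This is a sketch with a hypothetical helper ses.filter() and a fixed alpha and initial level; ets() estimates both from the data, and for ETS(M,N,N) the point forecasts follow this same recursion while the prediction intervals differ.

```r
# Level update: l_t = l_{t-1} + alpha * (y_t - l_{t-1})
ses.filter <- function(y, alpha, l0) {
  n <- length(y)
  level <- numeric(n)
  one.step <- numeric(n)  # forecast of y[i] made at time i - 1
  prev <- l0
  for (i in seq_len(n)) {
    one.step[i] <- prev
    level[i] <- prev + alpha * (y[i] - prev)
    prev <- level[i]
  }
  list(level = level, one.step = one.step, next.forecast = prev)
}

out <- ses.filter(c(10, 12, 14), alpha = 0.5, l0 = 10)
print(out)
```

With no trend or seasonal term, the forecast for every future horizon equals the last level, so the forecast function of such a model is flat.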
Best Exponential Smoothing model with damping -
autofit.ETS.damped = ets(Mortality, model="ZZZ", damped = TRUE)
summary(autofit.ETS.damped)
## ETS(M,Ad,N)
##
## Call:
## ets(y = Mortality, model = "ZZZ", damped = TRUE)
##
## Smoothing parameters:
## alpha = 0.4501
## beta = 0.0311
## phi = 0.8
##
## Initial states:
## l = 189.1078
## b = -1.587
##
## sigma: 0.0527
##
## AIC AICc BIC
## 5391.739 5391.906 5417.122
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.04653304 9.063544 7.111666 -0.2250772 4.186325 0.8615958
## ACF1
## Training set -0.04632118
checkresiduals(autofit.ETS.damped)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,Ad,N)
## Q* = 36.302, df = 10, p-value = 7.468e-05
##
## Model df: 0. Total lags used: 10
With damping enforced, the automatic search chooses Holt’s damped trend model with multiplicative errors, ETS(M,Ad,N). Its MASE is 0.8615958.
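The effect of the damping parameter phi on the forecasts can be sketched as follows. The helper damped.forecast() and the numbers passed to it are hypothetical; it only evaluates the damped-trend point forecast formula, level + (phi + phi^2 + ... + phi^h) * slope, for a given final level and slope.

```r
# h-step-ahead point forecasts of a damped trend model
damped.forecast <- function(level, slope, phi, h) {
  level + cumsum(phi^seq_len(h)) * slope
}

fc.damped <- damped.forecast(level = 100, slope = 2, phi = 0.5, h = 3)
fc.linear <- damped.forecast(level = 100, slope = 2, phi = 1, h = 3)  # phi = 1: no damping
print(rbind(damped = fc.damped, linear = fc.linear))
```

With phi below 1 the increments between successive forecasts shrink geometrically, so the forecast path levels off instead of continuing along a straight line.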
Best Exponential Smoothing model with drift -
autofit.ETS.drift = ets(Mortality, model="ZZZ", beta = 1E-4)
summary(autofit.ETS.drift)
## ETS(M,N,N)
##
## Call:
## ets(y = Mortality, model = "ZZZ", beta = 1e-04)
##
## Smoothing parameters:
## alpha = 0.4818
##
## Initial states:
## l = 184.0437
##
## sigma: 0.0526
##
## AIC AICc BIC
## 5386.809 5386.857 5399.500
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.06720288 9.061271 7.121783 -0.2494414 4.194287 0.8628215
## ACF1
## Training set -0.05776727
checkresiduals(autofit.ETS.drift)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,N,N)
## Q* = 37.641, df = 10, p-value = 4.381e-05
##
## Model df: 0. Total lags used: 10
Again, the automatic search chooses the ETS(M,N,N) model.
Thus, among the automatic fits above, the best exponential smoothing or state-space model for our Mortality series by the MASE measure is Holt’s damped trend model with multiplicative errors, ETS(M,Ad,N), with a MASE of 0.8615958. (ETS(M,N,N) has the lower AIC, 5386.809 against 5391.739, but the selection here is based on MASE.)
Based on the time series regression methods considered, the best model as per the MASE measure for each method is summarized below,
A. Best distributed lag model - Autoregressive Distributed Lag model ARDL(5,12) with MASE of 0.6808706, AIC of 3444.216, BIC of 3604.066 and adjusted R-squared of 72.22%.
B. Best dynamic linear model - None (no clear intervention point was identified)
C. Best exponential smoothing and state-space model - Holt’s damped model with multiplicative errors ETS(M,Ad,N) with MASE of 0.8615958, AIC of 5391.739 and BIC of 5417.122
The best model is the Autoregressive Distributed Lag model ARDL(5,12), which has the lowest MASE. Its AIC and BIC are also lower, although information criteria from different model classes fitted to different effective samples are not strictly comparable.
Best time series regression model is - Autoregressive Distributed Lag model ARDL(5,12) with MASE of 0.6808706.
Residual analysis to test model assumptions.
Let's perform a detailed residual analysis to check whether any model assumptions have been violated.
The estimated error (or residual) is defined by:
\(\hat{\epsilon}_i = Y_i - \hat{Y}_i\) (i.e. observed value less fitted value)
The following are to be checked: linearity, zero mean of the residuals, serial autocorrelation and normality.
Let's first apply a diagnostic check using the checkresiduals() function,
checkresiduals(ARDL.5x12)
## Time Series:
## Start = 13
## End = 508
## Frequency = 1
## 13 14 15 16 17 18
## 3.95494986 -6.36350734 7.99771977 -11.75431331 7.09884780 4.58047006
## 19 20 21 22 23 24
## -5.96534256 10.94415086 -0.81117987 -1.91017591 -3.31810510 -2.18290048
## 25 26 27 28 29 30
## 7.74738011 -2.17198209 -1.61317247 -3.12823624 6.77108837 2.48328299
## 31 32 33 34 35 36
## -6.58151519 5.13409668 0.91884745 -4.34035642 -4.17477177 -4.38950894
## 37 38 39 40 41 42
## -3.10306926 4.40231690 -1.00584694 1.48994828 -17.48630487 -5.45136378
## 43 44 45 46 47 48
## -1.57291605 1.67982240 0.51996408 8.67040743 16.14082832 -1.26590273
## 49 50 51 52 53 54
## -1.97864371 -10.82760233 -1.60772773 5.38027355 -1.48993599 -1.20461513
## 55 56 57 58 59 60
## 8.07891725 -5.72503141 14.99810002 0.76348794 0.42326550 -3.40900049
## 61 62 63 64 65 66
## -5.68326211 1.34607468 5.82962069 -7.16286550 5.14270543 -1.80840368
## 67 68 69 70 71 72
## 9.90398629 8.82709463 -3.25134642 -7.49177106 7.99751593 2.47436299
## 73 74 75 76 77 78
## 7.63428207 0.91275020 0.05189774 0.65424475 24.72791046 2.55448363
## 79 80 81 82 83 84
## -13.89578958 3.12600723 4.13244257 8.72361352 7.23958918 0.58099168
## 85 86 87 88 89 90
## -1.64437876 3.10848043 0.18718698 6.55168796 4.12022167 0.94683856
## 91 92 93 94 95 96
## -12.58909035 -2.78863935 3.81057487 1.61142040 5.60086580 4.19176194
## 97 98 99 100 101 102
## 12.82719343 15.93749541 4.81137291 -3.77471656 6.63967465 -5.85526998
## 103 104 105 106 107 108
## 0.89703193 5.37633720 1.14403497 -12.56215249 -9.60249373 -5.40703500
## 109 110 111 112 113 114
## 8.53644596 -9.71880601 -3.56142163 -1.72870244 -0.57467916 5.79515905
## 115 116 117 118 119 120
## -1.02364995 -4.23894337 8.64961729 -1.43960957 14.42047405 -10.55216758
## 121 122 123 124 125 126
## 0.24942657 -1.32335769 4.78266953 9.35751552 1.24405800 3.07568849
## 127 128 129 130 131 132
## 14.21141664 4.94421817 9.12753948 -8.35512494 -1.05777654 -6.98414765
## 133 134 135 136 137 138
## 3.91420649 7.89516998 1.60605040 0.24529240 -5.39644512 0.05577879
## 139 140 141 142 143 144
## -2.96246220 10.18351258 -5.94502174 9.11210802 5.50275447 -6.57941825
## 145 146 147 148 149 150
## 1.83149064 1.92769126 0.94023115 -1.98544788 11.19442990 8.26246496
## 151 152 153 154 155 156
## 32.04402112 20.52385883 16.63266388 -3.41483471 7.40346882 -22.25734296
## 157 158 159 160 161 162
## -15.96459384 6.60885769 -9.62958053 4.25297252 -9.86589255 0.94396185
## 163 164 165 166 167 168
## -1.45133493 1.81847683 -24.29726730 21.69488829 -4.48157971 -2.19209220
## 169 170 171 172 173 174
## 2.06410303 -9.73895260 -4.99272731 5.49865228 -2.71224559 2.38271481
## 175 176 177 178 179 180
## 5.51336039 3.14767406 -1.10161022 -7.13701392 -3.24146665 5.00952959
## 181 182 183 184 185 186
## 7.89868239 -3.16097664 -1.81958855 -7.75598599 -3.71025819 -0.95460849
## 187 188 189 190 191 192
## 6.48038568 7.29581998 5.61478856 -5.10998855 -6.60046652 -5.44791623
## 193 194 195 196 197 198
## -0.71296493 0.08204407 2.08372614 -7.33045097 -4.72240256 -1.10571072
## 199 200 201 202 203 204
## 6.15443951 -10.80724807 -0.23485328 6.29358350 8.21373595 -15.02892544
## 205 206 207 208 209 210
## -15.70656567 -7.52102948 1.69214433 0.77130990 -1.94060059 1.07997724
## 211 212 213 214 215 216
## -5.70402995 12.37452330 -8.52640839 -2.49407150 -12.31886927 5.48771654
## 217 218 219 220 221 222
## 4.18604015 -2.77747763 7.02287339 0.80685187 8.18575344 -4.14505050
## 223 224 225 226 227 228
## -8.19552814 3.76189147 1.28399051 -0.27506470 -4.38325701 10.20273698
## 229 230 231 232 233 234
## -6.13595188 0.47809552 -0.03071909 -2.44707077 12.24200374 -11.56979564
## 235 236 237 238 239 240
## -1.06037401 -1.70226168 6.54993494 5.36430268 -0.41354634 -2.86678474
## 241 242 243 244 245 246
## -2.01335244 4.58513203 3.91936390 -3.62514114 4.67264810 -5.04558390
## 247 248 249 250 251 252
## -12.12825316 3.87871254 -5.90578042 9.25150484 -9.45979822 -5.51826691
## 253 254 255 256 257 258
## 0.66683663 16.23955168 -1.75061512 4.13312030 10.49940565 0.20890397
## 259 260 261 262 263 264
## 6.53612137 8.82886563 -14.99919117 -10.70611254 -14.14124202 0.18674557
## 265 266 267 268 269 270
## 2.81197390 -7.52926052 -4.72786981 6.33816656 -8.19211793 -0.29791104
## 271 272 273 274 275 276
## -15.48250071 -5.51345178 -0.30909320 -1.85980470 10.07902840 -8.50006956
## 277 278 279 280 281 282
## -3.95605451 -12.20879074 4.93301944 3.54516835 -1.10492530 -7.25058298
## 283 284 285 286 287 288
## -5.56065095 5.10586215 -1.03675046 -0.90100348 1.87133137 -10.78375617
## 289 290 291 292 293 294
## 1.76204014 -1.34097098 4.54284422 0.86593809 -2.04524076 -5.94074903
## 295 296 297 298 299 300
## -5.90248596 -2.25734110 8.97666898 -1.11631874 -12.77628876 -3.22769716
## 301 302 303 304 305 306
## -0.22294276 8.41745253 -6.67243994 -2.74484401 -11.28232816 -7.51264743
## 307 308 309 310 311 312
## 3.50720366 -8.21929782 -5.81503744 -8.47385201 -7.14613964 16.96539422
## 313 314 315 316 317 318
## 4.96927543 6.34034521 1.34080760 10.29836049 10.27245140 -10.08435339
## 319 320 321 322 323 324
## -7.52021221 4.77782974 -0.48242631 -6.11398827 -2.86695154 1.63487650
## 325 326 327 328 329 330
## 9.35300423 -7.72133706 -6.95062429 -3.82452975 0.81510671 4.54748135
## 331 332 333 334 335 336
## -10.27773923 11.26008371 -3.54523194 -7.35547627 -2.76545150 -8.55326819
## 337 338 339 340 341 342
## 2.43992524 8.71181625 -6.35509030 -2.42100759 -8.89243483 -1.67706047
## 343 344 345 346 347 348
## -6.65317861 5.78204085 -6.18828881 -5.55413728 -2.35445672 5.80385550
## 349 350 351 352 353 354
## 0.35626608 1.73276358 -2.99400288 -6.47141511 -4.84022067 -6.87039476
## 355 356 357 358 359 360
## 0.76104295 -14.13899076 10.13365038 -1.75451284 4.26403685 -5.33777551
## 361 362 363 364 365 366
## -5.62726884 -2.60996356 -21.16215705 -2.81155085 2.87606613 -10.71700066
## 367 368 369 370 371 372
## -7.93687031 5.29189659 6.60039848 -2.77491951 6.78954301 1.80329898
## 373 374 375 376 377 378
## -4.85569184 0.91986147 0.21856595 5.05903518 -2.70500831 -8.83209164
## 379 380 381 382 383 384
## 5.79678548 1.24502036 -2.22752446 2.81396419 -0.62689599 -8.49049411
## 385 386 387 388 389 390
## -4.50358734 0.48898681 0.15573572 5.10851477 3.75209636 1.33272178
## 391 392 393 394 395 396
## -6.15677834 3.26092506 2.09406121 -6.48780741 -1.20246600 -4.17537649
## 397 398 399 400 401 402
## -4.14695808 10.99456729 -12.80809145 4.19788948 1.95792601 -3.66649168
## 403 404 405 406 407 408
## -2.72680158 1.01894967 -15.09684944 -4.05525714 -1.80574978 5.87611750
## 409 410 411 412 413 414
## -3.44941047 6.49922835 6.81761828 -0.01406888 0.99094710 17.00433730
## 415 416 417 418 419 420
## 9.09424738 -8.32513793 4.05406324 5.88077288 -7.67549369 -4.11298651
## 421 422 423 424 425 426
## 5.07045780 -1.71621091 0.26741571 4.99926486 -3.18277015 -3.53453799
## 427 428 429 430 431 432
## 3.57014104 2.62910704 -9.66892826 2.67009821 3.84785088 -8.35412345
## 433 434 435 436 437 438
## 4.22764416 -6.20473009 1.81663193 1.90998451 2.26532096 -1.70997595
## 439 440 441 442 443 444
## 8.72943081 -1.20527249 -2.57651108 6.31449861 12.06466565 -7.59479846
## 445 446 447 448 449 450
## -3.95599036 6.75758022 2.15766636 -0.38119962 12.78150215 7.27005142
## 451 452 453 454 455 456
## -13.42279147 -5.75120165 -2.31732281 11.65348196 -0.37507577 -0.48908806
## 457 458 459 460 461 462
## 6.37889410 -5.85706155 -3.08613221 7.49879999 1.19246291 20.77003058
## 463 464 465 466 467 468
## 5.13014423 -8.26457285 -3.09503403 -1.32484604 -0.30632603 11.92483602
## 469 470 471 472 473 474
## -6.41935443 2.36767161 1.20286845 -0.36366850 -4.70177013 4.23713734
## 475 476 477 478 479 480
## -0.49427117 -1.53169214 6.12857475 5.50521529 -10.11844644 2.83869858
## 481 482 483 484 485 486
## 0.32598571 12.55872717 -0.11072409 18.46248465 -8.87724870 -6.68243304
## 487 488 489 490 491 492
## 8.15156241 -10.61184195 2.15745252 1.61811292 3.21339731 -14.94680715
## 493 494 495 496 497 498
## 7.18023010 -6.57171590 -2.03292254 9.76134156 -3.67465512 -3.11276137
## 499 500 501 502 503 504
## 4.87702190 10.68668331 7.92422530 -11.62216091 -15.98896382 7.12266604
## 505 506 507 508
## -1.66429198 4.67593289 8.28451087 2.32033818
##
## Ljung-Box test
##
## data: Residuals
## Q* = 1.9828, df = 10, p-value = 0.9965
##
## Model df: 0. Total lags used: 10
From the residuals plot, the residuals are randomly scattered around the mean with no visible pattern, so linearity is not violated.
To check whether the mean of the residuals is zero, let's calculate it as,
mean(ARDL.5x12$model$residuals)
## [1] 1.875473e-16
As the mean of the residuals is numerically zero (1.875473e-16), the zero mean assumption is not violated.
The Ljung-Box test reported by checkresiduals() has the hypotheses,
\(H_0\) : the series of residuals exhibits no serial autocorrelation of any order up to the tested lag (10 here)
\(H_a\) : the series of residuals exhibits serial autocorrelation of some order up to the tested lag
From the Ljung-Box test output, since the p-value (0.9965) > 0.05, we do not reject the null hypothesis of no serial autocorrelation.
Thus, according to this test and the ACF plot, we can conclude that the serial correlation left in the residuals is insignificant.
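How the Ljung-Box test separates the two hypotheses can be seen on simulated data. The sketch below applies Box.test() from the stats package to white noise and to an AR(1) series; both series and the names white and ar1 are illustrative assumptions, not the model residuals.

```r
# Ljung-Box test on an uncorrelated and on a strongly autocorrelated series
set.seed(123)
white <- rnorm(500)                                             # no serial correlation
ar1   <- as.numeric(arima.sim(model = list(ar = 0.8), n = 500)) # AR(1), phi = 0.8

lb.white <- Box.test(white, lag = 10, type = "Ljung-Box")
lb.ar1   <- Box.test(ar1,   lag = 10, type = "Ljung-Box")

print(c(white = lb.white$p.value, ar1 = lb.ar1$p.value))
```

The p-value for the AR(1) series is far below 0.05, leading to rejection of the null hypothesis, which is the opposite of what is observed for the ARDL(5,12) residuals.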
Normality is checked with the Shapiro-Wilk test, which has the hypotheses,
\(H_0\) : the residuals are normally distributed
\(H_a\) : the residuals are not normally distributed
shapiro.test(ARDL.5x12$model$residuals)
##
## Shapiro-Wilk normality test
##
## data: ARDL.5x12$model$residuals
## W = 0.99131, p-value = 0.005213
From the Shapiro-Wilk test, since the p-value (0.005213) < 0.05, we reject the null hypothesis of normality. Thus, the residuals of ARDL.5x12 are not normally distributed.
Summarizing the residual analysis of the ARDL(5,12) model:
Assumption 1: The error terms are randomly distributed and thus show linearity: Not violated
Assumption 2: The mean value of E is zero (zero mean residuals): Not violated
Assumption 4: The error terms are independently distributed, i.e. they are not autocorrelated: Not violated
Assumption 5: The errors are normally distributed: Violated
Normality of the residuals is the only assumption that is violated. The remark that ‘There is no normality assumption in fitting an exponential smoothing model’ (Rob Hyndman 2013) concerns exponential smoothing rather than regression models; for ARDL(5,12), non-normal residuals mainly affect the exactness of t-tests and prediction intervals, not the point forecasts. With no other assumption violated, the ARDL(5,12) model is retained for forecasting Mortality.
As ARDL(5,12) is the best fitting model by the MASE measure, let's estimate and plot the 4 weeks ahead (weeks 509-512) forecasts for the Mortality series.
Observed and fitted values are plotted below. This plot indicates a good agreement between the model and the original series.
plot(Mortality,ylab='Mortality', xlab = 'week', type="l", col="black", main="Observed and fitted values using ARDL(5,12) model on Mortality")
lines(ARDL.5x12$model$fitted.values, col="red")
legend("topleft",lty=1, text.width = 12,
col=c("black", "red"),
c("Mortality series", "ARDL(5,12) fit"))
Since future values of the covariates aren’t given, let's first find the best exponential smoothing/state-space model for each of the 4 covariates and use it to forecast them. A custom function GoFVals() will be used.
GoFVals = function(data, H, models){
M = length(models) # The number of competing models
N = length(data) # The number of considered time series
fit.models = list()
series = array(NA, N*M)
FittedModels = array(NA, N*M)
AIC = array(NA, N*M)
AICc = array(NA, N*M)
BIC = array(NA, N*M)
HQIC = array(NA, N*M)
MASE = array(NA, N*M)
mean.MASE = array(NA, N)
median.MASE = array(NA, N)
GoF = data.frame(series, FittedModels, AIC, AICc, BIC, HQIC, MASE)
count = 0
for ( j in 1:N){
sum.MASE = 0
sample.median = array(NA, M)
for ( i in 1: M){
count = count + 1
fit.models[[count]] = ets(data[[j]], model = models[i])
GoF$AIC[count] = fit.models[[count]]$aic
GoF$AICc[count] = fit.models[[count]]$aicc
GoF$BIC[count] = fit.models[[count]]$bic
q = length(fit.models[[count]]$par)
GoF$HQIC[count] = -2*fit.models[[count]]$loglik+ 2*q*log(log(length(data[[j]])))
GoF$MASE[count] = accuracy(fit.models[[count]])[6]
sum.MASE = sum.MASE + GoF$MASE[count]
sample.median[i] = GoF$MASE[count]
GoF$series[count] = j
GoF$FittedModels[count] = models[i]
}
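mean.MASE[j] = sum.MASE / M # mean over the M models (dividing by N = 4 series instead gives the smaller $mean.MASE values printed below)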
median.MASE[j] = median(sample.median)
}
return(list(GoF = GoF, mean.MASE = mean.MASE, median.MASE = median.MASE))
}
The 4 regressors auto fit to either “MAdN”, “AAdN”, “ANN” or “MAN” (this part of the analysis has been hidden for simplicity). Since the model argument of ets() takes three-letter codes, with damping set separately through the damped argument, the comparison below uses the three specifications "MAN", "AAN" and "ANN" for the 4 regressors, Temperature, Chemical 1, Chemical 2 and particle size. The fit of each of these models to each regressor using the GoFVals() function is shown below.
# Series to be modelled
data = list()
data[[1]] = Temp
data[[2]] = Chem1
data[[3]] = Chem2
data[[4]] = ParticleSize
# Specify the forecast horizon
H = 4
# Specify the models we will focus on
models = c("MAN", "AAN", "ANN")
GoFVals(data = data, H = H, models = models)
## $GoF
## series FittedModels AIC AICc BIC HQIC MASE
## 1 1 MAN 5093.616 5093.784 5118.999 5099.910 0.8200243
## 2 1 AAN 5076.333 5076.501 5101.716 5082.628 0.8210695
## 3 1 ANN 5088.180 5088.227 5100.871 5089.498 0.8233697
## 4 2 MAN 3922.682 3922.850 3948.065 3928.977 0.7599568
## 5 2 AAN 4097.537 4097.705 4122.920 4103.832 0.7590948
## 6 2 ANN 4130.231 4130.279 4142.922 4131.549 0.7992823
## 7 3 MAN 5610.257 5610.424 5635.640 5616.551 0.7553396
## 8 3 AAN 5631.253 5631.421 5656.636 5637.547 0.7551668
## 9 3 ANN 5641.994 5642.042 5654.686 5643.312 0.7683649
## 10 4 MAN 5598.713 5598.881 5624.096 5605.008 0.7749270
## 11 4 AAN 5621.606 5621.774 5646.989 5627.901 0.7669993
## 12 4 ANN 5643.652 5643.700 5656.344 5644.970 0.8037025
##
## $mean.MASE
## [1] 0.6161159 0.5795835 0.5697178 0.5864072
##
## $median.MASE
## [1] 0.8210695 0.7599568 0.7553396 0.7749270
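Based on MASE, the best ETS models for each regressor are MAN for Temperature (0.8200243) and AAN for Chemical 1 (0.7590948), Chemical 2 (0.7551668) and particle size (0.7669993).
Let's fit these models and get the future covariates,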
fit.MAN.Temp = ets(Temp, model="MAN")
forecast.MAN.Temp = forecast::forecast(fit.MAN.Temp, h = 4)
fit.MAN.Chem1 = ets(Chem1, model="AAN")
forecast.MAN.Chem1 = forecast::forecast(fit.MAN.Chem1, h = 4)
fit.MAN.Chem2 = ets(Chem2, model="AAN")
forecast.MAN.Chem2 = forecast::forecast(fit.MAN.Chem2, h = 4)
fit.MAN.ParticleSize = ets(ParticleSize, model="AAN")
forecast.MAN.ParticleSize = forecast::forecast(fit.MAN.ParticleSize, h = 4)
Using the Point Forecasts of these covariates, we can now forecast our Mortality response.
x.new = t(matrix(c(forecast.MAN.Temp$mean, forecast.MAN.Chem1$mean, forecast.MAN.Chem2$mean, forecast.MAN.ParticleSize$mean), ncol = 4,
nrow = 4))
forecasts.ardldlm = dLagM::forecast(model = ARDL.5x12, x = x.new, h = 4)$forecasts
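The layout of x.new is easy to get wrong, so the sketch below rebuilds the same construction with made-up forecast values and labels the result. It assumes, as the call above implies, that the forecast function takes one row per predictor and one column per forecast step; the numbers and the names x.demo, temp.f, chem1.f, chem2.f and psize.f are hypothetical.

```r
# Made-up 4-step forecasts for each of the four covariates
temp.f  <- c(70, 71, 72, 73)
chem1.f <- c(10, 11, 12, 13)
chem2.f <- c(40, 41, 42, 43)
psize.f <- c(50, 51, 52, 53)

# matrix() fills column by column, so each covariate first lands in a column;
# t() then turns covariates into rows and forecast steps into columns
x.demo <- t(matrix(c(temp.f, chem1.f, chem2.f, psize.f), ncol = 4, nrow = 4))
rownames(x.demo) <- c("temp", "chem1", "chem2", "particle.size")
colnames(x.demo) <- paste0("h", 1:4)
print(x.demo)
```

The same matrix could be built more directly with rbind(temp.f, chem1.f, chem2.f, psize.f); the row order has to match the order of the predictors in the model formula.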
Forecast using overall BEST fitting model:
The point forecasts and the forecast plot using the overall best fitting model, ARDL(5,12), are given below,
df <- data.frame(
ARDL_forecasts = c(forecasts.ardldlm)
)
row.names(df) <- c("week 509", "week 510", "week 511", "week 512")
df
## ARDL_forecasts
## week 509 171.0425
## week 510 171.7601
## week 511 171.2978
## week 512 171.4611
Mortality.extended4 = c(Mortality , forecasts.ardldlm)
{
plot(ts(Mortality.extended4),type="l", col = "red", xlim= c(400, 515),
ylab = "Mortality", xlab = "Weeks",
main="4 weeks ahead forecast for Mortality series
using ARDL(5,12) model")
lines(Mortality,col="black",type="l")
legend("topleft",lty=1,
col=c("black", "red"),
c("Mortality series", "ARDL(5,12) forecasts"))
}
The forecasts from the best finite DLM, polynomial DLM, Koyck DLM and exponential smoothing/state-space models are given and plotted below (note that the dynamic linear model was not retained),
For Distributed Lag models:
The 4 weeks ahead Point forecasts for the 4 DLM models are printed and plotted below,
# Forecasts using Finite DLM
forecasts.dlm = dLagM::forecast(model = DLM.model, x = x.new, h = 4)$forecasts
# Forecasts using Polynomial DLM
x.new2 = c(forecast.MAN.Chem1$mean)
forecasts.polydlm = dLagM::forecast(model = PolyDLM.Chem1 , x = x.new2, h = 4)$forecasts
# Forecasts using Koyck DLM
x.new3 = c(forecast.MAN.Chem1$mean)
forecasts.koyckdlm = dLagM::forecast(model = Koyck.Chem1 , x = x.new3, h = 4)$forecasts
# Forecasts using ARDL
forecasts.ardldlm = dLagM::forecast(model = ARDL.5x12, x = x.new, h = 4)$forecasts
df <- data.frame(
Finite_DLM_forecasts = c(forecasts.dlm),
Polynomial_DLM_forecasts = c(forecasts.polydlm),
Koyck_DLM_forecasts = c(forecasts.koyckdlm),
ARDL_forecasts = c(forecasts.ardldlm)
)
row.names(df) <- c("week 509", "week 510", "week 511", "week 512")
df
## Finite_DLM_forecasts Polynomial_DLM_forecasts Koyck_DLM_forecasts
## week 509 167.0714 164.6609 170.3823
## week 510 167.3189 165.2797 169.9345
## week 511 169.0378 166.2768 169.7852
## week 512 167.4805 166.9232 169.8031
## ARDL_forecasts
## week 509 171.0425
## week 510 171.7601
## week 511 171.2978
## week 512 171.4611
Mortality.extended1 = c(Mortality , forecasts.dlm)
Mortality.extended2 = c(Mortality , forecasts.polydlm)
Mortality.extended3 = c(Mortality , forecasts.koyckdlm)
Mortality.extended4 = c(Mortality , forecasts.ardldlm)
{
plot(ts(Mortality.extended4),type="l", col = "Red", xlim= c(400, 515),
ylab = "Mortality", xlab = "Weeks",
main="4 weeks ahead forecast for Mortality series
using DLM models")
lines(ts(Mortality.extended1),col="blue",type="l")
lines(ts(Mortality.extended2),col="green",type="l")
lines(ts(Mortality.extended3),col="orange",type="l")
lines(Mortality,col="black",type="l")
legend("topleft",lty=1,
col=c("black", "red", "blue", "green", "orange"),
c("Mortality series", "ARDL(5,12) forecasts", "Finite DLM forecasts", "Polynomial DLM forecasts", "Koyck DLM forecasts"))
}
For Exponential smoothing/State-space model:
The 4 weeks ahead point forecasts and prediction intervals are printed and plotted below,
forecasts.Dynlm = forecast::forecast(autofit.ETS.damped, h =4)
forecasts.Dynlm
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 509 167.6492 156.3245 178.9738 150.3296 184.9687
## 510 167.9555 155.3966 180.5144 148.7483 187.1627
## 511 168.2005 154.4267 181.9743 147.1353 189.2657
## 512 168.3966 153.4364 183.3568 145.5169 191.2763
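# forecasts.Dynlm holds the forecasts of the ETS model autofit.ETS.damped, so the labels refer to ETS
plot(forecasts.Dynlm, ylab="Mortality", type="l", fcol="red", xlab="weeks", xlim= c(400, 515),
main="4 weeks ahead forecasts using Exponential smoothing/State-space model")
legend("topleft",lty=1, pch=1, col=1:2, c("Mortality series","ETS forecasts"))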
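As a rough consistency check on the interval table above, the 95% limits can be recovered from the 80% limits. The sketch below assumes the intervals are symmetric and normal-based (point forecast plus or minus a normal quantile times the forecast standard deviation); that is an assumption about how they were produced, not something the output states.

```r
# Values copied from the week-509 row of the forecast table above
point <- 167.6492
lo80  <- 156.3245
hi80  <- 178.9738

# Assumed form: point +/- z * sigma_h, so the 80% width gives sigma_h
sigma_h <- (hi80 - lo80) / (2 * qnorm(0.90))

# Rebuild the 95% limits from the implied standard deviation
lo95 <- point - qnorm(0.975) * sigma_h
hi95 <- point + qnorm(0.975) * sigma_h
round(c(sigma_h = sigma_h, lo95 = lo95, hi95 = hi95), 4)
```

The rebuilt limits agree with the printed Lo 95 and Hi 95 columns to within rounding, which is what one would expect if the normal-based assumption holds.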
The best fitting model for our Mortality series in terms of MASE, which assesses forecast accuracy, is the Autoregressive Distributed Lag model \(ARDL(5,12)\) with all 4 regressors: Temperature, Chemical 1, Chemical 2, and Particle Size. The 4 weeks ahead point forecasts obtained with forecast() from the dLagM package are 171.0425, 171.7601, 171.2978, and 171.4611 respectively (confidence intervals are not output).
Potentially better forecasting methods can be explored, compared and diagnosed for better fit.
Rob Hyndman (2013) Does the Holt-Winters algorithm for exponential smoothing in time series modelling require the normality assumption in residuals?, Stack Exchange Website, accessed 26 September 2023. https://stats.stackexchange.com/questions/64911/does-the-holt-winters-algorithm-for-exponential-smoothing-in-time-series-modelli#:~:text=There%20is%20no%20normality%20assumption,under%20almost%20all%20residual%20distributions.
The dataset holds 6 columns and 31 observations: a Year column, the day of the year on which a species first flowers (first flowering day, FFD, a number between 1 and 365), and four climate factors, namely rainfall (rain), temperature (temp), radiation level (rad), and relative humidity (RH). All are measured for a single plant species from 1984 to 2014.
Our aim for the FFD dataset is to give the best 4 years ahead forecasts by finding the most accurate and suitable regression model for the yearly first flowering day, judged by MASE, using a single predictor at a time (univariate analysis). A descriptive analysis will be conducted first. A model-building strategy will then be applied to find the best fitting model among the time series regression methods (dLagM package), dynamic linear models (dynlm package), and exponential smoothing and the corresponding state-space models.
MASE, Information Criteria (AIC and BIC), and Adjusted R Squared.
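To make the MASE criterion concrete, here is a minimal sketch of the usual non-seasonal definition: the model's mean absolute error divided by the in-sample mean absolute error of the naive (previous value) forecast. The helper mase_manual is hypothetical and is not the dLagM implementation; the six FFD values are the ones shown in the data preview, and the "model" is just the sample mean.

```r
# Hypothetical helper (not from dLagM): in-sample MASE for a non-seasonal series
mase_manual <- function(actual, fitted) {
  mae_model <- mean(abs(actual - fitted))   # error of the model being scored
  mae_naive <- mean(abs(diff(actual)))      # error of the one-step naive forecast
  mae_model / mae_naive
}

y <- c(314, 314, 320, 306, 306, 314)        # first six FFD values from the dataset
fitted_mean <- rep(mean(y), length(y))      # toy model: predict the sample mean

mase_value <- mase_manual(y, fitted_mean)
round(mase_value, 3)
```

A value below 1 means the model's errors are, on average, smaller than those of the naive forecast on the same data.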
FFD_dataset <- read.csv("C:/Users/admin/Downloads/FFD.csv")
head(FFD_dataset)
## Year Temperature Rainfall Radiation RelHumidity FFD
## 1 1984 18.71038 2.489344 14.87158 54.64891 314
## 2 1985 19.26301 2.475890 14.68493 54.95781 314
## 3 1986 18.58356 2.421370 14.51507 54.96301 320
## 4 1987 19.10137 2.319726 14.67397 53.87205 306
## 5 1988 20.36066 2.465301 14.74863 53.11885 306
## 6 1989 19.59589 2.735890 14.78356 55.37671 314
For fitting a regression model, the response is FFD and the 4 regressor variables are the Temperature, Rainfall, Radiation Level and Relative Humidity.
All the 5 variables are continuous variables.
Let's first get the regressors and the response as TS objects,
FFD = ts(FFD_dataset[,6], start = c(1984))
Temperature = ts(FFD_dataset[,2], start = c(1984))
Rainfall = ts(FFD_dataset[,3], start = c(1984))
Radiation = ts(FFD_dataset[,4], start = c(1984))
RelHumidity = ts(FFD_dataset[,5], start = c(1984))
data.ts = ts(FFD_dataset, start = c(1984)) # Y and x in single dataframe
Let's scale, center and plot all 5 variables together,
plot(FFD)
data.scale = scale(data.ts)
plot(data.scale[,2:6], plot.type="s", col=c("red", "blue", "green", "yellow", "black"), main = "FFD (Black - Response), Temperature (Red - X1),\n Rainfall (Blue - X2), Radiation (Green - X3), RelHumidity (Yellow - X4)")
It is hard to read the correlations between the regressors and the response, and among the regressors themselves, from this plot. But it is fair to say the 5 variables show some correlation. Let's check for correlation statistically using ggpairs(),
ggpairs(data = FFD_dataset, columns = c(6,2,3,4,5), progress = FALSE) #library(GGally)
Hence, some correlation between the 4 regressors and the response is present. We can build regression models based on these correlations. First, let's look at the descriptive statistics.
Since we are building a regression model that estimates the response, \(FFD\), let's focus on the statistics of FFD.
summary(FFD)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 265.0 291.0 301.0 306.4 314.0 380.0
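The mean (306.4) lies somewhat above the median (301.0), and the maximum (380) is far above the third quartile (314), which suggests a right-skewed rather than a perfectly symmetric distribution.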
The time series plot for our data is generated using the following code chunk,
plot(FFD, ylab='Yearly average of First Flowering Day (FFD)',xlab='Year',
type='o', main="Figure 1: Yearly Average FFD Trend (1984-2014)")
Plot Inference :
From Figure 1, we can comment on the following characteristics of the time series,
Trend: The overall shape of the series seems to follow a downward trend, which would indicate non-stationarity.
Seasonality: From the plot, no seasonal behavior is seen.
Change in Variance: We see high variation in the FFD series during the years 1997-2004 and low variation during other years.
Behavior: We notice mixed MA and AR behavior. AR behavior is seen where successive data points follow one another. MA behavior is evident in the up and down fluctuations of the data points.
Intervention/Change points: No clear intervention point is seen. The years 2002-2003 might be intervention points, and we will check whether they cause a significant change in the mean value.
acf(FFD, main="ACF of FFD")
pacf(FFD, main ="PACF of FFD")
ACF plot: We notice no significant autocorrelations. No slowly decaying pattern indicates stationary series. We do not see any ‘wavish’ form. Thus, no significant seasonal behavior is observed.
PACF plot: The 1st vertical spike is insignificant indicating stationary series.
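Whether a spike counts as significant depends on the dashed limits drawn by acf() and pacf(). Under the default white-noise approximation these sit at plus or minus the 97.5% normal quantile divided by the square root of the series length. A short sketch for a series of 31 observations:

```r
n <- 31                                # number of yearly FFD observations
acf_bound <- qnorm(0.975) / sqrt(n)    # default 95% limit under white noise
round(acf_bound, 3)
```

With so few observations the limits are wide (roughly 0.35 in absolute value), so moderate autocorrelations can go undetected.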
Many model estimation procedures assume normality of the residuals. If this assumption doesn't hold, the coefficient estimates are not optimal. Let's look at the Quantile-Quantile (QQ) plot to assess normality visually and use the Shapiro-Wilk test to confirm the result statistically.
qqnorm(FFD, main = "Normal Q-Q Plot of Average yearly FFD")
qqline(FFD, col = 2)
We see deviations from normality. Clearly, the upper tail is off and most of the data in the middle is off the line as well. Let's check statistically using the Shapiro-Wilk test. Let's state the hypotheses of this test,
\(H_0\) : Time series is Normally distributed
\(H_a\) : Time series is not normal
shapiro.test(FFD)
##
## Shapiro-Wilk normality test
##
## data: FFD
## W = 0.85617, p-value = 0.0006877
From the Shapiro-Wilk test, since p < 0.05 significance level, we reject the null hypothesis that states the data is normal. Thus, FFD series is not normally distributed.
The ACF and PACF of the FFD series at the descriptive analysis stage suggest that the series is stationary. Let's check this using the ADF and PP tests,
Using ADF (Augmented Dickey-Fuller) test :
Let's state the hypotheses of the ADF test,
\(H_0\) : Time series is Difference non-stationary
\(H_a\) : Time series is Stationary
adf.test(FFD) #library(tseries)
##
## Augmented Dickey-Fuller Test
##
## data: FFD
## Dickey-Fuller = -2.5139, Lag order = 3, p-value = 0.3749
## alternative hypothesis: stationary
Since the p-value > 0.05, we do not reject the null hypothesis of non-stationarity. According to the ADF test, the series is non-stationary at the 5% level of significance.
Using PP (Phillips-Perron) test :
The null and alternate hypothesis are same as ADF test.
PP.test(FFD, lshort = TRUE)
##
## Phillips-Perron Unit Root Test
##
## data: FFD
## Dickey-Fuller = -4.0962, Truncation lag parameter = 2, p-value =
## 0.01861
PP.test(FFD, lshort = FALSE)
##
## Phillips-Perron Unit Root Test
##
## data: FFD
## Dickey-Fuller = -3.9565, Truncation lag parameter = 8, p-value =
## 0.02368
According to the PP tests, the FFD series is stationary at the 5% level.
The two procedures give differing outcomes. Since the Phillips-Perron (PP) test is non-parametric, i.e. it does not require selecting the level of serial correlation as the ADF test does, and since our FFD series does not have significant autocorrelations, we go with the outcome of the PP test: the FFD series is stationary.
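To get a feel for how the PP test behaves, it can be run on series whose properties are known. The sketch below uses simulated data only (not the FFD series): white noise, which is stationary by construction, and a random walk, which has a unit root.

```r
set.seed(123)
white_noise <- ts(rnorm(200))           # stationary by construction
random_walk <- ts(cumsum(rnorm(200)))   # unit-root process

pp_noise <- PP.test(white_noise)        # stats::PP.test, as used above
pp_walk  <- PP.test(random_walk)

c(noise = unname(pp_noise$p.value), walk = unname(pp_walk$p.value))
```

The white-noise p-value should fall below 0.05 (unit root rejected), while the random walk will usually give a much larger p-value. Note that the simulated series have 200 points; with only 31 observations, as for FFD, unit root tests have far less power.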
To improve normality in our FFD time series, let's test a Box-Cox transformation on the series,
lambda = BoxCox.lambda(FFD, method = "loglik") # library(forecast)
BC.FFD = BoxCox(FFD, lambda = lambda)
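For reference, the textbook Box-Cox transformation is (y^lambda - 1) / lambda for a non-zero lambda and log(y) when lambda is zero. The helper box_cox_manual below is a hypothetical sketch of that formula; forecast::BoxCox() is assumed to agree with it for positive data such as FFD.

```r
# Hypothetical helper mirroring the textbook Box-Cox formula (positive y only)
box_cox_manual <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

y <- c(314, 314, 320, 306, 306, 314)   # first six FFD values from the dataset

box_cox_manual(y, 1)     # lambda = 1 only shifts the data by -1
box_cox_manual(y, 0)     # lambda = 0 is the log transform
box_cox_manual(4, 0.5)   # (sqrt(4) - 1) / 0.5
```

Because the transformation is monotone, it changes the shape of the distribution but not the ordering of the observations.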
Visually comparing the time series plots before and after box-cox transformation,
par(mfrow=c(2,1))
plot(BC.FFD,ylab='Yearly FFD',xlab='Time',
type='o', main="Box-Cox Transformed FFD Time Series")
points(y=BC.FFD,x=time(BC.FFD))
plot(FFD,ylab='Yearly FFD',xlab='Time',
type='o', main="Original FFD Time Series")
points(y=FFD,x=time(FFD))
par(mfrow=c(1,1))
From the plot, almost no improvement in the variance of the time series is visible after the BC transformation. Let's check for normality using the Shapiro-Wilk test,
shapiro.test(BC.FFD)
##
## Shapiro-Wilk normality test
##
## data: BC.FFD
## W = 0.92261, p-value = 0.0277
From the Shapiro-Wilk test, since p < 0.05 significance level, we reject the null hypothesis that states the data is normal. Thus, BC Transformed FFD is not normal.
Like the original series, the BC transformed FFD series is not normal, so the BC transformation was not effective.
At the descriptive analysis stage, from the time series plot and the ACF/PACF plots, no seasonal pattern was observed but a downward trend was observed. Let's decompose the FFD series to confirm this. The STL decomposition method will be used with t.window set to 15.
We can adjust the series for seasonality by subtracting the seasonal component from the original series, and for trend by subtracting the trend component, using the following code chunk,
Note - Since we cannot decompose a series with frequency 1, let's artificially set the frequency to 2. Also note that, because the frequency is doubled, the time axis is compressed and ends around 1999 instead of 2014. This is okay since we are only interested in the decomposition.
# Code gist - Apply STL decomposition to get seasonally adjusted and trend adjusted and visually compare w.r.t to original time series
FFDX = ts(FFD_dataset[,6], start = c(1984),frequency = 2) # set frequency
stl.FFD <- stl(window(FFDX, start=c(1984)), t.window=15, s.window="periodic", robust=TRUE)
par(mfrow=c(3,1))
plot(FFDX,ylab='FFD',xlab='Time',
type='o', main="Original FFD Time Series")
plot(seasadj(stl.FFD), ylab='FFD',xlab='Time', main = "Seasonally adjusted FFD")
stl.FFD.trend = stl.FFD$time.series[,"trend"] # Extract the trend component from the output
stl.FFD.trend.adjusted = FFDX - stl.FFD.trend
plot(stl.FFD.trend.adjusted, ylab='FFD',xlab='Time', main = "Trend adjusted FFD")
par(mfrow=c(1,1))
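The compressed time axis that results from setting the frequency to 2 can be checked without the data. The sketch below uses a placeholder series of the same length (31 values starting in 1984) and compares the time attributes of the two ts objects.

```r
x_annual <- ts(1:31, start = 1984)                  # frequency 1: one value per year
x_fake   <- ts(1:31, start = 1984, frequency = 2)   # same values, two per "year"

tsp(x_annual)   # start, end, frequency
tsp(x_fake)
```

With two observations per time unit, 31 values span only 15 years, so the second series ends in 1999 while the first ends in 2014.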
On close inspection of the plots above, the trend adjusted series differs more from the original FFD series than the seasonally adjusted series does. This means the trend component is more significant than the seasonal component in the FFD series. Thus, we expect the fitted model to have no seasonal component.
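Time series regression methods:
Based on whether the lag length is known (Finite DLM) or undetermined (Infinite DLM), 4 major modelling methods will be tested, namely the Finite DLM, the Polynomial DLM, the Koyck model and the Autoregressive Distributed Lag (ARDL) model.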
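The response of a finite DLM model with 1 regressor is represented as shown below,
\(Y_t = \alpha + \sum_{s=0}^{q} \beta_s X_{t-s} + \epsilon_t\)
where \(\alpha\) is the intercept, \(\beta_s\) is the weight of the regressor at lag \(s\), \(q\) is the finite lag length, and \(\epsilon_t\) is the error term.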
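A finite DLM of this form is an ordinary linear regression on the current and lagged values of the regressor. The sketch below builds the lagged design matrix with base R's embed() on simulated data; the lag length, coefficients and variable names are made up for illustration, and this is not the dLagM implementation.

```r
set.seed(1)
n <- 31   # same length as the FFD series
q <- 2    # illustrative lag length

x <- rnorm(n)                        # simulated regressor
X <- embed(x, q + 1)                 # row for time t holds x_t, x_{t-1}, ..., x_{t-q}
colnames(X) <- paste0("x.lag", 0:q)

beta <- c(2, 1, 0.5)                 # made-up lag weights beta_0, ..., beta_q
y <- 10 + drop(X %*% beta) + rnorm(nrow(X), sd = 0.1)

fit <- lm(y ~ X)                     # intercept alpha plus q + 1 lag weights
nrow(X)                              # only n - q observations are usable
round(coef(fit), 2)
```

Each extra lag costs one observation and one coefficient, which matters a great deal for a series with only 31 points.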
In our dataset, we have 4 regressors. For the univariate analysis, let's fit a single-regressor model for each of the 4 regressors.
Note - We are using FFD and not BC.FFD (the BC transformed FFD series), as normality is violated in both.
With intercept :
Now, let's use the AIC and BIC scores to find the best lag length for the Finite DLM model,
finiteDLMauto(formula = FFD ~ Temperature, data = FFD_dataset, q.min = 1, q.max = 20,
model.type = "dlm", error.type = "AIC", trace = TRUE)
## q - k MASE AIC BIC GMRAE MBRAE R.Adj.Sq Ljung-Box
## 15 15 0.00000 -Inf -Inf 0.00000 0.13333 NaN NaN
## 16 16 0.00000 -Inf -Inf 0.00000 0.14286 NaN NaN
## 17 17 0.00000 -Inf -Inf 0.00000 0.15385 NaN NaN
## 18 18 0.00000 -Inf -Inf 0.00000 0.16667 NaN NaN
## 19 19 0.00000 -Inf -Inf 0.00000 0.13636 NaN NaN
## 20 20 0.00000 -Inf -Inf 0.00000 0.15000 NaN NaN
## 14 14 0.08990 108.3305 122.4952 62.05535 0.36951 0.92421 0.02999554
## 13 13 0.27470 158.8972 173.1432 118.78319 -0.00663 0.60746 0.15929080
## 12 12 0.39566 182.3394 196.5060 95.89904 1.23693 0.30941 0.29664926
## 11 11 0.45582 190.0656 204.0059 99.13968 0.86788 0.40406 0.34898253
## 10 10 0.54444 202.4974 216.0762 95.21457 0.75880 0.30833 0.86820277
## 9 9 0.69138 217.5864 230.6789 98.30030 0.78249 0.07685 0.13632287
## 8 8 0.70080 225.7275 238.2179 88.30203 1.10743 0.11344 0.45917951
## 7 7 0.69979 236.6118 248.3923 40.87507 0.55008 0.00966 0.80817276
## 6 6 0.75788 248.1290 259.0989 134.74503 0.47069 -0.13573 0.33850283
## 5 5 0.73794 255.3456 265.4104 102.43406 0.47497 -0.09659 0.23858117
## 4 4 0.84199 264.5275 273.5984 101.88888 0.38987 -0.15287 0.20121785
## 3 3 0.85614 270.8700 278.8632 204.15267 0.34556 -0.09628 0.20691076
## 2 2 0.84819 277.4305 284.2670 171.03798 0.47663 -0.04610 0.22787307
## 1 1 0.82301 284.3083 289.9131 144.59119 0.52710 -0.02241 0.19315716
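The fits for q = 15 to 20 are degenerate: after dropping the first q observations, the model has at least as many coefficients as observations, so the fit is perfect and AIC/BIC are -Inf. Among the remaining models, q = 14 has the smallest AIC and BIC scores, though it leaves only 1 residual degree of freedom. Fit model with q = 14,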
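The rows with AIC = -Inf in the table above can be understood by counting degrees of freedom. With 31 observations, a finite DLM with lag length q uses 31 - q rows and estimates q + 2 coefficients (the intercept plus q + 1 lag weights). A short sketch of that arithmetic:

```r
n <- 31                      # yearly observations in the FFD dataset
q <- 1:20                    # lag lengths searched by finiteDLMauto()

n_used   <- n - q            # rows left after creating q lags
n_coef   <- q + 2            # intercept + lag weights for lags 0..q
resid_df <- n_used - n_coef  # residual degrees of freedom

data.frame(q, n_used, n_coef, resid_df)[13:16, ]
```

At q = 14 a single residual degree of freedom remains, which matches the model summaries; from q = 15 onwards there are more coefficients than observations, so the fit is saturated. The very low AIC at q = 14 is therefore at least partly a sign of overfitting.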
DLM.Temperature = dlm(formula = FFD ~ Temperature, data = FFD_dataset, q = 14)
summary(DLM.Temperature)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## -0.87863 -0.01357 2.16805 -2.44038 2.36726 -3.46998 1.57429 -1.69876
## 9 10 11 12 13 14 15 16
## 2.05320 2.82958 -1.94792 -2.01906 2.49497 2.29534 -2.48892 1.36276
## 17
## -2.18822
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 324.150 272.588 1.189 0.445
## Temperature.t 8.726 15.306 0.570 0.670
## Temperature.1 23.651 22.743 1.040 0.488
## Temperature.2 19.604 12.959 1.513 0.372
## Temperature.3 -5.885 15.286 -0.385 0.766
## Temperature.4 -5.930 11.873 -0.499 0.705
## Temperature.5 10.871 16.614 0.654 0.631
## Temperature.6 4.638 7.972 0.582 0.665
## Temperature.7 -43.245 9.989 -4.329 0.145
## Temperature.8 -20.122 8.696 -2.314 0.260
## Temperature.9 8.562 13.430 0.638 0.639
## Temperature.10 -7.797 7.130 -1.093 0.472
## Temperature.11 -22.479 9.770 -2.301 0.261
## Temperature.12 2.044 11.708 0.175 0.890
## Temperature.13 12.845 7.391 1.738 0.332
## Temperature.14 12.630 7.772 1.625 0.351
##
## Residual standard error: 8.881 on 1 degrees of freedom
## Multiple R-squared: 0.9953, Adjusted R-squared: 0.9242
## F-statistic: 14.01 on 15 and 1 DF, p-value: 0.207
##
## AIC and BIC values for the model:
## AIC BIC
## 1 108.3305 122.4952
DLM.Temperature Model is insignificant (p-value = 0.207) at the 0.05 significance level.
Without intercept :
DLM.Temperature.noIntercept = dlm(formula = FFD ~ 0 + Temperature, data = FFD_dataset, q = 14)
summary(DLM.Temperature.noIntercept)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9 10
## 1.5293 1.5668 1.9674 -2.0817 4.5839 -4.5171 5.1261 -0.5464 5.8975 0.3338
## 11 12 13 14 15 16 17
## -2.8880 0.3861 -0.2987 -2.5086 -2.5415 1.4761 -7.1409
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Temperature.t 7.771 16.793 0.463 0.6890
## Temperature.1 42.787 17.658 2.423 0.1363
## Temperature.2 25.787 13.041 1.977 0.1866
## Temperature.3 -16.455 13.663 -1.204 0.3517
## Temperature.4 -8.112 12.888 -0.629 0.5934
## Temperature.5 23.699 13.882 1.707 0.2299
## Temperature.6 8.248 8.098 1.019 0.4156
## Temperature.7 -49.947 9.062 -5.512 0.0314 *
## Temperature.8 -20.216 9.554 -2.116 0.1686
## Temperature.9 -1.095 11.751 -0.093 0.9342
## Temperature.10 -2.734 6.284 -0.435 0.7060
## Temperature.11 -27.131 9.836 -2.758 0.1101
## Temperature.12 9.938 10.596 0.938 0.4473
## Temperature.13 11.407 8.011 1.424 0.2905
## Temperature.14 10.338 8.272 1.250 0.3378
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.757 on 2 degrees of freedom
## Multiple R-squared: 0.9999, Adjusted R-squared: 0.999
## F-statistic: 1107 on 15 and 2 DF, p-value: 0.0009028
##
## AIC and BIC values for the model:
## AIC BIC
## 1 121.3131 134.6445
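DLM.Temperature.noIntercept Model is significant (p-value = 0.0009028). Note that R-squared for a model without an intercept is measured against zero rather than against the mean of FFD, so it is not comparable with the R-squared of the intercept models.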
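Some care is needed when reading R-squared and the F-test for models without an intercept: R's summary() for such models measures variation around zero instead of around the mean of the response. The sketch below uses simulated data in which the response is generated independently of the regressor, on scales loosely resembling FFD and Temperature (these scales are an assumption for illustration).

```r
set.seed(42)
n <- 31
x <- rnorm(n, mean = 20, sd = 1)     # regressor on a temperature-like scale
y <- rnorm(n, mean = 300, sd = 10)   # response generated independently of x

r2_with    <- summary(lm(y ~ x))$r.squared       # variation around mean(y)
r2_without <- summary(lm(y ~ 0 + x))$r.squared   # variation around zero

round(c(with_intercept = r2_with, without_intercept = r2_without), 4)
```

Even though x carries no information about y, the no-intercept R-squared comes out close to 1, because the slope simply absorbs the level of the response. High R-squared values and small F-test p-values for the no-intercept DLMs should be read in that light.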
With intercept :
Now, let's use the AIC and BIC scores to find the best lag length for the Finite DLM model,
finiteDLMauto(formula = FFD ~ Rainfall, data = FFD_dataset, q.min = 1, q.max = 20,
model.type = "dlm", error.type = "AIC", trace = TRUE)
## q - k MASE AIC BIC GMRAE MBRAE R.Adj.Sq Ljung-Box
## 15 15 0.00000 -Inf -Inf 0.00000 0.13333 NaN NaN
## 16 16 0.00000 -Inf -Inf 0.00000 0.14286 NaN NaN
## 17 17 0.00000 -Inf -Inf 0.00000 0.15385 NaN NaN
## 18 18 0.00000 -Inf -Inf 0.00000 0.16667 NaN NaN
## 19 19 0.00000 -Inf -Inf 0.00000 0.13636 NaN NaN
## 20 20 0.00000 -Inf -Inf 0.00000 0.15000 NaN NaN
## 14 14 0.06321 100.0839 114.2485 40.97864 0.34442 0.95334 0.03001328
## 13 13 0.29906 166.8307 181.0766 78.54528 -0.30469 0.39005 0.51860280
## 12 12 0.41664 181.2808 195.4474 121.22896 -953.69675 0.34683 0.20539117
## 11 11 0.44805 188.2359 202.1761 111.03396 1.08054 0.45616 0.20552675
## 10 10 0.55752 200.1260 213.7048 107.68352 0.27076 0.38219 0.24086643
## 9 9 0.59928 212.7206 225.8131 86.83974 0.90191 0.26002 0.03078702
## 8 8 0.63873 220.6896 233.1800 71.50389 0.28130 0.28784 0.01640637
## 7 7 0.72075 234.3870 246.1675 75.08693 0.37359 0.09734 0.04286781
## 6 6 0.77856 247.9410 258.9109 155.32847 0.40794 -0.12723 0.01597736
## 5 5 0.79987 254.7647 264.8295 115.44821 0.65748 -0.07236 0.01296258
## 4 4 0.77962 264.1272 273.1980 88.57121 0.51811 -0.13590 0.03640757
## 3 3 0.83069 270.9038 278.8970 224.91666 0.18089 -0.09761 0.05860571
## 2 2 0.83748 277.5444 284.3809 156.29025 0.06368 -0.05021 0.05860348
## 1 1 0.84164 284.1154 289.7202 128.87890 -4.27195 -0.01585 0.05515185
Ignoring the degenerate fits for q = 15 to 20, q = 14 has the smallest AIC and BIC scores. Fit model with q = 14,
DLM.Rainfall = dlm(formula = FFD ~ Rainfall, data = FFD_dataset, q = 14)
summary(DLM.Rainfall)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9 10
## -2.0224 0.1804 1.2773 2.4063 -1.5688 0.5488 -2.3178 0.7084 -0.3974 2.6065
## 11 12 13 14 15 16 17
## -0.9432 0.7841 -1.2432 0.8286 -3.1891 2.7153 -0.3739
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 536.968 81.521 6.587 0.0959 .
## Rainfall.t -22.617 11.404 -1.983 0.2973
## Rainfall.1 -19.118 7.085 -2.698 0.2259
## Rainfall.2 -30.871 8.480 -3.640 0.1707
## Rainfall.3 -10.860 6.858 -1.584 0.3586
## Rainfall.4 -45.273 9.064 -4.995 0.1258
## Rainfall.5 -61.818 6.821 -9.063 0.0700 .
## Rainfall.6 -44.444 7.509 -5.918 0.1066
## Rainfall.7 28.972 8.629 3.358 0.1843
## Rainfall.8 24.033 7.222 3.328 0.1858
## Rainfall.9 42.785 7.557 5.662 0.1113
## Rainfall.10 35.400 7.262 4.875 0.1288
## Rainfall.11 30.458 7.855 3.878 0.1607
## Rainfall.12 -19.904 7.011 -2.839 0.2156
## Rainfall.13 -2.661 11.686 -0.228 0.8575
## Rainfall.14 -14.529 10.470 -1.388 0.3975
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.968 on 1 degrees of freedom
## Multiple R-squared: 0.9971, Adjusted R-squared: 0.9533
## F-statistic: 22.8 on 15 and 1 DF, p-value: 0.1631
##
## AIC and BIC values for the model:
## AIC BIC
## 1 100.0839 114.2485
DLM.Rainfall Model is insignificant (p-value = 0.1631) at the 0.05 significance level.
Without intercept :
DLM.Rainfall.noIntercept = dlm(formula = FFD ~ 0 + Rainfall, data = FFD_dataset, q = 14)
summary(DLM.Rainfall.noIntercept)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9 10
## -10.103 -22.647 -5.902 13.906 10.911 1.895 -5.248 -14.913 -7.868 7.168
## 11 12 13 14 15 16 17
## 15.913 9.856 10.133 9.585 -11.094 -7.647 9.978
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Rainfall.t 33.725 35.533 0.949 0.443
## Rainfall.1 1.135 30.071 0.038 0.973
## Rainfall.2 -20.010 39.188 -0.511 0.660
## Rainfall.3 -2.902 31.803 -0.091 0.936
## Rainfall.4 -26.893 40.628 -0.662 0.576
## Rainfall.5 -61.929 32.132 -1.927 0.194
## Rainfall.6 -22.233 31.609 -0.703 0.555
## Rainfall.7 48.877 38.075 1.284 0.328
## Rainfall.8 38.476 32.416 1.187 0.357
## Rainfall.9 25.108 33.279 0.754 0.529
## Rainfall.10 28.563 33.859 0.844 0.488
## Rainfall.11 9.729 33.905 0.287 0.801
## Rainfall.12 -10.485 32.334 -0.324 0.776
## Rainfall.13 41.674 45.004 0.926 0.452
## Rainfall.14 37.699 32.211 1.170 0.362
##
## Residual standard error: 32.83 on 2 degrees of freedom
## Multiple R-squared: 0.9986, Adjusted R-squared: 0.9884
## F-statistic: 97.69 on 15 and 2 DF, p-value: 0.01018
##
## AIC and BIC values for the model:
## AIC BIC
## 1 162.564 175.8954
DLM.Rainfall.noIntercept Model is significant (p-value = 0.01018).
With intercept :
Now, let's use the AIC and BIC scores to find the best lag length for the Finite DLM model,
finiteDLMauto(formula = FFD ~ Radiation, data = FFD_dataset, q.min = 1, q.max = 20,
model.type = "dlm", error.type = "AIC", trace = TRUE)
## q - k MASE AIC BIC GMRAE MBRAE R.Adj.Sq Ljung-Box
## 15 15 0.00000 -Inf -Inf 0.00000 0.13333 NaN NaN
## 16 16 0.00000 -Inf -Inf 0.00000 0.14286 NaN NaN
## 17 17 0.00000 -Inf -Inf 0.00000 0.15385 NaN NaN
## 18 18 0.00000 -Inf -Inf 0.00000 0.16667 NaN NaN
## 19 19 0.00000 -Inf -Inf 0.00000 0.13636 NaN NaN
## 20 20 0.00000 -Inf -Inf 0.00000 0.15000 NaN NaN
## 14 14 0.03909 89.04681 103.2114 18.82913 0.37467 0.97562 0.44963843
## 13 13 0.33459 166.15857 180.4045 149.80902 1.73189 0.41240 0.10406781
## 12 12 0.44056 183.55370 197.7203 133.73569 0.87138 0.26383 0.96472470
## 11 11 0.48709 190.82115 204.7614 132.94946 -0.67933 0.38111 0.88498361
## 10 10 0.50968 200.65556 214.2344 70.58245 0.12372 0.36641 0.96837884
## 9 9 0.67299 213.95421 227.0467 91.01071 0.49096 0.21734 0.34600265
## 8 8 0.74835 227.33217 239.8226 95.08541 0.54224 0.04938 0.06793710
## 7 7 0.73121 233.54416 245.3247 73.68775 0.56382 0.12849 0.06174573
## 6 6 0.82962 250.69918 261.6691 178.94922 0.73611 -0.25871 0.30633276
## 5 5 0.82691 257.54959 267.6144 91.87223 -0.46592 -0.19360 0.18952017
## 4 4 0.84162 264.08048 273.1513 106.78227 0.60001 -0.13394 0.17174020
## 3 3 0.83712 270.41454 278.4078 167.36396 0.62192 -0.07860 0.15731113
## 2 2 0.84548 277.39923 284.2357 117.96904 0.92888 -0.04497 0.17671372
## 1 1 0.85379 284.14675 289.7515 105.58523 0.50672 -0.01691 0.16861971
Ignoring the degenerate fits for q = 15 to 20, q = 14 has the smallest AIC and BIC scores. Fit model with q = 14,
DLM.Radiation = dlm(formula = FFD ~ Radiation, data = FFD_dataset, q = 14)
summary(DLM.Radiation)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## -0.13309 -1.59683 -0.19007 1.77884 0.16235 -0.02138 0.61645 -2.14129
## 9 10 11 12 13 14 15 16
## 0.80942 -0.25149 2.25184 0.24341 -2.67825 0.04819 0.46272 -0.44306
## 17
## 1.08223
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8562.308 1469.386 -5.827 0.1082
## Radiation.t 107.716 14.721 7.317 0.0865 .
## Radiation.1 78.781 14.515 5.428 0.1160
## Radiation.2 56.552 17.392 3.252 0.1899
## Radiation.3 48.867 19.819 2.466 0.2453
## Radiation.4 -87.676 20.012 -4.381 0.1429
## Radiation.5 -101.449 25.466 -3.984 0.1566
## Radiation.6 105.043 12.740 8.245 0.0768 .
## Radiation.7 12.211 11.781 1.037 0.4886
## Radiation.8 115.181 22.558 5.106 0.1231
## Radiation.9 -45.629 7.503 -6.082 0.1037
## Radiation.10 -45.000 7.348 -6.124 0.1030
## Radiation.11 -14.283 8.005 -1.784 0.3252
## Radiation.12 43.601 8.722 4.999 0.1257
## Radiation.13 168.244 24.040 6.998 0.0904 .
## Radiation.14 167.580 29.060 5.767 0.1093
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.036 on 1 degrees of freedom
## Multiple R-squared: 0.9985, Adjusted R-squared: 0.9756
## F-statistic: 43.69 on 15 and 1 DF, p-value: 0.1182
##
## AIC and BIC values for the model:
## AIC BIC
## 1 89.04681 103.2114
DLM.Radiation Model is insignificant (p-value = 0.1182) at the 0.05 significance level.
Without intercept :
DLM.Radiation.noIntercept = dlm(formula = FFD ~ 0 + Rainfall, data = FFD_dataset, q = 14)
summary(DLM.Radiation.noIntercept)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9 10
## -10.103 -22.647 -5.902 13.906 10.911 1.895 -5.248 -14.913 -7.868 7.168
## 11 12 13 14 15 16 17
## 15.913 9.856 10.133 9.585 -11.094 -7.647 9.978
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Rainfall.t 33.725 35.533 0.949 0.443
## Rainfall.1 1.135 30.071 0.038 0.973
## Rainfall.2 -20.010 39.188 -0.511 0.660
## Rainfall.3 -2.902 31.803 -0.091 0.936
## Rainfall.4 -26.893 40.628 -0.662 0.576
## Rainfall.5 -61.929 32.132 -1.927 0.194
## Rainfall.6 -22.233 31.609 -0.703 0.555
## Rainfall.7 48.877 38.075 1.284 0.328
## Rainfall.8 38.476 32.416 1.187 0.357
## Rainfall.9 25.108 33.279 0.754 0.529
## Rainfall.10 28.563 33.859 0.844 0.488
## Rainfall.11 9.729 33.905 0.287 0.801
## Rainfall.12 -10.485 32.334 -0.324 0.776
## Rainfall.13 41.674 45.004 0.926 0.452
## Rainfall.14 37.699 32.211 1.170 0.362
##
## Residual standard error: 32.83 on 2 degrees of freedom
## Multiple R-squared: 0.9986, Adjusted R-squared: 0.9884
## F-statistic: 97.69 on 15 and 2 DF, p-value: 0.01018
##
## AIC and BIC values for the model:
## AIC BIC
## 1 162.564 175.8954
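Note that DLM.Radiation.noIntercept was fitted with Rainfall as the regressor (FFD ~ 0 + Rainfall), so the summary above is identical to that of DLM.Rainfall.noIntercept (p-value = 0.01018) and says nothing about Radiation. The intended formula is FFD ~ 0 + Radiation, and the model needs to be refitted before any conclusion about Radiation is drawn.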
With intercept :
Now, let's use the AIC and BIC scores to find the best lag length for the Finite DLM model,
finiteDLMauto(formula = FFD ~ RelHumidity, data = FFD_dataset, q.min = 1, q.max = 20,
model.type = "dlm", error.type = "AIC", trace = TRUE)
## q - k MASE AIC BIC GMRAE MBRAE R.Adj.Sq Ljung-Box
## 15 15 0.00000 -Inf -Inf 0.00000 0.13333 NaN NaN
## 16 16 0.00000 -Inf -Inf 0.00000 0.14286 NaN NaN
## 17 17 0.00000 -Inf -Inf 0.00000 0.15385 NaN NaN
## 18 18 0.00000 -Inf -Inf 0.00000 0.16667 NaN NaN
## 19 19 0.00000 -Inf -Inf 0.00000 0.13636 NaN NaN
## 20 20 0.00000 -Inf -Inf 0.00000 0.15000 NaN NaN
## 14 14 0.01053 39.90284 54.06747 5.74566 0.23305 0.99865 0.5062922
## 13 13 0.29014 163.95746 178.20341 98.23537 0.75770 0.48004 0.5217713
## 12 12 0.34461 178.08248 192.24906 94.00451 0.21639 0.44803 0.6765633
## 11 11 0.39538 185.84189 199.78215 97.88977 0.32728 0.51751 0.3213649
## 10 10 0.51502 197.62296 211.20175 112.98141 0.44008 0.45161 0.8515080
## 9 9 0.73191 222.08303 235.17554 93.41179 0.63595 -0.13251 0.1991888
## 8 8 0.71876 228.95749 241.44792 71.60391 -86.76689 -0.02023 0.1283026
## 7 7 0.74342 237.69356 249.47409 57.30648 -0.54471 -0.03600 0.1683119
## 6 6 0.85119 252.53366 263.50354 145.90020 0.60988 -0.35454 0.1414141
## 5 5 0.85674 259.13851 269.20328 128.72173 0.70634 -0.26882 0.1438949
## 4 4 0.81748 265.84331 274.91417 91.83514 0.41658 -0.21044 0.1588539
## 3 3 0.82083 272.18292 280.17615 160.11196 -5.00196 -0.14891 0.1528627
## 2 2 0.82665 278.86224 285.69872 125.00108 0.45375 -0.09904 0.1577557
## 1 1 0.83046 285.18336 290.78815 118.55218 0.46508 -0.05267 0.1458492
Ignoring the degenerate fits for q = 15 to 20, q = 14 has the smallest AIC and BIC scores. Fit model with q = 14,
DLM.RelHumidity = dlm(formula = FFD ~ RelHumidity, data = FFD_dataset, q = 14)
summary(DLM.RelHumidity)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## -0.50251 -0.14706 0.32627 0.29784 0.24216 -0.54987 0.23332 -0.20803
## 9 10 11 12 13 14 15 16
## 0.50850 0.07915 0.09090 -0.25842 -0.01506 0.03899 -0.32785 0.01504
## 17
## 0.17662
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1108.0727 66.7845 -16.592 0.0383 *
## RelHumidity.t -6.1015 0.5084 -12.003 0.0529 .
## RelHumidity.1 -5.4993 0.3776 -14.565 0.0436 *
## RelHumidity.2 -3.4724 0.3217 -10.793 0.0588 .
## RelHumidity.3 3.8202 0.3128 12.213 0.0520 .
## RelHumidity.4 -6.4177 0.3679 -17.443 0.0365 *
## RelHumidity.5 -10.5496 0.4186 -25.205 0.0252 *
## RelHumidity.6 -3.5258 0.3440 -10.250 0.0619 .
## RelHumidity.7 9.0306 0.3014 29.961 0.0212 *
## RelHumidity.8 10.3493 0.4336 23.870 0.0267 *
## RelHumidity.9 8.7593 0.4638 18.886 0.0337 *
## RelHumidity.10 18.4883 0.4371 42.298 0.0150 *
## RelHumidity.11 10.2434 0.4535 22.586 0.0282 *
## RelHumidity.12 1.6249 0.4428 3.670 0.1694
## RelHumidity.13 0.6677 0.6748 0.989 0.5034
## RelHumidity.14 -1.6337 0.6342 -2.576 0.2357
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.187 on 1 degrees of freedom
## Multiple R-squared: 0.9999, Adjusted R-squared: 0.9986
## F-statistic: 788 on 15 and 1 DF, p-value: 0.02795
##
## AIC and BIC values for the model:
## AIC BIC
## 1 39.90284 54.06747
DLM.RelHumidity Model is significant (p-value = 0.02795) at the 0.05 significance level.
Without intercept :
DLM.RelHumidity.noIntercept = dlm(formula = FFD ~ 0 + RelHumidity, data = FFD_dataset, q = 14)
summary(DLM.RelHumidity.noIntercept)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9 10
## -1.7316 11.9571 1.2094 1.0992 0.2971 1.0447 7.2072 0.8436 3.7545 -5.9973
## 11 12 13 14 15 16 17
## -5.0662 -3.3051 -2.4479 -7.1852 0.2318 3.3305 -5.5918
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## RelHumidity.t -12.229 4.106 -2.978 0.0967 .
## RelHumidity.1 -5.435 4.438 -1.225 0.3453
## RelHumidity.2 -2.406 3.705 -0.649 0.5827
## RelHumidity.3 1.872 3.407 0.549 0.6380
## RelHumidity.4 -5.812 4.303 -1.351 0.3093
## RelHumidity.5 -8.025 4.583 -1.751 0.2220
## RelHumidity.6 -1.757 3.844 -0.457 0.6924
## RelHumidity.7 8.423 3.516 2.395 0.1389
## RelHumidity.8 8.539 4.932 1.731 0.2255
## RelHumidity.9 5.638 4.982 1.131 0.3753
## RelHumidity.10 18.657 5.136 3.633 0.0681 .
## RelHumidity.11 11.430 5.264 2.171 0.1620
## RelHumidity.12 1.394 5.201 0.268 0.8138
## RelHumidity.13 -5.482 6.628 -0.827 0.4951
## RelHumidity.14 -9.473 4.972 -1.905 0.1971
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.95 on 2 degrees of freedom
## Multiple R-squared: 0.9998, Adjusted R-squared: 0.9979
## F-statistic: 541.6 on 15 and 2 DF, p-value: 0.001845
##
## AIC and BIC values for the model:
## AIC BIC
## 1 133.4673 146.7987
The DLM.RelHumidity.noIntercept model is significant (p-value = 0.001845).
The models without an intercept are significant for all 4 predictors. We now drop the insignificant models and compare the significant finite DLM models on adjusted R-squared, AIC, BIC and MASE.
Model <- c("DLM.Temperature.noIntercept", "DLM.Rainfall.noIntercept", "DLM.Radiation.noIntercept", "DLM.RelHumidity", "DLM.RelHumidity.noIntercept")
AIC <- c(AIC(DLM.Temperature.noIntercept), AIC(DLM.Rainfall.noIntercept), AIC(DLM.Radiation.noIntercept), AIC(DLM.RelHumidity), AIC(DLM.RelHumidity.noIntercept))
BIC <- c(BIC(DLM.Temperature.noIntercept), BIC(DLM.Rainfall.noIntercept), BIC(DLM.Radiation.noIntercept), BIC(DLM.RelHumidity), BIC(DLM.RelHumidity.noIntercept))
Adjusted_Rsquared <- c(0.999, 0.9884, 0.9884, 0.9986, 0.9979)
MASE <- MASE(DLM.Temperature.noIntercept, DLM.Rainfall.noIntercept, DLM.Radiation.noIntercept, DLM.RelHumidity, DLM.RelHumidity.noIntercept)
data.frame(AIC, BIC, Adjusted_Rsquared, MASE) %>% arrange(AIC)
## AIC BIC Adjusted_Rsquared n MASE
## DLM.RelHumidity 39.90284 54.06747 0.9986 17 0.01053278
## DLM.Temperature.noIntercept 121.31305 134.64447 0.9990 17 0.11899711
## DLM.RelHumidity.noIntercept 133.46732 146.79874 0.9979 17 0.16333023
## DLM.Rainfall.noIntercept 162.56397 175.89538 0.9884 17 0.45818179
## DLM.Radiation.noIntercept 162.56397 175.89538 0.9884 17 0.45818179
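Thus, as per AIC, BIC and MASE, the finite distributed lag model for FFD with Relative Humidity as the regressor (DLM.RelHumidity) is the best. Note that this model leaves only 1 residual degree of freedom, so the fit should be read with caution.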
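For intuition on the MASE column in the comparison table, here is a minimal sketch of the scaling idea described in the introduction: the in-sample mean absolute error of the model is divided by the mean absolute error of the one-step naive forecast. The function name mase_sketch and the toy numbers are hypothetical, and the sketch is not claimed to reproduce the exact computation inside the MASE() function of dLagM.

```r
# Hypothetical helper: MASE as in-sample MAE scaled by the naive-forecast MAE.
mase_sketch <- function(observed, fitted) {
  stopifnot(length(observed) == length(fitted), length(observed) > 1)
  mae_model <- mean(abs(observed - fitted))  # in-sample MAE of the model
  mae_naive <- mean(abs(diff(observed)))     # MAE of the naive forecast Y_t = Y_{t-1}
  mae_model / mae_naive
}

# Toy data, made up for illustration only
y     <- c(10, 12, 11, 15, 14)
y_hat <- c(11, 11, 12, 14, 14)
mase_sketch(y, y_hat)
```

A value below 1 means the model's in-sample errors are, on average, smaller than those of the naive forecast.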
We can apply a diagnostic check using the checkresiduals() function from the forecast package.
checkresiduals(DLM.RelHumidity$model$residuals) # forecast package
##
## Ljung-Box test
##
## data: Residuals
## Q* = 5.5937, df = 3, p-value = 0.1331
##
## Model df: 0. Total lags used: 3
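In this output, the Ljung-Box test p-value (0.1331) is above 0.05, so there is no evidence of significant serial autocorrelation left in the residuals.
Note: from here on, the models are summarised rather than discussed in detail, for simplicity.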
A polynomial DLM helps reduce the effect of multicollinearity among the lagged regressors. Let's fit a polynomial DLM of order 2 for each of the 4 regressors individually.
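For reference, a finite DLM with lag length \(q\) whose lag weights follow a polynomial of order 2 can be written as below. This is the standard Almon-lag formulation; the symbols \(\gamma_0, \gamma_1, \gamma_2\) are notation introduced here, and it is an assumption that the z.t0, z.t1 and z.t2 terms in the polyDlm() output correspond to the three constructed regressors \(z_{t0}, z_{t1}, z_{t2}\).

\[
Y_t = \alpha + \sum_{s=0}^{q} \beta_s X_{t-s} + \epsilon_t, \qquad \beta_s = \gamma_0 + \gamma_1 s + \gamma_2 s^2
\]

\[
Y_t = \alpha + \gamma_0 z_{t0} + \gamma_1 z_{t1} + \gamma_2 z_{t2} + \epsilon_t, \qquad z_{tj} = \sum_{s=0}^{q} s^{j} X_{t-s} \quad (\text{with } 0^0 = 1)
\]

With \(q = 14\), the restriction replaces 15 free lag weights by 3 polynomial coefficients, which is where the reduction in multicollinearity comes from.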
PolyDLM.Temperature = polyDlm(x = as.vector(Temperature), y = as.vector(FFD), q = 14, k = 2, show.beta = FALSE)
summary(PolyDLM.Temperature)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -40.293 -13.817 -2.696 12.161 49.486
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 357.0169 503.3239 0.709 0.49065
## z.t0 23.0987 9.9875 2.313 0.03775 *
## z.t1 -9.5889 3.0664 -3.127 0.00802 **
## z.t2 0.6472 0.1864 3.471 0.00413 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24.38 on 13 degrees of freedom
## Multiple R-squared: 0.536, Adjusted R-squared: 0.4289
## F-statistic: 5.006 on 3 and 13 DF, p-value: 0.01596
Polynomial DLM model with Temperature as regressor variable is significant at 5% significance level.
PolyDLM.Rainfall = polyDlm(x = as.vector(Rainfall), y = as.vector(FFD), q = 14, k = 2, show.beta = FALSE)
summary(PolyDLM.Rainfall)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.557 -12.489 -8.102 5.573 62.516
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 253.8534 245.0117 1.036 0.319
## z.t0 -14.3017 18.9184 -0.756 0.463
## z.t1 4.2052 5.7540 0.731 0.478
## z.t2 -0.2047 0.4175 -0.490 0.632
##
## Residual standard error: 32.19 on 13 degrees of freedom
## Multiple R-squared: 0.1907, Adjusted R-squared: 0.003978
## F-statistic: 1.021 on 3 and 13 DF, p-value: 0.415
Polynomial DLM model with Rainfall as regressor variable is insignificant at 5% significance level.
PolyDLM.Radiation = polyDlm(x = as.vector(Radiation), y = as.vector(FFD), q = 14, k = 2, show.beta = FALSE)
summary(PolyDLM.Radiation)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -55.703 -13.245 -3.613 0.802 61.872
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1084.8086 1619.5212 0.670 0.515
## z.t0 0.9589 17.5034 0.055 0.957
## z.t1 -2.7044 5.2672 -0.513 0.616
## z.t2 0.2129 0.3893 0.547 0.594
##
## Residual standard error: 31.76 on 13 degrees of freedom
## Multiple R-squared: 0.2124, Adjusted R-squared: 0.03059
## F-statistic: 1.168 on 3 and 13 DF, p-value: 0.3594
Polynomial DLM model with Radiation as regressor variable is insignificant at 5% significance level.
PolyDLM.RelHumidity = polyDlm(x = as.vector(RelHumidity), y = as.vector(FFD), q = 14, k = 2, show.beta = FALSE)
summary(PolyDLM.RelHumidity)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -56.620 -12.644 -6.069 1.193 69.905
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -980.16153 1319.02934 -0.743 0.471
## z.t0 -0.09770 4.50105 -0.022 0.983
## z.t1 1.03154 1.65904 0.622 0.545
## z.t2 -0.08175 0.13240 -0.617 0.548
##
## Residual standard error: 33.21 on 13 degrees of freedom
## Multiple R-squared: 0.1387, Adjusted R-squared: -0.06
## F-statistic: 0.6981 on 3 and 13 DF, p-value: 0.5697
Polynomial DLM model with Relative Humidity as regressor variable is insignificant at 5% significance level.
Only the polynomial DLM with Temperature as the regressor is significant.
MASE(PolyDLM.Temperature, PolyDLM.Rainfall, PolyDLM.Radiation, PolyDLM.RelHumidity)
## n MASE
## PolyDLM.Temperature 17 0.7388406
## PolyDLM.Rainfall 17 0.9173699
## PolyDLM.Radiation 17 0.8164986
## PolyDLM.RelHumidity 17 0.8588444
Also as per MASE, Polynomial DLM model with Temperature as regressor is the best.
checkresiduals(PolyDLM.Temperature$model$residuals)
##
## Ljung-Box test
##
## data: Residuals
## Q* = 6.5881, df = 3, p-value = 0.08625
##
## Model df: 0. Total lags used: 3
The serial autocorrelation left in the residuals is insignificant according to the Ljung-Box test (p-value = 0.08625) and the ACF plot. The time series plot and histogram of the residuals suggest a random pattern and an approximately normal distribution. Thus, there is no clear violation of the general assumptions.
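The numerical part of these residual checks can also be run with base R. The sketch below applies a Ljung-Box test and a Shapiro-Wilk normality test to a simulated residual vector. The vector res is a made-up stand-in (in this report it would be PolyDLM.Temperature$model$residuals), and the Shapiro-Wilk test is an addition that checkresiduals() does not report.

```r
set.seed(1)
res <- rnorm(17)  # stand-in for the 17 model residuals

lb <- Box.test(res, lag = 3, type = "Ljung-Box")  # H0: no autocorrelation up to lag 3
sw <- shapiro.test(res)                           # H0: residuals are normally distributed

c(ljung_box_p = lb$p.value, shapiro_p = sw$p.value)
```

A p-value above 0.05 in either test means the corresponding null hypothesis is not rejected at the 5% level.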
In the Koyck model, the lag weights are positive and decline geometrically. This model is called the infinite geometric DLM, meaning there are infinitely many lag weights. The Koyck transformation makes this model estimable: the first lag of the geometric DLM, multiplied by \(\phi\), is subtracted from the model itself. The Koyck transformed model is represented as,
\(Y_t = \delta_1 + \delta_2 Y_{t-1} + \delta_3 X_t + \nu_t\)
where \(\delta_1 = \alpha(1-\phi), \delta_2 = \phi, \delta_3 = \beta\) and the random error after the transformation is \(\nu_t = (\epsilon_t - \phi\epsilon_{t-1})\).
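The transformation can be spelled out in three steps. This is the standard textbook derivation, written with the same symbols \(\alpha, \beta, \phi, \epsilon_t\); the condition \(0 < \phi < 1\) is assumed so that the weights are positive and decline.

\[
Y_t = \alpha + \beta \sum_{s=0}^{\infty} \phi^{s} X_{t-s} + \epsilon_t, \qquad 0 < \phi < 1
\]

\[
\phi Y_{t-1} = \phi\alpha + \beta \sum_{s=1}^{\infty} \phi^{s} X_{t-s} + \phi\epsilon_{t-1}
\]

\[
Y_t - \phi Y_{t-1} = \alpha(1-\phi) + \beta X_t + (\epsilon_t - \phi\epsilon_{t-1})
\]

Moving \(\phi Y_{t-1}\) to the right-hand side gives a regression of \(Y_t\) on \(Y_{t-1}\) and \(X_t\) with intercept \(\alpha(1-\phi)\). Because the new error term contains \(\epsilon_{t-1}\), it is correlated with \(Y_{t-1}\), which is why ordinary least squares is not appropriate for this model.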
The koyckDlm() function implements a two-stage least squares method that first estimates \(\hat{Y}_{t-1}\) and then estimates \(Y_{t}\) through simple linear regression. Let's fit Koyck geometric DLM models for each of the 4 regressors individually.
With intercept :
Koyck.Temperature = koyckDlm(x = as.vector(FFD_dataset$Temperature) , y = as.vector(FFD_dataset$FFD) )
summary(Koyck.Temperature$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -47.451 -15.487 -2.648 6.757 75.055
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 526.5488 401.9192 1.310 0.201
## Y.1 0.1585 0.2372 0.668 0.510
## X.t -13.7092 18.0543 -0.759 0.454
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 27 6.247 0.0188 *
## Wu-Hausman 1 26 0.306 0.5846
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.66 on 27 degrees of freedom
## Multiple R-Squared: 0.03631, Adjusted R-squared: -0.03508
## Wald test: 1.255 on 2 and 27 DF, p-value: 0.3011
Koyck.Temperature is insignificant at 5% significance level.
Without intercept :
Koyck.Temperature.NoIntercept = koyckDlm(x = as.vector(FFD_dataset$Temperature) , y = as.vector(FFD_dataset$FFD), intercept = FALSE)
summary(Koyck.Temperature.NoIntercept$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.778 -19.249 -1.045 9.819 74.110
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Y.1 0.3774 0.1718 2.197 0.03646 *
## X.t 9.6891 2.6956 3.594 0.00123 **
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 2 27 146.001 3.33e-15 ***
## Wu-Hausman 1 27 1.869 0.183
## Sargan 1 NA 1.765 0.184
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26.2 on 28 degrees of freedom
## Multiple R-Squared: 0.9932, Adjusted R-squared: 0.9927
## Wald test: 2049 on 2 and 28 DF, p-value: < 2.2e-16
Koyck.Temperature.NoIntercept is significant at 5% significance level.
With intercept :
Koyck.Rainfall = koyckDlm(x = as.vector(FFD_dataset$Rainfall) , y = as.vector(FFD_dataset$FFD) )
summary(Koyck.Rainfall$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.776 -21.430 -3.028 6.055 93.212
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 177.6796 136.5337 1.301 0.204
## Y.1 0.1850 0.3036 0.609 0.547
## X.t 30.2749 75.4565 0.401 0.691
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 27 1.152 0.293
## Wu-Hausman 1 26 0.695 0.412
## Sargan 0 NA NA NA
##
## Residual standard error: 30.69 on 27 degrees of freedom
## Multiple R-Squared: -0.3779, Adjusted R-squared: -0.48
## Wald test: 0.7567 on 2 and 27 DF, p-value: 0.4789
Koyck.Rainfall model is insignificant at 5% significance level.
Without intercept :
Koyck.Rainfall.NoIntercept = koyckDlm(x = as.vector(FFD_dataset$Rainfall) , y = as.vector(FFD_dataset$FFD), intercept = FALSE)
summary(Koyck.Rainfall.NoIntercept$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -67.19 -36.77 -10.18 16.76 146.09
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Y.1 0.1140 0.5448 0.209 0.836
## X.t 114.4554 70.8645 1.615 0.117
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 2 27 2.172 0.133400
## Wu-Hausman 1 27 15.604 0.000505 ***
## Sargan 1 NA 0.544 0.460588
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 55.97 on 28 degrees of freedom
## Multiple R-Squared: 0.969, Adjusted R-squared: 0.9668
## Wald test: 448.7 on 2 and 28 DF, p-value: < 2.2e-16
Koyck.Rainfall.NoIntercept model is significant at 5% significance level.
With intercept :
Koyck.Radiation = koyckDlm(x = as.vector(FFD_dataset$Radiation) , y = as.vector(FFD_dataset$FFD) )
summary(Koyck.Radiation$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -48.615 -13.816 -3.619 9.597 76.084
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 633.4417 404.8050 1.565 0.129
## Y.1 0.2965 0.2085 1.422 0.166
## X.t -28.6891 28.0566 -1.023 0.316
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 27 6.901 0.014 *
## Wu-Hausman 1 26 1.470 0.236
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 27.73 on 27 degrees of freedom
## Multiple R-Squared: -0.1254, Adjusted R-squared: -0.2088
## Wald test: 1.351 on 2 and 27 DF, p-value: 0.276
Koyck.Radiation model is insignificant at 5% significance level.
Without intercept :
Koyck.Radiation.NoIntercept = koyckDlm(x = as.vector(FFD_dataset$Radiation) , y = as.vector(FFD_dataset$FFD), intercept = FALSE)
summary(Koyck.Radiation.NoIntercept$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.371 -12.801 -4.088 5.688 73.765
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Y.1 0.3000 0.1929 1.555 0.13120
## X.t 14.6701 4.0749 3.600 0.00121 **
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 2 27 140.089 5.54e-15 ***
## Wu-Hausman 1 27 0.420 0.5225
## Sargan 1 NA 3.063 0.0801 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.66 on 28 degrees of freedom
## Multiple R-Squared: 0.9935, Adjusted R-squared: 0.993
## Wald test: 2134 on 2 and 28 DF, p-value: < 2.2e-16
Koyck.Radiation.NoIntercept model is significant at 5% significance level.
With intercept :
Koyck.RelHumidity = koyckDlm(x = as.vector(FFD_dataset$RelHumidity) , y = as.vector(FFD_dataset$FFD) )
summary(Koyck.RelHumidity$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -39.742 -13.465 -3.038 6.024 82.953
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -181.7506 639.8825 -0.284 0.779
## Y.1 0.1695 0.2552 0.664 0.512
## X.t 8.0820 12.6627 0.638 0.529
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 27 2.594 0.119
## Wu-Hausman 1 26 0.531 0.473
## Sargan 0 NA NA NA
##
## Residual standard error: 27.73 on 27 degrees of freedom
## Multiple R-Squared: -0.1248, Adjusted R-squared: -0.2081
## Wald test: 1.032 on 2 and 27 DF, p-value: 0.3699
Koyck.RelHumidity model is insignificant at 5% significance level.
Without intercept :
Koyck.RelHumidity.NoIntercept = koyckDlm(x = as.vector(FFD_dataset$RelHumidity) , y = as.vector(FFD_dataset$FFD), intercept = FALSE)
summary(Koyck.RelHumidity.NoIntercept$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.379 -12.760 -3.419 6.830 79.149
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Y.1 0.2061 0.2028 1.016 0.318213
## X.t 4.5031 1.1586 3.887 0.000569 ***
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 2 27 131.621 1.19e-14 ***
## Wu-Hausman 1 27 2.020 0.167
## Sargan 1 NA 0.102 0.750
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.56 on 28 degrees of freedom
## Multiple R-Squared: 0.9935, Adjusted R-squared: 0.9931
## Wald test: 2153 on 2 and 28 DF, p-value: < 2.2e-16
Koyck.RelHumidity.NoIntercept model is significant at 5% significance level.
The Koyck DLM models without an intercept are significant for all 4 regressors. We now drop the insignificant models and compare the significant Koyck DLM models on adjusted R-squared, AIC, BIC and MASE.
Model <- c("Koyck.Temperature.NoIntercept", "Koyck.Rainfall.NoIntercept", "Koyck.Radiation.NoIntercept", "Koyck.RelHumidity.NoIntercept")
AIC <- c(AIC(Koyck.Temperature.NoIntercept), AIC(Koyck.Rainfall.NoIntercept), AIC(Koyck.Radiation.NoIntercept), AIC(Koyck.RelHumidity.NoIntercept))
BIC <- c( BIC(Koyck.Temperature.NoIntercept), BIC(Koyck.Rainfall.NoIntercept), BIC(Koyck.Radiation.NoIntercept), BIC(Koyck.RelHumidity.NoIntercept))
Adjusted_Rsquared <- c(0.9927, 0.9668, 0.993, 0.9931)
MASE <- MASE(Koyck.Temperature.NoIntercept, Koyck.Rainfall.NoIntercept, Koyck.Radiation.NoIntercept, Koyck.RelHumidity.NoIntercept)
data.frame(AIC, BIC, Adjusted_Rsquared, MASE) %>% arrange(MASE)
## AIC BIC Adjusted_Rsquared n MASE
## Koyck.RelHumidity.NoIntercept 283.5234 287.7270 0.9931 30 0.8512851
## Koyck.Radiation.NoIntercept 283.7733 287.9769 0.9930 30 0.9259536
## Koyck.Temperature.NoIntercept 285.0008 289.2044 0.9927 30 0.9781023
## Koyck.Rainfall.NoIntercept 330.5585 334.7621 0.9668 30 2.1113619
Thus, as per AIC, BIC, MASE (best in terms of forecasting) and adjusted R-squared, the Koyck DLM for FFD with Relative Humidity as the regressor and no intercept (Koyck.RelHumidity.NoIntercept) is the best.
checkresiduals(Koyck.RelHumidity.NoIntercept$model$residuals)
##
## Ljung-Box test
##
## data: Residuals
## Q* = 1.2763, df = 6, p-value = 0.9729
##
## Model df: 0. Total lags used: 6
The serial autocorrelation left in the residuals is insignificant according to the Ljung-Box test (p-value = 0.9729) and the ACF plot. The time series plot and histogram of the residuals suggest a random pattern and an approximately normal distribution. Thus, there is no clear violation of the general assumptions.
The autoregressive distributed lag (ARDL) model is a flexible and parsimonious infinite DLM. In its simplest form, the model is represented as,
\(Y_t = \mu + \beta_0 X_t + \beta_1 X_{t-1} + \gamma_1 Y_{t-1} + e_t\)
Similar to the Koyck DLM, this model can be written as an infinite DLM, but with a lag distribution of any shape rather than only a polynomial or geometric one. The model is denoted ARDL(p,q), where p is the number of lags of the regressor and q is the number of lags of the response. To fit the model we use the ardlDlm() function. Let's find the best lag lengths by iterating over candidate models and comparing their AIC and BIC scores, with the maximum lag length set to 14. We do this for each regressor individually.
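To make the ARDL(1,1) equation concrete, the sketch below simulates a series from it and recovers the coefficients with ordinary least squares in base R. The parameter values, the series length and all variable names are made up for illustration; the report itself fits these models with ardlDlm().

```r
set.seed(42)
n <- 500
x <- rnorm(n)
e <- rnorm(n, sd = 0.5)

# Hypothetical true parameter values
mu     <- 2
beta0  <- 1.5
beta1  <- -0.7
gamma1 <- 0.4

# Generate Y_t = mu + beta0*X_t + beta1*X_{t-1} + gamma1*Y_{t-1} + e_t
y <- numeric(n)
y[1] <- mu + beta0 * x[1] + e[1]
for (i in 2:n) {
  y[i] <- mu + beta0 * x[i] + beta1 * x[i - 1] + gamma1 * y[i - 1] + e[i]
}

# Build the lagged regressors by hand (one observation is lost) and fit by OLS
d <- data.frame(y = y[-1], x_t = x[-1], x_1 = x[-n], y_1 = y[-n])
fit <- lm(y ~ x_t + x_1 + y_1, data = d)
round(coef(fit), 2)
```

With a long simulated series the estimates land close to the chosen values. With only 31 observations and lag lengths up to 14, as in this report, far fewer observations per coefficient are available.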
With intercept :
# Grid search for the best ARDL(p,q) model as per AIC and BIC scores.
# Fit all 196 candidate models (the lags of the predictor and of the response, p and q, each run from 1 to 14),
# save each model's AIC and BIC scores, and display the best model under each criterion.
df <- data.frame(matrix(vector(), 0, 4, dimnames = list(c(), c("p", "q", "AIC", "BIC"))),
                 stringsAsFactors = FALSE) # create empty data frame
for (i in 1:14) {
  for (j in 1:14) {
    model4.1 <- ardlDlm(formula = FFD ~ Temperature, data = FFD_dataset, p = i, q = j)
    new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
    df[nrow(df) + 1, ] <- new
  }
} # iterate and save the scores in df
head(df[order(df[, 3]), ] %>% filter(AIC >= 0 & BIC >= 0), 1) # Best model as per AIC
## p q AIC BIC
## 1 13 2 85.51361 101.5403
head(df[order(df[, 4]), ] %>% filter(AIC >= 0 & BIC >= 0), 1) # Best model as per BIC
## p q AIC BIC
## 1 13 2 85.51361 101.5403
ARDL(13,2) is the best model according to both the AIC and BIC scores.
Let's fit this model,
ARDL(13,2):
ARDL.Temperature.13x2 = ardlDlm(formula = FFD ~ Temperature, data = FFD_dataset, p = 13, q = 2)
summary(ARDL.Temperature.13x2)
##
## Time series regression with "ts" data:
## Start = 14, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## 14 15 16 17 18 19 20 21
## -0.28212 0.30905 -0.36535 0.55389 -1.11505 1.89987 -1.40635 1.09117
## 22 23 24 25 26 27 28 29
## -1.28712 0.33102 0.04305 0.11575 0.59499 -0.61912 -0.51206 -0.29693
## 30 31
## 1.98022 -1.03492
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1799.14345 242.31073 7.425 0.0852 .
## Temperature.t 57.43342 7.63646 7.521 0.0842 .
## Temperature.1 -31.77471 3.51783 -9.032 0.0702 .
## Temperature.2 -12.18393 5.82043 -2.093 0.2837
## Temperature.3 14.68490 2.99401 4.905 0.1280
## Temperature.4 33.07505 6.84722 4.830 0.1300
## Temperature.5 -6.31500 2.26272 -2.791 0.2190
## Temperature.6 -12.13070 3.45046 -3.516 0.1764
## Temperature.7 -41.46746 2.75004 -15.079 0.0422 *
## Temperature.8 -32.66363 5.81102 -5.621 0.1121
## Temperature.9 16.11390 3.95843 4.071 0.1534
## Temperature.10 -51.03539 3.71147 -13.751 0.0462 *
## Temperature.11 34.16284 5.62777 6.070 0.1039
## Temperature.12 -41.62016 5.03400 -8.268 0.0766 .
## Temperature.13 3.58385 4.45503 0.804 0.5687
## FFD.1 0.34062 0.07992 4.262 0.1467
## FFD.2 -0.83494 0.11440 -7.298 0.0867 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.062 on 1 degrees of freedom
## Multiple R-squared: 0.9991, Adjusted R-squared: 0.984
## F-statistic: 66.37 on 16 and 1 DF, p-value: 0.09616
checkresiduals(ARDL.Temperature.13x2$model, test = "LB")
##
## Ljung-Box test
##
## data: Residuals
## Q* = 20.759, df = 4, p-value = 0.0003535
##
## Model df: 0. Total lags used: 4
MASE(ARDL.Temperature.13x2)
## MASE
## ARDL.Temperature.13x2 0.0305356
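The model is insignificant at the 5% significance level (p-value = 0.09616). The Ljung-Box test (p-value = 0.0003535) also indicates significant serial autocorrelation left in the residuals.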
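One thing worth checking before trusting such a fit is how many residual degrees of freedom are left. The sketch below does the arithmetic for an ARDL(p,q) model on the 31 observations used here (the series length is read from the "End = 31" line of the output). The helper name ardl_resid_df is hypothetical.

```r
# Hypothetical helper: residual degrees of freedom of an ARDL(p, q) fit.
# p = number of lags of the regressor, q = number of lags of the response.
ardl_resid_df <- function(n, p, q, intercept = TRUE) {
  n_used <- n - max(p, q)                        # observations lost to lagging
  n_coef <- (p + 1) + q + as.integer(intercept)  # X_t..X_{t-p}, Y_{t-1}..Y_{t-q}, intercept
  n_used - n_coef
}

ardl_resid_df(n = 31, p = 13, q = 2)  # the ARDL(13,2) model above: 1
```

This matches the "on 1 degrees of freedom" line in the summary. With a single residual degree of freedom, a very high R-squared and a very small MASE can largely reflect the number of fitted parameters rather than forecasting ability.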
Without intercept :
# Grid search for the best ARDL(p,q) model as per AIC and BIC scores.
# Fit all 196 candidate models (the lags of the predictor and of the response, p and q, each run from 1 to 14),
# save each model's AIC and BIC scores, and display the best model under each criterion.
# Models with a negative AIC or BIC score (including -Inf) are dropped by the filter.
df <- data.frame(matrix(vector(), 0, 4, dimnames = list(c(), c("p", "q", "AIC", "BIC"))),
                 stringsAsFactors = FALSE) # create empty data frame
for (i in 1:14) {
  for (j in 1:14) {
    model4.1 <- ardlDlm(formula = FFD ~ -1 + Temperature, data = FFD_dataset, p = i, q = j)
    new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
    df[nrow(df) + 1, ] <- new
  }
} # iterate and save the scores in df
head(df[order(df[, 3]), ] %>% filter(AIC >= 0 & BIC >= 0), 1) # Best model as per AIC
## p q AIC BIC
## 1 13 3 109.1219 125.1485
head(df[order(df[, 4]), ] %>% filter(AIC >= 0 & BIC >= 0), 1) # Best model as per BIC
## p q AIC BIC
## 1 13 3 109.1219 125.1485
ARDL(13,3) is the best model according to both the AIC and BIC scores.
Let's fit this model,
ARDL(13,3):
ARDL.Temperature.NoIntercept.13x3 = ardlDlm(formula = FFD ~ -1 + Temperature, data = FFD_dataset, p = 13, q = 3)
summary(ARDL.Temperature.NoIntercept.13x3)
##
## Time series regression with "ts" data:
## Start = 14, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## 14 15 16 17 18 19 20 21
## -0.39467 0.51088 -0.67364 1.32736 -2.25787 3.72448 -2.86633 2.25355
## 22 23 24 25 26 27 28 29
## -2.38613 0.76865 0.43390 0.05015 0.65610 -0.99901 -0.60352 -0.88462
## 30 31
## 3.53983 -2.18261
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Temperature.t 40.5038 13.5981 2.979 0.2062
## Temperature.1 -55.7986 10.7210 -5.205 0.1208
## Temperature.2 23.6979 10.0183 2.365 0.2546
## Temperature.3 15.2497 5.7984 2.630 0.2313
## Temperature.4 9.3513 10.7807 0.867 0.5451
## Temperature.5 -23.7448 5.3922 -4.404 0.1422
## Temperature.6 -1.2956 6.0064 -0.216 0.8648
## Temperature.7 -33.5482 5.1556 -6.507 0.0971 .
## Temperature.8 2.2203 8.6703 0.256 0.8404
## Temperature.9 36.3215 4.4742 8.118 0.0780 .
## Temperature.10 -49.3118 6.9934 -7.051 0.0897 .
## Temperature.11 55.0020 14.6504 3.754 0.1657
## Temperature.12 -53.5290 12.2360 -4.375 0.1431
## Temperature.13 38.1858 7.1785 5.319 0.1183
## FFD.1 0.9802 0.1614 6.072 0.1039
## FFD.2 -0.8526 0.2280 -3.740 0.1663
## FFD.3 0.6458 0.1719 3.758 0.1656
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.826 on 1 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 0.9993
## F-statistic: 1626 on 17 and 1 DF, p-value: 0.0195
checkresiduals(ARDL.Temperature.NoIntercept.13x3$model, test = "LB")
##
## Ljung-Box test
##
## data: Residuals
## Q* = 24.364, df = 4, p-value = 6.753e-05
##
## Model df: 0. Total lags used: 4
MASE(ARDL.Temperature.NoIntercept.13x3)
## MASE
## ARDL.Temperature.NoIntercept.13x3 0.05850544
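The model is significant at the 5% significance level (p-value = 0.0195), although the Ljung-Box test (p-value = 6.753e-05) indicates significant serial autocorrelation left in the residuals.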
With intercept :
# Grid search for the best ARDL(p,q) model as per AIC and BIC scores.
# Fit all 196 candidate models (the lags of the predictor and of the response, p and q, each run from 1 to 14),
# save each model's AIC and BIC scores, and display the best model under each criterion.
df <- data.frame(matrix(vector(), 0, 4, dimnames = list(c(), c("p", "q", "AIC", "BIC"))),
                 stringsAsFactors = FALSE) # create empty data frame
for (i in 1:14) {
  for (j in 1:14) {
    model4.1 <- ardlDlm(formula = FFD ~ Rainfall, data = FFD_dataset, p = i, q = j)
    new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
    df[nrow(df) + 1, ] <- new
  }
} # iterate and save the scores in df
head(df[order(df[, 3]), ] %>% filter(AIC >= 0 & BIC >= 0), 1) # Best model as per AIC
## p q AIC BIC
## 1 2 13 155.1653 171.192
head(df[order(df[, 4]), ] %>% filter(AIC >= 0 & BIC >= 0), 1) # Best model as per BIC
## p q AIC BIC
## 1 2 13 155.1653 171.192
ARDL(2,13) is the best model according to both the AIC and BIC scores.
Let's fit this model,
ARDL(2,13):
ARDL.Rainfall.2x13 = ardlDlm(formula = FFD ~ Rainfall, data = FFD_dataset, p = 2, q = 13)
summary(ARDL.Rainfall.2x13)
##
## Time series regression with "ts" data:
## Start = 14, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## 14 15 16 17 18 19 20 21
## -12.3187 -7.4665 -10.5105 -8.2321 10.3234 14.6427 2.9609 -2.7338
## 22 23 24 25 26 27 28 29
## 1.6466 1.5984 1.5430 1.2328 0.6374 3.5737 -1.4717 1.2570
## 30 31
## -2.5867 5.9039
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1286.54423 1078.81523 1.193 0.444
## Rainfall.t -138.19598 50.62236 -2.730 0.224
## Rainfall.1 -24.36210 40.13714 -0.607 0.653
## Rainfall.2 -75.70590 36.74138 -2.061 0.288
## FFD.1 0.53732 0.34387 1.563 0.362
## FFD.2 -0.29943 0.35141 -0.852 0.551
## FFD.3 0.15577 0.34972 0.445 0.733
## FFD.4 -0.10101 0.36267 -0.279 0.827
## FFD.5 -0.76882 0.33789 -2.275 0.264
## FFD.6 -0.54383 0.34948 -1.556 0.364
## FFD.7 -0.52359 0.36204 -1.446 0.385
## FFD.8 0.54449 0.59484 0.915 0.528
## FFD.9 -0.06028 0.57921 -0.104 0.934
## FFD.10 1.02547 0.69118 1.484 0.378
## FFD.11 0.13521 0.57556 0.235 0.853
## FFD.12 -0.33668 0.62199 -0.541 0.684
## FFD.13 -1.21974 0.68829 -1.772 0.327
##
## Residual standard error: 28.12 on 1 degrees of freedom
## Multiple R-squared: 0.9549, Adjusted R-squared: 0.2336
## F-statistic: 1.324 on 16 and 1 DF, p-value: 0.6024
checkresiduals(ARDL.Rainfall.2x13$model, test = "LB")
##
## Ljung-Box test
##
## data: Residuals
## Q* = 6.2832, df = 4, p-value = 0.179
##
## Model df: 0. Total lags used: 4
MASE(ARDL.Rainfall.2x13)
## MASE
## ARDL.Rainfall.2x13 0.2000099
The model is insignificant at the 5% significance level (p-value = 0.6024).
Without intercept :
# Grid search for the best ARDL(p,q) model as per AIC and BIC scores.
# Fit all 196 candidate models (the lags of the predictor and of the response, p and q, each run from 1 to 14),
# save each model's AIC and BIC scores, and display the best model under each criterion.
# Models with a negative AIC or BIC score (including -Inf) are dropped by the filter.
df <- data.frame(matrix(vector(), 0, 4, dimnames = list(c(), c("p", "q", "AIC", "BIC"))),
                 stringsAsFactors = FALSE) # create empty data frame
for (i in 1:14) {
  for (j in 1:14) {
    model4.1 <- ardlDlm(formula = FFD ~ -1 + Rainfall, data = FFD_dataset, p = i, q = j)
    new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
    df[nrow(df) + 1, ] <- new
  }
} # iterate and save the scores in df
head(df[order(df[, 3]), ] %>% filter(AIC >= 0 & BIC >= 0), 1) # Best model as per AIC
## p q AIC BIC
## 1 12 5 126.9599 144.9042
head(df[order(df[, 4]), ] %>% filter(AIC >= 0 & BIC >= 0), 1) # Best model as per BIC
## p q AIC BIC
## 1 12 5 126.9599 144.9042
ARDL(12,5) is the best model according to both the AIC and BIC scores.
Let's fit this model,
ARDL(12,5):
ARDL.Rainfall.NoIntercept.12x5 = ardlDlm(formula = FFD ~ -1 + Rainfall, data = FFD_dataset, p = 12, q = 5)
summary(ARDL.Rainfall.NoIntercept.12x5)
##
## Time series regression with "ts" data:
## Start = 13, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## 13 14 15 16 17 18 19 20 21 22
## -7.2608 2.3455 -0.8289 3.5294 -0.3649 0.3543 -2.9993 0.6410 -0.5476 -0.1955
## 23 24 25 26 27 28 29 30 31
## -1.7474 1.0049 -0.8311 -0.8444 4.0869 0.1678 2.0419 -1.2396 3.2965
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Rainfall.t -160.5202 35.3684 -4.539 0.138
## Rainfall.1 133.6537 29.3784 4.549 0.138
## Rainfall.2 -130.7409 36.0883 -3.623 0.171
## Rainfall.3 -62.4015 13.3674 -4.668 0.134
## Rainfall.4 19.0601 13.0988 1.455 0.383
## Rainfall.5 -152.5199 28.6844 -5.317 0.118
## Rainfall.6 89.9317 24.2465 3.709 0.168
## Rainfall.7 143.6564 28.8183 4.985 0.126
## Rainfall.8 -4.3487 12.6135 -0.345 0.789
## Rainfall.9 165.0070 39.1577 4.214 0.148
## Rainfall.10 -99.6413 39.1972 -2.542 0.239
## Rainfall.11 -250.2284 66.3218 -3.773 0.165
## Rainfall.12 -228.8179 49.9715 -4.579 0.137
## FFD.1 3.9935 0.8424 4.741 0.132
## FFD.2 -0.9188 0.3293 -2.790 0.219
## FFD.3 0.3889 0.3542 1.098 0.470
## FFD.4 3.7845 0.8238 4.594 0.136
## FFD.5 -2.1429 0.5696 -3.762 0.165
##
## Residual standard error: 10.96 on 1 degrees of freedom
## Multiple R-squared: 0.9999, Adjusted R-squared: 0.9987
## F-statistic: 823.4 on 18 and 1 DF, p-value: 0.02742
checkresiduals(ARDL.Rainfall.NoIntercept.12x5$model, test = "LB")
##
## Ljung-Box test
##
## data: Residuals
## Q* = 7.2277, df = 4, p-value = 0.1243
##
## Model df: 0. Total lags used: 4
MASE(ARDL.Rainfall.NoIntercept.12x5)
## MASE
## ARDL.Rainfall.NoIntercept.12x5 0.06993808
The model is significant at the 5% significance level (p-value = 0.02742).
With intercept :
# Grid search for the best ARDL(p,q) model as per AIC and BIC scores.
# Fit all 196 candidate models (the lags of the predictor and of the response, p and q, each run from 1 to 14),
# save each model's AIC and BIC scores, and display the best model under each criterion.
df <- data.frame(matrix(vector(), 0, 4, dimnames = list(c(), c("p", "q", "AIC", "BIC"))),
                 stringsAsFactors = FALSE) # create empty data frame
for (i in 1:14) {
  for (j in 1:14) {
    model4.1 <- ardlDlm(formula = FFD ~ Radiation, data = FFD_dataset, p = i, q = j)
    new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
    df[nrow(df) + 1, ] <- new
  }
} # iterate and save the scores in df
head(df[order(df[, 3]), ] %>% filter(AIC >= 0 & BIC >= 0), 1) # Best model as per AIC
## p q AIC BIC
## 1 2 13 134.4458 150.4725
head(df[order(df[, 4]), ] %>% filter(AIC >= 0 & BIC >= 0), 1) # Best model as per BIC
## p q AIC BIC
## 1 2 13 134.4458 150.4725
ARDL(2,13) is the best model according to both the AIC and BIC scores.
Let's fit this model,
ARDL(2,13):
ARDL.Radiation.2x13 = ardlDlm(formula = FFD ~ Radiation, data = FFD_dataset, p = 2, q = 13)
summary(ARDL.Radiation.2x13)
##
## Time series regression with "ts" data:
## Start = 14, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## 14 15 16 17 18 19 20 21
## -3.54323 8.53636 -1.27297 -9.11223 -2.36069 7.09851 1.50385 -1.87316
## 22 23 24 25 26 27 28 29
## -0.77066 1.40688 0.59778 -0.79341 -1.32024 1.55674 1.93126 -2.46251
## 30 31
## 0.02261 0.85512
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1979.06076 582.55132 3.397 0.182
## Radiation.t 92.00812 54.45628 1.690 0.340
## Radiation.1 -182.05922 73.55176 -2.475 0.244
## Radiation.2 48.81549 25.97919 1.879 0.311
## FFD.1 0.46877 0.41858 1.120 0.464
## FFD.2 -0.11289 0.21130 -0.534 0.688
## FFD.3 -0.74065 0.52710 -1.405 0.394
## FFD.4 0.47140 0.81105 0.581 0.665
## FFD.5 0.02944 0.46803 0.063 0.960
## FFD.6 -0.81573 0.58890 -1.385 0.398
## FFD.7 0.70481 0.72208 0.976 0.508
## FFD.8 -0.31561 0.27655 -1.141 0.458
## FFD.9 -0.58602 0.22584 -2.595 0.234
## FFD.10 -0.85568 0.30542 -2.802 0.218
## FFD.11 0.32301 0.51577 0.626 0.644
## FFD.12 -1.35893 0.34040 -3.992 0.156
## FFD.13 -0.64256 0.45674 -1.407 0.393
##
## Residual standard error: 15.81 on 1 degrees of freedom
## Multiple R-squared: 0.9857, Adjusted R-squared: 0.7576
## F-statistic: 4.321 on 16 and 1 DF, p-value: 0.363
checkresiduals(ARDL.Radiation.2x13$model, test = "LB")
##
## Ljung-Box test
##
## data: Residuals
## Q* = 14.87, df = 4, p-value = 0.004979
##
## Model df: 0. Total lags used: 4
MASE(ARDL.Radiation.2x13)
## MASE
## ARDL.Radiation.2x13 0.1037525
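The model is insignificant at the 5% significance level (p-value = 0.363), and the Ljung-Box test (p-value = 0.004979) indicates significant serial autocorrelation left in the residuals.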
Without intercept :
# Grid search for the best ARDL(p,q) model as per AIC and BIC scores.
# Fit all 196 candidate models (the lags of the predictor and of the response, p and q, each run from 1 to 14),
# save each model's AIC and BIC scores, and display the best model under each criterion.
# Models with a negative AIC or BIC score (including -Inf) are dropped by the filter.
df <- data.frame(matrix(vector(), 0, 4, dimnames = list(c(), c("p", "q", "AIC", "BIC"))),
                 stringsAsFactors = FALSE) # create empty data frame
for (i in 1:14) {
  for (j in 1:14) {
    model4.1 <- ardlDlm(formula = FFD ~ -1 + Radiation, data = FFD_dataset, p = i, q = j)
    new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
    df[nrow(df) + 1, ] <- new
  }
} # iterate and save the scores in df
head(df[order(df[, 3]), ] %>% filter(AIC >= 0 & BIC >= 0), 1) # Best model as per AIC
## p q AIC BIC
## 1 1 14 136.2045 150.3691
head(df[order( df[,4] ),] %>% filter(, AIC >= 0 & BIC >= 0),1) # Best model as per BIC
## p q AIC BIC
## 1 1 14 136.2045 150.3691
ARDL(1,14) is the best model as per both the AIC and BIC scores.
Let's fit this model,
ARDL(1,14):
ARDL.Radiation.NoIntercept.1x14 = ardlDlm(formula = FFD ~ -1 + Radiation, data = FFD_dataset, p = 1, q = 14)
summary(ARDL.Radiation.NoIntercept.1x14)
##
## Time series regression with "ts" data:
## Start = 15, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## 15 16 17 18 19 20 21 22
## -12.95137 0.51595 11.88609 8.62918 -1.76593 -0.79224 0.12925 0.10285
## 23 24 25 26 27 28 29 30
## -1.61808 -1.19211 0.51836 1.20603 -2.35993 -2.43226 1.06106 -0.01166
## 31
## -0.71991
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Radiation.t 82.25080 71.69482 1.147 0.456
## Radiation.1 -317.59728 117.77368 -2.697 0.226
## FFD.1 1.13389 0.59516 1.905 0.308
## FFD.2 0.45463 0.33261 1.367 0.402
## FFD.3 -0.37805 0.72437 -0.522 0.694
## FFD.4 1.87443 1.25840 1.490 0.376
## FFD.5 1.59345 0.87066 1.830 0.318
## FFD.6 0.21401 0.91138 0.235 0.853
## FFD.7 2.27585 1.17398 1.939 0.303
## FFD.8 0.91302 0.44476 2.053 0.289
## FFD.9 0.12313 0.18826 0.654 0.631
## FFD.10 -0.43492 0.42465 -1.024 0.492
## FFD.11 1.19993 0.72913 1.646 0.348
## FFD.12 -0.02874 0.52396 -0.055 0.965
## FFD.13 1.59485 0.77010 2.071 0.286
## FFD.14 1.64463 0.54248 3.032 0.203
##
## Residual standard error: 20.16 on 1 degrees of freedom
## Multiple R-squared: 0.9997, Adjusted R-squared: 0.9956
## F-statistic: 243.1 on 16 and 1 DF, p-value: 0.05035
checkresiduals(ARDL.Radiation.NoIntercept.1x14$model, test = "LB")
##
## Ljung-Box test
##
## data: Residuals
## Q* = 7.5102, df = 3, p-value = 0.0573
##
## Model df: 0. Total lags used: 3
MASE(ARDL.Radiation.NoIntercept.1x14)
## MASE
## ARDL.Radiation.NoIntercept.1x14 0.1255573
Model is insignificant at 5% significance level.
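One reason these fits may be hard to interpret is that, with only 31 annual observations, large lag orders use up nearly all of the data. The sketch below counts the residual degrees of freedom of an ARDL(p, q) regression. `ardl_resid_df` is a hypothetical helper written for this illustration and is not a dLagM function; it assumes the dLagM convention that p is the lag order of the predictor and q the lag order of the response.

```r
# Hypothetical helper (not part of dLagM): residual degrees of freedom
# of an ARDL(p, q) regression fitted by least squares.
# n         : length of the original series
# p         : number of lags of the predictor (X_t, ..., X_{t-p})
# q         : number of lags of the response (Y_{t-1}, ..., Y_{t-q})
# intercept : TRUE if the model includes an intercept
ardl_resid_df <- function(n, p, q, intercept = TRUE) {
  usable_obs <- n - max(p, q)                  # rows lost to lagging
  n_coef     <- (p + 1) + q + as.integer(intercept)
  usable_obs - n_coef
}

# The two Radiation models fitted above, with n = 31 yearly observations
ardl_resid_df(31, p = 2, q = 13, intercept = TRUE)
ardl_resid_df(31, p = 1, q = 14, intercept = FALSE)
```

Both calls return 1, matching the "on 1 degrees of freedom" lines in the summaries. With so little information left for estimating the error variance, the t and F tests have very low power, and information criteria can end up rewarding near-interpolation of the data.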
With intercept :
## Code gist to find the best ARDL(p,q) model as per AIC and BIC scores.
# First create an empty df. Iterate over 196 ARDL (since max lag for response and predictor of ARDL model is 14, i.e, p = q = 14 at max).
# Save the model's AIC and BIC scores through iteration and display the model with best AIC and BIC scores.
df = data.frame(matrix(
vector(), 0, 4, dimnames=list(c(), c("p","q","AIC","BIC"))),
stringsAsFactors=F) # create empty dataframe
for(i in 1:14){
for(j in 1:14){
model4.1 = ardlDlm(formula = FFD ~ RelHumidity, data = FFD_dataset, p = i, q = j)
new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
df[nrow(df) + 1, ] <- new
}
} # Iterate and save in df
head(df[order( df[,3] ),] %>% filter(, AIC >= 0 & BIC >= 0),1) # Best model as per AIC
## p q AIC BIC
## 1 12 4 81.53721 99.48155
head(df[order( df[,4] ),] %>% filter(, AIC >= 0 & BIC >= 0),1) # Best model as per BIC
## p q AIC BIC
## 1 12 4 81.53721 99.48155
ARDL(12,4) is the best model as per both the AIC and BIC scores.
Let's fit this model,
ARDL(12,4):
ARDL.RelHumidity.12x4 = ardlDlm(formula = FFD ~ RelHumidity, data = FFD_dataset, p = 12, q = 4)
summary(ARDL.RelHumidity.12x4)
##
## Time series regression with "ts" data:
## Start = 13, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## 13 14 15 16 17 18 19 20
## 0.61481 0.03041 0.03346 -0.62519 -0.73767 -0.23052 1.35028 -1.21052
## 21 22 23 24 25 26 27 28
## 1.03746 -1.10501 1.22645 -0.47680 0.87798 0.35483 -0.72833 0.10790
## 29 30 31
## 0.51057 -0.72155 -0.30856
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.835e+03 1.800e+02 -15.748 0.0404 *
## RelHumidity.t -1.157e+01 9.700e-01 -11.931 0.0532 .
## RelHumidity.1 -1.047e+01 1.288e+00 -8.132 0.0779 .
## RelHumidity.2 -5.998e-01 8.945e-01 -0.671 0.6240
## RelHumidity.3 -1.532e-01 1.034e+00 -0.148 0.9063
## RelHumidity.4 -7.623e+00 9.362e-01 -8.142 0.0778 .
## RelHumidity.5 -4.949e+00 8.016e-01 -6.174 0.1022
## RelHumidity.6 -3.840e+00 9.853e-01 -3.897 0.1599
## RelHumidity.7 6.395e+00 9.233e-01 6.926 0.0913 .
## RelHumidity.8 1.344e+01 1.547e+00 8.684 0.0730 .
## RelHumidity.9 3.190e+00 1.256e+00 2.539 0.2389
## RelHumidity.10 2.936e+01 1.462e+00 20.085 0.0317 *
## RelHumidity.11 2.723e+01 1.906e+00 14.284 0.0445 *
## RelHumidity.12 2.781e+01 2.294e+00 12.124 0.0524 .
## FFD.1 -7.238e-01 7.733e-02 -9.360 0.0678 .
## FFD.2 -2.671e-01 6.259e-02 -4.267 0.1466
## FFD.3 -2.094e-01 6.224e-02 -3.364 0.1840
## FFD.4 -6.733e-01 6.141e-02 -10.963 0.0579 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.317 on 1 degrees of freedom
## Multiple R-squared: 0.9994, Adjusted R-squared: 0.9887
## F-statistic: 94.04 on 17 and 1 DF, p-value: 0.08093
checkresiduals(ARDL.RelHumidity.12x4$model, test = "LB")
##
## Ljung-Box test
##
## data: Residuals
## Q* = 13.042, df = 4, p-value = 0.01107
##
## Model df: 0. Total lags used: 4
MASE(ARDL.RelHumidity.12x4)
## MASE
## ARDL.RelHumidity.12x4 0.02503557
Model is insignificant at 5% significance level.
Without intercept :
## Code gist to find the best ARDL(p,q) model as per AIC and BIC scores.
# First create an empty df. Iterate over 196 ARDL (since max lag for response and predictor of ARDL model is 14, i.e, p = q = 14 at max).
# Save the model's AIC and BIC scores through iteration and display the model with best AIC and BIC scores.
# Also, models with negative AIC or BIC scores (including -Inf) are filtered out before ranking
df = data.frame(matrix(
vector(), 0, 4, dimnames=list(c(), c("p","q","AIC","BIC"))),
stringsAsFactors=F) # create empty dataframe
for(i in 1:14){
for(j in 1:14){
model4.1 = ardlDlm(formula = FFD ~ -1 + RelHumidity, data = FFD_dataset, p = i, q = j)
new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
df[nrow(df) + 1, ] <- new
}
} # Iterate and save in df
head(df[order( df[,3] ),] %>% filter(, AIC >= 0 & BIC >= 0),1) # Best model as per AIC
## p q AIC BIC
## 1 14 1 132.8892 147.0539
head(df[order( df[,4] ),] %>% filter(, AIC >= 0 & BIC >= 0),1) # Best model as per BIC
## p q AIC BIC
## 1 14 1 132.8892 147.0539
ARDL(14,1) is the best model as per both the AIC and BIC scores.
Let's fit this model,
ARDL(14,1):
ARDL.RelHumidity.NoIntercept.14x1 = ardlDlm(formula = FFD ~ -1 + RelHumidity, data = FFD_dataset, p = 14, q = 1)
summary(ARDL.RelHumidity.NoIntercept.14x1)
##
## Time series regression with "ts" data:
## Start = 15, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## 15 16 17 18 19 20 21 22 23 24
## -4.3614 9.1725 2.9031 2.6461 1.6513 -2.3081 7.3930 -0.4973 6.0919 -4.5694
## 25 26 27 28 29 30 31
## -3.7208 -4.2671 -2.1394 -5.7978 -1.7040 2.8791 -3.6651
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## RelHumidity.t -11.0600 6.1087 -1.811 0.321
## RelHumidity.1 -3.7933 7.0927 -0.535 0.687
## RelHumidity.2 -2.9773 5.0579 -0.589 0.661
## RelHumidity.3 2.2570 4.5673 0.494 0.708
## RelHumidity.4 -6.5883 5.9583 -1.106 0.468
## RelHumidity.5 -9.0079 6.4804 -1.390 0.397
## RelHumidity.6 -1.2076 5.2190 -0.231 0.855
## RelHumidity.7 8.6483 4.6433 1.863 0.314
## RelHumidity.8 8.2055 6.5177 1.259 0.427
## RelHumidity.9 4.4254 7.1858 0.616 0.649
## RelHumidity.10 18.3930 6.7644 2.719 0.224
## RelHumidity.11 6.8110 13.3378 0.511 0.699
## RelHumidity.12 0.7468 7.0037 0.107 0.932
## RelHumidity.13 -4.9650 8.7825 -0.565 0.672
## RelHumidity.14 -5.5943 11.5908 -0.483 0.714
## FFD.1 0.1829 0.4519 0.405 0.755
##
## Residual standard error: 18.29 on 1 degrees of freedom
## Multiple R-squared: 0.9998, Adjusted R-squared: 0.9964
## F-statistic: 295.4 on 16 and 1 DF, p-value: 0.04567
checkresiduals(ARDL.RelHumidity.NoIntercept.14x1$model, test = "LB")
##
## Ljung-Box test
##
## data: Residuals
## Q* = 1.7001, df = 3, p-value = 0.6369
##
## Model df: 0. Total lags used: 3
MASE(ARDL.RelHumidity.NoIntercept.14x1)
## MASE
## ARDL.RelHumidity.NoIntercept.14x1 0.1724197
Model is significant at 5% significance level.
The ARDL models for the Temperature, Rainfall and Relative Humidity regressors without intercept are significant. Eliminating all the insignificant models, the significant ARDL models are compared based on Adjusted R-squared, AIC, BIC and MASE,
Model <- c("ARDL.Temperature.NoIntercept.13x3", "ARDL.Rainfall.NoIntercept.12x5", "ARDL.RelHumidity.NoIntercept.14x1")
AIC <- c(AIC(ARDL.Temperature.NoIntercept.13x3), AIC(ARDL.Rainfall.NoIntercept.12x5), AIC(ARDL.RelHumidity.NoIntercept.14x1))
BIC <- c( BIC(ARDL.Temperature.NoIntercept.13x3), BIC(ARDL.Rainfall.NoIntercept.12x5), BIC(ARDL.RelHumidity.NoIntercept.14x1))
Adjusted_Rsquared <- c(0.9993, 0.9987, 0.9964)
MASE <- MASE(ARDL.Temperature.NoIntercept.13x3, ARDL.Rainfall.NoIntercept.12x5, ARDL.RelHumidity.NoIntercept.14x1)
data.frame(AIC, BIC, Adjusted_Rsquared, MASE) %>% arrange(MASE)
## AIC BIC Adjusted_Rsquared n
## ARDL.Temperature.NoIntercept.13x3 109.1219 125.1485 0.9993 18
## ARDL.Rainfall.NoIntercept.12x5 126.9599 144.9042 0.9987 19
## ARDL.RelHumidity.NoIntercept.14x1 132.8892 147.0539 0.9964 17
## MASE
## ARDL.Temperature.NoIntercept.13x3 0.05850544
## ARDL.Rainfall.NoIntercept.12x5 0.06993808
## ARDL.RelHumidity.NoIntercept.14x1 0.17241968
Thus, as per AIC, BIC, MASE (best in terms of forecasting), and Adjusted R-Squared, ARDL(13,3) model for FFD with Temperature as the regressor with no intercept (ARDL.Temperature.NoIntercept.13x3) is the best.
Diagnostic check for ARDL (Residual analysis):
checkresiduals(ARDL.Temperature.NoIntercept.13x3$model$residuals)
##
## Ljung-Box test
##
## data: Residuals
## Q* = 24.364, df = 4, p-value = 6.753e-05
##
## Model df: 0. Total lags used: 4
Serial autocorrelation left in the residuals is significant as per the Ljung-Box test (p-value = 6.753e-05 < 0.05) and the ACF plot, so the assumption of independent errors is violated. The time series plot and histogram of the residuals show a random pattern and approximate normality, so the remaining general assumptions are not violated.
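The 4 best DLM models, one from each family, are the finite DLM (DLM.RelHumidity), the polynomial DLM (PolyDLM.Temperature), the Koyck model (Koyck.RelHumidity.NoIntercept) and the ARDL model (ARDL.Temperature.NoIntercept.13x3). The mean absolute scaled errors (MASE) of these models are,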
MASE(DLM.RelHumidity, PolyDLM.Temperature, Koyck.RelHumidity.NoIntercept, ARDL.Temperature.NoIntercept.13x3) %>% arrange(MASE)
## n MASE
## DLM.RelHumidity 17 0.01053278
## ARDL.Temperature.NoIntercept.13x3 18 0.05850544
## PolyDLM.Temperature 17 0.73884055
## Koyck.RelHumidity.NoIntercept 30 0.85128511
The best DLM model for the FFD response, giving the most accurate fit based on the MASE measure, is the finite DLM with Relative Humidity as regressor and an intercept (its formula is FFD ~ RelHumidity), DLM.RelHumidity, with a MASE of 0.01053278.
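To make the MASE comparison concrete, the following is a minimal sketch of a generic MASE calculation: the in-sample mean absolute error of the model divided by the in-sample mean absolute error of the one-step naive forecast. `mase_generic` and the toy numbers are assumptions for illustration; the exact scaling used by dLagM::MASE() may differ in detail.

```r
# Generic MASE: in-sample MAE of the model scaled by the in-sample MAE
# of the one-step naive forecast. Illustrative definition only; it may
# differ in detail from dLagM::MASE().
mase_generic <- function(actual, fitted) {
  model_mae <- mean(abs(actual - fitted))
  naive_mae <- mean(abs(diff(actual)))       # |y_t - y_{t-1}|
  model_mae / naive_mae
}

# Toy values, made up for illustration only
y     <- c(10, 12, 11, 15, 14)
y_hat <- c(10.5, 11.5, 11.5, 14, 14.5)
mase_generic(y, y_hat)                       # 0.6 / 2 = 0.3
```

A value below 1 means the model's in-sample errors are smaller than those of the naive forecast. Since MASE here is computed on the fitted data, a very small value can also reflect a model with nearly as many coefficients as observations rather than genuine forecasting skill.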
Dynamic linear models are a general class of time series regression models which can account for trends, seasonality, serial correlation between the response and regressor variables and, most importantly, the effect of intervention points.
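The response of a general dynamic linear model is,
\(Y_t = \omega_2Y_{t-1} + (\omega_0 + \omega_1)P_t - \omega_2\omega_0P_{t-1} + N_t\)
where \(Y_t\) is the response, \(P_t\) is the pulse indicator at the intervention point, \(\omega_0\), \(\omega_1\) and \(\omega_2\) are the model coefficients, and \(N_t\) is the process describing the series in the absence of the intervention.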
Let's revisit the time series plot of the response, FFD, to look for possible intervention points,
plot(FFD)
As mentioned at the descriptive analysis stage, there is no clear intervention that we can identify visually. However, the years 2002 and 2003 might be intervention points because of the magnitude of their values. Assuming these intervention points, let's fit a dynamic linear model and see whether the pulse function at years 2002 and 2003 is significant.
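A pulse function for these two years can be built directly from the time index of the series. The sketch below uses a placeholder annual series spanning 1984 to 2014 (not the FFD data) and the set-membership operator `%in%`, which compares every year against the whole set of intervention years; the names `x`, `pulse_years` and `pulse` are illustrative.

```r
# Stand-in annual series for 1984-2014 (placeholder values, not the FFD data)
x <- ts(rep(0, 31), start = 1984)

# Pulse indicator: 1 in the assumed intervention years, 0 otherwise.
# %in% tests each year against the whole set, so no recycling is involved.
pulse_years <- c(2002, 2003)
years <- as.numeric(time(x))
pulse <- as.integer(round(years) %in% pulse_years)

which(pulse == 1)                            # positions 19 and 20
```

Comparing with `==` against a vector of two time points would instead recycle that vector along the series, which only gives the intended result when the positions happen to line up with the recycling pattern.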
As usual, we first look at the ACF and PACF plots of the FFD series.
acf(FFD, main="ACF of FFD")
pacf(FFD, main ="PACF of FFD")
In the ACF plot we see a slowly decaying pattern, indicating a trend in the FFD series. In the PACF plot we see one high vertical spike, which is also consistent with a trend. No significant seasonal behavior is observed. Thus, let's fit a dynamic linear model with a trend component and no seasonal component. For thoroughness, let's test several combinations of the trend, multiple lags of FFD and, most importantly, the pulse at 2002 and 2003.
Now, let's fit the dynamic linear models using dynlm() as shown below (note that the potential intervention points were identified at years 2002 and 2003).
With intercept :
Y.t = FFD
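T = c(19,20) # The time points (years 2002 and 2003) at which the intervention is assumed to occur
P.t = 1*(seq(FFD) %in% T) # %in% avoids the recycling that == applies to a length-2 T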
P.t.1 = Lag(P.t,+1) #library(tis)
Dyn.model = dynlm(Y.t ~ L(Y.t , k = 1) + P.t + trend(Y.t)) # library(dynlm)
Dyn.model1 = dynlm(Y.t ~ L(Y.t , k = 1) + P.t + P.t.1) # library(dynlm)
Dyn.model2 = dynlm(Y.t ~ L(Y.t , k = 1) + P.t + P.t.1 + trend(Y.t)) # library(dynlm)
Dyn.model3 = dynlm(Y.t ~ L(Y.t , k = 1) + L(Y.t , k = 2) + P.t + P.t.1 + trend(Y.t)) # library(dynlm)
Dyn.model4 = dynlm(Y.t ~ L(Y.t , k = 1) + L(Y.t , k = 2) + L(Y.t , k = 3) + P.t + trend(Y.t)) # library(dynlm)
Dyn.model5 = dynlm(Y.t ~ L(Y.t , k = 1) + L(Y.t , k = 2) + L(Y.t , k = 3) + P.t) # library(dynlm)
AIC(Dyn.model, Dyn.model1, Dyn.model2, Dyn.model3, Dyn.model4, Dyn.model5) %>% arrange(AIC)
## df AIC
## Dyn.model4 7 234.6498
## Dyn.model3 6 240.7430
## Dyn.model5 6 242.0378
## Dyn.model 5 245.7365
## Dyn.model2 5 245.7365
## Dyn.model1 4 255.4715
summary(Dyn.model4)
##
## Time series regression with "ts" data:
## Start = 1987, End = 2014
##
## Call:
## dynlm(formula = Y.t ~ L(Y.t, k = 1) + L(Y.t, k = 2) + L(Y.t,
## k = 3) + P.t + trend(Y.t))
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.864 -7.892 -2.667 5.050 29.807
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 400.29802 56.98282 7.025 4.76e-07 ***
## L(Y.t, k = 1) -0.16899 0.12349 -1.368 0.18499
## L(Y.t, k = 2) -0.01011 0.11436 -0.088 0.93033
## L(Y.t, k = 3) -0.09248 0.11302 -0.818 0.42197
## P.t 89.69321 11.57506 7.749 9.98e-08 ***
## trend(Y.t) -1.02242 0.34538 -2.960 0.00723 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.04 on 22 degrees of freedom
## Multiple R-squared: 0.7615, Adjusted R-squared: 0.7073
## F-statistic: 14.05 on 5 and 22 DF, p-value: 3.162e-06
Without intercept :
Y.t = FFD
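T = c(19,20) # The time points (years 2002 and 2003) at which the intervention is assumed to occur
P.t = 1*(seq(FFD) %in% T) # %in% avoids the recycling that == applies to a length-2 T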
P.t.1 = Lag(P.t,+1) #library(tis)
Dyn.model.NoIntercept = dynlm(Y.t ~ 0 + L(Y.t , k = 1) + P.t + trend(Y.t)) # library(dynlm)
Dyn.model1.NoIntercept = dynlm(Y.t ~ 0 + L(Y.t , k = 1) + P.t + P.t.1) # library(dynlm)
Dyn.model2.NoIntercept = dynlm(Y.t ~ 0 + L(Y.t , k = 1) + P.t + P.t.1 + trend(Y.t)) # library(dynlm)
Dyn.model3.NoIntercept = dynlm(Y.t ~ 0 + L(Y.t , k = 1) + L(Y.t , k = 2) + P.t + P.t.1 + trend(Y.t)) # library(dynlm)
Dyn.model4.NoIntercept = dynlm(Y.t ~ 0 + L(Y.t , k = 1) + L(Y.t , k = 2) + L(Y.t , k = 3) + P.t + trend(Y.t)) # library(dynlm)
AIC(Dyn.model.NoIntercept, Dyn.model1.NoIntercept, Dyn.model2.NoIntercept, Dyn.model3.NoIntercept, Dyn.model4.NoIntercept) %>% arrange(AIC)
## df AIC
## Dyn.model4.NoIntercept 6 265.5930
## Dyn.model3.NoIntercept 5 278.4041
## Dyn.model1.NoIntercept 3 290.7651
## Dyn.model.NoIntercept 4 292.6898
## Dyn.model2.NoIntercept 4 292.6898
summary(Dyn.model4.NoIntercept)
##
## Time series regression with "ts" data:
## Start = 1987, End = 2014
##
## Call:
## dynlm(formula = Y.t ~ 0 + L(Y.t, k = 1) + L(Y.t, k = 2) + L(Y.t,
## k = 3) + P.t + trend(Y.t))
##
## Residuals:
## Min 1Q Median 3Q Max
## -62.987 -3.470 0.793 9.987 41.311
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## L(Y.t, k = 1) 0.37567 0.16929 2.219 0.0366 *
## L(Y.t, k = 2) 0.22166 0.19286 1.149 0.2622
## L(Y.t, k = 3) 0.38212 0.15958 2.395 0.0252 *
## P.t 65.01230 19.42521 3.347 0.0028 **
## trend(Y.t) -0.08072 0.56062 -0.144 0.8868
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24.73 on 23 degrees of freedom
## Multiple R-squared: 0.9947, Adjusted R-squared: 0.9935
## F-statistic: 855.4 on 5 and 23 DF, p-value: < 2.2e-16
The best dynamic linear models with and without intercept were Dyn.model4 and Dyn.model4.NoIntercept respectively. Both models are significant, so let's compare them based on Adjusted R-squared, AIC and BIC,
Model <- c("Dyn.model4", "Dyn.model4.NoIntercept")
AIC <- c(AIC(Dyn.model4), AIC(Dyn.model4.NoIntercept))
BIC <- c( BIC(Dyn.model4), BIC(Dyn.model4.NoIntercept))
Adjusted_Rsquared <- c(0.7073, 0.9935) # values taken from the two model summaries above
data.frame(Model,AIC, BIC, Adjusted_Rsquared) %>% arrange(AIC)
## Model AIC BIC Adjusted_Rsquared
## 1 Dyn.model4 234.6498 243.9753 0.7073
## 2 Dyn.model4.NoIntercept 265.5930 273.5863 0.9935
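Thus, as per AIC and BIC, the dynamic linear model for FFD with an intercept (Dyn.model4) is the best. The higher Adjusted R-squared of Dyn.model4.NoIntercept is not comparable, since R-squared for a model without an intercept is computed around zero rather than around the mean of the response.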
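The R-squared values of models with and without an intercept deserve some care when compared. The sketch below, on made-up toy data, shows that for a regression through the origin the R-squared reported by summary() matches the uncentered version (total sum of squares measured from zero), which can be far larger than the usual centered version when the response is far from zero. All variable names here are illustrative.

```r
# Toy data, made up for illustration: a response far from zero
y <- c(300, 310, 290, 305, 295, 315)
x <- 1:6

fit_no_int <- lm(y ~ 0 + x)                  # regression through the origin
rss <- sum(residuals(fit_no_int)^2)

r2_reported   <- summary(fit_no_int)$r.squared
r2_uncentered <- 1 - rss / sum(y^2)           # total SS measured from zero
r2_centered   <- 1 - rss / sum((y - mean(y))^2)

c(reported = r2_reported, uncentered = r2_uncentered, centered = r2_centered)
```

This is why a no-intercept model can report an R-squared close to 1 while fitting worse, in terms of residual standard error and AIC, than its counterpart with an intercept.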
Dyn.model4 is the best dynamic linear model as per AIC and BIC, with 3 lagged components of the response (FFD), a significant pulse component at years 2002 and 2003, and a trend component (no seasonal component). Let's look at the summary statistics and check the residuals,
summary(Dyn.model4)
##
## Time series regression with "ts" data:
## Start = 1987, End = 2014
##
## Call:
## dynlm(formula = Y.t ~ L(Y.t, k = 1) + L(Y.t, k = 2) + L(Y.t,
## k = 3) + P.t + trend(Y.t))
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.864 -7.892 -2.667 5.050 29.807
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 400.29802 56.98282 7.025 4.76e-07 ***
## L(Y.t, k = 1) -0.16899 0.12349 -1.368 0.18499
## L(Y.t, k = 2) -0.01011 0.11436 -0.088 0.93033
## L(Y.t, k = 3) -0.09248 0.11302 -0.818 0.42197
## P.t 89.69321 11.57506 7.749 9.98e-08 ***
## trend(Y.t) -1.02242 0.34538 -2.960 0.00723 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.04 on 22 degrees of freedom
## Multiple R-squared: 0.7615, Adjusted R-squared: 0.7073
## F-statistic: 14.05 on 5 and 22 DF, p-value: 3.162e-06
checkresiduals(Dyn.model4)
##
## Breusch-Godfrey test for serial correlation of order up to 9
##
## data: Residuals
## LM test = 18.53, df = 9, p-value = 0.0295
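Summary of the dynamic linear model Dyn.model4:
The dynamic linear model, Dyn.model4, is significant (F-test p-value = 3.162e-06) and its pulse component (P.t) at years 2002 and 2003 is significant. However, the Breusch-Godfrey test (p-value = 0.0295 < 0.05) indicates that significant serial correlation remains in the residuals.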
Exponential smoothing methods, together with their state-space formulations, take into consideration the Error, Trend and Seasonal components of the time series. The error component can be Additive (A) or Multiplicative (M), while the trend and seasonal components can be absent (N), Additive (A) or Multiplicative (M). Hence, these models are denoted ETS(Error, Trend, Seasonal), for example ETS(M,N,N).
The best exponential smoothing or state-space model for our FFD time series can be identified by triggering the automatic search, setting the argument model = "ZZZ" in ets() as shown below. Also, we will check whether a damped trend and the possibility of drift give us better models.
Best Exponential Smoothing model -
autofit.ETS = ets(FFD, model="ZZZ")
summary(autofit.ETS)
## ETS(M,N,N)
##
## Call:
## ets(y = FFD, model = "ZZZ")
##
## Smoothing parameters:
## alpha = 1e-04
##
## Initial states:
## l = 306.3826
##
## sigma: 0.0825
##
## AIC AICc BIC
## 310.6132 311.5021 314.9152
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0009438502 24.43768 16.75958 -0.5805143 5.329407 0.8759362
## ACF1
## Training set 0.2589561
checkresiduals(autofit.ETS)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,N,N)
## Q* = 3.1718, df = 6, p-value = 0.787
##
## Model df: 0. Total lags used: 6
The automatic search selects simple exponential smoothing with multiplicative errors, ETS(M,N,N). Its MASE is 0.8759362.
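The very small smoothing parameter (alpha = 1e-04) has a direct interpretation. The sketch below implements the level recursion of simple exponential smoothing in base R; `ses_level` is a hypothetical helper and `y_demo` holds placeholder observations, while alpha and the initial level are taken from the ets() summary above.

```r
# Simple exponential smoothing level recursion:
#   l_t = alpha * y_t + (1 - alpha) * l_{t-1}
ses_level <- function(y, alpha, l0) {
  level <- l0
  for (obs in y) {
    level <- alpha * obs + (1 - alpha) * level
  }
  level                                      # final level = flat forecast
}

# Placeholder observations (not the FFD data); alpha and l0 come from
# the ets() summary above.
y_demo <- c(280, 320, 350, 260, 300)
ses_level(y_demo, alpha = 1e-4, l0 = 306.3826)
```

With alpha this close to zero the level hardly moves away from its initial value, so the fitted ETS(M,N,N) model behaves almost like a constant-mean model for this series.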
Best Exponential Smoothing model with damping -
autofit.ETS.damped = ets(FFD, model="ZZZ", damped = TRUE)
summary(autofit.ETS.damped)
## ETS(A,Ad,N)
##
## Call:
## ets(y = FFD, model = "ZZZ", damped = TRUE)
##
## Smoothing parameters:
## alpha = 5e-04
## beta = 1e-04
## phi = 0.9798
##
## Initial states:
## l = 315.7065
## b = -0.7506
##
## sigma: 25.9994
##
## AIC AICc BIC
## 315.0015 318.5015 323.6054
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.3745223 23.81052 15.17483 -0.4243502 4.772873 0.7931096
## ACF1
## Training set 0.2279019
checkresiduals(autofit.ETS.damped)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,Ad,N)
## Q* = 3.1879, df = 6, p-value = 0.7849
##
## Model df: 0. Total lags used: 6
The search selects the damped-trend (Holt's damped) model with additive errors, ETS(A,Ad,N). Its MASE is 0.7931096.
Best Exponential Smoothing model with drift -
autofit.ETS.drift = ets(FFD, model="ZZZ", beta = 1E-4)
summary(autofit.ETS.drift)
## ETS(M,N,N)
##
## Call:
## ets(y = FFD, model = "ZZZ", beta = 1e-04)
##
## Smoothing parameters:
## alpha = 1e-04
##
## Initial states:
## l = 306.3826
##
## sigma: 0.0839
##
## AIC AICc BIC
## 310.6132 311.5021 314.9152
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0009438502 24.43768 16.75958 -0.5805143 5.329407 0.8759362
## ACF1
## Training set 0.2589561
checkresiduals(autofit.ETS.drift)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,N,N)
## Q* = 3.1718, df = 6, p-value = 0.787
##
## Model df: 0. Total lags used: 6
Again, the search selects the ETS(M,N,N) model, with the same information criteria and MASE as the first fit.
Thus, the best exponential smoothing or state-space model for our FFD series in terms of MASE is the damped-trend (Holt's damped) model with additive errors, ETS(A,Ad,N), with a MASE of 0.7931096 (ETS(M,N,N) has the lower AIC and BIC, but the selection here is based on MASE).
The best state-space model based on the MASE measure is therefore ETS(A,Ad,N), which has the lowest MASE (0.7931096) of the state-space models considered.
Based on the three modelling approaches considered, the best model for each approach is summarized below,
A. Best distributed lag model - the finite DLM with Relative Humidity as regressor and an intercept (DLM.RelHumidity), with a MASE of 0.01053278, AIC of 39.90284, BIC of 54.06747 and Adjusted R-squared of 99.86%.
B. Best dynamic linear model - Dyn.model4, with 3 lagged components of the response (FFD), a significant pulse component at years 2002 and 2003, and a trend component, with an AIC of 234.6498, BIC of 243.9753 and Adjusted R-squared of 70.73%.
C. Best exponential smoothing and state-space model - the damped-trend (Holt's damped) model with additive errors, ETS(A,Ad,N), with a MASE of 0.7931096, AIC of 315.0015 and BIC of 323.6054.
Clearly, the best model is the finite DLM with Relative Humidity as regressor (DLM.RelHumidity) as per the AIC, BIC, Adjusted R-squared and MASE measures.
The best time series regression model is therefore the finite DLM with Relative Humidity as regressor (DLM.RelHumidity), with a MASE of 0.01053278.
Residual analysis to test model assumptions.
Let's perform a detailed residual analysis to check whether any model assumptions have been violated.
The estimated error (or residual) is defined by:
\(\hat{\epsilon}_i = Y_i - \hat{Y}_i\) (i.e. observed value minus fitted value)
The following assumptions are checked: linearity, zero mean of the residuals, independence (no autocorrelation) and normality of the error terms.
Let's first apply a diagnostic check using the checkresiduals() function,
checkresiduals(DLM.RelHumidity)
## 1 2 3 4 5 6
## -0.50250987 -0.14705938 0.32626851 0.29784316 0.24216425 -0.54986885
## 7 8 9 10 11 12
## 0.23331897 -0.20803168 0.50850139 0.07914825 0.09089903 -0.25841685
## 13 14 15 16 17
## -0.01506289 0.03899430 -0.32784972 0.01504263 0.17661876
##
## Ljung-Box test
##
## data: Residuals
## Q* = 5.5937, df = 3, p-value = 0.1331
##
## Model df: 0. Total lags used: 3
From the residuals plot, the residuals are randomly distributed around the mean. Thus, linearity is not violated.
To test whether the mean value of the residuals is zero, let's calculate it as,
mean(DLM.RelHumidity$model$residuals)
## [1] 2.693605e-17
As the mean value of the residuals is practically 0, the zero mean assumption is not violated.
The Ljung-Box test has the hypotheses,
\(H_0\) : series of residuals exhibit no serial autocorrelation of any order up to p
\(H_a\) : series of residuals exhibit serial autocorrelation of some order up to p
From the Ljung-Box test output, since p (0.1331) > 0.05, we do not reject the null hypothesis of no serial autocorrelation.
Thus, according to this test and ACF plot, we can conclude that the serial correlation left in residuals is insignificant.
The Shapiro-Wilk test has the hypotheses,
\(H_0\) : Time series is normally distributed
\(H_a\) : Time series is not normally distributed
shapiro.test(DLM.RelHumidity$model$residuals)
##
## Shapiro-Wilk normality test
##
## data: DLM.RelHumidity$model$residuals
## W = 0.96952, p-value = 0.8101
From the Shapiro-Wilk test, since the p-value (0.8101) > 0.05, we do not reject the null hypothesis that the data is normal. Thus, the residuals of the DLM.RelHumidity model are normally distributed.
Summarizing residual analysis on \(DLM.RelHumidity\) model:
Assumption 1: The error terms are randomly distributed and thus show linearity: Not violated
Assumption 2: The mean value of E is zero (zero mean residuals): Not violated
Assumption 4: The error terms are independently distributed, i.e. they are not autocorrelated: Not violated
Assumption 5: The errors are normally distributed: Not violated
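The checks summarized above can be reproduced from the printed residuals alone. The sketch below re-enters the 17 residuals shown in the checkresiduals() output and applies the same tests with base R functions. The object names and the tolerance used for the zero-mean check are illustrative choices, not part of the original analysis.

```r
# Residuals of DLM.RelHumidity as printed by checkresiduals() above
res <- c(-0.50250987, -0.14705938,  0.32626851,  0.29784316,  0.24216425,
         -0.54986885,  0.23331897, -0.20803168,  0.50850139,  0.07914825,
          0.09089903, -0.25841685, -0.01506289,  0.03899430, -0.32784972,
          0.01504263,  0.17661876)

alpha <- 0.05                                 # significance level used throughout

lb <- Box.test(res, lag = 3, type = "Ljung-Box")   # independence (3 lags, as above)
sw <- shapiro.test(res)                            # normality

data.frame(
  check     = c("zero mean", "no autocorrelation", "normality"),
  statistic = c(mean(res), unname(lb$statistic), unname(sw$statistic)),
  p_value   = c(NA, lb$p.value, sw$p.value),
  # 1e-6 is an arbitrary tolerance chosen for this illustration
  violated  = c(abs(mean(res)) > 1e-6, lb$p.value < alpha, sw$p.value < alpha)
)
```

Collecting the decisions in one table makes it easy to see which assumption, if any, fails at the chosen significance level.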
With no violations of the residual assumptions, the finite DLM with Relative Humidity as regressor (DLM.RelHumidity) is suitable for forecasting FFD.
Using the MASE measure, the finite DLM, DLM.RelHumidity, is the best fitting model to forecast FFD. Let's estimate and plot the 4 years ahead (2015-2018) forecasts for the FFD series.
Observed and fitted values are plotted below. This plot indicates a good agreement between the model and the original series. (Note, since lag is set as 14 (q=14), fitted values are not available for the first 14 years)
plot(FFD, ylab='FFD', xlab = 'Year', type="l", col="black", main="Observed and fitted values using DLM.RelHumidity model on FFD")
lines(ts(DLM.RelHumidity$model$fitted.values, start = c(1998)), col="red")
legend("topleft",lty=1, text.width = 12,
col=c("black", "red"),
c("FFD series", "DLM.RelHumidity fit"))
Using the given 4 years ahead future covariates values, we can forecast our FFD response.
Future_Covariates <- read.csv("C:/Users/admin/Downloads/Covariate x-values for Task 2.csv")
head(Future_Covariates)
## Year Temperature Rainfall Radiation RelHumidity
## 1 2015 20.74 2.27 14.60 52.16
## 2 2016 20.49 2.38 14.56 52.87
## 3 2017 20.52 2.26 14.79 52.58
## 4 2018 20.56 2.27 14.79 52.50
Our DLM.RelHumidity model uses only 1 covariate, Relative Humidity. The 4 years ahead point forecasts of FFD using the Relative Humidity covariate are computed as,
DLM.RelHumidity = dlm(formula = FFD ~ RelHumidity, data = FFD_dataset, q = 14)
x.new = c(Future_Covariates$RelHumidity)
forecasts.dlm = dLagM::forecast(model = DLM.RelHumidity, x = x.new, h = 4)$forecasts
Forecast using overall BEST fitting model:
The point forecasts and the forecast plot using the overall best fitting model, DLM.RelHumidity is given below,
df <- data.frame(
Finite_DLM_forecasts = c(forecasts.dlm)
)
row.names(df) <- c("2015", "2016", "2017", "2018")
df
## Finite_DLM_forecasts
## 2015 217.4990
## 2016 164.6623
## 2017 203.6109
## 2018 271.5180
FFD.extended1 = c(FFD, forecasts.dlm)
{
plot(ts(FFD.extended1, start = c(1984)), type="l", col = "red",
ylab = "FFD", xlab = "Year",
main="4 years ahead forecasts for FFD series
using DLM.RelHumidity model")
lines(FFD,col="black",type="l")
legend("topleft",lty=1,
col=c("black", "red"),
c("FFD series", "Finite DLM forecasts"))
}
The forecasts from the best finite DLM, polynomial DLM, dynamic linear model and exponential smoothing/state-space model are given and plotted below,
For Distributed Lag models:
The 4 years ahead point forecasts for the DLM models are printed and plotted below (note that since the best Koyck and ARDL models do not have an intercept, their forecasts are not produced),
# Forecasts using Finite DLM
x.new = c(Future_Covariates$RelHumidity)
forecasts.dlm = dLagM::forecast(model = DLM.RelHumidity, x = x.new, h = 4)$forecasts
# Forecasts using Polynomial DLM
x.new2 = c(Future_Covariates$Temperature)
forecasts.polydlm = dLagM::forecast(model = PolyDLM.Temperature , x = x.new2, h = 4)$forecasts
df <- data.frame(
Finite_DLM_forecasts = c(forecasts.dlm),
Polynomial_DLM_forecasts = c(forecasts.polydlm)
)
row.names(df) <- c("2015", "2016", "2017", "2018")
df
## Finite_DLM_forecasts Polynomial_DLM_forecasts
## 2015 217.4990 306.4362
## 2016 164.6623 301.9256
## 2017 203.6109 297.3487
## 2018 271.5180 295.2537
FFD.extended1 = c(FFD , forecasts.dlm)
FFD.extended2 = c(FFD , forecasts.polydlm)
{
plot(ts(FFD.extended1, start = c(1984)),type="l", col = "Red",
ylab = "FFD", xlab = "Year",
main="4 years ahead forecast for FFD series
using DLM models")
lines(ts(FFD.extended2, start = c(1984)),col="blue",type="l")
lines(FFD,col="black",type="l")
legend("topleft",lty=1,
col=c("black", "red", "blue"),
c("FFD series", "Finite DLM forecasts", "Polynomial DLM forecasts"))
}
For Dynamic Linear model:
The 4 years ahead point forecasts are printed and plotted below,
Dyn.model4 = dynlm(Y.t ~ L(Y.t , k = 1) + L(Y.t , k = 2) + L(Y.t , k = 3) + P.t + trend(Y.t)) # library(dynlm)
q = 4 # forecast horizon (years)
n = nrow(Dyn.model4$model) # observations used in the fit (the first 3 are lost to lagging)
FFD.frc = array(NA , (n + q))
FFD.frc[1:n] = Y.t[4:length(Y.t)] # length(1:n) = length(4:length(Y.t)) = 28
trend.start = Dyn.model4$model[n,"trend(Y.t)"] # trend value of the last observed year
trend = seq(trend.start + 1, trend.start + q, 1) # trend values for the q forecast years
for (i in 1:q){
# Regressors in coefficient order: intercept, lags 1 to 3 of FFD, pulse (0, no future intervention), trend
data.new = c(1,FFD.frc[n-1+i], FFD.frc[n-2+i], FFD.frc[n-3+i], 0 ,trend[i])
FFD.frc[n+i] = as.vector(Dyn.model4$coefficients) %*% data.new
}
par(mfrow=c(1,1))
plot(Y.t,xlim=c(1984,2018),ylab='FFD',xlab='Year',main = "Time series plot of FFD series with 4 years ahead forecasts (in red)")
lines(ts(FFD.frc[(n+1):(n+q)],start=c(2015)),col="red")
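The loop above is an instance of recursive multi-step forecasting, where each forecast is fed back in as a lagged response for the next step. A self-contained sketch of the same idea is given below. `recursive_forecast` is a hypothetical helper, the coefficients and history in the example call are made up, and future pulse values are assumed to be 0.

```r
# Hypothetical helper: recursive h-step forecast for a model of the form
#   Y_t = b0 + b1*Y_{t-1} + b2*Y_{t-2} + b3*Y_{t-3} + b_pulse*P_t + b_trend*t
# coefs      : c(b0, b1, b2, b3, b_pulse, b_trend)
# history    : observed response, oldest first (at least 3 values)
# last_trend : trend value of the last observed time point
# Future pulse values are assumed to be 0 (no new intervention).
recursive_forecast <- function(coefs, history, last_trend, h) {
  stopifnot(length(coefs) == 6, length(history) >= 3)
  y <- history
  n <- length(y)
  for (i in seq_len(h)) {
    regressors <- c(1, y[n + i - 1], y[n + i - 2], y[n + i - 3],
                    0, last_trend + i)
    y[n + i] <- sum(coefs * regressors)      # forecast becomes a future lag
  }
  y[(n + 1):(n + h)]
}

# Made-up coefficients and history, for illustration only
recursive_forecast(coefs = c(1, 0.5, 0, 0, 0, 0),
                   history = c(2, 2, 2), last_trend = 3, h = 4)
```

In the example call every forecast equals 1 + 0.5 * 2 = 2, which makes the recursion easy to verify by hand. Note that the trend regressor for the first forecast is one step beyond the last observed trend value.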
For Exponential smoothing/State-space model:
The 4 years ahead point forecasts and prediction intervals are printed and plotted below,
forecasts.ETS = forecast::forecast(autofit.ETS.damped, h = 4)
forecasts.ETS
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2015 298.2942 264.9747 331.6138 247.3364 349.2521
## 2016 297.9112 264.5917 331.2308 246.9534 348.8691
## 2017 297.5360 264.2164 330.8555 246.5781 348.4938
## 2018 297.1682 263.8487 330.4878 246.2104 348.1261
plot(forecasts.ETS, ylab="FFD", type="l", fcol="red", xlab="Year", ylim= c(100, 400),
main="4 years ahead forecasts using ETS(A,Ad,N) model")
legend("topleft",lty=1, pch=1, col=1:2, c("FFD series","ETS(A,Ad,N) forecasts"))
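The slowly flattening forecasts follow from the damped-trend forecast function, in which each additional step adds a slope contribution shrunk by a further factor of phi. The sketch below checks this against the printed point forecasts. `damped_forecast` is a hypothetical helper, and since the final level and slope states are not shown in the summary, the inputs of the last call are placeholders.

```r
# Point forecasts of an additive damped-trend model:
#   y_hat(T + h) = level + (phi + phi^2 + ... + phi^h) * slope
damped_forecast <- function(level, slope, phi, h) {
  level + cumsum(phi^seq_len(h)) * slope
}

# Point forecasts printed above for 2015-2018
printed <- c(298.2942, 297.9112, 297.5360, 297.1682)
steps   <- diff(printed)

# Successive year-to-year changes should shrink by a factor close to phi
steps[-1] / steps[-length(steps)]

# Illustrative call; level and slope are placeholders, not fitted states
damped_forecast(level = 300, slope = -0.4, phi = 0.9798, h = 4)
```

The ratios of successive changes come out close to the estimated phi = 0.9798, up to the rounding of the printed forecasts.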
The best fitting model for our FFD series in terms of MASE, which assesses the forecast accuracy, is the finite DLM with Relative Humidity as regressor, \(DLM.RelHumidity\). The 4 years ahead point forecasts obtained using forecast() from the dLagM package are 217.4990, 164.6623, 203.6109, and 271.5180 for 2015 to 2018 respectively (prediction intervals are not output).
Potentially better forecasting methods can be explored, compared and diagnosed for better fit.
The dataset holds 6 columns and 31 observations. The columns are the Year, the Rank-based Order similarity metric (RBO), which captures changes in the flowering order of the 81 plant species by measuring the similarity between the annual flowering order and the flowering order of 1983, and 4 climate factors, namely rainfall (rain), temperature (temp), radiation level (rad), and relative humidity (RH), all measured from 1984 to 2014.
Our aim for the RBO dataset is to give the best 3 years ahead forecasts by determining the most accurate and suitable regression model for the annual Rank-based Order similarity metric (RBO) in terms of MASE, using a single predictor (univariate analysis). A descriptive analysis will be conducted initially. A model-building strategy will then be applied to find the best fitting model from the time series regression methods (dLagM package) and dynamic linear models (dynlm package).
MASE, Information Criteria (AIC and BIC), and Adjusted R Squared.
RBO_dataset <- read.csv("C:/Users/admin/Downloads/RBO.csv")
head(RBO_dataset)
## Year RBO Temperature Rainfall Radiation RelHumidity
## 1 1984 0.7550088 18.71038 2.489344 14.87158 93.92650
## 2 1985 0.7407520 19.26301 2.475890 14.68493 94.93589
## 3 1986 0.8423860 18.58356 2.421370 14.51507 94.09507
## 4 1987 0.7484425 19.10137 2.319726 14.67397 94.49699
## 5 1988 0.7984084 20.36066 2.465301 14.74863 94.08142
## 6 1989 0.7938803 19.59589 2.735890 14.78356 96.08685
For fitting a regression model, the response is Rank-based flowering Order similarity metric, RBO, and the 4 regressor variables are the Temperature, Rainfall, Radiation Level and Relative Humidity.
All the 5 variables are continuous variables.
Let's first get the regressors and the response as TS objects,
RBO = ts(RBO_dataset[,2], start = c(1984))
Temperature = ts(RBO_dataset[,3], start = c(1984))
Rainfall = ts(RBO_dataset[,4], start = c(1984))
Radiation = ts(RBO_dataset[,5], start = c(1984))
RelHumidity = ts(RBO_dataset[,6], start = c(1984))
data.ts = ts(RBO_dataset, start = c(1984)) # Y and x in single dataframe
Let's scale, center and plot all 5 variables together,
data.scale = scale(data.ts)
plot(data.scale[,2:6], plot.type="s", col=c("black", "red", "blue", "green", "yellow"), main = "RBO (Black - Response), Temperature (Red - X1),\n Rainfall (Blue - X2), Radiation (Green - X3), RelHumidity (Yellow - X4)")
It is hard to read the correlations between the regressors and the response, and among the regressors themselves, from this plot. But it is fair to say the 5 variables show some correlation. Let's check for correlation statistically using ggpairs(),
ggpairs(data = RBO_dataset, columns = c(2,3,4,5,6), progress = FALSE) #library(GGally)
Hence, some correlations between the 4 regressors and the response are present. We can generate regression models based on these correlations. First, let's look at the descriptive statistics.
Since we are generating a regression model which estimates the response, \(RBO\), let's focus on RBO's statistics.
summary(RBO)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.6629 0.7043 0.7321 0.7379 0.7566 0.8424
The mean and median of the RBO are very close indicating symmetrical distribution.
The time series plot for our data is generated using the following code chunk,
plot(RBO, ylab='Yearly average of RBO',xlab='Year',
type='o', main="Figure 1: Yearly Average RBO Trend (1984-2014)")
Plot Inference :
From Figure 1, we can comment on the time series's,
Trend: The overall shape of the series seems to follow a downward trend, thus indicating non-stationarity.
Seasonality: From the plot, no seasonal behavior is seen.
Change in Variance: No clear change in variance over time is visible; the variation looks random.
Behavior: We notice mixed behavior of MA and AR series. AR behavior is seen as we observe data points following one another. MA behavior is evident from the up and down fluctuations in the data points.
Intervention/Change points: Year 1996 might be an intervention point as the mean level of the RBO series falls notably low from this point onwards.
acf(RBO, main="ACF of RBO")
pacf(RBO, main ="PACF of RBO")
ACF plot: We notice first 3 autocorrelations are significant. A slowly decaying pattern indicates non stationary series. We do not see any ‘wavish’ form. Thus, no significant seasonal behavior is observed.
PACF plot: We see 1 high vertical spike at the first lag, indicating a non-stationary series. We have observed non-stationarity in the time series plot as well. The second partial autocorrelation is also significant.
Many model estimating procedures assume normality of the residuals. If this assumption doesn’t hold, then the coefficient estimates are not optimal. Let's look at the Quantile-Quantile (QQ) plot to observe normality visually and the Shapiro-Wilk test to statistically confirm the result.
qqnorm(RBO, main = "Normal Q-Q Plot of Average yearly RBO")
qqline(RBO, col = 2)
We see deviations from normality. The upper tail is off the line, and much of the data in the middle is off the line as well. Let's check statistically using the Shapiro-Wilk test. Let's state the hypotheses of this test,
\(H_0\) : Time series is normally distributed
\(H_a\) : Time series is not normally distributed
shapiro.test(RBO)
##
## Shapiro-Wilk normality test
##
## data: RBO
## W = 0.96136, p-value = 0.3169
From the Shapiro-Wilk test, since the p-value (0.3169) > 0.05, we do not reject the null hypothesis that the data is normal. Thus, despite the deviations seen in the QQ plot, there is no statistical evidence at the 5% level that the RBO series departs from a normal distribution.
The time series plot, ACF and PACF of the RBO series at the descriptive analysis stage suggest non-stationarity in our time series. Let's check this using the ADF and PP tests,
Using ADF (Augmented Dickey-Fuller) test :
Let's confirm the non-stationarity using the Augmented Dickey-Fuller (ADF) test. Let's state the hypotheses,
\(H_0\) : Time series is difference non-stationary
\(H_a\) : Time series is stationary
adf.test(RBO) #library(tseries)
##
## Augmented Dickey-Fuller Test
##
## data: RBO
## Dickey-Fuller = -2.0545, Lag order = 3, p-value = 0.5518
## alternative hypothesis: stationary
Since the p-value (0.5518) > 0.05, we do not reject the null hypothesis of non-stationarity. According to the ADF test, the series is non-stationary at the 5% level of significance.
Using PP (Phillips-Perron) test :
The null and alternative hypotheses are the same as for the ADF test.
PP.test(RBO, lshort = TRUE)
##
## Phillips-Perron Unit Root Test
##
## data: RBO
## Dickey-Fuller = -3.5927, Truncation lag parameter = 2, p-value =
## 0.04906
PP.test(RBO, lshort = FALSE)
##
## Phillips-Perron Unit Root Test
##
## data: RBO
## Dickey-Fuller = -3.8292, Truncation lag parameter = 8, p-value =
## 0.03167
According to the PP tests, the RBO series is stationary at the 5% level.
The two procedures give differing outcomes. The Phillips-Perron (PP) test corrects for serial correlation non-parametrically, i.e. unlike the ADF test it does not require selecting the number of lagged difference terms. On that basis we go with the outcome of the PP test and treat the RBO series as stationary, while noting that with only 31 observations both tests have low power and that the PP p-values (0.04906 and 0.03167) are close to the 5% threshold.
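The two truncation lags reported above (2 and 8) follow from the rule documented for PP.test() in the stats package, trunc(4*(n/100)^0.25) for lshort = TRUE and trunc(12*(n/100)^0.25) otherwise. The sketch below reproduces that arithmetic for a series of 31 observations; the series passed to PP.test() is simulated placeholder data, not the RBO series.

```r
# Truncation lag rule documented for stats::PP.test, evaluated for n = 31.
n <- 31
lag_short <- trunc(4  * (n / 100)^0.25)   # lshort = TRUE
lag_long  <- trunc(12 * (n / 100)^0.25)   # lshort = FALSE
c(short = lag_short, long = lag_long)

# Placeholder series of the same length, only to show the two calls.
set.seed(7)
placeholder <- ts(rnorm(n), start = 1984)
pp_short <- PP.test(placeholder, lshort = TRUE)
pp_long  <- PP.test(placeholder, lshort = FALSE)
c(pp_short$p.value, pp_long$p.value)
```

Because the long version uses a wider window for the serial correlation correction, the two calls can give noticeably different statistics on a short series.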
At the descriptive analysis stage, from the time series plot and the ACF/PACF plots, no seasonal pattern was observed but a downward trend was observed. Lets decompose the RBO series and confirm. STL decomposition method will be used.
Let's set t.window to 15 and look at the STL decomposed plots.
We can adjust the series for seasonality by subtracting the seasonal component from the original series, and for trend by subtracting the trend component, using the following code chunk.
Note - Since we cannot decompose a series having frequency 1, let's artificially set the frequency to 2. Also note that the time axis then ends at 1999 instead of 2014, because each year now holds two observations. This is okay since we are just interested in the decomposition.
# Code gist - Apply STL decomposition to get seasonally adjusted and trend adjusted and visually compare w.r.t to original time series
RBOX = ts(RBO_dataset[,2], start = c(1984),frequency = 2) # set frequency
stl.RBO <- stl(window(RBOX, start=c(1984)), t.window=15, s.window="periodic", robust=TRUE)
par(mfrow=c(3,1))
plot(RBOX,ylab='RBO',xlab='Time',
type='o', main="Original RBO Time Series")
plot(seasadj(stl.RBO), ylab='RBO',xlab='Time', main = "Seasonally adjusted RBO")
stl.RBO.trend = stl.RBO$time.series[,"trend"] # Extract the trend component from the output
stl.RBO.trend.adjusted = RBOX - stl.RBO.trend
plot(stl.RBO.trend.adjusted, ylab='RBO',xlab='Time', main = "Trend adjusted RBO")
par(mfrow=c(1,1))
On close inspection of the plots above, the trend adjusted series differs more from the original RBO series than the seasonally adjusted series does. That is, the trend component is more prominent than the seasonal component in the RBO series. Thus, we expect the fitted model to have no seasonal component.
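The effect of the artificial frequency on the time axis can be checked with a short base-R sketch. The values below are a placeholder sequence of length 31 (not the RBO data); only the `start` and `frequency` settings mirror the chunk above.

```r
# Placeholder series with the same length as RBO (31 values).
x_yearly <- ts(seq_len(31), start = 1984, frequency = 1)
x_half   <- ts(seq_len(31), start = 1984, frequency = 2)

# tsp() returns start time, end time and frequency.
tsp(x_yearly)   # yearly axis: 1984 to 1984 + 30
tsp(x_half)     # two observations per "year": 1984 to 1984 + 30/2
```

The shortened axis is purely a labelling effect of the assumed frequency; the 31 observations themselves are unchanged.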
Time series regression methods namely,
Based on whether the lags are known (Finite DLM) or undetermined (Infinite DLM), 4 major modelling methods will be tested, namely,
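The response of a finite DLM model with 1 regressor is represented as shown below,
\(Y_t = \alpha + \sum_{s=0}^{q} \beta_s X_{t-s} + \epsilon_t\)
where \(\alpha\) is the intercept, \(\beta_s\) is the lag weight on the regressor at lag \(s\), \(q\) is the finite lag length, and \(\epsilon_t\) is the error term.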
In our dataset, we have 4 regressors. For the univariate analysis, let's fit a single-regressor model for each of the 4 regressors.
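A finite DLM with one regressor can be sketched in base R by building the lagged regressor columns by hand and passing them to lm(). This is a minimal illustration on simulated placeholder data (not the RBO data); the names `lags` and `design` are hypothetical, and the sketch is not meant to reproduce what dlm() from dLagM does internally.

```r
set.seed(1)
n <- 31          # same length as the RBO series
q <- 14          # lag length used later in this section
x <- rnorm(n)                                # placeholder regressor
y <- 0.7 + 0.1 * x + rnorm(n, sd = 0.05)     # placeholder response

# embed() puts x_t, x_{t-1}, ..., x_{t-q} in columns 1..(q + 1);
# the first q observations are lost because their lags are unavailable.
lags <- embed(x, q + 1)
colnames(lags) <- c("x.t", paste0("x.", 1:q))
design <- data.frame(y = y[(q + 1):n], lags)

fit <- lm(y ~ ., data = design)
nrow(design)       # usable observations: n - q
length(coef(fit))  # intercept + (q + 1) lag weights
df.residual(fit)   # usable observations minus coefficients
```

With 31 observations and q = 14, the design keeps 17 rows for 16 coefficients, which is why the fitted models in this section report so few residual degrees of freedom.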
With intercept :
Now, lets use AIC and BIC score to find the best lag length for Finite DLM model,
finiteDLMauto(formula = RBO ~ Temperature, data = RBO_dataset, q.min = 1, q.max = 20,
model.type = "dlm", error.type = "AIC", trace = TRUE)
## q - k MASE AIC BIC GMRAE MBRAE R.Adj.Sq Ljung-Box
## 15 15 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 16 16 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 17 17 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 18 18 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 19 19 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 20 20 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 14 14 0.14051 -104.59411 -90.42949 0.14730 0.00930 0.44459 0.0299955362
## 1 1 1.00562 -97.52442 -91.91963 0.97718 -1.50100 0.05726 0.0022526947
## 2 2 1.14742 -91.57332 -84.73684 1.18433 2.00053 0.03520 0.0019973822
## 3 3 1.18155 -88.84678 -80.85356 1.19082 1.78688 -0.09575 0.0003753525
## 13 13 0.29047 -85.71072 -71.46477 0.25411 -0.02821 0.00081 0.0067221779
## 10 10 0.52353 -85.18937 -71.61058 0.65204 -10.06486 -0.01070 0.1806431756
## 4 4 1.24918 -82.35556 -73.28471 1.17170 0.61711 -0.15735 0.0006014497
## 5 5 1.09444 -81.62553 -71.56076 1.03411 0.44496 -0.08477 0.0016802983
## 9 9 0.62982 -79.75895 -66.66644 0.52185 2.67533 -0.28008 0.7633752533
## 8 8 0.71270 -78.35040 -65.85997 0.55285 -3.45059 -0.09835 0.2689392429
## 11 11 0.48457 -77.95858 -64.01833 0.43409 -0.30578 -0.27985 0.1390715482
## 6 6 0.98697 -77.12304 -66.15316 0.75879 1.05760 -0.17893 0.0077737837
## 7 7 0.91375 -76.88467 -65.10413 0.79294 0.21697 -0.09764 0.0661515342
## 12 12 0.50627 -73.59058 -59.42400 0.41570 19.78949 -0.57819 0.6363609739
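The rows for q = 15 to 20 are degenerate (AIC and BIC of -Inf, MASE of 0) because those models have at least as many coefficients as usable observations, so we ignore them. Among the remaining models, q = 14 has the smallest AIC (-104.59), although q = 1 has a slightly smaller BIC (-91.92 versus -90.43). Fit model with q = 14,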
DLM.Temperature = dlm(formula = RBO ~ Temperature, data = RBO_dataset, q = 14)
summary(DLM.Temperature)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7
## 1.675e-03 2.586e-05 -4.133e-03 4.653e-03 -4.513e-03 6.615e-03 -3.001e-03
## 8 9 10 11 12 13 14
## 3.239e-03 -3.914e-03 -5.395e-03 3.714e-03 3.849e-03 -4.757e-03 -4.376e-03
## 15 16 17
## 4.745e-03 -2.598e-03 4.172e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.279084 0.519687 -0.537 0.686
## Temperature.t -0.092104 0.029180 -3.156 0.195
## Temperature.1 0.097983 0.043360 2.260 0.265
## Temperature.2 0.058018 0.024706 2.348 0.256
## Temperature.3 -0.067167 0.029143 -2.305 0.261
## Temperature.4 -0.052905 0.022636 -2.337 0.257
## Temperature.5 0.053018 0.031674 1.674 0.343
## Temperature.6 0.063678 0.015198 4.190 0.149
## Temperature.7 -0.014486 0.019045 -0.761 0.586
## Temperature.8 0.036527 0.016580 2.203 0.271
## Temperature.9 -0.067426 0.025603 -2.633 0.231
## Temperature.10 0.027969 0.013594 2.057 0.288
## Temperature.11 -0.039054 0.018627 -2.097 0.283
## Temperature.12 0.060493 0.022320 2.710 0.225
## Temperature.13 0.002056 0.014090 0.146 0.908
## Temperature.14 -0.015631 0.014817 -1.055 0.483
##
## Residual standard error: 0.01693 on 1 degrees of freedom
## Multiple R-squared: 0.9653, Adjusted R-squared: 0.4446
## F-statistic: 1.854 on 15 and 1 DF, p-value: 0.526
##
## AIC and BIC values for the model:
## AIC BIC
## 1 -104.5941 -90.42949
DLM.Temperature model is insignificant (p-value = 0.526) at the 0.05 significance level.
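The -Inf rows in the lag search and the single residual degree of freedom in the q = 14 fit both come from simple counting. The sketch below tabulates, for each candidate q, how many observations remain and how many coefficients a model with an intercept has; it uses only n = 31 from the dataset description and hypothetical variable names.

```r
n <- 31
q <- 1:20
usable_rows <- n - q      # the first q observations are lost to lagging
n_coef      <- q + 2      # intercept + lag weights for lags 0..q
resid_df    <- usable_rows - n_coef

data.frame(q, usable_rows, n_coef, resid_df)
```

For q = 14 the residual degrees of freedom equal 1, and from q = 15 onwards they are negative, so those models fit the remaining observations exactly and their information criteria are not meaningful.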
Without intercept :
DLM.Temperature.noIntercept = dlm(formula = RBO ~ 0 + Temperature, data = RBO_dataset, q = 14)
summary(DLM.Temperature.noIntercept)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7
## -0.0003981 -0.0013348 -0.0039606 0.0043438 -0.0064216 0.0075171 -0.0060594
## 8 9 10 11 12 13 14
## 0.0022465 -0.0072242 -0.0032458 0.0045231 0.0017786 -0.0023514 -0.0002400
## 15 16 17
## 0.0047904 -0.0026957 0.0084360
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Temperature.t -0.091281 0.023388 -3.903 0.0598 .
## Temperature.1 0.081508 0.024594 3.314 0.0802 .
## Temperature.2 0.052694 0.018163 2.901 0.1011
## Temperature.3 -0.058068 0.019030 -3.051 0.0927 .
## Temperature.4 -0.051026 0.017950 -2.843 0.1047
## Temperature.5 0.041974 0.019334 2.171 0.1621
## Temperature.6 0.060570 0.011279 5.370 0.0330 *
## Temperature.7 -0.008716 0.012621 -0.691 0.5612
## Temperature.8 0.036608 0.013307 2.751 0.1106
## Temperature.9 -0.059111 0.016367 -3.612 0.0688 .
## Temperature.10 0.023610 0.008753 2.697 0.1143
## Temperature.11 -0.035049 0.013700 -2.558 0.1248
## Temperature.12 0.053697 0.014757 3.639 0.0679 .
## Temperature.13 0.003294 0.011157 0.295 0.7956
## Temperature.14 -0.013658 0.011521 -1.185 0.3576
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01359 on 2 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 0.9996
## F-statistic: 3133 on 15 and 2 DF, p-value: 0.0003191
##
## AIC and BIC values for the model:
## AIC BIC
## 1 -102.2864 -88.95495
DLM.Temperature.noIntercept model is significant (p-value = 0.0003191) at the 0.05 significance level.
With intercept :
Now, lets use AIC and BIC score to find the best lag length for Finite DLM model,
finiteDLMauto(formula = RBO ~ Rainfall, data = RBO_dataset, q.min = 1, q.max = 20,
model.type = "dlm", error.type = "AIC", trace = TRUE)
## q - k MASE AIC BIC GMRAE MBRAE R.Adj.Sq Ljung-Box
## 15 15 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 16 16 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 17 17 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 18 18 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 19 19 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 20 20 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 14 14 0.09581 -113.88404 -99.71941 0.09433 -0.03282 0.67842 0.03001328
## 13 13 0.15241 -109.63418 -95.38823 0.12675 0.01008 0.73549 0.69144810
## 1 1 0.94180 -100.89798 -95.29319 0.94415 0.46484 0.15753 0.04687413
## 3 3 0.97969 -97.19966 -89.20643 0.86453 0.97573 0.18688 0.01249933
## 2 2 0.99937 -96.70956 -89.87308 0.83164 0.44840 0.19180 0.03708415
## 4 4 1.03883 -90.46187 -81.39101 0.86444 0.78209 0.14281 0.02073547
## 5 5 0.92568 -87.24242 -77.17765 0.76943 0.47689 0.12600 0.02871866
## 6 6 0.85440 -82.31788 -71.34800 0.51455 27.53976 0.04227 0.04784411
## 9 9 0.62059 -80.79432 -67.70181 0.54274 -0.14207 -0.22123 0.84870967
## 12 12 0.42402 -79.63373 -65.46714 0.37726 0.92378 -0.14823 0.84289084
## 7 7 0.82934 -77.98405 -66.20351 0.77764 0.94056 -0.04849 0.17026081
## 11 11 0.46218 -77.84858 -63.90833 0.29371 0.22987 -0.28691 0.55512719
## 10 10 0.58562 -76.93255 -63.35376 0.53594 0.09998 -0.49754 0.52875140
## 8 8 0.72338 -76.81922 -64.32879 0.62316 0.02713 -0.17396 0.59232039
Ignoring the degenerate rows for q = 15 to 20, q = 14 has the smallest AIC and BIC scores. Fit model with q = 14,
DLM.Rainfall = dlm(formula = RBO ~ Rainfall, data = RBO_dataset, q = 14)
summary(DLM.Rainfall)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7
## -0.0037393 0.0003335 0.0023616 0.0044489 -0.0029005 0.0010146 -0.0042853
## 8 9 10 11 12 13 14
## 0.0013098 -0.0007348 0.0048191 -0.0017438 0.0014497 -0.0022985 0.0015321
## 15 16 17
## -0.0058962 0.0050202 -0.0006913
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.6361975 0.1507225 4.221 0.148
## Rainfall.t 0.0195284 0.0210850 0.926 0.524
## Rainfall.1 0.0204377 0.0130997 1.560 0.363
## Rainfall.2 0.0027206 0.0156791 0.174 0.891
## Rainfall.3 0.0074631 0.0126796 0.589 0.661
## Rainfall.4 0.0002523 0.0167585 0.015 0.990
## Rainfall.5 0.0271139 0.0126104 2.150 0.277
## Rainfall.6 -0.0007402 0.0138839 -0.053 0.966
## Rainfall.7 0.0008901 0.0159534 0.056 0.965
## Rainfall.8 -0.0082237 0.0133523 -0.616 0.649
## Rainfall.9 -0.0359922 0.0139713 -2.576 0.236
## Rainfall.10 -0.0076753 0.0134261 -0.572 0.669
## Rainfall.11 -0.0087286 0.0145229 -0.601 0.655
## Rainfall.12 0.0220019 0.0129624 1.697 0.339
## Rainfall.13 0.0171198 0.0216060 0.792 0.573
## Rainfall.14 -0.0189919 0.0193580 -0.981 0.506
##
## Residual standard error: 0.01288 on 1 degrees of freedom
## Multiple R-squared: 0.9799, Adjusted R-squared: 0.6784
## F-statistic: 3.25 on 15 and 1 DF, p-value: 0.4127
##
## AIC and BIC values for the model:
## AIC BIC
## 1 -113.884 -99.71941
DLM.Rainfall model is insignificant (p-value = 0.4127) at the 0.05 significance level.
Without intercept :
DLM.Rainfall.noIntercept = dlm(formula = RBO ~ 0 + Rainfall, data = RBO_dataset, q = 14)
summary(DLM.Rainfall.noIntercept)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## -0.013313 -0.026712 -0.006145 0.018074 0.011886 0.002610 -0.007758 -0.017198
## 9 10 11 12 13 14 15 16
## -0.009586 0.010223 0.018227 0.012199 0.011180 0.011907 -0.015261 -0.007257
## 17
## 0.011574
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Rainfall.t 0.086282 0.042775 2.017 0.181
## Rainfall.1 0.044434 0.036200 1.227 0.345
## Rainfall.2 0.015588 0.047175 0.330 0.772
## Rainfall.3 0.016892 0.038284 0.441 0.702
## Rainfall.4 0.022028 0.048907 0.450 0.697
## Rainfall.5 0.026982 0.038680 0.698 0.558
## Rainfall.6 0.025575 0.038051 0.672 0.571
## Rainfall.7 0.024474 0.045835 0.534 0.647
## Rainfall.8 0.008888 0.039022 0.228 0.841
## Rainfall.9 -0.056935 0.040061 -1.421 0.291
## Rainfall.10 -0.015777 0.040759 -0.387 0.736
## Rainfall.11 -0.033288 0.040815 -0.816 0.500
## Rainfall.12 0.033160 0.038924 0.852 0.484
## Rainfall.13 0.069648 0.054175 1.286 0.327
## Rainfall.14 0.042888 0.038776 1.106 0.384
##
## Residual standard error: 0.03952 on 2 degrees of freedom
## Multiple R-squared: 0.9996, Adjusted R-squared: 0.9969
## F-statistic: 370.4 on 15 and 2 DF, p-value: 0.002695
##
## AIC and BIC values for the model:
## AIC BIC
## 1 -65.99336 -52.66195
DLM.Rainfall.noIntercept model is significant (p-value = 0.002695) at the 0.05 significance level.
With intercept :
Now, lets use AIC and BIC score to find the best lag length for Finite DLM model,
finiteDLMauto(formula = RBO ~ Radiation, data = RBO_dataset, q.min = 1, q.max = 20,
model.type = "dlm", error.type = "AIC", trace = TRUE)
## q - k MASE AIC BIC GMRAE MBRAE R.Adj.Sq Ljung-Box
## 15 15 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 16 16 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 17 17 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 18 18 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 19 19 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 20 20 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 14 14 0.12847 -98.60918 -84.44456 0.09397 -0.18351 0.21021 0.4496384297
## 1 1 1.06303 -97.71113 -92.10634 1.18033 2.85932 0.06311 0.0015201563
## 3 3 1.12846 -92.38658 -84.39335 1.15033 0.64125 0.03438 0.0005032338
## 2 2 1.17471 -91.50708 -84.67061 1.10279 2.19244 0.03299 0.0024745598
## 13 13 0.28585 -87.57406 -73.32812 0.28875 -1.83052 0.09907 0.1349515619
## 4 4 1.19981 -86.43017 -77.35931 1.27178 -0.09837 0.00476 0.0007267546
## 5 5 1.07186 -83.32223 -73.25746 1.15984 2.21790 -0.01624 0.0036551584
## 6 6 0.94832 -81.85403 -70.88414 0.86545 -0.85749 0.02433 0.0134482398
## 8 8 0.70733 -80.65686 -68.16642 0.59509 0.78620 0.00645 0.2032244911
## 7 7 0.85847 -79.04997 -67.26943 0.67378 1.85251 -0.00294 0.0831448135
## 9 9 0.66171 -78.45240 -65.35989 0.50103 0.35495 -0.35840 0.9746014602
## 10 10 0.60373 -78.06879 -64.49000 0.63534 0.45096 -0.41866 0.4067961832
## 11 11 0.55833 -72.88766 -58.94741 0.63561 0.27746 -0.64919 0.3184508076
## 12 12 0.64771 -65.19474 -51.02816 0.67460 0.25024 -1.45510 0.4948052542
Ignoring the degenerate rows for q = 15 to 20, q = 14 has the smallest AIC (-98.61), although q = 1 has a smaller BIC (-92.11 versus -84.44). Fit model with q = 14,
DLM.Radiation = dlm(formula = RBO ~ Radiation, data = RBO_dataset, q = 14)
summary(DLM.Radiation)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7
## -5.335e-04 -6.401e-03 -7.619e-04 7.131e-03 6.508e-04 -8.572e-05 2.471e-03
## 8 9 10 11 12 13 14
## -8.584e-03 3.245e-03 -1.008e-03 9.027e-03 9.758e-04 -1.074e-02 1.932e-04
## 15 16 17
## 1.855e-03 -1.776e-03 4.338e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.943432 5.890296 -0.839 0.555
## Radiation.t -0.047737 0.059011 -0.809 0.567
## Radiation.1 0.081805 0.058184 1.406 0.394
## Radiation.2 0.090890 0.069717 1.304 0.417
## Radiation.3 0.083786 0.079447 1.055 0.483
## Radiation.4 -0.064871 0.080220 -0.809 0.567
## Radiation.5 -0.195571 0.102087 -1.916 0.306
## Radiation.6 0.078744 0.051070 1.542 0.366
## Radiation.7 0.043922 0.047225 0.930 0.523
## Radiation.8 0.133958 0.090427 1.481 0.378
## Radiation.9 0.013467 0.030075 0.448 0.732
## Radiation.10 0.002873 0.029457 0.098 0.938
## Radiation.11 -0.070320 0.032088 -2.191 0.273
## Radiation.12 0.034562 0.034965 0.988 0.504
## Radiation.13 0.033330 0.096370 0.346 0.788
## Radiation.14 0.171252 0.116490 1.470 0.380
##
## Residual standard error: 0.02019 on 1 degrees of freedom
## Multiple R-squared: 0.9506, Adjusted R-squared: 0.2102
## F-statistic: 1.284 on 15 and 1 DF, p-value: 0.6086
##
## AIC and BIC values for the model:
## AIC BIC
## 1 -98.60918 -84.44456
DLM.Radiation model is insignificant (p-value = 0.6086) at the 0.05 significance level.
Without intercept :
DLM.Radiation.noIntercept = dlm(formula = RBO ~ 0 + Rainfall, data = RBO_dataset, q = 14)
summary(DLM.Radiation.noIntercept)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## -0.013313 -0.026712 -0.006145 0.018074 0.011886 0.002610 -0.007758 -0.017198
## 9 10 11 12 13 14 15 16
## -0.009586 0.010223 0.018227 0.012199 0.011180 0.011907 -0.015261 -0.007257
## 17
## 0.011574
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Rainfall.t 0.086282 0.042775 2.017 0.181
## Rainfall.1 0.044434 0.036200 1.227 0.345
## Rainfall.2 0.015588 0.047175 0.330 0.772
## Rainfall.3 0.016892 0.038284 0.441 0.702
## Rainfall.4 0.022028 0.048907 0.450 0.697
## Rainfall.5 0.026982 0.038680 0.698 0.558
## Rainfall.6 0.025575 0.038051 0.672 0.571
## Rainfall.7 0.024474 0.045835 0.534 0.647
## Rainfall.8 0.008888 0.039022 0.228 0.841
## Rainfall.9 -0.056935 0.040061 -1.421 0.291
## Rainfall.10 -0.015777 0.040759 -0.387 0.736
## Rainfall.11 -0.033288 0.040815 -0.816 0.500
## Rainfall.12 0.033160 0.038924 0.852 0.484
## Rainfall.13 0.069648 0.054175 1.286 0.327
## Rainfall.14 0.042888 0.038776 1.106 0.384
##
## Residual standard error: 0.03952 on 2 degrees of freedom
## Multiple R-squared: 0.9996, Adjusted R-squared: 0.9969
## F-statistic: 370.4 on 15 and 2 DF, p-value: 0.002695
##
## AIC and BIC values for the model:
## AIC BIC
## 1 -65.99336 -52.66195
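DLM.Radiation.noIntercept model is significant. However, the call above passes Rainfall rather than Radiation as the regressor, so this output is identical to that of DLM.Rainfall.noIntercept and does not describe a Radiation model.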
With intercept :
Now, lets use AIC and BIC score to find the best lag length for Finite DLM model,
finiteDLMauto(formula = RBO ~ RelHumidity, data = RBO_dataset, q.min = 1, q.max = 20,
model.type = "dlm", error.type = "AIC", trace = TRUE)
## q - k MASE AIC BIC GMRAE MBRAE R.Adj.Sq Ljung-Box
## 15 15 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 16 16 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 17 17 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 18 18 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 19 19 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 20 20 0.00000 -Inf -Inf 0.00000 0.00000 NaN NaN
## 14 14 0.05888 -122.75432 -108.58970 0.04730 -0.06621 0.80915 0.6290754261
## 12 12 0.24538 -102.94035 -88.77377 0.22312 -1.00307 0.66326 0.9118912488
## 13 13 0.20119 -98.02833 -83.78238 0.15223 -0.12365 0.49597 0.1896695293
## 1 1 1.10110 -94.56619 -88.96140 1.22180 2.01650 -0.04044 0.0021897031
## 10 10 0.45185 -89.23348 -75.65468 0.41472 -3.34411 0.16635 0.8908388622
## 3 3 1.17112 -89.19540 -81.20218 1.12421 0.09980 -0.08219 0.0002775153
## 2 2 1.23588 -88.37920 -81.54272 1.29446 0.39157 -0.07714 0.0043469395
## 11 11 0.34014 -86.87319 -72.93294 0.18838 0.48333 0.18044 0.7234515930
## 9 9 0.58212 -83.46275 -70.37025 0.49322 0.29039 -0.08174 0.4767002700
## 4 4 1.26243 -82.80086 -73.73000 1.31804 1.18794 -0.13842 0.0002642674
## 5 5 1.14964 -79.40410 -69.33932 1.00804 0.38188 -0.18152 0.0004056958
## 6 6 1.02674 -76.22147 -65.25158 0.82281 0.36858 -0.22222 0.0061384348
## 7 7 0.92355 -74.80399 -63.02345 0.77382 0.64955 -0.19704 0.0043529149
## 8 8 0.84661 -72.65628 -60.16585 0.79021 -4.31457 -0.40689 0.0443006703
Ignoring the degenerate rows for q = 15 to 20, q = 14 has the smallest AIC and BIC scores. Fit model with q = 14,
DLM.RelHumidity = dlm(formula = RBO ~ RelHumidity, data = RBO_dataset, q = 14)
summary(DLM.RelHumidity)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7
## 3.440e-04 1.266e-03 1.250e-03 1.127e-03 -9.612e-04 -5.170e-03 -2.620e-04
## 8 9 10 11 12 13 14
## -5.288e-05 8.850e-04 -5.459e-04 7.252e-04 -2.540e-03 -7.889e-04 2.681e-04
## 15 16 17
## -3.376e-03 1.143e-03 6.690e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.2139297 2.8600699 -0.424 0.744
## RelHumidity.t 0.0039691 0.0116410 0.341 0.791
## RelHumidity.1 -0.0218992 0.0082984 -2.639 0.231
## RelHumidity.2 -0.0009846 0.0078243 -0.126 0.920
## RelHumidity.3 -0.0197978 0.0079714 -2.484 0.244
## RelHumidity.4 0.0009245 0.0043927 0.210 0.868
## RelHumidity.5 0.0160989 0.0053607 3.003 0.205
## RelHumidity.6 -0.0123647 0.0045655 -2.708 0.225
## RelHumidity.7 0.0163437 0.0049107 3.328 0.186
## RelHumidity.8 0.0060125 0.0048058 1.251 0.429
## RelHumidity.9 -0.0087084 0.0066835 -1.303 0.417
## RelHumidity.10 0.0011195 0.0045142 0.248 0.845
## RelHumidity.11 -0.0068266 0.0058391 -1.169 0.450
## RelHumidity.12 0.0269905 0.0068458 3.943 0.158
## RelHumidity.13 0.0128362 0.0059733 2.149 0.277
## RelHumidity.14 0.0068126 0.0054905 1.241 0.432
##
## Residual standard error: 0.009924 on 1 degrees of freedom
## Multiple R-squared: 0.9881, Adjusted R-squared: 0.8092
## F-statistic: 5.522 on 15 and 1 DF, p-value: 0.3235
##
## AIC and BIC values for the model:
## AIC BIC
## 1 -122.7543 -108.5897
DLM.RelHumidity model is insignificant (p-value = 0.3235) at the 0.05 significance level.
Without intercept :
DLM.RelHumidity.noIntercept = dlm(formula = RBO ~ 0 + RelHumidity, data = RBO_dataset, q = 14)
summary(DLM.RelHumidity.noIntercept)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7
## -0.0018299 0.0010538 0.0005606 0.0013108 -0.0012237 -0.0063138 0.0012040
## 8 9 10 11 12 13 14
## -0.0011289 0.0018053 -0.0015397 0.0013437 -0.0018318 0.0005116 -0.0001880
## 15 16 17
## -0.0027747 0.0027611 0.0062651
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## RelHumidity.t -2.084e-05 5.274e-03 -0.004 0.9972
## RelHumidity.1 -2.402e-02 5.090e-03 -4.718 0.0421 *
## RelHumidity.2 -2.646e-03 5.204e-03 -0.509 0.6616
## RelHumidity.3 -2.185e-02 4.864e-03 -4.493 0.0461 *
## RelHumidity.4 3.987e-04 3.237e-03 0.123 0.9132
## RelHumidity.5 1.471e-02 3.257e-03 4.515 0.0457 *
## RelHumidity.6 -1.307e-02 3.268e-03 -3.998 0.0572 .
## RelHumidity.7 1.547e-02 3.424e-03 4.518 0.0457 *
## RelHumidity.8 5.471e-03 3.559e-03 1.537 0.2641
## RelHumidity.9 -1.090e-02 3.251e-03 -3.354 0.0786 .
## RelHumidity.10 1.194e-03 3.465e-03 0.345 0.7632
## RelHumidity.11 -5.777e-03 4.063e-03 -1.422 0.2910
## RelHumidity.12 2.854e-02 4.453e-03 6.409 0.0235 *
## RelHumidity.13 1.359e-02 4.380e-03 3.103 0.0901 .
## RelHumidity.14 6.632e-03 4.205e-03 1.577 0.2555
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.007624 on 2 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 0.9999
## F-statistic: 9956 on 15 and 2 DF, p-value: 0.0001004
##
## AIC and BIC values for the model:
## AIC BIC
## 1 -121.9384 -108.607
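DLM.RelHumidity.noIntercept model is significant (p-value = 0.0001004) at the 0.05 significance level.
All 4 single-predictor models without an intercept are significant. Eliminating the insignificant models, we compare the significant finite DLM models based on adjusted R-squared, AIC, BIC and MASE. Note that R-squared for a no-intercept model is computed about zero rather than about the mean of RBO, so the values near 1 are not comparable with those of the intercept models.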
Model <- c("DLM.Temperature.noIntercept", "DLM.Rainfall.noIntercept", "DLM.Radiation.noIntercept", "DLM.RelHumidity.noIntercept")
AIC <- c(AIC(DLM.Temperature.noIntercept), AIC(DLM.Rainfall.noIntercept), AIC(DLM.Radiation.noIntercept), AIC(DLM.RelHumidity.noIntercept))
BIC <- c(BIC(DLM.Temperature.noIntercept), BIC(DLM.Rainfall.noIntercept), BIC(DLM.Radiation.noIntercept), BIC(DLM.RelHumidity.noIntercept))
Adjusted_Rsquared <- c(0.9996, 0.9969, 0.9969, 0.9999) # adjusted R-squared values from the summary() outputs above
MASE <- MASE(DLM.Temperature.noIntercept, DLM.Rainfall.noIntercept, DLM.Radiation.noIntercept, DLM.RelHumidity.noIntercept)
data.frame(AIC, BIC, Adjusted_Rsquared, MASE) %>% arrange(AIC)
## AIC BIC Adjusted_Rsquared n
## DLM.RelHumidity.noIntercept -121.93842 -108.60701 0.9999 17
## DLM.Temperature.noIntercept -102.28636 -88.95495 0.9996 17
## DLM.Rainfall.noIntercept -65.99336 -52.66195 0.9969 17
## DLM.Radiation.noIntercept -65.99336 -52.66195 0.9969 17
## MASE
## DLM.RelHumidity.noIntercept 0.07231577
## DLM.Temperature.noIntercept 0.14522072
## DLM.Rainfall.noIntercept 0.45373278
## DLM.Radiation.noIntercept 0.45373278
Thus, as per AIC, BIC and MASE, finite distributed lag model for RBO with Relative Humidity as the regressor with no intercept (DLM.RelHumidity.noIntercept) is the best.
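As a reminder of what the MASE column measures, the sketch below computes it from first principles as the model's mean absolute error divided by the in-sample mean absolute error of the naive (previous value) forecast. The function name `mase` and the toy numbers are hypothetical, and the MASE() function in dLagM may differ in detail from this generic definition.

```r
# Generic MASE: model MAE scaled by the MAE of the naive one-step forecast.
mase <- function(actual, fit_vals) {
  mae_model <- mean(abs(actual - fit_vals))
  mae_naive <- mean(abs(diff(actual)))   # naive forecast error: y_t - y_{t-1}
  mae_model / mae_naive
}

actual   <- c(1, 2, 4, 7)
fit_vals <- c(1.5, 2.5, 3.5, 6.5)
mase(actual, fit_vals)   # 0.5 / 2 = 0.25
```

A value below 1 means the fitted values are, on average, closer to the data than the naive forecast is.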
We can apply a diagnostic check using checkresiduals() function from the forecast package.
checkresiduals(DLM.RelHumidity.noIntercept$model$residuals) # forecast package
##
## Ljung-Box test
##
## data: Residuals
## Q* = 0.81081, df = 3, p-value = 0.8469
##
## Model df: 0. Total lags used: 3
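In this output, the Ljung-Box test (p-value = 0.8469 > 0.05) gives no evidence of serial correlation remaining in the residuals.
ATTENTION - From here on, let's summarise the models and not go into each model's details, for simplicity.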
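The Ljung-Box part of this check can also be run directly with Box.test() from the stats package. The sketch below uses simulated placeholder residuals of length 17 (not the residuals of the fitted model) and the same settings that the output above reports, 3 lags and 0 model degrees of freedom.

```r
set.seed(42)
res <- rnorm(17)    # placeholder residuals, same length as the fitted sample

# H0: no autocorrelation in the residuals up to the chosen lag.
lb <- Box.test(res, lag = 3, type = "Ljung-Box", fitdf = 0)
lb$statistic
lb$parameter    # degrees of freedom = lag - fitdf
lb$p.value
```

A p-value above 0.05 means the null hypothesis of uncorrelated residuals is not rejected at the 5% level.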
A polynomial DLM helps reduce the effect of multicollinearity among the lagged regressors by constraining the lag weights to follow a low-order polynomial. Let's fit a polynomial DLM of order 2 for each of the 4 regressors individually.
PolyDLM.Temperature = polyDlm(x = as.vector(Temperature), y = as.vector(RBO), q = 14, k = 2, show.beta = FALSE)
summary(PolyDLM.Temperature)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.032950 -0.011121 0.001367 0.011331 0.030851
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3942960 0.4617116 0.854 0.409
## z.t0 -0.0104404 0.0091618 -1.140 0.275
## z.t1 0.0041049 0.0028129 1.459 0.168
## z.t2 -0.0002538 0.0001710 -1.484 0.162
##
## Residual standard error: 0.02236 on 13 degrees of freedom
## Multiple R-squared: 0.2127, Adjusted R-squared: 0.03107
## F-statistic: 1.171 on 3 and 13 DF, p-value: 0.3585
Polynomial DLM model with Temperature as regressor variable is insignificant at 5% significance level.
PolyDLM.Rainfall = polyDlm(x = as.vector(Rainfall), y = as.vector(RBO), q = 14, k = 2, show.beta = FALSE)
summary(PolyDLM.Rainfall)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.036444 -0.011375 -0.002838 0.017678 0.033093
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.7480934 0.1649886 4.534 0.000561 ***
## z.t0 0.0114043 0.0127395 0.895 0.386959
## z.t1 -0.0028950 0.0038747 -0.747 0.468269
## z.t2 0.0001186 0.0002811 0.422 0.680119
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02168 on 13 degrees of freedom
## Multiple R-squared: 0.2601, Adjusted R-squared: 0.08932
## F-statistic: 1.523 on 3 and 13 DF, p-value: 0.2553
Polynomial DLM model with Rainfall as regressor variable is insignificant at 5% significance level.
PolyDLM.Radiation = polyDlm(x = as.vector(Radiation), y = as.vector(RBO), q = 14, k = 2, show.beta = FALSE)
summary(PolyDLM.Radiation)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.04294 -0.01136 0.00325 0.01095 0.03331
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.3266320 1.1428764 1.161 0.267
## z.t0 -0.0130075 0.0123520 -1.053 0.311
## z.t1 0.0052006 0.0037170 1.399 0.185
## z.t2 -0.0003872 0.0002747 -1.410 0.182
##
## Residual standard error: 0.02241 on 13 degrees of freedom
## Multiple R-squared: 0.2091, Adjusted R-squared: 0.0266
## F-statistic: 1.146 on 3 and 13 DF, p-value: 0.3674
Polynomial DLM model with Radiation as regressor variable is insignificant at 5% significance level.
PolyDLM.RelHumidity = polyDlm(x = as.vector(RelHumidity), y = as.vector(RBO), q = 14, k = 2, show.beta = FALSE)
summary(PolyDLM.RelHumidity)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.044911 -0.012149 0.003855 0.010324 0.031775
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.643e+00 3.580e+00 -1.855 0.0864 .
## z.t0 9.925e-03 6.224e-03 1.595 0.1348
## z.t1 -2.833e-04 1.431e-03 -0.198 0.8462
## z.t2 -4.078e-05 9.394e-05 -0.434 0.6713
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02168 on 13 degrees of freedom
## Multiple R-squared: 0.2597, Adjusted R-squared: 0.08886
## F-statistic: 1.52 on 3 and 13 DF, p-value: 0.256
Polynomial DLM model with Relative Humidity as regressor variable is insignificant at 5% significance level.
None of the univariate polynomial DLM models, using any of the 4 predictors, is significant at the 5% significance level, so no polynomial DLM model is carried forward.
Here the lag weights are positive and decline geometrically. This
model is called the infinite geometric DLM because it has infinitely many lag
weights. The Koyck transformation makes this infinite
geometric DLM estimable by subtracting from it its own first lag
multiplied by \(\phi\). The Koyck
transformed model is represented as,
\(Y_t = \delta_1 + \delta_2Y_{t-1} + \delta_3X_t +
\nu_t\)
where \(\delta_1 = \alpha(1-\phi), \delta_2
= \phi, \delta_3 = \beta\) and the random error after the
transformation is \(\nu_t = (\epsilon_t
-\phi\epsilon_{t-1})\).
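As a quick numerical check of the transformation, the sketch below simulates a noise-free geometric DLM and confirms that the Koyck recursion, with the regressor term \(\beta X_t\) written out, reproduces it. The parameter values, and the assumption that \(X_t = 0\) before the first observation, are hypothetical and chosen only for illustration.

```r
set.seed(1)
n     <- 200
alpha <- 2; beta <- 0.5; phi <- 0.6   # hypothetical parameters
x     <- rnorm(n)                     # hypothetical regressor

# Direct geometric DLM: Y_t = alpha + sum_{s=0}^{t-1} beta * phi^s * X_{t-s}
y_dlm <- sapply(1:n, function(t) alpha + sum(beta * phi^(0:(t - 1)) * x[t:1]))

# Koyck form: Y_t = alpha * (1 - phi) + phi * Y_{t-1} + beta * X_t
y_koyck    <- numeric(n)
y_koyck[1] <- alpha + beta * x[1]     # start-up value, X assumed 0 before t = 1
for (t in 2:n) {
  y_koyck[t] <- alpha * (1 - phi) + phi * y_koyck[t - 1] + beta * x[t]
}

# Largest discrepancy between the two constructions (numerically zero)
max(abs(y_koyck - y_dlm))
```

Once the error term \(\epsilon_t\) is added, the same algebra produces the moving-average error \(\nu_t\), which is correlated with \(Y_{t-1}\) and motivates the instrumental-variables estimation.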
The koyckDlm() function implements a two-stage least squares method that first estimates \(\hat{Y}_{t-1}\) and then estimates \(Y_{t}\) through linear regression. Let's fit Koyck geometric DLM models for each of the 4 regressors individually.
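To illustrate the two-stage idea, here is a minimal base-R sketch on simulated data. It assumes that \(X_{t-1}\) serves as the instrument for \(Y_{t-1}\); the simulated series, parameter values and variable names are hypothetical, and the code is not a reproduction of the internals of koyckDlm().

```r
set.seed(42)
n     <- 2000                        # large hypothetical sample
alpha <- 2; beta <- 1; phi <- 0.6    # hypothetical true parameters
x     <- rnorm(n)
eps   <- rnorm(n, sd = 0.5)

# Geometric DLM: Y_t = alpha + sum_s beta * phi^s * X_{t-s} + eps_t
# (stats::filter is called explicitly because dplyr masks filter())
signal <- as.numeric(stats::filter(beta * x, filter = phi, method = "recursive"))
y      <- alpha + signal + eps

# Align Y_t with Y_{t-1}, X_t and X_{t-1}
d <- data.frame(y = y[-1], y1 = y[-n], x0 = x[-1], x1 = x[-n])

# Stage 1: project the endogenous Y_{t-1} on X_t and the instrument X_{t-1}
stage1   <- lm(y1 ~ x0 + x1, data = d)
d$y1_hat <- fitted(stage1)

# Stage 2: regress Y_t on the fitted Y_{t-1} and X_t
stage2 <- lm(y ~ y1_hat + x0, data = d)
coef(stage2)   # estimates of alpha * (1 - phi), phi and beta
```

The point estimates from the second stage are the two-stage least squares estimates, but the standard errors printed by lm() for that stage are not valid for inference; a dedicated instrumental-variables routine corrects them.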
With intercept :
Koyck.Temperature = koyckDlm(x = as.vector(RBO_dataset$Temperature) , y = as.vector(RBO_dataset$RBO) )
summary(Koyck.Temperature$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0741656 -0.0225173 -0.0006794 0.0240622 0.1270971
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.20775 0.83741 -0.248 0.8059
## Y.1 0.68547 0.25559 2.682 0.0123 *
## X.t 0.02235 0.03523 0.634 0.5312
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 27 4.635 0.0404 *
## Wu-Hausman 1 26 1.347 0.2563
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04319 on 27 degrees of freedom
## Multiple R-Squared: 0.1517, Adjusted R-squared: 0.08891
## Wald test: 5.309 on 2 and 27 DF, p-value: 0.01136
Koyck.Temperature is significant at 5% significance level.
Without intercept :
Koyck.Temperature.NoIntercept = koyckDlm(x = as.vector(RBO_dataset$Temperature) , y = as.vector(RBO_dataset$RBO), intercept = FALSE)
summary(Koyck.Temperature.NoIntercept$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.067575 -0.019831 -0.002845 0.021777 0.118166
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Y.1 0.633607 0.136928 4.627 7.68e-05 ***
## X.t 0.013715 0.005163 2.656 0.0129 *
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 2 27 93.540 7.25e-13 ***
## Wu-Hausman 1 27 5.026 0.0334 *
## Sargan 1 NA 0.076 0.7827
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04021 on 28 degrees of freedom
## Multiple R-Squared: 0.9972, Adjusted R-squared: 0.997
## Wald test: 5049 on 2 and 28 DF, p-value: < 2.2e-16
Koyck.Temperature.NoIntercept is significant at 5% significance level.
With intercept :
Koyck.Rainfall = koyckDlm(x = as.vector(RBO_dataset$Rainfall) , y = as.vector(RBO_dataset$RBO) )
summary(Koyck.Rainfall$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.3665 -0.4155 -0.1142 0.3241 1.6012
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3207 2.4302 0.132 0.896
## Y.1 -6.5147 243.8216 -0.027 0.979
## X.t 2.2101 76.0635 0.029 0.977
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 27 0.001 0.977
## Wu-Hausman 1 26 0.360 0.554
## Sargan 0 NA NA NA
##
## Residual standard error: 0.7951 on 27 degrees of freedom
## Multiple R-Squared: -286.5, Adjusted R-squared: -307.8
## Wald test: 0.01549 on 2 and 27 DF, p-value: 0.9846
Koyck.Rainfall model is insignificant at 5% significance level.
Without intercept :
Koyck.Rainfall.NoIntercept = koyckDlm(x = as.vector(RBO_dataset$Rainfall) , y = as.vector(RBO_dataset$RBO), intercept = FALSE)
summary(Koyck.Rainfall.NoIntercept$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.99247 -0.31365 -0.07781 0.23237 1.22822
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Y.1 -4.287 177.876 -0.024 0.981
## X.t 1.650 55.537 0.030 0.977
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 2 27 0.000 1.000
## Wu-Hausman 1 27 0.161 0.692
## Sargan 1 NA 0.035 0.852
##
## Residual standard error: 0.5815 on 28 degrees of freedom
## Multiple R-Squared: 0.4216, Adjusted R-squared: 0.3803
## Wald test: 24.13 on 2 and 28 DF, p-value: 8.091e-07
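Koyck.Rainfall.NoIntercept model is significant at 5% significance level according to the Wald test, although neither coefficient is individually significant and the weak-instruments test does not reject weak instruments (p-value = 1.000).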
With intercept :
Koyck.Radiation = koyckDlm(x = as.vector(RBO_dataset$Radiation) , y = as.vector(RBO_dataset$RBO) )
summary(Koyck.Radiation$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.082255 -0.017008 -0.001036 0.021424 0.106984
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.48011 0.94819 -0.506 0.6167
## Y.1 0.69801 0.24502 2.849 0.0083 **
## X.t 0.04812 0.05661 0.850 0.4028
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 27 4.942 0.0348 *
## Wu-Hausman 1 26 2.765 0.1084
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0467 on 27 degrees of freedom
## Multiple R-Squared: 0.008467, Adjusted R-squared: -0.06498
## Wald test: 4.731 on 2 and 27 DF, p-value: 0.01732
Koyck.Radiation model is significant at 5% significance level.
Without intercept :
Koyck.Radiation.NoIntercept = koyckDlm(x = as.vector(RBO_dataset$Radiation) , y = as.vector(RBO_dataset$RBO), intercept = FALSE)
summary(Koyck.Radiation.NoIntercept$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.075202 -0.018109 -0.001784 0.019029 0.105209
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Y.1 0.607517 0.144697 4.199 0.000246 ***
## X.t 0.019783 0.007344 2.694 0.011798 *
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 2 27 109.386 1.13e-13 ***
## Wu-Hausman 1 27 6.810 0.0146 *
## Sargan 1 NA 0.369 0.5438
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04031 on 28 degrees of freedom
## Multiple R-Squared: 0.9972, Adjusted R-squared: 0.997
## Wald test: 5024 on 2 and 28 DF, p-value: < 2.2e-16
Koyck.Radiation.NoIntercept model is significant at 5% significance level.
With intercept :
Koyck.RelHumidity = koyckDlm(x = as.vector(RBO_dataset$RelHumidity) , y = as.vector(RBO_dataset$RBO) )
summary(Koyck.RelHumidity$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.080897 -0.021103 -0.004676 0.022673 0.111041
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.16679 8.04941 -0.145 0.8858
## Y.1 0.62503 0.34753 1.798 0.0833 .
## X.t 0.01525 0.08274 0.184 0.8551
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 27 0.393 0.536
## Wu-Hausman 1 26 0.055 0.816
## Sargan 0 NA NA NA
##
## Residual standard error: 0.04127 on 27 degrees of freedom
## Multiple R-Squared: 0.2256, Adjusted R-squared: 0.1682
## Wald test: 5.612 on 2 and 27 DF, p-value: 0.009161
Koyck.RelHumidity model is significant at 5% significance level.
Without intercept :
Koyck.RelHumidity.NoIntercept = koyckDlm(x = as.vector(RBO_dataset$RelHumidity) , y = as.vector(RBO_dataset$RBO), intercept = FALSE)
summary(Koyck.RelHumidity.NoIntercept$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.079941 -0.018086 -0.006952 0.018421 0.105496
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Y.1 0.580729 0.153090 3.793 0.000729 ***
## X.t 0.003260 0.001198 2.720 0.011080 *
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 2 27 801.908 <2e-16 ***
## Wu-Hausman 1 27 0.493 0.489
## Sargan 1 NA 0.026 0.871
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0382 on 28 degrees of freedom
## Multiple R-Squared: 0.9975, Adjusted R-squared: 0.9973
## Wald test: 5595 on 2 and 28 DF, p-value: < 2.2e-16
Koyck.RelHumidity.NoIntercept model is significant at 5% significance level.
Koyck DLM models without intercept are significant for all 4 regressors. Eliminating the insignificant models, the significant Koyck DLM models are compared based on adjusted R-squared, AIC, BIC and MASE:
Model <- c("Koyck.Temperature", "Koyck.Temperature.NoIntercept", "Koyck.Rainfall.NoIntercept", "Koyck.Radiation", "Koyck.Radiation.NoIntercept", "Koyck.RelHumidity", "Koyck.RelHumidity.NoIntercept")
AIC <- c(AIC(Koyck.Temperature), AIC(Koyck.Temperature.NoIntercept), AIC(Koyck.Rainfall.NoIntercept), AIC(Koyck.Radiation), AIC(Koyck.Radiation.NoIntercept), AIC(Koyck.RelHumidity), AIC(Koyck.RelHumidity.NoIntercept))
BIC <- c(BIC(Koyck.Temperature), BIC(Koyck.Temperature.NoIntercept), BIC(Koyck.Rainfall.NoIntercept), BIC(Koyck.Radiation), BIC(Koyck.Radiation.NoIntercept), BIC(Koyck.RelHumidity), BIC(Koyck.RelHumidity.NoIntercept))
Adjusted_Rsquared <- c(0.08891, 0.997, 0.3803, -0.06498, 0.997, 0.1682, 0.9973)
MASE <- MASE(Koyck.Temperature, Koyck.Temperature.NoIntercept, Koyck.Rainfall.NoIntercept, Koyck.Radiation, Koyck.Radiation.NoIntercept, Koyck.RelHumidity, Koyck.RelHumidity.NoIntercept)
data.frame(AIC, BIC, Adjusted_Rsquared, MASE) %>% arrange(MASE)
## AIC BIC Adjusted_Rsquared n
## Koyck.RelHumidity.NoIntercept -106.83021 -102.62662 0.99730 30
## Koyck.Temperature.NoIntercept -103.74657 -99.54297 0.99700 30
## Koyck.Radiation.NoIntercept -103.60090 -99.39730 0.99700 30
## Koyck.Temperature -98.54907 -92.94428 0.08891 30
## Koyck.RelHumidity -101.28049 -95.67571 0.16820 30
## Koyck.Radiation -93.86690 -88.26211 -0.06498 30
## Koyck.Rainfall.NoIntercept 56.53470 60.73829 0.38030 30
## MASE
## Koyck.RelHumidity.NoIntercept 0.8702601
## Koyck.Temperature.NoIntercept 0.9126045
## Koyck.Radiation.NoIntercept 0.9169493
## Koyck.Temperature 0.9535116
## Koyck.RelHumidity 0.9559618
## Koyck.Radiation 1.0314227
## Koyck.Rainfall.NoIntercept 14.2098845
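Thus, as per AIC, BIC, MASE (the preferred measure in terms of forecasting), and adjusted R-squared, the Koyck DLM for RBO with Relative Humidity as the regressor and no intercept (Koyck.RelHumidity.NoIntercept) is the best. Note that the R-squared values of the no-intercept models are measured about zero rather than about the mean of the response, so they are not directly comparable with those of the intercept models.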
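For reference, the MASE values in the table scale the in-sample mean absolute error by the mean absolute error of the naive one-step forecast. A minimal sketch of that definition follows; the function name mase_sketch and the toy numbers are hypothetical, and the exact computation inside the MASE() function of dLagM may differ in detail.

```r
# MASE = mean(|residuals|) / mean(|Y_t - Y_{t-1}|)
mase_sketch <- function(actual, fitted_values) {
  res   <- actual - fitted_values
  scale <- mean(abs(diff(actual)))   # in-sample MAE of the naive forecast
  mean(abs(res)) / scale
}

# Toy series (hypothetical values, for illustration only)
actual_toy <- c(1, 2, 4, 7)
fitted_toy <- c(1, 3, 4, 6)
mase_sketch(actual_toy, fitted_toy)
```

A value below 1 means the model's in-sample errors are, on average, smaller than those of the naive forecast, which is why Koyck.Radiation (MASE above 1) and Koyck.Rainfall.NoIntercept rank last.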
checkresiduals(Koyck.RelHumidity.NoIntercept$model$residuals)
##
## Ljung-Box test
##
## data: Residuals
## Q* = 6.3336, df = 6, p-value = 0.3869
##
## Model df: 0. Total lags used: 6
The Ljung-Box test (p-value = 0.3869) and the ACF plot show no significant serial autocorrelation left in the residuals. The time series plot of the residuals shows no apparent pattern, and their histogram looks approximately normal. Thus, the general assumptions are not violated.
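The geometric lag structure implied by the selected model can be recovered from its two coefficients. The sketch below takes the reported estimates for Koyck.RelHumidity.NoIntercept (Y.1 as \(\phi\) and X.t as \(\beta\)) and applies the standard Koyck formulas; the variable names are hypothetical and the choice of 9 displayed lags is arbitrary.

```r
phi  <- 0.580729   # coefficient of Y.1 reported in the summary
beta <- 0.003260   # coefficient of X.t reported in the summary

# Implied lag weights beta * phi^s for lags s = 0, ..., 8
lag_weights <- beta * phi^(0:8)

# Long-run multiplier: the sum of all lag weights
long_run <- beta / (1 - phi)

# Mean and median lag of the geometric lag distribution
mean_lag   <- phi / (1 - phi)
median_lag <- -log(2) / log(phi)

round(lag_weights, 6)
c(long_run = long_run, mean_lag = mean_lag, median_lag = median_lag)
```

Because the estimate of \(\phi\) lies between 0 and 1, the weights shrink towards zero and the long-run multiplier is finite.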
The autoregressive distributed lag (ARDL) model is a flexible and parsimonious
infinite DLM. The model is represented as,
\(Y_t = \mu + \beta_0 X_t + \beta_1 X_{t-1}
+ \gamma_1 Y_{t-1} + e_t\)
Similar to the Koyck DLM, this model can be written as an infinite DLM, but with a lag distribution of any shape rather than only a polynomial or geometric shape. The model is denoted as ARDL(p,q), where p is the number of lags of the predictor and q is the number of lags of the response; the equation above is ARDL(1,1). The ardlDlm() function is used to fit the model. Let's find the best lag lengths by iterating over candidate models and comparing their AIC and BIC scores, with the maximum lag length set to 14. Let's do this for each regressor individually.
With intercept :
## Code gist to find the best ARDL(p,q) model as per AIC and BIC scores.
# First create an empty df. Iterate over 196 ARDL (since max lag for response and predictor of ARDL model is 14, i.e, p = q = 14 at max).
# Save the model's AIC and BIC scores through iteration and display the model with best AIC and BIC scores.
df = data.frame(matrix(
vector(), 0, 4, dimnames=list(c(), c("p","q","AIC","BIC"))),
stringsAsFactors=F) # create empty dataframe
for(i in 1:14){
for(j in 1:14){
model4.1 = ardlDlm(formula = RBO ~ Temperature, data = RBO_dataset, p = i, q = j)
new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
df[nrow(df) + 1, ] <- new
}
} # Iterate and save in df
head(df[order(df[,3]),] %>% filter(AIC != -Inf & BIC != -Inf), 1) # Best model as per AIC
## p q AIC BIC
## 1 2 13 -142.4481 -126.4214
head(df[order(df[,4]),] %>% filter(AIC != -Inf & BIC != -Inf), 1) # Best model as per BIC
## p q AIC BIC
## 1 2 13 -142.4481 -126.4214
ARDL(2,13) is the best model as per both the AIC and BIC scores.
Let's fit this model:
ARDL(2,13):
ARDL.Temperature.2x13 = ardlDlm(formula = RBO ~ Temperature, data = RBO_dataset, p = 2, q = 13)
summary(ARDL.Temperature.2x13)
##
## Time series regression with "ts" data:
## Start = 14, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## 14 15 16 17 18 19 20
## 1.651e-03 -6.815e-05 -2.004e-03 -2.547e-04 3.351e-03 -6.355e-04 -2.342e-03
## 21 22 23 24 25 26 27
## -9.779e-04 1.796e-03 1.513e-03 3.106e-04 2.002e-03 2.291e-03 -1.650e-04
## 28 29 30 31
## -1.020e-03 -1.710e-03 -1.026e-03 -2.711e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.280126 1.164162 7.113 0.0889 .
## Temperature.t -0.050777 0.009731 -5.218 0.1205
## Temperature.1 -0.065776 0.014288 -4.603 0.1362
## Temperature.2 -0.065288 0.013233 -4.934 0.1273
## RBO.1 -0.916271 0.212320 -4.316 0.1450
## RBO.2 -0.841775 0.246285 -3.418 0.1812
## RBO.3 -0.650233 0.122350 -5.315 0.1184
## RBO.4 -0.788172 0.130667 -6.032 0.1046
## RBO.5 -0.961598 0.167927 -5.726 0.1101
## RBO.6 -0.109752 0.142309 -0.771 0.5818
## RBO.7 -0.244319 0.127712 -1.913 0.3066
## RBO.8 0.508823 0.108925 4.671 0.1343
## RBO.9 0.180396 0.154063 1.171 0.4500
## RBO.10 0.192379 0.120400 1.598 0.3560
## RBO.11 -0.716589 0.134857 -5.314 0.1184
## RBO.12 -0.932688 0.162567 -5.737 0.1099
## RBO.13 -0.202849 0.096217 -2.108 0.2820
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.007222 on 1 degrees of freedom
## Multiple R-squared: 0.994, Adjusted R-squared: 0.8974
## F-statistic: 10.29 on 16 and 1 DF, p-value: 0.2407
checkresiduals(ARDL.Temperature.2x13$model, test = "LB")
##
## Ljung-Box test
##
## data: Residuals
## Q* = 4.6877, df = 4, p-value = 0.3209
##
## Model df: 0. Total lags used: 4
MASE(ARDL.Temperature.2x13)
## MASE
## ARDL.Temperature.2x13 0.05441407
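The model is insignificant at 5% significance level (F-test p-value = 0.2407), and with only 1 residual degree of freedom it is heavily over-parameterised.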
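The single residual degree of freedom reported in the summary follows directly from the lag lengths. The sketch below counts residual degrees of freedom for an ARDL(p,q) fit, assuming the series has 31 observations (as the End = 31 index in the summaries suggests); the helper name ardl_resid_df is hypothetical.

```r
# Residual degrees of freedom of an ARDL(p, q) fit on a series of length n,
# with p lags of the predictor and q lags of the response.
ardl_resid_df <- function(n, p, q, intercept = TRUE) {
  n_used <- n - pmax(p, q)                        # rows left after lagging
  n_coef <- (p + 1) + q + as.integer(intercept)   # X_t..X_{t-p}, Y lags, mu
  n_used - n_coef
}

ardl_resid_df(31, p = 2, q = 13)   # the ARDL(2,13) model fitted above

# Residual degrees of freedom over the whole 14 x 14 search grid
grid          <- expand.grid(p = 1:14, q = 1:14)
grid$resid_df <- ardl_resid_df(31, grid$p, grid$q)

# Number of candidates with no residual degrees of freedom left
sum(grid$resid_df <= 0)
```

Candidates with no residual degrees of freedom fit the data exactly, which is presumably why the search code has to filter out AIC and BIC scores of -Inf. Since AIC and BIC keep improving as the fit approaches that limit, capping the search at lag lengths that leave a reasonable number of residual degrees of freedom would be a safer design.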
Without intercept :
## Code gist to find the best ARDL(p,q) model as per AIC and BIC scores.
# First create an empty df. Iterate over 196 ARDL (since max lag for response and predictor of ARDL model is 14, i.e, p = q = 14 at max).
# Save the model's AIC and BIC scores through iteration and display the model with best AIC and BIC scores.
# Also, models with AIC or BIC scores of inf or -inf are removed
df = data.frame(matrix(
vector(), 0, 4, dimnames=list(c(), c("p","q","AIC","BIC"))),
stringsAsFactors=F) # create empty dataframe
for(i in 1:14){
for(j in 1:14){
model4.1 = ardlDlm(formula = RBO ~ -1 + Temperature, data = RBO_dataset, p = i, q = j)
new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
df[nrow(df) + 1, ] <- new
}
} # Iterate and save in df
head(df[order(df[,3]),] %>% filter(AIC != -Inf & BIC != -Inf), 1) # Best model as per AIC
## p q AIC BIC
## 1 13 3 -138.1306 -122.1039
head(df[order(df[,4]),] %>% filter(AIC != -Inf & BIC != -Inf), 1) # Best model as per BIC
## p q AIC BIC
## 1 13 3 -138.1306 -122.1039
ARDL(13,3) is the best model as per both the AIC and BIC scores.
Let's fit this model:
ARDL(13,3):
ARDL.Temperature.NoIntercept.13x3 = ardlDlm(formula = RBO ~ -1 + Temperature, data = RBO_dataset, p = 13, q = 3)
summary(ARDL.Temperature.NoIntercept.13x3)
##
## Time series regression with "ts" data:
## Start = 14, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## 14 15 16 17 18 19 20
## -5.110e-04 4.615e-04 6.965e-05 -2.328e-03 2.081e-03 -2.395e-03 2.878e-03
## 21 22 23 24 25 26 27
## -2.025e-03 9.700e-04 -1.617e-03 -2.614e-03 1.435e-03 2.567e-03 -1.335e-03
## 28 29 30 31
## -2.176e-03 2.525e-03 -4.746e-04 2.406e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Temperature.t -0.035375 0.023477 -1.507 0.3730
## Temperature.1 -0.026328 0.011897 -2.213 0.2702
## Temperature.2 0.049082 0.013655 3.594 0.1727
## Temperature.3 0.025998 0.016135 1.611 0.3536
## Temperature.4 -0.079590 0.015458 -5.149 0.1221
## Temperature.5 -0.014149 0.006759 -2.093 0.2837
## Temperature.6 0.062652 0.007525 8.326 0.0761 .
## Temperature.7 0.023190 0.007344 3.158 0.1953
## Temperature.8 0.064374 0.009153 7.033 0.0899 .
## Temperature.9 -0.010914 0.007826 -1.395 0.3960
## Temperature.10 -0.005591 0.006772 -0.826 0.5606
## Temperature.11 -0.011420 0.008053 -1.418 0.3910
## Temperature.12 0.008843 0.011495 0.769 0.5826
## Temperature.13 0.053546 0.010262 5.218 0.1205
## RBO.1 -0.869492 0.180138 -4.827 0.1301
## RBO.2 -1.169731 0.345904 -3.382 0.1830
## RBO.3 0.229279 0.179811 1.275 0.4234
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.008142 on 1 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 0.9999
## F-statistic: 8129 on 17 and 1 DF, p-value: 0.00872
checkresiduals(ARDL.Temperature.NoIntercept.13x3$model, test = "LB")
##
## Ljung-Box test
##
## data: Residuals
## Q* = 5.751, df = 4, p-value = 0.2185
##
## Model df: 0. Total lags used: 4
MASE(ARDL.Temperature.NoIntercept.13x3)
## MASE
## ARDL.Temperature.NoIntercept.13x3 0.06502803
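The model is significant at 5% significance level (F-test p-value = 0.00872), although no coefficient is individually significant at that level and only 1 residual degree of freedom remains, so the fit should be treated with caution.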
With intercept :
## Code gist to find the best ARDL(p,q) model as per AIC and BIC scores.
# First create an empty df. Iterate over 196 ARDL (since max lag for response and predictor of ARDL model is 14, i.e, p = q = 14 at max).
# Save the model's AIC and BIC scores through iteration and display the model with best AIC and BIC scores.
df = data.frame(matrix(
vector(), 0, 4, dimnames=list(c(), c("p","q","AIC","BIC"))),
stringsAsFactors=F) # create empty dataframe
for(i in 1:14){
for(j in 1:14){
model4.1 = ardlDlm(formula = RBO ~ Rainfall, data = RBO_dataset, p = i, q = j)
new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
df[nrow(df) + 1, ] <- new
}
} # Iterate and save in df
head(df[order(df[,3]),] %>% filter(AIC != -Inf & BIC != -Inf), 1) # Best model as per AIC
## p q AIC BIC
## 1 12 4 -119.141 -101.1967
head(df[order(df[,4]),] %>% filter(AIC != -Inf & BIC != -Inf), 1) # Best model as per BIC
## p q AIC BIC
## 1 12 4 -119.141 -101.1967
ARDL(12,4) is the best model as per both the AIC and BIC scores.
Let's fit this model:
ARDL(12,4):
ARDL.Rainfall.12x4 = ardlDlm(formula = RBO ~ Rainfall, data = RBO_dataset, p = 12, q = 4)
summary(ARDL.Rainfall.12x4)
##
## Time series regression with "ts" data:
## Start = 13, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## 13 14 15 16 17 18 19
## 0.0011337 -0.0002443 -0.0008624 0.0010833 -0.0026434 0.0034920 -0.0033775
## 20 21 22 23 24 25 26
## 0.0028406 -0.0035397 0.0049358 -0.0020010 0.0042188 -0.0022551 0.0046327
## 27 28 29 30 31
## -0.0080942 0.0058096 -0.0059044 0.0037880 -0.0030127
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.760273 0.337034 5.223 0.120
## Rainfall.t 0.002116 0.016777 0.126 0.920
## Rainfall.1 0.033695 0.019234 1.752 0.330
## Rainfall.2 0.001718 0.017245 0.100 0.937
## Rainfall.3 0.013224 0.015823 0.836 0.557
## Rainfall.4 -0.008302 0.018220 -0.456 0.728
## Rainfall.5 0.015187 0.015325 0.991 0.503
## Rainfall.6 -0.009069 0.015134 -0.599 0.656
## Rainfall.7 0.005314 0.016608 0.320 0.803
## Rainfall.8 -0.007361 0.014119 -0.521 0.694
## Rainfall.9 -0.023528 0.018842 -1.249 0.430
## Rainfall.10 -0.023671 0.019226 -1.231 0.434
## Rainfall.11 -0.030104 0.013630 -2.209 0.271
## Rainfall.12 0.002512 0.018530 0.136 0.914
## RBO.1 -0.376929 0.237815 -1.585 0.358
## RBO.2 -0.490094 0.194215 -2.523 0.240
## RBO.3 -0.133310 0.183349 -0.727 0.600
## RBO.4 -0.366547 0.183579 -1.997 0.296
##
## Residual standard error: 0.01687 on 1 degrees of freedom
## Multiple R-squared: 0.9738, Adjusted R-squared: 0.5289
## F-statistic: 2.189 on 17 and 1 DF, p-value: 0.4918
checkresiduals(ARDL.Rainfall.12x4$model, test = "LB")
##
## Ljung-Box test
##
## data: Residuals
## Q* = 56.3, df = 4, p-value = 1.734e-11
##
## Model df: 0. Total lags used: 4
MASE(ARDL.Rainfall.12x4)
## MASE
## ARDL.Rainfall.12x4 0.1265811
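The model is insignificant at 5% significance level (F-test p-value = 0.4918), and the Ljung-Box test (p-value = 1.734e-11) indicates significant serial autocorrelation left in the residuals.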
Without intercept :
## Code gist to find the best ARDL(p,q) model as per AIC and BIC scores.
# First create an empty df. Iterate over 196 ARDL (since max lag for response and predictor of ARDL model is 14, i.e, p = q = 14 at max).
# Save the model's AIC and BIC scores through iteration and display the model with best AIC and BIC scores.
# Also, models with AIC or BIC scores of inf or -inf are removed
df = data.frame(matrix(
vector(), 0, 4, dimnames=list(c(), c("p","q","AIC","BIC"))),
stringsAsFactors=F) # create empty dataframe
for(i in 1:14){
for(j in 1:14){
model4.1 = ardlDlm(formula = RBO ~ -1 + Rainfall, data = RBO_dataset, p = i, q = j)
new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
df[nrow(df) + 1, ] <- new
}
} # Iterate and save in df
head(df[order(df[,3]),] %>% filter(AIC != -Inf & BIC != -Inf), 1) # Best model as per AIC
## p q AIC BIC
## 1 7 11 -145.2684 -125.3537
head(df[order(df[,4]),] %>% filter(AIC != -Inf & BIC != -Inf), 1) # Best model as per BIC
## p q AIC BIC
## 1 7 11 -145.2684 -125.3537
ARDL(7,11) is the best model as per both the AIC and BIC scores.
Let's fit this model:
ARDL(7,11):
ARDL.Rainfall.NoIntercept.7x11 = ardlDlm(formula = RBO ~ -1 + Rainfall, data = RBO_dataset, p = 7, q = 11)
summary(ARDL.Rainfall.NoIntercept.7x11)
##
## Time series regression with "ts" data:
## Start = 12, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## 12 13 14 15 16 17 18
## -0.0024899 -0.0029150 -0.0007196 0.0026196 0.0012108 -0.0008134 -0.0014173
## 19 20 21 22 23 24 25
## 0.0014326 -0.0011076 -0.0015172 0.0007071 -0.0021900 -0.0047420 0.0001211
## 26 27 28 29 30 31
## 0.0017904 0.0041946 0.0002401 0.0006746 -0.0002648 0.0054837
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Rainfall.t 0.148398 0.021730 6.829 0.0926 .
## Rainfall.1 -0.078600 0.010894 -7.215 0.0877 .
## Rainfall.2 0.008794 0.010039 0.876 0.5420
## Rainfall.3 -0.010323 0.010732 -0.962 0.5124
## Rainfall.4 0.010759 0.013679 0.787 0.5757
## Rainfall.5 0.037905 0.017317 2.189 0.2728
## Rainfall.6 0.034012 0.018606 1.828 0.3187
## Rainfall.7 -0.099192 0.013885 -7.144 0.0885 .
## RBO.1 -0.642074 0.150792 -4.258 0.1469
## RBO.2 1.131750 0.126398 8.954 0.0708 .
## RBO.3 0.994445 0.127653 7.790 0.0813 .
## RBO.4 -0.153450 0.114991 -1.334 0.4094
## RBO.5 -1.616582 0.236708 -6.829 0.0926 .
## RBO.6 -0.785741 0.234559 -3.350 0.1847
## RBO.7 2.198336 0.225746 9.738 0.0651 .
## RBO.8 1.307629 0.244736 5.343 0.1178
## RBO.9 -2.210155 0.274147 -8.062 0.0786 .
## RBO.10 -1.068641 0.170467 -6.269 0.1007
## RBO.11 1.672555 0.201048 8.319 0.0762 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01054 on 1 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 0.9998
## F-statistic: 4818 on 19 and 1 DF, p-value: 0.01134
checkresiduals(ARDL.Rainfall.NoIntercept.7x11$model, test = "LB")
##
## Ljung-Box test
##
## data: Residuals
## Q* = 3.053, df = 4, p-value = 0.549
##
## Model df: 0. Total lags used: 4
MASE(ARDL.Rainfall.NoIntercept.7x11)
## MASE
## ARDL.Rainfall.NoIntercept.7x11 0.06169275
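The model is significant at 5% significance level (F-test p-value = 0.01134), although only 1 residual degree of freedom remains and no coefficient is individually significant at that level.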
With intercept :
## Code gist to find the best ARDL(p,q) model as per AIC and BIC scores.
# First create an empty df. Iterate over 196 ARDL (since max lag for response and predictor of ARDL model is 14, i.e, p = q = 14 at max).
# Save the model's AIC and BIC scores through iteration and display the model with best AIC and BIC scores.
df = data.frame(matrix(
vector(), 0, 4, dimnames=list(c(), c("p","q","AIC","BIC"))),
stringsAsFactors=F) # create empty dataframe
for(i in 1:14){
for(j in 1:14){
model4.1 = ardlDlm(formula = RBO ~ Radiation, data = RBO_dataset, p = i, q = j)
new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
df[nrow(df) + 1, ] <- new
}
} # Iterate and save in df
head(df[order(df[,3]),] %>% filter(AIC != -Inf & BIC != -Inf), 1) # Best model as per AIC
## p q AIC BIC
## 1 12 4 -231.3149 -213.3706
head(df[order(df[,4]),] %>% filter(AIC != -Inf & BIC != -Inf), 1) # Best model as per BIC
## p q AIC BIC
## 1 12 4 -231.3149 -213.3706
ARDL(12,4) is the best model as per both the AIC and BIC scores.
Let's fit this model:
ARDL(12,4):
ARDL.Radiation.12x4 = ardlDlm(formula = RBO ~ Radiation, data = RBO_dataset, p = 12, q = 4)
summary(ARDL.Radiation.12x4)
##
## Time series regression with "ts" data:
## Start = 13, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## 13 14 15 16 17 18 19
## -1.652e-04 2.200e-04 -3.309e-04 6.602e-05 -4.037e-06 2.264e-04 -4.739e-05
## 20 21 22 23 24 25 26
## 6.652e-06 7.452e-05 -1.547e-04 -1.306e-04 3.505e-04 -1.289e-04 -5.799e-05
## 27 28 29 30 31
## -1.374e-04 3.952e-05 2.134e-04 -3.925e-04 3.528e-04
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.8745074 0.0369600 50.717 0.01255 *
## Radiation.t 0.0271007 0.0019713 13.748 0.04623 *
## Radiation.1 -0.1327683 0.0027999 -47.419 0.01342 *
## Radiation.2 0.0483279 0.0016258 29.725 0.02141 *
## Radiation.3 0.0912607 0.0021988 41.504 0.01534 *
## Radiation.4 -0.0449738 0.0009985 -45.041 0.01413 *
## Radiation.5 0.0251085 0.0012836 19.561 0.03252 *
## Radiation.6 -0.0603475 0.0016012 -37.688 0.01689 *
## Radiation.7 0.0259175 0.0009088 28.519 0.02231 *
## Radiation.8 0.0511526 0.0009442 54.178 0.01175 *
## Radiation.9 -0.0268781 0.0014025 -19.164 0.03319 *
## Radiation.10 0.1004034 0.0017366 57.815 0.01101 *
## Radiation.11 -0.0293446 0.0012726 -23.058 0.02759 *
## Radiation.12 -0.0514641 0.0016960 -30.344 0.02097 *
## RBO.1 -0.8229929 0.0129258 -63.670 0.01000 **
## RBO.2 -0.4822605 0.0112129 -43.009 0.01480 *
## RBO.3 0.0213830 0.0105782 2.021 0.29246
## RBO.4 -0.8245905 0.0105139 -78.429 0.00812 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0008814 on 1 degrees of freedom
## Multiple R-squared: 0.9999, Adjusted R-squared: 0.9987
## F-statistic: 823.6 on 17 and 1 DF, p-value: 0.02739
checkresiduals(ARDL.Radiation.12x4$model, test = "LB")
##
## Ljung-Box test
##
## data: Residuals
## Q* = 6.9142, df = 4, p-value = 0.1405
##
## Model df: 0. Total lags used: 4
MASE(ARDL.Radiation.12x4)
## MASE
## ARDL.Radiation.12x4 0.006142636
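The model is significant at 5% significance level (F-test p-value = 0.02739), with all coefficients except RBO.3 individually significant, although only 1 residual degree of freedom remains.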
Without intercept :
## Code gist to find the best ARDL(p,q) model as per AIC and BIC scores.
# First create an empty df. Iterate over 196 ARDL (since max lag for response and predictor of ARDL model is 14, i.e, p = q = 14 at max).
# Save the model's AIC and BIC scores through iteration and display the model with best AIC and BIC scores.
# Also, models with AIC or BIC scores of inf or -inf are removed
df = data.frame(matrix(
vector(), 0, 4, dimnames=list(c(), c("p","q","AIC","BIC"))),
stringsAsFactors=F) # create empty dataframe
for(i in 1:14){
for(j in 1:14){
model4.1 = ardlDlm(formula = RBO ~ -1 + Radiation, data = RBO_dataset, p = i, q = j)
new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
df[nrow(df) + 1, ] <- new
}
} # Iterate and save in df
head(df[order(df[,3]),] %>% filter(AIC != -Inf & BIC != -Inf), 1) # Best model as per AIC
## p q AIC BIC
## 1 10 9 -248.9684 -227.0334
head(df[order(df[,4]),] %>% filter(AIC != -Inf & BIC != -Inf), 1) # Best model as per BIC
## p q AIC BIC
## 1 10 9 -248.9684 -227.0334
ARDL(10,9) is the best model as per both the AIC and BIC scores.
Let's fit this model:
ARDL(10,9):
ARDL.Radiation.NoIntercept.10x9 = ardlDlm(formula = RBO ~ -1 + Radiation, data = RBO_dataset, p = 10, q = 9)
summary(ARDL.Radiation.NoIntercept.10x9)
##
## Time series regression with "ts" data:
## Start = 11, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## 11 12 13 14 15 16 17
## 3.022e-04 -1.510e-04 -3.900e-04 2.135e-04 -1.426e-05 -4.160e-05 3.021e-04
## 18 19 20 21 22 23 24
## -2.217e-04 1.056e-04 -4.959e-04 2.594e-04 2.859e-04 -3.515e-04 -1.580e-04
## 25 26 27 28 29 30 31
## 9.482e-05 1.629e-04 6.342e-05 -9.028e-05 -3.198e-05 -1.598e-04 3.135e-04
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Radiation.t -0.838255 0.014335 -58.48 0.01089 *
## Radiation.1 1.134823 0.018196 62.37 0.01021 *
## Radiation.2 -0.211696 0.003179 -66.59 0.00956 **
## Radiation.3 -0.049423 0.001208 -40.93 0.01555 *
## Radiation.4 -0.062767 0.001524 -41.18 0.01546 *
## Radiation.5 -0.395602 0.007091 -55.79 0.01141 *
## Radiation.6 0.268634 0.003397 79.09 0.00805 **
## Radiation.7 0.322715 0.005626 57.37 0.01110 *
## Radiation.8 0.092381 0.002269 40.71 0.01564 *
## Radiation.9 0.479705 0.008787 54.59 0.01166 *
## Radiation.10 -0.603782 0.009693 -62.29 0.01022 *
## RBO.1 0.534926 0.022160 24.14 0.02636 *
## RBO.2 -2.674589 0.046700 -57.27 0.01111 *
## RBO.3 -2.662534 0.054448 -48.90 0.01302 *
## RBO.4 4.640431 0.084013 55.23 0.01152 *
## RBO.5 -2.619076 0.039175 -66.86 0.00952 **
## RBO.6 -3.901487 0.077186 -50.55 0.01259 *
## RBO.7 -1.194310 0.046938 -25.44 0.02501 *
## RBO.8 8.037549 0.135232 59.44 0.01071 *
## RBO.9 -2.000497 0.026872 -74.45 0.00855 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001087 on 1 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 4.54e+05 on 20 and 1 DF, p-value: 0.001169
checkresiduals(ARDL.Radiation.NoIntercept.10x9$model, test = "LB")
##
## Ljung-Box test
##
## data: Residuals
## Q* = 4.2211, df = 4, p-value = 0.3769
##
## Model df: 0. Total lags used: 4
MASE(ARDL.Radiation.NoIntercept.10x9)
## MASE
## ARDL.Radiation.NoIntercept.10x9 0.007057272
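The model is significant at 5% significance level (F-test p-value = 0.001169), with every coefficient individually significant, although only 1 residual degree of freedom remains.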
With intercept :
## Code gist to find the best ARDL(p,q) model as per AIC and BIC scores.
# First create an empty df. Iterate over 196 ARDL (since max lag for response and predictor of ARDL model is 14, i.e, p = q = 14 at max).
# Save the model's AIC and BIC scores through iteration and display the model with best AIC and BIC scores.
df = data.frame(matrix(
vector(), 0, 4, dimnames=list(c(), c("p","q","AIC","BIC"))),
stringsAsFactors=F) # create empty dataframe
for(i in 1:14){
for(j in 1:14){
model4.1 = ardlDlm(formula = RBO ~ RelHumidity, data = RBO_dataset, p = i, q = j)
new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
df[nrow(df) + 1, ] <- new
}
} # Iterate and save in df
head(df[order(df[,3]),] %>% filter(AIC != -Inf & BIC != -Inf), 1) # Best model as per AIC
## p q AIC BIC
## 1 8 9 -127.5342 -105.7134
head(df[order(df[,4]),] %>% filter(AIC != -Inf & BIC != -Inf), 1) # Best model as per BIC
## p q AIC BIC
## 1 8 9 -127.5342 -105.7134
ARDL(8,9) is the best model as per both the AIC and BIC scores.
Let's fit this model:
ARDL(8,9):
ARDL.RelHumidity.8x9 = ardlDlm(formula = RBO ~ RelHumidity, data = RBO_dataset, p = 8, q = 9)
summary(ARDL.RelHumidity.8x9)
##
## Time series regression with "ts" data:
## Start = 10, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## 10 11 12 13 14 15 16
## 0.0027811 -0.0006716 0.0047485 -0.0059467 -0.0055389 0.0080587 -0.0051992
## 17 18 19 20 21 22 23
## 0.0077077 -0.0087389 -0.0009153 0.0091855 -0.0040882 0.0020028 -0.0065956
## 24 25 26 27 28 29 30
## 0.0083209 -0.0046487 0.0008040 -0.0027462 0.0024608 0.0048495 -0.0001455
## 31
## -0.0056849
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -19.305065 5.095171 -3.789 0.0322 *
## RelHumidity.t 0.042887 0.010073 4.258 0.0238 *
## RelHumidity.1 0.004376 0.008366 0.523 0.6371
## RelHumidity.2 0.020412 0.007946 2.569 0.0826 .
## RelHumidity.3 0.021539 0.013008 1.656 0.1963
## RelHumidity.4 0.002933 0.007477 0.392 0.7211
## RelHumidity.5 0.042051 0.014006 3.002 0.0576 .
## RelHumidity.6 0.031692 0.011039 2.871 0.0640 .
## RelHumidity.7 0.001288 0.008325 0.155 0.8869
## RelHumidity.8 0.037337 0.010751 3.473 0.0403 *
## RBO.1 -0.357498 0.203829 -1.754 0.1777
## RBO.2 0.390037 0.197816 1.972 0.1432
## RBO.3 0.345983 0.191987 1.802 0.1693
## RBO.4 0.169381 0.214533 0.790 0.4874
## RBO.5 0.074593 0.200957 0.371 0.7352
## RBO.6 -0.524327 0.204635 -2.562 0.0831 .
## RBO.7 0.617870 0.187686 3.292 0.0460 *
## RBO.8 0.591129 0.195981 3.016 0.0569 .
## RBO.9 -0.377267 0.125934 -2.996 0.0579 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01455 on 3 degrees of freedom
## Multiple R-squared: 0.9631, Adjusted R-squared: 0.7415
## F-statistic: 4.346 on 18 and 3 DF, p-value: 0.1258
checkresiduals(ARDL.RelHumidity.8x9$model, test = "LB")
##
## Ljung-Box test
##
## data: Residuals
## Q* = 8.5646, df = 4, p-value = 0.07295
##
## Model df: 0. Total lags used: 4
MASE(ARDL.RelHumidity.8x9)
## MASE
## ARDL.RelHumidity.8x9 0.1629283
Model is insignificant at 5% significance level.
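The very small residual degrees of freedom in these fits follow directly from the sample size. Below is a minimal sketch of the bookkeeping, assuming the 31 annual observations seen in the outputs and that an ARDL(p, q) fit estimates p + 1 predictor coefficients, q autoregressive coefficients and an optional intercept. The helper `ardl_df` is hypothetical and is not part of the dLagM package.

```r
# Hypothetical helper: usable sample size and residual degrees of freedom
# for an ARDL(p, q) regression fitted by least squares.
ardl_df <- function(n, p, q, intercept = TRUE) {
  n_used   <- n - max(p, q)                        # first max(p, q) rows are lost to lagging
  n_params <- (p + 1) + q + as.integer(intercept)  # predictor lags 0..p, response lags 1..q
  c(n_used = n_used, n_params = n_params, resid_df = n_used - n_params)
}

ardl_df(n = 31, p = 8, q = 9, intercept = TRUE)     # ARDL(8,9) with intercept
ardl_df(n = 31, p = 14, q = 1, intercept = FALSE)   # ARDL(14,1) without intercept
```

For ARDL(8,9) this gives 19 coefficients estimated from 22 usable observations, i.e. 3 residual degrees of freedom, which agrees with the summary output. With so few degrees of freedom, the p-values and R-squared values of these fits should be read with caution.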
Without intercept :
## Code gist to find the best ARDL(p,q) model as per AIC and BIC scores.
# First create an empty df. Iterate over 196 ARDL (since max lag for response and predictor of ARDL model is 14, i.e, p = q = 14 at max).
# Save the model's AIC and BIC scores through iteration and display the model with best AIC and BIC scores.
# Also, models with AIC or BIC scores of inf or -inf are removed
df = data.frame(matrix(
vector(), 0, 4, dimnames=list(c(), c("p","q","AIC","BIC"))),
stringsAsFactors=F) # create empty dataframe
for(i in 1:14){
for(j in 1:14){
model4.1 = ardlDlm(formula = RBO ~ -1 + RelHumidity, data = RBO_dataset, p = i, q = j)
new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
df[nrow(df) + 1, ] <- new
}
} # Iterate and save in df
head(df[order( df[,3] ),] %>% filter(AIC != -Inf & BIC != -Inf),1) # Best model as per AIC
## p q AIC BIC
## 1 14 1 -129.0694 -114.9047
head(df[order( df[,4] ),] %>% filter(AIC != -Inf & BIC != -Inf),1) # Best model as per BIC
## p q AIC BIC
## 1 14 1 -129.0694 -114.9047
ARDL(14,1) is the best model as per both the AIC and BIC scores.
Let's fit this model,
ARDL(14,1):
ARDL.RelHumidity.NoIntercept.14x1 = ardlDlm(formula = RBO ~ -1 + RelHumidity, data = RBO_dataset, p = 14, q = 1)
summary(ARDL.RelHumidity.NoIntercept.14x1)
##
## Time series regression with "ts" data:
## Start = 15, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## 15 16 17 18 19 20 21
## 1.527e-03 1.128e-03 1.390e-03 7.877e-04 -6.114e-04 -3.444e-03 -1.053e-03
## 22 23 24 25 26 27 28
## 5.786e-04 1.717e-04 1.398e-04 2.189e-04 -2.425e-03 -1.376e-03 4.758e-04
## 29 30 31
## -3.027e-03 -2.618e-05 5.554e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## RelHumidity.t 0.001946 0.006161 0.316 0.805
## RelHumidity.1 -0.026871 0.006460 -4.160 0.150
## RelHumidity.2 0.001078 0.007152 0.151 0.905
## RelHumidity.3 -0.021950 0.005260 -4.173 0.150
## RelHumidity.4 0.001545 0.003755 0.411 0.752
## RelHumidity.5 0.016063 0.003871 4.149 0.151
## RelHumidity.6 -0.016269 0.005186 -3.137 0.196
## RelHumidity.7 0.018368 0.005052 3.636 0.171
## RelHumidity.8 0.001517 0.006066 0.250 0.844
## RelHumidity.9 -0.010047 0.003659 -2.746 0.222
## RelHumidity.10 0.001367 0.003752 0.364 0.777
## RelHumidity.11 -0.005870 0.004394 -1.336 0.409
## RelHumidity.12 0.030070 0.005146 5.843 0.108
## RelHumidity.13 0.009266 0.006981 1.327 0.411
## RelHumidity.14 0.006070 0.004594 1.321 0.412
## RBO.1 0.187642 0.222523 0.843 0.554
##
## Residual standard error: 0.008242 on 1 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 0.9999
## F-statistic: 7985 on 16 and 1 DF, p-value: 0.00879
checkresiduals(ARDL.RelHumidity.NoIntercept.14x1$model, test = "LB")
##
## Ljung-Box test
##
## data: Residuals
## Q* = 1.7383, df = 3, p-value = 0.6284
##
## Model df: 0. Total lags used: 3
MASE(ARDL.RelHumidity.NoIntercept.14x1)
## MASE
## ARDL.RelHumidity.NoIntercept.14x1 0.05143758
Model is significant at 5% significance level.
The ARDL models without intercept for the Temperature, Rainfall, Radiation and Relative Humidity regressors, and the ARDL model with intercept for Radiation, are significant. Eliminating all the insignificant models, the significant ARDL models are compared below based on adjusted R-squared, AIC, BIC and MASE.
Model <- c("ARDL.Temperature.NoIntercept.13x3", "ARDL.Rainfall.NoIntercept.7x11", "ARDL.Radiation.12x4", "ARDL.Radiation.NoIntercept.10x9", "ARDL.RelHumidity.NoIntercept.14x1")
AIC <- c(AIC(ARDL.Temperature.NoIntercept.13x3), AIC(ARDL.Rainfall.NoIntercept.7x11), AIC(ARDL.Radiation.12x4), AIC(ARDL.Radiation.NoIntercept.10x9), AIC(ARDL.RelHumidity.NoIntercept.14x1))
BIC <- c( BIC(ARDL.Temperature.NoIntercept.13x3), BIC(ARDL.Rainfall.NoIntercept.7x11), BIC(ARDL.Radiation.12x4), BIC(ARDL.Radiation.NoIntercept.10x9), BIC(ARDL.RelHumidity.NoIntercept.14x1))
Adjusted_Rsquared <- c(0.9999, 0.9998, 0.9987, 1, 0.9999)
MASE <- MASE(ARDL.Temperature.NoIntercept.13x3, ARDL.Rainfall.NoIntercept.7x11, ARDL.Radiation.12x4, ARDL.Radiation.NoIntercept.10x9, ARDL.RelHumidity.NoIntercept.14x1)
data.frame(AIC, BIC, Adjusted_Rsquared, MASE) %>% arrange(MASE)
## AIC BIC Adjusted_Rsquared n
## ARDL.Radiation.12x4 -231.3149 -213.3706 0.9987 19
## ARDL.Radiation.NoIntercept.10x9 -248.9684 -227.0334 1.0000 21
## ARDL.RelHumidity.NoIntercept.14x1 -129.0694 -114.9047 0.9999 17
## ARDL.Rainfall.NoIntercept.7x11 -145.2684 -125.3537 0.9998 20
## ARDL.Temperature.NoIntercept.13x3 -138.1306 -122.1039 0.9999 18
## MASE
## ARDL.Radiation.12x4 0.006142636
## ARDL.Radiation.NoIntercept.10x9 0.007057272
## ARDL.RelHumidity.NoIntercept.14x1 0.051437583
## ARDL.Rainfall.NoIntercept.7x11 0.061692749
## ARDL.Temperature.NoIntercept.13x3 0.065028027
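Thus, as per MASE (the measure of forecast accuracy), the ARDL(12,4) model for RBO with Radiation as the regressor (ARDL.Radiation.12x4) is the best. Note that ARDL.Radiation.NoIntercept.10x9 has lower AIC and BIC and a higher adjusted R-squared, but it is fitted with only 1 residual degree of freedom and has a slightly larger MASE.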
checkresiduals(ARDL.Radiation.12x4$model$residuals)
##
## Ljung-Box test
##
## data: Residuals
## Q* = 6.9142, df = 4, p-value = 0.1405
##
## Model df: 0. Total lags used: 4
Serial autocorrelation left in the residuals is insignificant as per the Ljung-Box test and the ACF plot. The time series plot of the residuals shows a random pattern and the histogram suggests normality of the residual distribution. Thus, there is no violation of the general assumptions.
The 4 DLM models are,
The mean absolute scaled errors (MASE) of these models are,
MASE(DLM.RelHumidity.noIntercept, Koyck.RelHumidity.NoIntercept, ARDL.Radiation.12x4) %>% arrange(MASE)
## n MASE
## ARDL.Radiation.12x4 19 0.006142636
## DLM.RelHumidity.noIntercept 17 0.072315768
## Koyck.RelHumidity.NoIntercept 30 0.870260066
The best DLM model for the RBO response, giving the most accurate forecasts based on the MASE measure, is the Autoregressive DLM model with Radiation as regressor, ARDL.Radiation.12x4, with a MASE of 0.006142636.
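To make the MASE comparison concrete, here is a minimal sketch of one common definition: the in-sample mean absolute error of the model divided by the in-sample mean absolute error of the one-step naive forecast. The helper `mase_manual` and the toy numbers are hypothetical, and the MASE() function of dLagM may differ in detail (for example in how the scaling term is normalised).

```r
# Hypothetical helper implementing one common definition of MASE.
mase_manual <- function(observed, fitted) {
  model_mae <- mean(abs(observed - fitted))   # in-sample MAE of the model
  naive_mae <- mean(abs(diff(observed)))      # MAE of the naive forecast (previous value)
  model_mae / naive_mae
}

# Toy example, not taken from the RBO data.
obs <- c(10, 12, 11, 13, 12)
fit <- c(10.5, 11.5, 11.5, 12.5, 12.5)
mase_manual(obs, fit)
```

A value below 1 means the model's in-sample errors are smaller, on average, than those of the naive forecast.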
Dynamic linear models are a general class of time series regression models which can account for trends, seasonality, serial correlation between the response and regressor variables, and, most importantly, the effect of intervention points.
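The response of a general Dynamic Linear model is,
\(Y_t = \omega_2Y_{t-1} + (\omega_0 + \omega_1)P_t - \omega_2\omega_0P_{t-1} + N_t\)
where \(Y_t\) is the response (RBO), \(P_t\) is the pulse indicator that equals 1 at the intervention time and 0 otherwise, \(N_t\) is the unperturbed (no-intervention) noise process, and \(\omega_0\), \(\omega_1\) and \(\omega_2\) are the coefficients to be estimated.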
Let's revisit the time series plot for the response, RBO, to visualize possible intervention points,
plot(RBO, ylab='RBO', xlab='Year')
As mentioned at the descriptive analysis stage, year 1996 might be an intervention point because the mean level of the RBO series falls notably from this point onwards. Assuming this intervention point, let's fit a Dynamic Linear model and see whether the pulse function at year 1996 is significant or not.
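As a minimal sketch of what a pulse function looks like, the indicator can be built directly from the time index of an annual series. The series `y` below is a simulated stand-in for RBO (31 annual values starting in 1984, as in the outputs), not the real data.

```r
# Simulated stand-in for the annual RBO series (1984 to 2014, 31 values).
set.seed(123)
y <- ts(rnorm(31, mean = 0.75, sd = 0.03), start = 1984, frequency = 1)

# Pulse indicator: 1 in the intervention year, 0 elsewhere.
pulse <- as.numeric(time(y) == 1996)

which(pulse == 1)   # position of 1996 in the series
sum(pulse)          # a pulse is non-zero at exactly one time point
```

Year 1996 is the 13th observation of a series that starts in 1984, which is the position used for the pulse in the model code.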
As usual, we first look at the ACF and PACF plots of the RBO series.
acf(RBO, main="ACF of RBO")
pacf(RBO, main ="PACF of RBO")
In the ACF plot we see a slowly decaying pattern indicating a trend in the RBO series. In the PACF plot we see 1 high vertical spike, also indicating a trend. No significant seasonal behavior is observed. Thus, let's fit a Dynamic Linear model with a trend component and no seasonal component. For thoroughness, let's test several combinations of the trend, multiple lags of RBO, and, most importantly, the pulse at 1996.
Now, let's fit the Dynamic Linear model using dynlm() as shown below (note, the potential intervention point was identified at year 1996). Let's fit models with and without the intercept and compare,
With intercept :
Y.t = RBO
T = c(13) # The time point when the intervention occurred
P.t = 1*(seq(RBO) == T)
P.t.1 = Lag(P.t,+1) #library(tis)
Dyn.model = dynlm(Y.t ~ L(Y.t , k = 1) + P.t + trend(Y.t)) # library(dynlm)
Dyn.model1 = dynlm(Y.t ~ L(Y.t , k = 1) + L(Y.t , k = 2) + P.t + trend(Y.t)) # library(dynlm)
Dyn.model2 = dynlm(Y.t ~ L(Y.t , k = 1) + L(Y.t , k = 2) + L(Y.t , k = 3) + P.t + trend(Y.t)) # library(dynlm)
Dyn.model3 = dynlm(Y.t ~ L(Y.t , k = 1) + L(Y.t , k = 2) + L(Y.t , k = 3) + P.t + P.t.1 + trend(Y.t)) # library(dynlm)
Model <- c("Dyn.model", "Dyn.model1", "Dyn.model2", "Dyn.model3")
AIC <- c(AIC(Dyn.model), AIC(Dyn.model1), AIC(Dyn.model2), AIC(Dyn.model3))
BIC <- c( BIC(Dyn.model), BIC(Dyn.model1), BIC(Dyn.model2), BIC(Dyn.model3))
data.frame(Model, AIC, BIC) %>% arrange(BIC)
## Model AIC BIC
## 1 Dyn.model -114.7932 -107.7873
## 2 Dyn.model2 -114.9847 -105.6593
## 3 Dyn.model3 -114.9847 -105.6593
## 4 Dyn.model1 -112.2284 -104.0246
summary(Dyn.model)
##
## Time series regression with "ts" data:
## Start = 1985, End = 2014
##
## Call:
## dynlm(formula = Y.t ~ L(Y.t, k = 1) + P.t + trend(Y.t))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.067777 -0.019047 0.000599 0.013851 0.074160
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5027754 0.1273859 3.947 0.000537 ***
## L(Y.t, k = 1) 0.3665457 0.1612803 2.273 0.031545 *
## P.t -0.0872460 0.0331257 -2.634 0.014032 *
## trend(Y.t) -0.0020229 0.0008264 -2.448 0.021442 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03248 on 26 degrees of freedom
## Multiple R-squared: 0.5382, Adjusted R-squared: 0.485
## F-statistic: 10.1 on 3 and 26 DF, p-value: 0.0001372
As per BIC, the best Dynamic Linear model with intercept for RBO is \(Dyn.model\), having as regressors an instantaneous effect at year 1996, the 1 year lagged RBO response, and a trend component of RBO.
From the summary statistics, the \(Dyn.model\) is significant at 5% significance level. All the 3 regressors are significant. Most importantly, the pulse at 1996 year is significant at 5% significance level.
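For reference, the AIC and BIC values used in this comparison can be reproduced from the log-likelihood of a fitted linear model. The sketch below uses R's built-in `cars` data purely for illustration, since the RBO data are not reproduced here; it assumes the usual definitions AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n).

```r
# Built-in data set used purely for illustration.
fit <- lm(dist ~ speed, data = cars)

ll <- logLik(fit)
k  <- attr(ll, "df")   # number of estimated parameters, including the error variance
n  <- nobs(fit)        # number of observations used in the fit

aic_manual <- -2 * as.numeric(ll) + 2 * k
bic_manual <- -2 * as.numeric(ll) + log(n) * k

all.equal(aic_manual, AIC(fit))
all.equal(bic_manual, BIC(fit))
```

Because BIC multiplies the parameter count by log(n) instead of 2, it penalises extra lags more heavily than AIC whenever n is larger than about 7, which is why the two criteria can rank the candidate models differently.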
Without intercept :
Y.t = RBO
T = c(13) # The time point when the intervention occurred
P.t = 1*(seq(RBO) == T)
P.t.1 = Lag(P.t,+1) #library(tis)
Dyn.model.NoIntercept = dynlm(Y.t ~ 0 + L(Y.t , k = 1) + P.t + trend(Y.t)) # library(dynlm)
Dyn.model1.NoIntercept = dynlm(Y.t ~ 0 + L(Y.t , k = 1) + L(Y.t , k = 2) + P.t + trend(Y.t)) # library(dynlm)
Dyn.model2.NoIntercept = dynlm(Y.t ~ 0 + L(Y.t , k = 1) + L(Y.t , k = 2) + L(Y.t , k = 3) + P.t + trend(Y.t)) # library(dynlm)
Dyn.model3.NoIntercept = dynlm(Y.t ~ 0 + L(Y.t , k = 1) + L(Y.t , k = 2) + L(Y.t , k = 3) + P.t + P.t.1 + trend(Y.t)) # library(dynlm)
Model <- c("Dyn.model.NoIntercept", "Dyn.model1.NoIntercept", "Dyn.model2.NoIntercept", "Dyn.model3.NoIntercept")
AIC <- c(AIC(Dyn.model.NoIntercept), AIC(Dyn.model1.NoIntercept), AIC(Dyn.model2.NoIntercept), AIC(Dyn.model3.NoIntercept))
BIC <- c( BIC(Dyn.model.NoIntercept), BIC(Dyn.model1.NoIntercept), BIC(Dyn.model2.NoIntercept), BIC(Dyn.model3.NoIntercept))
data.frame(Model, AIC, BIC) %>% arrange(BIC)
## Model AIC BIC
## 1 Dyn.model2.NoIntercept -114.1008 -106.10753
## 2 Dyn.model3.NoIntercept -114.1008 -106.10753
## 3 Dyn.model1.NoIntercept -107.1523 -100.31579
## 4 Dyn.model.NoIntercept -102.7092 -97.10439
summary(Dyn.model2.NoIntercept)
##
## Time series regression with "ts" data:
## Start = 1987, End = 2014
##
## Call:
## dynlm(formula = Y.t ~ 0 + L(Y.t, k = 1) + L(Y.t, k = 2) + L(Y.t,
## k = 3) + P.t + trend(Y.t))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.050359 -0.020437 0.009498 0.017263 0.049244
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## L(Y.t, k = 1) 0.3690799 0.1470329 2.510 0.01955 *
## L(Y.t, k = 2) 0.3642948 0.1430617 2.546 0.01804 *
## L(Y.t, k = 3) 0.2538073 0.1480625 1.714 0.09994 .
## P.t -0.0869779 0.0290544 -2.994 0.00649 **
## trend(Y.t) 0.0004006 0.0006111 0.656 0.51861
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02809 on 23 degrees of freedom
## Multiple R-squared: 0.9988, Adjusted R-squared: 0.9985
## F-statistic: 3826 on 5 and 23 DF, p-value: < 2.2e-16
As per BIC, the best Dynamic Linear model without intercept for RBO is \(Dyn.model2.NoIntercept\), having as regressors an instantaneous effect at year 1996, 3 lags of the RBO response, and a trend component of RBO.
From the summary statistics, \(Dyn.model2.NoIntercept\) is significant at the 5% significance level. Three of the five regressors (the first two lags of RBO and the pulse) are significant. Most importantly, the pulse at year 1996 is significant at the 5% significance level.
The best Dynamic Linear models with and without intercept were Dyn.model and Dyn.model2.NoIntercept respectively. These two models are compared below based on AIC, BIC and adjusted R-squared.
Model <- c("Dyn.model", "Dyn.model2.NoIntercept")
AIC <- c(AIC(Dyn.model), AIC(Dyn.model2.NoIntercept))
BIC <- c( BIC(Dyn.model), BIC(Dyn.model2.NoIntercept))
Adjusted_Rsquared <- c(0.485, 0.9985)
data.frame(Model,AIC, BIC, Adjusted_Rsquared) %>% arrange(AIC)
## Model AIC BIC Adjusted_Rsquared
## 1 Dyn.model -114.7932 -107.7873 0.4850
## 2 Dyn.model2.NoIntercept -114.1008 -106.1075 0.9985
Thus, as per AIC and BIC, the Dynamic Linear model for RBO with intercept (Dyn.model) is the best.
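The adjusted R-squared values of the two models are not directly comparable. For a model without an intercept, R's summary of a linear model measures R-squared against zero rather than against the mean of the response, which tends to inflate it. The sketch below illustrates this on simulated data (not the RBO data), assuming that dynlm fits are summarised in the same way as ordinary lm fits.

```r
set.seed(42)
n <- 30
x <- runif(n, min = 1, max = 2)
y <- 10 + rnorm(n)            # y does not depend on x at all

fit_with    <- lm(y ~ x)      # R-squared measured around mean(y)
fit_without <- lm(y ~ 0 + x)  # R-squared measured around zero

summary(fit_with)$adj.r.squared
summary(fit_without)$adj.r.squared
```

Although x carries no information about y, the no-intercept fit reports a far higher adjusted R-squared. This is one reason to rely on AIC and BIC rather than adjusted R-squared when choosing between the with-intercept and no-intercept models.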
Dyn.model is the best Dynamic Linear model as per AIC and BIC, with 1 lagged component of the response (RBO), a significant pulse component at year 1996, and a trend component of the RBO series. Let's look at the summary statistics and check the residuals,
summary(Dyn.model)
##
## Time series regression with "ts" data:
## Start = 1985, End = 2014
##
## Call:
## dynlm(formula = Y.t ~ L(Y.t, k = 1) + P.t + trend(Y.t))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.067777 -0.019047 0.000599 0.013851 0.074160
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5027754 0.1273859 3.947 0.000537 ***
## L(Y.t, k = 1) 0.3665457 0.1612803 2.273 0.031545 *
## P.t -0.0872460 0.0331257 -2.634 0.014032 *
## trend(Y.t) -0.0020229 0.0008264 -2.448 0.021442 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03248 on 26 degrees of freedom
## Multiple R-squared: 0.5382, Adjusted R-squared: 0.485
## F-statistic: 10.1 on 3 and 26 DF, p-value: 0.0001372
checkresiduals(Dyn.model)
##
## Breusch-Godfrey test for serial correlation of order up to 7
##
## data: Residuals
## LM test = 6.8908, df = 7, p-value = 0.4403
Summary of the Dynamic Linear model, Dyn.model:
The dynamic linear model, Dyn.model, is significant, and the pulse (P.t) component at year 1996 is significant.
Based on the 4 Time series regression methods considered, the best model as per the MASE measure for each method is summarized below,
A. The best Distributed Lag model is the Autoregressive DLM model with Radiation as regressor, ARDL.Radiation.12x4, with a MASE of 0.006142636, AIC of -231.3149, BIC of -213.3706 and adjusted R-squared of 99.87%.
B. The best Dynamic Linear model is Dyn.model, with 1 lagged component of the response (RBO), a significant pulse component at year 1996, and a trend component, with AIC of -114.7932, BIC of -107.7873 and adjusted R-squared of 48.5%.
Clearly, the best model is ARDL.Radiation.12x4 as per AIC, BIC and Adjusted R-squared measures.
Best Time Series regression model is - Autoregressive DLM model having Radiation as regressor (ARDL.Radiation.12x4)
Residual analysis to test model assumptions.
Let's perform a detailed residual analysis to check whether any model assumptions have been violated.
The estimator error (or residual) is defined by:
\(\hat{\epsilon}_i = Y_i - \hat{Y}_i\) (i.e. observed value less fitted value)
The following problems are to be checked,
Let's first apply a diagnostic check using the checkresiduals() function,
checkresiduals(ARDL.Radiation.12x4)
## Time Series:
## Start = 13
## End = 31
## Frequency = 1
## 13 14 15 16 17
## -1.652337e-04 2.199540e-04 -3.308864e-04 6.602329e-05 -4.037381e-06
## 18 19 20 21 22
## 2.263848e-04 -4.739273e-05 6.651827e-06 7.451776e-05 -1.546653e-04
## 23 24 25 26 27
## -1.306467e-04 3.504837e-04 -1.288934e-04 -5.798920e-05 -1.374479e-04
## 28 29 30 31
## 3.952405e-05 2.133918e-04 -3.925050e-04 3.527663e-04
##
## Ljung-Box test
##
## data: Residuals
## Q* = 6.9142, df = 4, p-value = 0.1405
##
## Model df: 0. Total lags used: 4
From the residuals plot, the residuals are randomly scattered around the mean with no systematic pattern. Thus, the linearity assumption is not violated.
To test whether the mean value of the residuals is zero, let's calculate it as,
mean(ARDL.Radiation.12x4$model$residuals)
## [1] 1.426442e-20
As the mean value of the residuals is close to 0, the zero mean residuals assumption is not violated.
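A mean this close to zero is expected rather than informative: whenever a least-squares regression includes an intercept, the residuals sum to zero by construction. A minimal sketch on R's built-in `cars` data (used purely for illustration) shows the contrast with a no-intercept fit.

```r
fit_with    <- lm(dist ~ speed, data = cars)      # intercept included
fit_without <- lm(dist ~ 0 + speed, data = cars)  # intercept removed

mean(residuals(fit_with))      # zero up to floating-point error
mean(residuals(fit_without))   # generally not zero
```

Since ARDL.Radiation.12x4 is fitted with an intercept, its residual mean is zero up to rounding error regardless of how well the model fits.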
The Ljung-Box test has the hypotheses,
\(H_0\) : the series of residuals exhibits no serial autocorrelation of any order up to p
\(H_a\) : the series of residuals exhibits serial autocorrelation of some order up to p
From the Ljung-Box test output, since p (0.1405) > 0.05, we do not reject the null hypothesis of no serial autocorrelation.
Thus, according to this test and ACF plot, we can conclude that the serial correlation left in residuals is insignificant.
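To show what the Ljung-Box statistic Q* measures, the sketch below computes it by hand from the sample autocorrelations and compares it with Box.test() from base R. The residuals are simulated stand-ins (19 values, 4 lags, matching the sizes in the output above), not the actual model residuals.

```r
set.seed(2023)
res <- rnorm(19)   # simulated stand-in for 19 model residuals
h   <- 4           # number of lags tested

n <- length(res)
r <- acf(res, lag.max = h, plot = FALSE)$acf[-1]        # r_1, ..., r_h (lag 0 dropped)
q_manual <- n * (n + 2) * sum(r^2 / (n - seq_len(h)))   # Ljung-Box statistic
p_manual <- pchisq(q_manual, df = h, lower.tail = FALSE)

lb <- Box.test(res, lag = h, type = "Ljung-Box")
all.equal(q_manual, unname(lb$statistic))
all.equal(p_manual, lb$p.value)
```

A large Q* (small p-value) signals that the first h autocorrelations are jointly too large to be consistent with white noise.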
\(H_0\) : Time series is normally distributed
\(H_a\) : Time series is not normally distributed
shapiro.test(ARDL.Radiation.12x4$model$residuals)
##
## Shapiro-Wilk normality test
##
## data: ARDL.Radiation.12x4$model$residuals
## W = 0.96413, p-value = 0.656
From the Shapiro-Wilk test, since p (0.656) > 0.05, we do not reject the null hypothesis that the data are normally distributed. Thus, the residuals of the ARDL.Radiation.12x4 model can be treated as normally distributed.
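The decision rule used for both the Ljung-Box and Shapiro-Wilk tests can be written down explicitly. The helper `decide` below is hypothetical and only restates the rule at the 5% significance level used throughout this report.

```r
alpha <- 0.05

# Hypothetical helper turning a p-value into the test decision.
decide <- function(p_value, alpha = 0.05) {
  if (p_value > alpha) "do not reject H0" else "reject H0"
}

decide(0.656, alpha)    # Shapiro-Wilk p-value reported above
decide(0.1405, alpha)   # Ljung-Box p-value reported above
```

In both cases the p-value exceeds 0.05, so the null hypotheses (normality and no serial autocorrelation, respectively) are not rejected.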
Summarizing the residual analysis on the \(ARDL.Radiation.12x4\) model:
Assumption 1: The error terms are randomly distributed and thus show linearity: Not violated
Assumption 2: The mean value of E is zero (zero mean residuals): Not violated
Assumption 4: The error terms are independently distributed, i.e. they are not autocorrelated: Not violated
Assumption 5: The errors are normally distributed: Not violated
Having no violations of the residual assumptions, the ARDL(12,4) model with Radiation as regressor (ARDL.Radiation.12x4) is suitable for forecasting. Let's forecast for the next 3 years,
Using the MASE measure, the ARDL model \(ARDL.Radiation.12x4\) is the best fitted model to forecast RBO. Let's estimate and plot the 3 years ahead (2015-2017) forecasts for the RBO series.
Observed and fitted values are plotted below. This plot indicates a good agreement between the model and the original series. (Note, since the maximum lag is 12 (p = 12, q = 4), fitted values are not available for the first 12 years, 1984-1995.)
plot(RBO, ylab='RBO', xlab = 'Year', type="l", col="black", main="Observed and fitted values using ARDL.Radiation.12x4 model on RBO")
lines(ts(ARDL.Radiation.12x4$model$fitted.values, start = c(1996)), col="red")
legend("topleft",lty=1,
col=c("black", "red"),
c("RBO series", "ARDL.Radiation.12x4 fit"))
Using the given 4 years ahead future covariates values, we can forecast our RBO response.
Future_Covariates_RBO <- read.csv("C:/Users/admin/Downloads/Covariate x-values for Task 3.csv")
head(Future_Covariates_RBO)
## Year Temperature Rainfall Radiation RelHumidity
## 1 2015 20.74 2.27 14.60 94.45
## 2 2016 20.49 2.38 14.56 94.03
## 3 2017 20.52 2.26 14.79 95.04
## 4 2018 20.56 2.27 14.79 95.06
Our ARDL.Radiation.12x4 model uses only 1 covariate, Radiation. The 3 years ahead point forecasts of RBO using the Radiation covariate are computed as,
ARDL.Radiation.12x4 = ardlDlm(formula = RBO ~ Radiation, data = RBO_dataset, p = 12, q = 4)
x.new = c(Future_Covariates_RBO$Radiation)
forecasts.ardldlm = dLagM::forecast(model = ARDL.Radiation.12x4, x = x.new, h = 3)$forecasts
Forecast using overall best fitting model:
The point forecasts and the forecast plot using the overall best fitting model, ARDL.Radiation.12x4 is given below,
df <- data.frame(
ARDL_forecasts = c(forecasts.ardldlm)
)
row.names(df) <- c("2015", "2016", "2017")
df
## ARDL_forecasts
## 2015 0.6710744
## 2016 0.8016781
## 2017 0.7241307
RBO.extended4 = c(RBO, forecasts.ardldlm)
{
plot(ts(RBO.extended4, start = c(1984)), type="l", col = "red",
ylab = "RBO", xlab = "Year",
main="3 years ahead forecasts for RBO series
using ARDL.Radiation.12x4 model")
lines(RBO,col="black",type="l")
legend("topleft",lty=1,
col=c("black", "red"),
c("RBO series", "ARDL(12,4) forecasts"))
}
The forecasts from the best Dynamic Linear model are given and plotted below. (Note, no significant Polynomial DLM was found, and the forecasts of the best Finite DLM and Koyck models are not printed because these models do not have intercepts.) Since ARDL is the only Distributed Lag model carried forward, and its forecasts are already plotted above, let's move on to the Dynamic Linear models,
For Dynamic Linear model:
The 3 years ahead point forecasts are printed and plotted below,
Dyn.model = dynlm(Y.t ~ L(Y.t , k = 1) + P.t + trend(Y.t)) # library(dynlm)
q = 3 # forecast horizon in years
n = nrow(Dyn.model$model) # number of observations used in the fit
RBO.frc = array(NA , (n + q))
RBO.frc[1:n] = Y.t[2:length(Y.t)] # length(1:n) = length(2:length(Y.t)) = 30
trend.start = Dyn.model$model[n,"trend(Y.t)"] # trend value at the last observation
trend = trend.start + seq_len(q) # trend values for the q future years (annual step of 1)
for (i in 1:q){
# Regressors for each forecast: intercept, previous RBO value, pulse (0 after 1996), trend
data.new = c(1,RBO.frc[n-1+i], P.t[n] ,trend[i])
RBO.frc[n+i] = as.vector(Dyn.model$coefficients) %*% data.new
}
par(mfrow=c(1,1))
plot(Y.t,xlim=c(1984,2017),ylab='RBO',xlab='Year',main = "Time series plot of RBO series with 3 years ahead forecasts (in red)")
lines(ts(RBO.frc[(n+1):(n+q)],start=c(2015)),col="red")
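The recursion behind these forecasts can be sketched with base R only. The example below is a minimal illustration on simulated data, assuming a model of the same form (one lag of the response, a pulse at the 13th observation and a linear trend). It uses lm() and predict() in place of dynlm, and all object names are hypothetical.

```r
set.seed(7)
n <- 31
y <- 0.8 + cumsum(rnorm(n, sd = 0.01))   # simulated stand-in for RBO
pulse <- as.numeric(seq_len(n) == 13)    # pulse at the 13th observation
y[13] <- y[13] - 0.08                    # one-off drop at the pulse

# Regression data: y_t on y_{t-1}, the pulse and a linear trend.
d <- data.frame(y = y[-1], y_lag1 = y[-n], pulse = pulse[-1], trend = 2:n)
fit <- lm(y ~ y_lag1 + pulse + trend, data = d)

h <- 3
fc <- numeric(h)
last_y <- y[n]
for (i in seq_len(h)) {
  # Future regressors: previous value, no pulse, trend continued past n.
  new_row <- data.frame(y_lag1 = last_y, pulse = 0, trend = n + i)
  fc[i]   <- predict(fit, newdata = new_row)
  last_y  <- fc[i]   # feed the forecast back in as the next lag
}
fc
```

Each forecast is fed back in as the lagged response for the next step, the pulse is set to 0 for future years, and the trend continues one step beyond the last in-sample value.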
The best-fitting model for our RBO series in terms of MASE, which assesses forecast accuracy, is the Autoregressive DLM model, ARDL(12,4), with Radiation as regressor (\(ARDL.Radiation.12x4\)). The 3 years ahead point forecasts (2015-2017) obtained with forecast() from the dLagM package are 0.6710744, 0.8016781, and 0.7241307 respectively (confidence intervals are not reported).
Potentially better forecasting methods can be explored, compared and diagnosed for better fit.
The aim is to accommodate the effect of the Millennium Drought, which occurred during the 1996-2009 period, in the analysis of the Rank-based flowering Order similarity metric (RBO) based on the 4 climatic regressor variables, and to obtain 3 years ahead forecasts.
We expect the Millennium Drought from 1996-2009 to have created an intervention point which changes the mean function or trend of the RBO series. Let's revisit the time series plot for the response, RBO, to visualize possible intervention points at 1996 and 2009 or between these years.
plot(RBO, ylab = 'RBO', xlab = 'Year')
From the time series plot above, year 1996 might be an intervention point because the mean level of the RBO series falls notably from this point onwards. Assuming this intervention point, let's fit a Dynamic Linear model and see whether the pulse function at year 1996 is significant or not.
To analyze the effect of this potential intervention point, a Dynamic Linear Regression model can be used. Dynamic linear models are a general class of time series regression models which can account for trends, seasonality, serial correlation between the response and regressor variables, and, most importantly, the effect of intervention points.
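The response of a general Dynamic Linear model is,
\(Y_t = \omega_2Y_{t-1} + (\omega_0 + \omega_1)P_t - \omega_2\omega_0P_{t-1} + N_t\)
where \(Y_t\) is the response (RBO), \(P_t\) is the pulse indicator that equals 1 at the intervention time and 0 otherwise, \(N_t\) is the unperturbed (no-intervention) noise process, and \(\omega_0\), \(\omega_1\) and \(\omega_2\) are the coefficients to be estimated.
As usual, we first look at the ACF and PACF plots of the RBO series.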
acf(RBO, main="ACF of RBO")
pacf(RBO, main ="PACF of RBO")
In the ACF plot we see a slowly decaying pattern indicating a trend in the RBO series. In the PACF plot we see 1 high vertical spike, also indicating a trend. No significant seasonal behavior is observed. Thus, let's fit a Dynamic Linear model with a trend component and no seasonal component. For thoroughness, let's test several combinations of the trend, multiple lags of RBO, and, most importantly, the pulse at 1996.
Now, let's fit the Dynamic Linear model using dynlm() as shown below (note, the potential intervention point was identified at year 1996, i.e. the 13th data point). Let's fit models with and without the intercept and compare,
With intercept :
Y.t = RBO
T = c(13) # The time point when the intervention occurred
P.t = 1*(seq(RBO) == T)
P.t.1 = Lag(P.t,+1) #library(tis)
Dyn.model = dynlm(Y.t ~ L(Y.t , k = 1) + P.t + trend(Y.t)) # library(dynlm)
Dyn.model1 = dynlm(Y.t ~ L(Y.t , k = 1) + L(Y.t , k = 2) + P.t + trend(Y.t)) # library(dynlm)
Dyn.model2 = dynlm(Y.t ~ L(Y.t , k = 1) + L(Y.t , k = 2) + L(Y.t , k = 3) + P.t + trend(Y.t)) # library(dynlm)
Dyn.model3 = dynlm(Y.t ~ L(Y.t , k = 1) + L(Y.t , k = 2) + L(Y.t , k = 3) + P.t + P.t.1 + trend(Y.t)) # library(dynlm)
Model <- c("Dyn.model", "Dyn.model1", "Dyn.model2", "Dyn.model3")
AIC <- c(AIC(Dyn.model), AIC(Dyn.model1), AIC(Dyn.model2), AIC(Dyn.model3))
BIC <- c( BIC(Dyn.model), BIC(Dyn.model1), BIC(Dyn.model2), BIC(Dyn.model3))
data.frame(Model, AIC, BIC) %>% arrange(BIC)
## Model AIC BIC
## 1 Dyn.model -114.7932 -107.7873
## 2 Dyn.model2 -114.9847 -105.6593
## 3 Dyn.model3 -114.9847 -105.6593
## 4 Dyn.model1 -112.2284 -104.0246
summary(Dyn.model)
##
## Time series regression with "ts" data:
## Start = 1985, End = 2014
##
## Call:
## dynlm(formula = Y.t ~ L(Y.t, k = 1) + P.t + trend(Y.t))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.067777 -0.019047 0.000599 0.013851 0.074160
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5027754 0.1273859 3.947 0.000537 ***
## L(Y.t, k = 1) 0.3665457 0.1612803 2.273 0.031545 *
## P.t -0.0872460 0.0331257 -2.634 0.014032 *
## trend(Y.t) -0.0020229 0.0008264 -2.448 0.021442 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03248 on 26 degrees of freedom
## Multiple R-squared: 0.5382, Adjusted R-squared: 0.485
## F-statistic: 10.1 on 3 and 26 DF, p-value: 0.0001372
As per BIC, the best Dynamic Linear model with intercept for RBO is \(Dyn.model\), having as regressors an instantaneous effect at year 1996, the 1 year lagged RBO response, and a trend component of RBO.
From the summary statistics, the \(Dyn.model\) is significant at 5% significance level. All the 3 regressors are significant. Most importantly, the pulse at 1996 year is significant at 5% significance level.
Without intercept :
Y.t = RBO
T = c(13) # The time point when the intervention occurred
P.t = 1*(seq(RBO) == T)
P.t.1 = Lag(P.t,+1) #library(tis)
Dyn.model.NoIntercept = dynlm(Y.t ~ 0 + L(Y.t , k = 1) + P.t + trend(Y.t)) # library(dynlm)
Dyn.model1.NoIntercept = dynlm(Y.t ~ 0 + L(Y.t , k = 1) + L(Y.t , k = 2) + P.t + trend(Y.t)) # library(dynlm)
Dyn.model2.NoIntercept = dynlm(Y.t ~ 0 + L(Y.t , k = 1) + L(Y.t , k = 2) + L(Y.t , k = 3) + P.t + trend(Y.t)) # library(dynlm)
Dyn.model3.NoIntercept = dynlm(Y.t ~ 0 + L(Y.t , k = 1) + L(Y.t , k = 2) + L(Y.t , k = 3) + P.t + P.t.1 + trend(Y.t)) # library(dynlm)
Model <- c("Dyn.model.NoIntercept", "Dyn.model1.NoIntercept", "Dyn.model2.NoIntercept", "Dyn.model3.NoIntercept")
AIC <- c(AIC(Dyn.model.NoIntercept), AIC(Dyn.model1.NoIntercept), AIC(Dyn.model2.NoIntercept), AIC(Dyn.model3.NoIntercept))
BIC <- c( BIC(Dyn.model.NoIntercept), BIC(Dyn.model1.NoIntercept), BIC(Dyn.model2.NoIntercept), BIC(Dyn.model3.NoIntercept))
data.frame(Model, AIC, BIC) %>% arrange(BIC)
## Model AIC BIC
## 1 Dyn.model2.NoIntercept -114.1008 -106.10753
## 2 Dyn.model3.NoIntercept -114.1008 -106.10753
## 3 Dyn.model1.NoIntercept -107.1523 -100.31579
## 4 Dyn.model.NoIntercept -102.7092 -97.10439
summary(Dyn.model2.NoIntercept)
##
## Time series regression with "ts" data:
## Start = 1987, End = 2014
##
## Call:
## dynlm(formula = Y.t ~ 0 + L(Y.t, k = 1) + L(Y.t, k = 2) + L(Y.t,
## k = 3) + P.t + trend(Y.t))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.050359 -0.020437 0.009498 0.017263 0.049244
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## L(Y.t, k = 1) 0.3690799 0.1470329 2.510 0.01955 *
## L(Y.t, k = 2) 0.3642948 0.1430617 2.546 0.01804 *
## L(Y.t, k = 3) 0.2538073 0.1480625 1.714 0.09994 .
## P.t -0.0869779 0.0290544 -2.994 0.00649 **
## trend(Y.t) 0.0004006 0.0006111 0.656 0.51861
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02809 on 23 degrees of freedom
## Multiple R-squared: 0.9988, Adjusted R-squared: 0.9985
## F-statistic: 3826 on 5 and 23 DF, p-value: < 2.2e-16
As per BIC, the best Dynamic Linear model without intercept for RBO is \(Dyn.model2.NoIntercept\), having as regressors an instantaneous effect at year 1996, 3 lags of the RBO response, and a trend component of RBO.
From the summary statistics, \(Dyn.model2.NoIntercept\) is significant at the 5% significance level. Three of the five regressors (the first two lags of RBO and the pulse) are significant. Most importantly, the pulse at year 1996 is significant at the 5% significance level.
The best Dynamic Linear models with and without intercept were Dyn.model and Dyn.model2.NoIntercept respectively. These two models are compared below based on AIC, BIC and adjusted R-squared.
Model <- c("Dyn.model", "Dyn.model2.NoIntercept")
AIC <- c(AIC(Dyn.model), AIC(Dyn.model2.NoIntercept))
BIC <- c( BIC(Dyn.model), BIC(Dyn.model2.NoIntercept))
Adjusted_Rsquared <- c(0.485, 0.9985)
data.frame(Model,AIC, BIC, Adjusted_Rsquared) %>% arrange(AIC)
## Model AIC BIC Adjusted_Rsquared
## 1 Dyn.model -114.7932 -107.7873 0.4850
## 2 Dyn.model2.NoIntercept -114.1008 -106.1075 0.9985
Thus, as per AIC and BIC, the Dynamic Linear model for RBO with intercept (Dyn.model) is the best.
Dyn.model is the best Dynamic Linear model as per AIC and BIC, with 1 lagged component of the response (RBO), a significant pulse component at year 1996, and a trend component of the RBO series. Let's look at the summary statistics and check the residuals,
summary(Dyn.model)
##
## Time series regression with "ts" data:
## Start = 1985, End = 2014
##
## Call:
## dynlm(formula = Y.t ~ L(Y.t, k = 1) + P.t + trend(Y.t))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.067777 -0.019047 0.000599 0.013851 0.074160
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5027754 0.1273859 3.947 0.000537 ***
## L(Y.t, k = 1) 0.3665457 0.1612803 2.273 0.031545 *
## P.t -0.0872460 0.0331257 -2.634 0.014032 *
## trend(Y.t) -0.0020229 0.0008264 -2.448 0.021442 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03248 on 26 degrees of freedom
## Multiple R-squared: 0.5382, Adjusted R-squared: 0.485
## F-statistic: 10.1 on 3 and 26 DF, p-value: 0.0001372
checkresiduals(Dyn.model)
##
## Breusch-Godfrey test for serial correlation of order up to 7
##
## data: Residuals
## LM test = 6.8908, df = 7, p-value = 0.4403
Summary of the Dynamic Linear model, Dyn.model:
The dynamic linear model, Dyn.model, is significant, and the pulse (P.t) component at year 1996 is significant.
Observed and fitted values are plotted below. This plot indicates a decent agreement between the model and the original series.
plot(RBO,ylab='RBO', xlab = 'Year', type="l", col="red")
lines(Dyn.model$fitted.values)
Now, let’s find 3 years ahead point forecasts for RBO series using the Dyn.model.
Dyn.model = dynlm(Y.t ~ L(Y.t , k = 1) + P.t + trend(Y.t)) # library(dynlm)
q = 3 # forecast horizon in years
n = nrow(Dyn.model$model) # number of observations used in the fit
RBO.frc = array(NA , (n + q))
RBO.frc[1:n] = Y.t[2:length(Y.t)] # length(1:n) = length(2:length(Y.t)) = 30
trend.start = Dyn.model$model[n,"trend(Y.t)"] # trend value at the last observation
trend = trend.start + seq_len(q) # trend values for the q future years (annual step of 1)
for (i in 1:q){
# Regressors for each forecast: intercept, previous RBO value, pulse (0 after 1996), trend
data.new = c(1,RBO.frc[n-1+i], P.t[n] ,trend[i])
RBO.frc[n+i] = as.vector(Dyn.model$coefficients) %*% data.new
}
par(mfrow=c(1,1))
plot(Y.t,xlim=c(1984,2017),ylab='RBO',xlab='Year',main = "Time series plot of RBO series with 3 years ahead forecasts (in red)")
lines(ts(RBO.frc[(n+1):(n+q)],start=c(2015)),col="red")
Data could be collected at a monthly level, which would allow more precise forecasting.