# Clear the environment and the console
rm(list = ls()); cat("\f")
#----------------------------#
# Install required libraries #
#----------------------------#
packages <- c("ggplot2", "ggpubr", "readxl", "readr", "dplyr", "tidyr", "psych", "stringr",
"lubridate", "knitr", "outliers", "MVN", "TSA", "tseries", "lmtest", "FSAdata",
"forecast", "matrixcalc", "car", "corpcor", "scales", "QuantPsyc",
"urca", "rugarch", "fGarch", "tswge",
"imputeTS", "shiny", "coda", "rjags", "runjags", "ks", "epiDisplay", "fastDummies")
# Install any packages not already installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) { install.packages(packages[!installed_packages]) }
#-------------------------#
# Load libraries #
#-------------------------#
invisible(lapply(packages, library, character.only = TRUE))
#-------------------------#
# Clear the environment and the console
rm(list = ls()); cat("\f")
library(TSA)
library(readr)
library(stringr)
library(dplyr)
library(lubridate)
This report summarises the analysis and modelling of the ‘assignment1Data2022’ dataset, as part of Assignment 1 for Time Series Analysis. The report covers analysis of the dataset, finding a suitable model for the data, and using the model to predict the next 5 observations for the closing share price.
After importing the data, a summary of the data showed there were 144 observations imported into R, the same as observed in the original data file, confirming the data had been imported successfully.
# Read the data into R
ASX <- read_csv("assignment1Data2022.csv", col_names = TRUE)
colnames(ASX) <- c("Day", "Close_Price")
summary(ASX)
Day Close_Price
Min. : 1.00 Min. :15.47
1st Qu.: 36.75 1st Qu.:71.33
Median : 72.50 Median :81.25
Mean : 72.50 Mean :75.41
3rd Qu.:108.25 3rd Qu.:85.05
Max. :144.00 Max. :95.48
The ‘class()’ function was then used to confirm the format of the ASX dataset and identify that the raw data that had been imported was a data frame. Because the ASX data was a data frame object, the ‘ts()’ function was used to convert the ASX data to a time series object.
class(ASX)
[1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
ASX_ts <- ts(ASX$Close_Price) #column 1 is simply the observation number
class(ASX_ts)
[1] "ts"
With the ASX data converted to a time series object, the data was plotted as a time series for visual inspection, and to assess the key time series elements of trend, seasonality, changing variance, behaviour, and change point/intervention.
plot(ASX_ts, type='o', ylab='Closing Price (AUD)', xlab='Day',
main='Time series plot of the ASX closing price series.')
The time series plot of the ASX closing price series showed the following:
There appear to be two trends in the time series plot: an initial upward trend, beginning at day 1 and ending around day 80, followed by a downward trend beginning at day 81 and continuing through to the end of the series.
The time series plot appeared to show evidence of seasonality, with repeating patterns of increases and decreases that become clearer and more pronounced as time increases.
The fluctuations in the data grow larger as time progresses, indicative of changing variance. In particular, the fluctuations between day 1 and day 70 are noticeably smaller than those from day 70 to day 144.
The presence of seasonality makes it difficult to determine the behaviours present in the data. Nonetheless, the time series plot appears to exhibit some autoregressive behaviour (successive points tend to lie close to one another).
There appears to be a change point (or intervention) around day 80, with the closing price trending upward gradually until approximately day 80, after which the trend in the closing price changes to a steep decline (relative to the initial upward trend). This change point could reflect an external intervention such as a policy change, or an environmental factor, causing a shift in demand for the share and changing its trajectory.
The points in Figure 1 show evidence of succeeding measurements being related to one another, and the scatter plot of neighbouring pairs in Figure 2 makes this relationship clearer.
par(mfrow=c(1,1))
plot(y=ASX_ts, x=zlag(ASX_ts), ylab='Closing Price', xlab='Closing Price Previous Day',
main = "Scatter plot of Closing Price in consecutive days.")
y = ASX_ts # Assign the ASX data to y
x = zlag(ASX_ts) # Generate first lag of the ASX series
index = 2:length(x) # Create an index to get rid of the first NA value in x
cor(y[index],x[index]) # Calculate correlation between numerical values in x and y
[1] 0.9678333
The scatter plot in Figure 2 shows a strong upward trend, indicative of a strong correlation between neighbouring pairs. The plot indicates that low values in the ASX closing price series tend to be followed by low values, middle-sized values tend to be followed by middle-sized values, and high values tend to be followed by high values. These observations were supported by the correlation between neighbouring closing prices, which, at 0.968, indicates a very strong positive correlation between neighbouring points.
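The lag-1 correlation can also be computed without ‘zlag()’. The following is a minimal sketch in base R, assuming a simulated random walk as a stand-in for the closing prices (the series and the variable names are illustrative, not the assignment data). It also shows that the lag-1 value reported by ‘acf()’ is a closely related but not identical quantity.

```r
# Simulated stand-in for the closing price series (a random walk);
# this is NOT the assignment data.
set.seed(1)
x <- 80 + cumsum(rnorm(144))

# Pair each observation with the one before it
current  <- x[-1]           # observations 2..n
previous <- x[-length(x)]   # observations 1..(n-1)

# Pearson correlation between neighbouring observations
lag1_cor <- cor(current, previous)

# The lag-1 sample autocorrelation from acf() is usually close to lag1_cor,
# but it uses the overall mean and the full-series variance.
lag1_acf <- acf(x, lag.max = 1, plot = FALSE)$acf[2]

print(c(pairwise = lag1_cor, acf = lag1_acf))
```

Dropping the first element of one copy and the last element of the other plays the same role as the ‘index’ vector in the report's code, which removes the leading NA created by the lag.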
As the ASX data contains daily closing prices, it is possible that the data has a natural frequency that needs to be applied to the series before seasonal and cosine models can be fitted. The sample autocorrelation function (ACF) plot was used to identify the frequency of the data.
acf(ASX_ts, main="ACF plot for the closing price of a share on the ASX series.", lag.max = 40)
The ACF plot in Figure 3 showed ‘waves’ in the lags, with peaks at (approximately) every 5 lags. Markers have been added to the ACF plot in Figure 4 to assist in identifying that there are approximately 5 lags between each peak.
As the data comes from daily closing prices, and the ASX is only open on weekdays, a frequency of 5 was deemed reasonable as it aligns with the number of weekdays in a week. A frequency of 5 was therefore applied to the time series object to capture the frequency of weekdays in the series.
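The spacing between ACF peaks can also be read off numerically rather than by eye. The sketch below assumes a synthetic series with a known 5-observation cycle (not the assignment data) and locates the local maxima of the sample ACF; the names ‘peak_lags’ and ‘spacing’ are illustrative.

```r
# Synthetic series with a known cycle of 5 observations (NOT the assignment data)
n <- 144
x <- sin(2 * pi * seq_len(n) / 5)

# Sample autocorrelations at lags 1..21 (drop lag 0, so r[k] is lag k)
r <- acf(x, lag.max = 21, plot = FALSE)$acf[-1]

# A lag is a peak if its autocorrelation exceeds both neighbours
k <- 2:20
is_peak <- r[k] > r[k - 1] & r[k] > r[k + 1]
peak_lags <- k[is_peak]

# Distance between successive peaks suggests the frequency
spacing <- diff(peak_lags)
print(peak_lags)
print(spacing)
```

On real data the peaks are noisier, so the spacing is best treated as a suggestion to be checked against what is known about the data, as was done here with the five trading days in a week.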
ASX_ts_f <- ts(ASX$Close_Price, frequency = 5)
summary(ASX_ts_f)
Min. 1st Qu. Median Mean 3rd Qu. Max.
15.47 71.33 81.25 75.41 85.05 95.48
ASX_ts_f
Time Series:
Start = c(1, 1)
End = c(29, 4)
Frequency = 5
[1] 80.00000 80.00000 80.00000 80.00000 80.00000 80.00000 80.00000 80.00000 80.00000
[10] 81.38536 79.83485 79.93370 80.23789 81.33359 81.14499 79.80375 79.43928 81.41525
[19] 80.70474 79.69196 81.87581 83.51553 82.56475 80.87173 80.72899 83.64094 84.88065
[28] 82.33275 84.32522 84.89717 83.40710 81.37416 80.95209 82.92598 83.99487 81.82895
[37] 84.99273 85.50362 83.98170 81.17617 81.80690 86.13822 87.06587 83.57672 86.29252
[46] 85.58256 83.77695 80.39828 82.20623 88.28351 89.03392 85.63961 87.10639 84.92215
[55] 82.17131 80.00454 81.63989 87.61051 87.22371 84.96421 88.55203 87.53684 84.11937
[64] 82.27707 83.09376 87.79680 88.11651 85.91388 89.92467 88.68123 84.58792 83.52971
[73] 85.20692 90.62572 91.49070 90.07306 95.31403 93.25239 86.41805 83.94959 87.10673
[82] 93.23133 92.43702 90.80849 95.48193 91.64825 82.60358 78.17113 81.01983 86.86542
[91] 86.53665 85.42238 90.39699 86.85428 78.12587 74.29434 78.62797 84.89108 83.41155
[100] 80.45274 84.53619 80.45623 71.39215 66.56142 71.14703 75.80771 74.22025 72.14369
[109] 76.89173 71.38474 61.95091 57.37281 63.16960 67.80788 65.79209 63.46010 69.09905
[118] 63.59795 53.19971 49.52717 56.96862 60.89146 59.20133 58.66473 65.12076 58.77136
[127] 47.24968 44.21060 50.65772 52.97805 50.63288 50.26958 56.19423 48.64413 35.37176
[136] 30.81471 36.18630 38.41415 35.68011 34.78598 40.24962 33.00511 20.00861 15.46719
The ASX data with frequency was then plotted as a time series to understand the nature of the ASX closing price series after applying a frequency.
plot(ASX_ts_f, type='o', ylab='Closing Price (AUD)', xlab='Week',
main='Time series plot of the ASX closing price series.')
By applying a frequency of 5, every 5 observations are interpreted as one week. Points indicating the day within each week were then overlaid on the time series plot to aid in checking for seasonality in the series.
plot(ASX_ts_f, type='l', ylab='Closing Price (AUD)', xlab='Week',
main='Time series plot of the ASX closing price series.')
points(y=ASX_ts_f, x=time(ASX_ts_f), pch=as.vector(season(ASX_ts_f, 1:5)))
When points were added to the time series plot there did not appear to be a clear seasonal pattern where a particular day regularly represented a peak or trough. In the last 5 “waves” of the series, the trough of each wave is a different number, indicating that the troughs are not occurring on the same day.
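A numerical companion to this visual check is to average the series by position within the week. The sketch below uses base R's ‘cycle()’ on a simulated series with frequency 5 (an assumption for illustration, not the assignment data); if one day were consistently a peak or trough, its mean would stand apart from the others.

```r
# Simulated stand-in with 5 observations per "week" (NOT the assignment data)
set.seed(2)
x <- ts(80 + cumsum(rnorm(144)), frequency = 5)

# cycle() gives the position of each observation within its week (1..5)
position <- as.integer(cycle(x))

# Mean closing price and number of observations for each position
means_by_day  <- tapply(as.numeric(x), position, mean)
counts_by_day <- table(position)

print(round(means_by_day, 2))
print(counts_by_day)
```

With 144 observations and a frequency of 5, the first four positions each occur 29 times and the fifth occurs 28 times, which matches the series ending at c(29, 4).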
The first model fitted was a minimum variance estimator, or sample mean.
mean(ASX_ts)
[1] 75.40934
var(ASX_ts)
[1] 266.0105
The mean of the ASX closing price was $75.41, with a variance of 266.01 (in squared dollars). However, when the sample mean was plotted over the time series, it was clear that it was not a suitable model for the closing price data.
plot(ASX_ts, type='o', ylab='Closing Price (AUD)', xlab='Day',
main='Time series plot of the ASX closing price series \nwith fitted minimum variance estimator.')
abline(h = mean(ASX_ts), col="red", lwd=1, lty=1)
Residual analysis of the sample mean was not conducted as it was clear from the plot of the sample mean and the ASX closing price series that the sample mean was a poor model for the ASX closing price series.
A linear model was fitted to the ASX closing price series, with the following model summary.
t <- time(ASX_ts) # Get time points
model_linear <- lm(ASX_ts ~ t)
summary(model_linear)
Call:
lm(formula = ASX_ts ~ t)
Residuals:
Min 1Q Median 3Q Max
-41.083 -8.949 0.231 8.896 23.370
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 94.53267 2.02418 46.70 <2e-16 ***
t -0.26377 0.02422 -10.89 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 12.08 on 142 degrees of freedom
Multiple R-squared: 0.4551, Adjusted R-squared: 0.4513
F-statistic: 118.6 on 1 and 142 DF, p-value: < 2.2e-16
The summary of the linear model showed the model was significant at the α=0.05 level (p < 0.05) as was the coefficient of the model variable “t” and the intercept. With an adjusted R-squared of 0.451, the model only accounted for 45% of the total variation in closing price, suggesting it may not be a good model for the ASX closing price series. A time series plot of the linear model was then produced, to assess the linear model’s suitability, visually.
plot(ASX_ts, type='o', ylab='Closing Price (AUD)', xlab='Day',
main='Time series plot of the ASX closing price series \nwith fitted linear model.')
abline(model_linear, col="red")
From the plot of the ASX closing price series and the linear model in Figure 8, it was clear that the linear model was a poor fit for the ASX closing price series. A residual analysis of the linear model was conducted as a final assessment of fit.
# Residual analysis
res_model_linear = rstudent(model_linear)
par(mfrow=c(2,2))
plot(y=res_model_linear, x=as.vector(t), type='l',
xlab='Day', ylab='Standardized Residuals', main="Standardised residuals from linear model.")
hist(res_model_linear, xlab='Standardized Residuals', main="Histogram of standardised residuals \nfrom linear model.")
qqnorm(y=res_model_linear, main = "QQ plot of standardised residuals \nfrom linear model.")
qqline(y=res_model_linear, col = 2, lwd = 1, lty = 2)
acf(res_model_linear, main = "ACF of standardized residuals \nfrom linear model.")
par(mfrow=c(1,1))
pacf(res_model_linear, main = "PACF of standardized residuals \nfrom linear model.")
shapiro.test(res_model_linear)
Shapiro-Wilk normality test
data: res_model_linear
W = 0.97653, p-value = 0.01413
The residual analysis indicated that the residuals were not normally distributed, and that there was evidence of a trend in the standardised residuals, meaning information about the ASX series was being missed by the linear model.
As such, the linear model was deemed unsuitable as a model for the ASX closing price series.
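The ACF checks used in the residual analyses can be complemented with a numerical summary. The sketch below is one possible helper, not part of the original analysis: ‘significant_lags()’ is a hypothetical function that lists the lags whose sample autocorrelation lies outside the approximate 95% white-noise bounds drawn by default on an ACF plot. It is demonstrated on synthetic residuals with a 5-observation cycle.

```r
# Hypothetical helper: lags whose sample autocorrelation falls outside the
# approximate 95% white-noise bounds, +/- qnorm(0.975) / sqrt(n).
significant_lags <- function(res, lag.max = 20) {
  res <- as.numeric(res)
  n <- length(res)
  r <- acf(res, lag.max = lag.max, plot = FALSE)$acf[-1]  # drop lag 0
  bound <- qnorm(0.975) / sqrt(n)
  which(abs(r) > bound)  # r[k] is lag k, so positions are lags
}

# Synthetic residuals with a 5-observation cycle (NOT the assignment data)
res_demo <- sin(2 * pi * seq_len(144) / 5)

flagged   <- significant_lags(res_demo)
normality <- shapiro.test(res_demo)  # the same normality test used in the report

print(flagged)
print(normality$p.value)
```

In the report's workflow, the same helper could be applied to objects such as ‘res_model_linear’ to list the significant lags instead of reading them from the plot.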
The ASX closing price series was then modelled using a quadratic model, with the following output summary.
t <- time(ASX_ts) # Get time points
t2 <- t^2
model_quad = lm(ASX_ts ~ t + t2)
summary(model_quad)
Call:
lm(formula = ASX_ts ~ t + t2)
Residuals:
Min 1Q Median 3Q Max
-17.8176 -3.5322 -0.7553 3.6988 12.5656
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 70.2777995 1.4284180 49.20 <2e-16 ***
t 0.7330056 0.0454811 16.12 <2e-16 ***
t2 -0.0068743 0.0003038 -22.62 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.634 on 141 degrees of freedom
Multiple R-squared: 0.8823, Adjusted R-squared: 0.8807
F-statistic: 528.6 on 2 and 141 DF, p-value: < 2.2e-16
The quadratic model was significant at the α=0.05 level (p < 0.05), as were all coefficients of model variables. With an adjusted R-squared of 0.8807, the model accounted for 88% of the total variation in closing price, indicating the quadratic model may be a better model of the ASX closing price series than the linear model.
A time series plot of the fitted quadratic model was then produced to assess the model’s suitability, visually.
plot( ts( fitted(model_quad)), ylab='Closing Price (AUD)', xlab='Day',
main = "Time series plot of the ASX closing price series \nwith fitted quadratic model.",
ylim = c(min( c( fitted(model_quad), as.vector(ASX_ts))),
max( c( fitted(model_quad), as.vector(ASX_ts)))),
col = 'red')
lines(as.vector(ASX_ts), type="o")
From visual inspection, it was clear that the fitted quadratic model was more suitable than the linear model for the ASX closing price series. A residual analysis of the suitability of the quadratic model was then conducted to further assess the model’s suitability.
# Residual analysis
res_model_quad = rstudent(model_quad)
par(mfrow=c(2,2))
plot(y=res_model_quad, x=as.vector(t), type='l',
xlab='Day', ylab='Standardized Residuals', main="Standardised residuals \nfrom quadratic model.")
hist(res_model_quad, xlab='Standardized Residuals', main="Histogram of standardised residuals \nfrom quadratic model.")
qqnorm(y=res_model_quad, main = "QQ plot of standardised residuals \nfrom quadratic model.")
qqline(y=res_model_quad, col = 2, lwd = 1, lty = 2)
acf(res_model_quad, main = "ACF of standardized residuals \nfrom quadratic model.")
par(mfrow=c(1,1))
pacf(res_model_quad, main = "PACF of standardized residuals from quadratic model.")
shapiro.test(res_model_quad)
Shapiro-Wilk normality test
data: res_model_quad
W = 0.98771, p-value = 0.2324
* The time series plot of residuals from the fitted quadratic model showed a pattern/trend, indicating that the residuals were not distributed as white noise and that the quadratic model may not be adequate.
* The histogram of standardised residuals showed fewer large residuals (beyond +/-3) than for the linear model, and the distribution of the residuals displayed a very slight negative skew.
* The Q-Q plot of standardised residuals showed deviations from normality at the lower end, but the deviations were smaller than for the linear model.
* The sample Autocorrelation Function (ACF) plot showed significant autocorrelations at the first, second and fourth lags, as well as a borderline significant autocorrelation at the third lag, and significant autocorrelations at later lags.
* The sample Partial Autocorrelation Function (PACF) plot showed significant partial autocorrelations at the first, second and third lags, as well as significant partial autocorrelations at the 6th, 7th and 9th lags, as observed in the residual analysis of the linear model.
* Finally, the Shapiro-Wilk test resulted in a p-value of 0.232, indicating there was insufficient evidence to reject the null hypothesis of normality at the α=0.05 level. The standardised residuals were therefore treated as approximately normally distributed.
The residual analysis of the fitted quadratic model indicated that the residuals were normally distributed, but there was a trend present in the standardised residuals, suggesting that the quadratic model was capturing more information about the ASX closing price series than the linear model, but that there was still information about the ASX closing price series (a trend) that had not been captured by the quadratic model.
Nonetheless, as the residuals were normally distributed (unlike the residuals from the linear model), the residual analysis confirmed that the quadratic model was a more suitable model for the ASX closing price series than the linear model.
As the ASX closing price series appeared to exhibit seasonality, a seasonal model was fitted to the ASX closing price series to assess its suitability. The seasonal model was built using the ASX series formatted as a time series object with a frequency of 5, with the ‘seasons’ assigned to a variable called ‘day.’.
day. <- season(ASX_ts_f, 1:5)
day.
[1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4
[45] 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3
[89] 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2
[133] 3 4 5 1 2 3 4 5 1 2 3 4
Levels: 1 2 3 4 5
model_seasonal <- lm(ASX_ts_f ~ day. -1) # -1 removes the intercept term, so you can see 'day'
summary(model_seasonal)
Call:
lm(formula = ASX_ts_f ~ day. - 1)
Residuals:
Min 1Q Median 3Q Max
-58.912 -4.710 5.935 9.830 19.643
Coefficients:
Estimate Std. Error t value Pr(>|t|)
day.1 75.883 3.070 24.72 <2e-16 ***
day.2 75.671 3.070 24.65 <2e-16 ***
day.3 75.100 3.070 24.46 <2e-16 ***
day.4 74.379 3.070 24.23 <2e-16 ***
day.5 76.035 3.124 24.34 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 16.53 on 139 degrees of freedom
Multiple R-squared: 0.9557, Adjusted R-squared: 0.9541
F-statistic: 599.3 on 5 and 139 DF, p-value: < 2.2e-16
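The seasonal model was significant at the α=0.05 level (p < 0.05), as were the coefficients for all days. However, because the model was fitted without an intercept (‘-1’), these tests only show that the mean closing price for each day differs from zero, and R computes the R-squared relative to zero rather than relative to the series mean. The adjusted R-squared of 0.9541 is therefore inflated and does not mean that 95.4% of the variation in closing price was explained by the day of the week. The residual standard error of 16.53 is close to the standard deviation of the series itself (√266.01 ≈ 16.31), and the five day coefficients are nearly identical (74.38 to 76.04), suggesting that the seasonal model explains very little of the variation and would be of limited use in predicting future observations.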
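One point worth checking when reading the R-squared of a model fitted with ‘-1’: for a model without an intercept, ‘summary()’ measures R-squared against zero instead of against the mean of the response. The sketch below illustrates this on simulated data (an assumption for illustration; the variable names are not from the original analysis). The two fits have identical fitted values but very different R-squared values.

```r
# Simulated prices around 75 with NO real day effect (NOT the assignment data)
set.seed(3)
n   <- 144
day <- factor(rep(1:5, length.out = n))
y   <- 75 + rnorm(n, sd = 16)

fit_no_int <- lm(y ~ day - 1)  # one mean per day, intercept removed
fit_int    <- lm(y ~ day)      # same fit, parameterised with an intercept

r2_no_int <- summary(fit_no_int)$r.squared
r2_int    <- summary(fit_int)$r.squared

# Reproduce both values by hand from the same residual sum of squares
rss          <- sum(residuals(fit_no_int)^2)
r2_uncentred <- 1 - rss / sum(y^2)              # baseline: zero
r2_centred   <- 1 - rss / sum((y - mean(y))^2)  # baseline: the mean of y

print(c(no_intercept = r2_no_int, with_intercept = r2_int))
```

Because the prices sit far from zero, the uncentred version is close to 1 even though the day factor explains almost nothing. Refitting with the intercept, or comparing residual standard errors, gives a figure that is comparable across models.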
A time series plot of the fitted seasonal model was then produced to assess the model’s suitability, visually.
plot(ts(fitted(model_seasonal)), type="l", xlab='Day', ylab='Closing Price (AUD)',
ylim = c(min(c(fitted(model_seasonal), as.vector(ASX_ts_f))),
max(c(fitted(model_seasonal), as.vector(ASX_ts_f)))),
main = "Fitted seasonal model to ASX closing price series.", lty=2, col="red")
lines(as.vector(ASX_ts_f),type="o")
From visual inspection, it appeared that the fitted seasonal model was less suitable than the quadratic model for the ASX closing price series, as the seasonal model failed to capture the trends in the ASX closing price series.
A residual analysis of the suitability of the seasonal model was then conducted.
res_model_seasonal = rstudent(model_seasonal)
par(mfrow=c(2,2))
plot(y=res_model_seasonal, x=as.vector(time(ASX_ts_f)), type='l',
xlab='Week', ylab='Standardized Residuals', main="Standardised residuals \nfrom seasonal model.")
hist(res_model_seasonal, xlab='Standardized Residuals', main="Histogram of standardised residuals \nfrom seasonal model.")
qqnorm(y=res_model_seasonal, main = "QQ plot of standardised residuals \nfrom seasonal model.")
qqline(y=res_model_seasonal, col = 2, lwd = 1, lty = 2)
acf(res_model_seasonal, main = "ACF of standardized residuals \nfrom seasonal model.")
par(mfrow=c(1,1))
pacf(res_model_seasonal, main = "PACF of standardized residuals from seasonal model.")
shapiro.test(res_model_seasonal)
Shapiro-Wilk normality test
data: res_model_seasonal
W = 0.79649, p-value = 7.213e-13
The residual analysis indicated that the residuals were not normally distributed.
* The histogram of standardised residuals showed there was at least one large residual (beyond +/-3), and that the distribution of the residuals displayed a very strong negative skew.
* The Q-Q plot of standardised residuals showed a large deviation from normality at the lower end, with the deviations being much larger than in the Q-Q plot of the quadratic model.
* The sample Autocorrelation Function (ACF) plot showed many significant autocorrelations, decaying as the lag increased. A wave pattern in the lagged residuals was also observed.
* The sample Partial Autocorrelation Function (PACF) plot showed significant partial autocorrelations at the 1st and 3rd lags, as well as a significant partial autocorrelation at the 9th lag. A wave pattern in the lagged residuals was also observed, indicating that there was information in the residuals that had not been captured by the seasonal model.
* Finally, the Shapiro-Wilk test resulted in a p-value less than 0.05, indicating there was sufficient evidence to reject the null hypothesis at the α=0.05 level and conclude that the standardised residuals were not normally distributed.
When the seasonal model was fitted to the ASX closing price series, it appeared that the seasonal model was a poor estimator of the ASX closing price series as it failed to capture the trend in the data. The residual analysis confirmed this, as the standardised residuals from the fitted seasonal model were not normally distributed and had many significant autocorrelations, unlike the quadratic model.
As the quadratic model captured the trend in the ASX closing price series, but not the seasonality, a seasonal quadratic model was fitted to the ASX closing price series to understand whether it was a better model of the ASX closing price series.
day. <- season(ASX_ts_f, 1:5)
t <- time(ASX_ts_f)
t2 <- t^2
model_seasonal_quad <- lm(ASX_ts_f ~ day. + t + t2 -1)
# -1 removes the intercept term, so you can see 'day'
summary(model_seasonal_quad)
Call:
lm(formula = ASX_ts_f ~ day. + t + t2 - 1)
Residuals:
Min 1Q Median 3Q Max
-17.3589 -3.3848 -0.8009 3.9543 12.6022
Coefficients:
Estimate Std. Error t value Pr(>|t|)
day.1 67.468986 1.859123 36.29 <2e-16 ***
day.2 67.507015 1.864856 36.20 <2e-16 ***
day.3 67.199778 1.870226 35.93 <2e-16 ***
day.4 66.755692 1.875235 35.60 <2e-16 ***
day.5 67.184583 1.903103 35.30 <2e-16 ***
t 3.940973 0.242492 16.25 <2e-16 ***
t2 -0.171866 0.007701 -22.32 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.709 on 137 degrees of freedom
Multiple R-squared: 0.9948, Adjusted R-squared: 0.9945
F-statistic: 3736 on 7 and 137 DF, p-value: < 2.2e-16
The seasonal quadratic model was significant at the α=0.05 level (p < 0.05), as were the coefficients for all days, and the linear and quadratic components. The adjusted R-squared of 0.9945 was very high, but because the intercept was removed (‘-1’), R computes it relative to zero rather than relative to the series mean, so it overstates the fit and cannot be compared directly with the adjusted R-squared of the quadratic model (0.8807). The residual standard error (5.709) was marginally larger than that of the quadratic model (5.634), suggesting the day terms added parameters without improving the fit, which raises a concern about overfitting. A time series plot of the fitted seasonal quadratic model was produced to visually assess the model’s suitability.
plot(ts(fitted(model_seasonal_quad)), type="l", xlab='Day', ylab='Closing Price (AUD)',
ylim = c(min(c(fitted(model_seasonal_quad), as.vector(ASX_ts_f))),
max(c(fitted(model_seasonal_quad), as.vector(ASX_ts_f)))),
main = "Fitted seasonal quadratic model to ASX closing price series", lty=2, col="red")
lines(as.vector(ASX_ts_f),type="o")
The fitted seasonal quadratic model appeared more suitable than the seasonal model as it captured some of the trend in the ASX closing price series. It also appeared to capture some of the seasonality in the ASX closing price series, suggesting it may also be more suitable than the quadratic model.
To further assess the suitability of the seasonal quadratic model, a residual analysis was conducted with the following outputs.
res_model_ssnl_quad = rstudent(model_seasonal_quad)
par(mfrow=c(2,2))
plot(y=res_model_ssnl_quad, x=as.vector(time(ASX_ts_f)), type='l',
xlab='Week', ylab='Standardized Residuals', main="Standardised residuals \nfrom seasonal quadratic model.")
hist(res_model_ssnl_quad, xlab='Standardized Residuals', main="Histogram of standardised residuals \nfrom seasonal quadratic model.")
qqnorm(y=res_model_ssnl_quad, main = "QQ plot of standardised residuals \nfrom seasonal quadratic model.")
qqline(y=res_model_ssnl_quad, col = 2, lwd = 1, lty = 2)
acf(res_model_ssnl_quad, main = "ACF of standardized residuals \nfrom seasonal quadratic model.")
par(mfrow=c(1,1))
pacf(res_model_ssnl_quad, main = "PACF of standardized residuals from seasonal quadratic model.")
shapiro.test(res_model_ssnl_quad)
Shapiro-Wilk normality test
data: res_model_ssnl_quad
W = 0.98737, p-value = 0.2142
The residual analysis indicated that the residuals were approximately normally distributed.
* The histogram of standardised residuals showed there were few (if any) large standardised residuals (beyond +/-3), and that the distribution of the standardised residuals was very slightly negatively skewed.
* The Q-Q plot of standardised residuals showed deviations from normality at the lower and upper ends, but the deviations were smaller than for the linear model.
* The sample Autocorrelation Function (ACF) plot showed significant autocorrelations at the first, second and fourth lags, as well as a borderline significant autocorrelation at the third lag, and significant autocorrelations at later lags.
* The sample Partial Autocorrelation Function (PACF) plot showed significant partial autocorrelations at the first, second and third lags, as well as significant partial autocorrelations at the 6th, 7th and 9th lags, like the quadratic model.
* The Shapiro-Wilk test resulted in a p-value of 0.214, indicating there was insufficient evidence to reject the null hypothesis of normality at the α=0.05 level. The standardised residuals were therefore treated as approximately normally distributed.
The standardised residuals from the fitted seasonal quadratic model were very similar to those of the quadratic model, making it hard to determine which model was a better fit for the ASX closing price series. The adjusted R-squared of the seasonal quadratic model (0.9945) could not be used to separate them, as it was inflated by the removal of the intercept, while the residual standard errors were almost identical (5.709, compared with 5.634 for the quadratic model). As the four additional seasonal parameters did not reduce the residual error, the simpler quadratic model appeared to be the more suitable model for the ASX closing price series; the extra terms risked overfitting the data, which would make the seasonal quadratic model less accurate at forecasting future observations.
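When two candidate models are nested, as the quadratic and seasonal quadratic models are, they can also be compared directly. The sketch below is an illustration on simulated data (a quadratic trend plus noise with no day effect; not the assignment data, and not a step taken in the original analysis). It compares residual standard errors, runs a nested F-test with ‘anova()’, and reports AIC. The seasonal model keeps its intercept here so that all summary statistics are on the same footing.

```r
# Simulated series: quadratic trend plus noise, no day effect
# (NOT the assignment data; coefficients chosen only for illustration)
set.seed(4)
n   <- 144
t   <- seq_len(n)
t2  <- t^2
day <- factor(rep(1:5, length.out = n))
y   <- 70 + 0.73 * t - 0.0069 * t2 + rnorm(n, sd = 5.6)

fit_quad          <- lm(y ~ t + t2)
fit_seasonal_quad <- lm(y ~ day + t + t2)  # intercept kept

# Residual standard errors on the same scale
sigma_quad          <- summary(fit_quad)$sigma
sigma_seasonal_quad <- summary(fit_seasonal_quad)$sigma

# F-test of whether the 4 extra day parameters improve the fit
nested_test <- anova(fit_quad, fit_seasonal_quad)

# Information criteria penalise the extra parameters
ic <- AIC(fit_quad, fit_seasonal_quad)

print(c(quadratic = sigma_quad, seasonal_quadratic = sigma_seasonal_quad))
print(nested_test)
print(ic)
```

A large p-value in the F-test, or a higher AIC for the larger model, would support keeping the simpler quadratic model.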
A Cosine model was fitted to the ASX closing price series to assess the suitability of a model with cosine and sine terms in modelling the ASX closing price series.
har. <- harmonic(ASX_ts_f, 1) # calculate cos(2*pi*t) and sin(2*pi*t)
model_cosine <- lm(ASX_ts_f ~ har.)
summary(model_cosine)
Call:
lm(formula = ASX_ts_f ~ har.)
Residuals:
Min 1Q Median 3Q Max
-59.331 -4.276 5.759 9.805 19.881
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 75.4107 1.3681 55.120 <2e-16 ***
har.cos(2*pi*t) 0.7308 1.9293 0.379 0.705
har.sin(2*pi*t) 0.0369 1.9403 0.019 0.985
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 16.42 on 141 degrees of freedom
Multiple R-squared: 0.00102, Adjusted R-squared: -0.01315
F-statistic: 0.07195 on 2 and 141 DF, p-value: 0.9306
The cosine model was not significant at the α=0.05 level (p > 0.05), and the coefficients of the sine and cosine terms were also not significant (p > 0.05). The R-squared of 0.001 indicated that less than 1% of the variation in the ASX closing price series was accounted for by the cosine model.
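To see why a single cosine/sine pair cannot follow a trend, the regressors can be built by hand. The sketch below assumes a simulated series with frequency 5 (not the assignment data) and constructs cos(2πt) and sin(2πt) from ‘time()’, which is what the coefficient names in the summary above suggest ‘harmonic()’ computes. The fitted values repeat exactly every 5 observations.

```r
# Simulated stand-in with 5 observations per "week" (NOT the assignment data)
set.seed(5)
x <- ts(80 + cumsum(rnorm(144)), frequency = 5)

# Time in "weeks": 1.0, 1.2, 1.4, ... so one unit spans 5 observations
tt <- as.numeric(time(x))

# First harmonic pair, built by hand
cos_term <- cos(2 * pi * tt)
sin_term <- sin(2 * pi * tt)

fit_harmonic <- lm(as.numeric(x) ~ cos_term + sin_term)

# The fitted curve is the same in every week, whatever the level of the series
fitted_vals <- unname(fitted(fit_harmonic))
print(round(fitted_vals[1:10], 3))
```

Since the fitted curve is identical in every week, it can only describe a repeating within-week pattern around a constant level, which is consistent with the very low R-squared reported for the cosine model.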
A time series plot of the fitted cosine model was produced to visually assess the model’s suitability.
plot(ts(fitted(model_cosine)), xlab='Day', ylab='Closing Price (AUD)',
ylim = c(min(c(fitted(model_cosine), as.vector(ASX_ts_f))),
max(c(fitted(model_cosine), as.vector(ASX_ts_f)))),
main = "Fitted harmonic curve to ASX closing price series", type="l",lty=2,col="red")
lines(as.vector(ASX_ts_f),type="o")
The plot clearly shows that the fitted cosine model is a poor fit for the ASX closing price series, as it does not capture the trend in the data.
As it was clear from Figure 20 that the cosine model was less suitable than the quadratic model for the ASX closing price series, a residual analysis of the cosine model was not conducted.
Furthermore, as the coefficients of the cosine and sine terms in the cosine model were not significant, a harmonic quadratic model was not fitted.
The quadratic model was selected as the most appropriate model for the ASX closing price series as the model captured the trend in the ASX closing price series, the standardised residuals from the fitted quadratic model were approximately normally distributed, and the adjusted R-squared was large (0.8807), without causing major concern for overfitting. The other models were deemed unsuitable for modelling the ASX closing price series for the following reasons:
* The minimum variance estimator, seasonal model, and the cosine model did not capture the trend in the ASX closing price series.
* The linear model, when fitted to the ASX series, produced residuals that were not normally distributed.
* The seasonal quadratic model added four parameters without reducing the residual standard error (5.709, compared with 5.634 for the quadratic model), so the extra terms were likely to be overfitting the ASX closing price data, which would make it a poorer predictor of future closing prices. Its very large adjusted R-squared (0.9945) was inflated by the removal of the intercept and was not comparable with that of the quadratic model.
With the quadratic model identified as the most appropriate model for the ASX closing price series, the model was used to predict the next 5 observations.
Using the best model to predict the next 5 observations in the ASX closing price series yielded the following forecasts and lower/upper prediction limits:
forecast_model <- model_quad # assign the 'best' model to forecast_model
h <- 5 # 5 steps ahead forecasts
n <- length(ASX_ts) # number of observed days (144)
t <- seq(n + 1, n + h, 1) # forecast time points on the daily scale used to fit model_quad
t2 <- t^2
new <- data.frame(t, t2)
new
# To run the predict() function properly, the names of variables in the fitted model
# and "new" data frame must be the same.
forecasts = predict(forecast_model, new, interval = "prediction")
# Here interval argument shows the prediction interval
print(forecasts)
fit lwr upr
1 32.03113 20.53974 43.52253
2 30.76371 19.25266 42.27477
3 29.48255 17.95092 41.01417
4 28.18763 16.63449 39.74077
5 26.87896 15.30335 38.45458
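As a check on the point forecasts, they can be reproduced by hand from the coefficients in the quadratic model summary. The sketch below plugs days 145 to 149 into the fitted equation; small differences from the ‘predict()’ output are expected because the coefficients are taken as printed (rounded) in the summary.

```r
# Coefficients as printed in the quadratic model summary
b0 <- 70.2777995   # intercept
b1 <- 0.7330056    # coefficient of t
b2 <- -0.0068743   # coefficient of t^2

# The 5 days following the 144 observed days
t_new <- 145:149

# Point forecasts: b0 + b1 * t + b2 * t^2
point_forecast <- b0 + b1 * t_new + b2 * t_new^2

print(round(point_forecast, 2))
```

The values agree with the ‘fit’ column above to two decimal places; the ‘lwr’ and ‘upr’ columns additionally reflect the residual variance and the uncertainty in the estimated coefficients.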
plot(ASX_ts, xlim = c(1,160), ylim = c(0, 100), xlab='Day', ylab = "Closing Price (AUD)",
main = "ASX closing price series with forecasts from best model.")
# We need to convert forecasts to time series object starting from the first
# time steps-ahead to be able to use plot function. We do this for all columns of forecasts
# The first forecast is for day 145, i.e. one step after the last observation
lines(ts(as.vector(forecasts[,1]), start = length(ASX_ts) + 1), col="red", type="l")
lines(ts(as.vector(forecasts[,2]), start = length(ASX_ts) + 1), col="blue", type="l")
lines(ts(as.vector(forecasts[,3]), start = length(ASX_ts) + 1), col="blue", type="l")
legend("topright", lty=1, pch=1, col=c("black","blue","red"),
text.width = 30, c("Data","95% forecast limits", "Forecasts"))
Although the forecasts in Figure 21 appear in line with the downward trend of the ASX closing price series, they have clear limitations. Because the quadratic model contains only a smooth trend, the forecasts do not display the seasonality of the ASX series, nor do they capture the increasing variation, and they simply continue the fitted curve. While the quadratic model could be a useful guide for forecasting future values of the ASX closing price series, it is recommended as a guide only, and not as a reliable predictor of future prices, as it fails to capture the seasonality and increasing variation in the ASX series.
The objective of the analysis, model fitting and forecasting conducted was to find the best fitting model for the ASX closing price series and predict the next 5 observations.
Analysis of the ASX closing price series revealed a number of time series elements that could make modelling and prediction challenging, including seasonality, changing variation, and a change point in the ASX series.
Model fitting identified the best model as the quadratic model, as it was significant, captured the trend in the ASX series, had residuals that were approximately normally distributed, and had an adjusted R-squared that was large without causing serious concern for overfitting.
When the quadratic model was used to forecast the next 5 observations in the ASX closing price series, the forecasted values were in line with the downward trend observed after the change point in the ASX series. However, the forecasts did not capture the seasonal pattern or the increasing variation in the ASX series. As such, it was recommended that the quadratic model and its forecasted observations be used as a guide only, as the actual values of future observations would likely deviate from the predicted values.