1.0 BACKGROUND

Time series analysis is a key tool for analyzing and forecasting trends, patterns, and anomalies in data over time. In this assignment, we analyze the yearly Global Land Temperature Anomalies dataset, which reports temperature deviations relative to the 1901-2000 base period. The dataset spans the years 1850 to 2023 and is sourced from the NOAA National Centers for Environmental Information.

1.0.1 Objective

The primary objective of this study is to perform a comprehensive time series analysis of the Global Land Temperature Anomalies dataset. We also intend to achieve the following:

- Conduct descriptive analysis to understand the overall trends and patterns in the temperature anomalies data.
- Propose and evaluate a set of possible ARIMA(p, d, q) models using various model specification tools.
- Identify the best model using goodness-of-fit metrics (AIC, BIC, MSE, etc.).

1.0.2 Programming language to be used

The assignment is written in the R programming language, with descriptive analysis and diagnostic assessments interspersed throughout the R Markdown template used.

1.0.3 Loading required Libraries for Time series

# Load the necessary libraries required for the assignment.

library(tseries) # Time series analysis.
library(readr) # Data import and parsing.
library(ggplot2) # Data visualization and chart creation.

library(dplyr) # data manipulation and transformation
library(forecast) # Time series forecasting.
library(TSA) # Time series analysis and modeling.
library(expsmooth) # Exponential smoothing for time series.
library(lmtest) # Linear regression model testing.

1.1 DATA PREPROCESSING AND DESCRIPTIVE ANALYSIS

The Global Land Temperature Anomalies dataset contains yearly temperature anomalies measured in degrees Celsius relative to the base period of 1901-2000. The dataset covers a time span from 1850 to 2023, providing a comprehensive historical record of global land temperature variations.

1.1.1 Data import and review

# Getting working directory for the analysis
getwd()
## [1] "C:/Users/adeyi/Documents/RMIT classes/Time Series/Assignment 3"
# Read the dataset provided, allowing for the header row in the CSV file
assignment2Data2024 <- read.csv("~/RMIT classes/Time Series/Assignment1/assignment2Data2024.csv", header = TRUE)

# Glimpse columns
glimpse(assignment2Data2024)
## Rows: 174
## Columns: 2
## $ Year    <int> 1850, 1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859, 18…
## $ Anomaly <dbl> -0.06, 0.00, 0.03, 0.01, -0.02, 0.01, -0.10, -0.15, -0.10, 0.0…

The “assignment2Data2024” dataset consists of two columns: “Year” and “Anomaly.” The “Year” column covers the years from 1850 to 2023, providing a comprehensive historical record. On the other hand, the “Anomaly” column represents the yearly Global Land Temperature Anomalies measured in degrees Celsius. These anomalies are calculated against the base period of 1901-2000, indicating deviations from the average temperatures during that time frame.
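As a quick sanity check on this description (a sketch; it assumes only the two columns shown in the glimpse output above), we can verify the year range and confirm that the anomalies average roughly zero over the 1901-2000 base period:

# Verify the year range stated above
stopifnot(min(assignment2Data2024$Year) == 1850,
          max(assignment2Data2024$Year) == 2023)

# Anomalies relative to the 1901-2000 base period should average close to
# zero over that window
mean(assignment2Data2024$Anomaly[assignment2Data2024$Year >= 1901 &
                                   assignment2Data2024$Year <= 2000])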

1.1.2 Data Preprocessing

# Check the class of the 'assignment2Data2024' object
class(assignment2Data2024)
## [1] "data.frame"
# Explore the structure of the 'assignment2Data2024' object
str(assignment2Data2024)
## 'data.frame':    174 obs. of  2 variables:
##  $ Year   : int  1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 ...
##  $ Anomaly: num  -0.06 0 0.03 0.01 -0.02 0.01 -0.1 -0.15 -0.1 0.02 ...
# Display the first few rows of the 'assignment2Data2024' object
head(assignment2Data2024)
##   Year Anomaly
## 1 1850   -0.06
## 2 1851    0.00
## 3 1852    0.03
## 4 1853    0.01
## 5 1854   -0.02
## 6 1855    0.01
# Display the last few rows of the 'assignment2Data2024' object
tail(assignment2Data2024)
##     Year Anomaly
## 169 2018    0.66
## 170 2019    0.74
## 171 2020    0.73
## 172 2021    0.63
## 173 2022    0.67
## 174 2023    0.91
# Check the summary of the dataset
summary(assignment2Data2024)
##       Year         Anomaly        
##  Min.   :1850   Min.   :-0.44000  
##  1st Qu.:1893   1st Qu.:-0.12750  
##  Median :1936   Median : 0.00000  
##  Mean   :1936   Mean   : 0.06218  
##  3rd Qu.:1980   3rd Qu.: 0.23000  
##  Max.   :2023   Max.   : 0.91000

The temperature anomalies in the dataset range from negative values, indicating cooler-than-average temperatures, to positive values, indicating warmer-than-average temperatures. This dataset is intended for time series analysis, which will involve examining trends, patterns, and forecasting of global land temperature anomalies over the years.

1.1.3 Viewing, Inspecting and Converting the Dataset to a Time Series

# Convert the dataset into a time series object using the ts() function
ts_assignment2Data2024 <- ts(assignment2Data2024$Anomaly, start = 1850, end = 2023, frequency = 1)

# Check the structure of the time series object
str(ts_assignment2Data2024)
##  Time-Series [1:174] from 1850 to 2023: -0.06 0 0.03 0.01 -0.02 0.01 -0.1 -0.15 -0.1 0.02 ...
# class of the converted data
class(ts_assignment2Data2024)
## [1] "ts"
# Check the range of the resulting time series
range(ts_assignment2Data2024)
## [1] -0.44  0.91
# Check the summary of the resulting time series
summary(ts_assignment2Data2024)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.44000 -0.12750  0.00000  0.06218  0.23000  0.91000

The summary of the time series object matches that of the original data frame, confirming that the conversion preserved the values. This summary information, covering the range of values and measures of central tendency, is valuable for understanding the characteristics of the series and will assist in further analysis and modeling, particularly time series forecasting.

1.2 DATA VISUALIZATION

1.2.1 Data Visualization - Time series Plot

# Create a time series plot and add axis labels

plot(ts_assignment2Data2024, ylab = 'Anomaly', xlab = 'Year', type ='o', main = "Fig 1: Time series Plot of Global Land Temp. Anomalies", col="blue")

Visual inspection of the plot above shows:

Trend Analysis: The time series plot reveals a progressively upward, positive trend from 1850 to 2023. There were, however, observable downward movements between 1877 and 1910 and again briefly between 1940 and 1950. These downward movements warrant further investigation: they may result from natural climate variability or from human activities. The Industrial Revolution, though it began in the late 18th century, had effects on global temperature anomalies that became more pronounced in the late 19th and early 20th centuries.

Seasonality: Despite apparent mild year-on-year swings, no seasonal pattern is observable over the timeframe considered (1850 - 2023), as expected for annual data.

Changing variance: There is an observable change in variance across the series.

Behavior: The plot suggests Moving Average (MA) characteristics. We will need to investigate further to confirm the presence of any Autoregressive (AR) behavior in our data.

Change Point: The plot shows observable change points.

1.2.2 Data Visualization - Scatter and lagged Scatter Plot

# Create a scatter plot of the time series data
plot(ts_assignment2Data2024, type = "p", pch = 16, col = "blue",
     main = "Fig 2. Scatter Plot of Global Temp Anomalies (1850 - 2023)",
     xlab = "Year", ylab = "Anomaly (Degrees Celsius)")

# Create a Lagged scatter plot of the time series data to show relationship between points
par(mfrow=c(1,1))
plot(y=ts_assignment2Data2024, x=zlag(ts_assignment2Data2024), ylab='Anomaly (Degrees Celsius)', xlab='Previous Year Anomaly (Degrees Celsius)', 
     main = "Fig 3. Lagged Scatter Plot of Global Temp Anomalies (1850 - 2023)")

The Scatter Plot: Shows the individual data points without any connection between them, providing a visual representation of the distribution of the temperature anomalies over time.

The Lagged Scatter Plot: Plots each observation against its lagged counterpart, so each point represents the relationship between the temperature anomaly in a given year and the anomaly in the previous year.

Visually inspecting both scatter plots above, we can see that the data points exhibit a discernible linear trend: they form an upward-sloping pattern from left to right. This indicates a positive correlation between the series and its first lag.

1.2.3 Correlation Test of Scatter Plot

# Read the data into y
y <- ts_assignment2Data2024

# Generate the first lag of the series
x <- zlag(ts_assignment2Data2024)

# Create an index to exclude the first NA value in x if applicable
index <- 2:length(x)

# Calculate the correlation between numerical values in x and y
correlation <- cor(y[index], x[index])

print(correlation)
## [1] 0.9399931

The correlation coefficient of 0.9399931 between the series and its first lag indicates a strong positive linear relationship between the original time series and its lagged version.
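As a cross-check (a sketch), the lag-1 sample autocorrelation reported by acf() should be close to the correlation computed manually above; small differences arise from the slightly different normalisation used by acf().

# Lag-1 autocorrelation: element 1 of $acf is lag 0, element 2 is lag 1
acf(ts_assignment2Data2024, plot = FALSE)$acf[2]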

1.3 STATIONARITY TEST AND VALIDATION

1.3.1. Breusch-Pagan test for heteroscedasticity

# Performing the Breusch-Pagan test for heteroscedasticity
anom <- 1:length(ts_assignment2Data2024)

# Fit a linear regression model with an intercept and the "anom" variable
model <- lm(ts_assignment2Data2024 ~ anom)

# Perform the Breusch-Pagan test for heteroscedasticity
bp_test <- bptest(model)

# Print the test results
print(bp_test)
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 0.11734, df = 1, p-value = 0.7319

The Breusch-Pagan test for heteroscedasticity gives a p-value of 0.7319, which is greater than the typical significance level of 0.05. This indicates that the assumption of constant error variance (homoscedasticity) in the regression model is not violated; the variability of our data is consistent over time.

1.3.2. Data Visualization - ACF Plot

# Plot the ACF and PACF Plot to check for stationarity
par(mfrow=c(1,1))
acf(ts_assignment2Data2024, main = "Fig 4. Autocorrelation Function (ACF)")

pacf(ts_assignment2Data2024, main = "Fig 5. Partial Autocorrelation Function (PACF)")

Autocorrelation Function (ACF): All lags in the ACF plot rise above the dashed (blue) confidence bounds, and the autocorrelations decay only slowly. This slow decay is characteristic of a trending, non-stationary series and indicates a strong autocorrelation pattern across multiple lags: past values of the time series carry substantial information about future values.

Partial Autocorrelation Function (PACF): The first and third lags lie above the confidence bounds. The significant spike at lag 1 indicates a strong correlation between the series and its first lagged value, suggesting the presence of autoregressive behavior. There is also a significant spike at lag 3, after a quick dampening from lag 1; such rapid dampening is consistent with an autoregressive process.

1.3.3. Augmented Dickey-Fuller (ADF) Test

# Perform ADF test
adf.test(ts_assignment2Data2024)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ts_assignment2Data2024
## Dickey-Fuller = -1.1974, Lag order = 5, p-value = 0.9044
## alternative hypothesis: stationary

With a p-value of 0.9044, which is greater than the significance level of 0.05, we fail to reject the null hypothesis of a unit root. The ADF test therefore suggests that the time series is non-stationary, meaning it may exhibit a trend or other non-constant behavior over time.
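As a sensitivity check (a sketch), the lag order k used by the ADF test can be varied; the non-stationarity conclusion should not hinge on a single choice of k:

# Re-run the ADF test across several lag orders and print the p-values
for (k in c(1, 3, 5, 8)) {
  print(suppressWarnings(adf.test(ts_assignment2Data2024, k = k))$p.value)
}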

1.3.4 Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test

# Perform KPSS test
kpss.test(ts_assignment2Data2024)
## 
##  KPSS Test for Level Stationarity
## 
## data:  ts_assignment2Data2024
## KPSS Level = 2.3416, Truncation lag parameter = 4, p-value = 0.01

With a p-value of 0.01, which is less than the typical significance level of 0.05, we reject the null hypothesis of level stationarity. Therefore, based on the KPSS test, there is sufficient evidence to conclude that the time series ts_assignment2Data2024 is not stationary, suggestive of a trend or irregular pattern in the data.
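The kpss.test() call above uses the default null hypothesis of level stationarity. As a complementary check (a sketch), the null can instead be set to stationarity around a deterministic trend:

# KPSS test with trend stationarity as the null hypothesis
kpss.test(ts_assignment2Data2024, null = "Trend")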

1.3.5 Phillips-Perron Unit Root Test

# Perform PP test
pp.test(ts_assignment2Data2024)
## 
##  Phillips-Perron Unit Root Test
## 
## data:  ts_assignment2Data2024
## Dickey-Fuller Z(alpha) = -12.042, Truncation lag parameter = 4, p-value
## = 0.4258
## alternative hypothesis: stationary

With a p-value of 0.4258, which is greater than the typical significance level of 0.05, we fail to reject the null hypothesis of a unit root. Therefore, based on the Phillips-Perron test results, there is insufficient evidence to conclude that the series is stationary; it may exhibit a trend or irregular pattern.

1.3.6 Normality Q-Q Plot with qqline

# Q-Q plot with qqline
qqnorm(y = ts_assignment2Data2024, main = "Fig 6. Q-Q Plot (Global Temp Anomalies)", col = "blue")
qqline(y = ts_assignment2Data2024, col = 2, lwd = 2, lty = 2)

The Q-Q plot (quantile-quantile plot) shows divergence from the reference line at both tails. We can therefore assume that our dataset departs from a normal distribution at the extremes.

1.3.7 Shapiro-Wilk normality test

shapiro.test(ts_assignment2Data2024)
## 
##  Shapiro-Wilk normality test
## 
## data:  ts_assignment2Data2024
## W = 0.94279, p-value = 1.897e-06

With a very low p-value (1.897e-06), far below the typical significance level of 0.05, the Shapiro-Wilk test confirms that our time series ts_assignment2Data2024 is not normally distributed.

1.3.8 Histogram with Density Plot

# Create a histogram with a density plot overlay
hist(ts_assignment2Data2024, breaks = 20, freq = FALSE, main = "Fig 7. Histogram with Density Plot (Global Temp Anomalies)", xlab = "Value", ylab = "Density")
lines(density(ts_assignment2Data2024), col = "blue")

The histogram shows that the distribution of the anomalies is right-skewed.

1.4 ADDRESSING THE NON-STATIONARITY IN OUR TIME SERIES

1.4.1 Box-Cox Transformation

The Box-Cox transformation is typically used to stabilize the variance of a time series and make it more homoscedastic (having constant variance) and closer to a normal distribution.
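For reference, the one-parameter Box-Cox transform of a positive series $y$ is

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\
\log y, & \lambda = 0,
\end{cases}
$$

so $\lambda = 1$ amounts to a simple shift and $\lambda = 0$ corresponds to a log transform.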

# Add a small offset to the data to make it positive
ts_assignment2Data2024_1 <- ts_assignment2Data2024 + abs(min(ts_assignment2Data2024)) + 0.01

# Apply the Box-Cox transformation
BC <- suppressWarnings(BoxCox.ar(y = ts_assignment2Data2024_1, lambda = seq(-3, 3, 0.01)))

# Plot log-likelihood versus lambda
plot(BC$lambda, BC$loglike, type = "l", xlab = "Lambda", ylab = "Log-Likelihood",
     main = "Fig. 8. Log-likelihood Vs Lambda for Box-Cox Transformation")

BC$ci # Values of the first and third vertical lines
## [1] 0.85 1.19
# Find the lambda value corresponding to the maximum log-likelihood
lambda_max <- BC$lambda[which.max(BC$loglike)]

lambda_max
## [1] 1.03

1.4.2 Box-Cox Transformation: Original Data vs Transformed Data Plot

# Original Data Plot
plot(ts_assignment2Data2024, 
     type = "l", 
     col = "blue", 
     xlab = "Year", 
     ylab = "Original Data", 
     main = "Fig. 9. Time Series Plot (Global Temp Anomalies)")

# Box-Cox Transformed Data Plot
BC_ts_assignment2Data2024 <- BoxCox(ts_assignment2Data2024_1, lambda_max)
plot(BC_ts_assignment2Data2024, 
     type = "l", 
     col = "red", 
     xlab = "Year", 
     ylab = "Transformed Data", 
     main = "Fig. 10. Time Series Plot of Box-Cox Transformed Data")

shapiro.test(BC_ts_assignment2Data2024)
## 
##  Shapiro-Wilk normality test
## 
## data:  BC_ts_assignment2Data2024
## W = 0.93871, p-value = 8.765e-07

Despite the application of the Box-Cox transformation (Fig. 10), the improvement in the Shapiro-Wilk normality test was marginal. The p-value of 8.765e-07 remains far below the significance level (α = 0.05), so the transformed series is still not normally distributed.

1.4.3 Q-Q Plot of transformed Data

# Create a QQ plot of the transformed data
qqnorm(BC_ts_assignment2Data2024, main = "Fig. 11. QQ Plot of Transformed Data (Global Temp Anomalies)", col = "red")

# Add a QQ line to the plot
qqline(y = BC_ts_assignment2Data2024, col = 1, lwd = 2, lty = 2)

In the context of a Box-Cox transformation, BC$ci gives the confidence interval for the estimated optimal lambda (λ) parameter. The output 1.03 is the λ value associated with the maximum log-likelihood, and thus the middle vertical line in the Box-Cox plot (Fig. 8). The range [0.85, 1.19] indicates that λ values within this interval yield similar log-likelihood values. Because this interval contains λ = 1, which corresponds to no transformation beyond a shift, the Box-Cox transformation is unlikely to stabilize the variance of our series; we therefore turn to another approach to address the non-stationarity.
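Since λ = 1 lies inside the confidence interval, the selected transform is essentially the identity: by the Box-Cox formula above, BoxCox(y, lambda = 1) is just y - 1, a shift with no variance-stabilizing effect. A quick check (sketch):

# With lambda = 1 the Box-Cox transform reduces to a shift of -1
all.equal(as.numeric(BoxCox(ts_assignment2Data2024_1, lambda = 1)),
          as.numeric(ts_assignment2Data2024_1 - 1))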

1.5 USING DIFFERENCING TO ACHIEVE STATIONARITY

1.5.1 Differencing our Time series

# Applying differencing to the Time series
ts_assignment2Data2024_diff <- diff(ts_assignment2Data2024, differences = 1)

# Plotting the differenced series
plot(ts_assignment2Data2024_diff, ylab = "Anomaly", xlab = "Year",
     main = "Fig. 12. Time series Plot (Global Land Temperature Anomalies)", type="o")

After applying first differencing to the time series data, Fig. 12 shows an observable change in the plot: the differenced series now fluctuates around a constant mean with no visible trend, suggesting stationarity. We will carry out further tests to confirm this observation.
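For clarity, diff() computes the first difference y[t] - y[t-1]; a quick equivalence check (a sketch) against a manual lag subtraction:

# diff() returns y[t] - y[t-1]; verify against manual lag subtraction
y_num <- as.numeric(ts_assignment2Data2024)
all.equal(as.numeric(ts_assignment2Data2024_diff),
          y_num[-1] - y_num[-length(y_num)])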

1.5.2 Confirming Stationarity

# Perform ADF test
adf.test(ts_assignment2Data2024_diff)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ts_assignment2Data2024_diff
## Dickey-Fuller = -7.0878, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

With a p-value of 0.01, which is less than the significance level of 0.05, we reject the null hypothesis. Therefore, based on the ADF test results, we have sufficient evidence to conclude that the differenced time series ts_assignment2Data2024_diff is stationary.

1.5.2.1. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test

# Perform KPSS test
kpss.test(ts_assignment2Data2024_diff)
## 
##  KPSS Test for Level Stationarity
## 
## data:  ts_assignment2Data2024_diff
## KPSS Level = 0.24108, Truncation lag parameter = 4, p-value = 0.1

With a p-value of 0.1, which is greater than the significance level, we fail to reject the null hypothesis of stationarity. Therefore, based on the KPSS test results, there is insufficient evidence to conclude that the differenced time series ts_assignment2Data2024_diff is non-stationary.

1.5.2.2. Phillips-Perron Unit Root Test

# Perform PP test
pp.test(ts_assignment2Data2024_diff)
## 
##  Phillips-Perron Unit Root Test
## 
## data:  ts_assignment2Data2024_diff
## Dickey-Fuller Z(alpha) = -136.61, Truncation lag parameter = 4, p-value
## = 0.01
## alternative hypothesis: stationary

With a p-value of 0.01, which is less than the significance level, we reject the null hypothesis. Therefore, based on the PP test results, we have sufficient evidence to conclude that the differenced time series ts_assignment2Data2024_diff is stationary.

1.6. MODEL SPECIFICATION AND FITTING

ARIMA (AutoRegressive Integrated Moving Average) models provide a widely used framework for time series forecasting. To determine the best model, we will use a combination of model specification tools: the ACF and PACF of the differenced series, the EACF, the BIC table, and the auto.arima function.

1.6.1 Approach 1: Using ACF and PACF of the First-Differenced Time Series

# Plot the ACF of the first differenced time series
acf(ts_assignment2Data2024_diff, main = "Fig. 13. ACF of the First Difference")

# Plot the PACF of the first differenced time series
pacf(ts_assignment2Data2024_diff, main = "Fig 14. PACF of the First Difference")

From Figures 13 and 14, the ACF and PACF plots of the first-differenced series provide insights into its remaining correlation structure and suggest candidate AR and MA orders for the models considered below.

1.6.2. Approach 2: Using the EACF Plot

# Generate EACF plot 
eacf_plot <- eacf(ts_assignment2Data2024_diff, ar.max = 14, ma.max = 14)
## AR/MA
##    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
## 0  o x o o o o o o o o o  o  o  o  o 
## 1  o x o o o o o o o o o  o  o  o  o 
## 2  x o x o o o o o o o o  o  o  o  o 
## 3  x o x o o o o o o o o  o  o  o  o 
## 4  x x x o o o o o o o o  o  o  o  o 
## 5  o x o o o o x o o o o  o  o  o  o 
## 6  x x o x o o x o o o o  o  o  o  o 
## 7  x x o x x o o o o o o  o  o  o  o 
## 8  o x o x o o o o o o o  o  o  o  o 
## 9  o x x x x x x o o o o  o  o  o  o 
## 10 x o x o o o o x o o o  o  o  o  o 
## 11 x x o o o o o x o o o  o  o  o  o 
## 12 x x x o o o o o o o o  o  o  o  o 
## 13 x x o x o o o o o o o  o  o  o  o 
## 14 x x o o o o o o o o o  o  o  o  o

1.6.3. Approach 3: BIC Table

# Generate BIC table for subset ARIMA models
bic_table <- armasubsets(y = ts_assignment2Data2024_diff, nar = 4, nma = 4, y.name = "ts_assignment2Data2024_diff", ar.method = "ols")
## Reordering variables and trying again:
# Plot the BIC table
plot(bic_table)

1.6.4 Approach 4: Using the auto.arima Function

auto.arima(ts_assignment2Data2024, trace = TRUE)
## 
##  Fitting models using approximations to speed things up...
## 
##  ARIMA(2,1,2) with drift         : -344.9453
##  ARIMA(0,1,0) with drift         : -319.6611
##  ARIMA(1,1,0) with drift         : -317.4408
##  ARIMA(0,1,1) with drift         : -320.6917
##  ARIMA(0,1,0)                    : -321.0892
##  ARIMA(1,1,2) with drift         : -345.6942
##  ARIMA(0,1,2) with drift         : -348.364
##  ARIMA(0,1,3) with drift         : -346.9123
##  ARIMA(1,1,1) with drift         : -333.0038
##  ARIMA(1,1,3) with drift         : -349.4323
##  ARIMA(2,1,3) with drift         : -345.3634
##  ARIMA(1,1,4) with drift         : -346.1384
##  ARIMA(0,1,4) with drift         : -348.1132
##  ARIMA(2,1,4) with drift         : -343.2256
##  ARIMA(1,1,3)                    : -348.4185
## 
##  Now re-fitting the best model(s) without approximations...
## 
##  ARIMA(1,1,3) with drift         : -354.8723
## 
##  Best model: ARIMA(1,1,3) with drift
## Series: ts_assignment2Data2024 
## ARIMA(1,1,3) with drift 
## 
## Coefficients:
##           ar1     ma1      ma2      ma3   drift
##       -0.9579  0.8500  -0.5480  -0.4684  0.0048
## s.e.   0.0394  0.0777   0.0827   0.0679  0.0027
## 
## sigma^2 = 0.00718:  log likelihood = 183.69
## AIC=-355.38   AICc=-354.87   BIC=-336.46

The EACF output offers insights into potential ARIMA (AutoRegressive Integrated Moving Average) models by displaying combinations of possible AutoRegressive (AR) and Moving Average (MA) terms. Each row corresponds to a candidate AR order and each column to a candidate MA order; an “o” marks a sample statistic that is not significant (consistent with that ARMA order), while an “x” marks a significant one.

From the output, the following can be taken into consideration:

The observed “o” and “x” patterns in the plot lead to the identification of several candidate ARIMA models, including variations such as ARIMA(3,1,3), ARIMA(3,1,2), ARIMA(2,1,3), etc.

Further analysis considers additional candidate models based on the EACF plot together with other criteria such as the Bayesian Information Criterion (BIC) table and the auto.arima function: candidate models such as ARIMA(0,1,0), ARIMA(1,1,0), ARIMA(1,1,2), ARIMA(0,1,3), and ARIMA(1,1,1) are identified from the “o” marks in the EACF plot, indicating plausible ARIMA terms.

Additional candidate models, such as ARIMA(2,1,2), ARIMA(1,1,4), ARIMA(0,1,4), etc., are derived from the Auto-ARIMA function. In summary, the analysis utilizes the EACF plot, BIC table, and Auto-ARIMA function to identify a range of potential ARIMA models, considering various combinations of AR and MA terms along with differencing orders. These candidate models provide a basis for further evaluation and selection of the most suitable model for the time series data.

1.6.5 Testing our candidate models with the Box-Ljung Test

# Define the candidate models
candidate_models <- list(
  ARIMA_010 = c(0, 1, 0),
  ARIMA_110 = c(1, 1, 0),
  ARIMA_112 = c(1, 1, 2),
  ARIMA_013 = c(0, 1, 3),
  ARIMA_111 = c(1, 1, 1),
  ARIMA_113 = c(1, 1, 3),
  ARIMA_212 = c(2, 1, 2),
  ARIMA_011 = c(0, 1, 1),
  ARIMA_012 = c(0, 1, 2),
  ARIMA_213 = c(2, 1, 3),
  ARIMA_114 = c(1, 1, 4),
  ARIMA_014 = c(0, 1, 4),
  ARIMA_214 = c(2, 1, 4)
)

# Function to perform Box-Ljung test for a given model
perform_box_ljung_test <- function(model_order, ts_assignment2Data2024) {
  model <- Arima(ts_assignment2Data2024, order = model_order)
  residuals <- resid(model)
  
  # Perform Box-Ljung test
  box_ljung_test <- Box.test(residuals, lag = 20, type = "Ljung-Box")
  
  return(box_ljung_test)
}

# Perform Box-Ljung test for each candidate model
box_ljung_test_results <- lapply(candidate_models, function(model_order) {
  perform_box_ljung_test(model_order, ts_assignment2Data2024)
})

# Print the results
names(box_ljung_test_results) <- names(candidate_models)
print(box_ljung_test_results)
## $ARIMA_010
## 
##  Box-Ljung test
## 
## data:  residuals
## X-squared = 60.234, df = 20, p-value = 6.551e-06
## 
## 
## $ARIMA_110
## 
##  Box-Ljung test
## 
## data:  residuals
## X-squared = 60.058, df = 20, p-value = 6.977e-06
## 
## 
## $ARIMA_112
## 
##  Box-Ljung test
## 
## data:  residuals
## X-squared = 30.91, df = 20, p-value = 0.05639
## 
## 
## $ARIMA_013
## 
##  Box-Ljung test
## 
## data:  residuals
## X-squared = 30.677, df = 20, p-value = 0.0596
## 
## 
## $ARIMA_111
## 
##  Box-Ljung test
## 
## data:  residuals
## X-squared = 41.186, df = 20, p-value = 0.003526
## 
## 
## $ARIMA_113
## 
##  Box-Ljung test
## 
## data:  residuals
## X-squared = 23.302, df = 20, p-value = 0.2742
## 
## 
## $ARIMA_212
## 
##  Box-Ljung test
## 
## data:  residuals
## X-squared = 27.924, df = 20, p-value = 0.1112
## 
## 
## $ARIMA_011
## 
##  Box-Ljung test
## 
## data:  residuals
## X-squared = 54.282, df = 20, p-value = 5.253e-05
## 
## 
## $ARIMA_012
## 
##  Box-Ljung test
## 
## data:  residuals
## X-squared = 31.065, df = 20, p-value = 0.05434
## 
## 
## $ARIMA_213
## 
##  Box-Ljung test
## 
## data:  residuals
## X-squared = 23.711, df = 20, p-value = 0.2552
## 
## 
## $ARIMA_114
## 
##  Box-Ljung test
## 
## data:  residuals
## X-squared = 24.025, df = 20, p-value = 0.2413
## 
## 
## $ARIMA_014
## 
##  Box-Ljung test
## 
## data:  residuals
## X-squared = 25.329, df = 20, p-value = 0.1891
## 
## 
## $ARIMA_214
## 
##  Box-Ljung test
## 
## data:  residuals
## X-squared = 23.756, df = 20, p-value = 0.2532

ARIMA_010, ARIMA_110, ARIMA_111, and ARIMA_011 have small p-values (< 0.05), indicating significant autocorrelation remaining in their residuals. The other models have p-values greater than 0.05, suggesting no significant residual autocorrelation at the typical 0.05 level. These results should nevertheless be interpreted cautiously alongside additional diagnostics and model evaluation criteria.
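To scan the Ljung-Box results more easily, the p-values can be gathered into one sorted vector (a sketch using the results object above):

# Collect and sort the Ljung-Box p-values across candidate models
sort(sapply(box_ljung_test_results, function(t) t$p.value))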

1.6.6 Testing our candidate models with Shapiro-Wilk normality test

# Define the candidate models
candidate_models <- list(
  ARIMA_010 = c(0, 1, 0),
  ARIMA_110 = c(1, 1, 0),
  ARIMA_112 = c(1, 1, 2),
  ARIMA_013 = c(0, 1, 3),
  ARIMA_111 = c(1, 1, 1),
  ARIMA_113 = c(1, 1, 3),
  ARIMA_212 = c(2, 1, 2),
  ARIMA_011 = c(0, 1, 1),
  ARIMA_012 = c(0, 1, 2),
  ARIMA_213 = c(2, 1, 3),
  ARIMA_114 = c(1, 1, 4),
  ARIMA_014 = c(0, 1, 4),
  ARIMA_214 = c(2, 1, 4)
)

# Function to perform Shapiro-Wilk test for a given model
perform_shapiro_wilk_test <- function(model_order, ts_assignment2Data2024) {
  model <- Arima(ts_assignment2Data2024, order = model_order)
  residuals <- resid(model)
  
  # Perform Shapiro-Wilk test
  shapiro_wilk_test <- shapiro.test(residuals)
  
  return(shapiro_wilk_test)
}

# Perform Shapiro-Wilk test for each candidate model
shapiro_wilk_test_results <- lapply(candidate_models, function(model_order) {
  perform_shapiro_wilk_test(model_order, ts_assignment2Data2024)
})

# Print the results
names(shapiro_wilk_test_results) <- names(candidate_models)
print(shapiro_wilk_test_results)
## $ARIMA_010
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals
## W = 0.99039, p-value = 0.2921
## 
## 
## $ARIMA_110
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals
## W = 0.99197, p-value = 0.4456
## 
## 
## $ARIMA_112
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals
## W = 0.98344, p-value = 0.03674
## 
## 
## $ARIMA_013
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals
## W = 0.98392, p-value = 0.04241
## 
## 
## $ARIMA_111
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals
## W = 0.99295, p-value = 0.5639
## 
## 
## $ARIMA_113
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals
## W = 0.98233, p-value = 0.02629
## 
## 
## $ARIMA_212
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals
## W = 0.97774, p-value = 0.006826
## 
## 
## $ARIMA_011
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals
## W = 0.9953, p-value = 0.8614
## 
## 
## $ARIMA_012
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals
## W = 0.98277, p-value = 0.02999
## 
## 
## $ARIMA_213
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals
## W = 0.97581, p-value = 0.003958
## 
## 
## $ARIMA_114
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals
## W = 0.97574, p-value = 0.003876
## 
## 
## $ARIMA_014
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals
## W = 0.9775, p-value = 0.00637
## 
## 
## $ARIMA_214
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals
## W = 0.97558, p-value = 0.00371

Based on the p-values:

ARIMA_010, ARIMA_110, ARIMA_111, and ARIMA_011 have large p-values (> 0.05), indicating that their residuals are plausibly normally distributed. The other models have small p-values (< 0.05), suggesting that their residuals may not follow a normal distribution.
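As with the Ljung-Box results, the Shapiro-Wilk p-values can be collected into one sorted vector for easier comparison (a sketch using the results object above):

# Collect and sort the Shapiro-Wilk p-values across candidate models
sort(sapply(shapiro_wilk_test_results, function(t) t$p.value))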

1.6.7 Candidate Models’ AIC and BIC Confirmation Test

fit_arima_models <- function(time_series, arima_orders) {
  models <- list()
  for (order in arima_orders) {
    cat("Fitting model with order:", order, "\n")
    tryCatch({
      model <- arima(time_series, order = order, method = 'CSS-ML')
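      # coeftest() fails for ARIMA(0,1,0) because that model has no
      # coefficients to test; the tryCatch handler below reports the error
      # instead of aborting the loop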
      coef_test <- coeftest(model)
      aic_score <- AIC(model)
      bic_score <- BIC(model)
      models[[paste("ARIMA(", paste(order, collapse = ","), ")", sep = "")]] <- list(model = model, coef_test = coef_test, AIC = aic_score, BIC = bic_score)
      cat("Model fitted successfully!\n")
    }, error = function(e) {
      cat("Error fitting model:", conditionMessage(e), "\n")
    })
  }
  return(models)
}

# Define the list of ARIMA models
arima_orders <- list(
  c(0, 1, 0), c(1, 1, 0), c(1, 1, 2), c(0, 1, 3),
  c(1, 1, 1), c(1, 1, 3), c(2, 1, 2), c(0, 1, 1),
  c(0, 1, 2), c(2, 1, 3), c(1, 1, 4),
  c(0, 1, 4), c(2, 1, 4)
)
ts_assignment2Data2024_models <- fit_arima_models(ts_assignment2Data2024, arima_orders)
## Fitting model with order: 0 1 0 
## Error fitting model: length of 'dimnames' [2] not equal to array extent 
## Fitting model with order: 1 1 0 
## Model fitted successfully!
## Fitting model with order: 1 1 2 
## Model fitted successfully!
## Fitting model with order: 0 1 3 
## Model fitted successfully!
## Fitting model with order: 1 1 1 
## Model fitted successfully!
## Fitting model with order: 1 1 3 
## Model fitted successfully!
## Fitting model with order: 2 1 2 
## Model fitted successfully!
## Fitting model with order: 0 1 1 
## Model fitted successfully!
## Fitting model with order: 0 1 2 
## Model fitted successfully!
## Fitting model with order: 2 1 3 
## Model fitted successfully!
## Fitting model with order: 1 1 4 
## Model fitted successfully!
## Fitting model with order: 0 1 4 
## Model fitted successfully!
## Fitting model with order: 2 1 4 
## Model fitted successfully!
# Accessing the models and their coefficient tests, AIC and BIC scores:
for (model_name in names(ts_assignment2Data2024_models)) {
  cat("Model:", model_name, "\n")
  cat("Coefficient test:\n")
  print(ts_assignment2Data2024_models[[model_name]]$coef_test)
  cat("AIC:", ts_assignment2Data2024_models[[model_name]]$AIC, "\n")
  cat("BIC:", ts_assignment2Data2024_models[[model_name]]$BIC, "\n\n")
}
## Model: ARIMA(1,1,0) 
## Coefficient test:
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)
## ar1 -0.052520   0.077271 -0.6797   0.4967
## 
## AIC: -324.3068 
## BIC: -318.0002 
## 
## Model: ARIMA(1,1,2) 
## Coefficient test:
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1  0.070052   0.155949  0.4492    0.6533    
## ma1 -0.207737   0.136254 -1.5246    0.1274    
## ma2 -0.366972   0.067424 -5.4428 5.245e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## AIC: -350.525 
## BIC: -337.9119 
## 
## Model: ARIMA(0,1,3) 
## Coefficient test:
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ma1 -0.121301   0.086912 -1.3957    0.1628    
## ma2 -0.377080   0.061660 -6.1155 9.624e-10 ***
## ma3 -0.051447   0.083107 -0.6191    0.5359    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## AIC: -350.7104 
## BIC: -338.0972 
## 
## Model: ARIMA(1,1,1) 
## Coefficient test:
## 
## z test of coefficients:
## 
##      Estimate Std. Error  z value  Pr(>|z|)    
## ar1  0.503412   0.111791   4.5031 6.696e-06 ***
## ma1 -0.784973   0.073598 -10.6657 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## AIC: -338.5562 
## BIC: -329.0963 
## 
## Model: ARIMA(1,1,3) 
## Coefficient test:
## 
## z test of coefficients:
## 
##      Estimate Std. Error  z value  Pr(>|z|)    
## ar1 -0.958563   0.039123 -24.5010 < 2.2e-16 ***
## ma1  0.870269   0.077771  11.1902 < 2.2e-16 ***
## ma2 -0.510305   0.082912  -6.1548 7.517e-10 ***
## ma3 -0.449688   0.067245  -6.6873 2.274e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## AIC: -354.6225 
## BIC: -338.856 
## 
## Model: ARIMA(2,1,2) 
## Coefficient test:
## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)  
## ar1  0.13050    0.17423  0.7490  0.45386  
## ar2 -0.29632    0.16652 -1.7795  0.07516 .
## ma1 -0.25890    0.17914 -1.4452  0.14841  
## ma2 -0.12000    0.17013 -0.7053  0.48061  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## AIC: -351.3795 
## BIC: -335.6131 
## 
## Model: ARIMA(0,1,1) 
## Coefficient test:
## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)  
## ma1 -0.27808    0.14943 -1.8609  0.06275 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## AIC: -326.4765 
## BIC: -320.1699 
## 
## Model: ARIMA(0,1,2) 
## Coefficient test:
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ma1 -0.154685   0.067338 -2.2971   0.02161 *  
## ma2 -0.377000   0.062068 -6.0739 1.248e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## AIC: -352.3256 
## BIC: -342.8657 
## 
## Model: ARIMA(2,1,3) 
## Coefficient test:
## 
## z test of coefficients:
## 
##     Estimate Std. Error z value  Pr(>|z|)    
## ar1 -0.76412    0.25815 -2.9599 0.0030771 ** 
## ar2 -0.28008    0.17716 -1.5809 0.1138919    
## ma1  0.64794    0.24091  2.6895 0.0071561 ** 
## ma2 -0.24227    0.15730 -1.5402 0.1235162    
## ma3 -0.40488    0.11319 -3.5769 0.0003477 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## AIC: -351.9828 
## BIC: -333.0631 
## 
## Model: ARIMA(1,1,4) 
## Coefficient test:
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1 -0.388706   0.281550 -1.3806   0.16740    
## ma1  0.275135   0.278312  0.9886   0.32287    
## ma2 -0.479540   0.076914 -6.2348 4.524e-10 ***
## ma3 -0.226674   0.131130 -1.7286   0.08388 .  
## ma4  0.135023   0.086431  1.5622   0.11824    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## AIC: -351.807 
## BIC: -332.8872 
## 
## Model: ARIMA(0,1,4) 
## Coefficient test:
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ma1 -0.102776   0.076979 -1.3351   0.18184    
## ma2 -0.441424   0.077029 -5.7306 1.001e-08 ***
## ma3 -0.078958   0.075397 -1.0472   0.29500    
## ma4  0.147710   0.074192  1.9909   0.04649 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## AIC: -352.714 
## BIC: -336.9475 
## 
## Model: ARIMA(2,1,4) 
## Coefficient test:
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)
## ar1 -0.636106   0.831280 -0.7652   0.4441
## ar2 -0.200243   0.528641 -0.3788   0.7048
## ma1  0.519642   0.829833  0.6262   0.5312
## ma2 -0.310154   0.453821 -0.6834   0.4943
## ma3 -0.342797   0.398842 -0.8595   0.3901
## ma4  0.045398   0.289425  0.1569   0.8754
## 
## AIC: -349.9966 
## BIC: -327.9236

We use the hybrid “CSS-ML” estimation method: Conditional Sum of Squares (CSS) is first used to obtain starting values for the parameters, which are then refined by full Maximum Likelihood (ML) estimation. This combines the computational efficiency of CSS with the statistical efficiency of ML, and generally yields more reliable parameter estimates and a better model fit than either method alone.
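As a quick illustration (a sketch using one candidate model; any of the orders above would do), the three estimation methods can be compared directly, and their coefficient estimates should be very similar for this series:

# Compare the three estimation methods on the ARIMA(0,1,1) candidate
fit_css   <- arima(ts_assignment2Data2024, order = c(0, 1, 1), method = "CSS")
fit_ml    <- arima(ts_assignment2Data2024, order = c(0, 1, 1), method = "ML")
fit_cssml <- arima(ts_assignment2Data2024, order = c(0, 1, 1), method = "CSS-ML")
c(CSS = coef(fit_css), ML = coef(fit_ml), CSS_ML = coef(fit_cssml))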

Based on the AIC and BIC values above, ARIMA(1,1,3) attains the lowest AIC (-354.62), consistent with the auto.arima selection of ARIMA(1,1,3) with drift, while ARIMA(0,1,2) attains the lowest BIC (-342.87). Simpler candidates such as ARIMA(1,1,0) and ARIMA(0,1,1) have noticeably higher AIC and BIC values, so any preference for them must rest on parsimony and residual diagnostics rather than on these information criteria alone.
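To make this comparison easier to scan, the AIC and BIC scores already stored in ts_assignment2Data2024_models can be gathered into a single table sorted by AIC (a small sketch using the objects created above):

# Collect AIC/BIC scores from the fitted models into one sorted table
aic_bic <- data.frame(
  Model = names(ts_assignment2Data2024_models),
  AIC = sapply(ts_assignment2Data2024_models, function(m) m$AIC),
  BIC = sapply(ts_assignment2Data2024_models, function(m) m$BIC)
)
aic_bic[order(aic_bic$AIC), ]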

1.6.8. Candidate Models’ Accuracy Table Confirmation

# Define the candidate models
candidate_models <- list(
  ARIMA_010 = c(0, 1, 0),
  ARIMA_110 = c(1, 1, 0),
  ARIMA_112 = c(1, 1, 2),
  ARIMA_013 = c(0, 1, 3),
  ARIMA_111 = c(1, 1, 1),
  ARIMA_113 = c(1, 1, 3),
  ARIMA_212 = c(2, 1, 2),
  ARIMA_011 = c(0, 1, 1),
  ARIMA_012 = c(0, 1, 2),
  ARIMA_213 = c(2, 1, 3),
  ARIMA_114 = c(1, 1, 4),
  ARIMA_014 = c(0, 1, 4),
  ARIMA_214 = c(2, 1, 4)
)

# Function to compute accuracy metrics for a given model
compute_accuracy <- function(model_order, ts_assignment2Data2024) {
  tryCatch({
    model <- Arima(ts_assignment2Data2024, order = model_order)
    accuracy_metrics <- accuracy(model)
    return(accuracy_metrics)
  }, error = function(e) {
    return(rep(NA, 7)) # Return NA values for all metrics in case of error
  })
}

# Compute accuracy metrics for each candidate model
accuracy_results <- lapply(candidate_models, function(model_order) {
  compute_accuracy(model_order, ts_assignment2Data2024)
})

# Create a table of accuracy results
accuracy_table <- tibble(
  Model = names(candidate_models),
  ME = sapply(accuracy_results, function(x) x[1]),
  RMSE = sapply(accuracy_results, function(x) x[2]),
  MAE = sapply(accuracy_results, function(x) x[3]),
  MPE = sapply(accuracy_results, function(x) x[4]),
  MAPE = sapply(accuracy_results, function(x) x[5]),
  MASE = sapply(accuracy_results, function(x) x[6]),
  ACF1 = sapply(accuracy_results, function(x) x[7])
)

# Print the accuracy table
print(accuracy_table)
## # A tibble: 13 × 8
##    Model          ME   RMSE    MAE   MPE  MAPE  MASE     ACF1
##    <chr>       <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>
##  1 ARIMA_010 0.00557 0.0935 0.0714   NaN   Inf 0.994 -0.0536 
##  2 ARIMA_110 0.00579 0.0934 0.0716   NaN   Inf 0.997 -0.0241 
##  3 ARIMA_112 0.0104  0.0855 0.0646   NaN   Inf 0.899 -0.00992
##  4 ARIMA_013 0.0105  0.0855 0.0646   NaN   Inf 0.899 -0.0222 
##  5 ARIMA_111 0.0108  0.0891 0.0687   NaN   Inf 0.956  0.0915 
##  6 ARIMA_113 0.0102  0.0839 0.0637   NaN   Inf 0.887 -0.0240 
##  7 ARIMA_212 0.00911 0.0848 0.0635   NaN   Inf 0.884 -0.00650
##  8 ARIMA_011 0.00716 0.0928 0.0718   NaN   Inf 1.00   0.101  
##  9 ARIMA_012 0.0102  0.0856 0.0645   NaN   Inf 0.898  0.00343
## 10 ARIMA_213 0.00974 0.0841 0.0627   NaN   Inf 0.872 -0.0123 
## 11 ARIMA_114 0.00948 0.0842 0.0628   NaN   Inf 0.874 -0.0134 
## 12 ARIMA_014 0.00924 0.0845 0.0632   NaN   Inf 0.880 -0.0233 
## 13 ARIMA_214 0.00963 0.0841 0.0627   NaN   Inf 0.872 -0.0118

From the accuracy table, the models with the lowest RMSE, MAE, and MASE are ARIMA(2,1,3), ARIMA(2,1,4), and ARIMA(1,1,3), while the simpler ARIMA(0,1,1) and ARIMA(0,1,0) show somewhat larger errors. MPE and MAPE are NaN and Inf for every model because the anomaly series contains zero and near-zero values, so percentage-error measures are uninformative here.

Balancing point-forecast accuracy against parsimony and the coefficient significance seen earlier, we retain the simple ARIMA(0,1,1), together with the ARIMA(0,1,0) baseline, as our preferred candidates; they are expected to perform reasonably well across the board.

1.6.9. Preferred Candidate Models’ Residual Test Confirmation

# Fit the ARIMA(0,1,1) model
model_011 <- Arima(ts_assignment2Data2024, order = c(0, 1, 1))

# Fit the ARIMA(0,1,0) model
model_010 <- Arima(ts_assignment2Data2024, order = c(0, 1, 0))

# Obtain residuals for ARIMA(0,1,1) model
residuals_011 <- resid(model_011)

# Obtain residuals for ARIMA(0,1,0) model
residuals_010 <- resid(model_010)

# Standardize residuals for ARIMA(0,1,1) model
standardized_residuals_011 <- residuals_011 / sqrt(var(residuals_011))

# Standardize residuals for ARIMA(0,1,0) model
standardized_residuals_010 <- residuals_010 / sqrt(var(residuals_010))

# Residual plot for ARIMA(0,1,1) model
plot(residuals_011, main = "Fig.15.Residual Plot (ARIMA(0,1,1))", ylab = "Residuals")

# Time series plot of standardized residuals for ARIMA(0,1,1) model
plot(standardized_residuals_011, type = "l", main = "Fig.16.Time Series Plot of Standardized Residuals (ARIMA(0,1,1))", ylab = "Standardized Residuals")

# Generate histogram of standardized residuals for ARIMA(0,1,1) model
hist(standardized_residuals_011, main = "Fig. 17.Histogram of Standardized Residuals (ARIMA(0,1,1))", xlab = "Standardized Residuals", ylab = "Frequency")

# Generate QQ plot
qqnorm(standardized_residuals_011, xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
qqline(standardized_residuals_011)

# Add a custom title below the plot
title(main = "Fig 18", sub = "", adj = 0.285)

# ACF plot of standardized residuals for ARIMA(0,1,1) model
acf(standardized_residuals_011, main = "Fig.19.ACF Plot of Standardized Residuals (ARIMA(0,1,1))")

# PACF plot of standardized residuals for ARIMA(0,1,1) model
pacf(standardized_residuals_011, main = "Fig.20.PACF Plot of Standardized Residuals (ARIMA(0,1,1))")

# Residual plot for ARIMA(0,1,0) model
plot(residuals_010, main = "Fig. 21.Residual Plot (ARIMA(0,1,0))", ylab = "Residuals")

# Time series plot of standardized residuals for ARIMA(0,1,0) model
plot(standardized_residuals_010, type = "l", main = "Fig.22. Time Series Plot of Standardized Residuals (ARIMA(0,1,0))", ylab = "Standardized Residuals")

# Generate histogram of standardized residuals for ARIMA(0,1,0) model
hist(standardized_residuals_010, main = "Fig.23. Histogram of Standardized Residuals (ARIMA(0,1,0))", xlab = "Standardized Residuals", ylab = "Frequency")

# Generate QQ plot
qqnorm(standardized_residuals_010, xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
qqline(standardized_residuals_010)

# Add a custom title below the plot
title(main = "Fig 24", sub = "", adj = 0.285)

# ACF plot of standardized residuals for ARIMA(0,1,0) model
acf(standardized_residuals_010, main = "Fig.25. ACF Plot of Standardized Residuals (ARIMA(0,1,0))")

Figures 15 to 25 show the residual diagnostics for ARIMA(0,1,1) and ARIMA(0,1,0). The residuals display no trend, although some changing variance remains. The histograms of the standardized residuals for both candidates range roughly between -2 and +2. The Q-Q plots show slight deviations at both tails, confirming some departure from normality in the standardized residuals. Some significant correlation also remains in the ACF plots of both candidate models. Overall, the models capture our expectations, though further improvement would be needed to forecast confidently on subsequent datasets/time series.
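As a convenient cross-check (a sketch), forecast::checkresiduals() bundles the residual time plot, ACF, histogram, and a Ljung-Box test into a single call:

# One-call residual diagnostics for the preferred ARIMA(0,1,1) model
checkresiduals(model_011)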

1.6.10. Five-Year Forecast: Preferred Candidate Models

# ARIMA(0,1,0) model
model_010 <- Arima(ts_assignment2Data2024, order = c(0, 1, 0), method = 'CSS-ML')

# Forecast the next 5 time steps
forecast_values_010 <- forecast::forecast(model_010, h = 5)

# Plot the forecast
plot(forecast_values_010, xlab = "Time", ylab = "Forecasted Values", main = "Fig.26. ARIMA(0,1,0) Forecast")

# Display the forecast values
print(forecast_values_010)
##      Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 2024           0.91 0.7897759 1.030224 0.7261330 1.093867
## 2025           0.91 0.7399774 1.080023 0.6499729 1.170027
## 2026           0.91 0.7017657 1.118234 0.5915331 1.228467
## 2027           0.91 0.6695517 1.150448 0.5422661 1.277734
## 2028           0.91 0.6411706 1.178829 0.4988610 1.321139
# ARIMA(0,1,1) model
model_011 <- Arima(ts_assignment2Data2024, order = c(0, 1, 1), method = 'CSS-ML')

# Forecast the next 5 time steps
forecast_values_011 <- forecast::forecast(model_011, h = 5)

plot(forecast_values_011, xlab = "Time", ylab = "Forecasted Values", main = "Fig.27. ARIMA(0,1,1) Forecast")

# Display the forecast values
print(forecast_values_011)
##      Point Forecast     Lo 80     Hi 80     Lo 95    Hi 95
## 2024       0.842267 0.7226351 0.9618990 0.6593058 1.025228
## 2025       0.842267 0.6947185 0.9898155 0.6166111 1.067923
## 2026       0.842267 0.6713012 1.0132329 0.5807973 1.103737
## 2027       0.842267 0.6507257 1.0338083 0.5493298 1.135204
## 2028       0.842267 0.6321555 1.0523785 0.5209292 1.163605
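Since ARIMA(0,1,0) is a random walk, its point forecasts stay flat at the last observed value (0.91), while the ARIMA(0,1,1) forecasts settle at a slightly lower level via the MA(1) term. The two point forecasts can be overlaid for a quick visual comparison (a sketch using the forecast objects above):

# Overlay the ARIMA(0,1,0) point forecasts on the ARIMA(0,1,1) forecast plot
plot(forecast_values_011, xlab = "Time", ylab = "Anomaly",
     main = "ARIMA(0,1,1) vs ARIMA(0,1,0) point forecasts")
lines(forecast_values_010$mean, col = "red", lty = 2)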

RESULTS

The analysis began with the exploration of the “assignment2Data2024” dataset, which contains yearly Global Land Temperature Anomalies from 1850 to 2023. Visual inspection of the time series plot revealed a progressively upward trend over the years, with noticeable fluctuations and a change in variance. Further examination using scatter plots and lagged scatter plots indicated a positive linear trend and potential autocorrelation in the data. Subsequent analysis involved fitting multiple ARIMA models with different parameters and assessing their performance using various evaluation metrics.

Evaluation of the ARIMA models based on criteria such as AIC, BIC, accuracy metrics, and diagnostic tests revealed ARIMA(0,1,0) and ARIMA(0,1,1) as the two most suitable models, although several other candidate models identified were also viable. These models balance goodness of fit against model complexity. In addition, their residuals were plausibly normally distributed, which further supports their suitability for forecasting, even though the residual ACF plots indicated some remaining autocorrelation.

DISCUSSION

The upward trend observed in the temperature anomalies suggests a long-term increase in global land temperatures over the years, with potential implications for climate change. The fluctuations and change in variance indicate variability in temperature anomalies, which may be influenced by natural climate variability and human activities. The identification of ARIMA(0,1,0) and ARIMA(0,1,1) as the most appropriate models highlights the importance of considering both simplicity and performance in model selection. The approximate normality of their residuals further strengthens their reliability for forecasting.

CONCLUSION

Based on the evaluation metrics and diagnostic tests, ARIMA(0,1,0) and ARIMA(0,1,1) are recommended as the preferred models for forecasting global land temperature anomalies. These models provide a balance between simplicity and performance, offering reliable estimates and forecasts. The findings suggest a continuation of the upward trend in land temperatures, emphasizing the importance of addressing climate change through mitigation and adaptation measures.
