Climate is the long-term pattern of weather in a place or region. In this case study, we forecast the daily mean temperature in the city of Delhi, India.
library(tidyverse)
library(forecast)
library(lubridate)
library(TTR)
library(fpp)
library(xts)
library(tseries)
library(TSstudio)
library(padr)
library(MLmetrics)
climate <- read.csv("DailyDelhiClimate.csv")
First, we select the columns used for forecasting, namely the date and meantemp columns.
climate_clean <- climate %>%
select(date,meantemp)
Next, we convert the date column to the Date type.
climate_clean$date <- ymd(climate_clean$date)
Then we sort the data from the oldest date to the newest and fill in any missing dates with the pad function.
climate_clean <- climate_clean %>%
arrange(date) %>%
pad()
## pad applied on the interval: day
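To make the padding step concrete, here is a minimal base R sketch of the same idea on a hypothetical three-row data frame (the `toy` values are invented for illustration and are not taken from the Delhi data). It builds the full daily calendar and joins the observations onto it, so a missing day shows up as a row with `NA`. This is assumed to mirror what `pad()` does on a daily interval; it is not the padr implementation.

```r
# Hypothetical toy data: 3 January 2013 is missing
toy <- data.frame(
  date = as.Date(c("2013-01-01", "2013-01-02", "2013-01-04")),
  meantemp = c(10.0, 7.4, 8.7)
)

# Build the complete daily sequence between the first and last date
all_days <- data.frame(date = seq(min(toy$date), max(toy$date), by = "day"))

# Left-join the observations onto the full calendar; gaps become NA
toy_padded <- merge(all_days, toy, by = "date", all.x = TRUE)
print(toy_padded)
```

Because padding can introduce `NA` values, checking for missing values right after this step is what tells us whether any dates were actually absent.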
Next, we check whether the data contain any missing values.
colSums(is.na(climate_clean))
##     date meantemp
##        0        0
Good, there are no missing values in this data.
climate_ts <- ts(climate_clean$meantemp, start=c(2013,1), frequency = 365)
autoplot(climate_ts)
Next, we check whether the time series has trend and seasonal components.
climate_dc <- decompose(climate_ts)
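autoplot(climate_dc)
We hold out the last 365 days as the test set and train on the rest.
climate_train <- head(climate_ts, n = -365)
climate_test <- tail(climate_ts, n=365)
We then fit two models on the training data: Holt-Winters, and an STL decomposition with ARIMA (stlm).
model_hw <- HoltWinters(climate_train)
model_arima <- stlm(climate_train,method="arima")
# Model Holt Winters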
model_hw_forecast <- forecast(object = model_hw, h = 365)
holt_winters_forecast <- as.vector(model_hw_forecast$mean)
accuracy_holt_winters <- accuracy(holt_winters_forecast,climate_test)
accuracy_holt_winters
##                 ME     RMSE      MAE      MPE     MAPE      ACF1 Theil's U
## Test set 0.9592066 3.000553 2.382942 3.183632 9.445904 0.7288183  1.714971
# Model Arima
forecast_arima <- forecast(model_arima, h=365)
arima_forecast <- as.vector(forecast_arima$mean)
accuracy_arima <- accuracy(arima_forecast,climate_test)
accuracy_arima
##                ME     RMSE      MAE      MPE     MAPE      ACF1 Theil's U
## Test set 2.138268 3.293607 2.739158 8.212782 10.82566 0.6792405  1.901996
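For readers unfamiliar with these metrics, the following sketch computes ME, RMSE, MAE and MAPE by hand in base R. The `actual` and `predicted` vectors are hypothetical values chosen for easy arithmetic, and the error is assumed to be defined as actual minus forecast.

```r
# Hypothetical actual and forecast values, for illustration only
actual    <- c(20, 22, 25, 24)
predicted <- c(21, 21, 23, 26)

# Forecast error, assumed here to be actual minus forecast
err <- actual - predicted

me   <- mean(err)                      # mean error (bias)
rmse <- sqrt(mean(err^2))              # root mean squared error
mae  <- mean(abs(err))                 # mean absolute error
mape <- mean(abs(err / actual)) * 100  # mean absolute percentage error

print(c(ME = me, RMSE = rmse, MAE = mae, MAPE = mape))
```

A lower MAE means the forecasts are, on average, closer to the observed values in the original units (here, degrees).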
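Comparing MAE, the Holt-Winters model has the smaller value (2.38 versus 2.74 for the ARIMA model), and it also has the lower RMSE and MAPE. We continue with the ARIMA model for the forecast plot and the residual diagnostics.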
test_forecast(actual = climate_ts, forecast.obj = forecast_arima, train = climate_train, test = climate_test)
shapiro.test(model_arima$residuals)
##
## Shapiro-Wilk normality test
##
## data: model_arima$residuals
## W = 0.99348, p-value = 9.84e-05
Because the p-value is below 0.05, we reject the null hypothesis of normality: the residuals are not normally distributed.
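As a rough illustration of how to read this test, the sketch below runs `shapiro.test` on two simulated samples: one drawn from a normal distribution and one from a skewed (exponential) distribution. The data are simulated and the sample size of 500 is an arbitrary assumption; none of this comes from the Delhi residuals.

```r
set.seed(42)

# Simulated "residuals": one normal sample, one clearly skewed sample
normal_resid <- rnorm(500)
skewed_resid <- rexp(500) - 1  # shifted to mean zero, still right-skewed

sw_normal <- shapiro.test(normal_resid)
sw_skewed <- shapiro.test(skewed_resid)

# A small p-value rejects the null hypothesis that the sample is normal
cat("normal sample: W =", sw_normal$statistic, " p =", sw_normal$p.value, "\n")
cat("skewed sample: W =", sw_skewed$statistic, " p =", sw_skewed$p.value, "\n")
```

With large samples the test can flag even small departures from normality, so a W statistic close to 1 alongside a small p-value may indicate only a mild deviation. A Q-Q plot (`qqnorm` followed by `qqline`) is a useful visual complement.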
Box.test(x=model_arima$residuals)
##
## Box-Pierce test
##
## data: model_arima$residuals
## X-squared = 0.00075099, df = 1, p-value = 0.9781
Because the p-value is above 0.05, we fail to reject the null hypothesis of no autocorrelation: the residuals show no significant autocorrelation at lag 1.
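By default, `Box.test` uses the Box-Pierce statistic at lag 1, which is why the output reports df = 1. The sketch below, on simulated data only, shows how the test behaves on a strongly autocorrelated AR(1) series and how to request more lags or the Ljung-Box variant. The AR coefficient of 0.8 and the choice of lag 10 are illustrative assumptions.

```r
set.seed(42)

# Simulated series: white noise and a strongly autocorrelated AR(1) process
white_noise <- rnorm(500)
ar1 <- as.numeric(arima.sim(model = list(ar = 0.8), n = 500))

# Default-style test: Box-Pierce at lag 1
bp_lag1 <- Box.test(ar1, lag = 1, type = "Box-Pierce")

# Ljung-Box test over the first 10 lags
lb_ar1 <- Box.test(ar1, lag = 10, type = "Ljung-Box")
lb_wn  <- Box.test(white_noise, lag = 10, type = "Ljung-Box")

# Small p-values indicate autocorrelation remains in the series
cat("AR(1), Box-Pierce lag 1:      p =", bp_lag1$p.value, "\n")
cat("AR(1), Ljung-Box lag 10:      p =", lb_ar1$p.value, "\n")
cat("white noise, Ljung-Box lag 10: p =", lb_wn$p.value, "\n")
```

Checking several lags is often more informative for daily data than lag 1 alone, because remaining autocorrelation may appear at longer lags.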
On the test set, the Holt-Winters model has the smaller MAE (2.38) compared with the ARIMA model (2.74), so Holt-Winters is the more accurate of the two for forecasting the temperature in Delhi, India. For the ARIMA model, the Shapiro-Wilk test indicates that the residuals are not normally distributed, while the Box-Pierce test shows no significant autocorrelation in the residuals.