1 Intro

Climate describes the typical weather conditions of a place over a long period of time. In this case study, we forecast the daily mean temperature in the city of Delhi, India.

1.1 Load Library & Read Data

library(tidyverse)
library(forecast)
library(lubridate)
library(TTR)
library(fpp)
library(xts)
library(tseries)
library(TSstudio)
library(padr)
library(MLmetrics)
climate <- read.csv("DailyDelhiClimate.csv")

2 Data Pre-Processing

First, we keep only the columns needed for forecasting, namely the date and meantemp columns.

climate_clean <- climate %>% 
  select(date,meantemp)

Next, we convert the date column to the Date data type.

climate_clean$date <- ymd(climate_clean$date)

Then we sort the data from the oldest date to the newest date and fill in any missing dates with the pad function.

climate_clean <- climate_clean %>% 
  arrange(date) %>% 
  pad()
## pad applied on the interval: day
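
To illustrate what padding does, here is a minimal base R sketch on a hypothetical three-row data frame (the toy values are made up for illustration and are not taken from the dataset). It mimics the effect of pad with seq and merge; it is not meant to show how padr is implemented.

```r
# Hypothetical toy data: 3 January is missing from the daily record.
toy <- data.frame(
  date = as.Date(c("2013-01-01", "2013-01-02", "2013-01-04")),
  meantemp = c(10.0, 7.4, 8.7)
)

# Build the complete daily calendar between the first and last date.
full_dates <- data.frame(
  date = seq(min(toy$date), max(toy$date), by = "day")
)

# Left-join the observations onto the calendar; missing days get NA.
toy_padded <- merge(full_dates, toy, by = "date", all.x = TRUE)
toy_padded
```

Any date inserted this way has NA in meantemp, which is why a missing-value check is the natural next step.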

Next, we check whether the data contain any missing values.

colSums(is.na(climate_clean))
##     date meantemp 
##        0        0

Good, there are no missing values in the data.

3 Time Series Object

climate_ts <- ts(climate_clean$meantemp, start=c(2013,1), frequency = 365)
autoplot(climate_ts)

Next, we decompose the time series to see whether it has trend and seasonal components.

climate_dc <- decompose(climate_ts)
autoplot(climate_dc)
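
As a small illustration of what decompose returns, the sketch below builds a hypothetical monthly series (a made-up trend plus a sine-shaped seasonal pattern plus noise, not the Delhi data) and checks that the additive components add back up to the original series.

```r
# Hypothetical monthly series: linear trend + sine seasonality + noise.
set.seed(1)
n <- 48
t_idx <- seq_len(n)
toy_ts <- ts(0.1 * t_idx + 2 * sin(2 * pi * t_idx / 12) + rnorm(n, sd = 0.2),
             start = c(2013, 1), frequency = 12)

toy_dc <- decompose(toy_ts)  # additive decomposition by default

# For an additive decomposition the components sum back to the series
# wherever the moving-average trend is defined (it is NA at both ends).
recon <- toy_dc$trend + toy_dc$seasonal + toy_dc$random
ok <- !is.na(recon)
all.equal(as.numeric(recon[ok]), as.numeric(toy_ts[ok]))
```

The trend component is estimated with a centered moving average, so it is undefined for the first and last half-period of the series.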

4 Cross Validation

# Hold out the last 365 observations as the test set and train on the rest
climate_train <- head(climate_ts, n = -365)
climate_test <- tail(climate_ts, n = 365)
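
A comparable split can also be sketched with base R's window function, which keeps the time series attributes on both pieces. The series below is a hypothetical placeholder (the integers 1 to 1460), used only to show the mechanics.

```r
# Hypothetical daily series covering four "years" of 365 observations.
toy_ts <- ts(seq_len(4 * 365), start = c(2013, 1), frequency = 365)

n_test <- 365
n_all <- length(toy_ts)
time_idx <- time(toy_ts)

# window() keeps the ts attributes (start, end, frequency) on both pieces.
toy_train <- window(toy_ts, end = time_idx[n_all - n_test])
toy_test <- window(toy_ts, start = time_idx[n_all - n_test + 1])

length(toy_train)
length(toy_test)
frequency(toy_train)
```

Holding out the most recent observations, rather than a random sample, respects the time ordering of the data.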

5 Modelling

5.1 Holt Winters Model

model_hw <- HoltWinters(climate_train)

5.2 Arima Model

# STL decomposition with an ARIMA model fitted to the seasonally adjusted series
model_arima <- stlm(climate_train, method = "arima")

6 Forecasting & Evaluation

# Model Holt Winters
model_hw_forecast <- forecast(object = model_hw, h = 365)
holt_winters_forecast <- as.vector(model_hw_forecast$mean)
accuracy_holt_winters <- accuracy(holt_winters_forecast,climate_test)
accuracy_holt_winters
##                 ME     RMSE      MAE      MPE     MAPE      ACF1 Theil's U
## Test set 0.9592066 3.000553 2.382942 3.183632 9.445904 0.7288183  1.714971
# Model Arima
forecast_arima <- forecast(model_arima, h=365)
arima_forecast <- as.vector(forecast_arima$mean)
accuracy_arima <- accuracy(arima_forecast,climate_test)
accuracy_arima
##                ME     RMSE      MAE      MPE     MAPE      ACF1 Theil's U
## Test set 2.138268 3.293607 2.739158 8.212782 10.82566 0.6792405  1.901996

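We choose the model with the smallest MAE. On the test set, the Holt-Winters model has an MAE of 2.38 against 2.74 for the ARIMA model, so by this criterion Holt-Winters is the better model (it also has the lower RMSE and MAPE). The plot below shows the ARIMA forecast against the actual values.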
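
For reference, MAE is the average absolute difference between the actual and forecast values. The sketch below computes MAE, RMSE and MAPE by hand on hypothetical numbers (made up for illustration, not taken from the test set).

```r
# Hypothetical actual and forecast temperatures (made-up numbers).
actual <- c(20, 22, 25, 24)
predicted <- c(21, 21, 27, 24)

err <- actual - predicted               # forecast errors: -1, 1, -2, 0

mae <- mean(abs(err))                   # mean absolute error
rmse <- sqrt(mean(err^2))               # root mean squared error
mape <- mean(abs(err / actual)) * 100   # mean absolute percentage error

c(MAE = mae, RMSE = rmse, MAPE = mape)
```

RMSE penalizes large errors more heavily than MAE, so the two measures can rank models differently when a forecast has a few large misses.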

test_forecast(actual = climate_ts, forecast.obj = forecast_arima, train = climate_train, test = climate_test)

7 Assumption

7.1 Normality of residuals

shapiro.test(model_arima$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  model_arima$residuals
## W = 0.99348, p-value = 9.84e-05

Because the p-value is below 0.05, we reject the null hypothesis of normality: the residuals are not normally distributed.
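
To show how the Shapiro-Wilk p-value is read, the sketch below applies the test to two simulated samples (hypothetical data, unrelated to the model residuals): one drawn from a normal distribution and one from a strongly skewed distribution.

```r
set.seed(42)
normal_sample <- rnorm(200)   # drawn from a normal distribution
skewed_sample <- rexp(200)    # drawn from a strongly right-skewed distribution

p_normal <- shapiro.test(normal_sample)$p.value
p_skewed <- shapiro.test(skewed_sample)$p.value

# A small p-value is evidence against normality.
c(normal = p_normal, skewed = p_skewed)
```

With large samples the test becomes sensitive to small departures from normality, so a histogram or Q-Q plot of the residuals is a useful companion to the p-value.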

7.2 No-Autocorrelation for residuals

Box.test(x=model_arima$residuals)
## 
##  Box-Pierce test
## 
## data:  model_arima$residuals
## X-squared = 0.00075099, df = 1, p-value = 0.9781

Because the p-value is above 0.05, we fail to reject the null hypothesis of no autocorrelation: the residuals show no significant autocorrelation at lag 1.
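
With its default arguments, Box.test runs a Box-Pierce test at lag 1 only. As a sketch of a broader check, the example below applies the Ljung-Box variant over 10 lags to two simulated series (hypothetical data, not the model residuals); the lag of 10 is an arbitrary choice for illustration.

```r
set.seed(42)
white_noise <- rnorm(300)                                 # no autocorrelation
ar1_series <- arima.sim(model = list(ar = 0.7), n = 300)  # strongly autocorrelated

p_white <- Box.test(white_noise, lag = 10, type = "Ljung-Box")$p.value
p_ar1 <- Box.test(ar1_series, lag = 10, type = "Ljung-Box")$p.value

# A small p-value indicates autocorrelation somewhere in the first 10 lags.
c(white_noise = p_white, ar1 = p_ar1)
```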

8 Conclusion

Comparing test-set MAE, the Holt-Winters model (2.38) outperforms the ARIMA model (2.74), so by this criterion Holt-Winters is the preferred model for forecasting the mean temperature in Delhi, India. The residual checks were run on the ARIMA model: its residuals are not normally distributed (Shapiro-Wilk p-value < 0.05), but they show no significant autocorrelation at lag 1 (Box-Pierce p-value > 0.05).