Climate is the habit and character of the weather that occurs in a place or area. In this case study, we will predict the average climate temperature that occurs in the city of Delhi, India.
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
##
## Attaching package: 'MLmetrics'
##
## The following object is masked from 'package:base':
##
## Recall
## Loading required package: fma
## Loading required package: expsmooth
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
# read data
climate_train <- read.csv("DailyDelhiClimateTrain.csv")
climate_test <- read.csv("DailyDelhiClimateTest.csv")
# check data
dim(climate_train)## [1] 1462 5
## [1] 114 5
Description of variable :
## Rows: 1,462
## Columns: 5
## $ date <chr> "2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04", "…
## $ meantemp <dbl> 10.000000, 7.400000, 7.166667, 8.666667, 6.000000, 7.0000…
## $ humidity <dbl> 84.50000, 92.00000, 87.00000, 71.33333, 86.83333, 82.8000…
## $ wind_speed <dbl> 0.0000000, 2.9800000, 4.6333333, 1.2333333, 3.7000000, 1.…
## $ meanpressure <dbl> 1015.667, 1017.800, 1018.667, 1017.167, 1016.500, 1018.00…
## Rows: 114
## Columns: 5
## $ date <chr> "2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "…
## $ meantemp <dbl> 15.91304, 18.50000, 17.11111, 18.70000, 18.38889, 19.3181…
## $ humidity <dbl> 85.86957, 77.22222, 81.88889, 70.05000, 74.94444, 79.3181…
## $ wind_speed <dbl> 2.743478, 2.894444, 4.016667, 4.545000, 3.300000, 8.68181…
## $ meanpressure <dbl> 59.000, 1018.278, 1018.333, 1015.700, 1014.333, 1011.773,…
# climate_train - change data type/structure : `date` from chr to date
climate_train <- climate_train %>%
mutate(date = lubridate::ymd(date))
glimpse(climate_train)## Rows: 1,462
## Columns: 5
## $ date <date> 2013-01-01, 2013-01-02, 2013-01-03, 2013-01-04, 2013-01-…
## $ meantemp <dbl> 10.000000, 7.400000, 7.166667, 8.666667, 6.000000, 7.0000…
## $ humidity <dbl> 84.50000, 92.00000, 87.00000, 71.33333, 86.83333, 82.8000…
## $ wind_speed <dbl> 0.0000000, 2.9800000, 4.6333333, 1.2333333, 3.7000000, 1.…
## $ meanpressure <dbl> 1015.667, 1017.800, 1018.667, 1017.167, 1016.500, 1018.00…
# climate_test - change data type/structure : `date` from chr to date
climate_test <- climate_test %>%
mutate(date = lubridate::ymd(date))
glimpse(climate_test)## Rows: 114
## Columns: 5
## $ date <date> 2017-01-01, 2017-01-02, 2017-01-03, 2017-01-04, 2017-01-…
## $ meantemp <dbl> 15.91304, 18.50000, 17.11111, 18.70000, 18.38889, 19.3181…
## $ humidity <dbl> 85.86957, 77.22222, 81.88889, 70.05000, 74.94444, 79.3181…
## $ wind_speed <dbl> 2.743478, 2.894444, 4.016667, 4.545000, 3.300000, 8.68181…
## $ meanpressure <dbl> 59.000, 1018.278, 1018.333, 1015.700, 1014.333, 1011.773,…
We use anyNA(), is.na() and colSums() function to check the missing value.
## [1] FALSE
## date meantemp humidity wind_speed meanpressure
## 0 0 0 0 0
## [1] FALSE
## date meantemp humidity wind_speed meanpressure
## 0 0 0 0 0
On the object climate_train, there is outlier on the attributes
wind_speed and meanpressure.
On the object climate_test, there is outlier on the attributes meanpressure.
We can use histogram to check distribution data on the objects climate_train and climate_test
# Histogram for each Attribute - climate_train
climate_train %>%
gather(Attributes, value, 2:5) %>%
ggplot(aes(x=value, fill=Attributes)) +
geom_histogram(colour="black", show.legend=FALSE) +
facet_wrap(~Attributes, scales="free_x") +
labs(x="Values", y="Frequency",
title="Climate-train Attributes - Histograms") +
theme_bw()## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Histogram for each Attribute - climate_test
climate_test %>%
gather(Attributes, value, 2:5) %>%
ggplot(aes(x=value, fill=Attributes)) +
geom_histogram(colour="black", show.legend=FALSE) +
facet_wrap(~Attributes, scales="free_x") +
labs(x="Values", y="Frequency",
title="Climate-train Attributes - Histograms") +
theme_bw()## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
In this data processing, there are some steps to be taken, such as: changing the data to time series format, performing decomposition, and fitting a model. In this LBB, we only use the data object climate_train.
We find the range time for atribute date on the objects by use range() function.
## [1] "2013-01-01" "2017-01-01"
Use ts() function to change data into time series format.
# object train_ts
train_ts <- ts(data = climate_train$meantemp,# target variable
start = c(2013,1,1), #start period
frequency = 365)# daily data so we use 365 days
head(train_ts,12)## Time Series:
## Start = c(2013, 1)
## End = c(2013, 12)
## Frequency = 365
## [1] 10.000000 7.400000 7.166667 8.666667 6.000000 7.000000 7.000000
## [8] 8.857143 14.000000 11.000000 15.714286 14.000000
We check the data visualization to get the pattern.
The pattern data for the object train_ts is additive. Use this type of
data for the next decompose() function.
From the results above, we can see that the pattern is the same for some
periods. Therefore, we can say that the type of seasonal data is
additive.
Before fitting the model, we split the data into two parts:
We then proceed to the next step by checking for missing values in the clim_train and clim_test data.
# test use `tail()`
clim_test <- train_ts %>% tail(365) # take 12 last data
# train use `head()`
clim_train <- train_ts %>% head(-365) # don't take 12 last data## [1] FALSE
## [1] FALSE
## [1] FALSE
There are no missing values in the data clim_train, clim_test, and train_dc. We can proceed to the next process, which is fitting the model.
For model fitting, we use three types of models: Simple Moving Average (SMA), Triple Exponential Smoothing (Holt Winters), and Autoregressive Integrated Moving Average (ARIMA).
In this model, we use SMA() function from data train_ts with
# Build SMA model with orde = 3 and 10
train_sma3 <- SMA(x = train_ts, n = 3)
train_sma10 <- SMA(x = train_ts, n = 10)Based on the result of train_sma3 and train_sma10, there is missing value in the some rows. We cannot drop NA because it can cause the missing data.
## [1] "numeric"
This model is simple, and it cannot use the accuracy() function from forecasting packages. Therefore, we will stop the process for the SMA model here. The next model will be fitting by Triple Exponential Smoothing (Holt Winters).
In this model, we use the HoltWinters() function with the data clim_train and the seasonal type set to additive. We then save the model data with the object name model_train.
# model fitting with train_ts
model_train <- HoltWinters(x = clim_train, seasonal = "additive")
model_train## Holt-Winters exponential smoothing with trend and additive seasonal component.
##
## Call:
## HoltWinters(x = clim_train, seasonal = "additive")
##
## Smoothing parameters:
## alpha: 0.6770537
## beta : 0
## gamma: 1
##
## Coefficients:
## [,1]
## a 2.602363e+01
## b 7.786026e-04
## s1 -1.198739e+01
## s2 -1.201948e+01
## s3 -1.208472e+01
## s4 -1.335531e+01
## s5 -1.341066e+01
## s6 -1.273301e+01
## s7 -1.179621e+01
## s8 -1.185330e+01
## s9 -1.225452e+01
## s10 -1.196065e+01
## s11 -1.046138e+01
## s12 -9.993057e+00
## s13 -1.159085e+01
## s14 -1.183360e+01
## s15 -1.214046e+01
## s16 -1.222146e+01
## s17 -1.049410e+01
## s18 -1.022124e+01
## s19 -1.077042e+01
## s20 -9.972734e+00
## s21 -8.614474e+00
## s22 -1.073655e+01
## s23 -1.185561e+01
## s24 -1.056419e+01
## s25 -8.944911e+00
## s26 -8.754930e+00
## s27 -9.452906e+00
## s28 -9.464328e+00
## s29 -9.385187e+00
## s30 -9.786707e+00
## s31 -8.705496e+00
## s32 -7.170183e+00
## s33 -6.867985e+00
## s34 -7.347541e+00
## s35 -6.330757e+00
## s36 -5.164852e+00
## s37 -8.854825e+00
## s38 -1.071593e+01
## s39 -1.193176e+01
## s40 -1.137633e+01
## s41 -1.152675e+01
## s42 -9.496429e+00
## s43 -1.024789e+01
## s44 -1.176785e+01
## s45 -1.077741e+01
## s46 -9.761975e+00
## s47 -9.490288e+00
## s48 -9.047401e+00
## s49 -8.720213e+00
## s50 -8.227624e+00
## s51 -8.042738e+00
## s52 -8.364720e+00
## s53 -7.908749e+00
## s54 -6.944919e+00
## s55 -7.304191e+00
## s56 -6.738413e+00
## s57 -7.697524e+00
## s58 -9.552880e+00
## s59 -8.870290e+00
## s60 -7.907670e+00
## s61 -6.435495e+00
## s62 -5.384627e+00
## s63 -5.619107e+00
## s64 -6.529959e+00
## s65 -4.999649e+00
## s66 -3.872528e+00
## s67 -4.543274e+00
## s68 -3.994859e+00
## s69 -3.735427e+00
## s70 -4.767116e+00
## s71 -4.914178e+00
## s72 -4.895460e+00
## s73 -1.898573e+00
## s74 7.963801e-01
## s75 1.930300e-01
## s76 -1.622856e+00
## s77 -2.219351e+00
## s78 -1.414227e+00
## s79 -1.516522e-01
## s80 -1.477008e+00
## s81 -1.720296e+00
## s82 -3.185110e-01
## s83 7.962206e-01
## s84 -9.144463e-01
## s85 -5.987792e-01
## s86 -4.509573e-02
## s87 -8.908691e-01
## s88 -1.177850e-01
## s89 4.493147e-01
## s90 2.254912e+00
## s91 -4.489233e-01
## s92 -5.144484e-01
## s93 2.443984e+00
## s94 4.886696e+00
## s95 2.861116e+00
## s96 2.198044e+00
## s97 1.499231e+00
## s98 1.942867e+00
## s99 3.287239e+00
## s100 2.770645e+00
## s101 2.452824e+00
## s102 3.220669e+00
## s103 4.363059e+00
## s104 2.254409e+00
## s105 2.766362e+00
## s106 1.214408e+00
## s107 1.648391e+00
## s108 2.318209e+00
## s109 2.606305e+00
## s110 3.669800e+00
## s111 4.169232e+00
## s112 4.621317e+00
## s113 5.288762e+00
## s114 5.729720e+00
## s115 6.285048e+00
## s116 6.891578e+00
## s117 7.304470e+00
## s118 8.559226e+00
## s119 9.174117e+00
## s120 9.114483e+00
## s121 7.964882e+00
## s122 7.354071e+00
## s123 6.158270e+00
## s124 5.889066e+00
## s125 7.045911e+00
## s126 6.318557e+00
## s127 6.228680e+00
## s128 5.528708e+00
## s129 4.677026e+00
## s130 3.374079e+00
## s131 -3.650718e-01
## s132 1.864083e+00
## s133 3.389892e+00
## s134 5.828839e+00
## s135 6.332719e+00
## s136 7.229460e+00
## s137 6.170136e+00
## s138 6.757834e+00
## s139 8.500125e+00
## s140 8.624006e+00
## s141 9.755148e+00
## s142 7.194046e+00
## s143 7.376386e+00
## s144 5.134703e+00
## s145 5.925452e+00
## s146 8.177301e+00
## s147 9.805533e+00
## s148 7.842959e+00
## s149 9.795927e+00
## s150 6.933521e+00
## s151 8.306507e+00
## s152 8.486835e+00
## s153 1.080591e+01
## s154 1.183884e+01
## s155 1.259195e+01
## s156 1.379096e+01
## s157 1.354257e+01
## s158 1.350898e+01
## s159 1.238927e+01
## s160 1.227420e+01
## s161 9.418578e+00
## s162 5.102697e+00
## s163 4.080900e+00
## s164 8.659279e+00
## s165 1.098113e+01
## s166 1.210620e+01
## s167 1.035881e+01
## s168 1.090681e+01
## s169 1.148372e+01
## s170 1.091575e+01
## s171 1.058657e+01
## s172 8.115540e+00
## s173 9.977863e+00
## s174 4.790759e+00
## s175 8.177398e+00
## s176 9.829933e+00
## s177 1.049189e+01
## s178 7.905602e+00
## s179 7.322194e+00
## s180 7.155751e+00
## s181 6.922562e+00
## s182 6.350669e+00
## s183 7.922029e+00
## s184 9.314715e+00
## s185 4.509044e+00
## s186 6.554235e+00
## s187 7.325155e+00
## s188 5.387483e+00
## s189 4.255590e+00
## s190 5.819591e+00
## s191 5.178832e+00
## s192 6.454543e+00
## s193 6.435384e+00
## s194 7.587076e+00
## s195 5.708966e+00
## s196 5.283614e+00
## s197 4.991974e+00
## s198 6.251457e+00
## s199 4.340584e+00
## s200 4.872696e+00
## s201 4.841755e+00
## s202 5.554948e+00
## s203 5.324924e+00
## s204 4.220181e+00
## s205 4.820982e+00
## s206 4.961836e+00
## s207 5.797714e+00
## s208 5.849649e+00
## s209 6.440888e+00
## s210 5.515838e+00
## s211 5.229031e+00
## s212 5.779635e+00
## s213 7.517130e+00
## s214 6.737311e+00
## s215 5.865234e+00
## s216 4.678301e+00
## s217 3.794647e+00
## s218 3.569938e+00
## s219 2.191900e+00
## s220 4.158276e+00
## s221 4.487895e+00
## s222 5.848860e+00
## s223 6.231407e+00
## s224 4.810258e+00
## s225 3.349423e+00
## s226 2.670219e+00
## s227 3.516602e+00
## s228 3.032772e+00
## s229 4.292575e+00
## s230 3.364114e+00
## s231 2.845801e+00
## s232 4.083058e+00
## s233 5.190897e+00
## s234 6.616419e+00
## s235 6.202298e+00
## s236 6.609839e+00
## s237 7.158783e+00
## s238 5.311021e+00
## s239 6.121021e+00
## s240 5.187595e+00
## s241 4.207190e+00
## s242 5.199997e+00
## s243 6.087984e+00
## s244 6.922225e+00
## s245 7.070691e+00
## s246 6.374606e+00
## s247 6.792467e+00
## s248 7.172922e+00
## s249 5.561277e+00
## s250 5.521302e+00
## s251 4.979986e+00
## s252 4.866787e+00
## s253 6.582701e+00
## s254 6.811925e+00
## s255 6.315701e+00
## s256 7.115734e+00
## s257 6.182785e+00
## s258 5.146724e+00
## s259 4.740289e+00
## s260 4.968169e+00
## s261 4.200353e+00
## s262 2.839160e+00
## s263 3.112356e+00
## s264 3.910461e+00
## s265 4.812431e+00
## s266 4.765990e+00
## s267 4.482814e+00
## s268 4.310929e+00
## s269 4.675208e+00
## s270 4.570342e+00
## s271 3.477899e+00
## s272 3.164084e+00
## s273 3.949424e+00
## s274 4.648610e+00
## s275 2.333475e+00
## s276 2.818144e+00
## s277 3.809919e+00
## s278 4.749270e+00
## s279 4.206872e+00
## s280 4.932077e+00
## s281 4.874837e+00
## s282 2.047586e+00
## s283 1.834447e+00
## s284 2.556250e+00
## s285 1.777814e+00
## s286 1.442464e+00
## s287 1.492022e+00
## s288 2.339262e+00
## s289 1.609453e+00
## s290 1.325911e+00
## s291 5.807043e-01
## s292 2.745502e-01
## s293 1.237791e+00
## s294 8.498795e-01
## s295 5.606081e-01
## s296 -1.466818e-01
## s297 -1.248188e+00
## s298 -1.775563e+00
## s299 -2.944453e+00
## s300 -1.660302e+00
## s301 -1.743883e+00
## s302 -1.458948e+00
## s303 -1.275488e+00
## s304 -1.796295e+00
## s305 -2.541282e+00
## s306 -3.066512e+00
## s307 -4.992446e+00
## s308 -4.227070e+00
## s309 -4.373254e+00
## s310 -5.265555e+00
## s311 -5.032966e+00
## s312 -6.453988e+00
## s313 -7.052221e+00
## s314 -7.305308e+00
## s315 -7.117091e+00
## s316 -7.017000e+00
## s317 -7.081618e+00
## s318 -7.693883e+00
## s319 -8.958847e+00
## s320 -7.938604e+00
## s321 -7.048718e+00
## s322 -5.892132e+00
## s323 -6.417941e+00
## s324 -6.316405e+00
## s325 -7.351809e+00
## s326 -5.859603e+00
## s327 -4.359161e+00
## s328 -2.790935e+00
## s329 -4.577928e+00
## s330 -4.738208e+00
## s331 -5.604709e+00
## s332 -5.655111e+00
## s333 -6.110323e+00
## s334 -6.610442e+00
## s335 -5.985412e+00
## s336 -7.208772e+00
## s337 -8.101604e+00
## s338 -8.218172e+00
## s339 -8.282624e+00
## s340 -7.916427e+00
## s341 -8.475868e+00
## s342 -7.120550e+00
## s343 -8.286985e+00
## s344 -9.198028e+00
## s345 -8.401650e+00
## s346 -7.867668e+00
## s347 -8.390164e+00
## s348 -8.556913e+00
## s349 -1.036563e+01
## s350 -1.013688e+01
## s351 -9.495622e+00
## s352 -9.487741e+00
## s353 -9.614455e+00
## s354 -9.752089e+00
## s355 -9.680601e+00
## s356 -1.064618e+01
## s357 -1.072739e+01
## s358 -1.075701e+01
## s359 -1.095665e+01
## s360 -1.218979e+01
## s361 -1.305511e+01
## s362 -1.279254e+01
## s363 -1.136668e+01
## s364 -1.063268e+01
## s365 -1.202363e+01
We check the data model with variable fitted.
## Time Series:
## Start = c(2014, 1)
## End = c(2014, 10)
## Frequency = 365
## xhat level trend season
## 2014.000 13.60364 24.66347 0.0007786026 -11.06061
## 2014.003 11.08200 24.50945 0.0007786026 -13.42822
## 2014.005 12.52996 24.45470 0.0007786026 -11.92552
## 2014.008 12.88023 24.43520 0.0007786026 -11.55574
## 2014.011 12.37850 24.43243 0.0007786026 -12.05471
## 2014.014 11.42272 24.43084 0.0007786026 -13.00890
## 2014.016 12.12759 24.43558 0.0007786026 -12.30877
## 2014.019 11.85167 24.44670 0.0007786026 -12.59581
## 2014.022 12.80911 24.46327 0.0007786026 -11.65494
## 2014.025 12.35665 24.48045 0.0007786026 -12.12458
We check the visualization of model_ _train by using plot() function.
The plot shows the original time series in black and the forecasts as a red line. We can see from the picture that the forecasts time series agree pretty well with the original values, although they tend to lag behind the original values a little bit.
As a measure of the accuracy of the forecasts, we can calculate the sum of squared errors for the in-sample forecast errors, which is stored in the named element SSE.
## [1] 2289.19
That is, the sum-of-squared-errors is 2289.19.
We have a second way to measure the accuracy of the forecasts. We continue the next process by seeing the error measurement with the accuracy() function from the forecast package. We take the xhat column of the data model_train$fitted based on the data clim_train.
# accuracy : way 2 by using `accuracy()` function
forecast::accuracy(object = model_train$fitted[,1], clim_train)## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 0.001594478 1.76842 1.18208 -0.1981344 5.143228 0.1470544 1.144072
Base on the result above, we get the accuracy measurement for the holt winters model are : MAPE : 5.14 % MAE : 1.18 % RMSE : 1.77 %
In this model, we use adf.test() to check if the data is already stationer.
##
## Augmented Dickey-Fuller Test
##
## data: .
## Dickey-Fuller = -1.7903, Lag order = 10, p-value = 0.6671
## alternative hypothesis: stationary
We make the hypothesis of the stationary data as follows:
Based on the data above, we find that the P-value (0.6671) is greater than the alpha (0.05). This means that we fail to reject the null hypothesis, so we cannot conclude that the data is stationary. We must set the differencing in the seasonal data to make the data stationary.
clim_train %>%
diff(lag = 12) %>% #untuk diff seasonal (D = 1)
diff(lag = 1) %>% #untuk diff trend (d=1)
adf.test()## Warning in adf.test(.): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: .
## Dickey-Fuller = -9.7415, Lag order = 10, p-value = 0.01
## alternative hypothesis: stationary
Based on the data above, we find that the P-value (0.01) is less than the alpha (0.05). This means that the data is already stationary. We then use the tsdisplay() function to see the adjusted data clim_train graphic.
library(forecast)
# Set the margins to smaller values
par(mar = c(3, 3, 2, 1))
clim_train_diff <- clim_train %>%
diff(lag = 12) %>%
diff(lag = 1)
tsdisplay(clim_train_diff, lag.max = 20)
Based on the graphic above, we set the combined model ARIMA.
For seasosal : AR * P : 0 * D : 1 * Q : 0
For all data : MA * p: 1,2,3 * d: 1 * q: 0,1
Combine the model
SARIMA: (p, d, q)(P, D, Q)[m]
(1,1,0)(0,1,0)[12] (1,1,1)(0,1,0)[12] (2,1,0)(0,1,0)[12] (2,1,1)(0,1,0)[12] (3,1,0)(0,1,0)[12] (3,1,1)(0,1,0)[12] Use Arima() and auto.arima() function to create model Arima and save the data model into the new objects.
# create model using auto.arima()
sarima1 <- Arima(y = clim_train, order = c(1,1,0), seasonal = c(0,1,0))
sarima2 <- Arima(y = clim_train, order = c(1,1,1), seasonal = c(0,1,0))
sarima3 <- Arima(y = clim_train, order = c(2,1,0), seasonal = c(0,1,0))
sarima4 <- Arima(y = clim_train, order = c(2,1,1), seasonal = c(0,1,0))
sarima5 <- Arima(y = clim_train, order = c(3,1,0), seasonal = c(0,1,0))
sarima6 <- Arima(y = clim_train, order = c(3,1,1), seasonal = c(0,1,0))
sarima_auto <- auto.arima(y = clim_train)After building the combined model, we check the aic value for each model.
## [1] 3249.425
## [1] 3160.268
## [1] 3242.748
## [1] 3162.192
## [1] 3150.97
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016.0055 14.491288 11.829716 17.15286 10.420765 18.56181
## 2016.0082 13.666039 10.379749 16.95233 8.640093 18.69198
## 2016.0110 13.667601 10.097068 17.23813 8.206943 19.12826
## 2016.0137 11.759262 8.048332 15.47019 6.083883 17.43464
## 2016.0164 9.450648 5.668157 13.23314 3.665828 15.23547
## 2016.0192 9.873727 6.054237 13.69322 4.032321 15.71513
## 2016.0219 10.533547 6.694794 14.37230 4.662680 16.40441
## 2016.0247 11.058766 7.209946 14.90759 5.172504 16.94503
## 2016.0274 10.952031 7.097941 14.80612 5.057709 16.84635
## 2016.0301 10.590259 6.733408 14.44711 4.691715 16.48880
## 2016.0329 12.224839 8.366541 16.08314 6.324081 18.12560
## 2016.0356 11.981777 8.122720 15.84083 6.079859 17.88370
## 2016.0384 12.361802 8.502347 16.22126 6.459275 18.26433
## 2016.0411 12.990442 9.130778 16.85011 7.087595 18.89329
## 2016.0438 13.493077 9.633304 17.35285 7.590063 19.39609
## 2016.0466 12.994986 9.135156 16.85482 7.091884 18.89809
## 2016.0493 13.746369 9.886508 17.60623 7.843221 19.64952
## 2016.0521 13.372370 9.512493 17.23225 7.469198 19.27554
clim_train %>%
autoplot()+
autolayer(clim_test, series = "data test")+
autolayer(clim_forecast$mean, series = "forecast")+
autolayer(sarima_auto$fitted, series = "fitted values")## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 3.005627 4.165464 3.317985 17.94666 20.48799 0.7312294 3.263169
Base on the result above, we get the accuracy measurement for the holt winters model are : MAPE : 20.49 % MAE : 3.32 % RMSE : 4.16 %
Assumptions in time series are tested to measure whether the residuals obtained from modeling results are good enough to describe and capture information in the data. Residual data is used because it provides information from both the actual data and the prediction results from models.
We noted that p-value > 0.05 (alpha), so the residual have no-autocorrelation.
# Reduce the margins for the plot
par(mar = c(3, 3, 2, 1) + 0.1)
# Use visualization plot
sarima_auto$residuals %>%
tsdisplay(lag.max = 20)
We hope the model have no lag that over blue line.
##
## Box-Ljung test
##
## data: sarima_auto$residuals
## X-squared = 0.065276, df = 1, p-value = 0.7983
From the result above, we can conclude the assumption that the model with p-value 0.7983 have no autocorrelation (The model data meet H0 : residual has no-autocorrelation).
##
## Shapiro-Wilk normality test
##
## data: sarima_auto$residuals
## W = 0.92283, p-value < 2.2e-16
From the results above, we can conclude that the model with a p-value of
0.00000000000000022 has a normal distribution (the model data meets H0:
Residuals have a normal distribution).
Based on our analysis, we recommend that the data be pre-processed to meet the assumptions of the Holt-Winters model, which include having a normal distribution and no autocorrelation. The Holt-Winters model is the best model for this data because it has the least error.