Hello everyone! On this page, I would like to show you an analysis of foreign exchange rates. I am using another time-series model to predict changes in EUR/USD. The analysis uses the daily closing price from 2010 to 2019, so it is going to be a big time-series model.
Without any further ado, let’s get started!
Using the data from 2010 to 2019, I am trying to forecast the US dollar to euro rate for the whole of 2020. I am going to use several R time-series functions along the way.
As usual, load the necessary packages.
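The setup chunk is not shown on this page. A minimal sketch of it could look like the block below. The package list is an assumption, inferred from the functions used later on (for example `pad()`, `na_kalman()`, `SMA()`, `stlm()`), and is not the confirmed original list.

```r
# Packages inferred from the functions used on this page (an assumption,
# not the confirmed original list).
pkgs <- c(
  "dplyr",      # %>%, mutate(), select(), filter(), left_join(), glimpse()
  "readr",      # read_csv() (assumed reader for the CSV file)
  "lubridate",  # ymd()
  "padr",       # pad()
  "ggplot2",    # ggplot(), autoplot(), autolayer()
  "plotly",     # ggplotly()
  "forecast",   # msts(), mstl(), stlm(), forecast(), accuracy()
  "imputeTS",   # na_kalman()
  "TTR",        # SMA()
  "tsfeatures", # stl_features()
  "rmarkdown"   # paged_table()
)

# Check which packages are installed before attaching them.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!installed)) {
  message("Install first: ", paste(pkgs[!installed], collapse = ", "))
}

# Attach only the packages that are available.
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```

Checking with `requireNamespace()` first means the script reports missing packages instead of stopping at the first failed `library()` call.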
First, load the needed dataset.
## # A tibble: 6 x 24
## X1 `Time Serie` `AUSTRALIA - AU~ `EURO AREA - EU~ `NEW ZEALAND - ~
## <dbl> <date> <chr> <chr> <chr>
## 1 0 2000-01-03 1.5172 0.9847 1.9033
## 2 1 2000-01-04 1.5239 0.97 1.9238
## 3 2 2000-01-05 1.5267 0.9676 1.9339
## 4 3 2000-01-06 1.5291 0.9686 1.9436
## 5 4 2000-01-07 1.5272 0.9714 1.938
## 6 5 2000-01-10 1.5242 0.9754 1.935
## # ... with 19 more variables: `UNITED KINGDOM - UNITED KINGDOM
## # POUND/US$` <chr>, `BRAZIL - REAL/US$` <chr>, `CANADA - CANADIAN
## # DOLLAR/US$` <chr>, `CHINA - YUAN/US$` <chr>, `HONG KONG - HONG KONG
## # DOLLAR/US$` <chr>, `INDIA - INDIAN RUPEE/US$` <chr>, `KOREA -
## # WON/US$` <chr>, `MEXICO - MEXICAN PESO/US$` <chr>, `SOUTH AFRICA -
## # RAND/US$` <chr>, `SINGAPORE - SINGAPORE DOLLAR/US$` <chr>, `DENMARK -
## # DANISH KRONE/US$` <chr>, `JAPAN - YEN/US$` <chr>, `MALAYSIA -
## # RINGGIT/US$` <chr>, `NORWAY - NORWEGIAN KRONE/US$` <chr>, `SWEDEN -
## # KRONA/US$` <chr>, `SRI LANKA - SRI LANKAN RUPEE/US$` <chr>, `SWITZERLAND -
## # FRANC/US$` <chr>, `TAIWAN - NEW TAIWAN DOLLAR/US$` <chr>, `THAILAND -
## # BAHT/US$` <chr>
Use glimpse() to inspect the overall data:
## Observations: 5,217
## Variables: 24
## $ X1 <dbl> 0, 1, 2, 3, 4, 5, 6, 7,...
## $ `Time Serie` <date> 2000-01-03, 2000-01-04...
## $ `AUSTRALIA - AUSTRALIAN DOLLAR/US$` <chr> "1.5172", "1.5239", "1....
## $ `EURO AREA - EURO/US$` <chr> "0.9847", "0.97", "0.96...
## $ `NEW ZEALAND - NEW ZELAND DOLLAR/US$` <chr> "1.9033", "1.9238", "1....
## $ `UNITED KINGDOM - UNITED KINGDOM POUND/US$` <chr> "0.6146", "0.6109", "0....
## $ `BRAZIL - REAL/US$` <chr> "1.805", "1.8405", "1.8...
## $ `CANADA - CANADIAN DOLLAR/US$` <chr> "1.4465", "1.4518", "1....
## $ `CHINA - YUAN/US$` <chr> "8.2798", "8.2799", "8....
## $ `HONG KONG - HONG KONG DOLLAR/US$` <chr> "7.7765", "7.7775", "7....
## $ `INDIA - INDIAN RUPEE/US$` <chr> "43.55", "43.55", "43.5...
## $ `KOREA - WON/US$` <chr> "1128", "1122.5", "1135...
## $ `MEXICO - MEXICAN PESO/US$` <chr> "9.4015", "9.457", "9.5...
## $ `SOUTH AFRICA - RAND/US$` <chr> "6.126", "6.085", "6.07...
## $ `SINGAPORE - SINGAPORE DOLLAR/US$` <chr> "1.6563", "1.6535", "1....
## $ `DENMARK - DANISH KRONE/US$` <chr> "7.329", "7.218", "7.20...
## $ `JAPAN - YEN/US$` <chr> "101.7", "103.09", "103...
## $ `MALAYSIA - RINGGIT/US$` <chr> "3.8", "3.8", "3.8", "3...
## $ `NORWAY - NORWEGIAN KRONE/US$` <chr> "7.964", "7.934", "7.93...
## $ `SWEDEN - KRONA/US$` <chr> "8.443", "8.36", "8.353...
## $ `SRI LANKA - SRI LANKAN RUPEE/US$` <chr> "72.3", "72.65", "72.95...
## $ `SWITZERLAND - FRANC/US$` <chr> "1.5808", "1.5565", "1....
## $ `TAIWAN - NEW TAIWAN DOLLAR/US$` <chr> "31.38", "30.6", "30.8"...
## $ `THAILAND - BAHT/US$` <chr> "36.97", "37.13", "37.1...
Since I want to analyze EUR to USD only, I have to select() that variable and change the type of `EURO AREA - EURO/US$` from character to numeric. Then, I will filter the dates from 2010 onward.
forex <- read %>%
  # convert the character column to numeric
  mutate(`EURO AREA - EURO/US$` = as.numeric(`EURO AREA - EURO/US$`)) %>%
  # keep only the date and the EUR rate, with shorter names
  select(date = `Time Serie`, usd_to_eur = `EURO AREA - EURO/US$`) %>%
  filter(date >= ymd("2010-01-01"))
head(forex)
## # A tibble: 6 x 2
## date usd_to_eur
## <date> <dbl>
## 1 2010-01-01 NA
## 2 2010-01-04 0.694
## 3 2010-01-05 0.694
## 4 2010-01-06 0.694
## 5 2010-01-07 0.699
## 6 2010-01-08 0.696
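The NA in the first row comes from the type conversion. A small sketch of that behaviour is below. The vector is hypothetical, and the "ND" placeholder is an assumption about how the raw file marks days without a quote.

```r
# Hypothetical raw values; "ND" is an assumed placeholder for "no data".
raw_rate <- c("ND", "0.694", "0.6943", "0.699")

# as.numeric() turns anything that is not a number into NA (with a warning,
# silenced here because the NA is expected).
rate <- suppressWarnings(as.numeric(raw_rate))

print(rate)
print(class(rate))
```

Note that the result is a double ("numeric"), not an integer, so the decimal part of each rate is kept.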
Okay, next, I have to make sure that the date column has no gaps. I will use pad() to do it. First, look at the first and last dates in the data:
## [1] "2010-01-01"
## [1] "2019-12-31"
Do the padding
## # A tibble: 6 x 2
## date usd_to_eur
## <date> <dbl>
## 1 2010-01-01 NA
## 2 2010-01-02 NA
## 3 2010-01-03 NA
## 4 2010-01-04 0.694
## 5 2010-01-05 0.694
## 6 2010-01-06 0.694
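The padding chunk itself is not shown. What the padding does can be sketched in base R: build the full daily calendar and left-join the observed rows onto it. The toy data frame and the names `obs`, `calendar`, and `padded` are hypothetical.

```r
# Toy data with the weekend dates missing (hypothetical values).
obs <- data.frame(
  date = as.Date(c("2010-01-01", "2010-01-04", "2010-01-05")),
  usd_to_eur = c(NA, 0.694, 0.694)
)

# Build the complete daily calendar between the first and last date.
calendar <- data.frame(date = seq(min(obs$date), max(obs$date), by = "day"))

# Left join: dates that are absent from obs get NA in usd_to_eur.
padded <- merge(calendar, obs, by = "date", all.x = TRUE)
print(padded)
```

With padr, I would expect a call along the lines of `pad(forex, interval = "day")` to give the same result on the real data.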
Great!
Now, check for missing values
## date usd_to_eur
## 0 1150
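The output above has the shape of a per-column NA count. One common way to get it, shown here on a hypothetical toy data frame, is `colSums(is.na(...))`. Whether the original chunk used exactly this call is an assumption.

```r
# Toy data frame standing in for the padded forex data (hypothetical values).
demo <- data.frame(
  date = as.Date("2010-01-01") + 0:5,
  usd_to_eur = c(NA, NA, NA, 0.694, 0.694, 0.694)
)

# is.na() gives a logical matrix; colSums() counts the TRUE values per column.
na_counts <- colSums(is.na(demo))
print(na_counts)
```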
In order to get a clearer view, I will plot the time-series data using ggplotly()
p1 <- forex %>%
  ggplot(mapping = aes(x = date, y = usd_to_eur)) +
  geom_line(col = "DarkRed") +
  labs(title = "Time Series Model : USD to EUR Between 2010 and 2019", y = "USD to EUR Rate", x = NULL) +
  theme_minimal()
ggplotly(p1)
As we can see from the chart above, the line is not fully connected. This indicates that there are missing values in the time-series data. To handle missing values with time-series methods, we have to build the time-series object first.
Building a time-series object is quite easy. Just pass in the variable we want and set the start time and end time. Do not forget to set the frequency, because it can make a big difference to the model. In this case, I will choose 365 (as the time-series data has a daily pattern).
Use the ts() function.
Try to plot it using autoplot().
Nice. We have a chart similar to the previous one made with ggplot().
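The ts() chunk is not shown, so here is a minimal sketch of the step on simulated data. The values and the names `rate` and `rate_ts` are hypothetical. On the real data, the first argument would be the `usd_to_eur` column.

```r
# Hypothetical daily values standing in for forex$usd_to_eur.
set.seed(1)
rate <- 0.75 + cumsum(rnorm(730, sd = 0.002))

# frequency = 365 tells R that one cycle (a year) has 365 observations;
# start = c(2010, 1) means "first observation of the 2010 cycle".
rate_ts <- ts(rate, start = c(2010, 1), frequency = 365)

print(frequency(rate_ts))
print(start(rate_ts))
print(length(rate_ts))
```

The frequency matters because functions such as decompose() use it as the length of the seasonal cycle.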
As we saw from the data frame, there are NA (missing) values in it. I will replace the NAs using the imputeTS package. I am using na_kalman() with model = "auto.arima" to fill them in.
forex_kalman <- na_kalman(forex_ts, model = "auto.arima", smooth = TRUE)
forex_kalman %>%
  autoplot() +
  theme_minimal()
Cool! The line is now fully connected. Now, I can continue to process and analyze the model.
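As a quick sanity check on any imputed series, it can help to compare it with a much simpler fill. The sketch below uses plain linear interpolation with base R's `approx()`. This is not the Kalman method used above, only a simpler baseline, and the values are hypothetical.

```r
# Toy series with a two-day gap (hypothetical values).
rate <- c(0.694, 0.694, 0.699, NA, NA, 0.696, 0.700)
idx <- seq_along(rate)
known <- !is.na(rate)

# Linear interpolation between the nearest observed neighbours.
filled <- approx(x = idx[known], y = rate[known], xout = idx)$y
print(filled)
```

If the Kalman-imputed values stay close to such a baseline across short gaps like weekends, the imputation is probably behaving sensibly.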
Decomposition is a process that splits the time-series data into three main components: trend, seasonal, and error (remainder).
We can use the decompose() function to do this. The function has a type argument, which determines how the components are combined.
There are two options for type: "additive" and "multiplicative".
Before picking a type, we have to look at the time-series chart that has been built. In this case, I will pick additive.
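To make the additive option concrete, here is a small sketch on simulated quarterly data (all values and names are hypothetical). In an additive decomposition, the three components sum back to the original series wherever the trend is defined.

```r
# Simulated series: level + linear trend + repeating seasonal pattern + noise.
set.seed(1)
t <- 1:48
x <- ts(10 + 0.1 * t + rep(c(1, -1, 0.5, -0.5), 12) + rnorm(48, sd = 0.1),
        frequency = 4)

dec <- decompose(x, type = "additive")

# Additive model: x = trend + seasonal + random.
recon <- dec$trend + dec$seasonal + dec$random

# The moving-average trend is NA at both ends, so compare only where defined.
ok <- !is.na(recon)
print(all.equal(as.numeric(recon)[ok], as.numeric(x)[ok]))

# One seasonal effect per position in the cycle.
print(dec$figure)
```

With type = "multiplicative", the components are multiplied instead, which suits series whose seasonal swings grow with the level.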
As we see on the trend line, the trend still fluctuates. This pattern in the trend might come from extra seasonality at a higher natural period that has not been captured, so the series can be treated as multi-seasonal data. To observe it, we have to change the forex ts object into an msts object.
forex_msts <- msts(forex$usd_to_eur, seasonal.periods = c(7, 365), start = c(2010, 1))
plot(forex_msts, main = "USD To EUR", xlab = "Year", ylab = "USD To EUR")
Great! Now we have a multiple seasonal time series (msts) object. Do the NA imputation again:
msts_kalman <- na_kalman(forex_msts, model = "auto.arima", smooth = TRUE)
msts_kalman %>%
  autoplot() +
  theme_minimal()
Next, let’s do the decomposition. We can use the mstl() function.
As we see from the chart above, this msts model has a quite clear pattern at the 365-day frequency. Before going to the next step, I will split the data into train and test.
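The split chunk is not shown. A minimal sketch in base R, assuming the last 365 observations are held out as the test set (the forecasts later use h = 365), is below. The series and the names `train` and `test` are hypothetical stand-ins for `msts_train` and `msts_test`.

```r
# Stand-in series: 10 "years" of 365 daily observations.
x <- ts(seq_len(3650), start = c(2010, 1), frequency = 365)

n_test <- 365
tt <- time(x)  # time stamp of every observation

# Train: everything up to the cut. Test: the final n_test observations.
train <- window(x, end = tt[length(x) - n_test])
test <- window(x, start = tt[length(x) - n_test + 1])

print(length(train))
print(length(test))
```

Splitting by position in time, rather than sampling at random, keeps the test set strictly after the training data, which is what a forecast has to face.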
In order to forecast the incoming data, I will use several different time-series forecasting techniques: Simple Moving Average (SMA), exponential smoothing (ETS), Holt-Winters, and ARIMA.
At the end, I am going to compare the results based on the errors they produce.
To use SMA, the first thing we need to do is fit the model. Use the SMA() function and choose 5 as the number of periods.
forex_sma <- SMA(x = msts_train, n = 5)
forex_sma <- msts(forex_sma, start = c(2010, 1), seasonal.periods = c(7, 365))
Visualize the model to compare:
msts_train %>%
  autoplot(series = "Actual") +
  autolayer(forex_sma, series = "SMA") +
  scale_color_manual(values = c("Black", "Red")) +
  theme_minimal()
From the chart above, we can see a slight difference between the actual series and the SMA series. If we change the value of n in the SMA model to a higher value, more data is lost at the start, because the first n - 1 values of the moving average are NA.
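The effect of n can be seen on a tiny example. The sketch below uses base R's `stats::filter()` as a stand-in for a trailing simple moving average. The toy vector is hypothetical.

```r
x <- c(1, 2, 3, 4, 5, 6, 7)
n <- 5

# Trailing moving average: each value is the mean of the current
# observation and the n - 1 observations before it (sides = 1).
sma <- stats::filter(x, rep(1 / n, n), sides = 1)
print(sma)

# The first n - 1 positions have no complete window, so they are NA.
print(sum(is.na(sma)))
```

A larger n gives a smoother line but removes more points at the start and reacts more slowly to recent changes.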
Do the forecasting
Visualize the forecast model
msts_kalman %>%
  autoplot(series = "train") +
  autolayer(msts_test, series = "test") +
  autolayer(forex_forecast_sma$fitted, series = "forecast train") +
  autolayer(forex_forecast_sma$mean, series = "forecast test") +
  theme_minimal()
From the line graph above, there is a slight difference at the end of the data: the SMA model produced a lower value for the US dollar rate.
Check for the accuracy
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 0.02629245 0.02978435 0.02650706 2.926293 2.950941 0.9851078 14.93062
This model has a low RMSE (Root Mean Squared Error, about 0.030) and MAE (Mean Absolute Error, about 0.027), with a MAPE of about 2.95%, which is good news.
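For readers who want to see what these columns mean, here is a hand computation of the main ones on hypothetical numbers. It assumes the usual convention of error = actual - forecast.

```r
# Hypothetical actual and forecast values.
actual <- c(0.90, 0.91, 0.89, 0.92)
pred <- c(0.88, 0.90, 0.90, 0.89)

err <- actual - pred

me <- mean(err)                        # ME: average signed error (bias)
rmse <- sqrt(mean(err^2))              # RMSE: penalizes large errors more
mae <- mean(abs(err))                  # MAE: average absolute error
mape <- mean(abs(err / actual)) * 100  # MAPE: absolute error in percent

print(c(ME = me, RMSE = rmse, MAE = mae, MAPE = mape))
```

RMSE and MAE are in the same unit as the rate itself, while MAPE is scale-free, which makes it easier to judge whether an error is small.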
Before choosing the type of exponential smoothing, we have to find out whether there is a seasonal pattern in the series or not. I can use the stl_features() function to do it.
## nperiods seasonal_period1 seasonal_period2 trend
## 2.000000e+00 7.000000e+00 3.650000e+02 9.304959e-01
## spike linearity curvature e_acf1
## 3.152743e-14 3.458855e+00 -3.451180e-01 9.853241e-01
## e_acf10 seasonal_strength1 seasonal_strength2 peak1
## 8.313717e+00 4.926877e-03 1.134930e-01 1.000000e+00
## peak2 trough1 trough2
## 1.460000e+02 5.000000e+00 2.910000e+02
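From the result above, we know that the series has two seasonal periods (7 and 365), with a seasonal strength of about 0.005 for the weekly period and about 0.11 for the yearly period. Therefore, we can build the exponential smoothing model with a seasonal component.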
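To give a feel for the seasonal strength numbers, the sketch below computes strength on simulated monthly data with base R's `stl()`, using the common definition 1 - Var(remainder) / Var(seasonal + remainder). That this matches what stl_features() reports is an assumption. The data is hypothetical.

```r
# Simulated series with a strong 12-period cycle plus noise.
set.seed(42)
x <- ts(sin(2 * pi * (1:120) / 12) + rnorm(120, sd = 0.2), frequency = 12)

fit <- stl(x, s.window = "periodic")
comp <- fit$time.series  # columns: seasonal, trend, remainder

# Strength near 1: the seasonal part dominates the noise.
# Strength near 0: the seasonal part is lost in the noise.
strength <- max(0, 1 - var(comp[, "remainder"]) /
                  var(comp[, "seasonal"] + comp[, "remainder"]))
print(strength)
```

On this scale, the values reported above for the forex series are at the weak end, with the yearly period clearly stronger than the weekly one.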
The stlm() function has several parameters that have to be assigned:
y : time series object
method : modelling method (“ets”, “arima”)
Now, I will create the model using the stlm() function.
# model building
stlm_ets <- msts_train %>%
  stlm(lambda = 0, method = "ets") %>%
  forecast(h = 365)
# plot the model
autoplot(stlm_ets) +
  scale_x_continuous(labels = scales::number_format(accuracy = 1)) +
  theme_minimal()
Create a forecast object to make a comparison using autoplot():
msts_kalman %>%
  autoplot(series = "train") +
  autolayer(msts_test, series = "test") +
  autolayer(forex_forecast_ets$fitted, series = "forecast train") +
  autolayer(forex_forecast_ets$mean, series = "forecast test") +
  theme_minimal()
Great! The chart above shows the train and test partitions so that we can compare the results. It seems that the forecast train series from stlm() with ETS shows a noticeable difference.
The next step is model evaluation. I will use the accuracy() function.
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 0.02554848 0.02896997 0.02577542 2.843545 2.869584 0.9792286 14.5232
By using the accuracy() function, we can see the errors made by the model. I will focus on RMSE (Root Mean Squared Error) and MAE (Mean Absolute Error). The RMSE and MAE of this ETS forecast are low (about 0.029 and 0.026), so the forecast is close to the test data.
The Holt-Winters method works well on series that have both trend and seasonality. It smooths the level, trend, and seasonal components. Let’s call the HoltWinters() function.
Do the forecasting for the test data.
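The fitting and forecasting chunks are not shown, so here is a minimal sketch of the two steps with base R's `HoltWinters()` and `predict()` on simulated monthly data. The series and the names `hw` and `pred` are hypothetical.

```r
# Simulated series with a trend and a 12-period seasonal pattern.
set.seed(7)
t <- 1:96
x <- ts(50 + 0.2 * t + 5 * sin(2 * pi * t / 12) + rnorm(96), frequency = 12)

# Fit: alpha smooths the level, beta the trend, gamma the seasonal part.
hw <- HoltWinters(x)
print(c(alpha = unname(hw$alpha), beta = unname(hw$beta), gamma = unname(hw$gamma)))

# Forecast one full seasonal cycle ahead.
pred <- predict(hw, n.ahead = 12)
print(pred)
```

On the forex data, the horizon would be the 365 days of the test set rather than 12.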
I will try to visualize using autoplot().
msts_kalman %>%
  autoplot(series = "train") +
  autolayer(msts_test, series = "test") +
  autolayer(forex_forecast_holtwinters$fitted, series = "forecast train") +
  autolayer(forex_forecast_holtwinters$mean, series = "forecast test") +
  theme_minimal()
The forecast produced by Holt-Winters seems slightly different from the actual data. To make sure, call the accuracy() function.
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 0.05306738 0.06057553 0.05394233 5.91461 6.014995 0.9887554 30.44576
Looking at the errors, this model has an RMSE and MAE of about 0.05 to 0.06. That is roughly twice the error of the SMA and ETS models, although the MAPE is still only about 6%.
ARIMA is a combination of two forecasting methods, Autoregressive (AR) and Moving Average (MA), applied to a series that has been differenced (the Integrated part) to make it stationary.
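A small sketch of the idea with base R's `arima()` is below. The data is simulated so that its first difference follows an AR(1) process with coefficient 0.7. All names and values are hypothetical, and this is not the stlm() model used in this analysis.

```r
# Simulate an AR(1) process, then accumulate it so the series needs
# one round of differencing (d = 1) to become stationary.
set.seed(123)
ar_part <- arima.sim(model = list(ar = 0.7), n = 300)
y <- ts(cumsum(as.numeric(ar_part)))

# order = c(p, d, q): p AR terms, d differences, q MA terms.
fit <- arima(y, order = c(1, 1, 0))
print(coef(fit))

# Forecast 10 steps ahead; $pred holds the point forecasts.
fc <- predict(fit, n.ahead = 10)
print(fc$pred)
```

The estimated ar1 coefficient should land reasonably close to the 0.7 used in the simulation.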
Now fit the model. I am going to use the stlm() function to do it and put “arima” in the method argument.
Do the forecasting
And now it is visualization time!
msts_kalman %>%
  autoplot(series = "train") +
  autolayer(msts_test, series = "test") +
  autolayer(forex_forecast_arima$fitted, series = "forecast train") +
  autolayer(forex_forecast_arima$mean, series = "forecast test") +
  theme_minimal()
Check the accuracy:
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 0.02586933 0.02938157 0.02606508 2.879121 2.901588 0.9800462 14.72804
Based on the result, the RMSE and MAE values are also very low (about 0.029 and 0.026). It means the model forecasts well.
After evaluating all four forecasting methods, we can conclude that all of them produced good models (low errors). Hence, I am going to build all four models for the year 2020.
In this part, I do not have to split the data because I am forecasting the next 365 days using the whole dataset.
Build the SMA model
forex_sma_2020 <- SMA(x = msts_kalman, n = 5)
forex_sma_2020 <- msts(data = forex_sma_2020, start = c(2010, 1), seasonal.periods = c(7, 365))
Forecasting
Visualization
Same as before, I am going to build the models using the three remaining methods.
Build the model
Forecasting
Visualize!
ggplotly(msts_kalman %>%
  autoplot(series = "Actual") +
  autolayer(forex_forecast_ets_2020$mean, series = "Forecast 2020") +
  labs(title = "USD to EUR Forecast for 2020 using ETS", y = "USD to EUR") +
  theme_minimal()
)
Store the result to an object:
Build the model:
Forecasting:
Visualize:
ggplotly(msts_kalman %>%
  autoplot(series = "Actual") +
  autolayer(forex_forecast_holwinters_2020$mean, series = "Forecast 2020") +
  labs(title = "USD to EUR Forecast for 2020 using HoltWinters", y = "USD to EUR") +
  theme_minimal()
)
Store the result to an object:
Build the model:
Forecasting:
Visualize:
ggplotly(msts_kalman %>%
  autoplot(series = "Actual") +
  autolayer(forex_forecast_arima_2020$mean, series = "Forecast 2020") +
  labs(title = "USD to EUR Forecast for 2020 using ARIMA", y = "USD to EUR") +
  theme_minimal()
)
Store the result to an object:
To conclude, I am going to combine all of the result data frames into one data.frame.
result <- sma_result %>%
  left_join(ets_result) %>%
  left_join(holtwtinters_result) %>%
  left_join(arima_result)
paged_table(result, options = list(rows.print = 10))
I will show the error comparison:
error <- data.frame(
  "Model" = c("SMA", "ETS", "Holt Winters", "ARIMA"),
  "RMSE" = c(sma_acc[[2]], ets_acc[[2]], holt_acc[[2]], arima_acc[[2]]),
  "MAE" = c(sma_acc[[3]], ets_acc[[3]], holt_acc[[3]], arima_acc[[3]])
)
paged_table(error)
I will check the range of the results:
## [1] 0.8726728 0.8955830
## [1] 0.8765365 0.8995205
## [1] 0.8190207 0.9139066
## [1] 0.8770886 0.8997408
Overall, the forecast results from all models are quite similar, and all of them produce low errors on the test data. Therefore, these models worked well on the dataset. As the final result, the US dollar to euro rate in 2020 is forecast to be around 0.87 to 0.91, although one model's range extends down to about 0.82.
So, that’s all for the process of time-series forecasting using packages in the R programming language. I hope this page helps you understand time-series problems and the solutions behind them.
See you on the next page!
Author,
Alfado Sembiring