Introduction

Hello everyone! On this page, I would like to show you an analysis of foreign exchange rates. I am using time-series models to predict changes in EUR/USD. The dataset contains the daily closing price, and I use the years 2010-2019, so it is going to be a fairly long time series.
Without any further ado, let’s get started!

Objective

Using the data from 2010 to 2019, I am trying to forecast the US dollar rate against the euro for the whole of 2020. I am going to use several R time-series functions along the way.

Data Wrangling

First, load the needed dataset.

## # A tibble: 6 x 24
##      X1 `Time Serie` `AUSTRALIA - AU~ `EURO AREA - EU~ `NEW ZEALAND - ~
##   <dbl> <date>       <chr>            <chr>            <chr>           
## 1     0 2000-01-03   1.5172           0.9847           1.9033          
## 2     1 2000-01-04   1.5239           0.97             1.9238          
## 3     2 2000-01-05   1.5267           0.9676           1.9339          
## 4     3 2000-01-06   1.5291           0.9686           1.9436          
## 5     4 2000-01-07   1.5272           0.9714           1.938           
## 6     5 2000-01-10   1.5242           0.9754           1.935           
## # ... with 19 more variables: `UNITED KINGDOM - UNITED KINGDOM
## #   POUND/US$` <chr>, `BRAZIL - REAL/US$` <chr>, `CANADA - CANADIAN
## #   DOLLAR/US$` <chr>, `CHINA - YUAN/US$` <chr>, `HONG KONG - HONG KONG
## #   DOLLAR/US$` <chr>, `INDIA - INDIAN RUPEE/US$` <chr>, `KOREA -
## #   WON/US$` <chr>, `MEXICO - MEXICAN PESO/US$` <chr>, `SOUTH AFRICA -
## #   RAND/US$` <chr>, `SINGAPORE - SINGAPORE DOLLAR/US$` <chr>, `DENMARK -
## #   DANISH KRONE/US$` <chr>, `JAPAN - YEN/US$` <chr>, `MALAYSIA -
## #   RINGGIT/US$` <chr>, `NORWAY - NORWEGIAN KRONE/US$` <chr>, `SWEDEN -
## #   KRONA/US$` <chr>, `SRI LANKA - SRI LANKAN RUPEE/US$` <chr>, `SWITZERLAND -
## #   FRANC/US$` <chr>, `TAIWAN - NEW TAIWAN DOLLAR/US$` <chr>, `THAILAND -
## #   BAHT/US$` <chr>

Use glimpse() to get an overview of the data.

## Observations: 5,217
## Variables: 24
## $ X1                                          <dbl> 0, 1, 2, 3, 4, 5, 6, 7,...
## $ `Time Serie`                                <date> 2000-01-03, 2000-01-04...
## $ `AUSTRALIA - AUSTRALIAN DOLLAR/US$`         <chr> "1.5172", "1.5239", "1....
## $ `EURO AREA - EURO/US$`                      <chr> "0.9847", "0.97", "0.96...
## $ `NEW ZEALAND - NEW ZELAND DOLLAR/US$`       <chr> "1.9033", "1.9238", "1....
## $ `UNITED KINGDOM - UNITED KINGDOM POUND/US$` <chr> "0.6146", "0.6109", "0....
## $ `BRAZIL - REAL/US$`                         <chr> "1.805", "1.8405", "1.8...
## $ `CANADA - CANADIAN DOLLAR/US$`              <chr> "1.4465", "1.4518", "1....
## $ `CHINA - YUAN/US$`                          <chr> "8.2798", "8.2799", "8....
## $ `HONG KONG - HONG KONG DOLLAR/US$`          <chr> "7.7765", "7.7775", "7....
## $ `INDIA - INDIAN RUPEE/US$`                  <chr> "43.55", "43.55", "43.5...
## $ `KOREA - WON/US$`                           <chr> "1128", "1122.5", "1135...
## $ `MEXICO - MEXICAN PESO/US$`                 <chr> "9.4015", "9.457", "9.5...
## $ `SOUTH AFRICA - RAND/US$`                   <chr> "6.126", "6.085", "6.07...
## $ `SINGAPORE - SINGAPORE DOLLAR/US$`          <chr> "1.6563", "1.6535", "1....
## $ `DENMARK - DANISH KRONE/US$`                <chr> "7.329", "7.218", "7.20...
## $ `JAPAN - YEN/US$`                           <chr> "101.7", "103.09", "103...
## $ `MALAYSIA - RINGGIT/US$`                    <chr> "3.8", "3.8", "3.8", "3...
## $ `NORWAY - NORWEGIAN KRONE/US$`              <chr> "7.964", "7.934", "7.93...
## $ `SWEDEN - KRONA/US$`                        <chr> "8.443", "8.36", "8.353...
## $ `SRI LANKA - SRI LANKAN RUPEE/US$`          <chr> "72.3", "72.65", "72.95...
## $ `SWITZERLAND - FRANC/US$`                   <chr> "1.5808", "1.5565", "1....
## $ `TAIWAN - NEW TAIWAN DOLLAR/US$`            <chr> "31.38", "30.6", "30.8"...
## $ `THAILAND - BAHT/US$`                       <chr> "36.97", "37.13", "37.1...

Since I want to analyze EUR to USD only, I select() that variable and change the type of EURO AREA - EURO/US$ from character to numeric (not integer, because the rates are decimals). Then I filter the dates from 2010 onward.
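The code for this step is not shown on the page. Here is a minimal sketch of how it might look with dplyr. The object name forex_raw, the "ND" marker for missing quotes, and the toy values are all assumptions for illustration; the output below only tells us that the result has a date column and a usd_to_eur column of type <dbl>.

```r
library(dplyr)

# Toy stand-in for the raw table (the real one has 24 columns).
# The object name, the "ND" marker, and the values are assumptions.
forex_raw <- tibble(
  `Time Serie` = as.Date(c("2009-12-30", "2009-12-31", "2010-01-01",
                           "2010-01-04", "2010-01-05")),
  `EURO AREA - EURO/US$` = c("0.6977", "0.6975", "ND", "0.6943", "0.6937")
)

forex <- forex_raw %>%
  # keep only the date and the EUR column, with shorter names
  select(date = `Time Serie`, usd_to_eur = `EURO AREA - EURO/US$`) %>%
  # character -> numeric; anything that is not a number becomes NA
  mutate(usd_to_eur = suppressWarnings(as.numeric(usd_to_eur))) %>%
  # keep 2010 onwards
  filter(date >= as.Date("2010-01-01"))

forex
```

Converting with as.numeric() rather than as.integer() matters here, because as.integer() would truncate every rate to 0.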

## # A tibble: 6 x 2
##   date       usd_to_eur
##   <date>          <dbl>
## 1 2010-01-01     NA    
## 2 2010-01-04      0.694
## 3 2010-01-05      0.694
## 4 2010-01-06      0.694
## 5 2010-01-07      0.699
## 6 2010-01-08      0.696

Okay, next, I have to make sure that no dates are missing from the series. I will use pad() to do it, after checking the first and last date.

## [1] "2010-01-01"
## [1] "2019-12-31"

Do the padding
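A minimal sketch of the padding step with padr::pad(), on a toy data frame with a weekend gap. The values are illustrative, not taken from the real file.

```r
library(padr)

# Toy data with a gap between 2010-01-01 and 2010-01-04.
forex <- data.frame(
  date = as.Date(c("2010-01-01", "2010-01-04", "2010-01-05", "2010-01-06")),
  usd_to_eur = c(NA, 0.694, 0.694, 0.694)
)

# Insert one row for every missing calendar day between the first and
# last date. The new rows get NA in usd_to_eur.
forex_padded <- pad(forex, interval = "day")

forex_padded
```

pad() also accepts start_val and end_val, which is probably why the first and last date are checked before padding.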

## # A tibble: 6 x 2
##   date       usd_to_eur
##   <date>          <dbl>
## 1 2010-01-01     NA    
## 2 2010-01-02     NA    
## 3 2010-01-03     NA    
## 4 2010-01-04      0.694
## 5 2010-01-05      0.694
## 6 2010-01-06      0.694

Great!

Now, check for missing values
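One common way to count missing values per column in base R is colSums(is.na(...)). A small sketch on toy data (the object name forex_padded is an assumption):

```r
# Toy padded data: three days without a quote.
forex_padded <- data.frame(
  date = seq(as.Date("2010-01-01"), by = "day", length.out = 6),
  usd_to_eur = c(NA, NA, NA, 0.694, 0.694, 0.694)
)

# is.na() gives a logical matrix; colSums() counts the TRUEs per column.
na_counts <- colSums(is.na(forex_padded))
na_counts
```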

##       date usd_to_eur 
##          0       1150

Exploratory Data Analysis

In order to get a clearer view, I will plot the time-series data using ggplotly()
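A sketch of how such a plot might be built, assuming the padded data frame is called forex_padded (a hypothetical name) and using toy values. geom_line() leaves a gap wherever the value is NA, which is how missing days show up in the chart.

```r
library(ggplot2)
library(plotly)

# Toy padded data with missing values in the middle.
forex_padded <- data.frame(
  date = seq(as.Date("2010-01-04"), by = "day", length.out = 8),
  usd_to_eur = c(0.694, 0.694, 0.699, NA, NA, 0.696, 0.690, 0.689)
)

p <- ggplot(forex_padded, aes(x = date, y = usd_to_eur)) +
  geom_line() +
  labs(x = "Date", y = "EUR per USD")

# Turn the static ggplot into an interactive plotly widget.
# Print `fig` in an interactive session to display it.
fig <- ggplotly(p)
```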

As we can see from the chart above, the line is not fully connected. This indicates that there are missing values in the time-series data. To handle them, we first have to build the time-series object.

Modelling

Building a time-series object is quite easy. Just pass in the variable we want and set the start time and end time. Do not forget to set the frequency, because it can make a big difference to the model. In this case, I will choose 365 (the data is daily, so one year is one seasonal cycle).
Use the ts() function.
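A minimal sketch of the ts() call, on a synthetic daily series that stands in for the real usd_to_eur column. The object names and the simulated values are assumptions.

```r
# Synthetic daily series (two years) standing in for the padded column.
set.seed(1)
rate <- 0.75 + cumsum(rnorm(730, sd = 0.002))

# start = c(2010, 1) means "first observation of 2010";
# frequency = 365 means 365 observations per seasonal cycle (one year).
forex_ts <- ts(rate, start = c(2010, 1), frequency = 365)

frequency(forex_ts)
start(forex_ts)
```

The end time can be left out, because ts() works it out from the start, the frequency, and the length of the data.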

Try to plot using autoplot()

Nice. The chart looks similar to the previous one made with ggplot().

As we saw in the data frame, there are NA (missing) values in it. I will replace the NAs using the imputeTS package. I am using na_kalman() with model = "auto.arima".
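A small sketch of the imputation call on synthetic data. The series, the positions of the missing values, and the object names are made up, and I use a plain ts without a seasonal frequency to keep the example quick.

```r
library(imputeTS)

# Synthetic random-walk series with a few values knocked out.
set.seed(42)
rate <- 0.75 + cumsum(rnorm(200, sd = 0.002))
rate[c(2, 3, 9, 10, 50, 120)] <- NA
forex_ts <- ts(rate)

# Fit an ARIMA model chosen by forecast::auto.arima() and use Kalman
# smoothing on its state-space form to fill in the NAs.
forex_imputed <- na_kalman(forex_ts, model = "auto.arima")

anyNA(forex_imputed)
```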

Cool! The lines are now fully connected. Now, I can continue to process and analyze the model.

Decomposing

Decomposing is a process that splits the time-series data into three main components, which are :

  • Seasonal : a pattern that repeats over a fixed period
  • Trend : the long-term upward or downward movement of the series
  • Remainder : what is left after the seasonal and trend parts are removed, i.e. information that is not captured by the model.

We can use the decompose() function to do this. It has a type argument which determines how the components are combined.
There are two options for type :

  • Additive : DATA = TREND + SEASONALITY + ERROR
  • Multiplicative : DATA = TREND * SEASONALITY * ERROR

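Before picking a type, we have to look at the time-series chart we built. Additive fits when the size of the seasonal swings stays roughly constant as the level changes, and multiplicative fits when the swings grow with the level. In this case, I will pick additive.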
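To make the additive formula concrete, here is a sketch with decompose() on a synthetic series (trend + yearly sine wave + noise, all made up). It checks that the three components really add back up to the data.

```r
# Synthetic daily series: linear trend + yearly cycle + noise.
set.seed(7)
n <- 365 * 4
day <- seq_len(n)
y <- 0.70 + 0.00005 * day + 0.01 * sin(2 * pi * day / 365) +
  rnorm(n, sd = 0.002)
forex_ts <- ts(y, start = c(2010, 1), frequency = 365)

dec <- decompose(forex_ts, type = "additive")

# DATA = TREND + SEASONALITY + ERROR. The trend is a centred moving
# average, so it is NA at both ends; compare only where it is defined.
ok <- !is.na(dec$trend)
recon <- dec$trend + dec$seasonal + dec$random
all.equal(as.numeric(recon[ok]), as.numeric(forex_ts[ok]))
```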

As we can see on the trend line, the trend still fluctuates. This pattern in the trend might come from extra seasonality with a longer natural period that has not been captured, so the data can be treated as multi-seasonal. To examine it, we have to convert the forex ts object into an msts object.
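A sketch of the conversion with forecast::msts(). The data is synthetic, and the weekly and yearly periods (7 and 365) are taken from the seasonal periods that stl_features() reports further down this page.

```r
library(forecast)

# Synthetic daily series with a weekly and a yearly cycle.
set.seed(7)
n <- 365 * 4
day <- seq_len(n)
y <- 0.70 + 0.01 * sin(2 * pi * day / 365) +
  0.002 * sin(2 * pi * day / 7) + rnorm(n, sd = 0.002)

# One series, two seasonal periods.
forex_msts <- msts(y, seasonal.periods = c(7, 365), start = c(2010, 1))

attr(forex_msts, "msts")
```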

Great! Now we have a multiple seasonal time series (msts) object. Run the NA imputation again.

Next, let’s decompose it. We can use the mstl() function.
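A sketch of mstl() on a synthetic msts object (the data and object names are assumptions). Unlike decompose(), it returns one seasonal column per period.

```r
library(forecast)

# Synthetic daily series with a weekly and a yearly cycle.
set.seed(7)
n <- 365 * 4
day <- seq_len(n)
y <- 0.70 + 0.01 * sin(2 * pi * day / 365) +
  0.002 * sin(2 * pi * day / 7) + rnorm(n, sd = 0.002)
forex_msts <- msts(y, seasonal.periods = c(7, 365))

# Multiple seasonal decomposition: data, trend, one seasonal component
# per period, and the remainder.
fit <- mstl(forex_msts)
colnames(fit)
```

Calling autoplot(fit) draws one panel per column.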

As we can see from the chart above, this msts object has a fairly clear pattern at the 365-day frequency. Before going to the next step, I will split the data into train and test.
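A sketch of one way to do the split with forecast's subset() method. Holding out the last 365 days as the test set is my assumption; the page does not say how long the test set is.

```r
library(forecast)

# Synthetic daily series with a weekly and a yearly cycle.
set.seed(7)
n <- 365 * 4
day <- seq_len(n)
y <- 0.70 + 0.01 * sin(2 * pi * day / 365) +
  0.002 * sin(2 * pi * day / 7) + rnorm(n, sd = 0.002)
forex_msts <- msts(y, seasonal.periods = c(7, 365))

test_len <- 365  # assumed length of the hold-out period

# Everything except the last year is used for training ...
forex_train <- subset(forex_msts, end = length(forex_msts) - test_len)
# ... and the last year is kept aside for validation.
forex_test <- subset(forex_msts, start = length(forex_msts) - test_len + 1)

c(train = length(forex_train), test = length(forex_test))
```

For time series the split has to be chronological, never random, because the model must only see the past.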

Forecasting For Validation

In order to forecast the incoming data, I will use several different time-series forecasting techniques:

  • SMA : Simple Moving Average
  • ETS : Error, Trend, and Seasonal
  • Holt Winters : Holt-Winters method (data with trend and seasonal)
  • ARIMA : Autoregressive Integrated Moving Average

At the end, I am going to compare the results based on their errors produced.

Simple Moving Average (SMA)

To use SMA, the first thing we need to do is fit the model. Use the SMA() function and try 3 as the period number (n).

Visualize the model to do comparison

From the chart above, we can see a slight difference between the actual series and the SMA series. If we change the value of n in the SMA model to a higher value, more data is lost at the start of the series, because the first n - 1 values of the average cannot be computed.
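A tiny worked example with TTR::SMA() on made-up numbers shows the effect of n:

```r
library(TTR)

x <- c(1, 2, 3, 4, 5, 6)

# Each value is the mean of the current and the previous n - 1 points,
# so the first n - 1 results are NA.
sma3 <- SMA(x, n = 3)  # NA NA 2 3 4 5
sma5 <- SMA(x, n = 5)  # NA NA NA NA 3 4

sma3
sma5
```

With n = 3 two values are lost, with n = 5 four values are lost, and the series also gets smoother.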

Do the forecasting

Visualize the forecast model

From the line graph above, there is a slight difference at the end of the series: the SMA model produced a lower US dollar rate than the actual data.

Check for the accuracy

##                  ME       RMSE        MAE      MPE     MAPE      ACF1 Theil's U
## Test set 0.02629245 0.02978435 0.02650706 2.926293 2.950941 0.9851078  14.93062

This model has a low RMSE (Root Mean Squared Error) and MAE (Mean Absolute Error), about 0.03 each, and a MAPE of about 2.95%, which is good news.

Exponential Smoothing

Before choosing the type of exponential smoothing, we have to find out whether there is a seasonal pattern in the data or not. I can use the stl_features() function to do it.
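A sketch of the call, assuming stl_features() comes from the tsfeatures package and using a synthetic msts object in place of the real series:

```r
library(forecast)
library(tsfeatures)

# Synthetic daily series with a weekly and a yearly cycle.
set.seed(7)
n <- 365 * 4
day <- seq_len(n)
y <- 0.70 + 0.01 * sin(2 * pi * day / 365) +
  0.002 * sin(2 * pi * day / 7) + rnorm(n, sd = 0.002)
forex_msts <- msts(y, seasonal.periods = c(7, 365))

# Named vector of STL-based features: trend strength, seasonal
# strength per period, peaks, troughs, and so on.
feats <- stl_features(forex_msts)
feats[c("nperiods", "seasonal_strength1", "seasonal_strength2")]
```

The seasonal strengths lie between 0 and 1, where values close to 0 mean weak seasonality and values close to 1 mean strong seasonality.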

##           nperiods   seasonal_period1   seasonal_period2              trend 
##       2.000000e+00       7.000000e+00       3.650000e+02       9.304959e-01 
##              spike          linearity          curvature             e_acf1 
##       3.152743e-14       3.458855e+00      -3.451180e-01       9.853241e-01 
##            e_acf10 seasonal_strength1 seasonal_strength2              peak1 
##       8.313717e+00       4.926877e-03       1.134930e-01       1.000000e+00 
##              peak2            trough1            trough2 
##       1.460000e+02       5.000000e+00       2.910000e+02

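From the result above, we know that the series has two seasonal periods (7 and 365 days). The seasonal strengths are small (about 0.005 for the weekly period and about 0.11 for the yearly one), but seasonality is present. Therefore, we can create exponential smoothing models with a seasonal component.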

ETS (Error, Trend, Seasonal)

In stlm() function, there are several parameters that have to be assigned :

  • y : time series object
  • method : method of modelling (“ets”, “arima”)

Now, I will create the model using the stlm() function.
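A sketch of fitting and forecasting with stlm(), on synthetic data. The object names, the three-year training window, and the 365-day horizon are assumptions.

```r
library(forecast)

# Synthetic training data: three years of daily values with a weekly
# and a yearly cycle.
set.seed(7)
n <- 365 * 3
day <- seq_len(n)
y <- 0.70 + 0.01 * sin(2 * pi * day / 365) +
  0.002 * sin(2 * pi * day / 7) + rnorm(n, sd = 0.002)
forex_train <- msts(y, seasonal.periods = c(7, 365))

# stlm() removes the seasonal components with STL/MSTL, fits ETS to the
# seasonally adjusted series, and adds the seasonality back when
# forecasting.
model_ets <- stlm(forex_train, method = "ets")

# Forecast the next 365 days.
fc_ets <- forecast(model_ets, h = 365)
length(fc_ets$mean)
```

Switching to method = "arima" keeps the same workflow but fits an ARIMA model to the seasonally adjusted series instead.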

Create a forecast object and compare it with the actual data using autoplot().

Great! The chart above shows the train and test partitions so we can compare the result. It seems that the forecast from stlm() with ETS differs noticeably from the test data.

Next step is model evaluation. I will use accuracy() function

##                  ME       RMSE        MAE      MPE     MAPE      ACF1 Theil's U
## Test set 0.02554848 0.02896997 0.02577542 2.843545 2.869584 0.9792286   14.5232

By using the accuracy() function, we can see the errors made by the model. I will focus on RMSE (Root Mean Squared Error) and MAE (Mean Absolute Error). Both values for this ETS forecast are low (about 0.03 on a rate of roughly 0.9), so the model produced a fairly accurate result.
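To show what these numbers mean, here is a small base R sketch that computes RMSE, MAE, and MAPE by hand. The four actual and predicted values are made up.

```r
# Made-up actual and forecast values.
actual    <- c(0.890, 0.892, 0.895, 0.893)
predicted <- c(0.880, 0.885, 0.900, 0.890)

err <- actual - predicted              # forecast errors

rmse <- sqrt(mean(err^2))              # penalises large errors more
mae  <- mean(abs(err))                 # average size of the error
mape <- mean(abs(err / actual)) * 100  # error as a percentage of the rate

c(RMSE = rmse, MAE = mae, MAPE = mape)
```

RMSE is never smaller than MAE, and the two are in the same unit as the data, so they have to be read relative to the level of the rate.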

Holt Winters

The Holt-Winters method is a good fit for data that has both trend and seasonality. It smooths the level, trend, and seasonal components. Let’s call the HoltWinters() function.
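A sketch of the call on synthetic data. I fix the three smoothing parameters by hand (the values are arbitrary) so the example is deterministic; leaving alpha, beta, and gamma unset lets HoltWinters() estimate them from the data.

```r
# Synthetic daily series: trend + yearly cycle + noise.
set.seed(3)
n <- 365 * 4
day <- seq_len(n)
y <- 0.70 + 0.00005 * day + 0.01 * sin(2 * pi * day / 365) +
  rnorm(n, sd = 0.002)
forex_ts <- ts(y, start = c(2010, 1), frequency = 365)

# alpha smooths the level, beta the trend, gamma the seasonal component.
# The values here are arbitrary choices for illustration.
hw <- HoltWinters(forex_ts, alpha = 0.3, beta = 0.1, gamma = 0.2,
                  seasonal = "additive")

# Predict one more year.
hw_pred <- predict(hw, n.ahead = 365)
NROW(hw_pred)
```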

Forecast over the test period.

I will try to visualize using autoplot().

The forecast produced by Holt Winters seems slightly different from the actual data. To make sure, call the accuracy() function.

##                  ME       RMSE        MAE     MPE     MAPE      ACF1 Theil's U
## Test set 0.05306738 0.06057553 0.05394233 5.91461 6.014995 0.9887554  30.44576

Looking at the errors, RMSE and MAE are about 0.05 - 0.06. That is still low, but roughly twice the error of the SMA and ETS models.

Autoregressive Integrated Moving Average (ARIMA)

ARIMA combines two forecasting methods, Autoregressive (AR) and Moving Average (MA), with an Integrated (I) step that differences the series to make it stationary.
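A small base R sketch of the idea, separate from the stlm() workflow used on this page: simulate an ARIMA(1,1,0) process with a known AR coefficient (0.6, chosen arbitrarily) and check that arima() recovers it.

```r
set.seed(10)

# Simulate an ARIMA(1,1,0) series: AR(1) increments, integrated once.
x <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.6), n = 500)

# order = c(p, d, q): p AR terms, d differences, q MA terms.
fit <- arima(x, order = c(1, 1, 0))

coef(fit)  # the ar1 estimate should be close to the true value 0.6
```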

Now fit the model. I am going to use the stlm() function to do it and pass “arima” as the method argument.

Do the forecasting

And now it is time to visualize!

Check the accuracy :

##                  ME       RMSE        MAE      MPE     MAPE      ACF1 Theil's U
## Test set 0.02586933 0.02938157 0.02606508 2.879121 2.901588 0.9800462  14.72804

Based on the result, the RMSE and MAE values are also very low, which means it is a good forecasting model.

Forecasting For Incoming year 2020

After fitting all four forecasting methods, we can conclude that all of them produced good models (low errors). Hence, I am going to build the four models again to forecast the year 2020.
In this part, I do not have to split the data because I am forecasting the next 365 days using the whole dataset.

Simple Moving Average

Build the SMA model

Forecasting

Visualization

Exponential Smoothing

Same as before, I am going to build the models using the remaining three methods.

ETS (Error, Trend, Seasonal)

Build The model

Forecasting

Visualize!

Store the result to an object:

Holt Winters

Build the model:

Forecasting :

Visualize :

Store the result to an object

Autoregressive Integrated Moving Average (ARIMA)

Build the model :

Forecasting :

Visualize :

Store the result to an object

Conclusion

To conclude, I am going to combine all of the result data frames into one data.frame.

I will show the error comparison

I will check the range of result
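A sketch of how the combined results and their ranges might be computed in base R. The column names and the three toy values per model are hypothetical, not the real forecasts.

```r
# Hypothetical combined forecasts, one column per model (toy values).
results <- data.frame(
  sma   = c(0.875, 0.885, 0.895),
  ets   = c(0.877, 0.890, 0.899),
  hw    = c(0.820, 0.870, 0.913),
  arima = c(0.878, 0.889, 0.899)
)

# range() returns c(min, max); sapply() applies it to every column.
ranges <- sapply(results, range)
rownames(ranges) <- c("min", "max")
ranges
```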

## [1] 0.8726728 0.8955830
## [1] 0.8765365 0.8995205
## [1] 0.8190207 0.9139066
## [1] 0.8770886 0.8997408

Overall, the forecast results of all models are quite similar, and they all have low errors, so these models worked well on the dataset. Three of the four ranges sit at roughly 0.87 - 0.90, and the remaining one is wider (0.82 - 0.91). As a final result, I expect the US dollar rate against the euro in 2020 to be around 0.87 - 0.91.

Ending

So, that’s all for the process of time-series forecasting using packages in the R programming language. I hope this page helps you understand time-series problems and the solutions behind them.

See you in the other page!

Author,
Alfado Sembiring
