Forecasting & Time Series

Introduction

In this exercise, we have covered various topics including:

Time Series Data:

Plots, trends, and seasonal variation
Decomposition of series

Correlation:

Expected value
Autocorrelation
The correlogram

Forecasting Strategies:

Leading variables and associated variables
Bass model
Exponential smoothing and the Holt-Winters method

Dataset: We will use the goy dataset, which contains monthly observations of the prices of gold and oil and of the price of 1 US dollar in terms of Japanese yen.

Q-1:

Download the goy data set posted on Moodle and label it goy. Set the first column of the data set to date format and the remaining columns to numeric format.

library(readxl) # provides read_excel()

goy <- read_excel("goy.xls", col_types = c("date", "numeric", "numeric", "numeric"))

class(goy$observation_date)

## [1] "POSIXct" "POSIXt"

Q-2:

Create a new data set called “goycc” that contains all complete cases of the goy data. Use the complete.cases() function.

str(goy)

## tibble [879 x 4] (S3: tbl_df/tbl/data.frame)
##  $ observation_date: POSIXct[1:879], format: "1946-01-01" "1946-02-01" ...
##  $ gold            : num [1:879] NA NA NA NA NA NA NA NA NA NA ...
##  $ oil             : num [1:879] 1.17 1.17 1.17 1.27 1.27 1.27 1.27 1.52 1.52 1.52 ...
##  $ yen             : num [1:879] NA NA NA NA NA NA NA NA NA NA ...

goycc <- goy[complete.cases(goy),]
str(goycc)

## tibble [578 x 4] (S3: tbl_df/tbl/data.frame)
##  $ observation_date: POSIXct[1:578], format: "1971-01-01" "1971-02-01" ...
##  $ gold            : num [1:578] 37.9 38.7 38.9 39 40.5 ...
##  $ oil             : num [1:578] 3.56 3.56 3.56 3.56 3.56 3.56 3.56 3.56 3.56 3.56 ...
##  $ yen             : num [1:578] 358 358 358 358 357 ...

Observation: The original dataset has 879 observations, some of which contain NA values. Keeping only the complete cases reduces it to 578 observations.
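As a small illustration of what complete.cases() returns, the sketch below applies it to a toy data frame. The values and the names toy, keep, and toy_cc are hypothetical and are not taken from the goy data.

```r
# Toy data frame (hypothetical values, not taken from goy)
toy <- data.frame(
  gold = c(NA, 38.7, 38.9),
  oil  = c(1.17, 3.56, NA),
  yen  = c(NA, 357.5, 357.5)
)

# TRUE only for rows in which no column is NA
keep <- complete.cases(toy)
keep

# Row subsetting keeps just the complete rows
toy_cc <- toy[keep, ]
nrow(toy_cc)
```

Only the second row survives, because it is the only one without a missing value in any column.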

Q-3:

Create a standalone variable “date” that takes on the values of the “observation_date” variable from the goycc data set. Set the mode of the variable to character.

date <- as.character(goycc$observation_date)
head(date)

## [1] "1971-01-01" "1971-02-01" "1971-03-01" "1971-04-01" "1971-05-01"
## [6] "1971-06-01"

Q-4:

Find the range of dates covered in goycc data set by applying range() function to “date” variable.

range(date)

## [1] "1971-01-01" "2019-02-01"

We have dates ranging from “1971-01-01” to “2019-02-01”.

Q-5:

Create a time series object called “goyccts” by utilizing goycc dataset and ts() function. In this dataset please exclude the first column of the goycc dataset.

goycc2 <- goycc[, -1] # Drop the first (date) column
goyccts <- ts(goycc2, start = c(1971, 1), end = c(2019, 2), frequency = 12) # Convert to a monthly time series
head(goyccts)

##          gold  oil      yen
## [1,] 37.86750 3.56 358.0200
## [2,] 38.71600 3.56 357.5450
## [3,] 38.87283 3.56 357.5187
## [4,] 39.00100 3.56 357.5032
## [5,] 40.49250 3.56 357.4130
## [6,] 40.10477 3.56 357.4118

Once the data are stored as a time series object, the date column is no longer needed, because the object itself records the start date, end date, and frequency of the observations.

Q-6:

Reassign the value of the yen variable from the goycc data set by converting the exchange rate, which represents the price of 1 US Dollar in terms of Japanese yen, so that it represents the price of 1 Yen in terms of US Dollars. This way, an increase in the number represents an appreciation of the Yen. Hint: Reassign the value of the yen variable by taking its reciprocal.

goyccts[,"yen"] <- 1/goyccts[,"yen"]
head(goyccts)

##          gold  oil         yen
## [1,] 37.86750 3.56 0.002793140
## [2,] 38.71600 3.56 0.002796851
## [3,] 38.87283 3.56 0.002797057
## [4,] 39.00100 3.56 0.002797178
## [5,] 40.49250 3.56 0.002797884
## [6,] 40.10477 3.56 0.002797893

Q-7:

Plot the time series plot of the three assets. Do you see any trend? Do you see any seasonal component?

plot(goyccts)

Observation: A positive trend is apparent in all three assets. There may be some seasonal fluctuation in each of them, though it is hard to tell from this plot.

Q-8:

Utilize the aggregate function to plot annual prices of the three assets. How does this graph differ from the monthly time series plot?

plot(aggregate(goyccts))

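Observation: The annual graph is smoother. By default, aggregate() sums the twelve monthly values within each year, which removes the within-year (seasonal) fluctuations and leaves mainly the trend. Note that the vertical axis therefore shows annual totals rather than price levels; passing FUN = mean would give annual average prices instead.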
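The difference between annual totals and annual averages can be seen on a toy monthly series. This is a minimal sketch with made-up values; the names toy, annual_sum, and annual_mean are illustrative only.

```r
# Toy monthly series: value 1 throughout 2000, value 2 throughout 2001
toy <- ts(rep(1:2, each = 12), start = c(2000, 1), frequency = 12)

annual_sum  <- aggregate(toy)              # default FUN = sum
annual_mean <- aggregate(toy, FUN = mean)  # annual average level

as.numeric(annual_sum)   # 12 24
as.numeric(annual_mean)  # 1 2
```

Both versions have one value per year and the same shape, but only the FUN = mean version stays on the original price scale.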

Q-9:

Find the average summer price of oil for the entire sample.

oil_june <- window(goyccts[,"oil"], start = c(1971,6), freq = TRUE)
oil_july <- window(goyccts[,"oil"], start = c(1971,7), freq = TRUE)
oil_aug <- window(goyccts[,"oil"], start = c(1971,8), freq = TRUE)

oil_summer_mean <- mean(c(oil_june, oil_july, oil_aug))
oil_summer_mean

## [1] 37.26092

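Setting freq = TRUE (which R treats as frequency = 1) makes window() keep one value per year, starting at the given month. Each call above therefore extracts that summer month's value for every year in the sample.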
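To see how the yearly sampling works, here is a minimal sketch on a toy monthly series whose values are simply 1 to 36. The series toy and the variable june are hypothetical; frequency = 1 is written out in full instead of the abbreviated freq = TRUE.

```r
# Toy monthly series covering 2000-2002; the value equals the month's position
toy <- ts(1:36, start = c(2000, 1), frequency = 12)

# One observation per year, starting in June 2000
june <- window(toy, start = c(2000, 6), frequency = 1)

as.numeric(june)  # 6 18 30
```

The positions 6, 18, and 30 are the Junes of 2000, 2001, and 2002, which is the same pattern the oil_june, oil_july, and oil_aug calls rely on.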

Q-10:

Find the average winter price of oil for the entire sample.

oil_dec <- window(goyccts[,"oil"], start = c(1971,12), freq = TRUE)
oil_jan <- window(goyccts[,"oil"], start = c(1971,1), freq = TRUE)
oil_feb <- window(goyccts[,"oil"], start = c(1971,2), freq = TRUE)

oil_winter_mean <- mean(c(oil_dec, oil_jan, oil_feb))
oil_winter_mean

## [1] 34.74591

As before, freq = TRUE makes window() keep one value per year, so each call extracts that winter month's value for every year in the sample.

Q-11:

How does the summer price of oil compare to the winter price of oil? Please provide your answer in percentages.

oil_summer_ratio <- (oil_summer_mean / mean(goyccts[,"oil"]))*100
oil_winter_ratio <- (oil_winter_mean / mean(goyccts[,"oil"]))*100
cbind(oil_summer_ratio, oil_winter_ratio)

##      oil_summer_ratio oil_winter_ratio
## [1,]         102.7733         95.83637

summer_winter_pct <- ((oil_summer_mean - oil_winter_mean)/oil_winter_mean)*100

Observation: Summer oil prices are about 102.8% of the all-year average oil price, whereas winter oil prices are about 95.8% of it. Oil therefore tends to be more expensive in summer and cheaper in winter. Put differently, the average summer price is about 7.2% higher than the average winter price.
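The value of summer_winter_pct is computed above but not printed. Plugging in the two seasonal means reported in Q-9 and Q-10 gives the following worked arithmetic:

```latex
\[
\text{summer\_winter\_pct}
  = \frac{37.26092 - 34.74591}{34.74591} \times 100
  \approx 7.24\%
\]
```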

Q-12:

Use the window() function to create three standalone variables “gold”, “oil”, and “yen” that take on the values of the “gold”, “oil”, and “yen” variables from the goyccts dataset, starting from January 2005.

gold <- window(goyccts[,"gold"], start = c(2005,1))
oil <- window(goyccts[,"oil"], start = c(2005,1))
yen <- window(goyccts[,"yen"], start = c(2005,1))

Q-13:

Use plot() and decompose() functions to generate three graphs that would depict the observed values, trends, seasonal, and random components for “gold”, “oil” and “yen” variables. Would you choose multiplicative or additive decomposition model for each of the variables?

oil_decom <- decompose(oil, type = "additive")
gold_decom <- decompose(gold, type = "additive")
yen_decom <- decompose(yen, type = "additive")

plot(oil_decom)

plot(gold_decom)

plot(yen_decom)

We choose between additive and multiplicative decomposition by trial and error, based on the appearance of the “random” component: we keep the decomposition type whose random component has roughly constant variance over time.
For all three series, the variance of the random component looks stable under additive decomposition, so we use the additive model.
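For reference, the two model forms being compared can be written as follows. The symbols are notation assumed here, not taken from the exercise: x_t is the observed series, m_t the trend, s_t the seasonal effect, and z_t the random component.

```latex
\[
\text{additive:}\quad x_t = m_t + s_t + z_t
\qquad\qquad
\text{multiplicative:}\quad x_t = m_t \cdot s_t \cdot z_t
\]
```

A multiplicative model is usually the better fit when the size of the seasonal swings grows with the level of the series. Taking logs turns it into an additive one, since \(\log x_t = \log m_t + \log s_t + \log z_t\).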

Q-14:

For each of the variables extract the random component and save them as “goldrand”, “oilrand”, and “yenrand”. Moreover, use na.omit() function to deal with the missing values.

goldrand <- na.omit(gold_decom$random)
oilrand <- na.omit(oil_decom$random)
yenrand <- na.omit(yen_decom$random)

Q-15:

For the random component of each of the assets, please estimate the autocorrelation function. Do any of the assets exhibit autocorrelation? If yes, to what degree? Keep in mind there are missing values.

# Plot correlograms of the random components
acf(goldrand, na.action = na.pass)

acf(oilrand, na.action = na.pass)

acf(yenrand, na.action = na.pass)

Observation:

goldrand seems to exhibit autocorrelation: specifically, lags 1 through 9 seem statistically significant, except for lag 3.
oilrand seems to exhibit autocorrelation: specifically, lags 1 through 8 seem statistically significant, except for lag 3.
yenrand seems to exhibit autocorrelation at most lags, with only a few exceptions. Its correlogram also seems to follow a sinusoidal pattern.

This means the decomposition did not capture all of the structure in the data: the residual series is still serially correlated rather than purely random.
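"Statistically significant" here means that a sample autocorrelation falls outside the dashed band drawn by acf(), which sits at roughly plus or minus 1.96 divided by the square root of the series length. The sketch below reproduces that check numerically on simulated data; the series white and ar1, the coefficient 0.8, and the other names are assumptions made for illustration.

```r
set.seed(1)
n <- 500

white <- rnorm(n)                                # no serial correlation
ar1   <- arima.sim(model = list(ar = 0.8), n = n) # strong serial correlation

# Approximate 95% band used in the correlogram
bound <- qnorm(0.975) / sqrt(n)

# Sample autocorrelations at lags 1..12 (drop lag 0, which is always 1)
acf_white <- acf(white, lag.max = 12, plot = FALSE)$acf[-1]
acf_ar1   <- acf(ar1,   lag.max = 12, plot = FALSE)$acf[-1]

# Lags whose autocorrelation falls outside the band
sig_white <- which(abs(acf_white) > bound)
sig_ar1   <- which(abs(acf_ar1) > bound)

sig_white
sig_ar1
```

For pure noise, only the occasional lag should cross the band by chance (about 1 in 20), whereas the autoregressive series crosses it at a run of consecutive low lags, which is the pattern described for goldrand and oilrand.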

Q-16:

For all possible pairs of assets, please estimate the cross-correlation function. Do any of the variables lead or precede each other? Could you use any of the variables to predict the values of other variables? Make sure to use detrended and seasonally adjusted variables (“goldrand”, “oilrand”, and “yenrand”).

Plotting the 3 variables’ random components on the same graph to visually observe relationship:

ts.plot(goldrand, oilrand, yenrand, lty = c(1,2,3), col=c("red","blue","green"))

Observation: It appears that both gold and yen lead oil. Between gold and yen, however, it is unclear which one leads the other, as their peaks and troughs take turns leading over time.

Cross-correlation pairs:

ccf(goldrand, oilrand, na.action = na.pass)

ccf(yenrand, oilrand, na.action = na.pass)

ccf(yenrand, goldrand, na.action = na.pass)

Observation:

ccf(goldrand, oilrand): the 4th lag of goldrand seems to show the highest positive cross-correlation with oilrand, which suggests that gold leads oil prices by 4 months.
ccf(yenrand, oilrand): the 1st lag of oilrand seems to show the highest cross-correlation with yenrand, which suggests that oil leads yen prices by 1 month.
ccf(yenrand, goldrand): the 1st lag of yenrand seems to show the highest positive cross-correlation with goldrand, which suggests that yen leads gold prices by 1 month.
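Reading the direction of a lead from a ccf() plot is easy to get backwards, so a simulated check can help. In R, the lag k value of ccf(x, y) estimates the correlation between x[t+k] and y[t]. The sketch below builds a hypothetical pair in which x leads y by two periods by construction; all names and values are illustrative.

```r
set.seed(42)
n <- 500
z <- rnorm(n + 2)

x <- z[3:(n + 2)]                 # x[t] = z[t + 2]
y <- z[1:n] + rnorm(n, sd = 0.1)  # y[t] = x[t - 2] + noise, so x leads y by 2

cc <- ccf(x, y, lag.max = 10, plot = FALSE)

# Lag at which the cross-correlation is largest
peak_lag <- cc$lag[which.max(cc$acf)]
peak_lag
```

The peak appears at a negative lag, so when the first argument leads the second the spike sits to the left of zero. For monthly ts objects such as goldrand, the lag axis is measured in years, so a one-month lag shows up at 1/12.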

Q-17:

Based on the time series plot of gold, oil, and yen prices, there appears to be no systematic trends or seasonal effects. Therefore, it is reasonable to use exponential smoothing for these time series. Estimate alpha, the smoothing parameter for gold, oil and yen. What does the value of alpha tell you about the behavior of the mean? What is the estimated value of the mean for each asset?

gold_hw1 <- HoltWinters(gold, beta =F, gamma = F)
gold_hw1

## Holt-Winters exponential smoothing without trend and without seasonal component.
## 
## Call:
## HoltWinters(x = gold, beta = F, gamma = F)
## 
## Smoothing parameters:
##  alpha: 0.9999271
##  beta : FALSE
##  gamma: FALSE
## 
## Coefficients:
##       [,1]
## a 1319.753

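Interpretation: The estimated smoothing parameter for gold is alpha = 0.9999, which is practically 1. The estimated mean is therefore determined almost entirely by the most recent observation, with almost no weight on earlier values, so the mean changes as quickly as the series itself. The estimated mean for gold is a = 1319.75.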

# Exponential smoothing is the special case of HoltWinters() with beta = FALSE and gamma = FALSE
oil_hw1 <- HoltWinters(oil, beta =F, gamma = F)
oil_hw1

## Holt-Winters exponential smoothing without trend and without seasonal component.
## 
## Call:
## HoltWinters(x = oil, beta = F, gamma = F)
## 
## Smoothing parameters:
##  alpha: 0.9999263
##  beta : FALSE
##  gamma: FALSE
## 
## Coefficients:
##       [,1]
## a 54.94974

Interpretation: The estimated smoothing parameter for oil is alpha = 0.9999, again practically 1, so the estimated mean is determined almost entirely by the most recent observation. The estimated mean for oil is a = 54.95.

yen_hw1 <- HoltWinters(yen, beta =F, gamma = F)
yen_hw1

## Holt-Winters exponential smoothing without trend and without seasonal component.
## 
## Call:
## HoltWinters(x = yen, beta = F, gamma = F)
## 
## Smoothing parameters:
##  alpha: 0.9999431
##  beta : FALSE
##  gamma: FALSE
## 
## Coefficients:
##          [,1]
## a 0.009054697

Interpretation: The estimated smoothing parameter for yen is alpha = 0.9999, again practically 1, so the estimated mean is determined almost entirely by the most recent observation. The estimated mean for yen is a = 0.009055.
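The role of alpha is easiest to see by writing the smoothing recursion out by hand. The helper exp_smooth() below is a hypothetical illustration, not a base R function and not the internals of HoltWinters(); it assumes the recursion is started at the first observation.

```r
# Hypothetical helper: simple exponential smoothing,
#   a_t = alpha * x_t + (1 - alpha) * a_{t-1},  started at a_1 = x_1
exp_smooth <- function(x, alpha) {
  a <- numeric(length(x))
  a[1] <- x[1]
  for (t in seq_along(x)[-1]) {
    a[t] <- alpha * x[t] + (1 - alpha) * a[t - 1]
  }
  a
}

x <- c(1, 2, 3)
exp_smooth(x, alpha = 0.5)  # 1.00 1.50 2.25
exp_smooth(x, alpha = 1)    # 1 2 3
```

With alpha = 1 the smoothed mean simply reproduces the series, which is why estimates of alpha near 1 imply that the best guess for next month is essentially this month's value.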

Q-18:

Use plot() function to generate three graphs that depict observed and exponentially smoothed values for each asset.

plot(gold_hw1)

plot(oil_hw1)

plot(yen_hw1)

Observation: These plots show how the estimated mean changes over time. Because alpha is estimated automatically and comes out close to 1, the fitted values (in red) are essentially the previous month's observations, so they lie very close to the actual values for all three assets.

Q-19:

Use window() function to create 3 new variables called “goldpre”, “oilpre”, and “yenpre” that cover the period from January 2005, until August 2018.

goldpre <- window(gold, start = c(2005,1), end = c(2018, 8))
oilpre <- window(oil, start = c(2005,1), end = c(2018, 8))
yenpre <- window(yen, start = c(2005,1), end = c(2018, 8))

Q-20:

Use window() function to create 3 new variables called goldpost, oilpost, and yenpost that cover the period from September 2018, until February 2019.

goldpost <- window(gold, start = c(2018, 9), end = c(2019, 2))
oilpost <- window(oil, start = c(2018, 9), end = c(2019, 2))
yenpost <- window(yen, start = c(2018, 9), end = c(2019, 2))

Q-21:

Estimate HoltWinters filter model for each asset, while using only pre data. Save each of these estimates as “gold.hw”, “oil.hw”, and “yen.hw”.

gold.hw <- HoltWinters(goldpre, seasonal="additive")
oil.hw <- HoltWinters(oilpre, seasonal="additive")
yen.hw <- HoltWinters(yenpre, seasonal="additive")

Observation:

gold.hw: alpha = 0.85, estimated mean = 1198
oil.hw: alpha = 1, estimated mean = 62.9
yen.hw: alpha = 0.86, estimated mean = 0.0087
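For reference, the additive Holt-Winters updates behind these estimates can be written as below. The notation is assumed here rather than taken from the exercise: a_t is the level (reported above as the estimated mean), b_t the slope, s_t the seasonal effect, p = 12 the seasonal period, and n the last observation.

```latex
\[
\begin{aligned}
a_t &= \alpha\,(x_t - s_{t-p}) + (1-\alpha)\,(a_{t-1} + b_{t-1}) \\
b_t &= \beta\,(a_t - a_{t-1}) + (1-\beta)\,b_{t-1} \\
s_t &= \gamma\,(x_t - a_t) + (1-\gamma)\,s_{t-p} \\
\hat{x}_{n+k \mid n} &= a_n + k\,b_n + s_{n+k-p}, \qquad 1 \le k \le p
\end{aligned}
\]
```

The last line is the k-step-ahead forecast: the final level, plus k times the final slope, plus the most recent seasonal effect for the target month.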

Q-22:

Use the HoltWinters filter estimates generated in Q-21 and the predict() function to create a 6-month-ahead forecast of the gold, oil, and yen prices. Save these forecasted values as “goldforc”, “oilforc”, and “yenforc”.

goldforc <- predict(gold.hw, n.ahead=6)
oilforc <- predict(oil.hw, n.ahead=6)
yenforc <- predict(yen.hw, n.ahead=6)

Q-23:

Use the ts.plot() function to plot the post-sample prices (“goldpost”, “oilpost”, “yenpost”) side by side with their forecasted counterparts. Please use red to represent the actual prices and blue dotted lines to represent the forecasted values.

ts.plot(goldpost, goldforc, lty = c(1, 3), col = c("red","blue")) # lty 3 = dotted

ts.plot(oilpost, oilforc, lty = c(1, 3), col = c("red","blue"))

ts.plot(yenpost, yenforc, lty = c(1, 3), col = c("red","blue"))

Observation: In all three plots, the forecast tracks the actual values closely for the first few months and then drifts away from them as the forecast horizon lengthens.

Q-24:

Please calculate the forecast mean percentage error for each asset’s forecasting model. Which asset’s forecasting model has the lowest mean percentage error?

gold_mpe <- mean(((goldpost-goldforc)/goldpost)*100)
oil_mpe <- mean(((oilpost-oilforc)/oilpost)*100)
yen_mpe <- mean(((yenpost-yenforc)/yenpost)*100)

cbind(gold_mpe,oil_mpe,yen_mpe)

##      gold_mpe   oil_mpe  yen_mpe
## [1,]  5.98464 -7.141958 3.700225

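Observation: Yen’s forecasting model has the smallest mean percentage error in absolute value (3.7%), compared with 6.0% for gold and -7.1% for oil. With errors defined as actual minus forecast, the positive values for gold and yen mean those forecasts were too low on average, and the negative value for oil means its forecasts were too high.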
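One caveat with the mean percentage error is that positive and negative errors cancel, so a small value does not always mean small errors. The sketch below wraps the formula used above in a hypothetical helper mpe() and contrasts it with a mean absolute percentage error, mape(), on made-up numbers. Both helpers and all values are illustrative assumptions.

```r
# Hypothetical helpers (not from any package)
mpe  <- function(actual, forecast) mean((actual - forecast) / actual * 100)
mape <- function(actual, forecast) mean(abs((actual - forecast) / actual) * 100)

# Forecasts consistently too low: errors of +10% and +5%
mpe(c(100, 100), c(90, 95))    # 7.5

# One forecast 10% too low, one 10% too high: the signed errors cancel
mpe(c(100, 200), c(90, 220))   # 0
mape(c(100, 200), c(90, 220))  # 10
```

Reporting the absolute version alongside the signed one would make the ranking of the three models robust to this cancellation.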

Q-25:

Use gold, oil, and yen variables to estimate HoltWinters model for each asset. Save these estimates as “goldc.hw”, “oilc.hw”, and “yenc.hw”.

# Use the names requested in the question so the pre-sample models from Q-21 are not overwritten
goldc.hw <- HoltWinters(gold, seasonal="additive")
oilc.hw <- HoltWinters(oil, seasonal="additive")
yenc.hw <- HoltWinters(yen, seasonal="additive")

Q-26:

Use the “goldc.hw”, “oilc.hw”, and “yenc.hw” models to create out-of-sample forecasts that predict the price of each asset for the rest of 2019. Save these forecasts as “goldforcos”, “oilforcos”, and “yenforcos”.

# The sample ends in February 2019, so 10 steps ahead covers March to December 2019
goldforcos <- predict(goldc.hw, n.ahead=10)
oilforcos <- predict(oilc.hw, n.ahead=10)
yenforcos <- predict(yenc.hw, n.ahead=10)

What is the forecasted price of Gold for November 2019?

gold_price <- window(goldforcos, start = c(2019,11), end = c(2019,11))[[1]]

The forecasted price of gold for November 2019 is about 1276.94 US dollars.
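Because the sample ends in February 2019, the ten forecast steps run from March to December 2019, and window() picks out a month by its calendar position. The sketch below uses a placeholder series, steps, whose values are just the step numbers 1 to 10; it stands in for a forecast object and contains no real forecasts.

```r
# Stand-in for a 10-step forecast that starts right after February 2019
# (the values 1..10 are placeholders marking the forecast step)
steps <- ts(1:10, start = c(2019, 3), frequency = 12)

nov_step <- as.numeric(window(steps, start = c(2019, 11), end = c(2019, 11)))
dec_step <- as.numeric(window(steps, start = c(2019, 12), end = c(2019, 12)))

nov_step  # 9
dec_step  # 10
```

November 2019 is therefore the 9th forecast step and December 2019 the 10th, which is why n.ahead = 10 is enough to reach the end of the year.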

Q-27:

Create a time series plot for each asset that combines its actual price data with its out-of-sample forecasted values. Please use red to represent the actual prices and blue dotted lines to represent the forecasted values. What do you think will happen to the price of each asset by the end of the year?

ts.plot(gold, goldforcos, lty = c(1, 3), col = c("red","blue")) # lty 3 = dotted

ts.plot(oil, oilforcos, lty = c(1, 3), col = c("red","blue"))

ts.plot(yen, yenforcos, lty = c(1, 3), col = c("red","blue"))

Predictions:

Gold: The price of gold appears to decrease slightly by the end of the year, by about 50 US dollars (roughly 4% below its February 2019 level).
Oil: The price of oil appears to rise slightly, by about 20 US dollars, before returning to roughly its current level by the end of the year.
Yen: The value of the yen appears to rise very slightly, by about 0.0005 US dollars, before returning to roughly its current value by the end of the year.

Q-28:

Please calculate the percentage change between the price of each asset in February 2019 and its forecasted December 2019 price. Which asset promises the highest rate of return?

gold_dec <- window(goldforcos, start = c(2019,12))[[1]]
gold_feb <- window(gold, start = c(2019,2))[[1]]
gold_appreciate <- (gold_dec/gold_feb)*100

oil_dec <- window(oilforcos, start = c(2019,12))[[1]]
oil_feb <- window(oil, start = c(2019,2))[[1]]
oil_appreciate <- (oil_dec/oil_feb)*100

yen_dec <- window(yenforcos, start = c(2019,12))[[1]]
yen_feb <- window(yen, start = c(2019,2))[[1]]
yen_appreciate <- (yen_dec/yen_feb)*100

cbind(gold_appreciate, oil_appreciate, yen_appreciate)

##      gold_appreciate oil_appreciate yen_appreciate
## [1,]        96.16837         105.55       100.0761

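Result: The values above are the December 2019 forecasts expressed as a percentage of the February 2019 prices, so the implied returns are about -3.8% for gold, +5.6% for oil, and +0.1% for yen. Of the three assets, oil is forecasted to give the highest return by December 2019.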

Forecasting & Time Series - Exercise 1

Ayesha Khan

6/18/2020
