The goal for this week is to forecast the right hand side variables in a time series model using the known response variable.

The dataset consists of share prices for Safeway Inc. and Google Trends data for the word “recipe” over the past two years. Monthly share prices are used to match up with the monthly format of the Google Trends data. Because of the way Google Trends data is calculated (self-indexed within the time period and search term set), the search window is all years of Google Trends (2004-present) and only data for the term “recipe” is returned.

library(quantmod)
library(forecast)
library(lubridate)
getSymbols('KR', src = 'yahoo', 
           from = Sys.Date() - years(5), to = Sys.Date())

KR <- data.frame(
  KR,
  date = as.Date(rownames(data.frame(KR)))
  )

library(gtrendsR)
trends.dat <- gtrends('recipe', 'US')$interest_over_time

kr.rounded <- KR%>%
    left_join(
      aggregate(KR["date"], list(month = substr(KR$date, 1, 7)), max)%>%
        mutate(keep = 1)
    )%>%
  filter(keep == 1)%>%
  mutate(month = as.Date(paste0(month,'-01')))

gt.rounded <- trends.dat%>%
  mutate(month = substr(as.character(date),1,7))%>%
  group_by(month)%>%
  summarise(hits = mean(hits))%>%
  mutate(month = as.Date(paste0(month,'-01')))

kr <- inner_join(
  kr.rounded, gt.rounded,
  by = 'month'
)%>%
  select(kr.close = KR.Adjusted, month, hits)%>%
  ts(frequency = 12)

As a first exploratory step, I want to plot the decomposition of each time series to get a general idea of which components may correlate. Kroger share prices show seasonal spikes around winter and troughs in the summer and within the time period there is a generally positive trend (although it may be turning downward!). Search results for “recipe” show a downward trend over the past five years with similar but more severe seasonality around winter.

library(gridExtra)
library(ggplot2)
# devtools::install_github('yaztheme','joshyazman')
library(yaztheme)
dec.hits <- decompose(kr[,'hits'])
dec.kr <- decompose(kr[,'kr.close'])
grid.arrange(
  autoplot(dec.hits)+
    labs(title = 'Relative Search Frequency: "Recipe"',
         x = 'Years Ago', y = element_blank())+
    theme_yaz()+
    scale_x_continuous(breaks = c(1,2,3,4,5), labels = c('5','4','3','2','1'))+
    scale_color_manual(values = yaz_cols),
  autoplot(dec.kr)+
    labs(title = 'Adjusted Closing Price: "Kroger"',
         x = 'Years Ago', y = element_blank())+
    theme_yaz()+
    scale_x_continuous(breaks = c(1,2,3,4,5), labels = c('5','4','3','2','1'))+
    scale_color_manual(values = yaz_cols),
  nrow = 1
)

Using the trend in Kroger share prices to forecast average monthly search interest for “recipes” appears to require a quadratic trend model due to the curved shape of the trend line. Another option is to use the logarithm to linearize share price trends.

The first model is the quadratic combination of seasonally adjusted share prices and seasonally adjusted average search interest. This model has a root-mean-squared-error of 4.45 (for data indexed to a maximum value of 100) with residuals close to mean zero. The Breusch-Godfrey test returned a p-value of .001 indicating a low likelihood of serial autocorrelation.

quad.lm <- tslm(formula = dec.hits$trend~dec.kr$trend + dec.kr$trend^2)
quad.acc <- data.frame(accuracy(quad.lm))%>%mutate(method = 'Quadratic')
longer object length is not a multiple of shorter object length
checkresiduals(quad.lm)

    Breusch-Godfrey test for serial correlation of order up to 24

data:  Residuals from Linear regression model
LM test = 48.906, df = 24, p-value = 0.001946

The next model attempts to log-linearize seasonally adjusted share prices for Kroger and use that variable as a regressor against search interest. In this case, RMSE ticked up a notch as did the Breusch-Godfrey p-value.

fit.lm <- tslm(formula = dec.hits$trend~log(dec.kr$trend))
fit.acc <- data.frame(accuracy(fit.lm))%>%mutate(method = 'Simple')
longer object length is not a multiple of shorter object length
checkresiduals(fit.lm)

    Breusch-Godfrey test for serial correlation of order up to 24

data:  Residuals from Linear regression model
LM test = 48.372, df = 24, p-value = 0.002269

A final comparison of the two above models illustrates that the two models are very, very close in performance. That said, across all five bias and variance metrics presented below, the quadratic model ekes out a win.

mod.health <- bind_rows(quad.acc, fit.acc)%>%select(-MASE)
library(reshape2)
mod.health%>%
  melt(id.vars = 'method')%>%
  ggplot(aes(x = variable, y = value, fill = method))+
    geom_bar(stat = 'identity', position = 'dodge')+
    scale_fill_manual(name = 'Method', values = yaz_cols[3:4])+
    theme_yaz()+
    labs(title = 'Model Accuracy Measures',
         x = element_blank(), y = 'Measure Value')

LS0tDQp0aXRsZTogIlBSRURJQ1QgNDEzIC0gRGlzY3Vzc2lvbiBXZWVrIDYiDQphdXRob3I6ICdKb3NoIFlhem1hbicNCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KLS0tDQoNClRoZSBnb2FsIGZvciB0aGlzIHdlZWsgaXMgdG8gZm9yZWNhc3QgdGhlIHJpZ2h0IGhhbmQgc2lkZSB2YXJpYWJsZXMgaW4gYSB0aW1lIHNlcmllcyBtb2RlbCB1c2luZyB0aGUga25vd24gcmVzcG9uc2UgdmFyaWFibGUuIA0KDQpUaGUgZGF0YXNldCBjb25zaXN0cyBvZiBzaGFyZSBwcmljZXMgZm9yIFNhZmV3YXkgSW5jLiBhbmQgR29vZ2xlIFRyZW5kcyBkYXRhIGZvciB0aGUgd29yZCAicmVjaXBlIiBvdmVyIHRoZSBwYXN0IHR3byB5ZWFycy4gTW9udGhseSBzaGFyZSBwcmljZXMgYXJlIHVzZWQgdG8gbWF0Y2ggdXAgd2l0aCB0aGUgbW9udGhseSBmb3JtYXQgb2YgdGhlIEdvb2dsZSBUcmVuZHMgZGF0YS4gQmVjYXVzZSBvZiB0aGUgd2F5IEdvb2dsZSBUcmVuZHMgZGF0YSBpcyBjYWxjdWxhdGVkIChzZWxmLWluZGV4ZWQgd2l0aGluIHRoZSB0aW1lIHBlcmlvZCBhbmQgc2VhcmNoIHRlcm0gc2V0KSwgdGhlIHNlYXJjaCB3aW5kb3cgaXMgYWxsIHllYXJzIG9mIEdvb2dsZSBUcmVuZHMgKDIwMDQtcHJlc2VudCkgYW5kIG9ubHkgZGF0YSBmb3IgdGhlIHRlcm0gInJlY2lwZSIgaXMgcmV0dXJuZWQuDQpgYGB7ciwgZWNobyA9IFRSVUV9DQpsaWJyYXJ5KHF1YW50bW9kKQ0KbGlicmFyeShmb3JlY2FzdCkNCmxpYnJhcnkobHVicmlkYXRlKQ0KZ2V0U3ltYm9scygnS1InLCBzcmMgPSAneWFob28nLCANCiAgICAgICAgICAgZnJvbSA9IFN5cy5EYXRlKCkgLSB5ZWFycyg1KSwgdG8gPSBTeXMuRGF0ZSgpKQ0KDQpLUiA8LSBkYXRhLmZyYW1lKA0KICBLUiwNCiAgZGF0ZSA9IGFzLkRhdGUocm93bmFtZXMoZGF0YS5mcmFtZShLUikpKQ0KICApDQoNCmxpYnJhcnkoZ3RyZW5kc1IpDQp0cmVuZHMuZGF0IDwtIGd0cmVuZHMoJ3JlY2lwZScsICdVUycpJGludGVyZXN0X292ZXJfdGltZQ0KDQprci5yb3VuZGVkIDwtIEtSJT4lDQogICAgbGVmdF9qb2luKA0KICAgICAgYWdncmVnYXRlKEtSWyJkYXRlIl0sIGxpc3QobW9udGggPSBzdWJzdHIoS1IkZGF0ZSwgMSwgNykpLCBtYXgpJT4lDQogICAgICAgIG11dGF0ZShrZWVwID0gMSkNCiAgICApJT4lDQogIGZpbHRlcihrZWVwID09IDEpJT4lDQogIG11dGF0ZShtb250aCA9IGFzLkRhdGUocGFzdGUwKG1vbnRoLCctMDEnKSkpDQoNCmd0LnJvdW5kZWQgPC0gdHJlbmRzLmRhdCU+JQ0KICBtdXRhdGUobW9udGggPSBzdWJzdHIoYXMuY2hhcmFjdGVyKGRhdGUpLDEsNykpJT4lDQogIGdyb3VwX2J5KG1vbnRoKSU+JQ0KICBzdW1tYXJpc2UoaGl0cyA9IG1lYW4oaGl0cykpJT4lDQogIG11dGF0ZShtb250aCA9IGFzLkRhdGUocGFzdGUwKG1vbnRoLCctMDEnKSkpDQoNCmtyIDwtIGlubmVyX2pvaW4oDQogIGtyLnJvdW5kZWQsIGd0LnJvdW5kZWQsDQogIGJ5ID0gJ21vbnRoJw0KKSU+JQ0KICBzZWxlY3Qoa3IuY2xvc2UgPSBLUi5BZGp1c3RlZCwgbW9udGgsIGhpdHMpJT4lDQogIHRzKGZyZXF1ZW5jeSA9IDEyKQ0KYGBgDQoNCkFzIGEgZmlyc3QgZXhwbG9yYXRvcnkgc3RlcCwgSSB3YW50IHRvIHBsb3QgdGhlIGRlY29tcG9zaXRpb24gb2YgZWFjaCB0aW1lIHNlcmllcyB0byBnZXQgYSBnZW5lcmFsIGlkZWEgb2Ygd2hpY2ggY29tcG9uZW50cyBtYXkgY29ycmVsYXRlLiBLcm9nZXIgc2hhcmUgcHJpY2VzIHNob3cgc2Vhc29uYWwgc3Bpa2VzIGFyb3VuZCB3aW50ZXIgYW5kIHRyb3VnaHMgaW4gdGhlIHN1bW1lciBhbmQgd2l0aGluIHRoZSB0aW1lIHBlcmlvZCB0aGVyZSBpcyBhIGdlbmVyYWxseSBwb3NpdGl2ZSB0cmVuZCAoYWx0aG91Z2ggaXQgbWF5IGJlIHR1cm5pbmcgZG93bndhcmQhKS4gU2VhcmNoIHJlc3VsdHMgZm9yICJyZWNpcGUiIHNob3cgYSBkb3dud2FyZCB0cmVuZCBvdmVyIHRoZSBwYXN0IGZpdmUgeWVhcnMgd2l0aCBzaW1pbGFyIGJ1dCBtb3JlIHNldmVyZSBzZWFzb25hbGl0eSBhcm91bmQgd2ludGVyLiANCmBgYHtyLCBmaWcud2lkdGg9OCwgZmlnLmhlaWdodD00LCBlY2hvID0gVFJVRX0NCmxpYnJhcnkoZ3JpZEV4dHJhKQ0KbGlicmFyeShnZ3Bsb3QyKQ0KIyBkZXZ0b29sczo6aW5zdGFsbF9naXRodWIoJ3lhenRoZW1lJywnam9zaHlhem1hbicpDQpsaWJyYXJ5KHlhenRoZW1lKQ0KDQpkZWMuaGl0cyA8LSBkZWNvbXBvc2Uoa3JbLCdoaXRzJ10pDQpkZWMua3IgPC0gZGVjb21wb3NlKGtyWywna3IuY2xvc2UnXSkNCmdyaWQuYXJyYW5nZSgNCiAgYXV0b3Bsb3QoZGVjLmhpdHMpKw0KICAgIGxhYnModGl0bGUgPSAnUmVsYXRpdmUgU2VhcmNoIEZyZXF1ZW5jeTogIlJlY2lwZSInLA0KICAgICAgICAgeCA9ICdZZWFycyBBZ28nLCB5ID0gZWxlbWVudF9ibGFuaygpKSsNCiAgICB0aGVtZV95YXooKSsNCiAgICBzY2FsZV94X2NvbnRpbnVvdXMoYnJlYWtzID0gYygxLDIsMyw0LDUpLCBsYWJlbHMgPSBjKCc1JywnNCcsJzMnLCcyJywnMScpKSsNCiAgICBzY2FsZV9jb2xvcl9tYW51YWwodmFsdWVzID0geWF6X2NvbHMpLA0KICBhdXRvcGxvdChkZWMua3IpKw0KICAgIGxhYnModGl0bGUgPSAnQWRqdXN0ZWQgQ2xvc2luZyBQcmljZTogIktyb2dlciInLA0KICAgICAgICAgeCA9ICdZZWFycyBBZ28nLCB5ID0gZWxlbWVudF9ibGFuaygpKSsNCiAgICB0aGVtZV95YXooKSsNCiAgICBzY2FsZV94X2NvbnRpbnVvdXMoYnJlYWtzID0gYygxLDIsMyw0LDUpLCBsYWJlbHMgPSBjKCc1JywnNCcsJzMnLCcyJywnMScpKSsNCiAgICBzY2FsZV9jb2xvcl9tYW51YWwodmFsdWVzID0geWF6X2NvbHMpLA0KICBucm93ID0gMQ0KKQ0KYGBgDQoNClVzaW5nIHRoZSB0cmVuZCBpbiBLcm9nZXIgc2hhcmUgcHJpY2VzIHRvIGZvcmVjYXN0IGF2ZXJhZ2UgbW9udGhseSBzZWFyY2ggaW50ZXJlc3QgZm9yICJyZWNpcGVzIiBhcHBlYXJzIHRvIHJlcXVpcmUgYSBxdWFkcmF0aWMgdHJlbmQgbW9kZWwgZHVlIHRvIHRoZSBjdXJ2ZWQgc2hhcGUgb2YgdGhlIHRyZW5kIGxpbmUuIEFub3RoZXIgb3B0aW9uIGlzIHRvIHVzZSB0aGUgbG9nYXJpdGhtIHRvIGxpbmVhcml6ZSBzaGFyZSBwcmljZSB0cmVuZHMuDQoNClRoZSBmaXJzdCBtb2RlbCBpcyB0aGUgcXVhZHJhdGljIGNvbWJpbmF0aW9uIG9mIHNlYXNvbmFsbHkgYWRqdXN0ZWQgc2hhcmUgcHJpY2VzIGFuZCBzZWFzb25hbGx5IGFkanVzdGVkIGF2ZXJhZ2Ugc2VhcmNoIGludGVyZXN0LiBUaGlzIG1vZGVsIGhhcyBhIHJvb3QtbWVhbi1zcXVhcmVkLWVycm9yIG9mIDQuNDUgKGZvciBkYXRhIGluZGV4ZWQgdG8gYSBtYXhpbXVtIHZhbHVlIG9mIDEwMCkgd2l0aCByZXNpZHVhbHMgY2xvc2UgdG8gbWVhbiB6ZXJvLiBUaGUgQnJldXNjaC1Hb2RmcmV5IHRlc3QgcmV0dXJuZWQgYSBwLXZhbHVlIG9mIC4wMDEgaW5kaWNhdGluZyBhIGxvdyBsaWtlbGlob29kIG9mIHNlcmlhbCBhdXRvY29ycmVsYXRpb24uDQpgYGB7ciwgZWNobyA9IFRSVUV9DQpxdWFkLmxtIDwtIHRzbG0oZm9ybXVsYSA9IGRlYy5oaXRzJHRyZW5kfmRlYy5rciR0cmVuZCArIGRlYy5rciR0cmVuZF4yKQ0KcXVhZC5hY2MgPC0gZGF0YS5mcmFtZShhY2N1cmFjeShxdWFkLmxtKSklPiVtdXRhdGUobWV0aG9kID0gJ1F1YWRyYXRpYycpDQpjaGVja3Jlc2lkdWFscyhxdWFkLmxtKQ0KYGBgDQoNClRoZSBuZXh0IG1vZGVsIGF0dGVtcHRzIHRvIGxvZy1saW5lYXJpemUgc2Vhc29uYWxseSBhZGp1c3RlZCBzaGFyZSBwcmljZXMgZm9yIEtyb2dlciBhbmQgdXNlIHRoYXQgdmFyaWFibGUgYXMgYSByZWdyZXNzb3IgYWdhaW5zdCBzZWFyY2ggaW50ZXJlc3QuIEluIHRoaXMgY2FzZSwgUk1TRSB0aWNrZWQgdXAgYSBub3RjaCBhcyBkaWQgdGhlIEJyZXVzY2gtR29kZnJleSBwLXZhbHVlLiANCmBgYHtyLCBlY2hvID0gVFJVRX0NCmZpdC5sbSA8LSB0c2xtKGZvcm11bGEgPSBkZWMuaGl0cyR0cmVuZH5sb2coZGVjLmtyJHRyZW5kKSkNCmZpdC5hY2MgPC0gZGF0YS5mcmFtZShhY2N1cmFjeShmaXQubG0pKSU+JW11dGF0ZShtZXRob2QgPSAnU2ltcGxlJykNCmNoZWNrcmVzaWR1YWxzKGZpdC5sbSkNCmBgYA0KDQpBIGZpbmFsIGNvbXBhcmlzb24gb2YgdGhlIHR3byBhYm92ZSBtb2RlbHMgaWxsdXN0cmF0ZXMgdGhhdCB0aGUgdHdvIG1vZGVscyBhcmUgdmVyeSwgdmVyeSBjbG9zZSBpbiBwZXJmb3JtYW5jZS4gVGhhdCBzYWlkLCBhY3Jvc3MgYWxsIGZpdmUgYmlhcyBhbmQgdmFyaWFuY2UgbWV0cmljcyBwcmVzZW50ZWQgYmVsb3csIHRoZSBxdWFkcmF0aWMgbW9kZWwgZWtlcyBvdXQgYSB3aW4uDQpgYGB7ciwgZWNobyA9IFRSVUV9DQptb2QuaGVhbHRoIDwtIGJpbmRfcm93cyhxdWFkLmFjYywgZml0LmFjYyklPiVzZWxlY3QoLU1BU0UpDQoNCmxpYnJhcnkocmVzaGFwZTIpDQoNCm1vZC5oZWFsdGglPiUNCiAgbWVsdChpZC52YXJzID0gJ21ldGhvZCcpJT4lDQogIGdncGxvdChhZXMoeCA9IHZhcmlhYmxlLCB5ID0gdmFsdWUsIGZpbGwgPSBtZXRob2QpKSsNCiAgICBnZW9tX2JhcihzdGF0ID0gJ2lkZW50aXR5JywgcG9zaXRpb24gPSAnZG9kZ2UnKSsNCiAgICBzY2FsZV9maWxsX21hbnVhbChuYW1lID0gJ01ldGhvZCcsIHZhbHVlcyA9IHlhel9jb2xzWzM6NF0pKw0KICAgIHRoZW1lX3lheigpKw0KICAgIGxhYnModGl0bGUgPSAnTW9kZWwgQWNjdXJhY3kgTWVhc3VyZXMnLA0KICAgICAgICAgeCA9IGVsZW1lbnRfYmxhbmsoKSwgeSA9ICdNZWFzdXJlIFZhbHVlJykNCmBgYA==