このページでは、株式会社インサイト・ファクトリー 2020年度社内研修「マーケティング・ミックス・モデリング」VI章「動学的市場反応モデル (1)」のRコードを公開しています。

準備

library(tidyverse)
## -- Attaching packages --------------------------------------------------------- tidyverse 1.3.0 --
## √ ggplot2 3.3.0     √ purrr   0.3.3
## √ tibble  3.0.0     √ dplyr   0.8.5
## √ tidyr   1.0.2     √ stringr 1.4.0
## √ readr   1.3.1     √ forcats 0.5.0
## -- Conflicts ------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(patchwork)
library(xts)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
## 
##     first, last
library(TSA)
## 
## Attaching package: 'TSA'
## The following object is masked from 'package:readr':
## 
##     spec
## The following objects are masked from 'package:stats':
## 
##     acf, arima
## The following object is masked from 'package:utils':
## 
##     tar
library(nlme)
## 
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
## 
##     collapse
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Registered S3 methods overwritten by 'forecast':
##   method       from
##   fitted.Arima TSA 
##   plot.Arima   TSA
## 
## Attaching package: 'forecast'
## The following object is masked from 'package:nlme':
## 
##     getResponse
library(dynlm)
library(FinTS)
## 
## Attaching package: 'FinTS'
## The following object is masked from 'package:forecast':
## 
##     Acf
library(vars)
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:patchwork':
## 
##     area
## The following object is masked from 'package:dplyr':
## 
##     select
## Loading required package: strucchange
## Loading required package: sandwich
## 
## Attaching package: 'strucchange'
## The following object is masked from 'package:stringr':
## 
##     boundary
## Loading required package: urca
## Loading required package: lmtest
WORKDIR <- "."


# 「売上と広告」データの作成
set.seed(123)
dfSalesAds <- tibble(
  # 年月
  dMonth = seq(as.Date("2013-09-01"), as.Date("2018-12-01"), by = "month"),
  # TV広告支出
  gTV    = as.vector(
    stats::filter(
      rnorm(length(dMonth), mean = 5, sd = 2),
      filter = 0.4,
      method = "recursive" # AR
    )
  ),
  # Web広告支出
  gWeb = as.vector(
    stats::filter(
      rnorm(length(dMonth), mean = 2, sd = 1.5),
      filter = 0.3,
      method = "recursive" # AR
    )
  ),
  # 売上の撹乱項
  gYErr = as.vector(
    stats::filter(
      rnorm(length(dMonth), mean = 0, sd = 1),
      filter = 0.45,
      method = "recursive"
    )
  )
) %>%
  mutate(
    # 広告支出の負値を0に置き換え、丸める
    gTV  = round(if_else(gTV < 0, 0, gTV), digits = 1),
    gWeb = round(if_else(gWeb < 0, 0, gWeb), digits = 1),
    # 売上の生成
    gSales = 6 + 0.05 * seq_along(dMonth),
    gSales = gSales + 0.10 * gTV ,
    gSales = gSales + 0.30 * lag(gTV, n = 1) ,
    gSales = gSales + 0.20 * lag(gTV, n = 2),
    gSales = gSales + 0.10 * lag(gTV, n = 3),
    gSales = gSales + 0.05 * lag(gTV, n = 4),
    gSales = gSales + 0.50 * gWeb,
    gSales = gSales + 0.30 * lag(gWeb, n = 1),
    gSales = gSales + gYErr
  ) %>%
  # 変数選択
  dplyr::select(dMonth, gSales, gTV, gWeb) %>%
  # ここで欠損行を削除
  drop_na() 
## print(summary(dfSalesAds))

# IBM-SPデータの作成
data(m.ibmspln)
dfIBMSP <- data.frame(m.ibmspln) %>%
  mutate(
    nYear  = as.integer(time(m.ibmspln)), 
    nMonth = as.integer(cycle(m.ibmspln)), 
    dMonth = as.Date(paste0(nYear, "-", nMonth, "-01"))
  ) %>%
  dplyr::select(dMonth, IBM, SP)

code 1. negative binomial lag

dfOmega <- crossing(
  nR = 1:4,
  nK = 0:10,
  gLambda = seq(0.2, 0.8, 0.2)
) %>%
  mutate(
    gTemp1 = gamma(nR + nK), # (r+k-1)!
    gTemp2 = gamma(nR),      # (r-1)!
    gTemp3 = gamma(nK + 1),  # k!
    gOmega = gTemp1 * ((1-gLambda)^nR) * (gLambda^nK) / (gTemp2 * gTemp3)
  )
## print(dfOmega)

# 描画
dfPlot <- dfOmega %>%
  mutate( sR = paste0("r=", nR) )
g <- ggplot(
  data=dfPlot, 
  aes(x = nK, y = gOmega, group = gLambda, color = factor(gLambda))
)
g <- g + geom_line()
g <- g + geom_point()
g <- g + scale_x_continuous(breaks=0:10)
g <- g + facet_grid(~ sR)
g <- g + labs(color = "lambda", x = "k", y = "omega")
g <- g + theme_bw()
print(g)

ggsave(paste0(WORKDIR, "/chap6_code1.png"), width = 21, height = 12, units = "cm")

code 2. 「売上と広告」データのチャート

dfPlot <- dfSalesAds %>%
  gather(sVar, gValue, c(gSales, gTV, gWeb)) %>%
  mutate(
    fVar = factor(
      sVar,
      levels = c("gSales", "gTV", "gWeb"),
      labels = c("売上数量", "TV支出", "Web支出")
    )
  )
g <- ggplot(data = dfPlot, aes(x = dMonth, y = gValue))
g <- g + geom_line()
g <- g + facet_grid(fVar ~ ., scales = "free")
g <- g + labs(x = "年月", y = "値")
g <- g + theme_bw()
print(g)

ggsave(paste0(WORKDIR, "/chap6_code2.png"), width = 21, height = 9, units = "cm")

code 3. 「売上と広告」データのACF

png(paste0(WORKDIR, "/chap6_code3.png"), width = 15, height = 25, res = 72, units = "cm")
par(mfrow=c(3,1))
acf(dfSalesAds$gSales)
acf(dfSalesAds$gTV)
acf(dfSalesAds$gWeb)
dev.off()
## png 
##   2

code 4. 「売上と広告」データの単位根検定

ndiffs(dfSalesAds$gSales, test = "adf", type = "trend")
## [1] 0
ndiffs(dfSalesAds$gWeb, test = "adf", type = "trend")
## [1] 0
ndiffs(dfSalesAds$gTV, test = "adf", type = "trend")
## [1] 0

code 5. 「売上と広告」データの散布図

dfPlot <- dfSalesAds %>%
  mutate(
    fYear = factor(year(dMonth)),
    sMonth = as.character(month(dMonth))
  )
# TV
g <- ggplot(data = dfPlot, aes(x = gTV, y = gSales, color = fYear, label = sMonth))
g <- g + geom_text()
g <- g + geom_path(alpha=0.5)
g <- g + guides(colour=FALSE)
g <- g + labs(title = "TV", x = "TV支出", y = "売上")
g <- g + facet_wrap(~ fYear)
g1 <- g + theme_bw()
# Web
g <- ggplot(data = dfPlot, aes(x = gWeb, y = gSales, color = fYear, label = sMonth))
g <- g + geom_text()
g <- g + geom_path(alpha=0.5)
g <- g + guides(colour=FALSE)
g <- g + labs(title = "Web", x = "Web支出", y = "売上")
g <- g + facet_wrap(~ fYear)
g2 <- g + theme_bw()
# 合成
g <- g1 + g2
print(g)

ggsave(paste0(WORKDIR, "/chap6_code5.png"), width = 24, height = 10, units = "cm")

code 6. 「売上と広告」データのccf

png(paste0(WORKDIR, "/chap6_code6.png"), width = 24, height = 9, res = 72, units = "cm")
par(mfrow=c(1,2))
ccf(dfSalesAds$gTV, dfSalesAds$gSales, main = "TV vs. Sales")
ccf(dfSalesAds$gWeb, dfSalesAds$gSales, main = "Web vs. Sales")
dev.off()
## png 
##   2

code 7. CCFとprewhitened CCF

# データ生成
set.seed(123)
sLength <- 10000
dfTemp <- tibble(
  gX = as.vector(stats::filter(rnorm(sLength), filter = 0.5, method = "recursive")),
  gE = stats::filter(rnorm(sLength), filter = 0.1, method = "recursive"),
  gY = gX + gE
) %>%
  drop_na()
# CCF
png(paste0(WORKDIR, "/chap6_code7.1.png"), width = 15, height = 10, res = 72, units = "cm")
ccf(dfTemp$gX, dfTemp$gY, ylab = "CCF", lag.max = 10, main = "X vs. Y")
dev.off()
## png 
##   2
# CCFとprewhitened CCF
png(paste0(WORKDIR, "/chap6_code7.2.png"), width = 20, height = 8, res = 72, units = "cm")
par(mfrow=c(1,2))
ccf(dfTemp$gX, dfTemp$gY, ylab = "CCF", lag.max = 10, main = "標本CCF")
prewhiten(dfTemp$gX, dfTemp$gY, ylab = "prewhitened CCF", lag.max = 10, main = "プレホワイト化標本CCF")
dev.off()
## png 
##   2

code 8. 「売上と広告」データのprewhitened ccf

png(paste0(WORKDIR, "/chap6_code8.png"), width = 24, height = 9, res = 72, units = "cm")
par(mfrow=c(1,2))
prewhiten(dfSalesAds$gTV, dfSalesAds$gSales, ylab = "prewhitend CCF", main = "TV vs. Sales")
prewhiten(dfSalesAds$gWeb, dfSalesAds$gSales,ylab = "prewhitend CCF", main = "Web vs. Sales")
dev.off()
## png 
##   2

code 9. 「売上と広告」OLS回帰

dfTemp <- dfSalesAds %>%
  mutate(
    nMonth = seq_along(dMonth),
    gTV_Lag1 = dplyr::lag(gTV, 1), 
    gTV_Lag2 = dplyr::lag(gTV, 2), 
    gTV_Lag3 = dplyr::lag(gTV, 3), 
    gTV_Lag4 = dplyr::lag(gTV, 4), 
    gWeb_Lag1 = dplyr::lag(gWeb, 1), 
    gWeb_Lag2 = dplyr::lag(gWeb, 2), 
    gWeb_Lag3 = dplyr::lag(gWeb, 3), 
    gWeb_Lag4 = dplyr::lag(gWeb, 4) 
  )
oModel1 <- lm(
  gSales ~ nMonth + gTV + gTV_Lag1 + gTV_Lag2 + gTV_Lag3 + gTV_Lag4 
           + gWeb + gWeb_Lag1 + gWeb_Lag2 + gWeb_Lag3 + gWeb_Lag4, 
  data = dfTemp
)
print(summary(oModel1))
## 
## Call:
## lm(formula = gSales ~ nMonth + gTV + gTV_Lag1 + gTV_Lag2 + gTV_Lag3 + 
##     gTV_Lag4 + gWeb + gWeb_Lag1 + gWeb_Lag2 + gWeb_Lag3 + gWeb_Lag4, 
##     data = dfTemp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.24975 -0.57631 -0.09088  0.64647  2.83571 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.90296    1.42851   4.832 1.67e-05 ***
## nMonth       0.06689    0.00978   6.840 1.96e-08 ***
## gTV          0.04807    0.08728   0.551 0.584581    
## gTV_Lag1     0.33001    0.09175   3.597 0.000811 ***
## gTV_Lag2     0.20652    0.09026   2.288 0.026999 *  
## gTV_Lag3    -0.07086    0.08925  -0.794 0.431473    
## gTV_Lag4     0.05953    0.08633   0.690 0.494112    
## gWeb         0.47568    0.13080   3.637 0.000721 ***
## gWeb_Lag1    0.21089    0.12524   1.684 0.099304 .  
## gWeb_Lag2    0.08791    0.12480   0.704 0.484910    
## gWeb_Lag3   -0.05198    0.11593  -0.448 0.656088    
## gWeb_Lag4    0.16409    0.11695   1.403 0.167634    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.077 on 44 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.7101, Adjusted R-squared:  0.6376 
## F-statistic: 9.797 on 11 and 44 DF,  p-value: 1.219e-08

code 10. 「売上と広告」ラグ選択

# データ作成
dfTemp <- dfSalesAds %>%
  mutate(
    nMonth = seq_along(dMonth),
    gTV_Lag1 = dplyr::lag(gTV, 1), 
    gTV_Lag2 = dplyr::lag(gTV, 2), 
    gWeb_Lag1 = dplyr::lag(gWeb, 1), 
    gWeb_Lag2 = dplyr::lag(gWeb, 2), 
  ) %>%
  dplyr::select(
    gSales, nMonth, gTV, gTV_Lag1, gTV_Lag2, gWeb, gWeb_Lag1, gWeb_Lag2
  ) %>%
  # ここでNAを持つ行を削る。
  # 仮に削らないと、モデルによって使う行が変わってしまう
  drop_na() %>% 
  as.data.frame()
# ラグを変えながらOLS推定
lModel2 <- list(
  lm(gSales ~ ., data = dfTemp[c(T,T,T, F,F, T, F,F)]),
  lm(gSales ~ ., data = dfTemp[c(T,T,T, F,F, T, T,F)]),
  lm(gSales ~ ., data = dfTemp[c(T,T,T, F,F, T, T,T)]),
  lm(gSales ~ ., data = dfTemp[c(T,T,T, T,F, T, F,F)]),
  lm(gSales ~ ., data = dfTemp[c(T,T,T, T,F, T, T,F)]),
  lm(gSales ~ ., data = dfTemp[c(T,T,T, T,F, T, T,T)]),
  lm(gSales ~ ., data = dfTemp[c(T,T,T, T,T, T, F,F)]),
  lm(gSales ~ ., data = dfTemp[c(T,T,T, T,T, T, T,F)]),
  lm(gSales ~ ., data = dfTemp[c(T,T,T, T,T, T, T,T)])
)
# 結果のまとめ
lOut <- lapply(
  seq_along(lModel2), 
  function(nModelID){
    oModel <- lModel2[[nModelID]]
    agCoef <- coef(oModel)
    out <- tibble(
      nModelID = nModelID,
      sCoef = c(names(agCoef), "AIC"), 
      gValue = c(as.vector(agCoef), AIC(oModel))
    )
    return(out)
  }
)
out <- bind_rows(lOut) %>%
  spread(sCoef, gValue)
print(out)
## # A tibble: 9 x 10
##   nModelID `(Intercept)`   AIC    gTV gTV_Lag1 gTV_Lag2  gWeb gWeb_Lag1
##      <int>         <dbl> <dbl>  <dbl>    <dbl>    <dbl> <dbl>     <dbl>
## 1        1         11.5   202. 0.182    NA       NA     0.478    NA    
## 2        2         11.1   202. 0.174    NA       NA     0.415     0.209
## 3        3         10.9   203. 0.173    NA       NA     0.418     0.188
## 4        4          9.55  186. 0.0479    0.373   NA     0.465    NA    
## 5        5          8.98  183. 0.0352    0.384   NA     0.391     0.241
## 6        6          8.89  185. 0.0351    0.383   NA     0.393     0.229
## 7        7          8.15  181. 0.0589    0.300    0.208 0.527    NA    
## 8        8          7.72  179. 0.0466    0.315    0.194 0.456     0.220
## 9        9          7.57  180. 0.0467    0.312    0.197 0.460     0.202
## # ... with 2 more variables: gWeb_Lag2 <dbl>, nMonth <dbl>
write.csv(out, paste0(WORKDIR, "/chap6_code10.csv"), row.names = F)

# VIF
car::vif(lModel2[[8]])
##    nMonth       gTV  gTV_Lag1  gTV_Lag2      gWeb gWeb_Lag1 
##  1.055391  1.181920  1.302087  1.202703  1.199167  1.147182
# BG test
lmtest::bgtest(lModel2[[8]])
## 
##  Breusch-Godfrey test for serial correlation of order up to 1
## 
## data:  lModel2[[8]]
## LM test = 3.483, df = 1, p-value = 0.062
# 残差プロットとACF
png(paste0(WORKDIR, "/chap6_code10.1.png"), width = 12, height = 12, res = 72, units = "cm")
plot(residuals(lModel2[[8]]), type = "l")
dev.off()
## png 
##   2
png(paste0(WORKDIR, "/chap6_code10.2.png"), width = 12, height = 12, res = 72, units = "cm")
plot(stats::acf(residuals(lModel2[[8]])))
dev.off()
## png 
##   2

code 11. 「売上と広告」モデル構築

dfTemp <- dfSalesAds %>%
  mutate(
    nMonth = seq_along(dMonth),
    gTV_Lag1 = dplyr::lag(gTV, 1), 
    gTV_Lag2 = dplyr::lag(gTV, 2), 
    gWeb_Lag1 = dplyr::lag(gWeb, 1), 
  ) %>%
  dplyr::select(
    gSales, dMonth, nMonth, gTV, gTV_Lag1, gTV_Lag2, gWeb, gWeb_Lag1
  ) %>%
  drop_na() %>% 
  as.data.frame()

# ML推定
oModel3 <- auto.arima(
  dfTemp$gSales,
  d = 0,
  xreg = dfTemp %>%
    dplyr::select(nMonth, gTV, gTV_Lag1, gTV_Lag2, gWeb, gWeb_Lag1) %>%
    as.matrix(.),
  trace = TRUE, approximation = FALSE, stepwise = FALSE
)
## 
##  ARIMA(0,0,0) with zero mean     : 220.5417
##  ARIMA(0,0,0) with non-zero mean : 181.525
##  ARIMA(0,0,1) with zero mean     : 207.8165
##  ARIMA(0,0,1) with non-zero mean : 181.1315
##  ARIMA(0,0,2) with zero mean     : 210.1012
##  ARIMA(0,0,2) with non-zero mean : 183.7719
##  ARIMA(0,0,3) with zero mean     : 206.0413
##  ARIMA(0,0,3) with non-zero mean : 180.5368
##  ARIMA(0,0,4) with zero mean     : 208.9584
##  ARIMA(0,0,4) with non-zero mean : 183.444
##  ARIMA(0,0,5) with zero mean     : Inf
##  ARIMA(0,0,5) with non-zero mean : 186.7485
##  ARIMA(1,0,0) with zero mean     : 203.518
##  ARIMA(1,0,0) with non-zero mean : 180.6375
##  ARIMA(1,0,1) with zero mean     : Inf
##  ARIMA(1,0,1) with non-zero mean : 183.0648
##  ARIMA(1,0,2) with zero mean     : Inf
##  ARIMA(1,0,2) with non-zero mean : 185.7695
##  ARIMA(1,0,3) with zero mean     : 208.6526
##  ARIMA(1,0,3) with non-zero mean : 183.5167
##  ARIMA(1,0,4) with zero mean     : Inf
##  ARIMA(1,0,4) with non-zero mean : 186.779
##  ARIMA(2,0,0) with zero mean     : 198.8459
##  ARIMA(2,0,0) with non-zero mean : 183.0783
##  ARIMA(2,0,1) with zero mean     : Inf
##  ARIMA(2,0,1) with non-zero mean : Inf
##  ARIMA(2,0,2) with zero mean     : Inf
##  ARIMA(2,0,2) with non-zero mean : Inf
##  ARIMA(2,0,3) with zero mean     : Inf
##  ARIMA(2,0,3) with non-zero mean : Inf
##  ARIMA(3,0,0) with zero mean     : Inf
##  ARIMA(3,0,0) with non-zero mean : 185.3596
##  ARIMA(3,0,1) with zero mean     : Inf
##  ARIMA(3,0,1) with non-zero mean : 185.6658
##  ARIMA(3,0,2) with zero mean     : Inf
##  ARIMA(3,0,2) with non-zero mean : Inf
##  ARIMA(4,0,0) with zero mean     : Inf
##  ARIMA(4,0,0) with non-zero mean : 184.0288
##  ARIMA(4,0,1) with zero mean     : Inf
##  ARIMA(4,0,1) with non-zero mean : 186.8062
##  ARIMA(5,0,0) with zero mean     : Inf
##  ARIMA(5,0,0) with non-zero mean : 187.1538
## 
## 
## 
##  Best model: Regression with ARIMA(0,0,3) errors
print(summary(oModel3))
## Series: dfTemp$gSales 
## Regression with ARIMA(0,0,3) errors 
## 
## Coefficients:
##          ma1     ma2     ma3  intercept  nMonth     gTV  gTV_Lag1  gTV_Lag2
##       0.2529  0.1344  0.4397     7.9935  0.0620  0.0500    0.3119    0.2099
## s.e.  0.1258  0.1245  0.1680     1.1199  0.0124  0.0702    0.0702    0.0682
##         gWeb  gWeb_Lag1
##       0.4481     0.0919
## s.e.  0.1005     0.0983
## 
## sigma^2 estimated as 0.974:  log likelihood=-76.4
## AIC=174.8   AICc=180.54   BIC=197.46
## 
## Training set error measures:
##                      ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.01079291 0.8978319 0.7190424 -0.234242 4.471458 0.5692091
##                    ACF1
## Training set 0.01598773
##                      ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.01079291 0.8978319 0.7190424 -0.234242 4.471458 0.5692091
##                    ACF1
## Training set 0.01598773
# モデル診断
png(paste0(WORKDIR, "/chap6_code11.1.png"), width = 14, height = 16, res = 72, units = "cm")
tsdiag(oModel3)
dev.off()
## png 
##   2
# FGLS推定
oModel4 <- gls(
  gSales ~ ., 
  data = dfTemp %>% dplyr::select(-dMonth), 
  corr = corARMA(q = 3)
)
print(summary(oModel4))
## Generalized least squares fit by REML
##   Model: gSales ~ . 
##   Data: dfTemp %>% dplyr::select(-dMonth) 
##        AIC      BIC    logLik
##   199.5963 220.8463 -88.79813
## 
## Correlation Structure: ARMA(0,3)
##  Formula: ~1 
##  Parameter estimate(s):
##    Theta1    Theta2    Theta3 
## 0.2852240 0.1436000 0.4123918 
## 
## Coefficients:
##                Value Std.Error  t-value p-value
## (Intercept) 7.946032 1.1809692 6.728399  0.0000
## nMonth      0.062135 0.0133116 4.667781  0.0000
## gTV         0.050873 0.0720590 0.705984  0.4834
## gTV_Lag1    0.315700 0.0700079 4.509495  0.0000
## gTV_Lag2    0.209146 0.0703625 2.972411  0.0045
## gWeb        0.445837 0.0973315 4.580608  0.0000
## gWeb_Lag1   0.098835 0.0917617 1.077078  0.2865
## 
##  Correlation: 
##           (Intr) nMonth gTV    gTV_L1 gTV_L2 gWeb  
## nMonth    -0.399                                   
## gTV       -0.549  0.000                            
## gTV_Lag1  -0.310  0.026 -0.199                     
## gTV_Lag2  -0.523 -0.023  0.213 -0.171              
## gWeb      -0.298  0.065  0.178 -0.163  0.106       
## gWeb_Lag1 -0.218  0.122 -0.046  0.094 -0.115 -0.065
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -2.00779092 -0.58358882  0.01660233  0.66292952  2.67581835 
## 
## Residual standard error: 1.08131 
## Degrees of freedom: 58 total; 51 residual
# 描画準備
dfPlot <- dfTemp %>%
  mutate(
    gFitted = as.vector(fitted(oModel4))
  ) %>%
  dplyr::select(dMonth, gSales, gFitted) %>%
  gather(sVar, gValue, c(gSales, gFitted)) %>%
  mutate(
    fVar = factor(
      sVar, 
      levels = c("gSales", "gFitted"), 
      labels = c("Actual", "Fitted")
    )
  )
# 描画
g <- ggplot(data = dfPlot, aes(x = dMonth, y = gValue, color = fVar))
g <- g + geom_line()
g <- g + labs(y = "値", x = "年月", color=NULL)
g <- g + theme_bw()
g

ggsave(paste0(WORKDIR, "/chap6_code11.1.png"), width = 24, height = 12, units = "cm")

# 係数表示
mgConfint <- confint(oModel4)
colnames(mgConfint) <- c("Lower", "Upper")
dfConfint <- data.frame(
  sVar = rownames(mgConfint), 
  mgConfint, 
  stringsAsFactors = FALSE
)
agCoef <- coef(oModel4)
dfCoef <- tibble(
  sVar = names(agCoef),
  gCoef = as.vector(agCoef)
)
dfAdd <- tibble(sVar = "gWeb_Lag2", Lower = 0, Upper = 0, gCoef = 0)
dfPlot <- full_join(dfCoef, dfConfint, by = "sVar") %>%
  bind_rows(dfAdd) %>%
  filter(grepl("g(TV|Web)", sVar)) %>%
  mutate( sVar = sub("^g(TV|Web)$", "g\\1_Lag0", sVar)) %>%
  separate(sVar, c("sVar1", "sVar2")) %>%
  mutate(
    nLag = as.integer(sub("Lag", "", sVar2)),
    fLag = factor(nLag, levels = 0:2, labels = c("当月", "翌月", "翌々月")),
    sType = sub("^g", "", sVar1)
  )
g <- ggplot(
  data = dfPlot, 
  aes(x = fLag, y = gCoef, ymin = Lower, ymax = Upper, color = sType)
)
g <- g + geom_hline(yintercept = 0)
g <- g + geom_pointrange(size = 1)
g <- g + facet_grid(sType ~ .)
g <- g + guides(color = FALSE)
g <- g + labs(
  title = "ある月の広告支出を1増やしたときの売上数量の増大", 
  x = NULL, 
  y = "売上数量の増大(95%信頼区間)"
)
g <- g + theme_bw()
print(g)

ggsave(paste0(WORKDIR, "/chap6_code11.2.png"), width = 15, height = 12, units = "cm")
# 合計の表示
out <- dfPlot %>%
  group_by(sType) %>%
  summarize(gCoef = sum(gCoef)) %>%
  ungroup()
print(out)
## # A tibble: 2 x 2
##   sType gCoef
##   <chr> <dbl>
## 1 TV    0.576
## 2 Web   0.545

code 12. 「売上と広告」自己回帰

# データ作成
tsSales <- ts(dfSalesAds$gSales)
tsTV    <- ts(dfSalesAds$gTV)
tsWeb   <- ts(dfSalesAds$gWeb)
# 推定
oModel5 <- dynlm(tsSales ~ L(tsTV, 0:1) + tsWeb + L(tsSales))
print(summary(oModel5))
## 
## Time series regression with "ts" data:
## Start = 2, End = 60
## 
## Call:
## dynlm(formula = tsSales ~ L(tsTV, 0:1) + tsWeb + L(tsSales))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7147 -0.5968  0.2122  0.6162  2.9425 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    3.60775    1.63575   2.206  0.03169 *  
## L(tsTV, 0:1)0  0.04824    0.08799   0.548  0.58580    
## L(tsTV, 0:1)1  0.29478    0.09085   3.245  0.00202 ** 
## tsWeb          0.35911    0.11328   3.170  0.00251 ** 
## L(tsSales)     0.53968    0.09255   5.831 3.18e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.208 on 54 degrees of freedom
## Multiple R-squared:  0.5617, Adjusted R-squared:  0.5292 
## F-statistic:  17.3 on 4 and 54 DF,  p-value: 3.442e-09
# 変換
out <- tibble(
  nLag = 0:1000, # 十分に大きな数
  gTV  = 0.048 * c(1, ARMAtoMA(ar = 0.540, ma = c(0.295/0.048), lag.max = 1000)), 
  gWeb = 0.359 * c(1, ARMAtoMA(ar = 0.540,                      lag.max = 1000))
) 
sum(out$gTV)
## [1] 0.7456522
sum(out$gWeb)
## [1] 0.7804348
# 図示
dfPlot <- out %>%
  filter(nLag <= 10) %>%
  gather(sVar, gValue, c(gTV, gWeb)) %>%
  mutate(
    sLag = factor(nLag, levels = 0:10, labels = paste0(0:10, "ヶ月後")), 
    sType = sub("^g", "", sVar)
  ) 
g <- ggplot(data = dfPlot, aes(x = sLag, y = gValue, fill = sType))
g <- g + geom_hline(yintercept = 0)
g <- g + geom_col()
g <- g + facet_grid(sType ~ .)
g <- g + guides(fill = FALSE)
g <- g + labs(
  title = "ある月の広告支出を1増やしたときの売上数量の増大", 
  x = NULL, 
  y = "売上数量の増大(95%信頼区間)"
)
g <- g + theme_bw()
print(g)

ggsave(paste0(WORKDIR, "/chap6_code12.png"), width = 15, height = 12, units = "cm")

code 13. 「売上と広告」アドストック

dfTemp <- dfSalesAds %>%
  mutate(
    gTVStock  = stats::filter(gTV, filter = 0.5, method = "recursive"),
    gWebStock = stats::filter(gWeb, filter = 0.5, method = "recursive")
  ) 
summary(lm(gSales ~ gTVStock + gWebStock, data = dfTemp))
## 
## Call:
## lm(formula = gSales ~ gTVStock + gWebStock, data = dfTemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6398 -1.1932  0.1214  1.2154  3.7934 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.62407    1.50452   7.061 2.51e-09 ***
## gTVStock     0.24902    0.07604   3.275   0.0018 ** 
## gWebStock    0.28244    0.11132   2.537   0.0139 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.579 on 57 degrees of freedom
## Multiple R-squared:  0.2097, Adjusted R-squared:  0.1819 
## F-statistic:  7.56 on 2 and 57 DF,  p-value: 0.001224

code 14. IBM-SPの図示

dfPlot <- dfIBMSP %>%
  gather(sVar, gValue, c(IBM, SP))
g <- ggplot(data=dfPlot, aes(x = dMonth, y = gValue, color = sVar))
g <- g + geom_line()
g <- g + guides(color = FALSE)
g <- g + labs(x = NULL, y = "収益率(対数)")
g <- g + facet_grid(sVar ~ .)
g <- g + theme_bw()
print(g)

ggsave(paste0(WORKDIR, "/chap6_code14.png"), width = 24, height = 10, units = "cm")

code 15. IBM-SPの単位根検定

ndiffs(dfIBMSP$IBM, test = "adf", type = "trend")
## [1] 0
ndiffs(dfIBMSP$SP, test = "adf", type = "trend")
## [1] 0

code 16. IBM-SPのVARモデル

# VAR次数選択
VARselect(dfIBMSP %>% dplyr::select(IBM, SP), lag.max = 5, type = "const")
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      3      1      1      3 
## 
## $criteria
##                 1          2          3          4          5
## AIC(n)   6.760111   6.758874   6.753468   6.755548   6.753548
## HQ(n)    6.772538   6.779587   6.782466   6.792831   6.799116
## SC(n)    6.792614   6.813046   6.829308   6.853057   6.872725
## FPE(n) 862.737822 861.671992 857.026761 858.811796 857.096638
# VARモデル推定
oVARModel <- VAR(
  dfIBMSP %>% dplyr::select(IBM, SP), 
  p = 1, 
  type = "const"
)
summary(oVARModel)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: IBM, SP 
## Deterministic variables: const 
## Sample size: 887 
## Log Likelihood: -5508.559 
## Roots of the characteristic polynomial:
## 0.06853 0.03086
## Call:
## VAR(y = dfIBMSP %>% dplyr::select(IBM, SP), p = 1, type = "const")
## 
## 
## Estimation results for equation IBM: 
## ==================================== 
## IBM = IBM.l1 + SP.l1 + const 
## 
##        Estimate Std. Error t value Pr(>|t|)    
## IBM.l1  0.01919    0.04334   0.443   0.6579    
## SP.l1   0.10616    0.05167   2.054   0.0402 *  
## const   1.16265    0.22897   5.078 4.66e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 6.704 on 884 degrees of freedom
## Multiple R-Squared: 0.01047, Adjusted R-squared: 0.008226 
## F-statistic: 4.675 on 2 and 884 DF,  p-value: 0.009562 
## 
## 
## Estimation results for equation SP: 
## =================================== 
## SP = IBM.l1 + SP.l1 + const 
## 
##         Estimate Std. Error t value Pr(>|t|)   
## IBM.l1 -0.005419   0.036441  -0.149  0.88183   
## SP.l1   0.080189   0.043453   1.845  0.06531 . 
## const   0.499350   0.192541   2.593  0.00966 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 5.638 on 884 degrees of freedom
## Multiple R-Squared: 0.005809,    Adjusted R-squared: 0.003559 
## F-statistic: 2.582 on 2 and 884 DF,  p-value: 0.07616 
## 
## 
## 
## Covariance matrix of residuals:
##       IBM    SP
## IBM 44.95 23.94
## SP  23.94 31.78
## 
## Correlation matrix of residuals:
##        IBM     SP
## IBM 1.0000 0.6334
## SP  0.6334 1.0000
# SPはIBMのグレンジャー原因か?
causality(oVARModel, cause = "SP")
## $Granger
## 
##  Granger causality H0: SP do not Granger-cause IBM
## 
## data:  VAR object oVARModel
## F-Test = 4.2205, df1 = 1, df2 = 1768, p-value = 0.04009
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: SP and IBM
## 
## data:  VAR object oVARModel
## Chi-squared = 253.94, df = 1, p-value < 2.2e-16
# IBMはSPのグレンジャー原因か?
causality(oVARModel, cause = "IBM")
## $Granger
## 
##  Granger causality H0: IBM do not Granger-cause SP
## 
## data:  VAR object oVARModel
## F-Test = 0.022111, df1 = 1, df2 = 1768, p-value = 0.8818
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: IBM and SP
## 
## data:  VAR object oVARModel
## Chi-squared = 253.94, df = 1, p-value < 2.2e-16
# IRF
# チャートの出力をうまく合成できない...
png(paste0(WORKDIR, "/chap6_code16.1.png"), width = 20, height = 20, res = 72, units = "cm")
plot(irf(oVARModel, impulse = "SP"))
dev.off()
## png 
##   2
png(paste0(WORKDIR, "/chap6_code16.2.png"), width = 20, height = 20, res = 72, units = "cm")
plot(irf(oVARModel, impulse = "IBM"))
dev.off()
## png 
##   2

code 17. 広告と売上のVARモデル

VARselect(
  dfSalesAds %>% dplyr::select(gSales, gTV, gWeb), 
  lag.max = 5, 
  type = "trend"
)
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      1      1      1      1 
## 
## $criteria
##               1        2        3         4         5
## AIC(n) 2.154837 2.281246 2.265232  2.490680  2.691815
## HQ(n)  2.324201 2.577633 2.688642  3.041113  3.369271
## SC(n)  2.592801 3.047682 3.360141  3.914061  4.443669
## FPE(n) 8.633145 9.829711 9.752239 12.403913 15.544622
oVARModel1 <- VAR(
  dfSalesAds %>% dplyr::select(gSales, gTV), 
  p = 1, 
  type = "trend"
)
oVARModel2 <- VAR(
  dfSalesAds %>% dplyr::select(gSales, gWeb), 
  p = 1, 
  type = "trend"
)
png(paste0(WORKDIR, "/chap6_code17.1.png"), width = 20, height = 20, res = 72, units = "cm")
plot(irf(oVARModel1, impulse = "gTV", response = c("gSales")))
dev.off()
## png 
##   2
png(paste0(WORKDIR, "/chap6_code17.2.png"), width = 20, height = 20, res = 72, units = "cm")
plot(irf(oVARModel2, impulse = "gWeb", response = c("gSales")))
dev.off()
## png 
##   2

Appendix. 動的回帰分析のためのRパッケージ

# 動作確認用データ
set.seed(123)
nLength <- 10000

# dfDirectLag: 
# omega[0]=0.3, omega[1]=0.2, theta[1]=0.25, phi[1]=0.1
dfDirectLag <- tibble(gX = rnorm(nLength)) %>%
  mutate(
    gXLag1 = dplyr::lag(gX),
    gState = as.vector(
      0.3 * arima.sim(list(ma=0.2/0.3), n=nLength, innov=gX)
    ),
    gErr   = as.vector(
      arima.sim(list(ma=0.25, ar=0.1), n=nLength)
    ),
    gY     = gState + gErr
  ) %>%
  drop_na()
mgDirectLagX <- dfDirectLag %>% 
  dplyr::select(gX, gXLag1) %>% 
  as.matrix(.)

# dfADL: 
# omega[0]=0.4, omega[1]=0.3, phi[1]=0.2
dfADL <- tibble(gX = rnorm(nLength)) %>%
  mutate(
    gState = as.vector(
      0.4 * arima.sim(list(ma=0.3/0.4, ar=0.2), n=nLength, innov=gX)
    ),
    gErr   = as.vector(
      arima.sim(list(ar=0.2), n=nLength)
    ),
    gY     = gState + gErr
  ) %>%
  drop_na()

# dfTransfer: 
# omega[0]=0.3, omega[1]=0.2, delta[1]=0.2, theta[1]=0.25, phi[1]=0.1
dfTransfer <- tibble(gX = rnorm(nLength)) %>%
  mutate(
    gState = as.vector(
      0.3 * arima.sim(list(ma=0.2/0.3, ar=0.2), n=nLength, innov=gX)
    ),
    gErr   = as.vector(
      arima.sim(list(ma=0.25, ar=0.1), n=nLength)
    ),
    gY     = gState + gErr
  ) %>%
  drop_na()

forecast パッケージ

library(forecast)

# forecast::Arima()
oModel <- Arima(
  dfDirectLag$gY,       # 目的変数
  order = c(1, 0, 1),   # 撹乱項のARMA次数
  xreg  = mgDirectLagX  # 説明変数行列
)
print(oModel)
## Series: dfDirectLag$gY 
## Regression with ARIMA(1,0,1) errors 
## 
## Coefficients:
##          ar1     ma1  intercept      gX  gXLag1
##       0.1185  0.2372    -0.0144  0.2971  0.2139
## s.e.  0.0285  0.0278     0.0141  0.0100  0.0100
## 
## sigma^2 estimated as 1.004:  log likelihood=-14203.07
## AIC=28418.15   AICc=28418.16   BIC=28461.41
## forecast::auto.arima()
oModel <- auto.arima(
  dfDirectLag$gY,       # 目的変数
  d = 0,                # 差分階数
  xreg = mgDirectLagX,  # 撹乱項のARMA次数行列
  trace = TRUE, approximation = FALSE
)
## 
##  ARIMA(2,0,2) with non-zero mean : 28421.94
##  ARIMA(0,0,0) with non-zero mean : 29620.56
##  ARIMA(1,0,0) with non-zero mean : 28481.44
##  ARIMA(0,0,1) with non-zero mean : 28433
##  ARIMA(0,0,0) with zero mean     : 29620.36
##  ARIMA(1,0,2) with non-zero mean : 28420
##  ARIMA(0,0,2) with non-zero mean : 28418.01
##  ARIMA(0,0,3) with non-zero mean : 28419.99
##  ARIMA(1,0,1) with non-zero mean : 28418.16
##  ARIMA(1,0,3) with non-zero mean : 28422.01
##  ARIMA(0,0,2) with zero mean     : 28417.06
##  ARIMA(0,0,1) with zero mean     : 28432.13
##  ARIMA(1,0,2) with zero mean     : 28419.04
##  ARIMA(0,0,3) with zero mean     : 28419.04
##  ARIMA(1,0,1) with zero mean     : 28417.19
##  ARIMA(1,0,3) with zero mean     : 28421.06
## 
##  Best model: Regression with ARIMA(0,0,2) errors
print(oModel)
## Series: dfDirectLag$gY 
## Regression with ARIMA(0,0,2) errors 
## 
## Coefficients:
##          ma1     ma2      gX  gXLag1
##       0.3558  0.0414  0.2972   0.214
## s.e.  0.0100  0.0100  0.0100   0.010
## 
## sigma^2 estimated as 1.004:  log likelihood=-14203.53
## AIC=28417.05   AICc=28417.06   BIC=28453.11

nlmeパッケージ

library(nlme)

## nlme::gls() 
oModel <- gls(
  gY ~ gX + gXLag1, 
  corr = corARMA(p = 1, q = 1),     # ARMA撹乱項の次数
  data = dfDirectLag %>% head(1000) # 計算時間節約のため時点数を減らした
)
print(summary(oModel))
## Generalized least squares fit by REML
##   Model: gY ~ gX + gXLag1 
##   Data: dfDirectLag %>% head(1000) 
##        AIC      BIC    logLik
##   2863.836 2893.264 -1425.918
## 
## Correlation Structure: ARMA(1,1)
##  Formula: ~1 
##  Parameter estimate(s):
##      Phi1    Theta1 
## 0.1353544 0.1875080 
## 
## Coefficients:
##                  Value  Std.Error  t-value p-value
## (Intercept) 0.02957195 0.04347242 0.680246  0.4965
## gX          0.30609021 0.03186142 9.606922  0.0000
## gXLag1      0.25455108 0.03187183 7.986711  0.0000
## 
##  Correlation: 
##        (Intr) gX    
## gX     -0.015       
## gXLag1 -0.015  0.326
## 
## Standardized residuals:
##          Min           Q1          Med           Q3          Max 
## -3.258600581 -0.630099992  0.005858342  0.661148481  3.347699148 
## 
## Residual standard error: 1.052911 
## Degrees of freedom: 1000 total; 997 residual

dynlmパッケージ

library(dynlm)

# 使用するデータはtsオブジェクトにする
tsY <-  ts(dfADL$gY)
tsX  <- ts(dfADL$gX)

## dynlm::dynlm()
## L(変数名, 次数) でラグ変数を指定
oModel <- dynlm(tsY ~  L(tsX, 0:1) + L(tsY, 1))
print(summary(oModel))
## 
## Time series regression with "ts" data:
## Start = 2, End = 10000
## 
## Call:
## dynlm(formula = tsY ~ L(tsX, 0:1) + L(tsY, 1))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1152 -0.6740  0.0063  0.6740  3.9827 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.004895   0.009997   -0.49    0.624    
## L(tsX, 0:1)0  0.390970   0.009996   39.11   <2e-16 ***
## L(tsX, 0:1)1  0.301471   0.010623   28.38   <2e-16 ***
## L(tsY, 1)     0.197672   0.009168   21.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9996 on 9995 degrees of freedom
## Multiple R-squared:  0.2561, Adjusted R-squared:  0.2559 
## F-statistic:  1147 on 3 and 9995 DF,  p-value: < 2.2e-16

TSAパッケージ

library(TSA)

## TSA::arimax()
oModel <- arimax(
  dfTransfer$gY[1:1000],
  order    = c(1, 0, 1),  # 撹乱項のARMA次数
  xtransf  = as.matrix(dfTransfer$gX[1:1000]), # 説明変数行列
  transfer = list(c(1,1)),  # 説明変数のARMA次数。
                            # 説明変数が複数あるときはその順に並べる
  method   = "ML"
)
print(oModel)
## 
## Call:
## arimax(x = dfTransfer$gY[1:1000], order = c(1, 0, 1), method = "ML", xtransf = as.matrix(dfTransfer$gX[1:1000]), 
##     transfer = list(c(1, 1)))
## 
## Coefficients:
##          ar1     ma1  intercept  T1-AR1  T1-MA0  T1-MA1
##       0.1264  0.2384     0.0465  0.2029  0.2997  0.2357
## s.e.  0.0846  0.0823     0.0454  0.1084  0.0324  0.0424
## 
## sigma^2 estimated as 1.024:  log likelihood = -1429.4,  aic = 2870.8

以上