このページでは、株式会社インサイト・ファクトリー 2020年度社内研修「マーケティング・ミックス・モデリング」VI章「動学的市場反応モデル (1)」のRコードを公開しています。

準備

library(tidyverse)

## -- Attaching packages --------------------------------------------------------- tidyverse 1.3.0 --

## √ ggplot2 3.3.0     √ purrr   0.3.3
## √ tibble  3.0.0     √ dplyr   0.8.5
## √ tidyr   1.0.2     √ stringr 1.4.0
## √ readr   1.3.1     √ forcats 0.5.0

## -- Conflicts ------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:base':
## 
##     date

library(patchwork)
library(xts)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## 
## Attaching package: 'xts'

## The following objects are masked from 'package:dplyr':
## 
##     first, last

library(TSA)

## 
## Attaching package: 'TSA'

## The following object is masked from 'package:readr':
## 
##     spec

## The following objects are masked from 'package:stats':
## 
##     acf, arima

## The following object is masked from 'package:utils':
## 
##     tar

library(nlme)

## 
## Attaching package: 'nlme'

## The following object is masked from 'package:dplyr':
## 
##     collapse

library(forecast)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## Registered S3 methods overwritten by 'forecast':
##   method       from
##   fitted.Arima TSA 
##   plot.Arima   TSA

## 
## Attaching package: 'forecast'

## The following object is masked from 'package:nlme':
## 
##     getResponse

library(dynlm)
library(FinTS)

## 
## Attaching package: 'FinTS'

## The following object is masked from 'package:forecast':
## 
##     Acf

library(vars)

## Loading required package: MASS

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:patchwork':
## 
##     area

## The following object is masked from 'package:dplyr':
## 
##     select

## Loading required package: strucchange

## Loading required package: sandwich

## 
## Attaching package: 'strucchange'

## The following object is masked from 'package:stringr':
## 
##     boundary

## Loading required package: urca

## Loading required package: lmtest

WORKDIR <- "."


# 「売上と広告」データの作成
set.seed(123)
dfSalesAds <- tibble(
  # 年月
  dMonth = seq(as.Date("2013-09-01"), as.Date("2018-12-01"), by = "month"),
  # TV広告支出
  gTV    = as.vector(
    stats::filter(
      rnorm(length(dMonth), mean = 5, sd = 2),
      filter = 0.4,
      method = "recursive" # AR
    )
  ),
  # Web広告支出
  gWeb = as.vector(
    stats::filter(
      rnorm(length(dMonth), mean = 2, sd = 1.5),
      filter = 0.3,
      method = "recursive" # AR
    )
  ),
  # 売上の撹乱項
  gYErr = as.vector(
    stats::filter(
      rnorm(length(dMonth), mean = 0, sd = 1),
      filter = 0.45,
      method = "recursive"
    )
  )
) %>%
  mutate(
    # 広告支出の負値を0に置き換え、丸める
    gTV  = round(if_else(gTV < 0, 0, gTV), digits = 1),
    gWeb = round(if_else(gWeb < 0, 0, gWeb), digits = 1),
    # 売上の生成
    gSales = 6 + 0.05 * seq_along(dMonth),
    gSales = gSales + 0.10 * gTV ,
    gSales = gSales + 0.30 * lag(gTV, n = 1) ,
    gSales = gSales + 0.20 * lag(gTV, n = 2),
    gSales = gSales + 0.10 * lag(gTV, n = 3),
    gSales = gSales + 0.05 * lag(gTV, n = 4),
    gSales = gSales + 0.50 * gWeb,
    gSales = gSales + 0.30 * lag(gWeb, n = 1),
    gSales = gSales + gYErr
  ) %>%
  # 変数選択
  dplyr::select(dMonth, gSales, gTV, gWeb) %>%
  # ここで欠損行を削除
  drop_na() 
## print(summary(dfSalesAds))

# IBM-SPデータの作成
data(m.ibmspln)
dfIBMSP <- data.frame(m.ibmspln) %>%
  mutate(
    nYear  = as.integer(time(m.ibmspln)), 
    nMonth = as.integer(cycle(m.ibmspln)), 
    dMonth = as.Date(paste0(nYear, "-", nMonth, "-01"))
  ) %>%
  dplyr::select(dMonth, IBM, SP)

code 1. negative binomial lag

dfOmega <- crossing(
  nR = 1:4,
  nK = 0:10,
  gLambda = seq(0.2, 0.8, 0.2)
) %>%
  mutate(
    gTemp1 = gamma(nR + nK), # (r+k-1)!
    gTemp2 = gamma(nR),      # (r-1)!
    gTemp3 = gamma(nK + 1),  # k!
    gOmega = gTemp1 * ((1-gLambda)^nR) * (gLambda^nK) / (gTemp2 * gTemp3)
  )
## print(dfOmega)

# 描画
dfPlot <- dfOmega %>%
  mutate( sR = paste0("r=", nR) )
g <- ggplot(
  data=dfPlot, 
  aes(x = nK, y = gOmega, group = gLambda, color = factor(gLambda))
)
g <- g + geom_line()
g <- g + geom_point()
g <- g + scale_x_continuous(breaks=0:10)
g <- g + facet_grid(~ sR)
g <- g + labs(color = "lambda", x = "k", y = "omega")
g <- g + theme_bw()
print(g)

ggsave(paste0(WORKDIR, "/chap6_code1.png"), width = 21, height = 12, units = "cm")

code 2. 「売上と広告」データのチャート

dfPlot <- dfSalesAds %>%
  gather(sVar, gValue, c(gSales, gTV, gWeb)) %>%
  mutate(
    fVar = factor(
      sVar,
      levels = c("gSales", "gTV", "gWeb"),
      labels = c("売上数量", "TV支出", "Web支出")
    )
  )
g <- ggplot(data = dfPlot, aes(x = dMonth, y = gValue))
g <- g + geom_line()
g <- g + facet_grid(fVar ~ ., scales = "free")
g <- g + labs(x = "年月", y = "値")
g <- g + theme_bw()
print(g)

ggsave(paste0(WORKDIR, "/chap6_code2.png"), width = 21, height = 9, units = "cm")

code 3. 「売上と広告」データのACF

png(paste0(WORKDIR, "/chap6_code3.png"), width = 15, height = 25, res = 72, units = "cm")
par(mfrow=c(3,1))
acf(dfSalesAds$gSales)
acf(dfSalesAds$gTV)
acf(dfSalesAds$gWeb)
dev.off()

## png 
##   2

code 4. 「売上と広告」データの単位根検定

ndiffs(dfSalesAds$gSales, test = "adf", type = "trend")

## [1] 0

ndiffs(dfSalesAds$gWeb, test = "adf", type = "trend")

## [1] 0

ndiffs(dfSalesAds$gTV, test = "adf", type = "trend")

## [1] 0

code 5. 「売上と広告」データの散布図

dfPlot <- dfSalesAds %>%
  mutate(
    fYear = factor(year(dMonth)),
    sMonth = as.character(month(dMonth))
  )
# TV
g <- ggplot(data = dfPlot, aes(x = gTV, y = gSales, color = fYear, label = sMonth))
g <- g + geom_text()
g <- g + geom_path(alpha=0.5)
g <- g + guides(colour=FALSE)
g <- g + labs(title = "TV", x = "TV支出", y = "売上")
g <- g + facet_wrap(~ fYear)
g1 <- g + theme_bw()
# Web
g <- ggplot(data = dfPlot, aes(x = gWeb, y = gSales, color = fYear, label = sMonth))
g <- g + geom_text()
g <- g + geom_path(alpha=0.5)
g <- g + guides(colour=FALSE)
g <- g + labs(title = "Web", x = "Web支出", y = "売上")
g <- g + facet_wrap(~ fYear)
g2 <- g + theme_bw()
# 合成
g <- g1 + g2
print(g)

ggsave(paste0(WORKDIR, "/chap6_code5.png"), width = 24, height = 10, units = "cm")

code 6. 「売上と広告」データのccf

png(paste0(WORKDIR, "/chap6_code6.png"), width = 24, height = 9, res = 72, units = "cm")
par(mfrow=c(1,2))
ccf(dfSalesAds$gTV, dfSalesAds$gSales, main = "TV vs. Sales")
ccf(dfSalesAds$gWeb, dfSalesAds$gSales, main = "Web vs. Sales")
dev.off()

## png 
##   2

code 7. CCFとprewhitened CCF

# データ生成
set.seed(123)
sLength <- 10000
dfTemp <- tibble(
  gX = as.vector(stats::filter(rnorm(sLength), filter = 0.5, method = "recursive")),
  gE = stats::filter(rnorm(sLength), filter = 0.1, method = "recursive"),
  gY = gX + gE
) %>%
  drop_na()
# CCF
png(paste0(WORKDIR, "/chap6_code7.1.png"), width = 15, height = 10, res = 72, units = "cm")
ccf(dfTemp$gX, dfTemp$gY, ylab = "CCF", lag.max = 10, main = "X vs. Y")
dev.off()

## png 
##   2

# CCFとprewhitened CCF
png(paste0(WORKDIR, "/chap6_code7.2.png"), width = 20, height = 8, res = 72, units = "cm")
par(mfrow=c(1,2))
ccf(dfTemp$gX, dfTemp$gY, ylab = "CCF", lag.max = 10, main = "標本CCF")
prewhiten(dfTemp$gX, dfTemp$gY, ylab = "prewhitened CCF", lag.max = 10, main = "プレホワイト化標本CCF")
dev.off()

## png 
##   2

code 8. 「売上と広告」データのprewhitened ccf

png(paste0(WORKDIR, "/chap6_code8.png"), width = 24, height = 9, res = 72, units = "cm")
par(mfrow=c(1,2))
prewhiten(dfSalesAds$gTV, dfSalesAds$gSales, ylab = "prewhitend CCF", main = "TV vs. Sales")
prewhiten(dfSalesAds$gWeb, dfSalesAds$gSales,ylab = "prewhitend CCF", main = "Web vs. Sales")
dev.off()

## png 
##   2

code 9. 「売上と広告」OLS回帰

dfTemp <- dfSalesAds %>%
  mutate(
    nMonth = seq_along(dMonth),
    gTV_Lag1 = dplyr::lag(gTV, 1), 
    gTV_Lag2 = dplyr::lag(gTV, 2), 
    gTV_Lag3 = dplyr::lag(gTV, 3), 
    gTV_Lag4 = dplyr::lag(gTV, 4), 
    gWeb_Lag1 = dplyr::lag(gWeb, 1), 
    gWeb_Lag2 = dplyr::lag(gWeb, 2), 
    gWeb_Lag3 = dplyr::lag(gWeb, 3), 
    gWeb_Lag4 = dplyr::lag(gWeb, 4) 
  )
oModel1 <- lm(
  gSales ~ nMonth + gTV + gTV_Lag1 + gTV_Lag2 + gTV_Lag3 + gTV_Lag4 
           + gWeb + gWeb_Lag1 + gWeb_Lag2 + gWeb_Lag3 + gWeb_Lag4, 
  data = dfTemp
)
print(summary(oModel1))

## 
## Call:
## lm(formula = gSales ~ nMonth + gTV + gTV_Lag1 + gTV_Lag2 + gTV_Lag3 + 
##     gTV_Lag4 + gWeb + gWeb_Lag1 + gWeb_Lag2 + gWeb_Lag3 + gWeb_Lag4, 
##     data = dfTemp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.24975 -0.57631 -0.09088  0.64647  2.83571 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.90296    1.42851   4.832 1.67e-05 ***
## nMonth       0.06689    0.00978   6.840 1.96e-08 ***
## gTV          0.04807    0.08728   0.551 0.584581    
## gTV_Lag1     0.33001    0.09175   3.597 0.000811 ***
## gTV_Lag2     0.20652    0.09026   2.288 0.026999 *  
## gTV_Lag3    -0.07086    0.08925  -0.794 0.431473    
## gTV_Lag4     0.05953    0.08633   0.690 0.494112    
## gWeb         0.47568    0.13080   3.637 0.000721 ***
## gWeb_Lag1    0.21089    0.12524   1.684 0.099304 .  
## gWeb_Lag2    0.08791    0.12480   0.704 0.484910    
## gWeb_Lag3   -0.05198    0.11593  -0.448 0.656088    
## gWeb_Lag4    0.16409    0.11695   1.403 0.167634    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.077 on 44 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.7101, Adjusted R-squared:  0.6376 
## F-statistic: 9.797 on 11 and 44 DF,  p-value: 1.219e-08

code 10. 「売上と広告」ラグ選択

# データ作成
dfTemp <- dfSalesAds %>%
  mutate(
    nMonth = seq_along(dMonth),
    gTV_Lag1 = dplyr::lag(gTV, 1), 
    gTV_Lag2 = dplyr::lag(gTV, 2), 
    gWeb_Lag1 = dplyr::lag(gWeb, 1), 
    gWeb_Lag2 = dplyr::lag(gWeb, 2), 
  ) %>%
  dplyr::select(
    gSales, nMonth, gTV, gTV_Lag1, gTV_Lag2, gWeb, gWeb_Lag1, gWeb_Lag2
  ) %>%
  # ここでNAを持つ行を削る。
  # 仮に削らないと、モデルによって使う行が変わってしまう
  drop_na() %>% 
  as.data.frame()
# ラグを変えながらOLS推定
lModel2 <- list(
  lm(gSales ~ ., data = dfTemp[c(T,T,T, F,F, T, F,F)]),
  lm(gSales ~ ., data = dfTemp[c(T,T,T, F,F, T, T,F)]),
  lm(gSales ~ ., data = dfTemp[c(T,T,T, F,F, T, T,T)]),
  lm(gSales ~ ., data = dfTemp[c(T,T,T, T,F, T, F,F)]),
  lm(gSales ~ ., data = dfTemp[c(T,T,T, T,F, T, T,F)]),
  lm(gSales ~ ., data = dfTemp[c(T,T,T, T,F, T, T,T)]),
  lm(gSales ~ ., data = dfTemp[c(T,T,T, T,T, T, F,F)]),
  lm(gSales ~ ., data = dfTemp[c(T,T,T, T,T, T, T,F)]),
  lm(gSales ~ ., data = dfTemp[c(T,T,T, T,T, T, T,T)])
)
# 結果のまとめ
lOut <- lapply(
  seq_along(lModel2), 
  function(nModelID){
    oModel <- lModel2[[nModelID]]
    agCoef <- coef(oModel)
    out <- tibble(
      nModelID = nModelID,
      sCoef = c(names(agCoef), "AIC"), 
      gValue = c(as.vector(agCoef), AIC(oModel))
    )
    return(out)
  }
)
out <- bind_rows(lOut) %>%
  spread(sCoef, gValue)
print(out)

## # A tibble: 9 x 10
##   nModelID `(Intercept)`   AIC    gTV gTV_Lag1 gTV_Lag2  gWeb gWeb_Lag1
##      <int>         <dbl> <dbl>  <dbl>    <dbl>    <dbl> <dbl>     <dbl>
## 1        1         11.5   202. 0.182    NA       NA     0.478    NA    
## 2        2         11.1   202. 0.174    NA       NA     0.415     0.209
## 3        3         10.9   203. 0.173    NA       NA     0.418     0.188
## 4        4          9.55  186. 0.0479    0.373   NA     0.465    NA    
## 5        5          8.98  183. 0.0352    0.384   NA     0.391     0.241
## 6        6          8.89  185. 0.0351    0.383   NA     0.393     0.229
## 7        7          8.15  181. 0.0589    0.300    0.208 0.527    NA    
## 8        8          7.72  179. 0.0466    0.315    0.194 0.456     0.220
## 9        9          7.57  180. 0.0467    0.312    0.197 0.460     0.202
## # ... with 2 more variables: gWeb_Lag2 <dbl>, nMonth <dbl>

write.csv(out, paste0(WORKDIR, "/chap6_code10.csv"), row.names = F)

# VIF
car::vif(lModel2[[8]])

##    nMonth       gTV  gTV_Lag1  gTV_Lag2      gWeb gWeb_Lag1 
##  1.055391  1.181920  1.302087  1.202703  1.199167  1.147182

# BG test
lmtest::bgtest(lModel2[[8]])

## 
##  Breusch-Godfrey test for serial correlation of order up to 1
## 
## data:  lModel2[[8]]
## LM test = 3.483, df = 1, p-value = 0.062

# 残差プロットとACF
png(paste0(WORKDIR, "/chap6_code10.1.png"), width = 12, height = 12, res = 72, units = "cm")
plot(residuals(lModel2[[8]]), type = "l")
dev.off()

## png 
##   2

png(paste0(WORKDIR, "/chap6_code10.2.png"), width = 12, height = 12, res = 72, units = "cm")
plot(stats::acf(residuals(lModel2[[8]])))
dev.off()

## png 
##   2

code 11. 「売上と広告」モデル構築

dfTemp <- dfSalesAds %>%
  mutate(
    nMonth = seq_along(dMonth),
    gTV_Lag1 = dplyr::lag(gTV, 1), 
    gTV_Lag2 = dplyr::lag(gTV, 2), 
    gWeb_Lag1 = dplyr::lag(gWeb, 1), 
  ) %>%
  dplyr::select(
    gSales, dMonth, nMonth, gTV, gTV_Lag1, gTV_Lag2, gWeb, gWeb_Lag1
  ) %>%
  drop_na() %>% 
  as.data.frame()

# ML推定
oModel3 <- auto.arima(
  dfTemp$gSales,
  d = 0,
  xreg = dfTemp %>%
    dplyr::select(nMonth, gTV, gTV_Lag1, gTV_Lag2, gWeb, gWeb_Lag1) %>%
    as.matrix(.),
  trace = TRUE, approximation = FALSE, stepwise = FALSE
)

## 
##  ARIMA(0,0,0) with zero mean     : 220.5417
##  ARIMA(0,0,0) with non-zero mean : 181.525
##  ARIMA(0,0,1) with zero mean     : 207.8165
##  ARIMA(0,0,1) with non-zero mean : 181.1315
##  ARIMA(0,0,2) with zero mean     : 210.1012
##  ARIMA(0,0,2) with non-zero mean : 183.7719
##  ARIMA(0,0,3) with zero mean     : 206.0413
##  ARIMA(0,0,3) with non-zero mean : 180.5368
##  ARIMA(0,0,4) with zero mean     : 208.9584
##  ARIMA(0,0,4) with non-zero mean : 183.444
##  ARIMA(0,0,5) with zero mean     : Inf
##  ARIMA(0,0,5) with non-zero mean : 186.7485
##  ARIMA(1,0,0) with zero mean     : 203.518
##  ARIMA(1,0,0) with non-zero mean : 180.6375
##  ARIMA(1,0,1) with zero mean     : Inf
##  ARIMA(1,0,1) with non-zero mean : 183.0648
##  ARIMA(1,0,2) with zero mean     : Inf
##  ARIMA(1,0,2) with non-zero mean : 185.7695
##  ARIMA(1,0,3) with zero mean     : 208.6526
##  ARIMA(1,0,3) with non-zero mean : 183.5167
##  ARIMA(1,0,4) with zero mean     : Inf
##  ARIMA(1,0,4) with non-zero mean : 186.779
##  ARIMA(2,0,0) with zero mean     : 198.8459
##  ARIMA(2,0,0) with non-zero mean : 183.0783
##  ARIMA(2,0,1) with zero mean     : Inf
##  ARIMA(2,0,1) with non-zero mean : Inf
##  ARIMA(2,0,2) with zero mean     : Inf
##  ARIMA(2,0,2) with non-zero mean : Inf
##  ARIMA(2,0,3) with zero mean     : Inf
##  ARIMA(2,0,3) with non-zero mean : Inf
##  ARIMA(3,0,0) with zero mean     : Inf
##  ARIMA(3,0,0) with non-zero mean : 185.3596
##  ARIMA(3,0,1) with zero mean     : Inf
##  ARIMA(3,0,1) with non-zero mean : 185.6658
##  ARIMA(3,0,2) with zero mean     : Inf
##  ARIMA(3,0,2) with non-zero mean : Inf
##  ARIMA(4,0,0) with zero mean     : Inf
##  ARIMA(4,0,0) with non-zero mean : 184.0288
##  ARIMA(4,0,1) with zero mean     : Inf
##  ARIMA(4,0,1) with non-zero mean : 186.8062
##  ARIMA(5,0,0) with zero mean     : Inf
##  ARIMA(5,0,0) with non-zero mean : 187.1538
## 
## 
## 
##  Best model: Regression with ARIMA(0,0,3) errors

print(summary(oModel3))

## Series: dfTemp$gSales 
## Regression with ARIMA(0,0,3) errors 
## 
## Coefficients:
##          ma1     ma2     ma3  intercept  nMonth     gTV  gTV_Lag1  gTV_Lag2
##       0.2529  0.1344  0.4397     7.9935  0.0620  0.0500    0.3119    0.2099
## s.e.  0.1258  0.1245  0.1680     1.1199  0.0124  0.0702    0.0702    0.0682
##         gWeb  gWeb_Lag1
##       0.4481     0.0919
## s.e.  0.1005     0.0983
## 
## sigma^2 estimated as 0.974:  log likelihood=-76.4
## AIC=174.8   AICc=180.54   BIC=197.46
## 
## Training set error measures:
##                      ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.01079291 0.8978319 0.7190424 -0.234242 4.471458 0.5692091
##                    ACF1
## Training set 0.01598773
##                      ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.01079291 0.8978319 0.7190424 -0.234242 4.471458 0.5692091
##                    ACF1
## Training set 0.01598773

# モデル診断
png(paste0(WORKDIR, "/chap6_code11.1.png"), width = 14, height = 16, res = 72, units = "cm")
tsdiag(oModel3)
dev.off()

## png 
##   2

# FGLS推定
oModel4 <- gls(
  gSales ~ ., 
  data = dfTemp %>% dplyr::select(-dMonth), 
  corr = corARMA(q = 3)
)
print(summary(oModel4))

## Generalized least squares fit by REML
##   Model: gSales ~ . 
##   Data: dfTemp %>% dplyr::select(-dMonth) 
##        AIC      BIC    logLik
##   199.5963 220.8463 -88.79813
## 
## Correlation Structure: ARMA(0,3)
##  Formula: ~1 
##  Parameter estimate(s):
##    Theta1    Theta2    Theta3 
## 0.2852240 0.1436000 0.4123918 
## 
## Coefficients:
##                Value Std.Error  t-value p-value
## (Intercept) 7.946032 1.1809692 6.728399  0.0000
## nMonth      0.062135 0.0133116 4.667781  0.0000
## gTV         0.050873 0.0720590 0.705984  0.4834
## gTV_Lag1    0.315700 0.0700079 4.509495  0.0000
## gTV_Lag2    0.209146 0.0703625 2.972411  0.0045
## gWeb        0.445837 0.0973315 4.580608  0.0000
## gWeb_Lag1   0.098835 0.0917617 1.077078  0.2865
## 
##  Correlation: 
##           (Intr) nMonth gTV    gTV_L1 gTV_L2 gWeb  
## nMonth    -0.399                                   
## gTV       -0.549  0.000                            
## gTV_Lag1  -0.310  0.026 -0.199                     
## gTV_Lag2  -0.523 -0.023  0.213 -0.171              
## gWeb      -0.298  0.065  0.178 -0.163  0.106       
## gWeb_Lag1 -0.218  0.122 -0.046  0.094 -0.115 -0.065
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -2.00779092 -0.58358882  0.01660233  0.66292952  2.67581835 
## 
## Residual standard error: 1.08131 
## Degrees of freedom: 58 total; 51 residual

# 描画準備
dfPlot <- dfTemp %>%
  mutate(
    gFitted = as.vector(fitted(oModel4))
  ) %>%
  dplyr::select(dMonth, gSales, gFitted) %>%
  gather(sVar, gValue, c(gSales, gFitted)) %>%
  mutate(
    fVar = factor(
      sVar, 
      levels = c("gSales", "gFitted"), 
      labels = c("Actual", "Fitted")
    )
  )
# 描画
g <- ggplot(data = dfPlot, aes(x = dMonth, y = gValue, color = fVar))
g <- g + geom_line()
g <- g + labs(y = "値", x = "年月", color=NULL)
g <- g + theme_bw()
g

ggsave(paste0(WORKDIR, "/chap6_code11.1.png"), width = 24, height = 12, units = "cm")

# 係数表示
mgConfint <- confint(oModel4)
colnames(mgConfint) <- c("Lower", "Upper")
dfConfint <- data.frame(
  sVar = rownames(mgConfint), 
  mgConfint, 
  stringsAsFactors = FALSE
)
agCoef <- coef(oModel4)
dfCoef <- tibble(
  sVar = names(agCoef),
  gCoef = as.vector(agCoef)
)
dfAdd <- tibble(sVar = "gWeb_Lag2", Lower = 0, Upper = 0, gCoef = 0)
dfPlot <- full_join(dfCoef, dfConfint, by = "sVar") %>%
  bind_rows(dfAdd) %>%
  filter(grepl("g(TV|Web)", sVar)) %>%
  mutate( sVar = sub("^g(TV|Web)$", "g\\1_Lag0", sVar)) %>%
  separate(sVar, c("sVar1", "sVar2")) %>%
  mutate(
    nLag = as.integer(sub("Lag", "", sVar2)),
    fLag = factor(nLag, levels = 0:2, labels = c("当月", "翌月", "翌々月")),
    sType = sub("^g", "", sVar1)
  )
g <- ggplot(
  data = dfPlot, 
  aes(x = fLag, y = gCoef, ymin = Lower, ymax = Upper, color = sType)
)
g <- g + geom_hline(yintercept = 0)
g <- g + geom_pointrange(size = 1)
g <- g + facet_grid(sType ~ .)
g <- g + guides(color = FALSE)
g <- g + labs(
  title = "ある月の広告支出を1増やしたときの売上数量の増大", 
  x = NULL, 
  y = "売上数量の増大(95%信頼区間)"
)
g <- g + theme_bw()
print(g)

ggsave(paste0(WORKDIR, "/chap6_code11.2.png"), width = 15, height = 12, units = "cm")
# 合計の表示
out <- dfPlot %>%
  group_by(sType) %>%
  summarize(gCoef = sum(gCoef)) %>%
  ungroup()
print(out)

## # A tibble: 2 x 2
##   sType gCoef
##   <chr> <dbl>
## 1 TV    0.576
## 2 Web   0.545

code 12. 「売上と広告」自己回帰

# データ作成
tsSales <- ts(dfSalesAds$gSales)
tsTV    <- ts(dfSalesAds$gTV)
tsWeb   <- ts(dfSalesAds$gWeb)
# 推定
oModel5 <- dynlm(tsSales ~ L(tsTV, 0:1) + tsWeb + L(tsSales))
print(summary(oModel5))

## 
## Time series regression with "ts" data:
## Start = 2, End = 60
## 
## Call:
## dynlm(formula = tsSales ~ L(tsTV, 0:1) + tsWeb + L(tsSales))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7147 -0.5968  0.2122  0.6162  2.9425 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    3.60775    1.63575   2.206  0.03169 *  
## L(tsTV, 0:1)0  0.04824    0.08799   0.548  0.58580    
## L(tsTV, 0:1)1  0.29478    0.09085   3.245  0.00202 ** 
## tsWeb          0.35911    0.11328   3.170  0.00251 ** 
## L(tsSales)     0.53968    0.09255   5.831 3.18e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.208 on 54 degrees of freedom
## Multiple R-squared:  0.5617, Adjusted R-squared:  0.5292 
## F-statistic:  17.3 on 4 and 54 DF,  p-value: 3.442e-09

# 変換
out <- tibble(
  nLag = 0:1000, # 十分に大きな数
  gTV  = 0.048 * c(1, ARMAtoMA(ar = 0.540, ma = c(0.295/0.048), lag.max = 1000)), 
  gWeb = 0.359 * c(1, ARMAtoMA(ar = 0.540,                      lag.max = 1000))
) 
sum(out$gTV)

## [1] 0.7456522

sum(out$gWeb)

## [1] 0.7804348

# 図示
dfPlot <- out %>%
  filter(nLag <= 10) %>%
  gather(sVar, gValue, c(gTV, gWeb)) %>%
  mutate(
    sLag = factor(nLag, levels = 0:10, labels = paste0(0:10, "ヶ月後")), 
    sType = sub("^g", "", sVar)
  ) 
g <- ggplot(data = dfPlot, aes(x = sLag, y = gValue, fill = sType))
g <- g + geom_hline(yintercept = 0)
g <- g + geom_col()
g <- g + facet_grid(sType ~ .)
g <- g + guides(fill = FALSE)
g <- g + labs(
  title = "ある月の広告支出を1増やしたときの売上数量の増大", 
  x = NULL, 
  y = "売上数量の増大(95%信頼区間)"
)
g <- g + theme_bw()
print(g)

ggsave(paste0(WORKDIR, "/chap6_code12.png"), width = 15, height = 12, units = "cm")

code 13. 「売上と広告」アドストック

dfTemp <- dfSalesAds %>%
  mutate(
    gTVStock  = stats::filter(gTV, filter = 0.5, method = "recursive"),
    gWebStock = stats::filter(gWeb, filter = 0.5, method = "recursive")
  ) 
summary(lm(gSales ~ gTVStock + gWebStock, data = dfTemp))

## 
## Call:
## lm(formula = gSales ~ gTVStock + gWebStock, data = dfTemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6398 -1.1932  0.1214  1.2154  3.7934 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.62407    1.50452   7.061 2.51e-09 ***
## gTVStock     0.24902    0.07604   3.275   0.0018 ** 
## gWebStock    0.28244    0.11132   2.537   0.0139 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.579 on 57 degrees of freedom
## Multiple R-squared:  0.2097, Adjusted R-squared:  0.1819 
## F-statistic:  7.56 on 2 and 57 DF,  p-value: 0.001224

code 14. IBM-SPの図示

dfPlot <- dfIBMSP %>%
  gather(sVar, gValue, c(IBM, SP))
g <- ggplot(data=dfPlot, aes(x = dMonth, y = gValue, color = sVar))
g <- g + geom_line()
g <- g + guides(color = FALSE)
g <- g + labs(x = NULL, y = "収益率(対数)")
g <- g + facet_grid(sVar ~ .)
g <- g + theme_bw()
print(g)

ggsave(paste0(WORKDIR, "/chap6_code14.png"), width = 24, height = 10, units = "cm")

code 15. IBM-SPの単位根検定

ndiffs(dfIBMSP$IBM, test = "adf", type = "trend")

## [1] 0

ndiffs(dfIBMSP$SP, test = "adf", type = "trend")

## [1] 0

code 16. IBM-SPのVARモデル

# VAR次数選択
VARselect(dfIBMSP %>% dplyr::select(IBM, SP), lag.max = 5, type = "const")

## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      3      1      1      3 
## 
## $criteria
##                 1          2          3          4          5
## AIC(n)   6.760111   6.758874   6.753468   6.755548   6.753548
## HQ(n)    6.772538   6.779587   6.782466   6.792831   6.799116
## SC(n)    6.792614   6.813046   6.829308   6.853057   6.872725
## FPE(n) 862.737822 861.671992 857.026761 858.811796 857.096638

# VARモデル推定
oVARModel <- VAR(
  dfIBMSP %>% dplyr::select(IBM, SP), 
  p = 1, 
  type = "const"
)
summary(oVARModel)

## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: IBM, SP 
## Deterministic variables: const 
## Sample size: 887 
## Log Likelihood: -5508.559 
## Roots of the characteristic polynomial:
## 0.06853 0.03086
## Call:
## VAR(y = dfIBMSP %>% dplyr::select(IBM, SP), p = 1, type = "const")
## 
## 
## Estimation results for equation IBM: 
## ==================================== 
## IBM = IBM.l1 + SP.l1 + const 
## 
##        Estimate Std. Error t value Pr(>|t|)    
## IBM.l1  0.01919    0.04334   0.443   0.6579    
## SP.l1   0.10616    0.05167   2.054   0.0402 *  
## const   1.16265    0.22897   5.078 4.66e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 6.704 on 884 degrees of freedom
## Multiple R-Squared: 0.01047, Adjusted R-squared: 0.008226 
## F-statistic: 4.675 on 2 and 884 DF,  p-value: 0.009562 
## 
## 
## Estimation results for equation SP: 
## =================================== 
## SP = IBM.l1 + SP.l1 + const 
## 
##         Estimate Std. Error t value Pr(>|t|)   
## IBM.l1 -0.005419   0.036441  -0.149  0.88183   
## SP.l1   0.080189   0.043453   1.845  0.06531 . 
## const   0.499350   0.192541   2.593  0.00966 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 5.638 on 884 degrees of freedom
## Multiple R-Squared: 0.005809,    Adjusted R-squared: 0.003559 
## F-statistic: 2.582 on 2 and 884 DF,  p-value: 0.07616 
## 
## 
## 
## Covariance matrix of residuals:
##       IBM    SP
## IBM 44.95 23.94
## SP  23.94 31.78
## 
## Correlation matrix of residuals:
##        IBM     SP
## IBM 1.0000 0.6334
## SP  0.6334 1.0000

# SPはIBMのグレンジャー原因か？
causality(oVARModel, cause = "SP")

## $Granger
## 
##  Granger causality H0: SP do not Granger-cause IBM
## 
## data:  VAR object oVARModel
## F-Test = 4.2205, df1 = 1, df2 = 1768, p-value = 0.04009
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: SP and IBM
## 
## data:  VAR object oVARModel
## Chi-squared = 253.94, df = 1, p-value < 2.2e-16

# IBMはSPのグレンジャー原因か？
causality(oVARModel, cause = "IBM")

## $Granger
## 
##  Granger causality H0: IBM do not Granger-cause SP
## 
## data:  VAR object oVARModel
## F-Test = 0.022111, df1 = 1, df2 = 1768, p-value = 0.8818
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: IBM and SP
## 
## data:  VAR object oVARModel
## Chi-squared = 253.94, df = 1, p-value < 2.2e-16

# IRF
# チャートの出力をうまく合成できない...
png(paste0(WORKDIR, "/chap6_code16.1.png"), width = 20, height = 20, res = 72, units = "cm")
plot(irf(oVARModel, impulse = "SP"))
dev.off()

## png 
##   2

png(paste0(WORKDIR, "/chap6_code16.2.png"), width = 20, height = 20, res = 72, units = "cm")
plot(irf(oVARModel, impulse = "IBM"))
dev.off()

## png 
##   2

code 17. 広告と売上のVARモデル

VARselect(
  dfSalesAds %>% dplyr::select(gSales, gTV, gWeb), 
  lag.max = 5, 
  type = "trend"
)

## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      1      1      1      1 
## 
## $criteria
##               1        2        3         4         5
## AIC(n) 2.154837 2.281246 2.265232  2.490680  2.691815
## HQ(n)  2.324201 2.577633 2.688642  3.041113  3.369271
## SC(n)  2.592801 3.047682 3.360141  3.914061  4.443669
## FPE(n) 8.633145 9.829711 9.752239 12.403913 15.544622

oVARModel1 <- VAR(
  dfSalesAds %>% dplyr::select(gSales, gTV), 
  p = 1, 
  type = "trend"
)
oVARModel2 <- VAR(
  dfSalesAds %>% dplyr::select(gSales, gWeb), 
  p = 1, 
  type = "trend"
)
png(paste0(WORKDIR, "/chap6_code17.1.png"), width = 20, height = 20, res = 72, units = "cm")
plot(irf(oVARModel1, impulse = "gTV", response = c("gSales")))
dev.off()

## png 
##   2

png(paste0(WORKDIR, "/chap6_code17.2.png"), width = 20, height = 20, res = 72, units = "cm")
plot(irf(oVARModel2, impulse = "gWeb", response = c("gSales")))
dev.off()

## png 
##   2

Appendix. 動的回帰分析のためのRパッケージ

# 動作確認用データ
set.seed(123)
nLength <- 10000

# dfDirectLag: 
# omega[0]=0.3, omega[1]=0.2, theta[1]=0.25, phi[1]=0.1
dfDirectLag <- tibble(gX = rnorm(nLength)) %>%
  mutate(
    gXLag1 = dplyr::lag(gX),
    gState = as.vector(
      0.3 * arima.sim(list(ma=0.2/0.3), n=nLength, innov=gX)
    ),
    gErr   = as.vector(
      arima.sim(list(ma=0.25, ar=0.1), n=nLength)
    ),
    gY     = gState + gErr
  ) %>%
  drop_na()
mgDirectLagX <- dfDirectLag %>% 
  dplyr::select(gX, gXLag1) %>% 
  as.matrix(.)

# dfADL: 
# omega[0]=0.4, omega[1]=0.3, phi[1]=0.2
dfADL <- tibble(gX = rnorm(nLength)) %>%
  mutate(
    gState = as.vector(
      0.4 * arima.sim(list(ma=0.3/0.4, ar=0.2), n=nLength, innov=gX)
    ),
    gErr   = as.vector(
      arima.sim(list(ar=0.2), n=nLength)
    ),
    gY     = gState + gErr
  ) %>%
  drop_na()

# dfTransfer: 
# omega[0]=0.3, omega[1]=0.2, delta[1]=0.2, theta[1]=0.25, phi[1]=0.1
dfTransfer <- tibble(gX = rnorm(nLength)) %>%
  mutate(
    gState = as.vector(
      0.3 * arima.sim(list(ma=0.2/0.3, ar=0.2), n=nLength, innov=gX)
    ),
    gErr   = as.vector(
      arima.sim(list(ma=0.25, ar=0.1), n=nLength)
    ),
    gY     = gState + gErr
  ) %>%
  drop_na()

forecast パッケージ

library(forecast)

# forecast::Arima()
oModel <- Arima(
  dfDirectLag$gY,       # 目的変数
  order = c(1, 0, 1),   # 撹乱項のARMA次数
  xreg  = mgDirectLagX  # 説明変数行列
)
print(oModel)

## Series: dfDirectLag$gY 
## Regression with ARIMA(1,0,1) errors 
## 
## Coefficients:
##          ar1     ma1  intercept      gX  gXLag1
##       0.1185  0.2372    -0.0144  0.2971  0.2139
## s.e.  0.0285  0.0278     0.0141  0.0100  0.0100
## 
## sigma^2 estimated as 1.004:  log likelihood=-14203.07
## AIC=28418.15   AICc=28418.16   BIC=28461.41

## forecast::auto.arima()
oModel <- auto.arima(
  dfDirectLag$gY,       # 目的変数
  d = 0,                # 差分階数
  xreg = mgDirectLagX,  # 撹乱項のARMA次数行列
  trace = TRUE, approximation = FALSE
)

## 
##  ARIMA(2,0,2) with non-zero mean : 28421.94
##  ARIMA(0,0,0) with non-zero mean : 29620.56
##  ARIMA(1,0,0) with non-zero mean : 28481.44
##  ARIMA(0,0,1) with non-zero mean : 28433
##  ARIMA(0,0,0) with zero mean     : 29620.36
##  ARIMA(1,0,2) with non-zero mean : 28420
##  ARIMA(0,0,2) with non-zero mean : 28418.01
##  ARIMA(0,0,3) with non-zero mean : 28419.99
##  ARIMA(1,0,1) with non-zero mean : 28418.16
##  ARIMA(1,0,3) with non-zero mean : 28422.01
##  ARIMA(0,0,2) with zero mean     : 28417.06
##  ARIMA(0,0,1) with zero mean     : 28432.13
##  ARIMA(1,0,2) with zero mean     : 28419.04
##  ARIMA(0,0,3) with zero mean     : 28419.04
##  ARIMA(1,0,1) with zero mean     : 28417.19
##  ARIMA(1,0,3) with zero mean     : 28421.06
## 
##  Best model: Regression with ARIMA(0,0,2) errors

print(oModel)

## Series: dfDirectLag$gY 
## Regression with ARIMA(0,0,2) errors 
## 
## Coefficients:
##          ma1     ma2      gX  gXLag1
##       0.3558  0.0414  0.2972   0.214
## s.e.  0.0100  0.0100  0.0100   0.010
## 
## sigma^2 estimated as 1.004:  log likelihood=-14203.53
## AIC=28417.05   AICc=28417.06   BIC=28453.11

nlmeパッケージ

library(nlme)

## nlme::gls() 
oModel <- gls(
  gY ~ gX + gXLag1, 
  corr = corARMA(p = 1, q = 1),     # ARMA撹乱項の次数
  data = dfDirectLag %>% head(1000) # 計算時間節約のため時点数を減らした
)
print(summary(oModel))

## Generalized least squares fit by REML
##   Model: gY ~ gX + gXLag1 
##   Data: dfDirectLag %>% head(1000) 
##        AIC      BIC    logLik
##   2863.836 2893.264 -1425.918
## 
## Correlation Structure: ARMA(1,1)
##  Formula: ~1 
##  Parameter estimate(s):
##      Phi1    Theta1 
## 0.1353544 0.1875080 
## 
## Coefficients:
##                  Value  Std.Error  t-value p-value
## (Intercept) 0.02957195 0.04347242 0.680246  0.4965
## gX          0.30609021 0.03186142 9.606922  0.0000
## gXLag1      0.25455108 0.03187183 7.986711  0.0000
## 
##  Correlation: 
##        (Intr) gX    
## gX     -0.015       
## gXLag1 -0.015  0.326
## 
## Standardized residuals:
##          Min           Q1          Med           Q3          Max 
## -3.258600581 -0.630099992  0.005858342  0.661148481  3.347699148 
## 
## Residual standard error: 1.052911 
## Degrees of freedom: 1000 total; 997 residual

dynlmパッケージ

library(dynlm)

# 使用するデータはtsオブジェクトにする
tsY <-  ts(dfADL$gY)
tsX  <- ts(dfADL$gX)

## dynlm::dynlm()
## L(変数名, 次数) でラグ変数を指定
oModel <- dynlm(tsY ~  L(tsX, 0:1) + L(tsY, 1))
print(summary(oModel))

## 
## Time series regression with "ts" data:
## Start = 2, End = 10000
## 
## Call:
## dynlm(formula = tsY ~ L(tsX, 0:1) + L(tsY, 1))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1152 -0.6740  0.0063  0.6740  3.9827 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.004895   0.009997   -0.49    0.624    
## L(tsX, 0:1)0  0.390970   0.009996   39.11   <2e-16 ***
## L(tsX, 0:1)1  0.301471   0.010623   28.38   <2e-16 ***
## L(tsY, 1)     0.197672   0.009168   21.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9996 on 9995 degrees of freedom
## Multiple R-squared:  0.2561, Adjusted R-squared:  0.2559 
## F-statistic:  1147 on 3 and 9995 DF,  p-value: < 2.2e-16

TSAパッケージ

library(TSA)

## TSA::arimax()
oModel <- arimax(
  dfTransfer$gY[1:1000],
  order    = c(1, 0, 1),  # 撹乱項のARMA次数
  xtransf  = as.matrix(dfTransfer$gX[1:1000]), # 説明変数行列
  transfer = list(c(1,1)),  # 説明変数のARMA次数。
                            # 説明変数が複数あるときはその順に並べる
  method   = "ML"
)
print(oModel)

## 
## Call:
## arimax(x = dfTransfer$gY[1:1000], order = c(1, 0, 1), method = "ML", xtransf = as.matrix(dfTransfer$gX[1:1000]), 
##     transfer = list(c(1, 1)))
## 
## Coefficients:
##          ar1     ma1  intercept  T1-AR1  T1-MA0  T1-MA1
##       0.1264  0.2384     0.0465  0.2029  0.2997  0.2357
## s.e.  0.0846  0.0823     0.0454  0.1084  0.0324  0.0424
## 
## sigma^2 estimated as 1.024:  log likelihood = -1429.4,  aic = 2870.8

以上

MMM研修 VI. 動学的市場反応モデル (1)

Shigeru ONO

2020年4月9日

準備

code 1. negative binomial lag

code 2. 「売上と広告」データのチャート

code 3. 「売上と広告」データのACF

code 4. 「売上と広告」データの単位根検定

code 5. 「売上と広告」データの散布図

code 6. 「売上と広告」データのccf

code 7. CCFとprewhitened CCF

code 8. 「売上と広告」データのprewhitened ccf

code 9. 「売上と広告」OLS回帰

code 10. 「売上と広告」ラグ選択

code 11. 「売上と広告」モデル構築

code 12. 「売上と広告」自己回帰

code 13. 「売上と広告」アドストック

code 14. IBM-SPの図示

code 15. IBM-SPの単位根検定

code 16. IBM-SPのVARモデル

code 17. 広告と売上のVARモデル

Appendix. 動的回帰分析のためのRパッケージ

forecast パッケージ

nlmeパッケージ

dynlmパッケージ

TSAパッケージ