このページでは、株式会社インサイト・ファクトリー 2020年度社内研修「マーケティング・ミックス・モデリング」VI章「動学的市場反応モデル (1)」のRコードを公開しています。
library(tidyverse)
## -- Attaching packages --------------------------------------------------------- tidyverse 1.3.0 --
## √ ggplot2 3.3.0 √ purrr 0.3.3
## √ tibble 3.0.0 √ dplyr 0.8.5
## √ tidyr 1.0.2 √ stringr 1.4.0
## √ readr 1.3.1 √ forcats 0.5.0
## -- Conflicts ------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library(patchwork)
library(xts)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
##
## first, last
library(TSA)
##
## Attaching package: 'TSA'
## The following object is masked from 'package:readr':
##
## spec
## The following objects are masked from 'package:stats':
##
## acf, arima
## The following object is masked from 'package:utils':
##
## tar
library(nlme)
##
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
##
## collapse
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Registered S3 methods overwritten by 'forecast':
## method from
## fitted.Arima TSA
## plot.Arima TSA
##
## Attaching package: 'forecast'
## The following object is masked from 'package:nlme':
##
## getResponse
library(dynlm)
library(FinTS)
##
## Attaching package: 'FinTS'
## The following object is masked from 'package:forecast':
##
## Acf
library(vars)
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:patchwork':
##
## area
## The following object is masked from 'package:dplyr':
##
## select
## Loading required package: strucchange
## Loading required package: sandwich
##
## Attaching package: 'strucchange'
## The following object is masked from 'package:stringr':
##
## boundary
## Loading required package: urca
## Loading required package: lmtest
WORKDIR <- "."
# 「売上と広告」データの作成
set.seed(123)
dfSalesAds <- tibble(
# 年月
dMonth = seq(as.Date("2013-09-01"), as.Date("2018-12-01"), by = "month"),
# TV広告支出
gTV = as.vector(
stats::filter(
rnorm(length(dMonth), mean = 5, sd = 2),
filter = 0.4,
method = "recursive" # AR
)
),
# Web広告支出
gWeb = as.vector(
stats::filter(
rnorm(length(dMonth), mean = 2, sd = 1.5),
filter = 0.3,
method = "recursive" # AR
)
),
# 売上の撹乱項
gYErr = as.vector(
stats::filter(
rnorm(length(dMonth), mean = 0, sd = 1),
filter = 0.45,
method = "recursive"
)
)
) %>%
mutate(
# 広告支出の負値を0に置き換え、丸める
gTV = round(if_else(gTV < 0, 0, gTV), digits = 1),
gWeb = round(if_else(gWeb < 0, 0, gWeb), digits = 1),
# 売上の生成
gSales = 6 + 0.05 * seq_along(dMonth),
gSales = gSales + 0.10 * gTV ,
gSales = gSales + 0.30 * lag(gTV, n = 1) ,
gSales = gSales + 0.20 * lag(gTV, n = 2),
gSales = gSales + 0.10 * lag(gTV, n = 3),
gSales = gSales + 0.05 * lag(gTV, n = 4),
gSales = gSales + 0.50 * gWeb,
gSales = gSales + 0.30 * lag(gWeb, n = 1),
gSales = gSales + gYErr
) %>%
# 変数選択
dplyr::select(dMonth, gSales, gTV, gWeb) %>%
# ここで欠損行を削除
drop_na()
## print(summary(dfSalesAds))
# IBM-SPデータの作成
data(m.ibmspln)
dfIBMSP <- data.frame(m.ibmspln) %>%
mutate(
nYear = as.integer(time(m.ibmspln)),
nMonth = as.integer(cycle(m.ibmspln)),
dMonth = as.Date(paste0(nYear, "-", nMonth, "-01"))
) %>%
dplyr::select(dMonth, IBM, SP)
dfOmega <- crossing(
nR = 1:4,
nK = 0:10,
gLambda = seq(0.2, 0.8, 0.2)
) %>%
mutate(
gTemp1 = gamma(nR + nK), # (r+k-1)!
gTemp2 = gamma(nR), # (r-1)!
gTemp3 = gamma(nK + 1), # k!
gOmega = gTemp1 * ((1-gLambda)^nR) * (gLambda^nK) / (gTemp2 * gTemp3)
)
## print(dfOmega)
# 描画
dfPlot <- dfOmega %>%
mutate( sR = paste0("r=", nR) )
g <- ggplot(
data=dfPlot,
aes(x = nK, y = gOmega, group = gLambda, color = factor(gLambda))
)
g <- g + geom_line()
g <- g + geom_point()
g <- g + scale_x_continuous(breaks=0:10)
g <- g + facet_grid(~ sR)
g <- g + labs(color = "lambda", x = "k", y = "omega")
g <- g + theme_bw()
print(g)
ggsave(paste0(WORKDIR, "/chap6_code1.png"), width = 21, height = 12, units = "cm")
dfPlot <- dfSalesAds %>%
gather(sVar, gValue, c(gSales, gTV, gWeb)) %>%
mutate(
fVar = factor(
sVar,
levels = c("gSales", "gTV", "gWeb"),
labels = c("売上数量", "TV支出", "Web支出")
)
)
g <- ggplot(data = dfPlot, aes(x = dMonth, y = gValue))
g <- g + geom_line()
g <- g + facet_grid(fVar ~ ., scales = "free")
g <- g + labs(x = "年月", y = "値")
g <- g + theme_bw()
print(g)
ggsave(paste0(WORKDIR, "/chap6_code2.png"), width = 21, height = 9, units = "cm")
png(paste0(WORKDIR, "/chap6_code3.png"), width = 15, height = 25, res = 72, units = "cm")
par(mfrow=c(3,1))
acf(dfSalesAds$gSales)
acf(dfSalesAds$gTV)
acf(dfSalesAds$gWeb)
dev.off()
## png
## 2
ndiffs(dfSalesAds$gSales, test = "adf", type = "trend")
## [1] 0
ndiffs(dfSalesAds$gWeb, test = "adf", type = "trend")
## [1] 0
ndiffs(dfSalesAds$gTV, test = "adf", type = "trend")
## [1] 0
dfPlot <- dfSalesAds %>%
mutate(
fYear = factor(year(dMonth)),
sMonth = as.character(month(dMonth))
)
# TV
g <- ggplot(data = dfPlot, aes(x = gTV, y = gSales, color = fYear, label = sMonth))
g <- g + geom_text()
g <- g + geom_path(alpha=0.5)
g <- g + guides(colour=FALSE)
g <- g + labs(title = "TV", x = "TV支出", y = "売上")
g <- g + facet_wrap(~ fYear)
g1 <- g + theme_bw()
# Web
g <- ggplot(data = dfPlot, aes(x = gWeb, y = gSales, color = fYear, label = sMonth))
g <- g + geom_text()
g <- g + geom_path(alpha=0.5)
g <- g + guides(colour=FALSE)
g <- g + labs(title = "Web", x = "Web支出", y = "売上")
g <- g + facet_wrap(~ fYear)
g2 <- g + theme_bw()
# 合成
g <- g1 + g2
print(g)
ggsave(paste0(WORKDIR, "/chap6_code5.png"), width = 24, height = 10, units = "cm")
png(paste0(WORKDIR, "/chap6_code6.png"), width = 24, height = 9, res = 72, units = "cm")
par(mfrow=c(1,2))
ccf(dfSalesAds$gTV, dfSalesAds$gSales, main = "TV vs. Sales")
ccf(dfSalesAds$gWeb, dfSalesAds$gSales, main = "Web vs. Sales")
dev.off()
## png
## 2
# データ生成
set.seed(123)
sLength <- 10000
dfTemp <- tibble(
gX = as.vector(stats::filter(rnorm(sLength), filter = 0.5, method = "recursive")),
gE = stats::filter(rnorm(sLength), filter = 0.1, method = "recursive"),
gY = gX + gE
) %>%
drop_na()
# CCF
png(paste0(WORKDIR, "/chap6_code7.1.png"), width = 15, height = 10, res = 72, units = "cm")
ccf(dfTemp$gX, dfTemp$gY, ylab = "CCF", lag.max = 10, main = "X vs. Y")
dev.off()
## png
## 2
# CCFとprewhitened CCF
png(paste0(WORKDIR, "/chap6_code7.2.png"), width = 20, height = 8, res = 72, units = "cm")
par(mfrow=c(1,2))
ccf(dfTemp$gX, dfTemp$gY, ylab = "CCF", lag.max = 10, main = "標本CCF")
prewhiten(dfTemp$gX, dfTemp$gY, ylab = "prewhitened CCF", lag.max = 10, main = "プレホワイト化標本CCF")
dev.off()
## png
## 2
png(paste0(WORKDIR, "/chap6_code8.png"), width = 24, height = 9, res = 72, units = "cm")
par(mfrow=c(1,2))
prewhiten(dfSalesAds$gTV, dfSalesAds$gSales, ylab = "prewhitend CCF", main = "TV vs. Sales")
prewhiten(dfSalesAds$gWeb, dfSalesAds$gSales,ylab = "prewhitend CCF", main = "Web vs. Sales")
dev.off()
## png
## 2
dfTemp <- dfSalesAds %>%
mutate(
nMonth = seq_along(dMonth),
gTV_Lag1 = dplyr::lag(gTV, 1),
gTV_Lag2 = dplyr::lag(gTV, 2),
gTV_Lag3 = dplyr::lag(gTV, 3),
gTV_Lag4 = dplyr::lag(gTV, 4),
gWeb_Lag1 = dplyr::lag(gWeb, 1),
gWeb_Lag2 = dplyr::lag(gWeb, 2),
gWeb_Lag3 = dplyr::lag(gWeb, 3),
gWeb_Lag4 = dplyr::lag(gWeb, 4)
)
oModel1 <- lm(
gSales ~ nMonth + gTV + gTV_Lag1 + gTV_Lag2 + gTV_Lag3 + gTV_Lag4
+ gWeb + gWeb_Lag1 + gWeb_Lag2 + gWeb_Lag3 + gWeb_Lag4,
data = dfTemp
)
print(summary(oModel1))
##
## Call:
## lm(formula = gSales ~ nMonth + gTV + gTV_Lag1 + gTV_Lag2 + gTV_Lag3 +
## gTV_Lag4 + gWeb + gWeb_Lag1 + gWeb_Lag2 + gWeb_Lag3 + gWeb_Lag4,
## data = dfTemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.24975 -0.57631 -0.09088 0.64647 2.83571
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.90296 1.42851 4.832 1.67e-05 ***
## nMonth 0.06689 0.00978 6.840 1.96e-08 ***
## gTV 0.04807 0.08728 0.551 0.584581
## gTV_Lag1 0.33001 0.09175 3.597 0.000811 ***
## gTV_Lag2 0.20652 0.09026 2.288 0.026999 *
## gTV_Lag3 -0.07086 0.08925 -0.794 0.431473
## gTV_Lag4 0.05953 0.08633 0.690 0.494112
## gWeb 0.47568 0.13080 3.637 0.000721 ***
## gWeb_Lag1 0.21089 0.12524 1.684 0.099304 .
## gWeb_Lag2 0.08791 0.12480 0.704 0.484910
## gWeb_Lag3 -0.05198 0.11593 -0.448 0.656088
## gWeb_Lag4 0.16409 0.11695 1.403 0.167634
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.077 on 44 degrees of freedom
## (4 observations deleted due to missingness)
## Multiple R-squared: 0.7101, Adjusted R-squared: 0.6376
## F-statistic: 9.797 on 11 and 44 DF, p-value: 1.219e-08
# データ作成
dfTemp <- dfSalesAds %>%
mutate(
nMonth = seq_along(dMonth),
gTV_Lag1 = dplyr::lag(gTV, 1),
gTV_Lag2 = dplyr::lag(gTV, 2),
gWeb_Lag1 = dplyr::lag(gWeb, 1),
gWeb_Lag2 = dplyr::lag(gWeb, 2),
) %>%
dplyr::select(
gSales, nMonth, gTV, gTV_Lag1, gTV_Lag2, gWeb, gWeb_Lag1, gWeb_Lag2
) %>%
# ここでNAを持つ行を削る。
# 仮に削らないと、モデルによって使う行が変わってしまう
drop_na() %>%
as.data.frame()
# ラグを変えながらOLS推定
lModel2 <- list(
lm(gSales ~ ., data = dfTemp[c(T,T,T, F,F, T, F,F)]),
lm(gSales ~ ., data = dfTemp[c(T,T,T, F,F, T, T,F)]),
lm(gSales ~ ., data = dfTemp[c(T,T,T, F,F, T, T,T)]),
lm(gSales ~ ., data = dfTemp[c(T,T,T, T,F, T, F,F)]),
lm(gSales ~ ., data = dfTemp[c(T,T,T, T,F, T, T,F)]),
lm(gSales ~ ., data = dfTemp[c(T,T,T, T,F, T, T,T)]),
lm(gSales ~ ., data = dfTemp[c(T,T,T, T,T, T, F,F)]),
lm(gSales ~ ., data = dfTemp[c(T,T,T, T,T, T, T,F)]),
lm(gSales ~ ., data = dfTemp[c(T,T,T, T,T, T, T,T)])
)
# 結果のまとめ
lOut <- lapply(
seq_along(lModel2),
function(nModelID){
oModel <- lModel2[[nModelID]]
agCoef <- coef(oModel)
out <- tibble(
nModelID = nModelID,
sCoef = c(names(agCoef), "AIC"),
gValue = c(as.vector(agCoef), AIC(oModel))
)
return(out)
}
)
out <- bind_rows(lOut) %>%
spread(sCoef, gValue)
print(out)
## # A tibble: 9 x 10
## nModelID `(Intercept)` AIC gTV gTV_Lag1 gTV_Lag2 gWeb gWeb_Lag1
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 11.5 202. 0.182 NA NA 0.478 NA
## 2 2 11.1 202. 0.174 NA NA 0.415 0.209
## 3 3 10.9 203. 0.173 NA NA 0.418 0.188
## 4 4 9.55 186. 0.0479 0.373 NA 0.465 NA
## 5 5 8.98 183. 0.0352 0.384 NA 0.391 0.241
## 6 6 8.89 185. 0.0351 0.383 NA 0.393 0.229
## 7 7 8.15 181. 0.0589 0.300 0.208 0.527 NA
## 8 8 7.72 179. 0.0466 0.315 0.194 0.456 0.220
## 9 9 7.57 180. 0.0467 0.312 0.197 0.460 0.202
## # ... with 2 more variables: gWeb_Lag2 <dbl>, nMonth <dbl>
write.csv(out, paste0(WORKDIR, "/chap6_code10.csv"), row.names = F)
# VIF
car::vif(lModel2[[8]])
## nMonth gTV gTV_Lag1 gTV_Lag2 gWeb gWeb_Lag1
## 1.055391 1.181920 1.302087 1.202703 1.199167 1.147182
# BG test
lmtest::bgtest(lModel2[[8]])
##
## Breusch-Godfrey test for serial correlation of order up to 1
##
## data: lModel2[[8]]
## LM test = 3.483, df = 1, p-value = 0.062
# 残差プロットとACF
png(paste0(WORKDIR, "/chap6_code10.1.png"), width = 12, height = 12, res = 72, units = "cm")
plot(residuals(lModel2[[8]]), type = "l")
dev.off()
## png
## 2
png(paste0(WORKDIR, "/chap6_code10.2.png"), width = 12, height = 12, res = 72, units = "cm")
plot(stats::acf(residuals(lModel2[[8]])))
dev.off()
## png
## 2
dfTemp <- dfSalesAds %>%
mutate(
nMonth = seq_along(dMonth),
gTV_Lag1 = dplyr::lag(gTV, 1),
gTV_Lag2 = dplyr::lag(gTV, 2),
gWeb_Lag1 = dplyr::lag(gWeb, 1),
) %>%
dplyr::select(
gSales, dMonth, nMonth, gTV, gTV_Lag1, gTV_Lag2, gWeb, gWeb_Lag1
) %>%
drop_na() %>%
as.data.frame()
# ML推定
oModel3 <- auto.arima(
dfTemp$gSales,
d = 0,
xreg = dfTemp %>%
dplyr::select(nMonth, gTV, gTV_Lag1, gTV_Lag2, gWeb, gWeb_Lag1) %>%
as.matrix(.),
trace = TRUE, approximation = FALSE, stepwise = FALSE
)
##
## ARIMA(0,0,0) with zero mean : 220.5417
## ARIMA(0,0,0) with non-zero mean : 181.525
## ARIMA(0,0,1) with zero mean : 207.8165
## ARIMA(0,0,1) with non-zero mean : 181.1315
## ARIMA(0,0,2) with zero mean : 210.1012
## ARIMA(0,0,2) with non-zero mean : 183.7719
## ARIMA(0,0,3) with zero mean : 206.0413
## ARIMA(0,0,3) with non-zero mean : 180.5368
## ARIMA(0,0,4) with zero mean : 208.9584
## ARIMA(0,0,4) with non-zero mean : 183.444
## ARIMA(0,0,5) with zero mean : Inf
## ARIMA(0,0,5) with non-zero mean : 186.7485
## ARIMA(1,0,0) with zero mean : 203.518
## ARIMA(1,0,0) with non-zero mean : 180.6375
## ARIMA(1,0,1) with zero mean : Inf
## ARIMA(1,0,1) with non-zero mean : 183.0648
## ARIMA(1,0,2) with zero mean : Inf
## ARIMA(1,0,2) with non-zero mean : 185.7695
## ARIMA(1,0,3) with zero mean : 208.6526
## ARIMA(1,0,3) with non-zero mean : 183.5167
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(1,0,4) with non-zero mean : 186.779
## ARIMA(2,0,0) with zero mean : 198.8459
## ARIMA(2,0,0) with non-zero mean : 183.0783
## ARIMA(2,0,1) with zero mean : Inf
## ARIMA(2,0,1) with non-zero mean : Inf
## ARIMA(2,0,2) with zero mean : Inf
## ARIMA(2,0,2) with non-zero mean : Inf
## ARIMA(2,0,3) with zero mean : Inf
## ARIMA(2,0,3) with non-zero mean : Inf
## ARIMA(3,0,0) with zero mean : Inf
## ARIMA(3,0,0) with non-zero mean : 185.3596
## ARIMA(3,0,1) with zero mean : Inf
## ARIMA(3,0,1) with non-zero mean : 185.6658
## ARIMA(3,0,2) with zero mean : Inf
## ARIMA(3,0,2) with non-zero mean : Inf
## ARIMA(4,0,0) with zero mean : Inf
## ARIMA(4,0,0) with non-zero mean : 184.0288
## ARIMA(4,0,1) with zero mean : Inf
## ARIMA(4,0,1) with non-zero mean : 186.8062
## ARIMA(5,0,0) with zero mean : Inf
## ARIMA(5,0,0) with non-zero mean : 187.1538
##
##
##
## Best model: Regression with ARIMA(0,0,3) errors
print(summary(oModel3))
## Series: dfTemp$gSales
## Regression with ARIMA(0,0,3) errors
##
## Coefficients:
## ma1 ma2 ma3 intercept nMonth gTV gTV_Lag1 gTV_Lag2
## 0.2529 0.1344 0.4397 7.9935 0.0620 0.0500 0.3119 0.2099
## s.e. 0.1258 0.1245 0.1680 1.1199 0.0124 0.0702 0.0702 0.0682
## gWeb gWeb_Lag1
## 0.4481 0.0919
## s.e. 0.1005 0.0983
##
## sigma^2 estimated as 0.974: log likelihood=-76.4
## AIC=174.8 AICc=180.54 BIC=197.46
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.01079291 0.8978319 0.7190424 -0.234242 4.471458 0.5692091
## ACF1
## Training set 0.01598773
## ME RMSE MAE MPE MAPE MASE
## Training set 0.01079291 0.8978319 0.7190424 -0.234242 4.471458 0.5692091
## ACF1
## Training set 0.01598773
# モデル診断
png(paste0(WORKDIR, "/chap6_code11.1.png"), width = 14, height = 16, res = 72, units = "cm")
tsdiag(oModel3)
dev.off()
## png
## 2
# FGLS推定
oModel4 <- gls(
gSales ~ .,
data = dfTemp %>% dplyr::select(-dMonth),
corr = corARMA(q = 3)
)
print(summary(oModel4))
## Generalized least squares fit by REML
## Model: gSales ~ .
## Data: dfTemp %>% dplyr::select(-dMonth)
## AIC BIC logLik
## 199.5963 220.8463 -88.79813
##
## Correlation Structure: ARMA(0,3)
## Formula: ~1
## Parameter estimate(s):
## Theta1 Theta2 Theta3
## 0.2852240 0.1436000 0.4123918
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 7.946032 1.1809692 6.728399 0.0000
## nMonth 0.062135 0.0133116 4.667781 0.0000
## gTV 0.050873 0.0720590 0.705984 0.4834
## gTV_Lag1 0.315700 0.0700079 4.509495 0.0000
## gTV_Lag2 0.209146 0.0703625 2.972411 0.0045
## gWeb 0.445837 0.0973315 4.580608 0.0000
## gWeb_Lag1 0.098835 0.0917617 1.077078 0.2865
##
## Correlation:
## (Intr) nMonth gTV gTV_L1 gTV_L2 gWeb
## nMonth -0.399
## gTV -0.549 0.000
## gTV_Lag1 -0.310 0.026 -0.199
## gTV_Lag2 -0.523 -0.023 0.213 -0.171
## gWeb -0.298 0.065 0.178 -0.163 0.106
## gWeb_Lag1 -0.218 0.122 -0.046 0.094 -0.115 -0.065
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.00779092 -0.58358882 0.01660233 0.66292952 2.67581835
##
## Residual standard error: 1.08131
## Degrees of freedom: 58 total; 51 residual
# 描画準備
dfPlot <- dfTemp %>%
mutate(
gFitted = as.vector(fitted(oModel4))
) %>%
dplyr::select(dMonth, gSales, gFitted) %>%
gather(sVar, gValue, c(gSales, gFitted)) %>%
mutate(
fVar = factor(
sVar,
levels = c("gSales", "gFitted"),
labels = c("Actual", "Fitted")
)
)
# 描画
g <- ggplot(data = dfPlot, aes(x = dMonth, y = gValue, color = fVar))
g <- g + geom_line()
g <- g + labs(y = "値", x = "年月", color=NULL)
g <- g + theme_bw()
g
ggsave(paste0(WORKDIR, "/chap6_code11.1.png"), width = 24, height = 12, units = "cm")
# 係数表示
mgConfint <- confint(oModel4)
colnames(mgConfint) <- c("Lower", "Upper")
dfConfint <- data.frame(
sVar = rownames(mgConfint),
mgConfint,
stringsAsFactors = FALSE
)
agCoef <- coef(oModel4)
dfCoef <- tibble(
sVar = names(agCoef),
gCoef = as.vector(agCoef)
)
dfAdd <- tibble(sVar = "gWeb_Lag2", Lower = 0, Upper = 0, gCoef = 0)
dfPlot <- full_join(dfCoef, dfConfint, by = "sVar") %>%
bind_rows(dfAdd) %>%
filter(grepl("g(TV|Web)", sVar)) %>%
mutate( sVar = sub("^g(TV|Web)$", "g\\1_Lag0", sVar)) %>%
separate(sVar, c("sVar1", "sVar2")) %>%
mutate(
nLag = as.integer(sub("Lag", "", sVar2)),
fLag = factor(nLag, levels = 0:2, labels = c("当月", "翌月", "翌々月")),
sType = sub("^g", "", sVar1)
)
g <- ggplot(
data = dfPlot,
aes(x = fLag, y = gCoef, ymin = Lower, ymax = Upper, color = sType)
)
g <- g + geom_hline(yintercept = 0)
g <- g + geom_pointrange(size = 1)
g <- g + facet_grid(sType ~ .)
g <- g + guides(color = FALSE)
g <- g + labs(
title = "ある月の広告支出を1増やしたときの売上数量の増大",
x = NULL,
y = "売上数量の増大(95%信頼区間)"
)
g <- g + theme_bw()
print(g)
ggsave(paste0(WORKDIR, "/chap6_code11.2.png"), width = 15, height = 12, units = "cm")
# 合計の表示
out <- dfPlot %>%
group_by(sType) %>%
summarize(gCoef = sum(gCoef)) %>%
ungroup()
print(out)
## # A tibble: 2 x 2
## sType gCoef
## <chr> <dbl>
## 1 TV 0.576
## 2 Web 0.545
# データ作成
tsSales <- ts(dfSalesAds$gSales)
tsTV <- ts(dfSalesAds$gTV)
tsWeb <- ts(dfSalesAds$gWeb)
# 推定
oModel5 <- dynlm(tsSales ~ L(tsTV, 0:1) + tsWeb + L(tsSales))
print(summary(oModel5))
##
## Time series regression with "ts" data:
## Start = 2, End = 60
##
## Call:
## dynlm(formula = tsSales ~ L(tsTV, 0:1) + tsWeb + L(tsSales))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7147 -0.5968 0.2122 0.6162 2.9425
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.60775 1.63575 2.206 0.03169 *
## L(tsTV, 0:1)0 0.04824 0.08799 0.548 0.58580
## L(tsTV, 0:1)1 0.29478 0.09085 3.245 0.00202 **
## tsWeb 0.35911 0.11328 3.170 0.00251 **
## L(tsSales) 0.53968 0.09255 5.831 3.18e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.208 on 54 degrees of freedom
## Multiple R-squared: 0.5617, Adjusted R-squared: 0.5292
## F-statistic: 17.3 on 4 and 54 DF, p-value: 3.442e-09
# 変換
out <- tibble(
nLag = 0:1000, # 十分に大きな数
gTV = 0.048 * c(1, ARMAtoMA(ar = 0.540, ma = c(0.295/0.048), lag.max = 1000)),
gWeb = 0.359 * c(1, ARMAtoMA(ar = 0.540, lag.max = 1000))
)
sum(out$gTV)
## [1] 0.7456522
sum(out$gWeb)
## [1] 0.7804348
# 図示
dfPlot <- out %>%
filter(nLag <= 10) %>%
gather(sVar, gValue, c(gTV, gWeb)) %>%
mutate(
sLag = factor(nLag, levels = 0:10, labels = paste0(0:10, "ヶ月後")),
sType = sub("^g", "", sVar)
)
g <- ggplot(data = dfPlot, aes(x = sLag, y = gValue, fill = sType))
g <- g + geom_hline(yintercept = 0)
g <- g + geom_col()
g <- g + facet_grid(sType ~ .)
g <- g + guides(fill = FALSE)
g <- g + labs(
title = "ある月の広告支出を1増やしたときの売上数量の増大",
x = NULL,
y = "売上数量の増大(95%信頼区間)"
)
g <- g + theme_bw()
print(g)
ggsave(paste0(WORKDIR, "/chap6_code12.png"), width = 15, height = 12, units = "cm")
dfTemp <- dfSalesAds %>%
mutate(
gTVStock = stats::filter(gTV, filter = 0.5, method = "recursive"),
gWebStock = stats::filter(gWeb, filter = 0.5, method = "recursive")
)
summary(lm(gSales ~ gTVStock + gWebStock, data = dfTemp))
##
## Call:
## lm(formula = gSales ~ gTVStock + gWebStock, data = dfTemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6398 -1.1932 0.1214 1.2154 3.7934
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.62407 1.50452 7.061 2.51e-09 ***
## gTVStock 0.24902 0.07604 3.275 0.0018 **
## gWebStock 0.28244 0.11132 2.537 0.0139 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.579 on 57 degrees of freedom
## Multiple R-squared: 0.2097, Adjusted R-squared: 0.1819
## F-statistic: 7.56 on 2 and 57 DF, p-value: 0.001224
dfPlot <- dfIBMSP %>%
gather(sVar, gValue, c(IBM, SP))
g <- ggplot(data=dfPlot, aes(x = dMonth, y = gValue, color = sVar))
g <- g + geom_line()
g <- g + guides(color = FALSE)
g <- g + labs(x = NULL, y = "収益率(対数)")
g <- g + facet_grid(sVar ~ .)
g <- g + theme_bw()
print(g)
ggsave(paste0(WORKDIR, "/chap6_code14.png"), width = 24, height = 10, units = "cm")
ndiffs(dfIBMSP$IBM, test = "adf", type = "trend")
## [1] 0
ndiffs(dfIBMSP$SP, test = "adf", type = "trend")
## [1] 0
# VAR次数選択
VARselect(dfIBMSP %>% dplyr::select(IBM, SP), lag.max = 5, type = "const")
## $selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 3 1 1 3
##
## $criteria
## 1 2 3 4 5
## AIC(n) 6.760111 6.758874 6.753468 6.755548 6.753548
## HQ(n) 6.772538 6.779587 6.782466 6.792831 6.799116
## SC(n) 6.792614 6.813046 6.829308 6.853057 6.872725
## FPE(n) 862.737822 861.671992 857.026761 858.811796 857.096638
# VARモデル推定
oVARModel <- VAR(
dfIBMSP %>% dplyr::select(IBM, SP),
p = 1,
type = "const"
)
summary(oVARModel)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: IBM, SP
## Deterministic variables: const
## Sample size: 887
## Log Likelihood: -5508.559
## Roots of the characteristic polynomial:
## 0.06853 0.03086
## Call:
## VAR(y = dfIBMSP %>% dplyr::select(IBM, SP), p = 1, type = "const")
##
##
## Estimation results for equation IBM:
## ====================================
## IBM = IBM.l1 + SP.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IBM.l1 0.01919 0.04334 0.443 0.6579
## SP.l1 0.10616 0.05167 2.054 0.0402 *
## const 1.16265 0.22897 5.078 4.66e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 6.704 on 884 degrees of freedom
## Multiple R-Squared: 0.01047, Adjusted R-squared: 0.008226
## F-statistic: 4.675 on 2 and 884 DF, p-value: 0.009562
##
##
## Estimation results for equation SP:
## ===================================
## SP = IBM.l1 + SP.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IBM.l1 -0.005419 0.036441 -0.149 0.88183
## SP.l1 0.080189 0.043453 1.845 0.06531 .
## const 0.499350 0.192541 2.593 0.00966 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 5.638 on 884 degrees of freedom
## Multiple R-Squared: 0.005809, Adjusted R-squared: 0.003559
## F-statistic: 2.582 on 2 and 884 DF, p-value: 0.07616
##
##
##
## Covariance matrix of residuals:
## IBM SP
## IBM 44.95 23.94
## SP 23.94 31.78
##
## Correlation matrix of residuals:
## IBM SP
## IBM 1.0000 0.6334
## SP 0.6334 1.0000
# SPはIBMのグレンジャー原因か?
causality(oVARModel, cause = "SP")
## $Granger
##
## Granger causality H0: SP do not Granger-cause IBM
##
## data: VAR object oVARModel
## F-Test = 4.2205, df1 = 1, df2 = 1768, p-value = 0.04009
##
##
## $Instant
##
## H0: No instantaneous causality between: SP and IBM
##
## data: VAR object oVARModel
## Chi-squared = 253.94, df = 1, p-value < 2.2e-16
# IBMはSPのグレンジャー原因か?
causality(oVARModel, cause = "IBM")
## $Granger
##
## Granger causality H0: IBM do not Granger-cause SP
##
## data: VAR object oVARModel
## F-Test = 0.022111, df1 = 1, df2 = 1768, p-value = 0.8818
##
##
## $Instant
##
## H0: No instantaneous causality between: IBM and SP
##
## data: VAR object oVARModel
## Chi-squared = 253.94, df = 1, p-value < 2.2e-16
# IRF
# チャートの出力をうまく合成できない...
png(paste0(WORKDIR, "/chap6_code16.1.png"), width = 20, height = 20, res = 72, units = "cm")
plot(irf(oVARModel, impulse = "SP"))
dev.off()
## png
## 2
png(paste0(WORKDIR, "/chap6_code16.2.png"), width = 20, height = 20, res = 72, units = "cm")
plot(irf(oVARModel, impulse = "IBM"))
dev.off()
## png
## 2
VARselect(
dfSalesAds %>% dplyr::select(gSales, gTV, gWeb),
lag.max = 5,
type = "trend"
)
## $selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 1 1 1 1
##
## $criteria
## 1 2 3 4 5
## AIC(n) 2.154837 2.281246 2.265232 2.490680 2.691815
## HQ(n) 2.324201 2.577633 2.688642 3.041113 3.369271
## SC(n) 2.592801 3.047682 3.360141 3.914061 4.443669
## FPE(n) 8.633145 9.829711 9.752239 12.403913 15.544622
oVARModel1 <- VAR(
dfSalesAds %>% dplyr::select(gSales, gTV),
p = 1,
type = "trend"
)
oVARModel2 <- VAR(
dfSalesAds %>% dplyr::select(gSales, gWeb),
p = 1,
type = "trend"
)
png(paste0(WORKDIR, "/chap6_code17.1.png"), width = 20, height = 20, res = 72, units = "cm")
plot(irf(oVARModel1, impulse = "gTV", response = c("gSales")))
dev.off()
## png
## 2
png(paste0(WORKDIR, "/chap6_code17.2.png"), width = 20, height = 20, res = 72, units = "cm")
plot(irf(oVARModel2, impulse = "gWeb", response = c("gSales")))
dev.off()
## png
## 2
# 動作確認用データ
set.seed(123)
nLength <- 10000
# dfDirectLag:
# omega[0]=0.3, omega[1]=0.2, theta[1]=0.25, phi[1]=0.1
dfDirectLag <- tibble(gX = rnorm(nLength)) %>%
mutate(
gXLag1 = dplyr::lag(gX),
gState = as.vector(
0.3 * arima.sim(list(ma=0.2/0.3), n=nLength, innov=gX)
),
gErr = as.vector(
arima.sim(list(ma=0.25, ar=0.1), n=nLength)
),
gY = gState + gErr
) %>%
drop_na()
mgDirectLagX <- dfDirectLag %>%
dplyr::select(gX, gXLag1) %>%
as.matrix(.)
# dfADL:
# omega[0]=0.4, omega[1]=0.3, phi[1]=0.2
dfADL <- tibble(gX = rnorm(nLength)) %>%
mutate(
gState = as.vector(
0.4 * arima.sim(list(ma=0.3/0.4, ar=0.2), n=nLength, innov=gX)
),
gErr = as.vector(
arima.sim(list(ar=0.2), n=nLength)
),
gY = gState + gErr
) %>%
drop_na()
# dfTransfer:
# omega[0]=0.3, omega[1]=0.2, delta[1]=0.2, theta[1]=0.25, phi[1]=0.1
dfTransfer <- tibble(gX = rnorm(nLength)) %>%
mutate(
gState = as.vector(
0.3 * arima.sim(list(ma=0.2/0.3, ar=0.2), n=nLength, innov=gX)
),
gErr = as.vector(
arima.sim(list(ma=0.25, ar=0.1), n=nLength)
),
gY = gState + gErr
) %>%
drop_na()
library(forecast)
# forecast::Arima()
oModel <- Arima(
dfDirectLag$gY, # 目的変数
order = c(1, 0, 1), # 撹乱項のARMA次数
xreg = mgDirectLagX # 説明変数行列
)
print(oModel)
## Series: dfDirectLag$gY
## Regression with ARIMA(1,0,1) errors
##
## Coefficients:
## ar1 ma1 intercept gX gXLag1
## 0.1185 0.2372 -0.0144 0.2971 0.2139
## s.e. 0.0285 0.0278 0.0141 0.0100 0.0100
##
## sigma^2 estimated as 1.004: log likelihood=-14203.07
## AIC=28418.15 AICc=28418.16 BIC=28461.41
## forecast::auto.arima()
oModel <- auto.arima(
dfDirectLag$gY, # 目的変数
d = 0, # 差分階数
xreg = mgDirectLagX, # 撹乱項のARMA次数行列
trace = TRUE, approximation = FALSE
)
##
## ARIMA(2,0,2) with non-zero mean : 28421.94
## ARIMA(0,0,0) with non-zero mean : 29620.56
## ARIMA(1,0,0) with non-zero mean : 28481.44
## ARIMA(0,0,1) with non-zero mean : 28433
## ARIMA(0,0,0) with zero mean : 29620.36
## ARIMA(1,0,2) with non-zero mean : 28420
## ARIMA(0,0,2) with non-zero mean : 28418.01
## ARIMA(0,0,3) with non-zero mean : 28419.99
## ARIMA(1,0,1) with non-zero mean : 28418.16
## ARIMA(1,0,3) with non-zero mean : 28422.01
## ARIMA(0,0,2) with zero mean : 28417.06
## ARIMA(0,0,1) with zero mean : 28432.13
## ARIMA(1,0,2) with zero mean : 28419.04
## ARIMA(0,0,3) with zero mean : 28419.04
## ARIMA(1,0,1) with zero mean : 28417.19
## ARIMA(1,0,3) with zero mean : 28421.06
##
## Best model: Regression with ARIMA(0,0,2) errors
print(oModel)
## Series: dfDirectLag$gY
## Regression with ARIMA(0,0,2) errors
##
## Coefficients:
## ma1 ma2 gX gXLag1
## 0.3558 0.0414 0.2972 0.214
## s.e. 0.0100 0.0100 0.0100 0.010
##
## sigma^2 estimated as 1.004: log likelihood=-14203.53
## AIC=28417.05 AICc=28417.06 BIC=28453.11
library(nlme)
## nlme::gls()
oModel <- gls(
gY ~ gX + gXLag1,
corr = corARMA(p = 1, q = 1), # ARMA撹乱項の次数
data = dfDirectLag %>% head(1000) # 計算時間節約のため時点数を減らした
)
print(summary(oModel))
## Generalized least squares fit by REML
## Model: gY ~ gX + gXLag1
## Data: dfDirectLag %>% head(1000)
## AIC BIC logLik
## 2863.836 2893.264 -1425.918
##
## Correlation Structure: ARMA(1,1)
## Formula: ~1
## Parameter estimate(s):
## Phi1 Theta1
## 0.1353544 0.1875080
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 0.02957195 0.04347242 0.680246 0.4965
## gX 0.30609021 0.03186142 9.606922 0.0000
## gXLag1 0.25455108 0.03187183 7.986711 0.0000
##
## Correlation:
## (Intr) gX
## gX -0.015
## gXLag1 -0.015 0.326
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -3.258600581 -0.630099992 0.005858342 0.661148481 3.347699148
##
## Residual standard error: 1.052911
## Degrees of freedom: 1000 total; 997 residual
library(dynlm)
# 使用するデータはtsオブジェクトにする
tsY <- ts(dfADL$gY)
tsX <- ts(dfADL$gX)
## dynlm::dynlm()
## L(変数名, 次数) でラグ変数を指定
oModel <- dynlm(tsY ~ L(tsX, 0:1) + L(tsY, 1))
print(summary(oModel))
##
## Time series regression with "ts" data:
## Start = 2, End = 10000
##
## Call:
## dynlm(formula = tsY ~ L(tsX, 0:1) + L(tsY, 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1152 -0.6740 0.0063 0.6740 3.9827
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.004895 0.009997 -0.49 0.624
## L(tsX, 0:1)0 0.390970 0.009996 39.11 <2e-16 ***
## L(tsX, 0:1)1 0.301471 0.010623 28.38 <2e-16 ***
## L(tsY, 1) 0.197672 0.009168 21.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9996 on 9995 degrees of freedom
## Multiple R-squared: 0.2561, Adjusted R-squared: 0.2559
## F-statistic: 1147 on 3 and 9995 DF, p-value: < 2.2e-16
library(TSA)
## TSA::arimax()
oModel <- arimax(
dfTransfer$gY[1:1000],
order = c(1, 0, 1), # 撹乱項のARMA次数
xtransf = as.matrix(dfTransfer$gX[1:1000]), # 説明変数行列
transfer = list(c(1,1)), # 説明変数のARMA次数。
# 説明変数が複数あるときはその順に並べる
method = "ML"
)
print(oModel)
##
## Call:
## arimax(x = dfTransfer$gY[1:1000], order = c(1, 0, 1), method = "ML", xtransf = as.matrix(dfTransfer$gX[1:1000]),
## transfer = list(c(1, 1)))
##
## Coefficients:
## ar1 ma1 intercept T1-AR1 T1-MA0 T1-MA1
## 0.1264 0.2384 0.0465 0.2029 0.2997 0.2357
## s.e. 0.0846 0.0823 0.0454 0.1084 0.0324 0.0424
##
## sigma^2 estimated as 1.024: log likelihood = -1429.4, aic = 2870.8
以上