毎回やる儀式

インターネットに接続しておいてください（このサイトが見えていればOK）。
Rstudioを起動し，この授業のプロジェクトを開いてください。(File>Open Project)
File>New File>R scriptで新しいコードファイルを作ってください。

今回は新しいデータセットbowling.csvを用意しましたので，これを読み込んでおいてください。

bowling <- read.csv("bowling.csv")

今日使うパッケージ

rstan
shinystan

前回の復習

今回のデータは，先日のボーリング大会でのデータをそのまま起こしたものです。IDが個人(28人),teamがチーム(6つ)，pondは人によって違うボールの重さです。pinは何本倒れたかで，2ゲームやったので一人当たり20個のデータポイントがあります。

さて，ピンが倒れるかどうかはカウントデータですから，ポアソン分布に従うデータだと考えてみましょう。ポアソン分布はパラメータが一つ（全体の平均）だけの分布です。

今回はstanのコードファイルも別に用意しましたので，ファイルから読み込むようにしてもらってもいいかと思います。

stancode1 <- "
data{
  int<lower=0> N;
  int Pin[N];
}

parameters {
  real<lower=0> lambda;
}

model{
  Pin ~ poisson(lambda);
  lambda ~ normal(0,100);
}
"

これだけではなにも生じませんね。stancode1オブジェクトにstanのコードを書き込んだだけだからです。データの分析例は次のようになります。

library(rstan)

## Loading required package: ggplot2

## rstan (Version 2.9.0, packaged: 2016-01-05 16:17:47 UTC, GitRev: 05c3d0058b6a)

## For execution on a local, multicore CPU with excess RAM we recommend calling
## rstan_options(auto_write = TRUE)
## options(mc.cores = parallel::detectCores())

# 起動後の呪文
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

datastan <- list(N=nrow(bowling),Pin=bowling$pin)

fit <- stan(model_code=stancode1,data=datastan, iter=2000, chain=4)
fit

## Inference for Stan model: d5818d38ce732680101ac96fd8bbf112.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##           mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff
## lambda    6.86    0.00 0.11    6.65    6.78    6.86    6.94    7.07  1455
## lp__   3554.55    0.02 0.67 3552.72 3554.40 3554.80 3554.98 3555.04  1085
##        Rhat
## lambda    1
## lp__      1
## 
## Samples were drawn using NUTS(diag_e) at Fri Jan 22 12:33:49 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

# ファイルからstanコードを読み込む場合は以下二行のコメントアウトを外してください。
# fit <- stan(file="poisson.stan",model_name="NULL poisson",data=datastan,iter=5000,chains=4)
# fit

うまく推定されているでしょうか。コードの意味を見ながら，いろいろ考えてみてください。

グラフィカルな結果を見たい場合は，shinystanパッケージを使うのがオススメです。

library(shinystan)
launch_shinystan(fit)

回帰分析を考える

さて，ボールが重たい方がよくピンが倒れる，というような傾向はあるでしょうか？これは回帰分析ですね。これはもうすでに皆さんできるようになっていると思います。

result.reg <- lm(pin ~ pond,data=bowling)
summary(result.reg)

## 
## Call:
## lm(formula = pin ~ pond, data = bowling)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.1616 -1.1616  0.5777  2.0559  4.4907 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.24819    0.60874   2.050   0.0408 *  
## pond         0.60872    0.06471   9.407   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.904 on 558 degrees of freedom
## Multiple R-squared:  0.1369, Adjusted R-squared:  0.1353 
## F-statistic:  88.5 on 1 and 558 DF,  p-value: < 2.2e-16

おっと失礼，ポアソン分布でした。

result.reg2 <- glm(pin~pond,data=bowling,family="poisson")
summary(result.reg2)

## 
## Call:
## glm(formula = pin ~ pond, family = "poisson", data = bowling)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.9747  -0.4684   0.1850   0.7176   1.6822  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 1.110441   0.081285   13.66   <2e-16 ***
## pond        0.086936   0.008356   10.40   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1210.0  on 559  degrees of freedom
## Residual deviance: 1102.8  on 558  degrees of freedom
## AIC: 3036.4
## 
## Number of Fisher Scoring iterations: 5

ポアソン分布のリンク関数はlogですから，exp関数を使って係数を見てみると

exp(result.reg2$coefficients)

## (Intercept)        pond 
##    3.035697    1.090827

となります。

さて，これをベイズ流でやるとどうなるでしょうか？コードは次のようになります。

stancode2 <- "
data{
  int<lower=0> M; # data length
  int<lower=0> pond[M];
  int<lower=0> Pin[M];
}

parameters {
  real<lower=0> lambda;
  real intercept;
}

model{
  real pred[M];
  for(m in 1:M){
    pred[m] <- exp(intercept + lambda * pond[m]);
  }
  
  intercept ~ normal(0,100);
  lambda ~ normal(0,100);

  Pin ~ poisson(pred);
}
"

途中，exp関数が入っていますが，これはlogの逆で，切片+係数*ポンドがデータの期待値(予測値)になっている，という意味で，このデータの期待値がポアソン分布したものとしてデータが得られている，というモデルになっています。for文という繰り返し命令が入っているのが技術的に難しいところでしょうか。

これを実行する関数は次の通りです。

datastan <- list(M=nrow(bowling),pond=bowling$pond,Pin=bowling$pin)
fit.poisson1 <- stan(model_code=stancode2,data=datastan, iter=2000, chain=4)
fit.poisson1

## Inference for Stan model: 4740e608801bbc12dcc9a29c3cee8de7.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##              mean se_mean   sd    2.5%     25%     50%     75%   97.5%
## lambda       0.09    0.00 0.01    0.07    0.08    0.09    0.09    0.10
## intercept    1.11    0.00 0.08    0.96    1.06    1.11    1.16    1.26
## lp__      3603.36    0.03 0.94 3600.74 3603.00 3603.66 3604.04 3604.28
##           n_eff Rhat
## lambda      553 1.01
## intercept   536 1.01
## lp__       1120 1.00
## 
## Samples were drawn using NUTS(diag_e) at Fri Jan 22 12:34:19 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

# ファイルからstanコードを読み込む場合は以下二行のコメントアウトを外してください。
# fit.poisson1 <- stan(file="poisson1.stan",model_name="reg poisson",data=datastan,iter=5000,chains=4)
# fit.poisson1

ベイズ統計的なポイントは，回帰係数や切片に分布を仮定している というところです。普通のGLMでは回帰係数や切片は一つの値をとるだけでしたが，ベイズではこれが分布する，と考えるのです。

また，ベイズ流の「統計的有意」を考えるなら，その係数の分布が危険率の範囲で0にならない，ということになります。 shinystanなどでグラフィカルに表示して，その特徴を理解しておきましょう。

library(shinystan)
launch_shinystan(fit)

個人差を考える

ボールの重さより何より，どれぐらいピンが倒れるか，というのは個人の能力によるものだ，と考える方が一般的でしょう。ではこの個人差をモデルに組み込むにはどうしたらよいのでしょうか。これを表現したコードが次のようになります。

stancode3 <-"
data{
  int<lower=0> M; # data length
  int<lower=0> N; # subjects
  int<lower=0> id[M]; 
  int<lower=0> Pin[M];
}

parameters {
  real<lower=0> lambda;
  real r1[N];
  real<lower=0> tau_sd;
}

model{
  real pred[N];

  for(n in 1:N){
      r1[n] ~ normal(0,tau_sd);
  }
  tau_sd ~ cauchy(0,2.5);

  for(n in 1:N){
    pred[n] <- exp(lambda + r1[n]);
  }
  lambda ~ normal(0,100);

  
  for(m in 1:M)
    Pin[m] ~ poisson(pred[id[m]]);
}
"

推定すべきパラメータは，ピンが平均何本倒れるかのlambda,そして個人差r1です。このr1は平均0，標準偏差tau_sdの正規分布に従います。lambdaに個人差r1が加わって，n人分の予測関数が作られます。lambdaは正規分布に従います。これらを組み込んで，予測関数で表される期待値がポアソン分布に従うピンになって現れる，というモデルです。

実行してみましょう。

datastan <- list(M=nrow(bowling),N=28,id=bowling$ID,Pin=bowling$pin)
fit.poisson2 <- stan(model_code=stancode3,data=datastan, iter=2000, chain=4)
fit.poisson2

## Inference for Stan model: 5b042a8adb7f8106ced62dbc9bd1d759.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##           mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff
## lambda    1.89     0.0 0.06    1.78    1.85    1.89    1.93    2.00   454
## r1[1]    -0.14     0.0 0.10   -0.34   -0.21   -0.14   -0.07    0.06  1238
## r1[2]    -0.41     0.0 0.11   -0.63   -0.49   -0.41   -0.34   -0.19  1466
## r1[3]     0.26     0.0 0.09    0.09    0.20    0.26    0.32    0.44  1176
## r1[4]     0.16     0.0 0.09   -0.03    0.10    0.16    0.22    0.34  1304
## r1[5]     0.01     0.0 0.10   -0.18   -0.05    0.01    0.08    0.20  1472
## r1[6]     0.32     0.0 0.09    0.14    0.26    0.32    0.38    0.49  1155
## r1[7]    -0.83     0.0 0.13   -1.10   -0.91   -0.82   -0.73   -0.57  1633
## r1[8]     0.05     0.0 0.09   -0.13   -0.01    0.05    0.12    0.24  1146
## r1[9]    -0.25     0.0 0.11   -0.47   -0.33   -0.25   -0.18   -0.05  1547
## r1[10]    0.17     0.0 0.09   -0.01    0.11    0.17    0.23    0.36   971
## r1[11]   -0.02     0.0 0.10   -0.22   -0.08   -0.02    0.05    0.18  1209
## r1[12]    0.10     0.0 0.10   -0.09    0.04    0.10    0.17    0.29  1130
## r1[13]    0.20     0.0 0.10    0.01    0.14    0.20    0.26    0.39  1172
## r1[14]   -0.06     0.0 0.10   -0.27   -0.13   -0.06    0.01    0.14  1278
## r1[15]    0.06     0.0 0.10   -0.13   -0.01    0.06    0.13    0.25  1098
## r1[16]    0.08     0.0 0.10   -0.12    0.01    0.08    0.14    0.26  1112
## r1[17]   -0.01     0.0 0.10   -0.20   -0.07   -0.01    0.06    0.18  1408
## r1[18]   -0.01     0.0 0.10   -0.20   -0.08   -0.01    0.06    0.18  1260
## r1[19]   -0.16     0.0 0.10   -0.35   -0.22   -0.15   -0.09    0.03  1385
## r1[20]   -0.58     0.0 0.12   -0.82   -0.66   -0.58   -0.50   -0.36  1693
## r1[21]    0.16     0.0 0.09   -0.03    0.10    0.17    0.23    0.35   982
## r1[22]    0.29     0.0 0.09    0.11    0.23    0.29    0.34    0.46  1061
## r1[23]    0.15     0.0 0.09   -0.03    0.09    0.15    0.21    0.33  1011
## r1[24]   -0.08     0.0 0.10   -0.27   -0.14   -0.08   -0.01    0.11  1157
## r1[25]    0.24     0.0 0.09    0.07    0.18    0.25    0.31    0.42  1065
## r1[26]    0.16     0.0 0.09   -0.02    0.10    0.16    0.22    0.35  1192
## r1[27]    0.20     0.0 0.09    0.02    0.14    0.20    0.26    0.39  1110
## r1[28]   -0.06     0.0 0.10   -0.26   -0.13   -0.06    0.01    0.13  1504
## tau_sd    0.29     0.0 0.05    0.21    0.26    0.28    0.32    0.39  2603
## lp__   3682.72     0.1 3.82 3674.35 3680.31 3683.00 3685.53 3689.26  1567
##        Rhat
## lambda 1.01
## r1[1]  1.00
## r1[2]  1.01
## r1[3]  1.00
## r1[4]  1.00
## r1[5]  1.00
## r1[6]  1.00
## r1[7]  1.00
## r1[8]  1.00
## r1[9]  1.00
## r1[10] 1.00
## r1[11] 1.00
## r1[12] 1.00
## r1[13] 1.00
## r1[14] 1.00
## r1[15] 1.00
## r1[16] 1.00
## r1[17] 1.00
## r1[18] 1.00
## r1[19] 1.00
## r1[20] 1.00
## r1[21] 1.00
## r1[22] 1.00
## r1[23] 1.00
## r1[24] 1.00
## r1[25] 1.00
## r1[26] 1.00
## r1[27] 1.00
## r1[28] 1.00
## tau_sd 1.00
## lp__   1.00
## 
## Samples were drawn using NUTS(diag_e) at Fri Jan 22 12:34:50 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

# ファイルからstanコードを読み込む場合は以下二行のコメントアウトを外してください。
# fit.poisson2 <- stan(file="poisson2.stan",model_name="ind.diff poisson",data=datastan,iter=2000,chains=4)
# fit.poisson2

個人差ならびにそれを考慮した上での平均が示されているのがわかるでしょう。

組み合わせたくなるよね！

個人差を考慮した上で，ボールの重さがピンを倒すのに影響するということはないでしょうか？当然あると思います。回帰と個人差のモデル，合体させましょう！

stancode4 <- "
data{
  int<lower=0> M; # data length
  int<lower=0> N; # subjects
  int<lower=0> pond[M];
  int<lower=0> id[M]; 
  int<lower=0> Pin[M];
}

parameters {
  real<lower=0> lambda;
  real r1[N];
  real<lower=0> tau_sd;
  real intercept;
}

model{
  real pred[M];
  for(n in 1:N)
      r1[n] ~ normal(0,tau_sd);
  tau_sd ~ cauchy(0,2.5);

  for(m in 1:M){
    pred[m] <- exp(intercept + lambda * pond[m] + r1[id[m]]);
  }
  
  intercept ~ normal(0,100);
  lambda ~ normal(0,100);
  
  for(m in 1:M)
    Pin[m] ~ poisson(pred[id[m]]);
}

"

これを実行するコードは次のようになります。

datastan <- list(M=nrow(bowling),N=28,id=bowling$ID,pond=bowling$pond,Pin=bowling$pin)
fit.poisson3 <- stan(model_code=stancode4,data=datastan, iter=2000, chain=4)
fit.poisson3

## Inference for Stan model: d3e414d05e4eb93653e3311b75c4a30a.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##              mean se_mean   sd    2.5%     25%     50%     75%   97.5%
## lambda       0.10    0.00 0.02    0.05    0.08    0.10    0.11    0.14
## r1[1]        0.04    0.00 0.10   -0.16   -0.03    0.04    0.11    0.24
## r1[2]       -0.21    0.00 0.11   -0.42   -0.28   -0.20   -0.13    0.01
## r1[3]        0.01    0.00 0.10   -0.19   -0.06    0.01    0.07    0.19
## r1[4]       -0.09    0.00 0.10   -0.29   -0.15   -0.08   -0.02    0.11
## r1[5]        0.02    0.00 0.09   -0.15   -0.03    0.02    0.08    0.20
## r1[6]        0.14    0.00 0.09   -0.03    0.08    0.14    0.20    0.31
## r1[7]       -0.56    0.00 0.13   -0.82   -0.65   -0.56   -0.47   -0.32
## r1[8]        0.23    0.00 0.10    0.04    0.16    0.23    0.29    0.42
## r1[9]       -0.06    0.00 0.10   -0.27   -0.13   -0.06    0.01    0.14
## r1[10]       0.09    0.00 0.09   -0.08    0.04    0.09    0.15    0.26
## r1[11]       0.16    0.00 0.10   -0.04    0.09    0.16    0.23    0.37
## r1[12]       0.03    0.00 0.09   -0.14   -0.03    0.03    0.09    0.20
## r1[13]       0.20    0.00 0.08    0.04    0.14    0.20    0.26    0.36
## r1[14]      -0.04    0.00 0.09   -0.21   -0.10   -0.04    0.02    0.14
## r1[15]      -0.09    0.00 0.09   -0.28   -0.16   -0.09   -0.03    0.09
## r1[16]       0.09    0.00 0.09   -0.08    0.03    0.09    0.15    0.26
## r1[17]       0.01    0.00 0.09   -0.16   -0.05    0.01    0.07    0.18
## r1[18]       0.01    0.00 0.09   -0.17   -0.05    0.01    0.07    0.18
## r1[19]      -0.13    0.00 0.09   -0.31   -0.19   -0.13   -0.07    0.05
## r1[20]      -0.36    0.00 0.11   -0.59   -0.43   -0.35   -0.28   -0.14
## r1[21]       0.17    0.00 0.09    0.00    0.11    0.17    0.22    0.33
## r1[22]       0.03    0.00 0.10   -0.16   -0.03    0.03    0.10    0.22
## r1[23]      -0.18    0.00 0.11   -0.41   -0.26   -0.18   -0.10    0.03
## r1[24]      -0.06    0.00 0.09   -0.24   -0.12   -0.06    0.01    0.12
## r1[25]      -0.01    0.00 0.10   -0.21   -0.07   -0.01    0.06    0.18
## r1[26]       0.33    0.00 0.10    0.13    0.26    0.33    0.39    0.52
## r1[27]       0.04    0.00 0.09   -0.14   -0.02    0.04    0.10    0.20
## r1[28]       0.12    0.00 0.10   -0.07    0.05    0.12    0.19    0.32
## tau_sd       0.21    0.00 0.04    0.14    0.18    0.21    0.23    0.30
## intercept    1.01    0.01 0.22    0.58    0.87    1.02    1.16    1.43
## lp__      3688.01    0.12 4.10 3679.00 3685.48 3688.41 3690.91 3695.08
##           n_eff Rhat
## lambda      593    1
## r1[1]      1665    1
## r1[2]      1760    1
## r1[3]       881    1
## r1[4]      1000    1
## r1[5]      2142    1
## r1[6]      1136    1
## r1[7]      2076    1
## r1[8]      1343    1
## r1[9]      1626    1
## r1[10]     1694    1
## r1[11]     1438    1
## r1[12]     1987    1
## r1[13]     1806    1
## r1[14]     4000    1
## r1[15]     1233    1
## r1[16]     2047    1
## r1[17]     2007    1
## r1[18]     2228    1
## r1[19]     2372    1
## r1[20]     1933    1
## r1[21]     1787    1
## r1[22]      893    1
## r1[23]      900    1
## r1[24]     2261    1
## r1[25]      949    1
## r1[26]     1466    1
## r1[27]     1258    1
## r1[28]     1525    1
## tau_sd     1966    1
## intercept   620    1
## lp__       1254    1
## 
## Samples were drawn using NUTS(diag_e) at Fri Jan 22 12:35:30 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

# fit.poisson3 <- stan(file="poisson3.stan",model_name="ind.diff reg poisson",data=datastan,iter=5000,chains=4)
# fit.poisson3

いかがでしょう。

あれ？これって・・・

個人差を考えつつ回帰モデルを考える。個人にネストされた回帰を考える。そうです，これは階層線形モデルの考え方です。実は，階層線形モデルはほとんどベイズモデリングをやっていることとおなじなのです。試しにGLMMのパッケージで考えてみましょう。

library(lmerTest)

## Loading required package: Matrix

## Loading required package: lme4

## 
## Attaching package: 'lmerTest'

##  以下のオブジェクトは 'package:lme4' からマスクされています: 
## 
##      lmer

##  以下のオブジェクトは 'package:stats' からマスクされています: 
## 
##      step

library(lme4)
result.glmm <- glmer(pin~pond+(1|ID),data=bowling,family=poisson)
summary(result.glmm)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: pin ~ pond + (1 | ID)
##    Data: bowling
## 
##      AIC      BIC   logLik deviance df.resid 
##   2978.7   2991.7  -1486.4   2972.7      557 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.8629 -0.3810  0.2800  0.6785  3.2939 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  ID     (Intercept) 0.0355   0.1884  
## Number of obs: 560, groups:  ID, 28
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.01076    0.19640   5.147 2.65e-07 ***
## pond         0.09588    0.02073   4.626 3.72e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## pond -0.980

少し数値は違いますが，ほとんど同じことができました。ただ，GLMMは最尤法という従来の推定法を使っているところが違います。ただ，この最尤法を使うやり方は，変量効果(IDの分散のやつです)が増えていったりするとうまく答えが出せないことがあります。分布がmixされすぎると途中の計算がうまくできないのです。

それに対し，ベイズ法は分布を生み出すようなモデリングをしますので，ほとんどどのようなモデルであってもうまく推定することができます。複雑なモデルになればなるほど，ベイズ統計の威力が実感できるようになります。

パッケージglmmstan

stanのコードをいちいち書くのは大変です。lme4のように簡単な書き方でできないものでしょうか？この悩みに応えてくれるのがglmmstanパッケージです。残念ながらCRANに登録されてはいませんので，特殊なインストール方法をする必要がありますが，これを使うと大変便利です。

### glmmstanをインストールするには，devtoolsパッケージが必要。これを入れた上で，以下二行を実行する。
# library(devtools)
# install_github("norimune/glmmstan")
library(glmmstan)
result.glmmstan <- glmmstan(pin~pond+(1|ID),data=bowling,family="poisson",iter=2000,chains=4)

## Loading required package: foreach

## Loading required package: iterators

## Loading required package: parallel

## 
## Compiling the stan code.
## 
## MCMC sampling start.
## 
## SAMPLING FOR MODEL 'pin~pond + (1 | ID) [poisson]' NOW (CHAIN 1).
## 
## Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1, Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1, Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1, Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1, Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1, Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1, Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1, Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1, Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
## #  Elapsed Time: 10.8074 seconds (Warm-up)
## #                9.51575 seconds (Sampling)
## #                20.3231 seconds (Total)
## # 
## 
## SAMPLING FOR MODEL 'pin~pond + (1 | ID) [poisson]' NOW (CHAIN 2).
## 
## Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2, Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2, Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2, Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2, Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2, Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2, Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2, Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2, Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
## #  Elapsed Time: 12.5732 seconds (Warm-up)
## #                10.0636 seconds (Sampling)
## #                22.6368 seconds (Total)
## # 
## 
## SAMPLING FOR MODEL 'pin~pond + (1 | ID) [poisson]' NOW (CHAIN 3).
## 
## Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3, Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3, Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3, Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3, Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3, Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3, Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3, Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3, Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
## #  Elapsed Time: 10.9162 seconds (Warm-up)
## #                9.36633 seconds (Sampling)
## #                20.2825 seconds (Total)
## #

## The following numerical problems occured the indicated number of times after warmup on chain 3

##                                                                                 count
## Exception thrown at line 28: normal_log: Scale parameter is 0, but must be > 0!     1

## When a numerical problem occurs, the Metropolis proposal gets rejected.

## However, by design Metropolis proposals sometimes get rejected even when there are no numerical problems.

## Thus, if the number in the 'count' column is small, do not ask about this message on stan-users.

## 
## SAMPLING FOR MODEL 'pin~pond + (1 | ID) [poisson]' NOW (CHAIN 4).
## 
## Chain 4, Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4, Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4, Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4, Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4, Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4, Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4, Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4, Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4, Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4, Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4, Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4, Iteration: 2000 / 2000 [100%]  (Sampling)# 
## #  Elapsed Time: 10.6383 seconds (Warm-up)
## #                9.22599 seconds (Sampling)
## #                19.8643 seconds (Total)
## # 
## 
## lppd = -1447.4334
## pWAIC = 29.6799
## WAIC = 2954.2265

output_result(result.glmmstan)

## $formula
## pin ~ pond + (1 | ID)
## 
## $WAIC
##        WAIC        lppd      p_waic 
##  2954.22647 -1447.43338    29.67986 
## 
## $beta
##           coefficient      stdev   95%lower  95%upper
## Intercept  0.98866007 0.21485056 0.54686396 1.3887512
## pond       0.09841681 0.02270233 0.05549812 0.1445058
## 
## $tau_sd
## $tau_sd$ID
##             coefficient      stdev  95%lower  95%upper
## (Intercept)   0.2096653 0.03890024 0.1444946 0.2982034
## 
## 
## $tau
## $tau$ID
## (Intercept) 
##  0.04547239

他にもstanを組み込んだパッケージが次々に出てきています。Enjoy!

stanを組み込んだほかのパッケージ

library(rstanarm)

## Loading required package: Rcpp

## rstanarm (Version 2.9.0-1, packaged: 2016-01-09 18:39:57 UTC)

## - Do not expect the default priors to remain the same in future rstanarm versions.

## Thus, R scripts should specify priors explicitly, even if they are just the defaults.

## - For execution on a local, multicore CPU with excess RAM we recommend calling

## options(mc.cores = parallel::detectCores())

## 
## Attaching package: 'rstanarm'

##  以下のオブジェクトは 'package:lme4' からマスクされています: 
## 
##      sigma

result.rstanarm <- stan_glmer(pin~pond+(1|ID),data=bowling,family=poisson)
summary(result.rstanarm)

## stan_glmer(formula = pin ~ pond + (1 | ID), data = bowling, family = poisson)
## 
## Algorithm: sampling
## Posterior sample size: 4000
## Observations: 560
## Groups: ID 28
## 
## Estimates:
##                        mean   sd     2.5%   25%    50%    75%    97.5%
## (Intercept)             1.0    0.2    0.6    0.9    1.0    1.1    1.4 
## pond                    0.1    0.0    0.1    0.1    0.1    0.1    0.1 
## b[(Intercept) ID:1]     0.0    0.1   -0.2    0.0    0.0    0.1    0.3 
## b[(Intercept) ID:2]    -0.2    0.1   -0.4   -0.3   -0.2   -0.1    0.0 
## b[(Intercept) ID:3]     0.0    0.1   -0.2   -0.1    0.0    0.1    0.2 
## b[(Intercept) ID:4]    -0.1    0.1   -0.3   -0.2   -0.1    0.0    0.1 
## b[(Intercept) ID:5]     0.0    0.1   -0.2    0.0    0.0    0.1    0.2 
## b[(Intercept) ID:6]     0.1    0.1    0.0    0.1    0.1    0.2    0.3 
## b[(Intercept) ID:7]    -0.6    0.1   -0.8   -0.6   -0.6   -0.5   -0.3 
## b[(Intercept) ID:8]     0.2    0.1    0.0    0.2    0.2    0.3    0.4 
## b[(Intercept) ID:9]    -0.1    0.1   -0.3   -0.1   -0.1    0.0    0.1 
## b[(Intercept) ID:10]    0.1    0.1   -0.1    0.0    0.1    0.1    0.3 
## b[(Intercept) ID:11]    0.2    0.1    0.0    0.1    0.2    0.2    0.4 
## b[(Intercept) ID:12]    0.0    0.1   -0.1    0.0    0.0    0.1    0.2 
## b[(Intercept) ID:13]    0.2    0.1    0.0    0.1    0.2    0.3    0.4 
## b[(Intercept) ID:14]    0.0    0.1   -0.2   -0.1    0.0    0.0    0.1 
## b[(Intercept) ID:15]   -0.1    0.1   -0.3   -0.2   -0.1    0.0    0.1 
## b[(Intercept) ID:16]    0.1    0.1   -0.1    0.0    0.1    0.1    0.3 
## b[(Intercept) ID:17]    0.0    0.1   -0.2   -0.1    0.0    0.1    0.2 
## b[(Intercept) ID:18]    0.0    0.1   -0.2    0.0    0.0    0.1    0.2 
## b[(Intercept) ID:19]   -0.1    0.1   -0.3   -0.2   -0.1   -0.1    0.1 
## b[(Intercept) ID:20]   -0.4    0.1   -0.6   -0.4   -0.3   -0.3   -0.1 
## b[(Intercept) ID:21]    0.2    0.1    0.0    0.1    0.2    0.2    0.3 
## b[(Intercept) ID:22]    0.0    0.1   -0.2    0.0    0.0    0.1    0.2 
## b[(Intercept) ID:23]   -0.2    0.1   -0.4   -0.3   -0.2   -0.1    0.0 
## b[(Intercept) ID:24]   -0.1    0.1   -0.2   -0.1   -0.1    0.0    0.1 
## b[(Intercept) ID:25]    0.0    0.1   -0.2   -0.1    0.0    0.1    0.2 
## b[(Intercept) ID:26]    0.3    0.1    0.1    0.3    0.3    0.4    0.5 
## b[(Intercept) ID:27]    0.0    0.1   -0.1    0.0    0.0    0.1    0.2 
## b[(Intercept) ID:28]    0.1    0.1   -0.1    0.1    0.1    0.2    0.3 
## mean_PPD                6.9    0.2    6.5    6.7    6.8    7.0    7.2 
## log-posterior        3645.1    5.7 3633.3 3641.5 3645.4 3649.1 3655.2 
## 
## Diagnostics:
##                      mcse Rhat n_eff
## (Intercept)          0.0  1.0   933 
## pond                 0.0  1.0   938 
## b[(Intercept) ID:1]  0.0  1.0  1637 
## b[(Intercept) ID:2]  0.0  1.0  1771 
## b[(Intercept) ID:3]  0.0  1.0  1173 
## b[(Intercept) ID:4]  0.0  1.0  1163 
## b[(Intercept) ID:5]  0.0  1.0  1957 
## b[(Intercept) ID:6]  0.0  1.0  1228 
## b[(Intercept) ID:7]  0.0  1.0  2105 
## b[(Intercept) ID:8]  0.0  1.0  1656 
## b[(Intercept) ID:9]  0.0  1.0  1705 
## b[(Intercept) ID:10] 0.0  1.0  1475 
## b[(Intercept) ID:11] 0.0  1.0  1654 
## b[(Intercept) ID:12] 0.0  1.0  1403 
## b[(Intercept) ID:13] 0.0  1.0  1558 
## b[(Intercept) ID:14] 0.0  1.0  2040 
## b[(Intercept) ID:15] 0.0  1.0  1537 
## b[(Intercept) ID:16] 0.0  1.0  1482 
## b[(Intercept) ID:17] 0.0  1.0  1910 
## b[(Intercept) ID:18] 0.0  1.0  1786 
## b[(Intercept) ID:19] 0.0  1.0  4000 
## b[(Intercept) ID:20] 0.0  1.0  4000 
## b[(Intercept) ID:21] 0.0  1.0  1469 
## b[(Intercept) ID:22] 0.0  1.0  1162 
## b[(Intercept) ID:23] 0.0  1.0  1195 
## b[(Intercept) ID:24] 0.0  1.0  1839 
## b[(Intercept) ID:25] 0.0  1.0  1190 
## b[(Intercept) ID:26] 0.0  1.0  1357 
## b[(Intercept) ID:27] 0.0  1.0  1434 
## b[(Intercept) ID:28] 0.0  1.0  1532 
## mean_PPD             0.0  1.0  4000 
## log-posterior        0.2  1.0   799 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

plot(result.rstanarm)

## 'pars' not specified. Showing first 10 parameters by default.

## ci_level: 0.8 (80% intervals)

## outer_level: 0.95 (95% intervals)

library(brms)

## Loading 'brms' package (version 0.7.0). Useful instructions 
## can be found by typing help('brms'). A more detailed introduction 
## to the package is available through vignette('brms').

## 
## Attaching package: 'brms'

##  以下のオブジェクトは 'package:rstanarm' からマスクされています: 
## 
##      cauchy

result.brms <- brm(pin~pond+(1|ID),data=bowling,family="poisson")

## Compiling the C++ model

result.brms

##  Family: poisson (log) 
## Formula: pin ~ pond + (1 | ID) 
##    Data: bowling (Number of observations: 560) 
## Samples: 2 chains, each with iter = 2000; warmup = 500; thin = 1; 
##          total post-warmup samples = 3000
##    WAIC: 2954.81
##  
## Random Effects: 
## ~ID (Number of levels: 28) 
##               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)     0.21      0.04     0.14      0.3        466 1.01
## 
## Fixed Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept      1.0      0.21     0.57     1.42        564    1
## pond           0.1      0.02     0.05     0.14        521    1
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

本日の課題

以前行った階層線形モデルをベイズ流にやってみてください。glmmstan他，パッケージを用いてもよい。

Lesson10.Bayesian Modeling 2

Kosugitti

2016年1月8日