Rstanの練習: MCMCでコイン投げモデル推定

確率\( \theta \)で表が出るコイン投げを\( N \)回行ったデータ\( y \)を作ります. Rだとこんな感じになります.

# sample size
N = 100
# prob 
theta = 0.3
# data
y = sample(c(1,0), N, replace=TRUE, prob =  c(theta, 1 - theta))

データ\( y \)から, パラメータ\( \theta \)の分布を得たいと思いましょう.

MCMCサンプリングを使って分布を推定します.

rstanパッケージからStanを使います.

library(rstan)
model_code = "
data{
  int<lower=0> N; // sample size
  int<lower=0, upper=1> y[N]; // 0 or 1
}
parameters{
  real<lower=0, upper=1> theta; // 1が出る確率
}
model{
  theta ~ uniform(0,1); // 無情報事前分布
  for(n in 1:N){
    y[n] ~ bernoulli(theta); // コイン投げモデル
  }
}
"

確率\( \theta \)の値がわからないので, 無情報事前分布として一様分布をとって, \[ \theta \sim U(0,1) \] としました.

では, データを流してみます.

data_list = list(N = N, y = y)
# run the stan code
fit_res = stan(model_code = model_code, data= data_list, iter=1000)

\( \mathrm{iter} = 1000 \) としています. MCMCサンプリングは初期値に依存するところがあるので, 最初の半分である500ステップ分は捨てます.

パラメータの分布の形が欲しいです. codaパッケージを使うといいらしいです.

# make plot
library(coda)
fit_res_coda <- mcmc.list(lapply(1:ncol(fit_res),
                          function(x) mcmc(as.array(fit_res)[,x,])))
plot(fit_res_coda)

plot of chunk unnamed-chunk-4

Density of thetaにてパラメータ\( \theta \)の分布の形が得られています. 分布の特徴を見てみましょう.

fit_res

## Inference for Stan model: model_code.
## 4 chains, each with iter=1000; warmup=500; thin=1; 
## post-warmup draws per chain=500, total post-warmup draws=2000.
## 
##        mean se_mean  sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## theta   0.3       0 0.0   0.2   0.3   0.3   0.3   0.4   760    1
## lp__  -63.1       0 0.7 -64.8 -63.3 -62.9 -62.7 -62.6  1020    1
## 
## Samples were drawn using NUTS(diag_e) at Sun Mar  9 20:49:55 2014.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

MCMCサンプリングによって得られた\( \theta \)の分布について, 中央値は0.3. よく推定されている, と言っていいんでしょうかね.

以上です.