公共政策・実験データの分析チュートリアル

実験のシナリオ

統制群シナリオ：

あなたの住む自治体で、ごみの分別方法が変わることになりました。

誘因付与シナリオ（処置A）：

あなたの住む自治体で、ごみの分別方法が変わることになりました。

プラスチックごみを分別して、近隣のスーパーにもっていくことによって、市内で使えるクーポン券が発行されることになります。

理念提示シナリオ（処置B）：

あなたの住む自治体で、ごみの分別方法が変わることになりました。

プラスチックごみを分別して、ごみを減らし、私たちの意識を改革していくことは不可欠な営みです。

\(\Downarrow\)

結果変数：

あなたはプラスチックごみを分別しようと思いますか。

そう思う／ややそう思う／どちらともいえない／あまりそう思わない／そう思わない／答えない

実験のフロー

実験データの分析

上記のシナリオに従って、実験データが得られたとしよう。データは、「experiment.csv」データである。実験データ分析用のRコードを紹介する。以下の手順で、起動したRStduioから順に作業を進めて行ってみよう。以下の各図を再現できるだろうか。

データの読み込み

# dfというオブジェクトにデータ「experiment.csv」を格納する

df <- read.csv("experiment.csv")

パッケージの読み込み

パッケージのインストールは、授業時の説明を参考に。オンライン動画も参照しよう。

library(tidyverse)
library(ggpubr)
library(stargazer)

平均値の差の描画(Mean difference plot)

レジュメにもあったように、平均値の差を計算し描画する。どのように解釈できるだろうか。

# 1) グループ順序の設定（Cを基準にする）
df$group <- factor(df$group, levels = c("C","A","B"))

# 2) 各群の平均と95%信頼区間（confidence intervals: CI）の描画
summary_df <- df %>%
  group_by(group) %>%
  summarise(
    n    = n(),
    mean = mean(separate, na.rm = TRUE),
    se   = sqrt(mean * (1 - mean) / n),
    ci   = qnorm(0.975) * se,
    lo   = mean - ci,
    hi   = mean + ci,
    .groups = "drop"
  )

# 3) 平均の差の計算（A−C, B−C）
mC <- summary_df$mean[summary_df$group == "C"]
mA <- summary_df$mean[summary_df$group == "A"]
mB <- summary_df$mean[summary_df$group == "B"]
diff_AC <- as.numeric(mA - mC)
diff_BC <- as.numeric(mB - mC)

# 4) 図の描画：平均＋95%CI（注記に A−C / B−C を表示）
ymin <- min(summary_df$lo); ymax <- max(summary_df$hi); pad <- 0.08*(ymax - ymin)

p <- ggplot(summary_df, aes(x = group, y = mean)) +
  geom_point(size = 4, shape = 21, fill = "grey85", color = "black") +
  geom_errorbar(aes(ymin = lo, ymax = hi), width = 0.08, size = 0.9) +
  geom_text(aes(label = sprintf("%.3f\n[%.3f, %.3f]", mean, lo, hi)),
            nudge_x = 0.14, hjust = 0, vjust = 0.5, size = 4) +
  annotate("text", x = 2, y = ymin + pad,
           label = paste0("A - C = ", sprintf("%.3f", diff_AC),
                          "   |   B - C = ", sprintf("%.3f", diff_BC)),
           size = 4.6) +
  scale_x_discrete(labels = c("C"="処置群 (C)","A"="誘因付与型(A)","B"="理念提示型(B)")) +
  labs(x = NULL, y = "Mean difference") +
  theme_bw(base_size = 12) +
  theme(panel.grid.minor = element_blank(), legend.position = "none") +
  coord_cartesian(ylim = c(ymin - pad, ymax + pad))


# 5) 図の表示
print(p)

最小二乗法による推定(Ordinary Least Square estimation)

次にOLS推定を実践。推定結果を、どのように解釈できるだろうか。単回帰の推定モデルは、

\[ Y_i \;=\; \beta_0 \;+\; \beta_1\,x_{\text{treatmentA},i} \;+\; \beta_2\,x_{\text{treatmentB},i} \;+\; \varepsilon_i, \]

ここで、

\(i=1,\dots,N\) は観測個体の添字。
\(Y_i\)：アウトカム（例：「ごみの分別意識」の指標。1（そう思わない）－5（そう思う）段階）。
\(x_{\text{treatment},i}\in\{0,1\}\)：処置ダミー（処置=1、統制=0）。
\(\beta_0\)：統制群の平均（切片）。
\(\beta_1\)：平均処置効果（処置と統制の平均差）。
\(\varepsilon_i\)：誤差項。\(\mathbb{E}[\varepsilon_i\!\mid\!x_{\text{treatment},i}]=0\) を仮定。

重回帰の推定モデルは、

\[ Y_i \;=\; \beta_0 \;+\; \beta_1\,x_{\text{treatment},i} \;+\; \beta_2\,x_{\text{gender},i} \;+\; \beta_3\,x_{\text{income},i} \;+\; \beta_4\,x_{\text{education},i} \;+\; \beta_5\,x_{\text{age},i} \;+\; \varepsilon_i. \]

ここで、

\(x_{\text{gender},i}\)：性別（0=男性, 1=女性）。
\(x_{\text{income},i}\)：所得スコア（1–20 の順序尺度）。
\(x_{\text{education},i}\)：学歴（0=高卒以下, 1=大卒以上）。
\(x_{\text{age},i}\)：年齢（整数）。
\(\beta_2,\dots,\beta_5\)：各共変量の限界効果（他を固定したときの \(Y_i\) の期待値の変化）。

# 1)OLS推定を実行し、m_に結果を格納
m_simple <- lm(separate ~ group, data = df)
m_cov    <- lm(separate ~ group + gender + income + education + age, data = df)

# 2)推定結果を分かりやすく表化
stargazer(m_simple, m_cov,
          type = "text", header = FALSE, digits = 4,
          dep.var.labels = "Separate (1 = yes)",
          covariate.labels = c("A vs C","B vs C",
                               "Gender (1=female)","Income (1-20)",
                               "Education (1=college+)","Age"),
          title = "OLS: Three Groups (C as reference) — Simple and Covariate-adjusted")

## 
## OLS: Three Groups (C as reference) — Simple and Covariate-adjusted
## ==========================================================================
##                                        Dependent variable:                
##                        ---------------------------------------------------
##                                        Separate (1 = yes)                 
##                                   (1)                       (2)           
## --------------------------------------------------------------------------
## A vs C                         0.2750***                 0.2752***        
##                                (0.0243)                  (0.0243)         
##                                                                           
## B vs C                         0.1362***                 0.1372***        
##                                (0.0243)                  (0.0243)         
##                                                                           
## Gender (1=female)                                         -0.0188         
##                                                          (0.0199)         
##                                                                           
## Income (1-20)                                             0.0015          
##                                                          (0.0018)         
##                                                                           
## Education (1=college+)                                    0.0217          
##                                                          (0.0199)         
##                                                                           
## Age                                                      -0.00004         
##                                                          (0.0006)         
##                                                                           
## Constant                       0.3187***                 0.3033***        
##                                (0.0172)                  (0.0397)         
##                                                                           
## --------------------------------------------------------------------------
## Observations                     2,400                     2,400          
## R2                              0.0508                    0.0520          
## Adjusted R2                     0.0500                    0.0496          
## Residual Std. Error       0.4855 (df = 2397)        0.4856 (df = 2393)    
## F Statistic            64.1616*** (df = 2; 2397) 21.8609*** (df = 6; 2393)
## ==========================================================================
## Note:                                          *p<0.1; **p<0.05; ***p<0.01