confint

Author

Takafumi Kubota

Published

June 18, 2024

Abstract

This study simulates the construction of 95% confidence intervals for the mean of a normal population whose true mean is known. It draws 100 samples of size 30, computes a t-based confidence interval from each sample, and highlights in red the intervals that do not contain the true mean. The proportion of intervals that contain the true mean (the empirical coverage) is then compared with the nominal 95% level. The resulting plot shows how often confidence intervals capture the population parameter across repeated sampling.

Keywords

confidence intervals, simulation

1 Steps for Calculating Confidence Intervals
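
The simulation uses the standard t-based interval for the mean of a normal population when the population standard deviation is treated as unknown. For a sample $x_1, \dots, x_n$ and confidence level $1-\alpha$, the steps are: compute the sample mean, the sample standard deviation, and the standard error, then add and subtract the t critical value times the standard error.

$$
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad
\mathrm{SE} = \frac{s}{\sqrt{n}},\\
\mathrm{CI} &= \bar{x} \pm t_{1-\alpha/2,\,n-1}\,\mathrm{SE}.
\end{aligned}
$$

With $n = 30$ and $\alpha = 0.05$ (confidence_level = 0.95), the critical value is $t_{0.975,\,29} \approx 2.045$, the value returned by qt(0.975, df = 29) in R. If the procedure is repeated on many independent samples, about $100(1-\alpha)\%$ of the resulting intervals contain the true mean $\mu$; the simulation checks this empirically.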

2 Visualization of 95% Confidence Intervals


# Parameter settings
set.seed(123)            # set the seed for reproducibility
true_mean <- 50          # true population mean
sample_size <- 30        # size of each sample
num_iterations <- 100    # number of simulated samples
confidence_level <- 0.95 # confidence level

# Data frame that stores the confidence intervals
confidence_intervals <- data.frame(
  iteration = integer(num_iterations),
  lower = numeric(num_iterations),
  upper = numeric(num_iterations),
  contains_true_mean = logical(num_iterations)
)
coverage_count <- 0

# Generate data and compute the confidence intervals
for (i in 1:num_iterations) {
  # Draw a sample from a normal distribution with mean true_mean and sd 10
  sample_data <- rnorm(sample_size, mean = true_mean, sd = 10)
  
  # Sample mean and standard error of the mean
  sample_mean <- mean(sample_data)
  sample_se <- sd(sample_data) / sqrt(sample_size)
  
  # Confidence interval from the t distribution with sample_size - 1 df
  t_value <- qt(1 - (1 - confidence_level) / 2, df = sample_size - 1)
  error_margin <- t_value * sample_se
  lower_bound <- sample_mean - error_margin
  upper_bound <- sample_mean + error_margin
  
  # Store the interval; a list keeps each column's type
  # (c() would coerce the logical flag to 0/1 and make the column numeric)
  confidence_intervals[i, ] <- list(
    i, lower_bound, upper_bound,
    true_mean >= lower_bound && true_mean <= upper_bound
  )
  
  # Count the intervals that contain the true mean
  if (confidence_intervals$contains_true_mean[i]) {
    coverage_count <- coverage_count + 1
  }
}

# Display the result
cat("Proportion of confidence intervals containing the true mean:", coverage_count / num_iterations, "\n")

Proportion of confidence intervals containing the true mean: 0.96


# Set up the plot frame
plot(1:num_iterations, rep(true_mean, num_iterations), type = "n", ylim = range(c(confidence_intervals$lower, confidence_intervals$upper)), 
     xlab = "Iteration", ylab = "Confidence Interval", main = "Visualization of 95% Confidence Intervals")

# Draw a horizontal line at the true mean
abline(h = true_mean, col = "blue", lty = 2)

# Draw each confidence interval: black if it contains the true mean, red otherwise
for (i in 1:num_iterations) {
  if (confidence_intervals$contains_true_mean[i]) {
    segments(i, confidence_intervals$lower[i], i, confidence_intervals$upper[i], col = "black")
  } else {
    segments(i, confidence_intervals$lower[i], i, confidence_intervals$upper[i], col = "red")
  }
}

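2.1 Parameter Settings

set.seed(123) sets the seed of the random number generator so that results are reproducible: every run of the program produces the same sequence of random numbers.
true_mean specifies the true mean of the distribution from which the data are generated.
sample_size specifies the size of each sample.
num_iterations specifies how many times the simulation is repeated.
confidence_level sets the confidence level of the intervals; here, 95% confidence intervals are computed.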
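
The reproducibility that set.seed() provides can be checked directly: resetting the same seed reproduces the same draws, while continuing without a reset gives new values. A minimal sketch, using five draws from the same normal distribution (mean 50, sd 10) as the simulation; the variable names are illustrative:

```r
# Same seed, same draws
set.seed(123)
first_draw <- rnorm(5, mean = 50, sd = 10)

set.seed(123)
second_draw <- rnorm(5, mean = 50, sd = 10)

# No reset: the generator continues from its current state
third_draw <- rnorm(5, mean = 50, sd = 10)

identical(first_draw, second_draw)  # TRUE
identical(first_draw, third_draw)   # FALSE
```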

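2.2 Data Frame for Storing the Confidence Intervals

confidence_intervals is a data frame that stores the information about each confidence interval. Its columns hold the following:

iteration is the iteration number.
lower is the lower bound of the confidence interval.
upper is the upper bound of the confidence interval.
contains_true_mean indicates whether the confidence interval contains the true mean.

coverage_count is a variable that counts how many confidence intervals contain the true mean.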
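
Because the columns have different types (integer, numeric, logical), how a row is assigned matters. The sketch below uses a hypothetical one-row data frame with made-up bounds to show that c() coerces every element to a common type, so the logical flag becomes 1, while assigning a list keeps each column's declared type:

```r
# Hypothetical one-row data frame with the same column types as confidence_intervals
ci <- data.frame(
  iteration = integer(1),
  lower = numeric(1),
  upper = numeric(1),
  contains_true_mean = logical(1)
)

# c() coerces all elements to one common type, so TRUE becomes the number 1
row_vec <- c(1L, 47.2, 52.9, TRUE)
class(row_vec)                    # "numeric"

# Assigning that vector as a row turns the logical column into a numeric one
ci_vec <- ci
ci_vec[1, ] <- row_vec
class(ci_vec$contains_true_mean)  # "numeric"

# Assigning a list instead keeps each column's original type
ci_list <- ci
ci_list[1, ] <- list(1L, 47.2, 52.9, TRUE)
class(ci_list$contains_true_mean) # "logical"
```

Both forms still work inside if (), because R treats 1 and 0 as TRUE and FALSE there, but only the list form keeps the column logical as the data frame declaration intends.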

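2.3 Data Generation and Confidence Interval Calculation

This loop repeats data generation and confidence interval calculation the specified number of times (num_iterations).

sample_data is drawn from a normal distribution with mean true_mean and standard deviation 10.
sample_mean is the mean of the generated sample.
sample_se is the standard error of the sample mean: the sample standard deviation divided by the square root of the sample size.
t_value is the critical value of the t distribution with sample_size - 1 degrees of freedom for the chosen confidence level, that is, its 1 - (1 - confidence_level) / 2 quantile (0.975 for a 95% interval).
error_margin is the margin of error, i.e. the half-width of the confidence interval.
lower_bound and upper_bound are the lower and upper bounds of the confidence interval.
confidence_intervals[i, ] stores the result of each iteration.
coverage_count is incremented whenever the confidence interval contains the true mean.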
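
As a sanity check on the manual calculation in the loop, the same interval can be obtained from R's built-in t.test() in the stats package, whose conf.int component holds the two-sided interval for a one-sample test. The sketch uses its own sample; the seed and variable names are illustrative assumptions:

```r
set.seed(1)  # illustrative seed, used only for this check
x <- rnorm(30, mean = 50, sd = 10)

# Manual t-interval, computed the same way as in the simulation loop
n <- length(x)
se <- sd(x) / sqrt(n)
t_crit <- qt(0.975, df = n - 1)
manual_ci <- c(mean(x) - t_crit * se, mean(x) + t_crit * se)

# Interval reported by the built-in one-sample t-test
# (as.numeric() drops the "conf.level" attribute)
builtin_ci <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

all.equal(manual_ci, builtin_ci)  # TRUE
```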

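2.4 Displaying the Result

coverage_count / num_iterations gives the proportion of confidence intervals that contain the true mean, and the cat function prints it. With only 100 iterations this proportion is expected to be close to, but not necessarily equal to, 0.95; the run above gives 0.96.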
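
Because only 100 intervals are simulated, the observed coverage is itself a random estimate of the nominal 95%. A rough check, assuming the intervals are independent so that the number of covering intervals is binomial, compares the reported 0.96 with 0.95 in units of the Monte Carlo standard error:

```r
num_iterations <- 100
nominal <- 0.95
observed <- 0.96  # proportion reported by the simulation output

# Standard error of a proportion estimated from num_iterations independent intervals
mc_se <- sqrt(nominal * (1 - nominal) / num_iterations)
round(mc_se, 4)  # 0.0218

# Distance between observed and nominal coverage, in standard errors
z <- (observed - nominal) / mc_se
round(z, 2)      # 0.46
```

A difference of about half a standard error is ordinary sampling variation; as a rough guide, coverage anywhere between about 0.91 and 0.99 (0.95 ± 2 × 0.0218) would be unremarkable for 100 iterations.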

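2.5 Plot Settings

The plot function creates the plot frame.

1:num_iterations supplies the x coordinates, one per iteration, and so determines the range of the x axis.
rep(true_mean, num_iterations) supplies matching y coordinates; they are not drawn because type = "n" is set, and the y-axis range is controlled by ylim instead.
type = "n" suppresses drawing of the data points, so only the axes and frame are created.
ylim sets the y-axis range from the smallest lower bound to the largest upper bound in confidence_intervals.
xlab and ylab set the x-axis and y-axis labels, respectively.
main sets the plot title.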

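2.6 Drawing the True Mean Line

The abline function draws a horizontal line at the specified y value (here, true_mean).

h = true_mean draws the horizontal line at the true mean.
col = "blue" sets the line color to blue.
lty = 2 sets the line type to dashed.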

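2.7 Drawing the Confidence Intervals

This loop draws the confidence interval for each iteration.

The segments function draws each confidence interval as a vertical line segment.
i specifies the position on the x axis.
confidence_intervals$lower[i] and confidence_intervals$upper[i] specify the lower and upper bounds of the confidence interval.
col specifies the color of the interval: black if it contains the true mean, red if it does not.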
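
Since segments() accepts vectors, the drawing loop can also be written as a single call, with ifelse() choosing the color of each interval. The sketch below wraps this in a hypothetical helper, draw_intervals(), and calls it on three made-up intervals (the last one misses the true mean of 50); it is an alternative formulation, not the code used to produce the original figure:

```r
# Hypothetical helper: plot intervals as vertical segments, red when they miss true_mean
draw_intervals <- function(lower, upper, true_mean) {
  covers <- lower <= true_mean & true_mean <= upper
  x <- seq_along(lower)
  plot(x, rep(true_mean, length(x)), type = "n",
       ylim = range(c(lower, upper)),
       xlab = "Iteration", ylab = "Confidence Interval",
       main = "Visualization of 95% Confidence Intervals")
  abline(h = true_mean, col = "blue", lty = 2)
  # x1 defaults to x0, so every segment is vertical; colors are matched per segment
  segments(x0 = x, y0 = lower, y1 = upper,
           col = ifelse(covers, "black", "red"))
  invisible(covers)
}

# Illustrative input: three made-up intervals
covers <- draw_intervals(lower = c(46, 48, 51), upper = c(53, 55, 56), true_mean = 50)
covers  # TRUE TRUE FALSE
```

Applied to the simulation, the call would be draw_intervals(confidence_intervals$lower, confidence_intervals$upper, true_mean).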