Uncertainty Estimation

set.seed(1)

n_values <- c(5, 30, 100)
sd_values <- c(10, 50, 90)
status <- c("Known", "Unknown")

result <- data.frame()

for(n in n_values){
  for(sigma in sd_values){
    for(s in status){
      
      # generate data (a fresh sample is drawn for every n, sd and status combination)
      x <- rnorm(n, mean = 500, sd = sigma)
      
      mean_x <- mean(x)
      
      if(s == "Known"){
        # population sd known: z critical value and true standard error
        crit <- qnorm(0.975)
        se <- sigma/sqrt(n)
      } else {
        # population sd unknown: t critical value and estimated standard error
        crit <- qt(0.975, df = n-1)
        se <- sd(x)/sqrt(n)
      }
      
      # 95% confidence interval for the mean
      lower <- mean_x - crit*se
      upper <- mean_x + crit*se
      
      width <- upper - lower
      
      result <- rbind(result,
                     data.frame(n=n, sd=sigma, status=s, width=width))
    }
  }
}

result

##      n sd  status      width
## 1    5 10   Known  17.530451
## 2    5 10 Unknown  16.609347
## 3    5 50   Known  87.652254
## 4    5 50 Unknown  57.714740
## 5    5 90   Known 157.774057
## 6    5 90 Unknown 157.496139
## 7   30 10   Known   7.156777
## 8   30 10 Unknown   7.171987
## 9   30 50   Known  35.783883
## 10  30 50 Unknown  34.464104
## 11  30 90   Known  64.410989
## 12  30 90 Unknown  72.913842
## 13 100 10   Known   3.919928
## 14 100 10 Unknown   3.857058
## 15 100 50   Known  19.599640
## 16 100 50 Unknown  19.293416
## 17 100 90   Known  35.279352
## 18 100 90 Unknown  38.461708
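
The widths in the table above follow from the two interval formulas the code uses. As a sketch, with symbols chosen here for illustration (σ is the population standard deviation passed to rnorm, s is the sample standard deviation sd(x), and n is the sample size):

```latex
\begin{aligned}
\text{width}_{\text{known}}   &= 2\, z_{0.975}\, \frac{\sigma}{\sqrt{n}}, \\
\text{width}_{\text{unknown}} &= 2\, t_{0.975,\, n-1}\, \frac{s}{\sqrt{n}}.
\end{aligned}
```

For example, with n = 5 and σ = 10 the known-case width is 2 × 1.959964 × 10 / √5 ≈ 17.53, which matches row 1. The known-case width does not depend on the sample at all, while the unknown-case width changes with every sample through s.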

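# bar colour shows whether the standard deviation was treated as known
colors <- ifelse(result$status == "Known", "blue", "red")

# result has no label column yet, so build the bar labels from sd
result$label <- paste0("sd=", result$sd)

bp <- barplot(result$width,
              names.arg = result$label,
              las = 2,
              col = colors,
              ylab = "CI Width",
              main = "CI Width (Blue=Known, Red=Unknown)")

legend("topright",
       legend = c("Known", "Unknown"),
       fill = c("blue", "red"))

# centre of each block of six bars (one block per sample size)
group_centers <- c(mean(bp[1:6]), mean(bp[7:12]), mean(bp[13:18]))

# sample-size labels in the bottom margin, below the per-bar labels
mtext(c("n = 5", "n = 30", "n = 100"),
      side = 1,
      line = 4,
      at = group_centers)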

From the test we just did, the width of a confidence interval is affected by three things.

The first is n, the sample size. The graph shows clearly that more samples make the interval narrower. This is because the standard error is the standard deviation divided by the square root of n, so it shrinks as n grows and the estimate of the mean becomes more precise.

The second factor is the standard deviation. The graph shows that a bigger standard deviation makes the interval wider. A larger standard deviation means more variability in the data, so the sample mean is a less precise estimate and the interval has to be wider to cover the population mean at the same confidence level.

The third factor is closely related to the second one, but seen from a different view. Instead of looking at the value of the standard deviation, we focus on whether the standard deviation is known or not. When it is known, we use the z value and the true standard deviation, so the width is fixed for a given n and standard deviation. When it is unknown, we use the t value, which is larger than the z value (especially for small n), together with the sample standard deviation. In our results the comparison is mixed. For n = 5, all three intervals with unknown standard deviation came out narrower than the ones with known standard deviation. For n = 30 and n = 100, the interval with unknown standard deviation is wider in some cases (for example sd = 90) and slightly narrower in others. This happens because each row uses a single random sample, and the sample standard deviation can land below or above the true value, especially when n is small. On average, however, not knowing the standard deviation makes the interval wider, because the larger t value accounts for the extra uncertainty of having to estimate it. The difference becomes small for large n, where the t value is close to the z value.
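
To separate the effect of known versus unknown standard deviation from the luck of a single sample, the comparison can be repeated many times and averaged. The following is a minimal sketch and not part of the original analysis: the fixed standard deviation of 50, the number of replications `reps`, and the name `avg_width` are all assumptions made for illustration.

```r
# Illustrative sketch: average CI width over many simulated samples
# instead of relying on one sample per setting.
set.seed(1)

n_values <- c(5, 30, 100)
sigma    <- 50     # assumed population standard deviation
reps     <- 2000   # hypothetical number of replications per sample size

avg_width <- sapply(n_values, function(n) {
  # known sd: the width is the same for every sample, so no simulation is needed
  known <- 2 * qnorm(0.975) * sigma / sqrt(n)

  # unknown sd: the width depends on the sample sd, so average it over many samples
  unknown <- mean(replicate(reps, {
    x <- rnorm(n, mean = 500, sd = sigma)
    2 * qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  }))

  c(known = known, unknown = unknown)
})

colnames(avg_width) <- paste0("n=", n_values)
round(avg_width, 2)
```

Averaged this way, the unknown row should come out above the known row for every n, with the gap shrinking as n grows, because the t value approaches the z value.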

Taufik Dwi Ferdiansyah

2026-03-29