STUDY CASES

Confidence Interval, Week 13

Carol Dupino Pereira

NIM: 52250051

Data Science Student, ITSB

R Programming Data Science Statistics

1 Case Study 1

Confidence Interval for Mean, \(\sigma\) Known: An e-commerce platform wants to estimate the average number of daily transactions per user after launching a new feature. Based on large-scale historical data, the population standard deviation is known.

\[ \begin{eqnarray*} \sigma &=& 3.2 \quad \text{(population standard deviation)} \\ n &=& 100 \quad \text{(sample size)} \\ \bar{x} &=& 12.6 \quad \text{(sample mean)} \end{eqnarray*} \]

Tasks

  1. Identify the appropriate statistical test and justify your choice.
  2. Compute the Confidence Intervals for:
    • \(90\%\)
    • \(95\%\)
    • \(99\%\)
  3. Create a comparison visualization of the three confidence intervals.
  4. Interpret the results in a business analytics context.

1.1 Appropriate Statistical Test and Justification

The appropriate statistical test for constructing the confidence interval for the population mean (\(\mu\)) is the Z-Interval for the Population Mean.

Justification:

  • Population Standard Deviation (\(\sigma\)) is Known: This is the primary condition that dictates the use of the Z-distribution (Standard Normal Distribution) over the \(t\)-distribution.

  • Large Sample Size (\(n\)): With a sample size of \(n=100\) (which is \(\ge 30\)), the Central Limit Theorem ensures that the sampling distribution of the sample mean (\(\bar{x}\)) is approximately normal, even if the underlying population distribution is not perfectly normal. This further validates the use of the Z-statistic.

The formula used is: \[ \text{Confidence Interval} = \bar{x} \pm Z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) \]

1.2 Confidence Interval Computations

Given Parameters: \[\begin{eqnarray*} \sigma &=& 3.2 \\ n &=& 100 \\ \bar{x} &=& 12.6 \end{eqnarray*}\]

Standard Error (SE): \[\begin{eqnarray*}\text{SE} = \frac{\sigma}{\sqrt{n}} = \frac{3.2}{\sqrt{100}} = \frac{3.2}{10} = 0.32 \end{eqnarray*}\]

The computed confidence intervals are summarized in the table below:

Two-Sided Confidence Intervals Using Zα/2

| Confidence Level | Zα/2 | Margin of Error | Lower Bound | Upper Bound |
|---|---|---|---|---|
| 90% | 1.645 | 0.526 | 12.074 | 13.126 |
| 95% | 1.960 | 0.627 | 11.973 | 13.227 |
| 99% | 2.576 | 0.824 | 11.776 | 13.424 |

1.3 Comparison Visualization

The plot below visually compares the three confidence intervals, demonstrating how the interval width increases as the confidence level increases.

1.4 Interpretation in a Business Analytics Context

The confidence intervals provide a range of plausible values for the true average number of daily transactions per user (μ) after the new feature launch.

  1. 90% Confidence Interval (12.074, 13.126):
  • We are 90% confident that the true average number of daily transactions per user is between 12.074 and 13.126. This is the most precise (narrowest) estimate.
  2. 95% Confidence Interval (11.973, 13.227):
  • This is the standard interval used in most scientific and business contexts. We are 95% confident that the true average is between 11.973 and 13.227. The margin of error is 0.627 transactions.
  3. 99% Confidence Interval (11.776, 13.424):
  • This is the most reliable (highest confidence) estimate. We are 99% confident that the true average is between 11.776 and 13.424.

Business Insight: The key takeaway is the trade-off between Confidence and Precision:

  • To be more confident (e.g., 99% confidence), the e-commerce platform must accept a wider interval (lower precision). This means the true average could be as low as 11.776 or as high as 13.424.

  • To have a more precise estimate (e.g., 90% confidence), the platform must accept a lower confidence level.

Even the widest (99%) interval has a lower bound of 11.776, so the platform can be highly confident that the true average transaction rate per user exceeds, for instance, 11.0 (if that were a benchmark). The 95% CI is a good balance, suggesting that the average after the feature launch lies between approximately 12.0 and 13.2 daily transactions per user.


2 Case Study 2

Confidence Interval for Mean, \(\sigma\) Unknown: A UX Research team analyzes task completion time (in minutes) for a new mobile application. The data are collected from 12 users:

\[ 8.4,\; 7.9,\; 9.1,\; 8.7,\; 8.2,\; 9.0,\; 7.8,\; 8.5,\; 8.9,\; 8.1,\; 8.6,\; 8.3 \]

Tasks:

  1. Identify the appropriate statistical test and explain why.
  2. Compute the Confidence Intervals for:
    • \(90\%\)
    • \(95\%\)
    • \(99\%\)
  3. Visualize the three intervals on a single plot.
  4. Explain how sample size and confidence level influence the interval width.

2.1 Identify the appropriate statistical test and explain why.

Appropriate Statistical Test: Confidence Interval for the Mean using the \(t\)-distribution (often referred to as a \(t\)-interval).

Explanation: The \(t\)-distribution is the appropriate choice for constructing the confidence interval for the population mean (\(\mu\)) for two primary reasons:

  1. Population Standard Deviation is Unknown (\(\sigma\) unknown): When \(\sigma\) is unknown, we must use the sample standard deviation (\(s\)) as an estimate.

  2. Small Sample Size (\(n < 30\)): The sample size is \(n=12\). For small samples with an unknown population standard deviation, the \(t\)-distribution provides a more accurate model of the sampling distribution of the mean than the standard normal (\(z\)) distribution.

The formula for the confidence interval for the mean is: \[ \bar{x} \pm t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}} \]

where \(t_{\alpha/2, n-1}\) is the critical \(t\)-value with \(n-1\) degrees of freedom.

2.2 Compute the Confidence Intervals

Sample Statistics

  • Sample Size (n): 12
  • Degrees of Freedom (df): 11
  • Sample Mean (x̄): 8.4583 minutes
  • Sample Standard Deviation (s): 0.4209 minutes
  • Standard Error (s / √n): 0.1215 minutes

The computed confidence intervals for the population mean task completion time (\(\mu\)) are:

Confidence Intervals for Task Completion Time (in Minutes)

| Confidence Level | Lower Bound (min) | Upper Bound (min) |
|---|---|---|
| 90% | 8.2401 | 8.6766 |
| 95% | 8.1909 | 8.7258 |
| 99% | 8.0809 | 8.8357 |

Note:

The critical t-values used in the calculation were:
  • t0.05, 11 (90% CI): 1.7959
  • t0.025, 11 (95% CI): 2.2010
  • t0.005, 11 (99% CI): 3.1058

2.3 Visualize the three intervals on a single plot.

The three confidence intervals are visualized below. The red dot represents the sample mean (\(\bar{x} \approx 8.46\) minutes), and the horizontal lines represent the interval for each confidence level.

2.4 Factors Influencing Interval Width

The width of a confidence interval is determined by the Margin of Error (\(ME = t^* \cdot \frac{s}{\sqrt{n}}\)); the full width is \(2 \cdot ME\).

  • Confidence Level:

Effect: As the confidence level increases (e.g., from 90% to 99%), the interval becomes wider.

Logic: To be more certain that the interval contains the true population mean, we must encompass a larger range of possible values. This is reflected in a larger critical value (\(t^*\)).

  • Sample Size (\(n\)):

Effect: As the sample size increases, the interval becomes narrower (more precise).

Logic: Increasing \(n\) reduces the Standard Error (\(\frac{s}{\sqrt{n}}\)) and, through the larger degrees of freedom, slightly reduces \(t^*\) as well. With more data, our estimate of the mean becomes more stable and reliable, allowing for a tighter range of estimation.
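Both effects can be checked numerically. The following is a minimal sketch, assuming the sample standard deviation stays fixed at the value computed from the 12 observations; the sample sizes 30 and 100 are hypothetical and used only for illustration.

```r
# Task completion times (minutes) from Case Study 2
times <- c(8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3)
s <- sd(times)  # held fixed for every scenario (an assumption)

# n = 12 is the actual sample; n = 30 and n = 100 are hypothetical
grid <- expand.grid(n = c(12, 30, 100), conf_level = c(0.90, 0.95, 0.99))

# Critical t-value and full interval width (2 * margin of error)
grid$t_crit <- qt(1 - (1 - grid$conf_level) / 2, df = grid$n - 1)
grid$width  <- 2 * grid$t_crit * s / sqrt(grid$n)

print(grid, digits = 4)
```

Reading the output row by row, the width shrinks as \(n\) grows at a fixed confidence level and grows as the confidence level rises at a fixed \(n\).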


3 Case Study 3

Confidence Interval for a Proportion, A/B Testing: A data science team runs an A/B test on a new Call-To-Action (CTA) button design. The experiment yields:

\[ \begin{eqnarray*} n &=& 400 \quad \text{(total users)} \\ x &=& 156 \quad \text{(users who clicked the CTA)} \end{eqnarray*} \]

Tasks:

  1. Compute the sample proportion \(\hat{p}\).
  2. Compute Confidence Intervals for the proportion at:
    • \(90\%\)
    • \(95\%\)
    • \(99\%\)
  3. Visualize and compare the three intervals.
  4. Explain how confidence level affects decision-making in product experiments.

The given data is: \[ \begin{eqnarray*} n &=& 400 \quad \text{(total users)} \\ x &=& 156 \quad \text{(users who clicked the CTA)} \end{eqnarray*} \]

3.1 Compute the Sample Proportion (\(\hat{p}\))

The sample proportion \(\hat{p}\) is the point estimate for the true population proportion.

\[ \hat{p} = \frac{x}{n} = \frac{156}{400} = 0.3900 \]

The sample proportion of users who clicked the new CTA design is \(39.00\%\).

3.2 Compute Confidence Intervals

The confidence intervals (CIs) are calculated using the formula: \[ \text{CI} = \hat{p} \pm Z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]

The standard error (\(\text{SE}\)) is: \[ \text{SE} = \sqrt{\frac{0.3900(1 - 0.3900)}{400}} \approx 0.0244 \]

Confidence Intervals at Different Confidence Levels

| Confidence Level | Z-score | Margin of Error | Confidence Interval |
|---|---|---|---|
| 90% | 1.6449 | 0.0401 | [0.3499, 0.4301] |
| 95% | 1.9600 | 0.0478 | [0.3422, 0.4378] |
| 99% | 2.5758 | 0.0628 | [0.3272, 0.4528] |
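The table values can be reproduced from \(x\) and \(n\) alone. This is a minimal base-R sketch of the normal-approximation (Wald) interval given by the formula above; the variable names are illustrative.

```r
# Values given in Case Study 3
x <- 156   # users who clicked the CTA
n <- 400   # total users

p_hat <- x / n
se    <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat

conf_levels <- c(0.90, 0.95, 0.99)
z_crit <- qnorm(1 - (1 - conf_levels) / 2)  # two-sided critical values

ci_case3 <- data.frame(
  Confidence_Level = paste0(conf_levels * 100, "%"),
  Z_score          = z_crit,
  Margin_of_Error  = z_crit * se,
  Lower_Bound      = p_hat - z_crit * se,
  Upper_Bound      = p_hat + z_crit * se
)
print(ci_case3, digits = 4)
```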

3.3 Visualize and Compare the Three Intervals

The chart below visualizes how the width of the confidence interval increases with the confidence level. The sample proportion (\(\hat{p}=0.3900\)) is the center of all three intervals (marked by the diamond).

3.4 Explain How Confidence Level Affects Decision-Making

The confidence level directly impacts the precision and certainty of your result, which is crucial in product experimentation like A/B testing.

Relationship Between Confidence Level, Interval Width, Precision, and Error Risk

| Confidence Level | Interval Width | Precision | Certainty | Risk of Type I Error |
|---|---|---|---|---|
| Higher (99%) | Wider | Less precise | More certain | Lower risk of Type I error |
| Lower (90%) | Narrower | More precise | Less certain | Higher risk of Type I error |
    1. Defining a Winner (Statistical Significance):

In A/B testing, a common decision rule is to declare a “winner” if the confidence interval of the difference between the two variants does not include zero.

    2. Higher Confidence (\(\mathbf{99\%}\)):

The interval is wider, making it harder to reject the null hypothesis (e.g., that the new design is no different from the old one). It requires a larger difference in performance to achieve statistical significance. While this is the safest level, it often leads to inconclusive results, requiring longer testing times.

    3. Lower Confidence (\(\mathbf{90\%}\)):

The interval is narrower, making it easier to achieve statistical significance. However, this increases the risk of a Type I error (a false positive): declaring the new CTA a winner when it is actually no better, or even worse, than the original.

    4. Product Standard:

Most data science and product teams default to a \(95\%\) confidence level (corresponding to a significance level of \(\alpha=0.05\)). This is considered a good balance: intervals built this way capture the true value in \(95\%\) of repeated experiments, without requiring the larger sample size or longer test duration needed for a \(99\%\) confidence level.
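The decision rule in item 1 can be sketched as follows. The case study reports only the new variant (156 clicks out of 400), so the control-group counts below are hypothetical values invented purely for illustration, and the unpooled Wald interval for a difference in proportions is one common choice rather than the only one.

```r
# Variant B: values from Case Study 3
x_b <- 156; n_b <- 400

# Control A: HYPOTHETICAL values, invented for illustration only
x_a <- 132; n_a <- 400

p_a <- x_a / n_a
p_b <- x_b / n_b
diff_hat <- p_b - p_a

# Unpooled (Wald) standard error of the difference in proportions
se_diff <- sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

conf_levels <- c(0.90, 0.95, 0.99)
z_crit <- qnorm(1 - (1 - conf_levels) / 2)

ab_result <- data.frame(
  Confidence_Level = paste0(conf_levels * 100, "%"),
  Lower = diff_hat - z_crit * se_diff,
  Upper = diff_hat + z_crit * se_diff
)

# A "winner" is declared only when the whole interval lies above zero
ab_result$Excludes_Zero <- ab_result$Lower > 0
print(ab_result, digits = 4)
```

With these hypothetical control numbers the interval excludes zero at 90% but not at 95% or 99%, which illustrates how the chosen confidence level alone can change the product decision.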


4 Case Study 4

Precision Comparison (Z-Test vs t-Test): Two data teams measure API latency (in milliseconds) under different conditions.

\[\begin{eqnarray*} \text{Team A:} \\ n &=& 36 \quad \text{(sample size)} \\ \bar{x} &=& 210 \quad \text{(sample mean)} \\ \sigma &=& 24 \quad \text{(known population standard deviation)} \\[6pt] \text{Team B:} \\ n &=& 36 \quad \text{(sample size)} \\ \bar{x} &=& 210 \quad \text{(sample mean)} \\ s &=& 24 \quad \text{(sample standard deviation)} \end{eqnarray*}\]

Tasks

  1. Identify the statistical test used by each team.
  2. Compute Confidence Intervals for 90%, 95%, and 99%.
  3. Create a visualization comparing all intervals.
  4. Explain why the interval widths differ, even with similar data.

4.1 Statistical Test Identification

The choice of statistical test for the mean depends on whether the population standard deviation (\(\sigma\)) is known and the sample size (\(n\)).

Choice of Statistical Test Based on the Available Standard Deviation

| Team | Given Standard Deviation | Test Used | Justification |
|---|---|---|---|
| Team A | σ = 24 (known population SD) | Z-Test (or Z-interval) | Since the population standard deviation (σ) is known and the sample size (n = 36) is large (n ≥ 30), the Z-distribution is appropriate. |
| Team B | s = 24 (sample SD) | t-Test (or t-interval) | Since the population standard deviation (σ) is unknown and only the sample standard deviation (s) is available, the t-distribution must be used. |

4.2 Confidence Interval Computation

The formula for the Confidence Interval (CI) for the population mean (\(\mu\)) is:

Team A (Z-Interval): \[ \text{CI} = \bar{x} \pm Z^* \left(\frac{\sigma}{\sqrt{n}}\right) \]

Team B (t-Interval): \[ \text{CI} = \bar{x} \pm t^* \left(\frac{s}{\sqrt{n}}\right) \]

Common Parameters: \[\begin{eqnarray*} \text{Team A:} \\ n &=& 36 \quad \text{(sample size)} \\ \bar{x} &=& 210 \quad \text{(sample mean)} \\ \sigma &=& 24 \quad \text{(known population standard deviation)} \\[6pt] \text{Team B:} \\ n &=& 36 \quad \text{(sample size)} \\ \bar{x} &=& 210 \quad \text{(sample mean)} \\ s &=& 24 \quad \text{(sample standard deviation)} \\[6pt]\end{eqnarray*}\]

Standard Error (SE), identical for both teams because \(\sigma = s = 24\): \[\begin{eqnarray*}\text{SE} = \frac{\sigma}{\sqrt{n}} = \frac{24}{\sqrt{36}} = \frac{24}{6} = 4 \end{eqnarray*}\]

Team A (Z-Interval): \(\sigma\) is known. We use the critical Z-values (\(Z^*\)) for the specified confidence levels.

Confidence Intervals with Margin of Error (Z* × 4)

| Confidence Level | Z Critical Value | Margin of Error | Confidence Interval | Interval Width |
|---|---|---|---|---|
| 90% | 1.645 | 1.645 × 4 = 6.58 | [203.42, 216.58] | 13.16 |
| 95% | 1.960 | 1.960 × 4 = 7.84 | [202.16, 217.84] | 15.68 |
| 99% | 2.576 | 2.576 × 4 = 10.30 | [199.70, 220.30] | 20.60 |

Team B (t-Interval): \(\sigma\) is unknown. We use the critical t-values (\(t^*\)) with degrees of freedom \(df = n-1 = 36-1 = 35\).

Confidence Intervals Using the t-Distribution (df = 35)

| Confidence Level | t Critical Value (df = 35) | Margin of Error | Confidence Interval | Interval Width |
|---|---|---|---|---|
| 90% | 1.690 | 1.690 × 4 = 6.76 | [203.24, 216.76] | 13.52 |
| 95% | 2.030 | 2.030 × 4 = 8.12 | [201.88, 218.12] | 16.24 |
| 99% | 2.724 | 2.724 × 4 = 10.90 | [199.10, 220.90] | 21.80 |
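Both sets of intervals can be computed side by side. The sketch below assumes base R only and uses illustrative variable names; it differs between the two teams only in the critical value (`qnorm` for Team A, `qt` with 35 degrees of freedom for Team B).

```r
# Shared inputs from Case Study 4 (sigma for Team A, s for Team B)
n        <- 36
xbar     <- 210
sd_value <- 24
se       <- sd_value / sqrt(n)   # 24 / 6 = 4 for both teams

conf_levels <- c(0.90, 0.95, 0.99)
p_upper <- 1 - (1 - conf_levels) / 2

z_crit <- qnorm(p_upper)           # Team A: sigma known
t_crit <- qt(p_upper, df = n - 1)  # Team B: sigma estimated by s

comparison <- data.frame(
  Confidence_Level = paste0(conf_levels * 100, "%"),
  Z_Lower = xbar - z_crit * se, Z_Upper = xbar + z_crit * se,
  t_Lower = xbar - t_crit * se, t_Upper = xbar + t_crit * se,
  Z_Width = 2 * z_crit * se,
  t_Width = 2 * t_crit * se
)
print(comparison, digits = 5)
```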

4.3 Interval Comparison Visualization

The visualization would show that for every confidence level:

The t-intervals (Team B) are slightly wider than the Z-intervals (Team A). All intervals are centered at the sample mean of 210 ms. The width of the intervals increases as the confidence level increases (e.g., the 99% interval is widest, the 90% is narrowest).

4.4 Explanation of Interval Width Difference

The interval widths differ because of the underlying probability distributions used: the Standard Normal (Z) Distribution versus the Student’s t-Distribution.

4.4.1 \(\sigma\) Known (Team A \(\rightarrow\) Z-Test)

  • The Z-test is used when the population standard deviation (\(\sigma\)) is known.

  • Since \(\sigma\) is a fixed, known value, the estimate of the standard error (\(\sigma/\sqrt{n}\)) is highly certain and does not add extra variability to the analysis.

  • The critical \(Z^*\) values are fixed based on the confidence level.

4.4.2 \(\sigma\) Unknown (Team B \(\rightarrow\) t-Test)

  • The t-test is used when the population standard deviation (\(\sigma\)) is unknown, and we must substitute the sample standard deviation (\(s\)) as an estimate.

  • The sample standard deviation (\(s\)) is itself an estimate that varies from sample to sample. This introduces an extra source of uncertainty into the standard error estimate.

  • To account for this added uncertainty, the t-distribution has heavier tails (more spread out) than the Z-distribution.

  • This results in larger critical values (\(t^* > Z^*\)) and, consequently, a larger Margin of Error (ME) and wider confidence intervals for the t-test compared to the Z-test at the same confidence level.

In summary: The t-test requires a wider interval (is less precise) to achieve the same confidence level as the Z-test because it must compensate for the additional uncertainty introduced by estimating the population standard deviation (\(\sigma\)) with the sample standard deviation (\(s\)).
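The size of this penalty depends on the degrees of freedom. As a small illustration (the particular df values are chosen arbitrarily, apart from 11 and 35, which appear in Case Studies 2 and 4), the 95% critical t-value can be compared with the fixed critical Z-value:

```r
# 95% two-sided critical values: t for several df versus the fixed Z
df_values <- c(5, 11, 35, 100, 1000)
t_crit_95 <- qt(0.975, df = df_values)
z_crit_95 <- qnorm(0.975)

crit_table <- data.frame(
  df            = df_values,
  t_crit        = t_crit_95,
  excess_over_z = t_crit_95 - z_crit_95  # shrinks toward 0 as df grows
)
print(crit_table, digits = 4)
```

The excess over \(Z^*\) is always positive but shrinks as the degrees of freedom grow, which is why the Team B intervals at \(df = 35\) are only slightly wider than those of Team A.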


5 Case Study 5

One-Sided Confidence Interval: A Software as a Service (SaaS) company wants to ensure that at least 70% of weekly active users utilize a premium feature.

From the experiment:

\[ \begin{eqnarray*} n &=& 250 \quad \text{(total users)} \\ x &=& 185 \quad \text{(active premium users)} \end{eqnarray*} \]

Management is only interested in the lower bound of the estimate.

Tasks:

  1. Identify the type of Confidence Interval and the appropriate test.
  2. Compute the one-sided lower Confidence Interval at:
    • \(90\%\)
    • \(95\%\)
    • \(99\%\)
  3. Visualize the lower bounds for all confidence levels.
  4. Determine whether the 70% target is statistically satisfied.

The given data is: \[\begin{eqnarray*} n &=& 250 \quad \text{(total users)} \\ x &=& 185 \quad \text{(active premium users)} \\ \hat{p} &=& \frac{x}{n} = \frac{185}{250} = 0.74 \end{eqnarray*}\] The target proportion to ensure is \(p_0 = 0.70\).

The standard error of the sample proportion (\(\hat{p}\)) is: \[ SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.74(1-0.74)}{250}} \approx 0.0277 \]

5.1 Identify the Type of Confidence Interval and the Appropriate Test

Type of Confidence Interval: One-Sided Lower Confidence Interval for a Population Proportion. The company is only interested in the lower bound to ensure the feature usage is at least \(70\%\).

Appropriate Test/Method: The appropriate method is using the Z-test for a Population Proportion (or the Normal Approximation method for confidence intervals) because the sample size is large enough to satisfy the normal approximation conditions (\(n\hat{p} = 185 > 10\) and \(n(1-\hat{p}) = 65 > 10\)).

5.2 Compute the One-Sided Lower Confidence Interval

The formula for the one-sided lower confidence bound is: \[ \text{Lower Bound} = \hat{p} - Z_{1-\alpha} \cdot SE \]

One-Sided Confidence Interval (Lower Bound)

| Confidence Level | Alpha | Z(1 − α) | Lower Bound |
|---|---|---|---|
| 90% | 0.10 | 1.282 | 0.7044 |
| 95% | 0.05 | 1.645 | 0.6944 |
| 99% | 0.01 | 2.326 | 0.6755 |

Detailed Results:

One-Sided Confidence Interval (Lower Bound), Full Precision

| Confidence Level | Z(1 − α) | Lower Bound |
|---|---|---|
| 90% | 1.281552 | 0.704448 |
| 95% | 1.644854 | 0.694369 |
| 99% | 2.326348 | 0.675463 |
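These lower bounds follow directly from the formula above. A minimal base-R sketch, with illustrative variable names, that also flags whether each bound reaches the 0.70 target:

```r
# Values given in Case Study 5
x  <- 185    # active premium users
n  <- 250    # total users
p0 <- 0.70   # management's target proportion

p_hat <- x / n
se    <- sqrt(p_hat * (1 - p_hat) / n)

conf_levels <- c(0.90, 0.95, 0.99)
z_one_sided <- qnorm(conf_levels)   # Z_{1 - alpha}: all of alpha in one tail

lower_bounds <- data.frame(
  Confidence_Level = paste0(conf_levels * 100, "%"),
  Z_1_minus_Alpha  = z_one_sided,
  Lower_Bound      = p_hat - z_one_sided * se
)
lower_bounds$Meets_Target <- lower_bounds$Lower_Bound >= p0
print(lower_bounds, digits = 6)
```

Note that the one-sided critical value uses `qnorm(conf_level)` rather than `qnorm(1 - alpha / 2)`, because the entire error probability sits in the lower tail.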

5.3 Visualize the Lower Bounds for All Confidence Levels

The following plot illustrates the calculated lower bounds against the \(70\%\) target.

[Figure: bar chart titled ‘One-Sided Lower Confidence Bounds for Premium Feature Usage’. x-axis: confidence level (90%, 95%, 99%); y-axis: lower bound of the CI; a horizontal dashed red line marks the target proportion of 0.70.]

5.4 Determine Whether the \(70\%\) Target is Statistically Satisfied

The \(70\%\) target is statistically satisfied at a given confidence level if the calculated Lower Bound is \(\geq 0.70\).

5.4.1 At \(90\%\) Confidence:

  • Lower Bound: \(0.7044\)

  • Conclusion: Statistically Satisfied. Since \(0.7044 > 0.70\), we are \(90\%\) confident that the true proportion of weekly active users utilizing the premium feature is at least \(70.44\%\).

5.4.2 At \(95\%\) Confidence:

  • Lower Bound: \(0.6944\)

  • Conclusion: NOT Statistically Satisfied. Since \(0.6944 < 0.70\), we cannot be \(95\%\) confident that the true proportion is at least \(70\%\).

5.4.3 At \(99\%\) Confidence:

  • Lower Bound: \(0.6755\)

  • Conclusion: NOT Statistically Satisfied. Since \(0.6755 < 0.70\), we cannot be \(99\%\) confident that the true proportion is at least \(70\%\).

Summary: The company can be \(90\%\) confident that the true proportion of weekly active users utilizing a premium feature is at least \(70\%\). However, they cannot make this claim at the stricter \(95\%\) or \(99\%\) confidence levels.
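The same conclusion can be expressed as a single number: the highest confidence level at which the lower bound still reaches 0.70. The sketch below is an assumption-laden illustration rather than part of the original analysis; it reuses the standard error based on \(\hat{p}\) so that it matches the bounds above, whereas a conventional score test would compute the standard error from \(p_0\) and give a slightly different p-value.

```r
# Values given in Case Study 5
x <- 185; n <- 250; p0 <- 0.70

p_hat <- x / n
se    <- sqrt(p_hat * (1 - p_hat) / n)   # same SE as used for the lower bounds

# One-sided test of H0: p <= p0 against H1: p > p0
z_stat  <- (p_hat - p0) / se
p_value <- pnorm(z_stat, lower.tail = FALSE)

# Highest confidence level at which the lower bound still reaches p0
max_conf_level <- 1 - p_value

cat(sprintf("z = %.4f, one-sided p-value = %.4f, max confidence = %.1f%%\n",
            z_stat, p_value, 100 * max_conf_level))
```

The p-value falls between 0.05 and 0.10, which is consistent with the target being satisfied at 90% confidence but not at 95% or 99%.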

---
title: "STUDY CASES"            # Main title of the document
subtitle: "Confidence Interval, Week 13"  # Subtitle or weekly topic
author: "Carol Dupino Pereira"      # Replace with your full name
date:  "`r format(Sys.Date(), '%B %d, %Y')`" # Auto displays the current date
output:                         # Output section defines the format and layout 
  rmdformats::readthedown:      # https://github.com/juba/rmdformats
    self_contained: true        # Embeds all resources (CSS, JS, images) 
    thumbnails: true            # Displays image thumbnails in the doc
    lightbox: true              # Enables click to enlarge images
    gallery: true               # Groups images into an interactive gallery
    number_sections: true       # Automatically numbers all sections
    lib_dir: libs               # Directory where JavaScript/CSS libraries
    df_print: "paged"           # Displays data frames as interactive paged 
    code_folding: "show"        # Allows folding/unfolding R code blocks 
    code_download: yes          # Adds a button to download all R code
    css : aaaa.css
---


```{r profile, echo=FALSE}
library(htmltools)

HTML('
<div style="width: 400px; height: 250px; background: linear-gradient(135deg, #ffffff 0%, #f0f0f0 100%); border: 2px solid #2c3e50; border-radius: 15px; box-shadow: 0 8px 20px rgba(0,0,0,0.15); padding: 20px; display: flex; align-items: center; gap: 20px; font-family: Arial, sans-serif; margin: 20px auto; overflow: hidden;">
  <div style="flex-shrink: 0;">
    <img src="foto_1.jpg" style="width: 120px; height: 120px; border-radius: 10px; object-fit: cover; border: 4px solid #2c3e50; box-shadow: 0 4px 10px rgba(0,0,0,0.2);">
  </div>
  <div style="flex: 1; overflow: hidden;">
    <div style="background: #ecf0f1; border: 1px solid #bdc3c7; border-radius: 8px; padding: 8px; margin-bottom: 10px;">
      <h1 style="color: #2c3e50; font-size: 18px; margin: 0 0 5px 0; font-weight: bold;">Carol Dupino Pereira</h1>
      <h2 style="color: #34495e; font-size: 14px; margin: 0; font-weight: normal;">NIM: 52250051</h2>
    </div>
    <p style="color: #7f8c8d; font-size: 12px; margin: 0 0 15px 0;">Data Science Student, ITSB</p>
    
    <div style="display: flex; flex-wrap: wrap; gap: 8px;">
      <span style="background: #3498db; color: white; padding: 4px 10px; border-radius: 15px; font-size: 10px; font-weight: bold;">R Programming</span>
      <span style="background: #e74c3c; color: white; padding: 4px 10px; border-radius: 15px; font-size: 10px; font-weight: bold;">Data Science</span>
      <span style="background: #2ecc71; color: white; padding: 4px 10px; border-radius: 15px; font-size: 10px; font-weight: bold;">Statistics</span>
    </div>
  </div>
</div>
')
```

---

##   Case Study 1

**Confidence Interval for Mean, $\sigma$ Known:** An **e-commerce platform** wants to estimate the **average number of daily transactions per user** after launching a new feature. Based on large-scale historical data, the **population standard deviation** is known.

$$
\begin{eqnarray*}
\sigma &=& 3.2 \quad \text{(population standard deviation)} \\
n &=& 100 \quad \text{(sample size)} \\
\bar{x} &=& 12.6 \quad \text{(sample mean)}
\end{eqnarray*}
$$

**Tasks**

1. Identify the **appropriate statistical test** and justify your choice.
2. Compute the Confidence Intervals for:
   - $90\%$
   - $95\%$
   - $99\%$
3. Create a **comparison visualization** of the three confidence intervals.
4. Interpret the results in a business analytics context.

---

###    Appropriate Statistical Test and Justification

The appropriate statistical test for constructing the confidence interval for the population mean ($\mu$) is the Z-Interval for the Population Mean.

Justification:

- Population Standard Deviation ($\sigma$) is Known: This is the primary condition that dictates the use of the Z-distribution (Standard Normal Distribution) over the $t$-distribution.

- Large Sample Size ($n$): With a sample size of $n=100$ (which is $\ge 30$), the Central Limit Theorem ensures that the sampling distribution of the sample mean ($\bar{x}$) is approximately normal, even if the underlying population distribution is not perfectly normal. This further validates the use of the Z-statistic.

The formula used is:
$$
\text{Confidence Interval} = \bar{x} \pm Z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)
$$

###   Confidence Interval Computations

Given Parameters:
\begin{eqnarray*}
\sigma &=& 3.2 \\
n &=& 100 \\
\bar{x} &=& 12.6
\end{eqnarray*}

Standard Error (SE):
\begin{eqnarray*}\text{SE} = \frac{\sigma}{\sqrt{n}} = \frac{3.2}{\sqrt{100}} = \frac{3.2}{10} = 0.32
\end{eqnarray*}

The computed confidence intervals are summarized in the table below:


```{r,echo=FALSE,message=FALSE,warning=FALSE}
# Build the table of two-sided confidence intervals
ci_table_two_sided <- data.frame(
  Confidence_Level = c("90%", "95%", "99%"),
  Z_alpha_over_2 = c(1.645, 1.960, 2.576),
  Margin_of_Error = c(0.526, 0.627, 0.824),
  Lower_Bound = c(12.074, 11.973, 11.776),
  Upper_Bound = c(13.126, 13.227, 13.424)
)

# Display the table
library(knitr)

kable(
  ci_table_two_sided,
  caption = "Two-Sided Confidence Intervals Using Zα/2",
  align = "c"
)
```
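
The hard-coded values in the table can be reproduced directly from the given parameters. The following is a minimal sketch, assuming base R only; the helper name `z_interval` is illustrative and not part of the original analysis.

```r
# Illustrative helper (hypothetical name): two-sided Z-interval for a mean
# when the population standard deviation sigma is known.
z_interval <- function(xbar, sigma, n, conf_level) {
  alpha  <- 1 - conf_level
  z_crit <- qnorm(1 - alpha / 2)       # critical value Z_{alpha/2}
  margin <- z_crit * sigma / sqrt(n)   # margin of error
  data.frame(
    Confidence_Level = paste0(conf_level * 100, "%"),
    Z_alpha_over_2   = z_crit,
    Margin_of_Error  = margin,
    Lower_Bound      = xbar - margin,
    Upper_Bound      = xbar + margin
  )
}

# Values given in Case Study 1
ci_case1 <- z_interval(xbar = 12.6, sigma = 3.2, n = 100,
                       conf_level = c(0.90, 0.95, 0.99))
print(ci_case1, digits = 5)
```

Because `qnorm` is vectorized, a single call returns all three intervals; rounding the result to three decimals gives the figures shown in the table.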

###    Comparison Visualization

The plot below visually compares the three confidence intervals, demonstrating how the interval width increases as the confidence level increases.

```{r,echo=FALSE,message=FALSE,warning=FALSE}
# Load necessary libraries
library(plotly)

# Provided data
conf_levels <- c(0.90, 0.95, 0.99)
z_values <- c(1.645, 1.960, 2.576)
me_values <- c(0.526, 0.627, 0.824)
lb_values <- c(12.074, 11.973, 11.776)
ub_values <- c(13.126, 13.227, 13.424)
mean_val <- (lb_values + ub_values) / 2  # All approximately 12.6

# Define x values for distributions (standardized scale)
x <- seq(-4, 4, 0.01)

# Z-distribution (Standard Normal)
y_z <- dnorm(x)
fig_z <- plot_ly(x = x, y = y_z, type = 'scatter', mode = 'lines', name = 'Z-Distribution (Standard Normal)') %>%
  layout(title = 'Z-Distribution with Critical Values',
         xaxis = list(title = 'Z-Score'),
         yaxis = list(title = 'Density'))

# Add critical values for Z-distribution
for (i in 1:3) {
  fig_z <- fig_z %>% 
    add_trace(x = c(-z_values[i], z_values[i]), 
              y = c(0, 0), 
              type = 'scatter', 
              mode = 'markers', 
              marker = list(size = 8, color = 'red'), 
              name = paste0(conf_levels[i]*100, '% Critical Z-Values'))
}

# T-distribution (assuming df = 30 for illustration, as df is not provided)
df <- 30
y_t <- dt(x, df)
t_crit <- qt(1 - (1 - conf_levels)/2, df)  # Critical t-values
fig_t <- plot_ly(x = x, y = y_t, type = 'scatter', mode = 'lines', name = paste('T-Distribution (df =', df, ')')) %>%
  layout(title = paste('T-Distribution (df =', df, ') with Critical Values'),
         xaxis = list(title = 'T-Score'),
         yaxis = list(title = 'Density'))

# Add critical values for T-distribution
for (i in 1:3) {
  fig_t <- fig_t %>% 
    add_trace(x = c(-t_crit[i], t_crit[i]), 
              y = c(0, 0), 
              type = 'scatter', 
              mode = 'markers', 
              marker = list(size = 8, color = 'blue'), 
              name = paste0(conf_levels[i]*100, '% Critical T-Values'))
}

# Confidence Intervals visualization (as error bars on the mean)
fig_ci <- plot_ly(x = paste0(conf_levels*100, '%'), 
                  y = mean_val, 
                  type = 'scatter', 
                  mode = 'markers', 
                  error_y = list(type = 'data', array = me_values), 
                  name = 'Confidence Intervals') %>%
  layout(title = 'Confidence Intervals for Mean',
         xaxis = list(title = 'Confidence Level'),
         yaxis = list(title = 'Mean Estimate'))

# Combine into subplots: Z-dist, T-dist, and CI
fig_combined <- subplot(fig_z, fig_t, fig_ci, nrows = 3, shareX = FALSE, titleY = TRUE) %>%
  layout(title = 'Visualizations of Z-Distribution, T-Distribution, and Confidence Intervals')

# Display the plot
fig_combined
```


###   Interpretation in a Business Analytics Context

The confidence intervals provide a range of plausible values for the true average number of daily transactions per user (μ) after the new feature launch.

1. 90% Confidence Interval (12.074,13.126):

- We are 90% confident that the true average number of daily transactions per user is between 12.074 and 13.126. This is the most precise (narrowest) estimate.

2. 95% Confidence Interval (11.973,13.227):

- This is the standard interval used in most scientific and business contexts. We are 95% confident that the true average is between 11.973 and 13.227. The margin of error is 0.627 transactions.

3. 99% Confidence Interval (11.776,13.424):

- This is the most reliable (highest confidence) estimate. We are 99% confident that the true average is between 11.776 and 13.424.

Business Insight: The key takeaway is the trade-off between Confidence and Precision:

- To be more confident (e.g., 99% confidence), the e-commerce platform must accept a wider interval (lower precision). This means the true average could be as low as 11.776 or as high as 13.424.

- To have a more precise estimate (e.g., 90% confidence), the platform must accept a lower confidence level.

Even the widest (99%) interval has a lower bound of 11.776, so the platform can be highly confident that the true average transaction rate per user exceeds, for instance, 11.0 (if that were a benchmark). The 95% CI is a good balance, suggesting that the average after the feature launch lies between approximately 12.0 and 13.2 daily transactions per user.

---


##   Case Study 2 

**Confidence Interval for Mean, $\sigma$ Unknown:** A **UX Research team** analyzes **task completion time (in minutes)** for a new mobile application. The data are collected from **12 users**:

$$
8.4,\; 7.9,\; 9.1,\; 8.7,\; 8.2,\; 9.0,\;
7.8,\; 8.5,\; 8.9,\; 8.1,\; 8.6,\; 8.3
$$

**Tasks:**

1. Identify the **appropriate statistical test** and explain why.
2. Compute the Confidence Intervals for:
   - $90\%$
   - $95\%$
   - $99\%$
3. Visualize the three intervals on a single plot.
4. Explain how **sample size and confidence level** influence the interval width.

---


###    Identify the appropriate statistical test and explain why.

Appropriate Statistical Test: Confidence Interval for the Mean using the $t$-distribution (often referred to as a $t$-interval).

Explanation: The $t$-distribution is the appropriate choice for constructing the confidence interval for the population mean ($\mu$) for two primary reasons:

1. Population Standard Deviation is Unknown ($\sigma$ unknown): When $\sigma$ is unknown, we must use the sample standard deviation ($s$) as an estimate.

2. Small Sample Size ($n < 30$): The sample size is $n=12$. For small samples with an unknown population standard deviation, the $t$-distribution provides a more accurate model of the sampling distribution of the mean than the standard normal ($z$) distribution.

The formula for the confidence interval for the mean is:
$$
\bar{x} \pm t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}}
$$

where $t_{\alpha/2, n-1}$ is the critical $t$-value with $n-1$ degrees of freedom.

###   Compute the Confidence Intervals


<div class="stat-box">
  <h3>Sample Statistics</h3>
  <ul>
    <li><strong>Sample Size (n):</strong> 12</li>
    <li><strong>Degrees of Freedom (df):</strong> 11</li>
    <li><strong>Sample Mean (x̄):</strong> 8.4500 minutes</li>
    <li><strong>Sample Standard Deviation (s):</strong> 0.4079 minutes</li>
    <li><strong>Standard Error (s / √n):</strong> 0.1179 minutes</li>
  </ul>
</div>


The computed confidence intervals for the population mean task completion time ($\mu$) are:
```{r,echo=FALSE,message=FALSE,warning=FALSE}
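# Build the confidence interval table (minutes) from the raw sample
library(knitr)

times <- c(8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3)
conf_levels <- c(0.90, 0.95, 0.99)

# Two-sided critical t-values with n - 1 degrees of freedom
t_crit <- qt(1 - (1 - conf_levels) / 2, df = length(times) - 1)
margin <- t_crit * sd(times) / sqrt(length(times))

ci_table_minutes <- data.frame(
  Confidence_Level = c("90%", "95%", "99%"),
  Lower_Bound_min = mean(times) - margin,
  Upper_Bound_min = mean(times) + margin
)

# Display the table
kable(
  ci_table_minutes,
  caption = "Confidence Interval for Completion Time (Minutes)",
  digits = 4,
  align = "c"
)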
```

<div style="
  border: 2px solid #2c7be5;
  background-color: #f1f6ff;
  padding: 15px;
  border-radius: 8px;
  margin-top: 15px;
  margin-bottom: 15px;
  font-size: 14px;
">
  <strong>Note:</strong><br><br>
  The critical <em>t</em>-values used in the calculation were:
  <ul style="margin-top: 8px;">
    <li>
      <em>t</em><sub>0.05, 11</sub> (90% CI): <strong>1.7959</strong>
    </li>
    <li>
      <em>t</em><sub>0.025, 11</sub> (95% CI): <strong>2.2010</strong>
    </li>
    <li>
      <em>t</em><sub>0.005, 11</sub> (99% CI): <strong>3.1058</strong>
    </li>
  </ul>
</div>
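
The sample statistics and the $t$-intervals can be computed directly from the raw data. The following is a minimal sketch in base R; the variable names (`completion_times`, `t_interval`, and so on) are illustrative and not part of the original analysis, and `t.test()` is used only as a cross-check.

```{r}
# Task completion times (minutes) for the 12 users
completion_times <- c(8.4, 7.9, 9.1, 8.7, 8.2, 9.0,
                      7.8, 8.5, 8.9, 8.1, 8.6, 8.3)

n_users  <- length(completion_times)
x_bar    <- mean(completion_times)
s_sample <- sd(completion_times)          # sample SD (n - 1 denominator)
se_mean  <- s_sample / sqrt(n_users)

# Illustrative helper: two-sided t-interval at a given confidence level
t_interval <- function(level) {
  t_crit <- qt(1 - (1 - level) / 2, df = n_users - 1)
  c(lower = x_bar - t_crit * se_mean, upper = x_bar + t_crit * se_mean)
}

ci_t <- t(sapply(c("90%" = 0.90, "95%" = 0.95, "99%" = 0.99), t_interval))
round(ci_t, 4)

# Cross-check the 95% interval with the built-in one-sample t procedure
t.test(completion_times, conf.level = 0.95)$conf.int
```

Computing the bounds from the data, rather than typing them in, keeps the table consistent with the sample statistics.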

###   Visualize the three intervals on a single plot.

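The first plot below compares the standard normal (Z) density with $t$-distributions for several degrees of freedom, showing the heavier tails of the $t$-distribution when the sample is small. The second plot shows the sampling distribution of the mean, centered at the sample mean (the dashed vertical line), with the 90%, 95%, and 99% confidence intervals shaded under the Z and $t$ curves.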

```{r,echo=FALSE}
# Install and load necessary packages if not already installed
# install.packages("plotly")
library(plotly)

# Define the range of x values for the plots
x <- seq(-4, 4, length.out = 1000)

# Z-distribution (Standard Normal)
z_density <- dnorm(x, mean = 0, sd = 1)

# T-distributions with different degrees of freedom
t_df5 <- dt(x, df = 5)
t_df10 <- dt(x, df = 10)
t_df30 <- dt(x, df = 30)

# Create the plot using plotly
plot <- plot_ly() %>%
  # Add Z-distribution trace
  add_trace(x = x, y = z_density, type = 'scatter', mode = 'lines', 
            name = 'Z-distribution (Normal)', line = list(color = 'blue', width = 3)) %>%
  # Add T-distribution traces
  add_trace(x = x, y = t_df5, type = 'scatter', mode = 'lines', 
            name = 'T-distribution (df=5)', line = list(color = 'red', width = 2, dash = 'dash')) %>%
  add_trace(x = x, y = t_df10, type = 'scatter', mode = 'lines', 
            name = 'T-distribution (df=10)', line = list(color = 'green', width = 2, dash = 'dot')) %>%
  add_trace(x = x, y = t_df30, type = 'scatter', mode = 'lines', 
            name = 'T-distribution (df=30)', line = list(color = 'orange', width = 2, dash = 'dashdot')) %>%
  # Layout settings
  layout(title = 'Comparison of Z and T Distributions',
         xaxis = list(title = 'Value'),
         yaxis = list(title = 'Density'),
         legend = list(title = list(text = 'Distributions')))

# Display the plot
plot
```

```{r,echo=FALSE}
# Install and load necessary packages if not already installed
# install.packages("plotly")
library(plotly)

# Confidence intervals computed from the raw sample (t-interval, df = n - 1)
times <- c(8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3)
sample_mean <- mean(times)
scale <- sd(times) / sqrt(length(times))  # Standard error (s / sqrt(n))
df <- length(times) - 1                   # Degrees of freedom for t-distribution

confidence_levels <- c("90%", "95%", "99%")
t_crit <- qt(c(0.95, 0.975, 0.995), df = df)
lower_bounds <- sample_mean - t_crit * scale
upper_bounds <- sample_mean + t_crit * scale

# Define x range around the mean
x <- seq(sample_mean - 4*scale, sample_mean + 4*scale, length.out = 1000)

# Z-distribution (Normal) density
z_density <- dnorm(x, mean = sample_mean, sd = scale)

# T-distribution density
t_density <- dt((x - sample_mean)/scale, df = df) / scale

# Create the plot
plot <- plot_ly()

# Add Z-distribution trace
plot <- plot %>% add_trace(x = x, y = z_density, type = 'scatter', mode = 'lines', 
                           name = 'Z-distribution (Normal)', line = list(color = 'blue', width = 2))

# Add T-distribution trace (label follows the df actually used)
plot <- plot %>% add_trace(x = x, y = t_density, type = 'scatter', mode = 'lines', 
                           name = paste0('T-distribution (df=', df, ')'), line = list(color = 'red', width = 2))

# Add shaded areas for confidence intervals (for Z-distribution)
for (i in 1:length(confidence_levels)) {
  # Find indices for shading
  idx <- x >= lower_bounds[i] & x <= upper_bounds[i]
  x_shade <- x[idx]
  y_shade <- z_density[idx]
  
  plot <- plot %>% add_trace(x = c(x_shade, rev(x_shade)), y = c(y_shade, rep(0, length(y_shade))), 
                             type = 'scatter', mode = 'lines', fill = 'tozeroy', 
                             fillcolor = 'rgba(0,0,255,0.3)', line = list(color = 'transparent'),
                             name = paste(confidence_levels[i], 'CI (Z)'), showlegend = TRUE)
}

# Add shaded areas for confidence intervals (for T-distribution)
for (i in 1:length(confidence_levels)) {
  # Find indices for shading
  idx <- x >= lower_bounds[i] & x <= upper_bounds[i]
  x_shade <- x[idx]
  y_shade <- t_density[idx]
  
  plot <- plot %>% add_trace(x = c(x_shade, rev(x_shade)), y = c(y_shade, rep(0, length(y_shade))), 
                             type = 'scatter', mode = 'lines', fill = 'tozeroy', 
                             fillcolor = 'rgba(255,0,0,0.3)', line = list(color = 'transparent'),
                             name = paste(confidence_levels[i], 'CI (T)'), showlegend = TRUE)
}

# Add vertical line for sample mean
plot <- plot %>% add_trace(x = rep(sample_mean, 2), y = c(0, max(c(z_density, t_density))), 
                           type = 'scatter', mode = 'lines', 
                           line = list(color = 'black', width = 2, dash = 'dash'), 
                           name = paste0('Sample Mean (', round(sample_mean, 2), ' min)'))

# Layout settings
plot <- plot %>% layout(title = 'Z and T Distributions with Confidence Intervals',
                        xaxis = list(title = 'Time (minutes)', range = c(min(x), max(x))),
                        yaxis = list(title = 'Density'),
                        legend = list(title = list(text = 'Legend')))

# Display the plot
plot
```

###    Factors Influencing Interval Width

The width of a confidence interval is twice the Margin of Error ($ME = t^* \cdot \frac{s}{\sqrt{n}}$), so anything that changes the margin of error changes the width.

- Confidence Level:

Effect: As the confidence level increases (e.g., from 90% to 99%), the interval becomes wider.

Logic: To be more certain that the interval contains the true population mean, we must encompass a larger range of possible values. This is reflected in a larger critical value ($t^*$).

- Sample Size ($n$):

Effect: As the sample size increases, the interval becomes narrower (more precise).

Logic: Increasing $n$ reduces the Standard Error ($\frac{s}{\sqrt{n}}$). With more data, our estimate of the mean becomes more stable and reliable, allowing for a tighter range of estimation.
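
Both effects can be seen numerically. The sketch below is a minimal illustration in base R; the helper `ci_width()`, the fixed standard deviation `s_assumed = 0.42`, and the chosen sample sizes are assumptions made for the example, not values from the case study.

```{r}
# Illustrative helper: full width of a two-sided t-interval, 2 * t* * s / sqrt(n)
ci_width <- function(s, n, level) {
  2 * qt(1 - (1 - level) / 2, df = n - 1) * s / sqrt(n)
}

s_assumed <- 0.42                      # assumed sample SD, held fixed

# Rows: sample sizes (each four times the previous); columns: confidence levels
width_grid <- outer(
  c(12, 48, 192),
  c(0.90, 0.95, 0.99),
  function(n, level) ci_width(s_assumed, n, level)
)
dimnames(width_grid) <- list(n = c("12", "48", "192"),
                             level = c("90%", "95%", "99%"))
round(width_grid, 4)
```

Reading across a row shows the width growing with the confidence level; reading down a column shows it shrinking with $n$. Quadrupling $n$ cuts the width slightly more than in half, because the standard error halves and the critical value $t^*$ also shrinks as the degrees of freedom grow.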

---

##   Case Study 3 

**Confidence Interval for a Proportion, A/B Testing:** A data science team runs an **A/B test** on a new *Call-To-Action (CTA)* button design. The experiment yields:

$$
\begin{eqnarray*}
n &=& 400 \quad \text{(total users)} \\
x &=& 156 \quad \text{(users who clicked the CTA)}
\end{eqnarray*}
$$

**Tasks:**

1. Compute the **sample proportion** $\hat{p}$.
2. Compute Confidence Intervals for the proportion at:
   - $90\%$
   - $95\%$
   - $99\%$
3. Visualize and compare the three intervals.
4. Explain how confidence level affects decision-making in product experiments.

---

The given data is:
$$
\begin{eqnarray*}
n &=& 400 \quad \text{(total users)} \\
x &=& 156 \quad \text{(users who clicked the CTA)}
\end{eqnarray*}
$$

###    Compute the Sample Proportion ($\hat{p}$)

The sample proportion $\hat{p}$ is the point estimate for the true population proportion.

$$
\hat{p} = \frac{x}{n} = \frac{156}{400} = 0.3900
$$

The sample proportion of users who clicked the new CTA design is $39.00\%$.

###  Compute Confidence Intervals

The confidence intervals (CIs) are calculated using the formula:
$$
\text{CI} = \hat{p} \pm Z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}
$$

The standard error ($\text{SE}$) is:
$$
\text{SE} = \sqrt{\frac{0.3900(1 - 0.3900)}{400}} \approx 0.0244
$$
```{r,echo=FALSE,message=FALSE,warning=FALSE}
# Build the confidence interval table
ci_table <- data.frame(
  Confidence_Level = c("90%", "95%", "99%"),
  Z_score = c(1.6449, 1.9600, 2.5758),
  Margin_of_Error = c(0.0401, 0.0478, 0.0628),
  Confidence_Interval = c(
    "[0.3499, 0.4301]",
    "[0.3422, 0.4378]",
    "[0.3272, 0.4528]"
  )
)

# Display the table
library(knitr)

kable(
  ci_table,
  caption = "Confidence Intervals at Different Confidence Levels",
  align = "c"
)
```
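
The proportion intervals follow directly from $n$ and $x$. The sketch below is a minimal base R version of the normal-approximation formula; the variable names (`cta_clicks`, `cta_users`, and so on) are chosen for illustration only.

```{r}
cta_clicks <- 156
cta_users  <- 400

p_hat_cta <- cta_clicks / cta_users
se_cta    <- sqrt(p_hat_cta * (1 - p_hat_cta) / cta_users)

# Two-sided critical z-values: alpha is split across both tails
cta_levels <- c(0.90, 0.95, 0.99)
z_cta      <- qnorm(1 - (1 - cta_levels) / 2)
moe_cta    <- z_cta * se_cta

data.frame(
  level  = paste0(cta_levels * 100, "%"),
  z      = round(z_cta, 4),
  margin = round(moe_cta, 4),
  lower  = round(p_hat_cta - moe_cta, 4),
  upper  = round(p_hat_cta + moe_cta, 4)
)
```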


###  Visualize and Compare the Three Intervals

```{r,echo=FALSE,message=FALSE,warning=FALSE}
# Load library
library(plotly)

# Z-distribution data
z <- seq(-4, 4, length.out = 1000)
density <- dnorm(z)

# Build the base plot
p <- plot_ly(
  x = ~z,
  y = ~density,
  type = "scatter",
  mode = "lines",
  line = list(width = 2),
  name = "Standard Normal Distribution"
)

# Add critical Z lines for each confidence level
p <- p %>%
  add_segments(
    x = -1.6449, xend = -1.6449,
    y = 0, yend = dnorm(-1.6449),
    line = list(dash = "dash", width = 2),
    name = "90% CI"
  ) %>%
  add_segments(
    x = 1.6449, xend = 1.6449,
    y = 0, yend = dnorm(1.6449),
    line = list(dash = "dash", width = 2),
    showlegend = FALSE
  ) %>%
  add_segments(
    x = -1.9600, xend = -1.9600,
    y = 0, yend = dnorm(-1.9600),
    line = list(dash = "dot", width = 2),
    name = "95% CI"
  ) %>%
  add_segments(
    x = 1.9600, xend = 1.9600,
    y = 0, yend = dnorm(1.9600),
    line = list(dash = "dot", width = 2),
    showlegend = FALSE
  ) %>%
  add_segments(
    x = -2.5758, xend = -2.5758,
    y = 0, yend = dnorm(-2.5758),
    line = list(dash = "longdash", width = 2),
    name = "99% CI"
  ) %>%
  add_segments(
    x = 2.5758, xend = 2.5758,
    y = 0, yend = dnorm(2.5758),
    line = list(dash = "longdash", width = 2),
    showlegend = FALSE
  )

# Layout
p <- p %>%
  layout(
    title = "Z-Distribution with Confidence Intervals",
    xaxis = list(title = "Z-score"),
    yaxis = list(title = "Density"),
    legend = list(orientation = "h", x = 0.1, y = -0.2)
  )

# Display the plot
p

```
The chart above marks the critical Z-values for the 90%, 95%, and 99% confidence levels on the standard normal curve. A higher confidence level pushes the critical value further into the tails, which widens the interval.
All three intervals are centered at the sample proportion ($\hat{p}=0.3900$).

###  Explain How Confidence Level Affects Decision-Making

The confidence level directly impacts the precision and certainty of your result, which is crucial in product experimentation like A/B testing.
```{r,echo=FALSE}
# Build the table relating confidence level to width, precision, and error risk
ci_relation <- data.frame(
  Confidence_Level = c("Higher (99%)", "Lower (90%)"),
  Interval_Width = c("Wider", "Narrower"),
  Precision = c("Less precise", "More precise"),
  Certainty = c("More certain", "Less certain"),
  Risk_of_Type_I_Error = c(
    "Lower risk of Type I error",
    "Higher risk of Type I error"
  )
)

# Display the table
library(knitr)

kable(
  ci_relation,
  caption = "Relationship Between Confidence Level, Interval Width, Precision, and Error Risk",
  align = "c"
)

```

1. Defining a Winner (Statistical Significance): In A/B testing, a common decision rule is to declare a "winner" if the confidence interval of the difference between the two variants does not include zero.

2. Higher Confidence ($\mathbf{99\%}$): The interval is wider, making it harder to rule out the null hypothesis (e.g., that the new design performs no differently from the old one). A larger observed difference, or a larger sample, is needed to reach statistical significance. This is the most conservative of the three levels, but it more often leads to inconclusive results and longer testing times.

3. Lower Confidence ($\mathbf{90\%}$): The interval is narrower, making it easier to achieve statistical significance. However, this increases the risk of a Type I error (a false positive): declaring the new CTA a winner when it is actually no better, or even worse, than the original.

4. Product Standard: Most data science and product teams default to a $95\%$ confidence level (corresponding to an $\alpha=0.05$ significance level). This is considered a good balance: the procedure captures the true value in $95\%$ of repeated experiments, without requiring the larger sample size or longer test duration needed for a $99\%$ confidence level.
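
The rule of declaring a winner when the interval for the difference excludes zero can be sketched in R. The case study reports only the new design (156 of 400), so the control arm below (`clicks_old = 130`, `users_old = 400`) is a hypothetical value invented purely for illustration, and all variable names are assumptions.

```{r}
# New CTA design (from the case study)
clicks_new <- 156
users_new  <- 400
# HYPOTHETICAL control arm, assumed for illustration only
clicks_old <- 130
users_old  <- 400

p_new <- clicks_new / users_new
p_old <- clicks_old / users_old
diff_hat <- p_new - p_old

# Standard error of the difference between two independent proportions
se_diff <- sqrt(p_new * (1 - p_new) / users_new +
                p_old * (1 - p_old) / users_old)

ab_levels <- c(0.90, 0.95, 0.99)
z_ab <- qnorm(1 - (1 - ab_levels) / 2)

ab_result <- data.frame(
  level = paste0(ab_levels * 100, "%"),
  lower = diff_hat - z_ab * se_diff,
  upper = diff_hat + z_ab * se_diff
)
# The variant "wins" at a given level only if the interval excludes zero
ab_result$excludes_zero <- ab_result$lower > 0 | ab_result$upper < 0
ab_result
```

With these assumed numbers, the interval excludes zero at 90% but not at 95% or 99%, so the same experiment would be called a win or inconclusive depending on the confidence level chosen beforehand.
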
```{r,echo=FALSE}
# Install necessary packages if not already installed
# install.packages("plotly")

# Load the plotly library
library(plotly)

# Define the range of x values for the plot
x <- seq(-4, 4, length.out = 1000)

# Compute the density of the standard normal distribution (Z-distribution)
z_density <- dnorm(x, mean = 0, sd = 1)

# Compute the density of the t-distribution for different degrees of freedom
t_df_5 <- dt(x, df = 5)  # t-distribution with 5 df
t_df_10 <- dt(x, df = 10)  # t-distribution with 10 df
t_df_30 <- dt(x, df = 30)  # t-distribution with 30 df

# Create the plot using plotly
plot <- plot_ly() %>%
  # Add the Z-distribution (normal)
  add_trace(x = x, y = z_density, type = 'scatter', mode = 'lines', 
            name = 'Z-distribution (Normal)', line = list(color = 'blue', width = 3)) %>%
  # Add t-distributions with different df
  add_trace(x = x, y = t_df_5, type = 'scatter', mode = 'lines', 
            name = 't-distribution (df=5)', line = list(color = 'red', width = 2, dash = 'dash')) %>%
  add_trace(x = x, y = t_df_10, type = 'scatter', mode = 'lines', 
            name = 't-distribution (df=10)', line = list(color = 'green', width = 2, dash = 'dot')) %>%
  add_trace(x = x, y = t_df_30, type = 'scatter', mode = 'lines', 
            name = 't-distribution (df=30)', line = list(color = 'orange', width = 2, dash = 'dashdot')) %>%
  # Layout settings
  layout(title = 'Comparison of Z-distribution and t-distributions',
         xaxis = list(title = 'x'),
         yaxis = list(title = 'Density'),
         legend = list(x = 0.7, y = 0.9))

# Display the plot
plot
```

---

##   Case Study 4 

**Precision Comparison (Z-Test vs t-Test):** Two data teams measure **API latency (in milliseconds)** under different conditions.

$$
\begin{eqnarray*}
\text{Team A:} \\
n &=& 36 \quad \text{(sample size)} \\
\bar{x} &=& 210 \quad \text{(sample mean)} \\
\sigma &=& 24 \quad \text{(known population standard deviation)} \\[6pt]
\text{Team B:} \\
n &=& 36 \quad \text{(sample size)} \\
\bar{x} &=& 210 \quad \text{(sample mean)} \\
s &=& 24 \quad \text{(sample standard deviation)}
\end{eqnarray*}
$$

**Tasks**

1. Identify the statistical test used by each team.
2. Compute Confidence Intervals for **90%, 95%, and 99%**.
3. Create a visualization comparing all intervals.
4. Explain why the **interval widths differ**, even with similar data.

---


###   Statistical Test Identification

The choice of statistical test for the mean depends on whether the population standard deviation ($\sigma$) is known and the sample size ($n$).

```{r,echo=FALSE}
# Build the test-selection table
test_selection <- data.frame(
  Team = c("Team A", "Team B"),
  Given_Standard_Deviation = c(
    "σ = 24 (Known population SD)",
    "s = 24 (Sample SD)"
  ),
  Test_Used = c(
    "Z-Test (or Z-interval)",
    "t-Test (or t-interval)"
  ),
  Justification = c(
    "Since the population standard deviation (σ) is known and the sample size (n = 36) is large (n ≥ 30), the Z-distribution is appropriate.",
    "Since the population standard deviation (σ) is unknown and only the sample standard deviation (s) is available, the t-distribution must be used."
  )
)

# Display the table
library(knitr)

kable(
  test_selection,
  caption = "Choice of Statistical Test Based on the Available Standard Deviation",
  align = "c"
)
```
###    Confidence Interval Computation

The formula for the Confidence Interval (CI) for the population mean ($\mu$) is:

Team A (Z-Interval):

$$
\text{CI} = \bar{x} \pm Z^* \left(\frac{\sigma}{\sqrt{n}}\right)
$$

Team B (t-Interval):

$$
\text{CI} = \bar{x} \pm t^* \left(\frac{s}{\sqrt{n}}\right)
$$

Common Parameters:

$$
\begin{eqnarray*}
\text{Team A:} \\
n &=& 36 \quad \text{(sample size)} \\
\bar{x} &=& 210 \quad \text{(sample mean)} \\
\sigma &=& 24 \quad \text{(known population standard deviation)} \\[6pt]
\text{Team B:} \\
n &=& 36 \quad \text{(sample size)} \\
\bar{x} &=& 210 \quad \text{(sample mean)} \\
s &=& 24 \quad \text{(sample standard deviation)}
\end{eqnarray*}
$$

Standard Error (SE), which is the same for both teams because $\sigma$ and $s$ are both 24:

$$
\text{SE} = \frac{24}{\sqrt{36}} = \frac{24}{6} = 4
$$


**Team A (Z-Interval): $\sigma$ is known.** We use the critical Z-values ($Z^*$) for the specified confidence levels.

```{r,echo=FALSE}
# Build the Z-based confidence interval table
ci_table_zstar <- data.frame(
  Confidence_Level = c("90%", "95%", "99%"),
  Z_Critical_Value = c(1.645, 1.960, 2.576),
  Margin_of_Error = c(
    "1.645 × 4 = 6.58",
    "1.960 × 4 = 7.84",
    "2.576 × 4 = 10.30"
  ),
  Confidence_Interval = c(
    "[203.42, 216.58]",
    "[202.16, 217.84]",
    "[199.70, 220.30]"
  ),
  Interval_Width = c(13.16, 15.68, 20.60)
)

# Display the table
library(knitr)

kable(
  ci_table_zstar,
  caption = "Confidence Intervals with Margin of Error (Z* × 4)",
  align = "c"
)
```

**Team B (t-Interval): $\sigma$ is unknown.** We use the critical t-values ($t^*$) with degrees of freedom $df = n - 1 = 36 - 1 = 35$.

```{r,echo=FALSE}
# Build the t-based confidence interval table
ci_table_tstar <- data.frame(
  Confidence_Level = c("90%", "95%", "99%"),
  t_Critical_Value_df35 = c(1.690, 2.030, 2.724),
  Margin_of_Error = c(
    "1.690 × 4 = 6.76",
    "2.030 × 4 = 8.12",
    "2.724 × 4 = 10.90"
  ),
  Confidence_Interval = c(
    "[203.24, 216.76]",
    "[201.88, 218.12]",
    "[199.10, 220.90]"
  ),
  Interval_Width = c(13.52, 16.24, 21.80)
)

# Display the table
library(knitr)

kable(
  ci_table_tstar,
  caption = "Confidence Intervals Using the t-Distribution (df = 35)",
  align = "c"
)

```
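
Both sets of intervals can be computed side by side from the shared parameters. The sketch below uses base R only; the variable names (`latency_mean`, `latency_ci`, and so on) are illustrative assumptions rather than part of the original analysis.

```{r}
latency_mean <- 210
latency_n    <- 36
latency_sd   <- 24                      # sigma for Team A, s for Team B
latency_se   <- latency_sd / sqrt(latency_n)

lat_levels <- c(0.90, 0.95, 0.99)
upper_tail <- 1 - (1 - lat_levels) / 2  # two-sided intervals

z_star <- qnorm(upper_tail)                  # Team A: sigma known
t_star <- qt(upper_tail, df = latency_n - 1) # Team B: sigma estimated by s

latency_ci <- data.frame(
  z_lower = latency_mean - z_star * latency_se,
  z_upper = latency_mean + z_star * latency_se,
  t_lower = latency_mean - t_star * latency_se,
  t_upper = latency_mean + t_star * latency_se,
  row.names = paste0(lat_levels * 100, "%")
)
latency_ci$extra_width_t <- 2 * (t_star - z_star) * latency_se
round(latency_ci, 2)
```

The `extra_width_t` column isolates how much wider each t-interval is than the matching Z-interval; because the standard error is identical, the gap comes entirely from the critical values.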


###    Interval Comparison Visualization

The visualization below shows that:

- At every confidence level, the t-intervals (Team B) are slightly wider than the Z-intervals (Team A).
- All intervals are centered at the sample mean of 210 ms.
- The width of the intervals increases as the confidence level increases (e.g., the 99% interval is widest, the 90% is narrowest).

```{r,echo=FALSE,message=FALSE,warning=FALSE}
# Load necessary libraries
library(plotly)
library(dplyr)

# Data from the case
mean_val <- 210
n <- 36
sigma <- 24  # For Team A (Z-distribution)
s <- 24      # For Team B (t-distribution)
df <- n - 1  # 35 for t-distribution

# Confidence levels
conf_levels <- c(0.90, 0.95, 0.99)

# Z-scores and t-scores
z_scores <- qnorm((1 + conf_levels) / 2)
t_scores <- qt((1 + conf_levels) / 2, df)

# Calculate intervals
intervals <- data.frame(
  conf = rep(conf_levels, each = 2),
  type = rep(c("Z (Team A)", "t (Team B)"), times = length(conf_levels)),
  lower = numeric(length(conf_levels) * 2),
  upper = numeric(length(conf_levels) * 2)
)

for (i in 1:length(conf_levels)) {
  conf <- conf_levels[i]
  z_margin <- z_scores[i] * (sigma / sqrt(n))
  t_margin <- t_scores[i] * (s / sqrt(n))
  intervals$lower[(i-1)*2 + 1] <- mean_val - z_margin
  intervals$upper[(i-1)*2 + 1] <- mean_val + z_margin
  intervals$lower[(i-1)*2 + 2] <- mean_val - t_margin
  intervals$upper[(i-1)*2 + 2] <- mean_val + t_margin
}

# Create the interval comparison plot using Plotly:
# one horizontal segment per interval, drawn from its lower to its upper bound
intervals$label <- paste0(intervals$conf * 100, "% ", intervals$type)

fig_intervals <- plot_ly(
  data = intervals,
  color = ~type,
  colors = c("Z (Team A)" = "blue", "t (Team B)" = "red")
) %>%
  add_segments(
    x = ~lower, xend = ~upper,
    y = ~label, yend = ~label,
    line = list(width = 4)
  ) %>%
  add_markers(
    x = ~lower, y = ~label,
    marker = list(size = 8),
    showlegend = FALSE
  ) %>%
  add_markers(
    x = ~upper, y = ~label,
    marker = list(size = 8),
    showlegend = FALSE
  ) %>%
  layout(
    title = "Confidence Interval Comparison: Z (Team A) vs t (Team B)",
    xaxis = list(title = "API Latency (ms)", range = c(195, 225)),
    yaxis = list(title = "Confidence Level",
                 categoryorder = "array", categoryarray = rev(intervals$label)),
    annotations = list(
      list(x = mean_val, y = 0.5, text = "Sample Mean: 210 ms", showarrow = FALSE, xref = "x", yref = "paper", font = list(size = 12))
    )
  )

# Now, create a distribution plot for Z and t (sampling distributions of the mean)
x_vals <- seq(190, 230, length.out = 1000)
se <- sigma / sqrt(n)  # Standard error for both (since sigma = s = 24)
z_density <- dnorm(x_vals, mean = mean_val, sd = se)
t_density <- dt((x_vals - mean_val) / se, df) / se  # Scaled t-density

dist_data <- data.frame(
  x = rep(x_vals, 2),
  density = c(z_density, t_density),
  type = rep(c("Z Distribution (Team A)", "t Distribution (Team B)"), each = length(x_vals))
)

fig_dist <- plot_ly(
  data = dist_data,
  x = ~x,
  y = ~density,
  color = ~type,
  colors = c("Z Distribution (Team A)" = "blue", "t Distribution (Team B)" = "red"),
  type = "scatter",
  mode = "lines",
  line = list(width = 2)
) %>%
  layout(
    title = "Sampling Distributions: Z vs t",
    xaxis = list(title = "Sample Mean (ms)", range = c(190, 230)),
    yaxis = list(title = "Density")
  )

# Combine both plots into a subplot for a comprehensive visualization
fig <- subplot(fig_dist, fig_intervals, nrows = 2, shareX = FALSE, titleY = TRUE) %>%
  layout(
    title = "Distribution and Interval Comparison: Z vs t Distributions",
    showlegend = TRUE
  )

# Display the plot
fig
```

###    Explanation of Interval Width Difference

The interval widths differ because of the underlying probability distributions used: the Standard Normal (Z) Distribution versus the Student's t-Distribution.

####    $\sigma$ Known (Team A $\rightarrow$ Z-Test)

- The Z-test is used when the population standard deviation ($\sigma$) is known.

- Since $\sigma$ is a fixed, known value, the estimate of the standard error ($\sigma/\sqrt{n}$) is highly certain and does not add extra variability to the analysis.

- The critical $Z^*$ values are fixed based on the confidence level.

####    $\sigma$ Unknown (Team B $\rightarrow$ t-Test)

- The t-test is used when the population standard deviation ($\sigma$) is unknown, and we must substitute the sample standard deviation ($s$) as an estimate.

- The sample standard deviation ($s$) is itself an estimate that varies from sample to sample. This introduces an extra source of uncertainty into the standard error estimate.

- To account for this added uncertainty, the t-distribution has heavier tails (more spread out) than the Z-distribution.

- This results in larger critical values ($t^* > Z^*$) and, consequently, a larger Margin of Error (ME) and wider confidence intervals for the t-test compared to the Z-test at the same confidence level.

In summary: The t-test requires a wider interval (is less precise) to achieve the same confidence level as the Z-test because it must compensate for the additional uncertainty introduced by estimating the population standard deviation ($\sigma$) with the sample standard deviation ($s$).
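
The size of this penalty depends on the degrees of freedom. The following minimal sketch compares the 95% critical values of the two distributions; the grid of degrees of freedom (`df_grid`) is an arbitrary choice for illustration.

```{r}
df_grid <- c(5, 11, 35, 100, 1000)

crit_compare <- data.frame(
  df        = df_grid,
  t_star_95 = qt(0.975, df = df_grid),
  z_star_95 = qnorm(0.975)
)
# Ratio of critical values = ratio of interval widths when SE is the same
crit_compare$ratio <- crit_compare$t_star_95 / crit_compare$z_star_95
round(crit_compare, 4)
```

The ratio is always above 1 and shrinks toward 1 as the degrees of freedom grow. At $df = 35$ it is about 1.036, which matches the roughly 3.6% wider 95% interval for Team B (16.24 versus 15.68).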


---

##    Case Study 5

**One-Sided Confidence Interval:** A **Software as a Service (SaaS)** company wants to ensure that **at least 70% of weekly active users** utilize a premium feature.

From the experiment:

$$
\begin{eqnarray*}
n &=& 250 \quad \text{(total users)} \\
x &=& 185 \quad \text{(active premium users)}
\end{eqnarray*}
$$

Management is only interested in the **lower bound** of the estimate.

**Tasks:**

1. Identify the **type of Confidence Interval** and the appropriate test.
2. Compute the **one-sided lower Confidence Interval** at:
   - $90\%$
   - $95\%$
   - $99\%$
3. Visualize the lower bounds for all confidence levels.
4. Determine whether the **70% target** is statistically satisfied.

---

The given data is:
$$
\begin{eqnarray*}
n &=& 250 \quad \text{(total users)} \\
x &=& 185 \quad \text{(active premium users)} \\
\hat{p} &=& \frac{x}{n} = \frac{185}{250} = 0.74
\end{eqnarray*}
$$

The target proportion to ensure is $p_0 = 0.70$.

The standard error of the sample proportion ($\hat{p}$) is:
$$
SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.74(1-0.74)}{250}} \approx 0.0277
$$


###    Identify the Type of Confidence Interval and the Appropriate Test

 
Type of Confidence Interval: One-Sided Lower Confidence Interval for a Population Proportion. The company is only interested in the lower bound to ensure the feature usage is at least $70\%$.

Appropriate Test/Method: The appropriate method is using the Z-test for a Population Proportion (or the Normal Approximation method for confidence intervals) because the sample size is large enough to satisfy the normal approximation conditions ($n\hat{p} = 185 > 10$ and $n(1-\hat{p}) = 65 > 10$).

###    Compute the One-Sided Lower Confidence Interval

The formula for the one-sided lower confidence bound is:
$$
\text{Lower Bound} = \hat{p} - Z_{1-\alpha} \cdot SE
$$

```{r,echo=FALSE,message=FALSE,warning=FALSE}
# Build the one-sided confidence interval table
one_sided_ci <- data.frame(
  Confidence_Level = c("90%", "95%", "99%"),
  Alpha = c(0.10, 0.05, 0.01),
  Z_1_minus_Alpha = c(1.282, 1.645, 2.326),
  Lower_Bound = c(0.7044, 0.6944, 0.6755)
)

# Display the table
library(knitr)

kable(
  one_sided_ci,
  caption = "One-Sided Confidence Interval (Lower Bound)",
  align = "c"
)

```

**Detailed Results:**
```{r,echo=FALSE}
# Build the one-sided confidence interval table (higher-precision version)
one_sided_ci_precise <- data.frame(
  Confidence_Level = c("90%", "95%", "99%"),
  Z_1_minus_Alpha = c(1.281552, 1.644854, 2.326348),
  Lower_Bound_CI = c(0.704448, 0.694369, 0.675463)
)

# Display the table
library(knitr)

kable(
  one_sided_ci_precise,
  caption = "One-Sided Confidence Interval (Lower Bound)",
  digits = 6,
  align = "c"
)

```
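
The lower bounds can be computed from $n$ and $x$ and compared with the target in one step. This is a minimal base R sketch; the variable names (`premium_users`, `usage_target`, and so on) are illustrative assumptions.

```{r}
premium_users <- 185
weekly_users  <- 250
usage_target  <- 0.70

p_hat_premium <- premium_users / weekly_users
se_premium    <- sqrt(p_hat_premium * (1 - p_hat_premium) / weekly_users)

# One-sided bound: all of alpha sits in one tail, so use qnorm(level)
one_sided_levels    <- c(0.90, 0.95, 0.99)
z_one_sided         <- qnorm(one_sided_levels)
lower_bound_premium <- p_hat_premium - z_one_sided * se_premium

data.frame(
  level        = paste0(one_sided_levels * 100, "%"),
  z            = round(z_one_sided, 4),
  lower_bound  = round(lower_bound_premium, 4),
  meets_target = lower_bound_premium >= usage_target
)
```

Note that the one-sided critical values (`qnorm(0.95)` for 95%) are smaller than their two-sided counterparts (`qnorm(0.975)`), so a one-sided lower bound sits closer to $\hat{p}$ than the lower end of a two-sided interval at the same level.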

###    Visualize the Lower Bounds for All Confidence Levels

The following plots illustrate the calculated lower bounds against the $70\%$ target.


---

```{r,echo=FALSE}
# Load necessary libraries
library(plotly)

# Given data from the case study
n <- 250            # total users
x_premium <- 185    # active premium users
target <- 0.70

# Point estimate and standard error of the sample proportion
p_hat <- x_premium / n
SE <- sqrt(p_hat * (1 - p_hat) / n)

# One-sided lower bounds: p_hat - z_(1 - alpha) * SE
conf_levels <- c(0.90, 0.95, 0.99)
z_scores <- qnorm(conf_levels)
lower_bounds <- round(p_hat - z_scores * SE, 4)

# Degrees of freedom used for the t-based comparison curve
df <- n - 1

# Create data for Z-distribution (sampling distribution: normal with mean=p_hat, sd=SE)
x_z <- seq(p_hat - 4*SE, p_hat + 4*SE, length.out = 1000)
y_z <- dnorm(x_z, mean = p_hat, sd = SE)

# Create data for T-distribution (t with df, scaled to match mean and sd approx)
x_t <- seq(p_hat - 4*SE, p_hat + 4*SE, length.out = 1000)
y_t <- dt((x_t - p_hat)/SE, df) / SE  # Scale to match sd

# Create plotly figure for Z-distribution (sampling distribution)
fig_z <- plot_ly(x = x_z, y = y_z, type = 'scatter', mode = 'lines', name = 'Sampling Distribution (Z-approx)',
                 line = list(color = 'blue')) %>%
  layout(title = paste0('Sampling Distribution for Proportion (Z-approximation, p̂ = ', p_hat, ', SE = ', round(SE, 4), ')'),
         xaxis = list(title = 'Proportion'),
         yaxis = list(title = 'Density'))

# Add vertical line for target
fig_z <- fig_z %>% add_trace(x = c(target, target), y = c(0, max(y_z)), type = 'scatter', mode = 'lines', 
                             line = list(color = 'red', dash = 'dash'), name = paste0('Target: ', target))

# Add vertical lines for lower bounds
for (i in 1:length(conf_levels)) {
  fig_z <- fig_z %>% add_trace(x = c(lower_bounds[i], lower_bounds[i]), y = c(0, dnorm(lower_bounds[i], p_hat, SE)), 
                               type = 'scatter', mode = 'lines', line = list(color = 'green', dash = 'dot'),
                               name = paste0(conf_levels[i]*100, '% Lower Bound: ', lower_bounds[i]))
}

```
----


----
```{r,echo=FALSE}

# Create plotly figure for T-distribution
fig_t <- plot_ly(x = x_t, y = y_t, type = 'scatter', mode = 'lines', name = paste0('Sampling Distribution (T, df = ', df, ')'),
                 line = list(color = 'green')) %>%
  layout(title = paste0('Sampling Distribution for Proportion (T-distribution, df = ', df, ')'),
         xaxis = list(title = 'Proportion'),
         yaxis = list(title = 'Density'))

# Add vertical line for target
fig_t <- fig_t %>% add_trace(x = c(target, target), y = c(0, max(y_t)), type = 'scatter', mode = 'lines', 
                             line = list(color = 'red', dash = 'dash'), name = paste0('Target: ', target))

# Add vertical lines for lower bounds
for (i in 1:length(conf_levels)) {
  fig_t <- fig_t %>% add_trace(x = c(lower_bounds[i], lower_bounds[i]), y = c(0, dt((lower_bounds[i] - p_hat)/SE, df) / SE), 
                               type = 'scatter', mode = 'lines', line = list(color = 'green', dash = 'dot'),
                               name = paste0(conf_levels[i]*100, '% Lower Bound: ', lower_bounds[i]))
}

# Display the plots
fig_z
fig_t
```
---

###    Determine Whether the $70\%$ Target is Statistically Satisfied

The $70\%$ target is statistically satisfied at a given confidence level if the calculated Lower Bound is $\geq 0.70$.

####    At $90\%$ Confidence:

- Lower Bound: $0.7044$

- Conclusion: Statistically Satisfied. Since $0.7044 > 0.70$, we are $90\%$ confident that the true proportion of weekly active users utilizing the premium feature is at least $70.44\%$.

####    At $95\%$ Confidence:

- Lower Bound: $0.6944$

- Conclusion: NOT Statistically Satisfied. Since $0.6944 < 0.70$, we cannot be $95\%$ confident that the true proportion is at least $70\%$.

####    At $99\%$ Confidence:

- Lower Bound: $0.6755$

- Conclusion: NOT Statistically Satisfied. Since $0.6755 < 0.70$, we cannot be $99\%$ confident that the true proportion is at least $70\%$.

Summary: The company can be $90\%$ confident that the true proportion of weekly active users utilizing a premium feature is at least $70\%$. However, they cannot make this claim at the stricter $95\%$ or $99\%$ confidence levels.
```{r,echo=FALSE,message=FALSE}
# Load library
library(plotly)

# Fungsi untuk membuat plot distribusi z
plot_z_distribution <- function() {
  # Data untuk distribusi normal standar
  x <- seq(-4, 4, length.out = 1000)
  y <- dnorm(x, mean = 0, sd = 1)
  
  # Z-scores untuk confidence levels (one-tailed)
  z_90 <- qnorm(0.90)  # ≈ 1.282
  z_95 <- qnorm(0.95)  # ≈ 1.645
  z_99 <- qnorm(0.99)  # ≈ 2.326
  
  # Build the plot with plotly
  p <- plot_ly() %>%
    add_trace(x = x, y = y, type = 'scatter', mode = 'lines', name = 'Z Distribution',
              line = list(color = 'blue')) %>%
    # Shaded area for 90% confidence
    add_trace(x = x[x <= z_90], y = y[x <= z_90], type = 'scatter', mode = 'lines', fill = 'tozeroy',
              fillcolor = 'rgba(0, 255, 0, 0.3)', line = list(color = 'green'), name = '90% Confidence (Satisfied)') %>%
    # Shaded area for 95% confidence
    add_trace(x = x[x <= z_95], y = y[x <= z_95], type = 'scatter', mode = 'lines', fill = 'tozeroy',
              fillcolor = 'rgba(255, 255, 0, 0.3)', line = list(color = 'yellow'), name = '95% Confidence (Not Satisfied)') %>%
    # Shaded area for 99% confidence
    add_trace(x = x[x <= z_99], y = y[x <= z_99], type = 'scatter', mode = 'lines', fill = 'tozeroy',
              fillcolor = 'rgba(255, 0, 0, 0.3)', line = list(color = 'red'), name = '99% Confidence (Not Satisfied)') %>%
    # Vertical lines at the Z critical values
    add_trace(x = c(z_90, z_90), y = c(0, dnorm(z_90)), type = 'scatter', mode = 'lines',
              line = list(color = 'green', dash = 'dash'), name = paste('Z 90%:', round(z_90, 3))) %>%
    add_trace(x = c(z_95, z_95), y = c(0, dnorm(z_95)), type = 'scatter', mode = 'lines',
              line = list(color = 'yellow', dash = 'dash'), name = paste('Z 95%:', round(z_95, 3))) %>%
    add_trace(x = c(z_99, z_99), y = c(0, dnorm(z_99)), type = 'scatter', mode = 'lines',
              line = list(color = 'red', dash = 'dash'), name = paste('Z 99%:', round(z_99, 3))) %>%
    layout(title = 'Z Distribution (Standard Normal) for Confidence Levels',
           xaxis = list(title = 'Z-Score'),
           yaxis = list(title = 'Density'),
           annotations = list(
             list(x = z_90, y = dnorm(z_90) + 0.05, text = 'Lower Bound: 0.7044 (Satisfied)', showarrow = FALSE),
             list(x = z_95, y = dnorm(z_95) + 0.05, text = 'Lower Bound: 0.6944 (Not Satisfied)', showarrow = FALSE),
             list(x = z_99, y = dnorm(z_99) + 0.05, text = 'Lower Bound: 0.6755 (Not Satisfied)', showarrow = FALSE)
           ))
  
  return(p)
}

# Function to build the t-distribution plot
plot_t_distribution <- function(df = 30) {
  # Data for the t-distribution
  x <- seq(-4, 4, length.out = 1000)
  y <- dt(x, df = df)
  
  # t critical values for the confidence levels (one-tailed)
  t_90 <- qt(0.90, df = df)
  t_95 <- qt(0.95, df = df)
  t_99 <- qt(0.99, df = df)
  
  # Build the plot with plotly
  p <- plot_ly() %>%
    add_trace(x = x, y = y, type = 'scatter', mode = 'lines', name = paste0('T Distribution (df = ', df, ')'),
              line = list(color = 'purple')) %>%
    # Shaded area for 90% confidence
    add_trace(x = x[x <= t_90], y = y[x <= t_90], type = 'scatter', mode = 'lines', fill = 'tozeroy',
              fillcolor = 'rgba(0, 255, 0, 0.3)', line = list(color = 'green'), name = '90% Confidence (Satisfied)') %>%
    # Shaded area for 95% confidence
    add_trace(x = x[x <= t_95], y = y[x <= t_95], type = 'scatter', mode = 'lines', fill = 'tozeroy',
              fillcolor = 'rgba(255, 255, 0, 0.3)', line = list(color = 'yellow'), name = '95% Confidence (Not Satisfied)') %>%
    # Shaded area for 99% confidence
    add_trace(x = x[x <= t_99], y = y[x <= t_99], type = 'scatter', mode = 'lines', fill = 'tozeroy',
              fillcolor = 'rgba(255, 0, 0, 0.3)', line = list(color = 'red'), name = '99% Confidence (Not Satisfied)') %>%
    # Vertical lines at the t critical values
    add_trace(x = c(t_90, t_90), y = c(0, dt(t_90, df)), type = 'scatter', mode = 'lines',
              line = list(color = 'green', dash = 'dash'), name = paste('T 90%:', round(t_90, 3))) %>%
    add_trace(x = c(t_95, t_95), y = c(0, dt(t_95, df)), type = 'scatter', mode = 'lines',
              line = list(color = 'yellow', dash = 'dash'), name = paste('T 95%:', round(t_95, 3))) %>%
    add_trace(x = c(t_99, t_99), y = c(0, dt(t_99, df)), type = 'scatter', mode = 'lines',
              line = list(color = 'red', dash = 'dash'), name = paste('T 99%:', round(t_99, 3))) %>%
    layout(title = paste0('T Distribution (df = ', df, ') for Confidence Levels'),
           xaxis = list(title = 'T-Score'),
           yaxis = list(title = 'Density'),
           annotations = list(
             list(x = t_90, y = dt(t_90, df) + 0.05, text = 'Lower Bound: 0.7044 (Satisfied)', showarrow = FALSE),
             list(x = t_95, y = dt(t_95, df) + 0.05, text = 'Lower Bound: 0.6944 (Not Satisfied)', showarrow = FALSE),
             list(x = t_99, y = dt(t_99, df) + 0.05, text = 'Lower Bound: 0.6755 (Not Satisfied)', showarrow = FALSE)
           ))
  
  return(p)
}

# Build and display the plots
plot_z <- plot_z_distribution()
plot_t <- plot_t_distribution(df = 30)  # Change df if the sample size is known

# Display the plots (can be run one at a time or combined)
plot_z
plot_t
```

