---
title: "Assignment 9: Hypothesis Testing and Power and Sample size Determination"
author: "Kieran Hefferan "
date: " Due: 4/7/26 "
output:
  html_document: 
    toc: yes
    toc_depth: 4
    toc_float: yes
    number_sections: no
    toc_collapsed: yes
    code_folding: hide
    code_download: yes
    smooth_scroll: yes
    highlight: monochrome
    theme: spacelab
  word_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    keep_md: yes
  pdf_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    number_sections: yes
    fig_width: 3
    fig_height: 3
editor_options: 
  chunk_output_type: inline
---

```{css, echo = FALSE}
#TOC::before {
  content: "Table of Contents";
  font-weight: bold;
  font-size: 1.2em;
  display: block;
  color: navy;
  margin-bottom: 10px;
}


div#TOC li {     /* table of content  */
    list-style:upper-roman;
    background-image:none;
    background-repeat:no-repeat;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 22px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
  font-family: "Gill Sans", sans-serif;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 15px;
  font-weight: bold;
  font-family: system-ui;
  color: navy;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Gill Sans", sans-serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 - and the author and data headers use this too  */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 - and the author and data headers use this too  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";
}

body {background-color: #ffffff;
      color: #000000;
      font-family: Arial, sans-serif;
      font-size: 1rem;
      line-height: 1.6;
      }

.highlightme { background-color:yellow; }

p { background-color:white; }
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.
if (!require("knitr")) {
   install.packages("knitr")
   library(knitr)
}
if (!require("pander")) {
   install.packages("pander")
   library(pander)
}
if (!require("ggplot2")) {
  install.packages("ggplot2")
  library(ggplot2)
}
if (!require("tidyverse")) {
  install.packages("tidyverse")
  library(tidyverse)
}

if (!require("plotly")) {
  install.packages("plotly")
  library(plotly)
}

if (!require("VGAM")) {
  install.packages("VGAM")
  library(VGAM)
}
#### VGAM
knitr::opts_chunk$set(echo = TRUE,       # include code chunks in the output file
                      warning = FALSE,   # your code may produce warning messages;
                                         # FALSE keeps them out of the output file
                      results = TRUE,    # include the printed results of each chunk
                                         # in the output file
                      message = FALSE,   # suppress package start-up and other messages
                      comment = NA       # no prefix in front of printed output lines
                      )  
```
 
 \
 
## **Assignment Objectives** 

<p>
* Enhance understanding of the logic and procedure of hypothesis testing.

* Implement the procedures for power and sample size calculation for basic hypothesis tests using built-in functions and manual calculation.
</p>


## **Policies of Using AI Tools**

<p>
**Policy on AI Tool Use**: Please adhere to the AI tool policy specified in the course syllabus. The direct copying of AI-generated content is strictly prohibited. All submitted work must reflect your own understanding; where external tools are consulted, content must be thoroughly rephrased and synthesized in your own words.
</p>

<p>
**Code Inclusion Requirement**: Any code included in your essay must be properly commented to explain the purpose and/or expected output of key code lines. Submitting AI-generated code without meaningful, student-added comments will not be accepted.
</p>


## **Simple versus Composite Hypotheses**

**Simple Hypothesis**

* A **simple hypothesis** completely specifies the population distribution. Mathematically, it fixes all parameters at specific values. Examples: $H_0: \mu = 5$ (if $\sigma$ is known), $H_0: \mu = 100, \sigma^2 = 25$, and $H_1: p = 0.7$ for a Bernoulli distribution.

* **Example Scenario**  Test if a coin is fair:

$$
\begin{aligned}
H_0&: p = 0.5 \quad \text{(completely specified)} \\
H_1&: p = 0.6 \quad \text{(also completely specified)}
\end{aligned}
$$

Both are **simple** hypotheses.
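
Because a simple hypothesis pins down the whole distribution, the probability of any outcome can be computed exactly under it. The sketch below assumes a hypothetical experiment of 10 tosses with 7 heads (illustrative numbers, not part of the assignment) and evaluates that outcome under each of the two hypotheses.

```{r}
# Hypothetical data (illustration only): 7 heads in 10 tosses
n_tosses <- 10
heads    <- 7

# Probability of the observed count under each simple hypothesis
prob_h0 <- dbinom(heads, size = n_tosses, prob = 0.5)  # H0: p = 0.5
prob_h1 <- dbinom(heads, size = n_tosses, prob = 0.6)  # H1: p = 0.6

c(H0 = prob_h0, H1 = prob_h1)
```

Each hypothesis yields a single, fully determined probability, which is what "completely specified" means in practice.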

**Composite Hypothesis**

* A **composite hypothesis** does not completely specify the distribution. Mathematically, it allows a range of values for at least one parameter. Examples: one-sided, $\mu > 5$ or $\mu \le 5$; two-sided, $\mu \ne 5$.

* **Example Scenarios** 


$$
\begin{aligned}
&H_0: \mu = 5 && \text{(simple)} \\
&H_1: \mu > 5 && \text{(composite)}
\end{aligned}
$$

$$
\begin{aligned}
&H_0: \mu \leq 5 && \text{(composite)} \\
&H_1: \mu > 5 && \text{(composite)}
\end{aligned}
$$

$$
\begin{aligned}
&H_0: \text{data follows } N(\mu, 1), \mu = 0 && \text{(simple)} \\
&H_1: \text{data follows Poisson}(\lambda) && \text{(composite – different family)}
\end{aligned}
$$
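
Under a composite alternative such as $H_1: \mu > 5$, the probability of rejecting $H_0$ is not one number; it depends on which value of $\mu$ is actually true. The following is a minimal sketch for a one-sided z-test, assuming a known standard deviation of 1, a sample size of 25, and $\alpha = 0.05$. All three values and the names `mu0_ex`, `sigma_ex`, `n_ex`, `alpha_ex`, and `reject_prob` are hypothetical and chosen only for illustration.

```{r}
# Hypothetical settings (not from the assignment)
mu0_ex   <- 5      # value specified by H0
sigma_ex <- 1      # assumed known standard deviation
n_ex     <- 25     # assumed sample size
alpha_ex <- 0.05   # significance level

# Rejection probability of the one-sided z-test as a function of the true mean
reject_prob <- function(mu_true) {
  shift <- (mu_true - mu0_ex) / (sigma_ex / sqrt(n_ex))
  1 - pnorm(qnorm(1 - alpha_ex) - shift)
}

# Evaluate at mu0 itself and at several values allowed by H1
mu_grid <- c(5, 5.2, 5.4, 5.6)
data.frame(mu = mu_grid, reject_prob = reject_prob(mu_grid))
```

At $\mu = 5$ the rejection probability equals $\alpha$, and it rises as the true mean moves further into the alternative.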



<p><font color = "darkred">**This assignment focuses on performing a test of the mean ($\mu$) of a normal population and calculating power and sample size under various assumptions.**</font></p>


\

## **Question: New Cholesterol Medication**

<p>
A pharmaceutical company develops **"CholestFix"** to reduce Low-Density Lipoprotein (LDL, *a fat carrier that is low in density*) cholesterol. The **current standard drug** lowers LDL by an average of 25 mg/dL with a standard deviation of 15 mg/dL. Five participants were recruited for a three-month clinical trial of the new drug. At the end of the study, the mean LDL reduction was 29 mg/dL. Assume that the variance of LDL reduction for the new drug is the same as that for the standard drug.

Based on the results in the clinical trial, researchers in the company believe **CholestFix** is more effective.
</p>


<p>
a). Perform a formal hypothesis test of the researchers’ belief regarding LDL reduction, using a significance level of $\alpha = 0.05$.
```{r}
# Given values
xbar <- 29
mu0  <- 25
sigma <- 15
n <- 5

# Standard error
se <- sigma / sqrt(n)

# Z-statistic
z <- (xbar - mu0) / se

# One-sided p-value
p_value <- 1 - pnorm(z)

# Output
z
p_value
```
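*The hypotheses are $H_0: \mu = 25$ versus $H_1: \mu > 25$, where $\mu$ is the mean LDL reduction under CholestFix, tested with a one-sided z-test. The test statistic is $z \approx 0.596$, below the critical value $z_{0.05} \approx 1.645$, and the p-value ($\approx 0.275$) is well above $\alpha = 0.05$, so the observed sample mean of 29 is not sufficiently extreme given the variability and small sample size.*

*This analysis is valid because the population standard deviation is assumed known (15 mg/dL) and, provided individual LDL reductions are approximately normally distributed, the sample mean is normally distributed even for small n. With n = 5 the central limit theorem cannot be relied on, so the normality assumption matters, and the small sample size also limits statistical power.*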

**Conclusion: There is insufficient evidence at the 5% significance level to conclude that CholestFix produces a greater mean reduction than the standard drug.** 
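
The same decision can be reached through the rejection region instead of the p-value. The sketch below reuses the values from part (a) and computes the smallest sample mean that would lead to rejecting $H_0$; the name `xbar_crit` is introduced here for illustration.

```{r}
# Values from part (a)
xbar  <- 29
mu0   <- 25
sigma <- 15
n     <- 5
alpha <- 0.05

# Smallest sample mean that would lead to rejecting H0 at level alpha
xbar_crit <- mu0 + qnorm(1 - alpha) * sigma / sqrt(n)

xbar_crit          # rejection threshold on the mg/dL scale
xbar > xbar_crit   # TRUE would mean "reject H0"
```

With these inputs the threshold is roughly 36 mg/dL, so the observed mean of 29 mg/dL falls short of it, which agrees with the p-value decision.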

b). Given $n = 50, \sigma = 15, \alpha = 0.05$, and an effect size we wish to detect of $\delta = 4$ mg/dL (a true mean reduction of 29 mg/dL rather than 25 mg/dL), what is the probability that we'd detect a true improvement?
```{r}
# Given values
sigma <- 15
n <- 50
delta <- 4   # true mean difference
alpha <- 0.05

# Standard error
se <- sigma / sqrt(n)

# Critical value
z_alpha <- qnorm(1 - alpha)

# Effect size in z-units
effect_z <- delta / se

# Power
power <- 1 - pnorm(z_alpha - effect_z)

# Output
power
```
*The computed power is approximately 0.60, meaning there is about a 60% probability of correctly rejecting the null hypothesis if the true mean reduction is 4 mg/dL above that of the standard drug.*

*This calculation is valid under the assumptions of known population variance and normality. It reflects the test’s sensitivity given the specified effect size and sample size.*

**Conclusion: The study has only moderate power, implying a relatively high chance (about 40%) of failing to detect a true improvement. A larger sample size would be needed for more reliable detection.**
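
The analytic power can be cross-checked by simulation. This sketch assumes the sample mean is exactly normal under the alternative, draws many simulated trials, and records how often the one-sided z-test rejects. The seed and the number of simulations (`n_sim`) are arbitrary choices, not part of the assignment.

```{r}
set.seed(2026)  # arbitrary seed, used only for reproducibility

# Values from part (b)
mu0   <- 25
delta <- 4
sigma <- 15
n     <- 50
alpha <- 0.05
n_sim <- 100000   # number of simulated trials (arbitrary choice)

# Under the alternative, the sample mean is N(mu0 + delta, sigma^2 / n)
xbar_sim <- rnorm(n_sim, mean = mu0 + delta, sd = sigma / sqrt(n))

# Apply the one-sided z-test to every simulated trial
z_sim     <- (xbar_sim - mu0) / (sigma / sqrt(n))
power_sim <- mean(z_sim > qnorm(1 - alpha))

power_sim
```

The simulated rejection rate should land close to the analytic value of about 0.60, up to Monte Carlo error.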

c). Determine the minimum sample size required to detect an effect size of 4 mg/dL with a power of $1 - \beta = 0.8$  and a significance level of $\alpha = 0.05$. Assume the standard deviation of LDL reduction is 15 mg/dL.
```{r}
# Given values
sigma <- 15
delta <- 4
alpha <- 0.05
power_target <- 0.80

# Z values
z_alpha <- qnorm(1 - alpha)
z_beta  <- qnorm(power_target)

# Sample size formula
n_required <- (( (z_alpha + z_beta) * sigma ) / delta)^2

# Round up
n_required <- ceiling(n_required)

# Output
n_required
```
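*The required sample size to achieve 80% power is 87 (the formula gives about 86.9, which is rounded up). This gives roughly an 80% probability of detecting a true effect of 4 mg/dL at the 5% significance level.*

*This result follows from the standard sample size formula for a one-sided z-test, $n = \left( (z_{\alpha} + z_{\beta})\,\sigma / \delta \right)^2$, under normality and known variance assumptions.*

**Conclusion: At least 87 participants are needed, far more than the 5 in the trial and more than the 50 considered in part (b), to achieve adequate statistical power.**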
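
As a check on the rounding, the power can be evaluated just below and at the computed sample size. The sketch also calls the built-in `power.t.test()` from the `stats` package for comparison; note that this is a t-test (unknown $\sigma$), not the z-test used above, so its answer is expected to be slightly larger. The helper name `z_power` is introduced here for illustration.

```{r}
# Values from part (c)
sigma <- 15
delta <- 4
alpha <- 0.05

# Power of the one-sided z-test at a given sample size
z_power <- function(n) 1 - pnorm(qnorm(1 - alpha) - delta * sqrt(n) / sigma)

# Power just below and at the computed sample size
z_power(c(86, 87))

# Built-in counterpart when sigma is estimated (one-sample, one-sided t-test)
t_result <- power.t.test(delta = delta, sd = sigma, sig.level = alpha,
                         power = 0.80, type = "one.sample",
                         alternative = "one.sided")
ceiling(t_result$n)
```

Power at n = 86 falls just short of 0.80 while n = 87 reaches it, which confirms that rounding up is necessary.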

d). **Power curve**: To assess the impact of sample size on power, we can create a power function in terms of the sample size $n$ and use the remaining information from part (b). Plot the power curve by selecting a sequence of sample sizes.
</p>
```{r}
# Parameters
sigma <- 15
delta <- 4
alpha <- 0.05

# Sample size range
n_seq <- seq(10, 150, by = 1)

# Function to compute power
power_func <- function(n) {
  se <- sigma / sqrt(n)
  z_alpha <- qnorm(1 - alpha)
  effect_z <- delta / se
  1 - pnorm(z_alpha - effect_z)
}

# Compute power for each n
power_vals <- sapply(n_seq, power_func)

# Plot
plot(n_seq, power_vals, type = "l",
     xlab = "Sample Size (n)",
     ylab = "Power",
     main = "Power Curve")

# Add reference line at 80% power
abline(h = 0.8, lty = 2) 

```

*The power curve shows that statistical power increases with sample size. For small sample sizes power is low and true effects are likely to be missed; power reaches the desired 80% level at about n = 87.*

*This analysis is valid because it applies the same assumptions as earlier parts while systematically varying sample size.*

**Conclusion: Increasing the sample size substantially improves the ability to detect meaningful effects, although the gains become more gradual at larger sample sizes.**
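
The point where the curve crosses 80% can also be read off numerically rather than by eye. This sketch recomputes the power values in vectorized form over the same grid and picks the first sample size that reaches the target; `n_cross` is a name introduced here for illustration.

```{r}
# Parameters from part (d)
sigma <- 15
delta <- 4
alpha <- 0.05
n_seq <- 10:150

# Vectorized power of the one-sided z-test
power_vals <- 1 - pnorm(qnorm(1 - alpha) - delta * sqrt(n_seq) / sigma)

# Smallest sample size on the grid whose power is at least 80%
n_cross <- n_seq[which(power_vals >= 0.80)[1]]
n_cross

# Power at a few sample sizes, to see the diminishing gains
round(power_vals[n_seq %in% c(25, 50, 87, 150)], 3)
```

The crossing point matches the sample size obtained from the formula in part (c).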


<font color = "red">**Note**: For each of the questions above, write a short summary of what you observed, justify why your analysis is valid, and interpret the results.</font>

