## start by coding in given values from instructions
# DISCLAIMER: I am assuming there is a typo in the opening paragraph of the assignment. I am setting sample size to 50, not 5.
n = 50
# old_drug = what's currently on market
old_drug_xbar = 25 #mg/dl
old_drug_sd = 15 #mg/dl
# new_drug = CholestFix
new_drug_xbar = 29 #mg/dl
new_drug_sd = 15 #mg/dl
Intro and Setup
For today’s analysis we have been contacted by a pharmaceutical
company who has developed a drug they are calling CholestFix,
whose goal is to reduce LDL cholesterol. While we were not provided
access to the dataset itself, we were still given enough valuable
information to assess how effective CholestFix is in comparison to the
current LDL-fighting drugs on the marketplace (what we will refer to as
market-standard).
The market-standard sees an average LDL reduction of 25 mg/dL with a
standard deviation of 15 mg/dL. However, the three-month study conducted
by our pharma company, with a sample size of 50 patients, showed a mean
LDL reduction in patients who took CholestFix of 29 mg/dL. The standard
deviation for subjects given CholestFix was also 15 mg/dL though.
a.) Initial Hypothesis Test
Given this available information, the first step I performed was a
one-sample t-test. This hypothesis test is relevant as we are comparing
the mean of one test sample (the 50 patients who took CholestFix) to the
mean of a more general or broadly accepted population parameter (the
market-standard pharmaceutical option).
Technically speaking, calculations were performed via a custom-built
R function and conducted with a confidence level of 95%. I chose to set
\(\alpha\) = to 0.05 as that is
generally considered industry standard. Results are below.
# will use a one-sample t test
# only have one sample, the old_drug stats are the population (or baseline) for which we are comparing too
# need sample data to use R's t.test function, we don't have sample data so will be doing manual calculation
# t value formula from Sec 3.1 step 3
alpha = 0.05
one_sample_t_no_data = function(n, xbar, pop_mean, pop_sd, alpha, tail_option){
df = n - 1
numerator = xbar - pop_mean
denominator = pop_sd/sqrt(n)
t = numerator/denominator
#### now we have t value
crit_value = qt(p = alpha, df = df, lower.tail = tail_option)
# using quantile function with t distribution to get critical value
results = rbind(t, crit_value)
return(results)
}
help(qt)
t_test = one_sample_t_no_data(n = n,
xbar = new_drug_xbar,
pop_mean = old_drug_xbar,
pop_sd = old_drug_sd,
alpha = alpha,
tail_option = FALSE)
# tail option is FALSE as we are testing the "researchers' belief", which is specifically that new drug's mean is MORE than the current standard
rownames(t_test) = c("T Value", "Critical Value")
colnames(t_test) = NULL
kable(round(t_test,4),align = "c",
caption = "<span style='color:##000000;'>
T-Test Results </span>") %>%
kable_styling(
bootstrap_options = c("striped", "bordered"),
full_width = FALSE,
position = "center")
T-Test Results
|
|
|
|
T Value
|
1.8856
|
|
Critical Value
|
1.6766
|
Looking at our test output, we are lead to believe that there is
significant reason to believe that CholestFix does reduce LDL
cholesterol, and does so to a more effective degree than the current
market-standard. Using a T distribution, our sample data’s standardized
value is about 0.209 units greater than that of the market-standard. In
other words, we reject the null hypothesis in this
instance.
b.) Power Analysis (Current Conditions)
Now knowing that there appears to be statistical significance to our
sample data’s greater reduction in LDL cholesterol, we assess the power
of our findings. Statistical power can be defined as the probability of
rejecting the null hypothesis (or default assumption) IF the alternative
is actually true. To use criminal justice analogously, power is
essentially the probability of convicting someone on trial who truly did
commit the crime they are accused of. Power is bounded between (0 and
100)%, and generally it is the goal of a researcher to achieve power of
at least 80%.
## reference section 6.5
effect_size = 4
# "probability we'd detect a true improvement" = Power
power_test = power.t.test(n = n,
delta = effect_size,
# delta argument within the function is the "true difference in means" which we belive to be 4
sd = old_drug_sd,
# sd argument is for the assumed POPULATION/baseline standard deviation
sig.level = alpha,
type ="one.sample",
alternative = "one.sided"
)
partb_power = power_test$power # = 0.5849737
There are numerous factors that go into the calculation of power,
including what type of statistical test we are running, our sample size,
sample statistics, population parameters and such. As previously noted,
we have a sample size of 50, a population standard deviation of 15 mg/dL
and are operating with a Type I error rate (\(\alpha\)) of 0.05. Additionally, the effect
size of our experiment (often denoted as \(\delta\)) is equal to 4 as that is the
difference between our sample mean of LDL reduction and what is
market-standard.
Given these conditions, I used R’s power.t.test() function to calculate the
power of this specific analysis (ie the probability that we would detect
an improvement in LDL reduction if one genuinely existed).
Unfortunately, our current power is only about 58.5%, a
far cry from the 80% that is generally desired.
In the future, if our pharmaceutical partners wanted to conduct a
similar experiment with a greater degree of power, they could achieve
such by garnering a larger sample size and/or operating under a more
liberal confidence level. In other words, sample size and power are
positively correlated while confidence and power are inversely
related.
c.) Sample Size Determination
As mentioned above, power, sample size, confidence and effect size
are all interdependent. Below, I once again use R’s power.t.test() function, this time from the
perspective of searching for a necessary sample size.
partc_power = 0.8
# same function as part b, this time replace input for n argument with input for power argument
sample_size_test = power.t.test(power = partc_power,
delta = effect_size,
sd = old_drug_sd,
sig.level = alpha,
type ="one.sample",
alternative = "one.sided"
)
desired_n = sample_size_test$n
In a theoretical future study of CholestFix, if we were to maintain a
confidence level of 95% (ie \(\alpha\)
= 0.05), still have our sample data show us an effect size of 4, a
sample mean of 29 and a sample standard deviation of 15 we would need a
sample size of at least 89 subjects.
Per our calculations, the required sample size is technically about
88.31, however we can obviously not have a proportion of an individual
person participate in our study. In minimum-based calculations like
these, we must always round up despite conventional mathematics
rounding a value like this one down.
d.) Power Analysis W/ Theoretical Sample Sizes
To further analyze the relationship between sample size and power of
experiment, I used a custom function in R to generate the power of a
one-sample t-test (with the same characteristics of the experiment that
our pharmaceutical partner previously did) with sample sizes iterating
by 1, starting from 50 and ending at 200. A resulting plot of this data
is below.
# let variable x denote the different sample sizes that we are testing for power results
x = seq(from = 50, to = 200, by = 1)
# START CUSTOM FUNCTION
find_power = function(x, effect_size, pop_sd, alpha, test_type, alt){
power_results = numeric(length(x))
# START FOR LOOP
for (i in min(x):max(x)){
power_sample = power.t.test(n = x,
delta = effect_size,
sd = pop_sd,
sig.level = alpha,
type = test_type,
alternative = alt
)
## will put results in data frame format for easier analysis
power_results_n = cbind(power_sample$n)
power_results_power = cbind(power_sample$power)
results = cbind(power_results_n, power_results_power)
colnames(results) = c("n", "power")
} # END FOR LOOP
return(results)
} # END CUSTOM FUNCTION
power_curve_data = find_power(x = x,
effect_size = effect_size,
pop_sd = old_drug_sd,
alpha = alpha,
test_type = "one.sample",
alt = "one.sided")
power_curve_data = data.frame(power_curve_data)
lin_model = lm(power_curve_data$power ~ power_curve_data$n)
ggplot(data = power_curve_data) +
geom_point(mapping = aes(x = n, y = power)) +
geom_hline(yintercept = 0.8,
color = "red",
linewidth = 1) +
# using geom_hline to point out the threshold between values that do and don't achieve ideal power
geom_hline(yintercept = 0.9,
color = "blue",
linewidth = 1) +
labs(title = "Sample Size and Power Relationship",
x = "Sample Size (n)",
y = "Power (0 - 1 Scale)")

As we can see, the benchmark threshold of 80% power is crossed once
our sample size reaches just below 90 subjects. If we are to hold
ourselves to an even stricter standard and attempt to achieve 90% power,
while holding all other elements of our experiment (like confidence
level for example) constant, then we would need at least 122 subjects to
participate.
---
title: "Hypothesis Testing and Post-Analysis W/ T-Test"
author: "Chris Bahm"
date: "April 4, 2026"
output:
  html_document:
    toc: true
    toc_float:
      collapsed: true
      smooth_scroll: true
    toc_depth: 4
    fig_width: 6
    fig_height: 4
    fig_caption: true
    number_sections: false
    code_folding: hide
    code_download: true
    theme: lumen
    highlight: tango
  pdf_document:
    toc: true
    toc_depth: 4
    fig_caption: true
    number_sections: true
  word_document:
    toc: true
    toc_depth: 4
---

```{css, echo = FALSE}
div#TOC li {     /* table of content  */
    list-style:upper-roman;
    background-image:none;
    background-repeat:none;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 24px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkRed;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 - and the author and data headers use this too  */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 - and the author and data headers use this too  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";
}
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.

if (!require("knitr")) {                      # use conditional statement to detect
   install.packages("knitr")                  # whether a package was installed in
   library(knitr)                             # your machine. If not, install it and
}                                             # load it to the working directory.

if (!require(tidyverse)) {library(tidyvserse)} 

if (!require(GGally)) {library(GGally)} 

if (!require(kableExtra)) {library(kableExtra)} 

if (!require(ggplot2)) {library(ggplot2)} 

if (!require(car)) {library(car)} 

if (!require(dplyr)) {library(dplyr)} 

if (!require(pander)) {library(pander)} 

if (!require(forecast)) {library(forecast)} 

if (!require(lubridate)) {library(lubridate)} 

if (!require("scales")) {
install.packages("scales")                                        
library("scales") 
}

knitr::opts_chunk$set(
	echo = TRUE,
	message = FALSE,
	warning = FALSE,
	comment = NA,
	results = TRUE,
	fig.align = "center"
)
```

```{r}
## start by coding in given values from instructions
  # DISCLAIMER: I am assuming there is a typo in the opening paragraph of the assignment. I am setting sample size to 50, not 5.
n = 50

# old_drug = what's currently on market
old_drug_xbar = 25 #mg/dl
old_drug_sd = 15 #mg/dl

# new_drug = CholestFix
new_drug_xbar = 29 #mg/dl
new_drug_sd = 15 #mg/dl
```

# Intro and Setup
For today's analysis we have been contacted by a pharmaceutical company who has developed a drug they are calling *CholestFix*, whose goal is to reduce LDL cholesterol. While we were not provided access to the dataset itself, we were still given enough valuable information to assess how effective CholestFix is in comparison to the current LDL-fighting drugs on the marketplace (what we will refer to as market-standard).

The market-standard sees an average LDL reduction of `r old_drug_xbar` mg/dL with a standard deviation of `r old_drug_sd` mg/dL. However, the three-month study conducted by our pharma company, with a sample size of `r n` patients, showed a mean LDL reduction in patients who took CholestFix of `r new_drug_xbar` mg/dL. The standard deviation for subjects given CholestFix was also `r new_drug_sd` mg/dL though.

# a.) Initial Hypothesis Test
Given this available information, the first step I performed was a one-sample t-test. This hypothesis test is relevant as we are comparing the mean of one test sample (the `r n` patients who took CholestFix) to the mean of a more general or broadly accepted population parameter (the market-standard pharmaceutical option).

Technically speaking, calculations were performed via a custom-built R function and conducted with a confidence level of 95%. I chose to set $\alpha$ = to 0.05 as that is generally considered industry standard. Results are below.
```{r}
# will use a one-sample t test
  # only have one sample, the old_drug stats are the population (or baseline) for which we are comparing too

# need sample data to use R's t.test function, we don't have sample data so will be doing manual calculation
  # t value formula from Sec 3.1 step 3
alpha = 0.05


one_sample_t_no_data = function(n, xbar, pop_mean, pop_sd, alpha, tail_option){
  df = n - 1
  numerator = xbar - pop_mean
  denominator = pop_sd/sqrt(n)
  t = numerator/denominator
 #### now we have t value
  
  crit_value = qt(p = alpha, df = df, lower.tail = tail_option)
    # using quantile function with t distribution to get critical value
  results = rbind(t, crit_value)
  
  return(results)
}
help(qt)

t_test = one_sample_t_no_data(n = n, 
                     xbar = new_drug_xbar, 
                     pop_mean = old_drug_xbar,
                     pop_sd = old_drug_sd,
                     alpha = alpha,
                     tail_option = FALSE)
                      # tail option is FALSE as we are testing the "researchers' belief", which is specifically that new drug's mean is MORE than the current standard

rownames(t_test) = c("T Value", "Critical Value")
colnames(t_test) = NULL

kable(round(t_test,4),align = "c",
      caption = "<span style='color:##000000;'>
      T-Test Results </span>") %>%
  kable_styling(
    bootstrap_options = c("striped", "bordered"),
    full_width = FALSE,
    position = "center")

```

Looking at our test output, we are lead to believe that there is *significant* reason to believe that CholestFix does reduce LDL cholesterol, and does so to a more effective degree than the current market-standard. Using a T distribution, our sample data's standardized value is about `r round(t_test[1],4) - round(t_test[2],4)` units greater than that of the market-standard. In other words, we **reject** the null hypothesis in this instance.

```{r, include=FALSE}

## Test using simulated data and built-in R function
set.seed(123)
sim_data = rnorm(n = 50, mean = new_drug_xbar, sd = new_drug_sd)
sim_data_test = t.test(sim_data, mu = old_drug_xbar, alternative = "greater")
sim_data_test
sim_data_test$statistic
  # t test results vary WILDLY, use a bootstrap

bootstrap_sim = function(data, b, mu, method){
  n = length(data)
  bootstrap_t_estimates = numeric(b)
  
  # START FOR LOOP
    for (i in 1:b){
      boot_sample = sample(data, size = n, replace = TRUE)
      
      t_tests = t.test(boot_sample, mu = mu, alternative = method)
      bootstrap_t_estimates[i] = t_tests$statistic
    } # END FOR LOOP
  
  return(bootstrap_t_estimates)
}

sim_data_t = bootstrap_sim(data = sim_data, b = 500, mu = old_drug_xbar, method = "greater")

mean(sim_data_t)

```

# b.) Power Analysis (Current Conditions)
Now knowing that there appears to be statistical significance to our sample data's greater reduction in LDL cholesterol, we assess the power of our findings. Statistical power can be defined as the probability of rejecting the null hypothesis (or default assumption) IF the alternative is actually true. To use criminal justice analogously, power is essentially the probability of convicting someone on trial who truly did commit the crime they are accused of. Power is bounded between (0 and 100)%, and generally it is the goal of a researcher to achieve power of at least 80%. 

```{r}
## reference section 6.5

effect_size = 4
# "probability we'd detect a true improvement" = Power

power_test = power.t.test(n = n,
             delta = effect_size,
              # delta argument within the function is the "true difference in means" which we belive to be 4
             sd = old_drug_sd,
             # sd argument is for the assumed POPULATION/baseline standard deviation
             sig.level = alpha,
             type ="one.sample",
             alternative = "one.sided"
             )

partb_power = power_test$power # = 0.5849737
```
There are numerous factors that go into the calculation of power, including what type of statistical test we are running, our sample size, sample statistics, population parameters and such. As previously noted, we have a sample size of `r n`, a population standard deviation of `r old_drug_sd` mg/dL and are operating with a Type I error rate ($\alpha$) of `r alpha`. Additionally, the effect size of our experiment (often denoted as $\delta$) is equal to `r effect_size` as that is the difference between our sample mean of LDL reduction and what is market-standard.

Given these conditions, I used R's <span style="color:blue;">power.t.test()</span> function to calculate the power of this specific analysis (ie the probability that we would detect an improvement in LDL reduction if one genuinely existed). Unfortunately, our current power is only about **`r round(partb_power,4)*100`%**, a far cry from the 80% that is generally desired.

In the future, if our pharmaceutical partners wanted to conduct a similar experiment with a greater degree of power, they could achieve such by garnering a larger sample size and/or operating under a more liberal confidence level. In other words, sample size and power are positively correlated while confidence and power are inversely related.

# c.) Sample Size Determination
As mentioned above, power, sample size, confidence and effect size are all interdependent. Below, I once again use R's <span style="color:blue;">power.t.test()</span> function, this time from the perspective of searching for a necessary sample size.

```{r}
partc_power = 0.8

# same function as part b, this time replace input for n argument with input for power argument
sample_size_test = power.t.test(power = partc_power,
                                delta = effect_size,
                                sd = old_drug_sd,
                                sig.level = alpha,
                                type ="one.sample",
                                alternative = "one.sided"
                                )

desired_n = sample_size_test$n
```

In a theoretical future study of CholestFix, if we were to maintain a confidence level of 95% (ie $\alpha$ = `r alpha`), still have our sample data show us an effect size of `r effect_size`, a sample mean of `r new_drug_xbar` and a sample standard deviation of `r new_drug_sd` we would need a sample size of at least **89 subjects**.

Per our calculations, the required sample size is technically about `r round(desired_n,2)`, however we can obviously not have a proportion of an individual person participate in our study. In minimum-based calculations like these, we must always round *up* despite conventional mathematics rounding a value like this one down.

# d.) Power Analysis W/ Theoretical Sample Sizes
To further analyze the relationship between sample size and power of experiment, I used a custom function in R to generate the power of a one-sample t-test (with the same characteristics of the experiment that our pharmaceutical partner previously did) with sample sizes iterating by 1, starting from 50 and ending at 200. A resulting plot of this data is below.

```{r}
# let variable x denote the different sample sizes that we are testing for power results
x = seq(from = 50, to = 200, by = 1)

# START CUSTOM FUNCTION
find_power = function(x, effect_size, pop_sd, alpha, test_type, alt){
  power_results = numeric(length(x))
  
  # START FOR LOOP
  for (i in min(x):max(x)){
    power_sample = power.t.test(n = x,
             delta = effect_size,
             sd = pop_sd,
             sig.level = alpha,
             type = test_type,
             alternative = alt
             )
    
    ## will put results in data frame format for easier analysis
    power_results_n = cbind(power_sample$n)
    power_results_power = cbind(power_sample$power)
    results = cbind(power_results_n, power_results_power)
    colnames(results) = c("n", "power")
    
  } # END FOR LOOP
  
  return(results)
} # END CUSTOM FUNCTION

power_curve_data = find_power(x = x,
           effect_size = effect_size,
           pop_sd = old_drug_sd,
           alpha = alpha,
           test_type = "one.sample",
           alt = "one.sided")

power_curve_data = data.frame(power_curve_data)

lin_model = lm(power_curve_data$power ~ power_curve_data$n)

ggplot(data = power_curve_data) +
  geom_point(mapping = aes(x = n, y = power)) + 
  geom_hline(yintercept = 0.8, 
             color = "red", 
             linewidth = 1) + 
    # using geom_hline to point out the threshold between values that do and don't achieve ideal power
  
  geom_hline(yintercept = 0.9, 
             color = "blue", 
             linewidth = 1) + 
  labs(title = "Sample Size and Power Relationship",
       x = "Sample Size (n)",
       y = "Power (0 - 1 Scale)") 
```

As we can see, the benchmark threshold of 80% power is crossed once our sample size reaches just below 90 subjects. If we are to hold ourselves to an even stricter standard and attempt to achieve 90% power, while holding all other elements of our experiment (like confidence level for example) constant, then we would need at least 122 subjects to participate.
