W6 Test 1 Independent T-test

R PROCESS

IMPORT EXCEL FILE CODE

library(readxl)

A6R1 <- read_excel("D:/000 20251021 AA 5221 Applied Analytics & Methods 1/Week 6/A6R1.xlsx")
print(A6R1)

## # A tibble: 100 × 3
##    ParticipantID Medication HeadacheDays
##            <dbl> <chr>             <dbl>
##  1             1 A                     6
##  2             2 A                     7
##  3             3 A                    13
##  4             4 A                     8
##  5             5 A                     8
##  6             6 A                    13
##  7             7 A                     9
##  8             8 A                     4
##  9             9 A                     6
## 10            10 A                     7
## # ℹ 90 more rows

DESCRIPTIVE STATISTICS

Calculate the mean, median, SD, and sample size of HeadacheDays for each medication group.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

A6R1 %>%
  group_by(Medication) %>%
  summarise(
    Mean = mean(HeadacheDays, na.rm = TRUE),
    Median = median(HeadacheDays, na.rm = TRUE),
    SD = sd(HeadacheDays, na.rm = TRUE),
    N = n()
  )

## # A tibble: 2 × 5
##   Medication  Mean Median    SD     N
##   <chr>      <dbl>  <dbl> <dbl> <int>
## 1 A            8.1    8    2.81    50
## 2 B           12.6   12.5  3.59    50

CHECK THE NORMALITY OF THE CONTINUOUS VARIABLES

CREATE A HISTOGRAM FOR EACH CONTINUOUS VARIABLE

hist(A6R1$HeadacheDays[A6R1$Medication == "A"],
     main = "Histogram of patients using medication A",
     xlab = "HeadacheDays",
     ylab = "Count of patient",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(A6R1$HeadacheDays[A6R1$Medication == "B"],
     main = "Histogram of patients using medication B",
     xlab = "HeadacheDays",
     ylab = "Count of patient",
     col = "lightgreen",
     border = "black",
     breaks = 20)

COMMENT: The histograms of HeadacheDays for patients taking medication A and for patients taking medication B are both slightly asymmetric, with a mild positive skew, but each is roughly bell-shaped.

CONDUCT THE SHAPIRO-WILK TEST

shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "A"])

## 
##  Shapiro-Wilk normality test
## 
## data:  A6R1$HeadacheDays[A6R1$Medication == "A"]
## W = 0.97852, p-value = 0.4913

shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "B"])

## 
##  Shapiro-Wilk normality test
## 
## data:  A6R1$HeadacheDays[A6R1$Medication == "B"]
## W = 0.98758, p-value = 0.8741

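COMMENT: Neither Shapiro-Wilk test is significant (medication A: W = 0.979, p = .491; medication B: W = 0.988, p = .874), so there is no evidence that HeadacheDays departs from a normal distribution in either group.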

VISUALLY DISPLAY THE DATA

library(ggplot2)
library(ggpubr)
ggboxplot(A6R1, x = "Medication", y = "HeadacheDays",
          color = "Medication",
          palette = "jco",
          add = "jitter")

COMMENT: Only 2 points fall outside the whiskers, so outliers are not a serious concern; continue with the independent t-test.
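
The t-test in the next step pools the two group variances (var.equal = TRUE), which presumes the groups have a similar spread. The following is a minimal sketch of one way to sanity-check that assumption. It uses only the rounded group SDs and sample sizes from the descriptive statistics, not the raw data. The variable names are illustrative, and the F ratio is only one option; Levene's test on the raw data is a common alternative.

```r
# Summary statistics copied from the descriptive table (rounded values)
sd_a <- 2.81; n_a <- 50   # medication A
sd_b <- 3.59; n_b <- 50   # medication B

# Variance ratio, with the larger sample variance in the numerator
f_ratio <- sd_b^2 / sd_a^2

# Two-sided p-value from the F distribution with (n_b - 1, n_a - 1) df
p_upper <- pf(f_ratio, df1 = n_b - 1, df2 = n_a - 1, lower.tail = FALSE)
p_value <- 2 * min(p_upper, 1 - p_upper)

cat("F =", round(f_ratio, 3), " p =", round(p_value, 3), "\n")
```

A p-value above .05 would be consistent with keeping var.equal = TRUE. Otherwise, dropping that argument makes t.test() run the Welch version, which does not assume equal variances.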

INDEPENDENT T-TEST

t.test(HeadacheDays ~ Medication, data = A6R1, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  HeadacheDays by Medication
## t = -6.9862, df = 98, p-value = 3.431e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -5.778247 -3.221753
## sample estimates:
## mean in group A mean in group B 
##             8.1            12.6

The test is statistically significant, t(98) = -6.99, p < .001.
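
To show where the numbers in the t-test output come from, the pooled-variance t statistic can be rebuilt from the group summaries alone. This sketch uses the rounded means and SDs from the descriptive table, so it agrees with the t.test() output only up to rounding. The variable names are illustrative.

```r
# Group summaries copied from the descriptive table (rounded values)
mean_a <- 8.1;  sd_a <- 2.81; n_a <- 50   # medication A
mean_b <- 12.6; sd_b <- 3.59; n_b <- 50   # medication B

# Degrees of freedom and pooled standard deviation
df <- n_a + n_b - 2
sd_pooled <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / df)

# Standard error of the mean difference and the t statistic (A minus B)
se_diff <- sd_pooled * sqrt(1 / n_a + 1 / n_b)
t_stat <- (mean_a - mean_b) / se_diff

# Two-sided p-value and 95% confidence interval for the mean difference
p_value <- 2 * pt(-abs(t_stat), df)
ci <- (mean_a - mean_b) + c(-1, 1) * qt(0.975, df) * se_diff

cat("t =", round(t_stat, 2), " df =", df, " p =", signif(p_value, 3), "\n")
cat("95% CI:", round(ci, 2), "\n")
```

The rebuilt t statistic is about -6.98, close to the reported -6.9862. The small gap comes from using SDs rounded to two decimals.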

EFFECT SIZE:

library(effectsize)
cohens_d_result <- cohens_d(HeadacheDays ~ Medication, data = A6R1, pooled_sd = TRUE)
print(cohens_d_result)

## Cohen's d |         95% CI
## --------------------------
## -1.40     | [-1.83, -0.96]
## 
## - Estimated using pooled SD.
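
Cohen's d with a pooled SD is the mean difference divided by the pooled standard deviation. The following sketch computes it from the rounded summary statistics. The variable names are illustrative, and the confidence interval reported by cohens_d() is not reproduced here.

```r
# Group summaries copied from the descriptive table (rounded values)
mean_a <- 8.1;  sd_a <- 2.81; n_a <- 50   # medication A
mean_b <- 12.6; sd_b <- 3.59; n_b <- 50   # medication B

# Pooled standard deviation across the two groups
sd_pooled <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / (n_a + n_b - 2))

# Standardized mean difference (A minus B, so a negative d means A is lower)
d <- (mean_a - mean_b) / sd_pooled

cat("Cohen's d =", round(d, 2), "\n")
```

This gives d of about -1.40, matching the package output. The negative sign reflects only the order of subtraction (A minus B).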

Uyen Duong

2025-11-21

ASSIGNMENT 6 RESEARCH SCENARIO 1

Assess the difference in number of HeadacheDays between the two groups of patients taking medication A and medication B

HYPOTHESES:

H0: There is no difference in the mean number of HeadacheDays between patients taking medication A and patients taking medication B.

H1: There is a difference in the mean number of HeadacheDays between patients taking medication A and patients taking medication B.

REPORT PARAGRAPH

An independent t-test was conducted to compare the number of HeadacheDays between two groups of patients taking medication A and medication B. Patients taking medication A had significantly fewer HeadacheDays (M = 8.10, SD = 2.81) than patients taking medication B (M = 12.60, SD = 3.59), t(98) = -6.99, p < .001. The effect size was very large (d = -1.40, 95% CI [-1.83, -0.96]), indicating a substantial difference between the two medication groups. Overall, patients taking medication A had fewer HeadacheDays than patients taking medication B.