INDEPENDENT T-TEST

This analysis addresses Research Scenario 1 from Assignment 6. It tests whether the mean number of headache days differs between the two medication groups.

Hypotheses

  • H0 (Null Hypothesis): There is no difference in the mean number of headache days between the two medication groups.
  • H1 (Alternative Hypothesis): There is a difference in the mean number of headache days between the two medication groups.

Result paragraph

An independent-samples t-test was conducted to compare the number of headache days between participants taking Medication A (n = 50) and participants taking Medication B (n = 50). Participants taking Medication A reported fewer headache days (M = 8.1, SD = 2.81) than participants taking Medication B (M = 12.6, SD = 3.59), t(98) = -6.99, p < .001. The mean difference was -4.5 days (95% CI [-5.78, -3.22]), and the effect size was very large (d = -1.40, 95% CI [-1.83, -0.96]). Overall, participants taking Medication A had far fewer headache days than those taking Medication B.
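The reported statistics can be cross-checked from the group summaries alone. The sketch below is a hand calculation, not part of the original analysis: it plugs the rounded means, SDs, and sample sizes from this report into the pooled-variance formulas. All variable names are illustrative.

```r
# Summary statistics as reported in this document (rounded values)
m_a <- 8.1;  sd_a <- 2.81; n_a <- 50   # Medication A
m_b <- 12.6; sd_b <- 3.59; n_b <- 50   # Medication B

# Pooled standard deviation under the equal-variance assumption
df <- n_a + n_b - 2
sd_pooled <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / df)

# t statistic, two-sided p-value, and 95% CI for the mean difference (A - B)
mean_diff <- m_a - m_b
se_diff <- sd_pooled * sqrt(1 / n_a + 1 / n_b)
t_stat <- mean_diff / se_diff
p_value <- 2 * pt(-abs(t_stat), df)
ci <- mean_diff + c(-1, 1) * qt(0.975, df) * se_diff

# Cohen's d based on the pooled SD
d <- mean_diff / sd_pooled

round(c(t = t_stat, df = df, d = d), 2)
signif(p_value, 3)
round(ci, 2)
```

Because the inputs are rounded, the results should land close to, but not exactly on, the values R reports from the raw data (t = -6.99, d = -1.40).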

R code and Analysis

CHECK NORMAL DISTRIBUTION

IMPORT EXCEL FILE

Purpose: Import your Excel dataset into R to conduct analyses.

# INSTALL REQUIRED PACKAGE

# install.packages("readxl")

# LOAD THE PACKAGE

library(readxl)

# IMPORT EXCEL FILE INTO R STUDIO

dataset <- read_excel("//apporto.com/dfs/SLU/Users/minhoku_slu/Downloads/A6R1.xlsx")

DESCRIPTIVE STATISTICS

PURPOSE: Calculate the mean, median, SD, and sample size for each group.

# INSTALL REQUIRED PACKAGE

# install.packages("dplyr")

# LOAD THE PACKAGE

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# CALCULATE THE DESCRIPTIVE STATISTICS

dataset %>%
  group_by(Medication) %>%
  summarise(
    Mean = mean(HeadacheDays, na.rm = TRUE),
    Median = median(HeadacheDays, na.rm = TRUE),
    SD = sd(HeadacheDays, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   Medication  Mean Median    SD     N
##   <chr>      <dbl>  <dbl> <dbl> <int>
## 1 A            8.1    8    2.81    50
## 2 B           12.6   12.5  3.59    50

HISTOGRAMS

Purpose: Visually check the normality of the scores for each group.

# CREATE THE HISTOGRAMS 

hist(dataset$HeadacheDays[dataset$Medication == "A"],
     main = "Histogram of Headache Days: Medication A",
     xlab = "Headache days",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(dataset$HeadacheDays[dataset$Medication == "B"],
     main = "Histogram of Headache Days: Medication B",
     xlab = "Headache days",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

QUESTIONS

  • Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
  • The histogram looks symmetrical.
  • Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
  • The histogram has a proper bell curve.
  • Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
  • The histogram looks symmetrical.
  • Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
  • The histogram has a nearly proper bell curve.
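The visual judgments above can be backed up with numbers. The sketch below computes moment-based skewness and excess kurtosis in base R; values near 0 suggest a symmetric, bell-shaped distribution. The helper `shape_stats` and the data frame `sim` are hypothetical, and `sim` holds simulated placeholder values, not the A6R1.xlsx data.

```r
# Moment-based skewness and excess kurtosis (population formulas)
shape_stats <- function(x) {
  x <- x[!is.na(x)]
  dev <- x - mean(x)
  m2 <- mean(dev^2)
  m3 <- mean(dev^3)
  m4 <- mean(dev^4)
  c(skewness = m3 / m2^1.5, excess_kurtosis = m4 / m2^2 - 3)
}

# Simulated placeholder data, NOT the original dataset
set.seed(1)
sim <- data.frame(
  Medication = rep(c("A", "B"), each = 50),
  HeadacheDays = c(rnorm(50, mean = 8.1, sd = 2.81),
                   rnorm(50, mean = 12.6, sd = 3.59))
)

# One row of shape statistics per medication group
t(sapply(split(sim$HeadacheDays, sim$Medication), shape_stats))
```

To apply this to the real data, replace `sim` with `dataset`. Packages such as `moments` or `e1071` offer similar functions, though their exact formulas may differ slightly.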

SHAPIRO-WILK TEST

Purpose: Check the normality for each group’s score statistically.

# CONDUCT THE SHAPIRO-WILK TEST

shapiro.test(dataset$HeadacheDays[dataset$Medication == "A"])
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$HeadacheDays[dataset$Medication == "A"]
## W = 0.97852, p-value = 0.4913
shapiro.test(dataset$HeadacheDays[dataset$Medication == "B"])
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$HeadacheDays[dataset$Medication == "B"]
## W = 0.98758, p-value = 0.8741

QUESTIONS

  • Was the data normally distributed for Variable 1?
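  • Yes. The Shapiro-Wilk test was not significant for Variable 1 (Medication A), W = 0.98, p = .491, so normality can be assumed (p > .05).
  • Was the data normally distributed for Variable 2?
  • Yes. The Shapiro-Wilk test was not significant for Variable 2 (Medication B), W = 0.99, p = .874, so normality can be assumed (p > .05).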
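The decision rule used in these answers (p above .05 means normality is not rejected) can be applied to both groups in one pass. This is a minimal sketch on simulated placeholder data, not the original file; `sim`, `alpha`, and `normality` are illustrative names.

```r
# Simulated placeholder data, NOT the original dataset
set.seed(2)
sim <- data.frame(
  Medication = rep(c("A", "B"), each = 50),
  HeadacheDays = c(rnorm(50, mean = 8.1, sd = 2.81),
                   rnorm(50, mean = 12.6, sd = 3.59))
)

alpha <- 0.05

# Run Shapiro-Wilk within each group and collect the results in one table
normality <- do.call(rbind, lapply(
  split(sim$HeadacheDays, sim$Medication),
  function(x) {
    res <- shapiro.test(x)
    data.frame(
      W = unname(res$statistic),
      p_value = res$p.value,
      reject_normality = res$p.value < alpha
    )
  }
))

normality
```

A non-significant result does not prove normality. It only means the test found no evidence against it, which is why the histograms are checked as well.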

BOXPLOT

Purpose: Check for any outliers impacting the mean for each group’s scores.

# INSTALL REQUIRED PACKAGE

# install.packages("ggplot2")
# install.packages("ggpubr")

# LOAD THE PACKAGE

library(ggplot2)
library(ggpubr)

# CREATE THE BOXPLOT

ggboxplot(dataset, x = "Medication", y = "HeadacheDays",
          color = "Medication",
          palette = "jco",
          add = "jitter")

QUESTION

  • Q1) Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot or are they very far away?
  • There were only two dots outside the boxplot, and both were close to the whiskers.
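The dots a boxplot draws beyond its whiskers follow the 1.5 × IQR rule, so they can also be listed numerically. The sketch below uses base R's `boxplot.stats()`; the helper `find_outliers` and the data frame `sim` are hypothetical, and `sim` is simulated rather than the original data. Note that `boxplot.stats()` works from hinges, which may differ slightly from the quartiles a ggplot2-based boxplot uses, so borderline points may not match exactly.

```r
# Values lying more than 1.5 x IQR beyond the hinges
find_outliers <- function(x) boxplot.stats(x)$out

# Toy vector: 100 lies far beyond the upper whisker
find_outliers(c(1, 2, 3, 4, 5, 100))

# Simulated placeholder data, NOT the original dataset
set.seed(3)
sim <- data.frame(
  Medication = rep(c("A", "B"), each = 50),
  HeadacheDays = c(rnorm(50, mean = 8.1, sd = 2.81),
                   rnorm(50, mean = 12.6, sd = 3.59))
)

# Flagged values per medication group (an empty vector means none)
lapply(split(sim$HeadacheDays, sim$Medication), find_outliers)
```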

INDEPENDENT T-TEST

PURPOSE: Test if there was a difference between the means of the two groups.

t.test(HeadacheDays ~ Medication, data = dataset, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  HeadacheDays by Medication
## t = -6.9862, df = 98, p-value = 3.431e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -5.778247 -3.221753
## sample estimates:
## mean in group A mean in group B 
##             8.1            12.6
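# DETERMINE STATISTICAL SIGNIFICANCE

# p = 3.431e-10 is below .05, so the difference between the group means is statistically significant.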
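The t-test above pools the two group variances (`var.equal = TRUE`). One way to see how much that assumption matters is to compare the pooled test with Welch's version, which does not assume equal variances. The sketch below is a sensitivity check on simulated placeholder data, not a step from the original analysis. `var.test()` is a base R F test used here as a stand-in; it is sensitive to non-normality, and the course may prefer a different check such as Levene's test.

```r
# Simulated placeholder data, NOT the original dataset
set.seed(4)
sim <- data.frame(
  Medication = rep(c("A", "B"), each = 50),
  HeadacheDays = c(rnorm(50, mean = 8.1, sd = 2.81),
                   rnorm(50, mean = 12.6, sd = 3.59))
)

# F test for equality of the two group variances
var.test(HeadacheDays ~ Medication, data = sim)

# Pooled-variance (Student) and Welch versions of the same comparison
student <- t.test(HeadacheDays ~ Medication, data = sim, var.equal = TRUE)
welch <- t.test(HeadacheDays ~ Medication, data = sim, var.equal = FALSE)

data.frame(
  test = c("Student (pooled)", "Welch"),
  t = unname(c(student$statistic, welch$statistic)),
  df = unname(c(student$parameter, welch$parameter)),
  p = c(student$p.value, welch$p.value)
)
```

With equal group sizes, as in this study (n = 50 each), the two t statistics are identical and only the degrees of freedom differ, so the conclusion is unlikely to change.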

EFFECT-SIZE

PURPOSE: Determine how big of a difference there was between the group means.

# INSTALL REQUIRED PACKAGE

# install.packages("effectsize")

# LOAD THE PACKAGE

library(effectsize)

# CALCULATE COHEN’S D

cohens_d_result <- cohens_d(HeadacheDays ~ Medication, data = dataset, pooled_sd = TRUE)
print(cohens_d_result)
## Cohen's d |         95% CI
## --------------------------
## -1.40     | [-1.83, -0.96]
## 
## - Estimated using pooled SD.

QUESTIONS

  • Q1) What is the size of the effect?
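  • A Cohen’s d of -1.40 (95% CI [-1.83, -0.96]) indicates that the difference between the group means was very large. The sign is negative because the Medication A mean is lower than the Medication B mean.
  • Q2) Which group had the higher average score?
  • Group B (Medication B) had the higher mean number of headache days (M = 12.6 vs. M = 8.1 for Medication A).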
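The verbal label attached to d depends on which cutoffs are used. The sketch below assumes Cohen's conventional benchmarks (0.2, 0.5, 0.8) plus a 1.2 threshold for "very large"; these cutoffs are an assumption made for illustration, and the course rubric may define them differently. `label_d` is a hypothetical helper, not a function from the `effectsize` package.

```r
# Map |d| onto verbal labels using assumed cutoffs
label_d <- function(d) {
  cutoffs <- c(0.2, 0.5, 0.8, 1.2)
  labels <- c("negligible", "small", "medium", "large", "very large")
  labels[findInterval(abs(d), cutoffs) + 1]
}

label_d(-1.40)                 # "very large"
label_d(c(0.1, 0.3, 0.6, 0.9)) # "negligible" "small" "medium" "large"
```

The sign of d is dropped before labeling because it only reflects which group was entered first.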