What are the null and alternative hypotheses for YOUR research scenario?
H0: There is no difference in the mean number of headache days between the two medication groups.
H1: There is a difference in the mean number of headache days between the two medication groups.
IMPORT EXCEL FILE
#install.packages("readxl")
LOAD THE PACKAGE
library(readxl)
dataset <- read_excel("C:\\Users\\navya\\Downloads\\A6R1.xlsx")
PURPOSE: Calculate the mean, median, SD, and sample size for each group.
#install.packages("dplyr")
#install.packages("tidyverse")
LOAD THE PACKAGE
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
dataset %>%
group_by(Medication) %>%
summarise(
Mean = mean(HeadacheDays, na.rm = TRUE),
Median = median(HeadacheDays, na.rm = TRUE),
SD = sd(HeadacheDays, na.rm = TRUE),
N = n()
)
## # A tibble: 2 × 5
## Medication Mean Median SD N
## <chr> <dbl> <dbl> <dbl> <int>
## 1 A 8.1 8 2.81 50
## 2 B 12.6 12.5 3.59 50
hist(dataset$HeadacheDays[dataset$Medication == "A"],
main = "Histogram of A",
xlab = "Value",
ylab = "Frequency",
col = "lightblue",
border = "black",
breaks = 20)
hist(dataset$HeadacheDays[dataset$Medication == "B"],
main = "Histogram of B",
xlab = "Value",
ylab = "Frequency",
col = "lightgreen",
border = "black",
breaks = 20)
# QUESTIONS
#Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
#Answer: The histogram of A appears to be slightly positively skewed. The tail of the distribution extends farther toward the right.
#Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
#Answer: The histogram of A looks too tall, which means the values are concentrated around the mean, with thick tails. It does not have a proper bell-curve shape.
#Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
#Answer: The histogram of B appears to be roughly symmetrical, with at most a slight positive skew.
#Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
#Answer: The histogram of B has a slightly peaked shape that is close to a proper bell curve.
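The visual judgments of skewness and kurtosis above can be cross-checked numerically. The following is a minimal sketch in base R, assuming simple moment-based formulas; the helper names `sample_skewness` and `sample_excess_kurtosis` and the toy vectors are hypothetical and are not part of the original analysis.

```r
# Hypothetical helpers (not from the original assignment): moment-based
# sample skewness and excess kurtosis, using base R only.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)   # second central moment
  m3 <- mean((x - mean(x))^3)   # third central moment
  m3 / m2^1.5                   # > 0 means a longer right tail
}

sample_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)
  m4 <- mean((x - mean(x))^4)   # fourth central moment
  m4 / m2^2 - 3                 # about 0 for a normal distribution
}

# Toy vectors (assumed values for illustration, not the A6R1.xlsx data)
symmetric_x <- c(4, 6, 8, 10, 12)
right_skewed_x <- c(1, 1, 2, 2, 3, 10)

sample_skewness(symmetric_x)          # zero for a perfectly symmetric vector
sample_skewness(right_skewed_x)       # positive for a long right tail
sample_excess_kurtosis(symmetric_x)   # negative: flatter than a bell curve
```

With the real data, the same helpers could be applied to `dataset$HeadacheDays[dataset$Medication == "A"]` and to the corresponding vector for group B.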
shapiro.test(dataset$HeadacheDays[dataset$Medication == "A"])
##
## Shapiro-Wilk normality test
##
## data: dataset$HeadacheDays[dataset$Medication == "A"]
## W = 0.97852, p-value = 0.4913
shapiro.test(dataset$HeadacheDays[dataset$Medication == "B"])
##
## Shapiro-Wilk normality test
##
## data: dataset$HeadacheDays[dataset$Medication == "B"]
## W = 0.98758, p-value = 0.8741
# QUESTION
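#Was the data normally distributed for Variable 1?
#Yes. Since p = 0.49 > 0.05, the Shapiro-Wilk test does not reject normality, so the data are treated as normally distributed.
#Was the data normally distributed for Variable 2?
#Yes. Since p = 0.87 > 0.05, the Shapiro-Wilk test does not reject normality, so the data are treated as normally distributed.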
#install.packages("ggplot2")
#install.packages("ggpubr")
library(ggplot2)
library(ggpubr)
#CREATE THE BOXPLOT
ggboxplot(dataset, x = "Medication", y = "HeadacheDays",
color = "Medication",
palette = "jco",
add = "jitter")
# QUESTION
#Q1) Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot (check if there are any dots past the lines on the boxes) or are they very far away?
#Answer:
#BOXPLOT A
#Yes, there is one dot outside the boxplot. It is located just above the upper whisker, so it is not an extreme outlier and we can continue with the Independent t-test.
#BOXPLOT B
#Yes, there are two dots outside the box. They are close to the whiskers, so we can continue with the Independent t-test.
#Conclusion: Since both groups passed the normality check and the outlying dots are close to the whiskers, we should continue with the Independent t-test.
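Whether a dot is "close to the whiskers" can also be checked with numbers rather than by eye. The sketch below assumes the usual 1.5 x IQR rule for boxplot whiskers and, as a further assumption, a 3 x IQR cutoff for "very far away" points; the vector `x` holds toy values, not the assignment data.

```r
# Toy vector (assumed values for illustration, not the A6R1.xlsx data)
x <- c(5, 6, 7, 7, 8, 8, 9, 9, 10, 13)

# First and third quartiles and the interquartile range
q <- unname(quantile(x, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Points beyond 1.5 * IQR from the box are drawn as separate dots
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr
mild_outliers <- x[x < lower_fence | x > upper_fence]

# Assumed convention: points beyond 3 * IQR count as extreme outliers
extreme_outliers <- x[x < q[1] - 3 * iqr | x > q[2] + 3 * iqr]

mild_outliers      # dots outside the whiskers
extreme_outliers   # empty here: the dot is close to the whisker
```

With the real data, `x` would be replaced by the `HeadacheDays` values of one medication group at a time.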
t.test(HeadacheDays ~ Medication, data = dataset, var.equal = TRUE)
##
## Two Sample t-test
##
## data: HeadacheDays by Medication
## t = -6.9862, df = 98, p-value = 3.431e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
## -5.778247 -3.221753
## sample estimates:
## mean in group A mean in group B
## 8.1 12.6
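To show where the t statistic comes from, here is a minimal sketch that rebuilds the pooled-variance (var.equal = TRUE) test from the rounded group summaries reported earlier. The variable names are hypothetical, and because the inputs are rounded, the results should only agree approximately with the t.test() output.

```r
# Rounded summary statistics taken from the group table in this report
mean_a <- 8.1;  sd_a <- 2.81; n_a <- 50
mean_b <- 12.6; sd_b <- 3.59; n_b <- 50

# Pooled variance and standard error of the difference in means
df <- n_a + n_b - 2
pooled_var <- ((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / df
se_diff <- sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Two-sided test statistic, p-value, and 95% confidence interval
t_stat <- (mean_a - mean_b) / se_diff
p_value <- 2 * pt(-abs(t_stat), df)
ci <- (mean_a - mean_b) + c(-1, 1) * qt(0.975, df) * se_diff

t_stat    # close to the reported t of about -6.99
p_value   # far below 0.05
ci        # close to the reported interval
```

Since the p-value is far below 0.05, the null hypothesis of equal mean headache days is rejected.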
#install.packages("effectsize")
library(effectsize)
cohens_d_result <- cohens_d(HeadacheDays ~ Medication, data = dataset, pooled_sd = TRUE)
print(cohens_d_result)
## Cohen's d | 95% CI
## --------------------------
## -1.40 | [-1.83, -0.96]
##
## - Estimated using pooled SD.
# QUESTIONS
#Q1) What is the size of the effect?
#Cohen's d is -1.40. Its magnitude is beyond ±1.30, so the size of the effect is very large.
# Q2) Which group had the higher average score?
#Group B had the higher average score (M = 12.6) compared to Group A (M = 8.1).
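As a cross-check on the effect size, Cohen's d with a pooled SD can be computed by hand from the rounded group summaries in this report. This is a sketch under that rounding assumption, with hypothetical variable names, so it should match the cohens_d() output only approximately.

```r
# Rounded summary statistics taken from the group table in this report
mean_a <- 8.1;  sd_a <- 2.81; n_a <- 50
mean_b <- 12.6; sd_b <- 3.59; n_b <- 50

# Pooled standard deviation across the two groups
pooled_sd <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / (n_a + n_b - 2))

# Cohen's d: mean difference expressed in pooled-SD units
cohens_d_manual <- (mean_a - mean_b) / pooled_sd

# For a pooled-variance t-test, t equals d divided by sqrt(1/n_a + 1/n_b)
t_from_d <- cohens_d_manual / sqrt(1 / n_a + 1 / n_b)

cohens_d_manual   # close to the reported d of about -1.40
t_from_d          # close to the reported t statistic
```

The negative sign only reflects the order of subtraction (A minus B); the magnitude is what determines the size of the effect.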