Scenario 1: Medication A vs Medication B

A medical research team created a new medication to reduce headaches (Medication A). They want to determine if Medication A is more effective at reducing headaches than the current medication on the market (Medication B). A group of participants were randomly assigned to either take Medication A or Medication B. Data was collected for 30 days through an app and participants reported each day if they did or did not have a headache. Was there a difference in the number of headaches between the groups?

HYPOTHESIS TESTED

QUESTION

What are the null and alternate hypotheses for YOUR research scenario?

Null Hypothesis H0: There is no difference in the number of headaches between participants taking Medication A and those taking Medication B.

Alternative Hypothesis H1: There is a difference in the number of headaches between participants taking Medication A and those taking Medication B.

options(repos = c(CRAN = "https://cloud.r-project.org"))
install.packages("readxl")
library(readxl)
dataset <- read_excel("C:\\Users\\N Geetha Shivani\\Downloads\\A6R1.xlsx")

DESCRIPTIVE STATISTICS

PURPOSE: Calculate the mean, median, SD, and sample size for each group.

install.packages("dplyr")

LOAD THE PACKAGE

Always reload the package you want to use.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

CALCULATE THE DESCRIPTIVE STATISTICS

dataset %>%
  group_by(Medication) %>%
  summarise(
    Mean = mean(HeadacheDays, na.rm = TRUE),
    Median = median(HeadacheDays, na.rm = TRUE),
    SD = sd(HeadacheDays, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   Medication  Mean Median    SD     N
##   <chr>      <dbl>  <dbl> <dbl> <int>
## 1 A            8.1    8    2.81    50
## 2 B           12.6   12.5  3.59    50

HISTOGRAMS

Purpose: Visually check the normality of the scores for each group.

hist(dataset$HeadacheDays[dataset$Medication == "A"],
     main = "Histogram of Headache Days (Medication A)",
     xlab = "Headache Days",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(dataset$HeadacheDays[dataset$Medication == "B"],
     main = "Histogram of Headache Days (Medication B)",
     xlab = "Headache Days",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

QUESTIONS

Answer the questions below as comments within the R script:

Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?

  1. The histogram for Group A looks symmetrical.

Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?

  1. The histogram has a proper bell-shaped curve.

Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?

  1. The histogram for Group B looks symmetrical.

Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?

  1. The histogram has a proper bell-shaped curve.
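
The visual judgments above can be backed up with numbers. The sketch below is only an illustration: it uses simulated values rather than the A6R1.xlsx data, and the helper names `skewness` and `excess_kurtosis` are hypothetical base-R functions, not part of any package used in this assignment.

```r
# Hypothetical stand-in data: 50 simulated values, NOT the A6R1.xlsx data.
set.seed(1)
x <- rnorm(50, mean = 8, sd = 3)

# Moment-based sample skewness: 0 for a perfectly symmetrical distribution,
# positive for a right (positive) skew, negative for a left (negative) skew.
skewness <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

# Moment-based excess kurtosis: 0 for a normal bell curve,
# negative when the shape is too flat, positive when it is too tall.
excess_kurtosis <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^4) / mean((v - m)^2)^2 - 3
}

skew_x <- skewness(x)
kurt_x <- excess_kurtosis(x)
print(c(skewness = skew_x, excess_kurtosis = kurt_x))
```

With the real data, `x` would be replaced by something like `dataset$HeadacheDays[dataset$Medication == "A"]`. Values near 0 for both numbers would agree with a symmetrical, bell-shaped histogram.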

SHAPIRO-WILK TEST

Purpose: Check the normality of each group’s scores statistically. The Shapiro-Wilk test compares the shape of the data with a normal distribution, so it is sensitive to both skewness and kurtosis. The test asks: “Is this variable the SAME as normal data (null hypothesis) or DIFFERENT from normal data (alternate hypothesis)?” For this test, if p is GREATER than .05 (p > .05), there is no evidence against normality and the data are treated as NORMAL. If p is LESS than .05 (p < .05), the data are treated as NOT normal.

shapiro.test(dataset$HeadacheDays[dataset$Medication == "A"])
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$HeadacheDays[dataset$Medication == "A"]
## W = 0.97852, p-value = 0.4913
shapiro.test(dataset$HeadacheDays[dataset$Medication == "B"])
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$HeadacheDays[dataset$Medication == "B"]
## W = 0.98758, p-value = 0.8741

QUESTION

Answer the questions below as a comment within the R script:

Was the data normally distributed for Variable 1?

  1. Yes, the data are normally distributed for Group A (W = 0.979, p = .491).

Was the data normally distributed for Variable 2?

  1. Yes, the data are normally distributed for Group B (W = 0.988, p = .874).

NOTE: If p > 0.05 (P-value is GREATER than .05) this means the data is NORMAL. Continue to the boxplot check below. If p < 0.05 (P-value is LESS than .05) this means the data is NOT normal (switch to Mann-Whitney U).
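
The decision rule in the note can be applied to both groups in one step. This is a minimal sketch that assumes a data frame with the same two column names as the dataset; the data frame `sim` is simulated for illustration and is not the study data.

```r
# Hypothetical stand-in data (simulated), shaped like the real dataset.
set.seed(42)
sim <- data.frame(
  Medication   = rep(c("A", "B"), each = 50),
  HeadacheDays = c(rnorm(50, mean = 8, sd = 3), rnorm(50, mean = 12.5, sd = 3.5))
)

# Run shapiro.test() on each group and keep W and the p-value.
groups <- split(sim$HeadacheDays, sim$Medication)
normality <- do.call(rbind, lapply(groups, function(v) {
  res <- shapiro.test(v)
  data.frame(W = unname(res$statistic), p_value = res$p.value)
}))
normality$Medication <- rownames(normality)

# Decision rule from the note: p > .05 means the group is treated as normal.
normality$treated_as_normal <- normality$p_value > 0.05
print(normality)
```

If any group came back `FALSE`, the note above would point to the Mann-Whitney U test instead of the t-test.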

BOXPLOT

Purpose: Check for any outliers impacting the mean for each group’s scores.

install.packages("ggplot2")
install.packages("ggpubr")

library(ggplot2)
library(ggpubr)

CREATE THE BOXPLOT

ggboxplot(dataset, x = "Medication", y = "HeadacheDays",
          color = "Medication",
          palette = "jco",
          add = "jitter")

QUESTION

Q1) Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot or are they very far away?

  1. In both boxplots there are a few dots outside the boxes, and they sit close to the whiskers, so no extreme outliers appear to be distorting the group means. Hence we proceed with the independent t-test.
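
Outliers can also be listed numerically instead of judged by eye. The sketch below uses base R's `boxplot.stats()`, which flags points more than 1.5 times the box length beyond the box edges. The vector `headache_days` holds made-up values for illustration only.

```r
# Hypothetical values for illustration only, NOT the study data.
headache_days <- c(5, 6, 7, 7, 8, 8, 8, 9, 9, 10, 11, 25)

# boxplot.stats() returns the five-number summary in $stats and
# the points lying beyond the whiskers in $out.
stats <- boxplot.stats(headache_days)
outliers <- stats$out

print(stats$stats)
print(outliers)
```

With the real data, the same call could be run once per group, for example on `dataset$HeadacheDays[dataset$Medication == "A"]`. An empty result would mean no points fall beyond the whiskers.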

INDEPENDENT T-TEST

PURPOSE: Test if there was a difference between the means of the two groups.

Replace “dataset” with your dataset name (without .xlsx)

Replace “score” with your dependent variable excel name (example: USD)

Replace “group” with your independent variable excel name (example: Country)

t.test(HeadacheDays ~ Medication, data = dataset, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  HeadacheDays by Medication
## t = -6.9862, df = 98, p-value = 3.431e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -5.778247 -3.221753
## sample estimates:
## mean in group A mean in group B 
##             8.1            12.6
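
As a cross-check, the t-test can be recomputed by hand from the rounded values in the descriptive statistics table. This is a sketch of the standard pooled-variance formula (matching `var.equal = TRUE`); because the means and SDs are rounded, the results should land close to, but not exactly on, the output above.

```r
# Rounded summary statistics copied from the descriptive statistics table.
mean_a <- 8.1;  sd_a <- 2.81; n_a <- 50
mean_b <- 12.6; sd_b <- 3.59; n_b <- 50

# Pooled SD (equal variances assumed, matching var.equal = TRUE).
df <- n_a + n_b - 2
pooled_sd <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / df)

# Standard error of the difference, t statistic, and two-sided p-value.
se_diff <- pooled_sd * sqrt(1 / n_a + 1 / n_b)
t_stat  <- (mean_a - mean_b) / se_diff
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval for the difference in means (A minus B).
ci <- (mean_a - mean_b) + c(-1, 1) * qt(0.975, df) * se_diff

print(c(t = t_stat, df = df, p = p_value))
print(ci)
```

The negative sign of t simply reflects the order of subtraction (A minus B): Medication A had the lower mean.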

DETERMINE STATISTICAL SIGNIFICANCE

If results were statistically significant (p < .05), continue to effect size section below.

If results were NOT statistically significant (p > .05), skip to reporting section below.

NOTE: Getting results that are not statistically significant does NOT mean you switch to Mann-Whitney U.

The Mann-Whitney U test is only for data that are not normally distributed; the choice of test is not based on whether the outcome is significant.
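
For reference, this is roughly what the switch to Mann-Whitney U would look like if the normality check had failed. It was not needed for this scenario. The data frame `sim` is simulated, deliberately skewed data, not the study data; `wilcox.test()` is the base-R function that runs the Mann-Whitney U test.

```r
# Hypothetical skewed data (simulated), NOT the study data.
set.seed(7)
sim <- data.frame(
  Medication   = rep(c("A", "B"), each = 50),
  HeadacheDays = c(rexp(50, rate = 1 / 8), rexp(50, rate = 1 / 12))
)

# Mann-Whitney U test (also called the Wilcoxon rank-sum test).
mw <- wilcox.test(HeadacheDays ~ Medication, data = sim)
print(mw)
```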

EFFECT-SIZE

PURPOSE: Determine how big of a difference there was between the group means.

INSTALL REQUIRED PACKAGE

If never installed, remove the hashtag before the install code.

If previously installed, leave the hashtag in front of the code.

install.packages("effectsize")

LOAD THE PACKAGE

Always load the package you want to use.

library(effectsize)

CALCULATE COHEN’S D

Replace “dataset” with your excel dataset name (without .xlsx)

Replace “score” with your dependent variable excel name (example: USD)

Replace “group” with your independent variable excel name (example: Country)

cohens_d_result <- cohens_d(HeadacheDays ~ Medication, data = dataset, pooled_sd = TRUE)
print(cohens_d_result)
## Cohen's d |         95% CI
## --------------------------
## -1.40     | [-1.83, -0.96]
## 
## - Estimated using pooled SD.

QUESTIONS

Answer the questions below as a comment within the R script:

Q1) What is the size of the effect?

The effect size describes how big or small the difference between the group averages was.

± 0.00 to 0.19 = ignore

± 0.20 to 0.49 = small

± 0.50 to 0.79 = moderate

± 0.80 to 1.29 = large

± 1.30 and above = very large

Example 1) A Cohen’s d of 0.10 indicates the difference between the group averages was not truly meaningful. There was no effect.

Example 2) A Cohen’s d of 0.22 indicates the difference between the group averages was small.

  1. Cohen’s d was -1.40 (95% CI [-1.83, -0.96]), which is a very large effect.
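
The cut-offs above can be turned into a small lookup. The sketch below recomputes Cohen's d from the rounded summary statistics and labels it; `label_effect` is a hypothetical helper written for this illustration, not a function from the effectsize package. It treats the ranges as continuous, so a value such as 0.195 falls in "ignore".

```r
# Rounded summary statistics from the descriptive statistics table.
mean_a <- 8.1;  sd_a <- 2.81; n_a <- 50
mean_b <- 12.6; sd_b <- 3.59; n_b <- 50

# Cohen's d with a pooled SD (same idea as pooled_sd = TRUE).
pooled_sd <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / (n_a + n_b - 2))
d <- (mean_a - mean_b) / pooled_sd

# Hypothetical helper: maps the absolute value of d onto the labels above.
label_effect <- function(d) {
  as.character(cut(abs(d),
                   breaks = c(0, 0.20, 0.50, 0.80, 1.30, Inf),
                   labels = c("ignore", "small", "moderate", "large", "very large"),
                   right = FALSE))
}

print(round(d, 2))
print(label_effect(d))
```

The sign of d is ignored when choosing the label; it only indicates which group had the higher mean.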

Q2) Which group had the higher average score?

You will notice that this effect size is either positive or negative. This tells us whether Group A or Group B had a higher score.

The group you entered first into your code is Group A, and the group you entered second is B.

However, it can be confusing to remember which group is A and which group is B.

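To make things easy, just look at the means of each group to see which group had the higher score.

  1. Medication B had the higher average number of headache days (M = 12.6) compared with Medication A (M = 8.1). The negative Cohen’s d reflects that Medication A, the first group, had the lower mean.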

WRITTEN REPORT FOR INDEPENDENT T-TEST

Write a paragraph summarizing your findings.

1) REVIEW YOUR OUTPUT

Collect the information below from your output:

1. The name of the inferential test used (Independent t-test)

2. The names of the IV and DV (their proper names, not their R code names).

3. The sample size for each group (labeled as “n”).

4. Whether the inferential test results were statistically significant (p < .05) or not (p > .05)

5. The mean and SD for each group’s score on the DV (rounded to two places after the decimal)

6. Degrees of freedom (labeled as “df”)

7. t-value (labeled as “t” in the output)

8. EXACT p-value to three decimals. NOTE: If p > .05, just report p > .05. If p < .001, just report p < .001.

9. Effect size (Cohen’s d) ** Only if the results were significant

2) REPORT YOUR DATA AS A PARAGRAPH

An example report is provided below. You should copy the paragraph and just edit/ replace words with your information.

This is not considered plagiarizing because science has a specific format for reporting information.

EXAMPLE

An independent t-test was conducted to compare exam scores between students who attended a review session (n = 60) and students who did not (n = 60). Students who attended the review session scored significantly higher (M = 85.31, SD = 6.12) than students who did not attend a review session (M = 78.21, SD = 7.42), t(118) = 4.25, p < .001. The effect size was moderate (d = 0.78), indicating a moderate difference between student exam scores. Overall, attending the review session resulted in higher exam scores.
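
One way to avoid copying errors is to build the report sentence from the numbers in the output. The sketch below fills the example template with the values reported earlier in this document for the medication scenario. The wording and the helper `format_p` are illustrative assumptions, not a required format.

```r
# Hypothetical helper: applies the p-value reporting rules listed above.
format_p <- function(p) {
  if (p < 0.001) {
    "p < .001"
  } else if (p > 0.05) {
    "p > .05"
  } else {
    paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
  }
}

# Values copied from the outputs in this document.
report <- sprintf(
  paste(
    "An independent t-test was conducted to compare the number of headache days",
    "between participants taking Medication A (n = %d) and Medication B (n = %d).",
    "Participants taking Medication A reported significantly fewer headache days",
    "(M = %.2f, SD = %.2f) than those taking Medication B (M = %.2f, SD = %.2f),",
    "t(%d) = %.2f, %s. The effect size was very large (d = %.2f)."
  ),
  50L, 50L, 8.1, 2.81, 12.6, 3.59, 98L, -6.9862, format_p(3.431e-10), -1.40
)

cat(strwrap(report, width = 80), sep = "\n")
```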