Weekly Lab Homework Assignment: Hypothesis Testing for Samples from Two Populations

Objective:

In this lab assignment, you will apply your understanding of hypothesis testing by conducting various tests and interpreting their results. You’ll work with different datasets to perform t-tests, calculate confidence intervals, and analyze the results. Once you have completed the exercises, knit this document to HTML and publish it to RPubs. Make sure your YAML header includes a title, your name, and the date.

Run the chunk below to create the apa_theme object (a ggplot2 theme, not a function, so add it to a plot as apa_theme without parentheses).

# Load ggplot2
library(ggplot2)
apa_theme <- theme_classic() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    legend.position = "bottom",
    text = element_text(family = "serif", size = 12),
    axis.title = element_text(size = 12),
    plot.title = element_text(size = 14, hjust = 0.5)
  )

Exercise 1: Conducting an Independent Samples t-Test

Scenario: You are studying the effect of two different teaching methods on student performance. You collect test scores from two groups of students who were taught using different methods. The data is as follows:

Group A Scores: c(78, 82, 85, 88, 91, 77, 85, 89, 90, 92)
Group B Scores: c(70, 75, 78, 74, 72, 68, 73, 76, 74, 71)

group_a <- c(78, 82, 85, 88, 91, 77, 85, 89, 90, 92)
group_b <- c(70, 75, 78, 74, 72, 68, 73, 76, 74, 71)
t.test(group_a, group_b, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  group_a and group_b
## t = 6.5702, df = 18, p-value = 3.582e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   8.57095 16.62905
## sample estimates:
## mean of x mean of y 
##      85.7      73.1

t-Value: 6.57
Degrees of Freedom: 18
p-Value: < 0.001
Interpretation: There is a statistically significant difference in test scores between Group A (M = 85.7) and Group B (M = 73.1), t(18) = 6.57, p < .001. Students taught with Method A scored significantly higher, by 12.6 points on average.
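To see where the t-value comes from, the pooled-variance statistic can be recomputed by hand. This is a minimal sketch, assuming equal population variances (the same assumption as var.equal = TRUE); the names pooled_var, se_diff, t_manual, df_manual, and p_manual are illustrative and are not part of the assignment.

```r
# Exercise 1 data, repeated so the chunk runs on its own
group_a <- c(78, 82, 85, 88, 91, 77, 85, 89, 90, 92)
group_b <- c(70, 75, 78, 74, 72, 68, 73, 76, 74, 71)

n_a <- length(group_a)
n_b <- length(group_b)

# Pooled variance: the two sample variances weighted by their degrees of freedom
pooled_var <- ((n_a - 1) * var(group_a) + (n_b - 1) * var(group_b)) /
  (n_a + n_b - 2)

# Standard error of the difference between the two means
se_diff <- sqrt(pooled_var * (1 / n_a + 1 / n_b))

# t statistic, degrees of freedom, and two-sided p-value
t_manual  <- (mean(group_a) - mean(group_b)) / se_diff
df_manual <- n_a + n_b - 2
p_manual  <- 2 * pt(abs(t_manual), df = df_manual, lower.tail = FALSE)

round(t_manual, 4)  # 6.5702, matching the t.test() output
```

If the equal-variance assumption is in doubt, dropping var.equal = TRUE makes t.test() run Welch's test instead, which adjusts the degrees of freedom.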

Exercise 2: Confidence Interval for the Mean Difference

t.test(group_a, group_b, var.equal = TRUE)$conf.int

## [1]  8.57095 16.62905
## attr(,"conf.level")
## [1] 0.95

95% Confidence Interval: [8.57, 16.63]
Interpretation: We are 95% confident that the true mean difference in scores (Group A minus Group B) lies between 8.57 and 16.63 points. Since the interval does not include zero, the difference is statistically significant at the 0.05 level.
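The interval can also be rebuilt from its parts: the mean difference plus or minus the critical t value times the standard error. The sketch below assumes the same pooled-variance setup as the test above; mean_diff, t_crit, and ci_manual are illustrative names.

```r
# Exercise 1 data, repeated so the chunk runs on its own
group_a <- c(78, 82, 85, 88, 91, 77, 85, 89, 90, 92)
group_b <- c(70, 75, 78, 74, 72, 68, 73, 76, 74, 71)

n_a <- length(group_a)
n_b <- length(group_b)

# Pooled variance and standard error of the difference (equal variances assumed)
pooled_var <- ((n_a - 1) * var(group_a) + (n_b - 1) * var(group_b)) /
  (n_a + n_b - 2)
se_diff <- sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Point estimate and the critical value for a two-sided 95% interval
mean_diff <- mean(group_a) - mean(group_b)
t_crit    <- qt(0.975, df = n_a + n_b - 2)

# Lower and upper limits
ci_manual <- mean_diff + c(-1, 1) * t_crit * se_diff
round(ci_manual, 2)  # 8.57 16.63
```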

Exercise 3: Conducting a Paired Samples t-Test

Scenario: You are evaluating the effectiveness of a new therapy in reducing anxiety levels.

Anxiety Scores Before Therapy: c(75, 78, 74, 80, 79, 82, 77, 81, 76, 83)
Anxiety Scores After Therapy: c(70, 74, 71, 76, 75, 78, 74, 77, 72, 79)

anxiety_before <- c(75, 78, 74, 80, 79, 82, 77, 81, 76, 83)
anxiety_after <- c(70, 74, 71, 76, 75, 78, 74, 77, 72, 79)
t.test(anxiety_before, anxiety_after, paired = TRUE)

## 
##  Paired t-test
## 
## data:  anxiety_before and anxiety_after
## t = 21.726, df = 9, p-value = 4.369e-09
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  3.49393 4.30607
## sample estimates:
## mean difference 
##             3.9

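t-Value: 21.73
Degrees of Freedom: 9
p-Value: < 0.001
Interpretation: Anxiety scores were significantly lower after therapy, t(9) = 21.73, p < .001, with a mean decrease of 3.9 points (95% CI [3.49, 4.31]). This is consistent with the therapy being effective, although without a control group the decrease cannot be attributed to the therapy alone.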
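A paired t-test is equivalent to a one-sample t-test on the within-subject differences, which explains why the t-value is so large here: every subject improved by 3 to 5 points, so the differences vary very little. The sketch below illustrates this; diffs, t_manual, and one_sample are illustrative names.

```r
# Exercise 3 data, repeated so the chunk runs on its own
anxiety_before <- c(75, 78, 74, 80, 79, 82, 77, 81, 76, 83)
anxiety_after  <- c(70, 74, 71, 76, 75, 78, 74, 77, 72, 79)

# Within-subject differences (before minus after)
diffs <- anxiety_before - anxiety_after

# t statistic: mean difference divided by its standard error
t_manual <- mean(diffs) / (sd(diffs) / sqrt(length(diffs)))

# The same test expressed as a one-sample t-test against zero
one_sample <- t.test(diffs, mu = 0)

round(t_manual, 3)              # 21.726
round(one_sample$statistic, 3)  # same value as the paired test
```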

# Create a subject identifier (assumes 10 subjects)
subject <- 1:10

# Create data frame for Before measurements
anxiety_data_before <- data.frame(
  subject = subject,
  time = "before",
  anxiety = anxiety_before
)

# Create data frame for After measurements
anxiety_data_after <- data.frame(
  subject = subject,
  time = "after",
  anxiety = anxiety_after
)

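# Combine the two data frames into one long format data frame
anxiety_data <- rbind(anxiety_data_before, anxiety_data_after)

# Order the time factor so "before" is plotted to the left of "after"
# (character values would otherwise be sorted alphabetically)
anxiety_data$time <- factor(anxiety_data$time, levels = c("before", "after"))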

# Create a boxplot using geom_boxplot(). Add apa_theme to it
ggplot(anxiety_data, aes(x = time, y = anxiety)) +
  geom_boxplot(fill = "lightblue") +
  apa_theme +
  ggtitle("Anxiety Levels Before and After Therapy")

Exercise 4: Understanding Significance and Effect Size

Scenario: You conduct a study comparing the effectiveness of two diets on weight loss. The independent samples t-test yields a p-value of 0.03 and a Cohen’s d of 0.5.

Is the result statistically significant?
Yes. A p-value of 0.03 is below the conventional 0.05 threshold, so the result is statistically significant. If there were truly no difference between the diets, a difference in weight loss at least this large would be expected in only about 3% of samples.
Interpret the effect size (Cohen’s d = 0.5):
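A Cohen’s d of 0.5 is a medium effect size by Cohen’s conventions (0.2 small, 0.5 medium, 0.8 large). The two diet groups differ by about half a pooled standard deviation, a moderate difference. Whether that is meaningful in practical terms depends on how much weight loss half a standard deviation represents.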
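The diet scenario gives only summary values, so Cohen's d cannot be recomputed for it. As an illustration of the formula, the sketch below applies it to the Exercise 1 scores instead. The helper cohens_d is hypothetical (it is not from base R or from the assignment) and assumes two independent groups with a pooled standard deviation.

```r
# Hypothetical helper: Cohen's d for two independent groups,
# using the pooled standard deviation as the standardizer
cohens_d <- function(x, y) {
  n_x <- length(x)
  n_y <- length(y)
  pooled_sd <- sqrt(((n_x - 1) * var(x) + (n_y - 1) * var(y)) /
                      (n_x + n_y - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Exercise 1 data, used here only to illustrate the calculation
group_a <- c(78, 82, 85, 88, 91, 77, 85, 89, 90, 92)
group_b <- c(70, 75, 78, 74, 72, 68, 73, 76, 74, 71)

d <- cohens_d(group_a, group_b)
round(d, 2)  # about 2.94, far beyond the 0.8 benchmark for a large effect
```

The p-value and the effect size answer different questions: the p-value reflects how surprising the data would be if there were no difference, while d describes how large the difference is in standard deviation units.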

Submission Instructions:
Knit your document to PDF and check that all content displays correctly before submission. Submit the PDF to Canvas Assignments.