Weekly Lab Homework Assignment: Hypothesis Testing for Samples from Two Populations

Objective:

In this lab assignment, you will apply your understanding of hypothesis testing by conducting various tests and interpreting their results. You’ll work with different datasets to perform t-tests, calculate confidence intervals, and analyze the results. Once you have completed the exercises, knit this document to HTML and publish it to RPubs. Make sure your YAML header includes a title, your name, and the date.

Run the chunk below to create the apa_theme object.

# Load ggplot2
library(ggplot2)
apa_theme <- theme_classic() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    legend.position = "bottom",
    text = element_text(family = "serif", size = 12),
    axis.title = element_text(size = 12),
    plot.title = element_text(size = 14, hjust = 0.5)
  )

Exercise 1: Conducting an Independent Samples t-Test

Scenario: You are studying the effect of two different teaching methods on student performance. You collect test scores from two groups of students who were taught using different methods. The data is as follows:

Group A Scores: c(78, 82, 85, 88, 91, 77, 85, 89, 90, 92)
Group B Scores: c(70, 75, 80, 82, 84, 76, 78, 81, 83, 85)

Tasks:

1. Conduct an independent samples t-test to compare the means of the two groups using R.

2. Create an APA-style boxplot using ggplot() and geom_boxplot().

3. Interpret the t-value, degrees of freedom, and p-value from the output.

# Sample data
# Define the data for each group  
group_A <- c(78, 82, 85, 88, 91, 77, 85, 89, 90, 92)  
group_B <- c(70, 75, 80, 82, 84, 76, 78, 81, 83, 85)  
  
# Create a data frame with group and performance variables  
performance_data <- data.frame(  
  group = factor(rep(c("A", "B"), each = 10)),  
  performance = c(group_A, group_B)  
)

# Conduct the independent samples t-test
independent_t <- t.test(group_A, group_B, var.equal = TRUE)
independent_t

## 
##  Two Sample t-test
## 
## data:  group_A and group_B
## t = 2.8222, df = 18, p-value = 0.01129
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   1.610032 10.989968
## sample estimates:
## mean of x mean of y 
##      85.7      79.4

#create a boxplot using geom_boxplot(). Add apa_theme to it
ggplot(performance_data, aes(x = group, y = performance, fill = group)) +
  geom_boxplot(outlier.shape = 16, outlier.size = 2) +
  labs(title = "Test Scores by Teaching Method", x = "Group", y = "Test Score") +
  apa_theme +
  scale_fill_manual(values = c("#4E79A7", "#F28E2B")) +
  theme(legend.position = "none")

t-Value: 2.82
Degrees of Freedom: 18
p-Value: 0.0113
Interpretation: Because the p-value (0.0113) is less than 0.05, we reject the null hypothesis of equal means: there is a statistically significant difference in test scores between Group A and Group B, t(18) = 2.82, p = .011. Group A had the higher mean score (85.7 vs. 79.4), which suggests that the teaching method used for Group A may be more effective.
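As a check on the reported values, the pooled-variance t statistic can be reproduced by hand. This is a minimal sketch, not part of the original assignment; the names sp2, se_diff, and deg_free are illustrative. It assumes equal population variances, which is the same assumption made by var.equal = TRUE.

```r
# Reproduce the Student (pooled-variance) t statistic from Exercise 1 by hand.
group_A <- c(78, 82, 85, 88, 91, 77, 85, 89, 90, 92)
group_B <- c(70, 75, 80, 82, 84, 76, 78, 81, 83, 85)

n_A <- length(group_A)
n_B <- length(group_B)

# Pooled variance: the two sample variances weighted by their degrees of freedom
sp2 <- ((n_A - 1) * var(group_A) + (n_B - 1) * var(group_B)) / (n_A + n_B - 2)

# Standard error of the difference between the two sample means
se_diff <- sqrt(sp2 * (1 / n_A + 1 / n_B))

# t statistic, degrees of freedom, and two-sided p-value
t_stat   <- (mean(group_A) - mean(group_B)) / se_diff
deg_free <- n_A + n_B - 2
p_value  <- 2 * pt(abs(t_stat), deg_free, lower.tail = FALSE)

round(c(t = t_stat, df = deg_free, p = p_value), 4)
```

The three values should agree with the t.test() output above (t = 2.8222, df = 18, p-value = 0.01129). If the equal-variance assumption is in doubt, calling t.test() without var.equal = TRUE runs Welch's test instead.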

Exercise 2: Calculating Confidence Intervals

Scenario: You want to calculate a 95% confidence interval for the difference between the means of the two groups (Group A and Group B) from Exercise 1.

Tasks:

1. Calculate the 95% confidence interval for the difference between the means of Group A and Group B using the t-test results.

2. Interpret the confidence interval and discuss what it suggests about the difference between the two teaching methods.

# Confidence interval from t-test result
independent_t$conf.int

## [1]  1.610032 10.989968
## attr(,"conf.level")
## [1] 0.95

95% Confidence Interval: [1.61, 10.99]
Interpretation: The 95% confidence interval for the difference between the means (Group A minus Group B) does not include zero, which is consistent with the statistically significant t-test result at the 0.05 level. We can be 95% confident that the true mean score under Method A is between 1.61 and 10.99 points higher than under Method B. The interval is fairly wide, so with only 10 students per group the size of the advantage is estimated imprecisely.
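The interval can also be rebuilt from its parts: the mean difference plus or minus the critical t value times the standard error. The sketch below is illustrative only (the names mean_diff, t_crit, and ci are not from the assignment) and uses the same pooled-variance assumption as the t-test in Exercise 1.

```r
# Rebuild the 95% confidence interval for the difference in means by hand.
group_A <- c(78, 82, 85, 88, 91, 77, 85, 89, 90, 92)
group_B <- c(70, 75, 80, 82, 84, 76, 78, 81, 83, 85)

n_A <- length(group_A)
n_B <- length(group_B)

mean_diff <- mean(group_A) - mean(group_B)

# Pooled variance and standard error of the difference
sp2     <- ((n_A - 1) * var(group_A) + (n_B - 1) * var(group_B)) / (n_A + n_B - 2)
se_diff <- sqrt(sp2 * (1 / n_A + 1 / n_B))

# Critical value for a two-sided 95% interval (2.5% in each tail)
t_crit <- qt(0.975, df = n_A + n_B - 2)

ci <- mean_diff + c(-1, 1) * t_crit * se_diff
round(ci, 2)
```

The result should match independent_t$conf.int. Changing 0.975 to 0.995 would give a 99% interval, which is wider because more confidence requires a larger margin of error.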

Exercise 3: Conducting a Paired Samples t-Test

Scenario: You collect data on the anxiety levels of participants before and after they complete a stress management program. The data is as follows:

Anxiety Levels Before: c(60, 62, 65, 68, 70, 72, 74, 76, 78, 80)
Anxiety Levels After: c(55, 58, 60, 62, 63, 65, 66, 68, 70, 72)

Tasks:

1. Conduct a paired samples t-test to determine if there is a significant difference in anxiety levels before and after the program.

2. Create an APA-style boxplot using ggplot() and geom_boxplot().

3. Interpret the t-value, degrees of freedom, and p-value from the output.

# Sample data
# Define the data for anxiety levels before and after  
anxiety_before <- c(60, 62, 65, 68, 70, 72, 74, 76, 78, 80)  
anxiety_after <- c(55, 58, 60, 62, 63, 65, 66, 68, 70, 72)

# Conduct the paired samples t-test
paired_t <- t.test(anxiety_before, anxiety_after, paired = TRUE)
paired_t

## 
##  Paired t-test
## 
## data:  anxiety_before and anxiety_after
## t = 13.863, df = 9, p-value = 2.233e-07
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  5.522998 7.677002
## sample estimates:
## mean difference 
##             6.6

Run the chunk below before making your graph.

# Create a subject identifier (assumes 10 subjects)  
subject <- 1:10  
  
# Create data frame for Before measurements  
anxiety_data_before <- data.frame(  
  subject = subject,  
  time = "before",  
  anxiety = anxiety_before)

# Create data frame for After measurements  
anxiety_data_after <- data.frame(  
  subject = subject,  
  time = "after",  
  anxiety = anxiety_after  
)  
  
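# Combine the two data frames into one long format data frame  
anxiety_data <- rbind(anxiety_data_before, anxiety_data_after)
# Order the time factor so "before" is plotted to the left of "after"
# (character values would otherwise be sorted alphabetically)
anxiety_data$time <- factor(anxiety_data$time, levels = c("before", "after"))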

#create a boxplot using geom_boxplot(). Add apa_theme to it
#use the new anxiety_data created in the previous chunk.
ggplot(anxiety_data, aes(x = time, y = anxiety, fill = time)) +
  geom_boxplot(width = 0.6, outlier.shape = NA) +
  scale_fill_manual(values = c("#AEC6CF", "#FFDAB9")) +
  labs(title = "Anxiety Levels Before and After Stress Management Program",
       x = "Time",
       y = "Anxiety Score") +
  apa_theme

t-Value: 13.86
Degrees of Freedom: 9
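p-Value: 2.233e-07 (p < .001; very small, but not exactly 0)
Interpretation: Because the p-value is less than 0.05, there is a statistically significant decrease in anxiety levels after the stress management program, t(9) = 13.86, p < .001. On average, anxiety dropped by 6.6 points (95% CI [5.52, 7.68]). This is consistent with the program reducing participants' anxiety, although without a control group the change cannot be attributed to the program alone.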
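A paired t-test is equivalent to a one-sample t-test on the within-subject differences, which is why the degrees of freedom are 9 (10 pairs minus 1) rather than 18. The sketch below illustrates this; the names diffs, one_sample, and t_by_hand are illustrative and not part of the assignment.

```r
# Show that the paired t-test equals a one-sample t-test on the differences.
anxiety_before <- c(60, 62, 65, 68, 70, 72, 74, 76, 78, 80)
anxiety_after  <- c(55, 58, 60, 62, 63, 65, 66, 68, 70, 72)

# One difference score per participant (before minus after)
diffs <- anxiety_before - anxiety_after

# One-sample t-test of the differences against a mean of 0
one_sample <- t.test(diffs, mu = 0)

# The same statistic by hand: mean difference divided by its standard error
t_by_hand <- mean(diffs) / (sd(diffs) / sqrt(length(diffs)))

c(t_test = unname(one_sample$statistic), by_hand = t_by_hand)
```

Both values should match the t reported by t.test(anxiety_before, anxiety_after, paired = TRUE). Because the test works on difference scores, it removes stable between-person variation, which is part of why the t-value here is so large.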

Exercise 4: Understanding Significance and Effect Size

Scenario: You conduct a study comparing the effectiveness of two diets on weight loss. The independent samples t-test yields a p-value of 0.03 and a Cohen’s d of 0.5.

Tasks:

1. Discuss whether the result is statistically significant and what the p-value indicates.

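The p-value is 0.03, which is smaller than the conventional 0.05 threshold, so the result is statistically significant. If there were truly no difference in weight loss between the two diets, a difference at least as large as the one observed would occur only about 3% of the time. We therefore reject the null hypothesis and conclude that the data provide evidence of a difference in weight loss between the two diets.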

2. Interpret the effect size (Cohen’s d = 0.5) and its practical significance in the context of the study’s findings.

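A Cohen’s d of 0.5 is conventionally considered a “medium” effect size: the mean weight loss for the two diets differs by about half a standard deviation. The difference is therefore not only statistically significant but also large enough to be potentially meaningful in practice. One diet helped people lose more weight than the other, although how much that matters depends on how large a standard deviation is in actual weight units.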
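The diet scenario reports only summary values, so there is no raw data to compute d from. To show how Cohen’s d is obtained, the sketch below uses the Exercise 1 test scores as a stand-in (an assumption made purely for illustration). The helper cohens_d() is hypothetical, written here for the example, and does not come from a package.

```r
# Cohen's d for two independent groups: mean difference in pooled-SD units.
# cohens_d() is a hypothetical helper defined for this illustration only.
cohens_d <- function(x, y) {
  n_x <- length(x)
  n_y <- length(y)
  # Pooled standard deviation of the two groups
  sp <- sqrt(((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2))
  (mean(x) - mean(y)) / sp
}

# Stand-in data: the Exercise 1 scores, not the diet study
group_A <- c(78, 82, 85, 88, 91, 77, 85, 89, 90, 92)
group_B <- c(70, 75, 80, 82, 84, 76, 78, 81, 83, 85)

d <- cohens_d(group_A, group_B)
round(d, 2)
```

Unlike the p-value, d does not shrink or grow with sample size alone, so it is the better guide to how large a difference is rather than how detectable it is.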

Allison Cavanagh