Replace “Your Name” with your actual name.

Learning Goals:

  • Understand and apply the Chi-Square test of independence

  • Interpret R-squared and Adjusted R-squared

  • Compare models using an F-test

  • Check model assumptions using residual plots

Exercise 1: Chi-Square Test with a 2x2 Contingency Table

A psychologist is curious if there is a relationship between pet ownership and stress level in college students. Students were categorized as either having a pet or no pet, and whether their stress level was high or low.

You must:

  • Run a chi-square test using chisq.test().

  • Report the chi-square value, degrees of freedom, and p-value.

  • State your conclusion: Is stress level related to pet ownership?

Run the code chunk below to create the data:

pet_data <- matrix(c(18, 35, 27, 13), nrow = 2, byrow = TRUE)
colnames(pet_data) <- c("High Stress", "Low Stress")
rownames(pet_data) <- c("Has Pet", "No Pet")
pet_data
##         High Stress Low Stress
## Has Pet          18         35
## No Pet           27         13
mosaicplot(pet_data)

#run the chi-squared test
# Visualize
mosaicplot(pet_data, main = "Pet Ownership vs. Stress Level", color = TRUE)

# Run Chi-square test
chi_result <- chisq.test(pet_data)
chi_result
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  pet_data
## X-squared = 8.9677, df = 1, p-value = 0.002748

Interpretation:

Chi-square value (χ²): 8.97

Degrees of freedom (df): 1

p-value: 0.0027

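Conclusion: The p-value (0.0027) is below 0.05, so we reject the null hypothesis that stress level and pet ownership are independent. Stress level is significantly associated with pet ownership in this sample: 18 of 53 pet owners (about 34%) reported high stress, compared with 27 of 40 non-owners (67.5%).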
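To see where the reported statistic comes from, the sketch below recomputes it by hand from the expected counts under independence. The names expected, chi_manual, and chi_builtin are illustrative, and the 0.5 subtraction is the Yates continuity correction that chisq.test() applies by default to 2x2 tables.

```r
# Rebuild the contingency table from the exercise
pet_data <- matrix(c(18, 35, 27, 13), nrow = 2, byrow = TRUE,
                   dimnames = list(c("Has Pet", "No Pet"),
                                   c("High Stress", "Low Stress")))

# Expected counts under independence: (row total * column total) / N
expected <- outer(rowSums(pet_data), colSums(pet_data)) / sum(pet_data)

# Yates-corrected statistic: shrink each |O - E| by 0.5 before squaring
chi_manual <- sum((abs(pet_data - expected) - 0.5)^2 / expected)

# Same test via chisq.test(); correct = TRUE is the default for 2x2 tables
chi_builtin <- chisq.test(pet_data)

round(expected, 2)
chi_manual
unname(chi_builtin$statistic)
```

All four expected counts are well above 5, so the chi-square approximation is reasonable for this table. Passing correct = FALSE to chisq.test() would give the uncorrected Pearson statistic instead.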

Exercise 2: Comparing R-squared and Adjusted R-squared

A psychologist wants to predict academic performance from different sets of predictors. For this exercise, the data are simulated in R rather than collected.

Use the following code to simulate the data:

set.seed(123)
n <- 100
IQ <- rnorm(n, mean = 100, sd = 15)
motivation <- rnorm(n, mean = 50, sd = 10)
grit <- rnorm(n, mean = 60, sd = 8)
working_memory <- rnorm(n, mean = 50, sd = 10)
academic_perf <- 0.3*IQ + 0.2*motivation + 0.1*grit + 0.1*working_memory + rnorm(n, 0, 10)
iq_data <- data.frame(IQ, motivation, grit, academic_perf, working_memory)
head(iq_data)
##          IQ motivation     grit academic_perf working_memory
## 1  91.59287   42.89593 77.59048      47.36529       42.84758
## 2  96.54734   52.56884 70.49930      39.08870       42.47311
## 3 123.38062   47.53308 57.87884      50.02267       40.61461
## 4 101.05763   46.52457 64.34555      49.71583       39.47487
## 5 101.93932   40.48381 56.68528      55.61689       45.62840
## 6 125.72597   49.54972 56.19002      42.07245       53.31179

Instructions:

  1. Fit Model 1 using only IQ.
  2. Fit Model 2 using IQ, motivation, and grit.
  3. Compare the two models: Which model explains more variance? Based on the adjusted R-squared values, does adding predictors improve the model meaningfully?

Use summary() to view the R-squared and Adjusted R-squared for each model.

# Fit Model 1 using only IQ.
# Model 1: Only IQ
model1 <- lm(academic_perf ~ IQ, data = iq_data)
summary(model1)
## 
## Call:
## lm(formula = academic_perf ~ IQ, data = iq_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.4574  -6.4108   0.0701   6.7810  24.6746 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.79161    7.27981   5.191 1.13e-06 ***
## IQ           0.14325    0.07118   2.012   0.0469 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.698 on 98 degrees of freedom
## Multiple R-squared:  0.03968,    Adjusted R-squared:  0.02988 
## F-statistic: 4.049 on 1 and 98 DF,  p-value: 0.04693
#Fit Model 2 using IQ, motivation, and grit. No interactions.
# Model 2: IQ + motivation + grit
model2 <- lm(academic_perf ~ IQ + motivation + grit, data = iq_data)
summary(model2)
## 
## Call:
## lm(formula = academic_perf ~ IQ + motivation + grit, data = iq_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.3922  -6.5107   0.3518   6.8479  24.6197 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 32.39250   12.61901   2.567   0.0118 *
## IQ           0.14767    0.07244   2.038   0.0443 *
## motivation   0.06203    0.10176   0.610   0.5436  
## grit         0.03143    0.13043   0.241   0.8101  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.776 on 96 degrees of freedom
## Multiple R-squared:  0.04403,    Adjusted R-squared:  0.01416 
## F-statistic: 1.474 on 3 and 96 DF,  p-value: 0.2265
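  • Which model explains more variance? Model 2 has the slightly higher R-squared (0.044 vs. 0.040), but R-squared can never decrease when predictors are added, so this alone is not evidence of a better model.

  • Based on the adjusted R-squared values, does adding predictors improve the model meaningfully? No. Adjusted R-squared drops from 0.030 (Model 1) to 0.014 (Model 2), and neither motivation (p = 0.54) nor grit (p = 0.81) is a significant predictor.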
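To see why adjusted R-squared can fall while R-squared rises, the sketch below pulls both values out of summary() and recomputes the adjustment by hand as 1 - (1 - R²)(n - 1)/(n - p - 1), where p is the number of predictors. The helper adj_r2() is a hypothetical name introduced for illustration, and the data are re-simulated so the block runs on its own.

```r
# Re-simulate the exercise data so this block runs on its own
set.seed(123)
n <- 100
IQ <- rnorm(n, mean = 100, sd = 15)
motivation <- rnorm(n, mean = 50, sd = 10)
grit <- rnorm(n, mean = 60, sd = 8)
working_memory <- rnorm(n, mean = 50, sd = 10)
academic_perf <- 0.3*IQ + 0.2*motivation + 0.1*grit + 0.1*working_memory + rnorm(n, 0, 10)
iq_data <- data.frame(IQ, motivation, grit, academic_perf, working_memory)

model1 <- lm(academic_perf ~ IQ, data = iq_data)
model2 <- lm(academic_perf ~ IQ + motivation + grit, data = iq_data)

# Hypothetical helper: adjusted R-squared from R-squared, n, and p predictors
adj_r2 <- function(model) {
  r2 <- summary(model)$r.squared
  n_obs <- nobs(model)
  p <- length(coef(model)) - 1  # exclude the intercept
  1 - (1 - r2) * (n_obs - 1) / (n_obs - p - 1)
}

# Side-by-side comparison of the two models
data.frame(
  model = c("IQ only", "IQ + motivation + grit"),
  r_squared = c(summary(model1)$r.squared, summary(model2)$r.squared),
  adj_r_squared = c(summary(model1)$adj.r.squared, summary(model2)$adj.r.squared),
  adj_by_hand = c(adj_r2(model1), adj_r2(model2))
)
```

The penalty factor (n - 1)/(n - p - 1) grows with each added predictor, so a predictor has to reduce unexplained variance by more than that penalty for adjusted R-squared to go up.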

Exercise 3: F-Test for Comparing Models

You are given two models:

model1 <- lm(academic_perf ~ IQ + working_memory, data = iq_data)
model2 <- lm(academic_perf ~ IQ + working_memory + motivation + grit, data = iq_data)
anova(model1, model2)
## Analysis of Variance Table
## 
## Model 1: academic_perf ~ IQ + working_memory
## Model 2: academic_perf ~ IQ + working_memory + motivation + grit
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     97 9152.5                           
## 2     95 9112.5  2    40.005 0.2085 0.8121

Instructions:

Run the code above. What does the p-value from the ANOVA output tell you? Should we keep the more complex model?

Answer: Look at the p-value in the ANOVA output:

If p < 0.05, the additional predictors (motivation + grit) significantly improve model fit. Here F(2, 95) = 0.21 and p = 0.81, which is well above 0.05, so they do not.

Conclusion: Keep the simpler model (IQ + working_memory).
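The F statistic in the ANOVA table can be reproduced from the two residual sums of squares, which shows what anova() is actually testing. This is a minimal sketch with the data re-simulated so it runs on its own; the names reduced, full, f_manual, and p_manual are illustrative.

```r
# Re-simulate the exercise data so this block runs on its own
set.seed(123)
n <- 100
IQ <- rnorm(n, mean = 100, sd = 15)
motivation <- rnorm(n, mean = 50, sd = 10)
grit <- rnorm(n, mean = 60, sd = 8)
working_memory <- rnorm(n, mean = 50, sd = 10)
academic_perf <- 0.3*IQ + 0.2*motivation + 0.1*grit + 0.1*working_memory + rnorm(n, 0, 10)
iq_data <- data.frame(IQ, motivation, grit, academic_perf, working_memory)

reduced <- lm(academic_perf ~ IQ + working_memory, data = iq_data)
full <- lm(academic_perf ~ IQ + working_memory + motivation + grit, data = iq_data)

# For an lm fit, deviance() returns the residual sum of squares (RSS)
rss_reduced <- deviance(reduced)
rss_full <- deviance(full)
df_diff <- df.residual(reduced) - df.residual(full)  # number of added predictors

# F = (drop in RSS per added predictor) / (residual variance of the full model)
f_manual <- ((rss_reduced - rss_full) / df_diff) / (rss_full / df.residual(full))
p_manual <- pf(f_manual, df_diff, df.residual(full), lower.tail = FALSE)

c(F = f_manual, p = p_manual)
anova(reduced, full)
```

This comparison is only valid because the models are nested: every predictor in the smaller model also appears in the larger one.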

Exercise 4: Residual Analysis with Q-Q Plot

Suppose you fitted a linear model:

model.res <- lm(academic_perf ~ IQ + motivation, data = iq_data)

Instructions:

Create a Q-Q plot of the residuals.

# Use qqnorm() and qqline()
model.res <- lm(academic_perf ~ IQ + motivation, data = iq_data)

# Q-Q plot
qqnorm(resid(model.res), main = "Q-Q Plot of Residuals")
qqline(resid(model.res), col = "blue", lwd = 2)

Does the residual distribution look normal?

If residuals closely follow the straight line in the Q-Q plot, then yes. Slight deviations at the ends are common.

Why does this matter in psychological research?

Approximately normal residuals support one of the assumptions behind inference in linear regression. When residuals depart strongly from normality, confidence intervals and p-values can be unreliable, especially in small samples such as those common in psychological studies.
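A Q-Q plot checks normality only. As a complement, the sketch below adds a residuals-versus-fitted plot (for linearity and constant variance) and a Shapiro-Wilk test via shapiro.test(). Treat the test as a rough supplement to the plots rather than a decision rule; the data are re-simulated so the block runs on its own, and the names res and sw are illustrative.

```r
# Re-simulate the exercise data so this block runs on its own
set.seed(123)
n <- 100
IQ <- rnorm(n, mean = 100, sd = 15)
motivation <- rnorm(n, mean = 50, sd = 10)
grit <- rnorm(n, mean = 60, sd = 8)
working_memory <- rnorm(n, mean = 50, sd = 10)
academic_perf <- 0.3*IQ + 0.2*motivation + 0.1*grit + 0.1*working_memory + rnorm(n, 0, 10)
iq_data <- data.frame(IQ, motivation, grit, academic_perf, working_memory)

model.res <- lm(academic_perf ~ IQ + motivation, data = iq_data)
res <- resid(model.res)

# Residuals vs. fitted: look for a patternless band around zero
plot(fitted(model.res), res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. Fitted")
abline(h = 0, lty = 2)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(res)
sw
```

A curved or funnel-shaped pattern in the first plot would point to non-linearity or unequal variance, which the Q-Q plot alone would not reveal.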

Submission Instructions:

Knit your document to HTML and check that all content displays correctly before submitting.