Replace “Your Name” with your actual name.

Learning Goals:

  • Understand and apply the Chi-Square test of independence

  • Interpret R-squared and Adjusted R-squared

  • Compare models using an F-test

  • Check model assumptions using residual plots

Exercise 1: Chi-Square Test with a 2x2 Contingency Table

A psychologist is curious if there is a relationship between pet ownership and stress level in college students. Students were categorized as either having a pet or no pet, and whether their stress level was high or low.

You must: Run a chi-square test using chisq.test().

Report the chi-square value, degrees of freedom, and p-value.

State your conclusion: Is stress level related to pet ownership?

Run the code chunk below to create the data:

pet_data <- matrix(c(18, 35, 27, 13), nrow = 2, byrow = TRUE)
colnames(pet_data) <- c("High Stress", "Low Stress")
rownames(pet_data) <- c("Has Pet", "No Pet")
pet_data
##         High Stress Low Stress
## Has Pet          18         35
## No Pet           27         13
mosaicplot(pet_data)

# Run the chi-squared test
chi_result <- chisq.test(pet_data)
chi_result
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  pet_data
## X-squared = 8.9677, df = 1, p-value = 0.002748

Interpretation:

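The chi-square value is 8.97, with 1 degree of freedom and a p-value of 0.0027. Because the table is 2x2, chisq.test() applies Yates' continuity correction by default.

  • The p-value is < .05, so we reject the null hypothesis of independence: stress level is related to pet ownership.
  • In this sample, students with a pet were in the high-stress group less often (18 of 53) than students without a pet (27 of 40).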
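
The reported values can also be pulled directly from the test object instead of being read off the printout. The sketch below rebuilds the table so it runs on its own, then recomputes the Yates-corrected statistic by hand as a check. The names chi_sq, chi_df, chi_p, expected_manual, and chi_sq_manual are illustrative and not part of the assignment.

```r
# Rebuild the 2x2 table from the exercise so this chunk runs on its own
pet_data <- matrix(c(18, 35, 27, 13), nrow = 2, byrow = TRUE,
                   dimnames = list(c("Has Pet", "No Pet"),
                                   c("High Stress", "Low Stress")))

# For 2x2 tables, chisq.test() applies Yates' continuity correction by default
chi_result <- chisq.test(pet_data)

# Pull the reported quantities out of the test object
chi_sq <- unname(chi_result$statistic)  # chi-square value
chi_df <- unname(chi_result$parameter)  # degrees of freedom
chi_p  <- chi_result$p.value            # p-value

# Expected counts under independence: (row total * column total) / grand total
expected_manual <- outer(rowSums(pet_data), colSums(pet_data)) / sum(pet_data)

# Yates-corrected statistic computed by hand, for comparison with chi_sq
chi_sq_manual <- sum((abs(pet_data - expected_manual) - 0.5)^2 / expected_manual)

round(c(chi_sq = chi_sq, df = chi_df, p = chi_p, manual = chi_sq_manual), 4)
```

If chi_sq and chi_sq_manual agree, the printed statistic is the continuity-corrected one; passing correct = FALSE to chisq.test() would give the uncorrected Pearson statistic instead.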

Exercise 2: Comparing R-squared and Adjusted R-squared

A psychologist wants to predict academic performance from different sets of predictors. For this exercise, the data are simulated in R rather than collected.

Use the following code to simulate the data:

set.seed(123)
n <- 100
IQ <- rnorm(n, mean = 100, sd = 15)
motivation <- rnorm(n, mean = 50, sd = 10)
grit <- rnorm(n, mean = 60, sd = 8)
working_memory <- rnorm(n, mean = 50, sd = 10)
academic_perf <- 0.3*IQ + 0.2*motivation + 0.1*grit + 0.1*working_memory + rnorm(n, 0, 10)
iq_data <- data.frame(IQ, motivation, grit, academic_perf, working_memory)
head(iq_data)
##          IQ motivation     grit academic_perf working_memory
## 1  91.59287   42.89593 77.59048      47.36529       42.84758
## 2  96.54734   52.56884 70.49930      39.08870       42.47311
## 3 123.38062   47.53308 57.87884      50.02267       40.61461
## 4 101.05763   46.52457 64.34555      49.71583       39.47487
## 5 101.93932   40.48381 56.68528      55.61689       45.62840
## 6 125.72597   49.54972 56.19002      42.07245       53.31179

Instructions:

  1. Fit Model 1 using only IQ.
  2. Fit Model 2 using IQ, motivation, and grit.
  3. Compare the two models: Which model explains more variance? Based on the adjusted R-squared values, does adding predictors improve the model meaningfully?

Use summary() to view the R-squared and Adjusted R-squared for each model.

# Fit Model 1 using only IQ
model1 <- lm(academic_perf ~ IQ, data = iq_data)
summary(model1)
## 
## Call:
## lm(formula = academic_perf ~ IQ, data = iq_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.4574  -6.4108   0.0701   6.7810  24.6746 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.79161    7.27981   5.191 1.13e-06 ***
## IQ           0.14325    0.07118   2.012   0.0469 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.698 on 98 degrees of freedom
## Multiple R-squared:  0.03968,    Adjusted R-squared:  0.02988 
## F-statistic: 4.049 on 1 and 98 DF,  p-value: 0.04693
# Fit Model 2 using IQ, motivation, and grit
model2 <- lm(academic_perf ~ IQ + motivation + grit, data = iq_data)
summary(model2)
## 
## Call:
## lm(formula = academic_perf ~ IQ + motivation + grit, data = iq_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.3922  -6.5107   0.3518   6.8479  24.6197 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 32.39250   12.61901   2.567   0.0118 *
## IQ           0.14767    0.07244   2.038   0.0443 *
## motivation   0.06203    0.10176   0.610   0.5436  
## grit         0.03143    0.13043   0.241   0.8101  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.776 on 96 degrees of freedom
## Multiple R-squared:  0.04403,    Adjusted R-squared:  0.01416 
## F-statistic: 1.474 on 3 and 96 DF,  p-value: 0.2265
  • Which model explains more variance?
    Model 2 has the higher R-squared (0.044 vs. 0.040 for Model 1), but the gain is tiny. R-squared never decreases when predictors are added, so a higher value alone does not show that the extra predictors are useful.

  • Based on the adjusted R-squared values, does adding predictors improve the model meaningfully?
    No. Adjusted R-squared penalizes each added predictor, and it drops from 0.030 in Model 1 to 0.014 in Model 2. Adding motivation and grit does not improve the model meaningfully; neither predictor is significant (p = 0.54 and p = 0.81).
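
To compare the two fit measures side by side, the values can be extracted from the summary objects. The sketch below re-creates the simulated data so it runs on its own. The helper adj_r2() and the comparison data frame are hypothetical names added here for illustration; the helper applies the usual formula 1 - (1 - R²)(n - 1)/(n - p - 1), where p is the number of predictors.

```r
# Re-create the simulated data from the exercise so this chunk runs on its own
set.seed(123)
n <- 100
IQ <- rnorm(n, mean = 100, sd = 15)
motivation <- rnorm(n, mean = 50, sd = 10)
grit <- rnorm(n, mean = 60, sd = 8)
working_memory <- rnorm(n, mean = 50, sd = 10)
academic_perf <- 0.3*IQ + 0.2*motivation + 0.1*grit + 0.1*working_memory + rnorm(n, 0, 10)
iq_data <- data.frame(IQ, motivation, grit, academic_perf, working_memory)

model1 <- lm(academic_perf ~ IQ, data = iq_data)
model2 <- lm(academic_perf ~ IQ + motivation + grit, data = iq_data)

# Hypothetical helper: adjusted R-squared from R-squared, n, and p
adj_r2 <- function(model) {
  r2    <- summary(model)$r.squared
  n_obs <- nobs(model)
  p     <- length(coef(model)) - 1  # predictors, excluding the intercept
  1 - (1 - r2) * (n_obs - 1) / (n_obs - p - 1)
}

# One row per model: R-squared, adjusted R-squared from summary(), and by hand
comparison <- data.frame(
  model         = c("model1", "model2"),
  r_squared     = c(summary(model1)$r.squared, summary(model2)$r.squared),
  adj_r_squared = c(summary(model1)$adj.r.squared, summary(model2)$adj.r.squared),
  adj_by_hand   = c(adj_r2(model1), adj_r2(model2))
)
comparison
```

The adj_by_hand column should match adj_r_squared, which makes the penalty visible: with n fixed, each extra predictor raises (n - 1)/(n - p - 1), so R-squared has to rise enough to offset it.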

Exercise 3: F-Test for Comparing Models

You are given two models:

model1 <- lm(academic_perf ~ IQ + working_memory, data = iq_data)
model2 <- lm(academic_perf ~ IQ + working_memory + motivation + grit, data = iq_data)
anova(model1, model2)
## Analysis of Variance Table
## 
## Model 1: academic_perf ~ IQ + working_memory
## Model 2: academic_perf ~ IQ + working_memory + motivation + grit
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     97 9152.5                           
## 2     95 9112.5  2    40.005 0.2085 0.8121

Instructions:

Run the code above. What does the p-value from the ANOVA output tell you? Should we keep the more complex model?

Answer:

The p-value is 0.8121, well above 0.05, so adding motivation and grit does not significantly improve fit over the model with IQ and working memory, F(2, 95) = 0.21. We should not keep the more complex model; the simpler model suffices. In general, a p-value below 0.05 would favor keeping the added predictors.
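
The F statistic in the ANOVA table can be reproduced from the two residual sums of squares, which shows what the test measures: the drop in RSS per added predictor, relative to the residual variance of the larger model. The sketch below re-creates the data so it runs on its own. The names reduced, full, f_manual, and p_manual are illustrative stand-ins for the two models in this exercise.

```r
# Re-create the simulated data from Exercise 2 so this chunk runs on its own
set.seed(123)
n <- 100
IQ <- rnorm(n, mean = 100, sd = 15)
motivation <- rnorm(n, mean = 50, sd = 10)
grit <- rnorm(n, mean = 60, sd = 8)
working_memory <- rnorm(n, mean = 50, sd = 10)
academic_perf <- 0.3*IQ + 0.2*motivation + 0.1*grit + 0.1*working_memory + rnorm(n, 0, 10)
iq_data <- data.frame(IQ, motivation, grit, academic_perf, working_memory)

# Nested models: 'full' adds motivation and grit to 'reduced'
reduced <- lm(academic_perf ~ IQ + working_memory, data = iq_data)
full    <- lm(academic_perf ~ IQ + working_memory + motivation + grit, data = iq_data)

model_comparison <- anova(reduced, full)

# Manual F: (drop in RSS / number of added predictors) / (RSS of full model / its residual df)
rss_reduced <- deviance(reduced)  # for lm objects, deviance() is the residual sum of squares
rss_full    <- deviance(full)
df_added    <- df.residual(reduced) - df.residual(full)
f_manual    <- ((rss_reduced - rss_full) / df_added) / (rss_full / df.residual(full))
p_manual    <- pf(f_manual, df_added, df.residual(full), lower.tail = FALSE)

c(F_anova = model_comparison$F[2], F_manual = f_manual,
  p_anova = model_comparison$`Pr(>F)`[2], p_manual = p_manual)
```

The comparison is only valid for nested models fitted to the same rows of data, as is the case here.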

Exercise 4: Residual Analysis with Q-Q Plot

Suppose you fitted a linear model:

model.res <- lm(academic_perf ~ IQ + motivation, data = iq_data)

Instructions:

Create a Q-Q plot of the residuals.

# Use qqnorm() and qqline()
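model_resid <- resid(model.res)  # named model_resid to avoid masking base R's residuals()
qqnorm(model_resid)              # sample quantiles against theoretical normal quantiles
qqline(model_resid)              # reference line through the first and third quartiles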

Does the residual distribution look normal?
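If the points fall close to the reference line, the residuals are approximately normally distributed. Systematic curvature, or points straying far from the line at either end, suggests skew or heavy tails.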

Why does this matter in psychological research?
The p-values and confidence intervals from a linear regression assume normally distributed residuals. When that assumption is badly violated, especially in small samples, the results of hypothesis tests can be misleading.
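
A Q-Q plot checks normality only. As a complementary check in the spirit of the learning goals, a residuals-versus-fitted plot can reveal non-linearity or unequal variance. The sketch below is one possible way to draw it, re-creating the data so it runs on its own; model_resid and model_fitted are illustrative names.

```r
# Re-create the simulated data from Exercise 2 so this chunk runs on its own
set.seed(123)
n <- 100
IQ <- rnorm(n, mean = 100, sd = 15)
motivation <- rnorm(n, mean = 50, sd = 10)
grit <- rnorm(n, mean = 60, sd = 8)
working_memory <- rnorm(n, mean = 50, sd = 10)
academic_perf <- 0.3*IQ + 0.2*motivation + 0.1*grit + 0.1*working_memory + rnorm(n, 0, 10)
iq_data <- data.frame(IQ, motivation, grit, academic_perf, working_memory)

model.res <- lm(academic_perf ~ IQ + motivation, data = iq_data)

model_resid  <- resid(model.res)   # observed minus predicted values
model_fitted <- fitted(model.res)  # predicted values

# Residuals vs. fitted: look for a shapeless cloud centered on zero
plot(model_fitted, model_resid,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. Fitted")
abline(h = 0, lty = 2)  # dashed reference line at zero
```

A funnel shape would point to unequal variance, and a curved band would point to a non-linear relationship that the model misses.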

Submission Instructions:

Knit your document to HTML and check that all content displays correctly before submitting.