Replace “Your Name” with your actual name.

Learning Goals:

  • Understand and apply the Chi-Square Goodness-of-Fit test

  • Interpret R-squared and Adjusted R-squared

  • Compare models using an F-test

  • Check model assumptions using residual plots

Exercise 1: Chi-Square Test with a 2x2 Contingency Table

A psychologist is curious if there is a relationship between pet ownership and stress level in college students. Students were categorized as either having a pet or no pet, and whether their stress level was high or low.

You must: Run a chi-square test using chisq.test().

Report the chi-square value, degrees of freedom, and p-value.

State your conclusion: Is stress level related to pet ownership?

Run the code chunk below to create the data.

pet_data <- matrix(c(18, 35, 27, 13), nrow = 2, byrow = TRUE)
colnames(pet_data) <- c("High Stress", "Low Stress")
rownames(pet_data) <- c("Has Pet", "No Pet")
pet_data
##         High Stress Low Stress
## Has Pet          18         35
## No Pet           27         13
mosaicplot(pet_data)

#run the chi-squared test
chisq.test(pet_data)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  pet_data
## X-squared = 8.9677, df = 1, p-value = 0.002748

Interpretation: X-squared = 8.97, df = 1, p = 0.003. Because the p-value is less than 0.05, we reject the null hypothesis of independence and conclude that stress level is related to pet ownership; in this sample, students with pets were more often in the low-stress group.
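As an optional follow-up (not required by the exercise), you can inspect the expected counts and standardized residuals stored in the chisq.test() result to see which cells drive the association; a minimal sketch using the pet_data matrix defined above:

# Store the test result so its components can be inspected
pet_test <- chisq.test(pet_data)

# Expected counts under the null hypothesis of independence
pet_test$expected

# Standardized residuals: cells far from 0 contribute most to the chi-square statistic
pet_test$stdres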

Exercise 2: Comparing R-squared and Adjusted R-squared

A psychologist wants to predict academic performance based on different sets of predictors. You’ve already collected the data and loaded it into R.

Use the following code to simulate the data:

set.seed(123)
n <- 100
IQ <- rnorm(n, mean = 100, sd = 15)
motivation <- rnorm(n, mean = 50, sd = 10)
grit <- rnorm(n, mean = 60, sd = 8)
working_memory <- rnorm(n, mean = 50, sd = 10)
academic_perf <- 0.3*IQ + 0.2*motivation + 0.1*grit + 0.1*working_memory + rnorm(n, 0, 10)
iq_data <- data.frame(IQ, motivation, grit, academic_perf, working_memory)
head(iq_data)
##          IQ motivation     grit academic_perf working_memory
## 1  91.59287   42.89593 77.59048      47.36529       42.84758
## 2  96.54734   52.56884 70.49930      39.08870       42.47311
## 3 123.38062   47.53308 57.87884      50.02267       40.61461
## 4 101.05763   46.52457 64.34555      49.71583       39.47487
## 5 101.93932   40.48381 56.68528      55.61689       45.62840
## 6 125.72597   49.54972 56.19002      42.07245       53.31179

Instructions:

  1. Fit Model 1 using only IQ.
  2. Fit Model 2 using IQ, motivation, and grit.
  3. Compare the two models: Which model explains more variance? Based on the adjusted R-squared values, does adding predictors improve the model meaningfully?

Use summary() to view the R-squared and Adjusted R-squared for each model.

# Fit Model 1 using only IQ.
mod.1 <- lm(academic_perf ~ IQ, data = iq_data)
summary(mod.1)
## 
## Call:
## lm(formula = academic_perf ~ IQ, data = iq_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.4574  -6.4108   0.0701   6.7810  24.6746 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.79161    7.27981   5.191 1.13e-06 ***
## IQ           0.14325    0.07118   2.012   0.0469 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.698 on 98 degrees of freedom
## Multiple R-squared:  0.03968,    Adjusted R-squared:  0.02988 
## F-statistic: 4.049 on 1 and 98 DF,  p-value: 0.04693
# Fit Model 2 using IQ, motivation, and grit. No interactions.
mod.2 <- lm(academic_perf ~ IQ + motivation + grit, data = iq_data)
summary(mod.2)
  • Which model explains more variance? Model 2 explains more of the total variance: multiple R-squared can only increase when predictors are added. The gain over Model 1, however, is modest, because motivation and grit contribute little explanatory power on their own.

  • Based on the adjusted R-squared values, does adding predictors improve the model meaningfully? No. Adjusted R-squared penalizes each additional predictor, and because motivation and grit add little explanatory power, the adjusted R-squared does not improve meaningfully; the simpler model is preferable. (A short sketch below shows how to extract these values directly from the model summaries.)
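If you prefer to pull the statistics out of the fitted objects rather than reading them off the printed summaries, the following is a minimal sketch using the mod.1 and mod.2 objects fitted above; it also recomputes Adjusted R-squared by hand to make the penalty for extra predictors explicit.

# Extract R-squared and Adjusted R-squared from each model summary
summary(mod.1)$r.squared
summary(mod.1)$adj.r.squared
summary(mod.2)$r.squared
summary(mod.2)$adj.r.squared

# Adjusted R-squared by hand: 1 - (1 - R^2) * (n - 1) / (n - p - 1),
# where n is the number of observations and p the number of predictors
r2 <- summary(mod.2)$r.squared
n  <- nrow(iq_data)  # 100 observations
p  <- 3              # IQ, motivation, grit
1 - (1 - r2) * (n - 1) / (n - p - 1)  # should match summary(mod.2)$adj.r.squared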

Exercise 3: F-Test for Comparing Models

You are given two models:

model1 <- lm(academic_perf ~ IQ + working_memory, data = iq_data)
model2 <- lm(academic_perf ~ IQ + working_memory + motivation + grit, data = iq_data)
anova(model1, model2)
## Analysis of Variance Table
## 
## Model 1: academic_perf ~ IQ + working_memory
## Model 2: academic_perf ~ IQ + working_memory + motivation + grit
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     97 9152.5                           
## 2     95 9112.5  2    40.005 0.2085 0.8121

Instructions:

Run the code above. What does the p-value from the ANOVA output tell you? Should we keep the more complex model?

Answer: The p-value is 0.81, which is greater than 0.05. Adding motivation and grit (Model 2) does not produce a statistically significant reduction in residual error, so we have no evidence to justify keeping the more complex model; the simpler Model 1 is preferred.
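To see what anova() is doing, the F statistic can be reproduced by hand from the residual sums of squares and degrees of freedom printed in the table above; this sketch is not part of the required answer:

# F = [(RSS1 - RSS2) / (df1 - df2)] / (RSS2 / df2)
rss1 <- 9152.5; df1 <- 97  # Model 1: academic_perf ~ IQ + working_memory
rss2 <- 9112.5; df2 <- 95  # Model 2: adds motivation and grit
f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
f_stat                                          # about 0.21, matching the anova() output
pf(f_stat, df1 - df2, df2, lower.tail = FALSE)  # p-value, about 0.81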

Exercise 4: Residual Analysis with Q-Q Plot

Suppose you fitted a linear model:

model.res <- lm(academic_perf ~ IQ + motivation, data = iq_data)

Instructions:

Create a Q-Q plot of the residuals.

# Use qqnorm() and qqline()
qqnorm(residuals(model.res))
qqline(residuals(model.res))

Does the residual distribution look normal?
Yes. The points fall roughly along the reference line added by qqline(), which is consistent with approximately normally distributed residuals.

Why does this matter in psychological research?
Linear regression assumes that the residuals are approximately normally distributed. When that assumption holds, the standard errors, p-values, and confidence intervals from the model are trustworthy, which lets us generalize our findings to the broader population with more confidence.
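Beyond the Q-Q plot, a couple of optional checks can back up the visual impression; this is a sketch using the model.res object fitted above, not a required part of the exercise:

# Histogram of residuals: should look roughly bell-shaped and centered at 0
hist(residuals(model.res), main = "Histogram of residuals", xlab = "Residual")

# Residuals vs. fitted values: look for a random scatter with no funnel or curve
plot(fitted(model.res), residuals(model.res),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Shapiro-Wilk test: a large p-value gives no evidence against normality of the residuals
shapiro.test(residuals(model.res))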

Submission Instructions:

Be sure to knit your document to HTML format and check that all content displays correctly before submission.