Understand and apply the Chi-Square Test of Independence
Interpret R-squared and Adjusted R-squared
Compare models using an F-test
Check model assumptions using residual plots
A psychologist wants to know whether there is a relationship between pet ownership and stress level in college students. Students were categorized by pet ownership (has a pet or no pet) and by stress level (high or low).
You must: Run a chi-square test using chisq.test().
Report the chi-square value, degrees of freedom, and p-value.
State your conclusion: Is stress level related to pet ownership?
Run the code chunk below to create the data:
pet_data <- matrix(c(18, 35, 27, 13), nrow = 2, byrow = TRUE)
colnames(pet_data) <- c("High Stress", "Low Stress")
rownames(pet_data) <- c("Has Pet", "No Pet")
pet_data
##         High Stress Low Stress
## Has Pet          18         35
## No Pet           27         13
chisq.test(pet_data)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: pet_data
## X-squared = 8.9677, df = 1, p-value = 0.002748
Interpretation: The chi-square value is about 8.97 with 1 degree of freedom. Because the p-value (0.002748) is less than 0.05, the null hypothesis of independence is rejected. This means there is a significant relationship between having a pet and stress level: pet owners were less often in the high-stress group (18 of 53) than students without a pet (27 of 40).
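To see where the reported statistic comes from, the test can be rebuilt by hand from the expected counts. This is a minimal sketch that assumes the same 2 x 2 table as above; the names `expected`, `x2_manual`, `df_manual`, and `p_manual` are illustrative and are not part of the assignment.

```r
# Observed counts, as given in the exercise
pet_data <- matrix(c(18, 35, 27, 13), nrow = 2, byrow = TRUE,
                   dimnames = list(c("Has Pet", "No Pet"),
                                   c("High Stress", "Low Stress")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(pet_data), colSums(pet_data)) / sum(pet_data)

# Yates' correction subtracts 0.5 from each |observed - expected| before squaring
# (valid here because every |observed - expected| is larger than 0.5)
x2_manual <- sum((abs(pet_data - expected) - 0.5)^2 / expected)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
df_manual <- (nrow(pet_data) - 1) * (ncol(pet_data) - 1)
p_manual  <- pchisq(x2_manual, df = df_manual, lower.tail = FALSE)

# Row proportions show the direction of the association
prop.table(pet_data, margin = 1)
```

The hand-computed values should agree with the `chisq.test()` output, and the row proportions make the direction of the relationship visible, which the test statistic alone does not.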
A psychologist wants to predict academic performance based on different sets of predictors. You’ve already collected the data and loaded it into R.
Use the following code to simulate the data:
set.seed(123)
n <- 100
IQ <- rnorm(n, mean = 100, sd = 15)
motivation <- rnorm(n, mean = 50, sd = 10)
grit <- rnorm(n, mean = 60, sd = 8)
working_memory <- rnorm(n, mean = 50, sd = 10)
academic_perf <- 0.3*IQ + 0.2*motivation + 0.1*grit + 0.1*working_memory + rnorm(n, 0, 10)
iq_data <- data.frame(IQ, motivation, grit, academic_perf, working_memory)
head(iq_data)
##          IQ motivation     grit academic_perf working_memory
## 1  91.59287   42.89593 77.59048      47.36529       42.84758
## 2  96.54734   52.56884 70.49930      39.08870       42.47311
## 3 123.38062   47.53308 57.87884      50.02267       40.61461
## 4 101.05763   46.52457 64.34555      49.71583       39.47487
## 5 101.93932   40.48381 56.68528      55.61689       45.62840
## 6 125.72597   49.54972 56.19002      42.07245       53.31179
Instructions:
Use summary() to view the R-squared and Adjusted R-squared for each model.
#Fit Model 1 using IQ only.
model1 <- lm(academic_perf ~ IQ, data = iq_data)
summary(model1)
##
## Call:
## lm(formula = academic_perf ~ IQ, data = iq_data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -24.4574  -6.4108   0.0701   6.7810  24.6746
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.79161    7.27981   5.191 1.13e-06 ***
## IQ           0.14325    0.07118   2.012   0.0469 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.698 on 98 degrees of freedom
## Multiple R-squared: 0.03968, Adjusted R-squared: 0.02988
## F-statistic: 4.049 on 1 and 98 DF, p-value: 0.04693
#Fit Model 2 using IQ, motivation, and grit. No interactions.
model2 <- lm(academic_perf ~ IQ + motivation + grit, data = iq_data)
summary(model2)
##
## Call:
## lm(formula = academic_perf ~ IQ + motivation + grit, data = iq_data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -24.3922  -6.5107   0.3518   6.8479  24.6197
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 32.39250   12.61901   2.567   0.0118 *
## IQ           0.14767    0.07244   2.038   0.0443 *
## motivation   0.06203    0.10176   0.610   0.5436
## grit         0.03143    0.13043   0.241   0.8101
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.776 on 96 degrees of freedom
## Multiple R-squared: 0.04403, Adjusted R-squared: 0.01416
## F-statistic: 1.474 on 3 and 96 DF, p-value: 0.2265
Which model explains more variance? Model 1 has an R-squared of 0.03968 and an adjusted R-squared of 0.02988. Model 2 has an R-squared of 0.04403 and an adjusted R-squared of 0.01416. Because Model 2 has the higher R-squared, it explains slightly more variance (about 4.4% versus 4.0%), although R-squared can never decrease when predictors are added.
Based on the adjusted R-squared values, does adding predictors improve the model meaningfully? No. The adjusted R-squared drops from 0.02988 in Model 1 to 0.01416 in Model 2, suggesting that the small gain in R-squared from adding motivation and grit does not offset the penalty for the extra predictors.
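The adjusted values can be reproduced from the plain R-squared values, which shows how the penalty for extra predictors works. This is a small sketch using the standard adjustment formula; `adj_r2` is a hypothetical helper written for illustration, and the R-squared inputs are copied from the summary() output above.

```r
# Adjusted R-squared from R-squared (r2), sample size (n),
# and number of predictors (p), not counting the intercept
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n <- 100  # sample size used in the simulation

adj_model1 <- adj_r2(0.03968, n, p = 1)  # IQ only
adj_model2 <- adj_r2(0.04403, n, p = 3)  # IQ + motivation + grit

round(c(model1 = adj_model1, model2 = adj_model2), 5)
```

Because the denominator `n - p - 1` shrinks as predictors are added, a predictor has to raise R-squared by more than a trivial amount before the adjusted value goes up.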
You are given two models:
model1 <- lm(academic_perf ~ IQ + working_memory, data = iq_data)
model2 <- lm(academic_perf ~ IQ + working_memory + motivation + grit, data = iq_data)
anova(model1, model2)
## Analysis of Variance Table
##
## Model 1: academic_perf ~ IQ + working_memory
## Model 2: academic_perf ~ IQ + working_memory + motivation + grit
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     97 9152.5
## 2     95 9112.5  2    40.005 0.2085 0.8121
Instructions:
Run the code above. What does the p-value from the ANOVA output tell you? Should we keep the more complex model?
Answer: The p-value is 0.8121, which is greater than the significance level of 0.05, indicating that adding motivation and grit does not significantly improve the model’s fit compared with using only IQ and working memory. We should therefore keep the simpler model.
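The F statistic in the ANOVA table can be recomputed from the residual sums of squares, which may help clarify what the comparison measures. This sketch assumes the rounded RSS and degrees-of-freedom values printed in the table; the variable names are illustrative only.

```r
# Values copied from the ANOVA table (RSS is rounded to one decimal there)
rss_reduced <- 9152.5; df_reduced <- 97   # Model 1: IQ + working_memory
rss_full    <- 9112.5; df_full    <- 95   # Model 2: adds motivation + grit

# Number of extra predictors in the larger model
df_diff <- df_reduced - df_full

# F = (drop in RSS per added predictor) / (residual variance of the full model)
f_stat <- ((rss_reduced - rss_full) / df_diff) / (rss_full / df_full)

# Upper-tail probability of the F distribution with (df_diff, df_full) df
p_value <- pf(f_stat, df1 = df_diff, df2 = df_full, lower.tail = FALSE)

round(c(F = f_stat, p = p_value), 4)
```

An F value well below 1 means the two added predictors reduce the residual sum of squares by less than would be expected from noise alone, which is why the p-value is so large.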
Suppose you fitted a linear model:
Instructions:
Create a Q-Q plot of the residuals.
# Use qqnorm() and qqline()
model.res <- lm(academic_perf ~ IQ + motivation, data = iq_data)
qqnorm(residuals(model.res))
qqline(residuals(model.res))
Does the residual distribution look normal?
The residual distribution looks mostly normal, with a few slight
outliers in the tails. This means the residuals are approximately
normally distributed.
Why does this matter in psychological research?
Normal residuals are important in psychological research
because, when residuals are approximately normal, the model’s errors
behave like random noise, which is what the p-values and confidence
intervals assume, making them more trustworthy. When the residuals are
not normal, the model’s errors could be biased or follow a pattern,
making the results less reliable.
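A Q-Q plot checks normality only. A residuals-versus-fitted plot and a formal normality test are common companions to it. The sketch below re-creates the simulated data so that it runs on its own; the use of shapiro.test() and the object name `sw` are additions for illustration and are not required by the assignment.

```r
# Re-create the simulated data exactly as in the exercise
set.seed(123)
n <- 100
IQ <- rnorm(n, mean = 100, sd = 15)
motivation <- rnorm(n, mean = 50, sd = 10)
grit <- rnorm(n, mean = 60, sd = 8)
working_memory <- rnorm(n, mean = 50, sd = 10)
academic_perf <- 0.3*IQ + 0.2*motivation + 0.1*grit + 0.1*working_memory + rnorm(n, 0, 10)
iq_data <- data.frame(IQ, motivation, grit, academic_perf, working_memory)

model.res <- lm(academic_perf ~ IQ + motivation, data = iq_data)

# Residuals vs. fitted values: look for a random, even band around zero
# (curvature suggests nonlinearity; a funnel shape suggests unequal variance)
plot(fitted(model.res), residuals(model.res),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. Fitted")
abline(h = 0, lty = 2)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal,
# so a large p-value gives no evidence against normality
sw <- shapiro.test(residuals(model.res))
sw
```

The test result should be read alongside the plots rather than in place of them, since with large samples even trivial departures from normality can produce a small p-value.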
Submission Instructions:
Be sure to knit your document to HTML format and check that all content is displayed correctly before submission.