Replace “Your Name” with your actual name.
Understand and apply the Chi-Square test of independence
Interpret R-squared and Adjusted R-squared
Compare models using an F-test
Check model assumptions using residual plots
A psychologist wants to know whether pet ownership is related to stress level in college students. Students were categorized by pet ownership (has a pet or no pet) and by stress level (high or low).
You must: Run a chi-square test using chisq.test().
Report the chi-square value, degrees of freedom, and p-value.
State your conclusion: Is stress level related to pet ownership?
Run the code chunk below to create the data.
pet_data <- matrix(c(18, 35, 27, 13), nrow = 2, byrow = TRUE)
colnames(pet_data) <- c("High Stress", "Low Stress")
rownames(pet_data) <- c("Has Pet", "No Pet")
pet_data
##         High Stress Low Stress
## Has Pet          18         35
## No Pet           27         13
chisq.test(pet_data)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: pet_data
## X-squared = 8.9677, df = 1, p-value = 0.002748
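Interpretation: χ²(1) = 8.97, p = .003 (with Yates' continuity correction). Because p < 0.05, we reject the null hypothesis of independence: stress level is related to pet ownership. Students with a pet more often reported low stress (35 of 53) than students without a pet (13 of 40).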
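To see where the reported statistic comes from, the sketch below recomputes the Yates-corrected X² by hand from the expected counts and checks it against `chisq.test()`, which applies the correction by default for 2×2 tables. The table is rebuilt so the chunk runs on its own; the names `expected`, `yates_stat`, `dof`, and `ct` are illustrative.

```r
# Rebuild the 2x2 table of observed counts (same values as above)
pet_data <- matrix(c(18, 35, 27, 13), nrow = 2, byrow = TRUE,
                   dimnames = list(c("Has Pet", "No Pet"),
                                   c("High Stress", "Low Stress")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(pet_data), colSums(pet_data)) / sum(pet_data)
expected  # all cells are above 5, so the chi-square approximation is reasonable

# Yates' correction subtracts 0.5 from each |O - E| before squaring
# (valid here because every |O - E| exceeds 0.5)
yates_stat <- sum((abs(pet_data - expected) - 0.5)^2 / expected)
dof <- (nrow(pet_data) - 1) * (ncol(pet_data) - 1)
p_value <- pchisq(yates_stat, df = dof, lower.tail = FALSE)

# Compare with the built-in test
ct <- chisq.test(pet_data)
c(manual = yates_stat, chisq.test = unname(ct$statistic))

# The three values to report: X-squared, degrees of freedom, p-value
round(c(X2 = unname(ct$statistic), df = unname(ct$parameter), p = ct$p.value), 4)
```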
A psychologist wants to predict academic performance based on different sets of predictors. You’ve already collected the data and loaded it into R.
Use the following code to simulate the data:
set.seed(123)
n <- 100
IQ <- rnorm(n, mean = 100, sd = 15)
motivation <- rnorm(n, mean = 50, sd = 10)
grit <- rnorm(n, mean = 60, sd = 8)
working_memory <- rnorm(n, mean = 50, sd = 10)
academic_perf <- 0.3*IQ + 0.2*motivation + 0.1*grit + 0.1*working_memory + rnorm(n, 0, 10)
iq_data <- data.frame(IQ, motivation, grit, academic_perf, working_memory)
head(iq_data)
##          IQ motivation     grit academic_perf working_memory
## 1  91.59287   42.89593 77.59048      47.36529       42.84758
## 2  96.54734   52.56884 70.49930      39.08870       42.47311
## 3 123.38062   47.53308 57.87884      50.02267       40.61461
## 4 101.05763   46.52457 64.34555      49.71583       39.47487
## 5 101.93932   40.48381 56.68528      55.61689       45.62840
## 6 125.72597   49.54972 56.19002      42.07245       53.31179
Instructions:
Use summary() to view the R-squared and Adjusted R-squared for each model.
#Fit Model 1 using IQ only.
summary(lm(academic_perf ~ IQ, data = iq_data))
##
## Call:
## lm(formula = academic_perf ~ IQ, data = iq_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.4574 -6.4108 0.0701 6.7810 24.6746
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.79161 7.27981 5.191 1.13e-06 ***
## IQ 0.14325 0.07118 2.012 0.0469 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.698 on 98 degrees of freedom
## Multiple R-squared: 0.03968, Adjusted R-squared: 0.02988
## F-statistic: 4.049 on 1 and 98 DF, p-value: 0.04693
#Fit Model 2 using IQ, motivation, and grit. No interactions.
mod.2 <- lm(academic_perf ~ IQ + motivation + grit, data = iq_data)
summary(mod.2)
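Which model explains more variance? Model 2 explains slightly more raw variance, because adding predictors can never lower multiple R², so its R² is at least Model 1's 0.040. After adjusting for the number of predictors, however, the simple model comes out ahead, with an adjusted R² of about 3%.
Based on the adjusted R-squared values, does adding predictors improve the model meaningfully? No. Adding motivation and grit lowers the adjusted R², so the small gain in raw R² does not justify the extra predictors.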
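As a check on this comparison, the sketch below refits both models from the simulated data (with Model 2's predictors joined by `+`), pulls R² and adjusted R² from `summary()`, and confirms the adjusted R² formula, adj R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of predictors. The names `mod.1`, `fits`, and the helper `adj_r2()` are illustrative.

```r
# Simulate the data exactly as in the assignment
set.seed(123)
n <- 100
IQ <- rnorm(n, mean = 100, sd = 15)
motivation <- rnorm(n, mean = 50, sd = 10)
grit <- rnorm(n, mean = 60, sd = 8)
working_memory <- rnorm(n, mean = 50, sd = 10)
academic_perf <- 0.3*IQ + 0.2*motivation + 0.1*grit + 0.1*working_memory + rnorm(n, 0, 10)
iq_data <- data.frame(IQ, motivation, grit, academic_perf, working_memory)

mod.1 <- lm(academic_perf ~ IQ, data = iq_data)
mod.2 <- lm(academic_perf ~ IQ + motivation + grit, data = iq_data)

# Hypothetical helper: adjusted R^2 penalizes each extra predictor (p = number of slopes)
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

fits <- data.frame(
  model  = c("Model 1: IQ", "Model 2: IQ + motivation + grit"),
  r2     = c(summary(mod.1)$r.squared, summary(mod.2)$r.squared),
  adj_r2 = c(summary(mod.1)$adj.r.squared, summary(mod.2)$adj.r.squared)
)
fits
```

Raw R² rises from Model 1 to Model 2 by construction, so adjusted R² is the fairer basis for deciding whether the extra predictors earn their place.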
You are given two models:
model1 <- lm(academic_perf ~ IQ + working_memory, data = iq_data)
model2 <- lm(academic_perf ~ IQ + working_memory + motivation + grit, data = iq_data)
anova(model1, model2)
## Analysis of Variance Table
##
## Model 1: academic_perf ~ IQ + working_memory
## Model 2: academic_perf ~ IQ + working_memory + motivation + grit
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     97 9152.5
## 2     95 9112.5  2    40.005 0.2085 0.8121
Instructions:
Run the code above. What does the p-value from the ANOVA output tell you? Should we keep the more complex model?
Answer: F(2, 95) = 0.21, p = 0.81, which is well above 0.05. Adding motivation and grit (Model 2) does not significantly reduce the residual sum of squares, so the more complex model is not justified and we should keep the simpler Model 1.
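The nested-model F-test compares the drop in residual sum of squares per added parameter with the residual variance of the larger model. The sketch below computes it by hand from the two fits and checks it against `anova()`; the names `rss1`, `rss2`, `df1`, `df2`, and `F_stat` are illustrative.

```r
# Simulate the data exactly as in the assignment
set.seed(123)
n <- 100
IQ <- rnorm(n, mean = 100, sd = 15)
motivation <- rnorm(n, mean = 50, sd = 10)
grit <- rnorm(n, mean = 60, sd = 8)
working_memory <- rnorm(n, mean = 50, sd = 10)
academic_perf <- 0.3*IQ + 0.2*motivation + 0.1*grit + 0.1*working_memory + rnorm(n, 0, 10)
iq_data <- data.frame(IQ, motivation, grit, academic_perf, working_memory)

model1 <- lm(academic_perf ~ IQ + working_memory, data = iq_data)
model2 <- lm(academic_perf ~ IQ + working_memory + motivation + grit, data = iq_data)

rss1 <- sum(residuals(model1)^2)
rss2 <- sum(residuals(model2)^2)
df1 <- df.residual(model1)  # 100 - 3 coefficients = 97
df2 <- df.residual(model2)  # 100 - 5 coefficients = 95

# F = (drop in RSS per extra parameter) / (residual variance of the larger model)
F_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
p_value <- pf(F_stat, df1 - df2, df2, lower.tail = FALSE)

c(F = F_stat, p = p_value)
```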
Suppose you fitted a linear model:
Instructions:
Create a Q-Q plot of the residuals.
Does the residual distribution look normal?
Yes
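The fitted model for this exercise is not shown, so the sketch below uses the four-predictor model from the F-test section as a stand-in (an assumption; substitute your own fit). `qqnorm()` plots the sample quantiles of the residuals against theoretical normal quantiles, and `qqline()` adds a reference line; points that stay close to the line suggest approximately normal residuals. `plot(fit, which = 2)` draws a similar plot of standardized residuals, and `shapiro.test()` is included only as an optional numeric companion to the visual check.

```r
# Simulate the data exactly as in the assignment
set.seed(123)
n <- 100
IQ <- rnorm(n, mean = 100, sd = 15)
motivation <- rnorm(n, mean = 50, sd = 10)
grit <- rnorm(n, mean = 60, sd = 8)
working_memory <- rnorm(n, mean = 50, sd = 10)
academic_perf <- 0.3*IQ + 0.2*motivation + 0.1*grit + 0.1*working_memory + rnorm(n, 0, 10)
iq_data <- data.frame(IQ, motivation, grit, academic_perf, working_memory)

# Stand-in model (assumption): the exercise's own model is not shown
fit <- lm(academic_perf ~ IQ + working_memory + motivation + grit, data = iq_data)
res <- residuals(fit)

# Normal Q-Q plot of the residuals
qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res, col = "red")  # line through the first and third quartile pairs

# Optional formal test of normality (null hypothesis: residuals are normal)
shapiro.test(res)
```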
Why does this matter in psychological research?
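We want the model's assumptions to hold so that we can feel confident generalizing our findings. The t-tests, F-tests, p-values, and confidence intervals from lm() assume normally distributed residuals; when that assumption is badly violated, especially in the small samples common in psychological research, those results can be misleading and may lead to wrong conclusions about psychological effects.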
Submission Instructions:
Be sure to knit your document to HTML and check that all content displays correctly before submitting.