Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?
Based on the residuals vs. fitted plot, the residuals show a clear pattern rather than a random scatter around zero, so this linear model is not appropriate. The normal Q-Q plot shows heavy tails, which points to the same conclusion.
Performance <- read.csv("/Users/jaylee/Downloads/Student_Performance.csv", check.names = FALSE)
plot(Performance$`Hours Studied`,Performance$`Performance Index`)
plot(Performance$`Sleep Hours`,Performance$`Performance Index`)
Performance$`Extracurricular Activities` <- as.character(Performance$`Extracurricular Activities`)
Performance$`Extracurricular Activities`[Performance$`Extracurricular Activities` == "No"] <- 0
Performance$`Extracurricular Activities`[Performance$`Extracurricular Activities` == "Yes"] <- 1
Performance$`Extracurricular Activities` <- as.integer(Performance$`Extracurricular Activities`)
plot(Performance$`Extracurricular Activities`,Performance$`Performance Index`)
# Quadratic variable
quadratic <- Performance$`Hours Studied`^2
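# Dichotomous vs. quantitative interaction
# Note: this multiplies the dummy by the response (`Performance Index`), not by a
# predictor such as `Hours Studied`, so the response ends up on both sides of the model.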
quantativeEA <- Performance$`Extracurricular Activities` * Performance$`Performance Index`
per_lm <- lm(Performance$`Performance Index`~ Performance$`Hours Studied`+Performance$`Extracurricular Activities`+quadratic+quantativeEA, data = Performance)
summary(per_lm)
##
## Call:
## lm(formula = Performance$`Performance Index` ~ Performance$`Hours Studied` +
##     Performance$`Extracurricular Activities` + quadratic + quantativeEA,
##     data = Performance)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -38.373  -4.979  -0.243   5.278  38.880
##
## Coefficients:
##                                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                               46.42526    0.51437  90.257   <2e-16 ***
## Performance$`Hours Studied`                1.99964    0.22839   8.755   <2e-16 ***
## Performance$`Extracurricular Activities` -50.60268    0.61670 -82.054   <2e-16 ***
## quadratic                                 -0.05159    0.02230  -2.314   0.0207 *
## quantativeEA                               0.92471    0.01003  92.217   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.1 on 9995 degrees of freedom
## Multiple R-squared: 0.5355, Adjusted R-squared: 0.5353
## F-statistic: 2881 on 4 and 9995 DF, p-value: < 2.2e-16
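The summary above reflects the fact that quantativeEA is the product of the dummy and Performance Index itself, so the response appears on both sides of the model. That is the likely reason for the very large, offsetting coefficients on the dummy and on the interaction. A minimal sketch of a specification that interacts the dummy with a predictor instead is given here. It runs on simulated stand-in data: the column names hours, extra and perf and the generating coefficients are assumptions made for illustration, not values from Student_Performance.csv, so its estimates say nothing about the real data set.

```r
# Simulated stand-in data; names and coefficients are illustrative assumptions.
set.seed(1)
n <- 500
sim <- data.frame(
  hours = sample(1:9, n, replace = TRUE),  # quantitative predictor
  extra = rbinom(n, 1, 0.5)                # dichotomous predictor coded 0/1
)
sim$perf <- 20 + 3 * sim$hours + 2 * sim$extra +
  0.5 * sim$extra * sim$hours + rnorm(n, sd = 5)

# I(hours^2) adds the quadratic term; hours:extra is the
# dichotomous-by-quantitative interaction. The response is only on the left.
fit <- lm(perf ~ hours + I(hours^2) + extra + hours:extra, data = sim)
print(summary(fit))
```

In a model of this form, the coefficient on extra is the estimated gap between the two groups when hours is 0, and the coefficient on hours:extra is how much that gap changes for each additional hour. Because of the quadratic term, the effect of one more hour for the extra = 0 group is not constant; it is the hours coefficient plus twice the I(hours^2) coefficient times hours.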
plot(per_lm)
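plot() on an lm object steps through the standard diagnostic plots. As a rough sketch of the residual analysis the conclusion relies on, the code here requests only the residuals vs. fitted plot and the normal Q-Q plot, then adds a simple numeric check for curvature. The data are simulated with a deliberately curved relationship (an assumption for illustration only), and the names fit_line, curv and curv_r2 are hypothetical.

```r
# Simulated example: the true relationship is curved, the fitted model is not.
set.seed(2)
n <- 300
x <- runif(n, 0, 10)
y <- 5 + 0.8 * x^2 + rnorm(n, sd = 3)
fit_line <- lm(y ~ x)                     # misspecified straight-line fit

if (!interactive()) pdf(NULL)             # null device for batch runs
plot(fit_line, which = 1)                 # residuals vs. fitted values
plot(fit_line, which = 2)                 # normal Q-Q plot of residuals
if (!interactive()) invisible(dev.off())

# Numeric check: regress the residuals on the squared, centered fitted values.
# An R-squared well above zero means the residuals still carry curvature.
res <- resid(fit_line)
fv <- fitted(fit_line)
curv <- lm(res ~ I((fv - mean(fv))^2))
curv_r2 <- summary(curv)$r.squared
print(curv_r2)
```

If the linear model were adequate, the first plot would show no trend and curv_r2 would be close to zero. A U-shaped band in the first plot, or points bending away from the reference line at both ends of the Q-Q plot, suggests the model form or the normality assumption needs another look.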