1. When should we use regression instead of ANOVA?
When we want to use one or more predictors to predict the outcome instead of comparing different group means, we use regression instead of ANOVA.
2. Please explain the relationship between SStotal, SSregression, and SSerror.
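SStotal (SST): total sum of squares, the total variation of the outcome around its mean.
SSregression (SSR): regression sum of squares, the variation explained by the model.
SSerror (SSE): error (residual) sum of squares, the variation the model leaves unexplained.
The three are related by SST = SSR + SSE, and R-squared = SSR / SST.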
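The identity SST = SSR + SSE can be checked numerically. The following is a minimal sketch that reuses the sugar and calories data from question 3; the names `fit`, `sst`, `ssr`, and `sse` are illustrative choices and not part of the original answer.

```r
# Data from question 3
sugar <- c(5, 8, 9, 10, 15, 18, 14, 17, 20, 22, 24, 26, 30, 30, 32)
calories <- c(20, 30, 60, 70, 100, 95, 70, 83, 103, 112, 130, 80, 95, 130, 112)

fit <- lm(calories ~ sugar)

# Total variation of the outcome around its mean
sst <- sum((calories - mean(calories))^2)
# Variation of the fitted values around the mean (explained by the model)
ssr <- sum((fitted(fit) - mean(calories))^2)
# Variation of the observations around the fitted values (unexplained)
sse <- sum(residuals(fit)^2)

c(SST = sst, SSR = ssr, SSE = sse)
all.equal(sst, ssr + sse)  # checks the decomposition up to floating-point tolerance
ssr / sst                  # proportion of variance explained (R-squared)
```

The ratio `ssr / sst` should agree with the Multiple R-squared that `summary()` reports for the same model.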
3. Please use the following data to build a regression model and write a summary. IV is sugar and DV is calories.
Sugar: 5, 8, 9, 10, 15, 18, 14, 17, 20, 22, 24, 26, 30, 30, 32
Calories: 20, 30, 60, 70, 100, 95, 70, 83, 103, 112, 130, 80, 95, 130, 112
sugar <- c(5, 8, 9, 10, 15, 18, 14, 17, 20, 22, 24, 26, 30, 30, 32)
calories <- c(20, 30, 60, 70, 100, 95, 70, 83, 103, 112, 130, 80, 95, 130, 112)
rdata <- data.frame(sugar, calories)
Non-zero Variance
plot(density(rdata$sugar))
Normality
qqnorm(rdata$calories)
qqline(rdata$calories)  # reference line to judge departures from normality
Linearity
# Base graphics scatterplot; ggplot2 is not needed for plot()
plot(rdata$sugar, rdata$calories, xlab = "Sugar", ylab = "Calories")
Model
model <- lm(calories ~ sugar, data = rdata)
summary(model)
##
## Call:
## lm(formula = calories ~ sugar, data = rdata)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -28.332 -19.060   3.438  11.985  27.758
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  29.1542    12.4132   2.349 0.035315 *
## sugar         3.0453     0.6074   5.013 0.000237 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.56 on 13 degrees of freedom
## Multiple R-squared: 0.6591, Adjusted R-squared: 0.6329
## F-statistic: 25.13 on 1 and 13 DF, p-value: 0.0002373
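With a single predictor, the overall F-statistic equals the square of the slope's t value, which is why the two p-values in the output above coincide. The sketch below shows one way to pull these numbers out of the fitted model and to add 95% confidence intervals for the coefficients. The confidence intervals are an addition for illustration and were not part of the original analysis; the object names `s`, `t_sugar`, and `f_stat` are arbitrary.

```r
# Refit the same model so that this block runs on its own
sugar <- c(5, 8, 9, 10, 15, 18, 14, 17, 20, 22, 24, 26, 30, 30, 32)
calories <- c(20, 30, 60, 70, 100, 95, 70, 83, 103, 112, 130, 80, 95, 130, 112)
model <- lm(calories ~ sugar)

s <- summary(model)

# t value of the slope, taken from the coefficient table
t_sugar <- coef(s)["sugar", "t value"]
# Overall F-statistic (named vector: value, numdf, dendf)
f_stat <- unname(s$fstatistic["value"])

# In simple regression, F equals t squared
all.equal(f_stat, t_sugar^2)

# 95% confidence intervals for the intercept and the slope
confint(model, level = 0.95)
```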
Residuals
shapiro.test(model$residuals)
##
## Shapiro-Wilk normality test
##
## data: model$residuals
## W = 0.91269, p-value = 0.1489
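The Shapiro-Wilk test addresses normality of the residuals only. A residuals-versus-fitted plot is one common, informal way to also look at constant variance; the sketch below is an optional extra check under that assumption and is not part of the original analysis. The name `sw` is illustrative.

```r
# Refit the same model so that this block runs on its own
sugar <- c(5, 8, 9, 10, 15, 18, 14, 17, 20, 22, 24, 26, 30, 30, 32)
calories <- c(20, 30, 60, 70, 100, 95, 70, 83, 103, 112, 130, 80, 95, 130, 112)
model <- lm(calories ~ sugar)

# Normality of residuals: a p-value above .05 gives no evidence against normality
sw <- shapiro.test(residuals(model))
sw$p.value > 0.05

# Constant variance: look for a roughly even band around zero with no funnel shape
plot(fitted(model), residuals(model),
     xlab = "Fitted calories", ylab = "Residuals")
abline(h = 0, lty = 2)
```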
Summary
The model is: calories = 29.1542 + 3.0453 (sugar)
A simple linear regression was conducted to predict calories from the amount of sugar consumed. The assumption checks raised no concerns (for example, the Shapiro-Wilk test on the residuals was not significant, W = 0.913, p = .149), so no further adjustment was made. A significant regression equation was found, F(1, 13) = 25.13, p < .001, with R-squared = .659. Both the intercept (p = .035) and the predictor (p < .001) were statistically significant. Thus, sugar significantly predicts calories: each one-unit increase in sugar is associated with a 3.0453-unit increase in predicted calories.
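To illustrate how the fitted equation is used, the sketch below predicts calories for a sugar value of 20. That value is a hypothetical example chosen from inside the observed range, and the prediction interval is an addition that the original answer did not report. The names `new_obs` and `pred` are illustrative.

```r
# Refit the same model so that this block runs on its own
sugar <- c(5, 8, 9, 10, 15, 18, 14, 17, 20, 22, 24, 26, 30, 30, 32)
calories <- c(20, 30, 60, 70, 100, 95, 70, 83, 103, 112, 130, 80, 95, 130, 112)
rdata <- data.frame(sugar, calories)
model <- lm(calories ~ sugar, data = rdata)

# Hypothetical new observation (sugar = 20 is an assumed example value)
new_obs <- data.frame(sugar = 20)

# Point prediction with a 95% prediction interval (columns: fit, lwr, upr)
pred <- predict(model, newdata = new_obs, interval = "prediction", level = 0.95)
pred

# The same point prediction computed by hand from the coefficients
unname(coef(model)["(Intercept)"] + coef(model)["sugar"] * 20)
```

The point prediction is about 90 calories, which matches plugging 20 into the equation above. Predictions for sugar values outside the observed range of 5 to 32 would be extrapolation and should be treated with caution.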