1. When should we use regression instead of ANOVA?

We use regression instead of ANOVA when the goal is to predict an outcome from one or more predictors, rather than to compare the means of different groups.
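Regression and ANOVA are both linear models. A one-way ANOVA is a regression whose only predictor is a categorical factor. The minimal sketch below shows that `aov()` and `lm()` give the same F statistic for a group comparison. It uses R's built-in `PlantGrowth` dataset, which is not part of this assignment and appears here only for illustration.

```r
# Built-in example data: plant weight for three groups (ctrl, trt1, trt2)
data(PlantGrowth)

# One-way ANOVA comparing the three group means
fit_aov <- aov(weight ~ group, data = PlantGrowth)

# The same comparison written as a linear regression; because `group` is a
# factor, lm() codes it as dummy (indicator) variables
fit_lm <- lm(weight ~ group, data = PlantGrowth)

# Column 4 of each table is the F value for the group effect
f_aov <- summary(fit_aov)[[1]][1, 4]
f_lm  <- anova(fit_lm)[1, 4]
print(c(aov = f_aov, lm = f_lm))
```

Regression becomes the more natural tool when the predictor is continuous, as with sugar in question 3, where there are no groups whose means could be compared.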

2. Please explain the relationship between SStotal, SSregression, and SSerror.

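SStotal (total sum of squares): the total variability of the outcome around its mean.
SSregression (regression or model sum of squares): the part of that variability explained by the model.
SSerror (error or residual sum of squares): the part left unexplained, i.e. the variability of the observed values around the model's predictions.

The total variability splits into an explained part and an unexplained part: SStotal = SSregression + SSerror.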
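Written out with the observed values y_i, their mean ȳ, and the model's fitted values ŷ_i, the three sums of squares for a least-squares model that includes an intercept are as follows. The ratio of explained to total variability is the model's R-squared.

```latex
\begin{aligned}
SS_{\text{total}} &= \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
SS_{\text{regression}} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
SS_{\text{error}} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
SS_{\text{total}} &= SS_{\text{regression}} + SS_{\text{error}} \\
R^2 &= \frac{SS_{\text{regression}}}{SS_{\text{total}}}
\end{aligned}
```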

3. Please use the following data to build a regression model and write a summary. IV is sugar and DV is calories.

Sugar: 5, 8, 9, 10, 15, 18, 14, 17, 20, 22, 24, 26, 30, 30, 32

Calories: 20, 30, 60, 70, 100, 95, 70, 83, 103, 112, 130, 80, 95, 130, 112

sugar <- c(5, 8, 9, 10, 15, 18, 14, 17, 20, 22, 24, 26, 30, 30, 32)
calories <- c(20, 30, 60, 70, 100, 95, 70, 83, 103, 112, 130, 80, 95, 130, 112)
rdata <- data.frame(sugar, calories)

Non-zero Variance

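var(rdata$sugar) > 0          # non-zero variance: TRUE if sugar values vary
plot(density(rdata$sugar))    # distribution of the predictor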

Normality

qqnorm(rdata$calories)   # points close to the reference line suggest approximate normality
qqline(rdata$calories)

Linearity

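# Scatterplot of calories against sugar; a roughly straight-line trend supports linearity
plot(rdata$sugar, rdata$calories, xlab = "Sugar", ylab = "Calories")
abline(lm(calories ~ sugar, data = rdata))   # overlay the least-squares line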
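The Pearson correlation gives a numeric complement to the scatterplot. It is an addition, not one of the original checks. In simple regression, its square equals the model's R-squared. The vectors are redefined below so the sketch runs on its own.

```r
sugar    <- c(5, 8, 9, 10, 15, 18, 14, 17, 20, 22, 24, 26, 30, 30, 32)
calories <- c(20, 30, 60, 70, 100, 95, 70, 83, 103, 112, 130, 80, 95, 130, 112)

# Pearson correlation: strength and direction of the linear association
r <- cor(sugar, calories)
print(r)

# With a single predictor, r squared equals the regression R-squared
print(r^2)
```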

Model

model <- lm(calories ~ sugar, data = rdata)
summary(model)
## 
## Call:
## lm(formula = calories ~ sugar, data = rdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -28.332 -19.060   3.438  11.985  27.758 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  29.1542    12.4132   2.349 0.035315 *  
## sugar         3.0453     0.6074   5.013 0.000237 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.56 on 13 degrees of freedom
## Multiple R-squared:  0.6591, Adjusted R-squared:  0.6329 
## F-statistic: 25.13 on 1 and 13 DF,  p-value: 0.0002373
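The sums of squares from question 2 can be recomputed from the fitted model. This checks the decomposition and reproduces the R-squared (0.6591) and F statistic (25.13) reported above. The minimal sketch below redefines the data so it runs on its own.

```r
sugar    <- c(5, 8, 9, 10, 15, 18, 14, 17, 20, 22, 24, 26, 30, 30, 32)
calories <- c(20, 30, 60, 70, 100, 95, 70, 83, 103, 112, 130, 80, 95, 130, 112)
model <- lm(calories ~ sugar)

y     <- calories
y_hat <- fitted(model)

ss_total <- sum((y - mean(y))^2)       # total variability around the mean
ss_reg   <- sum((y_hat - mean(y))^2)   # variability explained by the model
ss_err   <- sum((y - y_hat)^2)         # residual (unexplained) variability

r_squared <- ss_reg / ss_total
df_err    <- length(y) - 2             # n minus the two estimated coefficients
f_stat    <- (ss_reg / 1) / (ss_err / df_err)

print(c(ss_total = ss_total, ss_reg = ss_reg, ss_err = ss_err,
        r_squared = r_squared, F = f_stat))
```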

Residuals

shapiro.test(model$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  model$residuals
## W = 0.91269, p-value = 0.1489
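The Shapiro-Wilk p-value (.149) is above .05, so the test gives no evidence that the residuals depart from normality. Equal variance of the residuals (homoscedasticity) is not checked above. One common visual check, added here as a sketch, is a residuals-versus-fitted plot.

```r
sugar    <- c(5, 8, 9, 10, 15, 18, 14, 17, 20, 22, 24, 26, 30, 30, 32)
calories <- c(20, 30, 60, 70, 100, 95, 70, 83, 103, 112, 130, 80, 95, 130, 112)
model <- lm(calories ~ sugar)

res <- resid(model)
fit <- fitted(model)

# An even band of points around zero, with no funnel shape, is consistent
# with constant residual variance
plot(fit, res, xlab = "Fitted calories", ylab = "Residual")
abline(h = 0, lty = 2)

# With an intercept in the model, least-squares residuals sum to (numerically) zero
print(sum(res))
```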

Summary

The fitted model is: calories = 29.1542 + 3.0453 * sugar
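The fitted equation can be used to predict calories for a new sugar value. The sketch below uses a hypothetical item with 20 units of sugar, chosen only for illustration. It compares `predict()` with the same calculation done by hand from the coefficients; both give about 90.06 calories.

```r
sugar    <- c(5, 8, 9, 10, 15, 18, 14, 17, 20, 22, 24, 26, 30, 30, 32)
calories <- c(20, 30, 60, 70, 100, 95, 70, 83, 103, 112, 130, 80, 95, 130, 112)
model <- lm(calories ~ sugar)

# Hypothetical new item with 20 units of sugar
new_item <- data.frame(sugar = 20)
pred <- predict(model, newdata = new_item)
print(pred)

# Same value by hand: intercept + slope * sugar
manual <- coef(model)[1] + coef(model)[2] * 20
print(unname(manual))
```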

A simple linear regression was conducted to predict calories from the amount of sugar. The assumptions checked above (non-zero variance, linearity, and normality of the residuals; Shapiro-Wilk W = 0.913, p = .149) were met, so no further adjustment was made. A significant regression equation was found (F(1, 13) = 25.13, p < .001), with R² = .659, meaning sugar accounts for about 66% of the variance in calories. Both the intercept (p = .035) and the predictor (p < .001) were statistically significant. Thus, sugar significantly predicts calories: each one-unit increase in sugar is associated with an increase of about 3.05 calories (b = 3.0453).