b.) The test for lack of fit does not test for constant variance or normality of the error terms; it assumes them. The F-test for lack of fit is derived under independent, normally distributed error terms with constant variance, so lack of constant variance or lack of normality can distort its F-statistic and p-value.
[1] 15 2
a.) The linear regression function is Y = 7.393 - 2.505X.
b.) H0: The reduced (linear) model is adequate for describing time against the concentration of a solution, E{Y} = Beta0 + Beta1*X. HA: The reduced model is not adequate, so the full model offers a statistically significantly better fit for time against the concentration of a solution. Decision rule: reject the null hypothesis if the p-value is less than alpha = 0.025, or equivalently if the F-value is larger than the critical F-value. Conclusion: Because the p-value is 0.0000046, which is smaller than alpha = 0.025, we reject the null hypothesis. Additionally, the F-value of 55.994 is larger than the critical value of 6.4143 at df = 13. There is evidence to suggest that the full model offers a statistically significantly better fit for time against the concentration of a solution at the 0.025 significance level.
c.) Yes, but only in a limited sense. Concluding that lack of fit of a linear regression function exists means that the full model fits better than the straight line. The test does not by itself identify which alternative regression function is appropriate.
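One common way to set up the reduced-versus-full comparison described above is sketched here. It is an illustration under stated assumptions, not the analysis that produced the numbers in part (b): the data are simulated (`sim` is a hypothetical stand-in with 15 rows, assuming 5 distinct X levels with 3 replicates each), and the full model is taken to be the cell-means model fitted with `factor(X2)`.

```r
# Simulated data (hypothetical, NOT the Solution-concentration.txt values):
# 5 distinct X2 levels with 3 replicates each, giving 15 rows.
set.seed(1)
sim <- data.frame(X2 = rep(c(1, 3, 5, 7, 9), each = 3))
sim$Y2 <- 3 * exp(-0.4 * sim$X2) + rnorm(nrow(sim), sd = 0.05)

# Reduced model: straight line in X2.
reduced <- lm(Y2 ~ X2, data = sim)
# Full model: one mean per distinct X2 level, no assumed functional form.
full <- lm(Y2 ~ factor(X2), data = sim)

# General linear F-test: reduced model first, full model second.
lof <- anova(reduced, full)
print(lof)

# Critical value at alpha = 0.025 with (c - 2, n - c) degrees of freedom,
# read from the anova table rather than typed in by hand.
f.crit <- qf(1 - 0.025, df1 = lof$Df[2], df2 = lof$Res.Df[2])
print(f.crit)
```

Lack of fit is concluded when `lof$F[2]` exceeds `f.crit`. Because the full model only estimates a separate mean at each X level, rejecting H0 says the straight line is inadequate without pointing to a specific replacement.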
a.)
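a.) FALSE. Rejecting the null hypothesis in a Brown-Forsythe (B-F) test means that there is evidence that the error variance is not constant, i.e. that the variances of the groups being compared are significantly different. Thus, the plot of the residuals ei against X will likely show a pattern, such as a spread that widens or narrows as X changes.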
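The B-F test referred to above can be computed by hand, which makes the link to the residual plot easier to see. The following is a minimal sketch on simulated data; the variables `x` and `y`, the error structure, and the choice to split the groups at the median of `x` are all assumptions made for illustration.

```r
# Simulated data (hypothetical): the error standard deviation grows with x.
set.seed(2)
x <- seq(1, 40)
y <- 2 + 0.5 * x + rnorm(length(x), sd = 0.1 * x)
e <- resid(lm(y ~ x))

# Split the residuals into a low-X and a high-X group at the median of x.
group <- factor(ifelse(x <= median(x), "low", "high"))

# Absolute deviations of the residuals from their own group median.
d <- abs(e - ave(e, group, FUN = median))

# B-F statistic: pooled-variance two-sample t-test on the absolute deviations,
# with n - 2 degrees of freedom.
bf <- t.test(d ~ group, var.equal = TRUE)
print(bf)

# The residual plot that the test result should agree with.
plot(x, e, xlab = "X", ylab = "Residual")
abline(h = 0, lty = 2)
```

A small p-value in `bf` corresponds to a visible difference in spread between the two halves of the residual plot.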
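b.) TRUE. Both the t-test and the F-test test whether or not Beta1 = 0. In SLR the F-statistic equals the square of the t-statistic, so for the two-sided t-test at the same significance level the conclusions will always be the same.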
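The equivalence of the two tests can be checked numerically. This is a small sketch on simulated data (the variables `x`, `y`, and `fit` are hypothetical and unrelated to the homework data sets).

```r
# Simulated simple linear regression data (hypothetical).
set.seed(3)
x <- runif(25, 0, 10)
y <- 1 + 0.3 * x + rnorm(25)
fit <- lm(y ~ x)

# t-statistic and two-sided p-value for the slope.
t.star <- summary(fit)$coefficients["x", "t value"]
p.t <- summary(fit)$coefficients["x", "Pr(>|t|)"]

# F-statistic and p-value from the ANOVA table.
f.star <- anova(fit)["x", "F value"]
p.f <- anova(fit)["x", "Pr(>F)"]

# Both comparisons should report TRUE (up to floating-point tolerance).
print(all.equal(t.star^2, f.star))
print(all.equal(p.t, p.f))
```

The equality holds only for the two-sided t-test; a one-sided t-test has no F-test counterpart.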
c.) TRUE. Non-constant variance occurs when the variability of the errors depends on the predictor variable, which violates one of the assumptions of simple linear regression. A transformation can stabilize the variance so that the transformed data better fit the SLR assumptions.
d.) FALSE. Outliers in a scatter plot should be removed as they can strongly affect the fitted line and the assumptions of linear regression. When making conclusions from a linear regression model, the model should fit the data points as well as possible, and outliers may skew the model.
knitr::opts_chunk$set(echo = FALSE, comment = NA, message = FALSE)
options(scipen = 999) #Remove the scientific notation
library(tidyverse)
copiers <- read.table("CH01PR20.txt")
colnames(copiers) <- c("X1", "Y1")
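# Full model: second-order polynomial in X1
copiers.full <- lm(Y1 ~ poly(X1, 2), data = copiers)
# Reduced model: straight line in X1
copiers.reduced <- lm(Y1 ~ X1, data = copiers)
# General linear F-test; listing the reduced model first keeps Df and Sum of Sq positive
anova(copiers.reduced, copiers.full)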
concentration <- read.table("Solution-concentration.txt")
colnames(concentration) <- c("X2", "Y2")
dim(concentration)
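# Simple linear regression of Y2 on X2
concentration.model <- lm(Y2 ~ X2, data = concentration)
# ANOVA table for the linear fit (F-test for the X2 slope)
anova(concentration.model)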