We can say with 95% confidence that a person of age 60 will have a muscle mass between 82.83 and 87.06.
Plot the residuals y_i − y_hat_i against x_i on one graph.
residual_plot_0
Plot the values y_hat_i − y_bar against x_i on one graph, using the same scales as in the graph in part (a).
y_hat_less_y_bar_plot
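As a rough sketch of how these two plots can be drawn on a shared scale in base R, the example below uses simulated data. The variable names (age, mass), the sample size, and the coefficients are assumptions made for illustration and are not the actual muscle mass sample.

```r
# Hypothetical data standing in for the age / muscle mass sample.
set.seed(1)
age  <- runif(60, min = 40, max = 80)
mass <- 155 - 1.2 * age + rnorm(60, sd = 8)

fit <- lm(mass ~ age)
res <- residuals(fit)            # y_i - y_hat_i
dev <- fitted(fit) - mean(mass)  # y_hat_i - y_bar

# Use one y-range for both panels so the two spreads are comparable.
y_range <- range(c(res, dev))
old_par <- par(mfrow = c(1, 2))
plot(age, res, ylim = y_range, ylab = "y - y_hat", main = "Residuals")
abline(h = 0, lty = 2)
plot(age, dev, ylim = y_range, ylab = "y_hat - y_bar", main = "Fitted deviations")
abline(h = 0, lty = 2)
par(old_par)

# The squared spreads of the two panels are SSE and SSR; they add up to SSTO.
sse  <- sum(res^2)
ssr  <- sum(dev^2)
ssto <- sum((mass - mean(mass))^2)
```

Comparing the vertical spread of the two panels gives a visual sense of which of SSE and SSR is larger.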
From the two graphs in parts (b) and (c), does SSE or SSR appear to be the larger component of SSTO? What does this imply about the magnitude of R^2?
SSR appears to be the larger component of SSTO. Since SSTO = SSR + SSE and R^2 = SSR/SSTO, this implies that R^2 is greater than 0.5.
Provide the ANOVA table.
anova_gt
What proportion of the total variation in muscle mass remains unexplained when age is added into the model? Is this proportion relatively small or large?
sse_prop
Proportion of Unexplained Variance: 0.2499332
Although SSR is the larger component, this proportion is still sizable: about 25% of the total variation in muscle mass remains unexplained after age is added to the model.
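The proportion can be checked by hand from the sums of squares reported in the ANOVA table of this report. The sketch below only redoes that arithmetic; the object names other than sse_prop are chosen here for illustration.

```r
# Sums of squares copied from the ANOVA table in this report.
ssr <- 11627.486
sse <- 3874.447

ssto      <- ssr + sse     # total sum of squares
sse_prop  <- sse / ssto    # unexplained proportion, approximately 0.2499
r_squared <- 1 - sse_prop  # explained proportion, R^2 = SSR / SSTO

sse_prop
r_squared
```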
Conduct a hypothesis test H_0 : β1 = 0 using an F test with significance level α = 0.05. Clearly state the alternatives, test statistics and conclusion.
We can obtain the F-statistic and p-value in R using anova.
anova_mm
TERM   DF   SS          MS            F-stat    p-value
SSR    1    11627.486   11627.48584   174.062   4.123987e-19
SSE    58   3874.447    66.80082      NA        NA
The alternative hypothesis is H_a: β1 ≠ 0. The F-statistic is 174.062 with a p-value of 4.12e-19, which is less than our significance level of α = 0.05. We therefore have sufficient evidence to reject the null hypothesis in favor of the alternative, β1 ≠ 0.
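The test statistic and decision can also be reproduced from the table entries alone. This is a minimal sketch that takes the sums of squares and degrees of freedom from the ANOVA table above; the object names are illustrative.

```r
# Table entries from the ANOVA output in this report.
ssr <- 11627.486; df_r <- 1
sse <- 3874.447;  df_e <- 58
alpha <- 0.05

msr    <- ssr / df_r   # mean square for regression
mse    <- sse / df_e   # mean square error
f_stat <- msr / mse    # F* = MSR / MSE

# Reject H_0 when F* exceeds the upper-alpha quantile of F(1, 58).
f_crit  <- qf(1 - alpha, df_r, df_e)
p_value <- pf(f_stat, df_r, df_e, lower.tail = FALSE)
reject  <- f_stat > f_crit

c(F = f_stat, critical = f_crit, p = p_value)
```

Comparing F* to the critical value and comparing the p-value to α are equivalent decision rules, so both lead to the same conclusion.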
Do you consider any transformation on X or Y? Explain your reasoning.
I would consider either squaring the response variable or taking the square root of the explanatory variable. The scatter plot suggests that the data could be described by a square-root curve, so either transformation could make the relationship more linear and better suited to linear regression.
Use the transformation x′ = √x and obtain the estimated linear regression function for the transformed data.
Plot a scatter plot of the transformed data then add the estimated regression line on a graph. Is a simple linear regression appropriate for modeling this transformed data?
lm_prod_1_plot
Yes, I would say this data is now appropriate for linear regression.
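For reference, here is a small sketch of fitting and plotting a regression on x′ = √x in base R. The data are simulated under an assumed square-root relationship, so the names x, y, x_prime, fit_t and all numbers are hypothetical and do not come from the data set analysed here.

```r
# Hypothetical data with a square-root relationship between x and y.
set.seed(2)
x <- runif(50, min = 1, max = 100)
y <- 5 + 3 * sqrt(x) + rnorm(50, sd = 1.5)

# Transform the explanatory variable, then fit a simple linear regression.
x_prime <- sqrt(x)
fit_t   <- lm(y ~ x_prime)
coef(fit_t)  # estimated intercept and slope on the sqrt scale

# Scatter plot of the transformed data with the fitted line added.
plot(x_prime, y, xlab = "sqrt(x)", ylab = "y")
abline(fit_t, col = "blue")
```

If the points scatter evenly around the line with no visible curvature, a simple linear regression on the transformed scale looks reasonable.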
Plot the residuals against the fitted values. What does this plot show?
lm_prod_1_residuals_plot
This shows that the residuals of our model are scattered randomly and do not follow any clear pattern, which indicates that our linear model is a good fit.
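A residuals-versus-fitted plot of this kind can be sketched in base R as follows. The data are again simulated and the object names are assumptions for illustration only.

```r
# Hypothetical data; not the data set analysed in this report.
set.seed(2)
x <- runif(50, min = 1, max = 100)
y <- 5 + 3 * sqrt(x) + rnorm(50, sd = 1.5)
fit_t <- lm(y ~ sqrt(x))

fitted_vals <- fitted(fit_t)
res         <- residuals(fit_t)

# Residuals against fitted values, with a reference line at zero.
plot(fitted_vals, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Least-squares residuals are uncorrelated with the fitted values by construction, so the things to look for are curvature or a funnel shape rather than a linear trend.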
Provide Normal Q-Q plot. What does this plot show?
qq_prod_1_plot
This plot supports our assumption that the residuals are normally distributed. Our model assumes that the residuals follow a normal distribution with mean 0 and constant variance.
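A Normal Q-Q plot of the residuals can be produced with qqnorm and qqline from base R. The sketch below uses simulated data, so the model and object names are hypothetical.

```r
# Hypothetical data; not the data set analysed in this report.
set.seed(2)
x <- runif(50, min = 1, max = 100)
y <- 5 + 3 * sqrt(x) + rnorm(50, sd = 1.5)
fit_t <- lm(y ~ sqrt(x))

res <- residuals(fit_t)

# Sample quantiles of the residuals against theoretical normal quantiles.
qq <- qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res, lty = 2)  # reference line through the first and third quartiles
```

Points that stay close to the reference line are consistent with normally distributed residuals, while systematic bends at the ends suggest heavy or light tails.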