Through the quizzes in our course, we are familiar with the selling price of fish in Animal Crossing: New Horizons. Now, we will explore the selling price of insects. You should explore the data to understand what the variables mean prior to you answering questions. Throughout this test, use α=0.05.
Construct the model that examines selling price as a function of if the insect is flying, whether the insect can be caught in the rain, the number of catches required, and all possible two-way interactions.
model1 <- lm(Sell ~ Flying + Rain + Catches + Flying:Rain + Flying:Catches + Rain:Catches, data = insects)
summary(model1)
##
## Call:
## lm(formula = Sell ~ Flying + Rain + Catches + Flying:Rain + Flying:Catches +
## Rain:Catches, data = insects)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5285.8 -755.3 -243.9 340.0 6994.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 283.750 661.297 0.429 0.669
## Flying 195.192 824.567 0.237 0.814
## Rain 721.585 739.462 0.976 0.332
## Catches 88.813 53.315 1.666 0.100
## Flying:Rain -1140.527 2070.727 -0.551 0.583
## Flying:Catches -10.364 58.128 -0.178 0.859
## Rain:Catches -3.003 53.860 -0.056 0.956
##
## Residual standard error: 1870 on 73 degrees of freedom
## Multiple R-squared: 0.6861, Adjusted R-squared: 0.6603
## F-statistic: 26.6 on 6 and 73 DF, p-value: < 2.2e-16
summary_m1 <- summary(model1)
library(tibble)
model_c1 <- coefficients(model1)
model_s1 <- summary(model1)
model_ci1 <- as_tibble(confint(model1, level=0.95))
model_t1 <- as_tibble(model_s1[[4]])
p.value.string = function(p.value){
p.value <- round(p.value, digits=4)
if (p.value == 0) {
return("p < 0.0001")
} else {
return(paste0("p = ", format(p.value, scientific = F)))
}
}
almost_sas <- function(aov.results){
par(mfrow=c(2,2))
plot(aov.results, which=1)
plot(aov.results, which=2)
aov_residuals <- residuals(aov.results)
plot(density(aov_residuals))
hist(aov_residuals)
}
The regression line is as follows, such that \(Y={\mbox{Selling Price}}\) is: \[ \hat{Y} = 283.75 + 195.19X_{\mbox{Flying}} + 721.59X_{\mbox{Rain}} + 88.81X_{\mbox{Catches}} - 1140.53X_{\mbox{Flying:Rain}} - 10.36X_{\mbox{Flying:Catches}} - 3X_{\mbox{Rain:Catches}}\]
The new model indicates that for every 1 Bell increase in Selling Price, Flying increases by 195.19, Rain increases by 721.59, Catches increases by 88.81, the interaction between Flying and Rain decreases by 1140.53, the interaction between Flying and Catches decreases by 10.36, and the interaction between Rain and Catches decreases by 3.00.
library(car)
anova(model1)
## Analysis of Variance Table
##
## Response: Sell
## Df Sum Sq Mean Sq F value Pr(>F)
## Flying 1 22922423 22922423 6.5521 0.012546 *
## Rain 1 33705852 33705852 9.6343 0.002717 **
## Catches 1 500410829 500410829 143.0354 < 2.2e-16 ***
## Flying:Rain 1 948154 948154 0.2710 0.604226
## Flying:Catches 1 324673 324673 0.0928 0.761510
## Rain:Catches 1 10876 10876 0.0031 0.955688
## Residuals 73 255391173 3498509
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Construct the appropriate F-test to determine if we can remove all interaction terms at the same time.
Hypotheses
\(H_0: \ \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = 0\)
\(H_1:\) at least one \(\beta_i \ne 0\)
Test Statistic
\(F_0 = 26.6\).
p-value
\(p < 0.0001\).
Rejection Region
Reject if \(p < \alpha\), where \(\alpha=0.05\).
Conclusion and Interpretation
Reject \(H_0\). There is sufficient evidence to suggest that the regression line is significant. However, according to the ANOVA results, the interactions are not significant (each \(p_{\mbox{Interaction}}>\alpha=0.05\)).
What is the final model after your test in Q2?
model2 <- lm(Sell ~ Flying + Rain + Catches)
summary(model2)
##
## Call:
## lm(formula = Sell ~ Flying + Rain + Catches)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5258.1 -741.9 -191.9 310.9 7008.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 378.659 496.849 0.762 0.448
## Flying -27.024 612.380 -0.044 0.965
## Rain 613.248 560.868 1.093 0.278
## Catches 85.523 7.026 12.172 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1838 on 76 degrees of freedom
## Multiple R-squared: 0.6846, Adjusted R-squared: 0.6721
## F-statistic: 54.98 on 3 and 76 DF, p-value: < 2.2e-16
summary_m2 <- summary(model2)
library(tibble)
model_c2 <- coefficients(model2)
model_s2 <- summary(model2)
model_ci2 <- as_tibble(confint(model2, level=0.95))
model_t2 <- as_tibble(model_s2[[4]])
The resulting model, such that \(Y={\mbox{Selling Price}}\), is:
\[ \hat{Y} = 378.66 - 27.02X_{\mbox{Flying}} + 613.25X_{\mbox{Rain}} + 88.81X_{\mbox{Catches}}\]
Find the R2adj for the model in Q1 and the model in Q3. Give interpretations for your R2adj values. Which model fits better and why? Based on these results, are you confident in your final model from Q3?
\(R^2 = 0.6861\) and \(R^2_{\mbox{adj}} = 0.6603\)
According to the original model, 66.03% of the variance is explained by the model.
\(R^2 = 0.6846\) and \(R^2_{\mbox{adj}} = 0.6721\)
According to the original model, 67.21% of the variance is explained by the model.
Since the second model explains more of the variance than the first model, the second model is likely more accurate, indicating greater confidence in Model 2 than Model 1.
Provide a table with point estimates and 95% confidence intervals on β as well as the p-value for the corresponding t-tests.
| Predictor | Estimate of \(\beta\) | 95% CI for \(\beta\) | p-value |
|---|---|---|---|
| Flying | -27.02 | (-1246.69, 1192.64) | p = 0.9649 |
| Rain | 613.25 | (-503.82, 1730.31) | p = 0.2777 |
| Catches | 85.52 | (71.53, 99.52) | p < 0.0001 |
Provide interpretations for each of the predictors and indicate whether or not they are significant predictors of selling price of insects.
According to Model 2, for every 1 Bell increase in Selling Price, Flying decreases by 27.02, Rain increases by 613.25, and Catches increases by 85.52.
However, according to the previous results from the predictors table, only Catches is a significant predictor (\(p<\alpha=0.05\)).
The model for predicting Selling Price based on Catches only is:
model4 <- lm(Sell ~ Catches)
model_c4 <- coefficients(model4)
\[ \hat{Y} = 686.18 + 87.68X_{\mbox{Catches}}\]
Construct a graph for flying insects that allows the number of catches to vary (x-axis). Have one line for when it rains and one line for when it does not rain.
library(sjPlot)
library(sjmisc)
library(ggplot2)
data(insects)
theme_set(theme_sjplot())
insects$rainy_fac <- as.factor(insects$Rain)
insects$pred_rainy <- model_c2[[1]] + model_c2[[2]]*insects$Rain + model_c2[[3]]
insects$pred_notrainy <- model_c2[[1]] + model_c2[[2]]*insects$Rain
fit <- lm(Flying ~ Catches*Rain)
plot_model(fit, type="int")
It appears that there is no interaction effect regarding Rain.
Check for outliers and, separately, any influence/leverage points. Do we need to perform sensitivity analysis?
library(pander)
pander(rstandard(model4), style='rmarkdown')
According to the Studentized R results, observations 10 (-2.788), 28 (2.717), and 66 (2.717) appear be outliers, and observations 67 and 71 appear to be extreme outliers (both 4.990).
library(lindia)
gg_cooksd(model2)+theme_bw()
According to the Cook’s Distance plot, observations 10, 28, 30, 31, 40, 66, 67, and 71 appear to have significant influence/leverage in the model.
Yes, sensitivity analysis should be conducted.
If you determined that sensitivity analysis was needed in Q8, perform it here.
According to both the Studentized Residuals results and Cook’s Distance plot, observations, 10, 28, 66, 67, and 71 should be removed from the model.
Compare the two models: the model with the points in question and the model without the points in question. Is there a large difference between the two models?
model3 = lm(Sell[c(-10, -28, -66, -67, -71)] ~ Catches[c(-10, -28, -66, -67, -71)])
summary(model3)
##
## Call:
## lm(formula = Sell[c(-10, -28, -66, -67, -71)] ~ Catches[c(-10,
## -28, -66, -67, -71)])
##
## Residuals:
## Min 1Q Median 3Q Max
## -2852.2 -276.5 -106.5 315.2 2702.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 406.503 136.292 2.983 0.00388 **
## Catches[c(-10, -28, -66, -67, -71)] 88.914 3.968 22.410 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1032 on 73 degrees of freedom
## Multiple R-squared: 0.8731, Adjusted R-squared: 0.8714
## F-statistic: 502.2 on 1 and 73 DF, p-value: < 2.2e-16
model_c3 <- coefficients(model3)
model_s3 <- summary(model3)
model_ci3 <- as_tibble(confint(model3, level=0.95))
model_t3 <- as_tibble(model_s3[[4]])
The resulting model, such that \(Y={\mbox{Selling Price}}\), is:
\[ \hat{Y} = 406.5 + 88.91X_{\mbox{Catches}}\]
The original model (Model 2), for comparison, is: \[ \hat{Y} = 378.66 - 27.02X_{\mbox{Flying}} + 613.25X_{\mbox{Rain}} + 88.81X_{\mbox{Catches}}\]
The two models differ quite significantly, especially regarding the removal of Flying and Rain because they were deemed insignificant in Q6. In the second model, \(\beta_{\mbox{Catches}}\) is nearly the same as it was in the previous model, but the intercept increased.
Assess the model assumptions graphically. Is our model valid?
almost_sas(model4)
According to the normality plots, it appears that the data are somewhat normally distributed, thereby meeting the normality and variance assumptions. The data do not perfectly match the desired plots (see the Normal Q-Q plot and histogram of residuals), but the data follow the required predictions (such as the density plot) quite closely. Therefore, the model assumptions are met, and the model is valid.