Fish age and length have a moderate, concave-down relationship: length increases with age and then decreases. There are two apparent outliers, near (5, 190) and (5.5, 190).
Based on the scatterplot, I would set the minimum length at 155 cm. The dotted line separates fish over the age of 4 years, which can be kept when caught, from younger fish, which must be released. The longest fish under the age of 4 is 150 cm, but some fish over the age of 4 are also 150 cm. To give the younger fish some buffer, I selected 155 cm. This still allows most fish over the age of 4 to be kept while protecting some of that population as well.
Based on Figures 1, 2, and 3, an LSLR model does not seem appropriate for this data. An LSLR model fits a straight line, but this data is curved: length increases with age until about age 6 and then decreases. A polynomial model may fit this data better.
Rather than an LSLR model, I would use a fitted polynomial model for this data. Each additional power in a polynomial lets the curve change direction one more time, so a quadratic (an Age² term) can change direction once. This data changes direction once, from increasing to decreasing, so a quadratic model should be able to fit it well.
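As a general worked check of this reasoning (standard algebra with generic coefficients $b_0, b_1, b_2$, not values from the fitted output), setting the slope of a quadratic equal to zero gives exactly one solution:

\[
\hat{y} = b_0 + b_1 x + b_2 x^2, \qquad \frac{d\hat{y}}{dx} = b_1 + 2 b_2 x = 0 \;\Longrightarrow\; x^{*} = -\frac{b_1}{2 b_2}
\]

When $b_2 < 0$ the curve is concave down, so $x^{*}$ is the age of maximum predicted length. More generally, a degree-$k$ polynomial has a derivative of degree $k-1$, so it can change direction at most $k-1$ times.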
Visually, this model appears to be a reasonable choice for this data set: the fitted curve follows the rise and fall of the data.
##
## Call:
## lm(formula = length ~ age + I(age^2), data = bluegills)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31.694 -8.354 -0.085 8.780 28.126
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -15.3085 9.2143 -1.661 0.0985 .
## age 62.2301 4.1747 14.907 <2e-16 ***
## I(age^2) -5.4586 0.4613 -11.832 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.47 on 167 degrees of freedom
## Multiple R-squared: 0.7165, Adjusted R-squared: 0.7131
## F-statistic: 211 on 2 and 167 DF, p-value: < 2.2e-16
The fitted model is: \[\widehat{\text{Fish Length}} = -15.308 + 62.23(\text{Age}) - 5.46(\text{Age}^2)\]
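A minimal Python sketch (Python is used only for illustration; the analysis itself was done in R) that evaluates the fitted curve with the unrounded estimates from the summary above and solves for its turning point. The helpers `predicted_length` and `turning_point` are hypothetical names, not part of the original analysis.

```python
# Coefficient estimates copied from the lm() summary above (unrounded).
B0 = -15.3085  # intercept
B1 = 62.2301   # coefficient on age
B2 = -5.4586   # coefficient on age^2


def predicted_length(age: float) -> float:
    """Fitted length (cm) from the quadratic model at a given age (years)."""
    return B0 + B1 * age + B2 * age ** 2


def turning_point() -> float:
    """Age where the fitted slope B1 + 2*B2*age equals zero (the peak, since B2 < 0)."""
    return -B1 / (2 * B2)


if __name__ == "__main__":
    peak_age = turning_point()
    print(f"peak age ~ {peak_age:.2f} years, "
          f"predicted length there ~ {predicted_length(peak_age):.1f} cm")
```

Solving $62.2301 + 2(-5.4586)\,\text{Age} = 0$ gives a turning point near age 5.70, consistent with the observation that length starts to decrease around age 6.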
## Question 11

Figure 5, the studentized residual plot, shows eight outliers beyond the cutoffs of 2 and -2. Three outliers cluster around age 3 below -2, and two more cluster around age 3 above 2. There are also single outliers above 2 near ages 2, 5, and 5.5. In total, 3 outliers fall below -2 and 5 fall above 2. An influential point is one with both an unusual x value (high leverage) and an unusual y value, so that removing it would noticeably change the fitted curve. Based on Figure 4, none of these outliers appear to be influential.
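A sketch of how the ±2 rule can be applied, assuming Figure 5 plots internally studentized residuals $r_i = e_i / (s\sqrt{1 - h_i})$ (the figure does not say which variant was used). The residuals and leverages in the example call are hypothetical illustration values, not the bluegill data; only $s = 12.47$ comes from the model output.

```python
import math


def studentized_residuals(residuals, leverages, s):
    """Internally studentized residuals: r_i = e_i / (s * sqrt(1 - h_i)).

    residuals: raw residuals e_i; leverages: hat values h_i;
    s: residual standard error of the fitted model.
    """
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverages)]


def flag_outliers(studentized, cutoff=2.0):
    """Return (indices below -cutoff, indices above +cutoff)."""
    below = [i for i, r in enumerate(studentized) if r < -cutoff]
    above = [i for i, r in enumerate(studentized) if r > cutoff]
    return below, above


# Hypothetical residuals and leverages, for illustration only.
r = studentized_residuals([28.1, -31.7, 5.0, -2.0], [0.02, 0.03, 0.01, 0.05], s=12.47)
print(flag_outliers(r))  # → ([1], [0])
```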
## [1] 25958.02
## [1] 91552.09
## [1] 0.7164672
## [1] 0.7164672
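The two unlabeled values printed above appear to be the residual sum of squares (SSE) and the total sum of squares (SST). That labeling is an assumption, but it reproduces the summary's R-squared, adjusted R-squared, and residual standard error, as this short Python check shows. The sample size of 170 is inferred from the 167 residual degrees of freedom plus 3 estimated coefficients.

```python
import math

SSE = 25958.02  # residual sum of squares (assumed meaning of the first printed value)
SST = 91552.09  # total sum of squares (assumed meaning of the second printed value)
N_OBS = 170     # 167 residual df + 3 coefficients
N_PARAMS = 3    # intercept, age, age^2

# Share of the variation in length explained by the model.
r_squared = 1 - SSE / SST
# Adjusted R-squared penalizes the number of estimated coefficients.
adj_r_squared = 1 - (SSE / (N_OBS - N_PARAMS)) / (SST / (N_OBS - 1))
# Residual standard error: sqrt(SSE / residual degrees of freedom).
residual_se = math.sqrt(SSE / (N_OBS - N_PARAMS))

print(f"R^2 = {r_squared:.7f}, adj R^2 = {adj_r_squared:.4f}, s = {residual_se:.2f}")
```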
Using the fitted model, the predicted length of a 4-year-old bluegill is \[\widehat{\text{Fish Length}} = -15.308 + 62.23(4) - 5.46(4^2) = 146.252 \text{ cm}.\] R's prediction, 146.2744 cm, is slightly higher because it uses the unrounded coefficients.
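The small gap between the hand calculation and R's 146.2744 comes entirely from rounding the coefficients. A quick Python comparison (the helper `quadratic` is a hypothetical name):

```python
def quadratic(b0: float, b1: float, b2: float, age: float) -> float:
    """Evaluate b0 + b1*age + b2*age^2."""
    return b0 + b1 * age + b2 * age ** 2


rounded = quadratic(-15.308, 62.23, -5.46, 4)    # coefficients as written in the text
full = quadratic(-15.3085, 62.2301, -5.4586, 4)  # estimates from the lm() summary
print(f"rounded: {rounded:.3f} cm, unrounded: {full:.4f} cm")
```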
## fit lwr upr
## 1 146.2744 116.8695 175.6792
We are 98% confident that an individual 4-year-old bluegill will have a length between 116.87 cm and 175.68 cm (a 98% prediction interval).
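Why the interval above reads as a prediction interval for an individual fish rather than a confidence interval for the mean length: a short check using only numbers from the output. This assumes the interval came from R's `predict()` at a 98% level. For a prediction interval the half-width is $t\sqrt{s^2 + \text{se}_{\text{fit}}^2}$, which can be no smaller than $t \cdot s$.

```python
FIT, LWR, UPR = 146.2744, 116.8695, 175.6792  # predict() output above (cm)
RESIDUAL_SE = 12.47                           # residual standard error from the lm() summary

midpoint = (LWR + UPR) / 2     # should equal the fitted value
half_width = (UPR - LWR) / 2   # margin of error around the fit
ratio = half_width / RESIDUAL_SE  # margin measured in residual standard errors

print(f"midpoint {midpoint:.4f}, half-width {half_width:.3f}, half-width / s = {ratio:.2f}")
```

The half-width is about 2.36 residual standard errors, in line with a 98% t critical value of roughly 2.35 for 167 degrees of freedom. A confidence interval for the mean length at age 4 would be much narrower, because it uses only $\text{se}_{\text{fit}}$, which is typically far smaller than $s$ with 170 observations.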
Based on this, I recommend that the client use a minimum length of 151 cm. The model explains 71.65% of the variance in fish length, which is a reasonably good fit. The model predicts that a 4-year-old bluegill is about 146.25 cm long, while the 98% prediction interval extends up to 175.68 cm. Because many fish over the age of 4 are shorter than 175 cm, a minimum at the upper bound would keep anglers from keeping them, so I stay closer to the model's predicted value. Adding a buffer of about 5 cm to the predicted length gives a recommended minimum length of 151 cm.