library(tidyverse)
library(knitr)
library(broom)
library(rrcov)
# Recall the "fish" data set containing several species of fish from a lake in Finland.
data(fish)
# Clean up the variables in this data set.
fish <- fish %>%
mutate(Species = as.character(Species)) %>%
mutate(name = recode(Species, '1'="Bream", '2'="Whitefish", '3'="Roach", '4'="Parkki", '5'="Smelt", '6'="Pike", '7'="Perch"))
I will start this assignment by evaluating the relationship of length and weight in perch fish.
# Create a subset of data that represents one taxon.
perch <- fish %>%
filter(name == "Perch")
Next, I calculate the natural logs of the length and weight of the perch.
df1 <- perch %>%
mutate(condition = 100*(Weight/(Length3)^3)) %>%
mutate(log_length = log(Length3), log_weight = log(Weight))
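For context, the log transform is usually motivated by the standard allometric length-weight model. This is the textbook form, assumed here rather than stated in the assignment, with $a$ and $b$ as the model constants, $W$ as weight, and $L$ as length:

$$
W = a L^{b} \quad\Longrightarrow\quad \log W = \log a + b \log L
$$

Under this assumption, a straight-line fit of log weight on log length has intercept $\log a$ and slope $b$, and $b = 3$ corresponds to isometric growth (weight scaling with the cube of length).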
Below I plot the linear regression of log weight on log length for perch, with weight as the predicted (response) variable.
df1 %>%
ggplot(aes(x = log_length, y = log_weight)) +
geom_point() +
geom_smooth(method="lm", color = 'red')
Since the line of best fit represents the data well, the graph needs no cleanup or adjustment other than axis labels and a title.
df1 %>%
ggplot(aes(x = log_length, y = log_weight)) +
geom_point() +
geom_smooth(method="lm", color = 'red') +
labs(title = "Perch fish from the Laengelmavesi lake, Finland, 2006") +
ylab("log weight (g)") +
xlab("log length (cm)")
Below is a summary of the linear regression above, used to obtain the p-value for the weight-length relationship.
lm1 <- lm((log_weight ~ log_length), data = df1)
kable(lm1 %>%
summary() %>%
tidy(),
caption = "Regression Statistics for Predicting Weight by Length in Perch")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -5.078840 | 0.1521228 | -33.38644 | 0 |
| log_length | 3.162688 | 0.0454162 | 69.63794 | 0 |
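The p-value (shown as 0 after rounding) tells us the slope is not zero. There is a significant relationship, so length can be used to predict weight. This allows us to form hypotheses about the allometric nature of length vs. weight in perch: the null hypothesis is b = 3 (isometric growth) and the alternative is b ≠ 3 (allometric growth), where b is the slope of log weight on log length.
These hypotheses can be tested using the function below with 95% confidence.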
lm1 %>%
confint()
## 2.5 % 97.5 %
## (Intercept) -5.383828 -4.773852
## log_length 3.071634 3.253742
Based on the above information, in which b falls between 3.07 and 3.25, the interval excludes 3 (the value expected under isometric growth), so we reject our null hypothesis. Perch appear to grow allometrically. The ANOVA below supports the regression.
kable(lm1 %>%
anova() %>%
tidy() %>%
select(term, df, statistic, p.value),
caption = "Regression ANOVA Table")
| term | df | statistic | p.value |
|---|---|---|---|
| log_length | 1 | 4849.443 | 0 |
| Residuals | 54 | NA | NA |
As expected, the p-value above is effectively zero, which indicates that log length is a highly significant predictor of log weight. There is strong evidence of a relationship between these variables. This is where Perch concludes.
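Returning to the perch slope, a complementary check (my own sketch, not part of the assignment instructions) is to test the slope against 3 directly, using the estimate and standard error reported in the regression table. The names `b_hat`, `se_b`, and `df_resid` are illustrative.

```r
# Values copied from the perch regression and ANOVA tables above.
b_hat    <- 3.162688   # slope estimate for log_length
se_b     <- 0.0454162  # standard error of the slope
df_resid <- 54         # residual degrees of freedom

# t statistic for H0: b = 3 (instead of the default H0: b = 0).
t_stat <- (b_hat - 3) / se_b

# Two-sided p-value from the t distribution.
p_val <- 2 * pt(-abs(t_stat), df = df_resid)

t_stat
p_val
```

A small p-value here would agree with a 95% confidence interval that excludes 3.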
The next fish to evaluate are bream. Below is the creation of their data set.
bream <- fish %>%
filter(name == "Bream")
Following the same steps as before, I will take the natural log of the fish lengths and weights and graph them on the x and y axes, respectively.
df2 <- bream %>%
mutate(condition = 100*(Weight/(Length3)^3)) %>%
mutate(log_length = log(Length3), log_weight = log(Weight))
df2 %>%
ggplot(aes(x = log_length, y = log_weight)) +
geom_point() +
geom_smooth(method="lm", color = 'red') +
labs(title = "Bream fish from the Laengelmavesi lake, Finland, 2006") +
ylab("log weight (g)") +
xlab("log length (cm)")
Regression statistics below.
lm1 <- lm((log_weight ~ log_length), data = df2)
kable(lm1 %>%
summary() %>%
tidy(),
caption = "Regression Statistics for Predicting Weight by Length in Bream")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -4.941734 | 0.5239468 | -9.431748 | 0 |
| log_length | 3.109276 | 0.1438076 | 21.621076 | 0 |
Once again, the p-value is significant. Therefore, for bream, the hypotheses are: the null, b = 3 (isometric growth), and the alternative, b ≠ 3 (allometric growth).
These hypotheses can be tested using the function below with 95% confidence.
lm1 %>%
confint()
## 2.5 % 97.5 %
## (Intercept) -6.008979 -3.874490
## log_length 2.816349 3.402202
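Based on the above information, in which b falls between 2.82 and 3.40, the interval includes 3 (the value expected under isometric growth), so we fail to reject our null hypothesis. Bream growth is consistent with isometry. The ANOVA below confirms that the regression itself is significant.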
kable(lm1 %>%
anova() %>%
tidy() %>%
select(term, df, statistic, p.value),
caption = "Regression ANOVA Table")
| term | df | statistic | p.value |
|---|---|---|---|
| log_length | 1 | 467.4709 | 0 |
| Residuals | 32 | NA | NA |
P value ~0.
whitefish <- fish %>%
filter(name == "Whitefish")
Following the same steps as before, I will take the natural log of the fish lengths and weights and graph them on the x and y axes, respectively.
df3 <- whitefish %>%
mutate(condition = 100*(Weight/(Length3)^3)) %>%
mutate(log_length = log(Length3), log_weight = log(Weight))
df3 %>%
ggplot(aes(x = log_length, y = log_weight)) +
geom_point() +
geom_smooth(method="lm", color = 'red') +
labs(title = "Whitefish from the Laengelmavesi lake, Finland, 2006") +
ylab("log weight (g)") +
xlab("log length (cm)")
Regression statistics below.
lm1 <- lm((log_weight ~ log_length), data = df3)
kable(lm1 %>%
summary() %>%
tidy(),
caption = "Regression Statistics for Predicting Weight by Length in Whitefish")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -5.704447 | 0.8939633 | -6.381075 | 0.0030947 |
| log_length | 3.360039 | 0.2534814 | 13.255564 | 0.0001872 |
Once again, the p-value is significant. Therefore, for whitefish, the hypotheses are: the null, b = 3 (isometric growth), and the alternative, b ≠ 3 (allometric growth).
These hypotheses can be tested using the function below with 95% confidence.
lm1 %>%
confint()
## 2.5 % 97.5 %
## (Intercept) -8.186488 -3.222407
## log_length 2.656262 4.063816
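Based on the above information, in which b falls between 2.66 and 4.06, the interval includes 3 (the value expected under isometric growth), so we fail to reject our null hypothesis. Whitefish growth is consistent with isometry, although with only six fish the interval is wide. The ANOVA below confirms that the regression itself is significant.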
kable(lm1 %>%
anova() %>%
tidy() %>%
select(term, df, statistic, p.value),
caption = "Regression ANOVA Table")
| term | df | statistic | p.value |
|---|---|---|---|
| log_length | 1 | 175.71 | 0.0001872 |
| Residuals | 4 | NA | NA |
P value = 0.00019. Highly significant.
roach <- fish %>%
filter(name == "Roach")
Following the same steps as before, I will take the natural log of the fish lengths and weights and graph them on the x and y axes, respectively.
df4 <- roach %>%
mutate(condition = 100*(Weight/(Length3)^3)) %>%
mutate(log_length = log(Length3), log_weight = log(Weight))
#filter(Weight > 0)
I was unable to filter out the zero-valued weight in roach despite using the instructions you provided, and I am not sure why. I am going to move on to Parkki and hopefully address it with you later this week.
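One possible explanation, offered as a guess: the `filter()` call sits on its own line after the pipe chain has already ended, so it never receives the data. Since `log(0)` is `-Inf`, the zero-weight row also needs to be removed before the log transform. A minimal sketch of that ordering, using a made-up three-row stand-in (`roach_demo`, not the real roach data):

```r
library(dplyr)

# Hypothetical stand-in for the roach subset: three fish, one with Weight == 0.
roach_demo <- data.frame(
  Weight  = c(40, 0, 69),
  Length3 = c(16.2, 22.8, 20.3)
)

# Filter first, then transform, keeping every step inside one pipe chain.
df4_demo <- roach_demo %>%
  filter(Weight > 0) %>%
  mutate(log_length = log(Length3), log_weight = log(Weight))

nrow(df4_demo)                       # number of rows left after filtering
all(is.finite(df4_demo$log_weight))  # checks that no -Inf values remain
```

If this is the cause, the same ordering applied to `roach` should allow the regression to run as it did for the other species.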
parkki <- fish %>%
filter(name == "Parkki")
Following the same steps as before, I will take the natural log of the fish lengths and weights and graph them on the x and y axes, respectively.
df5 <- parkki %>%
mutate(condition = 100*(Weight/(Length3)^3)) %>%
mutate(log_length = log(Length3), log_weight = log(Weight))
df5 %>%
ggplot(aes(x = log_length, y = log_weight)) +
geom_point() +
geom_smooth(method="lm", color = 'red') +
labs(title = "Parkki from the Laengelmavesi lake, Finland, 2006") +
ylab("log weight (g)") +
xlab("log length (cm)")
Regression statistics below.
lm1 <- lm((log_weight ~ log_length), data = df5)
kable(lm1 %>%
summary() %>%
tidy(),
caption = "Regression Statistics for Predicting Weight by Length in Parkki")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -4.527962 | 0.4554613 | -9.941486 | 3.8e-06 |
| log_length | 3.034233 | 0.1461306 | 20.763848 | 0.0e+00 |
Once again, the p-value is significant. Therefore, for parkki, the hypotheses are: the null, b = 3 (isometric growth), and the alternative, b ≠ 3 (allometric growth).
These hypotheses can be tested using the function below with 95% confidence.
lm1 %>%
confint()
## 2.5 % 97.5 %
## (Intercept) -5.558287 -3.497637
## log_length 2.703663 3.364803
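Based on the above information, in which b falls between 2.70 and 3.36, the interval includes 3 (the value expected under isometric growth), so we fail to reject our null hypothesis. Parkki growth is consistent with isometry. The ANOVA below confirms that the regression itself is significant.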
kable(lm1 %>%
anova() %>%
tidy() %>%
select(term, df, statistic, p.value),
caption = "Regression ANOVA Table")
| term | df | statistic | p.value |
|---|---|---|---|
| log_length | 1 | 431.1374 | 0 |
| Residuals | 9 | NA | NA |
P value ~0.
Below is the information for Smelt.
smelt <- fish %>%
filter(name == "Smelt")
Following the same steps as before, I will take the natural log of the fish lengths and weights and graph them on the x and y axes, respectively.
df6 <- smelt %>%
mutate(condition = 100*(Weight/(Length3)^3)) %>%
mutate(log_length = log(Length3), log_weight = log(Weight))
df6 %>%
ggplot(aes(x = log_length, y = log_weight)) +
geom_point() +
geom_smooth(method="lm", color = 'red') +
labs(title = "Smelt from the Laengelmavesi lake, Finland, 2006") +
ylab("log weight (g)") +
xlab("log length (cm)")
Regression statistics below.
lm1 <- lm((log_weight ~ log_length), data = df6)
kable(lm1 %>%
summary() %>%
tidy(),
caption = "Regression Statistics for Predicting Weight by Length in Smelt")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -5.268323 | 0.6769670 | -7.782245 | 5e-06 |
| log_length | 2.976795 | 0.2639889 | 11.276215 | 1e-07 |
Once again, the p-value is significant. Therefore, for smelt, the hypotheses are: the null, b = 3 (isometric growth), and the alternative, b ≠ 3 (allometric growth).
These hypotheses can be tested using the function below with 95% confidence.
lm1 %>%
confint()
## 2.5 % 97.5 %
## (Intercept) -6.743308 -3.793339
## log_length 2.401613 3.551978
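Based on the above information, in which b falls between 2.40 and 3.55, the interval includes 3 (the value expected under isometric growth), so we fail to reject our null hypothesis. Smelt growth is consistent with isometry. The ANOVA below confirms that the regression itself is significant.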
kable(lm1 %>%
anova() %>%
tidy() %>%
select(term, df, statistic, p.value),
caption = "Regression ANOVA Table")
| term | df | statistic | p.value |
|---|---|---|---|
| log_length | 1 | 127.153 | 1e-07 |
| Residuals | 12 | NA | NA |
P value ~0.
The last fish to examine is Pike.
pike <- fish %>%
filter(name == "Pike")
Following the same steps as before, I will take the natural log of the fish lengths and weights and graph them on the x and y axes, respectively.
# Use a new name (df7) so the smelt data in df6 is not overwritten.
df7 <- pike %>%
mutate(condition = 100*(Weight/(Length3)^3)) %>%
mutate(log_length = log(Length3), log_weight = log(Weight))
df7 %>%
ggplot(aes(x = log_length, y = log_weight)) +
geom_point() +
geom_smooth(method="lm", color = 'red') +
labs(title = "Pike from the Laengelmavesi lake, Finland, 2006") +
ylab("log weight (g)") +
xlab("log length (cm)")
Regression statistics below.
lm1 <- lm((log_weight ~ log_length), data = df7)
kable(lm1 %>%
summary() %>%
tidy(),
caption = "Regression Statistics for Predicting Weight by Length in Pike")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -6.006614 | 0.4577910 | -13.12087 | 0 |
| log_length | 3.201095 | 0.1182519 | 27.07014 | 0 |
Once again, the p-value is significant. Therefore, for pike, the hypotheses are: the null, b = 3 (isometric growth), and the alternative, b ≠ 3 (allometric growth).
These hypotheses can be tested using the function below with 95% confidence.
lm1 %>%
confint()
## 2.5 % 97.5 %
## (Intercept) -6.982373 -5.030856
## log_length 2.949047 3.453143
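Based on the above information, in which b falls between 2.95 and 3.45, the interval includes 3 (the value expected under isometric growth), so we fail to reject our null hypothesis. Pike growth is consistent with isometry. The ANOVA below confirms that the regression itself is significant.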
kable(lm1 %>%
anova() %>%
tidy() %>%
select(term, df, statistic, p.value),
caption = "Regression ANOVA Table")
| term | df | statistic | p.value |
|---|---|---|---|
| log_length | 1 | 732.7925 | 0 |
| Residuals | 15 | NA | NA |
P value ~0.
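According to the results presented above, length is a highly significant predictor of weight in every species analyzed from lake Laengelmavesi, Finland. Only perch show clear evidence of allometric growth, since their 95% confidence interval for b (3.07 to 3.25) excludes 3. For bream, whitefish, parkki, smelt, and pike the interval includes 3, so isometric growth cannot be ruled out. Roach could not be analyzed because of the zero-valued weight.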
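Because the same steps repeat for every species, they could be wrapped in one helper. The sketch below is only an illustration: `slope_ci` and `contains_3` are hypothetical names, and `demo` is simulated stand-in data generated with b = 3, not the real fish measurements.

```r
# Hypothetical helper: fit log(Weight) ~ log(Length3) and report the slope CI.
slope_ci <- function(data, level = 0.95) {
  fit <- lm(log(Weight) ~ log(Length3), data = data)
  ci  <- confint(fit, level = level)["log(Length3)", ]
  data.frame(
    estimate   = unname(coef(fit)["log(Length3)"]),
    lower      = unname(ci[1]),
    upper      = unname(ci[2]),
    # TRUE when the interval includes 3, the isometric value.
    contains_3 = unname(ci[1] <= 3 && 3 <= ci[2])
  )
}

# Simulated stand-in data (not the real fish data), generated with b = 3.
set.seed(1)
demo <- data.frame(Length3 = runif(30, 10, 40))
demo$Weight <- 0.01 * demo$Length3^3 * exp(rnorm(30, sd = 0.1))

slope_ci(demo)
```

With the real data, the helper could be applied per species, for example with `fish %>% filter(Weight > 0) %>% group_by(name) %>% group_modify(~ slope_ci(.x))`, assuming the zero-weight roach is filtered out first.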