library(tidyverse)
library(knitr)
library(broom)
library(rrcov)

# Recall the "fish" data set containing several species of fish from Lake Laengelmavesi in Finland.
data(fish)

# Clean up the variables in this data set.
fish <- fish %>% 
  mutate(Species = as.character(Species)) %>% 
  mutate(name = recode(Species, '1'="Bream", '2'="Whitefish", '3'="Roach", '4'="Parkki", '5'="Smelt", '6'="Pike", '7'="Perch"))

Perch

I will start this assignment by evaluating the relationship of length and weight in perch fish.

# Create a subset of data that represents one taxon.
perch <- fish %>% 
  filter(name == "Perch")

Calculating the natural logs of the length and weight of Perch fish.

df1 <- perch %>% 
  mutate(condition = 100*(Weight/(Length3)^3)) %>% 
  mutate(log_length = log(Length3), log_weight = log(Weight))
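
As background for the log transformation: length-weight relationships are commonly modelled as a power law, which becomes a straight line on the log scale. The symbols a, b and epsilon below are notation introduced here for illustration and are not taken from the assignment itself.

```latex
W = a L^{b} e^{\varepsilon}
\quad\Longrightarrow\quad
\log W = \log a + b \log L + \varepsilon
```

Under this reading, b is the slope obtained by regressing log_weight on log_length and log a is the intercept. A slope of b = 3 corresponds to isometric growth, where weight scales with the cube of length.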

Plotting the linear regression of log weight on log length for perch fish, with weight as the response (the “predicted value”).

df1 %>% 
  ggplot(aes(x = log_length, y = log_weight)) + 
  geom_point() +
  geom_smooth(method="lm", color = 'red') 

Since the line of best fit represents the data well, the graph needs no cleanup or adjustment other than axis labels and a title.

df1 %>%
  ggplot(aes(x = log_length, y = log_weight)) + 
  geom_point() +
  geom_smooth(method="lm", color = 'red') +
  labs(title = "Perch fish from the Laengelmavesi lake, Finland, 2006") +
  ylab("log weight (g)") +
  xlab("log length (cm)")

Below is a summary of the linear regression above, used to obtain the p-value for the weight-length relationship.

lm1 <- lm((log_weight ~ log_length), data = df1)

kable(lm1 %>% 
  summary() %>% 
  tidy(),
  caption = "Regression Statistics for Predicting Weight by Length in Perch")
Regression Statistics for Predicting Weight by Length in Perch
term estimate std.error statistic p.value
(Intercept) -5.078840 0.1521228 -33.38644 0
log_length 3.162688 0.0454162 69.63794 0

The p-value for the slope (displayed as 0 because of rounding) tells us the slope is not zero. There is a significant relationship in which length predicts weight. This allows us to form hypotheses about whether length and weight in Perch fish scale isometrically or allometrically.

H0: The null hypothesis is that the slope, b, is 3. The fish grows isometrically. b = 3

H1: The research hypothesis is that the slope, b, is not equal to 3. The fish grows allometrically. b =/= 3

These hypotheses can be tested using the function below, which gives a 95% confidence interval for the slope.

lm1 %>% 
  confint()
##                 2.5 %    97.5 %
## (Intercept) -5.383828 -4.773852
## log_length   3.071634  3.253742

Based on the above information, in which b falls between 3.07 and 3.25, an interval that excludes 3, we reject our null hypothesis. Perch fish appear to grow allometrically, gaining weight faster than the cube of their length. The ANOVA below tests whether the slope differs from zero, so it confirms that length predicts weight but does not by itself test isometry.
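
Isometric growth corresponds to a slope of 3, so the interval can be cross-checked by hand and paired with a direct t-test of b = 3. The sketch below is only an illustration: it uses the slope estimate and standard error from the perch regression table and the residual degrees of freedom reported for perch (54), and the variable names are my own.

```r
# Values copied from the perch regression output in this report
b_hat <- 3.162688   # estimated slope for log_length
se_b  <- 0.0454162  # standard error of the slope
df    <- 54         # residual degrees of freedom for the perch model
b_iso <- 3          # slope expected under isometric growth

# 95% confidence interval, the same quantity confint() reports
t_crit <- qt(0.975, df)
ci <- c(lower = b_hat - t_crit * se_b, upper = b_hat + t_crit * se_b)

# Two-sided t-test of H0: b = 3
t_stat  <- (b_hat - b_iso) / se_b
p_value <- 2 * pt(-abs(t_stat), df)

print(ci)
print(c(t = t_stat, p = p_value))
```

The interval and the t-test always agree: the p-value is below 0.05 exactly when the 95% interval excludes 3.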

kable(lm1 %>% 
  anova() %>% 
  tidy() %>% 
  select(term, df, statistic, p.value),
  caption = "Regression ANOVA Table")
Regression ANOVA Table
term df statistic p.value
log_length 1 4849.443 0
Residuals 54 NA NA

As expected, the above p-value indicates that log length is a highly significant predictor of log weight. There is strong evidence of a relationship between these variables.
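
In a regression with a single predictor, the ANOVA F statistic is the square of the slope's t statistic, so both tables test the same thing: whether the slope differs from zero. A quick check using the two perch statistics reported above (variable names are my own):

```r
# Values copied from the perch tables above
t_slope <- 69.63794   # t statistic for log_length in the regression table
f_anova <- 4849.443   # F statistic for log_length in the ANOVA table

# With one predictor, F equals t squared (up to rounding in the printed values)
print(t_slope^2)
print(isTRUE(all.equal(t_slope^2, f_anova, tolerance = 1e-6)))
```

This is why the ANOVA cannot tell isometric from allometric growth: that question concerns whether the slope equals 3, not whether it equals 0.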

Bream

The next fish to evaluate are bream. Below is the creation of their data set.

bream <- fish %>% 
  filter(name == "Bream")

Following the same steps as before, I will take the natural logs of the fish lengths and weights and plot them on the x- and y-axes, respectively.

df2 <- bream %>% 
  mutate(condition = 100*(Weight/(Length3)^3)) %>% 
  mutate(log_length = log(Length3), log_weight = log(Weight))
df2 %>%
  ggplot(aes(x = log_length, y = log_weight)) + 
  geom_point() +
  geom_smooth(method="lm", color = 'red') +
  labs(title = "Bream fish from the Laengelmavesi lake, Finland, 2006") +
  ylab("log weight (g)") +
  xlab("log length (cm)")
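
Because the same steps are repeated for every species, they could be wrapped in a small helper. The function below is a hypothetical sketch, not part of the original assignment: fit_length_weight() is a name I made up, it uses base R only, and it is demonstrated on simulated data rather than the real fish measurements.

```r
# Hypothetical helper (not part of the original assignment): fits
# log(Weight) ~ log(Length3) and returns the slope with its 95% CI.
fit_length_weight <- function(data) {
  # log() needs positive, non-missing weights
  data <- data[!is.na(data$Weight) & data$Weight > 0, ]
  model <- lm(log(Weight) ~ log(Length3), data = data)
  ci <- confint(model)["log(Length3)", ]
  c(slope = unname(coef(model)["log(Length3)"]),
    lower = unname(ci[1]),
    upper = unname(ci[2]))
}

# Simulated data for illustration only (true slope set to 3); not the real fish data
set.seed(1)
sim <- data.frame(Length3 = runif(50, 10, 40))
sim$Weight <- exp(-5 + 3 * log(sim$Length3) + rnorm(50, sd = 0.05))

result <- fit_length_weight(sim)
print(result)
```

With the real data, a call such as fit_length_weight(bream) would presumably return the same slope and interval as the step-by-step code in this section.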

Regression statistics below.

lm1 <- lm((log_weight ~ log_length), data = df2)

kable(lm1 %>% 
  summary() %>% 
  tidy(),
  caption = "Regression Statistics for Predicting Weight by Length in Bream")
Regression Statistics for Predicting Weight by Length in Bream
term estimate std.error statistic p.value
(Intercept) -4.941734 0.5239468 -9.431748 0
log_length 3.109276 0.1438076 21.621076 0

Once again, the p-value for the slope is significant. Therefore, for bream fish, the hypotheses are represented as:

H0: The null hypothesis is that the slope, b, is 3. The fish grows isometrically. b = 3

H1: The research hypothesis is that the slope, b, is not equal to 3. The fish grows allometrically. b =/= 3

These hypotheses can be tested using the function below, which gives a 95% confidence interval for the slope.

lm1 %>% 
  confint()
##                 2.5 %    97.5 %
## (Intercept) -6.008979 -3.874490
## log_length   2.816349  3.402202

Based on the above information, in which b falls between 2.82 and 3.40, an interval that includes 3, we fail to reject our null hypothesis. The bream data are consistent with isometric growth. The ANOVA below tests only whether the slope differs from zero.

kable(lm1 %>% 
  anova() %>% 
  tidy() %>% 
  select(term, df, statistic, p.value),
  caption = "Regression ANOVA Table")
Regression ANOVA Table
term df statistic p.value
log_length 1 467.4709 0
Residuals 32 NA NA

The p-value is approximately 0, so the regression is highly significant.

Whitefish

whitefish <- fish %>% 
  filter(name == "Whitefish")

Following the same steps as before, I will take the natural logs of the fish lengths and weights and plot them on the x- and y-axes, respectively.

df3 <- whitefish %>% 
  mutate(condition = 100*(Weight/(Length3)^3)) %>% 
  mutate(log_length = log(Length3), log_weight = log(Weight))
df3 %>%
  ggplot(aes(x = log_length, y = log_weight)) + 
  geom_point() +
  geom_smooth(method="lm", color = 'red') +
  labs(title = "Whitefish from the Laengelmavesi lake, Finland, 2006") +
  ylab("log weight (g)") +
  xlab("log length (cm)")

Regression statistics below.

lm1 <- lm((log_weight ~ log_length), data = df3)

kable(lm1 %>% 
  summary() %>% 
  tidy(),
  caption = "Regression Statistics for Predicting Weight by Length in Whitefish")
Regression Statistics for Predicting Weight by Length in Whitefish
term estimate std.error statistic p.value
(Intercept) -5.704447 0.8939633 -6.381075 0.0030947
log_length 3.360039 0.2534814 13.255564 0.0001872

Once again, the p-value for the slope is significant. Therefore, for whitefish, the hypotheses are represented as:

H0: The null hypothesis is that the slope, b, is 3. The fish grows isometrically. b = 3

H1: The research hypothesis is that the slope, b, is not equal to 3. The fish grows allometrically. b =/= 3

These hypotheses can be tested using the function below, which gives a 95% confidence interval for the slope.

lm1 %>% 
  confint()
##                 2.5 %    97.5 %
## (Intercept) -8.186488 -3.222407
## log_length   2.656262  4.063816

Based on the above information, in which b falls between 2.66 and 4.06, an interval that includes 3, we fail to reject our null hypothesis. The whitefish data are consistent with isometric growth, although with only 4 residual degrees of freedom the interval is wide. The ANOVA below tests only whether the slope differs from zero.

kable(lm1 %>% 
  anova() %>% 
  tidy() %>% 
  select(term, df, statistic, p.value),
  caption = "Regression ANOVA Table")
Regression ANOVA Table
term df statistic p.value
log_length 1 175.71 0.0001872
Residuals 4 NA NA

P value = 0.00019. Highly significant.

Roach, unsuccessful

roach <- fish %>% 
  filter(name == "Roach")

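Following the same steps as before, I will take the natural logs of the fish lengths and weights.

df4 <- roach %>% 
  filter(Weight > 0) %>%   # drop the zero-weight fish first; log(0) is -Inf
  mutate(condition = 100*(Weight/(Length3)^3)) %>% 
  mutate(log_length = log(Length3), log_weight = log(Weight))

The roach data include a fish with a recorded weight of 0, and log(0) is -Inf, so that row must be dropped before the logs are taken. For the filter to run, it has to be a step in the pipeline, connected with %>%; a filter() call left on its own line after the last mutate() is not applied to df4. The roach regression is not included here. I’m going to move on to Parkki and hopefully address roach with you later this week.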
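
The effect of a zero weight can be seen on a toy example. The data frame below is made up for illustration and does not contain real roach measurements; base R subsetting stands in for the dplyr step, where the equivalent would be filter(Weight > 0) placed in the pipeline before mutate().

```r
# Toy data for illustration only; not the real roach measurements
toy <- data.frame(Length3 = c(20.0, 22.5, 24.0, 26.0),
                  Weight  = c(0, 120, 150, 190))

# The first value is -Inf because log(0) is -Inf
print(log(toy$Weight))

# Keep only positive weights, then take logs
toy_clean <- toy[toy$Weight > 0, ]
toy_clean$log_length <- log(toy_clean$Length3)
toy_clean$log_weight <- log(toy_clean$Weight)

print(nrow(toy_clean))
print(all(is.finite(toy_clean$log_weight)))
```

Filtering first matters because lm() cannot use a response value of -Inf.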

Parkki

parkki <- fish %>%
  filter(name == "Parkki")

Following the same steps as before, I will take the natural logs of the fish lengths and weights and plot them on the x- and y-axes, respectively.

df5 <- parkki %>% 
  mutate(condition = 100*(Weight/(Length3)^3)) %>% 
  mutate(log_length = log(Length3), log_weight = log(Weight))
df5 %>%
  ggplot(aes(x = log_length, y = log_weight)) + 
  geom_point() +
  geom_smooth(method="lm", color = 'red') +
  labs(title = "Parkki from the Laengelmavesi lake, Finland, 2006") +
  ylab("log weight (g)") +
  xlab("log length (cm)")

Regression statistics below.

lm1 <- lm((log_weight ~ log_length), data = df5)

kable(lm1 %>% 
  summary() %>% 
  tidy(),
  caption = "Regression Statistics for Predicting Weight by Length in Parkki")
Regression Statistics for Predicting Weight by Length in Parkki
term estimate std.error statistic p.value
(Intercept) -4.527962 0.4554613 -9.941486 3.8e-06
log_length 3.034233 0.1461306 20.763848 0.0e+00

Once again, the p-value for the slope is significant. Therefore, for parkki fish, the hypotheses are represented as:

H0: The null hypothesis is that the slope, b, is 3. The fish grows isometrically. b = 3

H1: The research hypothesis is that the slope, b, is not equal to 3. The fish grows allometrically. b =/= 3

These hypotheses can be tested using the function below, which gives a 95% confidence interval for the slope.

lm1 %>% 
  confint()
##                 2.5 %    97.5 %
## (Intercept) -5.558287 -3.497637
## log_length   2.703663  3.364803

Based on the above information, in which b falls between 2.70 and 3.36, an interval that includes 3, we fail to reject our null hypothesis. The parkki data are consistent with isometric growth. The ANOVA below tests only whether the slope differs from zero.

kable(lm1 %>% 
  anova() %>% 
  tidy() %>% 
  select(term, df, statistic, p.value),
  caption = "Regression ANOVA Table")
Regression ANOVA Table
term df statistic p.value
log_length 1 431.1374 0
Residuals 9 NA NA

The p-value is approximately 0, so the regression is highly significant.

Smelt

Below is the information for Smelt.

smelt <- fish %>%
  filter(name == "Smelt")

Following the same steps as before, I will take the natural logs of the fish lengths and weights and plot them on the x- and y-axes, respectively.

df6 <- smelt %>% 
  mutate(condition = 100*(Weight/(Length3)^3)) %>% 
  mutate(log_length = log(Length3), log_weight = log(Weight))
df6 %>%
  ggplot(aes(x = log_length, y = log_weight)) + 
  geom_point() +
  geom_smooth(method="lm", color = 'red') +
  labs(title = "Smelt from the Laengelmavesi lake, Finland, 2006") +
  ylab("log weight (g)") +
  xlab("log length (cm)")

Regression statistics below.

lm1 <- lm((log_weight ~ log_length), data = df6)

kable(lm1 %>% 
  summary() %>% 
  tidy(),
  caption = "Regression Statistics for Predicting Weight by Length in Smelt")
Regression Statistics for Predicting Weight by Length in Smelt
term estimate std.error statistic p.value
(Intercept) -5.268323 0.6769670 -7.782245 5e-06
log_length 2.976795 0.2639889 11.276215 1e-07

Once again, the p-value for the slope is significant. Therefore, for smelt fish, the hypotheses are represented as:

H0: The null hypothesis is that the slope, b, is 3. The fish grows isometrically. b = 3

H1: The research hypothesis is that the slope, b, is not equal to 3. The fish grows allometrically. b =/= 3

These hypotheses can be tested using the function below, which gives a 95% confidence interval for the slope.

lm1 %>% 
  confint()
##                 2.5 %    97.5 %
## (Intercept) -6.743308 -3.793339
## log_length   2.401613  3.551978

Based on the above information, in which b falls between 2.40 and 3.55, an interval that includes 3, we fail to reject our null hypothesis. The smelt data are consistent with isometric growth. The ANOVA below tests only whether the slope differs from zero.

kable(lm1 %>% 
  anova() %>% 
  tidy() %>% 
  select(term, df, statistic, p.value),
  caption = "Regression ANOVA Table")
Regression ANOVA Table
term df statistic p.value
log_length 1 127.153 1e-07
Residuals 12 NA NA

The p-value is approximately 0, so the regression is highly significant.

Pike

The last fish to examine is Pike.

pike <- fish %>%
  filter(name == "Pike")

Following the same steps as before, I will take the natural logs of the fish lengths and weights and plot them on the x- and y-axes, respectively.

# Stored as df7 so that the smelt data in df6 are not overwritten.
df7 <- pike %>% 
  mutate(condition = 100*(Weight/(Length3)^3)) %>% 
  mutate(log_length = log(Length3), log_weight = log(Weight))
df7 %>%
  ggplot(aes(x = log_length, y = log_weight)) + 
  geom_point() +
  geom_smooth(method="lm", color = 'red') +
  labs(title = "Pike from the Laengelmavesi lake, Finland, 2006") +
  ylab("log weight (g)") +
  xlab("log length (cm)")

Regression statistics below.

lm1 <- lm((log_weight ~ log_length), data = df7)

kable(lm1 %>% 
  summary() %>% 
  tidy(),
  caption = "Regression Statistics for Predicting Weight by Length in Pike")
Regression Statistics for Predicting Weight by Length in Pike
term estimate std.error statistic p.value
(Intercept) -6.006614 0.4577910 -13.12087 0
log_length 3.201095 0.1182519 27.07014 0

Once again, the p-value for the slope is significant. Therefore, for pike fish, the hypotheses are represented as:

H0: The null hypothesis is that the slope, b, is 3. The fish grows isometrically. b = 3

H1: The research hypothesis is that the slope, b, is not equal to 3. The fish grows allometrically. b =/= 3

These hypotheses can be tested using the function below, which gives a 95% confidence interval for the slope.

lm1 %>% 
  confint()
##                 2.5 %    97.5 %
## (Intercept) -6.982373 -5.030856
## log_length   2.949047  3.453143

Based on the above information, in which b falls between 2.95 and 3.45, an interval that includes 3, we fail to reject our null hypothesis. The pike data are consistent with isometric growth. The ANOVA below tests only whether the slope differs from zero.

kable(lm1 %>% 
  anova() %>% 
  tidy() %>% 
  select(term, df, statistic, p.value),
  caption = "Regression ANOVA Table")
Regression ANOVA Table
term df statistic p.value
log_length 1 732.7925 0
Residuals 15 NA NA

The p-value is approximately 0, so the regression is highly significant.

Results

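According to the results presented above, only perch show clear allometric growth: their 95% confidence interval for the slope (3.07 to 3.25) lies entirely above the isometric value of 3. For bream, whitefish, parkki, smelt, and pike, the intervals include 3, so isometric growth cannot be rejected for these populations of lake Laengelmavesi, Finland. Roach were not analyzed.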
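
One way to cross-check the conclusion is to line up the slope intervals and test each against the isometric value of 3. The sketch below copies the 95% limits for log_length from the confint() output of each species; the data frame and column names are my own, and roach is absent because no model was fitted for it.

```r
# 95% confidence limits for the slope, copied from the confint() output above
slopes <- data.frame(
  species = c("Perch", "Bream", "Whitefish", "Parkki", "Smelt", "Pike"),
  lower   = c(3.071634, 2.816349, 2.656262, 2.703663, 2.401613, 2.949047),
  upper   = c(3.253742, 3.402202, 4.063816, 3.364803, 3.551978, 3.453143)
)

# TRUE when the interval contains the isometric slope of 3
slopes$includes_3 <- slopes$lower <= 3 & slopes$upper >= 3
print(slopes)
```

A species whose interval includes 3 is one for which isometric growth cannot be rejected at the 5% level.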