setwd("C:\\Users\\Jack\\Documents\\AB\\PVA MS Sturgeon 2019")
dat <- read.csv("sturgeon_length_data.csv")
head(dat)
How many rows are there?
nrow(dat)
[1] 6683
Weight was imported as a factor, but it should be numeric.
dat$weight <- as.numeric(as.character(dat$weight))
NAs introduced by coercion
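In the Excel sheet, NA’s were entered as the character “.”, which is why weight was not read as numeric. The conversion turns each “.” into NA and raises the coercion warning above, which is the intended result.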
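An alternative is to declare the placeholder when the file is read, so the column arrives as numeric and no coercion step is needed. The following is a minimal sketch on a made-up three-row CSV (the text and values are hypothetical, not the sturgeon data); it assumes "." is the only missing-value marker in the file.

```r
# Hypothetical three-row stand-in for the CSV; "." marks a missing weight.
csv_text <- "species,weight
PALLID,1.25
SHOVELNOSE,.
PALLID,0.80"

# na.strings = "." turns the placeholder into NA while the file is parsed,
# so weight is read as numeric and no coercion step (or warning) is needed.
toy <- read.csv(text = csv_text, na.strings = ".")

str(toy$weight)         # numeric vector with one NA
sum(is.na(toy$weight))  # number of missing weights
```

With the real file, the same argument would be passed alongside the file name in the existing read.csv() call.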
How many NA’s are there for weight?
sum(is.na(dat$weight))
[1] 4628
Check this sum against the original file sum of 4459.28198.
sum(dat$weight, na.rm = TRUE)
[1] 4459.282
The sum matches.
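The comparison above is done by eye, and the console shows only 7 significant digits by default. A small sketch of doing the same check in code, using made-up weights and a hypothetical expected total rather than the real data:

```r
# Toy weights standing in for dat$weight (made-up values).
w <- c(1.25, NA, 0.80, 2.40)
expected <- 4.45  # hypothetical total copied from the spreadsheet

# all.equal() compares with a small numeric tolerance instead of exact equality.
isTRUE(all.equal(sum(w, na.rm = TRUE), expected))

# The default of 7 significant digits is why 4459.28198 prints as 4459.282;
# ask for more digits to see the remaining decimals.
print(4459.28198)
print(4459.28198, digits = 10)
```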
The data are from an Excel spreadsheet, where dates are stored as day counts from an origin of December 30, 1899. The date column needs to be converted to R dates using that origin.
dat$date <- as.Date(dat$date, origin = "1899-12-30")
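The Excel origin can be checked on serial numbers whose dates are known. The sketch below uses two hypothetical serial values, not values from the data set, and assumes the date column holds plain day counts; the last line shows the alternative if the column holds text dates instead.

```r
# Hypothetical Excel serial day numbers, as they would appear in a CSV
# exported from a sheet whose date cells were stored as plain numbers.
serial <- c(43466, 43831)

# Counting days from 1899-12-30 reproduces the dates Excel displays.
as.Date(serial, origin = "1899-12-30")
# → "2019-01-01" "2020-01-01"

# If the column instead holds text such as "6/15/2019", parse it with a format.
as.Date("6/15/2019", format = "%m/%d/%Y")
```

Using origin = "1970-01-01" on the same serial numbers would shift every date forward by about 70 years, so the head(dat) check is worth reading closely.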
Now check the data set again.
head(dat)
dat2 <- aggregate(dat$numberCaught, by = list(species = dat$species, riverMile = dat$riverMile), FUN = sum)
head(dat2)
plot(dat2$riverMile[dat2$species == "PALLID"], dat2$x[dat2$species == "PALLID"])
fitPallid <- lm(dat2$x[dat2$species == "PALLID"] ~ dat2$riverMile[dat2$species == "PALLID"])
summary(fitPallid)
Call:
lm(formula = dat2$x[dat2$species == "PALLID"] ~ dat2$riverMile[dat2$species ==
    "PALLID"])
Residuals:
    Min      1Q  Median      3Q     Max
-2.3616 -1.6425 -0.5549  0.4432 13.6356
Coefficients:
                                          Estimate Std. Error t value Pr(>|t|)
(Intercept)                              2.4116642  0.5929325   4.067  0.00011 ***
dat2$riverMile[dat2$species == "PALLID"] 0.0008329  0.0008900   0.936  0.35214
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.499 on 81 degrees of freedom
Multiple R-squared:  0.0107,  Adjusted R-squared:  -0.001517
F-statistic: 0.8758 on 1 and 81 DF,  p-value: 0.3521
There is no significant trend in catch with river mile for pallid sturgeon (slope p = 0.35).
plot(dat2$riverMile[dat2$species == "SHOVELNOSE"], dat2$x[dat2$species == "SHOVELNOSE"])
fitShovel <- lm(dat2$x[dat2$species == "SHOVELNOSE"] ~ dat2$riverMile[dat2$species == "SHOVELNOSE"])
summary(fitShovel)
Call:
lm(formula = dat2$x[dat2$species == "SHOVELNOSE"] ~ dat2$riverMile[dat2$species ==
    "SHOVELNOSE"])
Residuals:
    Min      1Q  Median      3Q     Max
 -72.36  -31.21  -12.70    5.48 1079.78
Coefficients:
                                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                                  -14.02203   18.78971  -0.746  0.45652
dat2$riverMile[dat2$species == "SHOVELNOSE"]   0.07892    0.02663   2.964  0.00347 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 101.5 on 173 degrees of freedom
Multiple R-squared:  0.04832,  Adjusted R-squared:  0.04282
F-statistic: 8.783 on 1 and 173 DF,  p-value: 0.003468
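There is a significant positive trend for shovelnose sturgeon (slope = 0.079 per river mile, p = 0.003), but the model explains less than 5% of the variance, and the trend might only be due to the two extremely large observations far upstream.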
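One way to probe whether a few large observations drive a trend is to refit the model without them and compare slopes. The sketch below does this on simulated data, not the sturgeon data: the data frame toy, the planted values, and the object names fit_all and fit_trim are all hypothetical. It also uses the formula, data, and subset arguments of lm(), which give shorter calls and a coefficient simply named riverMile.

```r
# Simulated stand-in for dat2 (NOT the sturgeon data): catch totals by river
# mile, with two very large values planted at the upstream end.
set.seed(1)
toy <- data.frame(
  species   = "SHOVELNOSE",
  riverMile = seq(0, 1000, length.out = 50),
  x         = rpois(50, lambda = 20)
)
toy$x[49:50] <- c(900, 1100)

# Formula + data + subset keeps the call short and the coefficient name readable.
fit_all <- lm(x ~ riverMile, data = toy, subset = species == "SHOVELNOSE")

# Drop the two largest catches and refit to see how much the slope depends on them.
top2 <- order(toy$x, decreasing = TRUE)[1:2]
fit_trim <- lm(x ~ riverMile, data = toy[-top2, ], subset = species == "SHOVELNOSE")

# Compare slope estimates and p-values side by side.
rbind(all     = summary(fit_all)$coefficients["riverMile", ],
      trimmed = summary(fit_trim)$coefficients["riverMile", ])

# Cook's distance flags influential points without removing anything.
head(sort(cooks.distance(fit_all), decreasing = TRUE), 3)
```

If the slope and its p-value change sharply once the two points are dropped, the trend rests on those points; if they stay similar, it does not. Dropping points here is a diagnostic only, not a reason to exclude real observations.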