theURL<-"https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/COUNT/fishing.csv"
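# Read the CSV; stringsAsFactors=TRUE keeps 'period' a factor (the default before R 4.0.0),
# so summary() reports a count per period as in the output below
fishingData<-read.csv(file=theURL, stringsAsFactors=TRUE)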
head(fishingData)
## X site totabund density meandepth year period sweptarea
## 1 1 1 76 0.002070281 804 1978 1977-1989 36710.00
## 2 2 2 161 0.003519799 808 2001 2000-2002 45741.25
## 3 3 3 39 0.000980515 809 2001 2000-2002 39775.00
## 4 4 4 410 0.008039216 848 1979 1977-1989 51000.00
## 5 5 5 177 0.005933375 853 2002 2000-2002 29831.25
## 6 6 6 695 0.021800501 960 1980 1977-1989 31880.00
summary(fishingData)
## X site totabund density
## Min. : 1.0 Min. : 1.0 Min. : 2.0 Min. :0.0000148
## 1st Qu.: 37.5 1st Qu.: 37.5 1st Qu.: 55.5 1st Qu.:0.0007757
## Median : 74.0 Median : 74.0 Median : 155.0 Median :0.0026311
## Mean : 74.0 Mean : 74.2 Mean : 216.1 Mean :0.0047137
## 3rd Qu.:110.5 3rd Qu.:110.5 3rd Qu.: 283.5 3rd Qu.:0.0064955
## Max. :147.0 Max. :148.0 Max. :1230.0 Max. :0.0309239
## meandepth year period sweptarea
## Min. : 804 Min. :1977 1977-1989:97 Min. : 7970
## 1st Qu.:1410 1st Qu.:1980 2000-2002:50 1st Qu.: 41400
## Median :1993 Median :1982 Median : 54420
## Mean :2413 Mean :1988 Mean : 64800
## 3rd Qu.:3312 3rd Qu.:2001 3rd Qu.: 79810
## Max. :4865 Max. :2002 Max. :223440
nrows<-nrow(fishingData)
ncolumns<-ncol(fishingData)
dfDimensions<-data.frame(nrows, ncolumns)
dfDimensions
## nrows ncolumns
## 1 147 8
The means and medians of density and meandepth:
means<-sapply(fishingData[, c("density", "meandepth")], mean)
medians<-sapply(fishingData[, c("density", "meandepth")], median)
means_mediansDF<-data.frame(cbind(means, medians))
means_mediansDF
## means medians
## density 4.713711e-03 2.631079e-03
## meandepth 2.413088e+03 1.993000e+03
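The two sapply() calls above can be folded into a single pass that returns both statistics for each column. This is a minimal sketch that uses only the six rows printed by head() (copied into a hypothetical sample_rows data frame), so its numbers differ from the full-data table:

```r
# Six rows copied from the head(fishingData) output above (not the full dataset)
sample_rows <- data.frame(
  density   = c(0.002070281, 0.003519799, 0.000980515,
                0.008039216, 0.005933375, 0.021800501),
  meandepth = c(804, 808, 809, 848, 853, 960)
)

# Each column yields a named c(means, medians) vector, so sapply() returns a
# 2 x ncol matrix; t() flips it to one row per variable
stats <- t(sapply(sample_rows, function(v) c(means = mean(v), medians = median(v))))
stats_df <- as.data.frame(stats)
stats_df
```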
The correlation between density and meandepth:
cor(fishingData$density, fishingData$meandepth)
## [1] -0.5823638
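By default cor() returns Pearson's correlation coefficient, so the -0.58 above means density tends to be lower at sites with greater mean depth. As a sketch of what that number is, the coefficient can be computed by hand and checked against cor(). The vectors below are the six rows printed by head(), so their r will not match the full-data value; pearson_r is a hypothetical helper name:

```r
# density and meandepth for the six rows printed by head(fishingData)
dens  <- c(0.002070281, 0.003519799, 0.000980515,
           0.008039216, 0.005933375, 0.021800501)
depth <- c(804, 808, 809, 848, 853, 960)

# Pearson's r: sum of cross-products of deviations from the means, divided by
# the square root of the product of the sums of squared deviations
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual  <- pearson_r(dens, depth)
r_builtin <- cor(dens, depth)  # method = "pearson" is the default
all.equal(r_manual, r_builtin)
```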
2. Data wrangling: Please perform some basic transformations. They need to make sense, but could include renaming columns, creating a subset of the data, replacing values, or creating new columns with derived data (for example, if it makes sense, summing two columns together).
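One derived column that makes sense for this dataset: in the six rows printed by head(), density equals totabund divided by sweptarea (for example, 76 / 36710 is about 0.00207). This is a sketch on those six rows only; that the relationship holds for all 147 rows is an assumption to check on the full data, and fish_per_area is a hypothetical column name:

```r
# Six rows copied from the head(fishingData) output
sample_rows <- data.frame(
  totabund  = c(76, 161, 39, 410, 177, 695),
  sweptarea = c(36710.00, 45741.25, 39775.00, 51000.00, 29831.25, 31880.00),
  density   = c(0.002070281, 0.003519799, 0.000980515,
                0.008039216, 0.005933375, 0.021800501)
)

# New column derived from two existing ones: fish counted per unit of swept area
sample_rows$fish_per_area <- sample_rows$totabund / sample_rows$sweptarea

# Compare with the dataset's own density column; the printed values are rounded,
# so allow a small relative tolerance
all.equal(sample_rows$fish_per_area, sample_rows$density, tolerance = 1e-6)
```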
head(fishingData)
## X site totabund density meandepth year period sweptarea
## 1 1 1 76 0.002070281 804 1978 1977-1989 36710.00
## 2 2 2 161 0.003519799 808 2001 2000-2002 45741.25
## 3 3 3 39 0.000980515 809 2001 2000-2002 39775.00
## 4 4 4 410 0.008039216 848 1979 1977-1989 51000.00
## 5 5 5 177 0.005933375 853 2002 2000-2002 29831.25
## 6 6 6 695 0.021800501 960 1980 1977-1989 31880.00
# Subset: keep every column except site and period
newfishingData<-fishingData[,c("X", "totabund","density", "meandepth", "year", "sweptarea")]
head(newfishingData)
## X totabund density meandepth year sweptarea
## 1 1 76 0.002070281 804 1978 36710.00
## 2 2 161 0.003519799 808 2001 45741.25
## 3 3 39 0.000980515 809 2001 39775.00
## 4 4 410 0.008039216 848 1979 51000.00
## 5 5 177 0.005933375 853 2002 29831.25
## 6 6 695 0.021800501 960 1980 31880.00
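The same subset can also be written by naming the columns to drop instead of the ones to keep, using base R's subset() with a negative select. A sketch on a two-row stand-in data frame built from the head() output (rows, kept, and dropped are hypothetical names):

```r
# Two rows copied from head(fishingData), all eight original columns
rows <- data.frame(
  X = c(1, 2), site = c(1, 2), totabund = c(76, 161),
  density = c(0.002070281, 0.003519799), meandepth = c(804, 808),
  year = c(1978, 2001), period = c("1977-1989", "2000-2002"),
  sweptarea = c(36710.00, 45741.25)
)

# Keep-list version (as above) versus drop-list version
kept    <- rows[, c("X", "totabund", "density", "meandepth", "year", "sweptarea")]
dropped <- subset(rows, select = -c(site, period))

identical(names(kept), names(dropped))
```

The drop-list form is shorter when only a few columns are removed, and it keeps working if new columns are later added to the source data.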
# Rename columns to more descriptive names (X is just a row index, so call it ID)
names(newfishingData)[names(newfishingData)=="X"]<-"ID"
names(newfishingData)[names(newfishingData)=="totabund"]<-"Total_Fish_Per_Site"
names(newfishingData)[names(newfishingData)=="density"]<-"Density"
names(newfishingData)[names(newfishingData)=="meandepth"]<-"MeanDepth"
names(newfishingData)[names(newfishingData)=="year"]<-"Year"
names(newfishingData)[names(newfishingData)=="sweptarea"]<-"SweptArea"
head(newfishingData)
## ID Total_Fish_Per_Site Density MeanDepth Year SweptArea
## 1 1 76 0.002070281 804 1978 36710.00
## 2 2 161 0.003519799 808 2001 45741.25
## 3 3 39 0.000980515 809 2001 39775.00
## 4 4 410 0.008039216 848 1979 51000.00
## 5 5 177 0.005933375 853 2002 29831.25
## 6 6 695 0.021800501 960 1980 31880.00
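The rename statements above can be collapsed into one lookup vector whose names are the old column names and whose values are the new ones. A sketch on a two-row stand-in for the subset; sub_df, new_names, and to_rename are hypothetical names:

```r
# Two rows with the six columns kept by the subset step
sub_df <- data.frame(
  X = c(1, 2), totabund = c(76, 161),
  density = c(0.002070281, 0.003519799), meandepth = c(804, 808),
  year = c(1978, 2001), sweptarea = c(36710.00, 45741.25)
)

# Lookup vector: old name -> new name
new_names <- c(X = "ID", totabund = "Total_Fish_Per_Site", density = "Density",
               meandepth = "MeanDepth", year = "Year", sweptarea = "SweptArea")

# Rename only the columns that appear in the lookup; any others stay unchanged
to_rename <- names(sub_df) %in% names(new_names)
names(sub_df)[to_rename] <- unname(new_names[names(sub_df)[to_rename]])
names(sub_df)
```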
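# Distribution of total fish per site
boxplot(newfishingData$Total_Fish_Per_Site, main="Total Fish per Site Boxplot", ylab="Total Fish per Site")
hist(newfishingData$Total_Fish_Per_Site, main="Total Fish per Site Histogram", xlab="Total Fish per Site")
# Trend over time, with a least-squares regression line
plot(Total_Fish_Per_Site ~ Year, data=newfishingData, main="Total Fish per Site Plotted Against Year")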
abline(lm(newfishingData$Total_Fish_Per_Site ~ newfishingData$Year))
summary(lm(newfishingData$Total_Fish_Per_Site ~ newfishingData$Year))
##
## Call:
## lm(formula = newfishingData$Total_Fish_Per_Site ~ newfishingData$Year)
##
## Residuals:
## Min 1Q Median 3Q Max
## -256.41 -139.17 -51.17 86.16 1080.49
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9484.198 3584.590 2.646 0.00905 **
## newfishingData$Year -4.663 1.803 -2.586 0.01071 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 216.4 on 145 degrees of freedom
## Multiple R-squared: 0.04407, Adjusted R-squared: 0.03748
## F-statistic: 6.685 on 1 and 145 DF, p-value: 0.01071
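With a single predictor, the least-squares slope that lm() reports works out to cov(x, y) / var(x), and the intercept places the line through the point of means. A sketch checking that against coef(lm()) on the six rows printed by head(), so the coefficients differ from the full-data fit above:

```r
# Year and total fish per site for the six rows printed by head()
year  <- c(1978, 2001, 2001, 1979, 2002, 1980)
total <- c(76, 161, 39, 410, 177, 695)

# Closed-form least-squares estimates for one predictor
slope     <- cov(year, total) / var(year)
intercept <- mean(total) - slope * mean(year)

# Same model through lm(), using the data= argument instead of df$column terms
fit <- lm(total ~ year, data = data.frame(year, total))
all.equal(unname(coef(fit)), c(intercept, slope))
```

Passing data= also keeps the coefficient labelled plainly as year rather than as a df$column expression.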
Question: Is there a relationship between commercial fishing beginning in locations with deeper water than in previous years and changes in certain deep-sea fish populations?
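Based on our scatterplot and regression, certain deep-sea fish populations appear to have been affected when commercial fishing began in locations with deeper water. Total fish per site decreases over the years: the fitted slope is about -4.66 fish per site per year (p = 0.011). The effect is modest, however, since year explains only about 4% of the variation in total fish per site (R-squared = 0.044). I believe that some deep-sea fish species move up into the commercial fishing zones and get caught, which reduces their population over the years.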
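Because the years fall into two sampling periods (1977-1989 and 2000-2002, per the period column), the decline can also be checked by comparing the mean total fish per site between periods. A sketch using tapply() on the six rows printed by head(); applying it to the full data with fishingData$totabund and fishingData$period is an assumption about how you would extend it, and period_means is a hypothetical name:

```r
# Total fish per site and sampling period for the six rows printed by head()
total  <- c(76, 161, 39, 410, 177, 695)
period <- factor(c("1977-1989", "2000-2002", "2000-2002",
                   "1977-1989", "2000-2002", "1977-1989"))

# Mean total fish per site within each period
period_means <- tapply(total, period, mean)
period_means
```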