1. Data Exploration: This should include summary statistics, means, medians, quartiles, or any other relevant information about the data set. Please include some conclusions in the R Markdown text.
theURL<-"https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/COUNT/fishing.csv"
fishingData<-read.table(file=theURL, header= TRUE, sep=",")
head(fishingData)
##   X site totabund     density meandepth year    period sweptarea
## 1 1    1       76 0.002070281       804 1978 1977-1989  36710.00
## 2 2    2      161 0.003519799       808 2001 2000-2002  45741.25
## 3 3    3       39 0.000980515       809 2001 2000-2002  39775.00
## 4 4    4      410 0.008039216       848 1979 1977-1989  51000.00
## 5 5    5      177 0.005933375       853 2002 2000-2002  29831.25
## 6 6    6      695 0.021800501       960 1980 1977-1989  31880.00
summary(fishingData)
##        X              site          totabund         density         
##  Min.   :  1.0   Min.   :  1.0   Min.   :   2.0   Min.   :0.0000148  
##  1st Qu.: 37.5   1st Qu.: 37.5   1st Qu.:  55.5   1st Qu.:0.0007757  
##  Median : 74.0   Median : 74.0   Median : 155.0   Median :0.0026311  
##  Mean   : 74.0   Mean   : 74.2   Mean   : 216.1   Mean   :0.0047137  
##  3rd Qu.:110.5   3rd Qu.:110.5   3rd Qu.: 283.5   3rd Qu.:0.0064955  
##  Max.   :147.0   Max.   :148.0   Max.   :1230.0   Max.   :0.0309239  
##    meandepth         year            period     sweptarea     
##  Min.   : 804   Min.   :1977   1977-1989:97   Min.   :  7970  
##  1st Qu.:1410   1st Qu.:1980   2000-2002:50   1st Qu.: 41400  
##  Median :1993   Median :1982                  Median : 54420  
##  Mean   :2413   Mean   :1988                  Mean   : 64800  
##  3rd Qu.:3312   3rd Qu.:2001                  3rd Qu.: 79810  
##  Max.   :4865   Max.   :2002                  Max.   :223440
nrows<-nrow(fishingData)
ncolumns<-ncol(fishingData)
dfDimensions<-data.frame(nrows, ncolumns)
dfDimensions
##   nrows ncolumns
## 1   147        8
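`summary()` already reports quartiles, means and medians. A compact table that also shows the spread of every numeric column, and flags whether each mean sits above its median, can make the written conclusions easier. The chunk below is a minimal sketch under stated assumptions. `describeNumeric` and `fishingHead` are hypothetical names. The data frame rebuilds only the first six rows printed by `head(fishingData)` so the chunk runs on its own. In the notebook you would call `describeNumeric(fishingData)` instead.

```r
# Hypothetical stand-in: the first six rows printed by head(fishingData).
fishingHead <- data.frame(
  totabund  = c(76, 161, 39, 410, 177, 695),
  density   = c(0.002070281, 0.003519799, 0.000980515,
                0.008039216, 0.005933375, 0.021800501),
  meandepth = c(804, 808, 809, 848, 853, 960),
  sweptarea = c(36710.00, 45741.25, 39775.00, 51000.00, 29831.25, 31880.00)
)

# Hypothetical helper: one row per numeric column, one column per statistic.
describeNumeric <- function(df) {
  numericCols <- df[, sapply(df, is.numeric), drop = FALSE]
  stats <- t(sapply(numericCols, function(x) {
    q <- quantile(x, c(0.25, 0.75), names = FALSE)
    c(mean = mean(x), median = median(x), q1 = q[1], q3 = q[2], sd = sd(x))
  }))
  # A mean above the median is a rough sign of a right-skewed distribution.
  data.frame(stats, rightSkewed = stats[, "mean"] > stats[, "median"])
}

describeNumeric(fishingHead)
```

Non-numeric columns such as `period` are dropped automatically, so the helper can be passed the whole data frame.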

The means and medians of density and mean depth:

means<-sapply(fishingData[, c("density", "meandepth")], mean)
medians<-sapply(fishingData[, c("density", "meandepth")], median)
means_mediansDF<-data.frame(cbind(means, medians))
means_mediansDF
##                  means      medians
## density   4.713711e-03 2.631079e-03
## meandepth 2.413088e+03 1.993000e+03

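For both variables the mean is larger than the median. This suggests right-skewed distributions, in which a few very dense or very deep sites pull the means upward. The correlation between density and mean depth: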

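cor(fishingData$density, fishingData$meandepth)
## [1] -0.5823638

The correlation of about -0.58 is moderately negative: sites in deeper water tend to have lower fish densities.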
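`cor()` returns only the coefficient. `cor.test()` from base R's stats package adds a p-value, which helps judge whether a value like -0.58 could plausibly arise by chance. A Spearman rank correlation is less sensitive to extreme values, such as the large gap between the third quartile and the maximum of density in the summary. The chunk below is a minimal sketch. `corSummary` is a hypothetical helper name. The final call is commented out because it assumes `fishingData` has already been read in as in the first chunk.

```r
# Hypothetical helper: Pearson and Spearman correlations with their p-values.
corSummary <- function(x, y) {
  pearson  <- cor.test(x, y, method = "pearson")
  # exact = FALSE avoids the exact-p-value warning when there are tied values
  spearman <- cor.test(x, y, method = "spearman", exact = FALSE)
  c(pearson_r    = unname(pearson$estimate),
    pearson_p    = pearson$p.value,
    spearman_rho = unname(spearman$estimate),
    spearman_p   = spearman$p.value)
}

# Usage in the notebook, once fishingData has been read in:
# corSummary(fishingData$density, fishingData$meandepth)
```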

2. Data wrangling: Please perform some basic transformations. They will need to make sense but could include column renaming, creating a subset of the data, replacing values, or creating new columns with derived data (for example, if it makes sense, you could sum two columns together).
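One derived column suits this data set well. On the six rows printed by `head(fishingData)`, `density` equals `totabund / sweptarea`, so the stored density can be recomputed as a consistency check. The sketch below does that and adds two further derived columns. It rests on several assumptions. `fishingHead` is a hypothetical stand-in for the first six rows. The 1989 cut-off simply mirrors the two levels of `period`. The depth-band breaks of 1500 and 3000 are illustrative choices, not values from the source.

```r
# Hypothetical stand-in: the six rows printed by head(fishingData).
fishingHead <- data.frame(
  totabund  = c(76, 161, 39, 410, 177, 695),
  density   = c(0.002070281, 0.003519799, 0.000980515,
                0.008039216, 0.005933375, 0.021800501),
  meandepth = c(804, 808, 809, 848, 853, 960),
  year      = c(1978, 2001, 2001, 1979, 2002, 1980),
  sweptarea = c(36710.00, 45741.25, 39775.00, 51000.00, 29831.25, 31880.00)
)

# Derived column 1: density recomputed from its parts and compared with the
# stored column (the tolerance allows for rounding in the printed values).
fishingHead$densityCheck <- fishingHead$totabund / fishingHead$sweptarea
print(all.equal(fishingHead$density, fishingHead$densityCheck, tolerance = 1e-5))

# Derived column 2: sampling period from the year; every year lies in
# 1977-1989 or 2000-2002, the two levels of the period column.
fishingHead$surveyPeriod <- ifelse(fishingHead$year <= 1989, "1977-1989", "2000-2002")

# Derived column 3: depth band; the 1500 and 3000 breaks are illustrative only.
fishingHead$depthBand <- cut(fishingHead$meandepth,
                             breaks = c(0, 1500, 3000, Inf),
                             labels = c("shallow", "mid", "deep"),
                             right = FALSE)
fishingHead
```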

newfishingData<-fishingData[,c("X", "totabund","density", "meandepth", "year", "sweptarea")]
head(newfishingData)
##   X totabund     density meandepth year sweptarea
## 1 1       76 0.002070281       804 1978  36710.00
## 2 2      161 0.003519799       808 2001  45741.25
## 3 3       39 0.000980515       809 2001  39775.00
## 4 4      410 0.008039216       848 1979  51000.00
## 5 5      177 0.005933375       853 2002  29831.25
## 6 6      695 0.021800501       960 1980  31880.00
names(newfishingData)[names(newfishingData)=="X"]<-"ID"
names(newfishingData)[names(newfishingData)=="totabund"]<-"Total_Fish_Per_Site"
names(newfishingData)[names(newfishingData)=="density"]<-"Density"
names(newfishingData)[names(newfishingData)=="meandepth"]<-"MeanDepth"
names(newfishingData)[names(newfishingData)=="year"]<-"Year"
names(newfishingData)[names(newfishingData)=="sweptarea"]<-"SweptArea"
head(newfishingData)
##   ID Total_Fish_Per_Site     Density MeanDepth Year SweptArea
## 1  1                  76 0.002070281       804 1978  36710.00
## 2  2                 161 0.003519799       808 2001  45741.25
## 3  3                  39 0.000980515       809 2001  39775.00
## 4  4                 410 0.008039216       848 1979  51000.00
## 5  5                 177 0.005933375       853 2002  29831.25
## 6  6                 695 0.021800501       960 1980  31880.00
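Renaming one column per line works. However, a line that targets a name that is no longer present silently does nothing, for example renaming `X` after it has already become `ID`. A lookup vector of old and new names performs the renaming in one step and can warn about names it cannot find. The chunk below is a minimal sketch. `renameMap` and `hits` are hypothetical names. The two-row data frame only mirrors the columns of `newfishingData` before renaming.

```r
# Hypothetical two-row stand-in with the same columns as newfishingData
# before renaming (values copied from head()).
newfishingData <- data.frame(
  X = c(1, 2), totabund = c(76, 161),
  density = c(0.002070281, 0.003519799), meandepth = c(804, 808),
  year = c(1978, 2001), sweptarea = c(36710.00, 45741.25)
)

# Lookup vector: names are the old column names, values are the new ones.
renameMap <- c(X = "ID", totabund = "Total_Fish_Per_Site", density = "Density",
               meandepth = "MeanDepth", year = "Year", sweptarea = "SweptArea")

# match() finds the position of each old name; NA means the name is absent.
hits <- match(names(renameMap), names(newfishingData))
if (anyNA(hits)) {
  warning("Columns not found: ", paste(names(renameMap)[is.na(hits)], collapse = ", "))
}
names(newfishingData)[hits[!is.na(hits)]] <- unname(renameMap[!is.na(hits)])
names(newfishingData)
```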
3. Graphics: Please make sure to display at least one scatter plot, box plot and histogram. Don’t be limited to this. Please explore the many other options in R packages such as ggplot2.
require(ggplot2)
## Loading required package: ggplot2
boxplot(newfishingData$Total_Fish_Per_Site, main="Total Fish per Site Boxplot", ylab="Total Fish per Site")

hist(newfishingData$Total_Fish_Per_Site, main="Total Fish per Site Histogram", xlab="Total Fish per Site")

plot(Total_Fish_Per_Site ~ Year, data=newfishingData, main="Total Fish per Site Plotted Against Year", ylab="Total Fish per Site")
abline(lm(newfishingData$Total_Fish_Per_Site ~ newfishingData$Year))
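`ggplot2` is loaded above, but the plots so far use base graphics. The chunk below sketches how the same scatter plot with its least-squares line could be drawn in `ggplot2`, together with a box plot of abundance for each sampling period. It rests on two assumptions. `plotData` is a hypothetical stand-in for the six rows printed earlier, with the original `period` values added as `Period`. `geom_smooth(method = "lm")` is used to mirror `abline(lm(...))`.

```r
library(ggplot2)

# Hypothetical stand-in for the six rows printed by head(fishingData),
# keeping only the columns these plots need.
plotData <- data.frame(
  Total_Fish_Per_Site = c(76, 161, 39, 410, 177, 695),
  Year   = c(1978, 2001, 2001, 1979, 2002, 1980),
  Period = c("1977-1989", "2000-2002", "2000-2002",
             "1977-1989", "2000-2002", "1977-1989")
)

# Scatter plot with a least-squares line; method = "lm" mirrors abline(lm(...)).
scatter <- ggplot(plotData, aes(x = Year, y = Total_Fish_Per_Site)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(title = "Total Fish per Site by Survey Year",
       x = "Year", y = "Total fish per site")

# Box plot of abundance for each sampling period.
byPeriod <- ggplot(plotData, aes(x = Period, y = Total_Fish_Per_Site)) +
  geom_boxplot() +
  labs(title = "Total Fish per Site by Sampling Period",
       x = "Period", y = "Total fish per site")

print(scatter)
print(byPeriod)
```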

summary(lm(newfishingData$Total_Fish_Per_Site ~ newfishingData$Year))
## 
## Call:
## lm(formula = newfishingData$Total_Fish_Per_Site ~ newfishingData$Year)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -256.41 -139.17  -51.17   86.16 1080.49 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)   
## (Intercept)         9484.198   3584.590   2.646  0.00905 **
## newfishingData$Year   -4.663      1.803  -2.586  0.01071 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 216.4 on 145 degrees of freedom
## Multiple R-squared:  0.04407,    Adjusted R-squared:  0.03748 
## F-statistic: 6.685 on 1 and 145 DF,  p-value: 0.01071
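Writing the model with a `data` argument keeps the coefficient label short (`Year` instead of `newfishingData$Year`). The slope, its p-value and R-squared can then be pulled out of `summary()` rather than read off the printout. Because the survey years fall into two separate blocks (1977-1989 and 2000-2002), the slope largely reflects the difference between those blocks. The chunk below is a minimal sketch. `yearEffect` and `coefTable` are hypothetical names. The six-row data frame stands in for `newfishingData`, so its numbers will not match the full-data output above.

```r
# Hypothetical stand-in for newfishingData (the six rows printed by head()).
newfishingData <- data.frame(
  Total_Fish_Per_Site = c(76, 161, 39, 410, 177, 695),
  Year = c(1978, 2001, 2001, 1979, 2002, 1980)
)

# data = ... keeps the coefficient row labelled "Year".
fit <- lm(Total_Fish_Per_Site ~ Year, data = newfishingData)
coefTable <- coef(summary(fit))

# Slope, its p-value and R-squared: the numbers a conclusion would quote.
yearEffect <- c(
  slope     = coefTable["Year", "Estimate"],
  p_value   = coefTable["Year", "Pr(>|t|)"],
  r_squared = summary(fit)$r.squared
)
print(yearEffect)
```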
4. Meaningful question for analysis: Please state at the beginning a meaningful question for analysis. Use the first three steps and anything else that would be helpful to answer the question you are posing from the data set you chose. Please write a brief conclusion paragraph in R markdown at the end.

Question: Is there a relationship between the start of commercial fishing in deeper-water locations and the size of certain deep-sea fish populations, that is, are populations lower than in previous years?

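Based on the scatterplot and the fitted regression, certain deep-sea fish populations appear to have been affected after commercial fishing began in locations with deeper water. The regression line slopes downward: total fish per site falls by about 4.7 per year on average (p = 0.011). Abundance in the 2000-2002 surveys therefore tends to be lower than in the 1977-1989 surveys. However, year explains only about 4% of the variation in abundance (R-squared = 0.044). A regression on year alone also cannot show that fishing caused the decline. I believe that some deep-sea fish species move up into the commercial fishing zones and get caught, thus reducing their population over the years.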
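Because `year` only takes values in two separate blocks, comparing the two sampling periods directly is a simple complement to the regression. Medians and a Wilcoxon rank-sum test (`wilcox.test()`) suit right-skewed counts such as `totabund` better than means and a t-test. The chunk below is a minimal sketch. `fishingHead`, `periodMedians` and `periodTest` are hypothetical names. The six rows from `head(fishingData)` stand in for the full data, so run it on `fishingData` to obtain the real comparison.

```r
# Hypothetical stand-in: totabund and period for the six rows printed by head().
fishingHead <- data.frame(
  totabund = c(76, 161, 39, 410, 177, 695),
  period   = c("1977-1989", "2000-2002", "2000-2002",
               "1977-1989", "2000-2002", "1977-1989")
)

# Median total abundance in each sampling period.
periodMedians <- aggregate(totabund ~ period, data = fishingHead, FUN = median)
print(periodMedians)

# Two-sided Wilcoxon rank-sum test of whether abundance differs between periods.
periodTest <- wilcox.test(totabund ~ period, data = fishingHead)
print(periodTest)
```

With only six rows the test has very little power. On the full data, the same two calls give the period medians and a p-value that can be reported alongside the regression.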

5. BONUS – place the original .csv in a GitHub repository and have R read from the link. This will be a very useful skill as you progress in your data science education and career.
# url<-"https://raw.githubusercontent.com/Sizzlo/Rdatasets/master/csv/COUNT/fishing.csv"
# fishingData<-read.csv(url)