df <- read.csv("whaledata.csv")
dim(df)
## [1] 100 8
str(df)
## 'data.frame': 100 obs. of 8 variables:
## $ month : chr "May" "May" "May" "May" ...
## $ time.at.station: int 1344 1633 743 1050 1764 580 459 561 709 690 ...
## $ water.noise : chr "low" "medium" "medium" "medium" ...
## $ number.whales : int 7 13 12 10 12 10 5 8 11 12 ...
## $ latitude : num 60.4 60.4 60.5 60.3 60.4 ...
## $ longitude : num -4.18 -4.19 -4.62 -4.35 -5.2 -5.22 -5.08 -5 -4.64 -4.84 ...
## $ depth : int 520 559 1006 540 1000 1000 993 988 954 984 ...
## $ gradient : int 415 405 88 409 97 173 162 162 245 161 ...
head(df)
## month time.at.station water.noise number.whales latitude longitude depth
## 1 May 1344 low 7 60.37 -4.18 520
## 2 May 1633 medium 13 60.38 -4.19 559
## 3 May 743 medium 12 60.54 -4.62 1006
## 4 May 1050 medium 10 60.29 -4.35 540
## 5 May 1764 medium 12 60.41 -5.20 1000
## 6 May 580 high 10 60.38 -5.22 1000
## gradient
## 1 415
## 2 405
## 3 88
## 4 409
## 5 97
## 6 173
sum(is.na(df))
## [1] 1
df[!complete.cases(df),]
## month time.at.station water.noise number.whales latitude longitude depth
## 40 May 663 low NA 60.67 -4.83 1016
## gradient
## 40 114
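The single missing value sits in `number.whales`, so any summary of that column has to handle it explicitly. The sketch below is a minimal illustration, not part of the original analysis: the data frame `whales` is a hypothetical stand-in built from the six rows printed by `head(df)` plus row 40, and the variable names are assumptions.

```r
# Hypothetical stand-in: rows copied from the head(df) output plus row 40,
# the one incomplete case. The full data come from whaledata.csv.
whales <- data.frame(
  water.noise   = c("low", "medium", "medium", "medium", "medium", "high", "low"),
  number.whales = c(7L, 13L, 12L, 10L, 12L, 10L, NA)
)

# Count missing values per column to see where the NA sits
na_per_column <- colSums(is.na(whales))

# Without na.rm = TRUE, a single NA makes the whole summary NA
mean_with_na    <- mean(whales$number.whales)
mean_without_na <- mean(whales$number.whales, na.rm = TRUE)

# Alternatively, keep only the complete rows
whales_complete <- whales[complete.cases(whales), ]
```

On the full data set, the same calls would be applied to `df` directly.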
table(df$month, df$water.noise)
##
## high low medium
## May 4 22 24
## October 11 6 33
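Raw counts can be easier to compare once they are converted to within-month proportions. The sketch below re-enters the counts printed above into a matrix (the name `noise_counts` is an assumption for illustration) and uses `prop.table()` with `margin = 1` to get row-wise shares.

```r
# Counts copied from the table(df$month, df$water.noise) output
noise_counts <- matrix(
  c(4, 22, 24,
    11, 6, 33),
  nrow = 2, byrow = TRUE,
  dimnames = list(month = c("May", "October"),
                  water.noise = c("high", "low", "medium"))
)

# Row-wise proportions: share of each noise level within a month
noise_props <- prop.table(noise_counts, margin = 1)
round(noise_props, 2)
```

With the data frame loaded, `prop.table(table(df$month, df$water.noise), margin = 1)` should give the same shares without retyping the counts.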
This table cross-tabulates water noise by month, with 50 observations in each month. It shows that the water was noisier overall in October than in May: October had more high-noise observations (11 vs. 4) and far fewer low-noise observations (6 vs. 22).

#### 4. Create a boxplot that shows the number of whales on the y axis and water noise on the x axis. Be sure to label the axes, use a color for the boxes and give the chart a title. What does this graph tell us?
boxplot(number.whales ~ water.noise, data = df,
        xlab = "Water Noise",
        ylab = "Number of Whales",
        col = "blue",
        main = "Whale Population vs Water Noise")
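Because `water.noise` is read in as a character column, R sorts its groups alphabetically (high, low, medium), as the earlier table output shows. One possible refinement, sketched here under assumptions, is to convert the column to a factor with explicitly ordered levels so the x axis reads from low to high. The data frame `whales` is a hypothetical stand-in built from the six rows printed by `head(df)`.

```r
# Hypothetical stand-in: rows copied from the head(df) output.
# The full data come from whaledata.csv.
whales <- data.frame(
  water.noise   = c("low", "medium", "medium", "medium", "medium", "high"),
  number.whales = c(7, 13, 12, 10, 12, 10)
)

# Order the levels by intensity instead of alphabetically
whales$water.noise <- factor(whales$water.noise,
                             levels = c("low", "medium", "high"))

# Same plot call as before; the boxes now appear in low, medium, high order
boxplot(number.whales ~ water.noise, data = whales,
        xlab = "Water Noise",
        ylab = "Number of Whales",
        col = "blue",
        main = "Whale Population vs Water Noise")
```

Applying the same `factor()` call to `df$water.noise` before plotting would reorder the boxes for the full data set.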
This box plot shows that the median number of whales spotted was highest when the water noise was low, but the counts in that group vary very little. The spread of the counts is much wider for medium and high water noise. Both of those groups have a large portion of their observations above the median for low water noise, despite both having a lower median themselves.

#### 5. List the 10 steps from Roger Peng’s EDA Checklist. Style your output so that there is a header called “Roger Peng’s EDA Checklist” formatted as Header 2 and each step is formatted in a numbered list. Include a hyperlink called Source that links to Peng’s book at https://bookdown.org/rdpeng/exdata/exploratory-data-analysis-checklist.html. Enter your answer below using R Markdown. Hint: You can access the R Markdown Reference Guide from Help/Cheat Sheets in RStudio.