1. Create an R Markdown file, import the whaledata.csv file and assign it to a data frame called df. Review the data using any of the functions we have used to find the number of rows and columns, classes of the data columns and view a small subset of the data.

df <- read.csv("whaledata.csv")
dim(df)
## [1] 100   8
str(df)
## 'data.frame':    100 obs. of  8 variables:
##  $ month          : chr  "May" "May" "May" "May" ...
##  $ time.at.station: int  1344 1633 743 1050 1764 580 459 561 709 690 ...
##  $ water.noise    : chr  "low" "medium" "medium" "medium" ...
##  $ number.whales  : int  7 13 12 10 12 10 5 8 11 12 ...
##  $ latitude       : num  60.4 60.4 60.5 60.3 60.4 ...
##  $ longitude      : num  -4.18 -4.19 -4.62 -4.35 -5.2 -5.22 -5.08 -5 -4.64 -4.84 ...
##  $ depth          : int  520 559 1006 540 1000 1000 993 988 954 984 ...
##  $ gradient       : int  415 405 88 409 97 173 162 162 245 161 ...
head(df)
##   month time.at.station water.noise number.whales latitude longitude depth
## 1   May            1344         low             7    60.37     -4.18   520
## 2   May            1633      medium            13    60.38     -4.19   559
## 3   May             743      medium            12    60.54     -4.62  1006
## 4   May            1050      medium            10    60.29     -4.35   540
## 5   May            1764      medium            12    60.41     -5.20  1000
## 6   May             580        high            10    60.38     -5.22  1000
##   gradient
## 1      415
## 2      405
## 3       88
## 4      409
## 5       97
## 6      173
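
Other base R functions give the same overview. The following is a minimal sketch, assuming a small stand-in data frame built from the six rows printed by `head(df)` above so that it runs without the CSV file. The name `sample_df` and the other object names are illustrative, not part of the quiz.

```r
# Stand-in for df: the first six rows shown by head(df) above.
sample_df <- data.frame(
  month           = rep("May", 6),
  time.at.station = c(1344L, 1633L, 743L, 1050L, 1764L, 580L),
  water.noise     = c("low", "medium", "medium", "medium", "medium", "high"),
  number.whales   = c(7L, 13L, 12L, 10L, 12L, 10L),
  latitude        = c(60.37, 60.38, 60.54, 60.29, 60.41, 60.38),
  longitude       = c(-4.18, -4.19, -4.62, -4.35, -5.20, -5.22),
  depth           = c(520L, 559L, 1006L, 540L, 1000L, 1000L),
  gradient        = c(415L, 405L, 88L, 409L, 97L, 173L),
  stringsAsFactors = FALSE
)

n_rows      <- nrow(sample_df)           # number of rows
n_cols      <- ncol(sample_df)           # number of columns
col_classes <- sapply(sample_df, class)  # class of each column
last_rows   <- tail(sample_df, 3)        # last three rows

print(c(rows = n_rows, cols = n_cols))
print(col_classes)
print(last_rows)
```

`tail()` complements `head()` by showing the end of the data, which can reveal trailing blank or malformed rows.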

2. Determine if there are any NAs in the data. If so, print out the rows that contain NAs.

sum(is.na(df))
## [1] 1
df[!complete.cases(df),]
##    month time.at.station water.noise number.whales latitude longitude depth
## 40   May             663         low            NA    60.67     -4.83  1016
##    gradient
## 40      114
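
To see which column holds the missing value, the NAs can also be counted per column. This is a sketch on a hypothetical three-row data frame (`toy` is not the quiz data), using only base R functions.

```r
# Hypothetical toy data with one missing whale count (not the quiz data).
toy <- data.frame(
  month         = c("May", "May", "October"),
  number.whales = c(7L, NA, 12L),
  depth         = c(520L, 1016L, 984L),
  stringsAsFactors = FALSE
)

na_per_column <- colSums(is.na(toy))         # NA count for each column
na_rows       <- which(!complete.cases(toy)) # row numbers containing any NA
toy_complete  <- na.omit(toy)                # copy with incomplete rows dropped

print(na_per_column)
print(na_rows)
print(nrow(toy_complete))
```

Applied to `df`, `colSums(is.na(df))` would be expected to show the single NA under `number.whales`, matching row 40 above.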

3. Create a contingency table with the two categorical variables in the data frame. What does this tell us?

table(df$month, df$water.noise)
##          
##           high low medium
##   May        4  22     24
##   October   11   6     33
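
Row proportions can make the two months easier to compare. The sketch below rebuilds the counts printed above into a table and converts them to within-month shares with `prop.table()`. The object names `noise_counts` and `noise_props` are illustrative.

```r
# Counts copied from the contingency table above.
noise_counts <- as.table(matrix(
  c(4, 22, 24,
    11, 6, 33),
  nrow = 2, byrow = TRUE,
  dimnames = list(month = c("May", "October"),
                  water.noise = c("high", "low", "medium"))
))

# Share of each noise level within each month (each row sums to 1).
noise_props <- prop.table(noise_counts, margin = 1)
print(round(noise_props, 2))
```

With the original data frame, `prop.table(table(df$month, df$water.noise), margin = 1)` should give the same proportions.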

The table counts the observations at each water-noise level in May and October, with 50 observations in each month. The water was noisier overall in October than in May: there were 11 high-noise observations in October versus 4 in May, and only 6 low-noise observations versus 22.

4. Create a boxplot that shows the number of whales on the y axis and water noise on the x axis. Be sure to label the axes, use a color for the boxes and give the chart a title. What does this graph tell us?

boxplot(number.whales ~ water.noise, data = df,
        xlab = "Water Noise",
        ylab = "Number of Whales",
        col = "blue",
        main = "Whale Population vs Water Noise")
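
A box plot draws the median of each group, and R orders character categories alphabetically (high, low, medium) unless told otherwise. The sketch below, on hypothetical toy counts rather than the quiz data, orders the levels by intensity and extracts the medians that `boxplot()` would draw, alongside the group means for comparison.

```r
# Hypothetical toy counts (not the quiz data), three per noise level.
toy <- data.frame(
  water.noise   = c("low", "medium", "high", "low", "medium", "high",
                    "low", "medium", "high"),
  number.whales = c(10, 4, 2, 11, 9, 8, 12, 20, 5),
  stringsAsFactors = FALSE
)

# Order the levels by noise intensity instead of alphabetically.
toy$water.noise <- factor(toy$water.noise, levels = c("low", "medium", "high"))

# plot = FALSE returns the box statistics without drawing;
# row 3 of $stats holds the median of each group.
box_stats     <- boxplot(number.whales ~ water.noise, data = toy, plot = FALSE)
group_medians <- setNames(box_stats$stats[3, ], box_stats$names)

# Group means, for comparison with the medians.
group_means <- tapply(toy$number.whales, toy$water.noise, mean)

print(group_medians)
print(group_means)
```

In this toy example the medium group has a median of 9 but a mean of 11, which shows why a box plot should be read in terms of medians rather than means.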

This box plot shows that the highest median number of whales spotted occurred when the water noise was low, although the low-noise counts vary very little. (The center line of each box is the median, not the mean.) The counts vary much more for medium and high water noise: both groups have a large portion of their data above the low-noise median, even though their own medians are lower.

5. List the 10 steps from Roger Peng’s EDA Checklist. Style your output so that there is a header called “Roger Peng’s EDA Checklist” formatted as Header 2 and each step is formatted in a numbered list. Include a hyperlink called Source that links to Peng’s book at https://bookdown.org/rdpeng/exdata/exploratory-data-analysis-checklist.html. Enter your answer below using R Markdown. Hint: You can access the R Markdown Reference Guide from Help/Cheat Sheets in RStudio.

Roger Peng’s EDA Checklist

  1. Formulate your question
  2. Read in your data
  3. Check the packaging
  4. Run str()
  5. Look at the top and the bottom of your data
  6. Check your “n”s
  7. Validate with at least one external data source
  8. Try the easy solution first
  9. Challenge your solution
  10. Follow up

Source

6. Save your file as Quiz3.Rmd, knit your file to HTML, zip it and upload the file to Blackboard or post to RPubs and submit the RPubs link.