Alter the code chunks below to complete the lab. Also, be sure to include your name in the author portion of this header. When finished, knit your file to HTML and submit it to your TA for grading.

This first lab is meant to get you acquainted with the data visualization process in R. We will primarily be using the Seatbelts dataset, which reports aggregate monthly automobile casualties in Great Britain from 1969 to 1984.
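As a quick sanity check on the description above, the time span can be read off the object itself. This is a minimal sketch that assumes only the base datasets package; Seatbelts is stored as a time series object, so start(), end(), and frequency() apply to it directly.

```r
library(datasets)

# Seatbelts is a time series object, not a data frame
inherits(Seatbelts, "ts")

# First and last observation as (year, month) pairs
start(Seatbelts)
end(Seatbelts)

# Observations per year; 12 means the series is monthly
frequency(Seatbelts)

# Number of months covered
nrow(Seatbelts)
```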

1 Initial Inspection

  1. Include the datasets package. Then read in the Seatbelts dataset, and print the first five rows (0.5pt).
  2. Check the dimension of the data, and find any missing observations (if they exist) (0.5pt).
  3. Use the as.data.frame command to coerce the Seatbelts into a dataframe, and store it in a variable called dat (0.5pt).
  4. Find the median of the drivers variable in dat (0.5pt).
#1
library(datasets)
head(Seatbelts, 5)
##      DriversKilled drivers front rear   kms PetrolPrice VanKilled law
## [1,]           107    1687   867  269  9059   0.1029718        12   0
## [2,]            97    1508   825  265  7685   0.1023630         6   0
## [3,]           102    1507   806  319  9963   0.1020625        12   0
## [4,]            87    1385   814  407 10955   0.1008733         8   0
## [5,]           119    1632   991  454 11823   0.1010197        10   0
#2
dim(Seatbelts)
## [1] 192   8
sum(is.na(Seatbelts))
## [1] 0
#3
dat <- as.data.frame(Seatbelts)
#4
median(dat$drivers)
## [1] 1631
#Median = 1631
2 Visualization

  1. Include the tidyverse library, and create a bar plot for the VanKilled variable within dat. Add color to the bars of this plot, and label the horizontal and vertical axes “VanKilled” and “Frequency” respectively (2pt).

#1
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
ggplot(data=dat) +
  geom_bar(aes(x=VanKilled),color="dimgrey",fill="deepskyblue1",size=2) + 
  xlab("VanKilled") + ylab("Frequency") + 
  theme_classic()
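
geom_bar() with only an x aesthetic counts how many rows share each value of VanKilled, which is why the vertical axis is a frequency. A minimal sketch of the same tally without plotting, assuming dat is rebuilt as in the inspection step:

```r
library(datasets)

dat <- as.data.frame(Seatbelts)

# Number of months observed for each distinct VanKilled value;
# these counts are the bar heights drawn by geom_bar()
van_counts <- table(dat$VanKilled)
van_counts

# The counts add up to the number of rows in dat
sum(van_counts) == nrow(dat)
```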

  2. Next, create a scatterplot relating DriversKilled to PetrolPrice, coloring the points according to VanKilled. Remark on why it might make sense that the trend appears to be negative (2pt).
#2
ggplot(dat) + 
  geom_point(mapping = aes(x = DriversKilled, y = PetrolPrice, color = VanKilled))

#The trend is negative: when petrol is expensive, fewer people drive. With fewer vehicles on the road, the chance of an accident falls, so fewer drivers (and fewer van occupants) are killed.
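
One way to back up the visual impression is a correlation coefficient. The sketch below assumes a linear summary is adequate here, which the lab does not claim; it only reports the sign and size of the association and says nothing about cause.

```r
library(datasets)

dat <- as.data.frame(Seatbelts)

# Pearson correlation between monthly driver deaths and petrol price;
# a negative value agrees with the downward trend in the scatterplot
r <- cor(dat$DriversKilled, dat$PetrolPrice)
r
```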

Next, we will pivot to the volcano dataset. This is a matrix reporting topographical information on the Maunga Whau Volcano. More specifically, entries report height of the volcano at specific points along the terrain. I have already included the necessary packages, and I have done some data wrangling (which we will learn about in the coming lectures) to make this data conformable to a heatmap.

  1. Before I alter the data via wrangling, check the class, dimension, and type of the volcano matrix (2pt).

  2. After the data has been manipulated, create a heatmap plotting X against Y and filling according to Z. Include horizontal and vertical axis labels as well as a title. In terms of X and Y coordinates, report the approximate position of the crater (spout) of the volcano (2pt).

library(tidyr)
library(tibble)
library(hrbrthemes)
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
##       Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
##       if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
library(dplyr)

#1
class(volcano)
## [1] "matrix" "array"
dim(volcano)
## [1] 87 61
typeof(volcano)
## [1] "double"
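
class(), dim(), and typeof() answer different questions, which is easiest to see by running them on a matrix and a data frame side by side. A short sketch using the two built-in datasets from this lab:

```r
library(datasets)

dat <- as.data.frame(Seatbelts)

# class() reports how R treats the object; typeof() reports how it is stored
class(volcano)
typeof(volcano)   # a numeric matrix is stored as a vector of doubles
class(dat)
typeof(dat)       # a data frame is stored as a list of columns

# dim() returns rows then columns for both objects
dim(volcano)
dim(dat)
```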
#2

# Heatmap 
volcano %>%
  
  # Data wrangling
  as_tibble() %>%
  rowid_to_column(var="X") %>%
  gather(key="Y", value="Z", -1) %>%
  
  # Change Y to numeric
  mutate(Y=as.numeric(gsub("V","",Y))) %>%

#uncomment the above piping function before beginning ggplot command
#this will automatically pass in the altered volcano data.  

#student input
  ggplot() +
  geom_tile(aes(X, Y, fill = Z)) + 
  xlab("X (row index)") + ylab("Y (column index)") + labs(fill = "Height") + 
  ggtitle("Maunga Whau Volcano: Height by Position")

# The approximate position of the crater is x = 25, y = 35.
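
The wrangling step reshapes the 87 x 61 matrix into one row per grid cell, with X as the row index, Y as the column index, and Z as the height. The tidyr documentation marks gather() as superseded by pivot_longer(), so the same reshape might be sketched as follows. The names vol and long are hypothetical, and the explicit column names are an assumption made here so that as_tibble() does not have to repair missing names.

```r
library(tidyr)
library(tibble)
library(dplyr)

# Copy the matrix and name its columns V1, V2, ..., V61
vol <- volcano
colnames(vol) <- paste0("V", seq_len(ncol(vol)))

long <- vol %>%
  as_tibble() %>%
  # X is the row index of the original matrix
  rowid_to_column(var = "X") %>%
  # Stack every column except X into (Y, Z) pairs
  pivot_longer(cols = -X, names_to = "Y", values_to = "Z") %>%
  # Turn "V12" into the numeric column index 12
  mutate(Y = as.numeric(sub("V", "", Y)))

# One row per cell of the original matrix, three columns
dim(long)
```

The resulting long table can be passed to ggplot() and geom_tile() in the same way as the piped data above.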