Alter the below code chunks to complete the lab. Also, ensure you
include your name in the author portion of this header.
When finished, knit your file to HTML and submit it to your TA for
grading.
This first lab is meant to get you acquainted with the data
visualization process in R. We will primarily be using the
Seatbelts dataset which reports aggregate monthly
automobile casualties in Great Britain from 1969 to 1984.
Load the datasets package. Then read in the
Seatbelts dataset, and print the first five rows
(0.5pt).
Use the as.data.frame command to coerce
Seatbelts into a data frame, and store it in a variable
called dat (0.5pt).
Report the median of the drivers variable in
dat (0.5pt).
#1
library(datasets)
#2
head(Seatbelts, 5)
## DriversKilled drivers front rear kms PetrolPrice VanKilled law
## [1,] 107 1687 867 269 9059 0.1029718 12 0
## [2,] 97 1508 825 265 7685 0.1023630 6 0
## [3,] 102 1507 806 319 9963 0.1020625 12 0
## [4,] 87 1385 814 407 10955 0.1008733 8 0
## [5,] 119 1632 991 454 11823 0.1010197 10 0
#3
dat <- as.data.frame(Seatbelts)
#4
median(dat$drivers)
## [1] 1631
#Median = 1631
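Because Seatbelts is stored as a multivariate time series rather than a data frame, it can help to inspect its class and time span before coercing it. The sketch below uses only base R; the variable name `span` is introduced here purely for illustration.

```r
library(datasets)

# Seatbelts is a time series object, not a data frame
class(Seatbelts)

# First and last observation, each given as (year, month)
span <- rbind(start = start(Seatbelts), end = end(Seatbelts))
span

# Coercing to a data frame keeps the columns but drops the time index
dat <- as.data.frame(Seatbelts)
nrow(dat)
```

If the dates matter for a later plot, `time(Seatbelts)` returns the time index so it can be stored as its own column.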
Load the tidyverse library, and create a bar plot
for the VanKilled variable within dat. Add
color to the bars of this plot, and label the horizontal and vertical
axes “VanKilled” and “Frequency” respectively (2pt).
#1
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
ggplot(data=dat) +
geom_bar(aes(x=VanKilled),color="dimgrey",fill="deepskyblue1",size=2) +
xlab("VanKilled") + ylab("Frequency") +
theme_classic()
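geom_bar() draws one bar per distinct value of VanKilled, with height equal to the number of rows that share that value. A minimal base R sketch of the same tally, assuming dat is built with as.data.frame(Seatbelts) as in the lab (the name `van_counts` is illustrative):

```r
library(datasets)
dat <- as.data.frame(Seatbelts)

# Number of months observed at each VanKilled value;
# these counts correspond to the bar heights drawn by geom_bar()
van_counts <- table(dat$VanKilled)
van_counts

# Every month falls into exactly one bar, so the counts sum to nrow(dat)
sum(van_counts) == nrow(dat)
```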
Create a scatter plot comparing DriversKilled to
PetrolPrice and coloring the points according to
VanKilled. Remark as to why it might make sense that the
trend appears to be negative (2pt).
#2
ggplot(dat) +
geom_point(mapping = aes(x = DriversKilled, y = PetrolPrice, color = VanKilled))
# The trend is downward: when petrol is expensive, fewer people drive.
# With fewer vehicles on the road, the chance of an accident falls,
# so fewer car and van drivers are killed.
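One way to check the visual impression of a negative trend is to compute the correlation and overlay a fitted line. This is a sketch rather than part of the lab answer; `r` and `p` are illustrative names, and the straight-line fit is an assumption about the form of the relationship.

```r
library(datasets)
library(ggplot2)

dat <- as.data.frame(Seatbelts)

# Pearson correlation; a negative value agrees with a downward trend
r <- cor(dat$DriversKilled, dat$PetrolPrice)
r

# Same scatter plot as in the lab, with a least-squares line added
p <- ggplot(dat, aes(x = DriversKilled, y = PetrolPrice)) +
  geom_point(aes(color = VanKilled)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
p
```

A correlation only summarizes linear association; it does not by itself show that petrol prices caused the change in casualties.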
Next, we will pivot to the volcano dataset. This is a
matrix reporting topographical information on the Maunga Whau Volcano.
More specifically, entries report height of the volcano at specific
points along the terrain. I have already included the necessary
packages, and I have done some data wrangling (which we will learn about
in the coming lectures) to make this data conformable to a heatmap.
Before I alter the data via wrangling, check the class, dimension, and type of the volcano matrix (2pt).
After the data has been manipulated, create a heatmap plotting X against Y and filling according to Z. Include horizontal and vertical axis labels as well as a title. In terms of X and Y coordinates, report the approximate position of the crater (spout) of the volcano (2pt).
library(tidyr)
library(tibble)
library(hrbrthemes)
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
## Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
## if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
library(dplyr)
#1
class(volcano)
## [1] "matrix" "array"
dim(volcano)
## [1] 87 61
typeof(volcano)
## [1] "double"
#2
# Heatmap
volcano %>%
# Data wrangling
as_tibble() %>%
rowid_to_column(var="X") %>%
gather(key="Y", value="Z", -1) %>%
# Change Y to numeric
mutate(Y=as.numeric(gsub("V","",Y))) %>%
#uncomment the above piping function before beginning ggplot command
#this will automatically pass in the altered volcano data.
#student input
ggplot() +
# Z (height) is shown by the fill; X and Y are grid positions
geom_tile(aes(X, Y, fill = Z)) +
xlab("X (row index)") + ylab("Y (column index)") +
ggtitle("Volcano Information")
# The approximate position of the crater is x = 25, y = 35.
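The wrangling step turns the volcano matrix into one row per grid cell, with X as the matrix row, Y as the matrix column, and Z as the height. The sketch below does the same reshaping with pivot_longer(), which tidyr documents as the replacement for the superseded gather(). It is an alternative under those assumptions, not the lab's own code, and `volcano_long` is an illustrative name.

```r
library(tidyr)
library(tibble)
library(dplyr)

volcano_long <- volcano %>%
  as.data.frame() %>%                 # columns are auto-named V1, V2, ...
  rowid_to_column(var = "X") %>%      # X = matrix row index
  pivot_longer(cols = -X, names_to = "Y", values_to = "Z") %>%
  mutate(Y = as.numeric(sub("V", "", Y)))  # "V12" becomes 12

# One row per cell of the original matrix, three columns (X, Y, Z)
dim(volcano_long)
head(volcano_long)
```

The resulting data frame can be passed to ggplot() with geom_tile(aes(X, Y, fill = Z)) exactly as in the lab.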