Wk 4 Assignment: Data visualisation from Hands-On Programming with R

title: "Untitled"
author: "Suma Pendyala"
date: "6/7/2020"
output: html_document

Sections: Introduction, Prerequisites, First Steps, The mpg Data Frame, Creating a ggplot, A Graphing Template
Exercises: 1, 2 (Read it as mpg and not mtcars), 4, 5

library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.1     v purrr   0.3.4
## v tibble  3.0.1     v dplyr   1.0.0
## v tidyr   1.1.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## -- Conflicts ---------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
mpg
## # A tibble: 234 x 11
##    manufacturer model    displ  year   cyl trans   drv     cty   hwy fl    class
##    <chr>        <chr>    <dbl> <int> <int> <chr>   <chr> <int> <int> <chr> <chr>
##  1 audi         a4         1.8  1999     4 auto(l~ f        18    29 p     comp~
##  2 audi         a4         1.8  1999     4 manual~ f        21    29 p     comp~
##  3 audi         a4         2    2008     4 manual~ f        20    31 p     comp~
##  4 audi         a4         2    2008     4 auto(a~ f        21    30 p     comp~
##  5 audi         a4         2.8  1999     6 auto(l~ f        16    26 p     comp~
##  6 audi         a4         2.8  1999     6 manual~ f        18    26 p     comp~
##  7 audi         a4         3.1  2008     6 auto(a~ f        18    27 p     comp~
##  8 audi         a4 quat~   1.8  1999     4 manual~ 4        18    26 p     comp~
##  9 audi         a4 quat~   1.8  1999     4 auto(l~ 4        16    25 p     comp~
## 10 audi         a4 quat~   2    2008     4 manual~ 4        20    28 p     comp~
## # ... with 224 more rows
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, size = class))
## Warning: Using size for a discrete variable is not advised.
# Map class to alpha (transparency)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, alpha = class))
## Warning: Using alpha for a discrete variable is not advised.
# Map class to shape
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class))
## Warning: The shape palette can deal with a maximum of 6 discrete values because
## more than 6 becomes difficult to discriminate; you have 7. Consider
## specifying shapes manually if you must have them.
## Warning: Removed 62 rows containing missing values (geom_point).
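
The shape warning arises because ggplot2's default shape palette holds only six shapes, while class has seven levels. The last level alphabetically, suv, gets no shape, which matches the 62 dropped rows. A minimal sketch of the fix the warning suggests, assuming the shape codes 0 to 6 are an acceptable (arbitrary) choice:

```r
library(ggplot2)

# class has seven levels, one more than the default shape palette covers
n_classes <- length(unique(mpg$class))

# Rows in the level that the default palette leaves without a shape
n_suv <- sum(mpg$class == "suv")

# Supply one shape code per level by hand; 0:6 is an arbitrary illustrative choice
p_shapes <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class)) +
  scale_shape_manual(values = 0:6)

p_shapes
```

With seven shapes supplied, every class should receive a shape and no rows should be dropped.
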
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

3.2.4 Exercises
1 - Run ggplot(data = mpg). What do you see?
2 - How many rows are in mpg? How many columns?
3 - What does the drv variable describe? Read the help for ?mpg to find out.
4 - Make a scatterplot of hwy vs cyl.
5 - What happens if you make a scatterplot of class vs drv? Why is the plot not useful?

library(tidyverse)
mpg <- ggplot2::mpg
ggplot(data = mpg)
dim(mpg)
## [1] 234  11
nrow(mpg)
## [1] 234
ncol(mpg)
## [1] 11

drv describes the type of drive train: f = front-wheel drive, r = rear-wheel drive, 4 = four-wheel drive (4wd).

ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy))
ggplot(data = mpg) +
  geom_point(mapping = aes(x = drv, y = class))
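
Both drv and class are categorical, so many cars share the same (drv, class) pair and their points are drawn on top of one another. The scatterplot shows which combinations occur but not how many cars fall in each. A small sketch of one way to recover that information, tabulating the combinations with dplyr::count() and then sizing the points with geom_count():

```r
library(ggplot2)
library(dplyr)

# Number of cars in each observed drv/class combination
combos <- count(mpg, drv, class)
combos

# geom_count() sizes each point by the number of overlapping observations
p_count <- ggplot(data = mpg) +
  geom_count(mapping = aes(x = drv, y = class))
p_count
```
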

3.3.1 Exercises
1 - What's gone wrong with this code? Why are the points not blue?
2 - Which variables in mpg are categorical? Which variables are continuous? (Hint: type ?mpg to read the documentation for the dataset). How can you see this information when you run mpg?
3 - Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs. continuous variables?
4 - What happens if you map the same variable to multiple aesthetics?
5 - What does the stroke aesthetic do? What shapes does it work with? (Hint: use ?geom_point)
6 - What happens if you map an aesthetic to something other than a variable name, like aes(colour = displ < 5)? Note, you'll also need to specify x and y.

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
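
In the first plot, color = "blue" sits inside aes(), so ggplot2 treats it as a variable with the single value "blue" and assigns it the first colour of the default discrete palette. In the second plot it sits outside aes() and is used as a literal colour. One way to check this, sketched here with ggplot_build() to read the colours actually assigned to the points:

```r
library(ggplot2)

# "blue" inside aes(): mapped like a one-level categorical variable
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# "blue" outside aes(): set as a fixed property of the layer
p_fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Colours that end up on the points in each plot
mapped_colours <- unique(ggplot_build(p_mapped)$data[[1]]$colour)
fixed_colours <- unique(ggplot_build(p_fixed)$data[[1]]$colour)
mapped_colours
fixed_colours
```
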
str(mpg)
## tibble [234 x 11] (S3: tbl_df/tbl/data.frame)
##  $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
##  $ model       : chr [1:234] "a4" "a4" "a4" "a4" ...
##  $ displ       : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##  $ year        : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##  $ cyl         : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
##  $ trans       : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
##  $ drv         : chr [1:234] "f" "f" "f" "f" ...
##  $ cty         : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
##  $ fl          : chr [1:234] "p" "p" "p" "p" ...
##  $ class       : chr [1:234] "compact" "compact" "compact" "compact" ...
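
The str() output already answers the categorical/continuous question: the chr columns are categorical and the num and int columns are numeric (although integer columns such as year and cyl take only a few distinct values). A short sketch that splits the column names by type:

```r
library(ggplot2)

# Character columns hold the categorical variables
categorical <- names(mpg)[sapply(mpg, is.character)]

# Double and integer columns hold the numeric variables
numeric_vars <- names(mpg)[sapply(mpg, is.numeric)]

categorical
numeric_vars
```

The same type tags (<chr>, <dbl>, <int>) appear under each column name when mpg is printed.
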
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = displ))
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, size = displ))
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = drv, shape = drv))
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = displ, size = displ))
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), shape = 21,
             fill = 'red', size = 4, stroke = 3, color = 'white')
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = displ < 5))
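
aes(color = displ < 5) maps colour to the result of an expression rather than to a column: ggplot2 evaluates displ < 5 for every row and colours the points by the resulting TRUE/FALSE values. A quick sketch of what that temporary variable looks like:

```r
library(ggplot2)

# The expression is evaluated row by row and yields a logical vector
small_engine <- mpg$displ < 5

# How many cars fall on each side of the threshold
table(small_engine)
```
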

3.5.1 Exercises answers

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ cty, nrow = 2)
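
Faceting on a continuous variable such as cty creates one panel for every distinct value, which is why that plot is so crowded. A sketch that counts the panels, assuming the layout table returned by ggplot_build() has one row per panel:

```r
library(ggplot2)

p_facet <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ cty, nrow = 2)

# One panel per distinct cty value
n_panels <- nrow(ggplot_build(p_facet)$layout$layout)
n_values <- length(unique(mpg$cty))
c(panels = n_panels, distinct_cty = n_values)
```
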
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(cty ~ year)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = drv, y = cyl))
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)

3.6.1 Exercises answers

Line chart: geom_line(); boxplot: geom_boxplot(); histogram: geom_histogram(); area chart: geom_area().

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(data = mpg) +
  geom_smooth(
    mapping = aes(x = displ, y = hwy, color = drv),
    show.legend = TRUE
  )
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(data = mpg, mapping = aes(y = hwy, x = displ)) +
  geom_point() +
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(data = mpg, mapping = aes(y = hwy, x = displ)) +
  geom_point() +
  geom_smooth(mapping = aes(group = drv), se = FALSE, show.legend = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(data = mpg, mapping = aes(y = hwy, x = displ)) +
  geom_point(mapping = aes(color = drv)) +
  geom_smooth(mapping = aes(color = drv, group = drv), se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(data = mpg, mapping = aes(y = hwy, x = displ)) +
  geom_point(mapping = aes(color = drv)) +
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(data = mpg, mapping = aes(y = hwy, x = displ)) +
  geom_point(mapping = aes(color = drv)) +
  geom_smooth(mapping = aes(linetype = drv), se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(data = mpg, mapping = aes(y = hwy, x = displ)) +
  geom_point(mapping = aes(fill = drv), color = 'white', stroke = 2, shape = 21)

3.7.1 Exercises answers

diamonds %>%
  group_by(cut) %>%
  summarise(
    median_y = median(depth),
    min_y = min(depth),
    max_y = max(depth)
  ) %>%
  ggplot() +
  geom_pointrange(mapping = aes(x = cut, y = median_y, ymin = min_y, ymax = max_y)) +
  labs(y = "depth")
## `summarise()` ungrouping output (override with `.groups` argument)
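
The pipeline computes the summaries by hand and draws them with geom_pointrange(). The same plot can likely be produced by letting stat_summary() compute them. This sketch assumes ggplot2 3.3.0 or later, where the arguments are named fun, fun.min, and fun.max:

```r
library(ggplot2)

# stat_summary() draws with geom_pointrange() by default
p_summary <- ggplot(data = diamonds) +
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun = median,
    fun.min = min,
    fun.max = max
  )
p_summary
```
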
ggplot(data = diamonds) +
  geom_bar(aes(x = cut))
diamonds %>% group_by(cut) %>% summarise(count = n()) %>%
  ggplot() +
  geom_col(mapping = aes(x = cut, y = count))
## `summarise()` ungrouping output (override with `.groups` argument)
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = ..prop..))
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))
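
Without group = 1, geom_bar() computes proportions within each cut separately, so every bar has height 1. Setting group = 1 makes the proportions relative to all diamonds. A sketch that reads the computed values, written with after_stat(prop), the newer spelling of ..prop.. (an assumption: ggplot2 3.3.0 or later):

```r
library(ggplot2)

# Each cut forms its own group, so each proportion is 1
p_ungrouped <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop)))

# A single group, so proportions are shares of the whole data set
p_grouped <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

# Proportions computed by stat_count() in each case
prop_ungrouped <- ggplot_build(p_ungrouped)$data[[1]]$prop
prop_grouped <- ggplot_build(p_grouped)$data[[1]]$prop
prop_ungrouped
prop_grouped
```
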
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = color, y = ..prop..))
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = color, y = ..prop.., group = color))
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = color, y = ..prop.., group = color),
           position = 'dodge')

3.8.1 Exercises answers

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point()
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter()
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_count()
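
The plain scatterplot hides data because cty and hwy are whole numbers, so many cars share the same coordinates. geom_jitter() adds random noise to separate them, while geom_count() sizes each point by the number of cars at that position. A sketch that measures the overlap:

```r
library(ggplot2)

# Distinct (cty, hwy) positions versus total observations
n_positions <- nrow(unique(mpg[, c("cty", "hwy")]))
n_cars <- nrow(mpg)
c(positions = n_positions, cars = n_cars)
```
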
ggplot(data = mpg) +
  geom_boxplot(mapping = aes(y = displ, x = drv, color = factor(year)))

3.9.1 Exercises answers

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity)) +
  coord_polar()
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() +
  geom_abline() +
  coord_fixed()