Wk 4 Assignment: Data visualisation from the Hands-on Programming with R
title: "Untitled"
author: "Suma Pendyala"
date: "6/7/2020"
output: html_document
Sections: Introduction, Prerequisites, First Steps, The mpg Data Frame, Creating a ggplot, A Graphing Template
Exercises: 1, 2 (Read it as mpg and not mtcars), 4, 5
library(tidyverse)
mpg
## # A tibble: 234 x 11
##    manufacturer model    displ  year   cyl trans   drv     cty   hwy fl    class
##    <chr>        <chr>    <dbl> <int> <int> <chr>   <chr> <int> <int> <chr> <chr>
##  1 audi         a4         1.8  1999     4 auto(l~ f        18    29 p     comp~
##  2 audi         a4         1.8  1999     4 manual~ f        21    29 p     comp~
##  3 audi         a4         2    2008     4 manual~ f        20    31 p     comp~
##  4 audi         a4         2    2008     4 auto(a~ f        21    30 p     comp~
##  5 audi         a4         2.8  1999     6 auto(l~ f        16    26 p     comp~
##  6 audi         a4         2.8  1999     6 manual~ f        18    26 p     comp~
##  7 audi         a4         3.1  2008     6 auto(a~ f        18    27 p     comp~
##  8 audi         a4 quat~   1.8  1999     4 manual~ 4        18    26 p     comp~
##  9 audi         a4 quat~   1.8  1999     4 auto(l~ 4        16    25 p     comp~
## 10 audi         a4 quat~   2    2008     4 manual~ 4        20    28 p     comp~
## # ... with 224 more rows
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy))

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = class))

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, size = class))
## Warning: Using size for a discrete variable is not advised.

# Left
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, alpha = class))
## Warning: Using alpha for a discrete variable is not advised.

# Right
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, shape = class))
## Warning: The shape palette can deal with a maximum of 6 discrete values because
## more than 6 becomes difficult to discriminate; you have 7. Consider
## specifying shapes manually if you must have them.
## Warning: Removed 62 rows containing missing values (geom_point).

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

3.2.4 Exercises
1 - Run ggplot(data = mpg). What do you see?
2 - How many rows are in mpg? How many columns?
3 - What does the drv variable describe? Read the help for ?mpg to find out.
4 - Make a scatterplot of hwy vs cyl.
5 - What happens if you make a scatterplot of class vs drv? Why is the plot not useful?
library(tidyverse)
mpg <- ggplot2::mpg
ggplot(data = mpg)

dim(mpg)
## [1] 234 11
nrow(mpg)
## [1] 234
ncol(mpg)
## [1] 11
drv is the type of drive train:
f = front-wheel drive
r = rear-wheel drive
4 = four-wheel drive (4wd)
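The drive-train codes can be turned into readable labels with a named lookup vector. This is only a minimal sketch: `drv_labels` and `drv_long` are names made up here for illustration and are not part of the assignment.

```r
library(ggplot2)  # provides the mpg data frame

# Hypothetical lookup vector: drv codes (see ?mpg) mapped to longer labels
drv_labels <- c(f = "front-wheel drive",
                r = "rear-wheel drive",
                "4" = "four-wheel drive")

# Index the lookup by the drv codes to get one label per car
drv_long <- unname(drv_labels[mpg$drv])

# Count the cars in each drive-train group
table(drv_long)
```

Indexing a named vector by a character column is a quick base R way to recode a handful of values without adding a column to mpg.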
ggplot(data = mpg) + geom_point(mapping = aes(x = cyl, y = hwy))

ggplot(data = mpg) + geom_point(mapping = aes(x = drv, y = class))
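One way to see why this scatterplot is hard to read, offered as a sketch rather than the official answer: drv and class are both categorical, so all 234 cars land on a small grid of positions and are drawn on top of each other. The code below counts how many cars share each position and then draws the same plot with geom_count(), which sizes each point by that count. The names `combo_counts` and `p_count` are chosen here for illustration.

```r
library(ggplot2)  # mpg data and plotting functions
library(dplyr)    # count()

# Number of cars at each (drv, class) position of the scatterplot
combo_counts <- count(mpg, drv, class)
combo_counts

# Same axes as the scatterplot, but point size shows how many cars overlap
p_count <- ggplot(data = mpg) +
  geom_count(mapping = aes(x = drv, y = class))
p_count
```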

3.3.1 Exercises
1 - What's gone wrong with this code? Why are the points not blue?
2 - Which variables in mpg are categorical? Which variables are continuous? (Hint: type ?mpg to read the documentation for the dataset). How can you see this information when you run mpg?
3 - Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs. continuous variables?
4 - What happens if you map the same variable to multiple aesthetics?
5 - What does the stroke aesthetic do? What shapes does it work with? (Hint: use ?geom_point)
6 - What happens if you map an aesthetic to something other than a variable name, like aes(colour = displ < 5)? Note, you'll also need to specify x and y.
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

str(mpg)
## tibble [234 x 11] (S3: tbl_df/tbl/data.frame)
##  $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
##  $ model       : chr [1:234] "a4" "a4" "a4" "a4" ...
##  $ displ       : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##  $ year        : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##  $ cyl         : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
##  $ trans       : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
##  $ drv         : chr [1:234] "f" "f" "f" "f" ...
##  $ cty         : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
##  $ fl          : chr [1:234] "p" "p" "p" "p" ...
##  $ class       : chr [1:234] "compact" "compact" "compact" "compact" ...
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = displ))

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, size = displ))

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = drv, shape = drv))

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = displ, size = displ))

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy), shape = 21, fill = 'red', size = 4, stroke = 3, color = 'white')

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = displ < 5))

3.5.1 Exercises answers
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) + facet_wrap(~ cty, nrow = 2)

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) + facet_grid(cty ~ year)

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) + facet_grid(drv ~ cyl)

ggplot(data = mpg) + geom_point(mapping = aes(x = drv, y = cyl))

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) + facet_grid(drv ~ .)

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) + facet_grid(. ~ cyl)

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) + facet_wrap(~ class, nrow = 2)

3.6.1 Exercises answers
Line chart: geom_line(). Boxplot: geom_boxplot(). Histogram: geom_histogram(). Area chart: geom_area().
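As a rough illustration of these four geoms, the sketch below builds one plot with each. The `economics` data set ships with ggplot2 and is used here only because it has a date column suited to line and area charts; it is not part of the assignment, and the variable choices, the binwidth, and the `p_*` names are assumptions made for this example.

```r
library(ggplot2)  # provides mpg, economics, and the geoms

# Line chart: number of unemployed over time
p_line <- ggplot(data = economics, mapping = aes(x = date, y = unemploy)) +
  geom_line()

# Boxplot: highway mileage by car class
p_box <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot()

# Histogram: distribution of highway mileage (binwidth chosen arbitrarily)
p_hist <- ggplot(data = mpg, mapping = aes(x = hwy)) +
  geom_histogram(binwidth = 2)

# Area chart: same data as the line chart, filled down to zero
p_area <- ggplot(data = economics, mapping = aes(x = date, y = unemploy)) +
  geom_area()

# Print any of the objects to draw it, for example:
p_box
```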
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point() + geom_smooth(se = FALSE)

ggplot(data = mpg) + geom_smooth(mapping = aes(x = displ, y = hwy, color = drv), show.legend = TRUE)

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + geom_point() + geom_smooth()

ggplot() + geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) + geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))

ggplot(data = mpg, mapping = aes(y = hwy, x = displ)) + geom_point() + geom_smooth(se = FALSE)

ggplot(data = mpg, mapping = aes(y = hwy, x = displ)) + geom_point() + geom_smooth(mapping = aes(group = drv), se = FALSE, show.legend = FALSE)

ggplot(data = mpg, mapping = aes(y = hwy, x = displ)) + geom_point(mapping = aes(color = drv)) + geom_smooth(mapping = aes(color = drv, group = drv), se = FALSE)

ggplot(data = mpg, mapping = aes(y = hwy, x = displ)) + geom_point(mapping = aes(color = drv)) + geom_smooth(se = FALSE)

ggplot(data = mpg, mapping = aes(y = hwy, x = displ)) + geom_point(mapping = aes(color = drv)) + geom_smooth(mapping = aes(linetype = drv), se = FALSE)

ggplot(data = mpg, mapping = aes(y = hwy, x = displ)) + geom_point(mapping = aes(fill = drv), color = 'white', stroke = 2, shape = 21)

3.7.1 Exercises answers
diamonds %>% group_by(cut) %>% summarize(median_y = median(depth), min_y = min(depth), max_y = max(depth)) %>% ggplot() + geom_pointrange(mapping = aes(x = cut, y = median_y, ymin = min_y, ymax = max_y)) + labs(y = 'depth')

ggplot(data = diamonds) + geom_bar(aes(x = cut))

diamonds %>% group_by(cut) %>% summarise(count = n()) %>% ggplot() + geom_col(mapping = aes(x = cut, y = count))

ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, y = ..prop..))

ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))

ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = color, y = ..prop..))

ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = color, y = ..prop.., group = color))

ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = color, y = ..prop.., group = color), position = 'dodge')

3.8.1 Exercises answers
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + geom_point()

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + geom_jitter()

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + geom_count()

ggplot(data = mpg) + geom_boxplot(mapping = aes(y = displ, x = drv, color = factor(year)))

3.9.1 Exercises answers
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = clarity)) + coord_polar()

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + geom_point() + geom_abline() + coord_fixed()
