---
title: "week1assigment"
author: "Ndisha Mwakala"
date: "1/11/2020"
output: html_document
---
library(tidyverse)
## -- Attaching packages -------------------------------------------------------------------------------- tidyverse 1.3.0 --
## ✓ ggplot2 3.2.1 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 0.8.3
## ✓ tidyr 1.0.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## -- Conflicts ----------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Week 1 exercises: Chapter 3; 3.3.1
Question1: What’s gone wrong with this code? Why are the points not blue?
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
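#Answer: Because color = "blue" is inside aes(), ggplot2 treats "blue" as a data value (a variable with the single value "blue") and maps it to the colour scale, so every point gets the scale's first default colour and a legend is drawn. To set the colour manually, color = "blue" must go outside aes(), as an argument to geom_point().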
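A minimal sketch of the corrected call, assuming only that ggplot2 is installed (the object name `p_blue` is mine, for illustration). The colour is set as a plain argument of geom_point(), outside aes():

```r
library(ggplot2)

# Setting the colour: "blue" is passed outside aes(), so it is taken
# literally and applied to every point, and no legend is drawn.
p_blue <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

p_blue
```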
#Question2: Which variables in mpg are categorical? Which variables are continuous?
#(Hint: type ?mpg to read the documentation for the dataset). How can you see this information when you run mpg?
#Answer: Categorical variables: manufacturer, model, trans, drv, fl, class
#Continuous variables: displ, year, cyl, cty, hwy (year, cyl, cty and hwy are stored as integers; only displ is a double)
#Q2b: How can you see this information when you run mpg?
#Answer: mpg is a tibble, so printing it shows the type of each variable directly under its name in the header, abbreviated as chr (character), dbl (double) or int (integer).
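The column types can also be listed directly rather than read off the printed header. This is a small sketch using base R's sapply() and class(); the names `col_types` and `categorical` are mine:

```r
library(ggplot2)

# Class of every column in mpg: "character" columns are categorical,
# "numeric" (double) and "integer" columns are numeric.
col_types <- sapply(mpg, class)
print(col_types)

# Names of the character (categorical) columns
categorical <- names(col_types)[col_types == "character"]
print(categorical)
```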
#Question3: Map a continuous variable to color, size, and shape.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = displ))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = year))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = model))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = displ))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = year))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = model))
## Warning: Using size for a discrete variable is not advised.
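#How do these aesthetics behave differently for categorical vs. continuous variables?
#Answer: For continuous variables (displ, year), color uses a continuous gradient and size makes the points larger as the value increases. For categorical variables, each level gets its own distinct color. model is a character (categorical) variable, which is why the plots that map it look different.
#For size, R gives a warning message when a discrete variable is used ("Using size for a discrete variable is not advised.") but still plots the values. Shape is not mapped above: it only accepts discrete variables.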
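The question also asks about shape, which is not plotted above. As far as I know, mapping a continuous variable to shape raises an error when the plot is built. The sketch below catches that error rather than stopping the script. The exact wording of the message differs between ggplot2 versions, so it is printed, not hard-coded (`p_shape` and `shape_result` are names I chose):

```r
library(ggplot2)

p_shape <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = displ))

# The error is raised when the plot is built, so build it inside tryCatch()
# and keep only the message.
shape_result <- tryCatch(
  {
    ggplot_build(p_shape)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(shape_result)
```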
#Question4: What happens if you map the same variable to multiple aesthetics?
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = year, size = year))
#Answer: The plot is drawn without error: in the case above, points from the later year are larger and are colored according to the scale. Both aesthetics encode the same information, so the second mapping is redundant.
#Question5: What does the stroke aesthetic do? What shapes does it work with? (Hint: use ?geom_point)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size=trans, stroke=5))
## Warning: Using size for a discrete variable is not advised.
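#Answer: The stroke aesthetic modifies the width of the border. It works with shapes that have borders, such as the filled shapes 21 to 25, where the border (colour) and the inside (fill) can be colored separately.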
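The effect of stroke is easier to see on a shape with a separate border and fill. This is an illustrative sketch, not the plot from the exercise: shape 21 and the specific colours and sizes are my own choices.

```r
library(ggplot2)

# Shape 21 is a filled circle with a separate border:
# `fill` colours the inside, `colour` the border, `stroke` sets border width.
p_stroke <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    shape = 21, colour = "black", fill = "white", size = 3, stroke = 2
  )

p_stroke
```

Changing stroke = 2 to a smaller or larger value makes the black ring thinner or thicker while the white interior stays the same colour.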
#Question6: What happens if you map an aesthetic to something other than a variable name, like aes(colour = displ < 5)? Note, you’ll also need to specify x and y.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = displ<5))
#Answer: The expression is evaluated for every row, which gives a logical value (TRUE or FALSE), and that result is mapped to the aesthetic. In the case above, the cars are split into displ less than 5 (TRUE) and displ of 5 or more (FALSE), and colored accordingly.
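To see what ggplot2 is actually mapping, the same expression can be evaluated on its own. A short sketch (the name `below_5` is mine):

```r
library(ggplot2)

# displ < 5 is evaluated row by row and returns a logical vector;
# this vector is what gets mapped to colour in the plot above.
below_5 <- mpg$displ < 5
print(head(below_5))

# Number of cars in each of the two groups
print(table(below_5))
```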
#Week 1 exercises: Chapter 3; 3.5.1
#Question1: What happens if you facet on a continuous variable?
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ displ, nrow = 2)
#Answer: The output is hard to read because R draws a separate panel for each unique value of displ, which leads to too many panels.
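The number of panels equals the number of distinct displ values, which can be checked directly. One possible workaround, which the exercise does not ask for, is to bin the variable first. The sketch below uses ggplot2's cut_width() with a bin width of 1, an arbitrary choice of mine:

```r
library(ggplot2)

# One panel is drawn per unique value of the faceting variable
n_panels <- length(unique(mpg$displ))
print(n_panels)

# Binning displ into intervals of width 1 gives far fewer panels
p_binned <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ cut_width(displ, 1), nrow = 2)

p_binned
```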
#Question2: What do the empty cells in plot with facet_grid(drv ~ cyl) mean? How do they relate to this plot?
ggplot(data = mpg) +
geom_point(mapping = aes(x = drv, y = cyl)) +
facet_grid(drv ~ cyl)
#Answer: The empty cells mean that there are no points to plot for that combination of drv and cyl, e.g. there is no 4-wheel drive car with 5 cylinders. They correspond to the positions with no point in the plot of drv against cyl.
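A cross-tabulation shows the same thing numerically: every zero in the table is an empty cell in the facet grid. A base R sketch (the name `combos` is mine):

```r
library(ggplot2)

# Count the cars for every combination of drv and cyl
combos <- table(drv = mpg$drv, cyl = mpg$cyl)
print(combos)

# TRUE marks the combinations with no cars, i.e. the empty panels
print(combos == 0)
```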
#Question3: What plots does the following code make? What does . do?
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ .)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(. ~ cyl)
#Answer: Both plots show hwy against displ. facet_grid(drv ~ .) splits the plot into rows, one per value of drv; facet_grid(. ~ cyl) splits it into columns, one per value of cyl. The . is a placeholder that completes the formula: it means no faceting variable for that dimension (columns in the first plot, rows in the second).
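The same two plots can be written without the . placeholder by naming the dimension explicitly. This sketch assumes ggplot2 3.0.0 or later, where facet_grid() accepts rows and cols arguments with vars(); the object names are mine:

```r
library(ggplot2)

base_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Equivalent to facet_grid(drv ~ .): one row of panels per value of drv
p_rows <- base_plot + facet_grid(rows = vars(drv))

# Equivalent to facet_grid(. ~ cyl): one column of panels per value of cyl
p_cols <- base_plot + facet_grid(cols = vars(cyl))

p_rows
p_cols
```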
#Question4: Take the first faceted plot in this section:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)
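#Question4b: What are the advantages to using faceting instead of the colour aesthetic?
#Answer: Faceting gives a quick visual: each class gets its own panel, so one can easily see how the data is distributed within each class without having to tell many colours apart.
#What are the disadvantages?
#Answer: Comparing values across panels is harder than in a single plot, and a variable with many levels produces too many panels to read.
#How might the balance change if you had a larger dataset?
#Answer: With more points, overlapping colours become harder to read, which favours faceting, but only as long as the number of rows and columns of panels stays small enough to take in visually.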
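For comparison, here is the colour-aesthetic version of the same plot, which puts all classes in one panel. This is only a sketch for side-by-side viewing; `p_colour` and `n_classes` are names I chose:

```r
library(ggplot2)

# Same data as the faceted plot, but class is shown by colour in one panel
p_colour <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, colour = class))

# Number of colours the reader has to tell apart
n_classes <- length(unique(mpg$class))
print(n_classes)

p_colour
```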
#Question5: Read ?facet_wrap. What does nrow do? What does ncol do?
#Answer: nrow sets the number of rows of panels and ncol sets the number of columns.
#What other options control the layout of the individual panels?
#Answer: scales, as.table and dir; strip.position (which replaces the deprecated switch) controls where the panel labels are placed.
#Why doesn’t facet_grid() have nrow and ncol arguments?
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ cyl)
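#Answer: Because facet_grid() forms a matrix of panels defined by the row and column faceting variables, so the number of rows and columns is fixed by the number of unique values of those variables and does not have to be specified. facet_wrap(), by contrast, wraps a one-dimensional sequence of panels, so it needs to be told how many rows or columns to use.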
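The grid dimensions for facet_grid(drv ~ cyl) can be worked out from the data alone, whereas facet_wrap() has to be given them. A small sketch (the names `n_rows`, `n_cols` and `p_wrap`, and the choice of ncol = 4, are mine):

```r
library(ggplot2)

# facet_grid(drv ~ cyl): rows and columns follow from the unique values
n_rows <- length(unique(mpg$drv))
n_cols <- length(unique(mpg$cyl))
print(c(rows = n_rows, cols = n_cols))

# facet_wrap(): the panels form a sequence, so the wrapping is chosen by hand
p_wrap <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, ncol = 4)

p_wrap
```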
#Question6: When using facet_grid() you should usually put the variable with more unique levels in the columns. Why?